Data-Mining a Mountain of Phone Calls
Numerous reports allege that the National Security Agency may have been collecting telephone traffic information on millions of Americans. What could the NSA possibly hope to learn from such a mountain of data? As NPR's Larry Abramson reports, experts in data-mining are aiming their increasingly sensitive tools at just this kind of complex information, in the hopes of predicting when the risk of terrorist threats is the highest.
MELISSA BLOCK, host:
From NPR News, this is ALL THINGS CONSIDERED. I'm Melissa Block.
MICHELE NORRIS, host:
And I'm Michele Norris.
Today the Senate Intelligence Committee voted to approve General Michael Hayden as head of the CIA. As chief of the National Security Agency, Hayden oversaw domestic surveillance efforts as part of the war on terror. The government has refused to confirm or deny reports the agency also collected phone records on millions of domestic calls.
BLOCK: Reports about that program have raised questions about what the government could learn from those phone records. Many security experts say the answer is simple. The agency is data mining, trying to find meaningful patterns in phone records that could help prevent a terrorist attack.
NPR's Larry Abramson reports.
LARRY ABRAMSON reporting:
Even before 9/11, intelligence analysts were drowning in data. They were looking for a life raft, a way to float above the tides so they could see where they were going. Erik Kleinsmith helped design one of those survival tools as part of a team working on a Pentagon project called Able Danger. Part of the job involved coming up with a list of indicators and warnings.
Mr. ERIK KLEINSMITH (Chief of Intelligence of the Land Information Warfare Activity): It's just a series of indicators that once I see the criteria for each of these 12, I know I've got more of a chance of this event happening. And so it's almost like looking at the forensics of a crime before the crime happens.
ABRAMSON: Able Danger became famous after 9/11 because some believed this data-mining program identified ringleader Mohamed Atta as someone to watch out for simply by trolling the internet and looking at connections between different terror groups and operatives.
So Kleinsmith, who now works for Lockheed, is not surprised at all that the NSA might be analyzing phone traffic to come up with a similar warning system.
Mr. KLEINSMITH: The next time I see these three phone calls being made, I can definitively say within 72 hours this event is going to happen. That's the kind of benefit of doing data mining that just looking through manually all that traffic is totally impossible.
ABRAMSON: Kleinsmith says phone traffic would in fact be a good place to start looking for connections that might serve as a trip wire to warn of an imminent attack. But you have to know what you're looking for. Just what does phone traffic look like before a terror attack?
U.S. analysts may well have a picture based on 9/11 calling patterns. USA Today cites anonymous sources as saying the government follows domestic calls a person makes after receiving a call from Pakistan, Afghanistan or the Middle East.
But Valdas Krebs (ph), who analyzes similar connections for commercial phones, says phone traffic patterns could be a bad viewfinder for hunters of terrorists.
Mr. VALDAS KREBS (Commercial phone analyst): If you and I are working on a project together, we're doing certain things like meeting deadlines and coordinating and making sure we have resources. Well, the terrorists are also working on a project together, are doing the same thing.
ABRAMSON: The picture gets even fuzzier if you consider that media reports indicate the phone numbers provided to the NSA were anonymous, and for many experts that would limit their value. Usama Fayyad is chief data officer for Yahoo!
Mr. USAMA FAYYAD (Yahoo!): Phone call data, if you actually absolutely have no other information other than phone numbers, unique phone numbers, I don't think it could lead you to anything interesting.
ABRAMSON: Yahoo! is constantly trolling through consumer data for meaningful patterns about attractive products or effective ads. Fayyad says Yahoo! can find patterns in anonymous data, but that's only because people go to Yahoo! with a similar goal in mind.
Mr. FAYYAD: You understand what the ads are about, you can measure the degree of response of the consumer to that content, to the product or to the ads and so forth. And that allows you to generate these reports that are completely anonymous, they are aggregated, but they tell you something about the behavior of the population.
ABRAMSON: Telephone callers, on the other hand, may behave the same but actually have little in common. That means the NSA would have to zero in on certain numbers in the search for meaningful patterns. Here the government faces another problem of information overload. According to some reports, the National Counterterrorism Center has 325,000 names on its terrorist watch list.
Mathematician Jonathan Farley, at Stanford's Center for International Security and Cooperation, says by sweeping in suspects with such a big broom, the links the NSA finds are likely to be garbage.
Mr. JONATHAN FARLEY (Stanford Center for International Security and Cooperation): Because we all know about the famous six degrees of separation, that we're all pretty much connected through a short series of links. So by that token, you could say George Bush is connected to Osama bin Laden, because George Bush knows someone that knows someone who was the brother of Osama bin Laden.
ABRAMSON: Farley says the quality of the NSA's data mining is only as good as any additional information the agency can use to place those calls in context. Without that additional data, government analysis of phone traffic may raise questions about civil liberties and yield little else.
Larry Abramson, NPR News, Washington.
Copyright © 2006 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.