JOE PALCA, host:
This is TALK OF THE NATION: SCIENCE FRIDAY from NPR News. I'm Joe Palca, sitting in for Ira Flatow.
Later in the hour, we'll be taking up a scratchy subject: itch.
But first, if the foiled German terror plot, the latest bin Laden video, and the president's speech last night are any indication - the terrorist threat isn't over.
As intelligence agencies continue to track terrorist activities worldwide, they may have some help from a rather unlikely source - academic mathematicians and computer scientists.
For example, mathematicians have developed theoretical tools that can be used to reveal things like the chain of command in a terrorist cell, or how many volunteers you need to disable to disrupt the group. Or say you want to sift through the massive amounts of information out there in cyberspace looking for terrorist activity.
Computer scientists are used to dealing with mountains of data. They're using those skills to scan e-mails, forum postings and online videos that could reveal an attack in the making.
We'll start the hour today talking about how math and computer science can help track down terrorists.
So if you'd like to join our discussion, give us a call. Our number is 1-800-989-8255, that's 1-800-989-TALK. And if you want more information about what we're talking about this hour, go to our Web site at www.sciencefriday.com, where you'll find links to the topic.
Now, let me introduce my guests. Bernard Brooks is a professor of mathematics and the head of research programs in the School of Mathematics - Mathematical Sciences at the Rochester Institute of Technology in Rochester, New York. He joins me today from the studios of member station WXXI in Rochester.
Welcome to the program, Dr. Brooks.
Dr. BERNARD BROOKS (Mathematics; Head of Research Programs, School of Mathematical Sciences, Rochester Institute of Technology): Thank you. Good afternoon.
PALCA: Good afternoon.
Also with us is Hsinchun Chen. He's the director of the Artificial Intelligence Laboratory and a professor of management information systems at the University of Arizona in Tucson. He joins me by phone from his office there.
Welcome to the program, Dr. Chen.
Dr. HSINCHUN CHEN (Director, Artificial Intelligence Laboratory; Professor, Management Information Systems, University of Arizona): Hi. Good morning.
PALCA: Good morning, or afternoon, depending on where you're sitting. Well, thanks to both of you for joining us.
Dr. Brooks, maybe we could start with you. The recent - there was this terror plot in Germany that the German officials broke up. How would mathematics or a mathematician be able to help uncover that plot or do something about it?
Dr. BROOKS: Well, mathematics is very useful for optimizing the resources that you have. We're working with - in terrorist hunting, we're working with limited resources and we have to make the best use of it.
If you think about the - let's say the communication network that the terrorists would have used for that particular cell, if that communication network can be monitored and one of the members of the cell can be identified, then we can use some of the algorithms that are useful for searching social networks in order to identify the clique or the terrorist cell.
PALCA: Well, explain what - first of all, what's a social network then?
Dr. BROOKS: It comes out of graph theory, which is an area that - area of mathematics that overlaps with sociology. If you think of a - let's say a fishnet, where the knots are the people and the strings in between the people are the relationships, we get what we refer to as a graph, so that the nodes or the dots are the people, and everybody I know is connected by edges from me to them.
Dr. BROOKS: Now, once I draw one of these sort of a fishnet or a spider web-type structures, we can identify patterns in it that could correspond to the hierarchical structure of a terrorist cell. Now...
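A minimal sketch of the fishnet picture described above, assuming the only input is a list of who communicated with whom. The names `build_network` and `contacts`, and the people "A" through "D", are invented for illustration and do not come from the broadcast.

```python
from collections import defaultdict

def build_network(contacts):
    """Build an undirected graph: each person maps to the set of people they are linked to."""
    graph = defaultdict(set)
    for a, b in contacts:
        graph[a].add(b)  # edge from a to b
        graph[b].add(a)  # and back, because the relationship is mutual
    return dict(graph)

# Hypothetical contact records: each pair is one observed relationship.
contacts = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
network = build_network(contacts)

# Degree = number of edges at each knot of the fishnet.
degrees = {person: len(neighbors) for person, neighbors in network.items()}
```

Here the knots are the dictionary keys and the strings between them are the set memberships; pattern searches of the kind discussed in the interview would run on a structure like `network`.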
PALCA: But don't you need to have some data? I mean where does the data for the connections or the nodes come from to begin with?
Dr. BROOKS: Well, that would have to come from monitoring communications and building the network.
PALCA: I see. I see. So in other words, if there's someone out there that's collecting this information, you can begin to make sense of it?
Dr. BROOKS: Yes. The science of the study of social networks has to start with the actual network laid down, you're right.
PALCA: Well, maybe I could turn to Dr. Chen, then. Is the work of computer scientists then something that could be gathering that data that maybe Dr. Brooks could use to set up algorithms to find people?
Dr. CHEN: Yeah. I think definitely - I think that's what everyone wants to see in a little bit. Clearly, the intelligence agency might be collecting data on various groups, various (unintelligible) and so on, and a lot of them are security level data.
But from the computer science or information science viewpoint - a lot of data can be collected from the Web, especially from a targeted group, and just using my project, a project called Darkweb, which is monitoring the darker side of the Web in this scenario while looking at a lot of the Web sites and forums and bulletin boards, which are heavily used by some of the terrorist groups.
So we do collect those data, and some of that may not contain very violent actual operations, but they do give you indication of who is linking to whom and the timing or the kind of surge of activities. So this will become data sources that can be used by the mathematicians to understand the cliques, the spheres and so on.
PALCA: I see. Tell me a little bit more about what you mean when you say dark Web.
Dr. CHEN: The Web is bright, is open, but at the same time, the Web is also dark and also hidden. And when we define "Dark Web," we're really looking at the side of the Web, which has a darker purpose to it, people hiding, as what this could be. Some kinds of crime could be hackers hiding in the Internet, this could be child pornography, this could be extremists or terrorists who are using the Web also for their own purpose, or really, the dark is a relative term - they just hide in the dark. And many of the purposes may be illegal or may be more violent and so on.
PALCA: So how do they - how do the dark Webs elude the kind of search engines like Google or what have you that typically find everything?
Dr. CHEN: Well, most of the search engines, they try to capture, quote, unquote, "all the contents in the Web," but because of the commercial reasons that they need to get pages that will have some commercial values.
So in many cases, they actually don't do exhaustive search. They may be doing a sampling of 20, 30 percent of the Net. But here, as you can see, is the darker side of the Web. So a lot of those Web sites may be dangling, which means they're not connected to anyone. They will not submit their URLs to Google to search. So they're hiding, you know, because they are targeting a particular subset of the society of interest to them. So they are not - they are very difficult to find.
So Google, in most cases, they might have some content, but definitely they only have a very small percentage of those content.
PALCA: We're talking about the use of computer science and mathematical tools to track terror. And we are taking your calls at 1-800-989-8255, that's 1-800-989-TALK.
And, Dr. Brooks, maybe I could turn back to you. So if you - if Dr. Chen's data show - comes to you and it shows that two people are talking, how do you assess, you know, who's on top, who's on the bottom, who's giving the orders, who's carrying them out? How does the mathematical theory account for these things?
Dr. BROOKS: Well, there are search algorithms that look for particular patterns. And the patterns that we would be looking for would be different, depending on whether the cell that we think we're looking for is more of an amateur terrorist cell. If the people knew each other before they set up the terrorist cell, if they set themselves up - if they were just a group of disgruntled individuals, this probably happens more in domestic terrorists.
If you think about five people, they would all have known each other beforehand and wouldn't have the opportunity to set up the optimum, the perfect terrorist cell. So that would manifest itself as a complete subgraph. In other words, five people, they would all know each other. So you would have five dots and out of each one of those dots, the five people, you would have four edges emanating to each other, and that would look like a complete subgraph that you would be looking for.
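The complete-subgraph pattern described above can be sketched as a brute-force check. This is an illustration only, assuming a small graph stored as a dictionary of neighbor sets; the function names and the people "A" through "F" are hypothetical, and trying every group this way is only practical for tiny networks.

```python
from itertools import combinations

def is_complete_subgraph(graph, members):
    """True if every pair of members is directly connected."""
    return all(b in graph.get(a, set()) for a, b in combinations(members, 2))

def find_cliques_of_size(graph, k):
    """Try every group of k people and keep the groups that are complete."""
    return [set(group) for group in combinations(sorted(graph), k)
            if is_complete_subgraph(graph, group)]

# Five people who all know each other, plus one outsider "F" who knows only "A".
cell = ["A", "B", "C", "D", "E"]
graph = {person: set(cell) - {person} for person in cell}
graph["A"].add("F")
graph["F"] = {"A"}

found = find_cliques_of_size(graph, 5)
```

In this toy graph each of the five members has four edges to the other four, which is the "five dots, four edges each" picture from the interview.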
On the other hand, Jonathan Farley, one of the organizers of the terrorists conference coming up this next week, is - he presented at last year's terrorist conference a mathematical theory that worked with the lattice theory and posets that set up a way to design the optimal terrorist cell. So...
PALCA: Do I want to know what posets are or is that - can we just posit that those are something you use in mathematics?
Dr. BROOKS: It's like a graph...
Dr. BROOKS: ...so that the fishbone - oh, sorry, I made it myself - the fishnet structure that I talked about before.
PALCA: Yeah, uh-huh.
Dr. BROOKS: It doesn't have a top or a bottom, whereas a poset would have a top or a bottom. So we're starting to work some hierarchy into these social networks - that's a good way to think about it. And he's set up this optimal terrorist cell in sort of a fishbone-like structure, think of it as a ladder with one leg removed, where commanders have a trusted lieutenant on one side and then the rest of the network below them. And he's designed this so that if it's disrupted in any way - in other words, one of the agents is compromised - then that sort of structure, that sort of network is the most resilient to that sort of compromise or capture of an agent.
Dr. BROOKS: So if a professional terrorist organization that had the luxury of setting up the perfect command structure, they would choose that sort of thing. So if we're looking for a more professional cell, we would be looking for something similar to that structure.
Dr. BROOKS: So now as a terrorist organization, I have to make - I would have to make a choice. I would have to say, do I go with the optimal structure, in which case I would be slightly easier to find with these search algorithms, or do I go with the less optimal structure, become a little bit harder to find but easier to compromise when I am found.
PALCA: All right. Well, let's see if our listeners have any questions about this. And let's go first to Roger(ph) in Rockville. Roger, welcome to the program.
ROGER (Caller): Thank you. When I heard the topic of this show, it made me think of the TV show "Numbers."
ROGER: I wondered whether your guests were familiar with it and whether they thought it was realistic.
PALCA: Hmm. Do - gentlemen, Dr. Brooks, Dr. Chen, do you know this program?
Dr. CHEN: Which program is it?
PALCA: It's called "Numbers."
Dr. CHEN: I'm not familiar with it.
PALCA: Oh. Dr. Brooks?
Dr. BROOKS: I have seen it. It's on Friday nights and one of the organizers, Anthony Harkin, of the conference coming up next week has actually done some consulting for that show.
PALCA: Really? Is it, are they realistic at all? What - first of all, maybe you could give a brief description of what the show is about.
Dr. BROOKS: The show is about an FBI agent and his mathematician brother, and they solve mysteries and crimes using mathematics. He...
PALCA: I see. So this has already passed into popular domain?
Dr. BROOKS: Yes. I mean, as a mathematician, it's nice to see the hero of the show being a mathematician. Easy for you...
PALCA: Yeah. Next thing you know, they'll have a hero of the show being a radio talk show host. You know, that would really be going out on a limb. Is it realistic, I guess, is Roger's question.
Dr. BROOKS: The mathematics that they use is real. Is it realistic when they do some calculations on a blackboard and discover, oh, the bad person is hiding behind such and such a bush, and the way they catch them at the end of the hour, no. It's...
PALCA: You guys are just going to have to work harder, that's all. Well, I'm afraid we're - well, so it has a nodding acquaintance with reality, but not totally realistic. Okay. We have to take a short break. When we come back, we'll be talking more with our guests about using computer science and mathematics to track terrorism. Stay with us.
(Soundbite of music)
PALCA: From NPR News this is TALK OF THE NATION: SCIENCE FRIDAY. I'm Joe Palca.
We're talking this hour about using math and science to fight terror. My guests are Hsinchun Chen, director of the Artificial Intelligence Laboratory at the University of Arizona in Tucson; and Bernard Brooks, professor of mathematics at the Rochester Institute of Technology in Rochester, New York.
And Dr. Chen, I want to turn to you first and ask - if you don't know what's out there, if people are hiding on the Web, how do you know where to start to look for them?
Dr. CHEN: Well, for computer scientists, definitely, we don't know the terrorism area as well as other terrorism researchers. So the first thing that we do is actually work with those people who have been monitoring those groups for years. And there are some international research groups, there are people at West Point, the Combating Terrorism Center, there are people in other academic institutions. A lot of them are communication researchers, political scientists, and they've been monitoring terrorist groups for some time.
But the problem has been that because of the Internet, because of the overwhelming information and communication over in the cyberspace, it's very difficult for them to keep track of this phenomenon. So by working with them, we typically can identify certain groups, identify certain seed URLs, we identify a certain forum that they are aware of. Now, using that as the starting point, then we can do the crawling, so crawling is sort of a spidering term, meaning what search engines do when they crawl the Web.
So, we send computer programs starting with those seed URLs, to go to other linked sites or other forums so we're - just like what Dr. Brooks says - talk about we're casting the net out from initial small patches and then into the bigger connected Web. With that then we use various weighting mechanisms to identify other potentially important Web sites and forums that were missed, even by those terrorism researchers.
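The crawling step described above can be sketched as a breadth-first walk outward from seed URLs. This is a toy under stated assumptions, not the project's actual crawler: the link structure lives in an in-memory dictionary (`LINKS`, with made-up page names) instead of being fetched from the Web, and counting inbound links stands in for whatever weighting the researchers really use.

```python
from collections import deque

def crawl(seeds, get_links, max_pages=100):
    """Visit pages breadth-first from the seeds, following links to unseen pages."""
    seen = set(seeds)
    queue = deque(dict.fromkeys(seeds))  # drop duplicate seeds, keep their order
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

def inlink_counts(pages, get_links):
    """Crude importance weight: how many crawled pages link to each page."""
    counts = {page: 0 for page in pages}
    for page in pages:
        for link in get_links(page):
            if link in counts and link != page:
                counts[link] += 1
    return counts

# Hypothetical link structure standing in for real Web pages.
LINKS = {
    "seed-a": ["forum-1", "site-2"],
    "seed-b": ["forum-1"],
    "forum-1": ["site-3"],
    "site-2": ["forum-1"],
    "site-3": [],
}

def get_links(url):
    return LINKS.get(url, [])

pages = crawl(["seed-a", "seed-b"], get_links)
weights = inlink_counts(pages, get_links)
```

In this toy example "forum-1" was not a seed, yet it ends up with the highest weight because three crawled pages point to it, which is the sense in which a crawl can surface sites the initial list missed.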
So, even in the collection that we have collected over the past three or four years, we do spidering or crawling about every two or three months, so we collect about four, five collections a year, and we're getting probably thousands of Web sites, hundreds of millions of Web pages, and even close to billions of postings in forums and so on...
Dr. CHEN: ...you find different languages, in English, in Arabic, and Spanish, and French, and so on. But it's really becoming a very tightly connected Web of the darker nature.
PALCA: Mmm. And - I'm just curious, have you stumbled across something that has caused you to think, oh, I might be on to something here? Something that makes you alarmed?
Dr. CHEN: Yeah. There's definitely a lot of examples of those kind of content that they created. But in particular, you see a lot of Web sites that are very radical and - but extremely popular among those disgruntled set of Muslim young men, and you see the posting and you see this, sort of, the followers of those messages, and that's really very disturbing. We also have seen content - one of our recent analyses is IEDs on the dark Web; what are the things that they talk about regarding improvised explosive devices on the Web.
PALCA: Right. Uh-huh.
Dr. CHEN: And there's - actually, we found out they have seven Web sites that are acting as the hubs, disseminating IED-related postings and manuals and videos and so on, and those become very powerful tools to entice their audience.
PALCA: Interesting, interesting. Well, let's take another call now. And let's go Kyle(ph) in Aurora, Illinois. Kyle, welcome to the program.
KYLE (Caller): Thank you. My question - I can understand crawling Web sites and trying to find the different links of communication or the different connections between them all, but I would think that more sensitive type of information - I mean, wouldn't it be extremely easy for them to just use something like, very widely available encryption methods like OpenSSL, in order to mask, like, more sensitive communications.
So I guess, how do they - the things that are kind of broadcast to everyone, like maybe IED explosives, how to make something like that, I understand how to try to find something like that. But let's say that Osama was trying to send, you know, a message to one other person and how would - is there a way we can really even try?
PALCA: Right. So in other words, if the content and even if you know that message is being sent, the content may be encrypted.
PALCA: Let me ask Dr. Brooks. What do you think about that?
Dr. BROOKS: Well, I think Kyle's question is a good one and our search algorithms for searching the net would not pick up those, because we're looking for more general patterns. But once we did identify the individual, then you would specifically monitor their communications, and then a different team of mathematicians - people more skilled in cryptography than I am - would be able to open those.
PALCA: So in other words, you get them to the table and then let somebody else try and figure out what they're doing there.
Dr. BROOKS: Yes, yes.
Dr. CHEN: Actually, in my experience searching those Web sites and forums, the major proposition right now are really not in those secret communications, because in most cases, they know those Web sites and forums are sometimes monitored by intelligence agencies or research programs. So they're smart. They're cautious.
Dr. CHEN: So, a lot of those communications will be taken offline, from landlines and cell phones and so on. So we don't see a lot of evidence like that. But the biggest values of their Web sites and forums are really serving as recruiting, incitement and radicalization tools.
Dr. CHEN: So, they are hitting the large populations of the young people. In fact, the head of the idea, not infectious disease, but infectious ideas.
PALCA: Right, right. Dr. Chen...
Dr. CHEN: So those people are not born terrorists, but they get infected.
PALCA: Dr. Brooks...
Dr. BROOKS: Dr. Chen's right to equate the information flow on the Internet with an infectious disease. It's the same mathematics that is used to model epidemics we're using to model information flow over the Internet as one example of a network. And he speaks of these sort of hub connections that the theory shows obviously that those would be the ones to remove if you wanted to slow down the flow of information across that particular net.
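The point about hubs can be illustrated with a toy network. The sketch below is not an epidemic model; it only measures how many people a message starting at one node can reach, before and after the best-connected node is removed. The graph, the node names and the function `reachable` are all invented for illustration.

```python
def reachable(graph, start, removed=frozenset()):
    """Set of nodes a message starting at `start` can reach, skipping removed nodes."""
    if start in removed:
        return set()
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in graph[node]:
            if neighbor not in seen and neighbor not in removed:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen

# Hypothetical network: "H" is a hub tied to everyone; "A" and "B" also know each other.
graph = {
    "H": {"A", "B", "C", "D"},
    "A": {"H", "B"},
    "B": {"H", "A"},
    "C": {"H"},
    "D": {"H"},
}

hub = max(graph, key=lambda node: len(graph[node]))  # highest-degree node
before = reachable(graph, "A")
after = reachable(graph, "A", removed={hub})
```

With the hub in place a message from "A" reaches all five nodes; with the hub removed it reaches only "A" and "B".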
Dr. CHEN: That's right.
PALCA: You know, there are a group of people who are gathering on the Web actually, as this show is on the air in the Science School area and some of them have posed some questions. So I'm going to ask "Second Life" - sorry, "Second Life" and the Science School area of "Second Life."
And - so we have a question here from Azorro(ph) - well, I'm going to have to guess at the pronunciation, Azorro Todria(ph), who says, given that the net and communications in general have been monitored for some years now, have any useful mathematical models of terrorist communications been developed that would enable new cells to be detected by identifying them from their pattern of behavior, rather than what they actually say? That sounds a little bit like what you're trying to do, Dr. Brooks, isn't it?
Dr. BROOKS: Yes, I mean, the quick answer to Mr. Todria's question is yes, that's exactly what we're doing. You don't have to necessarily listen to the content, you can just look for the pattern of communication.
Dr. CHEN: Could I say something?
PALCA: Yes, sure, Dr. Chen.
Dr. CHEN: Well, for computer scientists, we actually - there is a branch of computer science that is dealing with text mining. So, in the context of text mining, as you know that in the forums and chat rooms, there are a lot of new ideas and new topics and new ideologies that come up. And this branch of computer science called text mining, almost specifically use natural language processing, we actually look at contents or what do they say, and you can even look at the opinions, like looking at their sentiments. So, that's another branch called opinion mining. These, in the past, have been used to monitor - let's say, people - product reviews and movie ratings, and so on. Do you like it? Do you hate it?
Dr. CHEN: So, when you look into the forums, you can actually identify certain forums or Web sites that become more radical and talk about these topics more. You can also identify opinion leaders who are the most radical members who brought ideas that are infectious, that are sticky(ph), that people listen to him.
In one study we had with one of the forums, before we speak, it has almost like 850,000 postings, that have been collected over the past three or four years. In every single second, there are people on the Web, and yet there are about 60,000 people connecting to this pretty radical Muslim forum, mostly Arabic. But if you can track, in this case, its natural language processing using Arabic text, looking at the sentiment of angers and racial hatred, and with that also look at the search activities or with a combination of mathematical model and natural language processing, you can identify the most critical members, the most critical messages that you really need to pay more attention to.
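The "Do you like it? Do you hate it?" idea behind opinion mining can be sketched with a tiny word-list scorer. This is a deliberately crude illustration using the movie-review example, not the natural language processing used in the research described; the word lists and the function `polarity` are assumptions made for the sketch.

```python
# Hypothetical sentiment word lists; real systems use far richer resources.
POSITIVE = {"like", "love", "great"}
NEGATIVE = {"hate", "awful", "boring"}

def polarity(text):
    """Positive-word count minus negative-word count for one piece of text."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

reviews = ["I love it, great film!", "I hate it. Boring."]
scores = [polarity(review) for review in reviews]
```

A score above zero leans positive and a score below zero leans negative; ranking posts or authors by such scores is one simple way to decide which ones deserve a closer look.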
PALCA: Let's take another call now and go to Kate(ph) in Washington, D.C. Kate, welcome to SCIENCE FRIDAY.
KATE (Caller): Hi, Joe. How are you?
KATE: And hello, gentlemen, how are you today?
Dr. CHEN: Good.
Dr. BROOKS: Hello, we're fine.
KATE: Good. Let me just throw a little bit into this. Two things you need to know. Number one, I'm an African-American woman, and number two, I'm a corporate anthropologist. And I deal in both of these arenas, both the social networks as well as computer science. And let me throw something in that I think needs to be developed. Most of what is being viewed now is being viewed, whether it's mathematics or whether it's in computer science, but the populations that are being targeted are populations that are coming out of the imagination, primarily of Western, frightful white men.
KATE: And what that brings - I mean, even though we want to believe that science is objective and that it's all interesting, the bottom line is that many, many ethnic groups in this country have different ways of dealing with one another from cultural perspective - in terms of clusters, chain migration, et cetera. And the problem I have with a lot of this - especially being a black woman in this country - is that you'll get situations like CODIS, like DNA and CODIS, which is the, you know, the DNA database. And you'll get certain things and certain populations where they believe they've got the perp.
But no one has gone behind the scenes to see how people actually deal with one another and weight it from a cultural perspective, sufficient that they actually are dealing with the fact that like in some black communities, the sex ratio between men and women - there're so few men, that in many cases some of the children are siblings. So they think they've got the guy with the DNA and it turns out that that neighborhood is full of half-sibs.
PALCA: Right. Right.
KATE: If you don't know that and if you don't weight it from the perspective of reality, you've targeted a group of people for no other reason than white guys in America are terrified and so they form this thing of what they think they see. And they really don't know what the heck they're doing.
PALCA: Well, I think it raises an interesting question, Kate. I want to ask Dr. Brooks what he makes of that. But I also want to - well, first, let's start with Dr. Brooks. That would be interesting. What do you think about what Kate just had to say there?
Dr. BROOKS: Well, certainly, just like when Roger asked about the "Numbers" program and how - my answer being that they don't find - they don't rely on the mathematics in real life and say that guy is behind the bush. She's right. It has to be combined with actual boots-on-the-ground people. So it's - the mathematics provides a tool that optimizes your search, but it doesn't - it has to be supplemented with...
Dr. BROOKS: ...traditional police work.
PALCA: I was thinking that - I mean, Kate's point is a real-world point. And we are talking more theoretically, but we are talking about the marriage of the real world and theory, so I guess it's important to consider what Kate is saying. Thanks very much for calling about that, Kate.
Dr. BROOKS: There are...
PALCA: Thank you. Go ahead, Doctor.
Dr. BROOKS: We took a - just as an experiment at school, at RIT. We made a social network out of the mathematicians and the psychologists because Dr. DiFonzo is the lead on this rumor research that I'm involved in and he's in the psychology department. So we did an anonymous survey to see who knows whom in both of those departments. And then, I took that social network and I applied a common clustering algorithm, this Newman algorithm. Basically what it does is it divides the network into groups. And I applied it blind, so we didn't know who was a mathematician and who was a psychologist. And it managed to separate out - in two runs, it managed to separate out the psychologists from the mathematicians just on their - just on the basis of who was connected to whom.
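The broadcast does not say which Newman algorithm was used. As one hedged illustration, the sketch below computes modularity, a score associated with Newman's community-detection work that rates how well a proposed split into groups matches the connections. The edge list, the node names and the two candidate splits are invented, and the code only scores a split; it does not search for one.

```python
def modularity(edges, communities):
    """Sum over groups of: share of edges inside the group minus the share expected by chance."""
    m = len(edges)
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    q = 0.0
    for group in communities:
        inside = sum(1 for a, b in edges if a in group and b in group)
        total_degree = sum(degree.get(node, 0) for node in group)
        q += inside / m - (total_degree / (2 * m)) ** 2
    return q

# Hypothetical survey result: two tight groups of three joined by a single tie.
edges = [("m1", "m2"), ("m2", "m3"), ("m1", "m3"),
         ("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
         ("m3", "p1")]

good_split = [{"m1", "m2", "m3"}, {"p1", "p2", "p3"}]
mixed_split = [{"m1", "m2", "p1"}, {"m3", "p2", "p3"}]

q_good = modularity(edges, good_split)
q_mixed = modularity(edges, mixed_split)
```

The split that follows the two tight groups scores higher than the mixed one, which is the sense in which connections alone, without labels, can separate two departments.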
PALCA: Interesting. We're talking this hour about using computer science and mathematical tools to track terror.
I'm Joe Palca. And this is TALK OF THE NATION from NPR News.
Dr. Chen, I think I might have cut you off there. Did you want to say something?
Dr. CHEN: Yeah, I think that's an interesting question. I think when you are studying those - in my case, sort of a cyber activism phenomenon, there's certainly a difference between the virtual world and your real identity, and there are a lot of studies about that. Whether what you talk about in the virtual world is a reflection of your true identity - you could be somebody extremely violent on the virtual space but you are a young, gentle kid in school. So there's - certainly, there's that difference of a person's real identity. And this is really, also, complicated by a lot of cultural context.
And I think I concur with the previous questions and comments is that you really need to understand the ground truth. But in the phenomena that we're studying, there could be the first generation, Muslim persons, and then they could be the homegrown group. And they could be people in Germany. They could be people in U.K. and Canada. And they are all in very different social context.
And we have one recent study that we studied the presence of the Muslim woman in somewhat radical forums and what's their role. Again, they are Muslim women. They have a different cultural context, and they are in a cyberspace, so their identities are even more different from their traditional role, but we see a lot of examples that they are not just caring about those radical ideas. They may provide some supporting role. But a lot of the issues that they care about are really the women's rights and then how the younger generation should be educated. So not all the contents are radical. They actually care about other issues of relevance to them.
PALCA: Let me take another question now from the community that's gathered in "Second Life" at the Science School. This one comes from Devlin Yuri(ph) who asked, how about the legal framework for this and how about privacy issues? And I wonder, Dr. Chen, have you had to wrestle with that at all?
Dr. CHEN: Oh, all the time.
Dr. CHEN: That's really a question number one that we always get. I'm very sensitive about civil liberty issues and, you know, getting into the area - the gray area of, you know, how you identify innocent citizens and so on. In our research, we're very careful, if that is correct, to work with the domain experts or a terrorism researcher looking at a known group that has particular presence. But, you know, when you backtrack, the war on terror is really very politically defined, that clearly, the definition in U.S. is very different from the definition in U.K. versus in China, for example. And I came from Taiwan. There's really no known terrorist group in Taiwan.
So it's very politically defined - so when you have that terminology, you really have to be very careful about the content you collect. And since I'm funded by NSF, I'm really doing a scientific study of what are they thinking, why are they doing certain things the way they do. So we have to be careful about collecting the content we have.
So as I described earlier, working with those terrorism researchers, we know there are certain known groups identified in this case by the State Department, then we find the linkage to them. And sometimes we really have to avoid using the word terrorist, because we may call you a cyber extremist or cyber activist, but when we do the math, this is not going to - and as I received an e-mail recently - actually a couple of days ago, it asked me, are you searching for innocent communication of U.S. citizens?
Dr. CHEN: And the answer is no.
PALCA: Dr. Chen, I'm afraid we're going to have to stop there because we've run out of time for this segment. But thank you very much for shedding some light on this. And I appreciate it.
Dr. Hsinchun Chen is director of the Artificial Intelligence Laboratory and professor of management information systems at the University of Arizona in Tucson. And I would also like to thank Bernard Brooks. He's professor of mathematics and the head of research programs in the School of Mathematical Sciences at Rochester Institute of Technology in Rochester, New York.