RENEE MONTAGNE, Host:
Many people outside the Muslim world follow the news through social media. It helps if you can speak a local language. As NPR's Christopher Joyce reports, American scientists are trying to get computers to work around that problem.
CHRISTOPHER JOYCE: Computer scientist Rohini Srihari says existing computer translators for Urdu are often too literal.
ROHINI SRIHARI: What I want is to determine who are the people, places and things being talked about. Is there an opinion being expressed? Is it a positive or negative opinion being expressed?
JOYCE: At the University of Buffalo, Srihari has developed a natural language program that she says can do that. The computer has learned the nuances of written Urdu. Some of this is fairly mechanical. Urdu doesn't have the same kind of clear breaks between one word and the next the way English does. Some things are subtle, as in characters and words whose placement may connote sentiment.
SRIHARI: And when you're able to figure out what the topic of the conversation is, what kind of sentiment is being expressed around that, that's the goal of what we're trying to do.
JOYCE: On the screen, Srihari says, you can mouse over a section of script, and if it carries a negative connotation, it will highlight red. If it has a positive sentiment, it glows green. Srihari says the computer allows her to mine the Internet. Her research company gets funding from the Pentagon for the project.
SRIHARI: So in Twitter posts and tweets and so on, if there's specific factual information that's being mentioned, they want that extracted. There's also definitely an interest in sentiment and opinion mining.
JOYCE: Ernest Tucker is a history professor with the Center for Middle East and Islamic Studies at the U.S. Naval Academy. He struggles with Urdu himself.
ERNEST TUCKER: (Urdu spoken)
(SOUNDBITE OF LAUGHTER)
TUCKER: That's saying "my Urdu's not very good" in Urdu.
JOYCE: But Tucker does speak and read Persian, which is close, and he regularly reads publications from the whole region. He argues that history is best told not by what the Napoleons say, but by the foot soldiers or, in this case, the tweeters.
TUCKER: And that's the goal of all historians anywhere, is to try to get the voices of more and more and more people into the conversation. And anything that can do that, particularly this kind of thing, is to me, it's a wonderful gift.
JOYCE: Tucker says he's skeptical about how well a computer is going to identify sentiment. He says you'll still need a human linguist to fine-tune any translation. For example, he says it's common in Middle Eastern languages to employ couplets from traditional poetry to convey feelings - symbolic language that could confuse a computer program.
TUCKER: For the Iranians, for the Pakistanis, for the Indians, it's still a part of the living connection to the cultures of the past.
JOYCE: Rohini Srihari acknowledges the program isn't perfect. It gets flummoxed by things like Urdish, a mash-up language for text messaging that's part Urdu and part English. But it has given her insight into what Urdu speakers have been talking about lately.
SRIHARI: A lot of the conversation, believe it or not, was about cricket. That seems to be on everyone's mind, all the time.
JOYCE: Christopher Joyce, NPR News.
(SOUNDBITE OF MUSIC)
MONTAGNE: You're listening to MORNING EDITION, from NPR News.
NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.