Copyright ©2010 NPR. For personal, noncommercial use only. See Terms of Use. For other uses, prior permission required.

MELISSA BLOCK, host:

In Act 2 of Shakespeare's "Hamlet," Polonius finds the increasingly unbalanced prince reading a book.

(Soundbite of play, "Hamlet")

Unidentified Man #1: (As Polonius) What do you read, my lord?

Unidentified Man #2: (As Hamlet) Words. Words. Words.

BLOCK: Well, this afternoon, a mind-boggling collection of words went online, perhaps the biggest ever assembled, billions of them from millions of books published over the past four centuries. The words make up a searchable database that researchers at Harvard say is a tool to study cultural change.

NPR's Dan Charles reports.

DAN CHARLES: Google has been running millions of books through scanners, converting old-fashioned ink and paper into electronic digits. But many books are covered by copyright and publishers aren't letting people read them online. The new database gets around that problem. It's just a collection of the words in five million of the books, stripped of all context except the date in which they appeared.

But Erez Lieberman Aiden, a mathematician and bioengineer at Harvard, says it opens the door to a whole new style of literary scholarship.

Mr. EREZ LIEBERMAN AIDEN (Mathematician, Bioengineer): Instead of saying, what insight can I glean if I have one short text in front of me, it's, what insight can I glean if I have 500 billion words in front of me; if I have such a large collection of texts that you could never read it in a thousand lifetimes?

CHARLES: For instance, you can type in a word or a short phrase and the database produces a graph, a curve that traces how often an author used those words every year since 1800.

Mr. JEAN-BAPTISTE MICHEL (Mathematician, Biologist): And you realize that it's fantastically addictive.

CHARLES: Jean-Baptiste Michel is also at Harvard. He's a mathematician and biologist.

Mr. MICHEL: You can just spend hours and hours and hours typing in the names of people you know, places you like, or just random stuff. And so you end up discovering quite a lot of things that way.

CHARLES: The two researchers discovered, for instance, that the trajectory of fame - the curve that shows how often a very famous person is mentioned in books - has changed over the centuries. Today, fame is more fleeting.

Mr. MICHEL: You become famous earlier in life. So fame knocks at your door earlier in your life. And then you rise to fame even faster than before. And the flipside of this is that you become forgotten also somewhat faster than before.

CHARLES: Specific years - 1973, for instance - also seem to fade from the literary record more quickly nowadays. God got a lot of print in the early 19th century, but not today. Aiden and Michel say these graphs are windows into evolving cultures. All those words represent a chunk of our cultural DNA, not a genome, they say, but a culturome. They've named the website where anybody can search their database www.culturomics.org. It's just been unveiled in the journal Science.

Now, Erez Lieberman Aiden is quick to point out it is limited.

Mr. AIDEN: Books are just one form of cultural exchange. It's a biased form of cultural exchange. There's only certain types of people who are writing books. There's only certain types of people who are managing to get their books published.

CHARLES: But at least we have books. We'll never catalog all the words in casual conversations or lovers' quarrels. Some scholars may be horrified by this approach to literature, but historian Caroline Winterer at Stanford is not. She says these new tools give historians more comprehensive information about the words that people used in the past to describe their world.

Professor CAROLINE WINTERER (Historian, Stanford University): Whereas before, you had to sit there and, well, you had to actually read the whole text, God forbid.

(Soundbite of laughter)

Prof. WINTERER: And, you know, you'd find three or four examples, and nobody could really check up on it. For better or for worse, it does give you a more accurate sense of some things in the humanities.

CHARLES: But certainly not everything, she says. Take the decline of the word God. Over the past century or two, some writers have started describing the wonders of the natural world as divine. Their books don't always use the word God.

Prof. WINTERER: But they are talking about nature, or the environment, or Yosemite, or Yellowstone. These are all codes for God.

CHARLES: And you'll only notice that, she says, if you do read the books and try to understand them.

Dan Charles, NPR News.


NPR transcripts are created on a rush deadline by a contractor for NPR, and accuracy and availability may vary. This text may not be in its final form and may be updated or revised in the future. Please be aware that the authoritative record of NPR's programming is the audio.
