DAVID BIANCULLI, HOST:
This is FRESH AIR. One effect of the revelations about NSA surveillance has been to bring the unfamiliar word metadata into the spotlight. Our linguist Geoff Nunberg explains what metadata is, how it's related to the big data revolution, and whether it makes any difference - or, as he puts it, does meta matter?
GEOFF NUNBERG, BYLINE: This is just metadata; there's no content involved - that was how Sen. Dianne Feinstein defended the NSA's blanket surveillance of Americans' phone records and Internet activity. Before those revelations, not many people had heard of metadata, the term librarians and programmers use for the data that describes a particular document or record it's linked to. It's the data you find on a card in a library catalog, or the creation date and size of a file in a folder window. It's the penciled note on the back of a snapshot: Kathleen and Ashley, Lake Charles, 1963. Or it could be the times, numbers and GPS locations attached to the calls in a phone log.
Metadata was bound to break out sooner or later, riding the wave of data in all its forms and combinations. Big data and data mining are the reigning tech buzzwords these days, and university faculties are scrambling to meet the surge in demand for courses in the hot, new field of data science. It's as if data is usurping information as a byword.
Up to now, data has played a supporting role in the information age. There's a popular definition of data as the raw material that becomes information when it's processed and made meaningful. That puts information at the center of the modern tech world, but it isn't how anybody actually uses the two words. I have this image of somebody working on a spreadsheet as a manager leans over and says, is it information yet?
But the shift in focus from information to data reflects a genuine difference between the two. Information brings to mind the knowledge that's gathered in libraries, encyclopedias and journals - stuff that has an independent existence in the world. Data is always connected to particular things and events. It comes from experiments and sensors and official records; or from the scuff marks we leave behind as we click on websites, make calls, go through the E-Z Pass toll booths, visit an ATM. It's all out there, accumulating in ginormabytes, overflowing the server farms.
When you're focused on information in that stand-alone sense, metadata plays a subordinate role. In the old days, it was just a tool for getting to the stuff you were really interested in. Think how much metadata you had to wade through to find a passage about drunkenness in Tocqueville's "Democracy in America" - looking up the book in the library card catalog, writing down its call number, finding it on the shelves, searching for drunkenness in the index, then finally turning to the page.
Now that that kind of information is online, the metadata can seem almost irrelevant. No need for catalogs or indexes - you just enter a query and when the book comes up, you barrel in sideways. That's probably why Google was so careless about metadata when they digitized major library collections for Google Books. Literally millions of books are misdated or misclassified. It's not odd to run into a web browser manual dated 1939 that lists Sigmund Freud as its author; or a copy of "Madame Bovary" attributed to Henry James, and filed under antiques and collectibles. The faulty metadata prompted some grumbles from academics, and Google's been working on fixing it. But it doesn't bother most of the people who use Google Books. They get at its information in other ways.
But metadata gets a lot more respect in other corners of the Google campus, not to mention from its competitors up and down U.S. 101. Their focus is not on information in the abstract, but on collecting specific data about their users. And for that, they need to get the metadata right - who's visited this page; how long did they stay; where did they go next; who did they email, call or text - all so that advertisers can ensure that the seersucker jacket I clicked on yesterday will stalk me to the end of my days.
That's the same kind of metadata the NSA has been trawling. Its defenders maintain that we have to be willing to trade some privacy for some security and right now, we're all arguing about where to put the boundaries. But some advocates of the surveillance have also tried to soft-pedal its intrusiveness. You hear people pronouncing metadata as a soothing incantation, as if your right to privacy ends as soon as you lick and seal the envelope. Sifting through the metadata, the president said, involves just modest encroachments on privacy. James Clapper, the director of national intelligence, compared the programs to combing through a library with millions of volumes and sorting them by their Dewey decimal numbers, without actually opening and reading the books.
But if you're going to compare this to rummaging around in an old-fashioned library, it's more like opening the back covers of all the books, to see whose names are on the borrowers' cards. Whether or not you think the government should be sweeping this stuff up, calling it metadata doesn't make the process any less intrusive. Tell me where you've been and who you've been talking to; and I'll tell you about your politics, your health, your sexual orientation, your finances.
So maybe we should just let the word sink back into the nerdy cubicles it came from. When it comes to privacy, the meta doesn't matter. In the post-information age, it's just data all the way down.
BIANCULLI: Geoff Nunberg is a linguist who teaches at the University of California, Berkeley, School of Information.