Following Digital Breadcrumbs To 'Big Data' Gold

First of a two-part report

What do Facebook, Groupon and biotech firm Human Genome Sciences have in common? They all rely on massive amounts of data to design their products. Terabytes and even petabytes of information about consumers or about genetic sequences can be harnessed and crunched.

The practice is called big data, and as the term suggests, it is huge in both scope and power. Analyzing big data enables anything from predicting prices to catching criminals, and has the potential to impact many industries.

Because terabytes of information can now be processed cheaply, companies can run analyses that weren't possible before, such as Decide.com's price prediction tool, which exposes the surprising volatility of consumer electronics prices.

Courtesy Decide.com

One way to understand how big data works is to think about your daily life. You write an email, call your boss, pass a security camera, maybe buy a plane ticket online. Taken alone, this is disjointed, boring information. To Elizabeth Charnock, it makes up your digital character.

"Digital character is this idea that almost everybody these days leaves behind a giant digital breadcrumb trail," she says.

Charnock founded Cataphora, a company that processes huge amounts of this sort of data about employees to find patterns. She says those patterns can predict everything from a person's mood to their skill as a manager to their inclination to commit fraud.

Take rogue trader Jerome Kerviel, who cost his French bank billions of dollars in losses.

"His cellphone bill was literally an order of magnitude larger than any of his coworkers — why?" Charnock asks. "Well, because he wanted to put less things in writing. He almost never took vacation, even though French people love to take vacation."

Charnock says Kerviel also circumvented usual trading and communication protocols.

How One Company Harnesses Big Data

If you have your eye on that brand new camera with all of the features you never imagined you'd need, how do you know whether it's time to buy? Decide.com is a prediction tool that tells you when gadget prices are likely to rise, stay the same or fall — and if you should wait a few weeks to buy the rumored newer model.

Decide collects the prices of more than 100,000 electronic products every day from hundreds of online retailers. It also searches technology blogs for rumors of upcoming new releases, adding up to over 25 GB of data per day.

25 gigabytes of data is like reading 150,000 printed books' worth of information.

The data are sent to Amazon's cloud storage and processed with the help of Hadoop, open-source software used in many big-data projects to spread the processing across many machines and to keep working when individual machines fail.

Decide's four computer science Ph.D.s create algorithms to mine the data and predict whether prices will go up or down, much as the finance industry has done for years to forecast stock prices. But now that data storage has become so cheap, other businesses can get in the game.

[Chart: cost of a gigabyte of storage over the years]

In addition to crunching the numbers, Decide analyzes thousands of news articles, blog posts and press releases to see whether any rumors have surfaced. Sources that were reliable in the past are listened to more closely.
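One way to "listen more closely" to sources that were reliable in the past is to weight each rumor by its source's track record. The Python sketch below is only an illustration of that idea, not a description of Decide's actual algorithm. The source names, the track records and the smoothing rule are all hypothetical.

```python
# Hypothetical track records: (rumors that proved correct, total rumors seen).
track_record = {
    "blog_a": (18, 20),
    "blog_b": (5, 20),
    "press_wire": (9, 10),
}


def reliability(source):
    """Smoothed hit rate; a source with no history starts at 0.5 (assumed rule)."""
    correct, total = track_record.get(source, (0, 0))
    return (correct + 1) / (total + 2)


def rumor_score(reports):
    """Reliability-weighted share of reports claiming a new model is coming.

    Each report is a (source, claims_new_model) pair.
    """
    weights = [reliability(source) for source, _ in reports]
    if not weights:
        return 0.5  # no evidence either way
    yes = sum(w for w, (_, claim) in zip(weights, reports) if claim)
    return yes / sum(weights)


reports = [("blog_a", True), ("blog_b", False), ("press_wire", True)]
score = rumor_score(reports)
print(f"Weighted rumor score: {score:.2f}")
```

Here the two sources with strong records outweigh the one with a poor record, so the score lands well above one half. A real system would also have to decide how quickly old hits and misses should stop counting.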

All in all, Decide analyzes nearly 100 terabytes of data to arrive at a simple prediction: Should you buy now or wait for prices to drop?

100 terabytes of information is equal to 18 times the printed collection of the Library of Congress.

—Sara Carothers, Stephanie d'Otreppe/NPR

"Any one of those things, you kind of say, 'So what?' But what we look for is a number of them that on the surface perhaps don't seem to be related but all seem to be happening at the same time," Charnock says of Kerviel's behavior.

Had the French bank analyzed that data, she says, it might have flagged the rogue trader earlier.
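The idea of watching for several weak signals that coincide can be sketched in a few lines of Python. This is an illustration only, not Cataphora's method. The employee names, the figures and every threshold (a phone bill ten times the peer median, under a quarter of the peer median vacation, at least two simultaneous flags) are assumptions invented for the example.

```python
from statistics import median

# Hypothetical per-employee signals; all names and numbers are made up.
employees = {
    "trader_a": {"phone_bill": 120.0, "vacation_days": 22, "protocol_bypasses": 0},
    "trader_b": {"phone_bill": 95.0, "vacation_days": 25, "protocol_bypasses": 1},
    "trader_c": {"phone_bill": 110.0, "vacation_days": 18, "protocol_bypasses": 0},
    "trader_d": {"phone_bill": 1400.0, "vacation_days": 1, "protocol_bypasses": 9},
}


def flags_for(name, data):
    """List the signals on which one employee stands out from their peers."""
    peers = [signals for other, signals in data.items() if other != name]
    me = data[name]
    flags = []
    # Assumed rule: phone bill at least ten times the peer median.
    if me["phone_bill"] >= 10 * median(p["phone_bill"] for p in peers):
        flags.append("phone_bill")
    # Assumed rule: less than a quarter of the peer median vacation.
    if me["vacation_days"] < 0.25 * median(p["vacation_days"] for p in peers):
        flags.append("vacation_days")
    # Assumed rule: clearly more protocol bypasses than any peer.
    if me["protocol_bypasses"] > 3 + max(p["protocol_bypasses"] for p in peers):
        flags.append("protocol_bypasses")
    return flags


def review_list(data, min_flags=2):
    """Keep only employees with several flags raised at the same time."""
    result = {}
    for name in data:
        flags = flags_for(name, data)
        if len(flags) >= min_flags:
            result[name] = flags
    return result


print(review_list(employees))
```

Requiring at least two flags reflects the point in the quote: any single oddity earns a "So what?", and only the combination is worth a human reviewer's attention.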

But big data is not just about connecting dots to detect crime. The ability to process so much information so quickly makes all kinds of things possible that weren't before. LinkedIn, for example, can suggest jobs or people you might like to know about, and biotech companies can analyze gene sequences in billions of combinations to design drugs.

Data analytics itself is not new. Two decades ago, Wall Street hired teams of physicists to analyze investments. But in the past couple of years, computing, storage and bandwidth capacity have become so cheap that the scale of what's possible has changed.

Now, with very little money, a gifted student or a small startup can design big-data applications.

"Everywhere you look, there's an opportunity to collect more data and then apply a statistical or mathematical approach to understanding what's happening," says Chris Kemp, chief executive officer of Nebula, a firm that provides the storage and computing capacity other companies use to run their big-data applications.

Kemp says big data will ultimately give consumers better tools for predicting prices, such as whether an airfare is likely to go up or down. Farmers, he says, could do a better job of insuring their crops if they could forecast the weather with greater accuracy.

Oren Etzioni, a professor of computer science at the University of Washington, says this trend is fueling intense demand for mathematics and computing talent.

"We have seen the industrial revolution, and we are witnessing a data revolution," Etzioni says.

He has started three big-data companies. One of them, Decide.com, employs four Ph.D.s to design better programs to forecast prices on consumer electronics.

Etzioni says a good data scientist can write algorithms that filter data, understand what they're telling you, and then graphically represent the information. The end result is like getting a bird's-eye view of a vast territory of information.

Big data can, and occasionally does, go wrong. Comic examples of that include mismatched recommendations, like "My TiVo thinks I'm gay." But think about a company divulging your Web-surfing history with your name attached, and you begin to get a sense of how big data opens the door to new possibilities of security or privacy breaches.

James Slavet, a venture capitalist at Greylock Partners, says his firm invests in companies that use big data creatively and responsibly. He says data cannot stand in for human judgment.

"They do use it to make the judgment more sound, more objective and to hopefully lead to better decision-making," he says.

Slavet calls big data a tectonic shift, one that will continue to affect many things we do for decades to come.
