Author Interview: Kenneth Cukier, Co-Author Of 'Big Data'
Companies and governments have access to an unprecedented amount of digital information, much of it personal: what we buy, what we search for, what we read online. Kenneth Cukier, co-author of the book Big Data, describes how data-crunching is becoming the new norm.
The 'Big Data' Revolution: How Number Crunchers Can Predict Our Lives
When the streaming video service Netflix decided to begin producing its own TV content, it chose House of Cards as its first big project. Based on a BBC series, the show stars Kevin Spacey and is directed by David Fincher, and it has quickly become the most-watched series ever on Netflix.
The success of House of Cards is no accident. Netflix executives knew exactly what their millions of customers were watching; they knew precisely how popular the works of Fincher were, and how many of their customers were fans of Kevin Spacey, and how many people were streaming the British House of Cards. Sifting through that mountain of data, Netflix executives were able to predict that House of Cards would be just what Netflix viewers would want to watch.
That kind of decision-making is an example of Big Data: the decade-long explosion of digital information, much of it personal, that has become available to companies and governments. This trend in predictions and decisions is the topic of a new book, Big Data: A Revolution That Will Transform How We Live, Work and Think.
One of the book's authors, Kenneth Cukier, joins NPR's Steve Inskeep to talk about how Big Data helps Target detect pregnancies, the police track potential criminals — and has even changed the way he talks to his kids.
On how Target identifies pregnant customers
"The example comes from Charles Duhigg, who's a reporter at The New York Times, and he's the one who uncovered the story. What Target was doing was they were trying to find out what customers were likely to be pregnant or not. So what they were able to do was to look at all the different things that couples were buying prior to the pregnancy — such as vitamins at one point, unscented lotion at another point, lots of hand towels at another point — and with that, make a prediction, score the likelihood that this person was pregnant, so that they could then send coupons to the people involved... there might be a coupon for a stroller or for diapers ...
"There was an example of a father coming in to a store and complaining that the teenage daughter was receiving fliers in the mail for coupons for baby products. And he said, 'What are you trying to do? Trying to get my teenage daughter pregnant?' And of course the way it ends is that he comes back later and apologizes, and says, 'It turns out there were things in my house that I wasn't aware of.'
"But the fact is, this is the sort of universe that we are now going to be in — we're all going to be in — because of Big Data. And all stores will be doing this, and all governments will be doing this. Your doctor will do this. Your employer will do this. This is the new norm."
On why Big Data doesn't care about causes, just correlation
"They crunched the numbers, and they found out that cars that were orange tended to not have breakdowns compared to other colors of cars ... So why might this be? Well, we can sort of concoct different scenarios. One is that orange tends to be a custom color, and if you order an orange car, perhaps the rest of the car was made in a custom way, a little bit more care was taken into it. We don't know why, and it's frankly, it's not that important. It might just bring us down a rabbit hole for us to try to find out why. But, again, if you just want to buy a car that's not going to break down, go with the correlation."
"Google stores all of its searches. What they were able to do was go through the database of previous searches to identify what was the likely predictor that there was going to be a flu outbreak in certain regions of America. Now, keep in mind, we pay for the [Centers for Disease Control and Prevention] to look at the United States and find out where flu outbreaks are taking place for the seasonal flu. But the difference is that it takes the CDC about two weeks to report the data. Google does it in real time simply on search queries."
On when Big Data crosses the line
"It goes too far when we start making predictions for things that we have not yet done but we have the propensity to do — for example, commit a crime. If Big Data correlations identify me as a 44-year-old male who's a journalist and who has grand eyes for things I can't afford, it may think that I'm going to be susceptible to embezzlement, and maybe I will get a knock on the door by the police, who say, 'We have reason to believe that you're about to commit a crime.' This is sort of like pre-crime in [the film] Minority Report.
"... Now, will that be the case, will we go down that route? Frankly, we already are. There's a whole branch of criminology called algorithmic criminology, and a dimension called predictive policing ... police forces in many cities in America are crunching the numbers and looking at where the likelihood of a crime is going to be, and when, based on the past patterns of crime. And now we can say not just that a crime is going to exist in an area, but that these people have a, say, 80 percent likelihood to be a felon.
"We have judicial systems that presume you have acted and therefore we are going to penalize. We've never had a system whereby we're making predictions about your likelihood, your propensity to do something, before you've actually acted, and therefore we're going to take remedial steps against you."
On how Big Data has changed how he talks to his kids ...
"I think about what it means to educate my children. I have a 5-year-old and a 2-year-old, and I talk to them about ... thinking in terms of how to understand the world and act in the world with imperfect information. This is going to sound very strange, but I ask my child questions ... which I know she doesn't have enough information to answer. And I absolutely reward her when she takes an educated guess, when she makes a decision based on imperfect information. And I find ... a way of explaining to her that that's great. You're living in a world in which you're never going to have enough information, but you're going to have to come to answers and conclusions and make decisions based on it. So it's really important that you take in as much information and come up, using your judgment and wisdom ... come up with a decision based on that."
... and how his daughter responds
"She says that she's writing a book — it's called Big Data for Ponies."