GUY RAZ, HOST:
It's the TED Radio Hour from NPR. I'm Guy Raz. So you probably don't think about this too much, but data is everywhere.
SUSAN ETLINGER: Without even knowing it, we've sort of crept into this era where data all of a sudden is ambient. It's everywhere we go.
RAZ: This is data analyst Susan Etlinger.
ETLINGER: So as we walk around every single day, our locations are tracked. The apps that we use are being tracked. If you are on a website, anything that you do on that website is likely being tracked. If we walk by surveillance cameras, if we get in a car that has a GPS, you have any kind of medical device - run a red light, there's a camera, so data's everywhere.
ETLINGER: Some people call this digital exhaust, you know, the idea that you're sort of walking around and these sort of particles of data are surrounding you all the time.
RAZ: Just coming out of you, yeah.
ETLINGER: That's right, everywhere we go. And I don't think it's possible for any individual person to truly understand everything that they are creating at any given time.
RAZ: And, sure, we all kind of know this is going on. But increasingly, we don't have to understand it. As we'll explore this episode, technology is doing that for us.
RICCARDO SABATINI: So in 10 years, the amount of information that you will have of yourself will be incomputable by a human being.
RAZ: Which means we're learning more about our world...
ANDREW CONNOLLY: These data will let us understand how we form.
RAZ: ...And beyond.
CONNOLLY: ...Our solar system formed, to understand our place in the universe.
RAZ: But the trick is knowing the difference between a lot of data and too much information.
KENNETH CUKIER: Data doesn't exist, OK? Information exists, and it's ephemeral. And when we capture it, it's data.
RAZ: So this episode - Ideas About Big Data, taking huge amounts of information and making sense of it.
ETLINGER: What's great about it is when we think about the ability, for example, to sequence the human genome or to look at the universe or the ways in which we could use data to look at epidemics and the spread of disease or even the spread of ideas, you know, that can make a tremendous difference. And so I do think we have this wonderful kind of tool at our fingertips. We just have to be, I think, a little bit careful with it.
RAZ: Susan Etlinger returns later to describe why exactly we need to be careful. But first, let's just say big data is like - is like a super boring term, right? Like, I don't even know if people are going to, like, download this episode.
CUKIER: It is the most vibrant thing happening in the world today.
RAZ: This is Kenneth Cukier. He's a senior editor at The Economist and co-author of the book "Big Data," where he wrote about how all that data flying off of us all the time, that kind of digital exhaust, is changing the way we live.
CUKIER: If every aspect of living gets this shadow to it, this veneer of data, suddenly we can learn new things that we never could before. And I see it as part of this sort of timeless march that we've been on of improving our society by applying our reason to it and our technical skills to it.
RAZ: And how our technical skills - they're really the skills we're building into our computers to process huge amounts of data in a way the human brain never could. Kenneth explained all this on the TED stage.
CUKIER: So what is the value of big data? Well, think about it. You have more information. You can do things that you couldn't do before.
One of the most impressive areas where this concept has taken place is in the area of machine learning, OK? Machine learning is a branch of artificial intelligence which itself is a branch of computer science. The general idea is that instead of instructing a computer what to do, we are going to simply throw data at the problem and tell the computer to figure it out for itself. And it'll help you understand it by seeing its origins. OK.
In the 1950s, a computer scientist at IBM named Arthur Samuel liked to play checkers. So he wrote a computer program so he could play against the computer. He played. He won. He played. He won, because the computer only knew what a legal move was.
So he wrote a small subprogram alongside it operating in the background, and all it did was score the probability that a given board configuration would likely lead to a winning board versus a losing board after every move. And then Arthur Samuel leaves the computer to play itself. It plays itself, it collects more data. It collects more data, it increases the accuracy of its prediction. And then Arthur Samuel goes back to the computer and he plays it, and he loses, and he plays it, and he loses, and he plays it, and he loses. And Arthur Samuel has created a machine that surpasses his ability in a task that he taught it.
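The self-play loop described here can be sketched in a few lines of code. This is only an illustration, not Samuel's actual program: it swaps checkers for a much smaller stand-in game (players alternately take 1 to 3 stones from a pile, and whoever takes the last stone wins), and the names `legal_moves`, `self_play`, and `score`, along with the parameters `games`, `start`, `explore`, and `seed`, are all hypothetical. What it keeps is the idea in the talk: the program begins knowing only the legal moves, tallies how often each position it leaves behind ends in a win, and uses those tallies as a rough win probability when choosing its next move.

```python
import random
from collections import defaultdict


def legal_moves(pile):
    """The only rule knowledge the program starts with: take 1 to 3 stones."""
    return [n for n in (1, 2, 3) if n <= pile]


def self_play(games=20000, start=10, explore=0.2, seed=0):
    """Play the game against itself and return a learned scoring function."""
    rng = random.Random(seed)
    wins = defaultdict(int)    # position -> games won by the player who left it
    visits = defaultdict(int)  # position -> times that position was left

    def score(pile):
        # Estimated probability that leaving `pile` stones leads to a win.
        # Positions never seen before get a neutral 0.5.
        return wins[pile] / visits[pile] if visits[pile] else 0.5

    for _ in range(games):
        pile, player = start, 0
        history = []  # (player, stones left after that player's move)
        while pile > 0:
            moves = legal_moves(pile)
            if rng.random() < explore:
                take = rng.choice(moves)  # occasionally try something new
            else:
                take = max(moves, key=lambda n: score(pile - n))
            pile -= take
            history.append((player, pile))
            player = 1 - player
        winner = history[-1][0]  # whoever took the last stone
        for mover, left in history:
            visits[left] += 1
            wins[left] += 1 if mover == winner else 0
    return score


if __name__ == "__main__":
    score = self_play()
    for pile in range(0, 9):
        print(pile, round(score(pile), 2))
```

The more games it plays against itself, the more data it collects and the sharper its estimates become, which is the same feedback loop the talk attributes to the checkers program.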
And this idea of machine learning is going everywhere. How do you think we have self-driving cars? We changed the nature of the problem from one in which we tried to overtly and explicitly explain to the computer how to drive to one in which we say here's a lot of data around the vehicle, you figure it out. You figure it out. That is a traffic light, that that traffic light is red and not green, that that means that you need to stop and not go forward.
Big data is going to transform how we live, how we work and how we think.
RAZ: OK, so, like, 50 years from now, what's something else that's going to improve our lives because of big data, like - you know, like something in our everyday routine?
CUKIER: So an example will be I'll have a toilet, or I'll have a faucet that will be running water for me to wash my hands. When I do that, I'll probably have a sensor that will be taking a look at my cell follicles that come down. And it'll be analyzing my biochemistry. The toilet might turn out to be sort of the centerpiece of the home in terms of health care because you could actually monitor a stool sample on a daily basis.
If we did it every day, we might learn something about the progression of disease that we didn't know before. And so where today that information doesn't help us because we can't spot the signal that predicts a pancreatic cancer two years out until symptoms exist, now we will be able to spot it because we'll actually have learned something new. We'll have done something at a different scale, in this case, analyzing people and their health.
RAZ: That's amazing because that's just one small example. Right? So - I mean, are we talking about changes in the future on, like, a scale of the Industrial Age or the Information Age?
CUKIER: OK. So it's a great question. The printing press was, in some ways, the first big data revolution because the time to produce a book and the cost of a book, you know, just fell through the floor. Immediately - the very first thing we started doing was we were printing Bibles, the same things that the scribes were writing. But it just cost a lot less. So we had more Bibles around. But when we had more Bibles in circulation, we had more people who could read it and more people who wanted to read the Bible. And therefore, we had a greater thirst for literacy. And there was a kind of a movement towards mass literacy.
Soon thereafter, it wasn't about Bibles. It was the production of other works, new works, that we couldn't have ever even imagined. So the idea is that when we increase the amount of something - and here, the amount of printed works - we didn't just replicate what we were already doing and get the efficiency gain of lowering the price, increasing the volume. There was a state change. Likewise, I think, that we're in the very early stages of this same sort of revolution. You could consider it, if you will, the great Age of Discovery of machine learning and of big data. We're going to do entirely new things because of it.
CUKIER: Now there are dark sides to big data as well. It will improve our lives, but there are problems that we need to be conscious of. And the first one is the idea that we may be punished for predictions - that the police may use big data for their purposes, a little bit like "Minority Report." Now, it's a term called predictive policing, or algorithmic criminology. And the idea is that if we take a lot of data, for example, where past crimes have been, we know where to send the patrols. That makes sense.
But the problem, of course, is that it's not simply going to stop on location data. It's going to go down to the level of the individual. Why don't we use data about the person's high school transcript? Maybe we should use the fact they're unemployed or not, their credit score, their web-surfing behavior, whether they're up late at night, their Fitbit, when it's able to identify biochemistries, will show that they have aggressive thoughts.
We may have algorithms that are likely to predict what we are about to do, and we may be held accountable before we've actually acted. Privacy was the central challenge in a small-data era. In the big data age, the challenge will be safeguarding free will, moral choice, human volition, human agency.
RAZ: So Kenneth, you're talking about things like my credit score and my sleep habits. And, I mean, all this is data that's being collected from me - from everyone. So I mean, is it even possible to opt out of any of this?
CUKIER: I don't think you can. You could say, well, maybe this is similar to opting out of the internet. Well, you can do that. It's hard to live in the 20th century if you do, but it's possible. But I believe that big data, if you will, it's sort of like saying I want to opt out of the right angle or I want to opt out of mathematics. Life's already a complicated place. If you're going to be nervous about all of these features of living that are just going to be underneath the surface as the fabric of how things happen, life is going to become paralyzing.
And I think we're in a transition, so I can appreciate why people are a bit nervous to it. But at the end of the day, if they wheel you on a gurney into the emergency ward and you have a choice - you can have the medical system of the Middle Ages, or you can have the medical system of the 21st century - you don't even have to know or care how this stuff works. You're probably just going to say yeah, give me the anesthesia and save my life because big data is going to transform so many features of our lives that we're all going to simply accept the modern because the outcomes, usually, will be far, far better than they ever could have been if we didn't.
RAZ: I just wonder if big data is a gentler version of Big Brother.
CUKIER: The true answer is maybe. The Snowden affair should give everyone pause. I mean, it was sad that it became politicized because what this fellow was saying was not that we were living in a turnkey totalitarian state but that we were laying the infrastructure for this to happen. Because if we're going to accept big data and all the benefits that we can use it for, we need limitations so that we can preserve our fundamental freedoms. And if we don't have that, then these technologies can absolutely be used to the detriment of human beings, and we can't let that happen. So to cap it, I completely respect the idea that big data seems like a repackaging of Big Brother. But we can certainly go beyond that simple dualism to deal with the real, substantive problems that we have.
RAZ: You think that the benefits of this are so incredible that they just dwarf any downsides.
CUKIER: No, I think that the benefits are just so incredible that we absolutely must address these downsides, or we can't unlock these benefits. And we would be a stupid society if we didn't get these benefits.
RAZ: Kenneth Cukier is co-author of the book "Big Data." You can see his entire talk at TED.com. Back in a moment with more Ideas About Big Data. I'm Guy Raz, and you're listening to the TED Radio Hour from NPR.
(SOUNDBITE OF MUSIC)
NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.