KELLY MCEVERS, HOST:
We are in a time of big data. In recent years, NPR's done stories about how data analytics are being used to help political campaigns, rally supporters, compare the cost of similar surgeries in different cities, track public buses in real time and even maybe identify police officers at risk of committing misconduct. But the question is, are we putting too much faith in big data? That's the question we're asking in this week's All Tech Considered.
(SOUNDBITE OF MUSIC)
MCEVERS: In her new book, mathematician Cathy O'Neil says we are in a techno utopia. And she does not mean that in a good way. Her book is called "Weapons Of Math Destruction: How Big Data Increases Inequality And Threatens Democracy." And she is with us now. Welcome to the show.
CATHY O'NEIL: Honored to be here, Kelly.
MCEVERS: So tell us what you mean by techno utopia.
O'NEIL: Well, techno utopia is this idea that the machine-learning tools, the algorithms, the things that help Google, like, have cars that drive themselves, that these tools are somehow making things objective and fair when, in fact, we really have no idea what's happening to most algorithms under the hood.
MCEVERS: So it sounds like when you're saying, you know, we have these algorithms, but we don't know exactly what they are under the hood, there's this sense that they're inherently unbiased. But what you're saying is that there's all kinds of room for biases.
O'NEIL: Yeah, for example, like, if you imagine, you know, an engineering firm that decided to build a new hiring process for engineers and they say, OK, it's based on historical data that we have on what engineers we've hired in the past and how they've done and whether they've been successful, then you might imagine that the algorithm would exclude women, for example. And the algorithm might do the right thing by excluding women if it's only told just to do what we have done historically. The problem is that when people trust things blindly and when they just apply them blindly, they don't think about cause and effect.
They don't say, oh, I wonder why this algorithm is excluding women, which would go back to the question of, I wonder why women haven't been successful at our firm before? So in some sense, it's really not the algorithm's fault at all. It's, in a large way, the way we apply algorithms and the way we trust them that is the problem.
MCEVERS: Your book has a lot of examples where big data has not lived up to its promise. I was wondering if you could give one example where this happened and, in fact, actually made things even worse?
O'NEIL: Yeah, well, so everybody knows about the sort of decades-long attempt to improve public education in the United States. It goes by various names like No Child Left Behind, you know, Race to the Top. But at the end of the day, what they've decided to do in a large part is to sort of remove these terrible teachers that we keep hearing about. And the way they try to find these terrible teachers is through something called the growth model. And the growth model, mathematically speaking, is pretty weak and has had, like, lots of unintended consequences.
When I say weak, I interviewed a teacher from New York City public schools named Tim Clifford. He's been teaching for 20 years, multiple awards, he's written quite a few books. He got a 6 out of 100 one year and then a 96 out of 100 the next year. And he says his techniques didn't change. So it's very inconsistent. It's not clear what this number is actually scoring in terms of teachers and the teaching ability. I interviewed a woman named Sarah Wysocki in the D.C. area who actually got fired because of her low growth model score.
MCEVERS: There must be other examples, though, where people, you know, good teachers got good scores.
O'NEIL: Yeah, I mean, there certainly are, but I would say it's relatively close to a random number generator. So the fact that some good teachers got good scores doesn't say enough. I guess the point is that you might have some statistical information when you hear a score, but it's not accurate enough to actually decide on whether a given teacher, an individual teacher, is doing a good job. But it's treated as such because people just trust numbers, they trust scores.
MCEVERS: When you think about the kinds of problems that people are trying to solve with big data going forward, what are some of the areas where you think, yeah, just don't use data to do that one? That one's too complicated.
O'NEIL: It's such a massive field, like, you absolutely need to perform triage. So I really - I very, very carefully defined the kinds of algorithms that I worry about. And they have three characteristics. The first is that they're high-impact, they affect a lot of people. It's widespread and it's an important decision that the scoring pertains to, so like a job or going to jail, something that's important to people. So it's high-impact. The second one is that the things that worry me the most are opaque. Either that means that the people who get the scores don't understand how they're computed or sometimes that means that they don't even know they're getting scored.
Like if you're online, you don't even know you're scored, but you are. And the third characteristic of things that I care about, which I call weapons of math destruction, the third characteristic is that they are actually destructive, that they actually can really screw up somebody's life. And most of the time, these algorithms are created with, like, good intentions in mind. But this destructiveness typically undermines that good intention and actually creates a destructive feedback loop.
MCEVERS: Flagging you as a potential criminal to police or flagging you as some sort of person in a potential hiring position.
O'NEIL: Exactly. If you just imagine, like, something that is pretty well-known is that credit scores are being used to deny people jobs. And that actually creates worse credit scores. You know, an individual who doesn't get a job because they have a bad credit score goes on to have even worse credit scores. States are trying to prevent that from happening on a state-by-state basis. But what we have now in the age of big data is something called electronic credit scores, E-scores, that a lot of people don't even know are being made about them.
They're not illegal, they're not regulated. And they could lead to the same kind of drastic and pernicious feedback loops.
MCEVERS: You know, we've talked a lot about the problems and the dangers of big data. I want to know some of the things that you think it could be used for in a good way, some of the kinds of problems it can tackle. I mean, this is a powerful thing. Are there smart ways we can use it?
O'NEIL: So in the book, I talk a lot about predictive policing and recidivism risk scoring, which are two different kinds of algorithms that are currently being used to decide who to target, what kind of neighborhoods to target by the police and then, like, whether criminal defendants are high-risk and whether they should go to jail for longer. What I don't see happening, which I wish were happening with respect to the justice system, is a kind of audit of the actual entire process overall. We know a lot about how when people are held in solitary confinement, it's probably not good for them.
So why don't we use data to improve the entire system, including, like, knowing those kinds of attributes? How much time did this person spend in solitary confinement? What were the actual conditions? Was the GED offered as a - you know, what kind of facilities did they have? You know, I'd like to see the Amazon Workflow big data audit system be applied to the justice system. I think a lot could be learned and a lot could be improved.
MCEVERS: That's Cathy O'Neil. She's the author of "Weapons Of Math Destruction: How Big Data Increases Inequality And Threatens Democracy." Thank you so much.
O'NEIL: Thank you so much.
NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.