Black-Box Algorithms: Ready For Medical Use?

It's hard for humans to check algorithms that computers devise on their own. But these artificial intelligence systems are already moving from the lab toward doctors' offices.

How Can Doctors Be Sure A Self-Taught Computer Is Making The Right Diagnosis?


AUDIE CORNISH, HOST:

We're taking a look at artificial intelligence - its benefits, its limits and the ethical questions it raises in this month's All Tech Considered.

(SOUNDBITE OF MUSIC)

CORNISH: As artificial intelligence becomes more sophisticated, it allows computer programs to perform tasks that, at one time, only people could do, like reading X-rays. Many of these programs are called black-box models; that's because even the scientists who created them do not know how they make decisions. NPR's Richard Harris reports on the promise and the pitfalls of applying AI to medical care.

RICHARD HARRIS, BYLINE: If you want to glimpse the brave new world of artificial intelligence programs that are taking on life-and-death medical judgments, there's no better place than the Stanford University campus.

Hey.

PRANAV RAJPURKAR: Richard?

HARRIS: Yes.

RAJPURKAR: Pranav. Nice to meet you. How's it going?

HARRIS: Nice to meet you. Great.

Pranav Rajpurkar is still a graduate student but clearly a rising star in this world. He's developing computer programs that can learn how to diagnose lung disease. He basically gives his computer algorithm a big pile of data and lets it go to town on its own.

RAJPURKAR: Here's what a chest X-ray looks like, and here's the corresponding diseases in that chest X-ray. And then you just feed it hundreds of thousands of these, and then it starts to be able to automatically learn the pattern from the image itself to the different pathologies.

HARRIS: One weekend, he and his colleagues got a huge download of X-ray data that had just been released by the National Institutes of Health. They set up a machine-learning algorithm and let it run overnight. Lo and behold, by morning, the algorithm had taught itself to diagnose 14 different lung diseases with pretty good accuracy.
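For readers who want that recipe made concrete, here is a minimal sketch of the supervised, multi-label setup Rajpurkar describes. It assumes PyTorch, and the tiny network and random tensors are only stand-ins for the NIH images and their 14 disease labels; none of this is the Stanford team's actual code.

```python
# Minimal sketch of multi-label chest X-ray classification (assumes PyTorch).
# Random tensors stand in for the NIH images and their 14 disease labels.
import torch
import torch.nn as nn

NUM_DISEASES = 14

# A toy stand-in network; real systems use far deeper CNNs.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, NUM_DISEASES),  # one logit per pathology
)

# Each X-ray can show several diseases at once, so every label is an
# independent 0/1 and BCEWithLogitsLoss scores each one separately.
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake batch: 16 single-channel 64x64 "X-rays" with binary disease labels.
images = torch.randn(16, 1, 64, 64)
labels = torch.randint(0, 2, (16, NUM_DISEASES)).float()

for step in range(100):  # "let it run overnight," in miniature
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()

# At read-out time, a sigmoid turns each logit into a per-disease probability.
with torch.no_grad():
    probs = torch.sigmoid(model(images))
```

The detail worth noticing is the multi-label setup: the model emits a separate probability for each of the 14 pathologies rather than picking a single diagnosis.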

RAJPURKAR: And that got me really excited about the opportunities and the ease with which AI is able to do these tests.

HARRIS: Fast-forward to February of this year, and he and his colleagues have already moved far beyond that point. He leads me to a sun-filled room in the William Gates computer building.

RAJPURKAR: This is our lab.

HARRIS: The team is looking at a prototype of a new program that can diagnose tuberculosis in HIV-positive patients in South Africa, a country with a shortage of doctors for that task. They're checking out how well the program actually performs.

UNIDENTIFIED PERSON #1: Can we go through a few of these?

RAJPURKAR: What is your guess on this one?

HARRIS: The computer scientists, including Amir Kiani and medical student Chloe O'Connell, lean into the screen, which shows a chest X-ray. There's also an image that shows where the program is focusing its attention, along with basic lab results.

CHLOE O'CONNELL: Oh, this is a great-looking chest X-ray.

HARRIS: O'Connell doesn't see any white areas in the lung. The algorithm says the patient is unlikely to have tuberculosis, and she agrees. They then click a button to see how the patient was actually diagnosed at the time of the X-ray. No TB.

O'CONNELL: Yay.

UNIDENTIFIED PERSON #2: OK, next case.

O'CONNELL: Hey. How are you?

MATTHEW LUNGREN: How's it going, guys?

HARRIS: Matt Lungren, a Stanford radiologist who's the main medical advisor for this project, comes in. He admits, first off, TB is not his strong suit.

LUNGREN: Usually, I'm exactly the opposite of the truth on TB.

(LAUGHTER)

LUNGREN: We just don't see any TB here, so that's the issue. OK. What do you got?

HARRIS: The film pops up. The algorithm says it's a likely case of TB. Lungren mulls it for a while before deciding to trust the algorithm's finding.

LUNGREN: Yeah, I'm going to go. I'll see.

HARRIS: Someone clicks a button, and the actual diagnosis pops up. Oops - not TB.

LUNGREN: Oh. We were wrong. I don't know. It's like I said, every time, just go the opposite...

(LAUGHTER)

HARRIS: The hospital diagnosis was based on the standard method - a sputum test, not an X-ray. The Stanford algorithm agrees with that call about 75 percent of the time. Rajpurkar says that's not too bad.

RAJPURKAR: And we also know that radiologists we have measured in South Africa get it right 62 percent of the time.

LUNGREN: And I get it right 50 percent.

(LAUGHTER)

HARRIS: Well, you're 0-for-1 right now - just saying.

LUNGREN: Exactly. Thank you for reminding me.

(LAUGHTER)

HARRIS: It's certainly beguiling to think a computer could do this better than a doctor. But will doctors trust an algorithm if they can't see for themselves how it reached its conclusion? John Zech has his doubts. He's training to be a radiologist. We sit down on a coffee shop patio in San Francisco, where he's doing a residency. Zech and his colleagues got interested in this project and dissected some of the pneumonia studies from the Stanford lab.

JOHN ZECH: Going to show you a few examples here.

HARRIS: He pulls up an X-ray on his iPad and notes that the lung has a big white spot, indicative of pneumonia. But the software indicates that it doesn't consider the white spot important in reaching its diagnosis.

ZECH: So if that's the case, like, what is it using?

HARRIS: Zech says sometimes the algorithm homes in on irrelevant information, like the type of X-ray machine. Images from machines used in hospital rooms were much more likely to show pneumonia than those from machines in doctors' offices. That's hardly surprising since, if you have pneumonia, you're much more likely to be in a hospital.

ZECH: It was clear to us that it wasn't just looking for pneumonia in the lung, which is what you'd like such a model to do. It was - you know, it was being a good machine-learning model, and it was aggressively using all available information baked into the image to make its recommendations.
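A toy version of that shortcut is easy to reproduce. The sketch below uses scikit-learn and fabricated data; "portable_scanner" is an invented confounder standing in for the machine-type cue Zech found, and none of the numbers come from the published studies.

```python
# Toy demonstration of shortcut learning, with fabricated data.
# Sicker, hospitalized patients tend to be imaged on portable machines,
# so a "which scanner?" flag predicts pneumonia without seeing the lungs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

has_pneumonia = rng.random(n) < 0.3
# Hypothetical confounder: pneumonia patients are usually inpatients.
portable_scanner = np.where(has_pneumonia,
                            rng.random(n) < 0.9,
                            rng.random(n) < 0.1)
# A weak "real" imaging feature, only loosely related to disease.
lung_opacity = has_pneumonia * 0.2 + rng.normal(0, 1, n)

X = np.column_stack([portable_scanner, lung_opacity])
clf = LogisticRegression().fit(X, has_pneumonia)

# Nearly all the weight lands on the scanner flag, not the image feature.
print(dict(zip(["portable_scanner", "lung_opacity"], clf.coef_[0])))
```

The fitted model leans almost entirely on the scanner flag, which is exactly the "aggressively using all available information" behavior Zech describes.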

HARRIS: To put it bluntly, it was cheating. The algorithm also doesn't have access to a lot of the information real-life doctors use when making tricky diagnoses, such as a patient's medical history, which Zech digs into to sort out difficult cases.

ZECH: Medical diagnosis is hard. There's a lot of room. I want this technology to try to help me make those decisions.

HARRIS: Despite hype that this technology is just around the corner, Zech expects it will be a long time before a black-box algorithm can replace these human judgments.

CYNTHIA RUDIN: It's not clear that you really need a black box for any of it.

HARRIS: Cynthia Rudin is a computer scientist at Duke University who is a bit worried about where the field is heading at the moment.

RUDIN: I've worked on so many different predictive modeling problems, and I've never seen a high-stakes decision where you couldn't come up with an equally accurate model with something that's transparent, something that's interpretable.
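As one illustration of what "transparent" can mean in practice, here is an invented integer point score in the style of hand-auditable clinical risk tools. The features and weights are made up for this sketch; they are not drawn from Rudin's work or from any validated instrument.

```python
# Sketch of a transparent, hand-auditable model: an integer point score.
# All features and weights here are invented for illustration only.
def tb_risk_points(cough_weeks: int, weight_loss: bool,
                   night_sweats: bool, hiv_positive: bool) -> int:
    score = 0
    score += 2 if cough_weeks >= 2 else 0
    score += 1 if weight_loss else 0
    score += 1 if night_sweats else 0
    score += 2 if hiv_positive else 0
    return score  # e.g., refer for sputum testing if score >= 3

print(tb_risk_points(cough_weeks=3, weight_loss=True,
                     night_sweats=False, hiv_positive=True))  # -> 5
```

A clinician can check every step of a rule like this by eye, which is the kind of scrutiny a black box doesn't allow.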

HARRIS: Is it just that people are in love with this technology of black boxes, or are there other reasons why they want to employ them?

RUDIN: It's both.

HARRIS: It is just plain cool that you can give a bunch of data to a computer and it can train itself. It's also the case that it's easier to make a proprietary, commercially valuable product if it uses some sort of secret sauce nobody knows how to replicate.

RUDIN: And also, the black-box modeling software is much easier to use.

HARRIS: But Rudin says, especially for medical decisions that could have life-or-death consequences, it's worth putting in the extra time and effort to build a program from the ground up, based on real clinical knowledge, so humans can decide whether or not to trust it. Pranav Rajpurkar and his colleagues at Stanford are acutely aware of this issue with black-box algorithms.

RAJPURKAR: The first thing I think about is not about convincing others but about convincing myself that this is, in fact, going to be useful for patients, and that's a question we think about every day and try to tackle every day.

HARRIS: One approach they've taken is to add features so the algorithm not only comes up with an answer but also says how confident its human overlords should be in that result.

RAJPURKAR: In some way, a humble algorithm, but more importantly, an algorithm that's conscious of what it knows and what it doesn't know.
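One simple way to express that humility in code, sketched here with invented thresholds (the story doesn't detail the Stanford team's actual method), is to report a probability and explicitly defer to a human in the gray zone.

```python
# Sketch of a "humble" classifier read-out. The 0.2/0.8 thresholds
# are invented; the Stanford team's actual method isn't described here.
def triage(prob_tb: float) -> str:
    if prob_tb >= 0.8:
        return f"likely TB (confidence {prob_tb:.0%})"
    if prob_tb <= 0.2:
        return f"unlikely TB (confidence {1 - prob_tb:.0%})"
    return f"uncertain (p={prob_tb:.2f}), refer to a radiologist"

for p in (0.05, 0.55, 0.93):
    print(triage(p))
```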

HARRIS: Perhaps most important - his team is not making a commercial product in secret. Instead, they're freely sharing their software and results so others can pick them apart and help the whole field move forward with more confidence. Richard Harris, NPR News.

(SOUNDBITE OF ANDREAS VOLLENWEIDER'S "STELLA")

Copyright © 2019 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.

NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.