Big Data Projects Surpass Biomedical Scientists' Ability To Analyze Them : Shots - Health News

There's a plethora of projects to gather data about the brain, various kinds of cancer and every type of cell in the body. But researchers are struggling to keep up with the information explosion.

Big Data Coming In Faster Than Biomedical Researchers Can Process It

ARI SHAPIRO, HOST:

In medical research, small-scale projects are giving way to huge efforts that rely on big data. The most famous of these is the Human Genome Project about 15 years ago. Now there are many others, and they share a common problem. As NPR's Richard Harris reports, scientists are gathering mountains of data far more quickly than they're able to make sense of it.

RICHARD HARRIS, BYLINE: Advertisers and retail chains collect a vast amount of data from their customers. And they've learned to squeeze a lot of commercially valuable information from it. So naturally, scientists would like to use the same approach to revolutionize health and medicine. Francis Collins, head of the National Institutes of Health, recently ticked off a long list of data-gathering efforts that followed in the wake of the Human Genome Project, like the Cancer Moonshot and the BRAIN initiatives.

(SOUNDBITE OF ARCHIVED RECORDING)

FRANCIS COLLINS: We have the precision medicine initiative, which aims to figure out, by enrolling a million Americans, what really are the factors that are involved in health and disease?

HARRIS: And doctors coast to coast have spent billions to make their medical records digital so they can be mined for hints about how to improve wellness and conquer disease. Collins led a conversation about the issue at a meeting of advocates called Partnering for Cures, in New York. Atul Butte from UC San Francisco told the audience it's an embarrassment of riches in more ways than one.

(SOUNDBITE OF ARCHIVED RECORDING)

ATUL BUTTE: It's not just that any one data repository is growing exponentially. The number of repositories is growing exponentially.

HARRIS: Spending is now heading into the hundreds of billions of dollars, he said. But what's not growing is scientists' ability to make sense of that avalanche of data.

(SOUNDBITE OF ARCHIVED RECORDING)

BUTTE: As a country, I think we're investing close to zero analyzing any of that data.

HARRIS: And mining it for hints about health and disease isn't nearly as easy as, say, having Google figure out what you like and what ads to serve up for you. The raw data are not very robust and reliable. Electronic medical records can be tricky to work with and likely to have errors. And Greg Simon, who runs the Cancer Moonshot initiative, says data collected from scientific studies aren't trustworthy either.

(SOUNDBITE OF ARCHIVED RECORDING)

GREG SIMON: So many articles that are published today are going to be wrong in 10 years. That's just the history of scientific research. And the question is you just don't know which ones are going to be wrong.

HARRIS: So scientists trying to figure out how to analyze that flood of big data are going to have to cut through the dissonance to find a melody. Robert Califf, commissioner of the Food and Drug Administration, says that's no mean feat.

(SOUNDBITE OF ARCHIVED RECORDING)

ROBERT CALIFF: In a world where anything is possible because you have so much data, how do you figure out who's done the math right, what's inside the box that gives you the answer?

HARRIS: He said the only way to know for sure is to take ideas gleaned from big data sets and then try them out in people. That means convincing patients to participate in studies. Just a small percentage do today.

(SOUNDBITE OF ARCHIVED RECORDING)

CALIFF: And what we're seeing in our best academic centers, the clinicians say they don't have time to talk to patients about participating in studies. So far and away, this is our No. 1 issue now that we're focused on with big data.

HARRIS: These problems aren't just abstractions for Sonia Vallabh. Her mother died of a rare, fatal genetic disease in middle age, called prion disease. Vallabh quit her job as a lawyer and became a medical researcher at the Broad Institute in Cambridge, Mass. She turned to a huge set of genetic information to see what she could learn about her condition.

(SOUNDBITE OF ARCHIVED RECORDING)

SONIA VALLABH: We basically confirmed what we thought we knew about my genetic mutation, which is that it makes me almost a hundred percent likely to die this way by midlife.

HARRIS: But the data also yielded a surprise. Her disease is caused by having too much of a certain protein in her body. And some people with only half as much of this dangerous protein didn't get sick and die.

(SOUNDBITE OF ARCHIVED RECORDING)

VALLABH: So here's an experiment of nature handed to us on a platter by big data that says if we can find a way to turn down this disease protein - this protein that wants to kill me - that should be a safe way to delay or prevent disease.

HARRIS: But that's not a question for big data. Vallabh needs the old-fashioned kind of medical research, laboratory and clinical science, in order to develop a drug that would reduce the protein safely and effectively.

Richard Harris, NPR News.

Copyright © 2016 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.

NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.