IRA FLATOW, HOST:
This is SCIENCE FRIDAY; I'm Ira Flatow. We've all been there, sitting at the computer late at night, clicking on those websites that offer medical opinions, trying to convince ourselves that our headache must be caused by a brain tumor, right? Yeah, that dry skin you've had for the last couple of months, of course it's due to a thyroid disorder because that's what you're finding out on the Web. Recognize yourself?
You may be a cyberchondriac, convinced by advice and opinions you get online that your benign symptoms are really a cause for finishing up your last will and testament. And while the Web may sometimes mislead you into thinking you have a rare, unpronounceable disease, what's really more interesting to me is that all those search queries can actually be helpful to doctors and drug companies, giving them clues about the side effects of drugs, for example.
Or clues about when two drugs have a bad interaction with each other, something that may have slipped by researchers in the clinical trials. Your searching can even help scientists and doctors track the spread and causes of disease.
My next guest is combing through those search results to do just that. Let me introduce him now. Eric Horvitz is a distinguished scientist at Microsoft Research, co-director of the Microsoft Research Lab in Redmond, Washington. Welcome back to SCIENCE FRIDAY, Dr. Horvitz.
ERIC HORVITZ: Hi, Ira, it's good to be back again.
FLATOW: Thank you. Tell me about this new study that you did. You looked through the searches of Web users to find drug interactions.
HORVITZ: Yeah. In general, a Pew study showed a few years ago that two-thirds of folks using the Web are interested in health care. They're looking for health care information; they're doing self-diagnosis. We thought this would be a really interesting place to look for new kinds of medical knowledge to complement the more traditional approaches to monitoring, for example, adverse side effects of drugs and drug interactions in the post-marketing phase.
And this is typically a very complicated thing to study. So it's collaborative work with close friends at Stanford, and I'm working very closely with Ryen White, who's a fabulous collaborator at Microsoft Research. We looked at six million Web searchers. We're basically considering large-scale statistics, with automated tools that look at queries made by people who consented to share their search activities with Microsoft for R&D purposes.
And we looked at 82 million queries, people querying on drugs and symptoms and so on, and we recognized an interaction between two drugs in the top 100 for the United States population: Paroxetine, a common antidepressant, and Pravastatin, known as Pravachol, a popular cholesterol-lowering drug. We found signs that these two drugs, when they're both taken by the same person, in this case searched on by the same person, can interact to cause a rise in blood sugar, or hyperglycemia. Now, to say...
FLATOW: And that was not known by the drug companies or published by the drug companies, let me put it that way.
HORVITZ: Well, here's the background; let me frame this story a bit. A team at Stanford had been going through an FDA database called AERS, A-E-R-S, the Adverse Event Reporting System, which the FDA uses to collect and track side effects and interactions among drugs in the post-marketing phase. Through statistical analysis, the team had recognized a previously unknown effect of those two drugs, Paroxetine and Pravastatin, interacting to cause hyperglycemia. They then confirmed the finding statistically with electronic health records at three different medical sites, which is a very intensive analysis, and even in a mouse model.
And this work was published in 2011. Working with our Stanford colleagues, we said: let's go back to 2010, before this was published, before this was known, and look at 12 months of search logs from people who consented to share them. Let's see if we can build a tool that would automatically recognize this side effect, that is, whether people who search on both drugs also search on terms you would use if you were experiencing a variety of symptoms of hyperglycemia.
And we found a significant bump compared to people who were searching on only one of the drugs.
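The comparison described here, symptom searches among people who searched on both drugs versus people who searched on only one, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's actual method: each anonymized user is represented as a set of lowercase search terms, the symptom vocabulary `HYPERGLYCEMIA_TERMS` and the helper names are hypothetical, and the timing of searches, query normalization, and statistical significance testing are all ignored.

```python
# Minimal sketch (NOT the study's method): compare how often users who searched
# on BOTH drugs also searched on hyperglycemia-like symptoms, versus users who
# searched on only ONE of the two drugs.
from collections.abc import Iterable

# Hypothetical symptom vocabulary; the study's real term list is not given here.
HYPERGLYCEMIA_TERMS = frozenset({
    "hyperglycemia", "high blood sugar", "excessive thirst",
    "frequent urination", "blurred vision", "fatigue",
})


def symptom_rate(users: list[set[str]]) -> float:
    """Fraction of users whose searches include at least one symptom term."""
    if not users:
        return 0.0
    hits = sum(1 for terms in users if terms & HYPERGLYCEMIA_TERMS)
    return hits / len(users)


def compare_groups(users: Iterable[set[str]], drug_a: str,
                   drug_b: str) -> tuple[float, float]:
    """Return (rate among both-drug searchers, rate among one-drug searchers)."""
    both: list[set[str]] = []
    one_only: list[set[str]] = []
    for terms in users:
        has_a, has_b = drug_a in terms, drug_b in terms
        if has_a and has_b:
            both.append(terms)
        elif has_a or has_b:  # exactly one of the two drugs
            one_only.append(terms)
    return symptom_rate(both), symptom_rate(one_only)


# Toy, made-up search histories: one set of terms per anonymized user.
users = [
    {"paroxetine", "pravastatin", "excessive thirst"},
    {"paroxetine", "pravastatin"},
    {"paroxetine", "headache"},
    {"pravastatin", "fatigue"},
    {"paroxetine"},
    {"pravastatin"},
]

both_rate, single_rate = compare_groups(users, "paroxetine", "pravastatin")
print(both_rate, single_rate)  # 0.5 0.25
```

A higher rate in the first group is the kind of "bump" Horvitz mentions; on real, noisy logs it would still need a significance test and a noise model before being treated as a signal.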
FLATOW: 1-800-989-8255 is our number. So what you're saying, in a larger scheme, is that when people go online and search for things, you can use what they're searching for, through the logs they leave behind in all those millions of searches, to find out things about the actions of drugs that we may not know of by...
HORVITZ: Exactly, and this is pretty exciting in a number of ways. Number one, obviously these are noisy signals. Not everybody searching on Paxil or Pravastatin is taking those drugs. But quite a few people who take them likely do search on them, sooner or later, especially when they might have side effects.
And at large scale, with large-scale statistics, we find signal in the noise, you might call it. Not only that, but because large-scale statistics are what we're interested in, this approach is also great for protecting the privacy of any particular person.
FLATOW: Let's talk about how you could expand this. How could you expand this out into other health care?
HORVITZ: Well, this is a very good question, Ira, and we're very excited about a number of directions here. So for example, if you just take the same line of reasoning, you can imagine looking at medical devices post-marketing, and picking up clues or evidence that there is something going amiss for users. You know, does that CPAP machine cause tinnitus, for example?
This is really a nice way to look at drug interactions, potentially new drug side effects, what people are complaining about, the rise of illness, and interactions between illnesses. Do people who search on sleep apnea a bunch later search on heart disease and high blood pressure, for example?
So you can imagine a whole number of directions here where we make the large-scale Web, which is a place where human beings do lots of things these days, a central network for health care.
FLATOW: Or you could do it, I would imagine, even with nutrition, with food additives, things like that, you know?
FLATOW: I'm just thinking out loud here, people who take melatonin - are they sleeping more?
HORVITZ: You know, but even back to adverse drug events, it's striking how little we can study with formal clinical trials before a drug company actually puts a drug in the marketplace. The scale at which we can automate the search for signals given hypotheses, even with noisy clues, on a very, very large scale, is enormous and wonderful.
Now, one thing I'd like to say: when we do this work, and if you want to look at the paper, it's on our website, we try to characterize how well the tool works. We want to get what we call a noise model, so that when we apply the tool we understand its false positives and false negatives. And that's part of the research.
How do we take this big, noisy, crazy world of Web search and normalize it into a powerful tool?
FLATOW: Is it accurate, yeah. 1-800-989-8255. Two questions for you. One: in your study you got permission, right, from the people you studied to put a little toolbar on their Web browser, so you could follow them.
FLATOW: What about all the people listening to us today saying, uh-oh, they're looking at where I'm going? I don't have that thing on my toolbar, but are they still going to be using my data and what I'm searching for?
HORVITZ: You know, I think the directions in privacy are critical for the Web and for the era we're in, the revolution we're in with computing. And we, as the computer science community, need to come up with fabulous controls to make sure that we protect the privacy of users.
Our company, Microsoft, takes this very, very seriously, as do other companies and competitors of ours. And so we're vigorously pursuing ways to protect users. In this case we not only got consent, but we also have a number of methods that anonymize the inputs so we can't trace back who this was.
And the third part here, which I think people like to hear, is that human eyes don't touch these queries. It's an automated crawler of sorts that goes over much, much bigger data than any human being could ever have time to look at.
FLATOW: Mm-hmm. Let's talk about whether these data-mining techniques could be used in hospitals or health care settings to improve patient care.
HORVITZ: Absolutely. In fact, in another arm of our research, at Microsoft and elsewhere with colleagues, we're working with actual electronic health record data that hospitals are now collecting with higher and higher fidelity. For example, about a year ago we fielded, commercially, classifiers: automatic systems that learn from thousands of patient visits and tens of thousands of variables about patient symptoms, lab tests, and histories to predict, for example, whether a patient being discharged today will be readmitted within 30 days. That probability is available to the discharging physician at discharge time to guide decisions about what else they might do to keep that from happening.
FLATOW: Wow, wow. You've also studied what I began this section with, something called cyberchondria, for example...
FLATOW: ...when you come up with search results. Let's say, for example, chest pain. When you search for chest pain, what do you get? You think you have a little chest pain, maybe you came home from the gym and used the weights a little too much, and suddenly you've got a chest pain. If you search for it, you're going to come up with a whole bunch of stuff, aren't you?
HORVITZ: Yes. So we actually studied this phenomenon. It's a very common activity, which we've all done; I would imagine most listeners in their cars or at home right now have done this. You have a few symptoms, you want to find out what's going on, so you do an online search, and you're basically using the Web as a kind of medical expert system.
HORVITZ: And, sure, the Web brings up movies and news. Why can't it diagnose my symptoms if you put one or more symptoms in? What we found was that, in the general case, people put in very common symptoms. Chest pain is actually quite common, and it rarely means heart attack, even in men my age who haven't had a previous history of heart disease, for example. But you can be led astray very quickly to rare, scary, fatal illnesses because of several factors, including the fact that more is written about the scary things.
Those scary items are also clicked on more, giving the search engine feedback to rank those results higher on the list. And you end up with what we call escalation: looking at Web content escalates people from thinking their symptom is common to thinking they have a rare, fatal illness. I mean, nobody out there should ever type into a Web browser that they have a twitch in their hand or that an eyelid is twitching, because in no time at all you'll be thinking about a wheelchair and the onset of ALS, a terrible disease that is very scary to people.
FLATOW: Mm-hmm. But these searches - but you're saying in general that these searches are useful in the long run for people who can study what people are looking for...
HORVITZ: Right. So the Web is an incredibly valuable place for health care information. It's also a valuable sensor network for the studies I talked about, which we've done with our colleagues at Stanford. But in studying how prominent the Web is in people's lives these days, in how they manage their own health, we've found that it has some rough edges as well. One of the rough edges comes from mixing the way content is crawled and indexed with cognitive biases. One of those is called the availability bias: when you see a bunch of scary things, the probability you assign to having that scary thing goes up in your mind. These rough edges lead people astray at times. And I think a certain set of the population that dwells on these scary things can really spend lots of time on them.
In fact, we could see automatically that, many times, people came back again and again and again to the same search, with the same concerns, showing clear anxiety about the way the Web was answering their queries when they input very common symptoms. I mean, if you put in the symptom headache, severe headache, then way up on the list, as you said earlier on the show, brain tumor...
HORVITZ: ...appears, and all about brain tumors. It turns out one of the most common causes of a very severe headache, the worst in your whole life, is when you forget to have your coffee for a couple of days.
HORVITZ: And this is a very common situation, even at ambulatory health care clinics. Patients will come in and say: My God, I think something is wrong. It's a hemorrhage of some kind, or a tumor. And the doctors always check: What about your coffee? And that's very common...
FLATOW: But on the Web, nobody is going to check. There's no one to check it. You're just going to get reassurance that you have a tumor.
HORVITZ: Well, caffeine withdrawal as a cause of headache appears in search results at the same rate as brain tumor, and caffeine withdrawal is a lot more common.
FLATOW: Wow. Let me get a quick call in from Andy in Denver. Hi, Andy. Welcome to SCIENCE FRIDAY.
ANDY: Oh, thanks for taking my call. I was just curious about the study you were talking about earlier, where they had prior knowledge of this drug conflict that led to hyperglycemia. Has that same data-mining approach been used to find something without prior knowledge, some other drug interaction that was not known ahead of time?
HORVITZ: So there are two parts to the answer to that question. One is that we actually took 60 pairs of drugs whose interactions were known: true negatives and true positives for hyperglycemia. This data was not well known; it was basically sequestered away in special resources for physicians, so we could test how well our methods work on these knowns. It turns out we can then build what we call an error model that tells us how well this will work on future combinations we haven't seen yet.
Now, let me tell you a quick, exciting bit of directional news. Ryen White and I created a system we call BLAERS, for Behavioral Log-based AERS; AERS is the FDA's Adverse Event Reporting System. We've now been exploring many combinations with a general tool, and we're seeing all sorts of interesting interactions. And we're in touch with the FDA about this, and with our Stanford colleagues.
FLATOW: Mm-hmm. And so there's a whole constellation of different drugs you might be able to see interactions for.
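Testing on drug pairs whose answers are already known amounts to measuring a detector's hit rate and false-alarm rate, which is the raw material for the error model Horvitz mentions. The Python sketch below is a minimal illustration under assumptions, not the study's procedure: it assumes some hypothetical detector has already flagged (or not flagged) each pair, and the pair labels `P1`...`N4` and the helper `error_rates` are made-up placeholders.

```python
# Minimal sketch: score a hypothetical detector against drug pairs whose true
# status (interaction or no interaction) is already known.


def error_rates(flagged: dict[str, bool], truth: dict[str, bool]) -> dict[str, float]:
    """Sensitivity (true-positive rate) and false-positive rate of the detector."""
    tp = sum(1 for pair, pos in truth.items() if pos and flagged[pair])
    fn = sum(1 for pair, pos in truth.items() if pos and not flagged[pair])
    fp = sum(1 for pair, pos in truth.items() if not pos and flagged[pair])
    tn = sum(1 for pair, pos in truth.items() if not pos and not flagged[pair])
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Placeholder labels: P* pairs are known interactions, N* pairs are known non-interactions.
truth = {"P1": True, "P2": True, "P3": True, "P4": True,
         "N1": False, "N2": False, "N3": False, "N4": False}
# What the hypothetical detector flagged for each pair.
flagged = {"P1": True, "P2": True, "P3": True, "P4": False,
           "N1": True, "N2": False, "N3": False, "N4": False}

print(error_rates(flagged, truth))  # {'sensitivity': 0.75, 'false_positive_rate': 0.25}
```

Rates measured this way on known pairs give a rough expectation of how often the tool will miss real interactions or raise false alarms on combinations it has not seen before.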
HORVITZ: You can imagine. We can just take the top 100 and look at all combinations.
FLATOW: Oh. Quite interesting. Can you stay with us a few more minutes, doctor?
FLATOW: We're going to take a short break and talk more with Eric Horvitz, distinguished scientist at Microsoft. Our number: 1-800-989-8255 if you'd like to talk about drug interactions, about using the computer, maybe, or about cyberchondria, something that I think we all suffer from. We used to call it med student syndrome or science reporter syndrome. We had everything that we were looking at in the New England Journal. 1-800-989-8255. Stay with us. We'll be right back after this break.
(SOUNDBITE OF MUSIC)
FLATOW: This is SCIENCE FRIDAY, from NPR. I'm Ira Flatow. We're talking about using the Internet to help diagnose drug interactions and other diseases. As you search for symptoms on the Internet, scientists are able to use what you're searching for to find out how diseases are spreading. Eric Horvitz is here. Let's talk about early uses of this. Wasn't it used originally, way back when, in Google Flu Trends, to find out how the flu was spreading?
HORVITZ: Yes. Some folks at Google did a really nice job around 2009; I guess they actually built the system in 2008. They looked at flu-related search terms, coupled that with data from the CDC, the Centers for Disease Control, and built a model that could provide advance warning about flu around the United States within a day, much faster than the CDC could with its standard reporting through a network of hospitals. This work was very interesting and visionary. And the system was in the news again just recently, when it kind of messed up a bit, and that was an interesting story.
FLATOW: It messed up. It got overloaded?
HORVITZ: Well, what happened was this. When you build a model or predictive system that's trained on data it has seen in the past, and something anomalous happens, you might get erroneous predictions if that system isn't updated very quickly. The Google Flu Trends system was in the news recently when it was noted that it predicted, I think, about twice as much flu as was actually occurring, because of several anomalous factors.
For example, this year, the flu season started quite early, in November. I think it was the earliest since 2003. There was a predominant strain this year called H3N2. This is the most virulent of the three seasonal strains of influenza.
And because there were more deaths than usual, unfortunately, in this case particularly among the elderly, and more serious illness because of this strain, coupled with the early rise of the flu, it became newsworthy, and there was quite a bit of news on it. The media can actually stimulate searches, through anxiety and interest, and that can overload a signal with new kinds of activity that the system was not used to. So the message here is that these systems need to always be alive and active, learning and understanding even things like the influence of the news media on what people search for.
FLATOW: Speaking of searching, we have time for, I think, one more tweet. It comes in from David O'Leyar, who says: Do you have any sites that you feel are authoritative? Which ones should we be looking at, when it comes to health issues, for info?
HORVITZ: Well, in our 2008 paper on cyberchondria, we explored the differences between general Web search with any search engine, a focused search on an authoritative health care site, and the Web crawl more generally, for example, the statistics of correlation between these rare illnesses and symptoms. We found that going to an authoritative website, like the Mayo Clinic's or MSN Health, provided much better information than general Web search, because of the factors we explored in that paper.
FLATOW: Mm-hmm. Well, Doctor Horvitz, I want to thank you very much for taking time to be with us today.
HORVITZ: It's been great to be here, thanks.
FLATOW: And this is just - this stuff is just in its infancy, is it not? I mean, there's a long way to go in...
HORVITZ: Yeah. And it's a very exciting path to go on, I think.
FLATOW: All right. Eric Horvitz is a distinguished scientist at Microsoft Research, co-director of the Microsoft Research Lab in Redmond, Washington. And we're happy he was with us today.
NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.