Researchers Tap Web Chatter To Figure Out Who's Sick

Public Health

Hotspots show where the common cold is popping up across the U.S. (via Sickweather)

What if you could track people getting sick just by analyzing how they surf the Web?

Researchers from Google and the Centers for Disease Control and Prevention tried that back in 2009. They linked the number of Google searches for flu-like symptoms with the percentage of doctor's visits related to the flu. The findings suggested that search patterns alone could reveal how many people probably had or were about to get sick with flu.
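The idea of linking search volume to doctor's visits can be sketched as a simple line fit. The example below is only an illustration: it assumes an ordinary least-squares line, which is a simplification and not necessarily the model Google and the CDC used, and every number in it is made up for the sketch. The names `fit_line` and `estimate_visits` are hypothetical.

```python
# Illustrative only: synthetic numbers, not real Google or CDC data.

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical weekly share of searches mentioning flu-like symptoms (percent).
search_share = [0.5, 1.0, 1.5, 2.0, 2.5]
# Hypothetical share of doctor's visits related to the flu in the same weeks (percent).
visit_share = [1.2, 2.1, 3.0, 4.2, 5.0]

slope, intercept = fit_line(search_share, visit_share)

def estimate_visits(search_pct):
    """Estimate the flu-related visit share from a new week's search share."""
    return slope * search_pct + intercept

print(round(estimate_visits(3.0), 2))
```

A fit like this only captures the relationship seen in past weeks. If people start searching differently, the estimates drift, which is the kind of error critics later pointed out.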

From the work, Google Flu Trends was born. But critics quickly found that the estimates weren't as accurate as first thought. The algorithm underestimated the number of flu cases in 2009 and overestimated them in 2012. Google responded by tweaking its algorithm in 2009, 2013 and again in late October.

But the challenge to make online disease sleuthing more accurate continues. On Monday, researchers suggested Wikipedia searches might forecast the flu, and a paper published Oct. 30 in the journal Royal Society Open Science said combining Google Flu Trends with historic data on flu levels gives us the most accurate look yet.

Shots decided to take a look at a few of the public health issues we've followed in the past and what we've learned from their limitations.

Five Public Health Issues Tracked Through The Internet

  • 1. INFLUENZA

    Flu map via HealthMap.

    This graph of flu trends for the past two years comes from HealthMap. The group, part of Boston Children's Hospital, draws data from different sources including Google Flu Trends, the self-reporting flu site "Flu Near You" and the CDC. The approach allows people to compare how accurate different models are.

    What We Learned: "The ways in which people communicate and search online changes," says Harvard epidemiologist John Brownstein, one of HealthMap's team members. The data are full of information not related to the flu. Keeping up with how people talk online and adjusting models to filter out misleading chatter is a big challenge. (We talked to Brownstein in 2012 for a post about "Webidemiology." Check it out for a video explainer.)

  • 2. EBOLA

    Ebola outbreak map via HealthMap.

    This Ebola map and timeline are from HealthMap. The map uses information from the International Society for Infectious Diseases, the World Health Organization, Google News and others to chart the outbreak between March 14 and Oct. 29. The group's model estimates the number of Ebola cases in the next 18 days.

    What We Learned: The results can help public health officials. HealthMap spotted an unknown hemorrhagic fever in Guinea that turned out to be Ebola nine days before WHO made the formal determination that the virus was on the loose. Brownstein says that in places like West Africa, data are harder to come by.

  • 3. COMMON COLD

    The common cold on the East Coast, via Sickweather.

    This map of the northeastern U.S. shows late October data from the Sickweather app. The heat map highlights the prevalence of words such as "fever," "sick," "cough," "sore throat" and "runny nose" on Twitter and Facebook. The map also uses self-reported sickness from the app's users.

    What We Learned: Graham Dodge, CEO of Sickweather, says his team had to develop special dictionaries containing thousands of everyday phrases and keywords to help the software understand and disqualify false reports. Now, if someone says, "I look so hot right now," the algorithm knows the person isn't running a fever.

  • 4. FOODBORNE ILLNESS

    Reported locations of food poisoning in Chicago. Map created by Cory Nissen.

    This map of Chicago shows restaurants where someone reported getting sick between March 2013 and Jan. 2014. FoodBorne Chicago used Twitter to identify restaurants that may have violated city health codes. Over 10 months, the team zeroed in on 270 tweets and asked the people who tweeted to fill out a survey. All told, customers completed 193 surveys about foodborne illnesses. City inspectors went to the implicated restaurants. Sixteen percent failed inspection and 25 percent passed with critical or serious violations. A similar study was done with Yelp in New York City in 2013.

    What We Learned: Tweets can provide valuable hints, but sometimes more detailed information is necessary. "People are willing to follow-up on that utterance" on Twitter, says Daniel O'Neil, who worked on the FoodBorne Chicago project.

  • 5. HIV

    HIV map from Sean Young at UCLA.

    This map was made from a collection of 550 million tweets during six months in 2013. Each location represents a tweet related to behavior that could put someone at risk for HIV, including sexual behavior and drug abuse. Behavioral psychologist Sean Young and his team at the UCLA Center for Digital Behavior collected the data with the idea that it could be used to help prevent the spread of HIV.

    What We Learned: HIV is difficult to track because surveillance statistics from the CDC aren't released as frequently as researchers would like. Young says he needs more up-to-date official data to combine with real-time tweets to forecast behavior.
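
Returning to the common cold example above, the phrase dictionaries Graham Dodge describes can be pictured with a small sketch. This is a rough illustration under stated assumptions, not Sickweather's actual software: the symptom list borrows the words quoted in the article plus "hot" (added here only so the "I look so hot right now" case applies), and the false-alarm phrases and the function name `is_illness_report` are hypothetical.

```python
import re

# Hypothetical word lists for illustration; not Sickweather's real dictionaries.
SYMPTOM_TERMS = ["fever", "sick", "cough", "sore throat", "runny nose", "hot"]
FALSE_ALARM_PHRASES = ["look so hot", "sick of", "fever pitch"]

def normalize(post):
    """Lowercase the post, drop punctuation and collapse whitespace."""
    letters_only = re.sub(r"[^a-z\s]", " ", post.lower())
    return " ".join(letters_only.split())

def is_illness_report(post):
    """Return True if a symptom term remains after removing false alarms."""
    text = normalize(post)
    # Remove everyday phrases that contain a symptom word but do not signal illness.
    for phrase in FALSE_ALARM_PHRASES:
        text = text.replace(phrase, " ")
    # Match whole words or phrases only, so "hot" does not match "photo".
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text)
        for term in SYMPTOM_TERMS
    )

print(is_illness_report("I look so hot right now"))
print(is_illness_report("Stuck at home with a fever and a sore throat."))
```

A real system would need thousands of such phrases, as Dodge notes, and would have to keep updating them as online slang changes.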