Why Good Science (Sometimes) Gets Things Wrong

13.7: Cosmos And Culture

Is science an unreliable partner? Commentator Tania Lombrozo admits that apparent contradictions between scientific studies, particularly those related to human health, can be unsettling. But she says that's no reason to give up on one of your most important partners in life.

Science: A Relationship You May Not Understand

Don't let your beautiful relationship with science run up on the rocks just because of the occasional contradiction or misunderstanding. Take a minute to try and see things from another perspective. Pascal Guyot/AFP/Getty Images

Eating more antioxidants can reduce your risk of stroke and dementia. Or maybe not. Moderate alcohol consumption has some health benefits. But also some risks. Women should take calcium supplements. Or maybe they shouldn't.

Sound familiar? Just when you thought you knew what you should and shouldn't be doing to improve your health and well-being (whether or not you were actually doing it), new science comes along and changes the story. It's enough to make you feel betrayed, and to decide that science is unreliable.

Before your relationship with science ends up on the rocks, I urge you to consider the situation from another perspective.

Most of the time, when two findings actually or apparently conflict, it's not the result of foul play or error. It's the result of how science works. And if you understand why, you might be more inclined to forgive science for its inevitable inconsistencies. You might even come to find them intriguing, if not exactly charming.

Here's the key insight you need to understand the inner life of science: the conclusions drawn from scientific studies almost always involve generalizing from a sample to a population.

If science didn't make such generalizations, it wouldn't be terribly useful or interesting. After all, we want to know whether we should eat antioxidants and whether we should feel guilty about that second glass of wine, not whether the people in a particular study should have. But with generalization comes the possibility for error.

To understand why, it helps to consider a simple example. Suppose you're interested in testing the hypothesis that men are, on average, taller than women. You could measure the height of every person alive today — approximately 7 billion people — and see whether the average height for men is greater than the average height for women. That would take you a rather long time. So, instead, you might measure the heights of a sample of men and women to see whether the average height for the men is greater than the average height for the women in that sample.

But how do you choose the sample? Ideally, you want your sample to be perfectly representative of the population, something that you're likely to approximate by choosing people at random. So if you had a list of all the people on the Earth, you would want to choose some subset — say 100 men and 100 women — at random. Then you could fly around the world measuring those 200 people in their varied and remote locations.

This might be a nice adventure, but hardly the way to do efficient research. (And good luck getting funding!) Even if we did have a complete list of all people and how to find them, there would be some probability that the 200 we chose weren't representative of the population. We could be unlucky and choose a few unusually tall women and a few unusually short men, ending up with average heights that are exactly the same.
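To make that bad luck concrete, here is a minimal Python sketch of the thought experiment. The population averages and spread (175 cm, 162 cm, 7 cm) and the function names are illustrative assumptions, not measured data. The point is only that repeated random samples from the same population give different answers, and that small samples wander much more than large ones.

```python
import random
import statistics

# Hypothetical population values, chosen for illustration only.
MEN_MEAN_CM = 175.0
WOMEN_MEAN_CM = 162.0
SD_CM = 7.0


def sample_mean_gap(n_per_group, rng):
    """Measure n random men and n random women; return mean(men) - mean(women)."""
    men = [rng.gauss(MEN_MEAN_CM, SD_CM) for _ in range(n_per_group)]
    women = [rng.gauss(WOMEN_MEAN_CM, SD_CM) for _ in range(n_per_group)]
    return statistics.fmean(men) - statistics.fmean(women)


def gap_range(n_per_group, n_studies=1000, seed=0):
    """Repeat the same study many times; return the smallest and largest gap seen."""
    rng = random.Random(seed)
    gaps = [sample_mean_gap(n_per_group, rng) for _ in range(n_studies)]
    return min(gaps), max(gaps)


if __name__ == "__main__":
    for n in (5, 100):
        low, high = gap_range(n)
        print(f"{n} per group: estimated gap ranged from {low:.1f} to {high:.1f} cm")
```

Running it with 5 and then 100 people per group shows how much the estimate from any single study can drift from the true 13 cm gap assumed here, and how a larger sample narrows that drift without eliminating it.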

This is where statistical tests come in: they can tell us, roughly, how likely it is that we would observe a particular result by chance given that we're sampling from a population with particular characteristics.

In psychology, my field, the standard practice is to consider a result "statistically significant" if a result at least that large would turn up by chance less than 5 percent of the time when there is no real effect to find. But using this criterion, there is still some probability that a "significant" result was due to chance alone.

One implication is that when there is no real effect to find, about 1 in 20 studies will still produce a "significant" finding that is a fluke. In practice, the share of flukes among published findings may be far larger, as scientists often don't publish papers that fail to find a significant result. So, published research is likely to overrepresent the flukey 5 percent. And if the flukey 5 percent are especially interesting, perhaps because of their novel and unexpected findings, then media coverage may exaggerate this overrepresentation even further.
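The arithmetic behind the flukes can be sketched with a small simulation. Everything in it is an assumption chosen for illustration: that 10 percent of the hypotheses being tested are actually true, the size of the true effect, 30 participants per group, and a simple z-test with a known standard deviation standing in for whatever test a real study would use. It is a rough sketch, not a model of any real body of research.

```python
import random
from statistics import NormalDist, fmean


def p_value(group_a, group_b, sd):
    """Two-sided z-test p-value for a difference in means (sd assumed known)."""
    n = len(group_a)
    z = (fmean(group_a) - fmean(group_b)) / (sd * (2 / n) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def run_study(true_gap, n, rng, sd=1.0):
    """Simulate one two-group study and return its p-value."""
    group_a = [rng.gauss(true_gap, sd) for _ in range(n)]
    group_b = [rng.gauss(0.0, sd) for _ in range(n)]
    return p_value(group_a, group_b, sd)


def simulate(n_studies=4000, share_real=0.1, true_gap=0.5, n=30, seed=1):
    """Return (false positive rate, share of flukes among significant results)."""
    rng = random.Random(seed)
    null_total = null_hits = real_hits = 0
    for _ in range(n_studies):
        is_real = rng.random() < share_real  # hypothetical share of true effects
        significant = run_study(true_gap if is_real else 0.0, n, rng) < 0.05
        if is_real:
            real_hits += int(significant)
        else:
            null_total += 1
            null_hits += int(significant)
    return null_hits / null_total, null_hits / (null_hits + real_hits)


if __name__ == "__main__":
    false_positive_rate, fluke_share = simulate()
    print(f"no-effect studies that came out significant: {false_positive_rate:.1%}")
    print(f"significant results that were flukes: {fluke_share:.1%}")
```

Under these assumptions, roughly 5 percent of the no-effect studies should come out significant, yet flukes make up a much larger share of the significant results. That is what you would expect if only the significant results get published and true effects are both uncommon and not always detected.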

Surely you won't hold science responsible for bad luck and a little press? We can still make this relationship work, right?

Unfortunately, though, the challenges of generalizing from a sample don't end with statistics. Recall that our hypothetical study involved a magical list of all humans and their locations as well as (NSF-funded?) travel around the world.

In a more realistic situation, you'll have to find a more convenient sample and a less-expensive strategy for data collection.

Perhaps you can measure the heights of 200 men and women in a local park one Sunday morning. Perhaps you find that the men are, on average, taller than the women. The most conservative conclusion is that men are taller than women in this particular sample at this particular time with this particular method of measurement. But that's not terribly useful or interesting. We're more likely to want to know about men and women in general.

Are we warranted in drawing conclusions about men and women in general from our park-going sample? Maybe our conclusions should only extend to American men and women. Or just people who go to parks? Or could it be people in the park on Sundays (after all, it could be that heights change throughout the week)? Maybe the study only tells us about people whose heights are being measured (after all, height could be affected by the very process of measurement — quantum mechanics tells us that crazier things are possible). What if that Sunday morning happened to involve a men's basketball practice? Then we'd have a whole new reason to doubt the generality of our result.

Of course, this example seems a little silly because we already know a lot about height. We know that it's unlikely to vary much from Sunday to Tuesday and that basketball players might skew our sample. But when it comes to many targets of contemporary research, we're working from a place of relative ignorance.
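The basketball worry can be sketched the same way. The heights below, including the 195 cm average for players, are made-up numbers for illustration, and `estimated_gap` is a hypothetical helper. It measures 100 women and 100 men in the park, with some of the men replaced by players from the practice.

```python
import random
from statistics import fmean


def draw(mean_cm, n, rng, sd_cm=7.0):
    """Draw n heights around an assumed average (all values are illustrative)."""
    return [rng.gauss(mean_cm, sd_cm) for _ in range(n)]


def estimated_gap(n_basketball_players, seed=2):
    """Men-minus-women height gap in a park sample of 100 women and 100 men."""
    rng = random.Random(seed)
    women = draw(162.0, 100, rng)
    ordinary_men = draw(175.0, 100 - n_basketball_players, rng)
    players = draw(195.0, n_basketball_players, rng)  # assumed player average
    return fmean(ordinary_men + players) - fmean(women)


if __name__ == "__main__":
    for players in (0, 30):
        print(f"{players} players in the sample: gap = {estimated_gap(players):.1f} cm")
```

The statistics are computed correctly in both cases. What changes is who ended up in the sample, which is exactly the kind of problem a significance test cannot detect on its own.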

Consider the Nurses' Health Studies: they have involved over 200,000 nurse participants since 1976 and yielded tremendous insights into women's health. To what extent do the findings generalize to men? To what extent do the findings generalize to women from other cultures, or with very different lifestyles from nurses? Or to women being born today, who are likely to have somewhat different medical histories from the women born decades earlier?

There's also a generalization problem when it comes to characterizing the factors measured in particular studies. For example, a recent study suggests that high total antioxidant consumption isn't protective against stroke and dementia. However, particular antioxidants are protective, at least when consumed in fruits and vegetables. So we might have thought that a previous finding linking blueberry consumption to health benefits was a consequence of "antioxidants in general," when really it was a consequence of "antioxidants acquired from fruits." (Or maybe just from blueberries. Or maybe just for people whose diets are also high in calcium, who have a particular genotype and who either play the flute or have a wry sense of humor. We just don't know yet.)

So, some of the time when two studies appear to be in conflict, it's because the generalizations that were drawn on the basis of one or both sets of findings were too broad, straying too far beyond the characteristics of the particular sample and the particular factors considered. Sometimes it's the "fault" of the world, for providing a statistically unrepresentative sample. Sometimes it's the fault of the scientists, for choosing a poor sample or mischaracterizing the population to which the findings apply. Sometimes it's the fault of reporters, for straying too far beyond the data. Sometimes it's the fault of the editors, who opt for the catchier — but less accurate — headline. [Editor: who, me?] And sometimes it's all of the above.

Having what we took to be solid, scientific knowledge shift beneath our feet can be unsettling. But it shouldn't be that surprising. It is, after all, how science works.

Science, by its nature, involves enormous hubris: we try to make sense of the world from our limited observations. We expect that what we observe here and now will tell us something about what we haven't observed and may never observe. Science is all about generalizations.

But science is also modest: it changes in light of new evidence. Science is willing to admit when it's wrong. And it's this combination that makes science such a powerful partner — one worth sticking with in sickness and in health.

You can keep up with more of what Tania Lombrozo is thinking on Twitter: @TaniaLombrozo