Episode 677: The Experiment Experiment : Planet Money How much of published scientific research is false? Scientists are trying to figure it out.


OK. So, can you read me the title of the paper?

BRIAN NOSEK: (Reading) "Feeling The Future: Experimental Evidence For Anomalous Retroactive Influences On Cognition And Affect."

KESTENBAUM: Feeling the future meaning, like, people can see into the future, basically ESP.



We're talking to Brian Nosek here. He's a psychologist at the University of Virginia.

KESTENBAUM: And this is published where?

NOSEK: In the Journal Of Personality And Social Psychology.

KESTENBAUM: It's, like, the top journal in the field.

NOSEK: That's right.

KESTENBAUM: Peer-reviewed?

NOSEK: Peer-reviewed - extensively.

GOLDSTEIN: This paper came out in 2011, and Brian Nosek, like a lot of psychologists, thought - what is going on here?

KESTENBAUM: Like, there's no way that ESP is real. But then again, here is this paper describing not just one experiment but nine different experiments.

GOLDSTEIN: Brian Nosek says some of them are really pretty convincing, even today when you flip through the paper.

NOSEK: The main psi hypothesis was that participants would be able to identify the position of the hidden picture significantly more often than chance.

KESTENBAUM: In one experiment, he says, people were told to stare at this computer screen, and they were told that an image was going to appear on either the right side or the left side. And they were asked to guess which side. Like, look into the future. Which side do you think the image is going to appear on?

GOLDSTEIN: And people do OK on this. They do better in this study than if it had just been random chance.

KESTENBAUM: This paper was published by a scientist at Cornell University, a researcher who's really well-regarded. And when Brian Nosek read the paper, there did not seem to be any obvious problem with it.

NOSEK: The paper is beautiful. It follows all the rules of what one does, does it in a really beautiful way.

GOLDSTEIN: This ESP paper threw the field into really a crisis because, Nosek says, there were basically two possible explanations.

NOSEK: Either we have to conclude that ESP is true because there's these nine studies that show it, or we have to change our beliefs about the right ways to do science.

KESTENBAUM: Nosek is going with option B. And not just for psychology experiments - for biology, economics, studies that can affect public policy, medical decisions.

He thinks there is something wrong with the way we're doing science.


KESTENBAUM: Hello and welcome to PLANET MONEY.

I'm David Kestenbaum.

GOLDSTEIN: And I'm Jacob Goldstein. Today on the show - is something rotten at the heart of science? Or is ESP real?

KESTENBAUM: I hope ESP's real.

GOLDSTEIN: It could be both.

KESTENBAUM: I knew you were going to say that - not because I have ESP...

GOLDSTEIN: (Laughter).

KESTENBAUM: ...Just because I know you.


SVARE AND MICHAELS: (Singing) The weather is nice. The time of our lives is happening before our eyes.


SVARE AND MICHAELS: (Singing) We are going to run, run, run into the sun.

KESTENBAUM: The ESP research paper raised this very deep question for science in general. If this paper is wrong, but the researcher followed all the rules and experiment protocols, what else out there is wrong?

GOLDSTEIN: What about all those thousands and thousands of other studies? You know, just walk through a research library at a university. You see shelves and shelves of bound volumes of journals, going back decades, full of big, important experiments like, hey - we cloned a sheep.

KESTENBAUM: All that stuff you read in the newspaper - oat bran reduces the risk of heart attacks.

GOLDSTEIN: Or if you want to get somebody's attention, whisper in their right ear. That apparently works better than the left.

KESTENBAUM: How much of the stuff in those books is just wrong? Nosek says after that ESP paper comes out, no one's really sure.

NOSEK: People are saying, oh, there is a huge problem, or no, there's no problem at all - what are you talking about? And it's all speculation. And we just don't have any data.

GOLDSTEIN: Brian had been thinking about this stuff for a long time. And when that ESP study came out, everybody else in the field was suddenly thinking about it too. And Nosek figured, maybe this is my moment to do something, to get to the bottom of this question.

KESTENBAUM: He decides to do this thing that is supposed to be part of the whole normal scientific process but rarely actually gets done. He decides he's going to try to repeat a bunch of experiments, redo them - get the original papers, look at how the researchers did the experiment and try to replicate it, see if it actually holds up.

GOLDSTEIN: Nosek's lab can repeat, you know, one study, but repeating study after study after study...

NOSEK: No. We're not going to put aside all of our active research in order to try to replicate a bunch of studies in our literature. We just don't have the time to do it or the resources.

GOLDSTEIN: Why not? I mean, why couldn't that just be your work?

NOSEK: Because the grad students in my lab would never get jobs.

KESTENBAUM: So Nosek wasn't sure who else was going to join him on this quest. I mean, repeating work that has already been done is not how you get tenure, you know. You put in all this work, and if you confirm the original results, everyone's going to be like yeah, yeah. We knew that already.

GOLDSTEIN: But Brian figured - what the heck? He'd try to find enough researchers to do that thing, to repeat lots and lots of experiments.

NOSEK: So we sent out an email to people in the field saying - are you interested? Do you want to join this effort? And dozens of people instantly joined. There are a lot of people that were interested in this issue. And so it was like, oh, wow. This can really be a project.

GOLDSTEIN: In fact, there's so much interest that Nosek decides we're going to redo 100 experiments - nice, big, round, impressive number.

KESTENBAUM: And he decides we're going to take them from the top journals - three of the top psychology journals. And so in 2012, hundreds of researchers in labs all over the place get to work. They start redoing all these experiments.

GOLDSTEIN: At Michigan State, David Johnson and his colleagues were trying to repeat this experiment that had shown if people wash their hands, they make less severe moral judgments.

DAVID JOHNSON: We had to have our participants wash their hands, and it had to be done in a way that didn't draw attention to the fact that they were washing their hands.

KESTENBAUM: In Virginia, researchers were testing what you could think of as the afternoon treat hypothesis. E.J. Masicampo at Wake Forest University did the original study.

E.J. MASICAMPO: So I essentially showed that a sugar boost, so drinking a sugary lemonade, makes people engage in more effortful decision-making.

GOLDSTEIN: If you're tired, drinking something with a bunch of sugar in it helps you think harder.

KESTENBAUM: Makes sense. But is it true?

The researchers doing the replications tried to repeat the experiments exactly. They followed the original protocols as closely as they could. If the original sugary beverage experiment gave the students lemonade, when they repeated it, the students got lemonade.

GOLDSTEIN: One by one, the researchers finished their work, submitted their results. But Brian Nosek, he didn't look at any of those results while the project was still going on.

GOLDSTEIN: So when did you first see the results?

NOSEK: It was the day we hit 100 studies completed. We shut down the project. We're done. We've hit 100.

KESTENBAUM: Someone else had been keeping a spreadsheet tracking all the replications and the results, and Nosek finally looks at this spreadsheet to see how everything turned out.

GOLDSTEIN: Yeah, he just goes down the list. You know, there are these 100 studies they've done, and for each one, he answers this basic question - did it confirm the original finding - yes or no? He just goes down the list - yes, yes, no.

NOSEK: And then just get to the end and count up 39 yeses and 61 nos.

GOLDSTEIN: So you found that, of these 100 studies, in most of them, they did not confirm the original finding.

NOSEK: Right.

KESTENBAUM: Jacob, this is amazing to me. Like, you know, I was a physics major. I went to grad school. Like, I spent a bunch of time in the research library, walking past all these bound volumes of journal articles. And for sure, if you had asked me back then, like, if I thought the findings in them were more likely to be true than false, I would say yes, definitely. And he found, like, 60 percent could not be replicated. Like that lemonade experiment, the hand-washing experiment - neither of those could be replicated.

GOLDSTEIN: Yeah, to be honest, this number - this 39 yeses and 61 nos - this number right here is why I wanted to do this story. I mean, I found it - I found it completely shocking. And Nosek himself - he says he didn't know what to make of it.

NOSEK: Yeah, well, I - you know, I don't know what I thought, more of - well, how the heck did that happen?

KESTENBAUM: How do a bunch of scientific studies that seem to follow all the rules of the scientific method and make it into peer-reviewed journals - how is it possible that so many of them seem to be wrong?

GOLDSTEIN: The first thing to say - this is not about fraud. I mean, sure, there are examples out there of researchers just straight-up faking their results, but Nosek says - that's not what's going on here.

KESTENBAUM: He thinks what is going on here is something really more interesting. One of the ideas for what's happening is that a lot of what you are seeing in the journals are flukes, statistical flukes.

Here, let me give you an example. This is from a little experiment I did this morning flipping a coin 10 times.




KESTENBAUM: Heads again.




KESTENBAUM: Oh, heads.


KESTENBAUM: Heads again.

I flipped it 10 times, Jacob. Nine times I got heads, only one tails. If you do the statistical analysis you do for, like, a drug trial or any scientific paper, this looks like a remarkable result. There's only, like, a 1 percent chance that this is a fluke. So I send this off to the Journal Of Coin Flipping or whatever.
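
The "1 percent" figure can be checked with a short calculation. This is a minimal sketch, assuming a fair coin and a one-sided test (counting only outcomes with at least as many heads as were observed); the function name prob_at_least is made up for illustration and does not come from the episode.

```python
from math import comb


def prob_at_least(heads: int, flips: int, p: float = 0.5) -> float:
    """Chance of seeing at least `heads` heads in `flips` flips
    of a coin that lands heads with probability `p`."""
    return sum(
        comb(flips, k) * p**k * (1 - p) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Assumption: the coin is fair, so any streak is a fluke by construction.
fluke_chance = prob_at_least(9, 10)
print(f"P(9 or more heads in 10 fair flips) = {fluke_chance:.4f}")
```

Under those assumptions there are 11 ways out of 1,024 to get nine or ten heads, which is roughly the 1 percent quoted above.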

GOLDSTEIN: You're going to get this published. What are you going to call it? "Heads Up..."

KESTENBAUM: (Laughter) "Heads Up..."

GOLDSTEIN: ..."Coin-Flipping Bias In American Quarter Dollars Minted In 1977."

KESTENBAUM: You can imagine the world going crazy, right.


GOLDSTEIN: Schoolyards everywhere.


GOLDSTEIN: NFL before the game - whole new strategies are emerging based on your study.

KESTENBAUM: What you do not see in reading this one article in the Journal Of Coin-Flipping - and this is key - is that a bunch of other researchers have also been doing this experiment, though their results are less spectacular.









KESTENBAUM: Half heads, half tails - there are a lot of experiments like this, so why don't you read about these in the journals?

GOLDSTEIN: Because it's such a boring (laughter) finding, right? Like, you get this finding. Nobody's going to publish it. And if you're the researcher, you think - I'm not even going to send it off. I'm just going to stick my results here in this file cabinet with all the other failed experiments.

KESTENBAUM: Nosek says there's even a name for this.

NOSEK: It's called the file drawer effect. Journals are much more likely to publish a positive result than a negative result. In fact, 97 percent of results in psychology that are published are positive results.

KESTENBAUM: Positive here means, hey, we found something. Negative is nah, didn't find anything.

GOLDSTEIN: So is there some whole shadow universe of negative results that nobody ever heard of?

NOSEK: There are tons of them, and everyone can tell you about the effects that they have in their labs that aren't published.

GOLDSTEIN: And this, of course, this is a disaster. You know, you can imagine lots of people doing the equivalent of testing whether coins are more likely to come up heads or tails when you flip them. Almost all of them find, of course, there is no difference. But one, David Kestenbaum, finds they're more likely to come up heads, and that is the one that gets published.
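
The file drawer effect described here can be simulated. This is a rough sketch under stated assumptions, not anything the researchers ran: 1,000 hypothetical labs each flip a fair coin 10 times, and a lab "publishes" only if its one-sided p-value falls below 0.05. The names run_labs and p_value_heads, the number of labs, and the cutoff are all illustrative choices.

```python
import random
from math import comb


def p_value_heads(heads: int, flips: int) -> float:
    """One-sided p-value: chance a fair coin gives at least this many heads."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2**flips


def run_labs(n_labs: int = 1000, flips: int = 10, alpha: float = 0.05, seed: int = 0):
    """Simulate labs flipping a fair coin; only 'significant' results get published."""
    rng = random.Random(seed)
    published = []    # head counts that cleared the significance bar
    file_drawer = 0   # results nobody ever hears about
    for _ in range(n_labs):
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if p_value_heads(heads, flips) < alpha:
            published.append(heads)
        else:
            file_drawer += 1
    return published, file_drawer


published, file_drawer = run_labs()
print(f"published: {len(published)} labs, head counts {sorted(set(published))}")
print(f"left in the file drawer: {file_drawer} labs")
```

Every coin in this sketch is fair, yet anything that reaches the "journal" reports nine or ten heads out of ten, because those are the only outcomes that clear the cutoff. A reader who sees only the published results has no way to tell how many unremarkable ones stayed in the drawer.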

KESTENBAUM: This file drawer effect is the kind of thing that can help explain that ESP paper.

GOLDSTEIN: And also, it can partially explain why there were all those nos on Nosek's spreadsheet, why so many studies failed to replicate. But Nosek says the file drawer effect is not enough. It doesn't explain all of it. There aren't that many studies in people's file drawers.

KESTENBAUM: Nosek thinks there is another big issue going on here. And this one is a little touchier because it points to this human sort of weakness we have, our ability to trick ourselves, to subconsciously kind of skew the results.

GOLDSTEIN: So, OK, let's go back to the coin-flipping example. Let's say that at the beginning of the experiment, our plan is we're going to flip the coin 10 times and see what we get. We flip the coin 10 times, and we get seven heads.

KESTENBAUM: Now, seven heads is interesting. You look at that, and you say hey, seems like we're getting more heads than tails. But it is not enough to publish, so you do something that seems reasonable. You say let's get some more data. Let's add some more flips. Flip the coin, I don't know, four more times. You do that, you get a few more heads. And now you're, like, hey, we found something. Let's publish.

GOLDSTEIN: It seems reasonable. It is not reasonable. You are effectively, here, changing the rules in the middle of the game, in the middle of the experiment. And, you know, you could just keep extending the experiment until you find something that seems exciting.

KESTENBAUM: Yeah, we didn't get enough heads. Keep flipping. Flip a few more, few more - oh, now it looks interesting. Let's stop.

GOLDSTEIN: If you do that, what happens is you increase the chances that you'll find something that seems really interesting but is actually just a fluke, just random chance.

KESTENBAUM: And if you're like come on - does that really happen? Yeah, it really happens. Steve Lindsay is a psychology professor at the University of Victoria. He says he and lots of other people have made this kind of mistake in the past.

Here's how it happens.

STEVE LINDSAY: OK. Let's start now. We test 20 people and say, well, it's not quite significant, but it's looking promising. Let's test another 12 people. And the notion was, of course, you're just moving towards truth. You test more people. You're moving towards truth. But in fact - and I just didn't really understand this properly - if you do that, you increase the likelihood that you will get a, quote, "significant effect" by chance alone.
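
Lindsay's point can be checked with a small simulation. This is a sketch under stated assumptions: there is no real effect at all (every measurement is random noise with mean 0 and known standard deviation 1), significance is judged with a two-sided z-test at 0.05, and the "peeking" researcher adds 12 more people whenever the first 20 are not significant. The function names and the number of simulated experiments are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(samples):
    """Two-sided z-test p-value for 'mean is 0', assuming known standard deviation 1."""
    z = sum(samples) / sqrt(len(samples))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(peek, n_first=20, n_extra=12, alpha=0.05, trials=20000, seed=1):
    """Share of no-effect experiments that still end up 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = [rng.gauss(0, 1) for _ in range(n_first)]
        if peek:
            # Look after 20 people; only add 12 more if it is not yet significant.
            significant = p_value(data) < alpha
            if not significant:
                data += [rng.gauss(0, 1) for _ in range(n_extra)]
                significant = p_value(data) < alpha
        else:
            # Plan fixed in advance: always test all 32 people, look once.
            data += [rng.gauss(0, 1) for _ in range(n_extra)]
            significant = p_value(data) < alpha
        hits += significant
    return hits / trials


fixed_plan = false_positive_rate(peek=False)
with_peeking = false_positive_rate(peek=True)
print(f"fixed plan of 32 people: {fixed_plan:.3f}")
print(f"test 20, then maybe 12 more: {with_peeking:.3f}")
```

With the plan fixed in advance, about 1 in 20 of these no-effect experiments comes out significant, which is what the 0.05 threshold promises. Taking a second look after adding people gives chance a second opportunity, so the rate with peeking comes out higher.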

KESTENBAUM: There are lots of ways you can trick yourself like this, just subtle ways you change the rules in the middle of an experiment. Here's Brian Nosek again.

NOSEK: When I do research in the laboratory, I have choices I make about how to analyze the data and about what of the data that I get to report. And so I might be more likely to find a way of analyzing the data that looks good for me - right? It confirms my hypothesis. It provides a result that's exciting, that's very publishable. I might decide that must be the right way to analyze the data, and I might do that while thinking and trying to be genuine and accurate. But - and the fact that I have a conflict of interest in this, where the results have implications for me and my career advancement, means that I might construct stories to myself that lead me to finding results and reporting results in literature that just are exaggerations of reality that just aren't true.
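
One way to see why having choices about how to analyze the data matters is a back-of-the-envelope calculation. This is a simplified sketch, not Nosek's own analysis: it assumes each alternative way of analyzing the data is an independent test with a 5 percent false-positive rate, which real analyses of the same data set are not. The function name is hypothetical.

```python
def chance_of_any_false_positive(n_analyses: int, alpha: float = 0.05) -> float:
    """Chance that at least one of several independent analyses is
    'significant' by luck alone when there is no real effect."""
    return 1 - (1 - alpha) ** n_analyses


for k in (1, 3, 5, 10, 20):
    print(f"{k:2d} ways to analyze: {chance_of_any_false_positive(k):.0%} chance of a lucky hit")
```

Under that independence assumption, a researcher who tries more ways of slicing the data and reports the one that works is increasingly likely to report a fluke, even without intending to.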

KESTENBAUM: I'm going to start a journal called the Journal Of Exaggerated Reality.

GOLDSTEIN: You'll overtake the Journal Of Coin-Flipping as number one.

KESTENBAUM: But that is the fear, right, that there are a lot of journals out there that are the Journals Of Exaggerated Reality.

GOLDSTEIN: Yeah. I mean, you know, you put together the file drawer effect and this problem of scientists tricking themselves and it goes a long way toward explaining this, a long way toward explaining why, you know, in Nosek's project, so many of those studies failed to replicate.

KESTENBAUM: These are not just issues in psychology. People are suddenly now really worried about this kind of thing in lots of fields. And there is this big new push to do something about it. We talked to an economist who's doing something similar to what Nosek did, trying to repeat a bunch of experiments, this time in economics.

GOLDSTEIN: Yeah. And there are also projects going in ecology, another one in cancer biology.

KESTENBAUM: And just to be clear, like, there are certainly plenty of things that science knows for sure, like smoking causes cancer. Climate change is real. The Higgs boson, I'm telling you, exists. There's lots of stuff that gets published in the journals where there's just no question about it. Like, it happened on a lab bench somewhere. They can repeat it. Other people have repeated it. It is definitely true.

But there is this other category of scientific research. You know, single studies that have not been replicated that rely on correlations, where the file drawer effect is a problem. And researchers tricking themselves, even though they're trying not to, is a problem.

GOLDSTEIN: Nosek says there is this one thing that would go a long way toward fixing these problems - the file drawer effect and researchers tricking themselves - and this thing is pretty simple.

NOSEK: Before you do the study, you write down how you're going to do it, how you're going to analyze your data and what you're going to try to learn.

KESTENBAUM: You write all that down and then you submit it to an online registry. That makes it impossible to change the rules as you go along. And then when you finish your experiment, you put your results in the registry, too, even if you do not find anything. Those results that would normally go in the file drawer - those get made public so that everybody knows what you found.

GOLDSTEIN: And this has, in fact, happened in other fields. In drug research a while back, they made this mandatory, and it has had a huge effect. According to this one analysis, before the registry was created, more than half of the published studies of heart disease showed positive results. After the registry was created, only 8 percent had positive results - from more than 50 to 8.

KESTENBAUM: Steve Lindsay, the psychologist who talked about the problem of adding more people to an existing study to try to get a significant effect, he is also the editor right now of a journal called Psychological Science. And he says when scientists come to him now with their results, hoping to get published, he often points them to the registry.

LINDSAY: We're quite often saying OK, this looks promising. This looks interesting. I want you to run a pre-registered replication.

GOLDSTEIN: You mean, you won't publish it unless they do that.

LINDSAY: That's right.

KESTENBAUM: This is a hard thing. I mean, as someone who used to do research and hang out with researchers, like, it feels like someone's saying we don't trust you. You have to write down exactly what you're going to do before you do it. Like, if someone had told me that when we were doing physics, I would've been, like, that's fine. I don't need to do that. Like, I know what I'm doing.

GOLDSTEIN: Yeah, you can really see how this whole thing - the registry and even the replications themselves - it can be an emotional issue.

KESTENBAUM: We haven't mentioned this yet, but Brian Nosek, the guy who loves replication, right, who headed up that whole study to repeat all those experiments - one of the experiments that was being repeated was one he did originally. And when this other lab redid his experiment, they could not replicate his original results.

GOLDSTEIN: And that's a little hard for him to deal with.

KESTENBAUM: So do you think you were wrong?

NOSEK: No. No, I wasn't wrong.


KESTENBAUM: The replication was wrong, you think.

NOSEK: Well, it may not be wrong entirely either.


NOSEK: So I'm willing to grant that (laughter). So in fact, this is - I mean, I am in a tight box. We gave them access to the way in which we collected the data. They analyzed it the same way, and the results came out differently. So you know, this is - and this is the rub, right. I am invested in that original result.

GOLDSTEIN: Nosek told us after those results came out, nobody, not one researcher, raised their hand and said - oh, I guess that means I was wrong. A lot of people said no, no, the problem was with the replication.

KESTENBAUM: And Nosek says in some cases the problem was with the replication. But all of them? No way.


GOLDSTEIN: That ESP study, by the way, a bunch of researchers tried to replicate it and could not. But the original researcher, he says he's got this paper he's working on where he pulled 90 ESP studies, and he says you put them all together. You can see something there.

Our show today was produced by Jess Jiang. Also, a special thanks to Anna Dreber at the Stockholm School of Economics. She is one of the people running that economics replication project. She's promised to let us know when the results are available.

KESTENBAUM: And we'll let you know.

Send us an email - planetmoney@npr.org.

GOLDSTEIN: And if you're looking for another show to try, check out Alt.Latino, hosted by the great Jasmine Garsd, who co-hosted PLANET MONEY last week, a great show about human smuggling across the border. Alt.Latino is a music show. It's a Latin music show. You can find it at npr.org/podcasts or on the NPR One app. I'm Jacob Goldstein.

KESTENBAUM: And I'm David Kestenbaum. Thanks for listening.


SVARE AND MICHAELS: (Singing) We are going to run...

KESTENBAUM: Can I tell you about two experiments I tried when I was a kid?


KESTENBAUM: First one - when I was little, I was on a bicycle. You know, it's hard to turn the wheel because of the angular momentum when you're moving? I was, like - what happens if I really try and turn it?

GOLDSTEIN: (Laughter) Baby Kestenbaum, a tiny scientist at work.

KESTENBAUM: It was brutal. I went head over heels.

The other one is I spent a whole afternoon trying to move a crumpled up piece of paper with just my mind.

GOLDSTEIN: Did you move it?

KESTENBAUM: I didn't publish.

GOLDSTEIN: (Laughter).

KESTENBAUM: I really thought it might happen. I love it.

GOLDSTEIN: It's not sad to me. Other people may find it so. To me, not.

KESTENBAUM: The sad part is that the paper didn't move.


SVARE AND MICHAELS: (Singing) We're one step closer to our dreams. They're colored bright like autumn dreams. Not looking back, we're not afraid to fly. We're forever young, and we'll never die. We're one step closer to our dreams that burn as bright as we can see...

Copyright © 2016 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.

NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.