In a video released today at Edge.org, psychologist Simone Schnall raises interesting questions about the role of replication in social psychology and about what counts as "admissible evidence" in science.
Schnall comes at the topic from recent experience: One of her studies was selected for a registered replication project, and the replication attempt failed to find the effect she had originally reported.
An occasional failure to replicate isn't too surprising or disruptive to the field — what makes Schnall's case unusual is the discussion that ensued, which occurred largely on blogs and social media. And it got ugly.
In the new video — from a talk she gave to a group of social scientists in September — Schnall considers the different levels of scrutiny received by different researchers and for different findings. She draws on a legal distinction made by Herbert Packer in 1964 between "due process" models of law, in which the burden of proof is very high and the focus is on avoiding wrongful convictions, and "crime control" models of law, in which the burden of proof is much lower and the aim is to prevent any perpetrators from slipping through.
"Crime control comes from crisis," says Schnall. The U.S. Patriot Act might be seen as a legal example and the "replication police" as a scientific example stemming from social psychology's so-called replication crisis. If the culture of replication in scientific psychology is adopting a crime control model, then results that are false positives are the culprits, with a low standard of evidence required to bring a published result under suspicion.
Schnall's legal analogy raises a more general point. There are different kinds of errors in science: type I errors, or false positives, and type II errors, or failures to detect real effects. Norms for acceptable risk are determined by statistical practices within the field, such as considering a finding statistically significant only when the probability of observing a result at least that extreme by chance alone — that is, if there were no real effect — is under 5 percent. If this threshold were lower, type I errors would decrease, but type II errors would become more common. If it were higher, type II errors would decrease, but type I errors would become more common.
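To make that trade-off concrete, here is a minimal simulation sketch — not drawn from Schnall's talk or from any replication project; the sample sizes, effect size and thresholds are illustrative assumptions. It runs many simulated two-group studies, some with no real effect and some with a real one, and counts how often each significance threshold produces a false positive (type I error) or a missed effect (type II error).

```python
# Illustrative simulation: lowering the significance threshold (alpha)
# reduces false positives but increases missed effects, and vice versa.
# All numbers below are assumptions chosen for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000   # simulated studies per condition
n_per_group = 30         # participants per group in each study
true_effect = 0.5        # standardized effect size when an effect really exists

def p_values(effect):
    """Simulate many two-group experiments and return one p-value per experiment."""
    group_a = rng.normal(0.0, 1.0, size=(n_experiments, n_per_group))
    group_b = rng.normal(effect, 1.0, size=(n_experiments, n_per_group))
    return stats.ttest_ind(group_a, group_b, axis=1).pvalue

p_null = p_values(0.0)          # no real effect: "significant" results are false positives
p_real = p_values(true_effect)  # real effect: non-significant results are misses

for alpha in (0.10, 0.05, 0.01):
    type_i = np.mean(p_null < alpha)    # false-positive rate under the null
    type_ii = np.mean(p_real >= alpha)  # miss rate when the effect is real
    print(f"alpha = {alpha:.2f}: type I ~ {type_i:.3f}, type II ~ {type_ii:.3f}")
```

Running this shows the pattern the paragraph describes: as alpha drops from 0.10 to 0.01, the fraction of spurious "discoveries" shrinks while the fraction of genuine effects that go undetected grows.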
With the failed replication of the Schnall study, part of what made the fallout so ugly was the sense that scientists, rather than studies, were on trial. You see, when it comes to evaluating our scientific peers, it's not so clear how to assess acceptable risk. A culture of trust might let in too many bad apples; a culture of suspicion might reject too many good eggs — not to mention create a hostile climate that's unlikely to make for the best science.
As in criminal law, the cost-benefit analysis needs to consider more than just the total number of errors. The scientific community also needs to consider the risks associated with different kinds of errors, and to establish corresponding community norms. That may be what's really at stake — not the reality of a handful of psychological findings, but the culture in which psychological science is pursued.
You can keep up with more of what Tania Lombrozo is thinking on Twitter: @TaniaLombrozo.