The gold standard for establishing a causal relationship — between, say, a drug and some health outcome — is a randomized controlled trial, or RCT.
RCTs are powerful for a few reasons.
First, RCTs involve an experimental manipulation of the causal factor of interest. Rather than merely observing a correlation between, say, chocolate consumption and levels of happiness in a large and diverse population, an RCT involves an intervention on people's consumption of chocolate: People in one group are assigned to eat chocolate; people in another group are assigned to abstain. This ensures that if a correlation between chocolate consumption and happiness is observed, it's due to a causal relationship such that chocolate affects happiness, and not some other causal structure (for instance, that being happy leads to greater chocolate consumption, or that both chocolate consumption and happiness are caused by frequent socializing).
Second, RCTs involve randomization. That's important because all sorts of factors could influence the effect of interest. Chocolate might (or might not) affect happiness, but so might one's social environment, immediate stressors, health, genetic predispositions, and more. Assigning people to one condition or the other at random ensures that these additional sources of variation in happiness aren't systematically related to chocolate consumption, and therefore can't act as confounding factors.
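The logic of randomization can be sketched in a toy simulation (every number here is invented for illustration): suppose socializing drives happiness and, in the observational world, also drives chocolate eating. Comparing chocolate eaters to non-eaters is then biased, while random assignment recovers the true effect of chocolate, which in this sketch is zero.

```python
# A toy simulation with made-up numbers: "socializing" is a confounder that
# drives happiness. Observationally it also drives chocolate eating; random
# assignment breaks that link, so only the RCT comparison is unbiased.
import random

random.seed(0)
N = 10_000

social = [random.random() < 0.5 for _ in range(N)]  # the confounder

# Observational world: sociable people are more likely to eat chocolate.
obs_chocolate = [random.random() < (0.8 if s else 0.2) for s in social]
# RCT world: a coin flip decides who eats chocolate.
rct_chocolate = [random.random() < 0.5 for _ in range(N)]

# Suppose chocolate has NO real effect: happiness depends only on socializing.
happiness = [(2.0 if s else 0.0) + random.gauss(0, 1) for s in social]

def estimated_effect(chocolate):
    """Difference in mean happiness between chocolate eaters and abstainers."""
    treated = [h for h, c in zip(happiness, chocolate) if c]
    control = [h for h, c in zip(happiness, chocolate) if not c]
    return sum(treated) / len(treated) - sum(control) / len(control)

obs_effect = estimated_effect(obs_chocolate)  # spuriously large
rct_effect = estimated_effect(rct_chocolate)  # close to the true value, zero
print(round(obs_effect, 2), round(rct_effect, 2))
```

With these invented parameters, the observational estimate lands near 1.2 happiness points even though chocolate does nothing, while the randomized estimate hovers near zero.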
So RCTs are a gold standard for good reason: They help us avoid some of the pitfalls in going from observations to conclusions about how factors are causally related.
What isn't often appreciated, though, is how the notion of causation established by an RCT departs from the way we often use causal language in informal conversations. Consider three ways that our everyday use of causal language conflicts with what we might call the "RCT" notion of causation:
1. Sometimes when we make causal claims, we're in the business of assigning moral responsibility. The actions of two drivers may have caused the collision in the sense that had either acted differently, the collision could have been avoided. But if one was driving carefully while the other ran a red light under the influence of alcohol, we're inclined to say the latter caused the accident. If we scale up this single event to something like an RCT, the conclusions can be counterintuitive. Imagine 200 situations in which a drunk driver runs a red light. Now we experimentally vary whether there's another driver crossing the intersection. Each of the 100 cases with a second driver ends in a collision; the 100 without do not. Do we conclude that the presence of the law-abiding driver is the cause of the collision? In one sense this is surely right; but in another it gets something deeply wrong.
2. Sometimes when we make causal claims, we care about the mechanism by which the cause brings about the effect, not merely that it increases its probability. To illustrate, imagine the following hypothetical scenario: Two thousand people with a terminal illness are randomly assigned to one of two groups. In the treatment group, participants take a new drug for their illness that's feared to increase the risk of skin cancer as a side effect. In a control group, participants take a placebo. After the trial period, the study finds that the treatment group indeed has higher rates of skin cancer than the control group. It sounds like a causal relationship has been established: The new drug causes skin cancer. But now suppose we learn the following: The new drug in fact causes forgetfulness. Forgetfulness meant that those in the treatment group were more likely to forget to put on sunscreen. Failing to wear sunscreen is what led to the elevated levels of skin cancer. Again, there's some sense in which the drug did cause an increase in skin cancer, but it also seems misleading to simply say that it causes skin cancer: What it does is prevent people from preventing skin cancer.
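The causal chain in this second example can be made explicit in a small sketch (the risk numbers are invented for illustration): the drug has no direct pathway to skin cancer; its entire effect runs through a mediator, forgetting sunscreen.

```python
# A hypothetical model of example 2's causal chain, with invented risks:
# drug -> forgetfulness -> no sunscreen -> elevated skin cancer risk.
def forgets_sunscreen(takes_drug: bool) -> bool:
    # Assumption for illustration: the drug causes forgetfulness.
    return takes_drug

def skin_cancer_risk(takes_drug: bool) -> float:
    base = 0.05
    # There is no direct drug -> cancer term in this toy model; the whole
    # effect flows through the mediator (skipping sunscreen).
    mediated = 0.10 if forgets_sunscreen(takes_drug) else 0.0
    return base + mediated

print(round(skin_cancer_risk(True), 2))   # 0.15: treatment group
print(round(skin_cancer_risk(False), 2))  # 0.05: control group
```

The RCT sees only the difference between the two printed rates; it can't tell this mediated structure apart from a drug that damages skin directly.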
3. Sometimes when we make causal claims, we care about capturing a feature of the whole population we're making a claim about, not just a subset. Suppose 1,000 men and women in our treatment group take a drug designed to decrease depression; those in the control group take a placebo. We find that the former group experiences significantly fewer symptoms of depression. It appears that taking the drug causes depressive symptoms to decrease! But a finer-grained look reveals that the effect is driven entirely by women: For the men, taking the drug had no effect. Although the original RCT supports the claim that taking the drug has a causal effect on depressive symptoms, the statement doesn't apply equally to the whole population. The unqualified statement ("taking the drug causes depressive symptoms to decrease") seems to get something wrong.
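This third scenario can be sketched with made-up numbers: the pooled comparison shows a real average effect, yet the effect comes entirely from one subgroup.

```python
# A toy sketch of example 3 with invented numbers: average change in
# depressive symptoms (negative = improvement) by group and sex.
mean_change = {
    ("drug", "women"): -4.0,
    ("drug", "men"): 0.0,
    ("placebo", "women"): 0.0,
    ("placebo", "men"): 0.0,
}

def effect(sexes):
    """Treatment-minus-control difference, averaged over the given sexes."""
    treated = sum(mean_change[("drug", s)] for s in sexes) / len(sexes)
    control = sum(mean_change[("placebo", s)] for s in sexes) / len(sexes)
    return treated - control

print(effect(["women", "men"]))  # -2.0: pooled, the drug appears to work
print(effect(["men"]))           # 0.0: for men alone, no effect at all
```

Both printed numbers are correct answers to different questions, which is exactly why the unqualified causal claim misleads.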
These examples are just three among many. They illustrate some of the subtleties in the way we talk about causation — subtleties that have been recognized by psychologists and philosophers, but that aren't necessarily captured by the notion of causation established by an RCT. This isn't a surprise — lots of terms have multiple senses, some informal, some technical, and some a mix of both. But it does come with an important caution: We need to be careful in how we interpret the results of RCTs, and in what we infer from informal causal claims. And maybe, to help us get things right, we need a more precise way to talk about causation.
Tania Lombrozo is a psychology professor at the University of California, Berkeley. She writes about psychology, cognitive science and philosophy, with occasional forays into parenting and veganism. You can keep up with more of what she is thinking on Twitter: @TaniaLombrozo