Drawing Phony Connections With Mismatched Metrics

The Spurious Correlations blog pairs unrelated data sets, such as the divorce rate in Maine and the consumption of margarine. Writer Tyler Vigen talks with NPR's Scott Simon about his useless analyses.

Copyright © 2014 NPR. For personal, noncommercial use only. See Terms of Use. For other uses, prior permission required.

SCOTT SIMON, HOST:

This is WEEKEND EDITION from NPR News. I'm Scott Simon. Our times have gone mad for metrics. Now, have you ever compared the number of honey-producing bee colonies in the United States with the marriage rate in Vermont or seen any correlation between the divorce rate in Maine and the consumption of margarine?

Well, that's what the Spurious Correlations blog tries to do by plotting completely unrelated sets of data. It's the project of Tyler Vigen. He's a geospatial intelligence analyst with the U.S. Army National Guard and a law student at the Harvard Law School. Thanks for being with us.

TYLER VIGEN: Hey, no problem. Thanks for having me.

SIMON: So one correlation I noticed immediately - a lot of people will. Can we say that the more movies Nicolas Cage appears in each year, the more people drown by falling into swimming pools?

VIGEN: While we can certainly say that those two variables vary together, I wouldn't say the one causes the other, though.

SIMON: I know, that's the whole idea - correlation versus causation. But between 1999 and 2009, there was an astonishing correlation between those two events.

VIGEN: There certainly was, but part of the problem is that that correlation is based on a very small set of variables for a very small set of numbers. And with that few number of variables, it's hard to see whether there's a statistically significant correlation or if there's just a coincidence, which - I think it's the latter.
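
[Editor's illustration, not part of the broadcast.] Vigen's point about small samples can be sketched in a few lines of Python. The sketch below is a minimal illustration under stated assumptions, not the method used on the blog: it fills 200 hypothetical, purely random series with 11 values each (one per year from 1999 through 2009), then searches every pair for the strongest Pearson correlation. The function names, the series count, and the seed are all invented for this example.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_pair(num_series=200, num_years=11, seed=0):
    """Return ((i, j), r) for the most strongly correlated pair of random series.

    Every series is independent Gaussian noise (an assumption made for this
    illustration), so any strong correlation found here is pure coincidence.
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0, 1) for _ in range(num_years)]
        for _ in range(num_series)
    ]
    # Compare every pair of series and keep the one with the largest |r|.
    best = max(
        itertools.combinations(range(num_series), 2),
        key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])),
    )
    return best, pearson(series[best[0]], series[best[1]])


if __name__ == "__main__":
    pair, r = strongest_chance_pair()
    print(f"Series {pair[0]} and {pair[1]} correlate at r = {r:.3f}")
```

With only 11 points per series and nearly 20,000 pairs to choose from, the search is very likely to turn up a pair that looks strikingly related even though nothing connects the two series.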

SIMON: Reading your blog, I don't think I'm going to be eating cheese before bed.

VIGEN: (Laughter) Certainly. You're talking about the correlation between the per capita consumption of cheese and the number of deaths by people being tangled in their bed sheets.

SIMON: Yes.

VIGEN: Yes, indeed. And that's another one of those correlations that kind of happens by coincidence, where we see two variables that are varying together and there's not really a causal explanation for why.

SIMON: So are they all nonsense or do some of these get you to think?

VIGEN: Oh, the ones that I've posted so far have all been nonsense. I can imagine there are variables out there that get people to think. For example, about 100 years ago, there was a correlation between smoking and lung cancer. And research based on those correlations discovered the link between smoking and lung cancer. And that's been really important.

SIMON: Yeah. A lot of these correlations are - well, they surround mortality, don't they? Let me put it that way.

VIGEN: (Laughter) Certainly. Well, that's just a product of easily exportable data from the CDC. It was data that I could get my hands on really quickly when I was putting the project together. I'm interested in adding more variables later. I just haven't gotten to that part yet.

SIMON: You've heard the mashed potatoes correlation?

VIGEN: No, I haven't.

SIMON: I believe in this one utterly. One hundred percent of the people who eat mashed potatoes die.

VIGEN: That's true. (Laughter).

SIMON: Yes it is. Makes you think, doesn't it?

VIGEN: Certainly. That would be another form of correlation where we'd say, yes, but it's not statistically significant.

SIMON: Well, Mr. Vigen, forgive me, but don't eat cheese and go to bed, OK?

VIGEN: That sounds like a good idea.

SIMON: Tyler Vigen. His blog is called Spurious Correlations. Thanks so much.

VIGEN: Thanks for talking to me.


NPR transcripts are created on a rush deadline by a contractor for NPR, and accuracy and availability may vary. This text may not be in its final form and may be updated or revised in the future. Please be aware that the authoritative record of NPR's programming is the audio.
