I always hated statistics. I mean really, really, really hated it.
Recently though, I've had a change of heart about the subject. In response, I find statistics changing my mind, or at least changing my perspective.
Let me explain.
When I was an undergraduate physics major, lab classes were mandatory. One of the most important parts of lab was doing error analysis — and that meant applying basic statistical ideas like calculating averages and measures of variability (like standard deviations).
After a few weeks of this, I happily changed my major from physics to math-physics. The latter, I learned, came mercifully without the lab and its statistics requirement.
The problem for me wasn't doing the statistical calculations. They were OK. Instead, it was the idea of statistics that bummed me out. What I loved about physics were its laws. They were timeless. They were eternal. Most of all, I believed they fully and exactly determined everything about the behavior of the cosmos.
Statistics, on the other hand, was about the imperfect world of imperfect equipment taking imperfect data. For me, that realm was just a crappy version of the pure domain of perfect laws I was interested in. Measurements, by their nature, would always be messy. A truck goes by and jiggles your equipment. The kid you paid to do the observations isn't really paying attention. The very need to account for those variations made me sad.
Now, however, I see things very differently. My change of heart can be expressed in just two words — Big Data. Over the last 10 years, I've been watching in awe as the information we have been inadvertently amassing has changed society for better and worse. There is so much power, promise and peril for everyone in this brave new world that I knew I had to get involved. That's where my new life in statistics began.
The whole point of Big Data is to understand how to quickly and intelligently sift through petabytes of information and extract relationships. That means applying statistics-based methods to the numbers, names and other quantities that are what we mean by "The Data."
But to get anywhere with Big Data, I need to learn everything I can about statistics as fast as I can. My first refresher and guide in this effort has been the Coursera course taught by Matthijs Rooduijn and Emiel van Loon of the University of Amsterdam. So far, I've only made it through the first week of their online lectures, but my Platonic-oriented mind is already being retuned. The thing that's really getting to me is pretty simple, so I hope you'll excuse my naïve enthusiasm.
The issue is the world that's out there, independent of us. With my Platonic-theoretical-physicist glasses on, I have always been happy to claim that we already know the exact laws governing that independent world. But really, a claim like that is kind of bull. The real, independent world is way more complex than my theoretical physics equations can handle. This is particularly true when it comes to biology or, even more to the point, human society with its economy and culture and politics and elections.
So what can we do to understand the complexity of economies, cultures, politics and elections? We can take data. We can go out and measure whatever we can get our hands on. And it's right there that the light snaps on and vaults me past my old distaste for statistics.
The problem with taking data is you don't know what it's telling you. It's always only a partial representation of the thing you are trying to understand. That means there is only one way to make clear links between the data you have taken and the world you want to understand. You have to be very clear and very clever about interrogating the data. You have to develop methods, statistical methods, that extract answers you can trust.
Even more important, you need methods — statistical methods — for knowing exactly what the limits of trust are. Without these methods we would literally be lost. We'd be unable to see what data to take, what that data can tell us and when the data can't tell us anything at all.
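To make "limits of trust" a little more concrete, here is a minimal sketch of the kind of error analysis those lab classes asked for: an average, a measure of variability and a standard error that says how far to trust the average. The readings are made-up numbers, the function name `summarize` is just an illustration, and the "two standard errors" range is a rough rule of thumb that assumes roughly normal measurement errors. With this few readings, a careful analysis would give a wider range.

```python
import math
import statistics


def summarize(measurements):
    """Return the mean, sample standard deviation and standard error of the mean."""
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two measurements to estimate spread")
    mean = statistics.mean(measurements)
    spread = statistics.stdev(measurements)  # sample standard deviation (divides by n - 1)
    std_error = spread / math.sqrt(n)  # uncertainty in the mean itself
    return mean, spread, std_error


# Hypothetical repeated readings of the same quantity (invented for illustration).
readings = [9.8, 10.1, 9.9, 10.2, 10.0]
mean, spread, std_error = summarize(readings)

# Rough rule of thumb: the true value plausibly lies within about two standard errors.
low, high = mean - 2 * std_error, mean + 2 * std_error
print(f"estimate: {mean:.2f} +/- {std_error:.2f}")
print(f"rough range of trust: {low:.2f} to {high:.2f}")
```

The standard deviation describes how much individual readings jiggle around, while the standard error describes how well the average is pinned down. Taking more readings shrinks the second but not the first.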
Of course, what I'm saying will elicit a giant snooze from anyone who has thought even a bit about statistics and its use. But for us statistics-haters, the deeper philosophical basis of its methods in representing the world is worth consideration. That's because the effectiveness of all those algorithms creeping into every aspect of our lives hinges exactly on understanding the essential gap between the data we collect and the world it's meant to describe.
So now, finally, I can see the great range and beauty in the ideas behind statistics. Better late than never, at least on average.
Adam Frank is a co-founder of the 13.7 blog, an astrophysics professor at the University of Rochester, a book author and a self-described "evangelist of science." You can keep up with more of what Adam is thinking on Facebook and Twitter: @adamfrank4