Episode 669: A or B : Planet Money

These days A/B testing is everywhere. It's shaped almost every website, some stores and even some school lessons. Today, the most meta episode ever. Planet Money A/B tests a show about A/B testing.



Back in 2007, when Barack Obama was running for president, he stopped by Google headquarters. And as sort of a joke, they asked him one of Google's notoriously tough interview questions.


ERIC SCHMIDT: What is the most efficient way to sort a million 32-bit integers?


HENN: Obama might have been fed the answer.



SCHMIDT: I'm - maybe I - I'm sorry...

OBAMA: No, no, no, no no, no...

SCHMIDT: Maybe we should...

OBAMA: I think...

SCHMIDT: That's not a...

OBAMA: I think the bubble sort would be the wrong way to go.


HENN: Obama had talked about using big data to make government better. Dan Siroker, this engineer sitting in the back of the audience, was smitten. For Dan, Obama's spiel was powerful stuff.

DAN SIROKER: And I remember the last thing he said when he came to Google was I want you to be involved. And I took him literally. Two weeks later, I flew to Chicago, in the dead of winter, joined the campaign as a volunteer.

HENN: Dan knew nothing about campaigns, but he did know computers. And he saw this problem with Obama's website. There was a little button that said sign up for email at the bottom, but most people didn't click.

So, he says...

SIROKER: Hey, what if we could improve this bottleneck, this percentage of people who sign up on our email list? So I ran an experiment on the barackobama.com splash page.

HENN: He created two versions of this website's page - version A and version B. Those two sites were almost exactly the same, but there was this one little difference. One button said sign up. The other button said learn more. And then, Dan flipped a switch. Suddenly, half of the visitors to the site saw version A with the sign up button, and the other half saw version B, the one that said learn more.
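The 50/50 split Henn describes can be sketched in a few lines of Python. This is only an illustration, not how the campaign's site worked: the function name `assign_variant`, the experiment label and the visitor IDs are all hypothetical, and hashing the visitor ID is just one common way to make sure a returning visitor keeps seeing the same version.

```python
import hashlib

# Button text for each version, as described in the episode.
BUTTON_TEXT = {"A": "Sign Up", "B": "Learn More"}


def assign_variant(visitor_id: str, experiment: str = "signup-button") -> str:
    """Deterministically place a visitor in version A or B (roughly 50/50).

    Hypothetical helper: hashing is an assumption made for this sketch,
    not something the episode describes.
    """
    # Hash the experiment name together with the visitor ID so the same
    # visitor always lands in the same group for this experiment.
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Treat the parity of the first byte as a coin flip.
    return "A" if digest[0] % 2 == 0 else "B"


if __name__ == "__main__":
    counts = {"A": 0, "B": 0}
    for i in range(10_000):
        counts[assign_variant(f"visitor-{i}")] += 1
    # The two groups should come out close to equal in size.
    print(counts)
```

Because the assignment depends only on the visitor ID, no per-visitor state has to be stored to keep the experience consistent.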

SIROKER: The learn more button had a sign-up rate of about 8.9 percent, which was much better than the original, around seven and a half.
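To put Siroker's two numbers side by side, here is a small Python sketch that separates the gain in percentage points from the relative lift. The rates are the rounded figures he quotes; the function name `relative_lift` is made up for this illustration.

```python
def relative_lift(baseline_rate: float, variant_rate: float) -> float:
    """Return the variant's improvement as a fraction of the baseline rate."""
    return (variant_rate - baseline_rate) / baseline_rate


baseline = 0.075  # "around seven and a half" percent, the original button
variant = 0.089   # "about 8.9 percent", the learn more button

point_gain = (variant - baseline) * 100          # gain in percentage points
lift = relative_lift(baseline, variant) * 100    # gain relative to baseline

print(f"{point_gain:.1f} percentage points, {lift:.1f}% relative lift")
```

With these rounded inputs the gain is about 1.4 percentage points, or a relative lift of roughly 18.7 percent over the original button.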

HENN: But for Dan, this was a big success, and he kept testing. He ran a little test on the picture at the top of the page - one picture versus another, A versus B. He tested the size of the button, the color. And by the time he was finished with all these little A/B tests, the sign-up rate on Obama's website was up more than 40 percent. Dan's little test had changed the campaign.

Hello, and welcome to PLANET MONEY. I'm Steve Henn.


And I'm Robert Smith. A/B tests, like the one that Dan did - A/B tests are everywhere these days. Major websites, in stores, in classrooms - everyone's trying to figure out what you like better, version A or version B. In fact, this very podcast that you're listening to right now - this podcast was A/B tested.

HENN: Today on the show, we give people what they want - or we think they want - journalism by the numbers.

And Robert (laughter), you know what? That phrase - hello and welcome to PLANET MONEY - it doesn't test well with the audience.

UNIDENTIFIED WOMAN #1: Support for this episode of PLANET MONEY comes from Casper. They're an online retailer for mattresses. Casper mattresses are American-made and obsessively engineered for comfort. They use two technologies, latex foam and memory foam, to give just the right amount of sink and bounce, and they have a risk-free trial. You can try out your Casper mattress for 100 days with free delivery and returns. It's outrageous comfort at a polite price. So go to casper.com/planet to check out their options. And they have a special offer for listeners of this podcast. Use the promo code planet to redeem $50 towards a Casper mattress that works for you.

SMITH: All right, so this is going to get a little bit meta here, but stick with me. The way this podcast started, you know, the little story about Barack Obama.


OBAMA: I think the bubble sort would be the wrong way to go.


SMITH: That was not the only way we came up with to start this podcast. In fact, we played two different introductions to this podcast to 42,000 unsuspecting listeners, and we watched them very carefully to see which introduction worked better.

So, I want you to imagine that this podcast is starting all over again. Clear your mind. Pretend like you just heard some stamps.com ad, and you're about to start the PLANET MONEY podcast with version B.

HENN: Since modern companies were created more than 100 years ago, most have been organized as these hierarchies. You get hired as a young kid straight from school, and you have a boss. Chances are your boss has a boss. And if you come up with some new idea and someone up this chain of bosses doesn't like it, well, the biggest boss wins.

SIROKER: These decisions, typically in organizations, get made using a method known as the HiPPO syndrome - the highest-paid person's opinion.

HENN: Dan Siroker says there is a problem with the highest-paid person theory, this HiPPO method.

SIROKER: Very rarely is the HiPPO opinion the most effective.

HENN: But, you know, this is the way most businesses work. It's the way the world works. You know, if I get into a fight with my editor about the best way to start this story, my editor's probably going to win. And for years...

SMITH: I'm going to fade out this introduction a little bit early because it sort of goes into some of the stuff that we talked about in the first introduction. So there you have it. You have two versions. Version A is the little story about Barack Obama.

HENN: And then there was version B, the big idea, a story about how bosses usually win most arguments and how A/B testing could change that.

SMITH: Now you may already have a strong opinion on which of these two intros was better. I certainly have my opinion.

HENN: And I have my opinion, too. But the entire point of A/B testing is that my opinion and your opinion, Robert, don't matter. They're just two small data points. And what's important is what thousands and thousands of people think.

SMITH: And so we tested both intros. We found thousands and thousands of people willing to listen to one intro or the other and collected their data. The Internet makes this super easy. And in fact, we at PLANET MONEY are pretty late to the game. Newspapers, websites do this kind of testing all the time.

HENN: Like BuzzFeed - it's a popular website you may have heard of.

SMITH: (Laughter).

HENN: Every headline, every picture is tested.

JON MENSING: Absolutely.

HENN: Jon Mensing is an A/B testing expert at BuzzFeed. He's the reason you've taken so many online quizzes when you should have been working. The editors give him lists of possible headlines, and he makes sure that the one the readers share most, click on most - that that wins.

Jon doesn't just want to show me these headlines. He wants me to play a game. He wants me to guess which headline I think will work best.

MENSING: And I've got a big long list of these. This one's got three. This one is "33 Ways To Build The Ultimate Snow Fort," "33 Ways To Build The Best Snow Fort Ever" and "33 Ways To Build A Snow Fort You'll Want To Move Into."

HENN: I'm guessing version B.

MENSING: So your guess was "33 Ways To Build The Best Snow Fort Ever," but the winner was actually the third one, "33 Ways To Build A Snow Fort You'll Want To Move Into," which was about twice as good.

HENN: As my guess?

MENSING: Yes, it was.

HENN: Oh. Oh, that's discouraging.

This testing, it works. It drives traffic. The difference between a good headline and a bad headline can mean tens of thousands, even hundreds of thousands more views.

SMITH: And sometimes the stuff that Jon personally thought was awesome did not test as awesome.

HENN: When you first did tests like this, did it make you feel stupid?

MENSING: I would say I felt more emotionally invested in the results than I should have. And when...

HENN: (Laughter).

MENSING: ...You do an A/B test and you're just completely wrong about it, you know, it kind of hurts. But you get over it because that's not how it works. It's not about your ego. It's about what the data says and what your audience is telling you by what they're doing.

SMITH: And their audience is telling them that they like it. Buzzfeed.com gets 200 million visits a month. That is a huge number in journalism.

HENN: And we'd love to be able to do this in radio, but it's hard, right. You can't put one story on 89.9 FM and another on 91.3 and see which station has better ratings. It's too expensive. It's blunt.

SMITH: And even for podcasts, we do not have great data. We know, for instance, that you downloaded this episode. But we don't know if you ever clicked play. And we don't know if you're listening now. And we don't know if you stopped listening back when I said hello and welcome to PLANET MONEY, as apparently everyone does.

HENN: No, don't say that.

SMITH: We're not allowed to say that anymore.

HENN: Until recently, there was no way for us to test something as discrete as two different intros. But then, NPR built this - I have to say - fabulous new app called NPR One.

SMITH: I have it right here on my phone.



UNIDENTIFIED WOMAN #2: You're listening, listening...




SMITH: We often describe it as Pandora for public radio because it feeds you all these great NPR stories and NPR podcasts like this one. And one of the great things about NPR One that makes it different than any radio you've ever listened to is that there is a little button in the corner of your screen that is a skip button. If you get bored or you don't want to hear something, you can just press that button and skip it. Here, watch.


KORVA COLEMAN: From NPR News in Washington, I'm Korva Coleman.

SMITH: Press the skip button.


STEVE INSKEEP: Let's focus next on the aftereffects of terror attacks.

SMITH: Press the button and skip it.

HENN: Ah, too much terrorism now.


DAVID GREENE: Let's just say it - you would not expect this scene in a conservative Muslim country. We're going to an annual convention for women video...

SMITH: Skip.

HENN: Robert...


HENN: I wanted to listen to that. You're terrible.

SMITH: Now for the purposes of A/B testing, the thing that we love about NPR One is that it gathers all this anonymous data on when people skip stories or when they push a little button that says they liked the story, and we can access that data.

HENN: You can literally see, when you're telling a story, if a whole bunch of people get bored. You can see hundreds of people hit the skip button at the same time. So we went to Sara Sarasohn - she's this great editor at NPR, and she runs the content side of NPR One. And we asked her to help us out.

Have you ever A/B tested part of a story before?

SARA SARASOHN: (Laughter) This is the very first time we have A/B tested a story before. And as a person who makes radio, I have found this to be really, really interesting and exciting to do.

HENN: So Sara took both of those intros we played for you, the little story, version A and the big idea, version B. And she played them for more than 42,000 people listening on the NPR One app. Half got version A. Half got version B. And while those people were listening, the app sent back data. How many people skipped? How many people listened through to the very end?

SMITH: Sara told us it would take a couple of days to test these intros, and I have to say I started to get a little bit nervous. I had put myself, a trained editor of radio, on the line saying I think this one's good.

HENN: Yeah, it takes this thing that we've always sort of thought of as an art, and it makes it a science where there's a right answer and a wrong answer. Is this intro better, or is that? So I went out to find other places where A/B testing is becoming routine. I visited this little shoe store in Chicago. It started as an online store. It's called BucketFeet.

LAURA SHUE: The first thing I'll say is that we call our stores studios, so this is the Bucktown studio.

HENN: Laura Shue manages these stores, and she is really into data. Recently, she just installed this system at the store in Chicago that lets her run A/B tests in physical stores.

SHUE: We have six cameras - two in the front, two in the middle, two in the back.

HENN: I see the two up there. There's one right in the center of the store.

SHUE: There's actually three in the front, so the one right up front in the center measures incoming traffic.

HENN: Laura is running A/B tests all the time. If she wonders - should the kids' shoes go in the front of the store or the back of the store? - she tests it. And then she watches to see how people respond. She gets all this data. She opens up her computer and shows me. She even has videos.

Can you describe what we're seeing now?

SHUE: Yeah, so we just saw a customer walk up, looks like she glanced at kind of the artist description, maybe touched one of the magnets and then kind of wandered away.

HENN: The computer automatically draws circles around each person as they walk through the store, tracks their movements, logs the movements into a spreadsheet. It records how long they linger in front of a display.

When you were talking about online tracking, you said sometimes it verges on...

SHUE: (Laughter).

HENN: ...Creepy. Do you ever feel that way about this?

SHUE: This is very creepy. I think that's part of my hesitation in watching a lot of these videos. But, you know, if you have a specific question to answer, it's very useful information.

HENN: So to creep or not to creep? - a classic A/B question. And by the time I had finished watching shoppers in Chicago, I got a call from Sara Sarasohn. The data from NPR One was back. We had our answer. What is the best way to start this podcast?

Who won? Was it the big idea lead, which was kind of my emotional favorite, or was it the little story about the Obama campaign?

SARASOHN: It was the little story about the Obama campaign by a mile in terms of our statistics. There was just no question, Steve. I'm sorry. There is no way you can rearrange this data.



SMITH: Should I note here that we could have saved a lot of time if you had just listened to me, the expert, the editor?

HENN: (Laughter) Yeah, I guess. In my version, almost 9 percent of the people skipped at some point in the story. In the version you liked best, just under 7 percent of folks were skipping.
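One way to check whether a gap like this could be chance is a two-proportion z-test, sketched below in Python. The inputs are assumptions rounded from what the hosts say, not NPR One's actual data: about 21,000 listeners per version (half of roughly 42,000), a 7 percent skip rate for version A and a 9 percent skip rate for version B. The function name `two_proportion_z_test` is hypothetical, and the episode does not say which statistics Sarasohn's team used.

```python
import math


def two_proportion_z_test(events_a: int, n_a: int, events_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference between two rates."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    # Pooled rate under the null hypothesis that both versions skip equally.
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed counts, rounded from the figures quoted in the episode.
listeners_per_version = 21_000
skips_a = round(0.07 * listeners_per_version)  # little story, about 7 percent
skips_b = round(0.09 * listeners_per_version)  # big idea, about 9 percent

z, p = two_proportion_z_test(skips_a, listeners_per_version,
                             skips_b, listeners_per_version)
print(f"z = {z:.2f}, p = {p:.2e}")
```

Under these assumed counts the z statistic lands above 7, far beyond the usual cutoff of about 1.96 for a 5 percent significance level, which fits Sarasohn's description of the result as winning "by a mile."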

SMITH: I know that sounds like a small difference, but if you have hundreds of thousands of listeners, getting a bump of 2 percent - I mean, that means a lot.

HENN: It's a big deal. But this data also raises questions that sometimes you just can't answer. I mean, once we got people hooked in both versions, they mostly listened to the end - except for this one moment that I mentioned before.

SARASOHN: I cannot tell you why, but as soon as you said hello and welcome to PLANET MONEY on both of these things, people started bailing like crazy.

SMITH: And Steve, as the editor, I am still not sure what to think of that. In a weird way, once you start A/B testing, it's kind of addictive. It makes you want to say, like, oh, hello and welcome to PLANET MONEY isn't working. How about it's PLANET MONEY or this is PLANET MONEY. Which one of those tests better? Or maybe we don't say PLANET MONEY at all. Maybe that tests better. And as editor, you start to think - where does this stop? I mean, maybe if Stacey Vanek Smith had hosted this podcast, we would have gotten...

HENN: Hey.

SMITH: ...Two percent more listeners.


SMITH: You don't know.

HENN: And you could A/B test everything. You could A/B test our story choices. Do we not do stories on Chinese currency and the Fed because they don't test well? Or should we sneak in the music when we're trying to make a big, important point at the end of the podcast? Should we test how the podcast ends?

SMITH: Yeah, do you want to tie it up in a big bow? Or should we just bail?


HENN: I don't know. I think we should test that ending. I don't think it worked.

SMITH: Yeah, I don't think so either. You can email us at planetmoney@npr.org. Or tweet at us - we'd love to hear from you - @planetmoney. I'm @radiosmith.

HENN: I'm @hennseggs. Our episode today was produced by Jess Jiang. I'd also like to thank Eric Westendorf from LearnZillion and the folks at RetailNext, who helped me find people using A/B testing in the real world.

SMITH: Our show was edited, in part, by you and the fine folks at NPR One. Thanks, Sara.

And now that this episode is ending, may we recommend another fine podcast from NPR? It is called Mic Check. Frannie Kelley and Ali Shaheed Muhammad from A Tribe Called Quest talk to the biggest names in hip-hop. It's great. You can find Mic Check at npr.org/podcasts and on the NPR One app. You will not want to skip this one. I'm Robert Smith.

HENN: I'm Steve Henn. Thanks for listening.


Copyright © 2015 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.

NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.