Jennifer Golbeck: What Can Companies Predict From Your Digital Trail? Do you like curly fries? Have you Liked them on Facebook? You might be sharing more information than you realize, says computer scientist Jennifer Golbeck.
What Can Companies Predict From Your Digital Trail?

GUY RAZ, HOST:

When you think about your life inside the screen, your digital doppelganger, most of it's about your tweets or the photos and status updates you post, the things that go into the persona you create and have some control over. But what about all the things you don't control? Because almost everything you do online is a data point helping to build a virtual profile of who you might be.

JEN GOLBECK: A very extensive profile that is not just how smart are you, but what are your personality traits?

RAZ: This is Jen Golbeck. She's a computer scientist at the University of Maryland. And Jen, she actually creates the algorithms that can start to figure out who you are online. And before I go on, just a disclaimer that Jen does this for research purposes only. But other people and companies, they're doing it for entirely different reasons. Here's Jen's TED talk...

(SOUNDBITE OF TED TALK)

GOLBECK: As scientists, we use that to help the way people interact online, but there's less altruistic applications. And there's a problem in that users don't really understand these techniques and how they work. And even if they did, they don't have a lot of control over it.

So this is Target, the company. Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset (laughter). So how did Target figure out that this high school girl was pregnant before she told her parents? It turns out that they have the purchase history for hundreds of thousands of customers. And they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that, not by looking at, like, the obvious things, like, she's buying a crib or baby clothes, but things like she bought more vitamins than she normally had or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights.

(SOUNDBITE OF MUSIC)

RAZ: And then when it starts to all add up, the computer just spits out an answer.

GOLBECK: Yeah, and that's the interesting thing about this. It's these statistical correlations that come out when you analyze the data of a huge number of people. Because people buy all those things all the time, but buying them together turns out to be a pretty rare occurrence except among women who are in early stages of their pregnancy.
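
The co-occurrence idea Golbeck describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shopper data is invented, "coffee" is a made-up filler item, and `rate_given` is a hypothetical helper, not Target's method or data. It shows how a combination of ordinary purchases can be a stronger signal than either purchase alone.

```python
# Toy, invented data: each shopper is (items bought, is_pregnant).
# This is NOT Target's data or method, only an illustration of co-occurrence.
SHOPPERS = [
    ({"vitamins", "large handbag"}, True),
    ({"vitamins", "large handbag"}, True),
    ({"vitamins", "large handbag", "coffee"}, True),
    ({"vitamins"}, False),
    ({"vitamins", "coffee"}, False),
    ({"large handbag"}, False),
    ({"large handbag", "coffee"}, False),
    ({"coffee"}, False),
    ({"vitamins", "large handbag"}, False),
    ({"vitamins"}, False),
]


def rate_given(items, shoppers=SHOPPERS):
    """Among shoppers who bought every item in `items`, return the share labeled True."""
    labels = [label for basket, label in shoppers if items <= basket]
    if not labels:
        return None  # nobody bought this combination
    return sum(labels) / len(labels)


if __name__ == "__main__":
    print("vitamins only:", rate_given({"vitamins"}))
    print("handbag only:", rate_given({"large handbag"}))
    print("both together:", rate_given({"vitamins", "large handbag"}))
```

In this toy data each item alone is a weak hint, while the two together match a much higher share of the labeled shoppers. A real system would need far more data and a proper statistical model.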

RAZ: Now, predicting a pregnancy may not sound that complicated, right? But what is actually happening and what people like Jen have designed are algorithms that are starting to predict human behavior simply based on the things we like or buy or who our online friends are. Things like...

GOLBECK: How well you'll interact with others if we put you in a team, or are you a really anxious person who's prone to get angry? Are you extroverted or introverted? And that matters for, say, job applications. Are you a list maker or a procrastinator? Neuroticism, sexual orientation. Are you narcissistic? Whether you drink, smoke, use drugs, potential health conditions. Are you a good romantic partner? Are you going to stay married if you get married?

And it's not right all the time, but it turns out that, yeah, not only can we figure out things about you, we can figure out things that will eventually be true about you that you don't know yet.

RAZ: Like, I know this is happening, but every time I hear it, it shocks me. It's just amazing (laughter). It's scary that you can do this.

GOLBECK: When we try to do this, whatever personality trait we've picked, we've been very successful at uncovering that about people from the little digital traces they leave around and figure out what are you doing that's like other people that we know about?

RAZ: This is easier to understand if you realize that you're not explicitly telling companies that, say, you're an introvert. What you reveal about yourself when you go online is that you do things and like things and buy things that they've already figured out introverts do and like and buy.

(SOUNDBITE OF TED TALK)

GOLBECK: So my favorite example is from this study that was published this year in the Proceedings of the National Academies. And they looked at just people's Facebook likes. And in their paper, they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries.

(LAUGHTER)

GOLBECK: Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people and if you're young, you tend to be friends with young people. This is well-established for hundreds of years. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would've scored high on that test, and their friends saw it. And by homophily, we know that he probably had smart friends. And so it spread to them and some of them liked it. And they had smart friends, and so it spread to them. And so it propagated through the network to kind of a host of smart people so that by the end, the action of liking the curly fries page is indicative of high intelligence not because of the content but because the actual action of liking reflects back the common attributes of other people who have done it.
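
Golbeck's hypothesis about how a like spreads through a homophilous network can be sketched as a small simulation. Everything here is an assumption made for illustration: the eight-person network, the `HIGH_SCORER` labels, and the simplification that every friend of a liker also likes the page (in the talk, only some of them do). It is not the method used in the study she cites.

```python
# Hypothetical toy network; names and traits are invented for illustration.
HIGH_SCORER = {"A": True, "B": True, "C": True, "D": True,
               "W": False, "X": False, "Y": False, "Z": False}

# Homophily: most friendships connect people with the same trait.
FRIENDSHIPS = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
               ("W", "X"), ("X", "Y"), ("Y", "Z"),
               ("D", "W")]  # a single cross-group friendship


def build_friends(edges):
    """Turn an edge list into a person -> set-of-friends mapping."""
    friends = {}
    for a, b in edges:
        friends.setdefault(a, set()).add(b)
        friends.setdefault(b, set()).add(a)
    return friends


def spread_like(seed, hops, edges=FRIENDSHIPS):
    """Return everyone who has liked the page after `hops` rounds of spreading."""
    friends = build_friends(edges)
    likers = {seed}
    for _ in range(hops):
        newly = set()
        for person in likers:
            newly |= friends.get(person, set())
        likers |= newly
    return likers


def share_high(people):
    """Share of `people` labeled as high scorers."""
    return sum(HIGH_SCORER[p] for p in people) / len(people)


if __name__ == "__main__":
    print("base rate:", share_high(HIGH_SCORER))
    print("likers after 2 hops:", share_high(spread_like("A", 2)))
```

Because the like starts with a high scorer and travels mostly along same-trait friendships, the likers end up with a higher share of high scorers than the network as a whole, even though the page itself has nothing to do with the trait.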

RAZ: So here's the thing. I mean, all of us come across people that make assumptions about us, you know?

GOLBECK: All the time.

RAZ: And they're often wrong. And I worry these online profiles can't account for those moments when you're at your best or you really shine. And you don't have any control over the assumptions that these algorithms make.

GOLBECK: You absolutely cannot control this, what those algorithms find out about you. And that's the thing. We talk a lot about really carefully curating your online profile. But these algorithms, they don't care about the actual explicit things you've said. They look for these little patterns in things that you've done. And there's no way you can know what the algorithms will find out. But they can find out a lot, and quite accurately.

RAZ: Yeah, I mean, how do you feel about all this?

GOLBECK: I have a sort of dystopian view that I'm working to avoid, I guess. My job, what I spend my time doing, is building these algorithms that terrify me. What if you get fired from your job, not because of something you said, but because a social media profile says that you're going to be unreliable?

RAZ: Yeah.

GOLBECK: And we've been talking in a very U.S.-oriented context, right? But you think about sexual orientation. Sexual orientation is a personal trait that we're very good at predicting. And we can do it even if you do nothing. We can figure it out by looking at your friends. We can figure it out by your language, by your likes. All these different types of data reveal it. So we're very good at it. So we out someone in certain countries in Africa, and they go to jail.

RAZ: Or worse.

GOLBECK: They get executed, right? So let's say we just take everybody on Facebook in those countries in Africa that will execute you for being gay, and we run our algorithms on every person in that country, and we publish a list of everyone who's gay. I have just potentially killed a lot of people. I have ruined the lives of a lot of people just by running this artificial intelligence over their profiles.

RAZ: Wow, it's almost like the scientists who worked on the Manhattan Project and then, like, came to regret, you know, working on this thing that became a weapon. I mean, I don't know, do you ever think of yourself like that?

GOLBECK: I make that Manhattan Project analogy all the time. And I have to explain to people that I don't mean it as hyperbole, right? That, yeah, like, what I'm doing is not going to destroy a city, but it could destroy the lives of just as many people. So there is this potential, huge, life-changing impact of the technologies that I'm developing that really scares me. And we need to think about the impact of this and figure out ways to deal with it, both personally, societally, legally because we don't want those really terrible things to happen. And we can't go back and undo them once they start happening.

RAZ: Jen Golbeck. She's a professor at the University of Maryland. You can see her entire talk at ted.com. This episode, Screen Time, Part Two, how life inside the screen is changing who we are. I'm Guy Raz, back in a moment with the TED Radio Hour from NPR.

Copyright © 2015 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.

NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.