Robots Are Now 'Creating New Robots,' Tech Reporter Says
The evolution of artificial intelligence has exploded over the past five years, leading to computers that can drive and talk. New York Times' Cade Metz explains how machines are learning on their own.
TERRY GROSS, HOST:
This is FRESH AIR. I'm Terry Gross. There's been a huge boom in the evolution of artificial intelligence over just the past five years due to more high-powered computers, new kinds of programs and an increasing pool of data to learn from. This has led to new developments such as driverless cars, machines that can carry on a conversation and robots that can interpret medical scans. Today, we're going to talk about this rapidly changing world of artificial intelligence and how these new high-tech developments may change our lives for better or for worse. I've also just described the beat that my guest, tech reporter Cade Metz, covers for The New York Times. He was formerly a staff writer at Wired magazine. He's currently at work on a book about the visionaries behind new developments in artificial intelligence.
Cade Metz, welcome to FRESH AIR.
CADE METZ: Thanks for having me.
GROSS: So you've been writing a lot about artificial intelligence, and you're writing a book about it, too, now. So let's talk about that. What is artificial intelligence? What do we mean now when we say it?
METZ: That's a really good question. And it's, on some levels, a hard question to answer just because people tend to throw around this term, and they have for a long time, to describe almost anything. And use of this term has accelerated in recent months, in recent years. Anything that is automated in some way or even any computing technology that's advancing the state of the art, even in a small or minor way, is described as artificial intelligence. So it creates all this confusion about what that means, right? That's a grandiose term. It has a long history. It dates back to the '50s when this group of academics got together to truly build artificial intelligence, a machine that would mimic the intelligence of a human.
But amidst all that hype, a lot of it which can be sort of just pushed aside, there's a very real change going on now, and it started about five years ago. And essentially, it's a fundamental change in the way that we build technology. What has changed now is that a certain type of computer algorithm - it's called a neural network - has started to work in ways that it did not work in the past. And what this algorithm allows us to do, allows those engineers to do is build systems that can learn tasks on their own.
So for instance, a neural network, if it analyzes millions of photos, it can learn to recognize faces and objects in those photos. Most of us have used Facebook. If you've ever used Facebook and you've posted a photo of friends and family, it can recognize those faces. Well, that's driven by one of these neural networks, which can learn that task on its own. We're also seeing these systems in speech recognition. So you can now bark commands into your cellphone. It can recognize what you say. It's used in machine translation where machines translate from one language to another.
GROSS: So let's back up. These are systems that can - what did you say? They can learn on their own.
METZ: Exactly. They're called neural networks. And that's a metaphor. They're meant to mimic the web of neurons in the brain. But really, they're just math. They're mathematical algorithms that analyze data. A group of millions of photos is just data, and these systems look for patterns in that data - the way a nose curves on a person's face, the way the lines of an eye come together. They can identify those patterns, the math of those visual patterns, for instance, and then learn to recognize what is in an image. In the same way, it can recognize patterns in voice data and learn to recognize what people say.
GROSS: So another example you've mentioned in your reporting is how there are now some computers that can read scans to see, for instance, if there's a nodule in a lung that might be a sign of cancer.
METZ: Exactly. It's the same basic technology. The same technology that allows Facebook to recognize faces on its social network can allow machines to analyze medical scans and recognize when there are signs of illness and disease. It can be applied to lung scans helping to identify cancer, retinal scans helping to identify signs of diabetic blindness. As time goes on, we're going to see machines improve in this capacity. And what they can kind of do is provide a first line of defense against these types of diseases. That's particularly useful in, say, the developing world, where you don't have as many doctors. You can use these types of systems to analyze the scans of patients that would otherwise require a human doctor.
GROSS: And so smart speakers, where you could tell the speaker what to play or what to turn on, that's also artificial intelligence, the same kind of learning pattern that you're describing.
METZ: It's exactly the same thing. So the - you know, the Amazon Echo, which sits on the coffee tables of so many people now, or Siri on the iPhone, they train these systems using real human speech, and they learn to recognize what you say. The next step is really understanding what you say. I think what you'll notice, and this shows the limitations of AI today, the next step is to really, truly understand what you say and act on it in a way that a human would. We're not there yet. These systems can recognize what you say and, on some level, understand. They can answer basic questions, respond to basic commands. But they can't really have a back-and-forth conversation the way you might expect them to, or you might want them to, or that might make it easier to really interact with machines.
GROSS: On a related note, like, when you talk to, say, Siri or to a smart speaker, how is the voice created that's responding to you? Is that actually a person's voice that's been programmed in such a way so that it can be responding to what you're saying? Or is that, like, a combination, like, a mashup of a lot of different human voices that have become this one robotic voice?
METZ: Well, it depends on which system you use. But for instance, Google's system is now, in much the same way, trained on real human speech. So it can - it has analyzed an enormous amount of human speech, and it can use that to create its own. This was just recently rolled into the new Google phones. And you can tell the difference. You know, when I was testing one of these phones late last year, I showed it to my 13-year-old daughter, who's an Apple iPhone user. She could hear the difference between her iPhone and this new Google phone. It sounded more like a human because it had indeed been trained on human speech.
It's the same basic technology - those neural networks that I was talking about. It recognizes those patterns, including the other - the way your voice may rise at the end of the sentence, or you may exhale in certain places. It duplicates those kind of things and gets a little bit closer to human speech. But I will say, and my daughter will say the same thing. It's not identical to human speech. It's still very limited. And that's the case with all these systems. As much as things have progressed in all these areas over the past five years, they still have a very long way to go to truly mimic the way you and I interact.
GROSS: What are some of the limitations that researchers are trying to overcome now?
METZ: Well, there are myriad limitations. For instance, researchers have realized that you can fool these systems into, say, seeing things that aren't there or to thinking they see things when they aren't really. That's a real issue if you talk to people who are kind of on the front lines of this technology. And it becomes a greater worry when you realize that these same technologies are helping in the development of the driverless cars that are being tested in Silicon Valley and Arizona and other places. In many respects, this is how these cars see and respond to what is around them. If you have flaws like that in a self-driving car or, let's say, a surveillance camera, that becomes a big issue.
GROSS: So you've described that these neural networks, the algorithms that are used for artificial intelligence that can learn to do things and kind of mimic human thought and behavior, they're meant to mimic the neurons in the brain. So what does that mean exactly? Like, how do you get algorithms to mimic the way the brain works?
METZ: This is a key question. So let me make it clear that is just a metaphor in many respects, right? This is an old idea. It dates back to the '50s and the late '50s when they first built these types of algorithms. And they were designed to mimic the brain. The reality is they can't mimic the brain because we don't even know how the brain works.
GROSS: Yeah. I was thinking that, too (laughter). Yeah.
METZ: So let's get to the point where we understand the brain completely. Then we can think about truly rebuilding the brain in digital form. That's an impossibility at this point. But essentially, you know, these algorithms are complex networks of mathematical operations. And each mathematical operation is referred to among those that build it as a neuron. OK. You're passing information between these mathematical operations. And it's - they're sort of constructed like a pyramid.
So you have, you know, neurons at the bottom, mathematical operations at the bottom of this pyramid. And it will start to recognize if you're, say, trying to identify a face in a photo. It'll recognize a line here or a line there. And then that information will bubble up to other mathematical operations that will then start to put those lines together with, say, an eye or a nose. As you work up that pyramid, you get to the point where all these operations are helping the system understand what a human face looks like.
This is not exactly the way the brain works. It's a very loose approximation. That needs to be understood if you're going to understand how these systems work. The problem is if you start talking about neural networks, you start assuming that the brain has been mimicked. And you said earlier in describing this that these systems are mimicking human thought. They are actually not mimicking human thought. We're still a very long way away from that.
What they're doing is they're - they are learning specific tasks, right? Identifying a photo is a task that these systems can learn and learn to do pretty well. In some cases, they can perform on par with a human. But we are complex biological systems that do a lot of things well beyond that. And machines are a very long way from mimicking everything that we can do.
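The "pyramid" of mathematical operations Metz describes can be illustrated with a very small sketch. This is only an illustration under stated assumptions, not how any production system is built: the function names, the four-value "image" and the hand-picked weights are all hypothetical, and a real network would learn its weights from data rather than have them typed in.

```python
import math

def neuron(inputs, weights, bias):
    """One 'neuron': a weighted sum squashed into the range (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def forward(pixels, layers):
    """Pass values up the stack: each layer's outputs feed the next layer."""
    values = pixels
    for weight_rows, biases in layers:
        values = layer(values, weight_rows, biases)
    return values

# Hypothetical hand-picked weights; a real network learns these from data.
LAYERS = [
    # Bottom of the pyramid: two simple detectors over four pixel values.
    ([[1.0, 1.0, -1.0, -1.0], [-1.0, -1.0, 1.0, 1.0]], [0.0, 0.0]),
    # Top of the pyramid: one neuron combining both detectors into a score.
    ([[2.0, -2.0]], [0.0]),
]

bright_left = forward([1.0, 1.0, 0.0, 0.0], LAYERS)[0]
bright_right = forward([0.0, 0.0, 1.0, 1.0], LAYERS)[0]
print(bright_left, bright_right)
```

With these particular weights the top neuron scores the first pattern above one half and the mirrored pattern below it, which is the sense in which simple low-level signals "bubble up" into a higher-level judgment.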
GROSS: So we should take a short break here, and then we'll talk some more. If you're just joining us, my guest is Cade Metz. We're talking about artificial intelligence and new breakthroughs in related technology. He's a tech reporter for The New York Times. We'll be right back. This is FRESH AIR.
(SOUNDBITE OF TODD SICKAFOOSE'S "TINY RESISTORS")
GROSS: This is FRESH AIR. And if you're just joining us, we're talking about artificial intelligence and other breakthroughs in high tech. My guest, Cade Metz, is a tech correspondent for The New York Times and is writing a book about artificial intelligence.
So, you know, getting back to voice recognition and trying to make computers conversational, tell the story of the - I don't know what to call it, a computer or what - that was being taught speech and ended up picking up a lot of like racist terms and Holocaust-denying expressions. And the company ended up discontinuing this.
METZ: That was an earlier Microsoft project called Tay. And it's kind of a famous example of these types of systems gone wrong. It was an earlier system. And it had a - it had a serious design flaw. In many cases, it was just designed to repeat what the person had said to it in some way. And so humans interacting with this service online quickly realized this was the case and they could kind of coax the system into saying these racist and xenophobic things. Microsoft immediately took it offline.
And since then, the technology has seriously improved. These systems that learn on their own have come to the fore. And Microsoft sees a real path towards creating a conversational system. The problem is that Tay, this bot designed and deployed in the past by Microsoft, has created a real conundrum, not only for Microsoft but for anyone else trying to build these conversational technologies.
The thinking is that if you can get these new systems out in front of people and they will interact with them on their own and generate more conversational data that these systems can use that data to train themselves in ever more proficient ways. But these companies are wary of putting them out there because they know they're going to make mistakes. They know that they're going to learn from those human biases and the data. They know they're going to offend people in the end. And, you know, these are big companies with brands and reputations to protect. And they see this as a real stumbling block to reaching the point where they can build truly conversational systems.
GROSS: As we were saying before, you know, computers have to like - robots have to learn in order to function. They first have to learn. And you write about reinforcement learning, how some bots like learn tasks through trial and error. And you describe a project that was a Google project creating a bot that could beat the world's best player at the game of Go. So how did that bot learn through trial and error and reward? It sounds like a really behavioral kind of approach to behavioral psychology - reward and punishment type of approach without the punishment.
METZ: Well, I mean, you could add the punishment, to tell you the truth.
GROSS: Oh, really?
METZ: This project was a project out of a lab in London called DeepMind, which is now owned by Google. They built a machine, like you said, to play the ancient game of Go, which is the Eastern version of chess. It dates back thousands of years. And it's a national game in China and Japan and Korea. Just a few years ago, even experts in the field of AI assumed it would be at least another decade before we could build a machine that could crack the game of Go just because it's so complex.
People like to say, including the designers of this machine at DeepMind, they like to say that there are more possible moves on a Go board than atoms in the universe. You can never build a system that could explore all the possibilities on the board and reach a conclusion. You just couldn't build a computer that was that powerful. But with these neural networks, what they were able to do is build a system that could learn to play Go on its own. And the way they did that was to supply it with millions of moves from professional human players.
Once it learned the game, they essentially created two versions of the system. OK. It's a decent player at this point. And they would pit the system against itself. And it would play millions of games against itself. This is the reinforcement-learning aspect. It learned which moves in these games of self-play - which moves were successful and which were not. And in that way, it reached a level that was well above any human player. They took it to Korea a couple of years ago, and they beat the man who was essentially the Roger Federer of the Go world over the past decade. And then they took it to China, and they beat the No. 1 player in the world handily. It's a system that really shows what is possible with these algorithms.
GROSS: In which part - you've programmed the bot to try to get more points and not lose points, and that's its goal. And so it learns that some decisions lead to gaining points and some decisions lead to losing points. And it tries to orient itself toward its goal of gaining points. Is that...
METZ: Exactly. And if you do that on an enormous scale, over millions of games, you learn which moves are going to gain you points versus which ones are going to lose points, which ones are going to get you closer to winning the game - having more points than your opponent - and which you're going to lose. The thinking is that you can then apply this type of reinforcement learning to the real world.
So, for instance, people have already started research where they try to train cars in this way. I mean, they are literally training cars in games. So, you know, you have racing games, you know, video games that teenagers play. If you can train this virtual car to play that video game with the same method - right? - certain moves mean more points or less, then you can eventually train a car to drive real roads. But that's certainly easier said than done. The real world is more complicated even than a video game, of course. And kind of transferring that knowledge that a system has learned in a game to the real world is a big step.
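The trial-and-error idea discussed above, where a system learns which moves tend to gain points, can be sketched in a few lines. This is a deliberately simplified stand-in (an epsilon-greedy "bandit" learner), not the method DeepMind used for Go. The win rates, the function name and every parameter here are hypothetical choices made for illustration.

```python
import random

def learn_by_trial_and_error(win_rates, rounds=5000, explore=0.1, seed=0):
    """Estimate the average points of each move by trying moves repeatedly."""
    rng = random.Random(seed)
    counts = [0] * len(win_rates)
    estimates = [0.0] * len(win_rates)
    for _ in range(rounds):
        if rng.random() < explore:
            # Occasionally try a move at random to keep learning about it.
            move = rng.randrange(len(win_rates))
        else:
            # Otherwise pick the move that has earned the most so far.
            move = max(range(len(win_rates)), key=lambda m: estimates[m])
        # Reward: 1 point with the move's hidden win probability, else 0.
        reward = 1.0 if rng.random() < win_rates[move] else 0.0
        counts[move] += 1
        # Running average of the points this move has earned.
        estimates[move] += (reward - estimates[move]) / counts[move]
    return estimates, counts

# Hypothetical hidden win rates; the learner never sees these directly.
HIDDEN_WIN_RATES = [0.2, 0.5, 0.8]
estimates, counts = learn_by_trial_and_error(HIDDEN_WIN_RATES)
best_move = max(range(len(estimates)), key=lambda m: estimates[m])
print(best_move, estimates, counts)
```

The learner is never told which move is best. It only sees points arriving or not, and over many rounds its estimates drift toward the moves that pay off, which is the core of the approach even though real systems are far more elaborate.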
GROSS: My guest is Cade Metz, a technology correspondent for The New York Times. After a break, we'll talk about why some artificial intelligence designers are worried about the possibility of creating intelligent machines that will refuse to allow humans to turn them off. And Milo Miles will review two Kronos Quartet albums, one in collaboration with Laurie Anderson, the other in collaboration with a group from Mali. I'm Terry Gross, and this is FRESH AIR.
(SOUNDBITE OF EMAV'S "TECHMO")
GROSS: This is FRESH AIR. I'm Terry Gross, back with Cade Metz, a tech correspondent for The New York Times. He writes about new developments in artificial intelligence and robotics like driverless cars, machines that can carry on a conversation, robots that can interpret medical scans and how these and other new high-tech developments may change our lives for better or worse. He's at work on a book about the visionaries behind new developments in artificial intelligence.
Now that we've learned so much about hacking through Russian interference in our election with like fake news and fake accounts on social media, the thought of there being robots that can be hacked by a malevolent actor is really scary because if robots are designed to do important things, whether it's like, you know, reading a lung scan to see if a person has cancer or whether it's a driverless car, if somebody can hack into that for devious reasons, that's kind of terrifying. So are designers worried about being able to adequately design safeguards?
METZ: They are certainly worried. There was a report recently, pulled together by a group of technologists and researchers across various companies, research labs and think tanks across the U.S. and the U.K., that looked at these very issues. And, again, they are myriad. You know, you talk about the possibility of hacking in to an autonomous system. Or, you know, certainly you can exploit its mistakes. Like I said, there are situations where these neural networks can be fooled into seeing things that aren't there - or failing to see things that are. That can be exploited by bad actors. But there are other issues that may be even closer.
One of the things that these systems are doing extremely well and increasingly well is essentially fabricating images that look real. There was a team out of a company called Nvidia - they're a chipmaker, but they have an AI lab in Finland. And this team of researchers put together a system where essentially they built a neural network that analyzed millions of celebrity photos. And this system learned to build its own celebrity - essentially taking all those patterns that you see in pictures of Gwyneth Paltrow, or whoever else you might imagine is in this database. And then it would build this celebrity - this fake celebrity that looked vaguely like someone you would see on the red carpet, but you couldn't quite identify.
Think of the implications of that in the age of fake news, right? These techniques will soon be applied not only to still images like that but to video and virtual reality. But even in the near term, as these systems become better and better at building fake images that look real, we're really going to have to change the way we look at anything we see online.
GROSS: Yeah, that's a really scary thought. I will say you had some of the images of these computer-generated celebrity faces online. And it's kind of hilarious because they look exactly like generic celebrities - like there's a certain generic celebrity look, you know? And I have to say that the computers - the artificial intelligence captured it perfectly.
METZ: Well, that's what it's doing. It's identifying patterns, right? There are patterns that you and I respond to. And we say, ah, that's a celebrity, right? And in some cases, those are intuitive. We can't necessarily articulate them. But these systems can identify those patterns and then make use of them. And in a way, that's a scary thing as well, right? These systems are operating on such a large scale. They're analyzing so many photos. We can never be sure exactly why they're making certain decisions. And that's a big worry as well. Think about that self-driving car again. If the self-driving car makes a mistake, you want to know why it made a mistake. Well, these systems operate in ways that even the people who build them do not completely understand.
GROSS: Why not? Why don't they understand them?
METZ: Well, can you analyze millions of photos in a matter of minutes? You can't, right? You cannot identify the patterns that that system identified. You can't pinpoint the exact decisions it made over the course of that analysis because you cannot do it yourself.
GROSS: So in other words, you have no idea why the size of the woman celebrity's lips are what they are or why the length of stubble on the male celebrity's beard is - like the...
METZ: Absolutely, or...
GROSS: ...Or, you know, what's considered like the perfect eyes for a celebrity. Like, the artificial intelligence has made a gazillion calculations, and you're just kind of generalizing when you say something.
METZ: Absolutely, or think about it like this. Think back to what we discussed about the game of Go and that system that the DeepMind Lab in London built to play the game of Go. Some of the designers on that team were very good Go players. When that system was in the middle of one of those high-stakes games in Korea and China, they had no clue what it was doing. It was operating at a level that they could not understand. And these are some of the brightest people on earth literally. And over the course of those five-hour matches, where this machine is playing in ways that no human has ever played literally - and winning in that way, beating the best humans on Earth - the people who designed that machine are unclear what it's doing and why. It is playing at the level they could never play on their own.
GROSS: And the machine isn't going to be rattled by anxiety.
METZ: That's also true. And it doesn't need to sleep.
GROSS: Right. OK, so let's take a short break here, and then we'll talk some more. If you're just joining us, my guest is Cade Metz. He's a tech correspondent for The New York Times. We're going to talk more about artificial intelligence after we take a short break. This is FRESH AIR.
(SOUNDBITE OF OF MONTREAL SONG "FABERGE FALLS FOR SHUGGIE")
GROSS: This is FRESH AIR. And if you're just joining us, my guest is Cade Metz. He's a tech correspondent for The New York Times where he writes about artificial intelligence, driverless cars, robotics, virtual reality and other emerging areas in high tech.
OK, so another fear that you write about - that some people have about artificial intelligence - and I think these are designers who are worried about it - is the fear that AI systems will learn to prevent humans from turning them off. When I read this, of course, the first thing I thought of is the computer HAL in "2001: A Space Odyssey" because HAL is refusing to be turned off.
GROSS: So how would you elaborate on what this fear is - what the danger is?
METZ: You know, I like that you bring up HAL in "2001" - one of my favorite movies, certainly. And I have to say it's the favorite movie of many of these AI researchers who are designing these systems. If you go into the personal conference room of the top AI researcher at Facebook, he's got images from "2001" all over this conference room. My point is that pop cultural images, movies and books influence not only the way these designers think but the way you and I think and the people who run these companies think. We tend to think about those doomsday scenarios because we've thought about them a lot over the decades through pop culture, right? I think that's part of it - is that people on some level relate to that, and they've worried about that in some way. And so as these machines get better, you tend to worry about that as well.
That said, among many of these researchers, there is a real concern that as time goes on this will be a big problem, for those reasons I talked about - that these machines learn in ways that we cannot completely understand. If you have a car, for instance, that learns by playing, you know, a video game, it happens on such a large scale. It plays the game so much that there are points where it's going to do things that we humans don't expect it to do. And as it reaches ever more for that higher point total, it's going to try to maximize its point total in every respect. The worry is that it's going to maximize it in ways that we don't want it to. That doomsday scenario is that if it is trying to get to the most points, well, it's not going to let us turn it off.
Now, on some level, that again is not something that we can see happening any time soon. That said, these systems that learn on their own are getting better and better - meaning the algorithms are getting better and better. And the hardware that runs these algorithms, that allows them to train ever more - you know, with ever more speed, are getting better and better. And so the worry is that as we improve the algorithms, as we get faster computers, that we will somehow cross that threshold where the machines are really behaving in ways not only that we don't understand but that we can't really control.
And among many thinkers in this area, they're worried - even at this early stage. And it is early. If that ever happens, it's years and years away. But they're worried at this early stage that we will cross that threshold and not know it. What they say is that we might as well start thinking about it now because it is potentially such a risk. And when they make that argument, it's hard to argue back. Why not start worrying now and try to build safeguards into these systems rather than wait?
GROSS: Can you give us an example of the kind of scenario where a bot would not allow itself to be turned off because it was pursuing the reward that it was programmed to pursue, like getting points for making the right decision?
METZ: Well, I mean, loosely speaking - right? - if you're trying to - I mean, again, this is largely theory. But if you're trying to maximize your points, you're not going to keep accumulating points if you're turned off, right? You could think about it in that respect. But think about - let's back up just a little bit. Think about it in this respect. There's a lab in San Francisco. It's called OpenAI. And it was founded by Elon Musk, the CEO of Tesla among others. And they specialize in this reinforcement learning technology, you know, where these systems learn by extreme trial and error.
And they were recently, sometime last year, training the system to play a really old kind of laughably clunky video game from the '80s or '90s. It's like a boat racing game, OK? And they trained it through this reinforcement learning method where essentially it plays the game over and over and over and tries to maximize its points. Well, this system learned in ways the designers didn't understand. Rather than trying to finish the race, which it should have done, it realized it could gain more points if it just ran into things and then spun back around and ran into other things and sort of tried to grab these sort of baubles that were around the course that awarded more points.
And so it got into this crazy loop where it was just wreaking havoc across this game solely in an effort to gain points and wasn't trying to actually win the game. That again, you know, is a metaphor for the type of thing these researchers are worried about. And what they ended up doing is they ended up building an algorithm that allowed for human input. When the machine started doing things like that that were unexpected, the human designer could provide suggestions - give it a little nudge here or there, show that it needed to complete the race and not just wreak havoc.
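The boat-racing story can be reduced to a toy calculation showing why a pure point-maximizer might never finish the race. Everything below is a hypothetical illustration: the point values, the episode length and the two named behaviors are made up, and the "nudge" function is only a crude stand-in for human input, not the algorithm the OpenAI researchers built.

```python
FINISH_BONUS = 10      # hypothetical points for crossing the finish line
BAUBLE_POINTS = 3      # hypothetical points for each bonus item hit
STEPS = 20             # hypothetical length of one episode

def finish_the_race():
    """Drive straight to the finish line, which ends the episode."""
    return FINISH_BONUS

def circle_for_baubles():
    """Never finish; hit a respawning bonus item on every other step."""
    return sum(BAUBLE_POINTS for step in range(STEPS) if step % 2 == 0)

def apply_human_nudge(scores, discouraged, penalty):
    """Subtract a penalty from behaviors a human has flagged as unwanted."""
    adjusted = dict(scores)
    for name in discouraged:
        adjusted[name] -= penalty
    return adjusted

scores = {
    "finish_the_race": finish_the_race(),
    "circle_for_baubles": circle_for_baubles(),
}
# A pure point-maximizer picks whichever behavior scores highest.
chosen = max(scores, key=scores.get)

# With a human flagging the looping behavior, the ranking changes.
nudged = apply_human_nudge(scores, ["circle_for_baubles"], penalty=25)
chosen_after_nudge = max(nudged, key=nudged.get)
print(chosen, chosen_after_nudge)
```

The point of the sketch is that nothing is "wrong" with the maximizer. It does exactly what the scoring asks, and the scoring simply fails to say what the designers actually wanted.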
GROSS: Wow, that's so interesting.
METZ: Isn't it?
GROSS: Yeah. So are we at the point where robots are designing other robots - where artificial intelligence is creating new forms of artificial intelligence?
METZ: In certain small ways - what's really interesting to me is that building these neural networks is very different than building other forms - traditional forms of computer software, OK? The people who build these things have a particular talent. Basically, you know, you're trying to coax a result out of this like vast sea of data. People often talk about it as a dark art. And they're like individuals - and they are now paid in the millions, by the way, by these companies to perform this dark art, as some call it. It is a real talent. And so what has happened is that because relatively few people know how to do this today - just because it wasn't done much in the past - that a lot of companies don't have this talent that can help them build these systems. And they want it.
But what's interesting is that people are now building machine-learning algorithms that can help build these machine-learning algorithms if you can wrap your brain around that. And so essentially - let's say at Google, for instance - they now have a system that can build an image recognition system that beats the performance of a system built by their human designers. So in that respect, you do have AI building AI. They call it meta learning. Again, this work is early, and it's unclear how far this will progress. But that's a real viable area of research.
GROSS: Why did you start writing about technology?
METZ: Well, what I tell everybody is that I have kind of a dual background, right? My mother is a voracious reader - a fiction reader mostly. My father was an engineer. He was a career IBMer (ph), a programmer. And so when I went to college, I was an English major, but I also interned at IBM as a programmer. So I always had this kind of dual interest, and it really comes from my parents.
GROSS: What was the most amazing piece of technology that you got first because of your father?
METZ: Well, we had an IBM PC even before it was commercially available, as the way I remember it - about 1981 with the monochrome green display. I remember building a program that asked trivia questions. You know, it asked you who Pete Rose was, and you had to type in the answer.
GROSS: (Laughter) So that helped lead you to the role that you play now.
METZ: That's a big part of it. And it - a lot of it was the stories my father would tell. He worked on the original project at IBM that designed the UPC symbol, the bar codes that are on all our groceries.
GROSS: No kidding.
METZ: He worked to test that. And he was a great storyteller, had great stories about the development of that and all the myths that sort of surrounded it. There were people who protested the UPC symbol because they said it was the sign of the beast. You know, they had proof in Revelations from the Bible that this was going to lead the world to ruin. He loved to tell those stories. And there was a certain, you know, pride that I had when you would pick up a can of peas and see that symbol on the can.
GROSS: Wow. Cade Metz, it's been great to talk with you. Thank you so much.
METZ: Thank you.
GROSS: Cade Metz is a technology correspondent for The New York Times. After we take a short break, Milo Miles will review two new albums by the Kronos Quartet, one a collaboration with Laurie Anderson, the other with a group from Mali. This is FRESH AIR.
(SOUNDBITE OF MUSIC)
NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.