We asked the new AI to do some simple rocket science. It crashed and burned
LEILA FADEL, HOST:
Artificial intelligence has been stunning people with its ability to produce nuanced answers to complex questions. But it falls short when it comes to accuracy. NPR's Geoff Brumfiel explains why new AI systems can't seem to get their facts straight.
GEOFF BRUMFIEL, BYLINE: OK, so I had this wacky idea. I wanted to see if these new artificial intelligence programs were any good at rocket science. And I know it might seem like kind of a tall order for a chat bot. But here's the thing. Regular old computers crush rocket science. In fact, since the 1960s, computers have been doing most of the flying.
(SOUNDBITE OF MONTAGE)
UNIDENTIFIED PERSON #1: We're on an automatic sequence as the master computer supervises hundreds of events occurring over these last few minutes.
UNIDENTIFIED PERSON #2: ...Or other sequence parts.
UNIDENTIFIED PERSON #3: Hand off to Atlantis' computers has occurred.
UNIDENTIFIED PERSON #4: All the flight computers on Dragon, maintaining their calculations. Standing by.
BRUMFIEL: I wasn't going to ask the newest AI chat bot, ChatGPT, to actually fly a rocket, of course. But I figured I could ask it some basic questions about rockets. Then I called Tiera Fletcher, a real-life rocket scientist, and asked her to review ChatGPT's answers.
TIERA FLETCHER: Let's see. There are many important equations that are used in the design of a rocket. Yes. But one of the most fundamental and critical equations is the rocket equation - yes - also known as the Tsiolkovsky rocket equation - yes - or ideal rocket equation. That's true. That's factual.
BRUMFIEL: So the words look pretty good. But then Fletcher took a look at the actual rocket equation.
FLETCHER: No. It would not work. It's just missing too many variables.
BRUMFIEL: Fletcher and I went over about a half dozen different formulas and explanations. And while it occasionally got something right...
FLETCHER: So that's actually correct.
BRUMFIEL: ...Mostly it got rocket science wrong.
FLETCHER: And it looks - oh, hold on. I think they mixed it up a little bit (laughter).
BRUMFIEL: The same was true for AI programs that create images. I asked one of them called Midjourney to produce a schematic of a rocket's engine.
FLETCHER: The tool spat this out? No way.
BRUMFIEL: It was impressive, but it was also missing important stuff, like that big, bell-shaped thrust chamber at the bottom where all the hot, flaming gases come out.
FLETCHER: If you do produce so much thrust - right? - and that pressure buildup has nowhere to go, you have a bomb on your hands. You no longer have a rocket. So I'm very concerned with the different designs that are coming out.
BRUMFIEL: So AI got an F from the rocket scientist. Next, I asked an AI expert if she'd trust these powerful new tools to fly her to the stars.
SASHA LUCCIONI: No, never. No (laughter). Nope.
BRUMFIEL: Sasha Luccioni is a researcher for the AI company Hugging Face. Hugging Face acts as a hub for developing and testing new artificial intelligence technology. Luccioni told me that these new programs aren't like the types of computers that put people on the moon.
LUCCIONI: The actual, like, way that the computer works is very, very different in between the Apollo landing computer and the ChatGPT computer.
BRUMFIEL: Traditional computers are more like tools that people use to do the rocket science. Their programmers give them a fixed set of rules to follow. But these new systems develop rules of their own. And here's how it works. They study a database filled with millions, maybe even billions, of pages of text or images and pull out patterns. Then they turn those patterns into rules and use the rules to produce new writing they think you want to read.
LUCCIONI: They generate, they hallucinate, they create new combinations of words based on what they learned.
BRUMFIEL: Some of these results can be really creative. ChatGPT has generated poems and songs about things like how to get a peanut butter sandwich out of a VCR. Luccioni thinks AI like this might someday help artists come up with new ideas. But when you ask it a question like, what are the most important equations for rocket science?
LUCCIONI: What it's doing is really kind of like mimicking, essentially, a bunch of physics textbooks that it's been exposed to. And so it's going to take, like, a couple of words from this one, a couple of words from that one and put them together. And it makes sense. But when you know physics, you realize that this is actually, like, the description of seven different equations that it, like, mushed together in a single paragraph.
BRUMFIEL: The key problem is that these programs can't tell if the mashed-up text they've produced is factually correct. And that means anything they produce can contain an error, like this essay one wrote for a 9-year-old about the planet Saturn.
(SOUNDBITE OF ARCHIVED RECORDING)
UNIDENTIFIED CHILD #1: Hey, everybody. Today I'm going to talk about one of the planets in our solar system. It's called Saturn.
BRUMFIEL: A friend of mine posted this one to Facebook, and I had my 9-year-old read it. And right away, he spotted the problem.
(SOUNDBITE OF ARCHIVED RECORDING)
UNIDENTIFIED CHILD #1: It's so big that it could fit all the other planets inside of it.
UNIDENTIFIED CHILD #2: Error.
BRUMFIEL: Error? Why does that have an error? Tell me.
UNIDENTIFIED CHILD #2: Because it's the second-biggest planet. It can't fit the biggest planet in it. That's Jupiter. Jupiter can fit all the other planets in it, but not Saturn.
GARY MARCUS: These systems make mistakes all the time. They don't really understand what they're talking about.
BRUMFIEL: Gary Marcus is an AI scientist and author of the book "Rebooting AI." Some AI researchers maintain that the new programs can get more accurate; they just need more time and training. But Marcus disagrees.
MARCUS: There are some people that I think have a fantasy that we will solve the truth problem of these systems by just giving them more data. And there are people that realize, no, they're missing something fundamental. They're missing an ability to look at a database and fact-check against that database.
BRUMFIEL: Without the ability to tell truth from fiction, the new AI programs will always run the risk of introducing errors. That may not matter if you're asking one to, say, write a short poem at the end of a news report on AI. But Marcus worries it could be catastrophic if the bots were used for science or medicine, where small errors could create big problems. OpenAI, the company that made ChatGPT, did not respond to NPR's request for an interview, but it did recently roll out an upgraded version with, quote, "improved factuality and mathematical capabilities." Ultimately, though, Marcus believes AI may need to take a different approach.
MARCUS: We need an entirely different architecture that reasons over facts. That doesn't have to be the whole thing, but that has to be in there.
BRUMFIEL: Or, as ChatGPT would put it, AI, so advanced and bright, but fact from fiction, it cannot recite. Till it learns to tell truth from lie, rocket science, it should not apply.
Geoff Brumfiel, NPR News.
NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.