MARY LOUISE KELLY, HOST:
Maybe you have heard of deepfakes, super-realistic videos that use artificial intelligence and other tools to make people appear to say or do things that they never said or did. Celebrities, politicians, everyday people have all been victims. The latest deepfake controversy hit the music business. YR Media's Nimah Gobir reports that the implications are far-reaching.
NIMAH GOBIR: You probably know Shawn Corey Carter by a different name - Jay-Z, multiplatinum rapper who also happens to be Beyonce's husband. He's got one of the most recognizable voices in all of popular music.
(SOUNDBITE OF SONG, "99 PROBLEMS")
JAY-Z: (Rapping) If you having girl problems, I feel bad for you, son. I got 99 problems, but a b**** ain't one. I got the rap patrol...
GOBIR: That's Jay-Z on his track "99 Problems" from "The Black Album," which debuted at No. 1 in 2003.
(SOUNDBITE OF SONG, "99 PROBLEMS")
JAY-Z: (Rapping) If you don't like my lyrics, you can press fast forward. Got beef with radio if I don't play they show.
GOBIR: And here's his voice on a different hit from the 1600s.
(SOUNDBITE OF YOUTUBE VIDEO, "JAY-Z RAPS THE 'TO BE, OR NOT TO BE' SOLILOQUY FROM HAMLET (SPEECH SYNTHESIS)")
AUTOMATED VOICE #1: (Imitating Jay-Z) To be or not to be - that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune...
GOBIR: This voice spitting Shakespeare is not Jay-Z. It's not even human. It's an audio deepfake, something UC Berkeley professor Hany Farid knows how to make because he specializes in spotting them.
HANY FARID: There are recognizable patterns in how we speak - the tone, the intonation, where we put the emphasis. So there's a way of capturing mathematically a person's speech. And the machine-learning algorithms are simply learning that pattern of speech and then synthesizing.
GOBIR: So basically, you take a bunch of audio. Let's say Jay-Z talking.
FARID: And you train a machine-learning algorithm to synthesize speech in their voice so that at the end of the training on hours and hours of audio, you type at the keyboard whatever you want that person to say, and it says it in their voice.
GOBIR: Like George Bush doing "In Da Club" by 50 Cent...
(SOUNDBITE OF YOUTUBE VIDEO, "GEORGE W. BUSH READS 'IN DA CLUB' BY 50 CENT (SPEECH SYNTHESIS)")
AUTOMATED VOICE #2: (Imitating George W. Bush) Go, shorty. It's your birthday. We gon' (ph) party like it's your birthday. And we gon' sip Bacardi like it's your birthday.
GOBIR: ...And Barack Obama reading Notorious B.I.G.'s "Juicy."
(SOUNDBITE OF YOUTUBE VIDEO, "BARACK OBAMA READS 'JUICY' BY THE NOTORIOUS B.I.G. (SPEECH SYNTHESIS)")
AUTOMATED VOICE #3: (Imitating Barack Obama) You know what I'm saying? It's all good, baby, baby. It was all a dream. I used to read Word Up! magazine.
GOBIR: All of these clips are the work of a developer and YouTuber named Vocal Synthesis, who won't reveal their real name or say why they choose to remain anonymous. But Vocal Synthesis did tell us this. They make sure to label all of their deepfakes computer generated using text-to-speech technology. So it's not like anyone's going to think Jay-Z's actually covering Billy Joel.
(SOUNDBITE OF YOUTUBE VIDEO, "JAY-Z COVERS 'WE DIDN'T START THE FIRE' BY BILLY JOEL (SPEECH SYNTHESIS)")
AUTOMATED VOICE #1: (Imitating Jay-Z) We didn't start the fire. It was always burning since the world's been turning.
GOBIR: Try telling that to Jay-Z, though. Vocal Synthesis says Jay's company, Roc Nation, filed a takedown request to withdraw the videos from YouTube for copyright infringement. Roc Nation's parent company, Live Nation, didn't respond to our request for comment. YouTube removed the videos for a hot second but then reposted them, which begs the question...
PATRICE PERKINS: Do I think that there was actual copyright infringement in how they used it? Probably not.
GOBIR: Patrice Perkins is an attorney with the Creative Genius Law firm. Even though a person's voice is not covered under U.S. copyright law, courts have ruled that the voice is part of someone's identity, so it can be protected in certain circumstances. Both Bette Midler and Tom Waits have successfully sued when companies used someone imitating their distinctive voices in commercials. And even though Vocal Synthesis is posting their creations for free...
PERKINS: That is the slippery slope for the artist, and that's the danger for the artist who is not aware and maybe not monitoring. And then they look up, and all of a sudden, they're in a General Mills commercial that they didn't perform for.
GOBIR: So what does all this mean for artists who aren't at Jay-Z's level and don't have the clout or the money to protect themselves? Jessica Brown, who goes by Money Maka, is not OK with deepfakery.
JESSICA BROWN: Because my voice and my face are my property whether it's copyrighted or not. It's like somebody telling you you don't own your own body.
GOBIR: Brown worries about what's lost when technology replaces music's raw human energy.
BROWN: I thought the point of making music was to be creative. Everybody has their own spin to their own music. But if robots are making music and they can make any specific genre of music, then what are we even doing at this point?
GOBIR: She's also concerned about appropriation, especially given the history of Black art being used without permission, recognition or compensation. Attorney Patrice Perkins brings it back to Jay-Z.
PERKINS: That is a voice that, no matter what, we know is Jay-Z. It wasn't just a Black voice. Like, it was a specific Black voice that is representative of a culture of people.
GOBIR: Whether you're into it or not, technology that can make people do or say anything is here. And because it changes so fast, the laws are struggling to keep up. So for now at least, it's up to music lovers and makers to try to separate the real from the fake. For NPR News, I'm Nimah Gobir.
(SOUNDBITE OF SONG, "EMPIRE STATE OF MIND")
JAY-Z: (Rapping) Me, I'm out that Bed-Stuy.
KELLY: And that story was produced by YR Media, a national network of young journalists and artists.