Holly Herndon: How AI can transform your voice

Artist Holly Herndon created an AI clone of her voice that can sing in any language and in any tone. In her music, Holly shows how AI can enhance the power and artistry of the voice.

(SOUNDBITE OF TED TALK)

HOLLY+: (Singing in Catalan).

MANOUSH ZOMORODI, HOST:

It's the TED Radio Hour from NPR. I'm Manoush Zomorodi. On the show today, incognito. And what if we can disguise and manipulate our voices to do things that no human can do? Well, it gets really confusing. This is artist Holly Herndon on the TED stage, playing a recording of what sounds like her singing. But here's the thing.

HOLLY HERNDON: So that was my voice. But that wasn't me singing.

ZOMORODI: So who was it, then?

HERNDON: So that was actually a machine learning model trained on my voice that can read scores in Catalan. So I've been working with machine learning for years, and I've been attempting to create a kind of version - a machine learning model of my voice that could perform beyond my own physical limitations. And so just recently, I've been able to achieve this. And not only can this version of myself sing in English but also in multiple other languages.

ZOMORODI: Holly calls her software, appropriately, Holly+. And, yeah, Holly+ can read notes on sheet music, and it can do this.

(SOUNDBITE OF SONG, "MACK THE KNIFE")

HOLLY+: (Singing in German).

ZOMORODI: Sing a German rendition of "Mack The Knife" or...

(SOUNDBITE OF SONG, "BESAME MUCHO")

HOLLY+: (Singing in Spanish).

ZOMORODI: ...Sing the classic Latin hit "Besame Mucho." But I just want to reiterate: Holly, the person, does not speak Spanish, and she has never sung these songs. Holly+ is singing these songs. And it can do it in any language and in any vocal range.

HERNDON: I trained a machine learning model on hours of my natural singing voice. So this required that I sang the entirety of the range of phonemes in the English language. So what does that mean? That means all of the kind of sounds that I would potentially make in the English language. So what I would do is I would sing from a set of phrases that are specifically designed to cover all of the sounds that I could create in English.

ZOMORODI: Phrases like what?

HERNDON: Well, I don't have the script in front of me, but you can find them pretty easily online. They're called TIMIT scripts, and they're really kind of random phrases. I think one phrase that pops into mind is, "Quick beige fox jumps in the air. Surprise, he shouts." Just kind of random things like that where (laughter)...

ZOMORODI: I love it.

HERNDON: ...It's kind of nonsense.

(SOUNDBITE OF SONG, "TRAS DE TI")

HOLLY+: (Singing in Spanish).

HERNDON: And so then that is mapped onto other languages, and I can kind of create this multilingual voice that can sing, you know, beyond my own physical capability.

(SOUNDBITE OF SONG, "TRAS DE TI")

HOLLY+: (Singing in Spanish).

ZOMORODI: From what I've read, it took you and your collaborators years to get Holly+ working so that you can give it instructions and it spits out a remarkably lifelike song. And that is what we have heard so far on the show.

HERNDON: Yes.

ZOMORODI: But at the end of your TED Talk, you took Holly+ to another level.

(SOUNDBITE OF TED TALK)

HERNDON: So I invite you to consider, if given the opportunity, who would you like to perform through? And can you imagine someone else performing you? With that in mind, I'd like to invite the incredible musician Pher to the stage.

(APPLAUSE)

ZOMORODI: You invited another singer on stage, a man named Pher.

(SOUNDBITE OF TED TALK)

PHER: (Vocalizing).

ZOMORODI: Pher sang into a mic that he was holding in his right hand. He has a beautiful voice. But then he sang into another microphone, held in his other hand.

(SOUNDBITE OF TED TALK)

PHER AND HOLLY+: (Vocalizing).

(APPLAUSE)

ZOMORODI: And we heard a live version of Holly+, which was adapting his voice into your voice in real time.

(SOUNDBITE OF TED TALK)

PHER: (Singing) When you come around acting this way. And...

PHER AND HOLLY+: (Singing) Yes, the truth is I show you every day.

HOLLY+: (Singing) 'Cause you love to stay...

(APPLAUSE)

HOLLY+: (Singing) ...Living in all the pain.

ZOMORODI: Holly, it was so weird to see a Black man wearing glasses - someone who looks completely different than you - open his mouth and have your voice come out of it and have complete and total artistic control of your voice...

(SOUNDBITE OF TED TALK)

HOLLY+: (Singing) There ain't no leaving me...

ZOMORODI: ...With you standing right next to him with your mouth shut.

HERNDON: Exactly (laughter). Exactly.

ZOMORODI: I mean, it was surreal. But, you know, it was also kind of disturbing. How would you describe the audience's response?

HERNDON: Well, you know, when you're standing on the stage, you just see the lights, so you kind of can't really see so much. But I felt like at the end of the talk, you know, everyone seemed really happy and were kind of applauding and seemed kind of flabbergasted a little bit. But it was really interesting to also have some conversations after the performance with different people and to hear different people's concerns, of course. And...

ZOMORODI: What were their concerns? Do you remember?

HERNDON: Well, I mean, I think it's really, like, fully understandable that musicians - it's usually coming from musicians, vocalists themselves, who are then worried, like, OK, what does this mean for the sovereignty of my voice? Like, if anyone can just jump in my body and sing with my voice, what does that mean for me personally? And that's a real concern, and it's one that needs to be taken very seriously.

ZOMORODI: What do you tell them? How do you say, like, well, you know what? We're going to figure that out. Or what do you say?

HERNDON: Well, that is something I'm actually actively working on to figure out at the moment. My partner and I have started an organization called Spawning. And so we're trying to figure out this really thorny question of ownership and custody of one's own model - what kind of interactions we can build around that with fans that work for the artist. There's a very justified kind of wariness for new technology. So I think the only way to really deal with it is to kind of meet it head on because it's happening. It's coming. My answer to that would be that everyone should have the ownership and the ability to custody their own model and to be able to decide whether or not they want to make that public or whether or not they want to keep that to themselves or whether or not they want to license that to people. I think that should be a personal choice.

(SOUNDBITE OF MUSIC)

ZOMORODI: So I think there are going to be people listening who are really excited by this technology. But then there are going to be others who are like, why? Like, why? Cool party trick, but why?

HERNDON: Well, why? I think that there's many kind of artistic reasons why someone might want to perform through someone else's vocal timbre - for example, even just kind of the range. So if you have a baritone range and maybe you would like to know what it's like to sing as a soprano, you know? Or - it's - you're kind of changing your bodily resonance by being able to jump into someone else's vocal timbre. And I think, you know, with me, it's maybe less exciting. But with someone like a Beyonce, maybe some - maybe her fans would love to make a kind of series of songs in homage to their queen. And so I could see other voices being really revered by fans and creating a whole kind of ecosystem of fan-generated art and fan-generated music that could be actually really fun and interactive.

ZOMORODI: OK, so I guess Beyonce - I mean, I don't claim to ever speak for Beyonce - but I guess she might say, well, that's not OK if you are writing your own music and using my voice and then selling it.

HERNDON: She might very well say that, and I think that should be entirely up to her. I don't think that should be up to her record label or up to her publisher or up to anyone else but her.

ZOMORODI: So we've mostly been talking in the musician or artistic way of using the software. But, you know, most of us have probably heard of this ability to transform a voice as - like those fake videos of President Obama saying things that he has never said or other deepfake voices. That's what's going to come to mind for most people.

HERNDON: Yeah. I mean, I think - I try to avoid the deepfake phrase because I feel like it has such negative connotations around trickery and scamming people and whatnot. I see this technology as a really interesting way for people to find new ways to perform. So there's kind of two sides to this coin. There's the dystopian side, where we're using this technology to cheat people or fake people. And then the other side to that coin is to ask, OK, what if we could create digital versions of ourselves and allow other people to perform through us, and we could perform through other people with their permission? What might that unlock? What kind of weird, new performance styles and genres could that create? What kind of new art forms would come out of that? So I'm trying to look at it from a more optimistic perspective, but I think that also requires consent.

ZOMORODI: OK. So let's say everyone's in. There's consent. Let's go back to what you were saying about, you know, the average person or average baritone being able to sing a soprano. It's kind of the ultimate disguise in some ways. What does it feel like? Have you ever sung into a microphone and had your voice come out sounding like someone else? Or right now, is it just your voice that you're doing this with?

HERNDON: Well, the only voice model I have is my own voice, so it's not really as spectacular when I do it.

ZOMORODI: No. No, you're missing out, clearly.

HERNDON: (Laughter) I am. But I definitely have plans to expand the catalog of voices. But also, you know, my - you know, my journey with machine learning and voice processing - you know, it didn't really just start five years ago. One of the reasons why I got into this topic in general is because I've been working with the digitally processed voice for over a decade now. And so I started doing that back in the day when I was a computer musician, and I was looking for a way to make my computer music performance more embodied. And so I started using my voice as a kind of data stream as a way to control different parameters on my computer. I never really thought of myself as a vocalist in that way.

So I very much relate to this idea of using computer processing to be able to create sound beyond the physical limitations of my voice. That's something I've been really obsessed with for a very long time now, and it's an incredibly beautiful feeling on stage when I can sing into a microphone and I can rumble an entire auditorium with a huge kind of, like, engulfing sub-bass that I've mapped to my voice. That's a really wonderful and beautiful, empowering feeling. So I think it can be really transformative in that way.

(SOUNDBITE OF HOLLY HERNDON SONG, "CHORUS")

ZOMORODI: That's artist Holly Herndon. If you want to hear what else Holly can do with her voice and computers, you can find all her music online, like this track, "Chorus," from her album "Proto." You can, of course, watch her full talk at ted.com.

(SOUNDBITE OF HOLLY HERNDON SONG, "CHORUS")

Copyright © 2022 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.

NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.