New Voices For The Voiceless: Synthetic Speech Gets An Upgrade

  • Samantha Grimaldo was born with a rare disorder, Perisylvian syndrome, and has never been able to speak. (Ellen Webber for NPR)
  • Samantha uses a text-to-speech iPhone app to help her communicate. Here she shows the app interface. (Ellen Webber for NPR)
  • Joseph, Alexandra and Samantha Grimaldo sit around the kitchen counter in the family's home in Marlborough, Mass., playing with Samantha's voice app, though they mostly use sign language at home. (Ellen Webber for NPR)
  • When Samantha was younger, she carried this device with her to help her communicate. (Ellen Webber for NPR)
  • Samantha speaks with her mom, Ruane, about going to the movies with a friend. (Ellen Webber for NPR)
  • Samantha watches her brother Nicholas play piano. Their mother says that a new customized voice, created by researcher Rupal Patel from a voice sample taken when Samantha was younger, is happy and has a sweetly familiar quality. "My son, my son Nicholas, I could hear some of his voice in it," she says. (Ellen Webber for NPR)

Ever since she was a small child, Samantha Grimaldo has had to carry her voice with her.

Grimaldo was born with a rare disorder, Perisylvian syndrome, which means that though she's physically capable in many ways, she's never been able to speak. Instead, she's used a device to speak. She types in what she wants to say, and the device says those words out loud. Her mother, Ruane Grimaldo, says that when Samantha was very young, the voice she used came in a heavy gray box.

The text-to-speech iPhone app that Samantha Grimaldo uses has three voice options for her to choose from. (Ellen Webber for NPR)

"She used to have to carry this device around that was at least 4 or 5 pounds," Ruane says, "and she was only, like, 70 pounds herself. The poor thing had to carry this back and forth to school every day on the bus." It was miserable having to lug her voice around that way — a clunky box sitting on the seat next to her.

Today, fortunately, Samantha's voice takes up much less space. She types into a special program on an iPhone or iPad, and a synthesized voice in the program says the words aloud. The voice, one of several types on the market, is called "Heather." That's a nice enough name — easygoing and accessible — but Grimaldo doesn't like to use the voice if she can help it.

Her mother has noticed that when the family goes out to restaurants, Samantha prefers to write out her menu choices. Apparently, as she explains to her mother, this is because Samantha has some reservations about the voice itself — the cold metal sound of it.

"Because [it's] weird," Samantha says of the mechanical voice — speaking in the voice itself.

It's not just that the voice is artificial and disjointed. It sounds, Samantha says, "older." Samantha is only 17, and the sound of the voice — deep, methodical, mature — doesn't exactly align with her sense of herself. Like any teenager, she feels self-conscious about it.

"I don't want [people to] hear," she says.

The Voice For The Voiceless

If you don't have a voice, who speaks for you? Today there are more than 60 different options for people who need to use synthetic voices to communicate, but for the majority of people who use them, there is a single answer to that question: "Perfect Paul."

Rupal Patel, a speech scientist at Northeastern University, estimates that between 50 and 60 percent of the people who use synthetic voices use the same one — the Perfect Paul voice. If you have ever heard Stephen Hawking speak, or listened to the weather radio, you have heard the voice of Perfect Paul.

Perfect Paul is used so widely because some studies have shown that his voice is easiest to understand in a variety of situations, including classrooms and public outdoor spaces. Still, some in the community of people who rely on synthetic voices have found the Perfect Paul version frustrating — not because it's a bad voice, but because it's limiting.

In fact, it was through confronting the clear limits of Perfect Paul that speech scientist Patel came to the conclusion that people like Samantha Grimaldo needed new options.

It happened around 10 years ago when Patel was at a conference for the makers and users of synthetic voices.

Rupal Patel is a speech scientist at Northeastern University. (Courtesy of Mary Knox Merrill/Northeastern University)

"I was watching a demonstration of a new technology, and someone came up and said something in their synthesized voice, and then someone else came up," Patel says.

Both spoke in the same voice — Perfect Paul's. Then a third person arrived, and another.

"It was the same voice saying different things," says Patel. "And sometimes they were saying the same phrase, but off by a few seconds ... so it felt like it was this echo going on. It was just a strange thing."

Standing there, in the middle of all these radically different people with the exact same voice, Patel had an idea: Isn't there something we can do to make these voices more individuated?

So, around seven years ago, Patel started working to change synthetic voices. When a person speaks, two things happen. First, the voice box vibrates to produce the source sound. Then, the mouth shapes that sound into speech.

In many people who have speech disorders, it's mainly the second part of the system that doesn't work. "In people with speech disorders, the source is pretty preserved," Patel says. "I thought, 'That's where the melody is — that's where someone's identity is, in terms of their vocal identity.' "

So Patel decided to capture the melody of a voice. She primarily works with kids, and so she asked kids with speech disorders who can still make some sounds to come into her lab and do something really simple. "We just need them to say a sustained sound, like ahhhhh," she says.

Patel can take that sound, run it through a computer and find out all kinds of things about how that person would sound if that person could speak words. "We can determine their pitch, the loudness, the breathiness of their voice, the changes in clarity," she says.
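To make that analysis step concrete, here is a minimal Python sketch of how pitch and loudness might be measured from a sustained sound. It is not Patel's software: the autocorrelation pitch estimate, the RMS loudness measure, the function names and the synthetic test tone standing in for a recorded "ahhhhh" are all assumptions chosen for illustration, and breathiness and clarity are left out.

```python
import math

def make_test_tone(pitch_hz, sample_rate, duration_s, amplitude=0.5):
    """Hypothetical stand-in for a recorded 'ahhhhh': a tone plus one overtone."""
    n = int(sample_rate * duration_s)
    return [
        amplitude * (math.sin(2 * math.pi * pitch_hz * i / sample_rate)
                     + 0.5 * math.sin(2 * math.pi * 2 * pitch_hz * i / sample_rate))
        for i in range(n)
    ]

def estimate_pitch_hz(samples, sample_rate, min_hz=75.0, max_hz=500.0):
    """Estimate pitch by finding the delay at which the signal best matches itself."""
    min_lag = int(sample_rate / max_hz)  # shortest period considered
    max_lag = int(sample_rate / min_hz)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag: large when the wave repeats every `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def rms_loudness(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)
tone = make_test_tone(200.0, SAMPLE_RATE, 0.2)
pitch = estimate_pitch_hz(tone, SAMPLE_RATE)
loudness = rms_loudness(tone)
print(f"pitch is about {pitch:.1f} Hz, RMS loudness is about {loudness:.3f}")
```

A real recording would be noisier than this test tone, so a practical tool would likely analyze short overlapping frames and smooth the results rather than trust a single estimate.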

She then records what she calls a "healthy donor" (for example, a child who is roughly the same age as the child she's trying to help) saying a large number of words. That gives her samples of the sounds the donor produces when talking. She then combines the donor's voice with the pitch, breathiness and other characteristics of the child with the voice disorder.
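The idea of pairing one person's source with another person's mouth shapes can be sketched with the textbook source-filter model. The Python below is only an illustration under stated assumptions, not the method Patel's lab uses: a pulse train at a hypothetical recipient pitch plays the role of the voice box, and two resonators with hypothetical "ah"-like formant values stand in for the sounds taken from a donor.

```python
import math

def pulse_train(pitch_hz, sample_rate, n_samples):
    """Source: one pulse per pitch period, a crude model of the vibrating voice box."""
    period = round(sample_rate / pitch_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Filter: a two-pole resonance, a crude model of one vocal tract formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, below 1 so it is stable
    theta = 2 * math.pi * center_hz / sample_rate        # pole angle
    a1, a2 = 2 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize(source_pitch_hz, formants, sample_rate=8000, duration_s=0.25):
    """Drive the donor-style filter with a source at the recipient's pitch."""
    signal = pulse_train(source_pitch_hz, sample_rate, int(sample_rate * duration_s))
    for center_hz, bandwidth_hz in formants:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    return signal

# Hypothetical values: a pitch measured from the recipient's sustained sound,
# and rough 'ah'-like formants (center Hz, bandwidth Hz) standing in for the donor.
RECIPIENT_PITCH_HZ = 250.0
DONOR_FORMANTS = [(800.0, 100.0), (1200.0, 120.0)]

voice = synthesize(RECIPIENT_PITCH_HZ, DONOR_FORMANTS)
print(f"{len(voice)} samples, repeating every {round(8000 / RECIPIENT_PITCH_HZ)} samples")
```

The output repeats at the recipient's pitch period while its tone color comes from the filter, which is the division of labor the article describes. Changing `RECIPIENT_PITCH_HZ` changes the melody without touching the donor values.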

Patel played me examples of two different voices she's created. If you listen, you can clearly hear different pitch and clarity in the different voices.

The voices Patel makes are unique to each individual. Which brings us back to Samantha Grimaldo.

'You Need A Voice'

When Patel was getting started, Samantha was one of the first kids with a voice disorder who came to her lab to give a voice sample. At the time, Patel wasn't at the stage where she was actually constructing voices. But she's since figured it out, and recently, she created a new voice using Samantha's ahhhhh sample.

Last week, she gave the personalized voice to Ruane and Samantha so they could hear it. The voice was constructed from a sample taken when Samantha was much younger. For a current version of Samantha's voice, you'd need to take a new sample. Still, it was the first time that Samantha and her mother had heard anything close to Samantha's voice.

Ruane had listened earlier in the day, when Samantha was still at school, and was clearly deeply moved by the experience. It made her realize in a fresh way, she says, how difficult it had been for her to never hear her daughter's voice.

"When I heard it, I thought, 'Yeah! This could be it!' " Ruane says through tears. To her ear, the voice had a sweetly familiar quality. "My son — my son Nicholas — I could hear some of his voice in it," she says.

And so, when Samantha got home from school that afternoon, they sat down together to listen. Samantha's young voice, it turns out, is clear and light.

Ruane told me that when Samantha heard the voice, her eyes lit up and a smile broke out on her face. Both thought that the voice sounded happy.

Personalized voices like these aren't yet available to everyone. Patel has figured out how to do it, but not how to make it work on all of the different electronic devices that people use to play a synthetic voice. But Ruane Grimaldo hopes that voices like these will be available one day, very soon.

"You need a voice," she says. "You need a voice."


