If You Said Yes, Press 'One' Now

Americans spent 43 billion minutes, or 80,000 years, on the phone with automated voice recognition systems last year. And it probably won't come as a surprise that most were less than happy about it. In fact, according to one study, only one caller in 10 was satisfied with the experience.

But that doesn't mean that big companies are going to be encouraging us to speak to real live people anytime soon.

"Companies profess boundless interest in their customers," writes John Seabrook in the current issue of The New Yorker, "but they don't want to pay an employee to talk to a caller if they can avoid it." He says the average call that involves a live employee costs a company between $5 and $15. In contrast, an automated call costs a company virtually nothing once the technology has been paid for.

Voice recognition technologies have made enormous strides since they were first invented in the 1950s, but in many ways they're still very much in their infancy.

Most conventional voice recognition technology tries to anticipate in advance the different things people might say, Seabrook says. But people also say the same things in different ways. Southerners, Seabrook notes, are more likely to say "Yes, Ma'am," or "Yes, Sir," for example. And when people speak on cell phones in cars, not only is there added background noise to filter out, but people also speak differently than they would indoors, Seabrook says.

A typical voice recognition program searches a database for matches to what it hears. Its success depends on how large that database is. But while corporations will do almost anything to prevent you from speaking to a real person, that often doesn't include spending money to update the database, says Seabrook, "because the whole reason they're doing this in the first place is to save money."

Seabrook says that "one of the interesting things about the world of voice recognition is that it's all based on probabilities." It's all about trying to determine the best match between the database and what you say. "There's no actual understanding of what you say," he explains.
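To make the "best match" idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any commercial system works: the phrase list, the intent labels and the `best_match` function are all hypothetical, and a plain text-similarity score from Python's standard `difflib` module stands in for the acoustic probabilities a real recognizer would compute from audio.

```python
from difflib import SequenceMatcher

# Hypothetical phrase database: each phrase the system expects,
# mapped to the action it should trigger.
PHRASES = {
    "yes": "confirm",
    "yes ma'am": "confirm",
    "yes sir": "confirm",
    "no": "deny",
    "representative": "agent",
}


def best_match(heard, phrases=PHRASES):
    """Return (intent, phrase, score) for the database phrase closest to `heard`.

    The score is a text-similarity ratio between 0 and 1. It is only a
    stand-in for a probability; nothing here "understands" the caller.
    """
    heard = heard.lower().strip()
    scored = [
        (SequenceMatcher(None, heard, phrase).ratio(), phrase, intent)
        for phrase, intent in phrases.items()
    ]
    score, phrase, intent = max(scored)  # highest-scoring entry wins
    return intent, phrase, score


if __name__ == "__main__":
    for utterance in ["Yes, sir", "yes", "representative"]:
        intent, phrase, score = best_match(utterance)
        print(f"{utterance!r} -> {intent} (closest: {phrase!r}, score {score:.2f})")
```

In this toy version, the program always picks whichever stored phrase scores highest, even when the caller said something it has never seen. That is why the size of the database matters: a regional variant such as "Yes, Sir" is matched well only because someone added it to the list.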

The frontier of voice recognition, he says, is determining emotion. A company or government office can avert large problems if its voice recognition machines can detect anger and frustration from callers, especially the anger and frustration that result from the voice recognition system itself. "It's a nightmare when you think about how much frustration you can get from just five minutes," says Seabrook. "If you could bottle that frustration, you could move mountains."
