Computer Program Yields Clues About Baby Talk

A computer program may help explain how babies learn to talk. Researchers have studied how babies learn to distinguish vowel sounds in different languages, and they've created a computer program that can learn in the same way. One of the researchers discusses the new work.

JOE PALCA, host:

From NPR News, this is TALK OF THE NATION: SCIENCE FRIDAY. I'm Joe Palca.

When it comes to how infants learn to speak, the scientific community is split. One camp thinks that humans are naturally attuned to certain categories of sound, so that we're born with the ability to distinguish between ah's and oh's, or R's and L's. Others, including my guest, James McClelland, think that infants learn to distinguish and categorize sounds as they hear them. McClelland and his colleagues created a computer program that does just that. It can learn to differentiate and lump together vowel sounds on the fly.

The idea is that if a computer program can do it, maybe that's how a baby can do it. A description of McClelland's program appears in this week's edition of the Proceedings of the National Academy of Sciences. James McClelland joins us from Stanford University in Palo Alto, where he's on the faculty. Dr. McClelland, thanks for joining us.

Dr. JAMES McCLELLAND (Professor of Psychology, Stanford University): My pleasure.

PALCA: So how is this computer program - how is it going to - what's it going to tell us about how infants learn?

Dr. McCLELLAND: Well, I think it's going to push the debate a little further. I'd just like to say that, you know, the number of investigators who are now pursuing the idea that learning plays a tremendous role in the initial establishment of children's linguistic ability is burgeoning like crazy. And our paper is just another little piece of that growing wave of enthusiastic investigation of these issues.

PALCA: Okay. You know, I guess, since this is - since we're radio, thank goodness, we can actually hear the sounds that you were talking about. So why don't we cue up the two types of sounds? And we'll play them one at a time and then I'll get you to identify them, okay? So why don't we hear the first sound that you were using?

(Soundbite of a computer program)

PALCA: Okay. Now what were we hearing there? And what would that - what was the computer trying to do with those sounds?

Dr. McCLELLAND: Well, you're hearing an English-speaking mother talking to her baby. And those sounds were actually invented as non-word sounds to test how mothers actually speak to their babies, by my collaborators, Janet Werker and Shigeaki Amano, who is based in Japan. So those sounds were the sounds produced by an English-speaking mother talking to her baby, and the vowel sounds we're particularly interested in are the ones in the first syllables of those words: the short "i" sound in the first one you played, and the "ee" sound in the second one you played. Those are distinct vowel categories that we use in English in words like hit and heat, bit and beat, and so on.

PALCA: Now, I'm just going to say - Shawn and Gwen, in the control room - can we hear the first one again so we can hear what we just heard? I know that's not what we're going to do, but let's hear the first one again.

(Soundbite of computer program)

PALCA: Okay. And now, can we hear the second one? The second clip of tape?

(Soundbite of computer program)

PALCA: Okay. Now, what was that?

Dr. McCLELLAND: That was a Japanese mother speaking to her baby in Professor Amano's lab in Tokyo. And when we listen to those sounds as native English speakers - at least, I don't hear the difference between the E sounds in the first syllable. So both of those two items seem to begin, for me, with the same E-like sound; the first syllable sounds the same in both of them. You could play it again maybe…

PALCA: Okay. Can we play the second…

Dr. McCLELLAND: …others can see what they think, too.

PALCA: …second one again.

(Soundbite of computer program)

PALCA: And so those are supposed to be sounding different?

Dr. McCLELLAND: Yes, that's correct. In fact, the vowel sound is twice as long in the second one as it is in the first one. So the E is held for twice the duration in the second item that you played as it is in the first item. And a Japanese mother, you know, would use that difference - and Japanese speakers use those differences all the time - to distinguish words, the same way we use vowel differences to distinguish words in English.

PALCA: So this is, if nothing else, a good example of why it's hard to learn a new language as you get older.

Dr. McCLELLAND: Yeah. So what happens is that as you experience a language, you attune your perception to the structure that is present in that language. And what my collaborators and I are so fascinated by is how this process occurs.

PALCA: Well, if you want to join the discussion, our number is 800-989-8255, that's 800-989-TALK. And so how do you train a computer to hear this and how does that training help explain what's going on?

Dr. McCLELLAND: Okay. Well, so we start with a little simulation, which is supposed to represent the state of the young baby's perceptual representation of sounds in this region of the space of possible sounds. And we start with a completely homogeneous representation of that space. In other words, our model is launched into the world with the ability to learn any distinction that might arise among the sounds, but it doesn't have built into it any particular ones to start with.

PALCA: I got it. So it sort of learns like a baby would learn?

Dr. McCLELLAND: That's the idea.

PALCA: I see.

Dr. McCLELLAND: We hope the model is characterizing sort of the essence of what happens when babies learn.
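[The learning scheme Dr. McClelland describes - a representation that starts out homogeneous and organizes itself around whatever distinctions the input actually contains - can be illustrated with a toy online competitive-learning model. This is only an illustrative sketch, not the model from the PNAS paper; the vowel durations, unit placements, and learning rate below are all invented for the example.]

```python
import random

random.seed(0)

# Hypothetical training data: vowel durations in milliseconds, drawn from
# two Japanese-like categories, short (~100 ms) and long (~200 ms).
samples = [random.gauss(mu, 15) for _ in range(500) for mu in (100, 200)]
random.shuffle(samples)

# Start "homogeneous": units spread evenly over the duration range,
# with no category structure built in.
units = [50.0 + 20.0 * i for i in range(10)]
rate = 0.1

for x in samples:
    # The unit closest to the input wins and moves toward it, so units
    # gradually cluster wherever the input actually has structure.
    i = min(range(len(units)), key=lambda j: abs(units[j] - x))
    units[i] += rate * (x - units[i])
```

[After training, units that began evenly spread have drifted toward the two duration clusters, so the toy model "discovers" a short/long vowel contrast it was never told about - the essence of learning any distinction that arises in the input.]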

PALCA: Okay. Well, let's take a call now and go to Mark in Catonsville. Mark, welcome to SCIENCE FRIDAY.

MARK (Caller): Yes. Thanks for taking my call.

PALCA: Sure.

MARK: Well, I have an East Asian wife - Korean - and she has trouble with her Rs and her Ls. She'll even stop to think, and she'll normally put an R where she needs an L. Can you tell me why she does that, or why it's just Rs and Ls?

PALCA: Interesting.

Dr. McCLELLAND: Well, I've actually done a good deal of research on that question, and so have several other people. The situation is that in Japanese, there's a sound which is similar to our L sound - not quite the same, but similar to it - and there's no sound that's exactly like our R sound.

And in fact, this sound that Japanese uses is a little bit in between the L and the R that we use. And the way we think about this - and this idea was introduced by Patricia Kuhl, one of the leaders in this research area - is that your experience creates something like a little perceptual magnet: when you hear sounds that are similar to each other over time, they kind of build up, and other sounds that are similar to them get sort of sucked into this little attractor or magnet. And so, when we listen to them, we hear them as being the same as what we're used to, rather than as having the characteristics they actually have in the acoustic input.
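[The "perceptual magnet" idea Dr. McClelland attributes to Patricia Kuhl can be sketched in a few lines: perception warps a raw acoustic value toward the nearest learned prototype. The one-dimensional continuum, prototype positions, and pull strength here are invented purely for illustration.]

```python
# Made-up 1-D acoustic continuum for liquids: 0.0 = English /r/, 1.0 = /l/.
prototypes = {"r": 0.0, "l": 1.0}

def perceive(sound: float, pull: float = 0.5) -> float:
    """Warp an input toward the nearest learned prototype (the 'magnet')."""
    nearest = min(prototypes.values(), key=lambda p: abs(p - sound))
    return sound + pull * (nearest - sound)

# An ambiguous liquid at 0.45 - roughly where a Japanese /r/-like sound
# might fall on this toy scale - is pulled toward /r/: perceived as 0.225.
```

[Inputs near a prototype get sucked in and sound identical to it, while inputs on opposite sides of the category boundary are pulled apart - which is why a listener hears the categories of their own language rather than the raw acoustics.]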

PALCA: Okay, Mark. Did that help at all?

MARK: Yes, I'll try to explain it to her.

(Soundbite of laughter)

PALCA: It's going to be a little tricky, maybe. Let's take one more call now and go to David in Charlotte, North Carolina. Welcome to the program.

DAVID (Caller): Hi. How are you doing?

PALCA: Good.

DAVID: I have a question regarding the infant language acquisition debate that's going on right now. My sister has a newborn child, and she's teaching her daughter communication skills through sign language - and the baby's not deaf. But what are the differences between that and the vowel and consonant formation that you'll find elsewhere?

PALCA: Oh, that's interesting.

Dr. McCLELLAND: That's a really interesting question. You know, what we've learned is that although there are differences between the auditory input modality and the gestural input modality, the amazing thing is that the brain - as it learns to structure input as a language, as a medium of communication - treats the different kinds of inputs in very, very similar ways.

So deaf individuals who have practiced sign language all their lives have the same tendency to take things that we would see as different and treat them as perceptually the same, based on their experience of treating them as the same communicative gesture throughout their lives.

PALCA: Well, this whole business of language acquisition, as you say, is totally fascinating. And thank you so much for coming along and showing us this little interesting aspect of it.

Dr. McCLELLAND: Oh, you're very welcome. Thank you.

PALCA: That was James McClelland. He's a professor of psychology at Stanford, and he joined us from Stanford University in Palo Alto.

Copyright © 2007 NPR. All rights reserved. Visit our website terms of use and permissions pages for further information.

NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.