A Look At How Word Prediction Software Works
ROBERT SIEGEL, HOST:
On this All Tech Considered - my last one before retiring, so I'm allowed to indulge a point of personal curiosity - we're going to hear from someone who works on word prediction. For example, I've got my smartphone open. I'm going to type out a message - capital H. And at the bottom of the screen are three choices. The middle one is hi, which is what I meant - hi. Now I have three choices, and the first one is there, which is what I meant - comma - T-H - thanks is the middle choice. Thanks for is what I meant. And now I'll do J-O-I-N. Joining is the middle choice, I'll tap that. And then us is the choice on the left.
Hi there. Thanks for joining us. Ben Medlock, co-founder of SwiftKey, a startup that helped pioneer this technology and that is now part of Microsoft. He joins us from London. Hi. Welcome to the program.
BEN MEDLOCK: Hi, Robert.
SIEGEL: I'm just wondering, are there some metrics that you've learned in your work? That is, what is the likelihood of when I write a word that we can figure out what three good possibilities are of what the next word is that I have in mind?
MEDLOCK: Yeah. So if you imagine that pretty much the simplest thing you could do if you wanted to predict the word that someone's going to want to say is to look at all the words they've said in the past, count them up and then choose the three that occurred most frequently and just suggest those. So that will get you to somewhere around 10 percent accuracy because language is a system where you use a lot of the same words a lot of the time. But, of course, what this doesn't take into account is any context.
And so we can do a bit better than that by looking at all the things that you've said in the past again, but now, the context within which you've said them. So let's say you say the words I love. We could then look at all the words you said in that particular context and again count them up and choose the three that have the highest count. And then maybe you get the word you and the word chocolates.
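[Editor's sketch, not part of the broadcast: the two counting approaches Medlock describes can be written in a few lines of Python. The function names and the tiny sample history are hypothetical, chosen only for illustration. This is a minimal sketch of simple frequency counting, not SwiftKey's actual implementation.]

```python
from collections import Counter


def top_words(history, k=3):
    """Context-free baseline: the k most frequent words typed so far."""
    return [word for word, _ in Counter(history).most_common(k)]


def top_words_after(history, context, k=3):
    """Context-aware version: the k words that most often followed `context`."""
    n = len(context)
    target = tuple(context)
    followers = Counter(
        history[i + n]
        for i in range(len(history) - n)
        if tuple(history[i:i + n]) == target
    )
    return [word for word, _ in followers.most_common(k)]


# Hypothetical typing history, split into lowercase tokens.
history = "i love you . i love chocolates . i love you too . i am here".split()

print(top_words(history))
print(top_words_after(history, ["i", "love"]))  # → ['you', 'chocolates']
```

[With this sample history, the context "I love" yields "you" and "chocolates", matching the example in the interview. A context that never appeared in the history yields no suggestions at all, which is the weakness of relying on one person's past text.]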
SIEGEL: But you're asking what I am likely to do. Are the choices that I see personal and about my use of the language, or would everybody, after seeing hi, see there as one of the most-common words to follow?
MEDLOCK: You're actually spot-on. The problem with just the things you've said is that actually there's quite a lot of stuff that you say that you haven't said before in quite that way. So one of the ways we can improve on this idea of just using your own language is to blend it with lots of statistics from lots of other people.
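[Editor's sketch, not part of the broadcast: one common way to blend a user's own statistics with statistics from many other people is a weighted average of the two sets of relative frequencies. Medlock does not say how the blend is computed, so the weighting scheme, the function name, the `weight` parameter and the counts below are all assumptions made for illustration.]

```python
from collections import Counter


def blended_suggestions(personal, background, weight=0.5, k=3):
    """Rank words by a weighted mix of personal and background frequencies.

    `weight` is the share given to the user's own counts (an assumed knob).
    """
    p_total = sum(personal.values()) or 1   # avoid dividing by zero
    b_total = sum(background.values()) or 1
    scores = {
        word: weight * personal[word] / p_total
        + (1 - weight) * background[word] / b_total
        for word in set(personal) | set(background)
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical counts of words seen after "hi".
personal = Counter({"there": 3, "mum": 1})
background = Counter({"there": 40, "everyone": 40, "all": 20})

print(blended_suggestions(personal, background))  # → ['there', 'everyone', 'mum']
```

[In this made-up example, "mum" reaches the top three only because of the user's own history, while "everyone" comes entirely from other people's usage.]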
SIEGEL: Is there some metric that's known to you and your colleagues in this field as to, you know, what your batting average is, how often you can get it right?
MEDLOCK: At the moment, I think it stands around a third of words - in English, at least - we can predict without the user actually having to type anything.
SIEGEL: And has that rate been getting better and better over the years?
MEDLOCK: It has, yeah. I think if you look back four or five years, we were probably just less than a quarter of the words. Now we're up to a third. And I'd like to think that some day we'll get towards that magic 50 percent.
SIEGEL: I'm curious. Having worked on this for some years right now, which, for you, is the more impressive finding - that the way we write is really pretty predictable for most people in most occasions or that word choice shows how individual and surprising human beings are?
MEDLOCK: Yeah. I think in some ways, the bread and butter of our technology is the fact that people do use language in predictable ways. And I sometimes like to think about functional language, where you're just trying to get a message across. And you quite often end up saying something that's the same when you don't really want to have to think about that. You'd kind of rather the technology did that for you. On the other hand, there are these times when we really want to be creative with language. And that's where, you know, we really don't have technology that can come anywhere near capturing the beauty of language in that way. So I think both things are very interesting and very fruitful in their different ways.
SIEGEL: Well, Ben Medlock of SwiftKey, thanks for talking with us today.
MEDLOCK: Thanks, Robert.
NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.