Analyzing The Language Of Suicide Notes To Help Save Lives
NEAL CONAN, HOST:
Every 14 minutes, someone in this country commits suicide, and research on ways to reduce that grim statistic appears to be on a plateau. In other words, psychologists don't have much in the way of new ideas - at least, right now - except maybe for what's described as groundbreaking work on the notes that those who kill themselves sometimes leave behind. A team of researchers at the Cincinnati Children's Hospital use computers to break down the language in these messages of despair, in the hope that they can better identify those at risk.
Dr. John Pestian leads the research team. He's a pediatrician and director of computational medicine at Cincinnati Children's Hospital, and joins us now from member station WGCU in Cincinnati. [POST-BROADCAST CORRECTION: The NPR member station in Cincinnati was misidentified. It is WGUC.] And good of you to join us today on TALK OF THE NATION.
JOHN PESTIAN: Oh, thank you for inviting me. And just for the record, I'm not a pediatrician. I'm a scientist there.
CONAN: All right. All right. I apologize.
PESTIAN: I'm a PhD. I'm not an MD.
CONAN: I understand you work from a collection, though, of 1,300 suicide notes. Who made this collection, and why?
PESTIAN: So myself and the folks in my lab, we collected that over about the last five or seven years. A large contributor was Dr. Edwin Shneidman, who was the father of suicide research at UCLA, and Dr. Anton Leenaars up in Canada, and we work together in collecting these data and mining them together to create what we would call, from the linguistic side, a corpus.
CONAN: A body.
PESTIAN: And they - I'm sorry.
CONAN: A body. Yes.
PESTIAN: A body, a body of language, a body of notes. And if we want to do analysis of those using natural language processing and other methods, like machine - from machine learning, we need a large corpus in order to start and to teach the computers from.
CONAN: Where did you get them?
PESTIAN: So, Dr. Shneidman, before he passed away, he gave us a large - or a good body of it, Anton Leenaars. And then the rest came from people across the United States who now, to this day, still send notes so that they can be added into the corpus. And so over the last five, seven years, we've been able to amass about 1,300 of them.
CONAN: And sometimes those of us in other fields of work think we have got depressing jobs. Reading those things and to analyze them, that's got to be sad work.
PESTIAN: You know, I think it's sad if you look at it on the, you know, on the very intrinsic basis. But when you look beyond that and the opportunity in order to develop methods that will help save lives, you look beyond the sadness rather quickly. That being said, I've read all of them, and they are very sad tales, in many cases.
CONAN: What - how are they similar and how are they different?
PESTIAN: The most similar thing in the notes that I found in reading was the loss of hope. When hope is gone, when hopelessness emerges - and that's in most of the notes - then folks have a tendency, in order to - you know, that's what you see most often. Secondarily, what you see most often is these practical instructions. Remember to change the tires. Remember to change the oil. I drew a check, but I didn't put the money in. Please go ahead and make the deposit. So there's a lot of these practical, and that would come in second to the idea of hopelessness. Other emotions are, you know, depression, a little bit of anger, not so much hate, but just, again, the whole idea of abandonment, and I just can't go on any longer. I can't deal with this any longer.
CONAN: And you are analyzing these in order to teach computers how to recognize similar kinds of phrases.
PESTIAN: Yes. So, it's - in Cincinnati Children's, we see roughly about 40 suicidal kids a week come into the emergency room. And the idea is: Can we predict if they're going to come back? Of those 40, we send about 20 into the hospital, and we send 20 home for external therapy. And we want to know, are we going to be able to - or are we sending the right ones home? Are they going to come back? What's the risk that they're going to come back? So by asking me - them questions, and then comparing it to the suicide notes, we can kind of get an idea of how similar or how divergent their language is from the language of suicide.
CONAN: And is this a good predictor?
PESTIAN: So far. And again, we've only done one small study in the hospital. We've enrolled 30 kids in 30 core - 30 control groups as a prospective study. We've done a series of retrospective studies. So far we're about 90 to 93 percent accurate of predicting who is suicidal, not whether they come back. We have to develop some more statistics for that. We're just getting ready to start a four-center study that includes us and the University of Cincinnati adult hospital.
CONAN: Explain to us a little more how this works. How do you get language out of them when they come into the hospital, and how do you compare it to what's been written in suicide notes?
PESTIAN: Yeah. So you just sit down and you have whoever is in charge, in our case it's the social workers. We ask them a - the kids a series of questions. We ask things like, do you have hope? Do you have any secrets that you're hiding that you want to talk to us about? Do you - where does it hurt emotionally? And we listen to those responses.
And then at this point, because it's in development, we transcribe that. And then we use techniques that others and we have developed in order to compare one body of language, one kid's, with a larger body of language. And that falls under the whole idea of natural language processing.
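As a rough illustration of comparing one speaker's language with a larger body of language, the sketch below scores word overlap between two texts using cosine similarity over word counts. This is an assumption made for illustration only: the interview does not say which techniques the team uses, and the function names `bag_of_words` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Return the cosine of the angle between two word-count vectors.

    1.0 means the same word proportions, 0.0 means no words in common.
    """
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has nothing to compare
    return dot / (norm_a * norm_b)


# Illustrative strings only (phrases quoted in the interview), not corpus data.
reference = bag_of_words("remember to change the tires")
response = bag_of_words("remember to change the oil")
score = cosine_similarity(response, reference)
```

A real system would presumably use richer features than raw word counts; in the interview Pestian mentions sentence structure and noun-verb patterns.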
CONAN: Natural - and machine learning is part of this.
PESTIAN: Machine learning is - would be the, you know, the top of the tree and natural language processing would be one of the methods under the machine learning. There's multiple - there's all kinds of tools in machine learning but NLP is one of them.
CONAN: And how does that work?
PESTIAN: So what we do is we take these notes. And we originally had a great deal of help from the survivor community. And the survivor community are the people who had someone in their family or a loved one die by suicide, because when you read the notes you have to annotate them with emotions. You have to look and say, oh, I'm really mad at my mother. Is that anger or is that hate? Someone has to read through that and annotate it and say - so we had these...
CONAN: To tell the computer, who is not going to understand this, yeah.
PESTIAN: Exactly. Exactly. So that annotation drives how the computer looks at it. And 160 of the survivors volunteered, and they all read the notes three times so we can get a good, what we call inter-rater reliability. And then we gave half of that to the computer and said learn from it, and then we'd keep the other half behind without the answers and ask the computer to learn - or tell us what the answer should be.
We did this in order to increase the quality. We did this in an NIH-sponsored international competition where we took the data and we asked linguists from all over the world to compete and see who could come up with the best methods in order to predict which emotions, what emotional categories we were looking at.
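The hold-out procedure described here, where half of the annotated notes are given to the computer with answers and the other half are kept back without them, can be sketched as follows. This is a minimal sketch under stated assumptions: the word-overlap classifier is a stand-in chosen for brevity, not the method used by the team or the competition entrants, and every function name is hypothetical.

```python
import random
from collections import Counter


def split_half(items, seed=0):
    """Shuffle annotated items and split them into a training half and a held-out half."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def train_word_counts(training):
    """Count how often each word appears under each annotated label."""
    counts = {}
    for text, label in training:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def predict(counts, text):
    """Pick the label whose training words overlap most with the text.

    Ties go to the alphabetically first label (toy stand-in classifier).
    """
    words = text.lower().split()
    return max(sorted(counts), key=lambda label: sum(counts[label][w] for w in words))


def accuracy(counts, held_out):
    """Fraction of held-out items whose predicted label matches the annotation."""
    correct = sum(predict(counts, text) == label for text, label in held_out)
    return correct / len(held_out)
```

In use, `split_half` would be called on the list of `(text, label)` pairs, `train_word_counts` on the first half, and `accuracy` on the second half, so that the score reflects notes the model never saw during training.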
CONAN: And then you did, you say, retrospective studies. In other words, you went back and entered the...
PESTIAN: We took old...
CONAN: Go ahead, tell me. Yeah.
PESTIAN: Yeah, we took old data, and old, it wasn't - people weren't enrolled, but they were data that had been around. One example was where we had 66 notes that Shneidman had put together; 33 were from actual people who had committed and 33 where he went to a local labor union near UCLA and said if you were going to die, if you were going to commit suicide, what would you write?
And we took those notes and we shuffled them all up, and then we asked about 60 mental health professionals to tell us which ones were real and which ones were simulated. And mental health professionals were good as a flip of a coin, about 50 percent of the time. Now, that's a hard task, don't get me wrong.
CONAN: Mm-hmm. Yeah.
PESTIAN: But the machine was good about 90 percent of the time. And the reason why, we believe, is the whole idea of what we call psychological phenomenology where if you, Neal, see something, you interpret it the way you learned how to interpret it, and, I, John, see something, the same thing, I'm going to interpret the way I think it should be interpreted. Computers don't do that. People do that. Computers look at sentence structure, noun-verb patterns, things along that line.
CONAN: This is going to scare people, the idea that machines are going to be diagnosing people.
PESTIAN: Oh, yeah. So, we never do that. We give decision support. We help in the diagnosis. We help in providing information. The Institute of Medicine published a book not long ago, the Learning Hospital, and they basically said there's too much information for one person to be able to pull it all together, analyze it and make a diagnosis. We need better ways in order to present the information. And that's what we do. We just present it to the - in the end, the art and the value of medicine is in the human interaction.
CONAN: The human interaction. So the computer might send up a red flag, but it's a human who's got to go in there and make a decision.
PESTIAN: Yeah. It makes it up and say, you know, this is similar, just like if you had diabetes and you went in, your blood sugar was high. Then the computer - the lab machine, the computer would say, this is similar to other people who have diabetes. You better check on it. It's no different. In those cases, you're looking at biomarkers. With us we're looking at thought markers.
CONAN: Thought markers. That's an interesting way to put it. Who came up with that expression?
PESTIAN: I did. And the reason why is because I have hundreds of biologists around me, and I could never speak to them unless I came up with some of the terms that were... that they understood.
CONAN: Excuse me. I know we're talking about suicide, we're not supposed to laugh. But every profession has its argot, and so I guess you got to come up with one for yours as well. And the people you're working with, as you're now working with - you're now working with kids who are coming into the hospital and trying out this technique?
PESTIAN: Yes. So then the study we're just finalizing the protocol and ready to send off to institutional review boards includes kids and adults. And it's Children's - and we see over a million patients a year, but again, just 40 of those are suicidal - it'll be Children's, the University of Cincinnati psychiatric - adult psychiatric. We have a hospital in Princeton, West Virginia, an Appalachian-based hospital; and the Canadian health system. And so those would be four sites where we do this on a larger basis.
CONAN: What are the ages of the kids you see?
PESTIAN: We do adolescents, so we'll do anywhere between right around 11 to 18.
CONAN: And is there any distinction between the language that adolescents use and the language that adults use?
PESTIAN: You know, I think in the number of likes and ums in a sentence, yes. But other than that, it's pretty much the emotions are emotions. They just may say it differently.
CONAN: We're talking with John Pestian, director of the Computational Medicine Center at Cincinnati Children's Hospital. You're listening to TALK OF THE NATION from NPR News.
And let's get a caller in on this. Chet is on the line with us from Tifton, Georgia.
CHET: Hi. You may have already touched on this. I came in kind of late. I'm a retired police officer. I still do training and so I've dealt with a lot of these issues.
There is a photo that went viral on Facebook this week. I don't know if he's familiar with the case. It's actually two photos. The first is a San Francisco police officer talking to a young man on the Golden Gate Bridge, and he did talk him out of it. And then the second photo is - was just taken last week. That young man, eight years later, has now given him - presented him with an award. And how often do you use survivors and - are they used in training or...
PESTIAN: Well, we use survivors - there was 160 of them that volunteered to help us when we were building the corpus for linguistic analysis. And they came forward when we asked for their help through the American Association of Suicidology, and all of them came forward. So we try to call on anyone who's willing to help, to be honest with you.
CHET: Some of the comments made about that particular two photographs are - several people commented just from looking at the photographs. The posture of a young man when he was, you know, in that state of mind. And then as he's smiling for the camera, presenting the award. Are you familiar with the photograph I'm talking about?
PESTIAN: No. I haven't seen them.
CHET: It just speaks volumes. Just looking at a photo of a different state of mind, you can see it even though his back is turned in the first photo.
PESTIAN: Well, I'd have to look at it. Thank you.
CONAN: Chet, thanks very much for the call. Interesting.
As you move ahead, what is the application - is that same ratio - let me go back to something you said before, where the social workers and other clinicians are about - right about 50 percent of the time and the machine right about 90 - has that held up?
PESTIAN: Yeah. Well, we have to test that again. That's what this larger study looks at - that accuracy, the clinician's accuracy. We also want to include acoustical patterns. How does the voice sound as it's being said and video patterns, so the idea of creating a - at least for kids, creating a virtual human that the kids can talk to in the emergency room and you can pick up their video and their voice characteristics along with their linguistics, is something that's very appealing in the long run. So, but we have to test that...
CONAN: How far away are you from incorporating the audio and video aspects of it?
PESTIAN: The next study incorporates the audio and video and genetics, so those are the three characteristics that are brought beyond the language.
CONAN: How do you get a corpus for that?
PESTIAN: In this case, it's the exact same way that we did in the first time - in the first study - we go in and talk to the kids, and adults; so there'll be roughly 500 people that enrolled in this study. But we'll also do the video with a camera and the audio with audio recording, and genetics are just a little toothbrush-like thing, called a buccal swab that you scrape against your cheek and then send off to be analyzed.
CONAN: As you look at the promise of this, in terms of alerting people to who is more at risk than someone else, this could be a breakthrough.
PESTIAN: Yes, it could. But we have a long road to go.
CONAN: How long before this next study is going to be completed?
PESTIAN: Hopefully we'll start enrolling in early fall and it'll last - enrollment will be about a year to 18 months, and then about a year worth of analysis before we can kind of get to that stage. So it's a couple years away.
CONAN: And you can understand the impatience of people saying, wait a minute, if this is promising, why can't we use it now?
PESTIAN: Mm-hmm. Yeah. We can't, though; it's not been tested enough.
CONAN: Because the risks are very high.
PESTIAN: Well, there's - again, we're not making diagnosis so there's never a risk with presenting the clinician more information and more data so that they can understand what's going on. But as far as a finalized test, we still have a lot of work to do.
CONAN: And you need to know the reliability?
PESTIAN: Mm-hmm. Reliability, validity, generalizability, all those -ilities have to be tested.
CONAN: Good luck with the -ilities. And congratulations on the marker phrase. I like that.
PESTIAN: If you like it, you can use it anytime you want.
CONAN: OK. Thanks very much. Appreciate it. John Pestian is director of the Computational Medicine Center at Cincinnati Children's Hospital. He joined us today from our member station there in Cincinnati, WGCU. [POST-BROADCAST CORRECTION: The call letters are WGUC.]
Tomorrow, we'll have a look ahead - another one - this time with our own Robert Krulwich. Join us for that. It's the TALK OF THE NATION from NPR News. I'm Neal Conan in Washington.
Correction May 16, 2013
We incorrectly identify radio station WGUC as WGCU.