More States Opting To 'Robo-Grade' Student Essays By Computer

Developers say they understand why teachers would be skeptical. But, they insist, computers already drive cars and detect cancer, so they can certainly handle grading students' essays.

SCOTT SIMON, HOST:

Little pop quiz now. Who writes our theme music - A, Snoop Dogg, B, Dolly Parton, C, Philip Glass or, D, B.J. Leiderman? A computer would quickly know D is the correct answer. But now computers are also starting to grade students' essays. As NPR's Tovia Smith reports, many teachers see that as a mistake.

TOVIA SMITH, BYLINE: Developers of the so-called robo-graders say they understand the skepticism. But they say if computers are already driving cars, detecting cancer and carrying on conversations, they can also handle grading a high school essay on, say, the fall of the Roman Empire.

PETER FOLTZ: Yeah, I've been working on this now for about 25 years, so I feel that it's something that - the time is right. And it's really starting to be used now.

SMITH: Peter Foltz is a professor at the University of Colorado and a researcher for Pearson, a company whose automated scoring program graded some 34 million student essays on state and national high-stakes tests last year. Foltz says computers learn what's considered good writing by analyzing essays graded by humans, and then they simply scan for those same features.

FOLTZ: We have artificial intelligence techniques which can judge anywhere from about 50 to a hundred features - whether a student is on topic, the coherence or the flow of an argument, the complexity of word choice. And we've done a number of studies to show that the scoring can be highly accurate.
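
To make this concrete, here is a minimal sketch in Python of the kind of supervised, feature-based scoring Foltz describes: extract a few surface features from each essay, fit a model to human-assigned scores, then use it to score new essays. Pearson's production system is proprietary, so the features, toy essays, and model below are invented stand-ins for illustration.

    # A minimal, hypothetical sketch of feature-based essay scoring.
    # The features and model are illustrative, not Pearson's actual system.
    import re

    from sklearn.linear_model import LinearRegression

    def extract_features(essay):
        """Turn an essay into a small numeric feature vector."""
        words = re.findall(r"[A-Za-z']+", essay)
        sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
        n_words = max(len(words), 1)
        return [
            len(words),                                 # overall length
            sum(len(w) for w in words) / n_words,       # mean word length (word-choice complexity proxy)
            len({w.lower() for w in words}) / n_words,  # type-token ratio (vocabulary diversity)
            n_words / max(len(sentences), 1),           # mean sentence length
        ]

    # Toy corpus of human-scored essays (scores on a 1-4 scale).
    scored_essays = [
        ("Rome fell. It was bad. People were sad.", 1.0),
        ("The Roman Empire declined for several reasons, including "
         "economic troubles and military overreach.", 3.0),
        ("The collapse of the Western Roman Empire reflected intertwined "
         "fiscal, military, and administrative failures accumulating over "
         "centuries of institutional decay.", 4.0),
    ]

    X = [extract_features(text) for text, _ in scored_essays]
    y = [score for _, score in scored_essays]
    model = LinearRegression().fit(X, y)  # learn feature weights from human grades

    new_essay = "The empire ended because of money problems and constant wars."
    print(f"Predicted score: {model.predict([extract_features(new_essay)])[0]:.1f} / 4")

A real system would train on many thousands of human-graded essays and use far richer features, but the shape of the pipeline - featurize, fit, predict - is the same.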

SMITH: To demonstrate, he takes a not-so-stellar sample essay rife with spelling mistakes and sentence fragments, and he runs it by the robo-grader, which instantly spits back a not-so-stellar score.

FOLTZ: So it gives an overall score of two out of four on these different writing traits. And it gets a one on spelling and grammar. It gives a two on task and focus, and...

SMITH: Several states already use automated grading on their standardized tests. Utah, for example, started cautiously with human eyes backing up every computer score. But officials say the computers have proven spot-on, and now more states are considering it.

(SOUNDBITE OF ARCHIVED RECORDING)

JEFF WULFSON: I asked Alexa whether she thought we'd ever be able to use computers to reliably score student tests, and she said absolutely.

(LAUGHTER)

SMITH: Massachusetts Department of Education Deputy Commissioner Jeff Wulfson introduced the idea at a recent meeting. He's one of many now intrigued by the potential cost savings and the prospect of getting test results back in minutes rather than months. But many teachers are unconvinced.

KELLY HENDERSON: The idea is bananas as far as I'm concerned. An art form, a form of expression being evaluated by an algorithm is patently ridiculous.

ROBYN MARDER: Agreed.

SMITH: Kelly Henderson and Robyn Marder teach English at Newton South High School just outside Boston.

HENDERSON: What about original ideas? Where's room for creativity of expression? A computer's going to miss all of that.

SMITH: Even worse, Henderson worries robo-graders will encourage the worst kind of formulaic writing.

HENDERSON: What is a computer program going to reward? Is it going to reward some vapid drivel that happens to be structurally sound?

LES PERELMAN: That's a very easy question to answer. And that's what we'll see in the Babel Generator.

SMITH: MIT researcher Les Perelman designed his Babel Generator to expose what he sees as the absurdity of robo-scoring. It works like a computerized Mad Libs, creating essays that make zero sense but earn top scores from robo-graders.
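
To illustrate the Mad Libs mechanism, here is a toy sketch in Python. Perelman's actual Babel Generator is far more elaborate; the word lists and sentence frames below are invented for illustration. The trick is simply to slot inflated vocabulary and the prompt's keywords into grammatical but meaningless frames.

    # A toy babble generator: grammatical nonsense built from big words.
    # All templates and vocabulary here are invented stand-ins.
    import random

    BIG_NOUNS = ["scrutinization", "assimilation", "exhortation",
                 "adjuration", "ramification", "paradigm"]
    BIG_ADJS = ["inexorable", "quintessential", "multifarious",
                "unsubstantiated"]
    FRAMES = [
        "{kw} is a {noun} that has not, and no doubt never will be, {adj}.",
        "The {adj} {noun} of {kw} cannot be understated; in conclusion, it is {adj2}.",
        "By its very nature, {kw} engenders a {adj} {noun} upon every {noun2}.",
    ]

    def babble(keywords, sentences=6):
        """Fill random frames with random big words and prompt keywords."""
        out = []
        for _ in range(sentences):
            s = random.choice(FRAMES).format(
                kw=random.choice(keywords),
                noun=random.choice(BIG_NOUNS),
                noun2=random.choice(BIG_NOUNS),
                adj=random.choice(BIG_ADJS),
                adj2=random.choice(BIG_ADJS),
            )
            out.append(s[0].upper() + s[1:])
        return " ".join(out)

    print(babble(["motive", "privateness", "technology"]))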

PERELMAN: OK, so we'll generate an essay.

SMITH: To demonstrate, he gets an online practice question for the GRE exam that's graded with the same algorithms as actual tests. Then on his Babel Generator, he enters three words related to the essay prompt and presto, a 500-word wonder.

PERELMAN: Motive is a scrutinization that has not and no doubt never will be disrupting yet somehow assimilated.

SMITH: This is hilarious.

PERELMAN: Yeah.

SMITH: It makes no sense.

PERELMAN: It makes absolutely no sense.

SMITH: But Perelman promises that won't matter to the robo-grader and submits the essay for a score.

PERELMAN: And...

SMITH: Big moment of truth.

PERELMAN: Six points - perfect score. It's so scary that it works.

SMITH: It proves, Perelman says, that real ideas and facts don't matter to the algorithm, and that the system is easy to game. Even without a Babel Generator, he says, students can fool the computer by just using lots of big words, complex sentences and some key phrases like "in conclusion." But Nitin Madnani, a researcher at ETS, the company that makes the GRE's robo-grader, says that's not exactly a hack.

NITIN MADNANI: If somebody is smart enough to pay attention to all the things that a - you know, an automated system pays attention to, to incorporate them in their writing, that's no longer gaming. That's good writing. So you kind of do want to give them a good grade.

SMITH: Madnani says actual GRE essays are always scored by a human reader as well as a computer, so pure babble would never pass a real test. And while other tests are graded only by machines, they're getting better at picking up student tricks and flagging them for human review. For example, some students have written one perfect paragraph and just repeated it four more times. Others have padded their essays with long quotes. David Shermis is a dean at the School of Education at the University of Houston-Clear Lake.

DAVID SHERMIS: In this game of cat and mouse, the vendors have already identified that as a strategy. And so the essay will be scored with very low confidence, and it will say, please have a human rater take a look at this.
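
As a rough illustration of the check Shermis describes, here is a minimal Python sketch that compares every pair of paragraphs in an essay and routes near-duplicates to a human rater. Vendors' actual confidence models are proprietary; this similarity heuristic is an invented stand-in.

    # Flag essays that repeat (nearly) the same paragraph for human review.
    # The 0.9 similarity threshold is an arbitrary illustrative choice.
    from difflib import SequenceMatcher
    from itertools import combinations

    def needs_human_review(essay, threshold=0.9):
        """Return True if any two paragraphs are suspiciously similar."""
        paragraphs = [p.strip() for p in essay.split("\n\n") if p.strip()]
        return any(
            SequenceMatcher(None, a, b).ratio() >= threshold
            for a, b in combinations(paragraphs, 2)
        )

    essay = "\n\n".join(["Rome fell because of economic decline and invasion."] * 5)
    if needs_human_review(essay):
        print("Low confidence: please have a human rater take a look at this.")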

SMITH: So in conclusion, robo-grading technology may indeed be demonstrating proficiency, but experts say it's also still got plenty of room for improvement. Tovia Smith, NPR News.

Copyright © 2018 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.

NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.