This past spring, 5 million students from third grade through high school took new, end-of-year tests in math and English that were developed by a consortium of states known as PARCC.
It's a big deal because these tests are aligned to the Common Core learning standards, and they're considered harder than many of the tests they replaced.
It's also a big deal because until last year, it was all but impossible to compare students across state lines. Not anymore.
There's just one problem: The results won't be released until late fall. What's the holdup, you ask?
The tests have all been read and the answers tallied. That's not the problem. The problem is, adding up right answers doesn't tell you how a child did. For that, you need cut scores. And PARCC doesn't have them yet.
"The cut score is the manifestation of how good is good enough," says Mary Ann Snider, chief academic officer for the Rhode Island Department of Education.
PARCC has already agreed on five basic student performance levels. Five is for students who display "distinguished" command of material. One is "minimal." The goal is to get all students at least to Level Three or "adequate" command. What PARCC doesn't have yet are the point cut-offs that draw hard lines between those categories.
Snider is hard at work helping nail down cut scores for those 5 million tests. She says it's not as simple as using the old A-through-F, 10-point scale, where 70 percent is the traditional cut-off for moderate, or good enough, performance.
"I don't know that I want my pilot to know 70 percent of the content for flying a plane. I'd like him to be closer to 90 or 100 percent," Snider says.
In short: Where you set the cut scores depends on the importance and difficulty of the skills being tested.
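To make the idea concrete, here is a minimal sketch of how cut scores turn a raw result into one of five performance levels. Everything numeric in it is an assumption for illustration: PARCC had not set its cut-offs, the two sets of cut scores below are invented, and scoring by percent correct is a simplification. The function name `performance_level` is hypothetical as well.

```python
from bisect import bisect_right

# Hypothetical cut scores: the minimum percent correct needed to reach
# Levels 2, 3, 4 and 5. These numbers are invented for illustration only;
# they are not PARCC's.
LENIENT_CUTS = [40, 60, 75, 90]
STRICT_CUTS = [50, 80, 90, 97]


def performance_level(percent_correct, cuts):
    """Return a level from 1 to len(cuts) + 1 for a percent-correct score.

    A score exactly equal to a cut score counts as reaching that level.
    """
    # bisect_right counts how many cut scores the student has met or passed.
    return bisect_right(cuts, percent_correct) + 1


if __name__ == "__main__":
    score = 70
    print(performance_level(score, LENIENT_CUTS))  # level under the lenient cuts
    print(performance_level(score, STRICT_CUTS))   # level under the strict cuts
```

Under these made-up numbers, the same 70 percent lands at Level Three with the lenient cuts and Level Two with the strict ones, which is why adding up right answers alone doesn't say how a child did.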
Patte Barth, director of the Center for Public Education, says it's a balancing act: "Establishing cut scores is part science. It's part art. But it's also part political."
Barth remembers the early days of the No Child Left Behind law, when the federal government told states they'd be punished if students weren't "proficient" — which is just a fancy way of saying "good enough." But, since states used wildly different standards and tests, they also got to set their own cut scores. As a result, Barth says, "they found a huge, huge range of performance levels."
Many states lowered the bar, creating the illusion of improvement. That's one reason Snider — from Rhode Island — found herself in a Denver hotel last week.
Welcome To Denver
The states involved with PARCC sent more than a hundred hand-picked teachers and educators to join Snider there. They poured into the hotel's cavernous basement ballrooms and began debating those elusive scores.
"Would a Two be able to do this and this?" asks Lorretta Holloway, who teaches English at Framingham State University in Massachusetts. She was on the panel debating cut scores for the 11th-grade English test and explains what the closed-door deliberations were like. "Not should they. I'm thinking, 'Yes, because they all should do everything, right?' But would they, really?"
Holloway says that, in fleshing out these student performance levels, she constantly had to weigh what students likely know (the real) against the aspirational: What should they know?
In that balance, says Snider, is a more honest test:
"We have to get enough of that right so that we're not giving kids a false sense of accomplishment."
Unlike those early days of No Child Left Behind.
"Yeah, it might be a tough test," says Marti Shirley, who teaches high school math in Mattoon, Ill., and was also on a cut-score panel. "But, you know what? It's gonna give us a true reflection of where our students are and what growth they need."
That may sound good, but remember what Barth said?
Establishing cut scores is part political. And PARCC has struggled mightily to win the political fight over raising the bar. Because it's selling a tough message:
"States should expect those scores to be lower. And, if they're smart, they're communicating that to the public," says Barth.
But what politician wants to preside over a huge drop in student test scores?
Not long ago, half of all states were involved in some way with PARCC. Today, it's seven, plus Washington, D.C. And quietly, in Denver, some teachers worried that the decline of PARCC could mean a return to the days when many state tests weren't honest.
Holloway says someone has to be honest with the freshmen who are surprised to find they can't keep up in her college writing class.
"Sit in my office with me," Holloway says, "when I'm passing out the pudding and the Kleenex while they're in tears. Because they are working, and they are still behind."
In the coming weeks, more teachers will sit in that windowless hotel basement in Denver, debating the scores and skills that will separate good enough from not quite.
And they'll do it with the weight of 5 million tests — and the fate of 5 million students — on their shoulders.