A recent study suggests that computers can score student essays about as well as human beings. Les Perelman, a director of writing at MIT, isn’t impressed:
While his research is limited, because E.T.S. is the only organization that has permitted him to test its product, he says the automated reader can be easily gamed, is vulnerable to test prep, sets a very limited and rigid standard for what good writing is, and will pressure teachers to dumb down writing instruction.
The e-Rater’s biggest problem, he says, is that it can’t identify truth. He tells students not to waste time worrying about whether their facts are accurate, since pretty much any fact will do as long as it is incorporated into a well-structured sentence. “E-Rater doesn’t care if you say the War of 1812 started in 1945,” he said.
Sounds like another win for e-graders to me! An excessive deference to facts is just an obstacle to success these days, best left to the little people responsible for the drudge work of implementing plans and tactics. If you have higher ambitions, an ability to bullshit persuasively is far more important, and apparently our robot essay scorers know that. Besides, they can grade nearly a thousand essays a second. What’s not to like?
On a more serious note, I suspect that Perelman’s criticisms miss the point. He says that electronic grading programs can be gamed, and I have no doubt that he’s right. But here’s the thing: the study that started all this didn’t say that robot graders have discovered some cosmically valid measure of writing quality. It just said that computer graders handed out the same scores as human graders. In other words, apparently humans don’t care much about facts either, are easily impressed by big words, and have idiosyncratic likes and dislikes that are easy to pander to. The average human being, it seems, can be gamed just as easily as a computer.
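To put that concretely: studies like this one measure agreement between graders, not access to truth. Here’s a minimal sketch, with invented scores on a six-point scale (real studies use thousands of essays and fancier statistics like quadratic weighted kappa), of what “handing out the same scores” actually amounts to:

```python
# Toy illustration of grader agreement. The scores are invented for
# illustration; they are not data from the study.

human   = [4, 3, 5, 2, 4, 4, 3, 5, 2, 4]
machine = [4, 3, 4, 2, 4, 5, 3, 5, 2, 4]

# Exact agreement: how often the two graders give the identical score.
exact = sum(h == m for h, m in zip(human, machine)) / len(human)

# Adjacent agreement: scores within one point of each other, the
# yardstick typically used in essay-scoring research.
adjacent = sum(abs(h - m) <= 1 for h, m in zip(human, machine)) / len(human)

print(f"exact agreement:    {exact:.0%}")    # 80% here
print(f"adjacent agreement: {adjacent:.0%}") # 100% here
```

Notice what the metric can’t see: nothing in it asks whether either grader caught the essay that put the start of the War of 1812 in 1945.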
If you want a broader moral about computer intelligence from all this, I’ve got one of those too. Here it is: people who don’t believe in “real” artificial intelligence natter on endlessly about their conviction that computers will never truly replicate the glorious subtleties and emotional nuances of human thought. The problem is that most of them overestimate just how impressive human thought really is. Human beings, in most cases, are just a bundle of fairly simpleminded algorithms that combine in enough different ways that the results seem ineffable and impossible to reduce. But that’s only because most of the time we don’t really understand our own motivations. We aren’t nearly as impressive as we like to think.
In the end, this is my big difference with the AI naysayers: I’m just not as impressed by human intelligence as they are. All those human essay graders probably think they’re drawing on deep human values and intelligence as they score those essays, but in fact they’re mostly just applying a few hundred (or maybe a few thousand) linguistic algorithms they’ve learned over the years and spitting out a number. And before you scoff at the poor drones doing this grading, who are nothing like you because you have subject-area knowledge and do care about facts, well, how long do you really think it will be before robo-graders have that too? If a computer can win Jeopardy! and act as an expert system for medical diagnosis, how long will it be before its judgment of factual material is as good as ours? Ten years? Twenty?
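If you’re skeptical that a few hundred linguistic algorithms could pass for judgment, consider how far purely surface features get you. Here’s a deliberately crude sketch; the features echo the kinds of proxies shallow graders are known to lean on (length, vocabulary, sentence complexity), but the weights and scoring function are invented for illustration, not anyone’s actual model:

```python
# A deliberately shallow essay scorer built from surface features only.
# The weights are invented; this is an illustration, not e-Rater.

def toy_score(essay: str) -> float:
    words = essay.split()
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".") if s.strip()]

    length = len(words)                                 # longer essays tend to score higher
    vocab = len({w.lower().strip(".,;:") for w in words}) / max(len(words), 1)  # word variety
    avg_sentence_len = length / max(len(sentences), 1)  # long sentences read as "sophisticated"
    big_words = sum(len(w) > 7 for w in words) / max(len(words), 1)             # fancy vocabulary

    # Note what's missing: nothing here checks whether a single claim is true.
    raw = 0.01 * length + 2.0 * vocab + 0.05 * avg_sentence_len + 5.0 * big_words
    return max(1.0, min(6.0, round(raw, 1)))  # squash onto a six-point scale

# A factually absurd but "well-structured" essay scores respectably:
print(toy_score("The War of 1812 started in 1945. Its multitudinous ramifications "
                "reverberated throughout antebellum historiography and beyond."))  # 4.4
```

The point isn’t that this toy is any good; it’s that a scorer with no concept of truth at all can still hand out plausible-looking numbers, which is exactly Perelman’s complaint, and exactly why matching human graders turns out to be a low bar.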
The future success of AI doesn’t fundamentally hinge on the fact that computers will someday be far more impressive than they are today. It hinges on the fact that human-level intelligence isn’t all that high a bar in the first place. My guess is that we don’t have very much longer to get used to that.