Several years ago I was visiting with some friends and happened to get into a conversation with their four-year-old daughter. I don’t remember why, but we got to talking about numbers, and as adults will do, I started quizzing her. Do you know what two plus two is? She did. How about four plus three? No problem. Six plus five? Nine plus four? Eight plus seven? Yes, yes, and yes. That was about as far as she could go, but I was pretty impressed. That’s not bad for a four-year-old, is it?
A year later she was in kindergarten and I was visiting again. And I was curious about how her mathematical prowess had progressed. Answer: it hadn’t. She couldn’t even answer the questions she had gotten correct the year before.
Now, this happened over two decades ago (the daughter in question graduated from college a couple of years ago) and I’ve long wondered if it even actually happened. I clearly remember it, and yet it all seems so unlikely. Did I just imagine the whole thing?
Maybe not. A few days ago I wrote about a Los Angeles Times project to post an online database that measures the performance of LAUSD teachers based on how their kids do on standardized tests. I approved: “Either you believe that the press should disseminate public data or you don’t,” I said, but there were some unspoken words in that sentence. What I really meant was, “Either you believe that the press should disseminate meaningful public data or you don’t” — since, needless to say, nobody believes the press should randomly disseminate useless and misleading data, public or otherwise.
So do standardized tests provide meaningful data? Millions of barrels of ink have been spilled on this question, but here’s an interesting take from a study done a few years ago. Paul Camp, a physics professor at Spelman College, in the course of investigating how students learn Newtonian concepts, came across an interesting result: they don’t learn in a straight line. They learn things, then they get confused, and then they learn them again for good. Learning, in other words, follows a U-shaped pattern, and not just for university-level physics:
U-shaped developmental patterns appear to be a general feature of human cognition….Competencies, once learned, do not disappear but they are unusually fragile while understanding reorganizes into a more mature form, and this fragility is reflected by variability in performance…. In short, achieving a new state of organization requires passage through a state of apparent disorganization.
….The existence of U-shaped development has important implications for student evaluation. It directly implies that single point assessments are unfair and inaccurate.
There’s evidence that this U-shaped pattern is common (this paper, for example, compares 7-year-olds and 9-year-olds on certain kinds of math problems and finds that 7-year-olds do better). So is this what happened with my four-year-old friend? Did she learn simple arithmetic, then get confused about it during kindergarten, and then learn it for good in first grade? Maybe. Maybe I didn’t imagine the whole episode after all.
If this is true, it obviously has disturbing implications for the use of standardized tests in primary schools to evaluate teacher performance. If students routinely go through U-shaped learning curves, it means that a terrific third grade teacher might produce mediocre test scores if her kids tend to be in the trough of the U at year-end, while the fourth grade teacher who gets the kids the following year reaps the benefits.
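That scenario can be sketched as a toy model. Everything here — the piecewise shape of the curve, the numbers, the names — is my own illustrative assumption, not anything from Camp’s study; the only point is to show how a teacher who moves kids further along a U-shaped curve can post a lower year-end score:

```python
# Toy model: observed test score as a function of true learning progress
# (0.0 = no exposure, 1.0 = mastery). The shape is an assumption made up
# for illustration: scores climb, dip during reorganization (the trough
# of the U), then recover above the earlier peak.

def observed_score(progress):
    """Measured performance given true underlying progress."""
    if progress < 0.4:            # early learning: scores climb
        return 50 + 50 * progress
    elif progress < 0.7:          # reorganization: scores dip
        return 70 - 40 * (progress - 0.4)
    else:                         # consolidation: scores recover and exceed
        return 58 + 120 * (progress - 0.7)

# A mediocre teacher's kids end the year short of the trough;
# a terrific teacher's kids end the year deep inside it.
mediocre_year_end = observed_score(0.35)   # 67.5
terrific_year_end = observed_score(0.65)   # 60.0

# The better teacher looks worse on the year-end test...
print(mediocre_year_end > terrific_year_end)   # True

# ...and next year's teacher, inheriting kids who climb out of the
# trough, reaps the gains.
print(observed_score(0.95))                    # 88.0
```

Under these (made-up) numbers, a single year-end snapshot rewards the teacher whose students learned less, which is exactly the unfairness the single-point-assessment critique is getting at.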
I don’t have anywhere near the chops to evaluate this evidence, and it’s certainly not the end of the story. What’s more, I remain in favor of the Times project: standardized tests clearly aren’t the be-all-end-all of teacher evaluation, but if we’re going to use them at all we need to take them seriously. And for now, we’re using them. So let’s shine some sunlight on them.
Besides, if the tests really are poor indicators of short-term student performance, perhaps this project will make that clear. Parents, principals, and fellow teachers probably have a pretty good sense already of who the good and bad teachers are, and if the value-added testing metric used by the Times turns out to be wildly at variance with this sense, it should provoke a serious rethink. Either way, then, it’s likely to have a net positive effect. It’s worth a try.