Yesterday I wrote a quick post about the new PISA international test scores in math. Among other things, I noted that the results for American students seemed to be quite different from those in another international test, the TIMSS. What’s up with that?
This morning I got an email from Nathan Burroughs, a researcher at Michigan State University, that shed a bit of light on this. One problem with comparing scores is that not every country participates in both tests:
The differences are largely due to which countries participated in each test. Looking at only the 29 countries participating in the 2011 8th grade TIMSS and 2012 PISA, there is an extremely high correlation of .94 between the two tests in mathematics…. Part of what may be happening is that there is a lot of similarity in ranks at the extremes (the same countries do very well and very poorly) but some shuffling around in the middle ranks. It’s very revealing that the US does much better on the TIMSS, which includes fewer OECD countries, than on the PISA. There are a few substantive differences in the two assessments, particularly the age at which the test is given and the emphasis on particular content domains, but overall the tests tell pretty much the same story.
I can’t do a deep dive into this at the moment, so for now consider this some more raw data. The chart below is a quick and dirty comparison of just the 29 countries that participated in both tests. The light bars are TIMSS rankings and the dark bars are PISA rankings (so lower is better). Generally speaking, Burroughs is right: countries that ranked highly on one test also ranked highly on the other. The outliers include Russia, Israel, Australia, Norway, the United States and a couple of others, which do substantially better on one test than on the other.
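For anyone who wants to try this kind of comparison themselves, here is a rough sketch of the approach: keep only the countries that took both tests, re-rank them within that shared group, and then look at the correlation and the rank gaps. To be clear, the country labels and scores below are made-up placeholders, not real TIMSS or PISA results, and the helper names (`ranks`, `pearson`) are my own for illustration.

```python
from math import sqrt

# Placeholder scores for illustration only. These are NOT real TIMSS or
# PISA results; swap in the published country averages to use this.
timss = {"A": 610, "B": 570, "C": 510, "D": 505, "E": 420, "F": 480}
pisa = {"A": 570, "B": 555, "C": 480, "D": 500, "E": 410, "G": 520}


def ranks(scores):
    """Map each country to its rank, where 1 is the highest score (assumes no ties)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {country: i + 1 for i, country in enumerate(ordered)}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Keep only countries that appear in both tests, then rank within that group.
common = sorted(set(timss) & set(pisa))
timss_rank = ranks({c: timss[c] for c in common})
pisa_rank = ranks({c: pisa[c] for c in common})

# Correlation of the raw scores across the shared countries.
score_corr = pearson([timss[c] for c in common], [pisa[c] for c in common])

# Negative gap: better rank on TIMSS. Positive gap: better rank on PISA.
rank_gap = {c: timss_rank[c] - pisa_rank[c] for c in common}

print(f"correlation: {score_corr:.2f}")
for country in common:
    print(country, timss_rank[country], pisa_rank[country], rank_gap[country])
```

Ranking within the shared group matters here. A country's published rank on either test depends on who else showed up, which is exactly the problem Burroughs is pointing to.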
Roughly speaking, I think the most you can really say is that the United States is pretty average. What’s more, as near as I can tell, the United States has always been pretty average. Going back half a century, we’ve never been either the best or the worst. You can draw your own conclusions about what that means.