Over at Quanta, they have an interview with Donald Richards, a statistician who is one of the pioneers in the use of a new and better way of measuring correlations:
Wolchover: How does distance correlation work?
Richards: This is where the concept of a Fourier transform comes in…. If you give me two probability distributions—the statistical spread of values that a variable takes on—and if I want to test whether the two distributions are the same, all I have to do is calculate their Fourier transforms. If these are equal then I know that the two probability distributions had to be equal, to begin with. The distance correlation coefficient, in layman’s terms, is a measure of how far apart these Fourier transforms are.
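For those of us who aren't that kind of layman, here is a minimal sketch in Python (assuming NumPy is available) of the usual sample version of distance correlation. It works from pairwise distance matrices rather than Fourier transforms directly. That is legitimate because the distance-matrix formula is known to equal a weighted gap between the joint characteristic function (the Fourier transform of the joint distribution) and the product of the two individual ones, a gap that is zero only when the variables are independent. The function names are my own invention, not any library's API.

```python
import numpy as np


def double_centered_distances(v):
    """Pairwise |v_i - v_j| distances with row, column and grand means removed."""
    v = np.asarray(v, dtype=float)
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D samples, between 0 and 1."""
    a = double_centered_distances(x)
    b = double_centered_distances(y)
    dcov2_xy = max((a * b).mean(), 0.0)  # guard against tiny negative rounding error
    dvar2_x = (a * a).mean()
    dvar2_y = (b * b).mean()
    if dvar2_x == 0.0 or dvar2_y == 0.0:
        return 0.0  # a constant sample carries no information about dependence
    return float(np.sqrt(dcov2_xy / np.sqrt(dvar2_x * dvar2_y)))


# y depends completely on x, but not in a straight line.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
print(np.corrcoef(x, y)[0, 1])     # Pearson: essentially zero
print(distance_correlation(x, y))  # distance correlation: clearly above zero
```

The point of the toy example is that Pearson's coefficient only looks for straight lines, so a perfect U-shaped relationship scores zero, while distance correlation is zero only under independence. A real analysis, say of the Brady-score data, would also need a significance test (commonly a permutation test), which this sketch leaves out.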
Ha ha ha. That’s the “layman’s” explanation. Richards must hang out with some pretty smart laymen. But let’s just pretend we all know what that means and move on. Last year Richards wrote a paper arguing that distance correlation can detect relationships that bog-ordinary Pearson’s linear correlation misses. What prompted the paper?
Richards: This was prompted by an opinion piece in The Washington Post in 2015, by Eugene Volokh, a professor of law at UCLA. The title of the article is “Zero Correlation Between State Homicide Rate and State Gun Laws.” What he did was—you know, my eyes bugged out; I couldn’t believe it—he found some data on the states’ Brady scores, which are ratings based on the toughness of their gun laws, and he plotted the Brady scores on an x-y plot against the homicide rates in each of these states. And if you look at the plot, it looks like there’s no pattern.
….I was horrified. There are so many things wrong with this analysis….Should you even fit a linear regression line to this data set? If you look at the rest of the data, you don’t see any linearity to the relationship, and it’s easy to understand why: There are bunches of points that correspond to geographic and culturally similar regions. If you break up the states by region, then you see reasonably linear relationships starting to show up in the scatter plots. And then in each case, you find that the higher the Brady score, the lower the homicide rate.
Bloggers for the win! Though I have to admit I’m not sure whose side I’m on here. If I see a scatterplot that appears completely random to the naked eye, I’m pretty suspicious of efforts to massage the data with sophisticated techniques in order to extract a correlation. So…I’ll take this with a grain of salt. Still, at least Richards is speaking my language:
Wolchover: Do you hope that by developing better tools—like distance correlation—that eventually these methods will seep out into more common use?
Richards: Yes, I hope so. And in fact, I have heard that one of the big pharmaceutical companies is now starting to use distance correlation methods. And I know that people in academia are using it more. I hope to live long enough to see distance correlation be a standard pull-down tab in Excel, or if not Excel, certainly on Wolfram Alpha. You enter your x-y data, and boom: It gives you the distance correlation. I live for that day!
Anybody who lives for the day when Excel will include an obscure statistical function is my kind of person. My hopes and dreams for Excel are a little more mundane, like being able to pick any starting point I want for a chart. I guess I’m a cheap date.