Prior to the 1970s, doctors weren’t aware that there was more than one type of diabetes. It was a group of Stanford researchers who made the discovery when they were analyzing a data set containing six bodily measurements from 145 patients who either had or were at risk of the disease. They applied a mathematical technique that essentially allowed them to graph the six-dimensional data set in three-dimensional space, while still preserving certain relationships between the data points. Looking at the graph, they noticed that diabetes sufferers clustered into two main groups. These groups represented what is now known as type 1 and type 2 diabetes. Drawing a distinction between the two forms of the disease undoubtedly has improved the quality of life for millions of people over the past few decades, as it has enabled the development of more specialized treatments. The moral of the story: There is a lot hidden in the data we collect. And sometimes to uncover it, we simply need to change the way we look at the information.
Take a paper published this week in BioMed Central Genetics on the detection of drug-resistant malaria strains in 264 volunteers from Papua New Guinea. You see, we don’t have a vaccine for malaria, so we have to treat it after infection. One problem this poses is that malaria can become resistant to a particular drug treatment. Figuring out which patients have resistant strains and which have strains sensitive to a particular drug is important, so that treatments can be adjusted as resistance develops. What the researchers in this study did was attach fluorescent beads to molecular markers associated with the drug-resistant and drug-sensitive forms of malaria. Blood samples containing the resistant strains then emitted one type of fluorescent “signal,” while those containing the sensitive strains emitted another. The difference between the two signals is slight, however. Traditionally, to identify which strains are resistant and which are sensitive, the researchers would just plot the signals on an x–y (Cartesian) plane. Much like in the Stanford diabetes study, clusters on the graph would then emerge and be identified as containing either the drug-resistant strains or drug-sensitive strains.
Of course, the boundaries that determine which data points fall inside a cluster and which fall outside are not always clear, and improving those boundaries was a major goal for the researchers. In this case, their big insight came when they converted the traditional x–y coordinates into polar coordinates, a conversion familiar to just about every high school student. Here is a refresher on polar coordinates, and a brief overview of how the researchers used them (you can skip it if you’re not interested):
Take the point (x,y) plotted on the Cartesian plane. The point is located on the plane by moving x units horizontally from (0,0), and then moving y units vertically from (x,0). Another way to define this point would be to consider an imaginary line that connects (x,y) to (0,0). Call the length of that line r, and the angle it makes with the x-axis Θ. Now you can write (x,y) as (r, Θ). It’s the same point, only it’s now expressed as a polar coordinate. The researchers did this conversion for each point in their data set and then plotted them by treating their r-values as y-coordinates and their Θ-values as x-coordinates. This trick, explained in more detail here, allowed them to more accurately define the boundaries for drug-resistant and drug-sensitive strains.
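For readers who like to see the arithmetic, here is a minimal Python sketch of the conversion described above. The sample points are made up for illustration and are not from the study, and the function name to_polar is just a label chosen here. The sketch measures Θ in radians, counterclockwise from the positive x-axis.

```python
import math

def to_polar(x, y):
    """Convert a Cartesian point (x, y) to polar form (r, theta)."""
    r = math.hypot(x, y)       # length of the line from (0, 0) to (x, y)
    theta = math.atan2(y, x)   # angle that line makes with the positive x-axis
    return r, theta

# Hypothetical signal pairs, invented for illustration only.
points = [(3.0, 4.0), (1.0, 1.0), (0.0, 2.0)]

# Replot each point with theta on the horizontal axis and r on the vertical axis.
replotted = []
for x, y in points:
    r, theta = to_polar(x, y)
    replotted.append((theta, r))

for (x, y), (theta, r) in zip(points, replotted):
    print(f"({x}, {y}) -> theta = {theta:.4f}, r = {r:.4f}")
```

In the replotted view, points lying along the same direction from the origin share a Θ-value and line up vertically, no matter how far out they sit.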
In converting the data from Cartesian to polar coordinates (and back again) no information is gained or lost; it simply affords the researchers a different perspective on the same data set. This perspective, however, improves the researchers’ ability to draw accurate boundaries around drug-resistant and drug-sensitive strains. To give you an idea of how much it improved their results, of the 264 blood samples the researchers studied, the polar coordinate analysis allowed them to reclassify 86 that were originally classified with the traditional method—that’s just about a third of the test subjects.
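The claim that nothing is gained or lost in the conversion can be checked with a short round trip. The Python sketch below uses hypothetical function names and a made-up point rather than anything from the paper: it converts a point to polar form and back, then reports how far the result lands from where it started. On a computer the two agree up to tiny floating-point rounding.

```python
import math

def to_polar(x, y):
    """Cartesian (x, y) to polar (r, theta), with theta in radians."""
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r, theta):
    """Polar (r, theta) back to Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def round_trip_error(x, y):
    """Largest coordinate difference after converting to polar and back."""
    r, theta = to_polar(x, y)
    x_back, y_back = to_cartesian(r, theta)
    return max(abs(x - x_back), abs(y - y_back))

# A made-up point, for illustration only.
print(round_trip_error(3.0, 4.0))
```

One caveat: at the origin itself the angle is not really defined, since every direction gives the same point. Python's math.atan2 returns 0 there, which still converts back to (0, 0).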
Not bad for a mathematical technique most schoolchildren learn before they even have to read Wuthering Heights.