Software Used to Make “Life-Altering” Decisions Is No Better Than Random People at Predicting Recidivism

And neither is very good, a new Dartmouth study confirms.

Researchers at Dartmouth College have found that a computer program widely used by courts to predict an offender’s risk of reoffending is no more fair or accurate than a bunch of random non-experts given the same data and asked to make predictions.

The program, Correctional Offender Management Profiling for Alternative Sanctions, is used in several states to inform pretrial, parole, and sentencing decisions. And while it may sound sophisticated—COMPAS has 137 variables and a proprietary algorithm—the software performs no better than a simple linear predictor using just two variables.

“Claims that secretive and seemingly sophisticated data tools are more accurate and fair than humans are simply not supported by our research findings,” said co-author Julia Dressel, an undergraduate who performed the research with Dartmouth computer scientist Hany Farid. 

For their peer-reviewed study, published Wednesday in Science Advances (Science magazine’s open-access “offspring”), Dressel and Farid recruited human participants through Amazon’s Mechanical Turk platform. In the first round, people were given a short description of a defendant that included seven features, excluding race. With that information, they were asked to predict whether the defendant would reoffend within two years of their most recent crime.

Those results were compared with COMPAS’s assessments of the same set of 1,000 defendants. The researchers found no significant difference: The human participants accurately predicted recidivism in 67 percent of cases; COMPAS was accurate in just over 65 percent.

“It is troubling that untrained internet workers can perform as well as a computer program used to make life-altering decisions about criminal defendants,” Farid said in a statement. “The use of such software may be doing nothing to help people who could be denied a second chance by black-box algorithms.”

The new study comes on the heels of a 2016 investigation by ProPublica, whose reporters found that COMPAS was not only remarkably unreliable—just 61 percent of the people predicted to reoffend did so—it also showed racial disparities. COMPAS incorrectly flagged black defendants for recidivism nearly twice as often as it incorrectly flagged white ones.

Participants in the Dartmouth study also turned up false positives for black defendants more frequently, even when they weren’t shown the person’s race. They did so in 37 percent of cases, compared with a bit over 40 percent for COMPAS. For white defendants, participants turned up false positives in about 27 percent of cases, versus COMPAS’s 25 percent.

The researchers recruited a second set of participants to repeat the study, only this time race was included as a factor. The overall accuracy rate was nearly identical to that of the first round, but the gap in false positives between black and white defendants widened: 40 percent versus 26 percent.

The overall results prompted Dressel and Farid to question the sophistication of COMPAS’s predictive algorithm. For the study’s third portion, they set up their own tool: a simple linear predictor algorithm that used the same seven features as the human participants were given. It yielded similar results. Then the researchers tried building a more powerful program, which, using the same data, again showed nearly identical results.

COMPAS wasn’t the only software examined. The researchers reviewed a total of nine different algorithmic approaches to predicting recidivism, and even the best one had “only moderate levels of predictive accuracy,” they wrote. With the tools currently available, the authors note, the data COMPAS uses may not be “separable.” This means it may not be possible for any program—COMPAS or otherwise—to use it to accurately assess recidivism risk.
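The “separability” point can be illustrated with a toy simulation (our own sketch, not from the paper): when the outcome depends on the available features only noisily, even the best possible decision rule hits an accuracy ceiling that no amount of algorithmic sophistication can break through. The feature, noise level, and threshold below are all invented for the demo.

```python
# Toy demo of non-separable data: the outcome depends on the feature
# plus irreducible noise, so even the optimal rule makes mistakes.
import random

random.seed(0)

trials = 100_000
bayes_correct = 0
for _ in range(trials):
    x = random.random()              # a single "risk" feature in [0, 1)
    noise = random.gauss(0, 0.35)    # irreducible noise in the outcome
    y = 1 if x + noise > 0.5 else 0  # the true outcome
    guess = 1 if x > 0.5 else 0      # the best possible rule given only x
    bayes_correct += (guess == y)

ceiling = bayes_correct / trials
print(f"accuracy of the optimal rule: {ceiling:.2f}")
```

In this setup the optimal rule tops out around 75 percent accuracy; a fancier model given the same feature cannot do better, which mirrors the authors’ suggestion about COMPAS’s data.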

In yet another test, a linear predictor algorithm provided with just two features—the person’s age and number of previous convictions—did as well as COMPAS, which is to say, not great. “The entire use of recidivism prediction instruments in courtrooms should be called into question,” Dressel said. “Along with previous work on the fairness of criminal justice algorithms, these combined results cast significant doubt on the entire effort of predicting recidivism.”
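To make the two-feature baseline concrete, here is a minimal sketch of a linear (logistic) predictor trained on just age and number of prior convictions. The data, effect sizes, and training setup are entirely our own assumptions for illustration; this is not the study’s dataset or code.

```python
# A minimal two-feature logistic predictor, in the spirit of the
# Dressel-Farid baseline. All data below is synthetic: we ASSUME
# more priors and younger age raise risk, purely for the demo.
import math
import random

random.seed(42)

# Synthetic defendants: (age, prior_convictions) -> reoffended?
data = []
for _ in range(1000):
    age = random.randint(18, 70)
    priors = random.randint(0, 15)
    score = 0.25 * priors - 0.05 * age + random.gauss(0, 1)
    data.append(((age, priors), 1 if score > 0 else 0))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fit the model with plain stochastic gradient descent.
w_age, w_priors, bias = 0.0, 0.0, 0.0
lr = 0.001
for _ in range(200):
    for (age, priors), y in data:
        p = sigmoid(w_age * age + w_priors * priors + bias)
        err = p - y
        w_age -= lr * err * age
        w_priors -= lr * err * priors
        bias -= lr * err

correct = sum(
    (sigmoid(w_age * a + w_priors * p + bias) > 0.5) == bool(y)
    for (a, p), y in data
)
accuracy = correct / len(data)
print(f"priors weight: {w_priors:+.3f}, age weight: {w_age:+.3f}")
print(f"training accuracy: {accuracy:.2f}")
```

The fitted model learns a positive weight on prior convictions and a negative weight on age, and its accuracy lands in the same unremarkable range the article describes, which is the researchers’ point: two features and a straight line get you about as far as 137 features and a proprietary algorithm.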

“When considering using software such as COMPAS in making decisions that will significantly affect the lives and well-being of criminal defendants, it is valuable to ask whether we would put these decisions in the hands of random people who respond to an online survey,” they wrote, because “the results from these two approaches appear to be indistinguishable.”