Most Studies of Social Interventions Are Pretty Worthless

Last year, Eva Vivalt of the Australian National University wrote a paper analyzing the results of international development programs like microloans, deworming, cash transfers, and so forth. This chart shows the basic results:

There are two things to notice. First, there’s not a lot of clustering. For nearly all these programs, the results are pretty widely dispersed. Second, where there is clustering, it’s right around zero, where the results are the least meaningful. A few months after Vivalt published her paper, Robert Wiblin described it this way:

The typical study result differs from the average effect found in similar studies so far by almost 100%. That is to say, if all existing studies of an education program find that it improves test scores by 0.5 standard deviations — the next result is as likely to be negative or greater than 1 standard deviation, as it is to be between 0-1 standard deviations….She also observed that results from smaller studies conducted by NGOs — often pilot studies — would often look promising. But when governments tried to implement scaled-up versions of those programs, their performance would drop considerably.
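To get a feel for how much dispersion Wiblin's description implies, here is a rough back-of-the-envelope sketch. It assumes study results are normally distributed around the 0.5 average, which is my simplifying assumption and not something claimed in Vivalt's paper, and the function name is made up for illustration.

```python
from statistics import NormalDist

def sd_for_even_odds(half_width: float) -> float:
    """Spread at which a normally distributed result lands within
    +/- half_width of the average exactly half the time.
    (Hypothetical helper; the normal shape is an assumption.)"""
    # The middle 50% of a normal distribution spans about +/- 0.6745 SDs.
    z75 = NormalDist().inv_cdf(0.75)
    return half_width / z75

mean_effect = 0.5  # average effect from the quoted example
spread = sd_for_even_odds(half_width=0.5)
next_result = NormalDist(mu=mean_effect, sigma=spread)

p_inside = next_result.cdf(1.0) - next_result.cdf(0.0)  # between 0 and 1
p_negative = next_result.cdf(0.0)                       # below zero

print(f"implied spread of results: {spread:.2f}")
print(f"chance next result is between 0 and 1: {p_inside:.2f}")
print(f"chance next result is negative: {p_negative:.2f}")
```

Under that assumption, the spread of individual results is larger than the average effect itself, and roughly one new study in four would show the program doing harm.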

Last week, a charity announced a dramatic specific confirmation of Vivalt’s general results. Kelsey Piper provides the details:

No Lean Season is an innovative program that was created to help poor families in rural Bangladesh during the period between planting and harvesting (typically September to November). During that period, there are no jobs and no income, and families go hungry….No Lean Season aimed to solve that by giving small subsidies to workers so they could migrate to urban areas, where there are job opportunities, for the months before the harvest. In small trials, it worked great.

….Evidence Action wanted more data to assess the program’s effectiveness, so it participated in a rigorous randomized controlled trial (RCT) — the gold standard for effectiveness research for interventions like these — of the program’s benefits at scale. Last week, the results from the study finally came in — and they were disappointing. In a blog post, Evidence Action wrote: “An RCT-at-scale found that the [No Lean Season] program did not have the desired impact on inducing migration, and consequently did not increase income or consumption.” (The emphasis is in the original blog post.)

This admission was a big deal in development circles. Here’s why: It is exceptionally rare for a charity to participate in research, conclude that the research suggests its program as implemented doesn’t work, and publicize those results in a major announcement to donors.

I’m writing about this as a warning to myself as much as to everyone else. In one sense, this is just part of the recent replicability crisis in the social sciences, but it really goes back further than that. It’s been well known for a very long time that the biggest problem with interventions like these is scalability. Pilot studies enjoy several luxuries: they’re (relatively) easy to fund since they’re small; they can choose sites where everyone is excited about the program and buys into it; they don’t have to account for long-term feedback caused by the existence of the program itself (i.e., people come to treat the program as a baseline rather than as an interesting new thing); and they generally deal with less diversity in their sample population, which makes a simple one-size-fits-all program easier to implement and less likely to run into community pushback.

Needless to say, this wide dispersion of results from small studies makes it really easy to cherry-pick them to demonstrate whatever point you feel like making. I try to be tolerably honest in my reporting, but it’s nearly impossible not to fall prey to this from time to time.
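A toy simulation makes the cherry-picking problem concrete. Everything here is made up for illustration: a hypothetical program with a true effect of exactly zero, 20 small pilot studies, and normally distributed noise. None of the numbers come from Vivalt's data.

```python
import random
from statistics import mean

def simulate_pilots(n_studies, true_effect, noise_sd, seed):
    """Draw one noisy result per pilot study around the true effect.
    (Hypothetical setup; normal noise is an assumption.)"""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(true_effect, noise_sd) for _ in range(n_studies)]

# Twenty pilots of a program that, by construction, does nothing at all.
results = simulate_pilots(n_studies=20, true_effect=0.0, noise_sd=0.74, seed=1)

best = max(results)      # the study an advocate would cite
worst = min(results)     # the study a critic would cite
average = mean(results)  # what an honest summary would report

print(f"best study:  {best:+.2f}")
print(f"worst study: {worst:+.2f}")
print(f"average:     {average:+.2f}")
```

The program does nothing, but with results this noisy an advocate and a critic can each find a study that seems to back them up.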

There’s not a lot more to say about this except to make a few brief points:

  • Be skeptical of small studies. “This is just a single study” isn’t merely boilerplate. It’s a real warning.
  • Be very skeptical of claims that small programs are likely to scale well to state or national size. They might, but you should demand real evidence of this.
  • Researchers with the means should be far more willing to follow up pilots with large-scale programs, and far more willing to admit when they don’t work.

This is not a counsel of despair. The truth is that most social interventions at scale just don’t work all that well. This is hard stuff! Still, small pilot studies are the only means we have to provide direction for further research, and large programs that provide even a modest benefit should be considered worthwhile. In other words, we should probably be more demanding of small studies, but less demanding in our expectations for large programs.

WHO DOESN’T LOVE A POSITIVE STORY—OR TWO?

“Great journalism really does make a difference in this world: it can even save kids.”

That’s what a civil rights lawyer wrote to Julia Lurie the day after her major investigation into a psychiatric hospital chain that uses foster children as “cash cows” was published, letting her know he was using her findings that same day in a hearing to keep a child out of one of the facilities we investigated.

That’s awesome. As is the fact that Julia, who spent a full year reporting this challenging story, promptly heard from a Senate committee that will use her work in their own investigation of Universal Health Services. There’s no doubt her revelations will continue to have a big impact in the months and years to come.

Like another story about Mother Jones’ real-world impact.

This one, a multiyear investigation, published in 2021, exposed conditions in sugar work camps in the Dominican Republic owned by Central Romana—the conglomerate behind brands like C&H and Domino, whose product ends up in our Hershey bars and other sweets. A year ago, the Biden administration banned sugar imports from Central Romana. And just recently, we learned of a previously undisclosed investigation from the Department of Homeland Security, looking into working conditions at Central Romana. How big of a deal is this?

“This could be the first time a corporation would be held criminally liable for forced labor in their own supply chains,” according to a retired special agent we talked to.

Wow.

And it is only because Mother Jones is funded primarily by donations from readers that we can mount ambitious, yearlong—or more—investigations like these two stories that are making waves.

About that: It’s unfathomably hard in the news business right now, and we came up about $28,000 short during our recent fall fundraising campaign. We simply have to make that up soon to avoid falling too far behind to recover, or needing to somehow trim $1 million from our budget, as happened last year.

If you can, please support the reporting you get from Mother Jones—that exists to make a difference, not a profit—with a donation of any amount today. We need more donations than normal to come in from this specific blurb to help close our funding gap before it gets any bigger.


