In 2015, I wrote a profile of Brian Wansink, a Cornell University behavioral science researcher who seemed to have it all: a high-profile lab at an elite university, more than 200 scientific studies to his name, a high-up government appointment, and a best-selling book. Plus, his research was really cool: Wansink studied how our environment affects our eating habits. He found, for example, that people who leave their cereal in plain view tend to weigh more than people who hide it away in a cupboard, and that people eat more when they use bigger plates. Like the junk food he studied, his work had an almost addictive quality.
So I took two flights and a drive to Ithaca, New York, to spend a few days with Wansink. His empire was impressive: Cornell’s Food and Brand Lab, which he ran, had its own test kitchens and dining rooms with two-way mirrors. His graduate students seemed to adore him. One night, he invited me to his stately lakeside home, where his wife cooked dinner and we all chatted into the night about how his beginnings as a working-class Iowa kid had shaped his libertarian ideology. Wansink even took me out to the garage to show off the legendary bottomless soup bowls that had earned him an Ig Nobel Prize. I came away certain that I had found a worthy profile subject. Indeed, when my article ran, it was the most-read story on the Mother Jones website for days.
There’s just one problem: It’s no longer clear how much of Wansink’s work can withstand scientific scrutiny. In January 2017, a team of researchers reviewed four of his published papers and turned up 150 inconsistencies. Since then, in a slowly unfolding scandal, Wansink’s data, methods, and integrity have been publicly called into question. Last week, the Journal of the American Medical Association (JAMA) retracted six articles he co-authored. To date, a whopping 13 Wansink studies have been retracted.
The day after JAMA announced its retractions, Cornell released a statement saying an internal campus investigation had “found that Professor Wansink committed academic misconduct in his research and scholarship, including misreporting of research data, problematic statistical techniques, failure to properly document and preserve research results, and inappropriate authorship.” Wansink, the statement added, “has been removed from all teaching and research. Instead, he will be obligated to spend his time cooperating with the university in its ongoing review of his prior research.”
Last year, when I first learned of the criticisms of his work, I chalked it up to academic infighting and expected the storm to blow over. But as the scandal snowballed, the seriousness of the problems grew impossible to ignore. I began to feel foolish for having called attention to science that, however fun and interesting, has turned out to be so thin. Were there warning signs I missed? Maybe. But I wasn’t alone. Wansink’s work has been featured in countless major news outlets—the New York Times has called it “brilliantly mischievous.” And when Wansink was named head of the USDA’s Center for Nutrition Policy and Promotion in 2007, the popular nutrition writer Marion Nestle deemed it a “brilliant appointment.”
Scientists bought it as well. Wansink’s studies made it through peer review hundreds of times—often at journals that are considered some of the most prestigious and rigorous in their fields. The federal government didn’t look too closely, either: The USDA based its 2010 dietary guidelines, in part, on Wansink’s work. So how did this happen?
To see how Wansink’s work eluded the scientific gatekeepers, it helps to understand how journals decide which studies are worthy of publication. Most people know about the system of peer review, wherein research papers are vetted by the author’s academic peers prior to publication. But before that happens, the studies have to attract the attention of a journal editor. That step is key, according to Brian Nosek, a University of Virginia psychology professor who directs the scientific integrity advocacy group Center for Open Science. “Novel, exciting, and sexy results are simply much more likely to get published,” he says.
That makes perfect sense: Journals need readers, and who wants to read a study that just confirms something everyone knows? “Wansink is exceptional in that way,” Nosek says. “His results are unfailingly interesting.” It’s not hard to imagine his papers catching the eye of editors with titles such as, “Bad Popcorn in Big Buckets: Portion Size Can Influence Intake as Much as Taste.”
But Nosek fears the preference for “sexy” studies is undermining science. Results that are simultaneously captivating and scientifically sound are rare, he says. “We tend to overlook the uncertainty that comes with those kinds of results.” Ivan Oransky, founder and editor of the scientific integrity watchdog blog Retraction Watch, agrees. “If something looks cool and exciting, we want it to be true,” he says. “We may not ask the tough questions.” Peer reviewers are human, too, and just as susceptible to sexy results as the rest of us. “Based on the kinds of things that I’ve seen get through peer review, it’s not even clear they’re reading the whole paper,” Oransky says.
And that brings us to another problem. It takes more than just provocative findings to land a spot in a coveted journal. The results have to be accurate, too, and scientists are expected to evaluate the credibility of their own data, in part by reporting what’s known as a “probability value” or “p-value.”
The p-value is used to assess how likely it is that a result reflects a real effect rather than random chance. “Essentially, they are an indicator of unusualness,” explains Nosek. P-values range from 0 to 1. The lower the value, the less likely it is that results like these would turn up by chance alone. In most fields, the p-value threshold is .05, meaning that a finding at least that striking would occur by chance just 1 in 20 times if there were no real effect.
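To make the 1-in-20 figure concrete, here is a minimal sketch in Python. It assumes a simple two-group comparison with a known spread and uses simulated data in which there is no real effect at all. The function names (`two_sided_p`, `false_positive_rate`) and the sample sizes are hypothetical choices for illustration and are not taken from any study discussed here.

```python
import math
import random


def two_sided_p(sample_a, sample_b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means.

    Assumes both samples come from populations with known standard
    deviation `sigma` (a simplifying assumption for this sketch).
    """
    mean_a = sum(sample_a) / len(sample_a)
    mean_b = sum(sample_b) / len(sample_b)
    std_err = sigma * math.sqrt(1 / len(sample_a) + 1 / len(sample_b))
    z = (mean_a - mean_b) / std_err
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail area.
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(trials=2000, n=30, alpha=0.05, seed=1):
    """Share of no-effect experiments that still come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups are drawn from the same distribution: no real effect.
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        if two_sided_p(group_a, group_b) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"Share of null experiments with p < .05: {false_positive_rate():.3f}")
```

Run as a script, this should print a share close to .05: even when nothing is going on, roughly one experiment in 20 clears the threshold by luck.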
In addition to the attention-grabbing experiments, Wansink’s papers contained p-values that cleared the .05 threshold. Or did they? Last year, BuzzFeed’s Stephanie Lee obtained emails in which Wansink appeared to be encouraging his grad students, with whom he often co-authored studies, to cherry-pick data so the p-values would be acceptable.
This practice is called “p-hacking” and it’s bad science. For instance, Lee cited a 2012 paper in which Wansink and his co-authors found that children were more likely to eat fruit that had Sesame Street stickers on it. But before submitting the study to the prestigious journal Pediatrics, Lee reported, Wansink fretted over its data quality:
The p-value was 0.06, just shy of the gold standard cutoff of 0.05. It was a “sticking point,” as he put it in a Jan. 7, 2012, email.
“It seems to me it should be lower,” he wrote, attaching a draft. “Do you want to take a look at it and see what you think. If you can get the data, and it needs some tweeking [sic], it would be good to get that one value below .05.”
When I emailed Wansink to ask whether he had engaged in “p-hacking,” he responded: “P-hacking is a weird word that means different things in different fields. I’ve seen it used a lot in economics where a person fishes around for correlations that are significant, and then builds a theory around them. When you have a theory or reason why you’re looking around for results, I call that exploratory research, and we try to always label it as such.”
Nosek wasn’t impressed with Wansink’s explanation. “Exploratory research is very important,” he told me via email. “What is a problem is presenting exploratory research (hypothesis generating) as if it is confirmatory research (hypothesis testing).”
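The gap between exploring and confirming can be sketched with a small simulation. This is an illustration of one common form of p-hacking (measuring many outcomes and reporting only the best-looking one), not a reconstruction of anything Wansink or his lab is alleged to have done. The data are simulated with no real effect, the test assumes a known spread, and the names `null_p_value` and `fluke_rate` are hypothetical.

```python
import math
import random


def null_p_value(rng, n=30):
    """p-value from one two-group comparison where no real effect exists."""
    group_a = [rng.gauss(0, 1) for _ in range(n)]
    group_b = [rng.gauss(0, 1) for _ in range(n)]
    diff = sum(group_a) / n - sum(group_b) / n
    z = diff / math.sqrt(2 / n)  # known standard deviation of 1 assumed
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area


def fluke_rate(outcomes, trials=2000, alpha=0.05, seed=2):
    """Share of studies that can report at least one 'significant' result
    when `outcomes` independent measures are tested and the best is kept."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        best_p = min(null_p_value(rng) for _ in range(outcomes))
        if best_p < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"One planned test:       {fluke_rate(1):.2f}")
    print(f"Best of ten outcomes:   {fluke_rate(10):.2f}")
```

With a single planned test, the share of flukes should sit near .05. With ten independent outcomes to choose from, theory puts it at 1 − 0.95^10, or about 40 percent, which is why a result found by searching cannot be reported as if it had been predicted in advance.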
If Wansink’s studies were preliminary, then they likely would not have been published in prestigious journals. The irony is, perhaps they should have been; if the journals would publish exploratory work, scientists might not feel so much pressure to wring ideal results out of their data.
Wansink’s alleged data manipulation would seem to be a flagrant example of p-hacking; Nosek called it a “caricature.” Yet both Nosek and Oransky say p-hacked papers had become common long before the Wansink scandal came to pass. The social sciences, in fact, have been reeling from what some experts call a “replication crisis”: Because the journals only want novel research, there’s little incentive for others to put in the hard work to confirm published conclusions. “That this is now coming into the light is good news, but let’s not pretend this is an isolated case,” Oransky says. “No one likes to talk about these issues, but this is not solved.”
P-hacking wasn’t the only problem the Cornell investigators found in Wansink’s work. In two papers, for example, he wrote about the lunchroom habits of elementary school children; the USDA based its popular Smarter Lunchrooms program partly on Wansink’s findings. But it turned out his data had been collected from preschoolers. (The Lunch Tray blog has the full rundown.)
Wansink stands by his work. The university’s accusations, he wrote in a statement, “can be debated, and I did so for a year without the success I expected. There was no fraud, no intentional misreporting, no plagiarism, or no misappropriation.” He added, “I believe all of my findings will be either supported, extended, or modified by other research groups.”
Be that as it may, I’m writing this before dawn on a Sunday, because I know I won’t have time this week to rehash a three-year-old piece about a scientist who perhaps wasn’t worthy of my attention in the first place. So how do I (and all the others who reviewed, wrote about, and based policy on Wansink’s work) become more vigilant?
Leif Nelson, a professor of marketing and business administration at the University of California-Berkeley’s Haas School of Business, has studied the replication crisis in social science for nearly a decade. He believes that peer reviewers are just beginning to catch on to problems like p-hacking, thanks in part to high-profile cases like Wansink’s. “Perhaps it is like recognizing a secret handshake,” he told me in an email. “Before you learned it, you never would have even noticed it, but after you learn it, you will never miss it.”
Nosek and his colleagues are working to change the field from within. In 2012, they launched the Open Science Framework, a website where researchers can preregister their experiments and share data, materials, and research plans in advance of publication. That way, there’s a public record of the questions they set out to answer and of their results, which makes techniques like p-hacking nearly impossible. Since the project launched, Nosek says, the number of preregistered studies has doubled every year. There are now about 20,000 studies in the database.
That’s encouraging, but it doesn’t relieve the pressure to whip up ever more amazing and novel results. Fixing that problem, Oransky says, would require a change in the fundamental system through which research is funded and published. “You’re dealing with an incredibly porous system that has every incentive aligned in the wrong way,” he adds. “Maybe it makes me cynical, but I fail to be surprised by anything that happens in this system.”
Noted. Next time I see a really sexy-sounding study, I’ll be cynical, too.
This piece has been updated.