Of the many illnesses that plague Americans, heart disease is the deadliest, and one of the toughest to predict. Epidemiologists have long used surveys and clinical data to tease out genetic factors from lifestyle risks such as diet, smoking, and stress, with little success. But a new study suggests there may be a better tool for assessing heart disease risk: Twitter.
A study published in the peer-reviewed journal Psychological Science analyzed tweets and health data from 1,300 counties across the United States. The researchers found that negative tweets, meaning those expressing fatigue, hostility, or stress, were associated with an elevated risk of coronary heart disease (the medical term for clogged arteries) in the counties where the writers of those tweets lived. High volumes of tweets expressing optimism, excitement, ambition, and activity, meanwhile, correlated with lower-than-average rates of heart disease.
Here are some word clouds with examples of language that predicted higher and lower levels of disease:
What’s more, the researchers found that the language used in tweets correlates much more closely with heart disease rates than traditional predictive factors such as income and education level, weight, and even smoking status:
Lead author Johannes Eichstaedt, a psychological scientist at the University of Pennsylvania, described Twitter as “the perfect tool for figuring out something like heart disease.” Researchers have long suspected connections between emotional states and heart disease risk. And while it’s not surprising that people with high levels of stress and anger would be at higher risk than their mellower, happier peers, researchers have traditionally relied on surveys to evaluate people’s psychological well-being. The problem is that survey-based studies can take years, and people aren’t always honest about their feelings. That makes Twitter a treasure trove for researchers. “Twitter is where people talk about themselves, where they express their emotions candidly,” Eichstaedt says.
Here’s a map showing coronary heart disease deaths by county, using data from the Centers for Disease Control and Prevention:
Now compare it with this map, which predicts rates of heart disease based on tweet language:
Another bonus of using Twitter as an epidemiological tool: It’s much easier and cheaper than going door to door or calling people to conduct surveys. “If I wanted to repeat this analysis I could do it in an afternoon,” says Eichstaedt. “With surveys, that would take a year.”