When it comes to data journalism, everyone’s a critic.
The launch of three major data journalism operations in only a few weeks (the revamped 538, Vox, and the New York Times’ The Upshot) has produced a slew of opinion pieces. They are summarized quite nicely in this piece by Guardian journalist James Ball, but the one critique that sticks with me the most is the idea that we are at a moment in which there is lots of content about data but not so much actual, you know, data.
This may all seem like journalistic navel gazing, but it matters.
Here’s how journalism used to work: as a one-way process where the reporter, in her or his ivory tower, would selectively throw nuggets down to the world. That information would be (hopefully) gratefully received by the reader, but the relationship was almost entirely one-directional and couldn’t be imagined any other way. What could the consumer ever possibly have to offer? Why would you share your sources or your raw information? It wasn’t that long ago that readers’ feedback for a newspaper was relegated to the letters page. How else could you possibly interact with the journalist who had written the story?
Here’s how statistics used to work: Data was published in books written by statisticians. They were the only people who understood the numbers and had the tools to deal with that data.
Both of those models reflected the technology of the time and both are now broken, forever. Journalism today is at least as much about working with the community as it is telling the world what you think happened. The ethos of open journalism is that reporting becomes better by gathering the expertise of the world and helping to curate it.
Statistics and data have changed too. Governments everywhere have thrown open their vaults and released their data to the world. The transparency revolution is not happening as quickly or as smoothly as we’d like, but since the launch of data.gov in 2009 the idea of data being available in anything other than open formats should be laughable. Now citizens have a sense of entitlement when it comes to raw information. We paid for it to be collected, so why shouldn’t we have it?
For a while, data journalism started to bring those two fields together, combining the flood of open data with a new type of reporting. It wasn’t just about analyzing the data; it was also about making it available and showing your work, taking the lessons learned from computer-assisted reporting in the 1960s and 1970s and using today’s tools to make it easier to lay bare what could be gleaned from the numbers.
This new, improved data journalism could start to perform a valuable democratic function: becoming a bridge between those who have the data (and are terrible at explaining it) and the world, which is crying out for raw information and ways of understanding it.
Wouldn’t it be odd if the reverse started to happen, if we moved back to a time when raw data was something only the chosen experts could analyze for themselves? Lots of content, not much data.
There are a number of news websites out there that do make the data free, besides Guardian Data, which I used to edit. Take a look at the Los Angeles Times data desk or La Nación in Argentina, which has led the way in the sometimes risky process of opening up that country’s data and making it available and understandable. The Texas Tribune publishes data every day and provides tools to help readers understand it (and those data pages bring in the majority of the site’s overall web traffic). And ProPublica’s news apps team produces amazing data exploration tools that help make sense of the world. When Mother Jones compiled a database of every mass shooting in America, it made its data transparent, and other news organizations like the Boston Globe and The Wire made new visualizations and conducted different analyses based upon it.
These news organizations all have something in common: They make their work and raw data available for you to download and explore yourself as a matter of routine.
For context, it’s worth checking out this piece by Alex Howard at the Tow Center. He quotes Los Angeles Times Data Desk editor Ben Welsh on his role: “As we all know, there’s a lot of data out there…and, as anyone who works with it knows, most of it is crap. The projects I’m most proud of have taken large, ugly datasets and refined them into something worth knowing.”
This is data journalism for me: storytelling informed by the numbers that have become as common as oxygen around us. The reader interface of those stories can be anything from data visualizations to in-depth investigations and, yes, explainers. But the defining feature is that combination of storytelling with the actual, transparent data—being the readers’ guide to the world around them but also helping them navigate the river of raw information by assuming they are grown up enough to engage with it for themselves.
When you read a piece of data journalism, how does it make you feel? Do the numbers and data seem just as inscrutable as ever, or do you think that you could actually do it yourself, given a little patience and time? Do you feel thankful to the author for generously sharing their wisdom with you, or disenchanted that the truth seems just as far away as ever?
Journalists may have been traditionally scared of math and numbers, but today we have tools to help us navigate datasets that would have been unimaginably inscrutable a few years ago. From analysis to producing visualizations, the power has shifted from the professional statistician toward the reporter because literally anyone can do it.
But reporting on data is not the same as making it open. Take detailed election results, for instance. These are the essence of democracy, yet because news organizations pay for them they are reluctant to publish the raw numbers for anyone else to download and use. Journalism needs to get over that reluctance, and those who pride themselves on being data journalists should actively combat it.
Two years ago I wrote about the central responsibility of the champions of data journalism. It remains unchanged:
News organizations may be campaigners for open information but by withholding that data, become complicit in a system which essentially keeps data private until it’s no longer commercially valuable. It’s all very well calling for governments to throw open the doors of their data vaults, but if you are not willing to be open too, what is that worth?