A Beginner’s Guide To Data Journalism

Data is at the heart of many of the most important articles of today


Data-based reporting is now firmly ingrained in the newsroom. According to a recent study by Google News Lab and PolicyViz, 42% of reporters use data to tell stories more than twice a week, while 51% of all news organizations in the US and Europe now have a dedicated data journalist - and this rises to 60% for digital-only platforms. The rise of data journalism reached a pinnacle when the 2017 Pulitzer Prize was awarded to the investigation into the Panama Papers, an unprecedented leak of 2.6 TB of data and 11.5 million documents from the database of the world’s fourth largest offshore law firm, Mossack Fonseca, that exposed the dark side of the offshore industry. Journalists were only able to draw insights from the leak thanks to custom software built specifically for analyzing these records and a team of reporters using forensic data techniques to investigate it properly. Indeed, learning how to work with data is now considered such a vital skill for journalists that Joanne Lipman, the editor-in-chief of USA Today, advised anyone wanting to be a journalist to learn to code. ‘Make sure that you take some classes,’ Lipman told Business Insider. ‘Not heavy-duty computer science necessarily, but learn development and programming. Learn to work with data and data visualization.’

Data journalism leverages computers, digital data formats, and other electronic technology to deal with huge quantities of information quickly and evaluate data with depth and flexibility. By analyzing these data sets, data journalists uncover and present stories more efficiently and in ways that are more interesting than traditional reporting, enabling readers to search and review information, essentially becoming researchers themselves. A data journalism team requires an array of skill sets to transform raw data sets into a publishable story. The BBC's Visual Journalism team, for example, consists of web developers, journalists, and designers. The team first has to clean the data to ensure the validity of any insights gathered, using a program like OpenRefine to make sure it is standardized. Someone also needs to speak to the organization that created the datasets to check that the data is in order and that the conclusions being drawn are correct. Then someone needs to analyze the data and write the article based on the findings, before it is presented as graphics and visualizations that are accessible and user-friendly on all kinds of screen sizes and digital platforms.
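To give a feel for what standardizing a dataset involves, here is a minimal Python sketch of one common cleaning step: grouping different spellings of the same label so they can be merged. It is loosely modelled on the idea of key-collision clustering and is not how OpenRefine or any particular newsroom actually implements it. The function names and the sample labels are made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a messy label to a comparison key: lowercase, no punctuation, sorted unique words."""
    # Strip accents so "Café" and "Cafe" compare equal.
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Lowercase, then turn every run of non-alphanumeric characters into a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", ascii_text.lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster_variants(values):
    """Group raw values that share a fingerprint; keep only groups with more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Made-up sample labels, for illustration only.
sample = ["Dept. of Health", "dept of health", "Health, Dept. of", "Department of Transport"]
clusters = cluster_variants(sample)
for key, variants in clusters.items():
    print(key, "->", variants)
```

A sketch like this only flags candidates for merging. "Dept." and "Department" would still be treated as different words, so a journalist would need to review the groups and check doubtful cases with the organization that produced the data.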

The first stage in creating a data-driven story is coming up with a hypothesis. The key word in ‘data journalist’ is still ‘journalist’, and asking the right questions is still the foundation of building a story. Data journalism was once described as social science done on deadline, and it needs a question that has a definitive answer. Equally, however, a question is only valid if there is enough data available to answer it.

This data is usually the same kind that journalists have always sought, with records from government agencies and financial transactions the primary focus. However, rather than the paper documents they would once have sought, this information is now distributed in electronic formats and at a scale that dwarfs the old records. These datasets are either open or closed, with many of the biggest stories coming from leaked files that were private, such as the Panama Papers and Edward Snowden’s global surveillance disclosures. Equally, though, the rise of open data means that much of the information data journalists collect is freely available to the public from the likes of the IMF and the World Bank.

The open data movement has much in common with journalism and they need one another to succeed. They are both about transparency, accountability, and helping people be more informed and make better decisions. The open data movement was first popularized under President Obama, who issued his first policy paper on his first day in office in January 2009. The US government made available thousands of datasets for public scrutiny by journalists and policy-makers, with coders and developers invited to make the data useful to people and businesses. Obviously, you are going to be hard-pressed to find many members of the general public willing to trawl through this data, and it often comes down to journalists to find interesting information and bring it to the attention of a mass audience.

The Guardian, for example, is leading the way in data journalism and relies heavily on open data to find stories ahead of its rivals. At the recent Open Data Summit, Caelainn Barr, a Data Journalist at The Guardian, discussed work the paper did with hygiene ratings for food establishments across the UK. The ratings were all held by the Food Standards Agency in one big CSV file, which was accessible but neither well presented for public consumption nor relatable. The team went through the dataset, spoke to the inspectors who set the ratings, and produced a map of the UK showing the share of failed inspections in each area, from 0 to 100%. They found that 1 in 7 takeaways across the UK had failed food hygiene tests, making the front page with a story that no other paper had. The data was easy to come by; nobody else had thought to use it.
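The kind of calculation behind a map like this can be sketched in a few lines of Python. This is a hedged illustration, not The Guardian's actual method: the column names (area, business_type, passed), the place names, and the sample rows are all invented, and a real ratings file would need to be inspected first to see how it records results.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for a ratings file; the column names are assumptions.
SAMPLE_CSV = """area,business_type,passed
Northtown,Takeaway,yes
Northtown,Takeaway,no
Northtown,Restaurant,yes
Southport,Takeaway,yes
Southport,Takeaway,yes
Southport,Takeaway,yes
Southport,Takeaway,no
"""


def failure_rates(csv_file, business_type):
    """Return {area: percentage of inspections failed} for one business type."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for row in csv.DictReader(csv_file):
        # Skip rows for other kinds of establishment.
        if row["business_type"].strip().lower() != business_type.lower():
            continue
        area = row["area"].strip()
        totals[area] += 1
        if row["passed"].strip().lower() == "no":
            failures[area] += 1
    # Every area in totals has at least one inspection, so the division is safe.
    return {area: 100 * failures[area] / totals[area] for area in totals}


rates = failure_rates(io.StringIO(SAMPLE_CSV), "Takeaway")
for area, rate in sorted(rates.items()):
    print(f"{area}: {rate:.0f}% of takeaway inspections failed")
```

Working with percentages rather than raw counts is what makes areas of different sizes comparable on a single 0 to 100% scale.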

Once you have your data set prepared, you need to find insights. This is usually done after the data has been visualized. Visualization reveals intricate structure in data that cannot be absorbed in any other way. Storytelling with data visualization draws an impactful response from the user and reinforces it with numerical evidence. The way the human brain processes information means that presenting data as a story gives everyone in an organization a better understanding of it, and enabling a greater range of people to make sense of what it is saying is likely to lead to more insights. Larger media organizations often have their own teams working alongside journalists to present the data in a way that is fit for publication. There is also software available, such as Tableau, that is reasonably easy to use. It suits smaller publications that perhaps don’t require quite the same slickness, as well as journalists who want to explore the data as they look for insights.

Data journalism is not easy, but it does have the advantage of increasing the likelihood that the stories you find will be unique. This is especially important in today’s climate, with competition for eyes on page so incredibly fierce. Data skills are becoming increasingly vital, and aspiring journalists should certainly look to incorporate them into their repertoire.

