These are troubling times for journalists. The last decade has seen their numbers dwindle dramatically across the globe. In the US, their ranks shrank by roughly one-third between 2006 and 2013, and the UK and Australia have seen similar declines. So the rise of automated journalism is probably not something many in the profession will greet with eager anticipation. More likely it will be met with a wary eye, or perhaps simply resignation at what feels like yet another blow to what was once a noble and respected career path. But is it really something they should worry about?
Automated journalism is the use of automated scripts to analyze data and construct news stories at volumes impossible for individual human journalists. The technology relies on Natural Language Generation (NLG), a branch of AI that translates findings and insights into a digestible written narrative. It works alongside Natural Language Processing (NLP), which understands and contextualizes text; together they essentially allow humans and computers to speak the same language.
Such technology is now being adopted across the board in news media, from local papers all the way through to giants like USA Today. One example is RADAR, a collaboration between leading UK news agency Press Association (PA) and news automation specialists Urbs Media that aims to create 30,000 localized news reports every month. NLG software will pump out the stories across localized distribution networks from 2018, with every component of the AI-created stories automated, from the words on the screen to the accompanying graphics and images. The project has even secured a €706,000 investment from Google's Digital News Initiative (DNI), a fund that promotes innovation in digital journalism in Europe.
The Washington Post, under the stewardship of tech dynamo Jeff Bezos, has also invested heavily in AI technology. It has developed ‘Heliograf’, which debuted during the Rio Olympics and went on to produce coverage of the US election later that year.
Heliograf is a reasonably simple piece of technology. As Joe Keohane describes in a Wired article, ‘Editors create narrative templates for the stories, including key phrases that account for a variety of potential outcomes (from “Republicans retained control of the House” to “Democrats regained control of the House”), and then they hook Heliograf up to any source of structured data - in the case of the election, the data clearinghouse VoteSmart.org. The Heliograf software identifies the relevant data, matches it with the corresponding phrases in the template, merges them, and then publishes different versions across different platforms. The system can also alert reporters via Slack of any anomalies it finds in the data - for instance, wider margins than predicted - so they can investigate.’
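Keohane's description maps onto a very small amount of code. The sketch below is a hypothetical illustration of the template-and-merge pattern he describes, not the Post's actual implementation; the template, field names, and anomaly threshold are all invented for illustration.

```python
# Hypothetical sketch of the template-and-merge pattern Keohane describes.
# Heliograf's real code is not public; the template, fields, and anomaly
# threshold below are invented.

TEMPLATE = "{outcome}, with all {seats} seats declared."

# Editor-written key phrases covering the possible outcomes.
OUTCOME_PHRASES = {
    "R": "Republicans retained control of the House",
    "D": "Democrats regained control of the House",
}

def generate_story(result: dict) -> str:
    """Match structured result data to its key phrase and merge into the template."""
    return TEMPLATE.format(outcome=OUTCOME_PHRASES[result["winner"]],
                           seats=result["seats"])

def check_anomaly(result: dict, predicted_margin: float) -> bool:
    """Flag a result whose margin is far off the forecast, so a human
    reporter can investigate (Heliograf surfaces these via Slack)."""
    return abs(result["margin"] - predicted_margin) > 10.0

print(generate_story({"winner": "D", "seats": 435}))
```

The division of labour is the notable design choice: editors supply the judgment (the phrases and templates), while the software supplies only scale and speed.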
It is not just being used for basic reporting, though. Kris Hammond of Narrative Science, a company that specializes in natural language generation, has even gone so far as to predict that ‘A machine will win a Pulitzer one day,’ with researchers working hard to develop ways AI can discover stories in data that humans could not. Data journalism has grown tremendously over the last decade, with the likes of the Financial Times and the Guardian investing heavily in specialist data journalists, data scientists, and visualization software to find newsworthy stories in publicly available data sets and present them to their audience in a more engaging, often interactive, format. There is no real reason that the process of finding stories could not also be automated, with algorithms identifying patterns and automated data visualization software presenting them with little need for human involvement. Indeed, humans could eventually be restricted to deciding what newsworthy questions to ask of data sets, writing story templates, and ensuring that the patterns revealed by the algorithms are not spurious.
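The story-discovery idea described above can be sketched very simply: an algorithm scans a public data set for statistical anomalies and hands candidate leads to a human editor to vet. The council names and spending figures below are invented, and real systems would use far more robust statistics than this toy outlier test.

```python
# A hedged sketch of automated story discovery: flag entries in a public
# data set that deviate sharply from the norm and surface them as
# candidate leads for a human editor. All figures are invented.
from statistics import mean, stdev

spending = {  # hypothetical per-council road-repair spend, GBP per head
    "Northshire": 42.0, "Southvale": 45.5, "Eastbrook": 44.0,
    "Westfield": 43.5, "Midtown": 95.0,
}

def find_story_leads(data: dict, threshold: float = 1.5) -> list:
    """Return (name, value) pairs more than `threshold` standard
    deviations from the mean: candidate newsworthy anomalies."""
    mu, sigma = mean(data.values()), stdev(data.values())
    return [(name, value) for name, value in data.items()
            if abs(value - mu) > threshold * sigma]

for name, value in find_story_leads(spending):
    print(f"Candidate lead: {name} spends £{value:.2f} per head.")
```

The human role here matches the article's point: the algorithm only surfaces a pattern; deciding whether it is newsworthy, or merely spurious, remains editorial judgment.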
But while these advances may be a tremendous boon to newsroom efficiency, many also have serious concerns about their implications. These revolve primarily around what will happen to jobs and to journalistic ethics.
As is the case whenever the adoption of AI in any industry is discussed, there is the question of what it means for jobs. According to PA's Editor-in-Chief Peter Clifton, ‘Skilled human journalists will still be vital in the process,’ and the mundane, low-skill nature of the AI-produced stories means that human writers and editors have no cause for concern about being replaced by robots. However, such tasks were often carried out by trainee journalists, for whom they were valuable learning experiences: a foot in the door, a chance to prove their worth, and training in everything from research to how to structure a story. It is difficult to see what would replace this experience if AI were doing the more mundane tasks. And that assumes the technology does not develop much past this level. If we are to believe Kris Hammond's assertion that a machine will one day win a Pulitzer, it may eventually take over the work of even the most experienced reporters.
There is also the question of ethics. With Trump's bellicose rantings against the media in recent months, it is easy to forget that there are strict regulations in place and a code of ethics that the majority of journalists stringently adhere to. AI could prove seriously detrimental to the transparency required of journalists, as much of its work is done behind the scenes, with the code and how it works largely opaque to humans. And even if news organizations do explain such technical concepts, is that likely to increase trust, or decrease it? Transparency is particularly important because we have seen how machine learning algorithms can pick up inherent biases from the data they learn from, such as racism from police crime figures. Machine learning is so effective as a framework for making predictions because programs learn from human-provided examples rather than explicit rules and heuristics. Data mining looks to find patterns in data, so if, as Jeremy Kun, author of Math ∩ Programming, argues, ‘race is disproportionately (but not explicitly) represented in the data fed to a data-mining algorithm, the algorithm can infer race and use race indirectly to make an ultimate decision.’ A major underlying factor in crime is poverty, an issue that still disproportionately impacts black people. It could also be argued that things like arrest histories have been shaped by pre-existing structural racism, and that by feeding this information into an algorithm all you are doing is scaling stereotypes and reinforcing them with something seemingly unbiased.
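Kun's point about indirect inference can be made concrete with a toy example. Everything below is invented: the postcodes, the arrest counts, and the frequency-count ‘model’, which stands in for a real data-mining algorithm.

```python
# A toy illustration of Kun's argument: even when race is excluded from
# the inputs, a correlated feature (here, a hypothetical postcode) lets
# a model reproduce the bias in historical data. All data is invented.
from collections import Counter

# Historical arrest records as (postcode, arrested) pairs. In this invented
# scenario, postcode "A" is a heavily policed, predominantly black area.
history = [("A", True)] * 80 + [("A", False)] * 20 \
        + [("B", True)] * 20 + [("B", False)] * 80

def learn_risk_scores(records: list) -> dict:
    """'Learn' an arrest-rate-per-postcode lookup -- a stand-in for a
    data-mining algorithm finding patterns in historical data."""
    arrests, totals = Counter(), Counter()
    for postcode, arrested in records:
        totals[postcode] += 1
        arrests[postcode] += arrested  # True counts as 1, False as 0
    return {pc: arrests[pc] / totals[pc] for pc in totals}

# Race never appears as a feature, yet the model assigns postcode "A"
# four times the risk of postcode "B", scaling the historical bias.
print(learn_risk_scores(history))  # {'A': 0.8, 'B': 0.2}
```

The lesson carries over to automated journalism directly: a story-generating pipeline fed such scores would broadcast the bias at scale, wrapped in the apparent neutrality of data.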
Dan Keyserling, head of communications at Jigsaw, argues the same is true of journalism, noting that, ‘We need to treat numbers with the same kind of care that we would treat facts in a story. They need to be checked, they need to be qualified and their context needs to be understood.’ The issue is that by turning the more mundane jobs over to machines, we may do serious damage to the pipeline of high-quality journalists, leaving machines to essentially run riot, with whatever biases they pick up broadcast to a huge readership and potentially doing widespread damage to society. The technology has still not seen widespread adoption across newsrooms, but the efficiencies it produces and the constraints the newspaper industry is working under mean that it is really only a matter of time. The implications for jobs and for the quality of the news being produced must be carefully monitored, or serious damage could be done.