Big Data, Government Overreach, and Media Hysteria

Trust in government use of data has never been lower


The past week has seen two major news events reignite the debate around governmental overreach in the collection and use of people’s personal data. First, we had Apple’s open letter to its customers, in which Tim Cook explained the reasons behind the tech giant’s refusal to comply with a court order demanding that it provide FBI investigators with a back door into an iPhone used by Syed Farook, who carried out the San Bernardino attack with Tashfeen Malik. Then there was the article in Ars Technica claiming that the machine learning algorithm used by the NSA’s SKYNET program, an algorithm that the site said was likely used to identify people targeted by drone strikes, was fundamentally flawed and may have caused the deaths of innocent civilians.

These stories have raised two issues, which are different but very much connected. The idea at the heart of the Ars Technica article, that private data could be used to wrongly identify criminals with terrible consequences, is one of the central reasons people fear data collection so much, and is partly why Apple believes it is so important to customers that their information is protected from the government. However, there are a number of problems with the Ars Technica story, many of which speak volumes about the hysteria surrounding Big Data and about why people’s fears of having their privacy invaded are somewhat misplaced.

Ars Technica’s headline, ‘NSA’s SKYNET program may be killing thousands of innocent people’, is both scaremongering and untrue. Granted, the public perception of SKYNET is not going to be helped by sharing a name with the AI system that destroys mankind in the Terminator films, and you’d think the NSA could have thought up something a bit more media friendly. However, the article itself is filled with conjecture and seemingly baseless assumptions. Martin Robbins, writing in the Guardian, has extensively debunked the article, pointing out that the NSA was not really even looking at terrorists themselves. It was looking at the couriers who delivered their messages, and data taken from the couriers’ phones was being used in conjunction with a number of other intelligence sources to try to identify and locate them. The program was not simply about putting anyone whose behavior seemed ‘terroristy’ onto a list and firing bombs at them. More importantly, Robbins also noted that the exposed document ‘clearly states that these are preliminary results. The title paraphrases the conclusion to every other research study ever: “We’re on the right track, but much remains to be done.” This was an experiment in courier detection and a work in progress, and yet the two publications not only pretend that it was a deployed system, but also imply that the algorithm was used to generate a kill list for drone strikes. You can’t prove a negative of course, but there’s zero evidence here to substantiate the story.’

Apple’s refusal to provide the FBI with a back door stems largely from the fear that governments will misuse data in the ways the Ars Technica article describes. Apple argued that by complying, it would open a Pandora’s Box, allowing the government to get into people’s phones and access their personal data at will. The FBI denies this, saying that it would be happy for Apple to keep the bypass to itself and to destroy it once it had been used. Whoever you believe, it raises the important question of how much your privacy is worth: is it right to invade the privacy of a billion people to save lives? The right to privacy has always been a tricky concept to define legally, with many of the repercussions falling under the purview of other laws, such as those covering theft, trespass, and defamation. Is it really invasion of privacy that people fear, when most people willingly expose so much about themselves on a daily basis? As data gets bigger and more aggregated, it often becomes more anonymous, and for the most part it is someone you don’t know looking at numbers on a screen that can’t be related back to you.

Is it, rather, a fear that government incompetence is so great that they will analyze this data wrongly and shoot you in your bed while you sleep? Patrick Ball, a data scientist and the director of research at the Human Rights Data Analysis Group who has previously given expert testimony before war crimes tribunals, described the NSA’s methods as ‘ridiculously optimistic’ and ‘completely bulls**t.’ Ars Technica notes that: ‘A flaw in how the NSA trains SKYNET’s machine learning algorithm to analyse cellular metadata, Ball told Ars, makes the results scientifically unsound.’ However, Ball provides scant evidence that the algorithm hasn’t worked. The only example offered is that it identified an Al-Jazeera journalist as a courier, because his job led him to behave like one. Of course, the algorithm was right to flag him, because he matched the criteria that the NSA was looking for, which, if anything, simply shows that the machine learning algorithm does work. All that was required to explain the result was a simple cross-check against the man’s job description. The journalist has not subsequently been killed by a drone.

The Editorial Board of the New York Times wrote of the Apple case that ‘Congress would do great harm by requiring such back doors. Criminals, and domestic and foreign intelligence agencies could exploit such features to conduct mass surveillance and steal national and trade secrets. There’s a very good chance that such a law, intended to ease the job of law enforcement, would make private citizens, businesses and the government itself far less secure.’ This is persuasive, but are we really saying that we’re so fearful of government incompetence that we should not allow data collection at all?

There is a strong argument that the US has no right to kill foreign citizens, but that is the argument we should be having, not whether data should be used to identify terrorists. Wrongheaded articles like Ars Technica’s spread paranoia that restricts the government’s ability to look at Big Data, a resource that has driven success in almost every industry and organization in which it has been used. Apple’s statement, whatever its merits or flaws, has at least raised the issue. We need to have a real debate about Open Data and how much people are willing to share. As FBI Director James Comey noted, ‘we have awesome new technology that creates a serious tension between two values we all treasure: privacy and safety. That tension should not be resolved by corporations that sell stuff for a living. It also should not be resolved by the FBI, which investigates for a living. It should be resolved by the American people deciding how we want to govern ourselves in a world we have never seen before.’ People need to make a decision, and if the answer is that we cannot trust governments with data, maybe we need to ask ourselves some even more serious questions.
