The use of Big Data in the humanities has so far flown largely under the radar; its potential applications for historians, however, are substantial.
Advances in text analytics in particular mean that historians can now use computer programs to trawl through millions of historical documents and uncover semantic patterns and trends in the past. These findings can be used either to support or to disprove theories, and the possible applications seem almost endless.
One such example comes from Real Clear Politics, which examined 34 presidential speeches delivered since 1981, starting with President Reagan's, and used text mining tools to analyze their political tone. Based on the language each speech used, they charted it ideologically on a left-right political scale. The analysis revealed a number of insights. For example, President Clinton's speeches shifted significantly to the right during his second term, while President Obama's speeches have become more liberal.
Text analytics is not limited to recent history, either. Two PhD students at the Stanford Literary Lab fed 2,958 19th-century novels into a series of Big Data analytics tools, drawing conclusions about what the language of those novels revealed about the wider society in which it was written. Among other findings, they observed that words describing actions and body parts became more prevalent as the century went on. They concluded that increasing urbanization during the 19th century brought people closer together physically, which made bodies and actions harder to ignore.
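To illustrate the kind of counting that underlies a finding like this, here is a minimal Python sketch. It is not the Stanford team's method: it assumes a tiny, made-up corpus of (year, text) pairs and a small, hypothetical word list for body parts, and computes what share of all words in each decade fall into that list. A share that rises from decade to decade is the sort of trend described above.

```python
import re
from collections import defaultdict

# Hypothetical word category; a real study would use a far larger, curated lexicon.
BODY_WORDS = {"hand", "hands", "face", "eyes", "arm", "arms", "head"}


def category_share_by_decade(corpus, category):
    """Return {decade: fraction of tokens in `category`} for (year, text) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in corpus:
        decade = year - year % 10  # e.g. 1847 -> 1840
        tokens = re.findall(r"[a-z]+", text.lower())  # crude word tokenizer
        totals[decade] += len(tokens)
        hits[decade] += sum(1 for t in tokens if t in category)
    # Skip any decade with no tokens to avoid dividing by zero.
    return {d: hits[d] / totals[d] for d in sorted(totals) if totals[d]}


# Toy, invented snippets standing in for full novels.
corpus = [
    (1821, "The estate was quiet and the letters were long."),
    (1845, "She raised her hand and turned her face to the window."),
    (1893, "He seized her arm, his eyes on the crowd, his hands shaking."),
]

print(category_share_by_decade(corpus, BODY_WORDS))
```

Normalizing by the total word count per decade matters: without it, a decade that simply has more or longer novels would look as though it used more body-part words.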
The nature of historical documents means that it is often difficult to digitize the necessary information. Many are fragile or scattered across the world, so getting them onto a computer is costly and requires specialist equipment such as book cradles and ultra-high-resolution cameras. Text analytics is also difficult in itself, because it has to take into account the context behind words for them to mean anything. Data analytics is a far easier process, with fewer layers to the information being considered, and it too is now being used to great effect by historians.
One group committed to building a global archive of historical data is the Collaborative for Historical Information and Analysis (CHIA), formed in 2011. CHIA is attempting to build a World-Historical Dataverse, open entirely to the public, from which historians, both professional and amateur, can do their own text mining using Big Data tools. In 2013, the group received more than $601,000 in National Science Foundation grants, an unusual move for the foundation, which rarely funds research projects in history and the humanities.
The team at CHIA is attempting to upload and review several terabytes of data over the next ten years. To do so, they must also record the “what, where, when, source” of the originally entered data, and clean it up to remove inconsistencies, which are far more likely in records as many as 400 years old. Such a resource would be a huge boon to collaboration among historians across the globe, supporting both research and analysis and helping them develop global theories that are less prone to cultural bias.
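As a rough illustration of what recording provenance and cleaning inconsistencies can involve, here is a minimal Python sketch. The `HistoricalRecord` fields simply mirror the "what, where, when, source" description above; the field names, the place-name alias table, the accepted date formats, and the sample entries are all hypothetical and do not represent CHIA's actual schema or pipeline.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class HistoricalRecord:
    what: str              # the observation itself, e.g. "population: 12000"
    where: str             # place name after normalization
    when: Optional[date]   # None if the original date could not be parsed
    source: str            # citation for the original document


# Hypothetical alias table mapping variant spellings to one canonical name.
PLACE_ALIASES = {
    "bombay": "Mumbai",
    "mumbai": "Mumbai",
    "canton": "Guangzhou",
    "guangzhou": "Guangzhou",
}

# Hypothetical set of date formats assumed to occur in the raw entries.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y")


def parse_date(raw: str) -> Optional[date]:
    """Try each known format in turn; return None rather than guess."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean(raw: dict) -> HistoricalRecord:
    """Trim whitespace, unify place-name variants, and parse the date."""
    place = raw["where"].strip()
    return HistoricalRecord(
        what=raw["what"].strip(),
        where=PLACE_ALIASES.get(place.lower(), place),
        when=parse_date(raw["when"]),
        source=raw["source"].strip(),
    )


# Invented sample entries with inconsistent spelling, spacing, and date formats.
raw_entries = [
    {"what": "population: 12000", "where": " Bombay ", "when": "1801",
     "source": "Hypothetical census ledger"},
    {"what": "population: 12500", "where": "mumbai", "when": "15/06/1811",
     "source": "Hypothetical survey"},
]

for rec in map(clean, raw_entries):
    print(rec)
```

Returning `None` for an unparseable date, instead of forcing a best guess, keeps the uncertainty visible to later researchers, which matters when the underlying records are centuries old.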
Organizing historical Big Data can bring together information that is currently scattered across libraries and museums, and that is time-consuming and laborious for researchers to track down and analyze. The task CHIA faces in bringing all this together is monumental, but the value of the result is even greater. The ability to see change over time and to pick out key variables can give insight into processes of growth, cycles, and interactions, and help predict how these might continue in the future.