How Machine Learning Is Mapping History

Researchers are analyzing past societies with AI


The ability of machine learning to garner insights about the future is one of its most oft-cited benefits in enterprise. For historians, however, its true potential lies in its ability to reveal new things about the past - potential that is now being realized on a massive scale.

A number of projects conducted by historians in recent years have leveraged data analytics. Real Clear Politics, for example, applied text mining tools to 34 years of State of the Union addresses, starting with Ronald Reagan in 1981, to analyze their political tone, using the language of each speech to chart it ideologically on a left-right scale. The analysis revealed a number of insights: President Clinton's speeches shifted significantly to the right during his second term, for instance, while President Obama's became more liberal.

In another project, two PhD students at the Stanford Literary Lab fed 2,958 19th-century novels into a series of Big Data analytics tools to examine what the language revealed about the wider society in which it was used. They found that words describing actions and body parts became more prevalent as the century went on, and concluded that increasing urbanization brought people closer together physically, making people's bodies and actions harder to ignore.
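The kind of lexicon-based text mining described above can be sketched in a few lines. The word lists and scoring rule here are invented for illustration and are not the method Real Clear Politics actually used; a real analysis would rely on far larger, validated lexicons or a trained classifier.

```python
# Toy sketch of charting a speech on a left-right axis by counting
# lexicon hits. LEFT_TERMS and RIGHT_TERMS are illustrative inventions.
import re
from collections import Counter

LEFT_TERMS = {"inequality", "healthcare", "climate", "workers", "unions"}
RIGHT_TERMS = {"taxes", "deficit", "defense", "freedom", "business"}

def ideology_score(text: str) -> float:
    """Return a score in [-1, 1]: negative leans left, positive right."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    left = sum(words[w] for w in LEFT_TERMS)
    right = sum(words[w] for w in RIGHT_TERMS)
    total = left + right
    return 0.0 if total == 0 else (right - left) / total

speech = "We must cut taxes and reduce the deficit to protect freedom."
print(ideology_score(speech))  # all three hits are right-leaning -> 1.0
```

Scoring each address this way, year by year, is what lets analysts plot a president's ideological drift across a term.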

While these projects have produced some fascinating discoveries about past societies, by far the most significant in terms of scale is currently being carried out by Frederic Kaplan, director of the Digital Humanities Laboratory at the Swiss Federal Institute of Technology in Lausanne (EPFL). The project, which he calls the Venice Time Machine, will use state-of-the-art scanners and adaptable machine learning algorithms to convert 1,000 years of maps, monographs, financial records, manuscripts, and sheet music held in Venice's state archives into digital form, allowing historians to reveal in minute detail the lives of ordinary people throughout the period and the development of the city over the millennium.

Venice is, in many ways, the perfect test case for this kind of project. The Most Serene Republic of Venice was renowned for its administrative systems, which recorded vast amounts of information in painstaking detail: births and deaths, the details of every boat that entered or left the harbour, trades, land ownership, where everyone lived, financial records, maps, private letters, ambassadors' reports and medical information. When Napoleon conquered the Republic of Venice in 1797 and dissolved it, he found 80km of shelving full of such records. The archive has always been a treasure trove for historians, and digitizing it all will mean it can be cross-referenced and analyzed with relative ease.

Frederic Kaplan has spent his career deploying machine learning within the humanities; he has already used AI to search centuries of newspaper reports for linguistic patterns. The Venice Time Machine incorporates a range of state-of-the-art technologies. It uses specially adapted high-speed scanners - including one with a robotic arm to turn the pages of books - and it can even read books that have never been opened. These scanners are capable of producing several thousand high-definition images per hour, with the resulting terabytes of data sent to servers in Venice for long-term storage, and to Lausanne, where high-performance computers transform the images into digital text ready for annotation. These computers use machine-learning-driven visual recognition software, previously unavailable, that uses the data it harvests from the documents themselves to find similar graphical shapes and thereby learn the structure of the written text.
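The idea of finding similar graphical shapes can be illustrated with a toy sketch: if recurring glyph images are clustered by visual similarity, a single human transcription can be propagated to every member of a cluster. The miniature 3x3 "glyphs", the pixel-distance measure, and the greedy clustering below are all simplifying assumptions for illustration; the actual EPFL system works on high-resolution scans with learned features.

```python
# Toy sketch: group recurring glyph shapes so one transcription can
# label many occurrences. Glyphs are 3x3 binary bitmaps (illustrative).

def distance(a, b):
    """Pixel-wise Hamming distance between two binary glyph bitmaps."""
    return sum(x != y for x, y in zip(a, b))

def cluster(glyphs, threshold=1):
    """Greedy clustering: join the first cluster whose representative
    is within `threshold` differing pixels, else start a new one."""
    clusters = []
    for g in glyphs:
        for c in clusters:
            if distance(g, c[0]) <= threshold:
                c.append(g)
                break
        else:
            clusters.append([g])
    return clusters

# Two near-identical ring shapes (one pixel of scanning noise) and one
# distinct vertical-stroke shape.
o1 = (0, 1, 0, 1, 0, 1, 0, 1, 0)
o2 = (0, 1, 0, 1, 0, 1, 0, 1, 1)
l  = (0, 1, 0, 0, 1, 0, 0, 1, 0)
print(len(cluster([o1, o2, l])))  # 2 clusters: {o1, o2} and {l}
```

Once clusters exist, transcribing one exemplar per cluster labels every matching shape in the corpus, which is how shape similarity can bootstrap recognition of handwriting the system has never seen transcribed.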

Tim Berners-Lee, inventor of the World Wide Web, once said that 'Data is a precious thing and will last longer than the systems themselves.' The implications of this data are tremendous. Joan Rosés, an economic historian at the London School of Economics and Political Science, notes that because Venice was the first true modern financial center, this information 'could help change our understanding of how financial markets work.' Historians have always viewed history in a way that suits them, but the objective eyes of a machine could ultimately produce a single, unassailable version of the truth that could change how we view the foundations of society and drive real change moving forward.
