How Can Big Data Be A Cancer-Fighting Tool?


Curing cancer, the silver bullet of medicine, is still elusive, even though massive resources and intensive research are now directed towards the problem. The term 'cancer' does not define a single disease, but a large family of bodily malfunctions whose common denominator is the uncontrolled growth of specific cells with mutated DNA. Because the disease depends on the tissue where this growth occurs and on the general characteristics of the organism, no two cases are the same. For a disease estimated to hit more than one-third of the US population, the need to find answers is urgent.

From biology to IT

One of the most promising solutions to this problem is 'computational biology.' This field links natural science to information technology by encoding biological data into models that can be analyzed to find relations and patterns. A helpful tool in this endeavor is Big Data analysis, an approach for working with datasets measured in terabytes (1 TB = 1024 GB). For example, the data collected from a single cancer patient could reach 50 TB once daily changes and detailed genomic data are included.
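To put the 50 TB figure in more familiar units, the conversion quoted above can be sketched in a few lines. The function and variable names here are illustrative only and do not come from the article.

```python
GB_PER_TB = 1024  # convention used in the text: 1 TB = 1024 GB


def tb_to_gb(terabytes):
    """Convert terabytes to gigabytes using the 1 TB = 1024 GB convention."""
    return terabytes * GB_PER_TB


patient_data_tb = 50  # upper figure quoted in the text for a single patient
print(tb_to_gb(patient_data_tb))  # 51200
```

Under this convention, one patient's data could amount to 51,200 GB.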

Collecting this much data was barely feasible just a few years ago, but the volume is now growing at an exponential rate thanks to the spread of Internet of Things (IoT) devices. This helps scientists understand the mechanics of cancer and classify it not only by the organ where it occurs, but also by how it unfolds and reacts to treatment.

Precision medicine

These new insights can help researchers investigate at the molecular level: not only identifying the malignant cells, but going deeper into the DNA structure to find the mutated genes that are causing the condition. This is just the first phase of a long drug discovery process, which must then identify the substances that can act on that specific gene and bring about remission.

The difference from traditional cancer treatment is that this approach does not only look at external manifestations but tries to understand the genetic triggers, focusing on the root cause of the disease. This is an entirely new way of dealing with this deadly condition. Scientists gather data points from several sources into one centralized data lake and apply algorithms for analyzing unstructured data, aiming to find meaningful connections. This is a world away from the clinical trial method used until now, which has had limited success.

Breakthrough projects

According to Carl Sagan, we are made of star stuff. Applying this logic, Dr. Caldas from Cancer Research UK is adapting an algorithm developed by astronomers for galaxy studies to categorize medical images. Cancer researchers are using the same methods that scientists use to count stars and classify galaxies to count malignant cells and create classifications of the disease.
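The idea of counting objects in an image, whether stars or cells, can be illustrated with a generic technique. The sketch below is not the algorithm Dr. Caldas's team adapted, which the article does not describe. It is a minimal connected-component count, assuming an image given as a grid of brightness values and a hypothetical brightness threshold that separates objects from background.

```python
from collections import deque


def count_blobs(image, threshold):
    """Count 4-connected groups of pixels brighter than `threshold`."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or seen[r][c]:
                continue
            # Found an unvisited bright pixel: flood-fill its whole blob.
            blobs += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return blobs


# Toy grid (made up for illustration): three separate bright regions.
toy_image = [
    [0, 9, 9, 0, 0],
    [0, 9, 0, 0, 7],
    [0, 0, 0, 0, 7],
    [8, 0, 0, 0, 0],
]
print(count_blobs(toy_image, threshold=5))  # 3
```

Real astronomical and medical pipelines are far more elaborate, but the same basic question (how many distinct bright objects are in this picture?) applies to a star field and to a tissue slide alike.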

Stars aren't the only inspiration for these pioneers. Adapting market research algorithms to model breast cancer genes is another innovative idea. Researchers can now create 3D maps that model the links between cancer-triggering genes and their manifestations, in much the same way that marketing researchers describe the relationship between advertising and buying behavior.

One of the most significant breakthroughs to date is the development of BPM 31510, a drug against pancreatic, breast and liver cancer created by an algorithm after analyzing data from 100 patients. The innovation is that the whole development process was reversed: Big Data was used to come up with a hypothesis about substance interactions. This has saved a considerable amount of guesswork and trial and error, and although the drug must still be validated through the regular clinical trial process, it is a promising success story.

This way of reasoning can be extended to other cancer types, including less common ones such as mesothelioma, which is considered rare and has a low survival rate. Machine learning and Big Data analysis could give these patients better chances and help enhance existing treatments. Since the method looks at the associated genes rather than the affected organs, and does not depend on the number of cases, enough information can be gathered from patients already enrolled in control programs.

Ongoing programs

There are numerous cancer research institutes, and Big Data offers them the possibility to collaborate and speed up the process of drug discovery.


One such resource is a knowledge base focused on research and drug discovery, structured as a portal that aims to catalyze multidisciplinary analysis. It focuses on the study of proteins and their mutations due to cancer. To find a cure, the mutated protein is studied in connection with different substances that can bind to it. This platform is a public resource accessed by over 200,000 scientists, and its main accomplishment is that it has saved time by flagging drug combinations that could not work.

Project GENIE

AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE) is a project that aims to create an oncology catalog of the links between cancer genomics and clinical results. Its novelty is that it uses statistics to support decision making, especially for rare cancers such as mesothelioma.
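To give a sense of what linking genomics to clinical results with statistics can mean in its simplest form, the sketch below groups patient records by mutated gene and computes how often treatment succeeded. This is not GENIE's data model or method. The records, the gene labels `GENE_A` and `GENE_B`, and the function name are all hypothetical and made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (mutated_gene, responded_to_treatment)
records = [
    ("GENE_A", True),
    ("GENE_A", False),
    ("GENE_A", True),
    ("GENE_B", False),
    ("GENE_B", False),
]


def response_rate_by_gene(rows):
    """Return the fraction of patients who responded, per mutated gene."""
    counts = defaultdict(lambda: [0, 0])  # gene -> [responders, total]
    for gene, responded in rows:
        counts[gene][1] += 1
        if responded:
            counts[gene][0] += 1
    return {gene: responders / total
            for gene, (responders, total) in counts.items()}


rates = response_rate_by_gene(records)
for gene, rate in sorted(rates.items()):
    print(f"{gene}: {rate:.0%} responded")
```

With only a handful of cases per gene, as is typical for rare cancers, such rates are very uncertain, which is one reason pooling records across institutions matters.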

Future hope

Big Data, IoT and their applications for cancer research are just starting to unfold. There is a long way to go before we can say that we have solved this problem, but a Nobel Prize for medicine in the coming years could be shared by an interdisciplinary team of biologists, chemists, physicians and data scientists. It will become the norm to have IT researchers in medical groups, helping to make sense of the data gathered from patients in an entirely new way.

If there is anything to be learned from the early results, it is that the classical path of drug development (guessing, testing, and validating) may be outdated. Perhaps it is a better idea to start with a clean slate, fill it with data and let machines come up with solutions that will be confirmed afterward.

