Big Data Opening Up The Human Genome

New technology and techniques are the only way to harness its true power


We have talked about the use of big data in medicine consistently across the last few years. It has very much been at the heart of modern medical discoveries and its future is bright.

However, we are only at the very start of the process, and it will not be until we can fully harness the power of data in our healthcare that we can make the most of the early potential it has shown.

One of the most important elements of this is going to be the mapping of the human genome, which has created a significant opportunity for medicine. Theoretically, it could show exactly what may happen to somebody with a specific genomic sequence: which diseases they are at risk of, which ailments others with a similar makeup suffer from, and the best treatments if they do become ill.

However, the difficulty is that the human genome contains 3 billion pairs of bases, with a considerable number of these varying from person to person. These variations can affect anything from the smallest detail of a human being, such as whether somebody can roll their tongue, through to their chances of suffering from heart disease. This challenge is where big data is currently having a huge impact and will have an even bigger impact in the future.

The ability to identify not just these links, but also the elements that could create the links, is going to stem from big data and analytics. David Delaney, Chief Medical Officer at SAP, believes that 'There are so many factors that will cause expression or non-expression of a gene. And when you add in factors around a patient’s environment, it turns out that there are so many different pieces to this. The hope right now is that once we get more of the puzzle pieces out there, we will be able to start making the links that will eventually lead to major breakthroughs in things like cancer.'

One of the key diseases that this kind of data is going to impact most is cancer. In an article in Health IT Analytics, Marcin Imielinski, MD, PhD, a Core Member and Assistant Investigator at the New York Genome Center, points out that 'Ultimately, cancer is not a single disease...It’s a constellation of different diseases that you can subdivide based on organ type or tissue pathology, but you can also divide it on the basis of their genetic changes.' This makes it even more complex, given that there are over 100 types of cancer, but in the current system they are treated as a single disease.

Using new data technologies, algorithms, and filtering, it is possible to treat them as genuinely different diseases. By recording genome sequences and different reactions to drugs, it becomes possible either to treat cancers or to take steps to prevent cancers that are more common among those with similar genetic sequences. When the data sets become large enough (i.e. when we have a considerable number of mapped genomes), the result may be the largest and most highly valued single dataset in the world. A single human genome is around 200GB, so if we were to theoretically sequence the entire population of the world it would work out at 1,200,000,000,000GB, or 1.2 zettabytes, of data, which would require a huge amount of computing power to process. Even if we could only sequence 10% of the world, this number would still pose many challenges in management, but also significant opportunities to find these links more accurately.

However, according to Reid J. Robison (a physician who has now become a data scientist), only around 0.1%, or 125MB, of the data within each genome consists of mutations that can cause different characteristics and diseases, meaning a dataset of only 750,000,000GB, or 750 petabytes, for the entire world. Despite the huge numbers involved, this would be far more manageable and gives scientists an even better chance of blockbuster breakthroughs.
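The storage arithmetic above can be checked with a short Python sketch. It is only a back-of-the-envelope calculation: the 200GB and 125MB figures are taken from the article, while the population of 6 billion is an assumption inferred from the article's own total (1,200,000,000,000GB divided by 200GB per genome), and decimal (SI) units are assumed throughout.

```python
# Figures quoted in the article.
GENOME_GB = 200      # size of one sequenced genome, in GB
VARIANT_MB = 125     # per-person data that actually varies, in MB

# Assumption: population implied by the article's 1.2 trillion GB total.
POPULATION = 6_000_000_000

# Decimal (SI) units: 1 GB = 1,000 MB, 1 PB = 10^6 GB, 1 ZB = 10^12 GB.
MB_PER_GB = 1_000
GB_PER_PB = 10**6
GB_PER_ZB = 10**12

# Full genomes for everyone, and for 10% of the world.
full_gb = GENOME_GB * POPULATION
tenth_gb = GENOME_GB * (POPULATION // 10)

# Variant-only data for everyone, converted from MB to GB.
variant_gb = VARIANT_MB * POPULATION // MB_PER_GB

print(f"Full genomes, whole world: {full_gb:,} GB = {full_gb / GB_PER_ZB} ZB")
print(f"Full genomes, 10% of world: {tenth_gb:,} GB = {tenth_gb / GB_PER_ZB} ZB")
print(f"Variants only, whole world: {variant_gb:,} GB = {variant_gb / GB_PER_PB} PB")
```

Under these assumptions the full dataset comes to 1.2 zettabytes, the 10% sample to 0.12 zettabytes, and the variant-only dataset to 750 petabytes, a reduction by a factor of 1,600.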

We are not yet at a stage where these are genuine practical solutions, but we are getting closer. Although big data will have a profound impact when we hit critical mass, it is already providing the key to finding out what we will need to look for when we get there.
