The Big Data Of Genetic Research - And The Advancements It’s Helped Make

Did you know a fully sequenced human genome can take up more than 200 gigabytes of storage space?


In 2003, the National Human Genome Research Institute (NHGRI) announced that they had successfully sequenced the human genome.

In the 14 years since that momentous achievement, we've made amazing advances in genetics. In recent years, big data has helped drive many of those changes.

How has big data changed the study of genetics, and how will it continue to transform it in the future?

The Study of Hereditary Conditions

We've long suspected that many health conditions are hereditary, but we haven't had the tools necessary to identify the genes that cause each specific disorder.

Even after mapping the human genome, we don't know what each gene does. We've identified a few, such as BRCA1 and BRCA2, in which certain mutations indicate a higher risk of breast and ovarian cancer. Beyond that, though, we don't have the kind of information that we need to match individual genes with the diseases that they cause.

Part of the reason is that to match a gene to its disease, researchers need to be able to study a large group of people who have the disease. For common conditions, this isn't difficult. For conditions that relatively few people have, it can be far more difficult or even impossible.

Studying siblings is one way to identify these genetic anomalies. In many cases, a condition that is present in one child doesn't manifest in another child from the same parents. This is also what enables cord blood from one sibling, harvested at birth, to be used to treat the other.

The sheer amount of information collected about any given diagnosis is mind-blowing - more information than any one person could go through in their lifetime. This is where big data comes in.

Genetics and Big Data

Big data is a buzzword that covers a lot of ground, but in general, it refers to the use of enormous data storage systems paired with predictive algorithms to make sense of that data. With enough input, a system could even make predictions about the future, using data points instead of crystal balls.

For genetics, the amount of data that is collected is enormous - a fully sequenced human genome can take up more than 200 gigabytes of storage space, about as much as 75 Blu-ray quality movies. That sequenced data contains more than 3 billion DNA base pairs.
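
To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. The 200 gigabyte, 75 movie and 3 billion base pair numbers come from the paragraph above; the 2-bits-per-base packing and the function names are illustrative assumptions, not a description of how any particular sequencing pipeline stores its data.

```python
# Figures quoted in the article.
GENOME_BASE_PAIRS = 3_000_000_000   # "more than 3 billion DNA base pairs"
RAW_GENOME_BYTES = 200 * 10**9      # "more than 200 gigabytes"
MOVIE_COUNT = 75                    # "75 Blu-ray quality movies"


def packed_genome_bytes(base_pairs: int, bits_per_base: int = 2) -> int:
    """Bytes needed if each base (A, C, G or T) is packed into 2 bits.

    Assumption for illustration only: four possible letters fit in 2 bits.
    """
    return base_pairs * bits_per_base // 8


def bytes_per_unit(total_bytes: int, units: int) -> float:
    """Average number of bytes per unit (per movie, per base pair, ...)."""
    return total_bytes / units


packed = packed_genome_bytes(GENOME_BASE_PAIRS)
per_movie = bytes_per_unit(RAW_GENOME_BYTES, MOVIE_COUNT)
per_base = bytes_per_unit(RAW_GENOME_BYTES, GENOME_BASE_PAIRS)

print(f"Bare sequence, tightly packed: {packed / 10**9:.2f} GB")
print(f"Implied size of one movie:     {per_movie / 10**9:.2f} GB")
print(f"Stored bytes per base pair:    {per_base:.1f}")
```

The gap between the tightly packed sequence and the 200 gigabyte figure suggests that most of the stored data is something other than the bare letters. A likely explanation, offered here as an assumption, is that raw sequencing output keeps many overlapping reads of the same regions along with quality information for each one.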

Can you imagine going through 3 billion pairs to try to find just one gene that might trigger a disease under a specific set of circumstances?

Enter big data. Properly programmed predictive software can go through all of that data, looking at the individual base pairs and seeking out information about individual genes. This process takes a fraction of the time that it would take a human. All the researchers need to do is input the kind of information that they're looking for - a disease, patient demographic information or the identifier for an individual gene - and the computer will do the rest.
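
As a minimal sketch of that kind of query, the Python below filters a small collection of records by gene identifier, disease or a demographic field. Everything in it is hypothetical: the PatientVariant record, its fields, the find_matches function and the sample data are made up for illustration, and real genomic tools work on far larger and more complex datasets.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class PatientVariant:
    """One hypothetical record: a gene variant observed in one patient."""
    patient_id: str
    gene: str      # gene identifier, e.g. "BRCA1"
    disease: str   # diagnosis recorded for the patient
    age: int       # a single demographic field, for illustration


def find_matches(
    records: Iterable[PatientVariant],
    gene: Optional[str] = None,
    disease: Optional[str] = None,
    min_age: Optional[int] = None,
) -> Iterator[PatientVariant]:
    """Yield the records that satisfy every criterion that was supplied."""
    for record in records:
        if gene is not None and record.gene != gene:
            continue
        if disease is not None and record.disease != disease:
            continue
        if min_age is not None and record.age < min_age:
            continue
        yield record


# Made-up sample data, not real patient information.
SAMPLE = [
    PatientVariant("p1", "BRCA1", "breast cancer", 52),
    PatientVariant("p2", "BRCA2", "ovarian cancer", 47),
    PatientVariant("p3", "BRCA1", "breast cancer", 35),
]

matches = list(find_matches(SAMPLE, gene="BRCA1", min_age=40))
print([m.patient_id for m in matches])
```

Because find_matches is a generator, it looks at one record at a time rather than loading every match into memory, which is the general idea that lets this style of search scale to very large collections.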

This also enables researchers and doctors to look into new ways to treat cancers and other diseases that are already well studied.

Genomic therapy, or the use of an individual's genome to personalize treatment, can potentially make cancer treatment more effective. Instead of treating a cancerous tumor like every other one that's come before it, doctors can scan the genome of the individual, figure out just what went wrong to cause the mutation in the tumor's cells and work to correct it from there. This could potentially be much more effective than just bombarding the cells with radiation or chemotherapy, which are the standard treatments currently.

Big data might not be the best tool in the shed just yet, but it is quickly shaping up to be the most effective way to manage the enormous amount of data that doctors and geneticists around the world collect.

