Machine learning algorithms are becoming increasingly valuable in drug development. According to a recent paper, ‘Use of machine learning approaches for novel drug discovery’, they can now be applied at several stages of the drug discovery process. These include ‘the prediction of target structure, prediction of biological activity of new ligands through model construction, discovery or optimization of hits, and construction of models that predict the pharmacokinetic and toxicological (ADMET) profile of compounds.’
AstraZeneca’s announcement today that they have joined forces with Human Longevity, a US sequencing and machine learning company, to sequence 2 million genomes is therefore not a surprise as such, but it does represent a step up in the scale of such projects. AstraZeneca will be able to use Human Longevity's database of 1 million genomic and health records alongside 500,000 DNA samples from their own clinical trials. The creation of this new database is likely to take as long as a decade, but the project will also include sequences from samples donated in the past 15 years.
Machine learning algorithms enable a computer to search datasets for answers on its own, discovering patterns that steadily improve its performance. Clearly, the volumes of data AstraZeneca are analyzing are far beyond human capabilities, and the project should uncover patterns in the genomes that would otherwise be impossible to find, helping to better identify treatments and match them to patients.
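To make ‘discovering patterns’ concrete, here is a deliberately tiny sketch: a one-dimensional k-means clustering that groups invented per-patient scores into two clusters without being told the groups in advance. The data, the `kmeans_1d` helper, and the two-cluster setup are all hypothetical illustrations, not anything from AstraZeneca’s project.

```python
# Illustrative sketch only -- not AstraZeneca's pipeline. A tiny 1-D k-means
# groups invented per-patient "variant burden" scores into clusters, the kind
# of unsupervised pattern-finding that machine learning automates at scale.

def kmeans_1d(values, iters=20):
    """Cluster 1-D values into two groups by iteratively refining centroids."""
    centroids = [min(values), max(values)]     # simple two-cluster start
    clusters = [[], []]
    for _ in range(iters):
        clusters = [[], []]
        for v in values:
            nearest = min((0, 1), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical per-patient scores: two latent groups around 1.0 and 5.0.
scores = [0.9, 1.1, 1.0, 0.8, 5.2, 4.9, 5.1, 5.3]
centroids, clusters = kmeans_1d(scores)
print("cluster centers:", [f"{c:.2f}" for c in centroids])
```

At genomic scale the features are millions of variants rather than one score, but the principle is the same: the algorithm surfaces structure no analyst asked it to look for.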
The implications of machine learning for drug discovery are tremendous. A paper released by Google Research last year further explored the issue. Entitled ‘Massively Multitask Networks for Drug Discovery’, it examined how combining data from many different sources could better identify the chemical compounds that would serve as ‘effective drug treatments for a variety of diseases.’
The tech giant built a system that used tens of thousands of CPU cores to train networks on 37.8 million data points covering more than 200 different biological processes. They then ran the system for over 50 million CPU hours, eventually concluding that by including data from multiple sources, they could make more accurate predictions of a drug's efficacy across different diseases.
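The paper's conclusion, that combining data from multiple sources improves predictions, can be illustrated in miniature. The sketch below is a hypothetical toy, not Google's multitask architecture: two invented ‘assays’ share the same underlying rule, and a plain logistic regression trained on the pooled data is compared with one trained on a single assay's small dataset.

```python
# Hypothetical toy, not Google's model: shows the multitask intuition that
# pooling data from related prediction tasks can help when any single task
# has little data. Both invented "assays" share the same underlying rule.

import math
import random

random.seed(0)

def make_task(n):
    """Synthetic assay data: label is 1 when the two features sum past 1.0."""
    data = []
    for _ in range(n):
        x = [random.uniform(0, 1), random.uniform(0, 1)]
        data.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return data

def train(data, epochs=200, lr=0.5):
    """Plain logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            g = p - y                      # gradient of log-loss wrt logit
            w = [w[i] - lr * g * x[i] for i in range(2)]
            b -= lr * g
    return w, b

def accuracy(model, data):
    w, b = model
    hits = sum(1 for x, y in data
               if ((w[0] * x[0] + w[1] * x[1] + b) > 0) == (y == 1))
    return hits / len(data)

task_a, task_b = make_task(10), make_task(10)        # two small "assays"
test_set = make_task(500)

solo = accuracy(train(task_a), test_set)             # one data source
pooled = accuracy(train(task_a + task_b), test_set)  # sources combined
print(f"solo accuracy={solo:.2f}, pooled accuracy={pooled:.2f}")
```

In the actual paper the sharing happens inside a deep network with task-specific output heads rather than by naive data pooling; the toy only shows why related data sources can reinforce one another.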
A team of scientists at Carnegie Mellon University (CMU) has developed another application of machine learning to aid drug development. They looked specifically at the testing phase, which is often one of the most time-consuming stages. The team created an AI-led experimentation system that chooses which experiments to conduct. It finds patterns in the data to accurately predict the results of experiments without actually running them, reducing the number of tests that have to be carried out by as much as 70%.
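The CMU approach can be sketched as an uncertainty-driven loop: model the outcomes observed so far, run only the experiments the model is least sure about, and predict the rest. Everything below, including the simulated dose–response ‘experiments’, the 1-nearest-neighbour predictor, and the budget of 20 runs, is invented for illustration and is not CMU's actual system.

```python
# Illustrative sketch only -- not CMU's system. An active-learning loop runs
# just the most informative simulated experiments, then predicts the rest.

import random

random.seed(1)

# Hypothetical experiment space: outcome flips when "dose" exceeds 0.6.
experiments = [(d, 1 if d > 0.6 else 0)
               for d in (random.random() for _ in range(100))]

observed = {}  # dose -> outcome, for experiments actually carried out

def predict(dose):
    """1-nearest-neighbour prediction from the experiments run so far."""
    nearest = min(observed, key=lambda d: abs(d - dose))
    return observed[nearest]

def uncertainty(dose):
    """Distance to the closest observed dose: farther means less certain."""
    return min(abs(d - dose) for d in observed)

# Seed with two runs, then repeatedly run the most uncertain experiment.
for dose, outcome in experiments[:2]:
    observed[dose] = outcome
budget = 20
while len(observed) < budget:
    dose, outcome = max((e for e in experiments if e[0] not in observed),
                        key=lambda e: uncertainty(e[0]))
    observed[dose] = outcome               # "run" this experiment

# Predict outcomes for every experiment we never ran.
skipped = [(d, y) for d, y in experiments if d not in observed]
correct = sum(1 for d, y in skipped if predict(d) == y)
print(f"ran {len(observed)} of {len(experiments)} experiments; "
      f"predicted {correct}/{len(skipped)} of the rest correctly")
```

The loop runs only a fifth of the candidate experiments, in the spirit of the reported 70% reduction in tests, while still recovering the outcomes of the rest from the pattern it has learned.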
Pharmaceutical companies currently lack the skills to use such complex computational methodologies, so AstraZeneca’s partnership with Human Longevity and the involvement of Google are likely to be the model for future endeavors. It is unlikely to be a smooth road. The complexity of drug discovery could limit the impact of machine learning, as even the author of the Google paper admitted. However, the advantages, and arguably the necessity, of the technology surely make it worthy of further investigation, and it will likely take partnerships to pursue it.