The Google engineers used their new model—dubbed AlphaMissense—to generate a catalog of 71 million possible missense variants. They were able to classify 89% as likely to be either benign or pathogenic mutations. That compares with just 0.1% that have been classified using conventional methods, according to the DeepMind engineers.
This is yet another example of how Google is investing to develop solutions for healthcare and medical care. In this case, DeepMind might find genetic sequences that are associated with disease or health conditions. In turn, these genetic sequences could eventually become biomarkers that clinical laboratories could use to help physicians make earlier, more accurate diagnoses and allow faster interventions that improve patient care.
“AI tools that can accurately predict the effect of variants have the power to accelerate research across fields from molecular biology to clinical and statistical genetics,” wrote Google DeepMind engineers Jun Cheng, PhD, and Žiga Avsec, PhD, in a blog post describing the new tool. Clinical laboratories benefit from the diagnostic biomarkers generated by this type of research.
AI’s Effect on Genetic Research
Genetic experiments to identify which mutations cause disease are both costly and time-consuming, Google DeepMind engineers Jun Cheng, PhD, and Žiga Avsec, PhD, wrote in a blog post. However, artificial intelligence sped up that process considerably.
“By using AI predictions, researchers can get a preview of results for thousands of proteins at a time, which can help to prioritize resources and accelerate more complex studies,” they noted.
Of the 71 million possible missense variants, approximately 6%, or four million, have already been seen in humans, they wrote, noting that the average person carries more than 9,000. Most are benign, “but others are pathogenic and can severely disrupt protein function,” causing diseases such as cystic fibrosis, sickle-cell anemia, and cancer.
“A missense variant is a single letter substitution in DNA that results in a different amino acid within a protein,” Cheng and Avsec wrote in the blog post. “If you think of DNA as a language, switching one letter can change a word and alter the meaning of a sentence altogether. In this case, a substitution changes which amino acid is translated, which can affect the function of a protein.”
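To make the definition concrete, here is a minimal Python sketch of how a single-letter DNA substitution can be labeled as synonymous, missense, or nonsense using the standard genetic code. The function name `classify_substitution` and its return format are illustrative assumptions, not part of AlphaMissense or any DeepMind tool, and the sketch says nothing about whether a given change is harmful.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, position, new_base):
    """Hypothetical helper: label a one-letter change within a DNA codon.

    position is 0, 1, or 2. Returns (mutated_codon, old_aa, new_aa, kind),
    where '*' stands for a stop codon.
    """
    if len(codon) != 3 or position not in (0, 1, 2) or new_base not in BASES:
        raise ValueError("expected a 3-letter codon, position 0-2, base in TCAG")
    mutated = codon[:position] + new_base + codon[position + 1:]
    old_aa, new_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if old_aa == new_aa:
        kind = "synonymous"   # same amino acid, protein sequence unchanged
    elif new_aa == "*":
        kind = "nonsense"     # substitution creates a stop codon
    elif old_aa == "*":
        kind = "stop-loss"    # substitution removes a stop codon
    else:
        kind = "missense"     # a different amino acid is translated
    return mutated, old_aa, new_aa, kind


# Textbook sickle-cell change in the beta-globin gene: GAG (Glu) to GTG (Val).
result = classify_substitution("GAG", 1, "T")
print(result)
```

The example uses the well-known sickle-cell substitution, in which glutamic acid (E) is replaced by valine (V); it is a textbook illustration rather than an example drawn from the blog post.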
In the Google DeepMind study, AlphaMissense predicted that 57% of the 71 million variants are “likely benign,” 32% are “likely pathogenic,” and 11% are “uncertain.”
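As a quick check on the arithmetic, the reported percentages can be converted into approximate variant counts. The sketch below assumes the rounded figures quoted in this briefing (71 million variants; 57%, 32%, 11%, and 6%); the names used are illustrative only, and the exact counts in the study may differ.

```python
# Rounded figures as quoted in the briefing, not exact study counts.
TOTAL_VARIANTS = 71_000_000
SHARES_PERCENT = {
    "likely benign": 57,
    "likely pathogenic": 32,
    "uncertain": 11,
}


def share_to_count(percent, total=TOTAL_VARIANTS):
    """Convert a whole-number percentage into an approximate variant count."""
    return total * percent // 100


counts = {label: share_to_count(p) for label, p in SHARES_PERCENT.items()}

# Share of variants given a benign or pathogenic label.
classified_percent = (
    SHARES_PERCENT["likely benign"] + SHARES_PERCENT["likely pathogenic"]
)

# Variants already observed in humans (about 6% of the total).
seen_in_humans = share_to_count(6)

for label, n in counts.items():
    print(f"{label}: about {n / 1e6:.1f} million")
print(f"classified as benign or pathogenic: {classified_percent}%")
print(f"already seen in humans: about {seen_in_humans / 1e6:.1f} million")
```

The benign and pathogenic shares sum to 89%, matching the classified fraction cited earlier, and 6% of 71 million is roughly 4.3 million, consistent with the “four million” figure.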
The AlphaMissense model is adapted from an earlier model called AlphaFold, which uses amino acid sequences to predict the structure of proteins.
“AlphaMissense was fed data on DNA from humans and closely related primates to learn which missense mutations are common, and therefore probably benign, and which are rare and potentially harmful,” The Guardian reported. “At the same time, the program familiarized itself with the ‘language’ of proteins by studying millions of protein sequences and learning what a ‘healthy’ protein looks like.”
The model assigned each variant a score between 0 and 1 to rate the likelihood of pathogenicity [the potential for a variant to cause disease]. “The continuous score allows users to choose a threshold for classifying variants as pathogenic or benign that matches their accuracy requirements,” Avsec and Cheng wrote in their blog post.
However, they also acknowledged that the score does not indicate exactly how a variant causes disease.
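The idea of choosing a threshold on a continuous score can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function name `classify_variant`, the cutoff values of 0.3 and 0.7, and the example scores are invented and are not the thresholds or data used by DeepMind.

```python
def classify_variant(score, benign_below=0.3, pathogenic_above=0.7):
    """Map a pathogenicity score in [0, 1] to a label using two cutoffs.

    The default cutoffs are illustrative placeholders, not DeepMind's values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if benign_below > pathogenic_above:
        raise ValueError("benign_below must not exceed pathogenic_above")
    if score < benign_below:
        return "likely benign"
    if score > pathogenic_above:
        return "likely pathogenic"
    return "uncertain"


# Made-up scores for three hypothetical variants.
example_scores = {"variant_a": 0.05, "variant_b": 0.50, "variant_c": 0.93}

# Default cutoffs.
labels = {name: classify_variant(s) for name, s in example_scores.items()}

# Stricter cutoffs: fewer variants receive a confident label.
strict_labels = {
    name: classify_variant(s, benign_below=0.1, pathogenic_above=0.95)
    for name, s in example_scores.items()
}

print(labels)
print(strict_labels)
```

Tightening the cutoffs moves borderline variants into the “uncertain” group, which is the trade-off between coverage and accuracy that the quoted passage describes.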
The engineers cautioned that the predictions in the catalog are not intended for clinical use. Instead, they “should be interpreted with other sources of evidence.” However, “this work has the potential to improve the diagnosis of rare genetic disorders, and help discover new disease-causing genes,” they noted.
Genomics England Sees a Helpful Tool
The BBC noted that AlphaMissense has been tested by Genomics England, which works with the UK’s National Health Service. “The new tool is really bringing a new perspective to the data,” Ellen Thomas, PhD, Genomics England’s Deputy Chief Medical Officer, told the BBC. “It will help clinical scientists make sense of genetic data so that it is useful for patients and for their clinical teams.”
However, Heidi Rehm, PhD, co-director of the Program in Medical and Population Genetics at the Broad Institute, suggested that the DeepMind engineers overstated the certainty of the model’s predictions. She told MIT Technology Review that she was “disappointed” that they labeled the variants as benign or pathogenic.
“The models are improving, but none are perfect, and they still don’t get you to pathogenic or not,” she said.
“Typically, experts don’t declare a mutation pathogenic until they have real-world data from patients, evidence of inheritance patterns in families, and lab tests—information that’s shared through public websites of variants such as ClinVar,” the MIT article noted.
Is AlphaMissense a Biosecurity Risk?
Although DeepMind has released its catalog of variations, MIT Technology Review notes that the lab isn’t releasing the entire AI model due to what it describes as a “biosecurity risk.”
The concern is that “bad actors” could try using it on non-human species, DeepMind said. But one anonymous expert described the restrictions “as a transparent effort to stop others from quickly deploying the model for their own uses,” the MIT article noted.
And so, genetics research takes a huge step forward thanks to Google DeepMind, artificial intelligence, and deep learning. Clinical laboratories and pathologists may soon have useful new tools that help healthcare providers diagnose diseases. Time will tell. But the developments are certainly worth watching.
Genomic sequencing continues to benefit patients through precision medicine clinical laboratory treatments and pharmacogenomic therapies
EDITOR’S UPDATE—Jan. 26, 2022: Since publication of this news briefing, officials from Genomics England contacted us to explain the following:
The “five million genome sequences” was an aspirational goal mentioned by then Secretary of State for Health and Social Care Matt Hancock, MP, in an October 2, 2018, press release issued by Genomics England.
As of this date a spokesman for Genomics England confirmed to Dark Daily that, with the initial goal of 100,000 genomes now attained, the immediate goal is to sequence 500,000 genomes.
In accordance with this updated input, we have revised the original headline and information in this news briefing that follows.
What better proof of progress in whole human genome sequencing than the announcement that the United Kingdom’s 100,000 Genomes Project has not only achieved that milestone, but will now increase the goal to 500,000 whole human genomes? This should be welcome news to clinical laboratory managers, as it means their labs will be positioned as the first-line provider of genetic data in support of clinical care.
Many clinical pathologists here in the United States are aware of the 100,000 Genomes Project, established by the National Health Service (NHS) in England (UK) in 2012. Genomics England’s new goal to sequence 500,000 whole human genomes is to pioneer a “lasting legacy for patients by introducing genomic sequencing into the wider healthcare system,” according to Technology Networks.
The importance of personalized medicine and of the power of precise, accurate diagnoses cannot be overstated. This announcement by Genomics England will be of interest to diagnosticians worldwide, especially doctors who diagnose and treat patients with chronic and life-threatening diseases.
Building a Vast Genomics Infrastructure
Genetic sequencing launched the era of precision medicine in healthcare. Through genomics, drug therapies and personalized treatments were developed that improved outcomes for patients, especially those suffering from cancer and other chronic diseases. And so far, the role of genomics in healthcare has only been expanding, as Dark Daily has covered in numerous ebriefings.
Genomics England, which is wholly owned by the Department of Health and Social Care in the United Kingdom, was formed in 2012 with the goal of sequencing 100,000 whole genomes of patients enrolled in the UK National Health Service. That goal was met in 2018, and now the NHS aspires to sequence 500,000 genomes.
Genomics England’s initial goals included:
To create an ethical program based on consent,
To set up a genomic medicine service within the NHS to benefit patients,
To make new discoveries and gain insights into the use of genomics, and
To begin the development of a UK genomics industry.
To gain the greatest benefit from whole genome sequencing (WGS), a substantial amount of data infrastructure must exist. “The amount of data generated by WGS is quite large and you really need a system that can process the data well to achieve that vision,” said Richard Scott, MD, PhD, Chief Medical Officer at Genomics England.
In early 2020, Weka, developer of WekaFS, a fully parallel and distributed file system, announced that it would work with Genomics England to manage the enormous amount of genomic data. When Genomics England reached 100,000 sequenced genomes, it had already gathered 21 petabytes of data. The organization expects to have 140 petabytes by 2023, notes a Weka case study.
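For a sense of scale, the figures above imply an average data footprint per genome. The following sketch is back-of-the-envelope arithmetic only: it assumes decimal units (1 petabyte = 10^15 bytes), treats all 21 petabytes as attributable to the 100,000 genomes, and assumes storage grows linearly with genome count. None of these assumptions is stated by Genomics England or Weka.

```python
# Unit convention is an assumption; the source does not say decimal or binary.
PETABYTE = 10**15  # bytes
GIGABYTE = 10**9   # bytes

genomes_sequenced = 100_000
data_at_milestone_bytes = 21 * PETABYTE  # figure quoted from the Weka case study

# Average storage per genome, including whatever derived data the total covers.
avg_bytes_per_genome = data_at_milestone_bytes // genomes_sequenced
avg_gb_per_genome = avg_bytes_per_genome / GIGABYTE


def projected_petabytes(genomes, bytes_per_genome=avg_bytes_per_genome):
    """Linear projection of total storage; a simplification, not a forecast."""
    return genomes * bytes_per_genome / PETABYTE


print(f"average per genome: {avg_gb_per_genome:.0f} GB")
print(f"500,000 genomes at that rate: {projected_petabytes(500_000):.0f} PB")
```

Under these assumptions the average works out to about 210 gigabytes per genome, and 500,000 genomes would come to roughly 105 petabytes, the same order of magnitude as the 140 petabytes the organization expects. The sketch does not model whatever accounts for the difference.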
Putting Genomics England’s WGS Project into Action
WGS has significantly impacted the diagnosis of rare diseases, and its uses extend to infectious disease as well. For example, Genomics England has contributed to projects that look at tuberculosis genomes to understand why the disease is sometimes resistant to certain medications. Genomic sequencing also played an enormous role in fighting the COVID-19 pandemic.
Scott notes that COVID-19 provides an example of how sequencing can be used to deliver care. “We can see genomic influences on the risk of needing critical care in COVID-19 patients and in how their immune system is behaving. Looking at this data alongside other omics information, such as the expression of different protein levels, helps us to understand the disease process better,” he said.
What’s Next for Genomic Sequencing?
As the research continues and scientists begin to better understand the information revealed by sequencing, other areas of scientific study like proteomics and metabolomics are becoming more important.
“There is real potential for using multiple strands of data alongside each other, both for discovery—helping us to understand new things about diseases and how [they] affect the body—but also in terms of live healthcare,” Scott said.
Along with expanding the target of Genomics England to 500,000 genomes sequenced, the UK has published a National Genomic Strategy named Genome UK. This plan describes how the research into genomics will be used to benefit patients. “Our vision is to create the most advanced genomic healthcare ecosystem in the world, where government, the NHS, research and technology communities work together to embed the latest advances in patient care,” according to the Genome UK website.
Clinical laboratory professionals with an understanding of diagnostics will recognize WGS’s impact on the healthcare industry. By following genomic sequencing initiatives, such as those coming from Genomics England, pathologists can keep their labs ready to take advantage of new discoveries and insights that will improve outcomes for patients.