News, Analysis, Trends, Management Innovations for
Clinical Laboratories and Pathology Groups

Hosted by Robert Michel


Scientists Close in on Elusive Goal of Adapting Nanopore Technology for Protein Sequencing

Technology could enable medical laboratories to deploy inexpensive protein sequencing with a handheld device at point of care and remote locations

Clinical laboratories engaged in protein testing will be interested in several recent studies that suggest scientists may be close to adapting nanopore-sensing technology for use in protein identification and sequencing. The new proteomics techniques could lead to new handheld devices capable of sequencing proteins at low cost and with a high degree of sensitivity, in contrast to current approaches based on mass spectrometry.

But there are challenges to overcome, not the least of which is getting the proteins to cooperate. Compact devices based on nanopore technology already exist that can sequence DNA and RNA. But “there are lots of challenges with proteins” that have made it difficult to adapt the technology, Aleksei Aksimentiev, PhD, Professor of Biological Physics at the University of Illinois at Urbana-Champaign, told ASBMB Today, a publication of the American Society for Biochemistry and Molecular Biology. “In particular, they’re not uniformly charged; they’re not linear, most of the time they’re folded; and there are 20 amino acids, plus a zoo of post-translational modifications,” he added.

The ASBMB story notes that nanopore technology depends on differences in charges on either side of the membrane to force DNA or RNA through the hole. This is one reason why proteins pose such a challenge.

Giovanni Maglia, PhD, a Full Professor at the University of Groningen in the Netherlands and researcher into the fundamental properties of membrane proteins and their applications in nanobiotechnology, says he has developed a technique that overcomes these challenges.

“Think of a cell as a miniature city, with proteins as its inhabitants. Each protein-resident has a unique identity, its own characteristics, and function. If there was a database cataloging the fingerprints, job profiles, and talents of the city’s inhabitants, such a database would undoubtedly be invaluable!” said Behzad Mehrafrooz, PhD (above), Graduate Research Assistant at University of Illinois at Urbana-Champaign in an article he penned for the university website. This research should be of interest to the many clinical laboratories that do protein testing. (Photo copyright: University of Illinois.)

How the Maglia Process Works

In a Groningen University news story, Maglia said protein is “like cooked spaghetti. These long strands want to be disorganized. They do not want to be pushed through this tiny hole.”

His technique, developed in collaboration with researchers at the University of Rome Tor Vergata, uses electrically charged ions to drag the protein through the hole.

“We didn’t know whether the flow would be strong enough,” Maglia stated in the news story. “Furthermore, these ions want to move both ways, but by attaching a lot of charge on the nanopore itself, we were able to make it directional.”

The researchers tested the technology on what Maglia described as a “difficult protein” with many negative charges that would tend to make it resistant to flow.

“Previously, only easy-to-thread proteins were analyzed,” he said in the news story. “But we gave ourselves one of the most difficult proteins as a test. And it worked!”

Maglia now says that he intends to commercialize the technology through a new startup called Portal Biotech.

The Groningen University scientists published their findings in the journal Nature Biotechnology, titled “Translocation of Linearized Full-Length Proteins through an Engineered Nanopore under Opposing Electrophoretic Force.”

Detecting Post-Translational Modifications in the UK

In another recent study, researchers at the University of Oxford reported that they have adapted nanopore technology to detect post-translational modifications (PTMs) in protein chains. The term refers to changes made to proteins after they have been synthesized, explained an Oxford news story.

“The ability to pinpoint and identify post-translational modifications and other protein variations at the single-molecule level holds immense promise for advancing our understanding of cellular functions and molecular interactions,” said contributing author Hagan Bayley, PhD, Professor of Chemical Biology at University of Oxford, in the news story. “It may also open new avenues for personalized medicine, diagnostics, and therapeutic interventions.”

Bayley is the founder of Oxford Nanopore Technologies, a genetic sequencing company in the UK that develops and markets nanopore sequencing products.

The news story notes that the new technique could be integrated into existing nanopore sequencing devices. “This could facilitate point-of-care diagnostics, enabling the personalized detection of specific protein variants associated with diseases including cancer and neurodegenerative disorders,” the story states.

The Oxford researchers published their study’s findings in the journal Nature Nanotechnology titled, “Enzyme-less Nanopore Detection of Post-Translational Modifications within Long Polypeptides.”

Promise of Nanopore Protein Sequencing Technology

In another recent study, researchers at the University of Washington reported that they have developed their own method for protein sequencing with nanopore technology.

“We hacked the [Oxford Nanopore] sequencer to read amino acids and PTMs along protein strands,” wrote Keisuke Motone, PhD, one of the study authors in a post on X (formerly Twitter) following the study’s publication on the preprint server bioRxiv titled, “Multi-Pass, Single-Molecule Nanopore Reading of Long Protein Strands with Single-Amino Acid Sensitivity.”

“This opens up the possibility for barcode sequencing at the protein level for highly multiplexed assays, PTM monitoring, and protein identification!” Motone wrote.

In a commentary they penned for Nature Methods titled, “Not If But When Nanopore Protein Sequencing Meets Single-Cell Proteomics,” Motone and colleague Jeff Nivala, PhD, Principal Investigator at University of Washington, pointed to the promise of the technology.

Single-cell proteomics, enabled by nanopore protein sequencing technology, “could provide higher sensitivity and wider throughput, digital quantification, and novel data modalities compared to the current gold standard of protein MS [mass spectrometry],” they wrote. “The accessibility of these tools to a broader range of researchers and clinicians is also expected to increase with simpler instrumentation, less expertise needed, and lower costs.”

There are approximately 20,000 human genes. However, there are many more proteins. Thus, there is strong interest in understanding the human proteome and the role it plays in health and disease.

Technology that makes protein testing faster, more accurate, and less costly—especially with a handheld analyzer—would be a boon to the study of proteomics. And it would give clinical laboratories new diagnostic tools and bring some of that testing to point-of-care settings like doctor’s offices.

—Stephen Beale

Related Information:

Nanopores as the Missing Link to Next Generation Protein Sequencing

Nanopore Technology Achieves Breakthrough in Protein Variant Detection

The Scramble for Protein Nanopore Sequencing

The Emerging Landscape of Single-Molecule Protein Sequencing Technologies

ASU Researcher Advances the Science of Protein Sequencing with NIH Innovator Award

The Missing Link to Make Easy Protein Sequencing Possible?

Engineered Nanopore Translocates Full Length Proteins

Not If But When Nanopore Protein Sequencing Meets Single-Cell Proteomics

Enzyme-Less Nanopore Detection of Post-Translational Modifications within Long Polypeptides

Unidirectional Single-File Transport of Full-Length Proteins through a Nanopore

Translocation of Linearized Full-Length Proteins through an Engineered Nanopore under Opposing Electrophoretic Force

Interpreting and Modeling Nanopore Ionic Current Signals During Unfoldase-Mediated Translocation of Single Protein Molecules

Multi-Pass, Single-Molecule Nanopore Reading of Long Protein Strands with Single-Amino Acid Sensitivity

Experimental Low-Cost Blood Test Can Detect Multiple Cancers, Researchers Say

Test uses a new ultrasensitive immunoassay to detect a known clinical laboratory diagnostic protein biomarker for many common cancers

Researchers from Mass General Brigham, the Dana-Farber Cancer Institute, Harvard University’s Wyss Institute and other institutions around the world have reportedly developed a simple clinical laboratory blood test that can detect a common protein biomarker associated with multiple types of cancer, including colorectal, gastroesophageal, and ovarian cancers.

Best of all, the researchers say the test could provide an inexpensive means of early diagnosis. This assay could also be used to monitor how well patients respond to cancer therapy, according to a news release.

The test, which is still in experimental stages, detects the presence of LINE-1 ORF1p, a protein expressed in many common cancers, as well as high-risk precursors, while having “negligible expression in normal tissues,” the researchers wrote in a paper they published in Cancer Discovery titled, “Ultrasensitive Detection of Circulating LINE-1 ORF1p as a Specific Multicancer Biomarker.”

The protein had previously been identified as a promising biomarker and is readily detectable in tumor tissue, they wrote. However, it is found in extremely low concentrations in blood plasma and is “well below detection limits of conventional clinical laboratory methods,” they noted.

To overcome that obstacle, they employed an ultra-sensitive immunoassay known as a Simoa (Single-Molecule Array), an immunoassay platform for measuring fluid biomarkers.

“We were shocked by how well this test worked in detecting the biomarker’s expression across cancer types,” said lead study author gastroenterologist Martin Taylor, MD, PhD, Instructor in Pathology, Massachusetts General Hospital and Harvard Medical School, in the press release. “It’s created more questions for us to explore and sparked interest among collaborators across many institutions.”

“We’ve known since the 1980s that transposable elements were active in some cancers, and nearly 10 years ago we reported that ORF1p was a pervasive cancer biomarker, but, until now, we haven’t had the ability to detect it in blood tests,” said pathologist and study co-author Kathleen Burns, MD, PhD (above), Chair of the Department of Pathology at Dana-Farber Cancer Institute and a Professor of Pathology at Harvard Medical School, in a press release. “Having a technology capable of detecting ORF1p in blood opens so many possibilities for clinical applications.” Clinical laboratories may soon have a new blood test to detect multiple types of cancer. (Photo copyright: Dana-Farber Cancer Institute.)

Simoa’s Advantages

In their press release, the researchers described ORF1p as “a hallmark of many cancers, particularly p53-deficient epithelial cancers,” a category that includes lung, breast, prostate, uterine, pancreatic, and head and neck cancers in addition to the cancers noted above.

“Pervasive expression of ORF1p in carcinomas, and the lack of expression in normal tissues, makes ORF1p unlike other protein biomarkers which have normal expression levels,” Taylor said in the press release. “This unique biology makes it highly specific.”

Simoa was developed at the laboratory of study co-author David R. Walt, PhD, the Hansjörg Wyss Professor of Bioinspired Engineering at Harvard Medical School, and Professor of Pathology at Harvard Medical School and Brigham and Women’s Hospital.

The Simoa technology “enables 100- to 1,000-fold improvements in sensitivity over conventional enzyme-linked immunosorbent assay (ELISA) techniques, thus opening the window to measuring proteins at concentrations that have never been detected before in various biological fluids such as plasma or saliva,” according to the Walt Lab website.

Simoa assays take less than two hours to run and require less than $3 in consumables. They are “simple to perform, scalable, and have clinical-grade coefficients of variation,” the researchers wrote.

Study Results

Using the first generation of the ORF1p Simoa assay, the researchers tested blood samples from patients with a variety of cancers along with samples from 406 individuals, regarded as healthy, who served as controls. The test proved to be most effective among patients with colorectal and ovarian cancer, finding detectable levels of ORF1p in 58% of the former and 71% of the latter. Detectable levels were found in patients with advanced-stage as well as early-stage disease, the researchers wrote in Cancer Discovery.

Among the 406 healthy controls, the test found detectable levels of ORF1p in only five. However, the control with the highest detectable levels, regarded as healthy when donating blood, “was six months later found to have prostate cancer and 19 months later found to have lymphoma,” the researchers wrote.
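As a rough worked example of what those control numbers imply, the sketch below computes the specificity suggested by five detectable results among 406 presumed-healthy controls and prints it next to the reported detection rates for colorectal and ovarian cancer. The function and variable names are illustrative and do not come from the study. The calculation assumes all five positive controls were false positives, which may be conservative given that one of them was later diagnosed with cancer.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of cancer-free individuals the test correctly calls negative."""
    return true_negatives / (true_negatives + false_positives)


controls = 406
# Assumption: every control with detectable ORF1p counts as a false positive.
controls_with_detectable_orf1p = 5

spec = specificity(
    controls - controls_with_detectable_orf1p,
    controls_with_detectable_orf1p,
)

# Detection rates reported for the first-generation assay.
reported_detection_rate = {"colorectal": 0.58, "ovarian": 0.71}

print(f"Implied specificity: {spec:.1%}")
for cancer, rate in reported_detection_rate.items():
    print(f"Reported detection rate, {cancer}: {rate:.0%}")
```

Under that assumption, 401 of 406 controls tested negative, or roughly 98.8%.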

They later reengineered the Simoa assay to increase its sensitivity, resulting in improved detection of the protein in blood samples from patients with colorectal, gastroesophageal, ovarian, uterine, and breast cancers.

The researchers also employed the test on samples from 19 patients with gastroesophageal cancer to gauge its utility for monitoring therapeutic response. Although this was a small sample, they found that among 13 patients who had responded to therapy, “circulating ORF1p dropped to undetectable levels at follow-up sampling.”

“More Work to Be Done”

The Simoa assay has limitations, the researchers acknowledged. It doesn’t identify the location of cancers, and it “isn’t successful in identifying all cancers and their subtypes,” the press release stated, adding that the test will likely be used in conjunction with other early-detection approaches. The researchers also said they want to gauge the test’s accuracy in larger cohorts.

“The test is very specific, but it doesn’t tell us enough information to be used in a vacuum,” Walt said in the news release. “It’s exciting to see the early success of this ultrasensitive assessment tool, but there is more work to be done.”

More studies will be needed to validate these findings. Because this promising new multi-cancer immunoassay is based on a clinical laboratory blood sample, it is less invasive and less painful for patients. It’s a good example of an assay that takes a proteomic approach, looking for protein cancer biomarkers, rather than the genetic approach of looking for molecular DNA/RNA biomarkers of cancer.

—Stephen Beale

Related Information:

Ultrasensitive Blood Test Detects ‘Pan-Cancer’ Biomarker

New Blood Test Could Offer Earlier Detection of Common Deadly Cancers

Ultrasensitive Detection of Circulating LINE-1 ORF1p as a Specific Multicancer Biomarker

Noninvasive and Multicancer Biomarkers: The Promise of LINE-1 Retrotransposons

LINE-1-ORF1p Is a Promising Biomarker for Early Cancer Detection, But More Research Is Needed

‘Pan-Cancer’ Found in Highly Sensitive Blood Test

Cambridge Researchers in UK Develop ‘Unknome Database’ That Ranks Proteins by How Little is Known about Their Functions

Scientists believe useful new clinical laboratory assays could be developed by better understanding the huge number of ‘poorly researched’ genes and the proteins they build

Researchers have added a new “-ome” to the long list of -omes. The new -ome is the “unknome.” This is significant for clinical laboratory managers because it is part of an investigative effort to better understand the substantial number of genes, and the proteins they build, that have been understudied and of which little is known about their full function.

Scientists at the Medical Research Council Laboratory of Molecular Biology (MRC-LMB) in Cambridge, England, believe these genes are important. They have created a database of thousands of unknown (or “unknome,” as they cleverly dubbed them) proteins and genes that have been “poorly understood” and which are “unjustifiably neglected,” according to a paper the scientists published in the journal PLOS Biology titled, “Functional Unknomics: Systematic Screening of Conserved Genes of Unknown Function.”

The Unknome Database includes “thousands of understudied proteins encoded by genes in the human genome, whose existence is known but whose functions are mostly not,” according to a news release.

The database, which is available to the public and which can be customized by the user, “ranks proteins based on how little is known about them,” the PLOS Biology paper notes.

It should be of interest to pathologists and clinical laboratory scientists. The fruit of this research may identify additional biomarkers useful in diagnosis and for guiding decisions on how to treat patients.

“These uncharacterized genes have not deserved their neglect,” said Sean Munro, PhD (above), MRC Laboratory of Molecular Biology in Cambridge, England, in a press release. “Our database provides a powerful, versatile and efficient platform to identify and select important genes of unknown function for analysis, thereby accelerating the closure of the gap in biological knowledge that the unknome represents.” Clinical laboratory scientists may find the Unknome Database intriguing and useful. (Photo copyright: Royal Society.)

Risk of Ignoring Understudied Proteins

Proteomics (the study of proteins) is a rapidly advancing area of clinical laboratory testing. As genetic scientists learn more about proteins and their functions, diagnostics companies use that information to develop new assays. But did you know that researchers tend to focus on only a small fraction of the total number of protein-coding DNA sequences contained in the human genome?

The study of proteomics is primarily interested in the part of the genome that “contains instructions for building proteins … [which] are essential for development, growth, and reproduction across the entire body,” according to Scientific American. These are all protein-coding genes.

Researchers estimate that there are more than two million proteins in the human body, which are coded for by 20,000 to 25,000 genes, according to All the Science.

To build their database, the MRC researchers ranked the “unknome” proteins by how little is known about their functions in cellular processes. When they tested the database, they found some of these less-researched proteins important to biological functions such as development and stress resistance. 

“The role of thousands of human proteins remains unclear and yet research tends to focus on those that are already well understood,” said Sean Munro, PhD, MRC Laboratory of Molecular Biology in Cambridge, England, in the news release. “To help address this we created an Unknome database that ranks proteins based on how little is known about them, and then performed functional screens on a selection of these mystery proteins to demonstrate how ignorance can drive biological discovery.”

Munro created the Unknome Database along with Matthew Freeman, PhD, Head of England’s Sir William Dunn School of Pathology, University of Oxford.

In the paper, they acknowledged the human genome encodes about 20,000 proteins, and that the application of transcriptomics and proteomics has “confirmed that most of these new proteins are expressed, and the function of many of them has been identified.”

“However,” the authors added, “despite over 20 years of extensive effort, there are also many others that still have no known function.”

They also recognized that limited resources for research and a preference for “relative safety” and “well-established fields” are likely holding back discoveries.

The researchers note “significant” risks to continually ignoring unexplored proteins, which may have roles in cell processes, serve as targets for therapies, and be associated with diseases as well as being “eminently druggable,” Genetic Engineering News reported.

Setting up the Unknome Database

To develop the Unknome Database, the researchers first turned to what is already known. They gave each protein encoded by the human genome a “knownness” score based on review of existing information about “function, conservation across species, subcellular localization, and other factors,” Interesting Engineering reported.

It turns out, 3,000 groups of proteins (805 with a human protein) scored zero, “showing there’s still much to learn within the human genome,” Science News stated, adding that the Unknome Database catalogues more than 13,000 protein groups and nearly two million proteins. 

The researchers then tested the database by using it to determine what could be learned about 260 “mystery” genes in humans that are also present in Drosophila (small fruit flies).

“We used the Unknome Database to select 260 genes that appeared both highly conserved and particularly poorly understood, and then applied functional assays in whole animals that would be impractical at genome-wide scale,” the researchers wrote in PLOS Biology.

“We initially selected all genes that had a knownness score of ≤1.0 and are conserved in both humans and flies, as well as being present in at least 80% of available metazoan genome sequences. … After testing for viability, the nonessential genes were then screened with a panel of quantitative assays designed to reveal potential roles in a wide range of biological functions,” they added.
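The selection rule the researchers describe (a knownness score of at most 1.0, conservation in both humans and flies, and presence in at least 80% of metazoan genomes) can be sketched as a simple filter. This is only an illustration of the stated criteria: the record fields, gene names, and scores below are made up, and the code does not reflect the actual schema or interface of the Unknome Database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneCluster:
    name: str                 # hypothetical identifier
    knownness: float          # lower means less is known
    in_human: bool
    in_fly: bool
    metazoan_fraction: float  # share of metazoan genomes containing the gene


def select_candidates(clusters, max_knownness=1.0, min_metazoan_fraction=0.8):
    """Keep poorly understood, conserved genes and rank least-known first."""
    kept = [
        c for c in clusters
        if c.knownness <= max_knownness
        and c.in_human and c.in_fly
        and c.metazoan_fraction >= min_metazoan_fraction
    ]
    return sorted(kept, key=lambda c: c.knownness)


# Made-up records for illustration only.
clusters = [
    GeneCluster("geneA", 0.0, True, True, 0.95),
    GeneCluster("geneB", 4.5, True, True, 0.99),   # too well known
    GeneCluster("geneC", 1.0, True, True, 0.85),
    GeneCluster("geneD", 0.5, True, False, 0.90),  # absent from flies
    GeneCluster("geneE", 0.2, True, True, 0.60),   # not widely conserved
]

selected = select_candidates(clusters)
print([c.name for c in selected])
```

Because the thresholds are parameters, a user could loosen or tighten them, which loosely mirrors the customizable scoring the paper describes.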

“Our screen in whole organisms reveals that, despite several decades of extensive genetic screens in Drosophila, there are many genes with essential roles that have eluded characterization,” the researchers conclude.

Clinical Laboratory Testing Using the Unknome Database

Future use of the Unknome Database may involve CRISPR technology to explore functions of unknown genes, according to the PLOS Biology paper.

Munro told Science News the research team may work with other research efforts aimed at understanding “mysterious proteins,” such as the Understudied Proteins Initiative.

The Unknome Database’s ability to be customized by others means researchers can create their own “knownness” scores as it applies to their studies. Thus, the database could be a resource in studies of treatments or medications to fight diseases, Chemistry World noted.

According to a statement prepared for Healthcare Dive by SomaLogic, a Boulder, Colorado-based protein biomarker company, diagnostic tests that measure proteins can be applied to diseases and conditions such as:

In a study published in Science Translational Medicine, SomaLogic’s SomaScan assay was reportedly successful in predicting the likelihood within four years of myocardial infarction, heart failure, stroke, and even death.

“The 27-protein model has potential as a ‘universal’ surrogate end point for cardiovascular risk,” the researchers wrote in Science Translational Medicine.

Proteomics definitely has its place in clinical laboratory testing. The development of MRC-LMB’s Unknome Database will help researchers increase their knowledge about the functions of more proteins, which should in turn lead to new diagnostic assays for labs.

—Donna Marie Pocius

Related Information:

Mapping the ‘Unknome’ May Reveal Critical Genes Scientists Have Ignored

How Many Proteins Exist?

Unknome: A Database of Human Genes We Know Almost Nothing About

Functional Unknomics: Systematic Screening of Conserved Genes of Unknown Function

Unknome Database Ranks Proteins Based on How Little is Known about Them

How a New Database of Human Genes Can Help Discover New Biology

The Unknome Catalogs Nearly Two Million Proteins. Many are Mysterious

Into the Unknome: Scientists at MRC LMB in Cambridge Create Database Ranking Human Proteins by How Little We know About Them

Scientists Hope to Illuminate Unknown Human Proteins with New Public Database

Proteomic Tests Empower Precision Medicine

A Proteomic Surrogate for Cardiovascular Outcomes That is Sensitive to Multiple Mechanisms of Change in Risk

University of Maryland Scientists Image World’s First ‘Vampire Virus’

Research could lead to improvements in gene therapy and antiviral resistance medications while also possibly leading to a new class of clinical laboratory tests

Scientists at the University of Maryland, Baltimore County (UMBC) have discovered what may be the scariest virus of all: the Vampire Virus. It’s a term that may inspire “Walking Dead” level horror in the wake of the COVID-19 pandemic, and though virologists and microbiologists might be tempted to dismiss such viruses as imaginary, they are all too real. Even more apropos to the Dracula saga, the UMBC scientists found them in a soil sample. Yikes!

Happily, this ghoulish discovery could have positive implications for gene editing, gene therapy, and the development of new antiviral medications, according to The Conversation. In turn, these positive implications may eventually trigger the need to create new diagnostic tests that clinical laboratories can offer to physicians.

The UMBC scientists published their findings in The ISME Journal, a publication of the International Society for Microbial Ecology, titled, “Simultaneous Entry as an Adaptation to Virulence in a Novel Satellite-Helper System Infecting Streptomyces Species.”

The image above, taken from a University of Maryland news release, shows the satellite virus “latched onto its helper virus.” Discovery of vampire-like viruses that attach at the “neck” of other viruses may lead to important discoveries in the development of gene editing and antiviral therapies. Might clinical laboratories one day collect samples for pharmaceutical developers engaged in combating antiviral drug resistance? (Photo copyright: University of Maryland.)

Spotting a Vampire Virus

According to IFLScience, these tiny vampire viruses were first discovered by undergraduates who believed they were looking at sample contamination when analyzing sequences of bacteriophages from environmental soil samples. But upon repeating the experiment they realized it was no mistake.

In the UMBC news release, bioinformatician Ivan Erill, PhD, Professor of Biological Sciences at the University of Maryland, noted that “some viruses, called satellites, depend not only on their host organism to complete their life cycle, but also on another virus, known as a helper.

“The satellite virus needs the helper either to build its capsid, a protective shell that encloses the virus’ genetic material, or to help it replicate its DNA,” he added. “These viral relationships require the satellite and the helper to be in proximity to each other at least temporarily, but there were no known cases of a satellite actually attaching itself to a helper—until now.”

Although scientists have witnessed viruses working together before, this is the first known instance of a virus directly latching onto another virus’ capsid—rather like a vampire going for the neck.

“When I saw it, I was like, I can’t believe this,” said Tagide deCarvalho, PhD, Assistant Director of Natural and Mathematical Sciences at the University of Maryland and first author of the study, in a UMBC news release. “No one has ever seen a bacteriophage, or any other virus, attach to another virus.”

Visualizing the tiny viruses was only possible through the use of the transmission electron microscope (TEM) at UMBC’s Keith R. Porter Imaging Facility (KPIF), to which deCarvalho had access.

“Not everyone has a TEM at their disposal. [With the TEM] I’m able to follow up on some of these observations and validate them with imaging. There’s elements of discovery we can only make using the TEM,” said deCarvalho in the UMBC news release.

Using Vampire Viruses to Develop Better Gene Therapies

Spookily, the comparisons to Dracula and his parasitic brethren do not stop with their freeloading tendencies. The researchers found that some viruses without a satellite attached still showed signs of having been leeched onto before. Those viruses had the equivalent of “bite marks” showing evidence of encountering vampiric viruses in the past.

“It’s possible that a lot of the bacteriophages that people thought were contaminated were actually these satellite-helper systems,” said deCarvalho in the ISME paper.

But what does UMBC’s breakthrough mean for the greater scientific and medical community? Do we need to arm host viruses with silver crosses and necklaces of garlic? Jokes aside, this discovery could lead to further development in research of how to genetically alter viruses and deliver therapeutic elements into cells.

According to Healthline, some gene therapy or “gene editing” already involves the use of viruses. Scientists switch out the programming on a virus and trick it into healing, instead of harming the cells it infiltrates. Therefore, UMBC’s discovery could lead to new breakthroughs battling deadly viruses by using their own parasitic tricks to infiltrate other viruses.

Although groundbreaking and extremely interesting, the research is still in early stages. Any developments from this discovery aren’t likely to impact clinical laboratories any time soon. But after the past few years of battling the COVID-19 variants, this exciting discovery could help find new ways to prevent the next pandemic.  

—Ashley Croce

Related Information:

Vampire Viruses Prey on Other Viruses to Replicate Themselves and May Hold the Key to New Antiviral Therapies

Virus Seen Latching onto Another Virus (Like A Tiny Vampire) for First Time

UMBC Team Makes First-Ever Observation of a Virus Attaching to Another Virus

The First Discovered Vampire Virus Hooks Onto other Viruses—Meet the ‘MiniFlayer’

Simultaneous Entry as an Adaptation to Virulence in a Novel Satellite-Helper System infecting Streptomyces Species

Your Guide to Gene Therapy: How It Works and What It Treats

Bizarre First: Viruses Seen ‘Biting’ onto Other Viruses Like Tiny Vampires

Google DeepMind Says Its New Artificial Intelligence Tool Can Predict Which Genetic Variants Are Likely to Cause Disease

Genetic engineers at the lab used the new tool to generate a catalog of 71 million possible missense variants, classifying 89% as either benign or pathogenic

Genetic engineers continue to use artificial intelligence (AI) and deep learning to develop research tools that have implications for clinical laboratories. The latest development involves Google’s DeepMind artificial intelligence lab, which has created an AI tool that, they say, can predict whether a single-letter substitution in DNA, known as a missense variant (aka, missense mutation), is likely to cause disease.

The Google engineers used their new model—dubbed AlphaMissense—to generate a catalog of 71 million possible missense variants. They were able to classify 89% as likely to be either benign or pathogenic mutations. That compares with just 0.1% that have been classified using conventional methods, according to the DeepMind engineers.

This is yet another example of how Google is investing to develop solutions for healthcare and medical care. In this case, DeepMind might find genetic sequences that are associated with disease or health conditions. In turn, these genetic sequences could eventually become biomarkers that clinical laboratories could use to help physicians make earlier, more accurate diagnoses and allow faster interventions that improve patient care.

The Google engineers published their findings in the journal Science titled, “Accurate Proteome-wide Missense Variant Effect Prediction with AlphaMissense.” They also released the catalog of predictions online for use by other researchers.

“AI tools that can accurately predict the effect of variants have the power to accelerate research across fields from molecular biology to clinical and statistical genetics,” wrote Google DeepMind engineers Jun Cheng, PhD (left), and Žiga Avsec, PhD (right), in a blog post describing the new tool. Clinical laboratories benefit from the diagnostic biomarkers generated by this type of research. (Photo copyrights: LinkedIn.)

AI’s Effect on Genetic Research

Genetic experiments to identify which mutations cause disease are both costly and time-consuming, Google DeepMind engineers Jun Cheng, PhD, and Žiga Avsec, PhD, wrote in a blog post. However, artificial intelligence can speed up that process considerably.

“By using AI predictions, researchers can get a preview of results for thousands of proteins at a time, which can help to prioritize resources and accelerate more complex studies,” they noted.

Of all possible 71 million variants, approximately 6%, or four million, have already been seen in humans, they wrote, noting that the average person carries more than 9,000. Most are benign, “but others are pathogenic and can severely disrupt protein function,” causing diseases such as cystic fibrosis, sickle-cell anemia, and cancer.

“A missense variant is a single letter substitution in DNA that results in a different amino acid within a protein,” Cheng and Avsec wrote in the blog post. “If you think of DNA as a language, switching one letter can change a word and alter the meaning of a sentence altogether. In this case, a substitution changes which amino acid is translated, which can affect the function of a protein.”
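The definition above can be illustrated with a minimal Python sketch. It assumes a deliberately partial codon table (only the three codons needed here), and the function names are hypothetical. The example substitution, GAG to GTG, swaps glutamic acid for valine and is the well-known change in the beta-globin gene behind sickle-cell anemia. This is an illustration of the concept only, not part of AlphaMissense.

```python
# Partial codon table: only the codons used in this illustration.
CODON_TO_AMINO_ACID = {
    "GAG": "Glu",  # glutamic acid
    "GTG": "Val",  # valine
    "GAA": "Glu",  # glutamic acid (synonymous with GAG)
}


def substitute(codon: str, position: int, new_base: str) -> str:
    """Return the codon with one base replaced (0-based position)."""
    return codon[:position] + new_base + codon[position + 1:]


def classify_substitution(codon: str, position: int, new_base: str):
    """Apply a single-letter substitution and report whether the
    encoded amino acid changes (missense) or stays the same (synonymous)."""
    before = CODON_TO_AMINO_ACID[codon]
    mutated = substitute(codon, position, new_base)
    after = CODON_TO_AMINO_ACID[mutated]
    kind = "missense" if before != after else "synonymous"
    return mutated, before, after, kind


# One letter changes and the amino acid changes: a missense variant.
print(classify_substitution("GAG", 1, "T"))  # ('GTG', 'Glu', 'Val', 'missense')

# One letter changes but the amino acid does not: not a missense variant.
print(classify_substitution("GAG", 2, "A"))  # ('GAA', 'Glu', 'Glu', 'synonymous')
```

The second call shows why not every single-letter change counts as a missense variant: some substitutions leave the encoded amino acid unchanged.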

In the Google DeepMind study, AlphaMissense predicted that 57% of the 71 million variants are “likely benign,” 32% are “likely pathogenic,” and 11% are “uncertain.”

The AlphaMissense model is adapted from an earlier model called AlphaFold, which uses amino acid sequences to predict the structure of proteins.

“AlphaMissense was fed data on DNA from humans and closely related primates to learn which missense mutations are common, and therefore probably benign, and which are rare and potentially harmful,” The Guardian reported. “At the same time, the program familiarized itself with the ‘language’ of proteins by studying millions of protein sequences and learning what a ‘healthy’ protein looks like.”

The model assigned each variant a score between 0 and 1 to rate the likelihood of pathogenicity [the potential to cause disease]. “The continuous score allows users to choose a threshold for classifying variants as pathogenic or benign that matches their accuracy requirements,” Avsec and Cheng wrote in their blog post.

However, they also acknowledged that the score doesn’t indicate exactly how the variation causes disease.
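How a user might turn the continuous 0-to-1 score into labels can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function name, the variant scores, and the cutoff values of 0.3 and 0.7 are placeholders, not the thresholds DeepMind used or recommends.

```python
def classify_score(score: float,
                   benign_max: float = 0.3,
                   pathogenic_min: float = 0.7) -> str:
    """Map a pathogenicity score in [0, 1] to a label.

    The default cutoffs are illustrative placeholders only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < benign_max:
        return "likely benign"
    if score > pathogenic_min:
        return "likely pathogenic"
    return "uncertain"


# Hypothetical scores, invented for this example.
scores = [0.05, 0.28, 0.50, 0.72, 0.95]

# Default (placeholder) cutoffs.
print([classify_score(s) for s in scores])
# ['likely benign', 'likely benign', 'uncertain',
#  'likely pathogenic', 'likely pathogenic']

# Stricter cutoffs: fewer calls are made, and more variants stay uncertain.
print([classify_score(s, benign_max=0.1, pathogenic_min=0.9) for s in scores])
# ['likely benign', 'uncertain', 'uncertain', 'uncertain', 'likely pathogenic']
```

Moving the cutoffs apart trades coverage for confidence: a user who needs higher accuracy accepts a larger "uncertain" group.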

The engineers cautioned that the predictions in the catalog are not intended for clinical use. Instead, they “should be interpreted with other sources of evidence.” However, “this work has the potential to improve the diagnosis of rare genetic disorders, and help discover new disease-causing genes,” they noted.

Genomics England Sees a Helpful Tool

BBC noted that AlphaMissense has been tested by Genomics England, which works with the UK’s National Health Service. “The new tool is really bringing a new perspective to the data,” Ellen Thomas, PhD, Genomics England’s Deputy Chief Medical Officer, told the BBC. “It will help clinical scientists make sense of genetic data so that it is useful for patients and for their clinical teams.”

AlphaMissense is “a big step forward,” Ewan Birney, PhD, Deputy Director General of the European Molecular Biology Laboratory (EMBL), told the BBC. “It will help clinical researchers prioritize where to look to find areas that could cause disease.”

However, other experts who spoke with MIT Technology Review were less enthusiastic.

“DeepMind is being DeepMind,” Insilico Medicine founder/CEO Alex Zhavoronkov, PhD, told the MIT publication. “Amazing on PR and good work on AI.”

Heidi Rehm, PhD, co-director of the Program in Medical and Population Genetics at the Broad Institute, suggested that the DeepMind engineers overstated the certainty of the model’s predictions. She told the publication that she was “disappointed” that they labeled the variants as benign or pathogenic.

“The models are improving, but none are perfect, and they still don’t get you to pathogenic or not,” she said.

“Typically, experts don’t declare a mutation pathogenic until they have real-world data from patients, evidence of inheritance patterns in families, and lab tests—information that’s shared through public websites of variants such as ClinVar,” the MIT article noted.

Is AlphaMissense a Biosecurity Risk?

Although DeepMind has released its catalog of variations, MIT Technology Review notes that the lab isn’t releasing the entire AI model due to what it describes as a “biosecurity risk.”

The concern is that “bad actors” could try using it on non-human species, DeepMind said. But one anonymous expert described the restrictions “as a transparent effort to stop others from quickly deploying the model for their own uses,” the MIT article noted.

And so, genetics research takes a huge step forward thanks to Google DeepMind, artificial intelligence, and deep learning. Clinical laboratories and pathologists may soon have useful new tools that help healthcare providers diagnose diseases. Time will tell. But the developments are certainly worth watching.

—Stephen Beale

Related Information:

AlphaFold Is Accelerating Research in Nearly Every Field of Biology

A Catalogue of Genetic Mutations to Help Pinpoint the Cause of Diseases

Accurate Proteome-wide Missense Variant Effect Prediction with AlphaMissense

Google DeepMind AI Speeds Up Search for Disease Genes

DeepMind Is Using AI to Pinpoint the Causes of Genetic Disease

DeepMind’s New AI Can Predict Genetic Diseases
