News, Analysis, Trends, Management Innovations for
Clinical Laboratories and Pathology Groups

Hosted by Robert Michel

Cambridge Researchers in UK Develop ‘Unknome Database’ That Ranks Proteins by How Little is Known about Their Functions

Scientists believe useful new clinical laboratory assays could be developed by better understanding the huge number of ‘poorly researched’ genes and the proteins they build

Researchers have added a new “-ome” to the long list of -omes: the “unknome.” This is significant for clinical laboratory managers because it is part of an investigative effort to better understand the substantial number of understudied genes, and the proteins they build, whose full function remains largely unknown.

Scientists at the Medical Research Council Laboratory of Molecular Biology (MRC-LMB) in Cambridge, England, believe these genes are important. They have created a database of thousands of unknown (or “unknome,” as they cleverly dubbed them) proteins and genes that have been “poorly understood” and are “unjustifiably neglected,” according to a paper the scientists published in the journal PLOS Biology titled “Functional Unknomics: Systematic Screening of Conserved Genes of Unknown Function.”

The Unknome Database includes “thousands of understudied proteins encoded by genes in the human genome, whose existence is known but whose functions are mostly not,” according to a news release.

The database, which is available to the public and which can be customized by the user, “ranks proteins based on how little is known about them,” the PLOS Biology paper notes.

It should be of interest to pathologists and clinical laboratory scientists. The fruit of this research may identify additional biomarkers useful in diagnosis and for guiding decisions on how to treat patients.

“These uncharacterized genes have not deserved their neglect,” said Sean Munro, PhD (above), MRC Laboratory of Molecular Biology in Cambridge, England, in a press release. “Our database provides a powerful, versatile and efficient platform to identify and select important genes of unknown function for analysis, thereby accelerating the closure of the gap in biological knowledge that the unknome represents.” Clinical laboratory scientists may find the Unknome Database intriguing and useful. (Photo copyright: Royal Society.)

Risk of Ignoring Understudied Proteins

Proteomics (the study of proteins) is a rapidly advancing area of clinical laboratory testing. As genetic scientists learn more about proteins and their functions, diagnostics companies use that information to develop new assays. But did you know that researchers tend to focus on only a small fraction of the total number of protein-coding DNA sequences contained in the human genome?

Proteomics is primarily concerned with the part of the genome that “contains instructions for building proteins … [which] are essential for development, growth, and reproduction across the entire body,” according to Scientific American. These are all protein-coding genes.

Researchers in proteomics estimate that the human body contains more than two million proteins, which are coded for by 20,000 to 25,000 genes, according to All the Science.

To build their database, the MRC researchers ranked the “unknome” proteins by how little is known about their functions in cellular processes. When they tested the database, they found that some of these less-researched proteins are important to biological functions such as development and stress resistance.

“The role of thousands of human proteins remains unclear and yet research tends to focus on those that are already well understood,” said Sean Munro, PhD, MRC Laboratory of Molecular Biology in Cambridge, England, in the news release. “To help address this we created an Unknome database that ranks proteins based on how little is known about them, and then performed functional screens on a selection of these mystery proteins to demonstrate how ignorance can drive biological discovery.”

Munro created the Unknome Database along with Matthew Freeman, PhD, Head of England’s Sir William Dunn School of Pathology, University of Oxford.

In the paper, they acknowledged the human genome encodes about 20,000 proteins, and that the application of transcriptomics and proteomics has “confirmed that most of these new proteins are expressed, and the function of many of them has been identified.

“However,” the authors added, “despite over 20 years of extensive effort, there are also many others that still have no known function.”

They also recognized that limited research resources and a preference for “relative safety” and “well-established fields” are likely holding back discoveries.

The researchers note “significant” risks in continuing to ignore unexplored proteins, which may have roles in cell processes, be associated with diseases, and serve as targets for therapies, with some being “eminently druggable,” Genetic Engineering News reported.

Setting up the Unknome Database

To develop the Unknome Database, the researchers first turned to what is already known. They gave each protein in the human genome a “knownness” score based on a review of existing information about “function, conservation across species, subcellular localization, and other factors,” Interesting Engineering reported.

It turns out that 3,000 groups of proteins (805 of which include a human protein) scored zero, “showing there’s still much to learn within the human genome,” Science News stated, adding that the Unknome Database catalogues more than 13,000 protein groups and nearly two million proteins.

The researchers then tested the database by using it to determine what could be learned about 260 “mystery” genes in humans that are also present in Drosophila (small fruit flies).

“We used the Unknome Database to select 260 genes that appeared both highly conserved and particularly poorly understood, and then applied functional assays in whole animals that would be impractical at genome-wide scale,” the researchers wrote in PLOS Biology.

“We initially selected all genes that had a knownness score of ≤1.0 and are conserved in both humans and flies, as well as being present in at least 80% of available metazoan genome sequences. … After testing for viability, the nonessential genes were then screened with a panel of quantitative assays designed to reveal potential roles in a wide range of biological functions,” they added.

“Our screen in whole organisms reveals that, despite several decades of extensive genetic screens in Drosophila, there are many genes with essential roles that have eluded characterization,” the researchers conclude.
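The selection criteria quoted above can be sketched as a simple filter. This is only a minimal illustration, not the researchers' actual pipeline: the `ProteinCluster` record, its field names, the `select_candidates` function, and the sample values are all hypothetical. Only the thresholds (a knownness score of 1.0 or less, conservation in both humans and flies, and presence in at least 80% of metazoan genomes) come from the quoted passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinCluster:
    """Hypothetical record for one protein cluster (field names are assumptions)."""
    cluster_id: str
    knownness: float          # knownness score; lower means less is known
    in_human: bool            # cluster contains a human protein
    in_fly: bool              # cluster contains a Drosophila protein
    metazoan_fraction: float  # share of metazoan genomes with a member, 0.0 to 1.0


def select_candidates(clusters, max_knownness=1.0, min_metazoan_fraction=0.8):
    """Keep clusters that meet the quoted thresholds, least-known first."""
    selected = [
        c for c in clusters
        if c.knownness <= max_knownness
        and c.in_human
        and c.in_fly
        and c.metazoan_fraction >= min_metazoan_fraction
    ]
    return sorted(selected, key=lambda c: c.knownness)


# Made-up example data, for illustration only.
EXAMPLE_CLUSTERS = [
    ProteinCluster("A", 0.0, True, True, 0.95),   # passes every criterion
    ProteinCluster("B", 1.0, True, True, 0.80),   # passes, exactly on both thresholds
    ProteinCluster("C", 3.5, True, True, 0.99),   # too well characterized
    ProteinCluster("D", 0.5, True, False, 0.90),  # not present in flies
    ProteinCluster("E", 0.2, True, True, 0.60),   # not widely conserved in metazoans
]

if __name__ == "__main__":
    for cluster in select_candidates(EXAMPLE_CLUSTERS):
        print(cluster.cluster_id, cluster.knownness)
```

Keeping the thresholds as function parameters mirrors the article's point that users can adjust the criteria to suit their own studies.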

Clinical Laboratory Testing Using the Unknome Database

Future use of the Unknome Database may involve CRISPR technology to explore functions of unknown genes, according to the PLOS Biology paper.

Munro told Science News the research team may work with other research efforts aimed at understanding “mysterious proteins,” such as the Understudied Proteins Initiative.

Because the Unknome Database can be customized, researchers can create their own “knownness” scores suited to their studies. Thus, the database could be a resource in studies of treatments or medications to fight diseases, Chemistry World noted.

According to a statement prepared for Healthcare Dive by SomaLogic, a Boulder, Colorado-based protein biomarker company, diagnostic tests that measure proteins can be applied to a range of diseases and conditions.

In a study published in Science Translational Medicine, SomaLogic’s SomaScan assay was reportedly successful in predicting the four-year likelihood of myocardial infarction, heart failure, stroke, and even death.

“The 27-protein model has potential as a ‘universal’ surrogate end point for cardiovascular risk,” the researchers wrote in Science Translational Medicine.

Proteomics definitely has its place in clinical laboratory testing. The development of MRC-LMB’s Unknome Database will help researchers increase their knowledge about the functions of more proteins, which should in turn lead to new diagnostic assays for labs.

—Donna Marie Pocius

Related Information:

Mapping the ‘Unknome’ May Reveal Critical Genes Scientists Have Ignored

How Many Proteins Exist?

Unknome: A Database of Human Genes We Know Almost Nothing About

Functional Unknomics: Systematic Screening of Conserved Genes of Unknown Function

Unknome Database Ranks Proteins Based on How Little is Known about Them

How a New Database of Human Genes Can Help Discover New Biology

The Unknome Catalogs Nearly Two Million Proteins. Many are Mysterious

Into the Unknome: Scientists at MRC LMB in Cambridge Create Database Ranking Human Proteins by How Little We know About Them

Scientists Hope to Illuminate Unknown Human Proteins with New Public Database

Proteomic Tests Empower Precision Medicine

A Proteomic Surrogate for Cardiovascular Outcomes That is Sensitive to Multiple Mechanisms of Change in Risk

European Study Links Genes Inherited from Neanderthals to Higher Risk for Severe COVID-19 Infections in Today’s Humans

About 50% of South Asians and 16% of Europeans carry gene cluster associated with respiratory failure after SARS-CoV-2 infection and hospitalization

Clinical pathology laboratories and medical laboratory scientists may be intrigued to learn that scientists from two research institutes in Germany and Sweden have determined that a strand of DNA associated with a higher risk of severe COVID-19 in humans is similar to the corresponding DNA sequences of a roughly 50,000-year-old Neanderthal from Croatia.

The researchers concluded that this gene cluster, passed down from Neanderthals to Homo sapiens, triples the risk of developing severe COVID-19 respiratory symptoms for some modern-day humans.

The study, published in the journal Nature, was authored by Svante Pääbo, PhD, Director of the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, and Hugo Zeberg, MD, PhD, an Assistant Professor in the Department of Neuroscience at the Karolinska Institute, in Stockholm, Sweden, and research scientist at the Max Planck Institute for Evolutionary Anthropology.

In a press release, Pääbo said, “It is striking that the genetic heritage from the Neanderthals has such tragic consequences during the current pandemic. Why this is must now be investigated as quickly as possible.”

Might Useful Biomarkers for Clinical Laboratory Tests Be Identified?

Though it is not immediately clear how these findings may alter current approaches to developing treatments and a vaccine for the SARS-CoV-2 coronavirus, it is another example of how increased knowledge of human DNA leads to new understandings about genetic sequences that can spur development of useful biomarkers for clinical laboratory diagnostics tests.

Swedish geneticist Svante Pääbo, PhD (above right), Director of the Max Planck Institute for Evolutionary Anthropology in Germany, is co-author of a recent study that traced a gene cluster linked to a higher risk of severe COVID-19 to 50,000-year-old Neanderthals from Croatia. “It is striking that the genetic heritage from the Neanderthals has such tragic consequences during the current pandemic,” he said. Nevertheless, such discoveries sometimes lead to new biomarkers for clinical laboratory tests and diagnostics. (Photo copyright: Max Planck Institute for Evolutionary Anthropology.)

This latest research reveals that people who inherit a specific six-gene combination on chromosome 3—called a haplotype—are three times more likely to need artificial ventilation if they are infected by the SARS-CoV-2 coronavirus. Yet, the researchers can only speculate as to why the gene cluster confers a higher risk.

“The genes in this region may well have protected the Neanderthals against some other infectious diseases that are not around today. And now, when we are faced with the [SARS-CoV-2] coronavirus, these Neanderthal genes have these tragic consequences,” Pääbo told the Guardian.

According to the study, the Neanderthal risk variant is most common in South Asia, where about half of the population carries it. In comparison, about one in six Europeans has inherited the gene sequence, and the trait is almost nonexistent in Africa and East Asia.

“About 63% of people in Bangladesh have at least one copy of the disease-associated haplotype, and 13% have two copies (one from their mother and one from their father). For them, the Neandertal DNA might be partially responsible for increased mortality from a coronavirus infection. People of Bangladeshi origin living in the United Kingdom, for instance, are twice as likely to die of COVID-19 as the general population,” Science News reported.

Other Research Connecting Genes to Severe COVID-19 Symptoms

The haplotype on chromosome 3 first made headlines in June 2020, when the New England Journal of Medicine (NEJM) published the “Genomewide Association Study of Severe COVID-19 with Respiratory Failure,” which analyzed COVID-19 patients in seven hospitals in Italy and Spain. The researchers found an association between the gene cluster on chromosome 3 and severe COVID-19 symptoms after SARS-CoV-2 infection and hospitalization. The study also pointed to the potential involvement of chromosome 9, which contains the ABO blood-group system gene, indicating that humans with type A blood may have a 45% higher risk of developing severe COVID-19 infections.

However, Mark Maslin, PhD, Professor of Climatology at University College London, cautions against drawing strong conclusions from the initial research tying disease risk to the genetic legacy of Neanderthals, the Guardian reported. He suggested that, while the Neanderthal-derived variant may contribute to COVID-19 risk in certain populations, genes are more likely to be just one of multiple risk factors for COVID-19 that include age, gender, and pre-existing conditions.

“COVID-19 is a complex disease, the severity of which has been linked to age, gender, ethnicity, obesity, health, virus load among other things,” Maslin told the Guardian. “This paper links genes inherited from Neanderthals with a higher risk of COVID-19 hospitalization and severe complications. But as COVID-19 spreads around the world it is clear that lots of different populations are being severely affected, many of which do not have any Neanderthal genes.

“We must avoid simplifying the causes and impact of COVID-19, as ultimately a person’s response to the disease is about contact and then the body’s immunity response, which is influenced by many environmental, health and genetic factors.”

Andre Franke, PhD, Director of the Institute of Clinical Molecular Biology, Kiel University in Germany, agrees with Maslin, the Associated Press reported. In a statement “ahead of the study’s final publication,” he said these latest findings have no immediate impact on the treatment of COVID-19, and he questioned “why that haplotype—unlike most Neanderthal genes—survived until today,” AP reported.

All of this deepens the mystery of the SARS-CoV-2 coronavirus. Genomics research continues to add new insights into what is known about COVID-19 and may ultimately provide answers on why some people contract the disease and remain asymptomatic—or have mild symptoms—while others become seriously ill or die. Understanding why and how certain genes increase the risk of severe COVID-19 could give rise to targeted clinical laboratory tests and therapies to fight the disease.

—Andrea Downing Peck

Related Information:

The Major Genetic Risk Factor for Severe COVID-19 Is Inherited from Neanderthals

Genomewide Association Study of Severe COVID-19 with Respiratory Failure

Neanderthal Genes Increase Risk of Serious COVID-19, Study Claims

Neandertal Gene Variant Increases Risk of Severe COVID-19

Study: Neanderthal Genes May Be a Liability for COVID Patients

Neanderthal Genes in People Today May Raise Risk of Severe COVID-19

COVID-19 Hospitalization and Death by Race/Ethnicity