News, Analysis, Trends, Management Innovations for
Clinical Laboratories and Pathology Groups

Hosted by Robert Michel


Cambridge Researchers in UK Develop ‘Unknome Database’ That Ranks Proteins by How Little is Known about Their Functions

Scientists believe useful new clinical laboratory assays could be developed by better understanding the huge number of ‘poorly researched’ genes and the proteins they build

Researchers have added a new “-ome” to the long list of -omes. The new -ome is the “unknome.” This is significant for clinical laboratory managers because it is part of an investigative effort to better understand the substantial number of understudied genes, and the proteins they build, whose full functions remain largely unknown.

Scientists at the Medical Research Council Laboratory of Molecular Biology (MRC-LMB) in Cambridge, England, believe these genes are important. They have created a database of thousands of unknown proteins and genes (the “unknome,” as they cleverly dubbed them) that are “poorly understood” and “unjustifiably neglected,” according to a paper the scientists published in the journal PLOS Biology titled, “Functional Unknomics: Systematic Screening of Conserved Genes of Unknown Function.”

The Unknome Database includes “thousands of understudied proteins encoded by genes in the human genome, whose existence is known but whose functions are mostly not,” according to a news release.

The database, which is available to the public and which can be customized by the user, “ranks proteins based on how little is known about them,” the PLOS Biology paper notes.

It should be of interest to pathologists and clinical laboratory scientists. The fruit of this research may identify additional biomarkers useful in diagnosis and for guiding decisions on how to treat patients.


“These uncharacterized genes have not deserved their neglect,” said Sean Munro, PhD (above), MRC Laboratory of Molecular Biology in Cambridge, England, in a press release. “Our database provides a powerful, versatile and efficient platform to identify and select important genes of unknown function for analysis, thereby accelerating the closure of the gap in biological knowledge that the unknome represents.” Clinical laboratory scientists may find the Unknome Database intriguing and useful. (Photo copyright: Royal Society.)

Risk of Ignoring Understudied Proteins

Proteomics (the study of proteins) is a rapidly advancing area of clinical laboratory testing. As genetic scientists learn more about proteins and their functions, diagnostics companies use that information to develop new assays. But did you know that researchers tend to focus on only a small fraction of the total number of protein-coding DNA sequences contained in the human genome?

Proteomics is primarily concerned with the part of the genome that “contains instructions for building proteins … [which] are essential for development, growth, and reproduction across the entire body,” according to Scientific American. These are all protein-coding genes.

Proteomics researchers estimate that there are more than two million proteins in the human body, which are coded for by 20,000 to 25,000 genes, according to All the Science.

To build their database, the MRC researchers ranked the “unknome” proteins by how little is known about their functions in cellular processes. When they tested the database, they found that some of these less-researched proteins are important to biological functions such as development and stress resistance.

“The role of thousands of human proteins remains unclear and yet research tends to focus on those that are already well understood,” said Sean Munro, PhD, MRC Laboratory of Molecular Biology in Cambridge, England, in the news release. “To help address this we created an Unknome database that ranks proteins based on how little is known about them, and then performed functional screens on a selection of these mystery proteins to demonstrate how ignorance can drive biological discovery.”

Munro created the Unknome Database along with Matthew Freeman, PhD, Head of England’s Sir William Dunn School of Pathology, University of Oxford.

In the paper, they acknowledged the human genome encodes about 20,000 proteins, and that the application of transcriptomics and proteomics has “confirmed that most of these new proteins are expressed, and the function of many of them has been identified.

“However,” the authors added, “despite over 20 years of extensive effort, there are also many others that still have no known function.”

They also recognized that limited resources for research and a preference for “relative safety” and “well-established fields” are likely holding back discoveries.

The researchers note “significant” risks to continually ignoring unexplored proteins, which may have roles in cell processes, serve as targets for therapies, and be associated with diseases as well as being “eminently druggable,” Genetic Engineering News reported.

Setting up the Unknome Database

To develop the Unknome Database, the researchers first turned to what is already known. They gave each protein in the human genome a “knownness” score based on a review of existing information about “function, conservation across species, subcellular localization, and other factors,” Interesting Engineering reported.

It turns out, 3,000 groups of proteins (805 with a human protein) scored zero, “showing there’s still much to learn within the human genome,” Science News stated, adding that the Unknome Database catalogues more than 13,000 protein groups and nearly two million proteins. 

The researchers then tested the database by using it to determine what could be learned about 260 “mystery” genes in humans that are also present in Drosophila (small fruit flies).

“We used the Unknome Database to select 260 genes that appeared both highly conserved and particularly poorly understood, and then applied functional assays in whole animals that would be impractical at genome-wide scale,” the researchers wrote in PLOS Biology.

“We initially selected all genes that had a knownness score of ≤1.0 and are conserved in both humans and flies, as well as being present in at least 80% of available metazoan genome sequences. … After testing for viability, the nonessential genes were then screened with a panel of quantitative assays designed to reveal potential roles in a wide range of biological functions,” they added.

“Our screen in whole organisms reveals that, despite several decades of extensive genetic screens in Drosophila, there are many genes with essential roles that have eluded characterization,” the researchers conclude.
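The selection step the researchers describe (a knownness score of 1.0 or less, conservation in both humans and flies, and presence in at least 80% of available metazoan genomes) can be illustrated with a minimal sketch. This is not the Unknome Database's actual code or data model: the `ProteinGroup` record, its field names, the function names, and the toy values below are all hypothetical and exist only to show how such a filter and ranking might be expressed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinGroup:
    """Hypothetical record for one protein group; not the database's real schema."""
    name: str                 # made-up identifier
    knownness: float          # lower means less is known
    in_human: bool            # group contains a human protein
    in_fly: bool              # group contains a Drosophila protein
    metazoan_fraction: float  # share of metazoan genomes containing it (0 to 1)


def is_screen_candidate(group, max_knownness=1.0, min_metazoan_fraction=0.8):
    """Apply the three criteria quoted from the PLOS Biology paper."""
    return (
        group.knownness <= max_knownness
        and group.in_human
        and group.in_fly
        and group.metazoan_fraction >= min_metazoan_fraction
    )


def select_candidates(groups):
    """Keep qualifying groups and list the least-known ones first."""
    candidates = [g for g in groups if is_screen_candidate(g)]
    return sorted(candidates, key=lambda g: g.knownness)


# Toy data, invented for illustration only.
toy_groups = [
    ProteinGroup("group_a", 0.0, True, True, 0.95),
    ProteinGroup("group_b", 1.0, True, True, 0.80),
    ProteinGroup("group_c", 0.5, True, False, 0.90),  # absent in flies
    ProteinGroup("group_d", 7.5, True, True, 0.99),   # already well studied
    ProteinGroup("group_e", 0.5, True, True, 0.60),   # not widely conserved
]

selected_names = [g.name for g in select_candidates(toy_groups)]
print(selected_names)
```

In this toy set only the first two groups pass all three criteria. Because the thresholds are ordinary function parameters, a user could loosen or tighten them, which loosely mirrors the customizable ranking the article describes.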

Clinical Laboratory Testing Using the Unknome Database

Future use of the Unknome Database may involve CRISPR technology to explore functions of unknown genes, according to the PLOS Biology paper.

Munro told Science News the research team may work with other research efforts aimed at understanding “mysterious proteins,” such as the Understudied Proteins Initiative.

Because the Unknome Database can be customized by others, researchers can create their own “knownness” scores as they apply to their studies. Thus, the database could be a resource in studies of treatments or medications to fight diseases, Chemistry World noted.

According to a statement prepared for Healthcare Dive by SomaLogic, a Boulder, Colorado-based protein biomarker company, diagnostic tests that measure proteins can be applied to a range of diseases and conditions.

In a study published in Science Translational Medicine, SomaLogic’s SomaScan assay was reportedly successful in predicting the likelihood within four years of myocardial infarction, heart failure, stroke, and even death.

“The 27-protein model has potential as a ‘universal’ surrogate end point for cardiovascular risk,” the researchers wrote in Science Translational Medicine.

Proteomics definitely has its place in clinical laboratory testing. The development of MRC-LMB’s Unknome Database will help researchers increase their knowledge about the functions of more proteins, which should in turn lead to new diagnostic assays for labs.

—Donna Marie Pocius

Related Information:

Mapping the ‘Unknome’ May Reveal Critical Genes Scientists Have Ignored

How Many Proteins Exist?

Unknome: A Database of Human Genes We Know Almost Nothing About

Functional Unknomics: Systematic Screening of Conserved Genes of Unknown Function

Unknome Database Ranks Proteins Based on How Little is Known about Them

How a New Database of Human Genes Can Help Discover New Biology

The Unknome Catalogs Nearly Two Million Proteins. Many are Mysterious

Into the Unknome: Scientists at MRC LMB in Cambridge Create Database Ranking Human Proteins by How Little We know About Them

Scientists Hope to Illuminate Unknown Human Proteins with New Public Database

Proteomic Tests Empower Precision Medicine

A Proteomic Surrogate for Cardiovascular Outcomes That is Sensitive to Multiple Mechanisms of Change in Risk

Scientists in United Kingdom Manipulate DNA to Create a Synthetic Bacteria That Could Be Immune to Infections

Use of synthetic genetics to replicate an infectious disease agent is a scientific accomplishment that many microbiologists and clinical laboratory managers expected would happen

Microbiologists and infectious disease doctors are quite familiar with Escherichia coli (E. coli). The bacterium has caused much human sickness and even death around the globe, and its antibiotic-resistant strains are becoming increasingly difficult to eradicate.

Now, scientists in England have created a synthetic “recoded” version of E. coli bacteria that is being used in a positive way—to fight disease. Their discovery is being heralded as an important breakthrough in the quest to custom-alter DNA to create synthetic forms of life that one day could be designed to fight specific infections, create new drugs, or produce tools to diagnose or treat disease.

Scientists worldwide working in the field of synthetic genomics are looking for ways to modify genomes in order to produce new weapons against infection and disease. This research could eventually produce methods for doctors—after diagnosing a patient’s specific strain of bacteria—to then use custom-altered DNA as an effective weapon against that patient’s specific bacterial infection.

This latest milestone is the result of a five-year quest by researchers at the Medical Research Council Laboratory of Molecular Biology (MRC-LMB) in Cambridge, England, to create a man-made version of the intestinal bacteria by redesigning its four-million-base-pair genetic code.

The MRC-LMB lab’s success marks the first time a living organism has been created with a compressed genetic code.

The researchers published their findings in the journal Nature.

Synthetic Genomics and Clinical Laboratories

Benjamin A. Blount, PhD, a postdoctoral research associate at Imperial College London, and Tom Ellis, PhD, Professor in Synthetic Genome Engineering at Imperial College London, praised the MRC-LMB team’s accomplishment in a subsequent Nature article.

“This is a landmark in the emerging field of synthetic genomics and finally applies the technology to the laboratory’s workhorse bacterium,” they wrote. “Synthetic genomics offers a new way of life, while at the same time moving synthetic biology towards a future in which genomes can be written to design.”

All known forms of life on Earth use 64 codons, each a specific sequence of three consecutive nucleotides that corresponds to a specific amino acid or stop signal during protein synthesis. Jason Chin, PhD, Program Lead at MRC-LMB, said biologists have long questioned why 20 amino acids are encoded by 64 codons.

“Is there any function to having more than one codon to encode each amino acid?” Chin asked during an interview with the Cambridge Independent. “What would happen if you made an organism that used a reduced set of codons?”
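The redundancy behind Chin's question can be counted directly. The short sketch below builds the standard genetic code table (using the conventional T, C, A, G ordering found in public genetic code listings) and tallies how many codons map to each amino acid. The variable names are illustrative, and the table is the textbook standard code rather than anything taken from the MRC-LMB paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order
# for each of the three codon positions ("*" marks a stop signal).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_residue = Counter(CODON_TABLE.values())

n_codons = len(CODON_TABLE)                              # 4 ** 3 triplets
n_amino_acids = len(set(CODON_TABLE.values()) - {"*"})   # distinct amino acids
n_stop_codons = codons_per_residue["*"]
n_sense_codons = n_codons - n_stop_codons

print(n_codons, n_amino_acids, n_stop_codons, n_sense_codons)
print(codons_per_residue.most_common())
```

Four bases taken three at a time give 64 possible codons, of which three are stop signals and 61 encode the 20 amino acids. Some amino acids, such as serine, have six synonymous codons, while methionine and tryptophan have only one each.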

The MRC-LMB research team took an important step toward answering that question. Their synthetic E. coli strain, dubbed Syn61, was recoded through “genome-wide substitution of target codons by defined synonyms.” To do so, researchers mastered a new piece-by-piece technique that enabled them to recode 18,214 codons to create an organism with a 61-codon genome that functions without a previously essential transfer RNA.

“Our synthetic genome implements a defined recoding and refactoring scheme–with simple corrections at just seven positions–to replace every known occurrence of two sense codons and a stop codon in the genome,” lead author Julius Fredens, PhD, a post-doctoral research associate at MRC, and colleagues, wrote in their paper.
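To make the idea of “substitution of target codons by defined synonyms” concrete, here is a toy sketch that recodes a short in-frame sequence and checks that the encoded protein is unchanged. The article does not name the codons involved; the mapping used here (TCG to AGC, TCA to AGT, and the stop codon TAG to TAA) is an assumption based on the scheme commonly reported for Syn61 and should be checked against the Nature paper. The function names and the toy gene are invented for illustration.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# ASSUMPTION: these three substitutions are not given in the article.
RECODING = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}


def split_codons(sequence):
    """Split an in-frame coding sequence into three-letter codons."""
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


def translate(sequence):
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(sequence))


def recode(sequence):
    """Swap each target codon for its defined synonym; leave others alone."""
    return "".join(RECODING.get(codon, codon) for codon in split_codons(sequence))


toy_gene = "ATGTCGTCAAAATAG"  # invented example: Met-Ser-Ser-Lys-stop
recoded_gene = recode(toy_gene)

print(recoded_gene)
print(translate(toy_gene), translate(recoded_gene))
```

The recoded sequence no longer contains any of the three target codons, yet it translates to the same protein. A real genome is far harder to recode than this sketch suggests: it ignores overlapping genes and regulatory sequences, which is presumably where the “refactoring” and the handful of corrections mentioned by the authors come in.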

Science Alert reports that the laboratory-created version of E. coli (above) “isn’t quite a dead ringer for its ancestor. The cells are a touch longer, and they reproduce 1.6 times slower. But the edited E. coli seems healthy and produces the same range and quantity of proteins as the non-edited versions.” (Photo copyright: Jason Chin/STAT.)

Joshua Atkinson, PhD, a postdoctoral research associate at Rice University in Houston, labeled the breakthrough a “tour de force” in the field of synthetic genomics. “This achievement sets a new world record in synthetic genomics by yielding a genome that is four times larger than the pioneering synthesis of the one-million-base-pair Mycoplasma mycoides genome,” he stated in Synthetic Biology.

“Synthetic genomics is enabling the simplification of recoded organisms; the previous study minimized the total number of genes and this new study simplified the way those genes are encoded.”

Manmade Bacteria That are Immune to Infections

Researchers from the J. Craig Venter Institute in Rockville, Maryland, created the first synthetic genome in 2010. According to an article in Nature, the Venter Institute successfully synthesized the Mycoplasma mycoides genome and used it to “reboot” a cell from a different species of bacterium.

The MRC-LMB team’s success may prove more significant.

“This new synthetic E. coli should not be able to decode DNA from any other organism and therefore it should not be possible to infect it with a virus,” the MRC-LMB stated in a news release heralding the lab’s breakthrough. “With E. coli already being an important workhorse of biotechnology and biological research, this study is the first time any commonly used model organism has had its genome designed and fully synthesized and this synthetic version could become an important resource for future development of new types of molecules.”

Because the MRC-LMB team was able to remove transfer RNA and release factors that decode three codons from the E. coli bacteria, their achievement may be the springboard to designing manmade bacteria that are immune to infections or could be turned into new drugs.

“This may enable these codons to be cleanly reassigned and facilitate the incorporation of multiple non-canonical amino acids. This greatly expands the scope of using non-canonical amino acids as unique tools for biological research,” the MRC-LMB news release added.

Though synthetic genomics’ impact on clinical laboratory diagnostics is not yet known, medical laboratory leaders should be mindful of the potential for rapid innovation in this field as proof-of-concept laboratory innovations are translated into real-world applications.

—Andrea Downing Peck

Related Information:

Scientists Redesigned an Entire Genome to Create the Most Synthetic Life Form Yet

World’s First Synthetic Organism with Fully Recoded DNA Is Created at MRC LMB in Cambridge

Creating an Entire Bacterial Genome with a Compressed Genetic Code

Total Synthesis of Escherichia Coli with a Recoded Genome

Construction of an Escherichia Coli Genome with Fewer Codons Sets Records

Life Simplified: Recompiling a Bacterial Genome for Synonymous Codon Compression

Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome

Cambridge University Researchers Recode E. Coli DNA to Create Living, Reproducing Bacteria with Entirely Synthetic DNA
