Scientists believe useful new clinical laboratory assays could be developed by better understanding the huge number of ‘poorly researched’ genes and the proteins they build
Researchers have added a new “-ome” to the long list of -omes. The new -ome is the “unknome.” This is significant for clinical laboratory managers because it is part of an investigative effort to better understand the substantial number of genes, and the proteins they build, that have been understudied and whose full function is largely unknown.
The Unknome Database includes “thousands of understudied proteins encoded by genes in the human genome, whose existence is known but whose functions are mostly not,” according to a news release.
The database, which is available to the public and which can be customized by the user, “ranks proteins based on how little is known about them,” the PLOS Biology paper notes.
It should be of interest to pathologists and clinical laboratory scientists. The fruit of this research may identify additional biomarkers useful in diagnosis and for guiding decisions on how to treat patients.
“These uncharacterized genes have not deserved their neglect,” said Sean Munro, PhD (above), MRC Laboratory of Molecular Biology in Cambridge, England, in a press release. “Our database provides a powerful, versatile and efficient platform to identify and select important genes of unknown function for analysis, thereby accelerating the closure of the gap in biological knowledge that the unknome represents.” Clinical laboratory scientists may find the Unknome Database intriguing and useful. (Photo copyright: Royal Society.)
Risk of Ignoring Understudied Proteins
Proteomics (the study of proteins) is a rapidly advancing area of clinical laboratory testing. As genetic scientists learn more about proteins and their functions, diagnostics companies use that information to develop new assays. But did you know that researchers tend to focus on only a small fraction of the total number of protein-coding DNA sequences contained in the human genome?
Proteomics is primarily concerned with the part of the genome that “contains instructions for building proteins … [which] are essential for development, growth, and reproduction across the entire body,” according to Scientific American. These are all protein-coding genes.
Researchers in proteomics estimate that there are more than two million proteins in the human body, coded for by 20,000 to 25,000 genes, according to All the Science.
To build their database, the MRC researchers ranked the “unknome” proteins by how little is known about their functions in cellular processes. When they tested the database, they found some of these less-researched proteins important to biological functions such as development and stress resistance.
“The role of thousands of human proteins remains unclear and yet research tends to focus on those that are already well understood,” said Sean Munro, PhD, MRC Laboratory of Molecular Biology in Cambridge, England, in the news release. “To help address this we created an Unknome database that ranks proteins based on how little is known about them, and then performed functional screens on a selection of these mystery proteins to demonstrate how ignorance can drive biological discovery.”
In the paper, they acknowledged the human genome encodes about 20,000 proteins, and that the application of transcriptomics and proteomics has “confirmed that most of these new proteins are expressed, and the function of many of them has been identified.
“However,” the authors added, “despite over 20 years of extensive effort, there are also many others that still have no known function.”
They also recognized that limited research resources and a preference for “relative safety” and “well-established fields” are likely holding back discoveries.
The researchers note “significant” risks to continually ignoring unexplored proteins, which may have roles in cell processes, serve as targets for therapies, and be associated with diseases as well as being “eminently druggable,” Genetic Engineering News reported.
Setting up the Unknome Database
To develop the Unknome Database, the researchers first turned to what is already known. They gave each protein in the human genome a “knownness” score based on review of existing information about “function, conservation across species, subcellular localization, and other factors,” Interesting Engineering reported.
It turns out, 3,000 groups of proteins (805 with a human protein) scored zero, “showing there’s still much to learn within the human genome,” Science News stated, adding that the Unknome Database catalogues more than 13,000 protein groups and nearly two million proteins.
The researchers then tested the database by using it to determine what could be learned about 260 “mystery” genes in humans that are also present in Drosophila (small fruit flies).
“We used the Unknome Database to select 260 genes that appeared both highly conserved and particularly poorly understood, and then applied functional assays in whole animals that would be impractical at genome-wide scale,” the researchers wrote in PLOS Biology.
“We initially selected all genes that had a knownness score of ≤1.0 and are conserved in both humans and flies, as well as being present in at least 80% of available metazoan genome sequences. … After testing for viability, the nonessential genes were then screened with a panel of quantitative assays designed to reveal potential roles in a wide range of biological functions,” they added.
“Our screen in whole organisms reveals that, despite several decades of extensive genetic screens in Drosophila, there are many genes with essential roles that have eluded characterization,” the researchers conclude.
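The selection criteria the researchers describe (a knownness score of ≤1.0, conservation in both humans and flies, and presence in at least 80% of available metazoan genome sequences) amount to a simple filter over scored protein groups. The following is a minimal Python sketch of that idea only. The record layout, function name, and example values are assumptions made up for illustration; they are not the Unknome Database's actual schema, code, or data.

```python
from dataclasses import dataclass


@dataclass
class ProteinGroup:
    # Hypothetical record layout; not the Unknome Database's real schema.
    name: str
    knownness: float          # lower means less is known
    in_human: bool
    in_fly: bool
    metazoan_fraction: float  # share of metazoan genomes containing the group


def select_candidates(groups, max_knownness=1.0, min_metazoan_fraction=0.8):
    """Keep poorly understood, highly conserved groups, least-known first."""
    kept = [
        g for g in groups
        if g.knownness <= max_knownness
        and g.in_human
        and g.in_fly
        and g.metazoan_fraction >= min_metazoan_fraction
    ]
    return sorted(kept, key=lambda g: g.knownness)


# Made-up example values for illustration only.
example_groups = [
    ProteinGroup("group_a", 0.0, True, True, 0.95),
    ProteinGroup("group_b", 0.5, True, False, 0.90),
    ProteinGroup("group_c", 1.0, True, True, 0.82),
    ProteinGroup("group_d", 7.5, True, True, 0.99),
    ProteinGroup("group_e", 0.2, True, True, 0.40),
]

selected = [g.name for g in select_candidates(example_groups)]
print(selected)
```

Because the thresholds are ordinary parameters, a user could loosen or tighten them, which loosely mirrors the article's point that the database lets researchers customize how "knownness" is applied to their own studies.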
Clinical Laboratory Testing Using the Unknome Database
Future use of the Unknome Database may involve CRISPR technology to explore functions of unknown genes, according to the PLOS Biology paper.
Munro told Science News the research team may work with other research efforts aimed at understanding “mysterious proteins,” such as the Understudied Proteins Initiative.
The Unknome Database’s ability to be customized by others means researchers can create their own “knownness” scores as they apply to their studies. Thus, the database could be a resource in studies of treatments or medications to fight diseases, Chemistry World noted.
According to a statement prepared for Healthcare Dive by SomaLogic, a Boulder, Colorado-based protein biomarker company, diagnostic tests that measure proteins can be applied to diseases and conditions such as:
“The 27-protein model has potential as a ‘universal’ surrogate end point for cardiovascular risk,” the researchers wrote in Science Translational Medicine.
Proteomics definitely has its place in clinical laboratory testing. The development of MRC-LMB’s Unknome Database will help researchers increase their knowledge about the functions of more proteins, which should in turn lead to new diagnostic assays for labs.
Collected data could give healthcare providers and clinical laboratories a practical view of individuals’ oral microbiota and lead to new diagnostic assays
When people hear about microbiome research, they usually think of the study of gut bacteria, which Dark Daily has covered extensively. However, this type of research is now expanding to include more microbiomes within the human body, including the oral microbiome, the microbiota living in the human mouth.
One example is coming from Genefitletics, a biotech company based in New Delhi, India. It recently launched ORAHYG, the first and only (they claim) at-home oral microbiome functional activity test available in Asia. The company is targeting the direct-to-consumer (DTC) testing market.
According to the Genefitletics website, the ORAHYG test can decode the root causes of:
“Using oral microbial gene expression sequencing technology and its [machine learning] model, [Genefitletics] recently debuted its oral microbiome gene expression solution, which bridges the gap between dentistry and systemic inflammation,” ETHealthworld reported.
“The molecular insights from this test would give an unprecedented view of functions of the oral microbiome, their interaction with gut microbiome and impact on metabolic, cardiovascular, cognitive, skin, and autoimmune health,” BioSpectrum noted.
“Microbes, the planet Earth’s original inhabitants, have coevolved with humanity, carry out vital biological tasks inside the body, and fundamentally alter how we think about nutrition, medicine, cleanliness, and the environment,” Sushant Kumar (above), founder and CEO of Genefitletics, told the Economic Times. “This has sparked additional research over the past few years into the impact of the trillions of microorganisms that inhabit the human body on our health and diverted tons of funding into the microbiome field.” Clinical laboratories may eventually see an interest and demand for testing of the oral microbiome. (Photo copyright: ETHealthworld.)
Imbalanced Oral Microbiome Can Trigger Disease
The term microbiome refers to the tiny microorganisms that reside on and inside our bodies. A dense population of these microorganisms, including bacteria, fungi, yeast, viruses, and protozoa, lives in our mouths.
“Mouth is the second largest and second most diverse colonized site for microbiome with 770 species comprising 100 billion microbes residing there,” said Sushant Kumar, founder and CEO of Genefitletics, BioSpectrum reported. “Each place inside the mouth right from tongue, throat, saliva, and upper surface of mouth have a distinctive and unique microbiome ecosystem. An imbalanced oral microbiome is said to trigger onset and progression of type 2 diabetes, arthritis, heart diseases, and even dementia.”
The direct-to-consumer ORAHYG test uses a saliva sample taken either by a healthcare professional or an individual at home. That sample is then sequenced through Genefitletics’ gene sequencing platform and the resulting biological data set is fed into an informatics algorithm.
Genefitletics’ machine-learning platform next converts that information into a pre-symptomatic molecular signature that can predict whether an individual will develop a certain disease. Genefitletics then provides that person with therapeutic and nutritional solutions that can suppress the molecules that are causing the disease.
“The current industrial healthcare system is really a symptom care [system] and adopts a pharmaceutical approach to just make the symptoms more bearable,” Kumar told the Economic Times. “The system cannot decode the root cause to determine what makes people develop diseases.”
Helping People Better Understand their Health
Founded in 2019, Genefitletics was created to pioneer breakthrough discoveries in microbial science to promote better health and increase longevity in humans. The company hopes to unravel the potential of the oral microbiome to help people fend off illness and gain insight into their health.
“Microorganisms … perform critical biological functions inside the body and transform our approach towards nutrition, medicine, hygiene and environment,” Kumar told CNBC. “It is important to understand that an individual does not develop a chronic disease overnight.
“It starts with chronic inflammation which triggers pro-inflammatory molecular indications. Unfortunately, these molecular signatures are completely invisible and cannot be measured using traditional clinical grade tests or diagnostic investigations,” he added. “These molecular signatures occur due to alteration in gene expression of gut, oral, or vaginal microbiome and/or human genome. We have developed algorithms that help us in understanding these alterations way before the clinical symptoms kick in.”
Genefitletics plans to utilize individuals’ collected oral microbiome data to determine their specific nutritional shortcomings, and to develop personalized supplements to help people avoid disease.
The company also produces DTC kits that analyze gut and vaginal microbiomes as well as a test that is used to evaluate an infant’s microbiome.
“The startup wants to develop comparable models to forecast conditions like autism, PCOS [polycystic ovarian syndrome], IBD [Inflammatory bowel disease], Parkinson’s, chronic renal [kidney] disease, anxiety, depression, and obesity,” the Economic Times reported.
Time will tell whether the oral microbiome tests offered by this company prove to be clinically useful. Certainly Genefitletics hopes its ORAHYG test can eventually provide healthcare providers, including clinical laboratory professionals, with a useful view of the oral microbiome. The collected data might also help individuals become aware of pre-symptomatic conditions, making it possible for them to seek confirmation of the disease and early treatment from medical professionals.
Research findings could lead to new biomarkers for genetic tests and give clinical laboratories new capabilities to diagnose different health conditions
New insights continue to emerge about “junk DNA” (aka, non-coding DNA). For pathologists and clinical laboratories, these discoveries may have value and eventually lead to new biomarkers for genetic testing.
One recent example comes from researchers at Stanford Medicine in California who recently learned how non-coding DNA—which makes up 98% of the human genome—affects gene expression, the function that leads to observable characteristics in an organism (phenotypes).
The research also could lead to a better understanding of how short tandem repeats (STRs), stretches of DNA in which a short sequence is repeated back to back, affect gene expression, according to Stanford.
“We’ve known for a while that short tandem repeats or STRs, aren’t junk because their presence or absence correlates with changes in gene expression. But we haven’t known how they exert these effects,” said study lead Polly Fordyce, PhD (above), Associate Professor of Bioengineering and Genetics at Stanford University, in a news release. The research could lead to new clinical laboratory biomarkers for genetic testing. (Photo copyright: Stanford University.)
To Bind or Not to Bind
In their Science paper, the Stanford researchers described a new angle on how transcription factors bind to particular DNA sequences, known as sequence motifs.
“Researchers have spent a lot of time characterizing these transcription factors and figuring out which sequences—called motifs—they like to bind to the most,” said the study lead Polly Fordyce, PhD, Associate Professor of Bioengineering and Genetics at Stanford University, in a Stanford Medicine news release.
“But current models don’t adequately explain where and when transcription factors bind to non-coding DNA to regulate gene expression. Sometimes, no transcription factor is attached to something that looks like a perfect motif. Other times, transcription factors bind to stretches of DNA that aren’t motifs,” the news release explains.
Transcription factors are “like light switches that can turn genes on or off depending on what cells need,” notes a King’s College London EDIT Lab blog post.
But why do transcription factors target some places in the genome and not others?
“To solve the puzzle of why transcription factors go to some places in the genome and not to others, we needed to look beyond the highly preferred motifs,” Fordyce added. “In this study, we’re showing that the STR sequence around the motif can have a really big effect on transcription factor binding, providing clues as to what these repeated sequences might be doing.”
Such information could aid in understanding certain hereditary conditions and diseases.
“Variations in STR length have been associated with changes in gene expression and implicated in several complex phenotypes such as schizophrenia, cancer, autism, and Crohn’s disease. However, the mechanism by which STRs affect transcription remains unknown,” the researchers wrote in Science.
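For readers less familiar with the term, a short tandem repeat is a short DNA unit repeated back to back, and "STR length" refers to how many copies appear in a row. The Python sketch below only illustrates that definition. It is not the Stanford team's method, and the function name, example sequence, and thresholds (units of one to six bases, at least three copies) are assumptions chosen for the demonstration.

```python
import re


def find_strs(sequence, min_unit=1, max_unit=6, min_repeats=3):
    """Return (start, unit, repeat_count) for tandem repeats in a DNA string."""
    # A unit of min_unit..max_unit bases (shortest tried first), followed by
    # at least (min_repeats - 1) further copies of that same unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        repeat_count = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, repeat_count))
    return hits


# Made-up sequence containing a CA repeat and a GTA repeat.
example_sequence = "GCACACACATGTAGTAGTAC"
str_hits = find_strs(example_sequence)
print(str_hits)
```

A scan like this reports where repeats sit and how long they are; it says nothing about binding strength, which is what the researchers measured experimentally.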
Special Assays Explore Binding
According to their paper, the research team turned to the Fordyce Lab’s previously developed microfluidic binding assays (MITOMI, k–MITOMI, and STAMMP) to analyze the impact of different DNA sequences on transcription factor binding.
“In the experiment we asked, ‘How do these changes impact the strength of transcription factor binding?’ We saw a surprisingly large effect. Varying the STR sequence around a motif can have a 70-fold impact on the binding,” Fordyce wrote.
In an accompanying Science article titled, “Repetitive DNA Regulates Gene Expression,” Thomas Kuhlman, PhD, Assistant Professor, Physics and Astronomy, University of California, Riverside, wrote that the study “demonstrates that STRs exert their effects by directly binding transcription factor proteins, thus explaining how STRs might influence gene expression in both normal and diseased states.”
“This research unveils, for the first time, the intricate connection between how variants in the non-coding genome affect genes that are associated with blood pressure and with hypertension. What we’ve created is a kind of functional map of the regulators of blood pressure genes,” said Philipp Maass, PhD, Lead Researcher and Assistant Professor of Molecular Genetics, University of Toronto, in a news release.
The research team used massively parallel reporter assay (MPRA) technology to analyze 4,608 genetic variants associated with blood pressure.
The findings could aid precision medicine for cardiovascular health and may be adapted to other conditions, according to The Hospital for Sick Children.
“The variants we have characterized in the non-coding genome could be used as genomic markers for hypertension, laying the groundwork for future genetic research and potential therapeutic targets for cardiovascular disease,” Maass noted.
Why All the ‘Junk’ DNA?
Clinical laboratory scientists may wonder why genetic research has primarily focused on 20,000 genes within the genome, leaving the “junk” DNA for later investigation. So did researchers at Harvard University.
“After the Human Genome Project, scientists found that there were around 20,000 genes within the genome, a number that some researchers had already predicted. Remarkably, these genes comprise only about 1-2% of the three billion base pairs of DNA. This means that anywhere from 98-99% of our entire genome must be doing something other than coding for proteins,” they wrote in a blog post.
“Imagine being given multiple volumes of encyclopedias that contained a coherent sentence in English every 100 pages, where the rest of the space contained a smattering of uninterpretable random letters and characters. You would probably start to wonder why all those random letters and characters were there in the first place, which is the exact problem that has plagued scientists for decades,” they added.
Not only is junk DNA an interesting study subject, but ongoing research may also produce useful new biomarkers for genetic diagnostics and other clinical laboratory testing. Thus, medical lab professionals may want to keep an eye on new developments involving non-coding DNA.
Technologies developed by Pääbo to sequence Neanderthal DNA are being widely used in many clinical laboratory settings, including to study infectious disease outbreaks
Pääbo is considered to be the founder of paleogenetics. This field of science studies the past through examination of preserved genetic material found in remains of ancient organisms. It was his development of pioneering technologies that allowed for the genomic sequencing of Neanderthal DNA.
“[Pääbo’s] work has revolutionized our understanding of the evolutionary history of modern humans,” said German electrochemist Martin Stratmann, PhD, President of the Max Planck Society for the Advancement of Science (MPG), in a press release. “Svante Pääbo, for example, demonstrated that Neanderthals and other extinct hominids made a significant contribution to the ancestry of modern humans.”
“The thing that’s amazing to me is that you now have some ability to go back in time and actually follow genetic history and genetic changes over time,” Svante Pääbo, PhD (above), director of the Max Planck Institute for Evolutionary Anthropology, stated in a news conference, Reuters reported. “It’s a possibility to begin to actually look on evolution in real time, if you like.” Modern clinical laboratory techniques for identifying and tracking disease outbreaks have already evolved due to these findings. (Photo copyright: Max Planck Institute for Evolutionary Anthropology.)
Comparing Neanderthal DNA to That of Modern Humans
Back in the mid-1990s, Pääbo and a team of researchers decoded relatively short fragments of mitochondrial DNA of a Neanderthal male. They discovered through their analysis that the DNA from the Neanderthal varied considerably from the genome of contemporary humans. This validated the belief that modern humans are not direct descendants of the Neanderthals.
Pääbo’s research team found nearly all (99.9%) of the Neanderthal DNA they studied to be heavily colonized by bacteria and fungi. That required them to create solutions for assembling the short components of mitochondrial DNA like a huge puzzle.
To accomplish this, the team had to:
Work under clean room conditions to prevent the accidental introduction of their own DNA into their experiments.
Establish more efficient extraction methods to enhance the output of Neanderthal DNA.
Generate complex computer programs that could compare the ancient DNA fragments with reference genomes of both humans and chimpanzees.
“Neanderthals are the closest relatives of humans today” said Pääbo in the press release. “Comparisons of their genomes with those of modern humans and with those of apes enable us to determine when genetic changes occurred in our ancestors. In the future, it could also be clarified why modern humans eventually developed a complex culture and technology that enabled them to colonize almost the entire world.”
Pääbo’s team succeeded in reconstructing their first version of the Neanderthal genome in 2010. Their comparisons between the genomes of Neanderthal and modern humans proved that the two groups had produced common offspring about 50,000 years ago and that this genetic contribution did influence human evolution.
The genome of modern non-African people still contains about 2% Neanderthal DNA.
“We have found around 30,000 positions in which the genomes of almost all modern humans differ from those of Neanderthals and great apes,” Pääbo added. “They answer what makes anatomically modern humans ‘modern’ in the genetic sense as well. Some of these genetic changes may be the key to understanding what distinguishes the cognitive abilities of today’s humans from those of now extinct hominids.”
Those with Neanderthal DNA More Susceptible to Severe COVID-19 Infection
Pääbo’s research also found that Neanderthal DNA may have affected the immune systems of modern people. During the COVID-19 pandemic, his work verified that individuals who carry a gene variant inherited from Neanderthals are more prone to severe forms of the illness than those who do not have that gene variant.
“We can make an average gauge of the number of the extra deaths we have had in the pandemic due to the contribution from the Neanderthals,” Pääbo said in a 2022 lecture, Reuters reported. “It is quite substantial, it’s more than one million extra individuals who have died due to this Neanderthal variant that they carry.”
Pääbo’s research team continues to develop new methods for reconstructing DNA fragments that are even more biodegraded and are present in smaller amounts. Their ultimate goal is to investigate even older DNA and genetic material that is scarce due to climate conditions.
The DNA technologies pioneered by Pääbo to sequence Neanderthal DNA are being used widely in many clinical laboratory and research settings today. Applications include forensic science and the collection of DNA from human remains hundreds of years old to identify infectious disease outbreaks and study how today’s human genome has acquired new mutations.
And in less than eight hours, they had diagnosed a child with a rare genetic disorder, results that would take clinical laboratory testing weeks to return, demonstrating the clinical value of the genomic process
In another major genetic sequencing advancement, scientists at Stanford University School of Medicine have developed a method for rapid sequencing of patients’ whole human genome in as little as five hours. And the researchers used their breakthrough to diagnose rare genetic diseases in under eight hours, according to a Stanford Medicine news release. Their new “ultra-rapid genome sequencing approach” could lead to significantly faster diagnostics and improved clinical laboratory treatments for cancer and other diseases.
“A few weeks is what most clinicians call ‘rapid’ when it comes to sequencing a patient’s genome and returning results,” said cardiovascular disease specialist Euan Ashley, MD, PhD (above), professor of medicine, genetics, and biomedical data science, at Stanford University in the news release. “The right people suddenly came together to achieve something amazing. We really felt like we were approaching a new frontier.” Their results could lead to faster diagnostics and clinical laboratory treatments. (Photo copyright: Stanford Medicine.)
Need for Fast Genetic Diagnosis
In their NEJM paper, the Stanford scientists argue that rapid genetic diagnosis is key to clinical management, improved prognosis, and critical care cost savings.
“Although most critical care decisions must be made in hours, traditional testing requires weeks and rapid testing requires days. We have found that nanopore genome sequencing can accurately and rapidly provide genetic diagnoses,” the authors wrote.
To complete their study, the researchers sequenced the genomes of 12 patients from two hospitals in Stanford, Calif. They used nanopore genome sequencing, cloud computing-based bioinformatics, and a “custom variant prioritization.”
Their findings included:
Five people received a genetic diagnosis from the sequencing information in about eight hours.
Diagnostic rate of 42%, about 12% higher than the average rate for diagnosis of genetic disorders (the researchers noted that not all conditions are genetically based and appropriate for sequencing).
Five hours and two minutes to sequence a patient’s genome in one case.
Seven hours and 18 minutes to sequence and diagnose that case.
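The figures in this list can be checked with a little arithmetic. The short Python sketch below uses only the numbers reported above (five diagnoses among 12 patients, five hours and two minutes to sequence the fastest case, and seven hours and 18 minutes to sequence and diagnose it). The variable names are illustrative, and reading the difference between the two times as the post-sequencing portion of the workflow is an interpretation, not a figure from the paper.

```python
from datetime import timedelta

patients = 12
diagnosed = 5
# 5 of 12 is about 0.417, which rounds to the reported 42%
diagnostic_rate = diagnosed / patients

sequencing_time = timedelta(hours=5, minutes=2)
time_to_diagnosis = timedelta(hours=7, minutes=18)
# Difference between the two reported times for the fastest case
post_sequencing_time = time_to_diagnosis - sequencing_time

print(f"Diagnostic rate: {diagnostic_rate:.0%}")
print(f"Time between end of sequencing and diagnosis: {post_sequencing_time}")
```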
How the Nanopore Process Works
To advance sequencing speed, the researchers used equipment by Oxford Nanopore Technologies with 48 sequencing units called “flow cells”—enough to sequence a person’s whole genome at one time.
The Oxford Nanopore PromethION Flow Cell generates more than 100 gigabases of data per hour, AI Time Journal reported. The team used a cloud-based storage system to enable computational power for real-time analysis of the data. AI algorithms scanned the genetic code for errors and compared the patients’ gene variants to variants associated with diseases found in research data, Stanford explained.
According to an NVIDIA blog post, “The researchers accelerated both base calling and variant calling using NVIDIA GPUs on Google Cloud. Variant calling, the process of identifying the millions of variants in a genome, was also sped up with NVIDIA Clara Parabricks, a computational genomics application framework.”
Rapid Genetic Test Produces Clinical Benefits
“Together with our collaborators and some of the world’s leaders in genomics, we were able to develop a rapid sequencing analysis workflow that has already shown tangible clinical benefits,” said Mehrzad Samadi, PhD, NVIDIA Senior Engineering Manager and co-author of the NEJM paper, in the blog post. “These are the kinds of high-impact problems we live to solve.”
In their paper, the Stanford researchers described their use of the rapid genetic test to diagnose and treat an infant who was experiencing epileptic seizures on arrival at Stanford’s pediatric emergency department. In just eight hours, their diagnostic test found that the infant’s convulsions were attributable to a mutation in the gene CSNK2B, “a variant and gene known to cause a neurodevelopmental disorder with early-onset epilepsy,” the researchers wrote.
“By accelerating every step of this process—from collecting a blood sample to sequencing the whole genome to identifying variants linked to diseases—[the Stanford] research team took just hours to find a pathogenic variant and make a definitive diagnosis in a three-month-old infant with a rare seizure-causing genetic disorder. A traditional gene panel analysis ordered at the same time took two weeks to return results,” AI Time Journal reported.
New Benchmarks
The Stanford research team wants to cut the sequencing time in half. For now, clinical laboratory leaders, pathologists, and research scientists can consider the five-hour rapid whole genome sequence a new benchmark in genetic sequencing for diagnostic purposes.
Stories like Stanford’s rapid diagnosis of the three-month-old patient with epileptic seizures point to the ultimate value of advances in genomic sequencing technologies.