Harvard Medical School researcher finds that a small fraction of known human genes accounts for a disproportionate share of research studies

It seems that every day diagnostic test developers announce new genetic tests for everything from researching bloodlines to predicting vulnerability to specific chronic diseases. However, as most pathologists know, there are more than 20,000 protein-coding genes in the human genome, and the overwhelming majority of them receive little research attention.

That’s according to Peter Kerpedjiev, PhD, a Postdoctoral Fellow at Harvard Medical School in Boston. Kerpedjiev analyzed data from the PubMed database of the US National Library of Medicine (NLM). He found that roughly 25% of the articles tagged by the NLM featured only 100 of the more than 20,000 human genes.

Kerpedjiev studied approximately 40,000 NLM articles that were tagged as describing the structure, function, or location of a particular gene. He then created a list of the top-10 most-studied genes of all time, which held some surprises.
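The kind of tally described above can be illustrated with a short Python sketch. It is not Kerpedjiev's actual code or data: the function names, the one-gene-per-article simplification, and the toy tag list (including the placeholder names GENE_A and GENE_B) are all assumptions made for illustration.

```python
from collections import Counter


def rank_genes(article_genes):
    """Rank genes by how many tagged articles mention them (most first)."""
    return Counter(article_genes).most_common()


def top_share(article_genes, top_n):
    """Fraction of all tagged articles that feature one of the top_n genes."""
    if not article_genes:
        return 0.0
    top = Counter(article_genes).most_common(top_n)
    covered = sum(count for _, count in top)
    return covered / len(article_genes)


# Hypothetical toy data: one gene tag per article (an assumption).
toy_tags = ["TP53", "TP53", "TP53", "TNF", "TNF", "EGFR", "GENE_A", "GENE_B"]

print(rank_genes(toy_tags)[:2])   # the two most-tagged genes in the toy data
print(top_share(toy_tags, 1))     # share of toy articles covered by the top gene
```

Run against real tagging data, the same two steps would produce a ranked list like the one below and a share figure comparable to the roughly 25% reported for the top 100 genes.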

“The list was surprising,” Kerpedjiev told Nature. “Some genes were predictable; others were completely unexpected.”

Guardian of the Genome

According to Kerpedjiev, the top-10 most-studied genes are:

  1. TP53;
  2. TNF;
  3. EGFR;
  4. VEGFA;
  5. APOE;
  6. IL6;
  7. TGFB1;
  8. MTHFR;
  9. ESR1; and,
  10. AKT1.

Kerpedjiev discovered that the top gene on the list—Tumor protein p53 (TP53)—was mentioned in about 8,500 articles to date, and that it is typically included in about two PubMed papers per day. When he began his research three years ago, TP53 was referenced in about 6,600 articles.
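As a rough consistency check on those figures, the growth from about 6,600 to about 8,500 articles over three years can be converted to a daily rate. The sketch below assumes steady growth and 365-day years, and the variable names are illustrative only.

```python
# Figures quoted in the article (approximate).
articles_now = 8_500
articles_three_years_ago = 6_600
years = 3

# Assumption: growth was spread evenly over 365-day years.
new_articles = articles_now - articles_three_years_ago
per_day = new_articles / (years * 365)

print(f"{new_articles} new articles, about {per_day:.1f} per day")
```

That works out to a little under two new TP53 articles per day, in line with the "about two" cited above.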

Peter Kerpedjiev, PhD, is a Postdoctoral Fellow in the lab of Nils Gehlenborg at Harvard Medical School. Previously, he was a PhD student working on modelling the tertiary structure of RNA molecules at the Theoretical Biochemistry Group at the University of Vienna. (Photo and caption copyright: Gehlenborg Lab.)

The National Library of Medicine describes the TP53 gene as a tumor suppressor that regulates cell division by preventing cells from growing and proliferating too quickly or uncontrolled. It is mutated in approximately half of all human cancers and is often referred to as the “guardian of the genome.”

“That explains its staying power,” Bert Vogelstein, MD, Professor of Oncology and Pathology at Johns Hopkins School of Medicine in Baltimore, Md., told Nature. “In cancer, there’s no gene more important.”

Critical Roles in Prevention/Treatment of Chronic Disease

The remaining genes on the list also have crucial roles in the functioning of the human body and disease prevention and treatment. Below is a brief summary of genes two through 10 on the list:

TNF encodes a proinflammatory cytokine that is part of the tumor necrosis factor superfamily. This family of proteins was originally distinguished by its ability to cause the necrosis of neoplasms. The TNF gene has been a drug target for cancer and inflammatory diseases.

EGFR makes a protein known as the epidermal growth factor receptor, which spans the cell membrane so that it can bind to other proteins outside the cell and relay signals that trigger cell growth, division, and survival. At least eight known mutations of the EGFR gene have been associated with lung cancer and often appear in drug-resistant cases of the disease.

Vascular Endothelial Growth Factor A (VEGFA) encodes a heparin-binding protein that promotes the growth of blood vessels and is critical for physiological and pathological angiogenesis. Variants of the VEGFA gene have been associated with microvascular complications of diabetes mellitus and atherosclerosis.

APOE produces a protein named apolipoprotein E, which combines with lipids in the body to form lipoproteins that carry cholesterol and other fats through the bloodstream. APOE-e3 is the most common allele (a variant of the gene) and is found in more than 50% of the general population. In addition to its role in cholesterol and lipoprotein metabolism, APOE is also associated with:

  • Alzheimer’s disease;
  • Age-related hearing loss; and,
  • Macular degeneration.

Interleukin 6 (IL6) encodes a cytokine that is mainly produced at sites of acute and chronic inflammation. There, it is secreted into the serum, where it induces an inflammatory response. The IL6 gene is connected with a range of inflammation-associated diseases.

Transforming Growth Factor Beta 1 (TGFB1) initiates chemical signals that regulate various cell activities including the proliferation, maturation, differentiation, motility, and apoptosis of cells throughout the body. The protein created by TGFB1 is abundant in skeletal tissues and regulates the formation and growth of bones and cartilage. Mutations in the TGFB1 gene have been associated with breast, colorectal, lung, liver, and prostate cancers. At least 12 mutations of this gene are known to cause Camurati-Engelmann disease, which is distinguished by hyperostosis (abnormally thick bones) in the arms, legs, and skull.

MTHFR makes methylenetetrahydrofolate reductase, an enzyme that performs a crucial role in processing amino acids. Polymorphisms of this gene have been linked to risk factors for a variety of conditions including:

  • Cardiovascular disease;
  • Stroke;
  • Hypertension;
  • Pre-eclampsia;
  • Glaucoma;
  • Psychiatric disorders; and,
  • Various cancers.

Estrogen Receptor 1 (ESR1) encodes a ligand-activated transcription factor with domains that are important for hormone binding and DNA binding. Estrogen and its receptors are crucial for sexual development and reproductive functions. They also can affect pathological processes including breast and endometrial cancers and osteoporosis.

AKT1 provides instructions for producing a protein known as AKT1 kinase, which is found in many cell types throughout the body and is essential for the development and function of the nervous system. This gene belongs to a class of genes known as oncogenes, which when mutated have the potential to cause normal cells to turn cancerous.

We Don’t Know What We Don’t Know

“It’s revealing how much we don’t know about because we just don’t bother to research it,” noted Dr. Helen Anne Curry, Senior Lecturer and Historian of Modern Science and Technology at the University of Cambridge, UK, in the Nature article. As far back as 2010, Dark Daily reported on university researchers predicting massive growth in anatomic pathology and clinical laboratory diagnostic testing based on the human genome.

How Kerpedjiev’s discovery might impact future genetic diagnostic test development remains to be seen. It will, however, be fascinating to see how this top-10 list of the most studied genes will change over time and how medical laboratory genetic testing may be affected.

—JP Schlingman

Related Information:

The Most Popular Gene in the Human Genome

Top 10 Genes in the Human Genome (by Number of Citations)

Explore the Normal Functions of Human Genes and the Health Implications of Genetic Changes

Stanford Study Shows How Pathologists May Eventually Use the Whole Human Genome for Diagnostic Purposes