Full Genome Sequencing of All Animal Species Continues, but Sequencing of Invertebrate Species Lags Behind That of Vertebrate Species
Scientists working to sequence all 1.66 million animal species say this is a missed opportunity to better understand our own genetics; such research would identify biomarkers useful for clinical laboratory testing
For 23 years, the world’s genomic scientists have been on a mission to sequence the genomes of all animal species. And they’ve made great progress. However, according to a recent study conducted by researchers at Washington State University (WSU) and Brigham Young University (BYU), only a fraction of the sequences are from invertebrate species. And that, according to the study’s authors, is “overlooking huge swathes of diversity and opportunity.”
The push to sequence the whole genomes of all animals began in 1998 with the sequencing of the Caenorhabditis elegans roundworm, according to a WSU news release. It was the first animal genome sequence, but it was not to be the last. More than two decades later, genomic scientists have sequenced about 3,300 animal genomes. And while that’s a lot of genomic sequences, it’s a drop in the bucket compared with the approximately 1.7 million animal species on the planet.
But here’s where the missed opportunity comes in. According to the WSU news release, “Vertebrates account for 54% of all genome sequencing assemblies, despite representing only 3.9% of animal species. In contrast, the invertebrates of the Arthropoda phylum, which includes insects and spiders, comprise only 34% of current datasets while representing 78.5% of all species.”
The WSU/BYU researchers published their findings in the journal Proceedings of the National Academy of Sciences (PNAS) in a paper titled, “Toward a Genome Sequence for Every Animal: Where Are We Now?”
Are Hominids More Charismatic?
The scientists analyzed the best available genome assemblies found in GenBank, the world’s most extensive genetic database. They found that 3,278 unique animal species across 24 phyla, 64 classes, and 258 orders have been sequenced and assembled to date.
They also found that sequencing efforts have focused heavily on species that most resemble humans. The Hominidae, a taxonomic family of primates that includes humans and the other great apes (bonobos, chimpanzees, orangutans, and gorillas), has the most contiguous genome data assembled.
The team found that vertebrates account for 54% of the animal genome assemblies produced so far, even though they make up less than 4% of known animal species. By comparison, invertebrates of the Arthropoda phylum, which represent 78.5% of all animal species, comprise only 34% of the assemblies. Arthropoda is the largest phylum in the animal kingdom and includes insects, spiders, scorpions, centipedes, millipedes, crabs, crayfish, lobsters, and barnacles.
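To put the imbalance in concrete terms, the percentages above can be turned into a simple representation ratio: a group's share of genome assemblies divided by its share of known species. The short Python sketch below is only an illustration. The ratio is our own back-of-the-envelope measure, not a statistic reported by the study, and the names `GROUPS` and `representation_ratio` are made up for this example.

```python
# Representation ratio = share of genome assemblies / share of known species.
# Above 1 means a group is over-represented; below 1 means under-represented.
# The percentages are the figures quoted in the article; the ratio itself is
# an illustrative measure, not one taken from the PNAS paper.
GROUPS = {
    "Vertebrates": {"assembly_share": 54.0, "species_share": 3.9},
    "Arthropods": {"assembly_share": 34.0, "species_share": 78.5},
}


def representation_ratio(assembly_share: float, species_share: float) -> float:
    """Return how many times larger a group's assembly share is than its species share."""
    return assembly_share / species_share


ratios = {name: representation_ratio(**shares) for name, shares in GROUPS.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x their share of species")
```

By this rough measure, vertebrates have been sequenced at about 14 times their share of species, while arthropods come in at less than half of theirs, a gap of roughly 32-fold between the two groups.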
“With genome assemblies accumulating rapidly, we want to think about where we are putting our efforts. It’s not being spread evenly across the animal tree of life,” said lead author Scott Hotaling, PhD, post-doctoral researcher at WSU, in the news release. “Invertebrates are still very underrepresented, which makes sense given that people seem to care more about vertebrates, the so-called ‘charismatic megafauna.’”
The team discovered that only five arthropod groups (ants, bees, butterflies, fruit flies, and mosquitos) were well represented in genome sequencing. The largest genome sequenced so far belongs to the Australian lungfish, the only surviving member of the family Neoceratodontidae.
1,100 Years to Sequence All Eukaryotic Life
The scientists also found that animal genome assemblies have been produced by 52 countries on every continent with permanent inhabitants. Most animal genome sequencing (77%) is performed in developed countries, largely in the Northern Hemisphere, that are often referred to as the Global North. Nearly 70% of all animal genome assemblies have been produced by just three countries: the United States, China, and Switzerland.
The types of animals being sequenced and assembled also differ by region: North America concentrates on mammals and insects, Europe focuses on fish, and Asia mainly sequences birds.
The scientists would like to see more animal genome sequencing happening in countries of the Global South, particularly in tropical regions that are home to a rich diversity of animal species.
“If we want to build a global discipline, we need to include a global people,” Hotaling said. “It’s just basic equity, and from a pure scientific standpoint, the people who live in areas where species are being sequenced have a lot of knowledge about those species and ecosystems. They have a lot to contribute.”
The WSU/BYU scientists also found that many species in GenBank have only low-quality assemblies available. They noted that “the quality of a genome assembly is likely the most important factor dictating its long-term value.”
Fortunately, several animal genome sequencing ventures have been announced in recent years, so the amount of available data is expected to grow rapidly. These projects include:
- The Earth BioGenome Project (EBP), which aspires to sequence and catalog the genes of all the eukaryotic species on the planet within ten years.
- The Vertebrate Genomes Project, which seeks to generate high-quality assemblies for 70,000 extant vertebrate species.
- The Bird 10K Project, which seeks to generate assemblies for all extant birds.
- The i5K Project, which plans to produce 5,000 arthropod genome assemblies.
- The Darwin Tree of Life Project, which aims to sequence genomes for all eukaryotes in Britain and Ireland.
The authors of the PNAS paper noted that only about four genome assemblies are currently being produced each day and that, at that rate, the sequencing of all eukaryotic life will not be completed until the year 3130.
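For readers who want a feel for where a figure of that size comes from, here is a rough Python sketch of the arithmetic. It is not the authors' calculation. Their exact inputs are not given in this article, so the sketch assumes the roughly 1.66 million animal species cited above as a stand-in for the total to be sequenced and subtracts the 3,278 species already assembled. The constant and function names are illustrative only.

```python
# Back-of-the-envelope estimate of the time left at the current pace.
# Assumption: the article's animal species count stands in for the total
# number of species to sequence; the PNAS authors' own inputs may differ.
SPECIES_TOTAL = 1_660_000    # approximate number of animal species (from the article)
ALREADY_ASSEMBLED = 3_278    # unique animal species with assemblies in GenBank
ASSEMBLIES_PER_DAY = 4       # current pace cited by the authors
DAYS_PER_YEAR = 365.25


def years_remaining(total: int, done: int, per_day: float) -> float:
    """Return the years needed to finish at a constant daily rate."""
    return (total - done) / per_day / DAYS_PER_YEAR


years = years_remaining(SPECIES_TOTAL, ALREADY_ASSEMBLED, ASSEMBLIES_PER_DAY)
print(f"About {years:,.0f} years at the current pace")
```

Under these assumptions the answer lands a little above 1,100 years, the same order of magnitude as the authors' projection. Doubling the daily pace would halve the wait, which is why the large projects listed above matter.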
So, microbiologists, clinical laboratory professionals, and genomic scientists have plenty of time to get up to speed.