California’s Massive Microarray SNP Genotyping Project Processed Genetic Data from More Than 100,000 Volunteers and Characterized 70 Billion Genetic Variants in 14 Months

Faster, more accurate genotyping could fuel the discovery of biomarkers and lead to the development of new medical laboratory tests and therapeutic drugs

Trailblazing methods used to create a treasure trove of genetic data from 100,000 Californians could pay dividends for clinical laboratories and pathology groups if similar projects identify novel biomarkers and fuel the development of new clinical laboratory tests and therapeutic drugs.

California is once again at the forefront, this time with a major program to create a large database of genetic data. The program, called Genetic Epidemiology Research on Adult Health and Aging (GERA), is a collaboration between the Kaiser Permanente Northern California Research Program on Genes, Environment, and Health (RPGEH) and the Institute for Human Genetics at the University of California, San Francisco (UCSF) that began in 2009.

Goal Is to Collect Genetic Information from 100,000 Californians

The genetic data were collected from more than 100,000 Californians who are members of the Kaiser Permanente Medical Care plan. These individuals consented to anonymously share with researchers their DNA (via saliva samples), their medical records, and their answers to survey questions on behavior and background.

“Data from this immense and ethnically diverse population will be a tremendous resource for science,” National Institutes of Health (NIH) Director Francis S. Collins, MD, PhD, stated in an NIH news release. “It offers the opportunity to identify potential genetic risks and influences on a broad range of health conditions, particularly those related to aging.”

National Institutes of Health Director Francis S. Collins, MD, PhD, expects the GERA project to aid researchers studying a wide range of diseases and conditions, which ultimately could lead to the creation of new clinical laboratory tests and new therapeutic drugs. (Photo copyright National Institutes of Health.)

Innovations in Microarray SNP Genotyping Are Key Reason for the Project’s Success

One of the GERA project’s major accomplishments was processing more than 100,000 samples—characterizing 70 billion genetic variants—within the two years dictated by the terms of its funding.
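The headline figures can be sanity-checked with simple arithmetic. The short Python sketch below uses only the rounded numbers quoted in this article (100,000 samples, 70 billion variants, 14 months of lab work). The function names are illustrative, and because the actual counts are "more than" these values, the results are approximations.

```python
# Rounded figures quoted in the article; the true counts are somewhat higher.
SAMPLES = 100_000
GENOTYPES_TOTAL = 70_000_000_000
LAB_MONTHS = 14


def variants_per_sample(total_genotypes: int, samples: int) -> float:
    """Average number of variants characterized per sample."""
    return total_genotypes / samples


def samples_per_month(samples: int, months: int) -> float:
    """Average number of samples processed per month of lab work."""
    return samples / months


print(f"Variants per sample: {variants_per_sample(GENOTYPES_TOTAL, SAMPLES):,.0f}")
print(f"Samples per month:   {samples_per_month(SAMPLES, LAB_MONTHS):,.0f}")
```

By this estimate, each sample yielded roughly 700,000 genotyped variants, and the lab processed on the order of 7,000 samples per month.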

“In 2009, this [microarray SNP genotyping] was a huge task,” stated co-author Pui-Yan Kwok, MD, PhD, of the Institute for Human Genetics, UCSF, in a Genetics Society of America (GSA) news release. “It hadn’t been done this fast before. The assays ran 24/7, so we had to develop new processes for analyzing [microarray SNP genotyping] data in real time to alert us to any problems as soon as they happened. We also had to boost the analysis quality to make best use of the data.”

The GERA project represented the first large-scale use of the Affymetrix Axiom Genotyping Solution. In the Genes to Genomes blog on the GSA website, project co-principal investigator Neil Risch, PhD, a UCSF Professor of Biostatistics, describes some of the modifications researchers made that enabled the lab work to be completed within 14 months.

Speedy Analysis of Microarray SNP Genotyping Data Was a Quality Control Method

“Part of our solution to the time crunch was developing real-time turnaround in the data analysis,” Risch stated. “So within three hours after the results came out of the GeneTitan, we knew if anything was going wrong. Working in this way probably saved us hundreds of thousands of dollars.

“We also improved the way the genotypes were called [inferred], realizing that Affymetrix’s historical method was suboptimal for rare variants,” he added. “The upshot is that Affymetrix has since changed its protocol and has used a lot of the lessons that we learned with the GERA project to benefit other very large genotyping projects using the same platform—for example, the Million Veterans Program.”
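To illustrate the general idea of the rapid quality check Risch describes, here is a minimal Python sketch. It is not the GERA team's pipeline. It assumes a hypothetical data layout (one list of genotype calls per plate, with `None` marking a genotype that could not be called) and a hypothetical minimum call rate of 0.97, and it simply flags plates whose call rate falls below that threshold so that a problem can be investigated soon after a run finishes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlateResult:
    """Hypothetical container for one plate's genotype calls."""
    plate_id: str
    calls: List[Optional[str]]  # None means the genotype could not be called


def call_rate(calls: List[Optional[str]]) -> float:
    """Fraction of attempted genotypes that received a call."""
    if not calls:
        return 0.0
    called = sum(1 for c in calls if c is not None)
    return called / len(calls)


def flag_plates(plates: List[PlateResult], min_call_rate: float = 0.97) -> List[str]:
    """Return IDs of plates whose call rate is below the (assumed) threshold."""
    return [p.plate_id for p in plates if call_rate(p.calls) < min_call_rate]


# Example with made-up data: P1 has a 99% call rate, P2 has 90%.
plates = [
    PlateResult("P1", ["AA"] * 99 + [None]),
    PlateResult("P2", ["AB"] * 90 + [None] * 10),
]
print(flag_plates(plates))
```

In this made-up example only plate P2 is flagged. A production system would track many more metrics than call rate, but the principle of checking each batch as soon as its results arrive is the same.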

Pui-Yan Kwok, MD, PhD, (above) is a Henry Bachrach Distinguished Professor at the University of California, San Francisco. Kwok was one of the genetic researchers involved in the project that has resulted in a huge database of genetic data on more than 100,000 Californians who participated in the Genetic Epidemiology Research on Adult Health and Aging (GERA). (Photo copyright University of California, San Francisco.)

Data Links Ethnically Diverse Population to Their Lifestyles and Environments

The project’s results were published in three separate papers in the August 2015 issue of the journal Genetics. These papers describe the population structure and genetic ancestry of the GERA participants, the telomere length analysis, and the methods used to speed up genotyping.

Risch says the individuals participating in GERA had an average age of 63. A “treasure trove of data” was produced by linking a large, ethnically diverse population’s genetic information to lifestyle and environmental data from surveys. In addition, clinical, pharmacy, imaging and diagnostic laboratory data from electronic medical records were part of this big data study.

“By linking these clinical records with genomic data from each person, we now have the power to track down many genetic and environmental contributions to disease,” Risch said in the GSA statement.

Useful Insights on Many Diseases Emerged from GERA’s Database

GERA data already has been used to pinpoint genetic variants linked to prostate cancer, allergies, glaucoma, macular degeneration, diabetes, high cholesterol and other diseases.

“No matter which disease we’ve looked at, we found genetic variants that influence it. And the beauty of this dataset is that it covers countless diseases and traits, and the medical records are constantly being updated as the cohort grows older,” Risch said.

Researchers can apply for access to the GERA project’s data by following procedures on the database of Genotypes and Phenotypes (dbGaP) website. The site is managed by the National Center for Biotechnology Information, a division of the National Library of Medicine at the NIH.

“The GERA cohort has the largest number of people of any age with data in dbGaP,” National Institute on Aging Director Richard J. Hodes, MD, said in the NIH statement. “Federal funds were used to develop new approaches to genomics for this project and I’m pleased the data are now ready in dbGaP for researchers’ use.”

For clinical laboratories developing new genetic tests, the ability to access information in GERA’s database may be helpful. Meanwhile, the achievement of producing microarray SNP genotyping information from 100,000 Californians in just 14 months is a powerful statement about rapid improvements in microarray SNP genotyping technologies. This progress is expected to continue, and medical laboratories and anatomic pathology groups may be surprised at how quickly in vitro diagnostics manufacturers introduce low-cost, fast, and accurate gene-sequencing instruments into the clinical marketplace.

—Andrea Downing Peck

Related Information:

Tens of Thousands Californians Give Up Their Genetic Secrets

Genotyping Informatics and Quality Control for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort

Automated Assay of Telomere Length Measurement and Informatics for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort

Characterizing Race/Ethnicity and Genetic Ancestry for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort

Health Records and Genetic Data from More Than 100,000 Californians Power Medical Research

Turning Spit and Data into Treasure

NIH Adds Substantial Set of Genetic, Health Information to Online Database
