News, Analysis, Trends, Management Innovations for
Clinical Laboratories and Pathology Groups

Hosted by Robert Michel


Common DNA Testing Method Using SNP Chips Struggles to Find Rare Variants Associated with BRCA Test, UK Researchers Find

Results of the UK study confirm for clinical laboratory professionals the importance of fully understanding the design and function of SNP chips they may be using in their labs

Here is another example of a long-established clinical laboratory test that, in light of new evidence, turns out to be less accurate than once thought. According to research conducted at the University of Exeter in Devon, UK, single-nucleotide polymorphism (SNP) chips (aka, SNP microarrays), a technology commonly used in commercial genetic testing, are inadequate at detecting rare gene variants that can increase breast cancer risk.

A news release announcing the results of the large-scale study states, “A technology that is widely used by commercial genetic testing companies is ‘extremely unreliable’ in detecting very rare variants, meaning results suggesting individuals carry rare disease-causing genetic variants are usually wrong.”

Why is this a significant finding for clinical laboratories? Because medical laboratories performing genetic tests that use SNP chips should be aware that the tests they are running may not accurately detect or report rare genetic variants that are clinically relevant to a patient’s case.

UK Researchers Find ‘Shockingly High False Positives’

The objective of the Exeter study, published in the British Medical Journal (BMJ) under the title “Use of SNP Chips to Detect Rare Pathogenic Variants: Retrospective, Population Based Diagnostic Evaluation,” was “To determine whether the sensitivity and specificity of SNP chips are adequate for detecting rare pathogenic variants in a clinically unselected population.”

The conclusion reached by the Exeter researchers, the BMJ study states, is that “SNP chips are extremely unreliable for genotyping very rare pathogenic variants and should not be used to guide health decisions without validation.”  

Leigh Jackson, PhD, Lecturer in Genomic Medicine at University of Exeter and co-author of the BMJ study, said in the news release, “The number of false positives on rare genetic variants produced by SNP chips was shockingly high. To be clear: a very rare, disease-causing variant detected using [an] SNP chip is more likely to be wrong than right.” 

In the news release, Caroline Wright, PhD, Professor in Genomic Medicine at the University of Exeter Medical School and senior author of the BMJ study, said, “SNP chips are fantastic at detecting common genetic variants, yet we have to recognize that tests that perform well in one scenario are not necessarily applicable to others.” She added, “We’ve confirmed that SNP chips are extremely poor at detecting very rare disease-causing genetic variants, often giving false positive results that can have profound clinical impact. These false results had been used to schedule invasive medical procedures that were both unnecessary and unwarranted.”

Large-Scale Study Taps UK Biobank Data

The Exeter researchers were concerned about cases in which women scheduled unnecessary invasive medical procedures after BRCA1 (breast cancer type 1) and BRCA2 (breast cancer type 2) test results indicated that they carried rare genetic variants.

“The inherent technical limitation of SNP chips for correctly detecting rare genetic variants is further exacerbated when the variants themselves are linked to very rare diseases. As with any diagnostic test, the positive predictive value for low prevalence conditions will necessarily be low in most individuals. For pathogenic BRCA variants in the UK Biobank, the SNP chips had an extremely low positive predictive value (1-17%) when compared with sequencing. Were these results to be fed back to individuals, the clinical implications would be profound. Women with a positive BRCA result face a lifetime of additional screening and potentially prophylactic surgery that is unwarranted in the case of a false positive result,” they wrote.
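
The researchers' point that positive predictive value (PPV) is necessarily low for rare conditions follows from Bayes' rule: PPV = sensitivity × prevalence / (sensitivity × prevalence + (1 - specificity) × (1 - prevalence)). The minimal Python sketch below makes that arithmetic concrete. The function name and the input figures are hypothetical illustrations. They are not values taken from the Exeter study.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of positive calls that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence                # fraction of people who carry the variant and test positive
    false_pos = (1 - specificity) * (1 - prevalence)   # fraction who do not carry it but still test positive
    return true_pos / (true_pos + false_pos)


# Hypothetical assay: perfect sensitivity, 99.99% specificity.
rare = positive_predictive_value(1.0, 0.9999, 1 / 100_000)  # variant in 1 of 100,000 people
common = positive_predictive_value(1.0, 0.9999, 0.10)       # variant in 1 of 10 people
print(f"rare: {rare:.1%}, common: {common:.2%}")             # rare: 9.1%, common: 99.91%
```

Even with an assumed 99.99% specificity, only about one in eleven positive calls for the rare variant would be real, while more than 99.9% would be real for the common one. The only difference between the two cases is how many people actually carry the variant.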

Using UK Biobank data from 49,908 participants (55% were female), the researchers compared next-generation sequencing (NGS) to SNP chip genotyping. They found that SNP chips, which test genetic variation at hundreds of thousands of specific locations across the genome, performed well compared with NGS for common variants, such as those related to type 2 diabetes and ancestry assessment, the study noted.

“Because SNP chips are such a widely used and high-performing assay for common genetic variants, we were also surprised that the differing performance of SNP chips for detecting rare variants was not well appreciated in the wider research or medical communities. Luckily, we had recently received both SNP chip and genome-wide DNA sequencing data on 50,000 individuals through the UK Biobank—a population cohort of adult volunteers from across the UK. This large dataset allowed us to systematically investigate the performance of SNP chips across millions of genetic variants with a wide range of frequencies, down to those present in fewer than 1 in 50,000 individuals,” wrote Wright and Associate Professor of Bioinformatics and Human Genetics at Exeter, Michael Weedon, PhD, in a BMJ blog post.

The Exeter researchers also analyzed data from a small group of people in the Personal Genome Project who had both SNP genotyping and sequencing information available. They focused their analysis on rare pathogenic variants in BRCA1 and BRCA2 genes.

The researchers found:

  • The rarer the variant, the less reliable the test result. For example, for “very rare variants” found in fewer than one in 100,000 people, 84% of those detected by SNP chips were false positives.
  • Positive predictive values were low, about 16%, for very rare variants in the UK Biobank.
  • Nearly all (20 of 21) customers of commercial genetic testing had at least one false-positive call for a rare disease-causing variant.
  • SNP chips detect common genetic variants “extremely well.”

Advantages and Capabilities of SNP Chips

Compared with next-generation sequencing, SNP chips are less costly. The chips use “grids of hundreds of thousands of beads that react to specific gene variants by glowing in different colors,” New Scientist explained.

Common variants of BRCA1 and BRCA2 can be found using SNP chips with 99% accuracy, New Scientist reported based on study data.

However, when the task is to find thousands of rare variants in BRCA1 and BRCA2 genes, SNP chips do not fare so well.

“It is just not the right technology for the job when it comes to rare variants. They’re excellent for the common variants that are present in lots of people. But the rarer the variant is, the less likely they are to be able to correctly detect it,” Wright told CNN.

SNP chips struggle with rare variants because their genotype calls rely on clustering data from many individuals, the Exeter researchers explained.

“SNP chips perform poorly for genotyping rare genetic variants owing to their reliance on data clustering. Clustering data from multiple individuals with similar genotypes works very well when variants are common,” the researchers wrote. “Clustering becomes more difficult as the number of people with a particular genotype decreases.”
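
The following is a deliberately simplified, hypothetical genotype caller that shows why sparse clusters cause trouble. It runs a one-dimensional k-means over each person's allele-intensity ratio, with starting centers at 0 (two copies of allele A), 0.5 (one copy of each), and 1 (two copies of allele B). The function names and the intensity values are assumptions made for illustration. This is not the calling algorithm any SNP chip vendor actually uses.

```python
from typing import List, Tuple


def nearest(value: float, centers: List[float]) -> int:
    """Index of the cluster center closest to value."""
    return min(range(len(centers)), key=lambda k: abs(value - centers[k]))


def call_genotypes(ratios: List[float], iterations: int = 20) -> Tuple[List[float], List[int]]:
    """Toy 1-D k-means genotype caller (hypothetical, for illustration only).

    ratios: one allele-intensity ratio per person (0 ~ AA, 0.5 ~ AB, 1 ~ BB).
    Returns the final cluster centers and the number of people in each cluster.
    """
    centers = [0.0, 0.5, 1.0]  # assumed starting positions for AA, AB, BB
    labels = [nearest(r, centers) for r in ratios]
    for _ in range(iterations):
        # Re-estimate each center from its members; an empty cluster keeps its old center.
        for k in range(3):
            members = [r for r, label in zip(ratios, labels) if label == k]
            if members:
                centers[k] = sum(members) / len(members)
        # Reassign every person to the nearest updated center.
        labels = [nearest(r, centers) for r in ratios]
    counts = [labels.count(k) for k in range(3)]
    return centers, counts


# Common variant: each genotype is carried by many people, so every cluster is well populated.
common = [0.0, 0.02, 0.04, 0.48, 0.5, 0.52, 0.96, 0.98, 1.0] * 3
# Rare variant with no true carriers: 40 AA people plus one noisy AA measurement at 0.3.
rare = [0.0, 0.01, 0.02, 0.03] * 10 + [0.3]

print(call_genotypes(common)[1])  # [9, 9, 9]
print(call_genotypes(rare)[1])    # [40, 1, 0]
```

In the rare case, the single noisy measurement at 0.3 ends up alone in the middle cluster, so this toy caller would report a heterozygous carrier who does not exist. With only one person in that cluster, nothing distinguishes a real carrier from measurement noise. In the common case, by contrast, nine people per genotype anchor each cluster.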

Clinical Laboratories Using SNP Chips

The researchers at Exeter unveiled important information that pathologists and medical laboratory professionals will want to understand and monitor. Patients with rare genetic variants may not be diagnosed accurately, because SNP chips were not designed to reliably genotype very rare variants. Those patients may need additional testing to validate results and prevent harm.

—Donna Marie Pocius

Related Information:

Large-scale Study Finds Genetic Testing Technology Falsely Detects Very Rare Variants

Use of SNP Chips to Detect Rare Pathogenic Variants: Retrospective, Population-Based Diagnostic Evaluation

The Home DNA Kits “Falsely Warning of High Risk of Cancer”: DIY Genetic Tests are “Extremely Unreliable” at Detecting Rare Genetic Variants, Major New Study Warns

SNP Chips Perform Poorly for Detecting Rare Genetic Variants

Chip-based DNA Testing Wrong More than Right for Very Rare Variants

Common Genetic Tests Often Wrong When Identifying Rare Disease-Causing Variants Such as BRCA1 and BRCA2, Study Says

Researchers Easily Reidentify Deidentified Patient Records with 95% Accuracy; Privacy Protection of Patient Test Records a Concern for Clinical Laboratories

Protecting patient privacy is of critical importance, and yet researchers reidentified data using only a few additional data points, casting doubt on the effectiveness of existing federally required data security methods and sharing protocols

Clinical laboratories and anatomic pathologists know the data generated by their diagnostics and testing services constitute most of a patient’s personal health record (PHR). They also know federal law requires them to secure their patients’ protected health information (PHI) and any threat to the security of that data endangers medical laboratories and healthcare practices as well.

Therefore, recent coverage in The Guardian, which reported on how easily so-called “deidentified data” can be reidentified with just a few additional data points, should be of particular interest to clinical laboratory and health network managers and stakeholders.

Risky Balance Between Data Sharing and Privacy

In December 2017, University of Melbourne (UM) researchers Chris Culnane, PhD, Benjamin Rubinstein, and Vanessa Teague, PhD, published a report with the Cornell University Library detailing how they reidentified data listed in an open dataset of Australian medical billing records.

“We found that patients can be re-identified, without decryption, through a process of linking the unencrypted parts of the record with known information about the individual such as medical procedures and year of birth,” Culnane stated in a UM news release. “This shows the surprising ease with which de-identification can fail, highlighting the risky balance between data sharing and privacy.”

In a similar study published in Scientific Reports, Yves-Alexandre de Montjoye, PhD, a computational privacy researcher, analyzed location data on 1.5 million people from a mobile phone dataset collected over 15 months. He found that just four data points were enough to uniquely identify 95% of the people in the anonymized dataset. With only two data points, he could identify 50% of them.

“Location data is a fingerprint. It’s a piece of information that’s likely to exist across a broad range of data sets and could potentially be used as a global identifier,” de Montjoye told The Guardian.
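
This kind of result rests on a uniqueness check. Take a few points from one person's location trace and count how many people in the dataset share all of them. If only one person does, those few points act as a fingerprint. The Python below is a simplified, hypothetical illustration on invented toy data: the tower IDs, hours, and function names are all assumptions. It is not a reproduction of the study's method.

```python
import random
from typing import Dict, Set, Tuple

Point = Tuple[str, int]  # hypothetical (cell-tower ID, hour of day) observation


def is_unique(traces: Dict[str, Set[Point]], person: str, points: Set[Point]) -> bool:
    """True if `person` is the only one whose trace contains every point in `points`."""
    matches = [p for p, trace in traces.items() if points <= trace]
    return matches == [person]


def unicity(traces: Dict[str, Set[Point]], n_points: int, seed: int = 0) -> float:
    """Fraction of people singled out by n_points drawn at random from their own trace."""
    rng = random.Random(seed)
    eligible = [p for p, trace in traces.items() if len(trace) >= n_points]
    unique = sum(
        is_unique(traces, p, set(rng.sample(sorted(traces[p]), n_points)))
        for p in eligible
    )
    return unique / len(eligible)


traces = {
    "a": {("T1", 8), ("T2", 12), ("T3", 18)},
    "b": {("T1", 8), ("T2", 12), ("T4", 18)},
    "c": {("T1", 8), ("T5", 12), ("T3", 18)},
}
print(is_unique(traces, "a", {("T1", 8), ("T2", 12)}))   # False: "b" shares both points
print(is_unique(traces, "a", {("T2", 12), ("T3", 18)}))  # True: only "a" has both
print(unicity(traces, 3))                                # 1.0
```

Each added point can only shrink the set of people who match all of the known points. That is why a handful of locations can be enough to single someone out in a large dataset.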

The problem is exacerbated by the fact that everything we do online these days generates data—much of it open to the public. “If you want to be a functioning member of society, you have no ability to restrict the amount of data that’s being vacuumed out of you to a meaningful level,” Chris Vickery, a security researcher and Director of Cyber Risk Research at UpGuard, told The Guardian.

This privacy vulnerability isn’t restricted to users of the Internet and social media. In 2013, Latanya Sweeney, PhD, Professor and Director at Harvard’s Data Privacy Lab, performed a similar analysis on 579 participants in the Personal Genome Project who had provided their zip code, date of birth, and gender to be included in the dataset. She named 42% of the individuals analyzed, and the Personal Genome Project later confirmed 97% of the names she submitted, according to Forbes.

In testimony before the Privacy and Integrity Advisory Committee of the Department of Homeland Security (DHS), Sweeney stated, “One problem is that people don’t understand what makes data unique or identifiable. For example, in 1997 I was able to show how medical information that had all explicit identifiers, such as name, address and Social Security number removed could be reidentified using publicly available population registers (e.g., a voter list). In this particular example, I was able to show how the medical record of William Weld, the Governor of Massachusetts of the time, could be reidentified using only his date of birth, gender, and ZIP. In fact, 87% of the population of the United States is uniquely identified by date of birth (e.g., month, day, and year), gender, and their 5-digit ZIP codes. The point is that data that may look anonymous is not necessarily anonymous. Scientific assessment is needed.”
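
The linkage Sweeney describes works like a database join on quasi-identifiers. Records stripped of names are matched against a public register, such as a voter list, on date of birth, gender, and ZIP code. Any record whose combination of those fields matches exactly one registered person is reidentified. The minimal Python sketch below shows the idea; every record, name, and field name in it is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

QUASI_IDENTIFIERS = ("dob", "sex", "zip")  # hypothetical field names


def link_records(deidentified: List[dict], register: List[dict]) -> Dict[str, str]:
    """Map a deidentified record ID to a name when its quasi-identifiers match exactly one person."""
    index = defaultdict(list)
    for person in register:
        key = tuple(person[field] for field in QUASI_IDENTIFIERS)
        index[key].append(person["name"])

    matches = {}
    for record in deidentified:
        key = tuple(record[field] for field in QUASI_IDENTIFIERS)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # a unique combination reidentifies the record
            matches[record["record_id"]] = candidates[0]
    return matches


voter_list = [
    {"name": "Voter A", "dob": "1950-01-01", "sex": "M", "zip": "02138"},
    {"name": "Voter B", "dob": "1950-01-01", "sex": "F", "zip": "02138"},
    {"name": "Voter C", "dob": "1962-05-05", "sex": "F", "zip": "02139"},
    {"name": "Voter D", "dob": "1962-05-05", "sex": "F", "zip": "02139"},
]
medical = [
    {"record_id": "r1", "dob": "1950-01-01", "sex": "M", "zip": "02138", "diagnosis": "X"},
    {"record_id": "r2", "dob": "1962-05-05", "sex": "F", "zip": "02139", "diagnosis": "Y"},
    {"record_id": "r3", "dob": "1970-03-03", "sex": "M", "zip": "02140", "diagnosis": "Z"},
]
print(link_records(medical, voter_list))  # {'r1': 'Voter A'}
```

Record r2 escapes only because two registered voters share its combination, and r3 has no match at all. When most people are unique on these three fields, most records behave like r1.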

These studies reveal that, despite security standards such as the Privacy Rule in the Health Insurance Portability and Accountability Act of 1996 (HIPAA), the sheer amount of data available on the Internet makes it relatively easy to reidentify data that has been deidentified.

The Future of Privacy in Big Data

“Open publication of deidentified records like health, census, tax or Centrelink data is bound to fail, as it is trying to achieve two inconsistent aims: the protection of individual privacy and publication of detailed individual records,” Dr. Teague noted in the UM news release. “We need a much more controlled release in a secure research environment, as well as the ability to provide patients greater control and visibility over their data.”

While evidence is mounting that deidentified information is vulnerable, there has been little movement to fix the issue. Nevertheless, clinical laboratories should carefully consider any decision to sell anonymized (aka, blinded) patient data for data mining purposes. The data may still contain enough identifying information to be used inappropriately. (See Dark Daily, “Coverage of Alexion Investigation Highlights the Risk to Clinical Laboratories That Sell Blinded Medical Data,” June 21, 2017.)

Should regulators and governments address the issue, clinical laboratories and healthcare providers could face more stringent regulations on sharing both identified and deidentified data, along with increased liability and responsibility for its governance and safekeeping.

Until then, healthcare professionals and researchers should weigh the implications of sharing deidentified data, for patients and businesses alike, in case that data is used in unexpected and potentially malicious ways.

—Jon Stone

Related Information:

‘Data Is a Fingerprint’: Why You Aren’t as Anonymous as You Think Online

Research Reveals De-Identified Patient Data Can Be Re-Identified

Health Data in an Open World

The Simple Process of Re-Identifying Patients in Public Health Records

Harvard Professor Re-Identifies Anonymous Volunteers in DNA Study

How Someone Can Re-Identify Your Medical Records

Trading in Medical Data: Is this a Headache or An Opportunity for Pathologists and Clinical Laboratories

Coverage of Alexion Investigation Highlights the Risk to Clinical Laboratories That Sell Blinded Medical Data

Tiny Faroe Islands to Begin Sequencing Genomes of All 50,000 Residents in Ambitious Effort to Advance Personalized Medicine

Because of its isolation from the worldwide gene pool for the past 1,200 years, the Faroese population is vulnerable to recessive genetic disorders

Because the cost of DNA sequencing has fallen dramatically and is still falling, an ambitious project is launching with the goal of sequencing the full DNA of all 50,000 residents of the Faroe Islands. When completed, this project has the potential to reshape molecular diagnostics and clinical laboratory testing.

FarGen is the name of this effort, and pathologists and clinical laboratory managers will want to follow its progress. Organizers of this unique effort expect that it will speed the adoption of personalized medicine in mainstream clinical care. This tiny, self-governing Danish territory, located between Iceland and Norway, is moving forward with plans to decipher complete DNA sequences for every one of its 50,000 citizens.
