News, Analysis, Trends, Management Innovations for
Clinical Laboratories and Pathology Groups

Hosted by Robert Michel


University of Washington and Microsoft Research Encode Data into DNA, Demonstrating Potential New Use for Genetic Sequences

The proof-of-concept experiment showed data can be encoded in DNA and retrieved using automated systems, a development that may have positive significance for clinical laboratories

It may seem far-fetched, but computer scientists and research groups have worked for years to discover whether it is possible to store data in deoxyribonucleic acid (DNA). Now, Microsoft Research (MR) and the University of Washington (UW) have achieved just that, and the implications of their success could be far-reaching.

Clinical pathologists are increasingly performing DNA sequencing in their medical laboratories to identify biomarkers for disease, help clinicians understand their patients’ risk for a specific disease, and track the progression of a disease. The ability to store data in DNA would take that to another level and could have an impact on diagnostic pathology. Pathologists familiar with DNA sequencing may find a whole new area of medical service open to them.

The MR/UW researchers recently demonstrated a fully automated system that encoded data into DNA and then recovered the information as digital data. “In a simple proof-of-concept test, the team successfully encoded the word ‘hello’ in snippets of fabricated DNA and converted it back to digital data using a fully automated end-to-end system,” Microsoft stated in a news release.
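To make the idea concrete, here is a minimal sketch of how text could be mapped onto the four DNA bases and back. It assumes a naive two-bits-per-base mapping (00 to A, 01 to C, 10 to G, 11 to T); that mapping and the function names are illustrative assumptions, not the encoding the MR/UW team used. Practical schemes also add error correction and avoid sequences that are hard to synthesize or read, which this sketch ignores.

```python
# Hypothetical mapping for illustration: each pair of bits becomes one base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_to_dna(text: str) -> str:
    """Convert text to a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_from_dna(strand: str) -> str:
    """Convert a base string produced by encode_to_dna back to text."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


strand = encode_to_dna("hello")
print(strand)                   # 20 bases for the 5-byte word
print(decode_from_dna(strand))  # round-trips back to the original text
```

Under this toy mapping, every byte costs four bases, so the five-letter word occupies 20 bases before any redundancy is added.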

The MR/UW team published their findings in Nature Scientific Reports.

DNA’s Potential Storage Capacity and Why We Need It

Thus far, the challenge of using DNA for data storage has been that there wasn’t a way to easily code and retrieve the information. That, however, seems to be changing quite rapidly. Several major companies have invested heavily in research, with consumer offerings expected soon.

At Microsoft Research, consumer interest in genetic testing has driven the research into using DNA for data storage. “As people get better access to their own DNA, why not also give them the ability to read any kind of data written in DNA?” asked Doug Carmean, an Architect at Microsoft, during an interview with Wired.

Scientists are interested in using DNA for data storage because humanity is creating more data than ever before, and the pace is accelerating. Currently, most of that data is stored on tape, which is inexpensive, but has drawbacks. Tape degrades and has to be replaced every 10 years or so. But DNA, on the other hand, lasts for thousands of years!

“DNA won’t degrade over time like cassette tapes and CDs, and it won’t become obsolete,” Yaniv Erlich, PhD, Chief Science Officer at MyHeritage, an online genealogy platform located in Israel, and Associate Professor, Columbia University, told Science Mag.

Tape also takes up an enormous amount of physical space compared to DNA. One single gram of DNA can hold 215 petabytes (215 million gigabytes) of data. Wired puts the storage capacity of DNA into perspective: “Imagine formatting every movie ever made into DNA; it would be smaller than the size of a sugar cube. And it would last for 10,000 years.”

Researchers at the University of Washington claim, “All the movies, images, emails and other digital data from more than 600 basic smartphones (10,000 gigabytes) can be stored in the faint pink smear of DNA at the end of this test tube.” (Photo and caption copyright: Tara Brown/University of Washington.)

Victor Zhirnov, Chief Scientist at Semiconductor Research Corporation, says the worries over storage space aren’t simply theoretical. “Today’s technology is already close to the physical limits of scaling,” he told Wired, which stated, “Five years ago humans had produced 4.4 zettabytes of data; that’s set to explode to 160 zettabytes (each year!) by 2025. Current infrastructure can handle only a fraction of the coming data deluge, which is expected to consume all the world’s microchip-grade silicon by 2040.”

MIT Technology Review agrees, stating, “Humanity is creating information at an unprecedented rate—some 16 zettabytes every year. And this rate is increasing. Last year, the research group IDC calculated that we’ll be producing over 160 zettabytes every year by 2025.”
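A quick back-of-the-envelope calculation puts the quoted figures side by side. The sketch below assumes decimal units (1 zettabyte = 1,000,000 petabytes) and takes the 215-petabytes-per-gram density at face value; that density is a theoretical figure, so the results are rough orders of magnitude, not practical capacity estimates. The function name is illustrative.

```python
PETABYTES_PER_GRAM = 215             # density figure quoted in the article
PETABYTES_PER_ZETTABYTE = 1_000_000  # decimal (SI) units assumed


def grams_of_dna(zettabytes: float) -> float:
    """Mass of DNA needed to hold the given volume at the quoted density."""
    return zettabytes * PETABYTES_PER_ZETTABYTE / PETABYTES_PER_GRAM


for label, zettabytes in [
    ("Current yearly output (16 ZB)", 16),
    ("Projected yearly output by 2025 (160 ZB)", 160),
]:
    kilograms = grams_of_dna(zettabytes) / 1000
    print(f"{label}: about {kilograms:,.0f} kg of DNA")
```

At that density, a year of projected 2025 data output would fit in well under a metric ton of DNA.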

Heavy Investment by Major Players

The whole concept may seem like something out of a science fiction story, but the fact that businesses are investing real dollars into it is evidence that DNA for data storage will likely be a reality in the near future. Currently, there are a few barriers, but work is underway to overcome them.

First, the cost of synthesizing DNA in a medical laboratory for the specific purpose of data storage must become cheaper for the solution to be viable. Second, the sequencing process to read the information must also become less expensive. And third is the problem of how to extract the data stored in the DNA.

In a paper published in ASPLOS ‘16, the MR/UW scientists wrote: “Today, neither the performance nor the cost of DNA synthesis and sequencing is viable for data storage purposes. However, they have historically seen exponential improvements. Their cost reductions and throughput improvements have been compared to Moore’s Law in Carlson’s Curves … Important biotechnology applications such as genomics and the development of smart drugs are expected to continue driving these improvements, eventually making data storage a viable application.”

Automation appears to be the final piece of the puzzle. Currently, too much human labor is necessary for DNA to be used efficiently as data storage.

“Our ultimate goal is to put a system into production that, to the end user, looks very much like any other cloud storage service—bits are sent to a datacenter and stored there and then they just appear when the customer wants them,” said Microsoft principal researcher Karin Strauss (above), in the Microsoft news release. “To do that, we needed to prove that this is practical from an automation perspective.” (Photo copyright: Microsoft Research/YouTube.)

It may take some time before DNA becomes a viable medium for data storage. However, savvy pathology laboratory managers should be aware of, and possibly prepared for, this coming opportunity.

While it’s unlikely the average consumer will see much difference in how they save and retrieve data, medical laboratories with the ability to sequence DNA may find themselves very much in demand because of their expertise in sequencing DNA and interpreting gene sequences.

—Dava Stewart

Related Information:

With a “Hello,” Microsoft and UW Demonstrate First Fully Automated DNA Data Storage

Demonstration of End-to-End Automation of DNA Data Storage

UW Team Stores Digital Images in DNA—and Retrieves Them Perfectly

Microsoft and UW Demonstrate First Fully Automated DNA Data Storage

Storing Data in DNA Is A Lot Easier than Getting It Back Out

DNA Could Store All of the World’s Data in One Room

The Rise of DNA Data Storage

Forget Silicon—SQL On DNA Is the Next Frontier for Databases

Popularity of Direct-To-Consumer Genetic Tests Still Growing, Regardless of Concerns from Provider and Privacy Organizations

For blood brothers Quest and LabCorp this is good news, since the two medical laboratory companies perform most of the testing for the biggest DTC genetic test developers

Should clinical laboratories be concerned about direct-to-consumer (DTC) genetic tests? Despite alerts from healthcare organizations about the accuracy of DTC genetic testing—as well as calls from privacy organizations to give DTC customers more control over the use of their genetic data—millions of people have already taken DTC tests to learn about their genetic ancestry. And millions more are expected to send samples of their saliva to commercial DTC companies in the near future.

This growing demand for at-home DTC tests does not appear to be subsiding. And since most of the genetic testing is completed by the two largest lab companies—Quest Diagnostics (NYSE:DGX) and Laboratory Corporation of America (NYSE:LH)—other medical laboratories have yet to find their niche in the DTC industry.

Another factor is the recent FDA authorization allowing DTC company 23andMe to report the results of its pharmacogenetic (PGx) test directly to customers without requiring a doctor’s order. For these reasons, this trend looks to be gaining momentum and support from federal governing organizations.

How will clinical pathology laboratories ultimately be impacted?

Data, Data, Where’s the Data?

Dark Daily has reported on DTC genetic testing for many years. According to MIT’s Technology Review, 26 million people—roughly 8% of the US population—have already taken at-home DNA tests. And that number is expected to balloon to more than 100 million in the next 24 months!

“The genetic genie is out of the bottle. And it’s not going back,” Technology Review reports.

The vast majority of the genetic information gathered goes into the databases of just four companies, with the top two, Ancestry and 23andMe, leading by a wide margin. The other two major players are FamilyTreeDNA and MyHeritage. However, Ancestry and 23andMe have invested heavily in online and television advertising, which is paying off.


In an op-ed response to a NYT editorial that warned readers to avoid 23andMe’s DTC genetic testing, 23andMe CEO and co-founder Anne Wojcicki (above) wrote, “We believe that consumers can learn about genetic information without the help of a medical professional, and we have the data to support that claim.” The FDA agreed and in February approved 23andMe to report pharmacogenetic test results directly to its customers. How this will play out for clinical laboratories remains to be seen. (Photo copyright: Inc.com.)

As more people add their data to a given database, the likelihood they will find connections within that database increases. This is called the Network Effect (aka, demand-side economies of scale) and social media platforms grow in a similar manner. Because Ancestry and 23andMe have massive databases, they have more information and can make more connections for their customers. This has made it increasingly difficult for other companies to compete.
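One rough way to see why scale matters is to count the possible pairwise comparisons among profiles, which grows much faster than the number of profiles. The sketch below is only an illustration: it treats the 26 million and 100 million totals cited earlier as if they sat in a single database, which they do not, and the function name is hypothetical.

```python
def possible_pairs(members: int) -> int:
    """Number of distinct pairs among `members` profiles: n * (n - 1) / 2."""
    return members * (members - 1) // 2


current_members = 26_000_000     # at-home tests taken so far, per the article
projected_members = 100_000_000  # projected total, per the article

growth_in_members = projected_members / current_members
growth_in_pairs = possible_pairs(projected_members) / possible_pairs(current_members)

print(f"Members grow by a factor of {growth_in_members:.1f}")
print(f"Possible pairings grow by a factor of {growth_in_pairs:.1f}")
```

Because pairings scale roughly with the square of membership, a database a few times larger offers many times more chances of a match, which is the advantage the largest companies hold.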

Quest Diagnostics and LabCorp do the actual gene sequencing for the top players in the DTC genetic testing sector. The expected wave of new DTC genetic test customers (74 million in the next 24 months) will certainly have a beneficial revenue impact on those two lab companies.

Why the Explosion in Genetic Testing by Consumers?

In 2013, just over 100,000 people took tests to have their DNA analyzed, mostly using Ancestry’s test, as Dark Daily reported. By 2017, that number had risen to around 12 million, and though Ancestry still had the majority market share, 23andMe was clearly becoming a force in the industry, noted Technology Review.

Given the reports of privacy concerns and the difficulty removing one’s genetic data from the Internet once it is online, why are people so eager to spit in those little tubes? There are several reported reasons, including:

And now there are several health-related reasons as well. For example, the study of pharmacogenetics has led clinicians to understand that certain genes reveal how our bodies process some medications. The FDA’s clearance allows 23andMe to directly inform customers about “genetic variants that may be associated with a patient’s ability to metabolize some medications to help inform discussions with a healthcare provider. The FDA is authorizing the test to detect 33 variants for multiple genes,” the FDA’s press release noted.

Controversy Over DTC Genetic Tests

The use of DTC genetic tests for healthcare purposes is not without scrutiny by regulatory agencies. The FDA removed 23andMe’s original health test from the market in 2013. According to Technology Review, the FDA’s letter was “one of the angriest ever sent to a private company” and said “that the company’s gene predictions were inaccurate and dangerous for those who might not fully understand the results.”

23andMe continues to refine its DTC tests. However, the debate continues. In February of this year, the New York Times (NYT) editorial board published an op-ed warning consumers to be wary of health tests offered by 23andMe, saying the tests “look for only a handful of [genetic] errors that may or may not elevate your risk of developing the disease in question. And they don’t factor into their final analysis other information, like family history.”

Anne Wojcicki, CEO and co-founder of 23andMe, responded with her own op-ed to the NYT, titled, “23andMe Responds: Empowering Consumers.” In her letter, Wojcicki contends that people should be empowered to take control of their own health, and that 23andMe allows them to do just that. “While 23andMe is not a diagnostic test for individuals with a strong family history of disease, it is a powerful and accurate screening tool that allows people to learn about themselves and some of the most common clinically useful genetic conditions,” she wrote.

Nevertheless, privacy concerns remain:

  • Who owns the results, the company or the consumer?
  • Who can access them?
  • What happens to them a year or five years after the test is taken?
  • When they are sold or used, are consumers informed?

Even as experts question the accuracy of DTC genetic testing in a healthcare context, and privacy concerns continue to grow, more people each year are ordering the tests. With predictions of 74 million more tests expected in the next 24 months, it’s certain that the medical laboratories that process those tests will benefit.

-Dava Stewart

Related Information:

More than 26 Million People Have Taken an At-Home Ancestry Test

How a DNA Testing Kit Revealed a Family Secret Hidden for 54 Years

23andMe Sells Data for Drug Search

Why You Should Be Careful About 23andMe’s Health Test

23andMe Responds: Empowering Consumers

Police Are Using Genetic Testing Companies to Track Down Criminals

The Problems with Ancestry DNA Analyses

FDA Authorizes 23andMe to Report Results of Direct-to-Consumer Pharmacogenetics Test to Customers without a Prescription, Bypassing Doctors and Clinical Laboratories

Erasing ‘DNA Footprint’ from the Internet Proves Difficult for Consumers Who Provide Data to Genetic Testing Companies

FDA Authorizes First Direct-To-Consumer Test for Detecting Genetic Variants That May Be Associated with Medication Metabolism

How DNA Databases Help Investigators Solve Crimes; Will Clinical Laboratories Be Asked to Help?

Studies show consumer genealogy databases are much broader than is generally known. If your cousins are in such a database, it’s likely you are too

Recent news stories highlighted crime investigators who used the DNA data in consumer genetic genealogy databases to solve cold cases. Though not widely known, such use of direct-to-consumer DNA databases is becoming more commonplace, which might eventually lead to requests for clinical laboratories to assist in criminal investigations involving DNA data.

Case in point: investigators found the Golden State Killer, a serial killer/rapist/burglar who terrorized multiple California counties over a dozen years in the 1970s to 1980s, after uploading a DNA sample from the crime scene to GEDmatch, an open-data genomics database that features tools for genealogy research. They made the arrest after discovering a distant relative’s DNA in the genealogy database and matching it to the suspect, CBS News revealed in a 60 Minutes Overtime online report.

These and other investigators are using a technique called familial DNA testing (AKA, DNA Profiling), which enables them to use genetic material from relatives to solve crimes.

Clinical laboratories oversee DNA databases. Could DNA databases—developed and managed over years by medical laboratories for patient care—be subpoenaed by law enforcement investigating crimes?

The question raises many issues for society and for labs, including privacy responsibilities and appropriate use of genetic information. On the other hand, the genetic genie is already out of the bottle.

Leveraging Familial DNA to Solve Crimes a New Trend

“The solving of the Golden State Killer case opened this method up as a possibility, and other crime labs are taking advantage of it. Clearly, a trend has started,” Ruth Dickover, PhD, Director of Forensic Science, University of California, Davis, told the Los Angeles Times.

Indeed, the use of familial DNA testing is moving forward. The Verge reported 19 cold case samples have been identified in recent familial DNA testing and public database searches. It also said two new published studies may propel the technique further.

One study, published in the journal Science, suggests nearly every American of European ancestry may soon be identified through familial DNA testing.

The other study, published in Cell, shows that a person’s relatives can be detected when forensic DNA data are compared with consumer genetic databases.


Noah Rosenberg, PhD (above left), Professor of Population Genetics and Society Biology at Stanford University, is shown above working with Jaehee Kim, PhD (right), a Postdoctoral Research Fellow in Biology, on math that could be used to track down relatives in genealogy databases based on forensic DNA. “This could be a way of expanding the reach of forensic genetics, potentially for solving even more cold cases. But at the same time, it could be exposing participants in those databases to forensic searches they might not have anticipated,” he told Wired. (Photo copyright: Stanford University/L.A. Cicero.)

15 Million People Already in Genealogy Databases

Researchers at Columbia University in New York and Hebrew University of Jerusalem told Science they were motivated by the recent trend of investigations leveraging third-party consumer genomics services to find criminals. But they perceived a gap.

“The big limitation is coverage. And even if you find an individual it requires complex analysis from that point,” Yaniv Erlich, PhD, Associate Professor at Columbia and Chief Science Officer at MyHeritage, told The Verge. MyHeritage is an online genealogy platform.

Others offering consumer genetic testing and family history exploration include 23andMe and Ancestry. As of April 2018, more than 15 million people have participated in direct-to-consumer genetic testing, the researchers noted.

The study aimed to find the likelihood that a person can be identified using a long-range familial search. It included these steps and findings:

  • Statistical analysis of 1.28 million people in the MyHeritage database;
  • Pairs of people with close “identity-by-descent” relationships, such as first cousins and closer, were removed to avoid bias;
  • Researchers aimed at finding a third cousin or closer relatives for each person in the database;
  • 60% of the 1.28 million people were matched with a third cousin or closer relative.

“We project that about 60% of the searches for individuals of European-descent will result in a third cousin or closer match, which can allow their identification using demographic identifiers. Moreover, the technique could implicate nearly any US individual of European descent in the near future,” the researchers wrote.

In an interview with Wired, Erlich added, “The takeaway is it doesn’t matter if you’ve been tested or not tested. You can be identified because the databases already cover such large fractions of the US—at least for European ancestry.”

Matching Forensic and Consumer Genetic Data

Meanwhile, the study published in Cell by researchers at Stanford University, University of California, Davis, and the University of Michigan also suggests investigators could compare forensic DNA samples with consumer genetic databases to find people related to criminals.

That study found:

  • 30% to 32% of people in a forensic database could be related to a child or parent in a consumer database;
  • 35% to 36% could be tied to a sibling.

These studies reveal that genetic data and familial DNA testing can help law enforcement find suspects, which is a good thing for society. But people who uploaded DNA data to some direct-to-consumer databases may find themselves caught up in searches they do not know about. So may their cousins.

Dark Daily recently covered other similar studies that showed it takes just one person’s DNA to reveal genetic information on an entire family. (See, “The Problems with Ancestry DNA Analyses,” October 18, 2018.) These developments in the use of DNA databases to identify criminals should be an early warning to clinical laboratories building databases of genetic information that, at some future point, law enforcement agencies might want access to those databases as part of ongoing criminal investigations.

—Donna Marie Pocius

Related Information:

Could Your DNA Help Solve a Cold Case?

So Many People Have Had Their DNA Sequenced That They’ve Put Other People’s Privacy in Jeopardy

The DNA Technique That Caught the Golden State Killer is More Powerful than We Thought

Identity Inference of Genomic Data Using Long-Range Familial Searches

Statistical Detection of Relatives Typed with Disjoint Forensic and Biomedical Loci

Genome Hackers Show No One’s DNA is Anonymous Anymore

Stanford Researchers Discover a New Way to Find Relatives from Forensic DNA

The Problems with Ancestry DNA Analyses
