News, Analysis, Trends, Management Innovations for
Clinical Laboratories and Pathology Groups

Hosted by Robert Michel


University of Washington and Microsoft Research Encode Data into DNA, Demonstrating Potential New Use for Genetic Sequences

The proof-of-concept experiment showed data can be encoded in DNA and retrieved using automated systems, a development that may have positive significance for clinical laboratories

It may seem far-fetched, but computer scientists and research groups have worked for years to determine whether data can be stored in deoxyribonucleic acid (DNA). Now, Microsoft Research (MR) and the University of Washington (UW) have automated that process from end to end, and the implications of their success could be far-reaching.

Clinical pathologists are increasingly performing DNA sequencing in their medical laboratories to identify disease biomarkers, help clinicians understand their patients’ risk for specific diseases, and track disease progression. The ability to store data in DNA would take that work to another level and could have an impact on diagnostic pathology. Pathologists familiar with DNA sequencing may find a whole new area of medical service open to them.

The MR/UW researchers recently demonstrated a fully automated system that encoded data into DNA and then recovered the information as digital data. “In a simple proof-of-concept test, the team successfully encoded the word ‘hello’ in snippets of fabricated DNA and converted it back to digital data using a fully automated end-to-end system,” Microsoft stated in a news release.

The MR/UW team published their findings in Scientific Reports, a Nature Research journal.
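The news release does not describe how the bits were turned into bases. As a rough illustration only, the sketch below uses the simplest possible scheme: an assumed mapping of each pair of bits to one of DNA's four bases (A, C, G, T). This is a hypothetical toy, not the MR/UW codec; practical DNA storage systems add addressing, error correction, and constraints on which base sequences are allowed, none of which appear here.

```python
# Hypothetical 2-bits-per-base mapping, for illustration only.
# It is NOT the MR/UW encoding, which adds indexing and error correction.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(text: str) -> str:
    """Turn text into a DNA base string: 8 bits per byte, 2 bits per base."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> str:
    """Reverse the mapping: bases back to bits, bits back to bytes, bytes to text."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


strand = encode("hello")
print(strand)          # CGGACGCCCGTACGTACGTT
print(decode(strand))  # hello
```

At two bits per base, each byte becomes four bases, so the five letters of "hello" need 20 bases before any indexing or error-correction overhead is added.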

DNA’s Potential Storage Capacity and Why We Need It

Thus far, the challenge of using DNA for data storage has been that there was no easy way to encode and retrieve the information. That, however, seems to be changing quite rapidly. Several major companies have invested heavily in research, with consumer offerings expected soon.

At Microsoft Research, consumer interest in genetic testing has driven the research into using DNA for data storage. “As people get better access to their own DNA, why not also give them the ability to read any kind of data written in DNA?” asked Doug Carmean, an Architect at Microsoft, during an interview with Wired.

Scientists are interested in using DNA for data storage because humanity is creating more data than ever before, and the pace is accelerating. Currently, much of that data is archived on tape, which is inexpensive but has drawbacks: tape degrades and has to be replaced every 10 years or so. DNA, on the other hand, can last for thousands of years.

“DNA won’t degrade over time like cassette tapes and CDs, and it won’t become obsolete,” Yaniv Erlich, PhD, Chief Science Officer at MyHeritage, an online genealogy platform located in Israel, and Associate Professor, Columbia University, told Science Mag.

Tape also takes up an enormous amount of physical space compared to DNA. A single gram of DNA can hold 215 petabytes (roughly 215 million gigabytes) of data. Wired puts the storage capacity of DNA into perspective: “Imagine formatting every movie ever made into DNA; it would be smaller than the size of a sugar cube. And it would last for 10,000 years.”

Researchers at the University of Washington claim, “All the movies, images, emails and other digital data from more than 600 basic smartphones (10,000 gigabytes) can be stored in the faint pink smear of DNA at the end of this test tube.” (Photo and caption copyright: Tara Brown/University of Washington.)

Victor Zhirnov, Chief Scientist at Semiconductor Research Corporation, says the worries over storage space aren’t simply theoretical. “Today’s technology is already close to the physical limits of scaling,” he told Wired, which stated, “Five years ago humans had produced 4.4 zettabytes of data; that’s set to explode to 160 zettabytes (each year!) by 2025. Current infrastructure can handle only a fraction of the coming data deluge, which is expected to consume all the world’s microchip-grade silicon by 2040.”

MIT Technology Review agrees, stating, “Humanity is creating information at an unprecedented rate—some 16 zettabytes every year. And this rate is increasing. Last year, the research group IDC calculated that we’ll be producing over 160 zettabytes every year by 2025.”
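To put these figures side by side, the back-of-envelope calculation below combines two numbers quoted in this article: a density of 215 petabytes per gram of DNA and IDC's projection of 160 zettabytes produced per year by 2025. It assumes decimal (SI) units and ideal packing, ignoring redundancy, error correction, and the physical containers a real archive would need, so the result is an idealized lower bound rather than an engineering estimate.

```python
# Decimal (SI) storage units, in bytes.
PETABYTE = 10**15
ZETTABYTE = 10**21

# Figures quoted in the article; ideal packing is an assumption.
density_bytes_per_gram = 215 * PETABYTE
yearly_data_bytes = 160 * ZETTABYTE

# Grams of DNA needed to hold one year of projected data at that density.
grams_needed = yearly_data_bytes / density_bytes_per_gram
print(f"{grams_needed:,.0f} g, about {grams_needed / 1000:,.0f} kg")
# 744,186 g, about 744 kg
```

Under those assumptions, a full year of projected global data would fit in roughly three-quarters of a metric ton of DNA.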

Heavy Investment by Major Players

The whole concept may seem like something out of a science fiction story, but the fact that businesses are investing real dollars in it suggests that DNA data storage will likely become a reality in the near future. Several barriers remain, but work to overcome them is under way.

First, synthesizing DNA in a medical laboratory for the specific purpose of data storage must become cheaper for the solution to be viable. Second, the sequencing process used to read the information must also become less expensive. Third, there is the problem of how to extract the data stored in the DNA.

In a paper published in ASPLOS ’16, the MR/UW scientists wrote: “Today, neither the performance nor the cost of DNA synthesis and sequencing is viable for data storage purposes. However, they have historically seen exponential improvements. Their cost reductions and throughput improvements have been compared to Moore’s Law in Carlson’s Curves … Important biotechnology applications such as genomics and the development of smart drugs are expected to continue driving these improvements, eventually making data storage a viable application.”

Automation appears to be the final piece of the puzzle. Currently, too much human labor is necessary for DNA to be used efficiently as data storage.

“Our ultimate goal is to put a system into production that, to the end user, looks very much like any other cloud storage service: bits are sent to a datacenter and stored there and then they just appear when the customer wants them,” said Microsoft principal researcher Karin Strauss (above), in the Microsoft news release. “To do that, we needed to prove that this is practical from an automation perspective.” Microsoft Research has also released a video on the DNA storage process. (Photo copyright: Microsoft Research/YouTube.)

It may take some time before DNA becomes a viable medium for data storage. However, savvy pathology laboratory managers should be aware of, and possibly prepared for, this coming opportunity.

While it’s unlikely the average consumer will see much difference in how they save and retrieve data, medical laboratories with the ability to sequence DNA may find themselves very much in demand because of their expertise in sequencing DNA and interpreting gene sequences.

—Dava Stewart

Related Information:

With a “Hello,” Microsoft and UW Demonstrate First Fully Automated DNA Data Storage

Demonstration of End-to-End Automation of DNA Data Storage

UW Team Stores Digital Images in DNA—and Retrieves Them Perfectly

Microsoft and UW Demonstrate First Fully Automated DNA Data Storage

Storing Data in DNA Is A Lot Easier than Getting It Back Out

DNA Could Store All of the World’s Data in One Room

The Rise of DNA Data Storage

Forget Silicon—SQL On DNA Is the Next Frontier for Databases

How DNA Databases Help Investigators Solve Crimes; Will Clinical Laboratories Be Asked to Help?

Studies show consumer genealogy databases are much broader than is generally known. If your cousins are in such a database, it’s likely you are too

Recent news stories have highlighted crime investigators who used the DNA data in consumer genetic genealogy databases to solve cold cases. Though not widely known, such uses of direct-to-consumer DNA databases are becoming more commonplace, which might eventually lead to requests for clinical laboratories to assist in criminal investigations involving DNA data.

Case in point: investigators identified the Golden State Killer, a serial killer, rapist, and burglar who terrorized multiple California counties over a dozen years from the 1970s to the 1980s, after uploading a DNA sample from the crime scene to GEDmatch, an open-data genomics database that features tools for genealogy research. They made the arrest after finding a distant relative’s DNA in the genealogy database and tracing that family connection to the suspect, CBS News revealed in a 60 Minutes Overtime online report.

These and other investigators are using a technique called familial DNA testing (also known as familial searching), which enables them to use genetic material from relatives to solve crimes.

Many clinical laboratories maintain DNA databases. Could DNA databases developed and managed over years by medical laboratories for patient care be subpoenaed by law enforcement agencies investigating crimes?

The question raises many issues for society and for labs, including privacy responsibilities and appropriate use of genetic information. On the other hand, the genetic genie is already out of the bottle.

Leveraging Familial DNA to Solve Crimes Is a New Trend

“The solving of the Golden State Killer case opened this method up as a possibility, and other crime labs are taking advantage of it. Clearly, a trend has started,” Ruth Dickover, PhD, Director of Forensic Science, University of California, Davis, told the Los Angeles Times.

Indeed, the use of familial DNA testing is moving forward. The Verge reported 19 cold case samples have been identified in recent familial DNA testing and public database searches. It also said two new published studies may propel the technique further.

One study, published in the journal Science, suggests nearly every American of European ancestry may soon be identified through familial DNA testing.

The other study, published in Cell, shows that a person’s relatives can be detected when forensic DNA data are compared with consumer genetic databases.


Noah Rosenberg, PhD (above left), Professor of Population Genetics and Society in the Department of Biology at Stanford University, is shown above working with Jaehee Kim, PhD (right), a Postdoctoral Research Fellow in Biology, on math that could be used to track down relatives in genealogy databases based on forensic DNA. “This could be a way of expanding the reach of forensic genetics, potentially for solving even more cold cases. But at the same time, it could be exposing participants in those databases to forensic searches they might not have anticipated,” he told Wired. (Photo copyright: Stanford University/L.A. Cicero.)

15 Million People Already in Genealogy Databases

Researchers at Columbia University in New York and Hebrew University of Jerusalem told Science they were motivated by the recent trend of investigations leveraging third-party consumer genomics services to find criminals. But they perceived a gap.

“The big limitation is coverage. And even if you find an individual it requires complex analysis from that point,” Yaniv Erlich, PhD, Associate Professor at Columbia and Chief Science Officer at MyHeritage, told The Verge. MyHeritage is an online genealogy platform.

Others offering consumer genetic testing and family history exploration include 23andMe and Ancestry. As of April 2018, more than 15 million people had participated in direct-to-consumer genetic testing, the researchers noted.

The study aimed to find the likelihood that a person can be identified using a long-range familial search. It included these steps and findings:

  • Statistical analysis of 1.28 million people in the MyHeritage database;
  • To avoid bias, pairs of people with close “identity-by-descent” relationships, such as first cousins or closer, were removed;
  • For each person in the database, researchers searched for a third cousin or closer relative;
  • 60% of the 1.28 million people were matched with a third cousin or closer relative.

“We project that about 60% of the searches for individuals of European-descent will result in a third cousin or closer match, which can allow their identification using demographic identifiers. Moreover, the technique could implicate nearly any US individual of European descent in the near future,” the researchers wrote.

In an interview with Wired, Erlich added, “The takeaway is it doesn’t matter if you’ve been tested or not tested. You can be identified because the databases already cover such large fractions of the US—at least for European ancestry.”

Matching Forensic and Consumer Genetic Data

Meanwhile, the study published in Cell by researchers at Stanford University, University of California, Davis, and the University of Michigan also suggests investigators could compare forensic DNA samples with consumer genetic databases to find people related to criminals.

That study found:

  • 30% to 32% of people in a forensic database could be related to a child or parent in a consumer database;
  • 35% to 36% could be tied to a sibling.

These studies reveal that genetic data and familial DNA testing can help law enforcement find suspects, which is a good thing for society. But people who uploaded DNA data to some direct-to-consumer databases may find themselves caught up in searches they do not know about. So may their cousins.

Dark Daily recently covered other similar studies that showed it takes just one person’s DNA to reveal genetic information on an entire family. (See, “The Problems with Ancestry DNA Analyses,” October 18, 2018.) These developments in the use of DNA databases to identify criminals should be an early warning to clinical laboratories building databases of genetic information that, at some future point, law enforcement agencies might want access to those databases as part of ongoing criminal investigations.

—Donna Marie Pocius

Related Information:

Could Your DNA Help Solve a Cold Case?

So Many People Have Had Their DNA Sequenced That They’ve Put Other People’s Privacy in Jeopardy

The DNA Technique That Caught the Golden State Killer is More Powerful than We Thought

Identity Inference of Genomic Data Using Long-Range Familial Searches

Statistical Detection of Relatives Typed with Disjoint Forensic and Biomedical Loci

Genome Hackers Show No One’s DNA is Anonymous Anymore

Stanford Researchers Discover a New Way to Find Relatives from Forensic DNA

The Problems with Ancestry DNA Analyses
