The proof-of-concept experiment showed data can be encoded in DNA and retrieved using automated systems, a development that may have positive significance for clinical laboratories
It may seem far-fetched, but computer scientists and research groups have worked for years to discover whether it is possible to store data on deoxyribonucleic acid (DNA). Now, Microsoft Research (MR) and the University of Washington (UW) have achieved just that, and the implications of their success could be far-reaching.
Clinical pathologists are increasingly performing DNA sequencing in their medical laboratories to identify biomarkers for disease, help clinicians understand their patients’ risk for a specific disease, and track the progression of a disease. The ability to store data in DNA would take that to another level and could have an impact on diagnostic pathology. Pathologists familiar with DNA sequencing may find a whole new area of medical service open to them.
The MR/UW researchers recently demonstrated a fully automated system that encoded data into DNA and then recovered the information as digital data. “In a simple proof-of-concept test, the team successfully encoded the word ‘hello’ in snippets of fabricated DNA and converted it back to digital data using a fully automated end-to-end system,” Microsoft stated in a news release.
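To make the idea of encoding a word such as "hello" in DNA concrete, here is a minimal sketch in Python. It assumes a simple mapping of two bits per nucleotide (A, C, G, T), which is an illustrative choice and not the scheme the MR/UW team used; the function names are hypothetical. Practical systems generally add redundancy for error correction, which this sketch omits.

```python
# Illustrative mapping only: two bits per nucleotide. This is an assumption
# made for demonstration, not the encoding used by the MR/UW researchers.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_to_dna(text: str) -> str:
    """Turn text into a string of bases: UTF-8 bytes -> bits -> bases."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_from_dna(strand: str) -> str:
    """Reverse the mapping: bases -> bits -> UTF-8 bytes -> text."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


strand = encode_to_dna("hello")
print(strand)                   # CGGACGCCCGTACGTACGTT
print(decode_from_dna(strand))  # hello
```

Under this mapping each byte becomes four bases, so the five-letter word occupies 20 nucleotides.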
DNA’s Potential Storage Capacity and Why We Need It
Thus far, the challenge of using DNA for data storage has been that there wasn’t a way to easily code and retrieve the information. That, however, seems to be changing quite rapidly. Several major companies have invested heavily in research, with consumer offerings expected soon.
At Microsoft Research, ‘consumer interest’ in genetic testing has driven the research into using DNA for data storage. “As people get better access to their own DNA, why not also give them the ability to read any kind of data written in DNA?” asked Doug Carmean, an Architect at Microsoft, during an interview with Wired.
Scientists are interested in using DNA for data storage because humanity is creating more data than ever before, and the pace is accelerating. Currently, most of that data is stored on tape, which is inexpensive but has drawbacks. Tape degrades and has to be replaced every 10 years or so. DNA, on the other hand, can last for thousands of years.
Tape also takes up an enormous amount of physical space compared to DNA. One single gram of DNA can hold 215 petabytes (215 million gigabytes) of data. Wired puts the storage capacity of DNA into perspective: “Imagine formatting every movie ever made into DNA; it would be smaller than the size of a sugar cube. And it would last for 10,000 years.”
Victor Zhirnov, Chief Scientist at Semiconductor Research Corporation, says the worries over storage space aren’t simply theoretical. “Today’s technology is already close to the physical limits of scaling,” he told Wired, which stated, “Five years ago humans had produced 4.4 zettabytes of data; that’s set to explode to 160 zettabytes (each year!) by 2025. Current infrastructure can handle only a fraction of the coming data deluge, which is expected to consume all the world’s microchip-grade silicon by 2040.”
MIT Technology Review agrees, stating, “Humanity is creating information at an unprecedented rate—some 16 zettabytes every year. And this rate is increasing. Last year, the research group IDC calculated that we’ll be producing over 160 zettabytes every year by 2025.”
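The figures quoted above can be put side by side with a little arithmetic. The sketch below takes the 215-petabytes-per-gram density and the 160-zettabytes-per-year projection at face value, and assumes decimal units (1 petabyte = 10^15 bytes, 1 zettabyte = 10^21 bytes). It ignores any encoding overhead, so the results are a rough lower bound, not a forecast.

```python
# Decimal (SI) byte units, assumed for this back-of-the-envelope estimate.
GIGABYTE = 10**9
PETABYTE = 10**15
ZETTABYTE = 10**21

# Density figure quoted in the article: 215 petabytes per gram of DNA.
DNA_BYTES_PER_GRAM = 215 * PETABYTE


def grams_of_dna_for(zettabytes: float) -> float:
    """Grams of DNA needed to hold the given number of zettabytes."""
    return zettabytes * ZETTABYTE / DNA_BYTES_PER_GRAM


gigabytes_per_gram = DNA_BYTES_PER_GRAM // GIGABYTE  # 215,000,000 GB
grams_per_zettabyte = grams_of_dna_for(1)            # about 4,651 g
kg_for_2025_output = grams_of_dna_for(160) / 1000    # about 744 kg

print(f"{gigabytes_per_gram:,} GB per gram")
print(f"{grams_per_zettabyte:,.0f} g of DNA per zettabyte")
print(f"{kg_for_2025_output:,.0f} kg of DNA for 160 zettabytes")
```

On these assumptions, a full year of projected 2025 data output would fit in well under a metric ton of DNA.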
Heavy Investment by Major Players
The whole concept may seem like something out of a science fiction story, but the fact that businesses are investing real dollars in it suggests that DNA data storage will likely be a reality in the near future. Currently, there are a few barriers, but work is under way to overcome them.
First, the cost of synthesizing DNA in a medical laboratory for the specific purpose of data storage must come down for the solution to become viable. Second, the sequencing process used to read the information must also become less expensive. And third is the problem of how to extract the data stored in the DNA.
In a paper published in ASPLOS ‘16, the MR/UW scientists wrote: “Today, neither the performance nor the cost of DNA synthesis and sequencing is viable for data storage purposes. However, they have historically seen exponential improvements. Their cost reductions and throughput improvements have been compared to Moore’s Law in Carlson’s Curves … Important biotechnology applications such as genomics and the development of smart drugs are expected to continue driving these improvements, eventually making data storage a viable application.”
Automation appears to be the final piece of the puzzle. Currently, too much human labor is necessary for DNA to be used efficiently as data storage.
It may take some time before DNA becomes a viable medium for data storage. However, savvy pathology laboratory managers should be aware of, and possibly prepared for, this coming opportunity.
While it’s unlikely the average consumer will see much difference in how they save and retrieve data, medical laboratories with the ability to sequence DNA may find themselves very much in demand because of their expertise in sequencing DNA and interpreting gene sequences.
Studies show consumer genealogy databases are much broader than is generally known. If your cousins are in such a database, it’s likely you are too.
Recent news stories highlighted crime investigators who used the DNA data in consumer genetic genealogy databases to solve cold cases. Though not widely known, such uses of direct-to-consumer DNA databases are becoming more commonplace, which might eventually lead to requests for clinical laboratories to assist in criminal investigations involving DNA data.
Case in point: investigators found the Golden State Killer, a serial killer/rapist/burglar who terrorized multiple California counties over a dozen years in the 1970s to 1980s, after uploading a DNA sample from the crime scene to GEDmatch, an open-data genomics database that features tools for genealogy research. They made the arrest after discovering a distant relative’s DNA in the genealogy database and matching it to the suspect, CBS News revealed in a 60 Minutes Overtime online report.
These and other investigators are using a technique called familial DNA testing (AKA, DNA Profiling), which enables them to use genetic material from relatives to solve crimes.
Clinical laboratories oversee DNA databases. Could DNA databases—developed and managed over years by medical laboratories for patient care—be subpoenaed by law enforcement investigating crimes?
The question raises many issues for society and for labs, including privacy responsibilities and appropriate use of genetic information. On the other hand, the genetic genie is already out of the bottle.
Leveraging Familial DNA to Solve Crimes a New Trend
Indeed, the use of familial DNA testing is moving forward. The Verge reported 19 cold case samples have been identified in recent familial DNA testing and public database searches. It also said two new published studies may propel the technique further.
One study, published in the journal Science, suggests nearly every American of European ancestry may soon be identified through familial DNA testing.
The other study, published in Cell, shows that a person’s relatives can be detected when forensic DNA data are compared with consumer genetic databases.
Noah Rosenberg, PhD (above left), Professor of Population Genetics and Society Biology at Stanford University, is shown above working with Jaehee Kim, PhD (right), a Postdoctoral Research Fellow in Biology, on math that could be used to track down relatives in genealogy databases based on forensic DNA. “This could be a way of expanding the reach of forensic genetics, potentially for solving even more cold cases. But at the same time, it could be exposing participants in those databases to forensic searches they might not have anticipated,” he told Wired. (Photo copyright: Stanford University/L.A. Cicero.)
15 Million People Already in Genealogy Databases
Researchers at Columbia University in New York and Hebrew University of Jerusalem told Science they were motivated by the recent trend of investigations leveraging third-party consumer genomics services to find criminals. But they perceived a gap.
“The big limitation is coverage. And even if you find an individual it requires complex analysis from that point,” Yaniv Erlich, PhD, Associate Professor at Columbia and Chief Science Officer at MyHeritage, told The Verge. MyHeritage is an online genealogy platform.
Others offering consumer genetic testing and family history exploration include 23andMe and Ancestry. As of April 2018, more than 15 million people have participated in direct-to-consumer genetic testing, the researchers noted.
The study aimed to find the likelihood that a person can be identified using a long-range familial search. It included these steps and findings:
Statistical analysis of 1.28 million people in the MyHeritage database;
Pairs of people with close “identity-by-descent” relationships, such as first cousins and closer, were removed to avoid bias;
Researchers aimed to find a third cousin or closer relative for each person in the database;
60% of the 1.28 million people were matched with a third cousin or closer relative.
“We project that about 60% of the searches for individuals of European-descent will result in a third cousin or closer match, which can allow their identification using demographic identifiers. Moreover, the technique could implicate nearly any US individual of European descent in the near future,” the researchers wrote.
In an interview with Wired, Erlich added, “The takeaway is it doesn’t matter if you’ve been tested or not tested. You can be identified because the databases already cover such large fractions of the US—at least for European ancestry.”
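A toy probability model helps show why a database that holds only a small share of a population can still reach most of its members through relatives. This is not the researchers' method, which is more detailed. It assumes that each relative is in the database independently with probability equal to the database's coverage, and it uses a hypothetical count of 200 third-cousin-or-closer relatives per person, chosen only for illustration.

```python
def match_probability(coverage: float, relatives: int) -> float:
    """Chance that at least one of `relatives` people is in a database.

    Assumes each relative is included independently with probability
    `coverage` (the fraction of the population in the database).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if relatives < 0:
        raise ValueError("relatives must be non-negative")
    return 1.0 - (1.0 - coverage) ** relatives


# Hypothetical value for illustration only; not a figure from the study.
HYPOTHETICAL_RELATIVES = 200

for coverage in (0.001, 0.005, 0.01, 0.02):
    p = match_probability(coverage, HYPOTHETICAL_RELATIVES)
    print(f"coverage {coverage:.1%}: chance of at least one match {p:.0%}")
```

Because the chance of finding no relative at all shrinks geometrically with the number of relatives, the match probability climbs quickly even at low coverage. That is the intuition behind Erlich's point that being tested yourself matters little.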
Matching Forensic and Consumer Genetic Data
Meanwhile, the study published in Cell by researchers at Stanford University, University of California, Davis, and the University of Michigan also suggests investigators could compare forensic DNA samples with consumer genetic databases to find people related to criminals.
That study found:
30% to 32% of people in a forensic database could be related to a child or parent in a consumer database;
35% to 36% could be tied to a sibling.
These studies reveal that genetic data and familial DNA testing can help law enforcement find suspects, which is a good thing for society. But people who uploaded DNA data to some direct-to-consumer databases may find themselves caught up in searches they do not know about. So may their cousins.
Dark Daily recently covered other similar studies that showed it takes just one person’s DNA to reveal genetic information on an entire family. (See, “The Problems with Ancestry DNA Analyses,” October 18, 2018.) These developments in the use of DNA databases to identify criminals should be an early warning to clinical laboratories building databases of genetic information that, at some future point, law enforcement agencies might want access to those databases as part of ongoing criminal investigations.