The proof-of-concept experiment showed data can be encoded in DNA and retrieved using automated systems, a development that may have positive significance for clinical laboratories
It may seem far-fetched, but computer scientists and research groups have worked for years to determine whether data can be stored in deoxyribonucleic acid (DNA). Now, Microsoft Research (MR) and the University of Washington (UW) have achieved just that, and the implications of their success could be far-reaching.
Clinical pathologists increasingly perform DNA sequencing in their medical laboratories to identify biomarkers for disease, help clinicians understand their patients’ risk for a specific disease, and track the progression of a disease. The ability to store data in DNA would take that to another level and could have an impact on diagnostic pathology. Pathologists familiar with DNA sequencing may find a whole new area of medical service open to them.
The MR/UW researchers recently demonstrated a fully automated system that encoded data into DNA and then recovered the information as digital data. “In a simple proof-of-concept test, the team successfully encoded the word ‘hello’ in snippets of fabricated DNA and converted it back to digital data using a fully automated end-to-end system,” Microsoft stated in a news release.
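To make the idea of encoding a word in DNA concrete, here is a minimal sketch that maps each pair of bits to one of the four bases. The mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration; the news release does not describe the encoding the MR/UW team used, and practical schemes typically add error correction and avoid long runs of the same base.

```python
# Illustrative 2-bits-per-base mapping (an assumption, not the MR/UW encoding).
BASES = "ACGT"  # index 0..3 corresponds to bit pairs 00, 01, 10, 11


def encode(text: str) -> str:
    """Turn UTF-8 text into a DNA-style string, four bases per byte."""
    out = []
    for byte in text.encode("utf-8"):
        # Walk the byte two bits at a time, most significant pair first.
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> str:
    """Invert encode(): rebuild each byte from a group of four bases."""
    data = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return data.decode("utf-8")


strand = encode("hello")
print(strand)          # CGGACGCCCGTACGTACGTT
print(decode(strand))  # hello
```

Under this toy mapping, the five-letter word becomes a 20-base sequence, and decoding it returns the original text.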
DNA’s Potential Storage Capacity and Why We Need It
Thus far, the challenge of using DNA for data storage has been that there was no easy way to encode and retrieve the information. That, however, seems to be changing quite rapidly. Several major companies have invested heavily in research, with consumer offerings expected soon.
At Microsoft Research, ‘consumer interest’ in genetic testing has driven the research into using DNA for data storage. “As people get better access to their own DNA, why not also give them the ability to read any kind of data written in DNA?” asked Doug Carmean, an Architect at Microsoft, during an interview with Wired.
Scientists are interested in using DNA for data storage because humanity is creating more data than ever before, and the pace is accelerating. Currently, most of that data is stored on tape, which is inexpensive but has drawbacks. Tape degrades and has to be replaced every 10 years or so. DNA, on the other hand, lasts for thousands of years!
“DNA won’t degrade over time like cassette tapes and CDs, and it won’t become obsolete,” Yaniv Erlich, PhD, Chief Science Officer at MyHeritage, an online genealogy platform located in Israel, and Associate Professor, Columbia University, told Science Mag.
Tape also takes up an enormous amount of physical space compared to DNA. A single gram of DNA can hold 215 petabytes (215 million gigabytes) of data. Wired puts the storage capacity of DNA into perspective: “Imagine formatting every movie ever made into DNA; it would be smaller than the size of a sugar cube. And it would last for 10,000 years.”
Victor Zhirnov, Chief Scientist at Semiconductor Research Corporation, says the worries over storage space aren’t simply theoretical. “Today’s technology is already close to the physical limits of scaling,” he told Wired, which stated, “Five years ago humans had produced 4.4 zettabytes of data; that’s set to explode to 160 zettabytes (each year!) by 2025. Current infrastructure can handle only a fraction of the coming data deluge, which is expected to consume all the world’s microchip-grade silicon by 2040.”
MIT Technology Review agrees, stating, “Humanity is creating information at an unprecedented rate—some 16 zettabytes every year. And this rate is increasing. Last year, the research group IDC calculated that we’ll be producing over 160 zettabytes every year by 2025.”
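The figures quoted above can be combined in a quick back-of-the-envelope calculation. The sketch below assumes decimal (SI) units and takes the 215-petabytes-per-gram density at face value, ignoring redundancy, indexing, and any other overhead a real system would need, so the result is a rough lower bound rather than a forecast.

```python
# Back-of-the-envelope arithmetic using figures quoted in the article.
PETABYTE = 10**15   # bytes (decimal SI units, an assumption)
ZETTABYTE = 10**21  # bytes

BYTES_PER_GRAM = 215 * PETABYTE     # density quoted for one gram of DNA
ANNUAL_DATA_2025 = 160 * ZETTABYTE  # IDC projection for yearly data by 2025


def grams_of_dna(num_bytes: int) -> float:
    """Mass of DNA needed at the quoted density, ignoring all overhead."""
    return num_bytes / BYTES_PER_GRAM


kilograms = grams_of_dna(ANNUAL_DATA_2025) / 1000
print(f"{kilograms:,.0f} kg of DNA per year")  # 744 kg of DNA per year
```

At the quoted density, a full year of projected 2025 data would weigh roughly three-quarters of a metric ton in DNA.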
Heavy Investment by Major Players
The whole concept may seem like something out of a science fiction story, but the fact that businesses are investing real dollars in it is evidence that DNA data storage will likely be a reality in the near future. Currently, there are a few barriers, but work is under way to overcome them.
First, the cost of synthesizing DNA in a medical laboratory for the specific purpose of data storage must fall for the solution to become viable. Second, the sequencing process used to read the information must also become less expensive. And third is the problem of how to extract the data stored in the DNA.
In a paper published in ASPLOS ’16, the MR/UW scientists wrote: “Today, neither the performance nor the cost of DNA synthesis and sequencing is viable for data storage purposes. However, they have historically seen exponential improvements. Their cost reductions and throughput improvements have been compared to Moore’s Law in Carlson’s Curves … Important biotechnology applications such as genomics and the development of smart drugs are expected to continue driving these improvements, eventually making data storage a viable application.”
Automation appears to be the final piece of the puzzle. Currently,
too much human labor is necessary for DNA to be used efficiently as data
storage.
It may take some time before DNA becomes a viable medium for
data storage. However, savvy pathology laboratory managers should be aware of,
and possibly prepared for, this coming opportunity.
While it’s unlikely the average consumer will see much difference in how they save and retrieve data, medical laboratories may find themselves very much in demand because of their expertise in sequencing DNA and interpreting gene sequences.
Researchers in Boston are working to develop DNA as a low-cost, effective way to store data; could lead to new molecular technology industries outside of healthcare
Even as new insights about the role of DNA in various human diseases and health conditions continue to tumble out of research labs, a potential new use for DNA is emerging. A research team in Boston is exploring how to use DNA as a low-cost, reliable way to store and retrieve data.
This has implications for the nation’s clinical laboratories and anatomic pathology groups, because they are gaining experience in sequencing DNA, then storing that data for analysis and use in clinical care settings. If using DNA as a data storage methodology were to become reality, medical laboratories could be expected to have the skillsets, experience, and information technology infrastructure already in place to offer a DNA-based data storage service. This would be particularly true for patient data and healthcare data.
Finding a way to reduce the cost of data storage is a primary reason why scientists are looking at ways that DNA could be used as a data storage technology. These scientists and technology developers seek ways to alleviate the world’s over-crowded hard drives, cloud servers, and databases. They hope this can be done by developing technologies that store digital information in artificially-made versions of DNA molecules.
The research so far suggests DNA data storage could be used to store data more effectively than existing data storage solutions. If this proves true, DNA-based data storage technologies could play a key role in industries outside of healthcare.
If so, practical knowledge of DNA handling and storage would be critical to these companies’ success. In turn, this could present unique opportunities for medical laboratory professionals.
DNA Data Storage: Durable but Costly
Besides enormous capacity, DNA-based data storage technology offers durability and long shelf life in a compact footprint, compared to other data storage mediums.
“DNA has an information-storage density several orders of magnitude higher than any other known storage technology,” Victor Zhirnov, PhD, Chief Scientist and Director, Semiconductor Research Corporation, told Wired.
However, projected costs are quite high because writing information into DNA is expensive. Catalog Technologies Inc. of Boston thinks it has a solution.
Rather than producing billions of unique bits of DNA, as Microsoft did while developing its own DNA data storage solution, Catalog’s approach is to “cheaply generate large quantities of just a few different DNA molecules, none longer than 30 base pairs. Then [use] billions of enzymatic reactions to encode information into the recombination patterns of those prefab bits of DNA. Instead of mapping one bit to one base pair, bits are arranged in multidimensional matrices, and sets of molecules represent their locations in each matrix.”
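The idea of letting sets of molecules represent locations in a matrix can be illustrated with a toy example. The sketch below is not Catalog's method: the component names (`R0`, `C0`, and so on), the 4-by-4 grid, and the rule that a 1-bit is recorded by the presence of its (row, column) pair are all assumptions made for illustration. It only shows why a handful of prefabricated pieces can address many bit positions.

```python
from itertools import product

# Hypothetical prefab components; real sequences are not given in the article.
ROWS = [f"R{i}" for i in range(4)]
COLS = [f"C{j}" for j in range(4)]


def encode_bits(bits: str) -> set:
    """Record each 1-bit as the (row, column) pair that addresses it."""
    if len(bits) != len(ROWS) * len(COLS):
        raise ValueError("expected exactly 16 bits")
    cells = product(ROWS, COLS)  # grid positions in row-major order
    return {cell for cell, bit in zip(cells, bits) if bit == "1"}


def decode_bits(identifiers: set) -> str:
    """Rebuild the bit string by checking which pairs are present."""
    return "".join(
        "1" if cell in identifiers else "0" for cell in product(ROWS, COLS)
    )


message = "1000010000100001"
pairs = encode_bits(message)
print(sorted(pairs))
print(decode_bits(pairs) == message)
```

In this toy grid, eight distinct components are enough to address sixteen bit positions, and the gap widens quickly as dimensions are added.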
The Boston-based company plans to launch an industrial-scale DNA data storage service using a machine that can write a terabyte of data per day by leveraging 500 trillion DNA molecules, according to Wired. Potential customers include the entertainment industry, federal government, and information technology developers.
Catalog is supported by $9 million from investors. However, it is not the only company working on this. Microsoft and other companies are reportedly working on DNA storage projects as well.
“It’s a new generation of information storage technology that’s got a million times the information density, compared to flash storage. You can shrink down entire data centers into shoeboxes of DNA,” Catalog’s CEO, Hyunjun Park, PhD, told the Boston Globe.
Microsoft, University of Washington’s Synthetic DNA Data Storage
Microsoft and researchers at the University of Washington (UW) have made progress on their development of a DNA-based storage system for digital data, according to a news release. What makes their work unique, they say, is the large-scale storage of data in synthetic DNA (200 megabytes) along with the ability to retrieve the data as needed.
“Synthetic DNA is durable and can encode digital data with high density, making it an attractive medium for data storage. However, recovering stored data on a large-scale currently requires all the DNA in a pool to be sequenced, even if only a subset of the information needs to be extracted,” the researchers wrote in their paper published in Nature Biotechnology.
“Here, we encode and store 35 distinct files (over 200 megabytes of data) in more than 13-million DNA oligonucleotides and show that we can recover each file individually and with no errors, using a random access approach,” the researchers explained.
“Our work reduces the effort, both in sequencing capacity and in processing, to completely recover information stored in DNA,” Sergey Yekhanin, PhD, Microsoft Senior Researcher, told Digital Trends.
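The benefit of random access can be sketched in software terms. The example below assumes each stored fragment carries a file-specific address tag and a chunk index, so a single file can be pulled out of a mixed pool without reading everything. The tag names, chunk size, and tuple layout are hypothetical, and the sketch does not model the laboratory steps the researchers used.

```python
import random


def build_pool(files: dict, chunk_size: int = 4) -> list:
    """Split each file into chunks tagged with (address_tag, chunk_index)."""
    pool = []
    for tag, data in files.items():
        for start in range(0, len(data), chunk_size):
            pool.append((tag, start // chunk_size, data[start:start + chunk_size]))
    return pool


def read_file(pool: list, tag: str) -> bytes:
    """Select only fragments carrying `tag`, then reassemble by chunk index."""
    selected = sorted((idx, payload) for t, idx, payload in pool if t == tag)
    return b"".join(payload for _, payload in selected)


# Hypothetical file names and contents, for illustration only.
files = {"fileA": b"hello world", "fileB": b"DNA storage"}
pool = build_pool(files)
random.Random(0).shuffle(pool)  # fragments in a pool have no inherent order

print(read_file(pool, "fileA"))
print(read_file(pool, "fileB"))
```

Because every fragment names the file it belongs to and its position, one file can be recovered from the shuffled pool without decoding the others.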
Successful research by Catalog, Microsoft, and others may soon lead to the launch of marketable DNA data storage services. And medical laboratory professionals who already know the code—the life code that is—will likely find themselves more marketable as well!