Scientists Encode Malware with Synthesized DNA That Targets DNA Analysis Software Commonly Found in Gene Sequencers Used by Clinical Laboratories

Researchers demonstrated it was feasible to encode digital malware onto a strand of synthesized DNA and infect the gene sequencers and computer networks used by medical laboratories

As if anatomic pathology groups and clinical laboratory leaders don’t already have enough to think about, here comes a security vulnerability right out of a sci-fi thriller. Researchers at the University of Washington (UW) have encoded digital malware into a physical strand of synthesized DNA that, once sequenced, is capable of establishing a remote connection to the computer network on which the sequenced DNA is read!

Stated differently, researchers have now demonstrated that it is possible for bad guys to hack into a medical laboratory’s instrument systems and computer network using a physical strand of synthesized DNA that is encoded with digital malware.

Another Threat to Clinical Laboratories, Pathology Groups?

Does this translate into an immediate security issue for medical laboratories? For now, the threat is only theoretical. While researchers did succeed, their study findings should provide some comfort to pathology groups or medical laboratories worried about the implications of DNA-based malware. The UW researchers published their findings at the 2017 USENIX Security Symposium.

Synthetic DNA Malware Exploit is More Proof-of-Concept than Immediate Threat

At its core, computer code is similar to DNA in that both are built from a fixed set of symbols. Binary code uses two (zeroes and ones), while DNA uses four bases. This led UW researchers to question whether they could translate the A, G, C, and T bases (adenine, guanine, cytosine, and thymine) of DNA into binary code capable of hacking DNA sequencers and accessing the information they contain.
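To make the idea concrete, here is a minimal Python sketch of one way bytes could be mapped onto bases at two bits per base. The mapping (A=00, C=01, G=10, T=11) and the function names are assumptions chosen for illustration; this article does not specify the scheme the UW team used.

```python
# Hypothetical two-bit mapping, chosen for illustration only.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def bytes_to_dna(data: bytes) -> str:
    """Turn each byte into four bases, most significant bits first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Reverse the mapping; the strand must hold a whole number of bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4 bases")
    bits = "".join(BITS_FOR_BASE[base] for base in strand.upper())
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = bytes_to_dna(b"hi")
    print(strand, dna_to_bytes(strand))
```

A round trip through both functions returns the original bytes, which is the property any such encoding would need before a payload could survive sequencing.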

In an article in The Atlantic, Tadayoshi Kohno, PhD, Short-Dooley Professor in the Department of Computer Science and Engineering at UW, who led the research team, noted that, “The present-day threat is very small, and people don’t need to lose sleep immediately. But we wanted to know what was possible and what the issues are down the line.”

Complexity of Engineering a DNA-Powered Computer Virus

To begin the process, researchers needed to create a specific DNA strand encoded with the exact sequence of bases that would later convert into their exploit. An article in Ars Technica suggests this would be a challenge due to the physical properties of DNA’s double-helix design.

In the article, John Timmer, PhD, wrote, “DNA with Gs and Cs forms a stronger double-helix. Too many of them, and the strand won’t open up easily for sequencing. Too few, and it’ll pop open when you don’t want it to.”
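The balance Timmer describes can be checked by measuring the share of G and C bases in a candidate strand. The sketch below is illustrative only: the function names and the 40% to 60% window are assumptions, not thresholds taken from the study or from any synthesis vendor.

```python
def gc_fraction(strand: str) -> float:
    """Return the fraction of bases in the strand that are G or C."""
    if not strand:
        raise ValueError("strand must not be empty")
    strand = strand.upper()
    return (strand.count("G") + strand.count("C")) / len(strand)


def is_gc_balanced(strand: str, low: float = 0.4, high: float = 0.6) -> bool:
    """Check a strand against a hypothetical acceptable GC window."""
    return low <= gc_fraction(strand) <= high


if __name__ == "__main__":
    for candidate in ("GCGCGCGC", "ATATATAT", "ATGCATGC"):
        print(candidate, gc_fraction(candidate), is_gc_balanced(candidate))
```

A payload that encodes to a strand outside the chosen window would have to be rewritten, which is one reason finding a workable sequence can take several attempts.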

The study shows it took multiple attempts to find a DNA sequence that would both carry the malware code and withstand the synthesizing and sequencing processes. Even then, researchers needed an exploit for the software used on sequencers in clinical laboratories and other diagnostics providers to prove their theory. Study authors used their own modified version of an open-source sequencing software, adding an exploit they could target, instead of a version of the software already publicly in use.

Lee Organick (above left), Karl Koscher (center), and Peter Ney (right) worked with Luis Ceze and Tadayoshi Kohno, PhD, at the University of Washington to develop the DNA sequence containing the malware code. The researchers determined that it was feasible for the gene instruments used by clinical laboratories to be infected with the malware, which could then move to infect a clinical lab’s computer network. (Photo copyright: University of Washington.)

With their DNA synthesized and customized software in place, researchers still faced challenges getting the code to trigger. “With reads randomly appearing in an FASTQ file,” the researchers noted, “we would expect the modified program to be exploited 37.4% of the time.”

As with genetic code, the binary code of a program is highly sensitive to errors. Any misread base or split in the code resulted in failure. Because sequencers read only a few hundred bases at a time, ensuring the malware code does not straddle one of these splits is a challenge.

One unique difference between binary and genetic code also caused trouble—genetic sequences aren’t direction dependent, while binary sequences are. If the code is read in reverse, it won’t execute properly.
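The direction problem can be illustrated with a short sketch. A sequencer may read the opposite strand, which yields the reverse complement of the intended sequence. Decoding that read with a two-bit mapping (A=00, C=01, G=10, T=11, a hypothetical scheme not taken from the study) produces different bytes from the ones that were encoded.

```python
# Hypothetical two-bit mapping, chosen for illustration only.
BITS_FOR_BASE = {"A": "00", "C": "01", "G": "10", "T": "11"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the sequence as it would be read from the opposite strand."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def decode(strand: str) -> bytes:
    """Decode four bases per byte using the hypothetical mapping above."""
    bits = "".join(BITS_FOR_BASE[base] for base in strand.upper())
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    forward = "CAAC"
    backward = reverse_complement(forward)
    # The two reads describe the same molecule but decode to different bytes.
    print(forward, decode(forward))
    print(backward, decode(backward))
```

Biological analysis treats both reads as the same molecule, but a program that interprets the bases as instructions only works in one orientation.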

Future Concerns for Clinical Laboratories and Genetic Researchers

Today, the threat to medical laboratories and the sensitive data generated by sequencing is minor. However, tomorrow that threat could be more common.

In a WIRED article on the subject, Jason Callahan, Chief Information Security Officer for Illumina, stated, “This is interesting research about potential long-term risks. We agree with the premise of the study—that this does not pose an imminent threat and is not a typical cyber security capability.”

Don Rule, founder of Translational Software, agrees. When asked about the threat posed to clinical laboratories, he said, “… if you have to pre-introduce the hack in the analytics program, this is a pretty circuitous way to take over a computer. I can see how it is feasible and right now Norton Antivirus is not looking for viruses encoded in the AGCT code set, but we are right not to lose a lot of sleep over it.”

However, as genetic sequencing becomes a common part of medicine, attackers might have increased reason to disrupt services or intercept data. The UW researchers cite “important domains like forensics, medicine, and agriculture” as potential targets.

While their successful attack was highly engineered, their research into open-source sequencing software revealed a range of common security weaknesses. Many clinical laboratories and anatomic pathology groups also run proprietary analysis software or use hardware with embedded software.

The UW researchers recommend that medical laboratories work to centralize software updates and create ways to verify data and patches through digital signatures or other secure measures.
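As a rough illustration of what verifying a patch can look like, the sketch below compares a downloaded file's SHA-256 digest with a value published by the vendor. This is a simpler stand-in for a full digital signature, which would also prove who issued the patch. The function names and sample data are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def patch_matches_digest(patch: bytes, expected_hex: str) -> bool:
    """Compare the patch digest with the published one in constant time."""
    return hmac.compare_digest(sha256_hex(patch), expected_hex.strip().lower())


if __name__ == "__main__":
    # Hypothetical patch contents and vendor-published digest.
    patch = b"sequencer-firmware-update"
    published = sha256_hex(patch)
    print(patch_matches_digest(patch, published))
    print(patch_matches_digest(patch + b"tampered", published))
```

A digest check only helps if the published value arrives over a channel the attacker cannot alter, which is why signed updates are the stronger option.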

Already, genetic researchers take care to avoid synthesizing potentially dangerous sequences, and to contain tests and data. But this study shows that not all threats come from within the research or clinical laboratory environment. Both engineers of sequencing technology and hardware—and the medical laboratories using them—will need to optimize operations and monitor trends closely to see how security issues evolve alongside sequencing capabilities.

—Jon Stone

Related Information:

These Scientists Took Over a Computer by Encoding Malware in DNA

Computer Security and Privacy in DNA Sequencing

Computer Security, Privacy, and DNA Sequencing: Compromising Computers with Synthesized DNA, Privacy Leaks, and More

This Speck of DNA Contains a Movie, a Computer Virus, and an Amazon Gift Card

Researchers Encode Malware in DNA, Compromise DNA Sequencing Software

Biohackers Encoded Malware in a Strand of DNA

The Ultimate Virus: How Malware Encoded in Synthesized DNA Can Compromise a Computer System

Researchers Hacked into DNA and Encoded It with Malware