With everyone from academics to Microsoft looking at the prospect of storing data using DNA, it was probably inevitable that someone would start looking at the security implications. Apparently, they’re worse than most people might have expected. It turns out it’s possible to encode computer malware in DNA and use it to attack vulnerabilities on the computer that analyzes the sequence of that DNA.
The researchers didn’t find an actual vulnerability in DNA analysis software—instead, they specifically made a version of some software with an exploitable vulnerability to show that the risk is more than hypothetical. Still, an audit of some open source DNA analysis software shows that the academics who have been writing it haven’t been paying much attention to security best practices.
More like a virus than most
DNA sequencing involves determining the precise order of the bases that make up a DNA strand. While the process that generates the sequence is generally some combination of biology and/or chemistry, once it’s read, the sequence is typically stored as an ASCII string of As, Ts, Cs, and Gs. If handled improperly, that chunk of data could exploit vulnerable software to get it to execute arbitrary code. And DNA sequences tend to see a lot of software, which find overlapping sequences, align it to known genomes, look for key differences, and more.
To see whether this threat was more than hypothetical, the researchers started with a really simple exploit: store more data than a chunk of memory was intended to hold, and redirect program execution to the excess. In this case, said excess contained an exploit that would use a feature of the bash shell to connect into a remote server that the researchers controlled. If it worked, the server would then have full shell access to the machine running the DNA analysis software.
Actually implementing that in DNA, however, turned out to be challenging. DNA with Gs and Cs forms a stronger double-helix. Too many of them, and the strand won’t open up easily for sequencing. Too few, and it’ll pop open when you don’t want it to. Repetitive DNA can form complex structures that get in the way of all the enzymes we normally use to manipulate DNA. The computer code they wanted to use, however, had lots of long runs of the same character, which made for a repetitive sequence that was very low in Gs and Cs. The company they were ordering DNA from…