Could DNA be the solution to the data storage shortage?

With technical progress driving our modern society forward, both the amount of data and the size of individual files are growing massively. It is therefore crucial to store all these precious bits of information on a compact and stable medium. USB sticks, CDs and hard drives have served us well over the last decades. However, there are current efforts to store data on a perhaps unexpected medium: DNA. This sounds crazy at first, but if you think about it, why not? Broken down to a basic level, DNA is nothing but a storage medium, containing all of the information required to construct and maintain a living organism! The system behind DNA has evolved over millions of years, maximising the amount of storable information in the simplest form possible, so why not use it to our benefit?

DNA as a storage medium

Let’s dive into how data storage in DNA would function in practice! DNA is a very special molecule consisting of the nucleotides A, T, G and C bonded to each other in specific sequences, such as genes. According to the central dogma of biology, these sequences of nucleotides are transcribed into RNA and then translated into a protein corresponding to that specific sequence of bases. But what if these sequences instead corresponded to certain data, a letter or maybe a number? This is the principle of Nucleic Acid Memory, NAM. The unique combination of nucleotides could be used to store and relay information long term, much like a hard drive or an information management system, and could be translated back into the binary code that modern machines use to store digital data [1].

The process of storing data in DNA can be described by the following steps, which are also illustrated in the picture below [2].

  1. Encoding digital information by translation into a nucleotide sequence
    One proposed strategy for this is to directly map the binary data of a digital file to the four bases of DNA, for example: 00→A, 01→C, 10→G and 11→T [3]. This strategy stores two bits of digital information per nucleotide.
  2. Synthesizing (creating) the DNA molecule that will contain the data
    There are many methods available today that allow us to synthetically create DNA sequences from nucleotides.
  3. Storing the DNA in physical (in some device) or biological (in a cell) conditions
    The DNA needs to be protected from moisture and oxygen to prevent decay caused by, for example, UV radiation, oxidation and hydrolysis [2].
  4. Random access to retrieve the data stored in the molecule
    To use DNA effectively as a storage medium, it must be possible to access and extract the desired data from a pool consisting of a complex mixture of DNA. This could be achieved by amplifying the desired sequence with address-specific primers that bind to the desired DNA, or by attaching a magnetic tag to the sequence that can be used to separate the DNA from the pool [2].
  5. Data readout via DNA sequencing
    Recent advances in this field include a 70-gram hand-held sequencing machine developed by Oxford Nanopore Technologies that could enable fast sequencing outside the lab [1].
  6. Decoding the DNA sequence back into the original digital code
Picture from Emerging Approaches to DNA Data Storage: Challenges and Prospects by Andrea Doricchi et al. [2]
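
To make steps 1 and 6 a bit more tangible, here is a minimal Python sketch of the direct mapping mentioned above (00→A, 01→C, 10→G, 11→T). The function names `encode` and `decode` are our own and purely illustrative. This is not the code from [3], and a real pipeline would add addressing and error correction on top.

```python
# Mapping from step 1: two bits of digital information per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate raw bytes into a nucleotide sequence (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Translate a nucleotide sequence back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = "iGEM".encode("ascii")
dna = encode(message)
print(dna)                          # CGGCCACTCACCCATC
print(decode(dna).decode("ascii"))  # iGEM
```

Every byte becomes exactly four nucleotides, so the four-letter word "iGEM" turns into a strand of 16 bases.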

The major problem with this strategy is that DNA sequencing is prone to errors such as deletions of nucleotides or the substitution of one nucleotide for another. Without thorough mechanisms to correct these errors and ensure perfect data recovery, data loss would be guaranteed. There are, however, ways to solve this. According to “Reading and writing digital data in DNA” by Linda C. Meiser et al., an additional step can be added after the encoding step to ensure that the data can be recovered from the DNA sequence without losses or errors [3]. They have developed computer code that automatically modifies the DNA sequence in a reversible way to guard against sequencing errors [3]. Another strategy to protect the data’s integrity is to avoid the sequencing step altogether by using fluorescent probes that bind to the DNA and can be read by microscopy [6].

Picture from Reading and writing digital data in DNA by Linda C. Meiser et al. [3]
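
To get a feeling for how redundancy can protect against substitution errors, here is a small Python sketch that takes a majority vote across several reads of the same strand. Please note that this is a simplified illustration of our own and not the method developed by Meiser et al. [3]. The function name `consensus` and the example reads are made up, and the sketch cannot handle deletions or insertions, which shift every following position.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Return the most common base at each position across equal-length reads."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("this sketch needs at least one read, all of equal length")
    # zip(*reads) yields one tuple of bases per position in the strand.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


# Three hypothetical reads of the same strand, each with one substitution error.
reads = [
    "CGGACACTCACCCATC",  # error at position 3
    "CGGCCACTCTCCCATC",  # error at position 9
    "CGGCCACTCACCCATG",  # error at position 15
]
print(consensus(reads))  # CGGCCACTCACCCATC
```

Since the three errors sit at different positions, two out of three reads always agree on the correct base, and the original strand is recovered.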

Where is the problem?

Okay, so DNA could store information. Pretty cool. But it seems rather cumbersome to synthesize DNA just to store some data and then decode it back into binary numbers so it can be understood by modern devices. Not to mention the errors that naturally occur when reading and sequencing DNA. Yes, there are some drawbacks to using DNA as a storage medium.

For now, data storage in DNA is still in its early days, and because of that, the technique struggles with the classic issues that most, if not all, new and innovative methods have to overcome.

First and foremost, it is expensive. At least right now it is! As a little shoutout to our fellow iGEM teams: you have probably noticed that ordering synthesized DNA can devour your funds in no time! Unfortunately, this problem also applies to DNA storage. To save distinct and personalised pieces of information, one would have to synthesize DNA with a specific, predetermined sequence. Depending on the amount of information to be stored, this can become very costly, as storing one megabyte of data on DNA currently costs around $12,400 [4]. However, DNA synthesis is itself a target of research and innovation, so depending on upcoming advancements, the price per megabyte could drop drastically. Only time will tell!
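
To put that price into perspective, here is a quick back-of-the-envelope calculation in Python. It simply scales the $12,400 per megabyte quoted from [4]. The two file sizes are hypothetical examples of our own, and we assume the cost grows linearly with file size.

```python
COST_PER_MEGABYTE_USD = 12_400  # figure quoted above from [4]


def synthesis_cost_usd(megabytes: float) -> float:
    """Estimate the storage cost, assuming it scales linearly with file size."""
    return megabytes * COST_PER_MEGABYTE_USD


# Hypothetical file sizes, chosen only for illustration.
for label, size_mb in [("5 MB photo", 5), ("1 GB film (1,000 MB)", 1_000)]:
    print(f"{label}: ${synthesis_cost_usd(size_mb):,.0f}")
# 5 MB photo: $62,000
# 1 GB film (1,000 MB): $12,400,000
```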

Secondly, one has to consider the accessibility of the stored data. As we have already mentioned, retrieving data from DNA is possible, but the time required to do so is a problem. With current storage media like CDs or USB sticks, you can just plug the device into your computer and access every single file in a few seconds, whereas reading information from DNA is far more time-consuming. The DNA needs to be sequenced before it can be translated into a format that modern computers understand, and these extra steps make it impractical to access information quickly. Nevertheless, DNA would already be very well suited for long-term storage of data that does not need to be read out frequently [5].

Finally, to utilize this incredibly interesting approach to data storage, we must first learn how to use it properly. As an example, baseline DNA storage has a higher error rate than traditional storage media. However, researchers have already come up with a way to detect and correct such inaccuracies [5]! Still, this is something that has to be kept in mind when working with data stored on DNA. Additionally, one has to consider that data storage is one of the most essential concepts in our modern world, be it in research, at work or simply while strolling through a park and taking cute pictures of squirrels. As a result, a sophisticated infrastructure is required to use DNA as a storage medium in our everyday lives. This infrastructure is not here yet. Nevertheless, we could already use this concept today for distinct pieces of information that require a stable long-term medium.

Opportunities of data storage in DNA

But with all these disadvantages, why even consider using this kind of memory storage? There are quite a few reasons actually! NAM possesses some exciting properties that clearly outweigh the negatives and put competing memory storage materials to shame!

Today, most memory devices are constructed from semiconductors made of, for example, silicon, germanium or gallium. With the increasing demand for data storage, analysts estimate that global memory demand will exceed the projected silicon supply as early as 2040 [1]. DNA, on the other hand, is a biological molecule that self-assembles and is recyclable. The nucleotides that make up the DNA in cells can be rearranged to store new information. Think about all the possible ways these properties could be taken advantage of during production of NAM! For example, manufacturing could leverage existing biological by-products and waste from other industries, such as fish eggs or remainders from harvested plants, as well as recycle old NAM products as raw material, thereby reducing waste and environmental impact during production [1].

One other advantage that might come to mind is the sheer density of information that can be encoded in DNA. Just think about it: unimaginably small amounts of DNA in the nucleus of a single cell hold all the necessary information to construct and maintain a living organism! DNA has an information density that is 10³ times higher than that of flash memory [1]. This is especially interesting, as traditional hard drives and similar media are starting to struggle with the immense amounts of data that are required today and will be required in the future. As we have already mentioned, there are projections that the supply of silicon, which is required to produce flash memory, will fall short of demand as early as 2040! As a result, it is crucial to find a medium that can store large amounts of information in a resource-efficient way. DNA appears to fit these requirements nicely! On top of that, it consists of biomolecules which can be recycled and retrieved from biological waste. To put this into perspective, we would only need one kilogram of DNA to satisfy the required storage space in the year 2040 [1]!
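
If you want to see where numbers of this magnitude come from, here is a rough Python estimate of the theoretical upper limit. It assumes the two bits per nucleotide from the simple mapping described earlier and an average mass of roughly 330 g/mol per nucleotide of single-stranded DNA, which is an approximation of our own and not a figure from [1]. Redundancy, addressing and error correction would all lower the result in practice.

```python
AVOGADRO = 6.022e23            # nucleotides per mole
NUCLEOTIDE_MOLAR_MASS = 330.0  # g/mol, approximate average (our assumption)
BITS_PER_NUCLEOTIDE = 2        # from the 00→A, 01→C, 10→G, 11→T mapping


def bits_per_gram() -> float:
    """Theoretical upper limit of bits stored in one gram of single-stranded DNA."""
    nucleotides_per_gram = AVOGADRO / NUCLEOTIDE_MOLAR_MASS
    return nucleotides_per_gram * BITS_PER_NUCLEOTIDE


bits = bits_per_gram()
exabytes = bits / 8 / 1e18  # 8 bits per byte, 10^18 bytes per exabyte
print(f"{bits:.2e} bits per gram, about {exabytes:.0f} exabytes")
```

Under these assumptions, a single gram of DNA could in theory hold several hundred exabytes, and a kilogram a thousand times as much.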

Data storage on DNA also has the advantage of being more environmentally friendly than traditional methods! The amount of energy required to operate DNA as a storage medium is 10⁸ times lower than for flash drives [1]. By storing most of our data in DNA, we could notably reduce the amount of energy that is consumed, which in turn lowers the environmental impact of energy-producing facilities.

Finally, by far the biggest reason for using DNA to store data is its longevity. Regular hard drives and similar media are not very well suited for long-term storage exceeding 50 years, as the stored information needs to be transferred to new, fresh drives on a regular basis. This is crucial for traditional storage media, as they are living on borrowed time, and as soon as this time is over, all of the information inscribed on them is lost forever. DNA easily outshines everything that we have seen so far in terms of long-term data storage. At room temperature, the half-life of DNA exceeds 100 years, and by lowering the temperature, the time in which DNA can be reliably used as a data storage medium increases exponentially! To give you an idea of what this means, the complete genomes of a Neanderthal (ca. 50,000 years old) and of an ancient horse (ca. 700,000 years old) could not only be retrieved, but also successfully sequenced [1]! This makes DNA the perfect medium for information, clearly outperforming everything that has been invented to this day!

DNA data storage in pop-culture

In 2020, Netflix released the series “Biohackers”, a science thriller about a medical student at the University of Freiburg (what a nice coincidence). You may ask yourself what this has to do with our article. The answer is quite simple! The first episode of this series was the very first Netflix production to be stored in DNA [7]. As you can see, first efforts are being made to bring this outstanding technique into our everyday lives!

Future perspective

For now, DNA storage is obviously not ready for widespread use. However, it is advancing rapidly, and together with the continuing progress in the field of DNA synthesis, DNA as a storage medium is becoming increasingly accessible. It still has a long way to go, and data storage in DNA is not going to replace traditional technologies overnight. The two approaches are likely to coexist and be used in different fields: DNA is, for example, highly suitable for long-term storage, while flash drives currently outperform DNA for short-term storage. Once prices for DNA synthesis have gone down and a suitable infrastructure for working with genetically stored information is established, there is nothing stopping us from buying movies in the form of a little test tube instead of a Blu-ray!

[1] https://www.nature.com/articles/nmat4594.pdf

[2] https://pubs.acs.org/doi/10.1021/acsnano.2c06748#

[3] https://www.nature.com/articles/s41596-019-0244-5 (includes functional code to encode and decode binary data as a nucleotide sequence)

[4] https://lifelinedatacenters.com/data-center/dnas-digital-storage/

[5] https://www.bbc.com/news/science-environment-59489560

[6] https://www.nature.com/articles/s41467-021-22277-y

[7] https://www.medicaldevice-network.com/analysis/dna-data-storage/


Hello! I'm Emy!
I'm part of the Chalmers-Gothenburg team 2022.
I think synthetic biology is super cool. I am writing on this blog in the hope that you will find it as exciting as I do when you hear about what can be, and has been, achieved with genetically engineered microorganisms. It is truly amazing!
