Shall I Encode Thee In DNA? Sonnets Stored On Double Helix

The world is full of data - and that's a problem. We have to find a place to store all those digital photos, tax records and unfinished novels. British scientists have demonstrated a possible solution: They've stored all of Shakespeare's sonnets on several small stretches of DNA.

Here is a problem facing us in this digital age. All the data we're stockpiling - digital images, tax records, unfinished novels - where are we going to store them? Some scientists say they may have a solution. It's not digital, it's biological.

NPR's Adam Cole reports scientists have successfully tested out using DNA as an archive by recording all of Shakespeare's sonnets on a double helix.

ADAM COLE, BYLINE: It all started in a pub a few months ago. Nick Goldman and Ewan Birney, two scientists from the European Bioinformatics Institute, were drinking beer and discussing a problem.

Their institute manages a huge database of genetic information - thousands and thousands of genes from humans and corn and pufferfish. And Goldman says all that data - and all the hard drives and the electricity used to power and keep them cool - is getting pretty expensive.

NICK GOLDMAN: The data we are being asked to be guardians of is growing exponentially. But our budgets are not growing exponentially.

COLE: That's a problem faced by many large companies with expanding archives - and the solution was right in front of the researchers. They worked with it every day.

GOLDMAN: We realized that DNA itself is a really efficient way of storing information.

COLE: That's right. DNA - the genetic material that makes us us - is a natural hard drive. Here's why. It's a long chain that repeats four basic chemical units.

GOLDMAN: Four different bases - that's different forms of molecules - A, C, G and T.

COLE: Those are the four letters in DNA's alphabet.


COLE: When these letters are arranged in different ways, they spell out different instructions for our cells.

: A, G, A, C...

COLE: Three billion of those letters make up the entire instruction manual for our existence. And it's all stuffed into each cell in your body. DNA is millions of times more compact than the hard drive on your computer.

GOLDMAN: If only we could persuade it to take the form we wanted, encoding the information we defined.

COLE: Like a text file instead of genetic information. Over a second beer, Goldman and his colleague started to sketch out the details. They started with a text file of one of Shakespeare's sonnets.

UNIDENTIFIED MAN: Shall I compare thee to a summer's day?

COLE: This text file was written in a computer's most basic language.

GOLDMAN: Zeroes and ones.

COLE: Bits stored on a magnetic hard drive.

GOLDMAN: And some of these...

: Zero, zero, zero, one, zero, zero...

COLE: With a simple cipher, Goldman and his colleagues translated these zeroes and ones into the letters of DNA.

: C, G, C, A, G, A...
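
The translation step Cole describes can be sketched in a few lines of Python. This is only an illustration, not the cipher the researchers used: the transcript does not say how their cipher works, so the two-bits-per-base table `BITS_TO_BASE` and the helper names `text_to_bits`, `bits_to_dna` and `encode` are all assumptions made for this example.

```python
# Hypothetical mapping for illustration: each pair of bits becomes one base.
# The transcript does not describe the researchers' actual cipher.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def text_to_bits(text: str) -> str:
    """Turn text into a string of zeroes and ones (8 bits per UTF-8 byte)."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_dna(bits: str) -> str:
    """Translate a bit string into DNA letters, two bits per letter."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def encode(text: str) -> str:
    """Text in, DNA letters out."""
    return bits_to_dna(text_to_bits(text))


line = "Shall I compare thee to a summer's day?"
strand = encode(line)
print(strand[:16])  # CCATCGGACGACCGTA
```

Under this assumed mapping every byte of text becomes four DNA letters, so the letter "S" (binary 01010011) comes out as CCAT.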

COLE: And then they did the same for the rest of Shakespeare's sonnets, and an audio clip of Martin Luther King's "I Have A Dream" speech, and a picture of their office. They sent that code - those strings of A's, C's, G's and T's - off to a company that built the physical strands of synthetic DNA and sent them back to Goldman.

GOLDMAN: My first reaction was that they hadn't done it properly because they sent me these little tiny test tubes that were quite clearly empty.

COLE: But the DNA was there - tiny specks at the bottom of the tubes. They sequenced the DNA, read the code, ran their cipher backwards...

: (Unintelligible)

COLE: And they ended up with a 100 percent accurate Shakespearean sonnet.

UNIDENTIFIED MAN: So long lives this and this gives life to thee.

COLE: All from the tiniest speck of DNA. They published their results in the journal "Nature," joining other groups who have experimented with DNA storage.
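
Running a cipher backwards, as described above, can be sketched the same way. The example below assumes a simple two-bits-per-base table (`BASE_TO_BITS`), which is a stand-in and not the researchers' actual scheme, and the function name `dna_to_text` is hypothetical. It also assumes the sequencer returned a perfect read; it does nothing to detect or correct errors.

```python
# Hypothetical reverse mapping for illustration: one base becomes two bits.
# This is an assumed scheme, not the cipher from the published study.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def dna_to_text(strand: str) -> str:
    """Translate DNA letters back into text, assuming an error-free read."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4 (one byte)")
    # Letters back to zeroes and ones.
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    # Zeroes and ones back to bytes, 8 bits at a time.
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


print(dna_to_text("CCATCGGACGACCGTACGTA"))  # Shall
```

A letter outside A, C, G and T raises a `KeyError` here, which is the simplest possible way to flag a bad read.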

Goldman says the process would be easy to scale up. If you took everything human beings have ever written - an estimated 50 billion megabytes of text - and stored it in DNA, that DNA would still weigh less than a granola bar.
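
The granola bar comparison can be checked with rough arithmetic. The sketch below rests on assumptions that are not in the report: two bits stored per DNA letter (the most a four-letter alphabet allows), an average mass of about 330 grams per mole for one letter of single-stranded DNA, a megabyte of one million bytes, and a single copy with no redundancy. The function name `dna_mass_grams` is made up for this example.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
NUCLEOTIDE_G_PER_MOL = 330.0    # assumed average for single-stranded DNA
BITS_PER_BASE = 2               # assumed: upper limit for a four-letter alphabet
BYTES_PER_MEGABYTE = 1_000_000  # assumed: decimal megabytes


def dna_mass_grams(megabytes: float) -> float:
    """Estimate the mass of one copy of DNA holding the given amount of data."""
    bits = megabytes * BYTES_PER_MEGABYTE * 8
    bases = bits / BITS_PER_BASE
    return bases * NUCLEOTIDE_G_PER_MOL / AVOGADRO


mass = dna_mass_grams(50e9)  # 50 billion megabytes, the estimate quoted above
print(f"{mass * 1000:.2f} mg")  # 0.11 mg
```

Under these assumptions a single copy comes to about a tenth of a milligram. A practical archive would need many copies of each strand plus extra letters for indexing and error checking, so the real figure would be far larger, but there is a lot of room before it reaches the weight of a granola bar.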

GOLDMAN: There's no problem with holding a lot of information in DNA. The problem is paying for doing that.

COLE: The process would cost more than $10,000 per megabyte.

GOLDMAN: It's an unthinkably large amount of money at the moment.

COLE: At the moment.

Goldman and other scientists who are dabbling in DNA storage know that DNA synthesis costs are dropping rapidly. In a decade or so, a DNA archive might be cheaper than a room full of hard drives.

Adam Cole, NPR News.

Copyright © 2013 NPR. All rights reserved. Visit our website terms of use and permissions pages for further information.

NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.