Shakespeare's Sonnets, Encoded In DNA
IRA FLATOW, HOST:
How much information do you think exists in the entire world? Take a guess. Forget megabytes and gigabytes and terabytes and petabytes, even exabytes. We're talking zettabytes here or 10 to the 21st bytes. Take the number 10, put 21 zeroes after it, that's what you've got because one recent estimate says there may be around three zettabytes of digital information out there. That's over one trillion gigabytes. Just imagine all those hard drives piled up, and then imagine them not starting up when you plug them in.
There's got to be a better way to store all that data long-term, right, for a long, long time because, you know, some of these CDs, DVDs, they're starting to wear out. I know some of mine, some of our archival stuff is starting to sit there and not play anymore.
Well, my next guest has come up with a solution that's much longer-lasting, it's more compact than today's hard disks and magnetic tapes, and it is DNA. Nature uses DNA to store information. So why shouldn't we? And, you know, DNA can last for tens of thousands of years. Think of those Neanderthal bones.
Nick Goldman is the leader of a research team studying molecular genome evolution at the European Bioinformatics Institute in the U.K. And his paper on DNA data storage appears in the journal Nature this week. And he joins us by phone. Welcome to SCIENCE FRIDAY, Dr. Goldman.
NICK GOLDMAN: Thank you very much, glad to be here.
FLATOW: How did you come up with the idea of storing information in DNA? Can DNA actually store those ones and zeroes and things?
GOLDMAN: It stores the equivalent of ones and zeroes. We came up with the idea in a bar in Hamburg a few years ago because the institute where I work is - keeps some of the world's big, biological databases. We store the information, we archive the information, and we serve it to research biologists around the world over the Internet for them to do their research.
And the data that's coming to us is increasing exponentially. It's quite a headache for us to keep hold of all of that because our budgets don't increase exponentially.
FLATOW: Yeah, did you actually try this out on some DNA?
GOLDMAN: Yeah, we tried it out in a real - it's sort of compared to the databases we store, it's a small example, but compared to what people have done before, it's a pretty big example. We wrote a computer program that embodied a code that would convert the zeros and ones from a hard disk drive into the letters that we use to represent DNA, and then we - our collaborators in California were able to actually synthesize physical DNA.
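The conversion Goldman describes, from a file's zeros and ones to DNA letters, can be sketched in a few lines. This is only a minimal illustration under an assumed mapping of two bits per letter (00 to A, 01 to C, 10 to G, 11 to T). The mapping and the function names are hypothetical and are not the code the team actually used, which is not described in this interview.

```python
# Hypothetical illustration: pack two bits into each DNA letter.
# This mapping is an assumption for the sketch, not the team's actual code.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Turn raw bytes into a string of DNA letters, four letters per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Reverse the mapping: read DNA letters back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


line = b"Shall I compare thee to a summer's day?"
encoded = bytes_to_dna(line)
decoded = dna_to_bytes(encoded)
print(encoded[:16], decoded == line)
```

Each byte becomes four letters, so the encoded string is four times as long as the input in characters, and decoding recovers the input exactly.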
FLATOW: And what did you store in that DNA?
GOLDMAN: We chose a few computer files that would just illustrate the range, but really, I mean, anything is just as possible. But we chose a photograph of our own institute because we're sort of self-publicists at heart, I guess, and an excerpt from Martin Luther King's speech "I Have a Dream," all of Shakespeare's sonnets and a PDF that contained in fact the paper, the scientific paper by Watson and Crick that first described the structure of DNA itself.
FLATOW: How much - is it possible, and let me ask you: How much DNA did that take? How much storage space physically did that all take up?
GOLDMAN: Physically in our experiment that took up - we couldn't really measure it in a scientific way, but between you and me, it's like a speck of dust.
FLATOW: All that stuff on a speck of dust.
FLATOW: And how much room would you need to store all the information in the world?
GOLDMAN: So we - I did a quick calculation the other day on the back of an envelope for that one, and I think you could do all the information in the world in one and a half cubic meters. So it would sort of go in the back of your station wagon, I guess.
FLATOW: Wow, I'd be careful about where I was driving if I had...
GOLDMAN: It would be really heavy. It would be bad for the suspension.
FLATOW: I'd worry that, you know, I'd drop it off somewhere. So is this a mass storage for libraries, or, I mean, because we don't have the ability to translate that stuff into DNA do we, at home, not yet.
GOLDMAN: No, not at home. And it's still too expensive to do it on any sort of large scale. The prices are coming down really quickly. It's driven by the revolution in genome technology and genome research. And at the moment - we did some cost projections. At the moment, it's so expensive that it's only economically viable if you're going to store it for 600 or more than 1,000 years.
On the long term, it's a good deal because once you've made the DNA, it costs nothing to maintain it in a good state.
FLATOW: All right, Nick, stay with us. We're going to take a break. We'll come back and talk a little more with you, if that's OK with you. Our number, 1-800-989-8255, talking with Nick Goldman about storing information on DNA. We'll be right back after this break.
(SOUNDBITE OF MUSIC)
FLATOW: I'm Ira Flatow. This is SCIENCE FRIDAY from NPR.
(SOUNDBITE OF MUSIC)
FLATOW: This is SCIENCE FRIDAY. I'm Ira Flatow. We're talking with Nick Goldman about storing information on DNA. He's already tried this out. He's got all of Shakespeare's sonnets, Martin Luther King's speeches, text files, photos, all in this - getting DNA to encode it all. Well, they encode - is this a synthetic DNA, you make the DNA yourself?
GOLDMAN: Our collaborators are a company called Agilent Technologies in California. They did that. They're one of the world leaders in the technologies that make DNA, synthetic DNA, to human designs.
FLATOW: And when we went to the break, you were saying that one of the advantages of storing it in DNA is once you make it, it will last a very long time.
GOLDMAN: That's right, and you gave the example in your introduction that - one of the examples that we like to use, it's quite well-known, you know, there's been these Neanderthal samples found, and they can get DNA out of there and read the DNA in technical - we call it we sequence the DNA. And those samples from Neanderthals, or another great example is wooly mammoth, they get samples routinely from 10,000- or 20,000- or 50,000-year-old mammoths.
And that's not even a carefully controlled sample. That's just, you know, a mammoth that laid down and died somewhere cold.
FLATOW: I'm sorry for the mammoth. So - but if you took your DNA, you would store it in a controlled environment somewhere, or does it not need that? If the mammoth is lying in the ice...
GOLDMAN: It doesn't really need it, but the first applications would be something of very, very high value. That's the only way you justify the currently high expense - globally important historical documents or records of nuclear waste dumps or something like that. You could store it just in your house. I've got samples sitting around in my office, and they're not coming to any harm.
But if you wanted to be extra cautious, you'd keep it cold, near freezing, and you'd keep it dry, and you'd keep it in the dark.
FLATOW: You know, those of us who are of the science-fiction ilk would think of these dangerous sequences, you know, of events where somehow the bacteria or other organisms see your Shakespeare sonnet DNA and say I want to read that and suck it up and turn it into something else.
GOLDMAN: The code we use to store information in DNA is completely different from the codes that living organisms on Earth use in their DNA. So it's no more dangerous to have the information stored in DNA that way than a piece of plastic's dangerous because you can make a CD out of it.
FLATOW: OK, now let's talk nuts and bolts here, Nick. How close are we to actually getting something of practical use outside of the experimental laboratory here?
GOLDMAN: We don't know because we're relying on technology to improve to bring those prices down. But we've looked at what's been going on in the last few years, driven by genome research. And the most expensive stage is synthesizing DNA - the prices there drop by a factor of 10 about every five years. So 10 years from now, it will be 100 times cheaper, and our projections are that at that point it's something you or I might be able to afford to do for information that was of great value to us that we wanted to keep on something like a 50-year time scale.
And that's something like - you know, maybe that will be my children taking a video recording of their wedding and wanting to put that somewhere safe for their grandchildren. It will be expensive, but it will be possible. And if you care enough about the information you want to pass down through a couple of generations, you might choose to do it.
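The arithmetic behind that projection is a simple exponential: if synthesis prices drop by a factor of 10 about every five years, two such periods give a factor of 100. The sketch below only restates the figures quoted in the interview. The function name and its parameters are hypothetical, and it assumes the trend continues smoothly, which Goldman himself says is not known.

```python
import math


def relative_cost(years_from_now: float,
                  factor: float = 10.0,
                  period_years: float = 5.0) -> float:
    """Fraction of today's price left after the given number of years,
    assuming the price falls by `factor` once every `period_years`."""
    return factor ** (-years_from_now / period_years)


# Ten years is two five-year periods, so the price is about 1/100 of today's.
ten_year_fraction = relative_cost(10)
print(math.isclose(ten_year_fraction, 0.01))
```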
FLATOW: I'll place it on the shelf right next to my 3-D printer when I...
GOLDMAN: It will be safe there.
FLATOW: One question is: With all this stuff in the DNA, how could you find that sonnet? If you had everything in the world in the back of your station wagon, how do you index that and find what you're looking for?
GOLDMAN: You're right, you'd have to decide how much you wanted to clump together. And so you wouldn't put it all in the back - in one big box in the back of the car. You'd separate it up into whatever was a useful-sized unit. And this isn't a new problem. That's what we do with all the words in all the books in a library.
We don't put them in one big bin and say your information is in there. We order them on the page, and we put the pages in a book. And we have an index system so we can look it up. And we'd have to invent the same kind of system to index different test tubes with information about different things in them.
FLATOW: All right, Nick, we're going to meet you back here in 10 years, OK, and see how far you've gotten.
GOLDMAN: OK, I'll make a date.
FLATOW: All right, Nick Goldman, leader of a research team studying molecular genome evolution at the European Bioinformatics Institute in the U.K. on storing all this information, we can just make DNA out of it, and it'll last for thousands of years. Thanks, Nick, for joining us this hour.
GOLDMAN: It's my pleasure.
NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.