How do you create an internet archive of all human knowledge?
BREWSTER KAHLE: Boy, it's - everything digital is just completely ephemeral, whether it's formats that go out of date, like those floppies - just try to run it - or a CD. Find somebody that has a DVD player. I mean, it's just - it's starting to get such that things that are really recent are just going away.
MANOUSH ZOMORODI, HOST:
This is Brewster Kahle.
KAHLE: The average life of a webpage before it's either changed or deleted is a hundred days. That's it. I think it was kind of a cruel joke to call webpages pages because you would think of them as lasting a long time, you know, Gutenberg Bibles and all of that kind of thing, and nope.
(SOUNDBITE OF MUSIC)
ZOMORODI: Brewster knew this was going to be a problem - disappearing websites, missing chapters of the internet. He knew this way back in 1996, and that is why he created the Internet Archive. As we heard, it's where Caper In The Castro can still be found, and over the years, the archive's mission has expanded, saving old books and movies, TV shows and music.
KAHLE: The idea is to try to build the library of everything - the Library of Alexandria for the digital age. We can make all the books, music, video, webpages, software, everything ever published by people available to anybody curious enough to want to have access to it. That was the dream of the internet, and the Internet Archive is a part of making that dream come true.
ZOMORODI: Which is a lofty goal and a huge endeavor because, how does someone even begin to back up the internet? Brewster started by building something he called the Wayback Machine.
KAHLE: Yes, the - probably the most used and important part of the Internet Archive right now is the Wayback Machine, where we have collected webpages by going and basically clicking every web link on every webpage every two months. So if you go to archive.org and type in a URL, we'll show you different versions of that URL over time. We collect about a billion URLs every day, and we're finding that it's really important to journalists that are trying to find what actually happened. Lawyers love it because they can use it to say, hey. You said this before, and now you're not saying that. It's the only record often.
ZOMORODI: Yeah. And you can dig up information that someone's deleted.
KAHLE: Yes. Take Donald Trump's tweets. A large part of the policy, of the impact on our country while Donald Trump was president was through his Twitter feed, and then they turned it off. So it all just kind of disappeared. So we have a copy that we have made available through the Wayback Machine that makes it so we can see what it is. Or when a company goes under - GeoCities - and just everybody's sites go away - there are endless of these sites that go away or make different business decisions, and so people say, gosh, I'm glad that I can get a hold of that.
KAHLE: But we run into things like locked files, databases you can't get through to. Some of those are - make things real challenges. We're working back and forth with the different websites to try to make things available. And the web also has parts that go obsolete so that the old websites - you can't replay them anymore. So there's challenges every day.
(SOUNDBITE OF MUSIC)
ZOMORODI: Do you ever worry about things being lost to the past? I mean, I can imagine that this would make you neurotic...
ZOMORODI: ...Like, oh, we missed something.
KAHLE: Oh, yes. We missed Napster.
ZOMORODI: Oh, really?
KAHLE: So Napster was maybe the best, biggest music library ever built by people, and it was shut down. We didn't get it. And if you just take the libraries in Ukraine that are being purposefully targeted, just the same way the Nazis targeted the library in Belgrade, it's a way of erasing a culture. You go after their libraries. So, yes, we're worried about this all the time.
ZOMORODI: Right. So what do you do? Do you try to go back and fix things that you missed? Do you have an example, maybe?
KAHLE: Well, on Wikipedia, we've tried to take all of the footnotes, all the citations and turn them blue - turn them into little links. So we went and worked to fix the broken links in Wikipedia. We've now fixed over 15 million broken links. We've prioritized the books that are referenced in Wikipedia and acquired those books - bought them or got them donated - and we digitize them and then put them back in such that if there's a page number, you can click and turn right to that right page. We did a big project on the Ukrainian Wikipedia to try to collect all of the books that are referenced and make those clickable.
ZOMORODI: So how much harder is it to collect everything that's on the web now compared to, say, a decade ago or 15 years ago, simply because it is behind paywalls, or it's - you can't access it without a login?
KAHLE: Yes. So we have robots that are going around and collecting a billion URLs a day, and fortunately, there are over a hundred people that work for the Internet Archive that are trying to work on keeping it all alive. We don't collect every YouTube video. It's just too big. But we try to collect ones that are referenced a lot or that are linked to from Twitter pages, say. So we can't collect everything, but we collect a lot.
And if we're not collecting the right things, go to archive.org. There's a save-page-now feature, and you can put in a URL, and people do this all the time. It's used about 80 times a second. So anybody can go and participate in making things permanently available. I just did this for the obituary for my aunt. I went to the webpage, made sure that that obituary from that funeral home service was archived. So I did that this morning. So you, too, can go and participate in building the web archives.
(SOUNDBITE OF MUSIC)
ZOMORODI: In a minute, a new challenge that Brewster and the Internet Archive are facing - a legal battle between them and some of the biggest book publishers. At issue, whether archiving e-books is digital piracy or preserving the best of humanity for everyone to enjoy. I'm Manoush Zomorodi, and you're listening to the TED Radio Hour from NPR. Stay with us.
It's the TED Radio Hour from NPR. I'm Manoush Zomorodi. Today on the show, for all eternity. And we were talking to Brewster Kahle, the founder of the Internet Archive, a nonprofit that is trying to digitize everything that we humans create, from websites to music, old movies and, of course, books.
(SOUNDBITE OF MUSIC)
KAHLE: The Library of Congress has about 28 million books. We've digitized maybe 6 or 7 million. We physically own, probably, on that order, so we still have a long way to go.
ZOMORODI: And it's getting harder to proceed because e-books present very particular problems. For example, that e-book you downloaded, you don't actually own it.
KAHLE: So it turns out the big publishers don't sell e-books. They license them. So your e-book that's on your Kindle or whatever, you don't actually have that, not in the same sense you had a physical book. You can't pass it down to your kid. And anytime they want to change it, they can change it at any time or make it go away.
ZOMORODI: It's a licensing issue. And Brewster and the Internet Archive started trying to get around it by buying physical copies of books, scanning them and making their own e-books to lend out.
KAHLE: So we started that in 2011. And in the beginning of the pandemic, four of the largest publishers decided to sue the Internet Archive to say that you aren't allowed to digitize and lend.
ZOMORODI: What the archive calls equal access, those publishers say is digital piracy.
KAHLE: And that suit is ongoing. We'll hear, probably next year, from the district court, and it'll probably be appealed, but we'll see. The big concept that I never really would've imagined would be at play is digital ownership. When you buy a digital file, do you own it in the same sense that you owned a physical thing? You can't just go and post it and give it to everybody. That's understood. Fine. But do you get to keep it? And what the big publishers are saying is, no, there's no such thing as digital ownership anymore, ever. So that's the absolute opposite of what we were doing with the internet in the earliest days when we're trying to democratize access, democratize creation.
ZOMORODI: I actually went back into the TED archives and watched your talk from 2007, where you laid out your vision. And you knew then that there would be conflicts, even if you didn't know what they were.
(SOUNDBITE OF TED TALK)
KAHLE: There's a political and social question out of this is all of this - as we go digital, is it going to be public or private? There are some large companies that have seen this vision that are doing large-scale digitization, but they're locking up the public domain. The question is, is that the world that we really want to live in? What's the role of the public and versus the private, as things go forward? How do we go and have a world where we both have libraries and publishing in the future, just as we basically benefited as we were growing up? So universal access to all knowledge - I think it can be one of the greatest achievements of humankind, like the man on the Moon or the Gutenberg or the Library of Alexandria. It could be something that we're remembered for, for millennia for having achieved.
KAHLE: I think people have no idea of the heroics that not only the staff of the Internet Archive but now a thousand other organizations we work with on the web collection, about 500 libraries, and the book collections - of how much goes on to try to make it so that the web that we sort of take for granted works, that you can get to past versions, that you can get a lot of the links that you link to. I guarantee, if you've never heard of the Internet Archive, you've used it because it's just woven into everything.
ZOMORODI: That brings me to a final sort of existential question, Brewster. If everything digital eventually becomes obsolete, how do you archive the archive so it doesn't become obsolete, too?
KAHLE: Boy, libraries are - you know, they're destroyed all the time, so - and the question is how? And it's often governments or large powerful entities like corporations that seek to destroy them. So you want more than one copy in more than one place. Then you also want to make it so that it's still used, so that it's cared for. Our collections are almost completely on spinning disk, so we have to replace those every 5 to 10 years or they will go away. So we need people to want it to stay around. Fortunately, there are many, many, many people and many younger people that are seeing this as a path forward.
ZOMORODI: Not an easy path.
KAHLE: Building a library of everything is a challenge, but it starts one webpage at a time, one book at a time. And if we see ourselves as preserving history collectively, we'll all make it come true.
(SOUNDBITE OF MUSIC)
ZOMORODI: Brewster Kahle. He's the founder of the Internet Archive, and you can see his full talk at ted.com. Thanks so much also to C.M. Ralph, the artist and maker of the video game Caper In The Castro, and Adrienne Shaw, professor of media studies and production at Temple University.
NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.