Google's Tally Of World's Book Titles: 129,864,880

How many different books are there in the world? It's a question Jon Orwant often hears. He's an engineering manager at Google Books — part of the tech company's efforts to digitize all of the world's books. Recently, they announced an answer: 129,864,880. Melissa Block talks to Orwant about how Google came up with the number.

MELISSA BLOCK, host:

You may not have devoted your waking hours to wondering exactly how many books have been published, but Google has. In fact, they're trying to count as part of a mission to index and digitize all of them.

So far, Google says, our planet has produced 129,864,880 books - at least that was the number as of last week. Jon Orwant is engineering manager at Google Books. So Jon, that was the number last week. What about now?

Mr. JON ORWANT (Engineering Manager, Google Books): Well, now we're up to a shade over 130 million.

BLOCK: Okay, and that addition is due to what?

Mr. ORWANT: A combination of various things, but most of it is simply people printing more books.

BLOCK: Okay, well, let's define our terms here. What do we mean when we say book?

Mr. ORWANT: So to us, a book is simply a printed and bound set of pages. Basically, what we were trying to count was the number of monographs, kind of what you think of as typical books when you go into a bookstore or a library.

BLOCK: Okay, and obviously, we're not talking about individual copies of books. We're talking about distinct books.

Mr. ORWANT: Correct. You can think of it as being editions.

BLOCK: And when you're coming up with this number, Jon, are we talking about every book ever written anywhere on the planet?

Mr. ORWANT: Pretty much. That is what we were trying to estimate. So that includes books that were published or really printed before the day of Gutenberg. It includes every book that we're able to find in any bookstore, any library or from any of the 30,000 publishers that we work with.

BLOCK: Now, it's been harder to come up with a number than you'd thought. Why is that?

Mr. ORWANT: So part of the reason is that there are many, many different standards for representing a book in any list of books. Librarians, as they go through and try to take down information from the copyright page, from the cover of the book, from the back cover of the book, when trying to figure out when the book was published, reasonable people could disagree.

So you want to be very careful as we take in all this information from our thousands of different sources to make sure that we account for those differences.

BLOCK: Okay, well, Jon, you know the complaints here. There have been a lot of people who take this all very seriously who say that your numbers are off. I was reading a posting on the technology journal Ars Technica, and somebody called your number probably complete bunk. They say your methodology is just really flawed. What do you think?

Mr. ORWANT: Yeah, I see we've been reading the same blog post.

(Soundbite of laughter)

Mr. ORWANT: So this number is an estimate. That Ars Technica article was based on another blog post, which was making the observation that a lot of metadata that we show, when people go to books.google.com, is sometimes flawed. And that is absolutely true, but that is actually a separate issue from how we are actually counting all of the books.

A way to think of it is this: For any particular edition of "Tom Sawyer," we'll get the University of Michigan saying it was printed in this year, the Library of Congress saying it was printed in a different year. We're correctly able to identify that those are all referring to the same edition, but ultimately, when we want to show this to the outside world, we have to pick something to show, and we'll sometimes pick wrong. And that's what this original blog post was pointing out.

And then Ars Technica looked at that and just kind of drew this conclusion that any sort of attempt to analyze this number must therefore be flawed, which is not something that we agree with. You know, we're pretty confident in our estimate.
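The distinction Orwant draws, between counting editions and choosing which metadata to display, can be illustrated with a small sketch. This is not Google's method: the matching rule (same normalized title, author and publisher), the majority-vote choice of year, the record layout, and every value in `RECORDS`, including the years and the publisher, are assumptions invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical catalog records: (source, title, author, publisher, year).
# All values, including the years, are made up for this example.
RECORDS = [
    ("University of Michigan", "Tom Sawyer", "Mark Twain", "Harper", 1903),
    ("Library of Congress", "tom  sawyer ", "Mark Twain", "Harper", 1904),
    ("Publisher feed", "Tom Sawyer", "Mark Twain", "Harper", 1903),
    ("Library of Congress", "Huckleberry Finn", "Mark Twain", "Harper", 1912),
]


def edition_key(title, author, publisher):
    """Assumed matching rule: lowercase each field and collapse whitespace."""
    return tuple(" ".join(s.lower().split()) for s in (title, author, publisher))


def cluster_editions(records):
    """Group records that the matching rule treats as the same edition."""
    clusters = defaultdict(list)
    for source, title, author, publisher, year in records:
        clusters[edition_key(title, author, publisher)].append((source, year))
    return clusters


def displayed_year(cluster):
    """Pick the most frequently reported year; this choice can be wrong."""
    return Counter(year for _, year in cluster).most_common(1)[0][0]


clusters = cluster_editions(RECORDS)

# The count depends only on how records are clustered,
# not on which year is later chosen for display.
edition_count = len(clusters)
tom_sawyer_year = displayed_year(clusters[edition_key("Tom Sawyer", "Mark Twain", "Harper")])
```

In this sketch the three "Tom Sawyer" records disagree about the year but still collapse into one edition, so a wrong displayed year would not change the count.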

BLOCK: Why do you need this number, total number of books ever produced?

Mr. ORWANT: Curiosity, plain and simple. It's not something that we need for any particular reason other than the fact that a lot of the libraries that we've been working with for all of these years have been asking us. They've been curious as to how many books there are in the world.

BLOCK: Jon Orwant is engineering manager at Google Books. He spoke with us from Cambridge, Massachusetts. Jon, thanks so much.

Mr. ORWANT: Thank you.

Copyright © 2010 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.

NPR transcripts are created on a rush deadline by a contractor for NPR, and accuracy and availability may vary. This text may not be in its final form and may be updated or revised in the future. Please be aware that the authoritative record of NPR’s programming is the audio.
