DAVE DAVIES, host:
What do you call a company that, in less than ten years, grew from practically nothing to a $225 billion high-tech enterprise, giving its employees free rides to work, subsidized childcare, and discount bike repairs, haircuts and oil changes? It's Google, and Randall Stross' book about the search engine giant includes many measures of its size and reach. Here's one: By 2006, the company's data centers consumed more electricity than all the television sets in America.
Google's had to cut back on some of its employee perks lately. But Stross' book focuses less on Google's size and wealth than on its ambition to collect and organize virtually all the information in the world - from news stories, to maps, to videos, to every book ever published. Randall Stross writes the "Digital Domain" column for the New York Times and is the author of "The Wizard of Menlo Park," "eBoys" and "The Microsoft Way." His new book is called "Planet Google."
Well, Randall Stross, welcome to Fresh Air. Let's talk a little bit about Google's core business and that is, of course, Web searches. Now, I'm at a computer here, and I'm going to type in your name in a Google search. And within a second, it gives me a list of 45,000 Web pages that have either Randall or Stross. Now, is it possible to describe in layman's terms what happens when we execute a Google search?
Mr. RANDALL STROSS (Journalist; Author, "Planet Google: One Company's Audacious Plan to Organize Everything We Know"): When you put in your search phrase and the results come up and it tells you how long it took - by the way, how long did that take? Does it say?
DAVIES: 0.12 seconds.
Mr. STROSS: Which may seem like a very, very fast time, given it looks like it has searched the entire Web and come back with results. It should be remembered that that time is only measuring the time it takes for Google to look at its copy of the Web and its index and then come back with the answers. So its index may or may not be fully current. Its spider goes out and collects copies of Web pages continuously, but that's not what it's doing when we send in a search request.
DAVIES: So, what it's doing then is it's really searching not the Web, but the copies of millions and millions of Web pages that it has accumulated in Google's electronic library where it can sort through them quickly.
Mr. STROSS: Right. It recently announced that it had passed the one trillion-page milestone. And what it does once it collects those pages is it looks at how they link to one another. The core of the Google algorithm is to make an educated guess about which pages are most likely to be useful. And the many things that it takes into account include how many times does a page from some other site put a link to that page? And before you can judge whether these outside links are valuable, you then have to look at what links to those pages. So you have to keep working your way outward, checking the references of the references and then the references of the references of the references. But by doing so and tracing those connections outward, it can make a guess and say, this page seems to be accorded more importance than this other page.
DAVIES: And that's done by these incredibly complex mathematical formulas.
Mr. STROSS: That it does not release to the public.
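[Illustration, not part of the broadcast: As Mr. Stross notes, Google does not publish its formulas, so the following is only a minimal sketch of the general idea he describes, in which a page's importance depends on the importance of the pages that link to it. The function name `rank_pages`, the `damping` value and the toy link graph are assumptions made for this example.]

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Estimate page importance from who links to whom.

    links maps each page to the list of pages it links to.
    Each round, a page passes a share of its current score to the
    pages it links to, so a link from an important page counts more.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy graph: pages "a" and "b" both link to "c"; "c" links back to "a".
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = rank_pages(toy_links)
```

[In this toy graph, "c" ends up with the highest score because two pages link to it, and "a" outranks "b" because its only incoming link comes from the highly ranked "c". That is the "references of the references" effect described above.]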
DAVIES: One of the things that you write about in this book is that when Google scaled up its Web browsing - its Web-searching infrastructure, unlike a lot of companies, it didn't rely on readymade hardware that was built and bought off the shelf. It built its own stuff, and it built lots and lots and lots of it. And you describe that they have - I don't know, how many data centers across the country? Tell us a little bit about those places where the computers are whirring away in the middle of the night conducting our Web searches.
Mr. STROSS: Google runs a number of data centers that are filled with racks of computers, racks that have the innards of, essentially, our PCs, and each data center has copies of the entire Web and its indexes so that it can minimize the time it takes for your search to go to one of their data centers - they usually route it to the physically closest one to you - and come back with an answer. It turns out that as fast as the Internet uncongested can run, physical distance does matter, and the closer Google is able to locate a data center to you, the faster you get your results.
It faced a choice early on. It could buy equipment that was specifically engineered for running continuously, not failing and being the most reliable equipment, state-of-the-art equipment available. It chose not to use that category of equipment but instead build its own machines using off-the-shelf parts that are very inexpensive because they're the same parts that go into our PCs. And it knew those components, because they're so cheap, are going to fail. You can count on them failing. But they developed software that builds in redundancy and allows the system to in essence detect and then route around any problem on the computer.
And since much of the computing can be distributed across many machines - so your search request, when it goes in, doesn't go to one machine. Imagine it this way: a portion of its search, let's say, when you typed in my name, it was going to look for the S-T-R part of my name, will go to one computer, the S-T-R, A through M, and then the other part will go through the S-T-R, N through Z listing, actually much smaller slices. So it could divide a request into many little slivers, send those out to many different machines, and then collect them all and package them into the results you're going to see on the screen. By distributing the work, it's able to speed up the response, and it's also able to have many backups in case any one of those should fail.
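[Illustration, not part of the broadcast: A minimal sketch of the two ideas described above, splitting one search across many machines and routing around a machine that has failed. It is not Google's actual software. For simplicity, each shard here holds a slice of the documents rather than a slice of the alphabet, and the names `Shard`, `search_replicas` and `scatter_gather` are hypothetical.]

```python
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """One cheap machine holding a slice of the document collection."""

    def __init__(self, docs, healthy=True):
        self.docs = docs          # maps document id -> text
        self.healthy = healthy    # False simulates a failed machine

    def search(self, term):
        if not self.healthy:
            raise ConnectionError("shard unavailable")
        term = term.lower()
        return sorted(
            doc_id for doc_id, text in self.docs.items()
            if term in text.lower().split()
        )


def search_replicas(replicas, term):
    """Try each copy of a slice in turn, skipping any that has failed."""
    for replica in replicas:
        try:
            return replica.search(term)
        except ConnectionError:
            continue
    return []  # every copy of this slice is down


def scatter_gather(shard_groups, term):
    """Send the query to every slice in parallel, then merge the answers."""
    with ThreadPoolExecutor(max_workers=max(1, len(shard_groups))) as pool:
        partials = pool.map(lambda group: search_replicas(group, term), shard_groups)
        return sorted(hit for partial in partials for hit in partial)


# Two slices of the collection; the first slice has one failed copy and one backup.
slice_one = {1: "randall stross writes", 2: "google search"}
slice_two = {3: "stross on google"}
groups = [[Shard(slice_one, healthy=False), Shard(slice_one)], [Shard(slice_two)]]
results = scatter_gather(groups, "stross")
```

[Even though one copy of the first slice is down, the search still finds documents 1 and 3, because the request falls through to the backup copy.]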
DAVIES: This company was formed by a couple of college guys, right? Larry Page and Sergey Brin. The lore is it started in a garage and then it grew and it grew and it grew and it grew, but one of the things that you write about is that a defining battle as the Internet came into being was the struggle between open and closed information, and that Google, in a way, took advantage of the fact that the Web became open. Give us an example of what you mean, open and closed.
Mr. STROSS: When Google started - and it really goes back to when the two co-founders were graduate students in computer science at Stanford, even before they decided to found the company, move off campus and rent a garage - their research was investigating the way links from one page could be analyzed to come up with an educated guess about the value of another page. And they had full access to the Web. No one blocked their software from going out and making copies of all available pages on the Web, and the Web was small enough, at the time, that it literally could fit on the hard drive of a single machine in their dorm room.
But at the same time, there was another, older model about how information should be made available online. And that's the AOL model, which said information is valuable, and you must pay a subscription in order to gain access. You had to become a member of a service. Now, in the early days, it wasn't clear why anyone would put up information to make it available for free if it didn't lead to someone paying a subscription fee. In the early days, advertising was not clearly going to support publishing on the Web. In fact, Google is going to be founded, and it's going to get about two years in before it becomes clear that advertising just may work out after all.
DAVIES: So it's open to everybody, and millions and millions of people provide content for free, and it's out there for Google to index and provide access to. But it's also made a fortune, and this is what's one of the most fascinating things about this story is that they happened upon what really is the most simple of notions which made a fortune. How did they make all their money?
Mr. STROSS: What has proven to be so valuable are the little plain-text ads that run on the side of a search results page or at the very top. Or, not quite as valuable, but also what has turned into big business, are the Google ads that Google supplies to other Web sites, ads that are matched to whatever words happen to be on someone else's Web page. When Google began with those text ads, it was not sure that this would turn into a big business. At the same time, it was shopping around its search engine services, hoping to license its technology. That seemed at the time to be as promising a course for them as the little text ads.
But what turned out was those text ads were incredibly effective because they are linked to what was on the mind of the searcher. And what's ingenious about them, and it was an accidental discovery, is that they're useful without knowing anything about the person who's typing in the search request. You don't need to know their gender, their age, their race, their religious background, even their hobbies or interests. All you need to know is what is on their mind at that moment, and you know that because of what they put in for their search request. So it's a really radical idea that advertising could be incredibly targeted without knowing anything about the demographic profile of the person.
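[Illustration, not part of the broadcast: A minimal sketch of the point made above, that an ad can be chosen from the words of the search request alone, with no information about the person searching. The function `match_ads` and the sample ads are invented for this example and do not reflect how Google's ad system actually works.]

```python
def match_ads(query, ads):
    """Return the ads whose keyword appears in the search request.

    ads is a list of (keyword, ad_text) pairs. Note that the only input
    about the searcher is the query itself: no age, gender or interests.
    """
    words = set(query.lower().split())
    return [ad_text for keyword, ad_text in ads if keyword.lower() in words]


sample_ads = [
    ("bike", "Discount bike repairs"),
    ("flights", "Cheap flights to Denver"),
]
shown = match_ads("best bike lock", sample_ads)
```

[Here the search for "best bike lock" brings up only the bike repair ad, because "bike" is the one keyword that appears in the request.]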
DAVIES: You said that Google depends upon keeping track of what we're doing at other sites. Why?
Mr. STROSS: Whatever we do is helpful to it to understand what we value. If you can think about Google's Web search service, it is a service that tells us what other people have judged to be useful or not useful. And Google needs our contributions, our free contributions, our judgments. That's what its link analysis is all about. It's benefiting from the judgments that Web site editors have made when they put links to other sites into their Web pages.
Google has added all sorts of services unrelated to searching the Web that are designed to hold information that is very dear to us and personal to us on its servers. There is Gmail, its email service. It has a number of services that are competing with Microsoft Office, so instead of running Word on our personal computer, we can use Google Docs that run on Google servers. And its information collection expands, and as it comes to know more and more about us because we are using its services and saving our data on its servers, it can develop more sophisticated ways of trying to guess what we would like to see as far as advertising.
DAVIES: We're speaking with Randall Stross. His new book is "Planet Google." We'll talk more after a break. This is Fresh Air.
(Soundbite of music)
DAVIES: If you're just joining us, we're speaking with technology writer Randall Stross. He has a new book called "Planet Google: One Company's Audacious Plan to Organize Everything We Know."
It's fascinating to read your book and see how many audaciously ambitious things Google undertakes, like its ten-year project to put every single book on earth in digital form, which is, of course, an imperfect experiment, far from complete, fraught with all kinds of legal challenges. But it sort of raises an interesting question about what this company is really about. There's sort of this informal motto: don't do evil. Should we see this as a benevolent force in the world, or should we worry about any institution that amasses as much money and power and information as this one has?
Mr. STROSS: As long as we get to see Google wrestle with the tough questions - for example, it has had to justify to a skeptical world why it would cooperate with the Chinese government to set up servers within China that censor search results. And it has been pretty forthcoming about its rationale, about its belief that this unsatisfactory arrangement will eventually give way as reform momentum builds in China, that the imposition of controls in China today will eventually - partly because of the flow of information, even if it's restricting information - will bring truly unfettered access to Google's information.
DAVIES: Do you buy that or is that a rationalization for a commercial motive?
Mr. STROSS: In the case of the Chinese censorship, I'm very ambivalent. I can see the arguments of both the supporters of Google's move, and I can also see very clearly the concerns that this is, in essence, shoring up a government that is, at its core, very repressive.
What I'm more concerned about are the debates that I don't hear Google engaged in internally, debates about what happens as its information stores include so much of our personal information, and we see at other companies data breaches, leaks. Google has been rather unforthcoming. It has essentially said, trust us. We will take good care of your data because we understand our business depends upon maintaining your trust.
However, data has a tendency to leak, and as more and more of the things that we are most concerned about that formerly had gone no farther than our office at home or our den, that sat on our hard drive and no other place, now sits at someone else's server. And it's not just Google. Many companies, of course, are running data centers, and much of our data that is very personal and sensitive sits on many servers. But it's Google that has taken a lead in attracting more of our information in more comprehensive ways than anybody else.
The fact that they don't show us that they are sensitive to how potentially catastrophic it would be for our data to find its way into the hands of someone we don't want it to be, including our own government's, to me, that's concerning.
DAVIES: Well, Randall Stross, I guess we're out of time, but thanks so much for spending some time with us.
Mr. STROSS: Thank you.
DAVIES: Randall Stross' book is "Planet Google: One Company's Audacious Plan to Organize Everything We Know." Fresh Air's executive producer is Danny Miller. Our engineer is Bob Perdick. Dorothy Ferebee is our administrative assistant. Sue Spolan directed the show. For Terry Gross, I'm Dave Davies.
NPR transcripts are created on a rush deadline by Verb8tm, Inc., an NPR contractor, and produced using a proprietary transcription process developed with NPR. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.