Learn how Linux came to power one of the best search engines available on the Web.
by Jason Schumaker
Google, http://www.google.com/, is the hottest search engine being used on the Internet today (see Doc Searls' ``Google Gains...'' article, page 10). It's fast and consistently returns relevant links. The company was founded in 1998 by Sergey Brin and Larry Page. The two collaborated on a new search engine technology called PageRank(tm). Since then, Google has gathered quite a following--Yahoo! recently hired Google to power its search engine--and now boasts the ability to link to over 1 billion URLs. I talked briefly with Sergey Brin, Google's co-founder and President.
Jason: What led to Google's decision to use Linux? When did that start?
Sergey: Well, Larry Page and I were in the Stanford PhD program in Computer Science, and we developed Google there. The way the computer science program worked is there was a hodgepodge of computer equipment lying around, and we would grab whatever scraps we could. We had all kinds of computers: HPs, Suns, Alphas and Intels running Linux. So, we gained a lot of experience with all of those platforms.
When we started Google, we had to make the decision of what we wanted to use. Of course we chose Linux, because it is the most cost-effective solution.
PCs are not only much cheaper these days, but we can also get them very quickly, because they're such a commodity item. That's an incredible benefit. We just installed another 1,000 computers and we got that done in a few weeks. That's really hard to do with any other kind of workstation. I think that's an advantage that people don't entirely realize.
Jason: Did you view it as being better, or was cost the main reason?
Sergey: It was better in some ways. Certainly for our purposes, we felt the support was better. For example, the actual kernel authors will respond to problems pretty quickly. They are especially responsive to Google nowadays, since we're so widely used. We can have a 15-minute turnaround. You can't really beat that for support.
That was an important factor, but frankly, the cost was a bigger issue. PCs are so cheap, which is very important. Sun's Solaris is probably more stable than Linux on PCs. It's hard to determine the blame, whether it's the hardware or the operating system. But, it's a minor difference.
Jason: Then, does all of your support come from newsgroups or do you actually pay for it through Red Hat?
Sergey: We have an operations team of about ten people, which helps a lot. And other than that we check newsgroups and e-mail the authors of the code. Usually, if it's a problem we can't figure out, we go straight to the authors.
Jason: Is Linux used on desktops at Google?
Sergey: It depends. Engineering mostly runs Linux. Business development/marketing runs Windows. Actually, I use Linux with VMware running Windows. Some people have two computers, particularly some people in engineering who do UI development and need to test things out on Windows platforms. I find it better to just use VMware and have one computer.
Jason: In a technical sense, what does Linux lack? What does it not provide?
Sergey: The 64-bit file system, which I know they are working on. It's slowly coming around. I think there are still occasionally some stability issues. I'm not saying Linux is unique in that respect, but you definitely want to have reliability. There are some issues dealing with higher memory systems. When we get to 2GB and try to push past that, we encounter various problems. I know we've had some trouble with the network stack when we really push it hard, in terms of having lots of connections from lots of different machines.
Jason: Well, you're getting quite a few hits per day, aren't you?
Sergey: Yes, we are. We do about ten million searches per day at Google.com. And another six million or so from OEM customers. So, we get a lot of hits. And when we crawl the Web, we crawl it pretty quickly, which can really stress the system.
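For a rough sense of scale, the daily figures quoted above can be turned into an average per-second rate. The Python sketch below is only back-of-the-envelope arithmetic: the function name is hypothetical, and it assumes traffic is spread evenly over 24 hours, which real search traffic certainly is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_per_second(searches_per_day: int) -> float:
    """Average rate, assuming (unrealistically) perfectly even traffic all day."""
    return searches_per_day / SECONDS_PER_DAY


# Approximate figures quoted in the interview.
google_com_searches = 10_000_000  # "about ten million searches per day"
oem_searches = 6_000_000          # "another six million or so"

total_searches = google_com_searches + oem_searches

print(f"total searches per day: {total_searches:,}")
print(f"average searches per second: {average_per_second(total_searches):.0f}")
```

Under that even-traffic assumption the combined load works out to roughly 185 searches per second on average; peak rates would be higher.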
Jason: Has your system been down entirely?
Sergey: No, but we certainly have individual computers go down. Our system has a lot of redundancy built into it, so the users don't see it from the outside.
Jason: I've read that you have developed your own network installation tools ...
Sergey: Yeah. We've re-used various components of things that people have built; we've had to now re-do them quite a bit ourselves. We have 5,000 computers now, and that's actually a fair amount of work to install. So we have our own network install system--where we can bring up 80 computers at a time. And we have our own testing software and monitoring tools to keep track of what the computers are doing, what state they're in. So, we've had to do a fair amount of development.
Jason: Of the 5,000 computers used by Google, can you roughly break down what they are used for, i.e., 3,000 perform searches, 1,000 do OEMs, 500 do web crawling, etc.?
Sergey: Without giving specific numbers, we can say approximately 80% of the machines are used for performing searches (google.com and partners); about 10% of the machines are used for Research and Development and another 10% of the machines are used for preproduction (crawling and indexing the web).
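Applied to the 5,000 machines mentioned earlier, those approximate percentages give a rough head count per role. The Python sketch below is illustrative arithmetic only; the function and variable names are hypothetical, and the resulting counts are estimates derived from the rounded shares, not numbers supplied by Google.

```python
TOTAL_MACHINES = 5_000  # fleet size quoted in the interview

# Approximate shares from the answer above, in percent.
SHARES_PERCENT = {
    "searches (google.com and partners)": 80,
    "research and development": 10,
    "preproduction (crawling and indexing)": 10,
}


def machines_by_role(total, shares_percent):
    """Split a machine count by integer percentage shares (estimate only)."""
    return {role: total * pct // 100 for role, pct in shares_percent.items()}


breakdown = machines_by_role(TOTAL_MACHINES, SHARES_PERCENT)

for role, count in breakdown.items():
    print(f"{role}: about {count:,} machines")
```

With these inputs the split comes to about 4,000 machines serving searches and about 500 each for research and development and for preproduction.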
Jason: Are the tools worth releasing to the Open Source community?
Sergey: That's an interesting question. I mean, I don't know of too many installations that are of comparable size to ours, but it certainly is, now that you mention it, something we would consider. I don't think that any of them are robust enough or clean enough at this point in time. But, I think we can get them to that state if other people would take over the maintenance and contribute. I just don't think that there are too many people who would end up using them.
Jason: Could you briefly tell us something about yourself and how you came to work at Google?
Sergey: I was born in Moscow and came to the United States at the age of six. I grew up in Maryland, then went to the computer science program at Stanford. I started there in 1993, where I worked on data mining, which basically involves analyzing vast amounts of data to find interesting correlations and patterns. Then Larry joined in 1995. He started downloading the Web and we analyzed its link structure. We've worked together from then on.
Jason: Well, thanks so much for your time. Take care.
Sergey: Thank you.
Jason Schumaker (jason@ssc.com) has worked for Linux Journal for nearly two years. He is Assistant Editor and a staff writer.