How To Fix Your Overloaded Server

What happens when your users' pages become too popular for your network? Here is one ISP's solution.

by Jonathan Gross

As the Web fills with pages, the number of people accessing those pages grows too. Indexes such as Lycos and Yahoo provide a good starting point for people looking for interesting pages to browse. However, the resulting popularity can cause problems for the servers hosting those pages. This article examines what one ISP did when a page on its Web server began dragging the whole system down.

Cathie Walker gets her Internet feed from Islandnet and maintains a set of pages on Islandnet's server that she calls ``Centre for the Easily Amused''. As she says, it ``ain't rocket science...[just] an average page with a catchy name and content.''

Cathie did nothing beyond the typical promotion of a Web site, submitting the link to the usual resources: Yahoo, EINet Galaxy, Starting Point, Lycos, Web Crawler, Harvest Gatherer, Apollo, Whole Internet Catalog, Nikos, and Jump Station, among others. On August 1, Centre for the Easily Amused (CEA) appeared on Netscape's ``What's New?'' list, generating even more traffic.

The next day, Cathie tried to retrieve her page, only to receive a ``Forbidden'' error from Islandnet's server.

Meanwhile, Mark Morley, one of three owners and head technical contact for Islandnet, was working on the same machine the Web server was running on. Gradually, the machine slowed to a crawl, affecting not only Mark's work and the Web server, but also FTP access and the users who were dialing in. Suspecting the Web server, Mark began wading through the logs and found that the number of requests for Cathie's pages was inordinately high and was running the server into the ground. Her pages alone were generating between 25,000 and 50,000 hits a day. Since Islandnet is connected via a T1, bandwidth was not a problem, but the load imposed by a total of 100,000 hits a day on the server (a Sun Sparc 5 running SunOS 4.1.3 with 112MB of RAM and about 4GB of disk space) was unacceptable.
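
The article does not say how Mark searched the logs. As a minimal sketch, assuming the server writes its access log in Common Log Format (where the requested path is the seventh whitespace-separated field), a shell function along these lines would rank pages by number of requests. The name top_pages and the log path in the note below are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the most requested paths in an access log.
# Assumes Common Log Format, e.g.
#   host - - [date zone] "GET /path HTTP/1.0" status bytes
# so that the request path is field 7.
top_pages() {
    logfile=$1
    count=${2:-10}    # number of paths to show; defaults to 10

    # Count hits per path, then sort by hit count (highest first),
    # breaking ties by path name so the output is stable.
    awk '{ hits[$7]++ } END { for (p in hits) print hits[p], p }' "$logfile" |
        sort -k1,1nr -k2,2 |
        head -n "$count"
}
```

Called as top_pages /var/log/httpd/access_log 5, it would print the five most requested paths, each preceded by its hit count, which is roughly the information Mark needed to single out one user's pages.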

The Options

Mark figured he had three options:

  1. Remove access to Cathie's pages altogether.

  2. Set up secondary, dedicated servers and offload the high volume pages to these machines.

  3. Put her pages on some sort of on/off schedule.

Option one, while the easiest and the quickest, is not a satisfactory solution. If you are offering Web space to your customers, you should be able to do so, regardless of technical problems. As Mark said, ``...it wasn't her fault her page was so popular that it brought our server to its knees.''

Option two is the best option, since it ensures that there are CPU cycles available to handle incoming requests quickly. However, it is also the most expensive, since it requires additional hardware.

Option three is quick, extremely easy to implement, and allows access to the page without slowing the server down too much. Unfortunately, it doesn't provide full-time access.

The Solution

Denying access to Cathie's pages was out. Ideally, Mark would have liked to add another server, but he didn't have the hardware, so he chose option three as a compromise between the first two.

First, he created a page that said something like, ``Thanks for connecting to Cathie's pages. Due to high server loads, her page will be unavailable for a few minutes. Please try back later.'' He named this file BUSY. He then copied the file containing her home page to another file called HOMEPAGE and removed the original file from her Web space. Now the problem was swapping the two pages every once in a while.

Since Islandnet is a *nix system, Mark created two cron entries:

# Minutes 0 and 30 of every hour: put the real home page in place
0,30 * * * * cp /home/cwalker/HOMEPAGE /home/cwalker/homepage.html
# Minutes 15 and 45 of every hour: put the ``busy signal'' page in place
15,45 * * * * cp /home/cwalker/BUSY /home/cwalker/homepage.html

The first cron entry copied the real page into homepage.html on the hour and at half past the hour. The second entry copied in the ``busy signal'' page at fifteen and forty-five minutes past the hour, creating a schedule of 15 minutes on, 15 minutes off.
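
One refinement, which is not part of Mark's setup as described, is to make each swap atomic. A plain cp rewrites homepage.html in place, so a request arriving mid-copy could in principle see a partial file. The sketch below, with the hypothetical function name swap_page, copies to a temporary file in the same directory and then renames it over the target. It assumes both paths are on the same filesystem, so that mv is a simple rename.

```shell
#!/bin/sh
# Hypothetical replacement for the bare cp in the cron entries.
# Usage: swap_page SOURCE DEST
swap_page() {
    src=$1
    dest=$2
    tmp="$dest.tmp.$$"    # temporary name next to the destination

    # Copy first, then rename over the live file in one step.
    # If either command fails, remove the temporary file and report failure.
    cp "$src" "$tmp" && mv -f "$tmp" "$dest" || {
        rm -f "$tmp"
        return 1
    }
}
```

Saved in a script that ends by calling swap_page "$@", this could stand in for cp in both cron entries with the same two arguments, leaving the 15-minute schedule unchanged.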

Immediately, things looked much better. When her page was actually available, the system load shot up but didn't have time to get out of hand before the ``busy'' page was swapped in. Mark kept this schedule for about a week until the initial burst of popularity had subsided and then was able to remove the timer and still keep an acceptable system load.

Prevention

In case this happens again, Mark has purchased additional hardware and configured a dedicated server. If pages become popular and start causing problems for the server, they are switched over to the dedicated machine, where the CPU can keep up with the requests.

Jonathan Gross is the editor of WEBsmith magazine.