return to first page linux journal archive
keywordscontents

Creating Web Plots on Demand

Mr. Pruett tells us how his company creates on-the-fly plots of database information for web display.

by Mark Pruett

I work at Virginia Power, an electric utility that services a large section of the southeastern United States. Part of my group's job is to collect and archive data from thousands of devices in the field, such as electrical transformers and circuits. Our use of Linux to collect this data is a story in itself (see ``SCADA--Linux Still Hard At Work'' in the January and February 1995 issues of Linux Journal).

To make this vast amount of data accessible throughout our company, my group built a set of Intranet applications. These applications are accessible to anyone on our company network using a standard web browser.

Using one of these applications, our users can generate graphic plots of data from our databases. For example, an engineer may want to see a plot of the electrical load on a particular transformer during the course of a month. Our program can extract data from our database, create a plot of that data and display the graphic through the user's web browser in a matter of seconds.

Using tools that come with most standard Linux distributions, you too can build web-based plots on demand. The tools you will need include Perl, the gnuplot graphics package, and the NetPBM graphics conversion tools. I'll assume you already have a web server up and running. We use the Apache web server, but the techniques described here will work with other web servers too.

Before I show you how to tie these tools together, you'll need to get and install gnuplot and the NetPBM packages. See ``Resources'' for information on Internet sites and Linux distributions where you can get these packages.

What is gnuplot?

gnuplot is a device-independent plotting package that has been available for various Unices, as well as other platforms, for many years. It features its own language for creating and displaying two- and three-dimensional plots. Commands can be entered interactively, one at a time, or can be placed together into a gnuplot script file and run as a batch job.

gnuplot supports a wide variety of output devices. The most useful of these devices can be an X window. You can run gnuplot from an xterm under X Windows, and it can plot your data into a separate X window. This is a good way for those unfamiliar with gnuplot to learn all about the various commands.

Since gnuplot can take its commands from script files and its data from simple text files, it's the perfect tool to automate the generation of graphics when requested by a web server.

Plotting access_log Data

I doubt you have much interest in plotting load curves of electrical transformers, so I'll present a more generally useful example. The example is a CGI Perl script that plots web server hits over the course of a given day.

The script will require some understanding of Perl; the small amount of CGI knowledge required will be explained as needed.

First, let's describe what the script needs to do. Every time a person requests information from an Apache web server, the request is logged to a file called access_log. This log resides in the web server's log directory. The exact location of this directory depends on how you have set up your web server. Each line of the log represents a single web hit. A line from an Apache server's access_log file looks something like this:

foobar.mydomain.org
- - [01/Sep/1997:17:14:31 -0400]
"GET /images/gnuplot_10270.gif HTTP/1.0" 200 9538
This line indicates that a web browser at foobar.mydomain.org requested the file /images/gnuplot_10270.gif to be sent to it at 5:14 PM on September 1. All lines in the log have the same format, so we should be able to extract just those lines that match the date we're seeking.

We can also tell the hour and minute of the access. We'll want to keep a count of the number of accesses for each minute of our target day. This data will be sent to gnuplot so it can create our plot. We'll want to create a simple x-y line plot of the data, with time of day on the x axis and number of hits on the y axis.

Once the graph has been plotted, we'll need to convert it to a graphics format that can be displayed with a web browser. Finally, we'll need to build an HTML page to send back to the browser. This page will have an image tag in it that points back to the graphics file we just created.

Don't worry if this seems like a lot of hoops to jump through. It can all be done in seven short steps with less than 100 lines of code. The complete Perl script is shown in Listing 1. The code is divided into numbered sections for easy reference. Let's look at the script one section at a time.

Section 1: Where to Find the Files

The first section of Listing 1 requires little explanation. I just assign variables for file names and directories that will be used later. This is the section you might change if your web server's directories are different from mine.

Section 2: The Date to Search For

The Perl script is meant to be started when a web browser requests it. In other words, it is a CGI program. A Common Gateway Interface (CGI) program provides a standard mechanism for providing ``interactive content'' over the Internet. Instead of requesting a static HTML document, the web browser asks the web server to run a program that will dynamically create an HTML document. CGI programs can be written in any language, but Perl is by far the most common language for CGI programming today.

My CGI script will actually run as a stand-alone program from the command line, although it isn't particularly useful when run that way. Running as a CGI program, it gains one important aspect: access to the QUERY_STRING environment variable. The QUERY_STRING environment variable is one way to pass information to a CGI program. If QUERY_STRING has been defined, then our program expects it to contain a date. This date should be in dd/mmm/yyyy format (e.g., 12/Sep/1997). That's the same format Apache uses to log its dates.

If QUERY_STRING isn't defined, we'll assume we should look for access data for today's date. In either case, the date we're seeking is stored in the variable $date.

Section 3: Searching the access_log File

In section 3 of Listing 1, we open the Apache access log and begin looking for lines that contain our target date. When we find a date that matches, we extract the hour and minute and place them in the variables $inhour and $inmin.

It's at this point that we run into a slight conceptual problem. We want to plot time of day on the x axis of our plot, but what we actually have is hours and minutes. What value will be plotted on the x axis? What we want is a time line from 12:01 AM to midnight. If we just turned times into integers, we would get values like 1024 for 10:24 AM. gnuplot could plot this, but since there are only 60 minutes in an hour, there would be a gap at the end of each plotted hour. There are many ways to solve this problem. The one I used was simply to scale the minutes 0 to 59 to a value between 0 and 99. The hours are then multiplied by 100 and the converted minutes are added to them. Thus, the time 1:30 PM becomes the integer value 1350. Our plot will only have tic marks for hours, so no one will notice our little conversion.

So in this section of code, we convert the hours and minutes to an integer value and use it as an array index for an array of counters called $accesses.

Section 4: Getting the Data Ready for gnuplot

gnuplot can plot the data it reads from a text file. We need to write a file for gnuplot, where each line of the file contains two integer values. The first value will be our time value (the x axis) and the second value will be the number of web hits at that time (the y axis). The file will contain 2401 values. Here are a few lines from an example file:

0808  1
0809  0
0810  1
0811  2
0812  0
0813  21
0814  0
0815  12
This file shows a few minutes during the 8 o'clock hour. We create the file by printing out the data in our $accesses array. Here we encounter another little ``gotcha''. Perl creates something called ``sparse arrays''. This just means that array elements are created as they are defined. In other words, if there were no accesses at 8:12 AM, as in our sample data above, then no corresponding array element is created. It's undefined.

Normally in Perl, you can traverse an array using a foreach loop. In our case though, we need to output data for all time intervals, even those times when no accesses took place. We need this so we have a contiguous set of data for gnuplot to graph.

This is actually very easy to do. In section 4 of Listing 1, we set up a simple for loop. Using Perl's defined function, we determine if an array element exists. If it does, we simply print the index value as our x value and the access count as our y value. If the array element isn't defined, we know there were no accesses at that time, so we print out a zero as our y value.

Section 5: Building a gnuplot Command File

gnuplot will automatically label the x and y axes with values from our data set. If the maximum number of web hits for any time interval in the data set was 48, then gnuplot might create tic marks on the y axis at 0, 10, 20, 30, 40 and 50. This would be fine for the y axis, but remember that the x axis contains a calculated time value that might not make much sense to a person reading our graph.

Luckily, we can override the default tic marks and create our own. That's the job of the for loop at the beginning of section five. We create a text string that will be embedded in the gnuplot command file we are about to create.

gnuplot creates plots based on a set of commands you provide it. Actually, gnuplot has only two commands for plotting data, plot and splot, and we'll be using the simpler of the two in our program. The other command we'll use is set to enable and disable particular options and features in gnuplot. The Perl print statement in section five of Listing 1 handles writing a gnuplot command file to disk.

Those not overly familiar with Perl may find this variation of the print statement somewhat confusing. Let's look at that line:

print GPFILE <<EOM;
This print command tells Perl to write every line in the Perl script following it to the file opened with the GPFILE file handle. It stops printing lines to the file when Perl encounters the line in the script consisting only of the letters EOM. So all the lines in section five following the print statement and continuing to the EOM line are not Perl statements at all, but are gnuplot commands that will be written by Perl to the gnuplot command file.

The print command will also substitute the Perl variable references with their values, so the line:

set title "Web Server Accesses $mon $day, $year"
might be written to the file as:

set title "Web Server Accesses Aug 12, 1997"

Section 6: Creating the Plot Graphic

Now that we have a data file and a gnuplot command file, how do we get our plot? And how do we show it to the user? This turns out to be the easy part.

First, we need to talk a bit more about gnuplot. I mentioned earlier that gnuplot can display to a variety of devices, including X terminals. But gnuplot can also write a file in portable pixmap format (PPM). In section five of the Perl script, note that we write the gnuplot command:

set term pbm color
to the gnuplot command file. This tells gnuplot to write the output to a file.

However, this only takes us part of the way. Web browsers don't know what to do with PPM files; they usually want either a GIF or JPEG file. That's where the NetPBM package comes in. This is a set of command line utilities that convert from one format to another. One of those tools is exactly what we're looking for: the aptly named ppmtogif. ppmtogif is a simple filter program. It takes a PPM image file from standard input and writes a GIF file to standard output. Since most web browsers support GIF, ppmtogif fits our needs perfectly.

Thus, section six turns out to be almost trivial. We use a Perl system call to run our UNIX commands, then run gnuplot using the command file we created in section five (which in turn will read the data file we built in section four). gnuplot sends the graphic in PPM format to standard output, and the ppmtogif filter turns it into a GIF file that is redirected to a disk file. We now have our graph in GIF format on disk.

Section 7: Displaying the Plot in a Browser

The final step (section 7 of Listing 1) is to display the graph in the user's web browser. Remember this is a CGI program, so the browser that invoked the CGI Perl script is waiting to receive something from our web server. What we'll deliver is a brief HTML page containing an HTML image tag that points to the GIF file we just created.

We use the same technique, a multi-line print statement, to create the HTML the browser will receive. Note that we must prepend a ``content-type'' preamble to the data we send back to the browser. This lets it know to expect an HTML page, rather than some other type of data.

The user generates the plot by typing in the URL of the CGI script in their browser. If the server is called myserver and our script is saved as usage_graph_cgi in the web server's cgi-bin directory, then they would type:

http://myserver/cgi-bin/usage_graph_cgi
To specify a date other than today, the user appends the date to the end of the URL, separated by a question mark:

http://myserver/cgi-bin/usage_graph_cgi?11/Sep/1997
If all goes well, and there's no reason why it shouldn't, the user should see a page that looks something like Figure 1.

Figure 1. Plot Displayed on the Web

Cleaning Up the GIF Files

One minor issue remains. We create a new GIF file each time our Perl program is run, so we must get rid of them when they're no longer needed.

In my case, each night I simply run a script under crontab that deletes any GIF files created during the day. Since most of our applications are run during the daytime hours, the chances of deleting a GIF file I still need are very small.

This scheme might not be acceptable in every situation, so you may need to devise a different way to clean up the GIF files that collect on your web server. [The find command will fit this purpose admirably. --Ed]

Reviewing the Steps

While I've provided a specific example of on-demand plotting, the techniques used can be applied to any type of data you might want to plot. If you can extract the data to a simple text file, and if the data lends itself to two- or three-dimensional plotting, you can deliver it to the Web. The basic steps are always the same:

Tying together tools like gnuplot and NetPBM to quickly build a useful program shows that software doesn't necessarily have to come packaged in the latest object-oriented component, tied together with ActiveX or CORBA. Often, good solid tools, text files and a touch of Perl will more than suffice to do the job.

Resources

Mark Pruett received his M.S. in computer science from Virginia Commonwealth University. Mark is a programmer who writes about programming. He hopes some day to be a writer who writes about how to write program documentation. He can be reached at pruettm@vancpower.com.