Learn all about Apache modules and mSQL programming using a web counting program as an example.
by Randy Jay Yarger
This article is not about web counters. Oh, I know what the title says, and I am going to be showing you how to use mSQL to make a fast and effective counter. But that is just icing on the cake. The real purpose of this article is to explore the fascinating worlds of Apache modules and mSQL programming.
Of course, counters are far from the newest thing to hit the streets. I'd be willing to bet that you have one on your home page. However, the vast majority of counters that people use are third party off-site counters. These counters are usually implemented by including an <IMG> tag on your page to call a remote site which keeps track of the number of accesses and returns a counter image containing the current count. The problems with this method are well known: it can be expensive, it doesn't include all of your hits (especially hits from text-based browsers), and it isn't customizable.
Being Linux users, we don't settle for the easy way of doing things--we do them the right way. Every major distribution of Linux comes with the Apache web server. This gives every Linux user with a network connection the ability to run their own web site without the constraints of using third-party sites for anything, not even counters.
The Apache web server is far and away the most widely used web server on the planet. One of the features that makes it so popular (besides that fact that it is free and always will be) is its modular design. The server can easily be enhanced by writing modules that compile right into the main program. These modules are usually written in C, but there are even modules which allow you to write modules in other languages, such as Perl.
The good news is that writing an Apache module is not hard to do. Even a novice C programmer like me was able to get the hang of it fairly quickly. With that said, let's get down to the business of writing a counter module for Apache.
There are several good reasons for making your web counter a module that is a part of the server. Counters that work via CGI (such as the ones described above) do not record all possible hits to your page. On the other hand, counters that work by analyzing log files are hard to keep up to date, and they can also miss some hits (e.g., it may catch /~myhome/index.html but miss /~myhome/ or /~myhome). A counter directly integrated into the web server, however, is guaranteed to catch all possible hits since all files requested from your web server must be sent through the server. In addition, going through the web server allows you to use the name of the file that was actually sent to the user, so that instead of /~myhome, /~myhome/ and /~myhome/index.html, all hits are labelled /~myhome/index.html. The other big advantages are speed and customization. You know that the counter is going to be up to date, since every file going through the server also goes through your counter.
Now, all we have to do is decide how to store the data. As a matter of fact, a friendly member of the Apache community has done some of our work by writing an Apache module called mod_cntr.c (see Resources) which is a counter that uses plain text files or DBM databases to store the data.
As far as making a counter module goes, that one is really all you need. Download the mod_cntr.c file, put it in your Apache source directory and recompile. But being discerning Linux users, we want more power--that power is going to come from the mSQL database.
The mSQL engine is very fast at the expense of being an incomplete SQL language, but it has enough for our needs. Running our counter module through an mSQL module removes the need for cumbersome text files. All we do is call up the nearest mSQL server (whether on the same machine as the server or not) and send it the instructions needed to maintain the database.
I won't go into the details on how to obtain and install mSQL here. Complete instructions are available on the mSQL web page (See Resources). To compile our mSQL-based counter, you need the libmsql.a library in your libraries path and the msql.h header somewhere accessible.
Of course, the best way to handle this would be to add mSQL capabilities to the already existing mod_cntr.c. And in fact, this is the way I do it for my production server. But as a simple case, I'm going to rewrite the module so that it only supports mSQL. This will remove a bit of clutter and be a bit more understandable.
Listing 1 contains the code for the mSQL-based counter. Let's first look at this from the perspective of an Apache module. The Apache module API contains very simple rules for creating modules. Lines 403-419 contain the module definition that the web server reads to define the module. There are many options here, but for our needs (and for the needs of most modules), we only need to define four of them. The first two, defined on lines 406 and 408, are create_cntr_dir_config_rec and create_cntr_srv_config_rec. These are the names of functions which initialize the variables used by our modules. Notice that there are two of these, one containing the characters ``dir'' and one with ``srv''. Apache allows users to customize most aspects of the server on a per-directory basis using .htaccess files; this configuration is accessed with create_cntr_dir_config_rec. The main server configuration files (access.conf and httpd.conf) are accessed with create_cntr_srv_config_rec.
The code for the create_cntr_dir_config_rec and create_cntr_svr_config_rec functions are located on lines 132-146 and 148-162, respectively. The two sections of code are almost identical. Each simply initializes the counter information, contained in the structure cntr_config_rec (lines 106-112), with the defaults defined in lines 122-127. If you were writing a module for any purpose, you should include two functions similar to these to initialize any default variables for both the server-wide and per-directory configurations.
The next definition for our module is cntr_cmds at line 410. This is the name of an array of type command_rec which is defined in the Apache header files. These are the options which users of your module can enter into the Apache configuration files. There are six fields which must be filled out for each command_rec:
Our module defines six configuration options. CounterType defining the type of database to use (mSQL only, in this case, but I left the line in for easy transition to the original version of this module with text and DBM databases). CounterAutoAdd allows the user to decide whether or not a URI (the portion of the URL relative to the host) encountered for the first time should automatically be added to the database. CounterDB is the name of the mSQL database used, followed by the table within that database. There are three similar options that do the same operations for the server-wide configuration.
Each of the options defined has an associated function (set_cntr_type, set_cntr_autoadd, etc.). These functions check to see if the user supplied a value for that option. If so, that value is added to the cntr_config_rec structure for the module, replacing the default.
The final bit of information about our module is on line 417. This is the name of the function which does all of the work. In our case, that is the function cntr_update located on lines 331-400.
That is all there is to creating an Apache module. Once you create a C file with a module structure that has two functions to initialize default variables, a command_rec structure with user-definable options (along with functions to insert those options into your module) and a function to perform the action, you have everything you need. Of course, to do anything useful you have to code the action into that last function, so let's take a look at cntr_update.
The first interesting bit of this function occurs on lines 343 and 344. The structure r is of type request_rec (defined in the Apache headers) and is passed to us from the web server. This structure has all of the information we need about the request the server is currently handling, including the URI of the document and whether or not the document has been fully resolved. In lines 343-344, we skip all of the unresolved steps the server goes through to get the final file, so that the URI we receive is the name of the file actually sent. This way, the URIs entered into our database are of the form /directoryname/index.html instead of /directoryname, /directoryname/ or whatever name the client-end user actually typed. On lines 348-351, we abort if there is no URI (the user made a typo or the file doesn't exist) or the file is a server-side include (no need to keep a counter for those). We then call the function get_module_config for both the server-wide and per-directory cases. This is a function provided by the web server which gathers all of the user-supplied options and calls the appropriate functions (that we have provided) and returns a cntr_config_rec structure with the initialized variables.
Next, we check to make sure that the user has provided a database and table for us to work with. Assuming everything is set, we then call the cntr_inc function to increment the counter for the given URI. After that we set a couple of environment variables which can be used by CGI programs that may be interested in the counter (e.g., an image generator that makes an odometer graphic).
The main purpose of the cntr_inc function (lines 310-329) is to do a bit of preprocessing of the URI and then call the appropriate database incrementing function. In our case we only have one type of database, so this is a very short function. First it strips double slashes (//) from the URI and then calls the cntr_incmsql function to do the actual work.
As far as creating an Apache module goes, we're done. Pretty cool, eh? I certainly never imagined that creating an extension to a web server could be so easy. Now all we have to do is code the cntr_incmsql function to do the work.
The cntr_incmsql function (lines 253-308) is the heart of the module. In 55 lines we implement a fast, effective web counter using an SQL database server. The C API provided with mSQL includes several functions that allow easy access to all of mSQL's features. The first one we use is on line 264. There we call msqlConnect to connect us to the local database server. You can also call msqlConnect with a non-NULL argument to connect to a remote database server. In the next two lines, we check to make sure we connected successfully and return an error if we didn't. Line 269 calls msqlSelectDB to select the database supplied to us by the configuration options. Again, if the function does not return successfully, an error is generated.
Now that we have connected to the appropriate database, we can send our first SQL query to find out if the given URI already exists in the database. The table we query is given by the user and must have a uri char field which is a non-null unique index, a cntr_count int field and a cntr_date char field. The query we send is
select cntr_count, cntr_date from tablename where uri='uriname'This query returns the existing count of the URI (if any) and the last time it was reset. We send this query using the msqlQuery function and check for errors as usual. The next step is to retrieve the results (if any) from the server. We do this with the msqlStoreResult function. This function returns a structure which allows us to retrieve the results row by row, along with other information about the results such as the number of rows retrieved. In line 281 we use the msqlNumRows function to see how many rows were retried. The number should be one or zero since each URI is unique. (A full list of the SQL queries supported by mSQL can be found on the mSQL home page listed in Resources.)
If there already exists an entry for that URI, we enter the block on lines 282-293. First we use the msqlFetchRow function to retrieve the number of counts for the URI. Since we know we only have one row of data, we only need to call the function once. If you were expecting multiple rows of data, you can keep calling msqlFetchRow to retrieve them until the data runs out. The msqlFetchRow function returns an array which contains each field that you requested. In this case, we asked for the count and the date for the URI. We then increment the count and place the count and date into our results structure so that CGIs or other interested programs can access them. Finally, we send the update query:
update(with the appropriate count and uri in place). After checking for errors, we are free to close the database (using msqlClose) and deallocate any database memory using (msqlFreeResult). We can then return success and everyone is happy.
If the URI we have does not exist in the database, we have to enter it. First we check to see if anyone has defined the CounterAutoAdd option in the server's configuration files. If they have, then we just skip this URI and go home. If CounterAutoAdd is not defined, we have to add the URI into the database with a count of 1 and the current date. The current date is set using the server-defined function ht_time which returns a pre-formatted string. We do this with the following query:
insert into cntr_date) values ( 'myuri', 1, 'currenttime' )(with appropriate values for tablename, myuri and currenttime). Then we always check for errors. If there are no errors, the module is finished. We now have a working hit counter built right into the web server.
This module is neat, but it is not perfect. A problem known as a race condition arises if you are running a high volume server. Note that on line 275, we check the database to see if the URI already exists. If it doesn't, then on line 296 we insert the URI into the database. Suppose that somewhere between line 275 and 296, another hit comes in for the same URI. The database indicates on return that the URI does not exist; it doesn't know the first hit is about to add it to the database. By the time the second hit reaches line 296, the URI has now been entered into the database by the first, and the database returns an error due to a non-unique URI.
In a fully-featured SQL server, this problem is solved by a technique called ``transactions'' where multiple commands can be entered into the database together--before any other commands are processed. Hopefully, mSQL will support transactions in the near future. Until then, one workaround for this problem is to initialize your database with all of the URIs for your site and a count of 0. This way, the counter will never be told that a URI doesn't exist.
Randy Yarger (randy@yarger.tcimet.net) is the Systems Administrator and Chief Programmer for H-Net, Humanities and Social Sciences On-line (http://www.h-net.msu.edu). His sign is Virgo, his favorite color is blue, he wants to work on world peace and is currently practicing for the interview portion of the Miss America pageant.