This month Mr. Lerner shows us how to set up a web site using many of the techniques he's taught us over the past months.
by Reuven M. Lerner
Over the last year, we have looked at ways to create sophisticated web sites using little more than CGI programs written in Perl. Among other things, we have explored HTML/Perl templates (for separating design from programs), HTTP cookies (to identify returning users) and relational databases (to store information in a readily retrievable format while gaining robustness and security).
This month, we will see how we can use all of these techniques together. Such a combination of techniques is used on many commercial sites and is the basic idea behind several web server projects, including Microsoft's Active Server Pages (ASPs), Vignette's StoryServer, AOL's freely distributed AOLServer and the freely distributable PHP/FI.
Note that many of the examples in this month's installment refer to techniques and ideas that were discussed in previous issues of Linux Journal. If you are new to the magazine or to the ideas, please see the ``Resources'' sidebar, which should give you some good starting points for learning about these subjects.
Our example site will revolve around a single table tracking users' birthdays. Once we have created the table for the birthdays, we will write a CGI program that allows users to enter their birthdays into the table. Finally, we will combine the use of cookies and Perl/HTML templates to create personalized home pages drawing upon the information stored in the database.
Assuming that MySQL is our relational database server and that our table will be in the ``test'' database, at the prompt type the command:
mysql testThis starts the MySQL client, allowing us to enter SQL queries interactively. The ``test'' database used in these examples comes with MySQL and is completely insecure--it cuts down on the space required to explain how to secure new databases created with MySQL. As a result, you should seriously consider creating separate databases and users for each of your web applications, so as to reduce the risk of unauthorized users modifying or viewing data on your system.
First, we must create a table with information about our users and their birthdays. Here are some simple SQL commands that create such a table:
mysql> create table birthdays (person_id int unsigned auto_increment, -> firstname varchar(15) not null, -> lastname varchar(15) not null, -> email varchar(50) not null primary key, -> birthdate date not null, -> key id (person_id));This creates the ``birthdays'' table in the ``test'' database. The table has five columns (person_id, firstname, lastname, email and birthdate), none of which may be blank. Each row in the table represents a distinct user whose birth date we wish to track; we ensure that no user is entered twice by setting the ``email'' column to be a primary key, which is a fancy way of saying that no value of ``email'' may be repeated. Since users may have multiple e-mail addresses, we cannot ensure that a user will not enter his or her birthday twice. However, this is likely to reduce such repetition, and is better than using the person's name, which is rarely unique.
The first column, person_id, will be set automatically by MySQL each time we add a user to the database. The first entry in the system will have person_id set to 1, the second entry will have it set to 2 and so forth. Because person_id is of type int unsigned, our system can accept no more than 4,294,967,295 unique entries--smaller than your particular database might need, but large enough for most of my purposes.
We can get a good picture of our database using the ``describe'' command at the ``mysql'' prompt, as follows:
mysql> describe birthdays; Field Type Null Key Default Extra person_id int(10) unsigned MUL 0 auto-increment firstname varchar(15) lastname varchar(15) email varchar(50) PRI birthdate date 0000-00-00
Now that we have created the infrastructure, we need applications that will allow people to enter their birthdays. The easiest way to do this is to create an HTML form whose contents are submitted to a CGI program that takes information from the form and saves it to the database. One such form is shown in Listing 1.
The form is relatively straightforward, although it might seem a bit daunting because of the long selection lists defined using the ``select'' and ``option'' tags. Using such lists, rather than allowing the user to enter a birthday into a text field, reduces the number of errors that a user might enter. It is certainly possible, though, that someone could create an HTML form with identical field names, using text fields instead of selection lists, and thus circumvent our system. The moral, then, is that you should always try to reduce the number of errors that users might enter, but always check to be certain that data were entered correctly.
As you can see from looking at Listing 1, our form gathers six pieces of information: first name, last name, e-mail address and user's birth month, day and year. The latter three pieces of information are separated so that we can simplify the user interface; as we will soon see, combining these to create an valid MySQL ``date'' type is fairly simple.
Our form's ``form'' tag indicates that data should be submitted using the POST method to a CGI program named ``enter-birthday-info.pl''.
We take the user's input and create a SQL query that creates a new entry in the table:
# Now that we have the basic information, create # an SQL query my $command = "insert into birthdays "; $command .= "(firstname, lastname, email, birthdate) "; $command .= "values "; $command .= "(\"$firstname\", \"$lastname\", \"$email\", \"$birthdate\")";Of course, you do not need to store the command in the variable $command. Indeed, you can create the command directly when using $dbh->query, rather than putting it together and then passing $command as an argument to $dbh->query. Putting the query together in this fashion makes it easier to read when programming and easier to send the SQL query to the screen if bugs appear.
After we send our query to the database server, the row is probably added. However, we do not want to just assume it was added because something serious might have happened, and we wish to give a correct indication of the outcome to the user.
First, we check to see if $dbh->errno--the value of an error returned by MySQL--was set to 2000. This is the specific error code returned when trying to insert a row that conflicts with another row. Since we have defined ``email'' to be a primary key, the odds are rather high that if errno is set to 2000, then we have tried to enter a duplicate e-mail address:
if ($dbh->errno == 2000) { &log_and_die( "There is already an entry in\ the database for \" $email\". Try another\ e-mail address!"); }If this was not the case, then we should check for any other error. The easiest way to detect errors is to see if $sth is undefined; if it has not been given any value, then an error occurred, which we identify for the user and in the error log. Note that our general error-catching mechanism needs to come after the mechanism for catching error 2000.
elseif (!defined $sth) { &log_and_die("MySQL error " . $dbh->errno . "\ on command \"$command\"<P>" . $dbh->errmsg) unless (defined $sth); }Finally, if no errors occurred, then we can print a message indicating success:
else { # Return something to the user print "<P>Done!</P>\n"; }
Now users can enter information about their birthdays into the system. But wait--at the beginning of the column, I said that we were going to make it possible to keep track of users' comings and goings. One easy way to do that is to use HTTP cookies, small pieces of data stored on the user's computer that are sent to the site that set them. CGI programs can set cookies whenever they return an HTTP response to a user's browser. Whatever cookies were set are then returned upon each subsequent visit to that web server. In this way, they can be used as variables, albeit variables that might be modified or removed by users worried about their privacy, or who might come to our site from a friend's computer.
Our cookie will have to be a unique identifier that can be used to bring up the user's entry from the ``birthdays'' table. We thus have two options--the user's e-mail address, which is guaranteed to be unique because it is a primary key, and person_id, which is automatically incremented each time a new entry is added to the table. We will use person_id, but there is no reason why you could not use the ``email'' column. Indeed, given that ``email'' is a primary key, you could even do away with the person_id column--except that if you were to create other tables that refer to individual users, keeping track of an integer (such as person_id) is much more efficient than using their full e-mail address.
How can we retrieve the ID that was added to a row that we might add? The most obvious way is to retrieve the row that we just entered, creating and sending an SQL query to the MySQL server. But MySQL has an easier way to do this--after sending our query, we can ask for $sth->insertid. $sth is the ``statement handle,'' an object that allows us to send and retrieve information about an individual SQL query and statement, and insertid is one of the methods that $sth provides.
Once we know the value of person_id, we can create a cookie with the following two lines of Perl:
my $cookie = $query->cookie(-name => "person_id", -value => $sth->insertid);The cookie (now stored in $cookie) is sent as part of the HTTP header returned by the CGI program to the user's browser. Thus, the original version (see Listing 2) of our program sent a basic MIME header at the beginning of the program's execution with the command:
print $query->header("text/html");But the new version of our program will have to move that to a later portion of the program, after we have already sent our query to the database. In addition, we will have to modify our statement such that it sends the cookie along with the MIME information in the header. This can be accomplished with the following code:
print $query->header(-type => "text/html", -cookie => $cookie);When the above code runs, it sends both the MIME header that describes the type of output that we are sending to the client, and the information describing the ``person_id'' cookie that we want to set. Every time this user visits our site in the future, we will be able to retrieve the value of ``person_id'', and thus look the user up in a table in our database.
You can see the results of changing our program in Listing 3. The modifications are fairly small, but they ensure that a cookie is returned to the user's browser whenever we successfully add a row to the database. (When no row is added to the database, the header remains as before, sending the MIME header but nothing else.)
The final part of our system will bring cookies and databases together with another technique we discussed several months ago, Perl/HTML templates. Templates are pages of HTML with small snippets of Perl interspersed between the HTML tags, accessed via a short program that uses the Text::Template module to turn the Perl into text. This allows HTML pages to perform database lookups, calculations and other elements that require computation too difficult to accomplish with server-side includes. Unlike straight CGI programs, templates can be edited by your site's designers and HTML coders, removing the programmer as a bottleneck to changing the HTML that a program produces.
In Listing 4, you can see wrapper.pl, a CGI program that turns templates into HTML. (A version of wrapper.pl published in a previous issue of LJ was not tested thoroughly, and contains several bugs that were fixed in this version.) Assuming that wrapper.pl is installed on your system and that the file /home/httpd/html/birthdayhp.tmpl is a valid template, you should be able to request
/cgi-bin/wrapper.pl?/home/httpd/html/birthdayhp.tmplIn other words, wrapper.pl should be invoked with a single argument, the name of the template that should be evaluated and then returned.
This version of wrapper.pl allows us to pass variables to the template using the LJ package (as described in the manual pages for Text::Template). This allows us to pass variable values from wrapper.pl to templates. In general, we would not want to pass variable values from wrapper.pl to a template, since the template should be somewhat isolated from its surroundings and should be allowed to assign its own variables. But in this case, we do want to pass one value, that of $query (the CGI instance), which allows us to access information passed to wrapper.pl per the CGI specification. This includes the ``cookie'' method, which allows us to retrieve cookies that our server has set in the past.
I have included a simple version of birthdayhp.tmpl in Listing 5, so that you can see how easy it is to include Perl inside HTML. There is a performance penalty for serving documents in this way, since you are forcing an invocation of Perl each time a template is viewed on your system. But that drawback is often counterbalanced by a template's versatility and ease of inclusion in a site. A large Web site's editorial and production staffs can thus modify the site's content and design without disturbing programs necessary for the site to run. The programs are surrounded by curly braces, making them easy to spot in an HTML file.
One thing to remember when dealing with templates is that the output from each Perl fragment is inserted into the resulting HTML. If you wish to send the text ``Hello, there'' to the user's browser from within a Perl fragment, in order for things to work correctly you must use:
{ "Hello, there" }or a slightly more formal version:
{ my $outputstring = ""; $outputstring .= "Hello, there"; $outputstring; }Do not make the mistake of trying to use print from within a template, as in this case:
{ my $outputstring = ""; $outputstring .= "Hello, there"; print $outputstring; }The contents of $outputstring will indeed be sent to the user's browser thanks to print, but the text will be sent ahead of the rest of the template. In place of the Perl block containing this code, the Text::Template module will insert the result from print--the result will have a value of 1 when printing is successful. Setting strings, rather than printing them directly, is the norm in the case of templates, but takes some getting used to if you are an experienced Perl programmer.
The trick now is to create a template that does the following:
Getting the value of the user's cookie is almost as simple as setting it. Just as our CGI program can set one or more cookies when it returns an HTTP header to the user's browser, the value of the cookie is sent to our server in the HTTP headers accompanying any request we might receive. Once our person_id cookie is set on the user's computer, every visit to our site will be prefaced with the name and value of person_id. We can retrieve the values of all cookies on our system with a call to the ``cookie'' method, just as we used it to create our cookie.
Listing 6 contains a very short CGI program (show-cookies.pl) that prints the names and values of all cookies sent to a particular server. Remember that cookies are only sent to the server that originally set them--so while your browser might contain a large number of cookie name-value pairs, the output from show-cookies.pl will only display those created by your server.
In show-cookies.pl, we iterate through each of the cookies sent to the web server. But for our particular purposes, we are only interested in a single cookie, ``person_id''. We can retrieve that by using the ``cookie'' method in the following way:
my $person_id = $query->cookie("person_id");Assuming that a cookie named ``person_id'' was sent to the web server (meaning that the ``person_id'' cookie had been set by a program on that server in the past), its value would now be available in the variable $person_id. We can then use $person_id as a unique key in our ``birthdays'' table, allowing us to retrieve information about returning users.
Among other things, we must ensure that the template handles the possibility that a user with a cookie might not appear in the database, or that a user might go to the customized home page without a cookie. In the test template (cookie.tmpl), this is handled in a fairly crude way, printing out both the value of the user's ``person_id'' cookie and the number of rows matching this value of person_id in the database. In the example template shown in Listing 7, we include the following code:
if (($person_id == 0) || ($sth->numrows == 0)) { $outputstring .= "<P>Error retrieving information.</P>\n"; $outputstring .= "<P>person_id (cookie) = \" $person_id\".</P>\n"; my $numrows = $sth->numrows; $outputstring .= "<P>Rows returned from table = \" $numrows\"</P>\n"; $outputstring .= "<P><a href=\"/birthday.html\">"; $outputstring .= "Enter your birthday</a></P>\n"; }This code checks to see if $person_id (which is sent as a cookie from the user's browser and should correspond to a single row in the ``birthdays'' table) is equal to 0, which can also mean that it is not set. If $person_id is 0, then we have no cookie on record for this user. This does not necessarily mean that the user has never visited our site before--some users reject cookies because of privacy concerns, others use multiple browsers (each of which keeps its own list of cookies) and still others might be accessing the web from multiple computers. But our system does ensure that users accessing our site from the same computer (and the same browser) will see their birthday displayed whenever they come to our system.
We also compare $sth->numrows with 0 to see if no rows were returned from the database. It is quite possible that a user might have visited our site long ago, and that the cookie from that visit remained on that user's computer--but that all early visitors' entries in the database were somehow deleted. Under such circumstances, $sth->numrows will return 0 (meaning that no rows had a person_id column matching $person_id from the user's cookie), and we have to request a new birthday entry from the user.
If the query did return a row (and we know that it will return one row at the most, since person_id must be unique), then we have to grab that row with $sth->fetchrow, and then read the values of the resulting array into our variables. In this particular case, we do nothing more than print them out:
{ while (my @arr = $sth->fetchrow) { my ($firstname, $lastname, $email, $birthdate) = @arr; $outputstring .= "<P>firstname = \"$firstname\"</P>\n"; $outputstring .= "<P>lastname = \"$lastname<"</P>\n"; $outputstring .= "<P>email = \"$email\"</P>\n"; $outputstring .= "<P>birthdate = \"$birthdate\"</P>\n"; } }Of course, if we were interested in doing something a little more interesting, we could do so by taking the values returned by $sth->fetchrow and using the resulting variables in the title of the HTML page or a comparison of today with the user's birth date. The point is that the database is a means for storing information between invocations of the CGI program. Once the information has been read into the CGI program from the database, we can use that information just as easily as if we had assigned the variables at the beginning of its invocation.
Most people do not need to be reminded of their birthdays. And indeed, the use of birthdays in this example was simply for demonstration purposes. Even with the limited information we stored in our database, we can create a bare-bones personalized home page that displays the user's name in the title. With a little more work, we could print a special message on this user's birthday, or an indication of how many days remain until the user's next birthday.
And because we have stored all users' birthdays in our database, we can create applications that access other birthdays on the system. For instance, we could create a CGI program (or a Perl/HTML template) that finds other users on the system with your birthday. The possibilities are endless, and putting the information into templates means that you (as the programmer or webmaster) can concentrate on writing the code necessary to make things run, while the site's editorial and production staffs can make things look pretty and ensure that they are grammatically correct as well.
With that, we end our whirlwind (albeit longer than usual) tour of integrating multiple techniques into a single web site. Web sites based on databases are increasingly popular, for good reason. The largest and best-known web sites combine back-end databases with templates and cookies to give each user a personalized experience; now that you have seen how it can be done, create some on your own sites.
Reuven M. Lerner is an Internet and Web consultant living in Haifa, Israel, who has been using the Web since early 1993. In his spare time, he cooks, reads and volunteers with educational projects in his community. You can reach him at reuven@netvision.net.il.