Interfacing Relational Databases to the Web

This document explains how to build a database-backed web site using Apache, the PHP3 module and the PostgreSQL relational database.

by Will Benton

Why would one want to link a database to the web? A better question might be, ``Why wouldn't one want to link a database to the web?'' Static HTML pages are good for many things: documentation, hypertext books, personal pages and other unchanging information. However, static pages present a few problems:

Static pages can be a hassle to maintain. If one is managing a large site with thousands of pages, changing just the ``look and feel'' of the site will involve either an inordinate amount of work or a long evening with CGI and Perl. This gets even nastier when the content of the site changes.
Static pages don't allow for user input, feedback or collaboration. Suppose you want to add a message board to your static web site. You can set up a form that mails user comments to the webmaster, who manually puts them on a page--you can even set up a script to do this. However, this presents a few problems, as we shall discuss later.
Static pages don't allow you to operate a web service. One can operate a web site with static HTML, but some of the most useful web sites today, such as Slashdot (http://slashdot.org/), CNN (http://www.cnn.com/) and Amazon.com (http://www.amazon.com/), are services offering dynamic database-backed content.

A database management system provides uniform access to structured data, much like an object-oriented programming language provides a uniform way to store data and define methods to act on that data. If you're writing a program in C++ and you need to deal with strings, would you rather wrap all associated routines in a class or deal with the fact that your string is really a char * each time you operate on it? With a database, one is offered a layer of abstraction between how the data is actually stored and how one uses it, which is quite useful for several reasons. You could achieve the same results with C or Perl scripts, but it wouldn't be as pretty and could get downright ugly.

I hope this quick rundown is enough to convince you that you really want to use a database for your dynamic web site. In this article, I will present instructions for installing PHP3 (http://www.php.net/) and PostgreSQL (http://www.postgresql.org/), a little bit of theory, some instructions for using SQL and PHP3, and an example application.

Installing the Tools

This description assumes you are running Red Hat, but most of these instructions will be applicable to other distributions; these tools are fairly painless to install from source, anyway.

Here's a list of what you'll need to run the example application and develop your own applications:

Apache 1.3 or greater. You will need at least version 1.3, because 1.2 does not support modules, and PHP is faster and more secure as a module. On a Red Hat system, you'll need both the apache and apache-devel packages. (Make sure you have the file apxs installed, because we're going to recompile the PHP3 Apache module.)
The source RPM for PHP3, version 3.0 or greater.
PostgreSQL 6.4 or greater; the example code will not run on version 6.3 without a little editing, because 6.4 has an SQL parser.

To rebuild PHP3 for PostgreSQL support, take the following steps:

Use su to become root.
Install the source RPM for PHP3 (rpm -ihv mod_php3-3.0.5-2.src.rpm on Red Hat 6.0). This will place a ``spec file'' in the directory /usr/src/redhat/SPECS and a tar file of the source in /usr/src/redhat/SOURCES. Since the PHP module that comes with Red Hat doesn't have database support enabled by default, we'll have to recompile it. RPM makes this fairly painless.
Because the PHP3 installation process assumes a default PostgreSQL installation, not the Red Hat one, we'll need to make some symlinks. Create a directory /usr/local/pgsql and make symbolic links from /usr/include/pgsql to /usr/local/pgsql/include and from /usr/lib/ to /usr/local/pgsql/lib.
Invoke your favorite editor on the spec file (mod_php3.spec) and search for ./configure; then add the configure option --with-pgsql.
Now build a binary package with rpm: /rpm -bb mod_php3.spec/
If all goes well, you'll have a binary package in /usr/src/redhat/RPMS/arch, where /arch is your architecture. Install it, and you're ready to move on.

Setting up PostgreSQL

PostgreSQL uses a different access system than the rest of your system; oddly enough, not even root has access to the database by default. The database system has its own user system and passwords, and postgres is the database administrator account by default. The advantage to the separate access system is that one can create database users who do not have UNIX accounts; this way, the database for your web application can specify access control without creating a potential security hole for your system. To add your web administrator (web) as a database user, use createuser (as root):

# su postgres -c createuser
Enter name of user to add ---> web
Enter user's postgres ID, or RETURN to use UNIX user id: 542 -> 542
Is user "web" allowed to create databases (y/n) y
Is user "web" allowed to add users? (y/n) y
createuser: web was successfully added

Then, as web (or whatever account you used), you'll be able to create a database with createdb foo and then try some queries on foo using psql foo.

You'll also need to set up PostgreSQL to accept incoming TCP/IP connections so your PHP3 pages can access it. Fortunately, System V init makes this easy. Simply open the file /etc/rc.d/init.d/postgresql and change the line

su postgres -l -c \
 'usr/bin/postmaster -S -D/var/lib/pgsql'

so that it reads

su postgres -l -c \
 '/usr/bin/postmaster -S -D/var/lib/pgsql -i'

While you're at it, you will probably want to specify a different port from the default (5432) for security reasons. To run the PostgreSQL back end on a different port, merely append a -p port to the above line.

All the SQL You Need To Know (Not Really)

Just about every relational database in the world uses SQL (or some extended version of SQL) as its query language. SQL allows you to define tables, select records based on given criteria, update values in one or many records and delete records. This is just a brief introduction to SQL; for more complete references, see Resources.

Creating Tables

To create a table, one uses the CREATE TABLE statement. Its syntax is as follows:

CREATE TABLE tablename (field-1 type-1, ..., field-n type-n)

In psql, you will need to end each statement with a semicolon. These semicolons are not part of the SQL language, but rather for the benefit of psql's lexer.

You may also declare fields as NOT NULL, UNIQUE or PRIMARY KEY, or specify a value as DEFAULT to a field. PostgreSQL will create an index on primary key fields. Unfortunately, as of version 6.4, PostgreSQL does not support foreign keys, but at least the parser will not choke on the SQL REFERENCES keyword.

Here's an example, akin to the UNIX password file:

CREATE TABLE passwd
   (username    varchar(8) PRIMARY KEY,
   -- PRIMARY KEY implies UNIQUE
    cryptedpass char(13),
    uid         int UNIQUE NOT NULL,
    gid         int NOT NULL,
    gecos       varchar(80),
    -- the GECOS field (real name, office, etc.)
    homedir     varchar(80),
    shell       varchar(50) DEFAULT '/bin/sh');

Note that SQL uses a double-dash to begin comments, which are terminated by a newline.

Inserting Data

To insert data into a table, use the INSERT statement:

INSERT INTO tablename (field-1, ..., field-n)
VALUES (value-1, ..., value-n/)

You needn't specify field names if you are inserting values into every field. Here's an example for the table we just created:

INSERT INTO passwd (username, cryptedpass,
   uid, gid, gecos, homedir, shell)
VALUES ('fred', '37MniLTaiPLaL', 42, 500,
   'Fred Mbogo', '/home/fred/', '/bin/sh');

Note that SQL uses single quotes for string constants. Any closet Pascal programmers will feel right at home.

Retrieving Data

The SQL SELECT statement returns records where values meet a certain criteria. Here are some examples of SELECT in action:

SELECT * FROM passwd;
-- returns all fields of all records
SELECT username FROM passwd;
-- returns all usernames
SELECT * FROM passwd WHERE username = 'fred';
SELECT * FROM passwd ORDER BY username, shell;
SELECT * FROM passwd
     WHERE homedir LIKE '/home%'
     -- % is the SQL wildcard character
     AND   shell = '/bin/sh'
     ORDER BY username;
SELECT homedir, projectname
     FROM passwd, projects
     -- assuming we have a projects
     table
     WHERE
     -- this will return the home directory of
       passwd.username = projects.leader;
       -- each project leader for each project

Keeping Your Data Current

To change field values in records, use UPDATE:

UPDATE tablename SET field-1 =


value-n WHERE qualification

The WHERE is optional, but if you don't specify a WHERE clause, SQL will update all the records, which is clearly the ``Wrong Thing''.

Let's say Fred Mbogo wants to change his shell. This script will do it:

UPDATE passwd SET shell = '/bin/tcsh' WHERE username = 'fred';

Deleting Records

To delete records, simply use DELETE:

DELETE FROM tablename WHERE qualifier

Just like UPDATE, the WHERE is optional, but you probably want it anyway. Let's say Fred has offended his sysadmin one too many times:

DELETE FROM passwd WHERE username = 'fred';

All the PHP3 You Need To Know (Not Really)

The on-line PHP3 manual, http://www.php.net/manual/, is an excellent reference and will be necessary reading before you create your own database web application. Furthermore, it is a database-backed web site and has lots of user comments. Here, we will examine just the most basic PHP3 features.

Using PHP3

Here is a simple PHP3 program, which demonstrates some basic features. Note the separate HTML and PHP3 blocks:

<title>Hello, world!</title>
<body>
<?php
  echo("Hello, world!\n");
  echo("<p>\nWhat a <b>bold</b> move this is!\n");
?>
</body>

This program will send the following HTML to the remote browser:

<title>Hello, world!</title>
<body>
Hello, world!
<p>
What a <b>bold</b> move this is!
</body>

A similar program, which takes an argument, would look like this:

<title>Hello, world!</title>
<body>
<?php
  echo("Hello, $name!\n");
  echo("<p>\nWhat a <b>bold</b> move this is!\n");
?>
</body>

You would view this page (assuming you called it hello.php3) like any CGI script: http://yourhost.net/~fred/hello.php3?name=fred. This, of course, assumes you are named Fred and have put this file in your /public_html directory.

Database Connectivity

PHP3 provides a number of useful functions for connecting to databases; the best place to read up on these is http://www.php.net/manual/ref.pgsql.php3, and we shall examine a few of them.

int pg_connect(host, port, options, tty, dbname);

This function returns an integer, the ``connection index'', which you will need for all operations on this connection. If a connection can't be established, it returns zero.

int pg_exec(conn, query); Executes the SQL query query on connection conn. Returns a result set index.
int pg_numrows(result); Returns the number of tuples in the result set result.
array pg_fetch_row(result,
Returns an array of values corresponding to the row row of result set result.
void pg_close(conn); Closes the connection conn.

Example Application: A Multiuser Address Book

Our example application is an address book that one can access over the Internet. A user logs in with her name and password and is presented with a menu of options, including browsing and searching the address book and adding a new person. For each person in the address book, the database stores an arbitrary number of e-mail addresses, telephone numbers, URLs and postal addresses. This address book also has some nifty features like mailing passwords to new accounts and automatic mailto and href links for e-mail and web addresses.

Design Considerations

We have already completed the first step of the design process--deciding what we need our application to do. What remains is defining the data that our application will access in three iterations:

A high-level data model
A low-level data model
A set of views and transactions that are legal for application users

Generally, one initially models databases in an entity-relationship diagram, which represents each table as an entity with a set of attributes, and a ``join'', or query, spanning multiple tables is represented by a relationship. Even if you're not going to go to the trouble of making up an E-R diagram, you should at least consider these concepts; we shall examine the entities and relationships for the address book in prose.

The low-level model we shall be using is the relational model, which has been around since 1970 and is the model used by most commercial relational databases, including Oracle, Sybase and Informix.

Finally, we shall define the ways in which our users can see the data. Once we've completed this, we're done with the hard part and can move on to the tedious part--the implementation.

Entities and Relationships

Since the entity-relationship model is such a high-level model, some of the hairier issues of the low-level model are not yet apparent, and our data model looks quite simple. The main advantage of the E-R model is that it presents a clear picture of the database miniworld--the segment of the real world that we're modeling--and is easily comprehensible to both laypersons and software engineers. Entities are defined as follows:

user: This describes an address book user. This entity has a unique ID, a login name, a password, a ``real name'' and an e-mail address.
addressbook: This is a ``weak entity''--a set of persons for which a particular user has information.
person: This describes a person for whom address information is available. This entity has a first name, a middle initial, a last name and a dynastic identifier (i.e., Jr., III, etc.). A person also has one or many of each of the following: e-mail address, postal address, telephone number and URL.

Relationships among entities are defined in this way:

A user owns exactly one addressbook.
An address book contains many persons.
A person is an entry in exactly one addressbook.

If you've thought about databases before, you may be asking yourself, ``Why can't a person be in more than one addressbook? Can't two users know the same person?'' That is the sort of design decision you must make in this process--I chose to allow each user her own, private address book and avoid the difficult issues raised by sharing records. Allowing many people to share a record is like allowing global variables in a C program that can be modified by any function--you can easily get unexpected results when one function changes the variable's value without the others knowing about it.

Moving to the Relational Model

Now we need to move our high-level model (which people can understand) to a low-level model (which our database management system can understand). This process is quite straightforward, although the database designer still has some options at this point, involving normalization. Normalization (see Resources) is a formal process to quantify and measure the quality of a relational model; as always, the tradeoff is theoretical quality versus performance.

The quick-and-dirty algorithm for moving your data model from the entity-relationship to the relational model is this:

Make sure every attribute of every entity is atomic.
Make a table for every attribute of an entity which can hold multiple values and define a join which ties these to the associated entity.
Make tables for each entity, using as fields the attributes you haven't dealt with yet. If an entity is involved in an n:1 relationship, include the key of the record it is related to as a foreign key.

This is necessarily a simplified version of the process; it does not deal with m:n relationships or some other details. For a more complete discussion of data model conversion, see a textbook on database theory.

SQL code for the ``finished'' relational model can be found at ftp://ftp.ssc.com/pub/lj/listings/issue67/3475.tgz. All code is released under the GNU GPL.

Views and Operations

The remaining design step is deciding what sort of capabilities we wish to grant users to access and update the data. This is perhaps not as much of a problem in our database, but what if we were designing a database of employees? It might cause great discord in the office if everyone knew the salary of the guy who spends every day surfing the Web and taking two-hour lunch breaks; however, they should be able to access his name, department and extension. Likewise, they shouldn't be able to change that information unless they are the department secretary or manager.

We do want some protection on our address book, so that you can type in your grandmother's e-mail address with the peace of mind that a spammer can't get it just by accessing your web server. We also don't want to bother the user with implementation details like unique ID numbers on each record--this should be a user-friendly address book. Therefore, we will allow the following:

A user can retrieve records from her own address book.
A user can insert and delete records in her own address book.
The user will be shown only what she needs to see.

To this end, we create views. A view can be just a few columns of a table or a few columns of a join. In SQL, a view is defined with the CREATE VIEW statement, which creates a view from a SELECT statement. A view can be accessed just like a table, except you can't perform inserts, updates or deletes on it. Some of the views in our example application also use PostgreSQL functions to make the final application programming easier, i.e., ``make a mailto URL from this e-mail address''.

We also make note of the constraints which we cannot enforce with views: for example, the consideration that one may view only her own address book. We must implement these constraints in the application program.

Implementation

Implementation in PHP3 is quite straightforward; many things in the example code speak for themselves, and others are well-commented.

The source code for the example application is intended to be more of a teaching tool than a finished product. It works well, but you would certainly want to add features before making a large-scale service from it. I have released it under the GNU GPL, so feel free to modify my code and share your modifications with others. This code is also on the FTP site shown above.

Resources

Will Benton can be reached at wcb@ccil.org