Portions of your web site can be kept secure using user name, password combinations.
by Reuven M. Lerner
One of the wonderful things about the Web is that so much information is freely available. For the cost of a telephone call and a monthly bill from your Internet service provider, you can read hundreds of newspapers, get updates on the computer industry and listen to radio stations from your home town.
Even the most open, freely available site usually contains one or more sections that are not meant for public consumption. The reasons for cordoning off sections of the site can vary: Perhaps the webmaster wants a place to put his favorite hacks, a repository for testing new programs or a directory in which staff notices can be placed. If a site wants to charge for content or restrict access to members of an organization, the problem becomes even more obvious.
One popular way to handle these problems is to create a directory that others are unlikely to guess. But this approach, known as ``security through obscurity'', only works as long as no one leaks the name of the hidden directory. A far more robust approach will restrict access based on user name,password combinations.
This month, we will look at ways in which to restrict access to your server with the Web's standard user name, password authorization scheme. The principles should apply to any web server, but I will be using the freely available Apache web server (available at http://www.apache.org/) in my examples.
Access restrictions are part of HTTP, the protocol used in most web transactions. When your browser requests a document from a server using HTTP, it is usually returned immediately, preceded by several headers (i.e., name,value pairs) describing its length, the date on which it was last modified and the type of content it contains.
HTTP's designers recognized that webmasters might want to restrict access to one or more directories. Since version 1.0, HTTP has included provisions for restricting access to parts of a web site.
Let's see how this protection works from a computer's view, first by looking at an unprotected site and then by looking at a protected one. Once we understand how access protection works, we can incorporate it into our own work.
Everything starts when a user asks the browser to retrieve a document. No matter whether the user types the URL into a text field, selects it from a list of book marks or clicks on a hyperlink in an existing page of HTML, the effect is the same. The browser takes the URL, dissects it into a protocol, a server and a document, and takes the appropriate action. In the case of a URL such as:
http://www.ssc.com/lj/the protocol name is http, the server name is www.ssc.com, and the document name (really a directory) is /lj/. Most Web servers are configured such that requesting a directory is the same as requesting the file index.html within that directory, so the above URL is effectively equivalent to this one:
http://www.ssc.com/lj/index.htmlWe can simulate the browser's actions by dissecting the URL on our own and by requesting the document /lj/ from www.ssc.com using HTTP from the Linux command line. The TELNET program is generally used to log into a remote machine, most often to open a shell on that machine. By giving telnet an argument in addition to the machine name, we can specify the port to which we wish to connect. Since web servers sit on port 80 by default, we can connect to the web server on www.ssc.com by typing:
telnet www.ssc.com 80When we establish a connection to that web server, we can enter an HTTP request. These requests start with a line describing the action we wish to take (known as a ``method''), the name of the document we wish to retrieve and the version of HTTP we are using. Beginning with HTTP 1.0, this initial line can be followed by one or more header lines containing information about the user's browser, document types that the browser is willing to expect, HTTP cookies that may have been set in the past and other useful bits of information. For our purposes, it is enough to enter this line:
GET /lj/ HTTP/1.0and then press enter twice--once to end the line containing the request, and a second time to indicate that we have finished sending all of the headers and that we will now wait for a response from the server.
If all goes well, the server will respond by returning a page of HTML. In this particular case, we will receive HTML-formatted text (as we can tell from the text/html Content-Type header at the top of the response) with the latest information about this very magazine. Your browser is responsible for taking the HTML returned by the server and displaying it for you.
If we try to retrieve a protected document, things get a bit more complicated. (We will see how to protect documents in just a moment; for now, assume that it is possible to restrict access to documents on a web server.) My main workstation, running Red Hat Linux 4.2 and Apache 1.2.4, contains a ``private'' directory whose contents are restricted. Let's retrieve the contents of /private/, just as I requested the contents of /lj/ before.
From the shell prompt, I connect to the web server with the following:
telnet localhost 80Once I am connected, I request the ``private'' directory:
GET /private/ HTTP/1.0Instead of receiving the contents of the /private directory or the index.html file contained within /private, I get the following response:
HTTP/1.1 401 Authorization Required Date: Mon, 26 Jan 1998 12:08:17 GMT Server: Apache/1.2.4 WWW-Authenticate: Basic realm="TestRealmName" Connection: close Content-Type: text/html <HTML><HEAD> <TITLE>401 Authorization Required</TITLE> </HEAD><BODY> <H1>Authorization Required</H1> This server could not verify that you are authorized to access the document you requested. Either you supplied the wrong credentials (e.g., bad password), or your browser doesn't understand how to supply the credentials required.<P> </BODY></HTML> Connection closed by foreign host.In other words, my request was rejected because I had not authenticated myself. When did it give me a chance to do so?
Herein lies the dirty little secret of user authentication: when you retrieve a protected document, your browser really has to request the document twice. The first time that it tries to retrieve the document, a browser receives a message similar to the one that we received above, marked with the response code 401 indicating that you need authorization in order to retrieve this document.
Old or broken browsers stop at that point, presenting the server's error message to the user. Modern browsers that understand authentication present the user with a dialog box into which the user can type a user name and password. The browser then takes the user name and password, puts both into Base64 format and sends that along in an ``Authorization'' header after the initial request.
Modern browsers also save time by keeping track of user names and passwords that you have already entered. Thus, the first time you encounter a protected directory, you are prompted for your user name and password. The second time you retrieve a file from the same directory, you will not be prompted. Whether the browser waits to receive the 401 -- Authorization Required error before sending the user name, password pair or it automatically responds to the message depends on the implementation.
Thus, if my user name is ``reuven'' and my password is ``password'', I can retrieve the contents of the /private/ directory by using TELNET to access port 80 on my local computer and entering:
GET /private/ HTTP/1.0 Authorization: Basic cmV1dmVuOnJldXZlbg==The first line is identical to what we have seen before; it indicates that we want to use HTTP 1.0 to retrieve the document named /private/ (which happens to be a directory, although the client does not know that) using the GET method. Rather than pressing enter twice after the first line, we only press it once and then add a single additional header. This one begins with ``Authorization:'', meaning that we are about to send authorization information to the system using the ``Basic'' algorithm, which is nothing more than a Base64 encoding of the user name and password that the user entered in the form username:password.
If the user name, password combination succeeds, the system returns the contents of the resource requested by the browser. If the request fails, the same message (with response code 401) is returned to the user's browser. The browser can allow the user to try again or can display the error message sent along with the 401 message.
In this case, the user name, password combination does indeed work, giving me the contents of /private/, which is the file /private/index.html, returned in the following manner:
HTTP/1.1 200 OK Date: Mon, 26 Jan 1998 12:41:14 GMT Server: Apache/1.2.4" Last-Modified: Mon, 26 Jan 1998 10:49:49 GMT ETag: "1057-ca-34cc6a4d" Content-Length: 202 Accept-Ranges: bytes Connection: close Content-Type: text/html <HTML> <Head> <Title>My private site</Title> </Head> <Body> <H1>My private site</H1> <P>This is my private site. From here, you can get to <a href="test.html">my test page</a>.</P> </Body> </HTML>The 200 status code at the top of the response indicates that everything has gone well and that the server was able to retrieve the document that we requested. As you can see from the Content-Type header (or simply by looking at the document's contents), the requested document contains HTML-formatted text. Were we to view this through our browser, we would undoubtedly see the text in different sizes.
If you are wondering how I managed to get the Base64 equivalent of my user name, password combination, it was with the help of the following one-line Perl program:
perl -e 'use MIME::Base64;\ print encode_base64("reuven:password");'Entering the above in the shell results in:
cmV1dmVuOnBhc3N3b3Jkwhich must have been the Base64 equivalent of reuven:password, because it allowed us access to the resource.
MIME::Base64 is a Perl module that you can get from CPAN (http://www.perl.com/CPAN/) for handling MIME-standard mail. I cannot remember the last time that I had to write a program to handle e-mail encoded with MIME, but the Base64 module comes in handy for non-mail applications such as this one.
If you have any experience with securing computer networks, you might be surprised to learn that user names and passwords are passed between web browsers and servers unencrypted. Indeed, while the text isn't passed completely in the clear, it would require another one-line Perl program to turn the Base64-encoded user name, password string back into its ASCII original.
Suffice it to say that this is not a very secure scheme. Someone monitoring packets sent over the network would have to work a bit harder in order to capture your user name and password, but not significantly harder than if the text were sent without any transformation.
At the very least, make sure to use user names and passwords that have nothing to do with /etc/passwd, the file that typically stores user information on Linux systems. Your secret documents can still be available via the Web, but your machine will not be open to break-ins which are a much more serious threat. (Someone who breaks into your computer can do much more than just read your documents.)
An authentication scheme known as ``Digest'' will soon be available. It is already available in Apache and is waiting for a browser to implement it. The digest method applies a function to a number of parameters, including the user name and password that are going to be sent, and a number generated by the server that is sent as part of the headers in the 401 -- Authorization required response. The result of the digest function is then sent over the network, rather than the user name and password themselves. This is not a foolproof system, but it is far better than the current situation in which your passwords are easily available.
Now that we have discussed the theory behind all of this, we will take a look at what is necessary to protect directories on your server.
The first thing you need is a file in which user names and passwords can be stored. Apache comes with a program, htpasswd which can be used to create and modify such files. The syntax is fairly simple:
htpasswd [-c] passwordfile usernameTo create a new password file (or overwrite an existing one), use the following syntax:
htpasswd -c /etc/httpd/conf/passwords reuvenIf you enter the above line at the Linux shell (with htpasswd in your $PATH environment variable), you will be prompted for a password. After you have entered the password twice, the user name, password pair will be stored in the file you specified.
The -c option creates a new file or overwrites an existing one. (This option is unnecessary to create a user; you can do that without the -c option, as described below.) Be especially careful with the -c option, because it overwrites old versions of the password file without warning or making backups.
To add a user to an existing password file or to change the password of an existing user, invoke htpasswd without the -c option:
htpasswd /etc/httpd/conf/passwords reuvenRegardless of whether you are adding a new user or changing an existing user's password, you will be asked to enter the user's password twice. When you have done that, the named file will be updated.
The password file contains nothing more than names and encrypted passwords in the format:
username1:password1 username2:password2 username3:password3For example, the password file that I created for this column contains the following entries:
reuven:zZDDIZ0NOlPzw reena:SjCCCbsjjz2Z2 foobar:RpubVfdhWwv1UIf you expect to handle many authorized user requests on your system and if the number of users on your server is high, you might want to consider using authorization using a more efficient system, such as DBM or DB. Support for DB and DBM are available for modern versions of Apache (although the appropriate module must be compiled in), as is support for a number of relational databases, including Msql and MySQL. More information on these options is available on the Apache web site.
Now that we have a list of user names and passwords in the correct format, we can use that list to protect the directories on our server. Each directory can use a different file containing user names and passwords--so your ``top-secret'' directory can have a different list of users than your ``secret'' directory.
There are two ways to protect files on your system. One is to put a file, called .htaccess by default, in the directory you wish to protect. This gives you the flexibility to modify individual directories quickly and easily and to give responsibility for different directories to the people in charge of those directories--but it also removes a certain element of central control.
We will thus look at the method in which access restrictions are defined in srm.conf, one of the Apache configuration files. Placing the access restrictions in srm.conf means you will have centralized control of access to your server, and you will have to restart the server each time you make changes.
Protected directories are declared in srm.conf within <Directory> and </Directory> statements with a relatively straightforward syntax. For instance, I added the following lines in this file to protect directories used in this article:
<Directory /home/httpd/html/private> AuthType Basic AuthName TestRealmName AuthUserFile /tmp/authusers require valid-user </Directory>The first and last lines confine these declarations to /home/httpd/html/private, the protected directory on my server. Someone requesting a file within /home/httpd/html (the root directory on my web server) can do so without having to enter a user name or password. Someone trying to retrieve a file in /home/httpd/html/private (known as /private to the outside world), or in any subdirectory of /private, will have to enter a user name and password.
The user name, password pair is be passed using the ``basic'' authentication scheme that we saw earlier, in which user name, password is encoded using Base64 and sent as part of the HTTP headers following the request. Until browsers begin to support the ``digest'' method (or even more secure methods), all protected directories should declare the AuthType to be ``Basic''.
AuthName is a way of identifying this directory to the outside world. You might want to call the directory something meaningful, such as ``Joe's private directory'', or ``FYI''. You might use AuthName to distinguish between different protected sections of your web server, such as ``private area'' and ``staff area''. AuthName is generally displayed in the dialog box into which a user can enter her user name and password.
Next, we indicate which password file should be used for this directory. As mentioned earlier, each directory can use a separate password file, so it is important to specify which one you wish to use. If you expect to use more than a few password files on your system, you might want to investigate the use of groups, which allow you to grant privileges to different subsets of users in a single password file. (Users can be placed in groups, which we will not address here, but which allow you to associate each user in the password file with one or more groups).
Finally, we indicate that we will allow only valid users, meaning only those whose user names and passwords are in the password file named in AuthUserFile. You could also specify individual users who would be allowed into the site, such as:
require user reuven reenaOnce you have placed this information in your server's srm.conf file, you need to tell the server to reread its configuration file. You can do this by shutting the server down and then restarting it or by sending it a HUP signal, as follows:
killall -v -1 httpdThis command sends a HUP signal (aka signal #1) to all instances of httpd currently running. Remember that Apache normally runs a number of servers simultaneously, so trying to identify individual processes and use the standard kill command is probably not a good way to go about it.
Once you have restarted the server, protected directories are only accessible to someone whose user name and password appears in one of these directories. If you want to test the protection mechanism, using TELNET (as described above) to pretend to be a web browser might be the best way to do it, in order to avoid a browser's cache of passwords.
Just as you can protect directories containing HTML files and pictures, you can also protect directories containing CGI programs. For instance, if you want to make a selected number of CGI programs accessible only to a select number of users, you can define /cgi-bin/private in the same way as you did /private.
Here, for example, is the definition that I added to srm.conf in order to protect /cgi-bin/private:
<Directory /home/httpd/cgi-bin/private> AuthType Basic AuthName TestRealmName AuthUserFile /tmp/authusers require valid-user </Directory>As you can see, the definition is identical to that for /private, except for the name of the directory.
In this case, we will be asked for a user name, password combination if we try to execute a CGI program in this directory, using either GET or POST. (Apache allows you to set a separate access privilege for each method, so you could allow all users to GET but a restricted group to POST and still others to PUT and DELETE.) Before the request will actually be sent to the CGI program in question, we will have to authenticate ourselves.
One of the nice benefits of protecting CGI directories is that all programs in that directory immediately have access to a new environment variable, REMOTE_USER, which contains the name of the user in question. This is available to CGI programs written in Perl and using CGI.pm via the remote_user method, but all programs can retrieve the value of the environment variable.
How can this be of use? Well, we know that the user name must be unique; no two users can share a user name. Thus, we can use the user name as a primary key (i.e., a unique index) into a table in a relational database containing more information about the user--his or her age, interests and last visit.
Indeed, over the last few months, this column has looked at a variety of techniques for keeping track of information about users, most often by setting an HTTP cookie on the user's computer and setting a primary key value in the cookie.
The advantage of this system is that the user must verify his or her identity before being allowed to access the program--meaning that by the time the CGI program is executed, we can be sure that the user name exists, is associated with a real user and that this user represents that person (or has access to the user's password). HTTP cookies operate on a per-computer basis; if someone were to use my computer while I am not looking, they could retrieve information from all of the private sites from which I have retrieved cookies.
Another advantage of using this form of identification rather than cookies is that it gives the user mobility. No longer is the user tied to a particular computer or browser. While users must sign in before being allowed to use the site, they can access the site from anywhere rather than just from their computer at work or home.
There are disadvantages, too--the main one is the inherent insecurity associated with the basic authentication scheme. And some users prefer not to be bothered with having to enter their user name and password each time they visit a site. Such users would rather the site recognize and remember their settings automatically.
Listing 1 is a short CGI program written in Perl that identifies the user name entered. If this program is placed in an unprotected directory, it will indicate that no value for REMOTE_USER is available. If run from within a protected directory, however, it will return the user name that was used to access that directory.
If you were to create a table in a relational database (such as MySQL), you could define the primary key to be a user name of no more than eight characters. The value of remote_user could then be used as a reliable index into the database.
Protecting web sites is sure to be an increasingly important topic as the Web continues to mature. Apache is remarkably flexible when it comes to such security mechanisms. While I mentioned groups, there was not enough space to discuss additional options, such as restricting access by domain or IP address. See the Apache documentation for more information on this issue and the sidebar for additional sources.
While user name, password combinations are useful for restricting access to a web site, they can also be used to produce a unique key into a database. If you are thinking of creating a database to keep track of your users, you might want to consider using access controls to force users to log in.
Restricting access to directories on your web site is neither complicated nor difficult and lets you put sensitive or private materials on the Web without having to worry about someone discovering a secret URL.
Reuven M. Lerner is an Internet and Web consultant living in Haifa, Israel, who has been using the Web since early 1993. In his spare time, he cooks, reads and volunteers with educational projects in his community. You can reach him at reuven@netvision.net.il.