Finally, let's get to form input. We mentioned forms briefly in Chapter 1, The Common Gateway Interface, and we'll cover them in more detail in Chapter 4, Forms and CGI. But here, we just want to introduce you to the basic concepts behind forms.
As we described in Chapter 1, forms provide a way to get input from users and supply it to a CGI program, as shown in Figure 2.1. The Web browser allows the user to select or type in information, and then sends it to the server when the Submit button is pressed. In this chapter, we'll talk a little about how the CGI program accesses the form input.
One way to send form data to a CGI program is by appending the form information to the URL, after a question mark. You may have seen URLs like the following:
http://some.machine/cgi-bin/name.pl?fortune
Up to the question mark (?), the URL should look familiar. It simply calls a CGI script named name.pl.
What's new here is the part after the "?". The information after the "?" character is known as a query string. When the server is passed a URL with a query string, it calls the CGI program identified in the first part of the URL (before the "?") and then stores the part after the "?" in the environment variable QUERY_STRING. The following is a CGI program called name.pl that uses query information to execute one of three possible UNIX commands.
#!/usr/local/bin/perl

print "Content-type: text/plain", "\n\n";

$query_string = $ENV{'QUERY_STRING'};

if ($query_string eq "fortune") {
    print `/usr/local/bin/fortune`;
} elsif ($query_string eq "finger") {
    print `/usr/ucb/finger`;
} else {
    print `/usr/local/bin/date`;
}

exit (0);
You can execute this script as either:
http://some.machine/cgi-bin/name.pl?fortune
http://some.machine/cgi-bin/name.pl?finger
or
http://some.machine/cgi-bin/name.pl
and you will get different output. The CGI program executes the appropriate system command and sends the results to standard output. In Perl, enclosing a command in backticks runs the command and captures its output.
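As a small illustration of backticks, the sketch below captures the output of a command in a variable before printing it. The choice of /bin/echo is our assumption, made only because it exists on most UNIX systems; it is not one of the commands used by name.pl.

```perl
#!/usr/local/bin/perl
use strict;

# Backticks run the command and return whatever it wrote to standard output.
# /bin/echo is an assumption chosen purely for illustration.
my $output = `/bin/echo hello`;

# The captured text keeps its trailing newline; remove it if it is unwanted.
chomp ($output);

print "The command printed: $output\n";
```

Storing the output in a variable first, rather than printing it directly, gives the program a chance to inspect or reformat the text before sending it to the browser.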
NOTE:
You should always be very careful when executing any type of system commands in CGI applications, because of possible security problems. You should never do something like this:
print `$query_string`;
The danger is that a diabolical user can enter a dangerous system command, such as:
rm -fr /
which can delete everything on your system.
Nor should you expose any system data, such as a list of system processes, to the outside world.
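One way to stay on the safe side of this warning is to treat the user's input only as a key into a fixed table of commands, which is what the if/elsif chain in name.pl does implicitly. The following is a minimal sketch of that idea, not a complete defense. The %allowed table and the command_for subroutine are hypothetical names of our own, and the sketch only prints the command it would run so that it can be tried on a system without fortune or finger installed.

```perl
#!/usr/local/bin/perl
use strict;

# Map each keyword we are willing to accept to a fixed command.
# The paths are the ones used in name.pl.
my %allowed = (
    'fortune' => '/usr/local/bin/fortune',
    'finger'  => '/usr/ucb/finger',
);

# Return the command for a keyword, falling back to date for anything else.
# The user's text is never passed to the shell; it is only used as a hash key.
sub command_for {
    my ($keyword) = @_;
    return exists $allowed{$keyword} ? $allowed{$keyword}
                                     : '/usr/local/bin/date';
}

my $query_string = defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';
my $command      = command_for ($query_string);

print "Content-type: text/plain", "\n\n";
print "Would run: $command\n";
```

With this arrangement, a query string such as "rm -fr /" simply fails the lookup and selects the harmless default.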
Although the name.pl example works, the following example is a more realistic illustration of how forms work with CGI. Instead of supplying the information directly as part of the URL, we'll use a form to solicit it from the user.
(Don't worry about the HTML tags needed to create the form; they are covered in detail in Chapter 4, Forms and CGI.)
<HTML>
<HEAD><TITLE>Simple Form!</TITLE></HEAD>
<BODY>
<H1>Simple Form!</H1>
<HR>
<FORM ACTION="/cgi-bin/unix.pl" METHOD="GET">
Command: <INPUT TYPE="text" NAME="command" SIZE=40>
<P>
<INPUT TYPE="submit" VALUE="Submit Form!">
<INPUT TYPE="reset" VALUE="Clear Form">
</FORM>
<HR>
</BODY>
</HTML>
Since this is HTML, the appearance of the form depends on what browser is being used. Figure 2.2 shows what the form looks like in Netscape.
This form consists of one text field titled "Command:" and two buttons. The Submit Form! button is used to send the information in the form to the CGI program specified by the ACTION attribute. The Clear Form button clears the information in the field.
The METHOD="GET" attribute of the <FORM> tag determines, in part, how the data is passed to the server. We'll talk more about different methods soon, but for now, we'll use the default method, GET. Now, assuming that the user enters "fortune" into the text field, when the Submit Form! button is pressed the browser sends the following request to the server:
GET /cgi-bin/unix.pl?command=fortune HTTP/1.0
.
. (header information)
.
The server executes the script called unix.pl in the cgi-bin directory, and places the string "command=fortune" into the QUERY_STRING environment variable. Think of this as assigning the string supplied by the user, "fortune", to the variable "command" (specified by the NAME attribute of the <INPUT> tag):
command=fortune
Let's go through the simple unix.pl CGI program that handles this form:
#!/usr/local/bin/perl

print "Content-type: text/plain", "\n\n";

$query_string = $ENV{'QUERY_STRING'};
($field_name, $command) = split (/=/, $query_string);
After printing the content type (text/plain in this case, since the UNIX programs are unlikely to produce HTML output) and getting the query string from the %ENV associative array, we use the split function to separate the query string on the "=" character into two parts: the part before the equal sign goes into $field_name, and the part after it goes into $command. In this case, $field_name will contain "command" and $command will contain "fortune". Now, we're ready to execute the UNIX command:
if ($command eq "fortune") {
    print `/usr/local/bin/fortune`;
} elsif ($command eq "finger") {
    print `/usr/ucb/finger`;
} else {
    print `/usr/local/bin/date`;
}

exit (0);
Since we used the GET method, all the form data is included in the URL. So we can directly access this program without the form, by using the following URL:
http://some.machine/cgi-bin/unix.pl?command=fortune
It will work exactly as if you had filled out the form and submitted it.
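Because anyone can type such a URL by hand, the query string may not always have the tidy name=value shape that the form produces. The sketch below shows one detail of split that is worth knowing in that situation: when the result is assigned to two variables and no limit is given, any text after a second "=" is dropped, whereas an explicit limit of 2 keeps it in the value. The query string used here is a hypothetical one chosen to show the difference.

```perl
#!/usr/local/bin/perl
use strict;

# A hypothetical hand-typed query string whose value contains an equal sign.
my $query_string = 'command=a=b';

# Without a limit, the text after the second "=" is silently discarded.
my ($name_1, $value_1) = split (/=/, $query_string);

# With a limit of 2, everything after the first "=" stays in the value.
my ($name_2, $value_2) = split (/=/, $query_string, 2);

print "no limit: name=$name_1 value=$value_1\n";
print "limit 2:  name=$name_2 value=$value_2\n";
```

For unix.pl the distinction is harmless, since any unrecognized value falls through to the date command, but it matters once a program starts using the value itself.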
In the previous example, we used the GET method to process the form. However, there is another method we can use, called POST. Using the POST method, the server sends the data as an input stream to the program. That is, if in the previous example the <FORM> tag had read:
<FORM ACTION="/cgi-bin/unix.pl" METHOD="POST">
the following request would be sent to the server:
POST /cgi-bin/unix.pl HTTP/1.0
.
. (header information)
.
Content-length: 15

command=fortune
The version of unix.pl that handles the form with POST data follows. First, since the server passes information to this program as an input stream, it sets the environment variable CONTENT_LENGTH to the size of the data in number of bytes (or characters). We can use this to read exactly that much data from standard input.
#!/usr/local/bin/perl

$size_of_form_information = $ENV{'CONTENT_LENGTH'};
Second, we read the number of bytes, specified by $size_of_form_information, from standard input into the variable $form_info.
read (STDIN, $form_info, $size_of_form_information);
Now we can split the $form_info variable into a $field_name and $command, as we did in the GET version of this example. As with the GET version, $field_name will contain "command", and $command will contain "fortune" (or whatever the user typed in the text field). The rest of the example remains unchanged:
($field_name, $command) = split (/=/, $form_info);

print "Content-type: text/plain", "\n\n";

if ($command eq "fortune") {
    print `/usr/local/bin/fortune`;
} elsif ($command eq "finger") {
    print `/usr/ucb/finger`;
} else {
    print `/usr/local/bin/date`;
}

exit (0);
Since it's the form that determines whether the GET or POST method is used, the CGI programmer can't control which method the program will be called by. So scripts are often written to support both methods. The following example will work with both methods:
#!/usr/local/bin/perl

$request_method = $ENV{'REQUEST_METHOD'};

if ($request_method eq "GET") {
    $form_info = $ENV{'QUERY_STRING'};
} else {
    $size_of_form_information = $ENV{'CONTENT_LENGTH'};
    read (STDIN, $form_info, $size_of_form_information);
}

($field_name, $command) = split (/=/, $form_info);

print "Content-type: text/plain", "\n\n";

if ($command eq "fortune") {
    print `/usr/local/bin/fortune`;
} elsif ($command eq "finger") {
    print `/usr/ucb/finger`;
} else {
    print `/usr/local/bin/date`;
}

exit (0);
The environment variable REQUEST_METHOD contains the request method used by the form. In this example, the only new thing we did was check the request method and then assign the $form_info variable as needed.
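The method check can also be packaged as a subroutine so that it is easy to try without a Web server. The sketch below is one possible arrangement, not the only one. The read_form_info subroutine and its two parameters (a reference to a hash shaped like %ENV, and a filehandle standing in for STDIN) are our own invention, and the in-memory filehandle used to imitate a POST body assumes Perl 5.8 or later.

```perl
#!/usr/local/bin/perl
use strict;

# Return the raw form data for either request method.
# $env is a reference to a hash shaped like %ENV; $input is a filehandle
# that plays the role of STDIN. Both parameters are hypothetical.
sub read_form_info {
    my ($env, $input) = @_;
    my $method = $env->{'REQUEST_METHOD'} || '';

    if ($method eq 'GET') {
        return defined $env->{'QUERY_STRING'} ? $env->{'QUERY_STRING'} : '';
    }

    # Anything else is treated as POST, as in the example above.
    my $size      = $env->{'CONTENT_LENGTH'} || 0;
    my $form_info = '';
    read ($input, $form_info, $size);
    return $form_info;
}

# Imitate a POST request body with an in-memory file.
my $body = 'command=fortune';
open (my $fake_stdin, '<', \$body) or die "cannot open in-memory file: $!";

my %post_env = ('REQUEST_METHOD' => 'POST', 'CONTENT_LENGTH' => length ($body));
my %get_env  = ('REQUEST_METHOD' => 'GET',  'QUERY_STRING'   => 'command=finger');

print read_form_info (\%post_env, $fake_stdin), "\n";
print read_form_info (\%get_env,  $fake_stdin), "\n";
```

In a real CGI program the call would be read_form_info (\%ENV, \*STDIN), so the same subroutine serves both the test harness and the live script.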
So far, we've shown an example for retrieving very simple form information. However, form information can get complicated. Since under the GET method the form information is sent as part of the URL, there can't be any spaces or other special characters that are not allowed in URLs. Therefore, some special encoding is used. We'll talk more about this in Chapter 4, Forms and CGI, but for now we'll show a very simple example. First the HTML needed to create a form:
<HTML>
<HEAD><TITLE>When's your birthday?</TITLE></HEAD>
<BODY>
<H1>When's your birthday?</H1>
<HR>
<FORM ACTION="/cgi-bin/birthday.pl" METHOD="POST">
Birthday (in the form of mm/dd/yy): <INPUT TYPE="text" NAME="birthday" SIZE=40>
<P>
<INPUT TYPE="submit" VALUE="Submit Form!">
<INPUT TYPE="reset" VALUE="Clear Form">
</FORM>
<HR>
</BODY>
</HTML>
When the user submits the form, the client issues the following request to the server (assuming the user entered 11/05/73):
POST /cgi-bin/birthday.pl HTTP/1.0
.
. (information)
.
Content-length: 21

birthday=11%2F05%2F73
In the encoded form, certain characters are replaced by a percent sign followed by the character's two-digit hexadecimal code (spaces are a special case, and are usually sent as "+"). In this example, our program needs to "decode" this data by converting each "%2F" back to "/".
Here is the CGI program, birthday.pl, that handles this form:
#!/usr/local/bin/perl

$size_of_form_information = $ENV{'CONTENT_LENGTH'};
read (STDIN, $form_info, $size_of_form_information);
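To see where a string like "11%2F05%2F73" comes from, it can help to perform the encoding by hand. The sketch below is a simplified illustration of what the browser does, under the assumption that every character other than a letter, digit, underscore, period, or hyphen is escaped. The url_encode subroutine is a hypothetical name of our own, and unlike a browser it writes a space as "%20" rather than "+".

```perl
#!/usr/local/bin/perl
use strict;

# Replace each character outside a conservative safe set with a percent
# sign followed by its two-digit hexadecimal code.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.-])/sprintf ("%%%02X", ord ($1))/eg;
    return $text;
}

my $encoded = url_encode ('11/05/73');
print "birthday=$encoded\n";    # prints: birthday=11%2F05%2F73
```

The slash has the character code 47, which is 2F in hexadecimal, so each "/" in the date becomes "%2F".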
The following complicated-looking regular expression is used to "decode" the data (see Chapter 4, Forms and CGI for a comprehensive explanation of how this works).
$form_info =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
In the case of this example, it will turn "%2F" into "/". The rest of the program should be easy to follow:
($field_name, $birthday) = split (/=/, $form_info);

print "Content-type: text/plain", "\n\n";
print "Hey, your birthday is on: $birthday. That's what you told me, right?", "\n";

exit (0);
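The decoding step can be wrapped in a subroutine and tried on its own, without a form or a server. The sketch below is one way to do that, under two assumptions of ours that go slightly beyond birthday.pl: it converts "+" to a space before decoding the hexadecimal escapes, and it splits the pair on the first "=" before decoding, so that an encoded equal sign ("%3D") inside the value cannot be mistaken for the separator. The names url_decode and parse_pair are hypothetical.

```perl
#!/usr/local/bin/perl
use strict;

# Decode one URL-encoded string. The "+" handling is our addition;
# the substitution is the same one used in birthday.pl.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
    return $text;
}

# Split a single name=value pair first, then decode each half.
sub parse_pair {
    my ($form_info) = @_;
    my ($name, $value) = split (/=/, $form_info, 2);
    $name  = '' unless defined $name;
    $value = '' unless defined $value;
    return (url_decode ($name), url_decode ($value));
}

my ($field_name, $birthday) = parse_pair ('birthday=11%2F05%2F73');
print "$field_name is $birthday\n";    # prints: birthday is 11/05/73
```

Converting "+" before the percent escapes matters: a literal plus sign arrives as "%2B", and decoding in this order leaves it intact.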