If you want to outsmart the Spanish Inquisition, learn Python. This article is a practical introduction to writing non-trivial applications in Python.
by Jacek Artymiak
Despite what assembly code and C coders might tell us, high-level languages do have their place in every programmer's toolbox, and some of them are much more than a computer-science curiosity. Out of the many high-level languages we can choose from today, Python seems to be the most interesting for those who want to learn something new and do real work at the same time. Its no-nonsense implementation of object-oriented programming and its clean and easy-to-understand syntax make it a language that is fun to learn and use, which is not something we can say about most other languages.
In this tutorial, you will learn how to write applications that use command-line options, read and write to pipes, access environment variables, handle interrupts, read from and write to files, create temporary files and write to system logs. In other words, you will find recipes for writing real applications instead of the old boring Hello, World! stuff.
To begin, if you have not installed the Python interpreter on your system, now is the time. To make that step easier, install the latest Python distribution using packages compatible with your Linux distribution. rpm, deb and tgz are also available on your Linux CD-ROM or on-line. If you follow standard installation procedures, you should not have any problems.
Next, read the excellent Python Tutorial written by Guido van Rossum, creator of the Python programming language. This tutorial is part of the official Python documentation, and you can find it in either the /usr/doc/python-docs-1.5.2 or /usr/local/doc/python-docs-1.5.2 catalog. It may be delivered in the raw LaTeX format, which must be processed first; if you don't know how to do this, go to http://www.python.org/doc/ to download it in an alternative format.
I also recommend that you have the Python Library Reference handy; you might want it when the explanations given here do not meet your needs. You can find it in the same places as the Python Tutorial.
Creating scripts can be done using your favorite text editor as long as it saves text in plain ASCII format and does not automatically insert line breaks when the line is longer than the width of the editor's window.
Always begin your scripts with either
#! /usr/local/bin/pythonor
#! /usr/bin/pythonIf the access path to the python binary on your system is different, change that line, leaving the first two characters (#!) intact. Be sure this line is truly the first line in your script, not just the first non-blank line--it will save you a lot of frustration.
Use chmod to set the file permissions on your script to make it executable. If the script is for you alone, type chmod 0700 scriptfilename.py; if you want to share it with others in your group but not let them edit it, use 0750 as the chmod value; if you want to give access to everyone else, use the value 0755. For help with the chmod command, type man chmod.
Command-line options and arguments come in handy when we want to tell our scripts how to behave or pass some arguments (file names, directory names, user names, etc.) to them. All programs can read these options and arguments if they want, and your Python scripts are no different.
Implementing appropriate handlers boils down to reading the argv list and checking for the options and arguments you want your script to recognize. There are a few ways to do this. Listing 1 is a simple option handler that recognizes common -h, -help and --help options, and when they are found, it exits immediately after displaying the help message.
Copy and save this script as help.py, make it executable with the chmod 0755 help.py command, and run it several times, specifying different options, both recognized by the handler and not; e.g. with one of the options, you will see this message: ./help.py -h or ./help.py -o. If the option handler does recognize one of the options, you will see this message:
help.py--does nothing useful (yet) options: -h, -help, or --help--display this help Copyright (c) Jacek Artymiak, 2000If you invoke help.py with an option it does not recognize, or without any options at all, it will display the ``I don't recognize this option'' message.
Note that we need to import the sys module before we can check the contents of the argv list and before we can call the exit function. The sys.exit statement is a safety feature which prevents further program execution when one of the help options is found inside the argv list. This ensures that users don't do something dangerous before reading the help messages (for which they wouldn't have a need otherwise).
The simple help option handler described above works quite well and you can duplicate and change it to recognize additional options, but that is not the most efficient way to recognize multiple options with or without arguments. The ``proper'' way to do it is to use the getopt module, which converts options and arguments into a nice list of tuples. Listing 2 shows how it works.
Copy this script, save it as options.py and make it executable. As you can see, it uses two modules: sys and getopt which are imported right at the beginning. Then we define a simple function that displays the help message whenever something goes wrong.
The actual processing of command-line arguments begins with the try statement, where we are testing the list of command-line options and arguments (sys.argv) for errors defined as unknown options or missing arguments; if they are detected, the script will display an error message and exit immediately (see the except statement group). When no errors have been detected, our script splits the list of options and their arguments into tuples in the options list and begins parsing them by executing a series of loops, each searching for one option and its expected arguments.
The getopt.getopt function generates two lists in our sample script: options which contains options and their arguments; and xarguments which contains arguments not related to any of the options. We can safely ignore them in most cases.
To recognize short (one-letter such as -h) and long (those prefixed with --) options, getopt.getopt uses two separate arguments. The list of short options contains all of them listed in a single string, e.g., getopt.getopt(sys.argv, 'ahoinmdwq'). It is possible to specify, in that string, options that absolutely require an argument to follow them immediately (e.g., -vfilename) or after a space (e.g., -v filename). This is done by inserting a colon (:) after the option, like this: getopt.getopt(sys.argv, 'ahoiv:emwn'). However, this creates a silly problem that may cause some confusion and unnecessarily waste your time; if the user forgets to specify the argument for the option that requires it, the option that follows it becomes its argument. Consider this example:
script.py -v -hIf you put v: in the short option string argument of the getopt.getopt function, option -h will be treated as the argument of option -v. This is a nuisance and makes parsing of the list of tuples option, argument much more difficult. The solution to this problem is simple: don't use the colon, but check the second item of the tuple that contains the option (first item of the analyzed tuple) which requires an argument. If it's empty, report an error, like the -a option handler.
Long options prefixed with -- must be listed as a separate argument to the getopt.getopt, e.g., getopt.getopt(sys.argv, 'ah', ['view', 'file=']). They can be serviced by the same handler as short options.
What you do after locating options given by the user is up to you. Listing 2 can be used as a template for your scripts.
A properly written application, especially one that opens files for writing or creates temporary files, ought to implement interrupt handlers to ensure that no files are left corrupted or undeleted when the user or the system decides to stop the execution of our script.
Signals, like those sent when you press CTRL-C during the execution of your script, are caught by handlers which may ignore them, allow default handlers to be executed or perform custom actions. Python implements some default handlers, but you can override them with your own code using the signal module.
The function that lets us trap signals is signal.signal. Its two arguments are the number of the signal you want to trap and the name of the signal handler. Listing 3 is a simple script that captures the SIGINT (signal numbers have their own symbolic equivalents) signal sent to it when you press CTRL-C.
The SIGINT signal is not the only one you can capture. If you want to capture additional signals, add more signal.signal calls to handle them, changing the signal number (the signal.SIGxxx constant) and the name of the handler (optional; you can use the same handler with more than one signal). To see what signals are available in Linux, type kill -l on the command line.
Signals can be ignored, which is useful if you want to prevent some of them from disturbing the execution of your script. Listing 4 shows the way to do so (be careful; this script can't be stopped with CTRL-C).
Another signal worth remembering is SIGALRM. Setting up a handler for that signal allows you to stop the execution of your script after the given number of seconds. This is done with signal.alarm as shown in Listing 5.
Many scripts need to work with files. Remember that before you can read or write to a file, it must exist and be open. Listing 6 is an example of a script that opens a file for reading. Writing to a file requires only a small change (see Listing 7).
As you can see, the first of the two scripts shown in Listings 6 and 7 fails to open a file for reading if the file doesn't exist. This is correct behavior. The second script tries to open a file for writing: if the file exists, it is truncated (i.e., its contents are deleted); if it doesn't exist, it is created. This may not always be the desired behavior. When you want to append some data at the end of a file, you ought to open it for writing, while preserving its original contents. To do that, change the second argument of the open function from 'w' to 'a'.
Once the file is open, we can read or write to it using these methods:
If you want your script to create temporary files, use the tempfile module. It simplifies the task of creating temporary files by automatically creating unique file names based on templates defined in variables tempdir and template. It doesn't create or delete temporary files itself, but you can accomplish this using a method similar to the one used in Listing 8.
Note that you need to use both the os.O_CREAT and os.RDWR flags to tell the os.open function to create a temporary file for both reading and writing. Also, remember to close and remove all temporary files created before exiting a script. You will find more information on the functions, constants and variables used in that example in the os, posix and tempfile sections of the Python Library Reference Manual.
It is a good idea to implement a single handler that will remove temporary files before exiting from the script, as in Listing 9.
Many command-line tools let us create pipes for processing data, and it is a good idea to consider implementing this functionality in your own scripts. Pipes allow us to read from the standard input and write to the standard output of our script, as well as read from the standard output and write to the standard input of other commands.
Everything we need to implement pipes in our scripts is stored in the os and sys modules. Let's teach our script to read data from its own standard input (represented by sys.stdin) and copy it, unchanged, to its own standard output (sys.stdout):
#! /usr/local/bin/python import sys sys.stdout.write(sys.stdin.read())This works well, but doesn't allow us to modify the data appearing on the script's standard input. This can be achieved in several ways, depending on how much data you want to process at one time. Listing 10 reads one line at a time and inserts # at the beginning of each line.
If you use the read(n) method instead of readline, you can set the number of bytes to be read from the standard input. Listing 11 reads 256 bytes at a time.
A slightly different approach is needed when you want to read the whole file at one go. We use the sub function from the re module to perform a simple substitution. See Listing 12.
That's about all the basic knowledge needed to work with the standard input and output of our script. However, Python can read the standard output of external pipes or write to their standard input. This time, we'll need to use the os module and its popen function.
Listing 13 writes to the standard input of the pipe sed 's/-/+/g' > output one hundred lines of text, each containing the - string. The data passed to the pipe is then processed by sed and ends up as one hundred lines with +++. You can read from a pipe, too. Listing 14 shows you how.
If you develop applications that you want to keep an eye on and leave a trace of their activity in the system log in a way similar to many daemons running on a typical Linux system, you can do so with the syslog function located in the syslog module. To enable writing to system logs, import the syslog module and add calls to the syslog.syslog function at those points needing to be documented in the system log. See Listing 15 for an example.
To see the output from your script, open another X terminal window or switch to another console and type
tail -f /var/log/messagesto reveal what your script has just been doing. The output looks like Listing 16.
Remember that if you send the same message to the system log several times in a row, it will be buffered until a different one arrives in the system log buffer. It will appear there only once, and the next line in the system log will indicate how many times it was repeated. This bit of code,
#! /usr/local/bin/python import syslog # some code for a in ['a', 'b', 'c']: syslog.syslog('Hello from Python!')will generate the following results:
Jan 20 00:04:33 localhost python: Hello from Python! Jan 20 00:04:49 localhost last message repeated 2 timesDon't treat the system log like a trash can where you can send any kind of garbage; write only the most important information to it.
Some scripts may need to access information stored in one or more environment variables. Their values at the time your script is executed are stored in the os.environ dictionary, available after you import the os module. Here is the script that prints out all the environment variables set at the time your script executed.
#! /usr/local/bin/python import os for a in os.environ.keys(): print a, ' = ', os.environ.[a]If you are interested in checking for a particular value and using it in your own script, use this bit of code to get you started.
#! /usr/local/bin/python import os if os.environ['USER']: print 'Hello, '+os.environ['USER']Listing 17
If you want to modify the value of a particular environment variable while your script is running, use Listing 17 as a guide.
Now you know enough to write some well-behaved scripts that look and work like many other Linux commands. I encourage you to read the Python Library Reference and see what is possible using only the basic Python distribution. If the standard Python library is not enough for you, a visit to the official Python web site will reveal a wealth of possibilities and bags of useful code which you can use to learn and solve your programming problems.
Jacek Artymiak (artymiak@safenet.pl) is a consultant specializing in helping companies and individuals use Linux as a desktop or personal system for common, everyday jobs. His other occupations include being a writer, journalist, web designer, computer graphics artist and programmer.