Shell Functions and Path Variables, Part 2

Mr. Collyer continues his discussion with a detailed description of the addpath function.

by Stephen Collyer

In my previous article, I described a shell function that handled command-line options. In Part 2, we will use it in the path variable functions I promised to describe.

Each of the path variable shell functions is structured in a similar fashion. First, local variables are declared. Next comes the option-handling code, which employs the options function we became familiar with last month. Finally, the main functionality of the code is implemented. Because each function has the same structure, I will describe only one in detail this month. The next installment will describe various implementation features of the remaining functions.

We must first understand what the ``environment'' of a process is. The path variables we manipulate will usually be variables in the environment of a process, and we need our functions to alter their values (for example, to add or remove directories).

In a nutshell, the environment of a process is a group of named variables (similar to shell variables) which are passed to any created child process. (A process, of course, is the entity which runs a program. If you type ls to a shell, for example, this creates a process to run the ls program.) A shell variable can be put into the environment by ``exporting'' it. For example, the commands

A=fred
export A

create a shell variable called A and turn it into an environment variable. So, if you start a new process from this shell, it can examine its environment, find the A variable and notice it has a value of fred. Environment variables, therefore, provide a one-way channel of information--from parent process to child. The parent and child processes don't share the environment variables--the child is given a new copy of them. Thus, if a child process changes the value of an environment variable, the parent will not be aware of the change.
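The one-way nature of the environment can be checked with a short experiment. This is only a sketch; it assumes an sh-compatible shell is available as sh, and child_view is a name invented for the illustration.

```shell
A=fred
export A                      # export takes the variable's name, not $A

# The child shell finds A in its environment.
child_view=$(sh -c 'echo "$A"')

# The child changes its own copy; the parent's value is untouched.
sh -c 'A=barney; export A'

echo "child saw: $child_view, parent still has: $A"
# prints: child saw: fred, parent still has: fred
```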

Now, we wish to modify the path environment variables of the current shell, so we can't start a new process when we run our shell utilities: a child process would change only its own copy. Our utilities are therefore implemented as functions, because functions run in the context of the calling process. They do not get a copy of the environment variables; instead, they have direct access to the existing set.
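The difference can be seen by running the same function twice, once in a subshell (which behaves like a child process) and once directly. The names DEMO_PATH and set_in_function are hypothetical and exist only for this sketch.

```shell
DEMO_PATH=/usr/bin

# Hypothetical function that appends a directory to DEMO_PATH.
set_in_function() {
    DEMO_PATH="$DEMO_PATH:/opt/bin"
}

# Run in a subshell: the change is made to a copy and then lost.
( set_in_function )
after_subshell=$DEMO_PATH

# Run in the calling shell: the change persists.
set_in_function
after_function=$DEMO_PATH

echo "after subshell: $after_subshell"
echo "after function: $after_function"
```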

The addpath Function

The purpose of addpath is to add a pathel (path element) to a pathvar (path variable) in an idempotent fashion. Idempotent literally means ``of equal power'' and figuratively means ``doing it N times is the same as doing it once''. So, for example,

NEWP=
addpath -p NEWP /abc
addpath -p NEWP /abc

adds /abc to the pathvar NEWP exactly once. addpath checks the pathvar to see if the pathel is already present. If not, it adds it; if so, it doesn't.

This function is helpful, because if you use it to add to your PATH, for example, you won't end up with multiple copies of the same directory in your path. The code to do this is shown split up into various listings to make discussion easier.
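To make the idea of idempotence concrete before looking at the real code, here is a deliberately simplified stand-in. It is not the addpath described in this article: addpath_min is a hypothetical name, it takes no options, and it tests for membership with a case pattern rather than the pipeline discussed later.

```shell
# Hypothetical, option-free stand-in for addpath.
# Usage: addpath_min PATHVAR DIR
addpath_min() {
    eval "_cur=\${$1}"            # fetch the current value of the named variable
    case ":$_cur:" in
        *":$2:"*) return 0 ;;     # already present: do nothing
    esac
    if [ -z "$_cur" ]; then
        eval "$1=\$2"             # empty pathvar: no separator needed
    else
        eval "$1=\$_cur:\$2"      # otherwise append after a colon
    fi
}

NEWP=
addpath_min NEWP /abc
addpath_min NEWP /abc
echo "$NEWP"    # prints /abc
```

However many times the second call is repeated, NEWP keeps a single copy of /abc.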

Listing 1

In Listing 1, we create some variables local to the function. The set in the first three lines is for options handling; the set in the final four lines contains variables specific to this function. The options handling variables tend to be very similar in each function that uses the options function.

Listing 2

In Listing 2, we handle the options supplied to the function. We do this by calling the options function described last month. We tell it the names of the options we are prepared to handle, and give it a quoted list of the supplied arguments. When options returns, it will give us information on the supplied arguments in the form of variables which it has created. In addpath, we are prepared to handle the -h, -f, -b and -p options. The -p option requires an argument, which is the name of a pathvar, such as PATH. When options returns, it also creates a variable called options_shift_val. We can use this to shift away those command-line arguments it has already handled (i.e., arguments like -h, -b and so on). We do this immediately after the call to options. So, if the user had specified -h only, then options_shift_val would be set to 1, and we would shift away one argument; if -b and -p were specified, we would shift away three arguments (-b, -p and its required pathvar).

The next four ``if'' blocks appear in each pathvar function because they perform the following common tests:

If options created a variable called opt_h, then a -h argument was supplied and the user wanted some help. This we give by printing out usage information for the function and calling return. When a function calls return, it terminates, like a function returning in C. Don't make the mistake of calling exit in a function--this will terminate the shell process calling the function, which is probably not what you wanted to do.
We examine the options_missing_arg variable, which options creates if you didn't supply a required argument for an option. If this occurs, we print out the usage message, tell the user what went wrong and return.
We examine options_unknown_option. options sets this when you supply an argument that we're not prepared to handle (i.e., one that is not -h, -f, -b or -p, in this case). We return after giving the user some help.
Finally, we look at options_num_args_left. This gives us a count of the number of arguments that remain after we shifted away the ones that have already been processed. The code here will be specific to each function, but for addpath, we require the user to specify the name of the directory to append. We check for this (sloppily) by complaining if no arguments are left at this point.
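The four tests above follow a pattern that can be sketched without the options function itself, which is not reproduced in this part. The sketch below swaps in the standard getopts builtin instead; demo_opts is a hypothetical function, and only the opt_h, opt_f, opt_b and opt_p names are borrowed from the text.

```shell
# Hypothetical function; getopts stands in for the article's options function.
demo_opts() {
    opt_h= opt_f= opt_b= opt_p=
    OPTIND=1                       # restart option parsing on every call
    while getopts ":hfbp:" opt; do
        case $opt in
            h) opt_h=1 ;;
            f) opt_f=1 ;;
            b) opt_b=1 ;;
            p) opt_p=$OPTARG ;;
            :) echo "demo_opts: -$OPTARG needs an argument" >&2; return 1 ;;
            \?) echo "demo_opts: unknown option -$OPTARG" >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))          # discard the options already handled

    if [ -n "$opt_h" ]; then
        echo "usage: demo_opts [-h] [-f|-b] [-p pathvar] dir"
        return 0                   # return, not exit: exit would end the calling shell
    fi
    if [ $# -eq 0 ]; then
        echo "demo_opts: no directory specified" >&2
        return 1
    fi
    echo "pathvar=${opt_p:-PATH} dir=$1"
}

demo_opts -b -p NEWP /def    # prints pathvar=NEWP dir=/def
```

With -b -p NEWP /def, three arguments are shifted away, which leaves the directory in $1, as in the description of options_shift_val above.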

So far, we have performed the type of options processing that crops up over and over in shell scripts and functions that take arguments. We have checked that we haven't been supplied any options we don't know about, and that required arguments have been given. We have also provided a basic help facility via the -h handling.

Listing 3

Listing 3 shows the option handling code specific to the addpath function. The first two lines in this section set values for the variables COMMAND and pathvar to be used later. These lines may seem a little cryptic at first sight, but essentially they set up the default path variable to which we add (PATH) and the default command we execute to add to it. Note that there are single quotes around the contents of COMMAND. The shell will store the literal string between the quotes into COMMAND, and no variable substitution will be attempted.

If we type addpath /def, then pathvar will contain PATH, sep will contain : and dirname will contain /def. Later, when we evaluate a line of code containing COMMAND, the shell will substitute the literal strings ${pathvar}, ${sep} and ${dirname} with their values to produce ``${PATH}:/def''. Note that we escaped three individual characters inside COMMAND: \$, \{ and \}. This ensures the shell interprets them literally and that they appear in the evaluated output.
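The effect of the single quotes can be tried in isolation. In this sketch, the value given to COMMAND is inferred from the expansion steps shown later in the article rather than copied from Listing 3, and literal and expanded are names invented for the illustration.

```shell
pathvar=PATH
sep=:
dirname=/def

# Single quotes: the text is stored as is, and nothing is substituted yet.
COMMAND='\$\{${pathvar}\}${sep}${dirname}'
literal=$COMMAND

# One evaluation substitutes the three variables; the escaped
# characters survive as a literal $, { and }.
eval "expanded=$COMMAND"

echo "$literal"     # prints \$\{${pathvar}\}${sep}${dirname}
echo "$expanded"    # prints ${PATH}:/def
```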

The next two lines of code override the defaults if either the -f or -p option is given. If -f was given, the user wishes to add to the front of the pathvar and we set COMMAND accordingly, putting the pathvar variable at the end. If -p was given, the user supplied the name of a path variable; the options function stores this in opt_p and we set the pathvar variable to this value. So, if the user typed:

addpath -f -p NEWP /def

then opt_p and thus pathvar will contain NEWP, and COMMAND will look like $dirname$sep\$$pathvar, with the dirname variable sitting at the front.

Next, we set the value of the sep variable. Usually we want this to contain :, as that character separates path elements. However, if the path variable to which we are adding is initially empty, we don't want anything in sep. This ensures we don't add leading or trailing : characters to the path. The three lines of code starting sep=: implement this.
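One possible way to express the separator rule is shown below. It is a sketch, not the three lines from Listing 3: choose_sep, EMPTYP and FULLP are hypothetical names.

```shell
# Hypothetical helper: prints ":" when the named path variable is
# non-empty, and nothing when it is empty or unset.
choose_sep() {
    eval "_val=\${$1}"
    if [ -n "$_val" ]; then
        printf ':'
    fi
}

EMPTYP=
FULLP=/usr/bin

new_empty="${EMPTYP}$(choose_sep EMPTYP)/abc"
new_full="${FULLP}$(choose_sep FULLP)/abc"

echo "$new_empty"   # prints /abc
echo "$new_full"    # prints /usr/bin:/abc
```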

Finally, we store the name of the path element to be added in dirname. Note that at this point, it will be in $1; any other command-line arguments were shifted away immediately after options returned.

Listing 4

In Listing 4, we actually add the path element to the path variable. The first thing to note is that the real work of the function takes five lines of code. We have already performed all the required setup and argument checking, so there is little left to do. Essentially, we check to see whether the path element is already present in pathvar; if not, we add it.

element=$(eval echo \$$pathvar | colon2line |
  grep -x "$dirname")

First, note the variable=$(...) syntax. This is equivalent to the older variable=`...` syntax and means ``Run the commands within the brackets, and use what they write to standard output as the value of variable.'' This is called command substitution. In our example, we have a pipeline of commands; the output of the final command (grep) becomes the value of element. Let's look at each command in the pipeline. Remember that $pathvar contains the name of the path variable we wish to add to, and $dirname contains the name of the directory to add.

Assuming the value of $pathvar is PATH, the command line eval echo \$$pathvar is first expanded by the shell to eval echo $PATH. Next, because of the eval, the shell reevaluates the line. This time, it expands $PATH into something like ``/usr/bin:/bin:/usr/bin/X11'' and echoes its contents into the next command in the pipeline.
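The two passes are easy to compare side by side. DEMO_PATH is a hypothetical stand-in for PATH, used so the experiment does not depend on the real search path.

```shell
DEMO_PATH=/usr/bin:/bin:/usr/bin/X11
pathvar=DEMO_PATH

# One pass only: \$ gives a literal $, followed by the variable's name.
without_eval=$(echo \$$pathvar)

# eval adds a second pass, which expands $DEMO_PATH itself.
with_eval=$(eval echo \$$pathvar)

echo "$without_eval"   # prints $DEMO_PATH
echo "$with_eval"      # prints /usr/bin:/bin:/usr/bin/X11
```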

colon2line is another shell function, with code we have not yet seen. It merely prints each element of a colon-separated string on a separate line (i.e., it converts the : character to a newline character). This can be done in many ways. Here, for example, is an awk one-liner for this purpose:

awk 'BEGIN{RS=":"}{print}'

This command tells awk to assume that : separates records in a piece of text, and then to print each colon-separated record it sees. Each separate line is then read by the third command in the pipeline.
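As one of the many possible alternatives, the function could also be written with tr. This is a guess at a workable definition, not the author's colon2line, whose code has not been shown yet.

```shell
# Hypothetical colon2line: turn every colon on standard input into a newline.
colon2line() {
    tr ':' '\n'
}

echo "/usr/bin:/bin:/usr/bin/X11" | colon2line
# prints:
# /usr/bin
# /bin
# /usr/bin/X11
```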

The command grep -x "$dirname" is used to check for the presence of the path element we wish to add. The shell will replace $dirname with its value before grep is executed. It is surrounded by quotes so we can correctly handle the pathological case of $dirname containing spaces.

We tell grep we want only exact matches (-x). This ensures that if we run the following commands:

addpath /abc
addpath /ab

we add both of the distinct path elements. Without -x, grep would report it had seen a match when the second command was run, because ``/ab'' is a substring of ``/abc''.
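The effect of -x can be confirmed on a two-line sample. The variable names here are invented for the sketch; -q is used only to suppress grep's output so the exit status can be tested.

```shell
elements=$(printf '%s\n' /usr/bin /abc)

# Without -x, /ab matches as a substring of /abc.
if printf '%s\n' "$elements" | grep -q "/ab"; then loose=match; else loose=none; fi

# With -x, the whole line must match, so /ab is not found.
if printf '%s\n' "$elements" | grep -qx "/ab"; then exact=match; else exact=none; fi

echo "without -x: $loose, with -x: $exact"
# prints: without -x: match, with -x: none
```

Note that grep still treats $dirname as a regular expression, so a character such as . in a directory name matches any character; adding -F would make the comparison a fixed-string one.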

If grep sees $dirname in its input, it writes the name to the standard output. Because we are using grep in command substitution brackets, its output is assigned as the value of the element variable.

In a nutshell, the pipeline ensures the element has a non-null value if it is already present in the path variable, and a null value otherwise. If the element is null ([ "$element" = "" ]), we add it with the following command:

eval eval $pathvar=$COMMAND

By now, you should be familiar with the purpose of the eval command: it tells the shell to reevaluate the line being processed, and on each evaluation, shell variables are expanded. The only question here is, why do we need two of them? Assume the user types the following:

addpath /abc

Assume further that PATH contained /usr/bin only, and that /abc was not already present in $PATH. Let us look at how this line is expanded step by step.

1. $pathvar=$COMMAND: initially
2. PATH=\$\{${pathvar}\}${sep}${dirname}: after first shell expansion
3. PATH=${PATH}:/abc: after first eval
4. PATH=/usr/bin:/abc: after second eval

It should be clear that if we had applied only one eval to the command, the shell would have executed the command in step 3 and we would have replaced the current contents of PATH with the literal characters ``${PATH}:/abc''. The final eval forces the replacement of the embedded ``${PATH}'' string with its current value. Remember also that the PATH variable we alter here is not local to this function; any changes made are visible to the calling shell.
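The difference between one and two evals can be reproduced safely on scratch variables. In this sketch, ONE and TWO are hypothetical path variables, and the value of COMMAND is reconstructed from the expansion steps above rather than taken from the listing.

```shell
ONE=/usr/bin
TWO=/usr/bin
sep=:
dirname=/abc
COMMAND='\$\{${pathvar}\}${sep}${dirname}'

# A single eval stops one step short and stores the unexpanded text.
pathvar=ONE
eval $pathvar=$COMMAND

# The second eval replaces the embedded reference with its current value.
pathvar=TWO
eval eval $pathvar=$COMMAND

echo "$ONE"    # prints ${ONE}:/abc
echo "$TWO"    # prints /usr/bin:/abc
```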

That ends the description of the addpath function. The structure of the other pathvar functions is very similar to that of addpath, so in Part 3, I will describe only one or two points of interest about each of them.

Stephen Collyer (stephen@twocats.demon.co.uk) is a freelance software developer working in the UK. His interests include scripting languages and distributed and thread-based systems. Occasionally, he finds the time to talk to his wife and two remarkably attractive and highly intelligent children.