Tools for Programmers (Running Linux)

14.1. Debugging with gdb

Are you one of those programmers who scoff at the very idea of using a debugger to trace through code? Is it your philosophy that if the code is too complex for even the programmer to understand, then the programmer deserves no mercy when it comes to bugs? Do you step through your code, mentally, using a magnifying glass and a toothpick? Are your bugs usually caused by a single-character omission, such as using the = operator when you mean +=?

Then perhaps you should meet gdb--the GNU debugger. Whether or not you know it, gdb is your friend. It can locate obscure and difficult-to-find bugs that result in core dumps, memory leaks, and erratic behavior (both for the program and the programmer). Sometimes even the most harmless-looking glitches in your code can cause everything to go haywire, and without the aid of a debugger like gdb, finding these problems can be nearly impossible--especially for programs longer than a few hundred lines. In this section, we'll introduce you to the most useful features of gdb by way of examples. There's a book on gdb, too--the Free Software Foundation's Debugging with GDB.

gdb is capable of either debugging programs as they run, or examining the cause for a program crash with a core dump. Programs debugged at runtime with gdb can either be executed from within gdb itself or can be run separately; that is, gdb can attach itself to an already running process to examine it. First, we'll discuss how to debug programs running within gdb and then move on to attaching to running processes and examining core dumps.

14.1.1. Tracing a Program

Our first example is a program called trymh that detects edges in a grayscale image. trymh takes as input an image file, does some calculations on the data, and spits out another image file. Unfortunately, it crashes whenever it is invoked, like so:

papaya$ trymh < image00.pgm > image00.pbm 
Segmentation fault (core dumped)

Now, using gdb we could analyze the resulting core file, but for this example, we'll show how to trace the program as it runs, instead.[51]

[51]The sample programs in this section are not programs you're likely to run into anywhere; they were thrown together by the authors for the purpose of demonstration.

Before we use gdb to trace through the executable trymh, we need to ensure that the executable has been compiled with debugging code (see Section 13.1.6, "Enabling Debugging Code," in Chapter 13, "Programming Languages"). To do so, we should compile trymh using the -g switch with gcc.

Note that enabling optimization (-O) together with debugging code (-g) is legal but not recommended. The problem is that gcc is too smart for its own good. For example, if you have two identical lines of code in two different places in a function, gdb may unexpectedly jump to the second occurrence of the line instead of the first. This happens because gcc combined the two lines into a single sequence of machine code used in both instances.

Some of the automatic optimizations performed by gcc can be confusing when using a debugger. To turn off all optimization (even optimizations performed without specifying -O), use the -O0 (that's dash-oh-zero) option with gcc.

Now we can fire up gdb to see what the problem might be:

papaya$ gdb trymh 
GDB is free software and you are welcome to distribute copies of it 
 under certain conditions; type "show copying" to see the conditions. 
There is absolutely no warranty for GDB; type "show warranty" for details. 
GDB 4.16 (i486-unknown-linux --target i486-linux),
Copyright 1996 Free Software Foundation, Inc. 
(gdb)

Now gdb is waiting for a command. (The command help displays information on the available commands.) The first thing we want to do is start running the program so that we can observe its behavior. However, if we immediately use the run command, the program simply executes until it exits or crashes.

First, we need to set a breakpoint somewhere in the program. A breakpoint is just a location in the program where gdb should stop and allow us to control execution of the program. For the sake of simplicity, let's set a breakpoint on the first line of actual code, so that the program stops just as it begins to execute. The list command displays several lines of code (an amount that is variable) at a time:

(gdb) list 
12        main() { 
13 
14         FloatImage inimage; 
15         FloatImage outimage; 
16         BinaryImage binimage; 
17         int i,j; 
18 
19         inimage = (FloatImage)imLoadF(IMAGE_FLOAT,stdin); 
20         outimage = laplacian_float(inimage); 
21  
(gdb) break 19 
Breakpoint 1 at 0x289c: file trymh.c, line 19. 
(gdb)

A breakpoint is now set at line 19 in the current source file. You can set many breakpoints in the program; breakpoints may be conditional (that is, triggered only when a certain expression is true), unconditional, delayed, temporarily disabled, and so on. You may set breakpoints on a particular line of code, a particular function, a set of functions, and in a slew of other ways. You may also set a watchpoint, using the watch command, which is similar to a breakpoint but is triggered whenever a certain event takes place--not necessarily at a specific line of code within the program. We'll talk more about breakpoints and watchpoints later in the chapter.

Next, we use the run command to start the program running. run takes as arguments the same arguments you'd give trymh on the command line; this can include shell wildcards and input/output redirection, as the command is passed to /bin/sh for execution:

(gdb) run < image00.pgm > image00.pfm 
Starting program: /amd/dusk/d/mdw/vis/src/trymh < image00.pgm >\
image00.pfm 

Breakpoint 1, main () at trymh.c:19 
19         inimage = (FloatImage)imLoadF(IMAGE_FLOAT,stdin); 
(gdb)

As expected, the breakpoint is reached immediately at the first line of code. We can now take over.

The most useful program-stepping commands are next and step. Both commands execute the next line of code in the program, except that step descends into any function calls in the program, and next steps only to the next line of code in the same function. next quietly executes any function calls that it steps over but does not descend in their code for us to examine.

imLoadF is a function that loads an image from a disk file. We know this function is not at fault (you'll have to trust us on that one), so we wish to step over it using the next command:

(gdb) next 
20         outimage = laplacian_float(inimage); 
(gdb)

Here, we are interested in tracing into the suspicious-looking laplacian_float function, so we use the step command:

(gdb) step 
laplacian_float (fim=0x0) at laplacian.c:21 
21          i = 20.0; 
(gdb)

Let's use the list command to get some idea of where we are:

(gdb) list  
16        FloatImage laplacian_float(FloatImage fim) {  
17  
18          FloatImage mask;  
19          float i;  
20  
21          i = 20.0;  
22          mask=(FloatImage)imNew(IMAGE_FLOAT,3,3);  
23          imRef(mask,0,0) = imRef(mask,2,0) = imRef(mask,0,2) = 1.0;  
24          imRef(mask,2,2) = 1.0; imRef(mask,1,0) = imRef(mask,0,1) = i/5;  
25          imRef(mask,2,1) = imRef(mask,1,2) = i/5; imRef(mask,1,1) = -i;  
(gdb) list  
26  
27          return convolveFloatWithFloat(fim,mask);  
28        }  
(gdb)

As you can see, using list multiple times just displays more of the code. Because we don't want to step manually through this code, and we're not interested in the imNew function on line 22, let's continue execution until line 27. For this, we use the until command:

(gdb) until 27 
laplacian_float (fim=0x0) at laplacian.c:27 
27          return convolveFloatWithFloat(fim,mask);  
(gdb)

Before we step into the convolveFloatWithFloat function, let's be sure the two parameters, fim and mask, are valid. The print command examines the value of a variable:

(gdb) print mask 
$1 = (struct {...} *) 0xe838 
(gdb) print fim 
$2 = (struct {...} *) 0x0 
(gdb)

mask looks fine, but fim, the input image, is null. Obviously, laplacian_float was passed a null pointer instead of a valid image. If you have been paying close attention, you noticed this as we entered laplacian_float earlier.

Instead of stepping deeper into the program (as it's apparent that something has already gone wrong), let's continue execution until the current function returns. The finish command accomplishes this:

(gdb) finish 
Run till exit from #0  laplacian_float (fim=0x0) at laplacian.c:27 
0x28c0 in main () at trymh.c:20 
20       outimage = laplacian_float(inimage); 
Value returned is $3 = (struct {...} *) 0x0 
(gdb)

Now we're back in main. To determine the source of the problem, let's examine the values of some variables:

(gdb) list 
15       FloatImage outimage; 
16       BinaryImage binimage; 
17       int i,j; 
18       
19       inimage = (FloatImage)imLoadF(IMAGE_FLOAT,stdin); 
20       outimage = laplacian_float(inimage); 
21       
22       binimage = marr_hildreth(outimage); 
23       if  (binimage == NULL) { 
24         fprintf(stderr,"trymh: binimage returned NULL\n"); 
(gdb) print inimage 
$6 = (struct {...} *) 0x0 
(gdb)

The variable inimage, containing the input image returned from imLoadF, is null. Passing a null pointer into the image-manipulation routines certainly would cause a core dump in this case. However, we know imLoadF to be tried and true because it's in a well-tested library, so what's the problem?

As it turns out, our library function imLoadF returns NULL on failure--if the input format is bad, for example. Because we never checked the return value of imLoadF before passing it along to laplacian_float, the program goes haywire when inimage is assigned NULL. To correct the problem, we simply insert code to cause the program to exit with an error message if imLoadF returns a null pointer.
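The fix described above can be sketched as follows. This is only an illustration: the real FloatImage type and imLoadF function are not shown in the text, so the Image type, load_image, and load_checked below are hypothetical stand-ins that merely mimic the "return NULL on failure" behavior.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the library's image type; the real
   FloatImage is not shown in the text. */
typedef struct { int width, height; float *data; } Image;

/* Hypothetical loader that returns NULL on failure, as imLoadF is
   described as doing. It reads only a "width height" header. */
Image *load_image(FILE *in)
{
    Image *im;
    int w, h;

    if (in == NULL || fscanf(in, "%d %d", &w, &h) != 2 || w <= 0 || h <= 0)
        return NULL;
    im = malloc(sizeof *im);
    if (im == NULL)
        return NULL;
    im->width = w;
    im->height = h;
    im->data = NULL;    /* pixel data omitted in this sketch */
    return im;
}

/* Load an image and check the result before anyone uses it.
   Returns 0 on success, 1 on failure (after printing a message). */
int load_checked(FILE *in, Image **out)
{
    *out = load_image(in);
    if (*out == NULL) {
        fprintf(stderr, "trymh: could not load input image\n");
        return 1;
    }
    return 0;
}
```

A caller such as main would test the return value of load_checked and exit with a nonzero status on failure, instead of handing a null pointer to the image-processing routines.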

To quit gdb, just use the command quit. Unless the program has finished execution, gdb will complain that the program is still running:

(gdb) quit 
The program is running.  Quit anyway (and kill it)? (y or n) y 
papaya$

In the following sections we examine some specific features provided by the debugger, given the general picture just presented.

14.1.2. Examining a Core File

Do you hate it when a program crashes and spites you again by leaving a 10 MB core file in your working directory, wasting much-needed space? Don't be so quick to delete that core file; it can be very helpful. A core file is just a dump of the memory image of a process at the time of the crash. You can use the core file with gdb to examine the state of your program (such as the values of variables and data) and determine the cause for failure.

The core file is written to disk by the operating system whenever certain failures occur. The most frequent reason for a crash and the subsequent core dump is a memory violation--that is, trying to read or write memory that your program does not have access to. For example, attempting to write data into a null pointer can cause a segmentation fault, which is essentially a fancy way to say, "you screwed up." Other errors that result in core files are so-called "bus errors" and "floating-point exceptions." Segmentation faults are a common error and occur when you try to access (read from or write to) a memory address that does not belong to your process's address space. This includes the address 0, as often happens with uninitialized pointers. Bus errors result from using incorrectly aligned data and are therefore rare on the Intel architecture, which does not impose strict alignment requirements as other architectures, such as SPARC, do. Floating-point exceptions point to a severe problem in a floating-point calculation, such as an overflow, but the most common cause is a division by zero.
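As a small illustration of the last case: on Linux on Intel processors, it is integer division by zero (and the overflowing division INT_MIN / -1) that delivers the "floating-point exception" signal. A minimal sketch of guarding against it might look like this; checked_div is a hypothetical helper name, not part of any library.

```c
#include <limits.h>

/* Divide a by b and store the quotient in *result.
   Returns 0 on success, or -1 if the division would trap:
   b == 0, or INT_MIN / -1, whose result does not fit in an int. */
int checked_div(int a, int b, int *result)
{
    if (b == 0)
        return -1;
    if (a == INT_MIN && b == -1)
        return -1;
    *result = a / b;
    return 0;
}
```

The caller decides what to do when -1 comes back; the point is only that the check happens before the division rather than in a core file afterward.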

However, not all such memory errors will cause immediate crashes. For example, you may overwrite memory in some way, but the program continues to run, not knowing the difference between actual data and instructions or garbage. Subtle memory violations can cause programs to behave erratically. One of the authors once witnessed a bug that caused the program to jump around randomly; yet, unless it was traced with gdb, the program still appeared to work normally. The only evidence of a bug was that the program returned output that meant, roughly, that two and two did not add up to four. Sure enough, the bug was an attempt to write one too many characters into a block of allocated memory. That single-byte error caused hours of grief.
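The program in that anecdote is not shown, so the following is only a generic sketch of how a one-byte overrun typically arises when copying a string, and how to avoid it. The function name copy_string is an assumption made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of s, or NULL if allocation fails. */
char *copy_string(const char *s)
{
    size_t len = strlen(s);
    /* The classic mistake is malloc(len): that leaves no room for the
       terminating '\0', so the copy writes one byte past the block. */
    char *copy = malloc(len + 1);

    if (copy != NULL)
        memcpy(copy, s, len + 1);   /* copies the terminator as well */
    return copy;
}
```

With malloc(len) instead of malloc(len + 1), the program would often still appear to work, which is exactly what makes this kind of error hard to find.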

You can prevent these kinds of memory problems (even the best programmers make these mistakes!) using the Checker package, a set of memory-management routines that replaces the commonly used malloc() and free() functions. We'll talk about Checker in Section 14.2.5, "Using Checker."

However, if your program does cause a memory fault, it will crash and dump core. Under Linux, core files are named, appropriately, core. The core file appears in the current working directory of the running process, which is usually the working directory of the shell that started the program, but on occasion, programs may change their own working directory.

Some shells provide facilities for controlling whether core files are written. Under bash, for example, the default behavior is not to write core files at all. In order to enable core file output, you should use the command:

ulimit -c unlimited

probably in your .bashrc initialization file. You can specify a maximum size for core files other than unlimited, but truncated core files may not be of use when debugging applications.
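A program can also adjust this limit for itself with the standard getrlimit and setrlimit calls. The sketch below raises the soft core-file limit as far as the hard limit allows; the function name enable_core_dumps is an assumption made for illustration.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit for the
   calling process. Returns 0 on success, -1 on failure (errno is
   set by the failing system call). */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* the soft limit cannot exceed the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling this early in main has an effect similar to the ulimit command, but only for that process and its children. If the hard limit is itself zero, core files stay disabled.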

Also, in order for a core file to be useful, the program must be compiled with debugging code enabled, as described in the previous section. Most binaries on your system will not contain debugging code, so the core file will be of limited value.

Our example for using gdb with a core file is yet another mythical program called cross. Like trymh in the previous section, cross takes as input an image file, does some calculations on it, and outputs another image file. However, when running cross, we get a segmentation fault:

papaya$ cross < image30.pfm > image30.pbm 
Segmentation fault (core dumped) 
papaya$

To invoke gdb for use with a core file, you must specify not only the core filename but also the name of the executable that goes along with that core file. This is because the core file itself does not contain all the information necessary for debugging:

papaya$ gdb cross core 
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.8, Copyright 1993 Free Software Foundation, Inc...
Core was generated by `cross'.
Program terminated with signal 11, Segmentation fault.
#0  0x2494 in crossings (image=0xc7c8) at cross.c:31
31              if ((image[i][j] >= 0) &&
(gdb)

gdb tells us that the program terminated with signal 11. A signal is a kind of message that is sent to a running program from the kernel, the user, or the program itself. Signals are generally used to terminate a program (and possibly cause it to dump core). For example, when you type the interrupt character, a signal is sent to the running program, which will probably kill the program.
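To make the idea of a signal as a message more concrete, here is a minimal sketch in which a program sends the interrupt signal to itself and catches it instead of dying. The names on_signal and catch_interrupt are hypothetical; sigaction and raise are the standard POSIX and C calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    got_signal = signo;
}

/* Install a handler for SIGINT, send SIGINT to this process, and
   return the signal number the handler recorded (-1 on error). */
int catch_interrupt(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    got_signal = 0;
    raise(SIGINT);          /* delivered before raise() returns */
    return got_signal;
}
```

Without the handler, the default action for SIGINT is to terminate the process. For signal 11 (SIGSEGV), the default action is to terminate and dump core.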

In this case, signal 11 was sent to the running cross process by the kernel when cross attempted to read or write to memory that it did not have access to. This signal caused cross to die and dump core. gdb says that the illegal memory reference occurred on line 31 of the source file cross.c:

(gdb) list  
26          xmax = imGetWidth(image)-1; 
27          ymax = imGetHeight(image)-1; 
28         
29          for (j=1; j<xmax; j++) { 
30            for (i=1; i<ymax; i++) { 
31              if ((image[i][j] >= 0) && 
32                  (image[i-1][j-1] < 0) || 
33                  (image[i-1][j] < 0) || 
34                  (image[i-1][j+1] < 0) || 
35                  (image[i][j-1] < 0) || 
(gdb)

Here, we see several things. First of all, there is a loop across the two index variables i and j, presumably in order to do calculations on the input image. Line 31 is an attempt to reference data from image[i][j], a two-dimensional array. When a program dumps core while attempting to access data from an array, it's usually a sign that one of the indices is out of bounds. Let's check them:

(gdb) print i 
$1 = 1 
(gdb) print j 
$2 = 1194 
(gdb) print xmax 
$3 = 1551 
(gdb) print ymax 
$4 = 1194 
(gdb)

Here we see the problem. The program was attempting to reference element image[1][1194]; however, the array extends only to image[1550][1193] (remember that arrays in C are indexed from 0 to max-1). In other words, we attempted to read the 1195th row of an image that only has 1194 rows.

If we look at lines 29 and 30, we see the problem: the values xmax and ymax are reversed. The variable j should range from 1 to ymax (because it is the row index of the array), and i should range from 1 to xmax. Fixing the two for loops on lines 29 and 30 corrects the problem.
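The general rule behind the fix is that each loop bound must match the dimension its index is used for. The sketch below is not the cross program's actual code (which is only partly shown); count_crossings and its simplified test are assumptions, and the indices are named row and col to keep the two dimensions apart.

```c
/* Count interior pixels that are non-negative but have a negative
   neighbor directly above or to the left.
   rows[row][col]: row runs 0..height-1, col runs 0..width-1. */
int count_crossings(float **rows, int width, int height)
{
    int row, col, count = 0;

    for (row = 1; row < height - 1; row++) {      /* row index: bounded by height */
        for (col = 1; col < width - 1; col++) {   /* column index: bounded by width */
            if (rows[row][col] >= 0 &&
                (rows[row - 1][col] < 0 || rows[row][col - 1] < 0))
                count++;
        }
    }
    return count;
}
```

If the two bounds were swapped, the code would run without complaint on a square image and read out of bounds on any other, which matches the kind of failure seen in the core file.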

Let's say that your program is crashing within a function that is called from many different locations, and you want to determine where the function was invoked from and what situation led up to the crash. The backtrace command displays the call stack of the program at the time of failure. If you are like me and are too lazy to type backtrace all the time, you will be delighted to hear that you can also use the shortcut bt.

The call stack is the list of functions that led up to the current one. For example, if the program starts in function main, which calls function foo, which calls bamf, the call stack looks like:

(gdb) backtrace 
#0  0x1384 in bamf () at goop.c:31 
#1  0x4280 in foo () at goop.c:48 
#2  0x218 in main () at goop.c:116 
(gdb)

As each function is called, it pushes certain data onto the stack, such as saved registers, function arguments, local variables, and so forth. Each function has a certain amount of space allocated on the stack for its use. The chunk of memory on the stack for a particular function is called a stack frame, and the call stack is the ordered list of stack frames.

In the following example, we are looking at a core file for an X-based animation program. Using backtrace gives us:

(gdb) backtrace  
#0  0x602b4982 in _end () 
#1  0xbffff934 in _end () 
#2  0x13c6 in stream_drawimage (wgt=0x38330000, sn=4)\
at stream_display.c:94 
#3  0x1497 in stream_refresh_all () at stream_display.c:116 
#4  0x49c in control_update_all () at control_init.c:73 
#5  0x224 in play_timeout (Cannot access memory at address 0x602b7676. 
(gdb)

This is a list of stack frames for the process. The most recently called function is frame 0, which is the "function" _end in this case. Here, we see that play_timeout called control_update_all, which called stream_refresh_all, and so on. Somehow, the program jumped to _end where it crashed.

However, _end is not a function; it is simply a label that specifies the end of the process data segment. When a program branches to an address such as _end, which is not a real function, it is a sign that something must have caused the process to go haywire, corrupting the call stack. (This is known in hacker jargon as "jumping to hyperspace.") In fact, the error "Cannot access memory at address 0x602b7676" is another indication that something bizarre has occurred.

We can see, however, that the last "real" function called was stream_drawimage, and we might guess that it is the source of the problem. To examine the state of stream_drawimage, we need to select its stack frame (frame number 2), using the frame command:

(gdb) frame 2 
#2  0x13c6 in stream_drawimage (wgt=0x38330000, sn=4)\
at stream_display.c:94 
94        XCopyArea(mydisplay,streams[sn].frames[currentframe],\
XtWindow(wgt), 
(gdb) list 
91 
92        printf("CopyArea frame %d, sn %d, wid %d\n",currentframe,sn,wgt); 
93 
94        XCopyArea(mydisplay,streams[sn].frames[currentframe],\
XtWindow(wgt), 
95                 picGC,0,0,streams[sn].width,streams[sn].height,0,0); 
(gdb)

Well, not knowing anything else about the program at hand, we can't see anything wrong here, unless the variable sn (being used as an index into the array streams) is out of range. From the output of frame, we see that stream_drawimage was called with an sn parameter of 4. (Function parameters are displayed in the output of backtrace, as well as whenever we change frames.)

Let's move up another frame, to stream_refresh_all, to see how stream_drawimage was called. To do this, we use the up command, which selects the stack frame above the current one:

(gdb) up  
#3  0x1497 in stream_refresh_all () at stream_display.c:116 
116         stream_drawimage(streams[i].drawbox,i); 
(gdb) list 
113     void stream_refresh_all(void) { 
114       int i; 
115       for (i=0; i<=numstreams; i++) { 
116         stream_drawimage(streams[i].drawbox,i); 
117 
(gdb) print i 
$2 = 4 
(gdb) print numstreams 
$3 = 4 
(gdb)

Here, we see that the index variable i is looping from 0 to numstreams, and indeed i here is 4, the second parameter to stream_drawimage. However, numstreams is also 4. What's going on?

The for loop on line 115 looks funny; it should read:

for (i=0; i<numstreams; i++) {

The error is in the use of the <= comparison operator. The streams array is indexed from 0 to numstreams-1, not from 0 to numstreams. This simple off-by-one error caused the program to go berserk.
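A self-contained sketch of the corrected loop follows. The struct stream type and the refresh_all function are simplified stand-ins invented for illustration; the real program's data structures are not shown in full.

```c
/* Hypothetical stand-in for the program's stream record. */
struct stream { int drawn; };

/* Mark the first numstreams entries as drawn and return how many
   entries were visited. */
int refresh_all(struct stream *streams, int numstreams)
{
    int i, visited = 0;

    for (i = 0; i < numstreams; i++) {   /* valid indices are 0..numstreams-1 */
        streams[i].drawn = 1;
        visited++;
    }
    return visited;
}
```

With <= in place of <, the loop would make one extra pass and touch streams[numstreams], an element that does not belong to the array.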

As you can see, using gdb with a core dump allows you to browse through the image of a crashed program to find bugs. Never again will you delete those pesky core files, right?

14.1.3. Debugging a Running Program

gdb can also debug a program that is already running, allowing you to interrupt it, examine it, and then return the process to its regularly scheduled execution. This is very similar to running a program from within gdb, and there are only a few new commands to learn.

The attach command attaches gdb to a running process. In order to use attach you must also have access to the executable that corresponds to the process.

For example, if you have started the program pgmseq with process ID 254, you can start up gdb with:

papaya$ gdb pgmseq

and once inside gdb, use the command:

(gdb) attach 254 
Attaching program `/home/loomer/mdw/pgmseq/pgmseq', pid 254 
__select (nd=4, in=0xbffff96c, out=0xbffff94c, ex=0xbffff92c, tv=0x0) 
    at __select.c:22 
__select.c:22: No such file or directory. 
(gdb)

The No such file or directory error is given because gdb can't locate the source file for __select. This is often the case with system calls and library functions, and it's nothing to worry about.

You can also start gdb with the command:

papaya$ gdb pgmseq 254

Once gdb attaches to the running process, it temporarily suspends the program and lets you take over, issuing gdb commands. Or you can set a breakpoint or watchpoint (with the break and watch commands) and use continue to cause the program to continue execution until the breakpoint is triggered.

The detach command detaches gdb from the running process. You can then use attach again, on another process, if necessary. If you find a bug, you can detach the current process, make changes to the source, recompile, and use the file command to load the new executable into gdb. You can then start the new version of the program and use the attach command to debug it. All without leaving gdb!

In fact, gdb allows you to debug three programs concurrently: one running directly under gdb, one tracing with a core file, and one running as an independent process. The target command allows you to select which one you wish to debug.

14.1.4. Changing and Examining Data

To examine the values of variables in your program, you can use the print, x, and ptype commands. The print command is the most commonly used data inspection command; it takes as an argument an expression in the source language (usually C or C++) and returns its value. For example:

(gdb) print mydisplay 
$10 = (struct _XDisplay *) 0x9c800 
(gdb)

This displays the value of the variable mydisplay, as well as an indication of its type. Because this variable is a pointer, you can examine its contents by dereferencing the pointer, as you would in C:

(gdb) print *mydisplay  
$11 = {ext_data = 0x0, free_funcs = 0x99c20, fd = 5, lock = 0,  
  proto_major_version = 11, proto_minor_version = 0,  
  vendor = 0x9dff0 "XFree86", resource_base = 41943040,  
  … 
  error_vec = 0x0, cms = {defaultCCCs = 0xa3d80 "",\
clientCmaps = 0x991a0 "'",  
    perVisualIntensityMaps = 0x0}, conn_checker = 0, im_filters = 0x0} 
(gdb)

mydisplay is an extensive structure used by X programs; we have abbreviated the output for your reading enjoyment.

print can print the value of just about any expression, including C function calls (which it executes on the fly, within the context of the running program):

(gdb) print getpid() 
$11 = 138 
(gdb)

Of course, not all functions may be called in this manner. Only those functions that have been linked to the running program may be called. If a function has not been linked to the program and you attempt to call it, gdb will complain that there is no such symbol in the current context.

More complicated expressions may be used as arguments to print as well, including assignments to variables. For example:

(gdb) print mydisplay->vendor = "Linux" 
$19 = 0x9de70 "Linux" 
(gdb)

assigns to the vendor member of the mydisplay structure the value "Linux" instead of "XFree86" (a useless modification, but interesting nonetheless). In this way, you can interactively change data in a running program to correct errant behavior or test uncommon situations.

Note that after each print command, the value displayed is saved in gdb's value history, a numbered list of past results ($1, $2, and so on) that may be handy for you to use later. For example, to recall the value of mydisplay in the previous example, we need merely print the value of $10:

(gdb) print $10 
$21 = (struct _XDisplay *) 0x9c800 
(gdb)

You may also use expressions, such as typecasts, with the print command. Almost anything goes.

The ptype command gives you detailed (and often long-winded) information about a variable's type or the definition of a struct or typedef. To get a full definition for the struct _XDisplay used by the mydisplay variable, we use:

(gdb) ptype mydisplay  
type = struct _XDisplay { 
    struct _XExtData *ext_data; 
    struct _XFreeFuncs *free_funcs; 
    int fd; 
    int lock; 
    int proto_major_version; 
    … 
    struct _XIMFilter *im_filters; 
} * 
(gdb)

If you're interested in examining memory on a more fundamental level, beyond the petty confines of defined types, you can use the x command. x takes a memory address as an argument. If you give it a variable, it uses the value of that variable as the address.

x also takes a count and a type specification as optional arguments. The count is the number of objects of the given type to display. For example, x/100xb 0x4200 displays 100 bytes of data (the b selects byte-sized units), represented in hexadecimal format, at the address 0x4200. Use help x to get a description of the various output formats.

To examine the value of mydisplay->vendor, we can use:

(gdb) x mydisplay->vendor 
0x9de70 <_end+35376>:   76 'L'  
(gdb) x/6c mydisplay->vendor 
0x9de70 <_end+35376>:   76 'L'  105 'i' 110 'n' 117 'u' 120 'x' 0 '\000' 
(gdb) x/s mydisplay->vendor 
0x9de70 <_end+35376>:    "Linux"  
(gdb)

The first field of each line gives the absolute address of the data. The second represents the address as some symbol (in this case, _end) plus an offset in bytes. The remaining fields give the actual value of memory at that address, first in decimal, then as an ASCII character. As described earlier, you can force x to print the data in other formats.
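To see what the character format is showing, the sketch below renders a run of bytes the same way: each byte as a decimal number followed by the character in quotes, with unprintable bytes written as octal escapes. It only imitates the value fields of the output above, not the address or symbol fields, and format_bytes is a hypothetical helper.

```c
#include <ctype.h>
#include <stdio.h>

/* Format the first n bytes at p into out as "76 'L' 105 'i' ...".
   Unprintable bytes are shown as three-digit octal escapes.
   Returns the number of characters written, or -1 if out is too small. */
int format_bytes(const unsigned char *p, size_t n, char *out, size_t outsize)
{
    size_t i, used = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        const char *sep = (i == 0) ? "" : " ";
        int w;

        if (isprint(p[i]))
            w = snprintf(out + used, outsize - used, "%s%d '%c'",
                         sep, p[i], p[i]);
        else
            w = snprintf(out + used, outsize - used, "%s%d '\\%03o'",
                         sep, p[i], (unsigned)p[i]);
        if (w < 0 || (size_t)w >= outsize - used)
            return -1;              /* output did not fit */
        used += (size_t)w;
    }
    return (int)used;
}
```

Passing the six bytes of the string "Linux", including its terminating zero byte, produces the same sequence of values as the x/6c line shown earlier.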

14.1.5. Getting Information

The info command provides information about the status of the program being debugged. There are many subcommands under info; use help info to see them all. For example, info program displays the execution status of the program:

(gdb) info program  
Using the running image of child process 138. 
Program stopped at 0x9e. 
It stopped at breakpoint 1. 
(gdb)

Another useful command is info locals, which displays the names and values of all local variables in the current function:

(gdb) info locals 
inimage = (struct {...} *) 0x2000 
outimage = (struct {...} *) 0x8000 
(gdb)

This is a rather cursory description of the variables. The print or x commands describe them further.

(gdb) info address inimage 
Symbol "inimage" is a local variable at frame offset -20. 
(gdb)

By frame offset, gdb means that inimage is stored 20 bytes below the top of the stack frame.

You can get information on the current frame using the info frame command, like so:

(gdb) info frame  
Stack level 0, frame at 0xbffffaa8: 
 eip = 0x9e in main (main.c:44); saved eip 0x34 
 source language c. 
 Arglist at 0xbffffaa8, args: argc=1, argv=0xbffffabc 
 Locals at 0xbffffaa8, Previous frame's sp is 0x0 

 Saved registers: 
  ebx at 0xbffffaa0, ebp at 0xbffffaa8, esi at 0xbffffaa4, eip at\
0xbffffaac 
(gdb)

This kind of information is useful if you're debugging at the assembly-language level with the disass, nexti, and stepi commands (see Section 14.1.6.2, "Instruction-level debugging").

14.1.6. Miscellaneous Features

We have barely scratched the surface of what gdb can do. It is an amazing program with a lot of power; we have introduced you only to the most commonly used commands. In this section, we'll look at other features of gdb and then send you on your way.

If you're interested in learning more about gdb, we encourage you to read the gdb manual page and the Free Software Foundation manual. The manual is also available as an online Info file. (Info files may be read under Emacs, or using the info reader; see Section 9.2.3, "Tutorial and Online Help," in Chapter 9, "Editors, Text Tools, Graphics, and Printing," for details.)

14.1.6.1. Breakpoints and watchpoints

As promised, we're going to demonstrate further use of breakpoints and watchpoints. Breakpoints are set with the break command; similarly, watchpoints are set with the watch command. The only difference between the two is that breakpoints must break at a particular location in the program--on a certain line of code, for example--and watchpoints are triggered whenever the value of a given expression changes, regardless of location within the program. Though powerful, watchpoints can be horribly inefficient; any time the state of the program changes, all watchpoints must be reevaluated.

When a breakpoint or watchpoint is triggered, gdb suspends the program and returns control to you. Breakpoints and watchpoints allow you to run the program (using the run and continue commands) and stop only in certain situations, thus saving you the trouble of using many next and step commands to walk through the program manually.

There are many ways to set a breakpoint in the program. You can specify a line number, as in break 20. Or, you can specify a particular function, as in break stream_unload. You can also specify a line number in another source file, as in break foo.c:38. Use help break to see the complete syntax.

Breakpoints may be conditional; that is, the breakpoint triggers only when a certain expression is true. For example, using the command:

break 184 if (status == 0)

sets a conditional breakpoint at line 184 in the current source file, which triggers only when the variable status is zero. The variable status must be either a global variable or a local variable in the current stack frame. The expression may be any valid expression in the source language that gdb understands, identical to the expressions used by the print command. You can change the breakpoint condition (if it is conditional) using the condition command.

The info break command lists all breakpoints and watchpoints along with their status. You can then delete or disable breakpoints using the clear, delete, or disable commands. A disabled breakpoint is merely inactive until you reenable it (with the enable command); a breakpoint that has been deleted, on the other hand, is gone from the list of breakpoints for good. You can also specify that a breakpoint be enabled once, meaning that once it is triggered it will be disabled again (enable once), or that it be enabled once and then deleted (enable delete).

To set a watchpoint, use the watch command, as in:

watch (numticks < 1024 && incoming != clear)

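The watched expression may be any valid source expression, as with conditional breakpoints. gdb stops the program whenever the value of the expression changes, which for the example above means whenever the condition switches between true and false.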
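As a small target for experimenting with watch, consider the following sketch. The global numticks is named after the variable in the example above, but the functions tick and run_ticks are hypothetical and exist only for illustration. The commands in the comment assume the file is compiled with -g and that run_ticks is called from a main of your own.

```c
#include <assert.h>

int numticks = 0;   /* global counter; a convenient thing to watch */

/*
 * Advances the counter, wrapping to zero once it reaches limit.
 *
 * Watchpoints one might try (commands only, output omitted):
 *
 *   (gdb) break run_ticks
 *   (gdb) run
 *   (gdb) watch numticks          stops every time numticks changes
 *   (gdb) watch numticks > 1000   stops when the comparison changes value
 *   (gdb) continue
 */
void tick(int limit)
{
    numticks++;
    if (numticks >= limit)
        numticks = 0;
}

/* Resets the counter, runs n ticks, and returns the final value. */
int run_ticks(int n, int limit)
{
    assert(limit > 0);
    numticks = 0;
    for (int i = 0; i < n; i++)
        tick(limit);
    return numticks;
}
```

Watching the bare variable stops the program on every increment, which quickly becomes tedious; watching a comparison stops it only when the result of the comparison changes.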

14.1.6.2. Instruction-level debugging

gdb is capable of debugging on the processor-instruction level, allowing you to watch the innards of your program with great scrutiny. However, understanding what you see requires not only knowledge of the processor architecture and assembly language, but also some idea of how the operating system sets up the process address space. For example, it helps to understand the conventions used for setting up stack frames, calling functions, passing parameters and return values, and so on. Any book on protected-mode 80386/80486 programming can fill you in on these details. But be warned: protected-mode programming on this processor is quite different from real-mode programming (as used in the MS-DOS world). Be sure that you're reading about native protected-mode 386 programming, or else you might subject yourself to terminal confusion.

The primary gdb commands used for instruction-level debugging are nexti, stepi, and disass. nexti is equivalent to next, except that it steps to the next instruction, not the next source line. Similarly, stepi is the instruction-level analogue of step.

The disass command displays a disassembly of an address range that you supply. This address range may be specified by literal address or function name. For example, to display a disassembly of the function play_timeout, use the command:

(gdb) disass play_timeout 
Dump of assembler code for function play_timeout: 
to 0x2ac: 
0x21c <play_timeout>:           pushl  %ebp 
0x21d <play_timeout+1>:         movl   %esp,%ebp 
0x21f <play_timeout+3>:         call   0x494 <control_update_all> 
0x224 <play_timeout+8>:         movl   0x952f4,%eax 
0x229 <play_timeout+13>:        decl   %eax 
0x22a <play_timeout+14>:        cmpl   %eax,0x9530c 
0x230 <play_timeout+20>:        jne    0x24c <play_timeout+48> 
0x232 <play_timeout+22>:        jmp    0x29c <play_timeout+128> 
0x234 <play_timeout+24>:        nop     
0x235 <play_timeout+25>:        nop     
… 

0x2a8 <play_timeout+140>:       addb   %al,(%eax) 
0x2aa <play_timeout+142>:       addb   %al,(%eax) 
(gdb)

This is equivalent to using the command disass 0x21c (where 0x21c is the literal address of the beginning of play_timeout).

You can specify an optional second argument to disass, which will be used as the address where disassembly should stop. Using disass 0x21c 0x232 will display only the first seven lines of the assembly listing in the previous example (the instruction starting at 0x232 itself will not be displayed). Recent versions of gdb expect the two addresses to be separated by a comma, as in disass 0x21c,0x232.

If you use nexti and stepi often, you may wish to use the command:

display/i $pc

This causes the current instruction to be displayed after every nexti or stepi command. display specifies expressions to be shown each time the program stops, and the /i format tells gdb to print the contents of the given address as a machine instruction. $pc is a gdb internal register that corresponds to the processor's program counter, pointing to the current instruction.
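If you would like a function short enough to follow one instruction at a time, something like the following sketch may do. The function sum_to is hypothetical and is not part of the program disassembled above. The commands in the comment assume the file is compiled with -g and without optimization and that sum_to is called from a main of your own; the instructions you see will depend on your compiler and processor.

```c
#include <assert.h>

/*
 * Sums the integers from 1 to n.
 *
 * An instruction-level session one might try (commands only):
 *
 *   (gdb) break sum_to
 *   (gdb) run
 *   (gdb) display/i $pc
 *   (gdb) stepi
 *   (gdb) nexti
 *   (gdb) disass sum_to
 */
int sum_to(int n)
{
    int total = 0;

    assert(n >= 0);
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Stepping through the loop with stepi while display/i $pc is active makes it easy to match the compare and jump instructions in the disassembly to the loop condition in the source.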

14.1.6.3. Using Emacs with gdb

Emacs (described in Section 9.2, "The Emacs Editor," in Chapter 9, "Editors, Text Tools, Graphics, and Printing") provides a debugging mode that lets you run gdb--or another debugger--within the integrated program-tracing environment provided by Emacs. This so-called "Grand Unified Debugger" library is very powerful and allows you to debug and edit your programs entirely within Emacs.

To start gdb under Emacs, use the Emacs command M-x gdb and give the name of the executable to debug as the argument. A buffer will be created for gdb, and working in it is similar to using gdb alone. You can then use the core-file command to load a core file, or the attach command to attach to a running process, if you wish.

Whenever you step to a new frame (e.g., when you first trigger a breakpoint), Emacs opens a separate window that displays the source corresponding to the current stack frame. This buffer may be used to edit the source text just as you normally would with Emacs, but the current source line is marked with an arrow (the characters =>). This allows you to watch the source in one window and execute gdb commands in the other.

Within the debugging window, there are several special key sequences that can be used. They are fairly long, though, so it's not clear that you'll find them more convenient than just entering gdb commands directly. Some of the more common commands include:

C-x C-a C-s: The equivalent of a gdb step command, updating the source window appropriately
C-x C-a C-i: The equivalent of a stepi command
C-x C-a C-n: The equivalent of a next command
C-x C-a C-r: The equivalent of a continue command
C-x C-a <: The equivalent of an up command
C-x C-a >: The equivalent of a down command

If you do type in commands in the traditional manner, you can use M-p to move backwards to previously issued commands and M-n to move forward. You can also move around in the buffer using Emacs commands for searching, cursor movement, and so on. All in all, using gdb within Emacs is more convenient than using it from the shell.

In addition, you may edit the source text in the gdb source buffer; the prefix arrow will not be present in the source when it is saved.

Emacs is very easy to customize, and there are many extensions to this gdb interface that you could write yourself. You could define Emacs keys for other commonly used gdb commands or change the behavior of the source window. (For example, you could highlight all breakpoints in some fashion or provide keys to disable or clear breakpoints.)