Computer Science 010

Lecture Notes 5

Introductory Debugging and More Unix

More on compiling and linking

When you use man or xman to look at the manual page for a library function, we have already noted that you need to look for the line that tells you which header file to include. The header tells the compiler the types of the function's parameters and return value. Many functions also require an extra option on the compiler command line. That option tells the linker which library contains the compiled code for the function so that the call can be resolved. The C library is linked by default, so the functions declared in stdio.h, stdlib.h and string.h need no extra option. Look at the man page for sqrt to see an example that does need one. sqrt is in the math library, which is not linked by default. To link with the math library, add "-lm" to your command line. Put it at the end of the command line, after the name of the C file you are compiling. So, we would say:

gcc -Wall -o foo foo.c -lm

More on printing

We have seen the command lpr to print files. lpr does not make very nice printouts, however. Another Unix command that gives nicer output is enscript. Use it as follows:

enscript -r -2 -Ec foo.c

-r tells it to print the output in landscape format. -2 says to use 2 columns. -Ec says to use the formatting rules for C. This will put keywords in bold face. Then give the file(s) you want to print. It will format them and send them to the printer.

Emacs tip of the day

Try M-x font-lock-mode when editing a C file. See if you can figure out how to turn this functionality on automatically every time you start emacs.

Unix tip of the day

Calling your executable test is a BAD idea. Try renaming one of your executables to test and see if you can run it. What's going on?

Debugging Strategies

Finding the bugs in a program is a skill that you need to learn if you are going to become a successful programmer. Strategies that have been available to you thus far have been:

These are all good strategies to use, but as programs become larger and more complex, they are not always sufficient. They may also fail to explain program behavior when C code misbehaves by indexing arrays out of bounds, dereferencing dangling pointers, etc. For these types of situations, it is better to be able to control the execution of your program more carefully and interactively examine addresses and variable values. To do that, you need a debugger.

The gdb Debugger

gdb is the debugger that is most commonly used to debug C programs on Unix machines. To use a debugger, you start the debugger and then run the program inside the debugger. You can stop execution at any line of code, display the value of any expression, execute the code one line at a time, and many other things. In order to use the debugger, you must compile your program with an extra option -g:

gcc -Wall -g -o foo foo.c

This option adds information to your executable file that allows the debugger to know where variables are stored so that when you ask the debugger what value a variable has, it will know what address the variable is stored at. This information is not normally present in executable programs as it makes them bigger and has no value unless you are using a debugger.

To start the debugger, use the gdb command and give it the name of the executable program you want to debug. If you have a core file that was generated by a crash of the program, give its name (match.core in the example below) as a second argument, and gdb will load the core file so that you have the state of the program at the time of the crash. This is extremely useful. Many Unix systems always store core files with the name "core", but FreeBSD generally names the core file by appending ".core" to the executable name.

gdb match match.core

When gdb starts with a core file, it shows you what line of code caused the program to crash. You can also look at the values that variables had at the time of the crash.

A gdb Session

To demonstrate the debugger, I will show you a program that has a bug and show you how to use the debugger to find the bug. Here's the program we will work with:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main () {
  char *s = (char *) malloc (strlen ("Williams") + 1);
  strcpy (s, "Williams");
  free (s);
  s = NULL;
  printf ("%c\n", s[0]);
  exit (0);
}
   

When I run this program, I get a segmentation fault. (Do you know why?) Suppose I cannot tell why from looking at the code, so I start the debugger (the program is called bug). I am going to run gdb within Emacs. After starting Emacs, I type "M-x gdb". The message area at the bottom of Emacs says:

Run gdb (like this): gdb 

It leaves the cursor at the end of gdb. I type in the name of the executable and core file so that it now appears as follows:

Run gdb (like this): gdb bug bug.core

Emacs creates a new buffer with the following contents:

Current directory is /home/faculty/terescoj/
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
Core was generated by `bug'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libc.so.4...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0  0x8048554 in main () at bug.c:9
(gdb) 

The first few lines are just legalese that appears whenever you start gdb. Then we see a statement indicating which executable created the core file. The next line tells us what caused the program to crash, in this case a segmentation fault. We already knew this because we got a similar message from Unix when the program crashed. Next it tells us which libraries it is reading symbols from. You can safely ignore those lines. Finally it tells us where the program crashed. #0 indicates that this is the top function on the call stack. Next is the address of the instruction that crashed. You can ignore this. The remainder is important. It tells us the function that crashed, which file it is in, and which line number, in this case line 9. (gdb) is the prompt, which is now waiting for user input.

The first thing we should do is find out on which line we crashed. The last line of output tells us the line number, and we should also see another buffer in our Emacs window that contains the source code of bug.c, with line 9 beginning with "=>":

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main () {
  char *s = (char *) malloc (strlen ("Williams") + 1);
  strcpy (s, "Williams");
  free (s);
  s = NULL;
=>printf ("%c\n", s[0]);
  exit (0);
}
   

The arrow is pointing to line 9. Since we died with a segmentation fault, we should suspect an invalid pointer, most often a null pointer that we tried to dereference. There is only one variable on line 9, namely s. It is being used as an array, but we know that arrays and pointers are often interchangeable. In fact, it was declared to be a pointer. We should now be suspicious that s is null. We can confirm this by asking the debugger what value s has using the p (for print) command:

(gdb) p s

gdb responds with:

$1 = 0x0

Our suspicions have been confirmed. 0x0 is the memory address 0, or null. Now we need to figure out why it had that value. Simple enough in this case, since the immediately preceding line assigned it NULL! If only all memory errors were that easy to find.
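
Once the null pointer has been found, one defensive habit is to check a pointer before dereferencing it. The following is only a sketch of that idea; firstChar is a hypothetical helper, not part of the program above, and returning -1 for a null pointer is an arbitrary choice made for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns the first character of s,
   or -1 if s is a null pointer, instead of crashing. */
int firstChar(const char *s) {
  if (s == NULL) {
    fprintf(stderr, "firstChar: called with a null pointer\n");
    return -1;
  }
  return (unsigned char) s[0];
}
```

A check like this does not fix the real bug (using s after setting it to NULL), but it turns a crash into a message that points at the problem.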

Examining the Call Stack

Many times when you start the debugger, you may find that the program crashed in some library code. The error is still almost certainly in your code, not the library. In that case, you need to determine which of your routines called the library routine and the values of variables in that routine. This requires you to examine the call stack of routines at the time of the crash. The call stack is the list of functions that are currently active along with the current lines in each function. So if function f1 calls function f2, f2 is at the top of the stack and f1 is the next element in the stack. f1's current line is the line where it called f2. Here is a minor variation of the first program that crashes inside a library function so we need to examine the call stack to see what happened:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main () {
  char *s = (char *) malloc (strlen ("Williams") + 1);
  strcpy (s, "Williams");
  free (s);
  s = NULL;
  printf ("%s\n", s);
  exit (0);
}
   

The initial output that we get from gdb is the same in this case, except for the line that tells us where we are in the program. This time it says:

#0   0xef6a4734 in strlen ()

strlen is a library function, not our code. Our only call to strlen is on line 5, and that call returned long before the crash, so something else must have called it. Also, since the source code is not available for this library function, the second buffer does not show us the line in strlen that crashed. To figure out what is happening, we look at the call stack using the bt command:

(gdb) bt
   

The output we get is:

#0   0xef6a4734 in strlen ()
#1   0xef6da65c in _doprnt ()
#2   0xef6e37b8 in printf ()
#3   0x10ae4 in main () at free.c:9

This means that strlen was called by some function called _doprnt, also not something we wrote. _doprnt was called by printf. printf was called in main at line 9. To see the values of variables in main, we issue the up command 3 times:

(gdb) up
#1   0xef6da65c in _doprnt ()
(gdb) up
#2   0xef6e37b8 in printf ()
(gdb) up
#3   0x10ae4 in main () at free.c:9
   

Now, we can look at the code and variables in main as we did earlier and again discover that s is null. This time the crash didn't occur until we were inside library functions because we did not actually attempt to dereference s. We just passed it as a parameter to printf. The first attempt to use the pointer was inside strlen so that is where the crash happened.
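
The f1 and f2 description of the call stack given earlier can be tried out with a sketch like the following. The function bodies are invented for this example; only the names f1 and f2 come from the discussion above.

```c
#include <string.h>

/* While strlen is running, f2 is the frame just below it on the stack. */
int f2(const char *s) {
  return (int) strlen(s);
}

/* While f2 is active, f1's current line is this call to f2. */
int f1(const char *s) {
  return f2(s) + 1;
}
```

If main calls f1 and you set a breakpoint with b f2, then bt at the breakpoint should list f2 as frame #0, f1 as frame #1 and main as frame #2. The up and down commands move between those frames.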

Executing Programs within gdb

In the previous examples we could easily determine what went wrong just by looking at the core file. That is not always the case. In particular, to track down problems with dangling pointers, we often need to execute the program stopping at every line looking for an unexpected change to a variable. There are several commands in gdb that help us do that. Consider the following program:

#include <stdio.h>
#include <stdlib.h>
char *fillBuffer();
   
int main () {
  char *buffer;
  char *buffer2;
   
  buffer = fillBuffer();
  printf ("buffer = %s\n", buffer);
   
  buffer2 = fillBuffer();
  printf ("buffer = %s\n", buffer);
  printf ("buffer2 = %s\n", buffer2);
  exit (0);
}
   
char *fillBuffer () {
  char line[1000];
   
  printf ("Enter a line: ");
  gets (line);
  return line;
}
   

When we run this program, we get the following result:

-> fillbuffer
Enter a line: abc
buffer = abc
Enter a line: def
buffer = def
buffer2 = def

buffer is initially set correctly, but at the end of the program both variables have the same value! There is no core file because the program did not crash. It just did not do what we wanted. Assuming that we cannot figure this out from looking at the source code, we start the debugger:

-> gdb fillbuffer

This time we do not pass a core file on the command line. When gdb starts up, we only see the legalese. What I want to do is step through the main program to find out where the value of buffer changes. First I set a breakpoint at the beginning of main:

(gdb) b main
Breakpoint 1 at 0x10aac: file fillbuffer.c, line 9.

Now I can run the program and the debugger will stop when it gets to the beginning of main:

(gdb) r
Starting program: /home/cs-students/jcool/fillbuffer
warning: Unable to find dynamic linker breakpoint function.
warning: GDB will be unable to debug shared library initializers
warning: and track explicitly loaded dynamic code.
   
Breakpoint 1, main () at fillbuffer.c:9
(gdb)

This tells you which program it is running. You can safely ignore the following warning messages. When it reaches the breakpoint, it outputs a message indicating where it is and a prompt. It also updates the other Emacs buffer to show me the line of code at which it is stopped. Now, I want to execute my program one line at a time until after I assign the value to buffer. Then after each statement execution, I will print the value of buffer to find out where it changes. Note that when gdb points to a statement from my program, it is the statement it is going to execute next. Here we go:

(gdb) n
Enter a line: abc    <-- Now the pointer points to the printf line
(gdb) p buffer
$1 = 0xeffff440 "abc"
(gdb) n
buffer = abc         <-- Now the pointer points to the 2nd call to fillBuffer
(gdb) p buffer
$2 = 0xeffff440 "abc"
(gdb) n
Enter a line: def    <-- Now the pointer points to the 2nd printf line
(gdb) p buffer
$3 = 0xeffff440 "def"
(gdb)
   

The value of buffer changed, but we did not assign to buffer! What happened? It appears that something went wrong on the second call to fillBuffer. Suppose I still don't understand the problem. I will run it again inside the debugger, only this time I will single-step through the second call to fillBuffer to find out where buffer changes value. To do this, I will use the s command to step into fillBuffer. I will use n to step through fillBuffer. After each line of code I will go up to the main program and print buffer:

(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/cs-students/jcool/fillbuffer
warning: Unable to find dynamic linker breakpoint function.
warning: GDB will be unable to debug shared library initializers
warning: and track explicitly loaded dynamic code.
   
Breakpoint 1, main () at fillbuffer.c:9    <-- The pointer points to the first call of fillBuffer
(gdb) n
Enter a line: abc    <-- Now the pointer points to the printf line
(gdb) n
buffer = abc         <-- Now the pointer points to the 2nd call to fillBuffer
(gdb) s
fillBuffer () at fillbuffer.c:19   <-- Now the pointer points to the printf inside fillBuffer
(gdb) n              <-- Now the pointer points to the gets call
(gdb) up             <-- Now the pointer points to the 2nd call to fillBuffer
#1  0x10ad4 in main () at fillbuffer.c:11
(gdb) p buffer
$4 = 0xeffff440 "abc"
(gdb) n
Enter a line: def    <-- Now the pointer points to the return statement
(gdb) up             <-- Now the pointer points to the 2nd call to fillBuffer
#1  0x10ad4 in main () at fillbuffer.c:11
(gdb) p buffer
$4 = 0xeffff440 "def"
   

This means that the call to gets changed the string that buffer points to! The only way this could happen is if buffer and line are using the same memory. When we print buffer, it tells us what address it is using. Its address is 0xeffff440. When working with strings, gdb displays the address and then the string value. Now, let's see what line is using:

(gdb) down
#0  fillBuffer () at fillbuffer.c:21
(gdb) p &line
$5 = (char (*)[1000]) 0xeffff440
   

To print the address of line, I need to use the & address operator. This first shows me the type of the expression and then the address. In fact, buffer and line are using the same memory! How did that happen? We did something we should not have done. We returned the address of a local array variable from a function. The memory for a local variable is released automatically when the function returns, but we kept its address in buffer. On the next call, fillBuffer's local array happened to occupy that same memory. When the value in the memory changed, the value seen through our dangling pointer also changed. We should not have ignored the compiler warning message!
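
One common way to avoid this dangling pointer, sketched here under the assumption that the caller is willing to free the result, is to allocate the line on the heap so that it outlives the function call. fillBufferFixed and LINE_SIZE are names invented for this sketch, and fgets stands in for gets so that the read cannot overrun the array.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_SIZE 1000

/* Reads one line from in into newly allocated memory.
   Returns NULL on allocation failure or end of input.
   The caller owns the returned memory and must free it. */
char *fillBufferFixed(FILE *in) {
  char *line = malloc(LINE_SIZE);
  if (line == NULL) {
    return NULL;
  }
  if (fgets(line, LINE_SIZE, in) == NULL) {
    free(line);
    return NULL;
  }
  line[strcspn(line, "\n")] = '\0';  /* drop the trailing newline, if any */
  return line;
}
```

With this version, calling fillBufferFixed(stdin) twice gives buffer and buffer2 separate blocks of memory, so reading the second line no longer changes the first.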

Getting Help

gdb has extensive on-line help. To get help, just type "help". gdb will list categories of commands that you can get help on:

(gdb) help
List of classes of commands:
   
running -- Running the program
stack -- Examining the stack
data -- Examining data
breakpoints -- Making programs stop at certain points
files -- Specifying and examining files
status -- Status inquiries
support -- Support facilities
user-defined -- User-defined commands
aliases -- Aliases of other commands
obscure -- Obscure features
internals -- Maintenance commands
   
Type "help" followed by a class name for a list of commands in that class.
Type "help" followed by command name for full documentation.
Command name abbreviations are allowed if unambiguous.

The first five categories contain the most useful commands so you should limit your explorations to these areas. If you type "help running", you will see a list of 24 commands to control how your program runs, including the run, step, and next commands described above. The list includes a one line description of the command. If you see a command that looks useful for what you want to do, type "help <command>", substituting in the name of the command you are interested in, and you will get more detailed help on that command.

There is also a gdb entry in Emacs info that will give you a more tutorial introduction to gdb.

gdb Command Summary

l                        List the program near the current line
l <line number>          List the program near the given line number of the current file
l <function name>        List the first few lines of the given function
p <expression>           Print the value of an expression
bt                       Backtrace: print the functions on the call stack, identifying the current line in each function
up                       Move to the current line in the calling function
down                     Move to the current line in the called function
b <line number>          Set a breakpoint at the given line number of the current file
b <function name>        Set a breakpoint at the beginning of the given function
d <breakpoint number>    Delete the breakpoint with the given number
r                        Run the program from the beginning
n                        Execute the next line without going into functions on function calls
s                        Step to the next line, going into functions on function calls
c                        Continue execution from the current breakpoint
help                     On-line help
q                        Quit gdb
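
To practice the commands in this summary, it helps to have a small function with a loop and a variable that changes on every pass. The following is only a suggested practice target; sumTo is a hypothetical function, not part of any program in these notes.

```c
/* Hypothetical practice function: returns 1 + 2 + ... + n. */
int sumTo(int n) {
  int total = 0;
  int i;
  for (i = 1; i <= n; i++) {
    total += i;  /* set a breakpoint on this line, then try p total, p i and c */
  }
  return total;
}
```

Call it from main, compile with -g, and set a breakpoint inside the loop with b and the line number. Each c should run one more pass of the loop, and p total should show the running sum growing.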

The ddd Debugger

While gdb is a very powerful debugger, debuggers with a graphical user interface are usually easier to use and can convey more information. We will use ddd, the Data Display Debugger. gdb works at any command line, while ddd requires an X server. ddd actually is built on top of gdb, so all gdb functionality remains, but a convenient interface and the ability to display variables graphically make it that much better.

To start ddd from the command line for our "bug" program:

-> ddd bug

This will bring up a number of windows, the most important of which are the main window and the toolbox.

The main window looks like this:

[Screenshot: ddd main window]

And the toolbox looks like this:

[Screenshot: ddd toolbox]

You can use any of the gdb commands at the gdb prompt in the main window, or use the menus and buttons to perform those same functions and more. Try it out!