Computer Science 010

Lecture Notes 8

C Preprocessor and Make

C Preprocessor

Some of the lines that we have placed in our C files are not actually C code. Instead, they are commands (called directives) that are interpreted by the C preprocessor at the beginning of compilation. All preprocessor commands begin with a # and occupy a single line (a line that ends with a backslash is continued onto the next line). So far, we have seen two preprocessor commands:

#include - used to include header files
#define - used to define constants

The C preprocessor runs automatically at the beginning of compilation. It reads the C file looking for preprocessor commands. For each command it finds, it removes the command itself, so that the rest of the compiler never sees it, and performs some transformation on the text that it passes on to the rest of the compiler. So, here is what the two commands we have used do:

#include replaces the #include statement with the contents of the file being included. After including the text, it then continues its preprocessing with the newly-included file. Thus, header files can include preprocessor commands.
#define looks through the remainder of the file and replaces all occurrences of the first word with the remainder of the line. Thus,
```
#define TRUE 1
```
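replaces all occurrences of the identifier TRUE with the value 1. Only whole tokens are replaced: TRUE inside a string literal such as "TRUE", or as part of a longer name such as TRUE_COUNT, is left alone.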

There are some other useful preprocessor commands that provide conditional compilation. C is not as portable a language as Java. Nevertheless, we would like to write one C program and be able to compile and run it anywhere. To do this, though, we might need to take into account the features of a particular architecture or operating system, so we might not be able to use exactly the same source code everywhere. Conditional compilation lets us handle those differences within a single source file. We can surround a chunk of code with conditional compilation directives, and the chunk is included in the text sent from the preprocessor to the rest of the compiler only if the condition is true. Of course, the condition must be something that can be evaluated at compile time. There are 3 conditional compilation preprocessor commands:

#if <expression> ... #elif <expression> ... #else ... #endif (the #elif and #else parts are optional)
#ifdef <identifier> ... #endif
#ifndef <identifier> ... #endif

#if is useful for compiling debugging statements out and easily compiling them back in, as follows:

```
#define DEBUG 1
....
#if DEBUG
   printf ("Some debugging output.\n");
#endif
```

If I want to disable all debugging output protected by similar statements, I can change the #define statement so that DEBUG is 0 and recompile. None of my debugging printfs will get compiled. This works because #if tests the value of DEBUG; #ifdef DEBUG would still be true when DEBUG is defined as 0.

One particular expression we can use in #if is:

#if defined (<identifier>)

This will include the enclosed source if and only if the identifier has been defined. It does not matter what value it is defined to. This is particularly useful in conjunction with the -D compiler option, which defines an identifier from the command line: with gcc, -DDEBUG has the same effect as #define DEBUG 1, and -DDEBUG=value gives it a specific value. So, now I would leave out the #define DEBUG 1 line. Instead, if I wanted debugging, I would compile with the -DDEBUG compiler option. If I don't want debugging output, I would just omit that option.

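#ifdef <identifier> is simply shorthand for #if defined (<identifier>). Normally, you would use #ifdef unless you wanted to have #elif clauses. Traditional C has no #elifdef (the C23 standard finally added #elifdef and #elifndef), so for a chain of such tests you need to write #if defined (...) followed by #elif defined (...). #ifndef is just the negation of #ifdef.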

Header files can #include other header files. If we are not careful, the same header file can easily end up being included more than once in the same C file. When that happens, the program may fail to compile, because the compiler sees some definitions (a struct or a function definition, for example) more than once. To avoid this, header files typically use the following format:

```
#ifndef _FOO_H
#define _FOO_H

< contents of header>

#endif
```

The convention is that each header file defines a macro that represents that file and is extremely unlikely to be defined elsewhere. This is done through naming conventions: the macro's name is simply the name of the header file, all capitalized, preceded by _ and with the . replaced by _. (Strictly speaking, identifiers that begin with an underscore followed by an uppercase letter are reserved for the compiler and standard library, so many projects drop the leading underscore and write FOO_H instead.) If the file is #include'd multiple times, its contents are textually included only once. On the later inclusions, the contents are skipped because the header file's macro has already been defined.

With everything we know about C now, we should be able to understand most header files, so we will take a look at string.h. Recall that this file lives in the /usr/include directory. The file is quite large, so it is not included here.

Make

A large C program might contain 100 or more C and header files. Whenever a header file changes, you should recompile all the C files that use that header file and then relink your program. Keeping track of what has changed, and of which files include which other files, is difficult to do by hand. Fortunately, there is a program called make that can help us deal with large C programs (or pretty much anything else!).

A Makefile consists of a collection of definitions followed by a collection of rules. The definitions define variables that are typically dependent on where you are building the program. They may define such things as the directories where files should be located.

The rules define how the program should be compiled. A rule consists of a target, the names of the files it depends upon, and one or more Unix commands to build the target. When you run make, it reads the Makefile and builds the target you name on the command line or, if you name none, the first target in the Makefile. It looks for a file with the target's name and compares the date and time at which that file was last modified with the date and time at which each file it depends on was modified. If the target is older than any of the files it depends on, the commands associated with the rule are executed. Usually, those commands recreate the target. Here's an example:

```
match.o:  match.c
	gcc -c -g -Wall match.c

match:  match.o
	gcc -o match match.o
```

The first rule says that the object code file match.o depends on the source file match.c. If the source file is changed, its modification time will be more recent than that of match.o. If we run make, make will notice this and recompile the file for us using the gcc command on the line below the rule. If the object file is newer than the source file, then the source file has not changed since the last time we compiled it, so there is no need to compile it again. In that case, the gcc command will be skipped.

The second rule says that the file match depends on the file match.o. This means that if the object code is newer than the executable program, we should relink the executable program. In this case the second command will be used to link match.

Now, suppose one of the files is missing. If a target is missing but all of the files it depends on are present, the command(s) associated with the rule are executed. If one of the files in the dependency list is missing, make looks for another rule in which that dependency file appears as a target. In that case, it first rebuilds the dependency file and then rebuilds the original target. So, suppose the files match.c and match exist, but match.o does not. If we want to rebuild match, we can use the command:

```
make match
```

Make tries to compare the modification date of match with that of match.o. match.o does not exist so it looks for a rule with match.o as a target. It finds one. All the dependency files exist for match.o. So, it issues the following commands:

```
gcc -c -g -Wall match.c
gcc -o match match.o
```

As a result, it creates both match.o and match. If an error occurs during the execution of any command, make stops at that point. So, if match.c had compilation errors, make would not attempt to link match.

This all becomes more useful when we have large programs. We can put more than one file in a dependency list. The rules for .o targets typically list the corresponding .c file and any .h files #include'd in that .c file. The rule for an executable target typically lists all the .o files that need to be linked together to create the executable program.

One extremely important rule about using makefiles is that the commands associated with a rule must be on lines that start with a TAB character. No other whitespace will do. If you indent those lines with spaces instead, make will report "missing separator. Stop." when it tries to read your makefile.

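Also, note that by default make looks in the current directory for a makefile named makefile or Makefile (GNU make tries GNUmakefile first). If you use one of those names, you do not need to tell make where the makefile is; otherwise, give its name with the -f option, as in make -f other.mk.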

Now, let's explore a use for the definitions. It is generally a good idea to define some identifiers that indicate what compiler to use and what the compiler arguments should be. Then the rules just use those definitions. Here's how that would look:

```
# This defines an identifier whose value is the name of the C compiler
CC = gcc

# This identifier holds the flags to compile with
CFLAGS = -c -g -Wall

# This defines the linker program
LD = gcc

# These are my linker arguments.  In this case, there are none.
LDFLAGS =

# Here are rules that use the identifiers.  The syntax $(CC) means use the value of the
# CC identifier.
match.o:  match.c
	$(CC) $(CFLAGS) match.c

match:  match.o
	$(LD) $(LDFLAGS) -o match match.o
```

The advantage of defining identifiers pays off particularly in large makefiles: changing the single definition of CFLAGS changes the flags used by every rule that refers to it.

Makefiles can be extremely complicated, but what you see here can go quite far.