Computer Science 010

Lecture Notes 2

Some Advanced Unix

Before getting into C, here are some more advanced Unix commands that you will probably find useful.

Process Control

In doing the practice assignments, some of you may have created clocks using a command such as:

->xclock

This creates a clock but does not give you a new shell prompt. To get a new shell prompt, you need to add an & to the end of the command line:

->xclock &

What is happening? With the &, the job runs in the background. Without the &, the job runs in the foreground, and the shell waits for the command to complete before giving you a new command prompt.

So, what can you do if you forget the &? You can pause the clock and then tell it to run in the background. To do this, type C-z at the shell. You should then see a line like:

[3]+   Stopped    xclock

The number is the job number and is likely to be different. Now you should have a shell prompt. Your clock, however, is probably displayed but is not running. Use the bg command to run the stopped job in the background. Now the clock should continue running and you should get a shell prompt back:

-> bg
[3]+ xclock &
->

Your other option is to kill the program. From the shell you can type C-c. This will work for most, but not all, commands. This will remove xclock and give you back a shell prompt.

If C-c does not work, create a new shell window. (In the Mac lab with eXodus, choose the host again from the Clients menu, and from the Unix lab, point the mouse at the root, click the left mouse button, and select X terminal.) Now use ps -a to find the process number. Then use kill -9 <process-number> to kill the program. This should give you the shell prompt back in your original window.

-> ps -a
  PID  TT  STAT      TIME COMMAND
16576  p3  I      0:00.08 xterm
16839  p3  S      0:00.12 xman
16394  p3  Ss     0:00.17 tcsh
16488  p3  S      0:03.40 emacs
16840  p3  R+     0:00.00 ps -a
-> kill -9 16840
->

From the Unix lab, another way to kill the program is to point at the root with the mouse and click the right mouse button. Select "kill client" from the menu and then click on the clock.

Try these out!

Filename Completion

Unix (your shell, actually) provides filename completion so that it is often unnecessary to type entire filenames when specifying them at the Unix prompt. To use it, type the beginning of a filename and then hit the tab key. Unix will expand the filename as much as possible. If more than one filename has the same prefix, it will only expand it until the characters differ. For example, if your directory contains foo and foo.c and you type "ls f<tab>", Unix will replace the tab with "oo" and then beep at you. If it can complete the filename uniquely, it will do so but will not beep.

Another related feature is the '*' wildcard character. If you put '*' in a filename, it will match 0 or more characters. Thus, "ls f*" will list all the filenames that begin with f.

Within Emacs, filename completion is done with the tab or the space key. So, if you are trying to visit a file, you can type a prefix of the filename followed by tab or the space key. Emacs will complete the filename as much as possible and show you all the matching filenames in a buffer.

Again, give these a try so you become comfortable with them.

Cancelling commands in Emacs

Sometimes you may want to stop a command that you have started in Emacs. For example, you might have started ispell but then decide you want to quit. The command to use is C-g. This will cancel whatever command is currently running. If no command is running, it will just beep.

Undoing commands in Emacs

Given an editor as powerful and complex as Emacs, you are bound to type errant commands from time to time. These may change just a few characters, or they may change your buffer completely. Rather than trying to figure out exactly what happened and how, it is often better to undo your previous command. The command is C-x u. You can use it repeatedly to undo earlier commands.

Copy-and-paste in X windows

A useful way to move information from one X window to another is using X's copy-and-paste commands. Most applications support these, but not all of them. To copy, point the mouse at the beginning of the text to copy and push down the left mouse button. Drag to the end of the text to copy and release the mouse button. To paste, move to the position where you want to insert the copy and click the middle mouse button.

Getting Started with C

C originated in the early 1970s at Bell Labs as a high-level language they could use to create Unix. Prior to C, most operating systems were written in assembly language. C has high-level syntax but offers constructs that allow programmers to do very low-level programming much like assembly language. While this offers a great deal of power to the programmer, it also makes it extremely easy to make mistakes that one cannot make in many other languages. You have all the tools available to shoot yourself in the foot, and any beginning C programmer will do just that from time to time (as will the experienced C programmer!). C quickly became popular, just as Unix did, because it was given freely to colleges and universities.

Important differences between C and Java include:

C is not object-oriented; there is no class construct or inheritance
C does not have garbage collection. You must explicitly free all memory that you allocate.
C does not do as much error checking for you. For example, it does not check that subscripts are within array bounds before dereferencing an array.

There are many smaller differences, both syntactic and semantic, but these are the major issues that you must overcome to become a proficient C programmer. The libraries are extensive but quite different than those that come with Java. For example, GUI programming can be done using the X library, but this is even more complicated than AWT or Swing.

C was created in an era in which machines were slow and had little memory. To get maximum speed out of programs, the C runtime system does minimal error checking. The reasoning was that programmers could insert error checking code at just those places in which it was needed. Of course, the end result is that most C programs do insufficient error checking and can crash in unexpected ways.

On the bright side, understanding a C program can give you a better understanding of what is really happening when a program in any language is compiled and run. Many of these details are hidden by other languages, but understanding what is happening in C will make you a better programmer in other, more modern languages.

Since C and Java have so much in common, we will not spend a great deal of time discussing their similarities, but will instead focus on their differences. The goals for today are to discuss the basic data types and statements, functions, and how to do input and output. By the end of today, you should be able to read, write, compile and execute simple C programs.

Finally, a note for those of you who know Pascal but not Java. C is quite similar to Pascal in programming style since they are both imperative programming languages. You will need to learn new syntax for statements and declarations. You also need to beware (as do Java programmers) that C provides much less error checking than Java.

Basic Data Types

The basic data types in C are:

int: 32 bit integer

short: 16 bit integer

long: In FreeBSD, this is the same as int, a 32 bit integer

char: An 8 bit character. This can hold any ASCII character. It can also be used as an 8 bit integer.

There are a few points worth specific mention here. The size of the integer types listed above is machine dependent. The sizes listed are typical of 32-bit Unix systems such as FreeBSD and Solaris. The fact that there is no standard size for these types makes C programs difficult to port from one machine type to another.

Booleans

Another interesting point is that there is no boolean type in C. Since C still has if-statements and while-statements, you would expect there to be one, but the expressions in these statements are integers! A value of 0 is equivalent to false; a non-zero value is equivalent to true. It is generally a good idea to define a type to represent booleans with two constant values TRUE and FALSE. This can be done by putting the following three lines at the beginning of your C program:

typedef int bool;
#define TRUE 1
#define FALSE 0

The first of these lines creates a new type named bool that behaves exactly the same as int. The reason to do this is to make your programs more understandable. If you are using a variable as a boolean, it would be good to declare it as such so that you (and others) can better understand your code. The next two lines each define a constant. #define is the preprocessor directive for defining constants. The word following the #define is the symbol being defined, while the expression after that is the value it is given. You can now use TRUE and FALSE as symbols even though they will be interpreted as integers, as follows:

bool done = FALSE;
while (!done) {
    <do more processing>
}

Unsigned integers

It is possible to restrict the range of values that an integer takes on to be only non-negative integers. You do this by prefacing the type name with the keyword unsigned:

unsigned short counter;

The declaration above gives you a 16 bit integer that can take on the values from 0 to 2¹⁶-1. If the unsigned keyword is missing, the value can range from -2¹⁵ to 2¹⁵-1.

As an example of C's lack of runtime checking, the unsigned keyword is really only useful as documentation. Neither the C compiler nor the C runtime system prevents you from assigning negative numbers to variables declared to be unsigned; the value is silently converted to a large positive number instead.

Enums

The enum keyword allows you to define enumerated types. These exist in Pascal but are unfortunately missing from Java. When you define an enumerated type, you provide identifiers representing the values of the type as follows:

enum Systems {
    Windows,
    Macintosh,
    Unix
};

The only thing you can do with enums is to assign values to variables and to compare variables, as in:

enum Systems best;
best = Unix;

As with all the types we have seen so far, C really treats these as integers. Thus, you could assign them to integer variables or assign an integer to an enum variable. These are generally not sensible things to do, but C will not complain if you try.

Arrays

Arrays are basically the same as in Java with a few differences:

You can declare the size of an array when you declare the array as follows:
```
int scores[10];
```
This gives you an array of 10 integers indexed from 0 to 9.
There is no equivalent of the .length variable that Java's arrays have. As a result, there is no way to ask an array how big it is.
Since an array does not know how big it is, it cannot determine if a value you use to index into an array is within the size of the array. Even more silly, C does not even warn you if you use a negative index even though the lowest bound for any array is 0. So, what happens if you use an array index that is out of bounds? Some random piece of memory will be returned or modified (depending on whether the bad index is on the left or right of an assignment statement). What should you do about it? If you have any doubt at all about whether an index is valid, you should add code to compare it to the array bounds.

Structs

Structs are as close as C gets to classes (not very close). A struct allows you to group together several data declarations so they can be treated as a group. Thus, a struct is like a class that has only variables, not methods, as follows:

struct money {
    int dollars;
    int cents;
};

Basic Statements

The syntax for statements is the same in C as it is in Java. C has:

Assignment statements:
```
a = b;
```
If statements:
```
if (a == b) {
    ...
}
```
While statements:
```
while (a == b) {
    ...
}
```
Do statements:
```
do {
    ...
} while (a == b);
```
For statements:
```
for (i = 0; i < 10; i++) {
    ...
}
```

Switch statements:

switch (i) {
    case 0: 
        <do something>
        break;
    case 1:
        <do something else>
        break;
    default:
        <do default thing>
}

As mentioned earlier, the conditions in these statements are integer expressions. A zero value is false while a non-zero value is true.

As in Java, an assignment in C is an expression whose value is the value being assigned. This allows an assignment to be used on the right hand side of another assignment! This may sound odd but it allows for the following convenient shorthand:

a = b = 0;

This initializes both a and b to 0.

Unfortunately, the combination of integer expressions and assignment statements being acceptable expressions leads to the following extremely common C programming error:

if (a = 1) {
    ...
}

Note that the expression contains an assignment operator rather than an equality operator. The result is that 1 is assigned to a. This assignment expression returns the value assigned, in this case 1. 1 is equivalent to true so the body of the if-statement is always executed. This is almost certainly not what the programmer intended! Beware of this very common error and be sure to use == inside expressions rather than =.

Functions

Another major difference between C and Java is that C does not have methods. Instead, statements are encapsulated into named units called functions. A function declaration is quite similar to a method declaration except that it does not occur within a class definition.

Functions are invoked in a manner similar to methods except that they are not sent to an object. Thus, the object.method syntax is replaced by a simple function name. Since functions are not sent to an object, there is no this identifier.

For example, here is a function to compute the minimum of two integer values.

int min (int i1, int i2) {
    int minvalue;
   
    if (i1 <= i2) {
        minvalue = i1;
    }
    else {
        minvalue = i2;
    }
   
    return minvalue;
}

This function would be called as follows:

min_int = min (a, b);

Prototypes

It is generally a good idea to declare all of your functions at the top of your files and to define them later. C requires that all functions be declared before they are used. A function declaration is also known as a prototype. It simply defines the return type of the function, the function name, and the types and names of the parameters. This single line ends in a ; instead of being followed by the function definition. Thus, the prototype for min is:

int min (int i1, int i2);

I/O

In order to do input and output, you must add the following line to the beginning of your program:

#include <stdio.h>

In a later class we will describe exactly what this means. For now, it is sufficient to know that it includes in your program the definitions of the printf and scanf functions required to do I/O.

printf

Both printf and scanf take similar arguments. The first argument is a formatting string. A formatting string is a string with literal characters, escape characters, and conversion specifications. Let's look at printf first:

printf ("Happy 21st Century!\n");

In this case, the string contains only literals and the escape sequence \n. \n represents a newline. Executing this printf statement results in the following output:

Happy 21st Century!

More commonly, you will want to output the values of variables or expressions. To do this, you include conversion specifications in the formatting string and extra arguments following the string that represent the expressions to be output. A conversion specification is a % followed by a character that identifies the type of the expression being output. For example, here is another way to produce similar output:

year = 2001;
printf ("Happy %d!\n",year);

%d indicates that an integer will be output. year is the integer expression to output. It is possible to put more than one conversion specification in a printf as follows:

year = 2001;
punctuation = '!';
printf ("Happy %d%c\n",year, punctuation);

%c is the specification for a character. Specifications for the other types we have seen are:

Specification   Type
%c              char
%hd             short
%d              int
%ld             long

C does not require that the conversion specification match the type of the expression, or even that the number of conversion specifications match the number of arguments, so be careful! (The -Wall option to gcc, described below, does warn about many such mismatches.)

It is also possible to be precise about the size of the output. You can look in the book for details.

scanf

scanf is similar syntactically to printf although its purpose is to read input from the keyboard. Again, the first parameter to scanf is a formatting string. The following arguments are variables to which the input values should be assigned. Also, note that the variable names must be preceded by & when using the integer and character types already discussed. For example,

scanf ("%d/%d/%d", &month, &day, &year);

This expects the user to enter a date such as 1/4/2001. The % sequences define the type to read in. The / characters are character literals that scanf will match. If the characters typed by the user cannot be interpreted as the appropriate type, scanf will stop processing input and will return with only some of the variables being set.

getchar

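getchar is another function that can be used to get keyboard input one character at a time. This function takes no arguments and returns a single character. Strictly speaking, it returns the character as an int so that it can also return the special value EOF when there is no more input: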

c = getchar();

The character returned might be whitespace (tab, newline, blank) or any other character. It is most useful when the input has little structure or when you are expecting a single character input.

Sample Program

/* This is a program to draw a happy face or sad face on the screen.
   It takes input from the user.  h draws a happy face.  s draws a sad face.  
   The program quits when the user enters q.

   Author:  Barbara Lerner
   Date:    January 4, 2000
*/

/* stdio.h defines printf and getchar */
#include <stdio.h>

/* An enumerated type defining face types. */
typedef enum {
  happy,
  sad
} mood;

/* A prototype of the function to draw faces. */
void face (mood which);

int main() {
  /* The character indicating what type of face to draw.  This is an int
     because getchar returns an int so that it can also report EOF. */
  int c;

  /* Character we read into until we find the end of the line. */
  int c2;

  /* Loop until the user enters q to quit the program or the input ends. */
  do {
    /* Prompt the user for the next input and get their input. */
    printf ("Enter h for a happy face, s for a sad face, and q to quit: ");
    c = getchar();

    /* Read the remaining characters until finding the newline character
       or the end of the input.  If the user entered an empty line, c is
       already the newline and there is nothing more to skip. */
    c2 = c;
    while (c2 != '\n' && c2 != EOF) {
      c2 = getchar();
    }

    /* Call the face function with the appropriate enum value based 
       upon the character input by the user.  */
    switch (c) {
      case 'h':
        face (happy);
        break;
      case 's':
        face (sad);
        break;

      /*  Do nothing on q or at the end of the input.  We will exit the
          loop and quit the program. */
      case 'q':
      case EOF:
        break;

      /* No other user input has meaning so we output an error message. */
      default:
        printf ("I don't know how to draw that face!\n");
    }
  } while (c != 'q' && c != EOF);

  /* Return an exit status indicating successful completion. */
  return 0;
}

/* face draws a happy face or a sad face on the screen.
   Parameters:
     which - which type of face to draw.  Only happy and sad are valid values.
   
   Author:  Barbara Lerner
   Date:    Jan. 4, 2000
*/
void face (mood which) {
  /* Determine which type of face to draw and draw it. */
  switch (which) {
    case happy:
      printf (":-)\n");
      break;
    case sad:
      printf (":-(\n");
      break;
    default:
      /* If this function is used correctly, we should not get here. */
      printf ("face: unknown face type\n");
  }
}

Compiling and Running C Programs

To compile a C program, use the following command line:

gcc -Wall -o <output-filename> <source-filename>

For example, if you place the above program in a file called face.c, you would compile it using:

gcc -Wall -o face face.c

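This will create a file called face in your directory. If your ls is set up to mark file types (as ls -F does), there will be a * at the end of face when you look at it with ls. This just indicates that the file is executable and is not actually part of the filename. To execute the file, just type the output-filename (in this case face) at a shell prompt. If the shell cannot find the command, the current directory is not on your search path, and you should type ./face instead. The usual form is: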

-> face