Computer Science 010

Lecture Notes 4

Memory Management

Emacs Tips

Here are some more commands you will find useful in Emacs. Practice them to learn them.

Searching

Emacs provides an incremental search command that is very useful. You begin the search using C-s. In the message window at the bottom of the screen, you will see:

I-search:

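The cursor will follow this prompt. Here you type the string you are looking for. The search is done incrementally as you enter each character. For example, suppose I want to search for the word "regions". I type "r" (with no carriage return), and Emacs immediately moves the cursor in the main buffer to the next r. When I then type an "e", Emacs moves to the next occurrence of "re". After typing characters, one of 3 things will happen:

1. The cursor is at the text you were looking for. Type carriage return to end the search and leave the cursor there.
2. The string appears in the buffer, but not at the occurrence you want. Type C-s again to move to the next occurrence.
3. The string does not appear after the cursor. The prompt changes to "Failing I-search:", and you can remove characters with Backspace or cancel the search.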

If at any point you want to abort the search, use the cancel command (C-g). This puts the cursor back where you started the search. (If some of the characters you typed have no match, the first C-g only removes those characters; a second C-g cancels the search.) Also, if you type only lowercase characters in your search string, they will match both uppercase and lowercase characters in the text. If you include any uppercase character, however, Emacs assumes case is significant and the whole search becomes case-sensitive, so each letter matches only its own case.
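The case rule can be modeled in a few lines of C. This is a simplified sketch rather than Emacs's actual implementation (in Emacs the behavior is governed by the variables case-fold-search and search-upper-case); the names hasUpper and matchesAt are hypothetical, and matchesAt only checks whether the pattern matches at one position of the text instead of searching forward.

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if s contains at least one uppercase letter. */
int hasUpper (const char *s) {
  for (; *s != '\0'; ++s)
    if (isupper ((unsigned char) *s))
      return 1;
  return 0;
}

/* Return 1 if pattern matches at the start of text, using the isearch
   case rule: an all-lowercase pattern matches either case, while a
   pattern containing any uppercase letter must match case exactly. */
int matchesAt (const char *text, const char *pattern) {
  int foldCase = !hasUpper (pattern);
  size_t i;
  for (i = 0; pattern[i] != '\0'; ++i) {
    unsigned char t = (unsigned char) text[i];
    unsigned char p = (unsigned char) pattern[i];
    if (t == '\0')
      return 0;                        /* text ended before the pattern */
    if (foldCase ? tolower (t) != p : t != p)
      return 0;                        /* this character does not match */
  }
  return 1;
}
```

For example, matchesAt ("Regions", "reg") succeeds because the pattern is all lowercase, while matchesAt ("regions", "Reg") fails because the uppercase R makes the comparison case-sensitive.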

There is a similar search that goes backward (toward the beginning of the buffer) from your current location. It is bound to C-r and uses the prompt

I-search backward:

Replacing

Replacement is related to searching. There are two replacement commands. The first is global, while the second is incremental. For each command, you give the string you want to replace, hit carriage return, and then give the string to replace it with. The global replacement will replace all occurrences of the first string with the second string and then tell you how many replacements were done. The incremental replacement will show you each occurrence and ask you if you want to change that occurrence or not.

To use global replacement, type "M-x replace-string". You will get the prompt:

Replace string:

Type the existing string to replace followed by carriage return. Now you will get the prompt:

Replace string <old-string> with:

where <old-string> is the first string you typed. Enter the new string followed by carriage return. You will then see the message

Replaced <num> occurrences.

where <num> is the number of strings replaced.

Incremental replace is actually called query replace and can be executed using M-%. The first prompt is:

Query replace:

Type the existing string to replace followed by carriage return. Now you will get the prompt:

Query replace <old-string> with:

where <old-string> is the first string you typed. Enter the new string followed by carriage return. Emacs will move the cursor in the buffer to the next occurrence of the first string. You will then see the prompt:

Query replacing <old-string> with <new-string>: (? for help)

Type y to replace that occurrence and move to the next one. Type n to leave that occurrence unchanged and move to the next one. If you are certain that you want to replace all remaining occurrences of the old string, type ! to do them all at once. Use the cancel command (C-g) to quit query replace. Canceling does not undo any replacements already made, and it leaves the cursor at the last occurrence Emacs visited, not where you started the replacement. If you do not cancel, when you reach the end of the buffer you will be told how many occurrences were changed, as before:

Replaced <num> occurrences.

where <num> is the number of strings replaced.

For both global and query replacement, only the occurrences between the cursor and the end of the buffer are changed. If you want to change every occurrence in the buffer, first move to the start of the buffer (M-<). Also, global replacement is somewhat hazardous, especially for short strings, because it also matches parts of words: replacing "the" also changes the "the" inside "other" and "then". Before using global replacement, think carefully about whether it might change text you don't want changed.
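The two hazards above (only text after the cursor is affected, and matches inside words count) can be seen in a small counting function. This is a hypothetical sketch, not Emacs code: countFrom treats start as the cursor position, counts non-overlapping matches the way a left-to-right replacement would visit them, and matches case exactly (it ignores Emacs's case folding).

```c
#include <string.h>

/* Count non-overlapping occurrences of word in text, starting at
   offset start (the "cursor"; start must not exceed strlen (text)).
   Like replace-string, this is a plain substring match: word
   boundaries are ignored. */
int countFrom (const char *text, size_t start, const char *word) {
  size_t len = strlen (word);
  const char *p = text + start;
  int count = 0;
  if (len == 0)
    return 0;                  /* an empty word would match forever */
  while ((p = strstr (p, word)) != NULL) {
    ++count;
    p += len;                  /* continue after this match */
  }
  return count;
}
```

In "the other theme", searching for "the" from the start finds 3 matches (one of them inside "other"), while starting at offset 4 finds only 2.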

Regions

The cursor is the dark rectangle visible when editing in a buffer. It is possible to set a mark at the current cursor position. When you move the cursor, the mark stays at the old cursor location. The area between the mark and the cursor is called a region. There are numerous operations that can be performed on a region. The most important of these are described in this section.

One thing you can use this for is to quickly move between two points in your buffer. Set the mark at one point using C-SPACE, then go to the place you want to edit. When you want to jump back to the saved point, type C-x C-x. This swaps the mark and the cursor: your cursor will be at the previously marked point, and the mark will be at your recent cursor position. In this way, typing C-x C-x a second time takes you back to where you jumped from. It is often a good idea to confirm that the region includes what you expect before applying a region command. To do this, type C-x C-x once to see where one boundary is and then C-x C-x again to see where the other boundary is.

One thing that regions are very useful for is cut-and-paste and copy-and-paste. To cut a region, type C-w. To copy a region, type M-w. To paste in the cut/copied text, type C-y (yank).

Note that some commands set the mark in addition to performing their main action. These commands will print

Mark set.

in the message buffer. That is why it is a good idea to confirm region boundaries before applying region commands.

More on cutting and pasting

On previous days, we have seen C-d and Backspace as ways of deleting individual characters. In the previous section, we saw C-w as a way of cutting an entire region. There are some other useful cut commands:

M-d    Cut next word
C-k    Cut rest of the line

The cut commands put the text they cut into a kill ring. The kill ring contains the last 30 chunks of text that were cut (the limit is set by the variable kill-ring-max, and newer versions of Emacs keep more entries by default). The paste command (C-y) pastes back the most recently cut text. If that is not the one you want, type M-y as your next command. That replaces the pasted text with the next most recently cut text. It is called a kill ring because it is circular: if you keep typing M-y, you eventually reach the oldest entry, and typing M-y once more wraps around to the most recently cut text.
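The circular behavior of C-y and M-y can be sketched as a fixed-size ring buffer in C. This is a toy model under stated assumptions, not how Emacs implements its kill ring: the names KillRing, ringKill, ringYank, and ringYankPop are hypothetical, TEXT_MAX is an arbitrary per-entry limit, and the model omits the merging of consecutive kills and does not check that M-y immediately follows a yank.

```c
#include <string.h>

#define RING_MAX 30   /* ring size used in the notes above */
#define TEXT_MAX 80   /* hypothetical limit on the length of one entry */

typedef struct {
  char text[RING_MAX][TEXT_MAX];
  int count;    /* entries stored so far, never more than RING_MAX */
  int newest;   /* slot holding the most recent kill */
  int pos;      /* how far back the last yank reached: 0 = newest */
} KillRing;

void ringInit (KillRing *r) {
  r->count = 0;
  r->newest = RING_MAX - 1;   /* so the first kill lands in slot 0 */
  r->pos = 0;
}

/* Record a kill.  Once the ring is full, the oldest entry is overwritten. */
void ringKill (KillRing *r, const char *s) {
  r->newest = (r->newest + 1) % RING_MAX;
  strncpy (r->text[r->newest], s, TEXT_MAX - 1);
  r->text[r->newest][TEXT_MAX - 1] = '\0';   /* strncpy may not terminate */
  if (r->count < RING_MAX)
    r->count++;
}

/* C-y: the most recent kill, or NULL if nothing has been killed. */
const char *ringYank (KillRing *r) {
  if (r->count == 0)
    return NULL;
  r->pos = 0;
  return r->text[r->newest];
}

/* M-y: the next older kill; after the oldest, wrap to the newest. */
const char *ringYankPop (KillRing *r) {
  if (r->count == 0)
    return NULL;
  r->pos = (r->pos + 1) % r->count;
  return r->text[(r->newest - r->pos + RING_MAX) % RING_MAX];
}
```

After killing "first", "second", and "third", a yank returns "third", and repeated yank-pops return "second", "first", and then "third" again, which is the wrap-around described above.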

Note that C-d and Backspace, which delete individual characters, do not update the kill ring. A single character is not worth a slot in the kill ring, since it is just as easy to retype it. Only commands that normally remove multiple characters save the removed text in the kill ring.

Also, note that if you use multiple cut commands consecutively, the cut text is grouped together into one entry in the kill ring. This is very convenient: you can use C-k several times to cut multiple consecutive lines and then paste them elsewhere in the text as a single unit.

Undo

Recall from Lecture 2 that undo in Emacs is bound to "C-x u". You can issue it several times in a row to undo several commands. Typing any command other than undo ends the sequence of undos. Now an interesting thing happens: your command history now contains those undo commands. If you start undoing again, you will undo the undo commands. Think of this as a redo command, although it has no separate key binding.

If you want to undo all the editing changes made since you last saved the file, use the revert-buffer command (M-x revert-buffer), which reloads the file from disk.

Memory Management

Memory management involves the allocation of memory to hold data and the deallocation of memory when it is no longer used. Java programmers are familiar with memory allocation through the use of constructors and the new keyword, which allocates memory to hold a new object. In Java, memory deallocation is done automatically by the runtime system using a mechanism called garbage collection. The runtime system keeps track of which objects are still referenced, and when an object can no longer be reached through any reference, it frees the memory associated with the object.

C also provides a mechanism for memory allocation. It does not, however, automatically deallocate the memory a program allocates this way; instead, the programmer must free it explicitly. As a result, the programmer must be very aware of what their pointers point to so that they can deallocate memory appropriately. Incorrect memory deallocation is the source of many C programming errors, and these errors can be extremely difficult to track down.

Memory Allocation

Recall that types in C can be either pointer types or non-pointer types. Non-pointer types can be simple values like ints or compound values like structs. Pointer types contain * in the type specification. When you declare a variable with a non-pointer type, memory to hold the entire value is allocated automatically, so you can assign to it immediately:

int i;
typedef struct {
    int month;
    int day;
    int year;
} date;
date today;
   
i = 0;
today.month = 1;
today.day = 10;
today.year = 2000;

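Variables declared with a pointer type have enough memory to hold a pointer, but not enough memory to hold the thing pointed to. A pointer variable must therefore be given a value that points to suitable memory, which can happen in one of 3 ways:

1. It can be assigned the address of an existing variable, using the & operator.
2. It can be assigned the value of another pointer, so that both point to the same memory.
3. It can be assigned the address of newly allocated memory, using malloc.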

We have already seen the first two of these, so we'll focus on the third here. First, we declare a variable with a pointer type. Then we call malloc (short for memory allocation) to allocate memory to hold a value. We tell malloc how many bytes to allocate, and it returns a pointer to the newly allocated memory, or NULL if the allocation fails. The amount we need is enough to hold one instance of the type we want to store there. To find out how big an instance is, we use sizeof with the name of that type in parentheses. sizeof is an operator, not a function; it yields the number of bytes needed, and we pass that value to malloc. Because malloc can be used in many contexts, it doesn't know what type of pointer we need, so its return type is void *. This simply means it returns a pointer without saying what type of thing is being pointed to. In C, a void * value is converted automatically when it is assigned to any other object pointer type, so a cast is not required, but many programmers write one anyway to make the intended type explicit (C++ does require it). To write the cast, we precede the malloc call with the pointer type in parentheses. Also, note that we need to include stdlib.h, which is where the prototype for malloc is declared. stdlib.h also defines NULL for us, so we do not need to define it ourselves.

Finally, note how the pointer is used. To access the fields of the object it points to, we use the -> operator; for a pointer d, d->month means the same as (*d).month. You can remember this operator because it looks like an arrow pointing at something. If we have a value rather than a pointer, we use the . operator, as in Java.
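Here is a minimal sketch of allocating a date with malloc and filling it in through the pointer. It reuses the date struct from earlier and assumes that dateptr, the pointer type used in later examples, is a typedef for date *. The helper name newDate is hypothetical, and the NULL check is an addition that the discussion above motivates.

```c
#include <stdlib.h>   /* malloc, free, NULL */

typedef struct {
  int month;
  int day;
  int year;
} date;

typedef date *dateptr;   /* assumed: a name for "pointer to date" */

/* Allocate a date on the heap and fill it in.
   Returns NULL if malloc fails; the caller must free the result. */
dateptr newDate (int month, int day, int year) {
  dateptr d = (dateptr) malloc (sizeof (date));
  if (d == NULL)
    return NULL;          /* out of memory */
  d->month = month;       /* -> reaches a field through the pointer */
  d->day = day;
  d->year = year;
  return d;
}
```

A caller would write dateptr today = newDate (1, 10, 2000); and later release the memory with free (today);.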

Pointers and Arrays Revisited

Recall that last time we said that pointers and arrays are very similar. They are not identical, however. When we declare an array variable, C allocates enough memory to hold the array elements themselves; no separate pointer is stored. When the array name is used in an expression, C converts it to a pointer to the array's first element. C can allocate the elements at declaration time because the size of the array is known when we declare it:

char name[20];
char school[] = "Williams";

In the first case, we give the size of the array as a constant. In the second case, we give the array an initial value, and C allocates just enough memory to hold that value and the terminating null character.

Here is a similar way of doing the above using pointers:

char *name = (char *) malloc (20);
char *school = "Williams";

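These versions are not exactly equivalent to the arrays, however. The memory for name comes from malloc, so it must eventually be released with free, and school points to a string literal, which must not be modified. If we want our own modifiable copy of a string whose value we will not know until later in the program, we could do the following: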

char *school;
...
school = (char *) malloc (strlen ("Williams") + 1);
strcpy (school, "Williams");

First we declare a string pointer but do not allocate memory to hold the characters in the string. Later we use malloc to allocate the memory, using strlen to determine how long the string is and adding 1 for the terminating null character. Finally, we copy the string value in using strcpy, which copies the null character for us as well. (strlen and strcpy are declared in string.h.)
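To check that an array declaration reserves memory for exactly its elements, a small sketch can apply sizeof to the two array declarations from this section. The function names are hypothetical; each one just reports a size so the values can be compared.

```c
#include <stddef.h>
#include <string.h>

/* sizeof an array gives the bytes the array itself occupies. */
size_t nameBytes (void) {
  char name[20];
  return sizeof name;          /* 20: the size given in the declaration */
}

size_t schoolBytes (void) {
  char school[] = "Williams";
  return sizeof school;        /* 9: 8 letters plus the terminating '\0' */
}

size_t schoolLength (void) {
  char school[] = "Williams";
  return strlen (school);      /* 8: strlen does not count the '\0' */
}
```

The difference between schoolBytes and schoolLength is the same +1 that the malloc version adds to strlen ("Williams").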

Memory Deallocation

C does free some memory automatically. Specifically, variables that are local to a function, including its parameters, have memory allocated when the function begins execution and freed when the function completes execution, as in this example:

int increment (int i) {
  int i2;
  i2 = i + 1;      /* i and i2 live only while increment runs */
  return i2;
}
   
int j = increment (100);

When increment is called, memory large enough to hold an integer is allocated for the parameter i, and the value 100 is copied into it to pass the parameter value in. Memory large enough to hold an integer is also allocated for the local variable i2, and the value 101 is stored in i2. On return, the value in i2 is copied into j, the variable that receives the result of increment, and the memory allocated for i and i2 is deallocated. Because i and i2 are local to the function, their memory is allocated and deallocated automatically by the C runtime system; no malloc or free is needed.

In contrast, consider the following function to duplicate a string:

char *duplicate (char *s) {
  char *s2 = (char *) malloc (strlen (s) + 1);
  strcpy (s2, s);
  return s2;
}
   
char *school = duplicate ("Williams");
...
free (school);

When duplicate is called, memory large enough to hold a pointer is allocated for s, and the address of "Williams" is copied into it to pass the parameter value in. Memory large enough to hold a pointer is also allocated for the local variable s2. The initializer of s2 calls malloc to allocate enough memory to hold the string pointed to by the parameter plus its terminating null character. The strcpy function then copies the string into this new memory. strcpy assumes that the destination memory has already been allocated and is large enough; it does not allocate anything itself. The return statement passes the pointer back, and it is assigned to school. The memory used by s and s2 to hold the pointers is deallocated automatically, but the memory holding the copied string is not. The string continues to exist and is now pointed to by school. Eventually, when we decide that we no longer need the string, we call free to deallocate that memory. The memory management system remembers how much memory was allocated for each pointer returned by malloc, and it deallocates that much memory when we call free. A later call to malloc may then return the same address again, allowing the memory to be recycled.
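A self-contained version of duplicate makes the lifetime visible: the copy survives after the function returns, can be modified independently of the literal it came from, and must be freed by the caller. The check for malloc failure and the const on the parameter are additions in this sketch, not part of the function above.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of s, or NULL if malloc fails.
   The caller owns the copy and must free it. */
char *duplicate (const char *s) {
  char *s2 = (char *) malloc (strlen (s) + 1);   /* +1 for the '\0' */
  if (s2 == NULL)
    return NULL;
  strcpy (s2, s);     /* copies the characters and the '\0' */
  return s2;          /* the pointer s2 goes away; the copied bytes do not */
}
```

Unlike the literal "Williams", the copy may be changed (for example, school[0] = 'w';), and free (school); releases it once it is no longer needed.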

There are several dangers associated with pointers and memory management that are described in the rest of these notes.

Uninitialized Pointers

The first problem you can run into is trying to use a pointer before it points to allocated memory. In Java, fields are initialized automatically (reference fields to null), and the compiler will not let you use a local variable before you assign it a value. In C, local variables have no default initialization. The runtime system simply allocates memory for them, and each variable takes on whatever value happens to be in that memory. If that chunk of memory was never used before by the program, the value is often 0, but it could really be any value. Suppose you use an uninitialized pointer as follows:

dateptr d;
d->month = 1;

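One of two things might happen:

1. The garbage value in d is not a legal address for your program. The assignment then causes a segmentation fault, and the program crashes.
2. The garbage value happens to be a legal address. The assignment then silently changes whatever is stored at that address, possibly the value of some other variable, and the program continues with corrupted data.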

How do you avoid this problem? Always initialize your pointers before you use them. The simplest way is to initialize them in their declaration. If you don't know what value to give them at that point, initialize them to NULL. Dereferencing NULL typically causes a segmentation fault, but such bugs are much easier to find and fix than inadvertent changes to other variables.
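One payoff of initializing to NULL is that code can test whether a pointer points anywhere before using it, which is impossible with a garbage value. A minimal sketch, assuming the date struct from earlier and dateptr as a typedef for date *; setMonth is a hypothetical helper.

```c
#include <stddef.h>

typedef struct {
  int month;
  int day;
  int year;
} date;

typedef date *dateptr;   /* assumed typedef for "pointer to date" */

/* Set the month only if d actually points somewhere.
   Returns 1 on success, 0 if d is NULL. */
int setMonth (dateptr d, int month) {
  if (d == NULL)
    return 0;            /* nothing to update */
  d->month = month;
  return 1;
}
```

With dateptr d = NULL; the call setMonth (d, 1) safely reports failure; after d = &today; the same call updates today.month.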

Memory Leaks

A second problem that might occur is that you forget to free memory that you are no longer using. The good news is that memory leaks rarely cause your program to crash or produce wrong results. The bad news is that the amount of memory your program uses keeps growing the longer it runs. For small programs that run briefly this is not normally a problem, but it is for large or long-running programs. They may become very slow and might eventually crash if they run out of memory that can be allocated.

It is good practice to free memory religiously even in small programs so that you get used to doing it and avoid memory leaks when you start to write large programs where it really matters.

The trick here is to make sure that every chunk of memory allocated with malloc is eventually freed with free. If a single variable references that memory, it is pretty simple to do this correctly. If multiple variables point to the same chunk of memory, you need to be sure that you free the memory only when you no longer need the value through any of the variables. Failure to do this correctly leads to the next problem: dangling pointers.
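A common way to leak memory is to assign a new value to the only pointer to an allocated chunk without freeing it first. The hypothetical helper setString below shows the leak-free pattern for a string that gets replaced over time: allocate the new copy, free the old one, then store the new pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Store a fresh copy of value in *slot, freeing whatever *slot held
   before.  Overwriting *slot without the free would leak the old copy,
   since nothing would point to it any more.  Returns 0 if malloc fails,
   in which case *slot is left unchanged. */
int setString (char **slot, const char *value) {
  char *copy = (char *) malloc (strlen (value) + 1);
  if (copy == NULL)
    return 0;
  strcpy (copy, value);
  free (*slot);        /* free (NULL) does nothing, so an empty slot is fine */
  *slot = copy;
  return 1;
}
```

Starting from char *s = NULL;, repeated calls to setString (&s, ...) never leak, and a single free (s); at the end releases the last copy.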

Dangling Pointers

A dangling pointer is a pointer that holds a legal address, but the contents at that address are not the value you expect to be there. This is slightly different from the uninitialized pointer problem. With a dangling pointer, the pointer pointed to a reasonable value at one time, but the memory was deallocated and may since have been reused for a different purpose. If you now use the pointer, you may get an unexpected value.

Problem 1: Pointing to memory that was automatically deallocated

A dangling pointer can occur because you have a pointer to memory that was automatically freed when you exited a function:

char *buffer;
   
char *fillBuffer () {
  char line[1000];
  fgets (line, sizeof line, stdin);  /* not gets: it cannot limit input length and was removed in C11 */
  return line;                       /* BUG: line is deallocated on return */
}
   
buffer = fillBuffer();

Here we declare a pointer called buffer outside of the function, but we do not allocate memory for the string it will point to. Inside fillBuffer, we declare an array called line that is large enough to hold 1000 characters. The C runtime system automatically allocates memory for this array. We then use the fgets function to fill this array with the next line entered by the user. Finally, the return statement returns the address of the array, and the assignment stores it in buffer, so buffer now points to this array. On return from the function, however, the memory that was automatically allocated to hold line is deallocated. For a while, the value in buffer might appear to be ok, but eventually the memory will be reused for another purpose and modified. At that point the string buffer points to will change mysteriously.

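The solution in this case is to never let the address of a local variable escape from its function: don't store it in a global variable, and don't return it. This includes returning a local array, since an array name used in an expression is converted to the address of its first element. Instead, allocate the memory with malloc (and have the caller free it), or have the caller pass in the array to fill. Fortunately, the compiler (gcc, for example) gives us a warning in this case: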

warning: function returns address of local variable
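Here is a minimal sketch of the malloc approach. The local array is still used for reading, but its contents are copied into heap memory before the function returns, so the returned address stays valid until the caller frees it. The name readLineCopy and its FILE * parameter are hypothetical; passing stdin gives the original fillBuffer's input source.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line from in into newly allocated memory.  The caller must
   free the result.  Returns NULL at end of input or if malloc fails. */
char *readLineCopy (FILE *in) {
  char line[1000];                   /* local: gone once we return */
  char *copy;
  if (fgets (line, sizeof line, in) == NULL)
    return NULL;
  line[strcspn (line, "\n")] = '\0'; /* drop the trailing newline */
  copy = (char *) malloc (strlen (line) + 1);
  if (copy == NULL)
    return NULL;
  strcpy (copy, line);               /* copy out before line disappears */
  return copy;
}
```

With this version, buffer = readLineCopy (stdin); is safe, and buffer should be passed to free when the line is no longer needed.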

Problem 2: Pointing to memory that was deallocated with free

A very similar problem arises if you have two pointers to a chunk of memory and then free the memory while the other pointer continues to use the memory:

dateptr d1, d2;
   
d1 = (dateptr) malloc (sizeof (date));
d2 = d1;
...
free (d1);
d1 = NULL;
...
d2->month = 2;

Here we declare two pointers to dates. Then we allocate memory to hold a date and assign the address to d1. Next we assign d1's value to d2. This copies the pointer. Now, we have two variables pointing to the same date. Later when we free the memory using d1, we set d1 to NULL. (Note that free does not change the value in d1, so d1 continues to contain the same address unless we change it ourselves!) d2 is not changed, however. It still contains the same address. When we dereference d2, we will be lucky if the memory was not allocated for another purpose. In that case, things will appear to work ok. If we are unlucky, the memory will have been reused for another purpose. Our assignment will then inadvertently change memory that some other variable is using. This bug will show up at a later time when we try to use that other variable and it has an unexpected value.

This problem is very difficult to avoid in general. You must be very careful when you assign one pointer variable to another and then use free. Also, after calling free, set the variable you freed through to NULL, unless it is about to be reclaimed automatically because the function is ending or you are about to assign it another value. Incorrect use of the variable will then typically result in a segmentation fault, which is easier to find and fix than a dangling pointer.

Problem 3: Freeing the same memory more than once

This problem is really just a variation on the previous one. If you have a dangling pointer and you call free with it, bizarre things might happen; freeing the same memory twice is undefined behavior in C. If the memory has not been reallocated for another purpose, nothing visibly bad may happen, or the C library may detect the double free and abort the program. If it has been reallocated, however, you will now free memory that is being used for a different purpose. Instead of just inadvertently changing another variable's value, you have now turned that variable into a dangling pointer!

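Use the same tricks as for problem 2 to avoid this problem. Setting a pointer to NULL after freeing it helps here as well, because calling free with NULL is defined to do nothing.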
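The "free, then set to NULL" habit can be packaged in a macro so it is never half done. FREE_AND_CLEAR is a hypothetical name; a macro is used instead of a function so that it can assign NULL to the caller's variable of any pointer type.

```c
#include <stdlib.h>

/* Free the memory p points to, then set p itself to NULL.
   p must be a plain pointer variable, because it is evaluated twice.
   A second FREE_AND_CLEAR on the same variable is just free (NULL),
   which does nothing. */
#define FREE_AND_CLEAR(p) do { free (p); (p) = NULL; } while (0)
```

Note that the macro clears only the variable it is given. In the Problem 2 example, FREE_AND_CLEAR (d1) would leave d2 holding the old address, so d2 still needs to be set to NULL (or otherwise never used) once the memory is freed.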

Sample Program

/*                  
     Filename: book.c
       Author: Good C. Memory
   Definition: This program uses malloc( ) to allocate memory for structures
               to store the book records, and free( ) to deallocate them.
               This program is a modified version of Program 16-6 in Pure C 
               Programming by Amir Afzal.
   
*/
/* preprocessor commands */
  #include <stdio.h>         /* including the stdio.h header file for I/O */
  #include <stdlib.h>        /* including the stdlib.h header file for malloc and free */
   
/* declaring a structure type */
  #define MAX  100      /* set MAX to 100 */
  typedef   struct  {
          char  title [81] ;        /* the title of the book */
          char  author [81] ;       /*  the author of the book */
          char  category [41] ;     /* the category of the book */
  } book;
  
  int  menu  ( void ) ;       /* function menu prototype */
  book  add ( void ) ;        /* function add prototype */
  void  list  (book * array[ ], int ) ;     /* function list prototype */
   
int  main( )
{
/* declaring an array of structures */
  book  *library [MAX];     /* array of pointers to books */
  int choice;               /* Value returned from the menu */
  int index;		    /* Used to walk the array. */
  int count = 0;	    /* Number of books in the array */
  
  do
  {
    choice = menu () ;    /* show the main menu */
    switch (choice )      /* check user selection */
    {
      case 0 :        /* the exit choice */
        /* before ending the program, release the allocated memory */
        for ( index = 0 ; index < count; ++index ) {
            free ( library[index] ) ;
        }
        break ;
      
      case 1:             /* the add choice */
        if ( count < MAX )      /* if MAX number of books not reached */
        {
          library [count] = ( book * ) malloc ( sizeof ( book ) ) ;
          if  ( library [count] == NULL )   /* if memory allocation failed */
               printf ( "Failed to allocate memory.\n" ) ;
          else
          {
            *library [count] = add ( ) ;       /* add a book record */
            count ++ ;    /* add one to the count of the books */
          }
        }
        else
        {
          printf ( "Library has reached maximum capacity.\n" ) ;
        }
        break ;
	
      case 2:       /* the list choice */
        list ( library, count ) ;         /* list  books */
        break ;
	
      default:
        printf ( "Wrong selection. Try again.\n" ) ;  /* error message */
    }       /* end of the switch */
    
  } while (  choice != 0  ) ;           /* end of the do-while */
  
  exit (0);
}       /* end of the main */
   
/*
  menu    This function shows the main menu, reads the user selection,
      and returns the user selection to the caller.
*/
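int  menu ()
{
    int choice ;          /* user selection */
    int result ;          /* value returned by scanf */
    int c ;               /* used to skip to the end of the line */
    
    printf  (  "\n<< Super Duper Menu >>\n" ) ;
    printf  (  "    0: Exit\n" ) ;
    printf  (  "    1: Add a book\n" ) ;
    printf  (  "    2: List all books\n" ) ;
    printf  ( "Enter your selection: \n" ) ;
    result = scanf ( "%d", &choice ) ;
    if ( result == EOF )
        return 0 ;        /* no more input: behave as if Exit was chosen */
    if ( result != 1 )
        choice = -1 ;     /* not a number: reported as a wrong selection */
    
    /* discard the rest of the line (gets() is unsafe and was removed in C11) */
    while ( ( c = getchar () ) != '\n' && c != EOF )
        ;
    
    return choice;
}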
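/*
  readLine  This function reads one line of input into buf, storing at
            most size-1 characters, and removes the trailing newline.
            It replaces gets(), which cannot limit how much it stores.
*/
static void  readLine ( char *buf, int size )
{
    int index ;
    
    if ( fgets ( buf, size, stdin ) == NULL ) {
        buf[0] = '\0' ;   /* end of input: store an empty string */
        return ;
    }
    for ( index = 0 ; buf[index] != '\0' ; ++index ) {
        if ( buf[index] == '\n' ) {
            buf[index] = '\0' ;   /* drop the newline that fgets keeps */
            break ;
        }
    }
}

/*
  add     This function stores the user input in a temporary structure, 
          and returns the filled structure to the caller.
*/
   
book  add  ()
{
    book temp ;         /* declare a structure variable */
    printf  (  "\n<< Add MODE >>\n" ) ; /* display title */
    printf (  "Title?  " ) ;      /* ask for book title */
    readLine  (  temp.title, sizeof temp.title ) ;
    printf (  "Author?  " ) ;         /* ask for book author */
    readLine  (  temp.author, sizeof temp.author ) ;
    printf (  "Category?  " ) ;       /* ask for book category */
    readLine  ( temp.category, sizeof temp.category ) ;
    return temp ;         /* return structure */
}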
   
/* 
  list      This function lists the books in the library. 
            It receives the pointer to the structure array, 
            and number of books, and displays the books list. 
*/
void  list ( book  *books[], int size )
{
    int  index ;                /* declare a loop counter */
    book *next;
    
    printf (  "\n<< BOOK LIST >>\n"  ) ;  /* display the title */   
    for  ( index = 0 ; index < size ; ++index) {
	next = books[index];
        printf ("%d: %s by %s --- %s\n", index+1, 
          next-> title, next-> author, next-> category ) ;
    }
    
}