Computer Science 010

Lecture Notes 11

File Manipulation in C

Command Line Arguments in C

First, a reminder: Remove all core files you are no longer using, as they take a LOT of disk space. Look for *.core in all of your directories.

File I/O in C

To this point, our C programs only take input from the keyboard and write output on the screen. We have seen how to redirect input and output to read and write files, but this is not sufficient for real file manipulation. First, we want the application to be able to decide when to use files, not the user. Second, a program might want to use more than one input file or more than one output file. Consider Emacs, for example. We might have multiple files open, each in a different buffer. To allow the editing of multiple files within an Emacs session, there must be some way to write code that allows us to open, read, and write multiple files during a running program.

File types and functions

To read from or write to a file, we need to tell the operation which file we want to manipulate. We could give a filename to each read or write call, but that is awkward and would also result in slow execution. Instead, before we read or write a file, we must open the file. The open function call returns a file pointer, which is passed to subsequent read and write calls. Here is the prototype for the file open function:

FILE *fopen (const char *filename, const char *mode);

The type of a file pointer is FILE *. The FILE type is defined in stdio.h, as are the I/O functions we have been using. The filename is simply the name of the file we want to open. The filename can be a file in the current directory or can include a path. The final argument is the mode, which indicates if we intend to read from the file, write to the file, or both.

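While mode is declared to be a string, only a very small set of strings is valid. The most common are:

"r"     Read (the file must already exist)

"w"     Write (the file is created if it does not exist; any existing contents are discarded)

"a"     Append (writes are added at the end of the file; the file is created if it does not exist)

"r+"    Read and write (the file must already exist)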

fopen returns NULL if the file could not be opened. Perhaps the file does not exist, or the user running the program does not have the appropriate read/write permission. Operating systems also typically limit the number of files that one program may have open simultaneously; we may have already reached the limit. Hence, it is imperative that we check the return value of fopen and do some appropriate error handling if NULL is returned. (More on error checking later.)

When done using a file you have opened with fopen, you must close the file. This will prevent you from accidentally using the file after you should be done with it, and forces all writes you made to become persistent on disk. (If you don't close the file, it's possible that the changes will only be made in memory and will be lost if the system crashes.) You also need to close and reopen a file if you want to use it in a different mode. The prototype for the close function is:

int fclose (FILE *stream);

fclose returns 0 if it successfully closes the file. Just as every call to malloc should have a call to free, we also should have a call to fclose for every call to fopen.
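
The open, check, close pattern described above can be sketched as follows. This is only an illustration: the helper name file_is_readable is hypothetical and not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if filename can be opened for reading,
   0 otherwise.  Every successful fopen is paired with an fclose. */
int file_is_readable (const char *filename) {
  FILE *fp = fopen (filename, "r");

  if (fp == NULL) {
    return 0;            /* fopen failed; there is nothing to close */
  }

  /* ... reads from fp would go here ... */

  if (fclose (fp) != 0) {
    return 0;            /* the close itself reported an error */
  }
  return 1;
}
```

Note that fclose is only called when fopen succeeded. Passing NULL to fclose is an error.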

To write to a file we use fprintf, which is analogous to printf except that it takes an additional argument that identifies the file to write to:

int fprintf (FILE *stream, const char *format, ...);

The ... in the signature indicates that fprintf takes a varying number of arguments of varying types, just as printf does. fprintf (and printf) return the number of characters output. They return a negative value if an error occurs.
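
Here is a small sketch of writing to a file with fprintf while checking each return value. The helper name write_numbers and its return convention (0 for success, -1 for failure) are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: writes the integers 1..count to filename, one per
   line.  Returns 0 on success and -1 on any failure. */
int write_numbers (const char *filename, int count) {
  FILE *out = fopen (filename, "w");
  int i;

  if (out == NULL) {
    return -1;
  }

  for (i = 1; i <= count; i++) {
    /* fprintf returns a negative value if the write fails */
    if (fprintf (out, "%d\n", i) < 0) {
      fclose (out);
      return -1;
    }
  }

  /* fclose flushes any buffered output, so its result matters too */
  return (fclose (out) == 0) ? 0 : -1;
}
```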

Similarly, fscanf works like scanf but takes an additional argument identifying the file to read:

int fscanf (FILE *stream, const char *format, ...);

fscanf (and scanf) return the number of values assigned to variables. This will be fewer than the number of variables in the argument list if the end of the file is reached or if there is a type mismatch between the data and the format string conversion. For example, if the format string contains "%d", but the next input is not an integer, there is a mismatch. Both scanf and fscanf would return at this point and the return value would reflect how many successful assignments had occurred prior to the mismatch. If the end of the file is reached before any value has been assigned, they return the special value EOF instead of 0.
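
To illustrate using the return value, the sketch below adds up integers from an open file and stops at the first read that does not assign a value. The helper name sum_integers is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: adds up the integers in an open stream.  Stops at
   the first read that does not assign a value (end of file, or a mismatch
   such as a letter).  Stores the sum in *sum and returns how many integers
   were read. */
int sum_integers (FILE *in, long *sum) {
  int value;
  int count = 0;

  *sum = 0;
  /* fscanf returns 1 only when the single %d conversion succeeded */
  while (fscanf (in, "%d", &value) == 1) {
    *sum += value;
    count++;
  }
  return count;
}
```

Comparing the result to 1 (the number of conversions requested) handles both the mismatch case, where fscanf returns 0, and the end-of-file case, where it returns EOF.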

To determine if we have attempted to read beyond the end of a file, we can use feof:

int feof (FILE *stream);

This returns non-zero (true) if we are at the end of the file. Otherwise it returns 0 (false). Note that this only returns true after a failed read. We have to try to read and then test if the read failed. We cannot detect EOF and then decide whether to read or not, although that would usually be more convenient.
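
The read-then-test order can be sketched like this. The helper name count_integers and its use of -1 to signal bad input are assumptions for this example.

```c
#include <stdio.h>

/* Hypothetical helper: counts the integers in an open stream.  Returns the
   count if reading stopped because the end of the file was reached, or -1
   if it stopped for another reason (for example, a non-numeric word). */
int count_integers (FILE *in) {
  int value;
  int count = 0;

  /* Try the read first ... */
  while (fscanf (in, "%d", &value) == 1) {
    count++;
  }

  /* ... and only after a read has failed ask feof why it failed. */
  return feof (in) ? count : -1;
}
```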

Identifying files

There are several ways that we could tell fopen what file to open. In any event, the type must be a char *, but it might get its value in different ways:

C Command-line: argc and argv

It is relatively easy to get arguments from the command line of a C program. We must change the signature of the function main. Thus far, we have been using a simple signature:

int main ();

C also recognizes a signature with additional function arguments:

int main (int argc, char **argv);

argc is the number of command line arguments including the program name used to invoke the command. argv is a dynamically-sized array of strings. It has one entry for each word on the command line. argv[0] is the program name. For example, suppose a user executes the following Unix command:

-> spellcheck myfile

argc will be set to 2. argv[0] is "spellcheck". argv[1] is "myfile". You can then take argv[1] and assign it to a more meaningful variable name and use it as a filename to open. (Of course, it does not need to be a filename. The user needs to know what type of arguments your program expects.)
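
As a sketch, picking the filename out of the arguments might look like the following. The helper name filename_from_args is hypothetical; main would simply pass its own argc and argv along to it.

```c
#include <stddef.h>

/* Hypothetical helper: returns the first command-line argument after the
   program name, or NULL if the user did not supply one. */
char *filename_from_args (int argc, char **argv) {
  if (argc < 2) {
    return NULL;         /* only argv[0], the program name, is present */
  }
  return argv[1];
}
```

For the spellcheck command above, the helper would return "myfile". Checking argc before touching argv[1] matters, because argv[1] is not a usable string when no argument was given.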

stdin, stdout, stderr

stdio.h also defines three identifiers whose types are FILE *. These are stdin (standard input), stdout (standard output), and stderr (standard error output). stderr is separate from stdout so that if a program reports errors and the user redirects normal output to a file, the error output will still appear on the user's screen. This is to ensure that the user gets immediate feedback if something goes wrong. To write error messages to the user's screen, you would use:

fprintf (stderr, "My error message\n");

Whether the user redirects standard output or not, this message will appear on the user's screen.

It is also possible to use stdout as an argument to fprintf and stdin as an argument to fscanf. This does not seem useful at first, since you could just use printf and scanf, but consider the case where you want to write a program that reads from standard input only if no filename is specified on the command line:

#include <stdio.h>
   
int main (int argc, char **argv) {
  FILE *infile;
  int someint;
   
  if (argc == 1) {
    infile = stdin;
  }
  else {
    infile = fopen (argv[1], "r");
  }
   
  fscanf (infile, "%d", &someint);
  fclose (infile);
}

Numbers on the command line

All command-line arguments are passed to main as character strings. However, it is often useful to pass integer or floating point values to a program. Consider the following not-very-interesting example I will call doubleint.c:

#include <stdio.h>
#include <stdlib.h>   /* needed for exit and atoi */

int main(int argc, char *argv[]) {

  int number;

  if (argc != 2) {
    fprintf(stderr, "Usage: %s number\n", argv[0]);
    exit(1);
  }

  number=atoi(argv[1]);
  printf("%d is doubled to %d\n", number, 2*number);

  return 0;

}

This example includes a check to make sure the argument exists, and a call to the atoi function which converts a string to its integer equivalent. Similar functions exist to convert to a long integer (atol) or a double precision floating point value (atof).
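
One limitation of atoi is that it gives no way to tell bad input from a real value: atoi("abc") simply returns 0. The sketch below uses a different library function, strtol (also declared in stdlib.h), which reports where conversion stopped. The helper name parse_int is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts text to an int.  Returns 1 and stores the
   value in *result on success.  Returns 0 if text is empty, has characters
   after the number, or is out of range for an int. */
int parse_int (const char *text, int *result) {
  char *end;             /* strtol sets this to the first unconverted char */
  long value;

  errno = 0;
  value = strtol (text, &end, 10);

  if (end == text || *end != '\0') {
    return 0;            /* no digits at all, or junk after the number */
  }
  if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
    return 0;            /* too large or too small */
  }

  *result = (int) value;
  return 1;
}
```

In doubleint.c, a call such as parse_int (argv[1], &number) could stand in for the atoi call, with the usage message printed when it returns 0.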

Error handling

C library functions nearly always return values, some of which indicate the occurrence of an error. For example, if malloc is unable to fulfill a memory request because there is not enough memory available, it will return NULL. An attempt to use the return value in this case will likely lead to a segmentation fault. Always test the return value of functions which may fail to see that no error occurred.

A function can fail for a number of different reasons. In many cases, the library function that fails also sets a predefined integer variable called errno to indicate the specific cause. If you detect from a return value that an error occurred and want to know which error it was, examine errno. To have access to errno, you must include errno.h in your program. If you look at /usr/include/errno.h, you will see a long list of #define preprocessor commands. errno will take on one of these values. If you want to do something special depending upon which error occurred, compare errno to the #define identifiers to determine what happened. From there, you can execute the appropriate error handling code.
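
A minimal sketch of comparing errno to the #define identifiers, assuming a Unix system where ENOENT and EACCES are defined. The helper name classify_open and its numeric return codes are made up for this example.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: tries to open filename for reading and classifies
   the outcome.  Returns 0 if it opened, 1 if the file does not exist,
   2 if permission was denied, and 3 for any other failure. */
int classify_open (const char *filename) {
  FILE *fp = fopen (filename, "r");
  int saved;

  if (fp != NULL) {
    fclose (fp);
    return 0;
  }

  /* Copy errno right away; later library calls may overwrite it. */
  saved = errno;
  if (saved == ENOENT) {
    return 1;
  }
  if (saved == EACCES) {
    return 2;
  }
  return 3;
}
```

errno is only meaningful after a function has reported failure through its return value, so the test on fp comes first.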

Often you will just report the error to the user and either continue or abort the program. Use the function perror:

void perror (const char *s);

perror looks at the value of errno and prints out a more useful error message. The string that you pass in is printed at the beginning of the error message. This allows you to customize the message while still making the underlying cause of the error visible to the user. Here's an example:

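#include <stdio.h>
#include <stdlib.h>   /* needed for exit */

int main (int argc, char **argv) {
  FILE *infile;
  int someint;
  char message[1000];   /* holds the customized prefix for perror */

  if (argc == 1) {
    infile = stdin;
  }
  else {
    infile = fopen (argv[1], "r");
    if (infile == NULL) {
      /* perror takes a single string, so build the message first */
      snprintf (message, sizeof (message), "open failed for %s", argv[1]);
      perror (message);
      exit (-1);
    }
  }

  fscanf (infile, "%d", &someint);
  fclose (infile);
  exit (0);
}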

If we run this program and tell it to use a file file.txt that does not exist, the user will get this message:

open failed for file.txt: No such file or directory

Using perror ensures that every error with the same cause will have a similar error message, making it easier for the user or programmer to understand what is wrong. perror is declared in stdio.h. It is not necessary to include errno.h to use perror.

Not all functions that return error codes set errno. To find out if a function does, look at its man page. Look for a discussion of the return value of the function or for a discussion of errors. There is often a section of the man page called "RETURN VALUES" which explicitly explains the return values and indicates if the function sets errno. There is generally a separate section called "ERRORS" that lists the #define error values that it might set along with a short description of what that error code means for that function.

For example, from the fopen man page:

RETURN VALUES
Upon successful completion fopen(), fdopen() and freopen() return a FILE pointer. Otherwise, NULL is returned and the global variable errno is set to indicate the error.
ERRORS
[EINVAL] The mode provided to fopen(), fdopen(), or freopen() was invalid.

The fopen(), fdopen() and freopen() functions may also fail and set errno for any of the errors specified for the routine malloc(3).

The fopen() function may also fail and set errno for any of the errors specified for the routine open(2).

The fdopen() function may also fail and set errno for any of the errors specified for the routine fcntl(2).

The freopen() function may also fail and set errno for any of the errors specified for the routines open(2), fclose(3) and fflush(3).

The first indicates that the mode string passed to fopen was not one of the valid modes. We need to look at the man pages for the other functions to see the remaining errors. Look at the man page for open to see more errors, which we can use to do something like this:

#include <stdio.h>
#include <stdlib.h>   /* needed for exit */
#include <errno.h>

int main (int argc, char **argv) {
  FILE *infile;
  int someint;
  char s[1000];

  if (argc == 1) {
    infile = stdin;
  }
  else {
    infile = fopen (argv[1], "r+");
    if (infile == NULL) {
      if (errno == ENOENT) {
        printf ("%s does not exist.  Create it? (y/n) ", argv[1]);
        /* fgets limits the read to the size of s; gets cannot */
        if (fgets (s, sizeof (s), stdin) != NULL && s[0] == 'y') {
          fclose (fopen (argv[1], "w"));
          infile = fopen (argv[1], "r");
        }
        else {
          printf ("Exiting program.\n");
          exit (-1);
        }
      }
      else {
        perror ("main trying to open first argument");
        exit (-1);
      }
    }
  }

  fscanf (infile, "%d", &someint);
  fclose (infile);
  exit (0);
}

If the file does not exist, this program prompts the user to find out if the user would like to create it. If the user wants to create it, it is created by opening it in write mode (which automatically creates the file if it doesn't exist), closing the file and reopening it in read mode. (This is rather silly since now we have an empty file which won't be very useful for reading, but it shows you how to use the error values.)

There should be more error checking in this example. We should check the result of the attempts to open the file in write mode and the fclose calls, and check the return value of fscanf. As you might realize, good C programs devote a large percentage of their lines of code to doing error checking and error recovery.

sprintf, sscanf

Another variation of printf and scanf allows reading from and writing to strings. Here, the first argument is a string rather than a file pointer. sprintf is like printf except the result is placed into a string variable instead of being printed. This is useful for building larger strings out of smaller ones and also for converting integers to strings. Be sure to allocate memory for the result before calling sprintf:

char date[100];
int month, day, year;
sprintf (date, "%d/%d/%d", month, day, year);

The above code sets the value of date (albeit with uninitialized integer values).

sscanf does the reverse. Given a string as the first argument, it will extract pieces of the string and put the results into variables. The following code sets the values of month, day, and year from the string constant in date.

char date[] = "1/24/2001";
int month, day, year;
sscanf (date, "%d/%d/%d", &month, &day, &year);
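
The two directions can be combined into a round trip, sketched below. The helper names format_date and parse_date are hypothetical. The sketch also swaps in snprintf, a C99 variant of sprintf that takes the size of the destination, so that a too-small buffer is detected rather than overrun.

```c
#include <stdio.h>

/* Hypothetical helper: writes month/day/year into buffer.  snprintf is
   used instead of sprintf so the write never runs past size bytes.
   Returns 1 if the whole date fit, 0 otherwise. */
int format_date (char *buffer, size_t size, int month, int day, int year) {
  int needed = snprintf (buffer, size, "%d/%d/%d", month, day, year);
  return needed >= 0 && (size_t) needed < size;
}

/* Hypothetical helper: the reverse direction.  Returns 1 only if all
   three fields were assigned, which is what sscanf's return value
   reports. */
int parse_date (const char *text, int *month, int *day, int *year) {
  return sscanf (text, "%d/%d/%d", month, day, year) == 3;
}
```

As with fscanf, the return value of sscanf is the only way to know whether month, day, and year actually received values.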

Example Program

Here is a sample program that manages a library (like the program from lecture 4) but this time it reads and writes the library to a file. Also note the additional error checking that is done to make the program more robust in case something happens to the library file.

/*                  
     Filename: book.c
       Author: Good C. Memory
   Definition: This program uses malloc( ) to allocate memory for structures
               to store the book records, and free( ) to deallocate them.
               This program is a modified version of Program 16-6 in Pure C 
               Programming by Amir Afzal.

*/
/* preprocessor commands */
#include <stdio.h>         /* including the stdio.h header file for I/O */
#include <stdlib.h>        /* including the stdlib.h header file for malloc and free */
#include <string.h>        /* including the string.h header file for strlen */
#include <errno.h>         /* Include to test error codes */ 

#define EXTRABOOKS 100  /* The maximum number of books that may be added in one
			     run of the program. */

#define BADFILEFORMAT -1 /* Indicates the library file has a bad format */
#define OPENFAILURE -2 /* Indicates the library file could not be opened */
#define OUTOFMEMORY -3 /* Indicates we were unable to allocate enough memory */

/* Define maximum lengths of the fields */
#define TITLE_LENGTH 80
#define AUTHOR_LENGTH 80
#define CATEGORY_LENGTH 40

/* declaring a structure type */
  typedef   struct  {
    char  title [TITLE_LENGTH + 1] ;        /* the title of the book */
    char  author [AUTHOR_LENGTH + 1] ;       /*  the author of the book */
    char  category [CATEGORY_LENGTH + 1] ;     /* the category of the book */
  } book;
  
/* declaring the library type */
typedef struct {
  book **books;  /* The list of books in the library */
  int numBooks;  /* number of books in the library */
  int maxBooks;  /* The size of the books array */
} library;

  int  menu  ( void ) ;       /* function menu prototype */
  void  add ( library *theLibrary ) ;        /* function add prototype */
  void  list  (library *theLibrary ) ;     /* function list prototype */
  library *readBooks (char *libraryFile);
  library *allocateLibrary (int numBooks);
  void quitLibrary (char *libraryFile, library *theLibrary);

int  main(int argc, char **argv )
{
  char *libraryFileName;    /* Name of the library file */
  library *theLibrary;      /* the library being manipulated */
  int choice;               /* Value returned from the menu */

  /* Make sure the book file list argument is supplied. */
  if (argc == 1) {
    fprintf (stderr, "Usage: book book-list-file\n");
    fprintf (stderr, "  where book-list-file is the name of a file \n");
    fprintf (stderr, "  containing a book list. \n");
    exit (-1);
  }

  /* Read in the book list. */
  libraryFileName = argv[1];
  theLibrary = readBooks (libraryFileName);

  do
  {
    choice = menu () ;    /* show the main menu */
    switch (choice )      /* check user selection */
    {
      case 0 :        /* the exit choice */
   	quitLibrary (libraryFileName, theLibrary);
        break ;
      
      case 1:             /* the add choice */
   	add (theLibrary);
        break ;
	
      case 2:       /* the list choice */
        list ( theLibrary ) ;         /* list  books */
        break ;
	
      default:
        printf ( "Wrong selection. Try again.\n" ) ;  /* error message */
    }       /* end of the switch */
    
  } while (  choice != 0  ) ;           /* end of the do-while */
  
  exit (0);
}       /* end of the main */

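/*
  menu    This function shows the main menu, reads the user selection,
      and returns the user selection to the caller.
*/
int  menu ()
{
  int choice = -1;      /* user selection; stays -1 if the input is not a number */
  int c;                /* used to discard the rest of the input line */
    
  printf  (  "\n<< Super Duper Menu >>\n" ) ;
  printf  (  "    0: Exit\n" ) ;
  printf  (  "    1: Add a book\n" ) ;
  printf  (  "    2: List all books\n" ) ;
  printf  ( "Enter your selection: \n" ) ;
  scanf ("%d", &choice ) ;
  /* Discard the rest of the line, including the newline.  (gets is unsafe
     because it cannot limit how much it reads.) */
  while ((c = getchar ()) != '\n' && c != EOF)
    ;
  return choice;
}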
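/*
  readLine  Reads one line of input into buffer, which holds size bytes,
            and removes the trailing newline.  Input that does not fit is
            discarded.  This is used in place of gets, which cannot limit
            how much it reads.
*/
static void readLine (char *buffer, int size) {
  int length = 0;      /* number of characters stored so far */
  int c;               /* the next character of input */

  while ((c = getchar ()) != '\n' && c != EOF) {
    if (length < size - 1) {
      buffer[length++] = (char) c;
    }
  }
  buffer[length] = '\0';
}

/*
  add     This function reads a new book from the user and stores it in
          the library, if there is room for it.
*/

void  add  (library *theLibrary) {
  book *temp ;         /* holds the book as it is read in */

  if ( theLibrary->numBooks < theLibrary->maxBooks ) {
    /* if MAX number of books not reached */
    temp = ( book * ) malloc ( sizeof ( book ) ) ;
    if  ( temp == NULL ) {
      /* if memory allocation failed */
      fprintf (stderr, "The book list is full.  You cannot add more books.\n" );
      theLibrary->maxBooks = theLibrary->numBooks;
    }
    else {
      printf  (  "\n<< Add MODE >>\n" ) ; /* display title */
      printf (  "Title?  " ) ;      /* ask for book title */
      readLine  (  temp->title, TITLE_LENGTH + 1 ) ;
      printf (  "Author?  " ) ;         /* ask for book author */
      readLine  (  temp->author, AUTHOR_LENGTH + 1  ) ;
      printf (  "Category?  " ) ;       /* ask for book category */
      readLine  ( temp->category, CATEGORY_LENGTH + 1  ) ;
      theLibrary->books [theLibrary->numBooks] = temp;       /* add a book record */
      theLibrary->numBooks ++ ;    /* add one to the count of the books */
    }
  } 
  else {
    fprintf (stderr, "The book list is full.  You cannot add more books.\n" );
  }
}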

/* 
  list      This function lists the books in the library. 
            It receives the pointer to the library structure
*/
void  list ( library *theLibrary) {
  book  **books = theLibrary->books;
  int size = theLibrary->numBooks;
  int  index ;                /* declare a loop counter */
  book *next;
    
  printf (  "\n<< BOOK LIST >>\n"  ) ;  /* display the title */   
  for  ( index = 0 ; index < size ; ++index) {
    next = books[index];
    printf ("%d: %s by %s --- %s\n", index+1, 
	    next-> title, next-> author, next-> category ) ;
  }
    
}

/* Read the books in from a file.
   Parameters: libraryFile - the name of the file containing a library
   Returns: a new structure containing the contents of the file.  If the file 
     does not exist, an empty library is returned.  The caller should free the 
     memory.
   Exits the program with OPENFAILURE if the library file exists but cannot be
   opened.  Exits the program with BADFILEFORMAT if the file is not formatted as
   a library file.  A library file should have an integer count alone on the
   first line.  Each book uses three lines.  The first is the title, the second
   is the author, the last is the category.  */

library *readBooks (char *libraryFile) {
  FILE *bookFile;           /* File containing the book list */
  int numRead = 0;          /* Number of books read in from the file */
  char error[500];
  book *nextBook;           /* Next book to read in */
  int numBooks;             /* Number of books in the file. */
  library *newLibrary = NULL;  /* The library structure being created. */

  /* Open the file for reading */
  bookFile = fopen (libraryFile, "r");

  /* Check if the open succeeded */
  if (bookFile == NULL) {

    /* Check if the file exists */
    if (errno == ENOENT) {
      /* It does not exist.  Create an empty library */
      return allocateLibrary (0);
    }
    else {
      /* It exists but it couldn't be opened.  Print an error and exit. */
      sprintf (error, "Cannot open the library %s", libraryFile);
      perror (error);
      exit (OPENFAILURE);
    }
  }

  /* The first field in the file is the number of books currently in the
     library. Make sure the input succeeded. */
  if (fscanf (bookFile, "%d\n", &numBooks) != 1) {
    /* An integer was not input.  Check if the file is empty. */
    if (feof (bookFile)) {
      /* The file is empty. Create an empty library. */
      fclose (bookFile);
      return allocateLibrary (0);
    }
    else {
      /* The file is not empty, but the first input was not an integer. */
      fprintf (stderr, "%s has the wrong format.  The first field is \n", libraryFile);
      fprintf (stderr, "expected to be an integer indicating the number of \n");
      fprintf (stderr, "books in the library. \n");
      exit (BADFILEFORMAT);
    }
  }

  /* We got the library size.  Now we create our structure. */
  newLibrary = allocateLibrary (numBooks);

  /* Read the books from the library and fill in the new structure. */
  for (; numRead != newLibrary->numBooks; numRead++) {
    /* Allocate memory for the next book */
    newLibrary->books[numRead] = (book *) malloc (sizeof (book));
    nextBook = newLibrary->books[numRead];
    if (nextBook == NULL) {
      fprintf (stderr, "Insufficient memory to read all the books!\n");
      exit (OUTOFMEMORY);
    }

    /* Read in the next entry checking that all the reads succeed */
    if (fgets (nextBook->title, TITLE_LENGTH, bookFile) != NULL &&
        fgets (nextBook->author, AUTHOR_LENGTH, bookFile) != NULL &&
	fgets (nextBook->category, CATEGORY_LENGTH, bookFile) != NULL) {
      /* All the fields were read.  Get rid of the newline at the end of each
	 field.  Then go to the next entry */
      nextBook->title[strlen (nextBook->title) - 1] = '\0';
      nextBook->author[strlen (nextBook->author) - 1] = '\0';
      nextBook->category[strlen (nextBook->category) - 1] = '\0';
      continue;
    }
     
    /* There was an error reading in the fields. */
    if (feof (bookFile)) {
      /* Premature end of file.  Throw out the incomplete entry and keep
         only the books that were read completely. */
      free (nextBook);
      newLibrary->numBooks = numRead;
      break;
    }
    else {
      /* Some other error.  Report it and quit. */
      perror ("Error reading library");
      exit (BADFILEFORMAT);
    }
  }

  fclose (bookFile);
  return newLibrary;
}

/* Allocate memory to hold a library
   Parameters: numBooks - the minimum number of books the library should hold 
   Return the new library structure.  This is dynamically allocated memory that
     must be freed elsewhere
   Exits with OUTOFMEMORY if the memory allocation fails.
*/
library *allocateLibrary (int numBooks) {
  library *newLibrary = (library *) malloc (sizeof (library));

  if (newLibrary == NULL) {
    fprintf (stderr, "Insufficient memory to create the library.\n");
    exit (OUTOFMEMORY);
  }

  newLibrary->numBooks = numBooks;
  newLibrary->maxBooks = numBooks + EXTRABOOKS;

  /* Allocate the library to be larger than the number of books in the file
     currently. */
  newLibrary->books = (book **) calloc (newLibrary->maxBooks, sizeof (book *));
  /* Check if there was enough memory for all the extra books. */
  if (newLibrary->books == NULL) {
    /* Not enough memory.  Try making it just big enough for the current list. */
    newLibrary->books = (book **) calloc (newLibrary->numBooks, sizeof (book *));
    if (newLibrary->books == NULL) {
      fprintf (stderr, "Insufficient memory to create the book list.\n");
      exit (OUTOFMEMORY);
    }
    fprintf (stderr, "Warning:  The book list is full.  You will not be able to add new books.\n");
    newLibrary->maxBooks = numBooks;
  }

  return newLibrary;
}

/* Exit the library program.  This writes the library to a file and frees all
   the memory.
   Parameters: libraryFile - the name of the file to write the library to
               theLibrary - the data to write to the file 
*/
void quitLibrary (char *libraryFile, library *theLibrary) {
  FILE *bookFile;           /* File containing the book list */
  int index;		    /* Used to walk the array. */

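  /* Write the library out to the file.  Free the memory as we go. */
  bookFile = fopen (libraryFile, "w");
  if (bookFile == NULL) {
    /* Without this check, the fprintf calls below would use a NULL file. */
    perror ("Cannot write the library file");
    exit (OPENFAILURE);
  }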
	
  /* Write out the new list */
  fprintf (bookFile, "%d\n", theLibrary->numBooks);
  for ( index = 0 ; index < theLibrary->numBooks; ++index ) {
    fprintf (bookFile, "%s\n", theLibrary->books[index]->title);
    fprintf (bookFile, "%s\n", theLibrary->books[index]->author);
    fprintf (bookFile, "%s\n", theLibrary->books[index]->category);
    free ( theLibrary->books[index] ) ;
  }
  fclose (bookFile);
  free (theLibrary->books);
  free (theLibrary);
}