Lecture 3 - Multithreading


Agenda

Announcements

Approaches to Parallelism

Parallel programs can be categorized by how the cooperating processes communicate with each other:

  • Shared Memory - some variables are accessible from multiple processes. Reading and writing these values allow the processes to communicate.
  • Message Passing - communication requires explicit messages to be sent from one process to the other when they need to communicate.
  • These are functionally equivalent given operating system support. For example, one can write message-passing software using shared memory constructs, and one can simulate a shared memory by replacing accesses to non-local memory with a series of messages that access or modify the remote memory.

    We will look at shared memory parallelism using threads first.

    Brief Intro to POSIX threads

    Multithreading usually allows for the use of shared memory. Many operating systems provide support for threads, and a standard interface has been developed: POSIX Threads or pthreads.

    An online tutorial is available here

    A google search for "pthread tutorial" yields many others.

    Pthreads are available on bullpen.

    The basic idea is that we can create and destroy threads of execution in a program, on the fly, during its execution.

    Start with a look at a pthreads "Hello, world" program:

    pthreadhello.tgz Also available in /home/faculty/terescoj/shared/cs338/lect03.

    The most basic functionality involves the creation and destruction of threads:

  • pthread_create(3THR) - Well, this creates a new thread. It takes 4 arguments. The first is a pointer to a variable of type pthread_t. Upon return, this contains a thread identifier that is used later in pthread_join(). The second is a pointer to a pthread_attr_t that specifies thread creation attributes. In the pthreadhello program, we pass in NULL, which specifies the system default attributes. The third argument is a pointer to a function that will be called when the thread is started. This function must take a single parameter of type void * and return void *. The fourth parameter is the pointer that will be passed as the argument to the thread function.
  • pthread_exit(3THR) - This causes the calling thread to exit. This is called implicitly if the thread function returns. Its argument is a return status value, which can be retrieved by pthread_join().
  • pthread_join(3THR) - This causes the calling thread to block until the thread with the identifier passed as the first argument to pthread_join() has exited. The second argument is a pointer to a location where the return status passed to pthread_exit() can be stored. In the pthreadhello program, we pass in NULL, and hence ignore the value.
  • Prototypes for pthread functions are in pthread.h and programs need to link with libpthread.a (use -lpthread at link time). When using the Sun compiler, the -mt flag should also be specified to indicate multithreaded code.

    It is also a good idea to do some extra initialization, to make sure the system will allow your threads to make use of all available processors. It may, by default, allow only one thread in your program to be executing at any given time. If your program will create up to n concurrent threads, you should make the call:

      pthread_setconcurrency(n+1);
    

    somewhere before your first thread creation. The "+1" is needed to account for the original thread plus the n you plan to create.

    You may also want to specify actual attributes as the second argument to pthread_create(). To do this, declare a variable for the attributes:

      pthread_attr_t attr;
    

    and initialize it with:

      pthread_attr_init(&attr);
    

    and set parameters on the attributes with calls such as:

      pthread_attr_setscope(&attr, PTHREAD_SCOPE_PROCESS);
    

    I recommend the above setting for threads in Solaris.

    Then, you can pass in &attr as the second parameter to pthread_create().

    Any global variables in your program are accessible to all threads. Local variables are directly accessible only to the thread in which they were created, though the memory can be shared by passing a pointer as part of the last argument to pthread_create().

    A slightly more interesting example:

    proctree_threads.tgz Also available in /home/faculty/terescoj/shared/cs338/lect03.

    This example builds a "tree" of threads to a depth given on the command line. It includes calls to pthread_self(). This function returns the thread identifier of the calling thread.

    Brief Intro to Critical Sections

    As CS 432 veterans can tell you, concurrent access to shared variables can be dangerous.

    Consider this example:

    pthread_danger.tgz Also available in /home/faculty/terescoj/shared/cs338/lect03.

    Run it with one thread, and we get 100000. What if we run it with 2 threads? On a multiprocessor, it is going to give the wrong answer! Why?

    The answer is that we have concurrent access to the shared variable counter. Suppose that two threads are each about to execute counter++, what can go wrong?

    counter++ really requires three machine instructions: (i) load a register with the value of counter's memory location, (ii) increment the register, and (iii) store the register value back in counter's memory location. Even on a single processor, the operating system could switch the process out in the middle of this. With multiple processors, the statements really could be happening concurrently.

    Consider two threads running the statements that modify counter:
    Thread A Thread B
    A1 R0 = counter; B1 R1 = counter;
    A2 R0 = R0 + 1; B2 R1 = R1 + 1;
    A3 counter = R0; B3 counter = R1;

    Consider one possible ordering: A1 A2 B1 A3 B2 B3 , where counter=17 before starting. Uh oh.

    What we have here is a race condition. We need to make sure that when one process starts modifying counter, that it finishes before the other can try to modify it. This requires synchronization of the processes.

    We need to make those statements that increment counter atomic. We say that the modification of counter is a critical section.

    There are many solutions to the critical section problem that take up over a week of CS 432. But for our purposes, at least for now, it is sufficient to recognize the problem, and use available tools to deal with it.

    The pthread library provides a construct called a mutex that allows us to ensure that only one thread at a time is executing a particular block of code. We can use it to fix our "danger" program:

    pthread_nodanger.tgz Also available in /home/faculty/terescoj/shared/cs338/lect03.

    We declare a mutex like any other shared variable. It is of type pthread_mutex_t. Four functions are used:

  • pthread_mutex_init(3THR) - initialize the mutex and set it to the unlocked state.
  • pthread_mutex_lock(3THR) - request the lock on the mutex. If the mutex is unlocked, the calling thread acquires the lock. Otherwise, the thread is blocked until the thread that previously locked the mutex unlocks it.
  • pthread_mutex_lock(3THR) - unlock the mutex.
  • pthread_mutex_destroy(3THR) - destroy the mutex (clean up memory).
  • You may ask why it's safe to access the mutex concurrently when it's really just a shared variable itself. This is because the pthread library and the operating system make use of some techniques (again, discussed in detail in CS 432) to guarantee that only one thread can have the lock at a time.

    You will need all of this for the Homework/Lab 2 program.

    Bag of Tasks Paradigm

    One more item you will need for the Homework/Lab 2 program is the idea of a "bag of tasks" paradigm of parallel computation. The idea is that a collection of cooperating processes (or in your case, pthreads) are started, and each takes a unit of work from the "bag," processes it, then takes another unit of work. Each worker thread continues until there is no work left. There is probably a critical section when the worker threads are choosing their next unit of work. Take care that two threads can't get the same unit of work.

    More C

    There are a few other things you'll need to be aware of for the Homework/Lab 2 program:

  • There is no "string" type in C. You just use an array of char. See string(3C) for details on some useful string manipulation functions.
  • Dynamic memory management is done by the malloc(3C) and free(3C) functions. There is no garbage collection, so you need to make sure you return any memory you allocate.
  • You can use char variables and constant as ints, if you'd like.
  • You may find the functions toupper(3C) or tolower(3C) useful.