Parallel programs can be categorized by how the cooperating processes communicate with each other: through shared memory or through message passing.
These are functionally equivalent given operating system support. For example, one can write message-passing software using shared memory constructs, and one can simulate a shared memory by replacing accesses to non-local memory with a series of messages that access or modify the remote memory.
We will look at shared memory parallelism using threads first.
Multithreading usually allows for the use of shared memory. Many operating systems provide support for threads, and a standard interface has been developed: POSIX Threads or pthreads.
Online tutorials are available; a Google search for "pthread tutorial" yields many.
Pthreads are available on bullpen.
The basic idea is that we can create and destroy threads of execution in a program, on the fly, during its execution.
Start with a look at a pthreads "Hello, world" program:
pthreadhello.tgz Also available in /home/faculty/terescoj/shared/cs338/lect03.
The most basic functionality involves the creation and destruction of threads:
Prototypes for pthread functions are in pthread.h and programs need to link with libpthread.a (use -lpthread at link time). When using the Sun compiler, the -mt flag should also be specified to indicate multithreaded code.
It is also a good idea to do some extra initialization, to make sure the system will allow your threads to make use of all available processors. It may, by default, allow only one thread in your program to be executing at any given time. If your program will create up to n concurrent threads, you should make the call:
pthread_setconcurrency(n+1);
somewhere before your first thread creation. The "+1" is needed to account for the original thread plus the n you plan to create.
You may also want to specify actual attributes as the second argument to pthread_create(). To do this, declare a variable for the attributes:
pthread_attr_t attr;
and initialize it with:
pthread_attr_init(&attr);
and set parameters on the attributes with calls such as:
pthread_attr_setscope(&attr, PTHREAD_SCOPE_PROCESS);
I recommend the above setting for threads in Solaris.
Then, you can pass in &attr as the second parameter to pthread_create().
Any global variables in your program are accessible to all threads. Local variables are directly accessible only to the thread in which they were created, though the memory can be shared by passing a pointer as part of the last argument to pthread_create().
A slightly more interesting example:
proctree_threads.tgz Also available in /home/faculty/terescoj/shared/cs338/lect03.
This example builds a "tree" of threads to a depth given on the command line. It includes calls to pthread_self(). This function returns the thread identifier of the calling thread.
As CS 432 veterans can tell you, concurrent access to shared variables can be dangerous.
Consider this example:
pthread_danger.tgz Also available in /home/faculty/terescoj/shared/cs338/lect03.
Run it with one thread, and we get 100000. What if we run it with 2 threads? On a multiprocessor, it is going to give the wrong answer! Why?
The answer is that we have concurrent access to the shared variable counter. Suppose that two threads are each about to execute counter++. What can go wrong?
counter++ really requires three machine instructions: (i) load a register with the value of counter's memory location, (ii) increment the register, and (iii) store the register value back in counter's memory location. Even on a single processor, the operating system could switch the process out in the middle of this. With multiple processors, the statements really could be happening concurrently.
Consider two threads running the statements that modify counter:
       Thread A               Thread B
  A1   R0 = counter;     B1   R1 = counter;
  A2   R0 = R0 + 1;      B2   R1 = R1 + 1;
  A3   counter = R0;     B3   counter = R1;
Consider one possible ordering: A1 A2 B1 A3 B2 B3, where counter=17 before starting. Both threads load 17, both increment their register to 18, and both store 18: one of the two increments is lost, and counter ends up at 18 instead of 19. Uh oh.
What we have here is a race condition. We need to make sure that when one process starts modifying counter, it finishes before the other can try to modify it. This requires synchronization of the processes.
We need to make those statements that increment counter atomic. We say that the modification of counter is a critical section.
There are many solutions to the critical section problem that take up over a week of CS 432. But for our purposes, at least for now, it is sufficient to recognize the problem, and use available tools to deal with it.
The pthread library provides a construct called a mutex that allows us to ensure that only one thread at a time is executing a particular block of code. We can use it to fix our "danger" program:
pthread_nodanger.tgz Also available in /home/faculty/terescoj/shared/cs338/lect03.
We declare a mutex like any other shared variable. It is of type pthread_mutex_t. Four functions are used: pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_unlock(), and pthread_mutex_destroy().
You may ask why it's safe to access the mutex concurrently when it's really just a shared variable itself. This is because the pthread library and the operating system make use of some techniques (again, discussed in detail in CS 432) to guarantee that only one thread can have the lock at a time.
You will need all of this for the Homework/Lab 2 program.
One more item you will need for the Homework/Lab 2 program is the idea of a "bag of tasks" paradigm of parallel computation. The idea is that a collection of cooperating processes (or in your case, pthreads) are started, and each takes a unit of work from the "bag," processes it, then takes another unit of work. Each worker thread continues until there is no work left. There is probably a critical section when the worker threads are choosing their next unit of work. Take care that two threads can't get the same unit of work.
There are a few other things you'll need to be aware of for the Homework/Lab 2 program: