Lecture 7 - Message Passing


Agenda

Announcements

Homework/Lab 4

What is Jacobi iteration?

What about distributed memory?

So far we have seen three ways to create a parallel program:

  1. Let the compiler do whatever it can completely automatically
  2. Create threads explicitly using pthreads
  3. Specify parallel sections using OpenMP

These all suffer from one significant limitation - the cooperating threads must be able to communicate through shared variables.

How can we design and run parallel programs to work when there is no shared memory available?

Message Passing

Message Passing - processors have local memory, communicate by sending and receiving messages, no direct access to off-processor memory

Cooperating Processes

Unix programs can use fork() to create new processes. CS 432 veterans remember this from their Cow Shell projects.

The Unix system call fork() duplicates a process. The child is a copy of the parent - in execution at the same point, the statement after the return from fork().

The return value indicates whether you are the child or the parent.

0 means you are the child, a positive value (the child's pid) means you are the parent, and -1 means the fork() failed (process limit reached, permission denied)

Example C program:

pid_t pid = fork();   /* duplicate this process */
if (pid) {
  /* nonzero return: we are the parent; pid is the child's process id */
  /* ... parent stuff ... */
}
else {
  /* zero return: we are the child */
  /* ... child stuff ... */
}

A more complete program that uses fork() along with three other system calls (wait(), getpid(), and getppid()) is here:

forking.tgz Also available in /home/faculty/terescoj/shared/cs338/lect07.

Processes created using fork() do not share context, and must allocate shared memory explicitly, or rely on a form of message passing to communicate.

We will not consider the Unix fork() method of multiprocessing much, but here's an example to show that, in fact, even global variables are not shared by processes created with fork():

what_shared.tgz Also available in /home/faculty/terescoj/shared/cs338/lect07.
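
Here is a minimal sketch of that idea (not the actual what_shared.tgz code): a global variable modified in the child is unchanged in the parent, because fork() gives the child its own copy of the address space.

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int counter = 0;                 /* a global variable */

int main() {
  pid_t pid = fork();
  if (pid == 0) {
    counter = 100;               /* the child modifies its own copy */
    printf("child sees counter = %d\n", counter);
    exit(0);
  }
  wait(NULL);                    /* let the child finish first */
  printf("parent sees counter = %d\n", counter);   /* still 0 */
  return 0;
}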

Remember that the advantage of using processes such as these instead of threads is that the processes could potentially be running on different systems. But if they are going to cooperate, they will need to communicate:

Sockets and pipes provide only very rudimentary interprocess communication. Each "message" sent through a pipe or across a socket has a unique sender and unique receiver and is really nothing more than a stream of bytes. The sender and receiver must add any structure to these communications.

Here's a very simplistic example of two processes that can communicate over raw sockets. It is included mainly to show you that you don't want to be doing this if you can help it:

socket.tgz Also available in /home/faculty/terescoj/shared/cs338/lect07.
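
Even simpler than sockets, a pipe between a parent and its forked child shows the same limitation. Here is a minimal sketch (separate from socket.tgz, with an illustrative buffer size): everything that travels through the pipe is an unstructured stream of bytes, and any notion of a "message" must be imposed by the two ends.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
  int fd[2];
  char buf[64];
  pipe(fd);                            /* fd[0] = read end, fd[1] = write end */
  if (fork() == 0) {
    close(fd[1]);
    int n = read(fd[0], buf, sizeof(buf));   /* just bytes, no structure */
    printf("child received %d bytes: %.*s\n", n, n, buf);
    return 0;
  }
  close(fd[0]);
  write(fd[1], "hello", 5);            /* the sender imposes any format */
  close(fd[1]);
  wait(NULL);
  return 0;
}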

For many applications, this primitive interface is unreasonable. We want something at a higher level. Message passing libraries have evolved to meet this need.

Message Passing Libraries

Point-to-Point Communication

Basic send and receive operations

   P0: send(addr,len,dest,tag)

   P1: receive(addr,max_len,src,tag,rec_len)

These are basic components in any message-passing implementation. There may be others introduced by a specific library.
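
As a preview of how this looks in a real library, here is roughly how those generic parameters line up with MPI's blocking point-to-point calls, which we will meet below (a sketch, not a complete program; MPI also adds a datatype and a communicator, and the received length is recovered from a status object):

   P0: MPI_Send(addr, len, MPI_CHAR, dest, tag, MPI_COMM_WORLD);

   P1: MPI_Recv(addr, max_len, MPI_CHAR, src, tag, MPI_COMM_WORLD, &status);
       MPI_Get_count(&status, MPI_CHAR, &rec_len);   /* actual received length */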

Point-to-Point Communication - Blocking

Simple semantics: a blocking send does not return until the message buffer is safe to reuse, and a blocking receive does not return until the expected message has arrived.

But beware of deadlock when using blocking routines!!

              Proc 0          Proc 1
        -------------------------------------
            bsend(to 1)    bsend(to 0)
           brecv(from 1)  brecv(from 0)

If both processors' send buffers cannot be copied into system buffers, or if the calls are strictly synchronous, the calls will block until the corresponding receive call is made... Neither can proceed...deadlock...

Possible solutions - reorder to guarantee matching send/receive pairs, or use nonblocking routines...
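
One way to picture the reordering fix, using the MPI calls introduced below (partner, n, and the buffers are illustrative names): pair the processes so that whenever one side sends first, the other receives first.

if (rank % 2 == 0) {          /* even ranks: send, then receive */
  MPI_Send(sendbuf, n, MPI_DOUBLE, partner, tag, MPI_COMM_WORLD);
  MPI_Recv(recvbuf, n, MPI_DOUBLE, partner, tag, MPI_COMM_WORLD, &status);
} else {                      /* odd ranks: receive, then send */
  MPI_Recv(recvbuf, n, MPI_DOUBLE, partner, tag, MPI_COMM_WORLD, &status);
  MPI_Send(sendbuf, n, MPI_DOUBLE, partner, tag, MPI_COMM_WORLD);
}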

Point-to-Point Communication - Nonblocking

Example:

              Proc 0          Proc 1
        -------------------------------------
            nbsend(to 1)    nbsend(to 0)
           nbrecv(from 1)  nbrecv(from 0)
           compute......   compute......
             waitall         waitall
            use result      use result

Note the potential overlap of computation and communication

Deadlock less likely but still must be careful - burden of avoiding deadlock is on the programmer in a message passing model of parallel computation
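
In MPI terms (see below), the nonblocking pattern in the table looks roughly like this sketch, with illustrative buffer and partner names:

MPI_Request reqs[2];
MPI_Status stats[2];
MPI_Isend(sendbuf, n, MPI_DOUBLE, partner, tag, MPI_COMM_WORLD, &reqs[0]);
MPI_Irecv(recvbuf, n, MPI_DOUBLE, partner, tag, MPI_COMM_WORLD, &reqs[1]);
/* compute on data not involved in the pending messages... */
MPI_Waitall(2, reqs, stats);
/* only now is it safe to reuse sendbuf or to read recvbuf */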

Collective Communication

Using the collective communication operations the library provides is generally better than trying to implement the same patterns yourself with point-to-point messages. The message passing library can often do it more efficiently by taking advantage of low-level functionality.
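
For example, a broadcast from one process to all the others could be written as a loop of point-to-point sends, but the collective version is simpler to call and the library is free to implement it with a tree of messages rather than a linear sequence. A sketch with illustrative variable names:

/* "by hand" broadcast from rank 0, one send per destination: */
if (rank == 0) {
  for (int p = 1; p < size; p++)
    MPI_Send(data, n, MPI_INT, p, tag, MPI_COMM_WORLD);
} else {
  MPI_Recv(data, n, MPI_INT, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

/* the same effect with the collective: every process makes the same call */
MPI_Bcast(data, n, MPI_INT, 0, MPI_COMM_WORLD);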

Message Passing Interface (MPI)

MPI Terminology
MPI Simple Program - basic MPI functions

You already saw and executed a simple MPI program back in the first lab. Here's that code:

mpihello.tgz Also available in /home/faculty/terescoj/shared/cs338/lect07.

Remember that you will need to run this with mpirun if you want to get more than one process!

MPI calls and constructs in the simple program:

The model of parallelism is very different from what we have seen. All of our processes exist for the life of the program. We are not allowed to do anything before MPI_Init() or after MPI_Finalize(). We need to think in terms of a number of copies of the same program all starting up at MPI_Init().
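
A minimal sketch of that structure, along the lines of the hello example (though not necessarily identical to the mpihello.tgz code):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
  int rank, size;
  MPI_Init(&argc, &argv);                  /* nothing MPI-related before this */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* which process am I? */
  MPI_Comm_size(MPI_COMM_WORLD, &size);    /* how many processes in total? */
  printf("Hello from process %d of %d\n", rank, size);
  MPI_Finalize();                          /* and nothing MPI-related after this */
  return 0;
}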

MPI - Point-to-Point message types
MPI One Message Program - Blocking Message functions

A simple MPI program that sends a single message:

mpimsg.tgz Also available in /home/faculty/terescoj/shared/cs338/lect07.
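
A sketch of what the core of such a program might look like (not necessarily the actual mpimsg.tgz code): rank 0 sends one integer to rank 1 with the blocking calls. Run it with at least two processes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
  int rank, value = 0;
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) {
    value = 42;
    MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);     /* to rank 1, tag 0 */
  } else if (rank == 1) {
    MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    printf("process 1 received %d from process 0\n", value);
  }
  MPI_Finalize();
  return 0;
}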

MPI Ring Message Program - Nonblocking Message functions

A slightly more interesting MPI program that sends one message from each process:

mpiring.tgz Also available in /home/faculty/terescoj/shared/cs338/lect07.
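
A sketch of the ring pattern with nonblocking calls (not necessarily the actual mpiring.tgz code): each process sends its rank to the next process around the ring and receives from the previous one, deferring the waits so other work could overlap the communication.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
  int rank, size, incoming;
  MPI_Request reqs[2];
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  int next = (rank + 1) % size;            /* neighbor to send to */
  int prev = (rank + size - 1) % size;     /* neighbor to receive from */
  MPI_Isend(&rank, 1, MPI_INT, next, 0, MPI_COMM_WORLD, &reqs[0]);
  MPI_Irecv(&incoming, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, &reqs[1]);
  /* other computation could overlap with the communication here */
  MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
  printf("process %d received %d from process %d\n", rank, incoming, prev);
  MPI_Finalize();
  return 0;
}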