Computer Science 335
Parallel Processing and High Performance Computing

Fall 2021, Siena College

Lab 3: Processes and MPI Introduction
Due: 4:00 PM, Thursday, September 30, 2021

This brief lab exercise will introduce you to computing with multiple processes. First, we will see an example using the Unix fork system call. Then we will run our first message passing programs.

You must work individually on this lab.

Learning goals:

  1. To see the basics of how processes are created and can communicate with each other in a Unix environment.
  2. To run an MPI application.

Getting Set Up

The link to set up your GitHub repository processes-lab-yourgitname for this lab is in Canvas.

Please answer the lab questions in the README.md file of your repository.

Introduction

Our first mechanism for introducing parallelism into our programs is to have multiple processes in execution that cooperate to solve a problem. Those processes will not share any memory - in fact, they will often be running on different physical pieces of hardware. When those processes need to communicate with each other (which they'll almost always need to do to perform a meaningful parallel computation), they will send messages to each other.

This approach is called the message passing paradigm. It is very flexible in that message passing programs can be executed by creating multiple processes on the same physical system (usually one with multiple processors/cores), or by creating them on different systems that can communicate across some network medium.

Some characteristics of the message passing paradigm:

Creating Unix Processes

Unix programs can use fork() to create new processes.

The Unix system call fork() duplicates a process. The child is a copy of the parent - in execution at the same point, the statement after the return from fork().

The return value indicates if you are child or parent.

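0 means you are the child, >0 means you are the parent (the value is the child's process ID), and -1 means failure (limit reached, permission denied), in which case no child was created.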

Example C program:

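/* needs <stdio.h>, <sys/types.h>, and <unistd.h> */
pid_t pid = fork();
if (pid < 0) {
  perror("fork");   /* fork failed; no child was created */
}
else if (pid > 0) {
  /* parent stuff; pid is the child's process ID */
}
else {
  /* child stuff */
}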

A more complete program that uses fork() along with three other system calls (wait(), getpid(), and getppid()) is in the forking example in your repository for this lab.

Processes created using fork() do not share context, and must allocate shared memory explicitly, or rely on a form of message passing to communicate.

Run the forking program on noreaster.teresco.org.

Question 1: How many times did the program print the output from line 20? Why? (2 points)

Question 2: How many times did the program print the output from line 28? Why? Which line was printed by the parent and which by the child? (3 points)

Question 3: Where did the number printed by line 42 come from? (1 point)

Remember that the advantage of using processes such as these instead of threads is that the processes could potentially be running on different systems. But if they are going to cooperate, they will need to communicate:

Sockets and pipes provide only very rudimentary interprocess communication. Each "message" sent through a pipe or across a socket has a unique sender and unique receiver and is really nothing more than a stream of bytes. The sender and receiver must add any structure to these communications.

The sockets example is a very simplistic example of two processes that can communicate over raw sockets. It is included mainly to show you that you don't want to be doing this if you can help it.

To run this program, use the make command to build the two executables: client and server. They'll both have to be run, so start two terminals. Pick a port number (something over 1000) that will be used so the sockets in each program can connect up. Suppose you pick 5423. In one terminal, start the server:

./server -p 5423

It will then be ready to accept connections through the socket on port 5423 from clients that connect on that same port.

In the other terminal, start the client:

./client -p 5423 -h localhost

You will see messages in both terminals. The client will prompt for what you want to do next. That will be specified by a single character. The client will send that character to the server over the socket using the write system call, and it will be read from the socket by the server using the read system call. Certain characters will cause additional information to be sent through the socket. 'a' sends an array of integer values. 'i' resends the identification string that was sent to the server when the client started. 'q' will cause the client to terminate.

These are processed by the switch/case constructs near the bottom of the code for the client and the server.

Practice Program: Add a new command to the client and have it handled by the server. The command should be triggered by the character 'r' and should send a number and a string. In response, the server should print the string that number of times. (10 points)

For many applications, this primitive interface is unreasonable. We want something at a higher level. Message passing libraries have evolved to meet this need.

Message Passing Libraries

Message passing is supported through a set of library routines. This allows programmers to avoid dealing with the hardware directly. Programmers want to concentrate on the problem they're trying to solve, not worry about writing to special memory buffers or making TCP/IP calls or even creating sockets.

Examples: P4, PVM, MPL, MPI, MPI-2, etc. MPI has become an industry standard, under the guidance of the MPI Forum.

We will be looking at MPI in detail for the next couple of weeks. For today, you will be considering an MPI-based "Hello, World" program, mpihello.

Compile the program on noreaster.teresco.org with the make command. MPI programs need to know about additional libraries, so they are often compiled with a different command (than, say, gcc) that is aware of the extra MPI libraries.

Question 4: What compiler command is used when you run make for the mpihello program? (1 point)

You should now have an executable mpihello. Run it.

Question 5: What is the output? (1 point)

Running the program from the standard command line, where you just type the name of the program you wish to run, results in an MPI program that has a single process.

The mechanism to run an MPI program and launch multiple processes is somewhat system-dependent, but often involves a command such as mpirun or mpiexec. On noreaster, the command is mpirun. To run two processes:

mpirun -np 2 ./mpihello

Question 6: What is the output? If you run repeatedly, do you always get the same output? Why or why not? (2 points)

Now run with increasing powers of two for the number of processes.

Question 7: How many processes can you launch before it takes more than about 10 seconds to launch them? About how long did the first run that exceeded 10 seconds take? (2 points)

Question 8: What is the range of rank values for a run of a given size? (1 point)

Notice that the first executable statement in an MPI program's main function is a call to the MPI_Init function, and the last is a call to the MPI_Finalize function. We will take care not to have any executable code outside that block.

Question 9: Briefly describe the function of each of the other MPI functions called in this example. (3 points)

Practice Program: Add code to the mpihello example that prints a message "Hey, I'm the rank 0 process!" only on the process with rank 0, and a message "Wow, I'm the highest-ranked process!" on the process with the highest rank. (4 points)

Submission

Commit and push!

Grading

This assignment will be graded out of 30 points.

Feature                 Value   Score
Question 1              2
Question 2              3
Question 3              1
sockets enhancements    10
Question 4              1
Question 5              1
Question 6              2
Question 7              2
Question 8              1
Question 9              3
mpihello additions      4
Total                   30