Fall 2017, Siena College

Lab 8: Collective Communication
Due: 11:59 PM, Wednesday, October 11, 2017

For this lab, you will learn about collective communication using MPI. Until now, we have used only point-to-point communication, i.e., sends and receives.

You may work alone or with a partner on this lab.

Getting Set Up

You will receive an email with a link to follow to set up your GitHub repository coll-lab-yourgitname for this lab. One member of the group should follow the link to set up the repository on GitHub, then email the instructor with the other group members' GitHub usernames so they can be granted access. This will allow all members of the group to clone the repository and commit and push changes to the origin on GitHub. At least one group member should make a clone of the repository to begin work.

Reading

Read Pacheco Ch. 3, Section 4, to learn about MPI's collective communication capabilities.

Improving RDS

Also recall your response to and our discussion of Exercise 1.3, and read Section 3.4.1. Such a mechanism can be used to improve the speed of the communication at the end of the RDS program from earlier labs.

Practice Program: In a file mpirds-tree.c, replace the (p-1)-step computation of the global sum with a tree-structured global sum, still using only point-to-point communication. You may use a structure like that in either Figure 3.6 or Figure 3.7, or one of your own design, as long as it has the same efficiency. (10 points)

For the questions below, assume you are running your original or improved programs with 512 processes. You don't need to actually run them; just analyze your code.

Question 1: How many total messages are sent and received in your original program to compute the global sum? (1 point)

Question 2: How many "communication phases" are needed by your original program to compute the global sum? That is, what is the longest sequence of consecutive sends or receives by any one process? (1 point)

Question 3: How many addition operations are needed by your original program in the computation of the global sum? (1 point)

Question 4: How many total messages are sent and received in your improved program to compute the global sum? (1 point)

Question 5: How many "communication phases" are needed by your improved program to compute the global sum? That is, what is the longest sequence of consecutive sends or receives by any one process? (1 point)

Question 6: How many addition operations are needed by your improved program in the computation of the global sum? (1 point)

Of course, a good MPI programmer would have continued reading in Section 3.4.2 and realized that this can be done with one of MPI's higher-level collective communication routines: an `MPI_Reduce`.

Practice Program: In a file mpirds-reduce.c, replace the computation of the global sums with an appropriate reduction. (5 points)

An Example: Conway's Game of Life

The Game of Life was invented by John Conway in 1970. The game is played on a field of cells, each of which has eight neighbors (adjacent cells). A cell is either occupied (by an organism) or not. The rules for deriving a generation from the previous one are:

• Death: If an occupied cell has 0, 1, 4, 5, 6, 7, or 8 occupied neighbors, it dies (of either boredom or overcrowding, as the case may be).
• Survival: If an occupied cell has 2 or 3 occupied neighbors, it survives to the next generation.
• Birth: If an unoccupied cell has 3 occupied neighbors, it becomes occupied.

The game is very interesting in that complex patterns and cycles arise, and it is worth watching in one of the many online, graphical simulations of the game.

In the life subdirectory of your repository, there is a serial implementation of the game, and the mpilife subdirectory contains a version parallelized with MPI.

We will look at this example during class, starting with the serial version and discussing how it is parallelized. Note that it uses both non-blocking sends and receives and some collective communications (reductions and a barrier).

The MPI version of Conway's Game of Life uses a distributed data structure. Each process maintains its own subset of the computational domain, in this case just a number of rows of the grid. Other processes do not know about the data on a given process. Only the data that is needed to compute the next generation, a one-cell overlap, is exchanged between neighbors before each iteration.

More Collective Communication Examples

In addition to the examples in the text chapter, there are two examples to consider in the repository for this lab: mpicoll and mpiscatgath.

Question 7: For each of the collective communication MPI calls in these examples, briefly describe its effect, both in general terms (what the function does) and what it is doing specifically in each case. What data exists on each process before and after each call? (14 points)

Submitting

Your submission requires that all required deliverables are committed and pushed to the master branch of your repository on GitHub.

Grading

This assignment is worth 35 points, which are distributed as follows:

  Feature                                  Value   Score
  mpirds-tree.c program                      10
  Q1-Q6                                       6
  mpirds-reduce.c program                     5
  Q7: describe collective communication      14
  Total                                      35