Computer Science 335
Parallel Processing and High Performance Computing

Fall 2024, Siena College

Lab 10: OpenMP Practice
Due: 4:00 PM, Monday, December 2, 2024

In this lab, you will get a little practice parallelizing programs with OpenMP.

You may form groups of 2 if you wish. You should write your programs in your own (or your own group's) repository and answer this lab's questions in your own document as we work through the tasks together as a class.

Learning goals:

  1. To practice using OpenMP to parallelize programs.

Getting Set Up

In Canvas, you will find a link to follow to set up your GitHub repository for this lab, which will be named openmp-lab-yourgitname. Only one member of the group should follow the link to set up the repository on GitHub; the others should then request a link to be granted write access.

You may answer the lab questions in the README.md file in the top-level directory of your repository, upload a document with your responses to your repository, or add a link to a shared document containing your responses to the README.md file.

More Pi

No, we're not going to do Jacobi iteration again, but we will revisit a familiar problem.

Practice Program: Write an OpenMP version of the Monte Carlo approximation of pi. Call your program openmp_pi.c in the pi directory of your repository. See below for some tips. (20 points)
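
As a starting point, here is a minimal sketch of one way such a program might be structured, assuming each thread generates its own points with the thread-safe rand_r and a per-thread seed; the point count, seeding scheme, and variable names are illustrative, not required.

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(void) {

        long points_per_thread = 100000000;  /* illustrative default */
        long total_in = 0, total_points = 0;

        /* each thread runs the full loop on its own points; the
           reduction combines the per-thread tallies at the end */
    #pragma omp parallel reduction(+:total_in, total_points)
        {
            unsigned int seed = omp_get_thread_num() + 1;  /* per-thread seed */
            long in = 0;
            for (long i = 0; i < points_per_thread; i++) {
                double x = (double)rand_r(&seed) / RAND_MAX;
                double y = (double)rand_r(&seed) / RAND_MAX;
                if (x * x + y * y <= 1.0)
                    in++;
            }
            total_in += in;
            total_points += points_per_thread;
        }

        printf("pi is approximately %.10f\n",
               4.0 * (double)total_in / (double)total_points);
        return 0;
    }

Compile with gcc -fopenmp; the environment variable OMP_NUM_THREADS controls the thread count.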

Practice Program: Add appropriate timers to each of your Monte Carlo pi approximation programs (MPI, pthreads, OpenMP) to report the time taken to compute the entire approximation. Include all three programs in the timepi directory of your repository. (5 points)
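
In the OpenMP version, omp_get_wtime is the natural choice. One sketch of the pattern, wrapped around everything from setup through the final result:

    double start = omp_get_wtime();
    /* ... the entire approximation, including the parallel region ... */
    double elapsed = omp_get_wtime() - start;
    printf("elapsed time: %f seconds\n", elapsed);

MPI_Wtime plays the same role in the MPI version; for the pthreads version, a POSIX call such as clock_gettime before thread creation and after the joins works.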

Question 1: Run each of your Monte Carlo pi approximation programs (MPI, pthreads, OpenMP) with 16, 32, and 64 processes/threads on a Stampede3 production node (can be either interactive or batch, whatever you prefer) and 1 billion points per process/thread. Report the times taken in tabular form. (10 points)
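
If you opt for a batch run, a Slurm script along these lines is one possibility; the partition name, time limit, executable names, and argument conventions below are all assumptions to adapt to the Stampede3 user guide and to however your programs read their inputs.

    #!/bin/bash
    #SBATCH -J pi-timings          # job name
    #SBATCH -o pi-timings.%j.out   # output file (%j expands to the job id)
    #SBATCH -p skx                 # assumed partition; check the user guide
    #SBATCH -N 1                   # one node
    #SBATCH -n 64                  # enough tasks for the largest MPI run
    #SBATCH -t 01:00:00            # assumed time limit; adjust as needed

    # MPI version, launched with TACC's ibrun (executable/args are placeholders)
    ibrun -n 16 ./mpi_pi 1000000000

    # pthreads and OpenMP versions run within the single node
    ./pthreads_pi 16 1000000000
    OMP_NUM_THREADS=16 ./openmp_pi 1000000000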

Question 2: Discuss what these timings tell you about the relative efficiencies of the three versions of the program. (5 points)

Another Closest Pairs Variant

This section's program will be graded as a practice program and is worth 25 points.

In the cp directory of your repository, you will find a copy of the OpenMP closest pairs code from the class example. Your task is to remove the shared variable access by adding an array of structs where each thread can put its solution, with the main thread finding and reporting the overall winner. This mode of operation will be selected by passing the string noshared in argv[1]. The following steps will guide you; a self-contained sketch of the resulting pattern appears after the list.

  1. Add the new mode to the parameter check near the start of main.
  2. Add a third case to the condition under the "// do it" comment in main. Call a function (which you will write next) tmg_closest_pair_omp_noshared with the same parameter list as the others.
  3. Define a struct (a typedef will make the later declaration cleaner) with three fields: the two indices into the array of vertices and the distance between them. This will hold the information about a "leading" pair.
  4. Make a copy of the tmg_closest_pair_omp_coarse function and name it tmg_closest_pair_omp_noshared.
  5. Remove the variables v1, v2, and distance from the shared clause on the parallel directive near the start of the function.
  6. Move the declaration of the num_threads variable before the parallel directive, since we will need its value after the parallel block. Also, add num_threads to the shared clause of the parallel directive.
  7. Move the two function calls that get the number of threads and the thread number to the start of the parallel block. Don't forget that you no longer need to declare num_threads inside the block, since it is now declared outside the parallel block and shared by all of the threads.
  8. Outside the parallel block, declare a variable that points to an array of the struct you defined above. If the struct type is called leader, your declaration will look like:

      leader *leaders;

    This will point to an array of these structs, one for each thread. Note that we cannot allocate that array yet, because we can't find out how many threads there will be until we get into the parallel block.

  9. Add the leader variable you declared in the previous step to the shared clause.
  10. Now we can allocate space for the array, but we want to make sure it gets created just once. Place the appropriate malloc below a single directive to ensure that only one thread executes it. The statement should allocate an array of the structs, one per thread.
  11. The leaders array will be accessed by all of the threads, but each will access only the entry at the index corresponding to its thread number. However, we need to make sure that the one thread chosen to create the array has completed that work before any thread tries to access it, so follow the statement with a barrier directive.
  12. Replace the local_v1, local_v2, and local_distance variables with references to leaders[thread_num].v1, leaders[thread_num].v2, and leaders[thread_num].distance, respectively.
  13. Remove the critical directive and its if statement at the end of the parallel block.
  14. Now, outside of the parallel block, add a loop over the leaders array to find the one with the smallest distance, and store those vertex indices and the distance in v1, v2, and distance.
  15. free the memory you allocated for the leaders array.
  16. Test your code.
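
To see the whole pattern in one place, here is a self-contained toy version of the noshared idea. It brute-forces the closest pair among random points rather than using the class example's tmg_* code and data structures, so every name and detail here is illustrative only; the comments map back to the steps above.

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    /* stand-in for the leading-pair struct from step 3 */
    typedef struct {
        int v1, v2;        /* indices of the leading pair */
        double distance;   /* (squared) distance between them */
    } leader;

    int main(void) {
        int n = 1000;
        double *x = malloc(n * sizeof(double));
        double *y = malloc(n * sizeof(double));
        srand(42);
        for (int i = 0; i < n; i++) {
            x[i] = (double)rand() / RAND_MAX;
            y[i] = (double)rand() / RAND_MAX;
        }

        int num_threads;   /* declared before the parallel block (step 6) */
        leader *leaders;   /* one slot per thread (step 8) */
        int v1, v2;
        double distance;

    #pragma omp parallel shared(num_threads, leaders, n, x, y)
        {
            num_threads = omp_get_num_threads();        /* step 7 */
            int thread_num = omp_get_thread_num();

    #pragma omp single
            leaders = malloc(num_threads * sizeof(leader));  /* step 10 */
    #pragma omp barrier                                      /* step 11 */

            /* each thread touches only its own slot (step 12) */
            leaders[thread_num].v1 = -1;
            leaders[thread_num].v2 = -1;
            leaders[thread_num].distance = 1e30;

            /* each thread takes a share of the outer loop */
    #pragma omp for
            for (int i = 0; i < n - 1; i++) {
                for (int j = i + 1; j < n; j++) {
                    double dx = x[i] - x[j], dy = y[i] - y[j];
                    double d = dx * dx + dy * dy;
                    if (d < leaders[thread_num].distance) {
                        leaders[thread_num].v1 = i;
                        leaders[thread_num].v2 = j;
                        leaders[thread_num].distance = d;
                    }
                }
            }
        }   /* no critical section needed (step 13) */

        /* main thread finds the overall winner (step 14) */
        v1 = leaders[0].v1;
        v2 = leaders[0].v2;
        distance = leaders[0].distance;
        for (int t = 1; t < num_threads; t++) {
            if (leaders[t].distance < distance) {
                v1 = leaders[t].v1;
                v2 = leaders[t].v2;
                distance = leaders[t].distance;
            }
        }
        printf("closest pair: %d and %d (squared distance %g)\n",
               v1, v2, distance);

        free(leaders);   /* step 15 */
        free(x);
        free(y);
        return 0;
    }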

Submission

Commit and push!

Grading

This assignment will be graded out of 65 points.

Feature                        Value   Score
---------------------------    -----   -----
OpenMP pi                         20
Timers in pi programs              5
Question 1: timings               10
Question 2: timing analysis        5
Closest pairs variant             25
Total                             65