Computer Science 335
Parallel Processing and High Performance Computing
Fall 2021, Siena College
Lab 10: OpenMP Practice
Due: 4:00 PM, Tuesday, November 23, 2021
In this lab, you will get a little practice parallelizing
programs with OpenMP.
You may work alone or with a partner on this lab.
Learning goals:
- To practice using OpenMP to parallelize programs.
Getting Set Up
You can find the link to set up your GitHub repository
openmp-lab-yourgitname for this lab in Canvas. One member of the
group should follow the
link to set up the repository on GitHub, then that person should
email the instructor with the other group members' GitHub usernames
so they can be granted access. This will allow all members of the
group to clone the repository and commit and push changes to the
origin on GitHub. At least one group member should make a clone of
the repository to begin work.
You may choose to answer the lab questions in the README.md file
in the top-level directory of your repository, or upload a document
with your responses to your repository, or add a link to a shared
document containing your responses to the README.md file.
More Pi
No, we're not going to do Jacobi iteration again, but we will revisit
a familiar problem.
Practice Program: Write an OpenMP version of the Monte Carlo
approximation of pi. Call your program openmp_pi.c and place it in the
pi directory of your repository. See below for some tips; a sketch of
one possible structure follows them. (20 points)
- Your program should take one command-line parameter: the number
of random points to be generated by each thread. Let OpenMP
determine the number of threads based on the OMP_NUM_THREADS
environment variable.
- Each thread in the parallel block will compute and report
its own estimate of pi.
- Use a reduction directive to combine the per-thread counts of points
inside the circle into the total used for the final approximation.
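If you are unsure where to start, here is a minimal sketch of one
possible structure for this program. It is not the required solution;
in particular, rand_r is just one thread-safe way to generate the
random points, and your earlier pthreads version may suggest another.

/*
 * Sketch only: one possible structure for openmp_pi.c.  rand_r is an
 * assumption here for thread-safe random numbers; use whatever
 * generator your other versions use.
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char *argv[]) {

  if (argc != 2) {
    fprintf(stderr, "Usage: %s points_per_thread\n", argv[0]);
    return 1;
  }
  long points_per_thread = atol(argv[1]);
  long total_in_circle = 0;
  int num_threads = 1;

  /* no num_threads clause: the thread count comes from OMP_NUM_THREADS */
#pragma omp parallel reduction(+:total_in_circle)
  {
    /* remember the thread count for the final estimate */
#pragma omp single
    num_threads = omp_get_num_threads();

    unsigned int seed = (unsigned int)(omp_get_thread_num() + 1);
    long in_circle = 0;
    long i;
    for (i = 0; i < points_per_thread; i++) {
      double x = (double)rand_r(&seed) / RAND_MAX;
      double y = (double)rand_r(&seed) / RAND_MAX;
      if (x * x + y * y <= 1.0) in_circle++;
    }

    /* each thread reports its own estimate of pi */
    printf("Thread %d estimate: %f\n", omp_get_thread_num(),
           4.0 * in_circle / points_per_thread);

    /* the reduction combines the per-thread counts into the total */
    total_in_circle += in_circle;
  }

  printf("Overall estimate: %f\n",
         4.0 * total_in_circle / (points_per_thread * num_threads));
  return 0;
}

With this structure, the number of threads is controlled entirely by
the environment, e.g., by setting OMP_NUM_THREADS before running the
program with the points-per-thread count as its only argument.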
Practice Program: Add appropriate timers to each of your Monte Carlo
pi approximation programs (MPI, pthreads, OpenMP) to report the
time taken to compute the entire approximation. Include all three
programs in the timepi directory of your repository. (5 points)
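For the OpenMP program, omp_get_wtime is the natural timer, and
MPI_Wtime plays the same role in the MPI program; clock_gettime or
gettimeofday works for the pthreads version. The standalone sketch
below shows the idea with omp_get_wtime; in your programs, the timed
region should cover the entire approximation rather than the stand-in
loop used here.

/* Sketch: timing a block of work with omp_get_wtime(). */
#include <stdio.h>
#include <omp.h>

int main(void) {
  double start = omp_get_wtime();

  /* stand-in for the computation being timed */
  double sum = 0.0;
  long i;
#pragma omp parallel for reduction(+:sum)
  for (i = 0; i < 100000000; i++) {
    sum += 1.0 / (i + 1);
  }

  double elapsed = omp_get_wtime() - start;
  printf("Computed %f in %f seconds\n", sum, elapsed);
  return 0;
}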
Question 1: Run each of your Monte Carlo pi approximation programs
(MPI, pthreads, OpenMP) with 16, 32, and 64 processes/threads on a
Stampede2 production node (either an interactive or a batch job,
whichever you prefer) and 1 billion points per process/thread.
Report the times taken in tabular form. (10 points)
Question 2: Discuss what these timings tell you about the relative
efficiencies of the three versions of the program. (5 points)
Another Closest Pairs Variant
This section's program will be graded as a practice program and is
worth 25 points.
In the cp directory of your repository, you will find a copy of
the OpenMP closest pairs code from the class example. Your task is to
remove the shared-variable access by adding an array of structs in which
each thread records its own solution; the main thread then finds and
reports the overall winner. This mode of operation will be selected by
passing the string noshared as argv[1]. The following steps will guide
you; a standalone sketch of the resulting pattern follows the list.
- Add the new mode to the parameter check near the start of
main.
- Add a third case to the condition under the "// do it"
comment in main. Have it call a new function (which you will write
next), tmg_closest_pair_omp_noshared, with the same parameter list
as the others.
- Define a struct with three fields, the two indices into the array of
vertices and the distance between them, that will hold the information
about a "leading" pair.
- Make a copy of the tmg_closest_pair_omp_coarse function and name
it tmg_closest_pair_omp_noshared.
- Remove the variables v1, v2, and distance from
the shared clause on the parallel directive near the start
of the function.
- Move the declaration of the num_threads variable before
the parallel directive, since we will need its value after the
parallel block. Also, add num_threads to the shared
clause of the parallel directive.
- Move the two function calls that get the number of threads and
thread number to the start of the parallel block. Don't forget
that you no longer need to declare num_threads since it's
declared outside of the parallel block and that variable will
be shared by all of the threads.
- Outside the parallel block, declare a variable that points
to an array of the struct you defined above. If the struct type is
named leader, your declaration will look like:
leader *leaders;
This will point to an array of these structs, one for each
thread. Note that we cannot construct that array yet because we can't
find out how many threads there will be until we get into the
parallel block.
- Add the leader variable you declared in the previous step
to the shared clause.
- Now we can allocate space for the array, but we want to make
sure it gets created just once. Place the appropriate malloc
below a single directive to ensure that only one thread
executes it. The statement should allocate an array of the
structs, one per thread.
- The leaders array will be accessed by all of the threads, but each
will access only the entry at the index corresponding to its thread
number. However, we need to make sure that the one thread chosen to
create the array has completed that work before any thread tries to
access it, so follow the statement with a barrier directive.
- Replace the local_v1, local_v2, and
local_distance variables with references to
leaders[thread_num].v1, leaders[thread_num].v2, and
leaders[thread_num].distance, respectively.
- Remove the critical directive and its if statement at
the end of the parallel block.
- Now, outside of the parallel block, add a loop over the
leaders array to find the one with the smallest distance, and
store those vertex indices and the distance in v1, v2, and
distance.
- free the memory you allocated for the leaders array.
- Test your code.
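The steps above keep the structure of the class example, which is not
reproduced here. As a reference for the overall pattern only, the
standalone toy below uses made-up point data, placeholder names, and a
simple brute-force loop (not the class example's code or decomposition)
to show the pieces fitting together: the leader struct, the array
allocated under a single directive and guarded by a barrier, per-thread
updates to leaders[thread_num], and the sequential loop that replaces
the critical section.

/*
 * Standalone illustration of the no-shared-access pattern.  The point
 * data, names, and brute-force loop are made up for this sketch and
 * are not the class example's code.
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

/* one thread's current "leading" pair: two indices and their distance */
typedef struct {
  int v1;
  int v2;
  double distance;
} leader;

int main(void) {
  int n = 2000;
  double *x = malloc(n * sizeof(double));
  double *y = malloc(n * sizeof(double));
  for (int i = 0; i < n; i++) {
    x[i] = (double)rand() / RAND_MAX;
    y[i] = (double)rand() / RAND_MAX;
  }

  int num_threads = 1;
  leader *leaders;
  int v1, v2;
  double distance;

#pragma omp parallel shared(num_threads, leaders, x, y, n)
  {
    int thread_num = omp_get_thread_num();
    num_threads = omp_get_num_threads(); /* all threads write the same value */

    /* allocate one leader per thread, exactly once... */
#pragma omp single
    leaders = malloc(num_threads * sizeof(leader));
    /* ...and make sure the array exists before any thread uses it */
#pragma omp barrier

    leaders[thread_num].v1 = -1;
    leaders[thread_num].v2 = -1;
    leaders[thread_num].distance = INFINITY;

    /* each thread checks a subset of "first" points; no shared updates */
    for (int i = thread_num; i < n; i += num_threads) {
      for (int j = i + 1; j < n; j++) {
        double d = sqrt((x[i] - x[j]) * (x[i] - x[j]) +
                        (y[i] - y[j]) * (y[i] - y[j]));
        if (d < leaders[thread_num].distance) {
          leaders[thread_num].v1 = i;
          leaders[thread_num].v2 = j;
          leaders[thread_num].distance = d;
        }
      }
    }
  }

  /* the main thread finds the overall winner: no critical section needed */
  v1 = leaders[0].v1;
  v2 = leaders[0].v2;
  distance = leaders[0].distance;
  for (int t = 1; t < num_threads; t++) {
    if (leaders[t].distance < distance) {
      v1 = leaders[t].v1;
      v2 = leaders[t].v2;
      distance = leaders[t].distance;
    }
  }
  printf("Closest pair: vertices %d and %d, distance %f\n", v1, v2, distance);

  free(leaders);
  free(x);
  free(y);
  return 0;
}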
Submission
Commit and push!
Grading
This assignment will be graded out of 65 points.
Feature | Value | Score |
OpenMP pi | 20 | |
Timers in pi programs | 5 | |
Question 1: timings | 10 | |
Question 2: timing analysis | 5 | |
Closest pairs variant | 25 | |
Total | 65 | |