Computer Science 335
Parallel Processing and High Performance Computing
Fall 2021, Siena College
Lab 11: Traveling Salesperson Problem
Due: before the end of the semester
In this lab, we will work together on some tasks related
to parallelization of the famous Traveling Salesperson Problem
(TSP).
Learning goal:
- To see how the parallel programming paradigms we have studied
can be applied to a computationally challenging problem.
Getting Set Up
The link to set up your GitHub repository tsp-lab-yourgitname for
this lab can be found in Canvas. One member of the group should follow
the link to create the repository on GitHub, then email the instructor
with the other group members' GitHub usernames so they can be granted
access. This will allow all members of the group to clone the
repository and commit and push changes to the origin on GitHub. At
least one group member should make a clone of the repository to begin
work.
Your initial repository should include the source code described in
Section 6.2 of Pacheco related to the text's TSP implementations.
Serial Implementations
We will look at the first part of Section 6.2 of Pacheco, pages
299-306, and study the implementations in tsp_rec.c,
tsp_iter1.c and tsp_iter2.c.
The input to these programs is a file describing a directed graph,
where the edge weight in row r, column c is the cost to
travel from place number r to place number c. Costs are
integers, and places are identified only by number.
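For illustration only, a hypothetical cost matrix for 4 places might
look like the following (treat the example files in the repository
described below as the authoritative format, since the programs may
also expect header information such as the number of places):

    0 5 9 4
    5 0 3 7
    8 2 0 6
    4 7 6 0

Here, for instance, row 1, column 2 says that traveling from place 1
to place 2 costs 3, while row 2, column 1 says the return trip costs
2; since the graph is directed, the matrix need not be symmetric.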
Example files are in a public
repository
that you should fork into your own GitHub account, then clone onto the
computer(s) where you are working on this project. This repository
also includes a C program you can use to create input files of any
size for the Pacheco TSP program using METAL data from .tmg files.
Question 1: Bonus only: For 5 bonus points, add two new datasets to
this repository. Keep them fairly small, say between 8 and 20
places. Be creative! The places do not need to be cities and edge
costs do not need to be miles (or even be distances). You could do
rooms within a building, buildings on a campus, houses in a
neighborhood, anything really. It would be nice to get some more
variety. You will commit and push your new files to your fork of
the repository, then submit a pull request to have them pulled into
the origin.
Question 2: Pacheco Exercise 6.17, p. 345. (6 points)
Run each of the three serial versions of the program on several
problems of size 10 to 15, including some different inputs of the same
size.
Question 3: Are the times you see consistent with the results in
Pacheco Section 6.2.4 and Table 6.7? Briefly explain the
relative run times of each of the three versions. (10 points)
Question 4: Based on the timings you observed for these small cases,
about how long would you expect each serial implementation to take
to compute a TSP solution for 16 places? For 20 places? For 40
places? (3 points)
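A rough scaling guide for these estimates (an assumption about
exhaustive search, not a claim about the pruned versions in the text):
with the starting place fixed, there are (n-1)! possible tours of n
places, so a time T(m) measured at m places scales to roughly

    T(n) ≈ T(m) * (n-1)! / (m-1)!

in the worst case. For example, going from 12 to 16 places multiplies
the tour count by 15!/11! = 15 * 14 * 13 * 12 = 32,760; the
branch-and-bound pruning in the actual implementations will usually do
considerably better than this bound.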
Parallelization with pthreads
We will look at the next part of Section 6.2 of Pacheco, pages
306-315, and study the implementations in pth_tsp_stat.c and
pth_tsp_dyn.c.
Note that this code uses condition variables; we will discuss them
briefly.
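As a quick refresher, here is a generic sketch of the standard
condition-variable idiom (it is not taken from the text's TSP code): a
mutex, a condition variable, and a shared predicate that waiting
threads always re-check in a loop.

    /* Generic condition-variable sketch, not the text's code: one thread
     * waits until a shared flag is set, another sets it and signals. */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
    static int work_available = 0;

    void *waiter(void *arg) {
        pthread_mutex_lock(&lock);
        /* Re-check the predicate in a loop: spurious wakeups are allowed. */
        while (!work_available)
            pthread_cond_wait(&ready, &lock);  /* releases lock while waiting */
        printf("waiter: work is available\n");
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    void *notifier(void *arg) {
        pthread_mutex_lock(&lock);
        work_available = 1;
        pthread_cond_signal(&ready);           /* wake one waiting thread */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t w, n;
        pthread_create(&w, NULL, waiter, NULL);
        pthread_create(&n, NULL, notifier, NULL);
        pthread_join(w, NULL);
        pthread_join(n, NULL);
        return 0;
    }

Compile a file like this with cc -pthread. In the dynamic-partitioning
version, a similar pattern lets threads that run out of work sleep
until another thread can split off part of its stack for them.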
Question 5: Pacheco Exercise 6.18, p. 345-346. Only answer the
"Why" question; you need not do the code modifications in (a) and
(b). (3 points)
Question 6: Pacheco Exercise 6.20, p. 346. Only answer the
questions; you need not do the implementations. (6 points)
Question 7: Run each pthreads implementation (the static and the
dynamic) for 1, 2, 4, 8, 16, 32, and 64 threads on a Stampede2 node,
using a problem size such that the single-thread version runs for at
least 3 minutes. Present and analyze your timing results. Note that
the dynamic version takes a third command-line parameter, which is the
minimum size at which to split the stack; you can use 10 for this
value. (10 points)
Parallelization with MPI
Finally, we'll look at the part of Section 6.2 of Pacheco
relating to the MPI implementation of TSP, pages 319-341, and study
the implementations in mpi_tsp_stat.c and mpi_tsp_dyn.c.
Question 8: Find three different parts of the code where the pthreads
version was able to take advantage of shared memory and study how
the MPI version handles that case with distributed memory. Discuss
the costs in terms of code complexity, overhead of replicated data,
and overhead of communication. (10 points)
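To make the contrast concrete, here is a deliberately simplified
sketch (it is not how the text's mpi_tsp_* code actually propagates
the best tour cost): with pthreads the current best cost can be a
single shared variable protected by a mutex, while with MPI each
process holds its own copy that must be reconciled by explicit
communication, for example with a reduction.

    /* Illustrative contrast only -- not the mechanism in the text's code.
     * Each process has a local best tour cost; a reduction gives every
     * process the global minimum. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int my_rank, local_best, global_best;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        /* Stand-in for the cost of the best tour this process has found. */
        local_best = 100 + 7 * my_rank;

        /* Every process ends up with the minimum of all local best costs. */
        MPI_Allreduce(&local_best, &global_best, 1, MPI_INT, MPI_MIN,
                      MPI_COMM_WORLD);

        printf("Process %d: local best %d, global best %d\n",
               my_rank, local_best, global_best);

        MPI_Finalize();
        return 0;
    }

Build and run with your MPI toolchain (e.g., mpicc to compile, then
ibrun or mpiexec to launch). Note that a collective like MPI_Allreduce
requires all processes to participate at the same point, which is
awkward in the middle of an irregular tree search; how the text's code
deals with this is one of the things Question 8 asks you to examine.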
Question 9: Run each MPI implementation (the static and the dynamic;
you may omit the dynamic runs if you get runtime errors on them)
on 1, 2, 4, 8, 16, 32, and 64 nodes on Stampede2 with 64 processes
per node (so your MPI_Comm_size should be 64, 128, 256, ...,
4096), using a problem size such that a single-process run takes
at least 5 minutes. Present and analyze your timing
results. (10 points)
Question 10: What size input would you expect to run for about a day
on a single process with a single thread on a Stampede2 production
node? (2 points)
Question 11: Run the dynamic partitioning MPI
implementation on a problem of the size you came up with for the
previous question. Run it on enough Stampede2 production nodes
(with 64 processes per node) so that you would expect the run time
to be around 3 minutes with perfect parallel efficiency. Present
and analyze your timing results. (10 points)
Submission
Commit and push!
Grading
This assignment will be graded out of 70 points.
Feature | Value | Score |
Q1: new datasets (up to +5 bonus) | 0 | |
Q2 (Ex. 6.17) | 6 | |
Q3 | 10 | |
Q4 | 3 | |
Q5 (Ex. 6.18) | 3 | |
Q6 (Ex. 6.20) | 6 | |
Q7 | 10 | |
Q8 | 10 | |
Q9 | 10 | |
Q10 | 2 | |
Q11 | 10 | |
Total | 70 | |