Computer Science 335
Parallel Processing and High Performance Computing

Fall 2021, Siena College

Lab 11: Traveling Salesperson Problem
Due: before the end of the semester

In this lab, we will work together on some tasks related to parallelization of the famous Traveling Salesperson Problem (TSP). The lab addresses the following learning goal:

  1. To see how the parallel programming paradigms we have studied can be applied to a computationally challenging problem.

Getting Set Up

The link to set up your GitHub repository tsp-lab-yourgitname for this lab can be found in Canvas. One member of the group should follow the link to create the repository on GitHub, then email the instructor with the other group members' GitHub usernames so they can be granted access. This will allow all members of the group to clone the repository and commit and push changes to the origin on GitHub. At least one group member should clone the repository to begin work.

Your initial repository should include the source code described in Section 6.2 of Pacheco related to the text's TSP implementations.

Serial Implementations

We will look at the first part of Section 6.2 of Pacheco, pages 299-306, and study the implementations in tsp_rec.c, tsp_iter1.c and tsp_iter2.c.

The input to these programs is a file describing a directed graph, where the edge weight in row r, column c is the cost to travel from place number r to place number c. Costs are integers. Places are defined only by number.
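To make the overall structure of these serial programs concrete before you read the text's versions, here is a minimal sketch (not Pacheco's code) of the basic approach: a recursive, depth-first search over partial tours with best-tour pruning. The input-reading code assumes a simple format in which the number of places appears on the first line, followed by the cost matrix; check the example files for the exact format the text's programs expect.

/* tsp_sketch.c -- a minimal, hypothetical sketch of recursive tree search
 * for TSP with best-tour pruning.  It is NOT Pacheco's implementation, and
 * the assumed input format (place count, then an n x n integer cost matrix)
 * may differ from the real one. */
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

static int n;              /* number of places */
static int *cost;          /* n x n cost matrix, cost[r*n + c] */
static int *visited;       /* visited[c] != 0 if place c is on the current tour */
static int best_cost = INT_MAX;

/* Extend a partial tour that currently ends at 'city' with total 'so_far'. */
static void search(int city, int depth, int so_far) {
    if (so_far >= best_cost) return;     /* prune: assumes nonnegative costs */
    if (depth == n) {                    /* all places visited: close the tour */
        int total = so_far + cost[city * n + 0];
        if (total < best_cost) best_cost = total;
        return;
    }
    for (int next = 1; next < n; next++) {
        if (!visited[next]) {
            visited[next] = 1;
            search(next, depth + 1, so_far + cost[city * n + next]);
            visited[next] = 0;           /* backtrack */
        }
    }
}

int main(int argc, char *argv[]) {
    if (argc != 2) { fprintf(stderr, "usage: %s digraphfile\n", argv[0]); return 1; }
    FILE *fp = fopen(argv[1], "r");
    if (!fp) { perror(argv[1]); return 1; }
    if (fscanf(fp, "%d", &n) != 1) { fprintf(stderr, "bad input\n"); return 1; }
    cost = malloc(n * n * sizeof(int));
    visited = calloc(n, sizeof(int));
    for (int i = 0; i < n * n; i++) fscanf(fp, "%d", &cost[i]);
    fclose(fp);

    visited[0] = 1;                      /* tours start and end at place 0 */
    search(0, 1, 0);
    printf("best tour cost: %d\n", best_cost);
    free(cost); free(visited);
    return 0;
}

The text's implementations differ in important ways (the iterative versions manage an explicit stack of partial tours, which is what makes parallelization possible), but the tree search and pruning idea is the same.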

Example files are in a public repository that you should fork into your own GitHub account, then clone onto the computer(s) where you are working on this project. This repository also includes a C program you can use to create input files of any size for the Pacheco TSP program using METAL data from .tmg files.

Question 1: Bonus only: For 5 bonus points, add two new datasets to this repository. Keep them fairly small, say between 8 and 20 places. Be creative! The places do not need to be cities and edge costs do not need to be miles (or even be distances). You could do rooms within a building, buildings on a campus, houses in a neighborhood, anything really. It would be nice to get some more variety. You will commit and push your new files to your fork of the repository, then submit a pull request to have them pulled into the origin.

Question 2: Pacheco Exercise 6.17, p. 345. (6 points)

Run each of the three serial versions of the program on several problems of size 10 to 15, including some different inputs of the same size.

Question 3: Are the times you see consistent with the results in Pacheco Section 6.2.4 and Table 6.7? Briefly explain the relative run times of each of the three versions. (10 points)

Question 4: Based on the timings you observed for these small cases, about how long would you expect each serial implementation to take to compute a TSP solution for 16 places? For 20 places? For 40 places? (3 points)

Parallelization with pthreads

We will look at the next part of Section 6.2 of Pacheco, pages 306-315, and study the implementations in pth_tsp_stat.c and pth_tsp_dyn.c.

Note that this code uses condition variables: we will discuss them briefly.
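If you have not worked with condition variables before, the following standalone sketch (a hypothetical example, not code from the text) shows the basic pattern pth_tsp_dyn.c builds on: a thread that has no work waits on a condition variable, and a thread that produces work signals it.

/* condvar_demo.c -- a minimal, standalone illustration of the pthread
 * condition variable pattern (wait until work is available).
 * This is a hypothetical example, not code from pth_tsp_dyn.c.
 * Compile with:  gcc -o condvar_demo condvar_demo.c -lpthread */
#include <pthread.h>
#include <stdio.h>

#define NUM_ITEMS 10

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_avail = PTHREAD_COND_INITIALIZER;
static int work_count = 0;     /* number of work items waiting */
static int done = 0;           /* set when no more work will arrive */

/* Worker: repeatedly wait for work, then "process" it. */
static void *worker(void *arg) {
    long id = (long) arg;
    while (1) {
        pthread_mutex_lock(&lock);
        /* wait() releases the mutex while blocked and reacquires it on wakeup;
         * the while loop guards against spurious wakeups. */
        while (work_count == 0 && !done)
            pthread_cond_wait(&work_avail, &lock);
        if (work_count == 0 && done) {           /* nothing left: exit */
            pthread_mutex_unlock(&lock);
            break;
        }
        work_count--;
        pthread_mutex_unlock(&lock);
        printf("thread %ld took a work item\n", id);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    for (long t = 0; t < 4; t++)
        pthread_create(&threads[t], NULL, worker, (void *) t);

    /* "Produce" work items, signaling a waiting thread for each one. */
    for (int i = 0; i < NUM_ITEMS; i++) {
        pthread_mutex_lock(&lock);
        work_count++;
        pthread_cond_signal(&work_avail);
        pthread_mutex_unlock(&lock);
    }

    /* No more work: wake everyone so they can notice and exit. */
    pthread_mutex_lock(&lock);
    done = 1;
    pthread_cond_broadcast(&work_avail);
    pthread_mutex_unlock(&lock);

    for (int t = 0; t < 4; t++)
        pthread_join(threads[t], NULL);
    return 0;
}

In the dynamic TSP code the idea is roughly the same: a thread whose stack of partial tours is empty blocks until another thread splits off part of its own stack and signals that work is available.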

Question 5: Pacheco Exercise 6.18, pp. 345-346. Only answer the "Why" question; you need not do the code modifications in (a) and (b). (3 points)

Question 6: Pacheco Exercise 6.20, p. 346. Only answer the questions; you need not do the implementations. (6 points)

Question 7: Run each pthreads implementation (the static and the dynamic) for 1, 2, 4, 8, 16, 32, and 64 threads on a Stampede2 node, using a problem size such that the single-thread run takes at least 3 minutes. Present and analyze your timing results. Note that the dynamic version takes a third command-line parameter, the minimum size at which to split the stack; you can use 10 for this value. (10 points)

Parallelization with MPI

Finally, we'll look at the part of Section 6.2 of Pacheco relating to the MPI implementation of TSP, pages 319-341, and study the implementations in mpi_tsp_stat.c and mpi_tsp_dyn.c.

Question 8: Find three different parts of the code where the pthreads version was able to take advantage of shared memory and study how the MPI version handles that case with distributed memory. Discuss the costs in terms of code complexity, overhead of replicated data, and overhead of communication. (10 points)
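As a concrete illustration of the kind of difference this question is after: in the pthreads versions the globally best tour cost is a single shared variable protected by a mutex, while a distributed-memory program must replicate that value in every process, send out improvements, and check for improvements from others. The sketch below shows one way that could be done; it is illustrative only (the tag, function names, and use of buffered sends are assumptions, not necessarily how the text's mpi_tsp programs handle it).

/* best_cost_demo.c -- an illustrative, standalone sketch (not the text's
 * actual code) of sharing the best tour cost in a distributed-memory search:
 * each process keeps its own copy, announces improvements with buffered
 * sends, and periodically probes for announcements from others.
 * Run with e.g.:  mpicc best_cost_demo.c -o best_cost_demo
 *                 mpiexec -n 4 ./best_cost_demo */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#define BEST_COST_TAG 7    /* hypothetical message tag */

static int best_cost = INT_MAX;   /* each process's local copy */
static int my_rank, comm_sz;

/* Announce a newly found best tour cost to every other process. */
static void broadcast_new_best(int new_cost) {
    for (int dest = 0; dest < comm_sz; dest++)
        if (dest != my_rank)
            MPI_Bsend(&new_cost, 1, MPI_INT, dest, BEST_COST_TAG, MPI_COMM_WORLD);
    if (new_cost < best_cost) best_cost = new_cost;
}

/* Called periodically during the search: drain any announcements that
 * have arrived and keep the smallest cost seen so far. */
static void check_for_new_best(void) {
    int flag, received;
    MPI_Status status;
    MPI_Iprobe(MPI_ANY_SOURCE, BEST_COST_TAG, MPI_COMM_WORLD, &flag, &status);
    while (flag) {
        MPI_Recv(&received, 1, MPI_INT, status.MPI_SOURCE, BEST_COST_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (received < best_cost) best_cost = received;
        MPI_Iprobe(MPI_ANY_SOURCE, BEST_COST_TAG, MPI_COMM_WORLD, &flag, &status);
    }
}

int main(void) {
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    /* Attach buffer space for the buffered sends. */
    int bufsize = comm_sz * ((int) sizeof(int) + MPI_BSEND_OVERHEAD);
    char *buffer = malloc(bufsize);
    MPI_Buffer_attach(buffer, bufsize);

    if (my_rank == 0) {
        broadcast_new_best(42);        /* pretend we found a tour of cost 42 */
    } else {
        while (best_cost == INT_MAX)   /* others eventually notice while searching */
            check_for_new_best();
    }
    printf("process %d: best_cost = %d\n", my_rank, best_cost);

    MPI_Buffer_detach(&buffer, &bufsize);
    free(buffer);
    MPI_Finalize();
    return 0;
}

When you answer the question, identify where the actual MPI code handles this and the other shared structures you pick, and weigh the extra code, the replicated copies, and the messages against the mutex-protected updates that sufficed in the shared-memory version.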

Question 9: Run each MPI implementation, the static and the dynamic (you may omit the dynamic runs if you get runtime errors with them), on 1, 2, 4, 8, 16, 32, and 64 nodes on Stampede2 with 64 processes per node (so your MPI_Comm_size should be 64, 128, 256, ..., 4096), using a problem size such that a single-process run takes at least 5 minutes. Present and analyze your timing results. (10 points)

Question 10: What size input would you expect to run for about a day on a single process with a single thread on a Stampede2 production node? (2 points)

Question 11: Run the dynamic partitioning MPI implementation on a problem of the size you came up with for the previous question. Run it on enough Stampede2 production nodes (with 64 processes per node) so that you would expect the run time to be around 3 minutes with perfect parallel efficiency. Present and analyze your timing results. (10 points)
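A quick way to choose the node count (a sanity check only; use the timing you actually measure): with perfect parallel efficiency,

    T_parallel ≈ T_serial / (nodes × 64)

so nodes ≈ T_serial / (64 × 180 s) for a 3-minute target. For example, if your Question 10 estimate corresponds to a serial time of about one day (86,400 seconds), this works out to 86,400 / (64 × 180) ≈ 7.5, or about 8 nodes.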

Submission

Commit and push!

Grading

This assignment will be graded out of 70 points.

Feature                                        Value   Score
Q1: new datasets (up to +5 bonus)              0
Q2: Pacheco Ex. 6.17                           6
Q3: comparison with Table 6.7                  10
Q4: serial run time estimates                  3
Q5: Pacheco Ex. 6.18                           3
Q6: Pacheco Ex. 6.20                           6
Q7: pthreads timing study                      10
Q8: shared vs. distributed memory discussion   10
Q9: MPI timing study                           10
Q10: one-day problem size estimate             2
Q11: large dynamic MPI run                     10
Total                                          70