Computer Science 400
Parallel Processing and High Performance Computing

Fall 2017, Siena College

Programming Project 4: Traveling Salesperson Problem
Due: before the end of the semester

Our last programming project will be a series of tasks related to the famous Traveling Salesperson Problem (TSP).

You may work alone or in a group of 2 or 3 on this project.

Getting Set Up

You will receive an email with the link to follow to set up your GitHub repository tsp-project-yourgitname for this Programming Project. One member of the group should follow the link to set up the repository on GitHub, then that person should email the instructor with the other group members' GitHub usernames so they can be granted access. This will allow all members of the group to clone the repository and commit and push changes to the origin on GitHub. At least one group member should make a clone of the repository to begin work.

Your initial repository should include the source code described in Section 6.2 of Pacheco related to the text's TSP implementations.

Serial Implementations

(Recommendation: complete this part by November 17.)

Read the first part of Section 6.2 of Pacheco, pages 299-306, and study the implementations in tsp_rec.c, tsp_iter1.c and tsp_iter2.c.

The input to these programs is a file describing a directed graph, where the edge weight in row r, column c is the cost to travel from place number r to place number c. Costs are integers. Places are defined only by number.
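For illustration only (the example files in the repository mentioned below are the authoritative reference for the exact format), a small input might look something like this, assuming the file holds the number of places followed by the cost matrix:

    4
    0  1  3  8
    5  0  2  6
    1 18  0 10
    7  4 12  0

Here the cost to travel from place 2 to place 3 is 10 while the return trip costs 12; the matrix need not be symmetric, since the graph is directed.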

Example files are in a public repository that you should fork into your own GitHub account, then clone onto the computer(s) where you are working on this project.

Question 1: Add two new datasets (three if working in a group of three) to this repository. Keep them fairly small, say between 8 and 20 places. Be creative! The places do not need to be cities, and edge costs do not need to be miles (or even distances). You could do rooms within a building, buildings on a campus, houses in a neighborhood, anything really. It would be nice to get some variety. Commit and push your new files to your fork of the repository, then submit a pull request to have them merged into the master branch. (5 points)

Question 2: Pacheco Exercise 6.17, p. 345. (6 points)

Question 3: Run each of the three serial implementations for four different problem sizes each. Use between 10 and 14 places for your sizes. Present and analyze your timing results. Are these consistent with the results in Pacheco Section 6.2.4 and Table 6.7? Briefly explain the relative run times of each of the three versions. (10 points)
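If the programs' own output is not enough and you want to instrument the code yourself, here is a minimal wall-clock timing sketch in C (Pacheco's code distribution includes a timer.h with a GET_TIME macro along these lines; the function name here is invented):

    #include <stdio.h>
    #include <sys/time.h>

    /* Return wall-clock time in seconds (same idea as GET_TIME in timer.h). */
    static double wall_time(void) {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1.0e6;
    }

    int main(void) {
        double start = wall_time();
        /* ... run the tour search here ... */
        double finish = wall_time();
        printf("Elapsed time: %.3f s\n", finish - start);
        return 0;
    }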

Question 4: Based on your timings from the previous question, about how long would you expect each serial implementation to take to compute a TSP solution for 16 places? For 20 places? For 40 places? (3 points)
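One rough way to extrapolate, assuming run time is dominated by the (n-1)! possible tours from a fixed starting place (pruning makes real runs faster, so treat these as upper bounds):

    T(n) ≈ T(m) * (n-1)! / (m-1)!

For example, going from 14 to 16 places multiplies the tour count by 15 * 14 = 210, so a hypothetical 40-second run at 14 places would suggest roughly 210 * 40 s, or about 2.3 hours, at 16 places.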

Parallelization with pthreads

(Recommendation: complete this part by December 1.)

Read the next part of Section 6.2 of Pacheco, pages 306-315, and study the implementations in pth_tsp_stat.c and pth_tsp_dyn.c.
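A pattern worth noticing while you read: both threaded versions share a single best tour among all threads. As a generic sketch (not the text's actual code; the names here are invented), an update to the shared best cost needs a mutex, with a second check after the lock is acquired:

    #include <limits.h>
    #include <pthread.h>

    /* Hypothetical shared state; the text's code organizes this differently. */
    static int best_cost = INT_MAX;   /* cost of best complete tour so far */
    static pthread_mutex_t best_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* Called when a thread completes a tour with total cost `cost`. */
    void update_best(int cost) {
        if (cost < best_cost) {            /* cheap unlocked check first   */
            pthread_mutex_lock(&best_mutex);
            if (cost < best_cost)          /* re-check: another thread may */
                best_cost = cost;          /* have won the race meanwhile  */
            pthread_mutex_unlock(&best_mutex);
        }
    }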

Question 5: Pacheco Exercise 6.18, pp. 345-346. Only answer the "Why" question; you need not do the code modifications in (a) and (b). (3 points)

Question 6: Pacheco Exercise 6.20, p. 346. Only answer the questions; you need not do the implementations. (6 points)

Question 7: Run each pthreads implementation (the static and the dynamic) for 1, 2, 4, 8, 16, 32, and 64 threads on a Stampede2 node, using a problem size such that the single-thread version runs for at least 30 minutes. Present and analyze your timing results. (10 points)

Parallelization with MPI

Read the part of Section 6.2 of Pacheco relating to the MPI implementation of TSP, pages 319-341, and study the implementations in mpi_tsp_stat.c and mpi_tsp_dyn.c.

Question 8: Find three different parts of the code where the pthreads version was able to take advantage of shared memory and study how the MPI version handles that case with distributed memory. Discuss the costs in terms of code complexity, overhead of replicated data, and overhead of communication. (10 points)
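To make the contrast concrete, here is an invented sketch (the names and tag are hypothetical, and the text's code is more careful) of how a message-passing program can imitate the shared best cost: a process that finds an improvement sends the new cost to every other process, and all processes periodically poll for such messages between units of work:

    #include <mpi.h>
    #include <stdlib.h>

    #define BEST_COST_TAG 1  /* invented tag */

    /* Attach a buffer once per process so MPI_Bsend can complete locally
       without a matching receive. Sized for one round of announcements;
       a real program needs more care. */
    void setup_buffer(int comm_sz) {
        int size;
        MPI_Pack_size(1, MPI_INT, MPI_COMM_WORLD, &size);
        size = comm_sz * (size + MPI_BSEND_OVERHEAD);
        MPI_Buffer_attach(malloc(size), size);
    }

    /* Announce a newly found best cost to every other process. */
    void announce_best(int my_best, int my_rank, int comm_sz) {
        for (int dest = 0; dest < comm_sz; dest++)
            if (dest != my_rank)
                MPI_Bsend(&my_best, 1, MPI_INT, dest, BEST_COST_TAG,
                          MPI_COMM_WORLD);
    }

    /* Drain any pending best-cost messages; call between units of work. */
    void poll_for_best(int* best_cost) {
        int flag;
        MPI_Iprobe(MPI_ANY_SOURCE, BEST_COST_TAG, MPI_COMM_WORLD,
                   &flag, MPI_STATUS_IGNORE);
        while (flag) {
            int cost;
            MPI_Recv(&cost, 1, MPI_INT, MPI_ANY_SOURCE, BEST_COST_TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (cost < *best_cost)
                *best_cost = cost;
            MPI_Iprobe(MPI_ANY_SOURCE, BEST_COST_TAG, MPI_COMM_WORLD,
                       &flag, MPI_STATUS_IGNORE);
        }
    }

The buffered send lets the sender continue without waiting for a matching receive, which is why buffered sends come up in this part of the text; notice how much more machinery replaces what was a single mutex-guarded update in the pthreads versions.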

Question 9: Run each MPI implementation, static and dynamic (you may omit the dynamic runs if you encounter runtime errors with them), on 1, 2, 4, 8, 16, 32, and 64 nodes on Stampede2 with 64 processes per node (so your MPI_Comm_size should be 64, 128, 256, ..., 4096), using a problem size such that a single-process run takes at least 30 minutes. Present and analyze your timing results. (10 points)
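If you instrument the code yourself, the usual MPI timing idiom is to synchronize, read MPI_Wtime on each side of the region, and report the slowest process (a sketch; the function name is invented):

    #include <mpi.h>
    #include <stdio.h>

    /* Time a region across all processes and report the maximum. */
    void timed_region(int my_rank) {
        MPI_Barrier(MPI_COMM_WORLD);          /* start together */
        double start = MPI_Wtime();

        /* ... run the tour search here ... */

        double local = MPI_Wtime() - start;
        double elapsed;
        MPI_Reduce(&local, &elapsed, 1, MPI_DOUBLE, MPI_MAX,
                   0, MPI_COMM_WORLD);
        if (my_rank == 0)
            printf("Elapsed time: %.3f s\n", elapsed);
    }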

Question 10: What size input would you expect to run for a few months on a single process with a single thread on a Stampede2 production node? (2 points)

Question 11: This is now a bonus question. Run the dynamic-partitioning MPI implementation (or the static one if you have trouble running the dynamic version) on a problem of the size you came up with for the previous question. Run it on enough Stampede2 production nodes (with 64 processes per node) that you would expect the run time to be around 30 minutes with perfect parallel efficiency. Present and analyze your timing results. (10 points)

Submitting

To submit, commit and push all deliverables to the master branch of your repository on GitHub. For the new data sets, submit pull requests to the common repository.

Grading

This assignment is worth 65 points, which are distributed as follows:

Feature                                              Value   Score
Q1: New datasets                                         5
Q2: 6.17                                                 6
Q3: timings for serial versions                         10
Q4: expected times for serial versions                   3
Q5: 6.18                                                 3
Q6: 6.20                                                 6
Q7: timings and analysis for pthreads versions          10
Q8: pthreads/MPI comparison                             10
Q9: timings and analysis for MPI versions               10
Q10: big problem size                                    2
Q11: run and analysis of big problem on many nodes      10 (Bonus)
Total                                                   65