Computer Science 385
Design and Analysis of Algorithms

Spring 2018, Siena College

Lab 2: Introduction to METAL Graph Data
Due: Start of your next lab session

We will be studying many graph algorithms this semester, some of which you will be implementing in Java. For many of these tasks, you will be working with real-world data sets derived from highway systems.

A big advantage of working with this kind of data is that it has a connection to reality, and that we can visualize the data and the results of our manipulations of that data with the Google Maps API. This data is collected by members of the Travel Mapping (TM) Project (http://tm.teresco.org/). The Map-based Educational Tools for Algorithm Learning (METAL) Project (http://courses.teresco.org/metal/) uses data from the TM project and converts it into a format that is more convenient for us to load into a graph structure and use. Much more about the project is available at the link above, but everything you need to know for this week should be in this document.

You will be assigned a partner to work with on this lab. Only one submission per group is needed.

Learning goals:

  1. to gain initial experience with the real graph data provided by METAL
  2. to be able to reason about and choose and appropriate underlying structure for a graph in a given situation
  3. to be able to write code to work with graph data

Getting Set Up

You will receive an email with the link to follow to set up your GitHub repository, which will be named metalintro-lab-yourgitname, for this Lab. One member of the group should follow the link to set up the repository on GitHub, then that person should email the instructor with the other group members' GitHub usernames so they can be granted access. This will allow all members of the group to clone the repository and commit and push changes to the origin on GitHub.

The Graph Data

METAL provides hundreds of graph files that can be used to explore data structures and algorithms. They range in size from just a few to hundreds of thousands of vertices and edges. We can use the small graphs for tracing algorithms (or debugging code) by hand, medium-sized graphs for testing during implementation, and large graphs to help analyze the efficiency of algorithms and our implementations of them.

All graphs are linked from http://tm.teresco.org/graphs/.

The graph data comes in two formats, simple and collapsed, which are described in detail at http://courses.teresco.org/metal/graph-formats.shtml. We will use the collapsed format graphs. For this first set of questions, please refer to the NY-region.tmg graph, which is included in your starter repository.

Question 1: How many vertices and how many edges does this graph contain? (1 point)

Question 2: What is the latitude and longitude pair for the vertex that represents the Latham Circle (label US9/NY2)? What is its vertex number? Note that vertex numbers start at 0. (hint: load into your favorite text editor and figure out its line number and remember to adjust for the presence of the file header lines.) (3 points)
Question 3: What are the endpoint vertex numbers and edge label (route name) for the last edge defined in the file? What are the vertex labels of the endpoint vertices? What are the latitude/longitude pairs of these endpoints? (hint: load into your favorite text editor and jump to the appropriate lines.) (3 points)
Question 4: What are the endpoint vertex numbers and edge label for the edge defined on line 6330 of the file? What are the vertex labels of the endpoint vertices? What are the latitude/longitude pairs of these endpoints? What are the coordinates of the "shaping points" along this edge? (3 points)
Most of our work will involve writing programs that load one of these graphs and perform some operations on it. However, it is also nice to be able to see these graphs plotted on a map. This can be done with METAL's Highway Data Examiner (HDX), available at http://courses.teresco.org/metal/hdx/.

Question 5: Choose one of the medium-sized graphs (a few hundred vertices and edges) from the "Load METAL graph" section of the Algorithm Visualization Control Panel in HDX. Zoom and pan to explore the graph data. Write down the name of the graph you loaded, and show your lab instructor your graph loaded in HDX. (4 points)

Reading the Graph into a Java Program

We need a graph data structure that can be used to store this graph data. In class we have seen that graphs can be stored in either an adjacency matrix or an adjacency list structure, and the graph's edges can either be directed or undirected.

Question 6: Which internal structure and directedness makes sense for our highway mapping graphs? Why? (3 points)
In class, we have looked at some sample implementations of graph structures. Included in your GitHub starter is an implementation of a graph data structure that can be used to store the highway mapping graphs.

In addition to the topological information we already considered in class, here we will need to store more information with the vertices (label and coordinates) and edges (road names, list of intermediate points).

Study the code that reads in the graph data files (the HighwayGraph constructor), and the toString method that prints it out.

Run the program's main method, passing in the DC-region.tmg graph (provided in your starter repository) as its command-line parameter. Compare the output of the program with the TMG file and be sure you understand how the contents of the data structure correspond to the contents of the file.

Question 7: Show your instructor the output on your screen. (3 points)

Expanding the Program's Functionality

Your programming tasks involve expanding the functionality of the HighwayGraph class. Do not modify the code that constructs the graph to accomplish these tasks.

Vertex Search

The first task is to perform a search of the graph vertices to find the "extreme" vertices: those are at the northernmost, southernmost, easternmost, and westernmost locations. For each, report the vertex label and its latitude/longitude pair. Also, find the vertices that have the shortest and longest labels. For these, you need only report the label. You may handle ties in any reasonable manner.

Before you start coding, watch a visualization of this process on a METAL graph loaded in HDX. Load a small to medium size graph into HDX (DC-region.tmg is a good choice, which you can either upload from your repository, or find within HDX by choosing "Routes Within a Single Region" before pressing the "Get Graph List" button). Then select "Vertex Extremes Search" from the "Algorithm Selection" and choose "Pretty Slow" for the simulation speed. Press "Start" and watch as the algorithm searches for the extreme points and shortest and longest labels.

Question 8: There are nine colors used for vertices in METAL's Vertex Extremes Search. What does each represent? (5 points)
Question 9: Show your instructor the completed algorithm visualization on your screen. (3 points)
Now, implement this vertex search in the main method of the HighwayGraph class.

Question 10: Demonstrate your program's vertex search results for the following graphs: DC-region.tmg, YT-region.tmg, HI-region.tmg, and NY-region.tmg, each of which is in your starter repository. (20 points)
Question 11: Give the Big-O complexity of your vertex search algorithm in terms of |V| and/or |E|. (4 points)

Edge Search

Your second task is to perform a search of the graph edges to find the edges with the shortest and longest length. You will likely want to model your loop structure to traverse the edges in the graph after the one in the toString method.

Question 12: Run the "Edge Extremes Search" algorithm visualization in HDX and watch how it proceeds. Show your instructor the completed algorithm visualization on your screen. (3 points)
Implement this edge search in the main method of the HighwayGraph class right after your vertex search code.

Question 13: Demonstrate your program's edge search results for the following graphs: DC-region.tmg, YT-region.tmg, HI-region.tmg, and NY-region.tmg. (25 points)
Question 14: Give the Big-O complexity of your edge search algorithm in terms of |V| and/or |E|. (4 points)
Next, augment your code so that your edge search counts how many edges you considered as you were searching for the longest and shortest edges, and print both that count and the number of edges in the graph that we stored when it was constructed (available to main as g.numEdges).

Question 15: Using any of the graphs, verify that the number of edges visited did not match the number of edges in the graph. Why not? Can you see a relation between the two? (4 points)
This is a general problem when using an adjacency list representation for undirected graphs. In this case, it doesn't really matter if the same edge is considered multiple times. But other times (e.g., finding the total length of all edges), we might need to visit each edge exactly once.

Question 16: Fix your loop so that it considers each edge exactly once. (5 points)
The remaining 7 points will be awarded based on code style, documentation, and efficiency.

Submitting

Once all written items are initialed to indicate completion, turn in one copy of this handout. Be sure names of all group members are clearly on the first page.

Your submission also requires that your code is committed and pushed to the master for your repository's origin on GitHub. If you see everything you intend to submit when you visit your repository's page on GitHub, you're set.