Computer Science 385
Design and Analysis of Algorithms

Spring 2019, Siena College

Lab 2: Working with METAL Graph Data
Due: Start of your next lab session

We will be studying many graph algorithms this semester, some of which you will be implementing in Java. For many of these tasks, you will be working with real-world data sets derived from highway systems. This lab will introduce you to that data.

You will be assigned a partner to work with on this lab. Each group member must complete all tasks and submit his/her own copy of this packet.

All required code deliverables should be committed and pushed to the master branch for your repository's origin on GitHub. If you see everything you intend to submit when you visit your repository's page on GitHub, you're set.

Learning goals:

  1. to gain initial experience with the real graph data provided by METAL
  2. to be able to reason about and choose and appropriate underlying structure for a graph in a given situation
  3. to be able to write code to work with graph data

Getting Set Up

You will receive an email with the link to follow to set up your GitHub repository, which will be named metalintro-lab-yourgitname, for this Lab. Only one member of the group should follow the link to set up the repository on GitHub, then others will be granted write access.

The Graph Data

A big advantage of working with this kind of data is that it has a connection to reality, and that we can visualize the data and the results of our manipulations of that data in map form. This data is collected by members of the Travel Mapping (TM) Project (http://travelmapping.net/). The Map-based Educational Tools for Algorithm Learning (METAL) Project (/metal/) uses data from the TM project and converts it into a format that is more convenient for us to load into a graph structure and use. Much more about the project is available at the link above, but everything you need to know for this week should be in this document.

METAL provides hundreds of graph files that can be used to explore data structures and algorithms. They range in size from just a few to hundreds of thousands of vertices and edges. We can use the small graphs for tracing algorithms (or debugging code) by hand, medium-sized graphs for testing during implementation, and large graphs to help analyze the efficiency of algorithms and our implementations of them.

All graphs are linked from http://travelmapping.net/graphs/.

The graph data comes in two formats, simple and collapsed, which are described in detail at /metal/graph-formats.shtml. We will use the collapsed format graphs. For this first set of questions, please refer to the NY-region.tmg graph, which is included in your starter repository.

Question 1: How many vertices and how many edges does this graph contain? (1 point)

Question 2: What is the latitude and longitude pair for the vertex that represents the Latham Circle (label NY2/US9)? What is its vertex number? Note that vertex numbers start at 0. (hint: load into your favorite text editor and figure out its line number and remember to adjust for the presence of the file header lines.) (3 points)
Question 3: What are the endpoint vertex numbers and edge label (route name) for the last edge defined in the file? What are the vertex labels of the endpoint vertices? What are the latitude/longitude pairs of these endpoints? (hint: load into your favorite text editor and jump to the appropriate lines.) (3 points)
Question 4: What are the endpoint vertex numbers and edge label for the edge defined on line 11,573 of the file? What are the vertex labels of the endpoint vertices? What are the latitude/longitude pairs of these endpoints? What are the coordinates of the "shaping points" along this edge? (3 points)
Most of our work will involve writing programs that load one of these graphs and perform some operations on it. However, it is also nice to be able to see these graphs plotted on a map. This can be done with METAL's Highway Data Examiner (HDX), available at /metal/hdx/.

Question 5: Load up HDX, and use "Option 2" to load a medium-sized graph (a few hundred vertices and edges) into the system. When the "Algorithm Visualization Selection and Options" panel comes up, press "Done" to dismiss it for now. Zoom and pan to explore the graph data. Write down the name of the graph you loaded, and show your lab instructor your graph loaded in HDX. (4 points)

Reading the Graph into a Java Program

We need a graph data structure that can be used to store this graph data. In class we have seen that graphs can be stored in either an adjacency matrix or an adjacency list structure, and the graph's edges can either be directed or undirected.

Question 6: Which internal structure and directedness makes sense for our graphs that represent highway systems? Why? (3 points)
In class, we have looked at some sample implementations of graph structures. Included in your GitHub starter is an implementation of a graph data structure that can be used to store the graphs representing highway systems.

In addition to the topological information we already considered in class, here we will need to store more information with the vertices (label and coordinates) and edges (road names, list of intermediate points). You will see fields on the edges and vertices that store these.

Study the code that reads in the graph data files (the HighwayGraph constructor), and the toString method that prints it out.

Run the program's main method, passing in the DC-region.tmg graph (provided in your starter repository) as its command-line parameter. Compare the output of the program with the data in the TMG file and be sure you understand how the contents of the data structure correspond to the contents of the file.

Question 7: Show your instructor the output on your screen. (3 points)

Expanding the Program's Functionality

Your programming tasks involve expanding the functionality of the HighwayGraph class. Do not modify the code that constructs the graph to accomplish these tasks.

Vertex Search

The first task is to perform a search of the graph vertices to find the "extreme" vertices: those are at the northernmost, southernmost, easternmost, and westernmost locations. For each, report the vertex label and its latitude/longitude pair. Also, find the vertices that have the shortest and longest labels. For these, you need only report the label. You may handle ties in any reasonable manner.

Before you start coding, watch a visualization of this process on a METAL graph loaded in HDX. Load a small to medium size graph into HDX (DC-region.tmg is a good choice, which you can either upload from your repository, or find within HDX by typing its name in to the "Pick a Graph" text box in the "Option 1" section of the "Load Data" panel. You'll hit Enter to select the correct graph from the list, and then Enter again to confirm your selection. On the "Algorithm Visualization Selection and Options" panel, select "Vertex Extremes Search" and press "Done". This should bring up a small "Algorithm Visualization Status" panel. At the top of the page, there is another panel that you can use to control the visualization. To start, choose "Pretty Slow" and make sure the "Trace Pseudocode" box is checked. Press "Start" and watch as the algorithm searches for the extreme points and shortest and longest labels. Pause, step through, and change the speed to get accustomed to the HDX user interface, and to make sure you understand the search algorithm.

Question 8: There are nine colors used for vertices in METAL's Vertex Extremes Search. What does each represent? (2 points)
Question 9: Can an unvisited vertex inside the "bounding box" ever become the winner for any of the directional extremes? (1 point)
Question 10: Briefly explain why the algorithm continues to check points inside the "bounding box" for directional extremes, given your answer to the previous question. (2 points)
Question 11: Show your instructor the completed algorithm visualization on your screen. (3 points)
Now, implement this vertex search in the main method of the HighwayGraph class. To keep things simple, you may access the private instance variables throughout the code directly from your main method.

Question 12: Demonstrate your program's vertex search results for the following graphs: DC-region.tmg, YT-region.tmg, HI-region.tmg, and NY-region.tmg, each of which is in your starter repository, and on one other graph of your choice that has at least 3000 vertices. (20 points)
Question 13: Give the Big-O complexity of your vertex search algorithm in terms of |V| and/or |E|. (4 points)

Edge Search

Your second task is to perform a search of the graph edges to find the edges with the shortest and longest length. You will likely want to model your loop structure to traverse the edges in the graph after the one in the toString method.

Question 14: Run the "Edge Extremes Search" algorithm visualization in HDX and watch how it proceeds. Show your instructor the completed algorithm visualization on your screen. (3 points)
Implement this edge search in the main method of the HighwayGraph class right after your vertex search code.

Question 15: Demonstrate your program's edge search results for the following graphs: DC-region.tmg, YT-region.tmg, HI-region.tmg, and NY-region.tmg, and on one other graph of your choice that has at least 3000 edges. (25 points)
Question 16: Give the Big-O complexity of your edge search algorithm in terms of |V| and/or |E|. (4 points)
Next, augment your code so that your edge search counts how many edges you considered as you were searching for the longest and shortest edges, and print both that count and the number of edges in the graph that we stored when it was constructed (available to main as g.numEdges).

Question 17: Re-run your edge search using a couple of the graphs, and verify that the number of edges visited did not match the number of edges in the graph. Why not? What is the relation between the two? (4 points)
This is a general problem when using an adjacency list representation for undirected graphs. In this case, it doesn't really matter if the same edge is considered multiple times. But other times (e.g., finding the total length of all edges), we might need to visit each edge exactly once.

Question 18: Fix your loop so that it considers each edge exactly once. (5 points)
Question 19: With this in mind, add code to your program to compute the total length of all edges in the graph. Demonstrate on each of the graphs in the repository. (7 points)

Don't forget to commit and push all of your code for this lab to your group's GitHub repository.