Computer Science 211
Data Structures

Mount Holyoke College
Fall 2009


Lab 8: Practice with Search Trees
Due: 4:00 PM, Tuesday, November 24, 2009


This "mini-lab" assignment will give you some experience with search trees. We have looked at the BinarySearchTree in class, and we noticed that there was no guarantee of any balance, and hence no guarantee of logn behavior. We will look at one type of balanced tree in class next week, but for today we will make use of two balanced tree implementations available in structure: the RedBlackSearchTree and the SplayTree.

While the assignment is not officially due until 4:00 PM, Tuesday, November 24, 2009, you should be able to finish or at least come close during our fourth hour meeting.


Getting Started

The framework of the program you are to write is in the shared area under labs/treecompare in the file TreeCompare.java. The main method in this file reads in two files into Sets, one contains a collection of input strings to be used as the contents of a search tree, and one to be used as the values to search for among that input. These files are specified as command-line parameters to the program. A third parameter, an integer, is used to specify the number of times to repeat the search.


Programming Tasks

Your programming task is to add the words from the input set into each of the three types of search trees provided by structure, and then search for all of the words the requested number of times. You should have timers in your program to see how long each one takes to build the tree and to perform the search.

You may make use of the method System.currentTimeMillis to do the timings:

    long startTime = System.currentTimeMillis();
    // do stuff
    long elapsedTime = System.currentTimeMillis() - startTime;
    System.out.println("do stuff took " + elapsedTime + "ms");

Note: for large input files, you may find that the construction of your trees takes quite a long time and you might run into issues with the size of Java's call stack. If you encounter this problem you should be able to specify a larger allocation of memory for the call stack by adding an extra flag to your command line:

java -Xss20m TreeCompare ...

The -Xss20m increases the memory available for the call stack to 20 megabytes from the default 400 kilobytes.


Analysis

Once your program works, you are to construct some input files (or make use of some we used earlier in the semester) that show the benefits of each implementation. Keep in mind that the BinarySearchTree makes no effort to keep balance, so an unfortunate input sequence can result in a linear or nearly linear structure. The RedBlackSearchTree maintains a balance condition each time it is modified, adding cost to its construction but ensuring a tree height near logn. Finally, the SplayTree actually modifies the tree to move items you search for to the root of the tree, so recently-searched items will be found quickly in subsequent searches.

Come up with some input data sets that are large enough to produce meaningful timings and that demonstrate the benefits of each approach. In your writeup, which you may include in a comment in your TreeCompare.java file, describe your input data, show the timings you obtained, and explain why you believe the timings came out the way they did.


Submission and Grading

Submit your program (and the writeup contained therein), and input files (if they were different from ones we've used in class and for other assignments), as attachments to an email to jteresco AT mtholyoke.edu, no later than 4:00 PM, Tuesday, November 24, 2009.

This lab assignment is graded out of 15 points, broken down as follows.

Grading Breakdown

Program correctness 6 points
Program style 1 point
Program documentation 1 point
Appropriate input files 3 points
Timing results and analysis 4 points

The program style grade will be based on code formatting and approriate use of Java naming conventions. The program documentation grade is, of course, based on the comments you provide. You'll lose a half a point if your name is not included in the comment at the top. The program correctness grade is based on how well your program meets the functionality requirements.