Computer Science 210
Data Structures

Fall 2018, Siena College

Problem Set 7
Due: 10:00 AM, Monday, December 10, 2018 (no late submissions)

You may work alone or in a group of size 2 or 3 on this assignment. However, in order to make sure you learn the material and are well-prepared for the exams, you should work through non-coding problems on your own before discussing them with your partner, should you choose to work with someone. Also, be sure every programming task is a team effort. In particular, the "you do these and I'll do these" approach is sure to leave you unprepared for the exams.

All GitHub repositories must be created with all group members having write access and all group member names specified in the README.md file by 4:00 PM, Monday, December 3, 2018. This applies to those who choose to work alone as well!

Please also note that the regular late policy does not apply to this assignment, so we can go over some of the problems in class on our last day. No late submissions will be accepted. Whatever is pushed to your repository at the deadline will be considered your submission.

Getting Set Up

You will receive an email with the link to follow to set up your GitHub repository for this problem set (ps7-yourgitname). One member of the group should follow the link to set up the repository on GitHub. Other group members will receive a subsequent email with a link to follow that will grant them the rights to clone the repository and commit and push changes to the origin on GitHub.

Binary Search Trees

For the first three questions, consider three-node integer-valued binary trees whose nodes contain the elements "1", "2", and "3".

Question 1: Draw all valid trees whose in-order traversal would visit the nodes in the order 1-2-3. Which of these are binary search trees? Note that there are several valid tree "shapes" in each case. (2 points)

Question 2: Draw all valid trees whose preorder traversal would visit the nodes in the order 1-2-3. Which of these are binary search trees? (2 points)

Question 3: Draw all valid trees whose postorder traversal would visit the nodes in the order 1-2-3. Which of these are binary search trees? (2 points)

For the next two questions, please use the AVL tree rules as we studied them in class.

Question 4: Start with a complete AVL tree containing the values 10, 20, 30, 40, 50, 60, and 70. Insert the values 31, then 32, showing any rotations needed to maintain the AVL condition. (3 points)

Question 5: Now, starting with the original tree, insert 32 then 31. Show any rotations needed to maintain the AVL condition. (3 points)

Copy your version of BinarySearchTree.java from Lab 9 into your repository and then add, commit, and push that version as your starting point for this section.

One could argue that two BinarySearchTree instances are equal if they contain the same values in any shape tree, or that they must contain the same values and have the same shape.

Question 6: Draw two small BinarySearchTrees which are equal by the first definition, but not equal by the second. (2 points)

For the methods below, you may write them recursively or iteratively. You may only call other methods if doing so does not cause a significant efficiency degradation.

Practice Program: Write a method equals in the BinarySearchTree class that considers trees equal if they contain the same values. Write associated tests in the main method of BinarySearchTree. (5 points)

Practice Program: Write a method shapeEquals in the BinarySearchTree class that considers trees equal if they contain the same values and have the same shape. Write associated tests in the main method of BinarySearchTree. (5 points)

Practice Program: Write a method isAVL in the BinarySearchTree class that returns a boolean indicating if the tree satisfies the AVL condition. Write associated tests in the main method of BinarySearchTree. (5 points)

Heaps and Heapsort

Question 7: Bailey Problem 13.2, p. 338. (2 points)

Question 8: Bailey Problem 13.4, p. 338. (2 points)

Question 9: Bailey Problem 13.6, p. 338. (2 points)

Question 10: Bailey Problem 13.12, p. 339. Assume that you are building a min heap. (2 points)

Question 11: Complete the heapsort procedure on the array you just "heapified" in 13.12. (2 points)

Question 12: Repeat the heapsort in the previous two questions, but build a max heap. (3 points)

Back to Word Frequency

The Java API interface java.util.Map provides a mechanism by which key-value pairs can be stored using different underlying structures.

In a recent lab, you worked with a program that computed and reported the frequency of words in input text. Data structures that implement the Map interface are perfectly suited to perform this kind of task. We will use the words as keys and counts as values.

In this part of the problem set, you will be doing a small empirical analysis study using a program that computes word frequencies. The program reads the words from the input, then computes the frequencies five times: once with each of the structures we used previously for this task, once for a TreeMap, and twice with a HashMap.

In the version of this program that you will find in WordFrequency.java in your repository, the input is first read into an ArrayList, and entries from the ArrayList are used to populate the data structures that perform frequency counting. This separates out the overhead of file I/O from the timings and can give us a better idea of the relative efficiencies of the data structures for this task.

Question 13: According to the Java API documentation, what is the underlying data structure used to implement java.util.TreeMap? (1 point)

Question 14: According to the Java API documentation, what is the underlying data structure used to implement java.util.HashMap? (1 point)

In each of the computations of the word frequencies, there are method calls operating on the structure that stores the frequencies. For example, for the ArrayList version, the indexOf, add, and get methods are called. Each call to indexOf performs a linear search, so has a best case running time of O(1), when an entry for the word is found at the beginning of the ArrayList and a worst case running time of O(n) when there is no entry for the word yet or it's found near the end. Each call to get is O(1), since the get method of the ArrayList class can access the entry directly in its underlying array. The calls to add result in the new entry being added at the end of the list. Most of these are O(1), with the exception of the times when the underlying array needs to be expanded, in which case it is O(n).

Question 15: For each of the method calls on each of the other structures, give its running time in Big O notation and justify your answer. Be specific, and if there is best and worst case behavior, note that clearly. Follow the example in the description of the ArrayList behavior above. (12 points)

In your repository, you will find a number of text files of varying sizes. Run WordFrequency multiple times, redirecting your input at the command line from each of these files. Run each case at least three times.

Question 16: Present a table of the results of the runs described above. Include a row for each input file, and a column for the name of the file, the number of words in the file, the number of unique words found, and the elapsed times for each of the 5 structures. Use the fastest time you get for each case. Order your table's rows in increasing numbers of words in the input file. (6 points)

Question 17: The last two tests each use a HashMap, but the second is likely to be faster than the first. Using the Java API documentation for reference, explain the difference between the two HashMaps. Why would you not use an even larger value in hopes of further timing improvements? (3 points)

Question 18: Analyze the overall results in your table. Relate the timings back to the expected running times of the methods. Do they match expectations? Be specific. (10 points)

Submitting

Please answer questions that require a text response in the README.md file of your repository. Hand-drawn images may be submitted electronically through your repository or in hard copy. Only one submission per group is needed.

Your submission requires that all required code deliverables are committed and pushed to the master for your repository's origin on GitHub. That's it! If you see everything you intend to submit when you visit your repository's page on GitHub, you're set.

Grading

This assignment is worth 75 points, which are distributed as follows:

> FeatureValueScore
Question 1 2
Question 2 2
Question 3 2
Question 4 3
Question 5 3
Question 6 2
BinarySearchTree equals correctness and tests 5
BinarySearchTree shapeEquals correctness and tests 5
BinarySearchTree isAVL correctness and tests 5
Question 7 2
Question 8 2
Question 9 2
Question 10 2
Question 11 2
Question 12 3
Question 13 1
Question 14 1
Question 15 12
Question 16 6
Question 17 3
Question 18 10
Total 75