Computer Science 210
Data Structures
Fall 2018, Siena College
You may work alone or in a group of size 2 or 3 on this assignment. However, in order to make sure you learn the material and are well-prepared for the exams, you should work through non-coding problems on your own before discussing them with your partner, should you choose to work with someone. Also, be sure every programming task is a team effort. In particular, the "you do these and I'll do these" approach is sure to leave you unprepared for the exams.
All GitHub repositories must be created with all group members having write access and all group member names specified in the README.md file by 4:00 PM, Monday, December 3, 2018. This applies to those who choose to work alone as well!
Please also note that the regular late policy does not apply to this assignment, so we can go over some of the problems in class on our last day. No late submissions will be accepted. Whatever is pushed to your repository at the deadline will be considered your submission.
Getting Set Up
You will receive an email with the link to follow to set up your GitHub repository for this problem set (ps7-yourgitname). One member of the group should follow the link to set up the repository on GitHub. Other group members will receive a subsequent email with a link to follow that will grant them the rights to clone the repository and commit and push changes to the origin on GitHub.
Binary Search Trees
For the first three questions, consider three-node integer-valued binary trees whose nodes contain the elements "1", "2", and "3".
For the next two questions, please use the AVL tree rules as we studied them in class.
Copy your version of BinarySearchTree.java from Lab 9 into your repository and then add, commit, and push that version as your starting point for this section.
One could argue that two BinarySearchTree instances are equal if they contain the same values in any shape tree, or that they must contain the same values and have the same shape.
For the methods below, you may write them recursively or iteratively. You may only call other methods if doing so does not cause a significant efficiency degradation.
Heaps and Heapsort
Back to Word Frequency
The Java API interface java.util.Map provides a mechanism by which key-value pairs can be stored using different underlying structures.
In a recent lab, you worked with a program that computed and reported the frequency of words in input text. Data structures that implement the Map interface are perfectly suited to perform this kind of task. We will use the words as keys and counts as values.
In this part of the problem set, you will be doing a small empirical analysis study using a program that computes word frequencies. The program reads the words from the input, then computes the frequencies five times: once with each of the structures we used previously for this task, once for a TreeMap, and twice with a HashMap.
In the version of this program that you will find in WordFrequency.java in your repository, the input is first read into an ArrayList, and entries from the ArrayList are used to populate the data structures that perform frequency counting. This separates out the overhead of file I/O from the timings and can give us a better idea of the relative efficiencies of the data structures for this task.
In each of the computations of the word frequencies, there are method calls operating on the structure that stores the frequencies. For example, for the ArrayList version, the indexOf, add, and get methods are called. Each call to indexOf performs a linear search, so has a best case running time of O(1), when an entry for the word is found at the beginning of the ArrayList and a worst case running time of O(n) when there is no entry for the word yet or it's found near the end. Each call to get is O(1), since the get method of the ArrayList class can access the entry directly in its underlying array. The calls to add result in the new entry being added at the end of the list. Most of these are O(1), with the exception of the times when the underlying array needs to be expanded, in which case it is O(n).
In your repository, you will find a number of text files of varying sizes. Run WordFrequency multiple times, redirecting your input at the command line from each of these files. Run each case at least three times.
Submitting
Please answer questions that require a text response in the README.md file of your repository. Hand-drawn images may be submitted electronically through your repository or in hard copy. Only one submission per group is needed.
Your submission requires that all required code deliverables are committed and pushed to the master for your repository's origin on GitHub. That's it! If you see everything you intend to submit when you visit your repository's page on GitHub, you're set.
Grading
This assignment is worth 75 points, which are distributed as follows:
> Feature | Value | Score |
Question 1 | 2 | |
Question 2 | 2 | |
Question 3 | 2 | |
Question 4 | 3 | |
Question 5 | 3 | |
Question 6 | 2 | |
BinarySearchTree equals correctness and tests | 5 | |
BinarySearchTree shapeEquals correctness and tests | 5 | |
BinarySearchTree isAVL correctness and tests | 5 | |
Question 7 | 2 | |
Question 8 | 2 | |
Question 9 | 2 | |
Question 10 | 2 | |
Question 11 | 2 | |
Question 12 | 3 | |
Question 13 | 1 | |
Question 14 | 1 | |
Question 15 | 12 | |
Question 16 | 6 | |
Question 17 | 3 | |
Question 18 | 10 | |
Total | 75 | |