Assignment #1: vEB tree

Assignment Overview

The first assignment will teach you how to implement and empirically evaluate a van Emde Boas tree. The primary goal of this assignment is to become familiar with van Emde Boas trees and understand their asymptotic behavior. All the code in this programming assignment must be written in C/C++. If you have not used C++ before, here's a short tutorial on the language. Even if you are familiar with C++, go over this guide for additional information on writing code in the system.

If you want to use another programming language for this assignment please ask the instructor first.

This is a single-person project that will be completed individually (i.e., no groups).

Release date: Wed, January 18
Due date: Wed, February 1

Implementation Details

There are four steps in this assignment:

Implement vEB tree using O(u) space
Write a test program for vEB tree implementation
Empirical performance evaluation
Write a report

Step #1 - Implement vEB tree using O(u) space

The first step is to implement the van Emde Boas (vEB) tree. You can refer to the lecture notes for the pseudocode. You do not need to implement the space-optimized x-fast or y-fast tries. Your vEB tree implementation can take O(u) space, where u is the universe size.
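The recursive structure usually relies on splitting a key into a cluster number and an offset within that cluster. The following is a minimal sketch of that index arithmetic only, assuming the universe size is a power of two (u = 2^bits). The names KeySplit, high, low, and index are illustrative and may differ from the lecture notes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: for a universe of size u = 2^bits, split a key x
// into a cluster number (upper bits) and an offset (lower bits).
// lower_bits = floor(bits / 2), so each cluster covers 2^lower_bits keys.
struct KeySplit {
    unsigned lower_bits;  // number of bits kept in the offset

    explicit KeySplit(unsigned bits) : lower_bits(bits / 2) {
        assert(bits >= 2 && bits <= 64);
    }

    // Cluster number: the upper bits of x.
    std::uint64_t high(std::uint64_t x) const { return x >> lower_bits; }

    // Offset inside the cluster: the lower bits of x.
    std::uint64_t low(std::uint64_t x) const {
        return x & ((std::uint64_t{1} << lower_bits) - 1);
    }

    // Inverse of the split: rebuild the key from cluster number and offset.
    std::uint64_t index(std::uint64_t hi, std::uint64_t lo) const {
        return (hi << lower_bits) | lo;
    }
};
```

For 32-bit keys, KeySplit(32) yields 16-bit cluster numbers and 16-bit offsets; for example, 0x12345678 splits into 0x1234 and 0x5678.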

In your vEB tree implementation you need to support the following API:

We have a universe U of size u = |U| and a set S of size |S| = n, S ⊂ U, and we want to implement the following operations:

Insert (x): S ← S ⋃ {x}
Query (x): Return whether x ∈ S
Successor (x): Return the minimum y ∈ S such that y ≥ x

Note that x ∈ U, but x need not be in S.
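To pin down the semantics of the three operations, here is a small reference model backed by std::set. It is not a vEB tree; it only illustrates the expected results. The class name, the method names, and the choice to return an empty std::optional when no y ≥ x exists are assumptions, since the assignment does not specify that case. It requires C++17.

```cpp
#include <cstdint>
#include <optional>
#include <set>

// Reference model of Insert, Query, and Successor, backed by std::set.
// Names and return types are illustrative, not prescribed by the assignment.
class ReferenceSet {
public:
    // Insert(x): S <- S ∪ {x}. Inserting an existing key is a no-op.
    void insert(std::uint32_t x) { items_.insert(x); }

    // Query(x): true if and only if x is in S.
    bool query(std::uint32_t x) const { return items_.count(x) != 0; }

    // Successor(x): smallest y in S with y >= x.
    // Assumption: returns std::nullopt when no such y exists.
    std::optional<std::uint32_t> successor(std::uint32_t x) const {
        auto it = items_.lower_bound(x);
        if (it == items_.end()) return std::nullopt;
        return *it;
    }

private:
    std::set<std::uint32_t> items_;
};
```

With S = {10, 20}, successor(10) is 10 because the definition uses y ≥ x, successor(11) is 20, and successor(21) has no answer.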

Step #2 - Write a test program for vEB tree implementation

Your next step is to write a test program to validate the correctness of the above operations.
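One possible way to validate correctness is a randomized differential test: apply the same random operations to your structure and to std::set, and compare every answer. The sketch below assumes a hypothetical candidate interface with insert(x), query(x) returning bool, and successor(x) returning std::optional<std::uint32_t>; adapt it to whatever interface your implementation exposes. It requires C++17.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>
#include <set>

// Runs `ops` random insert/query/successor operations against both the
// candidate structure and a std::set oracle. Returns false at the first
// mismatch and true if every answer agrees.
// Assumed (hypothetical) candidate interface:
//   void insert(std::uint32_t)
//   bool query(std::uint32_t)
//   std::optional<std::uint32_t> successor(std::uint32_t)
template <typename Candidate>
bool matches_std_set(Candidate& candidate, std::size_t ops,
                     std::uint32_t max_key, std::uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::uint32_t> key(0, max_key);
    std::uniform_int_distribution<int> op(0, 2);
    std::set<std::uint32_t> oracle;

    for (std::size_t i = 0; i < ops; ++i) {
        const std::uint32_t x = key(rng);
        switch (op(rng)) {
        case 0:  // insert into both structures
            candidate.insert(x);
            oracle.insert(x);
            break;
        case 1:  // membership must agree
            if (candidate.query(x) != (oracle.count(x) != 0)) return false;
            break;
        default: {  // successor: smallest y >= x, or empty if none
            std::optional<std::uint32_t> expected;
            auto it = oracle.lower_bound(x);
            if (it != oracle.end()) expected = *it;
            if (candidate.successor(x) != expected) return false;
            break;
        }
        }
    }
    return true;
}
```

A small max_key makes repeated keys and exact-match successor queries frequent, which tends to expose off-by-one mistakes such as returning the strictly larger element.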

As a reference, you can find a C++ test program and a Makefile. This program creates a binary search tree (BST) using std::set and performs insertions and queries (find, successor) for N items.

To build and run the test program, use the following commands:

make
./test NUM_ITEMS

The test program also measures the time to perform N insertions and queries using the std::chrono library in C++ and reports it.
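For readers new to std::chrono, a typical timing pattern looks like the sketch below. This is not the provided test program; the helper names elapsed_seconds and time_set_inserts are made up for illustration. It uses std::chrono::steady_clock, which is monotonic and therefore suited to measuring intervals.

```cpp
#include <chrono>
#include <cstdint>
#include <set>
#include <vector>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename Work>
double elapsed_seconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Example use: time the insertion of every key into a std::set.
// The set is passed in so the caller can reuse it for query timings.
double time_set_inserts(const std::vector<std::uint32_t>& keys,
                        std::set<std::uint32_t>& out) {
    return elapsed_seconds([&] {
        for (std::uint32_t k : keys) out.insert(k);
    });
}
```

Timing the whole batch of N operations, rather than each operation separately, keeps the clock overhead from dominating the measurement.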

Step #3 - Empirical performance evaluation

You need to evaluate the empirical performance of the vEB tree implementation and compare it against a binary search tree (BST). You should use an existing search tree implementation from the standard library (e.g., std::set).

You need to write a benchmark to measure the running time of the above operations. In the benchmark, you will insert and query (find, successor) N 32-bit integers in the data structure and measure the running time. You can use std::chrono to measure the time. If you are using a programming language other than C/C++, use an equivalent timing function in that language.

You can extend the test program for benchmarking. Wherever the program exercises std::set, you need to run the same operations on your vEB tree implementation.

Step #4 - Write a report

In the report, you need to plot the performance of the vEB tree and the BST. The x-axis of each plot will be the number of items (N) and the y-axis will be the time to perform the operation.

For the x-axis, you need to vary the number of items. You should perform the evaluation for N ∈ {1M, 2M, 4M, 8M, 16M, 32M, 64M} (M: million).
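The input sizes above and the N 32-bit keys for each run could be produced along the following lines. This is only a sketch: the function names are hypothetical, and the uniformly random key distribution and the fixed seed are assumptions, since the assignment does not say how the keys should be chosen. The digit separators require C++14.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// The seven input sizes from the assignment: 1M, 2M, ..., 64M (M = 10^6).
std::vector<std::size_t> benchmark_sizes() {
    std::vector<std::size_t> sizes;
    for (std::size_t n = 1'000'000; n <= 64'000'000; n *= 2) {
        sizes.push_back(n);
    }
    return sizes;
}

// Generates n pseudo-random 32-bit keys (duplicates are possible).
// Assumption: a fixed seed is used so that the vEB tree and the BST
// see exactly the same input and runs are repeatable.
std::vector<std::uint32_t> random_keys(std::size_t n, std::uint32_t seed) {
    std::mt19937 rng(seed);  // each call to rng() yields a 32-bit value
    std::vector<std::uint32_t> keys(n);
    for (auto& k : keys) k = static_cast<std::uint32_t>(rng());
    return keys;
}
```

Generating the keys before starting the timer keeps the cost of random number generation out of the measured time.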

Instructions

You will use the CADE cluster to finish this project.

CADE manages clusters that you can use to do your development and testing for all of the class projects. You are free to use other machines and environments, but all grading will be done on these machines. Please test your solutions on these machines.

Check with CADE if you need to set up an account.

CADE machines all share your home directory, so you needn't log in to the same machine each time to continue working.

After you have an account, choose a machine at random on the lab status page from the lab1- set of machines (that is, lab1-1.eng.utah.edu through lab1-40.eng.utah.edu).

ssh lab1-10.eng.utah.edu

CADE user accounts have tcsh set as their default shell. Each time you log in, run bash before anything else. All instructions, examples, and scripts from this class assume you are using bash as your shell. You'll need to do this each time unless you reset your default shell (which I'd recommend). Savvy users may be able to provide slicker setups. This step is important: if you don't switch to bash, other things will mysteriously break as you try to work through the labs.

There is also a CADE setup document available in Canvas for reference.

Submission

You need to submit a tar.gz file of your source code to Canvas.

You should also include a report.pdf in your submission that contains:

A plot showing the performance of the vEB tree and the BST for insertions.
A plot showing the performance of the vEB tree and the BST for queries.
A plot showing the performance of the vEB tree and the BST for successors.
A brief analysis of the performance of the two data structures. You also need to include a discussion of the theoretical guarantees of these data structures and their empirical performance.

We will evaluate the correctness and the performance of your implementation off-line after the project due date.

Collaboration Policy

Every student has to work individually on this assignment.
Students are allowed to discuss high-level details about the project with others.
Students are not allowed to copy the contents of a whiteboard after a group meeting with other students.
Students are not allowed to copy solutions from a colleague.
You cannot copy code from the internet.

If you have any questions please contact the instructor.