CS 5968/6968 – Data Structures and Algorithms for Scalable Computing

Lectures: MoWe / 11:50AM-01:10PM MT at GC 2760

Instructor: Prashant Pandey

  • Email: prashant.pandey [at] utah.edu

  • Office Hours: MoWe / 9:30AM-10:30AM MT at WEB 2686

Teaching Assistant:

  • Benwei Shi

    • Email: b.shi [at] utah.edu

    • Office Hours: MoWe 1:30PM-2:30PM MT on Zoom (link in Canvas)

We will use Piazza for all Q&A: Piazza CS 6968.

Course Overview

This course studies advanced data structures and algorithms for handling scalability challenges in large-scale data analysis and machine learning pipelines. It will cover modern hashing techniques, filters and sketching algorithms, locality-sensitive hashing, succinct data structures, string algorithms, graph algorithms, external memory algorithms, and learned indexes. The course is appropriate for both undergraduate and graduate students with intermediate data structure and algorithm skills. It also requires intermediate programming skills in C/C++.

Prerequisites

Official Prerequisites: CS 4150 (undergraduate algorithms) for undergraduates; CS 6150 (graduate algorithms) for graduate students.

Course Topics

  • Compact trees

  • Succinct data structures

  • String algorithms

  • Hashing techniques

  • Filters and sketches

  • Locality-sensitive hashing (LSH)

  • (Approximate) Nearest neighbor search

  • Graph algorithms

  • External memory algorithms

  • Distributed data structures

Assignments

  • Assignments: There will be two assignments. The assignments will include theoretical problems and/or programming tasks. The assignments will be completed individually.

Projects

  • Final project: The final group project is the main component of each student's grade in this course. Students will organize into groups of three and choose to implement a project that

    • is relevant to the material discussed in class,
    • requires a significant theory or programming effort from all team members, and
    • is unique (i.e., no two groups may choose the same project topic).

    The projects will vary in both scope and topic, but they must satisfy these criteria. We will discuss this in more depth in class; students are encouraged to start thinking early about projects that interest them. If a group is unable to come up with its own project idea, the instructor will suggest interesting topics.

Paper Reading

There is a set of assigned paper readings for the course. The reading list is designed to provide additional information and insight into the current state of the art in data structures and algorithms research. Each student must pick five papers from the reading list and turn in a one-paragraph synopsis of each. There will be five deadlines throughout the semester at which students must submit a synopsis. Late submissions will not be accepted without prior approval from the instructor.

Each review must include the following information:

  • What is the problem and why is it hard? (Three sentences.)
  • An overview of the main idea and contributions. (Three sentences.)
  • How do the authors evaluate their solution? (Two sentences.)

These reading reviews must be your own writing. You may not copy from the papers or other sources that you find on the web. Plagiarism will not be tolerated.

Scribing

  • Use this template when scribing.
  • Each student may have to scribe 1-2 lectures, depending on class size.
  • Pick a date below when you are available to scribe and send your choice to Benwei Shi (TA). Dates are assigned on a first-come, first-served basis.
  • Submit scribe notes (pdf + source) to Benwei Shi (TA).
  • Please give proper bibliographic citations for the papers that we mention in class (DBLP can help you collect bibliographic information).
  • Scribe notes are due by 9pm on the day after lecture. They are posted immediately without proofreading (though we may proofread them later and ask for changes).

Useful Resources

Please refer to The Asymptotic Cheat Sheet, a brief overview of asymptotic notation. It will help you follow the theoretical analyses in the course.

Assignments, scribe notes, and final projects must be typeset in LaTeX. If you are not familiar with LaTeX, see this introduction. Here's a quick Overleaf tutorial.

Grading

  • Assignments: 30%

  • Final Project: 40%

  • Paper Reports: 10%

  • Class participation: 10%

  • Final Exam: 10%

Late submission policy

  • No late submissions are allowed. Please plan accordingly based on the submission dates.
  • In case of emergencies, prior permission from the instructor is required.

Collaboration and Plagiarism

Everyone needs to read the SoC Policy on Academic Misconduct.

Working with others on assignments is a good way to learn the material, and we encourage it. However, there are limits to the degree of cooperation that we will permit.

When working on programming assignments, you must work only with others whose understanding of the material is approximately equal to yours. In this situation, working together to find a good approach for solving a programming problem is cooperation; listening while someone dictates a solution is cheating. You must limit collaboration to a high-level discussion of solution strategies, and stop short of actually writing down a group answer. Anything that you hand in, whether it is a paper report or a computer program, must be written in your own words. If you base your solution on any other written solution, you are cheating.

If you collaborate with other students to discuss a problem and then write your own solution, make sure to list, at the top of your write-up, the names of all the students you collaborated with.

Never look at another student's code or share your code with any other student.

You must not make your code public (on GitHub or by any other means).

Using tools like GitHub Copilot or ChatGPT, or copying code from sites like Stack Overflow, also constitutes cheating. Do not write code with Copilot enabled in this course.

We do not distinguish between cheaters who copy others' work and cheaters who allow their work to be copied. If you cheat, you will be given an E in the course and referred to the University Student Behavior Committee.

Clearly, any attempt to subvert the ordinary grading process constitutes cheating.

If you have any questions about what constitutes cheating, please ask first.