Models of Computation for Massive Data
Instructor : Jeff Phillips
Fall 2013 | Wednesday, (sometimes Fridays) 4:35 pm - 5:45 pm
MEB 3147
Catalog number: CS 7931 001


Description:
This course will explore advanced models of computation pertinent for processing massive data sets. As data sets grow to terabyte and petabyte scales, traditional models and paradigms of sequential computation become obsolete. Different efficiency trade-offs analyzing memory usage, I/O calls, or inter-node communication become the dominant bottlenecks. These paradigms are formalized as I/O-Efficient, Parallel, Streaming, GPU-based, Map-Reduce, and other distributed algorithmic models of computation. This course will study the history and specifics of these models. Students in the class will learn the proper settings in which to use these paradigms, the advantages and disadvantages of each model, and how to analyze algorithms with these settings.

YouTube Channel for Class.

Students in the class are lucky to use resources from EmuLab and Amazon Web Services.
Project Sechedule:
Here is the schedule for the project.

Schedule:
Date Topic Video Link Assignment
Wed 8.28 Class Overview + Review of RAM - read before class: Big Data Abstractions
Fri 8.30 I/O : Overview and History of Model - Arge Class | Zeh class and links | Vitter book Project Proposal Due
Wed 9.04 I/O : Sorting V3 Aggarwal + Vitter | Arge notes
Fri 9.06 I/O : Searching with B-Trees V4 Arge notes
Wed 9.11 I/O : Extensions (Cache Oblivious, Parallel External) (notes, +) V5 + CacheObliv | Parallel EM | Parallel Disk
Fri 9.13 I/O : Graph Algorithms (notes) V6 Zeh notes | Buffer Tree | Katriel + Meyer
Wed 9.18 (travelling - No Class) Cormode Overview
Fri 9.20 Streaming : Overview of Model + Reservoir Sampling (notes) V Muthu Notes | Chakrabarti Notes | Reservoir Sampling
Wed 9.25 Streaming : Heavy-Hitters + Count-Min (notes) V8 Chapter 1 | MJRTY + Chapter 3 | Count-Min
Fri 9.27 Streaming : Count-Sketch V9+ * Chapter 3 | Count-Min
Wed 10.02 Streaming : Sliding Windows + Distinct Elements (notes) + Rank * Sliding Windows + Chapter 2 + Chapter 8
Fri 10.04 Parallel : Overview and History of Model V11 Vishkin on PRAM | on BSP
Wed 10.09 Parallel : Prefix Sum and Scanning V12 Vishkin Chap 3
Fri 10.11 Parallel : Merging and Sorting V13 Vishkin Chap 4
Wed 10.16 (Fall Break - No Class)
Fri 10.18 (Fall Break - No Class)
Wed 10.23 Parallel : Selection (notes) V14 Vishkin Chap 5 Intermediate Project Report
Fri 10.25 MapReduce : Overview and History of Model (notes) V15 Slides by Leskovec | Wired Article
Wed 10.30 (travelling - No Class)
Fri 11.01 (travelling - No Class)
Wed 11.06 MapReduce : Counting Triangles (notes) V16 SV paper | SV Slides
Fri 11.08 MapReduce : Formalizing the Model V17 Goodrich model | MRC model | MUD model | Minimal
Wed 11.13 MapReduce : Filtering + Dremel V18 Filtering paper
Fri 11.15 MapReduce : Minimal MapReduce Algorithms V19 Tao + Lin + Xiao
Wed 11.20 GPU : Overview and History of Model (notes) + Bitonic Sort V20 Desbrun: Histroy of GPU | CUDA tutorial | GPGPU Survey
Fri 11.22 GPU : Memory Bank Algorithms V Sitchinava+Weichert | Kider on sorting | Hybrid sort
Wed 11.27 (Thanksgiving - No Class) V Distributed Computing : Overview and History of Model
Fri 11.29 (Thanksgiving - No Class)
Wed 12.04 Distributed Computing : Peer-to-Peer Networks V22 PASTRY | CHORD | Wikipedia
Fri 12.06 Distributed Computing : Distributed Streaming V23 Mergeable Summaries paper | distrib monitoring
Wed 12.11 Project Presentations
Fri 12.13 Project Presentations



1 credit or 3 credit:
The class can be taken for 1 credit or 3 credits.
If you take the class as 1 credit, then I expect you to attend all (or almost all) of the lectures. If I feel attendence is lagging, I may give pop quizes. Alternatively, I may require that 1 credit students either present a lecture or scribe lecture notes.

If you take the class for 3 credits, you must also do a project. The project will be quite independent, but I will try to provide feedback when asked for. It will also require a short presentation at the end of the semester.
To take the class for 3 credits, the project must be approved by the instructor!

Grading:
As this will be taught as an advanced seminar, grading will be fairly forgiving. Those making an effort to do well, will recieve high marks.

Students will be required to video tape and scribe approximately one lecture. Examples of latex of how scribing notes can look are lined, with needed style file.

Scribe Assignments and Video Assignments.

Project:
(Only required for 3-credit students.) A major component of this class will be a project where you will investigate in-depth a focused topic in a particular model or the relation between several models. Details.

FAQ:
Q: Will there be a book?
A: No. Most information is available online and I will post links on the (under-construction) webpage.

Q: I took undergraduate algorithms, but not graduate algorithms. Will I be lost in the analysis?
A: My intention is "no." I expect you to understand big-O notation and how to analyze basic algorithms in it. But this class will not be as intense or extensive in its analysis as CS 6150 (Advanced Algorithms). We will apply big-O analysis to analyze aspects other than just run-time, and the goal of this course will be to teach you how and when to do that.