CS-7944 : Parallel Algorithms Seminar

The purpose of this seminar is to study and explore interesting parallel & distributed algorithms and ideas. We will cover a wide range of topics, such as parallel data structures, models of parallel computation, parallel complexity theory, scalable numerical methods, and randomization techniques for parallel algorithms. While most papers will be from the last five years, a few older papers will be included early in the semester to set the stage for people not familiar with parallel algorithms. It is recommended that such students also register for CS-6230 : High Performance Computing & Parallelization.

We will meet and discuss (typically) one paper a week. If you are registering for credit, you will be expected to present (at least) one paper from the list below. your choice of paper(s) at the earliest. If you intend to register for more than 1 credit, please check with me first.

Tue 12:25-1:45pm

MEB 3147 (LCR)

Here is a tentative list of papers for discussion (in no particular order). We will likely add additional ones as our discussions progress. It is also unlikely that we will be able to cover all of these and will try and prioritize the ordering.

** Use the discussion board on this page to select papers. **

12 Jan Parallel Algorithms, Guy Blelloch, Bruce Maggs.
19 Jan Parallel Distributed Memory Construction of Suffix and Longest Common Prefix Arrays by Patrick Flick and Srinivas Aluru, ACM/IEEE SC’15.
26 Jan - James King Efficient parallel scan algorithms for GPUs, S. Sengupta, M. Harris, and M. Garland, NVIDIA Technical Report NVR-2008-003, December 2008
Parallel Random Numbers: As Easy as 1, 2, 3, John K. Salmon, Mark A. Moraes, Ron O. Dror, David E. Shaw
Majid Direction-optimizing breadth-first search, Scott Beamer, Krste Asanović, David Patterson
Majid Multilevel k-way partitioning scheme for irregular graphs, G Karypis, V Kumar
Chris Mertin A Framework for Low-Communication 1-D FFT, Ping Tak Peter Tang, Jongsoo Park, Daehyun Kim, Vladimir Petrov
5 Apr - Vairavan Sivaraman Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems, Jehoshua Bruck et al.
12 Apr - Ashok Jalepalli Techniques for Designing Efficient Parallel Graph Algorithms for SMPs and Multicore Processors, Guojing Cong, David A. Bader
19 Apr - Jimmy Moore Richard Vuduc, Aparna Chandramowlishwaran, Jee Choi, Murat Guney, and Aashay Shringarpure. 2010. On the limits of GPU acceleration. In Proceedings of the 2nd USENIX conference on Hot topics in parallelism (HotPar’10). USENIX Association, Berkeley, CA, USA, 13-13.
19 Apr - Jimmy Moore Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal, and Pradeep Dubey. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. In Proceedings of the 37th annual international symposium on Computer architecture (ISCA ’10). ACM, New York, NY, USA, 451-460.
N. Bell and M. Garland. Implementing sparse matrix-vector multiplication on throughput-oriented processors. Proc. SC09, November 2009.
Gallivan, Kyle A., Robert J. Plemmons, and A. H. Sameh. “Parallel algorithms for dense linear algebra computations.” SIAM review (1990): 54-135.
Deep Learning with COTS HPC Systems, A. Coates et al.

cuDNN: Efficient Primitives for Deep Learning, Chetlur et al.