School of Computing, University of Utah
CPU Colloquium

Manu Shantharam
Penn State


Friday, May 11, 2012
SCI Conference Room, 3760 WEB
Lecture 3:30 p.m.

Host: Mary Hall

Title: Application-aware strategies for managing performance and resilience tradeoffs

Abstract
Resiliency is a key issue as we move toward peta-to-exascale HPC systems, which are expected to encounter multiple faults per day, ranging from fail-stop failures to silent errors. A natural concern is the vulnerability of long-running scientific applications on such systems; these applications often involve computations with very large sparse matrices.

In this talk, I will illustrate the challenges posed by soft errors on supercomputing systems, specifically in the context of iterative methods such as conjugate gradients for solving sparse linear systems. First, I will analyze the effects of a single soft error during the solution process and discuss the results of an empirical evaluation. Next, I will present our new checksum-encoded, algorithm-based fault-tolerant preconditioned conjugate gradients (PCG) method for sparse linear system solution. Our checksum-based approach can be applied to all the key operations in PCG, including sparse matrix-vector multiplication (SpMV), vector operations, and the application of a preconditioner through sparse triangular solution. I will discuss the overheads of our method and compare it with a well-known classical fault-tolerant algorithm. Finally, I will conclude by discussing some future research directions.
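As a rough illustration of the general idea behind checksum-protected SpMV (not the speaker's algorithm, whose details are not given in this abstract), the sketch below uses the textbook column-checksum identity: if c = 1^T A is precomputed, then for y = A x the sum of the entries of y should equal c · x up to rounding. The function names, the CSR layout, the `fault` parameter used to simulate a soft error, and the tolerance `rel_tol` are all assumptions made for this example.

```python
import math

def csr_matvec(indptr, indices, data, x):
    """Compute y = A x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

def column_checksum(indices, data, n_cols):
    """Precompute c = 1^T A, the vector of column sums of A."""
    c = [0.0] * n_cols
    for col, value in zip(indices, data):
        c[col] += value
    return c

def checked_matvec(indptr, indices, data, x, c, rel_tol=1e-10, fault=None):
    """Return (y, ok), where ok reports whether sum(y) matches c . x.

    `fault` is a hypothetical (row, delta) pair that perturbs one entry
    of y after the multiply, to imitate a silent error in the result.
    `rel_tol` is an assumed threshold, not a derived rounding bound.
    """
    y = csr_matvec(indptr, indices, data, x)
    if fault is not None:
        row, delta = fault
        y[row] += delta
    observed = math.fsum(y)
    expected = math.fsum(ci * xi for ci, xi in zip(c, x))
    scale = max(1.0, abs(observed), abs(expected))
    ok = abs(observed - expected) <= rel_tol * scale
    return y, ok

# 3x3 example: A = [[4, 1, 0], [1, 3, 0], [0, 0, 2]]
indptr = [0, 2, 4, 5]
indices = [0, 1, 0, 1, 2]
data = [4.0, 1.0, 1.0, 3.0, 2.0]
x = [1.0, 2.0, 3.0]
c = column_checksum(indices, data, n_cols=3)

y_clean, ok_clean = checked_matvec(indptr, indices, data, x, c)
y_bad, ok_bad = checked_matvec(indptr, indices, data, x, c, fault=(1, 0.5))
print(ok_clean, ok_bad)
```

A single checksum of this kind only detects an error; it does not locate or correct it, and it can miss errors that happen to cancel in the sum. A practical scheme would also need a tolerance derived from the expected floating-point rounding error rather than a fixed constant.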







School of Computing • 50 S. Central Campus Dr. Rm. 3190 • Salt Lake City, UT 84112
801-581-8224 • Fax: 801-581-5843 • Send comments to webmaster@cs.utah.edu