Colloquium – Ignacio Laguna

September 17, 2018 @ 10:00 am - 11:30 am

Dr. Ignacio Laguna
Computer Scientist
Center for Applied Scientific Computing (CASC)
Lawrence Livermore National Laboratory

Monday, September 17, 2018
3147 MEB

Host: Ganesh Gopalakrishnan

Understanding Resilience to Soft Errors in HPC Scientific Applications

Abstract: Ensuring execution correctness and numerical reliability of high-performance computing (HPC) simulations is becoming increasingly important in extreme-scale systems. As systems scale and the number of system components grows, the chance of experiencing soft errors increases as well. While soft errors can in many cases be detected and corrected by low-level hardware mechanisms, some errors escape these mechanisms and affect the results of scientific simulations. In this talk, we present a set of models and frameworks that allow us to (1) replicate these errors in a controlled environment, (2) reason about how these errors propagate and are sometimes naturally masked within the application space, and (3) protect applications by preventing these errors from propagating to the final program output.

