Mahesh Lakshminarasimhan

Ph.D. Student in Computer Science

The University of Utah

`$ whoami`

Mahesh Lakshminarasimhan is a Ph.D. student in the School of Computing at the University of Utah, advised by Prof. Mary Hall. His current research focuses on domain-specific compilers for autotuning ODE solvers in AMReX-based applications (astrophysics, combustion, hydrodynamics) targeting CPU and GPU-accelerated systems.

He is a research affiliate with the Computational Research Division at Lawrence Berkeley National Laboratory, where he collaborates with the Center for Computational Science and Engineering (CCSE). He spent the summer of 2019 as a Computer Systems Engineer in the Performance and Algorithms Research (PAR) group, mentored by Dr. Samuel W. Williams.

Mahesh received his Master’s degree in Computer Science from Boise State University under the advisement of Prof. Cathie Olschanowsky. As a research assistant at the ADaPT lab, he analyzed the performance of stencil and tensor computations and explored their optimization space by developing AdaptMemBench, a configurable, application-specific benchmarking framework that leverages the polyhedral model.

Interests

High-Performance Computing
Autotuning for Scientific Computing
Domain-Specific Optimizing Compilers
Performance Modeling and Analysis
Hardware/Software Co-design

Education

Ph.D. Computer Science, 2019-Present
University of Utah

M.S. Computer Science, 2019
Boise State University, Idaho

B.Engg. Computer Science, 2016
Anna University, India

Professional Experience

Ph.D. Research Assistant

School of Computing, The University of Utah

Aug 2019 – Present Salt Lake City, Utah

Advisor: Prof. Mary Hall

Computer Systems Engineer - I

Lawrence Berkeley National Laboratory

May 2019 – Aug 2019 California

Mentor: Samuel W. Williams | Supervisor: Erich Strohmaier

Graduate Research Assistant

Boise State University

Aug 2017 – Apr 2019 Boise, Idaho

Advisor: Prof. Cathie Olschanowsky

Software Development Engineer

Works Applications Inc./ IVTL Infoview Technologies

Jun 2016 – Jul 2017 Japan/India

Projects

AdaptMemBench

PelePhysics

PASTA Benchmark Suite

Posters and Talks

Characterizing the Performance of Sparse Tensor Kernels using PASTA

Sep 27, 2019 2:00 PM School of Computing, University of Utah

Performance Analysis of ODE Solvers in AMReX Applications

Aug 1, 2019 10:00 AM Lawrence Berkeley National Laboratory, California

A Configurable Benchmarking Framework For Memory Access

Apr 18, 2019 9:00 AM Boise State University

Application-Specific Memory Subsystem Benchmarking

Mar 5, 2019 12:00 AM Computer Science Department, Boise State University

Recent Publications

PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite

Tensor computations present significant performance challenges that impact a wide spectrum of applications, ranging from machine learning, healthcare analytics, social network analysis, and data mining to quantum chemistry and signal processing. Efforts to improve the performance of tensor computations include exploring data layout, execution scheduling, and parallelism in common tensor kernels. This work presents a benchmark suite for arbitrary-order sparse tensor kernels using state-of-the-art tensor formats, coordinate (COO) and hierarchical coordinate (HiCOO), on CPUs and GPUs. It presents a set of reference tensor kernel implementations that are compatible with real-world tensors and with power-law tensors extended from synthetic graph generation techniques. We also propose Roofline performance models for these kernels to provide insight into computer platforms from a sparse tensor perspective. This benchmark suite, along with the synthetic tensor generator, will be publicly available.


Mahesh Lakshminarasimhan, Catherine Olschanowsky

December 2018 AdaptMemBench

AdaptMemBench: Application-Specific Memory Subsystem Benchmarking

Application performance often depends on achieved memory bandwidth. Achieved memory bandwidth varies greatly given specific combinations of instruction mix and order, working set size, and access pattern. Achieving good application performance depends on optimizing these characteristics within the constraints of the given application. This task is complicated by the lack of information about the impact of small changes on performance. Some information is provided by benchmarks, but most memory benchmarks are confined to simple access patterns that are not representative of patterns found in real applications. This work presents AdaptMemBench, a configurable benchmark framework designed to explore the performance capabilities of compute kernels extracted from applications. AdaptMemBench provides a framework to emulate application-specific memory access patterns. A set of templates manages standard timing and measurement tasks. The build system accommodates the polyhedral model, making the framework a convenient testbed for potential code optimizations. AdaptMemBench supports reproducibility in experimental results and facilitates sharing results. Given that small changes in benchmarks have a large impact on performance, a common framework isolates the measured portions of code. This eases the process of rerunning experiments and porting to new systems. The strengths of AdaptMemBench are demonstrated through a collection of case studies on common compute kernels, including streaming patterns, multidimensional stencils, and sparse matrix operations.


M Sujaritha, S Annadurai, J Satheeshkumar, K Sharan, Mahesh Lakshminarasimhan

March 2017 COMPAG

Weed detecting robot in sugarcane fields using fuzzy real time classifier

We present a weed-detecting robotic model for sugarcane fields that uses a fuzzy real-time classifier on leaf textures. Differentiating weeds from crops and removing the weeds are two challenging tasks for farmers, especially in Indian sugarcane cultivation. Automatic weed detection and removal is therefore vital for improving the cost effectiveness and efficiency of agricultural processes. The robotic model detects weeds using a Raspberry Pi-based control system placed in a moving vehicle. An automated image classification system has been designed that extracts leaf textures and employs a fuzzy real-time classification technique. Morphological operators are applied to extract circular leaf patterns at different scales from the leaf images. An optimal set of features has been identified for the characterization of crops and weeds in sugarcane fields. A weed-detecting robotic prototype is designed and developed using a Raspberry Pi microcontroller and suitable input/output subsystems such as cameras, small light sources, and motors with power systems. The prototype’s control incorporates the weed detection mechanism using the Raspbian operating system and Python. The designed robotic prototype correctly identifies the sugarcane crop among nine different weed species. The system detects weeds with 92.9% accuracy over a processing time of 0.02 s.
