
Mahesh Lakshminarasimhan

Ph.D. Student in Computer Science

The University of Utah

$ whoami

Mahesh Lakshminarasimhan is a Ph.D. student at the School of Computing at the University of Utah, advised by Prof. Mary Hall. His current research focuses on domain-specific compilers for autotuning ODE solvers in AMReX-based applications (Astrophysics, Combustion, Hydrodynamics) targeted at CPU and GPU-accelerated systems.

He is a research affiliate with the Computational Research Division at Lawrence Berkeley National Laboratory, where he collaborates with the Center for Computational Science and Engineering (CCSE). He spent the summer of 2019 as a Computer Systems Engineer in the Performance and Algorithms Research (PAR) group, mentored by Dr. Samuel W. Williams.

Mahesh received his Master’s degree in Computer Science from Boise State University under the advisement of Prof. Cathie Olschanowsky. As a research assistant at the ADaPT lab, he analyzed the performance of stencil and tensor computations and explored their optimization space by developing AdaptMemBench, a configurable application-specific benchmarking framework leveraging the polyhedral model.

Interests

  • High-Performance Computing
  • Autotuning for Scientific Computing
  • Domain-Specific Optimizing Compilers
  • Performance Modeling and Analysis
  • Hardware/Software Co-design

Education

  • Ph.D. Computer Science, 2019-Present

    University of Utah

  • M.S. Computer Science, 2019

    Boise State University, Idaho

  • B.Engg. Computer Science, 2016

    Anna University, India

Professional Experience

 
 
 
 
 

Ph.D. Research Assistant

School of Computing, The University of Utah

Aug 2019 – Present Salt Lake City, Utah
Advisor: Prof. Mary Hall
 
 
 
 
 

Computer Systems Engineer - I

Lawrence Berkeley National Laboratory

May 2019 – Aug 2019 California
Mentor: Samuel W. Williams | Supervisor: Erich Strohmaier
 
 
 
 
 

Graduate Research Assistant

Boise State University

Aug 2017 – Apr 2019 Boise, Idaho
Advisor: Prof. Cathie Olschanowsky
 
 
 
 
 

Software Development Engineer

Works Applications Inc./ IVTL Infoview Technologies

Jun 2016 – Jul 2017 Japan/India

Posters and Talks

Characterizing the Performance of Sparse Tensor Kernels using PASTA

Performance Analysis of ODE Solvers in AMReX Applications

A Configurable Benchmarking Framework For Memory Access

Application-Specific Memory Subsystem Benchmarking

Recent Publications

More on Google Scholar

PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite

Tensor computations present significant performance challenges that impact a wide spectrum of applications, ranging from machine learning, healthcare analytics, social network analysis, and data mining to quantum chemistry and signal processing. Efforts to improve the performance of tensor computations include exploring data layout, execution scheduling, and parallelism in common tensor kernels. This work presents a benchmark suite for arbitrary-order sparse tensor kernels using two state-of-the-art tensor formats, coordinate (COO) and hierarchical coordinate (HiCOO), on CPUs and GPUs. It presents a set of reference tensor kernel implementations that are compatible with real-world tensors and power-law tensors extended from synthetic graph generation techniques. We also propose Roofline performance models for these kernels to provide insights into computer platforms from a sparse tensor view. This benchmark suite, along with the synthetic tensor generator, will be publicly available.

AdaptMemBench: Application-Specific Memory Subsystem Benchmarking

Application performance often depends on achieved memory bandwidth. Achieved memory bandwidth varies greatly given specific combinations of instruction mix and order, working set size, and access pattern. Achieving good application performance depends on optimizing these characteristics within the constraints of the given application. This task is complicated by the lack of information about the impact of small changes on performance. Some information is provided by benchmarks, but most memory benchmarks are confined to simple access patterns that are not representative of patterns found in real applications. This work presents AdaptMemBench, a configurable benchmark framework designed to explore the performance capabilities of compute kernels extracted from applications. AdaptMemBench provides a framework to emulate application-specific memory access patterns. A set of templates manages standard timing and measurement tasks. The build system accommodates the polyhedral model, making the framework a convenient testbed for potential code optimizations. AdaptMemBench supports reproducibility in experimental results and facilitates sharing results. Given that small changes in benchmarks have a large impact on performance, a common framework isolates the measured portions of code. This eases the process of rerunning experiments and porting to new systems. The strengths of AdaptMemBench are demonstrated through a collection of case studies on common compute kernels, including streaming patterns, multidimensional stencils, and sparse matrix operations.

Weed detecting robot in sugarcane fields using fuzzy real time classifier

We present a weed-detecting robotic model for sugarcane fields that uses a fuzzy real-time classifier on leaf textures. Differentiating weeds from crops and removing the weeds are two challenging tasks for farmers, especially in the Indian sugarcane cultivation scenario. Automatic weed detection and removal thus becomes a vital task for improving the cost effectiveness and efficiency of agricultural processes. The detection of weeds by the robotic model employs a Raspberry Pi-based control system placed in a moving vehicle. An automated image classification system has been designed which extracts leaf textures and employs a fuzzy real-time classification technique. Morphological operators are applied to extract circular leaf patterns at different scales from the leaf images. An optimal set of features has been identified for the characterization of crops and weeds in sugarcane fields. A weed-detecting robotic prototype is designed and developed using a Raspberry Pi microcontroller and suitable input/output subsystems such as cameras, small light sources, and motors with power systems. The prototype’s control incorporates the weed detection mechanism using the Raspbian operating system and Python programming. The designed robotic prototype correctly identifies the sugarcane crop among nine different weed species. The system detects weeds with 92.9% accuracy and a processing time of 0.02 s.

Contact