647 South 1200 East, #15C

Salt Lake City, UT 84102






Karthik Ramani



Design, analysis, and exploration of architectural and compilation techniques for accelerators, graphics processors, and chip multiprocessors.



Ph.D., Computer Science (Advisor: Dr. Alan Davis)                                   Expected: August 2008

School of Computing, University of Utah, Salt Lake City

Thesis title: CoGenE: A framework for compilation, architectural design, and exploration of embedded domain specific accelerators (DSA)


Master's in Electrical Engineering                                                                                 May 2004

University of Utah, Salt Lake City (GPA: 4.0/4.0)


Bachelor's in Electronics Engineering                                                                           April 2002

University of Madras, India (Aggregate: 88.04%, first in the department)



Computer Architecture: Accelerator design, chip multiprocessors, graphics processors, heterogeneous interconnects, and performance and power modeling.

Compilers: Compilation for interconnects, code generation for heterogeneous architectures.



         University of Utah graduate research fellowship, 2007-2008.

         School of Computing fellowship, 2004-2005.

         Department of Electrical and Computer Engineering scholarship, 2002-2003.

         Awarded first place in the department of Electronics and Communication Engineering, 2002.

         Outstanding academic award, University of Madras, 2002 (top-ranked among 10,000 students).

         Second place, higher secondary certificate (HSC) examinations, 1998.



Intern, Graphics Processing Group, ATI (AMD) Research, Santa Clara, CA

Managers: John Brothers, Director, and Dr. Dan Shimizu, AMD Fellow           March 2008 – present


Project Goal: Power estimation, validation, and exploration for future graphics processors

         Conducted a feasibility study for fast, early-stage power estimation in graphics processors.

         Validated the accuracy and speed of architectural power models.


Intern, Graphics Processing Group, ATI (AMD) Research, Santa Clara, CA

Managers: Dr. Ali Ibrahim and Dr. Dan Shimizu, AMD Fellow              May 2007 – September 2007


Project Goal: Architectural power modeling infrastructure for design space exploration in graphics processing units (GPUs)

         Worked as a liaison among the physical design, circuit design, and architecture teams.

         Designed and implemented a novel architectural power modeling infrastructure for GPUs.

         Implemented flexible interfaces between power and performance models.

         Researched and developed a framework for exploration of interconnect choices in GPUs.


Graduate Technical Intern, FACT, Intel, Hudson, MA

Managers: Dr. Shubu Mukherjee and Brian Slechta                               January 2006 – May 2006


Project Goal: Analysis and validation of an ASIM-based model of a next-generation x86 architecture.

         Developed various modules in the instruction fetch and branch prediction pipeline.

         Validated the performance of the front-end (fetch, decode, and branch prediction) pipeline.

         Used profiling to increase the simulation speed of the ASIM-based simulator.




·         PowerRed: A simulation tool that uses process parameters to estimate the power dissipation of architectural blocks in GPUs. High-level models represent circuit-level implementations of the various blocks in a processor and enable fast, early-stage design exploration.



·         Micro-architectural Wire Management for Performance and Power in Partitioned Architectures, R. Balasubramonian, N. Muralimanohar, K. Ramani, L. Cheng, J. Carter, U.S. provisional patent application filed, February 2006.



         Karthik Ramani and Alan Davis, “A Case for Interconnect aware Compilation in Programmable Domain Specific Accelerators”, in review.

         Karthik Ramani and Alan Davis, “CoGenE: A Design Automation Framework for Embedded Domain Specific Architectures”, Extended Abstract, in review.

         Karthik Ramani and Alan Davis, “Design Space Exploration for Domain Specific Architectures”, in review.

         Karthik Ramani and Alan Davis, “Automating the Design of Domain Specific Accelerators using Stall Cycle Analysis (SCA)”, Technical Report, UUCS-08-002, February 2008

         Karthik Ramani, Ali Ibrahim, and Dan Shimizu, “PowerRed: A Flexible Modeling Framework for Power Efficiency Exploration in GPUs”, Workshop on General Purpose Processing on Graphics Processing Units (GPGPU), 2007

         Karthik Ramani and Alan Davis, “Application Driven Embedded System Design: A Face Recognition Case Study”, International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES), Austria, 2007

         Rajeev Balasubramonian, Naveen Muralimanohar, Karthik Ramani, Liqun Cheng, and John Carter, “Leveraging Wire Properties at the Micro-architecture Level”, IEEE Micro, 2006

         Liqun Cheng, Naveen Muralimanohar, Karthik Ramani, Rajeev Balasubramonian, and John Carter, “Interconnect-Aware Coherence Protocols for Chip Multiprocessors”, International Symposium on Computer Architecture (ISCA), Boston, 2006

         Naveen Muralimanohar, Karthik Ramani, and Rajeev Balasubramonian, “Power Efficient Resource Scaling in Partitioned Architectures through Dynamic Heterogeneity”, International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, 2006

         Liqun Cheng, Naveen Muralimanohar, Karthik Ramani, Rajeev Balasubramonian, and John Carter, “Wire Management for cache coherence in Chip-Multiprocessors”, 6th Workshop on Complexity-Effective Design (WCED, with ISCA-32), Madison, June 2005

         Rajeev Balasubramonian, Naveen Muralimanohar, Karthik Ramani, and Venkatanand Venkatachalapathy, “Micro-architectural Wire Management for Performance and Power in Partitioned Architectures”, 11th International Symposium on High-Performance Computer Architecture (HPCA-11), San Francisco, February 2005

         Karthik Ramani, Naveen Muralimanohar, and Rajeev Balasubramonian, “Micro-architectural Techniques to Reduce Interconnect Power in Clustered Processors”, 5th Workshop on Complexity-Effective Design (WCED, with ISCA-31), Munich, June 2004

         Malarvizhi G, Karthik Ramani, Hariharan Rajasekaran and Meenakshi M, “A Near Optimum MAC Protocol based on the Incremental Collision Resolution Multiple Access Protocol for CDMA based Communications System”, 5th IEEE conference on Wireless Personal Multimedia Communications (WPMC), Hawaii, October 2002

         Malarvizhi G and Karthik Ramani, “Analysis of a Multi-Channel Wireless Access Protocol with Non-instantaneous feedback under Correlated fading”, IEEE International Conference on Networks (ICON), Singapore, 2002

         Malarvizhi G and Karthik Ramani, “Multi-Channel Wireless Access Protocol under Correlated fading”, 12th IEEE Workshop on Local and Metropolitan Area Networks (LANMAN), Stockholm, 2002.



         AMD Research AARL, June 2007

         University of Utah, March 2007

         ATI Research Silicon Valley, February 2007



         Design Automation for Embedded Accelerator Architectures: Embedded system design is a complex process that struggles to keep pace with rapid changes in the application space and the short time-to-market requirements of embedded devices. Given these constraints, this work develops a framework that combines compilation and architectural design to explore the design space of programmable “ASIC-like” accelerators. The tool identifies a set of near-optimal energy-performance designs for a given application suite (in review).


         Interconnect-Aware Compilation: The performance benefits of “ASIC-like” accelerators arise from efficient, fine-grained routing of data across the major subsystems over the interconnect. To retain programmability, the compiler must schedule these data transfers. This work employs integer linear programming (ILP)-based interconnect scheduling to achieve the high performance of accelerators (CASES 07). Our compiler can also retarget different applications to a given architecture.


         Workload Characterization: Recognition is expected to be a critical workload in the future. This work characterizes the memory access, compute, and control requirements of various face and speech recognition algorithms. Based on this characterization, a real-time embedded processor was designed for face recognition (CASES 07). We are applying a similar approach to ray tracing and wireless telephony.


         Power Modeling for Graphics Processors (GPUs) and Chip Multiprocessors: Power dissipation is a critical metric in processor design. This work (GPGPU 07) presents PowerRed, a tool that estimates power dissipation in GPUs using three classes of models: analytical power models for predictable circuits such as memories, empirical power models for circuits such as ALUs, and area-based power models for interconnects.


         Heterogeneous Interconnects for GPUs: Interconnects account for a significant fraction of total chip power dissipation. Given bandwidth requirements, this work (WCED 04, ISCA 06) explores the design of heterogeneous wires with varying delay, bandwidth, and power characteristics. The effect of bus encoding techniques on power dissipation has also been studied.


         Dynamic Resource Management: Scaling resources linearly causes a commensurate increase in power dissipation but yields diminishing returns in performance. This study exploits the dynamic behavior of programs, using dynamic frequency scaling to minimize the processor's energy-delay-squared (ED²) product. Techniques for reducing the frequency of thermal emergencies have also been proposed (ISPASS 06).



      Efficient test methodologies for SoC cores and interconnects.

      Design and implementation of a compiler for the MiniJava language.

      Implementation of the Yalnix kernel (memory management, process management, inter-process communication, and garbage collection) for emulated hardware.

      Design, implementation, and testing of a 16-bit, four-stage digital signal processor pipeline.

      Implementation of an orthogonal frequency division multiplexing (OFDM) communication system on the ADSP-21161N DSP processor.



Programming languages:      C, C++, Java, and Python

Platforms:                  UNIX, Linux, Solaris, and Windows

CAD tools:                  Cadence, Synopsys

Architecture simulators:    SimpleScalar, ML-RSIM, UVSIM, SIMICS, ASIM



         Active member of ACM and IEEE.

         Reviewer for CASES 2005, LCTES 2007, CASES 2007, LCTES 2008.








Dr. Alan Davis, Professor, Advisor

School of Computing

University of Utah



Dr. Rajeev Balasubramonian, Assistant Professor, Co-advisor

School of Computing

University of Utah



Dr. John Carter, Associate Professor

School of Computing

University of Utah



Dr. Shubu Mukherjee, Principal Engineer

Fault Aware Computing Technology (FACT/AMI) Group

Intel Corporation



Mr. Brian Slechta, Group Lead

Architecture Modeling Infrastructure Group

Intel Corporation