Impulse: Building a Smarter Memory Controller
The goal of the Impulse project is to build a configurable main memory controller that will significantly increase processor cache and system bus utilization. The Impulse memory controller provides an interface for software (e.g., the operating system, compiler, or runtime libraries) to remap physical memory to support scatter/gather-style memory accesses; a sketch of this style of access follows the list below. This capability enables the controller to:

fetch from and store to discontiguous data as if it were densely allocated,

create superpages from discontiguous physical pages, so that each superpage can be mapped by a single TLB entry,

prefetch pointer-based data structures, and

aggressively prefetch in the memory controller without danger of cache pollution due to inaccurate prefetching heuristics.
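
To make the scatter/gather idea concrete, here is a minimal sketch in C. The impulse_remap_strided() name is illustrative only, not the actual Impulse interface, and its software "gather" body merely stands in for remapping work the controller would perform in hardware; the point is the access pattern the CPU ends up seeing.

    /* Illustrative sketch only.  impulse_remap_strided() is a made-up name,
     * and its software "gather" body stands in for remapping that the
     * Impulse controller would perform in hardware, with no copy made by
     * the CPU.  The point is the access pattern: the processor reads a
     * dense alias for a strided matrix column, so every fetched cache line
     * is fully used. */
    #include <stddef.h>
    #include <stdlib.h>

    #define N 1024
    static double A[N][N];          /* row-major matrix */

    /* Stand-in for the remapping operation: gather `count` elements that
     * are `stride_elems` apart into a dense buffer. */
    static double *impulse_remap_strided(const double *base,
                                         size_t stride_elems, size_t count)
    {
        double *dense = malloc(count * sizeof *dense);
        for (size_t i = 0; i < count; i++)
            dense[i] = base[i * stride_elems];
        return dense;
    }

    double column_sum(size_t col)
    {
        /* Direct accesses to A[i][col] touch a new cache line per element
         * and waste the rest of each line; the remapped alias is dense. */
        double *dense_col = impulse_remap_strided(&A[0][col], N, N);
        double sum = 0.0;
        for (size_t i = 0; i < N; i++)
            sum += dense_col[i];
        free(dense_col);
        return sum;
    }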

Although we strive to keep these web pages as up to date as possible, the best way to find out about the latest in Impulse technology is to read our publications and technical reports.

Impulse People

Impulse Publications

Impulse-related Conferences

Impulse compilation effort (centered at the University of Massachusetts)



Motivation Behind Impulse

The Impulse project is motivated by the oft-quoted observation that microprocessor and L1 cache speeds are increasing at roughly 60% per year, while DRAM latencies are improving by only about 7% per year. This growing performance disparity is driving efforts to develop increasingly complex cache hierarchies to hide memory latency. The problem is getting worse as CPU designers introduce ever greater degrees of internal parallelism, e.g., 4-way superscalar processors are being displaced by 8-way designs. Von Neumann's 1945 prediction continues to hold true: memory is the primary system bottleneck.

The traditional approach to attacking the memory system bottleneck has been to build deeper and more complex cache hierarchies. We believe that this approach is beginning to fail, especially for critical commercial and military applications such as database management, data mining, image processing, sparse matrix operations, simulations, and stream-oriented multimedia applications. A study by Sites and Perl on a commercial database workload showed that memory bus and DRAM latencies caused an 8X slowdown from peak performance to actual performance. A major source of the disparity between estimated peak performance and observed performance on real applications is the static nature of the cache-to-memory interface, which causes several problems. Data that is contiguous in main memory is loaded as a unit into contiguous regions of the cache, even when adjacent data items are unrelated; this wastes valuable cache space and memory bus capacity. In addition, the location of data in main memory statically determines where in the cache it is mapped, without regard to whether other active data is mapped there as well, which induces unnecessary conflict misses (and the high-latency cache refills that follow). These issues have existed for years, but only recently has the gap between CPU speed and memory speed grown so large that the spatial and temporal locality in many programs is insufficient to mask them.
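
As a small, self-contained illustration of the static-mapping problem (with made-up cache parameters, not those of any particular processor), the following C snippet computes the set an address maps to in a direct-mapped cache and shows two arrays that evict each other on every access:

    /* Illustrative parameters only: a 64 KB direct-mapped cache with
     * 64-byte lines.  The set an address maps to is a fixed function of
     * that address, so data laid out a multiple of the cache size apart
     * conflicts no matter how much of the cache is otherwise idle. */
    #include <stdio.h>

    #define LINE_SIZE   64
    #define CACHE_SIZE  (64 * 1024)
    #define NUM_SETS    (CACHE_SIZE / LINE_SIZE)

    static unsigned set_index(unsigned long addr)
    {
        return (unsigned)((addr / LINE_SIZE) % NUM_SETS);  /* static mapping */
    }

    int main(void)
    {
        unsigned long a = 0x100000;           /* start of one hot array    */
        unsigned long b = a + CACHE_SIZE;     /* second array one cache    */
                                              /* size away in memory       */
        for (int i = 0; i < 4; i++) {
            unsigned long off = (unsigned long)i * LINE_SIZE;
            /* Both arrays hit the same set at every offset, so alternating
             * accesses to them miss every time (conflict misses). */
            printf("offset %2d lines: a -> set %4u, b -> set %4u\n",
                   i, set_index(a + off), set_index(b + off));
        }
        return 0;
    }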

One possible approach to attacking this emerging problem is to design and build a more complex, programmable cache controller on or near the CPU chip. We believe that this approach is impractical and unnecessary. It is impractical because the CPU-cache interface is extremely timing sensitive: changing it requires that CPU manufacturers get involved, which adds several years before any results are seen, and demands a large demonstrated benefit to be worth the effort. We believe that a better alternative is to leave the existing CPU-cache interface untouched and instead build a smart main memory controller that can adapt the way main memory appears to the CPU. We are developing such a memory controller, which we hypothesize will significantly improve the utilization of cache and memory bus capacity, thereby improving overall system performance.


Collaborations

We are working closely with the Architecture and Language Implementation group at the University of Massachusetts to develop compiler technology for Impulse. As part of this effort, we are extending the Scale compiler framework to transform programs automatically to exploit Impulse's novel memory controller features. In particular, we are working on compiler-directed superpage creation, software-based prefetching at the main memory controller (MMC), and automatic insertion of Impulse remapping operations in loops that access arrays inefficiently.
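
To give a flavor of the loop transformation, here is a hypothetical before/after sketch; it is not actual Scale output, and impulse_remap_indirect() is a made-up call whose software body stands in for gathering that the controller would perform.

    /* Hypothetical before/after sketch of the transformation; the API name
     * is invented, and its body gathers in software where Impulse would
     * gather at the memory controller. */
    #include <stddef.h>
    #include <stdlib.h>

    /* Before: each data[index[i]] likely lands in a different cache line,
     * and most of every fetched line is wasted. */
    double sum_before(const double *data, const int *index, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += data[index[i]];
        return sum;
    }

    /* Stand-in for the remapping operation the compiler would insert. */
    static double *impulse_remap_indirect(const double *data,
                                          const int *index, size_t n)
    {
        double *dense = malloc(n * sizeof *dense);
        for (size_t i = 0; i < n; i++)
            dense[i] = data[index[i]];
        return dense;
    }

    /* After: the loop reads a dense alias, so the bus carries only the
     * elements the loop actually uses. */
    double sum_after(const double *data, const int *index, size_t n)
    {
        double *dense = impulse_remap_indirect(data, index, n);
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += dense[i];
        free(dense);
        return sum;
    }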

This work is being done with the generous support of the Defense Advanced Research Projects Agency Information Technology Office (DARPA-ITO), the Air Force Research Laboratory, SGI, and Hewlett-Packard. The university and investigators are seeking patent rights on the technology being developed through the Impulse effort. Other parties interested in collaborating with us, supporting the project, or licensing our technology should contact the Principal Investigator.

We would also like to thank Intel, Xilinx, Cadence, Synopsys, Avant!, and Innoveda for their support.



Feedback to: impulse@cs.utah.edu

Last modified September 2000