Impulse: Building a Smarter Memory Controller
fetch from and store to discontiguous data as if it were densely allocated,
create superpages that can be mapped in a single TLB slot from discontiguous physical pages,
prefetch pointer-based data structures, and
aggressively prefetch in the memory controller without danger of cache pollution due to inaccurate prefetching heuristics.
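The first capability above, accessing discontiguous data as if it were densely allocated, can be pictured with a small software model. This is only an illustrative sketch under stated assumptions: the class name `StridedAlias`, its parameters, and the flat Python list standing in for main memory are all hypothetical, and the code says nothing about how the Impulse hardware actually performs the remapping.

```python
class StridedAlias:
    """Hypothetical dense alias over strided elements of a flat 'memory' list.

    Index i of the alias refers to memory[base + i * stride], so a caller
    sees a contiguous sequence even though the underlying elements are
    scattered. This is a software analogy, not the Impulse mechanism.
    """

    def __init__(self, memory, base, stride, count):
        self.memory = memory
        self.base = base
        self.stride = stride
        self.count = count

    def _translate(self, i):
        # Map a dense index to the location of the scattered element.
        if not 0 <= i < self.count:
            raise IndexError(i)
        return self.base + i * self.stride

    def __getitem__(self, i):
        return self.memory[self._translate(i)]

    def __setitem__(self, i, value):
        self.memory[self._translate(i)] = value

    def __len__(self):
        return self.count


# A 4x4 row-major matrix; its diagonal elements are 5 slots apart.
n = 4
matrix = list(range(n * n))
diagonal = StridedAlias(matrix, base=0, stride=n + 1, count=n)

print(list(diagonal))   # [0, 5, 10, 15]
diagonal[2] = -1        # a store through the alias lands in the matrix
print(matrix[10])       # -1
```

In this model the caller iterates over four adjacent alias slots, while the underlying elements remain five slots apart in the matrix.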
Impulse compilation effort (centered at the University of Massachusetts)
The traditional approach to attacking the memory system bottleneck has been to build deeper and more complex cache hierarchies. We believe that this approach is beginning to fail, especially for critical commercial and military applications such as database management, data mining, image processing, sparse matrix operations, simulations, and streams-oriented multimedia applications. A study by Sites and Perl on a commercial database workload showed that memory bus and DRAM latencies caused an 8X slowdown from peak performance to actual performance.
A major source of the disparity between estimated peak performance and observed performance on real applications is the static nature of the cache-to-memory interface, which causes a number of problems. First, data that is contiguous in main memory is loaded as a unit into contiguous regions of the cache, even when adjacent data items are unrelated, which wastes valuable cache space and memory bus capacity. Second, the location of data in main memory statically determines where in the cache it is mapped, without regard to whether other active data is also mapped there, which induces unnecessary conflict misses (and subsequent high-latency cache refills). These issues have existed for years, but only recently has the gap between CPU speed and memory speed become so large that the spatial and temporal locality in many programs is insufficient to mask them.
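The two problems with a static cache-to-memory interface can be made concrete with a little arithmetic. The sketch below assumes a direct-mapped cache with 32-byte lines and 256 lines (8 KiB); these parameters and the function names are illustrative assumptions, not figures from any particular machine or from the Impulse design.

```python
LINE_BYTES = 32   # assumed cache line size
NUM_LINES = 256   # assumed number of lines (8 KiB direct-mapped cache)


def cache_index(addr):
    """Line slot that a byte address maps to in a direct-mapped cache."""
    return (addr // LINE_BYTES) % NUM_LINES


def line_utilization(stride_bytes, elem_bytes):
    """Fraction of each fetched line that a strided scan actually uses.

    Assumes aligned elements with elem_bytes <= LINE_BYTES and
    stride_bytes >= elem_bytes.
    """
    return min(1.0, elem_bytes / min(stride_bytes, LINE_BYTES))


# Wasted capacity: reading one 8-byte value every 4096 bytes uses only
# a quarter of every 32-byte line that crosses the memory bus.
print(line_utilization(stride_bytes=4096, elem_bytes=8))   # 0.25

# Conflict misses: two addresses exactly one cache size apart compete
# for the same slot, no matter how empty the rest of the cache is.
print(cache_index(0), cache_index(LINE_BYTES * NUM_LINES))  # 0 0
```

Under these assumptions, a dense scan (stride equal to the element size) uses every byte of each line, while the strided scan fetches four times more data than it needs.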
One possible approach to attacking this emerging problem is to design and build a more complex, programmable cache controller on or near the CPU chip. We believe that this approach is impractical and unnecessary. It is impractical because the CPU-cache interface is extremely timing sensitive. Changing this interface requires that CPU manufacturers get involved, which adds several years to the time before any results will be seen, and requires a large demonstrated benefit to be worth the effort. We believe that a better alternative is to leave the existing CPU-cache interface untouched, and instead build a smart main memory controller that can adapt the way that main memory appears to the CPU. We are developing such a memory controller, which we hypothesize will significantly improve the utilization of cache and memory bus capacity, thereby improving overall system performance.
This work is being done with the generous support of the Defense Advanced Research Projects Agency Information Technology Office (DARPA-ITO), the Air Force Research Laboratory, SGI, and Hewlett-Packard. The university and investigators are seeking patent rights on the technology being developed through the Impulse effort. Other parties interested in collaborating with us, supporting the project, or licensing our technology should contact the Principal Investigator.
We would also like to thank Intel, Xilinx, Cadence, Synopsys, Avant!, and Innoveda for their support.
Last modified September 2000