Reading List

CMP Cache Design

1. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. Kim, Burger, and Keckler. ASPLOS-2002.
2. A NUCA substrate for flexible CMP Cache sharing. Huh, Kim, Shafi, Zhang, Burger, and Keckler. ICS-2005.
3. Managing wire delay in large chip multiprocessor caches. Beckmann and Wood. MICRO-2004.
4. Nahalal: Cache organization for chip multiprocessors. Guz, Keidar, Kolodny, and Weiser. CAL-2007.
5. Cooperative caching for chip multiprocessors. Chang and Sohi. ISCA-2006.
6. An adaptive shared/private NUCA cache partitioning scheme for chip multiprocessors. Dybdahl and Stenstrom. HPCA-2007.
7. A NUCA model for embedded systems cache design. Foglia, Mangano, and Prete.
8. ASR: Adaptive selective replication for CMP caches. Beckmann, Marty, and Wood. MICRO-2006.
9. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. MICRO-2006.
10. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems. HPCA-2008.
11. Molecular caches: A caching structure for dynamic creation of application-specific heterogeneous cache regions. MICRO-2006.
12. Adaptive set pinning: Managing shared caches in chip multiprocessors. ASPLOS-2008.

Exploring the design space of future CMPs. PACT-2001.
Exploring the cache design space for large scale CMPs. dasCMP-2005.
Organizing the last line of defense before hitting the memory wall for CMPs. HPCA-2004.
Communist, utilitarian, and capitalist cache policies on CMPs: Caches as a shared resource. PACT-2006.

Victim Replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors. Zhang and Asanovic. ISCA-2005.
Optimizing replication, communication and capacity allocation in CMPs. Chishti, Powell, and Vijaykumar. ISCA-2005.
Distance associativity for high-performance energy-efficient non-uniform cache architectures. Chishti, Powell, and Vijaykumar. MICRO-2003.
Dynamic partitioning of shared cache memory. Suh, Rudolph, and Devadas. Journal of Supercomputing-2004.

Predicting inter-thread contention on a chip multi-processor architecture. HPCA-2005.
Architectural support for operating system-driven CMP cache management. PACT-2006.
Just say no: Benefits of early cache miss determination. HPCA-2003.
Datacenter-on-chip Architectures: Tera-scale opportunities and challenges. Iyer et al. Intel Tech. Journal-2007.
The V-Way Cache: Demand-based associativity via global replacement. Qureshi, Thompson, and Patt. ISCA-2005.
A case for MLP-aware cache replacement. Qureshi, Lynch, Mutlu, and Patt. ISCA-2006.

Interconnects

1. Interconnections in multi-core architectures: Understanding mechanisms, overheads and scaling. Kumar, Zyuban, and Tullsen. ISCA-2005.
2. Interconnect design considerations for large NUCA caches. Muralimanohar and Balasubramonian. ISCA-2007.
3. Interconnect-aware coherence protocols for chip multiprocessors. Cheng, Muralimanohar, Ramani, Balasubramonian, and Carter. ISCA-2006.
4. Leveraging wire properties at the microarchitectural level. Balasubramonian, Muralimanohar, Ramani, Cheng, and Carter. MICRO-2006.
5. Microarchitectural wire management for performance and power in partitioned architectures. Balasubramonian, Muralimanohar, Ramani, and Venkatachalapathy. HPCA-2005.

Misc.

1. The Landscape of Parallel Computing Research: A View from Berkeley. Asanovic et al. 2006.
2. Recognition, Mining and Synthesis Moves Computers to the Era of Tera. Dubey. Intel Tech Magazine-2005.

Maximizing CMP throughput with mediocre cores. Davis, Laudon, and Olukotun. PACT-2006.
Dataflow predication. Smith et al. MICRO-2006.
Molecular caches: A caching structure for dynamic creation of application-specific heterogeneous cache regions. Varadarajan et al. MICRO-2006.
Computation spreading: Employing hardware migration to specialize CMP cores on-the-fly. Chakraborty, Wells, and Sohi. ASPLOS-2006.


Kshitij Sudan
Last modified: Feb. 8, 2008.