University of Utah
School of Computing
 

Improving the Performance of Large-Scale Shared Memory Systems

by
Liqun Cheng

Advised by
John Carter

As network hop latency rapidly approaches thousands of processor cycles, it becomes a major factor in determining the performance of parallel programs. Unfortunately, cc-NUMA designs put the directory access on the critical path of 3-hop (cache-to-cache) misses, which incurs high overhead. In the producer-consumer sharing pattern, the producer must access the home directory, send invalidations to all consumers, and collect all invalidation acknowledgements before it can obtain exclusive ownership. Similarly, consumers wishing to use the new data must send requests to the home directory and wait for the producer to downgrade its exclusive copy, which also involves 3 network hops.
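The consumer-side 3-hop miss described above can be traced as a short message sequence. The following is a minimal sketch, not the simulated protocol: the function names, the message labels, and the assumption that the producer replies with data directly to the consumer are illustrative and do not come from the paper.

```python
def baseline_read_miss(consumer, home, producer):
    """Critical-path messages for a consumer read miss on a line that the
    producer holds exclusively (assumed conventional directory protocol)."""
    return [
        (consumer, home, "read request"),            # hop 1: look up the directory
        (home, producer, "downgrade intervention"),  # hop 2: directory forwards to owner
        (producer, consumer, "data reply"),          # hop 3: cache-to-cache transfer
    ]


def network_hops(messages):
    """Count only the messages that actually cross the network."""
    return sum(1 for src, dst, _ in messages if src != dst)


if __name__ == "__main__":
    trace = baseline_read_miss("consumer", "home", "producer")
    for src, dst, kind in trace:
        print(f"{src} -> {dst}: {kind}")
    print("network hops:", network_hops(trace))
```

In this model, when the producer happens to be the home node, the intervention stays local and the count drops to two. That is the effect the delegation scheme aims to obtain for lines whose home is elsewhere.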

This paper proposes an adaptive protocol that delegates directory information to the producer node after detecting a stable producer-consumer sharing pattern. With the directory access removed from the critical path, the producer and consumers can communicate with each other directly, which converts these 3-hop misses into 2-hop misses. Moreover, we exploit a hardware write-update (PUT) mechanism to implement a novel write-update protocol on a directory-based system under sequential consistency, and we demonstrate how selective updates allow the producer to pre-send needed data to a targeted set of consumers, which converts 2-hop misses into local misses.
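The three cases in this paragraph (3-hop, 2-hop, and local) can be compared with a toy hop-count model. This is a sketch under stated assumptions: the `Mode` names, the node labels, and the latency formula (hops times a fixed per-hop cost, ignoring directory lookup and cache access time) are hypothetical simplifications and are not taken from the paper's simulator.

```python
from enum import Enum


class Mode(Enum):
    BASELINE = "home directory on the critical path"
    DELEGATED = "directory delegated to the producer"
    DELEGATED_UPDATE = "delegation plus selective update"


def read_miss_messages(mode, consumer="C", home="H", producer="P"):
    """Network messages on the critical path of one consumer read."""
    if mode is Mode.BASELINE:
        # Request goes to home, is forwarded to the producer, data comes back.
        return [(consumer, home), (home, producer), (producer, consumer)]
    if mode is Mode.DELEGATED:
        # The producer now holds the directory entry, so the consumer asks it directly.
        return [(consumer, producer), (producer, consumer)]
    # DELEGATED_UPDATE: data was pushed to the consumer ahead of the read,
    # so the read is satisfied locally with no network messages.
    return []


def critical_path_cycles(mode, hop_cycles):
    """Assumed cost model: every network hop costs `hop_cycles` cycles."""
    return len(read_miss_messages(mode)) * hop_cycles


if __name__ == "__main__":
    for mode in Mode:
        hops = len(read_miss_messages(mode))
        print(f"{mode.name}: {hops} hops, {critical_path_cycles(mode, 1000)} cycles")
```

The update case is modeled as free on the critical path only because the push happens before the consumer reads. The update messages still consume network bandwidth, which is presumably why the abstract stresses that updates are selective and go to a targeted set of consumers.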

Using a cycle-accurate, execution-driven simulator of a future-generation SGI multiprocessor, we show that our proposed adaptive cache coherence protocol speeds up program execution by 13% and reduces network traffic by 17% with a 32-entry delegate cache and 32KB RACs. With a 1K-entry delegate cache and 1MB RACs, performance improves by 21% and network traffic drops by 15%.

