Techniques for Reducing Consistency-Related Communication in Distributed Shared Memory Systems

John B. Carter, Univ. of Utah

John K. Bennett and Willy Zwaenepoel, Rice Univ.

Distributed shared memory (DSM) is a software abstraction of shared memory on a distributed memory machine. One of the key problems in building an efficient DSM system is to reduce the amount of communication needed to keep the distributed memories consistent. In this paper we present four techniques for doing so: (1) software release consistency; (2) multiple consistency protocols; (3) write-shared protocols; and (4) an update-with-timeout mechanism. These techniques have been implemented in the Munin DSM system. We compare the performance of seven application programs on Munin, first to their performance when implemented using message passing and then to their performance when running on a conventional DSM system that does not embody the above techniques. On a 16-processor cluster of workstations, Munin's performance is within 5% of message passing for four out of the seven applications. For the other three, performance is within 29% to 33%. Detailed analysis of two of these three applications indicates that the addition of a function shipping capability would bring their performance within 7% of the message passing performance. Compared to a conventional DSM system, Munin achieves performance improvements ranging from a few to several hundred percent, depending on the application.

A version of the full paper will appear in TOCS.