The Avalanche Project


Commercial Impact:

For the Avalanche work to influence future commercial technology, the new context sensitive cache controller unit (CSCCU) must be built for an existing commercial microprocessor. Hewlett-Packard has indicated a strong interest in the Avalanche approach and has donated the design files and associated CAD tools for their HP 7100 CPU. We are working with HP to remove the existing cache controller from the 7100 and to craft an appropriate interface for the new Avalanche CSCCU.

Results To Date:

The goal of our efforts is to reduce all aspects of communication latency without requiring proprietary new operating systems or hardware. The major latency components are system call overhead, interrupt handling overhead, the communication protocol, cache miss penalties incurred in promoting data to the top level cache where it can be used, and interconnect fabric latency. Our efforts address all but the final component, since we will use an existing, commercially available, high performance interconnect provided by Myricom Inc. The project has been underway for less than 3 months. Results to date include:
  1. Development and prototype implementation of a sender-based "0-copy" protocol (SBP) that significantly reduces communication latency for message passing applications running under existing commercial versions of Unix. Our existing testbed is based on HP-UX workstations connected by an FDDI communications medium called Medusa, and we have ported existing PVM and Xlib applications to provide the test cases. Measured results show that latency has been reduced from 360 usec to 85 usec, throughput for memory to memory traffic has been increased from 3.75 MB/sec to 11.2 MB/sec, and throughput for node to node filesystem copy has been increased from 1.7 MB/sec to 8.3 MB/sec. Interrupt handling overhead has been reduced from 25 usec to 7 usec, and system call overhead from 13 usec to 18 cycles. We expect throughput and latency to improve by over an order of magnitude with the Myricom interconnect.

  2. Scalable shared memory performance is impeded by cache misses induced by excessive invalidations and reloading of shared data by write-invalidate coherence protocols. Avalanche supports multiple hardware consistency protocols and a write state buffer that supports multiple concurrent writers to a single cache line. Using an execution-driven simulation of the Avalanche architecture on five SPLASH benchmark applications, we found that these two hardware mechanisms can reduce cache stall times by 5-60% and overall execution times by 10-28%.

  3. We have developed a detailed simulator for the Avalanche architecture based on the Mint simulation tool. Mint itself simulates a collection of processors with support for shared memory, synchronization, and most Unix system calls. We have augmented Mint to support message passing, multiprogramming, and multithreaded applications. On top of this simulation environment, we have implemented a detailed model of the Avalanche architecture, including the full memory hierarchy, context sensitive cache and communication controller, interconnect, and internal queues. The resulting simulator accurately models most features of Avalanche at a fine grain, including cache and memory conflicts, controller bottlenecks, and network contention. Architectural investigation and simulator enhancement continue.
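The before/after figures quoted in item 1 above can be turned into improvement factors with a few lines of arithmetic. The sketch below is illustrative only: the dictionary keys and the helper `improvement_factor` are names made up for this example, and the clock rate used to convert "18 cycles" into microseconds (`ASSUMED_CLOCK_MHZ`) is an assumption, since the text does not state the clock rate of the testbed workstations.

```python
# Before/after figures quoted in item 1 (HP-UX workstations on the Medusa FDDI testbed).
# Each entry: (before, after, which direction is better).
measurements = {
    "message latency (usec)": (360.0, 85.0, "lower"),
    "memory-to-memory throughput (MB/sec)": (3.75, 11.2, "higher"),
    "filesystem copy throughput (MB/sec)": (1.7, 8.3, "higher"),
    "interrupt handling overhead (usec)": (25.0, 7.0, "lower"),
}


def improvement_factor(before, after, better):
    """Return how many times better the 'after' value is than the 'before' value."""
    return before / after if better == "lower" else after / before


factors = {
    name: improvement_factor(before, after, better)
    for name, (before, after, better) in measurements.items()
}

# System call overhead is quoted as 13 usec before and 18 cycles after.
# ASSUMPTION: the clock rate is not given in the text; 100 MHz is a
# placeholder chosen only to make the unit conversion concrete.
ASSUMED_CLOCK_MHZ = 100.0
syscall_after_usec = 18 / ASSUMED_CLOCK_MHZ  # cycles / (cycles per usec)
factors["system call overhead (usec)"] = 13.0 / syscall_after_usec

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x better")
```

Under these figures the latency, throughput, and interrupt handling improvements all fall between roughly 3x and 5x. The system call factor depends entirely on the assumed clock rate and should be recomputed for the actual machine.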


An Overview of Avalanche

The Avalanche project aims to significantly reduce the latency of both distributed shared memory and message passing multiprocessor communications. The project assumes that the commercial sector will continue to provide improvements in both processor and communication fabric technology that will be tough to compete with in an academic research environment. Nonetheless, communication latency continues to inhibit the development of scalable cluster multicomputers.

As processor technology advances, memory size increases and the memory hierarchy deepens in order to feed the increased demands of the processor. The result of this deepening hierarchy has been an alarming growth in cache miss penalties in recent years. The problem is exacerbated in the presence of multicomputer communications traffic. If the incoming traffic is placed directly in the highest levels of the memory hierarchy, the conflict cache miss rate may rise significantly as active lines are displaced by the incoming data. If the incoming traffic is placed in the lowest level of the memory hierarchy, then the huge miss penalties add significantly to the communication latency. Both effects severely degrade the resulting performance of the multicomputer.
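The tradeoff described above can be made concrete with a toy cost model. This is a rough sketch and not the Avalanche simulator: the function names, the parameters, and the numbers in the example are all hypothetical, and the model adds one simplifying assumption that is not in the text, namely that a message line placed in the cache may itself be evicted before the receiver reads it.

```python
def stall_top_level(message_lines, miss_penalty, p_displaced_reused, p_still_cached):
    """Estimated stall cycles when a message is placed in the top level cache.

    p_displaced_reused: fraction of displaced lines that were active and must
        be reloaded (the extra conflict misses described in the text).
    p_still_cached: fraction of message lines still cached when the receiver
        reads them (ASSUMPTION added for this sketch).
    """
    conflict_cost = message_lines * p_displaced_reused * miss_penalty
    eviction_cost = message_lines * (1.0 - p_still_cached) * miss_penalty
    return conflict_cost + eviction_cost


def stall_lowest_level(message_lines, miss_penalty):
    """Estimated stall cycles when a message is placed in the lowest level:
    every message line costs one full miss when the receiver reads it."""
    return message_lines * miss_penalty


def cheaper_placement(message_lines, miss_penalty, p_displaced_reused, p_still_cached):
    """Return 'top' or 'lowest', whichever placement stalls less in this model."""
    top = stall_top_level(message_lines, miss_penalty, p_displaced_reused, p_still_cached)
    low = stall_lowest_level(message_lines, miss_penalty)
    return "top" if top < low else "lowest"


# Hypothetical workload: a 64-line message and a 100-cycle miss penalty.
# Receiver is about to consume the message and few displaced lines are hot.
print(cheaper_placement(64, 100, p_displaced_reused=0.2, p_still_cached=0.9))
# Receiver is busy with other data and most of the message is evicted first.
print(cheaper_placement(64, 100, p_displaced_reused=0.9, p_still_cached=0.3))
```

In this model neither placement wins unconditionally: the better choice depends on what the receiving processor is doing when the message arrives.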

The core of the effort is the development of a new Context Sensitive Cache Controller Unit (CSCCU) for the Hewlett-Packard 7100 CPU, a member of the HP Precision Architecture family. The CSCCU supports a flexible suite of cache coherence protocols for DSM applications, and provides context sensitive injection of incoming message traffic into the appropriate level of the memory hierarchy in order to minimize message latency. The interconnect fabric, called Myrinet, is being provided by Myricom Inc. The target for the project is a 64 processing element prototype, which will be constructed in the final year of the ARPA supported project.

The Avalanche project is currently supported by Advanced Research Projects Agency (ARPA) Order Number B990, and by the Space and Naval Warfare Systems Command (SPAWAR) Contract Number N0039-95-C-0018. Current funding began in November 1994 and runs through November 1997.


Avalanche Personnel

Principal Investigator:

o Al Davis: Chief architect

Co-Principal Investigators:

o John Carter: CSCCU distributed shared memory and cache design
o Kent Smith: VLSI design

Faculty Associates:

o Erik Brunvand: VLSI design
o Ganesh Gopalakrishnan: Protocol verification
o Robert Kessler: Compiler support and applications
o Chris Johnson: Computational engineering applications
o Mark Swanson: CSCCU message passing architecture and protocol design

Staff Members:

o Leigh Stoller: Message passing protocol and simulation tools
o Terry Tateyama: CSCCU to network interface design
o Marshall Soares: VLSI tools and layout
o Kurtis Bleeker: Simulation tools

Students:

o Chen Chi Kuo: CSCCU to network interface design
o Ravi Kuramkote: Cache and distributed shared memory protocol design
o Ratan Nalumasu: Protocol verification
o Steve Baker: VLSI design
o Benny Yih: Applications

Feedback to <avalanche@jensen.cs.utah.edu>.
Last modified around Monday January 31, 1995.