The Avalanche Project
For the Avalanche work to have an impact on future commercial
technology, it is necessary to create the new context sensitive cache
controller unit (CSCCU) for an existing commercial microprocessor.
Hewlett-Packard has indicated a strong interest in the Avalanche approach
and has donated the design files and associated CAD tools for their HP 7100
CPU. We are working with HP to remove the existing cache controller from
the 7100 and to craft an appropriate interface for the new Avalanche CSCCU.
Results To Date:
The goal of our efforts is to reduce all aspects of communication latency
without requiring proprietary new operating systems or hardware. The major
latency components are the system call overhead, interrupt handling
overhead, the communication protocol, cache miss penalties to promote data
to the top level cache where it can be used, and interconnect fabric
latency. Our efforts address all but the final component since we will use
an existing commercially available, high performance interconnect provided
by Myricom Inc. The project has been underway for less than 3 months;
results to date include:
- Development and prototype implementation of a sender-based "0-copy"
protocol (SBP) which provides a significant reduction in communication
latency for message passing applications which run under existing
commercial versions of Unix. Our existing testbed is based on HP-UX
workstations connected by an FDDI communications medium called Medusa, and we
have ported existing PVM and Xlib applications to provide the test cases.
Measured results show latency has been reduced from 360 usec to 85 usec,
throughput for memory to memory traffic has been increased from 3.75 MB/sec
to 11.2 MB/sec, and throughput for node to node filesystem copy has been
increased from 1.7 MB/sec to 8.3 MB/sec. Interrupt handling overhead
has been reduced from 25 usec to 7 usec. System call overhead has been
reduced from 13 usec to 18 cycles! We expect throughput and latency to
improve by over an order of magnitude with the Myricom interconnect.
- Scalable shared memory performance is impeded by cache misses induced by
excessive invalidations and reloading of shared data by write-invalidate
coherence protocols. Avalanche supports multiple hardware
consistency protocols and a write state buffer that supports multiple
concurrent writers to a single cache line.
Using an execution-driven simulation of the Avalanche architecture on five
SPLASH benchmark applications, we found that these two hardware mechanisms can
reduce cache stall times by 5-60% and overall execution times by 10-28%.
- We have developed a detailed simulator for the Avalanche architecture
based on the Mint simulation tool. Mint itself simulates a
collection of processors with support for shared memory, synchronization, and
most Unix system calls. We have augmented Mint to support message passing,
multiprogramming, and multithreaded applications. On top of this simulation
environment, we have implemented a detailed model of the Avalanche
architecture, including the full memory hierarchy, context sensitive cache and
communication controller, interconnect, and internal queues. The resulting
simulator accurately models most features of Avalanche at a fine grain,
including cache and memory conflicts, controller bottlenecks, and network
contention. Architectural investigation and simulator enhancement continue.
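
As a quick check on the SBP measurements reported above, the following Python sketch recomputes the improvement factor for each before/after pair. The figures are copied from the results; the table layout and function names are ours, for illustration only. The system call figure is reported in cycles rather than usec, so converting it requires a clock rate, which the text does not state; the 100 MHz value used below is purely an assumed placeholder.

```python
# Before/after figures quoted in the results above (HP-UX testbed with SBP).
# Each entry: (before, after, which direction counts as an improvement).
MEASUREMENTS = {
    "message latency (usec)": (360.0, 85.0, "lower"),
    "memory-to-memory throughput (MB/sec)": (3.75, 11.2, "higher"),
    "node-to-node filesystem copy (MB/sec)": (1.7, 8.3, "higher"),
    "interrupt handling overhead (usec)": (25.0, 7.0, "lower"),
}


def improvement_factor(before, after, better):
    """Return how many times better `after` is than `before`."""
    return before / after if better == "lower" else after / before


def summarize(table):
    """Map each measurement name to its improvement factor, rounded to 2 places."""
    return {
        name: round(improvement_factor(before, after, better), 2)
        for name, (before, after, better) in table.items()
    }


def cycles_to_usec(cycles, clock_mhz):
    """Convert a cycle count to microseconds for a given clock rate in MHz."""
    return cycles / clock_mhz


if __name__ == "__main__":
    for name, factor in summarize(MEASUREMENTS).items():
        print(f"{name}: {factor}x")

    # ASSUMPTION: the clock rate is not given in the text; 100 MHz is a
    # placeholder chosen only to make the unit conversion concrete.
    assumed_clock_mhz = 100.0
    syscall_after_usec = cycles_to_usec(18, assumed_clock_mhz)
    print(f"system call overhead: 13 usec -> {syscall_after_usec:.2f} usec "
          f"(at an assumed {assumed_clock_mhz:.0f} MHz)")
```

The factors come out to roughly 4.2x for latency, 3.0x for memory-to-memory throughput, 4.9x for filesystem copy, and 3.6x for interrupt handling. The system call ratio depends entirely on the assumed clock rate and should be recomputed with the real one.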
Avalanche is a project that aims to
significantly reduce the latency of both distributed shared memory (DSM)
and message passing multiprocessor communications.
The project assumes that the commercial sector will continue to provide
improvements in both processor and communication fabric technology that
will be tough to compete with in an academic research environment.
Nonetheless, communication latency continues to be an inhibiting influence
on the development of scalable cluster multicomputers.
As processor technology advances, memory size increases and the memory
hierarchy deepens in order to feed the increased demands of the processor.
The result of this deepening hierarchy is an alarming acceleration in cache
miss penalties in recent years. The problem is exacerbated in the presence
of multicomputer communications traffic. If the incoming traffic is placed
directly in the highest levels of the memory hierarchy, the conflict cache
miss rate may rise significantly as active lines are displaced by the
incoming data. If the incoming traffic is placed in the lowest level of
the memory hierarchy, then the huge miss penalties add significantly to the
communication latency. Both effects severely degrade the
resulting performance of the multicomputer.
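
The placement tradeoff described above can be illustrated with a toy model. The sketch below is not the Avalanche simulator: it assumes a single-level, direct-mapped cache, a working set that exactly fills it, and a hypothetical message buffer region whose lines map onto the first few cache indexes. All names and parameters are invented for illustration. It counts misses when incoming message lines are either injected into the cache or left in memory, for a receiver that consumes each message immediately and for one that does not touch it at all.

```python
class DirectMappedCache:
    """Toy single-level, direct-mapped cache that tracks one tag per line."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines

    def access(self, line_addr):
        """CPU access: return True on a hit; on a miss, load the line."""
        index = line_addr % self.num_lines
        hit = self.tags[index] == line_addr
        self.tags[index] = line_addr
        return hit

    def inject(self, line_addr):
        """Network-side fill: place an incoming line directly in the cache."""
        self.tags[line_addr % self.num_lines] = line_addr


def simulate(inject, consumer_active, num_lines=64, msg_lines=16, rounds=10):
    """Return (working_set_misses, message_misses) over `rounds` messages.

    inject          -- True: incoming lines go into the cache;
                       False: they stay in memory until the CPU reads them.
    consumer_active -- True: the receiver reads each message on arrival;
                       False: the message is not read during the run.
    """
    cache = DirectMappedCache(num_lines)
    working_set = range(num_lines)      # active lines; exactly fill the cache
    for addr in working_set:
        cache.access(addr)              # warm up, so the run starts with no misses

    # Hypothetical buffer region: a multiple of num_lines, so message line i
    # maps to cache index i and collides with working-set line i.
    msg_base = 1000 * num_lines
    working_set_misses = 0
    message_misses = 0

    for r in range(rounds):
        message = [msg_base + r * num_lines + i for i in range(msg_lines)]
        if inject:
            for addr in message:
                cache.inject(addr)      # displaces active lines
        if consumer_active:
            for addr in message:
                if not cache.access(addr):
                    message_misses += 1
        for addr in working_set:        # the application resumes its own work
            if not cache.access(addr):
                working_set_misses += 1

    return working_set_misses, message_misses


if __name__ == "__main__":
    for inject in (True, False):
        for active in (True, False):
            print(f"inject={inject!s:5} consumer_active={active!s:5} "
                  f"-> {simulate(inject, active)}")
```

With the default parameters, injection wins when the receiver is waiting for the data: (160, 0) misses versus (160, 160) when the message is left in memory. It loses when the receiver is not active, because injection displaces 16 active lines per message for no benefit: (160, 0) versus (0, 0). Miss counts stand in for latency here; a real hierarchy would weight each miss by the penalty of the level that services it.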
The core of the effort is the development of a new Context Sensitive Cache
Controller Unit (CSCCU) for the Hewlett-Packard 7100 CPU, a member of the HP
Precision Architecture family. The CSCCU supports a flexible suite of cache
coherence protocols for DSM applications, and provides context sensitive
injection of incoming message traffic into the appropriate level of the memory
hierarchy in order to minimize message latency. The interconnect fabric,
called Myrinet, is being provided by Myricom Inc. The target for the project
is a 64 processing element prototype which will be constructed in the final
year of the ARPA-supported project.
The Avalanche project is currently supported by the Advanced Research Projects
Agency (ARPA), Order Number B990, and by the Space and Naval Warfare Systems
Command (SPAWAR), Contract Number N0039-95-C-0018.
Current funding began in November 1994, and runs through November 1997.
- Principal Investigator:
Al Davis: Chief architect
- Co-Principal Investigators:
John Carter: CSCCU distributed shared memory and cache design
Kent Smith: VLSI design
- Faculty Associates:
Erik Brunvand: VLSI design
Ganesh Gopalakrishnan: Protocol verification
Robert Kessler: Compiler support and applications
Chris Johnson: Computational engineering applications
Mark Swanson: CSCCU message passing architecture and protocol design
- Staff Members:
Leigh Stoller: Message passing protocol and simulation tools
Terry Tateyama: CSCCU to network interface design
Marshall Soares: VLSI tools and layout
Kurtis Bleeker: Simulation tools
- Students:
Chen Chi Kuo: CSCCU to network interface design
Ravi Kuramkote: Cache and distributed shared memory protocol design
Ratan Nalumasu: Protocol verification
Steve Baker: VLSI design
Benny Yih: Applications
Feedback to <avalanche@jensen.cs.utah.edu>.
Last modified around Monday January 31, 1995.