Approach
The Avalanche design effort is based on using as much existing
commercial technology as possible, in particular the Hewlett-Packard
PA-8000 microprocessor and the Myricom Myrinet interconnect
technology.
The PA-8000 is scheduled for release in mid-1996.
This choice was
motivated by Hewlett-Packard's interest and collaboration in the research
and by the expected maturity date of the research. It makes little sense to
design around a new microprocessor that will be obsolete by the time
the research work reaches the implementation stage. Maximizing the use of
commercially available technology minimizes the resources spent
reinventing the wheel and sharpens the focus on key technical problems. It
also makes it easier to transition this technology into the commercial
mainstream and improves the likelihood that the research will influence
future commercial design decisions.
The Avalanche-specific effort can be viewed as an all-out attack on
communication latency. It is important to view the total latency path
from the application's standpoint. This total latency has a number of
components:
- the overhead of the applications programming interface at both
the sending and receiving ends;
- the communication protocol overhead;
- the I/O subsystem delay;
- the penalties caused
by context switches, security verification, data validation, and the
operating system;
- and, with growing significance, the number of stall cycles
that result from cache misses.
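To make the additive view of latency concrete, the following Python sketch models a per-message latency budget as the sum of the components listed above, with the cache-miss term computed as misses times the miss penalty in cycles, divided by the clock rate. The class name, field names, and all numbers are hypothetical placeholders, not Avalanche measurements.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """Hypothetical per-message latency components (microseconds unless noted)."""
    api_send_us: float          # API overhead at the sending end
    api_recv_us: float          # API overhead at the receiving end
    protocol_us: float          # communication protocol overhead
    io_subsystem_us: float      # I/O subsystem delay
    os_penalties_us: float      # context switches, security checks, validation, OS
    cache_misses: int           # misses to main memory while handling the message
    miss_penalty_cycles: int    # CPU cycles stalled per miss
    clock_mhz: float            # CPU clock; 1 MHz = 1 cycle per microsecond

    def stall_us(self) -> float:
        # Total stall cycles divided by cycles per microsecond.
        return self.cache_misses * self.miss_penalty_cycles / self.clock_mhz

    def total_us(self) -> float:
        # End-to-end latency as seen by the application: every component adds.
        return (self.api_send_us + self.api_recv_us + self.protocol_us
                + self.io_subsystem_us + self.os_penalties_us + self.stall_us())


budget = LatencyBudget(api_send_us=2.0, api_recv_us=2.0, protocol_us=3.0,
                       io_subsystem_us=1.5, os_penalties_us=1.0,
                       cache_misses=40, miss_penalty_cycles=50, clock_mhz=200.0)
print(budget.stall_us(), budget.total_us())  # 10.0 19.5
```

With these placeholder values the stall term is 40 × 50 / 200 = 10 µs, about half of the 19.5 µs total, which illustrates how the cache-miss component can dominate once the software overheads have been trimmed.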
As processor performance growth outstrips the rate of improvement of
DRAM components, the memory hierarchy has deepened. For uniprocessors,
this has allowed system performance to keep reasonable pace with that of
the CPU. For parallel systems with frequent communication, however, it
presents a significant problem: the deeper hierarchy has caused the cost
of a cache miss to main memory, measured in CPU cycles, to grow by a
factor of six to eight over the last five years. Hence there is a
significant need to inject incoming communication data as high in the
memory hierarchy as possible without increasing memory latency through
conflict displacement of active cache lines.
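The growth in miss cost measured in CPU cycles follows from a simple relation: the number of cycles a miss stalls the processor is roughly the main-memory access time $t_{\text{mem}}$ multiplied by the clock frequency $f_{\text{clk}}$. The numbers in the second line are hypothetical, chosen only to show how a modest DRAM improvement combined with a large clock-rate increase produces a factor in the six-to-eight range.

```latex
C_{\text{miss}} \approx t_{\text{mem}}\, f_{\text{clk}},
\qquad
\frac{C'_{\text{miss}}}{C_{\text{miss}}}
  = \frac{t'_{\text{mem}}}{t_{\text{mem}}}\cdot\frac{f'_{\text{clk}}}{f_{\text{clk}}}

% Hypothetical example: t_mem 150 ns -> 120 ns, f_clk 25 MHz -> 200 MHz
C_{\text{miss}} = 150\,\text{ns}\times 25\,\text{MHz} = 3.75\ \text{cycles},
\quad
C'_{\text{miss}} = 120\,\text{ns}\times 200\,\text{MHz} = 24\ \text{cycles},
\quad
\frac{24}{3.75} = 0.8\times 8 = 6.4
```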
The Avalanche effort is creating a novel cache control mechanism that
provides a tight, context-sensitive integration between the CPU, the
memory hierarchy, and the communication fabric. The approach is neither
CPU-specific nor fabric-specific and can therefore leverage existing
commercial CPU and fabric technology. The prototype implementation,
however, is specific to the Myrinet fabric and the PA-8000 CPU. The
context-sensitive approach to injecting message data into the receiving
memory hierarchy, and to choosing the consistency protocol for
distributed shared memory applications, is also new. Further reductions
in communication latency can come from improving the applications
programming interface (API), the communication protocols, and the
hardware interface to the communication fabric, all of which have
typically been designed for generality rather than for performance.
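One way to picture context-sensitive injection is as a policy evaluated when a message arrives. The Python sketch below is a hypothetical policy, not the Avalanche controller's actual logic: it places message lines in the cache only when the receiver is waiting for them, and only up to the number of lines that can be overwritten without displacing active data, sending the remainder to main memory. `ArrivalContext` and `plan_injection` are illustrative names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ArrivalContext:
    """Hypothetical information available to the controller when a message arrives."""
    receiver_waiting: bool    # is the receiving thread blocked on this message?
    message_lines: int        # message size, in cache lines
    reusable_lines: int       # lines that can be overwritten without evicting active data


def plan_injection(ctx: ArrivalContext) -> Tuple[int, int]:
    """Return (lines deposited into the cache, lines deposited into main memory)."""
    if not ctx.receiver_waiting:
        # Data will not be read soon; caching it would only displace active lines.
        return 0, ctx.message_lines
    # Inject as much as fits without conflict displacement; spill the rest to memory.
    cached = min(ctx.message_lines, ctx.reusable_lines)
    return cached, ctx.message_lines - cached


print(plan_injection(ArrivalContext(True, 64, 16)))   # (16, 48)
print(plan_injection(ArrivalContext(False, 8, 32)))   # (0, 8)
```

For example, a 64-line message arriving for a waiting receiver with 16 reusable lines is split into 16 cached lines and 48 lines deposited in memory; a real controller would presumably weigh more context than these three fields.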
The focus for the first year of the project has been:
- creating an accurate simulation environment (PAINT) capable of
testing the new memory architecture ideas,
- creating an efficient message-passing API (Direct Deposit) and
designing and implementing Direct Deposit on an existing commercial
operating system,
- designing and prototyping the new hardware communications interface
that provides efficient hardware support for the Direct Deposit protocol,
- developing simulation infrastructure to test the ideas for context
sensitivity and flexible coherency protocols,
- porting test applications into the simulation environment,
- creating a novel application synthesis system that will greatly
enhance the ability to examine the utility of architectural design choices
with respect to application algorithmic structure,
- developing formal verification expertise capable of validating our
cache protocol design, and
- creating the VLSI CAD infrastructure that will permit final
implementation of the Avalanche architecture in the 0.6 micron
HP CMOS 14B process.
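This section does not describe the Direct Deposit protocol itself. Assuming a sender-based model, in which the receiver exports a buffer in advance and the sender names the offset where data should land, the deposit step reduces to a bounds-checked write into the receiver's region with no intermediate buffering. The Python sketch below simulates that assumed model in a single process; `ExportedBuffer`, `Sender`, and `deposit` are hypothetical names, and a Python object reference stands in for what would be a network handle.

```python
class ExportedBuffer:
    """Hypothetical receive-side region registered with the interface in advance."""

    def __init__(self, size: int) -> None:
        self.mem = bytearray(size)
        self.deposits = 0  # stand-in for a completion counter the receiver can poll


class Sender:
    """Hypothetical sender holding a handle to one remote exported buffer."""

    def __init__(self, remote: ExportedBuffer) -> None:
        self.remote = remote  # a real system would hold a segment id, not a reference

    def deposit(self, offset: int, data: bytes) -> None:
        # The sender names the destination, so data lands in place: no
        # intermediate system buffer and no copy on the receive side.
        end = offset + len(data)
        if offset < 0 or end > len(self.remote.mem):
            raise ValueError("deposit falls outside the exported region")
        self.remote.mem[offset:end] = data
        self.remote.deposits += 1


buf = ExportedBuffer(16)
Sender(buf).deposit(4, b"ping")
print(bytes(buf.mem[4:8]), buf.deposits)  # b'ping' 1
```

Because the destination is fixed before the data moves, the receive side has no matching or copying to do, which is the kind of work a hardware interface could then take over.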
The project has been underway for eight months and is on schedule to
complete the above by the end of FY95.
Next year's focus will be to:
- finalize the architecture choice for the context-sensitive memory
control via simulation using real and synthesized applications,
- formally verify our cache management protocol,
- validate our VLSI CAD capability and interface design with a fragment
test chip fabricated through MOSIS, which now supports the
HP CMOS 14B process,
- test this fragment chip with the PA-8000 when it is introduced in
mid-1996, and
- begin implementing the final context-sensitive controller chip.
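Formal verification of a cache protocol typically involves exhaustively exploring its reachable states and checking an invariant in each one. The Python sketch below applies explicit-state breadth-first search to a toy MSI-style protocol, an assumption chosen for illustration rather than the Avalanche protocol, and checks the single-writer invariant: a Modified copy excludes every other valid copy.

```python
from collections import deque
from typing import Iterator, Set, Tuple

State = Tuple[str, ...]  # one entry per cache: "M", "S", or "I"


def successors(s: State) -> Iterator[State]:
    n = len(s)
    for i in range(n):
        # Read miss by cache i: any Modified copy elsewhere is downgraded to Shared.
        if s[i] == "I":
            yield tuple("S" if j == i else ("S" if s[j] == "M" else s[j])
                        for j in range(n))
        # Write by cache i: gain exclusive ownership, invalidate all other copies.
        yield tuple("M" if j == i else "I" for j in range(n))
        # Eviction by cache i.
        if s[i] != "I":
            yield tuple("I" if j == i else s[j] for j in range(n))


def invariant(s: State) -> bool:
    # Single writer: a Modified copy excludes every other valid copy.
    return s.count("M") == 0 or (s.count("M") == 1 and s.count("S") == 0)


def check(n_caches: int) -> Tuple[bool, int]:
    """Explore all reachable states; return (invariant holds, states explored)."""
    start: State = ("I",) * n_caches
    seen: Set[State] = {start}
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        if not invariant(s):
            return False, len(seen)
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return True, len(seen)


print(check(2))  # (True, 6)
print(check(3))  # (True, 11)
```

For n caches this toy protocol reaches 2^n + n states. Removing the invalidation from the write rule would let a state such as (M, S) become reachable, and `check` would report the violation; a realistic protocol with transient states and message queues needs a dedicated model checker rather than a hand-written search.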
In FY97, the focus will be on implementing a 64-processing-element
prototype of the Avalanche system using the PA-8000 CPU, the custom
Avalanche cache controller chip, and the Myrinet interconnect, and on
quantifying the performance advantages of the approach on important
large-scale applications chosen by ARPA.
This work was sponsored by the Space and Naval Warfare Systems Command
(SPAWAR) and the Advanced Research Projects Agency (ARPA), Communication
and Memory Architectures for Scalable Parallel Computing, ARPA order
#B990 under SPAWAR contract #N00039-95-C-0018.
Feedback to <avalanche@jensen.cs.utah.edu>.