The ACT (Adaptive Cellular Telephony) Co-Processor
by
Ali Ibrahim
Advised by
Al Davis
Future mobile terminals must support higher data rates, full-motion
video, multimedia applications, and a variety of wireless standards,
while remaining energy efficient, flexible, and low in cost, with a short
time to market. The computational requirements imposed by these
applications and standards have increased exponentially (faster than
Moore's law) since the introduction of first-generation wireless
telephony (1G).
The traditional approach for applications requiring both high performance
and low power is to employ ASICs for compute-intensive components. In
areas where applications evolve rapidly, flexibility is also an important
factor, and a general-purpose or embedded processor approach has often
been used for this reason. For applications such as wireless
communications, voice, and video processing, ASICs are too inflexible and
costly; low-power processors do not have sufficient computational power;
and general-purpose processors consume too much power. This situation
motivates the investigation of an alternative approach.
The key to designing for low power, high performance, and flexibility
lies in finding opportunities for customization within a particular
domain. Many parameters can be involved in this process (memory system,
single- vs. multi-cluster, bypass logic, register files, compression, and
function unit design). Each of these parameters can have a large effect
on performance, power, and flexibility.
In ACT, high efficiency in terms of energy-delay product was achieved
through software-controlled distributed memories, modulo-addressed
distributed single-ported register files, compiler-controlled clock
gating, multi-level reconfigurable interconnects, semi-reconfigurable
address generation units, SIMD ALUs, compression techniques, context
switching, and extra hardware to support special wireless operations. The
processor is essentially a fine-grained VLIW architecture. The
fine-grained software control provides considerable generality, as well
as efficiency in terms of energy-delay product, since different pipelines
can be dynamically reconfigured to support a new processing phase that
resembles the data flows found in an ASIC implementation.
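As a rough illustration of what modulo addressing of a register file can
mean in general, the following Python sketch models a small register file
whose logical register offsets are mapped to physical registers relative
to a rotating base, modulo the file size. This is a generic model under
assumed semantics, not a description of the ACT hardware; the class name
ModuloRegisterFile and its methods are hypothetical.

```python
class ModuloRegisterFile:
    """Generic model of a modulo-addressed (circular) register file.

    Hypothetical illustration only; the actual ACT register file
    design is not specified in the text above.
    """

    def __init__(self, size):
        self.size = size          # number of physical registers
        self.regs = [0] * size    # register contents
        self.base = 0             # rotating base pointer

    def _physical(self, offset):
        # Map a logical offset to a physical index, wrapping modulo size.
        return (self.base + offset) % self.size

    def read(self, offset):
        return self.regs[self._physical(offset)]

    def write(self, offset, value):
        self.regs[self._physical(offset)] = value

    def rotate(self, step=1):
        # Advance the base so the same logical offsets name new registers.
        self.base = (self.base + step) % self.size


rf = ModuloRegisterFile(4)
rf.write(0, 10)   # stored in physical register 0
rf.rotate()       # base moves to 1
rf.write(0, 20)   # stored in physical register 1
```

In this model, a loop body can keep using the same logical offsets on
every iteration while the base pointer rotates, so values written in
earlier iterations remain reachable at shifted offsets.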
For a range of algorithms taken from 3G wireless, DSP, and MPEG kernels,
the processor's energy-delay product is within one to two orders of
magnitude of that of an ASIC and three to four orders of magnitude better
than that of a low-power embedded processor implementation. Energy and
performance numbers for ACT were calculated using Synopsys Nanosim, a
commercial SPICE-level circuit simulator, on a fully synthesized and
back-annotated 0.25 micron Verilog- and Module Compiler-based
implementation.
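The comparison above is stated in terms of energy-delay product, that is,
the energy consumed multiplied by the execution time, where lower is
better. The Python sketch below shows how such a comparison in orders of
magnitude can be computed. All energy and delay figures in it are
hypothetical placeholders chosen only to fall inside the ranges quoted
above; they are not measurements from this work.

```python
import math


def energy_delay_product(energy_joules, delay_seconds):
    """Energy-delay product (lower is better)."""
    return energy_joules * delay_seconds


def orders_of_magnitude(edp_a, edp_b):
    """How many powers of ten edp_a exceeds edp_b by."""
    return math.log10(edp_a / edp_b)


# Hypothetical (energy in joules, delay in seconds) for one kernel.
asic_edp = energy_delay_product(1e-9, 1e-6)
act_edp = energy_delay_product(5e-9, 4e-6)
embedded_edp = energy_delay_product(1e-6, 1e-4)

act_vs_asic = orders_of_magnitude(act_edp, asic_edp)
embedded_vs_act = orders_of_magnitude(embedded_edp, act_edp)

print(f"ACT vs. ASIC: {act_vs_asic:.1f} orders of magnitude worse")
print(f"Embedded vs. ACT: {embedded_vs_act:.1f} orders of magnitude worse")
```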