Architecture for Cellular Telephony
A C T
Future mobile terminals must support higher data rates, full-motion video,
multimedia applications, and a variety of wireless standards, while remaining
energy efficient, flexible, and low in cost, with a short time to market.
The computational requirements imposed by these applications and standards have
grown exponentially (faster than Moore's law) since the introduction of
first-generation wireless telephony (1G).
The traditional approach for applications requiring both high performance
and low power is to employ ASICs
for the compute-intensive components. In areas where applications evolve
rapidly, flexibility is also an important factor, and a general-purpose or
embedded processor has often been used for this reason. For
applications such as wireless communications, voice, and video processing,
none of these options is satisfactory: ASICs are too inflexible and costly,
low-power processors lack sufficient computational power, and general-purpose
processors consume too much power. This situation motivates the investigation
of an alternative approach.
The key to designing for low power, high performance, and flexibility
is finding opportunities for customization to a particular
domain. A large number of parameters can be involved in this
process (memory system, single- vs. multi-cluster organization,
bypass logic, register files, compression, and function unit design), and each
of them can have a large effect on performance, power,
and flexibility.
In ACT, high efficiency in terms of energy-delay product was
achieved through software-controlled
distributed memories, modulo-addressed distributed single-ported
register files, compiler-controlled clock gating, multi-level reconfigurable interconnects,
semi-reconfigurable address generation units, SIMD ALUs, compression
techniques, context switching, and extra
hardware to support special wireless operations.
The processor is essentially a fine-grained VLIW architecture.
The fine-grained software control provides considerable generality
as well as efficiency in terms of energy-delay product,
because different pipelines can be dynamically reconfigured to
support each new processing phase with data flows that resemble those of an ASIC
implementation.
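
To make the idea of a modulo-addressed register file concrete, the following
is a minimal sketch in Python. It is a generic illustration of modulo
(rotating) addressing, not a description of the actual ACT hardware: the class
name ModuloRegisterFile, its methods, and the register count are all
hypothetical. It assumes that each access adds an offset to a base pointer and
wraps the result around the size of the file.

```python
class ModuloRegisterFile:
    """Hypothetical register file whose addresses wrap modulo its size."""

    def __init__(self, size):
        self.regs = [0] * size  # register contents
        self.base = 0           # rotating base pointer

    def _index(self, offset):
        # Physical index = (base + offset) mod size.
        return (self.base + offset) % len(self.regs)

    def read(self, offset):
        return self.regs[self._index(offset)]

    def write(self, offset, value):
        self.regs[self._index(offset)] = value

    def rotate(self, step=1):
        # Advance the base pointer, e.g. once per loop iteration.
        self.base = (self.base + step) % len(self.regs)


rf = ModuloRegisterFile(4)
rf.write(0, 10)   # stored at physical index 0
rf.rotate()       # base pointer is now 1
value = rf.read(3)  # (1 + 3) mod 4 = 0, the register written above
```

Under this assumption, software can step through a window of registers by
rotating the base pointer instead of renaming registers in every instruction.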
For a range of algorithms taken from 3G wireless, DSP, and MPEG
kernels, the processor is within one to two orders of magnitude of the energy-delay
product of an ASIC and three to four orders of magnitude more efficient than a
low-power embedded processor implementation.
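
The comparison above is stated in orders of magnitude of energy-delay product.
As a minimal sketch of how such a gap can be computed, assuming the usual
definition of energy-delay product as energy multiplied by execution time, the
Python below uses hypothetical function names and placeholder inputs. The
numbers exist only to exercise the code and are not measurements of ACT or of
any other design.

```python
import math


def energy_delay_product(energy_joules, delay_seconds):
    """Energy-delay product in joule-seconds; lower is better."""
    return energy_joules * delay_seconds


def orders_of_magnitude(edp_a, edp_b):
    """log10 of the ratio edp_a / edp_b."""
    return math.log10(edp_a / edp_b)


# Placeholder values only, NOT measured data.
reference = energy_delay_product(1e-6, 1e-3)
candidate = energy_delay_product(1e-5, 1e-2)
gap = orders_of_magnitude(candidate, reference)
```

Here the candidate uses ten times the energy and takes ten times as long as
the reference, so its energy-delay product is about two orders of magnitude
higher.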
Energy and performance numbers for ACT were calculated
using Synopsys Nanosim, a commercial SPICE-level circuit simulator, on
a fully synthesized and back-annotated 0.25 μm implementation based on
Verilog and Module Compiler.