Architecture for Cellular Telephony

Future mobile terminals must support higher data rates, full-motion video and multimedia applications, and a variety of wireless standards. They must also be energy efficient, flexible, low in cost, and quick to bring to market. The computational requirements imposed by these applications and standards have grown exponentially, faster than Moore's law, since the introduction of first-generation (1G) wireless telephony.

The traditional approach for applications requiring both high performance and low power is to implement the compute-intensive components as ASICs. Where applications evolve rapidly, flexibility also matters, and general-purpose or embedded processors have often been used for that reason. For applications such as wireless communications, voice processing, and video processing, however, ASICs are too inflexible and costly, low-power processors lack sufficient computational power, and general-purpose processors consume too much power. This situation motivates the investigation of an alternative approach.

The key to designing for low power, high performance, and flexibility lies in finding opportunities for customization within a particular domain. Many parameters can be involved in this process, including the memory system, single- vs. multi-cluster organization, bypass logic, register files, compression, and function unit design. Each of them can have a large effect on performance, power, and flexibility.
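To make the size of such a design space concrete, the following sketch enumerates a hypothetical parameter set and picks the configuration with the lowest energy-delay product, the metric ACT is evaluated on. The parameter names follow the text above, but the option values in DESIGN_SPACE and the evaluate callback are illustrative assumptions, not ACT's actual design choices or cost model. In practice each evaluation would come from simulation.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical design space: parameter names follow the text,
# but the option values are assumptions, not ACT's actual choices.
DESIGN_SPACE: Dict[str, List[str]] = {
    "memory_system": ["shared", "distributed"],
    "clusters": ["single", "multi"],
    "bypass_logic": ["full", "partial", "none"],
    "register_files": ["centralized", "distributed"],
    "compression": ["off", "on"],
    "function_units": ["scalar_alu", "simd_alu"],
}


def configurations(space: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield every combination of parameter choices (the Cartesian product)."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def space_size(space: Dict[str, List[str]]) -> int:
    """Number of design points; it grows multiplicatively with each parameter."""
    size = 1
    for options in space.values():
        size *= len(options)
    return size


def best_configuration(
    space: Dict[str, List[str]],
    evaluate: Callable[[Dict[str, str]], Tuple[float, float]],
) -> Dict[str, str]:
    """Return the configuration with the lowest energy-delay product.

    `evaluate` is a hypothetical user-supplied model that returns
    (energy, delay) for one configuration, e.g. from a simulator.
    """
    def edp(cfg: Dict[str, str]) -> float:
        energy, delay = evaluate(cfg)
        return energy * delay

    return min(configurations(space), key=edp)
```

Even these six coarse parameters yield 2 × 2 × 3 × 2 × 2 × 2 = 96 design points, and the count multiplies with every option added.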

In ACT, high efficiency in terms of energy-delay product was achieved through software-controlled distributed memories, modulo-addressed distributed single-ported register files, compiler-controlled clock gating, multi-level reconfigurable interconnects, semi-reconfigurable address generation units, SIMD ALUs, compression techniques, context switching, and extra hardware to support special wireless operations. The processor is essentially a fine-grained VLIW architecture. Its fine-grained software control provides considerable generality as well as energy-delay efficiency, because the pipelines can be dynamically reconfigured for each new processing phase into data flows that resemble those of an ASIC implementation.
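The text does not spell out how ACT's modulo addressing works. One common form, used by rotating register files in software-pipelined (modulo-scheduled) loops, maps a logical register index to a physical one as (logical + base) mod size and shifts base once per loop iteration, so the same instruction encoding touches a fresh physical register in every iteration. The sketch below assumes that scheme. The class and method names are hypothetical, and it models neither the single-ported constraint nor the distribution of register files across the datapath.

```python
class ModuloRegisterFile:
    """Register file whose physical index is (logical + base) mod size.

    Assumed rotating-register scheme, not ACT's documented design.
    After rotate(), the value written to logical r(n) is visible as r(n+1),
    so a loop body can name values from earlier iterations with fixed indices.
    """

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.base = 0
        self.regs = [0] * size

    def _physical(self, logical: int) -> int:
        # Modulo addressing: wrap the offset logical index into the file.
        return (logical + self.base) % self.size

    def read(self, logical: int) -> int:
        return self.regs[self._physical(logical)]

    def write(self, logical: int, value: int) -> None:
        self.regs[self._physical(logical)] = value

    def rotate(self) -> None:
        # Decrementing base renames r(n) of this iteration to r(n+1) of the next.
        self.base = (self.base - 1) % self.size


# Usage: each iteration produces a value into logical r0, then rotates.
rf = ModuloRegisterFile(4)
for i in range(3):
    rf.write(0, i * 10)
    rf.rotate()
```

After the loop, r1 holds the most recent value (20), r2 the previous one (10), and r3 the oldest (0), without any instruction having to name a different register per iteration.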

For a range of algorithms taken from 3G wireless, DSP, and MPEG kernels, the processor comes within one to two orders of magnitude of the energy-delay product of an ASIC and is three to four orders of magnitude more efficient than a low-power embedded processor implementation. Energy and performance numbers for ACT were obtained with Synopsys Nanosim, a commercial SPICE-level circuit simulator, on a fully synthesized and back-annotated 0.25 μm implementation built with Verilog and Module Compiler.
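The comparisons above are stated in orders of magnitude of energy-delay product (EDP), the product of the energy consumed and the execution time. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the code only illustrates the metric; it does not reproduce any ACT measurement.

```python
import math


def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product in joule-seconds; lower is better."""
    return energy_joules * delay_seconds


def orders_of_magnitude(edp_candidate: float, edp_reference: float) -> float:
    """Decades separating two EDPs: log10(candidate / reference).

    A positive result means the candidate's EDP is higher (worse) than the
    reference's; for example, 2.0 means 100 times worse.
    """
    if edp_candidate <= 0 or edp_reference <= 0:
        raise ValueError("EDP values must be positive")
    return math.log10(edp_candidate / edp_reference)
```

Because EDP multiplies the two factors, a design that uses 10 times the energy and takes 10 times as long has a 100-fold higher EDP, which is two orders of magnitude.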