Application-Specific Accelerators for Communications

  • Yang Sun
  • Kiarash Amiri
  • Michael Brogioli
  • Joseph R. Cavallaro


For computation-intensive digital signal processing algorithms, complexity is exceeding the processing capabilities of general-purpose digital signal processors (DSPs). In some of these applications, DSP hardware accelerators have been widely used to off-load a variety of algorithms from the main DSP host. These include FFTs, FIR/IIR filters, multiple-input multiple-output (MIMO) detectors, and error-correction decoders (Viterbi, Turbo, and LDPC). Given power and cost constraints, simply implementing these computationally complex parallel algorithms on a high-speed general-purpose DSP is not very efficient. However, not all DSP algorithms are suitable for off-loading to a hardware accelerator. First, an algorithm should consist of data-parallel computations and repeated operations that are amenable to hardware implementation. Second, it should have a deterministic dataflow graph that maps onto parallel datapaths. The accelerators we consider are mostly coarse-grained, to better handle streaming data transfers and achieve both high performance and low power. In this chapter, we focus on some of the basic and advanced digital signal processing algorithms for communications and cover major examples of DSP accelerators for communications.
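The two off-loading criteria above can be illustrated with a direct-form FIR filter, one of the kernels the abstract lists. The following is a minimal Python sketch, not the chapter's accelerator design, and the function name `fir_filter` is hypothetical. The loop bounds depend only on the input length and the number of taps K, never on sample values, so the dataflow graph is fixed. Each output sample is an independent sum of K identical multiply-accumulate (MAC) operations, so a hardware datapath could compute several outputs, or all K products of one output, in parallel.

```python
def fir_filter(x, h):
    """Direct-form FIR filter: y[n] = sum_{k=0}^{K-1} h[k] * x[n-k].

    Hypothetical illustration of an accelerator-friendly kernel:
    - deterministic dataflow: loop trip counts depend only on len(x) and len(h);
    - data parallelism: each y[n] is computed independently of every other y[n];
    - repeated operations: the inner body is the same MAC, executed K times.
    """
    K = len(h)
    # Prepend K-1 zeros so every output uses exactly K taps, with no
    # data-dependent branch for samples "before" x[0].
    xp = [0] * (K - 1) + list(x)
    y = []
    for n in range(len(x)):          # independent outputs: candidates for parallel lanes
        acc = 0
        for k in range(K):           # K identical MACs, one per filter tap
            acc += h[k] * xp[n + K - 1 - k]  # xp[n + K - 1 - k] equals x[n - k]
        y.append(acc)
    return y
```

For example, `fir_filter([1, 2, 3, 4], [1, 1])` returns `[1, 3, 5, 7]`. The total work is always `len(x) * K` MACs regardless of the data. This fixed schedule is what lets a hardware implementation replace the inner loop with K multipliers feeding an adder chain or tree. The Python loops only make that schedule explicit.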







Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Yang Sun, Rice University, Houston, USA
  • Kiarash Amiri, Rice University, Houston, USA
  • Michael Brogioli, Freescale Semiconductor Inc., Austin, USA
  • Joseph R. Cavallaro, Rice University, Houston, USA
