Advertisement

Application-Specific Accelerators for Communications

  • Chance Tarver
  • Yang Sun
  • Kiarash Amiri
  • Michael Brogioli
  • Joseph R. Cavallaro
Chapter

Abstract

For computation-intensive digital signal processing algorithms, complexity is exceeding the processing capabilities of general-purpose digital signal processors (DSPs). In some of these applications, DSP hardware accelerators have been widely used to off-load a variety of algorithms from the main DSP host, including the fast Fourier transform, digital filters, multiple-input multiple-output detectors, and error correction codes (Viterbi, turbo, low-density parity-check) decoders. Given power and cost considerations, simply implementing these computationally complex parallel algorithms with high-speed general-purpose DSP processor is not very efficient. However, not all DSP algorithms are appropriate for off-loading to a hardware accelerator. First, these algorithms should have data-parallel computations and repeated operations that are amenable to hardware implementation. Second, these algorithms should have a deterministic dataflow graph that maps to parallel datapaths. In this chapter, we focus on some of the basic and advanced digital signal processing algorithms for communications and cover major examples of DSP accelerators for communications.

References

  1. 1.
    LTE; Evolved Universal Terrestrial Radio Access (E-UTRA) User Equipment (UE) radio transmission and reception, 3GPP TS 36.101 V13.2.1 (Release 13) (May 2016)Google Scholar
  2. 2.
    Abdelaziz, M., Tarver, C., Li, K., Anttila, L., Martinez, R., Valkama, M., Cavallaro, J.R.: Sub-Band Digital Predistortion for Noncontiguous Transmissions: Algorithm Development and Real-Time Prototype Implementation. In: 2015 49th Asilomar Conference on Signals, Systems and Computers, pp. 1180–1186 (2015).  https://doi.org/10.1109/ACSSC.2015.7421326
  3. 3.
    Alamouti, S.M.: A Simple Transmit Diversity Technique for Wireless Communications. IEEE Journal on Selected Areas in Communications 16(8), 1451–1458 (1998)CrossRefGoogle Scholar
  4. 4.
    Amiri, K., Cavallaro, J.R.: FPGA Implementation of Dynamic Threshold Sphere Detection for MIMO Systems. In: IEEE Asilomar Conf. on Signals, Syst. and Computers, pp. 94–98 (2006)Google Scholar
  5. 5.
    Analog Devices: The SHARC Processor Family. http://www.analog.com/en/products/processors-dsp/sharc.html (2016)
  6. 6.
    Andrews, J.G., Buzzi, S., Choi, W., Hanly, S.V., Lozano, A., Soong, A.C.K., Zhang, J.C.: What Will 5G Be? IEEE Journal on Selected Areas in Communications 32(6), 1065–1082 (2014).  https://doi.org/10.1109/JSAC.2014.2328098 CrossRefGoogle Scholar
  7. 7.
    Bahl, L., Cocke, J., Jelinek, F., Raviv, J.: Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate. IEEE Transactions on Information Theory IT-20, 284–287 (1974)MathSciNetCrossRefGoogle Scholar
  8. 8.
    Berrou, C., Glavieux, A., Thitimajshima, P.: Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes. In: IEEE Int. Conf. on Commun., pp. 1064–1070 (1993)Google Scholar
  9. 9.
    Brack, T., Alles, M., Lehnigk-Emden, T., Kienle, F., Wehn, N., Lapos, Insalata, N., Rossi, F., Rovini, M., Fanucci, L.: Low Complexity LDPC Code Decoders for Next Generation Standards. In: Design, Automation, and Test in Europe, pp. 1–6 (2007)Google Scholar
  10. 10.
    Brogioli, M.: Reconfigurable Heterogeneous DSP/FPGA Based Embedded Architectures for Numerically Intensive Embedded Computing Workloads. Ph.D. thesis, Rice University, Houston, Texas, USA (2007)Google Scholar
  11. 11.
    Brogioli, M., Radosavljevic, P., Cavallaro, J.: A General Hardware/Software Codesign Methodology for Embedded Signal Processing and Multimedia Workloads. In: IEEE Asilomar Conf. on Signals, Syst., and Computers, pp. 1486–1490 (2006)Google Scholar
  12. 12.
    Burg, A.: VLSI Circuits for MIMO Communication Systems. Ph.D. thesis, Swiss Federal Institute Of Technology, Zurich, Switzerland (2006)Google Scholar
  13. 13.
    Burg, A., Borgmann, M., Wenk, M., Zellweger, M., Fichtner, W., Bolcskei, H.: VLSI Implementation of MIMO Detection using the Sphere Decoding Algorithm. IEEE Journal of Solid-State Circuits 40(7), 1566–1577 (2005)CrossRefGoogle Scholar
  14. 14.
  15. 15.
    Cheng, C.C., Tsai, Y.M., Chen, L.G., Chandrakasan, A.: A 0.077 to 0.168 nJ/bit/iteration Scalable 3GPP LTE Turbo Decoder with an Adaptive Sub-Block Parallel Scheme and an Embedded DVFS Engine. In: IEEE Custom Integrated Circuits Conference, pp. 1–4 (2010)Google Scholar
  16. 16.
    Cupaiuolo, T., Siti, M., Tomasoni, A.: Low-Complexity High Throughput VLSI Architecture of Soft-Output ML MIMO Detector. In: Design, Automation and Test in Europe Conference and Exhibition, pp. 1396–1401 (2010)Google Scholar
  17. 17.
    Damen, M.O., Gamal, H.E., Caire, G.: On Maximum Likelihood Detection and the Search for the Closest Lattice Point. IEEE Transaction on Information Theory 49(10), 2389–2402 (2003)MathSciNetCrossRefGoogle Scholar
  18. 18.
    Fincke, U., Pohst, M.: Improved Methods for Calculating Vectors of Short Length in a Lattice, Including a Complexity Analysis. Mathematics of Computation 44(170), 463–471 (1985)zbMATHGoogle Scholar
  19. 19.
    Foschini, G.: Layered Space-Time Architecture for Wireless Communication in a Fading Environment when Using Multiple Antennas. Bell Labs. Tech. Journal 2, 41–59 (1996)Google Scholar
  20. 20.
    Freescale Semiconductor: MSC8156 Six Core Broadband Wireless Access DSP. www.freescale.com/starcore (2009)
  21. 21.
    Gallager, R.: Low-Density Parity-Check Codes. IEEE Transactions on Information Theory IT-8, 21–28 (1962)MathSciNetCrossRefGoogle Scholar
  22. 22.
    Garrett, D., Davis, L., ten Brink, S., Hochwald, B., Knagge, G.: Silicon Complexity for Maximum Likelihood MIMO Detection Using Spherical Decoding. IEEE Journal of Solid-State Circuits 39(9), 1544–1552 (2004)CrossRefGoogle Scholar
  23. 23.
    Garrido, M., Qureshi, F., Takala, J., Gustafsson, O.: Hardware architectures for the fast Fourier transform. In: S.S. Bhattacharyya, E.F. Deprettere, R. Leupers, J. Takala (eds.) Handbook of Signal Processing Systems, third edn. Springer (2018)Google Scholar
  24. 24.
    Ghannouchi, F.M., Hammi, O.: Behavioral Modeling and Predistortion. IEEE Microwave Magazine 10(7), 52–64 (2009).  https://doi.org/10.1109/MMM.2009.934516 CrossRefGoogle Scholar
  25. 25.
    Golden, G., Foschini, G.J., Valenzuela, R.A., Wolniansky, P.W.: Detection Algorithms and Initial Laboratory Results Using V-BLAST Space-Time Communication Architecture. Electronics Letters 35(1), 14–15 (1999)CrossRefGoogle Scholar
  26. 26.
    Gunnam, K., Choi, G.S., Yeary, M.B., Atiquzzaman, M.: VLSI Architectures for Layered decoding for Irregular LDPC Codes of WiMax. In: IEEE International Conference on Communications, pp. 4542–4547 (2007)Google Scholar
  27. 27.
    Guo, Z., Nilsson, P.: Algorithm and Implementation of the K-best Sphere Decoding for MIMO Detection. IEEE Journal on Seleteced Areas in Communications 24(3), 491–503 (2006)CrossRefGoogle Scholar
  28. 28.
    Gustafsson, O., Wanhammar, L.: Arithmetic. In: S.S. Bhattacharyya, E.F. Deprettere, R. Leupers, J. Takala (eds.) Handbook of Signal Processing Systems, third edn. Springer (2018)Google Scholar
  29. 29.
    Han, S., Tellambura, C.: A Complexity-Efficient Sphere Decoder for MIMO Systems. In: IEEE International Conference on Communications, pp. 1–5 (2011)Google Scholar
  30. 30.
    Hassibi, B., Vikalo, H.: On the Sphere-Decoding Algorithm I. Expected Complexity. IEEE Transaction On Signal Processing 53(8), 2806–2818 (2005)MathSciNetCrossRefGoogle Scholar
  31. 31.
    Hunter, H.C., Moreno, J.H.: A New Look at Exploiting Data Parallelism in Embedded Systems. In: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, pp. 159–169 (2003)Google Scholar
  32. 32.
    Jin, J., Tsui, C.: Low-Complexity Switch Network for Reconfigurable LDPC Decoders. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 18(8), 1185–1195 (2010)CrossRefGoogle Scholar
  33. 33.
    Katz, A., Wood, J., Chokola, D.: The Evolution of PA Linearization: From Classic Feedforward and Feedback Through Analog and Digital Predistortion. IEEE Microwave Magazine 17(2), 32–40 (2016).  https://doi.org/10.1109/MMM.2015.2498079 CrossRefGoogle Scholar
  34. 34.
    Kessler, C.W.: Compiling for VLIW DSPs. In: S.S. Bhattacharyya, E.F. Deprettere, R. Leupers, J. Takala (eds.) Handbook of Signal Processing Systems, third edn. Springer (2018)Google Scholar
  35. 35.
    Larsson, E.G., Edfors, O., Tufvesson, F., Marzetta, T.L.: Massive MIMO for Next Generation Wireless Systems. IEEE Communications Magazine 52(2), 186–195 (2014).  https://doi.org/10.1109/MCOM.2014.6736761 CrossRefGoogle Scholar
  36. 36.
    Lechner, G., Sayir, J., Rupp, M.: Efficient DSP Implementation of an LDPC Decoder. In: IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 4, pp. 665–668 (2004)Google Scholar
  37. 37.
    Lee, S.J., Shanbhag, N.R., Singer, A.C.: Area-Efficient High-Throughput MAP Decoder Architectures. IEEE Transaction on VLSI Systems 13(8), 921–933 (2005)CrossRefGoogle Scholar
  38. 38.
    Li, K., Ghazi, A., Boutellier, J., Abdelaziz, M., Anttila, L., Juntti, M., Valkama, M., Cavallaro, J.R.: Mobile GPU Accelerated Digital Predistortion on a Software-Defined Mobile Transmitter. In: 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 756–760 (2015).  https://doi.org/10.1109/GlobalSIP.2015.7418298
  39. 39.
    Li, K., Ghazi, A., Tarver, C., Boutellier, J., Abdelaziz, M., Anttila, L., Juntti, M.J., Valkama, M., Cavallaro, J.R.: Parallel Digital Predistortion Design on Mobile GPU and Embedded Multicore CPU for Mobile Transmitters. CoRR abs/1612.09001 (2016). URL http://arxiv.org/abs/1612.09001
  40. 40.
    Li, K., Yin, B., Wu, M., Cavallaro, J.R., Studer, C.: Accelerating Massive MIMO Uplink Detection on GPU for SDR Systems. In: 2015 IEEE Dallas Circuits and Systems Conference (DCAS), pp. 1–4 (2015).  https://doi.org/10.1109/DCAS.2015.7356600
  41. 41.
    Lin, C.H., Chen, C.Y., Wu, A.Y.: Area-Efficient Scalable MAP Processor Design for High-Throughput Multistandard Convolutional Turbo Decoding. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 19(2), 305–318 (2011)CrossRefGoogle Scholar
  42. 42.
    Mango: WARP Project. URL http://warpproject.org
  43. 43.
    Martina, M., Nicola, M., Masera, G.: A Flexible UMTS-WiMax Turbo Decoder Architecture. IEEE Transactions on Circuits and Systems II 55(4), 369–273 (2008)CrossRefGoogle Scholar
  44. 44.
    May, M., Ilnseher, T., Wehn, N., Raab, W.: A 150Mbit/s 3GPP LTE Turbo Code Decoder. In: IEEE Design, Automation & Test in Europe Conference & Exhibition, pp. 1420–1425 (2010)Google Scholar
  45. 45.
    McAllister, J.: High performance stream processing on FPGA. In: S.S. Bhattacharyya, E.F. Deprettere, R. Leupers, J. Takala (eds.) Handbook of Signal Processing Systems, third edn. Springer (2018)Google Scholar
  46. 46.
    Menard, D., Caffarena, G., Lopez, J.A., Novo, D., Sentieys, O.: Analysis of finite word-length effects in fixed-point systems. In: S.S. Bhattacharyya, E.F. Deprettere, R. Leupers, J. Takala (eds.) Handbook of Signal Processing Systems, third edn. Springer (2018)Google Scholar
  47. 47.
    Myllylä, M., Silvola, P., Juntti, M., Cavallaro, J.R.: Comparison of Two Novel List Sphere Detector Algorithms for MIMO-OFDM Systems. In: IEEE International Symposium on Personal Indoor and Mobile Radio Communications, pp. 1–5 (2006)Google Scholar
  48. 48.
  49. 49.
    NXP Semiconductor: QorIQ Layerscape: A Converged Architecture Approach (2017)Google Scholar
  50. 50.
    Parhi, K.K.: VLSI Digital Signal Processing Systems Design and Implementation. Wiley (1999)Google Scholar
  51. 51.
    Pelcat, M.: Models of architecture for DSP systems. In: S.S. Bhattacharyya, E.F. Deprettere, R. Leupers, J. Takala (eds.) Handbook of Signal Processing Systems, third edn. Springer (2018)Google Scholar
  52. 52.
    Qualcomm: Snapdragon 835 Mobile Platform. online: https://www.qualcomm.com/products/snapdragon/processors/835 (2017)
  53. 53.
    Renfors, M., Juntti, M., Valkama, M.: Signal processing for wireless transceivers. In: S.S. Bhattacharyya, E.F. Deprettere, R. Leupers, J. Takala (eds.) Handbook of Signal Processing Systems, third edn. Springer (2018)Google Scholar
  54. 54.
    Rovini, M., Gentile, G., Rossi, F., Fanucci, L.: A Scalable Decoder Architecture for IEEE 802.11n LDPC Codes. In: IEEE Global Telecommunications Conference, pp. 3270–3274 (2007)Google Scholar
  55. 55.
    Sadjadpour, H., Sloane, N., Salehi, M., Nebe, G.: Interleaver Design for Turbo Codes. IEEE Journal on Seleteced Areas in Communications 19(5), 831–837 (2001)CrossRefGoogle Scholar
  56. 56.
    Salmela, P., Gu, R., Bhattacharyya, S., Takala, J.: Efficient Parallel Memory Organization for Turbo Decoders. In: Proc. European Signal Processing Conf., pp. 831–835 (2007)Google Scholar
  57. 57.
    Shin, M.C., Park, I.C.: A Programmable Turbo Decoder for Multiple 3G Wireless Standards. In: IEEE Solid-State Circuits Conference, vol. 1, pp. 154–484 (2003)Google Scholar
  58. 58.
    Studer, C., Benkeser, C., Belfanti, S., Huang, Q.: Design and Implementation of a Parallel Turbo-Decoder ASIC for 3GPP-LTE. IEEE Journal of Solid-State Circuits 46(1), 8–17 (2011)CrossRefGoogle Scholar
  59. 59.
    Sun, J., Takeshita, O.: Interleavers for Turbo Codes Using Permutation Polynomials Over Integer Rings. IEEE Transaction on Information Theory 51(1), 101–119 (2005)MathSciNetCrossRefGoogle Scholar
  60. 60.
    Sun, Y.: Parallel VLSI Architectures for Multi-Gbps MIMO Communication Systems. Ph.D. thesis, Rice University, Houston, Texas, USA (2010)Google Scholar
  61. 61.
    Sun, Y., Cavallaro, J.R.: A Low-power 1-Gbps Reconfigurable LDPC Decoder Design for Multiple 4G Wireless Standards. In: IEEE International SOC Conference, pp. 367–370 (2008)Google Scholar
  62. 62.
    Sun, Y., Cavallaro, J.R.: Scalable and Low Power LDPC Decoder Design Using High Level Algorithmic Synthesis. In: IEEE International SOC Conference (SoCC), pp. 267–270 (2009)Google Scholar
  63. 63.
    Sun, Y., Cavallaro, J.R.: A Flexible LDPC/Turbo Decoder Architecture. Journal of Signal Processing System 64(1), 1–16 (2011)CrossRefGoogle Scholar
  64. 64.
    Sun, Y., Cavallaro, J.R.: Efficient Hardware Implementation of a Highly-Parallel 3GPP LTE, LTE-Advance Turbo Decoder. Integration, the VLSI Journal, Special Issue on Hardware Architectures for Algebra, Cryptology and Number Theory 44(4), 305–315 (2011)Google Scholar
  65. 65.
    Sun, Y., Karkooti, M., Cavallaro, J.R.: VLSI Decoder Architecture for High Throughput, Variable Block-Size and Multi-Rate LDPC Codes. In: IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2104–2107 (2007)Google Scholar
  66. 66.
    Sun, Y., Wang, G., Cavallaro, J.R.: Multi-Layer Parallel Decoding Algorithm and VLSI Architecture for Quasi-Cyclic LDPC Codes. In: IEEE International Symposium on Circuits and Systems, pp. 1776–1779 (2011)Google Scholar
  67. 67.
    Sun, Y., Zhu, Y., Goel, M., Cavallaro, J.R.: Configurable and Scalable High Throughput Turbo Decoder Architecture for Multiple 4G Wireless Standards. In: IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pp. 209–214 (2008)Google Scholar
  68. 68.
    Sung, W.: Optimization of number representations. In: S.S. Bhattacharyya, E.F. Deprettere, R. Leupers, J. Takala (eds.) Handbook of Signal Processing Systems, third edn. Springer (2018)Google Scholar
  69. 69.
    Sutter, B.D., Raghavan, P., Lambrechts, A.: Coarse grained reconfigurable array architectures. In: S.S. Bhattacharyya, E.F. Deprettere, R. Leupers, J. Takala (eds.) Handbook of Signal Processing Systems, third edn. Springer (2018)Google Scholar
  70. 70.
    Tarokh, V., Jafarkhani, H., Calderbank, A.R.: Space-Time Block Codes from Orthogonal Designs. IEEE Transactions on Information Theory 45(5), 1456–1467 (1999)MathSciNetCrossRefGoogle Scholar
  71. 71.
    Tarokh, V., Jafarkhani, H., Calderbank, A.R.: Space Time Block Coding for Wireless Communications: Performance Results. IEEE Journal on Selected Areas in Communications 17(3), 451–460 (1999)CrossRefGoogle Scholar
  72. 72.
    Telatar, I.E.: Capacity of Multiantenna Gaussian Channels. European Transaction on Telecommunications 10, 585–595 (1999)MathSciNetCrossRefGoogle Scholar
  73. 73.
    Texas Instruments: TMS320TCI6614 Communications Infrastructure KeyStone SoC Data Manual. http://www.ti.com/lit/ds/symlink/tms320tci6614.pdf (2013)
  74. 74.
    Texas Instruments: Communications Processors Products. http://focus.ti.com/docs/prod/folders/print/tms320c6474.html (2016)
  75. 75.
    Texas Instruments: Digital Signal Processors. https://www.ti.com/lsds/ti/processors/dsp/overview.page (2017)
  76. 76.
    Texas Instruments: Wideband Transmit IC Solution with integrated Digital Predistortion, Digital Upconversion. online: http://www.ti.com/product/GC5322/description (2017)
  77. 77.
    Wannstrom, J.: Carrier Aggregation Explained. online: http://www.3gpp.org/technologies/keywords-acronyms/101-carrier-aggregation-explained (2013)
  78. 78.
    Wijting, C., Ojanpera, T., Juntti, M., Kansanen, K., Prasad, R.: Groupwise Serial Multiuser Detectors for Multirate DS-CDMA. In: IEEE Vehicular Technology Conference, vol. 1, pp. 836–840 (1999)Google Scholar
  79. 79.
    Willmann, P., Kim, H., Rixner, S., Pai, V.S.: An Efficient Programmable 10 Gigabit Ethernet Network Interface Card. In: ACM International Symposium on High-Performance Computer Architecture, pp. 85–86 (2006)Google Scholar
  80. 80.
    Witte, E., Borlenghi, F., Ascheid, G., Leupers, R., Meyr, H.: A Scalable VLSI Architecture for Soft-Input Soft-Output Single Tree-Search Sphere Decoding. IEEE Tran. on Circuits and Systems II: Express Briefs 57(9), 706–710 (2010)CrossRefGoogle Scholar
  81. 81.
    Wong, C.C., Chang, H.C.: Reconfigurable Turbo Decoder with Parallel Architecture for 3GPP LTE System. IEEE Tran. on Circuits and Systems II: Express Briefs 57(7), 566–570 (2010)CrossRefGoogle Scholar
  82. 82.
    Wong, K., Tsui, C., Cheng, R.S., Mow, W.: A VLSI Architecture of a K-best Lattice Decoding Algorithm for MIMO Channels. In: IEEE International Symposium on Circuits and Systems, vol. 3, pp. 273–276 (2002)Google Scholar
  83. 83.
    Wu, M., Sun, Y., Wang, G., Cavallaro, J.R.: Implementation of a High Throughput 3GPP Turbo Decoder on GPU. Journal of Signal Processing Systems 65(2), 171 (2011). https://doi.org/10.1007/s11265-011-0617-7 CrossRefGoogle Scholar
  84. 84.
    Wu, M., Wang, G., Yin, B., Studer, C., Cavallaro, J.R.: LTE-A Turbo Decoder on GPU and Multicore CPU. In: 2013 Asilomar Conference on Signals, Systems and Computers, pp. 824–828 (2013).  https://doi.org/10.1109/ACSSC.2013.6810402
  85. 85.
    Xilinx: Digital Pre-Distortion. online: https://www.xilinx.com/products/intellectual-property/ef-di-dpd.html (2017)
  86. 86.
    Ye, Z.A., Moshovos, A., Hauck, S., Banerjee, P.: CHIMAERA: A High Performance Architecture with a Tightly Coupled Reconfigurable Functional Unit. In: Proceedings of the 27th Annual International Symposium on Computer Architecture, pp. 225–235 (2000)Google Scholar
  87. 87.
    Zhong, H., Zhang, T.: Block-LDPC: A Practical LDPC Coding System Design Approach. IEEE Transactions on Circuits and Systems I 52(4), 766–775 (2005)MathSciNetCrossRefGoogle Scholar

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Chance Tarver
    • 1
  • Yang Sun
    • 1
  • Kiarash Amiri
    • 1
  • Michael Brogioli
    • 1
  • Joseph R. Cavallaro
    • 1
  1. 1.Rice UniversityHoustonUSA

Personalised recommendations