Low-latency and high-throughput software turbo decoders on multi-core architectures

Abstract

In the last few years, with the advent of software-defined radio (SDR), processor cores have come to be regarded as an efficient solution for executing physical layer components. Multi-core architectures provide both high processing performance and flexibility, so they are used in current base station systems instead of dedicated FPGA or ASIC devices. The SDR concept is now being extended: cloud platforms are becoming attractive for the virtualization of radio access network functions because they improve the efficiency of computational resource usage, and thus the overall power efficiency. However, implementing a physical layer on a Cloud-RAN platform, as discussed by Wubben and Paul (2016); Checko et al. (IEEE Commun Surv Tutor 17(1):405–426, 2015); Fujitsu Network Communications Inc. (2015); and Wubben et al. (IEEE Signal Process Mag 31(6):35–44, 2014), or on a FlexRAN platform, as discussed by Wilson (2018); Foukas et al. (2017); Intel Corp. (2017); and Foukas et al. (2016), is a challenging task because of the stringent latency and throughput constraints discussed by Yu et al. (2017) and Parvez et al. (2018). Processing latencies from 10 μs up to hundreds of μs are required for future digital communication systems. In this context, most works on software implementations of error-correcting code (ECC) applications rely on massive frame parallelism to reach high throughput. However, this approach produces unacceptable decoding latencies. In this paper, a new turbo decoder parallelization approach is proposed for x86 multi-core processors. It provides both high throughput and low latency. In comparison with all CPU- and GPU-related works, the following results are observed: shorter processing latency, higher throughput, and lower energy consumption. Compared with the best state-of-the-art x86 software implementations, throughput improvements of 1.5× to 2× are reached, whereas a latency reduction of 50× and an energy reduction of 2× are observed.
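The tension between frame parallelism and latency described in the abstract can be illustrated with a deliberately idealized model. The sketch below is not the authors' method or measurement: the function names, the 32-lane SIMD width, and the 1 ms batch time are hypothetical values chosen only for illustration, and the intra-frame case assumes a perfect speedup unless an efficiency factor is supplied. The frame size of 6144 bits corresponds to the largest LTE turbo code block.

```python
def inter_frame(frame_bits, lanes, batch_time_s):
    """Inter-frame parallelism: each SIMD lane decodes a different frame.

    All frames of the batch complete together, so the latency of any
    single frame equals the time needed to decode the whole batch.
    """
    throughput_bps = lanes * frame_bits / batch_time_s
    latency_s = batch_time_s
    return throughput_bps, latency_s


def intra_frame(frame_bits, lanes, batch_time_s, efficiency=1.0):
    """Intra-frame parallelism: all lanes cooperate on one frame.

    Assumption: the same amount of work is spread over the lanes, with
    `efficiency` (0 < efficiency <= 1) modelling parallelization losses.
    """
    latency_s = batch_time_s / (lanes * efficiency)
    throughput_bps = frame_bits / latency_s
    return throughput_bps, latency_s


if __name__ == "__main__":
    K = 6144        # bits per frame (largest LTE turbo code block)
    LANES = 32      # hypothetical: 256-bit registers with 8-bit values
    T = 1e-3        # hypothetical time to decode one batch, in seconds

    for name, model in (("inter-frame", inter_frame), ("intra-frame", intra_frame)):
        tp, lat = model(K, LANES, T)
        print(f"{name}: {tp / 1e6:.1f} Mbit/s, latency {lat * 1e6:.2f} us")
```

In this ideal model both strategies reach the same throughput, but the intra-frame latency is divided by the number of lanes. Real decoders lose some efficiency when a single frame is split, which is why the `efficiency` parameter is exposed.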

Notes

  1. The 4th streaming SIMD extensions introduced by INTEL in 2006.

  2. The 2nd advanced vector extensions deployed by INTEL in its processors since 2013.

  3. The clang compiler is the only compiler available on the server platform (P3). As we cannot be the administrator of the workstation, it is not possible for us to update the old fedora distribution. In order to have a fair comparison between the platforms, the clang 4 compiler was also used for P1 and P2 platforms.

  4. For small numbers of iterations, it may be necessary to correct the extrapolated values with the information provided in Table 3.

  5. The theoretical limit is 6 instructions per cycle. In practice, it is necessary for these instructions to use different functional units so it is much lower.

References

  1. Wubben D, Paul H (2016) Analysis of virtualized turbo-decoder implementation for Cloud-RAN systems. In: Proceedings of the 9th international symposium on turbo codes & iterative information processing, pp 385–389

  2. Checko A, Christiansen HL, Yan Y, Scolari L, Kardaras G, Berger MS, Dittmann L (2015) Cloud RAN for mobile networks - a technology overview. IEEE Commun Surv Tutor 17(1):405–426

  3. Inc F (2015) The benefits of Cloud-RAN architecture in mobile network expansion. Fujitsu Network Communications Inc., Tech. Rep.

  4. Wubben D, Rost P, Bartelt JS, Lalam M, Savin V, Gorgoglione M, Dekorsy A, Fettweis G (2014) Benefits and impact of cloud computing on 5G signal processing: flexible centralization through Cloud-RAN. IEEE Signal Process Mag 31(6):35–44

  5. Wilson R (2018) Intel flexran reference designs deployed in 5g infrastructure

  6. Foukas X, Nikaein N, Kassem MM, Marina MK, Kontovasilis K (2017) Demo: Flexran – a software-defined ran platform. In: Proceedings of the 23rd annual international conference on mobile computing and networking conference (MOBICOM), pp 465–467

  7. I. corp., INTEL 5G Vision (Network, Cloud Client) (2017)

  8. Foukas X, Nikaein N, Kassem MM, Marina MK, Kontovasilis K (2016) Flexran: a flexible and programmable platform for software-defined radio access networks. In: Proceedings of the 12th international conference on emerging networking experiments and technologies (CoNEXT), pp 427–441

  9. Yu H, Lee H, Jeon H (2017) What is 5G? Emerging 5G mobile services and network requirements. Sustainability, 9

  10. Parvez I, Rahmati A, Guvenc I, Sarwat AI, Dai H (2018) A survey on low latency towards 5G: RAN, core network and caching solutions. IEEE Communications Surveys & Tutorials, arXiv:1708.02562v2

  11. Berrou C, Glavieux A, Thitimajshima P (1993) Near Shannon limit error-correcting coding and decoding: turbo-codes. In: Proceedings of the international conference on communications (ICC). Geneva, pp 1064–1070

  12. Third generation partnership project, 3GPP home page, www.3gpp.org

  13. Brejza MF, Li L, Maunder RG, Al-Hashimi BM, Berrou C, Hanzo L (2015) 20 years of turbo coding and energy-aware design guidelines for energy-constrained wireless applications. IEEE Commun Surv Tutor 18(1):8–28

  14. Belfanti S, Roth C, Gautschi M, Benkeser C, Huang Q (2013) A 1Gbps LTE-advanced turbo-decoder ASIC in 65nm CMOS. In: Proceedings of the symposium on VLSI circuits (VLSIC), pp 284–285

  15. Studer C, Benkeser C, Belfanti S, Huang Q (2011) Design and implementation of a parallel turbo-decoder ASIC for 3GPP-LTE. IEEE J Solid State Circuits 46(1):8–17

  16. Sun Y, Cavallaro JR (2011) Efficient hardware implementation of a highly-parallel 3GPP LTE/LTE-advance turbo decoder. Integration VLSI J 44(4):305–315

  17. Wu M, Sun Y, Cavallaro JR (2010) Implementation of a 3GPP LTE turbo decoder accelerator on GPU. In: Proceedings of the IEEE workshop on signal processing systems (SIPS), pp 192–197

  18. May M, Ilnseher T, Wehn N, Raab W (2010) A 150Mbit/s 3GPP LTE turbo code decoder. In: Proceedings of the design, automation & test in europe conference & exhibition (DATE), pp 1420–1425

  19. Benkeser C, Burg A, Cupaiuolo T, Huang Q (2009) Design and optimization of an HSDPA turbo decoder ASIC. IEEE J Solid State Circuits 44(1):98–106

  20. Muller O, Baghdadi A, Jezequel M (2009) From parallelism levels to a multi-ASIP architecture for turbo decoding. IEEE Trans Very Large Scale Integr (VLSI) Syst 17(1):92–102

  21. Vogt J, Finger A (2000) Improving the max-log-MAP turbo decoder. Electron Lett 36(23):1937–1939

  22. Liu C, Bie Z, Chen C, Jiao X (2013) A parallel LTE turbo decoder on GPU. In: Proceedings of the 15th IEEE international conference on communication technology conference (ICCT), pp 609–614

  23. Chen X, Zhu J, Wen Z, Wang Y, Yang H (2013) BER, guaranteed optimization and implementation of parallel turbo decoding on GPU. In: Proceedings of the 8th international conference on communications and networking in China (CHINACOM), pp 183–188

  24. Xianjun J, Canfeng C, Jaaskelainen P, Guzma V, Berg H (2013) A 122mb/s turbo decoder using a mid-range GPU. In: IWCMC proc., pp 1090–1094

  25. Wu M, Wang G, Yin B, Studer C, Cavallaro JR (2013) HSPA+/LTE - a turbo decoder on GPU and multicore architecture. In: Proceedings of the Asilomar conference on signals, systems and computers, pp 824–828

  26. Li R, Dou Y, Xu J, Niu X, Ni S (2014) An efficient parallel SOVA-based turbo decoder for software defined radio on GPU. IEICE Trans Fund Electron Commun Comput Sci 97(5):1027–1036

  27. Cassagne A, Tonnellier T, Leroux C, Le Gal B, Aumage O, Barthou D (2016) Beyond Gbps turbo decoder on multi-core CPUs. In: Proceedings of the international symposium on turbo codes and iterative information processing (ISTC). Brest, pp 136–140

  28. Le Gal B, Jego C (2016) High-throughput multi-core LDPC decoders based on x86 processor. IEEE Trans Parallel Distrib Syst (TPDS) 27(5):1373–1386

  29. Andrade J, Falcao G, Silva V, Sousa L (2016) A survey on programmable LDPC decoders. IEEE Access 4:6704–6718

  30. Le Gal B, Jego C (2017) Low-latency software LDPC decoders. In: Proceedings of the IEEE international workshop on signal processing systems (SIPS). Lorient, pp 1–6

  31. Le Gal B, Leroux C, Jego C (2015) Multi-Gb/s software decoding of polar codes. IEEE Trans Signal Process 63(2):349–359

  32. Sarkis G, Giard P, Vardy A, Thibeault C, Gross WJ (2015) Unrolled polar decoders, part ii: fast list decoders. IEEE Journal on Selected Areas in Communications - Special Issue on Recent Advances In Capacity Approaching Codes (submitted)

  33. Falcao G, Andrade J, Silva V, Sousa L (2011) GPU-based DVB-S2 LDPC decoder with high throughput and fast error floor detection. IET Electron Lett 47(9):542–543

  34. Chinnici S, Spallaccini P (2012) Fast simulation of turbo codes on GPUs. In: Proceedings of the 7th international symposium on turbo codes and iterative information processing (ISTC), pp 61–65

  35. Grayver E (2013) Implementing software defined radio. Springer, New York

  36. ITU-R (2014) Framework and overall objectives of the future development of IMT for 2020 and beyond in [IMT.VISION]

  37. Berrou C, Glavieux A, Thitimajshima P (1993) Near Shannon limit error-correcting coding and decoding: turbo-codes. In: Proceedings of the IEEE international conference on communications (ICC), pp 1064–1070

  38. Bahl L, Cocke J, Jelinek F, Raviv J (1974) Optimal decoding of linear codes for minimizing symbol error rate. IEEE Trans Inf Theory, 284–287

  39. Robertson P, Villebrun E, Hoeher P (1995) A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain. In: Proceedings of the IEEE international conference on communications (ICC), vol 2, pp 1009–1013

  40. Boutillon E, Sánchez-Rojas J-L, Marchand C (2014) Simplified compression of redundancy free trellis sections in turbo decoder. IEEE Commun Lett 18(6):941–944

  41. Muller O, Baghdadi A, Jezequel M (2006) Exploring parallel processing levels for convolutional turbo decoding. In: Proceedings of the international conference on information & communication technologies, vol 2, pp 2353–2358

  42. Sun Y, Cavallaro JR (2011) Efficient hardware implementation of a highly-parallel 3GPP LTE/LTE-advance turbo decoder. Integr VLSI J 44(4):305–315

  43. Ilnseher T, Kienle F, Weis C, Wehn N (2012) A 2.15 GBit/s turbo code decoder for LTE advanced base station applications. In: Proceedings of the 7th international symposium on turbo codes and iterative information processing (ISTC), pp 21–25

  44. Shrestha R, Paily RP (2014) High-throughput turbo decoder with parallel architecture for LTE wireless communication standards. IEEE Trans Circ Syst I 61:9

  45. Wu M, Sun Y, Wang G, Cavallaro JR (2011) Implementation of a high throughput 3GPP turbo decoder on GPU. J Signal Process Syst Springer 65(2):171–183

  46. Yoge D, Chandrachoodan N (2012) GPU implementation of a programmable turbo decoder for software defined radio applications. In: Proceedings of the 25th international conference on VLSI design (VLSID), pp 149–154

  47. Zhang Y, Xing Z, Yuan L, Liu C, Wang Q (2014) The acceleration of turbo decoder on the newest GPGPU of kepler architecture. In: Proceedings of the 14th international symposium on communications and information technologies (ISCIT), pp 199–203

  48. Huang L, Luo Y, Wang H, Yang F, Shi Z, Gu D (2011) A high speed turbo decoder implementation for CPU-based SDR system. In: Proceedings of the IET international conference on communication technology and application (ICCTA), pp 19–23

  49. Zhang S, Qian R, Peng T, Duan R, Chen K (2012) High throughput turbo decoder design for GPP platform. In: Proceedings of the 7th international conference on communications and networking in China, pp 817–821

  50. Giard P, Sarkis G, Leroux C, Thibeault C, Gross WJ (2016) Low-latency software polar decoders. Journal of Signal Processing Systems Springer

  51. Sun J, Takeshita O (2005) Interleavers for turbo codes using permutation polynomials over integer rings. IEEE Trans Inf Theory 51:101–119

  52. Montorsi G, Paily RP (2001) Design of fixed-point iterative decoders for concatenated codes with interleavers. IEEE J Select Areas Commun 19:5

  53. Shahabuddin S, Janhunen J, MJ, et al (2014) Design of a transport triggered vector processor for turbo decoding. J Analog Integr Circ Signal Process 78:3

  54. Studer C, Benkeser C, Belfanti S, Huang Q (2011) Design and implementation of a parallel turbo-decoder ASIC for 3GPP-LTE. IEEE J Solid State Circuits 46(1):8–17

  55. Belfanti S, Roth C, Gautschi M, Benkeser C, Huang Q (2013) A 1 Gbps LTE-advanced turbo-decoder ASIC in 65 nm CMOS. In: Proceedings of the symposium on VLSI circuits

  56. Wu M, Wang G, Yin B, Studer C, Cavallaro JR (2013) HSPA+/LTE-a turbo decoder on GPU and multicore CPU. In: Proceedings of the Asilomar conference on signals, systems and computers, pp 824–828

Author information

Correspondence to Bertrand Le Gal.

About this article

Cite this article

Le Gal, B., Jego, C. Low-latency and high-throughput software turbo decoders on multi-core architectures. Ann. Telecommun. 75, 27–42 (2020) doi:10.1007/s12243-019-00727-5

Keywords

  • Turbo code
  • Multi-core
  • SIMD
  • High throughput
  • Low latency