Abstract
In the last few years, with the advent of software-defined radio (SDR), processor cores have been described as an efficient solution for executing physical-layer components. Indeed, multi-core architectures provide both high processing performance and flexibility, so they are used in current base station systems instead of dedicated FPGA or ASIC devices. The SDR concept is now being extended further: cloud platforms are becoming attractive for virtualizing radio access network functions, because they improve the efficiency of computational resource usage and thus the global power efficiency. However, implementing a physical layer on a Cloud-RAN platform, as discussed by Wubben and Paul (2016); Checko et al. (IEEE Commun Surv Tutor 17(1):405–426, 2015); Fujitsu Network Communications Inc. (2015); and Wubben et al. (IEEE Signal Process Mag 31(6):35–44, 2014), or on a FlexRAN platform, as discussed by Wilson (2018); Foukas et al. (2017); Intel Corp. (2017); and Foukas et al. (2016), is a challenging task because of the drastic latency and throughput constraints discussed by Yu et al. (2017) and Parvez et al. (2018). Processing latencies from 10 μs up to hundreds of μs are required for future digital communication systems. In this context, most works on software implementations of error-correcting code (ECC) decoders rely on massive frame parallelism to reach high throughput. However, this approach produces unacceptable decoding latencies. In this paper, a new turbo decoder parallelization approach is proposed for x86 multi-core processors. It provides both high throughput and low latency. Compared with all related CPU and GPU works, it achieves shorter processing latency, higher throughput, and lower energy consumption. Compared with the best state-of-the-art x86 software implementations, throughput improvements of 1.5× to 2× are reached, together with a 50× latency reduction and a 2× energy reduction.
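The abstract contrasts massive frame parallelism (high throughput, but high latency) with an approach that targets both. The sketch below is a generic, idealized model of that trade-off, not the authors' method: the functions `inter_frame` and `intra_frame`, the `perf_t` type and every numeric input are hypothetical. It assumes perfect speed-up when one frame is split into sub-blocks and ignores the time spent gathering a batch of frames, both of which matter in real decoders.

```c
#include <assert.h>

/* Idealized throughput/latency model. All parameters are hypothetical
   inputs chosen for illustration; nothing here is measured data. */
typedef struct {
    double throughput_mbps; /* decoded bits per microsecond == Mbit/s */
    double latency_us;      /* decoding time seen by a single frame   */
} perf_t;

/* Inter-frame parallelism: each of the `lanes` SIMD lanes decodes a
   different frame, so all frames of a batch finish together after
   t_batch_us microseconds. */
static perf_t inter_frame(double k_bits, int lanes, double t_batch_us) {
    assert(lanes > 0 && t_batch_us > 0.0);
    perf_t p;
    p.throughput_mbps = lanes * k_bits / t_batch_us;
    p.latency_us = t_batch_us;
    return p;
}

/* Intra-frame parallelism: one frame is split into `lanes` sub-blocks
   decoded simultaneously. Assumes a perfect speed-up, i.e. it ignores
   the sub-block boundary overhead that real decoders pay. */
static perf_t intra_frame(double k_bits, int lanes, double t_batch_us) {
    assert(lanes > 0 && t_batch_us > 0.0);
    perf_t p;
    double t_frame_us = t_batch_us / lanes;
    p.throughput_mbps = k_bits / t_frame_us;
    p.latency_us = t_frame_us;
    return p;
}
```

For example, with hypothetical values K = 6144 bits (the largest LTE turbo code block), 32 lanes and 640 μs per batch, both models give 307.2 Mbit/s, but the per-frame latency is 640 μs for inter-frame and 20 μs for intra-frame parallelism. In this idealized model the throughput is the same while the latency shrinks by the number of lanes, which is why intra-frame schemes are attractive under tight latency budgets.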
Notes
The fourth version of the Streaming SIMD Extensions (SSE4), introduced by Intel in 2006.
The second version of the Advanced Vector Extensions (AVX2), deployed by Intel in its processors since 2013.
Clang is the only compiler available on the server platform (P3). As we do not have administrator rights on this workstation, we could not update its old Fedora distribution. For a fair comparison between the platforms, the clang 4 compiler was also used on platforms P1 and P2.
For small numbers of iterations, it may be necessary to correct the extrapolated values with the information provided in Table 3.
The theoretical limit is 6 instructions per cycle. In practice, the achievable rate is much lower, because reaching this limit requires the instructions to use different functional units.
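The SSE4 and AVX2 notes concern x86 SIMD extensions. As a hedged illustration of why they matter for turbo decoding, the portable C sketch below reproduces, lane by lane, the behaviour of the saturating 8-bit addition (`_mm_adds_epi8`, `_mm256_adds_epi8`) and signed 8-bit maximum (`_mm_max_epi8` from SSE4.1, `_mm256_max_epi8` from AVX2) applied to a max-log-MAP add-compare-select step, α_k(s) = max(α_{k-1}(s0) + γ(s0, s), α_{k-1}(s1) + γ(s1, s)). The 8-bit quantization and the helper names `adds_i8`, `max_i8` and `acs_lanes` are assumptions for illustration, not the paper's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturating signed 8-bit addition: the per-lane behaviour of
   _mm_adds_epi8 (SSE2) and _mm256_adds_epi8 (AVX2). */
static int8_t adds_i8(int8_t a, int8_t b) {
    int s = (int)a + (int)b;
    if (s > INT8_MAX) return INT8_MAX;
    if (s < INT8_MIN) return INT8_MIN;
    return (int8_t)s;
}

/* Signed 8-bit maximum: the per-lane behaviour of _mm_max_epi8
   (SSE4.1) and _mm256_max_epi8 (AVX2). */
static int8_t max_i8(int8_t a, int8_t b) { return a > b ? a : b; }

/* One max-log-MAP add-compare-select over n independent lanes:
   out[i] = max(m0[i] + g0[i], m1[i] + g1[i]), with saturation.
   A 128-bit SSE register holds 16 such 8-bit lanes and a 256-bit
   AVX2 register holds 32, so one vector instruction replaces up to
   32 iterations of this scalar loop. */
static void acs_lanes(const int8_t *m0, const int8_t *g0,
                      const int8_t *m1, const int8_t *g1,
                      int8_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = max_i8(adds_i8(m0[i], g0[i]), adds_i8(m1[i], g1[i]));
}
```

Saturation alone is a simplification; practical fixed-point decoders also normalize the state metrics to keep them inside the 8-bit range.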
References
Wubben D, Paul H (2016) Analysis of virtualized turbo-decoder implementation for Cloud-RAN systems. In: Proceedings of the 9th international symposium on turbo codes & iterative information processing, pp 385–389
Checko A, Christiansen HL, Yan Y, Scolari L, Kardaras G, Berger MS, Dittmann L (2015) Cloud RAN for mobile networks - a technology overview. IEEE Commun Surv Tutor 17(1):405–426
Fujitsu Network Communications Inc. (2015) The benefits of Cloud-RAN architecture in mobile network expansion. Technical report
Wubben D, Rost P, Bartelt JS, Lalam M, Savin V, Gorgoglione M, Dekorsy A, Fettweis G (2014) Benefits and impact of cloud computing on 5G signal processing: flexible centralization through Cloud-RAN. IEEE Signal Process Mag 31(6):35–44
Wilson R (2018) Intel FlexRAN reference designs deployed in 5G infrastructure
Foukas X, Nikaein N, Kassem MM, Marina MK, Kontovasilis K (2017) Demo: FlexRAN – a software-defined RAN platform. In: Proceedings of the 23rd annual international conference on mobile computing and networking (MobiCom), pp 465–467
Intel Corp. (2017) INTEL 5G vision (Network, Cloud Client)
Foukas X, Nikaein N, Kassem MM, Marina MK, Kontovasilis K (2016) FlexRAN: a flexible and programmable platform for software-defined radio access networks. In: Proceedings of the 12th international conference on emerging networking experiments and technologies (CoNEXT), pp 427–441
Yu H, Lee H, Jeon H (2017) What is 5G? Emerging 5G mobile services and network requirements. Sustainability 9
Parvez I, Rahmati A, Guvenc I, Sarwat AI, Dai H (2018) A survey on low latency towards 5G: RAN, core network and caching solutions. IEEE Commun Surv Tutor. arXiv:1708.02562v2
Berrou C, Glavieux A, Thitimajshima P (1993) Near Shannon limit error-correcting coding and decoding: turbo-codes. In: Proceedings of the international conference on communications (ICC). Geneva, pp 1064–1070
Third Generation Partnership Project (3GPP) home page. www.3gpp.org
Brejza MF, Li L, Maunder RG, Al-Hashimi BM, Berrou C, Hanzo L (2015) 20 years of turbo coding and energy-aware design guidelines for energy-constrained wireless applications. IEEE Commun Surv Tutor 18(1):8–28
Belfanti S, Roth C, Gautschi M, Benkeser C, Huang Q (2013) A 1 Gbps LTE-advanced turbo-decoder ASIC in 65 nm CMOS. In: Proceedings of the symposium on VLSI circuits (VLSIC), pp 284–285
Studer C, Benkeser C, Belfanti S, Huang Q (2011) Design and implementation of a parallel turbo-decoder ASIC for 3GPP-LTE. IEEE J Solid State Circuits 46(1):8–17
Sun Y, Cavallaro JR (2011) Efficient hardware implementation of a highly-parallel 3GPP LTE/LTE-advance turbo decoder. Integr VLSI J 44(4):305–315
Wu M, Sun Y, Cavallaro JR (2010) Implementation of a 3GPP LTE turbo decoder accelerator on GPU. In: Proceedings of the IEEE workshop on signal processing systems (SIPS), pp 192–197
May M, Ilnseher T, Wehn N, Raab W (2010) A 150Mbit/s 3GPP LTE turbo code decoder. In: Proceedings of the design, automation & test in Europe conference & exhibition (DATE), pp 1420–1425
Benkeser C, Burg A, Cupaiuolo T, Huang Q (2009) Design and optimization of an HSDPA turbo decoder ASIC. IEEE J Solid State Circuits 44(1):98–106
Muller O, Baghdadi A, Jezequel M (2009) From parallelism levels to a multi-ASIP architecture for turbo decoding. IEEE Trans Very Large Scale Integr (VLSI) Syst 17(1):92–102
Vogt J, Finger A (2000) Improving the max-log-MAP turbo decoder. Electron Lett 36(23):1937–1939
Liu C, Bie Z, Chen C, Jiao X (2013) A parallel LTE turbo decoder on GPU. In: Proceedings of the 15th IEEE international conference on communication technology conference (ICCT), pp 609–614
Chen X, Zhu J, Wen Z, Wang Y, Yang H (2013) BER guaranteed optimization and implementation of parallel turbo decoding on GPU. In: Proceedings of the 8th international conference on communications and networking in China (CHINACOM), pp 183–188
Jiao X, Chen C, Jaaskelainen P, Guzma V, Berg H (2013) A 122 Mb/s turbo decoder using a mid-range GPU. In: Proceedings of the international wireless communications and mobile computing conference (IWCMC), pp 1090–1094
Wu M, Wang G, Yin B, Studer C, Cavallaro JR (2013) HSPA+/LTE-A turbo decoder on GPU and multicore CPU. In: Proceedings of the Asilomar conference on signals, systems and computers, pp 824–828
Li R, Dou Y, Xu J, Niu X, Ni S (2014) An efficient parallel SOVA-based turbo decoder for software defined radio on GPU. IEICE Trans Fund Electron Commun Comput Sci 97(5):1027–1036
Cassagne A, Tonnellier T, Leroux C, Le Gal B, Aumage O, Barthou D (2016) Beyond Gbps turbo decoder on multi-core CPUs. In: Proceedings of the international symposium on turbo codes and iterative information processing (ISTC). Brest, pp 136–140
Le Gal B, Jego C (2016) High-throughput multi-core LDPC decoders based on x86 processor. IEEE Trans Parallel Distrib Syst (TPDS) 27(5):1373–1386
Andrade J, Falcao G, Silva V, Sousa L (2016) A survey on programmable LDPC decoders. IEEE Access 4:6704–6718
Le Gal B, Jego C (2017) Low-latency software LDPC decoders. In: Proceedings of the IEEE international workshop on signal processing systems (SIPS). Lorient, pp 1–6
Le Gal B, Leroux C, Jego C (2015) Multi-Gb/s software decoding of polar codes. IEEE Trans Signal Process 63(2):349–359
Sarkis G, Giard P, Vardy A, Thibeault C, Gross WJ (2015) Unrolled polar decoders, Part II: fast list decoders. IEEE J Sel Areas Commun, Special Issue on Recent Advances in Capacity Approaching Codes (submitted)
Falcao G, Andrade J, Silva V, Sousa L (2011) GPU-based DVB-S2 LDPC decoder with high throughput and fast error floor detection. IET Electron Lett 47(9):542–543
Chinnici S, Spallaccini P (2012) Fast simulation of turbo codes on GPUs. In: Proceedings of the 7th international symposium on turbo codes and iterative information processing (ISTC), pp 61–65
Grayver E (2013) Implementing software defined radio. Springer, New York
ITU-R (2014) Framework and overall objectives of the future development of IMT for 2020 and beyond in [IMT.VISION]
Berrou C, Glavieux A, Thitimajshima P (1993) Near Shannon limit error-correcting coding and decoding: turbo-codes. In: Proceedings of the IEEE international conference on communications (ICC), pp 1064–1070
Bahl L, Cocke J, Jelinek F, Raviv J (1974) Optimal decoding of linear codes for minimizing symbol error rate. IEEE Trans Inf Theory 20(2):284–287
Robertson P, Villebrun E, Hoeher P (1995) A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain. In: Proceedings of the IEEE international conference on communications (ICC), vol 2, pp 1009–1013
Boutillon E, Sánchez-Rojas J-L, Marchand C (2014) Simplified compression of redundancy free trellis sections in turbo decoder. IEEE Commun Lett 18(6):941–944
Muller O, Baghdadi A, Jezequel M (2006) Exploring parallel processing levels for convolutional turbo decoding. In: Proceedings of the international conference on information & communication technologies, vol 2, pp 2353–2358
Sun Y, Cavallaro JR (2011) Efficient hardware implementation of a highly-parallel 3GPP LTE/LTE-advance turbo decoder. Integr VLSI J 44(4):305–315
Ilnseher T, Kienle F, Weis C, Wehn N (2012) A 2.15 GBit/s turbo code decoder for LTE advanced base station applications. In: Proceedings of the 7th international symposium on turbo codes and iterative information processing (ISTC), pp 21–25
Shrestha R, Paily RP (2014) High-throughput turbo decoder with parallel architecture for LTE wireless communication standards. IEEE Trans Circ Syst I 61(9)
Wu M, Sun Y, Wang G, Cavallaro JR (2011) Implementation of a high throughput 3GPP turbo decoder on GPU. J Signal Process Syst 65(2):171–183
Yoge D, Chandrachoodan N (2012) GPU implementation of a programmable turbo decoder for software defined radio applications. In: Proceedings of the 25th international conference on VLSI design (VLSID), pp 149–154
Zhang Y, Xing Z, Yuan L, Liu C, Wang Q (2014) The acceleration of turbo decoder on the newest GPGPU of Kepler architecture. In: Proceedings of the 14th international symposium on communications and information technologies (ISCIT), pp 199–203
Huang L, Luo Y, Wang H, Yang F, Shi Z, Gu D (2011) A high speed turbo decoder implementation for CPU-based SDR system. In: Proceedings of the IET international conference on communication technology and application (ICCTA), pp 19–23
Zhang S, Qian R, Peng T, Duan R, Chen K (2012) High throughput turbo decoder design for GPP platform. In: Proceedings of the 7th international conference on communications and networking in China, pp 817–821
Giard P, Sarkis G, Leroux C, Thibeault C, Gross WJ (2016) Low-latency software polar decoders. J Signal Process Syst
Sun J, Takeshita O (2005) Interleavers for turbo codes using permutation polynomials over integer rings. IEEE Trans Inf Theory 51:101–119
Montorsi G, Benedetto S (2001) Design of fixed-point iterative decoders for concatenated codes with interleavers. IEEE J Sel Areas Commun 19(5)
Shahabuddin S, Janhunen J, et al (2014) Design of a transport triggered vector processor for turbo decoding. Analog Integr Circ Signal Process 78(3)
Studer C, Benkeser C, Belfanti S, Huang Q (2011) Design and implementation of a parallel turbo-decoder ASIC for 3GPP-LTE. IEEE J Solid State Circuits 46(1):8–17
Belfanti S, Roth C, Gautschi M, Benkeser C, Huang Q (2013) A 1 Gbps LTE-advanced turbo-decoder ASIC in 65 nm CMOS. In: Proceedings of the symposium on VLSI circuits
Wu M, Wang G, Yin B, Studer C, Cavallaro JR (2013) HSPA+/LTE-A turbo decoder on GPU and multicore CPU. In: Proceedings of the Asilomar conference on signals, systems and computers, pp 824–828
About this article
Cite this article
Le Gal, B., Jego, C. Low-latency and high-throughput software turbo decoders on multi-core architectures. Ann. Telecommun. 75, 27–42 (2020). https://doi.org/10.1007/s12243-019-00727-5
DOI: https://doi.org/10.1007/s12243-019-00727-5