Low-Latency Software Polar Decoders

  • Pascal Giard
  • Gabi Sarkis
  • Camille Leroux
  • Claude Thibeault
  • Warren J. Gross


Polar codes are a new class of capacity-achieving error-correcting codes with low encoding and decoding complexity. Their low-complexity decoding algorithms render them attractive for use in software-defined radio applications where computational resources are limited. In this work, we present low-latency software polar decoders that exploit modern processor capabilities. We show how adapting the algorithm at various levels can lead to significant improvements in latency and throughput, yielding polar decoders that are suitable for high-performance software-defined radio applications on modern desktop processors and embedded-platform processors. These proposed decoders have an order of magnitude lower latency and memory footprint compared to state-of-the-art decoders, while maintaining comparable throughput. In addition, we present strategies and results for implementing polar decoders on graphical processing units. Finally, we show that the energy efficiency of the proposed decoders is comparable to state-of-the-art software polar decoders.


Polar codes; Successive-cancellation decoding; Software decoders



The authors wish to thank Samuel Gagné of École de technologie supérieure and CMC Microsystems for providing access to the Intel Core i7-4770S processor and NVIDIA Tesla K20c graphical processing unit, respectively. Claude Thibeault is a member of ReSMiQ. Warren J. Gross is a member of ReSMiQ and SYTACom.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Pascal Giard (1)
  • Gabi Sarkis (1)
  • Camille Leroux (2)
  • Claude Thibeault (3)
  • Warren J. Gross (1)
  1. Department of Electrical and Computer Engineering, McGill University, Montréal, Canada
  2. IMS Lab, Bordeaux-INP, Bordeaux, France
  3. Department of Electrical Engineering, École de Technologie Supérieure, Montréal, Canada
