Journal of Signal Processing Systems

Volume 64, Issue 1, pp 149–159

Memory Access Optimized Implementation of Cyclic and Quasi-Cyclic LDPC Codes on a GPGPU

  • Hyunwoo Ji
  • Junho Cho
  • Wonyong Sung


Software-based decoding of low-density parity-check (LDPC) codes often takes a very long time, so general-purpose graphics processing units (GPGPUs), which support massively parallel processing, can be very useful for speeding up simulation. In LDPC decoding, the parity-check matrix H must be accessed at every node update, and the matrix is often larger than the GPU's on-chip memory, especially when the code is long or the weight is high. In this work, the parity-check matrix of cyclic or quasi-cyclic (QC) LDPC codes is greatly compressed by exploiting the periodic structure of the matrix. In addition, vacant elements are eliminated from the sparse message arrays so that accesses to global memory can be coalesced, as GPGPUs support. Sum-product-algorithm decoding of regular projective geometry (PG) codes and irregular QC LDPC codes is implemented on an NVIDIA GTX-285 graphics processing unit (GPU), and considerable speed-ups are obtained.
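The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the common shift-table description of a QC code, in which each Z × Z block of H is either all-zero or a cyclically shifted identity, and every name in it (`SHIFTS`, `Z`, `check_neighbors`, `edge_slots`, `message_address`) is hypothetical. The sketch shows how the neighbors of a check node can be computed from the small shift table alone, and how messages can be stored without gaps so that consecutive check rows touch consecutive addresses.

```python
import numpy as np

# Hypothetical toy base matrix: an entry s >= 0 stands for a Z x Z identity
# cyclically shifted right by s; -1 marks an all-zero block (assumed convention).
SHIFTS = np.array([[0, 2, -1, 1],
                   [3, -1, 1, 0]])
Z = 4  # circulant size (assumed)


def expand_qc(shifts, z):
    """Build the full binary H from the shift table (for checking only)."""
    mb, nb = shifts.shape
    h = np.zeros((mb * z, nb * z), dtype=np.uint8)
    for i in range(mb):
        for j in range(nb):
            s = int(shifts[i, j])
            if s < 0:
                continue  # all-zero block: nothing to set
            for r in range(z):
                h[i * z + r, j * z + (r + s) % z] = 1
    return h


def check_neighbors(shifts, z, row):
    """Variable-node indices of check node `row`, using only the shift table."""
    i, r = divmod(row, z)
    return [j * z + (r + int(s)) % z for j, s in enumerate(shifts[i]) if s >= 0]


def edge_slots(shifts):
    """Number the nonzero blocks consecutively, skipping the empty ones."""
    slots = {}
    for i, j in zip(*np.nonzero(shifts >= 0)):
        slots[(int(i), int(j))] = len(slots)
    return slots


def message_address(slots, z, row, block_col):
    """Index into a gap-free message array: one run of z entries per nonzero block."""
    i, r = divmod(row, z)
    return slots[(i, block_col)] * z + r
```

With this layout, the full matrix never needs to be stored: the shift table has one integer per block rather than Z × Z bits. Rows that are adjacent within a block row map to adjacent message addresses, which is the access pattern that lets neighboring GPU threads coalesce their global-memory reads.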


Low-density parity-check (LDPC) codes; Compute Unified Device Architecture (CUDA); General Purpose Graphics Processing Unit (GPGPU); Memory access optimization



This work was supported in part by the National Research Foundation (NRF) grant funded by the Korea government (MEST) (No. 20090075770 and No. 20090084804) and in part by the MEST under the Brain Korea 21 Project.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. School of Electrical Engineering, Seoul National University, Seoul, South Korea
