Abstract
Software-based decoding of low-density parity-check (LDPC) codes frequently takes a very long time, so general-purpose graphics processing units (GPGPUs), which support massively parallel processing, can be very useful for speeding up simulation. In LDPC decoding, the parity-check matrix H must be accessed at every node update, and the matrix is often larger than the GPU's on-chip memory, especially when the code length is long or the weight is high. In this work, the parity-check matrix of cyclic or quasi-cyclic (QC) LDPC codes is greatly compressed by exploiting the periodic property of the matrix. In addition, vacant elements are eliminated from the sparse message arrays so that the coalesced global-memory access supported by GPGPUs can be used. Sum-product algorithm based decoding of regular projective geometry (PG) codes and irregular QC LDPC codes is implemented on an NVIDIA GTX-285 graphics processing unit (GPU), and considerable speed-up is obtained.
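The idea of compressing a cyclic parity-check matrix through its periodic structure can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names (`compress_cyclic`, `row_columns`, `expand`), the toy 7 × 7 matrix, and the right-shift convention are assumptions made only for illustration. The sketch stores just the positions of the ones in the first row and recomputes the column indices of any other row by modular addition.

```python
def compress_cyclic(first_row):
    """Keep only the column positions of the ones in the first row."""
    return [j for j, bit in enumerate(first_row) if bit]


def row_columns(offsets, row, n):
    """Column indices of the ones in `row`, assuming (for this sketch)
    that each row is the previous row cyclically shifted right by one."""
    return sorted((p + row) % n for p in offsets)


def expand(offsets, n):
    """Rebuild the full n x n matrix, only to check the compressed form."""
    rows = []
    for r in range(n):
        cols = set(row_columns(offsets, r, n))
        rows.append([1 if j in cols else 0 for j in range(n)])
    return rows


# Hypothetical toy matrix: a 7 x 7 circulant with row weight 3.
first_row = [1, 1, 0, 1, 0, 0, 0]
n = len(first_row)
offsets = compress_cyclic(first_row)   # 3 integers instead of 49 matrix entries
full = expand(offsets, n)
```

In this toy case the storage drops from n × n entries to one integer per nonzero of a single row, which is the kind of saving that lets the index data fit in a small on-chip memory. A quasi-cyclic matrix could be handled the same way, block by block, with one shift value per circulant sub-matrix.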
Notes
The segment size is 32 bytes for 8-bit data, 64 bytes for 16-bit data, and 128 bytes for 32-, 64-, and 128-bit data.
The compute capability of a device is defined by a major and a minor revision number. Devices with the same major revision number have the same core architecture. The minor revision number corresponds to an incremental improvement to the core architecture, possibly including new features. The compute capability of the GTX-200 series is 1.3.
Block dimension is the number of threads that constitute one thread block.
The maximum number of threads per thread block is 512.
The index calculation is described in Section 3.3 in detail.
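The segment-size note above can be made concrete with a small Python model. It is a rough sketch, not a description of the actual hardware or of the code used in this work: it assumes a half-warp of 16 threads, 32-bit messages, 128-byte segments, and roughly one memory transaction per distinct segment touched. The function name `segments_touched` and the stride of 40 words are hypothetical.

```python
HALF_WARP = 16  # threads whose accesses are considered together (assumption)


def segments_touched(word_indices, word_bytes=4, segment_bytes=128):
    """Count the distinct aligned segments touched by a group of accesses."""
    return len({(i * word_bytes) // segment_bytes for i in word_indices})


# Sparse layout: each thread's valid message sits 40 words from the next one
# (hypothetical stride), so vacant elements separate the useful data.
sparse = [t * 40 for t in range(HALF_WARP)]

# Compacted layout: vacant elements removed, valid messages are adjacent.
compact = list(range(HALF_WARP))

sparse_segments = segments_touched(sparse)
compact_segments = segments_touched(compact)
```

Under these assumptions the scattered layout touches one segment per thread, while the compacted layout keeps all 16 accesses inside a single segment, which is the motivation for removing vacant elements from the message arrays.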
References
Gallager, R. G. (1963). Low density parity check codes. Cambridge: MIT.
The Digital Video Broadcasting Standard [Online]. Available: www.dvb.org
The IEEE 802.16 Working Group [Online]. Available: http://www.ieee802.org/16/
The IEEE 802.11n Working Group [Online]. Available: http://www.ieee802.org/11/
Falcão, G., Silva, V., & Sousa, L. (2009). How GPUs can outperform ASICs for fast LDPC decoding. In Proc. of the 23rd International Conference on Supercomputing, New York, USA, pp. 390–399.
Falcão, G., Yamagiwa, S., Silva, V., & Sousa, L. (2009). Parallel LDPC decoding on GPUs using a stream-based computing approach. Journal of Computer Science and Technology, 24, 913–924.
Tanner, R. M. (1981). A recursive approach to low complexity codes. IEEE Transactions on Information Theory, IT-27, 533–547.
Kou, Y., Lin, S., & Fossorier, M. (2001). Low density parity check codes based on finite geometries: a rediscovery and more. IEEE Transactions on Information Theory, 47, 2711–2736.
MacKay, D. J. C. (1999). Good error-correcting codes based on very sparse matrices. IEEE Transactions on Information Theory, 45, 399–431.
Chen, J., Dholakia, A., Eleftheriou, E., Fossorier, M., & Hu, X. Y. (2002). Near optimal reduced-complexity decoding algorithms for LDPC codes. In Proc. IEEE Int. Symp. Information Theory, Lausanne, Switzerland, p. 455.
The CUDA Programming Guide [Online]. Available: http://developer.NVIDIA.com/object/cuda.html
Bell, N., & Garland, M. (2008). Efficient Sparse Matrix-Vector Multiplication on CUDA. NVIDIA Technical Report NVR-2008-004, NVIDIA Corporation.
Im, E. (2000). Optimizing the performance of sparse matrix-vector multiplication. Technical Report, UMI Order Number: CSD-00-1104, University of California at Berkeley.
Acknowledgements
This work was supported in part by the National Research Foundation (NRF) grant funded by the Korea government (MEST) (No. 20090075770 and No. 20090084804) and in part by the MEST under the Brain Korea 21 Project.
Additional information
This work is an improved version of “Massively parallel implementation of cyclic LDPC codes on a general purpose graphic processing unit,” which was presented at the IEEE Workshop on Signal Processing Systems (SiPS) held in Tampere, Finland, in 2009. Implementation results for the standardized irregular QC LDPC codes of Wi-Fi and WiMAX are added, and a two-dimensional message array compression technique is included.
Cite this article
Ji, H., Cho, J. & Sung, W. Memory Access Optimized Implementation of Cyclic and Quasi-Cyclic LDPC Codes on a GPGPU. J Sign Process Syst 64, 149–159 (2011). https://doi.org/10.1007/s11265-010-0547-9