Memory Access Optimized Implementation of Cyclic and Quasi-Cyclic LDPC Codes on a GPGPU
Software-based decoding of low-density parity-check (LDPC) codes frequently takes a very long time, so general-purpose graphics processing units (GPGPUs), which support massively parallel processing, can be very useful for speeding up simulations. In LDPC decoding, the parity-check matrix H must be accessed in every node-update step, and the matrix is often larger than the GPU's on-chip memory, especially when the code length is long or the weight of H is high. In this work, the parity-check matrix of cyclic or quasi-cyclic (QC) LDPC codes is greatly compressed by exploiting the periodic structure of the matrix. In addition, vacant elements are eliminated from the sparse message arrays so that the coalesced global-memory access supported by GPGPUs can be exploited. Regular projective geometry (PG) and irregular QC LDPC codes are decoded with the sum-product algorithm on an NVIDIA GTX 285 GPU, and considerable speed-ups are obtained.
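The compression and compaction ideas above can be illustrated with a small Python sketch. It is not the authors' CUDA implementation: the function names, the base-matrix convention (-1 for an all-zero z x z block, s for the identity cyclically shifted by s), the toy shift values, and the block-contiguous message layout are all assumptions made for illustration. For a cyclic (e.g. PG) code only the positions of the 1s in the first row are kept; for a QC code only one shift value per z x z block is kept, and the column index of every edge is recomputed on the fly.

```python
# Hypothetical sketch, not the paper's CUDA code: it only shows how the
# periodic structure lets H be stored compactly and indexed on the fly.

def cyclic_row_columns(first_row, n, row):
    """Cyclic H (e.g. a PG-LDPC code): row `row` is the first row shifted
    right by `row` positions, so only the first row's 1-positions are kept."""
    return sorted((p + row) % n for p in first_row)


def expand_dense(base, z):
    """Expand a QC base matrix into the full dense H (used only for checking).

    base[r][c] == -1 marks an all-zero z x z block; any other value s marks
    the z x z identity cyclically shifted by s, whose row i has its single
    1 in column (i + s) % z.
    """
    mb, nb = len(base), len(base[0])
    H = [[0] * (nb * z) for _ in range(mb * z)]
    for r in range(mb):
        for c in range(nb):
            s = base[r][c]
            if s >= 0:
                for i in range(z):
                    H[r * z + i][c * z + (i + s) % z] = 1
    return H


def qc_row_columns(base, z, row):
    """Columns of the 1s in one row of a QC H, computed from the
    mb x nb base matrix alone instead of a stored (mb*z) x (nb*z) matrix."""
    r, i = divmod(row, z)
    return [c * z + (i + s) % z for c, s in enumerate(base[r]) if s >= 0]


def compact_blocks(base):
    """Padding-free, CSR-like list of the nonzero blocks (no vacant entries).

    Messages for the k-th nonzero block can occupy slots k*z .. k*z + z - 1;
    if GPU thread i handles row i of a block row, consecutive threads then
    read consecutive message slots (a coalescing-friendly layout).
    """
    row_ptr, cols, shifts = [0], [], []
    for base_row in base:
        for c, s in enumerate(base_row):
            if s >= 0:
                cols.append(c)
                shifts.append(s)
        row_ptr.append(len(cols))
    return row_ptr, cols, shifts


if __name__ == "__main__":
    # Cyclic example: {0, 1, 3} is a perfect difference set mod 7 (PG(2,2)).
    rows = [set(cyclic_row_columns([0, 1, 3], 7, i)) for i in range(7)]
    # Any two distinct rows share exactly one column.
    assert all(len(rows[a] & rows[b]) == 1
               for a in range(7) for b in range(a + 1, 7))

    # Toy irregular QC base matrix (hypothetical shift values), z = 4.
    base = [[0, 1, -1, 2],
            [3, -1, 0, 1]]
    z = 4
    H = expand_dense(base, z)
    for row in range(len(H)):
        dense_cols = [j for j, v in enumerate(H[row]) if v]
        assert qc_row_columns(base, z, row) == dense_cols

    row_ptr, cols, shifts = compact_blocks(base)
    n_edges = row_ptr[-1] * z
    print("dense H entries:", len(H) * len(H[0]))      # 128
    print("base entries:", len(base) * len(base[0]))   # 8
    print("message slots:", n_edges, "nonzeros in H:", sum(map(sum, H)))  # 24 24
```

Running the script reports 128 dense entries against 8 base-matrix entries, and 24 message slots for 24 nonzeros, so the compact layout has no vacant slots. On a GPU the same index arithmetic would run inside the kernel, trading a few integer operations for reads of a stored H.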
Keywords: Low-density parity-check (LDPC) codes; Compute Unified Device Architecture (CUDA); General Purpose Graphics Processing Unit (GPGPU); Memory access optimization
This work was supported in part by the National Research Foundation (NRF) grant funded by the Korea government (MEST) (No. 20090075770 and No. 20090084804) and in part by the MEST under the Brain Korea 21 Project.