Implementation of a High Throughput Soft MIMO Detector on GPU
Multiple-input multiple-output (MIMO) transmission significantly increases the throughput of a communication system by employing multiple antennas at the transmitter and the receiver. To extract maximum performance from a MIMO system, a computationally intensive search-based detector is needed. To meet this computational challenge, typical suboptimal MIMO detectors are implemented as ASIC or FPGA designs. We aim to show that a MIMO detector on a graphics processing unit (GPU), a low-cost parallel programmable co-processor, can achieve high throughput and can serve as an alternative to ASIC/FPGA designs. However, careful architecture-aware software design is needed to leverage the performance offered by the GPU. We propose a novel soft MIMO detection algorithm, multi-pass trellis traversal (MTT), and show that it can achieve ASIC/FPGA-like performance and handle different configurations in software on a GPU. The proposed design can be used to accelerate wireless physical layer simulations and to offload MIMO detection processing in wireless testbed platforms.
Keywords: GPU, Soft output detection, MIMO, Wireless baseband architecture
This work was supported in part by Nokia, NSN, Texas Instruments, Xilinx, and by NSF under grants CCF-0541363, CNS-0551692, CNS-0619767, EECS-0925942 and CNS-0923479.