Abstract
Multiple-input multiple-output (MIMO) wireless is an enabling technology for high spectral efficiency and has been adopted in many modern wireless communication standards, such as 3GPP-LTE and IEEE 802.11n. However, optimal maximum a-posteriori (MAP) detection suffers from excessively high computational complexity, which prevents its deployment in practical systems. Hence, many algorithms that trade off error-rate performance against detection complexity have been proposed in the literature. In this paper, we propose a flexible N-way MIMO detector that achieves excellent error-rate performance and high throughput on graphics processing units (GPUs). The proposed detector includes the required QR decomposition step and a tree-search detector, which exploits the massive parallelism available in GPUs. The proposed algorithm performs multiple tree searches in parallel, which leads to excellent error-rate performance at low computational complexity on different GPU architectures, such as Nvidia Fermi and Kepler. We highlight the flexibility of the proposed detector and demonstrate that it achieves higher throughput than existing GPU-based MIMO detectors while delivering the same or better error-rate performance.
Notes
We assume the reader is familiar with CUDA. A detailed description and explanation can be found in [12].
The serial computations can be handled by any thread. For example, it is possible to always pick the first thread to compute the squared ℓ₂-norm.
Column-norm reordering preprocessing, however, is an effective way of improving the N = 1 case. Nevertheless, the BER performance of the N = 1 case with column-norm reordering preprocessing is still worse than that of the N = 2 case without column-norm reordering.
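As a rough illustration of the column-norm reordering mentioned in the note above, the following minimal Python sketch sorts the columns of a channel matrix by their squared ℓ₂-norm and records the permutation so that detected symbols can be mapped back to the original antenna order. The function names (`squared_l2_norm`, `column_norm_reorder`, `undo_permutation`) are hypothetical, and the ascending sort direction is an assumption of this sketch; the note does not state which order the detector uses, nor is this the GPU implementation.

```python
def squared_l2_norm(column):
    """Return the squared l2-norm of a (possibly complex) vector: sum of |x|^2."""
    return sum(x.real * x.real + x.imag * x.imag for x in column)


def column_norm_reorder(H):
    """Reorder the columns of H (given as a list of rows) by squared norm.

    Returns (H_sorted, perm), where perm[k] is the original index of the
    column that ends up in position k. Ascending order is an assumption
    of this sketch, not something stated in the source.
    """
    n_cols = len(H[0])
    columns = [[row[j] for row in H] for j in range(n_cols)]
    norms = [squared_l2_norm(col) for col in columns]
    perm = sorted(range(n_cols), key=lambda j: norms[j])
    H_sorted = [[row[j] for j in perm] for row in H]
    return H_sorted, perm


def undo_permutation(symbols, perm):
    """Map symbols detected in sorted order back to the original antenna order."""
    restored = [None] * len(perm)
    for k, original_index in enumerate(perm):
        restored[original_index] = symbols[k]
    return restored
```

A detector would apply `column_norm_reorder` to the channel matrix before the QR decomposition and call `undo_permutation` on the detected symbol vector afterwards.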
References
Amiri, K., Cavallaro, J. R., Dick, C., Rao, R. M. (2011). A high throughput configurable SDR detector for multi-user MIMO wireless systems. Journal of Signal Processing Systems, 62, 233–245.
Barbero, L. G., & Thompson, J. S. (2006). A fixed-complexity MIMO detector based on the complex sphere decoder. In: IEEE international workshop on signal processing advances in wireless communications (SPAWC). IEEE.
Burg, A., Borgmann, M., Wenk, M., Zellweger, M., Fichtner, W., Bölcskei, H. (2005). VLSI implementation of MIMO detection using the sphere decoding algorithm. Journal of Solid-State Circuits, 40, 1566–1577.
Burg, A., Haene, S., Perels, D., Luethi, P., Felber, N., Fichtner, W. (2006). Algorithm and VLSI architecture for linear MMSE detection in MIMO-OFDM systems. In: IEEE international symposium on circuits and systems (ISCAS) (pp. 4012–4105). IEEE.
Hess, C., Wenk, M., Burg, A., Luethi, P., Studer, C., Felber, N., Fichtner, W. (2007). Reduced-complexity MIMO detector with close-to ML error rate performance. In: Proceedings of the 17th ACM great lakes symposium on VLSI (pp. 200–203).
Hochwald, B., & ten Brink, S. (2003). Achieving near-capacity on a multiple-antenna channel. IEEE Transactions on Communications, 51, 389–399.
Janhunen, J., Silven, O., Juntti, M., Myllyla, M. (2008). Software defined radio implementation of K-best list sphere detector algorithm. In: International conference on embedded computer systems: Architectures, modeling, and simulation (SAMOS) (pp. 100–107).
Karypis, G., & Kumar, V. (1994). Unstructured tree search on SIMD parallel computers. IEEE Transactions on Parallel and Distributed Systems, 5(10), 1057–1072.
Kerr, A., Campbell, D., Richards, M. (2009). QR decomposition on GPUs. In: Proceedings of 2nd workshop on GPGPU (pp. 71–78). ACM.
Li, M., Bougard, B., Lopez, E., Bourdoux, A., Novo, D., Van Der Perre, L., Catthoor, F. (2008). Selective spanning with fast enumeration: A near maximum-likelihood MIMO detector designed for parallel programmable baseband architectures. In: IEEE international conference on communications (ICC) (pp. 737–741). IEEE.
Michalke, C., Zimmermann, E., Fettweis, G. (2006). Linear MIMO receivers vs. tree search detection: A performance comparison overview. In: International symposium on personal, indoor and mobile radio communications (PIMRC). IEEE.
NVIDIA Corporation (2008). CUDA compute unified device architecture programming guide. http://www.nvidia.com/object/cuda_develop.html
Qi, Q., & Chakrabarti, C. (2010). Parallel high throughput soft-output sphere decoder. In: IEEE workshop on signal processing systems (SiPS) (pp. 50–55). IEEE.
Roger, S., Ramiro, C., Gonzalez, A., Almenar, V., Vidal, A. (2012). Fully parallel GPU implementation of a fixed-complexity soft-output MIMO detector. IEEE Transactions on Vehicular Technology, 61, 3796–3800.
Schnorr, C., & Euchner, M. (1993). Lattice basis reduction: Improved practical algorithms and solving subset sum problems. Mathematical Programming, 66, 181–191.
Studer, C., Burg, A., Bolcskei, H. (2005). Soft-output sphere decoding: Algorithms and VLSI implementation. IEEE Journal on Selected Areas in Communications, 26(2), 290–300.
Trefethen, L., & Bau, D. (1997). Numerical linear algebra. SIAM: Society for Industrial and Applied Mathematics.
Wong, K., Tsui, C., Cheng, R., Mow, W. (2002). A VLSI architecture of a K-best lattice decoding algorithm for MIMO channels. IEEE international symposium on circuits and systems (ISCAS), 3, 273–276.
Wu, M., Sun, Y., Gupta, S., Cavallaro, J. R. (2010). Implementation of a high throughput soft MIMO detector on GPU. Journal of Signal Processing Systems, 64(1), 123–136.
Wu, M., Dick, C., Sun, Y., Cavallaro, J. (2011a). Improving MIMO sphere detection through antenna detection order scheduling. In: Software defined radio forum (SDR-WInnComm) (pp. 280–284).
Wu, M., Sun, Y., Wang, G., Cavallaro, J. (2011b). Implementation of a high throughput 3GPP turbo decoder on GPU. Journal of Signal Processing Systems, 171–183.
Wu, M., Yin, B., Cavallaro, J. R. (2012). Flexible N-way MIMO detector on GPU. In: IEEE workshop on signal processing systems (SiPS) (pp. 318–323). IEEE.
Wubben, D., Bohnke, R., Kuhn, V., Kammeyer, K. D. (2004). Near-maximum-likelihood detection of MIMO systems using MMSE-based lattice reduction. In: IEEE international conference on communications (vol. 2, pp. 798–802). IEEE.
Acknowledgments
This work was supported in part by Renesas Mobile, Texas Instruments, Xilinx, Samsung, Huawei, and by the US National Science Foundation under grants CNS-1265332, ECCS-1232274, EECS-0925942 and CNS-0923479.
About this article
Cite this article
Wu, M., Yin, B., Wang, G. et al. GPU Acceleration of a Configurable N-Way MIMO Detector for Wireless Systems. J Sign Process Syst 76, 95–108 (2014). https://doi.org/10.1007/s11265-014-0877-0
DOI: https://doi.org/10.1007/s11265-014-0877-0