GPU Acceleration of a Configurable N-Way MIMO Detector for Wireless Systems

Journal of Signal Processing Systems

Abstract

Multiple-input multiple-output (MIMO) wireless is an enabling technology for high spectral efficiency and has been adopted in many modern wireless communication standards, such as 3GPP-LTE and IEEE 802.11n. However, optimal maximum a-posteriori (MAP) detection suffers from excessively high computational complexity, which prevents its deployment in practical systems. Hence, many algorithms have been proposed in the literature that trade off error-rate performance against detection complexity. In this paper, we propose a flexible N-Way MIMO detector that achieves excellent error-rate performance and high throughput on graphics processing units (GPUs). The proposed detector includes the required QR decomposition step and a tree-search detector, which exploits the massive parallelism available in GPUs. The proposed algorithm performs multiple tree searches in parallel, which leads to excellent error-rate performance at low computational complexity on different GPU architectures, such as Nvidia Fermi and Kepler. We highlight the flexibility of the proposed detector and demonstrate that it achieves higher throughput than existing GPU-based MIMO detectors while achieving the same or better error-rate performance.
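The abstract names two building blocks, a QR decomposition and several tree searches run side by side, without giving their details. The following is a minimal serial sketch of that idea, not the paper's GPU implementation. It rests on several assumptions that are not in the source: a real-valued channel matrix with BPSK symbols, a greedy descent that keeps one candidate per tree level, and N searches that differ only by a cyclic shift of the antenna order. All function names are hypothetical.

```python
import math

def qr_gram_schmidt(H):
    """Thin QR of a real, full-column-rank matrix (list of rows), modified Gram-Schmidt."""
    m, n = len(H), len(H[0])
    cols = [[H[r][c] for r in range(m)] for c in range(n)]
    Q, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i in range(j):
            R[i][j] = sum(q * x for q, x in zip(Q[i], v))
            v = [x - R[i][j] * q for x, q in zip(v, Q[i])]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        Q.append([x / R[j][j] for x in v])
    return Q, R  # Q[i] is the i-th orthonormal column

def greedy_search(H, y, order, symbols):
    """One tree search (assumed greedy): QR on the permuted H, nearest symbol per level."""
    Hp = [[row[c] for c in order] for row in H]
    Q, R = qr_gram_schmidt(Hp)
    n = len(order)
    z = [sum(q * v for q, v in zip(Q[i], y)) for i in range(n)]  # z = Q^T y
    s = [0.0] * n
    for i in reversed(range(n)):  # the last layer is detected first
        resid = z[i] - sum(R[i][j] * s[j] for j in range(i + 1, n))
        s[i] = min(symbols, key=lambda a: abs(resid - R[i][i] * a))
    dist = sum((z[i] - sum(R[i][j] * s[j] for j in range(i, n))) ** 2
               for i in range(n))
    x = [0.0] * n
    for pos, ant in enumerate(order):  # undo the antenna permutation
        x[ant] = s[pos]
    return dist, x

def n_way_detect(H, y, n_ways, symbols=(-1.0, 1.0)):
    """Run n_ways searches with cyclically shifted antenna orders; keep the closest."""
    n = len(H[0])
    orders = [[(k + shift) % n for k in range(n)] for shift in range(n_ways)]
    return min(greedy_search(H, y, o, symbols) for o in orders)

# Noise-free toy example: y = H x with x = [1, -1], so x_hat recovers [1.0, -1.0].
H = [[1.0, 0.5], [0.2, 1.0]]
y = [0.5, -0.8]
dist, x_hat = n_way_detect(H, y, n_ways=2)
```

In this sketch the searches are independent of one another, which is what would let each one be assigned to its own group of GPU threads; the final `min` is the only step that combines their results.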

Notes

  1. We assume the reader is familiar with CUDA. A detailed description and explanation can be found in [12].

  2. The serial computations can be handled by any thread. For example, it is possible to always pick the first thread to compute the squared ℓ2-norm.

  3. Column-norm reordering preprocessing, however, is an effective way of improving the N = 1 case. Nevertheless, the BER performance of the N = 1 case with column-norm reordering preprocessing is still worse than that of the N = 2 case without column-norm reordering.
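As an illustration of the column-norm reordering mentioned in Note 3, the sketch below sorts the columns of a channel matrix by their squared ℓ2-norm before any QR decomposition or tree search. The ascending sort direction, the example matrix, and the function names are assumptions made for this illustration; the notes do not state which direction the paper uses.

```python
def column_norm_order(H):
    """Return the column indices of H sorted by ascending squared l2-norm."""
    n_rows, n_cols = len(H), len(H[0])
    norms = [sum(abs(H[r][c]) ** 2 for r in range(n_rows)) for c in range(n_cols)]
    return sorted(range(n_cols), key=lambda c: norms[c])

def permute_columns(H, order):
    """Reorder the columns of H (list of rows) according to order."""
    return [[row[c] for c in order] for row in H]

# Hypothetical 2x3 complex channel matrix, used only to show the reordering.
H = [[1 + 1j, 0.1, 2.0],
     [0.5, 0.2j, 1 - 1j]]
order = column_norm_order(H)
H_sorted = permute_columns(H, order)
```

With an ascending sort, the column with the largest norm ends up last. In a QR-based tree search the last column corresponds to the layer detected first, so under this assumed convention the strongest stream is decided before the weaker ones.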

References

  1. Amiri, K., Cavallaro, J. R., Dick, C., Rao, R. M. (2011). A high throughput configurable SDR detector for multi-user MIMO wireless systems. Journal of Signal Processing Systems, 62, 233–245.

  2. Barbero, L. G., & Thompson, J. S. (2006). A fixed-complexity MIMO detector based on the complex sphere decoder. In: IEEE international workshop on signal processing advances in wireless communications (SPAWC). IEEE.

  3. Burg, A., Borgmann, M., Wenk, M., Zellweger, M., Fichtner, W., Bölcskei, H. (2005). VLSI implementation of MIMO detection using the sphere decoding algorithm. Journal of Solid-State Circuits, 40, 1566–1577.

  4. Burg, A., Haene, S., Perels, D., Luethi, P., Felber, N., Fichtner, W. (2006). Algorithm and VLSI architecture for linear MMSE detection in MIMO-OFDM systems. In: IEEE international symposium on circuits and systems (ISCAS) (pp. 4102–4105). IEEE.

  5. Hess, C., Wenk, M., Burg, A., Luethi, P., Studer, C., Felber, N., Fichtner, W. (2007). Reduced-complexity MIMO detector with close-to ML error rate performance. In: Proceedings of the 17th ACM great lakes symposium on VLSI (pp. 200–203).

  6. Hochwald, B., & ten Brink, S. (2003). Achieving near-capacity on a multiple-antenna channel. IEEE Transactions on Communications, 51, 389–399.

  7. Janhunen, J., Silven, O., Juntti, M., Myllyla, M. (2008). Software defined radio implementation of K-best list sphere detector algorithm. In: International conference on embedded computer systems: Architectures, modeling, and simulation (SAMOS) (pp. 100–107).

  8. Karypis, G., & Kumar, V. (1994). Unstructured tree search on SIMD parallel computers. IEEE Transactions on Parallel and Distributed Systems, 5(10), 1057–1072.

  9. Kerr, A., Campbell, D., Richards, M. (2009). QR decomposition on GPUs. In: Proceedings of 2nd workshop on GPGPU (pp. 71–78). ACM.

  10. Li, M., Bougard, B., Lopez, E., Bourdoux, A., Novo, D., Van Der Perre, L., Catthoor, F. (2008). Selective spanning with fast enumeration: A near maximum-likelihood MIMO detector designed for parallel programmable baseband architectures. In: IEEE international conference on communications (ICC) (pp. 737–741). IEEE.

  11. Michalke, C., Zimmermann, E., Fettweis, G. (2006). Linear MIMO receivers vs. tree search detection: A performance comparison overview. In: International symposium on personal, indoor and mobile radio communications (PIMRC). IEEE.

  12. NVIDIA Corporation (2008). CUDA compute unified device architecture programming guide. http://www.nvidia.com/object/cuda_develop.html

  13. Qi, Q., & Chakrabarti, C. (2010). Parallel high throughput soft-output sphere decoder. In: IEEE workshop on signal processing systems (SiPS) (pp. 50–55). IEEE.

  14. Roger, S., Ramiro, C., Gonzalez, A., Almenar, V., Vidal, A. (2012). Fully parallel GPU implementation of a fixed-complexity soft-output MIMO detector. IEEE Transactions on Vehicular Technology, 61, 3796–3800.

  15. Schnorr, C., & Euchner, M. (1993). Lattice basis reduction: Improved practical algorithms and solving subset sum problems. Mathematical Programming, 66, 181–191.

  16. Studer, C., Burg, A., Bölcskei, H. (2008). Soft-output sphere decoding: Algorithms and VLSI implementation. IEEE Journal on Selected Areas in Communications, 26(2), 290–300.

  17. Trefethen, L., & Bau, D. (1997). Numerical linear algebra. SIAM: Society for Industrial and Applied Mathematics.

  18. Wong, K., Tsui, C., Cheng, R., Mow, W. (2002). A VLSI architecture of a K-best lattice decoding algorithm for MIMO channels. IEEE international symposium on circuits and systems (ISCAS), 3, 273–276.

  19. Wu, M., Sun, Y., Gupta, S., Cavallaro, J. R. (2010). Implementation of a high throughput soft MIMO detector on GPU. Journal of Signal Processing Systems, 64(1), 123–136.

  20. Wu, M., Dick, C., Sun, Y., Cavallaro, J. (2011a). Improving MIMO sphere detection through antenna detection order scheduling. In: Software defined radio forum (SDR-WInnComm) (pp. 280–284).

  21. Wu, M., Sun, Y., Wang, G., Cavallaro, J. (2011b). Implementation of a high throughput 3GPP turbo decoder on GPU. Journal of Signal Processing Systems, 171–183.

  22. Wu, M., Yin, B., Cavallaro, J. R. (2012). Flexible N-way MIMO detector on GPU. In: IEEE workshop on signal processing systems (SiPS) (pp. 318–323). IEEE.

  23. Wubben, D., Bohnke, R., Kuhn, V., Kammeyer, K. D. (2004). Near-maximum-likelihood detection of MIMO systems using MMSE-based lattice reduction. In: IEEE international conference on communications (vol. 2, pp. 798–802). IEEE.

Acknowledgments

This work was supported in part by Renesas Mobile, Texas Instruments, Xilinx, Samsung, Huawei, and by the US National Science Foundation under grants CNS-1265332, ECCS-1232274, EECS-0925942 and CNS-0923479.

Author information

Corresponding author

Correspondence to Michael Wu.

About this article

Cite this article

Wu, M., Yin, B., Wang, G. et al. GPU Acceleration of a Configurable N-Way MIMO Detector for Wireless Systems. J Sign Process Syst 76, 95–108 (2014). https://doi.org/10.1007/s11265-014-0877-0

  • DOI: https://doi.org/10.1007/s11265-014-0877-0
