Journal of Signal Processing Systems, Volume 64, Issue 1, pp 123–136

Implementation of a High Throughput Soft MIMO Detector on GPU

  • Michael Wu
  • Yang Sun
  • Siddharth Gupta
  • Joseph R. Cavallaro


Multiple-input multiple-output (MIMO) significantly increases the throughput of a communication system by employing multiple antennas at the transmitter and the receiver. To extract maximum performance from a MIMO system, a computationally intensive search-based detector is needed. To meet this computational challenge, suboptimal MIMO detectors are typically implemented as ASIC or FPGA designs. We aim to show that a MIMO detector on a graphics processing unit (GPU), a low-cost parallel programmable co-processor, can achieve high throughput and can serve as an alternative to ASIC/FPGA designs. However, careful architecture-aware software design is needed to leverage the performance offered by the GPU. We propose a novel soft MIMO detection algorithm, multi-pass trellis traversal (MTT), and show that we can achieve ASIC/FPGA-like performance and handle different configurations in software on GPU. The proposed design can be used to accelerate wireless physical layer simulations and to offload MIMO detection processing in wireless testbed platforms.
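
To illustrate why a search-based soft detector is computationally intensive, the following is a minimal sketch of exhaustive max-log soft-output detection. It is not the MTT algorithm proposed in the paper, whose details are not given in this abstract. The Gray-mapped QPSK table, the function names `apply_channel` and `soft_ml_detect`, and the sign convention (a positive log-likelihood ratio favours bit value 1) are all assumptions made for illustration.

```python
from itertools import product

# Assumed Gray-mapped QPSK constellation: two bits per transmitted symbol.
QPSK = {
    (0, 0): complex(1, 1),
    (0, 1): complex(1, -1),
    (1, 0): complex(-1, 1),
    (1, 1): complex(-1, -1),
}


def apply_channel(H, x):
    """Return the product H x for a list-of-rows matrix H and symbol vector x."""
    return [sum(h * s for h, s in zip(row, x)) for row in H]


def soft_ml_detect(H, y, noise_var):
    """Exhaustive max-log soft detection (illustrative, hypothetical helper).

    Returns (hard_bits, llrs). A positive LLR favours bit value 1.
    """
    n_tx = len(H[0])
    n_bits = 2 * n_tx
    # Smallest distance seen so far with each bit equal to 0 or to 1.
    best0 = [float("inf")] * n_bits
    best1 = [float("inf")] * n_bits
    best_dist, best_bits = float("inf"), None

    # Visit every candidate transmit vector: 4 ** n_tx candidates for QPSK.
    for combo in product(list(QPSK), repeat=n_tx):
        x = [QPSK[pair] for pair in combo]
        bits = [b for pair in combo for b in pair]
        # Squared Euclidean distance between y and the candidate H x.
        dist = sum(abs(yi - ri) ** 2 for yi, ri in zip(y, apply_channel(H, x)))
        if dist < best_dist:
            best_dist, best_bits = dist, bits
        for k, b in enumerate(bits):
            if b == 1:
                best1[k] = min(best1[k], dist)
            else:
                best0[k] = min(best0[k], dist)

    # Max-log approximation of the per-bit log-likelihood ratio.
    llrs = [(d0 - d1) / noise_var for d0, d1 in zip(best0, best1)]
    return best_bits, llrs
```

Calling `soft_ml_detect(H, y, noise_var)` with a 2×2 channel matrix visits 16 candidates. The candidate count grows as the constellation size raised to the number of transmit antennas, which is the cost that suboptimal detectors aim to reduce.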


GPU Soft output detection MIMO Wireless baseband architecture 



This work was supported in part by Nokia, NSN, Texas Instruments, Xilinx, and by NSF under grants CCF-0541363, CNS-0551692, CNS-0619767, EECS-0925942 and CNS-0923479.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Michael Wu (1)
  • Yang Sun (1)
  • Siddharth Gupta (1)
  • Joseph R. Cavallaro (1)

  1. Electrical and Computer Engineering, Rice University, Houston, USA
