Journal of Signal Processing Systems, Volume 89, Issue 3, pp 417–430

Parallel Digital Predistortion Design on Mobile GPU and Embedded Multicore CPU for Mobile Transmitters

  • Kaipeng Li
  • Amanullah Ghazi
  • Chance Tarver
  • Jani Boutellier
  • Mahmoud Abdelaziz
  • Lauri Anttila
  • Markku Juntti
  • Mikko Valkama
  • Joseph R. Cavallaro


Digital predistortion (DPD) is a widely adopted baseband processing technique in current radio transmitters. While DPD can effectively suppress unwanted spurious spectrum emissions stemming from imperfections of analog RF and baseband electronics, it also adds processing complexity and makes efficient, flexible implementation challenging, especially for mobile cellular transmitters, whose computing power is limited compared to base stations. In this paper, we present high-data-rate implementations of broadband DPD on modern embedded processors, such as mobile GPUs and multicore CPUs, that use emerging parallel computing techniques to exploit their computing resources. We further verify the suppression effect of DPD experimentally on real radio hardware platforms. Performance evaluation results of our DPD design demonstrate that modern general-purpose mobile processors are highly effective at accelerating DPD processing for a mobile transmitter.


Digital predistortion, Software-defined radio, Mobile SoC, CUDA, NEON SIMD



This work was supported by the US NSF under grants EECS-1408370, CNS-1265332, ECCS-1232274, and the Finnish Agency of Innovation, Tekes.



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Rice University, Houston, USA
  2. Department of Computer Science and Engineering, University of Oulu, Oulu, Finland
  3. Department of Electronics and Communication Engineering, Tampere University of Technology, Tampere, Finland
