Journal of Signal Processing Systems, Volume 76, Issue 3, pp 283–299

Computer Vision Accelerators for Mobile Systems based on OpenCL GPGPU Co-Processing

  • Guohui Wang
  • Yingen Xiong
  • Jay Yun
  • Joseph R. Cavallaro


In this paper, we present an OpenCL-based heterogeneous implementation of a computer vision algorithm, an image inpainting-based object removal algorithm, on mobile devices. To take advantage of the computing power of the mobile processor, we partition the algorithm workflow between the CPU and the GPU based on profiling results obtained on mobile devices, so that the computationally intensive kernels are accelerated by the mobile GPGPU (general-purpose computing on graphics processing units). By exploring the implementation trade-offs and applying the proposed optimization strategies at several levels, including algorithm optimization, parallelism optimization, and memory access optimization, we significantly speed up the algorithm with the CPU-GPU heterogeneous implementation while preserving the quality of the output images. Experimental results show that heterogeneous computing based on GPGPU co-processing can significantly speed up computer vision algorithms and make them practical on real-world mobile devices.
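The profiling-driven CPU-GPU partitioning described above can be illustrated with a minimal sketch. It assumes that each stage of the workflow has been timed on both processors and that offloading a stage to the GPU adds a fixed data-transfer cost. The `StageProfile` record, the stage names, the timings, and the greedy per-stage rule are all hypothetical; they are not taken from the paper and are not the authors' partitioning method.

```python
from dataclasses import dataclass


@dataclass
class StageProfile:
    """Hypothetical profiling record for one stage of the workflow (times in ms)."""
    name: str
    cpu_ms: float       # measured run time on the CPU
    gpu_ms: float       # measured kernel time on the GPU
    transfer_ms: float  # assumed extra CPU<->GPU copy cost if the stage is offloaded


def partition(stages):
    """Greedily assign each stage to the device with the lower estimated cost.

    A stage goes to the GPU only if its kernel time plus the transfer
    overhead is strictly lower than its CPU time; ties stay on the CPU.
    """
    plan = {}
    for s in stages:
        gpu_total = s.gpu_ms + s.transfer_ms
        plan[s.name] = "GPU" if gpu_total < s.cpu_ms else "CPU"
    return plan


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the paper.
    stages = [
        StageProfile("boundary_update", cpu_ms=2.0, gpu_ms=1.5, transfer_ms=3.0),
        StageProfile("patch_matching", cpu_ms=900.0, gpu_ms=60.0, transfer_ms=5.0),
    ]
    print(partition(stages))  # {'boundary_update': 'CPU', 'patch_matching': 'GPU'}
```

A per-stage greedy rule like this ignores the fact that consecutive GPU stages can keep their data resident in GPU memory, which lowers the real transfer cost; a fuller model would estimate the cost of the whole CPU-GPU schedule rather than of each stage in isolation.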


Keywords: Mobile SoC · Computer vision · CPU-GPU partitioning · Co-processing · OpenCL



This work was supported in part by Qualcomm, and by the US National Science Foundation under grants CNS-1265332, ECCS-1232274, and EECS-0925942.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Guohui Wang (1)
  • Yingen Xiong (2)
  • Jay Yun (2)
  • Joseph R. Cavallaro (1)
  1. Department of Electrical and Computer Engineering, Rice University, Houston, USA
  2. Qualcomm Technologies Inc., San Diego, USA
