
Computer Vision Accelerators for Mobile Systems based on OpenCL GPGPU Co-Processing


In this paper, we present an OpenCL-based heterogeneous implementation of a computer vision algorithm, image inpainting-based object removal, on mobile devices. To take advantage of the computational power of the mobile processor, the algorithm workflow is partitioned between the CPU and the GPU based on profiling results from mobile devices, so that the computationally intensive kernels are accelerated by mobile GPGPU (general-purpose computing on graphics processing units). By exploring the implementation trade-offs and applying the proposed optimization strategies at several levels, including algorithm optimization, parallelism optimization, and memory access optimization, we significantly speed up the algorithm with the CPU-GPU heterogeneous implementation while preserving the quality of the output images. Experimental results show that heterogeneous computing based on GPGPU co-processing can significantly speed up computer vision algorithms and make them practical on real-world mobile devices.
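The profiling-driven partitioning described above can be illustrated with a minimal sketch. It is not the implementation from the paper: the stage names, the timing values, and the per-stage decision rule are all hypothetical placeholders chosen for illustration. The sketch assumes each stage has been profiled on both processors and is offloaded only when the GPU kernel time plus the host-device transfer cost is lower than the CPU time.

```python
from dataclasses import dataclass


@dataclass
class StageProfile:
    """Profiled cost of one workflow stage (all times in milliseconds)."""
    name: str
    cpu_ms: float       # time when the stage runs on the CPU
    gpu_ms: float       # kernel time when the stage runs on the GPU
    transfer_ms: float  # host<->device copy cost paid if the stage runs on the GPU


def partition(stages):
    """Assign a stage to the GPU only if kernel + transfer time beats the CPU time."""
    plan = {}
    for s in stages:
        gpu_total = s.gpu_ms + s.transfer_ms
        plan[s.name] = "GPU" if gpu_total < s.cpu_ms else "CPU"
    return plan


def total_time(stages, plan):
    """Estimated end-to-end time of a plan, assuming stages run one after another."""
    total = 0.0
    for s in stages:
        if plan[s.name] == "GPU":
            total += s.gpu_ms + s.transfer_ms
        else:
            total += s.cpu_ms
    return total


# Hypothetical stage names and illustrative numbers, NOT measurements from the paper.
EXAMPLE_STAGES = [
    StageProfile("init_border", cpu_ms=2.0, gpu_ms=1.5, transfer_ms=3.0),
    StageProfile("compute_priority", cpu_ms=5.0, gpu_ms=4.0, transfer_ms=2.0),
    StageProfile("find_best_patch", cpu_ms=900.0, gpu_ms=60.0, transfer_ms=10.0),
    StageProfile("update_image", cpu_ms=3.0, gpu_ms=2.0, transfer_ms=4.0),
]

if __name__ == "__main__":
    plan = partition(EXAMPLE_STAGES)
    cpu_only = {s.name: "CPU" for s in EXAMPLE_STAGES}
    print(plan)
    print("heterogeneous:", total_time(EXAMPLE_STAGES, plan), "ms")
    print("CPU only:", total_time(EXAMPLE_STAGES, cpu_only), "ms")
```

This rule treats each stage independently. A real partitioning would also account for data that can stay on the GPU between consecutive offloaded stages, which removes some of the transfer cost counted here.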




  1. In this case, the candidate patch is in the area of (S AO A).

  2. In this case, the candidate patch is in the area of OA.




This work was supported in part by Qualcomm, and by the US National Science Foundation under grants CNS-1265332, ECCS-1232274, and EECS-0925942.

Author information



Corresponding author

Correspondence to Guohui Wang.

Additional information

This paper was partially presented at IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, May 2013.


About this article


Cite this article

Wang, G., Xiong, Y., Yun, J. et al. Computer Vision Accelerators for Mobile Systems based on OpenCL GPGPU Co-Processing. J Sign Process Syst 76, 283–299 (2014).



  • Mobile SoC
  • Computer vision
  • CPU-GPU partitioning
  • Co-processing
  • OpenCL