Energy-Aware Real-Time Face Recognition System on Mobile CPU-GPU Platform

  • Yi-Chu Wang
  • Bryan Donyanavard
  • Kwang-Ting (Tim) Cheng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6554)


The graphics processing unit (GPU) has expanded its role from an accelerator for rendering graphics into an efficient parallel processor for general-purpose computing. The GPU, an indispensable component in desktop and server-class computers as well as game consoles, has also become an integrated component in handheld devices such as smartphones. Because handheld devices are mostly battery-powered, the mobile GPU is usually designed with an emphasis on low power rather than on performance. In addition, the memory bus architecture of mobile devices differs considerably from that of desktops, servers, and game consoles. In this paper, we address two questions: (1) Can a mobile GPU serve as a powerful accelerator for general-purpose computing on the mobile platform, similar to its role in desktop and server platforms? (2) What is the role of a mobile GPU in energy-optimized real-time mobile applications? We use face recognition as the application driver because it is a compute-intensive task and a core process for several mobile applications. Our experiments were performed on an Nvidia Tegra development board, which integrates a dual-core ARM Cortex-A9 CPU and an Nvidia mobile GPU in a single SoC. The experimental results show that using the mobile GPU achieves a 4.25x speedup in performance and a 3.98x reduction in energy consumption compared with a CPU-only implementation on the same platform.
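
As a rough reading of the two reported figures, the sketch below derives the implied ratio of average power between the GPU-assisted and CPU-only runs. It assumes that both figures were measured over the same workload and that energy equals average power times execution time; neither assumption is stated in the abstract, and the function name `average_power_ratio` is hypothetical.

```python
def average_power_ratio(speedup: float, energy_reduction: float) -> float:
    """Return P_accel / P_base, the implied ratio of average power.

    Assumption (not from the source): energy = average power * time, so
    P_accel / P_base = (E_accel / T_accel) / (E_base / T_base)
                     = speedup / energy_reduction,
    where speedup = T_base / T_accel and energy_reduction = E_base / E_accel.
    """
    if speedup <= 0 or energy_reduction <= 0:
        raise ValueError("both ratios must be positive")
    return speedup / energy_reduction


# Figures reported in the abstract: 4.25x speedup, 3.98x energy reduction.
ratio = average_power_ratio(4.25, 3.98)
print(f"Implied average power ratio (GPU-assisted / CPU-only): {ratio:.3f}")
```

Under these assumptions the ratio is slightly above 1, which would suggest that the energy saving comes almost entirely from the shorter execution time rather than from lower power draw.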


Keywords: Fast Fourier Transform; Face Recognition; Graphic Processing Unit; Gabor Wavelet; Face Recognition System



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yi-Chu Wang
    • 1
  • Bryan Donyanavard
    • 1
  • Kwang-Ting (Tim) Cheng
    • 1
  1. Dept. of Electrical and Computer Engineering, University of California, Santa Barbara, USA
