Journal of Real-Time Image Processing, Volume 16, Issue 6, pp 2379–2407

Speeding up spatiotemporal feature extraction using GPU

  • Ahmed Mehrez (email author)
  • Ahmed A. Morgan
  • Elsayed E. Hemayed
Original Research Paper


Spatiotemporal feature extraction algorithms are widely used in image processing and computer vision applications. They are favored for the robustness of the features they generate, but they have high computational complexity, so parallelizing them to speed up their execution is of great importance. In this paper, we propose new GPU-based parallel implementations of the two most widely used spatiotemporal feature extraction algorithms: the scale-invariant feature transform (SIFT) and speeded up robust features (SURF). Our implementations solve problems found in previous parallel implementations, such as load imbalance, thread synchronization, and the use of atomic operations. They speed up execution by processing all the work of each stage of the two algorithms simultaneously, without dividing that stage into smaller sequential ones. The way threads are allocated further increases the occupancy of the GPU streaming multiprocessors (SMs). We compare the proposed implementations with previous CPU and GPU parallel implementations of the two algorithms. Results show that the proposed implementations can perform all the processing in real time with high accuracy. They also achieve higher speedup, frame rate, and SM occupancy than the best previously known parallel implementations of the two algorithms.
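
As a rough illustration of the SM occupancy metric mentioned above, the following sketch estimates theoretical occupancy as the ratio of resident warps to the maximum number of warps an SM can hold. It is not taken from the paper: the function name `theoretical_occupancy` is hypothetical, the default limits (64 warps and 32 blocks per SM, 32 threads per warp) are assumed illustrative values that should be checked against the target GPU's documentation, and register and shared-memory limits are deliberately ignored.

```python
import math


def theoretical_occupancy(threads_per_block, max_warps_per_sm=64,
                          max_blocks_per_sm=32, warp_size=32):
    """Estimate occupancy as resident warps / maximum warps per SM.

    Assumption of this sketch: only the warp-count and block-count
    limits are modeled; register and shared-memory usage, which can
    lower occupancy further on real hardware, are ignored.
    """
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    # A partially filled warp still occupies a full warp slot.
    warps_per_block = math.ceil(threads_per_block / warp_size)
    # Blocks that fit under the warp limit, then capped by the block limit.
    blocks_by_warps = max_warps_per_sm // warps_per_block
    resident_blocks = min(blocks_by_warps, max_blocks_per_sm)
    return resident_blocks * warps_per_block / max_warps_per_sm


if __name__ == "__main__":
    for threads in (32, 256, 768, 1024):
        print(threads, theoretical_occupancy(threads))
```

Under these assumed limits, very small blocks are capped by the number of blocks an SM can hold, while block sizes that do not divide the warp budget evenly leave warp slots unused; both effects are what a thread-allocation strategy would try to avoid.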


Keywords: CUDA; Graphics processing unit (GPU); Image matching; Scale-invariant feature transform (SIFT); Speeded up robust features (SURF)


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Ahmed Mehrez (1), email author
  • Ahmed A. Morgan (2, 3)
  • Elsayed E. Hemayed (2)
  1. Department of Electrical Engineering, Faculty of Engineering at Shoubra, Benha University, Cairo, Egypt
  2. Department of Computer Engineering, Faculty of Engineering, Cairo University, Giza, Egypt
  3. College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
