Speeding up spatiotemporal feature extraction using GPU

Mehrez, Ahmed; Morgan, Ahmed A.; Hemayed, Elsayed E.

doi:10.1007/s11554-018-0755-2

Original Research Paper
Published: 09 February 2018

Journal of Real-Time Image Processing, Volume 16, pages 2379–2407 (2019)

Ahmed Mehrez¹,
Ahmed A. Morgan²,³ &
Elsayed E. Hemayed²

Abstract

Spatiotemporal feature extraction algorithms are widely used in image processing and computer vision applications. They are favored because the features they generate are robust, but they have high computational complexity, so parallelizing them to speed up their execution is of great importance. In this paper, we propose new parallel implementations, using GPU computing, of the two most widely used spatiotemporal feature extraction algorithms: the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF). Our implementations solve problems found in previous parallel implementations, such as load imbalance, thread synchronization, and the use of atomic operations. They speed up execution by processing all the work of each stage of the two algorithms simultaneously, without dividing that stage into smaller sequential ones. The way threads are allocated further increases the occupancy of the GPU streaming multiprocessors (SMs). We compare the proposed implementations to previous CPU and GPU parallel implementations of the two algorithms. Results show that the proposed implementations can do all the processing in real time with high accuracy. They also achieve higher speedup, frame rate, and SM occupancy than the best previously known parallel implementations of the two algorithms.
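The SM occupancy mentioned in the abstract is commonly defined as the ratio of resident warps on a streaming multiprocessor to the maximum number of warps it can hold. The sketch below is not taken from the paper; it only illustrates how thread allocation can affect that ratio. The function name `theoretical_occupancy` is hypothetical, the default limits (64 warps and 32 blocks per SM, 32 threads per warp) are assumed values in the style of a Pascal-class GPU, and register and shared-memory limits are deliberately ignored.

```python
import math


def theoretical_occupancy(threads_per_block,
                          max_warps_per_sm=64,   # assumed hardware limit
                          max_blocks_per_sm=32,  # assumed hardware limit
                          warp_size=32):
    """Fraction of an SM's warp slots that can be filled.

    Simplified model: only the warp-count and block-count limits are
    considered; register and shared-memory usage are ignored.
    """
    # A block occupies whole warps, so round up.
    warps_per_block = math.ceil(threads_per_block / warp_size)
    # Blocks that fit under the warp limit, then capped by the block limit.
    blocks_by_warps = max_warps_per_sm // warps_per_block
    resident_blocks = min(blocks_by_warps, max_blocks_per_sm)
    return resident_blocks * warps_per_block / max_warps_per_sm


if __name__ == "__main__":
    for tpb in (32, 96, 256, 768):
        print(tpb, theoretical_occupancy(tpb))
```

Under these assumed limits, very small blocks hit the block-count cap before the warp slots are full, and block sizes that do not divide the warp capacity evenly leave slots idle, which is one reason the choice of thread allocation matters for occupancy.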

Author information

Authors and Affiliations

Department of Electrical Engineering, Faculty of Engineering at Shoubra, Benha University, Cairo, 11614, Egypt
Ahmed Mehrez
Department of Computer Engineering, Faculty of Engineering, Cairo University, Giza, 12613, Egypt
Ahmed A. Morgan & Elsayed E. Hemayed
College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
Ahmed A. Morgan

Corresponding author

Correspondence to Ahmed Mehrez.

About this article

Cite this article

Mehrez, A., Morgan, A.A. & Hemayed, E.E. Speeding up spatiotemporal feature extraction using GPU. J Real-Time Image Proc 16, 2379–2407 (2019). https://doi.org/10.1007/s11554-018-0755-2

Received: 19 July 2017
Accepted: 16 January 2018
Published: 09 February 2018
Issue Date: December 2019
DOI: https://doi.org/10.1007/s11554-018-0755-2
