A framework for accelerating local feature extraction with OpenCL on multi-core CPUs and co-processors

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

In this paper, we examine the suitability of heterogeneous architectures for running the scale-invariant feature transform (SIFT) algorithm in real time. SIFT is one of the most robust, and also one of the most computationally intensive, algorithms for extracting local features in machine-vision applications. Many studies have presented methods for reducing SIFT execution time. However, the described techniques focus only on a single homogeneous device. To address this gap for multi-device heterogeneous platforms, we developed an OpenCL implementation of SIFT (OpenCL-SIFT). We describe techniques to efficiently parallelize an application that involves many different computing cores. Through careful optimization, we obtained a performance-portable implementation that runs efficiently on various multi-device heterogeneous platforms. The experimental results show that our implementation achieves adequate accuracy and higher efficiency than recent open-source SIFT implementations. Using the proposed methods, we extracted SIFT features from Full-HD images at more than 30 FPS on different processor architectures. To further increase performance, we present efficient multi-device scheduling methods for SIFT feature extraction, with an average speed-up of 2.69×. Finally, we give guidelines for optimizing GPGPU-OpenCL programs for x86 multi-core CPUs. The methods discussed are generic and may be used in the design of other algorithms.
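The abstract does not spell out how the multi-device scheduling works, so the following is only a generic illustration of one common approach, not the authors' method: a static split of a Full-HD frame's rows across devices in proportion to each device's measured throughput. The function names `split_rows` and `ideal_speedup`, the device labels, and the throughput figures are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def split_rows(total_rows: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Split image rows across devices in proportion to their throughput.

    `throughput` maps a (hypothetical) device name to a measured rate,
    for example frames per second when the device runs alone.
    """
    if total_rows < 0 or not throughput:
        raise ValueError("need a non-negative row count and at least one device")
    if any(t <= 0 for t in throughput.values()):
        raise ValueError("throughput values must be positive")

    total = sum(throughput.values())
    # Floor of each device's proportional share of the rows.
    shares = {name: int(total_rows * t / total) for name, t in throughput.items()}
    # Rows lost to rounding go to the fastest devices first.
    leftover = max(total_rows - sum(shares.values()), 0)
    fastest_first = sorted(throughput, key=throughput.get, reverse=True)
    for name in fastest_first[:leftover]:
        shares[name] += 1
    return shares


def ideal_speedup(throughput: Dict[str, float], baseline: str) -> float:
    """Upper bound on speed-up over `baseline` alone, ignoring transfer
    and synchronization overheads (an idealized assumption)."""
    return sum(throughput.values()) / throughput[baseline]


# Hypothetical single-device rates in frames per second (illustrative only).
devices = {"cpu": 10.0, "gpu": 30.0, "coprocessor": 20.0}
rows_per_device = split_rows(1080, devices)  # 1080 rows in a Full-HD frame
print(rows_per_device)
print(ideal_speedup(devices, baseline="gpu"))
```

A static proportional split like this is simple but assumes that per-row cost is uniform, which may not hold for SIFT because keypoint density varies across the image; a dynamic scheme that hands out smaller chunks on demand would be one alternative under that concern.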

Acknowledgments

The authors would like to thank the reviewers, Thomas Perschke and Volker Schatz for proofreading the manuscript and for helpful discussions about directions for future work.

Author information

Corresponding author

Correspondence to Konrad Moren.

About this article

Cite this article

Moren, K., Göhringer, D. A framework for accelerating local feature extraction with OpenCL on multi-core CPUs and co-processors. J Real-Time Image Proc 16, 901–918 (2019). https://doi.org/10.1007/s11554-016-0576-0

  • DOI: https://doi.org/10.1007/s11554-016-0576-0
