CCCV 2017: Computer Vision pp 99-110

Massively Parallel Image Index for Vocabulary Tree Based Image Retrieval

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 772)

Abstract

Although vocabulary tree based algorithms are highly efficient for image retrieval, they still struggle when dealing with large datasets. In this paper, we show that image indexing is the main bottleneck of vocabulary tree based image retrieval, and we propose exploiting GPU hardware and the CUDA parallel programming model to solve the image indexing phase efficiently and subsequently accelerate the remaining retrieval stage. Our main contributions are tree structure transformation, image package processing, and task parallelism. Our GPU-based image index is up to around thirty times faster than the original method, and the whole GPU-based vocabulary tree algorithm is twenty percent faster.

Keywords

Large-scale image retrieval, Vocabulary tree, Image index, GPU-based model

Notes

Acknowledgments

The authors would like to thank Henrik Stewénius, David Nistér, K. Mikolajczyk, and Noah Snavely et al. for making their related datasets and source code publicly available. This work is supported by the National Natural Science Foundation of China (Grants 61772213 and 61371140) and by the Special Fund CZY17011 for Basic Scientific Research of Central Colleges, South-Central University for Nationalities, and in part by Grants 2015CFA062, 2015BAA133 and 2017010201010121.

References

  1. Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: CVPR (2003)
  2. Nistér, D., Stewénius, H.: Scalable recognition with a vocabulary tree. In: CVPR (2006)
  3. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60, 91–110 (2004)
  4. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: CVPR (2006)
  5. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: CVPR (2007)
  6. Jégou, H., Douze, M., Schmid, C.: Hamming embedding and weak geometric consistency for large scale image search. In: ECCV (2008)
  7. Jégou, H., Douze, M., Schmid, C.: Improving bag-of-features for large scale image search. Int. J. Comput. Vision 87, 316–336 (2010)
  8. Wu, C.: SiftGPU: a GPU implementation of David Lowe’s scale invariant feature transform (SIFT). http://cs.unc.edu/~ccwu/siftgpu/
  9. Wu, C., Agarwal, S., Curless, B., Seitz, S.M.: Multicore bundle adjustment. In: CVPR (2011)
  10. Farber, R.: CUDA Application Design and Development. Morgan Kaufmann, San Francisco (2011)
  11. Arandjelović, R., Zisserman, A.: DisLocation: scalable descriptor distinctiveness for location recognition. In: ACCV (2014)
  12. Wilson, K., Snavely, N.: Robust global translations with 1DSfM. In: ECCV (2014)
  13. Snavely, N.: A CPU implementation of David Nistér and Henrik Stewénius’s vocabulary tree algorithm. https://github.com/snavely/VocabTree2
  14. Agarwal, S., Snavely, N., Simon, I., Seitz, S., Szeliski, R.: Building Rome in a day. In: ICCV (2009)
  15. Sattler, T., Havlena, M., Schindler, K., Pollefeys, M.: Large-scale location recognition and the geometric burstiness problem. In: CVPR (2016)
  16. Koniusz, P., Yan, F., Gosselin, P.H., Mikolajczyk, K.: Higher-order occurrence pooling for bags-of-words: visual concept detection. IEEE Trans. Pattern Anal. Mach. Intell. 39, 313–326 (2017)
  17. Schönberger, J.L., Frahm, J.-M.: Structure-from-motion revisited. In: CVPR (2016)
  18. Shen, T., Zhu, S., Fang, T., Zhang, R., Quan, L.: Graph-based consistent matching for structure-from-motion. In: ECCV (2016)
  19. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: BMVC (2002)
  20. Mikolajczyk, K., Schmid, C.: Scale & affine invariant interest point detectors. Int. J. Comput. Vision 60, 63–86 (2004)
  21. Stewénius, H., Nistér, D.: UKbench dataset. http://vis.uky.edu/~stewe/ukbench/
  22. Mikolajczyk, K.: Binaries for affine covariant region descriptors. http://www.robots.ox.ac.uk/~vgg/research/affine/

Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  1. National Key Laboratory of Science and Technology on Multi-spectral Information Processing, School of Automation, Huazhong University of Science and Technology, Wuhan, People’s Republic of China
  2. Hubei Key Laboratory of Medical Information Analysis & Tumor Diagnosis and Treatment, School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, People’s Republic of China
