Multimedia Systems, Volume 21, Issue 2, pp 229–241

Multi-order visual phrase for scalable partial-duplicate visual search

  • Shiliang Zhang
  • Qi Tian
  • Qingming Huang
  • Wen Gao
  • Yong Rui
Special Issue Paper


A visual phrase combines multiple visual words and captures extra spatial clues among them. Visual phrases therefore show better discriminative power than single visual words in image retrieval and matching. Notwithstanding their success, existing visual phrases still show obvious shortcomings: (1) limited flexibility, i.e., visual phrases are considered for matching only if they contain the same number of visual words; (2) large quantization error and low repeatability, i.e., the quantization errors of individual visual words accumulate in visual word combinations and visual phrases, making them harder to match than single visual words. To avoid these issues, we propose the multi-order visual phrase (MVP), which contains two complementary clues: the center visual word, quantized from the local descriptor of each image keypoint, and the visual and spatial clues of multiple nearby keypoints. Two MVPs are flexibly matched by first matching their center visual words, then estimating a match confidence by checking the spatial and visual consistency of their neighbor keypoints. Center visual word matching is therefore equivalent to traditional visual word matching, while the check of neighbor spatial and visual clues significantly boosts the discriminative power. MVP does not sacrifice the repeatability of single visual words and is more robust to quantization error than existing visual phrases. We test our approach in three image retrieval tasks on UKbench, Oxford5K, and 1 million distractor images collected from Flickr. Comparisons with recent retrieval approaches and existing visual phrase features clearly demonstrate the competitive accuracy and significantly better efficiency of MVP.
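The two-stage matching described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how neighbor clues are encoded or how the confidence is computed, so the `MVP` class, the `(visual word, spatial bin)` neighbor encoding, and the scoring rule in `match_confidence` are all assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MVP:
    """Hypothetical MVP record; field names are illustrative assumptions."""
    center_word: int                        # visual word ID of the center keypoint
    # Each neighbor is (visual word ID, coarse spatial bin relative to the center).
    # This encoding is an assumption; the abstract does not define it.
    neighbors: Tuple[Tuple[int, int], ...] = ()


def match_confidence(a: MVP, b: MVP) -> int:
    """Return 0 if the center words differ; otherwise 1 plus the number of
    neighbors that agree in both visual word and spatial bin (assumed rule)."""
    # Stage 1: center visual word matching, as in ordinary visual word matching.
    if a.center_word != b.center_word:
        return 0
    # Stage 2: count neighbors consistent in both visual and spatial clues.
    # Multiset intersection pairs each neighbor at most once.
    shared = Counter(a.neighbors) & Counter(b.neighbors)
    return 1 + sum(shared.values())


query = MVP(7, ((3, 0), (5, 1), (9, 2)))
strong = MVP(7, ((3, 0), (5, 1), (4, 2)))   # same center, two consistent neighbors
weak = MVP(7, ())                           # same center, no neighbor support
miss = MVP(8, ((3, 0), (5, 1), (9, 2)))     # different center word

print(match_confidence(query, strong), match_confidence(query, weak),
      match_confidence(query, miss))
```

Under this assumed rule, two MVPs with different numbers of neighbors can still match through their center words, and the neighbor check only raises or lowers the confidence, which mirrors the flexibility the abstract claims.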


Image local descriptor, Image matching, Large-scale image retrieval, Visual vocabulary



This work was supported in part, for Dr. Qi Tian, by ARO grant W911NF-12-1-0057, Faculty Research Awards from NEC Laboratories of America, and a 2012 UTSA START-R Research Award. It was also supported in part by the National Science Foundation of China (NSFC) under grant 61128007, in part by the National Basic Research Program of China (973 Program) under grant 2012CB316400, and in part by the National Natural Science Foundation of China under grants 61025011 and 61332016.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Shiliang Zhang (1)
  • Qi Tian (1)
  • Qingming Huang (2)
  • Wen Gao (3)
  • Yong Rui (4)

  1. Department of Computer Science, University of Texas at San Antonio, San Antonio, USA
  2. University of Chinese Academy of Sciences, Beijing, China
  3. Peking University, Beijing, China
  4. Microsoft Research Asia, Beijing, China
