Abstract
Object retrieval remains an open problem. A promising approach is based on matching visual phrases. However, this approach is often corrupted by visual phrase burstiness, i.e., the repetitive occurrence of certain visual phrases. Burstiness causes the visual patterns that co-occur in two images to be over-counted, which degrades the accuracy of image similarity measurement. In addition, existing methods are incapable of capturing the complete geometric variation between images. In this paper, we propose a novel strategy to address these two problems. First, we propose a unified framework for matching geometry-constrained visual phrases. This framework makes it possible to combine the optimal geometry constraints to improve the validity of matched visual phrases. Second, we address visual phrase burstiness from a probabilistic view. This approach effectively filters out bursty visual phrases by explicitly modelling their distribution. Experiments on five benchmark datasets demonstrate that our method outperforms other approaches consistently and significantly.
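The over-counting effect described above can be illustrated with a toy sketch. This is not the Bayes pooling method proposed in the paper; it only contrasts a raw match count with a simple square-root damping, a generic heuristic chosen here for illustration. The integer phrase IDs, the function names and the toy images are all hypothetical.

```python
from collections import Counter
from math import sqrt


def raw_match_score(query_phrases, db_phrases):
    """Count every pairwise match between identical phrase IDs."""
    q, d = Counter(query_phrases), Counter(db_phrases)
    return sum(q[p] * d[p] for p in q.keys() & d.keys())


def damped_match_score(query_phrases, db_phrases):
    """Illustrative damping (an assumption, not the paper's model):
    each phrase contributes the square root of its pairwise match count."""
    q, d = Counter(query_phrases), Counter(db_phrases)
    return sum(sqrt(q[p] * d[p]) for p in q.keys() & d.keys())


# Hypothetical toy data: phrase 7 is bursty (e.g. a repeated facade pattern).
query = [7, 7, 7, 7, 3, 5]
bursty_image = [7, 7, 7, 7, 9]   # shares only the bursty phrase
diverse_image = [7, 3, 5]        # shares three distinct phrases

print(raw_match_score(query, bursty_image))      # 16
print(raw_match_score(query, diverse_image))     # 6
print(damped_match_score(query, bursty_image))   # 4.0
print(damped_match_score(query, diverse_image))  # 4.0
```

With raw counting, the image that shares a single repeated phrase scores far higher than the image that shares three distinct phrases. The damping removes that advantage, which is the kind of distortion a burstiness model is meant to correct.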
Notes
Source code is available at: http://jiangwh.weebly.com/
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grants 61471049, 61532018 and 61372169, and by the BUPT Excellent Ph.D. Students Foundation under Grant CX201425.
Cite this article
Jiang, W., Zhao, Z. & Su, F. Bayes pooling of visual phrases for object retrieval. Multimed Tools Appl 75, 9095–9119 (2016). https://doi.org/10.1007/s11042-015-2939-0
DOI: https://doi.org/10.1007/s11042-015-2939-0