
Label Propagation with Ensemble of Pairwise Geometric Relations: Towards Robust Large-Scale Retrieval of Object Instances

  • Published in: International Journal of Computer Vision

Abstract

Spatial verification methods permit geometrically stable image matching, but still involve a difficult trade-off between robustness, i.e., avoiding the incorrect rejection of true correspondences, and discriminative power, i.e., the rejection of mismatches. To address this issue, we ask whether an ensemble of weak geometric constraints, each of which correlates with visual similarity only slightly better than a bag-of-visual-words model, can outperform a single strong constraint. We consider a family of spatial verification methods and decompose them into fundamental constraints imposed on pairs of feature correspondences. Combining such constraints leads us to propose a new method, which takes the best of existing techniques and functions as a unified Ensemble of pAirwise GEometric Relations (EAGER), in terms of both spatial contexts and between-image transformations. We also introduce a novel and robust reranking method, in which the object instances localized by EAGER in high-ranked database images are reissued as new queries. EAGER is extended to derive a smoothness constraint under which the similarity between the optimized ranking scores of two instances should be maximally consistent with their geometrically constrained similarity. Reranking is newly formulated as two label propagation problems: one assesses the confidence of the new queries, and the other aggregates the independently executed new retrievals. Extensive experiments on four datasets show that EAGER and our reranking method outperform most of their state-of-the-art counterparts, especially when large-scale visual vocabularies are used.
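The abstract's two main ingredients can be illustrated with a simplified, hypothetical sketch: (1) scoring an image pair by an ensemble of weak pairwise geometric constraints on feature correspondences, and (2) reranking by label propagation using the standard Zhou et al. (2003) local-and-global-consistency update. This is not the paper's actual EAGER algorithm; the function names, thresholds, and the toy affinity construction are all assumptions made for illustration.

```python
import numpy as np

def pairwise_geometric_score(matches, max_log_scale=np.log(1.5),
                             max_rot=np.pi / 8):
    """Fraction of correspondence pairs whose implied scale change and
    rotation agree -- each pair casts one weak geometric vote.

    matches: list of (scale_q, scale_db, angle_q, angle_db) tuples, one
    per feature correspondence (query vs. database local features).
    Thresholds are illustrative, not values from the paper."""
    votes, pairs = 0, 0
    for i in range(len(matches)):
        for j in range(i + 1, len(matches)):
            sq_i, sd_i, aq_i, ad_i = matches[i]
            sq_j, sd_j, aq_j, ad_j = matches[j]
            # Consistency of the scale change implied by each correspondence.
            dlog_scale = abs(np.log((sd_i / sq_i) / (sd_j / sq_j)))
            # Consistency of the implied rotation, wrapped to (-pi, pi].
            drot = (ad_i - aq_i) - (ad_j - aq_j)
            drot = abs((drot + np.pi) % (2 * np.pi) - np.pi)
            pairs += 1
            if dlog_scale < max_log_scale and drot < max_rot:
                votes += 1
    return votes / pairs if pairs else 0.0

def propagate(W, y, alpha=0.85, iters=100):
    """Zhou-style label propagation: f <- alpha * S @ f + (1 - alpha) * y,
    where S is the symmetrically normalized affinity matrix of W and y
    holds the initial (e.g., query-confidence) scores."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    f = y.astype(float).copy()
    for _ in range(iters):
        f = alpha * S @ f + (1 - alpha) * y
    return f
```

In a retrieval setting, scores like `pairwise_geometric_score` could populate the affinity matrix `W` over images, so that propagated scores `f` stay smooth with respect to geometrically constrained similarity, in the spirit of the smoothness constraint described above.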



Corresponding author

Correspondence to Xiaomeng Wu.

Communicated by Josef Sivic.

About this article

Cite this article

Wu, X., Hiramatsu, K. & Kashino, K. Label Propagation with Ensemble of Pairwise Geometric Relations: Towards Robust Large-Scale Retrieval of Object Instances. Int J Comput Vis 126, 689–713 (2018). https://doi.org/10.1007/s11263-018-1063-9
