Abstract
Recently, feature grouping has been proposed as a method for improving retrieval results for logos and web images. This relies on the idea that a group of features matching over a local region in an image is more discriminative than a single feature match. In this paper, we evolve this concept further and apply it to the more challenging task of landmark recognition. We propose a novel combination of dense sampling of SIFT features with interest regions which represent the more salient parts of the image in greater detail. In place of conventional dense sampling used in category recognition that computes features on a regular grid at a number of fixed scales, we allow the sampling density and scale to vary based on the scale of the interest region. We develop new techniques for exploring stronger geometric constraints inside the feature groups and computing the match score. The spatial information is stored efficiently in an inverted index structure. The proposed approach considers part-based matching of interest regions instead of matching entire images using a histogram under bag-of-words. This helps reducing the influence of background clutter and works better under occlusion. Experiments reveal that directing more attention to the salient regions of the image and applying proposed geometric constraints helps in vastly improving recognition rates for reasonable vocabulary sizes.
Similar content being viewed by others
References
Cao, Y., Wang, C., Li, Z., Zhang, L., Zhang, L.: Spatial-bag-of-features. In: CVPR, pp. 3352–3359 (2010)
Chatfield, K., Lempitsky, V., Vedaldi, A., Zisserman, A.: The devil is in the details: an evaluation of recent feature encoding methods. In: British Machine Vision Conference (2011)
Chen, D.M., Baatz, G., Köser, K., Tsai, S.S., Vedantham, R., Pylvänäinen, T., Roimela, K., Chen, X., Bach, J., Pollefeys, M., Girod, B., Grzeszczuk, R.: City-scale landmark identification on mobile devices. In: CVPR, pp. 737–744 (2011)
Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
Gehler, P.V., Nowozin, S.: On feature combination for multiclass object classification. In: ICCV, pp. 221–228 (2009)
Jegou, H., Douze, M., Schmid, C.: Hamming embedding and weak geometric consistency for large scale image search. In: ECCV, pp. 304–317 (2008)
Jegou, H., Douze, M., Schmid, C.: Improving bag-of-features for large scale image search. Int. J. Comput. Vis. 87(3), 316–336 (2010)
Kalantidis, Y., Pueyo, L.G., Trevisiol, M., van Zwol, R., Avrithis, Y.S.: Scalable triangulation-based logo recognition. In: ICMR, p. 20 (2011)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22(10), 761–767 (2004)
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: CVPR (2007)
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Lost in quantization: improving particular object retrieval in large scale image databases. In: CVPR (2008)
Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: ICCV, pp. 1470–1477 (2003)
Tuytelaars, T., Mikolajczyk, K.: Local invariant feature detectors: a survey. Found. Trends Comput. Graph. Vis. 3(3), 177–280 (2007)
Vedaldi, A., Fulkerson, B.: VLFeat: an open and portable library of computer vision algorithms (2008). Available at http://www.vlfeat.org/
Wu, Z., Ke, Q., Isard, M., Sun, J.: Bundling features for large scale partial-duplicate web image search. In: CVPR, pp. 25–32 (2009)
Wu, Z., Xu, Q., Jiang, S., Huang, Q., Cui, P., Li, L.: Adding affine invariant geometric constraint for partial-duplicate image retrieval. In: ICPR, pp. 842–845 (2010)
Yang, J., Jiang, Y.G., Hauptmann, A.G., Ngo, C.W.: Evaluating bag-of-visual-words representations in scene classification. In: Multimedia Information Retrieval, pp. 197–206 (2007)
Acknowledgements
We are thankful to Natural Sciences and Engineering Research Council of Canada and Alberta Innovates Technology Futures for continued support of this research.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Bhattacharya, P., Gavrilova, M.L. Spatial consistency of dense features within interest regions for efficient landmark recognition. Vis Comput 29, 491–499 (2013). https://doi.org/10.1007/s00371-013-0813-5
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00371-013-0813-5