Abstract
In this paper, we propose a new approach to visual vocabulary optimization that incorporates spatial context, which carries important spatial information that has not been fully exploited. The novelty of our method lies mainly in two aspects: when spatial information is considered, and how it is used. First, existing methods generally consider spatial information only after the visual vocabulary has been built, whereas we employ it during vocabulary construction to produce a more accurate visual vocabulary. Second, existing methods use spatial information to re-rank the original retrieval results, to generate local keypoint groups such as visual phrases, or within the spatial pyramid matching kernel; in contrast, we propose a novel method that employs spatial information as side information to constrain the construction of the visual vocabulary. Instead of simply assigning keypoints to the nearest cluster centers, we also take the spatial context of keypoints into account in the clustering process. With the proposed approach, a more accurate visual vocabulary can be generated, and the evaluation results improve in both image annotation and classification tasks. Experiments on the widely used 15-scenes dataset demonstrate the effectiveness of the proposed approach.
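The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea of letting spatial context influence the clustering assignment step; it is not the authors' algorithm. It assumes a k-means-style loop in which each keypoint's assignment cost is its normalized descriptor distance plus a penalty for disagreeing with the current assignments of its nearest spatial neighbours in the same image. The function name spatial_context_kmeans and the parameters lam and n_neighbors are hypothetical, introduced here for illustration.

```python
import numpy as np

def spatial_context_kmeans(desc, pos, img_id, k, lam=0.5,
                           n_neighbors=3, n_iter=10, seed=0):
    """Toy k-means whose assignment step also looks at spatial neighbours.

    desc: (n, d) descriptors, pos: (n, 2) keypoint coordinates,
    img_id: (n,) index of the image each keypoint comes from.
    lam and n_neighbors are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    desc = np.asarray(desc, dtype=float)
    n = desc.shape[0]
    centers = desc[rng.choice(n, size=k, replace=False)].copy()

    # Spatial context: nearest keypoints (in image coordinates) of the same image.
    gap = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    gap[img_id[:, None] != img_id[None, :]] = np.inf
    np.fill_diagonal(gap, np.inf)
    nbrs = np.argsort(gap, axis=1)[:, :n_neighbors]
    # 1.0 where the neighbour really is in the same image, else 0.0.
    valid = np.isfinite(np.take_along_axis(gap, nbrs, axis=1)).astype(float)
    n_valid = np.maximum(valid.sum(axis=1, keepdims=True), 1.0)

    # Start from the plain nearest-center assignment.
    dist = ((desc[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)

    for _ in range(n_iter):
        dist = ((desc[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # Fraction of each keypoint's neighbours currently in each cluster.
        votes = np.zeros((n, k))
        for j in range(nbrs.shape[1]):
            np.add.at(votes, (np.arange(n), labels[nbrs[:, j]]), valid[:, j])
        agree = votes / n_valid
        # Descriptor distance plus a penalty for disagreeing with neighbours.
        cost = dist / (dist.mean() + 1e-12) + lam * (1.0 - agree)
        labels = cost.argmin(axis=1)
        # Standard k-means center update; empty clusters keep their center.
        for c in range(k):
            members = desc[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return centers, labels
```

With lam set to 0 the spatial term vanishes and the loop reduces to ordinary k-means, which makes it easy to compare a vocabulary built with and without the spatial penalty on the same descriptors.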
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, Z., Peng, Y., Xiao, J. (2012). Visual Vocabulary Optimization with Spatial Context for Image Annotation and Classification. In: Schoeffmann, K., Merialdo, B., Hauptmann, A.G., Ngo, CW., Andreopoulos, Y., Breiteneder, C. (eds) Advances in Multimedia Modeling. MMM 2012. Lecture Notes in Computer Science, vol 7131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27355-1_11
DOI: https://doi.org/10.1007/978-3-642-27355-1_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27354-4
Online ISBN: 978-3-642-27355-1
eBook Packages: Computer Science (R0)