Visual Vocabulary Optimization with Spatial Context for Image Annotation and Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7131)

Abstract

In this paper, we propose a new approach to visual vocabulary optimization with spatial context, which carries important spatial information that has not been fully exploited. The novelty of our method lies mainly in two aspects: when spatial information is considered, and how it is used. First, existing methods generally consider spatial information only after the visual vocabulary is built, whereas we employ it during vocabulary construction to produce a more accurate visual vocabulary. Second, existing methods use spatial information to re-rank the original retrieval results, to generate local keypoint groups such as visual phrases, or within a spatial pyramid matching kernel; in contrast, we employ spatial information as side information to constrain the construction of the visual vocabulary. Instead of simply assigning keypoints to the nearest cluster centers, we also take the spatial context of keypoints into account in the clustering process. With the proposed approach, a more accurate visual vocabulary can be generated, and the evaluation results can be improved in both image annotation and classification tasks. Experiments on the widely used 15-scenes dataset demonstrate the effectiveness of the proposed approach.
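The abstract does not give the clustering formulation, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a k-means variant in which the assignment cost of a keypoint combines its descriptor distance to each cluster center with a penalty for disagreeing with the current labels of its nearest neighbours in the image plane. The function name `spatial_context_kmeans`, the weight `lam`, and the neighbourhood size `n_neighbors` are hypothetical choices made for this example.

```python
import numpy as np


def spatial_context_kmeans(desc, pos, k, lam=0.5, n_neighbors=3, n_iter=10, seed=0):
    """Toy k-means whose assignment step also looks at spatial neighbours.

    desc: (n, d) keypoint descriptors; pos: (n, 2) keypoint coordinates.
    lam and n_neighbors are hypothetical knobs, not values from the paper.
    Requires n_neighbors < n and k <= n.
    """
    rng = np.random.default_rng(seed)
    n = desc.shape[0]
    # Initialise centers from k distinct keypoints.
    centers = desc[rng.choice(n, size=k, replace=False)].astype(float)

    # Spatial context: each keypoint's nearest neighbours in the image plane.
    d_pos = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    np.fill_diagonal(d_pos, np.inf)  # a keypoint is not its own neighbour
    nbrs = np.argsort(d_pos, axis=1)[:, :n_neighbors]

    # Start from the plain nearest-center assignment.
    d_desc = ((desc[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d_desc.argmin(axis=1)

    for _ in range(n_iter):
        d_desc = ((desc[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

        # Fraction of each keypoint's spatial neighbours in each cluster.
        votes = np.zeros((n, k))
        for j in range(n_neighbors):
            votes[np.arange(n), labels[nbrs[:, j]]] += 1.0
        votes /= n_neighbors

        # Scaled descriptor distance plus a penalty for disagreeing
        # with the spatial neighbours (the assumed side information).
        cost = d_desc / (d_desc.mean() + 1e-12) + lam * (1.0 - votes)
        labels = cost.argmin(axis=1)

        # Standard k-means center update; empty clusters keep their center.
        for c in range(k):
            members = desc[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)

    return centers, labels


# Usage on random data standing in for local descriptors and positions.
rng = np.random.default_rng(1)
demo_desc = rng.normal(size=(40, 8))
demo_pos = rng.uniform(0, 100, size=(40, 2))
demo_centers, demo_labels = spatial_context_kmeans(demo_desc, demo_pos, k=4)
```

With `lam=0` the assignment step reduces to ordinary nearest-center k-means, which gives a baseline for seeing how much the spatial term changes the resulting vocabulary.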

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, Z., Peng, Y., Xiao, J. (2012). Visual Vocabulary Optimization with Spatial Context for Image Annotation and Classification. In: Schoeffmann, K., Merialdo, B., Hauptmann, A.G., Ngo, CW., Andreopoulos, Y., Breiteneder, C. (eds) Advances in Multimedia Modeling. MMM 2012. Lecture Notes in Computer Science, vol 7131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27355-1_11

  • DOI: https://doi.org/10.1007/978-3-642-27355-1_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27354-4

  • Online ISBN: 978-3-642-27355-1

  • eBook Packages: Computer Science, Computer Science (R0)
