Towards a Universal and Limited Visual Vocabulary

  • Jian Hou
  • Zhan-Shen Feng
  • Yong Yang
  • Nai-Ming Qi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6939)

Abstract

Bag-of-visual-words is a popular image representation that has found wide application in the image processing community. While its potential has been explored in many respects, its use still follows a basic mode: for a given dataset, a vocabulary is trained with k-means-like clustering methods. The vocabulary obtained this way is data dependent, i.e., with a new dataset we must train a new vocabulary. Building on previous research on determining the optimal vocabulary size, in this paper we investigate the possibility of building a universal visual vocabulary of limited size with optimal performance. We analyze why such a vocabulary should exist and conduct extensive experiments on three challenging datasets to validate this hypothesis. We believe this work sheds new light on finally obtaining a universal visual vocabulary of limited size that can be used with any dataset to obtain the best or near-best performance.
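The "basic mode" the abstract refers to — clustering local descriptors with k-means to form a vocabulary, then histogramming each image's descriptors over the resulting visual words — can be sketched as follows. This is a minimal illustration of the standard pipeline, not the authors' implementation; the function names are ours, and the toy 8-dimensional descriptors merely stand in for real local features such as 128-dimensional SIFT.

```python
import numpy as np

def train_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster local descriptors into k visual words with plain k-means."""
    rng = np.random.default_rng(seed)
    # initialize centers from k randomly chosen descriptors
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center (Euclidean distance)
        dists = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned descriptors
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, vocabulary):
    """Represent one image as an L1-normalized histogram of visual-word counts."""
    dists = np.linalg.norm(descriptors[:, None] - vocabulary[None], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

# toy data standing in for extracted local descriptors
rng = np.random.default_rng(42)
train_desc = rng.normal(size=(300, 8))   # descriptors pooled from a training set
image_desc = rng.normal(size=(40, 8))    # descriptors from one image

vocab = train_vocabulary(train_desc, k=20)
hist = bow_histogram(image_desc, vocab)
```

The data dependence criticized in the abstract is visible here: `vocab` is fit to `train_desc`, so a new dataset would ordinarily require rerunning `train_vocabulary`. The paper's hypothesis is that one fixed, limited vocabulary could serve in place of per-dataset retraining.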

Keywords

Recognition Rate · Visual Word · Cosine Similarity · Image Pattern · Video Retrieval

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jian Hou¹
  • Zhan-Shen Feng¹
  • Yong Yang²
  • Nai-Ming Qi²
  1. School of Computer Science and Technology, Xuchang University, China
  2. School of Astronautics, Harbin Institute of Technology, Harbin, China