A BOVW Based Query Generative Model

  • Reede Ren
  • John Collomosse
  • Joemon Jose
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6523)


Bag-of-visual-words (BOVW) is a local-feature-based framework for content-based image and video retrieval. Its performance relies on the discriminative power of the visual vocabulary, i.e. the set of clusters computed over local features. However, optimising the visual vocabulary is computationally expensive for a large collection. This paper aims to relax this dependence by adapting the query generative model to BOVW-based retrieval. Local features are projected directly onto latent content topics to create effective visual queries; visual word distributions are learnt around local features to estimate the contribution of a visual word to a query topic; and relevance is judged by considering concept distributions over visual words as well as over local features. Extensive experiments are carried out on the TRECVid 2009 collection. The notable improvement in retrieval performance shows that this probabilistic framework alleviates the problem of visual ambiguity and can tolerate a visual vocabulary with relatively low discriminative power.
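The abstract does not give the model's equations. As a rough illustration only, the sketch below shows one standard way to combine soft visual-word assignment with language-model retrieval, which is the general idea of scoring a query through word distributions learnt around its local features. Each query feature is mapped to a distribution over visual words with a Gaussian kernel (the bandwidth `sigma` is an assumption). The per-feature distributions are averaged into a query model. Each document gets a Dirichlet-smoothed visual-word model (the prior `mu` is an assumption). Documents are then ranked by negative cross-entropy, which is rank-equivalent to the KL-divergence retrieval model. It deliberately omits the latent content topics and concept distributions the paper uses, and every function name here is hypothetical rather than taken from the authors' system.

```python
import numpy as np


def soft_assign(features, vocabulary, sigma=1.0):
    """Hypothetical kernel soft assignment: row i approximates p(w | f_i)."""
    # Squared Euclidean distance between every feature and every visual word centre.
    d2 = ((features[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def query_model(features, vocabulary, sigma=1.0):
    """Query model theta_q: the average of the per-feature visual-word distributions."""
    return soft_assign(features, vocabulary, sigma).mean(axis=0)


def document_model(counts, collection_counts, mu=10.0):
    """Dirichlet-smoothed p(w | d) from a document's visual-word counts.

    Assumes every visual word occurs at least once in the collection,
    so every smoothed probability is strictly positive.
    """
    p_c = collection_counts / collection_counts.sum()
    return (counts + mu * p_c) / (counts.sum() + mu)


def kl_score(theta_q, theta_d):
    """Negative cross-entropy sum_w theta_q(w) log theta_d(w).

    Differs from -KL(theta_q || theta_d) only by a query-dependent constant,
    so both give the same document ranking.
    """
    return float(np.sum(theta_q * np.log(theta_d)))


if __name__ == "__main__":
    # Toy 2-D "descriptors" and a 3-word vocabulary (illustrative values only).
    vocab = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
    query = np.array([[0.1, -0.2], [0.3, 0.1]])
    docs = {"a": np.array([8.0, 1.0, 1.0]), "b": np.array([1.0, 1.0, 8.0])}
    coll = sum(docs.values())
    theta_q = query_model(query, vocab)
    for name, counts in docs.items():
        print(name, kl_score(theta_q, document_model(counts, coll)))
```

In this toy run, the query features lie near the first visual word, so document "a", which is dominated by that word, should score higher than "b". A larger `sigma` spreads each feature over more words; this is the kind of tolerance to an ambiguous vocabulary that the abstract describes.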



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Reede Ren (1)
  • John Collomosse (1)
  • Joemon Jose (2)
  1. CVSSP, University of Surrey, Guildford, UK
  2. IR Group, University of Glasgow, Glasgow, UK
