Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers

  • Abhinav Gupta
  • Larry S. Davis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5302)


Learning visual classifiers for object recognition from weakly labeled data requires determining the correspondence between image regions and semantic object classes. Most approaches use co-occurrence of “nouns” and image features over large datasets to determine the correspondence, but many correspondence ambiguities remain. We further constrain the correspondence problem by exploiting additional language constructs to improve learning from weakly labeled data. We consider both “prepositions” and “comparative adjectives”, which are used to express relationships between objects. If models of such relationships can be determined, they help resolve correspondence ambiguities. However, learning models of these relationships in turn requires solving the correspondence problem. We therefore simultaneously learn the visual features defining “nouns” and the differential visual features defining such “binary relationships” using an EM-based approach.
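The abstract's EM idea can be illustrated on a toy problem. The sketch below is an assumption-laden simplification, not the paper's actual model: it uses isotropic Gaussian appearance models per noun, a single relation ("above") whose differential feature is just the vertical-position difference between two regions, and two-region images whose captions name two nouns. The E-step scores both possible noun-to-region assignments jointly by appearance and relationship consistency; the M-step re-estimates the noun and relationship models from the soft assignments. All names (`em_learn`, `log_gauss`, the sky/sea data) are illustrative.

```python
import numpy as np

def log_gauss(x, mean, var=0.1):
    # Isotropic Gaussian log-likelihood (up to an additive constant).
    return -np.sum((np.asarray(x) - mean) ** 2) / (2 * var)

def em_learn(images, nouns, n_iter=20):
    """EM over two-region images; each caption supplies an unordered noun
    pair and a relation, e.g. ("above", "sky", "sea"). Which region is
    which noun is unknown -- that is the correspondence ambiguity."""
    rng = np.random.default_rng(0)
    dim = len(images[0]["regions"][0][0])
    noun_means = {n: rng.normal(size=dim) for n in nouns}  # appearance models
    rel_mean = 0.0  # model of the relation's differential feature (delta-y)
    for _ in range(n_iter):
        # E-step: posterior over the two possible assignments, scored
        # jointly by appearance AND relationship consistency.
        post = []
        for img in images:
            (fA, yA), (fB, yB) = img["regions"]
            _, n1, n2 = img["relation"]
            l1 = (log_gauss(fA, noun_means[n1]) + log_gauss(fB, noun_means[n2])
                  + log_gauss(yA - yB, rel_mean))
            l2 = (log_gauss(fB, noun_means[n1]) + log_gauss(fA, noun_means[n2])
                  + log_gauss(yB - yA, rel_mean))
            post.append(1.0 / (1.0 + np.exp(l2 - l1)))  # P(assignment 1)
        # M-step: re-estimate the noun appearance means and the relation's
        # differential-feature mean from the soft assignments.
        num = {n: np.zeros(dim) for n in nouns}
        cnt = {n: 1e-9 for n in nouns}
        rel_num, rel_cnt = 0.0, 1e-9
        for img, p1 in zip(images, post):
            (fA, yA), (fB, yB) = img["regions"]
            _, n1, n2 = img["relation"]
            num[n1] += p1 * fA + (1 - p1) * fB; cnt[n1] += 1
            num[n2] += p1 * fB + (1 - p1) * fA; cnt[n2] += 1
            rel_num += p1 * (yA - yB) + (1 - p1) * (yB - yA); rel_cnt += 1
        noun_means = {n: num[n] / cnt[n] for n in nouns}
        rel_mean = rel_num / rel_cnt
    return noun_means, rel_mean

# Synthetic weakly labeled data: each image holds one "sky" region (bluish
# feature, small y) above one "sea" region, presented in random order.
rng = np.random.default_rng(1)
sky_f, sea_f = np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.5, 0.5])
images = []
for _ in range(50):
    sky = (sky_f + 0.05 * rng.normal(size=3), 0.2 + 0.02 * rng.normal())
    sea = (sea_f + 0.05 * rng.normal(size=3), 0.8 + 0.02 * rng.normal())
    pair = [sky, sea] if rng.random() < 0.5 else [sea, sky]
    images.append({"regions": pair, "relation": ("above", "sky", "sea")})

means, rel = em_learn(images, ["sky", "sea"])
```

Because the likelihood is symmetric in the two label-to-cluster modes, the random initialization decides which mode EM converges to; what the relationship term buys is that, once learned, the "above" model sharpens per-image assignments that appearance alone leaves ambiguous.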


Keywords: Image Region, Image Annotation, Correspondence Problem, Binary Relationship, Automatic Image Annotation




Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Abhinav Gupta (1)
  • Larry S. Davis (1)
  1. Department of Computer Science, University of Maryland, College Park, USA
