Cross Modal Disambiguation

  • Kobus Barnard
  • Keiji Yanai
  • Matthew Johnson
  • Prasad Gabbur
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4170)


Abstract

We consider strategies for reducing ambiguity in multi-modal data, particularly in the domain of images and text. Large data sets containing images with associated text (and vice versa) are readily available, and recent work has exploited such data to learn models for linking visual elements to semantics. This requires resolving correspondence ambiguity, because it is generally not known which parts of the images connect with which language elements. In this paper we first discuss using language processing to reduce correspondence ambiguity in loosely labeled image data. We then consider the related problem of using visual correlates to reduce ambiguity in text with associated images. Only rudimentary image understanding is needed for this task, because the image only needs to help differentiate between a limited set of choices, namely the senses of a particular word.
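To make the second strategy concrete, the following is a minimal illustrative sketch (not the paper's implementation) of how image evidence can be combined with a text-derived prior to choose among a handful of word senses. The sense labels, the single scalar region feature, and the per-sense Gaussian parameters are hypothetical placeholders; models in this line of work use multi-dimensional region features and Gaussian mixtures learned from image-text data.

    # Illustrative sketch only: choose a word sense by combining a text-based
    # sense prior with per-region image likelihoods. All numbers and model
    # parameters below are made up for the example.
    import math

    def gaussian_pdf(x, mean, var):
        """Density of a one-dimensional Gaussian at x."""
        return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

    def disambiguate(text_prior, sense_models, region_features):
        """Score each sense as its text prior times the product of per-region
        likelihoods under that sense's (hypothetical) image model, then
        normalize the scores into a posterior over senses."""
        scores = {}
        for sense, prior in text_prior.items():
            mean, var = sense_models[sense]
            likelihood = 1.0
            for f in region_features:
                likelihood *= gaussian_pdf(f, mean, var)
            scores[sense] = prior * likelihood
        total = sum(scores.values())
        return {sense: score / total for sense, score in scores.items()}

    # Toy example: "bank" with two senses; the image regions' (made-up) feature
    # values look much more like the river sense than the financial sense.
    text_prior = {"bank#river": 0.4, "bank#finance": 0.6}
    sense_models = {"bank#river": (0.2, 0.05), "bank#finance": (0.8, 0.05)}
    posterior = disambiguate(text_prior, sense_models, region_features=[0.25, 0.3])
    print(posterior)  # the river sense dominates despite its lower text prior

The point of the sketch is the one emphasized in the abstract: the image model only has to separate a small, fixed set of alternatives, so even crude image evidence can be enough to shift the decision.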


Keywords: Gaussian Mixture Model · Machine Translation · Word Sense · Word Sense Disambiguation · Statistical Machine Translation





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Kobus Barnard, Department of Computer Science, University of Arizona
  • Keiji Yanai, Department of Computer Science, The University of Electro-Communications, Tokyo, Japan
  • Matthew Johnson, Department of Engineering, University of Cambridge
  • Prasad Gabbur, Electrical and Computer Engineering, University of Arizona
