Abstract
Humans and computers do not speak the same language. This is one of the challenging problems for those working at the interface of humans and computers. This paper summarizes approaches that bring humans and computers a bit closer in terms of the interpretation of visual information. When we describe the outside world, we do so through language expressions, but what we see comes to us as pictures. A significant portion of human information is gathered through the visual channel, yet we communicate using language (text). This is not the case for computational systems: the computational interpretation of images generally takes the form of attribute values that cannot be directly correlated with words or concepts. Describing visual scenes in terms of phrases is the problem addressed in this paper. Recent research efforts focus on combining text and images for semantic image interpretation. We summarize some of these approaches and propose a conceptual framework for information extraction that combines both image and text.
References
Andrenucci, A., Sneiders, E.: Automated question answering: review of the main approaches. In: ICITA'05, pp. 514–519 (2005)
Barnard, K., Forsyth, D.: Learning the semantics of words and pictures. In: International Conference on Computer Vision, vol. 2, pp. 408–415 (2001)
Blei, D. M., Jordan, M. I.: Modeling annotated data. In: Proceedings of the 26th Annual International ACM SIGIR Conference (2003)
Buitelaar, P., Sintek, M., Kiesel, M.: A lexicon model for multilingual/multimedia ontologies. In: Proceedings of the Third European Semantic Web Conference (2006)
Carson et al. (1997)
Hudelot, C., Maillot, N., Thonnat, M.: Symbol grounding for semantic image interpretation: from image data to semantics. In: Proceedings of the Tenth IEEE International Conference on Computer Vision (2005)
Duygulu, P., Barnard, K., de Freitas, N., Forsyth, D.: Object recognition as machine translation: learning a lexicon from a fixed image vocabulary. In: Seventh European Conference on Computer Vision, pp. 97–112 (2002)
Jeon, J., Lavrenko, V., Manmatha, R.: Automatic image annotation and retrieval using cross-media relevance models. In: Proceedings of SIGIR'03, Toronto, Canada (2003)
Katz, B., Lin, J., Stauffer, C., Grimson, E.: Answering questions about moving objects in surveillance videos. In: Proc. of AAAI Spring Symposium on New Directions in QA (2003)
Katz, B.: Annotating the World Wide Web using natural language. In: Proceedings of the 5th RIAO Conference on Computer Assisted Information Searching on the Internet (1997)
Ma and Manjunath (1998)
Martin-Valdivia, M. T., Diaz-Galiano, M. C., Montejo-Raez, A., Urena-Lopez, L. A.: Using information gain to improve multi-modal information retrieval systems. Information Processing & Management 44, 1146–1158 (2008)
Mori, Y., Takahashi, H., Oka, R.: Image-to-word transformation based on dividing and vector quantizing images with words. In: MISRM'99 First International Workshop on Multimedia Intelligent Storage and Retrieval Management (1999)
Papadopoulos, G. T., Mezaris, V., Dasiopoulou, S., Kompatsiaris, I.: Semantic image analysis using a learning approach and spatial context. In: Proceedings of the 1st International Conference on Semantic and Digital Media Technologies (SAMT) (2006)
Su, L., Sharp, B., Chibelushi, C.: Knowledge-based image understanding: a rule-based production system for X-ray segmentation. In: Proceedings of the Fourth International Conference on Enterprise Information Systems, vol. 1 (2002)
Town, C., Sinclair, D.: Language-based querying of image collections on the basis of an extensible ontology. Image and Vision Computing 22, 251–267 (2004)
Vompras, J.: Towards adaptive ontology-based image retrieval. In: Brass, S. (ed.) 17th GI-Workshop on the Foundations of Databases, Worlitz, Germany, pp. 148–152. Institute of Computer Science, Martin-Luther-University Halle-Wittenberg (2005)
Winograd, T.: Understanding Natural Language, Academic Press, New York (1973)
Yang, H., Chaisorn, L., Zhao, Y., Neo, S. Y., Chua, T. S.: VideoQA: question answering on news video. In: Proc. of ACM MM'03, pp. 632–641 (2003)
Yeh, T., Lee, J. J., Darrell, T.: Photo-based question answering. In: ACM Multimedia (2008)
Möller, M., Sintek, M.: A generic framework for semantic medical image retrieval. In: Proceedings of the 7th Korea-Germany Joint Workshop on Advanced Medical Image Processing (2007)
Siddiqui, T., Tiwary, U.: Natural Language Processing and Information Retrieval. Oxford University Press (2007)
Faraday, P.: Attending to Web Pages. Microsoft, Redmond. http://www.cofc.edu/∼learning/chi01_faraday.pdf
Copyright information
© 2009 Indian Institute of Information Technology, India
About this paper
Cite this paper
Siddiqui, T.J., Tiwary, U.S. (2009). Words and Pictures: An HCI Perspective. In: Tiwary, U.S., Siddiqui, T.J., Radhakrishna, M., Tiwari, M.D. (eds) Proceedings of the First International Conference on Intelligent Human Computer Interaction. Springer, New Delhi. https://doi.org/10.1007/978-81-8489-203-1_4
DOI: https://doi.org/10.1007/978-81-8489-203-1_4
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-8489-404-2
Online ISBN: 978-81-8489-203-1
eBook Packages: Computer Science (R0)