
Using Visual-Textual Mutual Information and Entropy for Inter-modal Document Indexing

Conference paper
Advances in Information Retrieval (ECIR 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Abstract

This paper presents a statistical indexing model as a contribution to the domain of automatic visual document indexing based on inter-modal analysis. Inter-modal document analysis consists of modeling and learning relationships between several modalities from a data set of annotated documents in order to extract semantics. When one of the modalities is textual, the learned associations can be used to predict a textual index for the visual data of a new document (image or video). More specifically, the presented approach relies on a learning process in which associations between visual and textual information are characterized by the mutual information of the two modalities. In addition, the model uses the information entropy of the distribution of the visual modality against the textual modality as a second criterion for selecting relevant indexing terms. We have implemented the proposed information-theoretic model, and experiments assessing its performance on two collections (image and video) show that information theory is an interesting framework for automatically annotating documents.
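
Only the abstract is available on this page, so the sketch below illustrates the general idea rather than the authors' exact model. It assumes that images are quantized into discrete visual terms (e.g. cluster indices of region features, a common assumption in this line of work), that associations between visual terms and annotation keywords are scored by pointwise mutual information estimated from document-level co-occurrence counts, and that visual terms whose co-occurring text-term distribution has high entropy are discarded as uninformative. All function names and parameters (train, annotate, max_entropy, the use of PMI rather than full MI) are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train(corpus):
    """Collect document-level co-occurrence counts from a training set.

    corpus: iterable of (visual_terms, text_terms) pairs, where visual_terms
    are discrete symbols (e.g. indices of quantized image regions) and
    text_terms are annotation keywords attached to the document.
    """
    v_counts, t_counts, vt_counts = Counter(), Counter(), Counter()
    n_docs = 0
    for visual_terms, text_terms in corpus:
        n_docs += 1
        vs, ts = set(visual_terms), set(text_terms)
        v_counts.update(vs)
        t_counts.update(ts)
        vt_counts.update((v, t) for v in vs for t in ts)
    return n_docs, v_counts, t_counts, vt_counts


def mutual_information(v, t, stats):
    """Pointwise mutual information between visual term v and text term t."""
    n_docs, v_counts, t_counts, vt_counts = stats
    joint = vt_counts[(v, t)]
    if joint == 0:
        return float("-inf")
    # log( P(v,t) / (P(v) P(t)) ) with probabilities estimated by counting
    return math.log((joint * n_docs) / (v_counts[v] * t_counts[t]))


def text_entropy(v, stats):
    """Entropy of the text-term distribution observed together with v.

    A low value means v co-occurs with only a few text terms and is a
    reliable cue; a high value means its associations are spread thin."""
    _, _, _, vt_counts = stats
    counts = [c for (vv, _t), c in vt_counts.items() if vv == v and c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts)


def annotate(visual_terms, stats, k=5, max_entropy=3.0):
    """Predict k index terms for an unseen image from its visual terms.

    max_entropy is an illustrative threshold: visual terms whose text
    distribution is too diffuse are skipped before scoring candidates."""
    _, v_counts, t_counts, _ = stats
    scores = defaultdict(float)
    for v in set(visual_terms):
        if v not in v_counts or text_entropy(v, stats) > max_entropy:
            continue
        for t in t_counts:
            mi = mutual_information(v, t, stats)
            if mi != float("-inf"):
                scores[t] += mi
    return [t for t, _s in sorted(scores.items(), key=lambda x: -x[1])[:k]]


# Toy usage with hypothetical visual terms and keywords.
stats = train([(["v1", "v7"], ["tiger", "grass"]), (["v7"], ["grass"])])
print(annotate(["v7", "v1"], stats, k=2))
```

The entropy filter plays the role of the abstract's second selection criterion: mutual information ranks candidate keywords, while entropy estimates how trustworthy each visual term is as evidence before it is allowed to vote.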



Editor information

Giambattista Amati, Claudio Carpineto, Giovanni Romano


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Martinet, J., Satoh, S. (2007). Using Visual-Textual Mutual Information and Entropy for Inter-modal Document Indexing. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_50

  • DOI: https://doi.org/10.1007/978-3-540-71496-5_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71494-1

  • Online ISBN: 978-3-540-71496-5

  • eBook Packages: Computer Science, Computer Science (R0)
