
Using Visual-Textual Mutual Information and Entropy for Inter-modal Document Indexing

Conference paper
Advances in Information Retrieval (ECIR 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Abstract

This paper presents a statistical indexing model as a contribution to the domain of automatic visual document indexing based on inter-modal analysis. Inter-modal document analysis consists of modeling and learning relationships between several modalities from a data set of annotated documents in order to extract semantics. When one of the modalities is textual, the learned associations can be used to predict a textual index for the visual data of a new document (image or video). More specifically, the presented approach relies on a learning process in which associations between visual and textual information are characterized by the mutual information of the two modalities. In addition, the model uses the information entropy of the distribution of the visual modality against the textual modality as a second criterion for selecting relevant indexing terms. We have implemented the proposed information-theoretic model, and experiments assessing its performance on two collections (image and video) show that information theory is an interesting framework for automatically annotating documents.
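
Only the abstract is available on this page, so the sketch below illustrates the general idea rather than the authors' exact model. It assumes that images are quantized into discrete visual terms (e.g. cluster indices of region features, a common assumption in this line of work), that associations between visual terms and annotation keywords are scored by pointwise mutual information estimated from document-level co-occurrence counts, and that visual terms whose co-occurring text-term distribution has high entropy are discarded as uninformative. All function names and parameters (train, annotate, max_entropy, the use of PMI rather than full MI) are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train(corpus):
    """Collect document-level co-occurrence counts from a training set.

    corpus: iterable of (visual_terms, text_terms) pairs, where visual_terms
    are discrete symbols (e.g. indices of quantized image regions) and
    text_terms are annotation keywords attached to the document.
    """
    v_counts, t_counts, vt_counts = Counter(), Counter(), Counter()
    n_docs = 0
    for visual_terms, text_terms in corpus:
        n_docs += 1
        vs, ts = set(visual_terms), set(text_terms)
        v_counts.update(vs)
        t_counts.update(ts)
        vt_counts.update((v, t) for v in vs for t in ts)
    return n_docs, v_counts, t_counts, vt_counts


def mutual_information(v, t, stats):
    """Pointwise mutual information between visual term v and text term t."""
    n_docs, v_counts, t_counts, vt_counts = stats
    joint = vt_counts[(v, t)]
    if joint == 0:
        return float("-inf")
    # log( P(v,t) / (P(v) P(t)) ) with probabilities estimated by counting
    return math.log((joint * n_docs) / (v_counts[v] * t_counts[t]))


def text_entropy(v, stats):
    """Entropy of the text-term distribution observed together with v.

    A low value means v co-occurs with only a few text terms and is a
    reliable cue; a high value means its associations are spread thin."""
    _, _, _, vt_counts = stats
    counts = [c for (vv, _t), c in vt_counts.items() if vv == v and c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts)


def annotate(visual_terms, stats, k=5, max_entropy=3.0):
    """Predict k index terms for an unseen image from its visual terms.

    max_entropy is an illustrative threshold: visual terms whose text
    distribution is too diffuse are skipped before scoring candidates."""
    _, v_counts, t_counts, _ = stats
    scores = defaultdict(float)
    for v in set(visual_terms):
        if v not in v_counts or text_entropy(v, stats) > max_entropy:
            continue
        for t in t_counts:
            mi = mutual_information(v, t, stats)
            if mi != float("-inf"):
                scores[t] += mi
    return [t for t, _s in sorted(scores.items(), key=lambda x: -x[1])[:k]]


# Toy usage with hypothetical visual terms and keywords.
stats = train([(["v1", "v7"], ["tiger", "grass"]), (["v7"], ["grass"])])
print(annotate(["v7", "v1"], stats, k=2))
```

The entropy filter plays the role of the abstract's second selection criterion: mutual information ranks candidate keywords, while entropy estimates how trustworthy each visual term is as evidence before it is allowed to vote.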



Editor information

Giambattista Amati, Claudio Carpineto, Giovanni Romano


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Martinet, J., Satoh, S. (2007). Using Visual-Textual Mutual Information and Entropy for Inter-modal Document Indexing. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_50

  • DOI: https://doi.org/10.1007/978-3-540-71496-5_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71494-1

  • Online ISBN: 978-3-540-71496-5

  • eBook Packages: Computer Science, Computer Science (R0)
