Multi-modal Solution for Unconstrained News Story Retrieval

Younessian, Ehsan; Rajan, Deepu

doi:10.1007/978-3-642-27355-1_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7131)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

We propose a multi-modal approach to retrieve associated news stories that share the same main topic. In the textual domain, we utilize Automatic Speech Recognition (ASR) and refined Optical Character Recognition (OCR) transcripts, while in the visual domain we employ a Near Duplicate Keyframe detection method to identify stories with common visual cues. In addition, we adopt another visual representation, the semantic signature, which indicates the pre-defined semantic concepts present in a news story, to improve the discriminativeness of the visual modality. We propose a query-class weighting scheme to integrate the retrieval outcomes gained from the textual and visual modalities. Experimental results show the distinguishing power of the enhanced representation in the individual modalities and show that our fusion approach outperforms existing strategies.
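
The abstract does not spell out the weighting scheme, so the following is only a minimal sketch of generic query-class weighted late fusion, not the authors' method. The class names (`text_rich`, `visual_rich`), the weight values, the min-max normalization step, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical per-query-class weights; the values are illustrative only
# and are not taken from the paper.
CLASS_WEIGHTS: Dict[str, Dict[str, float]] = {
    "text_rich": {"text": 0.7, "visual": 0.3},
    "visual_rich": {"text": 0.3, "visual": 0.7},
}


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two modalities are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so this modality carries no ranking signal.
        return {story: 0.0 for story in scores}
    return {story: (s - lo) / (hi - lo) for story, s in scores.items()}


def fuse(
    text_scores: Dict[str, float],
    visual_scores: Dict[str, float],
    query_class: str,
) -> List[Tuple[str, float]]:
    """Rank stories by a weighted sum of normalized per-modality scores.

    The weights are selected by the (assumed) class of the query story.
    A story missing from one modality contributes 0 for that modality.
    """
    weights = CLASS_WEIGHTS[query_class]
    text = min_max_normalize(text_scores)
    visual = min_max_normalize(visual_scores)
    fused = {
        story: weights["text"] * text.get(story, 0.0)
        + weights["visual"] * visual.get(story, 0.0)
        for story in set(text) | set(visual)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy similarity scores between a query story and three candidates.
    text_scores = {"a": 2.0, "b": 1.0, "c": 0.0}
    visual_scores = {"a": 0.0, "b": 0.5, "c": 1.0}
    print(fuse(text_scores, visual_scores, "text_rich"))
    print(fuse(text_scores, visual_scores, "visual_rich"))
```

In this toy setup, the same candidate scores produce opposite rankings depending on the query class, which is the basic motivation for choosing fusion weights per class rather than globally.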

Author information

Authors and Affiliations

School of Computer Engineering, Nanyang Technological University, Singapore
Ehsan Younessian & Deepu Rajan

Editor information

Editors and Affiliations

Institute of Information Technology, Alpen-Adria-Universität Klagenfurt, Universitätsstr. 65-67, 9020, Klagenfurt, Austria
Klaus Schoeffmann
EURECOM, 2229 Route des Crêtes, BP 193, 06904, Sophia Antipolis Cedex, France
Bernard Merialdo
School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, 15213-3890, Pittsburgh, PA, USA
Alexander G. Hauptmann
Department of Computer Science, City University of Hong Kong, Tat Chee Ave, Kowloon, Hong Kong
Chong-Wah Ngo
Department of Electronic and Electrical Engineering, University College London, Roberts Building, Torrington Place, WC1E 7JE, London, UK
Yiannis Andreopoulos
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstrasse 9-11 188/2, 1040, Vienna, Austria
Christian Breiteneder

About this paper

Cite this paper

Younessian, E., Rajan, D. (2012). Multi-modal Solution for Unconstrained News Story Retrieval. In: Schoeffmann, K., Merialdo, B., Hauptmann, A.G., Ngo, CW., Andreopoulos, Y., Breiteneder, C. (eds) Advances in Multimedia Modeling. MMM 2012. Lecture Notes in Computer Science, vol 7131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27355-1_19

DOI: https://doi.org/10.1007/978-3-642-27355-1_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27354-4
Online ISBN: 978-3-642-27355-1
eBook Packages: Computer Science, Computer Science (R0)
