Abstract
We propose a multi-modal approach to retrieve associated news stories that share the same main topic. In the textual domain, we utilize Automatic Speech Recognition (ASR) and refined Optical Character Recognition (OCR) transcripts, while in the visual domain we employ a Near Duplicate Keyframe detection method to identify stories with common visual clues. In addition, we adopt a second visual representation, the semantic signature, which indicates the pre-defined semantic concepts present in a news story, to improve the discriminativeness of the visual modality. We propose a query-class weighting scheme to integrate the retrieval outcomes gained from visual modalities. Experimental results show the distinguishing power of the enhanced representations in the individual modalities and show that our fusion approach outperforms existing strategies.
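The abstract does not spell out the weighting scheme, so the following is only a minimal sketch of how a query-class weighted late fusion of per-modality retrieval scores might look. The query classes (`text_rich`, `visual_rich`), the weight values, the modality keys (`asr`, `ocr`, `ndk` for near-duplicate keyframes, `semantic` for the semantic signature), and the min-max normalization step are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical weights per query class and modality (illustrative values only,
# not the weights learned or reported in the paper).
CLASS_WEIGHTS: Dict[str, Dict[str, float]] = {
    "text_rich": {"asr": 0.4, "ocr": 0.3, "ndk": 0.2, "semantic": 0.1},
    "visual_rich": {"asr": 0.2, "ocr": 0.1, "ndk": 0.5, "semantic": 0.2},
}


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one modality's scores to [0, 1] so modalities are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All stories scored the same: this modality carries no ranking signal.
        return {story: 0.0 for story in scores}
    return {story: (s - lo) / (hi - lo) for story, s in scores.items()}


def fuse(per_modality: Dict[str, Dict[str, float]], query_class: str) -> Dict[str, float]:
    """Weighted sum of normalized scores, with weights chosen by query class."""
    weights = CLASS_WEIGHTS[query_class]
    fused: Dict[str, float] = {}
    for modality, scores in per_modality.items():
        w = weights.get(modality, 0.0)  # unknown modalities contribute nothing
        for story, s in min_max_normalize(scores).items():
            fused[story] = fused.get(story, 0.0) + w * s
    return fused


def rank(fused: Dict[str, float]) -> List[str]:
    """Story ids sorted by descending fused score (ties broken by id)."""
    return sorted(fused, key=lambda story: (-fused[story], story))


# Toy scores for two candidate stories in two modalities.
example_scores = {
    "asr": {"story_a": 1.0, "story_b": 0.0},
    "ndk": {"story_a": 0.0, "story_b": 1.0},
}
text_ranking = rank(fuse(example_scores, "text_rich"))
visual_ranking = rank(fuse(example_scores, "visual_rich"))
```

In this toy setup the same candidate scores produce different rankings depending on the assumed query class, which is the basic effect a query-class weighting scheme is meant to achieve.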
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Younessian, E., Rajan, D. (2012). Multi-modal Solution for Unconstrained News Story Retrieval. In: Schoeffmann, K., Merialdo, B., Hauptmann, A.G., Ngo, CW., Andreopoulos, Y., Breiteneder, C. (eds) Advances in Multimedia Modeling. MMM 2012. Lecture Notes in Computer Science, vol 7131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27355-1_19
DOI: https://doi.org/10.1007/978-3-642-27355-1_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27354-4
Online ISBN: 978-3-642-27355-1
eBook Packages: Computer Science, Computer Science (R0)