
Multi-modal fusion for associated news story retrieval

Multimedia Tools and Applications

Abstract

In this paper, we investigate multi-modal approaches to retrieving associated news stories, i.e. stories that share the same main topic. In the visual domain, we employ a near-duplicate keyframe/scene detection method based on local signatures to identify stories with mutual visual cues. To further improve the effectiveness of the visual representation, we develop a semantic signature that encodes the pre-defined semantic visual concepts present in a news story, and we propose a visual concept weighting scheme that combines the local and semantic signature similarities into an enhanced visual content similarity. In the textual domain, we use Automatic Speech Recognition (ASR) transcripts and refined Optical Character Recognition (OCR) transcripts, and compute an enhanced textual similarity with the proposed semantic similarity measure. To fuse the textual and visual modalities, we investigate several early and late fusion approaches. In the proposed early fusion approach, we employ two methods to retrieve visual semantics using textual information. Then, in a late fusion approach, we integrate the uni-modal similarity scores with the early fusion similarity score to boost the final retrieval performance. Experimental results show the usefulness of the enhanced visual content similarity and of the early fusion approach, as well as the superiority of our late fusion approach.
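The abstract does not state how the uni-modal and early fusion similarity scores are combined in the late fusion step. As a rough illustration of score-level late fusion in general (not the authors' formula), the sketch below min-max normalizes each modality's similarity scores over the candidate stories and merges them with a weighted sum, a CombSUM-style linear fusion. The modality names (`visual`, `textual`, `early`), the equal hand-set weights, the normalization choice and the helper names `min_max_normalize` and `late_fusion` are all hypothetical assumptions.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one modality's scores over the candidate stories to [0, 1].

    If every candidate has the same score, the modality carries no ranking
    information, so all normalized scores are set to 0.0.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {story: 0.0 for story in scores}
    return {story: (s - lo) / (hi - lo) for story, s in scores.items()}


def late_fusion(
    modality_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Weighted-sum (CombSUM-style) late fusion of per-modality similarities.

    modality_scores maps a modality name (hypothetical names such as
    'visual', 'textual', 'early') to {story_id: similarity to the query story}.
    weights maps modality names to non-negative, hand-set weights (an
    assumption; they could instead be tuned on held-out queries).
    A story missing from a modality simply receives no contribution from it.
    Returns (story_id, fused_score) pairs sorted by descending fused score.
    """
    total_w = sum(weights.get(name, 0.0) for name in modality_scores)
    if total_w <= 0.0:
        raise ValueError("at least one modality needs a positive weight")
    fused: Dict[str, float] = {}
    for name, scores in modality_scores.items():
        w = weights.get(name, 0.0) / total_w  # weights renormalized to sum to 1
        for story, s in min_max_normalize(scores).items():
            fused[story] = fused.get(story, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy (invented) similarity scores of three candidate stories to one query story.
    scores = {
        "visual": {"s1": 0.2, "s2": 0.8, "s3": 0.5},
        "textual": {"s1": 0.9, "s2": 0.4, "s3": 0.1},
        "early": {"s1": 0.3, "s2": 0.6, "s3": 0.6},
    }
    for story, score in late_fusion(scores, {"visual": 1.0, "textual": 1.0, "early": 1.0}):
        print(f"{story}: {score:.3f}")
```

With equal weights, the toy example ranks `s2` first: it is strong in two of the three modalities, while `s1` wins only on text. Normalizing per modality before summing keeps a modality with a wider score range from dominating the fused ranking.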

Figs. 1–10 (images not reproduced here)

Author information

Corresponding author

Correspondence to Ehsan Younessian.

About this article

Cite this article

Younessian, E., Rajan, D. Multi-modal fusion for associated news story retrieval. Multimed Tools Appl 74, 2563–2585 (2015). https://doi.org/10.1007/s11042-013-1404-1

  • DOI: https://doi.org/10.1007/s11042-013-1404-1