Combining Topic Model and Relevance Filtering to Localize Relevant Frames in Web Videos

  • Conference paper
Advances in Multimedia Modeling

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7733)

Abstract

Numerous web videos with rich metadata are available on the Internet today. Metadata such as video tags facilitate video search and multimedia content understanding, but they also pose a challenge: tags are usually annotated at the video level, while many of them describe only parts of the video content. Localizing the parts or frames of a web video that are relevant to a given tag is therefore key to many applications and research tasks. In this paper we propose combining a topic model with relevance filtering to localize relevant frames. Our method has three steps. First, we apply relevance filtering to assign relevance scores to video frames and obtain a raw relevant frame set by selecting the top-ranked frames. Second, we separate the frames into topics by mining their underlying semantics with Latent Dirichlet Allocation, and use the raw relevant frame set as a validation set to select relevant topics. Finally, we use the topical relevance to refine the raw relevant frame set and obtain the final results. Experimental results on real web videos validate the effectiveness of the proposed approach.
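The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how topics are scored or how the raw set is refined, so the scoring rule in `topic_relevance` and the re-ranking rule in `refine` are assumptions made here for illustration. The per-frame relevance scores and the per-frame topic distributions (which an LDA model would produce) are assumed to be given as inputs, and all function names and toy numbers are hypothetical.

```python
from typing import List, Sequence, Set


def top_ranked(scores: Sequence[float], k: int) -> Set[int]:
    """Step 1: indices of the k highest-scoring frames (the raw relevant set)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def topic_relevance(frame_topics: Sequence[Sequence[float]],
                    raw_set: Set[int]) -> List[float]:
    """Step 2 (assumed rule): score each topic by the share of its total
    probability mass that falls on frames in the raw relevant set."""
    n_topics = len(frame_topics[0])
    relevance = []
    for t in range(n_topics):
        total = sum(row[t] for row in frame_topics)
        inside = sum(frame_topics[i][t] for i in raw_set)
        relevance.append(inside / total if total > 0 else 0.0)
    return relevance


def refine(scores: Sequence[float],
           frame_topics: Sequence[Sequence[float]],
           topic_rel: Sequence[float],
           k: int) -> List[int]:
    """Step 3 (assumed rule): weight each frame's score by its expected
    topical relevance, then keep the top k frames."""
    refined = [
        s * sum(p * r for p, r in zip(row, topic_rel))
        for s, row in zip(scores, frame_topics)
    ]
    return sorted(top_ranked(refined, k))


# Toy data: six frames, two topics. Frame 2 has a fairly high relevance
# score but belongs mostly to topic 1; frame 3 scores slightly lower but
# shares topic 0 with the two top-ranked frames.
scores = [0.9, 0.8, 0.6, 0.5, 0.2, 0.1]
frame_topics = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.9, 0.1],
    [0.2, 0.8],
    [0.1, 0.9],
]

raw_set = top_ranked(scores, k=3)
topic_rel = topic_relevance(frame_topics, raw_set)
final_set = refine(scores, frame_topics, topic_rel, k=3)
print("raw set:", sorted(raw_set))
print("topic relevance:", topic_rel)
print("refined set:", final_set)
```

On this toy input the raw set contains frames 0, 1 and 2, while the refined set swaps frame 2 for frame 3, because topic 0 receives the higher topical relevance. In practice the topic distributions would come from an LDA model fitted on per-frame visual-word histograms.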

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yi, L., Li, H., Neo, SY. (2013). Combining Topic Model and Relevance Filtering to Localize Relevant Frames in Web Videos. In: Li, S., et al. Advances in Multimedia Modeling. Lecture Notes in Computer Science, vol 7733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35728-2_20

  • DOI: https://doi.org/10.1007/978-3-642-35728-2_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35727-5

  • Online ISBN: 978-3-642-35728-2

  • eBook Packages: Computer Science, Computer Science (R0)
