Query-Focused Extractive Video Summarization

  • Aidean Sharghi
  • Boqing Gong
  • Mubarak Shah
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9912)

Abstract

Video data is growing explosively. As a result of this “big video data”, intelligent algorithms for automatic video summarization have (re-)emerged as a pressing need. We develop a probabilistic model, the Sequential and Hierarchical Determinantal Point Process (SH-DPP), for query-focused extractive video summarization. Given a user query and a long video sequence, our algorithm returns a summary by selecting key shots from the video. The decision to include a shot in the summary depends jointly on the shot’s relevance to the user query and its importance in the context of the video. We verify our approach on two densely annotated video datasets. Query-focused video summarization is particularly useful for search engines, e.g., to display snippets of videos.
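
To make the selection idea concrete, the following is a minimal sketch of a generic determinantal point process (DPP) style shot selector. It is not the SH-DPP model itself, which is sequential and hierarchical. It assumes, purely for illustration, that each shot has a feature vector, a query-relevance score, and an importance score, that the two scores combine by multiplication into a single quality term, and that shots are picked greedily. The names `greedy_summary`, `cosine`, and `det` are hypothetical helpers, not part of any published code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def greedy_summary(features, relevance, importance, k):
    """Greedily pick up to k shots that maximize det(L_S).

    Assumption: quality q_i = relevance_i * importance_i, and the kernel is
    L_ij = q_i * cosine(f_i, f_j) * q_j, so the determinant rewards shots that
    are both high quality and mutually dissimilar.
    """
    n = len(features)
    q = [relevance[i] * importance[i] for i in range(n)]
    L = [[q[i] * cosine(features[i], features[j]) * q[j] for j in range(n)]
         for i in range(n)]
    selected = []
    for _ in range(k):
        best, best_det = None, 0.0
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            d = det([[L[a][b] for b in idx] for a in idx])
            if d > best_det:
                best, best_det = i, d
        if best is None:  # no remaining shot yields a positive determinant
            break
        selected.append(best)
    return sorted(selected)  # return shots in temporal order


# Toy example: shots 0 and 1 look alike; shot 2 is visually different.
shots = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(greedy_summary(shots, [0.9, 0.85, 0.6], [1.0, 1.0, 1.0], k=2))
```

In the toy example, the second pick favors the visually distinct shot over the near-duplicate even though the near-duplicate has a higher relevance score, which is the diversity behavior a determinant-based objective is meant to provide.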

Keywords

User Query, Submodular Function, Video Shot, Video Summarization, Video Summary
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

A.S. & B.G. are supported in part by NSF IIS #1566511. M.S. is partially supported by NIJ W911NF-14-1-0294.

Supplementary material

Supplementary material 1: 419983_1_En_1_MOESM1_ESM.pdf (PDF, 3265 KB)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Center for Research in Computer Vision, University of Central Florida, Orlando, USA