Multimedia Tools and Applications, Volume 76, Issue 5, pp 6683–6707

Modeling the timing of cuts in automatic editing of concert videos

  • Mikko J. Roininen
  • Jussi Leppänen
  • Antti J. Eronen
  • Igor D. D. Curcio
  • Moncef Gabbouj


An increasing amount of video content is being recorded by people at public events. However, editing such videos can be challenging for the average user. We describe an approach for modeling the shot cut timing of professionally edited concert videos. We analyze the temporal positions of cuts in relation to the music meter grid and form Markov chain models from the observed switching patterns and their occurrence frequencies. The stochastic Markov chain models are combined with audio change point analysis and cut deviation models to automatically generate temporal editing cues for unedited concert video recordings. Videos edited according to the models are compared in a user study against a baseline automatic editing method as well as against videos edited by hand. The study results show that users prefer the cut timing from the proposed system over the baseline by a clear margin, whereas a much smaller difference is observed in the preference for hand-edited videos over the proposed method.
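As a rough illustration of the Markov chain idea described above, the following minimal sketch estimates first-order transition probabilities from example sequences and then samples cut times on a beat grid. It is not the authors' implementation: the choice of state (shot length in beats), the function names `fit_markov_chain` and `sample_cut_times`, and the training data are all assumptions made for this example, and the audio change point analysis and cut deviation models are left out.

```python
import random
from collections import Counter, defaultdict


def fit_markov_chain(sequences):
    """Estimate first-order transition probabilities from state sequences.

    Assumption: each state is a shot length measured in whole beats.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    # Normalize occurrence counts into probabilities per source state.
    return {
        prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for prev, c in counts.items()
    }


def sample_cut_times(model, start_state, beat_times, rng):
    """Walk the chain and return cut times that fall on the beat grid."""
    cuts = []
    state = start_state
    beat_index = state  # the first cut comes after `start_state` beats
    while beat_index < len(beat_times):
        cuts.append(beat_times[beat_index])
        transitions = model.get(state)
        if not transitions:
            break  # no outgoing transitions were observed for this state
        states = list(transitions)
        weights = [transitions[s] for s in states]
        state = rng.choices(states, weights=weights, k=1)[0]
        beat_index += state
    return cuts


# Hypothetical training data: shot lengths (in beats) from two edited videos.
training = [[4, 4, 8, 4, 2, 2, 4], [8, 4, 4, 8, 4]]
model = fit_markov_chain(training)

# Hypothetical beat grid: 64 beats at 120 BPM (one beat every 0.5 s).
beat_times = [0.5 * i for i in range(64)]
cuts = sample_cut_times(model, 4, beat_times, random.Random(0))
```

Using shot lengths as states keeps every sampled cut on a beat by construction. A fuller model in the spirit of the abstract would also allow cuts to deviate from the grid and would take audio change points into account.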


  • Automatic video editing
  • Cut timing
  • Example-based modeling
  • Live music content analysis



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Mikko J. Roininen (1)
  • Jussi Leppänen (2)
  • Antti J. Eronen (2)
  • Igor D. D. Curcio (2)
  • Moncef Gabbouj (1)
  1. Department of Signal Processing, Tampere University of Technology, Tampere, Finland
  2. Nokia Technologies, Tampere, Finland
