Automatic System for Editing Dance Videos Recorded Using Multiple Cameras

  • Shuhei Tsuchida
  • Satoru Fukayama
  • Masataka Goto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10714)


As social media has matured, the volume of uploaded video content has increased. Multiple videos of a physical performance, such as a dance, are difficult to integrate into a single high-quality video without knowledge of video-editing principles. In this study, we present a system that automatically edits dance-performance videos taken from multiple viewpoints into a more attractive and sophisticated dance video. Our system crops the frame of each camera appropriately by using the performer’s behavior and skeleton information. The system determines the camera switches and cut lengths following a probabilistic model based on general cinematography guidelines and on knowledge extracted from expert experience. Our system automatically edited a dance video of four performers taken from multiple viewpoints, and ten video-production experts evaluated the generated video. In a comparison with another automatic editing system, our system tended to perform better.
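To make the idea of choosing camera switches and cut lengths probabilistically more concrete, the following is a minimal Python sketch. It is not the model described in the paper: the function name `plan_cuts`, the candidate cut lengths, their probabilities, and the uniform choice of the next camera are all assumptions made purely for illustration.

```python
import random


def plan_cuts(duration, n_cameras, cut_lengths, cut_probs, seed=0):
    """Return a list of (start, end, camera) shots covering [0, duration].

    Hypothetical illustration only: cut lengths are drawn from a fixed
    categorical distribution, and each cut switches to a different camera
    chosen uniformly at random.
    """
    if n_cameras < 2:
        raise ValueError("need at least two cameras to switch between")
    rng = random.Random(seed)  # seeded so the plan is reproducible
    shots = []
    t = 0.0
    camera = rng.randrange(n_cameras)
    while t < duration:
        # Sample how long the current shot lasts (assumed distribution).
        length = rng.choices(cut_lengths, weights=cut_probs, k=1)[0]
        end = min(t + length, duration)  # clip the last shot to the video end
        shots.append((t, end, camera))
        # Switch to any camera other than the current one.
        others = [c for c in range(n_cameras) if c != camera]
        camera = rng.choice(others)
        t = end
    return shots


# Example: a 30-second video recorded by four cameras.
shots = plan_cuts(
    duration=30.0,
    n_cameras=4,
    cut_lengths=[2.0, 4.0, 8.0],   # assumed candidate cut lengths (seconds)
    cut_probs=[0.2, 0.5, 0.3],     # assumed probabilities, not from the paper
)
for start, end, cam in shots:
    print(f"{start:5.1f}s to {end:5.1f}s  camera {cam}")
```

In a fuller system, the fixed probabilities here would presumably be replaced by values derived from cinematography guidelines and expert knowledge, as the abstract describes, and the camera choice could be weighted by how well each cropped view frames the performers.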


Video editing, Dance, Computational cinematography, Automation



This work was supported in part by JST ACCEL Grant Number JPMJAC1602, Japan.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
