Automatic Video Editing

  • Chapter
Smart Algorithms for Multimedia and Imaging

Part of the book series: Signals and Communication Technology ((SCT))

Abstract

Video editing is an artistic process that involves at least two steps: selecting the most valuable footage in terms of visual quality and the importance of the filmed action, and cutting that footage into a brief, coherent visual story that is interesting to watch. In this chapter, automatic video editing is implemented in a purely data-driven manner. We describe a system that learns editing style from samples extracted from content created by professional editors, including motion picture masterpieces, and applies this data-driven style to cut non-professional videos, with the ability to mimic the individual style of selected reference samples. Visual semantic and aesthetic features are extracted by an ImageNet-trained convolutional neural network, and the editing controller can be trained by either an imitation learning or a reinforcement learning algorithm. As a result, at test time the controller shows signs of observing basic cinematographic editing rules learned from the corpus of motion picture masterpieces. The loss function developed for the learning approaches can also be applied efficiently in a global optimisation setting of the automatic video editing problem using dynamic programming.
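To illustrate what a global optimisation of an edit by dynamic programming can look like, the following minimal sketch picks a fixed number of clips, in their original order, so that the sum of per-clip and per-transition costs is minimal. The chapter's actual loss function is not reproduced in this preview, so the names `select_clips`, `unary`, and `pairwise` and the cost values used below are hypothetical placeholders, not the author's formulation.

```python
from typing import Callable, List, Sequence, Tuple


def select_clips(
    unary: Sequence[float],
    pairwise: Callable[[int, int], float],
    k: int,
) -> Tuple[float, List[int]]:
    """Pick k clips in original order, minimising unary + transition costs.

    unary[i]       -- hypothetical cost of including clip i (lower is better)
    pairwise(p, i) -- hypothetical cost of cutting from clip p to clip i
    """
    n = len(unary)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of clips")
    inf = float("inf")
    # best[j][i]: minimal cost of an edit of j + 1 clips that ends with clip i
    best = [[inf] * n for _ in range(k)]
    # back[j][i]: index of the clip preceding i in that optimal edit
    back = [[-1] * n for _ in range(k)]
    for i in range(n):
        best[0][i] = unary[i]
    for j in range(1, k):
        for i in range(j, n):
            for p in range(j - 1, i):
                cost = best[j - 1][p] + pairwise(p, i) + unary[i]
                if cost < best[j][i]:
                    best[j][i] = cost
                    back[j][i] = p
    # Choose the cheapest final clip, then follow back-pointers to the start.
    end = min(range(k - 1, n), key=lambda i: best[k - 1][i])
    total = best[k - 1][end]
    path = [end]
    for j in range(k - 1, 0, -1):
        path.append(back[j][path[-1]])
    path.reverse()
    return total, path


if __name__ == "__main__":
    # Illustrative costs only: clip 1 is poor, and cutting between
    # neighbouring clips is penalised.
    clip_cost = [1.0, 5.0, 1.0, 2.0]
    cut_cost = lambda p, i: 2.0 if i - p == 1 else 0.0
    print(select_clips(clip_cost, cut_cost, k=2))
```

The table has k × n entries and each entry scans at most n predecessors, so the search costs O(k·n²) time instead of enumerating every possible subset of clips.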

Corresponding author

Correspondence to Sergey Y. Podlesnyy.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Podlesnyy, S.Y. (2021). Automatic Video Editing. In: Rychagov, M.N., Tolstaya, E.V., Sirotenko, M.Y. (eds) Smart Algorithms for Multimedia and Imaging. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-66741-2_6

  • DOI: https://doi.org/10.1007/978-3-030-66741-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66740-5

  • Online ISBN: 978-3-030-66741-2

  • eBook Packages: Engineering, Engineering (R0)
