Automatic Video Editing

Podlesnyy, Sergey Y.

doi:10.1007/978-3-030-66741-2_6

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Automatic video editing is an artistic process involving at least two steps: selecting the most valuable footage in terms of visual quality and the importance of the action filmed, and cutting that footage into a brief, coherent visual story that is interesting to watch. Here, this process is implemented in a purely data-driven manner. We describe a system that learns an editing style from samples extracted from content created by professional editors, including motion picture masterpieces, and applies this learned style to cut non-professional videos, with the ability to mimic the individual style of selected reference samples. Visual semantic and aesthetic features are extracted by an ImageNet-trained convolutional neural network, and the editing controller can be trained by either an imitation learning algorithm or a reinforcement learning algorithm. At test time, the controller shows signs of observing basic cinematographic editing rules learned from the corpus of motion picture masterpieces. The loss function developed for the learning approaches can also be applied efficiently in a global optimisation setting of the automatic video editing problem using dynamic programming.
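To make the dynamic-programming remark concrete, the following is a minimal sketch of one way a global shot selection could be posed. It is not the chapter's method: the abstract does not specify the loss function, so `shot_cost` (a per-shot cost) and `transition_cost` (a cost for cutting from one shot to a later one) are hypothetical placeholders, as is the function name `select_shots`. The sketch assumes the task is to keep exactly `k` shots in temporal order while minimising the summed cost.

```python
from typing import Callable, List, Sequence, Tuple


def select_shots(
    shot_cost: Sequence[float],
    transition_cost: Callable[[int, int], float],
    k: int,
) -> Tuple[float, List[int]]:
    """Pick k shots in temporal order minimising per-shot plus cut costs.

    shot_cost and transition_cost are hypothetical stand-ins for a
    learned loss; they are not taken from the chapter.
    """
    n = len(shot_cost)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of shots")

    inf = float("inf")
    # best[j][i]: minimal cost of a sequence of j + 1 shots ending at shot i.
    best = [[inf] * n for _ in range(k)]
    # back[j][i]: index of the previous shot on that minimal sequence.
    back = [[-1] * n for _ in range(k)]

    for i in range(n):
        best[0][i] = shot_cost[i]

    for j in range(1, k):
        for i in range(j, n):
            for p in range(j - 1, i):
                cost = best[j - 1][p] + transition_cost(p, i) + shot_cost[i]
                if cost < best[j][i]:
                    best[j][i] = cost
                    back[j][i] = p

    # Choose the cheapest end point, then follow back-pointers.
    end = min(range(k - 1, n), key=lambda i: best[k - 1][i])
    total = best[k - 1][end]
    path = [end]
    for j in range(k - 1, 0, -1):
        path.append(back[j][path[-1]])
    path.reverse()
    return total, path


if __name__ == "__main__":
    costs = [1.0, 5.0, 0.5, 2.0]
    total, chosen = select_shots(costs, lambda p, i: 0.1 * (i - p), k=2)
    print(chosen, total)
```

The table has `k * n` entries and each is filled by scanning earlier shots, so the sketch runs in O(k n²) time. Any learned loss that decomposes into per-shot and pairwise terms could be plugged in through the two cost arguments.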

Author information

Authors and Affiliations

Cinema and Photo Research Institute (NIKFI) of Gorky Film Studios, Moscow, Russia
Sergey Y. Podlesnyy

Corresponding author

Correspondence to Sergey Y. Podlesnyy.

Editor information

Editors and Affiliations

National Research University of Electronic Technology (MIET), Moscow, Russia
Michael N. Rychagov
Aramco Innovations LLC, Moscow, Russia
Ekaterina V. Tolstaya
Google Research, New York, NY, USA
Mikhail Y. Sirotenko

About this chapter

Cite this chapter

Podlesnyy, S.Y. (2021). Automatic Video Editing. In: Rychagov, M.N., Tolstaya, E.V., Sirotenko, M.Y. (eds) Smart Algorithms for Multimedia and Imaging. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-66741-2_6

DOI: https://doi.org/10.1007/978-3-030-66741-2_6
Published: 06 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66740-5
Online ISBN: 978-3-030-66741-2
eBook Packages: Engineering, Engineering (R0)