Abstract
The proliferation of video cameras, such as those embedded in smartphones and wearable devices, has made it increasingly easy for users to film interesting events (such as public performances, family events, and vacation highlights) in their daily lives. Moreover, multiple cameras often capture the same event at the same time, from different views. Concatenating segments of the videos produced by these cameras along the event timeline forms a video mashup, which can depict the event in a less monotonous and more informative manner. It is, however, inefficient and costly to create a video mashup manually. This chapter introduces the problem of automated video mashup, surveys the state-of-the-art research in this area, and outlines the set of open challenges that remain to be solved. It provides a comprehensive introduction for practitioners, researchers, and graduate students who are interested in the research and challenges of automated video mashup.
Notes
- 1. Note that audio and video samples captured at the same time were generated at different times at the source, owing to the difference between the speed of light and the speed of sound. Humans, however, have learned to compensate for this difference in normal settings.
- 2. See [42] for a discussion on narration, story, and events.
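The light-versus-sound offset mentioned in the first note is easy to quantify. The sketch below (an illustration, not part of the chapter) uses the approximate speed of sound in air at room temperature; light's travel time over stage-scale distances is negligible by comparison.

```python
# Audio lag behind video for a camera at a given distance from the sound source.
# Light arrives essentially instantly; sound travels at roughly 343 m/s in air
# at 20 degrees Celsius, so the audio trails the video by distance / 343 seconds.

SPEED_OF_SOUND_M_PER_S = 343.0  # approximate, in air at 20 degrees Celsius

def audio_lag_seconds(distance_m: float) -> float:
    """Delay of the audio behind the video for a camera distance_m from the source."""
    return distance_m / SPEED_OF_SOUND_M_PER_S

# A camera about 34 m from the stage records audio roughly 0.1 s behind the video:
print(round(audio_lag_seconds(34.3), 3))  # 0.1
```

This is also why cameras at different distances from the stage record the same sound at slightly different times, a fact exploited by the audio-based synchronization methods surveyed in the chapter.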
References
Nakano, T., Murofushi, S., Goto, M., Morishima, S.: DanceReProducer: an automatic mashup music video generation system by reusing dance video clips on the web. In: Sound and Music Computing Conference (SMC), pp. 183–189 (2011)
Fu, Y., Guo, Y., Zhu, Y., Liu, F., Song, C., Zhou, Z.H.: Multi-view video summarization. IEEE Trans. Multimedia (TMM) 12(7), 717–729 (2010)
Pritch, Y., Ratovitch, S., Hende, A., Peleg, S.: Clustered synopsis of surveillance video. In: IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 195–200. IEEE (2009)
Wang, X., Hirayama, T., Mase, K.: Viewpoint sequence recommendation based on contextual information for multiview video. IEEE Multimedia 22(4), 40–50 (2015)
Saini, M.K., Gadde, R., Yan, S., Ooi, W.T.: Movimash: online mobile video mashup. In: ACM International Conference on Multimedia (MM), pp. 139–148. ACM (2012)
Nguyen, D.T.D., Saini, M., Nguyen, V.T., Ooi, W.T.: Jiku director: a mobile video mashup system. In: Proceedings of the 21st ACM International Conference on Multimedia, pp. 477–478. ACM, Barcelona, Spain (2013)
Shrestha, P., Weda, H., Barbieri, M., Aarts, E.H., et al.: Automatic mashup generation from multiple-camera concert recordings. In: Proceedings of the 18th ACM International Conference on Multimedia, pp. 541–550. ACM, Firenze, Italy (2010)
Arev, I., Park, H.S., Sheikh, Y., Hodgins, J., Shamir, A.: Automatic editing of footage from multiple social cameras. ACM Trans. Graph. (TOG) 33(4), 81:1–81:11 (2014)
Su, K., Naaman, M., Gurjar, A., Patel, M., Ellis, D.P.: Making a scene: alignment of complete sets of clips based on pairwise audio match. In: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, p. 26. ACM, Hong Kong (2012)
Sinha, S.N., Pollefeys, M.: Synchronization and calibration of camera networks from silhouettes. In: International Conference on Pattern Recognition (ICPR), pp. 116–119. IEEE (2004)
Meyer, B., Stich, T., Magnor, M.A., Pollefeys, M.: Subframe temporal alignment of non-stationary cameras. In: British Machine Vision Conference (BMVC), pp. 1–10 (2008)
Caspi, Y., Simakov, D., Irani, M.: Feature-based sequence-to-sequence matching. Int. J. Comput. Vis. (IJCV) 68(1), 53–64 (2006)
Elhayek, A., Stoll, C., Kim, K., Seidel, H., Theobalt, C.: Feature-based multi-video synchronization with subframe accuracy. Pattern Recogn. 266–275 (2012)
Hasler, N., Rosenhahn, B., Thormahlen, T., Wand, M., Gall, J., Seidel, H.P.: Markerless motion capture with unsynchronized moving cameras. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 224–231. IEEE (2009)
Kammerl, J., Birkbeck, N., Inguva, S., Kelly, D., Crawford, A.J., Denman, H., Kokaram, A., Pantofaru, C.: Temporal synchronization of multiple audio signals. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4603–4607. IEEE, Firenze, Italy (2014)
Shrestha, P., Barbieri, M., Weda, H.: Synchronization of multi-camera video recordings based on audio. In: Proceedings of the 15th ACM International Conference on Multimedia, pp. 545–548. ACM, Augsburg, Germany (2007)
Haitsma, J., Kalker, T.: A highly robust audio fingerprinting system with an efficient search strategy. J. New Music Res. 32(2), 211–221 (2003)
Cremer, M., Cook, R.: Machine-assisted editing of user-generated content. In: IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, pp. 725404-1–725404-10 (2009)
Laiola Guimaraes, R., Cesar, P., Bulterman, D.C., Zsombori, V., Kegel, I.: Creating personalized memories from social events: community-based support for multi-camera recordings of school concerts. In: Proceedings of the 19th ACM International Conference on Multimedia, pp. 303–312. ACM, Scottsdale, AZ, USA (2011)
Korchagin, D., Garner, P.N., Dines, J.: Automatic temporal alignment of av data with confidence estimation. In: IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp. 269–272. IEEE (2010)
Bano, S., Cavallaro, A.: Discovery and organization of multi-camera user-generated videos of the same event. Inf. Sci. 302, 108–121 (2015)
Bano, S., Cavallaro, A.: ViComp: composition of user-generated videos. Multimedia Tools Appl. (MTAP) 75(12), 1–24 (2015)
Wu, Y., Mei, T., Xu, Y.Q., Yu, N., Li, S.: MoVieUp: automatic mobile video mashup. IEEE Trans. Circ. Syst. Video Technol. (CSVT) 25(12), 1941–1954 (2015)
Wilk, S., Kopf, S., Effelsberg, W.: Video composition by the crowd: a system to compose user-generated videos in near real-time. In: Proceedings of the 6th ACM Multimedia Systems Conference, pp. 13–24. ACM, Portland, USA (2015)
Mei, T., Hua, X.S., Zhu, C.Z., Zhou, H.Q., Li, S.: Home video visual quality assessment with spatiotemporal factors. IEEE Trans. Circ. Syst. Video Technol. (CSVT) 17(6), 699–706 (2007)
Wilk, S., Effelsberg, W.: The influence of camera shakes, harmful occlusions and camera misalignment on the perceived quality in user generated video. In: IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE, Chengdu, China (2014)
Daniyal, F., Cavallaro, A.: Multi-camera scheduling for video production. In: Conference for Visual Media Production (CVMP), pp. 11–20. IEEE (2011)
Daniyal, F., Taj, M., Cavallaro, A.: Content and task-based view selection from multiple video streams. Multimedia Tools Appl. (MTAP) 46(2–3), 235–258 (2010)
Goshorn, R., Goshorn, J., Goshorn, D., Aghajan, H.: Architecture for cluster-based automated surveillance network for detecting and tracking multiple persons. In: ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), pp. 219–226. IEEE (2007)
Jiang, H., Fels, S., Little, J.J.: Optimizing multiple object tracking and best view video synthesis. IEEE Trans. Multimedia (TMM) 10(6), 997–1012 (2008)
Vihavainen, S., Mate, S., Seppälä, L., Cricri, F., Curcio, I.D.: We want more: human-computer collaboration in mobile social video remixing of music concerts. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 287–296. ACM (2011)
Lerch, A.: An introduction to audio content analysis: applications in signal processing and music informatics. Wiley (2012)
Dmytyk, E.: On film editing: an introduction to the art of film construction (1984)
Canini, L., Benini, S., Leonardi, R.: Classifying cinematographic shot types. Multimedia Tools Appl. (MTAP) 62(1), 51–73 (2013)
Carlier, A., Calvet, L., Nguyen, D.T.D., Ooi, W.T., Gurdjos, P., Charvillat, V.: 3d interest maps from simultaneous video recordings. In: ACM International Conference on Multimedia, pp. 577–586. ACM (2014)
Zsombori, V., Frantzis, M., Guimaraes, R.L., Ursu, M.F., Cesar, P., Kegel, I., Craigie, R., Bulterman, D.C.: Automatic generation of video narratives from shared ugc. In: Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia, pp. 325–334. ACM, Eindhoven, Netherlands (2011)
Nguyen, D.T.D., Carlier, A., Ooi, W.T., Charvillat, V.: Jiku director 2.0: a mobile video mashup system with zoom and pan using motion maps. In: Proceedings of the ACM International Conference on Multimedia, pp. 765–766. ACM, Orlando, FL, USA (2014)
Beerends, J.G., De Caluwe, F.E.: The influence of video quality on perceived audio quality and vice versa. J. Audio Eng. Soc. (AES) 47(5), 355–362 (1999)
Saini, M., Venkatagiri, S.P., Ooi, W.T., Chan, M.C.: The jiku mobile video dataset. In: ACM Multimedia Systems Conference (MMSys), pp. 108–113. ACM (2013)
Ballan, L., Brostow, G.J., Puwein, J., Pollefeys, M.: Unstructured video-based rendering: interactive exploration of casually captured videos. ACM Trans. Graph. (TOG) 29(4), 87 (2010)
Park, H.S., Jain, E., Sheikh, Y.: 3D social saliency from head-mounted cameras. In: Advances in Neural Information Processing Systems (NIPS), pp. 431–439 (2012)
Nack, F.: Event and story: an intricate relationship. In: Proceedings of the 2011 Joint ACM Workshop on Modeling and Representing Events, J-MRE ’11, pp. 49–50. ACM, NY, USA (2011). https://doi.org/10.1145/2072508.2072520
Frantzis, M., Zsombori, V., Ursu, M., Guimaraes, R.L., Kegel, I., Craigie, R.: Interactive video stories from user generated content: a school concert use case. In: International Conference on Interactive Digital Storytelling, pp. 183–195. Springer, Berlin (2012)
Definitions
- Video Mashup: A video produced by concatenating video segments cut from input video clips recorded at the same event from different views.
- Time-Synchronized Video Mashup: A video mashup that follows the same timeline as the event itself.
- Asynchronous Video Mashup: A video mashup that does not follow the same timeline as the event itself and can be shorter or longer than the actual event. An example of an asynchronous video mashup is a summary video.
- Cut Point: A time point in a video mashup at which we switch from one input video clip to another.
- Shot Length: The video playback time between two cut points; in other words, the length of a video segment from the same input clip included in the video mashup.
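The relationship between cut points and shot lengths can be sketched in a few lines. This is a minimal illustration (not code from the chapter), assuming cut points are given as timestamps in seconds within the mashup.

```python
# Given the cut points of a mashup, each shot length is the playback time
# between two consecutive boundaries (start of video, cut points, end of video).

def shot_lengths(cut_points_s, total_duration_s):
    """Return the shot lengths (seconds) implied by the cut points of a mashup."""
    boundaries = [0.0] + sorted(cut_points_s) + [total_duration_s]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]

# A 60 s mashup with cuts at 12 s, 30 s, and 48 s consists of four shots:
print(shot_lengths([12.0, 30.0, 48.0], 60.0))  # [12.0, 18.0, 18.0, 12.0]
```

Note that n cut points partition the mashup into n + 1 shots, which is why shot-length statistics and cut-point selection are two sides of the same editing decision.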
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this chapter
Saini, M.K., Ooi, W.T. (2018). Automated Video Mashups: Research and Challenges. In: Montagud, M., Cesar, P., Boronat, F., Jansen, J. (eds) MediaSync. Springer, Cham. https://doi.org/10.1007/978-3-319-65840-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65839-1
Online ISBN: 978-3-319-65840-7