Automated Video Mashups: Research and Challenges

Chapter in: MediaSync (Springer, Cham)

Abstract

The proliferation of video cameras, such as those embedded in smartphones and wearable devices, has made it increasingly easy for users to film interesting events in their daily lives, such as public performances, family gatherings, and vacation highlights. Moreover, multiple cameras often capture the same event at the same time from different views. Concatenating segments of the videos produced by these cameras along the event timeline forms a video mashup, which can depict the event in a less monotonous and more informative manner. It is, however, inefficient and costly to create a video mashup manually. This chapter introduces the problem of automated video mashup, surveys the state-of-the-art research in this area, and outlines the set of open challenges that remain to be solved. It provides a comprehensive introduction for practitioners, researchers, and graduate students who are interested in the research and challenges of automated video mashup.

Notes

  1. Note that audio and video samples captured by a camera at the same instant were produced at the source at slightly different times, because sound travels much more slowly than light. Humans, however, have learned to compensate for this difference in everyday settings (a short worked example follows these notes).

  2. See [42] for a discussion on narration, story, and events.
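
As a back-of-the-envelope illustration of Note 1, the sketch below computes how far the audio lags behind the corresponding video for a camera at a given distance from the source. The 343 m/s speed of sound and the example distances are assumptions for illustration, not values from the chapter.

```python
# Sound covers roughly 343 m/s in air, while light arrives effectively
# instantaneously at event scales, so audio recorded at distance d
# lags the corresponding video frames by about d / 343 seconds.

SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value in air at ~20 degrees C

def audio_lag_ms(distance_m: float) -> float:
    """Milliseconds by which audio trails video for a camera at distance_m."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0

for d in (5, 20, 50):  # hypothetical camera-to-stage distances in metres
    print(f"{d:>2} m from the source -> audio lags by ~{audio_lag_ms(d):.0f} ms")
```

At short distances the offset is only a few tens of milliseconds; at 50 m it already exceeds 140 ms, which synchronization algorithms must account for even though viewers have learned to tolerate it.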

References

  1. Nakano, T., Murofushi, S., Goto, M., Morishima, S.: DanceReproducer: an automatic mashup music video generation system by reusing dance video clips on the web. In: Sound and Music Computing Conference (SMC), pp. 183–189 (2011)

  2. Fu, Y., Guo, Y., Zhu, Y., Liu, F., Song, C., Zhou, Z.H.: Multi-view video summarization. IEEE Trans. Multimedia 12(7), 717–729 (2010)

  3. Pritch, Y., Ratovitch, S., Hende, A., Peleg, S.: Clustered synopsis of surveillance video. In: IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 195–200. IEEE (2009)

  4. Wang, X., Hirayama, T., Mase, K.: Viewpoint sequence recommendation based on contextual information for multiview video. IEEE Multimedia 22(4), 40–50 (2015)

  5. Saini, M.K., Gadde, R., Yan, S., Ooi, W.T.: MoViMash: online mobile video mashup. In: ACM International Conference on Multimedia (MM), pp. 139–148. ACM (2012)

  6. Nguyen, D.T.D., Saini, M., Nguyen, V.T., Ooi, W.T.: Jiku Director: a mobile video mashup system. In: Proceedings of the 21st ACM International Conference on Multimedia, pp. 477–478. ACM, Barcelona, Spain (2013)

  7. Shrestha, P., Weda, H., Barbieri, M., Aarts, E.H., et al.: Automatic mashup generation from multiple-camera concert recordings. In: Proceedings of the 18th ACM International Conference on Multimedia, pp. 541–550. ACM, Firenze, Italy (2010)

  8. Arev, I., Park, H.S., Sheikh, Y., Hodgins, J., Shamir, A.: Automatic editing of footage from multiple social cameras. ACM Trans. Graph. (TOG) 33(4), 81:1–81:11 (2014)

  9. Su, K., Naaman, M., Gurjar, A., Patel, M., Ellis, D.P.: Making a scene: alignment of complete sets of clips based on pairwise audio match. In: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, p. 26. ACM, Hong Kong (2012)

  10. Sinha, S.N., Pollefeys, M.: Synchronization and calibration of camera networks from silhouettes. In: International Conference on Pattern Recognition (ICPR), pp. 116–119. IEEE (2004)

  11. Meyer, B., Stich, T., Magnor, M.A., Pollefeys, M.: Subframe temporal alignment of non-stationary cameras. In: British Machine Vision Conference (BMVC), pp. 1–10 (2008)

  12. Caspi, Y., Simakov, D., Irani, M.: Feature-based sequence-to-sequence matching. Int. J. Comput. Vis. (IJCV) 68(1), 53–64 (2006)

  13. Elhayek, A., Stoll, C., Kim, K., Seidel, H., Theobalt, C.: Feature-based multi-video synchronization with subframe accuracy. In: Pattern Recognition, pp. 266–275 (2012)

  14. Hasler, N., Rosenhahn, B., Thormahlen, T., Wand, M., Gall, J., Seidel, H.P.: Markerless motion capture with unsynchronized moving cameras. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 224–231. IEEE (2009)

  15. Kammerl, J., Birkbeck, N., Inguva, S., Kelly, D., Crawford, A.J., Denman, H., Kokaram, A., Pantofaru, C.: Temporal synchronization of multiple audio signals. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4603–4607. IEEE, Firenze, Italy (2014)

  16. Shrestha, P., Barbieri, M., Weda, H.: Synchronization of multi-camera video recordings based on audio. In: Proceedings of the 15th ACM International Conference on Multimedia, pp. 545–548. ACM, Augsburg, Germany (2007)

  17. Haitsma, J., Kalker, T.: A highly robust audio fingerprinting system with an efficient search strategy. J. New Music Res. 32(2), 211–221 (2003)

  18. Cremer, M., Cook, R.: Machine-assisted editing of user-generated content. In: IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, article 725404 (2009)

  19. Laiola Guimaraes, R., Cesar, P., Bulterman, D.C., Zsombori, V., Kegel, I.: Creating personalized memories from social events: community-based support for multi-camera recordings of school concerts. In: Proceedings of the 19th ACM International Conference on Multimedia, pp. 303–312. ACM, Scottsdale, AZ, USA (2011)

  20. Korchagin, D., Garner, P.N., Dines, J.: Automatic temporal alignment of AV data with confidence estimation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 269–272. IEEE (2010)

  21. Bano, S., Cavallaro, A.: Discovery and organization of multi-camera user-generated videos of the same event. Inf. Sci. 302, 108–121 (2015)

  22. Bano, S., Cavallaro, A.: ViComp: composition of user-generated videos. Multimedia Tools Appl. (MTAP) 75(12), 1–24 (2015)

  23. Wu, Y., Mei, T., Xu, Y.Q., Yu, N., Li, S.: MovieUp: automatic mobile video mashup. IEEE Trans. Circ. Syst. Video Technol. (CSVT) 25(12), 1941–1954 (2015)

  24. Wilk, S., Kopf, S., Effelsberg, W.: Video composition by the crowd: a system to compose user-generated videos in near real-time. In: Proceedings of the 6th ACM Multimedia Systems Conference, pp. 13–24. ACM, Portland, OR, USA (2015)

  25. Mei, T., Hua, X.S., Zhu, C.Z., Zhou, H.Q., Li, S.: Home video visual quality assessment with spatiotemporal factors. IEEE Trans. Circ. Syst. Video Technol. (CSVT) 17(6), 699–706 (2007)

  26. Wilk, S., Effelsberg, W.: The influence of camera shakes, harmful occlusions and camera misalignment on the perceived quality in user generated video. In: IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE, Chengdu, China (2014)

  27. Daniyal, F., Cavallaro, A.: Multi-camera scheduling for video production. In: Conference for Visual Media Production (CVMP), pp. 11–20. IEEE (2011)

  28. Daniyal, F., Taj, M., Cavallaro, A.: Content and task-based view selection from multiple video streams. Multimedia Tools Appl. (MTAP) 46(2–3), 235–258 (2010)

  29. Goshorn, R., Goshorn, J., Goshorn, D., Aghajan, H.: Architecture for cluster-based automated surveillance network for detecting and tracking multiple persons. In: ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), pp. 219–226. IEEE (2007)

  30. Jiang, H., Fels, S., Little, J.J.: Optimizing multiple object tracking and best view video synthesis. IEEE Trans. Multimedia 10(6), 997–1012 (2008)

  31. Vihavainen, S., Mate, S., Seppälä, L., Cricri, F., Curcio, I.D.: We want more: human-computer collaboration in mobile social video remixing of music concerts. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 287–296. ACM (2011)

  32. Lerch, A.: An Introduction to Audio Content Analysis: Applications in Signal Processing and Music Informatics. Wiley (2012)

  33. Dmytryk, E.: On Film Editing: An Introduction to the Art of Film Construction. Focal Press (1984)

  34. Canini, L., Benini, S., Leonardi, R.: Classifying cinematographic shot types. Multimedia Tools Appl. (MTAP) 62(1), 51–73 (2013)

  35. Carlier, A., Calvet, L., Nguyen, D.T.D., Ooi, W.T., Gurdjos, P., Charvillat, V.: 3D interest maps from simultaneous video recordings. In: ACM International Conference on Multimedia, pp. 577–586. ACM (2014)

  36. Zsombori, V., Frantzis, M., Guimaraes, R.L., Ursu, M.F., Cesar, P., Kegel, I., Craigie, R., Bulterman, D.C.: Automatic generation of video narratives from shared UGC. In: Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia, pp. 325–334. ACM, Eindhoven, Netherlands (2011)

  37. Nguyen, D.T.D., Carlier, A., Ooi, W.T., Charvillat, V.: Jiku Director 2.0: a mobile video mashup system with zoom and pan using motion maps. In: Proceedings of the ACM International Conference on Multimedia, pp. 765–766. ACM, Orlando, FL, USA (2014)

  38. Beerends, J.G., De Caluwe, F.E.: The influence of video quality on perceived audio quality and vice versa. J. Audio Eng. Soc. (AES) 47(5), 355–362 (1999)

  39. Saini, M., Venkatagiri, S.P., Ooi, W.T., Chan, M.C.: The Jiku mobile video dataset. In: ACM Multimedia Systems Conference (MMSys), pp. 108–113. ACM (2013)

  40. Ballan, L., Brostow, G.J., Puwein, J., Pollefeys, M.: Unstructured video-based rendering: interactive exploration of casually captured videos. ACM Trans. Graph. (TOG) 29(4), Article 87 (2010)

  41. Park, H.S., Jain, E., Sheikh, Y.: 3D social saliency from head-mounted cameras. In: Advances in Neural Information Processing Systems (NIPS), pp. 431–439 (2012)

  42. Nack, F.: Event and story: an intricate relationship. In: Proceedings of the 2011 Joint ACM Workshop on Modeling and Representing Events, J-MRE '11, pp. 49–50. ACM, New York, NY, USA (2011). https://doi.org/10.1145/2072508.2072520

  43. Frantzis, M., Zsombori, V., Ursu, M., Guimaraes, R.L., Kegel, I., Craigie, R.: Interactive video stories from user generated content: a school concert use case. In: International Conference on Interactive Digital Storytelling, pp. 183–195. Springer, Berlin (2012)

Author information

Correspondence to Wei Tsang Ooi.

Definitions

Video Mashup:

A video produced by concatenating video segments cut from input video clips recorded at the same event from different views.

Time-Synchronized Video Mashup:

A video mashup that follows the same timeline as the event itself.

Asynchronous Video Mashup:

A video mashup that does not follow the same timeline as the event itself, and can be shorter or longer than the actual event. An example of an asynchronous video mashup is a summary video.

Cut Point:

A time point in a video mashup at which the mashup switches from one input video clip to another.

Shot Length:

The video playback time between two consecutive cut points; in other words, the length of a contiguous segment from a single input clip included in the video mashup.
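
To make these definitions concrete, here is a minimal sketch of a time-synchronized mashup represented as an ordered list of segments; the clip names and timings are hypothetical illustration values, not data from the chapter. Cut points and shot lengths fall out of the representation directly.

```python
# A time-synchronized mashup: an ordered list of segments, each covering
# the interval [start, end) of the event timeline with one input clip.
# Clip names and timings are hypothetical illustration values.
mashup = [
    ("camera_A", 0.0, 4.5),   # e.g. an opening wide shot
    ("camera_B", 4.5, 7.0),   # e.g. a close-up of the performer
    ("camera_C", 7.0, 12.0),  # e.g. an audience reaction
]

# Cut points: the instants at which the mashup switches input clips.
cut_points = [end for _, _, end in mashup[:-1]]

# Shot lengths: playback time between consecutive cut points.
shot_lengths = [end - start for _, start, end in mashup]

print("cut points  :", cut_points)    # [4.5, 7.0]
print("shot lengths:", shot_lengths)  # [4.5, 2.5, 5.0]
```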

 

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Saini, M.K., Ooi, W.T. (2018). Automated Video Mashups: Research and Challenges. In: Montagud, M., Cesar, P., Boronat, F., Jansen, J. (eds) MediaSync. Springer, Cham. https://doi.org/10.1007/978-3-319-65840-7_6

  • DOI: https://doi.org/10.1007/978-3-319-65840-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65839-1

  • Online ISBN: 978-3-319-65840-7
