On the Unsolved Problem of Shot Boundary Detection for Music Videos

  • Alexander Schindler
  • Andreas Rauber
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11295)


This paper discusses open problems in detecting shot boundaries in music videos. The number of shots per second and the type of transition are considered discriminating features of music videos and potential multi-modal music features. By providing an extensive list of effects and transition types that are rare in cinematic productions but common in music videos, we emphasize the artistic use of transitions in music videos. Using examples, we discuss in detail the shortcomings of state-of-the-art approaches and suggest ways to address them.


Music Information Retrieval, Music videos, Shot boundary detection



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Center for Digital Safety and Security, AIT Austrian Institute of Technology GmbH, Vienna, Austria
  2. Institute of Information Systems Engineering, Vienna University of Technology, Vienna, Austria
