A highly robust deep learning technique for overlap detection using audio fingerprinting

Multimedia Tools and Applications

Abstract

Due to the proliferation of video-based applications, there is a high demand for automated systems that support various video-based tasks without human intervention such as manual tagging. In this paper, we present a novel approach for detecting whether two videos overlap by exploiting their corresponding audio signals. Overlap detection is a crucial preprocessing step for audio alignment and, further, for video alignment and synchronisation. Several existing approaches have limitations related to timestamps, overlapping regions, and the length of video clips. In the proposed work, we target the challenging scenario of videos recorded simultaneously, in an unconstrained manner, by multiple users attending performance events. Our work aims to develop a robust framework that accounts for the noise present in the audio and is also free from the limitations mentioned above. We compare our framework with several existing approaches. Our proposed framework outperforms the other approaches by an average of 13.71% in terms of accuracy.
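The abstract does not describe the internals of the proposed deep learning framework. To make the underlying task concrete, the sketch below shows a simple non-learned baseline for deciding whether two audio clips overlap: each clip is reduced to a per-frame log-energy envelope (a crude one-dimensional fingerprint), and normalized cross-correlation is searched over all relative offsets. This is not the authors' method. The function names `frame_energy` and `detect_overlap`, and the parameters `frame`, `min_frames` and `threshold`, are hypothetical choices for illustration only.

```python
import numpy as np


def frame_energy(x, frame=512):
    """Log energy of each non-overlapping frame (a crude 1-D fingerprint)."""
    n = len(x) // frame
    frames = np.asarray(x[: n * frame], dtype=float).reshape(n, frame)
    return np.log1p((frames ** 2).sum(axis=1))


def detect_overlap(a, b, frame=512, min_frames=50, threshold=0.8):
    """Hypothetical baseline: return (overlaps, lag_in_frames, score).

    A positive lag means clip b starts `lag` frames after clip a starts.
    Only offsets whose shared region spans at least `min_frames` frames
    are considered, so short chance matches cannot dominate.
    """
    ea, eb = frame_energy(a, frame), frame_energy(b, frame)
    best_score, best_lag = -1.0, 0
    for lag in range(-(len(eb) - min_frames), len(ea) - min_frames + 1):
        # Frame ea[start_a + i] is aligned with eb[start_b + i] at this lag.
        start_a, start_b = max(lag, 0), max(-lag, 0)
        n = min(len(ea) - start_a, len(eb) - start_b)
        if n < min_frames:
            continue
        u = ea[start_a:start_a + n] - ea[start_a:start_a + n].mean()
        v = eb[start_b:start_b + n] - eb[start_b:start_b + n].mean()
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        if denom == 0:
            continue  # a flat envelope carries no alignment information
        score = float(u @ v / denom)  # Pearson correlation over the overlap
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score >= threshold, best_lag, best_score
```

For example, two excerpts cut from the same synthetic recording 120 frames apart should be reported as overlapping with a lag of 120, while an unrelated clip should not. A fixed energy envelope like this is fragile under the noise, reverberation and device differences that arise in unconstrained multi-user recordings, which is the kind of scenario the paper targets with a learned approach.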

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Data Availability

The datasets generated and/or analysed during the current study are available in the following repository: https://traces.cs.umass.edu/index.php/Mmsys/Mmsys

Funding

This work is supported by a grant from the Department of Science and Technology (DST), Govt. of India, for the Technology Innovation Hub at IIT Ropar under the National Mission on Interdisciplinary Cyber-Physical Systems.

Author information

Corresponding author

Correspondence to Anterpreet Kaur Bedi.

Ethics declarations

Conflicts of interest/Competing interests

The authors declare no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Uikey, A., Bedi, A.K., Choudhary, P. et al. A highly robust deep learning technique for overlap detection using audio fingerprinting. Multimed Tools Appl 83, 29119–29137 (2024). https://doi.org/10.1007/s11042-023-16713-y

  • DOI: https://doi.org/10.1007/s11042-023-16713-y
