A highly robust deep learning technique for overlap detection using audio fingerprinting

Uikey, Akash; Bedi, Anterpreet Kaur; Choudhary, Priyankar; Ooi, Wei Tsang; Saini, Mukesh

doi:10.1007/s11042-023-16713-y

Published: 11 September 2023

Volume 83, pages 29119–29137, (2024)

Multimedia Tools and Applications

Akash Uikey¹,
Anterpreet Kaur Bedi ORCID: orcid.org/0000-0001-6064-5925²,
Priyankar Choudhary¹,
Wei Tsang Ooi³ &
…
Mukesh Saini¹

Abstract

Due to the proliferation of video-based applications, there is a high demand for automated systems that support video-based tasks without human intervention such as manual tagging. In this paper, we present a novel approach for detecting the presence of overlap between two videos by exploiting their corresponding audio signals. Overlap detection is a crucial preprocessing step for audio, and subsequently video, alignment and synchronisation. Several existing approaches have limitations related to timestamps, overlapping regions, and the length of video clips. We target the challenging scenario of videos recorded simultaneously and in an unconstrained manner by multiple users attending performance events. Our work is an attempt towards developing a robust framework that not only accounts for noisy components present in the audio but is also free from the limitations mentioned above. We compare our framework with several existing approaches, which it outperforms by an average of 13.71% in terms of accuracy.
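
To make the task concrete, the following is a minimal sketch of a classical baseline for audio overlap detection: the peak of the normalised cross-correlation between two mono signals. It is not the deep learning fingerprinting framework proposed in this paper. The function names `overlap_score` and `has_overlap`, the synthetic test signals, and the threshold of 0.2 are all assumptions made for illustration.

```python
import numpy as np


def overlap_score(a, b):
    """Return (peak normalised cross-correlation, lag in samples).

    A positive lag means the content of `b` starts `lag` samples into `a`.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Standardise so the score does not depend on recording gain.
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)

    # Zero-pad to avoid circular wrap-around, then correlate via the FFT.
    n = len(a) + len(b) - 1
    size = 1 << (n - 1).bit_length()
    spec = np.fft.rfft(a, size) * np.conj(np.fft.rfft(b, size))
    corr = np.fft.irfft(spec, size)

    # Reorder so that index 0 corresponds to lag -(len(b) - 1).
    corr = np.concatenate((corr[size - (len(b) - 1):], corr[:len(a)]))
    corr /= np.sqrt(len(a) * len(b))

    k = int(np.argmax(corr))
    return float(corr[k]), k - (len(b) - 1)


def has_overlap(a, b, threshold=0.2):
    """Declare overlap when the correlation peak reaches `threshold` (assumed value)."""
    score, _ = overlap_score(a, b)
    return score >= threshold


# Synthetic stand-ins for two recordings of the same event and one unrelated clip.
rng = np.random.default_rng(0)
scene = rng.standard_normal(6000)
clip_a = scene[:4000]
clip_b = scene[2000:] + 0.1 * rng.standard_normal(4000)  # shares 2000 samples with clip_a
clip_c = rng.standard_normal(4000)                       # unrelated audio

print(overlap_score(clip_a, clip_b))
print(has_overlap(clip_a, clip_c))
```

In this sketch the peak height scales with the fraction of the clips that overlaps, so a fixed threshold is only meaningful for clips of comparable length. Raw-waveform correlation also degrades quickly when the recordings differ in device response or background noise, which is the kind of condition the abstract describes as the target scenario.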

Data Availability

The datasets generated and/or analysed during the current study are available in the following repository: https://traces.cs.umass.edu/index.php/Mmsys/Mmsys

Funding

This work is supported by a grant from DST, Govt. of India, for the Technology Innovation Hub at IIT Ropar under the framework of the National Mission on Interdisciplinary Cyber-Physical Systems.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology, Roopnagar, 140001, Punjab, India
Akash Uikey, Priyankar Choudhary & Mukesh Saini
Electrical and Instrumentation Engineering Department, Thapar Institute of Engineering and Technology, Patiala, 147004, Punjab, India
Anterpreet Kaur Bedi
Department of Computer Science, National University of Singapore, Singapore, 119077, Singapore
Wei Tsang Ooi

Corresponding author

Correspondence to Anterpreet Kaur Bedi.

Ethics declarations

Conflicts of interests/Competing interests

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Uikey, A., Bedi, A.K., Choudhary, P. et al. A highly robust deep learning technique for overlap detection using audio fingerprinting. Multimed Tools Appl 83, 29119–29137 (2024). https://doi.org/10.1007/s11042-023-16713-y

Received: 08 April 2022
Revised: 29 August 2023
Accepted: 31 August 2023
Published: 11 September 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11042-023-16713-y
