Abstract
The performance of any voice recognition platform in a real environment depends on how well the desired speech signal is separated from unwanted signals such as background noise or background speakers. In this paper, we propose a three-stage hybrid model to separate two speakers from a single-channel speech mixture under unsupervised conditions. The proposed method combines three techniques: speech segmentation, nonnegative matrix factorization (NMF), and masking. Speech segmentation groups the short speech frames belonging to individual speakers by identifying speaker changeover points, but the resulting groups lack continuity of the speech samples over time. A second stage based on NMF addresses this shortcoming: the NMF algorithm separates a speech mixture more effectively when parts of the individual speech signals are known a priori, and this requirement is satisfied by the segmentation stage. NMF then separates the individual speech signals in the mixture while maintaining the continuity of speech samples over time. To further improve the accuracy of the separated speech signals, masking methods, namely the time-frequency ratio (TFR), soft mask (SM), and hard mask (HM), are applied. The separation results are compared with those of other unsupervised algorithms. The proposed hybrid model produces promising results for unsupervised single-channel speech separation, and it can be applied at the front end of any voice recognition platform to further improve recognition accuracy.
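The NMF and soft-masking stages described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it operates on magnitude spectrograms, uses basic Euclidean multiplicative updates, takes the per-speaker basis matrices (which the paper obtains from the segmentation stage) as given, and shows only the soft-mask (Wiener-style) variant of the masking step. All function and variable names are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Basic multiplicative-update NMF (Euclidean cost): V ~= W @ H."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-6
    H = rng.random((rank, m)) + 1e-6
    for _ in range(n_iter):
        # Standard Lee-Seung multiplicative updates keep W, H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

def separate(V_mix, W1, W2, n_iter=200, seed=1):
    """Separate a mixture magnitude spectrogram V_mix given fixed
    speaker bases W1, W2 (e.g. learned from segmented frames),
    then refine with a soft (Wiener-style) mask."""
    W = np.hstack([W1, W2])           # concatenated speaker dictionaries
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V_mix.shape[1])) + 1e-6
    for _ in range(n_iter):
        # Only the activations H are updated; the bases stay fixed.
        H *= (W.T @ V_mix) / (W.T @ W @ H + 1e-9)
    k = W1.shape[1]
    V1_hat = W1 @ H[:k]               # speaker-1 spectrogram estimate
    V2_hat = W2 @ H[k:]               # speaker-2 spectrogram estimate
    mask1 = V1_hat / (V1_hat + V2_hat + 1e-9)   # soft mask (SM)
    return mask1 * V_mix, (1.0 - mask1) * V_mix
```

Because the two soft masks sum to one at every time-frequency bin, the two estimates add back up to the original mixture spectrogram; a hard mask would instead assign each bin entirely to the dominant speaker.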
Funding
No funding was received for conducting this study.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Cite this article
Prasanna Kumar, M., Kumaraswamy, R. A hybrid model for unsupervised single channel speech separation. Multimed Tools Appl 83, 13241–13259 (2024). https://doi.org/10.1007/s11042-023-16108-z