Quest for Speech Enhancement Method in the Analysis of Pathological Voices

Gour, G. B.; Udayashankara, V.; Badakh, Dinesh K.; Kulkarni, Yogesh A.

doi:10.1007/s00034-022-02286-y


Published: 12 January 2023

Volume 42, pages 3617–3648, (2023)

Circuits, Systems, and Signal Processing

G. B. Gour ORCID: orcid.org/0000-0001-9053-1814¹,
V. Udayashankara²,
Dinesh K. Badakh³ &
Yogesh A. Kulkarni³


Abstract

In uncontrolled settings such as hospitals, medical research centres, laboratories and forensic environments, voice recordings are commonly corrupted by babble noise. Most speech enhancement (SE) methods perform poorly under coloured noise because they do not take the phase into account. The objective of this work is therefore to identify a more efficient and compact SE method through a comparative study of conventional SE methods on both continuous speech and sustained vowels, supported by supervised voice activity detection (VAD). The study experimentally investigates seven conventional time-domain and transform-domain methods. The framework consists of a pre-processing stage, an SE evaluation block and a supervised VAD section. The VAD efficiently detects speech segments using bio-inspired Mel frequency cepstral coefficients (MFCCs) and nonlinear features with a support vector machine (SVM). This supervised VAD enables reliable estimation of signal-to-noise ratio (SNR)-based parameters. The performance of each method is evaluated on the NOIZEUS continuous-speech corpus at varying SNR levels, on laryngeal pathology data from the Saarbruecken Voice Database and on voice disorder data collected from cancer hospitals. Experimental results show that the Wiener filtering and geometric SE methods perform comparatively better on both continuous speech and sustained vowels corrupted by babble noise. The paper also shows that nonlinear features contribute significantly to VAD, since nonlinear dynamical methods can capture the irregular behaviour of the vocal cords.
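The abstract credits Wiener filtering among the better-performing methods and notes that the supervised VAD makes SNR-based parameters more reliable. The sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. It assumes that per-frame power spectra and a noise power estimate are already available, and that the VAD output is a per-frame boolean mask. The function names, the maximum-likelihood a priori SNR estimate, the gain floor and the [-10, 35] dB per-frame clamping range are all illustrative choices rather than details taken from the paper.

```python
import math
from typing import List, Sequence

EPS = 1e-10  # guards against division by zero and log10(0)


def wiener_gain(noisy_psd: Sequence[float], noise_psd: Sequence[float],
                floor: float = 1e-3) -> List[float]:
    """Per-bin Wiener gain G = xi / (1 + xi) for one frame (illustrative sketch).

    Assumption: the a priori SNR xi is approximated by the maximum-likelihood
    estimate max(gamma - 1, 0), where gamma = |Y|^2 / noise PSD is the
    a posteriori SNR. The gain floor (hypothetical default) limits musical noise.
    """
    gains = []
    for y_pow, n_pow in zip(noisy_psd, noise_psd):
        gamma = y_pow / max(n_pow, EPS)
        xi = max(gamma - 1.0, 0.0)
        gains.append(max(xi / (1.0 + xi), floor))
    return gains


def segmental_snr(clean: Sequence[float], enhanced: Sequence[float],
                  frame_len: int, speech_mask: Sequence[bool],
                  lo: float = -10.0, hi: float = 35.0) -> float:
    """Mean per-frame SNR (dB) over frames that the VAD marks as speech.

    Each frame's SNR is clamped to [lo, hi] dB (an assumed, commonly used
    range) so that silent or near-perfect frames do not dominate the mean.
    """
    frame_snrs = []
    for k, is_speech in enumerate(speech_mask):
        if not is_speech:
            continue  # non-speech frames are excluded from the average
        start, stop = k * frame_len, (k + 1) * frame_len
        s, e = clean[start:stop], enhanced[start:stop]
        if len(s) < frame_len:
            break  # ignore a trailing partial frame
        signal = sum(x * x for x in s)
        error = sum((x - y) ** 2 for x, y in zip(s, e))
        snr = 10.0 * math.log10(max(signal, EPS) / max(error, EPS))
        frame_snrs.append(min(max(snr, lo), hi))
    return sum(frame_snrs) / len(frame_snrs) if frame_snrs else float("nan")


if __name__ == "__main__":
    # Two 4-sample frames; a hypothetical VAD marks only the first as speech.
    clean = [1.0] * 4 + [0.0] * 4
    enhanced = [0.9] * 4 + [0.5] * 4
    print(wiener_gain([4.0, 1.0, 0.5], [1.0, 1.0, 1.0]))  # [0.75, 0.001, 0.001]
    print(segmental_snr(clean, enhanced, 4, [True, False]))  # approximately 20 dB
```

In this sketch the enhanced spectrum of a frame would be the noisy STFT multiplied bin-by-bin by `wiener_gain`. With the toy signals above, averaging over the speech frame alone gives about 20 dB. Including the silent second frame (mask `[True, True]`) pulls the mean down to 5 dB, because that frame's SNR is clamped at -10 dB. This shows why restricting SNR-based measures to VAD-detected speech can matter.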


Fig. 3

Fig. 4


Data Availability

Databases used in the present study, such as SVD [2, 21, 29] and NOIZEUS [4], are publicly available, whereas the laryngeal cancer disorder database is available with the appropriate permissions. The relevant details are provided as supplementary material for reference.

References

S. An, C. Bao, B. Xia, An adaptive β-order MMSE estimator for speech enhancement using super-Gaussian speech model. IEEE China Summit and International Conference on Signal and Information Processing (2013), pp. 327–331. https://doi.org/10.1109/ChinaSIP.2013.6625354
W.J. Barry, M. Putzer, Saarbrucken Voice Database. Institute of Phonetics, Univ. of Saarland. http://www.stimmdatenbank.coli.uni-saarland.de/
S. Boll, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process. 27(2), 113–120 (1979). https://doi.org/10.1109/TASSP.1979.1163209
D.M. Công, Noise Reduction in Speech Enhancement by Spectral Subtraction with Scalar Kalman Filter. Ha Noi. (2015)
N. Das, S. Chakraborty, J. Chaki, N. Padhy, D. Dey, Fundamentals, present and future perspectives of speech enhancement. Int. J. Speech Technol. (2020). https://doi.org/10.1007/s10772-020-09674-2
Y. Ephraim, D. Malah, Speech enhancement using a minimum mean square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 32(6), 1109–1121 (1984)
N.R. French, J.C. Steinberg, Factors governing the intelligibility of speech sounds. J. Acoust. Soc. Am. 19(1), 90–119 (1947)
T. Gerkmann, M. Krawczyk, MMSE-optimal spectral amplitude estimation given the STFT-phase. IEEE Signal Process. Lett. 20(2), 129–132 (2013). https://doi.org/10.1109/LSP.2012.2233470
G.B. Gour, V. Udayashankara, D.K. Badakh, Y.A. Kulkarni, Framework based supervised voice activity detection using linear and non-linear features. Indian J. Comput. Sci. Eng. 11(6), 935–942 (2020). https://doi.org/10.21817/indjcse/2020/v11i6/201106181
R. Hegger, H. Kantz, T. Schreiber, Practical implementation of nonlinear time series methods: The TISEAN package. Chaos Interdiscip. J. Nonlinear Sci. 9(2), 413–435 (1999). https://doi.org/10.1063/1.166424
P. Henriquez, J.B. Alonso, M.A. Ferrer, C.M. Travieso, J.I. Godino-Llorente, F. Diaz-de-Maria, Characterization of healthy and pathological voice through measures based on nonlinear dynamics. IEEE Trans. Audio Speech Lang. Process. 17(6), 1186–1195 (2009). https://doi.org/10.1109/TASL.2009.2016734
P.P. Ingale, S.L. Nalbalwar, Deep neural network based speech enhancement using mono channel mask. Int. J. Speech Technol. 22, 841–850 (2019). https://doi.org/10.1007/s10772-019-09627-4
M.T. Islam, C. Shahnaz, W.P. Zhu, M.O. Ahmad, Enhancement of noisy speech with low speech distortion based on probabilistic geometric spectral subtraction (2018). arXiv preprint arXiv:1802.05125
J.J. Jiang, Y. Zhang, C. McGilligan, Chaos in voice, from modeling to measurement. J. Voice 20(1), 2–17 (2005). https://doi.org/10.1016/j.jvoice.2005.01.001
S. Kamath, P. Loizou, A Multi-Band Spectral Subtraction Method for Enhancing Speech Corrupted by Colored Noise. IEEE International Conference on Acoustics, Speech, and Signal Processing, (2002), pp. IV-4164-IV-4164, https://doi.org/10.1109/ICASSP.2002.5745591.
D.H. Klatt, Prediction of perceived phonetic distance from critical-band spectra: a first step. Proc. IEEE ICASSP'82, vol. 2 (1982). pp. 1278–1281
K. Kondo, Subjective quality measurement of speech, its evaluation estimation and applications (Springer, Berlin Heidelberg, 2012)
Z. Liu, H.T. Ma, F. Chen, A New Data-driven Band-weighting function for Predicting the Intelligibility of Noise-suppressed Speech. Proceedings of APSIPA Annual Summit and Conference, (Malaysia, 2017). pp. 12–15
T. Lotter, Speech enhancement by MAP spectral amplitude estimation using a super gaussian speech model. EURASIP J. Appl. Signal Process. 7, 1110–1126 (2005)
Y. Lu, P.C. Loizou, A geometric approach to spectral subtraction. Speech Commun. 50, 453–466 (2008). https://doi.org/10.1016/j.specom.2008.01.003
D. Martínez, E. Lleida, A. Ortega, A. Miguel, J. Villalba, Voice pathology detection on the Saarbrücken voice database with calibration and fusion of scores using multifocal toolkit, in Advances in Speech and Language Technologies for Iberian Languages, Communications in Computer and Information Science, vol. 328, ed. by D. Torre Toledano, A. Ortega Giménez, A. Teixeira, J. González Rodríguez, L. Hernández Gómez, R. San Segundo Hernández, D. Ramos Castro (Springer, Berlin Heidelberg, 2012)
M.A.B. Messaoud, A. Bouzid, Sparse representations for single channel speech enhancement based on voiced/unvoiced classification. Circuits Syst. Signal Process. 36(5), 1912–1933 (2017). https://doi.org/10.1007/s00034-016-0384-6
P. Murphy, O. Akande, Cepstrum-based harmonics-to-noise ratio measurement in voiced speech, in Nonlinear Speech Modeling and Applications. ed. by G. Chollet, A. Esposito, M. Faundez-Zanuy, M. Marinaro (Springer Berlin Heidelberg, Berlin, Heidelberg, 2005), pp.199–218. https://doi.org/10.1007/11520153_9
M.U. Nemade, S.K. Shah, Performance Comparison of Single Channel Speech Enhancement Techniques for Personal Communication. International Journal of Innovative Research in Computer and Communication Engineering Vol. 1, Issue 1. (2013)
K.K. Paliwal, A. Basu, Speech enhancement method based on Kalman Filtering. Computer systems and communication group, TATA Institute of Fundamental Research, Bombay, India. CH-2396–0/87/0000–0177. (1987)
C. Plapous, C. Marro, P. Scalart, Improved signal-to-noise ratio estimation for speech enhancement. IEEE/ACM Trans. Audio Speech Lang. Process. 14(6), 2098–2108 (2006)
C. Plapous, C. Marro, P. Scalart, A Two-Step Noise Reduction Technique. IEEE International Conference on Acoustics, Speech, and Signal Processing, (2004), pp. I-289 https://doi.org/10.1109/ICASSP.2004.1325979.
A.H. Poorjam, M.A. Little, J.R. Jensen, M.G. Christensen, A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (2018), pp. 296–300 https://doi.org/10.1109/ICASSP.2018.8462459.
M. Putzer, J. Koreman, A German database of patterns of pathological vocal fold vibration. Phonus Instit. Phon. Univ. Saarl. 3, 143–153 (1997)
S. So, A.E.W. George, R. Ghosh, K.K. Paliwal, Kalman Filter with Sensitivity Tuning for Improved Noise Reduction in Speech. Circuits Syst. Signal Process. 36, 1476–1492 (2017). https://doi.org/10.1007/s00034-016-0363-y
C.M. Travieso, J.B. Alonso, J.R.O. Arroyave, J.F.V. Bonilla, E. Nöth, A.G.R. García, Detection of different voice diseases based on the nonlinear characterization of speech signals. Expert Syst. Appl. 82, 184–195 (2017)
S. Vihari, A.S. Murthy, P. Soni, D.C. Naik, Comparison of speech enhancement algorithms. Proced. Comput. Sci. 89, 666–676 (2016). https://doi.org/10.1016/j.procs.2016.06.032


Acknowledgements

The authors express their sincere gratitude to all the participants who took part in the experimental work to build the laryngeal cancer disorder database. We also thank the technical staff who assisted throughout the recording sessions.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, BLDEA's V.P. Dr. P.G. Halakatti College of Engineering and Technology, Vijayapur, Karnataka, 586103, India
G. B. Gour
Department of Electronics and Instrumentation, Sri Jayachamarajendra College of Engineering, Mysuru, Karnataka, 570006, India
V. Udayashankara
Department of Radiation Oncology, Sri Siddhivinayak Ganapati Cancer Hospital, Miraj, Maharashtra, 416410, India
Dinesh K. Badakh & Yogesh A. Kulkarni


Corresponding author

Correspondence to G. B. Gour.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

All the methods and procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent

Written informed consent was obtained from all subjects involved in the research study. The present research work did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 84 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Gour, G.B., Udayashankara, V., Badakh, D.K. et al. Quest for Speech Enhancement Method in the Analysis of Pathological Voices. Circuits Syst Signal Process 42, 3617–3648 (2023). https://doi.org/10.1007/s00034-022-02286-y


Received: 17 November 2021
Revised: 25 December 2022
Accepted: 26 December 2022
Published: 12 January 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s00034-022-02286-y
