Abstract
In uncontrolled environments such as hospitals, medical research centres, laboratories and forensic settings, voice recordings are typically corrupted by babble noise. Most speech enhancement (SE) methods perform poorly under coloured noise because they do not take the phase of the signal into account. The objective of this work is therefore to identify a more efficient and compact SE method through a comparative study of conventional SE methods, considering both continuous speech and sustained vowels. The approach is supported by supervised voice activity detection (VAD). The study experimentally investigates seven conventional time- and transform-domain SE methods. The framework consists of a pre-processing stage, an SE evaluation block and a supervised VAD section. The VAD detects speech segments using bio-inspired Mel-frequency cepstral coefficients (MFCCs) and nonlinear features with a support vector machine (SVM) classifier. This supervised VAD enables reliable estimation of signal-to-noise ratio (SNR)-based parameters. Each method is evaluated on continuous speech from the NOIZEUS corpus at varying SNR levels, on laryngeal pathology data from the Saarbruecken Voice Database, and on voice disorder data collected from cancer hospitals. Experimental results show that the Wiener filtering and geometric-approach SE methods perform comparatively better on both continuous speech and sustained vowels corrupted by babble noise. The paper also shows that nonlinear features contribute significantly to VAD, since nonlinear dynamical methods can capture the irregular behaviour of the vocal cords.
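The abstract links two ideas: a VAD that separates speech from non-speech frames, and SNR-based estimation that drives enhancement methods such as Wiener filtering. The following is a minimal sketch of how those pieces can fit together, under several loudly stated assumptions. It is not the authors' implementation: the SVM classifier on MFCC and nonlinear features is replaced by an oracle VAD computed from the clean signal, the test noise is white Gaussian rather than babble, and the Wiener gain uses a simple maximum-likelihood a priori SNR estimate instead of any specific published variant. The function names (`frame_signal`, `vad_wiener`, `segmental_snr`) and parameters (256-sample frames, 50% overlap, 0.05 gain floor, [-10, 35] dB per-frame clipping) are hypothetical choices. Segmental SNR is shown only as one example of an SNR-based measure.

```python
import numpy as np

# Hypothetical sketch, not the authors' method: frame size, gain floor,
# SNR clipping range and the oracle VAD below are illustrative assumptions.


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def vad_wiener(noisy, vad, frame_len=256, floor=0.05):
    """Wiener filtering with the noise PSD averaged over frames the VAD marks as non-speech.

    vad: boolean array with one entry per frame (True = speech).
    """
    hop = frame_len // 2
    # Sine window = square root of a periodic Hann window; used for both analysis
    # and synthesis, its square sums to 1 at 50% overlap (perfect reconstruction).
    win = np.sin(np.pi * np.arange(frame_len) / frame_len)
    spec = np.fft.rfft(frame_signal(noisy, frame_len, hop) * win, axis=1)
    power = np.abs(spec) ** 2
    noise_psd = power[~vad].mean(axis=0)
    # Maximum-likelihood a priori SNR (posterior SNR minus one, clipped at zero)
    xi = np.maximum(power / (noise_psd + 1e-12) - 1.0, 0.0)
    gain = np.maximum(xi / (1.0 + xi), floor)
    # The gain is real, so only magnitudes change; the noisy phase is reused as-is
    frames_out = np.fft.irfft(gain * spec, n=frame_len, axis=1) * win
    out = np.zeros(len(noisy))
    for i, frame in enumerate(frames_out):
        out[i * hop:i * hop + frame_len] += frame
    return out


def segmental_snr(clean, processed, vad, frame_len=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB over speech frames only, each frame clipped to [lo, hi]."""
    hop = frame_len // 2
    c = frame_signal(clean, frame_len, hop)
    e = c - frame_signal(processed, frame_len, hop)
    snr = 10 * np.log10((c ** 2).sum(axis=1) / ((e ** 2).sum(axis=1) + 1e-12) + 1e-12)
    return float(np.clip(snr, lo, hi)[vad].mean())


# Synthetic check: a 500 Hz tone burst in white Gaussian noise, with an oracle VAD
fs = 8000
t = np.arange(2 * fs) / fs
clean = np.where((t >= 0.5) & (t < 1.5), np.sin(2 * np.pi * 500 * t), 0.0)
noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(len(t))
vad = (frame_signal(clean, 256, 128) ** 2).mean(axis=1) > 0.01
enhanced = vad_wiener(noisy, vad)
before = segmental_snr(clean, noisy, vad)
after = segmental_snr(clean, enhanced, vad)
print(f"segSNR noisy: {before:.1f} dB, enhanced: {after:.1f} dB")
```

Because the gain is real-valued, the enhanced frames keep the noisy phase, which is exactly the phase-blind limitation the abstract attributes to most SE methods. Restricting both the noise PSD estimate and the segmental SNR to VAD-labelled frames is what makes a reliable VAD matter for these SNR-based quantities.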
Acknowledgements
The authors would like to express their sincere gratitude to all the participants who took part in the experimental work to build the laryngeal cancer voice disorder database. We also thank the technical staff who assisted throughout the recording sessions.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval
All the methods and procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed Consent
Written informed consent was obtained from all subjects involved in the study. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gour, G.B., Udayashankara, V., Badakh, D.K. et al. Quest for Speech Enhancement Method in the Analysis of Pathological Voices. Circuits Syst Signal Process 42, 3617–3648 (2023). https://doi.org/10.1007/s00034-022-02286-y
DOI: https://doi.org/10.1007/s00034-022-02286-y