Quest for Speech Enhancement Method in the Analysis of Pathological Voices

Circuits, Systems, and Signal Processing

Abstract

In uncontrolled environments such as hospitals, medical research centres, laboratories, and forensic settings, voice recordings are commonly corrupted by babble noise. Most speech enhancement (SE) methods perform poorly in coloured noise because they do not take the phase into account. The objective of this work is therefore to identify a more efficient and compact SE method through a comparative study of conventional SE methods, considering both continuous speech and sustained vowels. The study experimentally investigates seven conventional time-domain and transform-domain methods. The approach consists of a pre-processing stage, an SE evaluation block, and a supervised voice activity detection (VAD) section. The VAD detects speech segments using bio-inspired Mel-frequency cepstral coefficients and nonlinear features with a support vector machine, which enables reliable estimation of parameters based on the signal-to-noise ratio (SNR). Each method is evaluated on the continuous-speech NOIZEUS corpus at varying SNR levels, on laryngeal pathology data from the Saarbruecken voice database, and on voice disorder data collected from cancer hospitals. Experimental results show that Wiener filtering and the geometric-based SE methods perform better than the other methods on both continuous speech and sustained vowels in babble noise. The paper also shows that nonlinear features contribute significantly to VAD, since nonlinear dynamical methods can analyse the irregular behaviour of the vocal cords.
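To make the role of the VAD in SNR-based estimation concrete, the following is a minimal sketch of Wiener-type enhancement in which the noise spectrum is estimated only from frames flagged as non-speech. It is not the authors' implementation: the function names (`wiener_enhance`, `snr_db`), the frame length, the hop size, and the gain floor are all assumptions chosen for illustration. White Gaussian noise stands in for babble noise, a pure tone stands in for a sustained vowel, and an oracle frame mask stands in for the supervised VAD described above.

```python
import numpy as np

def snr_db(clean, test):
    """SNR in dB of `test` relative to the known clean reference."""
    err = test - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))

def wiener_enhance(noisy, speech_mask, frame_len=256, hop=128, floor=1e-3):
    """Frame-wise Wiener-type gain; noise PSD comes from non-speech frames.

    speech_mask: one boolean per frame, True where the VAD flagged speech
    (assumed to be supplied by an external detector).
    """
    speech_mask = np.asarray(speech_mask, dtype=bool)
    # Periodic Hann window: shifted copies sum to 1 at 50% overlap.
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    spec = np.fft.rfft(noisy[idx] * win, axis=1)
    power = np.abs(spec) ** 2
    # Average the power spectrum over frames the VAD marked as noise only.
    noise_psd = power[~speech_mask].mean(axis=0) + 1e-12
    # Maximum-likelihood a priori SNR estimate, clipped at zero.
    snr_prior = np.maximum(power / noise_psd - 1.0, 0.0)
    gain = np.maximum(snr_prior / (1.0 + snr_prior), floor)
    frames = np.fft.irfft(gain * spec, n=frame_len, axis=1)
    # Overlap-add the enhanced frames back into a signal.
    out = np.zeros(len(noisy))
    for k in range(n_frames):
        out[k * hop:k * hop + frame_len] += frames[k]
    return out

# Synthetic demo: silence followed by a 200 Hz tone, plus white noise.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
clean = np.zeros(fs)
clean[fs // 2:] = np.sin(2 * np.pi * 200 * t[fs // 2:])
noisy = clean + 0.3 * rng.standard_normal(fs)

# Oracle mask: a frame counts as speech if it overlaps the tone region.
n_frames = 1 + (len(noisy) - 256) // 128
starts = 128 * np.arange(n_frames)
speech_mask = starts + 256 > fs // 2

enhanced = wiener_enhance(noisy, speech_mask)

# Compare SNR over the fully overlapped part of the tone region.
region = slice(fs // 2, 7680)
snr_in = snr_db(clean[region], noisy[region])
snr_out = snr_db(clean[region], enhanced[region])
print(f"input SNR:  {snr_in:.2f} dB")
print(f"output SNR: {snr_out:.2f} dB")
```

The sketch shows why VAD quality matters: if speech frames leak into the noise estimate, `noise_psd` is inflated and the gain suppresses speech along with noise. Real babble noise is non-stationary and coloured, so a single averaged noise spectrum such as this one would be a weaker model there than in the white-noise demo.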

Data Availability

The databases used in the present study, namely the Saarbruecken voice database (SVD) [2, 21, 29] and NOIZEUS [4], are publicly available, whereas the laryngeal cancer disorder database is available with due permission. The relevant details are provided as supplementary material.

References

  1. S. An, C. Bao, B. Xia, An Adaptive β-Order MMSE Estimator for Speech Enhancement Using Super-Gaussian Speech Model. IEEE China Summit and International Conference on Signal and Information Processing, (2013), pp. 327–331, https://doi.org/10.1109/ChinaSIP.2013.6625354

  2. W.J. Barry, M. Putzer, Saarbrucken Voice Database. Institute of Phonetics, Univ. of Saarland. http://www.stimmdatenbank.coli.unisaarland.de/

  3. S. Boll, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process. 27(2), 113–120 (1979). https://doi.org/10.1109/TASSP.1979.1163209

  4. D.M. Công, Noise Reduction in Speech Enhancement by Spectral Subtraction with Scalar Kalman Filter. Ha Noi. (2015)

  5. N. Das, S. Chakraborty, J. Chaki, N. Padhy, D. Dey, Fundamentals, present and future perspectives of speech enhancement. Int. J. Speech Technol. (2020). https://doi.org/10.1007/s10772-020-09674-2

  6. Y. Ephraim, D. Malah, Speech enhancement using a minimum mean square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 32(6), 1109–1121 (1984)

  7. N.R. French, J.C. Steinberg, Factors governing the intelligibility of speech sounds. J. Acoust. Soc. Am. 19(1), 90–119 (1947)

  8. T. Gerkmann, M. Krawczyk, MMSE-optimal spectral amplitude estimation given the STFT-phase. IEEE Signal Process. Lett. 20(2), 129–132 (2013). https://doi.org/10.1109/LSP.2012.2233470

  9. G.B. Gour, V. Udayashankara, D.K. Badakh, Y.A. Kulkarni, Framework based supervised voice activity detection using linear and non-linear features. Indian J. Comput. Sci. Eng. 11(6), 935–942 (2020). https://doi.org/10.21817/indjcse/2020/v11i6/201106181

  10. R. Hegger, H. Kantz, T. Schreiber, Practical implementation of nonlinear time series methods: The TISEAN package. Chaos Interdiscip. J. Nonlinear Sci. 9(2), 413–435 (1999). https://doi.org/10.1063/1.166424

  11. P. Henriquez, J.B. Alonso, M.A. Ferrer, C.M. Travieso, J.I. Godino-Llorente, F. Diaz-de-Maria, Characterization of healthy and pathological voice through measures based on nonlinear dynamics. IEEE Trans. Audio Speech Lang. Process. 17(6), 1186–1195 (2009). https://doi.org/10.1109/TASL.2009.2016734

  12. P.P. Ingale, S.L. Nalbalwar, Deep neural network based speech enhancement using mono channel mask. Int. J. Speech Technol. 22, 841–850 (2019). https://doi.org/10.1007/s10772-019-09627-4

  13. M.T. Islam, C. Shahnaz, W.P. Zhu, M.O. Ahmad, Enhancement of noisy speech with low speech distortion based on probabilistic geometric spectral subtraction (2018). arXiv preprint arXiv:1802.05125

  14. J.J. Jiang, Y. Zhang, C. McGilligan, Chaos in voice, from modeling to measurement. J. Voice 20(1), 2–17 (2005). https://doi.org/10.1016/j.jvoice.2005.01.001

  15. S. Kamath, P. Loizou, A Multi-Band Spectral Subtraction Method for Enhancing Speech Corrupted by Colored Noise. IEEE International Conference on Acoustics, Speech, and Signal Processing, (2002), pp. IV-4164-IV-4164, https://doi.org/10.1109/ICASSP.2002.5745591.

  16. D.H. Klatt, Prediction of perceived phonetic distance from critical-band spectra: a first step. Proc. IEEE ICASSP'82, vol. 2 (1982). pp. 1278–1281

  17. K. Kondo, Subjective quality measurement of speech, its evaluation estimation and applications (Springer, Berlin Heidelberg, 2012)

  18. Z. Liu, H.T. Ma, F. Chen, A New Data-driven Band-weighting function for Predicting the Intelligibility of Noise-suppressed Speech. Proceedings of APSIPA Annual Summit and Conference, (Malaysia, 2017). pp. 12–15

  19. T. Lotter, Speech enhancement by MAP spectral amplitude estimation using a super gaussian speech model. EURASIP J. Appl. Signal Process. 7, 1110–1126 (2005)

  20. Y. Lu, P.C. Loizou, A geometric approach to spectral subtraction. Speech Commun. 50, 453–466 (2008). https://doi.org/10.1016/j.specom.2008.01.003

  21. D. Martínez, E. Lleida, A. Ortega, A. Miguel, J. Villalba, Voice pathology detection on the saarbrücken voice database with calibration and fusion of scores using multifocal toolkit, in Advances in speech and language technologies for iberian languages communications in computer and information science, vol. 328, ed. by D. Torre Toledano, A. Ortega Giménez, A. Teixeira, J. González Rodríguez, L. Hernández Gómez, R. San Segundo Hernández, D. Ramos Castro (Springer, Berlin Heidelberg, 2012)

  22. M.A.B. Messaoud, A. Bouzid, Sparse representations for single channel speech enhancement based on voiced/unvoiced classification. Circuits Syst. Signal Process. 36(5), 1912–1933 (2017). https://doi.org/10.1007/s00034-016-0384-6

  23. P. Murphy, O. Akande, Cepstrum-based harmonics-to-noise ratio measurement in voiced speech, in Nonlinear Speech Modeling and Applications. ed. by G. Chollet, A. Esposito, M. Faundez-Zanuy, M. Marinaro (Springer Berlin Heidelberg, Berlin, Heidelberg, 2005), pp.199–218. https://doi.org/10.1007/11520153_9

  24. M.U. Nemade, S.K. Shah, Performance Comparison of Single Channel Speech Enhancement Techniques for Personal Communication. International Journal of Innovative Research in Computer and Communication Engineering Vol. 1, Issue 1. (2013)

  25. K.K. Paliwal, A. Basu, Speech enhancement method based on Kalman Filtering. Computer systems and communication group, TATA Institute of Fundamental Research, Bombay, India. CH-2396–0/87/0000–0177. (1987)

  26. C. Plapous, C. Marro, P. Scalart, Improved signal-to-noise ratio estimation for speech enhancement. IEEE/ACM Trans. Audio Speech Lang. Process 14(6), 2098–2108 (2006)

  27. C. Plapous, C. Marro, P. Scalart, A Two-Step Noise Reduction Technique. IEEE International Conference on Acoustics, Speech, and Signal Processing, (2004), pp. I-289 https://doi.org/10.1109/ICASSP.2004.1325979.

  28. A.H. Poorjam, M.A. Little, J.R. Jensen, M.G. Christensen, A Supervised Approach to Global Signal-to-Noise Ratio Estimation for Whispered and Pathological Voices. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (2018), pp. 296–300 https://doi.org/10.1109/ICASSP.2018.8462459.

  29. M. Putzer, J. Koreman, A German database of patterns of pathological vocal fold vibration. Phonus Instit. Phon. Univ. Saarl. 3, 143–153 (1997)

  30. S. So, A.E.W. George, R. Ghosh, K.K. Paliwal, Kalman Filter with Sensitivity Tuning for Improved Noise Reduction in Speech. Circuits Syst. Signal Process 36, 1476–1492 (2017). https://doi.org/10.1007/s00034-016-0363-y

  31. C.M. Travieso, J.B. Alonso, J.R.O. Arroyave, J.F.V. Bonilla, E. Nöth, A.G.R. García, Detection of different voice diseases based on the nonlinear characterization of speech signals. Exp. Syst. Appl. 82, 184–195 (2017)

  32. S. Vihari, A.S. Murthy, P. Soni, D.C. Naik, Comparison of speech enhancement algorithms. Proced. Comput. Sci. 89, 666–676 (2016). https://doi.org/10.1016/j.procs.2016.06.032

Acknowledgements

The authors express their sincere gratitude to all the participants who took part in the experimental work to build the laryngeal cancer disorder database. We also thank the technical staff who assisted with cases throughout the recording sessions.

Author information

Corresponding author

Correspondence to G. B. Gour.

Ethics declarations

Conflict of interest

All the authors of the paper declare that they have no conflict of interest.

Ethical Approval

All the methods and procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent

Written informed consent was obtained from all the subjects involved in the research study. The present research work did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 84 kb)

About this article

Cite this article

Gour, G.B., Udayashankara, V., Badakh, D.K. et al. Quest for Speech Enhancement Method in the Analysis of Pathological Voices. Circuits Syst Signal Process 42, 3617–3648 (2023). https://doi.org/10.1007/s00034-022-02286-y

  • DOI: https://doi.org/10.1007/s00034-022-02286-y
