Application of glottal flow descriptors for pathological voice diagnosis

Gidaye, Girish; Nirmal, Jagannath; Ezzine, Kadria; Shrivas, Avinash; Frikha, Mondher

doi:10.1007/s10772-020-09679-x

Published: 28 January 2020

International Journal of Speech Technology, Volume 23, pages 205–222 (2020)

Abstract

Acoustic analysis of the speech signal enables automatic detection and classification of voice disorders, along with their severity. Such automatic assessment helps the clinician make an initial diagnosis of a pathological larynx in a non-intrusive way. Voice pathologies damage the vocal cords and consequently alter their dynamics (fluctuation speed). In this article, we estimate the glottal volume velocity waveform (GVVW) from the speech pressure waveforms of healthy and pathological subjects using the quasi closed phase (QCP) glottal inverse filtering algorithm, in order to capture the altered dynamics of the vocal cords. Closed-phase methods have shown notable stability across diverse voice qualities and sub-glottal pressures. The GVVW is the source of significant acoustical cues embedded in speech. The estimated GVVW is then parameterized by various time-based, frequency-based and Liljencrants–Fant (LF) model-based glottal descriptors. The resulting descriptor vectors are passed to a stochastic gradient descent (SGD) classifier for voice disorder evaluation. Normal-pitch utterances of the sustained vowel /a/, drawn from German, English, Arabic and Spanish voice databases, are used. The information gain (IG) feature scoring technique is employed to rank the descriptors and select the optimal ones. Several intra- and cross-database experiments were performed to explore the usefulness of glottal descriptors for voice disorder detection, severity detection and classification. Student's t-tests were performed to validate the obtained results.
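As an illustration of the information gain ranking step mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes each continuous descriptor is discretized by equal-width binning (the abstract does not say how discretization is done). The descriptor names and values are hypothetical toy data, not results from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, bins=4):
    """IG of one continuous descriptor; equal-width binning is an assumption."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant descriptor
    # Map each value to a bin index; the maximum value falls in the last bin.
    idx = [min(int((v - lo) / width), bins - 1) for v in values]
    groups = {}
    for b, y in zip(idx, labels):
        groups.setdefault(b, []).append(y)
    # Class entropy remaining after the descriptor's bin is known.
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_descriptors(table, labels, bins=4):
    """Return (name, IG) pairs sorted from most to least informative."""
    scores = {name: information_gain(col, labels, bins) for name, col in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: hypothetical descriptor names and made-up values.
labels = ["healthy"] * 4 + ["pathological"] * 4
table = {
    "descriptor_a": [0.10, 0.12, 0.15, 0.11, 0.80, 0.85, 0.90, 0.82],  # separates the classes
    "descriptor_b": [0.10, 0.90, 0.10, 0.90, 0.10, 0.90, 0.10, 0.90],  # unrelated to the class
}
ranking = rank_descriptors(table, labels)
for name, score in ranking:
    print(f"{name}: IG = {score:.3f} bits")
```

On this toy data the descriptor that separates the two classes ranks first and the unrelated one ranks last. In practice the ranked descriptors would be fed to the classifier in order of decreasing score.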

Author information

Authors and Affiliations

Research Scholar: K. J. Somaiya College of Engineering, Vidyalankar Institute of Technology, Mumbai, India
Girish Gidaye
K. J. Somaiya College of Engineering, Vidyavihar (E), Mumbai, India
Jagannath Nirmal
ATISP, ENET’COM, Sfax University, Sfax, Tunisia
Kadria Ezzine & Mondher Frikha
Vidyalankar Institute of Technology, Mumbai, India
Avinash Shrivas

Corresponding author

Correspondence to Girish Gidaye.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Table 12.

Table 12 Ranking of descriptors

About this article

Cite this article

Gidaye, G., Nirmal, J., Ezzine, K. et al. Application of glottal flow descriptors for pathological voice diagnosis. Int J Speech Technol 23, 205–222 (2020). https://doi.org/10.1007/s10772-020-09679-x

Received: 07 May 2019
Accepted: 19 January 2020
Published: 28 January 2020
Issue Date: March 2020
DOI: https://doi.org/10.1007/s10772-020-09679-x
