Automated classification of normal and pathological speech signals can provide an objective and accurate mechanism for diagnosing pathological speech, and is an active area of research. Much of this research analyzes acoustic measures extracted from sustained vowels. However, sustained vowels do not reflect real-world attributes of voice as effectively as continuous speech, which captures important attributes such as rapid voice onset and termination, changes in voice frequency and amplitude, and sudden discontinuities in speech. This paper presents a methodology based on empirical mode decomposition (EMD) for classifying continuous normal and pathological speech signals obtained from a well-known database. EMD is used to decompose randomly chosen portions of speech signals into intrinsic mode functions (IMFs), which are then analyzed to extract meaningful temporal and spectral features, including true instantaneous features that can capture discriminative information hidden in the signals at local time scales. A total of six features are extracted, and a linear classifier is applied to the resulting feature vector to classify continuous speech portions from a database of 51 normal and 161 pathological speakers. A classification accuracy of 95.7% is obtained, demonstrating the effectiveness of the methodology.
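To make the decomposition step concrete, the following is a minimal sketch of the sifting idea behind EMD, which splits a signal into intrinsic mode functions plus a residue. It is an illustration under stated assumptions, not the authors' implementation: the function names (`local_extrema`, `envelope`, `sift_imf`, `emd`), the fixed number of sifting passes, the limit of six modes, and the synthetic two-tone test signal are all hypothetical choices made here. The envelopes are piecewise linear to keep the example dependent on NumPy alone, whereas the standard algorithm interpolates the extrema with cubic splines. The six features and the classifier used in the paper are not shown, because the abstract does not specify them.

```python
import numpy as np


def local_extrema(x):
    """Return indices of strict interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima


def envelope(idx, x):
    """Piecewise-linear envelope through x[idx], held constant past the ends.

    Simplification: standard EMD uses cubic-spline envelopes instead.
    """
    return np.interp(np.arange(len(x)), idx, x[idx])


def sift_imf(x, n_sift=10):
    """Extract one mode using a fixed number of sifting passes (assumed)."""
    h = x.copy()
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build envelopes
        # Subtract the mean of the upper and lower envelopes.
        mean_env = 0.5 * (envelope(maxima, h) + envelope(minima, h))
        h = h - mean_env
    return h


def emd(x, max_imfs=6):
    """Decompose x into a list of modes and a final residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # residue no longer oscillates enough
        imf = sift_imf(residue)
        imfs.append(imf)
        residue = residue - imf
    return imfs, residue


# Synthetic two-tone signal (hypothetical stand-in for a speech segment).
fs = 8000.0
t = np.arange(800) / fs
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)

imfs, residue = emd(x)
print("number of modes extracted:", len(imfs))
print("reconstruction exact:", np.allclose(sum(imfs) + residue, x))
```

By construction, the extracted modes and the residue sum back to the original signal, which is a convenient sanity check for any EMD implementation. A fixed pass count is used here only for brevity; practical implementations normally stop sifting based on a convergence criterion.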
About this article
Cite this article
Kaleem, M., Ghoraani, B., Guergachi, A. et al. Pathological speech signal analysis and classification using empirical mode decomposition. Med Biol Eng Comput 51, 811–821 (2013). https://doi.org/10.1007/s11517-013-1051-8
Keywords
- Empirical mode decomposition
- Speech signal analysis
- Feature extraction
- Pathological speech classification