Abstract
Both pre-processing (feature extraction) and pattern classification techniques are discussed in this chapter. Traditionally, specialised parameters have been used for the analysis of speech disorders: harmonics-to-noise ratio, jitter, shimmer, and others. These were devised with expert input from speech and language therapists and other professionals. They are typically calculated using widely available software packages, but trained personnel are still needed to collect and prepare the recordings and to interpret the resulting parameters. More recently, researchers have also investigated many of the parameters or features used in speech and speaker recognition. Features such as the ubiquitous mel-frequency cepstral coefficients are often used, but so are numerous less common ones, such as formant frequencies, modulation spectra, chaos-theory parameters, and prosodic and phonological features. Each of these has had some success, but the most successful systems have generally combined multiple features and/or multiple classification algorithms. Numerous methods have been devised for discriminating between disordered and normal speech, and sometimes between different forms of speech disorder. They have typically been based on neural networks, Markov models, support vector machines, and other classifiers (both linear and non-linear), although Gaussian mixture models are probably the most widely used, robust, and successful so far.
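As an illustration of the traditional perturbation parameters named above, the following is a minimal sketch of "local" jitter and shimmer. It assumes the commonly used definition (mean absolute difference between consecutive cycles, divided by the mean value) and that pitch periods and per-cycle peak amplitudes have already been extracted from a sustained vowel. The function names and the sample measurements are hypothetical and are not taken from the chapter.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods_s):
    """Cycle-to-cycle variation of the pitch period (as a fraction)."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (as a fraction)."""
    return local_perturbation(peak_amplitudes)


if __name__ == "__main__":
    # Hypothetical measurements: five consecutive glottal cycles.
    periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]   # seconds
    amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]          # arbitrary units
    print(f"jitter (local):  {100 * jitter_local(periods):.2f} %")
    print(f"shimmer (local): {100 * shimmer_local(amplitudes):.2f} %")
```

In practice the difficult step is the one this sketch assumes away: reliable period and amplitude extraction from disordered voices, which is part of why trained personnel are still needed to prepare recordings and interpret results.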
© 2013 The Author(s)
About this chapter
Cite this chapter
Baghai-Ravary, L., Beet, S.W. (2013). Established Methods. In: Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4574-6_5
DOI: https://doi.org/10.1007/978-1-4614-4574-6_5
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4573-9
Online ISBN: 978-1-4614-4574-6
eBook Packages: Engineering, Engineering (R0)