Established Methods

Baghai-Ravary, Ladan; Beet, Steve W.

doi:10.1007/978-1-4614-4574-6_5

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

Both pre-processing (feature extraction) and pattern classification techniques are discussed in this chapter. Traditionally, specialised parameters have been used for the analysis of speech disorders: harmonic-to-noise ratio, jitter, shimmer, and others. These have been devised using expert opinions from speech and language therapists and other professionals. They are typically calculated using widely available software packages, but still require trained personnel to collect and prepare the recordings, as well as to interpret the resulting parameters. More recently, researchers have also investigated many of the parameters or features used in speech and speaker recognition. Features such as the ubiquitous mel-frequency cepstral coefficients are often used, but so are numerous less common methods, such as formant frequencies, modulation spectra, chaos-theory parameters, and prosodic and phonological features. Each of these has had its fair share of success, but the most successful systems have generally used a combination of multiple features and/or multiple classification algorithms. Numerous methods for discriminating between disordered and normal speech, and sometimes between different forms of speech disorder, have been devised. They have typically been based on neural networks, Markov models, support vector machines, and other classifiers (both linear and non-linear), although Gaussian Mixture Models are probably the most widely used, robust, and successful so far.
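
As an illustration of the traditional perturbation parameters named in the abstract, the sketch below computes jitter and shimmer from per-cycle measurements. It assumes the common "local" definitions (mean absolute difference between consecutive cycles, divided by the mean value), which may differ from the variants discussed in the chapter itself. The function names and the input format (a list of pitch periods in seconds, a list of peak amplitudes per cycle) are hypothetical; extracting those values from a recording is a separate step that is not shown.

```python
from statistics import mean


def _local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    # Absolute difference between each cycle and the one that follows it.
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods):
    """Local jitter: relative cycle-to-cycle variation of the pitch period.

    `periods` is an assumed input: consecutive glottal cycle durations in seconds.
    """
    return _local_perturbation(periods)


def local_shimmer(amplitudes):
    """Local shimmer: relative cycle-to-cycle variation of peak amplitude.

    `amplitudes` is an assumed input: one positive peak amplitude per cycle.
    """
    return _local_perturbation(amplitudes)


if __name__ == "__main__":
    # Made-up values for illustration only, not measured data.
    example_periods = [0.0100, 0.0102, 0.0099, 0.0101]
    example_amplitudes = [0.80, 0.78, 0.82, 0.79]
    print(f"jitter:  {local_jitter(example_periods):.4f}")
    print(f"shimmer: {local_shimmer(example_amplitudes):.4f}")
```

Both functions return a ratio; multiplying by 100 gives the percentage form in which these parameters are usually reported.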

Author information

Authors and Affiliations

Phonetics Laboratory, University of Oxford, Oxford, OX1 2JF, UK
Ladan Baghai-Ravary & Steve W. Beet

Corresponding author

Correspondence to Ladan Baghai-Ravary.

About this chapter

Cite this chapter

Baghai-Ravary, L., Beet, S.W. (2013). Established Methods. In: Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4574-6_5

DOI: https://doi.org/10.1007/978-1-4614-4574-6_5
Published: 08 August 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4573-9
Online ISBN: 978-1-4614-4574-6
eBook Packages: Engineering, Engineering (R0)