Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

Both pre-processing (feature extraction) and pattern classification techniques are discussed in this chapter. Traditionally, specialised parameters have been used for the analysis of speech disorders: harmonic-to-noise ratio, jitter, shimmer, and others. These have been devised using expert opinions from speech and language therapists and other professionals. They are typically calculated using widely available software packages, but still require trained personnel to collect and prepare the recordings, as well as to interpret the resulting parameters. More recently, researchers have also investigated many of the parameters or features used in speech and speaker recognition. Features such as the ubiquitous mel-frequency cepstral coefficients are often used, but so are numerous less common methods, such as formant frequencies, modulation spectra, chaos-theory parameters, and prosodic and phonological features. Each of these has had its fair share of success, but the most successful systems have generally used a combination of multiple features and/or multiple classification algorithms. Numerous methods for discriminating between disordered and normal speech, and sometimes between different forms of speech disorder, have been devised. They have typically been based on neural networks, Markov models, support vector machines, and other classifiers (both linear and non-linear), although Gaussian Mixture Models are probably the most widely used, robust, and successful so far.
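As an illustration of the traditional perturbation parameters named in the abstract, the sketch below computes jitter and shimmer under the commonly used "local" definition: the mean absolute difference between consecutive cycles, divided by the mean over all cycles. This is an assumption about the definition, since the preview does not show the chapter's own formulas. The function names and the sample values are hypothetical, and the code assumes that glottal cycle periods and peak amplitudes have already been extracted from a sustained vowel.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean value.

    Assumed "local" definition; other variants (e.g. smoothed over
    several cycles) exist and give different numbers.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods_s):
    """Jitter: cycle-to-cycle variation of the pitch period (seconds)."""
    return local_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Shimmer: cycle-to-cycle variation of the peak amplitude."""
    return local_perturbation(peak_amplitudes)


if __name__ == "__main__":
    # Hypothetical measurements from five consecutive glottal cycles.
    periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
    amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]
    print(f"jitter (local):  {100 * local_jitter(periods):.2f} %")
    print(f"shimmer (local): {100 * local_shimmer(amplitudes):.2f} %")
```

Both measures are zero for a perfectly periodic signal and grow with cycle-to-cycle irregularity, which is why they still depend on reliable period detection and careful recording, as the abstract notes.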


Corresponding author

Correspondence to Ladan Baghai-Ravary.

Copyright information

© 2013 The Author(s)

About this chapter

Cite this chapter

Baghai-Ravary, L., Beet, S.W. (2013). Established Methods. In: Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4574-6_5

  • DOI: https://doi.org/10.1007/978-1-4614-4574-6_5

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-4573-9

  • Online ISBN: 978-1-4614-4574-6

  • eBook Packages: Engineering (R0)
