Approach for Spectral Analysis in Detection of Selected Pronunciation Pathologies

  • Michał Kręcichwost
  • Piotr Rasztabiga
  • Andre Woloshuk
  • Paweł Badura
  • Zuzanna Miodońska
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 925)


A framework for semi-automated detection of selected types of sigmatism is presented in this paper. A database of speech recordings was collected, containing the sibilant /s/ surrounded by vowels in different articulation phases. Recordings of three pronunciation modes were included in the database: normal, simulated lateral sigmatism, and simulated interdental sigmatism. The data were collected under the supervision of a speech therapy expert, who also labelled and annotated each database entry. Twenty-eight features of four types were extracted from each time frame within the sibilant: mel-frequency cepstral coefficients, filter bank energies, spectral brightness, and zero-crossing rate. A feature aggregation procedure that weights the influence of the time frame location was proposed to describe each phoneme by a single feature vector. At the three-class classification stage, two tools were employed and compared: the random forest and the support vector machine. The support vector machine provides more accurate and repeatable classification results in each articulation phase, with median sensitivity, specificity, and accuracy exceeding 0.71, 0.85, and 0.80, respectively. The results also show that the assessment is generally more efficient when the phoneme is located at the beginning or end of the word than when it is in the middle.
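The per-frame feature extraction and location-weighted aggregation described above can be illustrated with a minimal sketch. It covers only two of the four feature types (zero-crossing rate and spectral brightness) and leaves out the mel-frequency cepstral coefficients and filter bank energies. The abstract does not specify the weighting function, the frame parameters, or the brightness cutoff, so the Hann-shaped weights, the 400-sample frames with a 160-sample hop, and the 1500 Hz cutoff are all assumptions made for illustration. The function names are hypothetical as well.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def spectral_brightness(frames, fs, cutoff_hz=1500.0):
    """Share of spectral energy at or above cutoff_hz, per frame.

    The 1500 Hz cutoff is an assumed value, not taken from the paper.
    """
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / fs)
    total = np.maximum(power.sum(axis=1), 1e-12)  # avoid division by zero
    return power[:, freqs >= cutoff_hz].sum(axis=1) / total


def aggregate(features, weights=None):
    """Weighted mean over frames; features has shape (n_frames, n_features).

    Default weights are Hann-shaped, so central frames count more than
    frames near the phoneme boundaries (an assumed weighting scheme).
    """
    n = features.shape[0]
    if weights is None:
        weights = np.hanning(n + 2)[1:-1]  # drop the zero-valued endpoints
    weights = weights / weights.sum()
    return weights @ features


def describe_segment(x, fs, frame_len=400, hop=160):
    """Return one feature vector [ZCR, brightness] for a whole segment."""
    frames = frame_signal(x, frame_len, hop)
    per_frame = np.column_stack(
        [zero_crossing_rate(frames), spectral_brightness(frames, fs)]
    )
    return aggregate(per_frame)


# Synthetic stand-ins for real recordings: a high and a low pure tone.
fs = 16000
t = np.arange(fs // 10) / fs  # 100 ms
vec_high = describe_segment(np.sin(2 * np.pi * 5000 * t + 0.3), fs)
vec_low = describe_segment(np.sin(2 * np.pi * 200 * t + 0.3), fs)
print("high tone [ZCR, brightness]:", vec_high)
print("low tone  [ZCR, brightness]:", vec_low)
```

In this sketch the high tone yields a larger zero-crossing rate and a brightness close to one, while the low tone yields values close to zero for both. The resulting single vector per segment is the kind of input a random forest or support vector machine classifier would take.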


Keywords: Computer-aided pronunciation evaluation; Sibilants; Sigmatism diagnosis



This research was supported by the Polish Ministry of Science and Silesian University of Technology statutory financial support No. BK-209/RIB1/2018.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Michał Kręcichwost (1, corresponding author)
  • Piotr Rasztabiga (1)
  • Andre Woloshuk (2)
  • Paweł Badura (1)
  • Zuzanna Miodońska (1)

  1. Faculty of Biomedical Engineering, Silesian University of Technology, Zabrze, Poland
  2. Weldon School of Biomedical Engineering, Purdue University, West Lafayette, USA
