Language-Independent Age Estimation from Speech Using Phonological and Phonemic Features

  • Tino Haderlein
  • Catherine Middag
  • Florian Hönig
  • Jean-Pierre Martens
  • Michael Döllinger
  • Anne Schützenberger
  • Elmar Nöth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9302)

Abstract

Language-independent and alignment-free phonological and phonemic features were applied for automatic age estimation based on voice and speech properties. 110 persons (average: 75.7 years) read the German version of the text “The North Wind and the Sun”. For comparison with the automatic approach, five listeners estimated the speakers’ age perceptually. Support Vector Regression and feature selection were used to compute the best model of aging. This model was found to use the following features: (a) the percentage of voiced frames, (b) eight phonological features, representing vowel height, nasality in consonants, turbulence, and position of the lips, and finally, (c) seven phonemic features. The latter features might be relevant due to altered articulation because of dentures. The mean absolute error between computed and chronological age was 5.2 years (RMSE: 7.0). It was 7.7 years (RMSE: 9.6) for an optimistic trivial estimator and 10.5 years (RMSE: 11.9) for the average listener.
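The two error measures quoted in the abstract, and the comparison against a trivial estimator, can be illustrated with a minimal sketch. It assumes that the trivial estimator predicts the mean chronological age for every speaker; the abstract does not define it, so this is only one plausible reading. The ages and model outputs below are invented for illustration and are not data from the study, and the function names are hypothetical.

```python
import math
from statistics import mean


def mae(predicted, actual):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(mean((p - a) ** 2 for p, a in zip(predicted, actual)))


def trivial_estimate(actual):
    """Assumed baseline: predict the mean age for every speaker."""
    m = mean(actual)
    return [m] * len(actual)


# Made-up values for illustration only; not taken from the paper.
ages = [60, 70, 80, 90]          # chronological ages
model_output = [62, 69, 84, 85]  # hypothetical model predictions

baseline = trivial_estimate(ages)
print("model:    MAE", mae(model_output, ages), "RMSE", rmse(model_output, ages))
print("baseline: MAE", mae(baseline, ages), "RMSE", rmse(baseline, ages))
```

Because RMSE squares each error before averaging, it is never smaller than MAE and grows faster when a few estimates are far off, which is consistent with every RMSE in the abstract exceeding the corresponding mean absolute error.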

Keywords

Age estimation, Phonological features, Phonemic features, SVR

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Tino Haderlein (1)
  • Catherine Middag (2)
  • Florian Hönig (1)
  • Jean-Pierre Martens (2)
  • Michael Döllinger (3)
  • Anne Schützenberger (3)
  • Elmar Nöth (1)

  1. Lehrstuhl für Mustererkennung (Informatik 5), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
  2. Vakgroep voor Elektronica en Informatiesystemen (ELIS), Universiteit Gent, Gent, Belgium
  3. Phoniatrische und pädaudiologische Abteilung in der HNO-Klinik, Klinikum der Universität Erlangen-Nürnberg, Erlangen, Germany
