A Comparison of Human and Machine Estimation of Speaker Age

  • Mark Huckvale
  • Aimee Webb
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9449)


The estimation of the age of a speaker from his or her voice has both forensic and commercial applications. Previous studies have shown that human listeners are able to estimate the age of a speaker to within 10 years on average, while recent machine age estimation systems seem to show superior performance, with average errors as low as 6 years. However, the machine studies have used highly non-uniform test sets, for which knowledge of the age distribution offers considerable advantage to the system. In this study we compare human and machine performance on the same test data, chosen to be uniformly distributed in age. We show that in this case human and machine accuracy are more similar, with average errors of 9.8 and 8.6 years respectively, although if panels of listeners are consulted, human accuracy can be improved to a value closer to 7.5 years. Both humans and machines have difficulty in accurately predicting the ages of older speakers.
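Two points in the abstract can be illustrated with a small simulation: a non-uniform test set rewards a system that merely knows the age distribution, and pooling several listeners can reduce the average error. The sketch below is not the method used in the paper. The age ranges, the error spreads, the panel size and the function names are all hypothetical, chosen only to make the effect visible. It assumes that each listener's error is the sum of a component shared by all listeners for a given speaker and an independent individual component.

```python
import random
import statistics


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


def constant_baseline_mae(test_ages):
    """Error of a 'system' that ignores the audio and always outputs the
    median age of the test set, i.e. uses only the age distribution."""
    guess = statistics.median(test_ages)
    return mean_absolute_error(test_ages, [guess] * len(test_ages))


def panel_estimates(true_ages, panel_size, shared_sd, listener_sd, rng):
    """Simulated panel estimate per speaker (hypothetical error model).

    shared_sd:   spread of the error all listeners make for a speaker
    listener_sd: spread of each listener's own independent error
    """
    estimates = []
    for age in true_ages:
        shared_error = rng.gauss(0.0, shared_sd)
        guesses = [age + shared_error + rng.gauss(0.0, listener_sd)
                   for _ in range(panel_size)]
        estimates.append(statistics.fmean(guesses))
    return estimates


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical test sets: one uniform in age, one concentrated around 30.
uniform_ages = [rng.uniform(20, 80) for _ in range(5000)]
skewed_ages = [min(80.0, max(20.0, rng.gauss(30, 8))) for _ in range(5000)]

uniform_baseline = constant_baseline_mae(uniform_ages)
skewed_baseline = constant_baseline_mae(skewed_ages)

# Single listener versus a panel of ten, both on the uniform test set.
single_mae = mean_absolute_error(
    uniform_ages, panel_estimates(uniform_ages, 1, 8.0, 9.0, rng))
panel_mae = mean_absolute_error(
    uniform_ages, panel_estimates(uniform_ages, 10, 8.0, 9.0, rng))

print(f"constant guess, uniform test set: {uniform_baseline:.1f} years")
print(f"constant guess, skewed test set:  {skewed_baseline:.1f} years")
print(f"single listener:                  {single_mae:.1f} years")
print(f"panel of ten listeners:           {panel_mae:.1f} years")
```

Under these assumptions the constant guess scores far better on the skewed set than on the uniform one without using any acoustic information, which is why error figures from differently distributed test sets are hard to compare. Averaging removes only the independent part of the listeners' errors, so the panel improves on a single listener but cannot fall below the error that all listeners share.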


Keywords: Speaker profiling, Speaker age prediction, Computational paralinguistics



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Speech, Hearing and Phonetic Sciences, University College London, London, UK
