
A Comparison of Human and Machine Estimation of Speaker Age

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9449)


The estimation of a speaker's age from his or her voice has both forensic and commercial applications. Previous studies have shown that human listeners can estimate a speaker's age to within 10 years on average, while recent machine age estimation systems appear to perform better, with average errors as low as 6 years. However, those machine studies used highly non-uniform test sets, for which knowledge of the age distribution gives the system a considerable advantage. In this study we compare human and machine performance on the same test data, chosen to be uniformly distributed in age. We show that in this case human and machine accuracy are more similar, with average errors of 9.8 and 8.6 years respectively, although if panels of listeners are consulted, human accuracy can be improved to around 7.5 years. Both humans and machines have difficulty accurately predicting the ages of older speakers.
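The two methodological points in the abstract (a skewed test set rewards knowing the age distribution, and pooling listeners reduces error) can be illustrated with a toy calculation. The sketch below is only illustrative: it assumes "average error" means mean absolute error in years, and the age lists, listener judgements, the "distribution-only" predictor and simple mean pooling of panel judgements are all hypothetical choices made for exposition, not the data, systems or pooling method used in the chapter.

```python
from statistics import mean, median


def mae(true_ages, predicted_ages):
    """Mean absolute error between true and predicted ages, in years."""
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


def distribution_only_mae(test_ages):
    """MAE of a hypothetical 'system' that never listens to the voice: it always
    predicts the median test-set age (the constant that minimises MAE)."""
    guess = median(test_ages)
    return mae(test_ages, [guess] * len(test_ages))


def individual_mae(true_ages, listener_estimates):
    """Average error over every single listener judgement (no pooling)."""
    pairs = [(age, est) for age, ests in zip(true_ages, listener_estimates) for est in ests]
    return mae([a for a, _ in pairs], [e for _, e in pairs])


def panel_mae(true_ages, listener_estimates):
    """Error when each speaker's age is the mean of a panel's estimates
    (simple averaging is an assumption, not necessarily the paper's method)."""
    panel_guesses = [mean(ests) for ests in listener_estimates]
    return mae(true_ages, panel_guesses)


# Hypothetical test sets, NOT the paper's data:
# mostly speakers in their twenties plus a thin tail of older speakers ...
skewed = list(range(20, 30)) * 8 + list(range(30, 80, 5))
# ... versus exactly one speaker at every age from 20 to 79.
uniform = list(range(20, 80))

print(round(distribution_only_mae(skewed), 2))   # 5.28
print(round(distribution_only_mae(uniform), 2))  # 15.0

# Hypothetical judgements from three listeners for speakers aged 30, 50 and 70.
ages = [30, 50, 70]
judgements = [[24, 38, 33], [41, 57, 46], [58, 66, 61]]
print(round(individual_mae(ages, judgements), 2))  # 6.89
print(round(panel_mae(ages, judgements), 2))       # 4.0
```

On the skewed set, a predictor that ignores the audio entirely already achieves an error of about 5.3 years, while the same strategy errs by 15 years on the uniform set, which is why a uniform test set gives a fairer comparison. In the toy panel, averaging helps because individual over- and under-estimates partly cancel; where all listeners err in the same direction (here, the 70-year-old), pooling helps much less.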


  • Speaker profiling
  • Speaker age prediction
  • Computational paralinguistics

  • DOI: 10.1007/978-3-319-25789-1_11
  • Chapter length: 12 pages
Figs. 1–8 (figures not included in this preview).



Author information



Corresponding author

Correspondence to Mark Huckvale.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Huckvale, M., Webb, A. (2015). A Comparison of Human and Machine Estimation of Speaker Age. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25788-4

  • Online ISBN: 978-3-319-25789-1

  • eBook Packages: Computer Science, Computer Science (R0)