A Comparison of Human and Machine Estimation of Speaker Age
- Cite this paper as:
- Huckvale M., Webb A. (2015) A Comparison of Human and Machine Estimation of Speaker Age. In: Dediu AH., Martín-Vide C., Vicsi K. (eds) Statistical Language and Speech Processing. Lecture Notes in Computer Science, vol 9449. Springer, Cham
Estimating the age of a speaker from his or her voice has both forensic and commercial applications. Previous studies have shown that human listeners can estimate the age of a speaker to within 10 years on average, while recent machine age estimation systems appear to perform better, with average errors as low as 6 years. However, the machine studies have used highly non-uniform test sets, for which knowledge of the age distribution offers the system a considerable advantage. In this study we compare human and machine performance on the same test data, chosen to be uniformly distributed in age. We show that in this case human and machine accuracy are more similar, with average errors of 9.8 and 8.6 years respectively, although if panels of listeners are consulted, human accuracy can be improved to a value closer to 7.5 years. Both humans and machines have difficulty in accurately predicting the ages of older speakers.
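The point about non-uniform test sets can be illustrated with a small sketch. It assumes that "average error" means mean absolute error in years, and it uses two synthetic age lists invented purely for illustration; they are not data from the study. The function names are hypothetical. The "system" here ignores the voice entirely and always guesses the median age of the test set, which is the kind of advantage that knowledge of the age distribution could provide.

```python
from statistics import median


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference, in years, between true and predicted ages."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


def constant_guess_error(ages):
    """Error of a guesser that ignores the voice and always answers the median age."""
    guess = median(ages)
    return mean_absolute_error(ages, [guess] * len(ages))


# Synthetic test sets (assumptions for illustration, not data from the study).
# One speaker at every age from 20 to 80: uniformly distributed in age.
uniform_ages = list(range(20, 81))
# Mostly speakers in their twenties, plus a handful of older speakers.
skewed_ages = list(range(20, 31)) * 5 + [40, 50, 60, 70, 80]

uniform_error = constant_guess_error(uniform_ages)
skewed_error = constant_guess_error(skewed_ages)

print(f"constant guess, uniform set: {uniform_error:.1f} years")
print(f"constant guess, skewed set:  {skewed_error:.1f} years")
```

Under these assumed sets the constant guess scores a markedly lower error on the skewed set, even though it uses no information from the voice. This is why a comparison on test data uniformly distributed in age is a fairer basis for comparing human and machine performance.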