Audio Features Selection for Automatic Height Estimation from Speech
Aiming at the automatic estimation of a person's height from speech, we investigate the applicability of various subsets of speech features, formed by ranking the relevance and individual quality of numerous audio features. Specifically, based on a relevance ranking of the large set of openSMILE audio descriptors, we selected subsets of different sizes and evaluated them on the height estimation task. In brief, during speech parameterization, every input utterance is converted to a single feature vector consisting of 6552 parameters. Next, a subset of this feature vector is fed to a support vector machine (SVM)-based regression model, which directly estimates the height of an unknown speaker. The experimental evaluation on the TIMIT database demonstrated that: (i) the feature vector composed of the top-50 ranked parameters offers a good trade-off between computational demands and accuracy, and (ii) the best accuracy, in terms of mean absolute error and root mean square error, is obtained with the top-200 subset.
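The pipeline described above can be sketched as follows. This is a minimal illustration using synthetic data: scikit-learn's `f_regression` stands in for the paper's relevance-ranking method, and the random features stand in for actual openSMILE descriptors; the variable names and parameter values are assumptions, not the authors' configuration.

```python
# Sketch: rank a large per-utterance feature vector by relevance to the
# height target, keep the top-k descriptors, and regress height with an
# SVM. Synthetic data; f_regression is a stand-in ranking criterion.
import numpy as np
from sklearn.feature_selection import f_regression
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n_utterances, n_features = 300, 6552        # one 6552-dim vector per utterance
X = rng.normal(size=(n_utterances, n_features))
# Synthetic target: height (cm) depends on a few of the features.
height_cm = 170.0 + X[:, :10].sum(axis=1) + rng.normal(scale=2.0, size=n_utterances)

# Rank all features by relevance to the target and keep the top 50.
scores, _ = f_regression(X, height_cm)
top_k = np.argsort(scores)[::-1][:50]

X_train, X_test = X[:250, :][:, top_k], X[250:, :][:, top_k]
y_train, y_test = height_cm[:250], height_cm[250:]

# SVM regression on the selected subset, as in the evaluation above.
model = SVR(kernel="rbf", C=10.0).fit(X_train, y_train)
pred = model.predict(X_test)
print(f"MAE on held-out utterances: {mean_absolute_error(y_test, pred):.2f} cm")
```

The same loop can be repeated for several subset sizes (e.g., top-50 through top-200) to reproduce the trade-off between feature count and estimation error that the abstract reports.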
Keywords: height estimation from speech; speech parameterization; feature ranking; feature selection; SVM regression models