Abstract
This paper presents an automatic speech-based scheme for classifying speaker characteristics. In the training phase, speech data are grouped by the speakers’ gender, age and accent. Voice features are then extracted into feature vectors, which are used to train speaker characteristic models with three techniques: Vector Quantization, Gaussian Mixture Model and Support Vector Machine. The classification results from those groups are then fused to obtain the final classification result for each characteristic. The Australian National Database of Spoken Language (ANDOSL) corpus was used to evaluate gender, age and accent classification. Experiments showed high performance for the proposed classification scheme.
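The train-then-fuse pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption, not the paper's method: the feature vectors are synthetic 12-dimensional Gaussian data rather than ANDOSL speech features, the codebook size is arbitrary, a single diagonal Gaussian stands in for a full Gaussian Mixture Model, the Support Vector Machine is omitted for brevity, and the fusion rule (summing softmax-normalised scores) is one plausible choice since the abstract does not state which rule was used. All function names are hypothetical.

```python
import numpy as np

def train_codebook(X, k, iters=20, seed=0):
    """Plain k-means (Lloyd) codebook for one speaker group."""
    rng = np.random.default_rng(seed)
    codebook = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every vector to its nearest codeword.
        d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        for j in range(k):
            members = X[nearest == j]
            if len(members):  # keep the old codeword if a cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook

def vq_distortion(X, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def fit_gaussian(X):
    """Diagonal Gaussian: a one-component simplification of a GMM (assumption)."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def gaussian_loglik(X, model):
    """Average per-vector log-likelihood under a diagonal Gaussian."""
    mean, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var).sum(axis=1)
    return ll.mean()

def softmax(scores):
    """Turn raw scores into comparable weights that sum to one."""
    z = np.asarray(scores, dtype=float)
    z = z - z.max()  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify_fused(X, codebooks, gaussians):
    """Score-level fusion (assumed rule): sum normalised scores, pick the best."""
    labels = sorted(codebooks)
    vq_scores = softmax([-vq_distortion(X, codebooks[c]) for c in labels])
    gm_scores = softmax([gaussian_loglik(X, gaussians[c]) for c in labels])
    fused = vq_scores + gm_scores
    return labels[int(fused.argmax())]

# Synthetic stand-in for feature vectors grouped by one characteristic (gender).
rng = np.random.default_rng(42)
train = {
    "female": rng.normal(loc=2.0, scale=1.0, size=(200, 12)),
    "male": rng.normal(loc=-2.0, scale=1.0, size=(200, 12)),
}
codebooks = {c: train_codebook(X, k=4) for c, X in train.items()}
gaussians = {c: fit_gaussian(X) for c, X in train.items()}

test_female = rng.normal(loc=2.0, scale=1.0, size=(50, 12))
test_male = rng.normal(loc=-2.0, scale=1.0, size=(50, 12))
print(classify_fused(test_female, codebooks, gaussians))
print(classify_fused(test_male, codebooks, gaussians))
```

The same structure would repeat for age and accent, with one set of models per speaker group and one fused decision per characteristic.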
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nguyen, P., Tran, D., Huang, X., Sharma, D. (2010). Automatic Speech-Based Classification of Gender, Age and Accent. In: Kang, BH., Richards, D. (eds) Knowledge Management and Acquisition for Smart Systems and Services. PKAW 2010. Lecture Notes in Computer Science, vol 6232. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15037-1_24
DOI: https://doi.org/10.1007/978-3-642-15037-1_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15036-4
Online ISBN: 978-3-642-15037-1
eBook Packages: Computer Science, Computer Science (R0)