RANSAC-Based Training Data Selection on Spectral Features for Emotion Recognition from Spontaneous Speech

Bozkurt, Elif; Erzin, Engin; Erdem, Çiǧdem Eroǧlu; Erdem, A. Tanju

doi:10.1007/978-3-642-25775-9_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6800)

Abstract

Training datasets containing spontaneous emotional speech are often imperfect due to the ambiguities and difficulties of labeling such data by human observers. In this paper, we present a Random Sample Consensus (RANSAC) based training approach for the problem of emotion recognition from spontaneous speech recordings. Our motivation is to insert a data cleaning process into the training phase of the Hidden Markov Models (HMMs) in order to remove suspicious label instances that may exist in the training dataset. Our experiments using HMMs with Mel Frequency Cepstral Coefficients (MFCC) and Line Spectral Frequency (LSF) features indicate that using RANSAC in the training phase improves the unweighted recall rates on the test set. Experimental studies performed over the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF and MFCC based classifiers provide further significant performance improvements.
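
To make the idea in the abstract concrete, the following is a minimal sketch of RANSAC-style training data selection and of the unweighted recall measure. It is not the authors' implementation: a toy nearest-centroid classifier stands in for the HMMs, the feature vectors are made-up 2-D points rather than MFCC or LSF features, and all function names and parameters (`ransac_select`, `sample_per_class`, and so on) are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def unweighted_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally regardless of size."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def fit_centroids(X, y):
    """Toy stand-in for HMM training (assumption): one mean vector per class."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def ransac_select(X, y, n_iter=50, sample_per_class=3, seed=0):
    """Return indices of the largest consensus set found over n_iter random draws."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    best = []
    for _ in range(n_iter):
        # Draw a small random subset of each class and fit a model on it.
        sample = [i for idx in by_class.values()
                  for i in rng.sample(idx, min(sample_per_class, len(idx)))]
        model = fit_centroids([X[i] for i in sample], [y[i] for i in sample])
        # Consensus set: instances whose given label agrees with the model.
        consensus = [i for i in range(len(X)) if predict(model, X[i]) == y[i]]
        if len(consensus) > len(best):
            best = consensus
    return best


# Two well-separated classes plus one deliberately mislabeled instance (index 10).
X = [(0.0 + 0.1 * k, 0.0) for k in range(5)] + \
    [(5.0 + 0.1 * k, 5.0) for k in range(5)] + [(0.2, 0.1)]
y = ["neg"] * 5 + ["pos"] * 5 + ["pos"]
kept = ransac_select(X, y)
# Retrain on the consensus set only, i.e. on the cleaned training data.
clean_model = fit_centroids([X[i] for i in kept], [y[i] for i in kept])
```

In this toy setup the suspicious instance at index 10 falls outside every consensus set, so the final model is trained without it. Unweighted recall averages the recall of each class, which keeps a majority class from dominating the score on an imbalanced test set.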

Author information

Authors and Affiliations

Multimedia, Vision and Graphics Laboratory, College of Engineering, Koç University, 34450, Sariyer, Istanbul, Turkey
Elif Bozkurt & Engin Erzin
Department of Electrical and Electronics Engineering, Bahçeşehir University, 34349, Beşiktaş, Istanbul, Turkey
Çiǧdem Eroǧlu Erdem
Department of Electrical and Electronics Engineering, Özyeǧin University, 34662, Üsküdar, Istanbul, Turkey
A. Tanju Erdem

Editor information

Editors and Affiliations

Dept. of Psychology and IIASS, International Institute for Advanced Scientific Studies, Second University of Naples, Vietri sul Mare, SA, Italy
Anna Esposito
School of Computing Science, University of Glasgow, Glasgow, UK
Alessandro Vinciarelli
Department of Telecommunication and Media Informatics, Laboratory of Speech Acoustics, Budapest University of Technology and Economics, 1117, Budapest, Hungary
Klára Vicsi
TELECOM ParisTech, CNRS-LTCI UMR 5141, 75014, Paris, France
Catherine Pelachaud
Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 7500 AE, Enschede, The Netherlands
Anton Nijholt

About this paper

Cite this paper

Bozkurt, E., Erzin, E., Erdem, Ç.E., Erdem, A.T. (2011). RANSAC-Based Training Data Selection on Spectral Features for Emotion Recognition from Spontaneous Speech. In: Esposito, A., Vinciarelli, A., Vicsi, K., Pelachaud, C., Nijholt, A. (eds) Analysis of Verbal and Nonverbal Communication and Enactment. The Processing Issues. Lecture Notes in Computer Science, vol 6800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25775-9_3

DOI: https://doi.org/10.1007/978-3-642-25775-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25774-2
Online ISBN: 978-3-642-25775-9
eBook Packages: Computer Science; Computer Science (R0)
