
Emotional Content Comparison in Speech Signal Using Feature Embedding

Chapter in: Progresses in Artificial Intelligence and Neural Systems

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 184)


Abstract

Expressive speech processing has improved in recent years. However, it remains hard to detect emotion changes within a single speech signal, or to compare the emotional content of a pair of speech signals, especially with unlabeled data. Therefore, feature embedding is used in this work to enhance the comparison of emotional content between pairs of speech signals, cast as a classification task. Feature embedding has been shown to reduce both the dimensionality and the intra-feature variance of the input space. Moreover, deep autoencoders have recently been used as a feature embedding tool in several applications, such as image, gene, and chemical data classification. In this work, a deep autoencoder is used for feature embedding before classifying, by vector quantization, the emotional content of pairs of speech signals. Autoencoding was performed following two schemes: over all features jointly, and over each group of features separately. The results show that the autoencoder succeeds (a) in revealing a more compact and clearly separated structure of the mapped features, and (b) in improving the classification rates for the similarity/dissimilarity of all compared aspects of emotional content, i.e., neutrality, arousal, and valence, which are then used to compute the emotion identity metric.
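A minimal sketch may help fix ideas about the two-stage pipeline the abstract describes: a deep autoencoder learns a low-dimensional embedding of acoustic feature vectors, and a vector-quantization step, approximated here by a simple k-means-style codebook, decides whether two embedded signals carry the same emotional content. This is an illustration under stated assumptions, not the authors' implementation: the feature dimension (88, the size of an eGeMAPS-style feature vector), the layer widths, the codebook size, and the same-codeword pairing rule are all hypothetical choices.

```python
# Minimal sketch (not the authors' code) of the pipeline in the abstract:
# (1) a deep autoencoder embeds acoustic feature vectors, and
# (2) a vector-quantization step, approximated by a k-means-style codebook,
#     decides whether two embedded signals are emotionally similar.
# Feature dimension, layer widths, and codebook size are assumptions.
import torch
import torch.nn as nn

FEAT_DIM = 88   # e.g. an eGeMAPS-sized acoustic feature vector (assumption)
EMB_DIM = 8     # bottleneck (embedding) size (assumption)

class DeepAutoencoder(nn.Module):
    def __init__(self, in_dim=FEAT_DIM, emb_dim=EMB_DIM):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, emb_dim))
        self.decoder = nn.Sequential(
            nn.Linear(emb_dim, 32), nn.ReLU(),
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, in_dim))

    def forward(self, x):
        z = self.encoder(x)          # low-dimensional embedding
        return self.decoder(z), z    # reconstruction and embedding

def train_autoencoder(model, feats, epochs=200, lr=1e-3):
    """Unsupervised training: minimize the reconstruction error."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        recon, _ = model(feats)
        loss = nn.functional.mse_loss(recon, feats)
        loss.backward()
        opt.step()
    return model

def build_codebook(emb, n_codes=4, iters=20):
    """Toy k-means codebook over the embeddings (stand-in for the VQ step)."""
    codes = emb[torch.randperm(emb.size(0))[:n_codes]].clone()
    for _ in range(iters):
        assign = torch.cdist(emb, codes).argmin(dim=1)
        for k in range(n_codes):
            members = emb[assign == k]
            if members.numel() > 0:
                codes[k] = members.mean(dim=0)
    return codes

def same_emotion(codes, z_a, z_b):
    """Pair comparison: similar if both embeddings quantize to the same code."""
    k_a = torch.cdist(z_a.unsqueeze(0), codes).argmin()
    k_b = torch.cdist(z_b.unsqueeze(0), codes).argmin()
    return bool(k_a == k_b)

if __name__ == "__main__":
    feats = torch.randn(200, FEAT_DIM)   # placeholder for real acoustic features
    model = train_autoencoder(DeepAutoencoder(), feats)
    with torch.no_grad():
        _, emb = model(feats)
    codes = build_codebook(emb)
    print(same_emotion(codes, emb[0], emb[1]))
```

Under the second autoencoding scheme mentioned in the abstract, the same sketch would be run once per group of features (e.g., prosodic versus spectral), with one codebook per group; likewise, separate similarity decisions per aspect (neutrality, arousal, valence) would feed the emotion identity metric.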



Acknowledgments

This work was supported by a research grant funded by the “Fondi di Ricerca di Ateneo 2016” of the University of Genova.

Author information


Corresponding author

Correspondence to Zied Mnasri.



Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Rovetta, S., Mnasri, Z., Masulli, F. (2021). Emotional Content Comparison in Speech Signal Using Feature Embedding. In: Esposito, A., Faundez-Zanuy, M., Morabito, F., Pasero, E. (eds) Progresses in Artificial Intelligence and Neural Systems. Smart Innovation, Systems and Technologies, vol 184. Springer, Singapore. https://doi.org/10.1007/978-981-15-5093-5_5
