
Emotional Content Comparison in Speech Signal Using Feature Embedding

Chapter in: Progresses in Artificial Intelligence and Neural Systems

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 184)


Abstract

Expressive speech processing has improved in recent years. However, it remains hard to detect emotion changes within a single speech signal, or to compare the emotional content of a pair of speech signals, especially with unlabeled data. Therefore, feature embedding is used in this work to enhance the comparison of emotional content between pairs of speech signals, cast as a classification task. Feature embedding has been shown to reduce both the dimensionality and the intra-feature variance of the input space. Moreover, deep autoencoders have recently been used as a feature embedding tool in several applications, such as image, gene, and chemical data classification. In this work, a deep autoencoder is used for feature embedding before classifying, by vector quantization, the emotional content of pairs of speech signals. Autoencoding was performed following two schemes: over all features jointly, and over each group of features separately. The results show that the autoencoder succeeds (a) in revealing a more compact and clearly separated structure of the mapped features, and (b) in improving the classification rates for the similarity/dissimilarity of all compared aspects of emotional content, i.e., neutrality, arousal, and valence, which are then used to compute the emotion identity metric.
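A minimal sketch may help fix ideas about the two-stage pipeline the abstract describes: a deep autoencoder learns a low-dimensional embedding of acoustic feature vectors, and a vector-quantization step, approximated here by a simple k-means-style codebook, decides whether two embedded signals carry the same emotional content. This is an illustration under stated assumptions, not the authors' implementation: the feature dimension (88, the size of an eGeMAPS-style feature vector), the layer widths, the codebook size, and the same-codeword pairing rule are all hypothetical choices.

```python
# Minimal sketch (not the authors' code) of the pipeline in the abstract:
# (1) a deep autoencoder embeds acoustic feature vectors, and
# (2) a vector-quantization step, approximated by a k-means-style codebook,
#     decides whether two embedded signals are emotionally similar.
# Feature dimension, layer widths, and codebook size are assumptions.
import torch
import torch.nn as nn

FEAT_DIM = 88   # e.g. an eGeMAPS-sized acoustic feature vector (assumption)
EMB_DIM = 8     # bottleneck (embedding) size (assumption)

class DeepAutoencoder(nn.Module):
    def __init__(self, in_dim=FEAT_DIM, emb_dim=EMB_DIM):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, emb_dim))
        self.decoder = nn.Sequential(
            nn.Linear(emb_dim, 32), nn.ReLU(),
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, in_dim))

    def forward(self, x):
        z = self.encoder(x)          # low-dimensional embedding
        return self.decoder(z), z    # reconstruction and embedding

def train_autoencoder(model, feats, epochs=200, lr=1e-3):
    """Unsupervised training: minimize the reconstruction error."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        recon, _ = model(feats)
        loss = nn.functional.mse_loss(recon, feats)
        loss.backward()
        opt.step()
    return model

def build_codebook(emb, n_codes=4, iters=20):
    """Toy k-means codebook over the embeddings (stand-in for the VQ step)."""
    codes = emb[torch.randperm(emb.size(0))[:n_codes]].clone()
    for _ in range(iters):
        assign = torch.cdist(emb, codes).argmin(dim=1)
        for k in range(n_codes):
            members = emb[assign == k]
            if members.numel() > 0:
                codes[k] = members.mean(dim=0)
    return codes

def same_emotion(codes, z_a, z_b):
    """Pair comparison: similar if both embeddings quantize to the same code."""
    k_a = torch.cdist(z_a.unsqueeze(0), codes).argmin()
    k_b = torch.cdist(z_b.unsqueeze(0), codes).argmin()
    return bool(k_a == k_b)

if __name__ == "__main__":
    feats = torch.randn(200, FEAT_DIM)   # placeholder for real acoustic features
    model = train_autoencoder(DeepAutoencoder(), feats)
    with torch.no_grad():
        _, emb = model(feats)
    codes = build_codebook(emb)
    print(same_emotion(codes, emb[0], emb[1]))
```

Under the second autoencoding scheme mentioned in the abstract, the same sketch would be run once per group of features (e.g., prosodic versus spectral), with one codebook per group; likewise, separate similarity decisions per aspect (neutrality, arousal, valence) would feed the emotion identity metric.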



Acknowledgments

This work was supported by a research grant funded by the “Fondi di Ricerca di Ateneo 2016” of the University of Genova.

Author information


Corresponding author

Correspondence to Zied Mnasri.



Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Rovetta, S., Mnasri, Z., Masulli, F. (2021). Emotional Content Comparison in Speech Signal Using Feature Embedding. In: Esposito, A., Faundez-Zanuy, M., Morabito, F., Pasero, E. (eds) Progresses in Artificial Intelligence and Neural Systems. Smart Innovation, Systems and Technologies, vol 184. Springer, Singapore. https://doi.org/10.1007/978-981-15-5093-5_5
