Towards a Classifier to Recognize Emotions Using Voice to Improve Recommendations

Fuentes, José Manuel; Taverner, Joaquin; Rincon, Jaime Andres; Botti, Vicente

doi:10.1007/978-3-030-51999-5_18

Conference paper
First Online: 06 July 2020

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1233)

Abstract

Recognizing emotions from tone of voice has high potential for recommender systems, since it allows recommendations to be personalized using the user's mood. However, it is a complex task: the signal must first be pre-processed, and only then can the emotion be recognized. Most current proposals use recurrent networks, which model sequences with a temporal relationship. The disadvantage of these networks is their long runtime, which makes them difficult to use in real-time applications. In addition, culture and language must be taken into account when defining this type of classifier, since the tone of voice for the same emotion can vary with these factors. In this work we propose a culturally adapted model for recognizing emotions from tone of voice using convolutional neural networks. This type of network has a relatively short execution time, which allows its use in real-time applications. The results we have obtained improve on the current state of the art, reaching a 93.6% success rate on the validation set.
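The two stages named in the abstract, signal pre-processing followed by a convolutional classifier, can be illustrated with a minimal NumPy sketch. This is not the authors' model: the label set, the frame sizes, the single convolution layer, and the random untrained weights are all assumptions chosen only to show how data flows through such a pipeline.

```python
import numpy as np

# Hypothetical label set; the abstract does not list the emotion classes.
EMOTIONS = ["neutral", "happy", "sad", "angry"]

def log_spectrogram(signal, frame_len=400, hop=160):
    """Pre-processing: slice the waveform into overlapping windowed frames
    and return the log power spectrum, shape (n_frames, n_bins)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10)

def conv2d_valid(image, kernels):
    """Naive 'valid' 2D cross-correlation.
    image: (H, W); kernels: (K, kh, kw) -> output (K, H-kh+1, W-kw+1)."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernels.shape[1:])
    return np.einsum("hwij,kij->khw", windows, kernels)

def classify(signal, kernels, dense_w, dense_b):
    """Forward pass: spectrogram -> conv -> ReLU -> global average pool -> softmax."""
    spec = log_spectrogram(signal)
    spec = (spec - spec.mean()) / (spec.std() + 1e-8)  # normalise the input
    feat = np.maximum(conv2d_valid(spec, kernels), 0.0).mean(axis=(1, 2))
    logits = feat @ dense_w + dense_b
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Usage with random (untrained) weights and noise standing in for speech.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)              # one second at an assumed 16 kHz
kernels = rng.standard_normal((8, 5, 5)) * 0.1   # 8 hypothetical 5x5 filters
dense_w = rng.standard_normal((8, len(EMOTIONS))) * 0.1
dense_b = np.zeros(len(EMOTIONS))
probs = classify(signal, kernels, dense_w, dense_b)
print(dict(zip(EMOTIONS, probs.round(3))))
```

Unlike a recurrent network, which steps through the frames one after another, the convolution here is applied to the whole spectrogram at once, which is the kind of property that keeps execution time short. A real classifier would stack several trained layers.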

Notes

1. http://catalog.elra.info/en-us/repository/browse/ELRA-S0329/

Acknowledgements

This work is partially supported by the Spanish Government project TIN2017-89156-R, GVA-CEICE project PROMETEO/2018/002, Generalitat Valenciana and European Social Fund FPI grant ACIF/2017/085, Universitat Politecnica de Valencia research grant (PAID-10-19), and by the Spanish Government (RTI2018-095390-B-C31).

Author information

Authors and Affiliations

Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València, Valencia, Spain
José Manuel Fuentes, Joaquin Taverner, Jaime Andres Rincon & Vicente Botti

Corresponding author

Correspondence to Joaquin Taverner.

Editor information

Editors and Affiliations

University of Salamanca, Salamanca, Spain
Fernando De La Prieta
University of Lille, Villeneuve d’Ascq, France
Philippe Mathieu
Polytechnic University of Valencia, Valencia, Spain
Jaime Andrés Rincón Arango
German University in Cairo, New Cairo City, Egypt
Alia El Bolock
Polytechnic University of Valencia, Valencia, Spain
Elena Del Val
University of Valencia, Valencia, Spain
Jaume Jordán Prunera
Instituto Superior de Engenharia do Porto, Porto, Portugal
João Carneiro
Complutense University of Madrid, Madrid, Spain
Rubén Fuentes
National Laboratory for Energy and Geology, Amadora, Portugal
Fernando Lopes
Polytechnic University of Valencia, Valencia, Spain
Vicente Julian

About this paper

Cite this paper

Fuentes, J.M., Taverner, J., Rincon, J.A., Botti, V. (2020). Towards a Classifier to Recognize Emotions Using Voice to Improve Recommendations. In: De La Prieta, F., et al. Highlights in Practical Applications of Agents, Multi-Agent Systems, and Trust-worthiness. The PAAMS Collection. PAAMS 2020. Communications in Computer and Information Science, vol 1233. Springer, Cham. https://doi.org/10.1007/978-3-030-51999-5_18

DOI: https://doi.org/10.1007/978-3-030-51999-5_18
Published: 06 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-51998-8
Online ISBN: 978-3-030-51999-5
eBook Packages: Computer Science, Computer Science (R0)
