
Towards a Classifier to Recognize Emotions Using Voice to Improve Recommendations

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1233)

Abstract

Recognizing emotions from tone of voice has high potential for making recommendations, since it allows recommendations to be personalized using the user's mood as information. However, recognizing emotions from tone of voice is a complex task, since the signal must first be pre-processed and the emotion then recognized. Most current proposals use recurrent networks, which model sequences with a temporal relationship. The disadvantage of these networks is their high runtime, which makes them difficult to use in real-time applications. In addition, when defining this type of classifier, culture and language must be taken into account, since the tone of voice for the same emotion can vary depending on these cultural factors. In this work we propose a culturally adapted model for recognizing emotions from tone of voice using convolutional neural networks. This type of network has a relatively short execution time, allowing its use in real-time applications. The results we have obtained improve on the current state of the art, reaching 93.6% success on the validation set.
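The two stages named in the abstract (pre-processing the signal, then classifying it with a convolutional network) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' model: the log-magnitude spectrogram features, the single convolutional layer, the random untrained weights, the label set in `EMOTIONS`, and all function names are hypothetical.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping windowed frames; return log-magnitude spectra."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-8)  # shape: (n_frames, frame_len // 2 + 1)

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation, the core operation of a CNN layer."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def classify(spec, kernels, dense_w, dense_b):
    """One conv layer + ReLU + global average pooling + dense layer + softmax."""
    feats = np.array([np.maximum(conv2d_valid(spec, k), 0.0).mean() for k in kernels])
    logits = feats @ dense_w + dense_b
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

# Hypothetical label set; the abstract does not list the emotions used.
EMOTIONS = ["neutral", "happy", "sad", "angry"]

rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)           # stand-in for 1 s of audio at 16 kHz
spec = log_spectrogram(signal)                # pre-processing stage
kernels = rng.standard_normal((8, 3, 3)) * 0.1                 # untrained filters
dense_w = rng.standard_normal((8, len(EMOTIONS))) * 0.1        # untrained weights
dense_b = np.zeros(len(EMOTIONS))
probs = classify(spec, kernels, dense_w, dense_b)              # recognition stage
print(dict(zip(EMOTIONS, probs.round(3))))
```

Because the forward pass is a fixed number of convolutions over the whole spectrogram rather than a step-by-step recurrence over time frames, its cost is easy to bound, which is the property the abstract relies on for real-time use. A trained model would replace the random weights with learned ones.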

Notes

  1. http://catalog.elra.info/en-us/repository/browse/ELRA-S0329/

Acknowledgements

This work is partially supported by the Spanish Government project TIN2017-89156-R, GVA-CEICE project PROMETEO/2018/002, Generalitat Valenciana and European Social Fund FPI grant ACIF/2017/085, Universitat Politecnica de Valencia research grant (PAID-10-19), and by the Spanish Government (RTI2018-095390-B-C31).

Author information

Corresponding author

Correspondence to Joaquin Taverner.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Fuentes, J.M., Taverner, J., Rincon, J.A., Botti, V. (2020). Towards a Classifier to Recognize Emotions Using Voice to Improve Recommendations. In: De La Prieta, F., et al. Highlights in Practical Applications of Agents, Multi-Agent Systems, and Trust-worthiness. The PAAMS Collection. PAAMS 2020. Communications in Computer and Information Science, vol 1233. Springer, Cham. https://doi.org/10.1007/978-3-030-51999-5_18

  • DOI: https://doi.org/10.1007/978-3-030-51999-5_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-51998-8

  • Online ISBN: 978-3-030-51999-5

  • eBook Packages: Computer Science, Computer Science (R0)
