Sentiment Analysis of Telephone Conversations Using Multimodal Data

  • Conference paper
Analysis of Images, Social Networks and Texts (AIST 2018)

Abstract

Sentiment analysis of conversations is a widely studied topic, but most proposed solutions rely on text analysis alone. In real telephone conversations this is far from ideal, because the text contains many mistakes and inaccuracies introduced at the speech recognition stage. There are currently almost no papers on sentiment analysis of conversations using multimodal datasets for the Russian language. In this paper, we propose multimodal sentiment analysis of conversations, using both the recognized text and the audio signal as training data. To do this, we assemble our own dataset of recorded telephone conversations, labelled by sentiment intensity. The texts are obtained with off-the-shelf tools for automatic speech recognition. We carry out a number of experiments to find the best way to extract features from audio and text, and we build models that determine sentiment intensity for each modality separately and for their combination. We compare several families of classification algorithms: linear models, neural networks and ensembles of decision trees. XGBoost works best for audio, Logistic Regression for text and LightGBM for multimodal data. The results show that combining several modalities yields the best classification quality.
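The abstract does not say how the two modalities are combined, so the following is only a minimal sketch of one common option, feature-level (early) fusion, in which the text and audio feature vectors of a call are concatenated before classification. Everything in it is an assumption made for illustration: the function names, the toy vocabulary, the two simple acoustic descriptors and the sample data are hypothetical, and a nearest-centroid rule stands in for the gradient-boosted classifiers named in the abstract.

```python
from collections import Counter
from math import sqrt


def text_features(transcript, vocabulary):
    """Bag-of-words counts over a fixed vocabulary (stand-in for real text features)."""
    counts = Counter(transcript.lower().split())
    return [float(counts[word]) for word in vocabulary]


def audio_features(samples):
    """Two toy acoustic descriptors: mean absolute amplitude and RMS energy."""
    n = len(samples)
    mean_abs = sum(abs(s) for s in samples) / n
    rms = sqrt(sum(s * s for s in samples) / n)
    return [mean_abs, rms]


def fuse(transcript, samples, vocabulary):
    """Early fusion: concatenate the text and audio feature vectors."""
    return text_features(transcript, vocabulary) + audio_features(samples)


def fit_centroids(vectors, labels):
    """Average the fused vectors of each class (nearest-centroid classifier)."""
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], vector))
    return min(centroids, key=dist)


# Hypothetical toy data: (recognized text, raw audio samples, sentiment label).
VOCAB = ["thanks", "great", "terrible", "refund"]
TRAIN = [
    ("thanks that was great", [0.1, -0.1, 0.2, -0.1], "positive"),
    ("great thanks", [0.2, -0.1, 0.1, -0.2], "positive"),
    ("terrible i want a refund", [0.8, -0.9, 0.7, -0.8], "negative"),
    ("this is terrible", [0.9, -0.7, 0.8, -0.9], "negative"),
]

X = [fuse(text, audio, VOCAB) for text, audio, _ in TRAIN]
y = [label for _, _, label in TRAIN]
CENTROIDS = fit_centroids(X, y)
```

Calling `predict(CENTROIDS, fuse(text, samples, VOCAB))` classifies a new call from both modalities at once. In a realistic setting the toy extractors would be replaced by proper text and acoustic features and the centroid rule by a trained classifier, and the modalities would usually need rescaling so that neither dominates the distance.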

Notes

  1. https://github.com/srubin/p2fa-vislab

  2. https://www.readbeyond.it/aeneas/

  3. https://www.nltk.org/

  4. https://github.com/jiaaro/pydub

  5. https://github.com/scikit-learn/scikit-learn

Author information

Corresponding author

Correspondence to Alexander Gafuanovich Logumanov.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Logumanov, A.G., Klenin, J.D., Botov, D.S. (2018). Sentiment Analysis of Telephone Conversations Using Multimodal Data. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2018. Lecture Notes in Computer Science, vol 11179. Springer, Cham. https://doi.org/10.1007/978-3-030-11027-7_9

  • DOI: https://doi.org/10.1007/978-3-030-11027-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11026-0

  • Online ISBN: 978-3-030-11027-7

  • eBook Packages: Computer Science, Computer Science (R0)
