Multimodal Sentiment Analysis Using Deep Neural Networks

  • Harika Abburi
  • Rajendra Prasath
  • Manish Shrivastava
  • Suryakanth V. Gangashetty
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10089)


With the growing number of online product reviews posted daily in modalities such as video, audio, and text, sentiment analysis has gained considerable attention. Recent developments in web technologies have also enabled an increase in web content in Hindi. In this paper, an approach to detect the sentiment of online Hindi product reviews based on their multimodal nature (audio and text) is presented. For each audio input, Mel Frequency Cepstral Coefficient (MFCC) features are extracted. These features are used to develop sentiment models using Gaussian Mixture Model (GMM) and Deep Neural Network (DNN) classifiers. The results show that the DNN classifier performs better than the GMM. Further, textual features are extracted from the transcript of the audio input using Doc2vec vectors. A Support Vector Machine (SVM) classifier is used to develop a sentiment model using these textual features. Experimental results show that combining the audio and text features improves the performance of sentiment detection for online product reviews.
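The abstract does not state how the audio and text modalities are combined. One common option is score-level (late) fusion, in which each modality's classifier produces per-class scores that are then averaged. The sketch below illustrates only that idea and is not the authors' method: the function names, the class labels, the fusion weight, and the example scores are all hypothetical.

```python
from typing import Dict


def fuse_scores(
    audio: Dict[str, float],
    text: Dict[str, float],
    audio_weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted average of per-class scores from two modalities (late fusion).

    `audio` and `text` map each class label to a score, for example a
    posterior probability from an audio classifier and a text classifier.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio.keys() != text.keys():
        raise ValueError("both modalities must score the same classes")
    text_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio[label] + text_weight * text[label]
        for label in audio
    }


def predict(scores: Dict[str, float]) -> str:
    """Return the class label with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical scores for a single review (not taken from the paper).
audio_scores = {"positive": 0.45, "negative": 0.55}
text_scores = {"positive": 0.80, "negative": 0.20}

fused = fuse_scores(audio_scores, text_scores, audio_weight=0.5)
print(predict(audio_scores), predict(text_scores), predict(fused))
```

In this made-up example the audio scores alone favour the negative class, while the fused scores favour the positive class because the text classifier is more confident. In practice the fusion weight would be tuned on held-out data.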


Keywords: Multimodal sentiment analysis, MFCC, Doc2Vec, GMM, SVM, Deep neural networks



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Harika Abburi
    • 1
  • Rajendra Prasath
    • 2
  • Manish Shrivastava
    • 1
  • Suryakanth V. Gangashetty
    • 1
  1. Language Technology Research Center, International Institute of Information Technology Hyderabad, Hyderabad, India
  2. NTNU, Trondheim, Norway
