An Integrated Word Embedding-Based Dual-Task Learning Method for Sentiment Analysis

  • Yanping Fu
  • Yun Liu
  • Sheng-Lung Peng
Research Article - Special Issue - Intelligent Computing And Interdisciplinary Applications


Sentiment analysis aims to automate the task of discriminating the sentiment tendency of a textual review, classifying it as positive, negative, or neutral. In general, feature extraction for sentiment analysis relies on word embedding techniques, which focus only on contextual or global semantic information and ignore the sentiment polarity of the text. As a result, word embeddings can produce biased analysis results, especially for words that share the same semantic context but carry opposite sentiments. In this paper, we propose an integrated sentiment embedding method that combines context and sentiment information through a dual-task learning algorithm for sentiment analysis. First, we propose three sentiment language models that encode the sentiment information of texts into word embeddings, built on three existing semantic models: continuous bag-of-words, prediction, and log-bilinear. Next, based on the semantic language models and the proposed sentiment language models, we propose a dual-task learning algorithm that generates a hybrid word embedding, named integrated sentiment embedding, in which joint learning and parallel learning methods are applied to process the tasks jointly. Experiments on sentence-level and document-level sentiment classification tasks demonstrate that the proposed integrated sentiment embedding achieves better classification performance than basic word embedding methods.
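The dual-task idea summarized above can be illustrated with a toy objective: a shared embedding table feeds both a CBOW-style context-prediction loss and a sentiment-polarity loss, combined as a weighted sum. This is a minimal sketch for intuition only, not the authors' implementation; the variable names, the logistic sentiment head, and the `alpha` mixing weight are all assumptions.

```python
import numpy as np

# Hedged sketch of a dual-task objective: one shared embedding matrix E is
# trained against (a) a CBOW-style semantic loss and (b) a sentiment loss,
# so the resulting vectors carry both kinds of information.
rng = np.random.default_rng(0)
V, D = 10, 4                                 # toy vocabulary size, embedding dim
E = rng.normal(scale=0.1, size=(V, D))       # shared word embeddings
W_ctx = rng.normal(scale=0.1, size=(D, V))   # context-prediction output weights
w_sen = rng.normal(scale=0.1, size=D)        # sentiment (logistic) weights

def softmax(z):
    z = z - z.max()                          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def joint_loss(context_ids, target_id, sentiment_label, alpha=0.5):
    """alpha trades off the semantic task against the sentiment task."""
    h = E[context_ids].mean(axis=0)          # CBOW: average the context vectors
    p_ctx = softmax(h @ W_ctx)               # semantic task: predict target word
    loss_ctx = -np.log(p_ctx[target_id])     # cross-entropy
    p_sen = 1.0 / (1.0 + np.exp(-(h @ w_sen)))   # sentiment task: polarity
    loss_sen = -(sentiment_label * np.log(p_sen)
                 + (1 - sentiment_label) * np.log(1.0 - p_sen))
    return alpha * loss_ctx + (1 - alpha) * loss_sen

print(joint_loss(context_ids=[1, 2, 4], target_id=3, sentiment_label=1))
```

Setting `alpha=1.0` recovers a purely semantic objective, while `alpha=0.0` trains on polarity alone; a joint-learning scheme in the paper's sense optimizes both terms through the shared embeddings.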


Keywords: Integrated sentiment embedding · Language model · Dual-task learning · Sentiment analysis



This work was supported by the Fundamental Research Funds for the Central Universities (2019YJS006) and the National Key Research and Development Program of China (2016YFB0800900).



Copyright information

© King Fahd University of Petroleum & Minerals 2019

Authors and Affiliations

  1. School of Electronic and Information Engineering, Key Laboratory of Communication and Information Systems, Beijing Municipal Commission of Education, Beijing Jiaotong University, Beijing, China
  2. Department of Computer Science and Information Engineering, National Dong Hwa University, Shoufeng, Hualien, Taiwan
