Learning low-dimensional vector representations of words from a large corpus is one of the basic tasks in natural language processing (NLP). Existing general-purpose word embedding models learn word vectors mainly from the syntactic and semantic information in the context, while ignoring the sentiment information carried by the words. Some approaches model the sentiment information in reviews, but they do not account for the fact that the sentiment of certain words changes across domains. Directly applying such general word vectors to review sentiment analysis therefore inevitably degrades the performance of sentiment classification. To address this problem, this paper extends the CBoW (continuous bag-of-words) model and proposes a cross-domain sentiment-aware word embedding learning model that captures the sentiment information and the domain relevance of a word at the same time. Several experiments on Amazon user review data from different domains evaluate the performance of the model. The experimental results show that, when modeling only the sentiment information of the context, the proposed model achieves a nearly 2% accuracy improvement over general word vectors; when both domain and sentiment information are incorporated, the accuracy and macro-F1 of the sentiment classification tasks improve significantly over existing sentiment word embeddings.
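The abstract describes extending CBoW with sentiment and domain signals but gives no formulation. As a rough illustration only (the paper's actual objective, heads, and hyperparameters are not given here, so every name and weight below is an assumption), a CBoW-style joint objective can be sketched as the sum of a target-word loss, a sentiment-label loss, and a domain-label loss, all computed from the averaged context embedding:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a hypothetical vocabulary, 2 sentiment classes, 2 domains.
vocab = ["great", "battery", "boring", "plot", "good", "bad"]
V, D = len(vocab), 8                      # vocabulary size, embedding dimension
n_sent, n_dom = 2, 2

W_in = rng.normal(0, 0.1, (V, D))         # input (context) word embeddings
W_out = rng.normal(0, 0.1, (D, V))        # output weights: target-word prediction
W_sent = rng.normal(0, 0.1, (D, n_sent))  # assumed sentiment-prediction head
W_dom = rng.normal(0, 0.1, (D, n_dom))    # assumed domain-prediction head

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def joint_loss(context_ids, target_id, sent_label, dom_label,
               alpha=1.0, beta=0.5, gamma=0.5):
    """CBoW-style joint objective (sketch): from the averaged context
    vector, predict the target word, the review's sentiment, and its
    domain; alpha/beta/gamma are illustrative mixing weights."""
    h = W_in[context_ids].mean(axis=0)            # averaged context embedding
    l_word = -np.log(softmax(h @ W_out)[target_id])
    l_sent = -np.log(softmax(h @ W_sent)[sent_label])
    l_dom = -np.log(softmax(h @ W_dom)[dom_label])
    return alpha * l_word + beta * l_sent + gamma * l_dom

# One training window: context ["great", "battery"], target "good",
# positive sentiment (1), electronics domain (0).
loss = joint_loss([0, 1], 4, sent_label=1, dom_label=0)
```

Minimizing such a joint loss pushes embeddings of words with similar context, sentiment, and domain behavior together, which is the intuition the abstract appeals to; the real model may share or structure these heads differently.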
This work was supported by the Chongqing Research Program of Basic Research and Frontier Technology (Grant No. cstc2017jcyjAX0270) and the National Natural Science Foundation of China (Grant Nos. 61772099 and 61872086).
Liu, J., Zheng, S., Xu, G. et al. Cross-domain sentiment aware word embeddings for review sentiment analysis. Int. J. Mach. Learn. & Cyber. 12, 343–354 (2021). https://doi.org/10.1007/s13042-020-01175-7