Abstract
Sentiment analysis is widely studied in natural language processing and text mining. Traditional sentiment analysis methods apply supervised and unsupervised classifiers within a single domain and achieve good results there. When the training data and test data come from different domains, however, these methods perform poorly. The difficulty in cross-domain opinion analysis is that large tagged data sets are hard to obtain, and it is infeasible to tag all the data in every domain of interest. We propose a method for extracting topic and sentiment words, based on conditional random fields and syntactic structure, to analyze the sentiment orientation of Chinese product reviews. Our aim is to extract topic and sentiment words from the target domain and identify their sentiment orientation when only one or a few topic and sentiment words are tagged in the source domain and the words in the target domain carry no tagged information. Our experimental results show that the method is effective for cross-domain sentiment analysis.
Acknowledgements
This research is supported by the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No.: 20133718110014), the National Statistical Science Research (Grant No.: 2016LZ12), the Science and Technology of Taian (Grant Nos.: 2015GX2012 and 201630576), and the National Economy and Society Information Development Soft Science of Shandong Province (Grant No.: 2015EI017). The authors would like to thank all the students and teachers for their efforts. We also thank the reviewers and editors for their valuable suggestions and comments, which helped improve this work.
About this article
Cite this article
Wang, G., Pu, P. & Liang, Y. Topic and Sentiment Words Extraction in Cross-Domain Product Reviews. Wireless Pers Commun 102, 1773–1783 (2018). https://doi.org/10.1007/s11277-017-5235-7
DOI: https://doi.org/10.1007/s11277-017-5235-7