Abstract
Unstructured online reviews are expanding rapidly with the development of e-commerce, and they contain sentiment information of great interest to both consumers and businesses. Effective sentiment classification has therefore become an important research topic. Many studies have shown that ensemble learning methods are promising for sentiment classification tasks. In this paper, we propose a new ensemble learning framework for sentiment classification of Chinese online reviews. First, to account for the complicated characteristics of Chinese online reviews, we extract Part of Speech Combination Pattern, Frequent Word Sequence Pattern and Order Preserved Submatrix Pattern as the input features. Second, to address the massive number of features in the reviews, we use the algorithm of Random Subspace based on Information Gain, which can improve the base classifiers at the same time. Finally, we adopt the algorithm of Constructing Base Classifiers based on Product Attributes to combine the sentiment information of each attribute in a review and thereby obtain better sentiment classification performance. The experimental results show that the proposed ensemble learning framework significantly improves sentiment classification of Chinese online reviews.
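The abstract names a Random Subspace algorithm based on Information Gain but does not spell out its steps, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes binary presence/absence features, draws each feature subspace with probability roughly proportional to information gain, uses a small Bernoulli naive Bayes model as a stand-in base classifier, and combines the base classifiers by majority vote. All names (`information_gain`, `sample_subspace`, `train_ensemble`, the `floor` parameter) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(X, y, j):
    """Information gain of binary feature j with respect to labels y."""
    gain = entropy(y)
    for value in (0, 1):
        subset = [label for row, label in zip(X, y) if row[j] == value]
        if subset:
            gain -= len(subset) / len(y) * entropy(subset)
    return gain

def sample_subspace(gains, k, rng, floor=0.05):
    """Draw k distinct feature indices, favouring high-gain features.

    `floor` is an assumed smoothing constant so that zero-gain
    features can still be drawn occasionally.
    """
    candidates = list(range(len(gains)))
    weights = [g + floor for g in gains]
    chosen = []
    for _ in range(k):
        pick = rng.choices(range(len(candidates)), weights=weights)[0]
        chosen.append(candidates.pop(pick))
        weights.pop(pick)
    return sorted(chosen)

class BernoulliNB:
    """Tiny Bernoulli naive Bayes with Laplace smoothing (stand-in base classifier)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.p_one = {}, {}
        for c in self.classes:
            rows = [row for row, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(y))
            self.p_one[c] = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        return self

    def predict_one(self, row):
        def score(c):
            return self.log_prior[c] + sum(
                math.log(p if v else 1 - p) for v, p in zip(row, self.p_one[c]))
        return max(self.classes, key=score)

def train_ensemble(X, y, n_models=5, k=3, seed=0):
    """Train one base classifier per information-gain-weighted feature subspace."""
    rng = random.Random(seed)
    gains = [information_gain(X, y, j) for j in range(len(X[0]))]
    ensemble = []
    for _ in range(n_models):
        features = sample_subspace(gains, k, rng)
        projected = [[row[j] for j in features] for row in X]
        ensemble.append((features, BernoulliNB().fit(projected, y)))
    return gains, ensemble

def predict(ensemble, row):
    """Majority vote over the base classifiers."""
    votes = Counter(model.predict_one([row[j] for j in features])
                    for features, model in ensemble)
    return votes.most_common(1)[0][0]

# Toy data (made up): columns 0 and 1 mark a positive and a negative cue word,
# columns 2 and 3 are uninformative noise.
X = [[1, 0, 1, 0], [1, 0, 0, 1], [1, 0, 1, 1], [1, 0, 0, 0],
     [0, 1, 1, 0], [0, 1, 0, 1], [0, 1, 1, 1], [0, 1, 0, 0]]
y = ["pos"] * 4 + ["neg"] * 4

gains, ensemble = train_ensemble(X, y)
print(gains)
print(predict(ensemble, [1, 0, 1, 1]))
```

In this sketch the subspace size `k` and the number of base classifiers `n_models` are free parameters; how the paper sets them, and which base classifiers it actually uses, is not stated in the abstract.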
Acknowledgements
The authors gratefully thank the colleagues who participated in this work and provided technical support. This work is supported by a grant from the National Natural Science Foundation of China (No. 61672126), Guangdong Provincial Engineering Technology Research Center for Data Science (Nos. 2016KF09, 2016KF10), and the National Statistical Science Research Project of China (No. 2016LY98). This work was also supported by the Science and Technology Department of Guangdong Province in China (Grant Nos. 2016A010101020, 2016A010101021, 2016A010101022), the Foundation of Guangdong Polytechnic of Science and Technology (No. XJSC2016206), the Natural Science Funds of Shenzhen Science and Technology Innovation Commission (No. JCYJ20160527172144272) and the Innovation Project of Graduate School of South China Normal University (No. 2015lkxm37). The authors also gratefully thank the scholars who shared the datasets used in this work.
Cite this article
Huang, J., Xue, Y., Hu, X. et al. Sentiment analysis of Chinese online reviews using ensemble learning framework. Cluster Comput 22 (Suppl 2), 3043–3058 (2019). https://doi.org/10.1007/s10586-018-1858-z