Sentimental feature selection for sentiment analysis of Chinese online reviews

  • Lijuan Zheng
  • Hongwei Wang
  • Song Gao
Original Article


With the growing availability and popularity of online reviews, sentiment analysis has emerged in response to the need to organize useful information quickly. Feature selection directly affects how online reviews are represented and poses many challenges for sentiment analysis. However, little attention has so far been paid to feature selection for Chinese online reviews. We are therefore motivated to explore the effects of feature selection on sentiment analysis of Chinese online reviews. First, N-char-grams and N-POS-grams are selected as the potential sentimental features. Then, an improved Document Frequency method is used to select feature subsets, and the Boolean Weighting method is adopted to calculate feature weights. Finally, experiments are conducted on online reviews of mobile phones, and a Chi-square test is carried out to test the significance of the experimental results. The results suggest that sentiment analysis of Chinese online reviews achieves higher accuracy when 4-POS-grams are taken as features. In addition, when N-char-grams are taken as features, low-order N-char-grams perform better than high-order N-char-grams. Furthermore, the improved Document Frequency method yields a significant improvement in sentiment analysis of Chinese online reviews.
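The pipeline described above (character n-grams, document-frequency selection, Boolean weighting) can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not specify how the improved Document Frequency method differs from the standard one, so the code below uses plain document frequency with a hypothetical `min_df` threshold; the function names and the toy reviews are invented for illustration and are not taken from the paper.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return the set of contiguous n-character substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def select_by_document_frequency(docs, n, min_df):
    """Keep n-char-grams that occur in at least min_df documents.

    This is plain document frequency, NOT the paper's improved variant,
    whose details are not given in the abstract.
    """
    df = Counter()
    for doc in docs:
        # A set is passed in, so each gram counts once per document.
        df.update(char_ngrams(doc, n))
    return sorted(gram for gram, count in df.items() if count >= min_df)


def boolean_vector(doc, features, n):
    """Boolean weighting: 1 if the feature occurs in the document, else 0."""
    present = char_ngrams(doc, n)
    return [1 if feature in present else 0 for feature in features]


# Toy reviews (hypothetical data, for illustration only).
reviews = ["屏幕很好", "电池很好", "屏幕很差"]
features = select_by_document_frequency(reviews, n=2, min_df=2)
vectors = [boolean_vector(review, features, n=2) for review in reviews]
```

Because character n-grams need no word segmentation, this representation sidesteps the Chinese segmentation step entirely; N-POS-grams would instead require a part-of-speech tagger, which is outside the scope of this sketch.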


Keywords: Online reviews; Sentiment; Feature selection; Statistical machine learning



This work is partially supported by NSFC Grants 70971099 and 71371144, the Fundamental Research Funds for the Central Universities (1200219198), and the Shanghai Philosophy and Social Science Planning Projects (2013BGL004).



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. School of Business, Liaocheng University, Liaocheng, China
  2. School of Economics and Management, Tongji University, Shanghai, China
