Optimal Feature Selection for Learning-Based Algorithms for Sentiment Classification

  • Zhaoxia Wang
  • Zhiping Lin


Sentiment classification is an important branch of cognitive computation, so further study of the properties of sentiment analysis is important. Sentiment classification of text data has been an active topic for the last two decades, and learning-based methods are popular and widely used in various applications. Many technical strategies have been used to improve the performance of learning-based methods. Feature selection is one of these strategies and has been studied by many researchers. However, a difficult problem remains unsolved: how to choose a suitable number of features to obtain the best sentiment classification performance from a learning-based method. We therefore investigate the relationship between the number of features selected and the sentiment classification performance of learning-based methods. We propose a new method for selecting a suitable number of features, in which the Chi Square feature selection algorithm is employed and features are selected using a preset score threshold. We find that there is a relationship between the logarithm of the number of features selected and the sentiment classification performance of the learning-based method, and that this relationship is independent of the learning-based method involved. These findings indicate that it is always possible for researchers to select the appropriate number of features for learning-based methods to obtain the best sentiment classification performance. This can guide researchers in selecting the proper features to optimize the performance of learning-based algorithms. (A preliminary version of this paper received a Best Paper Award at the International Conference on Extreme Learning Machines 2018.)
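
The threshold-based selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the function names (`chi_square_scores`, `select_by_threshold`, `feature_count_curve`), and the example threshold of 3.84 (the 5% critical value of the chi-square distribution with one degree of freedom) are all assumptions made for illustration. The sketch scores each term with the standard 2x2 chi-square statistic for binary term presence against a binary class label, keeps the terms whose score meets a preset threshold, and reports the logarithm of the number of features kept at each threshold.

```python
import math
from collections import Counter


def chi_square_scores(docs, labels, positive="pos"):
    """Return {term: chi-square score} from a 2x2 term/class contingency table.

    docs is a list of token lists; labels is a parallel list of class labels.
    Any label other than `positive` is treated as the negative class.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos
    pos_with_term = Counter()
    neg_with_term = Counter()
    for tokens, y in zip(docs, labels):
        target = pos_with_term if y == positive else neg_with_term
        for term in set(tokens):  # count document presence, not frequency
            target[term] += 1

    scores = {}
    for term in set(pos_with_term) | set(neg_with_term):
        a = pos_with_term[term]   # positive docs containing the term
        b = neg_with_term[term]   # negative docs containing the term
        c = n_pos - a             # positive docs without the term
        d = n_neg - b             # negative docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_by_threshold(scores, threshold):
    """Keep every term whose chi-square score is at least `threshold`."""
    return sorted(term for term, score in scores.items() if score >= threshold)


def feature_count_curve(scores, thresholds):
    """For each threshold, return (threshold, k, log10(k)); log is None if k == 0."""
    curve = []
    for threshold in thresholds:
        k = len(select_by_threshold(scores, threshold))
        curve.append((threshold, k, math.log10(k) if k > 0 else None))
    return curve


# Hypothetical toy corpus, for illustration only.
docs = [
    ["good", "movie"], ["good", "plot"], ["good", "acting"],
    ["bad", "movie"], ["bad", "plot"], ["bad", "acting"],
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

scores = chi_square_scores(docs, labels)
print(select_by_threshold(scores, 3.84))  # ['bad', 'good']
print(feature_count_curve(scores, [0.0, 3.84]))
```

In a study like the one summarized above, each threshold would be paired with the classification performance of a classifier trained on the selected features, so that performance can be examined against the logarithm of the feature count. That evaluation step is omitted here.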


Machine learning, Feature selection, Optimal feature selection, Relationship analysis, Sentiment classification, Social media, Text analysis



The authors would like to thank Dr. Ho Seng Beng, Dr. Quek Boon Kiat, and the A*STAR AI program team for their discussions and help. The authors would also like to thank the intern students from NTU and SUTD for their assistance in this research.

Compliance with Ethical Standards

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interest

The authors declare that they have no conflict of interest.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Information Systems, Singapore Management University (SMU), Singapore, Singapore
  2. Nanjing University of Information Science and Technology (NUIST), Nanjing, China
  3. Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  4. School of Electrical and Electronic Engineering (EEE), Nanyang Technological University, Singapore, Singapore
