Abstract
Feature selection (FS) has emerged as a recent challenge in sentiment classification. Filter- and wrapper-based FS methods are applied in this domain to reduce the size of the feature set and increase classifier accuracy. In this paper, a hybrid filter–wrapper method for selecting relevant features is proposed. A feature subset is first selected from the original feature set using computationally fast rank-based FS methods. The selected features are then refined using one of two wrapper approaches: in the first, recursive feature elimination is applied to select the optimal feature set; in the second, an evolutionary method based on binary particle swarm optimization finalizes the feature subset. The two proposed techniques are compared on five datasets from different domains used in sentiment analysis. We use simple and efficient machine learning algorithms (Naïve Bayes, support vector machine and logistic regression) to evaluate the performance of the hybrid FS techniques. Finally, we assess the proposed hybrid FS technique by comparing our results with state-of-the-art methods. The results show that the proposed method achieves better accuracy with fewer features.
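The two-stage idea in the abstract (a fast rank-based filter followed by a wrapper search over the surviving features) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary term-presence features and binary labels, uses the chi-square statistic as the rank-based filter, and shows only the binary particle swarm optimization wrapper, not recursive feature elimination. The function names, the toy data, the swarm constants, the subset-size penalty, and the leave-one-out 1-nearest-neighbour fitness (a stand-in for the Naïve Bayes, SVM and logistic regression classifiers named above) are all assumptions made for illustration.

```python
import math
import random


def chi2_score(column, labels):
    """Chi-square statistic of one binary feature against a binary label."""
    n = len(labels)
    a = sum(1 for x, t in zip(column, labels) if x and t)      # present, positive
    b = sum(1 for x, t in zip(column, labels) if x and not t)  # present, negative
    c = sum(1 for x, t in zip(column, labels) if not x and t)  # absent, positive
    d = n - a - b - c                                          # absent, negative
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def filter_top_k(X, y, k):
    """Stage 1 (filter): indices of the k highest-scoring features."""
    scores = [chi2_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of 1-NN (Hamming distance) restricted to feats."""
    if not feats:
        return 0.0
    hits = 0
    for i, row in enumerate(X):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum(row[f] != X[j][f] for f in feats))
        hits += y[nearest] == y[i]
    return hits / len(X)


def bpso_select(X, y, candidates, n_particles=10, n_iter=20, seed=0):
    """Stage 2 (wrapper): binary PSO over subsets of the filtered candidates."""
    rng = random.Random(seed)
    dim = len(candidates)

    def fitness(bits):
        feats = [f for f, bit in zip(candidates, bits) if bit]
        # Assumed trade-off: accuracy minus a small penalty on subset size.
        return loo_accuracy(X, y, feats) - 0.01 * len(feats) / dim

    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                v = (vel[i][d]
                     + 2.0 * rng.random() * (pbest[i][d] - pos[i][d])
                     + 2.0 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp the velocity
                # The sigmoid of the velocity is the probability the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return [f for f, bit in zip(candidates, gbest) if bit]


# Toy data (hypothetical): 12 documents, 6 binary features.
# Feature 1 copies the label, feature 4 copies it with one flip, the rest are weak.
y = [1 if i % 2 == 0 else 0 for i in range(12)]
X = [[int(i % 3 == 0), y[i], (i // 2) % 2, int(i < 6),
      y[i] if i != 0 else 0, int(i % 5 == 0)] for i in range(12)]

candidates = filter_top_k(X, y, k=3)   # [1, 4, 5] on this toy data
selected = bpso_select(X, y, candidates)
print(candidates, selected)
```

The filter stage costs one pass per feature, so it can cut a large vocabulary cheaply. The wrapper stage evaluates a classifier for every particle in every iteration, which is why it is run only on the reduced candidate set.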
Cite this article
Ansari, G., Ahmad, T. & Doja, M.N. Hybrid Filter–Wrapper Feature Selection Method for Sentiment Classification. Arab J Sci Eng 44, 9191–9208 (2019). https://doi.org/10.1007/s13369-019-04064-6
DOI: https://doi.org/10.1007/s13369-019-04064-6