Selecting an Optimal Feature Set for Stance Detection
Stance detection is the automatic recognition of an author’s viewpoint in relation to a given object. An important stage in solving this task is determining the most appropriate way to represent texts. The paper proposes a new method of selecting an optimal feature set. The method is based on a homogeneous ensemble of feature selection methods and a procedure for determining the optimal number of features. In this procedure, the dependence of task performance on the number of features is approximated, and the optimal number of features is determined by analyzing the growth rate of the approximating function. Experiments were conducted on text corpora consisting of “for” and “against” stances towards vaccination of children, the Unified State Examination at school, and human cloning. The results demonstrate that the proposed method achieves better performance than the individual methods, and even than the full feature set, while using considerably fewer features.
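The two stages named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the scoring formula, the aggregation rule, or the approximating function, so everything concrete here is an assumption made for illustration: a Gini-style term score, ensemble members built from bootstrap samples and combined by summed rank, a logarithmic curve f(n) = a + b ln n fitted by least squares, and a hypothetical `min_gain` threshold on its derivative b/n. All function names are hypothetical as well.

```python
import math
import random


def gini_scores(docs, labels):
    """Gini-style score per term: sum over classes of P(t|c)^2 * P(c|t)^2.

    This is an assumed variant; the paper's exact formula is not given here.
    """
    class_sizes = {}
    for y in labels:
        class_sizes[y] = class_sizes.get(y, 0) + 1
    term_counts = {}  # term -> {class: number of documents containing the term}
    for doc, y in zip(docs, labels):
        for term in set(doc):
            per_class = term_counts.setdefault(term, {})
            per_class[y] = per_class.get(y, 0) + 1
    scores = {}
    for term, per_class in term_counts.items():
        total = sum(per_class.values())
        scores[term] = sum(
            (n / class_sizes[c]) ** 2 * (n / total) ** 2
            for c, n in per_class.items()
        )
    return scores


def ensemble_ranking(docs, labels, n_members=5, seed=0):
    """Rank terms with one scorer applied to several bootstrap samples.

    Bootstrap resampling and summed-rank aggregation are assumptions.
    """
    rng = random.Random(seed)
    vocab = sorted({t for d in docs for t in d})
    rank_sums = dict.fromkeys(vocab, 0.0)
    for _ in range(n_members):
        idx = [rng.randrange(len(docs)) for _ in docs]
        scores = gini_scores([docs[i] for i in idx], [labels[i] for i in idx])
        # Terms absent from this sample get score 0 and fall to the bottom.
        ordered = sorted(vocab, key=lambda t: (-scores.get(t, 0.0), t))
        for rank, term in enumerate(ordered):
            rank_sums[term] += rank
    return sorted(vocab, key=lambda t: (rank_sums[t], t))


def fit_log_curve(ns, perf):
    """Least-squares fit of perf ~ a + b * ln(n); the log form is assumed."""
    xs = [math.log(n) for n in ns]
    mx = sum(xs) / len(xs)
    my = sum(perf) / len(perf)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, perf)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b


def optimal_feature_count(ns, perf, min_gain=1e-4):
    """Smallest n in the measured range where the fitted slope b/n <= min_gain."""
    _, b = fit_log_curve(ns, perf)
    if b <= 0:  # the fitted curve does not grow, so more features do not help
        return min(ns)
    n_opt = math.ceil(b / min_gain)
    return max(min(ns), min(n_opt, max(ns)))
```

In use, one would train a classifier on the top n terms of `ensemble_ranking` for several values of n, record the performance at each, and pass those measurements to `optimal_feature_count`. The threshold controls the trade-off: a larger `min_gain` stops earlier and keeps fewer features.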
Keywords: Stance detection, Feature selection, Ensembles, Gini index
The reported study was funded by the Ministry of Education and Science of the Russian Federation under research project No. 34.2092.2017/4.6.