Abstract
Stance detection is the automatic recognition of an author’s viewpoint on a given object. An important stage in solving this task is choosing the most appropriate text representation. This paper proposes a new method for selecting an optimal feature set. The method is based on a homogeneous ensemble of feature selection methods and a procedure for determining the optimal number of features. In this procedure, the dependence of task performance on the number of features is approximated by a function, and the optimal number of features is determined by analyzing that function’s growth rate. Experiments were conducted on text corpora containing “for” and “against” stances towards the vaccination of children, the Unified State Examination at school, and human cloning. The results demonstrate that the proposed method achieves better performance than the individual methods, and even than the full feature set, while using considerably fewer features.
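The two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not state which approximating function, aggregation rule, or stopping threshold the authors use, so everything specific here is an assumption made for illustration: rankings from several data subsets are combined by mean rank, performance is fitted with a logarithmic curve a + b·ln(k), and the function names `aggregate_rankings` and `optimal_feature_count` as well as the `min_gain` threshold are hypothetical.

```python
import numpy as np


def aggregate_rankings(score_matrix):
    """Combine per-subset feature scores into one ranking by mean rank.

    score_matrix has shape (n_subsets, n_features); each row holds the scores
    that one run of the same selection method gave on one data subset
    (higher score = more useful feature). Mean-rank aggregation is an
    assumption, not necessarily the rule used in the paper.
    Returns feature indices ordered from best to worst.
    """
    scores = np.asarray(score_matrix, dtype=float)
    # argsort of argsort yields each feature's rank within a row (0 = best).
    ranks = np.argsort(np.argsort(-scores, axis=1), axis=1)
    return np.argsort(ranks.mean(axis=0), kind="stable")


def optimal_feature_count(counts, scores, min_gain=1e-4):
    """Fit score ~ a + b*ln(k) and pick the smallest tested k at which the
    fitted growth rate b/k falls below min_gain.

    counts must be sorted in ascending order. The logarithmic model and the
    fixed threshold are assumptions made for this sketch.
    """
    k = np.asarray(counts, dtype=float)
    y = np.asarray(scores, dtype=float)
    # Least-squares fit of a straight line in ln(k): slope b, intercept a.
    b, a = np.polyfit(np.log(k), y, deg=1)
    growth = b / k  # derivative of a + b*ln(k) with respect to k
    below = np.nonzero(growth < min_gain)[0]
    # Fall back to the largest tested count if growth never drops low enough.
    chosen = k[below[0]] if below.size else k[-1]
    return int(chosen), (a, b)
```

In use, one would evaluate a classifier on the top-k features of the aggregated ranking for a grid of k values, pass the resulting (k, score) pairs to `optimal_feature_count`, and keep the first k at which further growth is negligible. The threshold trades performance against feature-set size and would need tuning per corpus.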
Notes
- 1. We also tried Bagging and AdaBoost classifiers, but SVM showed the best results.
Acknowledgments
The reported study was funded by the Ministry of Education and Science of the Russian Federation according to the research project No. 34.2092.2017/4.6.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Vychegzhanin, S., Razova, E., Kotelnikov, E., Milov, V. (2019). Selecting an Optimal Feature Set for Stance Detection. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2019. Lecture Notes in Computer Science, vol 11832. Springer, Cham. https://doi.org/10.1007/978-3-030-37334-4_22
DOI: https://doi.org/10.1007/978-3-030-37334-4_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37333-7
Online ISBN: 978-3-030-37334-4
eBook Packages: Computer Science, Computer Science (R0)