Selecting an Optimal Feature Set for Stance Detection

  • Sergey Vychegzhanin
  • Elena Razova
  • Evgeny Kotelnikov
  • Vladimir Milov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11832)

Abstract

Stance detection is the automatic recognition of an author's viewpoint with respect to a given object. An important stage in solving this task is choosing the most appropriate text representation. The paper proposes a new method for selecting an optimal feature set. The method is based on a homogeneous ensemble of feature selection methods and a procedure for determining the optimal number of features. In this procedure, the dependence of task performance on the number of features is approximated by a function, and the optimal number of features is determined by analyzing the function's growth rate. Experiments were conducted on text corpora containing "for" and "against" stances towards the vaccination of children, the Unified State Examination at school, and human cloning. The results show that the proposed method achieves better performance than individual feature selection methods, and even than the full feature set, while using considerably fewer features.
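
The abstract describes the method only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a "homogeneous" ensemble means one filter method applied to several bootstrap samples of the corpus, with the per-sample rankings aggregated by mean rank. The filter score `df_difference_score` is a hypothetical stand-in (a Gini-index-based score, listed among the keywords, could be plugged in instead). The approximation function is assumed to be f(k) = a + b·ln k, fitted by least squares, and the "growth rate" rule is assumed to pick the smallest k whose derivative b/k falls below a hypothetical threshold `eps`. All function names are illustrative.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical data model: a document is a set of terms; a label is 1 ("for") or 0 ("against").
ScoreFn = Callable[[Sequence[Set[str]], Sequence[int]], Dict[str, float]]


def df_difference_score(docs: Sequence[Set[str]], labels: Sequence[int]) -> Dict[str, float]:
    """Illustrative filter score (assumption): |P(term | for) - P(term | against)|."""
    n_pos = sum(labels) or 1  # guard against a class missing from the sample
    n_neg = (len(labels) - sum(labels)) or 1
    counts: Dict[str, List[int]] = {}  # term -> [docs "against", docs "for"]
    for doc, y in zip(docs, labels):
        for term in doc:
            counts.setdefault(term, [0, 0])[y] += 1
    return {t: abs(c[1] / n_pos - c[0] / n_neg) for t, c in counts.items()}


def homogeneous_ensemble_ranking(docs: Sequence[Set[str]], labels: Sequence[int],
                                 score_fn: ScoreFn, n_bags: int = 20, seed: int = 0) -> List[str]:
    """Apply ONE selector to bootstrap samples and aggregate the rankings by mean rank."""
    rng = random.Random(seed)
    vocab = sorted(set().union(*docs))
    rank_sum = {t: 0 for t in vocab}
    n = len(docs)
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        scores = score_fn([docs[i] for i in idx], [labels[i] for i in idx])
        # Terms absent from this bag score 0; rank 0 is the most relevant term.
        ordered = sorted(vocab, key=lambda t: (-scores.get(t, 0.0), t))
        for rank, term in enumerate(ordered):
            rank_sum[term] += rank
    return sorted(vocab, key=lambda t: (rank_sum[t], t))


def optimal_num_features(ks: Sequence[int], perf: Sequence[float], eps: float = 1e-4) -> int:
    """Fit f(k) = a + b*ln(k) to (k, performance) pairs and return the smallest k
    whose growth rate f'(k) = b/k is below eps, clamped to [min(ks), max(ks)]."""
    xs = [math.log(k) for k in ks]
    mx = sum(xs) / len(xs)
    my = sum(perf) / len(perf)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, perf)) / sum((x - mx) ** 2 for x in xs)
    if b <= 0:
        return min(ks)  # performance does not grow with k: keep the smallest set tried
    k_star = math.floor(b / eps) + 1  # smallest integer k with b / k < eps
    return min(max(ks), max(min(ks), k_star))
```

In use, `perf` would hold a performance measure (for example, the F1 score of a classifier) evaluated on the top-k terms of the ensemble ranking for several values of k, and the selected feature set would be `ranking[:k_star]`. The logarithmic form is chosen only because it makes the growth-rate rule closed-form (k > b/eps); another approximation function would require evaluating its derivative instead.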

Keywords

Stance detection · Feature selection · Ensembles · Gini index

Notes

Acknowledgments

The reported study was funded by the Ministry of Education and Science of the Russian Federation according to the research project No. 34.2092.2017/4.6.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Vyatka State University, Kirov, Russia
  2. Nizhny Novgorod State Technical University n.a. R.E. Alekseev, Nizhny Novgorod, Russia