Selecting Relevant Educational Attributes for Predicting Students’ Academic Performance

  • Abir Abid
  • Ilhem Kallel
  • Ignacio J. Blanco
  • Mounir Benayed
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 736)


Predicting students’ academic performance is one of the oldest and most popular applications of educational data mining. It helps to estimate a student’s as-yet-unknown performance evaluation. However, large volumes of data in different formats and from multiple sources may contain many presumably irrelevant features that can influence the prediction results. The main objective of this paper is to improve the effectiveness of a predictive model of students’ academic performance. To this end, we propose a methodology for a comparative study that evaluates the influence of feature selection techniques on the prediction of students’ academic performance. In our study, the F-measure is used to evaluate the effectiveness of the selected techniques. Two real data sources are used in this work: Mathematics and language courses. The outcomes are compared and discussed in order to identify the technique that contributes most to an accurate predictive model.
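To make the evaluation criterion concrete, the following is a minimal sketch of how an F-measure could be computed and used to compare two sets of predictions. It is not the authors' implementation: the abstract does not state which F-measure variant or averaging scheme is used, so the sketch assumes the binary F1 case (beta = 1) for a single class. The function name `f_measure`, the "pass"/"fail" labels, and the toy predictions are all hypothetical and chosen only for illustration.

```python
def f_measure(y_true, y_pred, positive="pass", beta=1.0):
    """F-beta score for one class, computed from two label sequences.

    Assumption: binary evaluation of a single 'positive' class; the
    paper's exact variant is not specified in the abstract.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy labels, invented for illustration only (not data from the paper).
y_true = ["pass", "pass", "fail", "pass", "fail", "fail"]
pred_all_features = ["pass", "fail", "fail", "pass", "pass", "fail"]
pred_selected = ["pass", "pass", "fail", "pass", "pass", "fail"]

# One score per candidate feature set; the highest F-measure wins.
scores = {
    "all_features": f_measure(y_true, pred_all_features),
    "selected_features": f_measure(y_true, pred_selected),
}
best = max(scores, key=scores.get)
```

In a comparative study of this kind, each feature selection technique would yield its own prediction vector from the same classifier, and the dictionary of scores would hold one entry per technique.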



The authors thank the Erasmus+ project for funding the reported research under Grant Agreement number 2015-1-ES01-K107-015469.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Abir Abid (1)
  • Ilhem Kallel (1, 2)
  • Ignacio J. Blanco (3)
  • Mounir Benayed (1, 4)
  1. REGIM-Lab: Research Groups in Intelligent Machines, ENIS, University of Sfax, Sfax, Tunisia
  2. ISIMS: Higher Institute of Computer Science and Multimedia of Sfax, Sfax, Tunisia
  3. University of Granada, Granada, Spain
  4. Computer Science and Communications Department, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
