Identifying key factors of student academic performance by subgroup discovery

  • Sumyea Helal
  • Jiuyong Li
  • Lin Liu
  • Esmaeil Ebrahimie
  • Shane Dawson
  • Duncan J. Murray
Regular Paper


Identifying the factors that influence student academic performance is essential for providing timely and effective support interventions. The data collected at enrolment and after commencement of a course are an important source of information for identifying potential risk indicators associated with poor academic performance and attrition. Both predictive and descriptive data mining techniques have been applied to educational data to discover the significant reasons behind student performance. These techniques have their own advantages and limitations. For example, predictive techniques tend to maximise the accuracy of correctly classifying students, while descriptive techniques simply search for interesting student features without considering their academic outcome. Subgroup discovery is a data mining method that combines the advantages of both predictive and descriptive approaches. This study uses subgroup discovery to extract significant factors of student performance for a given outcome (Pass or Fail). In this work, we have utilised student demographic and academic data recorded at enrolment, as well as course assessment and participation data retrieved from the institution’s learning management system (Moodle), to detect the factors affecting student performance. The results demonstrate the general effectiveness of the subgroup discovery method in identifying these factors, as well as the pros and cons of some popular subgroup discovery algorithms used in this research. The experiments show that students who come from a low socio-economic background or who were admitted under special entry requirements are the most likely to fail. The experiments on Moodle data reveal that students with a lower level of access to the course resources and forum have a higher likelihood of being unsuccessful.
From the combined data, we have identified some interesting subgroups that are not detected when the enrolment or Moodle data are used separately. Students who study off-campus or part-time and make a low level of contribution to the course learning activities were found to be more likely to be low-performing.
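To make the idea of subgroup discovery concrete, the sketch below scores candidate subgroups (conjunctions of attribute=value selectors) for the target outcome Fail. The abstract does not state the quality measure or search strategy used in the study. This sketch therefore assumes weighted relative accuracy, WRAcc(Cond → Target) = p(Cond) · (p(Target | Cond) − p(Target)), a measure widely used in the subgroup discovery literature, together with a simple exhaustive search. The records, the attribute names (`mode`, `load`, `forum`), and the helpers `covers`, `wracc`, and `discover` are all hypothetical and purely illustrative. They are not the study's data, and they do not implement any particular algorithm the study compares.

```python
from itertools import combinations

# Hypothetical toy records for illustration only; these are NOT the study's data.
# Attribute names are assumptions: "mode" and "load" stand in for enrolment
# attributes, and "forum" stands in for a Moodle activity level.
STUDENTS = [
    {"mode": "off-campus", "load": "part-time", "forum": "low", "outcome": "Fail"},
    {"mode": "off-campus", "load": "part-time", "forum": "low", "outcome": "Fail"},
    {"mode": "off-campus", "load": "full-time", "forum": "low", "outcome": "Fail"},
    {"mode": "on-campus", "load": "part-time", "forum": "low", "outcome": "Pass"},
    {"mode": "on-campus", "load": "full-time", "forum": "high", "outcome": "Pass"},
    {"mode": "on-campus", "load": "full-time", "forum": "high", "outcome": "Pass"},
    {"mode": "off-campus", "load": "full-time", "forum": "high", "outcome": "Pass"},
    {"mode": "on-campus", "load": "part-time", "forum": "high", "outcome": "Pass"},
]

TARGET = ("outcome", "Fail")  # the outcome of interest (Pass or Fail)


def covers(record, description):
    """True if the record satisfies every attribute=value selector in the description."""
    return all(record[attr] == value for attr, value in description)


def wracc(data, description, target=TARGET):
    """Weighted relative accuracy: p(Cond) * (p(Target | Cond) - p(Target))."""
    attr, value = target
    covered = [r for r in data if covers(r, description)]
    if not covered:
        return 0.0
    p_cond = len(covered) / len(data)
    p_target = sum(r[attr] == value for r in data) / len(data)
    p_target_given_cond = sum(r[attr] == value for r in covered) / len(covered)
    return p_cond * (p_target_given_cond - p_target)


def discover(data, max_len=2, top_k=3, target=TARGET):
    """Exhaustively score conjunctions of up to max_len selectors; return the top_k by WRAcc."""
    selectors = sorted({(a, r[a]) for r in data for a in r if a != target[0]})
    scored = []
    for length in range(1, max_len + 1):
        for combo in combinations(selectors, length):
            attrs = [a for a, _ in combo]
            if len(set(attrs)) < len(attrs):  # skip e.g. mode=on-campus AND mode=off-campus
                continue
            scored.append((wracc(data, combo, target), combo))
    # list.sort is stable even with reverse=True, so ties keep their enumeration order
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for score, desc in discover(STUDENTS):
        print(f"{score:.4f}", " AND ".join(f"{a}={v}" for a, v in desc))
```

On this toy data, the highest-scoring subgroup is the conjunction `forum=low AND mode=off-campus`, which mixes an enrolment-style attribute with a Moodle-style one. Exhaustive enumeration grows combinatorially with the number of attributes. For that reason, practical subgroup discovery algorithms typically rely on pruning, beam search, or evolutionary search instead.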


Keywords: Subgroup discovery · Education data mining · Moodle · Enrolment data


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Sumyea Helal (1; corresponding author)
  • Jiuyong Li (1)
  • Lin Liu (1)
  • Esmaeil Ebrahimie (1, 2, 3)
  • Shane Dawson (4)
  • Duncan J. Murray (5)

  1. School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, Australia
  2. School of Biological Sciences, The University of Adelaide, Adelaide, Australia
  3. School of Biological Sciences, Flinders University, Adelaide, Australia
  4. Teaching Innovation Unit, University of South Australia, Adelaide, Australia
  5. Business Intelligence and Planning, University of South Australia, Adelaide, Australia