Mining for Marks: A Comparison of Classification Algorithms when Predicting Academic Performance to Identify “Students at Risk”

  • Lebogang Mashiloane
  • Mike Mchunu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8284)


A major concern for higher education institutions is the high failure and drop-out rate amongst students, especially first-year students. Tertiary institutions thus have a common interest in identifying students at risk of failing or dropping out. Previous research has identified factors that influence success or failure, which include, but are not limited to, the students' personal information, academic background and social environment. This study aims to use the emerging field of Educational Data Mining as a preventative measure rather than to reiterate the factors that influence success. It uses the first-year student data collected and stored by the School of Computer Science at the University of the Witwatersrand. The students' first-semester (midyear) mark was used to predict success or failure at the end of the academic year, which helps to identify students at risk of failing and could support early intervention. A modified version of the CRISP-DM methodology was used. The investigation was broken down into two phases: a training phase and a test phase. In the training phase, student data from the years 2009 to 2011 were modelled using the WEKA Explorer GUI. Three classifiers (J48, Naïve Bayes and Decision Table) were used for modelling and compared with one another. Using both the run information from WEKA and performance metrics, the J48 classifier was shown to be the best-performing algorithm in the training phase. This algorithm was then integrated into the back-end of the Success Or Failure Determiner (SOFD) tool, which was created specifically for this study. In the test phase, 92% of the instances were predicted correctly. Furthermore, 23 of the 25 students who failed were flagged. The research findings indicate that the midyear mark can be considered a factor that correctly predicts the Computer Science I end-of-year marks. After further investigation with larger sample sizes, the tool can be used in practice in the School of Computer Science to identify students at risk of failing.
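The idea of predicting pass/fail from a single midyear mark can be illustrated with a minimal Python sketch. It is not the authors' WEKA/J48 model or the SOFD tool: it only picks one cut-point on the mark by information gain, which is a simplified version of the kind of split a C4.5-style decision tree makes on a numeric attribute. The marks and labels are hypothetical, the function names are illustrative, and the rule "at or below the cut means fail" is an assumption of the sketch. The only figures taken from the abstract are the 23 flagged students out of 25 who failed.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * log2(p)
    return result


def best_threshold(marks, labels):
    """Return the cut-point on the mark that maximises information gain.

    Candidate cuts are midpoints between consecutive distinct marks
    (a simplification chosen for this sketch).
    """
    pairs = sorted(zip(marks, labels))
    base = entropy([lab for _, lab in pairs])
    best_gain, best_cut = -1.0, None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut possible between equal marks
        cut = (lo + hi) / 2
        left = [lab for m, lab in pairs if m <= cut]
        right = [lab for m, lab in pairs if m > cut]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_cut


def predict(mark, cut):
    """Assumption of this sketch: marks at or below the cut mean 'fail'."""
    return "fail" if mark <= cut else "pass"


def recall(flagged, missed):
    """Share of the students who actually failed that were flagged."""
    return flagged / (flagged + missed)


# Hypothetical midyear marks and end-of-year outcomes (not the study's data).
marks = [32, 41, 45, 48, 52, 58, 63, 70, 77, 85]
labels = ["fail", "fail", "fail", "fail",
          "pass", "pass", "pass", "pass", "pass", "pass"]

cut = best_threshold(marks, labels)
at_risk = [m for m in marks if predict(m, cut) == "fail"]

# Figures from the abstract: 23 of the 25 failing students were flagged.
fail_recall = recall(23, 25 - 23)
```

With the abstract's figures, `fail_recall` works out to 23/25 = 0.92. That it matches the 92% overall accuracy is a coincidence of the numbers, since the two metrics are computed over different sets of students.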


Keywords: Educational Data Mining, J48 Classifier, Decision Table, Naïve Bayes, WEKA GUI





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Lebogang Mashiloane (1)
  • Mike Mchunu (1)
  1. School of Computer Science, University of the Witwatersrand, Johannesburg, South Africa
