Detecting students-at-risk in computer programming classes with learning analytics from students’ digital footprints

  • David Azcona
  • I-Han Hsiao
  • Alan F. Smeaton


Different sources of data about students, ranging from static demographics to dynamic behaviour logs, can be harnessed from a variety of systems at Higher Education Institutions. Combining these sources assembles a rich digital footprint for each student, which can enable institutions to better understand student behaviour and to better prepare for guiding students towards reaching their academic potential. This paper presents a new research methodology to automatically detect students “at-risk” of failing an assignment in computer programming modules (courses) and to simultaneously support adaptive feedback. By leveraging historical student data, we built predictive models using students’ offline (static) information, including student characteristics and demographics, and online (dynamic) resources, using programming and behaviour activity logs. Predictions are generated weekly during the semester. Overall, the predictive and personalised feedback helped to reduce the gap between the lower- and higher-performing students. Furthermore, students praised the predictions and the personalised feedback, conveying strong recommendations for future students to use the system. We also found that students who followed their personalised guidance and recommendations performed better in examinations.
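The weekly at-risk prediction pipeline described above — a classifier trained on historical cohorts over combined static (demographic) and dynamic (activity-log) features, then applied each week to the current cohort — can be sketched as follows. This is an illustrative reconstruction on synthetic data, not the authors' code; the feature names and toy failure rule are assumptions, while the use of scikit-learn and extremely randomized trees mirrors tools cited in the references (entries 23 and 42).

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a historical cohort: static features
# (e.g. prior academic scores) plus dynamic weekly activity counts.
n_students = 200
static = rng.normal(size=(n_students, 2))             # hypothetical demographics
weekly_logins = rng.poisson(5, size=(n_students, 1))  # platform access logs
submissions = rng.poisson(3, size=(n_students, 1))    # program submissions
X = np.hstack([static, weekly_logins, submissions])

# Label: 1 = failed the weekly assignment (toy rule for illustration only).
y = (submissions.ravel() + static[:, 0] < 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Extremely randomized trees (Geurts et al., ref. 23) via scikit-learn (ref. 42).
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Each week, the fitted model scores the current cohort; students whose
# estimated risk exceeds a threshold would receive adaptive feedback.
risk = clf.predict_proba(X_test)[:, 1]
at_risk = risk > 0.5
print(f"{at_risk.sum()} of {len(risk)} students flagged at-risk this week")
```

In practice the feature matrix would be rebuilt each week from the most recent logs, so the model's predictions track changing student engagement across the semester.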


Computer Science Education · Learning analytics · Predictive modelling · Machine learning · Peer learning · Educational data mining



This research was supported by the Irish Research Council in association with the National Forum for the Enhancement of Teaching and Learning in Ireland under project number GOIPG/2015/3497, by Science Foundation Ireland under grant number 12/RC/2289, and by Fulbright Ireland. The authors are indebted to Dr. Stephen Blott, who developed the grading platform, and Dr. Darragh O’Brien, lecturer on the module which is the subject of this work, for their help. We would also like to thank all the students who participated in this initiative for their comments and feedback, and the anonymous reviewers for their helpful and constructive feedback.


  1. Altadmri, A., Brown, N.C.C.: 37 million compilations: investigating novice programming mistakes in large-scale student data. In: Proceedings of the 46th ACM Technical Symposium on Computer Science Education, pp. 522–527. ACM (2015)
  2. Arnold, K.E., Pistilli, M.D.: Course Signals at Purdue: using learning analytics to increase student success. In: Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, pp. 267–270. ACM (2012)
  3. Azcona, D., Hsiao, I.H., Smeaton, A.F.: PredictCS: personalizing programming learning by leveraging learning analytics. In: Companion Proceedings of the 8th International Conference on Learning Analytics and Knowledge (LAK 2018), pp. 462–468 (2018)
  4. Azcona, D., Smeaton, A.F.: Targeting at-risk students using engagement and effort predictors in an introductory computer programming course. In: European Conference on Technology Enhanced Learning (EC-TEL’17), pp. 361–366. Springer, NY (2017)
  5. Azcona, D., Corrigan, O., Scanlon, P., Smeaton, A.F.: Innovative learning analytics research at a data-driven HEI. In: Third International Conference on Higher Education Advances. Editorial Universitat Politecnica de Valencia (2017)
  6. Blikstein, P., Worsley, M.: Multimodal learning analytics and education data mining: using computational technologies to measure complex learning tasks. J. Learn. Anal. 3(2), 220–238 (2016)
  7. Bloomfield, A., Groves, J.F.: A tablet-based paper exam grading system. In: ACM SIGCSE Bulletin, Vol. 40, No. 3, pp. 83–87. ACM (2008)
  8. Boyer, K.E., Phillips, R., Ingram, A., Ha, E.Y., Wallis, M., Vouk, M., Lester, J.: Investigating the relationship between dialogue structure and tutoring effectiveness: a hidden Markov modeling approach. Int. J. Artif. Intell. Educ. 21(1–2), 65–81 (2011)
  9. Brooks, C., Thompson, C.: Predictive modelling in teaching and learning. In: Lang, C., Siemens, G., Wise, A.F., Gasevic, D. (eds.) The Handbook of Learning Analytics, 1st edn, pp. 61–68. Society for Learning Analytics Research (SoLAR), Alberta (2017)
  10. Buffardi, K., Edwards, S.H.: Effective and ineffective software testing behaviors by novice programmers. In: Proceedings of the Ninth Annual International ACM Conference on International Computing Education Research, pp. 83–90. ACM (2013)
  11. Burleson, W.: Affective Learning Companions: Strategies for Empathetic Agents with Real-time Multimodal Affective Sensing to Foster Meta-cognitive and Meta-affective Approaches to Learning, Motivation, and Perseverance. Ph.D. Thesis, Massachusetts Institute of Technology (2006)
  12. Carter, A.S., Hundhausen, C.D., Adesope, O.: The normalized programming state model: predicting student performance in computing courses based on programming behavior. In: Proceedings of the Eleventh Annual International Conference on International Computing Education Research, pp. 141–150. ACM (2015)
  13. Cheang, B., Kurnia, A., Lim, A., Oon, W.C.: On automated grading of programming assignments in an academic institution. Comput. Educ. 41(2), 121–131 (2003)
  14. Chen, W., Looi, C.K.: Group Scribbles-supported collaborative learning in a primary grade 5 science class. In: Productive Multivocality in the Analysis of Group Interactions, pp. 257–263. Springer (2013)
  15. Conati, C.: Probabilistic assessment of user’s emotions in educational games. Appl. Artif. Intell. 16(7–8), 555–575 (2002)
  16. Conijn, R., Snijders, C., Kleingeld, A., Matzat, U.: Predicting student performance from LMS data: a comparison of 17 blended courses using Moodle LMS. IEEE Trans. Learn. Technol. 10(1), 17–29 (2017)
  17. Corrigan, O., Smeaton, A.F., Glynn, M., Smyth, S.: Using educational analytics to improve test performance. In: Design for Teaching and Learning in a Networked World, pp. 42–55. Springer (2015)
  18. Denny, P., Luxton-Reilly, A., Hamer, J.: Student use of the PeerWise system. In: ACM SIGCSE Bulletin, Vol. 40, No. 3, pp. 73–77. ACM (2008)
  19. Devroye, L., Györfi, L., Lugosi, G.: A Probabilistic Theory of Pattern Recognition, vol. 31. Springer (2013)
  20. Diana, N., Eagle, M., Stamper, J.C., Grover, S., Bienkowski, M.A., Basu, S.: An instructor dashboard for real-time analytics in interactive programming assignments. In: LAK, pp. 272–279 (2017)
  21. Edwards, S.H., Perez-Quinones, M.A.: Web-CAT: automatically grading programming assignments. In: ACM SIGCSE Bulletin, Vol. 40, pp. 328–328. ACM (2008)
  22. Gehringer, E.F.: Electronic peer review and peer grading in computer-science courses. ACM SIGCSE Bull. 33(1), 139–143 (2001)
  23. Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006)
  24. Guerra, J., Sahebi, S., Lin, Y.R., Brusilovsky, P.: The problem solving genome: analyzing sequential patterns of student work with parameterized exercises. In: The 7th International Conference on Educational Data Mining (EDM 2014), pp. 153–160 (2014)
  25. Hanley, J.A., McNeil, B.J.: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1), 29–36 (1982)
  26. Hsiao, I.H.: Mobile grading paper-based programming exams: automatic semantic partial credit assignment approach. In: European Conference on Technology Enhanced Learning, pp. 110–123. Springer (2016)
  27. Hsiao, I.H., Lin, Y.L.: Enriching programming content semantics: an evaluation of visual analytics approach. Comput. Hum. Behav. 72, 771–782 (2017)
  28. Hsiao, I.H., Sosnovsky, S., Brusilovsky, P.: Guiding students to the right questions: adaptive navigation support in an e-learning system for Java programming. J. Comput. Assist. Learn. 26(4), 270–283 (2010)
  29. Hsiao, I.H., Pandhalkudi Govindarajan, S.K., Lin, Y.L.: Semantic visual analytics for today’s programming courses. In: Proceedings of the Sixth International Conference on Learning Analytics and Knowledge, pp. 48–53. ACM (2016)
  30. Hsiao, I.H., Huang, P.K., Murphy, H.: Integrating programming learning analytics across physical and digital space. IEEE Trans. Emerg. Top. Comput. 1, 1–12 (2017a)
  31. Hsiao, I.H., Huang, P.K., Murphy, H.: Uncovering reviewing and reflecting behaviors from paper-based formal assessment. In: Proceedings of the Seventh International Learning Analytics and Knowledge Conference, pp. 319–328. ACM (2017b)
  32. Ihantola, P., Vihavainen, A., Ahadi, A., Butler, M., Borstler, J., Edwards, S.H., Isohanni, E., Korhonen, A., Petersen, A., Rivers, K., et al.: Educational data mining and learning analytics in programming: literature review and case studies. In: Proceedings of the 2015 ITiCSE on Working Group Reports, pp. 41–63. ACM, NY (2015)
  33. Jackson, D., Usher, M.: Grading student programs using ASSYST. In: ACM SIGCSE Bulletin, vol. 29, pp. 335–339. ACM (1997)
  34. Jadud, M.C., Dorn, B.: Aggregate compilation behavior: findings and implications from 27,698 users. In: Proceedings of the Eleventh Annual International Conference on International Computing Education Research, pp. 131–139. ACM (2015)
  35. Lin, C.P., Chen, W., Yang, S.J., Xie, W., Lin, C.C.: Exploring students’ learning effectiveness and attitude in Group Scribbles-supported collaborative reading activities: a study in the primary classroom. J. Comput. Assist. Learn. 30(1), 68–81 (2014)
  36. Looi, C.K., Lin, C.P., Liu, K.P.: Group Scribbles to support knowledge building in jigsaw method. IEEE Trans. Learn. Technol. 1(3), 157–164 (2008)
  37. Lu, Y., Hsiao, I.H.: Personalized information seeking assistant (PISA): from programming information seeking to learning. Inf. Retr. J. 20(5), 433–455 (2017)
  38. Martinez-Maldonado, R., Dimitriadis, Y., Martinez-Mones, A., Kay, J., Yacef, K.: Capturing and analyzing verbal and physical collaborative learning interactions at an enriched interactive tabletop. Int. J. Comput. Support. Collab. Learn. 8(4), 455–485 (2013)
  39. Martinez-Maldonado, R., Clayphan, A., Yacef, K., Kay, J.: MTFeedback: providing notifications to enhance teacher awareness of small group work in the classroom. IEEE Trans. Learn. Technol. 8(2), 187–200 (2015)
  40. Murphy, H.E.: Digitalizing paper-based exams: an assessment of programming grading assistant. In: Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, pp. 775–776. ACM (2017)
  41. Ochoa, X.: Multimodal learning analytics. In: Lang, C., Siemens, G., Wise, A.F., Gasevic, D. (eds.) The Handbook of Learning Analytics, 1st edn, pp. 129–141. Society for Learning Analytics Research (SoLAR), Alberta (2017)
  42. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
  43. Piech, C., Sahami, M., Koller, D., Cooper, S., Blikstein, P.: Modeling how students learn to program. In: Proceedings of the 43rd ACM Technical Symposium on Computer Science Education, pp. 153–160. ACM, New York (2012)
  44. Price, T.W., Zhi, R., Barnes, T.: Hint generation under uncertainty: the effect of hint quality on help-seeking behavior. In: International Conference on Artificial Intelligence in Education, pp. 311–322. Springer, Berlin (2017)
  45. Prieto, L.P., Sharma, K., Kidzinski, L., Dillenbourg, P.: Orchestration load indicators and patterns: in-the-wild studies using mobile eye-tracking. IEEE Trans. Learn. Technol. (2017)
  46. Ritterfeld, U., Shen, C., Wang, H., Nocera, L., Wong, W.L.: Multimodality and interactivity: connecting properties of serious games with educational outcomes. Cyberpsychol. Behav. 12(6), 691–697 (2009)
  47. Rivers, K., Koedinger, K.R.: Data-driven hint generation in vast solution spaces: a self-improving Python programming tutor. Int. J. Artif. Intell. Educ. 27(1), 37–64 (2017)
  48. Singh, A., Karayev, S., Gutowski, K., Abbeel, P.: Gradescope: a fast, flexible, and fair system for scalable assessment of handwritten work. In: Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, pp. 81–88. ACM (2017)
  49. Sosnovsky, S., Brusilovsky, P.: Evaluation of topic-based adaptation and student modeling in QuizGuide. User Model. User-Adapt. Interact. 25(4), 371–424 (2015)
  50. Tempelaar, D.T., Rienties, B., Nguyen, Q.: Towards actionable learning analytics using dispositions. IEEE Trans. Learn. Technol. 10(1), 6–16 (2017)
  51. VanLehn, K., Cheema, S., Wetzel, J., Pead, D.: Some less obvious features of classroom orchestration systems. In: Educational Technologies: Challenges, Applications and Learning Outcomes. Nova Science Publishers, Inc. (2016)
  52. Vea, L., Rodrigo, M.M.: Modeling negative affect detector of novice programming students using keyboard dynamics and mouse behavior. In: Pacific Rim International Conference on Artificial Intelligence, pp. 127–138. Springer, Berlin (2016)

Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. Insight Centre for Data Analytics, Dublin City University, Glasnevin, Dublin, Ireland
  2. School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, USA
