Automatic Classification of Error Types in Solutions to Programming Assignments at Online Learning Platform

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11626)


Online programming courses are becoming more and more popular, but they still have significant drawbacks compared to the traditional education system, e.g., the lack of feedback. In this study, we apply machine learning methods to improve the feedback of automated verification systems for programming assignments. We propose an approach that provides insight into how to fix the code of a given incorrect submission. To achieve this, we detect frequent error types by clustering previously submitted incorrect solutions, label these clusters, and use the resulting labeled dataset to identify the type of error in a new submission. We examine and compare several approaches to detecting frequent error types and to assigning clusters to new submissions. The proposed method is evaluated on a dataset provided by a popular online learning platform.
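The pipeline in the abstract — cluster historical incorrect submissions by error type, label the clusters, then assign each new submission to the nearest cluster — can be sketched as follows. This is an illustrative toy, not the paper's implementation: the feature vectors, error labels, and nearest-centroid assignment are all assumptions made for the example.

```python
# Sketch of cluster-then-classify error labeling (illustrative only).
# Each incorrect submission is represented by a hypothetical feature
# vector, e.g. counts of (off-by-one edits, wrong-operator edits,
# missing-return edits) needed to fix it.
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Previously submitted incorrect solutions, already clustered and
# manually labeled with an error type (toy data).
history = [
    ((1, 0, 0), "off-by-one"),
    ((2, 0, 0), "off-by-one"),
    ((0, 3, 0), "wrong operator"),
    ((0, 2, 1), "wrong operator"),
]

def centroids(labeled):
    """Compute one centroid per labeled cluster."""
    groups = {}
    for vec, label in labeled:
        groups.setdefault(label, []).append(vec)
    return {
        label: tuple(sum(col) / len(vecs) for col in zip(*vecs))
        for label, vecs in groups.items()
    }

def classify(vec, cents):
    """Assign a new submission to the nearest cluster's error type."""
    return min(cents, key=lambda label: distance(vec, cents[label]))

cents = centroids(history)
print(classify((1, 0, 1), cents))  # → off-by-one
```

In practice the features would come from tree-differencing of abstract syntax trees (as in the GumTree-style differencing the paper's reference list points to) rather than hand-picked counts, and the clustering step would be unsupervised before labeling.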


Keywords: MOOC · Automatic evaluation · Clustering · Classification · Programming



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. JetBrains Research, Saint Petersburg, Russia
  2. Higher School of Economics, Saint Petersburg, Russia
  3. Saint Petersburg State University, Saint Petersburg, Russia
