Abstract
In educational settings, automated program repair techniques serve as a feedback mechanism to guide students working on their programming assignments. Recent work has investigated using large language models (LLMs) for program repair, with most attention focused on proprietary systems accessible through APIs. However, limited access to and control over these systems remains a barrier to their adoption and use in education. The present work studies the repair capabilities of open large language models. In particular, we focus on a recent family of generative models that, in addition to standard left-to-right program synthesis, can also predict missing spans of code at any position in a program. We experiment with one of these models on four programming datasets and show that it can achieve good repair performance even without additional training.
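To make the idea of repair by infilling concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes a single-line fault: each line is masked in turn, an infilling function proposes replacements given the code before and after the gap, and the first candidate that passes the tests is returned. The names `repair_by_infilling`, `toy_infill`, and `passes_tests` are hypothetical; `toy_infill` is a fixed stand-in for a real infilling model.

```python
from typing import Callable, Iterable, Optional


def repair_by_infilling(
    buggy_source: str,
    infill: Callable[[str, str], Iterable[str]],
    passes_tests: Callable[[str], bool],
) -> Optional[str]:
    """Mask one line at a time and try the replacements proposed by `infill`.

    Returns the first candidate program that passes the tests, or None.
    """
    lines = buggy_source.splitlines()
    for i in range(len(lines)):
        # Code before and after the masked line is the infilling context.
        prefix = "\n".join(lines[:i])
        suffix = "\n".join(lines[i + 1:])
        for candidate_line in infill(prefix, suffix):
            candidate = "\n".join(lines[:i] + [candidate_line] + lines[i + 1:])
            if passes_tests(candidate):
                return candidate
    return None


def toy_infill(prefix: str, suffix: str) -> Iterable[str]:
    # Hypothetical stand-in for a model: ignores the context and
    # always proposes the same two lines.
    return ["    return a + b", "    return a * b"]


def passes_tests(source: str) -> bool:
    # Assumed test harness: run the candidate and check a function named `add`.
    namespace = {}
    try:
        exec(source, namespace)
        return namespace["add"](2, 3) == 5 and namespace["add"](0, 0) == 0
    except Exception:
        # Candidates that do not parse or that crash count as failures.
        return False


buggy = "def add(a, b):\n    return a - b"
repaired = repair_by_infilling(buggy, toy_infill, passes_tests)
print(repaired)
```

In practice, `infill` would wrap a call to an open infilling model, and the search over masked positions could be narrowed by fault localisation; both are outside the scope of this sketch.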
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Koutcheme, C., Sarsa, S., Leinonen, J., Hellas, A., Denny, P. (2023). Automated Program Repair Using Generative Models for Code Infilling. In: Wang, N., Rebolledo-Mendez, G., Matsuda, N., Santos, O.C., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2023. Lecture Notes in Computer Science, vol 13916. Springer, Cham. https://doi.org/10.1007/978-3-031-36272-9_74
DOI: https://doi.org/10.1007/978-3-031-36272-9_74
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36271-2
Online ISBN: 978-3-031-36272-9
eBook Packages: Computer Science, Computer Science (R0)