Abstract
Both professional developers and teachers frequently deal with imperfect (fragmentary, incomplete, ill-formed) code. Such fragments are common on StackOverflow; students also frequently produce ill-formed code, for which instructors, TAs, or the students themselves must find repairs. In either case, the developer experience would be greatly improved if such code could somehow be parsed and typed: this makes the code more amenable to use within IDEs and allows early detection and repair of potential errors. We introduce a lenient parser, which can parse and type fragments, even ones with simple errors. Training a machine learner to leniently parse and type imperfect code requires a large training set containing many pairs of imperfect code and its repair (and/or type information); such training sets are limited by the human effort and curation they require. In this paper, we present a novel, indirectly supervised approach to training a lenient parser without access to such human-curated training data. We leverage the huge corpus of mostly correct code available on GitHub and the massive, efficient learning capacity of Transformer-based neural architectures. Using GitHub data, we first create a large dataset of code fragments with corresponding tree fragments and type annotations; we then randomly corrupt the input fragments (while requiring correct output) by seeding errors that mimic the corruptions found in StackOverflow and student data. Using this data, we train high-capacity Transformer models to overcome both fragmentation and corruption. With this approach, we achieve reasonable performance on parsing and typing StackOverflow fragments; we also demonstrate that our approach performs well on shorter student error programs and achieves best-in-class performance on longer programs with more than 400 tokens. Finally, we show that by blending DeepFix and our tool, we achieve 77% accuracy, outperforming all previously reported student error correction tools.
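The indirect-supervision recipe in the abstract (corrupt the input, keep the correct target) can be sketched as follows. This is an illustrative reconstruction, not the authors' actual pipeline: `make_training_pair` and `drop_random_token` are hypothetical stand-ins for the paper's corruption-seeding and AST/type-extraction steps.

```python
import random

def make_training_pair(fragment_tokens, target_annotation, corrupt_fn):
    """Indirect supervision: the model's input is a corrupted fragment,
    but the training target is the parse/type annotation of the
    ORIGINAL, uncorrupted fragment."""
    corrupted = corrupt_fn(list(fragment_tokens))  # copy, then corrupt
    return corrupted, target_annotation

# Hypothetical corruption: delete one token at random,
# mimicking a simple omission error.
def drop_random_token(tokens):
    if tokens:
        del tokens[random.randrange(len(tokens))]
    return tokens

src = ["int", "x", "=", "1", ";"]
inp, tgt = make_training_pair(src, "x: int", drop_random_token)
```

Because the target annotation is derived from the clean GitHub code before corruption, no human-curated (broken code, repair) pairs are needed.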
Notes
It is, however, possible for others to license the data, as we did.
AST = Abstract Syntax Tree
28% is a point estimate. The 95% Wald confidence interval, on a binomial estimator with a sample size of 200, is 22-35%.
Allamanis and Sutton define popularity as the number of forks plus the number of watchers.
Anecdotally, additional braces often appear next to existing braces; we therefore simulate this in 70% of cases, inserting the brace at a random location in the remaining 30%.
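The brace-seeding corruption described in the note above can be sketched as follows; this is a minimal illustration, assuming a tokenized fragment, and `seed_extra_brace` is our name for the (hypothetical) corruption helper, not the paper's actual code.

```python
import random

def seed_extra_brace(tokens, p_adjacent=0.7):
    """Corrupt a token list by inserting one spurious brace.

    With probability p_adjacent the brace is placed directly beside an
    existing brace (mimicking the common real-world mistake); otherwise
    it is inserted at a uniformly random position.
    """
    brace = random.choice(["{", "}"])
    brace_positions = [i for i, t in enumerate(tokens) if t in ("{", "}")]
    if brace_positions and random.random() < p_adjacent:
        # Insert immediately before or after an existing brace.
        pos = random.choice(brace_positions) + random.choice([0, 1])
    else:
        pos = random.randint(0, len(tokens))
    return tokens[:pos] + [brace] + tokens[pos:]
```

Applied to clean GitHub fragments, this yields corrupted inputs whose correct parse is still known, so the model learns to recover from the extra brace.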
References
Alexandru CV, Panichella S, Gall HC (2017) Replicating parser behavior using neural machine translation. In: 2017 IEEE/ACM 25th international conference on program comprehension (ICPC). IEEE, pp 316–319
Allamanis M, Sutton C (2013) Mining source code repositories at massive scale using language modeling. In: The 10th working conference on mining software repositories. IEEE, pp 207–216
Babii H, Janes A, Robbes R (2019) Modeling vocabulary for big code machine learning. arXiv:190401873
Bacchelli A, Mocci A, Cleve A, Lanza M (2017) Mining structured data in natural language artifacts with island parsing. Sci Comput Program 150:31–55
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv:14090473
Bhatia S, Singh R (2016) Automated correction for syntax errors in programming assignments using recurrent neural networks. arXiv:160306129
Brown NC, Altadmri A (2017) Novice java programming mistakes: Large-scale data vs. educator beliefs. ACM Trans Comput Educ (TOCE) 17 (2):7
Brown NCC, Kölling M, McCall D, Utting I (2014) Blackbox: a large scale repository of novice programmers’ activity. In: Proceedings of the 45th ACM technical symposium on Computer science education. ACM, pp 223–228
Chakraborty S, Allamanis M, Ray B (2018a) Tree2tree neural translation model for learning source code changes. arXiv:181000314
Chen Z, Kommrusch SJ, Tufano M, Pouchet LN, Poshyvanyk D, Monperrus M (2019) Sequencer: Sequence-to-sequence learning for end-to-end program repair. IEEE Trans Softw Eng
Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:14123555
Dagenais B, Hendren L (2008) Enabling static analysis for partial java programs. In: ACM Sigplan notices, vol 43. ACM, pp 313–328
Ding Y, Ray B, Devanbu P, Hellendoorn VJ (2020) Patching as translation: the data and the metaphor. In: 35th IEEE/ACM international conference on automated software engineering (ASE)
Gupta R, Pal S, Kanade A, Shevade S (2017) Deepfix: Fixing common C language errors by deep learning. In: Thirty-First AAAI conference on artificial intelligence
Gupta R, Kanade A, Shevade S (2018) Deep reinforcement learning for programming language correction. arXiv:180110467
Hellendoorn VJ, Devanbu P (2017) Are deep neural networks the best choice for modeling source code?. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering. ACM, pp 763–773
Hellendoorn VJ, Bird C, Barr ET, Allamanis M (2018) Deep learning type inference. In: Proceedings of the 2018 26th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering. ACM, pp 152–162
Hindle A, Barr ET, Su Z, Gabel M, Devanbu P (2012) On the naturalness of software. In: 2012 34th international conference on software engineering (ICSE). IEEE, pp 837–847
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9 (8):1735–1780
Holmes R, Walker RJ, Murphy GC (2005) Strathcona example recommendation tool. In: ACM SIGSOFT software engineering notes, vol 30. ACM, pp 237–240
Karampatsis RM, Babii H, Robbes R, Sutton C, Janes A (2020) Big code != big vocabulary: Open-vocabulary models for source code. In: International conference on software engineering (ICSE)
Kölling M, Quig B, Patterson A, Rosenberg J (2003) The bluej system and its pedagogy. Comput Sci Educ 13 (4):249–268
Le Goues C, Nguyen T, Forrest S, Weimer W (2011) Genprog: a generic method for automatic software repair. IEEE Trans Softw Eng 38 (1):54–72
Li Y, Wang S, Nguyen TN (2020) Dlfix: Context-based code transformation learning for automated program repair. In: 2020 42nd international conference on software engineering (ICSE)
Long F, Rinard M (2016) Automatic patch generation by learning correct code. In: Proceedings of the 43rd annual ACM SIGPLAN-SIGACT symposium on principles of programming languages, pp 298–312
Loshchilov I, Hutter F (2016) Sgdr: Stochastic gradient descent with warm restarts. arXiv:160803983
Lutellier T, Pham HV, Pang L, Li Y, Wei M, Tan L (2020) Coconut: combining context-aware neural translation models using ensemble for program repair. In: Proceedings of the 29th ACM SIGSOFT international symposium on software testing and analysis, pp 101–114
Malik RS, Patra J, Pradel M (2019) Nl2type: inferring javascript function types from natural language information. In: Proceedings of the 41st international conference on software engineering. IEEE Press, pp 304–315
McCracken M, Almstrum V, Diaz D, Guzdial M, Hagan D, Kolikant YBD, Laxer C, Thomas L, Utting I, Wilusz T (2001) A multi-national, multi-institutional study of assessment of programming skills of first-year cs students. In: Working group reports from ITiCSE on innovation and technology in computer science education. ACM, pp 125–180
Mesbah A, Rice A, Johnston E, Glorioso N, Aftandilian E (2019) Deepdelta: learning to repair compilation errors. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering. ACM
Moonen L (2001) Generating robust parsers using island grammars. In: Proceedings eighth working conference on reverse engineering. IEEE, pp 13–22
Nasehi SM, Sillito J, Maurer F, Burns C (2012) What makes a good code example?: A study of programming q&a in stackoverflow. In: 2012 28th IEEE international conference on software maintenance (ICSM). IEEE, pp 25–34
Ponzanelli L, Bavota G, Di Penta M, Oliveto R, Lanza M (2014) Mining stackoverflow to turn the ide into a self-confident programming prompter. In: Proceedings of the 11th working conference on mining software repositories. ACM, pp 102–111
Pradel M, Sen K (2018) Deepbugs: a learning approach to name-based bug detection. Proc ACM Prog Lang 2 (OOPSLA):1–25
Qiu D, Li B, Leung H (2016) Understanding the API usage in java. Inf Softw Technol 73:81–100
Raychev V, Vechev M, Krause A (2015) Predicting program properties from big code. In: ACM SIGPLAN notices, vol 50. ACM, pp 111–124
Rigby PC, Robillard MP (2013) Discovering essential code elements in informal documentation. In: 2013 35th international conference on software engineering (ICSE). IEEE, pp 832–841
Rountev A, Ryder BG, Landi W (1999) Data-flow analysis of program fragments. In: Software engineering - ESEC/FSE '99. Springer, pp 235–252
Santos EA, Campbell JC, Patel D, Hindle A, Amaral JN (2018) Syntax and sensibility: Using language models to detect and correct syntax errors. In: 2018 IEEE 25th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 311–322
Synytskyy N, Cordy JR, Dean TR (2003) Robust multilingual parsing using island grammars. In: Proceedings of the 2003 conference of the centre for advanced studies on collaborative research. IBM Press, pp 266–278
Thummalapenta S, Xie T (2007) Parseweb: a programmer assistant for reusing open source code on the web. In: Proceedings of the twenty-second IEEE/ACM international conference on automated software engineering. ACM, pp 204–213
Tufano M, Pantiuchina J, Watson C, Bavota G, Poshyvanyk D (2019) On learning meaningful code changes via neural machine translation. In: Proceedings of the 41st international conference on software engineering. IEEE Press, pp 25–36
Van Deursen A, Kuipers T (1999) Building documentation generators. In: Proceedings IEEE international conference on software maintenance (ICSM '99). IEEE, pp 40–49
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
Vinyals O, Kaiser Ł, Koo T, Petrov S, Sutskever I, Hinton G (2015) Grammar as a foreign language. In: Advances in neural information processing systems, pp 2773–2781
Wang K, Singh R, Su Z (2018) Search, align, and repair: Data-driven feedback generation for introductory programming exercises. SIGPLAN Not 53 (4):481–495. https://doi.org/10.1145/3296979.3192384
White M, Tufano M, Martinez M, Monperrus M, Poshyvanyk D (2019) Sorting and transforming program repair ingredients via deep learning code similarities. In: 2019 IEEE 26th international conference on software analysis, evolution and reengineering (SANER). IEEE
Acknowledgements
Prem Devanbu and Vincent Hellendoorn were supported by the National Science Foundation, via NSF CISE SHF: Large No. 1414172. Hellendoorn was also supported by the Microsoft PhD Fellowship. Toufique Ahmed was supported by the UC Davis Dean’s Distinguished Graduate Fellowship. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Additional information
Communicated by: Lin Tan
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Ahmed, T., Devanbu, P. & Hellendoorn, V.J. Learning lenient parsing & typing via indirect supervision. Empir Software Eng 26, 29 (2021). https://doi.org/10.1007/s10664-021-09942-y