Learning lenient parsing & typing via indirect supervision

Abstract

Both professional developers and teachers frequently deal with imperfect (fragmentary, incomplete, ill-formed) code. Such fragments are common on StackOverflow; students also frequently produce ill-formed code, for which instructors, TAs (or the students themselves) must find repairs. In either case, the developer experience would be greatly improved if such code could somehow be parsed & typed; this would make the code more amenable to use within IDEs and allow early detection and repair of potential errors. We introduce a lenient parser, which can parse & type fragments, even ones with simple errors. Training a machine learner to leniently parse and type imperfect code requires a large training set of pairs of imperfect code and its repair (and/or type information); such training sets are limited by human effort and curation. In this paper, we present a novel, indirectly supervised approach to training a lenient parser without access to such human-curated training data. We leverage the huge corpus of mostly correct code available on GitHub and the massive, efficient learning capacity of Transformer-based neural architectures. Using GitHub data, we first create a large dataset of code fragments with corresponding tree fragments and type annotations; we then randomly corrupt the input fragments (while requiring correct output) by seeding errors that mimic the corruptions found in StackOverflow and student data. Using this data, we train high-capacity Transformer models to overcome both fragmentation and corruption. With this approach, we achieve reasonable performance on parsing & typing StackOverflow fragments; we also demonstrate that our approach performs well on shorter student error programs and achieves best-in-class performance on longer programs with more than 400 tokens. Finally, we show that by blending DeepFix and our tool, we achieve 77% accuracy, outperforming all previously reported student error correction tools.
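To make the indirect-supervision recipe concrete, the following is a minimal Python sketch of the data-generation step described above: correct fragments mined from GitHub are deliberately corrupted, and each corrupted input is paired with the parse/type target derived from its original, uncorrupted form. All names here (seed_errors, make_training_pairs, parse_fn) are illustrative assumptions, not the paper's actual implementation.

```python
import random

def seed_errors(tokens, p_corrupt=0.5):
    """Randomly corrupt a token sequence, mimicking the kinds of
    errors seen in StackOverflow snippets and student code."""
    tokens = list(tokens)
    if tokens and random.random() < p_corrupt:
        i = random.randrange(len(tokens))
        op = random.choice(["drop", "insert_brace", "swap"])
        if op == "drop":                # delete a token, e.g. a ';'
            del tokens[i]
        elif op == "insert_brace":      # add a spurious '{' or '}'
            tokens.insert(i, random.choice(["{", "}"]))
        elif len(tokens) > 1:           # transpose two adjacent tokens
            j = (i + 1) % len(tokens)
            tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def make_training_pairs(fragments, parse_fn):
    """Pair each corrupted fragment with the tree/type target derived
    from its clean original, so a sequence-to-sequence model learns
    to recover structure from imperfect input."""
    for frag in fragments:
        target = parse_fn(frag)          # target comes from the clean code
        yield seed_errors(frag), target  # corrupted input -> clean output
```

Because the target side is always derived from the clean fragment, no human-labeled repairs are required; the corpus itself supplies the supervision indirectly.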

Notes

  1. It is, however, possible for others to license the data, as we did.

  2. AST = Abstract Syntax Tree

  3. 28% is a point estimate. The 95% Wald confidence interval, on a binomial estimator with a sample size of 200, is 22-35% (the interval formula is worked out after these notes).

  4. https://stackoverflow.com/a/54596387

  5. Allamanis and Sutton define popularity as the number of forks plus the number of watchers.

  6. See https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html

  7. See https://github.com/Lsdefine/attention-is-all-you-need-keras

  8. Anecdotally, additional braces often appear next to existing braces; we therefore insert the seeded brace adjacent to an existing brace in 70% of cases, and at a random location for the rest (see the sketch after these notes).

  9. See https://cloud.google.com/bigquery/public-data/

  10. See https://bitbucket.org/iiscseal/deepfix/src/master/

  11. http://www.sable.mcgill.ca/ppa/
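
As a cross-check on note 3, the Wald interval is the usual normal approximation to a binomial proportion; with point estimate p̂ ≈ 0.28 and n = 200:

```latex
\hat{p} \;\pm\; z_{0.975}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
  \;=\; 0.28 \pm 1.96\sqrt{\frac{0.28 \times 0.72}{200}}
  \;\approx\; 0.28 \pm 0.06
```

consistent with the 22-35% quoted in note 3 (the exact endpoints depend on the unrounded point estimate).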
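
For note 8, here is a minimal Python sketch of the brace-seeding rule, under our own naming assumptions (the function name and token-list representation are illustrative, not the paper's code):

```python
import random

def insert_spurious_brace(tokens):
    """Seed one extra '{' or '}': 70% of the time adjacent to an
    existing brace, otherwise at a uniformly random position."""
    brace = random.choice(["{", "}"])
    brace_positions = [i for i, tok in enumerate(tokens) if tok in ("{", "}")]
    if brace_positions and random.random() < 0.7:
        # place the new brace immediately before or after an existing one
        pos = random.choice(brace_positions) + random.choice([0, 1])
    else:
        # fall back to a uniformly random insertion point
        pos = random.randrange(len(tokens) + 1)
    return tokens[:pos] + [brace] + tokens[pos:]
```

For example, insert_spurious_brace(['int', 'x', ';', '{', '}']) returns a new token list with one extra brace seeded according to the 70/30 rule.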

References

  • Alexandru CV, Panichella S, Gall HC (2017) Replicating parser behavior using neural machine translation. In: 2017 IEEE/ACM 25th international conference on program comprehension (ICPC). IEEE, pp 316–319

  • Allamanis M, Sutton C (2013) Mining source code repositories at massive scale using language modeling. In: The 10th working conference on mining software repositories. IEEE, pp 207–216

  • Babii H, Janes A, Robbes R (2019) Modeling vocabulary for big code machine learning. arXiv:1904.01873

  • Bacchelli A, Mocci A, Cleve A, Lanza M (2017) Mining structured data in natural language artifacts with island parsing. Sci Comput Program 150:31–55

  • Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473

  • Bhatia S, Singh R (2016) Automated correction for syntax errors in programming assignments using recurrent neural networks. arXiv:1603.06129

  • Brown NC, Altadmri A (2017) Novice java programming mistakes: Large-scale data vs. educator beliefs. ACM Trans Comput Educ (TOCE) 17 (2):7

  • Brown NCC, Kölling M, McCall D, Utting I (2014) Blackbox: a large scale repository of novice programmers’ activity. In: Proceedings of the 45th ACM technical symposium on Computer science education. ACM, pp 223–228

  • Chakraborty S, Allamanis M, Ray B (2018) Tree2tree neural translation model for learning source code changes. arXiv:1810.00314

  • Chen Z, Kommrusch SJ, Tufano M, Pouchet LN, Poshyvanyk D, Monperrus M (2019) Sequencer: Sequence-to-sequence learning for end-to-end program repair. IEEE Trans Softw Eng

  • Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555

  • Dagenais B, Hendren L (2008) Enabling static analysis for partial java programs. In: ACM Sigplan notices, vol 43. ACM, pp 313–328

  • Ding Y, Ray B, Devanbu P, Hellendoorn VJ (2020) Patching as translation: the data and the metaphor. In: 35th IEEE/ACM international conference on automated software engineering (ASE)

  • Gupta R, Pal S, Kanade A, Shevade S (2017) Deepfix: Fixing common C language errors by deep learning. In: Thirty-First AAAI conference on artificial intelligence

  • Gupta R, Kanade A, Shevade S (2018) Deep reinforcement learning for programming language correction. arXiv:1801.10467

  • Hellendoorn VJ, Devanbu P (2017) Are deep neural networks the best choice for modeling source code? In: Proceedings of the 2017 11th joint meeting on foundations of software engineering. ACM, pp 763–773

  • Hellendoorn VJ, Bird C, Barr ET, Allamanis M (2018) Deep learning type inference. In: Proceedings of the 2018 26th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering. ACM, pp 152–162

  • Hindle A, Barr ET, Su Z, Gabel M, Devanbu P (2012) On the naturalness of software. In: 2012 34th international conference on software engineering (ICSE). IEEE, pp 837–847

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9 (8):1735–1780

  • Holmes R, Walker RJ, Murphy GC (2005) Strathcona example recommendation tool. In: ACM SIGSOFT software engineering notes, vol 30. ACM, pp 237–240

  • Karampatsis RM, Babii H, Robbes R, Sutton C, Janes A (2020) Big code != big vocabulary: Open-vocabulary models for source code. In: International conference on software engineering (ICSE)

  • Kölling M, Quig B, Patterson A, Rosenberg J (2003) The bluej system and its pedagogy. Comput Sci Educ 13 (4):249–268

  • Le Goues C, Nguyen T, Forrest S, Weimer W (2011) Genprog: a generic method for automatic software repair. IEEE Trans Softw Eng 38 (1):54–72

  • Li Y, Wang S, Nguyen TN (2020) Dlfix: Context-based code transformation learning for automated program repair. In: 2020 42nd international conference on software engineering (ICSE)

  • Long F, Rinard M (2016) Automatic patch generation by learning correct code. In: Proceedings of the 43rd annual ACM SIGPLAN-SIGACT symposium on principles of programming languages, pp 298–312

  • Loshchilov I, Hutter F (2016) Sgdr: Stochastic gradient descent with warm restarts. arXiv:1608.03983

  • Lutellier T, Pham HV, Pang L, Li Y, Wei M, Tan L (2020) Coconut: combining context-aware neural translation models using ensemble for program repair. In: Proceedings of the 29th ACM SIGSOFT international symposium on software testing and analysis, pp 101–114

  • Malik RS, Patra J, Pradel M (2019) Nl2type: inferring javascript function types from natural language information. In: Proceedings of the 41st international conference on software engineering. IEEE Press, pp 304–315

  • McCracken M, Almstrum V, Diaz D, Guzdial M, Hagan D, Kolikant YBD, Laxer C, Thomas L, Utting I, Wilusz T (2001) A multi-national, multi-institutional study of assessment of programming skills of first-year cs students. In: Working group reports from ITiCSE on innovation and technology in computer science education. ACM, pp 125–180

  • Mesbah A, Rice A, Johnston E, Glorioso N, Aftandilian E (2019) Deepdelta: learning to repair compilation errors. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering (ESEC/FSE). ACM

  • Moonen L (2001) Generating robust parsers using island grammars. In: Proceedings eighth working conference on reverse engineering. IEEE, pp 13–22

  • Nasehi SM, Sillito J, Maurer F, Burns C (2012) What makes a good code example?: A study of programming q&a in stackoverflow. In: 2012 28th IEEE international conference on software maintenance (ICSM). IEEE, pp 25–34

  • Ponzanelli L, Bavota G, Di Penta M, Oliveto R, Lanza M (2014) Mining stackoverflow to turn the ide into a self-confident programming prompter. In: Proceedings of the 11th working conference on mining software repositories. ACM, pp 102–111

  • Pradel M, Sen K (2018) Deepbugs: a learning approach to name-based bug detection. Proc ACM Prog Lang 2 (OOPSLA):1–25

  • Qiu D, Li B, Leung H (2016) Understanding the API usage in java. Inf Softw Technol 73:81–100

  • Raychev V, Vechev M, Krause A (2015) Predicting program properties from big code. In: ACM SIGPLAN notices, vol 50. ACM, pp 111–124

  • Rigby PC, Robillard MP (2013) Discovering essential code elements in informal documentation. In: 2013 35th international conference on software engineering (ICSE). IEEE, pp 832–841

  • Rountev A, Ryder BG, Landi W (1999) Data-flow analysis of program fragments. In: Software engineering - ESEC/FSE '99. Springer, pp 235–252

  • Santos EA, Campbell JC, Patel D, Hindle A, Amaral JN (2018) Syntax and sensibility: Using language models to detect and correct syntax errors. In: 2018 IEEE 25th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 311–322

  • Synytskyy N, Cordy JR, Dean TR (2003) Robust multilingual parsing using island grammars. In: Proceedings of the 2003 conference of the centre for advanced studies on collaborative research. IBM Press, pp 266–278

  • Thummalapenta S, Xie T (2007) Parseweb: a programmer assistant for reusing open source code on the web. In: Proceedings of the twenty-second IEEE/ACM international conference on automated software engineering. ACM, pp 204–213

  • Tufano M, Pantiuchina J, Watson C, Bavota G, Poshyvanyk D (2019) On learning meaningful code changes via neural machine translation. In: Proceedings of the 41st international conference on software engineering. IEEE Press, pp 25–36

  • Van Deursen A, Kuipers T (1999) Building documentation generators. In: Proceedings of the IEEE international conference on software maintenance (ICSM'99). IEEE, pp 40–49

  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008

  • Vinyals O, Kaiser Ł, Koo T, Petrov S, Sutskever I, Hinton G (2015) Grammar as a foreign language. In: Advances in neural information processing systems, pp 2773–2781

  • Wang K, Singh R, Su Z (2018) Search, align, and repair: Data-driven feedback generation for introductory programming exercises. SIGPLAN Not 53 (4):481–495. https://doi.org/10.1145/3296979.3192384

  • White M, Tufano M, Martinez M, Monperrus M, Poshyvanyk D (2019) Sorting and transforming program repair ingredients via deep learning code similarities. In: 2019 IEEE 26th international conference on software analysis, evolution and reengineering (SANER). IEEE

Acknowledgements

Prem Devanbu and Vincent Hellendoorn were supported by the National Science Foundation, via NSF CISE SHF: Large No. 1414172. Hellendoorn was also supported by the Microsoft PhD Fellowship. Toufique Ahmed was supported by the UC Davis Dean’s Distinguished Graduate Fellowship. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Corresponding author

Correspondence to Toufique Ahmed.

Additional information

Communicated by: Lin Tan

About this article

Cite this article

Ahmed, T., Devanbu, P. & Hellendoorn, V.J. Learning lenient parsing & typing via indirect supervision. Empir Software Eng 26, 29 (2021). https://doi.org/10.1007/s10664-021-09942-y
