Abstract
For some inexperienced developers, extracting key information from code snippets and programming error messages and turning it into a highly readable question can help them better understand, locate and search for the cause of errors. This paper proposes a copy mechanism guided transformer with pre-trained programming and natural languages representations (CMPPN) to automatically generate questions with high human readability from code snippets and programming error messages. Our CMPPN is pre-trained on a large scale code corpus with code summarization task based on transformer, and incorporated with copying mechanism in the fine-tuning phase. To evaluate our proposed model, we create a new dataset based on Stack Overflow posts, which contains code snippets, programming error messages and corresponding question headlines in 3 programming languages (Java, C# and Python). Extensive experimental results on this dataset verify the effectiveness of our CMPPN compared to baseline methods. Both dataset and model are available on https://github.com/YuiTH/CEMS-SO.
B. Yao—Worked during the internship at Microsoft Research Asia.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
- 2.
- 3.
- 4.
82% of the questions have at least one answer, indicating that these questions are of high quality.
- 5.
- 6.
For Java, we use “at:” plus Java top level domain package name including java org io net etc., as pattern for error message; for C#, pattern are “CS” with four digits for compile error and “Exception:” for runtime error; for Python, pattern is message start with “Traceback (most recent call last)”.
- 7.
When a Python program reports an error, the information pointing to user codes usually appears at the end.
- 8.
References
Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, Michigan, pp. 65–72. Association for Computational Linguistics, June 2005
Dong, L., et al.: Unified language model pre-training for natural language understanding and generation. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 13063–13075 (2019)
Feng, Z., et al.: CodeBERT: a pre-trained model for programming and natural languages. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pp. 1536–1547 (2020)
Gu, J., Lu, Z., Li, H., Li, V.O.: Incorporating copying mechanism in sequence-to-sequence learning. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1631–1640 (2016)
Guo, D., et al.: GRAPHCODEBERT: pre-training code representations with data flow. In: International Conference on Learning Representations (2020)
Hailpern, B., Santhanam, P.: Software debugging, testing, and verification. IBM Syst. J. 41(1), 4–12 (2002)
Hermann, K.M., et al.: Teaching machines to read and comprehend. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc. (2015)
Husain, H., Wu, H.H., Gazit, T., Allamanis, M., Brockschmidt, M.: CodeSearchNet challenge: evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436 (2019)
Kanade, A., Maniatis, P., Balakrishnan, G., Shi, K.: Pre-trained contextual embedding of source code. arXiv preprint arXiv:2001.00059 (2019)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019)
Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out, Barcelona, Spain, pp. 74–81. Association for Computational Linguistics, July 2004
Lu, S., et al.: CodeXGLUE: a machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664 (2021)
Narayan, S., Cohen, S.B., Lapata, M.: Don’t give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium (2018)
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp. 311–318 (2002)
Qi, W., et al.: ProphetNet-x: large-scale pre-training models for English, Chinese, multi-lingual, dialog, and code generation. arXiv preprint arXiv:2104.08006 (2021)
Qi, W., et al.: ProphetNet: predicting future n-gram for sequence-to-sequence pre-training. arXiv preprint arXiv:2001.04063 (2020)
Ruparelia, N.B.: Software development lifecycle models. ACM SIGSOFT Softw. Eng. Notes 35(3), 8–13 (2010)
See, A., Liu, P.J., Manning, C.D.: Get to the point: summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368 (2017)
Song, K., Tan, X., Qin, T., Lu, J., Liu, T.Y.: MASS: masked sequence to sequence pre-training for language generation. In: International Conference on Machine Learning, pp. 5926–5936. PMLR (2019)
Svyatkovskiy, A., Deng, S.K., Fu, S., Sundaresan, N.: IntelliCode compose: code generation using transformer. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1433–1443 (2020)
Tai, K.S., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075 (2015)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Wu, Y., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)
Zhang, J., Zhao, Y., Saleh, M., Liu, P.: PEGASUS: pre-training with extracted gap-sentences for abstractive summarization. In: International Conference on Machine Learning, pp. 11328–11339. PMLR (2020)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Yao, B. et al. (2021). Question Generation from Code Snippets and Programming Error Messages. In: Wang, L., Feng, Y., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2021. Lecture Notes in Computer Science(), vol 13028. Springer, Cham. https://doi.org/10.1007/978-3-030-88480-2_32
Download citation
DOI: https://doi.org/10.1007/978-3-030-88480-2_32
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88479-6
Online ISBN: 978-3-030-88480-2
eBook Packages: Computer ScienceComputer Science (R0)