
Source-Code Generation Using Deep Learning: A Survey

  • Conference paper
Progress in Artificial Intelligence (EPIA 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14116)


Abstract

In recent years, the need for effective, reusable, and high-quality source code has grown rapidly. Writing source code is an integral part of building any software system; the development phase of the software lifecycle encompasses code implementation, refactoring, maintenance, and bug fixing. Software developers implement the desired solution by turning system requirements into viable software products. The implementation phase can be challenging, as it demands problem-solving skill and the ability to produce high-quality outcomes without sacrificing productivity or missing business plans and deadlines. Programmers’ daily tasks may also include writing large amounts of repetitive boilerplate code, which is tedious and prone to bugs introduced by human error. The ability to automatically generate source code can save significant time and effort by increasing the speed and efficiency of software development teams. In this survey, we review and summarize recent studies on deep learning approaches for generating source code in programming languages such as Java, Python, and SQL (Structured Query Language). We categorize the surveyed work into two groups: Natural Language-based solutions, which take natural-language text as input, and Computer Vision-based solutions, which generate code from images.
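To make the survey's first category concrete, the short sketch below illustrates the natural-language-based setting: a pretrained code language model completes a Python function from a natural-language description. The Hugging Face transformers library and the Salesforce/codegen-350M-mono checkpoint are illustrative assumptions, not a method prescribed by the surveyed work; any comparable code language model could be substituted.

    # Minimal sketch: natural-language-to-code generation with a pretrained
    # causal language model (assumed: Hugging Face transformers and a small
    # public CodeGen checkpoint).
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Salesforce/codegen-350M-mono"  # Python-only CodeGen model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # The natural-language intent is expressed as a comment plus a function
    # signature; the model generates the body one token at a time.
    prompt = "# Return the n-th Fibonacci number\ndef fibonacci(n):"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        pad_token_id=tokenizer.eos_token_id,  # avoid a missing-pad warning
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The computer-vision-based category follows the same encoder-decoder pattern, except that the input encoder consumes an image, such as a GUI screenshot or a hand-drawn mock-up, instead of text.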




Author information


Corresponding author

Correspondence to Areeg Ahmed.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ahmed, A., Azab, S., Abdelhamid, Y. (2023). Source-Code Generation Using Deep Learning: A Survey. In: Moniz, N., Vale, Z., Cascalho, J., Silva, C., Sebastião, R. (eds) Progress in Artificial Intelligence. EPIA 2023. Lecture Notes in Computer Science, vol 14116. Springer, Cham. https://doi.org/10.1007/978-3-031-49011-8_37


  • DOI: https://doi.org/10.1007/978-3-031-49011-8_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-49010-1

  • Online ISBN: 978-3-031-49011-8

  • eBook Packages: Computer Science, Computer Science (R0)
