Abstract
In recent years, the need for effective, reusable, and high-quality source code has grown rapidly. Writing source code is an integral part of building any software system; the development phase of the software lifecycle encompasses code implementation, refactoring, maintenance, and bug fixing. Software developers implement the desired solution by turning system requirements into viable software products. The implementation phase can be challenging: it demands problem-solving skill and the ability to produce high-quality outcomes without sacrificing productivity or missing business plans and deadlines. Programmers’ daily tasks may also include writing large amounts of repetitive boilerplate code, which is tedious and prone to bugs introduced by human error. The ability to generate source code automatically can therefore save significant time and effort by increasing the speed and efficiency of software development teams. In this survey, we review and summarize recent studies on deep learning approaches for generating source code in programming languages such as Java, Python, and SQL (Structured Query Language). We categorize the surveyed work into two groups: Natural Language-based solutions, which take natural-language text as input, and Computer Vision-based solutions, which generate code from images.
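To make the first category concrete, the minimal sketch below illustrates the natural language-based paradigm in its simplest form: a pre-trained causal language model completes source code from a natural-language prompt. This is an illustration only, not the method of any one surveyed paper; the Hugging Face transformers API calls are standard, but the model checkpoint (Salesforce/codegen-350M-mono) and the decoding settings are assumptions chosen for the example.

```python
# Minimal sketch: natural-language-to-code generation with a pre-trained
# causal language model. Checkpoint and decoding settings are illustrative
# assumptions, not a prescription from the surveyed work.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Salesforce/codegen-350M-mono"  # small code-generation checkpoint (assumed)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Natural-language intent expressed as a comment plus a function signature.
prompt = "# Return the n-th Fibonacci number\ndef fibonacci(n: int) -> int:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,                     # cap the length of the completion
    do_sample=False,                       # greedy decoding for reproducibility
    pad_token_id=tokenizer.eos_token_id,   # silence the missing-pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Computer vision-based solutions follow the same generate-from-input pattern but condition the decoder on an encoded image (e.g., a GUI screenshot) rather than on text.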