Large Language Model Assisted Software Engineering: Prospects, Challenges, and a Case Study

Belzner, Lenz; Gabor, Thomas; Wirsing, Martin

doi:10.1007/978-3-031-46002-9_23

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14380)

Included in the following conference series:

International Conference on Bridging the Gap between AI and Reality

Abstract

Large language models such as OpenAI’s GPT and Google’s Bard offer new opportunities for supporting software engineering processes. Large language model assisted software engineering promises to support developers conversationally, with expert knowledge, across the whole software lifecycle. Current applications range from requirements extraction, ambiguity resolution, code and test case generation, code review and translation to verification and repair of software vulnerabilities. In this paper, we present our position on the potential benefits and challenges associated with the adoption of language models in software engineering. In particular, we focus on the possible applications of large language models for requirements engineering, system design, code and test generation, code quality reviews, and software process management. We also give a short review of the state of the art in large language model support for software construction and illustrate our position with a case study on the object-oriented development of a simple “search and rescue” scenario.

Notes

1. https://github.com/mermaid-js/mermaid
2. ChatGPT: https://chat.openai.com/share/a93d844d-e542-4997-a7d5-0d254e007c08
3. Bard: https://g.co/bard/share/c51838296a3c
4. https://github.com/Significant-Gravitas/Auto-GPT
5. https://github.com/geekan/MetaGPT
6. https://github.com/RoboCoachTechnologies/GPT-Synthesizer
7. https://github.com/AntonOsika/gpt-engineer
8. Despite the validity of the Church–Turing thesis, more powerful tools enable more products in practice.

Acknowledgements

We thank the anonymous reviewer for constructive criticisms and helpful suggestions.

Author information

Authors and Affiliations

TH Ingolstadt, Ingolstadt, Germany
Lenz Belzner
Ludwig-Maximilians-Universität München, München, Germany
Thomas Gabor & Martin Wirsing

Corresponding author

Correspondence to Martin Wirsing.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Bernhard Steffen

Cite this paper

Belzner, L., Gabor, T., Wirsing, M. (2024). Large Language Model Assisted Software Engineering: Prospects, Challenges, and a Case Study. In: Steffen, B. (eds) Bridging the Gap Between AI and Reality. AISoLA 2023. Lecture Notes in Computer Science, vol 14380. Springer, Cham. https://doi.org/10.1007/978-3-031-46002-9_23

DOI: https://doi.org/10.1007/978-3-031-46002-9_23
Published: 14 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46001-2
Online ISBN: 978-3-031-46002-9
eBook Packages: Computer Science, Computer Science (R0)
