
Large Language Model Assisted Software Engineering: Prospects, Challenges, and a Case Study

  • Conference paper

In: Bridging the Gap Between AI and Reality (AISoLA 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14380)

Abstract

Large language models such as OpenAI’s GPT and Google’s Bard offer new opportunities for supporting software engineering processes. Large language model assisted software engineering promises to support developers in a conversational way with expert knowledge across the whole software lifecycle. Current applications range from requirements extraction, ambiguity resolution, code and test case generation, and code review and translation to the verification and repair of software vulnerabilities. In this paper, we present our position on the potential benefits and challenges associated with the adoption of large language models in software engineering. In particular, we focus on possible applications of large language models to requirements engineering, system design, code and test generation, code quality reviews, and software process management. We also give a short review of the state of the art of large language model support for software construction and illustrate our position with a case study on the object-oriented development of a simple “search and rescue” scenario.
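The conversational support described above, in particular code and test generation with iterative repair, can be sketched as a generate-test-repair loop. The following is a minimal, runnable sketch under stated assumptions: `query_model` is a hypothetical stub standing in for a real LLM API call, and the `add` task and its test are invented for illustration.

```python
# Minimal sketch of an LLM-assisted generate-test-repair loop.
# `query_model` is a hypothetical stand-in for a real LLM API call;
# it is stubbed here so the example runs without network access.

def query_model(prompt: str) -> str:
    """Hypothetical LLM call: returns candidate source code for the prompt."""
    if "FAILED" in prompt:
        # "Repaired" candidate returned after a failure report.
        return "def add(a, b):\n    return a + b\n"
    # First (deliberately buggy) candidate.
    return "def add(a, b):\n    return a - b\n"

def run_tests(code: str) -> bool:
    """Execute the generated code and check it against a unit test."""
    namespace: dict = {}
    exec(code, namespace)  # caution: only acceptable in a trusted sandbox
    return namespace["add"](2, 3) == 5

def generate_with_repair(task: str, max_rounds: int = 3) -> str:
    """Ask the model for code; feed test failures back until tests pass."""
    prompt = task
    for _ in range(max_rounds):
        code = query_model(prompt)
        if run_tests(code):
            return code
        # Conversational feedback: report the failure and ask for a fix.
        prompt = task + "\nFAILED: add(2, 3) did not return 5. Please fix."
    raise RuntimeError("no passing candidate within the round limit")

if __name__ == "__main__":
    generate_with_repair("Write a Python function add(a, b).")
```

In a real setting the stub would be replaced by a chat-completion API call, and the unit tests would come from the requirements phase rather than being hard-coded.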


Notes

  1. https://github.com/mermaid-js/mermaid.

  2. ChatGPT: https://chat.openai.com/share/a93d844d-e542-4997-a7d5-0d254e007c08.

  3. Bard: https://g.co/bard/share/c51838296a3c.

  4. https://github.com/Significant-Gravitas/Auto-GPT.

  5. https://github.com/geekan/MetaGPT.

  6. https://github.com/RoboCoachTechnologies/GPT-Synthesizer.

  7. https://github.com/AntonOsika/gpt-engineer.

  8. Despite the validity of the Church–Turing thesis, more powerful tools enable more products in practice.
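The case study mentioned in the abstract develops a simple “search and rescue” scenario in an object-oriented style (with diagrams renderable via Mermaid, note 1). As an illustrative sketch only, with `World`, `Robot`, and `Victim` being assumed class names for this example rather than the paper’s actual design, such a scenario might begin as:

```python
# Illustrative object-oriented sketch of a "search and rescue" scenario.
# Class and method names are assumptions made for this example, not the
# design generated in the paper's case study.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Victim:
    """A victim at a fixed grid position (hashable, so usable in sets)."""
    x: int
    y: int

@dataclass
class World:
    """A rectangular grid containing the victims still to be rescued."""
    width: int
    height: int
    victims: set = field(default_factory=set)

@dataclass
class Robot:
    """A rescue robot that sweeps the grid cell by cell."""
    x: int = 0
    y: int = 0
    rescued: list = field(default_factory=list)

    def sweep(self, world: World) -> list:
        """Visit every cell in row order and rescue any victim found."""
        for yy in range(world.height):
            for xx in range(world.width):
                self.x, self.y = xx, yy
                victim = Victim(xx, yy)
                if victim in world.victims:
                    world.victims.remove(victim)
                    self.rescued.append(victim)
        return self.rescued

if __name__ == "__main__":
    world = World(4, 4, {Victim(1, 2), Victim(3, 3)})
    Robot().sweep(world)
```

An exhaustive row-order sweep is the simplest baseline; the appeal of LLM assistance in such a case study is precisely that alternative search strategies can be requested and refined conversationally.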


Acknowledgements

We thank the anonymous reviewer for constructive criticism and helpful suggestions.

Author information

Correspondence to Martin Wirsing.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Belzner, L., Gabor, T., Wirsing, M. (2024). Large Language Model Assisted Software Engineering: Prospects, Challenges, and a Case Study. In: Steffen, B. (eds) Bridging the Gap Between AI and Reality. AISoLA 2023. Lecture Notes in Computer Science, vol 14380. Springer, Cham. https://doi.org/10.1007/978-3-031-46002-9_23


  • DOI: https://doi.org/10.1007/978-3-031-46002-9_23


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46001-2

  • Online ISBN: 978-3-031-46002-9

  • eBook Packages: Computer Science; Computer Science (R0)
