Abstract
LLM-based applications face a critical vulnerability known as sandbox breakout, in which attackers bypass the restrictions that system designers impose to prevent malicious access to the resources for which the LLM agent serves as a user interface. Once outside the sandbox, attackers can potentially steal data, tamper with other users' interactions, or inject malicious code or content into underlying databases. It is therefore essential to identify and address the vulnerabilities that enable such breakouts; they may reside in the sandbox itself, in the operating system, or in the LLM's software dependencies. Mitigating this risk requires robust security measures such as regular model updates, automated model red-teaming, systematic testing, and strict access control policies. In addition, sandboxing should be enforced at multiple levels to shrink the attack surface and keep attackers away from critical systems. Together, these measures significantly reduce the risk of LLM sandbox breakout and improve the security and reliability of LLM-based applications.
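To make the multi-level sandboxing recommendation concrete, the sketch below shows one plausible innermost layer: executing model-generated Python in a separate, resource-limited interpreter process. The run_untrusted helper, the specific CPU and memory caps, and the use of Python's subprocess and resource modules are illustrative assumptions, not a prescription from this chapter; a production deployment would wrap this layer in further isolation such as containers, seccomp profiles, and network egress controls.

import resource
import subprocess
import sys

# Illustrative caps for untrusted, model-generated code (assumed values).
CPU_SECONDS = 2
MEMORY_BYTES = 256 * 1024 * 1024  # 256 MiB address-space limit

def _apply_limits() -> None:
    # Runs in the child process just before exec (POSIX only): cap CPU time
    # and memory so a breakout attempt cannot exhaust host resources.
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_BYTES, MEMORY_BYTES))

def run_untrusted(code: str) -> "subprocess.CompletedProcess[str]":
    # -I starts the interpreter in isolated mode: it ignores environment
    # variables, the user's site-packages, and the current directory,
    # shrinking the attack surface available to injected code.
    return subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=CPU_SECONDS + 1,   # wall-clock backstop on top of RLIMIT_CPU
        preexec_fn=_apply_limits,
    )

if __name__ == "__main__":
    result = run_untrusted("print(sum(range(10)))")
    print(result.returncode, result.stdout.strip())

Even then, the subprocess boundary is only one layer: the same principle should be repeated at the container, host, and network levels so that a flaw in any single layer does not grant access to critical systems.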
Rights and permissions
Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
About this chapter
Cite this chapter
Majumdar, S., Vogelsang, T. (2024). Towards Safe LLMs Integration. In: Kucharavy, A., Plancherel, O., Mulder, V., Mermoud, A., Lenders, V. (eds) Large Language Models in Cybersecurity. Springer, Cham. https://doi.org/10.1007/978-3-031-54827-7_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54826-0
Online ISBN: 978-3-031-54827-7
eBook Packages: Computer Science, Computer Science (R0)