Abstract
As consumer electronics and tensor computation for machine learning (ML) continue to advance, model execution and training become more accessible. NVIDIA introduced the RTX 4090 graphics cards, initially marketed as gamer-oriented products, in late 2022. Though relatively expensive for consumer use, their manufacturer's suggested retail price (MSRP) of 1,600 USD makes them affordable as professional tools. These cards' extensive video random access memory (vRAM), computational power comparable to the previous generation's flagship professional cards, and support for single-byte (8-bit) floating-point arithmetic enable a pair of them to train, fine-tune, and run on-premises Large Language Models (LLMs) with up to 7 billion parameters per card. Until this release, such a feat would have required data-center-grade equipment. Although the RTX 4090 and the H100 GPU represent a qualitative step forward, iterative improvements combined with the speculated lowering of computational precision to half-byte (4-bit) floats could make even larger models accessible for on-premises use. In one respect, this development might lower the entry barrier for cyberattackers, making it easier for advanced persistent threats (APTs) to camouflage their activities among unsophisticated attackers or those employing generative LLMs for non-malicious purposes. Conversely, as an alternative to cloud-hosted models, on-site LLMs may limit the possibility of private-information leakage or model poisoning while offering specialized capabilities to legitimate users.
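As a rough back-of-the-envelope check of the memory claim above, the sketch below estimates the weight-only vRAM footprint of a 7-billion-parameter model at several numeric precisions and compares it against the RTX 4090's 24 GB of vRAM. The parameter count and the omission of activations, KV cache, and framework overhead are illustrative assumptions, not measurements from the chapter.

```python
# Rough vRAM estimate for hosting an LLM at different numeric precisions.
# Weights only; activations, KV cache, and framework overhead are ignored
# (illustrative assumption, not a measurement).

PARAMS = 7e9      # ~7 billion parameters (a LLaMA-7B-class model)
GIB = 1024 ** 3   # bytes per GiB
CARD_VRAM_GIB = 24  # RTX 4090

precisions = {
    "fp32  (4 bytes/param)": 4.0,
    "fp16  (2 bytes/param)": 2.0,
    "int8/fp8 (1 byte/param)": 1.0,
    "4-bit (0.5 bytes/param)": 0.5,
}

for name, bytes_per_param in precisions.items():
    weights_gib = PARAMS * bytes_per_param / GIB
    verdict = "fits" if weights_gib < CARD_VRAM_GIB else "does not fit"
    print(f"{name:24s} ~{weights_gib:5.1f} GiB of weights -> {verdict} in 24 GB")
```

Under these assumptions, 16-bit weights (~13 GiB) already fit on a single card, and 8-bit or 4-bit quantization leaves headroom for activations and longer contexts, which is consistent with the abstract's claim of roughly 7 billion parameters per 24 GB card.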