Abstract
As consumer electronics and tensor computation for machine learning (ML) continue to advance, model execution and training become more accessible. NVIDIA introduced the RTX 4090 graphics cards, initially marketed as gamer-oriented products, in late 2022. Though relatively expensive for consumer use, their manufacturer's suggested retail price (MSRP) of 1,600 USD makes them affordable as professional tools. These cards' extensive video random access memory (vRAM), computational power comparable to the previous generation's flagship professional cards, and support for single-byte (8-bit) floating-point arithmetic enable a pair of them to train, fine-tune, and run on-premises Large Language Models (LLMs) with up to 7 billion parameters per card. Until this release, such a feat would have required data-center-grade equipment. Although the RTX 4090 and the H100 GPU represent a qualitative step forward, iterative improvements combined with the speculated lowering of computational precision to half-byte (4-bit) floats could make even larger models accessible for on-premises use. In one respect, this development might lower the entry barrier for cyberattackers, making it easier for advanced persistent threats (APTs) to camouflage their activities among unsophisticated attackers or those employing generative LLMs for non-malicious purposes. Conversely, as an alternative to cloud-hosted models, on-site LLMs may limit the possibility of private-information leakage or model poisoning while offering specialized capabilities to legitimate users.
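As a rough back-of-the-envelope check of the memory claim above, the sketch below estimates the weight-only vRAM footprint of a 7-billion-parameter model at several numeric precisions and compares it against the RTX 4090's 24 GB of vRAM. The parameter count and the omission of activations, KV cache, and framework overhead are illustrative assumptions, not measurements from the chapter.

```python
# Rough vRAM estimate for hosting an LLM at different numeric precisions.
# Weights only; activations, KV cache, and framework overhead are ignored
# (illustrative assumption, not a measurement).

PARAMS = 7e9      # ~7 billion parameters (a LLaMA-7B-class model)
GIB = 1024 ** 3   # bytes per GiB
CARD_VRAM_GIB = 24  # RTX 4090

precisions = {
    "fp32  (4 bytes/param)": 4.0,
    "fp16  (2 bytes/param)": 2.0,
    "int8/fp8 (1 byte/param)": 1.0,
    "4-bit (0.5 bytes/param)": 0.5,
}

for name, bytes_per_param in precisions.items():
    weights_gib = PARAMS * bytes_per_param / GIB
    verdict = "fits" if weights_gib < CARD_VRAM_GIB else "does not fit"
    print(f"{name:24s} ~{weights_gib:5.1f} GiB of weights -> {verdict} in 24 GB")
```

Under these assumptions, 16-bit weights (~13 GiB) already fit on a single card, and 8-bit or 4-bit quantization leaves headroom for activations and longer contexts, which is consistent with the abstract's claim of roughly 7 billion parameters per 24 GB card.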