Abstract
On their own, pretrained Large Language Models (LLMs) are interesting objects of study. To be useful for downstream applications, however, they must undergo a subsequent transfer learning phase. While this phase has historically been referred to as “fine-tuning,” the range of tools available to LLM users for adapting base models to their applications is now significantly wider than traditional fine-tuning. To give the reader a sense of the strengths and weaknesses of each method and help them pick the one best suited to their needs, this chapter provides an overview and classification of the most notable methods: prompt optimization, pre-prompting and implicit prompting (system prompting), model coordination through actor agents, integration with auxiliary tools, parameter-efficient fine-tuning, further model pre-training, from-scratch retraining, and finally domain-specific distillation.
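To give a concrete flavor of one of the methods surveyed, the sketch below illustrates parameter-efficient fine-tuning in the style of LoRA (low-rank adaptation): the pretrained weights are frozen and only a small low-rank additive update is trained. This is a minimal illustrative sketch, not an implementation from the chapter; the class name LoRALinear and all hyperparameters (r, alpha, layer sizes) are chosen here purely for illustration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer augmented with a trainable
    low-rank update, LoRA-style: y = W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # A gets small random init, B is zero, so training starts
        # exactly at the base model (the update is initially zero).
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Usage sketch: only the low-rank factors receive gradients.
layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # ~12k vs ~590k in the base layer
```

The appeal of this family of methods is visible in the parameter count: only a few percent of the weights are updated, so adaptation fits on modest hardware and the base model can be shared across many task-specific adapters.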
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
About this chapter
Cite this chapter
Kucharavy, A. (2024). Adapting LLMs to Downstream Applications. In: Kucharavy, A., Plancherel, O., Mulder, V., Mermoud, A., Lenders, V. (eds) Large Language Models in Cybersecurity. Springer, Cham. https://doi.org/10.1007/978-3-031-54827-7_2
DOI: https://doi.org/10.1007/978-3-031-54827-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54826-0
Online ISBN: 978-3-031-54827-7