Abstract
While the general public discovered Large Language Models (LLMs) through ChatGPT, a generative autoregressive model, it is far from the only model of its kind. Throughout the development of LLMs, various architectures and training regimens were designed and optimized for specific uses; these have since been classified into distinct LLM families.
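To make the term "generative autoregressive model" concrete, the sketch below shows the decoding loop such models share: predict a distribution over the next token given the tokens generated so far, sample one token, append it, and repeat until an end-of-sequence marker. This is a minimal illustration only; the hand-written bigram table and all names in it (BIGRAM, generate, the toy vocabulary) are invented for this example and stand in for a real Transformer, which conditions on the entire prefix rather than just the last token.

```python
import random

# Minimal sketch of autoregressive (left-to-right) text generation, the
# scheme used by GPT-style models such as ChatGPT. Everything here is
# illustrative: the hand-written BIGRAM table stands in for a trained
# Transformer that would compute P(next token | entire prefix), whereas
# this toy conditions only on the previous token.

random.seed(0)  # reproducible sampling for the example

# Toy conditional distributions P(next | previous); a real model learns these.
BIGRAM = {
    "<bos>": {"the": 1.0},
    "the":   {"cat": 0.6, "mat": 0.4},
    "cat":   {"sat": 1.0},
    "sat":   {"on": 1.0},
    "on":    {"the": 1.0},
    "mat":   {"<eos>": 1.0},
}

def sample(dist: dict[str, float]) -> str:
    """Draw one token from a {token: probability} distribution."""
    tokens, weights = zip(*dist.items())
    return random.choices(tokens, weights=weights)[0]

def generate(max_tokens: int = 10) -> list[str]:
    """Autoregressive loop: predict, sample, append, repeat until <eos>."""
    out, prev = [], "<bos>"
    for _ in range(max_tokens):
        tok = sample(BIGRAM[prev])
        if tok == "<eos>":
            break
        out.append(tok)
        prev = tok
    return out

print(" ".join(generate()))  # e.g. "the cat sat on the mat"
```

The LLM families discussed in this chapter differ in how this conditional distribution is parameterized and trained (architecture and training regimen), not in the sampling loop itself.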
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
About this chapter
Cite this chapter
Kucharavy, A. (2024). Overview of Existing LLM Families. In: Kucharavy, A., Plancherel, O., Mulder, V., Mermoud, A., Lenders, V. (eds) Large Language Models in Cybersecurity. Springer, Cham. https://doi.org/10.1007/978-3-031-54827-7_3
Print ISBN: 978-3-031-54826-0
Online ISBN: 978-3-031-54827-7