Abstract
Large Language Models (LLMs) are scaled-up instances of Deep Neural Language Models, a class of Natural Language Processing (NLP) tools trained with Machine Learning (ML). To understand how LLMs work, we must examine the technologies they build on and what sets them apart. To that end, this chapter provides an overview of the history of LLM development, starting in the 1990s, before covering the counterintuitive, purely probabilistic nature of Deep Neural Language Models, continuous token embedding spaces, recurrent neural network-based models, what self-attention brought to the table, and finally, why scaling Deep Neural Language Models led to a qualitative change that warranted a new name for the technology.
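As a minimal sketch of the "purely probabilistic" behaviour mentioned above: at every step, a Deep Neural Language Model only produces a probability distribution over its vocabulary, and generation is nothing more than repeatedly sampling from that distribution. The toy Python snippet below uses a hypothetical next_token_logits function as a stand-in for a trained network; it illustrates the sampling loop only, not any model discussed in the chapter.

```python
import numpy as np

# Toy vocabulary and a stand-in "model": in a real Deep Neural Language Model,
# the logits would come from a trained network conditioned on the whole prefix.
VOCAB = ["the", "model", "predicts", "a", "token", "."]
rng = np.random.default_rng(0)

def next_token_logits(prefix):
    # Hypothetical stand-in: one unnormalized score per vocabulary entry.
    return rng.normal(size=len(VOCAB))

def softmax(logits):
    # Turn unnormalized scores into a probability distribution over tokens.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def generate(prefix, steps=5):
    tokens = list(prefix)
    for _ in range(steps):
        probs = softmax(next_token_logits(tokens))
        # The model never "decides" on a continuation; each new token is
        # drawn at random from the predicted distribution.
        tokens.append(rng.choice(VOCAB, p=probs))
    return " ".join(tokens)

print(generate(["the"]))
```

Greedy decoding (always taking the most probable token) is a special case of the same loop; the underlying object remains a probability distribution over the next token.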
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
About this chapter
Cite this chapter
Kucharavy, A. (2024). From Deep Neural Language Models to LLMs. In: Kucharavy, A., Plancherel, O., Mulder, V., Mermoud, A., Lenders, V. (eds) Large Language Models in Cybersecurity. Springer, Cham. https://doi.org/10.1007/978-3-031-54827-7_1
DOI: https://doi.org/10.1007/978-3-031-54827-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54826-0
Online ISBN: 978-3-031-54827-7
eBook Packages: Computer Science, Computer Science (R0)