Abstract
Large Language Models (LLMs) are scaled-up instances of Deep Neural Language Models, a class of Natural Language Processing (NLP) tools trained with Machine Learning (ML). To understand how LLMs work, we must examine the technologies they build on and what sets them apart. To that end, this chapter provides an overview of the history of LLM development, starting in the 1990s, before covering the counterintuitive, purely probabilistic nature of Deep Neural Language Models, continuous token embedding spaces, recurrent neural network-based models, what self-attention brought to the table, and finally, why scaling Deep Neural Language Models led to a qualitative change that warranted a new name for the technology.
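As a minimal sketch of the "purely probabilistic" behaviour mentioned above: at every step, a Deep Neural Language Model only produces a probability distribution over its vocabulary, and generation is nothing more than repeatedly sampling from that distribution. The toy Python snippet below uses a hypothetical next_token_logits function as a stand-in for a trained network; it illustrates the sampling loop only, not any model discussed in the chapter.

```python
import numpy as np

# Toy vocabulary and a stand-in "model": in a real Deep Neural Language Model,
# the logits would come from a trained network conditioned on the whole prefix.
VOCAB = ["the", "model", "predicts", "a", "token", "."]
rng = np.random.default_rng(0)

def next_token_logits(prefix):
    # Hypothetical stand-in: one unnormalized score per vocabulary entry.
    return rng.normal(size=len(VOCAB))

def softmax(logits):
    # Turn unnormalized scores into a probability distribution over tokens.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def generate(prefix, steps=5):
    tokens = list(prefix)
    for _ in range(steps):
        probs = softmax(next_token_logits(tokens))
        # The model never "decides" on a continuation; each new token is
        # drawn at random from the predicted distribution.
        tokens.append(rng.choice(VOCAB, p=probs))
    return " ".join(tokens)

print(generate(["the"]))
```

Greedy decoding (always taking the most probable token) is a special case of the same loop; the underlying object remains a probability distribution over the next token.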
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
About this chapter
Cite this chapter
Kucharavy, A. (2024). From Deep Neural Language Models to LLMs. In: Kucharavy, A., Plancherel, O., Mulder, V., Mermoud, A., Lenders, V. (eds) Large Language Models in Cybersecurity. Springer, Cham. https://doi.org/10.1007/978-3-031-54827-7_1
DOI: https://doi.org/10.1007/978-3-031-54827-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54826-0
Online ISBN: 978-3-031-54827-7
eBook Packages: Computer Science, Computer Science (R0)