Abstract
Large language models (LLMs) have transformed how machines interpret and generate human language in the rapidly developing field of natural language processing. These models, built on deep learning techniques such as transformer architectures, are trained on massive text corpora. This survey takes an in-depth look at LLMs, covering their architecture, historical evolution, and applications in the education, healthcare, and finance sectors. By capturing complex linguistic patterns, LLMs produce coherent responses, making them useful in a wide range of real-world scenarios. Their development and deployment, however, raise ethical concerns and carry societal ramifications. Understanding both the strengths and limitations of LLMs is critical for guiding future research and ensuring the responsible use of their enormous potential. This survey traces the influence of these models as they evolve, providing a roadmap for researchers, developers, and policymakers navigating the landscape of artificial intelligence and language processing.
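The transformer architectures mentioned above are built around scaled dot-product attention, in which each token's output is a weighted mixture of value vectors, with weights derived from query–key similarity. The following is a minimal NumPy sketch of that mechanism; the matrix shapes (4 tokens, 8-dimensional heads) are illustrative only, not taken from any model in the survey.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # pairwise query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                       # weighted mixture of value vectors

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))  # 4 token positions, 8-dim queries
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one context-mixed vector per token
```

In a full transformer this operation is applied in parallel across multiple heads and stacked layers, which is what lets the models surveyed here capture long-range linguistic dependencies.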
Funding
The authors have no relevant financial or non-financial interests to disclose.
Author information
Authors and Affiliations
Contributions
Bharathi Mohan G and Prasanna Kumar R conceptualized the review theme, guided the technical writing of the manuscript, and reviewed the full manuscript. Vishal Krishh P, Keerthinathan A, Meka Kavya Uma Meghana, Sheba Sulthana, and Lavanya G wrote the main manuscript. Srinath Doss reviewed the full manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bharathi Mohan, G., Prasanna Kumar, R., Vishal Krishh, P. et al. An analysis of large language models: their impact and potential applications. Knowl Inf Syst (2024). https://doi.org/10.1007/s10115-024-02120-8