The Life Cycle of Knowledge in Big Language Models: A Survey

Abstract

Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has drawn significant attention to how knowledge can be acquired, maintained, updated and used by language models. Despite the enormous number of related studies, there is still a lack of a unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes, which may prevent us from understanding the connections between current advances or recognizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life cycle of knowledge in PLMs into five critical periods and investigating how knowledge circulates when it is built, maintained and used. To this end, we systematically review existing studies of each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 62122077) and the CAS Project for Young Scientists in Basic Research, China (No. YSBR-040).

Author information

Corresponding author

Correspondence to Xianpei Han.

Ethics declarations

The authors declare that they have no conflict of interest with respect to this work.

Additional information

Colored figures are available in the online version at https://link.springer.com/journal/11633

Boxi Cao received the B.Sc. degree from Beijing University of Posts and Telecommunications, China in 2019. He is a Ph.D. candidate at the Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, under the supervision of Professor Xianpei Han and Professor Le Sun.

His research interests lie in natural language processing, especially knowledge in large language models, as well as information extraction.

Hongyu Lin received the Ph.D. degree from the Institute of Software, Chinese Academy of Sciences, China in 2020. He is currently an associate professor at the Institute of Software, Chinese Academy of Sciences, China.

His research interests include information extraction and knowledge mechanisms in large language models.

Xianpei Han received the Ph.D. degree in pattern recognition and intelligent systems from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China in 2010. He is a professor of computer science at the Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, China.

His research interests lie in natural language understanding, and he has published about 60 papers in ACL/EMNLP/SIGIR/AAAI.

Le Sun received the Ph.D. degree in engineering mechanics from Nanjing University of Science and Technology, China in 1998. He is a professor at the Institute of Software, Chinese Academy of Sciences (ISCAS), China, and the General Secretary of the Chinese Information Processing Society of China (CIPS). He visited the University of Birmingham, UK and the University of Montreal, Canada as a visiting scholar in 2004 and 2005. He has published more than 100 papers in top journals and conferences. He received the Best Short Paper Award at SIGIR 2021. In 2022, he received the Excellent Tutor Award from the Chinese Academy of Sciences and the First Prize of the Qian Weichang Chinese Information Processing Science and Technology Award.

His research interests include natural language understanding, knowledge graphs, information extraction, and question answering.

Cite this article

Cao, B., Lin, H., Han, X. et al. The Life Cycle of Knowledge in Big Language Models: A Survey. Mach. Intell. Res. 21, 217–238 (2024). https://doi.org/10.1007/s11633-023-1416-x
