Abstract
Various probing studies have investigated the ability of neural language models, trained only on word prediction over large corpora, to process hierarchical structure, a hallmark of human linguistic ability. For instance, it has been shown that the Long Short-Term Memory (LSTM) network, a type of Recurrent Neural Network (RNN), can capture long-distance subject-verb agreement patterns and the attraction effects found in human sentence processing. However, whereas human experiments show that attractors that are syntactically closer to but linearly farther from the verb elicit greater attraction effects than attractors that are linearly closer but syntactically farther, LSTMs show the opposite pattern, suggesting that they lack knowledge of syntactic distance and are more sensitive to local information. The present article investigates whether state-of-the-art Generative Pre-trained Transformers (GPTs) can capture the prominence of syntactic distance. We experimented with various versions of GPT-2 and GPT-3 and found that all of them succeeded at the task. We conclude that GPT may model human linguistic cognition better than LSTMs, corroborating previous research, and that further investigation of the mechanisms that enable GPT to do so may inform research on human syntactic processing.
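The abstract describes a surprisal-based probing paradigm but, being an abstract, gives no code. As a minimal sketch of how such a comparison could be run, assuming the HuggingFace transformers library and GPT-2 (the function, stimuli, and condition labels below are illustrative assumptions, not the paper's actual materials):

import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def verb_surprisal(prefix: str, verb: str) -> float:
    """Surprisal (in bits) of `verb` given `prefix` under GPT-2."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    full_ids = tokenizer(prefix + " " + verb, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    # Logits at position t predict the token at position t + 1, so the
    # verb token at position p is scored by the distribution at p - 1.
    total = sum(
        log_probs[0, p - 1, full_ids[0, p]].item()
        for p in range(prefix_ids.shape[1], full_ids.shape[1])
    )
    return -total / math.log(2)  # convert nats to bits

# Illustrative Franck et al. (2002)-style preamble: the plural attractor
# "presidents" is linearly farther from the verb than "company" but
# syntactically closer to the singular head noun "threat". Comparing such
# conditions, a larger reduction in the surprisal of the ungrammatical
# plural verb "are" indexes a stronger attraction effect.
preamble = "The threat to the presidents of the company"
print(verb_surprisal(preamble, "is"), verb_surprisal(preamble, "are"))

Repeating this contrast across matched stimulus sets, with the attractor's syntactic versus linear proximity manipulated, would let one test which notion of distance the model's attraction effects track.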
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hao, H. (2023). Evaluating Transformers' Sensitivity to Syntactic Embedding Depth. In: Mehmood, R., et al. (eds.) Distributed Computing and Artificial Intelligence, Special Sessions I, 20th International Conference (DCAI 2023). Lecture Notes in Networks and Systems, vol. 741. Springer, Cham. https://doi.org/10.1007/978-3-031-38318-2_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38317-5
Online ISBN: 978-3-031-38318-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)