Evaluating Transformers’ Sensitivity to Syntactic Embedding Depth

  • Conference paper
Distributed Computing and Artificial Intelligence, Special Sessions I, 20th International Conference (DCAI 2023)

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 741))

Abstract

Various probing studies have investigated whether neural language models trained only on word prediction over large corpora can process hierarchical structure, a hallmark of human linguistic ability. For instance, it has been shown that Long Short-Term Memory (LSTM) networks, a type of Recurrent Neural Network (RNN), can capture long-distance subject-verb agreement patterns and the attraction effects found in human sentence processing. However, human experiments find that attractors that are syntactically closer to the verb but linearly farther from it elicit greater attraction effects than attractors that are linearly closer but syntactically farther away; LSTMs show the opposite pattern, suggesting that they lack knowledge of syntactic distance and are more sensitive to local information. The present article investigates whether state-of-the-art Generative Pre-trained Transformers (GPTs) capture this primacy of syntactic distance over linear distance. We experimented with various versions of GPT-2 and GPT-3 and found that all of them succeeded at the task. We conclude that GPT may model human linguistic cognition better than LSTMs, corroborating previous research, and that further investigating the mechanisms that enable GPT to do so may inform research on human syntactic processing.
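
The page does not include code, but the probing setup described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical example (not the authors' actual stimuli, models, or analysis) of how one might measure agreement attraction in GPT-2 with the Hugging Face transformers library: it compares the log-probability of a plural versus a singular verb after preambles whose plural attractor is either syntactically closer to the verb but linearly farther from it, or the reverse. The example sentences and the continuation_logprob helper are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code): probe agreement attraction in GPT-2.
# We compare the log-probability of a plural vs. a singular verb after preambles
# whose plural attractor is either syntactically closer to the verb (but linearly
# farther from it) or linearly closer (but syntactically farther).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `prefix`."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, cont_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    total, offset = 0.0, prefix_ids.shape[1]
    for i in range(cont_ids.shape[1]):
        # Logits at position offset+i-1 predict the token at position offset+i.
        total += log_probs[0, offset + i - 1, input_ids[0, offset + i]].item()
    return total

# Illustrative preambles with a singular head noun ("threat"); only the position of
# the plural attractor differs: the syntactically higher noun vs. the lower one.
items = {
    "syntactically_closer_attractor": "The threat to the presidents of the company",
    "linearly_closer_attractor": "The threat to the president of the companies",
}
for condition, preamble in items.items():
    # Attraction score: how strongly the (ungrammatical) plural verb is preferred.
    score = continuation_logprob(preamble, " are") - continuation_logprob(preamble, " is")
    print(f"{condition}: plural-over-singular log-odds = {score:.3f}")
```

Under a measure of this kind, a model sensitive to syntactic rather than linear distance should show a larger attraction score in the first condition than in the second; GPT-3 would be probed analogously through its API rather than a local checkpoint.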



Author information


Correspondence to Hailin Hao.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hao, H. (2023). Evaluating Transformers’ Sensitivity to Syntactic Embedding Depth. In: Mehmood, R., et al. Distributed Computing and Artificial Intelligence, Special Sessions I, 20th International Conference. DCAI 2023. Lecture Notes in Networks and Systems, vol 741. Springer, Cham. https://doi.org/10.1007/978-3-031-38318-2_16
