Evaluating Transformers’ Sensitivity to Syntactic Embedding Depth

  • Conference paper
Distributed Computing and Artificial Intelligence, Special Sessions I, 20th International Conference (DCAI 2023)

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 741))

Abstract

Various probing studies have investigated whether neural language models trained only on word prediction over large corpora can process hierarchical structure, a hallmark of human linguistic ability. For instance, it has been shown that Long Short-Term Memory (LSTM) networks, a type of Recurrent Neural Network (RNN), can capture long-distance subject-verb agreement patterns and the attraction effects found in human sentence processing. However, human experiments find that attractors that are syntactically closer to the verb but linearly farther from it elicit greater attraction effects than attractors that are linearly closer but syntactically farther away; LSTMs show the opposite pattern, suggesting that they lack knowledge of syntactic distance and are more sensitive to local information. The present article investigates whether state-of-the-art Generative Pre-trained Transformers (GPTs) capture this primacy of syntactic distance over linear distance. We experimented with various versions of GPT-2 and GPT-3 and found that all of them succeeded at the task. We conclude that GPT may model human linguistic cognition better than LSTMs, corroborating previous research, and that further investigating the mechanisms that enable GPT to do so may inform research on human syntactic processing.
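
The page does not include code, but the probing setup described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical example (not the authors' actual stimuli, models, or analysis) of how one might measure agreement attraction in GPT-2 with the Hugging Face transformers library: it compares the log-probability of a plural versus a singular verb after preambles whose plural attractor is either syntactically closer to the verb but linearly farther from it, or the reverse. The example sentences and the continuation_logprob helper are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code): probe agreement attraction in GPT-2.
# We compare the log-probability of a plural vs. a singular verb after preambles
# whose plural attractor is either syntactically closer to the verb (but linearly
# farther from it) or linearly closer (but syntactically farther).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `prefix`."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, cont_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    total, offset = 0.0, prefix_ids.shape[1]
    for i in range(cont_ids.shape[1]):
        # Logits at position offset+i-1 predict the token at position offset+i.
        total += log_probs[0, offset + i - 1, input_ids[0, offset + i]].item()
    return total

# Illustrative preambles with a singular head noun ("threat"); only the position of
# the plural attractor differs: the syntactically higher noun vs. the lower one.
items = {
    "syntactically_closer_attractor": "The threat to the presidents of the company",
    "linearly_closer_attractor": "The threat to the president of the companies",
}
for condition, preamble in items.items():
    # Attraction score: how strongly the (ungrammatical) plural verb is preferred.
    score = continuation_logprob(preamble, " are") - continuation_logprob(preamble, " is")
    print(f"{condition}: plural-over-singular log-odds = {score:.3f}")
```

Under a measure of this kind, a model sensitive to syntactic rather than linear distance should show a larger attraction score in the first condition than in the second; GPT-3 would be probed analogously through its API rather than a local checkpoint.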



Author information


Correspondence to Hailin Hao.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hao, H. (2023). Evaluating Transformers’ Sensitivity to Syntactic Embedding Depth. In: Mehmood, R., et al. Distributed Computing and Artificial Intelligence, Special Sessions I, 20th International Conference. DCAI 2023. Lecture Notes in Networks and Systems, vol 741. Springer, Cham. https://doi.org/10.1007/978-3-031-38318-2_16
