Abstract
In which we investigate the technical issues surrounding the defeat, or perhaps the sudden assassination, of the Winograd Schema Challenge. We argue that, while the obvious suspect is the WinoGrande-based solution, the real cause of death was the masked language modeling technique for learning large language models. The Winograd Schema Challenge was, in the end, just a test for masked language closure, and as such it was killed by the use of this technique at scale.
Notes
- 1.
The self-attention mechanism guarantees that long-distance context has “equal opportunity” to show up. When it comes to anaphora resolution, the self-attention mechanism thus attacks the problem at its core.
- 2.
As a digression, we note that Quoc V. Le, the second author of this paper, tackled the WSC in one of the first successful approaches using language models [48].
- 3.
In an interview, Jacob Devlin has commented on the cloze test as being a fairly well-known test in psycholinguistics for assessing levels of reading ability.
- 4.
- 5.
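The point in Note 1 — that self-attention gives every position direct access to every other position, regardless of distance — can be made concrete with a minimal sketch of scaled dot-product self-attention (as in Vaswani et al.; projections omitted for brevity, all names here are illustrative, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """X: (seq_len, d) token embeddings; identity Q/K/V projections for brevity."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)       # every pair of positions is compared directly
    weights = softmax(scores, axis=-1)  # distance between positions plays no role
    return weights @ X                  # each output mixes the whole sequence

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
out = self_attention(X)
print(out.shape)  # (6, 4)
```

Note that the score between positions 1 and 6 is computed exactly as the score between positions 1 and 2: no term in the computation depends on distance, which is why a pronoun and a far-away antecedent get "equal opportunity".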
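The cloze test mentioned in Note 3 is the template for masked language modeling: hide a token, then ask a model to rank candidate fillers. A toy illustration (not BERT — the counts-based scorer and all names below are hypothetical stand-ins for a learned model):

```python
def make_cloze(tokens, i, mask="[MASK]"):
    """Replace position i with a mask token, keeping the gold answer aside."""
    gold = tokens[i]
    return tokens[:i] + [mask] + tokens[i + 1:], gold

def score_candidates(context, candidates, counts):
    """Rank fillers by a toy co-occurrence count with the context words."""
    def score(c):
        return sum(counts.get((w, c), 0) for w in context)
    return max(candidates, key=score)

tokens = "the trophy does not fit in the suitcase".split()
cloze, gold = make_cloze(tokens, 1)  # mask "trophy"
# Hypothetical co-occurrence statistics standing in for a trained model:
counts = {("suitcase", "trophy"): 3, ("suitcase", "sofa"): 1}
best = score_candidates(cloze, ["trophy", "sofa"], counts)
print(best)  # "trophy"
```

Masked language modeling trains on exactly such items at web scale, which is why a Winograd-style sentence with a masked pronoun referent ends up looking like just another training instance.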
References
OpenAI: GPT-4 technical report. arXiv:2303.08774 (2023)
Bailey, D., Harrison, A., Lierler, Y., Lifschitz, V., Michael, J.: The Winograd schema challenge and reasoning about correlation. In: Working Notes of the Symposium on Logical Formalizations of Commonsense Reasoning (2015)
Bender, D.: Establishing a human baseline for the Winograd schema challenge. In: Modern AI and Cognitive Science Conference, pp. 39–45 (2015)
Bobrow, D.: Precision-focussed textual inference. In: Proceedings of the Workshop on Textual Entailment and Paraphrasing ACL, Prague (2007)
Brown, T.B., et al.: Language models are few-shot learners. arXiv:2005.14165 (2020)
Cozman, F.G., Neri, H.: Some thoughts on knowledge-enhanced machine learning. Int. J. Approximate Reasoning 136, 308–324 (2020)
Dagan, I.: Recognizing textual entailment: Rational, evaluation and approaches. Natural Lang. Eng. 15(4), i-xvii (2009)
Dagan, I., Glickman, O., Magnini, B.: The PASCAL Recognising Textual Entailment Challenge. In: Quiñonero-Candela, J., Dagan, I., Magnini, B., d’Alché-Buc, F. (eds.) MLCW 2005. LNCS (LNAI), vol. 3944, pp. 177–190. Springer, Heidelberg (2006). https://doi.org/10.1007/11736790_9
Dai, A.M., Le, Q.V.: Semi-supervised sequence learning. In: Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc. (2015)
Davies, E.: Winograd schemas and machine translation. arXiv:1608.01884 (2016)
Davis, E., Morgenstern, L., Ortiz, C.L.: The first Winograd Schema Challenge at IJCAI-16. AI Mag. 38(3), 97–98 (2017)
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805v2 (2019)
Elazar, Y., Zhang, H., Goldberg, Y., Roth, D.: Back to square one: artifact detection, training and commonsense disentanglement in the Winograd schema. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 10486–10500. Association for Computational Linguistics (2021)
Emami, A., et al.: The KnowRef coreference corpus: removing gender and number cues for difficult pronominal anaphora resolution. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3952–3961, Florence, Italy. Association for Computational Linguistics (2019)
Emami, A., Trischler, A., Suleman, K., Chi, J., Cheung, K.: A generalized knowledge hunting framework for the Winograd Schema Challenge. In: NAACL-HLT 2018: Student Research Workshop, pp. 25–31 (2018)
Frege, G.: Sense and reference. Philos. Rev. 57 (1948)
Joshi, B., Shah, N., Barbieri, F., Neves, L.: The devil is in the details: evaluating limitations of transformer-based methods for granular tasks. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 3652–3659, Barcelona, Spain (Online). International Committee on Computational Linguistics (2020)
Jurafsky, D., Martin, J.H.: Speech and Language Processing (3rd ed. draft) (2023)
Kavumba, P., Inoue, N., Heinzerling, B., Singh, K., Reisert, P., Inui, K.: When choosing plausible alternatives, Clever Hans can be clever. In: Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing, pp. 33–42, Hong Kong, China. Association for Computational Linguistics (2019)
Kocijan, V., Cretu, A.-M., Camburu, O.-M., Yordanov, Y., Lukasiewicz, T.: A surprisingly robust trick for the Winograd Schema Challenge. In: Annual Meeting of the Association for Computational Linguistics, pp. 4837–4842 (2019)
Kocijan, V., Davis, E., Lukasiewicz, T., Marcus, G., Morgenstern, L.: The defeat of the Winograd Schema Challenge. arXiv:2201.02387 (2023)
Kocijan, V., Lukasiewicz, T., Davis, E., Marcus, G.: A review of Winograd Schema Challenge datasets and approaches. arXiv:2004.13831v1 (2020)
Korman, D.: Defining textual entailment. J. Assoc. Inf. Sci. Technol. 69 (2018)
Levesque, H.: The Winograd Schema Challenge. In: AAAI Spring Symposium on Logical Formalizations of Commonsense Reasoning (2011)
Levesque, H.: On our best behaviour. In: IJCAI (2013)
Levesque, H.: Common Sense, the Turing Test, and the Quest for Real AI. The MIT Press (2017)
Levesque, H., Davis, E., Morgenstern, L.: The Winograd Schema Challenge. Knowledge Representation (2012)
Liu, Q., Jiang, H., Ling, Z.-H., Zhu, X., Wei, S., Hu, Y.: Commonsense knowledge enhanced embeddings for solving pronoun disambiguation problems in Winograd Schema Challenge. arXiv:1611.04146 (2016)
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv:1907.11692 (2019)
Marcus, G., Davis, E.: Rebooting AI: Building Artificial Intelligence We Can Trust. Pantheon (2019)
Miller, G.A.: Language and Communication. McGraw-Hill (1951)
Isaak, N., Michael, L.: Tackling the Winograd Schema Challenge through machine logical inferences. STAIRS 75 (2016)
Isaak, N., Michael, L.: How the availability of training material affects performance in the Winograd Schema Challenge (2017)
Isaak, N., Michael, L.: A data-driven metric of hardness for WSC sentences. GCAI-2018 (EPiC Series in Computing) 55, 107–120 (2018)
Pearl, J.: Causality: Models, Reasoning, and Inference, 2nd edn. Cambridge University Press (2009)
Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: NAACL-HLT (2018)
Quine, W.V.: Two dogmas of empiricism. Philos. Rev. 60 (1951)
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. Technical Report 8, OpenAI Blog (2019)
Rahman, A., Ng, V.: Resolving complex cases of definite pronouns: The Winograd schema challenge. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 777–789, Jeju Island, Korea (2012). Association for Computational Linguistics
Ruan, Y.-P., Zhu, X., Ling, Z.-H., Shi, Z., Liu, Q., Wei, S.: Exploring unsupervised pretraining and sentence structure modelling for Winograd Schema Challenge. arXiv:1904.09705 (2019)
Rus, V.: A study of textual entailment. Int. J. Art. Intell. Tools 17 (2007)
Sakaguchi, K., Bras, R.L., Bhagavatula, C., Choi, Y.: Winogrande: an adversarial Winograd Schema Challenge at scale. arXiv:1907.10641v2 (2019)
Sakaguchi, K., Bras, R.L., Bhagavatula, C., Choi, Y.: Winogrande: an adversarial Winograd Schema Challenge at scale. AAAI-20 Technical Tracks 34(05) (2019)
Sakaguchi, K., Bras, R.L., Bhagavatula, C., Choi, Y.: Winogrande: an adversarial Winograd Schema Challenge at scale. arXiv:1907.10641v1 (2019)
Taylor, W.: Cloze procedure: a new tool for measuring readability. Journalism Quarterly, Fall (1953)
Touvron, H., et al.: LLaMA: open and efficient foundation language models. Technical report, arXiv:2302.13971 (2023)
Trichelair, P., et al.: On the evaluation of common-sense reasoning in natural language understanding. arXiv:1811.01778 (2018)
Trinh, T., Le, Q.: A simple method for commonsense reasoning. arXiv:1806.02847 (2018)
van Aken, B., Winter, B., Löser, A., Gers, F.A.: How does BERT answer questions? A layer-wise analysis of transformer representations. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM '19, pp. 1823–1832 (2019)
Vaswani, A., et al.: Attention is all you need. In: 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA (2017)
Zhang, H., Song, Y.: A distributed solution for Winograd Schema Challenge. In: ICMLC2018 (2018)
Zhu, Y., et al.: Aligning books and movies: towards story-like visual explanations by watching movies and reading books. In: The IEEE International Conference on Computer Vision (ICCV) (2015)
Acknowledgements
The first author was supported by FAPESP through grant 2018/0968-1. The second author was partially supported by CNPq through grant 305753/2022-3. This work was carried out at the Center for Artificial Intelligence (C4AI-USP), with support by FAPESP (grant 2019/07665-4) and by the IBM Corporation.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Neri, H., Cozman, F.G. (2023). Who Killed the Winograd Schema Challenge? In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science, vol 14197. Springer, Cham. https://doi.org/10.1007/978-3-031-45392-2_14
DOI: https://doi.org/10.1007/978-3-031-45392-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45391-5
Online ISBN: 978-3-031-45392-2
eBook Packages: Computer Science (R0)