A Machine Learning Benchmark with Meaning: Learnability and Verb Semantics

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11919)


Just over thirty years ago, modelling human knowledge with parallel distributed processing systems, without explicit rules, became a possibility. In the past five years we have seen remarkable progress, with systems based on artificial neural networks (ANNs) solving previously difficult problems in many cognitive domains. With a focus on Natural Language Processing (NLP), we argue that this progress is in part illusory because the benchmarks that measure it have become task-oriented and have lost sight of the goal of modelling knowledge. Task-oriented benchmarks are not informative about the reasons machine learning succeeds or fails. We propose a new dataset in which the correct answers to entailments and grammaticality judgements depend crucially on specific items of knowledge about verb semantics, so that errors in performance can be traced directly to deficiencies in knowledge. If this knowledge is not learnable from the provided input, then it must be provided as an innate prior.


Machine learning, NLP, Grammar, Learnability, Cognition, Benchmarks, Dataset



This research was supported by the Project News Angler, which is funded by the Norwegian Research Council’s IKTPLUSS programme as project 275872.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. The University of Bergen, Bergen, Norway
