The Groningen Meaning Bank

  • Johan Bos
  • Valerio Basile
  • Kilian Evang
  • Noortje J. Venhuizen
  • Johannes Bjerva
Chapter

Abstract

The goal of the Groningen Meaning Bank (GMB) is to obtain a large corpus of English texts annotated with formal meaning representations. Since manually annotating a comprehensive corpus with deep semantic representations is a hard and time-consuming task, we employ a bootstrapping approach. This approach uses existing language technology tools (for segmentation, part-of-speech tagging, named entity tagging, animacy labelling, syntactic parsing, and semantic processing) to obtain a reasonable approximation of the target annotations as a starting point. The machine-generated annotations are then refined using information obtained from both expert linguists (via a wiki-like platform) and crowdsourcing (in the form of a ‘Game with a Purpose’), which helps us decide how to resolve syntactic and semantic ambiguities. The result is a semantic resource that integrates various linguistic phenomena, including predicate-argument structure, scope, tense, thematic roles, rhetorical relations, and presuppositions. The semantic formalism that brings all levels of annotation together in a single meaning representation is Discourse Representation Theory, whose meaning representations can be translated to first-order logic. In contrast to ordinary treebanks, the units of annotation in the GMB are texts rather than isolated sentences. The current version of the GMB contains more than 10,000 public domain texts aligned with Discourse Representation Structures, and is freely available for research purposes.
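The claim that Discourse Representation Structures (DRSs) can be translated to first-order logic can be illustrated with a minimal sketch of the standard translation for a small fragment of DRT (atomic predicates, negation, and implication). The referents of a box become existential quantifiers over the conjunction of its conditions, while the referents in the antecedent of an implication become universal quantifiers scoping over the consequent. The class names `DRS`, `Pred`, `Not`, and `Imp` and the output syntax are hypothetical choices for this illustration; this is not the GMB's own data format or tooling, and it ignores the GMB's additional layers such as presuppositions, thematic roles, and rhetorical relations.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Pred:
    """Atomic condition, e.g. man(x) or own(x,y)."""
    symbol: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Not:
    """Negated sub-DRS."""
    drs: "DRS"


@dataclass(frozen=True)
class Imp:
    """Implicational condition K1 => K2."""
    antecedent: "DRS"
    consequent: "DRS"


Condition = Union[Pred, Not, Imp]


@dataclass(frozen=True)
class DRS:
    """A box: a tuple of discourse referents plus a tuple of conditions."""
    referents: Tuple[str, ...]
    conditions: Tuple[Condition, ...]


def _quantify(quantifier: str, referents: Tuple[str, ...], body: str) -> str:
    # Wrap the body in one quantifier per referent; the first referent ends up outermost.
    for ref in reversed(referents):
        body = f"{quantifier} {ref}.({body})"
    return body


def _conjoin(conditions: Tuple[Condition, ...]) -> str:
    # An empty list of conditions is trivially true.
    parts = [cond_to_fol(c) for c in conditions]
    return " & ".join(parts) if parts else "true"


def cond_to_fol(cond: Condition) -> str:
    if isinstance(cond, Pred):
        return f"{cond.symbol}({','.join(cond.args)})"
    if isinstance(cond, Not):
        return f"~({drs_to_fol(cond.drs)})"
    if isinstance(cond, Imp):
        # Antecedent referents are universally quantified and scope over the consequent.
        body = f"{_conjoin(cond.antecedent.conditions)} -> {drs_to_fol(cond.consequent)}"
        if cond.antecedent.referents:
            return _quantify("all", cond.antecedent.referents, body)
        return f"({body})"
    raise TypeError(f"unsupported condition: {cond!r}")


def drs_to_fol(drs: DRS) -> str:
    # Referents of an ordinary (top-level or embedded) box are existentially quantified.
    return _quantify("exists", drs.referents, _conjoin(drs.conditions))


# "A man walks. He does not talk."
d1 = DRS(("x",), (Pred("man", ("x",)), Pred("walk", ("x",)),
                  Not(DRS((), (Pred("talk", ("x",)),)))))
print(drs_to_fol(d1))
# → exists x.(man(x) & walk(x) & ~(talk(x)))

# "Every farmer who owns a donkey beats it."
d2 = DRS((), (Imp(DRS(("x", "y"), (Pred("farmer", ("x",)), Pred("donkey", ("y",)),
                                   Pred("own", ("x", "y")))),
                  DRS((), (Pred("beat", ("x", "y")),))),))
print(drs_to_fol(d2))
# → all x.(all y.(farmer(x) & donkey(y) & own(x,y) -> beat(x,y)))
```

Quantifying the antecedent's referents universally is what allows the pronoun *it* in the second example to be bound by the donkey introduced inside the relative clause.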

Keywords

Formal semantics, Compositional semantics, Combinatory Categorial Grammar, Discourse Representation Theory, Gamification, Crowdsourcing

Notes

Acknowledgements

We thank James Pustejovsky and Nancy Ide for encouraging us to write this chapter. We also thank the anonymous reviewers for their valuable feedback, which helped us to improve previous versions of this chapter significantly. We would further like to thank the local and visiting students who contributed to the Groningen Meaning Bank or Wordrobe: Jaap Nanninga, Jay Feldman, Lena Rampula, Hylke Postma, and Maurice Kleine. Finally, we thank our crowd of expert annotators, who together produced over a thousand BOWs (Bits of Wisdom), and the 1,580 players of Wordrobe, all of whom helped to improve the Groningen Meaning Bank. A final note from the authors: the authors of this chapter are listed in chronological order, reflecting when they joined the project.

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  • Johan Bos (1)
  • Valerio Basile (2)
  • Kilian Evang (1)
  • Noortje J. Venhuizen (3)
  • Johannes Bjerva (1)

  1. University of Groningen, Groningen, The Netherlands
  2. INRIA, Nice, France
  3. Saarland University, Saarbrücken, Germany
