Annotation of Compositional Operations with GLML

  • James Pustejovsky
  • Anna Rumshisky
  • Olga Batiukova
  • Jessica L. Moszkowicz
Part of the Text, Speech and Language Technology book series (TLTB, volume 47)


In this paper, we introduce a methodology for annotating compositional operations in natural language text and describe the Generative Lexicon Mark-up Language (GLML), a mark-up language inspired by the Generative Lexicon (GL) model, for identifying such operations. While most annotation systems capture surface relationships, GLML captures the “compositional history” of the argument selection relative to the predicate. We provide a brief overview of GL before moving on to our proposed methodology for annotating with GLML. The paper describes three main tasks: (i) Argument Selection and Coercion Annotation for the SemEval-2010 competition; (ii) Qualia Selection in modification constructions; (iii) Type Selection in modification constructions and verb-noun combinations involving dot objects. The first task is based on atomic semantic types, while the other two exploit more fine-grained meaning parameters encoded in the Qualia Structure roles. We explain what each task comprises and include the XML format for annotated sample sentences. We show that by identifying and subsequently annotating the typing and subtyping shifts in these constructions, we gain insight into the workings of the general mechanisms of composition.


Keywords (added by machine, not by the authors): Head Noun, Grammatical Relation, Modification Construction, Prepositional Object, Generative Lexicon



The idea for annotating a corpus according to principles of argument selection within GL arose during a discussion at GL2007 in Paris, between one of the authors (James Pustejovsky) and Nicoletta Calzolari and Pierrette Bouillon. The authors would like to thank the members of the GLML Working Group and the organizers of the ASC task at SemEval-2010 for their fruitful feedback. In particular, we would like to thank Nicoletta Calzolari, Elisabetta Jezek, Alessandro Lenci, Valeria Quochi, Jan Odijk, Tommaso Caselli, Claudia Soria, Chu-Ren Huang, Marc Verhagen, and Kiyong Lee. The contribution by Olga Batiukova was partially financed by the Ministry of Economy and Competitiveness of Spain under Grant No. FFI2009-12191 (Subprogram FILO).



Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  • James Pustejovsky (1, email author)
  • Anna Rumshisky (2, 3)
  • Olga Batiukova (4)
  • Jessica L. Moszkowicz (1)

  1. Department of Computer Science, Brandeis University, Waltham, USA
  2. Department of Computer Science, University of Massachusetts, Lowell, USA
  3. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA
  4. Department of Spanish Philology, Autonomous University of Madrid, Cantoblanco, Spain
