Language Resources and Evaluation, Volume 50, Issue 3, pp 549–584

Iarg-AnCora: Spanish corpus annotated with implicit arguments

Original Paper

Abstract

This article presents Iarg-AnCora, a Spanish corpus (400k words, 13,883 sentences) annotated with the implicit arguments of deverbal nominalizations (18,397 occurrences). We describe the methodology used to create it, focusing on the annotation scheme and the criteria adopted. The corpus was annotated manually, and an inter-annotator agreement test (81% observed agreement) was conducted to ensure the reliability of the final resource. Annotating implicit arguments yields a substantial gain in argument and thematic role coverage (128% on average). Iarg-AnCora is the first freely available, wide-coverage corpus annotated with implicit arguments for Spanish. It can be used by machine-learning-based semantic role labeling systems and for the linguistic analysis of implicit arguments grounded in real data. Semantic analyzers are essential components of current language technology applications, which need a deeper understanding of text in order to make high-level inferences and achieve qualitative improvements in their results.

Keywords

Implicit argument · Deverbal nominalizations · Argument structure · Thematic roles · Semantic corpus annotation · Linguistic resource

Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • Mariona Taulé (1)
  • Aina Peris (1)
  • Horacio Rodríguez (2)
  1. Centre de Llenguatge i Computació (CLiC), University of Barcelona, Barcelona, Spain
  2. TALP Research Center, Technical University of Catalonia, Barcelona, Spain