Annotating the argument structure of deverbal nominalizations in Spanish

Abstract

Over recent years, there has been a growing interest in the computational treatment of nominalized Noun Phrases due to the rich semantic information they contain. These Noun Phrases can be understood as paraphrases of verbal constructions and, like their verbal counterparts, they can denote argument and thematic-role relations. This paper presents the methodology followed to annotate the argument structure of deverbal nominalizations in the Spanish AnCora-Es corpus. We focus on the automated annotation process, which is based mostly on the semantic information specified in a verbal lexicon but also on the syntactic and semantic information annotated in the corpus. The heuristic rules that use this information rely on linguistic assumptions, which we also evaluate when assessing the reliability of the automated process. The automated annotation was manually checked in order to ensure the accuracy of the final resource. We demonstrate the feasibility of the automated process (77% F-measure) and show that it facilitates corpus annotation, which is always a time-consuming and costly process. The result is the enrichment of the AnCora-Es corpus with the argument structure and thematic roles of deverbal nominalizations. It is the first freely available Spanish corpus with this kind of information.

Notes

  1. Nominalizations are also represented in the different FrameNet projects, which are lexical databases supported by corpus evidence. See Ruppenhofer et al. (2006) for English, Burchardt et al. (2009) for German, Ohara (2009) for Japanese and Subirats (2009) for Spanish.

  2. Deverbal nominalizations are classified into three types depending on their denotation (Peris et al. 2010): event nominalizations (i.e., they refer to an action), result nominalizations (i.e., they refer to the result of an action), and underspecified nominalizations (i.e., it is not possible to disambiguate between the two denotations above). This information is also annotated in AnCora-Es.

  3. http://verbs.colorado.edu/semlink/.

  4. Peris (2010). AnCora-Nom: Annotation guidelines. Working report available at http://clic.ub.edu/ca/publicacions.

  5. AnCora-Es is the largest freely available multilayer annotated corpus of Spanish: http://clic.ub.edu/corpus/ancora.

  6. Our initial idea for detecting relational adjectives was to adapt to Spanish the adjective classifier developed by Boleda (2007), but this proved not to be worth the time and effort required.

  7. In the English translation, the AP corresponding to policial becomes an NP, ‘police’. For this reason, we removed ‘AP’ from the English tag.

  8. In the English translation, the AP corresponding to empresarial becomes an NP, ‘business’. For this reason, we removed ‘AP’ from the English tag.

  9. In this case, the English translation does not provide an equivalent constituent to the Spanish PP-arg1-pat. We therefore removed that tag from the English translation.

  10. In this case, the English translation does not provide an equivalent constituent to the Spanish PP-arg2-loc. We therefore removed the PP tag from the English translation.

  11. The kappa measure lowers the observed agreement by discounting the portion of agreement that is due to chance. Kappa was computed using Scott’s Pi to estimate the probability of agreement by chance.

  12. In addition to the observed false positives that support this claim, 90% of false negatives for the arg0-agt tag were arg1-pat, and 70% of false negatives for the arg0-cau tag were arg1-tem. These figures reinforce our observation.

  13. http://www.clips.ua.ac.be/conll2008/.
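
The chance-corrected agreement described in note 11 can be illustrated with a short sketch. It assumes two annotators who each assign one label per item; the function name `scotts_pi` and the example label lists are hypothetical and are not taken from the annotation tools or data used for AnCora-Es. Following Scott (1955), chance agreement is estimated from the label distribution pooled over both annotators.

```python
from collections import Counter


def scotts_pi(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Scott's Pi).

    Hypothetical helper for illustration only.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: proportion of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement: sum of squared label proportions, where each
    # proportion is computed over the pooled labels of both annotators.
    pooled = Counter(labels_a) + Counter(labels_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())

    if expected == 1.0:
        # Only one label was ever used, so agreement is trivially perfect.
        return 1.0

    return (observed - expected) / (1 - expected)


# Invented example: four items labelled by two annotators.
annotator_1 = ["arg0-agt", "arg1-pat", "arg1-pat", "arg0-agt"]
annotator_2 = ["arg0-agt", "arg1-pat", "arg0-agt", "arg0-agt"]
print(round(scotts_pi(annotator_1, annotator_2), 3))
```

In this invented example the observed agreement is 0.75, while the agreement expected by chance is 34/64, so the corrected value is noticeably lower than the raw figure.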

References

  1. Aparicio, J., Taulé, M., & Martí, M. A. (2008). AnCora-Verb: A lexical resource for the semantic annotation of corpora. In Proceedings of the sixth international conference on language resources and evaluation, LREC’08 (pp. 797–802). Marrakech, Morocco: European Language Resources Association (ELRA).

  2. Badia, T. (2002). Els complements nominals. In J. Solà (Ed.), Gramàtica del Català Contemporani (Vol. 3, pp. 1591–1640). Barcelona: Empúries.

  3. Baker, C. F., Fillmore, C. J., & Lowe, J. B. (1998). The Berkeley FrameNet Project. In Proceedings of the 36th annual meeting of the Association for Computational Linguistics and 17th international conference on computational linguistics, ACL’98 (Vol. 1, pp. 86–90). Stroudsburg, PA, USA: Association for Computational Linguistics.

  4. Bertran, M., Borrega, O., Recasens, M., & Soriano, B. (2008). AnCoraPipe: A tool for multilevel annotation. Procesamiento del Lenguaje Natural, 41, 291–292.

  5. Boleda, G. (2007). Automatic acquisition of semantic classes for adjectives. Ph.D. thesis, Pompeu Fabra University, Barcelona, Spain.

  6. Bosque, I., & Picallo, C. (1996). Postnominal adjectives in Spanish DPs. Journal of Linguistics, 32, 349–385.

  7. Burchardt, A., Erk, K., Frank, A., Kowalski, A., Padó, S., & Pinkal, M. (2009). FrameNet for the semantic analysis of German: Annotation, representation and automation. In H. C. Boas (Ed.), Multilingual FrameNets in computational lexicography: Methods and applications. Mouton de Gruyter.

  8. Che, W., Li, Z., Hu, Y., Li, Y., Qin, B., Liu, T., et al. (2008). A cascaded syntactic and semantic dependency parsing system. In Proceedings of the twelfth conference on computational natural language learning, CoNLL’08 (pp. 238–242).

  9. Ciaramita, M., Attardi, G., Dell’Orletta, F., & Surdeanu, M. (2008). DeSRL: A linear-time semantic role labeling system. In Proceedings of the twelfth conference on computational natural language learning, CoNLL’08 (pp. 258–262).

  10. Dowty, D. (1979). Word meaning and Montague grammar. Dordrecht: Reidel.

  11. Gerber, M., & Chai, J. Y. (2010). Beyond NomBank: A study of implicit argumentation for nominal predicates. In Proceedings of the Association of Computational Linguistics conference 2010, ACL’10 (pp. 1583–1592). Uppsala, Sweden: Association for Computational Linguistics.

  12. Grimshaw, J. (1990). Argument structure. Cambridge, MA: MIT Press.

  13. Gurevich, O., & Waterman, S. (2009). Mapping verbal argument preferences to deverbals. In Proceedings of the 2009 IEEE international conference on semantic computing (pp. 17–24).

  14. Gurevich, O., Richard, C., Holloway King, T., & De Paiva, V. (2006). Deverbal nouns in knowledge representation. In Proceedings of Florida Artificial Intelligence Research Society conference, Florida, USA (pp. 670–675).

  15. Hull, R. D., & Gomez, F. (2000). Semantic interpretation of deverbal nominalizations. Natural Language Engineering, 6(2), 139–161.

  16. Johansson, R., & Nugues, P. (2008). Dependency-based syntactic–semantic analysis with PropBank and NomBank. In Proceedings of the twelfth conference on computational natural language learning, CoNLL’08 (pp. 183–187). Manchester, UK.

  17. Kipper, K., Dang, H. T., Schuler, W., & Palmer, M. (2000). Building a class-based verb lexicon using TAGs. In Proceedings of the fifth international workshop on tree adjoining grammars and related formalisms. Paris, France.

  18. Kipper, K., Korhonen, A., Ryant, N., & Palmer, M. (2006). Extending VerbNet with novel verb classes. In Proceedings of the 5th international conference on language resources and evaluation, LREC’06 (pp. 1027–1032). Genova, Italy.

  19. Loper, E., Yi, S., & Palmer, M. (2007). Combining lexical resources: Mapping between PropBank and VerbNet. In Proceedings of the 7th international workshop on computational semantics. Tilburg, The Netherlands.

  20. Meyers, A. (2007). Annotation guidelines for NomBank noun argument structure for PropBank. Technical report, New York University.

  21. Meyers, A., Reeves, R., & Macleod, C. (2004). NP-external arguments: A study of argument sharing in English. In Proceedings of the workshop on multiword expressions: Integrating processing, MWE ’04 (pp. 96–103). Stroudsburg, PA, USA: Association for Computational Linguistics.

  22. Ohara, K. (2009). Frame-based contrastive lexical semantics in Japanese FrameNet: The case of risk and kakeru. In H. C. Boas (Ed.), Multilingual FrameNets in computational lexicography: Methods and applications. Mouton de Gruyter.

  23. Padó, S., Pennacchiotti, M., & Sporleder, C. (2008). Semantic role assignment for event nominalisations by leveraging verbal data. In Proceedings of the 22nd international conference on computational linguistics, CoLing’08 (pp. 665–672). Manchester, UK.

  24. Palmer, M. (2009). SemLink: Combining English lexical resources. In Proceedings of the Generative Lexicon conference, GenLex-09 (pp. 19–25).

  25. Palmer, M., Kingsbury, P., & Gildea, D. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1), 76–105.

  26. Peris, A. (2010). AnCora-Nom: Annotation guidelines. Technical report, University of Barcelona.

  27. Peris, A., & Taulé, M. (2009). Evaluación de los criterios lingüísticos para la distinción evento y resultado en los sustantivos deverbales. In Proceedings of the 1st international conference on corpus linguistics (pp. 596–611). Murcia, Spain.

  28. Peris, A., Taulé, M., Boleda, G., & Rodríguez, H. (2010). ADN-Classifier: Automatically assigning denotation types to nominalizations. In Proceedings of the language resources and evaluation conference, LREC’10 (pp. 1422–1428). Valletta, Malta.

  29. Picallo, C. (1999). La estructura del Sintagma Nominal: las nominalizaciones y otros sustantivos con complementos argumentales. In I. Bosque & V. Demonte (Eds.), Gramática Descriptiva de la Lengua Española (Vol. 1, pp. 363–393). Madrid: Espasa Calpe.

  30. Rainer, F. (1999). La derivación Adjetival. In I. Bosque, & V. Demonte (Eds.), Gramática Descriptiva de la Lengua Española (Vol. 3, pp. 4595–4642). Madrid: Espasa Calpe.

  31. Recasens, M., & Martí, M. A. (2010). AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan. Language Resources and Evaluation, 44, 315–345.

  32. Ruppenhofer, J., Ellsworth, M., Petruck, M. R. L., Johnson, C. R., & Scheffczyk, J. (2006). FrameNet II: Extended theory and practice. Technical report, ICSI—International Computer Science Institute.

  33. Santiago, R., & Bustos, E. (1999). La derivación Nominal. In I. Bosque & V. Demonte (Eds.), Gramática Descriptiva de la Lengua Española (Vol. 3, pp. 4505–4594). Madrid: Espasa Calpe.

  34. Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19(3), 321–325.

  35. Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.

  36. Subirats, C. (2009). Spanish FrameNet: A frame semantic analysis of the Spanish Lexicon. In H. C. Boas (Ed.), Multilingual FrameNets in computational lexicography: Methods and applications. Mouton de Gruyter.

  37. Surdeanu, M., Johansson, R., Meyers, A., Màrquez, L., & Nivre, J. (2008). The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. In Proceedings of the twelfth conference on computational natural language learning, CoNLL’08 (pp. 159–177). Stroudsburg, PA, USA: Association for Computational Linguistics.

  38. Taulé, M., Martí, M. A., & Recasens, M. (2008). AnCora: Multilevel annotated corpora for Catalan and Spanish. In Proceedings of the sixth international conference on language resources and evaluation, LREC’08 (pp. 96–101). Marrakech, Morocco: European Language Resources Association (ELRA).

  39. Vázquez, G., Fernández, A., & Martí, M. A. (2000). Clasificación verbal. Alternancias de diátesis. Quaderns de Sintagma, 3, Edicions de la Universitat de Lleida.

  40. Vendler, Z. (1967). Linguistics in philosophy. Ithaca: Cornell University Press.

  41. Yi, S., Loper, E., & Palmer, M. (2007). Can semantic roles generalize across genres? In HLT-NAACL’07 (pp. 548–555).

  42. Zhao, H., & Kit, C. (2008). Parsing syntactic and semantic dependencies with two single-stage maximum entropy models. In Proceedings of the twelfth conference on computational natural language learning, CoNLL’08 (pp. 203–207). Manchester, UK.

Acknowledgments

We are grateful to Horacio Rodríguez and David Bridgewater for their helpful advice. We would also like to express our gratitude to the two anonymous reviewers for their suggestions to improve this article. This work was partly supported by the projects Araknion (FFI2010-114774-E) and TEXT-MESS 2.0 (TIN2009-13391-C04-04) from the Spanish Ministry of Science and Innovation, and by an FPU grant (AP2007-01028) from the Spanish Ministry of Education.

Author information

Corresponding author

Correspondence to Aina Peris.

Appendix

See Table 10.

Table 10 Specific rules involving two or more nominalized constituents

Cite this article

Peris, A., Taulé, M. Annotating the argument structure of deverbal nominalizations in Spanish. Lang Resources & Evaluation 46, 667–699 (2012). https://doi.org/10.1007/s10579-011-9172-x

Keywords

  • Nominalization
  • Argument structure
  • Semantic corpus annotation
  • Heuristic rules