The Spanish DELPH-IN grammar

Abstract

In this article we present a Spanish grammar implemented in the Linguistic Knowledge Builder (LKB) system and grounded in the theoretical framework of Head-driven Phrase Structure Grammar (HPSG). The grammar is being developed within the DELPH-IN Initiative, an international multilingual effort, and contributes to an open-source repository of software and linguistic resources for various Natural Language Processing applications. We show how we have refined and extended a core grammar, derived from the LinGO Grammar Matrix, to achieve a broad-coverage grammar. The Spanish DELPH-IN grammar is the most comprehensive grammar for Spanish deep processing. It is being deployed in the construction of two treebanks for Spanish: one of 60,000 sentences based on a technical corpus, built in the framework of the European project METANET4U (Enhancing the European Linguistic Infrastructure, GA 270893; http://www.meta-net.eu/projects/METANET4U/), and a smaller one of about 15,000 sentences based on a press corpus.

Notes

  1. http://www.delph-in.net/.

  2. An earlier version of the grammar was briefly presented in Marimon (2010).

  3. The current version of the LinGO Grammar Matrix is available through a web-based interface at http://www.delph-in.net/matrix/customize/matrix.cgi. A description of it can be found in Bender et al. (2010).

  4. Pineda and Meza (2003, 2005) describe a basic grammar for Spanish implemented in the LKB system that has 15 syntactic rules, 180 lexical entries, and 120 lexical rules.

  5. http://nlp.lsi.upc.edu/freeling.

  6. FreeLing also includes a guesser that handles words not found in the lexicon by computing the probability of each possible PoS tag given the longest observed ending of the word.

  7. SPPP assumes that the pre-processor runs as a process external to the LKB and communicates with its caller through its standard input and output channels. See http://wiki.delph-in.net/moin/LkbSppp.

  8. Lexical type names consist of four fields separated by underscores. The first three fields specify the part-of-speech, the complements that the type selects for (separated by hyphens), and optional annotations that distinguish lexical types with the same part-of-speech and complement selection; the last field is always the suffix ‘le’ (lexical entry). Thus, the type name n_pp_c_le shown in the example is for nouns selecting for a PP complement headed by de; ‘c’ indicates that the noun is countable.

  9. For the sake of simplicity, the Spanish DELPH-IN grammar does not bind words to lexical types in the morphological lexicon of the FreeLing toolkit, which has more than 500,000 full-form entries. This approach also allows the two components, which have been developed independently, to be maintained independently of each other.

  10. This table also shows the number of lexical entries we have defined for NEs.

  11. Mendikoetxea (1999), in addition, distinguishes medio se-constructions, where, as in passive constructions, the verb has a single argument (arg2), which is the syntactic subject and usually precedes the verb. In the Spanish DELPH-IN grammar we treat medio constructions as a sub-class of passive constructions.

  12. In the implementation of clitic doubling constructions in the Modern Greek DELPH-IN grammar, proclitics are also treated in the syntax (Kordoni and Neu 2005). Pineda and Meza (2005) also propose this dual approach to Spanish object clitics.

  13. In Monachesi (1998) clitics are members of the feature CLTS, and in Miller and Sag (1997) they are members of the ARG-ST (argument structure) of the verb.

  14. Boxed numbers indicate that two features are token-identical.

  15. The Spanish DELPH-IN grammar has 14 CCLRs. Diversifying the CCLRs makes it possible to control the order within the clitic cluster when more than one complement is cliticized (imposing the additional constraint that the “spurious se” is used instead of the dative clitic when it precedes third person accusative clitics) and when object clitic pronouns occur in reflexive and impersonal constructions. Alternatively, to control the order within the clitic cluster, Pineda and Meza (2005) develop a clitic lexicon consisting of a set of 100 clitic pronoun sequences.

  16. The same approach is described in Pineda and Meza (2005).
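
The naming convention for lexical types described in note 8 can be illustrated with a small sketch. The code below is not part of the grammar or of the LKB; the class `LexicalType` and the function `parse_lexical_type` are hypothetical names introduced only for illustration. The sketch assumes that every name has exactly four underscore-separated fields and that an unused annotation is represented by an empty third field.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LexicalType:
    """Hypothetical container for the fields of a lexical type name."""
    pos: str                      # part-of-speech field, e.g. "n"
    complements: Tuple[str, ...]  # selected complements, e.g. ("pp",)
    annotation: Optional[str]     # optional distinguishing annotation, e.g. "c"


def parse_lexical_type(name: str) -> LexicalType:
    """Split a type name such as 'n_pp_c_le' into its fields.

    Assumption of this sketch: exactly four underscore-separated fields,
    the last of which is the suffix 'le'.
    """
    fields = name.split("_")
    if len(fields) != 4 or fields[3] != "le":
        raise ValueError(f"not a four-field lexical type name: {name!r}")
    pos, comps, annotation, _suffix = fields
    # Complements are separated by hyphens inside the second field.
    complements = tuple(c for c in comps.split("-") if c)
    return LexicalType(pos, complements, annotation or None)
```

Under these assumptions, `parse_lexical_type("n_pp_c_le")` gives the part-of-speech `n`, the single complement `pp`, and the annotation `c`, matching the reading given in note 8.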

References

  1. Bender, E. M., & Flickinger, D. (2005). Rapid prototyping of scalable grammars: Towards modularity in extensions to a language-independent core. In Proceedings of IJCNLP’05 (Posters / Demos) (pp. 203–208), Jeju Island, Korea.

  2. Bender, E. M., Drellishak, S., Fokkens, A., Poulson, L., & Saleem, S. (2010). Grammar customization. Research on Language and Computation, 8(1), 23–72.

  3. Bosque, I. (2010). Nueva gramática de la lengua española: Manual. Real Academia Española, Asociación de Academias de la lengua española, Espasa Calpe, Madrid.

  4. Branco, A., & Costa, F. (2008). A computational grammar for deep linguistic processing of Portuguese: LXGram, version A.4.1. TR-2008-17. Tech. rep., Universidade de Lisboa, Faculdade de Ciências, Departamento de Informática.

  5. Branco, A., Costa, F., Silva, J., Silveira, S., Castro, S., Avelãs, M., et al. (2010). Developing a deep linguistic databank supporting a collection of treebanks: The CINTIL DeepGramBank. In Proceedings of LREC-2010, La Valletta, Malta.

  6. Callmeier, U. (2000). PET: A platform for experimentation with efficient HPSG processing. In D. Flickinger, S. Oepen, J.-I. Tsujii, & H. Uszkoreit (Eds.), Natural language engineering 6(1), Special issue: Efficient processing with HPSG: Methods, systems, evaluation (pp. 99–108). Cambridge: Cambridge University Press.

  7. Copestake, A. (2002). Implementing typed feature structure grammars. Stanford: CSLI Publications.

  8. Copestake, A., Flickinger, D., Pollard, C., & Sag, I. A. (2006). Minimal recursion semantics: An introduction. Research on Language and Computation, 3(4), 281–332.

  9. Crysmann, B. (2005). Syncretism in German: A unified approach to underspecification, indeterminacy, and likeness of Case. In Proceedings of HPSG’05, Lisbon, Portugal.

  10. Fernández, S. O. (1999). El pronombre personal. Formas y distribuciones. Pronombres átonos y tónicos. In I. Bosque, & V. Demonte (Eds.), Gramática descriptiva de la lengua española (pp. 1209–1273). Madrid: Espasa.

  11. Flickinger, D. (2002). On building a more efficient grammar by exploiting types. In D. Flickinger, S. Oepen, J.-I. Tsujii, & H. Uszkoreit (Eds.), Natural language engineering 6(1), Special issue: Efficient processing with HPSG: Methods, systems, evaluation (pp. 1–17). Cambridge: Cambridge University Press.

  12. Hashimoto, C., Bond, F., & Siegel, M. (2007). Semi-automatic documentation of an implemented linguistic grammar augmented with a treebank. Language Resources and Evaluation (Special Issue on Asian Language Technology), 42(2), 117–126.

  13. Hellan, L., & Haugereid, P. (2004). NorSource: An exercise in the matrix grammar building design. In E. M. Bender, D. Flickinger, F. Fouvry, & M. Siegel (Eds.), A workshop on ideas and strategies for multilingual grammar engineering. Vienna: ESSLLI.

  14. Kim, J. B., & Yang, J. (2003). Korean phrase structure grammar and its implementations into the LKB system. Paper presented at the 17th Pacific Asia Conference on Language, Information and Computation.

  15. Kordoni, V., & Neu, J. (2005). Deep analysis of modern Greek. In K.-Y. Su, J.-I. Tsujii, & J.-H. Lee (Eds.), Lecture notes in computer science, Vol. 3248 (pp. 674–683). Berlin: Springer.

  16. Levin, B. (1993). English verb classes and alternations: A preliminary investigation. Chicago: University of Chicago Press.

  17. Marimon, M. (2010). The Spanish resource grammar. In Proceedings of LREC-2010, La Valletta, Malta.

  18. Mendikoetxea, A. (1999). Construcciones con se: Medias, pasivas e impersonales. In I. Bosque, & V. Demonte (Eds.), Gramática descriptiva de la lengua española (pp. 1631–1722). Madrid: Espasa.

  19. Miller, P. H., & Sag, I. A. (1997). French clitic movement without clitics or movement. Natural Language and Linguistic Theory, 15(3), 573–639.

  20. Monachesi, P. (1998). Decomposing Italian clitics. In S. Balari, & L. Dini (Eds.), Romance in HPSG (pp. 305–357). Stanford: CSLI Publications.

  21. Oepen, S., & Carroll, J. (2000). Performance profiling for parser engineering. In D. Flickinger, S. Oepen, J.-I. Tsujii, & H. Uszkoreit (Eds.), Natural language engineering 6(1), Special issue: Efficient processing with HPSG: Methods, systems, evaluation (pp. 81–97). Cambridge: Cambridge University Press.

  22. Oepen, S., Flickinger, D., Toutanova, K., & Manning, C. D. (2002). LinGO Redwoods: A rich and dynamic treebank for HPSG. In Proceedings of TLT 2002, Sozopol, Bulgaria.

  23. Padró, L., Collado, M., Reese, S., Lloberes, M., & Castellón, I. (2010). FreeLing 2.1: Five years of open-source language processing tools. In Proceedings of LREC-2010, La Valletta, Malta.

  24. Pineda, L., & Meza, I. (2003). Una gramática básica del español en HPSG. Tech. rep., DCC-IIMAS, Universidad Nacional Autónoma de México.

  25. Pineda, L., & Meza, I. (2005). The Spanish pronominal clitic system. Procesamiento del Lenguaje Natural, 34, 67–104.

  26. Pollard, C., & Sag, I. A. (1987). Information-based syntax and semantics. Volume I: Fundamentals. CSLI Lecture Notes, Stanford.

  27. Pollard, C., & Sag, I. A. (1994). Head-driven phrase structure grammar. Chicago: The University of Chicago Press and CSLI Publications.

  28. Siegel, M., & Bender, E. M. (2002). Efficient deep processing of Japanese. In 3rd Workshop on Asian language resources and international standardization, COLING-2002, Taipei, Taiwan.

  29. Toutanova, K., Manning, C. D., Flickinger, D., & Oepen, S. (2005). Stochastic HPSG parse disambiguation using the Redwoods corpus. Research on Language and Computation, 3(1), 83–105.

  30. Tseng, J. (2004). LKB grammar implementation: French and beyond. In E. M. Bender, D. Flickinger, F. Fouvry, & M. Siegel (Eds.), A workshop on ideas and strategies for multilingual grammar engineering. Vienna: ESSLLI.

  31. Zwicky, A., & Pullum, G. (1983). Cliticization vs. inflection: English n’t. Language, 59(3), 502–513.

Acknowledgments

This work was funded by the Ramón y Cajal program of the Spanish Ministerio de Ciencia e Innovación. Part of this work was carried out during a three-month research visit to CSLI at Stanford University, funded by the Agència de Gestió d’Ajuts Universitaris i de Recerca under the programme Beques per a estades per a la recerca fora de Catalunya. The author is grateful to the anonymous reviewers for their constructive and helpful comments on an earlier version of the paper. The author also thanks all DELPH-IN members, with special thanks to Dan Flickinger for fruitful discussions and to Stephan Oepen for answering numerous questions about the LKB system.

Author information

Corresponding author

Correspondence to Montserrat Marimon.

About this article

Cite this article

Marimon, M. The Spanish DELPH-IN grammar. Lang Resources & Evaluation 47, 371–397 (2013). https://doi.org/10.1007/s10579-012-9199-7

Keywords

  • Spanish
  • Grammar
  • Deep processing
  • HPSG
  • LKB
  • DELPH-IN