In this article we present a Spanish grammar implemented in the Linguistic Knowledge Builder (LKB) system and grounded in the theoretical framework of Head-driven Phrase Structure Grammar (HPSG). The grammar is being developed in an international multilingual context, the DELPH-IN Initiative, and contributes to an open-source repository of software and linguistic resources for various Natural Language Processing applications. We show how we have refined and extended a core grammar, derived from the LinGO Grammar Matrix, to achieve a broad-coverage grammar. The Spanish DELPH-IN grammar is the most comprehensive grammar for Spanish deep processing, and it is being deployed in the construction of two treebanks: a 60,000-sentence treebank based on a technical corpus, built in the framework of the European project METANET4U (Enhancing the European Linguistic Infrastructure, GA 270893GA; http://www.meta-net.eu/projects/METANET4U/), and a smaller treebank of about 15,000 sentences based on a press corpus.
An earlier version of the grammar was briefly presented in Marimon (2010).
The current version of the LinGO Grammar Matrix is defined as a web-based interface accessible from: http://www.delph-in.net/matrix/customize/matrix.cgi. A description of it can be found in Bender et al. (2010).
FreeLing also includes a guesser to deal with words which are not found in the lexicon by computing the probability of each possible PoS tag given the longest observed termination string for that word.
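The idea behind such a guesser can be illustrated with a minimal Python sketch. It is not FreeLing's implementation: the function names, the toy training data, the tag labels `V` and `N`, and the maximum suffix length of 5 are all assumptions made for illustration. The sketch counts tags per word ending and, for an unknown word, returns the relative tag frequencies of the longest ending seen in training.

```python
from collections import Counter, defaultdict


def train_suffix_model(tagged_words, max_len=5):
    """Count how often each tag occurs with each word ending (hypothetical helper)."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        for k in range(1, min(max_len, len(word)) + 1):
            counts[word[-k:]][tag] += 1
    return counts


def guess_tags(word, counts, max_len=5):
    """Return P(tag | longest observed ending of word) as a dict, or {} if none is known."""
    for k in range(min(max_len, len(word)), 0, -1):
        suffix = word[-k:]
        if suffix in counts:
            total = sum(counts[suffix].values())
            return {tag: n / total for tag, n in counts[suffix].items()}
    return {}


# Toy training data; the tags are illustrative, not the FreeLing tagset.
model = train_suffix_model([("cantando", "V"), ("comiendo", "V"), ("mundo", "N")])
print(guess_tags("bailando", model))  # longest known ending is "ando"
print(guess_tags("xndo", model))      # backs off to the shorter ending "ndo"
```

Trying the longest ending first means that more specific evidence takes precedence over shorter, more ambiguous endings.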
SPPP assumes that a pre-processor runs as an external process to the LKB that communicates with its caller through its standard input and output channels. See http://wiki.delph-in.net/moin/LkbSppp.
Lexical type names consist of four fields separated by underscores. The first three fields specify the part-of-speech, the complements that the type selects for (separated by hyphens), and optional annotations that distinguish lexical types with the same part-of-speech and complement selection; the last field is always the suffix ‘le’ (lexical entry). Thus, the type name n_pp_c_le shown in the example is for nouns selecting for a PP de complement; ‘c’ indicates that the noun is countable.
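The naming convention can be made concrete with a small Python sketch that splits a type name into its fields. The `LexicalType` class and `parse_lexical_type` function are hypothetical helpers, not part of the grammar or the LKB, and the sketch assumes that all four fields are always present.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LexicalType:
    pos: str                 # part-of-speech field, e.g. "n"
    complements: List[str]   # selected complements, split on hyphens
    annotation: str          # distinguishing annotation, e.g. "c" for countable


def parse_lexical_type(name: str) -> LexicalType:
    """Split a type name of the form pos_complements_annotation_le (assumed layout)."""
    fields = name.split("_")
    if len(fields) != 4 or fields[-1] != "le":
        raise ValueError(f"not a four-field lexical type name: {name!r}")
    pos, complements, annotation, _suffix = fields
    return LexicalType(pos, complements.split("-"), annotation)


print(parse_lexical_type("n_pp_c_le"))
```

A name with several complements would list them in the second field separated by hyphens; for example, a made-up name such as `v_np-pp_x_le` would yield the complement list `["np", "pp"]`.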
For the sake of simplicity, the Spanish DELPH-IN grammar does not bind words to lexical types in the morphological lexicon of the FreeLing toolkit, which has more than 500,000 full-form entries. This approach also allows the two components, which have been developed independently, to be maintained independently of each other.
This table also shows the number of lexical entries we have defined for NEs.
Mendikoetxea (1999), in addition, distinguishes medio se-constructions, in which, as in passive constructions, the verb has a single argument (arg2) that is the syntactic subject and usually precedes the verb. In the Spanish DELPH-IN grammar we treat medio constructions as a sub-class of passive constructions.
Boxed numbers indicate that two features are token-identical.
The Spanish DELPH-IN grammar has 14 CCLRs. Diversifying the CCLRs allows us to control the order within the clitic cluster when more than one complement is cliticized (imposing the additional constraint that the “spurious se” is used instead of the dative clitic when it precedes third person accusative clitics) and when object clitic pronouns occur in reflexive and impersonal constructions. Alternatively, to control the order within the clitic cluster, Pineda and Meza (2005) develop a clitic lexicon consisting of a set of 100 clitic pronoun sequences.
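The “spurious se” constraint mentioned above can be illustrated with a short Python sketch. This is only a surface-level illustration, assuming the cluster is given as an ordered list of clitic forms; the grammar itself encodes the constraint in its lexical rules, not by rewriting strings, and the function name is hypothetical.

```python
THIRD_DATIVE = {"le", "les"}
THIRD_ACCUSATIVE = {"lo", "la", "los", "las"}


def apply_spurious_se(cluster):
    """Replace a third person dative clitic with "se" when a third person
    accusative clitic immediately follows it (illustrative helper)."""
    result = list(cluster)
    for i in range(len(result) - 1):
        if result[i] in THIRD_DATIVE and result[i + 1] in THIRD_ACCUSATIVE:
            result[i] = "se"
    return result


print(apply_spurious_se(["le", "lo"]))  # dative precedes accusative
print(apply_spurious_se(["me", "lo"]))  # no third person dative, unchanged
```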
The same approach is described in Pineda and Meza (2005).
Bender, E. M., & Flickinger, D. (2005). Rapid prototyping of scalable grammars: Towards modularity in extensions to a language-independent core. In Proceedings of IJCNLP’05 (Posters / Demos) (pp. 203–208), Jeju Island, Korea.
Bender, E. M., Drellishak, S., Fokkens, A., Poulson, L., & Saleem, S. (2010). Grammar customization. Research on Language and Computation, 8(1), 23–72.
Bosque, I. (2010). Nueva gramática de la lengua española: Manual. Real Academia Española, Asociación de Academias de la lengua española, Espasa Calpe, Madrid.
Branco, A., & Costa, F. (2008). A computational grammar for deep linguistic processing of Portuguese: LXGram, version A.4.1. TR-2008-17. Tech. rep., Universidade de Lisboa, Faculdade de Ciências, Departamento de Informática.
Branco, A., Costa, F., Silva, J., Silveira, S., Castro, S., Avelãs, M., et al. (2010). Developing a deep linguistic databank supporting a collection of treebanks: The CINTIL DeepGramBank. In Proceedings of LREC-2010, La Valletta, Malta.
Callmeier, U. (2000). PET: A platform for experimentation with efficient HPSG processing. In D. Flickinger, S. Oepen, J.-I. Tsujii, & H. Uszkoreit (Eds.), Natural language engineering, 6(1), Special issue: Efficient processing with HPSG: Methods, systems, evaluation (pp. 99–108). Cambridge: Cambridge University Press.
Copestake, A. (2002). Implementing typed feature structure grammars. Stanford: CSLI Publications.
Copestake, A., Flickinger, D., Pollard, C., & Sag, I. A. (2006). Minimal recursion semantics: An introduction. Research on Language and Computation, 3(4), 281–332.
Crysmann, B. (2005). Syncretism in German: A unified approach to underspecification, indeterminacy, and likeness of Case. In Proceedings of HPSG’05, Lisbon, Portugal.
Fernández, S. O. (1999). El pronombre personal. Formas y distribuciones. Pronombres átonos y tónicos. In I. Bosque, & V. Demonte (Eds.), Gramática descriptiva de la lengua española (pp. 1209–1273). Madrid: Espasa.
Flickinger, D. (2002). On building a more efficient grammar by exploiting types. In D. Flickinger, S. Oepen, J.-I. Tsujii, & H. Uszkoreit (Eds.), Natural language engineering, 6(1), Special issue: Efficient processing with HPSG: Methods, systems, evaluation (pp. 1–17). Cambridge: Cambridge University Press.
Hashimoto, C., Bond, F., & Siegel, M. (2007). Semi-automatic documentation of an implemented linguistic grammar augmented with a treebank. Language Resources and Evaluation (Special Issue on Asian Language Technology), 42(2), 117–126.
Hellan, L., & Haugereid, P. (2004). NorSource: An exercise in the matrix grammar building design. In E. M. Bender, D. Flickinger, F. Fouvry, & M. Siegel (Eds.), A workshop on ideas and strategies for multilingual grammar engineering. Vienna: ESSLLI.
Kim, J. B., & Yang, J. (2003). Korean phrase structure grammar and its implementations into the LKB system. Paper presented at the 17th Pacific Asia conference on language, information, and computation.
Kordoni, V., & Neu, J. (2005). Deep analysis of modern Greek. In K.-Y. Su, J.-I. Tsujii, & J.-H. Lee (Eds.), Lecture notes in computer science, Vol. 3248 (pp. 674–683). Berlin: Springer.
Levin, B. (1993). English verb classes and alternations: A preliminary investigation. Chicago: University of Chicago Press.
Marimon, M. (2010). The Spanish resource grammar. In Proceedings of LREC-2010, La Valletta, Malta.
Mendikoetxea, A. (1999). Construcciones con se: Medias, pasivas e impersonales. In I. Bosque, & V. Demonte (Eds.), Gramática descriptiva de la lengua española (pp. 1631–1722). Madrid: Espasa.
Miller, P. H., & Sag, I. A. (1997). French clitic movement without clitics or movement. Natural Language and Linguistic Theory, 15(3), 573–639.
Monachesi, P. (1998). Decomposing Italian clitics. In S. Balari, & L. Dini (Eds.), Romance in HPSG (pp. 305–357). Stanford: CSLI publications.
Oepen, S., & Carroll, J. (2000). Performance profiling for parser engineering. In D. Flickinger, S. Oepen, J.-I. Tsujii, & H. Uszkoreit (Eds.), Natural language engineering, 6(1), Special issue: Efficient processing with HPSG: Methods, systems, evaluation (pp. 81–97). Cambridge: Cambridge University Press.
Oepen, S., Flickinger, D., Toutanova, K., & Manning, C. D. (2002). LinGO Redwoods: A rich and dynamic treebank for HPSG. In Proceedings of TLT 2002, Sozopol, Bulgaria.
Padró, L., Collado, M., Reese, S., Lloberes, M., & Castellón, I. (2010). FreeLing 2.1: Five years of open-source language processing tools. In Proceedings of LREC-2010, La Valletta, Malta.
Pineda, L., & Meza, I. (2003). Una gramática básica del español en HPSG. Tech. rep., DCC-IIMAS, Universidad Nacional Autónoma de México.
Pineda, L., & Meza, I. (2005). The Spanish pronominal clitic system. Procesamiento del Lenguaje Natural, 34, 67–104.
Pollard, C., & Sag, I. A. (1987). Information-based syntax and semantics. Volume I: Fundamentals. CSLI Lecture Notes, Stanford.
Pollard, C., & Sag, I. A. (1994). Head-driven phrase structure grammar. Chicago: The University of Chicago Press and CSLI Publications.
Siegel, M., & Bender, E. M. (2002). Efficient deep processing of Japanese. In 3rd Workshop on Asian language resources and international standardization, COLING-2002, Taipei, Taiwan.
Toutanova, K., Manning, C. D., Flickinger, D., & Oepen, S. (2005). Stochastic HPSG parse disambiguation using the Redwoods corpus. Research on Language and Computation, 3(1), 83–105.
Tseng, J. (2004). LKB grammar implementation: French and beyond. In E. M. Bender, D. Flickinger, F. Fouvry, & M. Siegel (Eds.), A workshop on ideas and strategies for multilingual grammar engineering. Vienna: ESSLLI.
Zwicky, A., & Pullum, G. (1983). Cliticization vs. inflection: English n’t. Language, 59(3), 502–513.
This work was funded by the Ramón y Cajal program of the Spanish Ministerio de Ciencia e Innovación. Part of this work was carried out during a three-month research visit to CSLI at Stanford University, funded by the Agència de Gestió d’Ajuts Universitaris i de Recerca under the programme Beques per a estades per a la recerca fora de Catalunya. The author is grateful to the anonymous reviewers for their constructive and helpful comments on the earlier version of the paper. The author also thanks all DELPH-IN members, with special thanks to Dan Flickinger for fruitful discussions and to Stephan Oepen for answers to numerous questions about the LKB system.
Cite this article
Marimon, M. The Spanish DELPH-IN grammar. Lang Resources & Evaluation 47, 371–397 (2013). https://doi.org/10.1007/s10579-012-9199-7
- Deep processing