In this article we present a Spanish grammar implemented in the Linguistic Knowledge Builder system and grounded in the theoretical framework of Head-driven Phrase Structure Grammar. The grammar is being developed in an international multilingual context, the DELPH-IN Initiative, contributing to an open-source repository of software and linguistic resources for various Natural Language Processing applications. We show how we have refined and extended a core grammar, derived from the LinGO Grammar Matrix, to achieve a broad-coverage grammar. The Spanish DELPH-IN grammar is the most comprehensive grammar for Spanish deep processing, and it is being deployed in the construction of a treebank for Spanish of 60,000 sentences based on a technical corpus, in the framework of the European project METANET4U (Enhancing the European Linguistic Infrastructure, GA 270893), and of a smaller treebank of about 15,000 sentences based on a corpus from the press.

    An earlier version of the grammar was briefly presented in Marimon (2010).

    The current version of the LinGO Grammar Matrix is defined as a web-based interface accessible from: A description of it can be found in Bender et al. (2010).

    Pineda and Meza (2003, 2005) describe a basic grammar for Spanish implemented in the LKB system that has 15 syntactic rules, 180 lexical entries, and 120 lexical rules.

    FreeLing also includes a guesser to deal with words which are not found in the lexicon by computing the probability of each possible PoS tag given the longest observed termination string for that word.
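
The idea behind such a guesser can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FreeLing's actual implementation: the function names, the maximum ending length of four characters, the toy training list and the tag labels are all hypothetical, and the probabilities are plain relative frequencies without smoothing.

```python
from collections import Counter, defaultdict

def train_suffix_model(tagged_words, max_suffix_len=4):
    """Count how often each PoS tag occurs with each word ending (hypothetical helper)."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        word = word.lower()
        for n in range(1, min(max_suffix_len, len(word)) + 1):
            counts[word[-n:]][tag] += 1
    return counts

def guess_tags(word, counts, max_suffix_len=4):
    """Return P(tag | longest observed ending of `word`) as a dict, or {} if none is known."""
    word = word.lower()
    # Try the longest ending first and back off to shorter ones.
    for n in range(min(max_suffix_len, len(word)), 0, -1):
        suffix = word[-n:]
        if suffix in counts:
            total = sum(counts[suffix].values())
            return {tag: c / total for tag, c in counts[suffix].items()}
    return {}

# Toy training data; the tag labels are illustrative only.
TOY_CORPUS = [
    ("cantando", "VMG"),
    ("comiendo", "VMG"),
    ("mando", "NC"),
    ("claramente", "RG"),
    ("lentamente", "RG"),
]
model = train_suffix_model(TOY_CORPUS)
```

With this toy model, an unseen word such as "felizmente" is matched on the ending "ente" and receives the adverb tag with probability 1, while "bailando" is matched on "ando", which was observed once with each of two tags.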

    SPPP assumes that a pre-processor runs as an external process to the LKB that communicates with its caller through its standard input and output channels. See

    Lexical type names consist of four fields separated by underscores. The first three fields specify the part-of-speech, the complements that the type selects for (separated by a hyphen), and optional annotations to distinguish lexical types with the same part-of-speech and complement selection; the last field is always the suffix ‘le’ (lexical entry). Thus, the type name n_pp_c_le that we show in the example is for nouns selecting for a PP de complement; ‘c’ indicates that the noun is countable.
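
The naming convention can be made concrete with a small parser. The sketch below is not part of the grammar or of the LKB; the class and function names are hypothetical, and it assumes that a name with no annotation simply has three fields instead of four.

```python
from typing import NamedTuple, Optional, Tuple

class LexicalTypeName(NamedTuple):
    pos: str                      # part-of-speech field
    complements: Tuple[str, ...]  # selected complements, split on hyphens
    annotation: Optional[str]     # optional distinguishing annotation

def parse_lexical_type_name(name: str) -> LexicalTypeName:
    """Split a lexical type name such as 'n_pp_c_le' into its fields."""
    fields = name.split("_")
    if len(fields) not in (3, 4) or fields[-1] != "le":
        raise ValueError(f"not a lexical type name: {name!r}")
    pos, comps = fields[0], fields[1]
    # Assumption: the annotation field is present only in four-field names.
    annotation = fields[2] if len(fields) == 4 else None
    complements = tuple(c for c in comps.split("-") if c)
    return LexicalTypeName(pos, complements, annotation)
```

For the type name from the example, `parse_lexical_type_name("n_pp_c_le")` yields the part-of-speech `n`, the single complement `pp`, and the annotation `c`.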

    For the sake of simplicity, the Spanish DELPH-IN grammar does not bind words to lexical types in the morphological lexicon of the FreeLing toolkit, which has more than 500,000 full-form entries. This approach also allows the two components, which have been developed independently, to be maintained independently of each other.

    This table also shows the number of lexical entries we have defined for NEs.

    Mendikoetxea (1999), in addition, distinguishes medio se-constructions, where, as in passive constructions, the verb has a unique argument (arg2) which is the syntactic subject and which usually precedes the verb. In the Spanish DELPH-IN grammar we treat medio constructions as a sub-class of passive constructions.

    In the implementation of Modern Greek clitic doubling constructions in the Modern Greek DELPH-IN grammar, proclitics are also treated in the syntax (Kordoni and Neu 2005). Pineda and Meza (2005) also propose this dual approach to Spanish object clitics.

    In Monachesi (1998) clitics are members of the feature CLTS, and in Miller and Sag (1997) they are members of the ARG-ST (argument structure) of the verb.

    Boxed numbers indicate that two features are token-identical.

    The Spanish DELPH-IN grammar has 14 CCLRs. Diversifying the CCLRs allows us to control the order within the clitic cluster when more than one complement is cliticized (imposing the additional constraint that the “spurious se” is used instead of the dative clitic when it precedes third person accusative clitics) and when object clitic pronouns occur in reflexive and impersonal constructions. Alternatively, to control the order within the clitic cluster, Pineda and Meza (2005) develop a clitic lexicon consisting of a set of 100 clitic pronoun sequences.
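
The “spurious se” constraint can be illustrated outside the grammar formalism. The following Python sketch is not how the CCLRs are implemented (they are lexical rules over typed feature structures); the function name and the restriction to a single dative plus accusative pair are assumptions made for the illustration.

```python
DATIVE_3RD = {"le", "les"}
ACCUSATIVE_3RD = {"lo", "la", "los", "las"}

def clitic_cluster(dative, accusative):
    """Order a dative + accusative clitic pair, dative first (hypothetical helper).

    A third person dative clitic is replaced by the "spurious se" when it
    precedes a third person accusative clitic.
    """
    if dative in DATIVE_3RD and accusative in ACCUSATIVE_3RD:
        dative = "se"
    return [dative, accusative]
```

For example, `clitic_cluster("le", "lo")` gives the sequence se lo, whereas `clitic_cluster("me", "lo")` leaves the dative unchanged and gives me lo.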

    The same approach is described in Pineda and Meza (2005).


