A valency dictionary specifies what types of arguments are possible for a given predicate. The need for such information is most obvious for verbs, which differ widely in possible arguments, e.g., some Polish verbs allow for a complement in the form of a verbal phrase in the infinitive and others do not. Other classes of predicates have mostly typical dependants—for instance, adjectival and prepositional-nominal phrases for nouns. Infinitival dependants do not occur with nouns and are very rare for adjectives. Yet providing valency information for other classes of predicates is also useful, especially when differentiating arguments and adjuncts is involved.
Initially, the Świgra parser used a valency dictionary based on Świdziński (1994). This dictionary was extended when the Składnica treebank was built. The version released with Składnica 0.5 consisted of 6400 schemata for 1450 Polish verbs, covering about 75% of verb occurrences in the manually annotated 1-million-token subcorpus of the NKJP (Woliński et al. 2011). The dictionary contained only verbs.
Later, this dictionary became a seed for a new one, which is currently being developed at the Institute of Computer Science of the Polish Academy of Sciences (ICS PAS). The new dictionary, called Walenty, is a comprehensive valency dictionary of Polish based on corpus data (Hajnicz et al. 2016a, b; Przepiórkowski et al. 2014a, b, c). After several years of development, Walenty contains 101,500 schemata for 18,250 predicates, which include about 13,000 verbs, 4000 nouns and 1100 adjectives and adverbs. Walenty covers 99.8% of occurrences of verbal forms in the 300 million word balanced sub-corpus of the NKJP. Moreover, Walenty is much richer in linguistic information than the original dictionary of the Świgra parser. Among other features, it describes syntactic control and raising and contains a rich phraseological component.
Walenty consists of two layers. On the syntactic level of Walenty, valency is expressed in terms of syntactic types of phrases (e.g., nominal phrase, verbal phrase) and their grammatical features (e.g., case, aspect). Phrases of specified types fill syntactic positions, which comprise syntactic schemata. The second level describes semantics by coupling syntactic schemata with semantic frames consisting of arguments specified as semantic roles and their selectional preferences (Hajnicz et al. 2016a).
In this paper we are only concerned with the syntactic layer. Thus we will consider a dictionary entry for a predicate to be a set of valency schemata. Each schema is a set of syntactic positions, which can be realised by arguments of specified phrase types.
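The nesting described above (entry, schemata, positions, phrase types) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `Position`, the helper `make_schema` and the phrase-type strings are hypothetical and do not reproduce Walenty's actual storage format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Position:
    """A syntactic position: alternative phrase types plus an optional label."""
    phrase_types: FrozenSet[str]
    label: Optional[str] = None  # e.g. "subj" or "obj"; None if unlabelled


def make_schema(*positions: Position) -> FrozenSet[Position]:
    """A schema is modelled as an unordered set of positions."""
    return frozenset(positions)


# Illustrative positions only; the phrase-type strings are assumptions.
subject = Position(frozenset({"np(str)"}), label="subj")
obj = Position(frozenset({"np(str)"}), label="obj")
prep = Position(frozenset({"prepnp(na,acc)"}))

schema = make_schema(subject, obj, prep)

# A dictionary entry maps a predicate to its set of schemata.
entry = {"jeść": frozenset({schema})}
```

Using frozen sets at every level means that two schemata listing the same positions in a different order compare as equal, which matches the set-based definition given in the text.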
Phrase type specification in Walenty describes the kind of allowed syntactic construction and required grammatical features. The list of phrase types includes nominal phrases (np), adjectival phrases (adjp), prepositional phrases (nominal prepnp and adjectival prepadjp), infinitival phrases (infp), clausal phrases (cp), clausal phrases with a nominal correlate (ncp) and clausal phrases with a prepositional-nominal correlate (prepncp). The phrase types have several attributes specifying their grammatical features. Table 2 presents attributes governed by the predicate for each phrase type and lists possible values of particular attributes. For the complete specification of available phrase types, see Hajnicz et al. (2016b).
Grammatical case is governed by the predicate for nominal, adjectival and prepositional phrases. Similarly, the predicate governs the case of nominal (ncp) and prepositional-nominal (prepncp) correlates. Apart from the six usual case values, some special ones are used in the dictionary. The most important is the so-called structural case, i.e. the case whose morphological realisation depends on the syntactic context. Structural case is used to specify nominal phrases subject to the genitive of negation. In Świgra and Składnica, we denote this structural case with the mnemonic symbol np(accgen), since this type of phrase is realised in the accusative or in the genitive, depending on whether the predicate is negated or not. This can be illustrated with a simple schema for the verb jeść ‘to eat’ (imperfective):
The schema comprises a subject position, a nominal object position in the structural case and a position for a prepnp phrase type containing the preposition na with an accusative complement. This schema can be applied to an affirmative sentence (2) and a negated sentence (3). Observe that the object mięso ‘meat’ in (2) is in the accusative, whereas owoców ‘fruit’ in (3) is in the genitive case.
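The dependence of the object's surface case on negation can be written as a one-line rule. The sketch below is a deliberate simplification: the function names are hypothetical, and it covers only the accusative/genitive alternation described above, not other contexts in which structural case may be resolved.

```python
def structural_object_case(negated: bool) -> str:
    """Surface case of an object in the structural case (simplified rule)."""
    return "gen" if negated else "acc"


def object_case_matches(observed_case: str, negated: bool) -> bool:
    """Check an observed object case against the simplified rule."""
    return observed_case == structural_object_case(negated)
```

With this rule, an accusative object is accepted in an affirmative clause and a genitive object in a negated one, as in the two sentences for jeść.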
The other non-standard values for case are used when the grammatical case depends on the predicate in a more complex way. The symbol part represents the so-called partitive case, agr denotes agreement of a dependant with the head in phraseological schemata, and the so-called predicative case pred is used for adjectives in the predicative position (Przepiórkowski et al. 2014a).
The kind of clausal phrase cp is typically determined by specifying the complementizer introducing the clause, e.g., jeśli ‘if’, kiedy ‘when’, że ‘that’ or żeby ‘in order to’. Two types are not introduced by a complementizer. These phrases are bare clauses of a specific type: relative clauses cp(rel), which must contain a relative pronoun in the initial constituent, and interrogative clauses cp(int), whose initial constituent has to be interrogative.
A simple schema (4) for the verb podejrzewać ‘suspect’ containing an interrogative clausal phrase is exemplified by sentence (5) with a subordinate interrogative clause kim był denat ‘who the deceased was’ introduced by the interrogative pronoun kto ‘who’.
Clausal phrases can appear with a nominal (ncp) or a prepositional-nominal (prepncp) correlate. The correlate is a form of the pronoun to ‘this’ in a governed case, optionally appearing after a preposition. These constitute separate phrase types, since they are not (always) interchangeable with cp. Sentence (6) contains a clause tym, że pokazuje w każdej piosence nieco inną siebie ‘by showing herself from a slightly different side’ of type ncp(inst,że), which is introduced by the nominal correlate in the instrumental followed by the complementizer że ‘that’. Sentence (7) contains a clause o tym, by jeździć bezpiecznie ‘to drive safely’ of type prepncp(o, loc, żeby), composed of the preposition o ‘about’ governing the correlate tym ‘this’ in the locative, followed by the complementizer żeby ‘in order to’. The schemata used in these examples will be discussed below, as they use coordination, cf. (10) for the verb urzekać ‘to charm’ and (13) for pamiętać ‘to remember’.
Semantically motivated phrases
Walenty provides semantic classification of some adverbial-like arguments (e.g., ablative and adlative), denoted as xp(...). Such valency positions can be filled mainly with adverbs and prepositional phrases. The attribute of xp specifies a semantically motivated set of allowed realisations. For example, xp(abl), an ablative phrase marking the departure point of a motion, can be realised (among others) by the adverbs stąd ‘from here’ and znikąd ‘out of nowhere’, or by prepnp(z,gen), i.e. phrases with the preposition z ‘from’. Adlative phrases xp(adl) denote the point of arrival; their realisations include tutaj ‘here’, naprzód ‘forward’, prepnp(do,gen) with do ‘towards’, the complex preposition comprepnp(w kierunku) ‘in the direction of’, or even clauses, e.g. cp(rel[dokąd;gdzie]), a relative clause limited to the two relative pronouns dokąd ‘where to’ and gdzie ‘where’. The lists of allowed xp realisations are stored separately; their identifiers are used in schemata. In total, there are 10 specific subtypes of xp, expressing time, duration, place, starting or ending point, path, tool, manner, cause, or aim, cf. Table 2.
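The idea of keeping xp realisations in a separate list, referenced from schemata by identifier, can be sketched as a lookup table. The table below holds only the realisations named in the text and is far from complete; the split into "adverbs" and "phrases" and the function `can_fill_xp` are assumptions made for this illustration.

```python
from typing import Optional

# Partial, illustrative realisation lists keyed by xp subtype identifier.
XP_REALISATIONS = {
    "abl": {
        "adverbs": {"stąd", "znikąd"},
        "phrases": {"prepnp(z,gen)"},
    },
    "adl": {
        "adverbs": {"tutaj", "naprzód"},
        "phrases": {
            "prepnp(do,gen)",
            "comprepnp(w kierunku)",
            "cp(rel[dokąd;gdzie])",
        },
    },
}


def can_fill_xp(subtype: str, *, adverb: Optional[str] = None,
                phrase_type: Optional[str] = None) -> bool:
    """Check whether an adverb or a phrase type is a listed realisation."""
    allowed = XP_REALISATIONS[subtype]  # KeyError for an unknown subtype
    if adverb is not None and adverb in allowed["adverbs"]:
        return True
    return phrase_type is not None and phrase_type in allowed["phrases"]
```

A schema then only needs to mention xp(abl) or xp(adl); extending the set of realisations means editing the table, not every schema that uses the subtype.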
Ablative, adlative and perlative phrases are typical for verbs of movement. Below we present a schema (8) of the verb maszerować ‘to march’, illustrated by sentence (9), where the ablative phrase is realised by a prepositional phrase z domu ‘from home’, the adlative phrase—by a prepositional phrase do szkoły ‘to school’, and the perlative phrase—by a nominal phrase in the instrumental niebezpieczn
As can be seen in the previous examples, two positions are labelled in syntactic schemata: the subject subj (the nominal argument in this position influences morphological features of the finite verb) and the passivable object obj (the argument in this position turns into a subject in the passive voice; the presence of this position signals that passive voice is possible).
Walenty is explicit about what counts as a single syntactic position, and it employs the coordination test to resolve doubts in this respect: if two phrases can be coordinated in the same sentence, then they are different realisations of the same position and they are listed in the same schema as alternative realisations for the given position. For instance, sentence (11) coordinates two phrases of type np(inst), solidność ‘solidity’ and pracowitość ‘diligence’, with a clause with the nominal correlate, ncp(inst,że): tym, że na wszystko miała sposób ‘that she had a solution for everything’. The schema used to parse this sentence is (10).
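Read from the parser's side, the coordination test implies that all conjuncts of a coordinated argument must be alternatives within one position. A minimal sketch of that check follows; the function name is hypothetical, and the schema is only loosely modelled on the description of the position shared by np(inst) and ncp(inst,że), with the other position invented for illustration.

```python
from typing import FrozenSet, Iterable, Optional, Sequence


def position_for_coordination(
    schema: Sequence[FrozenSet[str]],
    conjunct_types: Iterable[str],
) -> Optional[FrozenSet[str]]:
    """Return the first position whose alternatives cover every conjunct."""
    wanted = set(conjunct_types)
    for position in schema:
        if wanted <= position:
            return position
    return None


illustrative_schema = [
    frozenset({"np(str)"}),                   # hypothetical further position
    frozenset({"np(inst)", "ncp(inst,że)"}),  # alternatives named in the text
]
```

Coordinating two np(inst) phrases with an ncp(inst,że) clause is licensed by the second position, while a mix of np(inst) and np(str) finds no single position and is rejected.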
Sentence (12) is an example of two coordinated clauses with a prepositional correlate with the preposition o: prepncp(o, loc, że) (o tym, że się ciężko pracuje ‘that sb works hard’) and prepncp(o, loc, int) (o tym, jaka jest sytuacja innych ludzi ‘about what the situation of other people is’).
The following schema can be used to analyse this sentence:
Coordination is the main reason to allow clausal phrases cp in the subject position. Let us look at sentence (14) with a clausal argument że zapomniałam, jak wyglądasz ‘that [I] forgot what [you] looked like’. The sentence could be modified and extended into (15), in which the clause is coordinated with a nominal subject: Piotr i że zapomniałam, jak wygląda ‘Peter and that [I] forgot what [he] looked like’. This is an argument to assume that the cp(że) clause is a subject in (14). The respective schema for the verb śnić ‘to dream’ is shown as (16).
Other cases of the so-called unlike coordination are discussed in Sect. 7.
In Walenty, due to the free word order of Polish, neither the order of positions within a schema nor the order of argument types within a position is significant.
Valency schemata given by Walenty are maximal—the dictionary does not list possible sub-schemata of a given schema. In Polish, most arguments are optional. In particular, subjects are often omitted. It is also possible to omit a direct object. A sentence with a missing direct object remains grammatical, but is usually semantically incomplete. Thus, transitivity is a much less sharp classification of Polish verbs than it is for English.
Only phraseological elements are strictly obligatory in Walenty. A schema with such elements cannot be applied if the phraseological arguments are missing in the sentence.
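Because schemata are maximal and most positions are optional, a schema licenses any subset of its positions as long as every phraseological position is filled. The sketch below checks this by backtracking over assignments of observed arguments to distinct positions. All names are hypothetical, the label `lex(na siłach)` is a simplified stand-in for Walenty's full lex notation, and the subject position in the idiomatic schema is an assumption.

```python
from typing import FrozenSet, Sequence, Tuple

# A position here is (alternative phrase types, obligatory?).
Position = Tuple[FrozenSet[str], bool]


def schema_applies(schema: Sequence[Position], observed: Sequence[str]) -> bool:
    """True if each observed argument fills a distinct position and
    every obligatory (phraseological) position is filled."""

    def assign(i: int, used: FrozenSet[int]) -> bool:
        if i == len(observed):
            # All arguments placed: check the obligatory positions.
            return all(
                idx in used
                for idx, (_, obligatory) in enumerate(schema)
                if obligatory
            )
        for idx, (alternatives, _) in enumerate(schema):
            if idx not in used and observed[i] in alternatives:
                if assign(i + 1, used | {idx}):
                    return True
        return False

    return assign(0, frozenset())


# Loosely modelled on the idiom czuć się na siłach; details are assumptions.
idiom_schema = [
    (frozenset({"np(str)"}), False),        # subject, optional
    (frozenset({"lex(na siłach)"}), True),  # phraseological, obligatory
    (frozenset({"infp(_)"}), False),        # infinitival phrase, optional
]
plain_schema = [
    (frozenset({"np(str)"}), False),
    (frozenset({"prepnp(na,acc)"}), False),
]
```

An empty argument list satisfies `plain_schema`, since all of its positions are optional, but not `idiom_schema`, which requires the lexicalised phrase to be present.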
Walenty includes a rich phraseology component, implementing a detailed notation for various types of idiomatic arguments, from completely fixed (given as a string) to almost freely modifiable, described in a recursive way (Przepiórkowski et al. 2014a, 2017; Hajnicz et al. 2016b). The dictionary aims at a precise representation of the structure of lexicalised arguments. For instance, schema (17) represents a phraseological construction czuć się na siłach ‘to feel fit to do sth’. The idiomatic expression as a whole opens a position for an infinitival phrase infp(_), whereas the verb czuć się ‘to feel’ itself does not. The type of a lexicalised phrase is denoted as lex, with the first attribute specifying the syntactic type of the phrase (here prepnp(na,loc)). The type determines the other attributes. In this example, the phrase is required to contain a nominal phrase with the lexical head siła ‘strength’ in the plural (pl), and no modifiers are allowed in this phrase (natr). The construction is illustrated by sentence (18), where the infinitival phrase is składać zeznania ‘to give testimony’. Note that this is an idiomatic expression as well, meaning ‘testify’ (in Polish: zeznawać).
This notation is also used to define so-called compound prepositions comprepnp. These are typically prepositional-nominal phrases which, from the valency point of view, act as simple prepositions: they take an argument, typically a nominal phrase in the genitive np(gen) (but a clause with a nominal correlate ncp(...,gen) is also frequent). For example, comprepnp(w kierunku) ‘in [the] direction of’ occurs directly in some schemata and is a possible realisation of xp(adl) and xp(dest). For details of the internal structure notation, see Hajnicz et al. (2016b) and Przepiórkowski et al. (2017).