The semantically annotated corpus of Polish quantificational expressions

The paper presents a manually annotated corpus of Polish quantificational expressions. The quantifier annotation was conducted on top of existing gold-standard data for Polish as its separate layer. This paper releases the data and gives an overview of the corpus and related tools. As far as we know, this is the first large-scale annotation of generalized quantifiers together with their crucial semantic properties, including monotonicity profile. We also discuss the potential further use of the corpus in linguistics and cognitive science.


Overview
The paper presents a manually annotated corpus of quantificational expressions of Polish. The corpus is a new, separate layer of annotation in the gold-standard 1.2-million-token subcorpus of the National Corpus of Polish (NKJP1M, Przepiórkowski et al., 2012). NKJP1M is a balanced set of short samples (approx. 40-60 words long) representing different text genres and available under the GNU GPL. It is the most widely used resource for Polish in standard NLP and machine learning tasks covering automatic annotation at various levels. Its annotation features adjudicated sentence- and word-level segmentation, morphosyntactic description, shallow parsing (syntactic words and groups), named entity description, and limited word sense disambiguation. Thus the quantificational layer contributes to the semantic level of the gold-standard annotation of the dataset and, at the same time, may benefit from the existing morphological and syntactic layers.
The paper describes the process of manually annotating the corpus for quantificational expressions and their features, gives an overview of the corpus, and points out its relevance for linguistics and cognitive science. The corpus will also serve as a reference data set and as training data for machine learning classifiers.

Related work
To our knowledge, there have been no previous large-scale attempts at the manual annotation of generalized quantifiers in any language, so the presented corpus is a pioneering work in the field. However, we are aware of two small-scale corpus studies of quantifiers. Higgins and Sadock (2003) used the Penn Treebank annotation of quantifier phrases to propose a machine learning approach to modeling quantifier scope preferences. Their research resulted in a dataset of 893 double-quantified sentences, annotated with Penn Treebank II parse trees and hand-tagged for the primary scope reading. Reguera and Stender (2013) conducted a contrastive study of quantifier use in 60 Spanish and German economic texts in online media. As we will see, the corpus released with this paper not only has a much more comprehensive coverage but is also tagged with very general semantic features of quantifiers.
Two works had a substantial impact on the presented project. The first is an extensive two-volume survey of quantificational expressions from a cross-linguistic perspective. Its introductory chapter, Quantifier Questionnaire, specifies many useful distinctions and guidelines for recognizing and describing quantifiers. Polish was not included among the 34 languages presented in the survey. The genetically and typologically closest language considered in both volumes was Russian, the only Slavonic language covered. The other significant source of motivation was a study by Szymanik and Thorne (2017), in which the authors investigate the frequencies of the 36 most common English quantifiers in the WaCky corpus (Baroni et al., 2009). The authors have shown that semantic complexity (Szymanik, 2016) contributes to explaining the differences in frequency distributions. The major limitation of this study is its restriction to a small group of English quantifiers. The corpus described in the current paper will make it possible, for instance, to refine the results of Szymanik and Thorne (2017) by counting all the quantifiers that occur in a corpus together with their semantic features, including semantic complexity, and by identifying more robust statistical patterns.

Quantifiers
Quantifiers are semantic objects. Intuitively, by quantifier or quantificational expression, we understand a natural language expression indicating quantities that is topic neutral, i.e., the truth of the quantifier statement does not depend on the particular individuals under consideration. Typical examples in English are all, not quite all, nearly all, an awful lot, a lot, a comfortable majority, most, many, more than n, less than n, quite a few, quite a lot, several, not a lot, not many, only a few, few, a few, hardly any, one, two, three. Extensionally, a quantifier can be represented as a relation between two sets (properties), Q(A, B). For instance, the quantifier in "Some As are B" can be thought of as a relation between the predicates A and B: to make the sentence true, the extensions of the two predicates need to overlap. Analogously, the meaning of the quantifier "all" can be given in terms of the inclusion of the denotation of the restrictor (first argument) in the denotation of the scope (second argument). Mathematically speaking, there are other possible types of generalized quantifiers; however, quantifiers taking two properties as their arguments (as defined above) are the most common across natural languages (Peters & Westerståhl, 2006). They are also the most intensively discussed in the semantic literature. There is no agreement among linguists whether, and if so, which of the more complex quantifiers are even expressed in any natural language (Keenan, 1992; Beck, 2000). Last but not least, by restricting attention to those types of quantifiers, we make the annotators' task practically feasible. We eliminate some commonly occurring expressions that are not prototypical examples of quantifiers but could be interpreted in the framework of generalized quantifier theory. For instance, proper names, like John, are often interpreted as quantifiers of type ⟨1⟩, and possessives are quantifiers not satisfying the topic neutrality condition, i.e., isomorphism (see, e.g., Peters & Westerståhl, 2006 for examples and more extensive discussion).
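The relational view described above can be illustrated with a minimal sketch, assuming finite sets and Python's built-in set operations. The function names and the toy sets are hypothetical and are not part of the corpus or its tools; the definition of "most" as "more As are B than are not B" is one standard reading, assumed here for illustration.

```python
# Quantifiers modeled as relations between two finite sets:
# A is the restrictor (first argument), B is the scope (second argument).
# All names and data below are illustrative assumptions.

def some(A: set, B: set) -> bool:
    """'Some As are B': the two extensions overlap."""
    return len(A & B) > 0

def all_(A: set, B: set) -> bool:
    """'All As are B': the restrictor is included in the scope."""
    return A <= B

def most(A: set, B: set) -> bool:
    """'Most As are B' (assumed reading): more As are B than are not B."""
    return len(A & B) > len(A - B)

students = {"ann", "bob", "cy"}
candy_lovers = {"ann", "bob", "dee"}

print(some(students, candy_lovers))  # the sets overlap
print(all_(students, candy_lovers))  # "cy" is a student outside the scope
print(most(students, candy_lovers))  # two of the three students are in the scope
```

Each function only inspects the two sets and never the identity of particular individuals, which is the sense in which such quantifiers are topic neutral.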
The annotators' task was to identify a quantifier and describe its three features, which we define in the following sections. The chosen features were: grammatical structure (D vs. A quantifiers; see Sect. 3.1), quantificational force (universal, existential, or proportional; see Sect. 3.2), and positivity/negativity (monotonicity profile; see Sect. 3.3). The most important part of the annotation specified each quantifier's features in terms of categories described in the annotators' manual and presented briefly in the following subsections. In selecting the features, we follow a comprehensive overview of quantifier properties from a cross-linguistic perspective. We have selected those features as they are among the most critical linguistic and logical characteristics of quantifiers. The first feature informs us about the basic syntactic and predicate properties of the quantifier. The second one roughly characterizes its meaning and complexity. The third one gives information about the inferential and grammatical properties of the quantifier (e.g., downward monotone quantifiers are known to trigger negative polarity items; Ladusaw, 1979). These features also play an essential role in the linguistic debate about the characterization of natural language quantifiers and their universal properties (see Sect. 8 of the paper for some discussion). Furthermore, the annotation should be extended in the future with other properties of quantifiers. Describing the three selected key dimensions of quantifier meaning will help with such further work.

D- and A-quantifiers
The first category distinguishes D-quantifiers from A-quantifiers (Bach et al., 1995; Partee, 1995). This feature of quantifiers refers to syntactic and predicate structures in which the quantifier occurs. In the predicate-argument structure of an utterance, D-quantifiers form expressions that are predicates (nominal phrases), but A-quantifiers directly build or modify predicates. This semantic distinction is also reflected in purely syntactic functions of the expressions: D-quantifiers are usually nouns, adjectives, or numerals (ex. 1), whereas A-quantifiers are verb modifiers: verbal affixes, auxiliary verbs, or adverbs (ex. 2). In the context of our corpus, A-quantifiers are almost exclusively adverbial phrases or functionally adverbial idiomatic expressions. Among the most frequent ones are mostly temporal adverbs such as (nie) zawsze '(not) always', nigdy 'never', często 'often', czasem, czasami 'sometimes', rzadko 'rarely'. Also, adverbial phrases indicating repeatability of events are common: raz 'once', dwa razy or dwukrotnie 'twice', wiele razy or wielokrotnie 'many times'.
There exist interpretations of the verbal prefixes na- and po- as A-quantifiers in Slavonic languages (e.g., Russian; Paperno, 2012), which can also be applied to Polish. In practice, however, they appeared only three times in our corpus, even though specific examples of such prefixal quantifiers, such as the verbal prefix na- with its cumulative meaning, were explicitly given in the annotation manual. The reason for the rarity of such constructions in our corpus is that they are rather colloquial, and only a small fraction of our corpus consists of spoken data. In fact, two of the three examples of such prefixal quantifiers appeared in the spoken subcorpus.

Universal, existential, and proportional
The second category distinguishes between existential (intersective) quantifiers, e.g., some, none (see ex. 5 and 6), universal (co-intersective) quantifiers, e.g., all (ex. 7), and proportional quantifiers, e.g., many, every third (ex. 8 and 9). The criteria for distinguishing the three are extensional and adopted after Keenan and Paperno (2017). For a quantifier $Q$ and sets $A$, $B$: if $Q(A, B)$ is determined by $A \cap B$, that is the set of As that are Bs, then $Q$ is existential (intersective). If $Q(A, B)$ depends on $A - B$, that is the set of As that are not Bs, then $Q$ is universal (co-intersective). If $Q(A, B)$ depends on the proportion of As that are Bs, that is $|A \cap B| / |A|$, then $Q$ is proportional.
(5) Niektóre wartości uległy w ostatnich latach dewaluacji
    Some values succumbed in recent years devaluation
    'Some values have devalued in recent years.'
(6) W momencie wybuchu pożaru w budynku nie było nikogo z domowników
    At moment outbreak fire in building not was none of members of the house
(7) Wszyscy uczniowie w zasadzie są przeciwko stosowaniu przemocy
    All students generally are against use violence
    'All students are generally against the use of violence.'
(8) Wielu radnych pobiera wysokie diety
    Many councillors take high diets
    'Many councillors are on high diets.'
We also distinguished a class of numeral quantifiers (unmodified numerals), e.g., 5, which is restricted to quantifiers expressed by a number. The motivation for this additional value of the category is purely practical and technical: numeral quantifiers are among the most frequent in texts and relatively less interesting, so marking them with a separate label provides an easy way to filter them out. So far, we have not distinguished a separate class of modified numerals (e.g., more than 5) among existential quantifiers, which would be a possible future extension. In line with Szymanik and Thorne's (2017) complexity analysis, we expect that existential and universal quantifiers will be the most frequent, followed by proportional quantifiers.
The paradigmatic example of a proportional quantifier is większość 'most'. The class also contains quantifiers like wiele 'many' or mało 'few'. We are aware that these quantifiers may sometimes, depending on the context, also be interpreted as existential constructions (Partee, 1989). However, since the proportional interpretation seems to be available in the majority of cases, we decided to treat those expressions uniformly as proportional and leave the empirical research into other possible meanings for the future. Furthermore, proportional quantifiers often consist of more than one token. A significant number of them express a percentage of a whole population, such as:
(9) Co trzeci użytkownik mieszka w wielkim mieście, a tylko 9 proc. to mieszkańcy wsi
    Every third user lives in large city, and only 9 percent are inhabitants countryside
    'Every third user lives in a large city, and only 9 percent are inhabitants of the countryside.'
We also treated the synonymous quantifiers oba and obydwa 'both' as a special case of proportional quantifiers expressing the meaning 'two out of two'. They constitute about 10% of all proportional quantifiers in the corpus.
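The extensional criteria of this section can be made concrete with a short sketch. It assumes finite sets, and the three example quantifiers ("at least one" for some, an empty difference for all, "more than half" for most) are common textbook readings chosen for illustration; none of the names below come from the corpus tools.

```python
from fractions import Fraction

# The three quantities that the criteria refer to (illustrative helper names).
def intersection_size(A: set, B: set) -> int:
    return len(A & B)                      # |A ∩ B|: As that are Bs

def difference_size(A: set, B: set) -> int:
    return len(A - B)                      # |A − B|: As that are not Bs

def proportion(A: set, B: set) -> Fraction:
    return Fraction(len(A & B), len(A))    # |A ∩ B| / |A|, A must be non-empty

# Existential (intersective): truth is determined by A ∩ B alone.
def some(A: set, B: set) -> bool:
    return intersection_size(A, B) >= 1

# Universal (co-intersective): truth is determined by A − B alone.
def all_(A: set, B: set) -> bool:
    return difference_size(A, B) == 0

# Proportional: truth is determined by the proportion of As that are Bs.
def most(A: set, B: set) -> bool:
    return len(A) > 0 and proportion(A, B) > Fraction(1, 2)

# Two situations with the same intersection but a different difference.
A1, B1 = {1, 2, 3}, {1}
A2, B2 = {1}, {1}

print(some(A1, B1), some(A2, B2))   # same intersection, same verdict
print(all_(A1, B1), all_(A2, B2))   # the difference changes, so does the verdict
print(most(A1, B1), most(A2, B2))   # the proportion changes from 1/3 to 1
```

The pair of situations is chosen so that only the existential quantifier gives the same verdict in both, since it is the only one that looks at the intersection alone.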

Monotonicity
The third category described for each quantifier is its left and right monotonicity, annotated as two separate features with the same range of values. Both are tested independently for each quantifier, and each can take one of three values: increasing, decreasing, and non-monotone. A quantifier Q is upward monotone (increasing) in its left (respectively, right) argument if and only if, for any sets A, B, C, and D, if A is a subset of C and B is a subset of D, then Q(A, B) entails Q(C, B) (respectively, Q(A, B) entails Q(A, D)). Downward monotonicity (decreasing) is defined analogously, with the entailment running from a set to its subsets. As the property's value might not be determined directly in the context of a corpus utterance, the annotators were encouraged to use diagnostic sentence schemes for testing the monotonicity of the quantifiers. For example, the quantifier some may be put in the following context: (10) Some students like candy.
From the fact that sentence (10) logically entails sentence (11), it can be seen that the quantifier some is upward monotone in its left argument. As sentence (10) implies sentence (12), some is also upward monotone in its right argument. Polish quantifier niektóre as in sentence (5)
If a quantifier is neither upward nor downward monotone in its left or right argument, e.g., exactly 5, translated to Polish as dokładnie 5, then we say that the quantifier is non-monotone in this argument. Right monotonicity is crucial for semantic and psycholinguistic research. Barwise and Cooper (1981) even proposed it as one of the semantic universals, that is, a property that every language of the world satisfies. The proposed generalization can be formulated as follows: all simple D-quantifiers are right monotone or are conjunctions of right monotone quantifiers. The conjunctions of monotone quantifiers are sometimes also called connected quantifiers. Therefore, we expect that all (or almost all) monomorphemic D-quantifiers in our corpus should be right monotone or connected. Furthermore, there is ample psycholinguistic evidence that right downward monotone quantifiers are harder for humans to process (in reasoning, comprehension, verification, and acquisition); see, e.g., Szymanik (2016) for an overview or Deschamps et al. (2015) for recent experimental evidence. One possible explanation associates this extra complexity with a lower overall frequency of right downward monotone quantifiers. Our corpus makes it possible to compare the frequencies of downward and upward monotone quantifiers directly.
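The monotonicity definitions above can also be checked mechanically on a small finite universe. The following is a minimal brute-force sketch, not the procedure used by the annotators (who relied on diagnostic sentences). The labels `inc` and `nmon` mirror the tag values that appear in the query examples of this paper; `dec` is an assumed label for the decreasing value, and all function names are hypothetical.

```python
from itertools import combinations

def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def monotonicity(Q, universe, argument):
    """Classify Q as 'inc', 'dec' or 'nmon' in its 'left' or 'right' argument.

    Exhaustively checks every pair (A, B) that satisfies Q and every
    replacement X of the tested argument, so it is only feasible for tiny
    universes and only shows behavior on that universe.
    """
    sets = subsets(universe)
    upward = downward = True
    for A in sets:
        for B in sets:
            if not Q(A, B):
                continue
            old = A if argument == "left" else B
            for X in sets:
                still_true = Q(X, B) if argument == "left" else Q(A, X)
                if old <= X and not still_true:
                    upward = False      # enlarging the argument broke Q
                if X <= old and not still_true:
                    downward = False    # shrinking the argument broke Q
    if upward:
        return "inc"
    if downward:
        return "dec"
    return "nmon"

U = {1, 2, 3}
some = lambda A, B: len(A & B) > 0
all_ = lambda A, B: A <= B
most = lambda A, B: len(A & B) > len(A - B)
exactly_two = lambda A, B: len(A & B) == 2

for name, Q in [("some", some), ("all", all_), ("most", most),
                ("exactly two", exactly_two)]:
    print(name, monotonicity(Q, U, "left"), monotonicity(Q, U, "right"))
```

On this toy universe, "most" comes out as non-monotone on the left and increasing on the right, which matches the profile searched for with the `D:prop:nmon:inc` query discussed later in the paper.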

Annotation and tools
Since there had been no large-scale attempts at the semantic annotation of quantifiers so far and no specific guidelines were established, we decided to follow the general best practices in manual corpus annotation. Each sample in the corpus was annotated by two independent annotators. An additional adjudicator resolved conflicts between the two. Since quantifier theory involves interdisciplinary research originating in logic and linguistics, we decided to recruit annotators with different backgrounds and divide them into two teams. The first team consisted of cognitive science undergraduate students. Most of them had no previous experience with linguistic annotation of any kind but had a more substantial logic background. As a recruitment requirement, they had to have completed at least four semesters of formal logic courses. The second team consisted of four qualified linguists (graduates in Polish philology), who were experienced in various kinds of linguistic annotation (morphological, syntactic, and semantic) but had no background in logic. Each sample was annotated by one annotator from each team to diversify insights and reduce oversights in the corpus material. One of the authors of this paper (with a background in both linguistics and logic) served as an adjudicator for the whole project, occasionally consulting the other author. The adjudicator resolved many conflicts between the annotators; see the next section for details on the inter-annotator agreement. We believe that by recruiting two teams of annotators with different educational backgrounds, we could better identify all the possible quantificational expressions. For that reason, any future extension of the annotation will be much faster and easier. The annotators also had access to a dedicated mailing list, where they could ask questions and discuss problems concerning their work. Figure 1 presents an example of a collision between two annotators and the adjudicator's choice as seen in the WebAnno application.
Fig. 1 An example of adjudication of a collision between two annotators as seen in the WebAnno application. The screen shows the same sample consisting of four sentences: the top part is the adjudicator's version and the remaining two are the versions of the annotators. As can be seen, the annotators agree on everything but one feature of the quantifier nikogo z 'none of' in the second sentence (see example 6): the second annotator decided that it is universal rather than existential. The second annotator is wrong, and the adjudicator chose the version of the first annotator.
The annotation was conducted in the web-based application WebAnno (Eckart de Castilho et al., 2016), designed for different types of linguistic annotation. WebAnno is based on Java and an SQL database, so it has quite standard requirements, making it relatively easy to run and operate. The application allows several projects to share one installation.
During the annotation process, the annotators had access to some information from other layers existing in the gold-standard subcorpus of the National Corpus of Polish (NKJP1M), namely morphosyntactic tags and some selected surface syntactic groups that could be indicators of quantificational usage of an expression. The syntactic groups are limited to adverbial groups (which could be A-quantifiers) and numeral groups (most likely an existential numeral quantifier). However, as we treated quantifiers primarily as semantic units, annotators were not bound to those distinctions from other non-semantic layers. They were even free to switch off that information from their view if they did not consider it useful.
One may observe that the three quantifier features (grammatical structure, quantificational force, and monotonicity) are, at least to a significant extent, lexico-morphosyntactic properties. Hence, one may wonder whether we could have first created an exhaustive dictionary of quantifiers and only later assigned the properties to the identified lexical quantifiers. We decided to create a quantifier dictionary and annotate the features in parallel because we did not want to assume that no two homonymous quantifying expressions can have different feature values depending on the context. Also, identifying quantifiers based only on dictionary entries is questionable, as some words may serve both as quantifiers and non-quantifiers depending on context. Consider the two examples below: in (16), większość means 'majority' as in parliamentary majority and is not a quantifier. In (17), however, większość 'most' is one of the usual proportional quantifiers:

Inter-annotator Agreement
As mentioned above, generalized quantifier expressions may be sparse textwise. For that reason, our approach focused primarily on identifying all text units that could be interpreted as quantifiers and on correcting or removing the redundant ones during adjudication. Selecting two groups of annotators based on their background and experience was motivated by this goal, even though it resulted in a significant decrease in the inter-annotator agreement (IAA) rate.
As is always the case in manual annotation, a significant part of the inconsistencies between annotators were simple mistakes, oversights, and misclicks. Among the more interesting examples of inconsistencies were quantifiers such as żaden 'none', nikt 'nobody', nic 'nothing', nigdy 'never', nigdzie 'nowhere', which were quite often marked by at least some annotators as universal rather than existential. From a cognitive perspective, this confusion seems quite natural: it is counterintuitive that expressions semantically informing of the non-existence of entities of some sort are actually existential. The mistake occurred even though such examples were given in the annotation manual. Of course, we are not drawing any conclusions about the cognitive aspects of such quantifiers; however, we consider this specific example interesting.
Another typical example of systematic mistakes in the annotations concerns quantifiers such as większość 'most' and similar, which are monotonic in one argument and non-monotonic in the other. Some annotators intuitively and unconsciously assume that a quantifier needs to be either monotonic or non-monotonic in both arguments.
To calculate IAA, we used Cohen's Kappa as a standard and the most widely used measure. However, there are at least two problems to keep in mind. Firstly, quantifiers may be either single- or multi-token expressions, so the annotator needs to identify the boundaries of such expressions; two annotators may generally agree on the given quantifier and its features but disagree over one token belonging (or not) to the expression. Secondly, quantifiers are sparse, which means that the tokens that are not annotated vastly outnumber those that need to be annotated, which artificially increases the score. Thus, for the purpose of evaluation, we calculated Cohen's Kappa at the token level and only for those tokens which were annotated as belonging to quantificational expressions.
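The token-level computation described above can be sketched as follows. This is a plain-Python illustration of the standard Cohen's Kappa formula, not the script used in the project; the toy label sequences are invented, `univ` and the outside label `O` are assumed tag names, and only `exst` and `prop` appear in the paper's own query examples.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long sequences of token labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and aligned")
    n = len(labels_a)
    # Observed agreement: share of tokens with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label]
                   for label in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)

# Invented subtype labels for six tokens that both annotators marked
# as belonging to a quantificational expression.
annotator_1 = ["exst", "exst", "univ", "prop", "exst", "univ"]
annotator_2 = ["exst", "univ", "univ", "prop", "exst", "exst"]
restricted = cohen_kappa(annotator_1, annotator_2)

# The same data padded with 94 tokens that neither annotator marked ("O").
padded = cohen_kappa(annotator_1 + ["O"] * 94, annotator_2 + ["O"] * 94)

print(round(restricted, 3), round(padded, 3))
```

Padding the sequences with unannotated tokens raises the score considerably even though the disagreements are unchanged, which illustrates why the evaluation is restricted to tokens annotated as quantificational.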
In the process of annotation, 36,887 tokens were identified as belonging to such expressions by at least one annotator and only 23,284 (63.1%) by both annotators. This strongly impairs the IAA results and suggests that the task of identifying quantifiers was relatively difficult. As stated above, low IAA for quantifier identification was expected and to some extent influenced by the decision to recruit annotators with diverse backgrounds and experience. However, if only tokens annotated by both annotators are taken into account, it is additionally possible to tell which features are harder to specify once a quantifier is identified. Table 1 presents Cohen's Kappa for the four features described in Sect. 3. As can be seen, the IAA score is relatively high for quantifier type (D vs. A), significantly lower for subtype (universal vs. existential vs. proportional), and almost identically low for both left and right monotonicity. The ranking is again expected, since the decision between the A and D types is binary and to some extent can be inferred from surface syntactic features, so the task is easier than deciding on the other features. On the other hand, monotonicity itself is a complicated notion, and diagnosing it is even more difficult outside of the context of an artificially prepared textbook example.
The IAA results are rather low compared to the results of annotation for well-established NLP tasks such as named entities. However, our approach was experimental, and in the process of annotation, we intentionally favored recall over precision. Thus a large amount of work was also done in the process of adjudication.

Querying the corpus on the web
The corpus is available on the web both as a separate layer of annotation together with the whole NKJP1M indexed in the corpus search engine and as an XML source tarball for processing in other projects, see: http://kwantyfikatory.nlp.ipipan.waw.pl/
Table note: Type refers to the distinction between A- and D-quantifiers. Subtype refers to the distinction between universal, existential, and proportional quantifiers. We separately looked into the monotonicity in the left (restriction) and right (scope) arguments.
The corpus is indexed using MTAS (Brouwer et al., 2017), a multi-tier annotation search engine that allows for indexing multiple layers of annotation. The quantifier layer, as well as all previously existing annotation layers, is accessible from the Corpus Query Language, which enables searching for alignments between grammatical and quantificational layers. The scripts and data reported in the paper can also be found there and will be made available upon the publication of the paper. The query language extends the one used for the annotation layers previously existing in the corpus, consisting of morphosyntax, named entities, syntactic groups, and some limited word sense disambiguation layers. The quantifier layer consists of single- and multiword elements together with their features encoded in the tagset. Quantifier units may be queried with a <q/> element, optionally enriched by feature values combined in a positional tag. For example, the query <q="D:prop:nmon:inc" /> will search for all D-type proportional quantifiers which are left non-monotone and right upward monotone. The majority of hits in this case will be instances of quantifiers such as many, both, most, or similar. Regular expressions are accepted in the query as well, allowing for queries with some features unspecified (e.g., <q=".*:exst:.*" /> for all existential quantifiers).
It is also possible to query different layers of annotation simultaneously, which allows for restricting the results to quantifiers containing specific words (e.g., <q /> containing [base="żaden"] for quantifiers containing the word żaden 'no'), containing a specific part of speech (e.g., <q /> containing [pos="conj"] for quantifiers containing conjunctions), or occurring in specific phrases (e.g., <q /> fullyalignedwith <g="NumG" /> for quantifiers that are numeral groups on the syntactic level). Basically, MTAS allows for constructing queries compliant with the CQL standard.
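For users who generate many such queries, the positional tag can be assembled programmatically. The sketch below is a hypothetical helper, not part of the corpus tools or of MTAS: it only builds query strings in the shape shown in the examples above, assuming the field order type, subtype, left monotonicity, right monotonicity that the `D:prop:nmon:inc` example suggests. It does not validate tag values and does not contact the search engine.

```python
WILDCARD = ".*"  # regular expression matching any value of a field

def quantifier_query(qtype=WILDCARD, subtype=WILDCARD,
                     left=WILDCARD, right=WILDCARD):
    """Build a <q .../> query string from the four positional tag fields.

    Unspecified fields are left as regular-expression wildcards.
    The field order is an assumption inferred from the paper's example.
    """
    tag = ":".join([qtype, subtype, left, right])
    return f'<q="{tag}" />'

def containing(query, token_constraint):
    """Restrict a query to units containing a token that matches a constraint."""
    return f"{query} containing {token_constraint}"

# D-type proportional quantifiers, left non-monotone, right increasing.
print(quantifier_query("D", "prop", "nmon", "inc"))

# All existential quantifiers, other features unspecified.
print(quantifier_query(subtype="exst"))

# Any quantifier containing a form of the lemma żaden.
print(containing("<q />", '[base="żaden"]'))
```

The second query spells out a wildcard for every unspecified field, so it is longer than the `.*:exst:.*` form used in the text but should match the same tags, since a trailing `.*` also covers the remaining fields.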

Basic statistics
The NKJP1M corpus consists of 18,484 short samples (40-60 words long each, up to full sentence). In 11,606 of them (62.79%), at least one quantifier was annotated. In total, 21,938 quantificational expressions were annotated, which is 1.19 per sample on average.
As expected (see Table 1), one of the most numerous groups among the quantifiers is unmodified numerals, constituting 29.82% of all units. D-quantifiers, including the unmodified numerals (27%), are more than nine times as frequent as A-quantifiers (19,801 to 2137). Existential quantifiers (7461, 34.01%) are more frequent than both universal (3902, 17.79%) and proportional ones (4032, 18.38%), the latter two occurring at similar rates. Next to typical examples of the class of proportional quantifiers, this count includes, as discussed above, oba and obydwa 'both' (8.26% of the proportional count). Moreover, percentage expressions of the form "n%" or quantifiers including the sequence "n%", e.g., "more than n%", are counted under the proportional label (379 occurrences). These frequency numbers are consistent with the semantic complexity predictions mentioned in the introduction (Szymanik & Thorne, 2017).
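The percentages in this section follow directly from the raw counts, which can be re-derived with a few lines of arithmetic. The sketch below only uses the counts quoted in the text; the variable and function names are illustrative.

```python
# Raw counts quoted in this section.
samples_total = 18_484
samples_with_quantifier = 11_606
quantifiers_total = 21_938
by_type = {"D": 19_801, "A": 2_137}
by_subtype = {"existential": 7_461, "universal": 3_902, "proportional": 4_032}

def percent(part, whole):
    """Share of `part` in `whole`, in percent, rounded to two decimals."""
    return round(100 * part / whole, 2)

print(percent(samples_with_quantifier, samples_total))  # samples with a quantifier
print(round(quantifiers_total / samples_total, 2))       # quantifiers per sample
print(round(by_type["D"] / by_type["A"], 2))             # D to A ratio
for name, count in by_subtype.items():
    print(name, percent(count, quantifiers_total))

# The two type counts add up to the total number of quantifiers.
print(sum(by_type.values()) == quantifiers_total)
```

The subtype counts do not add up to the total because unmodified numerals are reported as a separate class.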

Corpus and quantifier theory
One of the crucial achievements of formal semantics was to formulate linguistic universals for the domains of function words, with quantifiers being a prime example in the literature (Barwise & Cooper, 1981). Over the years, we have seen intensive research efforts in linguistics and cognitive science to assess the proposed universals' empirical adequacy and find explanations for their existence (see, e.g., Steinert-Threlkeld & Szymanik, 2020a). This research program's major bottleneck is the lack of cross-linguistic quantitative data that could help with the theory testing and development. Such data often exists for other semantic domains, e.g., color terms (Berlin & Kay, 1969) or kinship terms (Murdock, 1970), supporting advancements in the field (Kemp & Regier, 2012;Steinert-Threlkeld & Szymanik, 2020b). We believe that gathering quantitative cross-linguistic data on function words, like quantifiers, is crucial to understand language and cognition better. Therefore, our biggest hope is that the detailed description of the corpus building process, presented in the paper, will motivate similar work and be a reference to replicate the process in other languages.
What does our data already tell us about the quantitative distribution of quantifier features? A review of the annotation regarding right monotonicity may at first sight be surprising for researchers working on quantifiers. We have tagged 44.42% of all quantifiers as non-monotone in the right argument (right nmon, see Table 2). However, the quantifier literature strongly suggests that right monotonicity is a cross-linguistic universal among monomorphemic quantifiers (Barwise & Cooper, 1981). So let us break down the class of right non-monotone quantifiers in Polish into further categories. First of all, 67% of all occurrences are, in fact, numerals. Hence, they are either non-monomorphemic quantifiers of the form "exactly n" or just bare numerals of the form "n", which arguably should be interpreted semantically in a monotone way as "at least n". For instance,
(18) W sumie pełnił funkcję prezydenta przez 5 lat i 214 dni
     In total performed function president for 5 years and 214 days
     'In total he performed the presidential function for 5 years and 214 days.'
Another interesting subgroup, existential right non-monotone quantifiers (22%), consists of quantifiers quite specific to Slavic languages, such as kilka, kilkanaście, kilkadziesiąt, kilkaset, meaning 'more than X and less than Y' (e.g., kilkanaście means 'between ten and twenty'), and their A-type adverbial counterparts (kilkakrotnie, kilkunastokrotnie). For example,
     Po tym wydarzeniu przez kilka dni przebywał w szpitalu
     After this event for between 1 and 10 days stayed in hospital
     'After this event he stayed between one and 10 days in the hospital.'
Arguably, such quantifiers also cannot be labeled as morphologically simple. Moreover, they are semantically equivalent to the conjunction of monotone quantifiers. They are so-called connected quantifiers. The remaining 10% consists of complex proportional quantifiers including the phrase "n%", which were assigned the exact interpretation, "exactly n%" (out of 379 occurrences of such quantifiers, only 51 are monotone). For instance, see the second part of sentence (9) repeated below:
     … a tylko 9 proc. to mieszkańcy wsi.
     … and only 9 percent are inhabitants countryside.
     '… and only 9 percent are inhabitants of the countryside.'
Hence, even though Polish is not a counterexample to the current universal formulation, the data shows that non-monotone complex quantifiers are very frequent textwise (Table 3).
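The notion of a connected quantifier used in the breakdown above can be illustrated with a small sketch: a "between n and m" quantifier is built as the conjunction of an upward monotone and a downward monotone quantifier, and a single witness shows that the conjunction itself is non-monotone in its right argument. The bounds 2 and 3 and all names are illustrative assumptions; they are not the lexical meaning of any Polish quantifier.

```python
def at_least(n):
    """Right upward monotone: enlarging the scope B can only help."""
    return lambda A, B: len(A & B) >= n

def at_most(m):
    """Right downward monotone: shrinking the scope B can only help."""
    return lambda A, B: len(A & B) <= m

def conjunction(Q1, Q2):
    """A connected quantifier as the conjunction of two monotone ones."""
    return lambda A, B: Q1(A, B) and Q2(A, B)

between_2_and_3 = conjunction(at_least(2), at_most(3))

A = {1, 2, 3, 4, 5}
B = {1, 2}           # two As are in the scope: the quantifier holds
B_larger = {1, 2, 3, 4}   # superset of B with four As: the upper bound fails
B_smaller = {1}           # subset of B with one A: the lower bound fails

print(between_2_and_3(A, B))
print(between_2_and_3(A, B_larger))
print(between_2_and_3(A, B_smaller))
```

Truth is lost both when the scope grows and when it shrinks, so the conjunction is neither upward nor downward monotone on the right, even though each conjunct is monotone.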
Monotonicity also plays a crucial role in psycholinguistic research. The experimental studies have repeatedly demonstrated that downward monotone quantifiers are more difficult to process than upward monotone ones. Subjects need more time to read sentences with downward monotone quantifiers and make more errors when asked to evaluate their truth values (see, e.g., Szymanik, 2016 for an overview). One possible explanation of this so-called monotonicity effect is that it may be due to the relative frequencies (Degen & Tanenhaus, 2019). And indeed, in our corpus, the number of right downward monotone quantifiers is significantly smaller than the number of right upward monotone quantifiers (see Table 2).

Outlook
The first step in future work with the annotated corpus is a closer analysis of the annotated units with a view to possible extensions of the quantifier description. The three categories considered in our project are by no means exhaustive; depending on the research goals, many other features of quantifiers could be added in an extended annotation. For instance, among existential quantifiers, one may wish to distinguish value judgments (e.g., "Enough members attended to constitute a quorum"), and among non-monotone quantifiers, one may want to distinguish connected quantifiers (conjunctions of monotone quantifiers), e.g., between 5 and 7 or kilkanaście (Chemla et al., 2019). Furthermore, building on the quantifier features already determined, one may want to focus on morphosyntactically complex quantifiers, such as the modifications already mentioned, but also Boolean combinations, exception phrases (all but students), bounding phrases (twice a day), or partitive constructions (most of the). Another direction would be to disambiguate and count the various readings of quantifiers, for example, the proportional and cardinal readings of quantifiers such as many or few. The tagging system could also be extended with other quantifier properties known from the literature, such as universe independence, also known as extensionality (Peters & Westerståhl, 2006). Some of those extensions may be carried out automatically or semi-automatically.
Another possible extension of the annotation is to include the comparison type of quantifiers. Each quantifier can be either positive, comparative, or superlative. Modified numerals come in two semantically equivalent flavors: comparative (e.g., more than, fewer than) and superlative (e.g., at least, at most). Geurts et al. (2010) have provided evidence that superlative quantifiers are harder for humans to process than comparative quantifiers. Thus, as in the case of monotonicity, it may be interesting to compare the frequency of the two types of modified numerals.
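The semantic equivalence of the two flavors of modified numerals can be made concrete with a small check. This is a minimal sketch under an assumption that is ours rather than the text's: the equivalence is read in the usual way for count nouns, so that 'more than n' has the same truth conditions as 'at least n+1', and 'fewer than n' the same as 'at most n-1'. The function names are hypothetical, and quantifiers are reduced to conditions on the cardinality k of the intersection of restrictor and scope.

```python
# Truth conditions stated over k, the number of restrictor
# elements that also satisfy the scope.
def more_than(n):      # comparative
    return lambda k: k > n


def fewer_than(n):     # comparative
    return lambda k: k < n


def at_least(n):       # superlative
    return lambda k: k >= n


def at_most(n):        # superlative
    return lambda k: k <= n


def equivalent(q1, q2, max_k=100):
    """True if q1 and q2 agree on every cardinality from 0 to max_k."""
    return all(q1(k) == q2(k) for k in range(max_k + 1))


print(equivalent(more_than(3), at_least(4)))
print(equivalent(fewer_than(3), at_most(2)))
print(equivalent(more_than(3), at_least(3)))
```

Since the truth conditions coincide, a difference in corpus frequency between the two types could not be attributed to what the expressions denote.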
In parallel with our annotation project, there has recently been an effort to establish an ISO standard annotation scheme for quantification phenomena in natural language as part of the ISO Semantic Annotation Framework (ISO 24617) (Bunt, 2020). The development of the standard is still at a preliminary stage. Most importantly, it still needs to be validated in manual and automatic annotation. In its current form, the annotation scheme proposed in the standard is highly complex, which makes it too difficult for annotators to use. However, if the scheme is validated and supported by training and annotation tools, it would be interesting to test it on our corpus.
The manually annotated data will also serve as a training corpus for a machine learning classifier aimed at the automatic semantic annotation of quantifiers in large corpora. Building on the annotation, we plan to carry out an extensive analysis of the distribution of quantifiers in Polish, using the standard balanced and representative 300-million-token corpus of modern Polish. One natural direction would be to repeat the research on semantic complexity conducted by Szymanik and Thorne (2017). The biggest weakness of their analysis was its restriction to the 36 most common quantifiers. Using our corpus, we could achieve much broader coverage, approximating all quantifier expressions in Polish. Any statistical generalization about the influence of various semantic factors on linguistic distribution would therefore be more robust. Such an analysis would also be based on a language typologically different from English. Furthermore, text genre could be taken into account as an additional factor.