East Slavic indefinite pronouns: a corpus-based approach

The paper focuses on the development and functional distribution of indefinite pronouns in Old East Slavic, taking into account different sources, genres and registers. All the examples in the collected dataset were taken from the historical modules of the Russian National Corpus. They were tagged for type of indefinite marker, source (including originality and date), type of reference of the indefinite marker, semantics, type of discourse, and the degree of formality (formal or informal) present in the context. We then applied both descriptive and inferential statistical methods such as Random Forest analysis as well as multinomial logistic regression. Our analysis enabled us to identify the primary and secondary predictors of the choice of a particular indefinite marker and to trace the functional distribution of indefinite markers according to these factors.


Introduction and overview of indefinite markers in the history of Russian
The present paper discusses indefinite pronouns in Old East Slavic (OES). The initial system of OES indefinites differed significantly from that of modern Russian. The system attested in the oldest OES written sources featured relatively few distinctions. It included only interrogatives in the function of indefinites (1-2), a function they can still have in modern colloquial Russian, the ně-prefixed pronouns (3), and quasi-relative constructions based on the predicative adjective ljubo, with the original meaning of 'pleasing' (4), that were used as indefinites (Krys'ko et al., 2020, pp. 178-183).
(7) изидоша же ѿ воинъ ѥтери хотѧще трѣбованиꙗ брашьнѹ обрѣсти 'some of the warriors came out wishing to find the required food' (Miracles of Saint Nicholas)
In the present paper, we do not dwell on the indefinite markers that emerged later, in the Middle Russian period, that is, on nibud', to and koe. The particles ljubo and libo are treated as one marker (we have only one example with libo instead of ljubo in our dataset).
In OES, there were also other pronouns that could be used in ways very similar to the indefinite ones. These include universal quantifiers such as kъžьdo 'each' and vьsjakъ 'every', and identity/non-identity determiners such as inъ and drugyj 'another'. For various reasons, neither of the two classes is considered to consist of true indefinite pronouns (cf. Haspelmath, 1997, pp. 11-13). In this paper, we do not dwell on such quantifiers and focus only on true indefinites; an exception is made only for the pronoun eter, a borrowing in OES, which can have pure indefinite uses (see below).
We proceed as follows: Section 2 presents an overview of previous research on the history of indefinites in East Slavic. Section 3 introduces the research data and the methods used in the study. Sections 4 and 5 discuss the results.

State of research
Indefinite pronouns have always been among the most discussed topics in Modern Russian linguistics (see, for example, Růžička, 1973; Šeljakin, 1978; Padučeva, 1985; Seliverstova, 1988; Kuz'mina, 1989; Tatevosov, 2002; Bylinina & Testelec, 2005; Geist, 2008; Kustova, 2015). This is not the case for OES indefinites, which are, on the contrary, under-researched (see Malovickij, 1971; Krys'ko et al., 2020; for Middle Russian, see Penkova, 2011, 2016, 2017, 2021). Differences and changes in the distribution of OES indefinites, their referential status, refunctionalization, and competition between different types in different genres and registers are all questions that have remained outside the scope of previous research. For Old Church Slavonic (OCS), Večerka (1989) enumerates several indefinite pronouns and provides a few examples of their being used with a substantival function. He counts interrogatives such as kъto 'who', čьto 'what', as well as někъto, něčьto, among the indefinite pronouns (Večerka, 1989, p. 39; 1996, p. 135; 2002, p. 179). However, Večerka only discusses the syntactic functions of these pronouns, i.e., whether they can be used attributively or substantivally, and pays no attention to their semantics and reference.
The OCS dictionary (SJS) also treats the particle ljubo as a marker of indefiniteness, providing examples from Codex Suprasliensis and some later Russian manuscripts from the 1200s and 1500s (SJS II, p. 159), cf. (8).

(8) Имѣниѥ свое желаѧ оустроити по коѥмоу лоубо образоу 'Wishing to arrange his belongings according to a certain manner' (Supr. 547, 3sq, Večerka, 1989)
The origins of ljubo as well as the shift from ljubo to libo that took place in Russian are not clear. Recently, Majer (2015) proposed an interesting hypothesis regarding the development of ljubo. He cites numerous authoritative sources (e.g., Vaillant, 1958, p. 407; ĖSSJa 15, p. 177) claiming that the indefinite marker ljubo (as well as the conjunction ljubo 'or') originated from the neuter of the adjective *ljubъ. However, after analyzing both Slavic and non-Slavic data, he concludes that "it is a tempting alternative to derive them from a (Post-)PIE verbal form *lewbʰ-o(r) instead" (Majer, 2015, p. 201), more specifically, from the t-less middle voice form, which is known to produce indefinite formants across Indo-European. According to Majer, if the t-less middle voice form is the source of grammaticalization, this directly implies that "the grammaticalization must have happened before the class of t-less middles was lost from the language" (Majer, 2015, p. 200). Whereas this conclusion sounds reasonable for the conjunction ljubo, which lost its semantic connection with the verbal meaning, it creates a problem for ljubo as an indefinite marker, which shows a lower degree of grammaticalization in OES (see below).
The reasons for the shift from ljubo to libo have not yet been established and call for separate research. There are contradictory explanations in the scholarly literature. Some scholars claim that the shift from ljubo to libo might have occurred for phonological reasons such as delabialization, as is typical for subsequent stages of grammaticalization (Miklosich, 1886; Haspelmath, 1997, p. 168). Others (cf. Fasmer II, p. 493) reject this explanation. This shift could possibly also be explained in terms of contamination between two paronyms, namely, the phonetically and functionally similar conjunctions ljubo and libo, which both exhibit a disjunctive meaning ('or').
Pičchadze (2011) has discussed the pronoun eter as a feature of the Cyrillo-Methodian language. She shows that eter was often replaced with ně(kyj) in later Church Slavonic texts copied in Preslav. Nevertheless, the pronoun eter was preserved in many Old Bulgarian texts and was adopted by the Old East Slavic tradition. Many attestations of eter can be found in East Slavic copies of Old Bulgarian translations (for more information, see Pičchadze, 2011) as well as in original East Slavic texts from as late as the 17th century (SRJa V, p. 64).
The pronoun eter could be used in two main ways: as an indefinite pronoun (with both substantival and attributive functions) and to indicate non-identity ('another'). According to Pičchadze (2011), the latter was a Serbian-Macedonian innovation unknown in the Cyrillo-Methodian tradition. However, both uses were well-known in East Slavic. We will deal here only with eter as a marker of indefiniteness, though both meanings are to some extent semantically and (probably) diachronically related.
According to Miklosich's lexicon (Miklosich, 1862-1865, p. 1150) and the Etymological Dictionary of the Slavic Languages (= ĖSSJa; 1981, V, pp. 185-186), there was another infrequent construction which could mark indefiniteness, consisting of the demonstrative pronoun je combined with the de-particle and an interrogative pronoun. 2 In the following we will not deal with this construction with the particle jede since it is extremely marginal in OES.
Some topics related to the history of indefinite pronouns in Russian are discussed by Malovickij (1971, p. 3-130). The most significant contribution of his paper was to describe interrogative pronouns with indefinite function and to identify their typical uses, specifically in conditional clauses, interrogative sentences and irrealis predication, and in combination with the pronouns inъ, drugyj '(an)other' (i.e., in the context of disjunction). Malovickij examines other structural types of indefinites mainly based on data from late Middle Russian or even later periods (19th-20th century Russian texts). Consequently, neither the emergence nor the semantic evolution of OES indefinites has been traced thoroughly. Pronouns that ceased to be used over the course of the history of Russian, such as eter, were left out of his study.
Malovickij's description, accomplished in the "pre-corpus" era, obviously requires verification, significant revision and amendment to account for up-to-date knowledge in the field and the possibilities of corpus-based research methods.
2 The ĖSSJa mentions only the combination jede kyj. There is a construction employing this combination in modern Bulgarian. According to Miklosich, the combination is attested in Bulgarian and Serbian manuscripts, but there are no traces of it in OCS texts (cf. SJS). Unlike eter, it was not borrowed into OES, and there are no attestations of it in East Slavic written texts. Miklosich also gives evidence for another combination, jede čьto, similar to the previous one. This combination cannot be found anywhere in OES (cf. SRJa XI, SDRJa III, and RNC), except for a sole example cited by Miklosich: Но срѣтаемъ брата иного и абiе глаголемъ емоу еде что и еде что бысть 'But we meet another brother and suddenly tell him: this and that happened (?)' [perhaps mistakenly for eda čьto?] (Izv. 1856, p. 208). This citation was taken from the 16th century "Discourse" of Metropolitan Daniil, though Miklosich erroneously has a reference to an 11th century Menaion. We can hardly draw any conclusions based on one occurrence only, but in the example cited, the context exactly matches those in which indefinite pronouns with the marker edi are used in modern Bulgarian. In Bulgarian, edi serves as a marker of so-called "secondary indefiniteness", that is, it is used in the context of reproduced speech (Nicolova, 2008, p. 202-203).
According to Haspelmath, other sources of indefinites not driven by grammaticalization are:
- interrogative pronouns in the function of indefinites (an archaic feature of many Indo-European languages including Vedic Sanskrit, Ancient Greek, Latin, Gothic, Slavic and OES in particular, yet with restricted usage in modern colloquial Russian). Haspelmath argues that the ability of interrogatives to be used as indefinites is based on the shared semantic component of 'ignorance', a characteristic of both interrogative and indefinite pronouns (Haspelmath, 1997, p. 175); different functions of interrogatives in the function of indefinites across Slavic were discussed by Křížková (1974);
- reduplication of interrogatives: Latin quisquis, lit. 'who-who' (Haspelmath, 1997, p. 179), Late Middle and modern Russian кое-кто, lit. 'which-who'; this type is based on the notion of irrelevance; the type is unknown in OES;
- negative scalar focus particles: modern Russian negative indefinite pronouns like nikto 'not even who' (Haspelmath, 1997, p. 222), etymologically also nekto, nečto; OES někto, něčto, etc.
There are also several other diachronic sources of indefinite pronouns, such as generic nouns (e.g. French rien, personne) or numerals denoting 'one' (Haspelmath, 1997, pp. 157-335). However, both types have only restricted uses. Indefinites in modern Russian were also discussed by Haspelmath (1997). However, some of his observations were based on incorrect etymological interpretations. For instance, Fasmer argues that it is incorrect to explain the etymology of the prefix ně through *ne vě 'don't know' (Fasmer III, p. 59), as Haspelmath does (Haspelmath, 1997, p. 131). There are other, more acceptable etymologies for ně, such as a lengthened form of the negative particle *ne or a PIE deictic particle *ne/no (Fasmer, 2004, III, p. 59; see also ĖSSJa 1997). A good structural analogy to the former explanation can be found in the Northern Russian dialects, where there are indefinites consisting of an interrogative and a negative particle, just like the ně-pronouns. However, in this case, the negative particle ni follows the interrogative pronoun, cf. gdeni 'anywhere, lit. where even not' (Zaliznjak, 2004, p. 200; on the prosodic status of Middle Russian ni, see also Penkova, 2011).
Cross-linguistically, indefinite pronouns are typically characterised by multifunctionality and diachronic instability. Multifunctionality, i.e., the ability of different structures to be used in the same types of contexts, inevitably leads to competition between them (for the competition of indefinites in Modern Russian see, for instance, Padučeva, 1985, p. 219-220). Diachronic instability of indefinites implies that they are subject to rapid changes in semantics. According to Haspelmath, as they evolve semantically, their capacity for emphasis is gradually weakened: "[…] the semantic grammaticalization of indefinite pronouns is primarily weakening of emphasis" 4 (Haspelmath, 1997, p. 154).
In the present paper, we describe the OES system of indefinite markers and establish the role of each marker in it. We can expect the OES system of indefinites to be organized differently from the Modern Russian one, with one marker simultaneously covering many functions. We also expect to find diachronic changes occurring over a fairly short period, considering the diachronic instability of indefinites. Finally, we assume that there could have been differences or changes in the distribution of indefinite markers with regard to the rhetorical strategy of the writer, i.e., between higher-code and lower-code texts and contexts, as well as between different types of discourse.

Data and methods
We collected our data from the historical subcorpora of the Russian National Corpus (RNC), specifically the Corpus of Birchbark Letters and the Old Russian Corpus. The former is rather small and contained only 19002 tokens at the time it was accessed. At the time of writing (October 2021) it comprised 1059 birchbark letters, which accounts for about 90% of all the birchbark letters that have been discovered so far. The Old Russian Corpus is more extensive and included 504077 tokens at the time it was accessed. Though these corpora may not seem large enough for studying infrequent phenomena, this is not a problem for indefinite markers, which exhibit frequencies comparable with those of major grammatical categories.
Both corpora are fully tagged for part-of-speech and morphology and manually corrected. However, they are not completely disambiguated for our purposes, which means that we could not get disambiguated search results for most of our items. 5 First, the corpus tagset does not recognize interrogatives used in the function of indefinites. Second, we cannot rely on the corpus tagset to distinguish between uses of ljubo as an indefinite particle and as an epistemic particle meaning 'maybe' (SRJa VIII, 328). The latter is used without a pronoun, so it is not the subject of our analysis (9).
(9) сн҃ ѹ мои Юрьи не ходи самъ с Литвою ѹбилъ ꙗ кнѧзѧ ихъ Воишелка любо восхотѧть мьсть створити 'my son Jurij, do not go with the Lithuanians yourself. I killed their prince Voišelk, maybe they want to take revenge' (Volhynian Chronicle)
Finally, we could not automatically draw the line between the use of eter as an indefinite pronoun and with the meaning of 'another'. Therefore, all the search results for the zero-marked pronouns, for pronouns combined with the particle ljubo, and for the pronoun eter were manually checked for indefinite meaning. For instance, we included in our dataset examples such as (17), in which the pronoun kto is used in the indefinite function, but not examples such as (10), where it is used as an interrogative, and not examples such as (11) with a relative function of the pronoun kto. We also did not include in the dataset examples with ljubo such as (9) and examples with eter such as (12), in which it is used in the meaning '(an)other'.
(10) никола […] рече къто ты ѥси и ѿкѹдѹ идеши и къ комѹ идеши? 'Nicholas […] said: "Who are you and where do you come from and to whom are you going?"' (Miracles of Saint Nicholas)
(11) кто боудеть виноватъ на того татьба снидеть 'The responsibility will fall on the person who is guilty' (Russkaja Pravda)
(12) Соломо(н҃ ): Желѣзо желѣза остритъ, мѹжь же остритъ лице етера 'Iron sharpens iron and a man shapes the countenance of another one' (Pčela)
Nevertheless, the Old Russian Corpus, complemented by the Corpus of Birchbark Letters, is a reliable source, especially since it is possible to discriminate among the textual genres in it. It includes a representative number of OES original texts as well as texts translated by East Slavic scribes. The translated texts in the corpus comprise seven sources (237998 words), i.e., "Life of Andrew the Fool", "1076 Izbornik of Svjatoslav", "Alexander Romance", "History of the Jewish War", "Story of Ahikar", "Miracles of Saint Nicholas", and "Pčela". The original sources in the corpus comprised nine texts at the time of access (266079 words), i.e., six chronicles (the Primary Chronicle, Novgorod First Chronicle, Kievan, Galician, Volhynian and Suzdal' Chronicles), "Questions of Kirik", and two tales by Cyril of Turov. Only legal texts were lacking at the time the Old Russian Corpus was accessed, though some such texts are incorporated into the chronicles in the corpus. The Corpus of Birchbark Letters supplies us with instances of vernacular usage of indefinite markers.
The resulting dataset contains 608 examples: all examples of the indefinite markers ljubo and eter, all examples of někъto and něčьto, and of kъto and čьto used as indefinites, from all OES sources included in the Old Russian Corpus and the Corpus of Birchbark Letters at the moment of access. We did not include numerals such as OES odinъ (OCS edinъ) in our dataset since they are limited to the attributive function, and we focused primarily on indefinites with a substantival function.
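To put the size of the dataset in proportion to the corpora, the figures given above can be turned into a rough density of collected examples per million tokens. The sketch below uses only the three counts reported in the text; the helper name per_million is ours, and the resulting figure is an illustration of scale, not a result reported in the study (the dataset covers only the selected markers and only their manually verified indefinite uses).

```python
# Counts as reported in the text (tokens at the time of access, examples in the dataset).
BIRCHBARK_TOKENS = 19_002
OLD_RUSSIAN_TOKENS = 504_077
DATASET_EXAMPLES = 608


def per_million(count: int, tokens: int) -> float:
    """Return the number of hits per million tokens (hypothetical helper)."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return count / tokens * 1_000_000


total_tokens = BIRCHBARK_TOKENS + OLD_RUSSIAN_TOKENS
density = per_million(DATASET_EXAMPLES, total_tokens)

print(f"tokens in both corpora: {total_tokens}")
print(f"collected examples per million tokens: {density:.1f}")
```

The same helper can be applied to any subset (for instance, one marker in one corpus) once the corresponding counts are known.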
After collecting the data, we tagged the indefinite markers in our dataset for various features.
First, we tagged them for the type of indefinite marker. We distinguish between zero-marked indefinites deriving from interrogatives (limited to contexts with the pronouns kъto and čьto only); kъto and čьto prefixed by the indefinite marker ně- (the reason for this choice is the impossibility of getting disambiguated results for bare interrogatives and the need for manual data selection); all pronouns marked with ljubo; and all indefinite uses of the pronoun eter.
Second, we tagged the indefinites for source, including originality and date. In the case of translations, this was the date when the translation was produced, rather than the date of the copy. Accordingly, the examples from the "Life of Andrew the Fool" are tagged for the 12th century and not for the 14th, the date of the earliest known copy (Moldovan, 2000). The same holds for other translated sources, such as the "Story of Ahikar" (the translation is no later than 1230, yet the earliest copy dates from the 15th century), the "History of the Jewish War" (the translation was made in the 12th century, cf. Pičchadze, 2002, p. 168, while the earliest copies are from no earlier than the 15th century), etc. The same strategy was adopted for the original OES texts (works by particular authors and chronicles, as far as we are aware of their textual history). Based on these principles, the examples from the Kievan Chronicle were tagged for 1199, the time when Abbot Moses of Vydubyči Monastery brought together a set of linguistically diverse sources into one codex, and not the 15th century, the date of the earliest extant copy. Examples from original works by a particular author (e.g., Cyril of Turov or Kirik of Novgorod) are tagged in our dataset according to the lifetime of the respective author. Examples from the birchbark letters are tagged with the date established by Zaliznjak (2004). If the date reads "no later than year N", the example is tagged with the last possible year. The data is distributed fairly evenly across two important periods in the evolution of OES: we collected 289 examples from Early Old East Slavic, that is, from the 11th and 12th century sources, and 320 examples from Late Old East Slavic (13th-14th century sources).
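The dating conventions just described can be summarized in a short sketch. It assumes that every source is reduced to a single tagging year (the date of composition, translation or compilation, or the last possible year when only a "no later than" date is known) and that the copy date is kept for reference only. The class and function names are ours and purely illustrative; only the two dates used in the example (1199 and 1230) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDate:
    """Dating information for one source (hypothetical structure)."""
    title: str
    tagging_year: int           # composition/translation year, or the last possible year
    earliest_copy_century: int  # recorded for reference; not used for tagging


def century(year: int) -> int:
    """Century of a year AD, e.g. 1199 -> 12 and 1230 -> 13."""
    return (year - 1) // 100 + 1


def period(source: SourceDate) -> str:
    """Assign a source to Early (11th-12th c.) or Late (13th-14th c.) Old East Slavic."""
    c = century(source.tagging_year)
    if c in (11, 12):
        return "Early OES"
    if c in (13, 14):
        return "Late OES"
    raise ValueError(f"{source.title}: century {c} is outside the OES periods")


kievan = SourceDate("Kievan Chronicle", 1199, 15)
ahikar = SourceDate("Story of Ahikar", 1230, 15)  # "no later than 1230" -> 1230

for src in (kievan, ahikar):
    print(src.title, "->", period(src))
```

Treating the "no later than" rule as applying to the "Story of Ahikar" in the same way as to birchbark letters is our reading of the text, not something it states explicitly.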
Within the Early Old East Slavic subset, the majority of the data belongs to the 12th century (244 examples).
Third, the indefinites were classified according to their reference type:
a) specific indefiniteness (the object is specific but unknown to the speaker);
b) half-definiteness (the object is specific, known to the speaker and unknown to the listener);
c) non-specific indefiniteness, or existential indefiniteness (the object exists but is not specified);
d) universal reference (indefinite markers serve as universal quantifiers).
We thus distinguish between four main types of reference: half-definite, specific indefinite, non-specific indefinite and universal, or generic. 6 Specific indefiniteness implies that the object is specific but unknown to the speaker. According to Haspelmath (1997, p. 38), "the speaker presupposes the existence and unique identifiability of its referent", cf. (13).
(13) и бѣаше в судѣ томъ нѣчто чрѣмно добрѣ вонѧꙗ велми, да ꙗкоже съ ст҃ ою водою смѣсивъ и съклативъ, вда пити, рекъ ѥму: миръ тебе 'and there was something red in that vessel that smelled very nice, and when [he] mixed it with holy water [he] gave him a drink, saying to him: Peace [be] with you' (Life of Andrew the Fool)
6 Non-referential use of indefinites is also possible, such as use of indefinites in a predicative function, or the so-called discourse uses; the meaning of indefinites in such constructions is labelled as 'appreciative' by Haspelmath (1997, pp. 186-189).
The main diagnostic factors which allow us to identify specific reference are the use of a co-referential anaphoric pronoun (cf. onъ 'he, that' in 14) and an affirmative declarative sentence with the perfective past or an ongoing present 7 (13).
(14) слыша же хлапъ […] ꙗко речи нѣкому, ꙗко изъ млада ѥсть ѹмѣлъ сьи ꙗзыкъ, да хлапъ молѧшесѧ, да бы ѥму быти, ꙗкоже ѥсть и ѻнъ 'A slave heard someone saying that [he] knew this language since his childhood and the slave begged to be like that' (Life of Andrew the Fool)
Half-definiteness means that the object is concrete, known to the speaker and unknown to the addressee; cf. 15 and 16. We consider the use of a pronoun to be half-definite if it is used with a proper noun in the context (cf. 15), or the pronoun depends on a modal verb in the 1st person form requiring definite reference (cf. chošču 'I want' in 16).
(15) бѣ же нѣќто тамо паѹса́ниѧ именемъ, мѹжь вели́къ и богатъ зѣло 'There was a certain man there whose name was Pausania, a great man of wealth' (Alexander Romance)
(16) хощю ємѹ показати нѣчто ди́вно 'I want to show him something amazing' (Alexander Romance)
Non-specific indefiniteness means that the object is not specified. The main criterion for such use, according to Haspelmath (1997, p. 45), is the presence of the indefinite in a conditional protasis, in questions (cf. 17), or in contexts with the imperative or with direct or indirect negation. The same criteria were used for tagging indefinite markers in our dataset. Generic reference is used for speaking about a category of objects. The main type of context in which the pronoun may be used with generic reference is an imperfective present or future tense that lacks its temporal meaning and denotes possibility (cf. 18).
(18) можеть кто любо ѿ дѣмонъ плакатисѧ 'anyone/everyone can cry because of demons' (Life of Andrew the Fool)
Fourth, we also defined the semantic function of indefinite markers in our dataset according to the classification elaborated by Haspelmath (1997) and Tatevosov (2002) for semantic maps of indefinite pronouns (see below for details). This classification does not always match the reference type (as in the case of universal references).
Haspelmath's semantic map for indefinite pronouns (for the notion of a semantic map, see above) includes and arranges different types of functions, among them the following: specific known, specific unknown, irrealis non-specific, question, conditional, indirect negation, direct negation, comparative, free choice. Haspelmath's map has been revised and completed by Tatevosov, who added three nodes for the functions covered by universal quantifiers, i.e., generic, completeness, and definite (Tatevosov, 2002, p. 167). We will also use the node for generic and neglect the other ones as irrelevant for indefinite pronouns (Fig. 1).
The functions that are labelled as specific known and specific unknown have already been discussed above. Most functions on the map belong to the non-specific, or existential, reference type. Now, we will illustrate these functions with OES examples. The only function that we cannot illustrate with an example is the comparative function, that is, the use of an indefinite marker in the standard of comparison (the 'bigger than anyone' type). According to Haspelmath, the domain of negative polarity is covered by the following functions: direct negation (the central context is a direct, or clausemate, negation) (19), polar questions (20), conditional protasis (21), and indirect negation, which includes contexts of implicit negation (complements of verbs meaning 'refuse', 'be absent', etc.; prepositions meaning 'without', 'besides', etc.; negative quantifiers like 'few', and contexts with negation in the main clause and an indefinite NP in the subordinate clause, see Haspelmath, 1997, pp. 33-37; Tatevosov, 2002, p. 137), cf. (22). The irrealis non-specific function comprises various types of irrealis contexts including hypothetical and counterfactual modality, future, imperative, complements of non-factual predicates (like 'want'), and habitual, e.g. (23).
(24) сверзи коверъ ли что ли что постьлати или чимъ прекрыти г(с)на нашего 'Throw the rug or whatever you can lay or cover our master with' (Kievan chronicle)
The free-choice uses can be very similar to the uses of indefinites as universal quantifiers, that is, with generic, or universal, reference (cf. 18 above). For us, the main criterion to distinguish between the former and the latter was the reading 'every, all', possible for an indefinite used with the generic reference (cf. 18 above) and impossible for a free-choice function 8 (cf. 24).
As mentioned above, the notion of a semantic map implies that there is not only a list but also a certain order of uses with particular links between them, and that the functional spaces covered by the markers are contiguous (see Haspelmath, 2003). In contrast to synchronic ones, diachronic semantic maps also specify the direction of development from one type of use to another. Moreover, while synchronic semantic maps have already been established for Russian indefinite pronouns, diachronic ones have not. A diachronic analysis could contribute to creating such a diachronic map.
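The contiguity requirement can be made concrete by treating the map as an undirected graph and checking whether the set of functions covered by a marker forms a connected region. The sketch below is only an illustration: the text lists the nodes but not the links, so the edges are given as the map is usually reproduced from Haspelmath (1997) and should be verified against the original figure; the generic node added by Tatevosov is left out because its attachment is not stated here; and the function sets used in the example are hypothetical, not findings about OES markers.

```python
from collections import deque

# Links between functions: an assumption based on the usual rendering of
# Haspelmath's (1997) map, to be checked against the original figure.
EDGES = [
    ("specific known", "specific unknown"),
    ("specific unknown", "irrealis non-specific"),
    ("irrealis non-specific", "question"),
    ("irrealis non-specific", "conditional"),
    ("question", "conditional"),
    ("question", "indirect negation"),
    ("conditional", "comparative"),
    ("indirect negation", "direct negation"),
    ("indirect negation", "comparative"),
    ("comparative", "free choice"),
]


def build_graph(edges):
    """Build an adjacency mapping from a list of undirected edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


GRAPH = build_graph(EDGES)


def is_contiguous(functions, graph=GRAPH):
    """Return True if the given functions form one connected region of the map."""
    functions = set(functions)
    unknown = functions - set(graph)
    if unknown:
        raise ValueError(f"unknown functions: {sorted(unknown)}")
    if not functions:
        return True
    start = next(iter(functions))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Only move along links that stay inside the marker's own functions.
        for neighbour in graph[node] & functions:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == functions


# Hypothetical function sets, for illustration only.
print(is_contiguous({"specific known", "specific unknown", "irrealis non-specific"}))
print(is_contiguous({"specific known", "free choice"}))
```

A marker whose attested functions fail this check would either contradict the map or point to a gap in the data.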
Fifth, we took into consideration the type of discourse (deictic discourse, narration, or irrealis discourse) and the degree of formality (formal or informal) represented in the context. Relying on the classification of types of discourse proposed by Plungian (2008, p. 24), we were able to discern three types of discourse: deictically oriented discourse, the interpretation of which requires an appeal to the moment of speech; irrealis discourse, not requiring such an appeal (habitual, hypothetical, counterfactual situations, combined with a non-referential character); and narrative discourse. Irrealis discourse should not be confused here with the irrealis semantic domain.
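The five groups of tags introduced so far can be pictured as one record per example. The following sketch shows one possible layout; the class and field names are ours, not those of the actual dataset. The sample record is modelled on example (18): its marker, source, originality, century and reference type follow the text, whereas the values given for discourse type and formality are placeholders chosen only to make the record complete.

```python
from dataclasses import dataclass
from enum import Enum


class Marker(Enum):
    ZERO = "zero"
    NE = "ně"
    LJUBO = "ljubo"
    ETER = "eter"


class Reference(Enum):
    SPECIFIC = "specific indefinite"
    HALF_DEFINITE = "half-definite"
    NON_SPECIFIC = "non-specific indefinite"
    UNIVERSAL = "universal (generic)"


class Discourse(Enum):
    DEICTIC = "deictic"
    NARRATIVE = "narrative"
    IRREALIS = "irrealis"


class Formality(Enum):
    FORMAL = "formal"
    INFORMAL = "informal"


@dataclass(frozen=True)
class TaggedExample:
    """One annotated occurrence of an indefinite marker (hypothetical layout)."""
    marker: Marker
    source: str
    original: bool   # True for original OES texts, False for translations
    century: int     # century of composition or translation, not of the copy
    reference: Reference
    function: str    # node of the semantic map, e.g. "conditional"
    discourse: Discourse
    formality: Formality


example_18 = TaggedExample(
    marker=Marker.LJUBO,
    source="Life of Andrew the Fool",
    original=False,
    century=12,
    reference=Reference.UNIVERSAL,
    function="generic",
    discourse=Discourse.IRREALIS,   # placeholder value, not taken from the dataset
    formality=Formality.FORMAL,     # placeholder value, not taken from the dataset
)

print(example_18.marker.value, "/", example_18.reference.value)
```

Keeping the closed tag sets as enumerations makes it harder to introduce inconsistent labels during manual annotation.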
The degree of formality and type of discourse are attributed to the context and not the source text in general due to the fact that the latter is, usually, quite straightforward. Usually, the immediate context is equal to one or two sentences and becomes evident in the context of the corpus output. In the case of direct speech, the boundaries of the context coincide with the boundaries of the direct speech, on the one hand, and the text of the author that introduces direct speech, on the other hand. This is particularly important for chronicles, which are a heterogeneous genre of medieval writing and may include more formal hagiographical discourse as well as the more informal speech of treaties (on the existence of at least two linguistic layers in the Kievan Chronicle, see Zaliznjak, 2004, p. 51).
We classified the context as formal if it included one or more linguistic features of high-code writing, such as an aorist, imperfect, dative absolute, accusative with infinitive, or any other syntactic/morphological marker with a higher degree of formality (for an extensive list of formal and informal features see Uspenskij, 2002, pp. 192-266; for the discussion of different linguistic features typical for formal and informal registers see Živov, 2017, pp. 325-815), e.g. (25-26).
(27) ѥже вы ѥсть въ имѣнии ли злато, ли срѣбро, ли ино чьто, ѥмѹже что хощете оставити 'which you have in the property: gold, or silver, or something else, to whom you want to leave something' (Miracles of Saint Nicholas)
(28) ты ѥси ѹ насъ кнѧзь ѡдинъ, ѡже тобѣ сѧ что створить, то что на(м) дѣꙗти? A поѣди, кнѧже, к городу 'you are our only prince; if something happens to you, then what shall we do? And go, prince, to the city' (Suzdalʹ Chronicle)
Contexts lacking either high- or low-code features were classified as informal (the reason for this decision will be given below), as in (29).
Such an approach to classifying contexts makes it possible to identify cases in which the choice of the indefinite marker might have been forced by the presence of high-code stylistic markers. It also explains why contexts without any stylistic markers are classified as informal. In such cases, as well as in contexts with low-code markers, the choice of the speaker/writer is not influenced by any formal features, so it is more likely that an item from the vernacular system would be intuitively chosen.
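The decision rule for formality described above amounts to a simple test, sketched here under stated assumptions: a context is represented as a list of feature labels identified by the annotator, and the set of high-code features contains only the four named in the text (the full inventory in Uspenskij, 2002, is much longer). The function name and the labels are ours.

```python
# High-code features named in the text; the real inventory is larger.
HIGH_CODE_FEATURES = frozenset({
    "aorist",
    "imperfect",
    "dative absolute",
    "accusative with infinitive",
})


def classify_formality(features):
    """Return 'formal' if at least one high-code feature is present, else 'informal'.

    Contexts with only low-code features and contexts with no stylistic
    markers at all both come out as 'informal', as described in the text.
    """
    return "formal" if HIGH_CODE_FEATURES & set(features) else "informal"


print(classify_formality(["aorist", "dative absolute"]))
print(classify_formality(["some low-code feature"]))  # hypothetical label
print(classify_formality([]))
```

The asymmetry is deliberate: only the presence of a high-code feature can move a context out of the default informal class.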
We applied a mixed-methods approach, i.e. a combination of qualitative and quantitative analysis. We used statistical methods such as Random Forest analysis as well as multinomial logistic regression.
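For readers less familiar with the second method, a multinomial logistic regression models the probability of each outcome category as a normalized exponential of a linear predictor. The formulation below is the generic textbook one; the symbols, and the assumption that the outcome is the four-way choice of marker type, are ours and are not taken from the model specification used in the analysis.

```latex
% Y: the indefinite marker chosen in a given context (assumed outcome variable)
% x: the vector of predictors coded for that context (reference type, discourse type, etc.)
% beta_k: the coefficient vector for marker k; one marker serves as the
%         reference category, with its coefficients fixed at zero
P(Y = k \mid \mathbf{x})
  = \frac{\exp\left(\boldsymbol{\beta}_k^{\top}\mathbf{x}\right)}
         {\sum_{j \in M} \exp\left(\boldsymbol{\beta}_j^{\top}\mathbf{x}\right)},
\qquad k \in M = \{\text{zero},\ \text{n\v{e}},\ \text{ljubo},\ \text{eter}\}
```

Under this formulation, each coefficient expresses how a predictor shifts the log-odds of one marker relative to the reference marker.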

Analysis of data and discussion
In this section, we discuss the factors that were taken into consideration in our dataset: reference, semantics, degree of formality, discourse types, originality of sources, date, and source specification. Studies of this kind that take into account all the above-mentioned factors (both reference types and semantic distribution of OES indefinites, and different discourse factors) as well as a wide range of sources provided by the RNC have not yet been undertaken.
We will structure the discussion that follows according to these probable predictors and show which of them play a more significant role, and which do not. Table 1 shows the frequency of different reference types.

Reference type
It is obvious that non-specific indefiniteness (the existential reference type) is the most frequent and that the most frequent marker with this type is the zero marker. On the other hand, the half-definite and specific indefinite reference types are most often tied to the use of the prefix ně.
According to the data, interrogatives were used as the main means of expressing non-specific indefiniteness, e.g. (31). However, it is possible that at an earlier stage zero-marked indefinites may have been used with a specific type of reference, as in the following OCS example given by Křížková (1974, p. 82): 9
(33) Прикоснѫ сѧ мьнѣ къто, азъ бо чюхъ силѫ ишьдъшѫ из мене 'someone touched me, for I felt a force emanating from me' (Lk. 8:46, Zogr.)
Pronouns with the prefix ně display a wider range of forms of "indefiniteness". Unlike zero-marked indefinites, ně-pronouns tend to occur more frequently with a specific reference (34). However, they are also attested in non-specific uses (35).
Such a low frequency can hardly be considered accidental. It can be taken as an indication that ljubo was a new marker, probably one still undergoing a process of grammaticalization (cf. the similar conclusion by Malovickij, 1971, p. 71). There are many examples of quasi-relative constructions with ljubo preserving the expected argument structure, i.e. dative of the experiencer (40-41), and of relative constructions with ljubo preserving agreement in gender (39). (41) пѹщени быша камо и(м) любо 'they were released wherever they liked' (History of the Jewish War) Ljubo occurs only to express non-specific indefiniteness, and as a universal marker, i.e. in contexts typical for zero-marked indefinites. It thus could have entered the system of indefinites as a marked functional alternative to the older unmarked interrogatives, or it could have first specialized as a marker of the free-choice function. Consider, for example, the difference between (42) and (43). Another piece of evidence supporting this observation is an example with double marking: an interrogative koe marked with both ně and ljubo, with ně already being less strongly associated with non-specific uses and thus requiring a stronger marker of non-specific indefiniteness (44).
(44) аще ѡбрѧщю(т) русь кувару грѣчьску вывержену на нѣкоемъ любо мѣстѣ, да не приѡбидѧть еꙗ 'if the Rusʹ find a Greek ship cast away in some place, [they shall] cause no damage to it' (Primary Chronicle) The homonymy of the interrogative and indefinite pronouns could have reinforced the development of a special marker of non-specific indefiniteness or its drift from free-choice towards other non-specific functions. The source of this marker was the quasi-relative construction interrogative + predicative ljubo, with the experiencer in the dative omitted.
The indefinite eter occurred in the sources only sporadically and did not belong to the core system of indefinites. Its peripheral status is shown by its indifference to any type of reference, which suggests that it was a multifunctional higher-code substitution for any indefinite marker (primarily for the multifunctional ně). In (45), eter is a marker of non-specific indefiniteness (as shown by the context of a conditional protasis, which is impossible for a marker of specific reference). In (46), it is a marker of specific indefiniteness (as shown by the narrative context), whereas in (47), it is a marker of half-definiteness (cf. the context with a proper noun). Finally, in (48)

Semantic functions of OES indefinites
We tagged our data according to the functions proposed by Haspelmath (1997) (Table 3). According to our data, zero-marked indefinites occur most frequently in conditional and irrealis non-specific uses (including future events as well as different types of non-indicative modality), which confirms the results obtained by Malovickij (1971, pp. 61-73), as well as in questions (49). On the other hand, ně-marked pronouns tend to appear with specific uses, in which they have almost no competitors, and in irrealis non-specific uses (50), in which they compete with ljubo- (51) and zero-marked indefinites (52): (49) мышлѧх же и се гл҃ ѧ: есть ли здѣ кто инъ или азъ єсмь єдинъ? 'I was thinking, saying the following: is there anyone else here, or am I alone?' (Life of Andrew the Fool) (50) ѻн же мнѧ, ꙗко ѥдинъ ѥсть ѿ нищихъ того дѣлѧ стоить, дабы взѧлъ ѹ него нѣчто, и ре(ч) ѥму: "бъ҃ тѧ помилуи, бра(т), ꙗко не имѣю ти что вдати" 'But he was thinking that [this] was one of the beggars, because he was standing to take something from him, and said to him: "God have mercy on you, brother, because I have nothing to give you!"' (Life of Andrew the Fool) (51) а вороти ми товара брата моєго со што любо, а ꙗ с тобою буду 'and give me back my brother's merchandise with whatever [you] like, and I will be with you' (Kievan Chronicle) (52) Се же напсахъ не ꙗко творити все то, нъ разоума ради, ци коли сѧ что таково пригодить 'I wrote this, not to do all of this, but to be aware if ever some of this should be of use' (Questions of Kirik) Thus, the ně-indefinites behave similarly to the modern Russian to-series, which competes with the nibudʹ-series in irrealis non-specific uses in a similar way (however, unlike the to-series, the ně-indefinites also occur as markers of half-definiteness) (53-54).
(53) Если он что-нибудь / что-то утаил, он за это поплатится 'If he hid something, he will pay for it' (Padučeva, 2015) (54) Если возникнут какие-нибудь / какие-то проблемы, звони 'If any problems arise, call' (Padučeva, 2015) The irrealis non-specific context should be considered the main field of competition between different types of OES indefinite pronouns. Another field where the uses of OES indefinites also intersect is the free-choice context, cf. (55-56). We have few examples of free-choice interpretation of OES indefinites. Nevertheless, some observations can be made. Indefinites used in the free-choice contexts appear in slightly different contexts. Ljubo-indefinites can appear in contexts emphasizing a diversity of elements in the set, i.e. in contexts with concessive meaning similar to the propositions with the emphatic particle 'even' (cf. the right-hand context in (57) where this meaning is reinforced with an emphatic particle ni 'even'). In our examples, ljubo marks the maximum degree of the value ('even'), cf. examples (57-59).
(57) бѣаше бо єпифанъ прем дръ велми, нъ ꙗкоже бѣаше славныи ѻндрѣи, єпифанъ не бѣаше: ст҃ ымъ дх҃ мъ ꙗзыкомъ иноплеменьны(х) гл҃ ше и толковаше, єже кто любо хотѧше, ни ѿ прем дрости книжныꙗ не ѹтаꙗшесѧ єм ничтоже 'Epiphanius was very wise, but Epiphanius was not like the glorious Andrew: [the latter] spoke foreign languages due to the Holy Spirit and knew how to interpret what anyone wanted, and nothing was hidden from him even from the book wisdom' (Life of Andrew the Fool) (58) в та бо лѣта всѧко зла(т), ѥже ѥсть кдѣ любо съкровено, повелѣньѥмь би҃ имь ꙗвитьсѧ 'for in those years, all the gold that is hidden somewhere [= even in the most hard-to-reach places] will be revealed by God's command' (Life of Andrew the Fool) That is not the case for zero-marked indefinites. They do not occur in pure free-choice contexts, but mostly in disjunctive contexts (usually with pronouns meaning 'other'), that is, in those emphasizing the indifference of the choice, with no particular focus on the diversity of the objects: (60) ꙗко се ѧзыкьмь ли слѹхъмь ли окъмь крадъмь: ли тъштеславиѥмь: ли печѧлию ли ѧростью ли чимь отъ таковыхъ похѹлѧюшт<е се>бе и исповѣдаюштесѧ бѹ҃ такѡ ст҃ ыихъ та<и>нъ приѥмлемъ вѣрѹюште: ꙗко въ очиштениѥ таковыихъ нашихъ грѣсѣхъ приѧтиѥ 'For when we put ourselves at fault, whether by word or by listening or by secret glances, or through vanity, or sadness, or anger, or any of these sins [= no matter which of them], thus confessing to God, we receive the holy mysteries, believing that we receive them for the forgiveness of our sins' (1076 Izbornik of Svjatoslav) (61) Мѹчи кого сѹдомъ, а не гнѣвомъ 'Torture a person [= no matter whom] with condemnation, not anger!' 
(Pčela) In this respect, zero-marked indefinites are similar to modern Russian nibudʹ-pronouns, which are not pure free-choice indefinites but can occur in contexts of disjunction together with the pronoun drugoj 'another': Разве могли бы Дмитриев, или Ксения Фёдоровна, или кто-нибудь другой из дмитриевской родни организовать и провернуть так лихо ремонт дачи, как это сделал Иван Васильевич? 'Could Dmitriev, or Ksenija Fjodorovna, or some other of Dmitriev's relatives organize and carry out such a renovation of the dacha as Ivan Vasiljevič did?' (Jurij Trifonov: Obmen, 1969) Ljubo-indefinites can therefore emphasize a diversity of objects, whereas zero-marked indefinites express only indifference to the diversity. In this respect, the former are more similar to the modern Russian by to ni bylo than to the modern Russian libo (for by to ni bylo- and libo-indefinites, see Padučeva, 2013, pp. 222-238). Presumably, this effect of the ljubo-marked indefinites is the consequence of the original meaning of ljubo 'pleasing', which is associated with a choice anchored not to the speaker but to another discourse referent (in (58), the discourse referent may choose the most hard-to-reach place; in (59), the hardest fault).
Whereas modern Russian has specialised negative polarity indefinites, OES seems not to feature any specific markers for such contexts. However, all the OES indefinite markers except eter can occur in contexts of indirect negation and even direct negation, though such usage is sporadic (63-65).
The corpus data enable us to create the following semantic map for Old East Slavic indefinites (Fig. 2). On the map, we take into account only the most common uses of indefinites; marginal uses are not included in the functional space covered by high-frequency markers such as ně and zero. The pronouns with ljubo are still in the process of being grammaticalized and do not allow us to draw any definite conclusions. However, ljubo is the only marker attested in pure free-choice contexts. At the same time, it exhibits a tendency to take over the space on the left of the semantic map. This space has been indicated with a dotted line. The marker eter has not been added to the map, since the status of this marker is artificial and the data are very scarce.

Discourse factors
In this subsection, we discuss the role of discourse-related factors in the choice of a particular marker. These factors include the degree of formality (higher- and lower-code contexts), the type of discourse (deictic, irrealis discourse or narration), and the factor of originality (original or translated text).
The OES corpus is not balanced with regard to the number of higher- and lower-code sources and contexts. We therefore compare the frequency of each indefinite marker in a given discourse type with that marker's overall share in the dataset. In Table 4, figures are italicized where the relative frequency is markedly (i.e. more than 30%) lower than the percentage share of the respective marker in the dataset, and set in bold where it is markedly (i.e. more than 30%) higher.
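The comparison just described can be illustrated with a minimal Python sketch. It assumes that "more than 30%" refers to a relative deviation from the marker's share in the whole dataset; the text does not settle whether relative deviation or percentage points are meant. The function name and all counts are hypothetical and are not taken from Table 4.

```python
def flag_deviations(context_counts, dataset_counts, threshold=0.30):
    """Compare each marker's share in one context type with its share
    in the whole dataset and flag deviations beyond the threshold.

    Assumption: the threshold is a relative deviation, i.e.
    (share_in_context - share_in_dataset) / share_in_dataset.
    """
    context_total = sum(context_counts.values())
    dataset_total = sum(dataset_counts.values())
    flags = {}
    for marker, overall in dataset_counts.items():
        baseline = overall / dataset_total
        share = context_counts.get(marker, 0) / context_total
        relative_change = (share - baseline) / baseline
        if relative_change > threshold:
            flags[marker] = "higher"   # would be set in bold
        elif relative_change < -threshold:
            flags[marker] = "lower"    # would be italicized
        else:
            flags[marker] = "average"
    return flags


# Invented counts, for illustration only.
dataset = {"zero": 500, "ne": 400, "ljubo": 80, "eter": 20}
low_code = {"zero": 70, "ne": 15, "ljubo": 15, "eter": 0}
flags = flag_deviations(low_code, dataset)
print(flags)
```

Normalizing by the dataset share rather than comparing raw counts is what makes an unbalanced corpus comparable across context types.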
Indefinite markers in higher-code contexts do not deviate much from the average level. Significant deviations affect only contexts with a lower code. The ně-marked indefinites are very unlikely to occur in such contexts, and no attestations of eter can be found in contexts with informal vernacular features. At the same time, the frequency of zero- and ljubo-marked indefinites is significantly higher in contexts with low-code markers. The choice of an indefinite is not determined to a large extent by discourse factors. Nevertheless, several tendencies can be observed. For instance, ljubo-marked indefinites rarely occur in narration and are much more frequent than average in irrealis discourse. The discourse-related distribution of zero-marked indefinites is even more noteworthy: they tend to occur in deictic discourse and are found with less-than-average frequency in narrative and irrealis discourse. Low frequency in narrative discourse is expected for markers of non-specific reference. With respect to ně-indefinites, we observe the opposite tendency: they are more likely to occur in narration than in deictic discourse. However, all these features may be caused by the reference type. This follows from the fact that non-specific indefinites, unlike specific ones, are not possible in affirmative declarative sentences with a perfective past tense form (see above).
Let us now turn to the factor of originality. The number of examples with indefinites in translated and original texts is not equal, so the frequency of the markers in original texts and translations should be compared to the respective percentage share in our dataset (we have used the same formatting to indicate strong deviations from the respective percentage share) (Table 5).
Comparing the average frequency with the frequency in original and translated texts, we see that the translations do not significantly deviate from the average in the corpus, except with respect to eter. Given the overall low numbers for eter, these results should not be overinterpreted. However, it should be noted that this pronoun occurs in OES translations only 3 times (cf. also Pičchadze, 2011, p. 220, p. 245), and all the examples are from the same source, the "Miracles of Saint Nicholas" (66). This can be explained by the fact that OES translations from Greek fall into several groups according to linguistic features driven by different translation schools or traditions (Pičchadze, 2011). Pičchadze could not find enough data to identify the place of the Miracles of Saint Nicholas among other OES translations. However, this translation is the only one in our material that uses the indefinite pronoun eter.
In original texts, the frequency of indefinite markers deviates more significantly from the average. (As we have already shown, this affects the pronoun eter, as well.) For ljubo-and zero-marked indefinites, it is higher than the average, whereas for the ně-marked indefinites, the frequency is extremely low.
Ljubo-marked indefinites occur in a limited number of translations. No attestations can be found in such large-volume sources as the Aleksander Romance, the History of the Jewish War, or the Pčela. They occur primarily in texts where a significant part is made up of direct speech or prophecies (Life of Andrew the Fool, Tale of Ahikar), for example (67). (67) ст҃ ць ре(ч): "а єгда кто любо ѡтвержетьсѧ дѣлъ вашихъ, єже б деть дѣꙗлъ прѣже, како то терпѣть, добле ли ци люто?" 'the saint said, "and when someone renounces your deeds, which he did before, how does he endure it, courageously or dreadfully?"' (Life of Andrew the Fool) (69) наслажаютьсѧ ѹчениꙗ бж(с)твенаго гласа, ꙗкоже бо нѣкто землю разореть, другыи же насѣеть, инии же пожинають и ꙗдѧть пищу бескудну 'they enjoy the teachings of the divine voice, that one will plow the earth, the other will sow, and the others will reap and eat rich food' (Primary Chronicle)

Role of individual sources
The following descriptive chart shows the overall relative frequency of any overt (i.e. non-zero-marked) indefinite markers in each of the sources used. A high percentage means that predominantly overt indefinite markers are used, while a low percentage means that zero markers are predominantly used. As can be seen, there are some translated sources (represented by the black bars) with a very high percentage of overt indefinite markers (Fig. 3). The overall percentage of overt indefinite markers in translated sources is slightly higher than in original sources, with the notable exception of the Suzdalʹ chronicle. However, as can be seen, inter-source variation is high, especially among the translated sources, suggesting that the factor original vs. translation is not a suitable predictor for the use of overt indefinite markers.
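The per-source measure plotted in such a chart is a simple proportion: the number of overt markers divided by the number of all tagged indefinites in that source. A minimal Python sketch follows; the source names, marker labels and records are invented for illustration and do not reproduce the dataset.

```python
from collections import defaultdict


def overt_share_by_source(records):
    """Return, for each source, the share of overt (non-zero) markers.

    `records` is an iterable of (source, marker) pairs; the label
    "zero" for the zero marker is an assumption of this sketch.
    """
    totals = defaultdict(int)
    overt = defaultdict(int)
    for source, marker in records:
        totals[source] += 1
        if marker != "zero":
            overt[source] += 1
    return {source: overt[source] / totals[source] for source in totals}


# Hypothetical tagged examples.
records = [
    ("Source A", "zero"), ("Source A", "ne"),
    ("Source A", "zero"), ("Source A", "zero"),
    ("Source B", "ne"), ("Source B", "ljubo"),
    ("Source B", "zero"), ("Source B", "eter"),
]
shares = overt_share_by_source(records)
print(shares)
```

With only a handful of examples per source, such proportions fluctuate strongly, which is one reason why high inter-source variation should be read with caution.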

Chronology and exploratory statistics
Using the multinom function from the nnet package in R, we fitted several multinomial logistic regressions with marker as the multinomial response variable and date as well as one other variable (such as originality or referential type) as predictor variables, in order to evaluate the role of these variables, above all the date variable. Some of the fitted models suggest that the date variable plays a role (i.e. the zero marker as well as the ně type of universal reference seem to rise with time). However, since the results were for the most part not statistically significant, we do not report the results of the multinomial regressions here.
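For readers unfamiliar with the model type, the following Python sketch shows what a multinomial logistic regression with marker as the response and date plus one further predictor looks like. It is not the nnet::multinom fit described above: it uses plain gradient descent on an unconstrained softmax model (no reference category), reports no significance tests, and runs on invented toy data with a hypothetical coding (date scaled to 0-1, specific reference coded 0/1).

```python
import math

MARKERS = ["zero", "ne", "ljubo"]  # response categories (toy coding)


def softmax(scores):
    """Turn one linear score per category into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, x):
    """Category probabilities for predictor vector x (intercept appended)."""
    row = list(x) + [1.0]
    return softmax([sum(w * v for w, v in zip(wk, row)) for wk in weights])


def fit_multinomial(X, y, n_classes, lr=0.5, epochs=500):
    """Fit softmax regression by batch gradient descent on the log-loss."""
    n_cols = len(X[0]) + 1  # predictors plus intercept
    weights = [[0.0] * n_cols for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_cols for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            row = list(xi) + [1.0]
            probs = predict_proba(weights, xi)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)
                for j, v in enumerate(row):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_cols):
                weights[k][j] -= lr * grad[k][j] / len(X)
    return weights


# Invented data: (scaled date, specific reference?) with marker index.
data = [
    ((0.0, 0), 0), ((0.2, 0), 0), ((0.4, 0), 0), ((0.6, 0), 0),
    ((0.8, 0), 0), ((1.0, 0), 0), ((0.3, 0), 2), ((0.7, 0), 2),
    ((0.0, 1), 1), ((0.25, 1), 1), ((0.5, 1), 1), ((0.75, 1), 1),
    ((1.0, 1), 1),
]
X = [x for x, _ in data]
y = [label for _, label in data]
weights = fit_multinomial(X, y, n_classes=len(MARKERS))

p_specific = predict_proba(weights, (0.5, 1))
p_nonspecific = predict_proba(weights, (0.5, 0))
print(dict(zip(MARKERS, p_specific)))
print(dict(zip(MARKERS, p_nonspecific)))
```

In the toy data the marker depends on reference type and not on date, so the fitted probabilities change with the second predictor rather than the first. Assessing whether a date effect is statistically significant would additionally require standard errors or likelihood-ratio tests, which this sketch does not provide.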
The structure of our data, with many low-number variables, thus seems to call for a different statistical method than (multinomial) logistic regression. Recently, random forests have gained popularity in quantitative linguistics. Random forests are a robust, exploratory method from the field of machine learning and are used to classify complex data with many potentially relevant variables. The method works as follows: "Random forests […] work through the data and, by trial and error, establish whether a variable is a useful predictor. The basic algorithm used by the random forests constructs conditional inference trees. A conditional inference tree provides estimates of the likelihood of the value of the response variable […] on the basis of a series of binary questions about the values of predictor variables." (Baayen & Tagliamonte, 2012, p. 159) Random forests construct many conditional inference trees, each of them based on a random subset of the data. In this way, the results of the classification are more reliable than those of a single tree. For our dataset, the results (produced with the R packages party and randomForestExplainer (Paluszynska et al., 2020), with 2,000 iterations) are shown in Fig. 4.
The predictors farthest to the left are the most important ones, in the sense that these predictors are found statistically closer to the roots of the classification trees, and therefore are responsible for a larger number of the splits in the dataset. This means that date is not among the most important predictors, suggesting that language-internal factors such as the semantic or referential type play a more important role. While clearly relevant for other linguistic variables in the time period covered by our study, the variable date does not seem to have much of an effect on the choice of indefinite markers.
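The notion of a predictor being "closer to the roots" can be made concrete as its minimal depth: the depth of the first split on that variable in a tree, averaged over the trees of the forest. The Python sketch below computes this for three hand-written toy trees. The tree encoding, the variable names and the trees themselves are hypothetical, and the averaging is simplified (trees in which a variable never occurs are skipped), so it should not be read as a reimplementation of randomForestExplainer.

```python
def minimal_depths(tree, depth=0, found=None):
    """Map each variable to the depth of its shallowest split in one tree.

    A tree is either None (a leaf) or a tuple
    (split_variable, left_subtree, right_subtree); the root has depth 0.
    """
    if found is None:
        found = {}
    if tree is None:
        return found
    variable, left, right = tree
    if variable not in found or depth < found[variable]:
        found[variable] = depth
    minimal_depths(left, depth + 1, found)
    minimal_depths(right, depth + 1, found)
    return found


def mean_minimal_depth(forest):
    """Average minimal depth per variable over the trees that use it."""
    sums, counts = {}, {}
    for tree in forest:
        for variable, depth in minimal_depths(tree).items():
            sums[variable] = sums.get(variable, 0) + depth
            counts[variable] = counts.get(variable, 0) + 1
    return {v: sums[v] / counts[v] for v in sums}


# Three invented trees, not derived from the OES dataset.
forest = [
    ("semantics", ("reference", None, None), ("source", None, None)),
    ("semantics", ("source", ("date", None, None), None), None),
    ("reference", ("semantics", None, None), ("date", None, None)),
]
depths = mean_minimal_depth(forest)
ranking = sorted(depths, key=depths.get)
print(ranking)
```

A smaller mean minimal depth means that the variable tends to split larger portions of the data, which is the sense in which the leftmost predictors in such a plot count as the most important.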
While individual sources have a certain impact, whether or not the source containing the respective indefinite marker is an original work or a translation does not have any significant effect.
The most important factors determining the choice of indefinites can be seen in the visualization in Fig. 5 (for more specific information on how to interpret random forest visualizations, see https://cran.r-project.org/web/packages/randomForestExplainer/vignettes/randomForestExplainer.html).
The farther to the upper right the respective variable, the more important it is, in the sense that the analysis would lose accuracy if it were omitted. Interestingly, despite being relatively far up on the right, the p-value of the referential type is greater than 0.1, which means that it is not statistically significant. This leaves us with a single most important predictor of the choice of an indefinite marker: semantic type. Thus, we have empirically established that the data analysed conforms well to Haspelmath's semantic map, suggesting that his classification is superior to competing classifications.
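The idea that a variable is important if accuracy drops when it is left out can be sketched in Python as follows. This is a deliberately simplified stand-in: random forest software typically estimates the loss by permuting a variable on held-out observations, whereas the sketch literally drops the variable and uses a majority-vote lookup table evaluated on the same data. All rows, labels and function names are invented for illustration.

```python
from collections import Counter, defaultdict


def table_accuracy(rows, predictors, response):
    """In-sample accuracy of predicting the majority response
    for every combination of predictor values."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[p] for p in predictors)
        groups[key][row[response]] += 1
    correct = sum(max(counter.values()) for counter in groups.values())
    return correct / len(rows)


def accuracy_loss(rows, predictors, response):
    """Accuracy lost when each predictor is omitted in turn."""
    full = table_accuracy(rows, predictors, response)
    return {
        p: full - table_accuracy(
            rows, [q for q in predictors if q != p], response
        )
        for p in predictors
    }


def make_rows(semantics, formality, marker, n):
    """Build n identical toy observations."""
    return [
        {"semantics": semantics, "formality": formality, "marker": marker}
        for _ in range(n)
    ]


# Invented data in which only the semantic type determines the marker.
rows = (
    make_rows("specific", "high", "ne", 3)
    + make_rows("specific", "low", "ne", 2)
    + make_rows("conditional", "high", "zero", 3)
    + make_rows("conditional", "low", "zero", 2)
    + make_rows("free_choice", "high", "ljubo", 1)
    + make_rows("free_choice", "low", "ljubo", 1)
)
losses = accuracy_loss(rows, ["semantics", "formality"], "marker")
print(losses)
```

In this toy setting omitting formality costs nothing, while omitting the semantic type lowers accuracy from 12/12 to 5/12, mirroring in miniature the kind of contrast that an importance plot summarizes.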

Conclusion
In our analysis of indefinite pronouns in OES, we made use of a combination of quantitative and qualitative methods and applied statistical tools to the OES data.
Contrary to one of our hypotheses, the diachronic factor does not play a dominant role, nor does the formality of the texts or whether or not they are original works. The absence of any significant changes during the whole OES period means that the crucial restructuring in the system happened later, in the Middle Russian period (15th-17th centuries) when the new series of indefinites were grammaticalizing. This can be explained by the fact that no new indefinite markers came into being during the period in question. Internal factors, such as, first and foremost, semantic type, are more important in determining the choice of indefinite pronouns than external factors such as time. The only language-external factor with some degree of importance was the individual source. Therefore, we can conclude that separate subsystems of indefinite pronouns were not a feature of Church Slavonic and East Slavic vernacular texts.
In future studies, the somewhat unexpected result that such factors as date, formality, or original vs. translation play only a minor role in explaining linguistic variation in OES, should be statistically tested against other linguistic phenomena using the same methods applied in the current study. It may allow us to distinguish between linguistic phenomena that are more susceptible to being affected by factors such as tradition, formality or originality and the ones that are more resistant to them. The results of our analysis are also important for further typological and diachronic studies of indefinite pronouns, since in our case, the choice of a particular marker depended only to a very small extent on the influence of tradition or originality, being instead determined primarily by the internal logic of the language system.