The Indonesian prefixes PE- and PEN-: A study in productivity and allomorphy

This study examines two nominalizing prefixes in Indonesian, PE- and PEN-, which derive nouns from verbs with a range of meanings similar to that of the -er suffix in English. The prefix PE- is form-invariant, whereas PEN- has several nasal allomorphs. Given their similarity in form and function, the question arises of whether PE- and PEN- are allomorphs. We conducted a corpus-based analysis of their productivity, using the written Indonesian corpus in the Leipzig Corpora Collection. In this corpus, PEN- is apparently more productive than PE-. Interestingly, the frequency of words with PEN- correlates significantly with the frequency of the corresponding base verbs. In addition, PEN- is more integrated into the verbal system; verbs that have PEN- are part of larger verb families. PEN- attaches almost exclusively to verbs and creates nouns denoting agents and instruments. By contrast, PE- creates nouns denoting agents and patients and attaches not only to verbs but also to nouns and adjectives. For derived words with PE-, there is no significant correlation between the frequency of the nominalization and the frequency of its base. Nor does PE- participate in the linear relation between the productivity of the allomorphs of the base verbs and the productivity of the corresponding allomorphs of the derived nouns that characterizes PEN-. Words with PE- are also more often input to reduplication and further inflection than words with PEN-. This corpus-based research thus illustrates that affixes can have different qualitative and quantitative properties, although at first blush they look like allomorphs. Our analyses justify their treatment in the Indonesian literature as separate prefixes.


Introduction
The question addressed in this study is whether two phonologically very similar prefixes of Indonesian are allomorphs or rather independent prefixes. According to the classical definition of allomorphy, variants of a morpheme which have the same underlying form, which share the same meaning, and are in complementary distribution, are classified as allomorphs (Bloomfield 1933; Alber 2011). When two different affixes express roughly the same semantics, they are referred to not as allomorphs but as rival affixes (Aronoff and Anshen 2017). Conversely, when the same form signifies completely different semantic functions, as is the case for English -s (third person singular vs. plural inflection vs. third person genitive, ...; Plag et al. 2017), we have affix homonymy. Less clear-cut are cases where formatives are obviously similar in form as well as in meaning, without the form similarity being phonologically conditioned. For instance, Peters (2004) argued that English -er and -eer are allomorphs where the choice of -eer is semantically conditioned on the referent being from the semantic field of war. Baayen et al. (2013) discussed the Russian prefixes pere- and pre-, which are etymologically related but express subtly different semantics. Endresen (2014) provides detailed discussion of the limitations of the classical definition of allomorphy. She points out that there are counterexamples where other parameters should be taken into account, such as subtle differences in meaning as exhibited by the Russian affix pairs s- vs. so- 'together', o- vs. ob- 'around', pere- vs. pre- 'across', vz- vs. voz- 'up', and vy- vs. iz- 'out of'. The two Indonesian prefixes that are the subject of this study likewise raise the question of whether these prefixes are allomorphs, given their phonological similarity, or separate prefixes. As pointed out by Denistia (2018), Indonesian linguists have mainly described the two morphs as independent prefixes (Ramlan 2009; Sneddon et al. 2010), but there are also studies that take them to be allomorphs (Darwowidjojo 1983; Kridalaksana 2008). Since these two prefixes are similar in form, but not phonologically conditioned, and since they are similar, but not identical in meaning, the classical criteria for allomorphy are only approximately satisfied. Thus, the present study is a corpus-based investigation into what Endresen (2014) refers to as non-standard allomorphy. Specifically, we examine in detail the differences in the semantics of PE- and PEN-, the differences in their productivity, and the differences in the extent to which derived words with PE- and PEN- are input to further inflection. In our analyses, the paradigmatic relations between base words and derived words are especially informative. In what follows, we first introduce some basic aspects of Indonesian verb morphology and deverbal nominalization. In the next section, we introduce the databases that inform our analyses. We then present our analyses and conclude with a discussion of the results obtained.
The nasal allomorphy of Indonesian MEN- and PEN- is an example of classical phonologically conditioned allomorphy.
Verbs with MEN- can be extended with the suffixes -i and -kan. MEN- typically renders a verb explicitly transitive. The suffixes -i and -kan add a further argument, either a beneficiary or a causer, while often at the same time expressing intensification or iteration (Arka et al. 2009; Sutanto 2002; Tomasowa 2007; Kroeger 2007; Sneddon et al. 2010).
Occasionally, one finds both PEN- and PE-. There are 5 cases in which the form with PE- refers semantically to a profession and the form with PEN- does not, as listed in (9). There are also some cases in which the form with PEN- expresses agent, causer, or instrument and the form with PE- expresses patient or agent. Of this kind, 7 instances are attested in our database, as listed in (10).
The goal of this paper is to clarify the morphological status of PE- and PEN-, allomorphs or separate prefixes, through a quantitative survey of their productivity, their paradigmatic relations with their base verbs, and the extent to which these derived nouns are input for further inflection. Indonesian inflection comprises several bound morphs: -ku, -mu, and -nya for first, second, and third person singular possessives or objects, and ku- and kau- for first and second person subjects (Sneddon et al. 2010). In the Indonesian literature, these bound morphs are referred to as clitics, as they are phonologically reduced forms of free pronouns (Kridalaksana 2008). There are also suffixes that attach to verbs or nouns to express emphasis (-lah and -pun) or questioning (-kah). In what follows, we will refer to these morphs as inflectional, as they do not give rise to new onomasiological units but rather modify existing words much in the same way as adverbs modify verbs in English. Indonesian also has reduplication, which is used to express the plural for nouns and realizes various semantic functions on verbs and adjectives, including intensification and iteration (Sugerman 2016; Chaer 2008; Rafferty 2002; Dalrymple and Mofu 2012). Following Booij (1996), we distinguish between inherent and contextual inflection. Agreement marking on verbs (e.g., ku- and kau-) exemplifies contextual inflection, which is syntactically governed. Inherent inflection is more similar to word formation and hence in some languages can feed derivation and compounding. For instance, in Dutch, plural nouns can appear as left constituents in compounds (Schreuder et al. 1998).
Reduplication in Indonesian is inherent inflection: it is not governed by syntactic context (marah-marah 'very angry', anak-anak 'children', berhenti-berhenti 'to stop repeatedly'), and can feed further inflection, as in memukul-mukuli, 'to hit intensively over and over again', which has as parse [[[meN + [pukul] We shall see below that derived words with PE- are more often input to these inflectional processes than derived words with PEN-. We will argue that the joint quantitative evidence justifies analysing PE- and PEN- as two distinct prefixes rather than as allomorphs. In the next section, we introduce the databases that we derived from a 36 million token corpus of written Indonesian (Goldhahn et al. 2012).

Materials
We created a database from the Indonesian corpus that is part of the Leipzig Corpora Collection at http://corpora2.informatik.uni-leipzig.de/download.html, accessed in April 2016. This corpus comprises a variety of written registers (the web, newspapers, Wikipedia) dating from the years 2008-2012 (Goldhahn et al. 2012). There are 112,025 different word types in this corpus, which occur in 2,759,800 sentences, for a total of 36,608,669 word tokens.
The words in the corpus were morphologically analyzed using the MorphInd parser, which has an overall accuracy of 84.6% (Larasati et al. 2011). The parser was run in single word mode, i.e., compounds were not parsed. Prior to running the parser, the 200 words with PE- or PEN- that contained a typo were corrected manually. The MorphInd parser's results for PE- and PEN- were checked and corrected manually against the online version of Kamus Besar Bahasa Indonesia (hereafter called the dictionary), a comprehensive dictionary of Indonesian (http://kbbi.kemdikbud.go.id; accessed in June 2016), to verify the morphological status and semantics of the PE- and PEN- words. We made use of the fourth edition, published in 2012, which has more than 90,000 lemmas (Alwi 2012). The language it records is formal; it omits words that are considered slang or foreign. Where the dictionary and MorphInd are in conflict, we followed the dictionary. Where the dictionary does not provide information on the word category of the base, we followed the MorphInd parser. The precision of the parser for these words was 0.98 and its recall was 0.82, using the dictionary as the gold standard complemented with manual verification for out-of-vocabulary words.
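To make the evaluation measures concrete, the following Python sketch shows how precision and recall can be computed once the parser's output and a gold standard are both available as sets of word forms. It is only an illustration under stated assumptions: the function name `precision_recall` and the toy word lists are hypothetical, and the sketch is not the R code used to build the databases.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of a predicted set against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of the parser's analyses that the gold standard confirms.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of the gold-standard items that the parser found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the corpus.
gold = {"pengikut", "penyakit", "pesohor", "penghujung", "penglihat"}
# 'pesta' stands in for a false positive: a word that merely begins with pe.
parsed = {"pengikut", "penyakit", "pesohor", "pesta"}

precision, recall = precision_recall(parsed, gold)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

With a high precision and a lower recall, as reported above, most of what the parser labels is correct, while a sizeable share of the relevant forms has to be added by hand.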
Sample output of the parser is shown in Table 1: a morphological segmentation is provided where available, as well as a word category label. Table 1 shows that Mor- . Therefore, the output of the parser was manually checked and corrected when necessary. We processed the data using the R (version 3.3.2) programming language (R Team 2008) in R Studio (R Team 2015). The databases and the R scripts used to construct these databases are available online at http://bit.ly/PePeNProductivity. In what follows, we first present the database with Indonesian verbs, and then proceed to the database with derived nouns with PE-and PEN-.

The database of Indonesian verbs
Indonesian has deverbal morphology for active, passive, causative, and transitive semantics, among others; see Table 2 for examples. From the corpus, we retrieved all verbs recognized by the MorphInd parser and brought these together in a database. The total number of types in the database is 26996. Table 3 illustrates that for each verb, we provide information on the derived word's frequency in the corpus, the parse provided by MorphInd, the base word, the word category of the base, and the affix or affixes in the verb. When particles (e.g. -lah, -kah, -pun) or affixes (e.g. ku-, -ku, kau-, -mu, -nya) are found attached to a verb (Sneddon et al. 2010; Sugerman 2016), this form is listed with its own entry. The database comprises 2489 simple verbs and 24507 affixed verbs (3665 verbs with suffixes, 11562 verbs with prefixes, and 9280 verbs with both prefix and suffix). We observed 27 verb constructions, of which 13 are reported in the literature (Hidajat 2014; Fortin 2006; Sneddon et al. 2010; Benjamin 2009; Arka et al. 2009; Sudaryanto 1993; Kridalaksana 2007). In our corpus, 2 of the attested verb constructions (terke-/-an and terper-/-an) are not productive (1 token and 8 tokens respectively). Table 2 lists the 25 productive constructions. As our specific interest is in nouns with PE- and PEN-, we extracted from this database all the verbs that correspond to these nominalizations and that carry the prefix BER- or MEN-. To this new database, henceforth the MeBer Database, we added information on the frequency of the base words of these complex verbs, whether the verbal prefix is MEN- or BER-, and also the allomorph of MEN- (see Table 4). Whereas all nominalizations with PEN- have a corresponding verb with MEN-, there is one simple verb, sohor 'to be famous', that has a corresponding nominalization with PE-, pesohor 'a famous person', without having a corresponding verb with BER-.
This verb-noun pair is not in the MeBer Database, but in a separate database (SimpleWords) which also specifies the frequency of the base verb and the frequency of the derived noun (see Sneddon et al. (2010) for discussion of such exceptional pairs). All the data in the MeBer Database were compiled computationally from the output of MorphInd and subsequently checked manually using the dictionary. In total, there are 8484 words with the MEN- prefix and 3582 words with the BER- prefix. These counts include forms with the suffixes -i, -kan or -an. To this database, we added some words such as beserta 'to be together with', belajar 'to study', beternak 'to farm', bekerja 'to work', and beterbangan 'to fly randomly' and their inflectional variants, forms which MorphInd did not recognize but that we happened to identify in the course of this study. The MorphInd parser also does not recognize verbs with the allomorph menge-. For the 18 nominalizations with the PEN- allomorph penge-, we manually searched for the occurrences of the corresponding verbs and added these together with their frequency counts to the MeBer Database. Finally, a total of 297 verbs with MEN- and 14 verbs with BER- were not recognized by the parser, and were corrected manually on the basis of the dictionary.

The PePeN database
We brought together the PE- and PEN- words in a lexical database, henceforth the PePeN database. This database also includes the nouns with PE- that have a simple verb as the base. In this way, we obtained a total of 3090 words, 267 with PE-, 2818 with PEN-, and 4 words with the unproductive variant PER- (Benjamin 2009). There are 34 words that the MorphInd parser did not analyze. All derived words were annotated manually for semantic role (agent, instrument, causer, patient, and location), and checked (for at least one token) against both the dictionary and usage in the corpus. As in English, where -er nominalizations may express multiple semantic roles (Booij 2010; Booij and Lieber 2004) (e.g. printer, which has both an agent and instrument reading), Indonesian PE- and PEN- formations can have multiple interpretations (see Table 5). In this study, we did not distinguish between impersonal agent and instrument. Although it is well known that PEN- creates agents, patients, and instruments (Sneddon et al. 2010), we observed a few cases of causer (e.g. penyakit 'disease') and location (e.g. penghujung 'the end') in our database. It is possible, even likely, that semantic roles are in use in the corpus without being registered in the database, as manual verification of all 579564 tokens with PE- or PEN- in the corpus was infeasible. Words with more than one semantic role have multiple entries in the database, with one row for each role (cf. Table 5). The frequencies listed in the rows of Table 5 are the overall frequencies of the words and are not broken down by semantics.
The PePeN Database thus provides the following information:
1. Word frequency: the token frequency of the derived word in the corpus.
2. Allomorph: the form of the PEN- prefix; where the allomorph does not follow the rules as given in Chaer (2008) and Sneddon et al. (2010), e.g. penglihat 'seer' is expected to be pelihat, this is marked in the 'notes' column of the database as AllomorphDeviation.
3. Base word.
4. Word category of the base word.
5. Base word frequency: the token frequency of the base word in the corpus.
6. MorphInd output as illustrated in Table 1.
7. Semantic role of the derived noun with respect to the base word (agent, instrument, patient, ...).
8. Morphological variation: reduplications, particles (e.g. -lah, -pun, per-) or affixes (e.g. -ku, -mu, -nya), if present.
9. Typo: whether the form in the corpus had a spelling error (corrected in the database; frequency counts include the frequency of the corrected typos). When several spelling alternants are in use, this is indicated in the FreeVariance column of the database as illustrated in Table 7.
Entries of this database are listed in Table 6.
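As an illustration of this layout, the Python sketch below models one database row as a record and shows how a word with several semantic roles yields one row per role, each carrying the same overall word frequency. The class and field names, the helper `expand_roles`, and all frequencies are hypothetical; the example word pemukul (from pukul 'to hit') is assumed here, for illustration only, to have both an agent and an instrument reading.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class PePeNEntry:
    """One row of a PePeN-style table (field names are assumptions)."""
    word: str
    word_frequency: int        # overall token frequency, not split by role
    prefix: str                # "PE" or "PEN"
    allomorph: Optional[str]   # form of the PEN- prefix; None for PE-
    base: str
    base_category: str         # word category of the base
    base_frequency: int
    semantic_role: str
    variation: str = ""        # reduplication, particles, affixes, if present
    notes: str = ""            # e.g. "AllomorphDeviation"


def expand_roles(entry: PePeNEntry, roles: List[str]) -> List[PePeNEntry]:
    """Return one row per semantic role, leaving all other fields unchanged."""
    return [replace(entry, semantic_role=role) for role in roles]


# Hypothetical values, for illustration only.
template = PePeNEntry(
    word="pemukul", word_frequency=120, prefix="PEN", allomorph="pem",
    base="pukul", base_category="verb", base_frequency=3400, semantic_role="",
)
rows = expand_roles(template, ["agent", "instrument"])
for row in rows:
    print(row.word, row.semantic_role, row.word_frequency)
```

Because the frequency is repeated on every row, token counts should be summed over distinct words rather than over rows.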

Productivity of PE-and PEN-derived nouns
The PE- and PEN- prefixes differ in their productivity. As shown in the upper panel of Table 8, PEN- occurs with more tokens, more types, and more hapax legomena compared to PE-. Further detail is provided by the lower panel of Table 8, which shows the numbers of tokens, types, and hapaxes for the PEN- allomorphs and PE-. Figure 1 presents the corresponding rank-frequency curves (Zipf 1935, 1949). The left panel clarifies that the highest ranked words with PEN- also exceed in frequency the highest ranked words with PE-. Nevertheless, the productivity index V1/N (Baayen 2009) remains greater for PEN- (0.00118) than for PE- (0.00055). The second panel of Fig. 1 shows that four of the six allomorphs of PEN- have rank-frequency curves that lie above the rank-frequency curve of PE-. The curve for the PEN- allomorph peny- crosses the curve for PE- around rank 50, but still shows many more low-frequency formations. The only allomorph that is less productive than PE- is the PEN- allomorph penge-, an allomorph that attaches to monosyllabic words and which appears in the corpus with only 18 types.
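The counts behind these measures are simple to derive from a frequency list. The Python sketch below computes the number of tokens N, the number of types V, the number of hapax legomena V1, and the productivity index V1/N for one affix. The function name `productivity` and the toy frequencies are assumptions made for illustration; the sketch is not the R code used for Table 8.

```python
def productivity(frequencies):
    """Compute N, V, V1 and V1/N from a mapping of word type -> token frequency."""
    n_tokens = sum(frequencies.values())                          # N
    n_types = len(frequencies)                                    # V
    n_hapaxes = sum(1 for f in frequencies.values() if f == 1)    # V1
    index = n_hapaxes / n_tokens if n_tokens else 0.0             # V1 / N
    return {"N": n_tokens, "V": n_types, "V1": n_hapaxes, "P": index}


# Hypothetical toy frequency list for a single prefix.
toy = {"form_a": 5, "form_b": 1, "form_c": 1, "form_d": 3}
stats = productivity(toy)
print(stats)
```

Since N is in the denominator, a few very frequent formations lower the index; this is why a prefix can have higher-ranked words and still a higher V1/N only if it also has proportionally many hapaxes.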
Given the similarity in form of PE- and PEN-, the question arises of whether it makes sense to consider PE- as a low productivity allomorph of PEN-. To address this question, we examined the counts of types and hapax legomena for PE- and the allomorphs of PEN-. Fig. 2 shows that the rate at which base verbs give rise to derived nouns is the same (according to a regression model) for all allomorphs of MEN- and that PE- patterns as an outlier, both with respect to type counts and with respect to hapax legomena. It is remarkable that the rate at which hapaxes and types appear is so constant across the allomorphs of PEN- and MEN-. From this, we draw the conclusion that the outlier PE- is best understood as a formative in its own right. We note here that Indonesian PEN- and MEN- offer a remarkable window on the relation between base productivity and derived productivity. Further evidence that PE- is not an allomorph of PEN- emerges when we take the semantic roles of the derived noun into account. Table 9 cross-tabulates PE- and the allomorphs of PEN- by the semantic roles of these nouns in our database; χ²(1) = 81.32, p < 0.0001; interestingly, the few patient nouns with PEN- are realised with the allomorph pe-; however, the proportion of patient hapaxes is much lower (0.02 for PEN- and 0.13 for PE-, p < 0.015, proportion test). Conversely, PEN- is productive for instruments, which are virtually absent for PE-. This may be one of the reasons that PEN- is more productive than PE-. For PEN-, a chi-squared test indicates that the ratios of agents to instruments are proportional across all allomorphs (χ²(5) = 1.01, p > 0.1 and χ²(5) = 5.48, p > 0.1 for types and hapax legomena, respectively). The uniformity of semantic functions across the allomorphs of PEN- is perfectly in line with the fact that these allomorphs are phonologically conditioned.
Conversely, the lack of productivity for instruments that characterizes PE-, and its (limited) productivity for patient nouns that is strongly attenuated for PEN-, is a further indication that PE- is unlikely to be an allomorph of PEN-. Thus, Indonesian PEN- and PE- show the kind of semantic specialisation that led Baayen et al. (2013) to conclude that Russian pere- and pre- are not allomorphs but independent prefixes.
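The proportionality tests reported above are chi-squared tests on contingency tables. As a rough Python sketch of the underlying arithmetic, the function below computes the Pearson chi-squared statistic and its degrees of freedom for a table of counts; a table with six allomorphs as rows and agent versus instrument as columns has 5 degrees of freedom, matching the tests on the PEN- allomorphs. The function name and the toy counts are hypothetical, no continuity correction is applied, and the p-value is not computed here.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a table of counts.

    `table` is a list of rows, each a list of non-negative counts.
    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: rows are two affixes, columns are two semantic roles.
skewed = [[10, 20], [20, 10]]
proportional = [[10, 20], [5, 10]]

stat_skewed, df_skewed = chi_squared(skewed)
stat_prop, df_prop = chi_squared(proportional)
print(round(stat_skewed, 3), df_skewed)
print(round(stat_prop, 3), df_prop)
```

A statistic near zero, as for the second toy table, is what proportional agent-to-instrument ratios across allomorphs look like; a large statistic signals that the rows differ in their distribution over roles.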
The counts underlying Table 9 and Fig. 3 are based on a type definition that distinguishes between forms of the noun with different possessive suffixes or suffixes expressing emphasis, as well as noun plurals. When such variants are collapsed into a single type, the pattern of results on the ratios of agents to instruments across all allomorphs remains similar (χ²(5) = 0.75, p > 0.1 and χ²(5) = 5.11, p > 0.1 for types and hapax legomena, respectively). However, the number of distinct types for patient nouns with PE- reduces to 5, each of which occurs more than once. Thus, PE- appears to be well-entrenched for a handful of patient nouns, but does not show real productivity here.
Krott et al. (1999) reported the paradoxical finding that words with less productive affixes tend to be used more as base words for further word formation. A similar observation holds for PE- and PEN-, but now for inflection rather than word formation. Inflectional variation is well illustrated by the noun pengikut 'follower', which is attested in the corpus with 9 variants: pengikutku 'my follower', pengikutmu 'your follower', pengikutnya 'his/her follower'; reduplication as in pengikut-pengikut 'followers'; reduplication and affixes as in pengikut-pengikutmu 'your followers', pengikut-pengikutnya 'his/her followers'; affixes and particles as in pengikutmupun 'your follower' (contrastive your, i.e., your, not somebody else's follower), pengikutnyapun 'his/her follower' (contrastive), pengikutnyalah 'his/her follower' (contrastive in imperative mood). Table 10 shows the counts of the different kinds of inflection types for PE- and PEN-. In our corpus, particles (e.g. -lah, -pun), possessive suffixes (e.g. -ku, -mu, -nya), and plural reduplications are used most often. Figure 4 presents a mosaic plot for the cross-classification of PE- and PEN- by type of inflection. The mosaic plot shows that inflected forms of PE- are overrepresented for particles, plurals, and combinations of plurals and possessives.
In other words, the less productive prefix, PE-, is used more intensively as input for further inflection than is the case for PEN-. This is likely to be due to the greater entrenchment of words with PE- in the mental lexicon, which makes them more readily available for further affixation. Thus, the same principles that Krott et al. (1999) reported for derivation in Germanic languages generalize to inflection in Indonesian.

The base verbs of PEN-and PE-: MEN-and BER-
Several studies call attention to the tight relation between PE- and PEN- and their verbal base words (Putrayasa 2008; Chaer 2008; Ramlan 2009; Kridalaksana 2007; Darwowidjojo 1983). We therefore inspected the productivity of verb formation, focusing on monomorphemic words as potential base words. In our database, a total of 5581 such monomorphemic words is attested, with 3617 simple nouns, 943 simple adjectives, and 1021 simple verbs. As shown in Table 2, a large number of affixes is available for creating verbs from nouns, adjectives, and verbs. For this study, the number of different complex verb forms will be referred to as a monomorphemic word's verb family size. The verb family size measure includes inflectional variants of the verbs in its counts. Plots of this verb family size against base frequency show that, as expected, a higher base frequency predicts a greater verb family size. Interestingly, the functional form of this relation is different for base words that give rise to nouns with PEN- and those that do not. This is illustrated in Fig. 5 (see also Table 11), which presents the results of a GAM (Generalized Additive Model, mgcv package version 1.8-17; Wood 2006, 2011) with a Poisson link fitted to the verb family size with centered log base frequency as the predictor. The increase of verb family size with base frequency is greater when PEN- is present, as can be seen by comparing the right panel with the left. In the right panel, we see a linear increase, whereas in the left panel, there is no increase at all for the lowest frequency base words. For the larger part of the range of the base word frequencies, the verb family size is larger if the verb family has a noun with PEN-.
We also considered the base words with PE- in the verb family, but as the resulting curve was not significantly different from that of base words with verb families that did not have either nominalization, the two sets were merged into one defined by the absence of PEN- in the verb family. Apparently, base productivity and derived productivity interact for PEN-, but are independent for PE-.
[Figure: cross-classification of monomorphemic base words by word category and by nominalization in the verb family (without PE- and PEN-, with PE-, or with PEN-).]
The plot shows that simple words that give rise to affixed verbs but not to any formations with PE- or PEN- are overrepresented for nouns, and that base words that have PEN- in their verb family are overrepresented for verbs, unsurprisingly (χ²(4) = 839.97, p < ...). The right panel concerns monomorphemic base words for which the verb family size is zero. Again, we see that base words that have PEN- in their verb family are overrepresented for verbs (χ²(4) = 288.58, p < 0.0001). No such overrepresentation is visible for PE-. Whereas the literature on PE- and PEN- holds that PEN- is derived from verbs with MEN-, our corpus data indicate that PEN- actually can attach to simple words that do not have a corresponding verb with MEN-, even though the total number of instances is small (45). It is possible that the relevant MEN- verbs are in use in the language, but not attested in our corpus. Alternatively, it is conceivable that these MEN- verbs only have a virtual existence as possible words.
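The verb family size measure used in this section can be illustrated with a small Python sketch: for each monomorphemic base, it counts the distinct complex verb forms (inflectional variants included) that are attested for that base. The function name, the input format of (verb form, base) pairs, and the toy entries are assumptions for illustration; the model fitting itself (a GAM in R) is not reproduced here.

```python
from collections import defaultdict


def verb_family_sizes(verb_entries):
    """Count distinct complex verb forms per base word.

    `verb_entries` is an iterable of (verb_form, base) pairs.
    Simple verbs (form identical to the base) are not counted as complex forms.
    """
    families = defaultdict(set)
    for form, base in verb_entries:
        if form != base:
            families[base].add(form)   # a set, so repeated forms count once
    return {base: len(forms) for base, forms in families.items()}


# Toy entries; the forms are illustrative, not a sample of the database.
entries = [
    ("memukul", "pukul"),
    ("memukuli", "pukul"),
    ("memukul-mukuli", "pukul"),
    ("memukul", "pukul"),      # duplicate entry, counted once
    ("bekerja", "kerja"),
    ("pukul", "pukul"),        # simple verb, not a complex form
]
sizes = verb_family_sizes(entries)
print(sizes)
```

Bases without any complex verb form do not appear in the result; in an analysis that includes them, they would have to be added with a family size of zero.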
We have seen that PEN- is more productive than PE- and more tightly integrated into the verbal system. This raises the question of whether the reduced productivity of PE- might be due to reduced productivity of the verbal prefix BER-. Indeed, verbs with MEN- are more productive overall than verbs with BER- (2714704 tokens with MEN- vs. 801052 tokens with BER-, 5174 types with MEN- vs. 2869 types with BER-, and 996 hapax legomena with MEN- vs. 760 hapax legomena with BER-); see also Table 12 and the rank-frequency plot for BER- and MEN- in the left panel of Fig. 7. However, when considering the allomorphs of MEN- separately, it turns out that BER- is more productive than any of these allomorphs, as shown in the right panel of Fig. 7. Although BER- is more productive than any of the allomorphs of MEN-, it is not the case that PE- is proportionally more productive than any of the allomorphs of PEN-. It follows that the modest productivity of PE- is not a straightforward consequence of the lack of productivity of BER-. This conclusion receives further support from the presence of a significant correlation between the frequency of the MEN- base and the PEN- nominalization (r_s = 0.4397, p < 0.0001) and the absence of such a correlation for BER- and PE- (r_s = 0.1908, p = 0.1711).
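The coefficient r_s reported here is a Spearman rank correlation. The following Python sketch shows one way to compute it for paired frequencies of base verbs and nominalizations: both frequency lists are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is taken. Function names and the toy frequencies are hypothetical, and no significance test is included.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired frequencies: base verb vs. derived nominalization.
base_freq = [10, 200, 35, 4000, 7]
noun_freq = [1, 30, 2, 150, 3]
rho = spearman(base_freq, noun_freq)
print(round(rho, 4))
```

Working on ranks rather than raw counts is a natural choice for word frequencies, which are heavily skewed: a handful of very frequent words would otherwise dominate the coefficient.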

General discussion
We have presented a quantitative investigation of the use of two nominalizing prefixes of Indonesian: PE- and PEN-. Although the two prefixes are quite similar in form, nouns with PE- are described in the literature as derived from verbs with the prefix BER-. Conversely, nouns with PEN- typically originate from verbs with the prefix MEN-, and show the same allomorphy in the same conditioning contexts as these prefixed verbs. In this paper, we addressed three questions. First, do PE- and PEN- differ with respect to their degree of productivity? Second, how does their productivity relate paradigmatically to the productivity of their base words? Third, given the similarity in form of PE- and PEN-, should they be taken to be allomorphs? To answer these questions, we examined the use of these nominalizations and their base words in a corpus of written Indonesian.
With regard to their productivity, PEN- is clearly more productive than PE- by any measure of productivity. In fact, PE- is less productive than any of the allomorphs of PEN-, the only exception being the allomorph penge-, for which only 18 words are attested. PEN- is productive for agents and instruments, whereas PE- is productive for agent nouns and to some small extent for patient nouns. Nouns with PE- and PEN- reveal the same productivity paradox that was reported by Krott et al. (1999) for derivation and compounding. Krott et al. observed that less productive morphological categories are used more intensively as input for further word formation. In our data, we likewise find that the less productive prefix, PE-, appears with more variants compared to PEN-.
Whereas words with PE- are more readily accessible for further inflection compared to PEN- (see Fig. 4), words with PEN- emerge as paradigmatically more entrenched. Verbs to which PEN- attaches tend to allow for more verbal affixation than is the case for verbs to which PE- attaches (see Fig. 5). Furthermore, the productivity of the allomorphs of PEN- mirrors the productivity of the allomorphs of their base words with MEN- (see Fig. 2). The proportionalities that govern the types and hapaxes of the allomorphs of MEN- and PEN- do not extend to BER- and PE-. In fact, PE- is surprisingly uncommon with base verbs with BER-, which is not what standard descriptions in the literature, according to which PEN- is derived from MEN- and PE- is derived from BER- (Chaer 2008; Ramlan 2009; Ermanto 2016; Sneddon et al. 2010; Putrayasa 2008; Darwowidjojo 1983; Benjamin 2009), would lead one to expect.
It is well known that the productivity of an affix can vary depending on the structure of its base words (Aronoff 1976; Baayen and Renouf 1996). Nevertheless, it is surprising to see an almost perfect linear relation between the productivity of the allomorphs of MEN- and the productivity of the allomorphs of PEN-, both with respect to types and with respect to hapax legomena. This linear relationship strongly supports analyses according to which the variant forms of PEN- and MEN- are allomorphs. Our examination of the use of PE- and PEN- in written Indonesian revealed some novel uses that have not been noted in the preceding literature on allomorphy.
This raises the question of whether PE- should be considered to be yet another allomorph of PEN-. Several observations argue against this possibility. First, PE- does not participate in the linear dependence that characterizes the productivity of the allomorphs of MEN- and PEN-. Second, our data indicate that PEN- has a strong preference for verbs as base words, but PE- does not show such a preference. Third, a monomorphemic base word's verb family tends to be larger when this verb family gives rise to a nominalization with PEN-, but no such tendency is present for PE-. Fourth, the frequencies of words with PEN- enter into a significant correlation with the frequency of the base words, but no such correlation is present for PE-: the formations with PE- have become independent of their base words. Finally, PE- is proportionally overrepresented for patient nouns, whereas PEN- creates primarily instruments in addition to agents.
That allomorphy is to some extent a matter of degree is well known (Baayen et al. 2013; Endresen 2014). Obviously, PE- is highly similar in form to PEN-; in fact, it is identical to one of its allomorphs (although it is possible that phonetically the two are different; see Plag et al. (2017) for durational differences between the realisations of English -s depending on the semantic functions expressed). Yet, even though PE- and PEN- are largely in complementary distribution, they differ substantially in their productivity, both quantitatively and qualitatively, as well as in their entrenchment in the verbal system of Indonesian.