1 Introduction

The notion of the morphome refers to a pattern of partial or total morphological syncretism that does not correspond to any syntactically or semantically natural domain. Since the notion was first introduced by Aronoff (1994), the existence of a morphomic level in grammar has been defended by those who believe that morphology can have rules and structures of its own. Most of the subsequent literature on morphomes, spearheaded by Maiden (e.g. 2018b), has argued quite forcefully for autonomous morphology, and has done so almost exclusively on the basis of the patterns of allomorphy and diachronic developments observed in the Romance verb (notable exceptions include Stump, 2015:128–140, Enger, 2021; Feist & Palancar, 2021; Herce, 2023). Romance stem alternations, thus, have become the most crucial object of analysis in this so-called ‘Morphome Debate’ (Luís & Bermúdez-Otero, 2016). Unfortunately, their diachronic discussion to date has been mostly qualitative in nature (but see Gaglia, 2020) and comparatively subjective. It has relied on carefully chosen examples and analyses usually aimed at supporting a previously held theoretical stance. This type of research may help us understand individual cases better, but it fails to provide a comprehensive (and representative) picture of morphomes’ productivity and evolution across the family, and is hence unlikely to resolve the broader debate on how productive or autonomous Romance stem alternations really are.

The present paper aims to fill this gap in the literature. It constitutes the first attempt to approach the topic in a replicable, quantitative way. It relies on a predefined sample of Romance varieties and paradigms, all of which will be considered (i.e. no cherry-picking), and presents evidence for or against the productivity (and psychological reality?) of Romance morphomes. It will do so by exploring patterns of stem allomorphy that emerged late in the history of the family, i.e. in a way unrelated to the historical processes that generated the inherited morphomic structures N, L, PYTA and FUÈC (see next section and Maiden, 2018b) in the common ancestor of (most) Romance varieties. More specifically, it will survey the loss of stem-final consonants and the associated (e.g. hiatus-avoiding) morpho-phonological changes across Romance, and will quantify how often these alternations abide by the inherited morphomic templates and how they deviate from them when they do not. Section 2 provides a brief overview of Romance morphomes. Section 3 discusses the exact object of analysis, and Sect. 4 the theoretical interest of the topic. Section 5 presents the data sources and methodological choices employed. Section 6 presents the results in detail and Sect. 7 discusses their interpretation. Section 8 summarizes the main conclusions and avenues for future research.

2 Introduction to Romance morphomes

Romance verbal stem alternations have been, for the last three decades, one of the most thoroughly studied morphological objects in the languages of the world. Proponents and detractors of autonomous morphology have written numerous papers on (specific aspects of) Romance morphomes (see Tables 1 and 2) and differ in how much synchronic and diachronic importance they attach to these paradigmatic structures (Steriade, 2016; Bermúdez-Otero & Luís, 2016; Maiden, 2016). It has become customary to refer to different inherited patterns of stem alternation by the names N, L, PYTA, and FUÈC (see Maiden, 2018b, and Esher, 2014).

Table 1 Spanish quer-er ‘want’ (GER quer-iendo, SG.IMP quier-e, PL.IMP quer-ed)
Table 2 Spanish sal-ir ‘exit’ (GER sal-iendo, SG.IMP sal-∅, PL.IMP sal-id)

N alternations [Footnote 1] (see stem quier- in Table 1) constitute the morphologized effects of sound changes that introduced differences between stressed and unstressed stem vowels. Stem-stressed forms (SG and 3PL forms of the present, and 2SG imperative) constitute the expected domain of N stems.

L alternations (see stem salg- in Table 2) constitute the morphologized results of sound changes palatalizing coronal consonants before /j/ and velar stops before front vowels. The 1SG.PRS.IND and the PRS.SBJV constitute the inherited domain of these special stems, in addition to the 3PL.PRS.IND in some varieties.

PYTA alternations (see stem quis- in Table 1) refer to the ones inherited from the Infectum-Perfectum stem distinctions that many Classical Latin verbs showed as a marking of aspect. Former-perfective tenses (which underwent semantic change so that they became no longer characterized by a common semantic value) are the expected domain of PYTA.

Last, FUÈC alternations (see stem saldr- in Table 2) constitute the morphological outcome of univerbation into synthetic forms of earlier periphrases with the verb habeō ‘have’. The strictly morphomic status of these alternants depends on whether the future and conditional tenses, which constitute FUÈC’s paradigmatic domain, are understood to have or lack some value(s), like ‘irrealis’, that distinguish them from others. [Footnote 2]

3 The phenomenon explored: CVC>C(V) stems

The diachronic origins of the Romance morphomes/alternations outlined above are associated with certain paradigmatic environments and certain typical segmental realizations: Stem vowel/diphthong alternations in N, /k/, /g/, and palatal stem-final consonants in L, etc. Later sound changes and morphological changes in the history of Romance, however, generated other types of stem alternations. One of these, different from the typically morphomic ones, will be surveyed in the present paper.

Phonologists usually regard CV.CV as the cross-linguistically preferred syllable structure (Jakobson, 1962; Blevins, 1995). A hiatus is defined as a heterosyllabic sequence of vowels (i.e. V.V), and thus does not conform to this preferred phonotactic template. Many (maybe most) languages have (had) phonological rules or sound changes aimed at avoiding or ‘repairing’ these sequences. These occur in areally and genetically unrelated languages (e.g. Japanese [Kawahara, 2003], Hungarian [Siptár, 2007], Shona [Mudzingwa, 2013], Tiv [Kwambehar, 2014], etc.), which suggests it might be, indeed, a universally dispreferred phonological sequence. Hiatus avoidance rules and changes are also common in Romance (see e.g. Garrapa, 2012; Garrido, 2013).

The preferred morphotactic structure of (verbal) stems in Latin, (C)CVC(C), was such that stem vowels were most often clearly separated from theme vowels and inflectional endings (e.g. cant-ō ‘I sing’, crēd-ō ‘I believe’). Only a small set of verbs (which I cite here in the 1SG form of the PRS.IND, following the Latinist convention) could be said to lack a stem vowel, either throughout the paradigm (e.g. eō ‘go’, n-eō ‘weave’, qu-eō ‘be able’, sc-iō ‘know’, d-ō ‘give’, st-ō ‘stay’), or in certain persons (vol-ō ‘want’ > v-īs ‘want.2SG’, ed-ō ‘eat’ > ēs ‘eat.2SG’), or to have the stem vowel adjacent to the inflectional endings (su-ō ‘sew’, cre-ō ‘create’…). Loss of stem-final (often word-internal intervocalic) consonants, however, is a frequent type of change. It happened sporadically in Latin (e.g. trah-ō /tra.oː/ < trag-ō ‘carry’) and more frequently in the later evolution of Romance. Various phonological and morphological changes, happening at different times and often independently, generated many more of these morphotactic configurations. To name a few, consider Sp. ve-o ‘I see’ < vid-eō, Cat. fa-s ‘you.SG do’ < fac-is, It. sa-i ‘you.SG know’ < sap-is, Rom. scri-em ‘we write’ < scrīb-imus, Fr. ri-ez ‘you.PL laugh’ < rīd-ētis, etc. Regardless of whether these were, in concrete cases, the result of phonological or of morphological changes, given the usual structure of verbal forms in Romance, the elimination of these consonants tended to generate vowel hiatus. These are notoriously uncomfortable sequences phonologically and tend to be “repaired” (see e.g. Yip, 1988; Casali, 1996, 1997; Rosenthal, 1997) by means of various processes: loss of the syllable break and diphthongization, vowel coalescence, glide insertion (plus possible fortition), etc. In the context of a complex inflectional paradigm, morphological change is also available to repair these sequences (see e.g. Lloret, 2009), or to repair morphologically the phonological changes’ paradigmatic effects.

As a result, these sequences seem to be, unsurprisingly, particularly unstable historically. In medieval Spanish, for example, the intervocalic /d/ from Latin regularly disappeared due to sound change (e.g. Lat. laudāre > Sp. loar ‘INF.praise’, Lat. crēdō > Sp. creo ‘I believe’). Although some of the resulting hiatuses were preserved, at other times these configurations gave rise to competing variants (i.e. to ‘overabundance’, see Thornton, 2012) in the paradigm. In 14th-century Castilian, for example, ver, veer, and veyer (< Lat. vidēre ‘INF.see’) were all widespread (with 701, 474, and 112 tokens respectively in the historical corpus CORDE), and other solutions like veyr (21 tokens) also existed (see Lass [1997:44-103] for a discussion of the challenges involved in distinguishing orthographic and phonological variation in historical texts). In addition, the original consonant also survived sometimes in specific environments. A form veder was very uncommon in Old Spanish as the infinitive form, but vido was very frequent in the 3SG preterite (see Malkiel, 1960). This added still more variation to the picture. [Footnote 3]

These situations constitute an ideal testing ground to evaluate the productivity and continued relevance of abstract morphomic patterns of stem alternation long after they originally emerged in the language (in this case, around 1000 years later). Many morphologists believe that situations of synonymy like these are, like hiatus, inherently dispreferred by language users (see e.g. Carstairs-McCarthy, 1994; Hurford, 2003; Manin, 2008). Sometimes, lexical (near-)synonymy is resolved into a suppletive arrangement (see e.g. Börjars & Vincent, 2011, and Table 1), where different roots are combined into the same paradigm, each of them coming to specialize into different values/niches. With all these different stems (i.e. v-er, ve-er and vey-er) in circulation, the opportunity is perfect (or the need most acute) for speakers to (re)distribute them in the paradigm in any way they see fit. By doing so, they inform us about which generalizations and paradigmatic structures they perceive as more prominent in their language. Under this assumption, a widespread one in the literature (e.g. Maiden, 2018b), diachronic productivity can be regarded as revealing of the cognitive reality of morphomes.

In this particular case (i.e. Spanish ‘see’), Table 3 illustrates the paradigmatic distribution of the stems that speakers of Spanish eventually converged on. Two things are worth mentioning about this paradigm. The first is that the pattern of stem alternation that ver displays is unique in the language; that is, it does not follow the template of the L-morphome (which would have the darker-shaded stem only in the 1SG.PRS.IND and in the PRS.SBJV, but not in the IPF.IND) nor of any of the other morphomes in the language. This could be argued to support a reduced influence of morphomic alternations in determining the distribution of new forms. At the same time, however, the distribution of v- vs ve- in Table 3 can hardly be said to be completely at odds with the L-morphome either. In fact, the identity of the stems in the L-morphome cells is borne out. In addition, some of the Romance varieties closest to Spanish (e.g. Asturian, see Vallina Alonso 1985) do differ from the paradigm in Table 3 precisely in having the “right” stem (i.e. 1SG v-ia, etc.) in the IPF.IND.

Table 3 Modern Spanish ver ‘see’ (GER viendo, SG.IMP ve, PL.IMP ved)

Individual examples can often be contested, and arguments can always be found to support either of the two positions within the Morphome Debate (Luís & Bermúdez-Otero, 2016). What is needed, thus, and the purpose of this paper, is a replicable quantitative survey of how many of these newly emerging stem alternations abide by the inherited morphomic patterns. When they adopt one of the morphomic distributions preexisting in the language, it will be registered which of these (i.e. N, L, PYTA, or FUÈC) is being replicated, in order to get a measure of the relative productivity of each of them. When the paradigmatic distribution does not match any of the morphomes, we will need to know how severe the deviation is from the established patterns (e.g. if it respects, at least, their so-called ‘stem-space’, see Pirelli & Battista, 2000; Boyé, 2000; Bonami & Boyé, 2002, Boyé & Cabredo-Hofherr, 2006, Montermini & Bonami, 2013; Herce, 2019) and in which cells exactly it occurs. The overall goal, thus, is to generate a sound quantitative estimation of the relative degree of productivity of (the different) morphomic structures over phonologically heterogeneous objects, long after morphomes first appeared in the language, and after the split of Proto-Romance into largely independent local varieties.

The reason for choosing this concrete phenomenon (i.e. phonologically heterogeneous CVC>CV stem changes) to gauge the productivity of morphomes is that, as Nevins et al. (2015:8) mention, morphome theory’s “clearest and most predictive aspect (…) says that it is about an abstract relation of complete identity between these cells of the paradigm without any reference to their phonological form or phonological naturalness” (emphasis mine). Given Beard’s (1995) Separation Hypothesis and the usual formalizations of the morphome as involving abstract syncretic indices, one would not expect the productivity of morphomes to be limited to the extension to novel verbs of pre-existing morphological alternation types (e.g. adding a /g/ to an L-morphome, or a diphthong to N). By looking at alternations different from these usual morphome-typical ones, this paper evaluates an abstract kind of architectural productivity. The alternative type of productivity that could be evaluated through the number of verbs that have an L, N, etc. of any kind is a similarly interesting and relevant one to study (see Cathcart et al., 2022; Herce, 2022) but will not be addressed here.

4 Empirical and theoretical interest

Language users appear to be quite good at spotting patterns and finding niches for the distribution of allomorphs. The literature on morphological change, and on (Romance) morphomes in particular (Aronoff, 1994; Esher, 2015; Herce, 2021; Maiden, 2011, 2018a,b, etc.) is full of examples of this. Whatever their underlying motivation to do so (e.g. synonymy avoidance, Carstairs-McCarthy, 1994), Romance speakers seem to be particularly keen on distributing morphological differences in very specific ways, e.g. finding a niche in the paradigm for competing roots to coexist.

Consider the partial paradigm of the verb ‘go’ in the Aragonese variety from Ansó (Table 4). No fewer than three different roots from three different Latin lexemes are deployed in the person-number-tense inflection of this verb. Reflexes of Latin sum ‘be’ (its perfectum stem fu-) occur in the imperfect subjunctive and in the preterite. Reflexes of Latin vādō ‘go’ occur in the singular and in the 3PL of the present indicative and throughout the present subjunctive (also in the singular imperative, not shown in Table 4). Last, reflexes of Latin eō ‘go’ occur in the 1PL and 2PL of the present indicative, and in the imperfect indicative and the infinitive.

Table 4 Paradigm of Ansó Aragonese ir ‘go’ (Maiden et al., 2010, from Barcos, 2007)

Although suppletive, the paradigmatic distribution of this stem alternation is by no means random, but rather runs entirely parallel to various other stem alternation patterns in the language. The paradigm for the verb benir ‘come’ (Table 5), for example, shows the exact same distribution of its different (this time non-suppletive) stem alternants. The distributions of stem vowel differences (see stem vowel /je/ vs /e/ vs /i/) match exactly the paradigmatic distribution of the suppletive alternation in ‘go’ in Table 4 before.

Table 5 Paradigm of Ansó Aragonese benir ‘come’ (Maiden et al., 2010, from Barcos, 2007)

The consensus explanation for these phenomena (Maiden 1992, 2018b) is that inherited patterns of alternation, even if morphomic, provided a template for the distribution of incoming root (also sometimes affixal) allomorphy. Because of their important role in diachronic change, these patterns are assumed to be cognitively real for language users even if they do not (seem to) match any morphosyntactic value. In many cases (see the paradigm of ‘come’ in Table 5) these alternations are completely redundant (i.e. morphologically uninformative), because person, number, and tense suffixes already express all the relevant morphological distinctions in the language. The productivity of these patterns in the history of Romance (illustrated in a striking way by analogical morphological changes that give rise to suppletive paradigms like the one in Table 4) is taken to be proof that these patterns are internalized as such, and constitute an active autonomously morphological part of the grammar, not simply constituting rote-learned irregularities or “diachronic junk” (Lass, 1990).

Some disagreement exists, however, regarding this particular point in synchrony. Following a nonce-verb learning experiment with native speakers of Portuguese, Nevins et al. (2015) argued that these patterns (specifically the one labelled ‘L’, which spreads over the 1SG.PRS.IND and the SBJV; see stem salg- in Table 2) are no longer productive/active in the language, and attributed this loss of productivity to a decrease in the proportion of verbs that are subject to them. The best examples of analogical restructuring of morphologically heterogeneous (i.e. suppletive) roots to match morphomic patterns seem to be, in fact, quite ancient. Note, for example, that the verb in Table 4 is suppletive in a similar way across western Romance (cf. French or Italian). One could entertain the hypothesis, thus, that these patterns might have been somewhat productive for some time after they emerged (maybe only a few centuries) but later ceased to play any crucial role in the structuring of other types of allomorphy.

Two logically extreme positions can be found in this respect, and regarding morphological autonomy more generally. For lack of better terms, we shall call these the “epiphenomenal morpheme” (A), and the “epiphenomenal morphome” (B) positions.

A):

Morphology is an independent grammatical module of its own, obligatorily present in any language and/or meaning-to-form mapping. There is no cognitive preference for morphological patterns which align to extramorphological categories. The attraction power of patterns is proportional to their presence in the lexicon and blind to their phonological naturalness, and to whether they apply over natural or unnatural classes of cells. In the context of Romance stem alternations, the patterns inherited from Proto-Romance were (or at any rate are considered here) all morphomic. Other patterns of alternation might have emerged in individual varieties from later sound changes but barring these, because no other (e.g. natural) pattern of stem-alternation was inherited from Latin, there should be largely no novel patterns of alternation that do not fit the established morphomic templates N, L, PYTA, and FUÈC.

B):

Morphology does not have an independent existence and paradigms are completely epiphenomenal. The realization of values is what matters, and not predictive relations between inflected word forms. Any morphological commonality or syncretism that might hold between cells with no common meaning/value is necessarily accidental, i.e. diachronic “junk” or irregular traits which language users can learn by rote in frequent lexemes but over which they do not construct/infer single categories or rules, and hence they cannot be used as productive templates. This position would expect that no Romance morphome should be replicated with novel forms except by chance, which is probabilistically extremely unlikely. New patterns of alternation will instead match whatever environment motivated them, which can in principle be a phonological or a semantic one but crucially not a morphological one.

Aronoff’s (1994) original contribution, by which an intermediate morphomic level would be present in all systems, would be close to A). In a grammatical architecture where this intermediate level cannot be dispensed with, natural form-meaning mappings would offer no advantage in terms of either learnability or stability. Some authors like Blevins (forthcoming) have in fact claimed that “the contrast between ‘natural’ and ‘unnatural’ classes appears to reflect a priori assumptions about descriptive ‘economy’ and ‘naturalness’ which have never been shown to be relevant to language structure, acquisition or use”. With regard to Romance specifically, Maiden (2016:49) has claimed that morphomic distributions are “not especially dispreferred” in Romance, and regards this as an objective empirical observation on the basis of that family’s history. If these positions are right, there should be no drive for morphological forms to become better aligned to semantic/syntactic structure and we should observe (e.g. in Romance) that even novel stem alternations, phonologically unrelated to the inherited ones, would strongly tend to conform to the preexisting patterns of stem alternation (i.e. N, L, PYTA, and FUÈC), rather than adopt an unprecedented distribution [Footnote 4] in the paradigm, even if this were a semantically more sensible one. Maximizing the paradigmatic distributional fit of new and old alternations would make the Paradigm Cell Filling Problem (Ackerman et al., 2009) easiest to solve.

Proponents of Distributed Morphology (e.g. Halle & Marantz 1994, Embick, 2000; Pomino, 2008), for example, would be closer to the logically opposite position B). If there is no morphological component of language that can have rules and structures different from syntactic and semantic ones, unnatural morphological structures must be synchronically accidental and could never serve as templates for the distribution of new, especially phonologically heterogeneous alternations (as this would also invalidate any phonological explanation). If this were right, novel stem alternations ought to (almost) never adopt an unnatural distribution, even if this is one that, as in Romance N, L, PYTA, and FUÈC, is present, somewhat robustly, in the extant paradigmatic structures of the language.

It is in this context that a closer look at morphophonological changes of the type surveyed in this paper promises the most insight. Unlike the changes that gave rise to the N- and L-morphomes (loss of vowel qualities in unstressed syllables, and pre-yod or pre-front-vowel palatalizations), which happened largely in the common ancestor and were inherited by the daughter languages, the loss of stem-final consonants and the later hiatus-repairing adjustments occurred largely independently in the various Romance languages (cf. Spanish ver, Aragonese veyer, French voir, Surselvan vezər, Italian vedere, etc.). Thus, commonalities with regard to these (identical paradigmatic distributions in particular) will tend not to be synapomorphic (i.e. innovations inherited from a common ancestor), but rather homoplastic (i.e. independent, parallel innovations). In addition, because the consonant and vowel alternations involved in these morpho-phonological changes are in general unrelated to the typically morphomic ones, assimilation to the paradigmatic distribution of N, L, PYTA, or FUÈC morphomes will be necessarily structural (i.e. predicated over abstract paradigm cells) and largely form-independent. This is the reason why these morphophonological changes constitute an ideal object of study to evaluate the long-term productivity of morphomic templates, and hence to contribute meaningfully to the autonomous morphology debate.

I finish this section with a disclaimer. Although this might have been interesting, I will not make any distinction throughout this paper regarding the origin (e.g. sound change, analogical, or other morphological changes) of stem-alternations in different lexemes and varieties. There are many reasons for this. The first is that the history of most of the varieties explored in this paper is severely underexplored and very imperfectly understood. Thus, any taxonomization would be comparatively unreliable and involve a great deal of guesswork. Adding to this, and regarding morphological change more concretely, there is, despite a long research tradition (see e.g. Andersen, 1980; Fertig, 2013) no agreement on what types exist, i.e. regarding which morphological changes are different or the same, and how they should be identified. This would have limited the validity and usefulness of any one taxonomization. Last, claims regarding the productivity of morphomes have not been limited to any specific type(s) of (morphological) change as far as I know. While some research suggests that sound-change-generated alternations might be less sensitive to morphomic structure than morphological changes (see e.g. Esher, 2017, or Maiden, 2018b, who tends to talk about the sensitivity to morphomes of ‘morphological change’), clear cases exist of sound changes being sensitive to paradigmatic structure (see e.g. Malkiel, 1960; Maiden, 2018b:277-283, Herce, forthcoming). As I understand it, and although different types of change might show different trends, the attraction effect of morphomic templates should operate on any type of alternation, regardless of its origin. Thus, and although a diachronic classification of the stem alternation changes/patterns analyzed in this paper could have been of interest to many readers, I have decided to leave this aside for now and simply mention this here as a possible limitation to this study, where I classify synchronic distributions exclusively.

5 Data sources and method

The main source for the linguistic data for this research was the Oxford Online Database of Romance Verb Morphology (Maiden et al., 2010). This online database (and its downloadable expanded version, see Beniamine et al., 2020) includes full (also many partial) paradigms in phonological form of the most frequent Romance verbs in 73 different Romance varieties. The lexical coverage of the database is not perfect, so not all the targeted verbs (see below) are documented everywhere. To alleviate this, this source was occasionally supplemented with others [Footnote 5], mostly with inflected lexicons of national standard languages, which are the ones for which these resources are available.

In terms of target lexemes, 22 were selected which had various documented reflexes showing stem-final consonant deletion in some parts of the paradigm (i.e. forms in which the etymological consonant has been lost, regardless of whether another one was introduced later). The list of explored verbs and their infinitive forms in some illustrative Romance daughter languages is provided in Table 6.

Table 6 Inspected paradigms and their sources (white = Oxford Online Database) (Color table online)

The classification of individual datapoints was done by hand and proceeded as follows:

First (see Sect. 5.1 below), it was determined whether the paradigm of a particular lexeme in a particular variety (e.g. the reflex of faciō ‘do’ in the Catalan variety spoken in Alghero) showed morphophonological innovations associated with the loss of stem-final consonants. Innovations involving only the extension of L, N, PYTA, or FUÈC morphology to new verbs were ignored. [Footnote 6] These innovations were further classified into those that have taken place across the whole paradigm, and those that have not, in which latter case we identify a novel pattern of alternation that will need to be further explored and taxonomized.

Second (see Sect. 5.2), the paradigmatic distribution of these novel alternations was registered. These were classified into one of three major types: a) the alternation abides by some pre-existing morphome’s template (L, N, PYTA, or FUÈC) [Footnote 7], in which case it is registered which of these it matches, b) the alternation does not correspond with any of the traditional inherited morphomic templates but does abide by the stem-space (Boyé & Cabredo-Hofherr, 2006, Montermini & Bonami, 2013) generated by the cross-classification of L, N, PYTA, or FUÈC and their combinations, in which case we register which domain the innovative alternation concerns, or c) the alternation abides by neither the morphomic templates nor the stem space, in which case we register, again, in which paradigm cells the alternants occur. [Footnote 8]
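The three-way classification just described amounts to a comparison between sets of paradigm cells. The following Python sketch is offered only as an illustration of that logic; the classification reported in this paper was done by hand. The function name `classify`, the cell labels, and the reduced present-tense inventory with only N and L are assumptions introduced for the example, not the inventory of Table 9. Following Sect. 5.2, a match with the complement set of a morphome is also treated as a match.

```python
from typing import Dict, Iterable, List, Set, Tuple

PERSONS = ["1SG", "2SG", "3SG", "1PL", "2PL", "3PL"]

def ind(*persons: str) -> Set[str]:
    """Present indicative cells for the given persons (hypothetical labels)."""
    return {f"{p}.PRS.IND" for p in persons}

def sbjv(*persons: str) -> Set[str]:
    """Present subjunctive cells for the given persons (hypothetical labels)."""
    return {f"{p}.PRS.SBJV" for p in persons}

# Toy inventory (assumption): present tense only, morphomes N and L only.
ALL_CELLS = ind(*PERSONS) | sbjv(*PERSONS)
MORPHOMES: Dict[str, Set[str]] = {
    "N": ind("1SG", "2SG", "3SG", "3PL") | sbjv("1SG", "2SG", "3SG", "3PL"),
    "L": ind("1SG") | sbjv(*PERSONS),
}
# Stem domains listed by hand for this toy inventory.
DOMAINS: List[Set[str]] = [
    ind("1SG") | sbjv("1SG", "2SG", "3SG", "3PL"),  # cells in both N and L
    ind("2SG", "3SG", "3PL"),                       # cells in N only
    sbjv("1PL", "2PL"),                             # cells in L only
    ind("1PL", "2PL"),                              # cells in neither
]

def classify(innovative: Iterable[str],
             all_cells: Set[str],
             morphomes: Dict[str, Set[str]],
             domains: List[Set[str]]) -> Tuple[str, str]:
    """Classify the set of cells that show the innovative alternant."""
    cells = set(innovative)
    complement = all_cells - cells
    if not cells or not complement:
        return ("no alternation", "")
    # (a) the alternant fills a morphome template or its complement set
    for name, template in morphomes.items():
        if cells == template or complement == template:
            return ("morphome", name)
    # (b) the alternant never splits a stem domain
    if all(d <= cells or d.isdisjoint(cells) for d in domains):
        return ("stem-space", ",".join(sorted(cells)))
    # (c) neither: record the cells involved
    return ("neither", ",".join(sorted(cells)))

print(classify(MORPHOMES["L"], ALL_CELLS, MORPHOMES, DOMAINS))
print(classify(sbjv("1PL", "2PL"), ALL_CELLS, MORPHOMES, DOMAINS))
print(classify(ind("1SG"), ALL_CELLS, MORPHOMES, DOMAINS))
```

The check for type (b) relies on the domains forming a partition of the paradigm: an alternant respects the stem space exactly when every domain is either wholly inside or wholly outside the set of innovative cells.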

Last, for additional information and even in the case of non-alternation, I register if these patterns are based on the original stem-final segment being preserved (i.e. no innovation), on the original stem-final segment being lost (i.e. a subtractive innovation), or on the introduction of a new segment not formerly present in the paradigm (i.e. an additive innovation). The segment involved is also noted to allow for the inspection of the concrete morphology involved in individual cases. All this information is provided in the supplementary materials. Appendix A also includes, for convenience, a visual summary of the information.

5.1 The reflexes of Romance morphomes

Within individual Romance varieties, more than one paradigmatic stem-alternation pattern may derive from each of the “classical” morphomes (N, L, PYTA, FUÈC).

As Table 7 shows, the L-morphome reflexes of Italian dire and fare come in somewhat different flavours. In addition, the paradigmatic distribution of these inherited stem-alternation patterns differs, often substantially, from one Romance language to another.

Table 7 Partial paradigms of two Italian verbs with different reflexes of the L morphome

Consider the paradigmatic distribution of the stem-final velar in the paradigm of ‘do’ in Table 8. It shows that the inherited alternation between the L stem (the former pre-yod stem, in gray) and the remaining stem (in white) is often found with a different distribution in different languages. Morphological or later sound changes may modify (or eliminate) the inherited distributions, or may generate new ones. For this research we want to assess how closely the alternations emerging from stem-final consonant deletions and hiatus-avoidance match the inherited morphomic patterns as instantiated in that individual language. Local morphomic domains were evaluated from the distribution of etymological morphome-typical alternations like the ones in Tables 8 and 9.

Table 8 Present tense forms of the reflexes of faciō ‘do’ in four Romance varieties
Table 9 Stem domains derived from the cross-classification of L, N, PYTA, and FUÈC, as present in conservative (e.g. Ibero-Romance) Romance varieties

5.2 Romance morphome-derived stem space

If L, N, PYTA, and/or FUÈC-morphomic patterns of stem alternation remained productive for a long time (beyond regulating their own paradigmatic distribution) [Footnote 9], we would expect that newly emerging alternations would tend to abide by the paradigmatic niches they provide. In the case of the Italian L-morphome in Table 7, we could expect innovations to spread, for example, across the cells 1SG.PRS.IND, 3PL.PRS.IND, 1SG.PRS.SBJV, 2SG.PRS.SBJV, 3SG.PRS.SBJV and 3PL.PRS.SBJV, or across their complement set.

There is, however, a different, somewhat less restrictive view of exactly which paradigmatic template/entity should be expected to be productive. Many authors (e.g. Boyé & Cabredo-Hofherr, 2006, Herce, 2019) have noted that, taken across lexemes, some morphomes cross-classify because they span disjoint sets of cells. This gives rise to various zones of interpredictability in the paradigm, within which stems will always be identical and mutually predictable, but between which stems may differ. Taking, for simplicity, the original paradigmatic distribution of the L, N, PYTA, and FUÈC morphomes, their cross-classification gives rise to the six stem domains illustrated in Table 9.

An innovation that spread, for example, to the INF, GER, IMP.PL, and IPF would not match the template provided by any of the Romance morphomes, but would still abide by the partitions of the paradigm that they give rise to. Similarly, an innovation that spread to the PLUP.SBJV, PRET, FUT, and COND would also not match the paradigmatic distribution of any of L, N, PYTA, or FUÈC, but would still not disrupt the stem homogeneity of the inherited stem space, which would provide some support for their continued importance in organizing allomorphy in Romance paradigms.

Note that there is a one-directional implicative relation between morphome-abiding and stem-space-abiding patterns. If an innovation abides by a morphome template, this entails that it also abides by stem space. The latter, thus, constitutes a superset of the former. Some morphomes are the same as individual stem domains (V=P, and VI=F), while others represent the combination of 2 stem domains (I+II=N, and I+IV=L). Paradigms exhibiting these will be, hence, “double-counted” in the results in Sect. 6 under both morphome-abiding, and stem-space-abiding ones.
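The implicative relation just described can be illustrated with a minimal sketch. The cell inventory below is an assumption: it is a present-tense fragment reconstructed from the prose (N = I+II, L = I+IV), not a copy of Table 9, and domain III is deliberately left incomplete. The function names are hypothetical and are not part of the method used in this paper.

```python
# Toy fragment of the stem space, restricted to present-tense cells.
# ASSUMPTION: cell lists are reconstructed from the prose, not taken from Table 9;
# domain III is incomplete (INF, GER, IPF etc. are left out on purpose).
STEM_DOMAINS = {
    "I": {"1SG.PRS.IND", "1SG.PRS.SBJV", "2SG.PRS.SBJV", "3SG.PRS.SBJV", "3PL.PRS.SBJV"},
    "II": {"2SG.PRS.IND", "3SG.PRS.IND", "3PL.PRS.IND", "SG.IMP"},
    "III": {"1PL.PRS.IND", "2PL.PRS.IND"},
    "IV": {"1PL.PRS.SBJV", "2PL.PRS.SBJV"},
}
# Morphomes expressed as combinations of stem domains (only N and L fit this fragment).
MORPHOMES = {"N": ("I", "II"), "L": ("I", "IV")}
ALL_CELLS = frozenset().union(*STEM_DOMAINS.values())


def cells_of(domains):
    """Return the union of the cells of the named stem domains."""
    return frozenset().union(*(STEM_DOMAINS[d] for d in domains))


def abides_by_stem_space(cells):
    """True if the innovation never splits a stem domain internally."""
    cells = frozenset(cells)
    return all(dom <= cells or dom.isdisjoint(cells) for dom in STEM_DOMAINS.values())


def abides_by_morphome(cells):
    """True if the innovation covers exactly a morphome's cells or their complement."""
    cells = frozenset(cells)
    for domains in MORPHOMES.values():
        template = cells_of(domains)
        if cells in (template, ALL_CELLS - template):
            return True
    return False


n_pattern = cells_of(("I", "II"))                   # morphome-abiding, hence stem-space-abiding
domain_ii = cells_of(("II",))                       # stem-space-abiding only
ii_plus_1sg = cells_of(("II",)) | {"1SG.PRS.IND"}   # splits domain I: abides by neither
for label, cells in [("N", n_pattern), ("II", domain_ii), ("II+1SG.PRS.IND", ii_plus_1sg)]:
    print(label, abides_by_morphome(cells), abides_by_stem_space(cells))
```

In this toy setting every morphome-abiding pattern is also stem-space-abiding, but not the other way round, which is why such paradigms are counted under both headings.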

Another point to keep in mind is that stem domains in Table 9, the same as the morphome reflexes in Table 8, vary somewhat from one Romance variety to another. For example, the observed change in Standard Italian (Table 7) whereby the etymologically 1PL.PRS.SBJV form started to be used for the 1PL.PRS.IND too means that 1PL.PRS.IND in this variety will pattern together with the 1PL.PRS.SBJV into stem domain IV. In Balkan Romance, in turn, 1/2.PRS.IND extended their domain to the SBJV, in some Gallo-Romance varieties the 1PL.PRS.IND form came to be used as the 3PL.PRS.IND too, etc. These local developments will be taken into account when reporting on the accommodation (or lack thereof) of innovative morphological alternations to these stem domains. Thus, an innovation that spreads, in Italian, to 1PL.PRS.IND, 1PL.PRS.SBJV, and 2PL.PRS.SBJV will be reported as not abiding by morphomic patterns but as abiding by the morphome-derived stem domains (in this case IV).

6 Results

The exploration of the 22 lexeme sets in Table 6 across 63 different Romance varieties (all of the sufficiently documented ones [>3 lemmas] in the Oxford Online Database of Romance Verb Morphology) yielded a total of 716 documented paradigms. Of these, 214 (30%) displayed no morphological innovation of the ones this paper targets (see Sect. 3 and examples therein). The classification of the remaining 502 paradigms into the types defined in Sect. 5 is shown in Fig. 1.

Fig. 1 Late morphophonological innovations in the Romance verb by type

Of those lexemes/paradigms that contain the morphophonological innovations that this paper explores, 70 (15%) showed an innovation that has spread to the whole paradigm. Consider, for example, the cases in Table 10. The original stem-final consonant /d/ of Latin rīdeō ‘laugh’ has disappeared across the whole paradigm in Ladin. In Val Badia, it has done so without a trace, whereas in Val Gardena, /ʒ/ (derived either from an older /dj/, or from a strengthened hiatus-avoiding yod) has taken its place. In both cases, however, the paradigms provide no strong evidence for or against the late vitality of morphomes as templates for the distribution of allomorphy. The relatively frequent adoption of the same solution across the paradigm may remind us, however, of the power of paradigm uniformity (see e.g. Steriade, 2000) as a structuring force within lexemes.

Table 10 Partial paradigms of the verb ‘laugh’ in two varieties of Ladin

The remaining 432 lexemes (i.e. those where we do find novel stem alternations) are, in any case, the ones that bear most directly on morphome productivity. Of these, 46 (11%) abide by one of the morphomic patterns (i.e. N, L, PYTA, or FUÈC) that most Romance languages inherited from Proto-Romance. We refer to these as ‘morphome-abiding’ throughout this paper, by which we mean that their paradigmatic distribution matches that of one of the Romance morphomes.

Consider the paradigm in Table 11. In perfect compliance with the N-morphome template, /r/ (a segment probably originating in the infinitive) occurs in Val Badia Ladin (Alton & Vittur, 1968), instead of the etymological segment /b/, in the INF, SG.IMP, 1SG.PRS.IND, 2SG.PRS.IND, 3SG.PRS.IND, 3PL.PRS.IND, 1SG.PRS.SBJV, 2SG.PRS.SBJV, 3SG.PRS.SBJV, and 3PL.PRS.SBJV. At the same time, the glide /j/ occurs in the complement set of cells.

Table 11 Val Badia Ladin ‘drink’ (INF = , IMP.SG = , IMP.PL =boˈjede)

It is not only N but also the other morphomes that may occasionally provide a template for the distribution of the innovative stem alternation patterns surveyed in this paper. Consider, for example, the paradigm in Table 12. The etymological stem-final consonant /θ/ has been preserved in the paradigm of Ansó Aragonese ‘make’ (Barcos, 2007) in all person-number cells of the tenses PRET and PLUP.IND. These correspond to those PYTA tenses that have been preserved in the language.

Table 12 Paradigm of Ansó Aragonese INF ˈfeɾ ‘make’ (GER ˈfendo, SG.IMP ˈfes)

The productivity of the four Romance morphomes, according to the late morphophonological innovations observed to target each of those sets of cells, is shown in Fig. 2. Of the 46 innovative patterns of alternation that have been found here to abide by one of the inherited morphomic distributions, the majority (27, 59%) abide by N, 8 (17%) by PYTA, 6 (13%) by L, and 5 (11%) by FUÈC. The N morphome, thus, seems to clearly outrank all others in its (late) productivity (differences between N and the rest are statistically significant: \(\chi^{2}\)(1, N = 46) = 7.6, p = .0057), a fact which might be derived from the greater token frequency of its cells (see red+orange in Fig. 3) compared to those of the other morphomes. Although the role of type frequency has been emphasized over that of token frequency when it comes to productivity (also in morphomes, see Nevins, 2015; Esher, 2017), there is research suggesting that token frequency also “promotes productivity provided it combines with high type frequency” (De Smet, 2020:252).

Fig. 2 Number of N-abiding, P-abiding, L-abiding, and F-abiding innovations

Fig. 3 Token frequencies of the different paradigm cells in Latin (Delatte et al., 1981) (Color figure online)

Although the morphome-abiding cases are comparatively common in some varieties (namely Ladin, and to a lesser extent Romansh, see Appendix A) they remain comparatively rare across the rest of Romance. If we relax the criteria of what exactly it means to abide by morphomic patterns (as defined around Table 9), however, we find a total of 192 (44%) cases of innovations that abide at least by some of the stem domains derived from morphomes (see Table 13) or by their combinations (see Table 14).

Table 13 Campidanese Sardinian INF ˈfai ‘do’ (GER , SG.IMP ˈfai, PL.IMP )
Table 14 Vaux Francoprovençal INF ‘drink’ (GER , SG.IMP , PL.IMP )

The introduction of /d/ in the paradigm of Campidanese Sardinian ‘do’ (Lepori, 2001) [Footnote 10] perfectly matches stem domain III, thus abiding by the predictive structure generated by the inherited Romance morphomes. In Vaux Francoprovençal ‘drink’ (Duraffour, 1932), by contrast, the novel alternation involving the etymological stem-final consonant /v/ concerns multiple stem domains. This consonant has been preserved in I+III+IV+V and lost in the complement set of cells, stem domains II+VI (see Table 9). Overall, the number of innovations (of any kind) found to concern a single stem domain or its complement set (as in Table 13) was 86 (20%), and the number of innovations spreading over multiple stem domains (as in Table 14) was 106 (25%). [Footnote 11]
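The counts reported so far can be cross-checked with a few lines of arithmetic. The sketch below uses only figures given in the text; the variable names are illustrative, and treating the 46 morphome-abiding cases as included in the 192 stem-space-abiding ones is my reading of the double-counting noted in Sect. 5.2.

```python
# Counts as reported in Sect. 6 (variable names are illustrative).
total_paradigms = 716
no_innovation = 214
with_innovation = total_paradigms - no_innovation       # paradigms classified in Fig. 1
whole_paradigm = 70
novel_alternations = with_innovation - whole_paradigm   # paradigms with novel alternations

single_domain = 86
multi_domain = 106
stem_space_abiding = single_domain + multi_domain
morphome_abiding = {"N": 27, "PYTA": 8, "L": 6, "FUÈC": 5}


def pct(part, whole):
    """Percentage rounded to the nearest integer, as in the running text."""
    return round(100 * part / whole)


print(with_innovation, novel_alternations, stem_space_abiding)
print(pct(no_innovation, total_paradigms))
print(pct(sum(morphome_abiding.values()), novel_alternations))
print(pct(stem_space_abiding, novel_alternations))
print({name: pct(n, sum(morphome_abiding.values())) for name, n in morphome_abiding.items()})
```

All percentages for the morphome-abiding and stem-space-abiding categories take the 432 paradigms with novel alternations as their denominator, while the breakdown by morphome takes the 46 morphome-abiding cases.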

The productivity of different stem domains and stem-domain combinations differs dramatically, however (much more than that of the different morphomes). Among single stem domains, stem domain II (43, 10%) and stem domain III (30, 7%) were found to be the most productive ones (see Herce & Cathcart, forthcoming for the relevance of the former domain in stem shortening in Romance, and its relation to high token frequency). These stem domains, thus, clearly outrank the morphomes in Fig. 2 in productivity. By contrast, not a single innovation was found to target stem domains I or IV. [Footnote 12] Among stem-domain combinations, II+VI (32, 7%), III+IV (16, 4%), and I+III+IV (7, 2%) are the most productive, while around half (12/25) of the logically possible combinations (e.g. I+III, I+VI, II+IV, IV+V, I+II+III) are completely unattested as niches for the spread of morphophonological innovations.

Tentative explanations could be proposed for some of these differences. The high frequency of stem-domain II’s cells, for example, might allow them to achieve a degree of autonomy (à la Bybee, 2007) that, for example, stem-domain IV (i.e. 1/2PL.PRS.SBJV) can never achieve by itself. If a different stem cannot ever survive in these two cells alone due to their low frequency, that might be the reason why IV tends to merge with its (semantic/syntactic) neighbours I (to form L), or III (to form III+IV), or both (to form I+III+IV). IV is never found, however, to merge with non-neighbouring II, for example, probably because this would give rise to a semantically very odd pattern of stem alternation (see Herce, 2019).

Despite the various morphome and/or stem-space-compliant changes described so far, the majority (240, 56%) of the innovations identified here split the paradigm in ways that, like the Spanish example in Table 3, match neither morphomes nor morphome-derived stem domains. Consider the verbs in Table 15. The verb ‘cook’ represents one with a preserved stem-final consonant. It shows, with its regularly developed morphology, the paradigmatic distribution in Istro-Romanian of the L morphome (root kok- < Lat. kokw + back vowel) and the PYTA morphome (root kops- < Lat. koks-). N, not present in ‘cook’, has the usual SG+3PL paradigmatic extension in the language (see Table 9). The verb ‘drink’, by contrast, shows a more recent pattern of stem alternation. The forms of the present clearly hint at what must have happened. After the loss of the etymological stem-final consonant /b/ (note that deletion of Latin intervocalic /b/ is regular in Romanian, see Gartner, 1904:110), a hiatus must have developed between the stem vowel and the thematic vowel/inflectional suffixes. Glides then developed to repair these sequences, with a glide /j/ emerging before front vowels, and a glide /w/∼/ʋ/∼/v/ before back vowels. The resulting different stems have stayed in place in the frequent present tense, but not in other parts of the paradigm, where they seem to have even exchanged contexts occasionally. [Footnote 13] The outcome, however, has not been a better alignment with the pre-existing patterns of (morphomic) stem alternation. Importantly for the purposes of the present taxonomization, the integrity of stem domain III is violated, as the stems of the 1PL and 2PL present and the imperfect indicative have diverged. Note that this configuration must even be analogical, as both 1/2PL.PRS and IPF are characterized by following front vowels. This suggests that the stem identity between these two domains is not highly influential/active in speakers’ minds.

Table 15 Two Istro-Romanian verbs showing different stem alternations (Puşcariu, 1926)

Deviations from the structure provided by inherited stem domains, thus, are frequent enough to represent a majority among the more recent morphophonological innovations that this paper explores. These deviations are, however, not random. In the morphosyntactic description of these deviations, the cells in Fig. 4 are the most frequently mentioned ones.

Fig. 4 Cells that most frequently do not conform to inherited stem domains

INF is the most frequently mentioned cell, present in 87 (36%) different innovative paradigmatic splits, followed by the GER (69, 29%), the 1PL.PRS.IND (54, 23%) and the 2PL.PRS.IND (54, 23%). It must be noted that these forms constitute the most peripheral members of stem domain III, whose morphological unity is very frequently compromised. This is a finding that largely agrees with the conclusions that Esher (2021) draws on the basis of a more limited dataset. As she conjectures, the fact that this is an area of the paradigm which is defined negatively (i.e. it is defined as those cells that do not partake in any of the standard morphomic alternations L, N, PYTA, and FUÈC) and the fact that it has a low lexical frequency (i.e. there are very few lexemes where III has a stem of its own) might be part of the story. However, the syntactic and semantic heterogeneity of the component cells must also be playing some role in their greater vulnerability to changes that disrupt their stem identity. The nonfinite forms INF and GER are probably the most different ones in the paradigm from the bulk of finite forms. The infinitive, and to a smaller extent the gerund, also have the high token frequency [Footnote 14] required to achieve a degree of autonomy (à la Bybee, e.g. 2007) that may allow them to cut their morphological ties to other word forms which are already distant semantically and syntactically.

Stem-domain III might be the most vulnerable, but all of them have been found here to be broken under the right circumstances. Stem domain II, for example, is also broken relatively frequently via the SG.IMP (46, 19%), and the 3PL.PRS.IND (42, 18%), and stem domain I is often split (see Table 16) via the 1SG.PRS.IND (37, 15%).

Table 16 Surmiran ‘say’ INF dir, SG.IMP dei, PL.IMP ʤe (Candrian, 1900)

In the paradigm of ‘say’ in Surmiran Romansh, an unetymological stem-final /d/ has intruded in the PRS.SBJV cells. This, however, has failed to extend to the 1SG.PRS.IND (also a part of the expected domain of L in the language, as demonstrated by stem alternation patterns in other verbs like the reflexes of volō ‘want’ or possum ‘can’), which patterns like the other SG.PRS.IND cells instead (see also the same cells of Table 12 for a comparable split of the 1SG.PRS.IND from the PRS.SBJV in Aragonese ‘do’).

The paradigm cells that have been found here to be more prone to “breaking away” from the other cells of their stem domain (i.e. INF, GER, 1PL.PRS.IND, 2PL.PRS.IND [these last two usually en bloc], SG.IMP, 3PL.PRS.IND, and 1SG.PRS.IND) are characterized, it should be noted, by being comparatively frequent paradigm cells but, even more consistently, by being peripheral members of their stem domain with respect to their syntax and semantics (by virtue of having a different finiteness, mood or tense value from the rest of the cells). In agreement with this finding, when the morphological unity of TAM stem domains V and VI is broken, this happens almost exclusively via ‘splits’ (see Corbett 2015, 2022) aligned to tense distinctions.

We can see two examples of this in Table 17. In Saint Augustin Occitan ‘know’ (Monteil, 1997), we can see the stem-final consonant /b/ (cf. Lat. sapiō) in the conditional tense but not in the future. This must represent an analogical reintroduction of this consonant from other parts of the paradigm where it was preserved, because /w/ (cf. ʃowˈrej) is the regular reflex of /b/ before a consonant. A segment /b/ might have been reintroduced, for example, from the IPF.IND, with which the COND shares person endings. Similarly, in Panticosa Aragonese (Nagore Lain, 1986), we observe that the spread of the short stem f- across the paradigm has affected one of the component tenses of PYTA, the PLUP.IND, but not (yet?) the PRET, thus leading to the breakup of the stem unity within this morphomic domain.

Table 17 Stem splits in FUÈC and PYTA

7 Discussion

Although the morphological evolution of Romance stem alternations has lately been described with reference to Autonomous Morphology and to abstract patterns of alternation that have nothing to do with syntax/semantics or phonology, the findings of the present research cast doubt on this being the whole story. When one considers late (i.e. independently emerged) morphophonological innovations unrelated to morphome-typical alternations, it emerges that a majority of them do not abide by morphomic templates.

The raw numbers and percentages presented so far, however, need to be properly contextualized. The main appeal of abstract morphomes in accounting for morphological change is their restrictiveness. Out of the myriad of different paradigmatic distributions that a morphological innovation could potentially adopt in the paradigm, morphomes predict that only a small fraction of these are possible. In a typical Romance verb (see Table 9) we find 4 morphomic splits, 6 stem-space splits, and 25 stem-domain-combination splits. This contrasts dramatically with the over half a trillion (549 755 800 000) different splits that are logically possible in the average Romance paradigm. [Footnote 15]
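The counts of possible splits can be recovered by enumeration. The sketch below lists every two-way split of the six stem domains, treating a set of domains and its complement as the same split. The 40-cell paradigm size is an assumption on my part, chosen because 2^39 is the value that the rounded figure above approximates; the function and variable names are likewise illustrative.

```python
from itertools import combinations

DOMAINS = ("I", "II", "III", "IV", "V", "VI")
# Morphomes as combinations of stem domains (Sect. 5.2).
MORPHOMES = {"N": {"I", "II"}, "L": {"I", "IV"}, "PYTA": {"V"}, "FUÈC": {"VI"}}


def two_way_splits(items):
    """Return all splits of `items` into two non-empty parts (order of parts ignored)."""
    items = frozenset(items)
    splits = set()
    for size in range(1, len(items)):
        for side in combinations(sorted(items), size):
            side = frozenset(side)
            splits.add(frozenset({side, items - side}))
    return splits


splits = two_way_splits(DOMAINS)
# A split targets a single stem domain if one of its two parts has exactly one member.
single = [s for s in splits if min(len(part) for part in s) == 1]
multi = [s for s in splits if min(len(part) for part in s) > 1]
# A split is morphomic if one of its parts is exactly a morphome's set of domains.
morphomic = [s for s in splits if any(frozenset(m) in s for m in MORPHOMES.values())]

ASSUMED_CELLS = 40  # ASSUMPTION: average paradigm size, not stated in this section
all_cell_splits = 2 ** (ASSUMED_CELLS - 1) - 1

print(len(splits), len(single), len(multi), len(morphomic))
print(all_cell_splits)
```

Under these assumptions the enumeration yields 31 stem-space-abiding splits (6 single-domain plus 25 multi-domain), of which 4 are morphomic, against roughly 5.5 × 10^11 logically possible splits at the level of individual cells.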

Against this background (Table 18), the interpretation of the numbers in Fig. 2 changes notably. By chance alone, it would be almost impossible (less likely than winning the lottery) that any one of the 502 innovations in our sample would adopt one of the 4 morphomic templates, or one of the 31 stem-space-abiding ones. The number of innovations found of this kind, and more dramatically still, of innovations that have extended to the whole paradigm, is several orders of magnitude higher than we might have expected by chance. This does speak in favour of these structures constituting abstract and cognitively real niches that, even centuries after their emergence and the breakup of Romance, continue to influence the paradigmatic distribution of incoming forms. [Footnote 16]

Table 18 Number of splits of each kind and relative over/underrepresentation

This is, however, still just half of the story. Some morphomes (N), some stem domains (II and III), and some stem domain combinations (II+VI, III+IV and I+III+IV) are much more productive than others. In the same way, also among the stem-space non-abiding patterns, some of them are comparatively common (the highest ranking are III-GER [12 occurrences], VI+INF [11], II+1SG.PRS.IND [11], INF [8], I+III+IV+3PL.PRS.IND [8], SG.IMP [7], III-1PL.PRS.IND-2PL.PRS.IND [7], 1SG.PRS.IND [5], INF+SG.IMP [4], III-INF-GER-1PL.PRS.IND-2PL.PRS.IND [4], II+1SG.PRS.IND+1PL.PRS.IND+2PL.PRS.IND [4], and 2SG.PRS.IND+ 3SG.PRS.IND [4]) while most novel splits (84/128, 66%) are attested only once.

None of these frequent non-abiding splits (except maybe VI+INF, which continues the affinity between the INF and the FUÈC tenses, which grammaticalized via periphrases containing the INF) makes Autonomous-Morphological sense. They are much easier to explain, however, with reference to the feature values of the words where they occur. Some of these alternations (e.g. III-INF-GER-1/2PL.PRS.IND and II+1SG.PRS.IND+1/2PL.PRS.IND, and those that were discussed in Tables 16 and 17) match TAM values (IPF.IND, PRS.IND, PRS.SBJV, COND, and PRET respectively). They involve the acquisition of morphological idiosyncrasies by syntactically and semantically peripheral cells within the paradigm (e.g. nonfinite vs finite forms), or the failure of morphophonological innovations to cross other “hard” (i.e. TAM) semantic borders (e.g. 1SG.PRS.IND, III-INF-GER-1PL.PRS.IND-2PL.PRS.IND), or their failure to stop at “soft” (i.e. person-number) semantic boundaries (e.g. II+1SG.PRS.IND, II+1SG.PRS.IND+1PL.PRS.IND+2PL.PRS.IND).

The morphome-derived stem-domains, thus, should be supplemented, for completeness, with a meaning-derived one (something along the lines of Table 19) in order to account for many of the most recurrent domains of novel allomorphy.

Table 19 Paradigm areas derived from the semantics of different paradigm cells

Superimposing this sort of structure over the morphomic one in Table 9 would lead to a 13-way division of the paradigm that would come very close to covering all the recurrent novel splits found in the present research. It would only be empirically suboptimal in two respects: a) In order to account for all splits attested 3 times or more (this is just an arbitrary cut-off point to focus on trends and not on exceptional patterns), the distinctions between PLUP.SBJV and PRET, and between FUT and COND are not strictly needed, as cases like those in Table 17 are rare (see also discussions in Maiden, 2018b: 71-79, 267-272). What is needed instead is b) a boundary between the 3PL.PRS.IND and the rest of the cells, and, to a lesser extent (for only one recurrent split), one between the 2SG.PRS.IND and the 3SG.PRS.IND.

With semantics motivating the split (from Table 9 to Table 20) between IA and IB, between IIA and IIH, and between IIIA, IIIC, IIIJ and IIIK, it is a logical next step to ask what might be motivating the two small tweaks that, as mentioned, separate Table 20 from the mere juxtaposition of the morphomic (Table 9) and semantic (Table 19) systems. The answer, I believe, is none other than the token frequency of different cells in natural speech.

Table 20 Pan-Romance stem domains based on late morphological innovations

Stem domain IIA’s cells are all among the highest frequency ones in the paradigm (see Fig. 3) and must therefore be stored in the lexicon for a great number of verbs. As a result, a morphological innovation that affects one of these cells need not extend to the others. [Footnote 17] The opposite effect of frequency might be responsible for the failure of different PYTA and FUÈC tenses to cut ties with each other. More infrequent tenses might not be able to “make it” easily on their own and would tend to share the same traits as other parts of the paradigm. The inherited morphomic structure, thus, comes to the rescue in most cases.

The semantic splitting of unnatural morphological domains results, thus, in a paradigm structure with multiple motivations. The partitions illustrated in Table 20 look less morphomic than they did in Table 9. The relative similarity of the different morphological subdomains can be explored in a finer-grained manner by counting the number of times that these domains behave the same with respect to the adoption (or lack thereof) of the morphophonological innovations explored in this paper. Looking at those splits that occur 3 times or more in our dataset (and weighting them by their observed relative frequencies), Hamming distances were calculated in R (through the function hammingdists) between the different stem domains in Table 20. Subsequent hierarchical clustering (function hclust, method="average") leads to the groupings in Fig. 5 (all distances reported in Table 23).
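The logic of this procedure can be sketched as follows. This is a pure-Python stand-in written for illustration, not the R code used for Fig. 5, and the domain labels and split frequencies are hypothetical toy data rather than the values in Table 23. Each split is a set of domains that share the innovative stem, and the distance between two domains is the frequency-weighted share of splits in which they behave differently.

```python
from itertools import combinations

# HYPOTHETICAL toy data: (domains taking the innovative stem, number of attestations).
DOMAINS = ["I", "II", "III", "V", "VI"]
SPLITS = [
    ({"II"}, 4),
    ({"I", "II"}, 3),
    ({"V", "VI"}, 3),
    ({"III"}, 2),
]


def weighted_hamming(a, b, splits):
    """Share of attestations in which domains a and b do not pattern together."""
    total = sum(weight for _, weight in splits)
    differing = sum(weight for cells, weight in splits if (a in cells) != (b in cells))
    return differing / total


def average_linkage(items, dist):
    """Agglomerative clustering with average linkage; returns the list of merges."""
    def avg(c1, c2):
        return sum(dist(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in items]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest mean pairwise distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


merges = average_linkage(DOMAINS, lambda a, b: weighted_hamming(a, b, SPLITS))
for left, right, height in merges:
    print(left, right, round(height, 3))
```

In the toy data the two domains that always pattern together are merged first at distance zero, and domains that rarely share a stem join the tree last, which is how the dendrogram in Fig. 5 should be read.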

Fig. 5 Hierarchical clustering of the stem-domains in Table 20 by similarity

As the clustering in Fig. 5 shows, the similarity between the different more-or-less natural-class stem domains in Table 20 runs in many ways parallel to the morphomic structural categories in Table 9: The various subparts of I, II, and III show up now as probabilistic clusters of similarity. At the same time, semantic relations explain the subclustering within those groups: the imperative is the most dissimilar in II (different mood), and the infinitive the most dissimilar in III (note that the gerund shares at least its imperfective nature with the imperfect indicative). In the highest levels of the hierarchy, the similarity in behaviour of FUÈC and PYTA might be unexpected. This could be due to an ongoing consolidation of the least token-frequent domains in the paradigm into a non-II ‘default’ form of the stem (note how these areas are grouped in progressively larger domains ((V+VI)+I)+III before they join the most externally dissimilar domain of II). Morphomic, semantic, and frequency-based insights, thus, all come together to explain the trends observed in morphophonological innovations in Romance.

A final observation with respect to the cognitive-paradigmatic domains identified here through morphological innovations in stems is that they are remarkably close to the domains of interpredictability identified through conditional entropies in individual Romance languages (e.g. Pellegrini & Cignarella, 2020, Beniamine et al., 2021) and also in Latin (Pellegrini, 2020), looking at whole word forms. Suffixal and stem allomorphy, in fact, seem to behave alike in many instances (see Tables 21 vs 22), which casts doubt on the usefulness and empirical basis of segmentation in at least some cases.

Table 21 Interpredictability areas based on suffixal allomorphy in the Spanish PRS.IND
Table 22 Interpredictability areas based on stem allomorphy in the Spanish PRS.IND

In the largest inflectional classes in Spanish (Table 21), and in many other Romance languages, the 3SG suffix can predict and be predicted by the 2SG and 3PL suffixes. The 1PL suffix can predict and be predicted by the 2PL and INF. The 1SG suffix cannot predict any other cells. In the right-edge allomorphic alternations of the CVC>CV verbs in Table 22, we find the exact same areas of interpredictability. Future explorations could be aimed at ascertaining whether suffixal innovations do indeed paint a similar picture to the one that novel stem alternations have provided in the present paper.

8 Conclusion

Some research traditions like Distributed Morphology have notably underplayed the relevance of paradigmatic relations and wordform-to-wordform predictive structure for the organization of morphological allomorphy. This triggered an understandable reaction from the Autonomous Morphology camp, initiated by Aronoff (1994), which recognized the importance of these relations. This reaction, however, has arguably leaned occasionally too far towards the opposite extreme position that the semantic and syntactic relations within the paradigm have little to no influence in the evolution of (Romance) inflectional morphology. Blevins (forthcoming), for example, questions the importance of natural classes in language, and Carstairs-McCarthy (2010:210) has similarly argued that morphological evolution suggests that the importance of features “has been overrated”. Maiden (2016:49) argues that morphomic patterns are not dispreferred in Romance diachrony and he favours explanations other than semantic structure when accounting for cases of morphomes aligning to semantic values (Maiden, 2018b:308-309).

In view of the findings of the present paper, one feels entitled to concur with Aristotle that there is virtue in the mean. Morphomic structure (derived from accidental historical events in the language), semantic structure (TAM at least), and frequency of use, are all forces of the utmost importance to explain different aspects of the synchronic inflectional morphological system of most languages, including Romance. What I claim here is not entirely new. Some morphologists (e.g. Bybee 2006, 2007) have convincingly argued for the importance of each of these three forces in the organization of language in general and the paradigm in particular. In (re)claiming the importance of the morphological component there is no need to deny that of the rest. Thus, although Romance might have been somewhat exoticized in recent years due to the focus on the purely morphological aspects of its paradigmatic structure, the findings reported here confirm the old insight that other sources of explanation are also needed.

Although its claims might not be new, this paper is the first one to approach the problem of morphome productivity quantitatively and in a comparatively objective and replicable way. The morpho-phonological changes related to the loss of stem-final consonants were surveyed in 22 different lexemes across 63 Romance varieties, which yielded 502 meaningful datapoints. The focus has been on this type of change and its morphological repairs, rather than on the deployment and redeployment of morphomic alternations, for two main reasons: I) to evaluate the diachronic productivity of morphomes in the purest possible way we need to look at morphological alternations that are as phonologically dissimilar as possible from the typically-morphomic ones that were inherited from Proto-Romance, and II) because the loss of these segments happened late and separately in different Romance languages, we are more likely to be analyzing comparatively independent datapoints, rather than older traits simply inherited from the ancestral language.

The findings reported here have been many. First, it was found that N is by far the most productive of Romance morphomes. This might be related to the greater token frequency of its cells. At the same time, however, it has been found that most of the late innovative paradigmatic splits surveyed do not follow the established morphomic patterns of alternation. Even when morphome-derived stem domains and stem-domain combinations are also “allowed”, morphome-abiding splits still constitute a minority (N = 192, 44.4%). This suggests that the importance of Romance morphomes as abstract patterns of stem identity, independent of their phonological make-up, might have been somewhat exaggerated. Although the attraction effect of morphomic templates still emerges very clearly when these counts are compared to a random baseline of all possible alternation patterns, a much greater proportion of paradigmatic splits can be accounted for with the additional reference to semantic factors. It has been found that novel patterns of alternation tend to split the inherited morphome-provided domains into smaller, more semantically rational, and homogeneous ones. Thus, Inherent Inflectional distinctions (i.e. TAM values, see Booij, 1996), and other ‘hard’ semantic borders such as between different finite and nonfinite forms, often become insurmountable obstacles for the spread of morphological innovations. At the same time, ‘soft’ semantic boundaries (i.e. person-number values) are often ignored, even when these correspond to morphomic boundaries (see also Herce, 2022). Most violations of morphomic structure that cannot be explained by semantic factors can be explained via token frequency, with highly frequent forms from the present indicative capable of resisting assimilation to their morphomic and semantic neighbours.

Although the findings of the present paper are incompatible both with the strictest version of Autonomous Morphology and with the view that there are no morphomes (see positions A and B in Sect. 3), they say little about intermediate positions in the debate. The qualitative claims and discussions that have prevailed in the literature so far cannot be objectively translated into testable hypotheses and numerical expectations, so, besides discarding the two logical extremes, we are still left unfortunately in the dark concerning “who was right and who was wrong”. What I could do, however, was assess the ‘pull’ or ‘power of attraction’ of Romance morphomic templates. The advantage of this quantitative approach is that future research will have the results of this paper (e.g. 15% [see Fig. 1] or 14.6 billion times overrepresented, see Table 18) as less slippery figures to replicate, refute, or compare to.

The present findings notwithstanding, much work remains to be done to explore other aspects of paradigmatic morphological productivity and vitality. Although the present paper has excluded changes involving morphome-typical exponents (e.g. stem augments /k/-/g/, the reflexes of the former inchoative suffix /isk/, etc.), these sometimes do change their distribution in the paradigm as well (consider, for example, the reflexes of dīcō in Alghero Catalan, or fīniō in Macerata Italian, see Maiden et al., 2010). In doing so, they can also inform us about the underlying cognitive structuring of the Romance verbal paradigm. An assessment of whether morphome-typical and morphome-atypical morphology behave in similar or in different ways would also help us quantify the autonomy of Romance stem alternations, in this case from the phonological component.

Due to the Romance-wide focus of the present study, references to the semantic distance between different cells, and to the relative frequency of different feature values have only been approximate. Corpus-based research (e.g. in the distributional semantic tradition) in concrete languages would be able to evaluate, in a much more fine-grained way, the match (or lack thereof) between stem alternation patterns and semantic distributional distances between different paradigm-cells. In addition to this, although the morphophonological phenomena that have been analyzed here are not inherited from the Proto-language in any meaningful way, individual datapoints are, as visual inspection of Appendix A reveals, not completely independent. Morphological innovations across different verbs in the same language tend to converge. Genetic relations between closely-related varieties and areal relations might also lead to shared innovations, with the concomitant double or triple-counting of what is at heart a single event. Phylogenetic work could control for some of these confounds and estimate with greater precision than was possible here the productivity of the different Romance morphomes, for example by calculating the rate of gain and rate of loss of different morphomes in different lexemes and in different areas of the paradigm. These will be left for the future.