Abstract
The Constant Rate Hypothesis (Kroch 1989) states that when grammar competition leads to language change, the rate of replacement is the same in all contexts affected by the change (the Constant Rate Effect, or CRE). Despite nearly three decades of empirical work into this hypothesis, the theoretical foundations of the CRE remain problematic: it can be shown that the standard way of operationalizing the CRE via sets of independent logistic curves is neither sufficient nor necessary for assuming that a single change has occurred. To address this problem, we introduce a mathematical model of the CRE by augmenting Yang’s (2000) variational learner with production biases over an arbitrary number of linguistic contexts. We show that this model naturally gives rise to the CRE and prove that under our model the time separation possible between any two reflexes of a single underlying change necessarily has a finite upper bound, inversely proportional to the rate of the underlying change. Testing the predictions of this time separation theorem against three case studies, we find that our model gives fits which are no worse than regressions conducted using the standard operationalization of CREs. However, unlike the standard operationalization, our more constrained model can correctly differentiate between actual CREs and pseudoCREs—patterns in usage data which are superficially connected by similar rates of change yet clearly not unified by a single underlying cause. More generally, we probe the effects of introducing contextspecific production biases by conducting a full bifurcation analysis of the proposed model. In particular, this analysis implies that a difference in the weak generative capacity of two competing grammars is neither a sufficient nor a necessary condition of language change when contextual effects are present.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
1.1 The Constant Rate Effect
In a seminal paper in historical syntax, Kroch (1989) proposed the Constant Rate Hypothesis:
[W]hen one grammatical option replaces another with which it is in competition across a set of linguistic contexts, the rate of replacement, properly measured, is the same in all of them. (Kroch 1989:200)
Initially (and still logically) a hypothesis, the notion of a constant rate has accumulated enough support over the last three decades for this to be referred to as the Constant Rate Effect, or CRE (see e.g. Pintzuk 2003:511).
The logic behind CREs is as follows: if a variant replaces another variant in two or more different contexts and the rate of change is the same in each of these contexts, then we should assume that only a single change has occurred. CREs have therefore been deployed to argue that two or more apparently unrelated surface changes are in fact manifestations of a single underlying change (Fig. 1). Unifying changes in this way provides strong support for approaches to language in which syntactic variation consists not primarily in lexical or contextual idiosyncrasies but in the values of a finite number of universal parameters, as in the classical Principles & Parameters approach (Chomsky 1981; Chomsky and Lasnik 1993). There is no necessary link between Principles & Parameters and CREs, as Pintzuk (2003:511) emphasizes; the variationist approach within which the Constant Rate Hypothesis is couched is theoryneutral. However, CREs are a useful tool in the armoury of diachronic syntacticians who wish to argue for “the controlling effect of abstract grammatical analyses on patterns in usage data” (Kroch 1989:239).
CREs offer a fresh perspective on the causation of changes. Kroch (1989:238) criticizes the approach to causation in which “the finding that a given context is most favourable to the use of an innovation is taken to show that the innovation is an accommodation to the linguistic functionality of that context.” Where there is a disparity between contexts that share the same rate of change, this “reflects functional effects, discourse and processing, on the choices speakers make among the alternatives available to them in the language as they know it; and the strength of these effects remains constant as the change proceeds” (Kroch 1989:238). In other words: surface changes are to be thought of as reflexes of underlying grammatical changes; the discrepancies in frequencies seen at the surface level are due to extragrammatical factors, or contextual effects, which are independent of the underlying change itself and constant across time.
The usual procedure for detecting a CRE in some diachronic data is to fit a logistic curve (1) to each of the contexts separately and then to compare the growth rates of these curves against each other.
Here, \(p_{t}\) is the frequency of either the innovatory or the receding variant (or parameter value) in a given context at time t, and s is the (timeindependent) rate of change in that context. The k parameter serves to translate the curve along the time axis, indicating the point of greatest growth, or the tipping point, of \(p_{t}\) (Fig. 2). With this operationalization, we have the following procedure for establishing a CRE: a logistic curve of the form (1) is first fit to each of the contexts of interest separately and independently. Then, if variation among the s or ‘slope’ parameters for these curves is found to fall within a reasonable confidence interval, the change is said to proceed at the same (‘constant’) rate in all contexts. Variation among the k or ‘intercept’ parameters, on the other hand, is allowed and is where the contextual effects, independent of the underlying grammatical change, are thought to manifest themselves. This is the procedure used in a number of studies that have sought to establish CREs in various processes of change across a number of languages (e.g. Kroch 1989; Santorini 1993; Pintzuk 1995; Kallel 2005; Pintzuk and Taylor 2006; Kallel 2007; Fruehwald et al. 2009; Postma 2010; Durham et al. 2012; Wallage 2013; Gardiner 2015). Henceforth, we shall refer to it as the standard operationalization.^{Footnote 1}
1.2 The nonlinking problem
Initially, the logistic function (1) was adopted because of its practicability and its success in other disciplines such as population genetics, not because it followed from any established first principles:
[G]iven the mathematical simplicity and widespread use of the logistic, its use in the study of language change seems justified, even though, unlike in the population genetic case, no mechanism of change has yet been proposed from which the logistic form can be deduced. (Kroch 1989:204)
The logistic has since been derived from mathematical models of language acquisition independently by Niyogi and Berwick (1997) and Yang (2000); Ingason et al. (2013) provide a particularly clear illustration of how syntactic acquisition in successive generations can give rise to logistic change at population level. What has never been explicated in detail, however, is why different contextual reflexes of a single underlying change should be governed by logistics agreeing in their s parameters but freely varying in their k parameters: even though this operationalization has proved useful in gathering empirical support for the Constant Rate Hypothesis, it is not a model of the CRE itself.^{Footnote 2} In short, while the standard operationalization may adequately describe historical data, it fails to explain it, suggesting no mechanism for how contextual reflexes spring from underlying changes. The fact that under the standard formulation the independent contextual reflexes are not linked to each other, or to anything else, in this stronger sense we call the nonlinking problem, and there are a number of reasons to believe that the problem is serious enough to warrant that the standard operationalization of CREs should be rejected.
Firstly, note that fitting a number of independent logistics to a number of contexts in some data leaves variation among the k parameters entirely unexplained, even if we assume that the logistics agree in their s parameters as required by the standard operationalization. In principle, it is possible for this variation in k to be arbitrarily large, and it is therefore in principle possible to ‘connect’ two clearly unrelated changes—possibly separated by millennia on the time axis—as long as they happen to share the same growth rate. In principle, then, it is possible to be led to the absurd conclusion that a single change runs to completion in one context before it even takes off in another (Fig. 2).
Secondly, there are reasons to think that not all instances of logistics agreeing in their s parameters are in fact CREs in the sense that a single underlying grammatical change is being modulated in the usage of a speaker or group of speakers by constant contextual (functional, discourserelated, etc.) effects. The relevant evidence comes from studies in which the ‘contextual effects’ are not withinspeaker but betweenspeaker effects or even outright contingencies. Wallenberg (2016) shows that relative clause extraposition is a gradually declining option across the histories of both English and Icelandic, and that the s parameters of the two curves do not differ significantly. Similarly, Willis (2017), in his study of the spread of the innovative secondperson pronoun chdi in the recent history of Welsh, finds that in different regions of Wales the change is more or less advanced (i.e. different intercepts) but that the slopes of the changes are not significantly different. Corley (2014) tests for a CRE in the usage of negative concord between female and male speakers of Early Modern English, using data from Nevalainen and RaumolinBrunberg (2003), and again finds no significant difference in slopes. What these case studies show is that the s parameters of two different changes may be similar for reasons other than being reflexes of a single abstract grammatical pattern, and thus that identity of slope parameters is not a sufficient condition for the assumption of a single underlying change. Wallenberg (2016:e244), for instance, notes explicitly that these are different populations, and suggests that the similarity of slopes may indicate that “the same forces are underlying the change” in both the English and the Icelandic populations—but however these forces are to be understood, we cannot be dealing with a CRE in the traditional sense, as all these authors recognize.^{Footnote 3}
These two problems are in most cases only technical in the sense that a researcher will usually have independent reasons for ruling out such fantastical hypotheses: in particular, the inference that two apparently separate changes are reflexes of the same underlying phenomenon is usually motivated by a particular structural analysis which is arrived at on independent grounds. However, the theoretical importance of these problems is great: they demonstrate that the standard independent logistics formulation of the CRE can serve at most as a proxy to CREs, not as a model of them. If the CRE is a phenomenon—and the empirical support gathered for it over the last three decades suggests it is—this means we have so far failed to model one of the more wellestablished facts about language diachrony. Consequently, we have only a very approximate understanding of the dynamics of language change in the presence of contextual factors, and a number of questions remain wide open: if underlying changes are to be thought of as competition between two or more parametric options or grammars, and if CREs are thought to appear because of some sort of performance effects operating over that process of competition, how, exactly, do the two processes interact? What role does the magnitude of the performance effects play in the overall change? Could contexts that favour the innovatory variant be so favouring as to accelerate the change, and if so, can this accelerating effect be quantified and measured? Similarly, could disfavouring contexts slow the change down? Could they even block change in certain cases? These questions can only be answered with the help of mathematical models of change that accommodate mechanisms for both grammatical competition and contextual effects, and also define, without equivocation, the possible interactions between these two mechanisms over time.
The nonlinking problem has, of course, not gone unnoticed in the literature. As Roberts (2007) puts it:
One might wonder why [the CRE] should hold. It is unlikely to be a fact about the grammars themselves. Instead, it is plausible that it may be a fact either about speech communities or about the ways in which individuals choose among grammars available to them. As such, it may be attributable to sociolinguistic factors or to the dynamics of populations, or both factors acting in tandem. (Roberts 2007:313)
Our aim in this paper is to propose a solution to the nonlinking problem, and our concrete proposal is that the CRE occurs because of contextspecific production biases which serve either to promote or to hinder an underlying change in progress. That is to say, we will argue that the CRE is indeed a fact about the ways in which individuals choose among grammars available to them, and propose a rigorous mathematical model of this kind of speaker behaviour. The result is a first step towards a mechanistic model of the CRE that not only describes the diachronic phenomenon but explains it by deriving it from independently plausible first principles of language acquisition and use.
1.3 Plan
The paper is structured as follows. In Sect. 2, we augment Yang’s (2000) mathematical model of grammar competition with production biases to account for variability across contexts. This results in a dynamical system in which the evolution of the underlying change—a parameter switch for us—and the evolution of the usage frequencies in a number of linguistic contexts feed into each other iteratively. We then derive analytical expressions for the time evolution of this system and show in Sect. 3 that in most cases it can be approximated by a constrained set of equations based on one logistic. In Sect. 4, this approximation is used to derive a theorem concerning the possible temporal separation between two reflexes of one underlying change: we show analytically that under our proposed model the time separation between contexts always has a finite upper bound which is inversely proportional to the rate of the underlying change. This solves the problem of unconstrained variation in the k parameters: under our model, it is no longer possible for the curves of different contexts to be radically distant from each other in time. In the remainder of Sect. 4, we proceed to test the model empirically from two complementary angles: (1) by investigating whether time separations observed in a number of previously established CREs agree with the predictions of our time separation theorem, and (2) by testing whether our model is able to distinguish actual CREs from pseudoCREs, that is, surface changes that proceed at similar rates accidentally but that are clearly not reflexes of one and the same underlying change.
A side product of this investigation is an extension of some of the analytical results in Yang (2000). In Sect. 3, we uncover all possible outcomes of the dynamical interplay between grammatical competition and production biases. A full bifurcation analysis of the twogrammar case shows that production biases can both induce change in settings where Yang’s (2000) model outlaws change, and block change in settings where Yang’s (2000) model predicts change. On the assumption that a model which incorporates the possibility of production biases is more realistic than one that does not, then, the assumption that language change is driven (solely) by distributional differences in the proportion of sentences parsed by different competing grammars is shown to be too simple. A theorem resulting from our analysis of the extended model shows that, when production biases are in operation, such differences are neither necessary nor sufficient for change, though they continue to play an important role in any given change process in a way that can be quantified exactly. In Sect. 5, we offer a brief account of the nature of production biases; Sect. 6 concludes.
2 Grammar competition and production biases
2.1 Learning competing grammars
Empirical work on language variation and change has demonstrated the limitations of the traditional view of parameter setting as a onceandforall process which leaves the learner with a unique grammar at the point of maturation: speakers have, at least during periods of change, access to more than one grammar (Kroch 1989, 1994, 2000; Santorini 1992; Pintzuk 2003). As pointed out by Santorini (1992:619), this intraindividual coexistence of multiple grammatical systems is “an ability for which the phenomena of multilingualism, diglossia and intrasentential codeswitching provide independent and incontrovertible evidence”; see also the discussion in Roberts (2007:319–331). This notion has been formalized by Yang (2000, 2002) in his mathematical model of competitiondriven change, on which our model of the CRE is based. We therefore begin by reviewing the operating principles behind this model, focussing on the presentation in Yang (2000).
This model construes language change as a learning process in a homogeneous, wellmixing population with nonoverlapping generations. At each iteration, we can therefore think of the population as a single individual who sets parameters based on the linguistic output of the previous generation, abstracting away entirely from the social and geographical structure of that population. In the competing grammars framework, each biologically possible grammar of human language \(G_{i}\) is associated with a weight which gives the probability of an individual using that grammar. The framework allows any number of those grammars to compete; however, during wellstudied and relatively wellunderstood periods of language change, it usually seems to be the case that two grammars are in competition. Since, additionally, this renders the mathematics of the model particularly tractable, we focus on the twogrammar case in all that follows.
Let \(G_{1}\) and \(G_{2}\) be these two grammars, and denote their weights with \(p_{t}\) and \(q_{t}\), respectively, indexed for generational time t.^{Footnote 4} The basic insight behind Yang’s (2000) model is that each grammar has its timeindependent (parsing) advantage, which is simply the proportion of sentences the other grammar cannot parse (out of all sentences generated, in abstracto, by either grammar). There are then fundamentally three kinds of sentence: sentences of type \(L_{1}\), which \(G_{1}\) but not \(G_{2}\) parses; sentences of type \(L_{2}\), which \(G_{2}\) but not \(G_{1}\) parses; and sentences of type \(L_{X}\), which both grammars parse (Fig. 3). The language learner receives primary linguistic data (PLD)—the linguistic output of the generation at time step t—and his task is to arrive at weights \(p_{t+1}\) and \(q_{t+1}\) for the two competing grammars in his own generation. Letting α denote the advantage of \(G_{1}\) and β that of \(G_{2}\), then (assuming he samples his environment uniformly) the learner is confronted with a number of sentences drawn from the following distribution:
Based on this input, the learner is assumed to set parameters in accordance with linear reward–penalty learning, an offtheshelf learning algorithm from mathematical psychology (Bush and Mosteller 1951, 1958; Narendra and Thathachar 1989). Of crucial importance here are the two quantities \(c_{t} = \beta q_{t}\) and \(d_{t} = \alpha p_{t} = \alpha (1q_{t})\), known as the penalty probabilities of the two grammars: \(c_{t}\) is the probability of the learner encountering a sentence which \(G_{1}\) cannot parse and \(d_{t}\) the probability of a sentence which \(G_{2}\) cannot parse. It can be shown (Narendra and Thathachar 1989:162–163) that, if the learner’s training sample is large enough, eventually he ends up with a weight \(q_{t+1}\) which is well approximated by
Assuming \(c_{t} \neq 0\) and \(d_{t} \neq 0\) without loss of generality, this equation may be reduced to the more useful form
where we write ρ = α/β for the ratio of the parsing advantages. Equation (4), then, relates the grammar weights of the (t + 1)th generation to those of the tth generation, thereby defining the intergenerational or diachronic dynamics of a sequence of (reliable) linear reward–penalty learners.^{Footnote 5}
It follows that α<β, or ρ<1, is a sufficient condition for grammar \(G_{2}\) to overtake grammar \(G_{1}\):
Theorem 1
(The Fundamental Theorem of Language Change; Yang 2000:239)
Assume reliable learners, so that (4) holds. Then \(q_{t} \to 1\) as t→∞ if α<β, and \(q_{t} \to 0\) as t→∞ if α>β.
In other words, the grammar with the greater parsing advantage will necessarily win out in the long term. The difference equation (4) may in fact be solved for t to yield
where \(q_{0}\) is the weight of \(G_{2}\) at the point of actuation of the change (Appendix A.1, Corollary 4): hence as soon as the value of \(q_{0}\) is known, the entire change trajectory can be predicted. Furthermore, it is not difficult to show that this solution is equivalent to
with s = −log(ρ) and \(k=\log(\rho)^{1}\log (q_{0}^{1}  1)\). Thus, assuming that learners receive representative samples of their linguistic environments, a diachronic sequence of such learners exhibits logistic evolution. In particular, the slope of the trajectory is directly dependent on the advantage ratio ρ such that the smaller ρ (the more advantageous \(G_{2}\) is), the faster the change from \(G_{1}\) to \(G_{2}\), and vice versa.
This is the gist of the competing grammars model of language change; for more details, see Kroch (1994), Yang (2002), Pintzuk (2003) and especially Heycock and Wallenberg (2013), who apply the model to a concrete case study involving the loss of verb movement in Scandinavian.
2.2 Competing grammars and contextual biases
To account for contextual effects and the CRE, we now assume the existence of K linguistic contexts 1,…,K, with each sentence generated by \(G_{1}\) or \(G_{2}\) belonging to one and only one of these contexts.^{Footnote 6} Each context i is equipped with a context weight \(\lambda_{i}\) that gives the proportion of sentences that fall in that context (out of all sentences generated by either \(G_{1}\) or \(G_{2}\)); clearly, since we are dealing with proportions, we require \(\lambda_{1} + \dots + \lambda_{K} = 1\) (Fig. 4). In addition to these weights, each context is associated with a fixed (constant over time) production bias \(b_{i}\) which can be positive, negative or zero. In the first case, the context favours \(G_{2}\); in the second, it favours \(G_{1}\); and in the third case, the context is neutral with respect to the two grammars.^{Footnote 7}
Now consider a language learner acquiring his grammar weights based on the output of generation t of speakers. With Yang (2000), we assume that the tth generation has internalized grammar weights \(p_{t}\) and \(q_{t}\). Where our treatment diverges is the effect these weights have on the language acquisition process of the (t + 1)th generation. Rather than assuming that \(p_{t}\) and \(q_{t}\) feed directly into the acquisition process in the following generation, we assume that speakers of the tth generation may promote or demote the two weights \(p_{t}\) and \(q_{t}\) in different linguistic contexts in different ways, subject to the contextspecific production biases \(b_{i}\). It is then on the basis of this usage, modulated by the contextual biases \(b_{i}\) and the context weights \(\lambda_{i}\), that the next generation of learners must infer their grammar weights.
Letting \(q^{(i)}_{t}\) denote the probability with which a speaker of the tth generation uses grammar \(G_{2}\) in context i, and similarly for \(p^{(i)}_{t}\) and \(G_{1}\), a general form of this biasing is
where F and G are some (yet undetermined) functions which modulate the effect of the bias \(b_{i}\) on production. These functions must satisfy two requirements:
and the following is a theorem.
Theorem 2
Functions \(F = F(b_{i}, p_{t})\) and \(G=G(b_{i}, q_{t})\) satisfy the conditions (7) and (8) if, and only if, they satisfy
Proof
Appendix A.2. □
Theorem 2 thus implies that the functions F and G are necessarily the additive inverse of each other, and that their absolute value is necessarily bounded from above by the minimum of \(p_{t}\) and \(q_{t}\). Technically an infinite number of functions satisfy this pair of conditions, so we need to ask what these functions actually are (Fig. 5). The simplest, most parsimonious choice is to consider the product \(p_{t}q_{t}\), which is guaranteed to be bounded from above by both \(p_{t}\) and \(q_{t}\) whenever \(0 \leq p_{t},q_{t} \leq 1\). In other words, we suggest setting
with \(1\leq b_{i} \leq 1\). The contextual usage probabilities in (7) then assume the definite forms
where the contextual biases \(b_{i}\) range from −1 (maximally \(G_{1}\)favouring) through 0 (neutral) to 1 (maximally \(G_{2}\)favouring).
This choice for the functions F and G has a number of intuitively satisfying features. For example, (10) implies that if either \(p_{t}=1\) or \(q_{t}=1\), then F = G = 0 (since in the first case \(q_{t}=0\) and in the second case \(p_{t}=0\) and consequently \(p_{t}q_{t}=0\)) and no biasing will apply. Empirically, this means that if a grammar has been acquired categorically, no contextual biases will be able to skew usage in the direction of the other grammar. This is intuitively right: if a grammatical option has been acquired categorically, then by definition the competing option does not exist for the speaker and no grammarexternal biasing ought to be able to apply. This behaviour of our biasing mechanism in the limits \(q_{t}\to 1\) and \(q_{t} \to 0\) is just one manifestation of a more general feature of the model: that while the biases \(b_{i}\) themselves are constant and do not change over time, the magnitude of the effect of these biases on usage does depend on the state of the underlying change: the effect is the strongest midway through the change (from Fig. 5, we see that the effect is the strongest when \(q_{t}=0.5\)) and tails off to zero in the limits \(q_{t} \to 1\) (completion) and \(q_{t} \to 0\) (actuation). As we will see in Sect. 4, this is what the empirical data also show.^{Footnote 8}
With (11) in place, it is possible to work out the diachronic, intergenerational dynamics of our model (assuming, again, that learners receive large input samples). What generation t outputs in this extended model is not the distribution given in (2), but a combination of grammar advantages (α and β), grammar weights (\(p_{t}\) and \(q_{t}\)), context weights (\(\lambda_{i}\)) and context biases (\(b_{i}\)). The penalty probability for grammar \(G_{1}\) now becomes
where the index i runs through the contexts i = 1,…,K and where we write \(B = \sum_{i=1}^{K} \lambda_{i} b_{i}\) for convenience.^{Footnote 9} The quantity B, which may be regarded as the net bias operating on the language acquisition process weighted by the context proportions \(\lambda_{i}\), turns out to be a decisive quantity in our model: from (12), we immediately see that if B = 0, the penalty \(c_{t}\) reduces to the Yangian penalty \(c_{t} = \beta q_{t}\). Our model, then, generalizes Yang’s (2000) model and reduces to the latter in the special case that the contextual biases are ‘in balance’—if either all the biases are zero or if \(G_{2}\)favouring (positive) biases cancel out the effect of \(G_{1}\)favouring (negative) biases.
Entirely symmetrically, the penalty for grammar \(G_{2}\) reads
Assuming reliable learners, we may now use these penalty probabilities to write down the intergenerational difference equation that relates \(q_{t+1}\) to \(q_{t}\) for the extended model: equation (4) becomes
Recalling that \(p_{t} = 1  q_{t}\), this may be written as
where ρ = α/β as before and
2.3 The Constant Rate Effect
To summarize, we propose to augment Yang’s (2000) model of grammar competition with a set of production biases \(b_{i}\) which modulate the grammar weights \(p_{t}\) and \(q_{t}\) in actual linguistic production. This modulation is implemented by a mechanism which, we have shown, has to operate within certain analytical bounds. Within those bounds, we have suggested that the most parsimonious mechanism be adopted, corresponding to our particular choice of the biasmodulating functions F and G, as explained above. The diachronic behaviour of this extended model is characterized by equations (11) and (15): the difference equation (15) gives the evolution of the underlying grammar weight \(q_{t}\), whilst equation (11) supplies the contextspecific value of this probability, modulated by the contextual production biases. The flowchart in Fig. 6 illustrates the intergenerational dynamics that result from this mechanism, comparing our extension of Yang’s (2000) model to the original.
Before moving on to an empirical evaluation of our proposed model, we first ask whether it produces, in broad qualitative terms, the right kind of behaviour. To this end, Fig. 7 shows the behaviour of our model in two different situations involving three arbitrarily chosen contexts: in a situation in which the contextual biases are in balance and cancel each other out (B = 0; Fig. 7a), and in a situation in which the net effect of biases in favour of the conventional variant \(G_{1}\) conspire against the propagation of the innovative variant \(G_{2}\) (B<0; Fig. 7b). Impressionistically, our model produces a CRE in both cases: the probability of use of \(G_{2}\) increases roughly at the same rate in each context, with a characteristic temporal shift between the propagation curves of the individual contexts. This suggests that our model is able to replicate the central intuition of Kroch (1989) that different reflexes of one underlying change ought to proceed at similar rates, and that the output of our model can, in principle, approximate the empirical situations that have been suggested as CREs in the literature.
In these two cases, the evolution of the underlying probability \(q_{t}\) is different, however, because of the different biasing that applies in each case. In Fig. 7a the contexts are ‘in balance’ (B = 0), which by the preceding analysis implies that the evolution of \(q_{t}\) itself is logistic. In Fig. 7b, on the other hand, \(G_{1}\)favouring biases outweigh \(G_{2}\)favouring biases (B<0), hindering the propagation of \(G_{2}\). This is reflected in the fact that the evolution of the underlying \(q_{t}\) is slowed down. Even though \(G_{2}\) still overtakes \(G_{1}\) in the limit, the trajectory of \(q_{t}\) is no longer strictly logistic (careful examination shows that it is not symmetric about the midpoint \(q_{t}=0.5\), but rather exhibits slower change for \(q_{t} < 0.5\) and faster change for \(q_{t} > 0.5\)). This motivates us to consider extreme model parameter regimes, particularly the subspace where B is negative, in more detail.
In equation (15), the factor \(\Lambda_{t}\) depends on \(q_{t}\) whenever B ≠ 0. This complicates the analysis of the extended model significantly: while the Yangian equation (4) can be solved for t to yield the logistic function, we are not aware of a closedform solution to the more complex nonlinear difference equation (15) except in the singular case B = 0, where the equation reduces to (4). This has the undesirable practical consequence that there is no trivial way of fitting our model to data—lacking a closedform curve for the underlying probability \(q_{t}\) from which to derive curves for the contextual reflexes \(q^{(i)}_{t}\), there simply are no closedform contextual curves to fit. The best one can do is to iterate the model for various choices of model parameter values and initial conditions and compare the resulting trajectories against empirical data, an approach which soon becomes computationally prohibitive as the number of logically possible model parameter combinations grows as a superlinear function of the number of model parameters. To tackle this problem, we will in the next section conduct a full analysis of the behaviour of our model in the limit t→∞ and show that, under most empirically meaningful combinations of model parameter values, the underlying trajectory \(q_{t}\) is well approximated by a logistic curve. Thus, even though we cannot write down the solution of \(q_{t}\) for arbitrary times t, and even though we know that for some parameter values (such as when B<0) the evolution of \(q_{t}\) is not logistic, we can use logistic functions to approximate the true value of \(q_{t}\). This will form the basis of our curvefitting procedure in Sect. 4. A reader who is willing to skip the technicalities of the logistic approximation may advance straight to Sect. 4.
3 Dynamics of the extended model
3.1 Advantage versus bias
As we have noted above, the Fundamental Theorem of Yang’s (2000) model is that a more advantageous grammar will necessarily overtake a less advantageous one: if ρ<1 (α<β) and learners are reliable, then \(q_{t}\to 1\) as t→∞, and thus grammar \(G_{2}\) overtakes \(G_{1}\) (Theorem 1). A nontrivial consequence of extending the model with production biases is that this theorem no longer holds: a difference in the proportion of input parsed by the two competing grammars is neither sufficient nor necessary for language change. While this is a minor observation from the point of view of the CRE, which is the main focus of the present paper, the failure of the Fundamental Theorem under suitable combinations of grammar advantages and production biases is an interesting finding from the vantage point of the theory of language change in general, and we will therefore pursue it briefly in this section. The bifurcation scenario here outlined will also play a role in the logistic approximation that we develop in the following subsection for model evaluation purposes.
The production biases \(b_{i}\) can be positive, negative or zero. In the first case, the context in question favours \(G_{2}\) over \(G_{1}\); in the second case, \(G_{1}\) is favoured; and in the third case, the context is neutral. The scalar product \(B = \sum_{i=1}^{K} \lambda_{i}b_{i}\) of context weights and production biases turns out to play a critical role in determining how, and if, change from \(G_{1}\) to \(G_{2}\) happens. If there are negatively biased (\(G_{2}\)disfavouring) contexts, and if their share of all sentences in the language learner’s PLD is large enough, change from \(G_{1}\) to \(G_{2}\) can be blocked even if the advantage of \(G_{2}\) is greater than the advantage of \(G_{1}\). On the other hand, if there are sufficiently strong positively biased (\(G_{2}\)favouring) contexts, \(G_{2}\) may overtake \(G_{1}\) even if the latter’s advantage exceeds that of the former. A critical value \(B_{c}\) of the net bias B in fact exists such that change from \(G_{1}\) to \(G_{2}\) is guaranteed whenever \(B > B_{c}\) but is blocked whenever \(B \leq B_{c}\):
Theorem 3
(The Extended Fundamental Theorem of Language Change)
Assume reliable learners, so that (15) holds. Let \(q_{0}\) be the weight of grammar \(G_{2}\) at the point of actuation, let \(B=\sum_{i=1}^{K} \lambda_{i}b_{i}\), and let
Then

1.
\(q_{t} \to 1\) as t→∞, if \(B > B_{c}\);

2.
\(q_{t} = q_{0}\) for all t, if \(B = B_{c}\);

3.
\(q_{t} \to 0\) as t→∞, if \(B < B_{c}\).
In other words, \(G_{2}\) overtakes \(G_{1}\) if, and only if, \(B > B_{c}\).
Proof
Appendix A.3. □
In dynamicalsystems terminology, the production bias mechanism induces a bifurcation in the parameter space of the extended model: small tweaks made to either the biases (\(b_{i}\)) or to the proportion of input falling in each context (\(\lambda_{i}\)) can alter the trajectory of language change entirely by determining which of the two grammars will win out (Fig. 8). An immediate consequence of Theorem 3 is that if no context is negatively biased, \(G_{2}\) will overtake \(G_{1}\) whenever ρ<1:
Corollary 1
If ρ<1 and \(b_{i} \geq 0\) for all contexts i, \(q_{t}\to 1\) as t→∞.
Proof
Since \(B_{c} < 0\) for any choice of \(q_{0}\), if ρ<1. □
Even though production biases, then, can induce or block change in parameter regimes where such behaviour is impossible in Yang’s (2000) original model, there are limits to how much of an effect the biases can have over grammar advantages. Briefly put, if \(G_{2}\) is much more advantageous than \(G_{1}\) (0<ρ≪1), then no amount of negative bias can block change, and, on the other hand, if \(G_{1}\) is much more advantageous than \(G_{2}\) (ρ≫1), no amount of positive bias can make \(G_{2}\) overtake \(G_{1}\). How much is much depends on the boundary condition \(q_{0}\):
Corollary 2
If \(\rho < q_{0}/(1+q_{0})\), then \(q_{t} \to 1\) as t→∞, regardless of the value of B. If \(\rho > (2q_{0})/(1q_{0})\), then \(q_{t} \to 0\) as t→∞, regardless of the value of B.
Proof
Clearly −1 ≤ B ≤ 1 since \(1 \leq b_{i} \leq 1\) and \(0 \leq \lambda_{i} \leq 1\). If \(\rho < q_{0}/(1+q_{0})\), then \(B_{c} < 1\). If \(\rho > (2q_{0})/(1q_{0})\), then \(B_{c} > 1\). □
Figure 8 illustrates.
3.2 Logistic approximation
The above results show that the outcome of grammar competition in the presence of contextspecific production biases is determined by a complicated interaction between these biases (\(b_{i}\)), the proportion of input that falls in each context (\(\lambda_{i}\)) and the ratio of the parsing advantages of the two competing grammars (ρ). This is because in our model the language acquisition mechanism and the production bias mechanism constitute a feedback loop across iterated applications over multiple generations of language learners, the production biases modulating the acquisition of the grammatical weights \(p_{t}\) and \(q_{t}\). As we noted in Sect. 2.3 (Fig. 7), this feature of the model also implies that when the production biases are particularly strong, they will cause the evolution of the underlying grammar weights to be nonlogistic. The feedback loop gives rise to a nonlinear difference equation for which we have no solution in the general case, and the following problem immediately arises: how can the predictions of our model be tested against empirical data if there is no closedform curve which to fit?
Even though the evolution of \(q_{t}\) is, strictly speaking, logistic only when B = 0, eyeballing trajectories such as the one in Fig. 7b suggests that these trajectories are still Sshaped and perhaps well approximated by logistics. To explore this possibility, we performed a sweep across the model parameter space, generating trajectories of \(q_{t}\) from the initial condition \(q_{0} = 0.01\) (1% usage of \(G_{2}\) at the point of actuation) in the regime \(B > B_{c}\) (i.e. in the parameter regime where \(G_{2}\) is guaranteed to oust \(G_{1}\) by Theorem 3), until \(q_{t}\) had reached the value \(q_{t} = 1q_{0} = 0.99\). We then proceeded to fit a logistic curve to each of these trajectories in order to investigate how well the trajectory may be approximated by a logistic. Figure 9a gives the errors of these fits, showing that the trajectories are closely approximated by logistics whenever ρ is not too large and B is not too close to the critical bifurcation threshold \(B_{c}\).
Figures 9b–c supply the best slope (s) and intercept (k) coefficients found by these regressions. We find that s is a decreasing function of ρ and an increasing function of B: the more advantageous \(G_{2}\) is, and the more \(G_{2}\) is favoured by the production biases, the steeper the underlying change, as one would expect. The intercept coefficient k, in turn, is an increasing function of ρ and a decreasing function of B: the less advantage \(G_{2}\) has and the more the production biases tend to disfavour \(G_{2}\), the more the curve of the underlying change is shifted towards positive time.
4 Evaluation
4.1 The Time Separation Theorem
Under the logistic approximation from Sect. 3, the usage of grammar \(G_{2}\) in the contexts i = 1,…,K is described by a set of K equations
where \(\tilde{q}_{t}\) is a logistic function approximating the true underlying probability \(q_{t}\). What historical language corpora give us are usage frequencies in various contexts, and we therefore wish to fit curves of the form (18) to such data. The fact that under the logistic approximation all such curves are tied to \(\tilde{q}_{t}\), which itself has a closedform solution, now facilitates this empirical evaluation: even though the individual context curves \(q_{t}^{(i)}\) themselves are not logistic (unless \(b_{i}=0\)), they are easily derived from one that is. In what follows, we will take a look at a number of case studies, fitting context curves with the help of a nonlinear least squares optimization algorithm.
Estimating the goodness of fit of these regressions is one important goal of this exercise. However, our main aim is to solve the nonlinking problem identified in Sect. 1.2. Specifically, we wish to demonstrate that our model does not allow arbitrarily large time separations between contexts, operationalized as the difference between the points in time at which different context curves reach their tipping point, or the point in time at which the context frequency of the overtaking grammar equals 0.5 when (18) is generalized for realvalued t. The logistic approximation gives us a straightforward proof of this.
Theorem 4
(The Time Separation Theorem)
For any two contextual reflexes of an underlying change from \(G_{1}\) to \(G_{2}\) approximated by a logistic \(\tilde{q}_{t}\) with slope s, the maximal time separation at tipping points is
Proof
Appendix A.4. □
It is to be noted that Δ(s) is inversely proportional to s—the slower the rate of change, the more time separation is allowed between any two contexts and vice versa.
To fit the system (18) to a set of data points, we first define reasonable ranges of variation for the s and k parameters of the logistic \(\tilde{q}_{t}\) that we wish to probe. We then loop through the values contained in these ranges, finding the best fitting bias parameters \(b_{i}\) for each pair (s,k) using a nonlinear least squares optimization algorithm such as the Gauss–Newton procedure (Bates and Watts 1988), bearing in mind the bounds \(1 \leq b_{i} \leq 1\). Finally, out of all these regressions, we pick the combination of s, k and \(b_{i}\) that provides the best fit to the data in question. The whole procedure is detailed in pseudocode in Appendix A.5.^{Footnote 10}
We now proceed to an evaluation of the model by comparing its predictions against three historical changes for which a CRE has been reported in the literature: the emergence of periphrastic do in the history of English (Kroch 1989), the earliest stages of the English Jespersen Cycle (Wallage 2013), and, to take a phonological example to illustrate the generality of the procedure, the loss of final fortition in Early New High German (Fruehwald et al. 2009). In Sects. 4.2–4.4 we first briefly summarize the linguistics of each change, reproduce the relevant empirical data, and visualize the fit of our model to the data when the regression is conducted using the procedure outlined above. In Sect. 4.5, we take a more quantitative angle and report the numerical errors of these fits, comparing them to the errors that an application of the standard procedure based on individual logistics (cf. Sect. 1.1) would produce. Finally, in Sect. 4.6, we take a look at a pseudoCRE—a case where the standard independent logistics operationalization reports a CRE but where this conclusion is patently absurd from other considerations (cf. Sect. 1.2)—in order to investigate whether or not our model, too, is prone to report false positives in such cases.
4.2 Periphrastic do in English
The first case study we will consider is perhaps the best known instance of a CRE: Kroch’s (1989) interpretation of Ellegård’s (1953) data on the rise of periphrastic do in Early Modern English. The variable in question is whether a form of do is used in certain contexts, as in (20a), or not, as in (20b) (examples from Kroch 1989:216).
 (20)
In modern standard English, a form of do is required in a number of contexts, including all interrogatives as well as negative declaratives. What Ellegård’s (1953) data show is that, on the surface, the use of do appears to ‘take off’ in the different contexts at different rates: for instance, between around 1500 and 1650, negative questions exhibit a much higher proportion of do than affirmative whobject questions or negative declaratives, though the latter contexts eventually catch up (Table 1). Kroch (1989) conducted a regression on these contexts and showed that logistic curves fitted to them individually did not differ much in their slope (s) parameters, although they did differ in terms of the intercept (k) parameter. This particular example, while perhaps the most celebrated instance of a CRE, is in fact not the most straightforward instance found in the literature, primarily due to a ‘dip’ in the later portion of the change in some contexts, which makes the progression of do nonmonotonic; see Warner (2005) and Ecay (2015) for detailed discussion, concluding that other factors (and possibly another grammar) are at play. Kroch (1989) also identifies the dip and consequently focusses on the first seven data points of Ellegård’s (1953) data only; we follow his practice here.
When the algorithm from Appendix A.5 is used to fit a model of the form (18) to these data, the picture in Fig. 10 emerges. On visual inspection, the fit is a good one. Crucially, our model is constrained to allow only so much time separation between any two contexts of one change (Theorem 4), illustrated in Fig. 10 as the horizontal bar extending both ways from the tipping point of the curve of the underlying grammar probability. We find that fitting the model to Ellegård’s (1953) data drives two of the context curves (negative questions and affirmative object questions) to the very extremes of the range licensed by the model—in other words, for these contexts, the production biases need to be maximal in order for the model to fit the data. Crucially, however, the data are described well by the regression curves so obtained, an observation which we back up quantitatively in Sect. 4.5.
4.3 English Jespersen Cycle
Our second case study, also from the history of English, involves the replacement of preverbal ne/ni by postverbal not during the Middle English period. This change involves an intermediate stage in which both ne and not cooccur. The three stages are illustrated in (21a)–(21c) (examples from Wallage 2008:644).
 (21)
This replacement of negators, a crosslinguistically common diachronic pathway, is referred to as Jespersen’s Cycle; see Wallage (2008) and Ingham (2013) for detailed discussion of the English development. For our purposes, the change that is important is the replacement of Stage 1 of Jespersen’s Cycle—negation by ne alone, as in (21a)—with Stage 2, bipartite negation, as exemplified by (21b). Wallage (2013) shows that Stage 2 is favoured with discourseold propositions during the middle of the change, but that a CRE obtains (Table 2). Again, on purely visual inspection, our model fits the data well, and the variation observed between discourseold and discoursenew propositions falls, roughly, within the time bounds prescribed by the Time Separation Theorem (Fig. 11).
4.4 Loss of final fortition in Early New High German
CREs are not found only with syntactic variables. Fruehwald et al. (2009) reanalyse data from Glaser (1985) on the loss of final fortition in (Bavarian) Early New High German, which is observable in orthographic variation of the period, e.g. tak vs. tag ‘day (acc. sg.)’, rat vs. rad ‘counsel (acc. sg.)’. They argue that the orthographic variation clearly represents a phonological change in progress rather than shifting scribal tradition, and that fortition is the result of a single phonological rule whose loss is visible to different degrees in different contexts during the period of the change: /d/ exhibits fortition the most and /g/ the least, with /b/ showing an intermediate pattern (Table 3). Our model describes the data well, with the observed CRE again falling within the time bounds implied by the model (Fig. 12).
This example illustrates that even though our model is based on the variational learner in Yang (2000), which is essentially a hypothesis about parameter setting in syntax, the logistic approximation (18) which underlies the curvefitting procedure can legitimately be used to model CREs in any domain as long as the assumption of an underlying logistic change is justifiable. As Fruehwald et al. (2009:9) point out, “[t]he discovery of the Constant Rate Effect in phonological change is perfectly expected under normal generative theories of phonology when the mechanism of change is grammar or rule competition”—the learning algorithm a language learner uses in this case may (or may not) be different from the one assumed in Yang’s (2000) model, but this notwithstanding, as long as some sort of underlying representation similar to the Yangian weights \(p_{t}\) and \(q_{t}\) can be assumed to exist, our production bias mechanism may be applied.
4.5 Comparison with standard procedure
Sections 4.2–4.4 have adduced evidence to the effect that the model introduced in this paper can account for the CRE: the model gives fair fits to historical data even though it is constrained by the chronological bounds set by the Time Separation Theorem (Theorem 4). To make this argument more quantitatively, in this section we compare these three fits to the standard operationalization of the CRE: that is, logistic curves agreeing in the s (slope) parameter but differing in their k (intercept) parameters. For each of the three case studies considered in Sects. 4.2–4.4, then, we carry out two fits: one for our model, and another one for a model consisting of a set of logistic curves (one per context) where the s parameter is not allowed to vary between contexts but where such variation is allowed for the k parameter.
We quantify the goodness of fit of these regressions in the usual way, by the normalized sum of squared residuals: in other words, for each context of a given change, for each time period, we calculate the displacement between the empirically attested frequency and the value predicted by the model, square this displacement, sum over all time periods and over all contexts, and divide by the number of data points. Thus, the better the fit, the closer the sum of squared residuals is to zero. Some deviance from zero is always to be expected because of the noisy nature of historical language data. However, this measure is still able to capture the difference between models which are good fits, but subject to noise, from models which are simply bad fits to the data in question.
Figure 13 shows a comparison of the goodness of fit of the two models for each of the three case studies, operationalized using the sum of squared residuals. The crucial finding is that our model, which places more constraints on the shape and placement of the regression curves, fares no worse than the standard procedure in two out of three cases: in other words, a more constrained, theoretically motivated model which generates empirical predictions (in the form of the Time Separation Theorem) performs no worse than a less constrained, theoretically unmotivated model. The exception to this are the data on the English Jespersen Cycle, where the less constrained standard formulation reports a very low error. This appears to be a result of the very small number of data points—just three time periods and two contexts—for this particular case study. Low data resolution necessarily gives a disproportionate advantage to the less constrained model over any model that incorporates more assumptions.
4.6 A pseudoCRE
Above, we have shown that the proposed model can account for the CRE, in the sense that it gives good fits to three historical changes—fits which are, in two of these cases, no worse than fits conducted using the standard operationalization of CREs. It remains to be shown that introducing this more constrained model can actually solve some of the underspecification issues the standard operationalization suffers from. As discussed in Sect. 1.2, the method of ‘same slopes, different intercepts’ is susceptible to false positives: the fact that a number of logistics agree in their s or slope parameters is insufficient evidence that a single underlying change is at hand (see Corley 2014; Wallenberg 2016; Willis 2017 for examples where the ‘contexts’ cannot possibly be assumed to be evidence of underlying grammatical unity).
Here, we construct a pseudoCRE by combining the two changes investigated in Sections 4.3 and 4.4: the early stages of the English Jespersen Cycle and loss of fortition in Early New High German. As it turns out, these two changes happen to propagate at very similar paces by accident (Fig. 14). The standard operationalization of CREs is, then, expected to report a CRE between changes to English sentential negation and Bavarian phonology, a conclusion which is clearly absurd.
The fact that the two changes are separated in time, however, means that the more constrained model introduced in this paper can correctly diagnose the pseudoCRE. Figure 15 gives the residual errors of both the present model and the standard one for this pseudoCRE, along with the errors for the actual CREs investigated above. The pattern that emerges is striking: for each of the actual CREs, our model reports an error on the order of 0.005, whereas for the AngloBavarian pseudoCRE the model generates an error that surpasses 0.03. The standard operationalization, by contrast, reports similar errors for all changes, failing to distinguish between the pseudoCRE and the actual CREs.
The pseudoCRE thus sheds light on the manner in which our model constrains variation in the time dimension—a constraint that is not built into the model as a premise but that follows from first principles in the form of the Time Separation Theorem. Ultimately, the amount of time separation allowed between any two contextual reflexes of a single underlying change depends on s, the slope of the underlying logistic \(\tilde{q}_{t}\). To obtain a more intuitive interpretation of the relationship between contextual time separations and the rate of the underlying change, it is useful to convert the slope parameter into a quantity that measures the time the change needs to go from actuation to completion, using the timetocompletion calculations proposed by Ingason et al. (2013:96–97). Namely, it can be shown that for slope s,
gives the time it takes for a change to proceed from initial frequency \(\tilde{q}_{0}\) to final frequency \(1\tilde{q}_{0}\), for any (small) \(\tilde{q}_{0}\) with \(0< \tilde{q}_{0} \leq 0.5\). Choosing \(\tilde{q}_{0} = 0.01\), a reasonable choice corresponding to 1% usage of the new variant at the point of actuation, this ‘inverse slope’ then gives a timetocompletion of
time units (e.g. years) for any slope s. Theorem 4, on the other hand, implies a maximal time separation of
units between any two contexts of a change proceeding at rate s. Since 1.8/9.2 ≈ 0.2, this means that the maximal time separation between any two contexts of a single underlying change is roughly a fifth of the time it takes for the change to go to completion in any context individually.
This fact can be used as a heuristic to evaluate purported CREs. For the AngloBavarian pseudoCRE, for instance, the timetocompletion for \(\tilde{q}_{0} =0.01\) is
years when calculated for slope s = −0.0175, which is the arithmetic mean of the slopes found by our regressions for the two changes previously (see captions to Figs. 11 and 12). This implies that the time separation between any two contexts should be no more than 0.2⋅525 = 105 years. On visual inspection, however, the empirical time separation between the discourseold and /d/ ‘contexts’ must be at least 300 years (Fig. 14). This, essentially, is why the model is able to diagnose the pseudoCRE.
5 Discussion
In this paper, we have augmented Yang’s (2000, 2002) variational learner with production biases that vary by context—the first time, to our knowledge, that this has been done.^{Footnote 11} We have used this model to make precise the important intuition of Kroch (1989) that variation between contexts in the increasing use of a new variant may, under certain circumstances, be due to the interaction of a single underlying change with fixed contextual biases. Two important issues remain to be discussed: the diachronic implications of the interaction between language acquisition and production biases, and the nature of the production biases themselves.
5.1 Which grammar wins?
Yang’s (2000) Fundamental Theorem of Language Change, given earlier as Theorem 1, can be paraphrased as follows: when grammars compete, the one with the greater parsing advantage will win. In Sect. 3 we have shown that this result does not hold in our model. Instead, bias and advantage together determine which grammar will triumph: the precise way in which this works is given in our Extended Fundamental Theorem (Theorem 3).
In one sense this result is unsurprising: one of the great virtues of Yang’s (2000, 2002) model of the learner and of diachronic change is its simplicity, and our model introduces additional complexity. It is therefore not a particular surprise that our more complex model does not yield the same intuitive generalization. On the other hand, it is not a necessary consequence of this complication that the Fundamental Theorem fails to hold. As Fig. 8 shows, if we add to our model the stipulation that contextual weights must always be precisely in balance (B = 0), then Yang’s Fundamental Theorem does hold. Such a stipulation would be wholly unmotivated, as far as we are aware, and represents a more complex model than ours.
Moreover, we have not, of course, shown that the Fundamental Theorem of Language Change is false—merely that it is false under the assumptions we make. Whether or not it is false, empirically speaking, depends on how well our model, and Yang’s (2000, 2002) model, correspond to reality: specifically, whether a model that incorporates the effect of contextual biases as ours does is more realistic than one that does not, and more realistic than one that constrains the net bias. We think that is right, but it is likely that a full consideration of the facts of reallife acquisition and change will require a model that is substantially more complex than any that has been proposed thus far. One feature of Yang’s model, with or without our extension, is that it is completely impossible for a grammar \(G_{2}\) to overtake and defeat another grammar \(G_{1}\) if the weak generative capacity of \(G_{2}\) is a proper subset of that of \(G_{1}\); yet a preference for exactly this kind of subset is often invoked in the context of acquisition in the form of the Subset Principle (Berwick 1985; Manzini and Wexler 1987), and, in the domain of phonology at least, retreat to the subset is a frequentlyattested diachronic pathway, since unconditioned mergers are wellattested and have precisely the effect of reducing the number of forms generated by the grammar (see e.g. Labov 1994:551). Future work will need to address these questions of realism, as well as pursuing further analytical consequences of simpler (and thus more tractable) models like this one.
5.2 The nature of production biases
Up to now we have remained mute with respect to the ontology of production biases, beyond stating that they are biases that affect production. In principle, such biases could assume a number of forms. In a word order change such as OV to VO, for instance, one possibility for interpreting the fixed biases we have proposed is as a reflection of performance pressures in the sense of Hawkins (1994, 2004). Hawkins’s (2004:38) principle of Minimize Domains states that “[t]he human processor prefers to minimise the connected sequences of linguistic forms and their conventionally associated syntactic and semantic properties in which relations of combination and/or dependency are processed.” This general principle is made concrete using a metric of Early Immediate Constituents (EIC), which serves to favour syntactic structures with a uniform directionality of branching. Importantly, EIC does not penalize rightbranching (e.g. VO) or leftbranching (e.g. OV) grammars directly, instead disfavouring individual structures with a disharmonic directionality of branching, for instance when a headfinal VP is embedded under a headinitial TP. This is equivalent to a contextspecific production bias in our sense. Hawkins conceptualizes Minimize Domains and EIC as principles of parsing rather than of production, but notes that there is evidence that EIC might be involved in production too (Hawkins 2004:106), and states that “if EIC can be systematically generalized from a model of comprehension to a model of production […] then so much the better” (Hawkins 1994:427). Hawkins’s principles have also been reformulated as principles of derivational/computational complexity by Mobbs (2008) and Walkden (2009).
In phonological change, meanwhile, the biases can be interpreted as well established articulatory phonetic effects. Final fortition, for example, is known to be more likely to apply to velar consonants than to coronal consonants and more likely to apply to coronal consonants than labial consonants (Ohala 1983), and this order of preference seems to be observed diachronically as final fortition emerged in the history of Frisian (Tiersma 1985).^{Footnote 12} These two phonological and syntactic examples are intended to give a flavour of how contextual production biases can be interpreted, not to exhaust the range of possibilities. For other changes, other biases might be necessary: for instance, in Wallage’s (2013) data, discourseold propositions favour Stage 2 of the Jespersen Cycle during the change, and the biases here could plausibly reflect Gricean maxims of cooperative communication.
The abovementioned biases—constraints on syntactic processing, articulatory pressures, pragmatic principles—are plausibly innate in the sense that they are shared by all speakers across all languages and are not subject to change. This is why in our model definition we maintained that the biases \(b_{i}\) be diachronically constant. Note that the logic here is not just that constant biases imply Constant Rate Effects—we are actually defending the stronger claim that Constant Rate Effects occur if, and only if, diachronically constant biases impinge on an underlying change. Timedependent biases, or random biases, would result in change processes in which the trajectories of different linguistic contexts are not parallel to each other—something we might refer to as an ‘Inconstant Rate Effect.’ Having said that, it is not inconceivable that some noninnate biases are constant on sufficiently long timescales so that they may give rise to Constant Rate Effects: this will be the case when the underlying change itself is fast enough to be carried to completion within the timeframe in which the biases stay fixed. This could be true of certain sociolinguistic biases, and here the biasing mechanism of our model is in agreement with sociolinguistic work (e.g. Labov 2001:ch. 9) which has found some types of sociolinguistic bias modulation to be strongest midway through the change, just as in our model (cf. Fig. 5).^{Footnote 13}
6 Conclusion
Building on earlier work that derives logistic evolution as a populationlevel property of language change (Niyogi and Berwick 1997; Yang 2000, 2002), we have provided a mechanism for the Constant Rate Effect proposed by Kroch (1989). We have done this by enriching Yang’s (2000, 2002) model of acquisition and change with a contextual bias mechanism that links different context curves to a single underlying change. The work also provides a method of testing for CREs that is demonstrably superior to the traditional method of ‘same slope, different intercepts,’ since it is a consequence of the model that there is a fixed upper bound on the time separation of contextual curves. We have shown that this enables us to distinguish certain types of pseudoCREs from instances in which a single underlying grammatical change is actually plausible. We have also shown that advantage, in the Yangian sense, is not the only factor at play in determining the ultimate outcome of a situation of grammatical competition, if the basic assumptions of our model hold true.
The upshot of all this is that it is now possible to test whether divergent usage frequencies in corpora across different contexts during the course of a change in fact mask a deeper underlying grammatical homogeneity, and to do so in a more restricted and principled way than has been possible to date. Crucially, the method we propose is not only methodologically superior to the standard operationalization of CRE testing: our model in fact derives the possibility of CREs, and sets tight bounds on the kind of empirically observed situation that can be said to constitute a CRE.
Notes
There exist a number of methods to implement this procedure, such as nonlinear regression on bare frequencies, linear regression on logittransformed data, and multivariate regression. Which method is chosen is a technical matter; conceptually, all of these implementations share the basic theoretical assumption that the reflexes of one underlying change are described by a family of logistics agreeing in their s parameters but possibly differing in their k parameters.
Postma (2017) notes that the logistic function is the general solution of Verhulst’s differential equation and that a family of contextual curves results when this differential equation is solved for specific initial conditions. While a true statement, this does not constitute a model of the CRE in the strict sense and does not solve the problems we outline below.
Paolillo (2011) raises a problem that may be related. The standard way of testing for the statistical significance of a putative CRE is to perform a chisquare test of independence on the s values of the regressions for the different contexts (Kroch 1989; Santorini 1993; Pintzuk 1995). If the result is not statistically significant, then it is concluded that there is support for a CRE. However, it is not sound to treat a nonsignificant value as evidence for the null hypothesis, since it was assumed to begin with. We acknowledge this problem and have no solution to it in the present paper, except insofar as our method of modelling CREs does not rely on nullhypothesis significance testing at all.
In what follows, we are mostly concerned with equations for \(q_{t}\) (the weight of \(G_{2}\)) and consider the conditions under which \(G_{2}\) will replace \(G_{1}\). The corresponding value of \(p_{t}\) can always be recovered from the fact that in a twogrammar setting, \(p_{t} + q_{t} = 1\).
For two competing grammars, the linear reward–penalty learning algorithm assumes the following form for learning rate 0<γ<1 (see Yang 2000 for more details). Assuming that the learner’s initial guess for the weight of grammar \(G_{2}\) is \(Q_{0} = 0.5\) (no a priori bias), then, for input sentence s = 1,…,N, the learner picks \(G_{2}\) with probability \(Q_{s1}\) (and \(G_{1}\) with probability \(1Q_{s1}\)), attempts to parse the sentence, and sets \(Q_{s} = Q_{s1} + \gamma (1Q_{s1})\) if \(G_{2}\) parses s, and \(Q_{s} = (1\gamma)Q_{s1}\) if \(G_{2}\) does not parse s. Thus, Q is increased with successful parsing events and decreased with unsuccessful parsing events. Finally, we set \(q_{t+1} = Q_{N}\). Under the simplifying assumption that N→∞, the learner does not have to contend with a finite dataset or a critical period. It is of course false, but like much work in learnability and modelling we adopt it here in order to derive analytical approximations such as (4) which would otherwise be difficult, if not impossible, to derive. This approximation holds in the following sense: \(q_{t+1}\) converges to a normal distribution with mean \(q_{t+1} = c_{t}/(c_{t} + d_{t})\) and a variance which tends to 0 as γ→0 and Nγ→∞ (Narendra and Thathachar 1989:162–163). Assuming a finite learning sample would introduce a stochastic component (noise) to the system, and exploring the consequences of this falls beyond the scope of the present paper.
Formally, this means that the contexts constitute a partition of the set \(L_{1}\cup L_{X}\cup L_{2}\) in the usual settheoretic sense: the contexts are pairwise disjoint subsets of \(L_{1}\cup L_{X}\cup L_{2}\) and their union equals the whole of \(L_{1}\cup L_{X}\cup L_{2}\).
Associating positive production biases with a favour for \(G_{2}\) over \(G_{1}\) (rather than \(G_{1}\) over \(G_{2}\)) is but a convention and does not affect the dynamics of our model: reverting the biases would merely swap the labels of the two grammars.
We further elaborate on the empirical grounding of our biasing mechanism in Sect. 5.
Note that −1 ≤ B ≤ 1, since \(0 \leq \lambda_{i} \leq 1\) and \(1 \leq b_{i} \leq 1\).
R (R Core Team 2012) code implementing the procedure can be obtained from the authors.
Clark et al. (2008) present a filtered version of Yang’s model to account for typological skews. In this model universal biases play a role, but crucially the biases apply to grammars as a whole rather than to particular contexts of use.
We might expect to find the reverse effect in the data from Fruehwald et al. (2009) discussed above, but, curiously, we do not: /g/ is the most favouring context for the loss of devoicing.
We are grateful to Charles Yang for bringing this point to our attention.
References
Bates, Douglas M., and Donald G. Watts. 1988. Nonlinear regression analysis and its applications. New York: Wiley.
Berwick, Robert C. 1985. The acquisition of syntactic knowledge. Cambridge: MIT Press.
Bush, Robert R., and Frederick Mosteller. 1951. A mathematical model for simple learning. Psychological Review 68: 313–323.
Bush, Robert R., and Frederick Mosteller. 1958. Stochastic models for learning. New York: Wiley.
Chomsky, Noam. 1981. Lectures on government and binding. Dordrecht: Foris.
Chomsky, Noam, and Howard Lasnik. 1993. The theory of principles and parameters. In Syntax: An international handbook of contemporary research, eds. Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld, and Theo Vennemann, Vol. 1, 506–569. Berlin: De Gruyter.
Clark, Brady, Matthew Goldrick, and Kenneth Konopka. 2008. Language change as a source of word order correlations. In Variation, selection, development: Probing the evolutionary model of language change, eds. Regine Eckardt, Gerhard Jäger, and Tonjes Veenstra, 75–102. Berlin: De Gruyter.
Corley, Kerry. 2014. The constant rate hypothesis in syntactic change: Empirical fact or “lies, damned lies, and statistics”? BA diss., University of Cambridge.
Durham, Mercedes, Bill Haddican, Eytan Zweig, Daniel Ezra Johnson, Zipporah Baker, David Cockeram, Esther Danks, and Louise Tyler. 2012. Constant linguistic effects in the diffusion of be like. Journal of English Linguistics 40: 316–337.
Ecay, Aaron W. 2015. A multistep analysis of the evolution of English dosupport. PhD diss., University of Pennsylvania. Available at http://repository.upenn.edu/edissertations/1049/. Accessed 27 August 2017.
Ellegård, Alvar. 1953. The auxiliary do: The establishment and regulation of its use in English. Stockholm: Almqvist and Wiksell.
Fruehwald, Josef, Jonathan GressWright, and Joel C. Wallenberg. 2009. Phonological change: The constant rate effect. In North East Linguistic Society (NELS) 40. Amherst: GLSA.
Gardiner, Shayna. 2015. Taking possession of the constant rate hypothesis: Variation and change in Ancient Egyptian possessive constructions. University of Pennsylvania Working Papers in Linguistics 21: 69–78.
Glaser, Elvira. 1985. Graphische Studien zum Schreibsprachwandel vom 13. bis 16. Jahrhundert. Heidelberg: Carl Winter.
Hawkins, John A. 1994. A performance theory of order and constituency. Cambridge: Cambridge University Press.
Hawkins, John A. 2004. Efficiency and complexity in grammars. Oxford: Oxford University Press.
Heycock, Caroline, and Joel Wallenberg. 2013. How variational acquisition drives syntactic change: The loss of verb movement in Scandinavian. Journal of Comparative Germanic Linguistics 16: 127–157.
Ingason, Anton Karl, Julie Anne Legate, and Charles Yang. 2013. The evolutionary trajectory of the Icelandic New Passive. University of Pennsylvania Working Papers in Linguistics 19: 91–100.
Ingham, Richard. 2013. Negation in the history of English. In The history of negation in the languages of Europe and the Mediterranean, eds. David Willis, Christopher Lucas, and Anne Breitbarth, Vol. 1: Case studies, 119–150. Oxford: Oxford University Press.
Kallel, Amel. 2005. The loss of negative concord and the constant rate hypothesis. University of Pennsylvania Working Papers in Linguistics 10: 128–142.
Kallel, Amel. 2007. The loss of negative concord in Standard English: Internal factors. Language Variation and Change 19: 27–49.
Kroch, Anthony. 1989. Reflexes of grammar in patterns of language change. Language Variation and Change 1: 199–244.
Kroch, Anthony. 1994. Morphosyntactic variation. In 30th annual meeting of the Chicago Linguistic Society (CLS), ed. Katharine Beals et al., 180–201. Chicago: Chicago Linguistic Society.
Kroch, Anthony. 2000. Syntactic change. In The handbook of contemporary syntactic theory, eds. Mark Baltin and Chris Collins, 629–739. Oxford: Blackwell.
Labov, William. 1994. Principles of linguistic change, Vol. 1: Internal factors. Oxford: Blackwell.
Labov, William. 2001. Principles of linguistic change, Vol. 2: Social factors. Malden: Blackwell.
Manzini, M. Rita, and Kenneth Wexler. 1987. Parameters, binding theory, and learnability. Linguistic Inquiry 18: 413–444.
Mobbs, Iain. 2008. ‘Functionalism’, the design of the language faculty, and (disharmonic) typology. MPhil dissertation, University of Cambridge. Available at http://ling.auf.net/lingbuzz/000680. Accessed 27 August 2017.
Narendra, Kumpati S., and Mandayam A. L. Thathachar. 1989. Learning automata: An introduction. Englewood Cliffs: PrenticeHall.
Nevalainen, Terttu, and Helena RaumolinBrunberg. 2003. Historical sociolinguistics: Language change in Tudor and Stuart England. London: Pearson.
Niyogi, Partha, and Robert C. Berwick. 1997. A dynamical systems model for language change. Complex Systems 11: 161–204.
Ohala, John J. 1983. The origin of sound patterns in vocal tract constraints. In The production of speech, ed. Peter F. MacNeilage, 189–216. New York: Springer.
Paolillo, John C. 2011. Independence claims in linguistics. Language Variation and Change 23: 257–274.
Pintzuk, Susan. 1995. Variation and change in Old English clause structure. Language Variation and Change 7: 229–260.
Pintzuk, Susan. 2003. Variationist approaches to syntactic change. In The handbook of historical linguistics, eds. Brian D. Joseph and Richard D. Janda, 509–528. Oxford: Blackwell.
Pintzuk, Susan, and Ann Taylor. 2006. The loss of OV order in the history of English. In The handbook of the history of English, eds. Ans van Kemenade and Bettelou Los, 249–278. Malden: Blackwell.
Postma, Gertjan. 2010. The impact of failed changes. In Continuity and change in grammar, eds. Anne Breitbarth, Christopher Lucas, Sheila Watts, and David Willis, 269–302. Amsterdam: John Benjamins.
Postma, Gertjan. 2017. Modelling transient states in language change. In Microchange and macrochange in diachronic syntax, eds. Éric Mathieu and Robert Truswell. Oxford: Oxford University Press.
R Core Team. 2012. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Available at http://www.Rproject.org/. Accessed 27 August 2017.
Roberts, Ian. 2007. Diachronic syntax. Oxford: Oxford University Press.
Santorini, Beatrice. 1992. Variation and change in Yiddish subordinate clause word order. Natural Language and Linguistic Theory 10: 595–640.
Santorini, Beatrice. 1993. The rate of phrase structure change in the history of Yiddish. Language Variation and Change 5: 257–283.
Tiersma, Pieter M. 1985. Frisian reference grammar. Dordrecht: Foris.
Walkden, George. 2009. Deriving the FinaloverFinal Constraint from third factor considerations. Cambridge Occasional Papers in Linguistics 5: 67–72.
Wallage, Phillip. 2008. Jespersen’s Cycle in Middle English: Parametric variation and grammatical competition. Lingua 118: 643–674.
Wallage, Phillip. 2013. Functional differentiation and grammatical competition in the English Jespersen Cycle. Journal of Historical Syntax 2: 1–25.
Wallenberg, Joel C. 2016. Extraposition is disappearing. Language 92: e237–e256.
Warner, Anthony. 2005. Why DO dove: Evidence for register variation in Early Modern English. Language Variation and Change 17: 257–280.
Willis, David. 2017. Investigating geospatial models of the diffusion of morphosyntactic innovations: The Welsh strong secondperson singular pronoun chdi. Journal of Linguistic Geography 5: 41–66.
Yang, Charles D. 2000. Internal and external forces in language change. Language Variation and Change 12: 231–250.
Yang, Charles D. 2002. Knowledge and learning in natural language. Oxford: Oxford University Press.
Acknowledgements
We thank Ricardo BermúdezOtero, Laurel MacKenzie, David Willis, Charles Yang and the reviewers of this paper for comments that led to important improvements, as well as David Denison, Aaron Ecay, Tobias Galla, Anthony Kroch, Gertjan Postma, Alexandra Simonenko and Richard Zimmermann for valuable discussions on the subject. Early versions of the paper were delivered at the 22nd International Conference on Historical Linguistics (Naples, 27–31 July 2015), the workshop on Dealing with Bad Data in Linguistic Theory (Amsterdam, 17–18 March 2016), the 39th Generative Linguistics in the Old World conference (Göttingen, 5–8 April 2016), New Ways of Analyzing Syntactic Variation 2 (Ghent, 19–20 May 2016), and the 18th Diachronic Generative Syntax conference (Ghent, 29 June–1 July 2016); we thank our audiences, conference organizing committees and reviewers. HK gratefully acknowledges financial support from Emil Aaltonen Foundation and The Ella and Georg Ehrnrooth Foundation.
Author information
Authors and Affiliations
Corresponding author
Appendix: Derivations
Appendix: Derivations
1.1 A.1 Evolution of \(q_{t}\)
Theorem 5
Let \(\Lambda_{0} = 1\) and
Then, for all t,
where \(\mathcal{L}_{t} = \prod_{\tau=0}^{t1} \Lambda_{\tau}\) for t ≥ 1 and \(\mathcal{L}_{0} = 1\).
Proof
Induction. For t = 0,
Now assume that the claim holds for t. Then
as desired. □
Corollary 3
Let B = 0. Then the evolution of \(q_{t}\) is logistic.
Proof
Let B = 0. Then from equation (16) \(\Lambda_{t} = 1\) for all t, so that
Theorem 5 now implies
Now assume \(q_{t}\neq 0\) and divide both the numerator and the denominator by \(q_{0}\):
Hence, \(q_{t}\) is logistic with s = −log(ρ) and \(k = \frac{1}{\log(\rho)}\log (\frac{1q_{0}}{q_{0}} )\). □
Corollary 4
In Yang’s (2000) model, the weight \(q_{t}\) evolves as
which is a logistic function of t.
Proof
In this model, \(q_{t}\) obeys (A.1) with \(\Lambda_{t} = 1\) for all t. From Theorem 5,
Assuming, without loss of generality, that \(q_{0} \neq 0\), we derive
This is a logistic function by Corollary 3. □
1.2 A.2 The biasmodulating functions F and G
Theorem 6
Let
Then
if and only if
Proof
Writing \(F = F(b_{i},p_{t})\), \(G=G(b_{i},q_{t})\), \(p=p_{t}\) and \(q=q_{t}\), the first requirement in (A.12) implies that
since
The second requirement in (A.12), on the other hand, implies F ≤ q and −F ≤ p, since
and
hence
and an exactly symmetric argument shows that
Thus
On the other hand, if (A.20) holds, then
and
Similarly,
and an analogous argument shows that \(0 \leq q_{t}^{(i)} \leq 1\). □
1.3 A.3 Proof of the Extended Fundamental Theorem
Let
To prove Theorem 3, we make use of the following auxiliary result:
Theorem 7
For all t:

(1)
\(q_{t+1} > q_{t}\) if \(B > B_{c}(\rho, q_{t})\);

(2)
\(q_{t+1} = q_{t}\) if \(B = B_{c}(\rho, q_{t})\);

(3)
\(q_{t+1} < q_{t}\) if \(B < B_{c}(\rho, q_{t})\).
Proof
with
To prove the first claim, we examine the difference \(q_{t+1}  q_{t}\). Now \(q_{t+1}  q_{t} > 0\) if and only if
Now, −1 ≤ B ≤ 1, so always \(1  q_{t}B > 0\) and \(1 + (1q_{t})B \geq q_{t} > 0\). Hence
as desired. Claims 2 and 3 are handled similarly. □
Corollary 5
For all t:

(1)
\(q_{t+1} > q_{t}\) if \(B > B_{c}(\rho, q_{0})\);

(2)
\(q_{t+1} = q_{t}\) if \(B = B_{c}(\rho, q_{0})\);

(3)
\(q_{t+1} < q_{t}\) if \(B < B_{c}(\rho, q_{0})\).
Proof
First, we note that \(B_{c}(\rho, q_{t})\) is a decreasing function of \(q_{t}\): if q<Q, then \(B_{c}(\rho, q) \geq B_{c}(\rho, Q)\).
Now let \(B > B_{c}(\rho, q_{0})\). Then by Theorem 7 \(q_{1} > q_{0}\). Since \(B_{c}\) is decreasing, \(B_{c}(\rho, q_{1}) \leq B_{c}(\rho, q_{0}) < B\). Hence \(q_{2} > q_{1}\) by Theorem 7. By full induction, \(q_{t+1} > q_{t}\) for all t.
Let \(B = B_{c}(\rho, q_{0})\). Then by Theorem 7 \(q_{1} = q_{0}\). Then \(B_{c}(\rho, q_{1}) = B_{c}(\rho, q_{0}) = B\), so that \(q_{2} = q_{1}\) by Theorem 7. By full induction, \(q_{t+1} = q_{t}\) for all t.
Finally, let \(B < B_{c}(\rho, q_{0})\). Then by Theorem 7 \(q_{1} < q_{0}\). Since \(B_{c}\) is decreasing, \(B_{c}(\rho, q_{1}) \geq B_{c}(\rho, q_{0}) > B\), so that \(q_{2} < q_{1}\) by Theorem 7. By full induction, \(q_{t+1} < q_{t}\) for all t. □
Theorem 3 now follows from Corollary 5: since \(q_{t}\) is bounded between 0 and 1 and \(q_{0}\) may be chosen arbitrarily close to 1 or 0, \(q_{t} \to 1\) if \(B > B_{c}(\rho, q_{0})\) and \(q_{t} \to 0\) if \(B < B_{c}(\rho, q_{0})\) in the limit t→∞.
1.4 A.4 Proof of the Time Separation Theorem
We shall now prove Theorem 4, reproduced here as Theorem 8:
Theorem 8
For any two contextual reflexes of an underlying change from \(G_{1}\) to \(G_{2}\) approximated by a logistic \(\tilde{q}_{t}\) with slope s, the maximal time separation at tipping points is
Proof
Assume two contexts 1 and 2. The maximal separation will of course be attained with maximal biases \(b_{1} = 1\) and \(b_{2} = 1\) (or vice versa). Assume this. Then \(q^{(2)}_{t} = \tilde{q}_{t}  \tilde{q}_{t}(1\tilde{q}_{t}) = \tilde{q}_{t}^{2}\), and so \(q^{(2)}_{t} = 1/2\) when \(\tilde{q}_{t} = 1/\sqrt{2}\). On the other hand, \(\tilde{q}_{t}\) is given by the logistic
where we assume k = 0 because translation along the time axis obviously makes no difference to the argument here. A little algebra now shows that \(\tilde{q}_{t} = 1/\sqrt{2}\) if and only if \(t = \frac{1}{s}\log(\sqrt{2}1)\). Thus, \(q_{t}^{(2)}\) attains its tipping point at time \(t_{2}^{*} =\frac{1}{s}\log(\sqrt{2}1)\). Since the logistic \(\tilde{q}_{t}\) itself attains its tipping point at t = 0 and since \(b_{1} = b_{2}\), symmetry implies that \(q_{t}^{(1)}\) attains its tipping point at \(t_{1}^{*} = 0  t_{2}^{*} = \frac{1}{s}\log(\sqrt{2}1)\). Hence
as wished. □
1.5 A.5 Curvefitting algorithm for the extended model
This appendix provides the curvefitting algorithm used for model evaluation in Sect. 4, in pseudocode. We assume the existence of three subroutines: Regress, Param and Error. The first of these can be any optimization algorithm that performs nonlinear regression on the ith context for given s and k, fitting a curve of the form (18) subject to the lower and upper bounds \(1 \leq b_{i} \leq 1\). The second routine is assumed to return the bias size \(b_{i}\) found by this regression, and the third to give the error (in terms of the normalized sum of squared residuals) of the fit.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Kauhanen, H., Walkden, G. Deriving the Constant Rate Effect. Nat Lang Linguist Theory 36, 483–521 (2018). https://doi.org/10.1007/s1104901793801
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s1104901793801