Biology & Philosophy, Volume 29, Issue 5, pp 611–655

The logical structure of evolutionary explanation and prediction: Darwinism’s fundamental schema



We present a logically detailed case-study of Darwinian evolutionary explanation. Special features of Darwin’s explanatory schema made it an unusual theoretical breakthrough, from the point of view of the philosophy of science. The schema employs no theoretical terms, and puts forward no theoretical hypotheses. Instead, it uses three observational generalizations—Variability, Heritability and Differential Reproduction—along with an innocuous assumption of Causal Efficacy, to derive Adaptive Evolution as a necessary consequence. Adaptive Evolution in turn, with one assumption of scale (‘Deep Time’), implies the observational generalization of Adaptation. It is a fascinating methodological task to regiment the premises and make the reasoning both rigorous and clear. Doing so reveals how surprisingly small an amount of mathematics is needed in order to carry out the argument. The investigation also reveals the crucial role played by heritability, and how heritability itself admits of Darwinian explanation.


Keywords: Evolution · Adaptation · Heritability · Differential reproduction · Variability


This essay expands on an account first sketched in Schilcher and Tennant (1984), pp. 95–98.

Here is the task. Imagine—anachronistically—that you are a contemporary of Darwin, back in 1859, say; but that you happen to have a computer equipped with Excel. This study will show how you could have convinced yourself, rigorously, that Darwin was right. The aim is to regiment Darwin’s explanatory argument for adaptive evolutionary changes leading to states of adaptedness. In his immortal and oft-quoted words:

As many more individuals of each species are born than can possibly survive; and as, consequently, there is a frequently recurring struggle for existence, it follows that any being, if it vary however slightly in any manner profitable to itself, under the complex and sometimes varying conditions of life, will have a better chance of surviving, and thus be naturally selected. From the strong principle of inheritance, any selected variety will tend to propagate its new and modified form. (From the Introduction to Darwin (1859).)

The task we have set ourselves involves identifying and regimenting the premises of the argument, and showing exactly what sort of reasoning is involved. Our only assumed advantage over Darwin is that we have the cognitive prosthetic of an Excel spreadsheet. We purposely avoid appealing to any post-Darwinian knowledge from other areas of the neo-Darwinian synthesis. In particular, we do not assume any principles of Mendelian genetics, let alone any principles of molecular biology. (We shall advert to Mendelian matters only occasionally, by way of illustration.)

This anachronistic thought-experiment has interesting consequences. To foreshadow: Darwinian evolutionary explanation nicely exemplifies the hypothetico-deductive account of scientific explanation;1 the conceptual materials involved are very simple; and the argument itself is, if not analytically valid, then at least a priori. But, perhaps most importantly: the ability to crunch numbers quickly, and display outcomes graphically, and to investigate the sometimes dramatic effects of even very slight changes of numerical inputs, engenders a qualitative grasp of evolutionary possibilities effectively denied to Darwin and his contemporaries, who lacked such cognitive prosthetics. One is also able to dig more deeply into the concept of heritability itself, and to reach a vantage point from which one can appreciate that there could have been ‘natural selection’ for heritability itself. This surprising result cannot be further elucidated at this stage. It awaits some needed explication of the concept of heritability before its deeper significance can be made apparent. Details will emerge in the section “Reflections on heritability”, which will provide the promised elucidation.

Relationship of the ideas worked out here to some in the recent literature

The use-occurrence of the phrase ‘natural selection’ in the previous paragraph is one of only two in this work. It is important to note that it occurs within scare quotes, as does the other use-occurrence, which will be found in the final sentence of the work. Hence we do not need to consider, or take sides in, the modern debate about whether natural selection (if there is such a ‘thing’) is a causal process, or a force; nor do we need to have a view as to whether it operates on genes, or on individuals, or on populations. (See Millstein (2006), Brunnander (2007), and Matthen and Ariew (2009), for interesting discussion of these issues.) We note that Walsh (2012), at p. 195, advances an interpretation of natural selection as a (merely) ‘higher-order effect’:

It does not cause individuals to live, die or reproduce. It is not a further cause over and above the natural activities of organisms. It is an ensemble-level process, but not a causal process. Natural selection is an aggregate of causal processes: those processes that constitute the natural activities of organisms in the struggle for life—birth, death and reproduction.

This study aims to justify such a view by providing a detailed regimentation of Darwinian reasoning that does not invoke natural selection as a ‘force of nature’. A proper understanding of the logico-mathematical structure of evolutionary explanation turns out to commend this more austere metaphysics.

Darwin’s explanatory schema

In any breeding population of organisms, one can observe the following general facts.
  1. (V)

    Variability: Individuals tend to vary with respect to many an observable trait.

  2. (H)

    Heritability: Offspring tend to resemble their parents in respect of these variable traits.

  3. (DR)

    Differential Reproduction: Not all organisms have the same number of offspring.

Facts like these are available to even the most casual observer who takes it into his or her head to look out for them, once made aware of the possibility that they might obtain.

By ‘observing’ facts here we mean determining that they obtain, without using any instruments to enhance one’s powers of observation. Observation, in this sense, might involve patient counting and record-keeping over an extended period of time, and might involve the use of macroscopic measuring devices such as rulers, measuring tapes, and weight scales. The basic point is that one does not need any advanced scientific knowledge, or reliance on theoretical hypotheses, in order to observe the matters of fact in question. To put matters more vividly: these general facts are known to, or at least could be known by, any farm boy in Iowa. One uses the farm boy in Iowa as one’s Everyman here, rather than Darwin’s pigeon fancier, because the former, more than the latter, is likely to be a Bible-thumping creationist.

Anyone unaware of the fact that there is differential reproduction will, upon being told about it, start noticing that different people have different numbers of children (whether in or out of wedlock); and that their household pets produce varying numbers of litters with varying numbers of offspring.

Likewise, anyone tipped off to look out for heritability will begin to notice strong family resemblances. They will come to appreciate that children take after one or other of their parents in many different ways—physical, temperamental, and behavioral.

As for variability: any observer will notice the many and systematic differences among individuals along various scales of comparison. Among these are height, weight, skin color, type and color of hair, facial features, agility, running speed, throwing strength, level of sexual attractiveness, readiness to laugh, level of social insecurity, kindness, optimism, reliability, foresight, prudence, greed, ability to hold their liquor, speed of uptake, willingness to travel, appetite for new challenges in life, etc. We stress: all these are observable features, even if some of them are not readily or immediately observable, but only inferrable from what one can observe by attending to a number of different occasions, or seeing what happens over a considerable period of time.

And here is another observable feature of the biological realm with which one will readily concur:
  1. (A)

    Adaptation, and Species-Diversity: There are many different species of organism, all of them adapted to the environmental niches that they inhabit. They appear to be ‘well designed’ to cope with the challenges that they frequently encounter. They are well equipped to find sustenance, avoid or evade predators, find mates, reproduce, and (albeit to different degrees, depending on how mature their offspring are at birth) rear their young.

If we were to leave matters there, we would have a hodge-podge of observational generalizations. Epistemologically, each is robust. Evidence can be gathered, and will mount over time, to make the casual observer more and more convinced that claims (V), (H), (DR) and (A) do indeed hold. Moreover, our expository fiction of the unobservant thinker who finally twigs to these facts upon being told to look out for them is just that: a fiction. We all grow up imbibing these facts in a very explicit way, throughout infancy and childhood. By the time we reach adulthood, (V), (H), (DR) and (A) are common wisdom. It would be a peculiar young man or woman who would not immediately concur with (V), (H), (DR) and (A) when presented with them explicitly for summary judgement.
For millennia casual observers lived with these familiar facts of life—(V), (H), (DR) and (A)—all on a par, as it were, without appreciating any connection among them. Then, in 1859, something momentous happened. Charles Darwin, in his Origin of Species, claimed, in effect, that (A) is an ineluctable causal consequence of (V), (H) and (DR):
$$\begin{aligned} \left. \begin{array}{l} \textsc {Variability}\\ \textsc {Heritability}\\ \textsc {Differential\, Reproduction} \end{array}\right\} \quad \Longrightarrow \quad \begin{array}{l} \textsc {Adaptation,}\\ \textsc {and\, Species-Diversity} \end{array} \end{aligned}$$
At the time, this was a truly arresting insight. For it removed the previously felt need to offer a supernatural or theistic explanation for the truth of (A). Instead, the truth of (A) was now appreciated as following, by efficient causation, from the entirely mundane facts (V), (H) and (DR). (Indeed, as we have been expressing them: the entirely mundane and observable facts (V), (H) and (DR).)

Just how ineluctable a consequence is (A) from (V), (H) and (DR)? The present author believes that there is abroad, among scientists and philosophers, a largely incorrect understanding of the inferential transition from (V), (H) and (DR) to (A). The main purpose of this study is to offer a regimentation of Darwin’s argument that will reveal how compelling it is, even given only what was known in his day.

Explanations in physics

Consider, for purposes of comparison, the inferential transition \((I)\) from ‘I let go of the ball’ to ‘The ball dropped to the ground’. The physicist might helpfully say that the fact that the ball dropped to the ground was an ineluctable causal consequence of my having let go of it. With nothing else to support the ball (ex hypothesi), the force of gravity went to work on the ball and made it accelerate towards the ground. The physicist’s background postulation of the gravitational field, plus the extra assumption about the lack of any continuing support for the ball, underwrite the ineluctability of the causal consequence. The inference \((I)\) is ineluctable, even though (qua causal inference) enthymematic.

The philosopher of physics accordingly suggests that we should turn the enthymematic causal inference \((I)\) into one that is logico-mathematically compelling, and a priori. Let us formulate the law of gravitation explicitly, and state Newton’s second law of motion (\(F=m\cdot a\)). Let us add those statements as explicit premises alongside the original premise that I let go of the ball and the ancillary assumptions that there was nothing else supporting it, and no force other than gravity exerted on it. Then the inferential transition (to the conclusion that the ball dropped to the ground) becomes truly ineluctable. Indeed, it becomes logico-mathematically necessary, and a priori.

It does so, however, at the cost of postulating the theoretical notion of a force (or force field) and high-level laws governing its behavior. Not that this is a bad thing—for these theoretical posits and the laws governing them earn their theoretical keep via a vast range of satisfying and successful explanations of disparate data concerning the details and qualitative types of motions of objects as various as pendula, projectiles, and planets. The important methodological observation at this juncture is just that the a priori character of the inferential transition made by the physicist is secured only by recourse to highly theoretical extra premises.

Among these extra premises, it is important to note, are the axioms of those branches of higher mathematics that need to be applied in order to provide, among other things, solutions of the relevant differential equations that are postulated as governing the physical magnitudes of interest. These branches of mathematics involve both the infinite and the continuous, and are of considerable logico-mathematical consistency strength—too high to be recovered by any logicist reconstruction. This means that the mathematics involved, even though it is a priori, is not all analytic. Rather, significant and indispensable parts of it are synthetic. They certainly cannot be recovered by any uncontroversially logicist reconstruction of mathematical foundations.

Explanations in evolutionary biology

We drew attention in the section “Explanations in physics” to the way the physicist turns an enthymematic explanation of certain motions into a logico-mathematically compelling argument of an hypothetico-deductive kind. The philosopher of biology, analogously, could suggest that we should turn Darwin’s enthymematic causal inference
$$\begin{aligned} \left. \begin{array}{l} \textsc {Variability}\\ \textsc {Heritability}\\ \textsc {Differential\, Reproduction} \end{array}\right\} \quad \Longrightarrow \quad \begin{array}{l} \textsc {Adaptation,}\\ \textsc {and\, Species{-}Diversity} \end{array} \end{aligned}$$
into one that is logico-mathematically compelling, and a priori. This will be done by stating some extra premises, but not necessarily highly theoretical ones.

The first of these extra premises is a very obvious, commonsense one (Causal Efficacy, or (CE)). The second is just a statement of scale (Deep Time, or (DT)). Details will emerge below.

When a Darwinian theorist says, as is frequently the case, that the three conditions of variability, heritability and differential reproduction lead to adaptation (and also speciation), the phrase ‘lead to’ is a causal one. The double arrow in the original inferential schema can be read as ‘can be seen as causally giving rise to’. The burden of this essay, however, is to suggest (and indeed to argue for the view) that the double arrow can be ‘logicized’, or purified to the point of analytic validity, by adding the two premises just mentioned.

Now, why is this important? It is because anyone coming to the regimentation of Darwin’s explanatory schema who is at all steeped in the prior explanatory successes of Newtonian science, and who is aware of the orthodox talk, among theoretical biologists, of the force of natural selection, would understandably be tempted to think that there must be a ‘lot more going on’, mathematically, within the Darwinian framework. They might naïvely suppose that a rigorous regimentation of Darwin’s explanatory schema will reveal highly non-analytic (i.e., synthetic) uses of mathematics in mediating the inferential transition from left to right in the schema above.

This study aims to show, however, that this naïve supposition is emphatically not correct. No synthetically mathematical mediation of that inferential transition is called for. It is a transition from ‘low level’ premises to a ‘low level’ conclusion that owes its validity to just the meanings of the words that express them. This remark covers also all the (surprisingly) ‘low level’ mathematics that is deployed in the process. It is a fragment of mathematics easily within the scope of purely logicist reconstruction. This is because the mathematics in question is rock-bottom combinatorial, and is implemented by computable functions available within an Excel spreadsheet. It has no truck with either the infinite or the continuous, and is of very low ‘consistency strength’. Indeed, the mathematics in question is all contained in the theory EFA (Exponential Function Arithmetic), which is of even lower consistency strength than Primitive Recursive Arithmetic.2 We are therefore pushing further than the claim in Matthen and Ariew (2009):

When there are heritable differences in traits leading to differential reproduction rates, the probability of the fitter types increasing in frequency is greater than that of the less-fit types increasing. This is simply a mathematical truth. (p. 211)

It is, indeed, a mathematical truth so ‘low level’ as to be analytic. That it is so will emerge from the computational and algebraic details supplied in due course, which Matthen and Ariew were not concerned to address.3
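The ‘mathematical truth’ just quoted can be checked, in expectation, with nothing more than rational arithmetic. The sketch below is illustrative only and rests on toy assumptions that are not part of the text: each offspring has a single parent and always shares that parent’s trait-value (perfect heritability), and `w1`, `w2` are hypothetical expected offspring numbers for bearers of two trait-values. The closed form `change` shows that the one-generation change in frequency, \(f(1-f)(w_1-w_2)/(f w_1 + (1-f) w_2)\), has the sign of \(w_1 - w_2\) whenever \(0 < f < 1\).

```python
from fractions import Fraction

def next_frequency(f, w1, w2):
    """Expected frequency of trait-value 1 among the next generation.

    Toy assumptions (hypothetical, not from the text): single-parent
    inheritance, perfect heritability, and expected offspring numbers
    w1 for bearers of value 1 and w2 for bearers of value 2.
    """
    total = f * w1 + (1 - f) * w2          # expected size of next generation (per parent)
    return f * w1 / total

def change(f, w1, w2):
    """Closed form of next_frequency(f, w1, w2) - f."""
    return f * (1 - f) * (w1 - w2) / (f * w1 + (1 - f) * w2)

# Hypothetical inputs: value 1 is rare (1 in 10) and slightly more reproduction-conducive.
f, w1, w2 = Fraction(1, 10), Fraction(21, 10), Fraction(2)
print(next_frequency(f, w1, w2))                             # 7/67, slightly above 1/10
print(next_frequency(f, w1, w2) - f == change(f, w1, w2))    # True
```

Exact fractions are used rather than floating point so that the comparison is a matter of elementary arithmetic, with no appeal to the continuum.
<test>
assert next_frequency(Fraction(1, 10), Fraction(21, 10), Fraction(2)) == Fraction(7, 67)
assert change(Fraction(1, 10), Fraction(21, 10), Fraction(2)) == Fraction(3, 670)
assert all(change(Fraction(a, 10), Fraction(3), Fraction(2)) > 0 for a in range(1, 10))
assert all(change(Fraction(a, 10), Fraction(2), Fraction(3)) < 0 for a in range(1, 10))
assert all(next_frequency(Fraction(a, 10), Fraction(3), Fraction(2)) - Fraction(a, 10) == change(Fraction(a, 10), Fraction(3), Fraction(2)) for a in range(1, 10))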

Exactly how will the arrow in the schema above be purified to the point of analytic validity? This, as already foreshadowed, will be done in two steps. First, we shall express the causal element by means of the extra premise (CE). Secondly, we shall supply the aforementioned extra premise (DT) concerning scale. By supplying these two extra premises we shall be able to see that they, taken in conjunction with the original premises (V), (H) and (DR), ensure that the inferential transition to the conclusion (A) is analytically valid. (Henceforth, we shall use the shorter label ‘Adaptation’ for the conclusion after the arrow.)

In order to grasp the inferential transition in question, one will need only to have mastered the concepts involved, and be willing to display one’s mastery by means of a little a priori reasoning. The conclusion can be seen to follow from the premises by virtue of logic, conceptual relationships and a modicum of mathematics that is analytically true (and derivable). Moreover, the supplementation with extra premises is much more modest, in the case of Darwinian biology, than it is in the case of Newtonian physics.

This stark and important contrast appears to be under-appreciated in contemporary philosophy of science. This is probably on account of the post-Quinean rejection of the analytic/synthetic distinction, and with it also the distinction between the a priori and the a posteriori. Failure to appreciate the contrast may also have something to do with the historical happenstance that Newtonian physics, which calls for mathematics of higher consistency strength, namely real analysis and the differential and integral calculus, was developed earlier than evolutionary biology, which calls for mathematics of lower consistency strength, namely simple arithmetic and rational-valued probability theory. If one were logically reconstructing our scientific knowledge in an order reflecting intellectual accessibility and modesty of theoretical assumptions, Darwinian evolutionary biology would surely precede Newtonian physics!

The first extra premise: Causal Efficacy

One of the two extra premises that we need to add—analogous to the premise that gravity is acting on the ball, and that an object accelerates in proportion to the total force acting on it—is the following:
  1. (CE)

    Causal Efficacy: The different values of the variable traits make distinctive causal contributions to their bearers’ reproductive success. That is to say, some trait-values are more reproduction-conducive than others. Such trait-values enable their bearers to produce more offspring than do the others.


The second extra premise: Deep Time

The conclusion of the Darwinian explanatory schema, (A): Adaptation, and Species-Diversity, describes the fully-to-be-expected result, over time, of a process we shall call (AE): Adaptive Evolution. This label should be construed as short for

Adaptive Evolution, and Speciation

or, more informatively:

Adaptive Evolution, in the sense of Changes in Trait-Frequencies within Populations, with Some of the Changes being so Radical that Populations Can Divide into Sub-Populations between whose members it transpires that Sexual Reproduction is no longer Possible; i.e., Speciation can take place

Adaptation is what we currently observe. ‘Good fit’ or ‘good design’ can be judged by looking at organisms and how they get on in their usual environments. The process of Adaptive Evolution, however, is more difficult (though not impossible) to observe. This is because it takes more time and attention to track changes in the frequencies of values of characteristic traits over several generations. We can do this, if we put our minds to it, with short-lived organisms like fruit flies; but it is harder to become observationally aware of the fact that Adaptive Evolution is actually taking place all the time, for all species, including longer-lived ones. And it is even more difficult to observe speciation taking place. To clarify:
  1. (AE)

    Adaptive Evolution: Over successive generations of any population of interbreeding organisms, one can expect an increase in the frequencies, within the population, of the more reproduction-conducive-cum-heritable (values of) variable and heritable traits, at the expense of the less reproduction-conducive-cum-heritable ones. Moreover—the Speciation bit—certain such changes, ‘in different directions’ within a population whose members encounter different environmental challenges, by being isolated from each other in different places, can result in speciation.

Note that we are working here on the understanding, shared with Darwin and his contemporaries, that adaptive evolution is a matter of changes in trait-value-frequencies within a population over time. The modern understanding of adaptive evolution as a matter of changes in gene-frequencies4 can be seen as a later, theoretical precisification of the Darwinian conception, according to which the ultimately relevant trait-values are of the form ‘having such-and-such alleles at such-and-such chromosomal locus’.
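Since adaptive evolution is here a matter of changing trait-value-frequencies, its course can be tracked one generation at a time, much as one would fill successive rows of a spreadsheet. The following is a minimal sketch under the same hypothetical assumptions as any toy model of this kind: single-parent inheritance, perfect heritability, and constant expected offspring numbers `w1` and `w2` (all invented for illustration). Under these assumptions the odds \(f/(1-f)\) of the favoured value are multiplied by \(w_1/w_2\) each generation, so even a small advantage carries a rare value to a majority, though only after many generations.

```python
from fractions import Fraction

def generations_until(f0, w1, w2, threshold, max_generations=10_000):
    """Return the first generation in which the expected frequency of
    trait-value 1 exceeds `threshold`, or None if that does not happen
    within max_generations.

    Hypothetical toy assumptions: single-parent inheritance, perfect
    heritability, constant expected offspring numbers w1 (value 1) and
    w2 (value 2).
    """
    f = f0
    for generation in range(1, max_generations + 1):
        f = f * w1 / (f * w1 + (1 - f) * w2)   # one spreadsheet row per generation
        if f > threshold:
            return generation
    return None

# Hypothetical inputs: value 1 starts at 1 in 1000 and confers 1% more expected offspring.
g = generations_until(Fraction(1, 1000), Fraction(101, 100), Fraction(1), Fraction(1, 2))
print(g)
```

The exact generation count is not the point; what matters is that it is large relative to a human lifetime and is computed by nothing more than repeated rational arithmetic.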

Note that the phrase ‘one can expect’ shows that we are dealing here with a (qualitatively) probabilistic claim. Over the long run, of course, statistical frequencies will converge on underlying probability values, with ever-greater certainty. (This is the informal content of Bernoulli’s Law of Large Numbers.) So, over the long run, we have moral certainty, given the premises, in the truth of the conclusion Adaptive Evolution. This explains the truth of its current outcome, Adaptation, given only one extra assumption. This is the assumption that

there has been enough time for (i) sufficiently many new traits, and new variant values thereof, to emerge; and (ii) sufficiently many adaptive evolutionary changes to have taken place—including those leading to speciation.

We call this assumption

(DT) Deep Time.

(This sounds snappier than more literal alternatives like ‘generational sufficiency’, and is already in use in discussions of the history of geology.) The operational significance of (DT), for us, is that we can countenance a great number of generations when examining expected patterns of change in the relative frequencies of trait-values within a population.
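The appeal to the Law of Large Numbers made above can be made vivid with a quick simulation. This is a sketch, not part of the original argument: `p` stands for some hypothetical probability (say, that an offspring bears a given trait-value), and each trial is treated as independent. As the number of trials grows, the observed relative frequency typically settles ever closer to `p`, although for small `n` any particular run may stray noticeably.

```python
import random

def observed_frequency(p, n, seed=0):
    """Relative frequency of 'success' in n independent trials, each
    succeeding with (hypothetical) probability p."""
    rng = random.Random(seed)                 # seeded for reproducibility
    return sum(rng.random() < p for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, observed_frequency(0.3, n))
```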

Regimenting the premises

Our brief statement of Variability, Heritability and Causal Efficacy can be regimented further. We shall concentrate on (finitely) polyvalent traits—that is, traits with finitely many distinct values. Bivalent traits are a special case. Treating traits as (finitely) polyvalent is adequate even for traits whose values vary continuously. For it is a reasonable assumption that measured values will be assigned, in effect, to but finitely many sufficiently short intervals on the measurement scale. The transition to the continuous case never produces any significantly different qualitative outcomes. Moreover, such a transition is never really made by even the most conscientious measurer. This is because of the limitations, in principle, to the exactitude of measurement. Actual measurements, on any scale, always result in approximations, which are rational numbers.5
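The discretization just described is easy to state concretely. In the sketch below, the measuring scale, the bin width and the sample heights are all hypothetical; measurements are recorded as exact rationals, and each is assigned to one of finitely many intervals, which then serve as the values of a finitely polyvalent trait.

```python
from fractions import Fraction

def bin_index(measurement, lower, width, n_bins):
    """Index i of the interval [lower + i*width, lower + (i+1)*width)
    containing the measurement; values beyond either end are assigned
    to the first or last bin."""
    i = (measurement - lower) // width        # floor division of Fractions yields an int
    return min(max(i, 0), n_bins - 1)

# Hypothetical scale: heights from 150 cm upward, in 10 bins each 5 cm wide.
heights = [Fraction("172.4"), Fraction("149"), Fraction("188.05"), Fraction("300")]
print([bin_index(h, Fraction(150), Fraction(5), 10) for h in heights])   # [4, 0, 7, 9]
```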

How to regiment variability

(V) Variability: Individuals tend to vary with respect to many an observable trait F.

(Polyvalent) If F is a trait with several possible discrete values (such as number of digits on a hand, or number of seeds in a pod, or eye color), then not all individuals exhibit the same value of F.

How to regiment heritability

(H) Heritability: A value of a variable trait is heritable in the following explicated sense on the suppositions that X and Y are the parents of Z, and that X\('\), Y\('\) are the parents of Z\('\), and that they are all members of the same population.

(Polyvalent) Suppose F is a polyvalent trait. Suppose that either X or Y (or both) exhibit the value \(\mathbf{F}\) of F, and that neither X\('\) nor Y\('\) does so. Then the probability that Z exhibits the value \(\mathbf{F}\) (call this probability \(\phi ^{\mathbf{F}}\))6 is higher than the probability that Z\('\) does so.7

Note that it is trait-values that are heritable, according to our regimentation. A given trait might have some of its values highly heritable,8 and others not.
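The regimented notion of heritability compares two probabilities, which in practice one would estimate from relative frequencies in family records. The records below are hypothetical toy data, invented purely to show the comparison; each pair says whether at least one parent bears the value \(\mathbf{F}\) and whether the child bears it.

```python
from fractions import Fraction

def conditional_frequency(records, parent_has):
    """Among families in which some parent does (parent_has=True) or no
    parent does (parent_has=False) bear the value, the fraction of
    children who bear it."""
    children = [child for parents, child in records if parents == parent_has]
    return Fraction(sum(children), len(children))

# Hypothetical toy records: (at least one parent bears the value, child bears it).
records = [(True, True)] * 7 + [(True, False)] * 3 + [(False, True)] * 2 + [(False, False)] * 8

with_value = conditional_frequency(records, True)       # estimate of phi^F
without_value = conditional_frequency(records, False)
print(with_value, without_value, with_value > without_value)   # 7/10 1/5 True
```

The same comparison, run separately for each value of a trait, would show some values to be heritable in this sense and others not.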

How to regiment Causal Efficacy

We shall speak of generic individuals \(\iota\). Note that this is the Greek iota, and is not to be confused with the roman letter ‘\(i\)’.

We can conceptualize an individual \(\iota\)’s prospects for reproductive success as a probability distribution \(p_\iota (\;)\) of a unit amount over the numbers 0, 1, 2, ... of possible offspring. The value \(p_\iota (k)\) is the probability that the individual \(\iota\) has exactly \(k\) offspring.9 We therefore have
$$\begin{aligned} \sum _{k=0}^\infty p_\iota (k) = 1 \end{aligned}$$
For any species \(S\) there is a relatively small number \(n_S\) such that for all \(m>n_S\) and for all individuals \(\iota\) of species \(S\), we have \(p_\iota (m)=0\). So the summation to infinity in the displayed expression is somewhat pedantic. Moreover, it is highly likely that beyond a certain value of \(k\)—which will depend on \(S\) and on the chosen individual \(\iota\) of \(S\)—the function \(p_\iota (k)\) decreases monotonically.
The graph—or better, the bar chart—of \(p_\iota (k)\) will have an initial bulge on the left, near to \(k=0\), and taper off to the right as \(k\) increases.
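Because \(p_\iota(m) = 0\) for all \(m > n_S\), such a distribution can be stored as a finite list. The sketch below uses a hypothetical distribution (with \(n_S = 4\)) to check the unit-sum constraint exactly and to confirm the taper on the right.

```python
from fractions import Fraction

# Hypothetical offspring distribution for one individual iota: entry k is
# p_iota(k), the probability of exactly k offspring. Every k past the end of
# the list (k > n_S, with n_S = 4 here) is taken to have probability 0.
p_iota = [Fraction(1, 10), Fraction(3, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]

def is_distribution(p):
    """Non-negative values that sum exactly to 1 (the 'unit amount')."""
    return all(x >= 0 for x in p) and sum(p) == 1

def tapers_from(p, k0):
    """True if p(k) never increases from k = k0 onwards."""
    return all(p[k] >= p[k + 1] for k in range(k0, len(p) - 1))

print(is_distribution(p_iota), tapers_from(p_iota, 2))   # True True
```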

Remember that the total length of all the bars in the chart must be unity. Improved prospects for reproductive success can be pictured as involving a (very metaphorical!) ‘shift of the bulge to the right’, so that higher values of \(k\) become more probable.

Shifting the bulge of the bar chart for \(p_\iota\) to the right is a straightforward way to make vivid the idea that the individual \(\iota\)’s prospects for reproductive success have improved. But it is not the only way. One can imagine changes in the distribution \(p_\iota\) that similarly represent improved prospects, but that do not involve right-shifting that preserves a single bulge. One such way, for example, would be to create several more smaller bulges further to the right (but of course all still subject to the summation constraint that the total length of the bars be unity). We are dealing here with discrete probability distributions (probability mass functions) rather than probability densities, since the number of offspring takes only whole-number values.

The important point is that such a distribution \(p_\iota (k)\) allows one to calculate an ‘expected value’ \(E_{p_\iota }\) of \(k\), defined as follows:

Definition 1

\(E_{p_\iota }= \sum _{k=0}^\infty k\cdot p_\iota (k)\)

\(E_{p_\iota }\) is the expected number of offspring (according to \(p_\iota\)) begotten by individual \(\iota\). A distribution \(p_\iota\) represents improved prospects for reproductive success over another distribution \(p'_\iota\) just in case \(E_{p_\iota }>E_{p'_\iota }\). Since such improvement will involve, in a suitably generalized sense, some right-shifting of the probability weight (even if unevenly, to several smaller bulges), we shall use the phrase ‘right-shifting’ of \(p\) as a catch-all to register improvement of the prospects (represented by \(p\)) for reproductive success—or, equivalently, increase in the expected value \(E_p\). Causal Efficacy can now be regimented as follows.
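Definition 1 and the generalized sense of ‘right-shifting’ can be illustrated with three hypothetical distributions: a baseline, the same single bulge moved one place to the right, and a variant that moves some weight to a small second bulge further right. Both modified distributions have a higher expected value than the baseline, so both count as ‘right-shifted’ in the catch-all sense.

```python
from fractions import Fraction

def expected_offspring(p):
    """Definition 1: E_p = sum over k of k * p(k), for a finite list p."""
    return sum(k * pk for k, pk in enumerate(p))

def improves(p, q):
    """True if p represents improved prospects over q (E_p > E_q)."""
    return expected_offspring(p) > expected_offspring(q)

# Hypothetical distributions, each summing to 1.
baseline = [Fraction(2, 10), Fraction(4, 10), Fraction(3, 10), Fraction(1, 10)]
shifted = [Fraction(0), Fraction(2, 10), Fraction(4, 10), Fraction(3, 10), Fraction(1, 10)]
two_bulges = [Fraction(1, 10), Fraction(4, 10), Fraction(3, 10), Fraction(0), Fraction(0), Fraction(2, 10)]

for name, p in [("baseline", baseline), ("shifted", shifted), ("two_bulges", two_bulges)]:
    print(name, expected_offspring(p))   # 13/10, 23/10, 2
print(improves(shifted, baseline), improves(two_bulges, baseline))   # True True
```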

Causal Efficacy, Regimented

(Polyvalent) If F is a polyvalent trait, then there are at least two F-values F1 and F2 such that for every individual \(\iota\), if \(\iota\) exhibits value F1, then (ceteris paribus) the distribution \(p_\iota\) is right-shifted from what it would be (i.e. \(E_{p_\iota }\) is greater than what it would be) if \(\iota\) exhibited F2.

This regimentation furnishes a clear sense in which one (expressed value of a) trait can be more reproduction-conducive than another.10

Note the ceteris paribus clauses. These are required because of the myriad ways in which the values of any given trait F can be combined with various values of other variable and heritable traits.

So far we have focused on an individual organism \(\iota\) and we have spoken of \(p_\iota (\;)\), its probability-distribution for varying numbers of offspring. We shall now investigate a correlative notion, \(p^{\mathbf{F}}\), where \(\mathbf{F}\) is a particular value of a given trait F (such as, say, GREEN EYES, for the trait EYE COLOR). It is one thing to contemplate the prospects for reproductive success—encoded by \(p_\iota (\;)\)—of a whole individual organism \(\iota\), fully constituted, as it were, by all of \(\iota\)’s particular values of all the traits that one might wish to take into account. It is quite another thing, however, to contemplate the ‘prospects of reproductive success’ of a particular trait-value, in isolation, as it were, from all the other trait-values that are involved in constituting a complete, whole individual. How does one conceptualize \(p^{\mathbf{F}}\)?

The answer is determined, in principle, by imagining an ‘averaging’ over all possible ways of combining the given F-value \(\mathbf{F}\) with available values of the other traits. Call a way of combining values of the other traits with the value \(\mathbf{F}\) of trait F an \(\mathbf{F}\)-based trait profile; and call an F-variant of the profile in question any nomologically possible result of changing only the F-value therein.11 Every possible \(\mathbf{F}\)-individual \(\iota\) has its own (fully specific) \(\mathbf{F}\)-based trait profile \(\mathcal{F}\), with its corresponding probability distribution \(p_\mathcal{F}\) \((= p_\iota )\) fully determined by the full details of that \(\mathbf{F}\)-based trait profile \(\mathcal{F}\). By averaging over all such distributions \(p_\mathcal{F}\) (holding only \(\mathbf{F}\) constant within the trait profile), one arrives abstractly at the distribution \(p^{\mathbf{F}}\), which adverts only to the F-value \(\mathbf{F}\), and not to values of any other traits. It should be stressed, however, that \(p^{\mathbf{F}}\), on this account, takes into account only the number of first-generation offspring to which \(\mathbf{F}\) conduces.

One could, however, if one wished, ‘iterate this process out’ an agreed number of generations, in order to get a better longitudinal handle on the longer-term reproduction-conduciveness (or distribution \(p^{\mathbf{F}}\)) of any particular F-value \(\mathbf{F}\). This would enable one to get the right answers in such rare (and bizarre) cases of values \(\mathbf{F}\) that nomologically covary with values \(\mathbf{F}'\) that are highly detrimental to the reproductive success, not of the parent(s), but of their offspring. Imagine, for example, a male trait-value \(\mathbf{F}\): being absolutely irresistible, sexually, to nulliparous young women, and a nomologically covarying male trait-value \(\mathbf{F}'\): certain to sire only sons, all of whom are sterile. On the one-generational look-ahead method, the trait-value \(\mathbf{F}\) would be over-valued. On a two-generational look-ahead method, it would be assigned an appropriately low value. But we may reasonably suppose that the one-generational look-ahead method is by and large adequate, and that the exceptions counselling two (or more)-generational look-aheads are rare curiosities. One exception to this generalization, however, is that of K- and r-selection, in environmental regimes where the K-selectionists have the better long-term (multi-generational) reproductive prospects. A multi-generational look-ahead method could also, in principle, incorporate inclusive-fitness effects into the picture.
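The contrast between one- and two-generational look-aheads can be put in numbers. The sketch rests on a hypothetical simplification not stated in the text: the bearer’s own offspring count follows one distribution, and every later descendant’s count follows another, independently of family size, so that expected numbers of descendants simply multiply. The toy figures (four sterile sons versus two fertile offspring) are invented for illustration.

```python
from fractions import Fraction

def expected(p):
    """Expected number of offspring, sum of k * p(k) (Definition 1)."""
    return sum(k * pk for k, pk in enumerate(p))

def expected_descendants(p_first, p_later, generations):
    """Expected number of descendants `generations` steps down.

    Hypothetical simplification: the bearer's offspring count follows
    p_first, each later descendant's follows p_later, independently of
    family size, so expectations multiply generation by generation.
    """
    total = expected(p_first)
    for _ in range(generations - 1):
        total *= expected(p_later)
    return total

# Toy numbers: the 'irresistible' value sires 4 sons for certain, all sterile;
# a rival value yields 2 offspring for certain, who in turn have 2 each.
irresistible = ([0, 0, 0, 0, Fraction(1)], [Fraction(1)])
ordinary = ([0, 0, Fraction(1)], [0, 0, Fraction(1)])

for gens in (1, 2):
    print(gens, expected_descendants(*irresistible, gens), expected_descendants(*ordinary, gens))
```

With one generation of look-ahead the ‘irresistible’ value comes out ahead (4 against 2); with two, it falls to 0 against 4.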

The importance of K- and r-selection notwithstanding, this study will proceed on the reasonable assumption that the one-generational look-ahead method is adequate for delineating the major conceptual and theoretical points to be made. That is, we shall treat of what we shall call the basic case. These conceptual and theoretical points will emerge even under the perhaps over-simplifying assumption that the environment remains stable enough not to induce variation in the reproduction-conduciveness of particular trait-values F. It remains a theoretical option to complicate matters further by treating of time- or environment-dependent measures of reproduction-conduciveness. But the qualitative points to emerge from this study can be expected to be robust under such theoretical refinements. Indeed, given that one of the main points concerns dramatic threshold effects resulting from the non-linearity of the underlying mathematics, one can reasonably expect that main lesson to be reinforced upon any such theoretical refinement of our basic case.

Suppose one is given two distinct possible F-values, say F1 and F2, and suppose the problem is that of determining, in the abstract, which of the corresponding distributions \(p^{\mathbf{F}_1}\) and \(p^{\mathbf{F}_2}\) is right-shifted with respect to the other. Each F1-based trait profile has as an F-variant the corresponding F2-based trait profile, differing only with respect to the F-value. Following the abstract method just explained, one arrives at \(p^{\mathbf{F}_1}\) and \(p^{\mathbf{F}_2}\), and can (in principle) compare their expected values.

This does not purport to be a feasible recipe for actually computing distributions \(p^{\mathbf{F}}\) from possible empirical data. Rather, it is a conceptual recipe by means of which one comes to understand what sort of quantity \(p^{\mathbf{F}}\) is supposed to be. The recipe shows how \(p^{\mathbf{F}}\) would supervene on distributions of the form \(p_{\mathcal{F}}\), the form for fully fledged or specified individuals. The latter distributions are based on the propensities of fully specified individuals—fully specified in the sense that, for every trait F, a particular F-value has been specified.

It will be convenient to have the concept of a normed expectation value.

Definition 2

The normed expectation value of \(p^{\mathbf{F}}\) is
$$\begin{aligned} \frac{E_{p ^{\mathbf{F}}}}{\sum _\mathbf{F'} E_{p^{\mathbf{F}'}}}, \end{aligned}$$
where the summation in the denominator is over all values F′ of the trait F in question.

Normed expectation values lie in the interval [0,1].

Definition 3

$$\begin{aligned} \rho _\mathbf{F}=_{df}\frac{E_{p ^\mathbf{F}}}{\sum _\mathbf{F'} E_{p^\mathbf{F'}}} \end{aligned}$$

We call \(\rho _\mathbf{F}\) the measure of reproduction-conduciveness of the trait-value F. When the trait-value F is written as Fi, then we shall write \(\rho _i\) in place of \(\rho _{\mathbf{F}_{i}}\). The trait F that we are dealing with, with its particular values F1, F2, ..., will always be clear from the context.
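As a minimal sketch of Definitions 2 and 3, the Python below assumes that each distribution \(p^{\mathbf{F}}\) is already available as a table mapping numbers of first-generation offspring to probabilities. The two distributions are hypothetical, and the function names are ours; exact `Fraction` arithmetic keeps the normed values readable.

```python
from fractions import Fraction

def expected_value(dist):
    """Expected number of first-generation offspring under a distribution
    given as {number_of_offspring: probability}."""
    return sum(k * p for k, p in dist.items())

def reproduction_conduciveness(dists):
    """Normed expectation values: rho_F = E[p^F] / sum over F' of E[p^F']."""
    expectations = [expected_value(d) for d in dists]
    total = sum(expectations)
    return [e / total for e in expectations]

# Hypothetical offspring distributions for two values F1, F2 of a trait F.
p_F1 = {0: Fraction(1, 4), 2: Fraction(1, 2), 4: Fraction(1, 4)}  # expectation 2
p_F2 = {0: Fraction(1, 2), 2: Fraction(1, 2)}                     # expectation 1

rho = reproduction_conduciveness([p_F1, p_F2])
print(rho)  # [Fraction(2, 3), Fraction(1, 3)]
```

Because each \(\rho\)-value is a share of the total expectation, the values always sum to 1, which is why they lie in [0,1].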

The analyticity of the Darwinian schema made clear

The central Darwinian inference, suitably fleshed out so that it is no longer enthymematic, can now be exhibited in two stages as follows.
$$\begin{aligned} \left. \begin{array}{l} \textsc {Variability}\\ \textsc {Heritability}\\ \textsc {Differential\, Reproduction}\\ \textsc {Causal\, Efficacy} \end{array}\right\} \quad \mapsto \quad \begin{array}{c} \;\\ \;\\ \left. \begin{array}{c} \\ \textsc {Adaptive\, Evolution}\\ \;\\ \textsc {Deep\, Time} \end{array}\right\} \mapsto \end{array} \begin{array}{c} \;\\ \;\\ \textsc {Adaptation} \end{array} \end{aligned}$$
Note that the double arrow ‘\(\Rightarrow\)’ of causal inference has given way here to a new style of arrow, ‘\(\mapsto\)’, intended to represent analytically necessary, hence a priori consequence. Should the reader balk at the bold claim of analyticity, then the invitation is to invest ‘\(\mapsto\)’ with at least the sense of logico-mathematically necessary, hence a priori consequence.
In keeping with the style adopted in various other essays setting out the present author’s regimentations of hypothetico-deductive explanations,12 the pattern of Darwinian explanation can be rendered as the following tree-like proof-schema, with premises at the tips of branches and conclusion at the root:
$$\begin{aligned} \begin{array}{l} \underbrace{ \textsc {Variability}, \; \textsc {Heritability}, \; \textsc {Differential\, Reproduction}, \; \textsc {Causal\, Efficacy}}\\ \,\,\quad \qquad \qquad \qquad \qquad \qquad \vdots \;\text{(I) }\\ \quad \\ \quad \qquad \qquad \qquad \underbrace{ \textsc {Adaptive\, Evolution}\qquad \qquad \quad \quad \quad \quad \quad \textsc {Deep\, Time}}\\ \quad \\ \,\quad \qquad \qquad \qquad \quad \qquad \qquad \qquad \qquad \vdots \;\text{(II) }\\ \quad \\ \quad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \textsc {Adaptation} \end{array} \end{aligned}$$
Here, the vertically descending dots marked (I) and (II) indicate passages of a priori reasoning (either analytic or logico-mathematical, according to one’s philosophical convictions). The passage marked (II) should strike anyone (who is willing to acknowledge the analytic/synthetic distinction) as analytic. The inference is, after all, to the effect that changes in trait-frequencies in favor of more reproduction-conducive (and, as will emerge below, more heritable) traits will, over a very long time, produce a state of adaptedness on the part of that generation of organisms that is alive at the end of that very long time.

Observe that none of the five premises (at this stage of the exposition) involves theoretical posits. They involve only observational concepts.

What, then, are the concepts in question? The embedded concepts that occur explicitly in the regimented forms of the premises and the conclusion can be listed as follows:

Concrete: individual organism; parent; offspring

Abstract/Logical: observable trait; value of a trait; logical connectives; logical quantifiers; identity/distinctness

Mathematical: natural number; rational number; probability; choosing at random within a population; less than; interval (between); probability density; expected value

Functional/teleological: adapted; designed; equipped

The only task that remains is to make obvious the inferential passage marked (I).

The probabilistic reasoning for (I) in the asexual case

The challenge is to make it clear that inference (I) is analytically valid (or at least a priori). First we confine ourselves to the case of asexual reproduction. (The case of sexual reproduction will be considered in the section “The probabilistic reasoning for (I) in the sexual case”.) So suppose that the four premises (V), (H), (DR) and (CE) are true. We seek an argument for the conclusion (AE):

Over successive generations of any population of self-reproducing organisms, one can expect an increase in the frequency, within the population, of the more reproduction-conducive-cum-heritable (values of) variable and heritable traits, at the expense of the less reproduction-conducive-cum-heritable ones.


Since we are considering only frequencies, we can avail ourselves of the mathematical convenience of ‘norming’ the population to unity, and treating trait-value-frequencies as rational numbers in the interval [0,1]. The frequency \(f_i\) of a trait-value Fi of the trait F is defined as follows.

Definition 4

\(f_i=\frac{\text{the number of } \mathbf{F}_i\text{-individuals}}{\text{the number of individuals}}\)

(At times we may also use a superscript \(n\): \(f_i^n\) will be the frequency of Fi-individuals within generation \(n\).)

Observation 1

\(\sum _j f_j=1\).

The task is to investigate and to determine the conditions under which, as we pass from generation \(n\) to generation \((n+1)\), the frequency \(f^{n+1}_i\) exceeds the frequency \(f^{n}_i\). That is, it is required that we determine the conditions under which \(f_i\) (i.e., the proportion of Fis) will grow.


Definition 5

\(\phi ^i_j\) is the statistical frequency of Fjs among the offspring of Fis.

Observation 2

For all \(i\), \(\sum _j \phi ^i_j=1\).

Without loss of generality we can consider only the result of replacing each reproducer by its offspring. That simplifies the imagined move from one generation to the next, and allows for an easier comparison of the resulting trait-value-frequencies across generations.

For any given F-value Fi, and with an eye to its frequency in the next generation, we need to consider as reproducers not only Fis themselves, but also all the Fjs, where \(j\ne i\). Since F is heritable, we have the following algebraic condition to be satisfied.

Heritability: for at least some \(i\), the (expected) frequency \(\phi ^i_i\) of Fis among the offspring of Fis is greater than the (expected) frequency \(\phi ^j_i\) of Fis among the offspring of Fjs, for any \(j\) distinct from \(i\). That is,
$$\begin{aligned} \forall j\ne i,\;\phi ^i_i > \phi ^j_i \end{aligned}$$
Our algebraic statement of Heritability is intended as a formalization of the informal condition of the same name, formulated earlier.
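The algebraic condition can be checked mechanically for any candidate matrix. In this Python sketch (with invented example matrices), row \(j\) of `phi` lists the offspring make-up of Fj parents, so `phi[j][i]` plays the role of \(\phi ^j_i\); indices start at 0 rather than 1.

```python
def satisfies_heritability(phi, i):
    """Condition for trait-value F_i: phi[i][i] > phi[j][i] for every j != i,
    where phi[j][i] is the frequency of F_i among offspring of F_j parents."""
    return all(phi[i][i] > phi[j][i] for j in range(len(phi)) if j != i)

def heritable_values(phi):
    """Indices i meeting the condition; Heritability asks for at least one."""
    return [i for i in range(len(phi)) if satisfies_heritability(phi, i)]

# Hypothetical matrix: each parent type mostly begets its own type.
phi_ok = [[0.8, 0.1, 0.1],
          [0.2, 0.7, 0.1],
          [0.1, 0.3, 0.6]]

# Hypothetical matrix: offspring of F_i are least likely to be F_i.
phi_bad = [[0.1, 0.5, 0.4],
           [0.4, 0.1, 0.5],
           [0.5, 0.4, 0.1]]

print(heritable_values(phi_ok))   # [0, 1, 2]
print(heritable_values(phi_bad))  # []
```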

The crucial equation for next-generation frequencies

Definition 6

\(\rho _i\) is the normed measure of reproduction-conduciveness of the trait-value Fi, as per Definition 3.

In generation \((n+1)\), the proportion of Fi-individuals will be
$$\begin{aligned} \frac{\sum _j f^n_j\cdot \rho _j\cdot \phi ^j_i}{\sum _k \sum _j f^n_j\cdot \rho _j\cdot \phi ^j_k} \end{aligned}$$

(The denominator—let us call it \(\delta\)—is needed in order to normalize frequencies.)

Note that we are assuming here—for purposes of simplification, for it may not be true—that \(\rho\)- and \(\phi\)-values remain constant across generations, while only frequencies change.
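A direct transcription of the next-generation formula, under the same simplifying assumption that \(\rho\)- and \(\phi\)-values stay fixed, might look like this. It is a sketch: `phi[j][i]` stands for \(\phi ^j_i\), each row is assumed to sum to 1, and the two-valued example is hypothetical.

```python
def next_generation(f, rho, phi):
    """Frequencies in generation n+1 from those in generation n.

    f[j]      -- frequency of F_j-individuals in generation n
    rho[j]    -- reproduction-conduciveness of F_j
    phi[j][i] -- frequency of F_i among the offspring of F_j (rows sum to 1)
    """
    n = len(f)
    # Numerator: F_i offspring contributed by every parent type F_j.
    raw = [sum(f[j] * rho[j] * phi[j][i] for j in range(n)) for i in range(n)]
    delta = sum(raw)  # the normalizing denominator (delta in the text)
    return [x / delta for x in raw]

# Hypothetical two-valued trait.
f = [0.5, 0.5]
rho = [0.6, 0.4]
phi = [[0.9, 0.1],
       [0.2, 0.8]]

f_next = next_generation(f, rho, phi)
print([round(x, 4) for x in f_next])  # [0.62, 0.38]
```

Because \(\delta\) rescales the numerator, multiplying every \(\rho\)-value by the same constant leaves the result unchanged; this is what allows whole-number \(\rho\)-values to be used in illustrations.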

Our question was: under what more perspicuous condition do we have
$$\begin{aligned} f^{n+1}_i>f^{n}_i\; ? \end{aligned}$$
That is, under what more perspicuous condition do we have
$$\begin{aligned} \frac{\sum _j f^n_j\cdot \rho _j\cdot \phi ^j_i}{\delta }>f^{n}_i\; ? \end{aligned}$$
We can now suppress the superscript \(n\). So our question concerns the condition
$$\begin{aligned} \frac{\sum _j f_j\cdot \rho _j\cdot \phi ^j_i}{\delta }>f_i \end{aligned}$$
Divide through by \(f_i\) and multiply by \(\delta\):
$$\begin{aligned} \rho _i\cdot \phi ^i_i + \frac{1}{f_i}\sum _{j\ne i} f_j\cdot \rho _j\cdot \phi ^j_i>\delta \end{aligned}$$
Bear in mind that all the quantities represented by letters on the left-hand side lie in the interval [0,1]. The inequality holds if and only if
$$\begin{aligned} \rho _i\cdot \phi ^i_i + \frac{1}{f_i}\cdot \sum _{j\ne i} f_j\cdot \rho _j\cdot \phi ^j_i>\sum _k \sum _j f_j\cdot \rho _j\cdot \phi ^j_k \end{aligned}$$
$$\begin{aligned} \rho _i\cdot \phi ^i_i + \frac{1}{f_i}\cdot \sum _{j\ne i} f_j\cdot \rho _j\cdot \phi ^j_i>\sum _j \sum _k f_j\cdot \rho _j\cdot \phi ^j_k \end{aligned}$$
$$\begin{aligned} \rho _i\cdot \phi ^i_i + \frac{1}{f_i}\cdot \sum _{j\ne i} f_j\cdot \rho _j\cdot \phi ^j_i>\sum _k f_i\cdot \rho _i\cdot \phi ^i_k+\sum _{j\ne i} \sum _k f_j\cdot \rho _j\cdot \phi ^j_k \end{aligned}$$
$$\begin{aligned} \rho _i\cdot \phi ^i_i + \frac{1}{f_i}\cdot \sum _{j\ne i} f_j\cdot \rho _j\cdot \phi ^j_i-\sum _k f_i\cdot \rho _i\cdot \phi ^i_k>\sum _{j\ne i} \sum _k f_j\cdot \rho _j\cdot \phi ^j_k \end{aligned}$$
$$\begin{aligned} \rho _i\cdot \phi ^i_i + \frac{1}{f_i}\cdot \sum _{j\ne i} f_j\cdot \rho _j\cdot \phi ^j_i-f_i\cdot \sum _k \rho _i\cdot \phi ^i_k>\sum _{j\ne i} \sum _k f_j\cdot \rho _j\cdot \phi ^j_k \end{aligned}$$
Let us use \(a\), \(b\) and \(c\) as abbreviations for the \(\Sigma\)-terms

\( {\sum _{j\ne i}} {f_{j}}\cdot {\rho_{j}}\cdot {\phi^{j}_{i}}, \quad {\sum_{k}} {\rho_{i}}\cdot {\phi^{i}_{k}} \quad {\text{and}} \quad {\sum_{j\ne i}} {\sum_{k}} {f_{j}}\cdot {\rho_{j}}\cdot {\phi^{j}_{k}}\)

respectively. (Note that none of them is negative.) Then the last inequality becomes

$$\begin{aligned} \rho _i\cdot \phi ^i_i + \frac{1}{f_i}\cdot a-f_i\cdot b>c \end{aligned}$$
We see immediately that there are three ways to make it easier for the required inequality to hold:
  1. increase the reproduction-conduciveness \(\rho _i\) of Fi;

  2. increase the heritability \(\phi ^i_i\) of Fi;

  3. make the initial frequency \(f_i\) less than \(\sqrt{\frac{a}{b}}\), i.e. less than
    $$\begin{aligned}\sqrt{\frac{\sum _{j\ne i} f_j\cdot \rho _j\cdot \phi ^j_i}{\sum _k \rho _i\cdot \phi ^i_k}}\end{aligned}$$
    since the term \(\frac{1}{f_i}\cdot a-f_i\cdot b\) is positive exactly when \(f_i^2<\frac{a}{b}\).
A trait-value’s reproduction-conduciveness and heritability work in tandem to make it increase in frequency at the expense of other trait-values. Also, the situation favors the underrepresented trait-values, that is, the ones with low initial frequencies (in the sense that condition (3) above is met).
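The chain of inequalities can be spot-checked numerically. This Python sketch draws random frequencies, normed \(\rho\)-values and row-normalized heritability matrices (arbitrary test data, not empirical values) and compares the direct test \(f^{n+1}_i>f^n_i\) with the rearranged form \(\rho _i\cdot \phi ^i_i + \frac{1}{f_i}\cdot a-f_i\cdot b>c\). Since each row sums to 1, \(b\) here simply equals \(\rho _i\).

```python
import random

def next_generation(f, rho, phi):
    """Direct use of the next-generation formula (rows of phi sum to 1)."""
    n = len(f)
    raw = [sum(f[j] * rho[j] * phi[j][i] for j in range(n)) for i in range(n)]
    delta = sum(raw)
    return [x / delta for x in raw]

def grows(f, rho, phi, i):
    """Rearranged test rho_i*phi_ii + a/f_i - f_i*b > c, with a, b, c as in the text."""
    n = len(f)
    a = sum(f[j] * rho[j] * phi[j][i] for j in range(n) if j != i)
    b = sum(rho[i] * phi[i][k] for k in range(n))
    c = sum(f[j] * rho[j] * phi[j][k] for j in range(n) if j != i for k in range(n))
    return rho[i] * phi[i][i] + a / f[i] - f[i] * b > c

def random_distribution(n, rng):
    """Random positive numbers scaled to sum to 1."""
    xs = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(xs)
    return [x / total for x in xs]

rng = random.Random(0)
trials = 1000
agree = 0
for _ in range(trials):
    n = rng.randint(2, 5)
    f = random_distribution(n, rng)
    rho = random_distribution(n, rng)
    phi = [random_distribution(n, rng) for _ in range(n)]
    i = rng.randrange(n)
    direct = next_generation(f, rho, phi)[i] > f[i]
    agree += (direct == grows(f, rho, phi, i))
print(agree, "of", trials, "cases agree")
```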

It is not the case that one trait-value will increase in frequency at the expense of another simply by being more conducive to reproduction. For, if the first trait-value is much less heritable than the second, this need not happen. Conversely, the more heritable a trait-value is, the better it will do in competition with other trait-values, even if the latter conduce slightly more to reproduction. Ironically, Nature (or rather, the mathematics) tends to smile on newcomers (that is, traits with very low frequencies) that have the least bit going for them (that is, a reasonable value of reproduction-conduciveness times heritability).

It would be difficult to overemphasize the importance here of heritability. It is an easy exercise to construct an Excel spreadsheet that will display, by means of bar charts, the relative proportions (within a normalized population) of the various trait-values over, say, 50 generations. Such an ‘a priori prosthetic’ makes vivid how crucial are the values \(\phi ^i_j\) for \(i\!\ne \! j\). With ‘true breeding’ trait-values, these ‘non-diagonal’ entries in the heritability matrix are 0; and, with heritable trait-values, they are less than the diagonal entry.
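A Python stand-in for such a spreadsheet might look as follows. It is a sketch under our own choices: text bars replace the bar charts, \(\rho = (2,1,1,1)\), and the ‘leaky’ matrix has diagonal entries 10 and off-diagonal entries 1 before normalization. With the true-breeding (identity) matrix only \(\rho\) drives change, and the frequency of F1 after \(n\) generations is exactly \(\frac{2^n}{2^n+3}\); with the leaky matrix the other trait-values keep a non-trivial share.

```python
def normalize_rows(m):
    """Divide each entry by its row sum."""
    return [[x / sum(row) for x in row] for row in m]

def next_generation(f, rho, phi):
    n = len(f)
    raw = [sum(f[j] * rho[j] * phi[j][i] for j in range(n)) for i in range(n)]
    delta = sum(raw)
    return [x / delta for x in raw]

def run(f0, rho, phi, generations=50):
    """Frequencies for generations 0..generations, like successive spreadsheet rows."""
    history = [list(f0)]
    for _ in range(generations):
        history.append(next_generation(history[-1], rho, phi))
    return history

def show(history, every=10, width=30):
    """Crude text bars: one '#' per 1/width of the population."""
    for n in range(0, len(history), every):
        bars = " | ".join("#" * round(width * x) for x in history[n])
        print(f"gen {n:2d}: {bars}")

rho = [2, 1, 1, 1]
f0 = [0.25] * 4
true_breeding = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
leaky = normalize_rows([[10, 1, 1, 1], [1, 10, 1, 1], [1, 1, 10, 1], [1, 1, 1, 10]])

tb = run(f0, rho, true_breeding)
lk = run(f0, rho, leaky)
show(tb)
show(lk)
```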

With ‘true breeding’ trait-values, changes will arise in their frequencies only as a result of changes in their reproduction-conduciveness-values \(\rho _i\). As soon as one allows non-zero entries \(\phi ^i_j\) for \(i\!\ne \! j\), however (compatibly with Heritability), the reproductive success of any one trait-value can be ‘siphoned off’ so as to boost the representation, in the next generation, of other trait-values. In these situations, the trait-values Fi that prevail are those for which (i) the ‘diagonal’ entry \(\phi ^i_i\) (which can be thought of as the degree of ‘self-heredity’, or true-breeding) is kept as high as possible, or (ii) there is enough siphoning \(\phi ^i_j\) off other successful traits Fj to compensate for lowered self-heredity on the part of Fi. Then, also, the trend will be towards very stable polymorphic equilibria, which are rapidly achieved. In these equilibria even the least reproduction-conducive trait-values Fj enjoy non-trivial, albeit small, representation without going extinct (i.e., without it being the case that \(f^n_j\rightarrow 0\) as \(n\rightarrow \infty\)). Moreover, when these polymorphic equilibria obtain as a consequence of a particular distribution of values within the heritability matrix \(\phi ^i_j\), one finds that even large changes among the various \(\rho\)-values result in only modest changes in the nature of the equilibrium attained. Frequencies will be altered; but no trait-value will be driven to extinction.

All these remarks pertain to heritability matrices satisfying the condition Heritability. It is an interesting exercise to relax that condition, so as to be able to countenance matrices in which some diagonal entries \(\phi ^i_i\) are exceeded by other entries \(\phi ^i_j\) (\(i\!\ne \! j\)) in the \(i\)-th row. Remember that the entries in the \(i\)-th row represent the make-up of the offspring of Fis. This Heritability-violating possibility allows that the offspring of Fis are least likely to be Fis!

No part of the reasoning in this section has assumed that the environment is constant. We have been looking only at the transition from one generation to the next. It is quite possible that with environmental changes there will be a shift in the distribution of reproduction-conduciveness values across trait-values. In such circumstances, the same qualitative argument can be run again, to predict increases in the frequencies of certain trait-values at the expense of others.

The Monte Carlo method

Our argument, as presented above, may have to face an objection along the following lines. We have in effect treated frequencies as identical to the postulated underlying probabilities (or at least as the result of certain algebraic operations on presumed probability values). But this is unrealistic, for two reasons. First, expected numbers of offspring (according to the probabilities involved in one’s calculations) are often fractional; yet individuals can leave only whole numbers of offspring.13 Secondly, even if expected values were always whole numbers, there will be fluctuations of actual frequencies in finite populations, at times well away from the underlying probabilities. A fair coin, for example (where \(p\)(Heads)\(=p\)(Tails)\(=\frac{1}{2}\)), if tossed 10 times, could come up 8 Heads and 2 Tails. Analogously, the proportion of F-offspring in the brood of F-parents may on occasion deviate significantly from the presumed ‘heritability’ probability \(\phi ^{\mathbf{F}}\) mentioned in our statement above of Heritability.

Over the long run, however, as already pointed out, statistical frequencies will converge on underlying probability values, with ever-greater certainty (Bernoulli’s Law of Large Numbers again). Especially as populations expand, and also as many generations come to pass, these occasional fluctuations will be swamped by the inexorable tendency of probabilistic processes to deliver long-run frequencies closely approximating the underlying probabilities. We acknowledge here the potential significance of ‘founder effects’ (when initial population sizes are small); of ‘demic structure’ (where populations are divided into significantly smaller breeding-groups); and of ‘genetic drift’ (where gene-frequencies change, up or down, without these changes being explicable in terms of greater or lower reproduction-conduciveness, respectively).14 There is also the important consideration that at times of catastrophe, when large portions of a population are wiped out, the trait-values of lowest frequency might be wiped out too, even if those values were set to enjoy some small measure of representation in an enduring polymorphic equilibrium. For trait-values, as for some animals, there can be safety in numbers.

But these stochastic effects lend only historical interest to an otherwise general and lawlike process—that of asymptotic trends (in the directions to be expected) of frequencies of variable and heritable traits, towards polymorphic equilibria in which the more reproduction-conducive-cum-heritable trait-values enjoy proportionally larger representation within the population.

Anyone who doubts the robustness of the general conclusion is invited to test the soundness of the foregoing reasoning by running ‘Monte Carlo’ methods on a computer. One can build in randomizing devices (the electronic equivalent of fair-coin-tossing or fair-dice-rolling) and program imaginary ‘evolutionary runs’ through generational phase-space of trait-value-proportions. Each such run will allow for the kinds of stochastic deviations from underlying probabilities (or from the results of algebraic calculations employing such) mentioned above. After sufficiently many such runs of sufficient length (that is, covering sufficiently many generations) the same statistical ‘settling’ should be observed: actual long-run frequencies of trait-values should approximate arbitrarily closely to the frequencies calculated by the foregoing method of straightforwardly probabilistic reasoning.
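One possible version of such a run is sketched below. It is our own construction, not a procedure from the text: each generation, a fixed number of individuals is drawn at random (via Python's `random.choices`) with probabilities given by the deterministic next-generation frequencies, a Wright-Fisher-style resampling scheme. The population size, number of runs, seed and four-valued example are arbitrary choices.

```python
import random

def normalize_rows(m):
    return [[x / sum(row) for x in row] for row in m]

def next_generation(f, rho, phi):
    n = len(f)
    raw = [sum(f[j] * rho[j] * phi[j][i] for j in range(n)) for i in range(n)]
    delta = sum(raw)
    return [x / delta for x in raw]

def deterministic(f0, rho, phi, generations):
    """Straightforwardly probabilistic calculation, for comparison."""
    f = list(f0)
    for _ in range(generations):
        f = next_generation(f, rho, phi)
    return f

def monte_carlo(f0, rho, phi, size, generations, rng):
    """One finite-population run: each new generation has `size` individuals
    whose trait-values are drawn at random from the expected frequencies."""
    n = len(f0)
    f = list(f0)
    for _ in range(generations):
        expected = next_generation(f, rho, phi)
        draws = rng.choices(range(n), weights=expected, k=size)
        f = [draws.count(i) / size for i in range(n)]
    return f

phi = normalize_rows([[10, 1, 1, 1], [1, 10, 1, 1], [1, 1, 10, 1], [1, 1, 1, 10]])
rho = [2, 1, 1, 1]
f0 = [0.25] * 4
rng = random.Random(1859)

runs = [monte_carlo(f0, rho, phi, size=2000, generations=30, rng=rng) for _ in range(10)]
mc_mean = [sum(r[i] for r in runs) / len(runs) for i in range(4)]
exact = deterministic(f0, rho, phi, 30)
print("Monte Carlo mean:", [round(x, 3) for x in mc_mean])
print("calculated:      ", [round(x, 3) for x in exact])
```

Shrinking `size` to a few dozen makes the stochastic deviations, and the risk that rare trait-values are lost, much more visible.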

One cannot do more, at this stage, to rebut the envisaged objection; this reply will have to suffice.

The true nature of Darwinian fitness

For the time being we are assuming asexual reproduction. Any four-trait-value evolutionary scenario may be characterized by matrices of the following form.
$$\begin{aligned}\varvec{\Phi } = \left( \begin{array}{cccc} \phi ^1_1 &{} \phi ^1_2 &{} \phi ^1_3 &{} \phi ^1_4 \\ \phi ^2_1 &{} \phi ^2_2 &{} \phi ^2_3 &{} \phi ^2_4 \\ \phi ^3_1 &{} \phi ^3_2 &{} \phi ^3_3 &{} \phi ^3_4 \\ \phi ^4_1 &{} \phi ^4_2 &{} \phi ^4_3 &{} \phi ^4_4 \end{array} \right) ,\quad \mathbf {\rho }= \left( \begin{array}{cccc} \rho _1&\rho _2&\rho _3& \rho _4 \end{array} \right) ,\quad \mathbf {f}= \left( \begin{array}{cccc} f_1& f_2& f_3& f_4 \end{array} \right)\end{aligned}$$
These matrices can always be normalized, by dividing each entry by the sum of entries in its row. (Indeed, only normalized matrices are used in our calculations.) But, provided that one appreciates that a matrix is normalizable in this way, one can also work with non-normalized matrices. This is helpful in that it allows one to give illustrations by means of whole-number entries, as will be done here.
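Row-normalization is a one-liner; the Python sketch below uses exact fractions so that, for instance, the row (10, 1, 1, 1) becomes \(\left(\frac{10}{13},\frac{1}{13},\frac{1}{13},\frac{1}{13}\right)\). Entries may be integers or decimal strings such as `"0.1"`; the helper name is ours.

```python
from fractions import Fraction

def normalize_rows(matrix):
    """Divide each entry by the sum of its row, so that each row becomes
    a distribution over offspring trait-values."""
    normalized = []
    for row in matrix:
        entries = [Fraction(x) for x in row]  # accepts ints or strings like "0.1"
        total = sum(entries)
        normalized.append([e / total for e in entries])
    return normalized

phi = normalize_rows([[10, 1, 1, 1],
                      [1, 10, 1, 1],
                      [1, 1, 10, 1],
                      [1, 1, 1, 10]])
print(phi[0])  # [Fraction(10, 13), Fraction(1, 13), Fraction(1, 13), Fraction(1, 13)]
```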

When normalized, \(\varvec{\Phi }\) is the matrix of heritability values. The entry \(\phi ^i_j\) represents the proportion of Fjs among offspring of a parent of type Fi.

The \(\rho\)-values are measures of reproduction-conduciveness of the trait-values. The value \(\rho _i\) measures the expected size of brood of a parent of type Fi.

To help simplify matters, we can assume that
$$\begin{aligned} \mathbf {f} = \left( \begin{array}{cccc} 1&1&1&1 \end{array} \right) \end{aligned}$$
so that the initial frequency of each trait-value is \(\frac{1}{4}\). This enables us subsequently to contrast evolutionary scenarios by reference only to the matrices \(\varvec{\Phi }\) and \(\rho\).

A simple example: obviously steady state

With
$$\begin{aligned} \varvec{\Phi } = \left( \begin{array}{cccc} 1 &{} 1 &{} 1 &{} 1\\ 1 &{} 1 &{} 1 &{} 1\\ 1 &{} 1 &{} 1 &{} 1\\ 1 &{} 1 &{} 1 &{} 1 \end{array} \right) \quad \text{ and }\quad \mathbf {\rho } = \left( \begin{array}{cccc} 1&1&1&1 \end{array} \right) , \end{aligned}$$
no change takes place. If
$$\begin{aligned} \mathbf {f}^0 = \left( \begin{array}{cccc} 1&1&1&1 \end{array} \right) , \end{aligned}$$
then for every \(n\),
$$\begin{aligned} \mathbf {f}^n = \left( \begin{array}{cccc} 1&1&1&1 \end{array} \right) : \end{aligned}$$

Another simple example, with a less obviously steady state

Exactly the same scenario obtains with
$$\begin{aligned} \varvec{\Phi } = \left( \begin{array}{cccc} 10 &{} 1 &{} 1 &{} 1\\ 1 &{} 10 &{} 1 &{} 1\\ 1 &{} 1 &{} 10 &{} 1\\ 1 &{} 1 &{} 1 &{} 10 \end{array} \right) \quad \text{ and }\quad \mathbf {\rho } = \left( \begin{array}{cccc} 1&1&1&1 \end{array} \right) \end{aligned}$$
This is because the pattern of ‘reciprocal heredity’ means that each trait-value receives a boost from all the others that exactly offsets the contribution that it makes to each of them. The bar chart illustrating the heritability matrix is

Varying reproduction-conduciveness

Consider the same heritability matrix, but with F1s twice as prolific in breeding as individuals with the other trait-values:
$$\begin{aligned} \varvec{\Phi } = \left( \begin{array}{cccc} 10 &{} 1 &{} 1 &{} 1\\ 1 &{} 10 &{} 1 &{} 1\\ 1 &{} 1 &{} 10 &{} 1\\ 1 &{} 1 &{} 1 &{} 10 \end{array} \right) \quad \text{ and }\quad \mathbf {\rho } = \left( \begin{array}{cccc} 2&1&1&1 \end{array} \right) \end{aligned}$$
We shall use a pie-chart to make the various \(\rho\)-values vivid:
The evolutionary picture is now as follows:

Note that a stable polymorphic equilibrium is reached very quickly, after only fifteen generations.

The reader might be surprised to learn how very little \(\rho\) matters when heritability is reciprocal.

Consider the last example changed only in one respect:
$$\begin{aligned}\mathbf {\rho } = \left( \begin{array}{cccc} 100&1&1&1 \end{array} \right) \end{aligned}$$
A polymorphic equilibrium is achieved even more rapidly, and—perhaps surprisingly—involves a non-trivial proportion of the much less prolific trait-values F2, F3 and F4:
Moreover, this picture persists unruffled even as one increases \(\rho _1\) by yet more orders of magnitude.
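One way to see this insensitivity is to iterate the next-generation formula to equilibrium for several values of \(\rho _1\). In this Python sketch the 500-generation horizon is an arbitrary but ample choice. By our own algebra on the normalized matrix, as \(\rho _1\) grows without bound the equilibrium tends to \(\left(\frac{10}{13},\frac{1}{13},\frac{1}{13},\frac{1}{13}\right)\), the make-up of an F1 parent's brood, so F2, F3 and F4 each retain roughly \(\frac{1}{13}\) or more.

```python
def normalize_rows(m):
    return [[x / sum(row) for x in row] for row in m]

def next_generation(f, rho, phi):
    n = len(f)
    raw = [sum(f[j] * rho[j] * phi[j][i] for j in range(n)) for i in range(n)]
    delta = sum(raw)
    return [x / delta for x in raw]

def equilibrium(f0, rho, phi, generations=500):
    """Iterate the next-generation formula for a fixed number of generations."""
    f = list(f0)
    for _ in range(generations):
        f = next_generation(f, rho, phi)
    return f

# Reciprocal heredity: diagonal 10, off-diagonal 1, before normalization.
phi = normalize_rows([[10, 1, 1, 1], [1, 10, 1, 1], [1, 1, 10, 1], [1, 1, 1, 10]])

results = {}
for rho1 in (1, 2, 100, 10**6):
    results[rho1] = equilibrium([0.25] * 4, [rho1, 1, 1, 1], phi)
    print(f"rho1 = {rho1:>7}: f = {[round(x, 4) for x in results[rho1]]}")
```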
Let us return to the first example of this subsection, and change it only by making F1s ‘breed true’. So we have the matrices
$$\begin{aligned} \varvec{\Phi } = \left( \begin{array}{cccc} 10 &{} 0 &{} 0 &{} 0\\ 1 &{} 10 &{} 1 &{} 1\\ 1 &{} 1 &{} 10 &{} 1\\ 1 &{} 1 &{} 1 &{} 10 \end{array} \right) \quad \text{ and }\quad \mathbf {\rho } = \left( \begin{array}{cccc} 2&1&1&1 \end{array} \right) \end{aligned}$$
The picture for the heritability matrix is

The effect is to confer an astonishing advantage on F1:

Moreover, even if we reduce the reproduction-conduciveness edge conferred by F1, so that
$$\begin{aligned} \rho = \left( \begin{array}{cccc} 1.1&1&1&1 \end{array} \right) , \end{aligned}$$
the trait-values F2, F3 and F4 are relegated to arbitrarily small frequencies over sufficiently many generations:


All the evolutionary scenarios considered thus far have involved heritability matrices conforming to the algebraic condition of Heritability. It is instructive to inquire what happens if we relax (or violate) this condition. What if the offspring of Fis are least likely to be Fis?

Let us return to equal \(\rho\)-values. Consider the following rather bizarre heritability matrix, in which the numbers 1, 2, 3, 4 are permuted row-by-row so that the diagonal entries are always 1s, representing a certain ‘minimization of true breeding’:
$$\begin{aligned} {\varvec{\Phi }} = \quad \left( \begin{array}{cccc} 1 &{} 2 &{} 3 &{} 4\\ 4 &{} 1 &{} 2 &{} 3\\ 3 &{} 4 &{} 1 &{} 2\\ 2 &{} 3 &{} 4 &{} 1 \end{array} \right) \end{aligned}$$
The picture is

With \(\mathbf {\rho } = \left( \begin{array}{cccc} 1&1&1&1 \end{array} \right)\) and equal initial frequencies, there is no evolutionary change! Indeed, no matter how varied the four initial frequencies, the same polymorphic equilibrium \(\mathbf {f} = \left( \begin{array}{cccc} \frac{1}{4}&\frac{1}{4}&\frac{1}{4}&\frac{1}{4} \end{array} \right)\) is reached after very few generations.
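Because every column of this matrix, like every row, sums to 10, the normalized matrix leaves equal frequencies unchanged when the \(\rho\)-values are equal. The Python sketch below (with arbitrarily chosen lopsided starting frequencies) checks that such starting points converge to that equilibrium within a few dozen generations.

```python
def normalize_rows(m):
    return [[x / sum(row) for x in row] for row in m]

def next_generation(f, rho, phi):
    n = len(f)
    raw = [sum(f[j] * rho[j] * phi[j][i] for j in range(n)) for i in range(n)]
    delta = sum(raw)
    return [x / delta for x in raw]

# Diagonal entries are all 1: offspring are least likely to share the parent's value.
phi = normalize_rows([[1, 2, 3, 4], [4, 1, 2, 3], [3, 4, 1, 2], [2, 3, 4, 1]])
rho = [1, 1, 1, 1]

finals = []
for f0 in ([0.7, 0.1, 0.1, 0.1], [0.01, 0.02, 0.03, 0.94]):
    f = list(f0)
    for _ in range(30):
        f = next_generation(f, rho, phi)
    finals.append(f)
    print(f0, "->", [round(x, 6) for x in f])
```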

Now consider making F1 in this scenario much more conventionally heritable (indeed: almost true breeding), say by having the entries of the first row changed thus:
$$\begin{aligned} \varvec{\Phi } = \left( \begin{array}{cccc} 10 &{} 0.1 &{} 0.1 &{} 0.1\\ 4 &{} 1 &{} 2 &{} 3\\ 3 &{} 4 &{} 1 &{} 2\\ 2 &{} 3 &{} 4 &{} 1 \end{array} \right) \end{aligned}$$
The picture of the heritability matrix is accordingly
The evolutionary outcome is another polymorphic equilibrium:

Moreover, even if we increase \(\rho _1\) by many orders of magnitude, much the same picture emerges, with the frequencies \(f_1\), \(f_2\), \(f_3\) and \(f_4\) only slightly reduced, and with an upper limit to the frequency \(f_1\).

Not so, however, if F1s breed true, so that the entries \(\phi ^1_2\), \(\phi ^1_3\) and \(\phi ^1_4\) are set to 0! For then we have, with the heritability matrix
$$\begin{aligned} \varvec{\Phi } = \left( \begin{array}{cccc} 1 &{} 0 &{} 0 &{} 0\\ 4 &{} 1 &{} 2 &{} 3\\ 3 &{} 4 &{} 1 &{} 2\\ 2 &{} 3 &{} 4 &{} 1 \end{array} \right) \end{aligned}$$
the rapid marginalization of the other trait-values:

And that is with \(\rho\)-values being equal!
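The marginalization in both true-breeding scenarios can be checked directly: an F1 parent contributes only F1 offspring, while every other row still contributes some F1s, so in both scenarios the share of F1 never falls. This Python sketch runs each scenario for 200 generations (an arbitrary horizon) from equal starting frequencies.

```python
def normalize_rows(m):
    return [[x / sum(row) for x in row] for row in m]

def next_generation(f, rho, phi):
    n = len(f)
    raw = [sum(f[j] * rho[j] * phi[j][i] for j in range(n)) for i in range(n)]
    delta = sum(raw)
    return [x / delta for x in raw]

scenarios = {
    "rho = (1.1, 1, 1, 1), F1 breeds true": (
        normalize_rows([[10, 0, 0, 0], [1, 10, 1, 1], [1, 1, 10, 1], [1, 1, 1, 10]]),
        [1.1, 1, 1, 1]),
    "equal rho, F1 breeds true, other rows permuted": (
        normalize_rows([[1, 0, 0, 0], [4, 1, 2, 3], [3, 4, 1, 2], [2, 3, 4, 1]]),
        [1, 1, 1, 1]),
}

f1_history = {}
for name, (phi, rho) in scenarios.items():
    f = [0.25] * 4
    history = [f[0]]  # frequency of F1, generation by generation
    for _ in range(200):
        f = next_generation(f, rho, phi)
        history.append(f[0])
    f1_history[name] = history
    print(f"{name}: f1 = {history[-1]:.6f} after 200 generations")
```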

A very special heritability matrix for the case of three trait-values

Let us look now at the case of three trait-values, rather than four. Here is a very special 3\(\times\)3 heritability matrix:
$$\begin{aligned} \varvec{\Phi } = \left( \begin{array}{cccc} 1 &{} 1 &{} 0\\ 1 &{} 2 &{} 1\\ 0 &{} 1 &{} 1\\ \end{array} \right) \end{aligned}$$
for which the picture is
This heritability matrix has some remarkable features. First, it induces the polymorphic equilibrium \(\mathbf {f}=\left( \begin{array}{cccc} 1&2&1 \end{array} \right)\) within one generation, for all \(\mathbf {f}^0\) of the form \(\left( \begin{array}{cccc} \alpha&\beta&\alpha \end{array} \right)\) and for all \(\rho\) of similar form. That is to say, it matters not what the initial frequency of F2 is (assuming the initial frequencies of F1 and of F3 are equal), and it matters not what the \(\rho\)-value of F2 is (assuming the \(\rho\)-values of F1 and of F3 are equal). The following two pictures will make this vivid. The first shows what happens with the initial frequency of F2 set at one hundredth of the frequency of F1 and of F3, and the second shows what happens when it is set at one hundred times that frequency.
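The one-generation claim can be confirmed algebraically: for \(\mathbf {f}^0=(\alpha ,\beta ,\alpha )\) and \(\rho =(\gamma ,\kappa ,\gamma )\), the unnormalized F2 term \(\alpha \gamma +\frac{\beta \kappa }{2}\) is exactly twice each of the F1 and F3 terms \(\frac{\alpha \gamma }{2}+\frac{\beta \kappa }{4}\). The Python sketch below checks this for randomly drawn values (arbitrary test data), then iterates the example \(\mathbf {f}=(1,10,100)\), \(\rho =(100,10,1)\) until successive generations agree.

```python
import random

def normalize_rows(m):
    return [[x / sum(row) for x in row] for row in m]

def next_generation(f, rho, phi):
    n = len(f)
    raw = [sum(f[j] * rho[j] * phi[j][i] for j in range(n)) for i in range(n)]
    delta = sum(raw)
    return [x / delta for x in raw]

phi = normalize_rows([[1, 1, 0], [1, 2, 1], [0, 1, 1]])
target = [0.25, 0.5, 0.25]

# Symmetric starting frequencies and rho-values; unnormalized inputs are fine,
# because the formula divides by delta and so only proportions matter.
rng = random.Random(42)
worst = 0.0
for _ in range(100):
    alpha, beta = rng.uniform(0.01, 100), rng.uniform(0.01, 100)
    gamma, kappa = rng.uniform(0.01, 100), rng.uniform(0.01, 100)
    f1 = next_generation([alpha, beta, alpha], [gamma, kappa, gamma], phi)
    worst = max(worst, max(abs(x - t) for x, t in zip(f1, target)))
print("largest deviation after one generation:", worst)

# The example f = (1, 10, 100), rho = (100, 10, 1).
f = [1, 10, 100]
for _ in range(100):
    previous, f = f, next_generation(f, [100, 10, 1], phi)
print("equilibrium:", [round(x, 4) for x in f])
```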
The second remarkable feature about this heritability matrix is that no matter how \(\rho\)-values are set, and no matter what the initial frequencies, a stable polymorphic equilibrium will be induced very rapidly. For example, with \(\mathbf {f}=\left( \begin{array}{cccc} 1&10&100 \end{array} \right)\) and with \(\rho =\left( \begin{array}{cccc} 100&10&1 \end{array} \right)\), the evolutionary scenario is

Note that our analysis thus far has been purely ‘phenomenological’, that is, innocent of any postulation of a mechanism of heredity. Nowhere have we mentioned the possibility of particulate genetics.15

How higher \(\rho\) can lower success

Our evolutionary model is very general. It allows us to consider some very unusual heritability matrices. Consider this one:
$$\begin{aligned} \varvec{\Phi } = \left( \begin{array}{cccc} 1 &{} 2 &{} 1\\ 1 &{} 2 &{} 1\\ 1 &{} 2 &{} 1\\ \end{array} \right) \end{aligned}$$

With this matrix, the stable polymorphic equilibrium of \(\mathbf {f}=\left( \begin{array}{cccc} 1&2&1 \end{array} \right)\) will be reached within a single generation, regardless of initial frequencies and regardless of \(\rho\)-values, since every parent type produces offspring in the same 1:2:1 proportions. So we see that a heritability matrix can render \(\rho\)-values ineffectual.
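With identical rows, every parent type has the same brood make-up, so the next generation is that common normalized row, \(\left(\frac{1}{4},\frac{1}{2},\frac{1}{4}\right)\), whatever \(\mathbf {f}\) and \(\rho\) are. This Python sketch checks it for a few randomly drawn (arbitrary) frequencies and \(\rho\)-values.

```python
import random

def normalize_rows(m):
    return [[x / sum(row) for x in row] for row in m]

def next_generation(f, rho, phi):
    n = len(f)
    raw = [sum(f[j] * rho[j] * phi[j][i] for j in range(n)) for i in range(n)]
    delta = sum(raw)
    return [x / delta for x in raw]

phi = normalize_rows([[1, 2, 1], [1, 2, 1], [1, 2, 1]])
rng = random.Random(7)

outputs = []
for _ in range(5):
    f0 = [rng.uniform(0.01, 1) for _ in range(3)]
    rho = [rng.uniform(0.1, 100) for _ in range(3)]
    outputs.append(next_generation(f0, rho, phi))
    print([round(x, 6) for x in outputs[-1]])
```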

Intriguingly, some heritability matrices can make \(\rho\)-values anti-correlate with reproductive success!16 Consider the heritability matrix
$$\begin{aligned} \varvec{\Phi } = \left( \begin{array}{cccc} 0 &{} 1 &{} 0\\ 1 &{} 2 &{} 1\\ 0 &{} 1 &{} 0\\ \end{array} \right) \end{aligned}$$
With initial frequencies equal, and \(\rho\)-values equal, the frequencies oscillate down to a stable polymorphic equilibrium:
But here’s the rub: increase the value of \(\rho _2\), so that \(\rho\) is, say, \(\left( \begin{array}{cccc} 1&100&1 \end{array} \right)\). Then the evolutionary scenario becomes
in which the frequency of F2 in the equilibrium has been reduced. Conversely, with \(\rho\) set to, say, \(\left( \begin{array}{cccc} 100&1&100 \end{array} \right)\), the evolutionary scenario becomes

in which the frequency of F2 in the equilibrium has been increased.
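The anti-correlation can be reproduced by iterating to equilibrium for the three \(\rho\)-settings just described. The 3000-generation horizon in this Python sketch is an arbitrary choice, generous enough for the slowest case. For symmetric starting frequencies (\(f_1=f_3\)), our own algebra gives closed forms for the equilibrium value of \(f_2\): \(\frac{2}{3}\) with equal \(\rho\)-values, \(\frac{48+\sqrt{2700}}{198}\approx 0.505\) with \(\rho =(1,100,1)\), and \(\frac{199.5-\sqrt{200.25}}{198}\approx 0.936\) with \(\rho =(100,1,100)\).

```python
def normalize_rows(m):
    return [[x / sum(row) for x in row] for row in m]

def next_generation(f, rho, phi):
    n = len(f)
    raw = [sum(f[j] * rho[j] * phi[j][i] for j in range(n)) for i in range(n)]
    delta = sum(raw)
    return [x / delta for x in raw]

phi = normalize_rows([[0, 1, 0], [1, 2, 1], [0, 1, 0]])
f0 = [1 / 3, 1 / 3, 1 / 3]
settings = {
    "equal rho": [1, 1, 1],
    "rho2 raised": [1, 100, 1],
    "rho1, rho3 raised": [100, 1, 100],
}

equilibria = {}
for name, rho in settings.items():
    f = list(f0)
    for _ in range(3000):
        f = next_generation(f, rho, phi)
    equilibria[name] = f
    print(f"{name:>18}: f2 = {f[1]:.4f}")
```

Raising \(\rho _2\) lowers the equilibrium share of F2, and raising the other two \(\rho\)-values raises it, as described above.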

Reflections on heritability

What has emerged here is a solid analytic basis for the view that approximation to true breeding is as important for the spread of a trait-value as is its conduciveness to reproduction. The two factors go hand-in-hand. Just as physics revealed the wave-particle duality of light, so too has conceptual analysis, with the aid of computerized exploration of a priori consequences, revealed the \(\varvec{\Phi }\)-\(\rho\) duality of Darwinian fitness. With so much attention paid to Mendelian particulate heredity, the \(\varvec{\Phi }\)-aspect of this duality has been under-appreciated, and only the \(\rho\)-aspect has been emphasized. But in the possibility space explored by the foregoing ‘phenomenological’, pre-theoretical, non-genetic account of Darwinian thinking, we are able to appreciate more fully the importance of heritability. Indeed, it may even be argued that the reason why so many traits (or trait-values) are heritable (in the stronger sense, involving row-by-row inequality conditions on diagonal and non-diagonal entries) is that those trait-values that were not heritable in the stronger sense were displaced, over evolutionary time, by those that were! Heritability matrices that make it the case that increases in \(\rho _i\) result in decreases in \(f_i\) are ones characterizing in-principle heritable traits F (where ‘heritable’ is to be taken in the weaker sense, as not involving those inequality conditions) for which the trait-value Fi in question disinherits itself, or writes itself out, over evolutionary time.

So we see that there is a deeper, but still Darwinian, reason why the initial, stronger statement of Heritability holds for the trait-values that we observe today. And this makes one realize that in the overall explanatory schema, Heritability (in the stronger sense) need not be taken as a starting point. Instead, something weaker could replace it, and the stronger statement of Heritability could then be derived from it (plus other assumptions already in place) as an intermediate conclusion. The weaker condition could be something like

For the \(n\) values of any polyvalent trait, there will be a reasonably stable \(n\times n\) heritability matrix.

This modest statement makes no requirement that the diagonal entry of any row of the heritability matrix should exceed every one of its non-diagonal entries (let alone that it should exceed their sum). But Darwinian reasoning would explain why something approximating this stronger requirement would eventually be met.
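The two row-by-row inequality conditions that the weaker statement drops can be written as simple predicates. The sketch below uses hypothetical function names and illustrative matrices of our own. It does not test the minimalist condition itself, since stability over time is not something a single matrix can exhibit.

```python
# Hypothetical helpers (names are ours, not the text's) for the two row-by-row
# inequality conditions of the stronger statement of Heritability.
# phi[i][j] is the expected share of Fj among the offspring of an Fi parent.


def diagonal_exceeds_each(phi):
    """True if every diagonal entry exceeds each non-diagonal entry in its row."""
    n = len(phi)
    return all(phi[i][i] > phi[i][j] for i in range(n) for j in range(n) if j != i)


def diagonal_exceeds_sum(phi):
    """True if every diagonal entry exceeds the sum of the non-diagonal entries in its row."""
    n = len(phi)
    return all(phi[i][i] > sum(phi[i][j] for j in range(n) if j != i) for i in range(n))


# Illustrative matrices (each row sums to 1).
near_true_breeding = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
weakly_heritable = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]
self_disinheriting = [[0.9, 0.1, 0.0], [0.5, 0.0, 0.5], [0.0, 0.1, 0.9]]

for name, phi in [("near_true_breeding", near_true_breeding),
                  ("weakly_heritable", weakly_heritable),
                  ("self_disinheriting", self_disinheriting)]:
    print(name, diagonal_exceeds_each(phi), diagonal_exceeds_sum(phi))
```

The second matrix meets the weaker inequality but not the stronger one; the third, in which F2 never breeds true, meets neither.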

In saying this, of course, we are still working on the assumption of asexual reproduction.

The probabilistic reasoning for (I) in the sexual case

If we wish to cover the case of sexual reproduction in general, where any (Fi, Fj)-mating is possible, then the heritability matrix will have to be generalized.17 It will be an \((n\times n\times n)\) matrix, whose entries \(\phi ^{ij}_k\) represent the expected proportion of Fk among the offspring of any (Fi, Fj)-mating. Moreover, that more general modelling would also have to invoke a matrix \(\varvec{\Psi }\), whose entries \(\psi _{ij}\) represent the proportion of (Fi, Fj)-matings among all matings within a generation. If mating is random, and if the sex-ratio is 50-50 across all trait-values, then \(\psi _{ij}\) will be \((f_i\cdot f_j)\). (The sum of all the entries in the matrix \(\varvec{\Psi }\) will be unity.)

If one ponders the meaning of heritability more closely, it seems right that one should countenance arbitrary heritability matrices, without imposing any requirement on the relative sizes of diagonal and non-diagonal entries. After all, when two different homozygotes mate, all their offspring are heterozygous; so the appropriate generalization18 of the kind of inequality condition bruited earlier (that for all \(j\!\ne \! i\), \(\phi ^i_i > \phi ^i_j\)) could only be honored in the breach. The minimalist interpretation of heritability is just this: that the rows of the heritability matrix be stable. That is, for parents of the respective given types Fi and Fj, the values \(\phi ^{i,j}_1 ,\ldots ,\phi ^{i,j}_n\) should be stable. It is owing to the ‘inner constitution’ of the parents (in the F-regard) that their potential brood should be expected to divide in those proportions across the types F1,..., Fn.
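The sexual-case generalization in footnote 18 can be checked mechanically. A minimal sketch, with a hypothetical helper `violations`, tests it against the normalized single-gene, two-allele matrix (indices 0, 1, 2 for \(aa\), \(aA\), \(AA\)). Only the matings of the two different homozygotes fail, which is the sense in which the condition could only be honored in the breach.

```python
from itertools import product

# Normalized single-gene, two-allele (Mendelian) heritability matrix,
# indices 0 = aa, 1 = aA, 2 = AA. MENDEL[i][j][k] = share of type k among
# the offspring of an (i, j)-mating.
MENDEL = [
    [[1, 0, 0], [0.5, 0.5, 0], [0, 1, 0]],
    [[0.5, 0.5, 0], [0.25, 0.5, 0.25], [0, 0.5, 0.5]],
    [[0, 1, 0], [0, 0.5, 0.5], [0, 0, 1]],
]


def violations(phi):
    """Return the (i, j) matings that break the condition of footnote 18:
    for every k distinct from i and j, phi[i][j][i] > phi[i][j][k]
    or phi[i][j][j] > phi[i][j][k]."""
    n = len(phi)
    bad = []
    for i, j in product(range(n), repeat=2):
        row = phi[i][j]
        if any(not (row[i] > row[k] or row[j] > row[k])
               for k in range(n) if k not in (i, j)):
            bad.append((i, j))
    return bad


print(violations(MENDEL))  # [(0, 2), (2, 0)]: the aa x AA and AA x aa matings
```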

What kind of stability is in question here? It is a commonplace in evolutionary theorizing that (what we are calling) \(\rho\)-values might alter in response to environmental changes. In particular, they might alter in response to changes of the frequencies \(f_i\) of the trait-values themselves. This is the familiar phenomenon of ‘frequency-dependent’ selection. But although \(\rho\)-values might display this kind of instability, there is no reason to think that the values within a heritability matrix might do so too. For a given environment, at least, a heritability matrix should enjoy constant entries. It encodes what a given pair of parental types can be expected to produce in the way of offspring-types, under those environmental conditions. Certainly, the matrix \(\varvec{\Psi }\) of mating-frequencies can be expected to be sensitive to the trait-value frequencies \(f_1,\ldots , f_n\) as these vary over time, even within a stable environment; but the same does not hold of the heritability matrix \(\varvec{\Phi }\).

Now, we have seen that the ‘truer breeding’ trait-values enjoy greater reproductive success, in the sense that they tend to increase in frequency over time. Hence one can expect there to be many such traits with respect to which one can ‘profile’ any complex organism such as a human being. Having in the make-up of an individual many trait-values that almost breed true makes it easy to descry ‘family resemblances’ between parents and their offspring. We search for such ‘agreements in trait-value’ as we find salient: hair-color, eye-color, shape of nose, etc. And upon finding agreement in almost any plenitude of reasonably variable traits, we shall judge the resemblance to be marked. (Note that for this purpose we always ignore the non-variable, true-breeding traits, such as number of arms!) Moreover, such marked resemblances as we discern are much more likely to obtain between parents and their offspring than between randomly chosen members of the population. This is why one so often encounters, as the very explication of the notion of heritability, the condition just stated.

It is not, however, a satisfactory explication of the notion of heritability. Rather, it is yet another explanandum for the reflective evolutionary theorist. For its explanation we need a more satisfactory explication of heritability, from which it will follow (given certain other assumptions) that it will indeed turn out to be the case that marked resemblances are much more likely to obtain between parents and their offspring than between randomly chosen members of the population. That more satisfactory explication of heritability has been offered above: it is merely a matter of stability of values within a heritability matrix. Evolution then takes care of all the rest, vouchsafing the stronger heritability of such trait-values as survive the selective process.

Suppose we are dealing with a polyvalent trait F, with trait-values F1,...,Fn. We have already foreshadowed the need for an \((n\times n)\)-matrix \(\varvec{\Psi }\) of mating-type frequencies. Each entry \(\psi _{ij}\) will represent the proportion, among all matings within the population, of matings between fathers of type Fi and mothers of type Fj (henceforth: matings of type (Fi, Fj)).

Let us assume that the sex-ratio is 50-50 within each type Fi. Let us also assume that mating is random, and that all matings within one generation take place simultaneously. The next generation is defined as consisting of the offspring of those matings.

We need also to generalize the interpretation of \(\rho\)-values. In this setting, \(\rho _{ij}\) will be the expected size of brood resulting from matings of type (Fi, Fj).

The general algebraic formula for the expected frequency of Fk-individuals in the next generation is
$$\begin{aligned} \frac{\sum _i\sum _j \rho _{ij}\cdot \psi _{ij}\cdot \phi ^{ij}_k}{\sum _l\sum _i\sum _j \rho _{ij}\cdot \psi _{ij}\cdot \phi ^{ij}_l} \end{aligned}$$
which, on the assumption of random mating (so that \(\psi _{ij}=f_i\cdot f_j\)), is equal to
$$\begin{aligned} \frac{\sum _i\sum _j \rho _{ij}\cdot f_i\cdot f_j\cdot \phi ^{ij}_k}{\sum _l\sum _i\sum _j \rho _{ij}\cdot f_i\cdot f_j\cdot \phi ^{ij}_l} \end{aligned}$$
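Two simplifications of the random-mating formula are worth recording. They rest on an assumption of ours: that every row of \(\varvec{\Phi }\) has been normalized so that \(\sum _l \phi ^{ij}_l = 1\). The denominator then reduces to the mean brood size per mating, and in the neutral case where every \(\rho _{ij}=1\) it equals 1, because the random-mating frequencies \(f_i\cdot f_j\) themselves sum to unity:
$$\begin{aligned} \sum _l\sum _i\sum _j \rho _{ij}\cdot f_i\cdot f_j\cdot \phi ^{ij}_l&= \sum _i\sum _j \rho _{ij}\cdot f_i\cdot f_j ,\\ \sum _i\sum _j f_i\cdot f_j&= \Big (\sum _i f_i\Big )\Big (\sum _j f_j\Big ) = 1 . \end{aligned}$$
So with all \(\rho _{ij}=1\) the expected frequency of Fk is simply \(\sum _i\sum _j f_i\cdot f_j\cdot \phi ^{ij}_k\): transmission alone, with no differential reproduction.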
Here is a special \((3\times 3\times 3)\) heritability matrix in pre-normal form, that is, with each row giving only the relative proportions of the offspring types, to be divided by the row’s sum. The reader should have no trouble identifying it as the heritability matrix for a single-gene, two-allele trait. Think of F1, F2 and F3 as \(aa\), \(aA\) and \(AA\) respectively:
$$\begin{aligned} \begin{array}{ccccc} (i,j) &{} k=1 &{} k=2 &{} k=3\\ (1,1) &{} 1 &{} 0 &{} 0\\ (1,2) &{} 1 &{} 1 &{} 0\\ (1,3) &{} 0 &{} 1 &{} 0\\ (2,1) &{} 1 &{} 1 &{} 0\\ (2,2) &{} 1 &{} 2 &{} 1\\ (2,3) &{} 0 &{} 1 &{} 1\\ (3,1) &{} 0 &{} 1 &{} 0\\ (3,2) &{} 0 &{} 1 &{} 1\\ (3,3) &{} 0 &{} 0 &{} 1 \end{array} \end{aligned}$$
An interesting evolutionary scenario is one where initially \(A\) is the only allele, so that \(AA\) (=F3) is the only type of individual. Imagine now that a single copy of \(A\) mutates to \(a\), so that there is one heterozygous individual (\(aA\)) in a population of, say, 1000. Thus the initial frequencies are in the proportions
$$\begin{aligned} \begin{array}{ccc} f_1 &{} f_2 &{} f_3\\ 0 &{} .001 &{} .999 \end{array} \end{aligned}$$

What will happen to allele \(a\)? Will it spread? Will it reach polymorphic equilibrium, in accordance with the Hardy-Weinberg law? Or will it displace the original allele \(A\) from the gene pool altogether? The answer depends on the ‘reproduction-conduciveness’ (or ‘expected brood-size’) values \(\rho _{ij}\).

For example, with the \(\rho\)-matrix set at
$$\begin{aligned} \begin{array}{cccc} &{} \text{F}_1 \text{-female } &{} \text{F}_2 \text{-female }&{} \text{F}_3 \text{-female }\\ \text{F}_1 \text{-male } &{} 2 &{} 2 &{} 2\\ \text{F}_2 \text{-male } &{} 2 &{} 4 &{} 1\\ \text{F}_3 \text{-male } &{} 2 &{} 1 &{} 1 \end{array} \end{aligned}$$
the evolution of frequencies of the three trait-values tends to a stable polymorphic equilibrium.
Likewise with the \(\rho\)-matrix set at
$$\begin{aligned} \begin{array}{cccc} &{} \mathbf{F}_1 \hbox {-female} &{} \mathbf{F}_2 \hbox {-female}&{} \mathbf{F}_3 \hbox {-female}\\ \mathbf{F}_1 \hbox {-male} &{} 3.78 &{} 1 &{} 1\\ \mathbf{F}_2 \hbox {-male} &{} 1 &{} 10 &{} 1\\ \mathbf{F}_3 \hbox {-male} &{} 1 &{} 1 &{} 1 \end{array} \end{aligned}$$
the frequencies again tend to a stable polymorphic equilibrium.
But make one apparently insignificant change to the top-left entry, from 3.78 to 3.79:
$$\begin{aligned} \begin{array}{cccc} &{} \mathbf{F}_1 \hbox {-female} &{} \mathbf{F}_2 \hbox {-female}&{} \mathbf{F}_3 \hbox {-female}\\ \mathbf{F}_1 \hbox {-male} &{} 3.79 &{} 1 &{} 1\\ \mathbf{F}_2 \hbox {-male} &{} 1 &{} 10 &{} 1\\ \mathbf{F}_3 \hbox {-male} &{} 1 &{} 1 &{} 1 \end{array} \end{aligned}$$
and suddenly the mutant allele \(a\) sweeps through the gene pool, driving allele \(A\) to extinction.
The extinction can be prevented by a similar tweak to a heritability value. If we simply change \(\phi ^{11}_3\) from its present value of 0 to the new value of 0.01, the picture reverses.

This is the kind of phase-transition (or ‘butterfly effect’) familiar to students of non-linear dynamical systems. We see that there is a range of quite feasible values in both the \(\rho\)-matrix and the heritability matrix \(\varvec{\Phi }\) that can variously result in polymorphic equilibria, or wholesale extinctions, upon the introduction of a new mutation. New mutant alleles can be co-opted so as to become ‘one on a team’, or they can take over like Attila the Hun, depending on how certain matrix entries are tweaked. Either way, the Darwinian lesson is underscored: the biological realm is in constant flux; and evolutionary trends are determined, most critically, by heritability and reproductive success.
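Readers who wish to explore these sexual-case scenarios themselves, in place of the spreadsheet the study relies on, could start from a minimal Python sketch of the random-mating recurrence. The assumptions are ours: each row of the pre-normal Mendelian matrix is divided by its sum; rows of a \(\rho\)-matrix index the male type and columns the female type; generations are discrete and non-overlapping. The names `PHI`, `RHO_A`, `RHO_B`, `RHO_C` and the helper functions are hypothetical, and `RHO_B` and `RHO_C` use the values 3.78 and 3.79 named in the text. We have not checked that this sketch reproduces the text's plots or the exact location of that threshold. The heritability tweak can be explored by editing `PHI[0][0]`, which holds \(\phi ^{11}_1, \phi ^{11}_2, \phi ^{11}_3\) at zero-based positions 0, 1, 2.

```python
# Minimal sketch of the sexual-case recurrence under random mating (psi_ij = f_i * f_j):
#   f'_k = sum_ij rho[i][j] f_i f_j phi[i][j][k] / sum_l sum_ij rho[i][j] f_i f_j phi[i][j][l]
# Types: 0 = F1 = aa, 1 = F2 = aA, 2 = F3 = AA.

# Pre-normal Mendelian matrix from the text: relative offspring counts per (i, j)-mating.
PRE_NORMAL = [
    [[1, 0, 0], [1, 1, 0], [0, 1, 0]],
    [[1, 1, 0], [1, 2, 1], [0, 1, 1]],
    [[0, 1, 0], [0, 1, 1], [0, 0, 1]],
]


def normalize(pre):
    """Divide each (i, j) row by its sum so that offspring shares add up to 1."""
    return [[[c / sum(row) for c in row] for row in rows] for rows in pre]


PHI = normalize(PRE_NORMAL)


def next_generation(f, rho, phi):
    """One generation: weight each mating by rho * f_i * f_j, then normalize."""
    n = len(f)
    raw = [
        sum(rho[i][j] * f[i] * f[j] * phi[i][j][k] for i in range(n) for j in range(n))
        for k in range(n)
    ]
    total = sum(raw)  # the triple sum in the denominator
    return [x / total for x in raw]


def simulate(f, rho, phi, generations):
    for _ in range(generations):
        f = next_generation(f, rho, phi)
    return f


F0 = [0.0, 0.001, 0.999]  # one aA mutant among 1000 individuals, the rest AA
RHO_A = [[2, 2, 2], [2, 4, 1], [2, 1, 1]]
RHO_B = [[3.78, 1, 1], [1, 10, 1], [1, 1, 1]]
RHO_C = [[3.79, 1, 1], [1, 10, 1], [1, 1, 1]]

if __name__ == "__main__":
    neutral = [[1] * 3 for _ in range(3)]
    # approx. [2.5e-07, 0.0009995, 0.99900025], i.e. p**2, 2pq, q**2 with p = 0.0005
    print(next_generation(F0, neutral, PHI))
    for name, rho in (("A", RHO_A), ("B", RHO_B), ("C", RHO_C)):
        print(name, [round(x, 4) for x in simulate(F0, rho, PHI, 5000)])
```

As a sanity check, with every \(\rho _{ij}=1\) one generation of the recurrence returns the Hardy-Weinberg proportions \(p^2, 2pq, q^2\) for the initial allele frequency \(p = 0.0005\). Near a threshold the approach to equilibrium can be slow, so the number of generations may need raising.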

An intriguing analogy between Newtonian mechanics and Darwinian evolutionary theory

Kepler’s three laws of planetary motion state certain regularities that he discovered in the observational data of Tycho Brahe. They privilege the ellipse as one conic section among the three kinds possible. (The other two are the parabola and the hyperbola.)

Kepler’s Laws are purely kinematic, and do not involve mention of mass or force. They concern only geometric and algebraic conditions on measurements of position and of time.

Newton’s explanation of Kepler’s laws did involve mention of mass and force. Newton’s laws of motion and universal law of gravitation have Kepler’s laws as consequences.

Darwin is to Mendel as Kepler is to Newton. Darwin’s explanatory schema, as has been shown above, involves no theoretical posits, and draws on regularities that are easy to find in the observational data. The explanatory schema is very general indeed, involving a notion of heritability whose exact explication is a delicate question. Even on what we have called the minimalist interpretation, however, we find that the Darwinian scheme delivers evolutionary patterns that we know are out there in Nature: stable polymorphic equilibria, as well as eventual extinction.

But the Darwinian schema leaves open what the mechanism of heredity in the biological realm is, just as Kepler’s laws left open what the underlying causes were of the observed regularities in planetary orbits. Mendelian genetics posits units of particulate heredity, and thereby supplies further constraints on the evolutionary possibilities that may be realized: it leads us to certain choices of heritability matrices. In the same way, Newtonian mechanics led us to the family of conics as the possible paths of heavenly bodies in the gravitational field of a sun, and to elliptical orbits for those bodies whose total energy (potential plus kinetic) is negative.
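The energy criterion can be spelled out with the standard two-body relations, a textbook result rather than part of the argument here. Take the sun as fixed with \(\mu = GM\), measure energy per unit mass of the orbiting body with potential energy zero at infinity, and write \(h\ne 0\) for the specific angular momentum:
$$\begin{aligned} \varepsilon = \frac{v^2}{2} - \frac{\mu }{r}, \qquad e = \sqrt{1 + \frac{2\varepsilon h^2}{\mu ^2}} . \end{aligned}$$
Hence \(\varepsilon < 0\) gives \(e < 1\) (an ellipse), \(\varepsilon = 0\) gives \(e = 1\) (a parabola), and \(\varepsilon > 0\) gives \(e > 1\) (a hyperbola).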

From observational to theoretical concepts

We have employed only a modest amount of the Eudoxan mathematics of proportions in arriving at our (intermediate) conclusion of Adaptive Evolution. The Darwinian argument, had it been available, would have been able to persuade the ancient Greeks. (They would need to have been furnished also with a modicum of Bernoullian combinatorics, the gist of which would have been entirely accessible to them.) The inference-template sketched above has dealt only with observable traits. But the very same template works just as well if we instantiate it with unobservable, or theoretical, traits. Such traits would include blood-type, catecholamine concentration in the brain, blood testosterone level, molecular-biologically-based immunities to pathogens, etc. These traits have been made accessible only by the scientific progress achieved since Darwin’s day. His argument-schema, however, applies to these theoretical traits with as much persuasive force as it does to observable traits. It is a truly universal explanatory schema: it applies to all traits, whether observable or not. But, remarkably, it does not itself involve the postulation of any theoretical entities or properties.

Such postulation becomes necessary (and also fruitful) only when one inquires after the mechanism of heredity. The historical path from Mendelian genes (units of particulate inheritance) to the Crick–Watson DNA-account of genes has uncovered many new theoretical traits of organisms, to which Darwin’s evolutionary schema naturally applies. By concentrating on these new applications, one comes to appreciate the recent shift in focus to the ‘gene-selectionist’ view of evolutionary change, championed by G. C. Williams and Richard Dawkins (see Williams (1966) and Dawkins (1976)). Whatever the ‘ultimate’ traits of choice are, there one will see the (until now, unmentioned) ‘forces of selection’ at work.19

The author notes that no part of this explanation of the universal applicability of the Darwinian scheme has had recourse to the notion of a ‘force’ of ‘natural selection’, or to the cognate notion of a ‘unit’ of selection.


  1.

    We do not claim any great originality for this idea. That Darwin employed the hypothetico-deductive method is already clear from Darwin’s own accounts, and from modern commentaries on Darwin such as Ghiselin (1969). (See especially pp. 64–5 of the latter work.) What we do claim as original in the present treatment, however, is its detailed rigor in drawing out the deductive consequences from certain hypothetical assumptions (about heritability and reproduction-conduciveness). The combinatorial mathematics, implemented on a spreadsheet, would have been unfeasible for (even if not foreign to) any contemporary of Darwin; and it delivers qualitative insights into evolutionary processes that can be not just surprising, but quite startling.

  2.

    For an account of the contemporary study of consistency strengths (an area known as Reverse Mathematics, founded by Harvey Friedman) see the definitive book Simpson (1999). Also useful in this regard is Burgess (2005), especially Table E: Twenty Milestones on the Fundamental Series, at pp. 220–1.

  3.

    Although Walsh (2012) describes Matthen and Ariew (2009) as ‘argu[ing] that the changes in the trait structure of a population, identified as selection, are simple “analytic consequences” of the differential survival and reproduction of individuals’, it should be noted that the latter paper makes no claims of analyticity at all. Matthen and Ariew themselves use only the adjective ‘mathematical’ when formulating their view.

  4.

    Cf. Millstein (2006), at p. 630, where she speaks of the ‘outcome of selection’ as ‘the change in gene or genotype frequencies from one generation to the next’; and at p. 641, where she writes

    Definitions of evolution differ, but one common definition (at least among population geneticists) is change in gene frequencies from one generation to the next.

    We need to remind ourselves that our thought-experiment involves being innocent of genetics. We would therefore not be much helped by adopting today’s population-geneticist’s understanding of what evolutionary change consists in. We submit that Darwin and his contemporaries had a perfectly workable understanding of adaptive evolutionary change, according to which it is change in trait-value frequencies. It is the plausibility of Darwin’s theory of evolution, on that understanding of evolutionary change, that we are seeking to enhance, independently of subsequent theoretical developments within the neo-Darwinian synthesis. This is so even though—and because—those developments have tended, on the whole, to make Darwinism even more plausible.

  5.

    This is a theme well developed in Putnam (1971).

  6.

    Note that we are simplifying here by not distinguishing the case where only one parent has F from the case where both parents have F. Nothing of importance will turn on this.

  7.

    The comparison here is between two conditional probabilities with different conditions. The requirement is that the conditional probability

    \(p\)(F(Z)\(\mid\)(F(X) or F(Y)) and X and Y begat Z)

    exceed the conditional probability

    \(p\)(F(Z\('\))\(\mid\)(not-F(X\('\)) and not-F(Y\('\))) and X\('\) and Y\('\) begat Z\('\)).

  8.

    The modifier ‘highly’ here can be understood easily with reference to increased differences between the two kinds of conditional probabilities just mentioned.

  9.

    One could, in principle, make this probability a function not only of the individual \(\iota\) but also of the age of individual \(\iota\). Thus \(p_\iota (k,t)\) would be the probability that the individual \(\iota\) has exactly \(k\) offspring by age \(t\). Making the function \(p_\iota\) age-dependent would enable one to address the interesting case of different generation times discussed in Godfrey-Smith (2007). This further layer of mathematical complexity is unnecessary, however, for the basic task of regimentation that we have set ourselves. Carrying out that task even within the idealization of the discrete-generation model is sufficiently challenging.

  10.

    Note that we avoid the phrases ‘fitness enhancing’ and ‘fitter than’.

  11.

    The requirement here of nomological possibility is needed in order to take care of cases where respective values of two different phenotypic traits covary, as can happen, say, in cases where a single gene has pleiotropic phenotypic effects. The specific values of the two traits then come together, or not at all. They cannot be ‘uncoupled’.

  12.

    See Tennant (2010), Tennant (2008b), Tennant (2008a).

  13.

    On this point, cf. Sober (2001), at p. 310.

  14.

    Note that we can limit ourselves here to mentioning only reproduction-conduciveness, rather than reproduction-conduciveness-cum-heritability. This is because with a trait of the form ‘has such-and-such an allele at genetic locus so-and-so’, the heritability factor stabilizes at \(\frac{1}{2}\), whence one can ‘divide through’ all reproduction-conduciveness-cum-heritability values, so as to be able to deal only with reproduction-conduciveness values.

  15.

    The biologically informed reader will have noticed, however, that the special heritability matrix under consideration here characterizes (in the case of sexual as opposed to asexual reproduction) a two-allele, single-gene trait with F1 = aa, F2 = aA, and F3 = AA, under conditions of random mating within types, but not mating across types. It could be called a Mendelian matrix.

  16.

    This is even more surprising than the claims, in Earnshaw-Whyte (2012) at p. 398, that ‘[Evolution by natural selection] can proceed even where there is no heredity’ and ‘variation in heritability can drive evolutionary change’.

  17.

    It is extremely important to deal with the sexual case when making general qualitative claims about evolutionary processes. Tennant (1999) criticized Skyrms (1996) for drawing incorrect qualitative generalizations from a consideration of only (what amounted to) the case of asexual reproduction.

  18.

    The appropriate generalization, for the case of sexual reproduction involving all possible (Fi, Fj)-matings, would be:

    for all \(k\) distinct from both \(i\) and \(j\), either \(\phi ^{i,j}_i > \phi ^{i,j}_k\) or \(\phi ^{i,j}_j > \phi ^{i,j}_k\).

  19.

    As observed in footnote 14, one advantage enjoyed by gene-selectionism is that one can concentrate on reproduction-conduciveness rather than reproduction-conduciveness-cum-heritability. Single alleles have constant heritability (barring the occasional mutation). Polygenic phenotypic traits, however (of the kind that most observable traits tend to be), are both variable and more variable in their heritabilities.



This study had its origins in an Advanced Philosophy of Science course that the author taught in the Spring Term of 2009, devoted in large part to analyzing the hypothetico-deductive structure of explanations in Newtonian and Darwinian science. Earlier versions of the present study have been available as downloads from the public-domain teaching webpage The author is grateful to the students in that class, who were willing to embark on an exploration of issues that had no settled outcome guaranteed in advance, and that made considerable demands on their attention in class—for the mathematics in this paper was custom-made, and not drawn from any published sources. The author is grateful also to Elliott Sober and an anonymous referee, for helpful comments, and to the Editor, both for helpful suggestions on overall structure and for eliciting mention of how certain evolutionary phenomena would fit into the general picture on offer here. All remaining errors and oversights are the author’s sole responsibility.


  1. Brunnander B (2007) What is natural selection? Biol Philos 22(2):231–246
  2. Burgess JP (2005) Fixing Frege. Princeton University Press, Princeton
  3. Darwin C (1859) On the origin of species. John Murray, London
  4. Dawkins R (1976) The selfish gene. Oxford University Press, Oxford
  5. Earnshaw-Whyte E (2012) Increasingly radical claims about heredity and fitness. Philos Sci 79(3):396–412
  6. Ghiselin MT (1969) The triumph of the Darwinian method. University of California Press, Berkeley
  7. Godfrey-Smith P (2007) Conditions for evolution by natural selection. J Philos 104:498–516
  8. Matthen M, Ariew A (2009) Selection and causation. Philos Sci 76(2):201–224
  9. Millstein RL (2006) Natural selection as a population-level causal process. Br J Philos Sci 57:627–653
  10. Putnam H (1971) Philosophy of logic. Harper and Row, New York
  11. Schilcher F, Tennant N (1984) Philosophy, evolution and human nature. Routledge & Kegan Paul
  12. Simpson SG (1999) Subsystems of second order arithmetic. Perspectives in mathematical logic. Springer, Berlin
  13. Skyrms B (1996) Evolution of the social contract. Cambridge University Press, Cambridge
  14. Sober E (2001) The two faces of fitness. In: Singh RS, Krimbas CB, Paul DB, Beatty J (eds) Thinking about evolution: historical, philosophical, and political perspectives, vol 2. Cambridge University Press, Cambridge, pp 309–321
  15. Tennant N (1999) Sex and the evolution of fair-dealing. Philos Sci 66:391–414
  16. Tennant N (2008a) The logical structure of scientific explanation and prediction: simple pendula with small oscillations in a uniform gravitational field (Unpublished typescript)
  17. Tennant N (2008b) The logical structure of scientific explanation and prediction: projectiles in a uniform gravitational field (Unpublished typescript)
  18. Tennant N (2010) The logical structure of scientific explanation and prediction: planetary orbits in a Sun’s gravitational field. Studia Logica 95:207–232
  19. Walsh DM (2012) The struggle for life and the conditions of existence. In: Brinkworth MH, Weinert F (eds) Evolution 2.0: implications of Darwinism in philosophy and the social and natural sciences. Springer, Heidelberg
  20. Williams GC (1966) Adaptation and natural selection. Princeton University Press, Princeton

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. Department of Philosophy, The Ohio State University, Columbus, USA
