## Abstract

The adequacy of Elliott Sober’s analogy between classical mechanics and evolutionary theory—according to which both theories explain via a zero-force law and a set of forces that alter the zero-force state—has been criticized from various points of view. I focus here on McShea and Brandon’s claim that drift shouldn’t be considered a force because it is not directional. I argue that there are a number of different theses that could be meant by this, and show that one of those theses—the idea that drift cannot bias populations to be taken somewhere in the evolutionary space from one generation to the next—is actually false. Not only has this thesis been implicitly assumed in the discussion of the force analogy thus far, but it is also commonly found in a wider range of philosophical and biological texts. I argue that correcting this view, and the usual images associated with it, will thereby bring heuristic benefits that impact the force analogy discussion, but that also go beyond it.

## Notes

The same uses of language can be found in more recent publications, as can be seen, for example, in Gillespie's textbook, where he states that "[t]he Hardy–Weinberg law describes the equilibrium state of a single locus in a randomly mating diploid population that is free of other *evolutionary forces*, such as mutation, migration, and genetic drift" (Gillespie 1998, p. 11, emph. added), or in Hartl and Clark's, as they claim that "[t]o deduce the [Hardy–Weinberg] genotype frequencies under random mating, additional assumptions are needed. First, the allele frequencies should not change from one generation to the next because of systematic *evolutionary forces*, the most important of which are mutation, migration, and natural selection" (Hartl and Clark 2007, p. 48, emph. added). For more examples, see Pence (2016).

There are other ways in which an analogy between these two theories has been made. Darwin himself made some remarks in that direction (Darwin 1872, pp. 63, 421). For a more recent example (and curiously, partly in response to Sober 2011), Díez and Lorenzano (2013) argue that they are analogous in the aprioricity status of their laws. Other possible ways of making the analogy can be seen in Williams (1980). I will focus here on Sober's (1984) presentation.

Of course I do not mean here an evolutionary factor for which one cannot make directional probabilistic predictions at all. For instance, utilizing drift-only models, one can obviously predict that, for some particular directional event *e* (say, the event that an allele's frequency will increase in the next generation), P(*e*) = *x*. The point is that, with drift-only models, and considering a specific event-type E (short-term directional predictions, see below), P(*e*) is never high enough to justify the belief that *e* will actually take place.

That is not to say that evolutionary theory consists solely of population genetics. I would argue that this is not true even for microevolutionary theory (see Roffe and Ginnobili, in press).

This is not the only noteworthy aspect of Sober's analogy. Recently, two authors have claimed that "[t]here are (at least) two different ways in which [classical mechanics and evolutionary theory] may be analogous: (1) Evolutionary forces are like Newtonian forces in the way that they are used to construct mathematical models of the evolution of a system in time. (2) Evolutionary forces are like Newtonian forces in being causes of the temporal evolution of the system" (Hitchcock and Velasco 2014). Both of these aspects appear clearly in the quote below. My treatment here will be more akin to the first perspective, as I will study genetic drift, its directionality, and its similarity with the other factors only *as they are modeled by theories like population genetics*. In other words, my concerns will be mainly metatheoretical/epistemological, not metaphysical.

Perhaps one should include some other ones, such as meiotic drive or genetic draft (Skipper 2006); I only mention these four for simplicity.

The point had already been made by Matthen and Ariew (2002, p. 61). Still, it was McShea and Brandon who fully fleshed out the consequences of this thesis; so, my focus is on their presentation.

The particle model was originally introduced to show that a non-directional process at one level can have directional effects at the next level (e.g. an increase of the variance at the level of ensembles of populations)—see McShea and Brandon (2010), chapter 2. Nevertheless, I believe it can be fruitfully used to illustrate the authors' conception about drift (the particles themselves can be thought of, according to them, as drifting populations).

A gamete pool is, in most cases, an abstract entity or an idealization, not a spatiotemporally existing one. However, this does not matter. All that is required is that population genetic models do represent generational transitions in terms of those kinds of (statistical, if not biological) populations. That will be enough to make my point that, *according to the usual models*, evolution by drift is a biased process (see below).

Pence (2016) has argued, however, that classical mechanics does contain stochastic forces as well, such as Brownian motion, and that the theories are analogous even in this sense. I will not go into this discussion here.

Of course, with a fair coin, one could also predict that, in a long series of throws, it will land heads approximately half the time and tails about half the time. But that would not be predicting a direction for the coin toss, but the probabilities that the coin goes in different directions.

The question of the conditions under which a probabilistic prediction is reliable is an important, and very difficult, one. It seems that they depend, at least partly, on the pragmatic context. For example, some have argued that the threshold of reliability should be set as a result of a balance of the risk/cost of getting a false positive vs. the risk/cost of getting a false negative (Rudner 1953). In the context at hand, it seems that being better than a coin toss is at least a necessary condition for reliability. Therefore, I will take having a probability greater than 0.5 as a necessary condition.

A similar point had been made by Stephens, who claims that drift can be thought of as having a direction towards the reduction of homozygosity (Stephens 2004). Brandon's response to this is to claim that, while it is true that drift tends to fix alleles, this is not a directional prediction since it does not say which of the alleles will end up fixing (Brandon 2006, p. 325). For the thesis presented above, this response is not available, since the fixation of a specific allele is being predicted.

At least in the long-term (i.e. remote generation) sense. It would still be debatable whether in the short-term (i.e. from one generation to the next), a direction could be predicted reliably.

There is a possible alternative reading of McShea and Brandon's response, according to which the absence of absorbing barriers simply points to the impossibility of making deterministic predictions (i.e. absorbing barriers are what one *would* need in order to make them), going back, then, to the first thesis. Even if that was really their intention, it does not matter for my purposes. All I contend is that the second thesis is a different possible interpretation of their claims. Furthermore, my argument will concern neither the first nor the second interpretations mentioned so far, but a third one detailed in what follows.

It is interesting to note that the qualification "at least in the simplest models" was added to the final version, but was absent in a previous draft of this essay (see Pence 2012, p. 7). This is not to criticize the author (as his work should be evaluated only by its final version), but I think it illustrates the tendency to think in these terms. Another point worth mentioning is that it is not clear what these "simplest models" are. The simplest model that I can think of (at least the simplest one that is minimally useful or realistic) is the particle model, for which, as I will show, Pence's claim is false.

In a previous work (Roffe 2016, which curiously utilized the same image) I argued that, to correct that image, one must make the field "tilted" like a gabled roof, so that the drunk person has a greater chance of walking one way or the other depending on where she is in the space (for the reason, see below).

An example might help clarify what I mean by "short term" and "long term". Take a haploid population that has a frequency of 0.8 for an allele *A* at generation *g*_{n}. Saying that in the limit (after an infinite number of generations have passed) *A* will increase in frequency (it will actually become fixed with probability 0.8) is not the same as saying that *in generation g*_{n+1} (the immediate next generation), *A* will increase in frequency. I label the first a "long term" prediction, since it tells us about what will happen in some (possibly remote) generation; "short term" effects deal with the immediate next generation to the one being considered (for simplicity's sake, I am assuming generations to be discrete and non-overlapping). The distinction is important because many take the first prediction to be derivable from population genetics theory, but not the second one.

What is not wrong is the (perhaps confusingly similar) claim that, if there is no selection, mutation, etc., then every *particular* allele has the same probability of being sampled in every generation, but not any allele *type*.

Some qualifications must be made here; see the “Mathematical appendix” section.

For the way in which the biases must change according to the number of points, see the “Mathematical appendix” section below.

Perhaps there are other reasons for not doing so. I actually find McShea and Brandon’s “drift as constitutive” argument more adequate (see Roffe 2016). The point is only that drift shouldn’t be discarded as a force on the basis that it can *never* (or almost never) bias populations one way or the other.

Or, in the original context of this quote, for the ZFEL: "But the ZFEL arises from an *unbiased* random process, and yet it predicts directional change: increasing diversity and complexity. How is this possible?" (McShea and Brandon 2010, p. 14).

Notice also that in Futuyma’s quote, the population starts with a frequency of *p* = 0.5, and the problem arises when he extends the result he obtains from the first generation to the second.

Alternatively, this can be thought of as taking a sample of two alleles from a gamete population of 4 alleles.

Unlike McShea and Brandon's ZFEL (see their 2010), which predicts these effects only at the next level (the level of ensembles of populations).

Or, again, as Pence (2016) argues with Brownian motion as an example, perhaps they are. But this is beside the point here; I am conceding this for the sake of argument.

I thank Bjørn Kjos–Hanssen for his help with this part of the proof.

## References

Barrett M, Clatterbuck H, Goldsby M et al (2012) Puzzles for ZFEL, McShea and Brandon’s zero force evolutionary law. Biol Philos 27:723–735

Beatty J (1984) Chance and natural selection. Philos Sci 51:183–211

Berwick RC, Chomsky N (2015) Why only us: language and evolution. MIT Press, Cambridge

Brandon RN (2006) The principle of drift: biology’s first law. J Philos 103:319–335

Brandon RN, McShea DW (2012) Four solutions for four puzzles. Biol Philos 27:737–744

Crow JF, Kimura M (1970) An introduction to population genetics theory. The Blackburn Press, New Jersey

Darwin C (1872) On the origin of species by means of natural selection, 6th edn. John Murray, London

Díez J, Lorenzano P (2013) Who got what wrong? Fodor and Piattelli on Darwin: guiding principles and explanatory models in natural selection. Erkenntnis 78:1143–1175

Earnshaw E (2015) Evolutionary forces and the Hardy–Weinberg equilibrium. Biol Philos 30:423–437

Feller W (1971) An introduction to probability theory and its applications. Wiley, New York

Futuyma DJ (1986) Evolutionary biology, 2nd edn. Sinauer Associates, Sunderland

Gillespie JH (1998) Population genetics: a concise guide. Johns Hopkins University Press, Baltimore

Hartl DL, Clark AG (2007) Principles of population genetics, 4th edn. Sinauer Associates, Sunderland

Hitchcock C, Velasco JD (2014) Evolutionary and Newtonian forces. Open Access J Philos 1:39–77

Kuhn TS (1970) The structure of scientific revolutions, 2nd edn. The University of Chicago Press, Chicago

Matthen M, Ariew A (2002) Two ways of thinking about fitness and natural selection. J Philos 99:55–83

McShea D, Brandon R (2010) Biology’s first law. The University of Chicago Press, Chicago

Pence CH (2012) It’s okay to call genetic drift a “Force”. Available at: http://philsci-archive.pitt.edu/9256/

Pence CH (2016) Is genetic drift a force? Synthese. doi:10.1007/s11229-016-1031-2

Roffe AJ (2016) La genética de poblaciones y las fuerzas evolutivas. Undergraduate dissertation, Universidad de Buenos Aires

Roffe AJ, Ginnobili S (in press) ¿Son los genetistas de poblaciones inductivistas estrechos? Scientiae Studia 15:2

Rudner R (1953) Value judgments in the acceptance of theories. In: Frank P (ed) The validation of scientific theories. The Beacon Press, Boston, pp 24–28

Skipper RA (2006) Stochastic evolutionary dynamics: drift versus draft. Philos Sci 73:655–665

Sober E (1984) The nature of selection: evolutionary theory in philosophical focus. University of Chicago Press, Chicago

Sober E (2011) A priori causal models of natural selection. Australas J Philos 89:571–589

Stephens C (2004) Selection, drift, and the “forces” of evolution. Philos Sci 71:550–570

Williams MB (1980) Similarities and differences between evolutionary theory and the theories of physics. PSA Proc Bienn Meet Philos Sci Assoc 2:385–396

## Acknowledgements

This work has been funded by the research projects PICT-2014 No. 1741 (ANPCyT, Argentina), PUNQ 1401/15 (National University of Quilmes, Argentina) and UNTREF 32/15 255 (Universidad Tres de Febrero, Argentina). I wish to thank the members of the philosophy of biology group directed by Santiago Ginnobili; I also thank Charles Pence, and an anonymous referee for helpful comments and criticisms on previous versions.

## Appendix

### Mathematical appendix

In this appendix, a number of clarifications are introduced regarding the directional effects mentioned above. I first distinguish between general and particular effects (predictable in every case, or only in a limited subset). The second subsection analyzes whether the biases mentioned can be said to have any general direction. Lastly, in the third subsection, McShea and Brandon’s idea that drift’s magnitude can be predicted while its direction cannot is re-examined.

#### General and particular directional effects

It is clear that, in each particular case, given enough information, we can predict the probability that a population goes “right” or “left” (i.e. an allele increases or decreases in frequency) in the evolutionary space. But the discussion above suggests much more: that we can predict that, *in general*, drift will bias populations to move towards an increase of the most frequent allele, in every generation. This also seems like an intuitive result: if a population with two kinds of things has more of one, then we should expect deviations from the expected value to occur in that one’s direction. The next subsection examines in greater detail whether drift can be said to have these general directional effects.

#### General directional effects

This subsection examines the generality of the claim that, in every generation, drift will tend to move populations towards an increase of the most frequent allele more than the reverse. Unfortunately, for various reasons, this is not true in general. First, while we can treat gamete sampling as being *with replacement*, parental sampling is a process *without replacement*. Now, for this second type of sampling, there are counterexamples to the claim made above. For instance, consider a haploid population with two alleles (*A* and *a*) in one *locus*. If population size at birth is *N* = 9, the frequency of allele *A* equals 2/3 (there are 6 *A* individuals, and so *A* is the most frequent allele), and the sample size (adult population size) is *n* = 6, then the expected number of *A* individuals in the adult population is 4. However, the probability that allele frequency increases (that is, that the sample contains either 5 or 6 *A* individuals) is smaller than the probability that it decreases (that it contains all 3 *a* individuals in it)—the probabilities are 0.226 and 0.238 respectively.
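The figures in this counterexample can be checked with a few lines of exact arithmetic. The following is only a sketch: the helper name `hypergeom_pmf` and the variable names are illustrative choices, not part of the original argument, and sampling without replacement is modeled with the standard hypergeometric distribution.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k type-A individuals in a sample of n drawn without
    replacement from N individuals, K of which are type A)."""
    if k < 0 or k > K or n - k < 0 or n - k > N - K:
        return Fraction(0)
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

# Values from the example in the text: 9 individuals at birth, 6 of them A,
# and 6 surviving adults.
N, K, n = 9, 6, 6
expected = Fraction(n * K, N)  # expected number of A adults

# "Increase" means more A adults than expected, "decrease" means fewer.
p_increase = sum(hypergeom_pmf(k, N, K, n) for k in range(n + 1) if k > expected)
p_decrease = sum(hypergeom_pmf(k, N, K, n) for k in range(n + 1) if k < expected)

print("expected number of A:", expected)
print("P(increase) =", p_increase, "~", round(float(p_increase), 3))
print("P(decrease) =", p_decrease, "~", round(float(p_decrease), 3))
```

Using `Fraction` keeps the comparison exact, so the small difference between the two probabilities is not an artifact of floating-point rounding.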

The question that arises is, then, whether the result holds for sampling processes *with* replacement (like gamete sampling, or parental sampling when populations are very large). The answer is that this also isn’t the case. Take *p* = 0.51 and *n* = 2, for example. What is happening here is that the expected value is not a possible sample value, so a frequency of 0.5 in the sample (i.e. when the sample contains one individual of each kind) will count as a case where the frequency of *A* decreased. This problem arises because the expected number of *A* individuals in the sample (*np* = 1.02) is close to, but not exactly, 1. It might be argued that this is merely an effect of the “incommensurability” between the population and sample sizes, an artifact of the choice of sample size, and that it doesn’t reflect drift’s “intrinsic” direction.
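For the with-replacement case, the same kind of check can be sketched with a binomial model. The function name `binom_pmf` is an illustrative assumption; the values *p* = 0.51 and *n* = 2 are the ones used in the text.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k copies of A in n draws with replacement)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 2, 0.51

# The only possible sample frequencies are 0, 0.5 and 1. Since 0.5 < 0.51,
# a sample with one allele of each kind counts as a decrease.
p_increase = sum(binom_pmf(k, n, p) for k in range(n + 1) if k / n > p)
p_decrease = sum(binom_pmf(k, n, p) for k in range(n + 1) if k / n < p)

print("P(increase) =", round(p_increase, 4))
print("P(decrease) =", round(p_decrease, 4))
```

Here an increase requires both sampled alleles to be *A*, so its probability is just 0.51 squared, and everything else counts as a decrease.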

These kinds of cases can be eliminated by putting some mathematical restrictions in place, for example, by requiring that \(np \in \mathbb{Z}\). With this restriction, the problem of the general direction can now be analyzed as follows. Call \(\hat{p}\) the sample frequency of allele *A*. What we would need to establish mathematically to prove our general claim is that “If *p* > 0.5, then \(P(\hat{p} > p) > P(\hat{p} < p)\)”. This is, of course, equivalent to claiming that “If *p* > 0.5, then \(P(\hat{p} \ge p) > P(\hat{p} \le p)\)”. In a sampling process with replacement, this translates to:

\[\sum\limits_{k = np + 1}^{n} \binom{n}{k} p^{k} (1 - p)^{n - k} > \sum\limits_{k = 0}^{np - 1} \binom{n}{k} p^{k} (1 - p)^{n - k}\]

Even though this would be an interesting result, even from a purely mathematical point of view, I haven’t found any mention of it in the mathematical (or biological) literature. And, probably due to my limited abilities in mathematics, I haven’t been able to prove it (nor refute it) myself, at least not generally. However, what I have been able to do is to get certain partial results that I think may help illuminate how drift is working in these kinds of models.
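Short of a proof, the general claim can at least be explored by brute force. The sketch below assumes binomial sampling (with replacement) and the restriction that *np* is an integer. It searches all sample sizes up to a bound for a case with 0.5 < *p* < 1 in which \(P(\hat{p} > p) > P(\hat{p} < p)\) fails. The function names and the search bound are illustrative choices, and a search that finds nothing is of course not a proof.

```python
from fractions import Fraction
from math import comb

def tail_probs(n, a):
    """Binomial sample of size n with p = a/n, so that np = a is an integer.
    Returns the exact pair (P(p_hat > p), P(p_hat < p))."""
    p = Fraction(a, n)
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    return sum(pmf[a + 1:]), sum(pmf[:a])

def first_counterexample(n_max):
    """Return the first (n, a) with 0.5 < a/n < 1 for which the claim fails,
    or None if no such pair exists for sample sizes up to n_max."""
    for n in range(2, n_max + 1):
        for a in range(n // 2 + 1, n):
            up, down = tail_probs(n, a)
            if not up > down:
                return (n, a)
    return None

print(first_counterexample(40))
```

Exact fractions are used because, for *p* close to 0.5, the two tail probabilities differ only slightly and floating-point error could mask the sign of the difference.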

For example, it is possible to show that the result holds for the case where \(p = 1 - (1/n)\). This case is interesting because it is one of the most counter-intuitive ones. That is, if in a population of 1000 alleles (gametes) there are 990 *A* ones, and a sample of 100 individuals is taken with replacement, then what the result claims is that the probability of obtaining 100 *A* individuals is higher than *the sum* of the probabilities that 0, 1, 2,…, 98 *A* alleles are obtained.
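The numbers behind this example can be computed directly. The sketch below assumes a binomial model for sampling with replacement; the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

n = 100
p = Fraction(990, 1000)  # 990 A alleles among 1000 gametes
assert p == 1 - Fraction(1, n)  # this is the case p = 1 - 1/n

# Probability that all 100 sampled alleles are A (the frequency increases).
p_all_A = p**n

# Probability that 98 or fewer sampled alleles are A (the frequency decreases).
p_at_most_98 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n - 1))

print("P(100 A)       ~", round(float(p_all_A), 3))
print("P(at most 98 A) ~", round(float(p_at_most_98), 3))
```

The remaining probability mass sits on exactly 99 *A* alleles, the outcome in which the sample frequency equals the starting frequency.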

**Theorem 1**

*If* \(p = 1 - \frac{1}{n}\)*, then* \(P(\hat{p} > p) > P(\hat{p} < p)\)

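Formally, what needs to be established is that:

\[\binom{n}{n} p^{n} > \sum\limits_{k = 0}^{n - 2} \binom{n}{k} p^{k} (1 - p)^{n - k}\]

Since \(\binom{n}{n} = 1\), and replacing *p* with the assumed value, this is equivalent to:

\[\left( \frac{n - 1}{n} \right)^{n} > \sum\limits_{k = 0}^{n - 2} \binom{n}{k} \left( \frac{n - 1}{n} \right)^{k} \left( \frac{1}{n} \right)^{n - k}\]

Multiplying both sides by \(n^{n}\), we get

\[\left( n - 1 \right)^{n} > \sum\limits_{k = 0}^{n - 2} \binom{n}{k} \left( n - 1 \right)^{k}\]

Next, the following lemma can be established: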

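**Lemma 1**

\[\sum\limits_{k = 0}^{n - 2} \binom{n}{k} \left( n - 1 \right)^{k} = n^{n} - \left( n - 1 \right)^{n} - n\left( n - 1 \right)^{n - 1}\]

*Proof of Lemma 1*

The sum on the left is the full sum over *k* = 0,…, *n* minus its last two terms:

\[\sum\limits_{k = 0}^{n - 2} \binom{n}{k} \left( n - 1 \right)^{k} = \sum\limits_{k = 0}^{n} \binom{n}{k} \left( n - 1 \right)^{k} - \binom{n}{n} \left( n - 1 \right)^{n} - \binom{n}{n - 1} \left( n - 1 \right)^{n - 1}\]

The first term in the subtraction can be simplified as follows. Given the binomial theorem, which states that:

\[\left( x + y \right)^{n} = \sum\limits_{k = 0}^{n} \binom{n}{k} x^{k} y^{n - k}\]

The first term reduces (taking \(x = n - 1\) and \(y = 1\)) to:

\[\sum\limits_{k = 0}^{n} \binom{n}{k} \left( n - 1 \right)^{k} 1^{n - k} = \left( (n - 1) + 1 \right)^{n} = n^{n}\]

The third term can also be simplified as follows, which (together with \(\binom{n}{n} = 1\)) directly gives us the desired result:

\[\binom{n}{n - 1} \left( n - 1 \right)^{n - 1} = n\left( n - 1 \right)^{n - 1}\]

**□**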

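Given this lemma, all that remains to be proved is that \(\left( {n - 1} \right)^{n} > n^{n} - (n - 1)^{n} - n(n - 1)^{n - 1}\), that is, that \(n^{n} < 2(n - 1)^{n} + n(n - 1)^{n - 1}\).

Multiplying both sides by \(\frac{1}{{\left( {n - 1} \right)^{n} }}\), we get

\[\left( \frac{n}{n - 1} \right)^{n} < 2 + \frac{n}{n - 1}\]

or with \(m = n - 1,\)

\[\left( 1 + \frac{1}{m} \right)^{m + 1} < 2 + \frac{m + 1}{m}\]

Multiplying the 2 on the right by \(\frac{m}{m}\) and adding the two fractions,

\[\left( 1 + \frac{1}{m} \right)^{m} \frac{m + 1}{m} < \frac{3m + 1}{m}\]

What is left to prove is that:

\[\left( 1 + \frac{1}{m} \right)^{m} < \frac{3m + 1}{m + 1}\]

And this is true, since it can be checked directly for \(m \in \{ 2, \ldots ,6\}\), and the left side is bounded by *e*, while the right side (which is an increasing function of *m*) exceeds *e* from *m* = 7 onwards. **□**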
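The last step of the proof can be checked numerically. Dividing \(n^{n} < 2(n - 1)^{n} + n(n - 1)^{n - 1}\) by \((n - 1)^{n}\) and writing \(m = n - 1\) gives \((1 + 1/m)^{m} < (3m + 1)/(m + 1)\); this rearrangement is a reconstruction of the argument, and it is the form the sketch below assumes. The function names are illustrative. Theorem 1 is also tested directly for a range of sample sizes under a binomial model.

```python
from fractions import Fraction
from math import comb, e

def final_inequality(m):
    """Exact check of (1 + 1/m)^m < (3m + 1)/(m + 1)."""
    return (1 + Fraction(1, m))**m < Fraction(3 * m + 1, m + 1)

def theorem1_holds(n):
    """Exact check that, for p = 1 - 1/n, P(p_hat > p) > P(p_hat < p)."""
    p = 1 - Fraction(1, n)
    up = p**n  # all n sampled alleles are A
    down = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n - 1))
    return up > down

# Direct check for the small cases mentioned in the proof.
print([final_inequality(m) for m in range(2, 7)])

# The right-hand side passes e between m = 6 and m = 7.
print(float(Fraction(19, 7)) < e < float(Fraction(22, 8)))

# The theorem itself, for n = 3, ..., 100.
print(all(theorem1_holds(n) for n in range(3, 101)))
```

For *n* = 2 (that is, *m* = 1) the two sides are equal, which is why the direct check starts at *m* = 2.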

A possible strategy for proving the general result would be the following. We first show that, for a starting frequency of *p* = 0.5, \(P(\hat{p} > p) = P(\hat{p} < p)\). This is fairly easy to establish. We need to show that:

**Theorem 2**

\[\sum\limits_{k = 0}^{\frac{n}{2} - 1} \binom{n}{k} \left( {\frac{1}{2}} \right)^{k} \left( {\frac{1}{2}} \right)^{n - k} = \sum\limits_{k = \frac{n}{2} + 1}^{n} \binom{n}{k} \left( {\frac{1}{2}} \right)^{k} \left( {\frac{1}{2}} \right)^{n - k}\]

*Proof of Theorem 2*

The strategy here is to “pair up” the terms of the sums, such that the term for *k* = 0 on the left “pairs” with the term for *k* = *n* on the right; *k* = 1 on the left with *k* = *n* − 1 on the right… and, in general, *k* = *r* on the left with *k* = *n* − *r* on the right (Fig. 3).

Since \(\left( {\frac{1}{2}} \right)^{n - k} \left( {\frac{1}{2}} \right)^{n - (n - k)} = \left( {\frac{1}{2}} \right)^{k} \left( {\frac{1}{2}} \right)^{n - k}\), and since

\[\binom{n}{k} = \binom{n}{n - k}\]

then each of the paired terms will be equal to the other. **□**

Next, I propose the following conjecture: as *p* increases by “steps” of 1/*n*, \(P(\hat{p} \ge p)\) increases as well. That would give us a sort of induction over *p* (Theorem 2 being the base step, and this conjecture the inductive step). Formally, we need that:

\[P_{p + \frac{1}{n}} \left( \hat{p} \ge p + \frac{1}{n} \right) > P_{p} \left( \hat{p} \ge p \right)\]

where the subscript marks the starting frequency. That is, writing \(p = 1 - \frac{r + 1}{n}\) (so that *p* decreases as *r* increases), that

**Conjecture 1**

\(\sum\limits_{k = 0}^{r + 1} \binom{n}{n - k} \left( {1 - \frac{r + 1}{n}} \right)^{n - k} \left( {\frac{r + 1}{n}} \right)^{k}\) *is a decreasing function of r for r* *=* *0, 1, 2,…, n* *–* *1.*

If this could be established it would seem to show something else. Put in the terminology of the coin game image, it would seem to show that the bias of the coin gets bigger and bigger the closer one gets to one of the absorbing states. However, that would be partially misleading, since proving that \(P(\hat{p} \ge p)\) increases when *p* increases by steps of 1/*n* is not equivalent to proving that \(P(\hat{p} > p)\) increases in the same way. This last statement is, in fact, false. Consider the case where *n* = 10; for *p* = 0.9, \(P(\hat{p} > p)\) ≈ 0.349, while for *p* = 0.8, \(P(\hat{p} > p)\) ≈ 0.376. The claim does hold, however, in this case, for \(P(\hat{p} \ge p)\). This is due to the fact that the increase in the probability of the expected value \((\hat{p} = p)\) is greater than the decrease of the probability of going right.
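The contrast between \(P(\hat{p} > p)\) and \(P(\hat{p} \ge p)\) for *n* = 10 can be tabulated as follows. This is a sketch that assumes binomial sampling and looks only at starting frequencies from 0.5 to 0.9; the function and variable names are illustrative.

```python
from fractions import Fraction
from math import comb

def sample_probs(n, a):
    """Binomial sample of size n with p = a/n. Returns the exact triple
    (P(p_hat < p), P(p_hat = p), P(p_hat > p))."""
    p = Fraction(a, n)
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    return sum(pmf[:a]), pmf[a], sum(pmf[a + 1:])

n = 10
table = {}  # keyed by a = n * p
for a in range(5, 10):  # p = 0.5, 0.6, ..., 0.9
    below, equal, above = sample_probs(n, a)
    table[a] = (float(above), float(equal + above))
    print(f"p = {a / n:.1f}  P(p_hat > p) = {float(above):.4f}  "
          f"P(p_hat >= p) = {float(equal + above):.4f}")
```

In the printed table the strict probability drops between *p* = 0.8 and *p* = 0.9, while the non-strict one rises at every step in this range.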

If the conjecture holds, then this would paint an interesting and complicated picture for the changes in the coin’s bias. That is, the bias would still exist (and the main result needed to claim that drift has a general direction would be established), but the changes in the bias would be somewhat strange: as one moves closer to 100 points, the probability of landing heads or sideways increases. However, at the same time, it could happen that the coin gets closer to 100 points and the bias gets smaller, while this decrease would be compensated by the probability of it landing sideways getting bigger.

In any case, two conclusions from the preceding discussion can be drawn. First, that whether or not drift has a general direction, particular directions can be predicted a priori in particular cases, and this is enough to make the point of the article. And second, that if the conjecture holds, then the general claim can be made, and it is only for the changes in the bias that one gets more complicated results.

#### The magnitude of drift

Lastly, another interesting point concerns the notion of the magnitude of drift, and how it differs from its direction. It is generally assumed (e.g. by Brandon 2006, p. 324) that the magnitude of drift can be predicted a priori, in a way that its direction cannot, by considering population sizes.

What the previous considerations show is that the expected *magnitude* of drift depends not only on the population size, but also on the starting frequencies of the alleles. The last example shows a case where increasing the frequency of an *A* allele resulted in an increase of \(P(\hat{p} = p)\) and in a decrease of both \(P(\hat{p} < p)\) and \(P(\hat{p} > p)\). That is, a mere increase in the frequency of an allele, without modifying population sizes, made drift “weaker”—it diminished the probability that the sample values deviate from the expected ones. In fact, both magnitude and direction depend upon both factors, population size and initial frequencies.
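The dependence of drift’s “strength” on the starting frequency can be illustrated with a short computation. The sketch below assumes binomial sampling with a fixed sample size of 10 and reports, for each starting frequency, the probability that the sample frequency deviates from the expected one. It also reports the standard deviation of the sample frequency, \(\sqrt{p(1 - p)/n}\), a standard binomial result that is not used in the text itself. All names are illustrative.

```python
from math import comb, sqrt

def deviation_summary(n, a):
    """Binomial sample of size n with p = a/n. Returns
    (P(p_hat != p), standard deviation of p_hat)."""
    p = a / n
    p_equal = comb(n, a) * p**a * (1 - p)**(n - a)
    return 1 - p_equal, sqrt(p * (1 - p) / n)

n = 10
summary = {}  # keyed by a = n * p
for a in range(5, 10):  # p = 0.5, 0.6, ..., 0.9
    summary[a] = deviation_summary(n, a)
    print(f"p = {a / n:.1f}  P(p_hat != p) = {summary[a][0]:.4f}  "
          f"sd(p_hat) = {summary[a][1]:.4f}")
```

With the sample size held fixed, both measures shrink as *p* moves from 0.5 towards 0.9.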

Additionally, as with direction, it is not possible to obtain a deterministic prediction about the magnitude of drift simply by specifying the population sizes and the initial frequencies (e.g. a prediction that tells us that *p* will either increase or decrease by 0.1). At most, specifying them would allow us to say, with a given degree of confidence, that the sample values will fall within a certain interval (the smaller the interval, the less confidence we have). But this prediction is also different from the one given by selection-only models: it doesn’t state a particular magnitude, but a range of magnitudes. In sum, it seems that whatever is said about drift’s direction must also be said about its magnitude.

## About this article

### Cite this article

Roffé, A.J. Genetic drift as a directional factor: biasing effects and a priori predictions. *Biol Philos* **32**, 535–558 (2017). https://doi.org/10.1007/s10539-017-9575-1

DOI: https://doi.org/10.1007/s10539-017-9575-1

### Keywords

- Drift
- Directionality
- Evolutionary space
- Evolutionary forces