Biology & Philosophy, Volume 26, Issue 6, pp 813–835

Righteous modeling: the competence of classical population genetics

Original Research

DOI: 10.1007/s10539-011-9268-0

Cite this article as:
Gildenhuys, P. Biol Philos (2011) 26: 813. doi:10.1007/s10539-011-9268-0

Abstract

In a recent article, “Wayward Modeling: Population Genetics and Natural Selection,” Bruce Glymour claims that population genetics is burdened by serious predictive and explanatory inadequacies and that the theory itself is to blame. Because Glymour overlooks a variety of formal modeling techniques in population genetics, his arguments do not quite undermine a major scientific theory. However, his arguments are extremely valuable as they provide definitive proof that those who would deploy classical population genetics over natural systems must do so with careful attention to interactions between individual population members and environmental causes. Glymour’s arguments have deep implications for causation in classical population genetics.

Keywords

Natural selection, Population genetics, Fitness, Causation, Evolutionary theory

Introduction

In a recent article, “Wayward Modeling: Population Genetics and Natural Selection,” Bruce Glymour claims that population genetics is burdened by serious predictive and explanatory inadequacies and that the theory itself is to blame: “population genetics models evolving populations with the wrong variables related by the wrong equations employing the wrong kinds of parameters” (Glymour 2006, 371).1 In particular, Glymour gainsays the core commitment, that population genetics provides the “core formal machinery for describing and understanding natural selection and the evolutionary events it produces” (Glymour 2006, 369). Because Glymour overlooks a variety of formal modeling techniques in population genetics, his arguments do not quite undermine a major scientific theory.2 However, his arguments are extremely valuable as they provide definitive proof that those who would deploy classical population genetics over natural systems must do so with careful attention to the causes that operate over individual population members. Glymour’s arguments defeat an account of fitness variables according to which these quantify the causal influence of many indeterminate “individual-level” causes, a stance on fitness shared by advocates of the received view and its statisticalist rival.

In what follows, I consider Glymour’s arguments in depth as they apply to classical population genetics models, focusing in particular on Wright-Fisher models, which Glymour clearly had in mind, and then I discuss the formal techniques population geneticists have developed to handle the sorts of causes that Glymour thinks undermine inferences about system dynamics based on population genetics. The final section reconciles Glymour’s arguments with philosophers’ intuitions that population genetics must abstract away from individual-level causes operative within natural populations.

Long-run dynamics

One fact about classical population genetics that is critical to Glymour’s arguments is how its equations are used to make inferences about the future dynamics of natural populations: the response variables of classical population genetics equations are frequencies of types at arbitrary future times (2006, 374). Classical models are recursively structured: a system of equations adequate to make inferences about a single generation is applied to a succession of them. This means that a population genetics model for some system must be adequate to handle the causes that operate over a considerable length of time; the formalism is not adjusted generation by generation, even if the impact of environmental causes varies over that time.

When classical models are applied to a natural population, the population must be researched to ascertain what system of equations with what variables at what values will be appropriate for the natural population. Glymour calls this phase the estimation period. Once researchers fix on a system of equations and also some values or value ranges for some variables, those equations are then used to make inferences about dynamics of the population into the future, for what Glymour calls the projection period. It is absolutely critical to Glymour’s argument that the equations derived from the estimation period remain fixed throughout their use in the projection period, and this is exactly how recursively structured classical population genetics models work.

Narrow population genetics

Glymour’s arguments for the incompetence of population genetics rest chiefly on his ascription of two limitations to the theory. First, Glymour writes that fitness coefficients in population genetics are in general exogenous variables:

In population genetics models, fitnesses (or functions of them) are either fixed parameters or random variables drawn from some constant distribution. It follows that fitnesses and functions of them are not endogenous variables in population genetics models. (Glymour 2006, 373)

This is not a general truth about classical models: density-dependent selection and frequency-dependent selection models are both types of models in which fitness variables are endogenous. Later on in the paper, Glymour discusses these sorts of models, writing that “the qualitative relations between fitnesses depend on selection pressures which themselves depend on class frequencies” (Glymour 2006, 380–381). But as we will see, Glymour does not discuss how fitness functions can be used to quantify the influence of the sorts of causes he considers in his arguments for the incompetence of the theory.

Glymour goes on to infer from the exogeneity of fitness variables that “nongenetic causes of reproductive success are not represented” in population genetics models (Glymour 2006, 373). Especially if one has a background in structural equation modeling, one might expect environmental causal influences to be handled in classical population genetics through the deployment of variables that represent them, perhaps ones that serve as arguments in fitness functions. Glymour seems to have such an expectation, noting that environmental causes that influence their future values do not appear in population genetics equations (2006, 378). But nongenetic causes can be represented in population genetics models even without the use of fitness functions. This is done by partitioning the population. Glymour writes that a fully specific population genetics model of an arbitrary population will feature a partition of it into genic or genotypic classes (2006, 373–374), but individuals can vary in terms of their genotype, their substructure membership, their sub-group membership, their sex, and their sub-environment.3 The additional partitions are used to capture a variety of causal influences without deploying variables in the formal equations that represent them.4

Though based on a narrow view of population genetics modeling practice, Glymour’s arguments remain very valuable because they expose how population geneticists must take advantage of the full resources of the theory to handle the sorts of cases that Glymour discusses in two main arguments for the incompetence of population genetics, the Argument from Direct Estimation and the Argument from Noncausal Models. A version of classical population genetics without fitness functions and partitioning would indeed be hamstrung in its potential to model the dynamics of causally complex natural populations for precisely the reasons Glymour gives. So, to understand Glymour’s arguments for incompetence, it is helpful to see just how badly off we would be if population geneticists lacked fitness functions and nongenetic partitions.

What Glymour seems to have in mind when he discusses the incompetence of population genetics are the simple models that are typically presented in the initial sections of elementary textbooks. We’ll focus on the familiar genotypic selection model:
$$ \begin{aligned} p^{\prime } = & \frac{{w_{AA} p^{2} + w_{Aa} pq}}{{w_{AA} p^{2} + 2w_{Aa} pq + w_{aa} q^{2} }} \\ q^{\prime } = & \frac{{w_{aa} q^{2} + w_{Aa} pq}}{{w_{AA} p^{2} + 2w_{Aa} pq + w_{aa} q^{2} }} \\ \end{aligned} $$
(1)
p = frequency or absolute number of gametes bearing the A allele, q = frequency or absolute number of gametes bearing the a allele, w(AA) = viability of the AA homozygote, w(Aa) = viability of the Aa heterozygote, w(aa) = viability of the aa homozygote, p′ = next-generation frequency of gametes bearing the A allele, q′ = next-generation frequency of gametes bearing the a allele.
A population governed by (1) can be graphed this way, where the nodes refer to absolute numbers or frequencies of the entities by which the nodes are labeled (Fig. 1)5:
https://static-content.springer.com/image/art%3A10.1007%2Fs10539-011-9268-0/MediaObjects/10539_2011_9268_Fig1_HTML.gif
Fig. 1 Directed causal graph of a simple natural population

Note how (1) has all the features that Glymour regards classical models as having: fitness coefficients are exogenous and the population is partitioned into classes based on genotype alone. Moreover, the cause of the post-selection frequency of each zygote is just the same zygote’s pre-selection frequency. Thus, the equations are adequate to capture the causal influence of the genetic variations that distinguish the zygotes over their post-selection frequency in a homogeneous environment, but nongenetic causes are not represented.
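The recursive use of (1) can be made concrete with a short sketch. The following is only an illustration, not part of the original analysis: it assumes that p is read as a frequency rather than an absolute number, and the function names and the example viabilities are hypothetical.

```python
def genotypic_selection_step(p, w_AA, w_Aa, w_aa):
    """Apply equation (1) once and return p', the next-generation
    frequency of gametes bearing the A allele."""
    q = 1.0 - p
    # Mean fitness: the shared denominator of both lines of (1).
    w_bar = w_AA * p**2 + 2.0 * w_Aa * p * q + w_aa * q**2
    return (w_AA * p**2 + w_Aa * p * q) / w_bar


def project(p0, w_AA, w_Aa, w_aa, generations):
    """Iterate (1) with the same fixed, exogenous fitness coefficients
    in every generation, as a recursively structured model does."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(
            genotypic_selection_step(trajectory[-1], w_AA, w_Aa, w_aa)
        )
    return trajectory


# Hypothetical viabilities: the aa homozygote survives half as often.
trajectory = project(0.5, 1.0, 1.0, 0.5, generations=10)
```

Nothing in `project` refers to the environment: the three coefficients are set once and reused in every generation, which is the feature of (1) on which Glymour’s arguments turn.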

Glymour takes it that real populations will be beset by many environmental causes that affect their dynamics and hence population genetics models such as (1) will fail to adequately model the dynamics of such populations. Real populations, for Glymour, will look more like the one pictured in Fig. 2.

In this last graph, a host of environmental causal influences over zygote viability (E1–E5) are each represented. What is critical is that we seemingly cannot model the sort of system represented by Fig. 2 using (1) because the equations do not capture nongenetic causes.
https://static-content.springer.com/image/art%3A10.1007%2Fs10539-011-9268-0/MediaObjects/10539_2011_9268_Fig2_HTML.gif
Fig. 2 Directed causal graph of a natural population subject to five environmental causes

Glymour writes that the received understanding of population genetics is that fitness coefficients are supposed to capture the impact of all the causal influences on zygote viability. He gets this idea from Elliott Sober:

It doesn’t matter to the equations in population genetics why a given population is characterized by a set of selection coefficients. … These values may just as well have dropped out of the sky. (Sober 1984, 59; quoted in Glymour 2006, 370).

Glymour himself writes that, on the received view, “the consequences of selection do not depend on the causal details by which fitness differences arise, but only the single facet of selection characterized by those differences” (Glymour 2006, 371). On the received view, the dynamical force/cause of selection is used to capture the impact of a range of environmental causes of population dynamics, yielding a picture like Fig. 3.

But, and here is the key point, the variable representing selection, W, is not a function of E1–E5 in Wright-Fisher models. This is true despite the fact that in reality relative reproduction rates, what fitness variables must track for the models to yield true conclusions about system dynamics, do depend on the environmental causal influences that operate upon population members. So even though W is supposed to capture a variety of causal influences over population dynamics that really operate in nature and determine who reproduces and how much, on the received view these are left unrepresented in population genetics equations and a single exogenous fixed parameter, W, is used in their place.

Were E1–E5 to have a consistent impact as the population evolved, then they could be harmlessly ignored. But if E1–E5 are significant causal influences on post-selection zygote frequencies, and ones whose impact on population dynamics changes from generation to generation, then we will make bad inferences about the dynamics of the system using equations featuring fixed exogenous fitness variables. This is evident from Fig. 3: W depends on E1–E5, which change in value while W stays fixed.
https://static-content.springer.com/image/art%3A10.1007%2Fs10539-011-9268-0/MediaObjects/10539_2011_9268_Fig3_HTML.gif
Fig. 3 Directed causal graph of a natural population with environmental causes captured by selection

As Glymour himself notes about his arguments, they

exploit a pair of facts: the causes of reproductive success vary over generational time, and the dependence relations between these causes and reproductive success are not represented in population genetics models. (Glymour 2006, 380)

Indeed, environmental causes that vary over generational time will hardly be exceptional, for any environmental cause for which different population members take different values will change in its impact on population dynamics in an evolving population unless it itself evolves in lockstep with the population, a condition that will almost never be met in nature and would be impossible to anticipate. For instance, a cause that, say, strikes a third of population members and leaves two-thirds alone and which impedes the survival of all but the dominant homozygotes will have a different impact when the dominant homozygotes are rare as compared to when they are at a middling frequency. (For further details, see Glymour (2006, 383) for a discussion of how interventions on relative frequency variables will produce changes in the distribution of environmental causes of reproduction.)

Glymour’s arguments for incompetence

We are now in a better position to understand Glymour’s arguments for the incompetence of population genetics. Consider the first one, the Argument from Direct Estimation (Glymour 2006, 374–376). Glymour starts by noting that fitness parameters are estimates of reproductive success relative to an environment, which consists in causes of reproductive success. Population geneticists must individuate environments, says Glymour, and he offers two suggestions on how we can do so.

According to Glymour’s first suggestion, we can create narrow environments by using specific values for the environmental causes of reproductive success of the zygotes. The genotypes of different population members will have definite causal influences in narrow environments. But if we do carve up environments in this way, we cannot generate good expectations about the future, because “narrow environments nearly always change over generational time” (Glymour 2006, 375). The fitness values we assign based on the reproductive success of the zygotes during the estimation period, generations 1 to j, will fail to reflect the reproductive success of the zygotes during the projection period, generations j + 1 to n, because the environment will be different in the later generations than it was in the earlier ones.

Glymour’s second suggestion for individuating environments is to use generalized environments. On this approach, we recognize that environments change over time, and hence we use fitness parameters that are averages of the expected reproductive success of our rival types in each narrow environment, weighted for the frequency with which the narrow environment occurs in the estimation period. The problem now is that unless we get the same suite of narrow environments during the projection period, our fitness parameters will not allow us to make good predictions about dynamics up until generation n. And we will not be in a position to know what suite of narrow environments will likely occur in generations j + 1 to n unless we have an impossibly huge estimation period. Indeed, since the order in which different narrow environments occur will equally be relevant to the dynamics of the system during the projection period, even knowing the frequency and causal influence of each narrow environment on the population will not be enough to calculate the dynamics of the system during the projection period.
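The worry about generalized environments can be illustrated numerically. The sketch below is an illustration under stated assumptions, not Glymour’s own example: it uses the recursion in (1), and the two narrow environments E1 and E2 and their viabilities are invented for the purpose. It compares a projection based on frequency-weighted average fitnesses with the two possible orderings of the narrow environments.

```python
def selection_step(p, w):
    """One generation of equation (1); w = (w_AA, w_Aa, w_aa)."""
    w_AA, w_Aa, w_aa = w
    q = 1.0 - p
    w_bar = w_AA * p**2 + 2.0 * w_Aa * p * q + w_aa * q**2
    return (w_AA * p**2 + w_Aa * p * q) / w_bar


def run(p0, environments):
    """Apply one selection step per environment, in the order given."""
    p = p0
    for w in environments:
        p = selection_step(p, w)
    return p


# Hypothetical narrow environments (viabilities for AA, Aa, aa).
E1 = (1.0, 1.0, 0.5)  # aa does poorly
E2 = (0.5, 1.0, 1.0)  # AA does poorly

# Generalized environment: E1 and E2 each occupy half of the
# estimation period, so the averaged fitnesses are (0.75, 1.0, 0.75).
averaged = tuple((a + b) / 2.0 for a, b in zip(E1, E2))

p_averaged = run(0.5, [averaged, averaged])  # prediction from averages
p_e1_then_e2 = run(0.5, [E1, E2])            # one possible projection period
p_e2_then_e1 = run(0.5, [E2, E1])            # same suite, opposite order
```

With these invented numbers the averaged model predicts no change from p = 0.5, while the two orderings of the very same suite of narrow environments end at 20/41 and 21/41 respectively. Both the suite and the order of narrow environments therefore matter to the outcome.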

Glymour’s second argument for the incompetence of population genetics begins with the claim that fitness variables or functions of them are in general fixed parameters or random variables drawn from a fixed distribution. He follows this with the claim that “environmental and phenotypic variables, when introduced at all, are used to constrain this fixed distribution” (Glymour 2006, 377). He then draws the implication that the dependence relation between reproductive success and environmental and phenotypic variables must be constant in form for population genetics models to be predictively successful. Accordingly, population genetics cannot be used to accurately model populations featuring environmental causes that affect both exogenous variables, such as fitness variables, and endogenous ones, such as their own future values. Equally, changes in environmental causes that interact with genotypic variations will change the reproduction rates of the individuals bearing the genotypes, causing difficulties for a model in which the fitness coefficients that weight those genotypes stay fixed:

If fitnesses are to reliably track rates of reproductive success, values for the latter must be drawn from a constant distribution, with roughly the same form and with roughly the same statistical parameters as that from which the corresponding fitnesses are drawn. But this cannot be so if rates of reproductive success have interactive causes whose values vary over the projection period, since such variations will ‘switch regimes’, that is, change the distribution over particular rates of reproductive success. (Glymour 2006, 378–379)

Glymour thinks there are good reasons to think that the kinds of causal influences that he claims will scupper population genetics models exist in nature, and hence population genetics will regularly and predictably fail as a tool of dynamical inference.

It must be acknowledged that Glymour’s argument does show that simple systems of equations, such as (1), will indeed fail as tools of dynamical inference for natural populations subject to the sorts of causes that Glymour discusses. But population geneticists use models more sophisticated than (1) for such populations and these are perfectly adequate to the task of predicting and explaining the dynamics of the sorts of systems Glymour considers. The next section explains the models’ use.

How classical population genetics modeling handles nongenetic causes

Even though classical population genetics contains no variables that pick out environmental causal influences, classical population genetics can nonetheless be used to explain and predict the dynamics of populations subject to such causal influences. Causal influences that interact with genotypic variations and change over time are handled using one of two different techniques: partitioning and fitness functions.

Partitioning

For the sake of a definite example of a population beset by an environmental causal influence, consider a population of plants that spans an area in which part of the soil is toxic and where the plants exhibit genetic variations at a single locus that impact their resistance to the toxin. (The example is borrowed from Brandon (1990).) Figure 4 is a causal graph of the population.
https://static-content.springer.com/image/art%3A10.1007%2Fs10539-011-9268-0/MediaObjects/10539_2011_9268_Fig4_HTML.gif
Fig. 4 Directed causal graph of a natural population subject to a single environmental cause

Seemingly, unless we were to include in the equations for our system a variable for E1, the soil toxicity, we will not get the dynamics of the above system right. At any rate, in an evolving population, there is no value at which one might set the fitness coefficients in (1) to capture the dynamics of the graphed population because of the influence of E1. But because population genetics does not contain variables for interactive environmental causes, the theory seems to be in serious jeopardy. However, the population graphed in Fig. 4 might be equally well graphed in the following fashion, where the system is divided into two substructures, each substructure (numbered 1 or 2) picking out a different region of the ecosystem corresponding to toxic or non-toxic soil (Fig. 5):
https://static-content.springer.com/image/art%3A10.1007%2Fs10539-011-9268-0/MediaObjects/10539_2011_9268_Fig5_HTML.gif
Fig. 5 Directed causal graph of a natural population featuring two substructures

Here are the equations for our system (following Christiansen 1975):
$$ \begin{aligned} p_{1}^{\prime } & = \frac{{m_{(AA)1,1} w_{(AA)1} p_{1}^{2} + m_{(AA)2,1} w_{(AA)1} p_{2}^{2} + m_{(Aa)1,1} w_{(Aa)1} p_{1} q_{1} + m_{(Aa)2,1} w_{(Aa)1} p_{2} q_{2} }}{{\bar{w}}} \\ p_{2}^{\prime } & = \frac{{m_{(AA)1,2} w_{(AA)2} p_{1}^{2} + m_{(AA)2,2} w_{(AA)2} p_{2}^{2} + m_{(Aa)1,2} w_{(Aa)2} p_{1} q_{1} + m_{(Aa)2,2} w_{(Aa)2} p_{2} q_{2} }}{{\bar{w}}} \\ \bar{w} & = m_{(AA)1,1} w_{(AA)1} p_{1}^{2} + m_{(AA)2,1} w_{(AA)1} p_{2}^{2} + 2m_{(Aa)1,1} w_{(Aa)1} p_{1} q_{1} + 2m_{(Aa)2,1} w_{(Aa)1} p_{2} q_{2} \\ & + m_{(aa)1,1} w_{(aa)1} q_{1}^{2} + m_{(aa)2,1} w_{(aa)1} q_{2}^{2} + m_{(AA)1,2} w_{(AA)2} p_{1}^{2} + m_{(AA)2,2} w_{(AA)2} p_{2}^{2} \\ & + 2m_{(Aa)1,2} w_{(Aa)2} p_{1} q_{1} + 2m_{(Aa)2,2} w_{(Aa)2} p_{2} q_{2} + m_{(aa)1,2} w_{(aa)2} q_{1}^{2} + m_{(aa)2,2} w_{(aa)2} q_{2}^{2} \\ \end{aligned} $$
(2)
The new indices refer to substructure membership. The migration coefficients, the m’s, are indexed by genotype first, substructure of origin second, and destination substructure last, so that m(AA)1,2 refers to the rate at which dominant homozygotes move from the first to the second substructure.6

The partitioning technique involves taking the nongenetic causal influence over the dynamics of the system, in this case soil toxicity, and partitioning the population such that for each value of the environmental variable (toxic soil vs. non-toxic soil) there is a population fragment. Note how population members are distinguished by more than just their genetic variations. This is what allows the fitness coefficients that weight each frequency term to capture how toxicity influences system dynamics. Insofar as the fitness coefficient on a genotype in one sub-environment differs from its fitness coefficient in another, the soil toxicity is what is responsible for the difference.

Furthermore, if the gamete frequency variables change from generation to generation, then the impact of soil toxicity will change from generation to generation, since different proportions of each type of zygote will be exposed to each value of the environmental cause. This means that no classical population genetics model that fails to partition the natural population into substructures will accurately model the dynamics of the system. The impact of soil toxicity cannot just be averaged over or abstracted away for precisely the reasons Glymour gives: (1) would fail to yield true conclusions about the population no matter what values or range of values is used for its fitness variables. But that is no surprise, because (2) is the appropriate system of equations for the population rather than (1).

Partitioning in the face of environmental causes is not always achieved using substructures, as in the above example. For environmental causal influences that are not fixed features of the geographical environment, a different sort of spatially variable selection model may be used, one pioneered by Levene (1953). Here is a graph of a population facing a single environmental causal influence with two values (Fig. 6):
https://static-content.springer.com/image/art%3A10.1007%2Fs10539-011-9268-0/MediaObjects/10539_2011_9268_Fig6_HTML.gif
Fig. 6 Directed causal graph of a natural population featuring two niches

Here is the system of equations that can be used to calculate its dynamics:
$$ \begin{aligned} p^{\prime } = & \frac{{c_{(AA)1} w_{(AA)1} p^{2} + c_{(Aa)1} w_{(Aa)1} pq + c_{(AA)2} w_{(AA)2} p^{2} + c_{(Aa)2} w_{(Aa)2} pq}}{{\bar{w}}} \\ q^{\prime } = & \frac{{c_{(aa)1} w_{(aa)1} q^{2} + c_{(Aa)1} w_{(Aa)1} pq + c_{(aa)2} w_{(aa)2} q^{2} + c_{(Aa)2} w_{(Aa)2} pq}}{{\bar{w}}} \\ \end{aligned} $$
(3)
where
$$ \bar{w} = c_{(AA)1} w_{(AA)1} p^{2} + c_{(AA)2} w_{(AA)2} p^{2} + c_{(Aa)1} w_{(Aa)1} 2pq + c_{(Aa)2} w_{(Aa)2} 2pq + c_{(aa)1} w_{(aa)1} q^{2} + c_{(aa)2} w_{(aa)2} q^{2} $$
The c variables represent the rates at which individuals of each genotype encounter each sub-environment, and the numerical indices pick out the sub-environment.
Like the model featuring substructures, the Levene model straightforwardly generalizes to systems with more than one environmental causal influence. The dynamics of a discrete generation single-locus biallelic population beset by five binary interactive environmental causes could be inferred from the following system of equations:
$$ \begin{aligned} p^{\prime } = & \frac{{\sum\nolimits_{i = 1}^{m} {c_{(AA)i} w_{(AA)i} p^{2} } + \sum\nolimits_{i = 1}^{m} {c_{(Aa)i} w_{(Aa)i} pq} }}{{\bar{w}}} \\ q^{\prime } = & \frac{{\sum\nolimits_{i = 1}^{m} {c_{(aa)i} w_{(aa)i} q^{2} } + \sum\nolimits_{i = 1}^{m} {c_{(Aa)i} w_{(Aa)i} pq} }}{{\bar{w}}} \\ \bar{w} = & \sum\limits_{i = 1}^{m} {c_{(AA)i} w_{(AA)i} p^{2} } + \sum\limits_{i = 1}^{m} {c_{(Aa)i} w_{(Aa)i} 2pq} + \sum\limits_{i = 1}^{m} {c_{(aa)i} w_{(aa)i} q^{2} } \\ \end{aligned} $$
(4)
Here, each of the m = 32 sub-environments represents a combination of the possible values for each of the five binary causal influences stemming from the environment.7

It is worth emphasizing here that the use of 32 sub-environments for the case at hand is not optional. The above equations cannot be collapsed into simpler ones unless some of the environmental causes covary exactly with each other over the course of the entire estimation and projection period, such that they strike the same proportion of individuals of each type in each generation, something that would require a miracle. Otherwise, setting the values of fitness variables by averaging over such environmental causes will lead to predictive and explanatory failure.
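To make the bookkeeping in (4) concrete, here is a minimal sketch of one generation of the generalized Levene model. It is an illustration only: the dictionary layout, the function name, and every numerical value are assumptions introduced here. The 32 sub-environments are generated as the combinations of five binary causes.

```python
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")


def levene_step(p, c, w):
    """One generation of equation (4).

    c[g][i]: rate at which genotype g encounters sub-environment i.
    w[g][i]: viability of genotype g in sub-environment i.
    Returns p', the next-generation frequency of the A allele.
    """
    q = 1.0 - p
    m = len(w["AA"])
    # The three sums over sub-environments that appear in (4).
    sum_AA = sum(c["AA"][i] * w["AA"][i] * p**2 for i in range(m))
    sum_Aa = sum(c["Aa"][i] * w["Aa"][i] * p * q for i in range(m))
    sum_aa = sum(c["aa"][i] * w["aa"][i] * q**2 for i in range(m))
    # Heterozygotes enter the mean fitness with weight 2pq.
    w_bar = sum_AA + 2.0 * sum_Aa + sum_aa
    return (sum_AA + sum_Aa) / w_bar


# Five binary environmental causes give 2**5 = 32 sub-environments,
# each a tuple such as (0, 1, 1, 0, 1).
sub_environments = list(product((0, 1), repeat=5))
m = len(sub_environments)

# Hypothetical parameters: every genotype meets each sub-environment
# equally often, and aa is less viable wherever the first cause is present.
c = {g: [1.0 / m] * m for g in GENOTYPES}
w = {
    "AA": [1.0] * m,
    "Aa": [1.0] * m,
    "aa": [0.5 if env[0] == 1 else 1.0 for env in sub_environments],
}

p_next = levene_step(0.5, c, w)
```

Each sub-environment carries its own encounter rate and viability for each genotype, so a model for five binary causes has 32 entries per genotype in both `c` and `w`.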

The partitioning technique for modeling nongenetic causal influences is quite widespread in classical population genetics. Sex may function as a nongenetic interactive causal influence over population dynamics, and for cases of this sort population geneticists have developed sex-dependent selection models, which distinguish gametes and zygotes on the basis of both genotype and sex. Multi-locus selection models are prompted in similar conditions: If alleles of interest interact with genetic variations at other loci, then the individuals in the population are distinguished by the genetic variations at all the interacting loci.

Fitness functions

While populations are partitioned in response to the presence of environmental and other causal influences that interact with genetic and other variations, a different modeling technique is used to handle the influence of causes that affect their future values; for such cases, relative fitness variables are made functions of the causes, which themselves may be functions of their values in previous generations. The most well-known example of this sort of case is one in which population members act as interactive causes of other population members’ descendant production, that is, cases of frequency-dependent selection. As noted earlier, though Glymour is familiar with these sorts of models (Glymour 2006, 381), he does not discuss how such models can be used to quantify the impact of causes that he considers in his arguments for the predictive and explanatory incompetence of population genetics.

The simplest frequency-dependent selection models feature fitness parameters that are additive functions of weighted relative frequency terms. For instance, Hori invokes negative frequency-dependent selection to explain a stable polymorphism among a population of cichlid fish whose dextral and sinistral morphs exert interactive causal influences on their relative hunting success (Cresswell and Sayre 1991; Hori 1993). Note that even in simple frequency-dependent selection, we have a case in which causes affect both endogenous and what would otherwise be exogenous variables in the formalism: this-generation relative frequency terms affect this-generation fitnesses and, indirectly, next-generation relative frequency terms. But there is no limit to the complexity of causes that can be modeled using fitness functions, and fitness functions may include all manner of variables that affect the dynamics of the population in one generation and their own values in a subsequent one. Glymour considers a rather complicated case of this sort in his article on page 379; I use fitness functions to model that population, or at least one much like it, in the “Appendix”. Other similarly complex models featuring environmental causes that affect their future values are considered under the rubric of “niche construction” (see for instance, Laland et al. 2001). Anyway, the general point is simple: causal influences stemming from causes that affect system dynamics in one generation as well as their future values in later ones are captured in classical models using fitness functions. Though researchers have deployed complex fitness functions featuring variables governed by their recursive equations only recently, the use of simple versions of such functions is at least 80 years old. Accordingly, such causes in no way threaten the competence of population genetics models.
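A simple fitness function of the kind just described can be sketched as follows. This is a hedged illustration rather than Hori’s model: it assumes two asexually reproducing morphs for simplicity, makes each morph’s fitness a linear function of its own frequency, and uses a hypothetical strength parameter s.

```python
def morph_fitnesses(x, s):
    """Fitness functions: each morph's fitness falls linearly as that
    morph becomes more common (negative frequency dependence).

    x: current frequency of the dextral morph.
    s: hypothetical strength of the frequency dependence, 0 < s < 1.
    """
    w_dextral = 1.0 - s * x
    w_sinistral = 1.0 - s * (1.0 - x)
    return w_dextral, w_sinistral


def frequency_dependent_step(x, s):
    """One generation: fitnesses are recomputed from the current
    frequency, so they are endogenous rather than fixed parameters."""
    w_d, w_s = morph_fitnesses(x, s)
    w_bar = x * w_d + (1.0 - x) * w_s
    return x * w_d / w_bar


def project(x0, s, generations):
    """Iterate the recursion and return the final dextral frequency."""
    x = x0
    for _ in range(generations):
        x = frequency_dependent_step(x, s)
    return x


# Starting with the dextral morph either rare or common, the population
# approaches the same polymorphic state at x = 0.5.
from_rare = project(0.2, 0.5, generations=200)
from_common = project(0.9, 0.5, generations=200)
```

Because the fitnesses are recomputed from the current frequency in every generation, this-generation frequencies affect this-generation fitnesses and, through them, next-generation frequencies.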

Temporally variable selection

Temporally variable selection models are a final type of population genetics model that deserves extended discussion here, since these are used for environmental causes that change over time independently of the dynamics of the population. While the spatially variable selection models considered earlier posit environmental causes that may change in their impact on the dynamics of the population, this change in impact is a byproduct of the dynamics of the system. For instance, in the population graphed in Fig. 4, migration causes the proportion of individuals facing each value of the environmental cause to change over time, at least when the population is off equilibrium. But even though the influence of the environmental causes modeled in spatially variable selection models changes as the population evolves, the causes themselves stay fixed. For the population graphed in Fig. 4, a definite fixed proportion of the environment is toxic, and this does not change from generation to generation. But the impact of some causal influences can be expected to change from generation to generation because they themselves change, not just because different numbers of population members encounter them in different generations. For these sorts of causes, what is needed is temporally variable selection models, in which fitness coefficients are functions of time.

It is easiest here to discuss a case of temporally variable selection in nature that has been the subject of intense study by biologists. Turelli, Schemske, and Bierzychudek deploy a temporally variable selection model to explain the dynamics of a population of desert perennials, Linanthus parryae (Schemske and Bierzychudek 2001; Schemske and Bierzychudek 2007; Turelli, Schemske, and Bierzychudek 2001). From year to year, precipitation rates in the Mojave Desert vary. Schemske et al. determined that precipitation is a causal influence that interacts with a pair of alleles that are also responsible for the color differences between the blue and white morphs. Different morphs are favored in different years depending on the extent of precipitation such that rainfall functions as an interactive environmental causal influence that changes from generation to generation independently of system dynamics. Accordingly, fitness coefficients on the morph frequencies are indexed by time.
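What it means for fitness coefficients to be indexed by time can be shown with a small sketch. Everything specific in it is an assumption made for illustration: the rainfall series is invented rather than taken from the Linanthus studies, the wet/dry threshold and the viabilities are hypothetical, and the recursion is the simple one in (1) rather than the researchers’ own model.

```python
def selection_step(p, w_AA, w_Aa, w_aa):
    """One generation of equation (1) with this generation's viabilities."""
    q = 1.0 - p
    w_bar = w_AA * p**2 + 2.0 * w_Aa * p * q + w_aa * q**2
    return (w_AA * p**2 + w_Aa * p * q) / w_bar


def fitness_at(t, rainfall, threshold=100.0):
    """Hypothetical time-indexed fitnesses (w_AA, w_Aa, w_aa) for year t.

    Assumption: carriers of A are favored in wet years and aa in dry
    years; the threshold is an invented cutoff for a 'wet' year.
    """
    if rainfall[t] >= threshold:
        return (1.0, 1.0, 0.8)
    return (0.8, 0.8, 1.0)


def project(p0, rainfall):
    """Iterate the recursion, looking up the fitnesses for each year."""
    trajectory = [p0]
    for t in range(len(rainfall)):
        trajectory.append(
            selection_step(trajectory[-1], *fitness_at(t, rainfall))
        )
    return trajectory


# Invented rainfall series: one wet year followed by one dry year.
trajectory = project(0.5, [150.0, 20.0])
```

The environmental series here is supplied from outside the model and does not depend on the allele frequencies, which is what distinguishes this case from the spatially variable and frequency-dependent cases.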

To establish whether varying precipitation levels can account for the persistence of both morphs in the wild, the Linanthus researchers determine sufficient conditions for a protected polymorphism in a biallelic population under temporally variable selection and then analyze 11 years’ worth of data on several wild Linanthus populations to determine whether they meet these conditions. The researchers determine that one of the two conditions for a protected polymorphism is fulfilled, while the second condition is almost met. They also deploy a diffusion approximation to their system, or rather an approximation of their system, and do numerical analysis to verify its validity. On the basis of these formal techniques, Turelli et al. (2001, 1295–1296) claim that temporally variable selection is statistically consistent with the maintenance of the polymorphism. Schemske and Bierzychudek (2007, 2540) later implicate spatially variable selection as well in the maintenance of the polymorphism, perhaps involving interactions between the alleles that distinguish the morphs and cation uptake.
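To give a sense of what checking conditions for a protected polymorphism involves, here is a hedged sketch of the textbook rare-allele invasion test for a diploid, randomly mating population with discrete generations and no seed bank. It is not the pair of conditions that the Linanthus researchers derive, which are not reproduced here. The function names and all fitness values are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of a sequence of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def is_protected(w_AA, w_Aa, w_aa):
    """Each argument lists one genotype's fitness, generation by generation.

    A rare allele is carried almost entirely by heterozygotes, so it spreads
    when the geometric mean of heterozygote fitness relative to the common
    homozygote exceeds 1. The polymorphism is protected when this holds for
    both alleles.
    """
    A_invades = geometric_mean([het / hom for het, hom in zip(w_Aa, w_aa)]) > 1.0
    a_invades = geometric_mean([het / hom for het, hom in zip(w_Aa, w_AA)]) > 1.0
    return A_invades and a_invades


# Hypothetical fitness series: the homozygotes fluctuate, the heterozygote does not.
fluctuating = is_protected(
    w_AA=[1.5, 0.5, 1.5, 0.5],
    w_Aa=[1.0, 1.0, 1.0, 1.0],
    w_aa=[0.6, 1.4, 0.6, 1.4],
)
# Hypothetical constant directional selection favoring A.
constant = is_protected(
    w_AA=[1.1] * 4,
    w_Aa=[1.0] * 4,
    w_aa=[0.9] * 4,
)
```

In the first series the arithmetic mean fitness of each homozygote is no lower than that of the heterozygote, yet both alleles can invade when rare because fluctuation depresses the homozygotes' geometric means. In the second series no fluctuation occurs and the test fails.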

It is an interesting question to ask whether the Linanthus researchers’ explanation of the blue/white polymorphism is successful. On the one hand, their application of population genetics is assailable on many grounds. The authors are aware of many of these grounds, and many should be familiar from Glymour’s article. On the other hand, their work should hardly be considered done in vain; arguably they establish at least that Sewall Wright’s earlier explanation of the same polymorphism is a failure. The researchers themselves write: “Our goal is to make empirically useful qualitative and quantitative predictions in terms of estimable parameters rather than capturing all aspects of the biology” (Turelli et al. 2001, 1284). For instance, Turelli et al. claim to have shown why the white morph is more prevalent than the blue despite its lower arithmetic and geometric mean fitness (Turelli et al. 2001, 1296). Because the study illustrates the limits of population genetics modeling, it is worthwhile to discuss the limitations of the application of the formal modeling techniques from population genetics at greater depth.

For one thing, the temporally variable selection model that the researchers use is idealized. An age-structured model capturing age-dependent variations in seed survival and germination rates would be more appropriate than the simpler one the Linanthus researchers use instead. For another thing, the authors suspect, and later themselves experimentally establish (Schemske and Bierzychudek 2007), that causal influences that interact with genotypic variations are at work in the Linanthus population despite not being featured in their original model. The authors also assume a lognormal distribution of fitnesses, which, as Glymour argues, will only be valid provided that the population members are not subject to unaccounted-for interactive causes of varying values. These are but some of the many ways in which the Linanthus researchers deliberately assume, idealize, suppress complexities, simplify, and leave causes unquantified. It should be noted, too, that some of the analytical techniques used by the researchers involve further idealizations. The diffusion analysis is strictly speaking appropriate for haploids.

The seed bank is a source of difficulties that the researchers discuss at especially great length:

This damping [effect of the seed bank] also implies that even a study that spans more than a decade may be insufficient for studying the range of environmental conditions responsible for the polymorphism or for determining whether the polymorphism is transient or stable. … The long-term fluctuations predicted by the diffusion’s stationary distribution would take well over 50 years to observe with plausible levels of seed longevity. Equally sobering is the fact that thorough studies of adult fitness spanning more than a decade would have to be supplemented by even more elaborate and sustained studies of seed bank demography and selection to understand all of the biology relevant to the transient behavior of this “simple” polymorphism. As demonstrated by the insights that have emerged from the decades of study of Darwin’s finches (Grant 1986; Grant and Grant 1989; Grant and Grant 1997), progress in merging evolutionary genetics with ecology may often require such long-term studies. (Turelli, Schemske and Bierzychudek 2001, 1296)

These remarks are clearly reminiscent of Glymour’s argument from direct estimation.

While the obstacles to the deployment of an exact model encountered by the Linanthus researchers amply illustrate many of the limitations of population genetics modeling techniques, the Linanthus study does nonetheless serve to underwrite one crucial point being defended here: classical population genetics models, specifically temporally variable selection models, can be used to quantify interactive causes whose values change over the projection period independently of system dynamics. However many causes the Linanthus researchers may overlook or idealize away, and at whatever cost, they do deploy fitness variables that are functions of time to capture the temporally varying interactive causal influence of precipitation. Moreover, the difficulties that researchers face in actually deploying dynamically sufficient models for complex natural populations do not show that the theory itself deploys the wrong variables related by the wrong equations employing the wrong kinds of parameters. Indeed, it is a lack of data, not a lack of available modeling techniques, which prevents the use of an age-structured model that would better capture the dynamics of dormant seeds (Turelli et al. 2001, 1284). Especially because the Linanthus researchers chiefly face epistemic barriers to the deployment of more exact models for their system, their difficulties do not provide especially strong grounds for questioning the core commitment that population genetics provides the means for describing and understanding natural selection and the evolutionary events it produces. No one would impugn physical theory merely because falling leaves are systems whose exact dynamics cannot be inferred using the theory owing to insurmountable epistemic barriers.

Causation in classical population genetics

Because Glymour overlooks the use of spatially variable selection models, temporally variable selection models, frequency-dependent selection models, and niche construction models, his arguments for the incompetence of population genetics are unsound; they are not, however, invalid. Glymour has supplied us with some very powerful arguments, but he has mischaracterized their implications. Glymour has not shown that population genetics models are inadequate to convey an understanding of selection. But what he has done is show that classical population genetics cannot be deployed without careful attention to the causal influences that operate on individual members of a target population. Indeed, the idea that predictively and explanatorily adequate models of system dynamics must be causal models is precisely what Glymour argues in section 5 of his essay (Glymour 2006, 384–387).

Recent debate between advocates of the statisticalist interpretation of population genetics and advocates of the dynamical/causal view has centered on whether selection and drift are “population-level” causes. According to the statisticalists, the causes of system dynamics operate upon individual population members and fitness variables in population genetics equations are merely summaries or accumulations of the fundamental events that lead individual population members to reproduce to the extent that they do (Matthen and Ariew 2002, 82, see also Walsh et al. 2002).8 Some advocates of a causal or dynamical interpretation of evolutionary theory effectively grant that population genetics equations abstract away from “individual-level” processes, but hold that selection should nonetheless be understood as a causal process, though a “population-level” one (e.g., Millstein 2006; for a couple of alternative approaches, see Abrams 2007; Haug 2007). Sober, the source of the received view, is quite explicit that selection is a dynamical force that operates over populations, even though fitness variables are akin to life expectancies that take into account a variety of causal influences over individual mortality:

In assigning you a life expectancy, an actuary assembles an overall picture of your chances of surviving another n years. … These probabilities are supposed to take into account your chances of mortality due to all possible causes, even ones that, as it turns out, will not do you in. (Sober 1984, 95, see also Stephens 2004 for a defense of this view).

But what Glymour’s arguments show is that we cannot manipulate the values of fitness variables in classical equations to effectively capture the impact of multiple causes that operate over individual population members if the distribution of types to environments changes over time. In an evolving population, such change is all but inevitable for interactive causes with multiple values. If some population members of a given type face one value for an environmental cause that interacts with individuals’ genetic variations, and other population members of the same type face another value for it (say because the latter happen to live in a different region of the ecosystem or at a later time), then fitness coefficients weighting the frequency of that type cannot be set at any value that would sustain credible inferences about population dynamics during the projection period. The same goes for other sorts of environmental variables that Glymour considers, ones that affect their future values or for other reasons show evolving statistical associations with types of individuals in the model: there is no way to alter fitness values to accommodate such causes. Accordingly, Glymour sees his arguments as undermining both the received and the statisticalist interpretations, since his arguments undermine a shared assumption that advocates of both views make about fitness variables, the assumption that fitness variables function as life expectancies do, capturing the impact of a range of causes, including perhaps even possible causes, that operate over individuals and quantifying the net effect of these: “On both [received and statisticalist] conceptions, [population genetics] equations and no others are required for explaining central tendencies. In this, both conceptions err” (Glymour 2006, 383; see also Brandon and Ramsey 2007).
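A small numerical sketch may make the difficulty concrete. Suppose, purely for illustration, that a genotype has one fitness on toxic soil and another on benign soil, and that the share of its bearers on toxic soil differs between two generations. All names and numbers below are hypothetical.

```python
def marginal_fitness(share_in_env, fitness_in_env):
    """Average fitness of a type across environments, weighted by exposure."""
    return sum(share_in_env[env] * fitness_in_env[env] for env in fitness_in_env)


# Hypothetical context-specific fitnesses for one genotype.
fitness_AA = {"toxic": 0.5, "benign": 1.0}

# Hypothetical shares of AA individuals in each environment, two generations.
exposure_early = {"toxic": 0.4, "benign": 0.6}
exposure_late = {"toxic": 0.1, "benign": 0.9}

w_early = marginal_fitness(exposure_early, fitness_AA)  # about 0.80
w_late = marginal_fitness(exposure_late, fitness_AA)    # about 0.95
```

Because the exposure shares shift as the population evolves, no single constant assigned to the genotype reproduces both generations. A model needs either a partition by environment or a fitness function that tracks the shifting exposure.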

To be predictively accurate, classical equations must, as Glymour has shown, reflect causal facts at the “individual-level.” As we have seen, the formalism is indeed equipped to do just this. For instance, when appropriate values are assigned to the right-hand side in (2) above, that equation literally dictates that 37 AA homozygotes face toxic soil in generation 5, while 22 do so in generation 17, 14 in generation 24, 18 in generation 33, and so on. Moreover, the dynamics of natural systems subject to variable selection regimes can be very sensitive to the values of the variables in the system; a system can go to fixation rather than gravitating to a stable polymorphism if just a handful of individuals are subject to a different value of an interactive environmental cause.9 It can matter a great deal exactly how many individuals of each type are influenced by which value of an environmental cause.

Still, the idea that population geneticists must abstract away from the causes that operate over individual population members is hardly groundless. It is evident that a host of causal influences that operate over population members are not individually represented even in sophisticated population genetics models. Consider the Linanthus population again. While the model that the Linanthus researchers deploy takes into account the temporally variable interactive causal influence of precipitation levels upon the system, there are surely more causal influences over the survival and reproduction of individual Linanthus morphs than just the rain. For instance, in order to reproduce, the morphs must be visited by their pollinator, the Trichochorous beetle (Schemske and Bierzychudek 2001, 1273); presumably only some population members are pollinated. Furthermore, individuals’ reproduction rates are probably sensitive to their exposure to soil nutrients, and presumably nutrient levels in the soil are not absolutely homogeneous across the entire area studied by Schemske and Bierzychudek. A few of the plants are also occasionally subject to the destructive influence of herbivores (Schemske and Bierzychudek 2001, 1273). The plants will also vary genetically, even within their morph classes, and some of these genes must exert a causal influence on the reproduction of their bearers. Other causal influences no doubt operate over the Linanthus population, too. In short, a host of causal influences operate over individual morphs that are not individually represented in any way in the equations deployed by the Linanthus researchers; the formalism features neither partitions nor functions that are in place to capture them.

There is a tension, then, between Glymour’s arguments, on the one hand, and the non-representation of causal influences in population genetics, on the other hand. The way out of this apparent tension is to recognize that Glymour’s arguments apply most forcefully to interactive causal influences and ones that exhibit systematic statistical associations with other causal influences (perhaps stemming from their influence over their future values). If individual population members are subject to such causes, and the formal model lacks partitions or functions for them, then we will not get the dynamics of the system right (except by chance). But if individuals are subject to non-interactive causal influences that exhibit no systematic statistical associations with target genetic variations, then we need not expect to get the dynamics during the projection period wrong even if we leave these causes out of the formal model, provided the extent of their influence is not too great. Consider again Fig. 3. Equations (1) would function as decent tools of inference for a sufficiently large population, provided E1–E5 do not interact with the genetic variations that distinguish the zygotes, and are not systematically statistically associated with any of them, and the population is effectively large enough that such causes are unlikely to swing relative frequencies a great deal in any generation. Such causes are commonly idealized away, since, in effectively large populations, they will in all likelihood merely alter the rates at which equilibriums are achieved, and keep populations hovering around their equilibriums once they get there.

Causal influences over system dynamics that are non-interactive and exhibit no statistical associations with target variations can be incorporated into formal models through stochastic parameters, such as effective population size parameters, that capture all such random influences through a single variable. Thus, the intuition that there is a pooling of causes in population genetics such that a great many are quantified by a single parameter is not wrongheaded. But fitness parameters cannot, and do not, function this way; effective population size parameters and variance terms in diffusion theory do so instead (for details, see Gildenhuys 2009). This is perhaps worth emphasizing: in no models, idealized or otherwise, are fitness coefficients used to capture general trends or what happens as a result of causal variables that have not been explicitly measured and modeled; in some models effective population size variables are used to capture such causes while in other models the impact of such causes is idealized away. It should be acknowledged, however, that populations that are effectively small are ones whose dynamics are profoundly susceptible to the impact of the sorts of causes that are captured by effective population size variables, and population genetics equations provide predictively less powerful, if not useless, means of inferring dynamics in such cases.
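The way a single effective population size parameter pools many unmodeled random influences can be illustrated with a Wright–Fisher style sampling step. This is a sketch under a standard idealization, binomial sampling of 2Ne gametes with no selection, and is not taken from any of the studies discussed; the function names, the seed, and the parameter values are all hypothetical.

```python
import random


def drift_step(p, n_e, rng):
    """Draw 2 * n_e gametes at random; return the new allele frequency."""
    copies = sum(1 for _ in range(2 * n_e) if rng.random() < p)
    return copies / (2 * n_e)


def one_generation_variance(p, n_e, replicates, seed=0):
    """Variance of the allele frequency after one generation of pure drift."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outcomes = [drift_step(p, n_e, rng) for _ in range(replicates)]
    mean = sum(outcomes) / replicates
    return sum((x - mean) ** 2 for x in outcomes) / replicates


# Hypothetical comparison of an effectively small and an effectively large population.
var_small = one_generation_variance(0.5, n_e=10, replicates=500)
var_large = one_generation_variance(0.5, n_e=500, replicates=500)
```

Nothing in the sketch identifies which individual-level causes produce the scatter. The single parameter `n_e` fixes its magnitude, and the scatter is much larger when the population is effectively small, which is where the deterministic equations become poor tools of inference.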

Schemske and Bierzychudek’s Linanthus study provides a clear example of population geneticists who deploy exactly the above criterion for handling causal influences over individual population members in their research. The Linanthus researchers know that, in order to produce descendants, the plants must be visited by the Trichochorous beetle. They respond by spending over 100 h counting pollinator visits to make sure that one morph is not preferentially visited by the beetle (Schemske and Bierzychudek 2001, 1273); they conclude that there is no statistically significant association between pollination and flower color (Schemske and Bierzychudek 2001, 1277). What the researchers are verifying is not that the pollinator does not cause the reproduction of the individual plants (which would be absurd), but merely that there exist no statistical associations between genotype and pollinator visitation. Similarly, the researchers send samples of the morphs to the laboratory and verify that neither morph makes more efficient use of available water resources (Schemske and Bierzychudek 2001, 1277). They are not checking that water does not exert a causal influence over the reproduction of their morphs (again, absurd); rather, they are making sure that this causal influence does not interact with the genotypic variations that distinguish the morphs to produce differences in water use efficiency. That genotypic variations exhibit no interactions or statistical associations with these causes is a necessary condition for the researchers’ use of a deterministic population genetics model that ignores their impact, though, as noted earlier, such causes may nonetheless have a random impact on system dynamics. The Linanthus researchers do, however, deploy stochastic diffusion approximations for the dynamics of their system that do take these myriad random influences into account, ones that are used to reinforce the analytic results they derive for the system under temporally variable selection.

Conclusion

The critical upshot of my explanation of why Glymour’s arguments do not show that population genetics is predictively and explanatorily incompetent can be summarized this way: in classical population genetics, relative frequency terms and relative fitness functions contain causal information. The deployment of a model in which individuals are not distinguished by sex is a commitment to the non-existence of sex as an interactive causal influence over the dynamics of the system. The deployment of a model in which the population is not partitioned by membership in multiple niches is a commitment to the non-existence of interactive environmental influences over system dynamics that take more than one value. A commitment to a two-locus model is a commitment to the nonexistence of a third locus with genetic variations that interact with the alleles at either of the loci featured in the model, and so on. Conversely, a model in which the population is partitioned into causal contexts, one in which the fitness coefficient associated with a genetic variation in one causal context is distinct from the coefficient that weights it in an alternative causal context, is a model of a population in which the context is serving as an interactive cause of population dynamics. What Glymour’s arguments show is that an interpretation of fitness variables according to which these quantify more than the causal influences of the variations that distinguish their bearers, or the arguments that appear in the functions that set them, must fail. They do not show the failure of population genetics theory, however, since that theory has the resources to quantify such causes by means of partitions, fitness functions and stochastic parameters.

It should be stressed, too, that the deployment of population genetics over natural populations may involve making false and idealizing assumptions for the purposes of generating tractable systems of equations. In the body of the paper, I discuss a variety of techniques that can be used to quantify the impact of environmental causes to generate tractable models. But while population genetics contains these techniques, the theory is not without its limitations. This is evident in the Linanthus study, whose authors make a number of unrealistic assumptions about their population in order to generate results about its dynamics during the projection period. The situation with population genetics is probably no different in this respect than that with other formal sciences. Population genetics is only predictively competent with respect to those populations for which the formal apparatus of the theory can be used to generate results, an array of systems that is larger than just those that can be modeled using exogenous fitness variables weighting frequency terms for individuals distinguished on the basis of genotype alone, but nonetheless not so large as to include any arbitrarily causally complex natural population.

It should be stressed, in closing, that everything said above is meant to apply to classical models only. Glymour makes many of his claims about population genetics in general, and other philosophers have been equally willing to write about population genetics, rather than about particular formalisms. The above description of population modeling practice is not meant to apply beyond the scope of a particular formalism and the conclusions drawn above should not be understood to hold outside the context of classical models.

Footnotes
1

The quote is not an isolated one. Glymour also writes that “for a large range of cases population genetics is both explanatorily and predictively incompetent” (Glymour 2006, 371) and that models in population genetics fail to “reliably predict the trajectory of particular populations through state-space defined by the frequencies of types in the population [or] changes in state variables consequent to interventions on other state variables” (Glymour 2006, 372).

 
2

The formal machinery discussed below for modeling environmental and other causes is not my invention; it is derived from population genetics textbooks and the primary literature.

 
3

Glymour (2008) discusses a case in which fitnesses are relativized to sub-group membership in another paper.

 
4

It is important to be clear exactly what is meant here. Partitions made to accommodate environmental causes can only be made if the existence of such causes is recognized. Moreover, the range of values of the environmental causes must be discerned for the contextualized fitnesses to take the right values. So environmental variables must be used in population genetics, but no variables that refer to them appear in the equations. Rather, the variables that appear both do appear and take the values they do because environmental variables have been recognized.

 
5

The edge loadings from gametes to zygotes represent rates at which zygotes are created from gametes. In the absence of such things as assortative mating, gametes pair with other gametes at rates equivalent to the frequencies at which the gametes exist in the population, that is, “randomly.” So zygote formation functions take gamete frequencies as independent variables and gamete frequencies as coefficients, leading to Hardy–Weinberg frequencies if we add heterozygote frequencies (as we do in the absence of genomic imprinting).

 
6

I should note that the above system of equations makes a great many non-mandatory assumptions about the population for the sake of a definite example.

 
7

Again, I make a host of assumptions for the sake of definiteness. I also leave the c parameters unspecified, though they will typically be set by functions of homing parameters and sub-environment size.

 
8

Matthen (2009) takes a more nuanced view, claiming that population genetics “suppresses” some causal influences that operate over individual population members, factors that meet one of a couple of conditions, microconstancy and metaconstancy. His stance is similar to the one I take up below, though we differ on how to specify what causes can be “suppressed” in population genetics models.

 
9

See Hedrick (1990) for a discussion of the range of values in which spatially variable selection leads to polymorphisms rather than fixation.

 

Acknowledgments

Thanks to an anonymous referee for many helpful comments that greatly improved the quality of the paper. Thanks also to Gillian Barker for one single extremely helpful suggestion.

Copyright information

© Springer Science+Business Media B.V. 2011