Conservation Genetics, Volume 11, Issue 6, pp 2219–2229

A new method for the partition of allelic diversity within and between subpopulations

Authors

  • A. Caballero
    • Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo
  • Silvia T. Rodríguez-Ramilo
    • Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo
Research Article

DOI: 10.1007/s10592-010-0107-7

Cite this article as:
Caballero, A. & Rodríguez-Ramilo, S.T. Conserv Genet (2010) 11: 2219. doi:10.1007/s10592-010-0107-7

Abstract

A method is proposed for the analysis of allelic diversity in the context of subdivided populations. The definition of an allelic distance between subpopulations allows for the partition of total allelic diversity into within- and between-subpopulation components, in a way analogous to the classical partition of gene diversity. A new definition of allelic differentiation, AST, between subpopulations results from this partition, and is contrasted with the concept of allelic richness differentiation. The partition of allelic diversity makes it possible to establish the relative contribution of each subpopulation to the within- and between-subpopulation components of diversity, with implications for prioritisation in conservation. A comparison between this partition and that corresponding to allelic richness is illustrated with an example. Computer simulations are used to investigate the behaviour of the new statistic AST in comparison with FST for a finite island model under a range of mutation and migration rates. AST has less dependence on migration rate than FST for large values of migration rate, but the opposite occurs for low migration rates. In addition, the variance in the estimates of AST is higher than that of FST for low mutation rates, but the opposite holds for high mutation rates.

Keywords

Allelic diversity, Gene diversity, Heterozygosity, Rarefaction, Conservation priorities

Introduction

Most developments regarding priority of subpopulations for conservation are based on measures of gene diversity or expected heterozygosity (Nei 1973). However, allelic richness (the number of different alleles segregating in the population) is an alternative criterion to measure genetic diversity, and some authors (e.g. Petit et al. 1998; Notter 1999; Barker 2001; Simianer 2005; Foulley and Ollivier 2006) have considered that this parameter is of key relevance in conservation programmes. A high number of alleles implies a source of single-locus variation for important traits such as the major histocompatibility complex, which is responsible for the recognition of pathogens. Allelic richness is also important from a long-term perspective, because the limit of selection response is determined by the initial number of alleles (James 1970; Hill and Rasbash 1986) and, because it is more sensitive to bottlenecks than expected heterozygosity, it reflects better past fluctuations in population size (Nei et al. 1975; Allendorf 1986; Cornuet and Luikart 1996; Luikart et al. 1998; Leberg 2002). Marker-assisted methods for maximising the number of alleles conserved have been proposed and shown to be effective (Schoen and Brown 1993; Bataillon et al. 1996; Fernández et al. 2004).

Allelic richness provides complementary information to gene diversity (expected heterozygosity). Populations can have the same heterozygosity but different allelic richness, and vice versa. For instance, a population with two alleles with frequencies 0.5 each has the same expected heterozygosity (0.5) as a population with five alleles with frequencies 0.69, 0.08, 0.08, 0.08 and 0.07, and the same as another with 10 alleles with frequencies 0.7, 0.04 (six alleles) and 0.02 (three alleles). However, these different population compositions can have different consequences for the potential of the population for adaptation and evolution. Allelic richness and gene diversity can also behave differently in terms of genetic differentiation between subpopulations in the context of a subdivided population. El Mousadik and Petit (1996) proposed a coefficient of allelic richness differentiation (ρST) and found that this parameter gives higher values than gene diversity differentiation (FST; Wright 1969) in an analysis of allozymes in argan trees.
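The three allele-frequency sets quoted above can be checked numerically. The following is a minimal sketch, in which the function name `expected_heterozygosity` and the dictionary labels are illustrative choices rather than anything defined in the paper; it evaluates H = 1 − Σ p² for each set.

```python
def expected_heterozygosity(freqs):
    """Gene diversity at one locus: H = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


# Allele-frequency sets quoted in the text (labels are illustrative).
populations = {
    "2 alleles": [0.5, 0.5],
    "5 alleles": [0.69, 0.08, 0.08, 0.08, 0.07],
    "10 alleles": [0.7] + [0.04] * 6 + [0.02] * 3,
}

heterozygosity = {
    name: expected_heterozygosity(freqs) for name, freqs in populations.items()
}

for name, h in heterozygosity.items():
    # Number of alleles differs widely while H stays close to 0.5.
    print(f"{name}: K = {len(populations[name])}, H = {h:.4f}")
```

All three values agree with 0.5 to within about 0.001, while the number of alleles ranges from 2 to 10.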

The partition of genetic diversity into within- and between-subpopulation components has important applications for the prioritisation of populations in conservation (Petit et al. 1998; Caballero and Toro 2002; Foulley and Ollivier 2006; Ollivier and Foulley 2009; Toro et al. 2009; Wilson et al. 2009). The contribution of each subpopulation to total allelic richness and its components has been derived by Petit et al. (1998). In this partition, the within-subpopulation component refers to the average allelic richness of the subpopulations, whereas the contribution of each subpopulation to total allelic richness relies basically on the number of alleles present in that subpopulation and absent in the others, thus being a measure of the allelic uniqueness of that subpopulation, also called its private allelic richness (Petit et al. 1998; Foulley and Ollivier 2006; Ollivier and Foulley 2009; Toro et al. 2009). In this paper we propose an alternative partition of allelic diversity into within- and between-subpopulation components, in analogy to the corresponding partition for gene diversity. We also propose a parameter for measuring allelic differentiation between subpopulations (which we will call AST) that differs from the allelic richness differentiation parameter ρST of El Mousadik and Petit (1996). The differences between the two methods are illustrated with an example. Finally, computer simulations are carried out to explore the behaviour of AST in comparison with Wright’s FST for a finite island model under a range of mutation and migration rates.

Gene and allelic differentiation between subpopulations

In a structured population with n subpopulations, the total gene diversity or expected heterozygosity (HT) can be partitioned into a component within subpopulations (HS) and another (HT − HS) between subpopulations (Nei 1973),
$$ H_{S} = 1 - \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {\sum\limits_{k = 1}^{K} {p_{i,k}^{2} } } \right)} , $$
(1)
$$ H_{T} = 1 - \sum\limits_{k = 1}^{K} {\left( {\sum\limits_{i = 1}^{n} {{\frac{{p_{i,k} }}{n}}} } \right)}^{2}, $$
(2)
where pi,k is the frequency of allele k for a given locus in subpopulation i and K is the total number of alleles in the population. The between-subpopulation component of gene diversity (HT − HS) is also the average Nei’s minimum distance between populations
$$ D_{G} = {\frac{1}{{n^{2} }}}\left[ {\sum\limits_{i,j = 1}^{n} {d_{G,ij} } } \right] , $$
(3)
where \( d_{G,ij} = \frac{1}{2}\sum\limits_{k = 1}^{K} {\left( {p_{i,k} - p_{j,k} } \right)^{2} } \) is the distance between subpopulations i and j. Consequently,
$$ H_{T} = H_{S} + D_{G} . $$
(4)
Wright’s (1969) FST (we will keep the term FST rather than GST even for the multiallelic case, for simplicity) is defined as the proportion of diversity between subpopulations relative to the total diversity,
$$ F_{ST} = \frac{H_{T} - H_{S}}{H_{T}} = \frac{D_{G}}{H_{T}} . $$
(5)

When more than one locus is considered, HS, DG, and HT should be calculated for each locus with expressions (1–3), the values averaged over loci, and introduced into Eq. 5 to obtain the average FST. This is equivalent to averaging FST values from each locus weighted by their corresponding HT, a common procedure to average estimates of FST over loci (Reynolds et al. 1983; Weir and Cockerham 1984).
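As a hedged illustration of Eqs. 1–5 and of the averaging over loci just described, the sketch below computes HS, HT and DG for one locus and then a multilocus FST from the locus averages. The function names are hypothetical, and the example frequencies are those of the four-subpopulation case that the paper presents in Table 2A.

```python
def gene_diversity_partition(freqs):
    """Return (HS, HT, DG) for one locus.

    freqs[i][k] is the frequency of allele k in subpopulation i.
    """
    n = len(freqs)
    K = len(freqs[0])
    # Eq. 1: average within-subpopulation gene diversity.
    hs = 1.0 - sum(p * p for row in freqs for p in row) / n
    # Eq. 2: gene diversity of the averaged allele frequencies.
    mean_p = [sum(row[k] for row in freqs) / n for k in range(K)]
    ht = 1.0 - sum(p * p for p in mean_p)
    # Eq. 3: average Nei's minimum distance over all n^2 ordered pairs.
    dg = sum(
        0.5 * sum((freqs[i][k] - freqs[j][k]) ** 2 for k in range(K))
        for i in range(n)
        for j in range(n)
    ) / n ** 2
    return hs, ht, dg


def fst_over_loci(loci):
    """Average DG and HT over loci, then apply Eq. 5."""
    parts = [gene_diversity_partition(freqs) for freqs in loci]
    mean_ht = sum(p[1] for p in parts) / len(parts)
    mean_dg = sum(p[2] for p in parts) / len(parts)
    return mean_dg / mean_ht


# Frequencies of Table 2A (rows: subpopulations I-IV, columns: alleles 1-6).
table_2a = [
    [0.2, 0.3, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.3, 0.1, 0.1, 0.3],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],
]

hs, ht, dg = gene_diversity_partition(table_2a)
fst = fst_over_loci([table_2a])
print(f"HS = {hs:.4f}, HT = {ht:.4f}, DG = {dg:.4f}, FST = {fst:.2f}")
```

For this single locus the sketch gives HT = HS + DG, as required by Eq. 4, and an FST that rounds to the 0.22 reported in Table 2A.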

In the case of allelic richness, El Mousadik and Petit (1996) proposed a coefficient of subpopulation differentiation. Because allelic richness is highly dependent on sample size, they proposed to estimate the expected number of alleles in samples of a specified size by using the rarefaction methodology (Sanders 1968; Hurlbert 1971). In this approach, the smallest sample size is chosen as a reference to examine the number of alleles present in all samples. In the context of a subdivided population, if Nik represents the number of copies of the kth allele from the sample of a given subpopulation i and Ni represents the total number of genes present in that subpopulation, the allelic richness at one locus is defined as the expected number of different alleles that the sample would contain if its size had been g genes (usually the smallest sample size) instead of Ni (≥g). The expected number of different alleles in a sample of g genes taken at random is then equal to
$$ a_{i} = \sum\limits_{k = 1}^{K} {\left( {1 - P_{ik} } \right)} . $$
(6)
(El Mousadik and Petit 1996), where
$$ P_{ik} = \binom{N_{i} - N_{ik}}{g} \bigg/ \binom{N_{i}}{g} $$
(7)
is the probability that allele k does not occur in a sample of g genes chosen at random. If Ni = g, rarefaction is not necessary, and Pik in expression (7) should be 0 when Nik > 0 or 1 when Nik = 0. For Ni > g, Pik should be set to zero when Ni − Nik < g, because a sample of g genes must then contain at least one copy of allele k.
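Equations 6 and 7 can be sketched in a few lines. This is an illustrative sketch, not the authors' program: the function names are hypothetical, and the example counts assume a subpopulation of 100 genes with the frequencies of subpopulation II in Table 2A. Python's `math.comb(n, k)` returns 0 when k > n, which reproduces the special cases described above without extra branching.

```python
from math import comb


def prob_allele_missing(n_i, n_ik, g):
    """Eq. 7: probability that an allele with n_ik copies among n_i genes
    is absent from g genes drawn at random without replacement."""
    if not 0 < g <= n_i:
        raise ValueError("g must satisfy 0 < g <= n_i")
    # comb() is 0 when fewer than g genes lack the allele (n_i - n_ik < g).
    return comb(n_i - n_ik, g) / comb(n_i, g)


def rarefied_richness(counts, g):
    """Eq. 6: expected number of distinct alleles in a sample of g genes."""
    n_i = sum(counts)
    return sum(1.0 - prob_allele_missing(n_i, c, g) for c in counts)


# Assumed gene counts: 100 genes, alleles 1-6 at 0, 20, 30, 10, 10, 30 copies.
counts_sub2 = [0, 20, 30, 10, 10, 30]

for g in (100, 40, 10):
    print(f"g = {g:3d}: a_i = {rarefied_richness(counts_sub2, g):.3f}")
```

With g equal to the full sample size the function returns the observed number of alleles (5 here); smaller g gives a lower expected count because the rarer alleles are likely to be missed.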
The coefficient of allelic richness differentiation proposed by El Mousadik and Petit (1996) is
$$ \rho_{ST} = {\frac{{\left( {R_{T} - R_{S} } \right)}}{{\left( {R_{T} - 1} \right)}}} , $$
(8)
where
$$ R_{S} = \frac{1}{n}\sum\limits_{i = 1}^{n} {a_{i} } $$
(9)
is the average allelic richness over subpopulations and RT is the richness of the total population, obtained from Eqs. 6–7 replacing Ni and Nik by NT and NTk, respectively, the corresponding numbers in the total population. The minus one term in Eq. 8 refers to the fact that a subpopulation with a single allele is considered to be devoid of allelic richness. The parameter ρST has been called AST by several authors (e.g. Comps et al. 2001; Tyler 2002; Persson et al. 2004; Stefenon et al. 2008). However, we will keep the original term to avoid confusion in notation. Note that the definition of ρST depends on the average allelic richness within subpopulations assuming that a sample of g genes is taken from each, and the allelic richness of the total population, again assuming that a sample of g genes is taken from it. Thus, the specific allelic differences between subpopulations are not necessarily captured by this definition. In the hypothetical case where rarefaction is not necessary or is not made, RS would be the average number of alleles over subpopulations and RT = K, the total number of alleles in the whole population. Then, \( \rho_{ST} = (K - R_{S})/(K - 1) \) depends only on the average number of alleles within subpopulations, irrespective of their distribution among subpopulations.

A definition of allelic differentiation between subpopulations

Following a derivation analogous to that of FST, it is possible to define a partition of allelic diversity into within and between-subpopulation components. The within-subpopulation component of allelic diversity is simply the average allelic richness across subpopulations defined above minus one,
$$ A_{S} = R_{S} - 1 = \left( {\frac{1}{n}\sum\limits_{i = 1}^{n} {a_{i} } } \right) - 1 , $$
(10)
where ai is defined by expressions (6–7). Now, as suggested by Foulley and Ollivier (2006) and Ollivier and Foulley (2009) (see also Weitzman 1998), an allelic dissimilarity or distance between two subpopulations may be defined as the number of alleles present in a subpopulation and absent in the other. Thus, the average allelic distance between subpopulations i and j can be obtained as
$$ d_{A,ij} = \frac{1}{2}\sum\limits_{k = 1}^{K} {\left[ {\left( {1 - P_{ik} } \right)P_{jk} + P_{ik} \left( {1 - P_{jk} } \right)} \right]} , $$
(11)
and the average distance between all subpopulations is
$$ D_{A} = {\frac{1}{{n^{2} }}}\left[ {\sum\limits_{i,j = 1}^{n} {d_{A,ij} } } \right] . $$
(12)
Hence, the total allelic diversity (AT) is the sum of both components,
$$ A_{T} = A_{S} + D_{A} = \left[ \frac{1}{n}\sum\limits_{i = 1}^{n} \left( a_{i} + \frac{1}{n}\sum\limits_{j = 1}^{n} d_{A,ij} \right) \right] - 1 = \left[ \frac{1}{n^{2}}\sum\limits_{k = 1}^{K} \sum\limits_{i,j = 1}^{n} \left( 1 - P_{ik} P_{jk} \right) \right] - 1 . $$
(13)
AT is not only the sum of two terms with clear meanings, but it also has a meaning in itself: the average pairwise diversity of subpopulations, i.e. the number of different alleles available in each pairwise grouping of subpopulations. From the above expressions, the coefficient of allelic differentiation is defined as
$$ A_{ST} = \frac{A_{T} - A_{S}}{A_{T}} = \frac{D_{A}}{A_{T}} . $$
(14)

Note that, in contrast to the parameter ρST defined by El Mousadik and Petit (1996), the coefficient AST depends on the allelic differences between subpopulations, not only on their average number of alleles. Thus, AST gives a measure proportional to the number of alleles in which two randomly chosen subpopulations differ. If all subpopulations carry a single allele distinct from all the others, AST = 1 (as FST). If all subpopulations share the same alleles, AST = 0 but FST can take different values depending on the allele frequencies. Analogously to FST, when more than one locus is considered, AS, DA and AT should be calculated for each locus with expressions (10–13), the values averaged over loci, and introduced into Eq. 14 to obtain the average AST.

An example illustrating the above estimates for a single locus is given in Table 1. Assume a population composed of 6 subpopulations with a total of K = 9 alleles, where 1 denotes presence of the allele and 0 denotes absence. For simplicity, let us assume that rarefaction is not necessary (e.g. all samples have the same size). Consider first the case of Table 1A, where each of the subpopulations has four alleles. The matrix of pairwise allelic diversities between subpopulations is given in Table 1B. Note that, for example, the pairwise diversity between subpopulations I and II is 5, because there is a total of five alleles in the set of the two subpopulations, and that between subpopulations I and VI is 8. The average of all elements in the matrix (minus one) is AT = 4.889, and the average of the diagonal (minus one) is AS = 3, the average allelic diversity within subpopulations. The matrix of allelic distances between subpopulations is given in Table 1C, where each entry is the number of alleles present in one subpopulation and absent in the other (i.e. twice the value of dA,ij in Eq. 11). Thus, for example, the allelic distance between subpopulations I and II is 2 because there is one allele in I that is not present in II and another in II that is not present in I. Half the average of the elements in this matrix is DA = 1.889. Thus, the allelic differentiation from Eq. 14 is AST = DA/AT = 0.386. Without rarefaction, RT in Eq. 8 equals K = 9, the total number of alleles, and RS = AS + 1 = 4; thus, in this example, ρST = (K − RS)/(K − 1) = 0.625.
Table 1

(A) Allelic composition of a population with n = 6 subpopulations with K = 9 alleles for a single locus, where 1 denotes presence of the allele and 0 denotes absence. (B) Matrix of pairwise allelic diversities between subpopulations corresponding to the case in (A). The diagonal is the number of alleles in each subpopulation. (C) Matrix of allelic distances between subpopulations corresponding to the case in (A). (D) Allelic composition of a population with the same average number of alleles as in (A) but a different distribution among subpopulations

 

(A)       Alleles
          a  b  c  d  e  f  g  h  i
Sub I     1  1  1  1  0  0  0  0  0
Sub II    0  1  1  1  1  0  0  0  0
Sub III   0  0  1  1  1  1  0  0  0
Sub IV    0  0  0  1  1  1  1  0  0
Sub V     0  0  0  0  1  1  1  1  0
Sub VI    0  0  0  0  0  1  1  1  1

(B)       Sub I   Sub II   Sub III   Sub IV   Sub V   Sub VI
Sub I     4       5        6         7        8       8
Sub II    5       4        5         6        7       8
Sub III   6       5        4         5        6       7
Sub IV    7       6        5         4        5       6
Sub V     8       7        6         5        4       5
Sub VI    8       8        7         6        5       4

(C)       Sub I   Sub II   Sub III   Sub IV   Sub V   Sub VI
Sub I     0       2        4         6        8       8
Sub II    2       0        2         4        6       8
Sub III   4       2        0         2        4       6
Sub IV    6       4        2         0        2       4
Sub V     8       6        4         2        0       2
Sub VI    8       8        6         4        2       0

(D)       Alleles
          a  b  c  d  e  f  g  h  i
Sub I     1  1  1  1  1  1  1  1  1
Sub II    1  1  1  0  0  0  0  0  0
Sub III   1  1  1  0  0  0  0  0  0
Sub IV    1  1  1  0  0  0  0  0  0
Sub V     1  1  1  0  0  0  0  0  0
Sub VI    1  1  1  0  0  0  0  0  0

Consider now the case shown in Table 1D, where the average number of alleles per subpopulation is again four, but subpopulation I has all nine alleles whereas the other subpopulations have the first three alleles. In this case, AS = 3, AT = 3.833, DA = 0.833, and AST = 0.217, i.e. there is a much lower overall allelic differentiation than in case 1A. However, ρST is again 0.625, as it only depends on the average allelic richness per subpopulation.
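The figures quoted for cases 1A and 1D can be reproduced with a short sketch of Eqs. 8 and 10–14 for the no-rarefaction situation, where each Pik is simply 0 (allele present) or 1 (allele absent). The function name `allelic_partition` and the dictionary keys are illustrative choices, not part of the paper.

```python
def allelic_partition(presence):
    """Return AS, DA, AT, AST and rho_ST for one locus without rarefaction.

    presence[i][k] is 1 if allele k occurs in subpopulation i, else 0.
    """
    n = len(presence)
    K = len(presence[0])
    # Eq. 10: average number of alleles per subpopulation, minus one.
    a_s = sum(sum(row) for row in presence) / n - 1.0
    # Eqs. 11-12: half the number of alleles present in one subpopulation
    # and absent in the other, averaged over all n^2 ordered pairs.
    d_a = sum(
        0.5 * sum(presence[i][k] != presence[j][k] for k in range(K))
        for i in range(n)
        for j in range(n)
    ) / n ** 2
    a_t = a_s + d_a  # Eq. 13
    # Eq. 8 without rarefaction: RT is the number of alleles seen anywhere.
    r_t = sum(any(row[k] for row in presence) for k in range(K))
    rho_st = (r_t - (a_s + 1.0)) / (r_t - 1.0)
    return {"AS": a_s, "DA": d_a, "AT": a_t, "AST": d_a / a_t, "rhoST": rho_st}


# Table 1A: six subpopulations, each with four consecutive alleles.
table_1a = [[1 if i <= k < i + 4 else 0 for k in range(9)] for i in range(6)]
# Table 1D: subpopulation I has all nine alleles, the others the first three.
table_1d = [[1] * 9] + [[1, 1, 1] + [0] * 6 for _ in range(5)]

case_a = allelic_partition(table_1a)
case_d = allelic_partition(table_1d)
for label, res in (("1A", case_a), ("1D", case_d)):
    print(label, {key: round(value, 3) for key, value in res.items()})
```

Both cases give the same ρST, whereas AST separates them, which is the contrast the example is meant to show.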

The contribution of each subpopulation to gene or allelic diversity

The contribution of each subpopulation to the total gene diversity and its components can be obtained by decomposing the two terms of expression (4) for each subpopulation, as suggested by Caballero and Toro (2002). An analogous partition can be made for allelic diversity from expression (13). Table 2A gives a hypothetical case with four subpopulations for which this partition is illustrated for a single locus. For this example, total allelic diversity is AT = 3.688, with within- and between-subpopulation terms AS = 2.25 and DA = 1.438. The contributions of subpopulations I, II, III and IV to the within-subpopulation component are 0.5, 1, 0.25 and 0.5, respectively, simply proportional to their own allelic diversities (number of alleles minus one). The contributions to the between-subpopulation component are 0.344, 0.344, 0.344 and 0.406, respectively, denoting that subpopulation IV contributes proportionately more to allelic differentiation than the others.
Table 2

(A) Allelic frequencies for a single locus of a population with n = 4 subpopulations, with K = 6 alleles and the corresponding population differentiation parameters (FST, AST, ρST defined in the text). (B) Contributions (in %) of each subpopulation to the total allelic richness or diversity, and their within- and between-subpopulation components following the method of Petit et al. (1998) (allelic richness) and the new method (allelic diversity), respectively, calculated by disregarding each subpopulation and recalculating the loss (positive sign) or gain (negative sign) in richness/diversity when the subpopulation is removed

(A)
Subpopulation   Allele                                 Statistic
                1     2     3     4     5     6
I               0.2   0.3   0.5   -     -     -        FST = 0.22
II              -     0.2   0.3   0.1   0.1   0.3      AST = 0.39
III             0.5   0.5   -     -     -     -        ρST = 0.55
IV              -     -     -     0.3   0.3   0.4

(B)
                Allelic richness              Allelic diversity
Subpopulation   Total    Within   Between     Total    Within   Between
g = 100
  I             0.0      −1.7     +1.7        +0.6     −2.3     +2.8
  II            0.0      +11.7    −11.7       +18.6    +15.8    +2.8
  III           0.0      −8.3     +8.3        −8.5     −11.3    +2.8
  IV            0.0      −1.7     +1.7        +6.6     −2.3     +8.9
g = 90
  I             0.0      −1.7     +1.7        +0.6     −2.3     +2.8
  II            0.0      +11.7    −11.7       +18.6    +15.8    +2.8
  III           0.0      −8.3     +8.3        −8.5     −11.3    +2.8
  IV            +1.1     −1.7     +2.7        +6.6     −2.3     +8.9
g = 40
  I             −0.2     −1.7     +1.5        +0.6     −2.2     +2.8
  II            +0.0     +11.7    −11.7       +18.6    +15.8    +2.8
  III           +0.6     −8.4     +9.0        −8.5     −11.3    +2.8
  IV            +9.1     −1.7     +10.8       +6.6     −2.2     +8.9
g = 10
  I             −0.9     −1.1     +0.2        +0.0     −1.2     +1.2
  II            +1.3     +10.3    −8.9        +14.3    +11.5    +2.8
  III           +2.4     −8.7     +11.1       −8.4     −9.8     +1.3
  IV            +19.5    −0.5     +20.0       +10.3    −0.6     +10.9

The sample size is g; the total number of genes in each subpopulation is 100 (note that g = 100 implies no rarefaction)
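The term-by-term decomposition of Eq. 13 described for the Table 2A example can be sketched as follows, assuming no rarefaction so that only the presence or absence of each allele matters. The function name `diversity_contributions` is hypothetical.

```python
def diversity_contributions(presence):
    """Split AS and DA (Eq. 13) into one term per subpopulation.

    presence[i][k] is 1 if allele k occurs in subpopulation i, else 0.
    Returns (within, between), two lists with one entry per subpopulation.
    """
    n = len(presence)
    K = len(presence[0])
    # Within term of subpopulation i: (a_i - 1) / n.
    within = [(sum(row) - 1.0) / n for row in presence]
    # Between term of subpopulation i: (1 / n^2) * sum over j of d_A,ij.
    between = [
        sum(
            0.5 * sum(presence[i][k] != presence[j][k] for k in range(K))
            for j in range(n)
        ) / n ** 2
        for i in range(n)
    ]
    return within, between


# Table 2A frequencies, reduced to presence/absence of each allele.
table_2a = [
    [0.2, 0.3, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.3, 0.1, 0.1, 0.3],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],
]
presence_2a = [[1 if p > 0 else 0 for p in row] for row in table_2a]

within, between = diversity_contributions(presence_2a)
print("within :", within, "sum =", sum(within))
print("between:", between, "sum =", sum(between))
```

The within terms add up to AS = 2.25 and the between terms to DA = 1.4375, matching the values given in the text for this example.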

Petit et al. (1998) also proposed to estimate the contribution of each subpopulation to total diversity by disregarding one subpopulation and recalculating the global average diversity from the remaining pool. This procedure has been used both for gene diversity and allelic richness (see also Caballero and Toro 2002; Foulley and Ollivier 2006; Toro et al. 2009). For the case of allelic richness, Petit et al. (1998) based this partition on the definition of ρST given in expressions (8–9). Thus, total allelic richness (RT) can be decomposed into within-subpopulation allelic richness (RS) and a between-subpopulation component obtained as RT − RS. As shown by Foulley and Ollivier (2006) and Toro et al. (2009), this partition is such that the contribution of a subpopulation to the total allelic richness depends basically on the number of private alleles in the subpopulation, i.e. a subpopulation only contributes to the total allelic richness if it carries unique alleles in the population. As noted by Toro et al. (2009), if a subpopulation has no private alleles and, therefore, its contribution to the global richness is null, then its contribution to the between-subpopulation component of allelic richness equals its contribution to the within-subpopulation component with opposite sign, which apparently lacks an intuitive justification. These aspects of the method will be shown with the example of Table 2.

Table 2B shows the contribution of each subpopulation to allelic richness (Petit et al. 1998 method) or allelic diversity (the proposed new method based on Eqs. 10–14), obtained by disregarding that subpopulation and recalculating the percentage of allelic richness/diversity lost (positive sign) or gained (negative sign) with the removal. Consider first the case with g = 100, which corresponds to the situation of no rarefaction (as all subpopulations are assumed to have 100 genes each). As mentioned above, because no subpopulation has private alleles in the example, with the allelic richness method all subpopulations have a null contribution to the total richness, and the contribution to between-subpopulation richness is equal to the within-subpopulation richness with opposite sign. In contrast, with the new method (allelic diversity), subpopulation II contributes the most to the total diversity, followed by subpopulation IV. Subpopulation IV is the one that contributes most to the between-subpopulation diversity, as it has the largest differences in allelic composition with respect to the others. Subpopulation III contributes negatively to total diversity (i.e. its removal increases the overall diversity). As expected, these results are in agreement with the relative contributions of each subpopulation given above, calculated by decomposing Eq. 13.
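The removal procedure for allelic diversity at g = 100 can be sketched as below. This is an illustrative sketch with hypothetical function names; expressing each change as a percentage of the AT of the full population is an assumption made here, chosen because it reproduces the g = 100 allelic diversity columns of Table 2B.

```python
def diversity_components(presence):
    """Return (AS, DA) from Eqs. 10-12 without rarefaction."""
    n = len(presence)
    K = len(presence[0])
    a_s = sum(sum(row) for row in presence) / n - 1.0
    d_a = sum(
        0.5 * sum(presence[i][k] != presence[j][k] for k in range(K))
        for i in range(n)
        for j in range(n)
    ) / n ** 2
    return a_s, d_a


def removal_contributions(presence):
    """Percentage of diversity lost (+) or gained (-) when each
    subpopulation is removed; returns (total, within, between) tuples."""
    a_s, d_a = diversity_components(presence)
    a_t = a_s + d_a  # assumed normalising constant (full-population AT)
    results = []
    for i in range(len(presence)):
        rest = presence[:i] + presence[i + 1:]
        rest_a_s, rest_d_a = diversity_components(rest)
        within = 100.0 * (a_s - rest_a_s) / a_t
        between = 100.0 * (d_a - rest_d_a) / a_t
        results.append((within + between, within, between))
    return results


# Presence/absence of alleles 1-6 in subpopulations I-IV of Table 2A.
presence_2a = [
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]

contributions = removal_contributions(presence_2a)
for name, (total, within, between) in zip(("I", "II", "III", "IV"), contributions):
    print(f"{name:>3}: total {total:+.1f}  within {within:+.1f}  between {between:+.1f}")
```

Under these assumptions the sketch agrees with the tabulated percentages to the first decimal place, including the negative total contribution of subpopulation III.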

Table 2B also gives the contributions to allelic richness/diversity when rarefaction is made with different numbers (g) of sampled genes. It should be noted that the contributions with the method of allelic richness change substantially with decreasing values of g. In particular, the total contribution from subpopulation IV gradually increases up to 19.5% with g = 10. The reason is the following. Because alleles 4 and 5 have a low frequency in subpopulation II (see Table 2A), there are good chances for these alleles to get lost when sampling is carried out in this subpopulation (particularly for a low sample size). This would imply that subpopulation IV would appear as having unique alleles (i.e. alleles 4 and 5 would appear only in subpopulation IV if they are lost in subpopulation II). This is obviously a consequence of the method, which would suggest that subpopulation IV has the largest contribution. Note, in contrast, that the total contributions do not change much with rarefaction under the new allelic diversity method.
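The sampling argument above can be made concrete with Eq. 7. The sketch below, with a hypothetical function name, computes the probability that an allele with 10 copies among the 100 genes of subpopulation II (alleles 4 and 5 in Table 2A) is missing from a sample of g genes. It illustrates only this within-subpopulation sampling loss, not the full contribution calculation behind Table 2B.

```python
from math import comb


def prob_allele_missing(n_i, n_ik, g):
    """Eq. 7: probability that an allele with n_ik copies among n_i genes
    is absent from g genes sampled without replacement."""
    # comb() returns 0 when fewer than g genes lack the allele.
    return comb(n_i - n_ik, g) / comb(n_i, g)


# Allele 4 (or 5) in subpopulation II: frequency 0.1, i.e. 10 of 100 genes.
loss_by_g = {g: prob_allele_missing(100, 10, g) for g in (100, 90, 40, 10)}

for g, p in loss_by_g.items():
    print(f"g = {g:3d}: P(allele missing) = {p:.4g}")
```

The probability is zero without rarefaction, stays small for intermediate sample sizes, and rises to roughly one third at g = 10, which is where the allelic richness contributions in Table 2B change the most.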

Simulated behaviour of AST for a finite island model

In order to investigate the asymptotic behaviour of AST in a subdivided population and its comparison with FST, computer simulations (with a C program available on request) were run assuming a finite island model with a range of migration and mutation rates. First, a single population of NT diploid individuals, typically 10,000, was initially set where all individuals originally carried the same allele at a given neutral locus. The population was run for 100,000 generations assuming random mating (no sexes were assumed and random self-fertilisation was allowed). In each generation mutation to new allelic variants (infinite alleles model) occurred with Poisson probability and rate u. Over this period of time mutation-drift equilibrium was reached for all parameters considered. At generation 100,000, the population was randomly split into n subpopulations of size N = NT/n. This subdivided population was maintained for a further 200,000 generation period assuming the same mutation rate as before and a migration rate per generation m between subpopulations (migration occurred with Poisson probability among randomly chosen subpopulations). One hundred independent loci were assumed for each run. At the last generation, the current allele frequencies for each locus were used to calculate the population genetic parameters (HS, DG, HT, AS, DA and AT) from the above equations, and averaged over loci to obtain estimates of FST and AST (obtained from Eqs. 5 and 14, respectively). Estimates of AST were also obtained from different sample sizes (g) after rarefaction (Eq. 7). All simulation scenarios were replicated twenty times and results were averaged over replicates.
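The authors' C program is not reproduced here. Purely as an illustration of the kind of model described, the following sketch advances a finite island model by one generation under several simplifying assumptions that are not taken from the paper: genes rather than diploid individuals are sampled, each gene migrates or mutates as an independent Bernoulli event (instead of Poisson-distributed numbers of events), and the population sizes in the usage example are tiny. The function name `next_generation` and its arguments are hypothetical.

```python
import random


def next_generation(subpops, u, m, rng, next_allele):
    """One generation of a gene-based island model (illustrative sketch).

    subpops: list of lists of integer allele labels (2N gene copies each).
    u: mutation probability per gene; every mutation creates a new allele
       (infinite alleles model). m: migration probability per gene.
    Returns (new_subpops, next_allele), where next_allele is the next
    unused allele label.
    """
    n = len(subpops)
    new_subpops = []
    for i, genes in enumerate(subpops):
        offspring = []
        for _ in range(len(genes)):
            source = i
            if n > 1 and rng.random() < m:
                # Migrant gene: parent drawn from another random subpopulation.
                source = rng.choice([j for j in range(n) if j != i])
            allele = rng.choice(subpops[source])  # random mating, selfing allowed
            if rng.random() < u:
                allele = next_allele  # mutation to a brand-new allele
                next_allele += 1
            offspring.append(allele)
        new_subpops.append(offspring)
    return new_subpops, next_allele


# Tiny usage example: 4 subpopulations of 40 genes, all starting with allele 0.
rng = random.Random(2010)
state = [[0] * 40 for _ in range(4)]
label = 1
for _ in range(200):
    state, label = next_generation(state, u=0.001, m=0.01, rng=rng, next_allele=label)

alleles_per_subpop = [len(set(genes)) for genes in state]
print("alleles segregating per subpopulation:", alleles_per_subpop)
```

At the end of a run, the allele counts in each subpopulation would be converted to frequencies and passed to the diversity formulas above; realistic parameter values such as those in the text would require a much faster implementation.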

Simulation results

Figure 1 illustrates the main simulated parameters over generations for a scenario of n = 10 subpopulations of N = 1000 individuals each, mutation rate u = 0.00001 per locus and generation and migration rate m = 0.0001 per generation (i.e., Nm = 0.1 migrants per subpopulation and generation) over the first 10,000 generations after subdivision. The equilibrium values (generation 200,000) are represented with symbols.
https://static-content.springer.com/image/art%3A10.1007%2Fs10592-010-0107-7/MediaObjects/10592_2010_107_Fig1_HTML.gif
Fig. 1

Population genetic parameters over generations for a subdivided population with n = 10 subpopulations with N = 1000 individuals each, assuming a mutation rate per locus and generation of u = 0.00001 and a migration rate per generation between subpopulation of m = 0.0001 (island model). Results are obtained as the average of twenty independent runs and 100 independent loci per run. Right-hand side squares and triangles refer to the asymptotic values (generation 200,000) of the corresponding parameters (squares correspond to continuous lines and triangles to broken lines). HS average within-subpopulation heterozygosity, DG average Nei′s distance between subpopulations, AS average allelic diversity within subpopulations, DA average allelic distance between subpopulations, K average total number of alleles in the population, np average number of private alleles per subpopulation, FST average gene frequency differentiation coefficient, AST average allelic differentiation coefficient

The base population, run for 100,000 generations with size NT = 10,000 prior to subdivision, reached an average of 4.84 segregating alleles per locus and an average heterozygosity of H = 0.288. These values are close to their expectations: expected number of segregating alleles from the diffusion approximation (\( \theta \int_{1/(2N_{T})}^{1} x^{-1} (1 - x)^{\theta - 1}\,dx \); Ewens 1964; Kimura and Crow 1964; Crow and Kimura 1970, p. 455) equal to 4.76, where θ = 4NTu, and expected heterozygosity θ/(θ + 1) = 0.286.
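The two expectations quoted above can be checked numerically. The sketch below is one possible way to evaluate the diffusion integral in pure Python; the function name, the split of the integration range at x = 0.5 and the two changes of variable are choices made here for numerical convenience, not part of the paper.

```python
import math


def expected_segregating_alleles(theta, n_genes, steps=20000):
    """Evaluate theta * integral from 1/n_genes to 1 of x^-1 (1-x)^(theta-1) dx
    with the midpoint rule, after removing both endpoint difficulties."""
    eps = 1.0 / n_genes
    # Part 1, x in [eps, 0.5]: substitute s = ln(x), so that dx / x = ds.
    lo, hi = math.log(eps), math.log(0.5)
    h = (hi - lo) / steps
    lower = theta * h * sum(
        (1.0 - math.exp(lo + (i + 0.5) * h)) ** (theta - 1.0)
        for i in range(steps)
    )
    # Part 2, x in [0.5, 1]: substitute u = (1 - x)^theta, which gives
    # theta * (1 - x)^(theta - 1) dx = -du and x = 1 - u^(1 / theta).
    top = 0.5 ** theta
    h2 = top / steps
    upper = h2 * sum(
        1.0 / (1.0 - ((i + 0.5) * h2) ** (1.0 / theta)) for i in range(steps)
    )
    return lower + upper


n_total = 10_000            # N_T diploid individuals, i.e. 2 * N_T genes
mutation_rate = 0.00001     # u per locus and generation
theta = 4 * n_total * mutation_rate

k_expected = expected_segregating_alleles(theta, 2 * n_total)
h_expected = theta / (theta + 1.0)
print(f"theta = {theta:.2f}, expected alleles = {k_expected:.2f}, "
      f"expected heterozygosity = {h_expected:.3f}")
```

With θ = 0.4 the integral evaluates to about 4.76 alleles and θ/(θ + 1) to about 0.286, in line with the expectations stated in the text.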

After subdivision, within-subpopulation diversity measures (HS and AS) declined first and then increased slowly toward their equilibrium values. Between-subpopulation distance measures (DG and DA) increased accordingly. The total average number of alleles (K) and the number of private alleles (np) also increased toward their equilibrium values. Note that AST reached equilibrium rather faster than FST.

Figure 2 shows the equilibrium values of AST for different subpopulation sample sizes (g) using the rarefaction technique, assuming different migration rates (m). The symbols represent the population values of AST (when no rarefaction is used). It can be noted that the estimated values of AST are very close to the population values even for very small subpopulation sample sizes (the smallest value is g = 10; i.e. a sample of five individuals per subpopulation). Thus, in contrast to the typical strong dependency of allelic richness on sample size, AST seems to be very robust against changes in subpopulation sample size. Given this result, only population values of AST are considered thereafter.
https://static-content.springer.com/image/art%3A10.1007%2Fs10592-010-0107-7/MediaObjects/10592_2010_107_Fig2_HTML.gif
Fig. 2

Average asymptotic estimates of AST (allelic differentiation between subpopulations) for different numbers of genes (g) sampled within each subpopulation. Results refer to a subdivided population with n = 10 subpopulations with N = 1000 individuals each, assuming a mutation rate per locus and generation of u = 0.00001 and three different migration rates (m) per generation. The right-hand side symbols refer to the corresponding population genetic values of AST (when no rarefaction is used)

Figure 3 shows the equilibrium values of FST and AST for different numbers of subpopulations (n = 2, 5 and 10). The values of AST change little with n, and in a similar way to those for FST. Thus, AST also seems to be very robust against changes in the number of subpopulations.
https://static-content.springer.com/image/art%3A10.1007%2Fs10592-010-0107-7/MediaObjects/10592_2010_107_Fig3_HTML.gif
Fig. 3

Asymptotic values of FST (gene frequency differentiation coefficient) and AST (allelic differentiation coefficient) between subpopulations for a subdivided population of n subpopulations with N = 1000 individuals each, assuming a mutation rate per locus and generation of u = 0.00001 and two migration rates (m) per generation

Figure 4 compares the equilibrium values of FST and AST for a range of migration rates (m). The corresponding values of within and between subpopulation components are also shown in the figure. For small values of migration, AST decays faster than FST, but for large migration rates, AST becomes almost insensitive to migration rate. Note that AST can be larger or smaller than FST depending on the migration rate. Figure 5 shows the equilibrium values of FST and AST for a range of mutation rates. In general, AST is rather insensitive to the mutation rate, particularly for high mutation rates.
https://static-content.springer.com/image/art%3A10.1007%2Fs10592-010-0107-7/MediaObjects/10592_2010_107_Fig4_HTML.gif
Fig. 4

Asymptotic values of population genetic parameters for a subdivided population of n = 10 subpopulations with N = 1000 individuals each, assuming a mutation rate per locus and generation of u = 0.00001 and a range of migration rates (m) per generation. HS average within-subpopulation heterozygosity, DG average Nei′s distance between subpopulations, AS average allelic diversity within subpopulations, DA average allelic distance between subpopulations, FST average gene frequency differentiation coefficient, AST average allelic differentiation coefficient

https://static-content.springer.com/image/art%3A10.1007%2Fs10592-010-0107-7/MediaObjects/10592_2010_107_Fig5_HTML.gif
Fig. 5

Asymptotic values of FST (gene frequency differentiation coefficient) and AST (allelic differentiation coefficient) between subpopulations for a subdivided population of n = 10 subpopulations with N = 1000 individuals each, assuming a range of mutation rates (u) per locus and three migration rates (m) per generation

Finally, Table 3 shows the coefficient of variation of the estimates of FST and AST for all simulated scenarios. For low mutation rates, the estimates of AST are generally more variable than those of FST. In contrast, for high mutation rates, the estimates of FST are generally more variable than those of AST, particularly with no migration.
Table 3

Coefficient of variation (CV) of FST and AST estimates (and the ratio of these CV) obtained from 20 simulated replicates, for a subdivided population with n = 10 subpopulations with N = 1000 individuals each, assuming a mutation rate per locus and generation of u and a migration rate per generation between subpopulation of m (island model)

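| u | m | CV(FST) | CV(AST) | Ratio CV(FST)/CV(AST) |
|---|---|---|---|---|
| 0.000001 | 0 | 0.0032 | 0.0124 | 0.26 |
| | 0.0001 | 0.0369 | 0.0746 | 0.49 |
| | 0.01 | 0.1175 | 0.1560 | 0.75 |
| 0.000005 | 0 | 0.0031 | 0.0099 | 0.31 |
| | 0.0001 | 0.0249 | 0.0462 | 0.53 |
| | 0.01 | 0.0649 | 0.0943 | 0.68 |
| 0.00001 | 0 | 0.0042 | 0.0102 | 0.41 |
| | 0.00001 | 0.0075 | 0.0349 | 0.21 |
| | 0.00005 | 0.0109 | 0.0476 | 0.22 |
| | 0.0001 | 0.0112 | 0.0368 | 0.30 |
| | 0.0005 | 0.0322 | 0.0498 | 0.64 |
| | 0.001 | 0.0388 | 0.0404 | 0.96 |
| | 0.005 | 0.0587 | 0.0411 | 1.42 |
| | 0.01 | 0.0545 | 0.0693 | 0.78 |
| 0.00005 | 0 | 0.0081 | 0.0032 | 2.49 |
| | 0.0001 | 0.0141 | 0.0273 | 0.51 |
| | 0.01 | 0.0393 | 0.0294 | 1.33 |
| 0.0001 | 0 | 0.0097 | 0.0024 | 3.96 |
| | 0.0001 | 0.0156 | 0.0155 | 1.01 |
| | 0.01 | 0.0322 | 0.0244 | 1.31 |
| 0.0002 | 0 | 0.0131 | 0.0008 | 15.27 |
| | 0.0001 | 0.0101 | 0.0072 | 1.41 |
| | 0.01 | 0.0318 | 0.0173 | 1.83 |
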

Discussion

The partition of genetic diversity into within and between-subpopulation components and its application in the conservation of genetic resources have been addressed by a number of studies and reviews (Petit et al. 1998; Eding and Meuwissen 2001; Caballero and Toro 2002; Simianer 2005; Ollivier and Foulley 2005; Toro and Caballero 2005; Foulley and Ollivier 2006; Toro et al. 2009). These applications have been used in practice for: (1) ascertaining those subpopulations (or breeds, in the case of domestic animals) that contribute more to variation and establishing endangered subpopulations (Eding et al. 2002; Piyasatian and Kinghorn 2003; Fabuel et al. 2004; Bennewitz and Meuwissen 2005; Tapio et al. 2005, 2006; Rodrigáñez et al. 2008; Stefenon et al. 2008); (2) calculating the optimal contribution of the different subpopulations or breeds to a pool (e.g. germ plasm collection) of maximal diversity (Fabuel et al. 2004); and (3) establishing a proper management of subdivided populations in captivity (Wang 2004; Fernández et al. 2008; Caballero et al. 2010).

Most of the above studies deal with the partition of gene diversity (expected heterozygosity) and only a few of them consider allelic richness as an alternative or complementary tool to gene diversity. The partition of allelic richness proposed by Petit et al. (1998) and used so far (e.g. Foulley and Ollivier 2006; Rodrigáñez et al. 2008; Stefenon et al. 2008) differs drastically from that proposed for allelic diversity in this paper, because of the way between-subpopulation differences are dealt with. In the Petit et al. (1998) method, it is basically subpopulations with private alleles that contribute to the whole allelic richness (Petit et al. 1998; Foulley and Ollivier 2006; Ollivier and Foulley 2009; Toro et al. 2009). In contrast, the proposed method considers the allelic differences between subpopulations, arising from both private and common alleles. Ollivier and Foulley (2009) have sounded a note of caution about the partition of allelic richness into within and between-subpopulation components, because allelic richness is a characteristic attached to each population and cannot vary within it. For instance, a change in allelic richness in a given subpopulation (e.g. the appearance of a new allele) implies the same change (i.e. the appearance of a new allele) in the total allelic richness of the whole set of subpopulations (L. Ollivier and J. L. Foulley, personal communication). This argument does not hold for the proposed partition of allelic diversity, because a change in the number of alleles in a given subpopulation affects the total diversity both through the average within-subpopulation diversity and through a possible change in the allelic distances between subpopulations.

The differences between the allelic richness and allelic diversity partitions of variation become evident in the illustrative example of Table 2 (for g = 100; no rarefaction). If no private alleles are present, then all subpopulations have a null contribution to the total richness. When rarefaction is applied, then some subpopulations can become non-null contributors. This occurs as a by-product of the sampling assumed because, after this, some alleles are lost and “private” alleles can arise in some subpopulations. Therefore, the method of Petit et al. (1998) has a different interpretation to that proposed here, which, in contrast, is little affected by rarefaction (Table 2).

Foulley and Ollivier (2006) recently proposed an alternative method to the rarefaction technique, based on extrapolation, which consists of adding to the number of alleles actually observed in a sampled subpopulation the expected number of alleles missing, given the number of genes examined in the sample and the allelic frequencies observed in the whole population. An assumption of this approach is that all subpopulations are drawn at random from the same founder population, so that alleles can be potentially present in all samples. Thus, if one allele is missing in a given sampled population, the reason is taken to be the small sample size, ignoring the possibility that it could have been truly lost by genetic drift or for other reasons. The extrapolation method can also be used for working out the contribution of each subpopulation to allelic richness. This was investigated by Foulley and Ollivier (2006), who made the partition of richness following the Petit et al. (1998) approach with the extrapolation method and applied it to the argan tree data analysed by Petit et al. (1998). The contributions to total richness of the 12 argan tree subpopulations obtained with the extrapolation method were highly correlated (r = 0.99; Foulley and Ollivier 2006) with a private allelic richness criterion (Kalinowski 2004), and are also very closely correlated (r = 0.99) with those obtained with the Petit et al. (1998) method when rarefaction is not carried out (not shown). Because the extrapolation method adds, rather than removes, alleles in the subpopulations, it seems logical that the results with this method are close to those of a situation where no rarefaction is applied to the data, so that no alleles are assumed to be lost by sampling.

Toro et al. (2009) also proposed an alternative way for the partition of genetic diversity which makes some consideration of allelic diversity. Because the larger the number of alleles, the larger the potential diversity of a subpopulation, and because the maximal diversity occurs when alleles are at equal frequencies, a hypothetical situation can be assumed where all alleles present in a subpopulation have identical frequencies. Thus, the method proposed by Toro et al. (2009) consists of analysing gene diversity in the standard way (Eqs. 15) but assuming that all alleles in each subpopulation have the same frequency. An estimate of gene diversity under this situation takes account, at least partially, of the allelic diversity of the population, by considering the potentiality of each subpopulation according to the number and type of alleles that it carries. The rationale of this method, however, is not as intuitive as that proposed here, and a comparison between methods should be performed by simulations.

We have proposed a new measure of allelic differentiation (AST) which differs from its analogue ρST (El Mousadik and Petit 1996) in a basic point. Whereas the latter depends on the average allelic richness of the subpopulations and that of the whole population, the former depends also on the particular arrangement of alleles in the different subpopulations (Table 1). Thus, AST considers the pairwise differences in absence/presence of alleles for each particular pair of subpopulations and, therefore, provides a specific assessment of the distances between subpopulations with respect to allelic composition. It has been argued that ρST largely depends on the distribution of rare alleles, notably whether they tend to be clustered in some subpopulations (high ρST) or are distributed more evenly so that one subpopulation is representative of the whole population (low ρST) (Comps et al. 2001; Tyler 2002; Stefenon et al. 2008). This is not so straightforward, though. For instance, consider the example of Table 1D, where all subpopulations share three alleles and one subpopulation accumulates six private alleles. The value of ρST in this case would be the same as that where each of the six private alleles belongs to a different subpopulation, so that all private alleles are evenly distributed.

The parameter AST gives a measure proportional to the number of alleles in which two randomly chosen subpopulations differ. In the same way that a population with a larger number of alleles has a higher adaptive potential than another with a lower number of alleles, AST may indicate the degree of differential potentiality among subpopulations. The values of AST must obviously be related to those of FST, but they do not necessarily have to be in full accordance. For example, AST may be zero (all subpopulations carry the same alleles) whereas FST can be low (allele frequencies are similar between subpopulations) or high (there are substantial differences in allele frequencies between subpopulations). In general, AST can usually be regarded as a parameter which stresses the differences among subpopulations regarding rare alleles. This is in contrast with FST, which generally depends little on rare alleles. For example, assume a locus with two alleles with frequencies 0.8 and 0.2, respectively, in one subpopulation, and 0.2 and 0.8, respectively, in the other. This corresponds to FST = 0.36 and AST = 0. Now consider that there are 12 alleles instead, with frequencies 0.75, 0.15, the next five with frequency 0.02, and the next five with frequency 0, respectively, in one subpopulation, and frequencies 0.15, 0.75, the next five with frequency 0, and the next five with frequency 0.02, respectively, in the other subpopulation. Here FST = 0.30, only a bit smaller than that in the previous case, but AST = 0.29, largely dependent on the rare alleles present in one subpopulation but not in the other. The larger AST value in this latter case indicates a higher heterogeneity among subpopulations in rare allele composition and, therefore, a higher potentiality for evolution among subpopulations.
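The two numerical examples in the preceding paragraph can be checked with a short script. This is a minimal sketch, not the authors' code. It assumes that FST is obtained as (HT − HS)/HT from gene diversities, and that AST = DA/(AS + DA), where AS is the mean over subpopulations of the number of alleles minus one, and DA is the mean over all ordered pairs of subpopulations (self-pairs included) of half the number of alleles present in one member of the pair and absent from the other. These definitions are a reading of the partition described earlier in the paper, and the function name `fst_ast` is illustrative.

```python
from itertools import product


def fst_ast(freqs):
    """Return (FST, AST) for a list of per-subpopulation allele-frequency lists.

    Assumed definitions (see the lead-in): FST = (HT - HS) / HT and
    AST = DA / (AS + DA), with equal weights for all subpopulations.
    """
    n = len(freqs)
    n_alleles = len(freqs[0])

    # Gene diversity: mean within-subpopulation (HS) and total (HT) heterozygosity.
    hs = sum(1.0 - sum(p * p for p in sub) for sub in freqs) / n
    mean_p = [sum(sub[a] for sub in freqs) / n for a in range(n_alleles)]
    ht = 1.0 - sum(p * p for p in mean_p)
    fst = (ht - hs) / ht if ht > 0 else 0.0

    # Allelic diversity: only presence/absence of each allele matters.
    present = [{a for a, p in enumerate(sub) if p > 0} for sub in freqs]
    a_s = sum(len(s) - 1 for s in present) / n
    # len(si ^ sj) counts alleles found in exactly one of the two subpopulations.
    d_a = sum(len(si ^ sj) / 2 for si, sj in product(present, repeat=2)) / n ** 2
    a_t = a_s + d_a
    ast = d_a / a_t if a_t > 0 else 0.0
    return fst, ast


# Two alleles at frequencies 0.8/0.2 in one subpopulation and 0.2/0.8 in the other.
two_alleles = [[0.8, 0.2], [0.2, 0.8]]

# Twelve alleles: two common ones plus five rare alleles private to each subpopulation.
twelve_alleles = [
    [0.75, 0.15] + [0.02] * 5 + [0.0] * 5,
    [0.15, 0.75] + [0.0] * 5 + [0.02] * 5,
]

fst_two, ast_two = fst_ast(two_alleles)
fst_twelve, ast_twelve = fst_ast(twelve_alleles)
print(f"two alleles:    FST = {fst_two:.2f}, AST = {ast_two:.2f}")
print(f"twelve alleles: FST = {fst_twelve:.2f}, AST = {ast_twelve:.2f}")
```

Under these assumed definitions the script gives FST = 0.36 and AST = 0.00 for the two-allele case and FST = 0.30 and AST = 0.29 for the twelve-allele case, in agreement with the values quoted in the text.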

We carried out a population genetics simulation study to investigate the behaviour of AST in comparison with FST. AST may reach an asymptotic value faster than FST (Fig. 1) and depends very little on subpopulation sample size (Fig. 2) and number of subpopulations (Fig. 3). Although both parameters decline as the migration rate between subpopulations increases (Fig. 4), AST shows a faster decline with increasing values of migration rate (m) when m is low, but is less dependent on migration rate for large m. This must be explained by the higher dependency of AST than FST on rare alleles, because AST measures presence/absence of alleles rather than net differences in gene frequency. If there is substantial isolation between subpopulations (m is very low), an increase in migration implies an immediate increase in the average number of alleles present per subpopulation (AS) but, because the frequencies of the new immigrant alleles will be low in the recipient subpopulation, the impact on the subpopulation heterozygosity will be relatively low (see Fig. 4). When migration rates are large, the most frequent alleles will be present in most subpopulations, and migration will reduce the differences in gene frequency among subpopulations, driving FST values towards zero (Fig. 4). However, some rare alleles may still be present in some subpopulations but not in others, keeping a substantial allelic distance DA even with large migration rates.
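The mechanism described in this paragraph can be explored with a toy simulation. The sketch below is not the simulation program used for Figs. 1–5. It assumes a haploid Wright-Fisher gene pool per subpopulation, an infinite-alleles mutation model, symmetric island migration, and population sizes far smaller than N = 1000 so that it runs in a few seconds in plain Python. All function names and parameter values are illustrative, and the definitions of AS, DA and AST are the assumed ones, namely AS as the mean number of alleles minus one, DA as the mean over all ordered pairs of half the number of non-shared alleles, and AST = DA/(AS + DA).

```python
import random
from collections import Counter
from itertools import product


def simulate_island(n_sub, n_genes, u, m, generations, seed):
    """Toy haploid island model with infinite-alleles mutation (assumed set-up)."""
    rng = random.Random(seed)
    pops = [[0] * n_genes for _ in range(n_sub)]  # start monomorphic for allele 0
    next_allele = 1
    for _ in range(generations):
        new_pops = []
        for i in range(n_sub):
            others = [j for j in range(n_sub) if j != i]
            offspring = []
            for _ in range(n_genes):
                # With probability m the parental gene comes from another subpopulation.
                src = rng.choice(others) if others and rng.random() < m else i
                allele = rng.choice(pops[src])
                if rng.random() < u:  # every mutation creates a brand-new allele
                    allele = next_allele
                    next_allele += 1
                offspring.append(allele)
            new_pops.append(offspring)
        pops = new_pops
    return pops


def summarise(pops):
    """Return HS, FST, AS, DA and AST from lists of allele labels per subpopulation."""
    n, size = len(pops), len(pops[0])
    counts = [Counter(sub) for sub in pops]
    hs = sum(1.0 - sum((c / size) ** 2 for c in cnt.values()) for cnt in counts) / n
    pooled = Counter()
    for cnt in counts:
        pooled.update(cnt)
    ht = 1.0 - sum((c / (n * size)) ** 2 for c in pooled.values())
    present = [set(cnt) for cnt in counts]
    a_s = sum(len(s) - 1 for s in present) / n
    d_a = sum(len(si ^ sj) / 2 for si, sj in product(present, repeat=2)) / n ** 2
    return {
        "HS": hs,
        "FST": (ht - hs) / ht if ht > 0 else 0.0,
        "AS": a_s,
        "DA": d_a,
        "AST": d_a / (a_s + d_a) if a_s + d_a > 0 else 0.0,
    }


results = {}
for m in (0.0, 0.001, 0.1):
    pops = simulate_island(n_sub=5, n_genes=100, u=0.005, m=m, generations=500, seed=1)
    results[m] = summarise(pops)
    print(m, {name: round(value, 3) for name, value in results[m].items()})
```

A single replicate of a single locus is very noisy, so a comparison of the kind shown in Fig. 4 would require averaging over many replicates (different values of `seed`) and using sizes closer to those of the paper.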

The different behaviour of AST and FST for different migration rates suggests that AST can provide a complementary way to deduce migration rates and to investigate past population history (e.g. bottlenecks). Note that for a given mutation rate, such as that presented in Fig. 4, the values of AST are smaller than those for FST for low migration rates, and vice versa for large migration rates. Therefore, the relative values between these two parameters can provide some clues on the amount of migration. A predictive equation for AST as a function of population and genetic parameters (N, m, u) would be useful to make such deductions, but this requires further investigation.

Acknowledgements

We thank L. Ollivier, J. L. Foulley, J. Fernández, M. A. Toro and three referees for useful comments on the manuscript. This work was funded by Ministerio de Ciencia y Tecnología and Fondos Feder (CGL2006-13445-C02/BOS and CGL2009-13278-C02), and Xunta de Galicia.

Copyright information

© Springer Science+Business Media B.V. 2010