Introduction: the concept of N e and its importance

Effective population size (N e ) is a central concept in evolutionary genetics and conservation biology (Charlesworth 2009). Briefly, N e is defined as the size of an idealized population undergoing the same rate of genetic drift as the population under study (Wright 1931). In practice, N e is used in the design of conservation and selective breeding programs and for the interpretation of molecular evolutionary data (Charlesworth 2009).

The effective population size does not usually match the census size (N) of the population. Discrepancies can sometimes be extreme, as for example in the case of the Holstein cattle breed. Despite being a worldwide dairy cow breed with a census size of millions of individuals, the N e of the whole population is considered to be no larger than 100 (Flint and Mott 2001). This means that the Holstein breed accumulates inbreeding at the same rate as an ideal population of only 100 individuals.

Based on population genomics data, recent studies in model animals and humans suggest that different segments of the genome might undergo different rates of genetic drift, potentially challenging the idea that a single N e can account for the evolution of the genome (Gossmann et al. 2011; Jiménez-Mena et al., in review). Heterogeneity in N e across the genome has been specifically estimated and reported in eleven species (Fig. 1). As N e is important in guiding conservation programs, it is important to evaluate whether heterogeneous N e is the exception or the rule. Is N e in reality constant across the genome, with the few documented examples being exceptions, or is heterogeneity in N e common but so far overlooked? If heterogeneity in N e across the genome is pervasive across species, conservation guidelines that typically assume a single N e in a population may be overly optimistic. Let us illustrate this point with an example. Consider that the N e of a population is estimated to be 250. The rate at which the population is building up consanguinity (as measured by the rate of increase in inbreeding coefficient, or ΔF) corresponds to ΔF = 1/(2N e ) = 0.2 %. However, if portions of the genome are drifting at very different rates, the actual N e describing the build-up of consanguinity in these regions might be quite different from 250. Consider, for instance, a portion of the genome with N e = 50: the ΔF of that specific region would be 1 %. In this case, the population would locally be accumulating homozygosity at a much faster rate than expected given the average N e . If the population harbors a sizable mutation load, e.g. strongly deleterious and fairly recessive mutations, this extra local drift might incur much higher inbreeding depression than would be predicted if all regions accumulated the same modest amount of consanguinity throughout the genome.
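The arithmetic of this example can be checked with a minimal sketch. It assumes only the relation ΔF = 1/(2N e ) given above and the standard recursion F_t = 1 - (1 - ΔF)^t for a population starting at F = 0; the function names and the horizon of 20 generations are illustrative choices, not taken from any cited study.

```python
def delta_f(ne: float) -> float:
    """Per-generation rate of increase in inbreeding: dF = 1 / (2 * Ne)."""
    if ne <= 0:
        raise ValueError("Ne must be positive")
    return 1.0 / (2.0 * ne)


def inbreeding_after(ne: float, generations: int) -> float:
    """Expected inbreeding coefficient after `generations`, starting from F = 0.

    Uses the standard recursion F_t = 1 - (1 - dF)^t.
    """
    return 1.0 - (1.0 - delta_f(ne)) ** generations


# Genome-wide estimate versus a hypothetical fast-drifting region.
# The 20-generation horizon is an arbitrary illustration.
for label, ne in [("genome-wide", 250), ("local region", 50)]:
    print(f"{label}: Ne = {ne}, dF = {delta_f(ne):.2%}, "
          f"F after 20 generations = {inbreeding_after(ne, 20):.3f}")
```

Under these assumptions the region with N e = 50 accumulates inbreeding five times faster per generation than the genome-wide figure suggests.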

Fig. 1

Empirical distributions of the variation in local N e in the species for which heterogeneity in N e has been studied (Gossmann et al. 2011; Jiménez-Mena et al. in review). These distributions of local N e were all fitted from empirical data by assuming that variation in local N e follows a log-normal distribution. Distributions have been scaled to a mean of 1 so that N e heterogeneity can be compared across species on a common scale

Here, we first review the empirical evidence available so far that has revealed the existence of heterogeneity in N e throughout the genome, explaining the methods that allow us to detect such heterogeneity in N e and addressing the possible theoretical causes. We close by discussing the potential implications of such heterogeneity for conservation and breeding practices.

Evidence available for the heterogeneity in N e throughout the genome

Under selective neutrality and constant population size, a single N e is expected to capture the rate of genetic drift of all autosomes throughout the genome. In this case we refer to this N e as a “global N e ”, where the whole genome is affected by a homogeneous rate of genetic drift. But allele frequencies at a given focal site in the genome can also be influenced by selection operating at adjacent sites. N e , the number of individuals effectively contributing to future generations, is affected by selection because some individuals contribute disproportionately to future generations when they carry more beneficial and/or fewer deleterious mutations around the site of interest. A wealth of population genetics theory has accumulated since the mid-1960s (see the recent exhaustive reviews by Charlesworth 2012; Cutter and Payseur 2013) showing that selection can locally reduce the N e experienced by a genomic site, reducing both the level of genetic variability and the efficacy of selection at that site. We hereafter use the term “local N e ” to describe the potential heterogeneity in N e in the genome.

Levels of genetic diversity are typically summarized using measures such as mean marker heterozygosity, levels of pairwise nucleotide diversity, or number of distinct alleles in a sample. It turns out that under a model where mutation is selectively neutral, these measures are all determined jointly by N e and mutation rate µ, through the composite parameter θ = 4 N e µ. Thus, observed variation in levels of genetic diversity can in principle be caused by variation in N e and/or µ. Since the seminal study of Begun and Aquadro (1992), there has been evidence that levels of genetic diversity can be heterogeneous and correlated with recombination rates. In principle, a single N e can account for the correlation between diversity and recombination rate if regions of higher recombination are also more mutable. However, several studies in Drosophila spp. and humans show that a mere correlation between mutation and recombination rates cannot account for the observed heterogeneity of diversity (Lohmueller et al. 2011; Comeron 2014). These studies provide evidence for selection at linked sites and thus support the theoretical expectation that N e should/could plausibly vary.
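The confounding of N e and µ in θ can be made concrete with a short sketch. It assumes that expected neutral diversity equals θ = 4 N e µ, as stated above; the function names and all numerical values are hypothetical and chosen only for illustration.

```python
def theta(ne: float, mu: float) -> float:
    """Population-scaled mutation rate theta = 4 * Ne * mu (diploid, neutral sites)."""
    return 4.0 * ne * mu


def relative_ne(pi_a: float, pi_b: float, mu_a: float, mu_b: float) -> float:
    """Ratio Ne_a / Ne_b implied by neutral diversity at two regions.

    Since diversity is expected to equal 4 * Ne * mu, dividing each
    diversity value by its own mutation rate removes the effect of mu.
    """
    return (pi_a / mu_a) / (pi_b / mu_b)


# Two hypothetical regions with identical expected diversity:
# one has a large Ne and low mu, the other a small Ne and high mu.
print(theta(ne=1000, mu=1e-8), theta(ne=500, mu=2e-8))

# With equal mutation rates, a two-fold difference in diversity
# translates directly into a two-fold difference in Ne.
print(relative_ne(pi_a=2e-5, pi_b=4e-5, mu_a=1e-8, mu_b=1e-8))
```

Diversity alone therefore cannot separate the two parameters; some independent handle on µ is needed before differences in diversity can be read as differences in N e .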

Different methods can be used to detect heterogeneity in N e

Studies that have reported variation in N e across the genome used different kinds of methods to estimate N e as well as different amounts of data. Despite the numerous methods available to estimate N e , i.e. demographic, pedigree, or molecular methods (Caballero 1994; Luikart et al. 2010), not all have the potential to detect heterogeneity in N e . Methods that rely on demographic and pedigree information assume per se a single N e across the whole genome. Demographic factors, such as fluctuations in population size over time or variance in progeny size, are assumed to affect the “global N e ” of a population. DNA variation, however, is affected by several processes that modify levels of genetic diversity locally along the genome (Sella et al. 2009) and methods that use this information can potentially be useful to detect different rates of genetic drift along the genome and measure a “local N e ” (Wang 2005; Luikart et al. 2010).

Molecular methods can be divided according to the number of samples taken from a population used to estimate N e . Methods that use only one contemporary sample from a population will estimate a long-term N e (also referred to as ancient or historical; Luikart et al. 2010). This long-term N e will reflect the accumulated effects of genetic drift over past generations and how demographic and selective evolutionary processes have changed diversity levels in the population throughout time. This yields a very integrative average of the population size history over a rather long time period but is not necessarily informative about the recent or current rate of genetic drift in the population (Wang 2005). Most evolutionary studies published so far that explicitly consider heterogeneity in N e along the genome use patterns of polymorphism, estimated at a single point in time, together with patterns of between-species divergence as a way to assess heterogeneity in mutation rates. By their nature these studies can only indirectly document heterogeneity in N e over long historical periods. Gossmann et al. (2011) estimated the long-term N e of 10 eukaryotic species from data of joint patterns of population polymorphism and divergence at neutral sites within protein-coding regions. The number of loci used for each species differed, ranging from 49 to 918 loci. The number of sites per locus also varied among species. The authors first detected variation in levels of genetic diversity using two different statistical tests (Chi2 test and HKA test). These two tests differ in their assumptions about recombination within and between the loci studied for each species, i.e. Chi2-test assumes that there is free recombination within loci, whereas HKA test assumes that there is no recombination. Assuming that N e varied locally and this variation followed a log-normal or gamma distribution (Fig. 1), the authors estimated the distribution of the variation in local N e for each species. 
Although significant, the variation in N e found was modest, ranging from 7.2-fold (Capsella rubella) to less than four-fold (Mus musculus castaneus, Capsella grandiflora, Boechera stricta, and Saccharomyces paradoxus). The study of Gossmann et al. (2011) represents the first study that directly assessed the existence of heterogeneity in N e across the genome and quantified it. However, this study does not specifically report the locations in the genome where N e is heterogeneous.
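To give a feel for what a log-normal distribution of local N e scaled to a mean of 1 implies, the sketch below relates the log-scale standard deviation to a fold range. Both the sigma values and the definition of "fold" used here (the ratio of the 97.5 % to the 2.5 % quantile) are assumptions made for illustration; they are not necessarily the definition used by Gossmann et al. (2011).

```python
import math
from statistics import NormalDist


def lognormal_params_mean_one(sigma: float) -> tuple[float, float]:
    """Log-scale (mu, sigma) of a log-normal with mean exactly 1.

    The mean of a log-normal is exp(mu + sigma^2 / 2), so mu = -sigma^2 / 2.
    """
    return -0.5 * sigma ** 2, sigma


def local_ne_quantile(p: float, sigma: float) -> float:
    """Quantile p of relative local Ne (mean scaled to 1)."""
    mu, s = lognormal_params_mean_one(sigma)
    return math.exp(mu + s * NormalDist().inv_cdf(p))


def fold_range(sigma: float, coverage: float = 0.95) -> float:
    """Ratio of the upper to the lower quantile bounding the central `coverage` mass."""
    tail = (1.0 - coverage) / 2.0
    z = NormalDist().inv_cdf(1.0 - tail)
    return math.exp(2.0 * z * sigma)


# Hypothetical log-scale standard deviations.
for sigma in (0.2, 0.35, 0.5):
    print(f"sigma = {sigma}: central 95 % of local Ne spans "
          f"{fold_range(sigma):.1f}-fold, median = {local_ne_quantile(0.5, sigma):.2f}")
```

Note that with the mean fixed at 1 the median falls below 1 as sigma grows, so most regions sit below the average N e when heterogeneity is large.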

A second type of molecular method uses information from a single population sampled at two different time points. With two temporal samples, it is possible to measure the increase in inbreeding coefficient, or a change in allele frequencies between the two time points, and to estimate a “realized N e ” over a defined time period (Wang 2005). The temporal information allows us to measure this realized N e using the temporal standardized variance in allele frequencies (Waples 1989; Jorde and Ryman 2007). Temporal datasets thus allow the realized N e to be estimated directly and its heterogeneity to be studied. Orozco-terWengel et al. (2012) analysed the change in allele frequencies over 37 generations in a set of three replicated experimental lines of Drosophila melanogaster. Using temporal samples, they directly measured the realized N e along the genome. They observed different rates of genetic drift, as estimated by loss of heterozygosity, among the chromosomes; these rates ranged from very low levels of genetic drift on the X chromosome to very high levels on chromosome 3. Tobler et al. (2014) combined experimental evolution and next-generation sequencing (NGS) to detect candidate SNPs associated with thermal adaptation in D. melanogaster. They estimated a chromosome-specific N e for each chromosome, using a total of 1.45 million SNPs. N e was compatible with a genome-wide realized N e of about 300 individuals over 15 generations, but several chromosome arms (2L, 3R) exhibited a significantly reduced realized N e of approximately 150 in some treatments.
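The logic of the temporal approach can be sketched as follows. The sketch uses the Nei and Tajima form of the standardized variance for biallelic loci and a sampling correction of the form 1/(2S0) + 1/(2St), where S0 and St are the numbers of diploid individuals sampled at each time point. This is one common textbook variant and is assumed here for illustration; the studies cited above may have used different estimators and corrections (for example for pooled sequencing), and all function names are hypothetical.

```python
import math


def fc_snp(x: float, y: float) -> float:
    """Standardized variance in allele frequency change for one biallelic SNP.

    x and y are the frequencies of the same allele at the first and second
    time point: Fc = (x - y)^2 / ((x + y) / 2 - x * y).
    """
    denom = (x + y) / 2.0 - x * y
    if denom <= 0:
        raise ValueError("SNP is fixed for the same allele in both samples")
    return (x - y) ** 2 / denom


def realized_ne(freqs_t0, freqs_t1, generations, n0, n1):
    """Realized Ne from two temporal samples of a set of SNPs.

    n0 and n1 are the numbers of diploid individuals sampled at each time
    point. Returns math.inf when the observed change does not exceed what
    sampling alone is expected to produce.
    """
    fcs = [fc_snp(x, y) for x, y in zip(freqs_t0, freqs_t1)]
    f_mean = sum(fcs) / len(fcs)
    drift = f_mean - 1.0 / (2.0 * n0) - 1.0 / (2.0 * n1)
    if drift <= 0:
        return math.inf
    return generations / (2.0 * drift)


# Hypothetical frequencies for four SNPs sampled 10 generations apart.
t0 = [0.50, 0.20, 0.70, 0.35]
t1 = [0.60, 0.15, 0.78, 0.30]
print(realized_ne(t0, t1, generations=10, n0=100, n1=100))
```

Applying the same function separately to the SNPs of each chromosome, chromosome arm, or genomic window yields the kind of region-specific realized N e discussed in this section.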

The animal breeding sector generates large amounts of genomic data, typically genotyping thousands of individuals at different time points in breeding populations of livestock species. This type of data can in principle be used to infer the realized N e in different segments of the genome over 1–10 or possibly 20 generations. However, work studying the heterogeneity in N e in livestock populations is very scarce. Some indirect results bearing on heterogeneity in genetic drift can be found in MacEachern et al. (2009). They looked at more than 7500 SNPs located along the genome for two common cattle breeds (Angus and Holstein). They found that many high-frequency SNP alleles showed some evidence of non-neutrality throughout the genome. Most of these high-frequency alleles seemed to be grouped in a few regions along the genome, particularly in Holstein. The authors interpret these patterns as instances of hitchhiking events occurring along the genome but do not test per se whether N e is heterogeneous in regions not strongly affected by hitchhiking. The majority of studies within the animal breeding sector focus on genome-wide scans for footprints of selection since domestication, obtained by comparing breeds (see Qanbari and Simianer 2014 for a review). Jiménez-Mena et al. (in review) explicitly used SNP data obtained from three cohorts of individuals of the Danish Holstein cattle breed to examine the rate of genetic drift, or N e , throughout the genome. This analysis also strongly suggests a heterogeneous N e . Using 54K SNP chip data from the Danish Holstein population, they estimated the standardized variance in allele frequency changes from temporal samples in bins of 100 SNPs along the genome and estimated a realized N e in each bin. 
They found evidence for significant variation in N e along the different chromosomal regions with 472 chromosomal segments spanning 100 SNPs (scale of 3–10 Mb) exhibiting a realized N e that varies considerably (N e : 40–250). This range of variation is genuine and exceeds the mere sampling variance around estimates (Fig. 1).

Causes for the heterogeneity in N e

Neutral regions linked to selected regions may be affected by selection through genetic hitchhiking, whereby adaptive mutations sweep through a population (see the ‘genetic draft’ theory developed by Gillespie 2000), and/or background selection, whereby deleterious mutations are removed by selection. Positive and background selection affect diversity across the genome (Sella et al. 2009), but distinguishing their relative importance is challenging (Elyashiv et al. 2014). Some analytical results, which hold when selection is strong, show that local N e at a site experiencing background selection is scaled by the fraction of individuals free from deleterious mutations (Charlesworth et al. 1993; Charlesworth 2012). We thus have a site that is apparently evolving neutrally, albeit with an N e lower than the one expected in the absence of background selection. In the case of hitchhiking, the effects of selection on a linked neutral site can be drastic and cannot even be described properly by a mere reduction of N e .
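The scaling of local N e under background selection can be illustrated numerically. The text above only states that N e is scaled by the fraction of individuals free from deleterious mutations (called f0 here). The expression f0 = exp(-U / (2hs)), with U the deleterious mutation rate per diploid region, s the selection coefficient and h the dominance coefficient, is the commonly quoted form for a non-recombining region under strong selection; it is treated here as an assumption, and the parameter values are hypothetical.

```python
import math


def mutation_free_fraction(u_diploid: float, s: float, h: float) -> float:
    """Equilibrium fraction of mutation-free individuals, f0 = exp(-U / (2 h s)).

    Assumed form for a region without recombination and with strong selection
    against heterozygous carriers (h * s well above 1 / Ne).
    """
    if s <= 0 or h <= 0:
        raise ValueError("s and h must be positive")
    return math.exp(-u_diploid / (2.0 * h * s))


def background_selection_ne(ne: float, f0: float) -> float:
    """Local Ne under background selection: Ne scaled by the mutation-free fraction."""
    if not 0.0 < f0 <= 1.0:
        raise ValueError("f0 must lie in (0, 1]")
    return ne * f0


# Hypothetical region: U = 0.01, s = 0.02, h = 0.25, baseline Ne = 1000.
f0 = mutation_free_fraction(u_diploid=0.01, s=0.02, h=0.25)
print(f"f0 = {f0:.3f}, local Ne = {background_selection_ne(1000, f0):.0f}")
```

Recombination weakens this effect, so the sketch gives an upper bound on the reduction for a region with the stated mutation and selection parameters.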

How widespread along the genome are these linked selection effects? The magnitude of N e reduction by linked selection ultimately depends on the frequency and intensity of selective sweeps and the magnitude of background selection. Recombination is key as, all other things being equal, a site in a region with a low recombination rate (per physical length) is more prone to linked selection at neighboring sites. The recombination rate is not constant but varies across the chromosome (Simianer et al. 1997; Nachman 2002) and N e is expected to be lower in regions of low recombination rates and higher in regions of high recombination rates.

When testing whether heterogeneity in N e correlates with regions undergoing genetic hitchhiking and background selection, empirical studies use different genetic proxies for selection at linked sites, e.g. density of selected sites or rates of recombination. Although these proxies are expected to be correlated with levels of variation in N e , empirical studies have not been able to show this relationship in a consistent way. Among the four species tested, Gossmann et al. (2011) found significant correlations of the heterogeneity in N e only with the recombination rate in D. melanogaster (positive) and with the density of genic sites in Homo sapiens (positive) and Arabidopsis thaliana (negative). The authors argue that these inconsistent correlations between N e and recombination rates and density of selected sites might be explained by their measure of the mutation rate, which was indirectly obtained from the synonymous diversity. Jiménez-Mena et al. (in review), surprisingly, did not find any correlation between the variation in N e they reported and the density of genic sites, local recombination rate, or presence of past selective sweeps or QTL. This is probably due to the large window size used for the estimation of N e across the genome and possibly the very short time interval used to measure drift.

The different studies that have reported variation in N e across the genome differed in (1) the method used to estimate N e , (2) the kind of N e estimated and (3) the amount of molecular data used. Nevertheless, they seem to reveal a substantial heterogeneity in N e within the genome of the species for which heterogeneity in N e was significant. The amount of variation in N e differed between the eleven species studied (Gossmann et al. 2011; Fig. 1) but this variation did not show a clear pattern between groups of species, e.g. the plant and animal species analyzed so far did not have different levels of heterogeneity. Only Drosophila spp. were studied using both long-term and realized N e . In Drosophila, variation in N e was reported at a large scale, such as entire chromosome arms (Orozco-terWengel et al. 2012; Tobler et al. 2014), as well as between gene fragments (Gossmann et al. 2011). Statistically significant variation in N e between gene fragments was also reported for the other nine species studied in Gossmann et al. (2011). In the case of Bos taurus, variation in N e was found in relatively large genomic windows with an average size of 5 Mb within and between chromosomes (Jiménez-Mena et al. in review).

Outlook: consequences for conservation and sustainable breeding practices

N e is used to evaluate the threat status of a population and to design breeding programs for both wild/captive populations and livestock (local breeds). As a ‘rule of thumb’, it has been suggested that a minimum N e of 50 is required to avoid the effects of inbreeding depression in the short term (Soulé 1980), and an N e larger than 500 if the interest is to maintain the evolutionary potential of the population over the long term (Franklin 1980). There is general disagreement on the appropriate values for these two lower limits, and there has been an intense discussion ever since the influential works of Franklin (1980) and Soulé (1980) were published. The short-term threshold of N e (N e = 50) was derived from the “one percent inbreeding per generation” rule, obtained from practical experience of livestock breeders (Soulé 1980). The long-term threshold of N e (N e = 500) was conceived based on the importance of mutational accumulation over time. Lande (1995) and Lynch and Lande (1998) later suggested increasing this value to 5000 based on the assumption that the majority of mutations occurring are deleterious, whilst only approximately 10 % are not deleterious and thus potentially evolutionarily favourable. Franklin and Frankham (1998) argued against this recommendation by Lynch and Lande to increase Franklin’s original threshold to 5000. Their reasoning was that (1) a portion of these deleterious mutations can also have evolutionary potential in a different environment, and (2) the mutation rate and heritability estimates that were used to derive the N e limit of 5000 were not very realistic, i.e. the mutation rate was too low and the heritability estimate too high. Very recently, Frankham et al. (2014a) recommended increasing these two thresholds to N e = 100 and N e > 1000, respectively, and the issue continues to be the subject of controversial discussion (Franklin et al. 2014; Frankham et al. 2014b; Rosenfeld 2014). 
With all these recommendations, N e is implicitly discussed assuming homogeneity of N e throughout the genome. What are the potential consequences of a heterogeneous N e in that context?

International agencies such as IUCN (International Union for the Conservation of Nature) and FAO (Food and Agriculture Organization of the United Nations) use N e as a parameter, amongst others, to evaluate the threat status of a population (IUCN 2011; FAO 2013). Strategies for managing small populations typically focus on maximizing N e to limit the loss of genetic variation (Ballou and Lacy 1995; Fernández et al. 2011). If variation in N e turns out to be substantial, it would be important to study the consequences of unforeseen local homozygosity on inbreeding depression in managed populations.

In livestock selection programs, the objective is to increase genetic value for a set of economically valuable traits but also to impose restrictions on the rate of increase in inbreeding coefficient, ΔF, as inbreeding depression might offset the genetic gains achieved (Toro et al. 2009). These programs rely on phenotypic and pedigree information (Hayes and Goddard 2010), but increasingly also on genome-wide marker information to select the breeding individuals (Meuwissen et al. 2001). This so-called genomic selection has transformed the breeding industry (Hayes et al. 2009). There are different selection schemes to manage the contribution of individuals in selection breeding programs. The selection scheme based on optimum contributions (OCS, Meuwissen 1997) is of particular interest in our context. Briefly, the idea is that increasing genetic gain and restricting the level of inbreeding of the population can be achieved by controlling the co-ancestry of the selected parents. The estimation of the breeding values can be obtained from pedigree or genomic information, and the breeding scheme to manage inbreeding can also be based on pedigree or genomic relationships. Current methods seek to restrict ΔF as estimated from pedigree (these methods assume that ΔF is homogeneous) or using co-ancestry estimated from molecular markers. An open question is whether local control of the rate of drift (and ΔF) is desirable and can be achieved.
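What local control might look like can be sketched in a few lines. The sketch evaluates, for a given vector of parental contributions c (summing to 1), the expected genetic merit of the offspring (the sum of c_i times EBV_i) and the expected average co-ancestry (the quadratic form of c with a co-ancestry matrix), and then checks the co-ancestry ceiling separately for several genomic regions. It does not perform the optimization itself, and the region names, matrices, EBVs and ceiling are all hypothetical; it is an illustration of the idea, not the method of Meuwissen (1997).

```python
def expected_merit(c, ebv):
    """Expected genetic merit of offspring: sum of contribution times EBV."""
    return sum(ci * gi for ci, gi in zip(c, ebv))


def group_coancestry(c, coancestry):
    """Expected average co-ancestry of offspring: quadratic form c' Theta c.

    `coancestry` is a square matrix (list of rows) of pairwise co-ancestries
    among candidates, with self co-ancestries on the diagonal.
    """
    n = len(c)
    return sum(c[i] * c[j] * coancestry[i][j] for i in range(n) for j in range(n))


def regions_exceeding(c, regional_coancestry, ceiling):
    """Names of genomic regions whose group co-ancestry exceeds the ceiling."""
    return [name for name, theta in regional_coancestry.items()
            if group_coancestry(c, theta) > ceiling]


# Three hypothetical candidates and two hypothetical genomic regions.
ebv = [1.2, 0.8, 0.5]
regional = {
    "region_A": [[0.50, 0.05, 0.02], [0.05, 0.50, 0.03], [0.02, 0.03, 0.50]],
    "region_B": [[0.50, 0.40, 0.02], [0.40, 0.50, 0.03], [0.02, 0.03, 0.50]],
}
for c in ([0.5, 0.5, 0.0], [0.4, 0.3, 0.3]):
    print(c, round(expected_merit(c, ebv), 3), regions_exceeding(c, regional, 0.30))
```

In this toy case the contribution vector with the higher expected merit breaches the ceiling in region_B only, which is the kind of local excess that a genome-wide constraint could mask.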

Interestingly, simulation results from Sonesson et al. (2012) show that using genomic information both to obtain the estimated breeding values (EBV) and to restrict inbreeding in the breeding scheme results in a more homogeneous build-up of inbreeding around selected regions harboring loci of interest. So when genomic information is used both for the EBV and the breeding scheme, the OCS method indirectly takes into account the selection forces acting on the evaluated markers and neighboring regions, and thus might actually control the local N e . Importantly, the authors also found that merely using pedigree-based information for restricting inbreeding among selected parents does not guard against ΔF rising far above the desired rate locally at specific regions of the genome. More recently, a slightly different OCS strategy that builds upon the results of Sonesson et al. proposes to combine weighting rare alleles with the OCS method that uses genomic information. This strategy also reduces excessive build-up of local inbreeding along the genome while obtaining higher genetic gain (Liu et al. 2014).

Although evidence of a heterogeneous N e has been found in several species, we currently lack a clear picture of the extent of heterogeneity in N e . As empirical information on genome-wide diversity in livestock accumulates, there is the opportunity to exploit that wealth of information for other purposes. Currently, this wealth of SNP data is predominantly used to search for strong selective sweeps between breeds and, in artificial selection programs, to predict breeding values. However, these datasets could also be used (1) to assess empirically the magnitude of the heterogeneity in N e realized over generations and (2) to monitor more precisely the rate at which inbreeding builds up locally in the genome of livestock breeds. Both conservation and breeding programs might be improved by the knowledge gained about the interaction between drift and linked selection in well-studied livestock populations.