Fitness consequences of polymorphic inversions in the zebra finch genome

Knief, Ulrich; Hemmrich-Stanisak, Georg; Wittig, Michael; Franke, Andre; Griffith, Simon C.; Kempenaers, Bart; Forstmeier, Wolfgang

doi:10.1186/s13059-016-1056-3

Fitness consequences of polymorphic inversions in the zebra finch genome

Research
Open access
Published: 29 September 2016

Volume 17, article number 199, (2016)
Cite this article

Download PDF

You have full access to this open access article

Genome Biology Aims and scope Submit manuscript

Fitness consequences of polymorphic inversions in the zebra finch genome

Download PDF

Ulrich Knief^1,2,
Georg Hemmrich-Stanisak³,
Michael Wittig³,
Andre Franke³,
Simon C. Griffith^4,5,
Bart Kempenaers¹ &
…
Wolfgang Forstmeier¹

4691 Accesses
32 Citations
21 Altmetric
Explore all metrics

Abstract

Background

Inversion polymorphisms constitute an evolutionary puzzle: they should increase embryo mortality in heterokaryotypic individuals but still they are widespread in some taxa. Some insect species have evolved mechanisms to reduce the cost of embryo mortality but humans have not. In birds, a detailed analysis is missing although intraspecific inversion polymorphisms are regarded as common. In Australian zebra finches (Taeniopygia guttata), two polymorphic inversions are known cytogenetically and we set out to detect these two and potentially additional inversions using genomic tools and study their effects on embryo mortality and other fitness-related and morphological traits.

Results

Using whole-genome SNP data, we screened 948 wild zebra finches for polymorphic inversions and describe four large (12–63 Mb) intraspecific inversion polymorphisms with allele frequencies close to 50 %. Using additional data from 5229 birds and 9764 eggs from wild and three captive zebra finch populations, we show that only the largest inversions increase embryo mortality in heterokaryotypic males, with surprisingly small effect sizes. We test for a heterozygote advantage on other fitness components but find no evidence for heterosis for any of the inversions. Yet, we find strong additive effects on several morphological traits.

Conclusions

The mechanism that has carried the derived inversion haplotypes to such high allele frequencies remains elusive. It appears that selection has effectively minimized the costs associated with inversions in zebra finches. The highly skewed distribution of recombination events towards the chromosome ends in zebra finches and other estrildid species may function to minimize crossovers in the inverted regions.

Genetic structure and viability selection in the golden eagle (Aquila chrysaetos), a vagile raptor with a Holarctic distribution

Article 11 July 2016

Female-biased gene flow between two species of Darwin’s finches

Article 04 May 2020

Insights into the genetic architecture of morphological traits in two passerine bird species

Article 14 June 2017

Background

Between-individual genetic variation is the substrate for selection. Genetic polymorphisms range in size from single nucleotides (SNPs) to large scale insertions, deletions, or rearrangements that span several millions of base pairs [1, 2]. Among these structural variants, inversions play a prominent role and have long been recognized as drivers of local adaptation and speciation (reviewed in [3]). Inversions are intrachromosomal structural mutations which result in the reversal of gene order (and no change in the genic content of a chromosome) [4].

In heterokaryotypic individuals (those that are heterozygous for an inversion) recombination within the inverted region is largely suppressed, either because homologous pairing is partially inhibited or because crossovers give rise to unbalanced gametes (carrying deletions or duplications) which will lead to the death of the zygote [1]. These two processes are not mutually exclusive and their prevalence depends, amongst others, on the size and location of the inverted region [5–8]. In particular, a distinction between those inversions which cover both chromosome arms and thus include the centromere (pericentric inversions) and those which are restricted to a single chromosome arm (paracentric inversions) has often been made [9]. A single crossover within a pericentric inversion leads to the formation of two chromatids with duplications and deficiencies and two normal chromatids, whereas in paracentric inversions an acentric fragment and a dicentric chromatid along with two normal chromatids are formed [9]. In species with an ordered (linear) tetrad in the female meiosis [e.g., Drosophila spp. or maize (Zea mays)] paracentric inversions often do not cause reduced fertility in females because the dicentric chromatid is preferentially passed into the second polar body [6, 9]. On the other hand, pericentric inversions often lead to decreased fertility in females [6, 7], which may also explain the preponderance of polymorphic paracentric over pericentric inversions in species like Drosophila spp. that lack male recombination [10]. In contrast, humans and maize recombine in the male meiosis and heterokaryotypic males for both pericentric and paracentric inversions may produce a high percentage of unbalanced gametes and hence inviable embryos [5, 11–14], with recombination frequency being highest in the largest inversions (absolute or proportional to the total chromosome size).

Despite their presumed heterozygous fitness costs, which would ultimately lead to the loss of the minor allele from the population, inversion polymorphisms are ubiquitously found within species (reviewed in [3], [15–22]). Inversion polymorphisms are, therefore, somewhat of an evolutionary puzzle. In order to increase in allele frequency, the inverted region should either confer a fitness advantage to the organism or exhibit segregation distortion (drift as the sole force is unlikely but may contribute in small populations [23]). The most prominent feature of an inversion is its ability to suppress recombination within the inverted region. This may preserve linkage disequilibrium between beneficial combinations of alleles, which could lead to the spread of the inversion (with epistatic fitness interactions [24] or without epistasis [25, 26]).

Once a beneficial inversion starts to spread in a population, several mechanisms may prevent it from going to fixation, thereby maintaining the polymorphic state at some equilibrium frequency. The frequencies of most of the known inversion polymorphisms within a species vary latitudinally [18, 27–30], locally [21], or seasonally [31], apparently in response to a changing environment. However, there are also examples of polymorphisms within single populations which could be stabilized via frequency-dependent (disruptive) selection [17, 32–34], antagonistic pleiotropy [35], mate choice [36, 37], recessive deleterious mutations captured by or accumulating on the inverted haplotype (“associative overdominance” [23, 38]), overdominance (i.e., the heterokaryotypic individuals have higher fitness than both homozygotes [38, 39]), or under several scenarios involving segregation distortion [40, 41]. Several of these scenarios will effectively result in overdominance for fitness or in fitness being negatively correlated with the inversion’s frequency; both of which should be possible to measure empirically.

In birds, intraspecific inversion polymorphisms are regarded as common [42, 43], yet it is unknown whether birds have evolved mechanisms to suppress recombination within inversions to reduce the cost of embryo mortality. The two best-studied cases are found in ruffs (Philomachus pugnax) and in white-throated sparrows (Zonotrichia albicollis). In both species an inversion is linked to distinct plumage and behavioral phenotypes, covering around 100 and 1000 genes, respectively. By (almost) completely suppressing recombination between arrangements these inversions constitute so-called supergenes [19, 20, 22, 44]. In white-throated sparrows, the inversion likely became polymorphic through a past hybridization event [22] and is kept polymorphic by disassortative mating between birds with the two arrangements (e.g., [22, 37, 44]).

The zebra finch (Taeniopygia guttata) belongs to the family of grassfinches (Estrildidae), which are rich in polymorphic inversions [45–47]. Two polymorphic pericentric inversions have been described cytogenetically in zebra finches, one on chromosome Tgu5 (the sixth largest chromosome in the karyotype [45, 48, 49]) and one on the sex chromosome TguZ [50]. The inversion polymorphism on chromosome TguZ is found in the Australian subspecies (T. guttata castanotis) and, with a different allele frequency, in the subspecies from Timor (T. guttata guttata) [50].

Here we report on a genome-wide scan for inversion polymorphisms in a wild population of 948 zebra finches from Australia. Due to their nomadic behavior, Australian zebra finches appear to form one very large panmictic population [51, 52], such that the sampled birds are considered representative for the entire Australian subspecies. Using 4553 SNPs, we searched for unusual patterns of long-range linkage disequilibrium and identified four large linkage blocks (two of which are the known inversion polymorphisms on chromosomes Tgu5 and TguZ). We then inferred the inversion genotypes for every individual by principle component analysis, selected unique tagging SNPs, and genotyped an additional set of 5229 birds from four different captive populations of Australian zebra finches. We used these data to study the phenotypic and fitness consequences of the four inversion polymorphisms. Our aim was to address three main issues. (1) Heterokaryotypic individuals should exhibit increased embryo mortality rates if they are unable to completely suppress recombination within the inverted region or to remove the unbalanced meiotic products. We test this prediction by analyzing the occurrence of natural embryonic deaths in 9764 developing eggs. (2) Heterosis and frequency-dependent selection could balance inversion polymorphisms; we test both selective forces by correlating inversion genotypes with several fitness parameters (viability, fecundity, siring success, and number of independent offspring). (3) Effect sizes of associations between inversion genotypes and polygenic traits are expected to be higher than those using single SNPs in collinear parts of the genome because multiple causal variants will be linked together by the inversion. Thus, inversions offer an opportunity to study the relative importance of additive versus dominance effects in a defined genomic region. We use the inversion genotypes as predictors in association studies with eight morphological traits and compare the contribution of additive and dominance effects on phenotypic variation.

Results

Detection and description of inversion polymorphisms in a wild population

Linkage disequilibrium patterns

In collinear parts of the zebra finch genome, linkage disequilibrium (LD; measured as r²) >0.1 extends maximally for 185 kb in our sample of wild Australian birds (Knief U, Schielzeth H, Backström N, Hemmrich-Stanisak G, Wittig M, Franke A, Griffith SC, Ellegren H, Kempenaers B, Forstmeier W: Association mapping of morphological traits in wild and captive zebra finches: reliable within but not between populations, unpublished) (for a representative example, see Additional file 1: Figure S1a). In contrast, four chromosomes (Tgu5, Tgu11, Tgu13, and TguZ) showed extraordinarily large linkage blocks, spanning several megabases (12–63 Mb). They showed the typical LD structure of an inversion [53–55], with LD being highest near the presumed inversion breakpoints and lower in the central parts of the inverted region due to gene flux by double crossover (Fig. 1a, c, e, g).

For chromosome Tgu5 the LD block reaches from 0.96–16.50 Mb, which is equivalent to 25 % of the assembled chromosome and covers 325 genes. The inversion most likely includes the centromere, which is located maximally 5.12 Mb from the proximal chromosome end (Table 1) [56]. On chromosome Tgu11, the region of high LD extends from 0.086–12.29 Mb (equivalent to 57 % of the assembled chromosome, spanning 250 genes), covering the most proximal SNP that had been genotyped. The centromere on chromosome Tgu11 is located at the distal end of the chromosome at around 20 Mb [57] and is thus located outside the LD block (Table 1). On chromosome Tgu13, almost the complete assembled chromosome is part of one large LD region (99 %, covering 312 genes), starting from the second proximal SNP and covering the most distal SNP being genotyped (0.15–16.91 Mb). The centromere on chromosome Tgu13 is located at the distal end of the chromosome [57]. Some parts of the genome assembly are missing at this position [57, 58], yet crossovers between the centromere and a marker located within the LD region occur in heterokaryotypic individuals [57] and we thus conclude that the LD region does not cover the centromere (Table 1). Finally, the physically largest LD block was found on the sex chromosome TguZ, extending from 5.91–68.83 Mb, which is equivalent to 86 % of the total chromosome length and covering 619 genes. The centromere on chromosome TguZ is located at 27.62–28.12 Mb and is thus included in the LD region (Table 1). Summing up, the inversions on chromosomes Tgu5 and TguZ are pericentric and the ones on chromosomes Tgu11 and Tgu13 are paracentric.

Table 1 Description of the four large linkage blocks (resulting from inversion polymorphisms) on chromosomes Tgu5, Tgu11, Tgu13, and TguZ and the two smaller and less certain ones on chromosomes Tgu26 and Tgu27 found in wild Australian zebra finches

Full size table

Weaker signals of long-range LD were also found on chromosomes Tgu26 and Tgu27 (Additional file 1: Figure S1c, e), covering 2.05 Mb (42 % of the total chromosome length, covering 57 genes) and 2.74 Mb (59 % of the total chromosome length, covering 166 genes), respectively. Yet, LD patterns were not typical for inversions and principal component analyses (PCAs; see the “Principle component analyses” section below) did not lead to clear clustering (Additional file 1: Figure S1d, f). Hence, we presumed that these are not inversions and did not analyze them further.

We tried to locate the inversion breakpoints with high resolution using the distance and orientation of paired reads from pooled sequencing data. However, we were unable to map any of the breakpoints, suggesting that they are located in genomic regions that are missing in the current genome assembly (for example, in repetitive sequences).

Principle component analyses

The four chromosomes found in the LD scan also showed inversion-typical patterns in the PCA (Fig. 1b, d, f, h, principle component loadings: Additional file 1: Figures S2–S5). The three autosomal inversions had two main homozygote haplotype clusters (with the heterozygous individuals in between) and the sex chromosome split into three main homozygote haplotype clusters (with the heterozygous individuals in between). The clusters were well defined on the autosomes but on chromosome TguZ the least common haplotype (haplotype C in Fig. 1h) seemed to allow some recombination with each of the two other haplotypes, making the clusters more diffuse. However, both the low average heterozygosity within each cluster of homozygotes compared to heterozygotes (Table 2) and median-joining networks (using Network v4.6.1.1 with standard settings [59]) on phased SNP data at the inversion breakpoint (using Beagle v3.3.2 [60]; Additional file 1: Figure S6) further support the interpretation that the LD regions represent inversion polymorphisms. It should also be noted that chromosomes Tgu5 and TguZ had been previously found cytogenetically to carry pericentric inversions and the breakpoints match precisely to the LD region boundaries [45, 48–50].

Table 2 Population genetic descriptive statistics of the four inversion polymorphisms in wild Australian zebra finches

Full size table

From the current analyses we do not know with confidence which arrangement is ancestral and we thus name them according to their allele frequency (A = major haplotype, B = minor haplotype, C = least common haplotype on chromosome TguZ; Fig. 1b, d, f, h; Table 2). The major alleles of all four inversion polymorphisms showed remarkably similar frequencies ranging between 0.53 and 0.60 (Table 2). On chromosome TguZ, the least common allele (haplotype C) was rare (frequency 0.074; Table 2). All inversion polymorphisms were in Hardy–Weinberg equilibrium (HWE; Table 2) and there was no LD between them, which means that they segregate independently (Additional file 2: Table S1).

Pooled heterozygosity and minor allele counts at inversion breakpoints

We calculated pooled heterozygosity (ZH_p) in 50-kb non-overlapping sliding windows along each chromosome (Fig. 2a). Low values of ZH_p are indicative of regions with a high degree of fixation, for example, due to positive selection [61]; high values of ZH_p are expected, for example, in regions of local population structure (like inversions) or under balancing selection [62]. We found pronounced peaks in ZH_p at the presumed breakpoints of the inversions on chromosomes Tgu5, Tgu11, and Tgu13, whereas ZH_p dropped to almost genome-wide average values in the interior of the inversions. Chromosome Tgu11 had only one such peak, suggesting that the proximal breakpoint is missing in the current genome assembly. Diversity (SNPs per site in a 50-kb window; Additional file 1: Figure S7) was slightly reduced at the presumed breakpoints of every inversion compared to the inversion’s interior (mean SNPs per site ± standard deviation at breakpoints versus interior of 0.017 ± 0.005 versus 0.020 ± 0.005 for Tgu5, 0.0057 ± 0.0036 versus 0.018 ± 0.004 for Tgu11, and 0.016 ± 0.006 versus 0.022 ± 0.004 for Tgu13; 0.021 ± 0.007 collinear autosomal genome-wide average SNPs per site). On chromosome TguZ, the entire inversion interior had high ZH_p values, which only dropped to the genome-wide average outside the inverted region. Further, diversity on TguZ was markedly reduced all along the inverted region, including the presumed breakpoints, and increased to the genome-wide average only outside the inversion (0.0021 ± 0.0015 versus 0.022 ± 0.009, respectively).

The minor allele count frequency (MAC) spectra at the breakpoint regions for all autosomal inversion polymorphisms showed an admixture of the background MAC spectrum (Fig. 2f) with a second MAC distribution whose local maximum matches with remarkable accuracy the allele frequency of the inversion types in those 100 individuals that had been used for the pooled population sequencing (Fig. 2b–e).

Association analyses and fitness consequences in captive and wild populations

Associations with embryo mortality

For each of the inversion polymorphisms, we tested whether heterokaryotypic individuals had higher embryo mortality rates than homozygotes using data from three captive populations of zebra finches. We fitted generalized linear mixed-effects models using embryo mortality as a binomial response variable (0 = embryo survived until hatch, 1 = embryo died naturally; overall proportion of dead embryos, \( \overline{\mathrm{x}}\kern0.37em =27.4\ \%,\ \mathrm{n} = 9764 \) fertile eggs; for the “Seewiesen” population \( \overline{\mathrm{x}}=31.5\ \%,\ \mathrm{n} = 6334 \) eggs; for the “Bielefeld” population \( \overline{\mathrm{x}} = 22.9\ \%,\ \mathrm{n} = 1170 \) eggs; for the “Cracow” population \( \overline{\mathrm{x}}=18.4\ \%,\ \mathrm{n} = 2260 \) eggs; Table 3) and the inversion genotypes of both parents as two predictors, coded as 1 = heterozygous and 0 = homozygous.

Neither the mother’s nor the father’s inversion genotype had an effect on embryo mortality that survived strict Bonferroni correction (Fig. 3; Additional file 1: Figure S8). Notably, the two inversions which cover almost entire chromosomes (Tgu13 and TguZ) had a weak effect in the expected direction in heterokaryotypic males (meta-analytic odds ratio (OR) [95 % confidence interval] = 1.17 [1.01–1.36] for Tgu13, P = 0.040, and OR = 1.16 [0.99–1.36], P = 0.065 for TguZ).

Associations with fitness parameters

We fitted generalized linear mixed-effects models using measures of four fitness components as response variables and the inversion genotype simultaneously as an additive (−1 = homozygous for the minor allele, 0 = heterozygous, 1 = homozygous for the major allele, using one degree of freedom) and a dominance (0 = homozygous, 1 = heterozygous) predictor in the three captive populations (“Seewiesen”, “Bielefeld”, and “Cracow”). For females we included fecundity (number of eggs laid) and reproductive success (number of chicks that survived until an age of 35 days). For males we used siring success (total number of eggs sired in communal aviaries) and reproductive success (total number of chicks sired that survived until an age of 35 days). Sample sizes are given in Table 3.

Table 3 Sample sizes for the association analyses with embryo mortality and fitness parameters in the three captive zebra finch populations

Full size table

Neither in females nor in males did any of the inversion polymorphisms exhibit additive or non-additive fitness effects (Fig. 4; Additional file 1: Figure S9).

Frequency-dependent selection

Within the aviary setting we tested whether negative frequency-dependent selection was balancing the inversion polymorphisms (see “Methods”). For this analysis we used two captive populations (“Seewiesen” and “Bielefeld”) and tested whether there was a negative relationship between inversion frequency and fitness. None of the inversion polymorphisms was subject to significant negative frequency-dependent selection. However, the three strongest effects of inversion frequency on fitness parameters were in the expected direction (chromosome Tgu5 frequency on male reproductive success [nominal P = 0.077] and chromosome Tgu11 frequency on female fecundity [nominal P = 0.12] and female reproductive success [nominal P = 0.21]; Additional file 1: Figure S10).

Segregation distortion

Within three captive populations (“Seewiesen”, “Bielefeld”, and “Cracow”) we tested whether the four inversion polymorphisms were transmitted to the next generation in a fair Mendelian way. None of them showed signs of segregation distortion (Additional file 2: Table S2).

Associations with phenotypes

To test whether inversion genotypes had an effect on morphological traits we fitted generalized linear mixed-effects models using eight different Z-transformed phenotypes as response variables (body mass, tarsus length, wing length, beak length, beak depth, beak width, digit ratio, and visible fat deposition scores) and the inversion genotype simultaneously as an additive and a dominance predictor (see above section "Associations with fitness parameters"). For these analyses we used data from three captive zebra finch populations (“Seewiesen”, n = 3233 individuals; “Bielefeld”, n = 1096 individuals; “Cracow”, n = 634 individuals) and from two wild populations (“Fowlers Gap”, n = 939 individuals; “Sydney”, n = 265 individuals).

The inversions on chromosomes Tgu5, Tgu11, and TguZ had strong additive effects on six out of the eight phenotypes. In total, nine out of 40 associations survived a strict Bonferroni correction (Fig. 5). The major allele A of the inversion on chromosome TguZ had the strongest effects overall and increased visible fat deposition (nominal P = 1 × 10⁻¹⁶) and body mass (nominal P = 2 × 10⁻¹⁴) and had a negative effect on tarsus length (nominal P = 4 × 10⁻⁶).

None of the inversions exhibited significant dominance effects on any of the phenotypes (Additional file 1: Figure S11).

Summary across morphological and fitness phenotypes

The effects of inversion genotypes on morphology and fitness could be so small that they were not detectable in our association studies due to low power (thereby committing a type II error; Fig. 6). This is especially true for the fitness components because sample sizes were smaller and effect sizes are expected to be smaller, at least for the additive genetic component (since natural selection should reduce the amount of additive genetic variance in fitness [63]). To obtain the expected distribution of effect sizes under randomness (null distribution) and to estimate the power for different effect sizes we used a permutation approach (see “Methods” for details).

The 40 empirical additive effect size estimates on morphology (5 comparisons between inversions × 8 phenotypes) clearly exceeded the random expectation (observed mean ± standard error (SE) = 0.0494 ± 0.0069, expected mean ± SE = 0.0163 ± 0.0002), suggesting that several of the associations which did not survive a strict Bonferroni correction were in fact real (Fig. 6). In line with this, power to detect small additive effect sizes was relatively low. In contrast, the 40 empirical dominance effects on morphology did not deviate from the random expectation (Fig. 6). In fact, the mean of the 40 dominance effects (0.0251 ± 0.0029) was slightly smaller than the mean of the random expectation (0.0300 ± 0.0004) and power to detect large effects was high. Similarly, neither the 20 empirical additive effect size estimates on fitness (5 comparisons between inversions × 4 fitness components) nor the 16 dominance effects (the same 20 associations excluding dominance effects on female fitness components on chromosome TguZ) deviated from the random expectation, despite high power for detecting large effect sizes (Fig. 6). Because we specifically expected to find positive dominance effects (heterosis) for fitness, we calculated the overall mean of the 16 dominance effects and found an effect size that was very close to zero (weighted d = 0.0019, P = 0.91).

Discussion

Here we describe four large inversion polymorphisms (12–63 Mb) in wild Australian zebra finches using molecular and population genetic tools. Two of them had been identified previously in cytogenetic screens because they shift the position of the centromere [45, 48–50]. In total, the inverted regions span at least 8.7 % of the zebra finch genome and 8.1 % of all annotated genes (based on the Ensembl80 gene predictions). Although all polymorphisms are in HWE, their remarkably similar allele frequencies (range of the major allele 0.53–0.60) may indicate some sort of balancing selection acting on them. We find tentative evidence that the largest two inversions increase embryo mortality in heterokaryotypic males (but not in females), which makes their high allele frequencies even less likely to be due to drift alone. However, although the inversions have an additive effect on several morphological traits, we do not find any dominant gene action and no balancing effects on several aspects of fitness in three captive zebra finch populations.

Ruling out other processes leading to high LD

In this study we used PCAs and LD patterns to detect inversions and to genotype the wild-caught individuals [64–68]. LD in the four inversions was increased 26–137 times compared to same-sized regions in the rest of the genome, which is still an underestimate since in the collinear parts of the genome, LD estimates arise from sampling noise alone (we use r² to measure the strength of LD which can never be negative and never reach 0) [69]. We further backed-up our interpretation by looking at population genomic patterns, which are typical for inversion polymorphisms with the highest LD values at and between the inversion breakpoints (see also the “Detection and description of inversion polymorphisms in a wild population” section in the “Results”) [8, 53, 55]. In addition, for two of the chromosomes (macrochromosomes Tgu5 and TguZ with pericentric inversions) there is independent cytogenetic evidence for polymorphic inversions [45, 48–50] (see below for details).

Regions of high LD in the genome can arise by processes other than inversions that suppress recombination; demographic events (inbreeding, admixture, bottlenecks), genetic drift in regions with low recombination, or natural selection are such mechanisms [70]. The wild Australian zebra finch population has been increasing to a current effective population size of 1.3 to 7 million [52] with no traces of inbreeding [71], admixture, bottlenecks, or population structure [51, 52]. Hence, demographic events cannot have caused the observed high LD on just four chromosomes while maintaining very low LD values on all other chromosomes. We can also reject genetic drift as a mechanism creating the observed high LD spanning more than 12 Mb. In zebra finches, recombination is highly biased towards the telomeres with large “recombination deserts” in the center of the macrochromosomes [58, 72]. Chromosome Tgu2 is the largest chromosome in the zebra finch genome with the largest “recombination desert” (and the lowest recombination rate [58]). However, there is no such long-range LD on chromosome Tgu2 (Additional file 1: Figure S1a) and neither on any other chromosome. There is a sign of drift in a region with low recombination visible on chromosome Tgu2, namely at the centromere between 81 and 82 Mb, thus spanning less than 1 Mb (Additional file 1: Figure S1a). So, could selection in the absence of inversions have caused the regions of high LD we observed? Increased LD due to selection is usually transient [73] and decays rapidly after a beneficial allele has been fixed [74]. In the face of substantial recombination, strong and ongoing (epistatic) selection would be required to keep regions larger than 12 Mb in high LD [70]. However, we did not find any evidence for a selective advantage. Another typical sign of selection is reduced nucleotide diversity [73], but this was only visible on chromosome TguZ (Additional file 1: Figure S7). Moreover, the abrupt increase in diversity at the presumed breakpoints on chromosome TguZ is more likely the result of selection acting on an inversion polymorphism [75]. Taken together, the most likely explanation for the observed long-range LD on chromosomes Tgu5, Tgu11, Tgu13, and TguZ is that they harbor polymorphic inversions.

Comparison to cytogenetic results

Previous cytogenetic analyses detected pericentric inversions on chromosomes Tgu5 and TguZ [45, 48–50]. In these analyses, the inversion on chromosome Tgu5 covered around 31 % of the chromosome with the proximal breakpoint located close to the telomere of the short arm [49], which matches well with our LD region (LD block reaches from 0.96–16.50 Mb, corresponding to 25 % of the chromosome length). On chromosome TguZ, cytogenetic analyses located the proximal inversion breakpoint at around 5.85 Mb (but in a highly repetitive region which is missing in the current genome assembly) and the distal breakpoint beyond 65.38 Mb [50]. Again, this matches well with our LD region (LD block reaches from 5.91–68.83 Mb). LD and PCA patterns for chromosomes Tgu11 and Tgu13 are very much the same as for chromosomes Tgu5 and TguZ. The inversions on these two microchromosomes most likely have not been discovered in the cytogenetic studies because these chromosomes cannot be identified unambiguously and are usually disregarded in such studies. Furthermore, the inversion on chromosome Tgu11 is paracentric and thus does not shift the position of the centromere and on chromosome Tgu13 the change of the centromere position is most likely invisible.

Inversion polymorphisms on chromosomes Tgu5 and Tgu11—smaller and less complex

On chromosome Tgu5, the inversion is pericentric and on chromosome Tgu11 paracentric (see also [45, 48, 57]). Compared to the inversions on chromosomes Tgu13 and TguZ (described further in the next section), the inverted segments on chromosomes Tgu5 and Tgu11 span smaller proportions of the corresponding total chromosome lengths (covering 25 % and 57 % of the total chromosome, respectively). In the PCAs, individuals were only separated along PC1. Individual scores on PC2 showed a normal distribution, which indicates that there is no additional population substructure [76, 77] due to a second rearrangement included in, overlapping with, or independent of the first. In that sense, the inversions on chromosomes Tgu5 and Tgu11 are less complex than those on chromosomes Tgu13 and TguZ (see the next section). Individual SNPs loaded only on PC1 with the strongest loadings close to the breakpoints. Accordingly, the median-joining networks formed only two separated haplotype clusters and also the LD patterns suggested that these are simple single inversions. LD and pooled heterozygosity were highest at the presumed breakpoints and dropped to the central regions, which are typical signs of gene flux due to double crossovers between two simple arrangements [8]. Parsimoniously, if double crossovers do occur, we would also expect that single detrimental crossovers should occur occasionally between the arrangements, leading to unbalanced gametes and embryo mortality, but we did not observe any increased embryo mortality rate in heterokaryotypic individuals.

At least one single crossover per chromosome is needed to ensure the proper segregation of homologous chromosomes in meiosis [78]. On chromosomes Tgu5 and Tgu11, however, the collinear parts of the chromosome are large and a crossover is likely to be initiated there, with no adverse effect on the meiotic products [1]. Alternatively, the inversions may be too rigid to synapse (a loop structure needs to be formed within the inverted region), thereby suppressing recombination [5]. We suspect that detrimental single crossovers within the inverted region in heterokaryotypic individuals happen so rarely that they fall below the detection limit in our analysis of embryo mortality. In line with this interpretation, a recent cytogenetic study on the inversion on chromosome Tgu5 did report no loop formation and no crossovers in 230 meioses from three heterokaryotypic individuals (including females and males) [49]. The absence of an inversion loop suggests that the homologs synapse heterologously [49].

On chromosome Tgu5, heterozygosity (Table 2) and diversity (the spread in the median-joining network) within cluster B is low, suggesting that it increased in frequency in the population and that type A is the ancestral state. Using the same line of argument, on chromosome Tgu11, type A spread in the population and type B is the ancestral state.

Inversion polymorphisms on chromosomes Tgu13 and TguZ—complex and costly

The inversion on chromosome Tgu13 is paracentric and at least one inversion on chromosome TguZ is pericentric (see also [50]). The inverted regions on both chromosomes are very large relative to the corresponding total chromosome lengths (covering 99 % and 86 % of the total chromosome, respectively, but note that the assembly of chromosome Tgu13 appears to be incomplete). Our PCA for chromosome TguZ showed at least three large haplotype clusters (A, B, and C), but also the higher principle component PC3 deviated from a normal distribution, suggesting an even more complex situation and an independent linkage block at the distal inversion breakpoint (Additional file 1: Figure S5). In line with this interpretation, individual SNPs loaded in complex patterns on PC1 to PC3 and the outermost breakpoints were not particularly defined. Itoh et al. [50] described a single large pericentric inversion on chromosome TguZ and we suppose that they identified the two most common types A versus B (B together with C) because our breakpoint locations match with the data in Itoh et al. [50] and there is apparently no LD between types B and C in the region where their tagging length polymorphism (deletion/insertion) is located (Additional file 1: Figure S12). However, in that case the allele frequency estimates by Itoh et al. [50] in wild Australian zebra finches deviate from ours (sums across all sampling locations, assuming that the common type is allele A: A = 61, B + C = 9; for our sample A = 758, B + C = 514, Fisher’s exact test P = 8 × 10⁻⁶). This could be explained if their tagging marker is not reliably linked to the inversion in the wild (the marker is only accurate in 28 out of the examined 30 cases in two captive populations). Note that we used a total of six tagging SNPs that showed perfect clustering in 948 wild birds.

The median-joining network and the number of shared SNPs suggest that haplotypes B and C on chromosome TguZ are more closely related with each other than with haplotype A. Judging from the fuzzy clusters formed in the PCA, gene flux between arrangements seems to happen, either between haplotypes A and C or between haplotypes B and C or between both of the pairs. Thus, inversion types B and C could be more related because of their shared ancestry or because of gene flux and in the end we cannot separate these two possibilities.

The PCA on chromosome Tgu13 separated individuals largely along PC1. However, PC2 distinguished between at least two groups within inversion type A; yet these groups were not completely separated, indicating some more extensive gene flux between them. The higher principle components (≥PC3) were normally distributed, suggesting that there is no additional population substructure [76, 77]. The LD patterns on chromosome Tgu13 suggest that there is gene flux between the two main arrangements (types A and B) due to double crossovers [8] (Fig. 1).

There is tentative evidence that mortality rates are increased in embryos sired by heterokaryotypic males for both chromosomes Tgu13 and TguZ (by a weighted average of 4.5 % for each of the chromosomes across populations). We suspect that these effects are not type I errors resulting from multiple testing because also in human males noticeable rates of unbalanced gametes are produced only when an inversion (both para- and pericentric) spans more than half of the chromosome [5, 11, 12]. In fact, our Fig. 3 shows a remarkable similarity to Fig. 6b in Anton et al. [5]. However, assuming that the rate of unbalanced gametes translates directly into embryo mortality, the effect in humans is an order of magnitude (12-fold) larger than in zebra finches [5], indicating that zebra finches evolved a rather effective way to decrease recombination within inversion heterokaryotypes. Interestingly, the median-joining networks and PCA results suggest that there is a succession of inversions on chromosomes Tgu13 (within haplotype A) and TguZ (haplotypes B and C appear to be more closely related), and accumulating inversions on a chromosome may be a way to increasingly suppress recombination between inversion haplotypes (as, for example, in the t-complex in mice [40]).

Heterokaryotypic female zebra finches did not show increased rates of embryo mortality for the inversion on chromosome Tgu13, which suggests that they are either able to pass on abnormal meiotic products to the second polar body (as found in Drosophila and maize in case of paracentric inversions [6, 14]) or that they are able to shut down recombination within the inverted region almost completely (as suggested for pericentric inversions in grasshoppers [1]).

For both chromosome Tgu13 and TguZ it is difficult to conclude which haplotypes represent the ancestral states from patterns of diversity or the median-joining networks. Within the inversion on chromosome TguZ, SNPs per site were reduced tenfold compared to the collinear outer parts of TguZ (Additional file 1: Figure S7; which was also found by Balakrishnan and Edwards [52]). Hence, the patterns of diversity indicate that all three haplotypes (A, B, and C) are rather recently derived, having replaced the high ancestral diversity that is still present on the distal ends of chromosome TguZ (Additional file 1: Figure S7). However, the minimal sojourn time of at least one of the rearrangements is supposed to be at least 1.2–2.8 million years, which is the estimated split time, with little subsequent gene flow, between Timor and Australian zebra finches [52]. Supposedly, both subspecies are polymorphic for one of the rearrangements [50].

Fitness effects: no heterosis for viability in the wild or for fitness in captivity

The inversions on chromosomes Tgu13 and TguZ, and probably also to a lesser extent the ones on chromosomes Tgu5 and Tgu11, should be costly in terms of increased embryo mortality whenever they are in the heterozygous state in males. Given an effective population size of wild Australian zebra finches of 1.3 × 10⁶ to 7 × 10⁶ [52], it is unlikely that the polymorphisms would have escaped purifying selection and be at frequencies (0.53–0.60) close to their fitness minimum (at an allele frequency of 0.5 the maximal number of individuals are heterozygous), if they do not confer a fitness advantage to their carriers.

The simplest condition for a balanced polymorphism with two alleles is given when both homozygotes have lower fitness than heterokaryotypic individuals (heterosis [1]). In our sample of wild zebra finches, all four inversion polymorphisms were in HWE, indicating that there was no heterosis for viability at the time of sampling the individuals. However, it is possible that heterosis is only expressed during stressful environmental conditions in the wild, such as during a severe drought. Such selective events could be so rare that they did not happen during the few years in which the sampled individuals lived. Furthermore, deviations from HWE are not necessarily expected if heterotic superiority depends on fecundity or siring success rather than viability [1]. Thus, we tested whether the inversions showed heterotic superiority with respect to several other aspects of fitness (female fecundity, male siring success, and the number of offspring produced) in three captive populations of zebra finches. The average of all effect sizes was close to zero (weighted d = 0.0019, P = 0.91), suggesting that none of the four inversions exhibits heterosis, at least in a captive environment. The number of offspring produced is not independent of embryo mortality and the observation of no underdominance for fitness either means that the reduction in fitness is compensated for or that we lack the power to detect it, given that the effect on embryo mortality was barely significant using almost 10,000 eggs.

Heterotic superiority could be due to direct overdominance (resulting from either the inversion breakpoints themselves or an allele fixed on both haplotypes and conferring a heterozygote advantage) or due to recessive deleterious mutations captured by or accumulating on the inverted haplotype (“associative overdominance”) [23]. If indeed heterotic superiority was stabilizing the inversions, we should be able to detect it also in captivity, judging from the strong inbreeding effects that have been observed in the captive Seewiesen population on morphology and fitness components [79, 80]. Thus, although we cannot rule out heterosis completely, other forms of balancing selection are more likely to keep the inversion polymorphic, which do not require heterokaryotypic superiority and do not lead to deviations from HWE, such as (negative) frequency-dependent selection [1], meaning that individuals carrying the rare inversion type have higher fitness than those carrying the more common type. In several Drosophila species, negative frequency-dependent selection stabilized inversion polymorphisms [33, 34, 81]: after the inversion frequencies were experimentally perturbed in a population, individuals with the rare inversion type had higher fitness and the inversion frequencies quickly returned to their equilibrium values in subsequent generations. Interestingly, negative frequency-dependent selection can theoretically lead to stable equilibria even if inversion polymorphisms are underdominant [82]. However, we did not find any significant evidence for frequency-dependent selection in the captive environment. Yet, the strongest effects of inversion frequency on fitness parameters were in the direction expected under negative frequency-dependent selection and this might deserve further study.

Morphological effects: all additive, no dominant gene action

We found remarkable additive genetic effects of the inversion genotypes on several morphological traits, which were highly consistent across populations (Fig. 5). Only one test for heterogeneity between populations was significant after Bonferroni correction, which was the association between beak length and haplotype A versus B on chromosome TguZ (Cochran’s Q test P = 0.026 [83]). In contrast to the strong additive effects on morphology, none of the four inversion polymorphisms exhibited dominant gene action.

Additive genotype–phenotype association studies typically find small effects of individual SNPs on a phenotype [84, 85] and associations are often difficult to replicate between populations due to differences in LD structure [86]. Recently, we tested several promising causal SNPs in collinear parts of the zebra finch genome for an additive association with the same morphological phenotypes and in the same populations as the ones studied here. Individual SNP effects were small and not consistent across populations (Knief et al., unpublished). The contrast between the diminishingly little additive effects of individual SNPs in collinear parts of the genome and the large and consistent additive effects of inversions is most likely due to differences in LD and highlights the polygenic nature of the quantitative traits under study. Whereas in collinear parts of the zebra finch genome LD decays rapidly [52], making associations hard to detect, inversions capture hundreds of alleles in extended defined haplotypes, which do not or hardly ever recombine. Thereby, they combine the additive effects of many causal alleles.

Conclusions

Large inversion polymorphisms are abundant in the estrildid finch family [45–47]. Here we describe inversion polymorphisms in one species belonging to this family, the zebra finch. We find polymorphic inversions on at least four out of its 32 annotated chromosomes. In each case, a novel haplotype has spread to about 50 % allele frequency and has persisted for an extended period of time. However, exactly when and in which species these inversions arose remains to be worked out. It will be interesting to see whether the polymorphisms are shared between species (as suggested in juncos [42]) and if so whether they introgressed into zebra finches (as it is apparently the case in white-throated sparrows [22]). Of course, we do not have any information on the ancestral fitness landscape of these inversion polymorphisms and thus on the mechanisms that led to their establishment, but we tested which selective forces maintain them in a polymorphic state at present. However, these forces remain elusive: we found no signs of heterosis for viability in a wild population or for other fitness-related traits in captivity despite our ability to measure fitness effects of recessive deleterious mutations in captivity [80]. Nevertheless, some benefit to the individual (undetected heterosis or frequency-dependent selection) or to the genotype itself (segregation distortion [87]) is expected, since a small cost remains: heterokaryotypic males produce a higher proportion of inviable embryos, presumably due to single crossovers within the inverted region. It appears that past selection has effectively minimized this cost: (1) “Small” inversions (chromosomes Tgu5 and Tgu11) do not observably increase the proportion of inviable embryos produced by heterokaryotypic individuals. Perhaps these inversions do not synapse regularly in meiosis, thereby reducing the risk of detrimental crossovers. (2) Heterokaryotypic females do not exhibit increased rates of embryo mortality even for the largest inversion on chromosome Tgu13. Thus, they may have found a way to deposit the abnormal meiotic products (the dicentric single-crossover chromatids in case of a paracentric inversion) to the polar bodies. (3) The effects on embryo mortality in heterokaryotypic males for the two largest inversions on chromosomes Tgu13 and TguZ are an order of magnitude smaller than those reported in humans. We suspect that this could be due to selection favoring repeated inversions on the same chromosome, thereby effectively suppressing pairing of the inversion types during meiosis and inhibiting detrimental crossovers. Additionally, the highly skewed distribution of recombination events towards the chromosome ends in zebra finches [58] and other Estrildidae species [72] may minimize crossovers in the inverted regions. The interior parts of zebra finch chromosomes show large recombination deserts (15-fold lower recombination rate compared to the chicken [58]) and it is possible that the underlying molecular mechanism was favored by selection because it also suppresses recombination in inverted regions. Testing this idea quantitatively would require an improved assembly of chromosomes Tgu11 and Tgu13.

Methods

Inversion discovery in wild zebra finches

Study population and phenotypes

We took blood samples from 1059 wild adult zebra finches (530 females, 529 males) at Fowlers Gap, NSW, Australia, in two places (S 30°57’ E 141°46’ and S 31°04’ E 141°50’) from October to December 2010 and in April/May 2011. A detailed description of the study sites and catching procedure using a walk-in trap at feeders is provided in Griffith et al. [88] and Mariette and Griffith [89]. In this study we refer to this population as “Fowlers Gap”.

The following phenotypes were measured on all birds: right tarsus length, right wing length, beak length, beak depth, beak width, ratio of the length of the second to fourth digit of the right foot (measured twice and averaged), and body mass. Further details on the measurement procedures and summary statistics are given in Knief et al. (unpublished, available upon request). We included a score-based measure of visible fat on the ventral side at the furcular depression and at the abdomen [90].

Population-level SNP data and sequencing

We sequenced pooled non-barcoded DNA samples from 100 of the 1059 “Fowlers Gap” individuals on the Illumina HiSeq 2000 platform (paired-end) at the Institute of Clinical Molecular Biology (IKMB) at Kiel University, Germany. Software input parameters are provided in Knief et al. [71]. Briefly, after mapping reads to the zebra finch genome assembly (WUSTL 3.2.4 [56]) using bwa (v0.5.9 [91]), we calculated an average genome coverage of 247.5× (using BEDTools v2.17.0 [92]) and called around 23 million SNPs using GATK (v2.1-11-g13c0244 [93]). SNPs with a minor allele count frequency (MAC) below 0.1 were rarer than expected due to an ascertainment bias in the SNP discovery pipeline [71].

Pooled population sequencing allows estimating diversity and allele frequencies across the genome [94]. Although individual-based data were missing, we calculated a measure of heterozygosity (pooled heterozygosity, H_p) in 50-kb non-overlapping sliding windows along the autosomes [61] as H_p = 2 × ∑n_MAJ × ∑n_MIN/(∑n_MAJ + ∑n_MIN)², where n_MAJ and n_MIN are counts of reads covering the major and minor allele, respectively, and ∑n_MAJ and ∑n_MIN are the sum of all these counts in a 50-kb window. We transformed the H_p values into Z-scores (ZH_p) as ZH_p = (H_p − μH_p)/σH_p.

In order to locate the inversion breakpoints with high resolution, we used the BreakDancer (v1.1) [95] and “clipping reveals structure” (CREST v0.0.1) [96] algorithms with default settings on our mapped paired-end pooled-sequencing reads. BreakDancer makes use of read pairs which are separated by unexpectedly large distances or which are oriented in a parallel manner in comparison to the reference genome to identify structural variants. On the other hand, CREST uses the unaligned portion of a sequencing read (soft-clipping information stored along with the mapped reads) and maps it to the reference genome to predict structural variants.

SNP chip design

From the 23 million SNPs we designed an Illumina Infinium iSelect HD Custom BeadChip with 6000 attempted bead types [71]. In short, 884 SNPs resided within candidate genes for an association study and were not used for the present study and 4405 SNPs covered all assembled chromosomes except chromosome Tgu16. We attempted to position at least 40 physically evenly spaced SNPs on each chromosome, yet this was not possible for chromosomes Tgu1B (n = 33 SNPs) and Tgu25 (n = 24 SNPs) because too few SNPs passed our filtering procedure [71]. In regions of the genome where the pooled heterozygosity was exceptionally high we increased the SNP density. Overall we intended to genotype 5289 SNPs (which summed up to 6000 bead types because we did not exclude C/G and A/T SNPs that require two bead types for genotyping) and the final chip delivered by Illumina contained 4553 of these SNPs, with drop-outs being randomly distributed along chromosomes (Knief et al., unpublished).

Median marker spacing of SNPs on the chip was 243.17 kb (interquartile range [IQR] = 16.68–343.70 kb) on macrochromosomes (chromosomes Tgu1–Tgu5, Tgu1A), 239.03 kb (IQR = 20.57–355.14 kb) on microchromosomes (all other autosomes) and 174.63 kb (IQR = 161.11–179.40 kb) on chromosome TguZ.

Individual genotyping and quality control

All 1059 “Fowlers Gap” individuals were genotyped for the 4553 SNPs at the IKMB at Kiel University. Quality control was done using the R package GWASTools (v1.6.2) [97] and details are provided in Knief et al. [71]. In summary, we removed 111 individuals with a missing call rate larger than 0.05 (which was due to DNA extraction problems, but these birds were genotyped in the follow-up study; see the “Follow-up genotyping and phenotyping in captive populations” section below), leaving 948 individuals. Further, we removed 152 SNPs that did not form defined genotype clusters, or had high missing call rates (missing rate >0.1), or were monomorphic, or deviated strongly from HWE (Fisher’s exact test P < 0.05/4553), or because their position in the zebra finch genome assembly was likely not correct, leaving 4401 SNPs.

LD calculations

Inversion polymorphisms lead to extensive LD across the inverted region, with the highest LD near the inversion breakpoints because recombination in these regions is almost completely suppressed in inversion heterozygotes [53–55]. To screen for inversion polymorphisms we did not resolve genotypic data into haplotypes and thus based all LD calculation on composite LD [98]. We calculated the squared Pearson’s correlation coefficient (r²) as a standardized measure of LD between every two SNPs on a chromosome genotyped in the 948 individuals [99, 100]. In order to calculate and test for LD between inversions we used the methods described in [101] to obtain r² and P values for loci with multiple alleles.