The Aquilegia genome reveals a hybrid origin of core eudicots
Whole-genome duplications (WGDs) have dominated the evolutionary history of plants. One consequence of WGD is a dramatic restructuring of the genome as it undergoes diploidization, a process under which deletions and rearrangements of various sizes scramble the genetic material, leading to a repacking of the genome and eventual return to diploidy. Here, we investigate the history of WGD in the columbine genus Aquilegia, a basal eudicot, and use it to illuminate the origins of the core eudicots.
Within-genome synteny confirms that columbines are ancient tetraploids, and comparison with the grape genome reveals that this tetraploidy appears to be shared with the core eudicots. Thus, the ancient gamma hexaploidy found in all core eudicots must have involved a two-step process: first, tetraploidy in the ancestry of all eudicots, then hexaploidy in the ancestry of core eudicots. Furthermore, the precise pattern of synteny sharing suggests that the latter involved allopolyploidization and that core eudicots thus have a hybrid origin.
Novel analyses of synteny sharing together with the well-preserved structure of the columbine genome reveal that the gamma hexaploidy at the root of core eudicots is likely a result of hybridization between a tetraploid and a diploid species.
Whole-genome duplication (WGD) is common in the evolutionary history of plants (reviewed in [1, 2]). All flowering plants are descended from a polyploid ancestor, which in turn shows evidence of an even older WGD shared by all seed plants. These repeated cycles of polyploidy dramatically restructure plant genomes. Presumably driven by the “diploidization” process, whereby genomes are returned to an effectively diploid state, chromosomes are scrambled via fusions and fissions, lose both repetitive and genic sequences, or are lost entirely [4, 5, 6, 7, 8, 9, 10, 11]. Intriguingly, gene loss after WGD is non-random: there is a bias not only against the retention of certain genes [12, 13] but also against the retention of one of the WGD-derived paralog chromosomes [6, 9, 14, 15, 16].
We investigated the history of WGDs in the columbine genus Aquilegia for two reasons. The first is related to its phylogenetic position: columbines have been referred to as “basal” eudicots because they appear to be an outgroup to most other “core” eudicot lineages [17, 18]. This matters because our understanding of eudicot karyotype evolution is limited to the heavily sampled core eudicots. Using the recently published Aquilegia coerulea genome, we were able to address key questions about the history of polyploidization in all eudicots. Second, we traced the origins of the columbine chromosomes with a particular focus on the strange chromosome 4, which differs from the rest of the genome in many ways. In particular, it harbors more genetic polymorphism and transposable elements, has lower gene density and reduced gene expression, and appears to migrate more, including between species. It also carries the rDNA clusters, and there is reason to believe that knowing the history of the chromosome could help explain its aberrant behavior.
2.1.Within-genome synteny confirms columbine paleotetraploidy
Ancient WGDs have commonly been inferred from the distribution of divergences between gene duplicates. The simultaneous generation of gene duplicates via WGD is expected to produce a peak in the age distribution relative to the background age distribution of single-gene duplicates [20, 21, 22]. Such a spike of ancient gene birth was the first evidence of paleotetraploidy in columbines and was later supported by gene count-based modeling.
2.2.Columbines share ancient tetraploidy with core eudicots
We used two different approaches to detect this pattern. First, we clustered homologous segments based on gene order similarity (“Materials and methods”). The pairwise comparisons show that each member of columbine paralogs matches a different grape chromosome (Additional file 1: Figure S6–S8). Reshuffling genes on grape chromosomes further indicates that this pattern of clustering is highly unlikely to be produced by chance (p = 0–0.05). Second, we attempted to corroborate the clustering based on gene order similarity by clustering homologous regions based on similarity in protein sequence (“Materials and methods”). Because of the deep history of shared tetraploidy, only a small fraction of all the informative gene trees (0.016–0.044) show the “expected” pairings (Additional file 1: Figure S6–S8), and it is thus not possible to infer history from individual trees. Although it is possible that more sophisticated tree-building methods would perform better, the order of the homologous genes that do show the expected pairwise clustering (based on sequence divergence) again recaptures the clustering pattern inferred from synteny alone (compare Additional file 1: Figure S7 and S9). Thus, the clustering pattern inferred from synteny is mirrored in clustering based on sequence divergence.
2.3.The core eudicots have a hybrid origin
Our inference of shared tetraploidy between basal and core eudicots makes use of the signals presumably generated by diploidization (Figs. 5 and 7). However, hybridization of unreduced gametes from two divergent diploid genomes, “allotetraploidy,” would also lead to gene order-based clustering between two different pairs of grape and columbine chromosomes (Additional file 1: Figure S11–S12). In this case, the alternative paralogous gene orders in the tetraploid ancestor reflect the gene orders on the progenitor chromosomes. Thus, the clustering pattern does not depend on whether the eudicot tetraploid genome evolved via “auto-” or “allopolyploidy.” The same is not true for the second part of the process leading to hexaploidy. In this case, autopolyploidy would lead to the duplication of one of the existing chromosomes, whereas allohexaploidy would lead to one of the three paralogous grape chromosomes being an “outlier” with respect to the two grape-columbine pairings (Additional file 1: Figure S11–S12), which is what we see in our data (Additional file 1: Figure S6–S8).
2.4.Current columbine chromosomes have mostly been generated via fusions
The lack of a fusion event on columbine chromosome 6 might explain the fact that it is the smallest chromosome of columbine (Fig. 6). However, chromosome 4 is comparable in size to the remaining chromosomes, all of which are products of ancient fusion events. The observations that chromosome 4 has a higher proportion of genes in tandem duplicates (0.37 versus genome-wide mean of 0.22) and a greater extent of intra-chromosomal synteny (indicative of segmental duplications) (Additional file 1: Figure S17) suggest that chromosome 4 has reached a comparable size partly due to numerous tandem and segmental duplications and partly due to an expansion of repetitive DNA. These results reinforce the idea that chromosome 4 has followed a distinct evolutionary path from the rest of the genome.
Fusion-dominated genome shuffling is not the only facet of diploidization. Following WGDs, gene duplicates get lost and this happens in a non-random manner. Genes involved in connected molecular functions like kinases, transcription factors, and ribosomal proteins are retained in pairs [41, 42, 43, 44, 45] potentially due to dosage-related constraints: losing or duplicating some, but not all of these dosage-sensitive genes, might upset the stoichiometric relationship between their protein products [47, 48, 49]. Consistent with this dosage balance hypothesis, columbine genes potentially retained post WGD (1302 genes across 76 syntenic regions; Additional file 2: Table S1) are enriched for the GO categories “structural constituent of ribosome,” “transcription factor activity,” “translation” (p < 0.001), and “protein tyrosine kinase activity” (p < 0.01). Tandemly duplicated genes (n = 6972), on the other hand, are depleted for the GO categories “structural constituent of ribosome” and “translation” (p = 10^−17), reflecting the role of dosage-related purifying selection.
All flowering plants are descended from a polyploid ancestor, and with only a few exceptions (e.g., Amborella from basal angiosperms), all of them experienced at least one further round of WGD. Within-genome synteny (Fig. 1) shows that the columbine is an example of the latter, confirming the conclusions from other studies [23, 24, 51]. Here we show that this columbine tetraploidy is a remnant of a WGD at the base of all eudicots and is thus far more ancient than previously thought [23, 51]. Furthermore, we use this observation to argue that the hexaploidy shared by the core eudicots must have involved allopolyploidy, i.e., presumably hybridization between the ancestral tetraploid and a diploid species.
A eudicot-wide WGD has been suggested by several studies [26, 35, 36, 52]. Our synteny-based approach solidifies these findings by demonstrating that the columbine and grape genomes have inherited the genome structure of a common tetraploid ancestor. That we can trace such an ancient tetraploidy is due to two facts. First, the genome structure of columbine is well-preserved and free from recent WGDs. Second, genomes provide much greater information than genes alone. Indeed, a recent study on another basal eudicot, the opium poppy, highlights these two facts. Having experienced a recent WGD (~8 million years ago), the genome of the poppy is dominated by syntenic gene pairs of low divergence (Fig. 1C in Guo et al.), although it also carries highly diverged paralogs whose Ks values nicely overlap with our estimates for columbine and grape, consistent with a eudicot-wide WGD (compare Fig. S13D in Guo et al. to Fig. 4). However, the strength of the signal from recent polyploidization largely obscures the much weaker signal of ancient polyploidization (interpreted as segmental duplications by Guo et al.). In fact, although overlooked by the authors, the intergenomic synteny between columbine and poppy provides a clear signature of a eudicot-wide WGD (Fig. S9D in Guo et al.). Differing from columbines with only one additional genome duplication, the poppy genome aligns to the columbine genome in a 4:2 manner, with 4 paralogous regions of poppy syntenic to 2 paralogous regions of columbine derived from the ancient shared tetraploidy.
Our approach also helps us shed light on the nature of the gamma hexaploidy found in all core eudicots ([9, 28, 29, 30, 31, 32], and Supplementary Note 5 in ). WGDs have often been discussed as if they were “events,” ignoring the process by which they originated. We show here that core eudicot hexaploidy is the result of two processes: an ancient tetraploidization shared by all eudicots, followed by allopolyploidization leading to the core eudicots. In other words, all core eudicots have a hybrid origin. An allohexaploid origin has indeed been previously suggested by Murat et al., who identified the three subgenomes of grape using differential patterns of gene loss on “dominant” versus “sensitive” subgenomes. Their classification assumes that the most recently added set of paralogous chromosomes will be “dominant,” because they have spent a shorter amount of time in the polyploid genome and thus experienced fewer gene losses. Contrary to this, our results suggest that the most recently added grape chromosomes (chromosomes 3, 8, 9, and 14) largely correspond to the “sensitive” grape chromosomes identified by Murat et al. Instead, we argue that the extensive gene loss in the most recently added subgenome reflects its divergence from the other two subgenomes at the time of hexaploid formation, perhaps similar to the situation in the allotetraploid Arabidopsis suecica, which is a hybrid between the more ancestral-like (n = 8) genome of A. arenosa and the heavily reduced (n = 5) genome of A. thaliana. Another example is hexaploid wheat, which is a hybrid between tetraploid emmer wheat and a wild diploid grass, Aegilops tauschii (and references therein).
Our findings reveal the hybrid structure of core eudicot genomes and will hopefully help us understand what hybridization has meant for core eudicots, a group which comprises more than 70% of all living flowering plants. What are the hybridization-coupled changes that have led to the current patterns of gene expression, methylation, or transposable element density/distribution? All these questions call for additional genomes from basal eudicots which, as this study illustrates, have great value as outgroups to the core eudicots. More data will also allow the development of sophisticated analysis methods based on explicit models of the evolution of gene order, which our results suggest is a very powerful source of information about the past.
5.Materials and methods
We performed an all-against-all BLAST of genes (CDS) for the latest version of the Aquilegia coerulea reference genome (v3.1) using the SynMap tool in the online CoGe portal. We also looked at the synteny within Vitis vinifera (v12) and between A. coerulea and V. vinifera using default parameter combinations in DAGChainer. We filtered the raw output files for both within-grape and columbine-to-grape synteny. For the former, we only kept the blocks that are syntenic between the polyploidy-derived paralogous chromosomes of grape as identified by Jaillon et al. (Additional file 4: Table S3). For the latter, we required that a given columbine chromosome is overall syntenic to all three paralogous chromosomes of grape (Additional file 3: Table S2). So, for a given pair of columbine and grape chromosomes, we only kept the blocks if the columbine chromosome also matches the other members of the paralogous grape chromosomes.
The raw output files can be regenerated at the CoGe portal using the id numbers provided below for each species (Availability of data and materials) and changing the default parameter combination in DAGChainer (D:A = 20:5) when needed. D and A specify the maximum genic distance between two matches and the minimum number of aligned gene pairs, respectively, to form a collinear syntenic block.
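To make the roles of D and A concrete, the following is a minimal Python sketch of how the two parameters constrain a collinear block. It is a simplified greedy illustration and not the dynamic-programming algorithm that DAGChainer itself uses. The function name, the input layout (pairs of gene positions), and the reading of D as the number of intervening genes allowed between chained matches are assumptions made for this example; inverted blocks are ignored for brevity.

```python
def collinear_blocks(matches, max_gap, min_pairs):
    """Greedily chain (pos_a, pos_b) gene matches into forward-collinear blocks.

    max_gap: intervening genes allowed between chained matches (assumed reading of D).
    min_pairs: minimum number of aligned gene pairs per block (A).
    """
    blocks, current = [], []
    for pos_a, pos_b in sorted(matches):
        if current:
            last_a, last_b = current[-1]
            step_a, step_b = pos_a - last_a, pos_b - last_b
            # Extend the block only if both genomes advance by at most max_gap + 1 genes.
            if 1 <= step_a <= max_gap + 1 and 1 <= step_b <= max_gap + 1:
                current.append((pos_a, pos_b))
                continue
            # Otherwise close the current block and keep it if it is long enough.
            if len(current) >= min_pairs:
                blocks.append(current)
        current = [(pos_a, pos_b)]
    if len(current) >= min_pairs:
        blocks.append(current)
    return blocks


# Hypothetical gene positions on two chromosomes.
matches = [(1, 10), (2, 11), (3, 12), (7, 30), (8, 31)]
strict = collinear_blocks(matches, max_gap=0, min_pairs=3)    # D:A = 0:3
relaxed = collinear_blocks(matches, max_gap=20, min_pairs=5)  # D:A = 20:5
```

With the strict setting only the run of three consecutive gene pairs survives, whereas the default-like setting bridges the gap and reports one block of five pairs.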
5.2.Estimating the divergence between synteny block pairs
We used Ks (the number of synonymous substitutions per synonymous site) values provided for each homologous gene pair by the CoGe portal. We estimated the median Ks of homologous genes in a synteny block after filtering out gene pairs with Ks > 10 due to a saturation effect. Both values are provided in Supplementary Data 1–3 for within columbine, columbine-to-grape, and within-grape synteny, respectively.
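The filtering and summary step described above can be sketched in a few lines of Python. This is an illustrative sketch only: the input layout (tuples of a block identifier and a Ks value) and the function name are assumptions and do not reflect the actual CoGe output format.

```python
from statistics import median

KS_SATURATION_CUTOFF = 10.0  # gene pairs with Ks > 10 are discarded, as in the text


def median_ks_per_block(gene_pairs, cutoff=KS_SATURATION_CUTOFF):
    """Return {block_id: median Ks} after dropping saturated gene pairs.

    gene_pairs: iterable of (block_id, ks) tuples (hypothetical layout).
    """
    by_block = {}
    for block_id, ks in gene_pairs:
        if ks is None or ks > cutoff:
            continue  # skip missing or saturated estimates
        by_block.setdefault(block_id, []).append(ks)
    return {block: median(values) for block, values in by_block.items()}


# Made-up values for two synteny blocks; the 55.0 entry is treated as saturated.
pairs = [("b1", 1.2), ("b1", 1.8), ("b1", 55.0),
         ("b2", 2.0), ("b2", 2.4), ("b2", 3.1)]
block_medians = median_ks_per_block(pairs)
```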
5.3.Quantifying gene order similarity
We “reconstructed” a given set of columbine and grape chromosomes at their homologous regions (color-coded in Fig. 6). We seeded this reconstruction by focusing on at least three consecutive genes aligning between a pair of columbine and grape chromosomes (D:A = 0:3). We chose three genes since it is the most stringent value we could use to detect homologous synteny blocks; we detected almost nothing when we required 4 consecutive genes (D:A = 0:4). These stringent criteria aim to minimize the effect of gene movement on the homology between columbine and grape chromosomes. Once we had the list of genes, we then looked for their paralogous counterparts on the remaining columbine and grape chromosomes using intragenomic gene-to-gene BLAST (D:A = 0:1). Having chromosomes represented by syntenic gene sets and reminiscent of these sets (Additional file 1: Figure S5 and Additional file 5: Table S4), we assigned a unique word to each synteny block and the genes forming the block to be able to use the text alignment provided by align_local in R. We then quantified the gene (“word”) similarity as follows: for an initial N number of words on a columbine chromosome (N = window size), we did a pairwise alignment between these N words and all the words on a grape chromosome (match = 4, gap = −1). We repeated the same analysis with the inverted order of N words and picked the maximum alignment score. We repeated these steps by sliding the window by one word and keeping N constant to get a distribution of scores as in Additional file 1: Figure S6–S8. We used different N values ranging from 4 to 15. Note that we excluded columbine chromosomes 3, 4, and 7 from this analysis since they all have a complex history of lineage-specific chromosomal reshuffling events (Figs. 1, 6, Additional file 1: Figure S4 and S17).
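The windowed scoring described above can be illustrated with a small Python sketch. It substitutes a plain Smith-Waterman local alignment over word lists for the R alignment routine used in the study, so it is an approximation of the procedure rather than a reimplementation. The match and gap scores follow the text; the mismatch score of −1 and all function and variable names are assumptions.

```python
def local_alignment_score(a, b, match=4, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between two word lists."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def sliding_window_scores(columbine_words, grape_words, window):
    """Score every window of `window` columbine words against a grape chromosome,
    keeping the better of the forward and inverted orientations."""
    scores = []
    for start in range(len(columbine_words) - window + 1):
        chunk = columbine_words[start:start + window]
        forward = local_alignment_score(chunk, grape_words)
        inverted = local_alignment_score(chunk[::-1], grape_words)
        scores.append(max(forward, inverted))
    return scores


# Toy chromosomes: the grape copy carries the first four words in inverted order.
columbine = ["w1", "w2", "w3", "w4", "w5", "w6"]
grape = ["w9", "w4", "w3", "w2", "w1", "w8"]
scores = sliding_window_scores(columbine, grape, window=4)
```

In this toy case the first window aligns perfectly once inverted (4 matches × 4 = 16), and the score drops as the window slides past the shared region, which is the kind of score distribution compared between chromosome pairs.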
We applied the same stringent criteria (D:A = 0:3) to detect the homologous regions between grape and cacao (Theobroma cacao, v1). The same criteria led to very few homologous regions between columbine and cacao. So, we relaxed the parameters for the synteny detection between these two genomes (D:A = 0:2) and quantified the gene order similarity with greater window sizes (N = 20, 30, 35, 40, and 50). Note that we focused on the triplicated regions distributed across 3 different cacao chromosomes (Fig. 8, Additional file 1: Figure S13–S14), which are rather unaffected by lineage-specific shuffling.
5.4.Statistical testing of gene order similarity
Given the gene order similarity between the two different pairs of columbine and grape chromosomes harboring homologous regions, we performed permutation tests to estimate the probability of observing such a clustering just by chance. To do so, we first combined all the grape genes and sampled the same number of genes (“words”) as we observe to reconstruct each of the paralogous grape chromosomes. We repeated the quantification step as above to get a permuted distribution of alignment scores between a pair of columbine and grape chromosomes. We used the Wilcoxon rank sum test (W-statistic) to quantify the shift in the distribution of alignment scores between one of the members of columbine paralogous chromosomes and its best grape hit when compared with the alignment scores between the same columbine chromosome and other grape chromosomes. We repeated the same analysis for the other member of columbine paralogous chromosomes as well. Having these observed W-statistics, we counted the number of cases (out of 100) where the permuted distributions generate W-statistics as high as or higher than the observed ones. We ran permutation tests for the columbine-cacao pairing as well (Additional file 1: Figure S13–S14).
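The logic of this test can be sketched in Python as follows. The sketch computes the rank sum statistic in its Mann-Whitney form (the value R's wilcox.test reports as W), draws a permuted grape chromosome, and derives an empirical p-value. Sampling without replacement, the function names, and the example numbers are assumptions; rescoring each permuted chromosome with the alignment procedure is left out.

```python
import random


def rank_sum_w(x, y):
    """Mann-Whitney form of the Wilcoxon rank sum statistic for sample x,
    using midranks for tied values."""
    pooled = sorted((v, idx) for idx, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = midrank
        i = j + 1
    n_x = len(x)
    return sum(ranks[:n_x]) - n_x * (n_x + 1) / 2


def permuted_grape_words(all_grape_words, n_words, rng):
    """Draw as many words from the pooled grape genes as the reconstructed
    chromosome holds (sampling without replacement is an assumption)."""
    return rng.sample(all_grape_words, n_words)


def permutation_p_value(observed_w, permuted_ws):
    """Fraction of permutations whose W is as high as or higher than the observed W."""
    hits = sum(1 for w in permuted_ws if w >= observed_w)
    return hits / len(permuted_ws)


# Made-up alignment scores: best grape hit versus the other grape chromosomes.
best_hit_scores = [16, 12, 12, 8]
other_scores = [4, 4, 8, 4]
observed_w = rank_sum_w(best_hit_scores, other_scores)

# In a full run, each permuted W would come from rescoring a permuted chromosome.
example_permuted_ws = [3.0, 8.0, 15.5, 16.0, 5.0]
p_value = permutation_p_value(observed_w, example_permuted_ws)
```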
5.5.Building gene trees
We built UPGMA trees for the homologous genes (Additional file 6: Table S5) distributed across a given set of columbine and grape chromosomes (color-coded in Fig. 6). We first detected homologous genes aligning between a pair of columbine and grape chromosomes (D:A = 0:1). We then searched for their paralogous counterparts using intragenomic BLAST (D:A = 0:1). For protein alignment, we required at least five homologous genes, each from a single chromosome in the given set, and ran ClustalW2 (v2.1) with the options -TREE -KIMURA -CLUSTERING=UPGMA -OUTPUTTREE=dist. Of all the trees generated by ClustalW2 (informative trees), we only focused on the ones that support the synteny-based pairings (Additional file 1: Figure S6–S8), which are detected by the subtrees function in the R package ape [60, 61]. Once we had the sets of homologous genes from this subset of trees, we assigned a unique word to each set and quantified the gene order similarity between pairs of columbine and grape chromosomes as mentioned above.
For protein sequences, we used the annotations provided by JGI and Ensembl for columbine and grape, respectively. Note that CoGe outputs grape genes with the “PAC” tag while they are tagged with “VIT” in the Ensembl database. To match these different ids, we used two intermediary files. The first one is a gff file provided by CoGe (available at https://genomevolution.org/coge/GenomeInfo.pl?gid=19990). The second one is a conversion file provided by the Grape Genome Database that lists the correspondence between different gene ids (can be downloaded from http://genomes.cribi.unipd.it/DATA/). These two files contain the common tag “GSVIVT” which bridges the “PAC” and “VIT” tags.
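The id bridging described above amounts to joining two lookup tables on the shared “GSVIVT” tag. A minimal Python sketch of that join is given below; it assumes the two intermediary files have already been parsed into dictionaries, and every identifier shown is a made-up placeholder rather than a real gene id.

```python
def bridge_ids(pac_to_gsvivt, gsvivt_to_vit):
    """Map CoGe "PAC" ids to Ensembl "VIT" ids through the shared "GSVIVT" tag.

    Both arguments are plain dicts assumed to be parsed beforehand from the
    CoGe gff file and the Grape Genome Database conversion file.
    """
    pac_to_vit = {}
    for pac_id, gsvivt_id in pac_to_gsvivt.items():
        vit_id = gsvivt_to_vit.get(gsvivt_id)
        if vit_id is not None:  # keep only ids present in both files
            pac_to_vit[pac_id] = vit_id
    return pac_to_vit


# Placeholder identifiers for illustration only.
pac_to_gsvivt = {"PAC:0001": "GSVIVT_A", "PAC:0002": "GSVIVT_B", "PAC:0003": "GSVIVT_C"}
gsvivt_to_vit = {"GSVIVT_A": "VIT_x1", "GSVIVT_B": "VIT_x2"}
pac_to_vit = bridge_ids(pac_to_gsvivt, gsvivt_to_vit)
```

Genes without a counterpart in the second file (here the third placeholder) are dropped rather than guessed, which keeps the downstream protein lookups unambiguous.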
5.6.GO enrichment analysis
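2 × 2 contingency table obtained by classifying genes into 2 categorical variables. The letters denote the number of genes for a given category (e.g., “a” denotes the number of retained genes annotated with the tested GO category)
 | Annotated with the tested GO category | Not annotated with the tested GO category | Total
Genes in the tested set (e.g., retained genes) | a | b | a + b*
Remaining genes | c | d | c + d
Total | a + c | b + d | N = total number of genes = 29,550 (across 7 chromosomes)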
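As an illustration of how such a 2 × 2 table can be tested, the Python sketch below computes a one-sided Fisher's exact (hypergeometric) p-value for over-representation. The choice of this test is an assumption, since the text shown here does not state which test was applied, and the function name and example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(a, b, c, d):
    """One-sided (over-representation) Fisher's exact test on a 2 x 2 table.

    a: tested-set genes with the GO category, b: tested-set genes without it,
    c: remaining genes with the category, d: remaining genes without it.
    """
    n_set = a + b          # genes in the tested set (e.g., retained genes)
    n_annotated = a + c    # genes annotated with the GO category
    total = a + b + c + d  # N
    upper = min(n_set, n_annotated)
    # Probability of seeing a or more annotated genes in a random set of size n_set.
    favourable = sum(
        comb(n_annotated, k) * comb(total - n_annotated, n_set - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(total, n_set)


# Hypothetical counts: 3 of 4 tested genes carry the category, 1 of 4 others do.
p = enrichment_p_value(3, 1, 1, 3)
```

A test for depletion would sum the lower tail (k from 0 to a) instead.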
We thank Robin Burns and Claus Vogl for their comments on the manuscript; Daniel Gómez Sánchez and Benjamin Jaegle for the fruitful discussions.
Peer review information
Andrew Cosgrove and Barbara Cheifet were the primary editors on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
GA performed all analyses. GA and MN wrote the manuscript. Both authors read and approved the final manuscript.
G.A. was supported by the Vienna Graduate School of Population Genetics (Austrian Science Fund, FWF: DK W1225-B20).
Ethics approval and consent to participate
The authors declare no competing interests.
- 4. Leitch IJ, Bennett MD. Genome downsizing in polyploid plants. Biol J Linn Soc Lond. Oxford University Press. 2004;82:651–63.
- 8. Renny-Byfield S, Chester M, Kovařík A, Le Comber SC, Grandbastien M-A, Deloger M, et al. Next generation sequencing reveals genome downsizing in allotetraploid Nicotiana tabacum, predominantly through the elimination of paternally derived repetitive DNAs. Mol Biol Evol. 2011;28:2843–54.
- 19. Filiault DL, Ballerini ES, Mandáková T, Aköz G, Derieg NJ, Schmutz J, et al. The Aquilegia genome provides insight into adaptive radiation and reveals an extraordinarily polymorphic chromosome with a unique history. Elife. 2018;7. https://doi.org/10.7554/eLife.36426.
- 31. Truco MJ, Ashrafi H, Kozik A, van Leeuwen H, Bowers J, Wo SRC, et al. An Ultra-High-Density, Transcript-Based, Genetic Map of Lettuce. G3. 2013;3:617–31.
- 60. R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2013. Available from: https://www.R-project.org/
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.