Introduction

Cork oak (Quercus suber L., Fagaceae) is an evergreen tree species native to the western Mediterranean region where it has considerable economic importance (Aronson et al. 2009; Vessella et al. 2017). Its geographical distribution has been shaped by glaciation cycles (López de Heredia et al. 2007a; Vessella et al. 2015), similarly to other European tree species (Bagnoli et al. 2016; de Dato et al. 2020; Petit et al. 2002). Quercus suber is not only a dominant climax forest species in many natural settings across its distribution range, but also a vital element in managed tree-grassland agro-systems known as “montado” or “dehesa,” together with Q. rotundifolia  Lam. These agro-systems are the main source of commercial cork but also provide grazing grounds for livestock (Rolo and Moreno 2019) and harbour significant levels of biodiversity (Lopez-Sanchez et al. 2016).

Quercus suber has a significant regional importance for the primary sector, and has been the subject of numerous studies, for example on population structure (Pina-Martins et al. 2019; Ramírez-Valiente et al. 2010; Soto et al. 2007; Vanhove et al. 2021), adaptation (Modesto et al. 2014; Pina-Martins et al. 2019; Vanhove et al. 2021), hybridisation (Belahbib et al. 2001; Burgarella et al. 2009; López de Heredia et al. 2020; Staudt et al. 2004) and biogeography (López‐de‐Heredia et al. 2005, 2007a; Lumaret et al. 2005; Magri et al. 2007; Simeone et al. 2018; Vila-Viçosa et al. 2020).

Intraspecific genetic diversity in Q. suber can be considered moderate to low and reflects a weak structure along a longitudinal axis (Pina-Martins et al. 2019), but specific regions, in particular along the southern margins of its distribution range, are known to harbour higher levels of genetic diversity (Sousa et al. 2022). Genetic studies in Q. suber have analysed both nuclear data (Burgarella et al. 2009; López de Heredia et al. 2020; Pina-Martins et al. 2019; Vanhove et al. 2021) and chloroplast data (cpDNA), which can be informative in population-level studies of terrestrial plant groups. The chloroplast genome is non-recombinant, usually maternally inherited and haploid, meaning that chloroplast DNA variants, or haplotypes, should become more rapidly fixed within populations, i.e. have a shorter coalescence time, compared to nuclear (diploid) allelic variants (Mariac et al. 2014; Petit and Vendramin 2007). Thus, chloroplast genomic data may provide insights on recent demographic and hybridisation events (López de Heredia et al. 2020; Pham et al. 2017).

Chloroplast haplotypes in Q. suber have been investigated using RFLP and PCR–RFLP markers (Jiménez et al. 2004; López‐de‐Heredia et al. 2005, 2007a, b; Lumaret et al. 2005; Lumaret and Jabbour-Zahab 2009), microsatellites (Magri et al. 2007) and sequence data (Belahbib et al. 2001; López de Heredia et al. 2020; Simeone et al. 2018). The different types of data have shown the presence of two or more haplogroups, which have been interpreted as the result of introgression events, possibly with adaptive influence, between Q. suber and other oak species (Jiménez et al. 2004; Lumaret and Jabbour-Zahab 2009; López de Heredia et al. 2017), as a consequence of isolation and subsequent expansion from glacial refugia (López de Heredia et al. 2007a; Lumaret et al. 2005) or as a product of Miocene plate tectonics (Magri et al. 2007). Reconstructing haplotype patterns from multiple chloroplast sequences and sampling broadly across the distribution range of Q. suber may bring new insights and enable a stronger resolution of haplotype diversity and distribution in this species, compared to earlier inferences made from RFLPs, microsatellites and single chloroplast regions. The main questions to address are whether chloroplast haplotype diversity in Q. suber is geographically structured, and whether it reflects ancient diversification or was instead shaped mostly by hybridisation events with other Quercus L. species. In addition, haplotype sequence data analysis may show which regions harbour the most genetic diversity in the chloroplast genome.

Here we expand the data set of Costa et al. (2011) based on three chloroplast intergenic regions (trnL-F, trnS-psbC and trnH-psbA) by sampling additional populations and by including the matK gene, which is considered a reliable marker for retrieving chloroplast phylogenies (Hochbach et al. 2018). The matK gene has been proposed as a barcode marker for plants (CBOL Plant Working Group. 2009) and has indeed been tested as a barcode marker for genus Quercus (Piredda et al. 2011; Simeone et al. 2013). We conduct diversity and phylogenetic analyses on our expanded dataset with the objective of generating a detailed characterisation of chloroplast diversity and its geographical distribution across the distribution range of Q.suber. We compare our results with earlier analyses of Q. suber cpDNA and use a time-calibrated analysis to elaborate on the origin of Q. suber chloroplast genome diversity.

Materials and methods

Sampling and DNA sequencing

Samples of Quercus suber (subgenus Cerris, sect. Cerris) originated from 26 sites representing the entire natural range of the species. A map showing all sampling sites and the natural distribution of Q. suber was generated based on Caudullo et al. (2017) (Fig. 1).

Fig. 1
figure 1

Map of sampling sites. Map showing the natural distribution of Quercus suber (grey) and the location of sampling sites (black circles). For each site, the presence of haplotypes of either haplogroup is indicated by green (suber I) and orange (suber II) squares

Diversity measures were estimated at a regional level by combining sites of each country. Leaf material was sampled in situ from natural stands in eight sites in Portugal (Gerês, Serra da Estrela, Serra de São Mamede, Serra da Arrábida, Serra de Monchique, Serra do Buçaco, Azeitão and Serra de Sintra). The remaining samples (Portugal: São Brás de Alportel; Spain: Cataluña, Montes de Toledo, Haza del Lino, Sierra de Aracena, Sierra Morena, Sierra de Guadarrama; Italy: Puglia, Lazio, Sicily; France: Var, Landes, Corsica; Algeria: Forêt des Guerbès; Tunisia: Mekna, Fernana; Morocco: Taza, Kenitra) were obtained from a cork oak provenance trial (FAIR I CT 95 0202) established in 1998 at Herdade Monte da Fava (Santiago do Cacém, Portugal; 8°7′ W, 38°00′ N) as part of the European Forest Genetic Resources Programme (EUFORGEN; Varela 2003). Samples from nine additional Quercus species (subgenus Cerris, sect. Ilex: Q. coccifera L., Q. ilex L., Q. rotundifolia; sect. Cerris: Q. cerris L.; subgenus Quercus, sect. Quercus: Q. canariensis Willd., Q. faginea Lam., Q. lusitanica Lam., Q. pyrenaica Willd.; sect. Lobatae: Q. rubra L.) and from an outgroup species in the Fagaceae (Castanea crenata Siebold & Zucc.) were also sampled. Quercus rotundifolia is often considered a subspecies of Q. ilex (e.g. López de Heredia et al. 2007a). Here we follow the classification of the Plants of the World index (https://powo.science.kew.org) that places Quercus ilex subsp. rotundifolia (Lam.) O.Schwarz ex Tab.Morais in synonymy under Q. rotundifolia Lam. All these oak species have native distribution ranges that overlap with the natural distribution of Q. suber, except for Q. rubra, which is introduced in Europe. All leaf material, corresponding to 181 samples, was stored at −80 °C until DNA extraction. Sampled taxa and sampling sites, with the corresponding site codes, are presented in Table 1.

Table 1 Sampling

Leaf tissue was manually ground using liquid nitrogen and genomic DNA was extracted with the DNeasy Plant Mini Kit (Qiagen) following the manufacturer’s protocol. Four chloroplast markers (intergenic spacer regions trnL-F, trnS-psbC, trnH-psbA and the matK gene, in part) were amplified by PCR in a final volume of 25 μL with the following conditions: denaturation step at 94 °C (5’’); 30 cycles of: denaturation at 94 °C (20’’), annealing at 65 °C (for cpDNA intergenic spacers) or 55 °C (for matK) (30’’), extension at 72 °C (40’’); final extension step at 72 °C for 7 min. Primer pairs were obtained from earlier publications (matK: Cuénoud et al. 2002; trnH-psbA: Kress et al. 2005; trnL-F: Taberlet et al. 1998; trnS-psbC: Nishizawa & Watano 2000). Amplified PCR products were verified by gel electrophoresis and purified with SureClean (Bioline). Sequencing of PCR products was done using the BigDye v3.1 chemistry (Applied Biossystems,USA), in house, on an ABI prism 310 automated sequencer, and by outsourcing to Macrogen (South Korea). Sequences were edited with Sequencher v4.0.5 (Gene Codes Corporation).

Alignment, diversity estimates and phylogenetic analyses

Four individual matrices (matK, trnH-psbA, trnL-F, trnS-psbC) were aligned using the software MAFFT v. 7.2 (Katoh and Standley 2013), using a gap penalty of 1 and a maximum of 10 iterative refinements (mafft –op 1.0 –maxiterate 10). The four chloroplast marker alignments were concatenated using the script catfasta2phyml.pl (https://github.com/nylander/catfasta2phyml).

The package “pegas” (v. 1.1; Paradis 2010) implemented in R v. 3.6.3 (R core Team 2013) was used to estimate the number of haplotypes, haplotype and nucleotide diversity, and to test for demographic change using Tajima’s D (Tajima 1989) and R2 (Ramos-Onsins and Rozas 2002).

Maximum-parsimony (MP) trees were inferred from the cpDNA alignment using the program TNT v 1.1 (Goloboff et al. 2008) and a “traditional search” on the Phylogeny.fr server (http://www.phylogeny.fr/). Branch support was estimated from a standard bootstrapping with 1000 replicates. Substitution models were inferred for the concatenated cpDNA alignment using jModelTest v. 2.1 (Darriba et al. 2012) with three substitution schemes and Akaike Information Criterion (AIC) calculations. A maximum-likelihood (ML) full bootstrap analysis of each alignment, with 1000 replicates, was run on RaxML v. 8.2.4 (Stamatakis 2014) using the best-fitting models inferred with AIC (concatenated matrix: GTR) and the gamma model of rate heterogeneity.

A reduced cpDNA alignment with only one sequence per haplotype (46 sequences) was analysed on BEAST v 2.6.7 (Bouckaert et al.2014), under the GTR substitution model and a discrete four category gamma model of site rate heterogeneity. A prior was defined for the monophyly of genus Quercus. A lognormal prior on the TMRCA for the genus was set with a mean in real space of 56E6 (Hipp et al. 2020) and a stdev of 0.05. A relaxed lognormal clock model was applied, with a lognormal prior for ucldmean (mean [real space] = 1.8E-9; stdev = 0.05; adapted from Sousa et al. 2014) and a gamma prior for ucldstdev (α = 0.5396, β = 0.3819). The Yule model was chosen as the tree prior, and a gamma prior (alpha = 0.001, beta = 1000) was used for “birthrate”. The analysis ran for 100E6 MCMC generations on the CIPRES Science gateway (Miller et al. 2011). The run was validated using Tracer v 1.7.2., and all parameters had ESS > 200 after a 10% burnin. A maximum clade credibility tree and branch support values were obtained with TreeAnnotator v 2.6.7.

A haplotype parsimony network was built from the concatenated cpDNA data set with the R package “pegas” and the “haploNet” function, which uses the uncorrected P or Hamming distance and pairwise deletion of missing data.

Results

Analyses of cpDNA sequence data

The concatenated matrix of cpDNA from 181 samples has 2006 base-pairs (bp) in length (matK: 897 bp; trnH-psbA: 478 bp; trnL-F: 381 bp; trnS-psbC: 250 bp), of which 46 are parsimony-informative sites (matK:12; trnH-psbA:18; trnL-F: 8; trnS-psbC:8), and contains 17 indels corresponding to 52 positions with gaps. The largest indel has 11 bp, and 7 indels have 1 bp.

Indels were not coded prior to analyses, i.e. gaps were considered as missing data. This data set, that includes ten sampled Quercus lineages and the outgroup, comprised 46 haplotypes. Among the 146 Q. suber samples, 29 haplotypes were found. Of these, nine (31%) were singletons, i.e. occurred in a single sample. The most frequent haplotype (XVIII) was found in 25 samples from Spain, France and Italy. The second most frequent haplotype (XXII) was found in 23 samples from Portugal. Haplotype II was found to be shared by Q. canariensis, Q. faginea and Q. pyrenaica, which is in accordance with earlier findings (Petit et al. 2002). The full list of samples and corresponding haplotypes is presented in Online Resource 1.

Results of haplotype diversity estimates for Q. suber are presented in Table 2.

Table 2 Haplotype diversity in Quercus suber

The countries with most haplotypes found were Portugal (12), Spain (6) and Italy (5). Haplotype diversity ranged from 0 (Tunisia) to 0.83 (Portugal). The highest nucleotide diversity was found in Morocco (0.005). Tajimas’D statistic was negative in all sites except for Portugal, with significant values in Algeria, France and Italy, whereas the lowest significant values of the R2 test were found in France and Italy.

In parsimony analyses of the complete data set of 181 individuals, 20 trees were retained after 157,888,177 rearrangements. The complete maximum parsimony consensus tree is shown in Online Resource 2, and a reduced MP consensus tree inferred from one sequence of each haplotype is shown in Fig. 2.

Fig. 2
figure 2

Phylogenetic reconstruction of Quercus suber haplotypes under parsimony. Tree showing the phylogenetic reconstruction produced by a parsimony analysis of the chloroplast sequence data using TNT. This simplified tree is derived from the strict consensus cladogram of 20 most parsimonious trees (Online Resource 2) by merging all tips corresponding to the same haplotype (46 haplotypes). Haplotype numbers are presented in roman numerals. Tip labels show full acronyms for Q. suber, Q. rotundifolia and Q. ilex samples but are shortened for other taxa. Q. suber haplotypes are highlighted in green (suber I) and orange (suber II). Branches with bootstrap support equal or greater than 80 in at least one of the analyses (maximum parsimony and maximum likelihood) are marked with*

Taxa belonging to section Quercus, namely Q. canariensis (QCA), Q. faginea (QFA), Q. pyrenaica (QPY), which share the same haplotype (II), and Q. lusitanica (QLU), form a fully supported clade (BS = 100). The placement of Q. rubra (QRU) as sister to the remainder of the ingroup is also supported (BS = 100), as is the grouping of the two Q. coccifera (QCO) haplotypes. Samples belonging to Q. suber are recovered in two distinct clades: a supported clade (BS = 88 in MP analyses) comprising solely Q. suber lineages from all countries and corresponding to 13 haplotypes (henceforth referred to as the suber I group); an unsupported clade comprising Q. suber samples from Portugal, Spain, Morocco and Algeria, corresponding to 16 haplotypes (henceforth referred to as the suber II group), in which Q. ilex haplotypes appear nested in Q. suber lineages. Relationships among Q. suber haplotypes are largely unresolved, particularly within the suber I group. The two most frequent haplotypes (XVIII, XXII) are included in the suber I clade. Relationships between haplotypes from different taxa are not supported, except for the placement of Q. cerris (QCE) as sister to the suber I group (BS = 92 in MP analyses).

The time-calibrated analysis of the reduced cpDNA alignment using BEAST (Fig. 3) also recovers two Q. suber clades, both with full support (PP = 1.0), and Q. cerris fully supported as sister to the suber I group. A clade composed of three Q. rotundifolia haplotypes (XII, XIII, XIV) and the two Q. coccifera haplotypes (IV, V) is also fully supported. Full support was also obtained for the group comprising all Quercus haplotypes except Q. rubra and Q. rotundifolia haplotype XI.

Fig. 3
figure 3

Time-calibrated phylogenetic reconstruction of Quercus suber haplotypes. Tree showing the phylogenetic reconstruction produced by a time-calibrated analysis of the 46 haplotypes in BEAST2. The inferred tree is scaled to geological time in units of million years (Myr). Branch support values represent posterior probabilities. Haplotype numbers are presented in roman numerals. Q. suber haplotypes are highlighted in green (suber I) and orange (suber II). Pli Pliocene; Ple Pleistocene

Branches between the two suber haplogroups are mostly unsupported. Node height estimates place the root of genus Quercus at 55.25 Myr (height 95% HPD: 49,804,501.61; 60,605,566.97). The split between Q. cerris and Q. suber is estimated at 21.1 Myr (height 95% HPD: 8,957,266.84; 35,020,741.65). The age of the suber I group is estimated at 16.49 Myr (height 95% HPD: 6,483,671.34; 28,170,912.6), whereas the age of the suber II group is estimated to be 23.24 Myr (height 95% HPD: 11,615,111.29; 36,610,615.62).

The parsimony haplotype network built with the uncorrected P distance and showing the different taxa is presented in Fig. 4.

Fig. 4
figure 4

Haplotype parsimony network. Statistical parsimony network (TCS) of haplotypes constructed from the chloroplast sequence data set using the R package “pegas”. Haplotype numbers are presented in roman numerals. Quercus suber haplotypes are highlighted in green (suber I) and orange (suber II). Dashes on each link represent evolutionary steps between haplotypes

Quercus suber haplotypes appear divided into two groups, separated by intermediate Q. rotundifolia haplotypes and the Q. cerris haplotype. The suber I group, comprising 13 Q. suber haplotypes, is separated from the Q. cerris haplotype by seven evolutionary steps (mutations). The suber II group, comprising 12 Iberian and north African Q. suber haplotypes and the Q. ilex haplotype, is separated from Q. rotundifolia by seven steps. Haplotypes in both suber groups are separated by 1–3 steps, except in one case in the suber II group, where six steps separate two of the haplotypes. Six steps separate a group formed by three Q. rotundifolia haplotypes and the two Q. coccifera haplotypes, the latter at a distance of seven steps from the closest Q. rotundifolia haplotype.

Discussion

Our analyses agree with earlier findings that revealed the existence of two main chloroplast haplogroups in Q. suber (Jiménez et al. 2004; López de Heredia et al. 2005, 2007a, b, 2020; Lumaret et al. 2005; Lumaret and Jabbour-Zahab 2009; Simeone et al. 2018). We identify a widespread group, suber I, and a western group, suber II, which is only present in the Iberian Peninsula and NW Africa. Group suber I appears as sister to Q. cerris in all trees, which is in accordance with the taxonomic treatment and with the phylogeny of genus Quercus, that place Q. suber in section Cerris (Denk et al. 2017; Hipp et al. 2020; Hubert et al. 2014; Zhou et al. 2022), i.e. closest to Q. cerris than to any of the other Quercus species sampled herein. Suber I may correspond to what has been considered the primary chloroplast lineage in Q. suber, and referred to as the “suber” chloroplast lineage, as opposed to the “ilex” chloroplast lineage (Lumaret et al. 2005; Simeone et al. 2018). Suber II appears either unresolved (MP and ML trees) or nested within Q. rotundifolia, without support (BEAST tree), and is separated from suber I, in the haplotype network, by six intermediate haplotypes of Q. rotundifolia and Q. cerris. This pattern raises questions on the origin and persistence of the two Q. suber chloroplast lineages, which could be explained by two biological scenarios. The first would be incomplete lineage sorting, meaning that Q. suber retained two ancestral chloroplast lineages within its distribution range. The second would be ancient introgression between Q. suber and Quercus sect. Ilex. Both these processes have been invoked to explain haplotype diversity in Quercus sect. Cerris (Simeone et al. 2018). Our analyses show an ancient origin of both Q. suber chloroplast lineages but also the presence of different haplogroups in Quercus sect. Ilex, as well as a lack of support for relationships between the two sections. These results are compatible with a scenario of incomplete lineage sorting, i.e. the persistence of ancestral chloroplast lineages within both Q. suber and Q. rotundifolia, but do not negate the possibility that haplotypes in suber II derive from introgression events between Q. suber and Q. rotundifolia/Q. ilex, as these two species were not sampled across their entire range, and a more complete sampling may have shown a pattern more indicative of hybridisation.

The recovery of the Q. ilex haplotype nested within suber II confirms the existence of genetic exchange between Q. ilex and Q. suber. Hybridisation between these two species has been widely described (Belahbib et al. 2001; López de Heredia et al. 2017; Lumaret et al. 2009) and Q. suber is considered to be the paternal donor in most cases (Belahbib et al. 2001; López de Heredia et al. 2020), although some authors do not recognise a directional hybridisation pattern in these two taxa (Burgarella et al. 2009; López de Heredia et al. 2017; Lumaret et al. 2009). The observed pattern suggests that the Q. ilex haplotype is derived from a Q. suber haplotype present in a diversified suber II group. However, both Q. ilex and Q. rotundifolia are not broadly sampled in our data set, and a putative introgression event where Q. ilex is the paternal donor is likely to be uncommon. The parsimony haplotype network supports the hypothesis of suber II originating through introgression, as the two Q. suber chloroplast groups are separated by intermediate haplotypes of Q. rotundifolia. However, haplotype network reconstruction is not robust to small changes in our data set. It was observed that the simple removal of Q. cerris and of parsimony-informative site 458, for example, results in a network showing the two suber haplotype groups connected, without intermediate haplotypes from other taxa (Online Resource 3).

Age estimates obtained under a relaxed clock recover the origin of Quercus at c. 55 Myr, in accordance with earlier analyses that placed the origin of the genus at 55–56 Myr (Hipp et al. 2020; Hubert et al. 2014). The option for a relaxed clock was considered appropriate given the inclusion of different Quercus species and the outgroup. Crown age estimates for the two suber haplotype groups and for the Q. suber/Q. cerris clade point to the late Oligocene/early Miocene, and are thus compatible with the dates proposed by Magri et al. (2007) for the diversification of Q. suber haplotypes. Our age estimates have a large margin of error, as indicated by the height posterior density (HPD) intervals, and must be considered as only an approximation. Nevertheless, these estimates suggest that the two haplotype lineages were present in Q. suber well before the Pleistocene, and therefore that putative introgression events with Q. ilex/Q. rotundifolia (Jiménez et al. 2004; Lumaret and Jabbour-Zahab 2009) originating suber II haplotypes would likely have occurred before glaciations associated with that epoch.

If the suber II haplogroup was indeed acquired through modern introgression between Q. suber and Q. ilex/Q. rotundifolia, the latter corresponding to seed-bearing donors, then all sampled suber II haplotypes in our trees would have to be derived from multiple unsampled Q. ilex or Q. rotundifolia haplotypes. Ancient introgression between Q. suber and an ancestral lineage in Quercus sect. Ilex, during the Miocene, could nevertheless be a valid possibility, assuming that both lineages were already well differentiated, although the Q. coccifera/Q. ilex haplotype split is estimated at c.9 Myr in our analysis (Hipp et al. 2020 estimated the species split at c.10 Myr), and is thus more recent than the suber II group. The hypothesis of ancient reticulations explaining the presence of different chloroplast lineages in Quercus sects. Cerris and Ilex has been postulated earlier (e.g. Simeone et al. 2016, 2018). These reticulations may have occurred as multiple independent events, and haplotypes would have become fixed due to genetic drift associated with demographic changes. However, our current sampling of Quercus sect. Ilex is insufficient to fully verify the hypothesis of ancient reticulation events originating the two distinct haplogroups.

An alternative explanation for the observed pattern would be incomplete sorting of chloroplast lineages in Q. suber. Manos et al. (1999) hypothesised the persistence of cpDNA polymorphisms through the diversification of Quercus, and highlighted the lack of a clear discriminating signal of Quercus plastid data at infrageneric level. Li et al. (2022) have reported long-term persistence of ancestral chloroplast lineages in East Asian oaks. Lumaret et al. (2005) also predicted the existence of two native Q. suber haplogroups, rather than a native and an introgressed lineage, and Simeone et al. (2009) postulated that sharing of ancestral cpDNA polymorphisms in sect. Cerris is highly probable.

Besides Q. suber, one of the hypothetical maternal donors under the introgression scenario, Q. rotundifolia, does not possess a single chloroplast lineage either. This pattern is consistent with the findings of Simeone et al. (2016) who reported a non-monophyly of plastomes in Quercus sect. Ilex, which was attributed to a combination of incomplete lineage sorting and putative introgression. Vitelli et al. (2017) also identified different lineages in Quercus sect. Ilex which followed a clear geographical structuring. Samples of Q. rotundifolia do not cluster together in our phylogenetic analyses, and indeed display a pattern that is compatible with the retention of different chloroplast lineages. For example, three Q. rotundifolia haplotypes (XII, XIII, XIV) form a supported clade with the Q. coccifera haplotypes (IV, V) in the BEAST tree (Fig. 3), and are readily identified in the haplotype network (Fig. 4). These haplotypes may correspond to what Simeone et al. (2016) named the “EuroMed” lineage of sect. Ilex plastomes.

Persistence of ancestral polymorphisms is a product of both rapid diversification and large effective population sizes (Pamilo and Nei 1988; Maddison and Knowles 2006). If ancestral Q. suber populations experienced rapid growth before a single chloroplast lineage could be fixed throughout the entire species range, then the two haplogroups could have persisted without major constraints for long periods. Diversification in Quercus species has been related to ecological opportunity due to a mid-Miocene temperature decrease (Graham 2011; Hipp et al. 2020), which may have promoted population expansion as well as speciation. Large effective population sizes throughout the Miocene may have enabled the retention of different chloroplast lineages up to the Pleistocene, when glaciation cycles caused range contractions in tree species north of the Mediterranean, that affected Q. suber (López de Heredia et al. 2007a; Vessella et al. 2015). Range contractions and decreased effective population sizes during glaciation periods would in theory have favoured the fixation of polymorphisms and chloroplast lineages. To explain the persistence of the two suber haplogroups, assuming they both have an ancestral origin, perhaps a vicariance process must be invoked. Vicariance could have resulted from contraction, during Pleistocene glacial maxima, into disconnected refugia in which either haplogroup, suber I or II, would become fixed. The geographical distribution of the suber II group may support the hypothesis of a separation of the two lineages, since Q. suber haplotypes in this group were only found in the Iberian Peninsula and northwestern Africa, whereas suber I is found throughout the distribution range of the species. Haplotypes of the suber II group may have become fixed in refugia located in the Iberian Peninsula and northwestern Africa (see Vessella et al. 2015), and post-glacial secondary contact between formerly isolated populations holding either suber I o II would then have allowed for the coexistence of the two haplogroups in these regions. Alternatively, new areas of suitable habitat for Q. suber may have become available around southern refugia, as hypothesised by Vesella et al. (2015), allowing for the maintenance of a large effective population and the presence of the two haplogroups. The current presence of the two lineages in the Iberian Peninsula and northwestern Africa but not in the remaining distribution range could be explained by the generally larger effective sizes of Q. suber populations in these regions, more recent colonisation of the northern and eastern parts of the species distribution range and low dispersal of suber I seeds across the Pyrenees, although a degree of adaptive leverage of either haplogroup cannot be excluded (see Pham et al. 2017 and López de Heredia et al. 2017). Regarding cpDNA diversity among regions, Algeria, Morocco, Spain, Italy and Portugal stand out as having large haplotype diversity and more than one private haplotype. Values of Tajima’s D statistic were generally negative, with statistical significance in France and Italy. This deviation from neutrality indicates recent population expansion. Like other Mediterranean tree species, Q.suber is known to have expanded northwards after contracting during the last glaciation period (López de Heredia et al. 2007a; Vessella et al. 2015), and signal of population expansion should therefore be detectable, particularly in the northern part of the Q. suber distribution range, where post-glacial colonisation is likely to be more recent.

Conclusions

We identified two cpDNA haplogroups within the distribution range of Q. suber, in agreement with earlier studies on Q. suber chloroplasts. One of these haplogroups occurs only in the Iberian Peninsula and northwestern Africa, while the other is present across the species range. Age estimates point to a Miocene diversification of both haplogroups, suggesting a scenario involving the retention of ancient chloroplast lineages or the occurrence of ancient introgression events between Q. suber and Q. sect. Ilex. Differential fixation of these chloroplast lineages in refugia and recent population expansion may explain their persistence through glaciation periods and their present-day geographical distribution. Our results highlight the complexity of chloroplast genealogies in Q. suber, Q. ilex and Q. rotundifolia, and suggest that obtaining more data, such as whole chloroplast sequences, may be required to fully understand the processes behind chloroplast diversity in these Mediterranean oak species.