Haplotype diversity patterns in Quercus suber (Fagaceae) inferred from cpDNA sequence data

Chloroplast genome diversity in cork oak (Quercus suber) is characterised by the occurrence of haplotypes that are akin to those found in other Mediterranean oak species, particularly in Q. ilex and Q. rotundifolia, suggesting the possible presence of an introgressed chloroplast lineage. To further investigate this pattern, we reconstructed chloroplast haplotypes by sequencing four chloroplast markers (cpDNA), sampled across 181 individuals and 10 taxa. Our analyses resulted in the identification of two diversified chloroplast haplogroups in Q. suber, corresponding to a geographically widespread lineage and an Afro-Iberian lineage. Time-calibrated phylogenetic analyses of cpDNA point to a Miocene origin of the two haplogroups in Q. suber, suggesting that the Afro-Iberian lineage was present in cork oak before the onset of glaciation periods. The persistence of the two haplogroups in the western part of the species distribution range may be a consequence of either ancient introgression events or chloroplast lineage sorting, combined with different fixation in refugia through glaciation periods. Our results provide a comprehensive insight on the origins of chloroplast diversity in these ecologically and economically important Mediterranean oaks.


Introduction
Cork oak (Quercus suber L., Fagaceae) is an evergreen tree species native to the western Mediterranean region where it has considerable economic importance (Aronson et al. 2009;Vessella et al. 2017).Its geographical distribution has been shaped by glaciation cycles (López de Heredia et al. 2007a;Vessella et al. 2015), similarly to other European tree species (Bagnoli et al. 2016;de Dato et al. 2020;Petit et al. 2002).Quercus suber is not only a dominant climax forest species in many natural settings across its distribution range, but also a vital element in managed tree-grassland agro-systems known as "montado" or "dehesa," together with Q. rotundifolia Lam.These agro-systems are the main source of commercial cork but also provide grazing grounds for livestock (Rolo and Moreno 2019) and harbour significant levels of biodiversity (Lopez-Sanchez et al. 2016).
Intraspecific genetic diversity in Q. suber can be considered moderate to low and reflects a weak structure along a longitudinal axis (Pina-Martins et al. 2019), but specific regions, in particular along the southern margins of its distribution range, are known to harbour higher levels of genetic diversity (Sousa et al. 2022).Genetic studies in Q. suber have analysed both nuclear data (Burgarella et al. 2009;López de Heredia et al. 2020;Pina-Martins et al. 2019;Vanhove et al. 2021) and chloroplast data (cpDNA), which can be informative in population-level studies of terrestrial plant groups.The chloroplast genome is nonrecombinant, usually maternally inherited and haploid, meaning that chloroplast DNA variants, or haplotypes, should become more rapidly fixed within populations, i.e. have a shorter coalescence time, compared to nuclear (diploid) allelic variants (Mariac et al. 2014;Petit and Vendramin 2007).Thus, chloroplast genomic data may provide insights on recent demographic and hybridisation events (López de Heredia et al. 2020;Pham et al. 2017).
Chloroplast haplotypes in Q. suber have been investigated using RFLP and PCR-RFLP markers (Jiménez et al. 2004;López-de-Heredia et al. 2005, 2007a, b;Lumaret et al. 2005;Lumaret and Jabbour-Zahab 2009), microsatellites (Magri et al. 2007) and sequence data (Belahbib et al. 2001;López de Heredia et al. 2020;Simeone et al. 2018).The different types of data have shown the presence of two or more haplogroups, which have been interpreted as the result of introgression events, possibly with adaptive influence, between Q. suber and other oak species (Jiménez et al. 2004;Lumaret and Jabbour-Zahab 2009;López de Heredia et al. 2017), as a consequence of isolation and subsequent expansion from glacial refugia (López de Heredia et al. 2007a;Lumaret et al. 2005) or as a product of Miocene plate tectonics (Magri et al. 2007).Reconstructing haplotype patterns from multiple chloroplast sequences and sampling broadly across the distribution range of Q. suber may bring new insights and enable a stronger resolution of haplotype diversity and distribution in this species, compared to earlier inferences made from RFLPs, microsatellites and single chloroplast regions.The main questions to address are whether chloroplast haplotype diversity in Q. suber is geographically structured, and whether it reflects ancient diversification or was instead shaped mostly by hybridisation events with other Quercus L. species.In addition, haplotype sequence data analysis may show which regions harbour the most genetic diversity in the chloroplast genome.
Here we expand the data set of Costa et al. (2011) based on three chloroplast intergenic regions (trnL-F, trnS-psbC and trnH-psbA) by sampling additional populations and by including the matK gene, which is considered a reliable marker for retrieving chloroplast phylogenies (Hochbach et al. 2018).The matK gene has been proposed as a barcode marker for plants (CBOL Plant Working Group. 2009) and has indeed been tested as a barcode marker for genus Quercus (Piredda et al. 2011;Simeone et al. 2013).We conduct diversity and phylogenetic analyses on our expanded dataset with the objective of generating a detailed characterisation of chloroplast diversity and its geographical distribution across the distribution range of Q.suber.We compare our results with earlier analyses of Q. suber cpDNA and use a time-calibrated analysis to elaborate on the origin of Q. suber chloroplast genome diversity.

Sampling and DNA sequencing
Samples of Quercus suber (subgenus Cerris, sect.Cerris) originated from 26 sites representing the entire natural range of the species.A map showing all sampling sites and the natural distribution of Q. suber was generated based on Caudullo et al. (2017) (Fig. 1).
Diversity measures were estimated at a regional level by combining sites of each country.1.
The package "pegas" (v.1.1; Paradis 2010) implemented in R v. 3.6.3(R core Team 2013) was used to estimate the number of haplotypes, haplotype and nucleotide diversity, and to test for demographic change using Tajima's D (Tajima 1989) and R2 (Ramos-Onsins and Rozas 2002).Maximum-parsimony (MP) trees were inferred from the cpDNA alignment using the program TNT v 1.1 (Goloboff et al. 2008) and a "traditional search" on the Phylogeny.fr server (http:// www.phylo geny.fr/).Branch support was estimated from a standard bootstrapping with 1000 replicates.Substitution models were inferred for the concatenated cpDNA alignment using jModelTest v. 2.1 (Darriba et al. 2012) with three substitution schemes and Akaike Information Criterion (AIC) calculations.A maximumlikelihood (ML) full bootstrap analysis of each alignment, with 1000 replicates, was run on RaxML v. 8.2.4 (Stamatakis 2014) using the best-fitting models inferred with AIC (concatenated matrix: GTR) and the gamma model of rate heterogeneity.
A reduced cpDNA alignment with only one sequence per haplotype (46 sequences) was analysed on BEAST v 2.6.7 (Bouckaert et al.2014), under the GTR substitution model and a discrete four category gamma model of site rate heterogeneity.A prior was defined for the monophyly of genus Quercus.A lognormal prior on the TMRCA for the genus was set with a mean in real space of 56E6 (Hipp et al. 2020) and a stdev of 0.05.A relaxed lognormal clock model was applied, with a lognormal prior for ucldmean (mean [real space] = 1.8E-9; stdev = 0.05; adapted from Sousa et al. 2014) and a gamma prior for ucldstdev (α = 0.5396, β = 0.3819).The Yule model was chosen as the tree prior, and a gamma prior (alpha = 0.001, beta = 1000) was used for "birthrate".The analysis ran for 100E6 MCMC generations on the CIPRES Science gateway (Miller et al. 2011).The run was validated using Tracer v 1.7.2., and all parameters had ESS > 200 after a 10% burnin.A maximum clade credibility tree and branch support values were obtained with TreeAnnotator v 2.6.7.
A haplotype parsimony network was built from the concatenated cpDNA data set with the R package "pegas" and the "haploNet" function, which uses the uncorrected P or Hamming distance and pairwise deletion of missing data.
Indels were not coded prior to analyses, i.e. gaps were considered as missing data.This data set, that includes ten sampled Quercus lineages and the outgroup, comprised 46 haplotypes.Among the 146 Q. suber samples, 29 haplotypes were found.Of these, nine (31%) were singletons, i.e. occurred in a single sample.The most frequent haplotype (XVIII) was found in 25 samples from Spain, France and Italy.The second most frequent haplotype (XXII) was found in 23 samples from Portugal.Haplotype II was found to be shared by Q. canariensis, Q. faginea and Q. pyrenaica, which is in accordance with earlier findings (Petit et al. 2002).The full list of samples and corresponding haplotypes is presented in Online Resource 1.
Results of haplotype diversity estimates for Q. suber are presented in Table 2.
The countries with most haplotypes found were Portugal (12), Spain (6) and Italy (5).Haplotype diversity ranged from 0 (Tunisia) to 0.83 (Portugal).The highest nucleotide diversity was found in Morocco (0.005).Tajimas'D statistic was negative in all sites except for Portugal, with significant values in Algeria, France and Italy, whereas the lowest significant values of the R2 test were found in France and Italy.In parsimony analyses of the complete data set of 181 individuals, 20 trees were retained after 157,888,177 rearrangements.The complete maximum parsimony consensus tree is shown in Online Resource 2, and a reduced MP consensus tree inferred from one sequence of each haplotype is shown in Fig. 2.
Taxa belonging to section Quercus, namely Q. canariensis (QCA), Q. faginea (QFA), Q. pyrenaica (QPY), which share the same haplotype (II), and Q. lusitanica (QLU), form a fully supported clade (BS = 100).The placement of Q. rubra (QRU) as sister to the remainder of the ingroup is also supported (BS = 100), as is the grouping of the two Q. coccifera (QCO) haplotypes.Samples belonging to Q. suber are recovered in two distinct clades: a supported clade (BS = 88 in MP analyses) comprising solely Q. suber lineages from all countries and corresponding to 13 haplotypes (henceforth referred to as the suber I group); an unsupported clade comprising Q. suber samples from Portugal, Spain, Morocco and Algeria, corresponding to 16 haplotypes (henceforth referred to as the suber II group), in which Q. ilex haplotypes appear nested in Q. suber lineages.Relationships among Q.suber haplotypes are largely unresolved, particularly within the suber I group.The two most frequent haplotypes (XVIII, XXII) are included in the suber I clade.Relationships between haplotypes from different taxa are not supported, except for the placement of Q. cerris (QCE) as sister to the suber I group (BS = 92 in MP analyses).
The time-calibrated analysis of the reduced cpDNA alignment using BEAST (Fig. 3) also recovers two Q. suber clades, both with full support (PP = 1.0), and Q. cerris fully supported as sister to the suber I group.A clade composed of three Q.rotundifolia haplotypes (XII, XIII, XIV) and the two Q. coccifera haplotypes (IV, V) is also fully supported.Full support was also obtained for the group comprising all Quercus haplotypes except Q. rubra and Q. rotundifolia haplotype XI.
The parsimony haplotype network built with the uncorrected P distance and showing the different taxa is presented in Fig. 4.
Quercus suber haplotypes appear divided into two groups, separated by intermediate Q. rotundifolia haplotypes and the Q. cerris haplotype.The suber I group, comprising 13 Q.suber haplotypes, is separated from the Q. cerris haplotype by seven evolutionary steps (mutations).The suber II group, comprising 12 Iberian and north African Q. suber haplotypes and the Q. ilex haplotype, is separated from Q. rotundifolia by seven steps.Haplotypes in both suber groups are separated by 1-3 steps, except in one case in the suber II group, where six steps separate two of the haplotypes.Six steps separate a group formed by three Q.rotundifolia haplotypes and the two Q. coccifera haplotypes, the latter at a distance of seven steps from the closest Q. rotundifolia haplotype.

Discussion
Our analyses agree with earlier findings that revealed the existence of two main chloroplast haplogroups in Q. suber (Jiménez et al. 2004;López de Heredia et al. 2005, 2007a, b, 2020;Lumaret et al. 2005;Lumaret and Jabbour-Zahab 2009;Simeone et al. 2018).We identify a widespread group, suber I, and a western group, suber II, which is only present in the Iberian Peninsula and NW Africa.Group suber I appears as sister to Q. cerris in all trees, which is in accordance with the taxonomic treatment and with the phylogeny of genus Quercus, that place Q. suber in section Cerris (Denk et al. 2017;Hipp et al. 2020;Hubert et al. 2014;Zhou et al. 2022), i.e. closest to Q. cerris than to any of the other Quercus species sampled herein.Suber I may correspond to what has been considered the primary chloroplast lineage in Q. suber, and referred to as the "suber" chloroplast lineage, as opposed to the "ilex" chloroplast Fig. 3 Time-calibrated phylogenetic reconstruction of Quercus suber haplotypes.Tree showing the phylogenetic reconstruction produced by a time-calibrated analysis of the 46 haplotypes in BEAST2.The inferred tree is scaled to geological time in units of million years (Myr).Branch support values represent posterior probabilities.Haplotype numbers are presented in roman numerals.Q. suber haplotypes are highlighted in green (suber I) and orange (suber II).Pli Pliocene; Ple Pleistocene lineage (Lumaret et al. 2005;Simeone et al. 2018).Suber II appears either unresolved (MP and ML trees) or nested within Q. rotundifolia, without support (BEAST tree), and is separated from suber I, in the haplotype network, by six intermediate haplotypes of Q. rotundifolia and Q. cerris.This pattern raises questions on the origin and persistence of the two Q. suber chloroplast lineages, which could be explained by two biological scenarios.The first would be incomplete lineage sorting, meaning that Q. suber retained two ancestral chloroplast lineages within its distribution range.The second would be ancient introgression between Q. suber and Quercus sect.Ilex.Both these processes have been invoked to explain haplotype diversity in Quercus sect.Cerris (Simeone et al. 2018).Our analyses show an ancient origin of both Q. suber chloroplast lineages but also the presence of different haplogroups in Quercus sect.Ilex, as well as a lack of support for relationships between the two sections.These results are compatible with a scenario of incomplete lineage sorting, i.e. the persistence of ancestral chloroplast lineages within both Q. suber and Q. rotundifolia, but do not negate the possibility that haplotypes in suber II derive from introgression events between Q. suber and Q. rotundifolia/Q.ilex, as these two species were not sampled across their entire range, and a more complete sampling may have shown a pattern more indicative of hybridisation.
The recovery of the Q. ilex haplotype nested within suber II confirms the existence of genetic exchange between Q. ilex and Q. suber.Hybridisation between these two species has been widely described (Belahbib et al. 2001;López de Heredia et al. 2017;Lumaret et al. 2009) and Q. suber is considered to be the paternal donor in most cases (Belahbib et al. 2001;López de Heredia et al. 2020), although some authors do not recognise a directional hybridisation pattern in these two taxa (Burgarella et al. 2009;López de Heredia et al. 2017;Lumaret et al. 2009).The observed pattern suggests that the Q. ilex haplotype is derived from a Q. suber haplotype present in a diversified suber II group.However, both Q. ilex and Q. rotundifolia are not broadly sampled in our data set, and a putative introgression event where Q. ilex is the paternal donor is likely to be uncommon.The parsimony haplotype network supports the hypothesis of suber II originating through introgression, as the two Q. suber chloroplast groups are separated by intermediate Fig. 4 Haplotype parsimony network.Statistical parsimony network (TCS) of haplotypes constructed from the chloroplast sequence data set using the R package "pegas".Haplotype numbers are presented in roman numerals.Quercus suber haplotypes are highlighted in green (suber I) and orange (suber II).Dashes on each link represent evolutionary steps between haplotypes haplotypes of Q. rotundifolia.However, haplotype network reconstruction is not robust to small changes in our data set.It was observed that the simple removal of Q. cerris and of parsimony-informative site 458, for example, results in a network showing the two suber haplotype groups connected, without intermediate haplotypes from other taxa (Online Resource 3).
Age estimates obtained under a relaxed clock recover the origin of Quercus at c. 55 Myr, in accordance with earlier analyses that placed the origin of the genus at 55-56 Myr (Hipp et al. 2020;Hubert et al. 2014).The option for a relaxed clock was considered appropriate given the inclusion of different Quercus species and the outgroup.Crown age estimates for the two suber haplotype groups and for the Q. suber/Q.cerris clade point to the late Oligocene/early Miocene, and are thus compatible with the dates proposed by Magri et al. (2007) for the diversification of Q. suber haplotypes.Our age estimates have a large margin of error, as indicated by the height posterior density (HPD) intervals, and must be considered as only an approximation.Nevertheless, these estimates suggest that the two haplotype lineages were present in Q. suber well before the Pleistocene, and therefore that putative introgression events with Q. ilex/Q.rotundifolia (Jiménez et al. 2004;Lumaret and Jabbour-Zahab 2009) originating suber II haplotypes would likely have occurred before glaciations associated with that epoch.
If the suber II haplogroup was indeed acquired through modern introgression between Q. suber and Q. ilex/Q.rotundifolia, the latter corresponding to seed-bearing donors, then all sampled suber II haplotypes in our trees would have to be derived from multiple unsampled Q. ilex or Q. rotundifolia haplotypes.Ancient introgression between Q. suber and an ancestral lineage in Quercus sect.Ilex, during the Miocene, could nevertheless be a valid possibility, assuming that both lineages were already well differentiated, although the Q. coccifera/Q.ilex haplotype split is estimated at c.9 Myr in our analysis (Hipp et al. 2020 estimated the species split at c.10 Myr), and is thus more recent than the suber II group.The hypothesis of ancient reticulations explaining the presence of different chloroplast lineages in Quercus sects.Cerris and Ilex has been postulated earlier (e.g.Simeone et al. 2016Simeone et al. , 2018)).These reticulations may have occurred as multiple independent events, and haplotypes would have become fixed due to genetic drift associated with demographic changes.However, our current sampling of Quercus sect.Ilex is insufficient to fully verify the hypothesis of ancient reticulation events originating the two distinct haplogroups.An alternative explanation for the observed pattern would be incomplete sorting of chloroplast lineages in Q. suber.Manos et al. (1999) hypothesised the persistence of cpDNA polymorphisms through the diversification of Quercus, and highlighted the lack of a clear discriminating signal of Quercus plastid data at infrageneric level.Li et al. (2022) have reported long-term persistence of ancestral chloroplast lineages in East Asian oaks.Lumaret et al. (2005) also predicted the existence of two native Q. suber haplogroups, rather than a native and an introgressed lineage, and Simeone et al. (2009) postulated that sharing of ancestral cpDNA polymorphisms in sect.Cerris is highly probable.
Besides Q. suber, one of the hypothetical maternal donors under the introgression scenario, Q. rotundifolia, does not possess a single chloroplast lineage either.This pattern is consistent with the findings of Simeone et al. (2016) who reported a non-monophyly of plastomes in Quercus sect.Ilex, which was attributed to a combination of incomplete lineage sorting and putative introgression.Vitelli et al. (2017) also identified different lineages in Quercus sect.Ilex which followed a clear geographical structuring.Samples of Q. rotundifolia do not cluster together in our phylogenetic analyses, and indeed display a pattern that is compatible with the retention of different chloroplast lineages.For example, three Q.rotundifolia haplotypes (XII, XIII, XIV) form a supported clade with the Q. coccifera haplotypes (IV, V) in the BEAST tree (Fig. 3), and are readily identified in the haplotype network (Fig. 4).These haplotypes may correspond to what Simeone et al. (2016) named the "EuroMed" lineage of sect.Ilex plastomes.
Persistence of ancestral polymorphisms is a product of both rapid diversification and large effective population sizes (Pamilo and Nei 1988;Maddison and Knowles 2006).If ancestral Q. suber populations experienced rapid growth before a single chloroplast lineage could be fixed throughout the entire species range, then the two haplogroups could have persisted without major constraints for long periods.Diversification in Quercus species has been related to ecological opportunity due to a mid-Miocene temperature decrease (Graham 2011;Hipp et al. 2020), which may have promoted population expansion as well as speciation.Large effective population sizes throughout the Miocene may have enabled the retention of different chloroplast lineages up to the Pleistocene, when glaciation cycles caused range contractions in tree species north of the Mediterranean, that affected Q. suber (López de Heredia et al. 2007a;Vessella et al. 2015).Range contractions and decreased effective population sizes during glaciation periods would in theory have favoured the fixation of polymorphisms and chloroplast lineages.To explain the persistence of the two suber haplogroups, assuming they both have an ancestral origin, perhaps a vicariance process must be invoked.Vicariance could have resulted from contraction, during Pleistocene glacial maxima, into disconnected refugia in which either haplogroup, suber I or II, would become fixed.The geographical distribution of the suber II group may support the hypothesis of a separation of the two lineages, since Q. suber haplotypes in this group were only found in the Iberian Peninsula and northwestern Africa, whereas suber I is found throughout the distribution range of the species.Haplotypes of the suber II group may have become fixed in refugia located in the Iberian Peninsula and northwestern Africa (see Vessella et al. 2015), and post-glacial secondary contact between formerly isolated populations holding either suber I o II would then have allowed for the coexistence of the two haplogroups in these regions.Alternatively, new areas of suitable habitat for Q. suber may have become available around southern refugia, as hypothesised by Vesella et al. (2015), allowing for the maintenance of a large effective population and the presence of the two haplogroups.The current presence of the two lineages in the Iberian Peninsula and northwestern Africa but not in the remaining distribution range could be explained by the generally larger effective sizes of Q. suber populations in these regions, more recent colonisation of the northern and eastern parts of the species distribution range and low dispersal of suber I seeds across the Pyrenees, although a degree of adaptive leverage of either haplogroup cannot be excluded (see Pham et al. 2017 andLópez de Heredia et al. 2017).Regarding cpDNA diversity among regions, Algeria, Morocco, Spain, Italy and Portugal stand out as having large haplotype diversity and more than one private haplotype.Values of Tajima's D statistic were generally negative, with statistical significance in France and Italy.This deviation from neutrality indicates recent population expansion.Like other Mediterranean tree species, Q.suber is known to have expanded northwards after contracting during the last glaciation period (López de Heredia et al. 2007a;Vessella et al. 2015), and signal of population expansion should therefore be detectable, particularly in the northern part of the Q. suber distribution range, where post-glacial colonisation is likely to be more recent.

Conclusions
We identified two cpDNA haplogroups within the distribution range of Q. suber, in agreement with earlier studies on Q. suber chloroplasts.One of these haplogroups occurs only in the Iberian Peninsula and northwestern Africa, while the other is present across the species range.Age estimates point to a Miocene diversification of both haplogroups, suggesting a scenario involving the retention of ancient chloroplast lineages or the occurrence of ancient introgression events between Q. suber and Q. sect.Ilex.Differential fixation of these chloroplast lineages in refugia and recent population expansion may explain their persistence through glaciation periods and their present-day geographical distribution.Our results highlight the complexity of chloroplast genealogies in Q. suber, Q. ilex and Q. rotundifolia, and suggest that obtaining more data, such as whole chloroplast sequences, may be required to fully understand the processes behind chloroplast diversity in these Mediterranean oak species.

Information on Electronic Supplementary Material
Online Resource 1. List of samples showing the corresponding sampling sites and haplotypes.Online Resource 2. Maximum-parsimony consensus tree of combined chloroplast data.Online Resource 3. Haplotype parsimony network after site and sample exclusion.

Fig. 1
Fig. 1 Map of sampling sites.Map showing the natural distribution of Quercus suber (grey) and the location of sampling sites (black circles).For each site, the presence of haplotypes of either haplogroup is indicated by green (suber I) and orange (suber II) squares

Fig. 2
Fig. 2 Phylogenetic reconstruction of Quercus suber haplotypes under parsimony.Tree showing the phylogenetic reconstruction produced by a parsimony analysis of the chloroplast sequence data using TNT.This simplified tree is derived from the strict consensus cladogram of 20 most parsimonious trees (Online Resource 2) by merging all tips corresponding to the same haplotype (46 haplotypes).Haplo- Leaf material was sampled in situ from natural stands in eight sites in Portugal (Gerês, Serra da Estrela, Serra de São Mamede, Serra da Arrábida, Serra de Monchique, Serra do Buçaco, Azeitão and Serra de Sintra).The remaining samples (Portugal:

Table 1
SamplingList of sampled taxa, taxon/site acronyms, sampling sites, number of samples per site, country and site geographic coordinates

Table 2
Haplotype diversity in Quercus suberNumber of samples per country (n), number of haplotypes (n hap.), list of haplotypes, haplotype diversity (h), nucleotide diversity (π), and values of Tajima's (D) and Ramos-Onsins and Rozas (R2) tests.P values are marked with * for p < 0.05, ** for p < 0.01 and (**) for p < 0.01 assuming that D follows a beta distribution after rescaling