Genetic variation of introduced red oak (Quercus rubra) stands in Germany compared to North American populations

Although Northern red oak (Quercus rubra L.) is the most important introduced deciduous tree species in Germany, only little is known about its genetic variation. For the first time, we describe patterns of neutral and potentially adaptive nuclear genetic variation in Northern red oak stands across Germany. For this purpose, 792 trees were genotyped including 611 trees from 12 stands in Germany of unknown origin and 181 trees from four populations within the natural distribution area in North America. Our marker set included 12 potentially adaptive (expressed sequence tag-derived simple sequence repeat = EST SSR) and 8 putatively selectively neutral nuclear microsatellite (nSSR) markers. Our results showed that German stands retain comparatively high levels of genetic variation at both EST-SSRs and nSSRs, but are more similar to each other than to North American populations. These findings are in agreement with earlier chloroplast DNA analyses which suggested that German populations originated from a limited geographic area in North America. The comparison between potentially adaptive and neutral microsatellite markers did not reveal differences in the analyzed diversity and differentiation measures for most markers. However, locus FIR013 was identified as a potential outlier locus. Due to the absence of signatures of selection in German stands, we suggest that introduced populations were established with material from provenances that were adapted to environmental conditions similar to those in Germany. However, we analyzed only a limited number of loci which are unlikely to be representative of adaptive genetic differences among German stands. Our results suggest that the apparent introduction from a limited geographic range in North America may go along with a reduced adaptive potential.


Introduction
The introduction of Northern red oak (Quercus rubra L.) from its natural range in North America to Europe dates back to the end of the seventeenth century, when it was Communicated by Rüdiger Grote.
* Ludger Leinemann lleinem@gwdg.de * Oliver Gailing ogailin@gwdg.de 1 3 likely first brought to France (Houba 1887;Hickel 1932). Until the middle of the eighteenth century, Q. rubra was planted in parks and botanical gardens for ornamental purposes (Bauer 1951;Nagel 2015). After two major waves of cultivation in the second half of the nineteenth century and at the turn of the nineteenth to the twentieth century, it is now the most important non-native deciduous tree species for wood production in Germany (Bauer 1951; Bundesministerium für Ernährung und Landwirtschaft (BMEL) 2014). Nagel (2015) described Q. rubra as a species covering a wide range of soil and climatic conditions with annual precipitation between 600 and 2000 mm and mean annual temperatures between 4 and 15 °C and featuring a shorter rotation period (~ 80-120 years) than native white oak species (~ 140 years), as well as a lower demand for nutrients and water. Unlike other European countries (Möllerová 2005;Riepšas and Straigytė 2008;Chmura 2013), Q. rubra is not considered invasive in Germany. It is less shade-tolerant than the main native tree species Fagus sylvatica L. and only little more shade tolerant than the native white oak species Q. robur L. (Vor and Lüpke 2004;Niinemets and Valladares 2006;Nagel 2015). Moreover, it is easily controllable by containment measures (Vor 2005;Nagel 2015).
Introduced species are usually expected to experience a founder effect and genetic bottlenecks that greatly promote genetic drift and can result in decreased genetic variation, especially if only a limited numbers of trees served as seed source and/or only a small part of the species' natural range was sampled (Nei et al. 1975; Barrett and Husband 1990). In the past, several studies were conducted to reveal patterns of genetic variation within Northern red oak's natural range (e.g., Daubree and Kremer 1993;Romero-Severson et al. 2003;Magni et al. 2005;Zhang et al. 2015;Borkowski et al. 2017). However, only few studies focussed on the impact of the introduction on genetic variation of Northern red oak populations in Europe (Magni Diaz 2004;Merceron et al. 2017;Pettenkofer et al. 2019). Borkowski et al. (2017) found that the genetic differentiation between Northern red oak populations increased from south to north within the natural range, reflecting its postglacial migration movement, but without revealing distinct pathways. Magni Diaz (2004) used chloroplast (cp) DNAbased PCR-RFLP (restriction fragment length polymorphism) markers to analyze the genetic variation of introduced stands mainly located in France and Germany with only a few samples from the Netherlands, Belgium, Spain, Italy, and Romania. Interestingly, Magni Diaz (2004) found greater total and within-population diversity as well as differentiation between populations for historical German populations in comparison with historical French populations potentially as a result of different import and forest policies in these countries. However, for all European populations, Magni Diaz (2004) could not identify a geographic pattern. Recently, Pettenkofer et al. (2019) used maternally inherited chloroplast DNA markers to analyze the genetic variation of German stands in comparison with North American populations. Most German stands showed very similar haplotype frequencies and low haplotype diversity, pointing to a limited number of seed sources. A considerably higher cpDNA haplotype diversity was detected in Southern Germany. Multiple introductions and admixture of material within Europe or prior to its introduction to Europe were suggested as possible reasons for high haplotype diversity within certain regions (Magni Diaz 2004;Pettenkofer et al. 2019). While it was not possible to narrow down the geographic origin to specific regions in North America, patterns of genetic variation suggested an origin from a limited geographic area, likely in the northern part of the species' distribution range (Magni Diaz 2004;Pettenkofer et al. 2019). Merceron et al. (2017) studied single-nucleotide polymorphism (SNP) markers randomly distributed across the genome in European populations, mainly in France, but also including samples from Germany, the Netherlands, Belgium, Spain, Italy, and Romania, and compared them with data for North American populations. Three main genetic clusters were identified in North America: in the south, the northeast, and the northwest, respectively. However, only trees representing the two northern clusters were found in Europe. Merceron et al. (2017) stated that trees representing the southern cluster were either never introduced to Europe or vanished eventually. They suggested that European populations may thus originate from the northern part of the natural range, a conclusion supported by other studies on this topic (Bauer 1954;Magni Diaz 2004;Nagel 2015;Pettenkofer et al. 2019).
This study used some of the same populations that were analyzed earlier with chloroplast SSR markers (Pettenkofer et al. 2019). In contrast to chloroplast SSR markers, which are maternally inherited in Q. rubra and therefore useful to track migration routes (Petit et al. 1997;Alexander and Woeste 2014), nSSR markers are biparentally inherited and much more polymorphic. However, nSSR markers require a considerably larger set of samples per population. For the first time, we describe patterns of neutral and potentially adaptive nuclear genetic variation in Northern red oak stands across Germany.
The main objective of the present study was to assess the genetic variation of German Northern red oak stands by analyzing variable potentially adaptive (expressed sequence tagderived simple sequence repeats = EST SSR) and putatively selectively neutral nuclear microsatellite (nSSR) markers and to compare German stands with selected North American populations.
We hypothesize that (1) the Q. rubra gene pool introduced to Germany represents only a fraction of the North American gene pool, and (2) potentially adaptive genetic markers show different variation patterns compared to neutral markers.

Plant material
In total, 792 trees were genotyped in this study including 611 trees from 12 stands in Germany of unknown origin (48-57 samples per stand; Table 1, Fig. 1) and 181 trees from four populations within the natural distribution area in North America (40-47 samples per population; Table 2, Fig. 2). The selection of the study sites was based on the chloroplast haplotype frequencies identified earlier in these populations based on smaller sample sizes (5-10 samples per population) (Pettenkofer et al. 2019). Due to the strict maternal inheritance and lower variation of cpDNA, these sample sizes allowed to characterize 39 stands in Germany and 19 populations in the USA and Canada. In the study carried out by Pettenkofer et al. (2019), most German stands had the most common haplotype "A." The next frequent haplotype "B" was found exclusively in the northern parts of the natural range. Two of the three haplotype clusters identified in the natural range were also found in German populations. Stands located in the southwest of Germany had a higher haplotype variation compared to other sampled stands. The populations within the natural range used in this study  were selected from the northern and southern parts of the range that were different based on the chloroplast haplotypes, and samples were available in sufficient numbers for them. Samples were taken randomly in each population as buds, green leaves, or cambium probes. Within Germany, sample stands were selected in five different federal states of Germany (Table 1): two stands in each of Lower Saxony, North Rhine-Westphalia, Brandenburg, and Thuringia and four stands in Baden-Wuerttemberg, where Pettenkofer et al. (2019) discovered the highest chloroplast variation. All German stands were (1) pure Q. rubra stands, (2) 50-80 years old, (3) featured a rectangular shape for easier data acquisition, and (4) both the present and expected future climate conditions matched their autecological properties.

DNA isolation
The DNeasy 96 Plant Kit (Qiagen, Hilden, Germany) was used to extract the DNA from either about 1 cm 2 leaf tissue of the fresh leaf or 1-2 whole buds from a fresh twig per tree.
Extraction of DNA from cambium samples was performed with the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) using the respective manufacturer's protocol with the following changes: The incubation of cambium samples took place overnight at 65 °C in a lysate containing the buffer AP1 and 9.4% polyvinylpyrrolidone (PVP) in the final solution.
Multiplex 4 was performed with the following PCR mix per sample: 8.5 µl Multiplex PCR Kit (Qiagen, Hilden, Germany) and the denoted amount for each forward and reverse primer for the markers PIE040 (1 µl The same touchdown PCR program was used for all markers in a Biometra TProfessional thermocycler (Jena, Germany). The PCR protocol started with 15 min for initial denaturation at 95 °C, followed by 10 cycles of 1-min denaturation at 94 °C, 1-min annealing at 60 °C (− 1 °C per cycle), and 1-min extension at 72 °C. This first set of cycles was then followed by another 25 cycles of 1-min denaturation at 94 °C, 1-min annealing at 50 °C, and 1-min extension at 72 °C. The PCR ended with a final 20-min extension step.
Before adding to HiDi formamide, the PCR products were diluted for capillary gel electrophoresis based on band intensities on agarose gels in the ratios of 1:150 for multiplexes 1 and 2 and 1:60 for multiplexes 3 and 4. The SSR fragments were separated using capillary electrophoresis on an ABI Genetic Analyzer 3130xl (Applied Biosystems, Foster City, USA) and sized using the internal size standard GeneS-can™ 500 ROX™ as reference for multiplexes 1 and 2 and GeneScan™ 500 LIZ™ as reference for multiplexes 3 and 4 (Applied Biosystems, Foster City, USA). The fragments were scored using the software package GeneMapper version 3.7 (Applied Biosystems, Foster City, USA).

Data analyses
Based on the obtained SSR genotypes, the software GenAlEx 6.5 Smouse 2006, 2012) was used to calculate the number of alleles N a , the effective number of alleles N e , the observed heterozygosity H o , the expected heterozygosity H e , and the genetic differentiation measures D (Jost 2008) and F ST for all markers and populations. We chose to calculate both differentiation parameters because traditional F ST can underestimate differentiation for such highly polymorphic markers as SSRs (Jost 2008). The fixation index F IS and the test for significant differences of its values from 0 were calculated with FSTAT 2.9.3 with P-values adjusted using the sequential Bonferroni procedure (Rice 1989;Goudet 1995). The Kruskal-Wallis test was used to identify significant differences between German and North American populations for all diversity parameters (Table 3) using the software R version 3.3.2 (R Core Team 2016). Also, the Kruskal-Wallis test with multiple comparisons implemented in the R-package "pgirmess" (Giraudoux et al. 2018) was used to test for pairwise differences among all populations. The implemented multiple-comparison test determines which population pair shows significant differences. To visualize population differences, we further used GenAlEx 6.5 to perform a principal coordinates analysis (PCoA). This cluster analysis was based on the pairwise genetic distance matrix between populations and between individuals. The PCoA assigns a location for each individual or population within a multidimensional space. Two or three first main axes that explain most of the genetic differentiation between individuals and/or populations are usually presented in a plot based on this analysis (Peakall and Smouse 2012). In this study, the genetic distance matrix was based on Jost's D (Jost 2008) calculated using GenAlEx 6.5.
To build a neighbor-joining tree (NJT), the genetic distance D A by Nei et al. (1983), which is especially suited for microsatellite markers (Takezaki and Nei 1996), was calculated using the Populations 1.2.32 software (Langella 1999). The bootstrap values were based on 1000 permutations across loci, and the NJT was visualized using the online software IcyTree (Vaughan 2017). Arlequin version 3.5 (Excoffier and Lischer 2010) was also used to perform an analysis of molecular variance (AMOVA) by applying 9999 permutations for all populations (the entire dataset), all German populations, and the four North American populations (Excoffier et al. 1992). The software computes a matrix of pairwise distances between all populations using the number of different alleles (F STlike). Further, Arlequin was also used to search for outlier loci that could be potentially under selection (Excoffier et al. 2009). Here, 50,000 simulations and 100 demes were selected as running conditions.
To make inferences about the potential population structure and number of clusters that can be identified in our genotyped samples, we used the software STRU CTU RE 2.3.4 by Pritchard et al. (2000). STRU CTU RE uses a model-based statistical clustering method within a Bayesian framework to assign individuals to populations and detect population structure over all sampled multilocus genotypes (Pritchard et al. 2000;Falush et al. 2003). We used 10,000 and 100,000 Markov chain Monte Carlo (MCMC) replicates for the burn-in period and further iterations, respectively. Runs were performed for 1-20 potential clusters (K) using 20 iterations for each test. To infer the optimal number of K, we applied the ΔK method by Evanno et al. (2005) using the STRU CTU RE HARVESTER 0.6.94 software (Earl and Von Holdt 2012). Results were visualized using the online software CLUMPAK (Kopelman et al. 2015).
To test for genetic bottleneck effects, the software BOT-TLENECK (Cornuet and Luikart 1996) was used with 1000 iterations and assuming a two-phase model (TPM). The TPM allows to assume different proportions (contributions) of the infinite allele model (IAM) and the stepwise mutation model (SMM) for all markers. Considering that dinucleotide and other perfect repeats follow the IAM while imperfect repeats rather follow the SMM, we selected the option with 70% IAM and 30% SMM in accordance with our marker set (Cornuet and Luikart 1996;Cristescu et al. 2010).

Genetic diversity
The German stands and North American populations showed similar levels of mean observed (H o = 0.61 vs. 0.59) and expected (H e = 0.63 vs. 0.65) heterozygosities, respectively (Table 3). These parameters were also similar across all populations (H o = 0.57-0.62, H e = 0.62-0.68, Table 3). The mean (N a ) and effective (N e ) number of alleles also showed no significant differences between German and North American populations (Table 3). However, the number of private alleles and their mean frequency in Germany (9 in 12 populations, N p = 0.75) were significantly lower than those in the North American populations (19 in 4 populations, N p = 4.75) (Tables 3 and 4). In Germany, 5 of 9 private alleles were found in stands in Baden-Wuerttemberg, while two were found in each Brandenburg and Lower Saxony (Table 4). We detected no private alleles in stands from Thuringia or North Rhine-Westphalia. In North America, most private alleles were found in BR1 (N p = 9) and Constance Bay (N p = 5); both populations are located in the north of the natural distribution range. Relative frequencies of private alleles range from 0.009 to 0.042 (Table 4). Among all populations and tested parameters (N a , N e , H o , H e , F IS ), only F IS showed significant differences between the populations 18 (North Rhine-Westphalia) and BR1. No recent genetic bottleneck was detected using the BOTTLENECK software.
The search for outlier loci that could be potentially under selection identified only the EST-SSR FIR013 as a significant outlier locus (Supplementary Table 2S and Fig. 3). At this marker, the North American population BR1 shows a higher frequency (0.24) of allele 138 in comparison with other populations (0.01-0.11).

Genetic differentiation and population structure
The PCoA based on Jost's D between all populations showed that German stands densely clustered between the North American population FC-B, which is located in the northwest of the natural range (Ford Forestry Center, Michigan), and Nantahala, which is located in the south of the range, in the western parts of North Carolina (Fig. 4). Constance Bay (Canada) and BR1 (Brockway Mountain, Michigan) were set apart from the German cluster. The first two principal coordinates together explained 66.32% of the variation among populations. Also, the individual-based PCoA revealed more heterogeneous point clouds for the populations of North American origin as compared to the German plantations. Especially, the two natural populations from Northern Michigan occupy a comparatively wide space in the twodimensional PCoA (Supplementary figures 5S and 6S).
To partition genetic variation among and within populations, a hierarchical AMOVA was performed for all populations (the entire dataset) and separately for the North American populations and all German populations (Supplementary  Tables 3S, 4S, and 5S). In the entire dataset, 1.10% of the variation was due to partition among populations, while 98.90% was due to variation within populations (among and within individuals), with all components being highly significant. In North America, 1.88% of the variation was due to partition among populations (p < 0.001), and 98.12% of the variation was due to variation within populations. In German populations, the genetic variation among populations was lower than among North American populations (0.78%).
The STRU CTU RE and Delta K analyses of the entire dataset, as well as for North American and German populations, indicated no strong population structure (Supplementary figures 1S-4S).
The NJT did not reveal strong differentiation among German populations (Fig. 5) and was in agreement with the STRU CTU RE and PCoA analyses. Most clusters were not well supported by bootstrap values. However, all four North American populations, BR1, FC-B, Nantahala, and Constance Bay clustered together. The German population 31 clustered close to Nantahala and Constance Bay populations, but this relationship was not well supported.

Genetic variation at EST-SSRs and nSSRs
Unlike nSSR markers, EST-SSRs are more conserved being located very close to functional genes whose variation is very likely to be limited due to selection. Thus, they feature higher transferability across species within genera than nSSR markers, but are not as polymorphic as nSSR markers (Ellis and Burke 2007;Kalia et al. 2011). Accordingly, for our dataset, mean allelic richness was lower for EST-SSR markers (N a = 5.63) than for the neutral nSSR markers (N a = 15.04). This pattern was also observed for the observed and expected heterozygosity (neutral: H o = 0.798, H e = 0.850; EST: H o = 0.474, H e = 0.494). Although we identified EST-SSR marker FIR013 as an outlier locus, in general, F ST and Jost's D were similar for potentially adaptive EST-SSR and neutral nSSR markers.

Genetic variation patterns indicate the origin of German populations from a limited area
Our results show that 19 rare alleles (relative frequencies = 0.010-0.042), which are private in the examined North American populations, are not found in German populations. These drift-like effects can be attributed to the selection and sampling of seed sources. The occurrence of  (Magni Diaz 2004) but could also have led to a lower differentiation between introduced populations. Concerning the overall level of genetic variation in German Northern red oak stands, additional import of seeding material seems not to be necessary at the moment. But concerning the limited number of North American populations investigated in our study and with respect to future climate change scenarios, additional import of material from other parts of the natural range might be necessary in the future. Interestingly, the number of private alleles among German populations was especially high in Baden-Württemberg, suggesting multiple introductions or admixture of plant material before its introduction (Pettenkofer et al. 2019). In North America, 14 out of 19 private alleles were found in the northern populations BR-1 (9) and Constance Bay (5).
The AMOVA showed that only 1.10% of the genetic variation was due to partitioning among populations. The PCoA clusters that visualized the genetic differentiation between populations (Fig. 4) and the pairwise population genetic distances supported the data of Pettenkofer et al. (2019). German populations clustered densely occupying only a small part of the two-dimensional space in the PCoA plot indicating the introduction from a limited geographic range  Nei et al. (1983). Only bootstrap numbers above 50% are shown (Fig. 4). Likewise, the distribution of point clouds in individual-based PCoA analyses and earlier chloroplast DNA analyses suggest that introduced populations originated from a limited range in the northern part of the species' distribution.
Considering also the similar levels of genetic diversity in native and introduced populations and the absence of recent genetic bottlenecks, the clustering of populations in the PCoA supported the hypothesis that German stands might have been established with material from a particular region within the species' native range.

Weak genetic structure was observed across all sampled populations
Although the bootstrap support was weak for most of the clusters, the NJT showed that the two North American populations that are located close to the Lake Superior in Michigan, USA (FC-B and BR1), are clustered together. For the North American cluster, however, bootstrap support was rather low (= 28%), showing only weak distinction between North American and German populations.
However, there is no strong evidence for inter-populational or inter-regional differentiation within both German stands and North American populations.
In this study, only the EST-SSR marker FIR013 was identified as potentially under selection. It was the most differentiated marker with the highest F ST value among all markers (Supplementary Table 2S), mostly due to the different frequencies of the FIR013 alleles in the North American population BR1 (data not shown), which was also the most differentiated population among all populations with the highest pairwise Jost's D and F ST values. FIR013 is located in a CONSTANS-like gene and represents a region consisting of several tandemly repeated codons CAG and/or CAA encoding glutamine. This gene is known to be involved in the regulation of flowering time and photoperiodic control of growth (Lind-Riehl et al. 2014). Also, the distribution of alleles between populations of closely related Q. rubra and Q. ellipsoidalis shows that FIR013 might also serve as a marker to trace potential introgression of genes from Q. ellipsoidalis into Q. rubra (Lind-Riehl et al. 2014). Considering the overlap of ranges of both species in the Great Lakes region, genetic introgression may be one possible reason for higher differentiation of BR1. However, population BR1 is located outside the current distribution range of Q. ellipsoidalis (Lind-Riehl et al. 2014). The absence of signatures of local adaptation in German populations may suggest that introduced populations were established with material from provenances with similar environmental conditions as the ones prevailing in Germany (Liesebach and Schneck 2011; Pettenkofer et al. 2019). However, we only analyzed a limited number of potentially adaptive markers. Further analyses of candidate genes for local adaptation and of adaptive traits in provenance trials are needed to assess adaptive differences among provenances.

Conclusions and future perspective
Comparatively high levels of genetic variation at both EST-SSRs and nSSRs in German stands suggest that additional import of seeding material from the native range is not required for maintaining a sufficiently high level of genetic variation in Germany. Estimation of chloroplast haplotype diversity (Pettenkofer et al. 2019) further revealed that plantations in the southwest of Germany may provide a variable genetic resource. Moreover, Liesebach and Schneck (2011) state that German provenances perform better in respect of growth than provenances from the natural range. Therefore, additional import of material from the native range is not necessary to enhance the genetic variation or growth of local Northern red oak stands. The analysis of additional candidate genes for local adaptation and of adaptive traits (e.g., drought tolerance) in provenance trials is needed to assess adaptive differences among German and North American provenances. The identification and genotyping of large numbers of SNPs can be achieved by genotyping by sequencing techniques such as restriction site-associated DNA sequencing (RADseq) and thus provide a new source of genome-wide potentially adaptive markers (Miller et al. 2007;Davey and Blaxter 2010).
Finally, North American populations representing the natural range could be characterized at nuclear and chloroplast DNA markers to narrow down the geographic origin of German plantations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.