The Oder valley
The Oder River is the second-largest river in Poland. Its total length is 854.4 km, 742.0 km of which lies in Poland. The source is located in the Odra Mountains (East Sudetes) in the Czech Republic, 634 m above sea level. The river drains a basin of 119,074 km² and ends in the Roztoka Odrzańska near Police. The riverbed of the Oder is highly regulated along almost the entire length of the valley. According to historical data, the first embankments along the Oder were built as early as the end of the 13th century (Hudak et al. 2018). Currently, the flood protection system consists of levees, polder areas, and, since the end of 2019, the Racibórz Dolny water reservoir. One of the most striking examples of the transformation of the Oder valley is the shortening of the natural river course by approximately 160 km, which constitutes 1/6 of the total river length (Migoń 2006).
Ecologically, despite far-reaching anthropogenic changes in habitats as well as in forest and thicket communities, the tree and shrub flora in some parts of the Oder valley still retains its basic riparian character. For this reason, this river valley constitutes one of the major ecological corridors in the European Ecological Network (EECONET) (Liro et al. 1994). Since 2004, approximately 2300 km² of the valley has been included in the Natura 2000 network and is therefore legally protected.
With regard to poplar species, a characteristic chorological feature of this group of riparian forest-forming trees in the Oder valley is the much rarer occurrence of Populus alba (frequency 23.5%) than of P. nigra (73.8%) (Danielewicz 2008). This discrepancy in frequency between species may suggest that the white poplar is much more sensitive than the black poplar to changes in riverside environments. On the other hand, it is also possible that the true frequency of P. nigra in the Oder valley is overestimated because pure black poplars are difficult to distinguish from hybrids on the basis of morphological characters. This hypothesis is supported by the fact that artificially introduced P. × canadensis hybrid trees are very common along the entire Oder valley, with a frequency of 75% (Danielewicz 2008).
Study sites and plant material
Nine natural black poplar populations located in the Oder valley were analyzed (Fig. 1). Sample sizes ranged from 27 to 99 trees per population (mean of 64.7 individuals), resulting in a total of 582 specimens studied (Table 1). At each study site, we focused on adult trees that were in good condition. Additionally, four reference individuals of P. × canadensis and four representatives of Populus deltoides were included to eliminate potential hybrids from the analyzed set of black poplar trees in case they had not been recognized based on phenotype. The included reference individuals grow in the Kórnik Arboretum of the Institute of Dendrology, Polish Academy of Sciences. The Euramerican hybrids comprised variants marked as Populus ‘Serotina’ ♂, P. ‘Robusta’ ♂, P. ‘Marilandica’ ♀, and P. ‘Grandis’ ♀. These cultivars represent the most popular male and female poplar variants that have been planted in Poland in the last 100 years in plantations for wood production or as single trees for landscape purposes. Currently, there are no large-area plantations of poplar cultivars in Poland. However, remnant poplar hybrids occur in Polish river valleys as single trees or groups of individuals, with a frequency similar to that of the native poplar species (Danielewicz 2008).
DNA was extracted from young poplar leaves by using the standard CTAB protocol (Dumolin et al. 1995). Qualitative and quantitative assessments of the DNA isolates were conducted by absorbance measurement using an Eppendorf BioPhotometer (Hamburg, Germany).
All samples were genotyped with the 12 nuclear microsatellite loci described by Van der Schoot et al. (2000) and Smulders et al. (2001). Specifically, WPMS01, WPMS04, WPMS06, WPMS07, WPMS08, WPMS09, WPMS10, WPMS11, WPMS12, WPMS16, WPMS18, and WPMS20 were used in the study. Marker amplification was performed according to the methodology described by Wójkiewicz et al. (2019). The products of each polymerase chain reaction (PCR) were analyzed using an ABI 3130 capillary sequencer (Thermo Fisher Scientific, Waltham, Massachusetts, USA) with the GeneScan LIZ500 internal size standard. The genotypes were scored using GENEMAPPER v. 4.0 (Wójkiewicz 2020).
Identification of hybrids and clones
Hybrid trees were identified based on the microsatellite markers used in the study, among which WPMS01, WPMS12, and WPMS18 were previously described as diagnostic and useful for the identification of P. × canadensis cultivars (Smulders et al. 2008a; Jelić et al. 2015; Wójkiewicz et al. 2019). To identify the hybrids, the STRUCTURE 2.3.4 software was used (Pritchard et al. 2000). For the 590 sampled trees, which comprised the four reference individuals of P. deltoides, four P. × canadensis representatives (marked as P. ‘Serotina’, P. ‘Robusta’, P. ‘Marilandica’, and P. ‘Grandis’), and 582 investigated trees derived from the nine populations, we set K = 2 (the number of species). Twenty independent runs were performed with a burn-in length of 250,000 and 100,000,000 iterations, with the admixture model and correlated allele frequencies, without any prior information. Average admixture coefficients were estimated using the LargeKGreedy algorithm as implemented in the program CLUMPP version 1.1 (Jakobsson and Rosenberg 2007). To assign the detected hybrids to the reference poplar cultivars included in the study, the genotypes of all hybrid individuals were matched using the GenAlEx 6.5 software (Peakall and Smouse 2006). Finally, all identified poplar hybrids were excluded from the data set before clonality was analyzed.
To determine the genotypic resolution power of the 12 microsatellites used in the study, a test of the reliability of loci was performed (Alberto et al. 2005). The number of distinct multilocus genotypes (MLGs) was assessed, and then the clones were identified as sets of individuals that presented the same MLG using the package RClone (Bailleul et al. 2016). Furthermore, as the number of MLGs can be overestimated due to the occurrence of slightly different MLGs resulting from somatic mutations or scoring errors, clonal lineages were discriminated and similar MLGs were assembled into the corresponding multilocus lineages (MLLs). Discrimination analysis was performed by calculating Rozenfeld’s genetic distance (difference in length between alleles; Rozenfeld et al. 2007) for each pair of unique MLGs in the sample that were initially characterized molecularly and comparing these distances with those between pairs of unique MLGs identified as sexually produced by simulations (Arnaud-Haond et al. 2007). From this distribution, a threshold was determined, under which genetic distances were considered to be due to somatic mutations or scoring errors, and distinct MLGs belonging to the same MLL were identified. To assess the relative importance of asexual reproduction, genotypic richness R was calculated for each population based on the number of detected MLLs (Dorken and Eckert 2001). Finally, for the purpose of population genetic analyses, only one ramet of each genet was left in the data set, as clones do not result from sexual reproduction.
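The clone-detection step can be illustrated with a minimal sketch that groups samples sharing an identical MLG and computes genotypic richness as R = (G − 1)/(N − 1), where G is the number of distinct genotypes and N the number of sampled trees. The function names and the toy genotypes are hypothetical; the study itself used RClone, and the sketch does not implement the distance threshold used to merge similar MLGs into MLLs.

```python
from collections import defaultdict


def group_mlgs(genotypes):
    """Map each distinct multilocus genotype to the sample IDs carrying it.

    genotypes: dict of sample ID -> tuple of (allele_a, allele_b) pairs,
    one pair per locus. Alleles within a locus are sorted so that
    (200, 204) and (204, 200) are treated as the same genotype.
    """
    groups = defaultdict(list)
    for sample_id, loci in genotypes.items():
        key = tuple(tuple(sorted(pair)) for pair in loci)
        groups[key].append(sample_id)
    return dict(groups)


def genotypic_richness(n_genets, n_samples):
    """R = (G - 1) / (N - 1); undefined for fewer than two samples."""
    if n_samples < 2:
        raise ValueError("need at least two samples")
    return (n_genets - 1) / (n_samples - 1)


# Hypothetical toy data: 4 trees, 2 loci; t1 and t3 share one genotype.
toy = {
    "t1": ((200, 204), (150, 150)),
    "t2": ((200, 206), (150, 152)),
    "t3": ((204, 200), (150, 150)),
    "t4": ((202, 204), (148, 152)),
}
mlgs = group_mlgs(toy)
R = genotypic_richness(len(mlgs), len(toy))
print(len(mlgs), round(R, 3))
```

In this toy set, three distinct genotypes among four trees give R = 2/3; a value of 1 would indicate that every sampled tree is a distinct genet.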
Genetic variation and differentiation
To estimate the frequency of null alleles and detect loci deviating from Hardy–Weinberg equilibrium, we used the exact test based on the Markov chain Monte Carlo (MCMC) algorithm with Bonferroni correction implemented in GENEPOP v. 4.6 (Rousset 2008). Basic information about the markers used in the study is presented in Table 4 in the Appendix. To test for linkage disequilibrium (LD) between pairs of the 12 loci at the individual population level and across the populations, Fisher’s exact test was used in Arlequin 3.22 (Excoffier et al. 2005). Basic genetic diversity parameters (i.e., A, mean number of alleles; Ae, mean number of effective alleles; PA, number of private alleles; Ho, observed heterozygosity; and He, unbiased expected heterozygosity) were calculated in GenAlEx 6.5. FSTAT 2.9.4 (Goudet 2001) was used to estimate the inbreeding coefficient (Fis) and allelic richness (AR) for the minimum sample size of 19 individuals. A Bayesian approach implemented in the INEST 2.0 software (Chybicki and Burczyk 2009) was applied to estimate the inbreeding coefficient, including ‘null alleles’ correction (Fisnull), according to the individual inbreeding model (IIM). The estimation was run with 500,000 MCMC cycles, with every 200th cycle updated and a burn-in of 50,000. The deviance information criterion (DIC) was used to compare the full model (‘nfb’, when Fis > 0) with the random mating model (‘nb’, when Fis = 0) to assess the determinants of homozygosity level. The significance of heterozygote deficiency in the sampled populations was assessed by the U test (Guo and Thompson 1992) in GENEPOP, and p values were obtained with the Markov chain algorithm using default settings.
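As a rough illustration of the per-locus diversity parameters listed above, the following sketch computes the number of alleles, the effective number of alleles (1/Σp²), observed heterozygosity, and expected heterozygosity with the small-sample correction 2n/(2n − 1). The function name and the toy genotypes are hypothetical, missing data are not handled, and the sketch is not the GenAlEx implementation.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Diversity statistics for one locus.

    genotypes: list of (allele_a, allele_b) diploid genotypes, no missing data.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    gene_copies = 2 * n
    counts = Counter(allele for pair in genotypes for allele in pair)
    # Sum of squared allele frequencies (expected homozygosity)
    sum_p2 = sum((c / gene_copies) ** 2 for c in counts.values())
    he = 1.0 - sum_p2
    return {
        "A": len(counts),                              # number of alleles
        "Ae": 1.0 / sum_p2,                            # effective number of alleles
        "Ho": sum(a != b for a, b in genotypes) / n,   # observed heterozygosity
        "He": he,                                      # expected heterozygosity
        "uHe": he * gene_copies / (gene_copies - 1),   # unbiased expected heterozygosity
    }


# Hypothetical toy locus with two alleles at equal frequency
stats = locus_diversity([(1, 1), (1, 2), (2, 2), (1, 2)])
print(stats)
```

Population-level values would then be obtained by averaging these per-locus statistics over all loci.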
NeEstimator software ver. 2.01 (Do et al. 2014) was used to estimate the effective population size (N̂e) of each studied population with the LD approach (Waples and Do 2008). To test the sensitivity of the method to the presence of rare alleles, the results obtained with three different allele frequency cut-off thresholds (i.e., Pcrit = 0.01, 0.02, and 0.05) were compared. The 95% confidence intervals (CIs) of N̂e were derived using the ‘parametric’ option with χ² approximation (Waples 2005).
Finally, we assessed interpopulation differentiation by hierarchical analysis of molecular variance (AMOVA) using both FST and RST values computed for all pairs of populations in Arlequin 3.11. To evaluate the influence of stepwise mutations on the differentiation level of populations, RST and permuted RST (pRST—which corresponds to FST) values were compared using the test proposed by Hardy et al. (2003) and implemented in the SPAGeDi ver. 1.4c software (Hardy and Vekemans 2002). Moreover, overall and pairwise FST values were also calculated with a correction for the presence of null alleles (excluding null alleles (ENA), FSTNA) with the use of FreeNA software (Chapuis and Estoup 2007). The bootstrapped 95% confidence intervals (CIs) of FSTNA were calculated using 2000 replicates over the loci. The statistical significance (at the level of p = 0.01) of the estimated pairwise FST values was tested by 10,100 random permutations using the Arlequin software.
Demographic history of the populations
We used two different approaches to elucidate the demographic history of the studied black poplar populations and to test whether past environmental transformations of river landscapes coupled with potential population size variation left detectable signatures of genetic bottlenecks. For each population, we calculated the M ratio (MR, Garza and Williamson 2001) and ran the Wilcoxon test for heterozygote excess (Cornuet and Luikart 1996) using the INEST ver. 2.2 software. The analysis was performed under the two-phase mutation (TPM) model with two parameters: pg = 0.22 (the proportion of multistep mutations) and δg = 0.31 (the mean size of the multistep mutations). The significance of a potential bottleneck was tested using Wilcoxon’s signed-rank test based on 1,000,000 permutations.
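For orientation, the M ratio of Garza and Williamson (2001) for a single locus is the number of alleles k divided by the number of possible allelic states spanned by the allele size range, r = (Smax − Smin)/repeat length + 1. The sketch below computes this quantity and its mean over loci. The function names, allele sizes, and repeat lengths are hypothetical, alleles that do not fit the repeat ladder are not handled, and the simulation-based significance test under the TPM is not reproduced.

```python
def m_ratio(allele_sizes, repeat_length):
    """M = k / r for one locus.

    allele_sizes: iterable of observed allele sizes (bp) in the population.
    repeat_length: size of the microsatellite repeat unit (bp); assumed known.
    """
    alleles = sorted(set(allele_sizes))
    if not alleles:
        raise ValueError("no alleles supplied")
    k = len(alleles)
    # Number of allelic states between the smallest and largest allele
    r = (alleles[-1] - alleles[0]) / repeat_length + 1
    return k / r


def mean_m_ratio(loci):
    """loci: list of (allele_sizes, repeat_length) tuples; returns the mean M."""
    values = [m_ratio(sizes, rep) for sizes, rep in loci]
    return sum(values) / len(values)


# Hypothetical toy loci: the first has gaps in its allele ladder, the second none.
toy_loci = [
    ([200, 202, 204, 210], 2),  # k = 4, r = 6
    ([150, 153, 156], 3),       # k = 3, r = 3
]
mr = mean_m_ratio(toy_loci)
print(round(mr, 3))
```

Gaps in the allele ladder lower M, which is why a small M ratio is read as a signature of a past reduction in population size.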
Genetic structure and gene flow patterns
To describe the spatial autocorrelation within the populations, the average kinship coefficient over the pairs of studied individuals was computed using the SPAGeDi software. Distance intervals were adjusted by SPAGeDi to obtain approximately the same number of pairs of individuals within each of eight distance classes. The statistical significance of the autocorrelations was tested by 10,000 random permutations, with a 95% CI. Average kinship coefficients between pairs of individuals for each distance interval were plotted against distance classes in a diagram. Autocorrelation was considered significant where the observed value fell outside the 95% CI. Moreover, spatial genetic structure (SGS) was also quantified using the Sp statistic (Vekemans and Hardy 2004). Matrices of pairwise spatial distances and genetic distances based on kinship coefficients within each population were obtained for simple sequence repeat (SSR) loci using SPAGeDi. Within each population, the relationship between matrices was assessed using the Mantel test implemented in GenAlEx 6. The significance of the Mantel test was evaluated based on 1,000 permutations.
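The Sp statistic of Vekemans and Hardy (2004) is defined as Sp = −b/(1 − F1), where b is the slope of the regression of pairwise kinship on the logarithm of spatial distance and F1 is the mean kinship in the first distance class. A minimal sketch under these definitions is given below; the function names, the toy distances and kinship values, and the first-class boundary are hypothetical, and the kinship coefficients themselves are assumed to have been computed elsewhere (e.g., by SPAGeDi).

```python
import math


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def sp_statistic(distances, kinships, first_class_max):
    """Sp = -b / (1 - F1) from pairwise distances and kinship coefficients.

    distances: pairwise spatial distances (> 0), one per pair of individuals.
    kinships: pairwise kinship coefficients in the same order.
    first_class_max: upper bound of the first distance class (an assumption).
    """
    b = ols_slope([math.log(d) for d in distances], kinships)
    first_class = [f for d, f in zip(distances, kinships) if d <= first_class_max]
    if not first_class:
        raise ValueError("no pairs fall in the first distance class")
    f1 = sum(first_class) / len(first_class)
    return -b / (1.0 - f1)


# Hypothetical toy data: kinship declines linearly with ln(distance)
toy_dist = [math.exp(i) for i in range(4)]
toy_kin = [0.10 - 0.02 * i for i in range(4)]
sp = sp_statistic(toy_dist, toy_kin, first_class_max=1.5)
print(round(sp, 4))
```

Larger Sp values correspond to a steeper decline of kinship with distance, that is, to stronger fine-scale structure.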
To investigate migration patterns among the analyzed populations, Bayesian assignment testing (Rannala and Mountain 1997) was performed using the Geneclass2 software (Piry et al. 2004). Moreover, the Mantel (1967) test was applied to evaluate whether the distribution of genetic variation was geographically structured and to verify the hypothesis of isolation by distance (IBD) between populations. For this purpose, the GenAlEx software was used to test the relationship between genetic differentiation, quantified as FST/(1 − FST), and the corresponding geographical distances between populations, with 1,000 random permutations.
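The logic of the IBD test can be sketched as a simple permutation Mantel test: pairwise FST is linearized as FST/(1 − FST), correlated with geographical distance, and the observed correlation is compared with correlations obtained after randomly permuting the population labels of one matrix. The function names, the one-tailed p value definition, and the toy matrices are assumptions made for illustration and do not reproduce the GenAlEx implementation.

```python
import math
import random


def linearize_fst(fst):
    """Linearized differentiation, FST / (1 - FST)."""
    return fst / (1.0 - fst)


def upper_triangle(matrix, order):
    """Upper-triangle values of a square matrix after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(genetic, geographic, n_perm=999, seed=0):
    """Return (r_obs, one-tailed p) for the matrix correlation."""
    identity = list(range(len(genetic)))
    geo_vec = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # permute population labels of the genetic matrix
        if pearson(upper_triangle(genetic, perm), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 4 populations along a river at 0, 10, 30, and 60 km,
# with FST proportional to distance.
positions = [0.0, 10.0, 30.0, 60.0]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[linearize_fst(d / 1000.0) for d in row] for row in geo]
r_obs, p_value = mantel(gen, geo)
print(round(r_obs, 4), p_value)
```

With only four populations the number of distinct permutations is small, so the p value in this toy example is coarse; the sketch is meant to show the procedure, not to give a meaningful test.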
The substructuring of the black poplar gene pool was evaluated by a nonspatial Bayesian clustering model implemented in STRUCTURE 2.3.4. Twenty independent runs were performed for each K from 1 to 15 (the user-defined number of clusters), with a burn-in length of 250,000 and 100,000,000 iterations. The probability distributions of the data (LnP(D)) and the ΔK values (Evanno et al. 2005) were visualized using the STRUCTURE HARVESTER Web application (Earl and vonHoldt 2012). Average admixture coefficients were estimated for each value of K using the LargeKGreedy algorithm as implemented in the program CLUMPP version 1.1. STRUCTURE plots were generated using the STRUCTURE PLOT v.2.0 Web application (Ramasamy et al. 2014). Furthermore, AMOVA was conducted between the groups of populations defined by STRUCTURE, and significance was tested using 10,000 random permutations in Arlequin 3.11.
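The ΔK criterion of Evanno et al. (2005) relates the second-order rate of change of LnP(D) across successive K values to the variability among replicate runs. The sketch below uses a common way of computing it, namely the absolute second difference of the run means divided by the standard deviation of LnP(D) at K; this choice, the function name, and the toy likelihood values are assumptions made for illustration.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K - 1 and K + 1.

    lnp_by_k: dict mapping K -> list of LnP(D) values from replicate runs
    (at least two runs per K, so that a standard deviation exists).
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    sds = {k: stdev(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        # Delta K is undefined at the ends of the K range and when SD is zero
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / sds[k]
    return result


# Hypothetical toy LnP(D) values, two replicate runs per K
toy_lnp = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -894.0],
    4: [-885.0, -895.0],
}
dk = delta_k(toy_lnp)
best_k = max(dk, key=dk.get)
print(best_k, {k: round(v, 2) for k, v in dk.items()})
```

Because ΔK cannot be evaluated at the smallest K tested, it cannot by itself support K = 1, which is one reason the raw LnP(D) values are inspected alongside it.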