In the face of climate change, both Quercus robur and Quercus petraea are considered alternatives to more drought-sensitive tree species in Europe (Albert et al. 2018). Selecting the right oak provenances is essential for reforestation programs (Wilkinson et al. 2017; George et al. 2020). Gene markers, and especially large sets of SNPs, play an important role in distinguishing and identifying different tree provenances (Blanc-Jolivet et al. 2018; Pettenkofer et al. 2020). Within the framework of a Russian-German cooperation project, we are developing a suitable set of SNPs and indels for this purpose. Here we present an enlarged set of markers compared to our first study (Blanc-Jolivet et al. 2020).

For SNP and indel discovery, we used leaf or cambium material from 95 Q. robur and Q. petraea trees originating from across Europe, including Ukraine and Russia, for the nuclear SNPs, and an additional 40 individuals from Europe (Q. robur) and Far East Russia (Q. mongolica) (Schroeder et al. 2016) to check for plastid SNPs (Suppl. 1). For the selection of nuclear SNPs derived from double digest restriction-site associated DNA sequencing (ddRAD) (Peterson et al. 2012), we used the same samples and data described in Blanc-Jolivet et al. (2020). From the 3648 loci that passed the design filter (50 bp flanking length around the target SNP and a maximum of two SNPs in the flanking regions), 484 nuclear loci were selected. Discriminant analysis was conducted with the samples grouped per species, per country within species, and per state within Germany. For each grouping strategy, the loci with the highest contributions were identified. In addition, samples from Germany and Russia were analysed separately to select loci with both high expected heterozygosity and a positive fixation index (Fis). Combining the best loci from each grouping strategy allows the development of a multipurpose set of loci that may distinguish between Q. robur and Q. petraea, show differentiation among countries and have enough intrapopulation genetic diversity for population genetics purposes. We used the packages vcfR, poppr, adegenet and hierfstat in R 3.6.0 to conduct the analysis (Goudet 2005; Kamvar et al. 2014; Knaus and Grünwald 2017).
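The design filter mentioned above (50 bp of flanking sequence around the target SNP and at most two further SNPs in the flanks) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function name `is_designable`, the 1-based coordinates, and the reading that the limit of two SNPs applies to both flanks combined are assumptions made for illustration only.

```python
FLANK = 50           # bp required on each side of the target SNP (from the text)
MAX_FLANK_SNPS = 2   # additional SNPs tolerated in the flanks (from the text)


def is_designable(target_pos, locus_length, snp_positions,
                  flank=FLANK, max_flank_snps=MAX_FLANK_SNPS):
    """Return True if a target SNP meets the flanking-sequence criteria.

    Hypothetical helper: positions are 1-based coordinates on the locus
    sequence, and the SNP limit is assumed to apply to both flanks combined.
    """
    # The locus must extend at least `flank` bp on both sides of the target.
    if target_pos - flank < 1 or target_pos + flank > locus_length:
        return False
    # Count the other SNPs that fall inside either flank.
    neighbours = [p for p in snp_positions
                  if p != target_pos and abs(p - target_pos) <= flank]
    return len(neighbours) <= max_flank_snps


# Invented example: a 120 bp locus with the target SNP at position 60.
ok = is_designable(60, 120, [60, 70, 90])              # two neighbouring SNPs
too_short = is_designable(30, 120, [30])               # left flank under 50 bp
too_dense = is_designable(60, 120, [20, 60, 70, 90])   # three neighbouring SNPs
```

In this invented example only the first call passes; the second fails because the left flank is shorter than 50 bp, and the third because three SNPs lie within the flanks.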

In addition to the SNPs produced by the ddRAD approach described above, we used data from a previously performed MiSeq analysis with a Q. mongolica individual as a reference and two pooled DNA samples, each comprising 20 individual specimens of European Q. robur or Asian Q. mongolica. The Q. robur specimens were sampled from ten geographically widespread populations in Europe, and the Q. mongolica specimens from 11 geographically widespread populations in Far East Russia, China and South Korea (bold in Suppl. 1). The MiSeq analysis, assembly, variant detection and accession numbers for all data of this analysis are described in detail in Schroeder et al. (2016). Although these data were originally produced to discriminate among populations within Q. robur or within Q. mongolica, we finally chose ten chloroplast and six mitochondrial SNPs (Suppl. 3) from this previous study because these SNPs also turned out to be helpful for discriminating Q. robur from Q. petraea.

From a total of 559 loci (518 nuclear loci, 34 chloroplast loci, seven mitochondrial loci), a set of 479 loci could be designed for a SeqSNP assay. SeqSNP is a targeted genotyping-by-sequencing (targeted GBS) service that allows genotyping of SNPs and small insertions/deletions using a single primer enrichment technology (Anonymous 2019).

We chose to test our newly developed markers on 100 Q. petraea from ten locations in Germany and 200 Q. robur from ten locations in Germany and ten locations in Russia (Suppl. 2). The locations were selected from different regions of the natural distribution range in each country. All samples were run on Illumina NextSeq 500/550 platforms at LGC. For each locus, we estimated the percentage of amplification, the observed heterozygosity (Ho), the effective number of alleles (Ae), the fixation indices (Fis, Fst) according to Weir and Cockerham (1984) and Gregorius (1987), and the average differentiation (delta) among sampling locations (Gregorius 1987). A final set of 453 loci (437 nuclear SNPs/indels + 10 chloroplast SNPs + six mitochondrial SNPs) was selected from the 479 screened loci (Suppl. 3). The criteria for the final selection were polymorphism, an amplification rate above 85% and, for the nuclear markers, average Fis values between −0.3 and 0.3 (Suppl. 4).
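The per-locus statistics and the final selection criteria can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: amplification is treated as the share of samples with a called genotype, Fis is computed as the simple ratio 1 − Ho/He for a single sample rather than with the Weir and Cockerham (1984) estimator averaged over locations, and the names `locus_stats` and `passes_filter` are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Summarise one diploid locus.

    genotypes: list of (allele, allele) tuples; None marks a failed call.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    call_rate = n / len(genotypes) if genotypes else 0.0
    if n == 0:
        return {"call_rate": call_rate, "n_alleles": 0,
                "ho": None, "ae": None, "fis": None}
    counts = Counter(allele for g in called for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    homozygosity = sum(p * p for p in freqs)
    ho = sum(1 for a, b in called if a != b) / n   # observed heterozygosity
    he = 1.0 - homozygosity                        # expected heterozygosity
    ae = 1.0 / homozygosity                        # effective number of alleles
    fis = 1.0 - ho / he if he > 0 else None        # simple ratio estimator (assumption)
    return {"call_rate": call_rate, "n_alleles": len(counts),
            "ho": ho, "ae": ae, "fis": fis}


def passes_filter(stats, min_call_rate=0.85, fis_range=(-0.3, 0.3)):
    """Apply the three criteria from the text: polymorphism, amplification, Fis."""
    if stats["n_alleles"] < 2:              # monomorphic or no data
        return False
    if stats["call_rate"] <= min_call_rate:
        return False
    return fis_range[0] <= stats["fis"] <= fis_range[1]


# Invented genotypes: 3 AA, 4 AG, 3 GG, no missing calls.
example = [("A", "A")] * 3 + [("A", "G")] * 4 + [("G", "G")] * 3
example_stats = locus_stats(example)
keep = passes_filter(example_stats)
```

For this invented locus both alleles have a frequency of 0.5, so Ho = 0.4, Ae = 2 and Fis is about 0.2, and the locus would be retained.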

We computed genetic distances (Gregorius 1984) from the allele frequencies at the 437 nuclear SNPs/indels among the 30 sampling locations and used the resulting distance matrix for a UPGMA cluster analysis (Fig. 1) with the program PAST 4.3 (Hammer et al. 2001). The dendrogram showed that the newly developed SNP and indel markers for Q. robur and Q. petraea were useful for distinguishing between species and populations at the European level.
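The clustering step can be sketched as follows. The sketch assumes that the genetic distance of Gregorius (1984) is d0, half the sum of absolute allele-frequency differences between two locations, and that distances are averaged over loci. It uses a small pure-Python UPGMA rather than PAST, and the three populations and their allele frequencies are invented.

```python
def gregorius_d0(p, q):
    """d0 between two allele-frequency dicts at one locus."""
    alleles = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in alleles)


def mean_distance(pop_a, pop_b):
    """Average d0 over loci; both lists must follow the same locus order."""
    return sum(gregorius_d0(p, q) for p, q in zip(pop_a, pop_b)) / len(pop_a)


def upgma(names, dist):
    """Cluster with UPGMA; dist is a symmetric matrix (list of lists).

    Returns the tree as nested tuples of population names.
    """
    trees = {i: name for i, name in enumerate(names)}
    sizes = {i: 1 for i in trees}
    d = {(i, j): dist[i][j] for i in trees for j in trees if i < j}
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(d, key=d.get)                      # closest pair of clusters
        for k in trees:
            if k in (i, j):
                continue
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            # Size-weighted average = unweighted mean over all member pairs.
            d[(k, next_id)] = (sizes[i] * dik + sizes[j] * djk) / (sizes[i] + sizes[j])
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))


# Invented allele frequencies at two loci for three hypothetical locations.
pops = {
    "robur_1": [{"A": 1.0}, {"C": 0.5, "T": 0.5}],
    "robur_2": [{"A": 0.9, "G": 0.1}, {"C": 0.5, "T": 0.5}],
    "petraea_1": [{"G": 1.0}, {"T": 1.0}],
}
names = list(pops)
matrix = [[mean_distance(pops[a], pops[b]) for b in names] for a in names]
tree = upgma(names, matrix)
```

With these invented frequencies the two hypothetical Q. robur locations are joined first (mean d0 = 0.05), and the Q. petraea location joins that cluster last.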

Fig. 1 UPGMA cluster analysis based on the matrix of genetic distances (Gregorius 1984) among the 437 nuclear SNPs/indels