Norway spruce (Picea abies (L.) H.Karst) is one of the most economically and ecologically important timber species in Europe with a distribution range from Central Europe to Northern Europe and western Russia (Caudullo et al. 2016). In Central Europe, a significant part of the present days distribution of Norway Spruce is outside the natural species range and has been created by artificial regeneration using distant seed sources (Jansen et al. 2017).

Several studies have investigated the geographic population structure of the natural species distribution in Europe with mitochondrial markers (Tollefsrud et al. 2008) and nuclear SNP markers (Tollefsrud et al. 2009; Dering et al. 2012; Scalfi et al. 2015). Chen et al. (2019) also studied the population structure and transfer of forest reproduction material with a focus on Northern Europe based on markers from exome sequencing. Additionally, the studies from Azaiez et al. (2018) and Depardieu et al. (2021) identified adaptive variation in conifers with respect to population genomics and drought.

The genetic assignment of seed origin with a broad set of informative genetic markers is key to distinguish autochthonous from planted populations (Blanc-Jolivet & Liesebach 2015). With our new set of SNP markers, we want to fill this gap with focus on Central Europe.

Based on ddRADseq (Peterson et al. 2012) a screening of putative nuclear SNPs separating populations by geographic origin on 26 samples from 23 populations (Table 1) was performed. DNA was extracted according to (Dumolin et al. 1995). Sequencing was done by Floragenex Inc. One of the 26 samples was used for an assembly with the software velvet (Zerbino & Birney 2008) (minimum contig length: 200 bp, minimum contig coverage: 6×, expected contig length: 600 bp) resulting in 112,422 contigs with a N50 contig length of 160 bp and with a total assembly length of about 20 Mbp. Read mappings against the contigs were done with bowtie (Langmead et al. 2009). A screening for SNPs with samtools (Li 2011) identified a set of 45,044 SNPs, which was filtered by selecting all variants without neighboring SNPs within 50 bp of flanking sequence. We further restricted the selected loci with calling rates > 90% and minor allele frequencies higher than 1%. In order to select loci potentially showing geographical structure, discriminant analysis of principal components (DAPC) was conducted to detect the optimal number of genetic groups. Loci with highest allele contribution were selected [function var.contr in R package Adegenet, (Jombart 2008)]. Additionally, we grouped the data per country and selected the loci with the highest average differentiation [Dj > 0.2, (Gregorius and Roberds 1986)] using GDA_NT 2021 (Degen 2021). Finally we selected loci interesting for population genetics purposes, showing high average intra-country polymorphism (within population gene diversity, Hs > 0.2) but without excess of heterozygosity (fixation index, Fis > 0) using the basic.stats function of the R package hierfstat (Goudet 2005). Altogether, this resulted in a final set of 397 variants (Table S2).

Table 1 26 samples used for the genetic screening with ddRADseq and the development of drought stress markers. Extreme drought stress behavior was evaluated by analyzing tree ring patterns of a total of 628 trees from a provenance trail

A set of adaptive SNPs for drought stress has also been delimited as we expect that these SNPs could be used to analyze geographic population structure if clinal adaptation occurred (Li et al. 2019). These markers have been developed based on exome-seq of candidate genes identified by RNA-Seq (sequencing done by GATC Biotech AG) in a drought stress experiment (six ramets of a clone originated from tissue culture, three plants as control and drought stress each during flushing period). Read mappings were done with the STAR aligner (Dobin et al. 2013), reads were counted with htseq-count (Anders et al. 2015) and differential gene expression analysis was done with Deseq2 using the Wald test (Love et al. 2014). Differentially expressed genes (|log2fold change|> 2, FDR adjusted p-value < 0.05) assigned to significantly enriched drought stress-related Gene Ontology terms [gene set enrichment analysis with Cytoscape using the plugin Bingo Shannon et al. 2003; Maere et al. 2005)], were consequently examined with exome-seq of 26 phenotyped individuals (Table 1) (exome-seq done by GATC Biotech AG). All original 88,478 SNPs were based on the reference genome of Picea abies (Nystedt et al. 2013) (gene only scaffolds). SNPs were filtered and selected using SAS statistics software (9.4 TS Level 1M5), looking for significant differences in allele or genotype frequencies between the resistant and sensitive groups. This step also removed all SNPs with near-complete heterozygous genotypes that were likely to be paralogous SNPs. For genotyping in a first run with 96 samples (Tables 1 and S1), we used targeted GBS applying single primer enrichment technology (Barchi et al. 2019; Scaglione et al. 2019). Probe design for the selected 777 potentially adaptive SNPs and sequencing for 500 SNPs with most specific probes were done by LGC Genomics GmbH (service SeqSNP). 319 SNPs fulfilled the conditions with calling rates > 90% and minor allele frequencies higher than 1% and could be recommended for further use. 121 of these SNPs have been added to the preselected set originating from ddRADseq (Table S2).

In total, 518 loci (397 loci from ddRADseq and 121 loci from exome-seq) represent a solid basis for further population genetic applications allowing to explore geographic population structure and adaptive genetic differentiation in the natural distribution range as well as outside of the native range of Picea abies. For further studies all SNPs, including flanking sequences (Table S3), as well as functional annotation of the adaptive SNPs are provided (Table S4).