Abstract
Key message
This genomic dataset provides highly variable SNP markers from georeferenced natural Quercus petraea (Matt.) Liebl. populations collected in Bulgaria, Hungary, Romania, Serbia, Bosnia and Herzegovina, Kosovo* and Albania. These SNP loci can be used to assess genetic diversity, differentiation, and population structure, and can also be used to detect signatures of selection and local adaptation. The dataset can be accessed at https://doi.org/10.5281/zenodo.3908963/ (Tóth et al. 2020 ). Associated metadata available at https://metadata-afs.nancy.inra.fr/geonetwork/srv/fre/catalog.search#/metadata/b6fee4fa-01e9-44d0-92f5-ad19379f9693 .
Similar content being viewed by others
1 Background
In the last decade, restriction site–associated DNA (RAD-seq) sequencing is increasingly used to identify and genotype large numbers of single-nucleotide polymorphisms (SNPs) in various organisms (Allendorf et al. 2010; Davey and Blaxter 2010). Due to the advancements of high-throughput sequencing, this approach allows to produce extremely large collections of genomic data that is fairly evenly distributed across the genome providing hundreds of thousands of polymorphic loci (Allendorf et al. 2010; Seeb et al. 2011). A variation of the approach, known as ddRAD-seq (double-digest RAD-seq), has been widely applied for population genomic analyses in non-model organisms, such as forest tree species, lacking available reference genomes (Konar et al. 2017). ddRAD-seq is suitable for genotyping a large number of loci to determine the extent of genetic diversity, to identify outlier loci being under selection (Schwartz et al. 2010; Burak et al. 2018) and to infer eco-evolutionary population dynamics (Legrand et al. 2017; Hendry 2019).
While forest tree populations are well-studied in Europe, the Central-Eastern European region, including the Balkan Peninsula is less investigated with high-resolution and genome-wide genetic markers. In fact, this region has high topographic and climatic heterogeneity, and long-term environmental stability that significantly contributed to long-term persistence of genetic diversity (Tzedakis 2004; Feliner 2011; Fassou et al. 2020). In addition, the Balkan Peninsula has been postulated to be the primary source of post-glacial expansion while functioning as a Pleistocene glacial refugia (Zhelev 2017; Gömöry et al. 2020). Therefore, this area is considered an important reservoir of genetic resources of European forest tree taxa (Gömöry et al. 2020).
Sessile oak (Quercus petraea (Matt.) Liebl.) is a widely distributed late‐successional forest tree species of the white oak group (sect. Lepidobalanus) with great ecological and silvicultural importance in Europe (Mölder et al. 2019; Blanc-Jolivet et al. 2020). However, the species’ distribution towards the southern limit on the Balkan Peninsula tends to be fragmented (Bruschi et al. 2003; Eaton et al. 2016), and being under intensive forest management and severe ecosystem disturbances such as drought (Milad et al. 2011). Similarly, species’ intraspecific genetic differentiation increases towards the southeast due to its complex evolutionary history. Since current studies mostly focusing on western European populations, the southeastern European distribution range is poorly sampled and investigated (Lang et al. 2020; Leroy et al. 2017, 2020; Kremer and Hipp 2020). This inequality might potentially lead to underestimation of regional genetic diversity. For these reasons, we generated a novel resource of genome-wide SNPs, using ddRAD-seq, in carefully selected natural populations.
2 Methods
2.1 Sampling strategy
The sampling of plant material was designed to cover the southeastern natural distribution range of Q. petraea. Sites were selected based on literature data and personal communication with local forestry experts (contributing authors). For literature data, available information on neutral genetic patterns in European white oaks was reviewed (Dumolin-Lapegue et al. 1997; Gömöry et al. 2001, 2020; Bordács et al. 2002; Csaikl et al. 2002; Petit et al. 2002a, b; Slade et al. 2008; Dering et al. 2008), and the geographic regions corresponding to the major genetic clusters were included in a candidate list of locations. Existing databases of forest genetic resources also contributed in sampling site selection. The European Information System on Forest Genetic Resources (EUFGIS, 2019; http://portal.eufgis.org/), the georeferenced database of forest conservation units, and the European Forest Reproductive Material Information System (FOREMATIS, 2019; https://ec.europa.eu/forematis) were searched and filtered, and the records of “source-identified” and “autochthonous/indigenous” locations were taken into account. Altogether, 180 individuals from 18 autochthonous populations were selected from Central-Eastern Europe and from the Balkan Peninsula (Table 1, Fig. 1). Out of these, four populations situated in Bulgaria (western Balkan, Rila, Rhodope, and Strandzha mountains), three in Hungary (Kőszeg, Bakony, and Mecsek mountains), Romania (Gurghiu, northern and southern foothills of Fagaras mountains), Serbia (Fruska Gora, Rudnik, and Stovoli mountains), Bosnia and Herzegovina (Kozara, Javorova, and Maglic mountains), and one in Kosovo* (Blinaja) and Albania (Djeravica Mt.), respectively (Tóth et al. 2020). Our sampling sites are found in natural sites either as part of nature reserves or without non-natural disturbance and human influence. Accordingly, sessile oak populations were regenerated naturally. Fresh leaves were collected from even-aged mature trees (> 70‐year‐old) by considering at least a 30-m isolation distance between trees to limit sampling-related individuals. Spatial coordinates (GPS) of each tree in each population were recorded. Leaf samples were both dried using silica gel and stored frozen (− 20 °C) in plastic bags before DNA extraction.
2.2 Library preparation and RAD-tag sequencing
Total genomic DNA was extracted from seven leaf disks (5-mm diam.) per each individual using the method of Dumolin et al. (1995).
DNA of each Q. petraea sample was quantified by using the Qubit dsDNA BR Assay Kit and Qubit 3.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). A total of 50 ng of DNA per sample was double-digested with 0.1–0.1 μl of PstI and MspI enzymes at 37 °C for 2 h (FastDigest restriction enzymes; Thermo Fisher Scientific, Waltham, MA, USA). The enzyme combination was selected based on the study of Cumer et al. (2018) and a preliminary restriction site analysis carried out in CLC Genomic Workbench version 12.0 (QIAGEN Bioinformatics, Hilden, Germany). Fragments were double-sided size selected using KAPA PureBeads, with a 0.55–0.80X solution/bead ratio (Roche, Basel, Switzerland) in order to isolate fragments in the range of 300–600 bp. After quantification, inserts (3 ng) were ligated to adapters (Table 2) with using T4 DNA Ligase according to the manufacturers’ protocol (Thermo Fisher Scientific, Waltham, MA, USA). Ligation products were purified using 0.8 vol KAPA PureBeads (Roche, Basel, Switzerland) and amplified by PCR with NEBNext Multiplex Oligos for Illumina (Dual Index Set 1; New England Biolabs, Ipswich, MA, USA) and KAPA HiFi Hotstart Ready Mix (Roche, Basel, Switzerland). The amount of 0.5–0.5 μl of i5 and i7 indexed primers was used per reaction. Thermal cycling conditions were as follows: a 3-min initial denaturation at 95 °C; 17 cycles of 30 s of denaturation at 95 °C, 30 s of annealing at 55 °C, and 30 s extension at 72 °C; and a final 5-min extension at 72 °C. The quality and quantity of the amplicon library were determined by using High Sensitivity DNA1000 ScreenTape system with 2200 Tapestation (Agilent Technologies, Santa Clara, CA, USA) and dsDNA HS Assay Kit with Qubit 3.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA), respectively. Equimolarily pooled libraries were diluted to 10 pM for 2 × 301-bp paired-end sequencing with 600-cycle sequencing kit v3.1 (Illumina, San Diego, CA, USA). Nucleotide sequences of the libraries were determined on a MiSeq Sequencing System (Illumina, San Diego, CA, USA) according to the manufacturer’s protocol. Raw data have been deposited in the NCBI Sequence Read Archive (SRA); BioProject ID: PRJNA699096.
2.3 De novo assembly and SNP calling
All bioinformatics processing was carried out on a Silicon Computers (SGI) HPC server, allocating 40 cores (80 threads) and 38 GB RAM, located at the University of Sopron (UOS), Sopron, Hungary.
Raw short-read sequences (77,101,088) were demultiplexed and adapter-trimmed by using the MiSeq Control Software (Illumina, San Diego, CA, USA). In total, 69,993,001 sequences contained adapter sequences kept for further processing. Next, the implemented FastQ Toolkit was applied to trim bases at the 3′ and the 5′ end with a quality score lower than 30. Reads having a mean quality score less than 30 and shorter than 200 bp were filtered. Computational processing of short-read data was carried out with Stacks 2.0 (Catchen et al. 2013; Rochette et al. 2019). Whole reads were quality-filtered using a sliding-window method (15% of read length) implemented in “process_radtags.” Reads having a quality score below 90% (raw phred score of 10) were discarded (Catchen et al. 2011). In “process_radtags,” reads were truncated to 200 bp as a prerequisite of further processing and to avoid the lower-quality bases to present at the end of the reads (Catchen et al. 2011). During this filtering step, 42,273 sequences were discarded. In addition, one individual was removed from the dataset (BU2-10) since an insufficient number of high-quality reads remained after filtering. RAD loci were reconstructed following the de novo pipeline implemented in “denovo_map.pl” command, in which “ustacks” builds loci and calls SNPs in each sample, then “cstacks” creates a catalogue of all loci for all the samples and “sstacks” matches the loci of the samples against the catalogue (Catchen et al. 2011; Rochette et al. 2019). To execute “denovo_map.pl,” it is essential to optimize the parameters of the command, and define (m) the minimum number of reads required to form an allele, (M) the maximum number of mismatches allowed between two alleles, and (n) the maximum number of mismatches allowed between two individual loci to consider them as homologous (Mastretta-Yanes et al. 2015; Paris et al. 2017). We optimized these parameters throughout the “r80” method, which can effectively maximize the number of SNPs found in 80% of the individuals (Paris et al. 2017). We checked the iterative values (ranging from 2 to 10) of M and n how they affect the gained SNPs (Fig. 2a). After, based on this, we chose the threshold of M = 3, n = 3, applying the M = n rule for the final run (Paris et al. 2017). For the stack-depth parameter, we selected the default m = 3 value (three identical reads). The program “gstacks” was used to assemble paired-end contigs and re-call SNPs using the population-wide data (Rochette et al. 2019).
The “denovo_map.pl” pipeline aligned paired-end reads for a total of 112,750 RAD loci in which 1490 loci had paired-end reads that could not be assembled (1.3%). The remaining 111,260 loci (98.7%) had an average contig size of 361.8 bp. The effective per-sample coverage resulted in 14.3 × (mean) with a 2.2 × standard deviation (min = 10.1 × , max = 27.0 ×) (Fig. 2b). Population-specific statistics are presented in Table 3.
The set of SNPs were called and further processed with the “populations” program (Catchen et al. 2011), in which markers with a minor allele frequency < 0.05, missing individual rate > 0.8, and significant deviation from Hardy–Weinberg equilibrium (HWE, P < 0.001) were filtered out (Catchen et al. 2011). However, the “minimum number of populations” value was set to 1 (SNPs were kept when they were present in at least one population), allowing to be later filtered by the user for the specific research question. Out of the 111,260 RAD loci, 105,326 loci did not pass the missing rate of sample/population and were below the MAF threshold; also, 1099 loci were blacklisted and discarded having variants with significant HWE exception. The final dataset consisted of 5934 loci, composed of 2,443,502 bases, of which 21,951 are highly polymorphic variant sites (SNPs).
3 Access to data and metadata description
The SNP dataset is available from the ZENODO repository; https://doi.org/10.5281/zenodo.3908963 (Tóth et al. 2020). The data records are described in the metadata description files, available at https://metadata-afs.nancy.inra.fr/geonetwork/srv/fre/catalog.search#/metadata/b6fee4fa-01e9-44d0-92f5-ad19379f9693.
The data file presented in this paper is a PLINK-formatted UTF-8 encoded, white-space (tab)-delimited, set of ped and map files. PLINK is a publicly available and widely used software for genomic data manipulation and analysis, capable of converting to other file formats and can perform several population-based tests (Purcell et al. 2007). Within the “.ped” file, the first column corresponds to the population code (Pop_ID: 1–18), the second to the individual identifier. From column three to six, values are not provided (father ID, mother ID, sex, and phenotype are not defined). From column seven, two columns code an observed allele at each SNP position. Missing data are coded as “0 0” as it is defined in the standard PLINK data format (PLINK 1.07; http://zzz.bwh.harvard.edu/plink/). The “.map” file contains four columns, the first is the chromosome number (un: unknown due to de novo assembly), the second is the SNP ID, the third is the SNP genetic position (0: not defined), and the fourth column contains the physical position (bp) within the RAD loci. Users do not need to combine or merge the files; it is ready to be loaded and analyzed. Population IDs (Pop_ID), individual IDs (Ind_ID), and the corresponding GPS coordinates (GPS_1: latitude and GPS_2: longitude) are located in the Appendix Table 5. Further information can be requested from the corresponding author.
4 Technical validation
Validation of the dataset was performed with manual, visual, and numerical tests. The quality of raw short-read data, of all individuals (179), was assessed using FastQC v0.11.9 (Andrews 2010) by investigating per-base quality and per-sequence quality at three steps: at raw read state directly after sequencing, after sequence quality trimming (3′ and 5′ end), and following sequence processing (whole read filtering). Results were consistent with a standard Illumina run, and therefore considered to be of sufficiently high quality for further analysis (Kircher et al. 2011).
5 Reuse potential and limits
Data papers on genome-wide polymorphisms are extremely scarce especially on geographic regions that are sources of biodiversity and in case of forest trees lacking a reference genome. Our data can significantly contribute to the growing number of RAD-seq studies by providing genotype information on species’ rarely investigated southeastern distribution range.
To evaluate the potential of the data, at the population level, we calculated genetic diversity indices including the expected heterozygosity (He), observed heterozygosity (Ho), and the inbreeding coefficient (FIS) with R (R Core Team 2013) using the “adegenet” (Jombart and Ahmed 2011) and “hierfstat” (Goudet 2005) packages (Table 4). Trends of our resulting indices were compared to Blanc-Jolivet et al. (2020), since up until now, no SNP dataset (i.e., SNP genotype collection) from genomic resources has so far been published for Q. petraea. Since our estimators are in accordance with the Blanc-Jolivet et al. (2020) data, as well as with other publications, we can confirm that our panel of markers is suitable for population genomic studies.
It is important to note, however, that polymorphisms from reduced representation methods, such as RAD-seq, might introduce biases in population genetics estimates (Arnold et al. 2013; Andrews et al. 2016), and phylogenetic reconstructions and genomic mappings might require even larger genomic coverage as well as higher SNP density (Sims et al. 2014; Eaton et al. 2017). Despite of these, our generated high-quality SNP resource consisting of 21,951 SNPs provides a valuable tool for future population genetics and genomics studies allowing to carry out wide variety of reference-genome-free investigations (Peterson et al. 2012). The available spatial information (GPS coordinates; Table S1) allows the analyses of spatial genetic pattern, structure (including admixture), and gene flow and hybridization. Due to the relatively large-scale genome representation, fine-scale determination of relationship among individuals and populations is also possible. For example, in the context of phylogeography, our genotype data may supplement studies on evolutionary history, migration from glacial refugia, and past hybridization events increasingly studied nowadays (Leroy et al. 2017, 2020). Since ddRAD-seq has the potential to generate strongly differentiating markers, our data could be used to investigate patterns of divergent selection processes, thus can infer population-level adaptation.
Overall, this dataset provides a unique opportunity to study the genetic background of natural populations and reveals important ecological and evolutionary insights into the present and future distribution of sessile oak.
Data availability
The datasets generated during and analyzed during the current study are available in the ZENODO repository, https://doi.org/10.5281/zenodo.3908963/.
Change history
28 June 2021
A Correction to this paper has been published: https://doi.org/10.1007/s13595-021-01075-y
References
Allendorf FW, Hohenlohe PA, Luikart G (2010) Genomics and the future of conservation genetics. Nat Rev Genet 11:697–709. https://doi.org/10.1038/nrg2844
Andrews S (2010) FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc
Andrews KR, Good JM, Miller MR, Luikart G, Hohenlohe PA (2016) Harnessing the power of RADseq for ecological and evolutionary genomics. Nat Rev Gen 17:81. https://doi.org/10.1038/nrg.2015.28
Arnold B, Corbett-Detig RB, Hartl D, Bomblies K (2013) RAD seq underestimates diversity and introduces genealogical biases due to nonrandom haplotype sampling. MolEcol 22:3179–3190. https://doi.org/10.1111/mec.12276
Blanc-Jolivet C, Bakhtina S, Yanbaev R, Yanbaev Y, Mader M, Guichoux E, Degen B (2020) Development of new SNPs loci on Quercus robur and Quercus petraea for genetic studies covering the whole species’ distribution range. Conserv Genet Resour 1-4. https://doi.org/10.1007/s12686-020-01141-z
Bordács S, Popescu F, Slade D, Csaikl UM, Lesur I, Borovics A, Kézdy P, König AO, Gömöry D, Brewer S, Burg K, Petit RJ (2002) Chloroplast DNA variation of white oaks in northern Balkans and in the Carpathian Basin. For Ecol Manage 156:197–209. https://doi.org/10.1016/s0378-1127(01)00643-0
Bruschi P, Vendramin GG, Bussotti F, Grossoni P (2003) Morphological and molecular diversity among Italian populations of Quercus petraea (Fagaceae). Ann Bot 91:707–716. https://doi.org/10.1093/aob/mcg075
Burak MK, Monk JD, Schmitz OJ (2018) Eco-evolutionary dynamics: the predator-prey adaptive play and the ecological theater. Yale J Biol Med 91:481–489
Catchen J, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. MolEcol 22:3124–3140. https://doi.org/10.1111/mec.12354
Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH (2011) Stacks: building and genotyping loci de novo from short-read sequences. G3: Genes Genom Genet 1:171–182. https://doi.org/10.1534/g3.111.000240
Csaikl UM, Burg K, Fineschi S, König AO, Mátyás G, Petit RJ (2002) Chloroplast DNA variation of white oaks in the alpine region. For Ecol Manage 156:131–145. https://doi.org/10.1016/s0378-1127(01)00639-9
Cumer T, Pouchon C, Boyer F, Yannic G, Rioux D, Bonin A, Capblancq T (2018) Double-digest RAD-sequencing: do wet and dry protocol parameters impact biological results?. bioRxiv 445122. https://doi.org/10.1101/445122
Davey JW, Blaxter ML (2010) RADSeq: next-generation population genetics. Brief Funct Genomics 9:416–423. https://doi.org/10.1093/bfgp/elq031
Dering M, Lewandowski A, Ufnalski K, Kedzierska A (2008) How far to the east was the migration of white oaks from the Iberian refugium? Silva Fenn 42:327. https://doi.org/10.14214/sf.240
Dumolin S, Demesure B, Petit RJ (1995) Inheritance of chloroplast and mitochondrial genomes in pedunculate oak investigated with an efficient PCR method. TheorAppl Genet 91:1253–1256. https://doi.org/10.1007/bf00220937
Dumolin-Lapegue S, Demesure B, Fineschi S, Le Come V, Petit RJ (1997) Phylogeographic structure of white oaks throughout the European continent. Genetics 146:1475–1487
Eaton E, Caudullo G, Oliveira S, de Rigo D (2016) Quercus robur and Quercus petraea in Europe: distribution, habitat, usage and threats. In: San-Miguel-Ayanz J, de Rigo D, Caudullo G, Houston Durrant T, Mauri A (2016) European atlas of forest tree species. European Commission, Brussel, pp 160–163. https://doi.org/10.2788/4251
Eaton DA, Spriggs EL, Park B, Donoghue MJ (2017) Misconceptions on missing data in RAD-seqphylogenetics with a deep-scale example from flowering plants. SystBiol 66:399–412. https://doi.org/10.1093/sysbio/syw092
Fassou G, Kougioumoutzis K, Iatrou G, Trigas P, Papasotiropoulos V (2020) Genetic diversity and range dynamics of Helleborus odorus subsp. cyclophyllus under different climate change scenarios. Forests 11:620. https://doi.org/10.3390/f11060620
Feliner GN (2011) Southern European glacial refugia: a tale of tales. Taxon 60:365–372. https://doi.org/10.1002/tax.602007
Gömöry D, Yakovlev I, Zhelev P, Jedináková J, Paule L (2001) Genetic differentiation of oak populations within the Quercus robur/Quercus petraea complex in Central and Eastern Europe. Heredity 86:557–563. https://doi.org/10.1046/j.1365-2540.2001.00874.x
Gömöry D, Zhelev P, Brus R (2020) The Balkans: a genetic hotspot but not a universal colonization source for trees. Plant SystEvol 306:5. https://doi.org/10.1007/s00606-020-01647-x
Goudet J (2005) Hierfstat, a package for R to compute and test hierarchical Fstatistics. MolEcol Notes 5:184–186. https://doi.org/10.1111/j.1471-8286.2004.00828.x
Hendry AP (2019) A critique for eco-evolutionary dynamics. FunctEcol 33:84–94. https://doi.org/10.1111/1365-2435.13244
Jombart T, Ahmed I (2011) Adegenet 1.3-1: new tools for the analysis of genome wide SNP data. Bioinformatics 27:3070–3071. https://doi.org/10.1093/bioinformatics/btr521
Kircher M, Heyn P, Kelso J (2011) Addressing challenges in the production and analysis of illumina sequencing data. BMC Genomics 12:382. https://doi.org/10.1186/1471-2164-12-382
Konar A, Choudhury O, Bullis R, Fiedler L, Kruser JM, Stephens MT, Gailing O, Schlarbaum S, Coggeshall MV, Staton ME, Carlson JE, Emrich S, Romero-Severson J (2017) High-quality genetic mapping with ddRADseq in the non-model tree Quercus rubra. BMC Genomics 18:1–12. https://doi.org/10.1186/s12864-017-3765-8
Kremer A, Hipp AL (2020) Oaks: an evolutionary success story. New Phytol 226:987–1011. https://doi.org/10.1111/nph.16274
Lang T, Abadie P, Léger V, Decourcelle T, Frigerio JM, Burban C, Bodenes C, Guichoux E, Le Provost G, Robin C, Tani N, Léger P, Lepoittevin C, El Mujtar VA, Hubert F, Tibbits J, Paiva J, Franc A, Raspail F, Mariette S, Reviron MP, Plomion C, Kremer A, Desprez-Loustau ML, Garnier-Gere P (2020) High-quality SNPs from genic regions highlight introgression patterns among European white oaks (Quercus petraea and Q. robur). bioRxiv 388447. https://doi.org/10.1101/388447
Legrand D, Cote J, Fronhofer EA, Holt RD, Ronce O, Schtickzelle N, Clobert J (2017) Eco-evolutionary dynamics in fragmented landscapes. Ecography 40:9–25. https://doi.org/10.1111/ecog.02537
Leroy T, Rougemont Q, Dupouey JL, Bodénès C, Lalanne C, Belser C, Labadie K, Le Provost G, Aury JM, Kremer A, Plomion C (2020) Massive postglacial gene flow between European white oaks uncovered genes underlying species barriers. New Phytol 226:1183–1197. https://doi.org/10.1111/nph.16039
Leroy T, Roux C, Villate L, Bodénès C, Romiguier J, Paiva JA, Dossat C, Aury JM, Plomion C, Kremer A (2017) Extensive recent secondary contacts between four European white oak species. New Phytol 214:865–878. https://doi.org/10.1111/nph.14413
Mastretta-Yanes A, Arrigo N, Alvarez N, Jorgensen TH, Piñero D, Emerson BC (2015) Restriction site-associated DNA sequencing, genotyping error estimation and de novo assembly optimization for population genetic inference. MolEcolResour 15:28–41. https://doi.org/10.1111/1755-0998.12291
Milad M, Schaich H, Bürgi M, Konold W (2011) Climate change and nature conservation in Central European forests: a review of consequences, concepts and challenges. For Ecol Manage 261:829–843. https://doi.org/10.1016/j.foreco.2010.10.038
Mölder A, Meyer P, Nagel RV (2019) Integrative management to sustain biodiversity and ecological continuity in Central European temperate oak (Quercus robur, Q. petraea) forests: an overview. For Ecol Manage 437:324–339. https://doi.org/10.1016/j.foreco.2019.01.006
Nei M (1978) Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 89:583–590
Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York
Paris JR, Stevens JR, Catchen JM (2017) Lost in parameter space: a road map for stacks. Methods EcolEvol 8:1360–1373. https://doi.org/10.1111/2041-210x.12775
Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE 7:e37135. https://doi.org/10.1371/journal.pone.0037135
Petit RJ, Brewer S, Bordács S, Burg K, Cheddadi R, Coart E, Cottrell J, Csaikl UM, van Dam BC, Deans JD, Espinel S, Fineschi S, Finkeldey R, Glaz I, Goicoechea PG, Jensen JS, König AO, Lowe AJ, Madsen SF, Mátyás G, Munro RC, Popescu F, Slade D, Tabbener H, de Vries SGM, Ziegenhagen B, de Beaulieu JL, Kremer A (2002a) Identification of refugia and post-glacial colonisation routes of European white oaks based on chloroplast DNA and fossil pollen evidence. For Ecol Manage 156:49–74. https://doi.org/10.1016/s0378-1127(01)00634-x
Petit RJ, Csaikl UM, Bordács S, Burg K, Coart E, Cottrell J, van Dam BC, Deans JD, Dumolin-Lapegue S, Fineschi S, Finkeldey R, Gillies A, Glaz I, Goicoechea PG, Jensen JS, Köni AO, Lowe AJ, Madsen SF, Mátyás G, Munro RC, Olalde M, Pemonge MH, Popescu F, Slade D, Tabbener H, Taurchini D, de Vries SGM, Ziegenhagen B, Kremer A (2002b) Chloroplast DNA variation in European white oaks: phylogeography and patterns of diversity based on data from over 2600 populations. For Ecol Manage 156:5–26. https://doi.org/10.1016/S0378-1127(01)00645-4
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Julian Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81:559–575. https://doi.org/10.1086/519795
R Core Team (2013) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Rochette NC, Rivera-Colón AG, Catchen JM (2019) Stacks 2: analytical methods for paired-end sequencing improve RADseq-based population genomics. MolEcol 28:4737–4754. https://doi.org/10.1111/mec.15253
Schwartz MK, McKelvey KS, Cushman SA, Luikart G (2010) Landscape genomics: a brief perspective. In: Cushman SA, Huettmann F. Spatial complexity: informatics, and wildlife conservation. Springer Japan, pp 165–174. https://doi.org/10.1007/978-4-431-87771-4_9
Seeb JE, Carvalho G, Hauser L, Naish K, Roberts S, Seeb LW (2011) Single-nucleotide polymorphism (SNP) discovery and applications of SNP genotyping in nonmodel organisms. MolEcolResour 11:1–8. https://doi.org/10.1111/j.1755-0998.2010.02979.x
Sims D, Sudbery I, Ilott NE, Heger A, Ponting CP (2014) Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Gen 15:121–132. https://doi.org/10.1038/nrg3642
Slade D, Škvorc Ž, Ballian D, Gračan J, Papes D (2008) The chloroplast DNA polymorphisms of White Oaks of section Quercus in the Central Balkans. Silvae Genet 57:227–234. https://doi.org/10.1515/sg-2008-0035
Tóth EG, Köbölkuti ZA, Cseke K, Kámpel JD, Takács R, Tomov VT, Ábrán P, Stojnic S, Vastag E, Mataruga M, Daničić V, Tahirukaj E, Zhelev P, Orlovic S, Benke A, Borovics A (2020) A genomic data set of single‐nucleotide polymorphisms (SNPs) generated by ddRAD tag sequencing in Q. petraea (Matt.) Liebl. populations from Central-Eastern Europe and Balkan Peninsula. ZENODO. https://doi.org/10.5281/zenodo.3908963/
Tzedakis PC (2004) The Balkans as prime glacial refugial territory of European temperate trees. In: Griffiths HI, Kryštufek B, Reed JM (Eds.) Balkan biodiversity - pattern and process in the European hotspot; Springer, Dordrecht, pp 49–68. https://doi.org/10.1007/978-1-4020-2854-0
Zhelev P (2017) Studies on the glacial refugia of forest trees on Balkan Peninsula. Contrib Sec Nat Math Biotech Sci 38:129–135. https://doi.org/10.20903/csnmbs.masa.2017.38.2.108
Acknowledgements
The authors are grateful to Ivan Iliev (University of Forestry, Bulgaria), Lazar Kesic and Velisav Karaklić (University of Novi Sad, Serbia), Zoltán Barina (Hungarian Natural History Museum, Hungary), Ibrahim Muja (Ministry of Agriculture Forestry and Rural Development (MAFRD), Kosovo*), Naser Krasniqi (Kosovo* Forest Agency, Kosovo*), Vlado Kaurin, Alen Gačić, Aleksandar Jorgić, Stojan Božić, Mijović Goran, and Spomenko Stojanović; the PFE “ŠumeRS” a.d. Sokolac (Bosnia and Herzegovina); the Institute of Lowland Forestry and Environment (project number: 451-03-68/2020-14/200197); and the Romanian forest authorities from Sibiu and Arges counties for the assistance they offered with the collection of plant material and for the valuable discussions in the course of which they shared their insights. We also thank Zoltán Bihari (Xenovea Ltd.) for providing sequencing resources and useful comments on data analysis.
Funding
This study was supported by the Ministry of Agriculture of Hungary (Kaán Károly project, no.: EVgF/549/2018, EGF/178/2019), by the Ministry of Education, Science and Technological Development of the Republic of Serbia (project no.: 451–03-68/2020–14/200197), and by the National Science Fund of Bulgaria (research grant no.: KP-06-H26/4).
Author information
Authors and Affiliations
Corresponding author
Additional information
Handling Editor: Marianne Peiffer
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contribution of the co-authors
EGT, ZAK, KC, ABE, and ABO conceived the study. EGT, ZAK, KC, ABE, JDK, RT, VT, PÁ, SS, EV, MM, VD, ET, PZ, SO, and ABE contributed in sampling of plant materials. KC performed laboratory analysis. EGT and ZAK processed genomic data and performed statistical analyses and simulations. EGT wrote the manuscript. All co-authors provided feedback on the manuscript drafts.
This designation is without prejudice to positions on status, and is in line with UNSCR 1244 (1999) and the ICJ Opinion’s on the Kosovo* Declaration of Independence.
Supplementary Information
Below is the link to the electronic supplementary material.
Appendix
Appendix
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Tóth, E.G., Köbölkuti, Z.A., Cseke, K. et al. A genomic dataset of single‐nucleotide polymorphisms generated by ddRAD tag sequencing in Q. petraea (Matt.) Liebl. populations from Central-Eastern Europe and Balkan Peninsula. Annals of Forest Science 78, 43 (2021). https://doi.org/10.1007/s13595-021-01051-6
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13595-021-01051-6