Introduction

European wild apple or crabapple (Malus sylvestris (L.) Mill.) is a major contributor to the chloroplast and nuclear genomes of the cultivated apple, Malus domestica (Coart et al. 2006; Cornille et al. 2012, 2013, 2014; Nikiforova et al. 2013; Sun et al. 2020). At present, M. sylvestris is an endangered tree species in Europe owing to the destruction of suitable habitats and the conversion of open forest into mature forest types (Stephan et al. 2003; Coart et al. 2003; Jacques et al. 2009).

In the Netherlands, M. sylvestris is rare, occurring as single individuals or small groups of 6–34 trees scattered across the eastern part of the country (Koopman et al. 2007). These wild apple populations face several threats. Firstly, the small forest patches in which the wild apple trees occur formerly consisted of coppice wood, but since this practice was discontinued, large trees of other species have created low-light conditions in which the apple trees grow poorly and seldom flower. Hence, there is very little regeneration of the population. Secondly, spontaneous apple trees resulting from discarded ‘cultivated’ apples appear in forest edges and may hybridize with the remaining wild apple trees. Because of these threats, ex situ conservation measures were taken in 2002 by establishing a field genebank for M. sylvestris in the Netherlands (in vivo ex situ). To this end, the original trees at different localities within the Netherlands were vegetatively propagated by grafting. For safety and seed orchard purposes, each accession was represented by 2 to 5 copies in the collection. At the same time, these wild apple populations were genetically characterized using selected microsatellite markers (eight microsatellites across linkage groups plus seven additional markers on linkage group 10; Koopman et al. 2007). These markers revealed a weak signal of introgression from cultivated apple in the genotyped accessions, but no direct descendants of cultivated apple varieties were detected. Based on this information, some of the sampled trees were excluded from the collection.

Ex situ collections that are maintained as living trees in a field genebank are expensive to establish and maintain, so they should preferably serve multiple purposes and must be managed efficiently and properly. The aim of the ex situ field collection of wild apple in the Netherlands is twofold: 1) to assure the long-term conservation of the genetic diversity of wild apple as present in the Netherlands, and 2) to facilitate use of the collection by various users for research and for seed and plant production for reintroduction in nature. Therefore, the collection is also managed as a seed orchard.

Efficient germplasm management consists of preserving the highest diversity with the smallest number of accessions. Since the establishment of the collection, about 15 years ago, several issues have been raised and questions asked about the size and composition that is needed for optimal genetic management of such a small collection of European wild apple. These were addressed by the present study and are introduced below.

First, it is necessary to ensure the genetic integrity of the accessions (Zurn et al. 2020), which may be affected by mislabelling during genebank management (e.g. accessions with the same code but different DNA profiles or accessions with different codes but the same DNA profile). The genetic integrity of an accession can also be compromised by rootstocks overgrowing the scion, a potential threat as the grafting success of M. sylvestris was sometimes relatively poor due to variable quality of budwood and poor compatibility. Furthermore, detection and removal of accessions with introgression from M. domestica can help to maintain the integrity of the genebank collection.

In addition, little is known about the degree of genetic relatedness (sibship, family structures) among the accessions and how this affects the genetic diversity of the collection. The majority of the accessions in the collection are from small, relict populations, so family relations among accessions may be expected. Insight into the degree of relatedness among accessions can contribute to reducing genetic redundancy in the collection.

Second, new material could be acquired by collecting seeds in situ and adding the resulting seedlings to the collection as new accessions, as an alternative to adding clonal copies of the original trees. However, incorporating seedlings may increase the chance of including hybrids with cultivated apple. In addition, compared to the original mature trees sampled before, present-day seedlings may be more closely related to each other because of shared parentage and genetic drift in a population that is decreasing in size and has low and skewed fruit set.

Third, insight into population structure and genetic relatedness among the accessions can be used to optimize the planting design of the collection as a seed orchard (e.g. should genotypes from the original populations be kept separate as conservation seed orchards or should they be mixed as a single seed orchard) or even to plan a mating design with specific crosses between the genotypes that are genetically most dissimilar. It may also support a harvest strategy that maximises the genetic diversity in the seed lot from the seed orchard, as well as in the in situ populations to be used for reintroduction.

Therefore, this study had three objectives: 1) to detect the extent of mislabelling, redundancy, and introgression from M. domestica in the M. sylvestris collection; 2) to assess the level of genetic structure and relatedness (parent-offspring or sibling relationships) in the collection; and 3) to define a genebank management strategy for the use of seed-derived material from the collection.

Compared to microsatellite markers (as used by e.g., Larsen et al. 2006; Koopman et al. 2007; Bitz et al. 2019; Reim et al. 2020), genome-wide SNP data enable a much finer overview of the genetic composition of the accessions, their genetic distance and genetic relationships, and the occurrence of introgression. We decided to address these questions using genomic tools that have been developed in the last 15 years for M. domestica. Notably, we used one of the arrays with single nucleotide polymorphism (SNP) markers with a known degree of polymorphism in cultivated germplasm, namely the 20K SNP Infinium array (Bianco et al. 2014), the dense, genome-wide genetic reference map for most of the 20K SNP array markers (Di Pierro et al. 2016), and the reference genome sequences for the cultivated apple (Velasco et al. 2010; Daccord et al. 2017; Zhang et al. 2019). Using these resources, duplicates may be identified (Pikunova et al. 2014), parent-offspring relations may be established (Evans et al. 2011; Howard et al. 2017, 2018; Van de Weg et al. 2018; Vanderzande et al. 2019; Skytte af Sätra et al. 2020), and possible introgression from cultivated germplasm may be identified by comparison to the genetic data available for a large part of the cultivated germplasm (Peace et al. 2019).

Material and methods

Plant material

The ex situ genebank collection of M. sylvestris in Roggebotzand (the Netherlands) is a field collection of trees managed by the State Forest Service, existing as grafted trees (with clonal replicates) from budwood that was collected at six locations in the Netherlands since 2001 (Table 1; Fig. 1; Koopman et al. 2007). Trees with phenotypic characteristics that could point to cultivated apples such as hairiness of the underside of the leaves, large leaves, or fruits with red blush (Maes 2013) were excluded during that sampling.

Table 1 Origin of the M. sylvestris accessions and in situ sampled trees
Fig. 1 Locations of the M. sylvestris populations in the Netherlands

For the current study, young leaves were collected from 137 trees in the genebank and from 14 trees in situ, one of which was considered to be a putative hybrid (Supplementary Table S1). In addition, leaves were collected from 29 one-year-old seedlings in a nursery (Zundert). In total, 180 samples were collected. The 137 trees sampled in the genebank comprise 115 accessions and 22 clonal replicates of these accessions (10 accessions with one replicate and 6 accessions with two replicate trees). The original location of the accessions and trees sampled in situ is given in Table 1. The seedlings were grown from seeds collected from dropped apples picked up at two of the in situ locations (Nijmegen and Veluwe). The young leaves were stored at −80 °C and freeze-dried, and genomic DNA was extracted according to Fulton et al. (1995).

SNP genotyping

All samples were genotyped with the 20K Infinium SNP array (Bianco et al. 2014) at the Fondazione Edmund Mach according to the procedures described by Chagné et al. (2012) and Antanaviciute et al. (2012). The array contains 18019 SNPs (Bianco et al. 2014). Most of the SNPs are located on the genetic map of Di Pierro et al. (2016). This was the first time that the array was used in M. sylvestris.

SNP calling for the 180 M. sylvestris samples was done as part of a larger set of samples that also included M. domestica varieties and selections, using Genome Studio (Illumina Inc., San Diego, CA, USA; http://www.illumina.com) with manually adjusted cluster definitions obtained through an ongoing apple pedigree project (Howard et al. 2018). Within the set of M. sylvestris samples, markers with >10% missing values were removed, along with monomorphic markers and markers with a minor allele frequency (MAF) <5%. Genotypes with >5% missing values were also removed. Triploids were detected by examining B allele frequency (BAF) plots in Genome Studio (Chagné et al. 2015).
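
The marker and sample filters described above can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the study: it assumes that genotypes are coded as counts of the B allele (0, 1, 2) with `None` for a missing call, that samples are filtered before markers (the text does not state the order), and all function names are hypothetical.

```python
def missing_rate(calls):
    """Fraction of calls that are missing (None); 1.0 for an empty list."""
    if not calls:
        return 1.0
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF from 0/1/2 B-allele counts, ignoring missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    freq_b = sum(observed) / (2 * len(observed))
    return min(freq_b, 1 - freq_b)


def filter_genotypes(genotypes, max_marker_missing=0.10,
                     max_sample_missing=0.05, min_maf=0.05):
    """genotypes: dict mapping sample name -> list of calls (one per marker).

    Returns (kept_sample_names, kept_marker_indices).
    """
    # Step 1 (assumed order): drop samples with too many missing calls.
    samples = [name for name, calls in genotypes.items()
               if missing_rate(calls) <= max_sample_missing]
    n_markers = len(next(iter(genotypes.values())))
    kept_markers = []
    for j in range(n_markers):
        column = [genotypes[name][j] for name in samples]
        if missing_rate(column) > max_marker_missing:
            continue
        # A monomorphic marker has MAF 0, so it fails the same test.
        if minor_allele_frequency(column) < min_maf:
            continue
        kept_markers.append(j)
    return samples, kept_markers


# Toy data; thresholds are relaxed here only because the example is tiny.
toy = {
    "s1": [0, 1, 2, None],
    "s2": [0, 1, 1, None],
    "s3": [0, 2, 0, 1],
}
kept_samples, kept_markers = filter_genotypes(
    toy, max_marker_missing=0.5, max_sample_missing=0.3)
print(kept_samples, kept_markers)
```

In the toy data, the first marker is dropped as monomorphic and the last because too many calls are missing.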

The first step in the analysis was the identification of duplicate genotypes. Screening for identical genotypes, among all pairs of M. sylvestris samples, was done using sets of SNPs for which data were available for both genotypes of the pair.
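
The pairwise screening can be illustrated as follows. The sketch assumes the same hypothetical 0/1/2/`None` coding as a plain Python list per sample, and it compares each pair only at SNPs scored in both samples, as described above. The cutoff `max_rate` is an illustrative parameter; the text does not state a numerical threshold.

```python
from itertools import combinations


def mismatch_rate(a, b):
    """Share of differing calls among SNPs scored in both samples.

    Returns (rate, n_compared); rate is None if no SNP is comparable.
    """
    pairs = [(x, y) for x, y in zip(a, b)
             if x is not None and y is not None]
    if not pairs:
        return None, 0
    n_diff = sum(x != y for x, y in pairs)
    return n_diff / len(pairs), len(pairs)


def find_duplicates(genotypes, max_rate=0.01):
    """Return (sample1, sample2, n_compared) for all near-identical pairs.

    max_rate is a hypothetical cutoff chosen for illustration only.
    """
    duplicates = []
    for (s1, g1), (s2, g2) in combinations(genotypes.items(), 2):
        rate, n_compared = mismatch_rate(g1, g2)
        if rate is not None and rate <= max_rate:
            duplicates.append((s1, s2, n_compared))
    return duplicates


toy = {
    "a": [0, 1, 2, None, 1],
    "b": [0, 1, 2, 2, 1],   # identical to "a" wherever both are scored
    "c": [2, 1, 0, 0, 1],
}
dups = find_duplicates(toy)
print(dups)
```

Reporting the number of compared SNPs alongside each pair makes it easy to spot matches that rest on too few shared calls.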

From the large reference dataset on M. domestica produced with the 20K array, available to us through the EU-FruitBreedomics project, we selected 33 varieties that were widely grown in commercial fruit orchards and private gardens in north-western Europe in the first half of the 20th century, or that were popular for consumption in that period (Supplementary Table S1). These varieties were added to the filtered M. sylvestris dataset to determine the degree of differentiation between the two species, the occurrence of hybridization and, in case of introgression from cultivated varieties, the regions of introgression and possibly direct parentage of some of the cultivated trees. These 33 varieties were also used as outgroup in the Structure analysis (Pritchard et al. 2000). In addition, we compared the data with SNP data of a set of 65 rootstocks, provided by NIAB-EMR, UK (unpublished), to check whether the genetic integrity of an accession was compromised by rootstocks overgrowing the scion.

Parent-offspring relations

Since apples were picked near in situ trees, which were also genotyped, we expected to find genetic relationships between the seedlings and some adult trees. Parentage reconstruction analysis to detect parent-offspring relationships was done for (1) the filtered unique M. sylvestris genotypes and (2) this set plus the set of 33 M. domestica varieties to enable the identification of feral trees (direct offspring of varieties) and of introgression events (plants with a relationship to M. domestica but less than direct offspring) (Cronin et al. 2020).

We used a custom R script to determine Mendelian errors between genotypes in any putative parent-child (P-C) relationship (duo) as well as in complete parentage (trio or P-P-C). In the case of trios, the script counts the number of Mendelian errors observed in the offspring, given the genotypes of two putative parents and under the assumption of a parent-child relationship. The analysis for duos counts the number of opposite homozygous calls, which are Mendelian errors when assuming a parent-child relationship. We used a low percentage of opposite homozygosity as an indication of putative parent-offspring relations, and higher percentages as indicative of more distant relationships or, if a variety was involved, as a sign of introgression of M. domestica into M. sylvestris (see Vanderzande et al. 2019). In addition, the data were compared to the dataset from the ongoing apple pedigree reconstruction project described in Howard et al. (2018), which includes thousands of varieties.
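
The two error counts can be illustrated with a short Python sketch. It is not the custom R script used in the study; it assumes biallelic SNPs coded as B-allele counts (0, 1, 2) with `None` for missing calls, and the function names are hypothetical. A duo error is an opposite homozygous call (0 versus 2); a trio error is any offspring genotype that cannot be formed from one gamete of each putative parent.

```python
# Gametes (B-allele count) that each diploid genotype can transmit.
GAMETES = {0: (0,), 1: (0, 1), 2: (1,)}


def duo_error_rate(parent, child):
    """Fraction of opposite homozygous calls among jointly scored SNPs."""
    compared = errors = 0
    for p, c in zip(parent, child):
        if p is None or c is None:
            continue
        compared += 1
        if {p, c} == {0, 2}:
            errors += 1
    return errors / compared if compared else None


def trio_error_rate(parent1, parent2, child):
    """Fraction of SNPs where the child cannot derive from both parents."""
    compared = errors = 0
    for p1, p2, c in zip(parent1, parent2, child):
        if p1 is None or p2 is None or c is None:
            continue
        compared += 1
        possible = {a + b for a in GAMETES[p1] for b in GAMETES[p2]}
        if c not in possible:
            errors += 1
    return errors / compared if compared else None


mother = [0, 1, 2, 2, 0]
father = [2, 1, 2, 0, 0]
child = [1, 2, 2, 1, 1]   # last SNP: both parents 0, child 1

trio_rate = trio_error_rate(mother, father, child)
duo_rate = duo_error_rate(mother, child)
print(trio_rate, duo_rate)
```

The toy trio shows why trios are more informative than duos: the error at the last SNP is visible only when both parents are considered, because it involves no opposite homozygous call.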

Population structure and genetic distance to M. domestica

For the comparison with cultivated apples, a dataset was added of SNPs of 33 M. domestica varieties that were popular varieties grown in commercial fruit orchards and private gardens in the first half of the twentieth century in north-western Europe, or that were popular varieties for consumption in that period (Supplementary Table S1). A principal component analysis (PCA) of all filtered unique M. sylvestris genebank accessions plus the 33 M. domestica varieties was performed with cmdscale in R (R Core Team 2017).

Structure version 2.3.4 (Pritchard et al. 2000) was also run on the filtered unique genebank accessions together with the M. domestica varieties. Posterior probabilities between 0.5 and 0.85 for the largest group assignment in the M. sylvestris samples were taken as putative cases of introgression from M. domestica to M. sylvestris.

We also performed a Structure analysis per chromosome, using the genetic map of Di Pierro et al. (2016) as guidance for chromosome-specific SNP sets, as introgression from M. domestica may vary among chromosomes (Supplementary Tables S3A and S3B). For this analysis, we selected around 120 markers per linkage group (every fifth marker on the genetic map), 2099 markers in total, and looked for q >0.50 on each linkage group separately as tentative M. domestica inferred ancestry.
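
The marker thinning and the per-linkage-group screening can be sketched as follows. This is only an illustration under stated assumptions: the marker order per linkage group is taken as given, thinning is assumed to start at the first marker of each group, and the ancestry coefficients q are assumed to have been read from the Structure output into a plain dictionary. All names and the example values are hypothetical.

```python
def thin_markers(map_order, step=5):
    """Keep every `step`-th marker per linkage group.

    map_order: dict mapping linkage group -> marker names in map order.
    """
    return {lg: markers[::step] for lg, markers in map_order.items()}


def flag_linkage_groups(q_domestica, threshold=0.50):
    """Linkage groups on which one individual exceeds the q threshold.

    q_domestica: dict mapping linkage group -> inferred M. domestica
    ancestry (q) of that individual on that linkage group.
    """
    return sorted(lg for lg, q in q_domestica.items() if q > threshold)


# Toy map with 12 markers on one linkage group.
toy_map = {1: ["m%d" % i for i in range(12)]}
thinned = thin_markers(toy_map)

# Made-up q values for a single individual.
flagged = flag_linkage_groups({1: 0.02, 5: 0.61, 17: 0.50})
print(thinned, flagged)
```

Because the comparison is strict, a value of exactly 0.50 is not flagged, in line with the criterion q >0.50.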

Fst values between the mature M. sylvestris genotypes and the M. domestica varieties, and among the M. sylvestris populations from which the trees were originally sourced, were calculated with Fstat 2.94 (Goudet 1995) using 600 randomly chosen SNPs.
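
For readers who want to see the shape of such a calculation, the sketch below draws a random subset of markers and computes a pairwise Fst. It does not reproduce Fstat: as a deliberately simpler stand-in it uses Hudson's estimator in its ratio-of-averages form, so the values will not match those of Fstat exactly. The 0/1/2/`None` genotype coding, the seed, and all names are assumptions made for illustration.

```python
import random


def allele_freq(calls):
    """B-allele frequency and number of allele copies, ignoring missing."""
    observed = [c for c in calls if c is not None]
    n_alleles = 2 * len(observed)
    if n_alleles == 0:
        return None, 0
    return sum(observed) / n_alleles, n_alleles


def random_marker_subset(n_markers, k=600, seed=0):
    """Reproducible random choice of at most k marker indices."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_markers), min(k, n_markers)))


def hudson_fst(pop1, pop2, marker_idx):
    """Hudson's Fst (ratio of averages) between two populations.

    pop1, pop2: lists of genotype lists, one list per individual.
    """
    num = den = 0.0
    for j in marker_idx:
        p1, n1 = allele_freq([g[j] for g in pop1])
        p2, n2 = allele_freq([g[j] for g in pop2])
        if p1 is None or p2 is None:
            continue  # marker not scored in one of the populations
        num += ((p1 - p2) ** 2
                - p1 * (1 - p1) / (n1 - 1)
                - p2 * (1 - p2) / (n2 - 1))
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den else None


# Toy populations of two individuals and two markers each.
pop_a = [[0, 0], [0, 1]]
pop_b = [[2, 2], [2, 1]]
idx = random_marker_subset(n_markers=2)
fst = hudson_fst(pop_a, pop_b, idx)
print(round(fst, 4))
```

Summing numerators and denominators over markers before dividing keeps markers with little variation from dominating the estimate.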

Results

Genotyping and quality filtering

An analysis of missing SNP values per sample identified only two samples with >5% missing values. These were excluded, leaving 178 M. sylvestris samples for further analyses. One was most likely a mixture of two genotypes and thus the result of a DNA sampling error, but the other sample, D180283 from population Drenthe, was triploid. A maximum likelihood shared haplotype test, which is part of the apple pedigree reconstruction project (described by Howard et al. 2018), indicated that its parents are most likely also M. sylvestris.

Of the 18019 SNPs on the 20K SNP array, 3552 (20%) were excluded because of >10% missing values, and 5348 markers (30%) were excluded because of <5% MAF. An additional 1007 markers (6%) were completely monomorphic in the dataset of 178 M. sylvestris samples. This resulted in 8112 SNPs for further analyses (Supplementary Table S2).

Detection of duplicates

In the total dataset 23 sets of duplicates were found. No duplicates were found in the 29 seedlings. One duplication was observed among the 14 trees that were sampled in situ in forest patches, which was not expected. In the remaining 135 samples from the field genebank, 18 genotypes were found twice and 4 genotypes three times. Hence, 23 genotypes were represented by 48 samples. From these 48 samples, the members of 23 pairs of samples were identical for all the marker scores that could be compared (between 7855 and 8053 SNPs, depending on the number of missing values in one of the two samples), for 4 pairs there was 1 mismatch, and for 3 pairs there were 2-3 mismatches between the members of a pair. Non-duplicate pairs showed at least 18% differences in the genotype scores. Duplicate samples could thus be easily and accurately distinguished from non-duplicate samples.

During the sampling, we had included a number of clonal replicates of 16 accessions. All clonal replicates were correctly retrieved except two: one was the mixed sample (a DNA sampling error) and the other had a different genotype than expected, which may indicate a case of mislabelling in the field genebank. Five sets of duplicates that were not expected beforehand were found in the genebank. Three of these sets may have been originally sampled from an ancient coppice system that was not recognised in the field, so these trees are probably genuine ramets. In the other two cases, the duplicated accessions originated from different localities, and these duplicates were probably caused by mislabelling during the nursery phase or during planting in the field collection.

Removing duplicates and mislabelled trees reduced the number of unique genebank genotypes to 110, plus 13 genotypes sampled in situ. The average heterozygosity of the 123 unique genotypes plus the 29 seedlings was 0.30 while the median was 0.29 (Fig. 2).

Fig. 2 Distribution of the level of heterozygosity of the 152 genotypes (110 accessions, 13 in situ trees and 29 seedlings collected in situ)

Parentage analysis—finding trios to detect parent-offspring and sibling relationships

When perfect trios were assumed, a distinct gap in Mendelian error rate was observed between a small group of trios and the rest (Fig. 3A). Trios with 2% or fewer Mendelian errors were considered “realistic trios” and are listed in Table 2. The great majority of the <2% errors was due to null alleles (on average ~25 per genotype), indicating that the genotype calling made very few mistakes.

Fig. 3 Distinguishing Mendelian errors when assuming (a) complete parentage P-P-C (trio) and (b) parent-child P-C (duo) Mendelian relationships in possible combinations of the M. sylvestris genotypes. Trios and pairs are indexed according to the percentage Mendelian error. In Fig. 3A, 1.7% error is the highest value for true trios that were part of the set. In Fig. 3B, the inflexion point is around 7% error, and only 2.2% (247/11476) of the possible pairs of duos has less than 7% error

Table 2 Parentage analysis of apple seedlings obtained from two in situ populations, based on trios. The two parents are in random order. The five full-sib families found are indicated by numbers

For one accession in the field genebank, both parents could be traced in the genebank itself. The tree and its parents originated from population Zeldersche Driessen, where the distance among them was less than 5 m.

We identified both parents for 25 of the 29 seedlings in the dataset (10 seedlings in the Nijmegen population and 15 in the Veluwe population). In total, 13 different trees were the parents of these 25 seedlings; 6 of the Nijmegen population and 7 of the Veluwe population. The number of offspring per parent ranged from 1 to 9. The trios we identified enabled us to reconstruct full-sibs as well as half-sibs among these offspring. Five full-sib families were present (Table 2), which may include offspring from reciprocal crosses. The distance between parental trees of a seedling in the natural populations varied between 21 and 969 m in Nijmegen and between 14 and 85 m in Veluwe. The parents of the 19 full-sibs were all located at less than 55 m distance from each other.

Relatedness analysis using duos

There was no clear gap in Mendelian error rate (based on opposite homozygosity) among the tested potential P-C relationships (Fig. 3B), so the differentiation between real and random pairs was less evident than with trios. The inflexion point was at around 7% error. This is comparable to the error limit used by Muranty et al. (2020), although they had a much larger set of SNPs. In the absence of a gap, we used as reference values the Mendelian error rates of the parent-child (duo) relationships contained in the 25 trios that we had already confirmed for the seedlings (Table 2). Among these 50 parent-child duos, the Mendelian error rate ranged between 0.27% and 0.90%, whereas among 28 full-sib relationships from this material it ranged between 1.13% and 3.99%, and among the 53 relationships between true half-sibs it ranged between 3.04% and 6.81%. In contrast, the error rate for all relationships between seedlings from different populations, which we assume to be unrelated, ranged from 7.32% to 23.63%. This is similar to the 6.26–23.58% error for relationships between the mature trees from different populations, which may also be assumed not to be closely related. Therefore, a Mendelian error rate below 0.90% was used for confirming parent-child relationships, below 3.5% for full-sibs and below 7% for half-sibs or third-degree relationships.
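
The resulting decision rule can be written down compactly. The sketch below is only an illustration of the thresholds given above; the function name, the category labels, and the treatment of values exactly at a threshold (strict "below") are assumptions.

```python
def classify_duo(error_pct):
    """Assign a tentative relationship class from the duo error rate (%)."""
    if error_pct < 0.90:
        return "parent-child"
    if error_pct < 3.5:
        return "full-sib"
    if error_pct < 7.0:
        return "half-sib or third-degree"
    return "no close relationship"


# Illustrative error percentages.
examples = [0.84, 2.51, 5.24, 7.32]
labels = [classify_duo(e) for e in examples]
for value, label in zip(examples, labels):
    print(value, label)
```

Because the reference ranges for full-sibs (up to 3.99%) and half-sibs (from 3.04%) overlap, a class assigned near the 3.5% boundary should be read as tentative.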

We identified three parent-child relationships among the genebank accessions (0.34–0.53% error) (Table 3): two in Zeldersche Driessen and one in Nijmegen. In addition, we found four putative full-sibs (1.47–2.51% error) and 14 putative half-sibs (4.29–5.24% error) through unknown parents. In each case, the pairs were collected from the same population. We also found 12 relationships with 5.38–6.16% error, always from the same population, which may represent second- or third-degree relatedness. Overall, 32 of 110 unique accessions (29%) were closely related to each other in the field genebank at the level of parent-offspring, siblings and half-sibs.

Table 3 Close genetic relationships between accessions in the M. sylvestris genebank, based on duo analysis. (P-C is parent-child, FS is fullsib, HS is halfsib)

In addition to the duo relationships among accessions in Table 3, accession D180244 was parent of seedling D180298 (0.84% error) in the Nijmegen population. Accession D180211, along with being a parent in three of the trios in Nijmegen, was also a parent of seedling D180324 (0.56% error). The other parents remain unknown.

Thus, in total, we have identified 90% of the parents of the seedlings in the dataset, namely both parents for 25 seedlings, one parent for two seedlings, and neither parent for two seedlings.
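
The 90% follows from counting parents rather than seedlings: each of the 29 seedlings has two parents, giving 58 parents in total, of which 52 were identified. A short check, with hypothetical variable names:

```python
# Number of seedlings by number of parents identified (from the text).
seedlings_by_parents_found = {2: 25, 1: 2, 0: 2}

n_seedlings = sum(seedlings_by_parents_found.values())
parents_found = sum(k * n for k, n in seedlings_by_parents_found.items())
parents_total = 2 * n_seedlings

share = parents_found / parents_total
print(n_seedlings, parents_found, parents_total, round(100 * share, 1))
```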

In addition, when comparing the data with the dataset from the ongoing apple pedigree reconstruction project described in Howard et al. (2018), which includes thousands of varieties, it was found that D180295 is an offspring of the variety ‘Vlaamsche Schijveling’.

Population structure and genetic distance to M. domestica

For a comparison with cultivated apples, the SNP data of 33 M. domestica varieties grown in commercial fruit orchards and private gardens in the first half of the twentieth century in north-western Europe, or used for consumption in that period, were added. When the SNP data were filtered for this extended set of samples to optimise the set of SNPs for population differentiation analysis, a set of 11551 SNP markers remained, or close to 3500 more markers (Supplementary Table S2). About 100 additional markers that had too many missing values in the M. sylvestris accessions were now included, as well as almost 500 markers that were monomorphic in M. sylvestris. The great majority (more than 2900 SNPs), however, now passed the MAF > 5% threshold because of the polymorphic scores contributed by the M. domestica genotypes. Thus, 29% of the SNPs that are polymorphic in M. domestica were monomorphic or had very low MAF in our limited set of M. sylvestris genotypes.

The mature M. sylvestris populations were not much differentiated. The Fst values varied between 0.011 and 0.062 (Table 4). In contrast, the Fst value between the groups of M. sylvestris and M. domestica samples was 0.307.

Table 4 Pairwise Fst values as measure of differentiation among M. sylvestris populations and from M. domestica

Within the 123 unique mature M. sylvestris genotypes, no obvious structure could be detected in the PCA (Fig. 4A). The genotypes form one group with a few outliers on the first axis. These are unlikely to be rootstock sprouts, as no relationship was detected with 65 rootstocks.

Fig. 4 Principal component analysis of 123 M. sylvestris genotypes without (A) and with (B) 33 additional M. domestica varieties. PC1 of Fig. 4B represents the difference between the two species; red dots are M. sylvestris, green triangles are M. domestica. The blue squares are M. sylvestris accessions with obvious introgression. D180202, the accession with one M. domestica parent, is the encircled blue square closest to M. domestica

The M. sylvestris genotypes and the domestic varieties clearly clustered separately in a PCA, where the first principal component explained 23.2% of the observed variance (Fig. 4B). Seven individuals were in between the M. sylvestris and M. domestica clusters. One parent of D180202 was identified as ‘Sterappel’ (syn ‘Reinette Rouge Etoilee’ or ‘Early Red Calville’). D180202 was sampled in situ in Vijlenerbos, which is in the southern part of the country, and the tree is located at the edge of the forest and was included as it had an intermediate morphology.

Structure analyses

The Structure analysis of the 123 unique mature M. sylvestris and the 33 M. domestica samples at K=2 separated the two species well (Fig. 5). The seven intermediate M. sylvestris samples in the PCA are also the seven samples with the highest level of admixture (q=0.30–0.85). When we performed the analysis for the 17 chromosomes separately, 23 individuals had q > 0.50 for tentative M. domestica inferred introgression, including the seven intermediate genotypes from the PCA and the overall Structure analysis (Supplementary Tables S3A and S3B). In ‘Sterappel’-offspring D180202, we found the signal on 12 of the 17 chromosomes. The other six intermediate genotypes had signals on two to six linkage groups. Of the other 16 individuals, five had a signal on two chromosomes and 11 had a signal on only one chromosome.

Fig. 5 Structure analysis of the M. sylvestris accessions in the field genebank along with the 33 M. domestica reference varieties, at K=2 (i.e., assuming two groups). Each individual is represented by a vertical line; the black lines separate the nine populations from which the trees originate (in the order presented in Table 1 and Supplementary Table S1). Group 10 = the 33 M. domestica varieties. The individually estimated posterior probability of membership in each of the two clusters is indicated by colors. The seven intermediate samples have the highest level of admixture (q=0.30–0.85)

Across linkage groups, only five linkage groups showed introgression in five or more individuals, notably linkage groups 5 (in 12 individuals) and 2 (in eight individuals). Introgression on linkage group 17, on which the incompatibility locus is located (Maliepaard et al. 1998), occurred in five individuals. As the set of 33 varieties was small, we did not investigate the chromosomal segments at the haplotype level.

Discussion

In this study of European wild apple genetic resources in the Netherlands, we used the 20K SNP array developed for cultivated apple (M. domestica; Di Pierro et al. 2016) for the first time in M. sylvestris. A total of 8048 SNPs passed our filtering, with a median level of heterozygosity of 0.29 (55% of the samples between 0.28 and 0.30), which is only slightly lower than the value of 0.34 (range 0.30–0.38 for more than 90% of the samples) found by Muranty et al. (2020) using a much larger set of SNPs from the apple Axiom 480K SNP array (Bianco et al. 2016) in commercial apple varieties, which have been selected and therefore may have an elevated level of heterozygosity. Among the accessions we found the first triploid M. sylvestris accession.

The 20K array was developed for genome-wide coverage in M. domestica with some minor input from M. floribunda and M. micromalus, but not from M. sylvestris. This might cause ascertainment bias, even though M. sylvestris is a major contributor to the genome of M. domestica. The microsatellite data used earlier in M. sylvestris (Koopman et al. 2007), which were selected based on map positions in M. domestica, had not shown any ascertainment bias. When the set of SNP data from 123 M. sylvestris genotypes was filtered together with SNP data of 33 M. domestica varieties obtained with the same 20K array, a set of 11551 SNPs passed the threshold. This is an increase of about 3500 SNP markers that were now sufficiently polymorphic, whereas they were (almost) monomorphic in M. sylvestris (we used a MAF threshold of 5%). It is possible that some of these markers are located in regions in cultivated apple that are not derived from M. sylvestris (Sun et al. 2020). However, adding a diverse group (even cultivars) to accessions with a narrow background can result in a comparable increase in the number of polymorphic SNPs. For example, a combination of indica and japonica rice accessions had 8–25% more polymorphic SNPs from the RiceSNP50 array compared to each subspecies separately (Chen et al. 2014). For a thorough analysis of unique M. sylvestris regions, a much larger set of M. sylvestris accessions should be screened, and it should include populations across its distribution area. These may be compared with the available SNP data of over 4000 old and modern M. domestica varieties and breeding lines that have been genotyped and will be analysed as part of an ongoing pedigree reconstruction project (Howard et al. 2018).

Evaluation of the ex situ management

In situ and ex situ conservation are complementary to each other and should both be applied to adequately conserve the genetic diversity of a species (Volis and Blecher 2010). Naturally, M. sylvestris trees are found in open forests and along forest edges (Schnitzler et al. 2014). Their current occurrence in small forest patches in the fragmented landscape in the Netherlands restricts the possibilities for implementing in situ conservation strategies. According to the Euforgen technical guidelines, the establishment of an ex situ collection that also serves as a seed orchard is the most suitable and efficient conservation measure to undertake for European wild apple (Stephan et al. 2003). In the Netherlands, the ex situ collection was established as a field genebank in Roggebotzand in 2002 and does not meet these guidelines with respect to number of clones per seed orchard. This collection is used as a seed orchard to produce pure M. sylvestris seeds for forest and landscape purposes. To this end, the collection has been used to establish three separate seed orchards, each containing 31 to 41 accessions from one (Drenthe), two (Nijmegen and Jansberg), or three populations (Winterswijk, Zeldersche Driessen, and Veluwe). The three seed orchards are officially approved as a 'seed source' under the category 'source identified' in the national register of approved basic material.

Errors may occur at many stages: during collection of material, during the propagation phase at the nursery, or during planting in the field. The results of the current study demonstrate that SNP analysis is an efficient tool to assist collection management in quality control. Overall, the clonal analysis identified the expected duplicate trees except in one case. Although only a small number of ramets were analysed, this confirms that erroneous labelling is rare, and we did not detect any sprouted rootstocks overgrowing the scions. The analysis also revealed some redundancy in the collection. Among the total of 115 accessions in the ex situ collection, 5 duplicates were found, indicating an immediate redundancy of 4.3%. Two of these unexpected duplicate groups were due to mislabelling, but three groups were sampled together, and they may be the result of coppice practices, which were common up to the early twentieth century (Koopman et al. 2007).

Extent of present and past hybridization

The study indicated a high frequency of pure M. sylvestris trees in the data set, as only 7 M. sylvestris individuals were detected with admixed ancestry in the Structure analysis. These individuals originally came from four different populations: Drenthe (3), Nijmegen (2), Slenaken (1), and Vijlenerbos (1). The individual tree in the forest edge of Vijlenerbos with mixed ancestry was already noticed during sampling as having intermediate morphological characters and was therefore included as a control. For this tree, we determined that one of its parents was the variety Sterappel, syn. Reinette Rouge, first described in 1830 and widely cultivated in Europe in the first half of the twentieth century (Smith 1971).

In the Netherlands, conditions for hybridization with domesticated apple exist. Old maps from the beginning of the nineteenth century show that practically every village in large parts of the Netherlands was surrounded by a belt of orchards. The largest expansion of apple growing took place between 1920 and 1930 and reached its peak around 1950 as the agricultural crisis around 1900 caused many farmers to switch from grain to fruit growing. Nowadays, cultivation of commercial apples is still an important activity in several regions of the Netherlands, notably in the Betuwe (Gelderland), South Limburg, and Zeeland. This means that the sampling locations Slenaken, Vijlen, Meinweg, as well as Nijmegen and Sint Jansberg are situated in apple-growing regions. On the other hand, we also found introgressed apple trees in Drenthe, which never had extensive commercial apple cultivation, although farms would have had apple trees for home use.

The low frequency of introgression from M. domestica to M. sylvestris in our study indicates that gene flow between the two species has been limited. There are several studies on hybridization between M. sylvestris and M. domestica (for a review, see Cornille et al. 2014 and references therein), which show rates of hybridization varying from 6.8% (Coart et al. 2003) to 36.7% (Cornille et al. 2013). In a comprehensive sampling of M. sylvestris across Europe (1889 individuals), Cornille et al. (2015) detected that 23.1% of the genotypes showed signs of introgression from M. domestica. A recent study investigating the frequency of hybrids in M. sylvestris populations in Northern Britain (Ruhsam et al. 2019) also found a substantial amount of hybridization, as an average of 27% of the sampled M. sylvestris trees were the product of hybridization. However, they found clear differences in the extent of hybridization among the sampled regions, which differ in geographical features and intensity of land use. The lowest percentage of samples with hybrid origin was found in areas characterized by rugged terrain, a high proportion of woodland of natural origin and, likely, a low number of cultivated apple trees in orchards. The highest percentage of hybrid trees collected in the field was found in areas with a long tradition of cultivating apple trees. Cornille et al. (2013) also suggested that environmental factors such as the distance of M. sylvestris populations to M. domestica orchards, as well as ecological or geographic factors, may affect hybridization rates. Larsen and Kjær (2009) found in their study that pollination distances above 50 m are quite common in natural populations, while pollinations above 300 m also occur, at a low frequency. Feurtey et al. (2017) found pollination distances between M. domestica and M. sylvestris of up to a few km, but only 5% were above 1 km.
In our study, the sampling of the trees was primarily focussed on old forest areas, where relict populations of M. sylvestris still exist in the Netherlands, in combination with stringent selection criteria for the typical ‘wild’ phenotypic characteristics. Trees with phenotypic characteristics that could point to cultivated apples, such as hairiness of the underside of the leaves, large leaves, or fruits with red blush, were excluded from sampling (Maes 2013). Apple trees occurring in the forest edges and along roads were also excluded as these may be derived from thrown-away cores (feral apples). Because trees most likely to be hybrids were excluded in this way, our results on the percentage of hybrids found in the genebank material may be an underestimate of real hybridization rates in the natural populations.

Expanding the genebank collection with seedlings

To expand the ex situ field genebank and capture more of the genetic diversity that is still present in M. sylvestris populations but not yet represented in the genebank, the question was raised whether seedling material collected in situ could be used instead of vegetative propagation of the original trees through budwood. Although none of the seedlings proved to be the result of hybridization with cultivated apple, nearly all seedlings were derived from parents that were already present in the genebank. In addition, a lot of effort would be required to keep the contribution of the parents balanced. Therefore, expanding the genebank with seedlings collected in situ is not recommended. Stocking in situ populations with seedlings from a seed orchard is discussed below.

Threats to in situ populations

Our results suggest that the main threat for M. sylvestris populations in the Netherlands is not introgression from cultivated apple. In addition to the 6 ‘intermediate’ genotypes, 16 other genotypes showed introgression on 1 or 2 chromosomes. These might be random signals, or the remnants of an old introgression event, for which the age cannot be determined easily. At least, they do not point at recent introgression events. The set of reference varieties we used was small, but it was tuned towards the region of north-west Europe and varieties of the beginning of the twentieth century.

A more pressing threat is the fact that the populations are small and isolated. Close genetic relationships at the level of parent-offspring, siblings and half-sibs were found in the populations, as already indicated by the shared haplotype analysis of Koopman et al. (2007). Overall, 32 (29%) of the accessions had a close relative among the other accessions in the population from which they came. In a self-incompatible species such as M. sylvestris, small population size and genetic relatedness may, due to limited mate availability and genetic drift, lead to further population decline. These results imply that maintaining genetic diversity within these populations may become a problem in future generations, even if the level of heterozygosity of the accessions does not indicate this yet. Loss of genetic diversity may compromise the ability of the populations to evolve and to cope with environmental changes, and may eventually reduce their chances of long-term existence (Booy et al. 2000).

Therefore, to maintain the genetic diversity in these populations, the most effective conservation strategy would be to increase the size of these populations by restocking with seed from the ex situ collection. With such small numbers of trees and occurrence of closely related individuals, sourcing from the same location is not effective, nor feasible, so it should be done through seed from other populations.

An essential requirement for in situ conservation of the relict M. sylvestris populations is that restocking is accompanied by improvements of the habitat conditions and proper management actions. From medieval times through the nineteenth century, coppicing was an important form of forest management in the Netherlands and other parts of Europe. Oaks, beech, and other tree species were periodically coppiced. This brought light into the forest, and a light-demanding species such as M. sylvestris benefited from this. However, current forest practice has resulted in relatively dense forest canopies. These conditions make M. sylvestris a weak competitor and make it difficult for it to rejuvenate. Therefore, restocking should be accompanied by management actions such as thinning of surrounding trees, to allow sufficient light to be transmitted through the canopy. This will enable the trees to flower, to form fruit and to regenerate. Besides unfavorable light conditions due to the change in forest practice, current difficulties encountered in regeneration are also the result of browsing; therefore, young saplings also need to be protected from wild game.

Seed orchard design and in situ restocking

The SNP analysis confirmed that little population structure exists in the ex situ collection, but it did identify the existence of close relationships. The results strongly support combining all accessions in one seed orchard instead of their current partitioning in three different seed orchards according to their geographic origin. In order to facilitate random crosses between the genetically most distant accessions and thereby to maximize the potential genetic variability in the seed lots for restocking in forest areas, it is important to neutralise the effect of closely related genotypes, so that allele frequencies remain constant. The strategy that we propose, based on the genome-wide SNP data, is to focus for the seed orchard on the 71% of the 110 accessions that have no close genetic relationships with each other. There is no need to remove accessions from the collection: we recommend harvesting all accessions, but suggest that within the seven pairs with parent-child or full-sib relationships, which share half their genome, only one accession is used for seed harvest.
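
The rule of using only one accession per closely related pair can be sketched as a simple selection step. The following is a minimal illustration, not the procedure used in this study: the function name, the accession IDs, and the tie-break (dropping the alphabetically later ID) are all assumptions made for the example, and in practice the choice within a pair would be made by the orchard manager.

```python
def select_harvest_set(accessions, close_pairs):
    """Return the accessions to harvest: all accessions, minus one member
    of each closely related pair (parent-child or full-sib).

    accessions  -- list of accession IDs (hypothetical labels)
    close_pairs -- list of (id_a, id_b) tuples flagged as close relatives
    """
    excluded = set()
    for a, b in close_pairs:
        # If one member was already excluded, this pair is already resolved.
        if a in excluded or b in excluded:
            continue
        # Arbitrary but reproducible tie-break: drop the later ID.
        excluded.add(max(a, b))
    # Preserve the original order of the accession list.
    return [acc for acc in accessions if acc not in excluded]


# Hypothetical example: B is related to both A and C.
kept = select_harvest_set(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
print(kept)
```

Because every pair ends with at least one member excluded, no two accessions in the returned list are flagged as close relatives of each other. The excluded trees remain in the collection; they are only skipped at harvest.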

For a seed orchard, different scenarios are possible for harvesting in order to restock natural populations, including 1) harvest in bulk, 2) harvest per mother tree, mix equally and distribute half-sibs equally over the populations, and 3) harvest per mother tree and distribute the half-sibs over populations while taking into account the genotypes of the mother trees. We recommend that the harvest strategy be matched to the desired population size. Harvesting in bulk, which is less costly, could be appropriate for commercial plantings and restocking of large populations, while for restocking of very small populations the background of the mother trees should be taken into account and the frequency of related individuals (half-sibs) in the seed harvest should be controlled, even though this takes more effort from the seed orchard manager.
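
Scenario 2 can be illustrated with a short allocation sketch. This is an assumption-laden example rather than an operational protocol: the function name, the mother-tree labels, and the seed counts are hypothetical, and the equal contribution is obtained here simply by capping every mother tree at the smallest harvest.

```python
def equal_mix(seed_counts, n_populations):
    """Sketch of scenario 2: equal contribution per mother tree, with each
    mother's half-sib seeds spread evenly over the target populations.

    seed_counts   -- dict mapping mother tree ID to number of seeds harvested
    n_populations -- number of populations to be restocked
    """
    # Cap every mother at the smallest harvest so contributions are equal.
    per_mother = min(seed_counts.values())
    # Round down to a multiple of n_populations so the split is exact.
    per_mother -= per_mother % n_populations
    share = per_mother // n_populations
    # Each population receives the same number of seeds from every mother.
    return {pop: {mother: share for mother in seed_counts}
            for pop in range(n_populations)}


# Hypothetical harvest from three mother trees, for three populations.
plan = equal_mix({"M1": 120, "M2": 95, "M3": 200}, 3)
print(plan[0])
```

Bulk harvesting (scenario 1) would skip the capping step and let prolific mother trees dominate the seed lot, which is why the text reserves it for large populations. Scenario 3 would additionally require the genotypes of the mother trees and is not covered by this sketch.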

Prospects for future research on genetic relationships in M. sylvestris

Our current analysis of genetic relationships is based on single SNP data. However, the network of discovered genetic relationships does allow phasing of the SNP markers, allowing us to reconstruct multi-SNP haplotypes. The genotypes that were parent to several offspring, in varying combinations, as shown in the trio and duo analyses in Table 3 combined, reveal an entire network of genetic relationships. From a conservation point of view, identifying separate haplotypes would, in the future, allow management of diversity at the haplotype level. Such an advanced approach would not be meaningful for the small Dutch genebank, but it may be explored at a European level.

Analyses of the length of shared haplotypes are powerful for identifying and reconstructing distant genetic relationships, for resolving generation order, and for inferring demographic history (Koopman et al. 2007; Gusev et al. 2012; Harris and Nielsen 2013; Leitwein et al. 2020). In the Rosaceae, various projects are developing approaches, software and workflows to perform such analyses in an adequate and efficient manner (Howard et al. 2018; Laurens et al. 2018; Peace et al. 2019). These infrastructures are expected to become publicly available soon (Howard and Peace, personal communication). They have been mostly developed in M. domestica and sweet cherry (Prunus avium L.). Our dataset may serve as a starting point for such analyses in M. sylvestris and lead to a joint international initiative on a much wider germplasm collection.