Introduction

Almond [Prunus dulcis (Mill.) D.A.Webb; syn. P. amygdalus Batsch] is a species of the large and economically important Rosaceae family that is grown commercially worldwide for its kernels. The cultivated almond is thought to have originated in the arid mountainous regions of Central Asia (Gradziel 2011). More than 20 related wild species also grow in the mountains and deserts of Central Asia, from western China through Iran and Turkey (Martínez-Gómez et al. 2007). Molecular evidence indicates that the cultivated almond spread from Central Asia (Iran) to the Eastern Mediterranean, subsequently to the Western Mediterranean and North America, and finally to the southern hemisphere, including South America and Australia.

Hybridization has played a central role in the evolutionary history of domesticated plants. Nuclear and chloroplast DNA markers provided complementary support for P. fenzliana as a probable ancestor of the cultivated almond (Zeinalabedini et al. 2010). Delplancke et al. (2012) detected high levels of genetic diversity along with substantial and symmetric gene flow between the domesticated P. dulcis and the wild P. orientalis. Almond is a self-incompatible species, and its self-incompatibility is governed by the highly polymorphic, multiallelic S-locus (Dicenta and García 1993a). Owing to this genetically controlled self-incompatibility system, almond is one of the most polymorphic cultivated fruit species (Martínez-Gómez et al. 2007). Genetic diversity studies in almond have not revealed a direct relationship between the level of diversity and the origin of the germplasm (Szikriszt et al. 2011).

Molecular markers developed for Prunus are particularly useful for studying genome evolution and structure and genetic diversity, and for fingerprinting accessions (Martínez-Gómez et al. 2003). Several marker types have been used to study almond species. Over the last decade, simple sequence repeat (SSR) analysis has been the most intensively used and most appropriate marker technique for genetic variability studies in almond, because SSRs are extremely abundant, dispersed relatively evenly throughout the genome, and codominantly inherited (Gupta et al. 1996). Since the first SSRs were described in peach (Cipriani et al. 1999), they have been developed in many other Rosaceae species, such as apricot, Japanese plum, and cherry (Dirlewanger et al. 2002; Messina et al. 2004; Mnejja et al. 2004). Transferability (the ability to use an SSR developed in one species in other species) has been frequently reported, particularly for peach SSRs (Cipriani et al. 1999). The first set of almond SSRs was published by Testolin et al. (2004).

To date, several studies have characterized the SSR diversity of almond cultivars and genotypes originating from specific geographical regions. Xu et al. (2004) developed SSR markers for the phylogenetic analysis of almond accessions from China and the Mediterranean region. Genetic diversity of the Spanish national almond collection was characterized by Fernández i Martí et al. (2009). Twelve highly polymorphic SSR loci were selected to uniquely identify cultivars commonly grown in California and to allow an accurate assessment of parent/offspring relationships among them (Dangl et al. 2009). Zeinalabedini et al. (2010) characterized Spanish, French, Italian, American, Iranian, Tunisian, Australian, Ukrainian, Portuguese, and Slovakian almond cultivars using chloroplast and nuclear SSRs. These studies established the value of SSR markers for distinguishing different genetic lineages and revealed an extensive gene pool available for almond genetic improvement. Promising Iranian almond genotypes, wild almonds, and related Prunus species were also characterized with SSR and EST-SSR markers (Zeinalabedini et al. 2012; Rahemi et al. 2012). Nuclear DNA markers showed that Moroccan genotypes were genetically different from the tested commercial cultivars and therefore formed a separate gene pool (El Hamzaoui et al. 2013).

Genetic diversity parameters have not shown marked differences among almond germplasm grown in different regions of the world. The present study was carried out to examine this phenomenon by analyzing accessions from Central Asia to California using highly polymorphic SSR markers. In addition, our analysis included almonds of Hungarian origin, a previously unexamined region. We also aimed to characterize genetic differentiation and the natural and human-imposed factors responsible for the maintenance of genetic diversity and population structure in almond.

Materials and methods

Plant material

Eighty-six almond accessions originating from different geographical regions were evaluated. The Hungarian, Ukrainian, and Italian cultivars are maintained in the collection of the Szent István University, Faculty of Horticultural Science, Department of Genetics and Plant Breeding. Other Hungarian samples were collected from old abandoned orchards in Gellérthegy (47° 29′ 8.6748″ N and 19° 3′ 0.9468″ E), Tétényi-fennsík (47° 24′ 48.7″ E), and Monor (47° 20′ 53.03″ N and 19° 26′ 24.464″ E), and from the germplasm collection of the National Agricultural Research and Innovation Centre, Fruitculture Research Institute, Cegléd, Hungary. Leaf samples of the Moroccan accessions were collected from natural populations in the Rif Mountains in northern Morocco (33° 14′ 0.456″ N and 8° 29′ 48.012″ W) and the Sais Plain in central Morocco (33° 27′ 28.8936″ N and 5° 11′ 1.824″ W). Turkish wild almond genotypes originated from the Bademli region, Erzurum (40° 27′ 13.3704″ N and 40° 54′ 9.3924″ E), and Akdamar Island in Lake Van (38° 20′ 30.9084″ N and 43° 2′ 7.9044″ E). Some wild-growing accessions were collected in Kyrgyzstan, near the city of Osh (40° 28′ 56.1432″ N and 72° 43′ 20.2764″ E). Californian cultivars and three accessions of wild almond species (Prunus tenella, P. arabica, and P. webbii) were obtained from an experimental orchard of the Agricultural Research Service (United States Department of Agriculture, Parlier, CA).

DNA extraction and PCR conditions

Genomic DNA was extracted from fully expanded young leaves using a DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). DNA concentration and purity were measured using a Nanodrop ND-1000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA). A set of 12 genomic SSR and 5 EST-SSR primer pairs was selected on the basis of previous reports on different Prunus species. These included 9 primer pairs developed for peach, 6 for almond, and 2 for plum (Table 1), covering eight linkage groups (G1 to G8). The forward primers were labeled with 6-FAM fluorescent dye for detection in a capillary genetic analyzer. PCR reactions were carried out in a PTC 200 thermocycler (MJ Research, Quebec, Canada) using the cycling programs described for the respective primers. Approximately 20–80 ng of genomic DNA was used for each 25 μl PCR reaction. Each reaction contained 10× DreamTaq™ Green buffer (Thermo Fisher Scientific), 4.5 mM MgCl2, 0.2 mM of dNTPs, 0.2 μM of the appropriate forward and reverse primers, and 0.75 U of DreamTaq™ DNA polymerase (Thermo Fisher Scientific). The DreamTaq™ Green buffer contains KCl and (NH4)2SO4 at a ratio optimized for robust performance of DreamTaq™ DNA Polymerase.

Table 1 SSR loci from different Prunus species analyzed in the almond accessions studied, locus type, linkage group of their localization, annealing temperature, number of alleles detected, observed (Ho) and expected (He) heterozygosity, fixation index (F), and polymorphic information content (PIC) values

Electrophoresis of PCR products and allele sizing

To check the PCR amplifications and determine the approximate sizes of the amplified alleles, 4 μl of each PCR product was separated by electrophoresis in 1.2% TAE agarose gels for 2 h at 100 V. DNA bands were visualized by ethidium bromide staining. Fragment lengths were estimated by comparison with the 1-kb DNA ladder (Promega, Madison, WI). To determine the exact size of the fragments, the fluorescently labeled products were run on an automated sequencer (ABI Prism 3100 Genetic Analyzer, Applied Biosystems, Foster City, CA). Fragment sizes were determined (genotyping) using GENOTYPER 3.7 software and the GS500 LIZ size standard (Applied Biosystems).
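An internal size standard converts migration position into fragment length. The Python sketch below is a rough illustration of that idea. It interpolates linearly between the two standard peaks that bracket a sample peak, then rounds the raw size to the nearest repeat-unit bin. This is a simplified assumption for illustration, not the sizing algorithm implemented in the ABI software used in this study. The scan positions, ladder sizes, bin anchor, and repeat length are hypothetical values.

```python
from bisect import bisect_left


def size_fragment(peak_scan, ladder_scans, ladder_sizes):
    """Estimate a fragment size (bp) by linear interpolation between the two
    internal-standard peaks that bracket the sample peak.

    ladder_scans must be sorted ascending and paired with ladder_sizes (bp).
    """
    if not ladder_scans[0] <= peak_scan <= ladder_scans[-1]:
        raise ValueError("peak lies outside the size-standard range")
    i = bisect_left(ladder_scans, peak_scan)
    if ladder_scans[i] == peak_scan:
        return float(ladder_sizes[i])
    x0, x1 = ladder_scans[i - 1], ladder_scans[i]
    y0, y1 = ladder_sizes[i - 1], ladder_sizes[i]
    return y0 + (y1 - y0) * (peak_scan - x0) / (x1 - x0)


def bin_allele(size_bp, first_allele_bp, repeat_len):
    """Round a raw size to the nearest allele bin spaced by the repeat length."""
    steps = round((size_bp - first_allele_bp) / repeat_len)
    return first_allele_bp + steps * repeat_len


# Hypothetical standard: three ladder peaks at known scan positions
scans, sizes = [1000, 1210, 1400], [100, 150, 200]
raw = size_fragment(1105, scans, sizes)
print(raw, bin_allele(raw, 100, 2))
```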

Data analysis

Genetic relatedness among genotypes was studied by UPGMA (Unweighted Pair Group Method with Arithmetic Mean) cluster analysis using PopGene 1.32 (http://www.ualberta.ca/~fyeh/). Bootstrap support for nodes in the UPGMA tree was estimated from 1000 replications using the Paleontological Statistics (PAST) software v.2.17c (Hammer et al. 2001). A phylogenetic tree (dendrogram) was drawn using the TREEVIEW program (Page 1996). PopGene 1.32 (Yeh et al. 1997) was also used to calculate the following statistics:
- observed heterozygosity (Ho)
- expected heterozygosity (He)
- observed number of alleles (Na)
- Shannon's information index (I)
- fixation index (FST)
- inbreeding coefficient (FIS)
- gene flow (Nm = 0.25(1 − FST)/FST)

To assess the informativeness of the chosen markers, the polymorphic information content (PIC) was calculated for each locus according to the formula given by Anderson et al. (1993).
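The per-locus statistics listed above are all functions of allele frequencies. The Python sketch below shows one way to compute them. It assumes each genotype is coded as a pair of allele sizes, with a single band coded as a homozygote. The formulas used are:
- He = 1 − Σp², the uncorrected estimator (programs often also report a sample-size-corrected value)
- F = 1 − Ho/He
- Shannon's I = −Σp ln p
- PIC = 1 − Σp², the form associated with Anderson et al. (1993), which numerically coincides with this He

It is an illustration, not PopGene's code.

```python
from collections import Counter
from math import log


def locus_stats(genotypes):
    """Diversity statistics for one codominant SSR locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual;
    a single band (homozygote) is coded as (x, x).
    """
    copies = [allele for pair in genotypes for allele in pair]
    freqs = [n / len(copies) for n in Counter(copies).values()]
    sum_p2 = sum(p * p for p in freqs)

    ho = sum(a != b for a, b in genotypes) / len(genotypes)
    he = 1.0 - sum_p2  # expected heterozygosity, no small-sample correction
    return {
        "Na": len(freqs),                      # observed number of alleles
        "Ho": ho,                              # observed heterozygosity
        "He": he,
        "I": -sum(p * log(p) for p in freqs),  # Shannon's information index
        # fixation index; undefined for a monomorphic locus
        "F": 1.0 - ho / he if he > 0 else float("nan"),
        "PIC": 1.0 - sum_p2,                   # Anderson et al. (1993) form
    }


print(locus_stats([(100, 100), (100, 102), (102, 104), (104, 104)]))
```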

To further analyze the genetic composition of the almond accessions, a Bayesian approach was used to estimate the number of clusters with STRUCTURE 2.3.4 software (Pritchard et al. 2000). Because the analyzed genotypes were derived from different breeding programs and natural hybridization, the “admixture ancestry” and “correlated allele frequency” models were used. K was set from 1 to 15, and each run was replicated 10 times. Each run used a burn-in period of 100,000 followed by 100,000 Markov chain Monte Carlo (MCMC) repetitions. The best K value was estimated with STRUCTURE Harvester (Earl and vonHoldt 2012) following the method of Evanno et al. (2005). Neighbor-joining clustering of geographic groups based on pairwise Nei's genetic distance values was also carried out using STRUCTURE.
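The Evanno et al. (2005) criterion picks the K at which the second-order rate of change of the likelihood peaks, scaled by its spread across replicates: ΔK = m(|L''(K)|)/s[L(K)]. The Python sketch below is a minimal version of this criterion. It assumes that the ln P(D) values of the replicates for each K have already been collected from the STRUCTURE output, and that the K values are consecutive. It takes the second difference of the mean L(K); averaging |L''(K)| run by run can give slightly different numbers. It is not Structure Harvester's code, and the likelihoods in the check are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K (Evanno et al. 2005) from replicated STRUCTURE runs.

    lnp_by_k: dict mapping consecutive K values -> list of ln P(D), one per
    replicate. Delta K is defined only for K with both a K-1 and a K+1.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        # |L''(K)|: second-order difference of the mean likelihood
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        spread = stdev(lnp_by_k[k])  # s[L(K)] across replicates
        delta[k] = second_diff / spread if spread > 0 else float("inf")
    return delta


runs = {1: [-1000.0, -1000.5], 2: [-800.0, -801.0],
        3: [-790.0, -791.0], 4: [-788.0, -790.0]}
delta = evanno_delta_k(runs)
print(max(delta, key=delta.get))
```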

Genetic diversity was partitioned using an analysis of molecular variance (AMOVA) as implemented in GenAlEx 6.503 (Peakall and Smouse 2012), which was also used to compute pairwise FST and RST values between populations. Standard errors for F statistics were determined by jackknifing over loci using 999 permutations. A Mantel test was used to estimate the association between three pairs of independent dissimilarity matrices:
- geographic distance and genetic distance
- geographic distance and RST values
- FST and RST values

Pairwise geographic distances among the analyzed populations were calculated from the latitude and longitude of each site with Vincenty's formula (http://www.movable-type.co.uk/scripts/latlong-vincenty.html). The significance of estimates was tested with 9999 permutations.
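Vincenty's inverse formula computes the geodesic distance between two latitude/longitude points on an ellipsoid. It does so by iterating on the longitude difference on an auxiliary sphere. The Python sketch below follows the published formula and assumes the WGS-84 ellipsoid, which is an assumption because the study does not state the ellipsoid. It returns kilometres and raises an error for the rare, nearly antipodal pairs where the iteration fails to converge. The `dms_to_deg` helper is a hypothetical convenience for coordinates written as in the Plant material section.

```python
import math

# WGS-84 ellipsoid parameters (assumed; the study does not name the ellipsoid)
A_AXIS = 6378137.0            # semi-major axis (m)
FLAT = 1 / 298.257223563      # flattening
B_AXIS = (1 - FLAT) * A_AXIS  # semi-minor axis (m)


def dms_to_deg(deg, minutes, seconds, hemisphere):
    """Hypothetical helper: degrees/minutes/seconds to signed decimal degrees."""
    value = deg + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def vincenty_distance_km(lat1, lon1, lat2, lon2, max_iter=200, tol=1e-12):
    """Geodesic distance (km) between two points via Vincenty's inverse formula."""
    f = FLAT
    L = math.radians(lon2 - lon1)
    # reduced latitudes on the auxiliary sphere
    U1 = math.atan((1 - f) * math.tan(math.radians(lat1)))
    U2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    sin_u1, cos_u1 = math.sin(U1), math.cos(U1)
    sin_u2, cos_u2 = math.sin(U2), math.cos(U2)

    lam = L
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam,
                               cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # on an equatorial line cos2_alpha is 0 and this term is set to 0
        cos_2sm = cos_sigma - 2 * sin_u1 * sin_u2 / cos2_alpha if cos2_alpha != 0 else 0.0
        C = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_prev = lam
        lam = L + (1 - C) * f * sin_alpha * (
            sigma + C * sin_sigma * (cos_2sm + C * cos_sigma * (-1 + 2 * cos_2sm ** 2)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("Vincenty iteration did not converge (nearly antipodal points)")

    u2 = cos2_alpha * (A_AXIS ** 2 - B_AXIS ** 2) / B_AXIS ** 2
    A = 1 + u2 / 16384 * (4096 + u2 * (-768 + u2 * (320 - 175 * u2)))
    B = u2 / 1024 * (256 + u2 * (-128 + u2 * (74 - 47 * u2)))
    delta_sigma = B * sin_sigma * (cos_2sm + B / 4 * (
        cos_sigma * (-1 + 2 * cos_2sm ** 2)
        - B / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sm ** 2)))
    return B_AXIS * A * (sigma - delta_sigma) / 1000


# Example: two Hungarian sampling sites from the Plant material section
gellerthegy = (dms_to_deg(47, 29, 8.6748, "N"), dms_to_deg(19, 3, 0.9468, "E"))
monor = (dms_to_deg(47, 20, 53.03, "N"), dms_to_deg(19, 26, 24.464, "E"))
print(round(vincenty_distance_km(*gellerthegy, *monor), 1))
```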

Results and discussion

Polymorphism analysis of the SSR markers

The genomic DNA of the 86 genotypes collected in different geographical regions was successfully amplified at 15 of the 17 SSR loci. These comprised 11 genomic SSR and 4 EST-SSR primer pairs developed from different Prunus species (peach, almond, and Japanese plum). All primer pairs produced a maximum of two alleles per genotype, in accordance with the diploid genome of the species. Genotypes showing a single band were considered homozygous at that locus. Combining data from all successfully amplified markers, the 15 primer pairs produced a total of 238 alleles, ranging from four to 21 per locus (Table 1), with sizes from 86 to 230 bp. Genomic SSRs were more polymorphic, revealing 193 alleles, while the EST-SSRs were less polymorphic, detecting 45 alleles. The mean value was 15.87 alleles per locus. Genomic SSR primers, derived from non-coding DNA regions, detected an average of 17.54 alleles per locus. This value is much higher than those reported in other Prunus diversity studies: 8.4 by Testolin et al. (2004), 4.7 by Martínez-Gómez et al. (2003), 6.3 by Xie et al. (2006), and 14.9 by El Hamzaoui et al. (2013). It is slightly higher than the 17.21 alleles per locus reported by Fernández i Martí et al. (2009). However, it is slightly lower than the 18.66 alleles per locus reported by Fernández i Martí et al. (2015) and the 18 alleles per locus obtained by Distefano et al. (2013). The average number of alleles for EST-SSR primers, which originate from coding DNA regions, was 11.25, lower than that obtained with genomic SSR markers.

Among EST-SSR loci, EPDCU 3083 amplified the highest number of alleles (20), while EPDCU 5100 detected the lowest number, only four (Table 1). The most informative locus was BPTCT007, with a polymorphic information content (PIC) value of 0.90, whereas EPDCU 5100 was the least informative marker, with a PIC value of 0.17. PIC estimates the discrimination power of a molecular marker. It takes into account not only the number of alleles per locus but also their relative frequencies in each studied population (Anderson et al. 1993).

The average PIC value of EST-SSR loci (0.66) was lower than that of genomic SSR loci (0.80). These results agree with previous studies suggesting that EST-SSRs are less polymorphic than their genomic counterparts, both in other almond genotypes (Rahemi et al. 2012) and in other species (Tahan et al. 2009). However, they contrast with the results of Xie et al. (2006), who reported that genomic and EST-SSR primers amplified similar numbers of alleles. That study analyzed only 38 almond cultivars plus five peach and almond × peach hybrid accessions. Three factors may explain the similar performance of genomic and EST-SSRs in that study:
- the smaller number of analyzed genotypes
- the inclusion of peach, a more homozygous species, in the sample set
- differences in the applied markers

The results reported here suggest that SSR markers are a very suitable tool for assessing the genetic diversity of almond genotypes. They also suggest that genomic SSRs may provide more information than SSRs located in the coding part of the genome.

Expected and observed heterozygosity were compared using the fixation index (F), which ranged between −0.09 (EPDCU 5100) and 0.36 (ASSR17) (Table 1). The observed heterozygosity ranged between 0.62 in CPPCT044 and 0.81 in BPPCT 025, with an average of 0.73 across the SSR loci. For EST-SSRs, the average was much lower (0.53), and one locus (EPDCU 5100) revealed an exceptionally low level of heterozygosity (0.19). The average observed heterozygosity calculated for the SSR markers (0.73) is very similar to the value of 0.72 obtained by Fernández i Martí et al. (2009). It is higher than the values of 0.59, 0.62, and 0.62 reported by El Hamzaoui et al. (2013), Martínez-Gómez et al. (2003), and Fernández i Martí et al. (2015), respectively. In general, He values were higher than Ho. Hence the fixation index was positive for 14 primers and negative for only one EST-SSR primer. The average F value was 0.13, lower than the 0.23 reported by Fernández i Martí et al. (2015). This low heterozygote deficit is in harmony with the self-incompatibility of this species, but it may indicate a modest level of inbreeding (discussed later).

Genetic relationships and similarities among genotypes

The genetic relatedness and diversity among the 86 almond genotypes and cultivars were evaluated using UPGMA cluster analysis based on the similarity matrix generated by the Nei and Li (1979) coefficient. Genetic similarities ranged from 0.03 (P. tenella and ‘Ne Plus Ultra’) to 1.00 (three accessions from Akdamar Island, Turkey), with an average of 0.29. Almond genotypes clustered according to their pedigree and geographic origin (Fig. 1). The 86 almond cultivars, genotypes, and wild species were classified into two groups of different size, with P. tenella forming an outgroup separated from the rest of the genotypes. These results suggest that P. tenella might not have contributed to the gene pool of modern cultivated almond. P. tenella is classified in the section Chamaeamygdalus Spach of subgenus Amygdalus (Browicz and Zohary 1996) and is considered sister to a clade composed of subgenera Amygdalus and Prunus (Lee and Wen 2001). Bortiri et al. (2006) studied the phylogenetic relationships among 37 Prunus species using internal transcribed spacer and chloroplast intergenic spacer sequences and 25 morphological characters. They confirmed that this species could be considered sister to the rest of subgenus Amygdalus.
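The dendrogram rests on two steps: a pairwise similarity matrix and average-linkage clustering. Both can be sketched in pure Python as follows. Each accession is assumed to be coded as a set of (locus, allele size) bands. The Nei and Li (1979) similarity is S = 2n_xy/(n_x + n_y), where n_xy is the number of shared bands. It is converted to a distance D = 1 − S and clustered by UPGMA, which repeatedly joins the closest pair and averages distances weighted by cluster size. Bootstrapping and the exact PopGene/PAST workflow are omitted, and the three profiles in the example are hypothetical.

```python
from itertools import combinations


def nei_li_similarity(bands_x, bands_y):
    """Nei and Li (1979): S = 2 * shared bands / (bands in x + bands in y)."""
    return 2 * len(bands_x & bands_y) / (len(bands_x) + len(bands_y))


def upgma(profiles):
    """Minimal UPGMA on D = 1 - S; returns the tree as nested tuples."""
    # each cluster label maps to (subtree, number of accessions in it)
    clusters = {name: (name, 1) for name in profiles}
    dist = {
        frozenset((a, b)): 1 - nei_li_similarity(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        for other in clusters:
            # size-weighted average of distances: the defining step of UPGMA
            dist[frozenset((merged, other))] = (
                n_a * dist[frozenset((a, other))] + n_b * dist[frozenset((b, other))]
            ) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    tree, _ = next(iter(clusters.values()))
    return tree


profiles = {
    "A": {("L1", 100), ("L1", 102), ("L2", 200)},
    "B": {("L1", 100), ("L1", 102), ("L2", 202)},
    "C": {("L1", 110), ("L2", 210), ("L2", 212)},
}
print(upgma(profiles))
```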

Fig. 1

Genetic distance and structure analysis of the 86 almond cultivars and genotypes studied using 15 SSR markers. The dendrogram was constructed based on UPGMA analysis using the similarity matrix generated by the Nei and Li (1979) coefficients. The numbers at specific nodes indicate the percentage of 1000 bootstrap replicates in which a given group was found (values < 50% not shown). The bar plots of individual almond accessions were generated using the STRUCTURE 2.3.4 software. The nine reconstructed populations are distinguished by different colors, and their clades are labeled by numbers on the dendrogram. Each accession is represented by a thin bar divided into segments indicating its genetic background. Multiple colors indicate an admixed genetic constitution

STRUCTURE analysis was carried out to determine the genetic constitution of different groups. The Evanno criterion gave a strong signal for K = 9, indicating that nine genetically distinct subgroups reside within the studied genotypes (Fig. 2). This analysis detected nine genetically distinct groups, all receiving significant bootstrap support (Fig. 1): the Kyrgyz, Akdamar, Bademli, Hungarian, Monor, Italian, Moroccan, and Californian accessions and the wild species. The results of the Bayesian clustering analysis were in harmony with the groupings detected in the molecular phylogenetic dendrogram (Fig. 1). Some accessions were admixed, e.g., all Ukrainian cultivars and some Californian, Hungarian, and Moroccan accessions.

Fig. 2

Estimation of the optimum number of clusters for almond genotypes using Evanno's method. The graph displays DeltaK for each K value and indicates K = 9 as the most probable number of genetically homogeneous groups in the analyzed samples

The second group consists only of Italian cultivars and one accession of Prunus webbii. All these cultivars are self-compatible (Socias i Company 1990). Several authors (Godini 1979; Socias i Company 1990) had suggested self-compatibility in P. webbii as a possible origin of self-compatibility in the almond population of Puglia. Godini (2000) also pointed out morphological similarities between P. webbii and some cultivars from Puglia as support for this hypothesis. Subsequent molecular studies provided unequivocal evidence for gene flow between wild and cultivated almond species (Martínez-Gómez et al. 2003). Later, Bošković et al. (2007) S-genotyped Apulian P. dulcis and P. webbii accessions. They speculated that Sf might have arisen within P. dulcis and that its occasional occurrence in P. webbii is due to introgression from P. dulcis. Our results seem to support this hypothesis: P. webbii carried some Apulian alleles (Fig. 1), whereas most self-compatible (SC) Italian cultivars did not contain “wild” alleles associated with P. webbii. Regardless of the direction of introgressive pollen transfer between the two species, P. webbii clustered with the Italian almond cultivars in the present study. This supports the hypothesis that, as almond cultivars moved toward the Mediterranean, new hybridizations might have occurred, especially with the wild Mediterranean species P. webbii. Such hybridizations would have given rise to some of the almond populations found along the northern shore of the Mediterranean Sea (Socias i Company 2004).

The genetic distance analysis shows that ‘Tuono’ and ‘Supernova’ are distinct cultivars, in agreement with the results of Fernández i Martí et al. (2009). However, Marchese et al. (2008) used SSR analysis to show that ‘Supernova’ was very similar to ‘Tuono’. Later, Distefano et al. (2013) and Dicenta et al. (2015) found ‘Supernova’ and ‘Tuono’ to be identical. Marchese et al. (2008) reported that growers in the Agrigento area, Sicily, had commented on the similarity of the nuts of ‘Supernova’ and ‘Tuono’. Although the trees were also said to be very similar, they showed small differences in growth habit and flowering time. These differences support our results and those reported by Fernández i Martí et al. (2009). The cultivars ‘Filippo Ceo’ and ‘Falsa Barese’ exhibited similar SSR profiles, and they also show similar agronomic and phenotypic traits (De Giorgio and Polignano 2001). Our results suggest that these cultivars have the same origin or that one originated from the other.

The third clade contained almost all Moroccan genotypes included in this study and a Prunus arabica accession. The subgroup formed only by Moroccan genotypes agrees with the results of El Hamzaoui et al. (2013), who compared the almond gene pools of Mediterranean and Californian cultivars. Delplancke et al. (2013) reported similar results by combining nuclear and chloroplast microsatellites. They suggested the presence of endemic alleles in Morocco that could reflect either early introductions or relict natural populations of wild almonds in Moroccan glacial refugia (Médail and Diadema 2009). The clustering of Moroccan genotypes (K4) with a P. arabica accession suggests that this species contributed to the evolution of cultivated almond. This species is native to the Middle East (Iraq, Jordan, and Syria) (Meikle 1966), and natural hybridization of P. dulcis with other species such as P. arabica may have occurred in these regions. Recently, Delplancke et al. (2016) reported from molecular analysis that P. arabica and P. dulcis clustered in the same clade. This suggests that different wild lineages could have spontaneously contributed to the genome of cultivated almond. Furthermore, archeological and ethnographical data document a long history of use of wild almonds (P. orientalis, P. arabica, P. bucharica) in different parts of their distribution area (Martinoli and Jacomet 2004). P. arabica or its interspecific hybrids were probably also introduced into Morocco, either by the Carthaginians between the fifth and fourth centuries BC (El Khatib-Boujibar 1983) or by the Arabs during the sixth and seventh centuries (Kester et al. 1991).

California cultivars (‘Nonpareil’, ‘Ne Plus Ultra’, ‘Mission’, ‘Monterey’, and ‘Thompson’) formed a separate subcluster (4). The cultivars ‘Ne Plus Ultra’ and ‘Nonpareil’ showed some admixed structure and were positioned relatively far from the rest of the California almonds, which were more uniform. ‘Ne Plus Ultra’ and ‘Nonpareil’ are considered founding cultivars of the California almond industry, and both originated from the same seedling planting in the late 1800s (Wood 1925). ‘Monterey’ is an offspring of ‘Nonpareil’, and the parentage of ‘Thompson’ is unknown (Brooks and Olmo 1952). However, our results support previous assumptions pointing to ‘Nonpareil’ as one of its putative parents, which were based on randomly amplified polymorphic DNA analysis and S-genotypes (Bartolozzi et al. 1998). ‘Mission’ also seems to be very close to ‘Thompson’. Our results supported its putative Mediterranean origin, hypothesized by Fernández i Martí et al. (2015), since alleles present on Akdamar Island (eastern Turkey) were detected in the US cultivars.

Two sub-clades show an intermixed pattern of cultivars of different origins. One sub-clade encompassed Hungarian, French, and Moroccan genotypes, while the other grouped some Ukrainian and Hungarian accessions. STRUCTURE analysis identified an admixed gene pool as the reason for this interweaving of different cultivars. ‘Szigetcsépi 55’ differed from ‘Eriane’ only slightly, at two of the assayed loci. Halász et al. (2010) determined the S-genotypes of two ‘Szigetcsépi 55’ accessions in a germplasm collection. One of them shared an identical S-genotype with ‘Eriane’, and our SSR data now suggest that they are closely related or identical accessions kept under different names. The placement of ‘Eriane’ in this subgroup with a Moroccan genotype reflects the intense connections among Mediterranean countries. Germplasm exchange among them ranged from Phoenician trading to collecting trips in the second half of the twentieth century (Gradziel 2011).

Part of the Ukrainian cultivars form a subgroup together with some Hungarian accessions and ‘Afrnitsplates’ (of supposed African origin). The Ukrainian cultivars (‘Pozdnyi’, ‘Nikitskyi 707’, and ‘Crenomorskyi’) carry some alleles from the European and wild groups, as do the Hungarian accessions ‘Diósdi félpapírhéjú’, Cegléd-451, and Gellérthegy-9. Ukrainian almond breeding was based on a germplasm collection including accessions from France (Yezhov et al. 2005) and several wild species (Gradziel 2011). The Nikitskyi clones were selected at the Nikita Botanical Garden in Yalta, but they were reported to have been introduced from France. This French origin was also supported by their clustering with Iranian and Mallorcan cultivars (Fernández i Martí et al. 2015).

The major part of cluster 5 includes 22 accessions originating from Hungary embracing old cultivars, rootstocks, and naturalized trees from countryside and abandoned orchards in three locations of Hungary. Cultivars with known complete or partial pedigrees were clustered according to their parentage. For example, ‘Szigetcsépi 58’ and ‘Szigetcsépi 92’ are seedlings of ‘Diósdi félpapírhéjú’ while ‘Tétényi kedvenc’, ‘Tétényi keményhéjú’, and ‘Tétényi rekord’ are in half-sib or full-sib relationships (Halász et al. 2010). Several bitter-kernelled accessions are positioned among sweet-kernelled Hungarian cultivars in this cluster. Cegléd accessions are old trees of unknown origin used for producing seed-propagated rootstocks. Gellérthegy accessions are bitter-kernelled almond trees used as rootstock in a peach orchard established in the second half of the nineteenth century (Rapaics 1940). After peach orchards had been devastated, the rootstock trees grew up and formed a picturesque part of the Danube banks in Budapest that is included on the UNESCO World Heritage List. The co-clustering of sweet- and bitter-kernelled almonds is not unexpected since this trait is determined by a single gene with the sweet allele being dominant to the bitter (Dicenta and García 1993b).

Cluster 6 includes mainly Turkish and some Moroccan and Hungarian accessions, as well as the cultivar ‘Santa Caterina’, which probably originated in the Mediterranean region of the Middle East. This grouping suggests that the very old trees sampled in Monor (Hungary) may be related to Mediterranean genotypes. Almond was first brought to Hungary (Pannonia) by the Romans (Rapaics 1940; Gradziel 2011) and later by Latin monks and traders in the Middle Ages. Etymology further supports this, since the Latin-derived mandorla/mandola was borrowed and altered to “mandula” in the Hungarian language.

Turkish genotypes clustered in two groups: accessions from Akdamar Island in Lake Van (cluster 7) were separated from those collected in Bademli village (Erzurum) (cluster 8). Bademli-31 was placed among the Kyrgyz accessions, indicating their genetic relatedness. This is probably explained by its location close to Silk Road trading routes (Gradziel 2011). Notably, the Bademli and Akdamar accessions are clearly separated. Within Akdamar, genotypes 2, 25–26, 30, and 32 are very similar, as are Akdamar 27 and 36, and Akdamar 40 and 41. The latter two and Akdamar 1 also seem to share some Bademli alleles. Genetic diversity analysis was carried out to clarify the phylogenetic patterns of Turkish and other almond accessions and the genetic differentiation of the major reconstructed populations.

Genetic diversity and differentiation of major reconstructed populations

The genetic diversity parameters were further investigated in the major populations reconstructed by the STRUCTURE analysis. The wild species, each represented by a single accession (P. arabica, P. webbii, and P. tenella), were excluded from some analyses. So were the Kyrgyz and Monor groups, since fewer than five individuals were available in those groups. The AMOVA showed that most genetic variation occurred within populations (71.30%). Genetic variation among populations was also significant, reaching 28.70% (Table 4). This level of variation among populations is much higher than the values estimated for P. sibirica (Wang et al. 2014) and P. mahaleb (Jordano and Godoy 2000). It is also higher than the value shown by Fernández i Martí et al. (2015) in almond.
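The within/among split reported above comes from variance components. A one-level AMOVA in the sense of Excoffier et al. (1992) can be sketched as below. The sketch assumes a precomputed matrix of squared genetic distances between individuals; how GenAlEx derives these from codominant genotypes is not reproduced here. It proceeds in four steps:
1. Sums of squared deviations are computed from pairwise distances.
2. The among-population component is corrected with the n0 coefficient for unequal sample sizes.
3. Negative estimates are truncated at zero (a common convention, assumed here).
4. PhiPT is computed as the among-population share of variance.

Permutation testing is omitted, and the one-dimensional example data are hypothetical.

```python
from itertools import combinations


def amova_one_level(d2, groups):
    """One-level AMOVA from squared pairwise distances.

    d2: square matrix (list of lists) of squared distances between individuals.
    groups: population label of each individual, in the same order as d2.
    """
    n_total = len(groups)
    labels = sorted(set(groups))
    members = {g: [i for i, lab in enumerate(groups) if lab == g] for g in labels}

    # sums of squared deviations expressed through pairwise distances
    ssd_total = sum(d2[i][j] for i, j in combinations(range(n_total), 2)) / n_total
    ssd_within = sum(
        sum(d2[i][j] for i, j in combinations(idx, 2)) / len(idx)
        for idx in members.values()
    )
    ssd_among = ssd_total - ssd_within

    df_among, df_within = len(labels) - 1, n_total - len(labels)
    msd_among, msd_within = ssd_among / df_among, ssd_within / df_within
    # n0 corrects for unequal population sizes
    n0 = (n_total - sum(len(idx) ** 2 for idx in members.values()) / n_total) / df_among

    var_within = msd_within
    var_among = max((msd_among - msd_within) / n0, 0.0)
    phi_pt = var_among / (var_among + var_within)
    return {"var_among": var_among, "var_within": var_within,
            "PhiPT": phi_pt, "pct_among": 100 * phi_pt}


x = [0, 1, 4, 5]  # hypothetical one-dimensional "genotypes"
d2 = [[(a - b) ** 2 for b in x] for a in x]
print(amova_one_level(d2, ["A", "A", "B", "B"]))
```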

For most of the markers used (Table 2), the highest number of alleles was detected in the Hungarian and Moroccan accessions, while the lowest occurred in California cultivars. Among the Turkish genotypes, the mean number of alleles was lower in the population collected from Akdamar Island (4.45) than in that collected from the inland Bademli region (5.09). Both values were higher than those of the California (3.09) and Italian (3.63) genotypes. Most unique alleles were detected in Hungarian accessions, while no unique alleles were identified in California cultivars. Allele numbers are influenced by the number of accessions studied. The Akdamar, Bademli, and Moroccan groups can therefore be compared directly, since 11 accessions were tested in each of these regions. The genetic diversity of Akdamar almonds seems to be lower than that of the Bademli and Moroccan accessions.

Table 2 Results of the analysis of molecular variance (AMOVA) for 86 almond accessions grouped in 10 populations

The highest observed heterozygosity was found in the Turkish accessions from Akdamar Island (0.77), while the lowest value occurred in the group of Italian cultivars (0.58). These values remain high across the evaluated populations because almond is a self-incompatible species. They are also similar to data reported in the literature (Martínez-Gómez et al. 2003; El Hamzaoui et al. 2013; Fernández i Martí et al. 2015). The mean value of the fixation index (FST) varied between 0.38 for the Turkish Akdamar Island population and about 0.55 for both the Moroccan and Hungarian genotypes. These high positive values indicate marked genetic differentiation among the populations. A gene flow value (Nm) greater than one is “strong enough” to prevent substantial differentiation due to genetic drift (Slatkin and Barton 1989). Nm was much lower than one in each population (Table 2). This is evidence of restricted gene flow, further supporting strong genetic differentiation among populations.
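The gene flow estimate from the Materials and methods, Nm = 0.25(1 − FST)/FST, is a direct transformation of FST. It rests on Wright's island model at migration-drift equilibrium, which is an assumption of the formula rather than something the data test. The sketch below applies it to the two ends of the fixation index range reported above.

```python
def gene_flow_nm(fst):
    """Number of migrants per generation, Nm = 0.25 * (1 - FST) / FST.

    Assumes Wright's island model at migration-drift equilibrium.
    """
    if not 0 < fst <= 1:
        raise ValueError("FST must be in (0, 1]")
    return 0.25 * (1 - fst) / fst


for fst in (0.38, 0.55):
    print(f"FST = {fst:.2f} -> Nm = {gene_flow_nm(fst):.2f}")
```

For FST = 0.38 and 0.55 this gives Nm ≈ 0.41 and 0.20 respectively, both well below the threshold of one.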

RST reflects stepwise mutation (Slatkin 1995) and has proved more efficient for assessing population genetic differentiation in Prunus species (Mihretie et al. 2015). We therefore determined pairwise RST values among the major reconstructed populations (Table 3). The mean genetic differentiation among all populations was significant (P < 0.01) based on both unordered alleles (FST = 0.168) and ordered alleles (RST = 0.289). Only two pairwise RST values were not statistically significant at P ≤ 0.01: Hungarian versus Admixed (composed of Ukrainian and some Hungarian accessions), and Monor versus Admixed. Fig. 1 illustrates the explanation: the cultivars of the Admixed group within cluster 5 share some common alleles with the Hungarian and Monor groups. The Hungarian and Monor groups are isolated from each other, as indicated by their strong bootstrap support and their significantly different RST values. FST was found to be more suitable in a study on P. sibirica (Wang et al. 2014), whereas RST proved to be more sensitive in P. africana (Mihretie et al. 2015). In our study, FST and RST values differed only slightly, which was confirmed by a Mantel test showing a significant association between FST and RST values (Rxy = 0.636, P = 0.001). FST values were significantly higher than RST values for six population pairs: Kyrgyz and Monor, Monor and Morocco, Akdamar and Italian, Hungarian and Admixed, Monor and Admixed, and Monor and Wild species. This indicates that drift-driven differentiation is more probable between these populations, whereas stepwise mutations have also contributed to differentiation between the remaining populations (Hardy et al. 2003) (Table 4).
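The contrast between unordered (FST) and ordered (RST) alleles can be illustrated with a simplified estimator. The same differentiation ratio is computed twice: once with an allele-identity distance and once with the squared difference in repeat number. The ratio is pooled diversity minus mean within-population diversity, divided by pooled diversity. These are simplified moment-style estimators with equal population weights and no sample-size correction, not the AMOVA-based values from GenAlEx. The populations in the example are hypothetical.

```python
from itertools import combinations


def _mean_pairwise(alleles, dist):
    """Mean distance over all pairs of distinct gene copies."""
    pairs = list(combinations(alleles, 2))
    return sum(dist(x, y) for x, y in pairs) / len(pairs)


def differentiation(pops, dist):
    """(pooled - mean within) / pooled, with populations weighted equally."""
    within = sum(_mean_pairwise(p, dist) for p in pops) / len(pops)
    pooled = _mean_pairwise([a for p in pops for a in p], dist)
    return (pooled - within) / pooled


def allele_mismatch(x, y):
    """Unordered alleles: any two different alleles count the same (FST-like)."""
    return 0.0 if x == y else 1.0


def squared_size_diff(x, y):
    """Ordered alleles: squared difference in repeat number (RST-like)."""
    return float(x - y) ** 2


pops = [[10, 10, 11, 11], [20, 20, 21, 21]]  # hypothetical repeat counts
print(differentiation(pops, allele_mismatch), differentiation(pops, squared_size_diff))
```

For these two populations, whose alleles differ by about ten repeats, the identity-based value is 2/9 ≈ 0.22. The size-based value is about 0.99, because only the latter credits the large repeat-number gap.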

Table 3 Diversity indices of reconstructed major populations with ≥ 5 individuals suggested by the STRUCTURE analysis
Table 4 Pairwise FST values (below diagonal) and pairwise RST values (above diagonal) between 10 populations

A Mantel test showed no significant correlation between geographic distance and genetic distance (Rxy = 0.173, P = 0.226) or between geographic distance and RST values (Rxy = 0.248, P = 0.194), suggesting that geographic distance among the assessed populations has little influence on their genetic differentiation. This is also supported by the genetic relationships among the almond groups defined by STRUCTURE, assessed using Nei’s genetic distances and the neighbor-joining algorithm (Fig. 3). Some geographically close populations are genetically distant, for example the Hungarian accessions and the Monor group (which clusters separately from the other Hungarian genotypes), or the Turkish Bademli and Akdamar groups.
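A Mantel test of the kind reported above correlates the off-diagonal elements of two distance matrices and assesses significance by permuting the populations (rows and columns together) of one matrix. The following Python sketch is a minimal, standard-library version using Pearson's r on the upper triangle and a one-sided permutation p-value; it is not necessarily the implementation used in this study, and the function names and example matrices are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(d1, d2, n_perm=999, seed=1):
    """Mantel test between two symmetric distance matrices (lists of lists).
    Returns (r, one-sided p-value for a positive association)."""
    n = len(d1)
    pairs = list(combinations(range(n), 2))  # upper-triangle cells
    x = [d1[i][j] for i, j in pairs]
    r_obs = pearson(x, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    n_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        # relabel populations of d2: rows and columns permuted together
        y_perm = [d2[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y_perm) >= r_obs:
            n_as_extreme += 1
    return r_obs, (n_as_extreme + 1) / (n_perm + 1)


pos = [0, 1, 4, 9, 16]  # hypothetical positions along a transect (km)
geo = [[abs(a - b) for b in pos] for a in pos]
gen = [[2 * d for d in row] for row in geo]  # toy genetic distances
print(mantel(geo, gen, n_perm=199, seed=7))
```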

Fig. 3 Neighbor-joining clustering of geographic groups based on pairwise Nei’s genetic distance values
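For reference, Nei's standard genetic distance between populations X and Y is D = −ln(Jxy / √(Jx Jy)), where Jx, Jy and Jxy are the averages over loci of Σxi², Σyi² and Σxiyi computed from allele frequencies. The text does not state which variant of Nei's distance was used, so this Python sketch assumes the standard (1972) form; the input layout (one allele-frequency dictionary per locus) and the function name are hypothetical. A pairwise matrix of such values is the kind of input a neighbor-joining algorithm clusters.

```python
from math import inf, log, sqrt


def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    Each population is a list with one {allele: frequency} dict per locus,
    loci in the same order for both populations.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(q * q for q in fy.values())
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    n_loci = len(pop_x)
    # Averages over loci (the 1/n_loci factors cancel in the ratio)
    jx, jy, jxy = jx / n_loci, jy / n_loci, jxy / n_loci
    if jxy == 0:
        return inf  # no shared alleles at any locus
    return -log(jxy / sqrt(jx * jy))


x = [{"a": 0.5, "b": 0.5}, {"c": 0.2, "d": 0.8}]
y = [{"a": 1.0}, {"c": 0.5, "d": 0.5}]
print(nei_distance(x, y))
```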

The inbreeding coefficient (FIS) had low positive values for the Hungarian, Bademli, and Moroccan accessions, as well as for the self-compatible Italian cultivars, indicating a low level of inbreeding in these populations. Similar heterozygosity deficiencies have been observed in other almond populations (Xie et al. 2006; Fernández i Martí et al. 2009; El Hamzaoui et al. 2013). A heterozygosity deficiency in the outcrossing tropical species Sextonia rubra (Mez) van der Werff was considered to be an effect of biparental inbreeding resulting from limited pollen dispersal among relatives (Veron et al. 2005). Hadziabdic et al. (2012) attributed the heterozygosity deficiency in dogwood accessions growing in a small geographical area to half-sib mating.
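The sign of FIS follows from comparing observed and expected heterozygosity, FIS = 1 − Ho/He: positive values indicate a heterozygote deficit, negative values an excess. The Python sketch below computes it for one locus from diploid genotypes, assuming the simple Hardy-Weinberg expectation He = 1 − Σpi² without a small-sample correction; the function name and the example genotypes are hypothetical, and multi-locus estimates would normally average or weight over loci.

```python
from collections import Counter


def fis_single_locus(genotypes):
    """FIS = 1 - Ho / He for one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    He is the Hardy-Weinberg expectation 1 - sum(p_i^2) (no sample-size correction).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    if he == 0:
        raise ValueError("monomorphic locus: FIS is undefined")
    ho = sum(a1 != a2 for a1, a2 in genotypes) / len(genotypes)
    return 1.0 - ho / he


deficit = [(1, 1), (1, 2), (2, 2), (1, 1)]  # fewer heterozygotes than expected
excess = [(1, 2), (1, 2)]                   # only heterozygotes
print(fis_single_locus(deficit), fis_single_locus(excess))
```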

The heterozygosity deficit in the analyzed almond trees could be partly due to the frequent use of superior almond genotypes for propagation (Lansari et al. 1994; Ercisli 2004; Gradziel 2011). This practice leads to a loss of genetic diversity and supports the hypothesis that a certain level of inbreeding exists within local almond populations. This may be the only contributing factor in the Moroccan gene pool, which is characterized by high genetic diversity, as reflected by the average number of alleles and the observed heterozygosity. This high diversity could be due to multiple, recurrent introductions that avoided a domestication bottleneck and led to higher local diversity (Pickersgill 1998). Delplancke et al. (2013) confirmed that the Moroccan almond gene pool has been formed by numerous exchanges between southern Europe and North Africa. These exchanges have produced the observed high diversity; hence, Morocco could be considered a hotspot of secondary diversity for almond.

The Hungarian gene pool showed a high number of alleles, indicating great genetic diversity. However, Ho was lower than He; this slight heterozygosity deficiency might be attributed to human selection and breeding carried out over several decades on this material. The Tétényi and Szigetcsépi cultivars can be traced back to a limited number of offspring of ‘Burbank seedling’, the advanced selections Budatétényi 6 and 7, and ‘Diósdi félpapírhéjú’ (Halász et al. 2010). This co-ancestry may result in a slight loss of heterozygosity.

For the Italian cultivars, the heterozygosity deficiency could be explained by partial selfing, given that all analyzed cultivars are self-compatible. Godini (2000) and Socias i Company (1990) reported that self-compatibility in Italian cultivars (conferred by the Sf allele at the S-locus) could have been transferred spontaneously from P. webbii. This gene flow between crop and wild species could also account for the high diversity observed in the self-compatible Italian genotypes. Fernández i Martí et al. (2009) observed no difference in heterozygosity between self-compatible cultivars (released from breeding programs in the last two decades) and self-incompatible cultivars. This is not surprising, since all self-compatible cultivars are heterozygous for self-compatibility (Socias i Company 1990) and selfing is not used in almond breeding programs because of the resulting inbreeding depression (Martínez-García et al. 2012). Because self-compatibility arose relatively recently in almond and has been detected only in the Puglia region of Italy, half-sib mating has probably not proceeded through enough generations to accumulate lethal and deleterious alleles in the progeny, as observed in a self-compatible breeding program (Socias i Company 2002); this has prevented a serious loss of genetic diversity (Szikriszt et al. 2011).
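Why partial selfing produces a heterozygosity deficit can be shown with the classical mixed-mating recursion, in which a fraction s of offspring arises from selfing and the rest from random outcrossing: Ft+1 = s(1 + Ft)/2, with equilibrium F = s/(2 − s). The Python sketch below assumes a large population and ignores selection, so it does not capture the inbreeding depression discussed in the text, which would reduce the realized value; function names are hypothetical and no selfing rate was estimated in this study.

```python
def inbreeding_trajectory(s, generations, f0=0.0):
    """F over generations under the mixed-mating recursion F' = s * (1 + F) / 2,
    where s is the selfing rate and the remaining matings are random outcrosses."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("selfing rate must lie in [0, 1]")
    f, trajectory = f0, [f0]
    for _ in range(generations):
        f = s * (1.0 + f) / 2.0
        trajectory.append(f)
    return trajectory


def equilibrium_f(s):
    """Fixed point of the recursion: F = s / (2 - s)."""
    return s / (2.0 - s)


for s in (0.1, 0.3, 0.5):
    print(s, [round(f, 3) for f in inbreeding_trajectory(s, 5)], round(equilibrium_f(s), 3))
```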

Only the California cultivars and the Akdamar group showed negative FIS values, indicating an excess of heterozygotes in those populations. The studied California cultivars are all genetically related and have a high degree of co-ancestry (Lansari et al. 1994). The cultivars ‘Monterey’ and ‘Thompson’ are chance seedlings, probably originating from a ‘Nonpareil’ × ‘Mission’ cross (Asai et al. 1996). Lansari et al. (1994) reported that although many almond cultivars have ‘Nonpareil’ and ‘Mission’ as parents, the mean inbreeding coefficient for US almond cultivars is low, which was also confirmed by our analyses.

The FIS value was lowest in the Akdamar population, which grows in an isolated geographical area. Although Akdamar Island lies approximately 3.5 km from the mainland, a distance within the flight range of pollinating insects, bees rarely fly that far over water (Tautz et al. 2004). Ellstrand and Elam (1993) reported that population isolation may lead to stochastic differentiation by genetic drift. In this case, a founder effect also contributes to genetic differentiation, since only a small number of individuals originally reached the island. This is reflected by the relatively low average number of alleles and by the fact that only some accessions carry alleles that are relatively frequent in another Turkish almond population (Bademli). However, in a self-incompatible species, frequency-dependent selection acting on the S-locus is also expected to affect other loci: it allows mating only between trees with different S-genotypes, which helps maintain high diversity under these specific conditions.
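The frequency-dependent selection mentioned above can be illustrated under the gametophytic self-incompatibility model, in which a pollen grain is rejected when its single S-allele matches either S-allele of the style. In this Python sketch, the S-allele labels, the toy population and the function names are hypothetical, and the self-compatible Sf allele is deliberately not modeled. It shows that pollen carrying a rare S-allele is accepted by a larger fraction of trees than pollen carrying a common one, the advantage that keeps many S-alleles, and hence outcrossing, in a population.

```python
def compatible_pollen_fraction(style_genotype, pollen_parent):
    """Fraction of the pollen parent's pollen accepted by a style under
    gametophytic self-incompatibility: pollen is rejected when its single
    S-allele matches either allele of the style."""
    accepted = sum(allele not in style_genotype for allele in pollen_parent)
    return accepted / len(pollen_parent)


def style_acceptance(allele, population):
    """Fraction of trees in `population` whose styles accept pollen carrying `allele`."""
    return sum(allele not in genotype for genotype in population) / len(population)


# Hypothetical S-genotypes; S1 is common, S5 is rare.
trees = [("S1", "S2"), ("S1", "S3"), ("S1", "S4"), ("S2", "S3"), ("S2", "S5")]
print(compatible_pollen_fraction(("S1", "S2"), ("S3", "S4")))  # fully compatible
print(compatible_pollen_fraction(("S1", "S2"), ("S1", "S3")))  # semi-compatible
print(compatible_pollen_fraction(("S1", "S2"), ("S1", "S2")))  # incompatible, e.g. selfing
print(style_acceptance("S1", trees), style_acceptance("S5", trees))
```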

All these data indicate that genetic diversity parameters in almond do not vary systematically along the dissemination routes from its putative center of origin in Central Asia (Martínez-Gómez et al. 2007). This pattern differs sharply from that observed in apricot, which is characterized by a considerable loss of genetic diversity in Western Europe and the Mediterranean basin (Bourguiba et al. 2012). In apricot, this loss was found to be linked to changes in the mating strategy (Halász et al. 2007), with self-compatibility putatively arising in eastern Turkey (Halász et al. 2013). As this important trait spread rapidly to western regions, it induced a domestication bottleneck. In almond, a slight loss of heterozygosity occurred in the germplasm of most regions, which might be attributed to several factors. However, the extent of inbreeding is limited, even in the group of self-compatible cultivars. This might be explained by the relatively recent occurrence of self-compatibility in almond and by the inbreeding depression observed after selfing, which leads breeders to avoid self-crosses.

Genetic differentiation was significant between most assayed populations. In most cases it is driven by drift, although mutation might also contribute. Gene flow between cultivated almond and wild species growing in close proximity to the tested populations (e.g., P. webbii and P. arabica) was shown to be a crucial factor for preserving genetic diversity and differentiation among populations. Several previous studies support bidirectional gene flow among different almond-related species (Godini 2000; Bošković et al. 2007; Zeinalabedini et al. 2010; Delplancke et al. 2012). In addition, a P. webbii S-allele was detected in a Bademli accession (data not shown). This may be further evidence of gene flow, as this study also identified SSR alleles shared by the Italian self-compatible cultivars and P. webbii (Fig. 1). People have also collected seeds of almond-related wild species, with genetic consequences verified in a recent study (Sorkheh et al. 2017). These findings suggest that almond has efficiently preserved its genetic variability along its dissemination routes from the center of origin and across continents, which can be attributed to self-incompatibility and to massive, symmetrical gene exchange between wild species and cultivated P. dulcis. However, drift, especially in isolated geographic regions, induced a moderate loss of genetic diversity, with consequences similar to those of human-imposed selection during cultivar improvement, in which superior almond genotypes are frequently used in crosses. Such impacts may lead to a loss of genetic diversity, and our results suggest a moderate level of inbreeding in most of the assayed, genetically differentiated subpopulations. Thus, the present results indicate that almond domestication avoided a genetic bottleneck, although the risk of one persists in many subpopulations.