Background

Domestication of plants is a complex evolutionary process in which human selection favours phenotypic transitions making them more useful for humans and better adapted to landscape management. It is a crucial step in the evolution of crop species since humans have an important impact on their origins and changes. Moreover, selection pressure and local diversification lead to an ongoing process [1]. Two major impacts on plant diversity result from domestication. Firstly, changes in traits selected for human use, called the “domestication syndrome” [2], lead to selection signatures at specific loci [3, 4]. In fact, according to the intensity of the selection process, the degree of change in populations can vary along a continuum from their wild ancestors to the domesticated populations, which cannot reproduce or survive without human intervention. Several highly domesticated plants such as maize, rice and wheat express domestication traits and have lost their ability to survive on their own in the wild [5]. Other crops like trees and forage are generally considered to be partially domesticated while conserving some ability to survive in natural environments [6]. In seed-propagated crops, domesticated types are characterized by a lack of seed dispersal at maturity and a lack of seed dormancy [7], while in clonally propagated crops, the reduction of sexual fertility and adaptations facilitating vegetative propagation have generally been reported [8].

The second major consequence of domestication is the reduction of genetic diversity in crops relative to their wild progenitors due to human selection and genetic drift through bottleneck effects [9]. Contrary to selection which only affects genetic diversity at target genes [3, 4], bottleneck processes reduce neutral genetic diversity across the entire genome [10–12]. The strength of genetic drift during the domestication bottleneck is determined by its duration and the effective population size [13]. Thus, according to their life-history traits and evolutionary history, diversity loss differs considerably among crop plants. The reduction of gene diversity in crops compared to wild relatives has been observed in soybean (34%), maize (38%) and wheat (70–90%) [10–12]. However, introgressive hybridization between domesticated forms and their wild relatives has often expanded genetic diversity, counteracting the effects of the initial domestication bottleneck [14].

For perennial fruit species, domestication means changing the reproductive biology from sexual reproduction (in the wild) to vegetative propagation (under cultivation) [15]. Few studies have reported the impact of the domestication history and how bottleneck effects may reduce the genetic diversity of crops relative to the wild relatives. Miller and Schall [16] provided phylogeographic evidence of multiple domestication of a cultivated fruit tree, Spondias purpurea, within the Mesoamerican centre of domestication. About 29% of the total diversity was not recovered in wild populations, suggesting that either new alleles have arisen during cultivation or, alternatively, contemporary extinction of tropical dry forests has occurred in Mesoamerican areas leading to genetic erosion of the wild gene pool. In Mediterranean zones, only a weak bottleneck effect on diversity in olive and grapevine was observed when comparing the wild and cultivated forms [17, 18]. For Prunus species, Mariette et al. [19] reported that in the case of sweet cherry a marked genetic bottleneck due to plant breeding was detected at microsatellite loci (40%) and at the S-locus coding for a gametophytic self-incompatibility (GSI) system (30%). However, the domestication bottleneck, as estimated by the loss of genetic diversity between wild cherry and landraces, was not detected by SSR markers but only observed at the S-locus (20%).

Apricot, Prunus armeniaca L., is a stone-fruit species that is grown commercially worldwide in all temperate regions. The numerous cultivars are highly adapted to restricted areas [20]. Apricot is clonally propagated through grafting but it is also seed-propagated, mainly in oasis agroecosystems. The Mediterranean area accounts for over 50% of the worldwide production [21]. Apricot was probably initially domesticated in China where wild apricot is found [22]. Following several collection expeditions through the major agricultural areas of the world and based on morphological data, Vavilov [23] proposed an explanation to determine the centre of origin of cultivated plants and described three regions as centres of origin for apricot: a Chinese centre, a Central Asian centre and a Near East centre. The latter centre included apricot from the Irano-Caucasian area (Iran, Caucasia and Turkey) and was considered as a secondary centre of cultivar diversification because of its presumed intermediate geographic position between the main area of cultivation of the domesticate and the distribution of the wild species [24]. On the basis of morphological characters and pomological descriptions, four major eco-geographical apricot groups were defined [25]: (i) The Central Asian group is the oldest and most diversified. The cultivars are self-incompatible apricots and have high-frost requirements; (ii) the Dzhungar-Zailij group includes self-incompatible small-fruited cultivars; (iii) the Irano-Caucasian group mostly encompasses self-incompatible apricots with reduced chilling requirements; and (iv) the European group is the most recent one, including self-compatible cultivars. 
According to the morphological characters, the expansion of apricot species into the Mediterranean Basin may have occurred in two waves along two distinct major apricot diffusion routes [24, 26]: the first one being brought by the Arabs through the Near East and North Africa, and the second through Hungary and Central Europe. However, Kostina [25] reported only one major route from the Irano-Caucasian area to the Mediterranean Basin. Hence, apricot domestication and its diffusion into the Mediterranean Basin are still debated issues.

Recently, improvements in neutral molecular markers have significantly increased the capacity of genetic characterization and relationship studies in different apricot cultivars, generating important information relating to the genetic variability, selection processes and breeding history of this crop at a large spatial scale. In this setting, using both microsatellite and AFLP markers, apricot accessions collected from different eco-geographical groups (Europe, North America, Irano-Caucasia, Central Asia) have been grouped according to their geographical origin and pedigree information supporting the history of apricot diffusion from its centre of origin [27–31]. However, these studies considered material collections, including both traditional cultivars and selected accessions derived from breeding programs, and did not take the difference between vegetative and seed propagation into account.

In Europe, clonal propagation through grafting and cuttings has been used for a long time. Nevertheless, much of the remaining variability came from a mix of seed-propagated and graft-propagated apricots (i.e. Vesuvian apricots, Roussillon apricots, Peloponnese apricots) [32]. In North Africa (Algeria, Morocco and Tunisia), apricot germplasm contained accessions propagated by grafting, but also accessions propagated by seeds, specifically in oasis regions. A fine-scale genetic diversity study conducted using AFLP markers at the within-population level, focusing on Tunisian grafted apricot cultivars, supported the assumption of a few introduced genotypes that were first propagated by seeds [33]. Analysing a larger set of Tunisian apricots, including both vegetatively propagated cultivars and seed-propagated accessions, Bourguiba et al. [34] identified two main gene pools according to their propagation mode and confirmed the assumption that these two gene pools shared the same origin.

A gradient of decreasing genetic diversity from east to west was proposed by Hagen et al. [35] among the four identified apricot groups: ‘Diversification’, ‘Adaptive Diversity’, ‘Continental Europe’ and ‘Mediterranean Basin’, which could be related to the apricot diffusion process. However, this study was limited to a small sample (only 50 accessions) and based on a phenetic approach related to the geographic origin and the phenotypic characters of the cultivars.

Our knowledge of apricot domestication and diffusion into the Mediterranean Basin remains incomplete: it rests on the assumption of a decrease in genetic diversity [35] and wavers between one [25] and two [24, 26] major diffusion routes. The present study is the first to generate insight into these evolutionary and historical processes through a genetic structure analysis in apricot that combined a large sample of local Mediterranean material, microsatellite markers and a model-based Bayesian clustering approach. Owing to their transferability across Prunus species, simple sequence repeat (SSR) markers have been widely used in variability studies and linkage map construction [36]. A set of 25 microsatellite markers was selected according to their polymorphism in apricot cultivars and their mapping over the Prunus genome.

The goal of this study was to clarify the history of the apricot domestication process in the Mediterranean area through an analysis of genotypes originating from Algeria, France, Iran, Italy, Morocco, Spain, Tunisia and Turkey. We specifically addressed the following questions: (i) What is the genetic structure of Mediterranean apricots compared to Irano-Caucasian germplasm? (ii) Is there a loss of genetic diversity from the Near-Eastern secondary centre of diversification to the extreme south-western Mediterranean area? and (iii) Can distinct apricot diffusion routes be identified throughout the Mediterranean Basin?

Results

SSR polymorphism

A total of 257 alleles were detected across the 25 SSR loci used, ranging from 5 (BPPCT001) to 18 per locus (UDP98-409; Table 1). The average number of alleles per locus was 10.28 but dropped to 3.96 when rare alleles (i.e. those with a frequency of less than 5%) were removed. The number of alleles per locus with a frequency higher than 5% ranged from 1 (UDP96-018) to 8 (UDP98-409). The average PIC value for the 25 loci was 0.586 and the most informative locus was UDP98-409 (0.861). Only three of the 25 SSR loci displayed a significant heterozygosity deficit (P < 0.01; BPPCT004, Ma040a and UDP97-402 loci; see Additional file 1: Table S1) in apricot from the Irano-Caucasian area (Iran and Turkey), which is considered as a secondary diversification zone [24], indicating the lack of null alleles even though most of the SSR loci were not specifically developed in apricot [37–42].

Table 1 Genetic diversity scored at 25 mapped loci in the 207 apricot accessions

Nei’s genetic diversity ranged from 0.102 (UDP96-018) to 0.876 (UDP98-409), with an average of 0.626 (Table 1), suggesting that the examined Mediterranean apricot germplasm enclosed higher polymorphism than reported in previous studies [27–30]. Compared to peach, apricot had higher polymorphism and was more diversified as confirmed by the number of alleles per locus and the observed heterozygosity, which was significantly higher on the 11 SSR loci that were used in both studies [43] (Mann–Whitney U-test, P < 0.05; the average number of alleles per locus was 11.80 in apricot and 7.64 in peach).
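The per-locus statistics reported above can be illustrated with a minimal sketch. It assumes the uncorrected form of Nei’s gene diversity (1 minus the sum of squared allele frequencies) and the usual PIC formula; the software and any sample-size correction actually applied in the study are not stated here, and the allele frequencies below are hypothetical.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Nei's gene diversity (expected heterozygosity), uncorrected: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content: 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


def common_alleles(freqs, threshold=0.05):
    """Count alleles that are not rare, i.e. with frequency >= threshold."""
    return sum(1 for p in freqs if p >= threshold)


# Hypothetical locus with five alleles; these are not values from the study.
example = [0.50, 0.30, 0.15, 0.03, 0.02]
print(round(gene_diversity(example), 4))
print(round(pic(example), 4))
print(common_alleles(example))
```

For a locus with two equally frequent alleles, the gene diversity is 0.5 and the PIC is 0.375, which shows why PIC is always the lower of the two values.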

Genetic structure and diversity within apricot geographic groups

According to the geographic origin of the studied apricot accessions, we defined eleven groups (Figure 1; see Additional file 2: Table S2). They displayed substantial genetic differentiation since the average F ST value was 0.111, ranging from 0.024 for pairwise comparisons between the Iran and Turkey groups to 0.195 between the Murcia and Oases of Tunisia groups (see Additional file 3: Table S3). All pairwise F ST values were significant at P < 10⁻⁶, except for the one observed between South Italy and Murcia, which was significant at P < 10⁻⁴.
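To make the pairwise differentiation values concrete, the following is a simplified single-locus sketch of F ST computed as (H T - H S) / H T for two equally weighted groups. This is an illustrative Nei-style calculation, not necessarily the estimator used in the study, and the allele frequencies are hypothetical.

```python
def expected_het(freqs):
    """Expected heterozygosity from allele frequencies: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pairwise_fst(freqs_a, freqs_b):
    """Single-locus F_ST = (H_T - H_S) / H_T for two equally weighted groups.

    freqs_a and freqs_b list the frequencies of the same alleles, in the same order.
    """
    h_s = (expected_het(freqs_a) + expected_het(freqs_b)) / 2.0
    pooled = [(a + b) / 2.0 for a, b in zip(freqs_a, freqs_b)]
    h_t = expected_het(pooled)
    # A locus that is monomorphic overall carries no differentiation signal.
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical two-allele locus in two groups.
print(round(pairwise_fst([0.8, 0.2], [0.6, 0.4]), 4))
```

Identical frequencies give 0 and fixed alternative alleles give 1, so the values between 0.024 and 0.195 reported above sit at the low-to-moderate end of that scale.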

Figure 1

Origin of the 207 apricot accessions classified into 11 apricot groups and three regions: A, B and C as defined according to their spatial and genetic proximity. Algerian, Moroccan and Tunisian apricots were sampled in situ. The remaining accessions originated from an ex situ collection (see Additional file 2: Table S2). Region A = Iran and Turkey; Region B = Continental Europe, southern France and southern Italy; Region C = Murcia, northern Tunisia, Moulouya Valley, Messaad, Oases of Tunisia and Draa Valley. Colours correspond to genetic clusters defined by STRUCTURE analysis with cluster 1 in blue, cluster 2 in green, cluster 3 in yellow and cluster 4 in red.

Genetic relationships among the defined apricot groups were assessed based on Nei’s [44] genetic distances and the Neighbor-joining algorithm (Figure 2). According to the bootstrap values, the 11 apricot groups were classified into three regions (A, B and C; Figures 1 and 2). Region A, including Iran and Turkey groups (Iran-Caucasian region), was clearly distinguished from the remaining groups by a high bootstrap support of 99.94% (Figure 2). Region B, including Continental Europe, South France and South Italy groups, was defined by a bootstrap value of 69.61%. The third region (region C) included the five apricot groups from North Africa area as well as Murcia group, supported by a weak bootstrap value (35.81%; Figure 2). Based on the AMOVA analysis, the genetic variance was about 7% among these three defined regions and 14% among apricot groups per region (Table 2).

Figure 2

Neighbor-joining clustering of geographic groups based on pairwise Nei’s genetic distance values, as well as the distribution of the genetic clusters within each of them. Colours correspond to genetic clusters defined by the STRUCTURE analysis, as reported in Figure 3, with cluster 1 in blue, cluster 2 in green, cluster 3 in yellow and cluster 4 in red. Numbers next to nodes indicate bootstrap support percentages in 10000 pseudoreplicates.

Table 2 Partitioning of variance within and among apricot groups and regions (average over 25 loci)

The mean number of accessions per group was 18.81, ranging from 11 for the Murcia group to 32 for the Turkey group (Table 3). The levels of the genetic diversity estimators measured within these geographic groups differed: Iran and Turkey groups (region A), had the highest expected heterozygosity values, with 0.655 and 0.630, respectively; while the Oases of Tunisia and Draa Valley groups (region C) had the lowest ones, with 0.487 and 0.474, respectively. The observed heterozygosity was highest for South Italy (0.633) and lowest for Draa Valley (0.379). For the following groups: Turkey, Murcia, Moulouya Valley and Draa Valley, the F IS values showed a significant heterozygosity deficit (Table 3).

Table 3 Genetic diversity within apricot geographic groups and regions

As the number of alleles observed in a group is highly dependent on the sample size, the allelic richness and private allelic richness were computed for each group and region (Table 3). The highest allelic richness was detected for the Iran and Turkey groups (region A), with 5.03 and 4.88, respectively, while the lowest value was noted in the Draa Valley (3.38), Messaad (3.36) and Oases of Tunisia (3.34) groups belonging to region C. Similar results were obtained when computing the private allelic richness. Thus, region A had a significant upper level of genetic diversity in terms of allelic richness, private allelic richness and expected heterozygosity (Table 3), reflecting a decrease in genetic diversity from the eastern (region A) to the south-western (region C) Mediterranean Basin.
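Because allele counts depend on sample size, allelic richness is commonly standardised by rarefaction. The sketch below assumes the standard rarefaction formula (expected number of distinct alleles in a random subsample of g gene copies); the exact procedure and subsample size used for Table 3 are not given in this section, and the counts are hypothetical.

```python
from math import comb


def allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene copies.

    allele_counts: number of copies of each allele observed at one locus in one group.
    g: standardised subsample size (must not exceed the total number of copies).
    """
    n = sum(allele_counts)
    if g > n:
        raise ValueError("subsample size g exceeds the number of sampled gene copies")
    total = comb(n, g)
    # For each allele, 1 - P(allele is absent from the subsample).
    return sum(1.0 - comb(n - c, g) / total for c in allele_counts)


# Hypothetical locus: three alleles observed 12, 6 and 2 times (20 gene copies).
print(round(allelic_richness([12, 6, 2], 10), 3))
```

Rarefying every group to the same g is what allows a group of 11 accessions to be compared with one of 32.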

Model-based Bayesian clustering analysis

Using the model-based Bayesian clustering approach implemented in STRUCTURE [45], the genetic structure of Mediterranean apricot was examined according to the model with 2 clusters (K = 2) to 6 clusters (K = 6). The ad hoc quantity based on the second order rate of change of the likelihood function (ΔK) [46] revealed a first level of clustering at K = 2 for the investigated Mediterranean apricots (ΔK = 61.81; Figure 3; see Additional file 4: Figure S1) and a sub-clustering at K = 4 (ΔK = 3.6). Based on the permuted average Q-matrix generated by CLUMPP for the 10 STRUCTURE runs, the highest similarity coefficient (H’) was observed for K = 2 (H’ = 0.997) and K = 4 (H’ = 0.979), indicating the stability of the results for these two models (Figure 3).
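The ΔK criterion can be sketched as follows. This version assumes the common implementation in which the second-order difference is taken on the mean log-likelihood across runs and divided by the standard deviation of the runs at K; the log-likelihood values are hypothetical and are not output from the STRUCTURE runs described here.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has results for both K-1 and K+1.

    lnp_by_k maps K to the list of log-likelihoods from replicate runs
    (at least two runs per K, with non-zero spread at each evaluated K).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Hypothetical replicate log-likelihoods for K = 1..4.
runs = {
    1: [-9100.0, -9102.0, -9101.0],
    2: [-8700.0, -8704.0, -8702.0],
    3: [-8600.0, -8650.0, -8620.0],
    4: [-8550.0, -8640.0, -8590.0],
}
for k, value in delta_k(runs).items():
    print(k, round(value, 2))
```

Note that ΔK cannot be evaluated at the smallest or largest K tested, since it needs both neighbours.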

Figure 3

Genetic structure assessed by STRUCTURE analysis. The bar plot, generated by DISTRUCT, depicts classifications with the highest probability among assumed clusters in the Mediterranean apricot germplasm. Each individual is represented by a vertical bar, partitioned into coloured segments representing the proportion of the individual’s genome in the K clusters. Apricot geographic groups are separated by black lines.

At K = 2, apricot accessions from Iran and Turkey and a few accessions from Italy and France were separated from the other ones (Figure 3). At K = 3, accessions from Continental Europe, South Italy, South France and Murcia were separated from those located in the Maghreb. At K = 4, a fourth cluster, including some accessions from Continental Europe, South Italy and South France, was identified as originating from the ‘Adaptive Diversity’ group previously defined by Hagen et al. [35]. At K = 5 and K = 6, the genetic structure of Mediterranean apricot within three main gene pools was not modified since the accessions of the fifth and sixth clusters were not consistently assigned and hence no distinctive additional cluster was noted (Figure 3).

The K = 4 model was chosen to obtain an in-depth overview of apricot genetic structure in the Mediterranean area. Four genetic clusters were thus defined and most of the apricot accessions (167 genotypes among the 207 studied, 80.7%) were assigned to a cluster with a probability greater than 80%: cluster 1 (blue) containing 39 accessions originating from the ‘Irano-Caucasian’ area, cluster 2 (green) including 7 accessions referred to the ‘Adaptive Diversity’ group [35], cluster 3 (yellow) composed of 58 accessions from the ‘North Mediterranean Basin’ area and a few from the South Mediterranean, and cluster 4 (red) with 63 accessions originating from the ‘South Mediterranean Basin’ region (Figures 1 and 3). The remaining 40 accessions (19.3%) of the sample were assumed to have an admixed ancestry (see Additional file 2: Table S2). The admixture was clearly observed in apricots from the North Tunisia (68.4%) and Moulouya Valley (30.7%) groups (see Additional file 5: Table S4). The four genetic clusters were significantly differentiated, as reflected by the high global F ST value (F ST = 0.122). The genetic differentiation among the three main clusters 1, 3 and 4 ranged from 0.102 to 0.118, with an average value of 0.109 (see Additional file 6: Table S5).
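The assignment rule described above (membership greater than 80%, otherwise admixed) can be sketched as a simple threshold on each accession’s row of the Q-matrix. The function names and the Q values below are hypothetical illustrations, not data from Additional file 2.

```python
from collections import Counter


def assign_cluster(q_row, threshold=0.80):
    """Return the 1-based cluster whose membership exceeds the threshold, else None."""
    best = max(range(len(q_row)), key=lambda i: q_row[i])
    return best + 1 if q_row[best] > threshold else None


def summarise(q_matrix, threshold=0.80):
    """Count accessions per assigned cluster; the key None collects admixed accessions."""
    return Counter(assign_cluster(row, threshold) for row in q_matrix)


# Hypothetical membership coefficients for three accessions at K = 4.
q_matrix = [
    [0.95, 0.02, 0.02, 0.01],  # clearly cluster 1
    [0.10, 0.05, 0.82, 0.03],  # cluster 3
    [0.45, 0.05, 0.40, 0.10],  # admixed: no coefficient above 0.80
]
print(summarise(q_matrix))
```

With this rule, the counts reported above (39 + 7 + 58 + 63 = 167 assigned, 40 admixed) partition the 207 accessions.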

A significant reduction of genetic diversity between apricot gene pools

A significant reduction of allelic richness and private allelic richness was observed when regions B and C were compared to region A (Table 4). These observations confirmed the presence of a substantial gradient of decreasing genetic diversity of apricot germplasm from the east (Iran-Caucasian area, region A) to the southwest (Maghreb area, region C), depicting apricot domestication and its diffusion history towards the Mediterranean area [35].

Table 4 Relative reduction of diversity among geographic regions and genetic clusters

To properly assess the reduction of genetic diversity due to the domestication bottleneck, a neutral subsample of loci was determined by removing those presumed to be under selection within populations. Thus, we analysed clusters as defined by a model-based Bayesian clustering since they were significantly differentiated without genetic structure at the intra-cluster level in order to detect true positive outlier loci, as proposed by Excoffier et al. [47]. An analysis using Fdist2 software was conducted on three comparisons of these three main clusters: (i) clusters 1 (‘Irano-Caucasian’) and 3 (‘North Mediterranean Basin’), (ii) clusters 1 and 4 (‘South Mediterranean Basin’), and (iii) clusters 3 and 4 (Figure 4). The F ST calculated by Fdist2 between clusters 1 and 3 was 0.105. Based on the first analysis, only one outlier locus was detected at the 95% level: CPPCT022 locus (Figure 4a). This outlier was removed for a second analysis. No outlier was detected and the F ST value was 0.110. A similar analysis was carried out on clusters 1 and 4. The F ST was 0.100 and no outlier was detected at the 95% level (Figure 4b). Finally, using the same procedure, the F ST between clusters 3 and 4 was 0.118. No outlier was detected at the 95% level (Figure 4c).

Figure 4

Simulated F ST values as a function of the expected heterozygosity (H e) using the F ST between clusters 1 and 3 (F ST = 10.5%; a), 1 and 4 (F ST = 10%; b), and 3 and 4 (F ST = 11.8%; c). Curves delimiting the neutral expectations with the infinite allele model were computed as described by Beaumont and Nichols [66]. Curves with broken lines, triangles and squares represent the 0.5 (1 – 0.95) quantile, the 0.5 (1 + 0.95) quantile and the median values, respectively. Black and white circles represent observations that are non-significant and significant at 5%, respectively.

When looking for the genetic diversity parameters within the three main clusters, similar results were obtained based on both all SSR loci and only on 24 SSR loci following the assumption of neutrality (i.e. after removing the CPPCT022 locus; Table 5). Significant differences were noted for the allelic richness and private allelic richness when comparing clusters 1 and 3 as well as clusters 1 and 4. Further, the observed and expected heterozygosity values were significantly different when comparing clusters 1 and 4 (Table 5). Hence, a significant reduction of allelic richness and private allelic richness was observed using both all SSR loci, and only SSR loci based on the neutrality assumption, when both clusters 3 and 4 were compared to cluster 1 (Table 4). Such a reduction of allelic richness (from 41 to 47%), private allelic richness (from 83 to 93%), and observed and expected heterozygosity (from 22 to 24%) confirmed the decrease in genetic diversity among genetic clusters from the eastern (cluster 1) to the south-western (cluster 4) Mediterranean Basin (Table 4).
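As a worked illustration of how a relative reduction of diversity can be expressed, the sketch below assumes the simple form 100 × (1 - derived / source). This formula is an assumption for illustration; the exact calculation behind Table 4 is not spelled out in this section. The example uses the group-level allelic richness values quoted earlier for Iran (5.03) and the Oases of Tunisia (3.34), not the cluster-level values of Table 4.

```python
def relative_reduction(source_value, derived_value):
    """Percentage loss of a diversity estimator relative to the source gene pool."""
    if source_value <= 0:
        raise ValueError("source_value must be positive")
    return 100.0 * (1.0 - derived_value / source_value)


# Allelic richness quoted in the text: Iran group vs. Oases of Tunisia group.
print(round(relative_reduction(5.03, 3.34), 1))
```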

Table 5 Genetic diversity within the three main clusters identified by the STRUCTURE analysis at K = 4a

Specific alleles within pairs of geographic regions and genetic clusters

The number of shared alleles specifically detected between pairs from each geographic region as well as between each cluster pair at all microsatellites was computed. Nineteen alleles observed at 14 SSR loci among the total of 257 alleles detected at the 25 loci (7.4%) and 26 alleles observed at 16 SSR loci (9.7%) were specifically detected within regions A vs. B and A vs. C, respectively, while only 4 alleles observed at 3 SSR loci were detected within regions B vs. C (Table 6). A total of 161 alleles were shared by at least two of the three geographic regions. Among them, 11.8% and 16.1% were specific to regions A vs. B and A vs. C, respectively, while only 2.4% were specific to regions B vs. C (Table 6). The frequency of these alleles ranged from 0.005, corresponding to one allele detected once in each of the two regions, to 0.162, with an average of 0.027 (Table 6). The frequency of these alleles varied according to the locus and was higher in region A than in regions B and C when taking all observed shared alleles into account (see Additional file 7: Figure S2).

Table 6 Specific alleles at each microsatellite locus within geographic region pairs

Similar results were obtained when comparing clusters 3 and 4 to cluster 1. Twenty-three alleles observed at 11 SSR loci among the total of 239 alleles detected at the 25 loci (9.6%) and 32 alleles observed at 18 SSR loci (13.3%) were specifically detected within clusters 1 vs. 3 and 1 vs. 4, respectively, while only one allele observed at locus UDP98-409 was detected within clusters 3 vs. 4 (see Additional file 8: Table S6). A total of 141 alleles were shared by at least two of the three genetic clusters. Among them, 16.3% and 22.6% were specific to clusters 1 vs. 3 and 1 vs. 4, respectively, while only 0.7% (one allele out of 141) were specific to clusters 3 vs. 4 (see Additional file 8: Table S6). These results suggest that in the Mediterranean Basin apricot originated from the ‘Irano-Caucasian’ area (region A/cluster 1) and diffused via two different routes, i.e. northern (region B/cluster 3) and southern (region C/cluster 4) routes within the Mediterranean Basin.
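The notion of alleles "specifically detected" within a pair of groups can be sketched with set operations: an allele is pair-specific when it is present in both members of the pair and absent from every other group. The function name and the toy allele sets below are hypothetical; alleles are represented as (locus, size) tuples purely for illustration.

```python
from itertools import combinations


def pair_specific_alleles(alleles_by_group):
    """Map each pair of groups to the alleles found in both and in no other group."""
    result = {}
    for a, b in combinations(sorted(alleles_by_group), 2):
        others = set().union(
            *(s for g, s in alleles_by_group.items() if g not in (a, b))
        )
        result[(a, b)] = (alleles_by_group[a] & alleles_by_group[b]) - others
    return result


# Hypothetical allele sets for three regions; not data from Table 6.
toy = {
    "A": {("UDP98-409", 130), ("UDP98-409", 134), ("BPPCT001", 150), ("BPPCT001", 152)},
    "B": {("UDP98-409", 130), ("UDP98-409", 134), ("BPPCT001", 150)},
    "C": {("UDP98-409", 130), ("BPPCT001", 152)},
}
for pair, alleles in pair_specific_alleles(toy).items():
    print(pair, sorted(alleles))
```

A source gene pool that seeded two routes independently is expected to show many alleles specific to each source-route pair and few specific to the pair of routes, which is the pattern the counts above describe.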

Discussion

Three main gene pools in the Mediterranean apricot germplasm

Apricot has been mainly cultivated in the Mediterranean area since its earliest introduction [24], while displaying high genetic diversity, as previously reported [27–30]. In this study, analysis of the genetic structure of a large representative sample of apricots located throughout the Mediterranean countries generated new insight into the history of domestication and diffusion of this species within the Mediterranean Basin. Our apricot sample included 207 accessions from Algeria, France, Iran, Italy, Morocco, Spain, Tunisia and Turkey. Apart from apricots from Algeria, Morocco and Tunisia that were sampled in situ, all the remaining accessions originated from several ex situ collections. Unlike a study on sweet cherry [19] where the authors compared landraces to modern varieties in order to assess the breeding bottleneck, we decided to exclude accessions derived from breeding programs and with unknown passport data in order to obtain clear knowledge about the geographic origin of the studied material and hence to retain only presumed selected local apricots from seed-propagated populations for this study. Despite their ex-situ status, these accessions can be considered as “in-situ sampled” since they were originally from specific geographical areas where local apricots have been diversified through selection from seed-propagated populations, as previously mentioned [24]. Using model-based Bayesian clustering without prior information about the geographic origin of the accessions, we obtained a genetic structure pattern similar to those defined with the geographic origin of the accessions. In fact, the distinction of three main genetic clusters [i.e. cluster 1 (‘Irano-Caucasian’ area; in blue), cluster 3 (‘North Mediterranean Basin’ area; in yellow) and cluster 4 (‘South Mediterranean Basin’ area; in red)] by STRUCTURE analysis (Figures 1 and 3) was in concordance with the three regions (A, B and C) defined according to the geographic origin of the accessions, reflecting a long process of apricot domestication and diffusion in the Mediterranean area.

Loss of genetic diversity supporting the apricot domestication bottleneck

For most crop species, domestication processes cause a loss of genetic diversity due to the bottleneck effect and genetic drift [10–12]. In our study, this loss could be assessed by comparison of levels of diversity between geographic groups or the genetic clusters defined by STRUCTURE analysis (clusters 1 vs. 3, clusters 1 vs. 4 and clusters 3 vs. 4). However, estimation of genetic diversity reduction due to bottleneck domestication could be biased by human selection [9] and the genetic structure within populations [47]. Therefore, we assessed genetic diversity within clusters defined by STRUCTURE using all SSR markers as well as the 24 presumed neutral loci after removing the CPPCT022 outlier locus. This locus was located on linkage group 7 of the Prunus genome and not linked to the self-incompatibility locus [48].

The results obtained using the two sets of markers (all SSRs and 24 SSRs under the assumption of neutrality) were congruent for all diversity estimators. A substantial decrease in genetic diversity was observed from the eastern (cluster 1 = ‘Iran-Caucasian’) to the western Mediterranean Basin (cluster 3 = ‘North Mediterranean Basin’ and cluster 4 = ‘South Mediterranean Basin’). Such a loss of genetic diversity was significant when comparing clusters 1 (‘Iran-Caucasian’) and 3 (‘North Mediterranean Basin’) and clusters 1 and 4 (‘South Mediterranean Basin’). Nevertheless, it was not significant when cluster 3 was compared to cluster 4, despite the different modes of apricot propagation: vegetative in North Mediterranean vs. sexual reproduction in South Mediterranean, especially in the oasis agroecosystems in the Maghreb area [33, 34].

We thus noted a substantial loss of genetic diversity that was independent of the selection impact due to the domestication bottleneck. Such a loss of genetic diversity is closely in line with the apricot diffusion routes, and contrasting with patterns in other native Mediterranean fruit species such as olive and grape for which a weak loss of genetic diversity between varieties and wild relatives has been observed [17, 18]. The magnitude of the bottleneck depends on the number of individuals involved and the duration of these pressures [13]. In sweet cherry, several successive domestication events have probably occurred and a significant bottleneck associated with modern breeding was revealed, while no reduction in diversity has been shown between landraces and wild relatives [19].

Two routes of apricot diffusion into the Mediterranean Basin

The Irano-Caucasian area is considered as a secondary diversification zone [24] for at least two reasons. First, based on ethno-botanical data, northern Iran was identified as an evolutionary centre for a large number of fruit trees, including Prunus species [49]. Second, Irano-Caucasian apricots occupy an intermediate position between domesticated varieties and wild species, as previously described [24]. Similar findings were also obtained for Iranian apples [50]. Although we lacked genetic data from the presumed primary gene pool of the centre of apricot origin in China [22, 23], our findings suggest that Mediterranean apricots have been selected from the ‘Irano-Caucasian’ gene pool. Indeed, 62% of alleles common to the three regions A, B and C and revealed in the northern and southern Mediterranean Basin were found to be shared with the ‘Irano-Caucasian’ gene pool. This leads to the following question: did these two gene pools diffuse into the Mediterranean Basin through only one route, as proposed by Kostina [25], or two main routes, i.e. via the northern and southern Mediterranean, as hypothesised by Faust et al. [24] and Mehlenbacher et al. [26]?

The distinction of ‘South-Mediterranean’ apricots from the ‘Irano-Caucasian’ cluster confirmed the findings of a previous study [22] based on morphological characters. Moreover, according to our analyses, most accessions from the Murcia group (Spain) were assigned to region C, but it seems that this group is genetically intermediate between regions B (Continental Europe, South France and South Italy groups) and C (South Mediterranean). Spanish and North African accessions were also pooled in another AFLP-based study [35]. In fact, the introduction of apricots into North Africa and Spain was attributed to the Arabs during the regime of Umayyad, who conquered Spain between 711 and 719 [24]. Furthermore, region C pooled apricots propagated by grafting (from Murcia and North Tunisia) and by seeds (from the oasis agroecosystems: Moulouya Valley, Messaad, Oases of Tunisia and Draa Valley). These two apricot groups, also distinct according to their mode of propagation, were recently proved to share a common genetic basis in Tunisia [34]. In addition, some accessions encountered in North Tunisia, Messaad and Moulouya Valley represented a clear admixture between clusters 3 and 4, indicating that gene exchanges have occurred between northern and southern Mediterranean countries. These events could be related to both ancient human movements (e.g. Romans, Arabs, Andalusians) and/or to recent material transfers associated with the French colonisation period in the early 20th century [24, 26].

By comparing cluster pairs 1 vs. 3 and 1 vs. 4, we observed a significant loss of genetic diversity. Conversely, we noted no significant loss of genetic diversity between clusters 3 and 4, although they were more differentiated from each other than cluster pairs 1 vs. 3 and 1 vs. 4 (see Additional file 6: Table S5). In addition, a substantial proportion of specific alleles was observed along the northern Mediterranean apricot diffusion route (clusters 1 vs. 3 and geographic regions A vs. B) and along the southern route (clusters 1 vs. 4 and geographic regions A vs. C). These results strongly indicate that apricot diffused through two main routes: the first through countries north of the Mediterranean Sea (cluster 3/region B) and the second, probably brought by the Arabs, through North African countries (cluster 4/region C), as previously proposed [51, 52]. Our findings are in agreement with earlier studies [24, 26], but not with the hypothesis proposed by Kostina [25], who reported only one major route from the Irano-Caucasian area to the Mediterranean Basin.

Conclusions

Based on the three main identified gene pools, we observed a significant and substantial loss of apricot genetic diversity, ranging from about 37 to 49%, from the secondary apricot diversification zone (‘Irano-Caucasian’) to the southwestern Mediterranean Basin. This loss represents a genetic signature of apricot domestication and diffusion into the Mediterranean Basin. Contrary to Kostina’s hypothesis [25], we propose an evolutionary scenario with two diffusion routes, one through southern Europe and one through North Africa, as revealed by a substantial proportion of shared alleles that were specifically detected along each of the two routes. Our study generated genetic insight to: (i) improve management and conservation strategies for Mediterranean apricot germplasm, and (ii) propose a genetic basis for apricot breeding programs.

Methods

Plant material

A total of 207 apricot accessions were sampled in different countries from the eastern to the south-western Mediterranean Basin: Algeria (23), France (49), Italy (19), Morocco (34), Spain (8), Tunisia (42) and Turkey (32; Figure 1; see Additional file 2: Table S2). The strategy was to select accessions reflecting the local variability in each country, excluding accessions derived from breeding programs. Depending on the country surveyed, apricot material was collected either from germplasm collections, including the French collection maintained at the Institut National de la Recherche Agronomique (INRA Avignon, France), the Italian collection of the Department of Fruit Science and Plant Protection of Woody Species (University of Pisa, Italy), the Spanish collection of CEBAS-CSIC (Murcia, Spain), and the Turkish collection of the Inonu University (Malatya, Turkey), or from different collection surveys, as was the case for the Algerian, Moroccan and Tunisian accessions. The French material studied contained native cultivars as well as a few introduced accessions initially acquired from other collections around the world (Italy, Iran, Spain), which were also included in the sample set in order to span a broad eco-geographical apricot distribution range (Figure 1; see Additional file 2: Table S2). Apricots are generally vegetatively propagated; however, in North African countries both traditional local cultivars propagated by grafting and seed-propagated accessions grown in oasis agroecosystems were present.

The studied accessions were subdivided into eleven groups based on their geographic origin (Figure 1; see Additional file 2: Table S2). Group 1 (Iran) consisted of 14 Iranian varieties present in the French collection; group 2 (Turkey) was composed of 32 accessions from Turkey; group 3 (Continental Europe) was composed of 21 accessions from both northern Italy and northern France, since these two populations are known to be related to a Central Europe gene pool, which is less adapted to the Mediterranean region [24]; group 4 (South France) comprised 12 accessions originating from the southern regions of France; group 5 (South Italy) included 18 accessions from southern Italy, specifically the Napoli area; group 6 (Murcia) consisted of 11 accessions from Murcia in Spain; group 7 (North Tunisia) consisted of 19 graft-propagated cultivars collected from the Testour and Ras Jbel regions in northern Tunisia; group 8 (Moulouya Valley) encompassed 13 accessions from the Moulouya Valley in central eastern Morocco; group 9 (Messaad) was composed of 23 accessions from the Messaad region in Algeria; group 10 (Oases of Tunisia) consisted of 23 seed-propagated apricot accessions collected from five different oases in Tunisia (i.e. Tameghza, Nefta, Tozeur, Midess and Degache); and finally group 11 (Draa Valley) included 21 accessions from the Draa Valley in south-eastern Morocco.

DNA extraction and microsatellite analysis

Except for the Turkish accessions, all material shipped to the INRA Montpellier laboratory for DNA extraction was in the form of fresh young leaves collected in each country of origin during the apricot flowering period (between March and May 2008). For this material, total genomic DNA was extracted from 150 mg of fresh young leaves using the DNeasy Plant Mini Kit (QIAGEN, Germany) according to the manufacturer's instructions, with one minor modification: the addition of 1% (w/v) PVP-40 to the AP1 buffer solution. For the Turkish accessions, total genomic DNA was extracted at the Apricot Research Center of the Inonu University of Malatya according to the same protocol. The DNA aliquots were then purified in the INRA Montpellier laboratory according to the protocol described above.

Twenty-five microsatellite markers were selected on the basis of their ease of amplification in apricot and their location on the Prunus reference genetic map ‘Texas’ × ‘Earlygold’ [53, 54], as they are evenly distributed throughout the eight linkage groups of the Prunus genome (Table 1). These SSR loci were successfully used in a study on Tunisian apricot [34], and Mnejja et al. [36] have demonstrated the successful transferability of most of them among Prunus species. PCR was carried out in a 20 μl reaction mixture containing 20 ng of template DNA, 2 mM MgCl2, 4 pmol of reverse primer and 1 pmol of forward primer, 0.2 mM of each dNTP, and 1 U of Taq polymerase (Sigma, USA). The forward primer was 5’-labeled with one of three fluorochromes (6FAM, NED or HEX). The PCR conditions were as follows: 35 cycles of 94°C for 30 s, the locus-specific annealing temperature for 60 s, and 72°C for 60 s, followed by a final extension step at 72°C for 10 min. Amplified products were resolved using an ABI Prism 3130 XL automatic DNA sequencer (Applied Biosystems, USA). Allele sizes were determined with GeneMapper 3.7 software (Applied Biosystems, USA).

Genetic structure analysis

The model-based clustering approach, as implemented in the STRUCTURE 2.2 software program [45], was used to infer the population structure of Mediterranean apricot accessions. This program assumes Hardy–Weinberg equilibrium and linkage equilibrium within clusters. No prior information about the geographic origin of the accessions was considered in the analysis. The STRUCTURE algorithm was run using a model with admixture and correlated allele frequencies, with 10 independent replicate runs for each K value (number of genetic clusters), for K ranging from 1 to 6. Each run involved a burn-in period of 100,000 iterations and a post-burn-in simulation length of 1,000,000 iterations. The run with the maximum likelihood was used to assign the most probable number of clusters, which was validated with an ad hoc statistic based on the second order rate of change in the log probability of data between successive K values [46]. To find optimal alignments of independent runs, the CLUMPP version 1.1 software program [55] was used with the greedy algorithm, 10,000 random input orders and 10,000 repeats, to calculate the average pairwise similarity (H’) of runs. The output obtained was used directly as input for the cluster visualization program DISTRUCT version 1.1 [56].
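As an illustration only, the ad hoc statistic based on the second order rate of change (commonly denoted ΔK) can be sketched as follows. This sketch assumes that the second-order difference is taken on the mean log probability across replicate runs and divided by the standard deviation of the replicates at K; the function name `delta_k` and the input values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} from replicate Ln P(D) values.

    lnp_by_k maps each K to the list of Ln P(D) values of its replicate runs.
    Delta K is undefined for the smallest and largest K tested, and is skipped
    for any K whose replicates have zero standard deviation.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # no neighbouring K on one side
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # avoid division by zero
        second_order = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_order / spread
    return result


# Hypothetical Ln P(D) values (two replicates per K), not data from the study.
example_runs = {
    1: [-100.0, -102.0],
    2: [-80.0, -82.0],
    3: [-78.0, -80.0],
    4: [-77.0, -79.0],
}
example_delta_k = delta_k(example_runs)
best_k = max(example_delta_k, key=example_delta_k.get)
print(best_k, example_delta_k)
```

With K tested from 1 to 6, as in the analysis described above, the statistic is defined only for K = 2 to 5, which is one reason to inspect the likelihood values alongside it.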

Genetic diversity and differentiation assessment

GENETIX 4.05 [57] was used to calculate the following parameters: total number of alleles (NA), number of alleles with frequency higher than 5% (NA,P), and observed (Ho) and expected (He) heterozygosities. PowerMarker 3.25 [58] was used to estimate the polymorphic information content (PIC) at each locus, originally defined as the probability of a given marker being informative in a random mating [59]. The inbreeding coefficient (FIS) and the genetic differentiation (FST) were computed according to the formula of Weir and Cockerham [60] using GENEPOP 4.0 [61], and Fisher’s method [62] was applied to test the significance of pairwise FST values. The generalized rarefaction approach ADZE [63] with standardized values was used to estimate the allelic richness (Ar) and the private allelic richness (Apr).
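To make these per-locus estimators concrete, the following minimal sketch computes observed heterozygosity (Ho), expected heterozygosity (He), PIC and a rarefied allelic richness for a single diploid locus. It is not the implementation used by GENETIX, PowerMarker or ADZE. It assumes the uncorrected estimator He = 1 − Σp², the PIC formula 1 − Σp² − Σ(i<j) 2p_i²p_j², and the standard rarefaction formula for the expected number of alleles in a sub-sample of g gene copies. All function names and genotypes are hypothetical.

```python
from collections import Counter
from math import comb


def allele_counts(genotypes):
    """Count allele copies at one locus; genotypes is a list of (a1, a2) pairs."""
    return Counter(allele for genotype in genotypes for allele in genotype)


def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(counts):
    """He = 1 - sum(p_i^2), without small-sample correction (assumption)."""
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def pic(counts):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    total = sum(counts.values())
    p = [c / total for c in counts.values()]
    cross = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1.0 - sum(x * x for x in p) - cross


def rarefied_allelic_richness(counts, g):
    """Expected number of distinct alleles in a random sub-sample of g copies."""
    total = sum(counts.values())
    if g > total:
        raise ValueError("g cannot exceed the number of sampled gene copies")
    return sum(1.0 - comb(total - c, g) / comb(total, g) for c in counts.values())


# Hypothetical SSR genotypes (allele sizes in bp) for four accessions.
example_genotypes = [(120, 120), (120, 124), (124, 124), (120, 124)]
example_counts = allele_counts(example_genotypes)
example_ho = observed_heterozygosity(example_genotypes)
example_he = expected_heterozygosity(example_counts)
example_pic = pic(example_counts)
example_ar = rarefied_allelic_richness(example_counts, 2)
print(example_ho, example_he, example_pic, example_ar)
```

Rarefying to a common sub-sample size g is what allows allelic richness to be compared between groups of unequal size, such as the eleven geographic groups described above.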

Pairwise standard genetic distances of Nei [44] were calculated and used to conduct cluster analysis with the Neighbor-joining algorithm and to construct an unrooted tree with 10,000 bootstraps over microsatellite loci, as implemented in PHYLIP 3.69 [64]. The analysis of molecular variance (AMOVA) implemented in GENALEX [65] was conducted to estimate the hierarchical differentiation at two levels: (i) a group level identifying the eleven apricot geographic groups; and (ii) a region level distinguishing the three geographic regions A (Iran and Turkey), B (Continental Europe, South France and South Italy), and C (Murcia, North Tunisia, Moulouya Valley, Messaad, Oases of Tunisia and Draa Valley).
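For reference, a minimal sketch of a pairwise standard genetic distance between two groups is given below. It assumes the uncorrected form D = −ln(Jxy / √(Jx·Jy)), where Jx, Jy and Jxy are sums over loci of Σx², Σy² and Σxy for allele frequencies x and y; any sample-size correction applied by PHYLIP is not reproduced here. The function name and the allele frequencies are hypothetical.

```python
from math import log, sqrt


def nei_standard_distance(pop_x, pop_y):
    """Standard genetic distance between two populations.

    pop_x and pop_y are lists with one dict per locus, mapping each allele to
    its frequency in the population; loci must be in the same order.
    """
    jx = jy = jxy = 0.0
    for freq_x, freq_y in zip(pop_x, pop_y):
        jx += sum(p * p for p in freq_x.values())
        jy += sum(p * p for p in freq_y.values())
        jxy += sum(p * freq_y.get(allele, 0.0) for allele, p in freq_x.items())
    if jxy == 0:
        return float("inf")  # no shared alleles at any locus
    return -log(jxy / sqrt(jx * jy))


# Hypothetical allele frequencies at a single locus for two groups.
group_a = [{"120": 1.0}]
group_b = [{"120": 0.5, "124": 0.5}]
example_distance = nei_standard_distance(group_a, group_b)
print(example_distance)
```

A matrix of such pairwise values between the eleven groups is the kind of input a neighbor-joining routine requires.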

Definition of neutral marker sub-samples

The observed variation of genetic diversity among clusters might be due to two evolutionary factors: selection and bottleneck effects. Selection could bias the estimate of the impact of bottlenecks on the apricot diffusion history. A sub-sample of neutral markers was thus defined and used for the genetic diversity analysis. For this purpose, the method developed by Beaumont and Nichols [66] was used, which detects unusually high or low FST levels by plotting FST against heterozygosity for the set of markers. The Fdist2 analysis was conducted on three pairwise comparisons of the three main genetic clusters. We compared (i) clusters 1 (‘Irano-Caucasian’) and 3 (‘North Mediterranean Basin’), (ii) clusters 1 and 4 (‘South Mediterranean Basin’), and (iii) clusters 3 and 4. For each comparison, 50,000 simulations were run using the infinite allele model for markers. A first analysis revealed a first set of outliers. These were removed and a new FST was calculated, which was used in a new analysis to reveal a new set of outliers. Analyses were iterated until no further locus fell outside the expected distribution. The last FST value was used as the neutral value to detect outliers in the whole data set.
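The iterative logic of this procedure can be sketched as follows. The sketch does not reproduce Fdist2: the coalescent simulations that define the expected distribution are replaced by a caller-supplied function, and the neutral FST is approximated by a simple mean over the retained loci. Both simplifications are assumptions made for illustration, and all names and values are hypothetical.

```python
from statistics import mean


def iterate_neutral_fst(fst_by_locus, flag_outliers):
    """Iteratively remove outlier loci and return (neutral FST, final outliers).

    flag_outliers(fst_subset, neutral_fst) must return the set of loci in
    fst_subset that fall outside the expected distribution. In a real analysis
    this step would rely on simulations such as those run by Fdist2.
    """
    retained = dict(fst_by_locus)
    neutral_fst = mean(retained.values())
    while True:
        outliers = flag_outliers(retained, neutral_fst)
        if not outliers or len(outliers) == len(retained):
            break  # converged, or nothing would remain
        for locus in outliers:
            del retained[locus]
        neutral_fst = mean(retained.values())
    # The last neutral value is applied to the whole set of loci.
    return neutral_fst, flag_outliers(fst_by_locus, neutral_fst)


def toy_flag(fst_subset, neutral_fst, tolerance=0.2):
    """Toy stand-in for the simulated envelope: flag large absolute deviations."""
    return {
        locus
        for locus, fst in fst_subset.items()
        if abs(fst - neutral_fst) > tolerance
    }


# Hypothetical per-locus FST values, not results from the study.
example_fst = {"locusA": 0.05, "locusB": 0.07, "locusC": 0.06, "locusD": 0.50}
neutral_value, final_outliers = iterate_neutral_fst(example_fst, toy_flag)
print(neutral_value, final_outliers)
```

Recomputing the neutral value after each round matters because a strongly selected locus inflates the initial mean, which would otherwise widen the envelope used to judge the remaining loci.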

Determination of the relative reduction of diversity

The relative reduction of diversity was estimated with five diversity estimators: the total number of alleles, allelic richness, private allelic richness, observed heterozygosity and expected heterozygosity, as described by Vigouroux et al. [67]. For each of these diversity estimators, the significance of the difference between regions/clusters was tested with the two-tailed Mann-Whitney U test using R version 2.11.0 [68]. For each estimator, the relative reduction of diversity was determined by calculating 1-(DIV1/DIV2), where DIV1 is the estimator of diversity in the presumed derived gene pool and DIV2 is the estimator of diversity in the presumed source gene pool. The relative reduction of diversity was estimated using all SSR markers for the comparison among regional groups, and using both all 25 SSR markers and the neutral SSR markers only when comparing the three main genetic clusters.
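As a worked illustration of the formula 1-(DIV1/DIV2), the sketch below computes the relative reduction from per-locus values of one estimator, together with the Mann-Whitney U statistic obtained by direct pair counting. It assumes that DIV1 and DIV2 are means over loci, and it returns only the U statistic; the p-value would be obtained from a dedicated routine such as `wilcox.test` in R. Function names and values are hypothetical and are not results from this study.

```python
from statistics import mean


def relative_reduction(div_derived, div_source):
    """Return 1 - (DIV1 / DIV2) for a derived and a source gene pool."""
    return 1.0 - div_derived / div_source


def mann_whitney_u(sample_x, sample_y):
    """U statistic of sample_x: pairs where x > y count 1, ties count 0.5."""
    u = 0.0
    for x in sample_x:
        for y in sample_y:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical per-locus expected heterozygosity in two gene pools.
source_pool = [0.8, 0.7, 0.9, 0.6]
derived_pool = [0.5, 0.4, 0.6, 0.3]

example_reduction = relative_reduction(mean(derived_pool), mean(source_pool))
example_u = mann_whitney_u(derived_pool, source_pool)
print(example_reduction, example_u)
```

The two U statistics of a comparison always sum to the product of the sample sizes, so a value near zero for the derived pool indicates that almost every locus is less diverse there than in the source pool.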

Authors’ information

HB is a PhD student who will defend her thesis entitled “Apricot genetic structure in the Mediterranean Basin: domestication and diffusion process” in 2012. JMA is a scientist at INRA, the French national institute for agricultural research, and is in charge of INRA’s fruit research group and in particular apricot research program including genetic resource management and breeding programs. LK is an Assistant Professor at the University of Sciences of Gafsa and a scientist in the Laboratory of Molecular Genetics, Immunology and Biotechnology at the University of Sciences of Tunis and is in charge of the apricot genetic resource management. NTF is a Professor at the University of Sciences of Tunis and Director of the Laboratory of Molecular Genetics, Immunology and Biotechnology at the University of Sciences of Tunis and is in charge of fruit genetic resource management. AM is a scientist at the Moroccan national institute for agricultural research and is in charge of apricot genetic resource management and breeding. ST is a scientist at the Laboratoire d’Arboriculture at the University of Blida. CD is an Assistant Professor at the University of Pisa, Head of the Laboratory of Molecular Biology of Fruit Science and in charge of local fruit resource genotyping. BMA is a scientist in the Department of Biology at Inonu University. SS is an engineer, molecular biologist, and is in charge of the development and use of molecular markers. BK is a geneticist in the conservation biology field, focusing on diversification of Mediterranean fruit species and on the management of agrobiodiversity and agroecosystems.