Introduction

Apricot (Prunus armeniaca L.) is a deciduous stone fruit tree that is globally cultivated in temperate climate zones, especially around the Mediterranean basin and in the Middle East (Zhebentyayeva et al. 2012). Nowadays, the fruits are most often eaten fresh, and the home-made production of dried, canned, or jammed apricots has strongly declined in Western countries (Khursheed et al. 2020). The main industrial uses are for juices, puree, jam, drinks, and as a flavoring agent (e.g., confectionery, ice creams, and dairy) (Roussos et al. 2016). Despite this gastronomic versatility, the sweet, savory, and vividly colored fruits have a limited shelf life and are largely seasonal, two main factors that negatively impact consumer choice and the food industry, respectively.

Italy is one of the largest producers of apricots in the world (Khursheed et al. 2020), with the Campania region (Southern Italy) providing approximately one-third of the national production (Istituto Nazionale di Statistica; https://www.istat.it). In the last decades, the introduction of cultivars of foreign origin has driven a significant varietal renewal, further promoting the abandonment of traditional varieties (Biscotti et al. 2022; Corrado et al. 2021). The main drivers of this change were the market appreciation of strongly (over-)colored fruits, the preference for varieties specifically bred for fresh consumption, as well as resistance to stress and a ripening time that allows growers to extend the harvest season (Mennone 2016). Recent studies highlighted that germplasm erosion and loss of genetic diversity in the contemporary apricot germplasm are worldwide concerns and called for action to preserve traditional varieties and local landraces, also for possible applied exploitation (Rossarolla et al. 2020; Herrera et al. 2021; Hagen et al. 2002). For instance, Italy has one of the largest pools of apricot landraces (Ledbetter 2008) and, despite the known limitations related to fruit perishability, they should be considered a source of genetic variation for adaptive traits, bearing in mind the specific needs of the Italian production systems (Corrado and Rao 2017; Hormaza et al. 2007; Bartolini et al. 2020; Mennone 2016). Moreover, it is debated whether the quality standards of early blooming and maturing contemporary cultivars (mainly focused on fruit shape, size, symmetry, and color) may have inadvertently provided an additional reason for the commercial decline of this crop, since Italian consumers were more accustomed to sweeter and less sour fruits (Sansavini 2019). Finally, the promotion of this germplasm is also a means to sustain the local economy and to improve the management of peri-urban and rural landscapes (Zimmerer et al. 2015; Corrado et al. 2021).

To protect the Campanian apricot germplasm, an ex-situ repository has been established based on folk taxonomy and local knowledge, yet this material has been little exploited (Banca Regionale del Germoplasma, Regione Campania; http://www.agricoltura.regione.campania.it/). The molecular characterization of the diversity is essential for managing germplasm (Sheikh et al. 2021; Gürcan et al. 2019), assessing clonal relationships (Fossati et al. 2005), and identifying duplicated accessions and mislabeling (e.g., synonyms, homonyms) (Potts et al. 2012). Moreover, research on the population structure is important for the definition of conservation units, which are essential in the more demanding management of landraces that cannot be propagated by seeds. From these perspectives, microsatellites, commonly known as Simple Sequence Repeats (SSRs) or Short Tandem Repeats (STRs), are among the markers of choice in plant science because they are codominant, multiallelic, and usually highly polymorphic (Varshney et al. 2005). SSRs are also appropriate to detect recent demographic events and population-specific alleles, and are thus particularly suitable to reveal population structure in local populations (Tsykun et al. 2017). The informativeness and utility of SSRs have been considerably enhanced by the diffusion of capillary electrophoretic (CE) systems (Butler 2009). For germplasm management, and more generally forensics, CE has not only increased throughput and multiplexing ability, but also greatly limited inter-laboratory variability, thus making it possible to retrieve highly reproducible, robust, and transferable information, which is crucial to build databases of molecular profiles (Baric et al. 2008; Ordidge et al. 2021).

In this work, we analyzed an ex-situ germplasm collection of seventy-three apricot landrace varieties, collected in the Campania region and cultivated also in Southern Italy. By using fluorescent SSR markers resolved in capillary electrophoresis (SSR-CE), we aimed to investigate the level of molecular diversity and verify the presence of a possible genetic structure. Specifically, we provide an in-depth description of the relatedness among apricot landraces as the first step to facilitate their agronomic characterization, ex-situ management, and possibly, promotion in local markets.

Material and methods

Plant material

This work was carried out on seventy-three landrace varieties of Prunus armeniaca L., namely (known synonyms are in square brackets; abbreviations used in this article in round brackets): 'Abate' (ABA), 'Abatone' (ABT), 'Ananassa' (ANA), 'Antonaniello' (ANT), 'Aronzo' (ARO), 'Baracca' (BAR), 'Boccuccia' (BOC), 'Boccuccia bianca' (BCA), 'Boccuccia di Eboli' (BCE), 'Boccuccia grossa' (BCG), 'Boccuccia liscia ii' (BCL), 'Cafona' (CAF), 'Campana' (CAM), 'Cardinale' (CAD), 'Carpona' (CAR), 'Casino' (CAS), 'Cerasiello' (CEI), 'Cerasona' (CEO), 'Cristiana' (CRI), 'Diavola' (DIA), 'Don Aniello' (DON), 'Ebolitana' (EBO), 'Fracasso' (FRA), 'Fronne fresche' (FRO), 'Grangicana' (GRA), 'Lisandrina' (LIS), 'Macona' (MAC), 'Magnalona' (MAG), 'Mammana' (MAM), 'Montedoro' (MND), 'Monteruscello' (MNR), 'Nonno' (NON), 'Ottavianese' (OTT), 'Palumella' (PAL), 'Palumella ii' (PAM), 'Panzona' (PAN), 'Paolona' (PAO), 'Pazza' (PAZ), 'Pelese Correale [Pelese]' (PEC), 'Pelese di Giovanniello' (PEG), 'Persechella' (PER), 'Piciona' (PIC), 'Portuallara' (POR), 'Presidente' (PRE), 'Prevetone' (PRV), 'Puscia' (PUS), 'Puzo' (PUZ), 'Resina' (RES), 'S. Francesco' (SAF), 'S. Giorgio' (SAG), 'Sant'Antonio' (SAN), 'Scassulillo' (SCA), 'Scassulillo grande' (SCG), 'Scecquagliella II' (SCE), 'Schiavona' (SCH), 'Scialo'' (SCI), 'Secondina' (SEC), 'Setacciara' (SET), 'Signora' (SIG), 'Silvana' (SIL), 'Sonacampana' (SON), 'Sorrentino' (SOR), 'Stella' (STE), 'Stradona' (STR), 'Taviello' (TAV), 'Tre [Tre Palle]' (TRE), 'Vicario' (VCA), 'Vicienzo [Vicienzo 'e Maria]' (VCI), 'Zeppa [Zeppa 'e sisco]' (ZEP), 'Zeppona' (ZPO), 'Zi' Francesco' (ZIF), 'Zi' Luisa' (ZIL), and 'Zi' Ramunno' (ZIR). The adult trees belong to the collection of the Azienda Agricola Sperimentale Regionale ‘Improsta’ (Centro per la Ricerca Applicata in Agricoltura, Regione Campania), located in Eboli (SA, Italy).

DNA isolation and fluorescent SSR-capillary electrophoresis (SSR-CE)

Five young, healthy-looking leaves per plant were harvested and immediately frozen in liquid nitrogen. We analyzed two trees per landrace. Leaves were stored at −80 °C until analysis, then finely ground in liquid nitrogen, and DNA was isolated as previously described (Corrado et al. 2021). DNA fingerprinting was performed using eight highly polymorphic apricot SSR loci (AMPA095, AMPA112, UDAp401, UDAp410, UDAp414, UDAp415, UDAp420, and UDAp446), selected from the literature (Hagen et al. 2004; Messina et al. 2004; Rao et al. 2010). Primer sequences and main features are reported in Supplementary Table 1. Reactions were assembled in a final volume of 25 µL using as template 100 ng of genomic DNA, as estimated by agarose gel electrophoresis (Sambrook et al. 1989). The thermal profile of the PCR and the primer-specific annealing temperatures are reported in Supplementary Table 1. The success of the amplification was first checked by agarose gel electrophoresis (Sambrook et al. 1989), while allelic discrimination was carried out by fluorescence-based capillary electrophoresis on an ABI PRISM 3130 Avant Genetic Analyzer (Thermo Fisher Scientific, Milan, Italy) as already described (Verdone et al. 2018). Automated fragment data analysis was carried out with the GeneScan 4.3 software (Thermo Fisher Scientific) on the basis of the sizeable peaks of the GeneScan 500 LIZ dye internal standard (Thermo Fisher Scientific). Manual binning was independently performed on each SSR locus to minimize the mean offset of allelic sizes within the instrument resolution (± 1 bp).

Data analysis

For the analysis of locus-based indices of genotypic diversity, we calculated for each SSR locus: the allelic size range (ASR) in bp; the number of alleles (Na); the number of MultiLocus Genotypes (MLG); the Effective number of alleles (Ne) as $1/\sum p_i^2$; the Shannon Index of Diversity (I) as $-\sum (p_i \ln p_i)$; the Evenness (E) as $((1/\lambda) - 1)/(e^{I} - 1)$; the Observed Heterozygosity (Ho); the Polymorphism Information Content (PIC, also known as gene diversity) as $1 - \sum p_i^2$; and the Wright’s Fixation Index (F) as (He − Ho)/He, where for each locus, $p_i$ is the frequency of the i-th allele, $\sum p_i^2$ is the sum of the squared population allele frequencies, $1/\lambda$ is the Stoddart and Taylor’s index, and He is the expected heterozygosity. The significance of bivariate correlations was assessed using the Pearson correlation coefficient. The index of association (Ia) and the modified scaled (ranging from 0 to 1) measure rd were calculated to detect signatures of sexual reproduction as described (Brown et al. 1980; Agapow and Burt 2001). Missing data were ignored and data resampling for statistical testing was performed with permutations over alleles (n = 999). Pairwise resemblance between varieties was calculated with the Prevosti’s absolute genetic distance and the dendrogram was built using the unweighted pair group method with arithmetic mean (UPGMA) algorithm (Prevosti et al. 1975). These calculations were carried out with GenAlEx and poppr (Smouse and Peakall 2012; Kamvar et al. 2014).
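As an illustration of the allele-based indices defined above, the following minimal Python sketch computes Na, Ne, I, Ho, gene diversity, and F for a single diploid locus. It is not the GenAlEx or poppr implementation: the function name `locus_diversity`, the input layout, and the example genotypes are assumptions made only for this example, and the expected heterozygosity is taken as the uncorrected $1 - \sum p_i^2$.

```python
import math
from collections import Counter


def locus_diversity(genotypes):
    """Allele-based diversity indices for one diploid SSR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per variety
    (hypothetical input layout; allele sizes in bp).
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    freqs = [count / len(alleles) for count in counts.values()]
    sum_p2 = sum(p * p for p in freqs)

    na = len(counts)                                 # number of alleles
    ne = 1.0 / sum_p2                                # effective number of alleles
    shannon = -sum(p * math.log(p) for p in freqs)   # Shannon index I
    he = 1.0 - sum_p2                                # gene diversity (uncorrected He)
    ho = sum(a != b for a, b in genotypes) / len(genotypes)
    # The fixation index is undefined for a monomorphic locus (He = 0).
    f = (he - ho) / he if he > 0 else float("nan")
    return {"Na": na, "Ne": ne, "I": shannon, "Ho": ho, "He": he, "F": f}


# Invented genotypes at one locus, for illustration only.
example = [(100, 102), (100, 102), (100, 100), (102, 102)]
indices = locus_diversity(example)
```

With two equally frequent alleles and half of the samples heterozygous, the sketch returns Ne = 2, He = Ho = 0.5, and F = 0, the values expected for a locus with no heterozygote excess or deficit.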

The occurrence of a structured population was evaluated using the model-based Bayesian procedure implemented in the software Structure v2.3 (Pritchard et al. 2000). The analysis was performed using a burn-in period of 50,000 iterations and a run length of 250,000 MCMC replications. We tested a continuous series of K values, from 1 to 11, in ten independent runs, without introducing prior knowledge about the population, and assuming correlated allele frequencies and admixture (Falush et al. 2003). The most informative K was identified using the so-called Evanno’s method (DeltaK), based on the rate of change in the log probability of data between successive K values (Evanno et al. 2005). The estimated cluster membership coefficient matrices of the ten runs were permuted so that all replicates had the closest match possible and then averaged across replicates using the Greedy algorithm of the software CLUMPP with 9999 permutations (Jakobsson and Rosenberg 2007). To statistically validate the estimated populations, we calculated pairwise Fst and Nei’s standard genetic distance (Dst) between populations using MSA (Dieringer and Schlötterer 2003). The reference distribution for p value calculation of the Fst analysis was based on 9999 permutations.
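The DeltaK criterion can be sketched in a few lines of Python. The version below is a simplified reading of Evanno et al. (2005), assuming that the second-order difference is taken on the run means of the log probability and divided by the standard deviation across runs at the same K. The function name `delta_k` and all log-probability values are invented for illustration and are not output of the analysis described here.

```python
import statistics


def delta_k(log_probs):
    """Evanno-style DeltaK from replicate log probabilities of the data.

    log_probs: dict mapping K to a list of ln P(D) values, one per run.
    DeltaK is only defined for K values that have both neighbours (K-1, K+1).
    """
    means = {k: statistics.mean(v) for k, v in log_probs.items()}
    sds = {k: statistics.stdev(v) for k, v in log_probs.items()}
    result = {}
    for k in sorted(log_probs):
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            second_order = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_order / sds[k]
    return result


# Invented values for K = 1..5, two runs each (illustration only).
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-700.0, -702.0],
    4: [-690.0, -692.0],
    5: [-685.0, -687.0],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
```

By construction, DeltaK cannot be evaluated for the smallest and largest K tested, so a series from 1 to 11 yields values for K from 2 to 10 only.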

Results

Genetic and genotypic diversity in the apricot landraces

The SSR fingerprinting of the germplasm collection indicated that all the loci were polymorphic, the maximum ploidy for each sample was two, and that the allelic size range was consistent with the values reported in the literature (Table 1) (Hagen et al. 2004; Messina et al. 2004; Rao et al. 2010). The SSR output of the two trees of the same landrace was always identical, and therefore a single profile per variety was considered for subsequent analyses. The number of alleles greatly varied among loci, with a coefficient of variation (CV) of 27.5%. Specifically, the number of alleles ranged from twelve (UDAp446) to five (AMPA095). The most diverse locus according to the Effective number of alleles (i.e., the number of alleles weighted for their frequencies) was UDAp401 (with eight alleles), while UDAp446 ranked in the bottom half. UDAp401 was also the most informative locus considering the PIC, although differences among loci were limited (CV: 12.1%). On the other hand, the number of alleles had a large positive correlation (r = 0.79; p = 0.02) with the number of detected multilocus genotypes (MLG). This index ranged from 18 for AMPA112, AMPA095 and UDAp415, to eight for UDAp446 and UDAp414, and it was the most variable (CV: 31.6%) among the calculated indices of genetic diversity. The Simpson’s index of diversity (1–D) also largely varied across loci (CV = 18.4%) yet it was higher than one for every locus. A slightly lower variation across loci was present for the Evenness (CV: 13.9%), a measure related to the ratio between the more abundant and the rarer genotypes. This index was on average high (0.71 ± 0.04) and also less variable than the number of alleles per locus. The moderate negative linear correlation (r = − 0.52; Pearson’s Correlation) between the number of alleles and their Evenness was not significant (p = 0.18). The observed heterozygosity (Ho) was high (0.73 ± 0.03) and varied little among loci (CV = 12.3%). 
As for the Evenness, Ho was not significantly associated with the number of alleles (r = − 0.34; p = 0.40) nor with the number of MLG (r = − 0.31; p = 0.45). Overall, the indices of genetic diversity are consistent with the detection, by highly polymorphic loci, of a non-adaptive genetic diversity, possibly distributed through clonal selection. Consistently, the Fixation Index was close to zero for all loci (mean ± s.e.: − 0.07 ± 0.06), except for UDAp414 (− 0.35), suggesting a limited heterotic selection or negative assortative mating in our population, despite the high Ho values.

Table 1 Main indices of molecular diversity of the apricot collection

Considering that the high number of genotypes was collected in a relatively small area, and the possible on-farm clonal propagation and exchange of fruit tree landraces, we tested whether the population was partially or predominantly clonal, that is, with a considerable disequilibrium among loci due to linkage. To this aim, we calculated the index of association (Ia) and the rd, a related index weighted for the number of loci, which are used to detect signatures of sexual reproduction. The analysis indicated an Ia of 0.58 and a p value lower than 0.01 for an rd of 0.084. The latter falls well outside of the calculated distribution that is expected under no linkage (Fig. 1), indicating that the population under investigation is predominantly of sexual origin.
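For readers unfamiliar with these two statistics, the sketch below shows how Ia and rd can be obtained from the variances of pairwise distances, following the general definitions of Agapow and Burt (2001). It is a simplified illustration, not the poppr code: the per-locus distance (number of alleles not shared between two diploid genotypes), the resampling scheme (shuffling whole genotypes within each locus, rather than alleles), and all names and data are assumptions.

```python
import itertools
import math
import random
import statistics
from collections import Counter


def locus_distance(g1, g2):
    """Number of alleles of g1 not matched in g2 (0, 1 or 2 for diploids)."""
    return sum((Counter(g1) - Counter(g2)).values())


def association_indices(samples):
    """Return (Ia, rd). samples: list of individuals, each a list of
    diploid genotypes (one tuple per locus, same locus order throughout)."""
    n_loci = len(samples[0])
    pairs = list(itertools.combinations(range(len(samples)), 2))
    per_locus = [
        [locus_distance(samples[a][j], samples[b][j]) for a, b in pairs]
        for j in range(n_loci)
    ]
    totals = [sum(column) for column in zip(*per_locus)]
    v_obs = statistics.pvariance(totals)           # observed variance
    variances = [statistics.pvariance(d) for d in per_locus]
    v_exp = sum(variances)                         # expected under no linkage
    scale = 2 * sum(
        math.sqrt(vj * vk) for vj, vk in itertools.combinations(variances, 2)
    )
    ia = v_obs / v_exp - 1 if v_exp > 0 else float("nan")
    rd = (v_obs - v_exp) / scale if scale > 0 else float("nan")
    return ia, rd


def permutation_p_value(samples, n_perm=999, seed=0):
    """One-sided p value for rd, shuffling genotypes within each locus."""
    rng = random.Random(seed)
    observed = association_indices(samples)[1]
    hits = 0
    for _ in range(n_perm):
        columns = []
        for j in range(len(samples[0])):
            column = [individual[j] for individual in samples]
            rng.shuffle(column)
            columns.append(column)
        shuffled = [list(row) for row in zip(*columns)]
        if association_indices(shuffled)[1] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented, fully linked two-locus data (illustration only).
linked = [
    [(1, 1), (1, 1)],
    [(1, 1), (1, 1)],
    [(2, 2), (2, 2)],
    [(2, 2), (2, 2)],
]
ia, rd = association_indices(linked)
```

In this toy data set the two loci always change together, so both Ia and rd equal 1, the upper bound of rd.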

Fig. 1

Histogram of resampled rd values (n = 999). The overlaid dashed vertical blue line indicates the observed rd. The ticks at the bottom represent individual observations

The molecular profiles were used to calculate pairwise genetic distances, which were then used to build a UPGMA dendrogram (Fig. 2).

Fig. 2

Dendrogram of the apricot landrace varieties. Pairwise genetic distances were calculated with the Prevosti's coefficient. The agglomerative clustering method was UPGMA
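The two steps behind such a dendrogram, a pairwise distance and an agglomerative clustering, can be sketched as follows. This is a minimal illustration and not the poppr routine: the distance is written as the sum of absolute differences of within-individual allele frequencies divided by twice the number of loci, the tree is returned as nested tuples without branch lengths, and the names `prevosti`, `upgma`, and the three profiles are hypothetical.

```python
import itertools
from collections import Counter


def prevosti(ind1, ind2):
    """Prevosti-type absolute distance between two multilocus genotypes.

    ind1, ind2: lists of diploid genotypes (one tuple per locus).
    Returns 0 for identical profiles and 1 when no allele is shared.
    """
    total = 0.0
    for g1, g2 in zip(ind1, ind2):
        c1, c2 = Counter(g1), Counter(g2)
        for allele in set(c1) | set(c2):
            total += abs(c1[allele] / len(g1) - c2[allele] / len(g2))
    return total / (2 * len(ind1))


def upgma(names, dist):
    """UPGMA topology as nested tuples.

    dist: dict mapping frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in names}   # cluster -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        a, b = min(itertools.combinations(sizes, 2),
                   key=lambda pair: d[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        for other in sizes:
            # Size-weighted average of the distances to the merged clusters.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Invented two-locus profiles (illustration only).
profiles = {
    "V1": [(100, 102), (200, 200)],
    "V2": [(100, 102), (200, 200)],
    "V3": [(104, 104), (204, 206)],
}
distances = {
    frozenset((a, b)): prevosti(profiles[a], profiles[b])
    for a, b in itertools.combinations(profiles, 2)
}
tree = upgma(list(profiles), distances)
```

Identical profiles, such as V1 and V2 here, have a distance of zero and are therefore joined first.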

This analysis showed that some varieties had the same genetic profile. The identical profiles were (in parentheses, the cumulative product of the genotype probability calculated over all the SSR loci; RMP): DIA and PAL ($2.12 \times 10^{-10}$), BCA and BCE ($6.86 \times 10^{-7}$), BCG and ZIR ($3.66 \times 10^{-6}$), CAS and GRA ($2.44 \times 10^{-11}$), and the three samples ABA, ABT and BCL ($1.23 \times 10^{-6}$). In most cases, the vernacular classification supports the presence of derived names from a possible landrace group, such as for the cluster that comprises the identical profiles ‘Boccuccia di Eboli’ (BCE) and ‘Boccuccia bianca’ (BCA), and for ‘Abate’ (ABA) and ‘Abatone’ (ABT). In the other cases, mislabeling or erroneous denominations (e.g., synonyms) are possible, also considering the low value of the genotype probability.
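The cumulative genotype probability reported in parentheses can be illustrated with a short Python sketch. It assumes Hardy-Weinberg proportions at each locus ($p^2$ for a homozygote, $2pq$ for a heterozygote) and independence among loci; these assumptions, the function name `genotype_probability`, and the allele frequencies are hypothetical and are not taken from the analysis described here.

```python
def genotype_probability(profile, allele_freqs):
    """Product over loci of the expected frequency of a diploid genotype.

    profile: dict mapping locus name to an (allele_a, allele_b) tuple.
    allele_freqs: dict mapping locus name to {allele: frequency}.
    Assumes Hardy-Weinberg proportions and independent loci.
    """
    probability = 1.0
    for locus, (a, b) in profile.items():
        p = allele_freqs[locus][a]
        q = allele_freqs[locus][b]
        probability *= p * p if a == b else 2 * p * q
    return probability


# Invented allele frequencies and profile (illustration only).
freqs = {
    "locus_1": {100: 0.5, 102: 0.5},
    "locus_2": {200: 0.1, 204: 0.9},
}
shared_profile = {"locus_1": (100, 102), "locus_2": (200, 200)}
rmp = genotype_probability(shared_profile, freqs)
```

Because the per-locus values are multiplied, each additional polymorphic locus lowers the probability that two unrelated samples share a profile by chance.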

Analysis of the population structure and differentiation

In the absence of an a priori classification, the identification of genetically similar groups of apricot landraces was performed using a widely employed admixture model-based clustering method that also allows proportional assignment to multiple populations. The most informative number of subpopulations (K) was five according to the second order rate of change of the estimate of the conditional posterior probability of the simulation (Fig. 3).

Fig. 3

Estimation of the optimum number of clusters in the apricot germplasm based on the second order rate of change of the conditional posterior probability of the simulation, also known as the Evanno's test. The graph displays the Delta K for each K value tested

The inferred population structure for K = 5 is presented in Fig. 4 and the CLUMPP generated Q-matrix is reported in Supplementary Table 2.

Fig. 4

Estimated population structure of the apricot varieties. Each variety is represented by a vertical line, which is partitioned into colored segments that represent the estimated membership fractions in the five clusters (C). See Fig. 3 for the determination of the optimal number of clusters and Supplementary Table 2 for the Q-matrix

A high proportion of genotypes (68%) had a membership coefficient higher than 0.8 and overall, most varieties were strongly assigned to subpopulations (Supplementary Fig. 1), suggesting also a reduced genetic admixture.
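The share of strongly assigned genotypes can be derived from a Q-matrix as in the following sketch, which takes the 0.8 threshold from the text. The function name `assignment_summary`, the variety labels, and the membership coefficients are invented for illustration; they do not reproduce Supplementary Table 2.

```python
def assignment_summary(q_matrix, threshold=0.8):
    """Assign each variety to its best cluster if the membership is high enough.

    q_matrix: dict mapping variety name to a list of membership coefficients
    (one per cluster, summing to about 1).
    Returns (assignments, share), where assignments maps each variety to a
    1-based cluster number or None (admixed), and share is the fraction of
    varieties whose largest coefficient exceeds the threshold.
    """
    assignments = {}
    for name, coefficients in q_matrix.items():
        best = max(range(len(coefficients)), key=coefficients.__getitem__)
        strongly_assigned = coefficients[best] > threshold
        assignments[name] = best + 1 if strongly_assigned else None
    share = sum(c is not None for c in assignments.values()) / len(assignments)
    return assignments, share


# Invented membership coefficients for K = 5 (illustration only).
q = {
    "V1": [0.92, 0.02, 0.02, 0.02, 0.02],
    "V2": [0.05, 0.85, 0.04, 0.03, 0.03],
    "V3": [0.40, 0.35, 0.10, 0.10, 0.05],
    "V4": [0.01, 0.01, 0.01, 0.96, 0.01],
}
clusters, strongly_assigned_share = assignment_summary(q)
```

In this toy example three of the four varieties exceed the threshold, and V3 is left unassigned as admixed.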

We tested whether the groups inferred by the population structure analysis represent statistically significant subpopulations, considering pairwise measures of two widely used estimates of differentiation, Fst and the Nei’s standard genetic distance (Dst). The analysis indicated that the genetic divergence between the identified sub-populations was low; for instance, the maximum value was below 0.1 (Table 2). Only in two cases (C1 vs C4; C2 vs C4) was the genetic differentiation based on the Fst calculation statistically significant. Similarly, the genetic distances between sub-populations were on average very low (Table 2).

Table 2 Estimation of pairwise genetic differentiation and genetic distance among the five sub-populations (from C1 to C5, see Supplementary Table 2 for the list of varieties in each sub-population) as inferred by the Bayesian analysis implemented in the Structure software

Discussion

The ever-expanding consensus to preserve the genetic diversity of food plant varieties needs to be supported and guided by its characterization, especially in areas where landraces have been produced and have survived in agriculture. Although Italy has a rich apricot germplasm (Ledbetter 2008), reports on genetic resources have rarely considered local realities in this country. The molecular analysis of the ex-situ collection indicated a high level of diversity considering both the total number of alleles in the population and the number of alleles per locus. Although these estimates are biased by the sample size, these values were within or above the range of other reports (Wang et al. 2011; Hagen et al. 2004; Messina et al. 2004; Li et al. 2018), which also included landraces (Sheikh et al. 2021; Junhuan et al. 2012; Lamia et al. 2010). Microsatellites were all highly polymorphic and informative (e.g., PIC values higher than 0.5), confirming the features of the selected SSRs (Hagen et al. 2004; Messina et al. 2004). The level of the observed heterozygosity was also high, a likely consequence of the agamic propagation of the germplasm. On the other hand, we could not distinguish all the varieties. Specifically, the number of multilocus genotypes was slightly lower than the number of landraces, as predicted from the analysis of a curated ex-situ collection. For example, in a study of a natural population of P. avium, the percentage of MLG was lower (i.e., 30%) (Jarni et al. 2015). In some instances, the folk names suggest the presence of synonyms or derived clones. Moreover, the observed genetic similarity could suggest the presence of a landrace group (e.g., for the ‘Boccuccia’ types) consisting of genetically similar types (Zeven 1998). These hypotheses should be tested by a detailed morphological analysis, also considering the limitations of the vernacular names (Wilkie and Saridan 1999). 
Nonetheless, our previous experience with apricot indicated that the number of unique profiles identified with microsatellites is larger than that obtained from qualitative morphological characters (Corrado et al. 2021). Moreover, despite the very low values of the RMP, we cannot exclude that more in-depth DNA investigations may reveal adaptive or morphologically significant polymorphisms among landrace groups. However, for other varieties, the data favored the presence of erroneous denominations or sampling. It should be added that other characterizations of traditional germplasm have also revealed cases of synonymy and/or duplicated accessions (Zhebentyayeva et al. 2003; Ispizua et al. 2007; Queiroz et al. 2015).

At least for apricots of the Campania region, grey literature reports that part of the germplasm is likely to derive from the on-farm selection of open-pollinated seedlings, and that the vegetative spread of plants between farms may favor the selection of possible phenotypic variants, thus contributing to the creation of derived accessions and/or synonyms (Pugliano et al. 1980; Nunziata and Petriccione 2019). For these reasons, we attempted to infer the level of clonality in our population. According to the statistical evaluation of two indices of association, the level of clonality was very low and not significant, indicating that it was not meaningful to try to identify possible multilocus lineages by exploiting, for instance, other information as well (e.g., names, sites of collection, etc.). Similarly, the evaluation of the genetic distance and related dendrogram revealed normally distributed genetic distances. Specifically, they did not make evident a distance threshold (e.g., by peaks or asymmetry in the histogram) at which varieties could be considered as deriving from clonal reproduction and recent divergence (Arnaud‐Haond et al. 2007), excluding the above-mentioned identical genotypes. This also indirectly suggests that the identified clones did not significantly affect the estimation of the level of genetic diversity within our collection. We, therefore, inferred a possible genetic structure using a model-based clustering method. The analysis clearly suggested the presence of five sub-populations, whose members were in general well assigned. However, the pairwise genetic distance and differentiation between those clusters were low and most often not significant, not only for the small groups (i.e., fewer than ten members). Although Fst is not fully suited to assess population structure (Meirmans and Hedrick 2011), it is a very popular index also to describe the evolutionary history of derived populations based on the level of heterozygosity. 
The very limited genetic differentiation implies a rather uniform genetic basis of the accessions (with little foreign introduction), supporting a true local origin of the samples. Moreover, the limited differentiation may also be explained by the presence of rather homogeneous geographic, ecological, and, for human-selected plants, agronomic forces, consistent with standardized cultivation practices and commercial uses of the apricots (Aradhya et al. 2003). Local (fine-scale) genetic differentiation has been verified in several instances in plants (Savolainen et al. 2007; Linhart and Grant 1996) as well as in landraces (Santos et al. 2019; Corrado and Rao 2017), although comparisons are not easy because the scale of the differentiation is usually defined by means of dispersal, typically pollen. On the other hand, to explain the limited genetic distances among sub-populations, it should also be considered that, for asexually propagated plants, somatic mutations are expected to be the main drivers of adaptive evolution to new environments (Miller and Gross 2011).

In conclusion, our work highlighted the high level of genotypic diversity present in an ex-situ collection of traditional varieties of apricot. As expected, the molecular analysis revealed possible synonymy and spurious classifications, which should be confirmed or resolved by implementing a thorough morphological classification. Moreover, the very low level of clonality and genetic differentiation among the sub-populations identified by Bayesian analysis can be indicative of a possible common origin of the germplasm, and of an adaptive diversification that is mainly due to similar environmental and human-driven factors. These specificities should not only be considered important arguments for the conservation of neglected apricot resources, but should also prompt actions to identify and exploit agronomically useful traits (e.g., adaptive or fruit quality-related) beyond conservation initiatives.