Introduction

Olive, native to the Mediterranean Basin, is believed to have been first introduced into California by the Spanish Franciscan missionaries in the late eighteenth century (Taylor 2000). Compared to leading olive producers such as Spain (2.42 million ha), Italy (1.4 million ha), Tunisia, and Greece (1.03 million ha), California is a minor player in the worldwide table olive and oil markets (Vossen 2007). Yet, most olives produced in the United States come from California, grown on 12,245 ha in 2007 (U.S. Department of Agriculture 2008). In recent years, California domestic olive oil production has almost doubled, from 936,889 liters in 2000 to 1.514 million liters in 2005, with a projected increase to 3.785 million liters by 2013 (Vossen 2007; Vossen and Devarenne 2005).

The U.S. Department of Agriculture’s National Clonal Germplasm Repository for Fruit and Nut Crops (NCGR) maintains a field collection of olive germplasm at the Wolfskill Experimental Orchard (WEO) in Winters, California. This collection contains 140 individual accessions, including about 120 uniquely named cultivars of Olea europaea L. ssp. europaea var. europaea from the Mediterranean Basin, South America, and California. The collection also contains several accessions of hybrids and of the related subspecies O. europaea L. ssp. cuspidata (Wall. ex G. Don) Ciferri, which has a native range throughout Africa and South-Western and Subcontinental Asia (Green 2002).

Several previous olive genetic diversity studies applied molecular marker systems including isozymes (Trujillo et al. 1995), amplified fragment length polymorphisms (AFLP; Sanz-Cortes et al. 2003), random amplified polymorphic DNA [RAPD; Besnard et al. 2001(a)] and microsatellite simple-sequence-repeats (SSR; Lopes et al. 2004) to examine genetic diversity, structure, and differentiation among olive cultivars, as well as to characterize germplasm collections and explore cultivar identity. These studies encompass collections and accessions from throughout the Mediterranean and secondary growing regions such as Australia (Sweeney 2003). Belaj et al. (2003) compared the usefulness of RAPD, AFLP, and SSR markers for identification and genetic differentiation of 32 Spanish and Italian olive cultivars. They concluded that SSR markers, due to their co-dominant nature, high levels of polymorphism and reproducibility, have a higher discriminating power for cultivar identification, and are ideal for olive genome mapping and genetic studies.

This study used microsatellite markers to examine the level of genetic diversity, structure and differentiation within Olea europaea accessions maintained in the NCGR collection, based on their genetic affinities, geographic origin and fruit use for oil, table, or dual (oil and table) purposes.

Materials and methods

Plant material collection and DNA extraction

A total of 116 accessions in the NCGR olive collection at the WEO were sampled (Table 1). Fresh, young, fully expanded leaves were collected in 2005 or 2008 for DNA isolation. Homogenization was accomplished by freezing fresh leaf tissue in liquid nitrogen, and then either grinding with a mortar and pestle or in tubes using a Geno/Grinder 2000 (OPS Diagnostics, Bridgewater, NJ). DNA was extracted from homogenized leaf tissue using either QIAGEN DNeasy Plant Mini-prep Kits (QIAGEN, Valencia, CA), or a CTAB-based protocol modified from Doyle and Doyle (1987). DNA was suspended in TE buffer (pH = 8) and digested with RNase A (QIAGEN) at 37 °C for 1 h. DNA samples were diluted to approximately 20 ng/μl before conducting PCR.

Table 1 List of accessions sampled in this study, their geographic origin, and fruit use type

Amplification of SSR loci and fragment separation

Twenty-seven microsatellite markers previously developed from O. europaea ssp. europaea were screened for usefulness on a diverse subset of cultivars from the NCGR collection. Based on criteria of PCR amplification consistency, polymorphism, and allele scoring consistency, the fourteen best markers were chosen: EM03, EM030, EM088, EM090 (de la Rosa et al. 2002), GAPU89 (Carriero et al. 2002), IAS-oli16 (Rallo et al. 2000), IAS-pOe12 (GenBank Acc. No. AF356039 2001), ssrOeUA-DCA03, ssrOeUA-DCA08, ssrOeUA-DCA11 (Sefc et al. 2000), UDO99-011, UDO99-019, UDO99-028, and UDO99-042 (Cipriani et al. 2002). The forward primer of each pair was labeled with one of three fluorescent dyes: 6-FAM, HEX, or NED [Applied Biosystems, Foster City, CA (ABI)]. One marker, IAS-pOe12, was found to amplify two loci instead of one; these two loci were scored separately for final analysis as IAS-pOe12_A and IAS-pOe12_B, bringing the total number of microsatellite loci analyzed to fifteen.

Each PCR reaction consisted of 1× Standard Taq Buffer [New England BioLabs, Ipswich, MA (NEB)], 2 mM MgCl2, 0.375 mM each dNTP (ABI), 0.075 units/μl Taq DNA Polymerase (NEB), 0.05 pmol/μl each primer, and approximately 5 ng/μl DNA. PCR reactions were triplexed, i.e. made with three primer pairs combined in one reaction, each pair labeled with a different fluorescent dye. PCR was performed under the following conditions: 1 cycle of 94 °C for 5 min, 30 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 40 s, and then a final elongation of 72 °C for 7 min.

Amplified SSR fragments (approximately 0.8 μl PCR product/sample) were combined with 12 μl Hi-Di Formamide and 0.2 μl HD400 ROX Size Standard (ABI) per sample. Samples were denatured at 95 °C for 5 min. Fragment separation by capillary electrophoresis was conducted on a 3100 Genetic Analyzer (ABI). At least one copy of each available, unique accession was fingerprinted for the fifteen chosen SSR loci.

Data analysis

SSR fragments were sized in nucleotide base pairs and binned into allele categories using GeneScan Analysis Software v. 3.1 and Genotyper v. 2.5 (ABI) (Online Resource 1). Allelic data for all loci were combined and converted to a binary format to produce a rectangular data matrix (1 = presence, 0 = absence).
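The conversion of binned allele calls to a presence/absence matrix can be sketched as follows. This is only an illustration under stated assumptions: the accession names, the allele sizes, and the function `to_binary_matrix` are hypothetical and are not taken from the study's data or software.

```python
# Hypothetical allele calls: accession -> locus -> pair of allele sizes (bp).
# The sizes are invented for illustration; only the locus names come from the text.
genotypes = {
    "acc_A": {"GAPU89": (160, 162), "ssrOeUA-DCA11": (130, 130)},
    "acc_B": {"GAPU89": (162, 170), "ssrOeUA-DCA11": (130, 142)},
}

def to_binary_matrix(genotypes):
    """Return (columns, rows): one column per (locus, allele), 1 = present, 0 = absent."""
    columns = sorted({(locus, allele)
                      for loci in genotypes.values()
                      for locus, alleles in loci.items()
                      for allele in alleles})
    rows = {}
    for name, loci in genotypes.items():
        present = {(locus, a) for locus, alleles in loci.items() for a in alleles}
        rows[name] = [1 if col in present else 0 for col in columns]
    return columns, rows

columns, rows = to_binary_matrix(genotypes)
for name, row in rows.items():
    print(name, row)
```

Note that in this encoding a homozygote contributes a single 1 at its locus, so allele dosage is not retained in the binary matrix.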

Using PAUP* v. 4.0 (Swofford 2003), a pairwise genetic distance matrix was computed using the Nei and Li similarity coefficient (Nei and Li 1979) and then used for cluster analysis, generating a phenogram with the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) algorithm. Bootstrapping with 1,000 replicates and no resampling was conducted to determine support for each node. Allelic data were then grouped by cluster based on the UPGMA cluster analysis. Using BIOSYS-1, v. 1.7 (Swofford and Selander 1989), within-cluster genetic diversity parameters were estimated for these clusters, including mean number of alleles per locus, observed and expected levels of heterozygosity, polymorphic index, and contingency chi-squared tests.
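As a rough illustration of the clustering step, the sketch below computes a Nei and Li distance (one minus the similarity 2n_xy/(n_x + n_y)) from binary allele profiles and applies a plain UPGMA merge. It is not the PAUP* implementation used in the study; the profiles, names, and functions are hypothetical, and bootstrapping is not shown.

```python
from itertools import combinations

def nei_li_distance(x, y):
    """1 - 2*n_xy / (n_x + n_y) for two 0/1 allele profiles of equal length."""
    shared = sum(a & b for a, b in zip(x, y))
    total = sum(x) + sum(y)
    return 1.0 - 2.0 * shared / total if total else 0.0

def upgma(names, profiles):
    """Return (nested-tuple tree, merge heights) from average-linkage clustering."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    dist = {frozenset((i, j)): nei_li_distance(profiles[i], profiles[j])
            for i, j in combinations(range(len(names)), 2)}
    next_id, heights = len(names), []
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        i, j = sorted(pair)
        d_ij = dist.pop(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        for k in clusters:                      # size-weighted average to the rest
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        heights.append(d_ij / 2)                # node height is half the distance
        next_id += 1
    (tree, _), = clusters.values()
    return tree, heights

# Hypothetical profiles; A and B are identical, i.e. synonymous genotypes.
names = ["A", "B", "C", "D"]
profiles = [[1, 1, 0, 1, 0], [1, 1, 0, 1, 0], [0, 1, 1, 1, 1], [0, 0, 1, 0, 1]]
tree, heights = upgma(names, profiles)
print(tree, heights)
```

In this toy data set the identical profiles A and B join at height zero, which is how synonymous genotypes appear as zero branch lengths in a UPGMA tree.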

Total gene diversity (H_T), gene diversity within clusters (H_S), and proportion of diversity among clusters (G_ST) (Nei 1973) were calculated using the computer program Dispan: Genetic Distance and Phylogenetic Analysis (Ota 1993). Genetic divergence among groups was examined using the distance-Wagner tree based on the Prevosti distance measure (Swofford 1981; Wright 1978). The program NTSYSpc: Numerical Taxonomy System, v. 2.1.1 (Rohlf 2002) was used for principal components analysis (PCA) using the Dice coefficient distance matrix (Dice 1945). To simplify analysis and according to NTSYSpc requirements, synonymous genotypes, samples with missing data at three or more loci, and the ssp. cuspidata accessions were excluded from the data set for PCA.

Results

Genetic diversity, structure, and intraclonal variation within the NCGR collection

Per locus number of alleles ranged from three at IAS-pOe12_A to 16 at ssrOeUA-DCA11, with a mean of 9.93 alleles per locus. Expected heterozygosity ranged from 0.227 at UDO99–019 to 0.860 at ssrOeUA-DCA3 with a mean of 0.687, while observed heterozygosity ranged from 0.175 at UDO99–019 to 0.937 at GAPU89, with a mean of 0.640 (Table 2). Most loci had two or three moderate to high-frequency (20% or higher) alleles. Locus-wise fixation indices were significant at the 5 or 10% level in six loci (EM030, EM088, IAS-pOe12_A, IAS-pOe12_B, ssrOeUA-DCA11, and UDO99–019), all indicating a deficiency of heterozygotes as compared to Hardy–Weinberg (HW) expected levels. Of the 116 total accessions analyzed in the collection, 88 unique, multilocus genotypes were generated.
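The per-locus statistics reported here can be illustrated with a short sketch. It assumes the commonly used definitions: direct-count observed heterozygosity, Nei's small-sample (unbiased) expected heterozygosity, and a fixation index F = 1 - H_o/H_e. Whether BIOSYS-1 applies exactly these formulas is an assumption, and the genotypes and the function `locus_stats` are hypothetical.

```python
from collections import Counter

def locus_stats(genotypes):
    """genotypes: list of (allele, allele) pairs for one locus.

    Returns (number of alleles, H_o, unbiased H_e, F).
    """
    n = len(genotypes)
    counts = Counter(a for g in genotypes for a in g)
    freqs = [c / (2 * n) for c in counts.values()]
    h_obs = sum(a != b for a, b in genotypes) / n                    # direct count
    h_exp = (2 * n / (2 * n - 1)) * (1 - sum(p * p for p in freqs))  # unbiased estimate
    f = 1 - h_obs / h_exp if h_exp else float("nan")
    return len(counts), h_obs, h_exp, f

# Invented allele sizes (bp) for four individuals at one locus.
sample = [(160, 162), (160, 160), (162, 162), (160, 170)]
n_alleles, h_obs, h_exp, f = locus_stats(sample)
print(n_alleles, round(h_obs, 3), round(h_exp, 3), round(f, 3))
```

A positive F, as in this toy sample, corresponds to fewer heterozygotes than expected under Hardy–Weinberg proportions; a negative F corresponds to an excess.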

Table 2 Per locus number of alleles (A), allele size range in base pairs (bp), unbiased estimate of expected heterozygosity (H_e), direct-count observed heterozygosity (H_o), and fixation index (F)

The cluster analysis using the UPGMA method showed low to moderate differentiation within the olive collection, with a total of twelve clusters (C1–C12) organized into seven major groups, partially attributable to common geographic origin and/or fruit use (Fig. 1). Most clusters had moderate bootstrap support (50% or above). Accessions from ssp. cuspidata clustered as outliers. Contingency chi-squared (χ²) analysis showed significant differences in allele composition and frequencies across all loci among the different UPGMA clusters.

Fig. 1

UPGMA phenogram from Nei and Li (1979) genetic distance measures using fifteen SSR loci on 116 cultivated olive and ssp. cuspidata accessions within the NCGR collection

The first group in the UPGMA phenogram is a cluster of oil producing cultivars of French, Greek, South American, and Tunisian origin. Group two is a separate, small cluster of the Spanish oil cultivars ‘Arbequina’ and ‘Arbussana’, as well as ‘3s5-117’ of unknown origin or use. Group three contains two Greek, one French, and one Italian, dual purpose cultivars. The fourth group is a cluster of mainly Italian oil and Greek dual purpose cultivars. Group five is a medium sized group of cultivars from the Eastern Mediterranean, mainly Greece and Tunisia, of various use types. The sixth group is made up of primarily dual purpose cultivars from throughout the Mediterranean region. Group seven is the largest and comprises most of the Iberian and Italian cultivars of various use types. Seventeen groups of synonyms can also be seen in the UPGMA tree as zero branch lengths (Online Resource 2).

Six instances of closely related accessions, with one or two allelic differences over fifteen SSR loci, were found within the NCGR olive collection. Notably, these instances include accessions that carry unique cultivar names. ‘Zitoum’ differed from ‘Mission’ at only one allele, by 2 bp, at the imperfect dinucleotide repeat locus GAPU89, as did ‘Mission Variant’, which carried a second allelic difference at locus UDO99-028. The accession ‘Nab Tamri’ differed from ‘Sevillano’ at one allele, by 2 bp, at locus GAPU89. ‘Rouget’ differed from ‘Piconia’ at locus UDO99-028, ‘Conserviola’ from ‘Oliva A Prugno’ at ssrOeUA-DCA8, and ‘Bouteillon’ from ‘Frantoio’ at UDO99-028. A comparison of the ‘Mission’ genotype to several Iberian cultivars revealed that all of the latter could contribute at least one allele to the ‘Mission’ genotype at thirteen or fourteen SSR loci, and at all fifteen loci for ‘Sevillano’ (Table 5).
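The two comparisons described above, counting allelic differences between near-identical genotypes and checking at how many loci one cultivar could contribute an allele to another, can be sketched as follows. The allele sizes, the variable names, and both helper functions are hypothetical; only the locus names come from the text.

```python
from collections import Counter

# Hypothetical genotypes: locus -> pair of allele sizes in bp (values invented).
mission = {"GAPU89": (160, 162), "UDO99-028": (150, 154), "ssrOeUA-DCA8": (130, 140)}
variant = {"GAPU89": (160, 164), "UDO99-028": (150, 154), "ssrOeUA-DCA8": (130, 140)}
iberian = {"GAPU89": (170, 172), "UDO99-028": (150, 160), "ssrOeUA-DCA8": (140, 144)}

def allelic_differences(g1, g2):
    """Number of alleles in g1 without a matching copy in g2, over shared loci."""
    return sum(sum((Counter(g1[locus]) - Counter(g2[locus])).values())
               for locus in g1.keys() & g2.keys())

def loci_sharing_an_allele(candidate, target):
    """Loci at which candidate and target have at least one allele in common."""
    return sorted(locus for locus in candidate.keys() & target.keys()
                  if set(candidate[locus]) & set(target[locus]))

print(allelic_differences(mission, variant))
print(loci_sharing_an_allele(iberian, mission))
```

Sharing at least one allele at every locus is compatible with a parent-offspring relationship but does not demonstrate one, which is why the text treats the result as suggestive only.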

Principal components analysis

The first two components of the PCA explained 24.8% of the total variation among the subset of 99 accessions included in the analysis. The first component explained 20.0% of the variation and the second 4.8%. Projection of the unique genotypes on a two-dimensional plane, based on the first two principal components, partially confirmed the results of the UPGMA method. The accessions from the UPGMA clusters likewise tended to form their own groups in the PCA, though usually overlapping (Fig. 2). In the top two quadrants of the projection, a loose group was composed only of accessions from clusters 1, 3, and 10 of the UPGMA phenogram. Overlapping groups of accessions from almost all UPGMA clusters occupy the central to lower quadrants of the projection; they represent various geographic origins and uses.

Fig. 2

Two-dimensional projection of olive accessions along the first two principal axes, accounting for 24.8% of the total variation; symbols assigned by UPGMA cluster

Within groups genetic diversity and differentiation

The mean number of alleles per locus among the thirteen clusters found on the UPGMA tree (C1–C12 and the ssp. cuspidata cluster, C13) ranged from 2.1 for C4 to 5.3 for C13, with an overall mean of 3.6 alleles per locus per cluster. Mean expected heterozygosity (H_e) within groups ranged from 0.529 for C13 to 0.794 for C6, with an overall mean H_e of 0.631. Mean observed heterozygosity (H_o) within groups ranged from 0.422 in C7 to 0.721 for C13, with an overall mean H_o of 0.585 (Table 3). The percentage of polymorphic loci within groups ranged from 80% in C4 to 100% in C1, C3, C12, and C13. Clusters 5, 6, 7, 8, and 10 had positive fixation indices, suggesting a deficiency of heterozygotes. In contrast, the ssp. cuspidata group (C13) showed an excess of heterozygotes. The remaining clusters and overall mean heterozygosity across all groups approached panmixia.

Table 3 Within-cluster genetic variability at fifteen loci (standard errors in parentheses)

Table 4 shows genetic differentiation measures within and among olive clusters identified in the UPGMA cluster analysis. The different loci varied in the distribution of components of gene diversity. The total gene diversity (H_T) per locus ranged from 0.319 at UDO99-019 to 0.847 at ssrOeUA-DCA3, with a mean H_T of 0.696.

Table 4 Measures of gene diversity and genetic differentiation (Nei 1973) in olive

The locus-wise gene diversity within groups (H_S) ranged from 0.220 at UDO99-019 to 0.670 at UDO99-011, with an overall mean of 0.528. The proportion of diversity partitioned within groups (H_S/H_T) ranged from 63.3% at IAS-pOe12_B to 84.8% at GAPU89, with a mean of 75.9%. The proportion of gene diversity partitioned among groups (G_ST) ranged from 15.2% at GAPU89 to 36.7% at IAS-pOe12_B, with a mean G_ST of 24.1%.
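The relationship between these quantities follows Nei's (1973) partition, in which the among-group component is the difference between total and within-group diversity and G_ST is that difference expressed as a fraction of the total. The short check below applies it to the reported means (0.696 and 0.528); the function name is hypothetical, and using the ratio of means rather than a mean of per-locus ratios is an assumption about how the summary values were obtained.

```python
def gene_diversity_partition(h_t, h_s):
    """Return (D_ST, G_ST, H_S/H_T) from total and within-group gene diversity."""
    d_st = h_t - h_s        # diversity among groups
    g_st = d_st / h_t       # proportion of total diversity among groups
    return d_st, g_st, 1 - g_st

d_st, g_st, within = gene_diversity_partition(h_t=0.696, h_s=0.528)
print(f"G_ST = {g_st:.1%}, within groups = {within:.1%}")
```

With these inputs the among-group share is about 24.1% and the within-group share about 75.9%, in agreement with the means given in the text.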

Figure 3 shows a distance-Wagner tree, using the Prevosti distance measure, produced for populations based on the thirteen clusters identified by the UPGMA analysis. The ssp. cuspidata cluster again forms an outlier group, and the cultivated olives can be further divided into three main groups. The French and Greek oil cultivar cluster, C1, forms a distinct group. The second group comprises the majority of clusters, including cultivar accessions of various origins and use types. However, two subgroups are formed, one with most of the Iberian and Italian origin cultivars, and the other with two clusters dominated by Eastern Mediterranean (primarily Greek and Tunisian) cultivars. The third group contains a Spanish oil cluster (C12), an Italian oil and Greek dual purpose cultivar cluster (C3), and a pair composed of a western Asian cultivar of unknown use and the Italian table cultivar ‘San Francesco’ (C4).
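For orientation, the Prevosti distance between two groups is usually computed as the sum of absolute allele-frequency differences over all alleles and loci, divided by twice the number of loci. The sketch below follows that definition with invented frequencies and hypothetical names; it does not reproduce the BIOSYS-1 calculation or the distance-Wagner tree construction itself.

```python
def prevosti_distance(freqs_a, freqs_b):
    """freqs_x: dict locus -> dict allele -> frequency within one cluster."""
    loci = freqs_a.keys() & freqs_b.keys()
    total = 0.0
    for locus in loci:
        alleles = freqs_a[locus].keys() | freqs_b[locus].keys()
        total += sum(abs(freqs_a[locus].get(a, 0.0) - freqs_b[locus].get(a, 0.0))
                     for a in alleles)
    return total / (2 * len(loci))

# Invented allele frequencies for two clusters at two hypothetical loci.
cluster_1 = {"L1": {160: 0.5, 162: 0.5}, "L2": {130: 1.0}}
cluster_2 = {"L1": {160: 0.5, 162: 0.25, 170: 0.25}, "L2": {130: 0.5, 142: 0.5}}
d = prevosti_distance(cluster_1, cluster_2)
print(d)
```

The measure ranges from 0 for identical allele frequencies to 1 for clusters that share no alleles at any locus.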

Fig. 3

The distance-Wagner tree of the differentiation of the thirteen UPGMA clusters of olive accessions, cophenetic correlation = 0.978

Discussion

Genetic diversity and cultivar identity

The generally high heterozygosity per locus found within the NCGR olive collection is typical in naturally out-crossing, perennial, and clonally propagated species that are highly selected for greater adaptability, vigor, and productivity under cultivation and harbor recessive somatic mutations as heterozygotes (Aradhya et al. 2003). The allelic diversity results generally agree with recent olive SSR studies (Lopes et al. 2004; Belaj et al. 2004; de la Rosa et al. 2002; Sefc et al. 2000; Cipriani et al. 2002; Omrani-Sabbaghi et al. 2007). However, the observed heterozygosity for locus EM030 is lower than found in a previous study (de la Rosa et al. 2002). This may be due to the differences in the selection regime causing subtle differences in the genetic structure and differentiation patterns among the germplasm used. The mean heterozygosity across all loci and among groups approaches HW expectations. This indicates general panmixia within cultivated olive. However, the significant deficiency of heterozygotes found at six loci suggests a difference in selection pressures and responses among loci.

Polyclonal cultivars have been documented in olive. For example, ‘Cordovil’, from Portugal, may have arisen from accidental seedling propagation and local selection (Lopes et al. 2004). However, it is highly unlikely that differences at one or two microsatellite loci could be the result of sexual reproduction in a highly heterozygous, out-crossing species such as olive (Cipriani et al. 2002). Another possible cause of this type of clonal variation is a somatic mutation event occurring as a bud sport in the mother cultivar tree, followed by clonal propagation (Cipriani et al. 2002). This may explain the six instances of slight clonal variation found in this study. However, this variation occurs between different accessions not known to be synonyms or closely related; the sole exception is ‘Mission Variant’ with ‘Mission’. These observations suggest misidentification of at least one accession. For example, ‘Zitoum’ is putatively a synonym of ‘Picholine marocaine’, a unique cultivar from Morocco (Bartolini et al. 2005), yet the results suggest that the NCGR accession labeled ‘Zitoum’ is actually an incorrectly named clonal variant of ‘Mission’.

The cases of synonymy or near synonymy encountered in this study may be the result of accession mislabeling; however, the problem may also lie in the assumption that an olive cultivar is a homogeneous entity. Many olive cultivars have a long history of cultivation in the regions where they were selected by local farmers (Breton et al. 2006). Traditional farmers, in the past and still today, develop and name crop cultivars based on distinct phenotypes associated with specific agronomic, morphological, phenological and organoleptic qualities (Gibson 2009), some of which may be achieved via different genotypes, potentially resulting in heterogeneous cultivars. For example, in cassava, generally assumed to be clonally propagated, a study of farmer management in Guyana (Elias et al. 2001) found intravarietal polymorphisms in 72% of varieties analyzed. In the case of olive, the issue is further complicated by both clonal and seed propagation, changing selection pressures, exposure to new growing environments, and gene flow with other cultivars or locally occurring oleasters (O. europaea L. ssp. europaea var. sylvestris (Miller) Lehr.), all of which may result in intracultivar heterogeneity. Additionally, phenotypic or genetic changes may occur, triggering a new name for an extant genotype, or a new genotype for an established name. Finally, germplasm collections may themselves be sources of identification confusion contributing to intracultivar heterogeneity (Lopes et al. 2004).

A good example is the case of ‘Oblonga’ in the USDA collection. That cultivar was first identified in 1940 as an exceptionally Verticillium-resistant volunteer seedling in a Corning, California orchard (Hartmann et al. 1971). Based on RAPD and SSR marker data, however, that accession is potentially a synonym of ‘Frantoio’, a traditional Italian oil cultivar (Barranco et al. 2000). The current study corroborates those findings. ‘Oblonga’ and ‘Frantoio’ were synonymous at all fifteen SSR loci.

‘Mission’, a widely grown mainstay of the California olive industry, was first grown in California by Spanish Franciscan missionaries in the eighteenth century, but its precise origin remains unclear (Taylor 2000). In the UPGMA cluster analysis it shares the most allelic diversity with several other ancient and well-established Iberian cultivars: ‘Nevadillo’, ‘Manzanillo’, ‘Cordovil’, ‘Picual’, ‘Verdeal’, and one genotype synonymous among several South American cultivar names (‘Azapa’, ‘Yuaca’, ‘Mission Leiva’, and ‘Mostazal’). This supports a Spanish origin of ‘Mission’. A comparison of the ‘Mission’ genotype to the above-mentioned cultivars and ‘Sevillano’, another widely grown Spanish cultivar, revealed that all could contribute at least one allele to the ‘Mission’ genotype at thirteen or fourteen SSR loci, and at all fifteen loci for ‘Sevillano’ (Table 5), suggesting that ‘Mission’ could have originated as a seedling selection of the Iberian germplasm imported to the New World by the Spanish missionaries. Another possibility in the cases of both the ‘Zitoum’ and ‘Mission’ accessions is that these are catch-all names under which distinct cultivars were at one time subsumed, for ease or because of broad shared characteristics, as has been observed for Phaseolus species and cultivars among farmers in southern Mexico (Soleri et al. forthcoming). The leading international database for olive cultivars includes five entries for “Zeitoun” (http://www.oleadb.it/scheda.php?name=zeitoun&Submit2=submit).

Another Californian cultivar, ‘Redding Picholine’, is thought to have been imported into California in the late nineteenth century as a rootstock variety initially mistaken for the French ‘Picholine’ scion cultivar (Taylor 2000). The UPGMA cluster analysis places ‘Redding Picholine’ in the most basal cultivated olive clade (C1), where it is most closely associated with the French cultivar ‘Rubra’, suggesting that the former cultivar may derive from French germplasm.

Table 5 Comparison of the ‘Mission’ genotype to that of several Iberian and one South American accession(s) at fifteen SSR loci

The NCGR olive collection is a useful source of genetically diverse material for use by olive breeders, researchers, growers, and backyard hobbyists to help maintain and improve the olive crop in California. Therefore, it is vital for the genetic identity of olive germplasm to be verified before distribution to these users. Instances of misidentification or ambiguous identity, as with the ‘Zitoum’ example above, highlight the need for a standardized set of molecular markers used to build a database of verified olive cultivar genotypes (Doveri et al. 2008; Baldoni et al. 2009), a set harmonized with researchers in the Mediterranean and worldwide to ensure robust, globally accurate identities.

Genetic structure and differentiation within and among groups

The US ex situ germplasm for cultivated olive, represented by the accessions in the NCGR collection, appears to be relatively variable as reflected in the highly dissected nature of the UPGMA phenogram, low level of differentiation among groups, low proportions of variation explained by the first two principal components, and overlapping groups within the PCA. Such low differentiation might be expected from free gene flow among the highly outcrossing olive cultivars and millennia of human exchange of germplasm throughout the Mediterranean (Angiolillo et al. 1999). However, the greater structure and intercultivar variation encountered in surveys of Mediterranean germplasm (Cipriani et al. 2002; Sarri et al. 2006) suggest that our observations may reflect one or more of the following: (a) narrow genetic diversity in a crop species outside its center of origin and diversity; (b) a founder effect in a nonnative and often clonally propagated species; (c) a relatively short and geographically limited experience with olive cultivation; (d) poor representation in the germplasm collection of regionally present genotypes. For example, a small recent study (128 individuals from five different plantings) of in situ historic olive trees in central coastal California using the same SSR markers as reported here found four different cultivar genotypes not represented in the USDA accessions (Soleri et al. 2010).

The moderate to high level of gene diversity found overall per locus is typical of outcrossing, clonally propagated crops such as olive (Sefc et al. 2000). Most of this gene diversity is apportioned among cultivars within groups. However, the structure of the distance-Wagner analysis indicates some significant differences in the frequency and composition of alleles among olive groups.

Some Eastern Mediterranean cultivars in the NCGR collection showed higher levels of differentiation than other accessions in the UPGMA (C1, C11). This may support archeological (Zohary and Hopf 2000) and mitotype (Besnard et al. 2001; Breton et al. 2006) evidence of an Eastern Mediterranean center of origin of olive domestication and east to west dispersal pattern. The differentiation of certain oil cultivars, such as the French and Greek oil cultivars in cluster 1 of the UPGMA, suggests multilocal, strong selection pressure for oil-producing traits. Notably, cluster 1 also forms a separate clade among UPGMA clusters using the distance-Wagner method. Multilocal selection of cultivars was likewise reported using RAPD (Besnard et al. 2001) and SSR (Belaj et al. 2004) data.