Patterns of genetic and eco-geographical diversity in Spanish barleys
- Cite this article as:
- Yahiaoui, S., Igartua, E., Moralejo, M. et al. Theor Appl Genet (2008) 116: 271. doi:10.1007/s00122-007-0665-3
The pool of Western Mediterranean landraces has been under-utilised for barley breeding so far. The objectives of this study were to assess genetic diversity in a core collection of inbred lines derived from Spanish barley landraces, to establish its relationship to barleys from other origins, and to correlate the distribution of diversity with geographical and climatic factors. To this end, 64 SSRs were used to evaluate the polymorphism among 225 barley (Hordeum vulgare ssp. vulgare) genotypes, comprising two-row and six-row types. These included 159 landraces from the Spanish barley core collection (SBCC) plus 66 cultivars, mainly from European countries, as a reference set. Of the 669 alleles detected, a large proportion were unique to the six-row Spanish barleys. An analysis of molecular variance revealed a clear genetic divergence between the six-row Spanish barleys and the reference cultivars, whereas this was not evident for the two-row barleys. A model-based clustering analysis identified an underlying population structure, consisting of four main populations for the whole genotype set, and suggested further possible subdivision within two of these populations. Most of the six-row Spanish landraces clustered into two groups that corresponded to geographic regions with contrasting environmental conditions. The existence of wide genetic diversity in Spanish germplasm, possibly related to adaptation to a broad range of environmental conditions, and its divergence from current European cultivars confirm its potential as a new resource for barley breeders, and make the SBCC a valuable tool for the study of adaptation in barley.
For many crops, high-yielding cultivars developed by modern plant-breeding programmes have replaced the traditional landraces. This phenomenon, which in turn reduces the genetic base of current cultivars, is especially true in developed countries (Briggs 1978). Several recent studies have shown that this is the case for European barley (Hordeum vulgare ssp. vulgare) (Graner et al. 1994; Melchinger et al. 1994; Ellis et al. 1997; Russell et al. 2000), though the loss in diversity due to modern breeding may have been partially offset by the subsequent introgression of disease resistances (Koebner et al. 2003). Historical records indicate that genetic erosion might have occurred as a consequence of the use of a very limited number of landraces and primitive cultivars in crosses during the earlier stages of modern breeding in Europe (Fischbeck 1992). It is likely, therefore, that genetic diversity present across original European landraces has not been fully exploited.
In Spain, the National Centre for Plant Genetic Resources holds a collection of over 2000 accessions of cultivated barley, most of which are native landraces collected in the first half of the twentieth century (Lasa et al. 2001). Given their history of selection under Mediterranean conditions over a long period of time (i.e., barley cultivation in Spain dates back to prehistoric times), they may harbour adaptive genes and alleles that have escaped mainstream breeding. An evaluation of the diversity represented by this genetic resource is necessary in order to facilitate its use in future cultivar development. To this end, a core collection of the accessions held at the national repository [Spanish barley core collection (SBCC)] was systematically assembled (Igartua et al. 1998).
We chose microsatellite markers, as this system has been used effectively in diversity studies in barley (Struss and Plieske 1998; Russell et al. 2000; Matus and Hayes 2002; Russell et al. 2003; Malysheva-Otto et al. 2006; Pandey et al. 2006; Orabi et al. 2007). The main objective of this study was to characterize, by means of molecular markers, the genetic diversity present in Spanish barleys and its relationship to the standard breeding genepool, represented by a reference set of mostly European cultivars.
Materials and methods
Barley genotypes analysed (detailed information on cultivars in Electronic supplementary material Table S1)
No. of accessions
Inbred lines derived from local landraces (SBCC)
Albacete, Almunia, Candela, Pané
Ager, Asplund, Athenais, Athene, Banteng, Barberousse, Bordia, Dea, Dobla, Dura, Frisia, Gerbel, Hatif de Grignon, Hauter, Herfodia, Juli, Mammuth, Maskin, Mirco, Monlon, Morex, Olli, Orria, Plaisant, Ragusa, Senta, Steptoe, S-36, S-45, Tapir, Vega Svalöf, Vindicat, Vogelsanger Gold
Inbred lines derived from local landraces (SBCC)
Albaicín, Alexis, Alpha, Angora, Beka, Camelot, Cameo, Clarine, Gaelic, Graphic, Hassan, Hispanic, Igri, Kym, Labea, Logan, Mogador, Nevada, Pallas, PC-4, Seira, S-7, Tipper, Tremois, Triumph, Union, Volga, Wisa, Zaida
Molecular marker analyses
Samples of leaf tissue were taken from 8–10 plants per genotype, 14 days after sowing in paper-pots in the greenhouse. DNA extraction was carried out following a CTAB procedure, as described in Casas et al. (1998). The entire set of 225 genotypes was genotyped for 64 microsatellites, also known as simple sequence repeats (SSRs) (Electronic supplementary material Table S2), and for one sequence-tagged site (STS), MWG699. This STS is closely linked to vrs1, the gene controlling ear type, and has been proposed as a diagnostic marker for barley origin (Tanno et al. 1999, 2002). SSR primer pair sequences and amplification conditions were obtained from Pillen et al. (2000), Ramsay et al. (2000) and Macaulay et al. (2001). PCR amplifications were carried out in a final volume of 15 μl, containing 50 ng of genomic DNA, 1× PCR Buffer (Biotools, Madrid, Spain), 2 mM MgCl2, 15 pmol of forward and reverse primers, dNTPs at 0.2 mM each, and 0.4 U of Tth DNA Polymerase (Biotools, Madrid, Spain).
Equal volumes of electrophoresis-loading buffer containing 95% formamide were added to the samples, which were then denatured at 95°C, quickly cooled and electrophoresed in 5% polyacrylamide gels. A 30–330 bp AFLP Ladder (Invitrogen) was also loaded, and products were visualized by silver staining (Bassam et al. 1991). Gels were scanned with a Molecular Imager FX (Bio-Rad) and product sizes estimated using the Diversity Database software (Bio-Rad). Two cultivars that were polymorphic for the markers assayed were included as checks in each gel. A set of 40 cultivars was assayed first for all markers, providing information on allele diversity and size at each marker. After all samples had been tested, the polymorphisms found were confirmed by running a set of verification gels for each marker with genotypes representing all apparent allele sizes.
Twelve SSRs, Bmag136, Bmag013, Bmag384, Bmag353, EBmac701, EBmac970, Bmac113, Bmag223, EBmac806, Bmag206, Bmag120, and Bmag135 were analysed using an ABI 310 automated sequencer (Applied Biosystems). PCR products were run together with the internal lane size standard GeneScan 500 [TAMRA] according to the supplier’s instructions. The results were processed with GeneScan software.
Nearly all SSRs used produced a single band per genotype. Double bands were observed at low frequency, and were even less common among the SBCC landraces, which confirms their homozygosity. A single SBCC genotype showed double bands for six SSRs, and three cultivars showed double bands at 13 SSRs in total. All potential cases of double bands were confirmed with additional PCR runs. When double bands were detected, the most intense was taken as representative and used for subsequent analyses.
Analysis of molecular variance (AMOVA)
The distribution of genetic variation across the germplasm groups established a priori was analysed using the AMOVA option of the Arlequin software package (Schneider et al. 2000). Fixation statistics (FST and RST, corresponding to the infinite allele and the stepwise mutation models of SSR evolution, IAM and SMM, respectively) were produced for individual SSRs and groups of germplasm. The significance of the estimates was obtained through permutation tests, using 1,000 permutations. The significance level chosen was 0.0008, which corresponds to a genome-wise significance level of 0.05 for 64 independent tests (one for each marker), applying a Bonferroni correction.
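The threshold quoted above follows from dividing the genome-wise level by the number of markers. The short Python sketch below reproduces that arithmetic and shows a generic permutation p-value of the kind described. The function names and the toy statistic are illustrative assumptions and do not reproduce Arlequin's implementation.

```python
import random

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level for a family-wise level alpha."""
    return alpha / n_tests

def permutation_p_value(values, labels, statistic, n_perm=1000, seed=0):
    """Fraction of label permutations whose statistic is >= the observed one."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            hits += 1
    return hits / n_perm

def mean_difference(values, labels):
    """Toy statistic (assumption): absolute difference between two group means."""
    a = [v for v, g in zip(values, labels) if g == "A"]
    b = [v for v, g in zip(values, labels) if g == "B"]
    return abs(sum(a) / len(a) - sum(b) / len(b))

# 0.05 genome-wise level spread over 64 markers, as in the text.
threshold = bonferroni_threshold(0.05, 64)
print(round(threshold, 4))  # 0.0008

# Hypothetical values for two groups, for illustration only.
toy_values = [1, 2, 3, 10, 11, 12]
toy_labels = ["A", "A", "A", "B", "B", "B"]
p = permutation_p_value(toy_values, toy_labels, mean_difference)
print(p)
```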
Analysis of genetic structure
Genotypes were classified into genetic clusters according to molecular markers, using a model-based approach with the software package STRUCTURE (Falush et al. 2003). Given a value for the assumed number of populations (clusters), this method assigns genotypes from the entire sample to clusters in such a way that Hardy–Weinberg equilibrium is maximized within clusters, and linkage disequilibrium is accounted for by differences in allele frequencies among clusters. As the lines used for this study are homozygous, we used the method exclusively to detect association between marker loci, rather than including within-marker locus variation (Kraakman et al. 2004). The analyses were done according to the linkage model in STRUCTURE (Falush et al. 2003). Twenty-five different runs of STRUCTURE were done, with the number of populations (K) set from 1 to 10. Runs were carried out with burn-in time and replication numbers both set to 10,000, as several runs with these numbers set at 50,000 gave very similar results.
Distribution of genetic diversity according to geographic and climatic factors
The distribution of populations defined by cluster analysis over climatic and geographic factors was examined. Factors considered were: altitude, latitude, rainfall, temperature, total evapotranspiration (ETP), Turc index, Papadakis climatic index, and agroecological region (a division of geographic zones based on historic barley production, defined in Igartua et al. 1998). Rainfall, temperature and ETP were calculated as annual means, and as seasonal means: Autumn (mean of October, November and December), Winter (January, February, March), and Spring (April, May, June). Turc index is an estimation of the agronomic productivity of a region, based on correlations between climatic factors and production over a long period of time, and is expressed as tons of dry matter per hectare of an adapted plant under standard cultivation (Ruimy et al. 1996). Papadakis climatic index classifies climates according to the factors that affect crop development most, namely temperature and humidity (Papadakis 1975). Data were extracted from the SIGA service (Spanish acronym for Geographic Information System for Agriculture) of the Spanish Ministry of Agriculture, Fisheries, and Food (http://www.mapa.es/es/sig/pags/siga/intro.htm). This site provides monthly averages of climatic data for 4,186 locations across the country (averaged from 1960 until 1996), collected by the National Institute of Meteorology. We had coordinates for the collection sites of all landrace-derived accessions from the SBCC. For 77 accessions, collection localities also had weather stations. For the rest, the nearest, most similar location with climatic records was chosen. In 34 cases, climatic stations were less than 10 km away from collection sites; in an additional 35 cases, weather stations were within a 20-km radius, leaving only 13 cases where weather stations were over 20 km from the collection sites. When several weather stations were available at similar distances, we chose the one that most closely resembled the collection site in altitude and orientation.
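The seasonal aggregation and the station matching described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the monthly values, station names and coordinates are hypothetical, and the matching uses great-circle distance alone, without the altitude and orientation criteria applied in the study.

```python
import math

# Seasons as defined in the text: three-month means.
SEASONS = {
    "autumn": ("Oct", "Nov", "Dec"),
    "winter": ("Jan", "Feb", "Mar"),
    "spring": ("Apr", "May", "Jun"),
}

def seasonal_means(monthly):
    """Average the monthly values within each season."""
    return {
        season: sum(monthly[m] for m in months) / len(months)
        for season, months in SEASONS.items()
    }

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def nearest_station(site_lat, site_lon, stations):
    """Return (station, distance_km) for the closest station by distance only."""
    best = min(stations, key=lambda s: haversine_km(site_lat, site_lon, s["lat"], s["lon"]))
    return best, haversine_km(site_lat, site_lon, best["lat"], best["lon"])

# Hypothetical monthly rainfall (mm), for illustration only.
rain = {"Jan": 20, "Feb": 25, "Mar": 30, "Apr": 40, "May": 50, "Jun": 30,
        "Jul": 10, "Aug": 15, "Sep": 35, "Oct": 30, "Nov": 60, "Dec": 90}
# Hypothetical stations and collection site.
stations = [
    {"name": "station_a", "lat": 41.70, "lon": -0.80},
    {"name": "station_b", "lat": 41.90, "lon": -0.50},
]
means = seasonal_means(rain)
station, dist_km = nearest_station(41.72, -0.81, stations)
print(means)
print(station["name"], round(dist_km, 1))
```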
Association of markers and alleles with geographic and climatic factors was explored by means of linear regression analyses. Alleles with relative frequencies below 5% were excluded from the analyses. Regressions of markers on climatic and geographic factors were carried out using PROC GLM (SAS 1988). Alleles were introduced in the model as dependent variables, whereas geographic and climatic factors were the independent variables. Significance was calculated for each model, which included only one allele, with the significance threshold set at 0.05 and adjusted with a Bonferroni correction, as described above. The association with climate types according to the Papadakis index was tested by analysis of variance, using significance levels similar to those of the regression analyses.
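A single-allele regression of this kind can be sketched in Python as below. The sketch fits an ordinary least-squares line to allele presence (coded 0/1) against one factor and, as a swapped-in assumption, assesses significance by permutation rather than by the F-test that PROC GLM reports. The altitude values and allele scores are hypothetical, and the allele is assumed to be polymorphic in the sample.

```python
import random

def ols_fit(x, y):
    """Least-squares slope, intercept and R^2 for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2

def permutation_p(x, y, n_perm=2000, seed=1):
    """Fraction of shuffles of y giving an R^2 at least as large as observed."""
    rng = random.Random(seed)
    observed = ols_fit(x, y)[2]
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if ols_fit(x, y_perm)[2] >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical collection-site altitudes (m) and presence of one allele (0/1).
altitude = [100, 250, 400, 550, 700, 850, 900, 1000]
allele_present = [0, 0, 0, 0, 1, 1, 1, 1]

slope, intercept, r2 = ols_fit(altitude, allele_present)
p_value = permutation_p(altitude, allele_present)
alpha_per_test = 0.05 / 64  # Bonferroni-adjusted level used in the text
print(round(r2, 3), p_value, p_value < alpha_per_test)
```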
Overall, 669 alleles were found for the 225 genotypes and 64 SSRs (Electronic supplementary material Table S2). Null alleles were found at five loci (HvLTPPB, HvDHN7, EBmac806, Bmag135, and HvGLB2). The number of alleles per locus varied between 2 and 38, with a mean of 10.5 alleles per locus. A sizeable proportion of alleles (34.1%) were restricted to one of the four germplasm groups. Although differences in sample size need to be considered, the number of unique alleles was much higher within the group of six-row Spanish barleys (184 out of 228). Most of these were rare alleles, present at very low frequencies, but nine of them were found in more than 10% of the individuals of the corresponding germplasm group.
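Counting alleles restricted to one germplasm group amounts to a simple set operation, sketched below in Python. The record layout, locus names, group labels and allele sizes are hypothetical. The last lines only re-derive the summary figures quoted above from the reported counts.

```python
from collections import defaultdict

def private_alleles(records):
    """Map each group to the (locus, allele) pairs observed in that group only.

    records: iterable of (group, locus, allele) tuples.
    """
    groups_by_allele = defaultdict(set)
    for group, locus, allele in records:
        groups_by_allele[(locus, allele)].add(group)
    private = defaultdict(set)
    for key, groups in groups_by_allele.items():
        if len(groups) == 1:
            private[next(iter(groups))].add(key)
    return dict(private)

# Hypothetical genotype records, for illustration only.
records = [
    ("spanish_six_row", "locus_a", 150),
    ("spanish_six_row", "locus_a", 154),
    ("reference_six_row", "locus_a", 150),
    ("reference_two_row", "locus_b", 120),
    ("spanish_six_row", "locus_b", 120),
    ("spanish_six_row", "locus_b", 126),
]
result = private_alleles(records)
print(result)

# Summary figures from the reported counts (669 alleles, 64 SSRs, 228 group-specific).
mean_alleles_per_locus = 669 / 64
private_share = 100 * 228 / 669
print(round(mean_alleles_per_locus, 1), round(private_share, 1))  # 10.5 34.1
```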
Overall, the Spanish six-row genotypes were more diverse than the reference six-row group (average diversity index across all loci of 0.62 versus 0.58, Electronic supplementary material Table S2). The level of diversity in both two-row groups was lower (0.53 and 0.54). A few alleles were fixed in some groups. For instance, all the genotypes of the Spanish six-row group had the same allele at the HvHVA1 (136 bp) and HvCMA (133 bp) loci.
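The text does not state which formula underlies the diversity index. As an assumption, the Python sketch below uses Nei's gene diversity, one minus the sum of squared allele frequencies, averaged across loci. The locus names and allele calls are hypothetical. A locus fixed for one allele has a diversity of zero.

```python
from collections import Counter

def gene_diversity(alleles):
    """1 - sum(p_i^2) over the allele frequencies at one locus (assumed index)."""
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())

def mean_diversity(loci):
    """Average gene diversity across loci; loci maps locus name -> allele calls."""
    return sum(gene_diversity(calls) for calls in loci.values()) / len(loci)

# Hypothetical allele calls (bp) for one group of inbred lines.
group = {
    "fixed_locus": [136, 136, 136, 136],      # one allele only
    "variable_locus": [150, 150, 154, 154],   # two alleles at equal frequency
}
fixed = gene_diversity(group["fixed_locus"])
variable = gene_diversity(group["variable_locus"])
average = mean_diversity(group)
print(fixed, variable, average)  # 0.0 0.5 0.25
```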
Genetic variance among the four groups of germplasm defined a priori
The analysis of molecular variance for the germplasm groups defined a priori revealed significant FST and RST parameters for both six-row and two-row comparisons (Electronic supplementary material Table S3). Genetic divergence between the Spanish and reference six-row groups was 10.4% for FST, whereas it was 21.4% for RST. For the comparison between the two-row groups, the FST was similar (11.9%), but the RST (7.5%) was much lower than the value between the six-row groups.
The grouping of genotypes observed in Fig. 1 did not follow entirely the groups made a priori. For this reason, we decided to investigate further the genetic structure of the germplasm studied, especially to address the question of whether highly diverse Spanish six-row barleys constitute a single group.
FST and RST statistics for pairwise comparisons between the populations (named in Roman numerals I through IV) and subpopulations of barley genotypes defined by the analysis carried out with the STRUCTURE package for K = 4
Association of groups with geographic and climatic factors
Means and standard deviations of geographic and climatic factors for the landrace-derived inbred lines of the SBCC of the four populations, and population IV subpopulations, deduced from the population structure analyses (subpopulations of population III presented no significant difference for these factors). Factors tabulated include autumn, winter and spring rainfall (mm) and the Turc productivity index.
Distribution of STS MWG699 haplotypes across populations derived from population structure analyses (number of genotypes per population and subpopulation)
Genetic diversity in groups defined a priori
The average number of alleles per locus (10.5, Electronic supplementary material Table S2) was higher than the comparable figures reported in other studies for cultivated barley (Russell et al. 1997; Macaulay et al. 2001; Karakousis et al. 2003; Sjakste et al. 2003). Although many markers are in common among these studies and ours, they surveyed smaller samples than that of the current study. The level of diversity found in our sample is closer to values reported in studies with populations of H. spontaneum (Ivandic et al. 2002; Baek et al. 2003), or large diverse sets of cultivated barley (Russell et al. 2000; Matus and Hayes 2002). Thus, we can conclude that we are studying a sample of H. vulgare types with considerable polymorphism. Notably, we had 14 SSRs in common with the worldwide survey of cultivated barley diversity carried out by Malysheva-Otto et al. (2006). In the present study, the average numbers of alleles and PIC for these 14 loci were 12.1 and 0.72, respectively, compared with 18.6 and 0.79, for the worldwide study.
On average, SSRs derived from genic sequences were less polymorphic than SSRs from random genomic clones (6.5 vs. 12.1 alleles per locus). Interestingly, most of the gene-derived markers in our study showed higher diversity values than those previously reported for a set of elite German cultivars (Pillen et al. 2000). The high values of diversity detected were mostly due to the large number of alleles (591) present in the six-row Spanish group, 34.1% of which were private alleles. As pointed out by Matus and Hayes (2002), the presence of so many unique alleles could be an indication of the relatively high rate of mutation at SSR loci, or could also point to the existence of exotic germplasm that could be a reservoir for novel alleles for crop improvement. Thus, the high diversity of the Spanish six-row accessions, coupled with the high number of unique alleles, could be explained by a rather long history of isolation from other European countries and concurrent genetic drift or selection for adaptation to local constraints. The generally low frequency of most private alleles is consistent with a genetic drift explanation; the presence of some private alleles with high frequencies in Spanish barleys, however, suggests the effect of selection pressure. The distribution of five of the nine private alleles present in high frequencies in the Spanish six-row group was related to geographical factors. The association of the distribution of genetic diversity with geographic patterns also points to the presence of selection pressure favouring alleles associated with better local adaptation (Tables 3, 4).
The analysis of molecular variance revealed a remarkable genetic divergence between the Spanish and reference sets. FST values between them were similar to the values found by Maestri et al. (2002) and Koebner et al. (2003) for comparisons between winter and spring barleys, two quite distinct germplasm groups. The divergence between the Spanish and reference groups was similar for both the two-row and six-row barleys, using an IAM (measured by the FST, Electronic supplementary material Table S3). However, using an SMM (RST statistic), the Spanish and reference six-row barleys would be more distinct than the two-row groups. This was because differences in allele frequencies between groups were similar in number for both row types but, for the six-row groups, the alleles with different frequencies among groups were also more distant in size. Accordingly, there were more SSRs clearly discriminating between the Spanish and reference sets for the six-row than for the two-row groups (Electronic supplementary material Table S3). The level of genetic differentiation between the Spanish and reference sets was not as high as that found among barley landraces from Syria and Jordan (RST = 32.04) (Russell et al. 2003), but was close to the values found between populations of wild barley from different countries (FST = 7.75–10.54, and RST = 8.58–10.59) (Ivandic et al. 2002). For some markers, both FST and RST indices were significant, whereas for others only one of the two statistics, usually FST, was significant, as was also found by Ivandic et al. (2002), who recommended the use of both indices to provide maximum information on allele differentiation among groups of genotypes.
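The contrast between the two statistics can be illustrated with toy data. The Python sketch below uses simplified ratio estimators, (HT - HS)/HT on allele identity and (ST - SW)/ST on allele size, which are assumptions for illustration and not the variance-component estimates produced by Arlequin. The allele sizes are hypothetical. In both scenarios the groups share no alleles, so the identity-based statistic is unchanged, whereas the size-based statistic grows as the alleles of the two groups move further apart.

```python
from collections import Counter

def heterozygosity(alleles):
    """Expected heterozygosity, 1 - sum(p_i^2)."""
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())

def variance(values):
    """Population variance of allele sizes."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def fst_like(group_a, group_b):
    """(HT - HS) / HT for two equal-sized groups; uses allele identity only."""
    hs = (heterozygosity(group_a) + heterozygosity(group_b)) / 2
    ht = heterozygosity(group_a + group_b)
    return (ht - hs) / ht

def rst_like(group_a, group_b):
    """(ST - SW) / ST for two equal-sized groups; uses allele sizes."""
    sw = (variance(group_a) + variance(group_b)) / 2
    st = variance(group_a + group_b)
    return (st - sw) / st

# Hypothetical allele sizes (bp): same frequencies, different size gaps.
group_a = [100, 100, 102, 102]
near_b = [104, 104, 106, 106]   # alleles close in size to group_a
far_b = [120, 120, 122, 122]    # alleles distant in size from group_a

fst_near, fst_far = fst_like(group_a, near_b), fst_like(group_a, far_b)
rst_near, rst_far = rst_like(group_a, near_b), rst_like(group_a, far_b)
print(round(fst_near, 3), round(fst_far, 3))
print(round(rst_near, 3), round(rst_far, 3))
```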
The evidence suggests that the Spanish six-row barleys are more distinct from their reference counterparts than the two-row types. The principal coordinate analysis supported this, as about half of the Spanish two-row barleys clustered with cultivars from the reference two-row set (Fig. 1), especially with spring cultivars (Beka, Triumph, and Alexis). Other Spanish two-row entries clustered closer to the six-row barleys, in intermediate positions between the Spanish and reference sets. Nevertheless, the observations on the Spanish two-row set cannot be conclusive, because of the small sample size. This size was set to be proportional to their prevalence in the original collection of Spanish barleys (Igartua et al. 1998).
The Spanish six-row genotypes generally formed a distinct group, with only a few genotypes clustering with genotypes of the reference sets (about 16 in total). The distinct cluster of Spanish six-row genotypes, however, showed a remarkable internal diversity that was comparable to the two reference sets combined, given the scatter of their respective points on the principal coordinate analysis (Fig. 1). Interestingly, the Mediterranean cultivar Athenais (from Greece) appeared to show a relatively higher degree of genetic relatedness to Spanish accessions compared with other non-Mediterranean cultivars.
The use of the STRUCTURE clustering algorithm allowed the identification of populations of genotypes, based on their genetic similarity. This procedure has been used for clustering of collections of inbred lines in maize (Remington et al. 2001; Jung et al. 2004), Arabidopsis (Olsen et al. 2004), rice (Garris et al. 2003), wheat (Chao et al. 2007) and barley (Pandey et al. 2006; Morrell and Clegg 2007), among other crops. In all these studies, authors found evidence of population substructure. Kraakman et al. (2004), however, did not find any substructure among a collection of recent spring barley cultivars. In our study, we have clearly and consistently identified four populations, two dominated by Spanish six-row entries, and two clearly formed around the reference sets.
The two main Spanish populations (III and IV) were identified by the STRUCTURE analysis at K = 3, prior to the separation of populations I and II, which comprised all the entries from the reference sets (K = 4). This does not necessarily imply that the two main Spanish populations presented larger genetic divergence than populations I and II (basically two-row vs. six-row reference sets), as the population sizes were not equal and this may well have affected the order of population identification. Indeed, the FST and RST fixation indices (Table 2) showed that the genetic divergence between populations III and IV was slightly less than that between populations I and II. It is remarkable, however, that the differences between the two large Spanish populations and between the two populations dominated by reference set entries were roughly equivalent. The reference sets were very diverse, including cultivars of all growth and row types (spring and winter, two-row and six-row), and represented the range of germplasm that largely underpins current European barley cultivars. Therefore, the genetic differences between populations III and IV must be quite important. The AMOVA of the four populations derived from the STRUCTURE analysis confirmed the genetic divergence among them. The range of values observed for FST and RST among populations was similar to the range found in a comparison of landraces from Jordan and Syria by Russell et al. (2003).
Association of populations with geographic and climatic factors
The two main Spanish populations were distributed according to geographic patterns, and roughly following a North–South direction (Fig. 4), though some groupings independent from geography were also observed.
A few studies have shown that correlations of climatic factors with genetic diversity assessed with SSRs are prevalent in H. spontaneum (Turpeinen et al. 2001; Baek et al. 2003; Ivandic et al. 2002, 2003). Nevo et al. (2005) even suggested a possible adaptive role for the SSRs themselves, in a situation dominated by abiotic stresses. Some of the associations found by these authors were confirmed in the present study (Table 4). This was true for six markers (HvM62, HvLTPPB, Bmag369, Bmag378, HvBTAI3, HvM67). In another two cases, we detected associations similar to those found by Ivandic et al. (2002), but using different markers in the same regions. In this study, HvM62 and Bmag369 alleles correlated with temperature and rainfall, whereas Ivandic et al. (2002) found that Bmac29 (close to HvM62), and Bmac273 and Bmag120 (flanking Bmag369) were associated with humidity.
The main climatic factors affecting the distribution of barley cultivars are temperature and water availability (Morris et al. 1991). Accordingly, we divided the markers associated with climatic and ecogeographic factors into five groups: markers with association to temperature and rainfall distribution; markers related to only one of these two factors; markers related to other climatic factors; and markers related only to complex geographic indexes (Table 4). As climatic factors were themselves correlated, it is difficult to discriminate the effect of single factors. Judging by the number of associations, however, temperature seemed to be the single climatic factor that most affected the distribution of marker alleles in the SBCC landraces.
Thus, adaptation to ecogeographic factors could be one of the causes of the observed population structure of the SBCC. It cannot be concluded, however, that there are loci for adaptation linked to the SSRs whose distribution is associated with geography and climate. The genetic diversity of this sample of germplasm was clearly stratified in populations (Fig. 3), and these populations were seemingly distributed along eco-geographic gradients (Table 3). Thus the distribution of entries from populations III, IV–I, and IV–II followed a gradient of agroecological conditions occurring in the barley-growing area of the Iberian peninsula, following the main climatic clines. Therefore, any marker that had uneven distribution across genotypic populations would necessarily appear associated with geographic or climatic factors for which populations also differ. Indeed, there were similarities between locus-by-locus FST (not shown) calculated for the four populations presented in Table 2, and the strength of loci association with eco-geographic traits. This was especially true when the analysis was based only on populations III and IV, which included the majority of SBCC entries, and showed clearly distinct eco-geographic distributions (Table 3). Markers (31) that showed association with eco-geographic factors showed an average FST of 0.18 between these populations, which was significantly (P < 0.0001) larger than the average FST value of 0.05 for the rest of the marker loci. For these two populations, the correlation coefficient of FST with number of associations was 0.64. The reason for uneven distribution of marker alleles over populations could be the selection for adaptation to local environments (Russell et al. 2003), but could also be due to incomplete admixture of populations from different origins, with different phylogenetic histories. Further efforts to elucidate the causes of this biased marker distribution will need to focus on the existence of unlinked gene complexes, distribution of functional polymorphisms, and search for molecular signatures of selection.
All the evidence found points to the existence of at least two large distinct populations of Spanish six-row barleys (III and IV), with distinct distributions over agroecological environments. In order to investigate whether these populations have a different origin, all materials in this study were genotyped with the STS MWG699, which was proposed by Tanno et al. (1999, 2002) as a marker of barley domestication. The marker shows three haplotypes, named A, D, and K, the last one being reported only in two-row barleys. Tanno et al. (2002) found the A haplotype widely spread, whereas D was confined to the Mediterranean region, though Casas et al. (2005) found a wider distribution of the D haplotype over Europe and a possible association of this marker with plant growth type.
The distribution of MWG699 haplotypes over the populations and subpopulations found in this study reflects the population structure and the distribution of the SSRs used to identify the populations (Table 5). If this marker does reflect domestication history, then the evidence found suggests the presence of two different origins for SBCC barleys that may be related to the influx of different human populations coming into the Iberian peninsula from different origins in historic or prehistoric times, each one possibly carrying its own type of barley. Alternatively, there could be a substratum of barleys in Spain that originated from a domestication event in the Western Mediterranean region (possibly represented by the MWG699 D haplotype), as several authors propose a polyphyletic origin for barley (Komatsuda et al. 2004; Morrell and Clegg 2007; Orabi et al. 2007), and some evidence points to a Western Mediterranean centre of origin or diversity for barley (Moralejo et al. 1994; Molina-Cano et al. 1987, 2005). Interestingly, both Casas et al. (2005) and Tanno et al. (2002) found that the MWG699 D haplotype was present in some H. spontaneum accessions from Morocco.
In conclusion, we propose the hypothesis that the SBCC genotypes have at least two different origins, and that the distribution of their original landrace populations over the Iberian peninsula followed patterns of adaptation to local conditions. This adaptation may have been due to distinct fitness of founder populations to different climates, to new variability created through admixture and recombination of original populations, or to specific adaptations that developed locally. Further studies on the SBCC should shed light on the causes of adaptation, specifically on traits and genes that led to its distribution. Such studies will also facilitate the utilisation of this important genetic resource as a source of useful novel alleles in future breeding programmes.
This research was funded by project RTA01-088-C3, granted by the Instituto Nacional de Investigación y Tecnología Agraria y Alimentación (INIA), of the Spanish Ministry of Science and Technology, and co-funded by the European Regional Development Fund. Samia Yahiaoui was supported by a scholarship from the Agencia Española de Cooperación Internacional (AECI), of the Spanish Ministry of Foreign Affairs.