Introduction

Plants can increase their freezing tolerance (FT) in response to low, non-freezing temperatures, a phenomenon known as cold acclimation or hardening. The molecular dissection of cold acclimation has revealed a complex process characterised by the coordinated up- or down-regulation of hundreds of Cold-Regulated (COR) genes which, in turn, are controlled by a complex regulatory network (Shinozaki and Yamaguchi-Shinozaki 2000; Thomashow 2001; Fowler and Thomashow 2002). In many plant species, including Triticeae, the C-repeat Binding Factor (CBF) genes are the key regulators of the signal cascade leading to the expression of COR genes. The CBF transcription factors recognise the cis-acting CRT/DRE (C-repeat/dehydration responsive element) element largely represented in the regulatory regions of COR genes (Stockinger et al. 1997; Skinner et al. 2005). Heterologous over-expression of CBF sequences in transgenic plants has been demonstrated to increase levels of FT (Takumi et al. 2008; Oh et al. 2007).

In barley (Hordeum vulgare), 20 CBF genes have been identified by Skinner et al. (2006), 11 of which were found to be arranged in two tight tandem clusters on the long arm of chromosome 5H, within a genomic region previously identified as an FT quantitative trait locus (Fr-H2) (Skinner et al. 2006; Francia et al. 2007). Several studies have reported that the expression of CBF and, in turn, of COR genes is reduced following induction of Vrn-H1 in response to vernalization, suggesting an interaction between the vernalization and acclimation regulatory gene networks (Stockinger et al. 2007). In the Nure × Tremois mapping population, the CBF gene clusters were reported to span 0.8 cM (Francia et al. 2007). In Triticum monococcum an orthologous genomic region containing similar CBF gene clusters was found (Miller et al. 2004). The role of CBFs in cold acclimation (Stockinger et al. 2007), the association of CBF expression level with FT (Vágújfalvi et al. 2005) and the tight co-segregation of the CBF gene clusters with the Fr-H2 QTL region make these genes the most appropriate candidates for one of the two major QTL controlling frost tolerance in Triticeae (Stockinger et al. 2007). The detailed analysis of the genomic structure of the CBF cluster in a panel of barley cultivars has revealed a non-conserved genomic organisation at the Fr-H2 locus: the cultivars Nure and Dicktoo (both winter type) share the same genomic organisation, while Morex and Tremois (both spring type) are characterised by a different genomic structure (Francia et al. 2007). For example, the HvCbf10B gene is present in Dicktoo and Nure but absent from Morex. As a consequence, the total number of CBF genes at the Fr-H2 locus may vary among accessions (Francia et al. 2007). The role of each CBF gene and their regulatory network has been extensively studied in Arabidopsis thaliana, where at least AtCbf1 and AtCbf3 have additive effects and redundant activities (Novillo et al. 2007).

The investigation of genetic variants in cultivated and wild cereal crops has helped to explain important phenomena such as domestication effects and selective sweeps. Nucleotide and haplotype diversity are widely used estimators of the degree of polymorphism at loci in populations. In barley, a dramatic loss of genetic variants in cultivated varieties was reported in genes controlling pathogen resistance (Bundock and Henry 2004) and germination (Russell et al. 2004), and in genes encoding enzymes (Kilian et al. 2006). Intensive selection and the resulting bottlenecks appear to be the major determinants of the narrowness of modern gene pools (Badr et al. 2000; Salamini et al. 2002). In A. thaliana, natural purifying selection acting on CBF genes was proposed to explain the clinal variation in freezing tolerance among accessions of different geographical origin (Hannah et al. 2006; Zhen and Ungerer 2007; Zhen and Ungerer 2008).

The level of linkage disequilibrium (LD), i.e. the non-random association of alleles at distinct loci (Hill and Robertson 1968; Lewontin and Kojima 1960), can change across genomic regions, depending on local recombination rates and sequence composition. As pointed out in previous studies in humans and plants (Yu and Buckler 2006; Hayes and Szucs 2006; Rostoks et al. 2006), LD evaluation is a crucial step before carrying out association analysis. In out-crossing plants such as maize, LD decays within distances ranging from a few bases in landraces to about 2,000 bp in elite cultivars (Remington et al. 2001; Tenaillon et al. 2001; Palaisa et al. 2003; Yu and Buckler 2006), although in some genomic regions LD can extend up to 500 kb (Jung et al. 2004). In A. thaliana, an inbreeding species, LD extends over about 10 kb (Nordborg et al. 2005; Kim et al. 2007); nevertheless, in highly isolated populations LD extends up to 50 cM (Garris et al. 2003). In cultivated barley, LD can be detected over distances as large as 60 cM (Rostoks et al. 2006), while in wild barley, despite the high rate of self-fertilisation, LD decays at a rate closer to that observed in out-crossing plants (Morrell et al. 2005).

Association mapping was originally proposed in humans to dissect the genetics of complex diseases (Lander and Schork 2004; Templeton 1995). The aim of association mapping is to find significant associations between genetic variants and phenotypes in unrelated individuals (Weiss and Clark 2002). In plants, association mapping offers the opportunity to dissect candidate genes that underlie important agronomic QTLs using large germplasm collections instead of family-based crosses (Laird and Lange 2006). Association mapping can be performed with a genome-wide molecular marker set or with markers designed to target candidate genes. The second approach requires low levels of LD and a high density of markers in order to detect association at high resolution (Rafalski 2002). This approach was used in wheat to refine the genomic region carrying candidate genes for resistance to Stagonospora nodorum blotch (Tommasini et al. 2007) and to find molecular markers associated with kernel size (Breseghello and Sorrells 2006). In barley and maize, several studies were conducted to find statistical associations between genetic variants and complex traits such as yield, flowering time and drought tolerance (Kraakman et al. 2004; Thornsberry et al. 2001; Pakniyat et al. 1999).

In this work, we explored the natural allelic variation in four CBF genes located at the Fr-H2 locus and in the Vrn-H1 gene using a representative collection of European cultivated barley, landraces and wild accessions. This study was designed (1) to assess existing allelic variants and the occurrence of domestication effects in key genes controlling a trait associated with plant adaptation to environment, and (2) to perform an association analysis between the genetic variants in the studied genes and the freezing tolerance phenotype.

Materials and methods

Plant material and DNA extraction

The genetic material used in this study (216 accessions in total) was selected from three different sources. A representative subset of 113 cultivars and locally adapted accessions from the germplasm collection of the CRA-Genomic Research Centre of Fiorenzuola d’Arda was selected to represent the germplasm widely cultivated across Europe, so that the genetic variants currently present in European barley were likely to be covered. This set of genotypes was composed of modern cultivars (released between 1960 and 1980), advanced cultivars (released after 1980) and 12 European landraces. The growth habit of each cultivar considered in this study was obtained by querying the European Barley Database (http://barley.ipk-gatersleben.de/ebdb.php3). For barley accessions missing from the database, as well as for cultivars with conflicting passport data, the vernalization requirement was assessed during experimental trials in Fiorenzuola d’Arda.

Sixty-seven barley landraces and old European cultivars previously selected out of 5,842 accessions for their different plant characters and described by Castiglioni et al. (1998) were obtained from the germplasm bank of Braunschweig (Germany). These accessions were used to investigate the genetic variants in barley populations locally adapted in a broad range of different environments.

Thirty-six Hordeum spontaneum originating from the Fertile Crescent were added as representative samples of the genetic variability present in wild populations (Badr et al. 2000). The genetic material considered is described in the supplementary Tables S1, S2 and S3.

Genomic DNA was isolated from about 60 mg of leaf tissue in 96-well microtube plates. Plant material was ground with MM300 Mixer Mill (Retsch®, Haan, Germany) and DNA was extracted with Promega Magnetic® 96 DNA Plant System following manufacturer’s instructions.

PCR amplification and SNP discovery

SNP analysis was carried out on five genes putatively involved in the adaptation of barley to low temperature: four amplicons from the CBF clusters on the long arm of chromosome 5H (HvCbf3, HvCbf6, HvCbf9 and HvCbf14) and one corresponding to the Vrn-H1 gene, located distally from the CBF clusters on the same chromosome. The CBF genes are named according to the classification of Miller et al. (2004), while following Badawi et al. (2007) the selected genes correspond to HvCbfIIIc-3, HvCbfIIIa-6, HvCbfIVd-9 and HvCbfIVc-14, respectively. PCR primers were designed using Primer3 software (http://frodo.wi.mit.edu) (Rozen and Skaletsky 2000) on the basis of the available sequences for HvCbf3, HvCbf9, HvCbf14 and Vrn-H1 (GenBank accession numbers AF239616 and EF409484, AY785877, DQ151545 and AY750995, respectively) in order to amplify most of the coding sequence and the 3′UTR. Two primer pairs with slightly different annealing positions were designed for HvCbf3 and for HvCbf9. Only one primer was developed for HvCbf6 and it was used in combination with the primer HvCBF6.006 (Skinner et al. 2005). Two reference genes, α-Amy1 and Gapdh, described by Kilian et al. (2006) and known to generate significant Tajima’s D values, were also included. They were amplified using two previously published primer pairs, B306–B307 and B604–B605 (Kilian et al. 2006). The list of primers used in this study and the expected amplicon sizes are reported in Table S4.

All PCRs and sequencing reactions were carried out in 96-well plates (MicroAmp® 96-well plates, Applied Biosystems, Foster City, USA) in Eppendorf Mastercycler EPgradient thermal cyclers (Eppendorf, Hamburg, Germany). Each reaction was performed in 10 μl with the following mixture composition: 20 ng of DNA, 1.5 mM MgCl2, 0.4 μM of each primer and 0.25 U Taq HotStart DNA polymerase QIAGEN (Valencia, USA). The reaction mixtures for Vrn-H1 and HvCbf6 amplifications were supplemented with 3 mM of MgCl2 and 1× Q Solution QIAGEN (Valencia, USA), respectively. The following thermal protocol was used to amplify HvCbf3, HvCbf9 and HvCbf14: after an initial denaturation step at 95°C for 15 min, the reactions were subjected to 11 cycles of 95°C for 30 s, 60°C for 45 s and 72°C for 90 s, with the annealing temperature decreased by 0.7°C in each cycle. The reactions were then subjected to 29 cycles of 95°C for 30 s, 53°C for 45 s and 72°C for 90 s. For the amplification of Vrn-H1 and HvCbf6, after an initial denaturation (95°C for 15 min), the thermal protocol was 35 cycles of 95°C for 30 s, 56°C for 45 s and 72°C for 90 s. A final elongation step of 10 min was applied for all amplifications. Gapdh and α-Amy1 were amplified following the published protocol (Kilian et al. 2006).
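The touchdown phase described above can be checked with a few lines of arithmetic. The sketch below lists the annealing temperature of every cycle, assuming that the first touchdown cycle anneals at 60°C and that the 0.7°C decrement is applied from the second cycle onwards (this convention is an assumption; the function name `touchdown_schedule` is illustrative). Under that reading, the eleventh cycle anneals at 53°C, which is the temperature held for the remaining 29 cycles.

```python
def touchdown_schedule(start_c=60.0, step_c=0.7, touchdown_cycles=11,
                       plateau_c=53.0, plateau_cycles=29):
    """Return the annealing temperature (in °C) of each PCR cycle."""
    # Touchdown phase: assumed to start at start_c and to drop by
    # step_c in every subsequent cycle.
    touchdown = [round(start_c - step_c * i, 1) for i in range(touchdown_cycles)]
    # Plateau phase: constant annealing temperature.
    plateau = [plateau_c] * plateau_cycles
    return touchdown + plateau


schedule = touchdown_schedule()
print(schedule[:11])   # annealing temperatures of the touchdown cycles
print(len(schedule))   # total number of cycles
```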

PCR products were sequenced on both strands using the ABI PRISM BigDye Primer Cycle sequencing kit v.3.1 (Applied Biosystems, Foster City, USA), following the manufacturer’s instructions. Purified sequencing reactions were run on an ABI3730 DNA Analyzer (Applied Biosystems, Foster City, USA). Raw electropherograms were analysed using Sequencing Analysis® software (Applied Biosystems, Foster City, USA) to obtain FASTA sequences. Subsequently, for each gene a consensus sequence was created for each accession by assembling forward and reverse sequences with the Contig Assembly Program (CAP) (Huang 1992). The consensus sequences of each gene were aligned with the web-based MultAlin software (http://bioinfo.genopole-toulouse.prd.fr/multalin/multalin.html) (Corpet 1988) and each FASTA alignment was used as input file for DnaSP 4.10 (Rozas et al. 2003) for SNP detection and for all other analyses. Indel polymorphisms were included for inferring Vrn-H1 haplotypes, while for the other genes indels were not included.

Fluorescent AFLP reactions

AFLP reactions were conducted with fluorescent detection following the published protocol (Vos et al. 1995) with some modifications. Digestion, ligation and pre-amplification reactions were conducted as in the original protocol, while for selective fluorescent amplification, custom forward primers, 5′-labelled with 6-FAM (Applied Biosystems, Foster City, USA), were used instead of radioactively labelled primers. Selective amplification products were diluted tenfold and 2 μl of each dilution was mixed with 10 μl of deionised formamide and 0.15 μl of GeneScan 1200-LIZ internal size standard (Applied Biosystems, Foster City, USA). Raw data were analysed using GeneMapper® 4.0 software (Applied Biosystems, Foster City, USA) to obtain genotyping tables containing only polymorphic peaks. Genotyping tables were then exported as tab-delimited files and appropriately formatted in Microsoft Excel® (Redmond, USA) to conduct phylogenetic and statistical analyses.

Data analyses

The degree of genetic polymorphism of each gene was investigated using two estimators, nucleotide diversity (π) and haplotype diversity (Hd). The estimators were calculated using the DnaSP software with Nei’s equation 10.5 and Nei’s unbiased equation, respectively (Nei 1987). Analysis of π and Hd was initially conducted for each group of accessions and, subsequently, considering the population structure.
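As an illustration of the two estimators, the following minimal sketch computes π as the average number of pairwise differences per site and Hd with Nei’s unbiased formula, Hd = n/(n − 1) × (1 − Σ p_i²). It assumes aligned, ungapped sequences of equal length; the toy sequences and function names are hypothetical, and the code is not a reproduction of the DnaSP implementation.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise nucleotide differences per site (pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


def haplotype_diversity(seqs):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    counts = Counter(seqs)  # each distinct sequence is one haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical alignment of four accessions (not data from this study).
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(nucleotide_diversity(toy))
print(haplotype_diversity(toy))
```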

To investigate the phylogenetic relationships among accessions, binary data obtained after scoring peaks in the AFLP analysis were converted to NEXUS format and analysed with the SplitsTree4 software (Huson and Bryant 2006). Trees were constructed with the neighbour-joining method using either Hamming’s genetic distance or the Uncorrected-P genetic distance.
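For binary AFLP profiles, a Hamming-type distance is simply the share of peaks scored differently in two accessions. The sketch below assumes a normalised distance (proportion of differing peaks) and complete data without missing scores; the accession names and profiles are hypothetical, and the exact definition used by SplitsTree4 may differ in how missing data are handled.

```python
def hamming_distance(profile_a, profile_b):
    """Proportion of AFLP peaks scored differently in two accessions."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of peaks")
    differing = sum(x != y for x, y in zip(profile_a, profile_b))
    return differing / len(profile_a)


def distance_matrix(profiles):
    """Pairwise distances for a dict mapping accession name -> 0/1 profile."""
    names = list(profiles)
    return {
        (p, q): hamming_distance(profiles[p], profiles[q])
        for p in names
        for q in names
    }


# Hypothetical presence/absence scores for five AFLP peaks.
profiles = {
    "acc1": [1, 0, 1, 1, 0],
    "acc2": [1, 1, 1, 0, 0],
    "acc3": [0, 0, 1, 1, 0],
}
matrix = distance_matrix(profiles)
print(matrix[("acc1", "acc2")])
```

A matrix of this kind is the input that a neighbour-joining algorithm would then turn into a tree.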

In order to test whether the selected genes deviate from neutrality (Kimura 1983), Tajima’s D test (Tajima 1989) was applied, accounting for population structure. Tajima’s D values were calculated with DnaSP and, when significant, the test was repeated within domesticated (D) and wild (W) accessions.
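For reference, Tajima’s D compares the mean number of pairwise differences (k) with the number of segregating sites (S) scaled by a1 = Σ 1/i. The sketch below follows the standard formulae of Tajima (1989) for aligned, ungapped sequences; it is an illustration rather than the DnaSP implementation, it does not compute significance, and the toy sequences are hypothetical. It returns `None` when D is undefined (no segregating sites or fewer than four sequences).

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    # Number of segregating (polymorphic) sites.
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if n < 4 or s == 0:
        return None  # D is undefined for monomorphic or very small samples
    pairs = list(combinations(seqs, 2))
    # Mean number of pairwise differences (not divided by sequence length).
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Excess of rare variants (each mutation seen once) gives a negative D.
rare = ["AAAAA", "TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT"]
# Two haplotypes at intermediate frequency give a positive D.
balanced = ["AAAA"] * 3 + ["TTTT"] * 3
print(tajimas_d(rare), tajimas_d(balanced))
```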

Linkage disequilibrium (LD) was estimated as the squared allele-frequency correlation (r²) using the software package TASSEL 2.01 with 1,000 permutations (Bradbury et al. 2007). The significance (P value) of LD for all possible pairs of sites was also evaluated in TASSEL with the rapid permutation test. Since r² has a large variance with rare alleles, only sites with a minor allele frequency (MAF) >0.1 were used for the pairwise analysis of LD. To calculate the threshold beyond which LD can be attributed to genetic linkage, the square root of each pairwise r² between nucleotide variants detected in the CBFs and in three physically unlinked genes (α-Amy1, Gapdh and Vrn-H1) was calculated. The 95th percentile of this approximately normal distribution was taken as the r² threshold to declare LD among molecular markers (Breseghello and Sorrells 2006). Undetected amplicons of HvCbf3 and HvCbf9 were modelled as deletions of these genes in the respective genomes, so that they could be tested for association with other molecular markers in the CBF genes.
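The r² statistic and the threshold procedure described above can be sketched as follows. The code assumes bi-allelic sites coded 0/1 in homozygous (inbred) lines, sites already filtered for MAF, an empirical nearest-rank percentile, and a threshold converted back to the r² scale by squaring; these choices are assumptions made for illustration and may differ from the parametric procedure of Breseghello and Sorrells (2006) and from TASSEL’s implementation.

```python
def r_squared(site_a, site_b):
    """r^2 between two bi-allelic sites coded 0/1 across inbred accessions."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    # Frequency of the haplotype carrying allele 1 at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    # Monomorphic sites would give a zero denominator; they are assumed
    # to have been removed by the MAF filter.
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def ld_threshold(unlinked_r2, percentile=95):
    """Percentile of sqrt(r^2) among unlinked pairs, squared back to r^2."""
    roots = sorted(r ** 0.5 for r in unlinked_r2)
    # Nearest-rank percentile using integer ceiling division.
    rank = (percentile * len(roots) + 99) // 100
    return roots[rank - 1] ** 2


print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))   # complete disequilibrium
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))   # independent sites
```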

Phenotypic analysis of frost tolerance

One hundred and thirteen genotypes of European cultivars and locally adapted accessions were selected on the basis of their different levels of frost tolerance, as assessed in repeated experiments conducted at Fiorenzuola during the past decade (Rizza et al. 2003). Frost tolerance was evaluated under controlled conditions (growth chamber) on plants at the first leaf stage, cold acclimated for 4 weeks (3°C, 8 h light and 1°C, 16 h dark) and subjected to a freezing treatment according to Rizza et al. (2001). The freezing temperatures ranged between −10 and −14°C on the basis of previous experience. A freezing stress at −10°C discriminated among susceptible genotypes, −12°C discriminated between susceptible and resistant cultivars, while −14°C highlighted the behaviour of the very resistant genotypes. Three barley cultivars previously described in the literature and representative of different frost tolerance levels were included as controls in all experiments: Tremois (spring, frost susceptible; Francia et al. 2004), Nure (winter, frost resistant; Francia et al. 2004) and Pamina (facultative, very frost resistant; Rizza et al. 2003). The frost-induced damage was measured in leaves as a decrease in the maximum quantum efficiency of photosystem II (PSII) photochemistry, using the chlorophyll fluorescence parameter Fv/Fm, which is the ratio of variable (Fv) to maximal (Fm) fluorescence in the dark-adapted state. On the basis of the results of at least three independent experiments, the genotypes were classified into five classes according to their level of frost tolerance: I, very good (not significantly different from Pamina); II, good (not significantly different from Nure); III, middle (less resistant than Nure, more resistant than Tremois); IV, low (not significantly different from Tremois); V, very low (more susceptible than Tremois).
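The fluorescence parameter used as damage indicator can be written out explicitly. In the standard definition, variable fluorescence is Fv = Fm − F0, where F0 is the minimal fluorescence of the dark-adapted leaf, so that Fv/Fm = (Fm − F0)/Fm. The short sketch below uses hypothetical instrument readings, not measurements from this study.

```python
def fv_fm(f0, fm):
    """Maximum quantum efficiency of PSII: (Fm - F0) / Fm."""
    if fm <= 0 or f0 < 0 or f0 > fm:
        raise ValueError("expected 0 <= F0 <= Fm and Fm > 0")
    return (fm - f0) / fm


# Hypothetical readings in arbitrary fluorescence units.
print(fv_fm(f0=300, fm=1500))   # undamaged-looking leaf
print(fv_fm(f0=450, fm=900))    # lower ratio, as expected after frost damage
```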

Association analysis

The five FT groups served as the dependent variable for an association analysis to determine the effects of polymorphisms in CBFs and Vrn-H1. A structured association based on the least-squares general linear model (GLM) procedure was implemented in SAS® (Cary, NC, USA), modelling each polymorphism in the five candidate genes (CBFs and Vrn-H1) as a single factor consisting of two classes, one for each of the two SNP variants observed. Only SNPs with a minor allele frequency >0.10 were considered. The same approach was used to include the genetic variants detected in the reference genes (α-Amy1 and Gapdh), in order to verify whether spurious statistical associations occurred with genes not involved in FT.

To account for population structure, principal component analysis was computed with NTSYSpc® (Exeter Software, New York, USA) on the AFLP matrix containing the polymorphic loci scored in the 113 European cultivars and locally adapted accessions (Table S1). The first three principal components were included as regression variables in the association analysis. In addition, growth habit and ear type were also considered in the analysis.

The approach taken was to start with a saturated model including all factors, and then to progressively eliminate the least significant factor, finally arriving at a model with only plausibly important effects.

The initial, “saturated” model including selected nucleotide variants, plus additional explanatory factors, was

$$ \begin{aligned} Y_{\text{ijklmnopq}} = {} & \mu + {\text{PC1}} + {\text{PC2}} + {\text{PC3}} + {\text{HABIT}}_{\text{i}} + {\text{EAR TYPE}}_{\text{j}} + \left( {{\text{GAPDH}}_{\text{k1}} + \cdots + {\text{GAPDH}}_{\text{kNk}} } \right) \\ & + \left( {\alpha {\text{AMYL}}_{\text{l1}} + \cdots + \alpha {\text{AMYL}}_{\text{lNl}} } \right) + \left( {{\text{VRNH1}}_{\text{m1}} + \cdots + {\text{VRNH1}}_{\text{mNm}} } \right) + \left( {{\text{HvCBF3}}_{\text{n1}} + \cdots + {\text{HvCBF3}}_{\text{nNn}} } \right) \\ & + \left( {{\text{HvCBF6}}_{\text{o1}} + \cdots + {\text{HvCBF6}}_{\text{oNo}} } \right) + \left( {{\text{HvCBF9}}_{\text{p1}} + \cdots + {\text{HvCBF9}}_{\text{pNp}} } \right) + \left( {{\text{HvCBF14}}_{\text{q1}} + \cdots + {\text{HvCBF14}}_{\text{qNq}} } \right) + {\text{e}}_{\text{ijklmnopq}} \\ \end{aligned} $$

where Y is the FT score (I–V); μ is the population mean; PC1 to PC3 are the linear regressions on the respective principal components accounting for population stratification; HABIT_i is the fixed effect associated with growth habit (winter, spring or alternative); EAR TYPE_j is the fixed effect corresponding to whether the barley was classified as a two- or six-row variety; (GAPDH_k1 + … + GAPDH_kNk) is the sum of the fixed effects of the Nk polymorphisms at the Gapdh reference gene; (αAMYL_l1 + … + αAMYL_lNl) is the sum of the fixed effects of the Nl polymorphisms at the α-Amy1 reference gene; (VRNH1_m1 + … + VRNH1_mNm) is the sum of the fixed effects of the Nm polymorphisms in the Vrn-H1 gene; (HvCBF3_n1 + … + HvCBF3_nNn), …, (HvCBF14_q1 + … + HvCBF14_qNq) are the sums of the fixed effects of the polymorphisms at each of the four CBF genes; and e_ijklmnopq is the random residual.

Backwards stepwise regression was used to eliminate non-significant effects: models with progressively fewer effects were fitted after removing, at each step, the factor with the lowest level of significance (i.e., the greatest P value). All factors with a P value below 0.40 were maintained in the final model. Inclusion of all significant marker loci in the models precludes the need for further multiple testing correction.
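The elimination loop described above can be sketched independently of the statistical package. In the example below, `p_values_for` is a hypothetical callback that stands in for refitting the GLM (done in SAS in this study) and returns one P value per factor still in the model; the factor names follow the text, while the P values are invented for illustration. In a real analysis the P values change at every refit, which the fixed stub used here does not reproduce.

```python
def backward_eliminate(factors, p_values_for, threshold=0.40):
    """Drop the least significant factor until all P values are below threshold.

    p_values_for(current_factors) must return a dict {factor: P value}
    obtained by refitting the model with the current factors.
    """
    current = list(factors)
    while current:
        p_values = p_values_for(current)
        worst = max(current, key=lambda f: p_values[f])
        if p_values[worst] < threshold:
            break  # every remaining factor meets the retention criterion
        current.remove(worst)
    return current


# Hypothetical, fixed P values used only to exercise the loop.
fixed_p = {"PC1": 0.01, "HABIT": 0.03, "Gapdh_1": 0.85,
           "Cbf14_7": 0.002, "Cbf3_4": 0.55}


def stub_fit(current_factors):
    """Stand-in for a model refit: looks up the invented P values."""
    return {f: fixed_p[f] for f in current_factors}


print(backward_eliminate(list(fixed_p), stub_fit))
```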

Results

CBF genes have a narrow gene pool in widely cultivated European barley

A representative subset of European cultivars, landraces and wild accessions of barley was investigated to detect genetic variants in five genes involved in FT plus two reference genes. The sequences of the amplicons were analysed to detect SNPs as well as to calculate π, Hd and the number of haplotypes per gene (Table 1).

Table 1 Nucleotide and haplotype diversity recorded in four CBF genes, in Vrn-H1 and in reference genes Gapdh and α-Amy1

A total of 3,974 bp of genomic DNA was sequenced from each accession for candidate and reference genes, and the average number of substitutions was calculated for each group of accessions (Table 1). In H. spontaneum one SNP every 79 bp was found, while in landraces and in cultivars the corresponding values were one SNP every 107 bp and one SNP every 128 bp, respectively. Heterozygous mutations were not detected and all SNPs were bi-allelic. A summary of the overall genetic variants detected in CBF and reference genes is reported in Table S5. All singletons (polymorphic sites present only once in the germplasm analysed) occurred in landraces and in H. spontaneum. When the nucleotide variants of each gene were translated into haplotypes, the number of haplotypes varied depending on the locus, from 14 in Vrn-H1 to six in α-Amy1 (Table 1). All haplotypes detected in advanced and modern varieties were represented in H. spontaneum accessions. An exception was a particular haplotype of HvCbf14 found only in landraces and old cultivars, but not in H. spontaneum.

Three CBF genes, HvCbf3, HvCbf6 and HvCbf9, showed a reduction of allele diversity and of π in advanced and modern varieties compared to the other groups of accessions (Table 1). Compared with H. spontaneum, these genes underwent a loss of nucleotide polymorphism (1 − πd/πw) (Tenaillon et al. 2004) of 46, 24 and 89%, respectively. Haplotype analysis of HvCbf9 revealed an almost monomorphic sequence in European cultivars (Hd = 0.058), while in wild accessions the allelic gene pool was wider (Table 1). When the primer pair HvCBF3fw-HVCBF3rw was used to amplify HvCbf3, the amplification failed in 38% of modern and advanced cultivars, in 4% of landraces and old cultivars and in 20% of H. spontaneum accessions (Tables S1, S2, S3). An additional primer pair (HvCBF3fw2-HVCBF3rw2) with slightly different annealing positions was therefore designed to amplify the gene in the accessions where amplicons were undetected; nevertheless, no amplification products were obtained. Similarly, the primer pair HvCBF9fw-HVCBF9rw failed to amplify the gene in 12% of modern and advanced cultivars, 15% of landraces and old cultivars and 11% of H. spontaneum accessions (Tables S1, S2, S3). A further attempt with a second primer pair (HvCBF9fw2-HvCBF9rw2) did not give any amplification products in the same accessions.

HvCbf14 showed a wider allelic diversity in landraces and old cultivars than in H. spontaneum; nevertheless, in advanced and modern varieties the nucleotide and haplotype diversity were lower (Table 1). Compared to the other CBFs analysed, the loss of nucleotide diversity of HvCbf14 in modern and advanced cultivars was lower (21%). On average, the loss of nucleotide diversity of CBF genes in modern and advanced European cultivars compared to H. spontaneum was 45%. Vrn-H1 represents the best candidate gene underlying the Fr-H1 QTL for FT in barley (Hayes et al. 1993; Laurie et al. 1995; Stockinger et al. 2007). Fourteen Vrn-H1 haplotypes were found in the selected accessions, and a subset of 12 was detected in H. spontaneum. Unlike the other genes studied, the most recurrent genetic variants detected in Vrn-H1 were indel polymorphisms (Table S5). In advanced and modern varieties, Vrn-H1 showed a reduction of Hd and π compared to the other groups of accessions (Table 1).
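The loss of nucleotide diversity quoted for the CBF genes is the quantity 1 − πd/πw, where πd and πw are the nucleotide diversities in domesticated and wild accessions. The sketch below shows the calculation with hypothetical π values (the actual values are in Table 1) and checks that the four reported losses of 46, 24, 89 and 21% average to the 45% given in the text.

```python
def diversity_loss(pi_domesticated, pi_wild):
    """Fraction of nucleotide diversity lost: 1 - pi_d / pi_w."""
    if pi_wild <= 0:
        raise ValueError("pi_wild must be positive")
    return 1 - pi_domesticated / pi_wild


# Hypothetical pi values chosen only to illustrate an 89% loss.
print(diversity_loss(pi_domesticated=0.0011, pi_wild=0.0100))

# Losses reported in the text for HvCbf3, HvCbf6, HvCbf9 and HvCbf14 (%).
reported_losses = [46, 24, 89, 21]
mean_loss = sum(reported_losses) / len(reported_losses)
print(mean_loss)
```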

Two-hundred-fifteen AFLP polymorphic peaks obtained with six primer combinations (E32M49, E32M55, E36M49, E36M55, E38M55 and E41M55) were used to infer the population structure of the genetic material studied. The tree obtained with AFLP data, based on Hamming’s genetic distance and neighbour-joining method showed 4 NJ-clades (Fig. 1). Other trees based on different genetic distances showed very similar topologies (data not shown). Accessions were split on the basis of their growth habit: in NJ-clades I and IV predominant accessions were spring, while in NJ-clades II and III predominant accessions were winter types (Fig. 1). Two subgroups of H. spontaneum clustered with winter and spring accessions in NJ-clade II and IV, respectively.

Fig. 1

Clustering of selected accessions using the neighbour-joining method and Hamming’s genetic distance. The tree was constructed on the basis of 215 AFLP polymorphisms obtained with six primer combinations. Red circles indicate H. spontaneum, while spring and winter accessions are shown in violet and dark blue, respectively. The four nodes from which the NJ-clades considered depart are indicated with black squares

Significant Tajima’s D values in non-structured populations can be consistent with a molecular selection hypothesis (Tajima 1989). To evaluate if the narrowness of allelic gene pool of CBFs and reference genes in advanced and modern cultivars of barley was an effect of selection, population structure was introduced and Tajima’s test was applied within each NJ-clade. Among the four CBF genes considered, a significant and negative D value was found for HvCbf9 in NJ-clade IV. Other CBF genes and Vrn-H1 did not deviate significantly from the neutral hypothesis (Table 2). As previously reported, the two reference genes deviated from neutrality in some NJ-clades (Kilian et al. 2006). Gapdh showed significant and positive D values for NJ-clades I, III and IV and these cases of significance could be consistent with balancing selection hypothesis (Wright and Gaut 2004) while for α-Amy1 Tajima’s D was significant and negative in NJ-clade IV (Table 2).

Table 2 Nucleotide diversity, haplotype diversity and Tajima’s D values recorded at seven barley genes considering genotypes assigned to four populations corresponding to the four major clades of the NJ-tree

To gain further insight, for all significant cases Tajima’s D values were re-computed to assess whether significant D values occurred when domesticated lines (D) and H. spontaneum (W) accessions were considered separately. The sequences of HvCbf9 within domesticated lines (D) in NJ-clade IV were monomorphic and Tajima’s D could not be calculated, while within wild lines Tajima’s D was significant (Table 3). These wild lines were phenotyped for FT, revealing a heterogeneous group of accessions with different levels of tolerance to freezing. By contrast, in NJ-clades III and IV the D test was not significant for α-Amy1 in wild lines, and in domesticated lines only one haplotype was represented. Nevertheless, when domesticated and wild accessions were considered together, D values were negative and significant, suggesting that α-Amy1 may have undergone selection.

Table 3 Significant Tajima’s D values of Table 2 calculated separately for domesticated (D) lines, H. spontaneum (W) and D + W groups

The pattern of polymorphisms of HvCbf14 and Vrn-H1 genes is statistically associated with freezing tolerance

The nucleotide variants detected in candidate and reference genes in European cultivars were screened to identify those with a MAF >0.1. Two SNPs of HvCbf3, nine SNPs of HvCbf6 and four SNPs of HvCbf14 were used to evaluate the extent of LD in cultivated accessions within the CBF cluster (Table S5). Undetected amplicons of HvCbf3 and HvCbf9 were modelled as deletions in their respective genomes and treated as polymorphisms in the LD calculation and in the association analysis.

The LD threshold above which two sites were declared to be in disequilibrium was fixed at r² = 0.32 on the basis of the 95th percentile of the distribution of the square roots of the r² values (Breseghello and Sorrells 2006). The four CBF genes at the Fr-H2 locus form a co-segregating group, reported to span 0.8 cM in barley genetic maps: within this distance, high levels of LD are expected in barley cultivars on the basis of previous publications (Rostoks et al. 2006). Considering all modern and advanced European cultivars, the pairwise LD plot showed, as expected, blocks of significant associations at the intra-genic level, with polymorphisms in HvCbf6 almost all in complete disequilibrium (r² = 1) with each other (Fig. 2). On the contrary, considering inter-genic polymorphisms, pairwise analysis of r² showed a low level of LD among physically linked CBF genes: almost none of these pairwise comparisons reached a value of r² > 0.1. The maximum levels of inter-genic pairwise LD were found between Cbf3_4 and nearly all SNPs of HvCbf6 and between Cbf3_4 and Cbf14_3 (Fig. 2). Since population structure may lead to an overestimate of the real extent of LD (Rostoks et al. 2006), pairwise analysis of LD was recomputed within each NJ-clade; nevertheless, the resulting plots maintained the same decay in each NJ-clade (data not shown). To provide further evidence of the low extent of LD among genetic variants in the selected CBFs, an analysis of the number of haplotype blocks was conducted. As these genes are linked in a narrow region of chromosome 5H, haplotype blocks are expected to co-segregate in cultivated accessions. When each gene was considered by itself, a reduction of the number of haplotypes in cultivars was observed, while considering the combinations of haplotypes of the four CBFs, 27 combinations were identified in 113 cultivated accessions and 31 combinations were found in 36 wild accessions. Considering all the polymorphisms that make up the haplotype blocks detected at the CBF locus, Hd values of 0.88 and 0.98 were observed in cultivars and in H. spontaneum, respectively. In other words, the absence of recurrent haplotype blocks in cultivars at the CBF locus supports the rapid decay of LD at this locus, despite the global trend of chromosome 5H, where linkage disequilibrium extends over wider genomic distances (Rostoks et al. 2006).

Fig. 2

Linkage disequilibrium at CBF genes in modern and advanced cultivars. Each point in the LD matrix represents a comparison between a pair of polymorphic sites, with the r² values displayed above the diagonal and the P values for Fisher’s exact test below. The names of the polymorphic sites and the colour codes for r² and P values are also reported. The LD threshold above which two sites were declared to be in disequilibrium was fixed at r² = 0.20 on the basis of the evaluation of LD among unlinked molecular markers (Breseghello and Sorrells 2006)

To gain insight into FT variability in European barley cultivars, plants were phenotyped using a non-disruptive method based on chlorophyll fluorescence analysis (Rizza et al. 2001). On the basis of the phenotypic data, the genotypes were split into five discrete classes of frost tolerance, from I (highly tolerant) to V (poorly tolerant); the data are shown in Table S1. To test the hypothesis of a statistical association between genetic variants at CBF genes and FT classes, a structured association analysis based on the GLM procedure was conducted, including in the model each of the first three principal components, which explain 23, 16 and 6% of the variation, respectively. As expected, no statistical relationship between nucleotide variants of the reference genes and FT was found (Table 4). With the GLM test statistic, the SNPs Cbf14_10 and Cbf14_7 of HvCbf14, both located in the 3′UTR, and the nucleotide variant VRN_1 of Vrn-H1 were strongly associated with the trait (Table 4). All three nucleotide variants were significant with a P value < 0.005, and the two polymorphisms of HvCbf14 showed similar effects on the trait. One nucleotide variant of HvCbf6 showed a P value slightly below 0.005. Like the reference genes, the other CBF genes considered in this study did not contribute significantly to FT under the statistical test implemented.
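
The logic of a structured association test can be sketched in plain Python as a comparison of two nested linear models: one with an intercept and the principal components only, and one that also includes the marker. The sketch below is illustrative only. It assumes that the FT classes are coded numerically and that the marker is coded 0/1; the function names and toy values are hypothetical, and the code returns an F statistic rather than a P value.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def marker_f_statistic(trait, marker, pcs):
    """F statistic (1 numerator df) for adding a marker to intercept + PCs."""
    reduced = [[1.0] + [float(v) for v in pc] for pc in pcs]
    full = [row + [float(g)] for row, g in zip(reduced, marker)]
    rss_reduced, rss_full = rss(reduced, trait), rss(full, trait)
    df_resid = len(trait) - len(full[0])
    return (rss_reduced - rss_full) / (rss_full / df_resid)

# Toy data only: eight accessions, one principal component each.
trait = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
pcs = [[-1], [1], [-1], [1], [-1], [1], [-1], [1]]
f_linked = marker_f_statistic(trait, [0, 0, 0, 0, 1, 1, 1, 1], pcs)
f_unlinked = marker_f_statistic(trait, [0, 1, 1, 0, 0, 1, 1, 0], pcs)
```

A marker that tracks the trait after the principal components have absorbed the population structure gives a large F statistic (`f_linked`), whereas a marker unrelated to the trait gives a value close to zero (`f_unlinked`).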

Table 4 List of the seven fixed effects and their level of significance retained in the final model after the backwards stepwise regression analysis

Discussion

The cluster of CBF genes located at the peak of the FT QTL Fr-H2 is relevant to the regulation of freezing tolerance in barley and in diploid wheat (Francia et al. 2004; Vágújfalvi et al. 2005; Båga et al. 2007). In recent years, the CBF gene clusters of a few barley accessions have been the subject of intense research. While several studies have considered the gene pool structure of these transcription factors in different barley populations (Francia et al. 2007; Stockinger et al. 2007), this study is the first systematic report on the sets of CBF alleles in a large collection of barley accessions.

Genetic variants of several CBF genes co-segregating with an FT QTL were searched for in a large panel of barley, including wild accessions and landraces, to assess whether the signature of selection is molecularly and functionally evident. The CBF gene family has expanded in the barley and wheat genomes compared with other related species of temperate grasses (Badawi et al. 2007). CBF sequences of monocots cluster in several monophyletic clades, some of which contain only CBFs of temperate grasses, indicating that their common ancestors appeared before speciation and during the colonisation of temperate habitats (Badawi et al. 2007). As freezing tolerance is characteristic of temperate plants, the key CBF genes that underlie this trait in barley are expected to be specific to, and highly conserved in, the Triticeae tribe and the less closely related temperate species of the family. Following this principle, only four CBF genes were considered: three (HvCbf3, HvCbf9 and HvCbf14) have homologs currently found in temperate grasses only, and one (HvCbf6) belongs to a CBF monophyletic clade deriving from a more ancient ancestral gene; HvCbf6 homologs have indeed been found in both temperate and tropical grasses (Badawi et al. 2007).

It is generally agreed that crop plants contain less genetic variation than their wild ancestors (Doebley et al. 2006). Domestication in barley, as in other crops, acted upon some key loci, selecting favourable alleles at genes that control agronomic traits relevant for domestication (Salamini et al. 2002). In addition, domestication processes may involve genetic bottlenecks caused by the repeated reproduction of superior individuals (Doebley et al. 2006). As a result, neutral loci of domesticated populations can show a reduction of genetic variation compared with wild ancestor populations. When the levels of nucleotide and haplotype diversity are compared between cultivated varieties and wild accessions (Table 1), a loss of genetic variants is evident. The analysis of the CBF gene pools in the cultivated European germplasm reveals that sequences of HvCbf9 are almost monomorphic in cultivated varieties. As several oligonucleotides targeting the coding sequence failed to detect HvCbf9 in some lines, we hypothesise that this gene is absent in some barley accessions. A similar situation holds for HvCbf3, which was not detected in one-third of European cultivars. Our results raise the question of the structure of the CBF cluster in barley. Large deletions and a different organisation of orthologous genomic regions in unrelated accessions of barley have already been demonstrated, and BAC comparison of two barley accessions shows that some CBFs are absent in specific varieties (Francia et al. 2007; Skinner et al. 2005), indicating that a pan-cluster of CBFs with some dispensable genes evolves and segregates in barley.

A structured population, as well as variations in population size and selection, might explain deviation from the neutral hypothesis. HvCbf9 and α-Amy1 show a strong reduction of nucleotide diversity in cultivated accessions and deviate significantly from the neutral hypothesis. To bypass possible effects of population structure and to find evidence of selection for HvCbf9, Tajima’s D values were re-calculated within NJ-clades; a significant D value was evident within NJ-clade IV. When Tajima’s test was applied only to wild accessions, the D value was again significant (Table 3). Most probably, the excess of rare variants of HvCbf9 in the wild gene pool affects the test, which supports a rapid population expansion rather than natural selection targeted at the gene. However, the second hypothesis was also considered, and NJ-clade IV lines were phenotyped (Table S3). Unexpectedly, they were characterised by a large variation in FT, but genotypes with rare HvCbf9 variants did not show a high level of tolerance; thus, most probably, this gene does not affect the trait. As selection targeted at individual loci reduces genetic diversity within and around the selected loci (a phenomenon known as genetic hitchhiking; Smith and Haigh 1974; Palaisa et al. 2004), the strong reduction of HvCbf9 alleles in cultivated varieties could have been generated by selection targeted at tightly linked CBF genes.
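
To make the link between rare variants and Tajima’s D concrete, the following Python sketch computes D from a set of aligned sequences using the standard formulation of the statistic. It is a simplified illustration that assumes gap-free alignments of equal length; the function name and toy sequences are hypothetical and do not come from the study.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for a list of aligned, gap-free sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # S: number of segregating (polymorphic) sites.
    seg = sum(len({s[i] for s in seqs}) > 1 for i in range(length))
    if seg == 0:
        raise ValueError("no segregating sites")
    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))

# Toy alignments only.
d_rare = tajimas_d(["AAAA", "AAAA", "AAAA", "AAAA", "TAAA", "ATAA"])
d_balanced = tajimas_d(["AA", "AA", "AA", "TT", "TT", "TT"])
```

In the first toy alignment every variant is a singleton, so D is negative, the pattern expected under an excess of rare variants; in the second, variants occur at intermediate frequency and D is positive.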

It is generally assumed that breeding programmes act simultaneously upon several loci that control a variety of traits and that, similarly to domestication bottlenecks, they contribute to a drastic reduction of genetic diversity in modern crops. As one of the most important crops in the world, barley has been the target of several breeding programmes. The data presented here demonstrate that the genetic erosion of the barley gene pool caused by intense breeding is continuing and increases progressively with time (Fig. 3).

Fig. 3

Haplotype diversity (Hd) of five genes involved in FT (HvCbf3, HvCbf6, HvCbf9, HvCbf14 and Vrn-H1) plus two reference genes (α-Amy1 and Gapdh) recorded in two subgroups of European cultivars released before 1980 (a) and after 1980 (b), respectively. More recent cultivars showed a loss of Hd for almost all genes
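
For readers unfamiliar with the statistic, haplotype diversity can be computed as in the short Python sketch below. It assumes that Hd follows Nei’s standard definition, Hd = n/(n − 1) × (1 − Σ pᵢ²), where pᵢ is the frequency of the i-th haplotype in a sample of n accessions; the function name and the toy haplotype labels are hypothetical.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity for a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)

# Toy samples only (hypothetical haplotype labels).
hd_diverse = haplotype_diversity(["h1", "h2", "h3", "h4"])
hd_eroded = haplotype_diversity(["h1", "h1", "h1", "h2"])
```

A sample in which every accession carries a different haplotype gives Hd = 1, while a sample dominated by one haplotype gives a lower value (0.5 in the second toy sample), which is the kind of loss the figure describes for recent cultivars.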

In a change of approach from previous CBF analyses and QTL genetic dissections in plants (Francia et al. 2007; Knox et al. 2008), association mapping was applied here for the first time to a panel of barley varieties to reveal significant associations of genetic variants detected within the candidate CBF genes. Barley, like other temperate grasses, shows a broad range of phenotypic variation for freezing tolerance, and the phenotyping of selected germplasm for this complex trait provides a useful tool to study candidate genes involved in the freezing response. On the basis of the phenotypic data, the selected European cultivars were sorted into five discrete classes, each of which identifies a group of accessions with comparable tolerance to freezing stress.

In the map generated from the cross between the varieties Morex and Dicktoo, HvCbf14, HvCbf6 and HvCbf3 co-segregate, while HvCbf9 is located in a separate but genetically linked cluster of CBF genes (Skinner et al. 2005). At the physical level, HvCbf6 and HvCbf3, but not HvCbf14, have been identified in a single BAC. The physical association between HvCbf6 and HvCbf3 is reflected in levels of LD above the significance threshold (Fig. 2). On the basis of these data, the four CBFs considered in this study should span the length of the CBF gene cluster. Subtle genetic differences among unrelated lines allow the association mapping approach to be used to search for genes that underlie one or more traits. The resolution of an association mapping procedure is affected by the level of linkage disequilibrium present in the region that contains the candidate genes (Rafalski 2002). On the basis of r² values, we report for the first time the levels of pairwise association among CBF variants at the Fr-H2 locus. The data demonstrate the low levels of r² among CBF genes of European cultivars and show that LD decay allows sufficient mapping resolution to locate the genetic determinant of freezing tolerance within the CBF cluster. In rice and in other plant species, homogenization events internal to the CBF gene cluster seem to be a key driving force guiding CBF evolution via gene conversion and crossing over (Wang et al. 2005; Pennycooke et al. 2008). The same mechanisms may contribute to the erosion of linkage disequilibrium as well as to gene deletions; similar events may have played a role in the evolution of barley CBF genes.

In a collection of barley cultivars, different levels of genetic similarity correlate with specific alleles of loci involved in the control of growth habit (Fig. 1). This introduces a bias due to population stratification and phenotypic homogenization, which ultimately has a negative effect on the association procedure. To circumvent this effect, principal component data were included in a general linear model as covariates to account for population stratification in the association analysis. The association analysis between genetic variants of CBFs and freezing tolerance gave positive results for two genes and revealed that, more than other CBF variants, two nucleotide substitutions of HvCbf14 are statistically associated with freezing tolerance. Both polymorphisms fall in the 3′UTR of HvCbf14, and bioinformatic analysis indicates that this region is devoid of cis-elements affecting CBF gene expression. Interestingly, a number of freezing-tolerant wild lines from NJ-clade IV carry the same variants of HvCbf14, supporting the effectiveness of the statistical procedure. It is not, however, established that these polymorphisms are the causative variant(s) of FT; the coding sequence and cis-regulatory elements at the 5′ end of HvCbf14 are also good candidates, as are other physically linked genes (i.e. another gene in the CBF cluster that also affects FT and is in high LD with HvCbf14 owing to a common history of selection for FT) and/or regulatory elements located in this region of the cluster. A deletion in the first intron of the Vrn-H1 gene was identified as the functional genetic variant associated with spring growth habit in barley and diploid wheat and, consequently, able to affect frost tolerance in these species (Yan et al. 2004; Fu et al. 2005; von Zitzewitz et al. 2005; Szucs et al. 2007).
Although including the functional genetic variant of the Vrn-H1 gene as a fixed effect would have increased the power of the statistical model, other genetic variants in the 3′UTR of Vrn-H1 were instead searched for and included as fixed effects in the model, to verify whether they could reveal the well-known role of Vrn-H1 in frost tolerance. One of them, named VRN_1, was found to be associated with frost tolerance (Table 4), providing a further control of the effectiveness of the association procedure. On the other hand, this statistical association raises the question of the extent of LD at this gene, since in the barley population examined VRN_1 must be in high LD with the functional genetic variant previously discovered. Several studies have investigated the extent of LD in barley, and their conclusions support this speculation, as LD extends across genes and well beyond them (Rostoks et al. 2006); a remarkable exception is the relatively low level of LD within the CBF locus described in this report. The comparison of resistant and susceptible varieties of bread wheat (Cheyenne and Chinese Spring, respectively; Vágújfalvi et al. 2005) and of barley (Nure and Tremois, respectively; Stockinger et al. 2007) indicated the CBF14 gene as one of the CBF sequences whose expression is associated with frost tolerance. Dissection of the QTL for FT in T. monococcum points to the CBF sub-cluster region carrying the TmCBF12, TmCBF14 and TmCBF15 genes as the only one explaining the variation in frost tolerance between the two accessions compared (Knox et al. 2008). The comparison between our results and those from diploid wheat suggests that this conserved sub-cluster region is the most relevant for FT and highlights that, despite the high degree of homology among the CBF members, they have different physiological roles and/or relevance in trait control. In A. thaliana, evidence indicates that the genes AtCbf1, AtCbf2 and AtCbf3 have different physiological roles (Novillo et al. 2004, 2007). The results of our work, indicating that only HvCbf14 is associated with FT in a large germplasm collection, suggest that there is some degree of specificity among the different CBFs, and that HvCbf14, more than HvCbf3, HvCbf6 and HvCbf9, is relevant for FT. This finding makes it unlikely that FT is merely due to a high amount of any CBF mRNA; rather, the presence of specific CBF mRNAs appears to be required to promote a high level of cold hardening.