Introduction

Protein properties are the most important factors in determining cultivar characteristics, with differences in whole plant attributes better reflected by their proteomes than genomes or transcriptomes. Early two-dimensional gel electrophoresis (2-DE)-based work showed genetic variation in protein patterns between wheat lines (Zivy et al. 1983), demonstrating the potential of proteomics for assessment of genotype differences. Natural variation within species is increasingly recognised as an important genetic resource for plant breeding. Proteome analysis has been used to assess natural variation among potato genotypes (Lehesranta et al. 2005), Arabidopsis ecotypes (Chevalier et al. 2004; Ruebelt et al. 2006) and barley cultivars differing in malting quality (Görg et al. 1992a, b). Proteome variations have rarely been studied in combination with genetic analysis (Consoli et al. 2002). A pioneering study in an F2 population of maize demonstrated variations in intensities of protein spots on 2D gels that could be used to construct a genetic linkage map (Damerval et al. 1994) and identify Protein Quantity Loci (PQLs) that affected the intensity of protein spots. Quantitative protein spot variations in maize inbred lines were used to analyse their genetic variability (Burstin et al. 1994). Other genetic studies of storage protein spots have been reported using ditelocentric lines and di-parental mapping populations of wheat (Islam et al. 2002; Amiour et al. 2002). Segregation of protein spots on 2D gels in a doubled haploid population was used to find markers for anther culturability in barley (Devaux and Zivy 1994). These studies highlight the potential of protein variations as physiologically relevant genetic markers (De Vienne et al. 1996). In most cases, the molecular basis for such protein spot differences was not elucidated.

We combined proteomic and genetic analyses by identifying cultivar variations in barley seed proteins, as well as employing a doubled haploid population for initial mapping of 2-DE spot differences to chromosome locations. The cultivars were clustered independently on the basis of (a) seed protein spot variations and (b) simple sequence repeat (SSR) markers and correlated with malting quality. Mass spectrometry was used to identify the varying protein spots and to clarify the molecular basis for spot variations.

Experimental methods

Barley seeds and malt

Eighteen spring barley cultivars comprising 14 malting (Alexis, Annabell, Barke, Century, Chamant, Golden Promise, Harrington, Lux, Mentor, Morex, Optic, Saana, Scarlett and Sloop) and four feed cultivars (Meltan, Otira, Regatta and Salka) were included in the study. Cultivars were field grown in Fyn, Denmark under the supervision of Sejet Plantbreeding, Denmark. Seeds from 186 doubled haploid lines were kindly supplied by Sejet Plantbreeding, Sejet, Denmark. Mature seeds were micromalted, and the following parameters were analysed according to the European Brewing Convention (EBC) standards: extract, colour, soluble nitrogen, β-glucan, viscosity, friability, modification, homogeneity, diastatic power, α-amylase and β-amylase. The cultivars were scored on a weighted scale from 1 to 10 for each parameter (Supplementary Table S1) and ranked on the basis of their total score.

Protein extraction and 2D gel electrophoresis

Proteins soluble in low-salt buffer (5 mM Tris HCl pH 7.5, 1 mM CaCl2) were extracted from milled mature seeds at 4°C as previously described (Østergaard et al. 2002). Proteins in 100 or 200 μL extract (100 or 200 μg protein) for pI 4–7 or 6–11 gels, respectively, were precipitated with 4 volumes of acetone at −20°C for at least 24 h. Precipitated proteins were redissolved in the appropriate IPG rehydration buffer prior to the first dimension, as previously described (Østergaard et al. 2002; Bak-Jensen et al. 2004). For separation of proteins in the pI ranges 4–7 and 6–11, immobilised pH gradient 18 cm IPG strips were run on an IPGphor unit or a Multiphor II, respectively (GE-Healthcare), and the second dimension SDS-PAGE gels (12–14%, 18 × 24 cm, GE-Healthcare) were run on a Multiphor II as previously described (Østergaard et al. 2002; Bak-Jensen et al. 2004). Gels were stained with silver nitrate (Heukeshoven and Dernick 1985). For identification of proteins by mass spectrometry, preparative gels loaded with 250 μg protein were stained with colloidal Coomassie blue (Rabilloud and Charmont 2000). Protein profiles were highly similar and all analysed spots were detected both on silver- and Coomassie-stained gels.

Protein identification

Spots were cut from gels and subjected to in-gel trypsin digestion (Shevchenko et al. 1996) and peptides were desalted and concentrated (Gobom et al. 1999) as previously described (Finnie et al. 2002) for MALDI-time-of-flight (TOF) peptide mapping. A Bruker REFLEX MALDI-TOF mass spectrometer (Bruker-Daltonics, Bremen, Germany) in positive ion reflector mode was used to analyse tryptic peptides. In cases where identification was unsuccessful using peptide mass mapping, peptide fragmentation was carried out on an ABI 4700 MALDI MS/MS instrument (Applied Biosystems). The m/z software (Proteometrics, New York, USA) was used to analyse spectra. Spectra were calibrated using trypsin autolysis products (m/z 842.51 and m/z 2211.10) as internal standards. The Mascot server (http://www.matrixscience.com) was used for database searches using the following parameters for MALDI-TOF peptide mass data: monoisotopic mass accuracy 80 ppm, missed cleavages 1, allowed modifications carbamidomethylation of cysteine (complete) and oxidation of methionine (partial). For positive identifications, a significant Mascot score (p < 0.05) was required. If identification failed, the peptide mass data was queried against the NCBI EST database. Putative identifications were then validated by comparison of peptide mass lists with translated Tentative Consensus (TC) sequences in the Institute for Genome Research (TIGR) barley gene index (http://www.tigr.org/tdb/hvgi). Details of identifications made as part of this study are given in Supplementary Table S2.
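The internal calibration and the 80 ppm mass tolerance can be illustrated with a minimal sketch. This is not the Mascot or m/z implementation: the function names, the straight-line calibration model and the example peak values used with it are assumptions made for illustration only.

```python
# Trypsin autolysis reference masses quoted in the text (m/z).
TRYPSIN_REFS = (842.51, 2211.10)


def two_point_calibration(observed_refs, true_refs=TRYPSIN_REFS):
    """Return a function that maps observed m/z to calibrated m/z.

    Assumption: a straight line through the two reference peaks.
    Real TOF calibration routines generally use a more elaborate model.
    """
    (o1, o2), (t1, t2) = observed_refs, true_refs
    slope = (t2 - t1) / (o2 - o1)
    intercept = t1 - slope * o1
    return lambda mz: slope * mz + intercept


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_peaks(observed, theoretical, tol_ppm=80.0):
    """Pair each observed peak with every theoretical mass within tolerance."""
    return [(o, t) for o in observed for t in theoretical
            if abs(ppm_error(o, t)) <= tol_ppm]
```

At m/z 1700, a tolerance of 80 ppm corresponds to a window of about ±0.14 Da.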

Molecular marker analysis

DNA was extracted from seeds as described (Christiansen et al. 2006). Markers employed for mapping were SSR developed earlier (Becker and Heun 1995; Liu et al. 1996; Struss and Plieske 1998; Ramsay et al. 2000) and amplified fragment length polymorphism (AFLP). Polymerase chain reaction (PCR) using SSRs was performed using a Primus Multiblock from MWG Biotech in a total volume of 10 μL containing 20 ng genomic DNA, 1× PCR buffer, 0.1 U Immolase Taq polymerase (DNA Technology A/S), 0.7 pmol of each forward and reverse primer and 0.2 mmol dNTPs. PCR cycling conditions were as described (Ramsay et al. 2000). AFLP fragments with the enzyme combination MseI/PstI were produced using an established protocol (Vos et al. 1995) with minor modifications. Primers used for PCR amplification were fluorescently labelled, and the fragment length was analysed using either an ABI377 or an ABI310 (Applied Biosystems) automatic DNA sequencer (Christiansen et al. 2006; Schwarz et al. 2000).

Linkage map

In total, 147 AFLP and 56 SSR fragments were mapped in the Scarlett × Meltan doubled haploid population of 186 lines. The marker fragments were assigned to chromosomes based on clustering using two-point data and prior knowledge about localisation of the SSR bands. Within these groups, the GMENDEL computer software (Liu and Knapp 1990) was used to determine the marker order. An order was accepted if it was stable during a Monte-Carlo simulation with different starting points and using the different options for verification provided by GMENDEL.
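The two-point data underlying the clustering step can be sketched as follows. In a doubled haploid population each line derives from a single gamete, so the recombination fraction between two markers can be estimated as the proportion of lines carrying alleles from different parents. The sketch below is not GMENDEL: the allele coding (`S`, `M`, `-` for missing) and the use of the Kosambi mapping function are assumptions, since the text does not state which mapping function was applied.

```python
import math


def recombination_fraction(marker_a, marker_b, missing="-"):
    """Proportion of recombinant lines among lines scored for both markers."""
    scored = [(a, b) for a, b in zip(marker_a, marker_b)
              if a != missing and b != missing]
    if not scored:
        raise ValueError("no lines scored for both markers")
    recombinants = sum(1 for a, b in scored if a != b)
    return recombinants / len(scored)


def kosambi_cm(r):
    """Kosambi map distance in centimorgans for recombination fraction r."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))
```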

Cultivar comparisons and clustering

Fragments amplified by SSR primers were scored by their size in base pairs using the programme Genotyper® version 3.7 (Applied Biosystems). Protein spot variations were located by manual inspection of the 2D gels and confirmed to be reproducible in replicate gels and gels from two harvest years. Redundancy was removed from the spot variation matrix (Table 1) by excluding spots displaying identical variation across all cultivars. Dissimilarity matrices based on SSR or proteome variations were obtained by calculating a dissimilarity coefficient (Diwan and Cregan 1997) for each pair of cultivars. Dendrograms were produced by the application of dissimilarity matrices as described in Christiansen et al. (2002).
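The workflow from spot matrix to dendrogram can be sketched in three steps: removal of redundant spots, pairwise dissimilarity, and agglomerative clustering. The sketch makes two substitutions that should be read as assumptions: a simple matching dissimilarity (fraction of spots that differ) stands in for the Diwan and Cregan (1997) coefficient, and average-linkage clustering stands in for the procedure of Christiansen et al. (2002). The data layout (spot identifier mapped to a tuple of 0/1 values, one per cultivar) is also hypothetical.

```python
from itertools import combinations


def drop_redundant_spots(matrix):
    """Keep only the first spot for each distinct presence/absence pattern."""
    seen, kept = set(), {}
    for spot, pattern in matrix.items():
        pattern = tuple(pattern)
        if pattern not in seen:
            seen.add(pattern)
            kept[spot] = pattern
    return kept


def dissimilarity(matrix, i, j):
    """Fraction of spots whose state differs between cultivars i and j."""
    patterns = list(matrix.values())
    return sum(1 for p in patterns if p[i] != p[j]) / len(patterns)


def average_linkage(names, matrix):
    """Return the merge history as (cluster, cluster, distance) tuples."""
    index = {name: k for k, name in enumerate(names)}
    clusters = [(name,) for name in names]

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(dissimilarity(matrix, index[a], index[b])
                   for a, b in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges
```

The merge history is sufficient to draw a dendrogram: each tuple gives the two clusters joined and the height at which they join.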

Table 1 Mature seed spot variation matrix for 18 barley cultivars

Results and discussion

Cultivar variations in protein 2-DE gel spots

Spring barley cultivars were chosen on the basis of differences in malting properties. The proteome comparison was initially carried out using seeds harvested in year 2000 from a single growing location. Proteins were extracted from mature barley seeds, which represent the raw material for the malting industry. In order to verify that only stable genetic differences were included in the analysis, seeds harvested in 2001 were also analysed, and spots varying between years were excluded. In this study, only variations involving clear presence/absence of protein spots were considered. To ensure that the observed differences were reproducible, three to four replicate gels were run for each of the cultivars Barke, Scarlett and Meltan.

In total, 69 well-defined and reproducible spot differences were observed among the cultivars in the overlapping pI ranges 4–7 and 6–11 (Supplementary Fig. S1). Proteins in 26 of the spots were already identified as part of ongoing barley seed proteome analyses (Bak-Jensen et al. 2004, 2007; Finnie et al. 2002, 2004, 2006; Finnie and Svensson 2003; Laugesen et al. 2007; Østergaard et al. 2002, 2004). Proteins in an additional 22 varying spots were successfully identified by mass spectrometry (Supplementary Table S2), resulting in the characterisation of 48 out of the 69 variable spots. The spot variations included serpins, peroxidases and several proteins with unknown functions (Table 1). In many cases, proteins identified in varying spots were also present in non-variable spots. This may be the result of post-translational modifications or the presence of protein isoforms, as differences of a few amino acids cannot always be distinguished due to incomplete sequence coverage by mass spectrometry.

Polymorphic spots were usually absent in more than one of the cultivars. This suggested that it would be possible to group the cultivars based on their combinations of these variable protein forms which, due to their functions in defence, metabolism and other processes, are likely to contribute to the differing characteristics of the cultivars. Noticeably, each of the analysed cultivars had a unique combination of these spots. The variation matrix (Table 1) is thus a promising tool for cultivar identification.
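Whether a spot matrix can serve for cultivar identification depends on every cultivar having a distinct combination of spots. A minimal check is sketched below; the data layout (spot identifier mapped to a tuple of 0/1 values, one per cultivar) and the function names are hypothetical.

```python
from collections import defaultdict


def fingerprints(spot_matrix, cultivars):
    """Map each cultivar to its tuple of spot states across all spots."""
    return {cv: tuple(pattern[k] for pattern in spot_matrix.values())
            for k, cv in enumerate(cultivars)}


def shared_fingerprints(spot_matrix, cultivars):
    """Return groups of cultivars that cannot be told apart by the matrix."""
    by_print = defaultdict(list)
    for cv, fp in fingerprints(spot_matrix, cultivars).items():
        by_print[fp].append(cv)
    return [group for group in by_print.values() if len(group) > 1]
```

An empty result means that each cultivar has a unique fingerprint in the matrix supplied.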

A clustering analysis was performed to group the cultivars based on either protein or SSR marker profiles (Fig. 1). In parallel with the proteome analysis of the barley cultivars, a standard micromalting analysis was performed using the same seed material. The superior malting cultivars according to this analysis (Alexis, Barke, Lux and Mentor) were closely grouped by the proteome data (Fig. 1b) supporting a connection between the proteome and malting quality. These cultivars were not grouped as closely by the SSR analysis (Fig. 1a), suggesting that the seed proteome variations might indeed reflect seed phenotypes more closely than genome-wide SSR variations.

Fig. 1

Clustering of cultivars on the basis of 2D gel variations and SSR markers. Genetic relationships between cultivars based on (a) SSR markers and (b) 2D spot variations (Table 1). The weighted malt score is indicated for each cultivar except for Optic and Golden Promise, for which data are not available.

Cultivars displaying poorer malting quality did not group together, probably because selection criteria for feed barley are traditionally far less stringent than for malting barley. If any one of the malting quality parameters (Supplementary Table S1) is not within the required range, the barley is unsuitable for malting. Good malting cultivars must conform to each of the criteria and therefore share common properties.

Molecular and genetic basis for spot pattern differences analysed by mass spectrometry

Barley β-amylase SNPs recognised in the proteome

Often, when a spot in one cultivar was replaced in another by a spot with similar molecular mass but different pI, MS peptide mass mapping showed that the spots contained the same protein. In a previous study (Finnie et al. 2002), spots differing in pI in four cultivars (Barke, Morex, Meltan and Mentor) were shown to contain β-amylase fragments differing by a Cys-Arg substitution. This amino acid difference affects the catalytic efficiency such that the Arg-containing form of β-amylase has a 2.5-fold higher Km for soluble starch than the Cys-containing form (Ma et al. 2001). Thus, the 2-DE patterns could be directly coupled both to genetic differences between cultivars and to distinct functional properties of protein forms. The Cys-Arg substitution was confirmed by mass spectrometric identification and sequencing of peptides from 2-DE gel spots containing full-length β-amylase from Barke and Morex. Thus, peptide mass mapping demonstrated that in Barke, only the Cys115 form was present, while both Cys115 and Arg115 forms were found in Morex (data not shown). The Cys-Arg substitution is due to a single nucleotide polymorphism (SNP) in β-amylase genes (Erkkila et al. 1998; Clark et al. 2003). In contrast to classical SNP analysis, proteomics illustrates the occurrence of proteins representing the SNP markers, enabling prediction of β-amylase SNP markers in barley cultivars (Finnie et al. 2002).

Identification and validation of SNPs in the proteome

Examination of 2D gel and mass spectrometric data generated by barley seed proteome analysis revealed several cases of coding SNPs in the proteome. Determination of coding SNPs expressed in the seed proteome can identify proteins with a higher probability of influencing seed phenotypes such as malting quality.

In the developing seed proteome of cultivar Barke, a new form of endo 1,3-β-glucosidase was identified (spot no. 522 in Finnie et al. 2006). Several EST sequences that matched peptide mass data from this spot encoded Gly144 whereas Ala144 was encoded by the majority of the EST sequences included in the corresponding TC sequence generated at the Institute for Genome Research Barley Gene Index (HvGI). The peptide mass map from spot no. 522 contained the Gly ([M + H] 1697.8) but not the Ala peptide ([M + H] 1711.8; data not shown). Nine EST sequences in the database covered this region: three originating from Barke had a GGC codon for Gly144, in agreement with the peptide mass map from spot no. 522; the remaining six—five from cultivar Himalaya and one from Morex—contained a GCC codon for Ala. These cultivar-related differences supported the presence of a SNP at this position. The Gly-Ala change does not alter the protein pI, and no 2-DE spot pattern difference was expected for the cultivars. The presence of the SNP could be confirmed by mass spectrometry on the corresponding spot from Morex or Himalaya, in which an Ala144-containing peptide of [M + H] 1711.8 is expected.
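The expected peptide mass for the alternative allele follows directly from the difference in residue mass. The short sketch below reproduces that arithmetic using standard monoisotopic residue masses; the function names are illustrative, and the 80 ppm default is taken from the search parameters given in the methods.

```python
# Standard monoisotopic residue masses (Da) for the two residues involved.
RESIDUE_MASS = {"Gly": 57.02146, "Ala": 71.03711}


def substitution_shift(old, new):
    """Mass change (Da) when residue `old` is replaced by residue `new`."""
    return RESIDUE_MASS[new] - RESIDUE_MASS[old]


def expected_mh(observed_mh, old, new):
    """Predicted [M + H] of the peptide carrying the substituted residue."""
    return observed_mh + substitution_shift(old, new)


def resolvable(shift_da, mh, tol_ppm=80.0):
    """True if the shift is larger than the mass tolerance window at this m/z."""
    return abs(shift_da) > mh * tol_ppm / 1e6
```

Replacing Gly with Ala adds one CH2 unit (about 14.02 Da), so a Gly peptide at 1697.8 predicts the Ala peptide at 1711.8, far outside an 80 ppm window of roughly 0.14 Da.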

Previously (Finnie et al. 2004), spots varying between cultivars Barke and Golden Promise were identified, including two with different pI, which were both identified as a putative 6-phosphogluconolactonase encoded by TC139824 (spots no. 336 and 540; Table 1; Fig. 2a). Several putative SNPs are listed for this TC sequence. One of these, a change from CAG to GAG (Fig. 2b), would replace Gln with Glu, which would account for the pI difference between spots no. 540 (Golden Promise) and no. 336 (Barke). Careful re-examination of the mass spectra for the Barke and Golden Promise spots revealed peaks with the expected mass difference of 1 Da (Fig. 2c). The available EST sequences covering this region of the TC originated from Barke (predicted Gln), Morex (predicted Glu) and Optic (predicted Glu); none was available from Golden Promise, which contained the more acidic 2-DE spot. However, both Morex and Optic contained the more acidic spot (no. 540; Table 1, Fig. 2a) as predicted by the SNP data. In fact, Morex was found to contain both spots no. 540 and 336, the latter of which remains to be supported by an EST sequence. Thus, Morex appears to be heterozygous at several loci.
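The reasoning from codon change to expected gel and mass spectrum behaviour can be summarised in a small sketch. It covers only the codons named in the text (standard genetic code) and uses approximate integer side-chain charges near neutral pH as a stand-in for a proper pI calculation; the function name and the returned dictionary are hypothetical.

```python
# Subset of the standard genetic code covering the codons named in the text.
CODONS = {"GGC": "Gly", "GCC": "Ala", "CAG": "Gln", "GAG": "Glu"}

# Approximate side-chain charge near neutral pH; unlisted residues count as 0.
SIDE_CHAIN_CHARGE = {"Asp": -1, "Glu": -1, "Lys": 1, "Arg": 1}

# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {"Gly": 57.02146, "Ala": 71.03711,
                "Gln": 128.05858, "Glu": 129.04259}


def describe_snp(ref_codon, alt_codon):
    """Summarise the predicted protein-level consequences of a coding SNP."""
    ref_aa, alt_aa = CODONS[ref_codon], CODONS[alt_codon]
    return {
        "nucleotide_changes": sum(a != b for a, b in zip(ref_codon, alt_codon)),
        "substitution": f"{ref_aa}->{alt_aa}",
        # A negative value predicts a more acidic 2-DE spot.
        "charge_change": (SIDE_CHAIN_CHARGE.get(alt_aa, 0)
                          - SIDE_CHAIN_CHARGE.get(ref_aa, 0)),
        "mass_shift_da": RESIDUE_MASS[alt_aa] - RESIDUE_MASS[ref_aa],
    }
```

For CAG to GAG the sketch reports a charge change of -1 and a mass shift of about +0.98 Da, in line with the more acidic spot and the 1 Da difference described above. For GGC to GCC it reports no charge change, consistent with the absence of an expected pI shift for the Gly-Ala substitution.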

Fig. 2

Coding SNPs can be identified and confirmed in the proteome using 2D gels and mass spectrometry. (a) Cultivar variations involving spots no. 336 and 540 containing a putative 6-phosphogluconolactonase (Table 1). (b) EST sequences originating from cultivars Barke, Morex and Optic showing a predicted coding SNP that would result in a change from glutamine to glutamate in 6-phosphogluconolactonase. (c) Mass spectrometry confirms the Gln-containing tryptic peptide ([M + H] 1664.8) in Barke and the Glu-containing form ([M + H] 1665.8) in Golden Promise.

A small heat shock protein isoform was identified in spots no. 304 and 408 (Østergaard et al. 2004; Finnie et al. 2004). This protein family typically shows a high degree of variation (Zivy 1987). Thirteen potential SNPs were present in the TC sequence encoding this isoform, four of which would result in an amino acid substitution and one of which encoded a Lys-Glu substitution, which would account for the pI shift observed for spots no. 304 and 408. Available ESTs predicted the more basic Lys-containing spot no. 408 in Himalaya and Morex and the acidic Glu-containing spot no. 304 in Morex, Barke and Optic. Examination of 2D gel data confirmed the presence of spot no. 304 in Morex, Barke and Optic (Table 1) and spot no. 408 in Himalaya (not shown) but not Morex. However, the tryptic peptides containing the Lys-Glu substitution were too small to be detected in the mass spectra from these spots.
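Because Lys is itself a trypsin cleavage site, a Lys-Glu substitution changes the set of tryptic peptides as well as their masses. The sketch below performs a simple in silico digest (cleavage after Lys or Arg, except before Pro, with no missed cleavages) and lists peptides falling below a lower m/z limit. The limit passed to `below_cutoff` and any sequence used with it are hypothetical; the actual heat shock protein sequence and instrument mass range are not given here.

```python
# Standard monoisotopic residue masses (Da), one-letter code.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh(peptide):
    """Monoisotopic [M + H] of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def below_cutoff(sequence, min_mz):
    """Tryptic peptides whose [M + H] falls below the given lower limit."""
    return [p for p in tryptic_digest(sequence) if mh(p) < min_mz]
```

With a toy sequence such as `AAKGGR`, the digest yields `AAK` and `GGR`, whereas the Glu variant `AAEGGR` yields a single longer peptide.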

Mapping of 2-DE spot differences in a doubled haploid population

A doubled haploid population consisting of 186 lines derived from the malting barley Scarlett and the feed barley Meltan was used to generate a genetic map based on SSR and AFLP markers. Forty-eight spots showed polymorphism between Scarlett and Meltan (Table 1). Segregation of these polymorphic spots was analysed in a subset of 30 lines of the Scarlett × Meltan population, which enabled initial mapping of the protein spot variations. Several of the spots co-segregated, forming 15 groups (A–O; Table 1; Supplementary Fig. S1), each containing up to nine spots. The 15 spot groups were mapped (Table 1, Supplementary Fig. S2) to chromosomes 1H (group N), 2H (group C), 3H (groups D and J), 5H (groups A, E, F, G, H, I, K and L) and 7H (groups B, M and O). Several of the proteins in the varying spots were identified by mass spectrometry (Table 1). In three cases, the chromosome location of the corresponding gene was known, providing a means to validate the spot segregation data. Thus, the gene encoding serpin Z7 is located on chromosome 5H (von Wettstein-Knowles 1993), the gene encoding the α-amylase/trypsin inhibitor BMAI-1 is located on chromosome 2H (Mena et al. 1992), and the barley grain peroxidase-encoding gene is found on chromosome 7H (Rostoks et al. 2005). The known chromosome locations corresponded to group A (containing serpin Z7 spots), C (containing BMAI-1), B and M (containing barley grain peroxidase), respectively (Table 1, Supplementary Fig. S2). This demonstrated the possibility of assigning spot pattern differences to chromosome locations using a subset of lines. In addition, this type of analysis can identify chromosome locations of trans-acting regulators of the proteins observed, as well as structural genes encoding the proteins themselves (Consoli et al. 2002). Expression of the peroxidase gene on chromosome 7H was reported to be affected by abiotic stress (Rostoks et al. 2005), and therefore, the presence or absence of the protein in different cultivars might be an indicator of stress tolerance.
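The grouping and initial chromosome assignment described above can be sketched as follows. The sketch assumes that each spot has been recoded by parental origin across the lines (`S` for the Scarlett form, `M` for the Meltan form), that co-segregation means an identical pattern in every line, and that a group is assigned to the marker with the lowest recombination fraction. The marker names and data layout are hypothetical, and this is not the procedure implemented in GMENDEL.

```python
from collections import defaultdict


def group_cosegregating(spot_origin):
    """Group spots that show the same parental pattern in every line."""
    groups = defaultdict(list)
    for spot, pattern in spot_origin.items():
        groups[tuple(pattern)].append(spot)
    return list(groups.values())


def recombination_fraction(a, b):
    """Proportion of lines in which the two patterns disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_marker(pattern, markers):
    """Return (marker, chromosome, r) for the most closely linked marker.

    `markers` maps a marker name to (chromosome, genotype pattern).
    """
    name = min(markers,
               key=lambda m: recombination_fraction(pattern, markers[m][1]))
    chromosome, genotype = markers[name]
    return name, chromosome, recombination_fraction(pattern, genotype)
```

With only 30 lines, a single discordant line changes the estimated recombination fraction by about 0.03, so such assignments are best treated as initial chromosome locations, as in the text.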