Background

The plant cell wall consists of three main polysaccharide fractions (cellulose, hemicellulose, and pectin), with lignin and proteins being the other two constituents. Grass walls contain mainly two of the three polysaccharide fractions, with pectin being a rather minor constituent. Hemicelluloses are plant cell wall matrix polysaccharides that possess diverse linear or branched structures [1, 2]. In grasses, these mainly encompass 1,4-β-glucan, 1,3;1,4-β-glucan, galactan, and glucomannan [3]. In addition, glucuronoarabinoxylan is a major grass wall constituent. Because of the presence of heterogeneous substituents or other linkages on their polymer backbones, hemicelluloses are non-crystalline and are more readily hydrolysed than cellulose. These polysaccharides can interact with cellulose microfibrils through hydrogen bonds [4].

Hemicellulosic polysaccharide backbones in plants are made by the cellulose synthase-like (Csl) enzymes, which are members of a much larger superfamily of genes referred to as glycosyltransferase 2 (GT2) [5]. Several other GTs, i.e., xyloglucan α-1,6-xylosyltransferases (GT34), xyloglucan fucosyltransferases (GT37), and xyloglucan galactosyltransferases (GT47), have been reported to be involved in the biosynthesis of xyloglucans [6]. Genes encoding Csl enzymes share sequence similarity with the cellulose synthase A (CesA) gene family known to form cellulose throughout the plant kingdom [7]. A variable number of Csl genes, ranging from 30 to 50, have been reported from different plant species and are classified into nine subfamilies (CslA–CslH and CslJ) [8, 9]. Cereals generally lack the CslB and CslG families. Among the remaining families, CslA, CslC, and CslD are conserved in all land plants, whereas CslF and CslH are restricted to grasses [10, 11]. A poorly understood subfamily, CslJ, has been reported in grasses as well as dicots, which contrasts with previous claims of its occurrence only in grasses [12, 13]. Similarly, the subfamilies CslB and CslG were previously reported to be specific to dicots [14]. However, a recent report established the presence of the CslB subfamily in monocots as well [12]. Several of the Csl subfamilies have been reported to be involved in the biosynthesis of different cell wall polysaccharides. For example, subfamily CslA was shown to form the β-1,4-mannan backbone of galactomannan and glucomannan [15, 16]. Similarly, the CslF and CslH subfamilies were shown to make 1,3;1,4-β-glucan in grasses [17, 18], whereas CslC genes were associated with the formation of the 1,4-β-glucan backbone of xyloglucan and some other polysaccharides [19].

Wheat is a major cereal crop grown on the largest area of arable land in the world, is second only to maize in grain production, and feeds approximately 40% of the world population [20]. It has a large genome (~17 Gb), of which ~80–90% is repetitive [21]. Even after the complete genome sequence became available [22], Csl genes remained unidentified and uncharacterized in bread wheat. In general, homeologous copies of most genes are located on each of the three chromosomes belonging to each of the subgenomes (A, B, and D), so the number of Csl genes is expected to be approximately three times that of a diploid species like rice. We used publicly available resources to retrieve the wheat genome sequence. Large-scale data mining was performed using Pfam domain models to identify the Csl gene family members reported in this study.

Methods

Data sources and sequence retrieval

Wheat genome data, generated by the International Wheat Genome Sequencing Consortium (IWGSC), were downloaded from the Ensembl Plants FTP server (ftp://ftp.ensemblgenomes.org/pub/current/plants/fasta/triticum_aestivum/) and converted into a local BLAST database using a UNIX pipeline. BLAST analyses (BLASTN as well as BLASTP) were performed using the stand-alone command line version of NCBI (National Center for Biotechnology Information) blast 2.2.28+ (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/), released March 19, 2013. A query file was generated from two Pfam domain models, PF00535 (GT2) and PF03552 (Cellulose_synt), downloaded from the Pfam 30.0 (June 2016) release [23]. The sequences of splice variants were also retrieved from the Ensembl Plants browser (http://plants.ensembl.org/Triticum_aestivum/Info/Index). Analysis of splice variants was conducted as described by Kim et al. (2007) [24]. Previously known Csl sequences from Arabidopsis thaliana, Oryza sativa, and Zea mays were downloaded from the Cell Wall Navigator database [25]. For Brachypodium, sequences were retrieved from PhytoMine (https://phytozome.jgi.doe.gov). Amino acid sequences of the aforementioned CSL proteins are given in Additional file 1: Figure S1.
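The database-building and search steps can be sketched as below. This is a minimal illustration, not the authors' pipeline: the file names (`wheat_proteins.fa`, `pfam_queries.fa`, and so on) are hypothetical, and only widely used BLAST+ options (`-in`, `-dbtype`, `-out`, `-query`, `-db`, `-evalue`, `-outfmt 6`) are assumed. The functions only assemble command lines; running them requires a local BLAST+ installation.

```python
def makeblastdb_cmd(fasta, dbtype, out):
    """Assemble the makeblastdb command that formats a FASTA file as a BLAST database."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", out]


def blast_cmd(program, query, db, out, evalue=1.0):
    """Assemble a BLASTN/BLASTP search command with tabular output (-outfmt 6)."""
    if program not in ("blastn", "blastp"):
        raise ValueError("program must be 'blastn' or 'blastp'")
    return [program, "-query", query, "-db", db,
            "-evalue", str(evalue), "-outfmt", "6", "-out", out]


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    print(" ".join(makeblastdb_cmd("wheat_proteins.fa", "prot", "wheat_prot_db")))
    print(" ".join(blast_cmd("blastp", "pfam_queries.fa", "wheat_prot_db", "hits.tsv")))
```

The resulting lists can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed.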

Blast searches for wheat homologs

All query files containing the two Pfam domain models (PF00535 and PF03552) were used to perform BLASTN searches against the local BLAST database of bread wheat. All BLAST hits with an E-value >1.0 were removed. With a cut-off E-value of <1.0, all previously known CesA genes were retrieved. After compiling all the sequences below the cut-off value, the CD-HIT program was used to obtain non-redundant sequences. This permissive cut-off E-value was used to ensure the identification of all genes that possessed the Pfam domains PF00535 and PF03552. These genes were further filtered through phylogenetic analysis along with previously known CSL proteins from Arabidopsis, Brachypodium, maize, and rice, which revealed some non-target genes that were removed from further analysis [26]. Phylogenetic analysis was also used to categorize the different Csl subfamilies. CesA genes were distinguished from the Csl genes by the CXC motif, which is diagnostic of the CesA proteins but absent from the Csl proteins [7, 27]. The presence of the conserved domains Cellulose_synt/GT2 was confirmed using a batch search at the Conserved Domain Database (CDD) of NCBI. Homeologous genes from each of the three genomes were named TaCslXY_ZA, TaCslXY_ZB, or TaCslXY_ZD, where X denotes the Csl subfamily, Y the gene number, and Z the wheat chromosome on which the gene is located. An alignment of the sequences of all newly identified wheat Csl genes is given in Additional file 2: Figure S2.
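The E-value filtering step can be sketched as follows, assuming the searches were written in the default 12-column BLAST tabular format (`-outfmt 6`), where the subject ID is column 2 and the E-value is column 11. The output format is an assumption, as it is not stated in the text, and the simple de-duplication by subject ID below is only a stand-in for the CD-HIT clustering described above.

```python
def filter_hits(lines, max_evalue=1.0):
    """Return unique subject IDs of hits whose E-value does not exceed max_evalue.

    `lines` are rows of default BLAST tabular output (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    kept, seen = [], set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue and subject not in seen:
            seen.add(subject)
            kept.append(subject)
    return kept


if __name__ == "__main__":
    # Made-up rows with hypothetical gene IDs, for illustration only.
    rows = [
        "PF03552\tgeneA\t45.0\t200\t100\t5\t1\t200\t10\t210\t2e-50\t180",
        "PF03552\tgeneB\t25.0\t40\t30\t0\t1\t40\t5\t45\t3.5\t20",
    ]
    print(filter_hits(rows))
```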

Protein structure and motif/domain identification

Protein sequences were downloaded from the Ensembl Plants FTP server (ftp://ftp.ensemblgenomes.org/pub/current/plants/fasta/triticum_aestivum/), developed by the International Wheat Genome Sequencing Consortium (IWGSC) [22]. Multiple protein sequence alignments were performed using Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/) [28]. The resulting alignments were analysed for the presence of the conserved motifs (D, D, DXD, QXXRW) of the GT2 superfamily. Conserved patterns of aligned sequences were highlighted using the Sequence Manipulation Suite: Color Align Conservation (http://www.bioinformatics.org/sms2/color_align_cons.html) [29]. The conserved domains were predicted using the CDD database (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) [22, 30, 31]. Wheat Csl genes were named based on their sequence identity, coverage, and the presence of conserved domains and motifs similar to those of the previously identified rice Csl genes. Where the number of genes in a subfamily exceeded that of rice, the additional genes were given new names. Because of the resemblance of CslD genes to CesA genes and their probable role in cellulose synthesis, we specifically focused on the TaCslD subfamily. Gene structures and intron evolution of TaCslD members were predicted with the Gene Structure Display Server 2.0 (http://gsds.cbi.pku.edu.cn/) using the genomic and cDNA sequences.
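A simple way to flag candidate DXD and QXXRW motifs in an unaligned protein sequence is a regular-expression scan, sketched below. This is an illustrative shortcut rather than the alignment-based inspection described above: short patterns such as DXD occur by chance, and the two single-D motifs can only be assigned from their position in an alignment, so they are not searched for here. The example sequence is made up.

```python
import re

# X stands for any amino acid; gaps are deliberately not allowed to match.
MOTIFS = {
    "DXD": re.compile(r"D[A-Z]D"),
    "QXXRW": re.compile(r"Q[A-Z]{2}RW"),
}


def find_motifs(seq):
    """Return {motif name: [(1-based start position, matched residues), ...]}."""
    seq = seq.upper()
    return {
        name: [(m.start() + 1, m.group()) for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Hypothetical fragment carrying the DMD and QQHRW variants.
    print(find_motifs("MKDMDAAQQHRWLL"))
```

Candidates found this way would still need to be checked against the aligned positions of the motifs in known GT2 proteins.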

Evolutionary relationships of Csl genes

A total of 215 CSL proteins from Arabidopsis, maize, rice and wheat were aligned using MAFFT (v1.3.6) [32]. Sequences that did not extend over the conserved core region were removed. Positions where more than 40% of the sequences contained a gap were also removed. The phylogeny of these sequences and 1000 bootstrap replications were inferred using Seqboot (v3.696) [33] and FastTree (v2.1.10) implemented on the Guillimin cluster [34].
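The gap-filtering rule (drop alignment positions where more than 40% of the sequences have a gap) can be sketched as below. The function name and the dictionary input format are illustrative choices; the tool the authors used for this step is not stated.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.4):
    """Remove alignment columns in which the gap fraction exceeds max_gap_fraction.

    `alignment` maps sequence names to aligned sequences of equal length,
    with '-' marking gaps.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    n = len(seqs)
    keep = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / n <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Toy alignment: column 3 has gaps in 3 of 5 sequences (60%) and is dropped;
    # column 2 has gaps in 2 of 5 (40%) and is kept.
    toy = {"a": "MK-L", "b": "MK-L", "c": "M--L", "d": "MKAL", "e": "M-AL"}
    print(drop_gappy_columns(toy))
```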

The phylogeny of the CslD subfamily was also determined separately using sequences from Arabidopsis, Brachypodium, maize, rice and wheat. For this analysis, the amino acid sequences of the CSL proteins were aligned using MUSCLE and their evolutionary history was inferred using the Neighbor-Joining method [35]. The tree was drawn to scale, with branch lengths in the same units as the evolutionary distances used to infer the phylogenetic tree. Evolutionary distances were computed with a Poisson correction and are given as the number of amino acid substitutions per site. The rate of variation among sites was modeled with a gamma distribution (shape parameter = 1), and all positions containing gaps and missing data were removed. Evolutionary analyses were conducted in MEGA6 [36].
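The distance calculation can be illustrated with the standard textbook formulas: for a proportion p of differing sites, the Poisson-corrected distance is d = -ln(1 - p), and with gamma-distributed rates of shape a it becomes d = a[(1 - p)^(-1/a) - 1], which reduces to p / (1 - p) when a = 1. The sketch below assumes these formulas correspond to the options chosen in MEGA6, and it skips gapped sites pair by pair for simplicity, whereas the analysis above removed gapped positions across the whole alignment.

```python
import math

MISSING = {"-", "?"}


def p_distance(seq1, seq2):
    """Proportion of differing residues, skipping sites with a gap or missing data."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2)
             if a not in MISSING and b not in MISSING]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p):
    """Poisson-corrected number of substitutions per site."""
    return -math.log(1.0 - p)


def gamma_distance(p, shape=1.0):
    """Poisson distance with gamma-distributed rate variation among sites."""
    return shape * ((1.0 - p) ** (-1.0 / shape) - 1.0)


if __name__ == "__main__":
    p = p_distance("MKTAYL-V", "MKSAYLAV")  # toy sequences
    print(p, poisson_distance(p), gamma_distance(p, shape=1.0))
```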

RNA-seq expression analysis

Publicly available RNA-seq data generated from bread wheat (var. Chinese Spring) were used to study the expression of the newly identified wheat Csl genes. The data were compiled from five different wheat tissues (spike, leaf, stem, root, and grain) collected at the seedling, vegetative and reproductive stages of development [37]. The relative expression of each TaCsl subfamily was presented as a heat map generated from the relative abundance of transcripts (per 10 million reads) for each gene using the wheat expression browser powered by expVIP (http://www.wheat-expression.com).

Results

Identification and classification of Csl gene family members in bread wheat

Database searches for bread wheat using the conserved Pfam domains PF00535 and PF03552, which are specific to the GT2 superfamily, resulted in the identification of 108 cellulose synthase-like (TaCsl) genes (Table 1). Two to three homeologous copies of each gene from the A, B and D genomes were common. The identified genes were named following the nomenclature of rice, which shares synteny with wheat. To keep the nomenclature simple, a suffix corresponding to the chromosome number and the specific wheat genome identifier (A, B, or D) has been used for each gene name [7]. For example, the copies of the first gene of subfamily CslA, CslA1, on the long arm of chromosome 1 of genomes A, B, and D are named TaCslA1_1AL, TaCslA1_1BL, and TaCslA1_1DL, respectively.
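The naming scheme lends itself to simple parsing, for example when tallying genes by subfamily or chromosome. The pattern below is an assumption inferred from names that appear in this article (such as TaCslA1_1AL and TaCslA6_3B); the chromosome-arm letter is treated as optional because some names carry none.

```python
import re

# Subfamily letter, gene number, chromosome group (1-7), subgenome (A/B/D), optional arm (L/S).
NAME_RE = re.compile(r"^TaCsl([A-Z])(\d+)_([1-7])([ABD])([LS]?)$")


def parse_tacsl_name(name):
    """Split a TaCsl gene name into its components."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised TaCsl gene name: {name!r}")
    subfamily, number, group, subgenome, arm = match.groups()
    return {
        "subfamily": "Csl" + subfamily,
        "gene_number": int(number),
        "chromosome": group + subgenome,
        "arm": arm or None,
    }


if __name__ == "__main__":
    for gene in ("TaCslA1_1AL", "TaCslA6_3B", "TaCslD4_5BS"):
        print(gene, parse_tacsl_name(gene))
```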

Table 1 Homeologous copies of the bread wheat Csl genes

An unrooted neighbor-joining (NJ) tree for the 215 derived Csl proteins from Arabidopsis, maize, rice and wheat is shown in Fig. 1. TaCsl proteins grouped into seven subfamilies: TaCslA (32 proteins), TaCslC (13 proteins), TaCslD (12 proteins), TaCslE (10 proteins), TaCslF (29 proteins), TaCslH (8 proteins), and TaCslJ (4 proteins) (Fig. 2). The TaCslA and TaCslC subfamilies were closely related, as shown by their taxonomic distribution and phylogenies. As expected, these subfamilies were conserved across the plant species. Although CslD is present in all the plant species and CslF is specific to grasses, their proximity to each other suggests a common origin [12]. Among the sequences common to both dicots and grasses, subfamily CslA appeared to be the most divergent between these two groups of plants. Whereas the sequences within the subfamilies CslC and CslD were interspersed between Arabidopsis and grasses, all the subfamily CslA sequences of Arabidopsis clustered together, separately from the grass CslA sequences. The proximity of the CslB and CslH subfamilies points to their common origin before the separation of grasses from dicots. Similarly, CslG and CslJ apparently had a common origin.

Fig. 1

An unrooted maximum likelihood phylogenetic tree of the Cellulose synthase-like (Csl) gene family from Arabidopsis, maize, rice and wheat using FastTree (v2.1.10) according to Price et al. (35). Nodes with more than 70% support from 1000 bootstrap replications were considered significant and indicated by a black circle. Different colors represent CSL proteins from different species. The scale bar indicates a radial distance equal to 0.5 amino acid substitutions per site. To keep the gene family nomenclature uniform, maize gene models from Gramene were renamed as follows: Zm, first four digits of the locus number, Csl, and the class identifier as described in Schwerdt et al. (9)

Fig. 2

Distribution of the TaCsl genes and their splice variants in seven subfamilies and their corresponding pfam domains used to identify TaCsl gene family members

Splice variants of Csl genes

Twenty-two of the 108 genes appeared to encode two or more proteins because of the presence of alternative splicing sites, as predicted by the Ensembl database, which would result in 137 probable Csl protein products (Table 2). Splice variants were predicted in all the subfamilies of the TaCsl genes except TaCslD (Table 2). In the subfamily TaCslA, 6 genes were alternatively spliced to form 13 putative proteins, whereas in the subfamily TaCslC, 5 genes were alternatively spliced, resulting in 14 putative proteins. Similarly, for the subfamilies TaCslE and TaCslF, alternative splicing resulted in 7 and 10 splice variants, respectively. Alternative splicing of 1 and 2 genes generated 3 and 4 putative proteins in the CslH and CslJ subfamilies, respectively (Fig. 2). More than half (51%) of the splice variants stemmed from exon skipping, ~24% from alternative 5′ and 3′ splice sites, and the rest, ~24%, from intron retention (Table 2).
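The total of 137 probable protein products can be reproduced from the per-subfamily numbers above, as the short check below shows. It assumes that the figures quoted for TaCslE and TaCslF (7 and 10 splice variants) count putative proteins in the same way as the figures for the other subfamilies; that reading is an interpretation of the text, not something it states explicitly.

```python
TOTAL_GENES = 108
SPLICED_GENES = 22

# Putative proteins arising from alternatively spliced genes, per subfamily.
PROTEINS_FROM_SPLICED = {
    "TaCslA": 13,
    "TaCslC": 14,
    "TaCslE": 7,   # assumed to count putative proteins
    "TaCslF": 10,  # assumed to count putative proteins
    "TaCslH": 3,
    "TaCslJ": 4,
}


def total_protein_products(total_genes, spliced_genes, proteins_from_spliced):
    """Genes with a single product plus all products of alternatively spliced genes."""
    return (total_genes - spliced_genes) + sum(proteins_from_spliced.values())


if __name__ == "__main__":
    print(total_protein_products(TOTAL_GENES, SPLICED_GENES, PROTEINS_FROM_SPLICED))
```

Under this reading, the 86 genes without predicted splice variants plus the 51 products of the 22 spliced genes give the 137 products reported.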

Table 2 Splice variants of the bread wheat Csl genes

Conserved motifs and domains

All predicted TaCSL proteins contain either the Pfam glycosyltransferase family 2_3 (GT) domain (PF13641) or the cellulose_synt domain (PF03552), considered to be the signature domains of the GT2 superfamily [12, 26]. Subfamilies TaCslA and TaCslC contained GT 2_3, and CslD, CslE, CslF, CslH, and CslJ contained the cellulose_synt domain (Fig. 2). All the translated TaCsl products contained the motifs D, DXD, D and QXXRW, except for eight truncated genes that lacked some of these motifs, apparently because of missing sequence (TaCslA7_2DS, TaCslD4_1BS, TaCslD4_5BS, TaCslF2_7BL, TaCslF6_7AL, TaCslF6_7DL, TaCslH3_3AS, TaCslH2_3B). Rice CesA10, CesA11 and CslH3 also contained only the DXD motif and lacked the D and QXXRW motifs [38]. The variable amino acids in the conserved motifs DXD and QXXRW differed among the Csl subfamilies: TaCslA (DMD, QQH/FRW); TaCslC (DMD, QQHRW); TaCslD (DCD, QVLRW); TaCslE (DCD, QHKRW); TaCslF (DC/GD, QI/VL/VRW); TaCslH (DCD, QF/YKRW); TaCslJ (DCD, QNKRW). These motifs are highlighted in the alignment files in the text file S_2a-f.

Phylogenetic analysis of the CslD subfamily

The evolutionary history of the CslD subfamily from Arabidopsis, Brachypodium, rice, maize and wheat was inferred using the Neighbor-Joining method, in MEGA6 [36], after grouping the orthologs from various species into different clades (Fig. 3). Rice Csl genes were used as reference because their complete nomenclature is well documented. All the genes grouped into three clades. The first clade contained CslD2 and CslD1 genes from rice and their orthologs from the remaining species. The three homeologous genes of wheat branched together with OsCslD1; wheat genes under this clade were named TaCslD1_1AL, TaCslD1_1BL, and TaCslD1_1DL. The second clade contained two subgroups with the orthologs of rice genes CslD3 and CslD5 from different species. The genes in the first subgroup were named TaCslD3_2AS, TaCslD3_2BS, and TaCslD3_2DS, and those of the second subgroup TaCslD5_7AL, TaCslD5_7BL, and TaCslD5_7DL. The last clade was composed of the orthologs of the rice CslD4 and wheat genes TaCslD4_5BS, TaCslD4_1BS and TaCslD4_5DS. Here we found only two homeologs of TaCslD4, but a gene from the 1BS genome (TaCslD4_1BS) of wheat grouped together with TaCslD4 genes (bootstrap = 1000), pointing to a translocation from its original A genome (Table 1). This gene shared sequence identity of 85% with TaCslD4_5BS at the amino acid level. OsCslD genes shared 73–86% sequence identity with the corresponding wheat orthologs.

Fig. 3

An unrooted phylogenetic tree representing the CslD subfamily from Arabidopsis, Brachypodium, maize, rice and wheat, inferred using the Neighbour-Joining (NJ) method with 1000 replicates to generate the bootstrap values shown beside each node forming the Csl clusters. Different colors and shapes represent orthologous Csl genes from different species: Arabidopsis, blue circles; Brachypodium, sky blue triangles; maize, brown rectangles; rice, no marker; and wheat, black circles

Gene structure and intron evolution of TaCslD subfamily

The 12 TaCslD genes identified from bread wheat ranged in size from 1519 to 5864 bp. The TaCslD4_1BS gene was the shortest and TaCslD1_1AL was the longest. Homeologous copies of all the genes shared sequence identity ranging from 87 to 94% at the nucleotide level. The variation in size among different genes was primarily because of the number and length of introns but also because of a lack of complete sequences in the database (Fig. 4). The number of introns per gene varied from 2 to 4. Two homeologs, TaCslD1_1AL and TaCslD1_1BL, each contained three introns, whereas the third homeolog (TaCslD1_1DL) had four. The genes TaCslD3, TaCslD4 and their homeologs contained three introns each, except TaCslD4_1BS, which had only two introns. TaCslD5 and its homeologs also had two introns each. The genes from the TaCslD subfamily exhibited variable patterns of intron phase distribution. Introns 1, 2 and 3 of TaCslD1_1AL, TaCslD1_1BL and TaCslD1_1DL were in phases 2, 0, and 0, respectively, whereas the fourth intron of TaCslD1_1DL was in phase 0. Introns 1 and 2 of TaCslD3_2AS, TaCslD3_2BS and TaCslD3_2DS were both in phase 0. The third intron of these genes was in phase 2, 1 and 2, respectively. The genes TaCslD4_5BS, TaCslD4_5DS, TaCslD5_7AL, TaCslD5_7BL and TaCslD5_7DL had introns 1 and 2 in phases 2 and 0, respectively, and the third intron of TaCslD4_5BS and TaCslD4_5DS was in phase 0 and 2, respectively. TaCslD4_1BS had introns 1 and 2 in phases 1 and 0. The largest proportion of introns (60%) of all the genes was in phase 0, followed by phase 2 (34%), with a few in phase 1 (6%).
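The phase proportions can be recomputed directly from the per-gene phases listed above, as in the following sketch. The dictionary is transcribed from the preceding paragraph; it gives 33 introns in total, with roughly 61% in phase 0, 33% in phase 2 and 6% in phase 1, close to the rounded proportions reported.

```python
from collections import Counter

# Intron phases per gene, in intron order, transcribed from the text above.
INTRON_PHASES = {
    "TaCslD1_1AL": [2, 0, 0],
    "TaCslD1_1BL": [2, 0, 0],
    "TaCslD1_1DL": [2, 0, 0, 0],
    "TaCslD3_2AS": [0, 0, 2],
    "TaCslD3_2BS": [0, 0, 1],
    "TaCslD3_2DS": [0, 0, 2],
    "TaCslD4_5BS": [2, 0, 0],
    "TaCslD4_5DS": [2, 0, 2],
    "TaCslD4_1BS": [1, 0],
    "TaCslD5_7AL": [2, 0],
    "TaCslD5_7BL": [2, 0],
    "TaCslD5_7DL": [2, 0],
}


def phase_summary(phases_by_gene):
    """Return (counts per phase, percentage per phase) over all introns."""
    counts = Counter(p for phases in phases_by_gene.values() for p in phases)
    total = sum(counts.values())
    percentages = {phase: 100.0 * n / total for phase, n in counts.items()}
    return counts, percentages


if __name__ == "__main__":
    counts, percentages = phase_summary(INTRON_PHASES)
    for phase in sorted(counts):
        print(f"phase {phase}: {counts[phase]} introns ({percentages[phase]:.1f}%)")
```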

Fig. 4

Structural features and phases of intron evolution of the CslD subfamily genes. Drawn to scale, exons are represented by red boxes and introns by black lines. The corresponding phases of intron evolution (0, 1, and 2) for the CslD genes are shown above the black lines

Expression analysis of TaCsl genes from bread wheat

Publicly available RNA-Seq datasets were used to analyse the expression of TaCsl genes over three developmental stages and in different tissues of wheat, including root, stem, leaf, spike, and grain. Expression data were available for 32 of the TaCslA genes. Two genes (TaCslA1_6AS and TaCslA1_6BS) were expressed in all the tissues except reproductive stem and leaves. Four genes (TaCslA5_2BS, TaCslA5_2DS, TaCslA6_3B, and TaCslA6_3AL) were expressed moderately. The TaCslA9 gene was highly expressed in the leaf tissue from the reproductive stage, while the transcript abundance of the remaining genes was low (Fig. 5). TaCslC subfamily genes, with the exception of TaCslC3, TaCslC9 and two homeologs of TaCslC10, were expressed highly in root and spike tissues. Two genes, TaCslC1 and TaCslC7, and their homeologs displayed moderate to high expression in all the tissues at the seedling and vegetative stages. One gene (TaCslC10_5DL) exhibited moderate to high expression in all the tissues studied except reproductive stem and grain (Fig. 6). Expression of most of the genes of the TaCslD subfamily ranged from moderate to high in the spike and root tissues but was very low in all the other tissues (Fig. 7). Three of the 10 TaCslE subfamily genes (TaCslE2_6AL, TaCslE2_6BL and TaCslE3) were expressed at moderate to high levels in all the tissues. The remaining genes were expressed at a very low level in all the tissues (Fig. 8). A mixed pattern of expression was observed in the large TaCslF subfamily. Three genes (TaCslF6_7AL, TaCslF6_7BL and TaCslF6_7DL) were highly expressed in all the tissues except the leaves at the reproductive stage. Two genes (TaCslF4_2BS and TaCslF4_2DS) were highly expressed in the stem tissue, but only at a low or moderate level in all other tissues. All other genes were expressed at low or moderate levels in one or more tissues (Fig. 9).
In the TaCslH subfamily, one of the eight genes, TaCslH1_2BL, was expressed at moderate to high levels in the leaf, stem and spike tissues. The remaining genes were expressed at low to moderate levels in all the tissues (Fig. 10). Three of the four members of the subfamily TaCslJ were expressed at low to moderate levels in the leaf and root tissues, while one gene (TaCslJ1_3DS) was poorly expressed in all the tissues studied (Fig. 10).

Fig. 5

Heat map showing the expression profiling of wheat TaCslA genes at the seedling, vegetative and reproductive stages. RNA-seq data were obtained from root, leaf, stem, spike and grain of the Chinese Spring cultivar. The respective transcripts per 10 million values were used to construct the heat map, with the scale bar showing expression of the genes

Fig. 6

Heat map of the expression profiling of wheat TaCslC genes at the seedling, vegetative and reproductive stages. RNA-seq data were obtained from root, leaf, stem, spike and grain of the Chinese Spring cultivar. The respective transcripts per 10 million values were used to construct the heat map, with the scale bar showing expression of the genes

Fig. 7

Heat map of the expression profiling of wheat TaCslD genes at the seedling, vegetative and reproductive stages. RNA-seq data were obtained from root, leaf, stem, spike and grain of the Chinese Spring cultivar. The respective transcripts per 10 million values were used to construct the heat map, with the scale bar showing expression of the genes

Fig. 8

Heat map of the expression profiling of wheat TaCslE genes at the seedling, vegetative and reproductive stages. RNA-seq data were obtained from root, leaf, stem, spike and grain of the Chinese Spring cultivar. The respective transcripts per 10 million values were used to construct the heat map, with the scale bar showing expression of the genes

Fig. 9

Heat map of the expression profiling of wheat TaCslF genes at the seedling, vegetative and reproductive stages. RNA-seq data were obtained from root, leaf, stem, spike and grain of the Chinese Spring cultivar. The respective transcripts per 10 million values were used to construct the heat map, with the scale bar showing expression of the genes

Fig. 10

Heat map of the expression profiling of wheat TaCslH and TaCslJ genes at the seedling, vegetative and reproductive stages. RNA-seq data were obtained from root, leaf, stem, spike and grain of the Chinese Spring cultivar. The respective transcripts per 10 million values were used to construct the heat map, with the scale bar showing expression of the genes

Discussion

Grass cell walls contain 20–40% non-cellulosic polysaccharides. The proportion and composition of these polysaccharides vary among plant species [39]. Since the first report demonstrating β-mannan synthase activity in a Csl-encoded protein [15], several members of the Csl gene family have been reported to be involved in the formation of the backbones of hemicellulosic polysaccharides [16, 18, 19, 26, 38, 40, 41]. As information on the identity of the Csl genes in wheat was lacking, we undertook this study to fill this gap.

We retrieved 108 TaCsl genes from wheat using two conserved domains, PF00535 and PF03552, which were previously shown to be present in the derived proteins of all the Csl genes [12]. These genes include homeologs from the A, B and D genomes of bread wheat. Similar patterns of homeologous genes were found for the FLOWERING LOCUS T (FT), Pairing homeologous 1 (Ph1) and ADP-glucose pyrophosphorylase (AGPase) gene families of hexaploid wheat. Approximately one-fifth of the identified Csl genes (22 of 108) were predicted to be alternatively spliced, possibly contributing to the diversity of the encoded enzymes. A recent study suggested that alternative splicing was common in plants and accounted for about 20% of the loci transcribed in the leaf and spike tissues of Aegilops tauschii. In germinating barley embryos, 14–20% of intron-containing genes were alternatively spliced [42]. This phenomenon, apparently meant to increase the fitness of an organism, has not thus far been reported for the Csl genes from other species [43].

The TaCsl genes were distributed across all the wheat chromosomes except chromosome 4 (Fig. 11). A similar trend of Csl gene distribution was observed in barley [9, 44, 45]. More than half of the TaCsl genes were located on only two chromosomes: 2 (32%) and 3 (22%). This suggests hyper-multiplication of the Csl genes on these chromosomes, although the reasons for this phenomenon are unknown. It appears, though, that cis duplication of the Csl genes was favored over trans duplication in wheat. Five of the nine CslF genes in barley were located on chromosome 2H [40]. In fact, the barley CslF gene was assigned its role in mixed-linked glucan (MLG) formation via syntenic orthology with rice long before the barley genome sequence became available [40]. A detailed analysis of the rice syntenic region corresponding to a previously published QTL for MLG from barley initially led to the breakthrough on the role of CslF in the formation of this polysaccharide [40]. A similar cluster of CslF genes was also detected in the conserved syntenic regions of Brachypodium and sorghum on chromosomes 1 and 2, respectively [9].

Fig. 11

Pie chart showing the percentage of TaCsl genes on wheat chromosomes

The observation that only half of the genes from the subfamily CslA were expressed at varying levels in the studied tissues suggests that the apparently silent genes may provide a backup under stressful conditions. Alternatively, they may be expressed only transiently in specialized cells or cell parts, at levels too low to be detected by the method used to study expression. The first biochemical evidence for the relationship of CslA genes with mannan synthase activity came from the expression of a guar CSLA cDNA in soybean somatic embryos [15]. Subsequent studies in insect cells demonstrated that CslA family members function as glucomannan synthases [16, 46]. Reverse genetic and biochemical approaches in Arabidopsis and Dendrobium officinale have also allowed the association of certain CslA genes with glucomannan biosynthesis [41, 47]. A recent study in wheat suggested the involvement of a gene from the CslA subfamily in the development of tillers, cell wall composition and stem strength. This study further suggested a probable role of CslA gene transcript levels in carbon partitioning throughout the plant [48].

For the subfamilies TaCslC and TaCslD, most of the genes were relatively highly expressed in the root and spike tissues during the vegetative as well as reproductive phases. Heterologous expression in Pichia revealed that the CslC-encoded enzymes made β-1,4-glucan, the backbone of xyloglucan [19]. The CslD subfamily is conserved in all land plants and is most closely related to the CesA gene family with 40–50% sequence similarity at the amino acid level [49]. Similar to CesAs, the CslD subfamily is ubiquitous in all plant genomes examined to date, unlike other, taxa-specific Csl subfamilies [50]. Previous reports also showed the involvement of certain members of the CslD subfamily in tip growth, for example development of root hairs and pollen tube elongation [51, 52], normal plant growth [50, 53], and meristem morphology [53, 54]. More recently, their role in resistance against biotic stresses has been described [55]. Adding to this discussion, our in silico expression analysis suggests the involvement of certain TaCslD genes in spike development. This suggestion is supported by the observation that a mutant, slender leaf 1 (sle1), which encodes the CSLD4 protein in rice, reduces the number and width of spikelets in the panicle [56].

Two groups of Csl genes, CslF and CslH, have evolved independently in grasses [57]. A third group, CslJ, originally believed to be specific to grasses, was recently identified in some dicots [11, 13]. The TaCslF6 gene was highly expressed in all the studied tissues except the leaf tissue from the reproductive stage, and it was the only member of the TaCslF subfamily that was highly expressed in the grain tissue. Several studies have demonstrated the functional role of CslF6 and CslH in the synthesis of MLG [18, 44, 58, 59]. Only one member of all the genes in these families, CslF6, was expressed in the grain, suggesting that it was responsible for MLG formation there. MLG is a desirable polysaccharide as a dietary fiber but undesirable for the brewing industry because it causes haze in beer. It should be possible to select natural variants in the expression of the CslF6 gene to obtain an increased or reduced MLG content, depending upon the target market for the grain.

Differential expression patterns were observed among homeologous copies from the three genomes of bread wheat, which agrees with previous studies reporting unequal contributions of the three genomes toward gene expression. Interestingly, the homeologous copies of TaCslD genes also differed from each other in terms of intron phase evolution, indicating structural and functional divergence of homeologous gene copies (Fig. 4). Most introns were in phase 0, which is in accordance with previous findings showing an intron bias in favour of phase 0 [7, 60, 61]. All three homeologs were not found for every gene reported in this study. This could be because of incomplete sequence information or because of the elimination of genes during the allopolyploidization of wheat.

Conclusions

We have identified 108 TaCsl genes in bread wheat and classified them into seven subfamilies (CslA, CslC, CslD, CslE, CslF, CslH, and CslJ). Two or three homeoalleles were identified for most of the Csl genes. Although located on all the wheat chromosomes except chromosome 4, the Csl genes were especially concentrated on chromosomes 2 and 3, suggesting selective, localized duplication in cis phase. Only one of the 29 CslF genes, CslF6, was expressed in the grain, suggesting its role in mixed-linked glucan formation. Neither CslJ nor CslH was expressed in the grain. Information in this report will be helpful in designing experiments to alter wall composition in wheat for improving grain quality, culm strength, or culm composition for biofuels.