Functional & Integrative Genomics, Volume 12, Issue 1, pp 173–182

Functional features of a single chromosome arm in wheat (1AL) determined from its structure


  • Stuart J. Lucas
    • Sabanci University, Biological Sciences and Bioengineering Program
  • Hana Šimková
    • Centre of the Region Haná for Biotechnological and Agricultural Research, Institute of Experimental Botany
  • Jan Šafář
    • Centre of the Region Haná for Biotechnological and Agricultural Research, Institute of Experimental Botany
  • Irena Jurman
    • IGA Institute of Applied Genomics
  • Federica Cattonaro
    • IGA Institute of Applied Genomics
  • Sonia Vautrin
    • INRA-CNRGV French Plant Genomic Resources Centre
  • Arnaud Bellec
    • INRA-CNRGV French Plant Genomic Resources Centre
  • Hélène Berges
    • INRA-CNRGV French Plant Genomic Resources Centre
  • Jaroslav Doležel
    • Centre of the Region Haná for Biotechnological and Agricultural Research, Institute of Experimental Botany
    • Sabanci University, Biological Sciences and Bioengineering Program
Original Paper

DOI: 10.1007/s10142-011-0250-3

Cite this article as:
Lucas, S.J., Šimková, H., Šafář, J. et al. Funct Integr Genomics (2012) 12: 173. doi:10.1007/s10142-011-0250-3

Abstract


Bread wheat (Triticum aestivum L.) is one of the most important crops globally and a high priority for genetic improvement, but its large and complex genome has been seen as intractable to whole genome sequencing. Isolation of individual wheat chromosome arms has facilitated large-scale sequence analyses. However, so far there has been no such survey of sequences from the A genome of wheat. Greater understanding of an A chromosome could facilitate wheat improvement and future sequencing of the entire genome. We have constructed a BAC library from the long arm of T. aestivum chromosome 1A (1AL) and obtained BAC end sequences from 7,470 clones encompassing the arm. We obtained 13,445 (89.99%) useful sequences with a cumulative length of 7.57 Mb, representing 1.43% of 1AL and about 0.14% of the entire A genome. The GC content of the sequences was 44.7%, and 90% of the chromosome was estimated to comprise repeat sequences, while just over 1% encoded expressed genes. From the sequence data, we identified a large number of sites suitable for development of molecular markers (362 SSR and 6,948 ISBP), which will have utility for mapping this chromosome and for marker-assisted breeding. Of 44 putative markers tested, 23 (52.3%) were found to be useful. The BAC end sequence data also enabled the identification of genes and syntenic blocks specific to chromosome 1AL, suggesting regions of particular functional interest and targets for future research.


Keywords

Wheat · A genome · BAC end sequencing · Comparative genomics · Marker design

Introduction


Bread wheat (Triticum aestivum) is one of the most important crop species, with global annual production currently over 600 million tonnes providing approximately one fifth of the world’s total calorific input (data from The Food and Agriculture Organization of the United Nations 2009). Continually raising the yield potential of wheat to match human population growth and stabilizing yield against the damaging effects of climate change is a top priority for agricultural science (Reynolds et al. 2009). While sequencing of the wheat genome would be of great utility for gene discovery and mapping of traits required for yield improvement, it has been perceived as too difficult owing to its size and complexity. At an estimated 17 Gb the wheat genome is 40 times larger than that of rice and contains about 80% repetitive sequence (Smith and Flavell 1975). Furthermore, as an allohexaploid (2n = 6x = 42) many sequences are present in three similar but different copies on each of the homoeologous genomes A, B, and D, further complicating genome-wide sequence analysis. Therefore, the majority of the wheat genomic sequences found in the public databases are those generated during targeted cloning projects or comparative studies of important traits (reviewed in Feuillet and Salse 2009), which necessarily focus on gene-rich segments.

Recently, methods of purifying the individual wheat chromosomes and producing chromosome-specific BAC libraries have been developed (Doležel et al. 2007, Šafář et al. 2010). By treating each chromosome individually, the complexities of physical mapping and sequence assembly can be greatly reduced. Additionally, high-throughput protocols for sequencing the ends of BAC clones (Kelley et al. 1999) enable the generation of large datasets of BAC end sequence (BES) distributed randomly across the whole genome, and therefore more representative of the total genome composition. Combining these approaches, Paux et al. (2006) used BES derived from wheat chromosome 3B to assess the composition and structure of the wheat genome.

Furthermore, BESs are a valuable source of molecular markers. Using BACs representing a minimum tiling path of soybean, Shultz et al. (2007) used BESs to develop new microsatellite markers. The wheat chromosome 3B BESs have been used to develop 711 chromosome-specific molecular markers based on transposable element insertion sites (insertion site-based polymorphism (ISBP)) (Paux et al. 2010). BAC end sequencing of rye (Secale cereale) chromosome 1RS, which is frequently translocated into wheat, allowed the development of 33 chromosome-specific SSR and ISBP markers at a success rate of better than 50%, with over 200 more potential marker sequences still to be tested (Bartoš et al. 2008).

Next-generation sequencing technologies provide the opportunity to obtain wheat genome sequences at a scale that has not previously been possible. Most notably, sequencing of low-copy and genic regions of the short arm of wheat chromosome 7D (Berkman et al. 2011) and complete sequencing and assembly of 13 BAC contigs comprising 18 Mb of chromosome 3B (Choulet et al. 2010) have now been completed. We present BAC end sequencing of 7,470 BACs distributed across the long arm of wheat chromosome 1A (1AL), which provides insight into the composition of this chromosome and, by extension, of the rest of the A genome. In addition, the development of molecular markers for this chromosome and analysis of synteny between wheat 1AL and the complete genome sequences of rice, sorghum and Brachypodium distachyon are presented.

Materials and methods

Purification of chromosome arm 1AL by flow cytometric sorting

Liquid suspensions of intact mitotic chromosomes were prepared from a double ditelosomic line (2n = 40 + 2t 1AS + 2t 1AL) of T. aestivum L. cv. “Chinese Spring” according to Vrána et al. (2000). The samples were stained with DAPI, and both the short (1AS) and the long (1AL) arms, maintained in the line as telocentric chromosomes, were purified simultaneously by flow cytometric sorting as described by Kubaláková et al. (2002). The 1AL arms were sorted in batches of 200,000 into 320 μl of 1.5 × IB buffer (Šimková et al. 2003). The purity of the sorted fractions was checked regularly by FISH as described in Janda et al. (2006) using probes for the telomeric repeat and the GAA repeat.

Construction of BAC libraries

Two 1AL-specific BAC libraries were constructed according to Šimková et al. (2011). Briefly, isolated HMW DNA was partially digested with HindIII (New England Biolabs, Beverly, MA, USA) and subjected to two rounds of size selection. DNA of particular size fractions was electroeluted from the gel and ligated into HindIII-digested dephosphorylated pIndigoBAC-5 vector (Epicentre, Madison, WI, USA). The recombinant vector was used to transform Escherichia coli ElectroMAX DH10B (TaaCsp1ALhA library) and MegaX DH10B (TaaCsp1ALhB library) competent cells (Invitrogen, Carlsbad, CA, USA), respectively. The libraries were ordered by Qbot (Genetix, New Milton, UK) into 384-well plates filled with 75 μl freezing medium consisting of 2YT, 6.6% glycerol and 12.5 μg/ml chloramphenicol. The clones were stored at −80°C. In order to estimate average insert sizes, a total of 160 BAC clones from the TaaCsp1ALhA library and 120 BAC clones from the TaaCsp1ALhB library were randomly selected from all size fractions of the libraries and analysed as described in Janda et al. (2006).

BAC end sequencing

Clones selected for BAC end sequencing were cultured overnight, and BAC DNA was isolated using routine alkaline lysis miniprep techniques.

Sequencing reactions were set up using Big Dye Terminator chemistry (Applied Biosystems, Foster City, CA, USA) according to the manufacturer’s instructions. Both ends of each BAC were sequenced using universal M13 forward (5′-CAGGAAACAGCTATGACC-3′) and reverse (5′-TGTAAAACGACGGCCAGT-3′) primers and a 3730xl DNA Analyser (Applied Biosystems). Chromatogram traces were base called and scored for quality using PHRED (Ewing and Green 1998, Ewing et al. 1998).

Annotation of chromosome 1AL sequences


DNA repeat sequences were downloaded from the following databases: TREP release 10; Repbase Update release 15.11 (Jurka et al. 2005); and the TIGR plant repeat databases (Ouyang and Buell 2004). For analysis of syntenic sequences, B. distachyon CDS (genome annotation v1.2) were downloaded from the B. distachyon project (International Brachypodium Initiative 2010). Sorghum bicolor CDS and Oryza sativa transcripts (genome annotation v6.1) were obtained from PlantGDB (Dong et al. 2005). For gene identification, 1,616,584 PlantGDB-assembled Unique Transcripts (PUTs) were obtained from the same source. PUTs were taken from the following plant species: Arabidopsis thaliana (543,450 PUTs), B. distachyon (30,991), Glycine max (259,849), Hordeum vulgare (134,482), O. sativa (146,642), S. cereale (5,977), S. bicolor (44,954), T. aestivum (301,765), Triticum monococcum (6,987), and Zea mays (181,717). In addition, 15,871 full-length Triticeae CDS were obtained from TriFLDB.


Known DNA repeat sequences were searched for using RepeatMasker version 3.2.9 with the CrossMatch algorithm (Green 1996) to locate alignments. All other similarity searches were carried out using the BLAST+ software suite downloaded from the NCBI (Camacho et al. 2009). Putative SSR markers were identified using SciRoKo v3.4 (Kofler et al. 2007) and ISBP markers using the program described by Paux et al. (2010). PCR primers were designed using Primer3 (Rozen and Skaletsky 2000).

Identifying repetitive elements

A semi-automated pipeline was used to identify and mask repetitive elements from the BES. First of all, three consecutive runs of RepeatMasker were carried out using default settings with three different custom libraries in the following order: TREPall, Repbase Update, TIGR plant repeats. Sequences matching known repeats were masked with an ‘N’. Putative unknown repeats were then identified by searching masked BES with BLASTN against themselves and marked as repeats if they gave three or more hits of 50 bp or more at >80% identity. Repeats were classified according to the system proposed by Wicker et al. (2007).
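The rule for flagging putative unknown repeats (three or more self-hits of 50 bp or more at >80% identity) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes the all-against-all BLASTN search was saved in tabular format (`-outfmt 6`), the function name `flag_novel_repeats` is hypothetical, and skipping self-hits is an assumption, since the text does not say how a sequence matching itself was handled.

```python
from collections import Counter


def flag_novel_repeats(blast_lines, min_hits=3, min_len=50, min_identity=80.0):
    """Return the IDs of query sequences with at least `min_hits` qualifying hits.

    Each line is expected in BLAST tabular format (-outfmt 6), whose first
    four columns are: qseqid, sseqid, pident, alignment length.
    """
    hits = Counter()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident, length = float(fields[2]), int(fields[3])
        if qseqid == sseqid:
            continue  # assumption: a sequence matching itself is not counted
        if length >= min_len and pident > min_identity:
            hits[qseqid] += 1
    return {query for query, count in hits.items() if count >= min_hits}
```

Sequences returned by such a function would then be masked alongside the known repeats before gene and synteny searches.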

Gene and synteny analysis

The repeat-masked sequences were then used in BLASTN searches against the PUTs, CDS and transcript sequences mentioned above at an e value cutoff of 1e−30. Proportion of the chromosome involved in coding sequences was derived from the cumulative match length. For those putative genes that gave a match of more than 200 bp in length, BLASTX searches (e value = 1e−10) were carried out against all non-redundant protein sequences. Hits corresponding to transposable element proteins and hypothetical proteins were removed from the analysis. For synteny analysis, only hits of longer than 50 bp were considered.
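The cumulative match length used to estimate the coding proportion could be computed along these lines. This is a sketch under stated assumptions: hits are given as simple tuples rather than parsed from a BLAST report, the function name is hypothetical, and overlapping hits on the same BES are merged so that no base is counted twice, which the text does not specify.

```python
def cumulative_match_length(hits, max_evalue=1e-30):
    """Sum the query bases covered by hits passing the e value cutoff.

    `hits` is an iterable of (query_id, q_start, q_end, evalue) tuples with
    1-based inclusive coordinates on the query (BES) sequence.
    """
    by_query = {}
    for query_id, q_start, q_end, evalue in hits:
        if evalue > max_evalue:
            continue  # not significant at the chosen cutoff
        start, end = sorted((q_start, q_end))
        by_query.setdefault(query_id, []).append((start, end))

    total = 0
    for intervals in by_query.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:  # overlapping or adjacent: extend
                cur_end = max(cur_end, end)
            else:  # gap: close the current interval and start a new one
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
    return total
```

Dividing the result by the total length of the BES dataset gives the estimated coding proportion.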

Marker development and testing

Unmasked sequences were searched for SSR markers using SciRoKo, while ISBPs were identified using the results of the initial three rounds of repeat masking (against libraries of known repeats). ISBPs were then sorted by hand to select unique junctions with high confidence, meaning that the end of a repetitive element could be clearly identified on one or both sides of the junction. Primer sequences used to test putative SSR and ISBP markers are listed in Online resource 4. PCR reactions were carried out using Taq polymerase and standard protocols (Budak et al. 2005).

Results


BAC cloning of chromosome arm 1AL

Two BAC libraries, cloned in two different bacterial strains, were prepared from 1AL. A total of 7.7 × 10^6 flow-sorted chromosome arms were used to construct the first library, named TaaCsp1ALhA. This library, cloned in ElectroMAX DH10B E. coli competent cells (Invitrogen, Carlsbad, CA, USA), comprises 49,536 clones. Considering a 1AL size of 523 Mbp (Šafář et al. 2010), 83% purity of the sorted fraction, 1% empty clones, and an average insert size of 103 kb, the library provides 8× coverage of the 1AL arm. Aiming to reach the 15× coverage favourable for construction of physical contig maps, a second library (TaaCsp1ALhB) was constructed using bacteriophage-resistant MegaX DH10B competent cells (Invitrogen). This library was made from 6.0 × 10^6 flow-sorted arms and contains 43,008 clones with a mean insert size of 109 kb. Considering an estimated 87% purity of the sorted fraction and 1% empty clones, the library represents 7.7 arm equivalents. Thus, together the 1AL-specific libraries contain 92,544 clones and provide 15.7× coverage of T. aestivum chromosome arm 1AL.
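The coverage figures can be reproduced with a short calculation. The sketch below assumes that coverage is the number of clones, discounted for empty clones and for the purity of the sorted fraction, multiplied by the mean insert size and divided by the arm size. The exact formula is not given in the text, and the function name is hypothetical, but this form recovers the reported values.

```python
def library_coverage(n_clones, mean_insert_kb, purity, empty_fraction, arm_size_mb):
    """Estimate how many arm equivalents a BAC library represents."""
    # Assumption: only non-empty clones derived from the target arm contribute.
    usable_clones = n_clones * (1.0 - empty_fraction) * purity
    return usable_clones * mean_insert_kb / (arm_size_mb * 1000.0)


cov_a = library_coverage(49_536, 103, 0.83, 0.01, 523)  # TaaCsp1ALhA
cov_b = library_coverage(43_008, 109, 0.87, 0.01, 523)  # TaaCsp1ALhB
print(f"TaaCsp1ALhA: {cov_a:.1f}x  TaaCsp1ALhB: {cov_b:.1f}x  total: {cov_a + cov_b:.1f}x")
```

With the values reported above, this gives about 8.0 and 7.7 arm equivalents, or 15.7 in total.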

BAC end sequencing of wheat chromosome 1AL libraries

High-information content fingerprinting of BAC clones has proved effective in resolving the repetitive nature of wheat genomic sequences and constructing a physical map for chromosome 3B (Paux et al. 2008). Using a similar strategy, we fingerprinted both 1AL-specific BAC libraries, generated a preliminary physical map of chromosome 1AL (unpublished data), and from this selected a minimum tiling path of 7,470 BAC clones, which are expected to cover the entire chromosome arm. Both ends of each BAC were sequenced and, after eliminating poor quality bases, 13,445 useful sequences (89.99% success rate) with an average edited read length of 563 bp were obtained. This gives a total of 7.57 Mb of sequence distributed across the chromosome, representing 1.43% of the long arm of chromosome 1A. The GC content of the sequenced portion was 44.71%.
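The summary statistics for the BES dataset follow from the clone and read counts. The sketch below recomputes the success rate and cumulative length, and shows one common way to compute GC content. The function names are hypothetical, and excluding ambiguous bases such as N from the GC calculation is an assumption.

```python
def bes_summary(n_clones, n_useful_reads, mean_read_bp):
    """Return (attempted reads, success rate, cumulative length in Mb)."""
    attempted = 2 * n_clones  # one read from each end of every BAC
    success_rate = n_useful_reads / attempted
    total_mb = n_useful_reads * mean_read_bp / 1e6
    return attempted, success_rate, total_mb


def gc_content(sequences):
    """Fraction of G+C among unambiguous bases (A, C, G, T) in the reads."""
    gc = at = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


attempted, rate, total_mb = bes_summary(7_470, 13_445, 563)
print(f"{attempted} reads attempted, {rate:.2%} useful, {total_mb:.2f} Mb in total")
```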

Annotation of chromosome 1AL sequences

The repeat content of chromosome 1AL was estimated by sequentially analysing the BESs with three libraries of plant repetitive sequences (see “Materials and methods”). Novel repeats were then identified by carrying out sequential BLAST searches of the BESs against themselves, followed by the sequences generated by 454 sequencing. From these searches, 8.2% of the chromosome is predicted to consist of novel repeats, bringing the total repetitive content of 1AL up to 90% of the chromosomal DNA. The repeat composition was compared with previously published data describing the B and D genomes (Fig. 1). Chromosome 1AL showed significant differences from the D genome but a very similar composition to the B genome, as might be anticipated from their closer evolutionary relationship.
Fig. 1

Comparison of the composition of wheat A, B, and D genomes. The percentage of each kind of repeat in the A genome was estimated by dividing the total number of bases assigned to that repeat in the 1AL BES sequences by the total length of the BESs (cf. Table 1). Data for the B genome were obtained in a similar way using 10.8 Mb of BESs from wheat chromosome 3B (Paux et al. 2006). The D genome is represented by 2.9 Mb of whole-genome shotgun sequence from the progenitor of this genome, Aegilops tauschii (Li et al. 2004)

The repetitive elements were masked with strings of “N,” and the remaining sequences used to identify expressed genes present in chromosome 1AL. BLAST searches were carried out against PUTs (PlantGDB-assembled Unique Transcripts) generated from EST sequences from a variety of plants. Cumulative hit length was used to estimate the proportion of the chromosome encoding expressed genes, and totalled 78,248 bp, which is equivalent to 1.03% of the total sequence. If this proportion is maintained across the whole chromosome, the estimated transcribed fraction of 1AL is 5.4 Mb. Sequences that gave hits longer than 200 bp were then annotated by searching against all non-redundant protein sequences. After eliminating hits against repetitive element-derived and hypothetical proteins, 29 putative proteins expressed from 1AL were identified (Table 1).
Table 1

Protein homologs of predicted genes

Functional annotation | Species
Pentatricopeptide repeat-containing protein | Ricinus communis
Ionotropic glutamate receptor ortholog GLR6 | Oryza sativa
Avr9/Cf-9 rapidly elicited protein | O. sativa
(not available) | Boesenbergia rotunda
Respiratory burst oxidase | O. sativa
GRAS-family transcription factor containing protein | O. sativa
Zinc-finger protein | Populus euphratica
Verticillium wilt disease resistance protein | O. sativa
Kinesin-related protein | O. sativa
Sulphate transporter | Triticum aestivum
Nucleolar protein NOP56 (ribosome biogenesis) | Zea mays
RNA-binding protein | R. communis
Polygalacturonase PG1 | O. sativa
Formin-like protein | O. sativa
Protein kinase | O. sativa
Kinase-interacting protein 1 | O. sativa
NB-ARC domain containing protein | O. sativa
Receptor kinase 1 | T. aestivum
RNA recognition motif family protein | O. sativa
Nascent polypeptide associated complex alpha chain | Nicotiana benthamiana
MATE efflux protein | O. sativa
(not available) | O. sativa
Transcription factor GAMyb | O. sativa
Binding protein with PPR repeat | Arabidopsis lyrata
Binding protein with PPR repeat | A. lyrata
Hydrolase, alpha/beta fold family protein | Arabidopsis thaliana
F-box domain containing protein | Z. mays
Proline iminopeptidase | O. sativa
Signal transduction protein | O. sativa

Repeat-masked BES sequences of 200 bp or longer that matched known expressed sequences were identified as described in “Materials and methods.” Functional annotations were taken from the highest-scoring annotated BLASTX hit among non-redundant proteins. Transposable element proteins and hypothetical proteins were excluded from the analysis

Development of molecular markers from BAC end sequences of chromosome 1AL

BAC end sequences can provide a rich source of potential molecular markers for mapping and sequencing projects. Among these, simple sequence repeats (SSRs) have long been established as a useful source of polymorphism between closely related plant cultivars. More recently, ISBPs, located at the junctions between repetitive elements and unique sequences, have been proposed as a valuable and almost limitless source of genetic markers for highly repetitive genomes such as T. aestivum (Paux et al. 2006, 2010). The 1AL BES sequences were analysed for the presence of both SSR and ISBP-type sequences. A total of 433 SSRs were identified within the BESs, from which it was possible to design 362 viable primer pairs (see Online resource 1). Suitable junctions for designing ISBP markers numbered 9,338 (see Online resource 2), representing 6,948 of the BESs. Among these, 147 of the ISBPs designed (1.57%) also incorporated an SSR, which may increase their chances of containing polymorphisms (listed in Online resource 3). To test the utility of these markers, PCR screens were carried out for 44 putative markers (eight SSRs, 26 ISBPs, and ten ISBPs incorporating SSRs). Of these, 23 (52.3%) correctly amplified the BAC against which they were designed when screened against multiple pools of different 1AL BAC clones (results summarized in Online resource 4). Developing genetic markers for T. aestivum is complicated by the hexaploid genome, where markers often amplify multiple similar loci on homoeologous chromosomes. Therefore, 18 of the successful markers were then used to amplify gDNA from cultivars Chinese Spring and Renan, to test for size polymorphisms, along with nullisomic lines for 1A to test whether the marker locus was specific to this chromosome. Additionally, gDNA from T. monococcum, the diploid ancestor of the A genome, was screened. Typical results are shown in Fig. 2. Out of 18 markers, eight were specific to chromosome 1A (e.g., ISR13 and bISBP1; Fig. 2), while four appeared to have size polymorphisms between Chinese Spring and Renan (e.g., ISR19 and bISBP5; the bands for ISR19 were reproducible but faint, so some PCR optimization is required). However, 11 out of 18 markers were polymorphic between Chinese Spring and T. monococcum, as might be expected from the greater genetic distance in this case. The efficiency of marker development from SSR and ISBP sequences appeared to be similar, although a larger sample set is required to draw statistically meaningful conclusions.
Fig. 2

Analysis of specificity of molecular markers generated from BAC ends. PCR products for four typical markers were separated on 2% agarose. Control PCR reactions with no template (-ve) were also carried out. Genomic DNA from T. monococcum and two T. aestivum cultivars, Renan and Chinese Spring (CS) were used as templates along with N1A-T1B (nullisomic for chromosome 1A, tetrasomic for 1B), N1A-T1D (nullisomic for 1A, tetrasomic for 1D). Size standards are GeneRuler 1 kb Ladder Plus (Fermentas GmbH, St. Leon-Rot, Germany), and expected sizes for each marker are indicated by open arrowheads. Markers labelled ISR are ISBPs which contain an SSR, those labelled bISBP are simple ISBPs

Syntenic relationships between chromosome arm 1AL and other grass genomes

The complete sequencing of the genomes of rice, sorghum and, most recently, the model grass species B. distachyon provides a valuable resource for mapping the genomes of other related grass species. Mayer et al. (2009) used similarity with rice and sorghum coding sequences to integrate 454 shotgun reads and EST sequences with a genetic map of barley chromosome 1H. Similarly, the non-repetitive BES sequences were used to search the complete coding sequences (CDS) of B. distachyon, O. sativa, and S. bicolor. Of the BAC clones used in this study, 101 gave significant matches to at least one CDS from a fully sequenced grass species. The BES were then used to search putative full-length Triticum transcripts deposited in TriFLDB, and hits were compared with the other grass genomes, identifying a further 34 clones that matched a conserved CDS (Fig. 3a). The highest number of conserved CDS was found with B. distachyon (112/135), supporting its utility as a model organism for wheat.
Fig. 3

Syntenic relationships of putative gene sequences. a Venn diagram showing the number of CDS/full-length transcripts from each indicated grass species with homologs in the repeat-masked BES sequences. b Diagram showing the relationships between syntenic blocks from B. distachyon, O. sativa, and S. bicolor represented on chromosome 1AL. Approximate positions of blocks on each chromosome and inversions are indicated by the coloured lines

In total, 95 1AL BESs gave hits in two or more sequenced grass species. These matches were compared, and grouped into syntenic groups where more than three sequences were present in the same order on at least two of the genomes (Fig. 3b). These revealed two major blocks of synteny, corresponding to Brachypodium chromosome 2/Oryza chromosome 5/Sorghum chromosome 9 and Brachypodium chromosome 3/Oryza chromosome 10/Sorghum chromosome 1, which correspond to the main syntenic regions also identified on barley chromosome 1H (Mayer et al. 2009). In addition, three smaller syntenic blocks were identified which may represent shorter chromosome fragments that have been translocated into chromosome arm 1AL during the evolution of T. aestivum. The full list of relationships is given in Online resource 4 and the syntenic blocks in Online resource 5.
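The grouping criterion (more than three sequences in the same order on at least two genomes) can be illustrated for a single pair of genomes. The sketch below is not the authors' procedure, which is not described in detail. It assumes each BES has one hit position in each of two reference genomes, uses a hypothetical function name, and greedily collects runs of hits that share chromosomes and whose positions change in one direction only, so that inverted blocks are also found.

```python
def collinear_runs(hits, min_size=4):
    """Find runs of BES hits that are in the same order on two genomes.

    `hits` is a list of (bes_id, chrom_a, pos_a, chrom_b, pos_b) tuples.
    A run needs at least `min_size` hits ("more than three" in the text).
    """
    ordered = sorted(hits, key=lambda h: (h[1], h[2]))  # order along genome A
    blocks, run, direction = [], [], 0
    for hit in ordered:
        if run:
            prev = run[-1]
            same_chroms = hit[1] == prev[1] and hit[3] == prev[3]
            step = (hit[4] > prev[4]) - (hit[4] < prev[4])  # +1, -1 or 0
            if same_chroms and step != 0 and direction in (0, step):
                run.append(hit)
                direction = step
                continue
            if len(run) >= min_size:
                blocks.append([h[0] for h in run])
        run, direction = [hit], 0  # start a new run at this hit
    if len(run) >= min_size:
        blocks.append([h[0] for h in run])
    return blocks
```

Running the same function on each pair of genomes and keeping the blocks supported by at least one pair would approximate the "at least two of the genomes" condition.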

For all sequences that showed a syntenic relationship, the predicted functions assigned by gene ontology to the orthologs in other grass species were examined at Gramene (2011). Of the 95 syntenic sequences, 25 had no known function. The remainder were divided into groups by function (Table 2). In several cases, two BES sequences mapped to the same syntenic gene, which may indicate that they were derived from overlapping BAC clones, or that the gene has been duplicated in wheat. Of note was the relatively high representation of protein kinases (8/95), which may indicate a cluster of these genes on chromosome 1AL, or that the wheat genome as a whole contains a large number of these signalling molecules. Also of interest are the putative stress response and carbohydrate metabolism genes, which may underlie quantitative trait loci (QTLs) that have been mapped to chromosome 1AL.
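Table 2

Functional annotation of syntenic genes found on chromosome 1AL

Functional group | Predicted function
Transcription regulation | CG-1 domain TF
Transcription regulation | GRAS-family TF
Transcription regulation | Histone methyltransferase
Transcription regulation | Transcription factor B3
Transcription regulation | GCN5-related N-acetyltransferase
Carbohydrate metabolism | UDP-glucosyl transferase
Carbohydrate metabolism | Starch synthase
Carbohydrate metabolism | O-Glycosyl hydrolase
Cytoskeleton and vesicle transport | Clathrin/coatomer adaptor like
Cytoskeleton and vesicle transport | Rab GTP dissociation inhibitor
Cytoskeleton and vesicle transport | Dynamin GTPase
Cytoskeleton and vesicle transport | Dynamin GTPase
Cytoskeleton and vesicle transport | Alpha tubulin
Cytoskeleton and vesicle transport | Alpha tubulin
Cytoskeleton and vesicle transport | Vinculin (cell adhesion)
Membrane transport | Sulphate transporter
Membrane transport | Cobalt transporter
Membrane transport | MatE multi-drug transporter
Membrane transport | Tetracycline-proton antiporter
Membrane transport | Cation efflux transporter
Membrane transport | Heavy metal transport
Nucleic acid modification | RRM-RNP1 (NA binding)
Nucleic acid modification | 3′–5′ Exonuclease
Nucleic acid modification | Nucleic acid binding
Nucleic acid modification | DEAD-box RNA helicase
Nucleic acid modification | DEAD-box RNA helicase
Signal transduction | Protein kinase
Signal transduction | ANTH (phospholipid binding)
Signal transduction | MARCKS (calmodulin binding)
Signal transduction | Protein kinase
Signal transduction | GTP binding/GTPase
Signal transduction | Ser/Thr protein kinase
Signal transduction | Protein kinase
Signal transduction | Protein kinase
Signal transduction | Protein kinase
Signal transduction | Protein kinase
Signal transduction | Glutamate receptor
Stress responses | DnaJ heat-shock protein
Stress responses | Apoptosis/defense response
Cell metabolism | Cytochrome P450 B
Cell metabolism | Carbamoyl phosphate synthase
Cell metabolism | Nucleoside diphosphate kinase
Cell metabolism | Proton-coupled ATP synthase
Cell metabolism | Phosphoribosylformylglycinamidine synthase
Cell metabolism | Pyruvate, phosphate dikinase
Cell metabolism | Lipase 3
Cell metabolism | Proton-coupled ATP synthase
Protein synthesis and degradation | Peptidase S1_S6
Protein synthesis and degradation | TFIIB-related (translation initiation)
Protein synthesis and degradation | Ubiquitin-dependent peptidase C19
Protein synthesis and degradation | Ubiquitin ligase
Protein synthesis and degradation | Serine peptidase S28
Protein synthesis and degradation | Peptidase C13
Protein synthesis and degradation | Signal recognition particle
Protein synthesis and degradation | IF2 (translation initiation)
Protein synthesis and degradation | SecY protein translocase
Protein synthesis and degradation | Ubiquitin ligase
No known function | 25 sequences
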
Syntenic genes (those found in at least two sequenced grass species) that are also present in 1AL were identified as described in “Results.” Where a syntenic Brachypodium gene was found, this gene is indicated; where genes were present in rice and sorghum but not Brachypodium, the rice gene is indicated

Discussion


The construction of a BAC library from chromosome arm 1AL marks a step forward in developing large-insert libraries from all chromosome arms of hexaploid wheat. These subgenomic resources provide an entrance into the complex genome and facilitate its mapping, positional cloning and sequencing (Paux et al. 2008). The analysis of a total of 7.57 Mb of BES from wheat chromosome arm 1AL presented here provides interesting comparisons with previous studies on other Triticeae chromosomes. The GC content of 1AL (44.7%) was extremely similar to that previously reported for BESs from chromosome 3B (44.5%; Paux et al. 2006), and the repeat composition was also extremely similar, whereas data from the D genome ancestor Aegilops tauschii showed significantly different proportions (Li et al. 2004; Fig. 1). The consistency of the data from 1AL and 3B suggests that the global composition of the A and B genomes is very similar and that BES studies provide a good representation of the whole chromosome.

By sequence similarity to plant ESTs, the transcribed portion of 1AL was estimated to be 1.03% of the total sequence, or a cumulative length of about 5.4 Mb. The average length of 6,137 full-length cDNAs from wheat was 1,143 bp (Mochida et al. 2009), which would suggest the presence of about 4,700 genes on chromosome 1AL at an average density of 1/110 kb, and about 50,000 genes for the entire A genome. These values are close to those previously reported, although it must be noted that the size limitation of BESs makes it difficult to distinguish intact genic sequences from pseudogenes, so this may be an overestimation.
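The chain of estimates in this paragraph can be followed step by step. The sketch below uses only values reported in the text (78,248 bp of cumulative hit length, 7.57 Mb of BES, a 523 Mb arm and a mean cDNA length of 1,143 bp). The function name is hypothetical, and the calculation assumes, as the text does, that the coding proportion of the BES sample holds for the whole arm.

```python
def estimate_gene_content(coding_bp, total_bes_bp, arm_size_mb, mean_cdna_bp):
    """Extrapolate the coding fraction of the BES sample to the whole arm."""
    coding_fraction = coding_bp / total_bes_bp
    transcribed_mb = coding_fraction * arm_size_mb
    n_genes = transcribed_mb * 1e6 / mean_cdna_bp
    kb_per_gene = arm_size_mb * 1000 / n_genes
    return coding_fraction, transcribed_mb, n_genes, kb_per_gene


fraction, transcribed_mb, n_genes, kb_per_gene = estimate_gene_content(
    78_248, 7.57e6, 523, 1_143
)
print(f"coding fraction {fraction:.2%}, transcribed {transcribed_mb:.1f} Mb, "
      f"about {n_genes:.0f} genes, one gene per {kb_per_gene:.0f} kb")
```

The result is consistent with the reported 1.03%, 5.4 Mb, about 4,700 genes and a density of roughly one gene per 110 kb.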

Complete sequencing of wheat chromosomes and marker-assisted selection both require a high density of molecular markers to distinguish between similar repetitive sequences (Paux et al. 2010). The most densely populated physical map for 1AL currently available consists of 334 deletion bin-mapped ESTs (Peng et al. 2004). However, the small number of deletion bins in 1AL (only three) limits the resolution of this map. The probable order of a minority of these ESTs can be found by examining their relationships with syntenic regions on the sequenced genomes of rice or other grasses (Quraishi et al. 2009), thus converting them into conserved orthologous set markers. However, many additional markers are still required to saturate the chromosome. Using the 1AL BES dataset, we were able to generate hundreds of putative SSR markers and thousands of putative ISBP markers. When these were tested for specificity, 23/44 (52.3%) were found to be useful. The 21 putative markers that were not useful were mostly rejected due to lack of specificity, which is a risk when using ISBP markers; as one of the primers in each pair binds to a repetitive sequence, the other must be highly specific to ensure accurate amplification. Of the successful markers, 4/18 (equivalent to 11.6% of all the markers tested) showed size polymorphisms between Chinese Spring and Renan. If the remaining ISBP and SSR markers presented here are developed at the same success rate, about 3,750 useful markers could be generated to populate future physical and genetic maps of chromosome 1AL, of which about 830 would be expected to have size polymorphisms in a Chinese Spring × Renan cross. In addition, many more of the new markers may be polymorphic between Chinese Spring and other cultivars, or may contain single-nucleotide polymorphisms (SNPs) not identified in this study. Indeed, sequencing of 157 ISBPs from wheat chromosome 3B in eight different wheat lines (Paux et al. 2010) revealed polymorphisms between at least two of the lines for 67% of the ISBPs, the great majority of which were SNPs. Another advantage of the markers generated here is that, having been designed from minimum tiling path BESs, they should be evenly distributed across the whole chromosome.
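The projected marker yields can be approximated from the counts reported in the Results. The text does not state which candidate total was used. The sketch below assumes it was the 362 SSR primer pairs plus the 6,948 BESs with ISBP junctions, minus the 147 combined ISBP/SSR candidates to avoid double counting. That assumption lands close to the reported figures of about 3,750 and about 830, but it remains a guess, and the function name is hypothetical.

```python
def project_marker_yield(n_ssr, n_isbp_bes, n_overlap,
                         n_tested, n_amplified, n_screened, n_polymorphic):
    """Scale the pilot success rates up to the full candidate set."""
    candidates = n_ssr + n_isbp_bes - n_overlap  # assumed candidate total
    success_rate = n_amplified / n_tested        # 23/44 in the pilot screen
    polymorphic_rate = n_polymorphic / n_screened  # 4/18 of successful markers
    useful = candidates * success_rate
    polymorphic = useful * polymorphic_rate
    return candidates, useful, polymorphic


candidates, useful, polymorphic = project_marker_yield(
    362, 6_948, 147, n_tested=44, n_amplified=23, n_screened=18, n_polymorphic=4
)
print(f"{candidates} candidates, about {useful:.0f} useful, "
      f"about {polymorphic:.0f} polymorphic")
```

The combined rate (23/44 × 4/18) is about 11.6%, which matches the proportion quoted in the text.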

Similarly, using the BES sequences we were able to identify syntenic blocks located on chromosome arm 1AL. The major regions of synteny were those expected from previous analysis of the related barley chromosome 1H (Mayer et al. 2009). However, three smaller syntenic blocks were also identified that most likely represent smaller translocations from other chromosomes. Using the syntenic relationships, it was possible to identify genes in the sequenced grass species that are likely to be present on chromosome arm 1AL, in addition to those identified by similarity to ESTs. These relationships will be useful in identifying candidate genes for QTLs that have been mapped to this chromosome. For example, Li et al. (2010) identified a QTL for tolerance to photo-oxidative stress on 1AL, while our study detected a putative peroxidase and Cytochrome P450 protein that may be involved in the oxidative stress response. Yield is a sufficiently complex trait that QTLs influencing yield-related traits have been mapped to most wheat chromosomes, but meta-QTL analysis identified two yield-enhancing MQTLs on chromosome 1AL (Zhang et al. 2010). In this regard the four carbohydrate metabolism genes identified here are of particular interest for further study, as yield is affected by the balance of starch production and degradation in the grain.

One important locus associated with grain quality QTLs that have been mapped to chromosome 1AL is the high-molecular weight glutenin gene Glu-A1 (Kuchel et al. 2006). The bulk of the Glu1 protein sequence is made up of two repeated motifs (consensus sequences PGQGQQ and GYYPTSLQQ), and the gene is not found in Brachypodium or rice. However, when the non-repetitive sections of the Glu1 protein sequence (the N-terminal 120 residues and C-terminal 45 residues) are used in similarity searches against these two species, their best matches are Bradi2g20870 (prolamin subfamily 2) and Os05g41970 (SSA1 family protein), respectively. Both of these putative proteins are involved in seed storage and located in the same syntenic location in this study—block III, between BACs Tae1AL160D24 and Tae1AL13G19. Therefore, it is possible that the Glu-A1 gene is evolutionarily related to these two genes, and also found at the same location, which we will be able to characterise more closely once the 1AL physical map is finalised.

Acknowledgments


We are grateful to Prof. B. S. Gill (Kansas State University, Manhattan, USA) for seeds of the double ditelosomic line 1A of wheat T. aestivum L. cv. Chinese Spring. We thank our colleagues, Dr. Jarmila Číhalíková, Dr. Marie Kubaláková and Romana Šperková, Bc. for chromosome sorting and Jana Dostálová, Bc., Radka Tušková, Helena Tvardíková, and Dr. Marie Seifertová for excellent technical assistance in BAC library construction and Z. Weinstein for the help with the MS. The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007–2013) under the grant agreement no. FP7-212019.

Supplementary material

Online resource 1: Excel spreadsheet of 362 putative SSR markers identified in chromosome 1AL BES sequences. (10142_2011_250_MOESM1_ESM.xlsx, XLSX 42.3 kB)
Online resource 2: Excel spreadsheet of 9,338 putative ISBP markers identified in chromosome 1AL BES sequences, including details of repetitive element junctions. (10142_2011_250_MOESM2_ESM.xlsx, XLSX 1.71 MB)
Online resource 3: Excel spreadsheet of 147 putative ISBP markers that incorporate an SSR from 1AL BES sequences, including details of repetitive element junctions and microsatellite sequences. (10142_2011_250_MOESM3_ESM.xlsx, XLSX 42.7 kB)
Online resource 4: Excel spreadsheet of primer pairs used to test 26 ISBP, eight SSR, and ten combined ISBP/SSR markers in PCR screens, along with a summary of results from amplification of both pooled BAC clones and gDNA from wheat cultivars and nullitetrasomic lines. (10142_2011_250_MOESM4_ESM.xlsx, XLSX 14.1 kB)
Online resource 5: Table of syntenic relationships of masked 1AL BES sequences. All BES that had significant homology to coding regions in at least two sequenced grass species are shown. CDS that are out of syntenic sequence in one species are highlighted. (10142_2011_250_MOESM5_ESM.docx, DOCX 21.1 kB)

Copyright information

© Springer-Verlag 2011