Functional & Integrative Genomics, Volume 12, Issue 1, pp 173–182

Functional features of a single chromosome arm in wheat (1AL) determined from its structure

Authors

  • Stuart J. Lucas
    • Sabanci University, Biological Sciences and Bioengineering Program
  • Hana Šimková
    • Centre of the Region Haná for Biotechnological and Agricultural Research, Institute of Experimental Botany
  • Jan Šafář
    • Centre of the Region Haná for Biotechnological and Agricultural Research, Institute of Experimental Botany
  • Irena Jurman
    • IGA Institute of Applied Genomics
  • Federica Cattonaro
    • IGA Institute of Applied Genomics
  • Sonia Vautrin
    • INRA-CNRGV French Plant Genomic Resources Centre
  • Arnaud Bellec
    • INRA-CNRGV French Plant Genomic Resources Centre
  • Hélène Berges
    • INRA-CNRGV French Plant Genomic Resources Centre
  • Jaroslav Doležel
    • Centre of the Region Haná for Biotechnological and Agricultural Research, Institute of Experimental Botany
    • Sabanci University, Biological Sciences and Bioengineering Program
Original Paper

DOI: 10.1007/s10142-011-0250-3

Cite this article as:
Lucas, S.J., Šimková, H., Šafář, J. et al. Funct Integr Genomics (2012) 12: 173. doi:10.1007/s10142-011-0250-3

Abstract

Bread wheat (Triticum aestivum L.) is one of the most important crops globally and a high priority for genetic improvement, but its large and complex genome has been seen as intractable to whole genome sequencing. Isolation of individual wheat chromosome arms has facilitated large-scale sequence analyses. However, no such sequence survey has yet been reported for the A genome of wheat. Greater understanding of an A-genome chromosome could facilitate wheat improvement and future sequencing of the entire genome. We constructed BAC libraries from the long arm of T. aestivum chromosome 1A (1AL) and obtained BAC end sequences from 7,470 clones spanning the arm. We obtained 13,445 (89.99%) useful sequences with a cumulative length of 7.57 Mb, representing 1.43% of 1AL and about 0.14% of the entire A genome. The GC content of the sequences was 44.7%, and 90% of the chromosome arm was estimated to comprise repeat sequences, while just over 1% encoded expressed genes. From the sequence data, we identified a large number of sites suitable for development of molecular markers (362 SSR and 6,948 ISBP), which will be useful for mapping this chromosome and for marker-assisted breeding. Of 44 putative SSR and ISBP markers tested, 23 (52.3%) were found to be useful. The BAC end sequence data also enabled the identification of genes and syntenic blocks specific to chromosome 1AL, suggesting regions of particular functional interest and targets for future research.

Keywords

Wheat; A genome; BAC end sequencing; Comparative genomics; Marker design

Introduction

Bread wheat (Triticum aestivum) is one of the most important crop species, with global annual production currently over 600 million tonnes, providing approximately one fifth of the world’s total calorific input (data from The Food and Agriculture Organization of the United Nations 2009). Continually raising the yield potential of wheat to match human population growth, and stabilizing yield against the damaging effects of climate change, are top priorities for agricultural science (Reynolds et al. 2009). While sequencing of the wheat genome would be of great utility for gene discovery and for mapping traits required for yield improvement, it has been perceived as too difficult owing to the genome’s size and complexity. At an estimated 17 Gb, the wheat genome is 40 times larger than that of rice and contains about 80% repetitive sequence (Smith and Flavell 1975). Furthermore, because wheat is an allohexaploid (2n = 6x = 42), many sequences are present in three similar but distinct copies, one on each of the homoeologous genomes A, B, and D, further complicating genome-wide sequence analysis. Therefore, the majority of the wheat genomic sequences in the public databases were generated during targeted cloning projects or comparative studies of important traits (reviewed in Feuillet and Salse 2009), which necessarily focus on gene-rich segments.

Recently, methods for purifying individual wheat chromosomes and producing chromosome-specific BAC libraries have been developed (Doležel et al. 2007; Šafář et al. 2010). By treating each chromosome individually, the complexity of physical mapping and sequence assembly can be greatly reduced. Additionally, high-throughput protocols for sequencing the ends of BAC clones (Kelley et al. 1999) enable the generation of large BAC end sequence (BES) datasets that are distributed randomly across the whole genome and are therefore more representative of its total composition. Combining these approaches, Paux et al. (2006) used BESs derived from wheat chromosome 3B to assess the composition and structure of the wheat genome.

Furthermore, BESs are a valuable source of molecular markers. Shultz et al. (2007) used BESs from BACs representing a minimum tiling path of the soybean genome to develop new microsatellite markers. The wheat chromosome 3B BESs have been used to develop 711 chromosome-specific molecular markers based on transposable element insertion sites (insertion site-based polymorphisms, ISBPs) (Paux et al. 2010). BAC end sequencing of rye (Secale cereale) chromosome arm 1RS, which is frequently translocated into wheat, allowed the development of 33 chromosome-specific SSR and ISBP markers at a success rate of better than 50%, with over 200 more potential marker sequences still to be tested (Bartoš et al. 2008).

Next-generation sequencing technologies make it possible to obtain wheat genome sequences on a previously unattainable scale. Most notably, the low-copy and genic regions of the short arm of wheat chromosome 7D have been sequenced (Berkman et al. 2011), and 13 BAC contigs comprising 18 Mb of chromosome 3B have been completely sequenced and assembled (Choulet et al. 2010). Here we present BAC end sequencing of 7,470 BACs distributed across the long arm of wheat chromosome 1A (1AL), which provides insight into the composition of this chromosome arm and, by extension, of the rest of the A genome. In addition, we present the development of molecular markers for this chromosome and an analysis of synteny between wheat 1AL and the complete genome sequences of rice, sorghum and Brachypodium distachyon.

Materials and methods

Purification of chromosome arm 1AL by flow cytometric sorting

Liquid suspensions of intact mitotic chromosomes were prepared from a double ditelosomic line (2n = 40 + 2t 1AS + 2t 1AL) of T. aestivum L. cv. “Chinese Spring” according to Vrána et al. (2000). The samples were stained with DAPI, and both the short (1AS) and the long (1AL) arms, maintained in the line as telocentric chromosomes, were purified simultaneously by flow cytometric sorting as described by Kubaláková et al. (2002). The 1AL arms were sorted in batches of 200,000 into 320 μl of 1.5 × IB buffer (Šimková et al. 2003). The purity of the sorted fractions was checked regularly by FISH as described in Janda et al. (2006), using probes for the telomeric repeat and the GAA repeat.

Construction of BAC libraries

Two 1AL-specific BAC libraries were constructed according to Šimková et al. (2011). Briefly, isolated HMW DNA was partially digested with HindIII (New England Biolabs, Beverly, MA, USA) and subjected to two rounds of size selection. DNA of particular size fractions was electroeluted from the gel and ligated into HindIII-digested, dephosphorylated pIndigoBAC-5 vector (Epicentre, Madison, WI, USA). The recombinant vector was used to transform Escherichia coli ElectroMAX DH10B (TaaCsp1ALhA library) and MegaX DH10B (TaaCsp1ALhB library) competent cells (Invitrogen, Carlsbad, CA, USA), respectively. The libraries were arrayed by Qbot (Genetix, New Milton, UK) into 384-well plates containing 75 μl of freezing medium consisting of 2YT, 6.6% glycerol and 12.5 μg/ml chloramphenicol. The clones were stored at −80°C. To estimate average insert sizes, a total of 160 BAC clones from the TaaCsp1ALhA library and 120 BAC clones from the TaaCsp1ALhB library were randomly selected from all size fractions of the libraries and analysed as described in Janda et al. (2006).

BAC end sequencing

Clones selected for BAC end sequencing were cultured overnight, and BAC DNA was isolated using routine alkaline lysis miniprep techniques.

Sequencing reactions were set up using BigDye Terminator chemistry (Applied Biosystems, Foster City, CA, USA) according to the manufacturer’s instructions. Both ends of each BAC were sequenced on a 3730xl DNA Analyzer (Applied Biosystems) using the universal M13 forward (5′-CAGGAAACAGCTATGACC-3′) and reverse (5′-TGTAAAACGACGGCCAGT-3′) primers. Chromatogram traces were base called and scored for quality using PHRED (Ewing and Green 1998; Ewing et al. 1998).

Annotation of chromosome 1AL sequences

Databases

DNA repeat sequences were downloaded from the following databases: TREP release 10 (http://147.49.50.65/ITMI/Repeats/); Repbase Update release 15.11 (Jurka et al. 2005; http://www.girinst.org/repbase/index.html); and the TIGR plant repeat databases (Ouyang and Buell 2004; http://plantrepeats.plantbiology.msu.edu/index.html). For analysis of syntenic sequences, B. distachyon CDS (genome annotation v1.2) were downloaded from the B. distachyon project (International Brachypodium Initiative 2010, http://mips.helmholtz-muenchen.de/plant/brachypodium). Sorghum bicolor CDS and Oryza sativa transcripts (genome annotation v6.1) were obtained from PlantGDB (Dong et al. 2005, http://www.plantgdb.org). For gene identification, 1,616,584 PlantGDB-assembled Unique Transcripts (PUTs) were obtained from the same source. PUTs were taken from the following plant species: Arabidopsis thaliana (543,450 PUTs), B. distachyon (30,991), Glycine max (259,849), Hordeum vulgare (134,482), O. sativa (146,642), S. cereale (5,977), S. bicolor (44,954), T. aestivum (301,765), Triticum monococcum (6,987), and Zea mays (181,717). In addition, 15,871 full-length Triticeae CDS were obtained from TriFLDB (http://trifldb.psc.riken.jp/index.pl).

Software

Known DNA repeat sequences were searched for using RepeatMasker version 3.2.9 (http://www.repeatmasker.org) with the CrossMatch algorithm (Green 1996; http://www.phrap.org/phredphrapconsed.html) to locate alignments. All other similarity searches were carried out using the BLAST+ software suite downloaded from the NCBI (Camacho et al. 2009). Putative SSR markers were identified using SciRoKo v3.4 (Kofler et al. 2007) and ISBP markers using IsbpFinder.pl (Paux et al. 2010). PCR primers were designed using Primer3 (Rozen and Skaletsky 2000).

Identifying repetitive elements

A semi-automated pipeline was used to identify and mask repetitive elements in the BESs. First, three consecutive runs of RepeatMasker were carried out using default settings with three different custom libraries in the following order: TREPall, Repbase Update, TIGR plant repeats. Sequences matching known repeats were masked with ‘N’ characters. Putative unknown repeats were then identified by BLASTN searches of the masked BESs against themselves and marked as repeats if they gave three or more hits of at least 50 bp at >80% identity. Repeats were classified according to the system proposed by Wicker et al. (2007).
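The novel-repeat rule above can be sketched in Python as a filter over BLAST+ tabular output (`-outfmt 6`). This is an illustrative sketch, not the authors' pipeline. The function names are hypothetical, and two choices are our assumptions: self-hits are ignored, and only the qualifying HSP intervals are masked rather than whole reads.

```python
from collections import defaultdict

# Default BLAST+ -outfmt 6 columns, in order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore


def novel_repeat_intervals(blast_lines, min_hits=3, min_len=50, min_identity=80.0):
    """Return {query: [(qstart, qend), ...]} for BESs flagged as putative novel repeats.

    A query qualifies when it has at least `min_hits` HSPs against *other* BESs
    that are >= `min_len` bp long with percent identity > `min_identity`.
    Ignoring self-hits is an assumption of this sketch.
    """
    hits = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        pident, length = float(f[2]), int(f[3])
        qstart, qend = int(f[6]), int(f[7])
        if qseqid != sseqid and length >= min_len and pident > min_identity:
            hits[qseqid].append((qstart, qend))
    return {q: iv for q, iv in hits.items() if len(iv) >= min_hits}


def mask_with_n(seq, intervals):
    """Replace 1-based, inclusive intervals of `seq` with 'N'."""
    chars = list(seq)
    for start, end in intervals:
        for i in range(min(start, end) - 1, max(start, end)):
            chars[i] = "N"
    return "".join(chars)
```

In use, the tabular lines for one self-search would be passed to `novel_repeat_intervals`, and each flagged BES would then be passed through `mask_with_n` with its intervals before the gene searches.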

Gene and synteny analysis

The repeat-masked sequences were then used in BLASTN searches against the PUTs, CDS and transcript sequences described above, with an E-value cutoff of 1e−30. The proportion of the chromosome occupied by coding sequences was derived from the cumulative match length. For putative genes that gave a match longer than 200 bp, BLASTX searches (E value = 1e−10) were carried out against all non-redundant protein sequences. Hits corresponding to transposable element proteins and hypothetical proteins were removed from the analysis. For synteny analysis, only hits longer than 50 bp were considered.
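One way to compute a cumulative match length without double-counting bases hit by several database entries is to merge overlapping hit intervals on each BES before summing. The merging step is our assumption, since the text says only "cumulative match length". Inputs are assumed to be hits that already pass the 1e−30 cutoff, and the function names are hypothetical.

```python
def merged_length(intervals):
    """Bases covered by 1-based inclusive intervals, counting overlapping bases once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted((min(a, b), max(a, b)) for a, b in intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the block
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def coding_fraction(hits_by_bes, total_bes_bp):
    """Fraction of all BES bases covered by expressed-sequence hits."""
    covered = sum(merged_length(iv) for iv in hits_by_bes.values())
    return covered / total_bes_bp
```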

Marker development and testing

Unmasked sequences were searched for SSR markers using SciRoKo, while ISBPs were identified with IsbpFinder.pl using the results of the initial three rounds of repeat masking (against libraries of known repeats). ISBPs were then sorted by hand to select unique junctions with high confidence, meaning that the end of a repetitive element could be clearly identified on one or both sides of the junction. Primer sequences used to test putative SSR and ISBP markers are listed in Online resource 4. PCR reactions were carried out using Taq polymerase and standard protocols (Budak et al. 2005).
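The idea behind ISBP candidate detection, finding boundaries between masked repeat sequence and unmasked sequence, can be sketched as follows. This is not IsbpFinder.pl. The minimum run lengths and flank size are hypothetical parameters, and the manual curation step described above is not modelled. Flanks are taken from the unmasked sequence so that one primer can sit in the repeat and the other in the adjacent unique sequence.

```python
import re


def isbp_junctions(original, masked, min_repeat=30, min_unique=30, flank=100):
    """Candidate repeat/unique junctions in a BES (illustrative sketch).

    `masked` is `original` with repeat bases replaced by 'N'. Returns a list of
    (position, left_flank, right_flank) tuples, where position is the 0-based
    index of the boundary and flanks come from the unmasked sequence.
    """
    if len(original) != len(masked):
        raise ValueError("original and masked sequences must have equal length")
    # Maximal runs alternate between masked (N) and unmasked sequence.
    runs = [(m.start(), m.end(), m.group()[0] == "N")
            for m in re.finditer(r"N+|[^N]+", masked)]
    junctions = []
    for (s1, e1, rep1), (s2, e2, _) in zip(runs, runs[1:]):
        rep_len, uniq_len = (e1 - s1, e2 - s2) if rep1 else (e2 - s2, e1 - s1)
        if rep_len >= min_repeat and uniq_len >= min_unique:
            pos = e1  # boundary between the two runs
            left = original[max(s1, pos - flank):pos]
            right = original[pos:min(e2, pos + flank)]
            junctions.append((pos, left, right))
    return junctions
```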

Results

BAC cloning of chromosome arm 1AL

Two BAC libraries, cloned in two different bacterial strains, were prepared from chromosome arm 1AL. A total of 7.7 × 10⁶ flow-sorted chromosome arms were used to construct the first library, named TaaCsp1ALhA. This library, cloned in ElectroMAX DH10B E. coli competent cells (Invitrogen, Carlsbad, CA, USA), comprises 49,536 clones. Considering a 1AL size of 523 Mbp (Šafář et al. 2010), 83% purity of the sorted fraction, 1% empty clones, and an average insert size of 103 kb, the library provides 8× coverage of the 1AL arm. Aiming to reach the 15× coverage favourable for construction of physical contig maps, a second library (TaaCsp1ALhB) was constructed using bacteriophage-resistant MegaX DH10B competent cells (Invitrogen). This library was made from 6.0 × 10⁶ flow-sorted arms and contains 43,008 clones with a mean insert size of 109 kb. Considering an estimated 87% purity of the sorted fraction and 1% empty clones, the library represents 7.7 arm equivalents. Together, the two 1AL-specific libraries contain 92,544 clones and provide 15.7× coverage of the T. aestivum chromosome arm 1AL.
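The coverage figures above follow from simple arithmetic. The text does not spell out the formula. Assuming coverage equals the number of non-empty, on-target clones multiplied by mean insert size and divided by the arm size, a short sketch reproduces the reported 8× and 7.7×. The function name is hypothetical.

```python
def library_coverage(n_clones, mean_insert_kb, purity, empty_fraction, target_mb):
    """Arm-equivalent coverage of a chromosome-arm-specific BAC library.

    Only clones that are non-empty and derived from the target arm (fraction
    `purity` of the sorted DNA) are counted towards coverage.
    """
    useful_kb = n_clones * (1 - empty_fraction) * purity * mean_insert_kb
    return useful_kb / (target_mb * 1000)


ARM_1AL_MB = 523  # 1AL size used in the text (Šafář et al. 2010)
cov_a = library_coverage(49_536, 103, 0.83, 0.01, ARM_1AL_MB)  # TaaCsp1ALhA
cov_b = library_coverage(43_008, 109, 0.87, 0.01, ARM_1AL_MB)  # TaaCsp1ALhB
print(f"hA: {cov_a:.1f}x, hB: {cov_b:.1f}x, total: {cov_a + cov_b:.1f}x")
# hA: 8.0x, hB: 7.7x, total: 15.7x
```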

BAC end sequencing of wheat chromosome 1AL libraries

High-information-content fingerprinting of BAC clones has proved effective in overcoming the problems posed by the repetitive nature of wheat genomic sequences and in constructing a physical map of chromosome 3B (Paux et al. 2008). Using a similar strategy, we fingerprinted both 1AL-specific BAC libraries, generated a preliminary physical map of chromosome arm 1AL (unpublished data), and from this selected a minimum tiling path of 7,470 BAC clones expected to cover the entire arm. Both ends of each BAC were sequenced and, after removal of poor-quality bases, 13,445 useful sequences (89.99% success rate) with an average edited read length of 563 bp were obtained. This gives a total of 7.57 Mb of sequence distributed across the arm, representing 1.43% of the long arm of chromosome 1A. The GC content of the sequenced portion was 44.71%.
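The yield figures can be checked directly from the clone and read counts. A GC-content helper is included for completeness. Pooling G+C counts over all reads, rather than averaging per-read percentages, and ignoring N bases are assumptions of this sketch, and the helper name is hypothetical.

```python
def pooled_gc(reads):
    """GC fraction pooled over all reads, counting only A/C/G/T (N is ignored)."""
    gc = acgt = 0
    for read in reads:
        r = read.upper()
        gc += r.count("G") + r.count("C")
        acgt += sum(r.count(b) for b in "ACGT")
    return gc / acgt if acgt else 0.0


n_clones = 7_470
useful_reads = 13_445
mean_read_bp = 563
success_rate = useful_reads / (2 * n_clones)  # both ends sequenced per clone
total_bp = useful_reads * mean_read_bp
print(f"{success_rate:.2%}, {total_bp / 1e6:.2f} Mb")  # 89.99%, 7.57 Mb
```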

Annotation of chromosome 1AL sequences

The repeat content of chromosome 1AL was estimated by sequentially analysing the BESs with three libraries of plant repetitive sequences (see “Materials and methods”). Novel repeats were then identified by sequential BLAST searches of the BESs, first against themselves and then against sequences generated by 454 sequencing. From these searches, 8.2% of the chromosome is predicted to consist of novel repeats, bringing the total repetitive content of 1AL to 90% of the chromosomal DNA. The repeat composition was compared with previously published data describing the B and D genomes (Fig. 1). Chromosome 1AL showed significant differences from the D genome but a very similar composition to the B genome, as might be anticipated from their closer evolutionary relationship.
Fig. 1

Comparison of the composition of wheat A, B, and D genomes. The percentage of each kind of repeat in the A genome was estimated by dividing the total number of bases assigned to that repeat in the 1AL BES sequences by the total length of the BESs (cf. Table 1). Data for the B genome were obtained in a similar way using 10.8 Mb of BESs from wheat chromosome 3B (Paux et al. 2006). The D genome is represented by 2.9 Mb of whole-genome shotgun sequence from the progenitor of this genome, A. tauschii (Li et al. 2004)
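The per-class percentage described in the Fig. 1 legend amounts to summing the bases assigned to each repeat class and dividing by the total BES length. This is a minimal sketch that assumes non-overlapping annotations. The function name, class names and coordinates in the test data are illustrative only.

```python
from collections import Counter


def repeat_composition(annotations, total_bes_bp):
    """Percentage of BES bases assigned to each repeat class.

    `annotations` holds (class_name, start, end) tuples with 1-based inclusive
    coordinates, assumed non-overlapping (e.g. after RepeatMasker has resolved
    overlapping matches).
    """
    bases = Counter()
    for cls, start, end in annotations:
        bases[cls] += abs(end - start) + 1
    return {cls: 100.0 * n / total_bes_bp for cls, n in bases.items()}
```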

The repetitive elements were masked with strings of “N,” and the remaining sequences were used to identify expressed genes present in chromosome 1AL. BLAST searches were carried out against PUTs (PlantGDB-assembled Unique Transcripts) generated from EST sequences from a variety of plants. The cumulative hit length, used to estimate the proportion of the chromosome encoding expressed genes, totalled 78,248 bp, equivalent to 1.03% of the total sequence. If this proportion holds across the whole arm, the transcribed fraction of 1AL is an estimated 5.4 Mb. Sequences that gave hits longer than 200 bp were then annotated by searching against all non-redundant protein sequences. After eliminating hits against repetitive element-derived and hypothetical proteins, 29 putative proteins expressed from 1AL were identified (Table 1).
Table 1

Protein homologs of predicted genes

BES | Functional annotation | Accession | Organism | E value
Tae1AL98E13REV1 | Pentatricopeptide repeat-containing protein | EEF51393.1 | Ricinus communis | 2E−89
Tae1AL137K21FOR1 | Ionotropic glutamate receptor ortholog GLR6 | BAD45488.1 | Oryza sativa | 9E−86
Tae1AL145A05REV1 | Avr9/Cf-9 rapidly elicited protein | BAD10281.1 | O. sativa | 5E−85
Tae1AL119A12FOR1 | Maturase | ADR66911.1 | Boesenbergia rotunda | 2E−75
Tae1AL102I24REV1 | Respiratory burst oxidase | AAT58757.1 | O. sativa | 7E−62
Tae1AL199P15REV1 | GRAS-family transcription factor containing protein | AAP54936.2 | O. sativa | 2E−61
Tae1AL157E13REV1 | Zinc-finger protein | AAR86717.1 | Populus euphratica | 2E−58
Tae1AL32E20FOR1 | Verticillium wilt disease resistance protein | BAA96770.1 | O. sativa | 1E−57
Tae1AL164B20REV1 | Kinesin-related protein | AAN62776.1 | O. sativa | 1E−56
Tae1AL55J19FOR1 | Sulphate transporter | CAD55701.1 | Triticum aestivum | 1E−55
Tae1AL75D23FOR1 | Nucleolar protein NOP56 (ribosome biogenesis) | ACG38753.1 | Zea mays | 7E−55
Tae1AL100O14REV1 | RNA-binding protein | EEF28708.1 | R. communis | 1E−50
Tae1AL128L20REV1 | Polygalacturonase PG1 | BAC57273.1 | O. sativa | 5E−46
Tae1AL168I13REV1 | Formin-like protein | AAN05367.1 | O. sativa | 1E−45
Tae1AL80I10FOR1 | Protein kinase | AAV44033.1 | O. sativa | 4E−44
Tae1AL72M02FOR1 | Kinase-interacting protein 1 | BAD73396.1 | O. sativa | 7E−41
Tae1AL139B17REV1 | NB-ARC domain containing protein | ABA95503.1 | O. sativa | 1E−40
Tae1AL20O06REV1 | Receptor kinase 1 | ABG68037.1 | T. aestivum | 6E−40
Tae1AL104E02FOR1 | RNA recognition motif family protein | ABF93874.1 | O. sativa | 3E−35
Tae1AL50G08REV1 | Nascent polypeptide associated complex alpha chain | BAF46352.1 | Nicotiana benthamiana | 4E−28
Tae1AL190C18FOR1 | MATE efflux protein | AAT58729.1 | O. sativa | 7E−28
Tae1AL154H06REV1 | MLA1 | BAD28289.1 | O. sativa | 8E−26
Tae1AL63B05FOR1 | Transcription factor GAMyb | BAD68205.1 | O. sativa | 4E−24
Tae1AL126H15FOR1 | Binding protein with PPR repeat | EFH40349.1 | Arabidopsis lyrata | 3E−21
Tae1AL104C05FOR1 | Binding protein with PPR repeat | EFH40349.1 | A. lyrata | 4E−21
Tae1AL165D10REV1 | Hydrolase, alpha/beta fold family protein | NP_194145 | Arabidopsis thaliana | 3E−19
Tae1AL186G19FOR1 | F-box domain containing protein | ACG34427.1 | Z. mays | 4E−18
Tae1AL58M18FOR1 | Proline iminopeptidase | AAU03101.1 | O. sativa | 2E−12
Tae1AL104A09FOR1 | Signal transduction protein | AAM92822.1 | O. sativa | 4E−10

Repeat-masked BES sequences of 200 bp or longer that matched known expressed sequences were identified as described in “Materials and methods.” Functional annotations were taken from the highest-scoring annotated BLASTX hit among non-redundant proteins. Transposable element proteins and hypothetical proteins were excluded from the analysis

Development of molecular markers from BAC end sequences of chromosome 1AL

BAC end sequences can provide a rich source of potential molecular markers for mapping and sequencing projects. Among these, simple sequence repeats (SSRs) have long been established as a useful source of polymorphism between closely related plant cultivars. More recently, ISBPs, located at the junctions between repetitive elements and unique sequences, have been proposed as a valuable and almost limitless source of genetic markers for highly repetitive genomes such as that of T. aestivum (Paux et al. 2006, 2010). The 1AL BESs were analysed for the presence of both SSR and ISBP-type sequences. A total of 433 SSRs were identified within the BESs, from which it was possible to design 362 viable primer pairs (see Online resource 1). Suitable junctions for designing ISBP markers numbered 9,338 (see Online resource 2), distributed across 6,948 of the BESs. Of these ISBPs, 147 (1.57%) also incorporated an SSR, which may increase their chances of containing polymorphisms (listed in Online resource 3). To test the utility of these markers, PCR screens were carried out for 44 putative markers (eight SSRs, 26 ISBPs, and ten ISBPs incorporating SSRs). Of these, 23 (52.3%) correctly amplified the BAC against which they were designed when screened against multiple pools of different 1AL BAC clones (results summarized in Online resource 4).

Developing genetic markers for T. aestivum is complicated by its hexaploid genome, in which markers often amplify multiple similar loci on homoeologous chromosomes. Therefore, 18 of the successful markers were used to amplify gDNA from the cultivars Chinese Spring and Renan, to test for size polymorphisms, and from nullisomic lines for 1A, to test whether the marker locus was specific to this chromosome. Additionally, gDNA from T. monococcum, a diploid relative of the A-genome progenitor, was screened. Typical results are shown in Fig. 2. Of the 18 markers, eight were specific to chromosome 1A (e.g., ISR13 and bISBP1; Fig. 2), while four appeared to have size polymorphisms between Chinese Spring and Renan (e.g., ISR19 and bISBP5; the bands for ISR19 were reproducible but faint, so some PCR optimization is required). In contrast, 11 of the 18 markers were polymorphic between Chinese Spring and T. monococcum, as might be expected from the greater genetic distance in this case. The efficiency of marker development from SSR and ISBP sequences appeared to be similar, although a larger sample set is required to draw statistically meaningful conclusions.
Fig. 2

Analysis of specificity of molecular markers generated from BAC ends. PCR products for four typical markers were separated on 2% agarose gels. Control PCR reactions with no template (-ve) were also carried out. Genomic DNA from T. monococcum and two T. aestivum cultivars, Renan and Chinese Spring (CS), was used as template, along with N1A-T1B (nullisomic for chromosome 1A, tetrasomic for 1B) and N1A-T1D (nullisomic for 1A, tetrasomic for 1D). Size standards are GeneRuler 1 kb Ladder Plus (Fermentas GmbH, St. Leon-Rot, Germany), and expected sizes for each marker are indicated by open arrowheads. Markers labelled ISR are ISBPs that contain an SSR; those labelled bISBP are simple ISBPs

Syntenic relationships between chromosome arm 1AL and other grass genomes

The complete genome sequences of rice, sorghum and, most recently, the model grass B. distachyon provide a valuable resource for mapping the genomes of related grass species. Mayer et al. (2009) used similarity with rice and sorghum coding sequences to integrate 454 shotgun reads and EST sequences with a genetic map of barley chromosome 1H. Similarly, we searched the non-repetitive BES sequences against the complete coding sequences (CDS) of B. distachyon, O. sativa, and S. bicolor. Of the BAC clones used in this study, 101 gave significant matches to at least one CDS from a fully sequenced grass species. The BESs were then used to search putative full-length Triticeae transcripts deposited in TriFLDB, and the hits were compared with the other grass genomes, identifying a further 34 clones that matched a conserved CDS (Fig. 3a). The largest number of conserved CDS was found in B. distachyon (112/135), supporting its utility as a model organism for wheat.
Fig. 3

Syntenic relationships of putative gene sequences. a Venn diagram showing the number of CDS/full-length transcripts from each indicated grass species with homologs in the repeat-masked BES sequences. b Diagram showing the relationships between syntenic blocks from B. distachyon, O. sativa, and S. bicolor represented on chromosome 1AL. Approximate positions of blocks on each chromosome and inversions are indicated by the coloured lines

In total, 95 of the 1AL BESs gave hits in two or more sequenced grass species. These matches were compared and grouped into syntenic blocks wherever more than three sequences were present in the same order in at least two of the genomes (Fig. 3b). This revealed two major blocks of synteny, corresponding to Brachypodium chromosome 2/Oryza chromosome 5/Sorghum chromosome 9 and Brachypodium chromosome 3/Oryza chromosome 10/Sorghum chromosome 1, which match the main syntenic regions also identified on barley chromosome 1H (Mayer et al. 2009). In addition, three smaller syntenic blocks were identified, which may represent shorter chromosome fragments translocated into chromosome arm 1AL during the evolution of T. aestivum. The full list of relationships is given in Online resource 4 and the syntenic blocks in Online resource 5.
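The grouping criterion above (more than three sequences in the same order on at least two genomes) can be formalised in several ways. One possible sketch, which is our formalisation and not necessarily the authors' procedure, finds the longest collinear run (forward or inverted) for each pair of reference chromosomes using a longest-increasing-subsequence search. Because the order of BESs along 1AL is not known from BES data alone, collinearity is assessed between the two reference genomes. The sketch reports at most one block per chromosome pair, which is a simplification. Function names and the chromosome labels in the test data are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def longest_increasing(items, key):
    """Longest strictly increasing subsequence of `items` by `key` (O(n log n))."""
    tails, tail_idx = [], []      # smallest tail key / item index for each run length
    prev = [None] * len(items)    # back-pointers used to rebuild the run
    for i, item in enumerate(items):
        k = key(item)
        j = bisect_left(tails, k)
        if j == len(tails):
            tails.append(k)
            tail_idx.append(i)
        else:
            tails[j] = k
            tail_idx[j] = i
        prev[i] = tail_idx[j - 1] if j > 0 else None
    run, i = [], (tail_idx[-1] if tail_idx else None)
    while i is not None:
        run.append(items[i])
        i = prev[i]
    return run[::-1]


def syntenic_blocks(hits, min_genes=4):
    """Collinear blocks of BES hits shared by two reference genomes.

    `hits` holds (bes_id, chrom_a, pos_a, chrom_b, pos_b): positions of the gene
    matched by one BES in genome A and in genome B. min_genes=4 encodes
    "more than three sequences".
    """
    by_pair = defaultdict(list)
    for bes, chrom_a, pos_a, chrom_b, pos_b in hits:
        by_pair[(chrom_a, chrom_b)].append((pos_a, pos_b, bes))
    blocks = []
    for (chrom_a, chrom_b), genes in by_pair.items():
        genes.sort()  # order by position in genome A
        fwd = longest_increasing(genes, key=lambda g: g[1])
        rev = longest_increasing(genes, key=lambda g: -g[1])  # inverted in genome B
        best, strand = (fwd, "+") if len(fwd) >= len(rev) else (rev, "-")
        if len(best) >= min_genes:
            blocks.append((chrom_a, chrom_b, strand, [g[2] for g in best]))
    return blocks
```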

For all sequences that showed a syntenic relationship, the predicted functions of the orthologs in other grass species were assigned from their gene ontology (GO) annotations at Gramene (2011). Of the 95 syntenic sequences, 25 had no known function; the remainder were grouped by function (Table 2). In several cases, two BES sequences mapped to the same syntenic gene, which may indicate that they were derived from overlapping BAC clones or that the gene has been duplicated in wheat. Of note was the relatively high representation of protein kinases (8/95), which may indicate a cluster of these genes on chromosome 1AL or that the wheat genome as a whole contains a large number of these signalling molecules. Also of interest are the putative stress response and carbohydrate metabolism genes, which may underlie quantitative trait loci (QTLs) that have been mapped to chromosome 1AL.
Table 2

Functional annotation of syntenic genes found on chromosome 1AL

Process | BES sequence | Syntenic gene | Predicted function
Transcription regulation | Tae1AL106N07FOR1, Tae1AL106O14FOR1 | Bradi1g27170 | CG-1 domain TF
Transcription regulation | Tae1AL199P15REV1 | Bradi1g78230 | GRAS-family TF
Transcription regulation | Tae1AL169C17FOR1 | Bradi2g22050 | NOR1 TF
Transcription regulation | Tae1AL156G04REV1 | Bradi4g25940 | Histone methyltransferase
Transcription regulation | Tae1AL198C17REV1 | Bradi1g14550 | Transcription factor B3
Transcription regulation | Tae1AL220L12REV1 | Bradi3g26800 | GCN5-related N-acetyltransferase
Carbohydrate metabolism | Tae1AL23B19REV1 | Bradi1g24660 | UDP-glucosyl transferase
Carbohydrate metabolism | Tae1AL183C14REV1, Tae1AL83K17FOR1 | Bradi2g18810 | Starch synthase
Carbohydrate metabolism | Tae1AL98B11REV1 | Bradi5g13560 | O-Glycosyl hydrolase
Cytoskeleton and vesicle transport | Tae1AL148H06FOR1 | Bradi1g54510 | Clathrin/coatomer adaptor like
Cytoskeleton and vesicle transport | Tae1AL181M03FOR1 | Bradi2g25210 | Rab GTP dissociation inhibitor
Cytoskeleton and vesicle transport | Tae1AL143K09FOR1 | Bradi2g17090 | Dynamin GTPase
Cytoskeleton and vesicle transport | Tae1AL194K14REV1 | Bradi3g33850 | Dynamin GTPase
Cytoskeleton and vesicle transport | Tae1AL146C16REV1 | Bradi1g10150 | Alpha tubulin
Cytoskeleton and vesicle transport | Tae1AL137N08FOR1 | Bradi2g10970 | Alpha tubulin
Cytoskeleton and vesicle transport | Tae1AL146C16REV1 | Os11g14220 | Tubulin
Cytoskeleton and vesicle transport | Tae1AL223I01REV1, Tae1AL235G09REV1 | Bradi2g15630 | Vinculin (cell adhesion)
Membrane transport | Tae1AL188F06FOR1 | Bradi1g62050 | Sulphate transporter
Membrane transport | Tae1AL196N09FOR1 | Bradi2g25950 | Cobalt transporter
Membrane transport | Tae1AL190C18FOR1 | Bradi2g17260 | MatE multi-drug transporter
Membrane transport | Tae1AL229L12FOR1 | Bradi5g14680 | Tetracycline-proton antiporter
Membrane transport | Tae1AL235K08FOR1 | Bradi2g54410 | Cation efflux transporter
Membrane transport | Tae1AL206M17FOR1 | Os05g27100 | Heavy metal transport
Nucleic acid modification | Tae1AL104E02FOR1 | Bradi1g75860 | RRM-RNP1 (NA binding)
Nucleic acid modification | Tae1AL139E06FOR1 | Bradi2g28110 | 3′–5′ Exonuclease
Nucleic acid modification | Tae1AL202J19REV1 | Bradi2g14147 | Nucleic acid binding
Nucleic acid modification | Tae1AL228J01FOR1 | Bradi3g34030 | DNA-binding
Nucleic acid modification | Tae1AL199F10FOR1, Tae1AL159E14FOR1 | Bradi3g30250 | DEAD-box RNA helicase
Nucleic acid modification | Tae1AL109L13REV1 | Bradi1g36730 | DEAD-box RNA helicase
Signal transduction | Tae1AL236M05REV1 | Bradi2g26680 | Protein kinase
Signal transduction | Tae1AL146D13FOR1 | Bradi2g24800 | ANTH (phospholipid binding)
Signal transduction | Tae1AL224D21FOR1, Tae1AL233P03FOR1 | Bradi2g22250 | MARCKS (calmodulin binding)
Signal transduction | Tae1AL233H19REV1, Tae1AL80I10FOR1 | Bradi2g19380 | Protein kinase
Signal transduction | Tae1AL124J11FOR1 | Bradi2g16627 | GTP binding/GTPase
Signal transduction | Tae1AL161L22REV1 | Bradi2g15900 | Ser/Thr protein kinase
Signal transduction | Tae1AL240P05REV1 | Bradi3g31110 | Protein kinase
Signal transduction | Tae1AL83E04FOR1 | Bradi2g01180 | Protein kinase
Signal transduction | Tae1AL56H20REV1 | Bradi2g05710 | Protein kinase
Signal transduction | Tae1AL20O06REV1 | Os10g19160 | Protein kinase
Signal transduction | Tae1AL137K21FOR1 | Bradi1g32800 | Glutamate receptor
Stress responses | Tae1AL38K24FOR1 | Bradi3g34450 | DnaJ heat-shock protein
Stress responses | Tae1AL213C11FOR1 | Bradi1g61550 | Peroxidase
Stress responses | Tae1AL139B17REV1 | Os11g47780 | Apoptosis/defense response
Cell metabolism | Tae1AL162F05REV1 | Bradi2g25710 | Cytochrome P450 B
Cell metabolism | Tae1AL228O02FOR1 | Bradi2g18550 | Carbamoyl phosphate synthase
Cell metabolism | Tae1AL141M03REV1 | Bradi2g14210 | Nucleoside diphosphate kinase
Cell metabolism | Tae1AL166A05FOR1 | Bradi3g24730 | Proton-coupled ATP synthase
Cell metabolism | Tae1AL200D23REV1 | Bradi1g32100 | Phosphoribosylformylglycinamidine synthase
Cell metabolism | Tae1AL230D05FOR1 | Bradi2g43020 | Crotonase
Cell metabolism | Tae1AL173F19FOR1 | Bradi3g11270 | Pyruvate, phosphate dikinase
Cell metabolism | Tae1AL190D02FOR1 | Bradi3g36540 | Lipase 3
Cell metabolism | Tae1AL104H13FOR1 | Os04g16740 | Proton-coupled ATP synthase
Cell metabolism | Tae1AL57G21FOR1 | Bradi2g19830 | O-Methyltransferase
Protein synthesis and degradation | Tae1AL106C20REV1 | Bradi2g25260 | Peptidase S1_S6
Protein synthesis and degradation | Tae1AL160D24FOR1 | Bradi2g21280 | Cystatin
Protein synthesis and degradation | Tae1AL240I04FOR1 | Bradi2g17730 | TFIIB-related (translation initiation)
Protein synthesis and degradation | Tae1AL63N10FOR1 | Bradi3g20790 | Ubiquitin-dependent peptidase C19
Protein synthesis and degradation | Tae1AL165N15REV1 | Bradi3g28280 | Ubiquitin ligase
Protein synthesis and degradation | Tae1AL168J12REV1 | Bradi3g30640 | Serine peptidase S28
Protein synthesis and degradation | Tae1AL18M03FOR1 | Bradi5g16960 | Peptidase C13
Protein synthesis and degradation | Tae1AL98B05REV1 | Bradi2g20340 | Signal recognition particle
Protein synthesis and degradation | Tae1AL56K24REV1 | Bradi2g38310 | IF2 (translation initiation)
Protein synthesis and degradation | Tae1AL201A17REV1 | Bradi3g19100 | SecY protein translocase
Protein synthesis and degradation | Tae1AL145A05REV1 | Bradi3g38610 | Ubiquitin ligase
Unknown | 25 sequences | | 

Syntenic genes (those found in at least two sequenced grass species) that are also present in 1AL were identified as described in “Results.” Where a syntenic Brachypodium gene was found, this gene is indicated; where genes were present in rice and sorghum but not Brachypodium, the rice gene is indicated

Discussion

The construction of BAC libraries from chromosome arm 1AL marks a step forward in developing large-insert libraries from all chromosome arms of hexaploid wheat. These subgenomic resources provide an entry point into the complex genome and facilitate its mapping, positional cloning and sequencing (Paux et al. 2008). The analysis of 7.57 Mb of BES from wheat chromosome arm 1AL presented here allows interesting comparisons with previous studies on other Triticeae chromosomes. The GC content (44.7%) was very similar to that previously reported for BESs from chromosome 3B (44.5%; Paux et al. 2006), and the repeat composition was also very similar, whereas data from the D genome progenitor Aegilops tauschii showed significantly different proportions (Li et al. 2004; Fig. 1). The consistency of the data from 1AL and 3B suggests that the global composition of the A and B genomes is very similar and that BES studies provide a good representation of the whole chromosome.

By sequence similarity to plant ESTs, the transcribed portion of 1AL was estimated to be 1.03% of the total sequence, or a cumulative length of about 5.4 Mb. The average length of 6,137 full-length cDNAs from wheat was 1,143 bp (Mochida et al. 2009), which suggests the presence of about 4,700 genes on chromosome arm 1AL at an average density of one gene per 110 kb, and about 50,000 genes for the entire A genome. These values are close to those previously reported, although the short length of BESs makes it difficult to distinguish intact genic sequences from pseudogenes, so these figures may be overestimates.
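The chain of estimates in this paragraph can be reproduced as follows. The total BES length of 7,569,535 bp is derived here as 13,445 × 563 bp (the text gives 7.57 Mb), and the results agree with the text to within rounding. The constant names are ours.

```python
ARM_1AL_BP = 523_000_000   # 1AL size (Šafář et al. 2010)
BES_TOTAL_BP = 7_569_535   # derived: 13,445 reads x 563 bp mean edited length
CODING_HIT_BP = 78_248     # cumulative hit length against PUTs
MEAN_CDNA_BP = 1_143       # mean wheat full-length cDNA length (Mochida et al. 2009)

transcribed_fraction = CODING_HIT_BP / BES_TOTAL_BP      # share of BES bases that are genic
transcribed_bp = transcribed_fraction * ARM_1AL_BP       # extrapolated to the whole arm
genes = transcribed_bp / MEAN_CDNA_BP                    # one mean-length cDNA per gene
density_kb = ARM_1AL_BP / genes / 1000                   # average spacing between genes
print(f"{transcribed_fraction:.2%} of BES, {transcribed_bp / 1e6:.1f} Mb transcribed, "
      f"~{genes:,.0f} genes, one per {density_kb:.0f} kb")
```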

Complete sequencing of wheat chromosomes and marker-assisted selection both require a high density of molecular markers that can distinguish between similar repetitive sequences (Paux et al. 2010). The most densely populated physical map currently available for 1AL consists of 334 deletion-bin-mapped ESTs (Peng et al. 2004). However, the small number of deletion bins in 1AL (only three) limits the resolution of this map. The probable order of a minority of these ESTs can be inferred from their relationships with syntenic regions in the sequenced genomes of rice and other grasses (Quraishi et al. 2009), thus converting them into conserved orthologous set markers. However, many additional markers are still required to saturate the chromosome. Using the 1AL BES dataset, we were able to generate hundreds of putative SSR markers and thousands of putative ISBP markers. When a subset of these was tested for specificity, 23/44 (52.3%) were found to be useful. The 21 putative markers that were not useful were mostly rejected due to lack of specificity, which is a known risk with ISBP markers: because one primer of each pair binds to a repetitive sequence, the other must be highly specific to ensure accurate amplification. Of the successful markers, 4/18 (11.6% of all the markers tested) showed size polymorphisms between Chinese Spring and Renan. If the remaining ISBP and SSR markers presented here are developed at the same success rate, about 3,750 useful markers could be generated to populate future physical and genetic maps of chromosome 1AL, of which about 830 would be expected to show size polymorphisms in a Chinese Spring × Renan cross. In addition, many more of the new markers may be polymorphic between Chinese Spring and other cultivars, or may contain single-nucleotide polymorphisms (SNPs) not identified in this study. Indeed, sequencing of 157 ISBPs from wheat chromosome 3B in eight different wheat lines (Paux et al. 2010) revealed polymorphisms between at least two of the lines for 67% of the ISBPs, the great majority of which were SNPs. Another advantage of the markers generated here is that, having been designed from minimum tiling path BESs, they should be evenly distributed across the whole chromosome.
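The marker projection applies the success rates observed in the PCR screen to the untested candidates. The sketch below reproduces that arithmetic. The text does not say exactly which pool of candidate markers the figure of about 3,750 useful markers was derived from, so the example starts the second step from that quoted figure; `expected_useful` is a generic helper, and both function names are illustrative.

```python
from fractions import Fraction

# Rates observed in the PCR screen described in the text.
SUCCESS_RATE = Fraction(23, 44)      # putative markers that proved useful
POLYMORPHISM_RATE = Fraction(4, 18)  # successful markers polymorphic (Chinese Spring vs Renan)


def expected_useful(putative: int) -> float:
    """Useful markers expected if `putative` candidates succeed at the observed rate."""
    return float(putative * SUCCESS_RATE)


def expected_polymorphic(useful: float) -> float:
    """Size-polymorphic markers expected among `useful` markers."""
    return useful * float(POLYMORPHISM_RATE)


print(f"success rate: {float(SUCCESS_RATE):.1%}")
print(f"polymorphic among ~3,750 useful: ~{expected_polymorphic(3750):.0f}")
```

Keeping the rates as exact fractions avoids compounding rounding errors. With only 44 and 18 markers in the screen, these projections carry wide uncertainty.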

Similarly, the BESs allowed us to identify syntenic blocks located on chromosome arm 1AL. The major regions of synteny were those expected from previous analysis of the related barley chromosome 1H (Mayer et al. 2009). However, three smaller syntenic blocks were also identified that most likely represent small translocations from other chromosomes. Using these syntenic relationships, it was possible to identify genes in the sequenced grass species that are likely to be present on chromosome arm 1AL, in addition to those identified by similarity to ESTs. These relationships will be useful for identifying candidate genes for QTLs that have been mapped to this chromosome. For example, Li et al. (2010) identified a QTL for tolerance to photo-oxidative stress on 1AL, and our study detected a putative peroxidase and a cytochrome P450 protein that may be involved in the oxidative stress response. Yield is such a complex trait that QTLs influencing yield-related traits have been mapped to most wheat chromosomes, but meta-QTL analysis identified two yield-enhancing MQTLs on chromosome 1AL (Zhang et al. 2010). In this regard, the four carbohydrate metabolism genes identified here are of particular interest for further study, as yield is affected by the balance of starch production and degradation in the grain.

One important locus associated with grain quality QTLs that have been mapped to chromosome 1AL is the high-molecular-weight glutenin gene Glu-A1 (Kuchel et al. 2006). The bulk of the Glu1 protein sequence is made up of two repeated motifs (consensus sequences PGQGQQ and GYYPTSLQQ), and the gene is not found in Brachypodium or rice. However, when the non-repetitive sections of the Glu1 protein sequence (the N-terminal 120 residues and the C-terminal 45 residues) are used in similarity searches against these two species, their best matches are Bradi2g20870 (prolamin subfamily 2) and Os05g41970 (SSA1 family protein), respectively. Both of these putative proteins are involved in seed storage and lie in the same syntenic block identified in this study (block III, between BACs Tae1AL160D24 and Tae1AL13G19). It is therefore possible that Glu-A1 is evolutionarily related to these two genes and located in the same region, which we will be able to characterise more closely once the 1AL physical map is finalised.
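Separating the non-repetitive termini from the repetitive core before a similarity search can be sketched as follows. The 120- and 45-residue lengths and the two consensus motifs come from the text. The test sequence is synthetic, not the real Glu-A1 protein. Exact-match counting is a simplification, since real glutenin repeats are degenerate variants of the consensus.

```python
MOTIFS = ("PGQGQQ", "GYYPTSLQQ")  # repeat consensus motifs named in the text


def non_repetitive_termini(protein: str, n_len: int = 120,
                           c_len: int = 45) -> tuple[str, str]:
    """Return the N- and C-terminal segments to use as similarity-search queries."""
    if len(protein) < n_len + c_len:
        raise ValueError("sequence shorter than the two terminal segments")
    return protein[:n_len], protein[-c_len:]


def motif_counts(protein: str) -> dict[str, int]:
    """Count non-overlapping exact occurrences of each consensus motif."""
    return {motif: protein.count(motif) for motif in MOTIFS}


# Synthetic example: 120-residue N-terminus, a short repetitive core, 45-residue C-terminus.
toy = "M" * 120 + "PGQGQQ" * 3 + "GYYPTSLQQ" * 2 + "K" * 45
n_term, c_term = non_repetitive_termini(toy)
print(len(n_term), len(c_term))  # 120 45
print(motif_counts(toy))         # {'PGQGQQ': 3, 'GYYPTSLQQ': 2}
```

The two termini would then be submitted separately to a protein similarity search against the Brachypodium and rice proteomes.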

Acknowledgments

We are grateful to Prof. B. S. Gill (Kansas State University, Manhattan, USA) for seeds of the double ditelosomic line 1A of wheat T. aestivum L. cv. Chinese Spring. We thank our colleagues, Dr. Jarmila Číhalíková, Dr. Marie Kubaláková and Romana Šperková, Bc. for chromosome sorting; Jana Dostálová, Bc., Radka Tušková, Helena Tvardíková, and Dr. Marie Seifertová for excellent technical assistance in BAC library construction; and Z. Weinstein for help with the manuscript. The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007–2013) under grant agreement no. FP7-212019.

Supplementary material

Online resource 1 (10142_2011_250_MOESM1_ESM.xlsx): Excel spreadsheet of 362 putative SSR markers identified in chromosome 1AL BES sequences. (XLSX 42.3 kb)
Online resource 2 (10142_2011_250_MOESM2_ESM.xlsx): Excel spreadsheet of 9,338 putative ISBP markers identified in chromosome 1AL BES sequences, including details of repetitive element junctions. (XLSX 1.71 mb)
Online resource 3 (10142_2011_250_MOESM3_ESM.xlsx): Excel spreadsheet of 147 putative ISBP markers that incorporate an SSR from 1AL BES sequences, including details of repetitive element junctions and microsatellite sequences. (XLSX 42.7 kb)
Online resource 4 (10142_2011_250_MOESM4_ESM.xlsx): Excel spreadsheet of primer pairs used to test 26 ISBP, eight SSR, and ten combined ISBP/SSR markers in PCR screens, along with a summary of results from amplification of both pooled BAC clones and gDNA from wheat cultivars and nullitetrasomic lines. (XLSX 14.1 kb)
Online resource 5 (10142_2011_250_MOESM5_ESM.docx): Table of syntenic relationships of masked 1AL BES sequences. All BES that had significant homology to coding regions in at least two sequenced grass species are shown. CDS that are out of syntenic sequence in one species are highlighted. (DOCX 21.1 kb)

Copyright information

© Springer-Verlag 2011