Functional features of a single chromosome arm in wheat (1AL) determined from its structure
- Cite this article as:
- Lucas, S.J., Šimková, H., Šafář, J. et al. Funct Integr Genomics (2012) 12: 173. doi:10.1007/s10142-011-0250-3
Bread wheat (Triticum aestivum L.) is one of the most important crops globally and a high priority for genetic improvement, but its large and complex genome has been seen as intractable to whole genome sequencing. Isolation of individual wheat chromosome arms has facilitated large-scale sequence analyses. However, so far there is no such survey of sequences from the A genome of wheat. Greater understanding of an A chromosome could facilitate wheat improvement and future sequencing of the entire genome. We have constructed a BAC library from the long arm of T. aestivum chromosome 1A (1AL) and obtained BAC end sequences from 7,470 clones encompassing the arm. We obtained 13,445 (89.99%) useful sequences with a cumulative length of 7.57 Mb, representing 1.43% of 1AL and about 0.14% of the entire A genome. The GC content of the sequences was 44.7%, and 90% of the chromosome was estimated to comprise repeat sequences, while just over 1% encoded expressed genes. From the sequence data, we identified a large number of sites suitable for development of molecular markers (362 SSR and 6,948 ISBP), which will have utility for mapping this chromosome and for marker-assisted breeding. Of 44 putative ISBP markers tested, 23 (52.3%) were found to be useful. The BAC end sequence data also enabled the identification of genes and syntenic blocks specific to chromosome 1AL, suggesting regions of particular functional interest and targets for future research.
Keywords: Wheat, A genome, BAC end sequencing, Comparative genomics, Marker design
Bread wheat (Triticum aestivum) is one of the most important crop species, with global annual production currently over 600 million tonnes providing approximately one fifth of the world’s total calorific input (data from The Food and Agriculture Organization of the United Nations 2009). Continually raising the yield potential of wheat to match human population growth and stabilizing yield against the damaging effects of climate change is a top priority for agricultural science (Reynolds et al. 2009). While sequencing of the wheat genome would be of great utility for gene discovery and mapping of traits required for yield improvement, it has been perceived as too difficult owing to its size and complexity. At an estimated 17 Gb the wheat genome is 40 times larger than that of rice and contains about 80% repetitive sequence (Smith and Flavell 1975). Furthermore, as an allohexaploid (2n = 6x = 42) many sequences are present in three similar but different copies on each of the homoeologous genomes A, B, and D, further complicating genome-wide sequence analysis. Therefore, the majority of the wheat genomic sequences found in the public databases are those generated during targeted cloning projects or comparative studies of important traits (reviewed in Feuillet and Salse 2009), which necessarily focus on gene-rich segments.
Recently, methods of purifying the individual wheat chromosomes and producing chromosome-specific BAC libraries have been developed (Doležel et al. 2007, Šafář et al. 2010). By treating each chromosome individually, the complexities of physical mapping and sequence assembly can be greatly reduced. Additionally, high-throughput protocols for sequencing the ends of BAC clones (Kelley et al. 1999) enable the generation of large datasets of BAC end sequence (BES) distributed randomly across the whole genome, and therefore more representative of the total genome composition. Combining these approaches, Paux et al. (2006) used BES derived from wheat chromosome 3B to assess the composition and structure of the wheat genome.
Furthermore, BESs are a valuable source of molecular markers. Using BACs representing a minimum tiling path of soybean, Shultz et al. (2007) used BESs to develop new microsatellite markers. The wheat chromosome 3B BESs have been used to develop 711 chromosome-specific molecular markers based on transposable element insertion sites (insertion site-based polymorphism (ISBP)) (Paux et al. 2010). BAC end sequencing of rye (Secale cereale) chromosome 1RS, which is frequently translocated into wheat, allowed the development of 33 chromosome-specific SSR and ISBP markers at a success rate of better than 50%, with over 200 more potential marker sequences still to be tested (Bartoš et al. 2008).
Next-generation sequencing technologies provide the opportunity to obtain wheat genome sequences at a scale that has not previously been possible. Most notably sequencing of low-copy and genic regions of the short arm of wheat chromosome 7D (Berkman et al. 2011) and complete sequencing and assembly of 13 BAC contigs comprising 18 Mb of chromosome 3B (Choulet et al. 2010) have now been completed. We present BAC end sequencing of 7,470 BACs distributed across the long arm of wheat chromosome 1A (1AL), which provides insight into the composition of this chromosome and by extension to the rest of the A genome. In addition, the development of molecular markers for this chromosome and analysis of synteny between wheat 1AL and the complete genome sequences of rice, sorghum and Brachypodium distachyon are presented.
Materials and methods
Purification of chromosome arm 1AL by flow cytometric sorting
Liquid suspensions of intact mitotic chromosomes were prepared from a double ditelosomic line (2n = 40 + 2t 1AS + 2t 1AL) of T. aestivum L. cv. “Chinese Spring” according to Vrána et al. (2000). The samples were stained with DAPI and both the short (1AS) and the long (1AL) arms, maintained in the line as telocentric chromosomes, were purified simultaneously by flow cytometric sorting as described by Kubaláková et al. (2002). The 1AL arms were sorted in batches of 200,000 into 320 μl of 1.5 × IB buffer (Šimková et al. 2003). The purity of the sorted fractions was checked regularly by FISH as described in Janda et al. (2006) using probes for the telomeric repeat and the GAA repeat.
Construction of BAC libraries
Two 1AL-specific BAC libraries were constructed according to Šimková et al. (2011). Briefly, isolated HMW DNA was partially digested with HindIII (New England Biolabs, Beverly, MA, USA) and subjected to two rounds of size selection. DNA of particular size fractions was electroeluted from the gel and ligated into HindIII-digested dephosphorylated pIndigoBAC-5 vector (Epicentre, Madison, WI, USA). The recombinant vector was used to transform Escherichia coli ElectroMAX DH10B (TaaCsp1ALhA library) and MegaX DH10B (TaaCsp1ALhB library) competent cells (Invitrogen, Carlsbad, CA, USA), respectively. The libraries were ordered by Qbot (Genetix, New Milton, UK) into 384-well plates filled with 75 μl freezing medium consisting of 2YT, 6.6% glycerol and 12.5 μg/ml chloramphenicol. The clones were stored at −80°C. In order to estimate average insert sizes, a total of 160 BAC clones from the TaaCsp1ALhA library and 120 BAC clones from the TaaCsp1ALhB library were randomly selected from all size fractions of the libraries and analysed as described in Janda et al. (2006).
BAC end sequencing
Clones selected for BAC end sequencing were cultured overnight, and BAC DNA isolated using routine alkaline lysis miniprep techniques.
Sequencing reactions were set up using Big Dye Terminator chemistry (Applied Biosystems, Foster City, CA, USA) according to the manufacturer’s instructions. Both ends of each BAC were sequenced using universal M13 forward (5′CAGGAAACAGCTATGACC3′) and reverse (5′TGTAAAACGACGGCCAGT3′) primers, and a 3730xl DNA Analyser (Applied Biosystems). Chromatogram traces were base called and scored for quality using PHRED (Ewing and Green 1998, Ewing et al. 1998).
Annotation of chromosome 1AL sequences
DNA Repeat sequences were downloaded from the following databases: TREP release 10 (http://220.127.116.11/ITMI/Repeats/); Repbase Update release 15.11 (Jurka et al. 2005; http://www.girinst.org/repbase/index.html); and the TIGR plant repeat databases (Ouyang and Buell 2004; http://plantrepeats.plantbiology.msu.edu/index.html). For analysis of syntenic sequences, B. distachyon CDS (genome annotation v1.2) were downloaded from the B. distachyon project (International Brachypodium Initiative 2010, http://mips.helmholtz-muenchen.de/plant/brachypodium). Sorghum bicolor CDS and Oryza sativa transcripts (genome annotation v6.1) were obtained from PlantGDB (Dong et al. 2005, http://www.plantgdb.org). For gene identification, 1,616,584 PlantGDB-assembled Unique Transcripts (PUTs) were obtained from the same source. PUTs were taken from the following plant species: Arabidopsis thaliana (543,450 PUTs), B. distachyon (30,991), Glycine max (259,849), Hordeum vulgare (134,482), O. sativa (146,642), S. cereale (5,977), S. bicolor (44,954), T. aestivum (301,765), Triticum monococcum (6,987), and Zea mays (181,717). In addition, 15,871 full-length Triticeae CDS were obtained from TriFLDB (http://trifldb.psc.riken.jp/index.pl).
Known DNA repeat sequences were searched for using RepeatMasker version 3.2.9 (http://www.repeatmasker.org) with the CrossMatch algorithm (Green 1996; http://www.phrap.org/phredphrapconsed.html) to locate alignments. All other similarity searches were carried out using the BLAST+ software suite downloaded from the NCBI (Camacho et al. 2009). Putative SSR markers were identified using SciRoKo v3.4 (Kofler et al. 2007) and ISBP markers using IsbpFinder.pl (Paux et al. 2010). PCR primers were designed using Primer3 (Rozen and Skaletsky 2000).
Identifying repetitive elements
A semi-automated pipeline was used to identify and mask repetitive elements from the BES. First of all, three consecutive runs of RepeatMasker were carried out using default settings with three different custom libraries in the following order: TREPall, Repbase Update, TIGR plant repeats. Sequences matching known repeats were masked with an ‘N’. Putative unknown repeats were then identified by searching masked BES with BLASTN against themselves and marked as repeats if they gave three or more hits of 50 bp or more at >80% identity. Repeats were classified according to the system proposed by Wicker et al. (2007).
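The self-comparison rule for putative unknown repeats can be sketched in Python as follows. This is only an illustration, not the pipeline used in the study: it assumes the BLASTN self-search was written in tabular format (`-outfmt 6`), it assumes that a sequence matching itself does not count as a hit, and the function name `flag_putative_repeats` is hypothetical.

```python
from collections import defaultdict


def flag_putative_repeats(blast_lines, min_hits=3, min_len=50, min_identity=80.0):
    """Return the IDs of query sequences that look like unknown repeats.

    Each input line is expected in BLAST tabular format (-outfmt 6), whose
    first four columns are: qseqid, sseqid, pident, length.
    A query is flagged when it has at least `min_hits` hits of at least
    `min_len` bp at more than `min_identity` percent identity.
    """
    hit_counts = defaultdict(int)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if query == subject:
            continue  # assumption: self-matches are not evidence of a repeat
        if length >= min_len and identity > min_identity:
            hit_counts[query] += 1
    return {query for query, count in hit_counts.items() if count >= min_hits}
```

The thresholds are exposed as parameters so that the stated criteria (three hits, 50 bp, 80% identity) can be varied when exploring how sensitive the repeat estimate is to them.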
Gene and synteny analysis
The repeat-masked sequences were then used in BLASTN searches against the PUTs, CDS and transcript sequences mentioned above at an e value cutoff of 1e−30. Proportion of the chromosome involved in coding sequences was derived from the cumulative match length. For those putative genes that gave a match of more than 200 bp in length, BLASTX searches (e value = 1e−10) were carried out against all non-redundant protein sequences. Hits corresponding to transposable element proteins and hypothetical proteins were removed from the analysis. For synteny analysis, only hits of longer than 50 bp were considered.
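One way to derive a coding proportion from cumulative match length is sketched below. The original text does not say how overlapping hits were handled, so this sketch makes the assumption that overlapping or adjacent matches on the same BES are merged before their lengths are summed; the function name `coding_fraction` and the input tuple layout are hypothetical.

```python
def coding_fraction(hits, total_length, max_evalue=1e-30):
    """Estimate the fraction of sequence covered by significant matches.

    hits: iterable of (query_id, qstart, qend, evalue), with 1-based
          inclusive coordinates on the query (BES) sequence.
    total_length: total length in bp of all BES searched.
    Hits above `max_evalue` are ignored. Overlapping or adjacent hits on
    the same query are merged so that no base is counted twice
    (an assumption of this sketch).
    """
    intervals_by_query = {}
    for query, qstart, qend, evalue in hits:
        if evalue > max_evalue:
            continue
        start, end = sorted((qstart, qend))
        intervals_by_query.setdefault(query, []).append((start, end))

    covered = 0
    for intervals in intervals_by_query.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:
                cur_end = max(cur_end, end)  # extend the current merged block
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
    return covered / total_length
```

Multiplying the returned fraction by the size of the chromosome arm then gives an estimate of the cumulative transcribed length.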
Marker development and testing
Unmasked sequences were searched for SSR markers using SciRoKo while ISBPs were identified using the results of the initial three rounds of repeat masking (against libraries of known repeats) using IsbpFinder.pl. ISBPs were then sorted by hand to select unique junctions with high confidence, meaning that the end of a repetitive element could be clearly identified on one or both sides of the junction. Primer sequences used to test putative SSR and ISBP markers are listed in Online resource 4. PCR reactions were carried out using Taq polymerase and standard protocols (Budak et al. 2005).
Results
BAC cloning of chromosome arm 1AL
Two BAC libraries cloned in two different bacterial strains were prepared from 1AL. A total of 7.7 × 10⁶ flow-sorted chromosome arms were used to construct the first library, named TaaCsp1ALhA. The library, cloned in ElectroMAX DH10B E. coli competent cells (Invitrogen, Carlsbad, CA, USA), comprises 49,536 clones. Considering a 1AL size of 523 Mbp (Šafář et al. 2010), 83% purity of the sorted fraction, 1% of empty clones, and an average insert size of 103 kb, the library provides 8× coverage of the 1AL arm. Aiming to reach the 15× coverage favourable for construction of physical contig maps, a second library (TaaCsp1ALhB) was constructed using bacteriophage-resistant MegaX DH10B competent cells (Invitrogen). This library was made from 6.0 × 10⁶ flow-sorted arms and contains 43,008 clones with a mean insert size of 109 kb. Considering an estimated 87% purity of the sorted fraction and 1% of empty clones, the library represents 7.7 arm equivalents. Thus together the 1AL-specific libraries contain 92,544 clones and provide 15.7× coverage of the T. aestivum chromosome arm 1AL.
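The coverage figures can be checked with a short calculation. The formula below is an assumption, since the text lists the inputs but does not state the formula: the number of clones is reduced by the empty-clone fraction and by the purity of the sorted fraction, multiplied by the mean insert size, and divided by the arm size. The function name `library_coverage` is hypothetical.

```python
def library_coverage(n_clones, mean_insert_kb, purity, empty_fraction, arm_size_mb):
    """Estimate how many arm equivalents a BAC library represents.

    Assumed formula: clones that carry an insert and originate from the
    target arm, times the mean insert size, divided by the arm size.
    """
    useful_clones = n_clones * (1.0 - empty_fraction) * purity
    cloned_mb = useful_clones * mean_insert_kb / 1000.0
    return cloned_mb / arm_size_mb


ARM_1AL_MB = 523  # 1AL size used in the text (Šafář et al. 2010)

cov_a = library_coverage(49_536, 103, 0.83, 0.01, ARM_1AL_MB)  # TaaCsp1ALhA
cov_b = library_coverage(43_008, 109, 0.87, 0.01, ARM_1AL_MB)  # TaaCsp1ALhB
print(f"{cov_a:.1f}x + {cov_b:.1f}x = {cov_a + cov_b:.1f}x")
```

Under this assumed formula the two libraries come out at about 8.0 and 7.7 arm equivalents, 15.7 in total, matching the values reported above.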
BAC end sequencing of wheat chromosome 1AL libraries
High-information content fingerprinting of BAC clones has proved effective in resolving the repetitive nature of wheat genomic sequences and constructing a physical map for chromosome 3B (Paux et al. 2008). Using a similar strategy, we fingerprinted both 1AL-specific BAC libraries, generated a preliminary physical map of chromosome 1AL (unpublished data), and from this selected a minimum tiling path of 7,470 BAC clones, which are expected to cover the entire chromosome. Both ends of each BAC were sequenced and, after eliminating poor quality bases, 13,445 useful sequences (89.99% success rate) with an average edited read length of 563 bp were obtained. This gives a total of 7.57 Mb of sequence distributed across the chromosome, representing 1.43% of the long arm of chromosome 1A. The GC content of the sequenced portion was 44.71%.
Annotation of chromosome 1AL sequences
Protein homologs of predicted genes
Pentatricopeptide repeat-containing protein
Ionotropic glutamate receptor ortholog GLR6
Avr9/Cf-9 rapidly elicited protein
Respiratory burst oxidase
GRAS-family transcription factor containing protein
Verticillium wilt disease resistance protein
Nucleolar protein NOP56 (ribosome biogenesis)
Kinase-interacting protein 1
NB-ARC domain containing protein
Receptor kinase 1
RNA recognition motif family protein
Nascent polypeptide associated complex alpha chain
MATE efflux protein
Transcription factor GAMyb
Binding protein with PPR repeat
Binding protein with PPR repeat
Hydrolase, alpha/beta fold family protein
F-box domain containing protein
Signal transduction protein
Development of molecular markers from BAC end sequences of chromosome 1AL
Syntenic relationships between chromosome arm 1AL and other grass genomes
In total, 95 1AL BESs gave hits in two or more sequenced grass species. These matches were compared, and grouped into syntenic groups where more than three sequences were present in the same order on at least two of the genomes (Fig. 3b). These revealed two major blocks of synteny, corresponding to Brachypodium chromosome 2/Oryza chromosome 5/Sorghum chromosome 9 and Brachypodium chromosome 3/Oryza chromosome 10/Sorghum chromosome 1, which correspond to the main syntenic regions also identified on barley chromosome 1H (Mayer et al. 2009). In addition, three smaller syntenic blocks were identified which may represent shorter chromosome fragments that have been translocated into chromosome arm 1AL during the evolution of T. aestivum. The full list of relationships is given in Online resource 4 and the syntenic blocks in Online resource 5.
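The collinearity test behind such grouping can be illustrated for a single reference genome. This is a simplified sketch, not the procedure used in the study: it assumes the BES hits are already ordered along 1AL (for example by the physical map), it treats a block as a run of consecutive hits on one reference chromosome with strictly increasing or strictly decreasing positions, and it reads "more than three sequences" as a minimum of four. The function name `collinear_runs` and the input layout are hypothetical. Applying the same check to each genome and keeping groups supported by at least two of them would approximate the criterion described above.

```python
def collinear_runs(hits, min_size=4):
    """Find runs of consecutive hits that are collinear with a reference.

    hits: list of (bes_id, chromosome, position), ordered along 1AL.
    A run is extended while hits stay on the same reference chromosome
    and positions keep moving in one direction (all increasing or all
    decreasing). Runs shorter than `min_size` are discarded.
    """
    runs = []
    current = []
    direction = 0  # 0 = not yet known, +1 = increasing, -1 = decreasing
    for hit in hits:
        if current:
            previous = current[-1]
            step = (hit[2] > previous[2]) - (hit[2] < previous[2])
            same_chromosome = hit[1] == previous[1]
            if same_chromosome and step != 0 and direction in (0, step):
                current.append(hit)
                direction = step
                continue
            if len(current) >= min_size:
                runs.append(current)
        # start a new run with this hit
        current = [hit]
        direction = 0
    if len(current) >= min_size:
        runs.append(current)
    return runs
```

Because a run is restarted whenever the direction reverses, inverted segments are reported as separate blocks, which is a deliberate simplification.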
Functional annotation of syntenic genes found on chromosome 1AL
CG-1 domain TF
Transcription factor B3
Cytoskeleton and vesicle transport
Clathrin/coatomer adaptor like
Rab GTP dissociation inhibitor
Vinculin (cell adhesion)
MatE multi-drug transporter
Cation efflux transporter
Heavy metal transport
Nucleic acid modification
RRM-RNP1 (NA binding)
Nucleic acid binding
DEAD-box RNA helicase
DEAD-box RNA helicase
ANTH (phospholipid binding)
MARCKS (calmodulin binding)
Ser/Thr protein kinase
DnaJ heat-shock protein
Cytochrome P450 B
Carbamoyl phosphate synthase
Nucleoside diphosphate kinase
Proton-coupled ATP synthase
Pyruvate, phosphate dikinase
Proton-coupled ATP synthase
Protein synthesis and degradation
TFIIB-related (translation initiation)
Ubiquitin-dependent peptidase C19
Serine peptidase S28
Signal recognition particle
IF2 (translation initiation)
SecY protein translocase
Discussion
The construction of BAC libraries from chromosome arm 1AL marks a step forward in developing large-insert libraries from all chromosome arms of hexaploid wheat. These subgenomic resources provide an entrance into the complex genome and facilitate its mapping, positional cloning and sequencing (Paux et al. 2008). The analysis of a total of 7.57 Mb of BES from wheat chromosome arm 1AL presented here provides interesting comparisons with previous studies on other Triticeae chromosomes. The GC content was extremely similar to that previously reported for BESs from chromosome 3B (Paux et al. 2006), 44.7% and 44.5%, respectively, and the repeat composition was also extremely similar, whereas data from the D genome ancestor Aegilops tauschii showed significantly different proportions (Li et al. 2004; Fig. 1). The consistency of the data from 1AL and 3B suggests that the global composition of the A and B genomes is very similar and that BES studies provide a good representation of the whole chromosome.
By sequence similarity to plant ESTs, the transcribed portion of 1AL was estimated to be 1.03% of the total sequence, or a cumulative length of about 5.4 Mb. The average length of 6,137 full-length cDNAs from wheat was 1,143 bp (Mochida et al. 2009), which would suggest the presence of about 4,700 genes on chromosome 1AL at an average density of 1/110 kb, and about 50,000 genes for the entire A genome. These values are close to those previously reported, although it must be noted that the size limitation of BESs makes it difficult to distinguish intact genic sequences from pseudogenes, so this may be an overestimation.
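The arithmetic behind the gene-number estimate can be written out as a small calculation. It uses only figures given in the text (a 523 Mb arm, 1.03% transcribed sequence, and a mean cDNA length of 1,143 bp) and assumes, as the estimate itself does, that each gene contributes one average-length transcript. The function name `estimate_gene_content` is hypothetical.

```python
def estimate_gene_content(arm_bp, transcribed_fraction, mean_cdna_bp):
    """Estimate transcribed length, gene number and mean gene spacing.

    Assumes every gene contributes one transcript of average length,
    so the gene number is the transcribed length divided by that length.
    """
    transcribed_bp = arm_bp * transcribed_fraction
    gene_count = transcribed_bp / mean_cdna_bp
    spacing_bp = arm_bp / gene_count
    return transcribed_bp, gene_count, spacing_bp


transcribed_bp, gene_count, spacing_bp = estimate_gene_content(
    arm_bp=523_000_000,          # 1AL size (Šafář et al. 2010)
    transcribed_fraction=0.0103,  # 1.03% of the BES matched plant ESTs
    mean_cdna_bp=1_143,           # mean full-length cDNA (Mochida et al. 2009)
)
print(
    f"{transcribed_bp / 1e6:.1f} Mb transcribed, "
    f"about {gene_count:.0f} genes, "
    f"one gene per {spacing_bp / 1e3:.0f} kb"
)
```

With these inputs the calculation gives about 5.4 Mb of transcribed sequence, roughly 4,700 genes, and one gene per approximately 111 kb, in line with the rounded values quoted above.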
Complete sequencing of wheat chromosomes and marker-assisted selection both require a high density of molecular markers to distinguish between similar repetitive sequences (Paux et al. 2010). The most densely populated physical map for 1AL currently available consists of 334 deletion bin-mapped ESTs (Peng et al. 2004). However, the small number of deletion bins in 1AL (only three) limits the resolution of this map. The probable order of a minority of these ESTs can be found by examining their relationships with syntenic regions on the sequenced genomes of rice or other grasses (Quraishi et al. 2009), thus converting them into conserved orthologous set markers. However, many additional markers are still required to saturate the chromosome. Using the 1AL BES dataset, we were able to generate hundreds of putative SSR markers and thousands of putative ISBP markers. When these were tested for specificity, 23/44 (52.3%) were found to be useful. The 21 putative markers that were not useful were mostly rejected due to lack of specificity, which is a risk when using ISBP markers; as one of the primers in each pair binds to a repetitive sequence, the other must be highly specific to ensure accurate amplification. Of the successful markers, 4/18 (11.6% of all the markers tested) showed size polymorphisms between Chinese Spring and Renan. If the remaining ISBP and SSR markers presented here are developed at the same success rate, about 3,750 useful markers could be generated to populate future physical and genetic maps of chromosome 1AL, of which about 830 would be expected to have size polymorphisms in a Chinese Spring × Renan cross. In addition, many more of the new markers may be polymorphic between Chinese Spring and other cultivars, or may contain single-nucleotide polymorphisms (SNPs) not identified in this study. Indeed, sequencing of 157 ISBPs from wheat chromosome 3B in eight different wheat lines (Paux et al. 2010) revealed polymorphisms between at least two of the lines for 67% of the ISBPs, the great majority of which were SNPs. Another advantage of the markers generated here is that, having been designed from minimum tiling path BESs, they should be evenly distributed across the whole chromosome.
Similarly, using the BES sequences we were able to identify syntenic blocks located on chromosome arm 1AL. The major regions of synteny were those expected from previous analysis of the related barley chromosome 1H (Mayer et al. 2009). However, three smaller syntenic blocks were also identified that most likely represent smaller translocations from other chromosomes. Using the syntenic relationships, it was possible to identify genes in the sequenced grass species that are likely to be present on chromosome arm 1AL, in addition to those identified by similarity to ESTs. These relationships will be useful in identifying candidate genes for QTLs that have been mapped to this chromosome. For example, Li et al. (2010) identified a QTL for tolerance to photo-oxidative stress on 1AL, while our study detected a putative peroxidase and Cytochrome P450 protein that may be involved in the oxidative stress response. Yield is a sufficiently complex trait that QTLs influencing yield-related traits have been mapped to most wheat chromosomes, but meta-QTL analysis identified two yield-enhancing MQTLs on chromosome 1AL (Zhang et al. 2010). In this regard the four carbohydrate metabolism genes identified here are of particular interest for further study, as yield is affected by the balance of starch production and degradation in the grain.
One important locus associated with grain quality QTLs that have been mapped to chromosome 1AL is the high-molecular weight glutenin gene Glu-A1 (Kuchel et al. 2006). The bulk of the Glu1 protein sequence is made up of two repeated motifs (consensus sequences PGQGQQ and GYYPTSLQQ), and the gene is not found in Brachypodium or rice. However, when the non-repetitive sections of the Glu1 protein sequence (the N-terminal 120 residues and C-terminal 45 residues) are used in similarity searches against these two species, their best matches are Bradi2g20870 (prolamin subfamily 2) and Os05g41970 (SSA1 family protein), respectively. Both of these putative proteins are involved in seed storage and located in the same syntenic location in this study—block III, between BACs Tae1AL160D24 and Tae1AL13G19. Therefore, it is possible that the Glu-A1 gene is evolutionarily related to these two genes, and also found at the same location, which we will be able to characterise more closely once the 1AL physical map is finalised.
We are grateful to Prof. B. S. Gill (Kansas State University, Manhattan, USA) for seeds of the double ditelosomic line 1A of wheat T. aestivum L. cv. Chinese Spring. We thank our colleagues, Dr. Jarmila Číhalíková, Dr. Marie Kubaláková and Romana Šperková, Bc. for chromosome sorting and Jana Dostálová, Bc., Radka Tušková, Helena Tvardíková, and Dr. Marie Seifertová for excellent technical assistance in BAC library construction and Z. Weinstein for the help with the MS. The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007–2013) under the grant agreement no. FP7-212019.