Discrimination of homoeologous gene expression in hexaploid wheat by SNP analysis of contigs grouped from a large number of expressed sequence tags
- Cite this article as:
- Mochida, K., Yamazaki, Y. & Ogihara, Y. Mol Genet Genomics (2004) 270: 371. doi:10.1007/s00438-003-0939-7
Single-nucleotide polymorphisms (SNPs) are useful markers for gene diagnosis and mapping of genes on chromosomes. However, polyploidy, which is characteristic of the evolution of higher plants, complicates the analysis of SNPs in the duplicated genes. We have developed a new method for SNP analysis in hexaploid wheat. First, we classified a large number of expressed sequence tags (ESTs) from wheat in silico. Those grouped into contigs were anticipated to correspond to transcripts from homoeologous loci. We then selected relatively abundant ESTs, and assigned these contigs to each of the homoeologous chromosomes using a nullisomic/tetrasomic series of Chinese Spring wheat strains in combination with pyrosequencing. The ninety genes assigned were almost evenly distributed into seven homologous chromosomes. We then created a virtual display of the relative expression of these genes. Expression patterns of genes from the three genomes in hexaploid wheat were classified into two major groups: (1) genes almost equally expressed from all three genomes; and (2) genes expressed with a significant preference, which changed from tissue to tissue, from certain genomes. In 11 cases, one of the three genes in the allopolyploid was found to be silenced. No preference for gene-silencing in particular genomes or chromosomes was observed, suggesting that gene-silencing occurred after polyploidization, and at the gene level, not at the chromosome or genome level. Thus, the use of this SNP method to distinguish the expression profiles of three homoeologous genes may help to elucidate the molecular basis of heterosis in polyploid plants.
Keywords: SNP (single-nucleotide polymorphism) analysis; Hexaploid wheat; Expressed sequence tags (ESTs); Expression patterns of homeologous genes; Virtual display
Among the landmarks available for mapping traits on chromosomes, single-nucleotide polymorphisms (SNPs) are the most expedient, because of their high number, their stability, their distribution as bi-allelic sequence variations (in diploids) throughout the genome, and their ease of assay using high-throughput automated methods. In fact, the discovery and characterization of SNPs has become the main target for studies of the human genome (Altshuler et al. 2000). Such sequence variants have been detected by two methods: analysis of sequence differences in clusters of expressed sequence tags (Picoult-Newberg et al. 1999; Buetow et al. 1999) and overall comparison of overlapping genome regions among DNAs inserted into a vector (Taillon-Miller et al. 1998).
Despite the importance of SNPs with respect to the study of quantitative trait loci (QTLs), few SNP analyses have so far been carried out in plants, especially in agronomically important plants (Drenkard et al. 2000; Kanazin et al. 2002; Batley et al. 2003). Some agronomically important plants are polyploids, a feature which is characteristic of many plant genomes (Soltis and Soltis 2000; Vision et al. 2000; Wendel 2000). In fact, about 70% of angiosperms were polyploid at some stage during their evolution (Masterson 1994; Leitch and Bennett 1997). Although polyploidy brought about an increase in genetic information and gave rise to novel system(s) of gene regulation, polyploid genomes are more difficult to analyze for SNPs than diploids. The ratio of SNP alleles varies in polyploid genomes (Rickert et al. 2002; Adams et al. 2003). Furthermore, the haplotypes among multigenomes are difficult to determine.
Wheat is allopolyploid in nature (AABBDD is the genome formula; Lilienfeld 1951). Wheat species (of the Triticum-Aegilops group) can provide a model system for the analysis of SNPs in polyploid plants, as the genetic relationships among wheat species have been extensively characterized (Tsunewaki 1993), and the chromosome assignment of homoeologous SNPs can easily be determined using the aneuploid series of hexaploid wheat (Sears 1965). Recently, we examined a large number of ESTs expressed in ten tissues throughout the wheat life cycle, and were able to classify these ESTs into distinct contigs distributed among the three genomes (Ogihara et al. 2003). Here, we have developed a new system for SNP analysis in hexaploid wheat, which uses pyrosequencing in combination with a nullisomic-tetrasomic series of hexaploid wheat (Sears 1965) to map every transcript to the genetic map of hexaploid wheat and clarify the distinct expression patterns of homoeologous genes among the three genomes.
Unlike analog methods for the analysis of gene expression, such as Northern hybridization and cDNA microarrays or DNA chips, digital gene expression profiling can be conducted by statistically analyzing expressed sequence tags (ESTs) in cDNA libraries, and relative transcript abundance can be estimated from EST frequencies (Ewing et al. 1999; Ogihara et al. 2003).
As the first step in the analysis of SNPs in hexaploid wheat, we assigned the transcripts of 90 genes to their homoeologous chromosomes. Those 90 genes were almost evenly distributed among seven homologous chromosomes. We then went on to create a virtual display of the relative expression levels of these genes from each genome, based on the numbers of sequenced clones (derived from RNA isolated from ten tissues) belonging to each contig.
Materials and methods
Plant materials and DNA extraction
Total DNA was isolated from 14-day-old seedlings of common wheat (Triticum aestivum cv. Chinese Spring) and its nullisomic-tetrasomic series (Sears 1965)—strains which each lack a pair of homologous chromosomes due to replacement with its homoeologues—according to the method previously described by Ogihara et al. (1994).
EST data mining
DNA sequences of plasmid inserts from cDNA libraries constructed from RNA isolated from ten tissues obtained throughout the wheat life cycle (Ogihara et al. 2003) have been determined. In all, 116,232 cDNA sequences obtained from both ends were grouped into 25,971 contigs using the PHRAP method (University of Washington Genome Center; http://www.genome.washington.edu/UWGC) under the conditions: -new_ace, -penalty -5, -minmatch 50, -minscore 100. A total of 5199 contigs containing five or more constituent ESTs were selected from the 25,971 contigs (Ewing et al. 1999) and classified into 3300 gene clusters by the BLAST method (1e-60; Altschul et al. 1990). The contigs corresponding to each gene cluster were re-aligned by the PHRAP method to detect SNPs that discriminate between distinct gene loci from the three genomes (A, B, and D) of common wheat. Gene clusters with two and three contigs were used for SNP analysis. The SNPs were targeted by checking the exon region with a BLAST search (1e-5) against the genome sequence database of indica rice (Yu et al. 2002).
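The two selection rules described above (contigs with five or more ESTs, and gene clusters with two or three contigs) can be sketched as simple filters. This is a minimal illustration, not the authors' pipeline: the function names, the dictionary layout and the toy identifiers are all assumptions made for the example.

```python
def select_contigs(contig_members, min_ests=5):
    """Keep contigs that contain at least `min_ests` constituent ESTs.

    `contig_members` maps a contig name to a list of EST identifiers
    (a hypothetical layout chosen for this sketch).
    """
    return {name: ests for name, ests in contig_members.items()
            if len(ests) >= min_ests}


def clusters_for_snp_analysis(gene_clusters, allowed_sizes=(2, 3)):
    """Keep gene clusters made up of two or three contigs."""
    return {gene: contigs for gene, contigs in gene_clusters.items()
            if len(contigs) in allowed_sizes}


# Toy data, invented for illustration only.
toy_contigs = {
    "contig_1": ["est_%d" % i for i in range(7)],
    "contig_2": ["est_a", "est_b"],
    "contig_3": ["est_%d" % i for i in range(10, 15)],
}
toy_clusters = {
    "gene_x": ["contig_1", "contig_3"],
    "gene_y": ["contig_1"],
    "gene_z": ["c1", "c2", "c3", "c4"],
}

abundant = select_contigs(toy_contigs)
usable = clusters_for_snp_analysis(toy_clusters)
```

With the toy data, only the contigs holding at least five ESTs and the cluster holding two contigs pass the filters.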
PCR and primers
Primers for PCR were designed based on conserved sequences outside of polymorphic regions to amplify the SNPs. For pyrosequencing (see below), either the forward or the reverse primer was 5′-biotinylated, so that one strand of the amplimer was labeled. Primers for pyrosequencing were chosen to anneal just upstream of the SNP sites on biotinylated antisense strands. PCR amplifications were carried out with 50 ng of total DNA in a GeneAmp PCR system 9600 (ABI) following the enzyme manufacturer’s instructions (KOD-Plus, Toyobo). The thermal cycling program was as follows: 94°C for 2 min, then 40 cycles of 94°C for 15 s, 65°C for 30 s and 68°C for 15 s. PCR products were analyzed by electrophoresis on 2% agarose gels.
Haplotypes showing the SNPs were identified with the PSQ system (Pyrosequencing AB, Uppsala, Sweden). Biotin-labeled PCR products were immobilized on streptavidin-coated paramagnetic beads. Capture of biotinylated single-stranded PCR products, annealing of the sequencing primer and solid-phase pyrosequencing followed the manufacturer’s instructions.
Chromosome assignment of the haplotypes
We selected multiple SNP sites to distinguish between the haplotypes of the three genomes. In pyrosequencing, the light signal intensity is directly proportional to the number of nucleotides incorporated. This information is given in the form of a “pyrogram”. A pyrogram covers a stretch of several dozen nucleotides, so that the haplotype can be inferred. We developed a new program to convert the peak heights at the SNP sites into allele frequencies. We were able to assign the haplotype of the genes to each of the three genomes by observing a decrease in the corresponding peak in a nullisomic line and an increase in the corresponding tetrasomic line.
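The conversion of peak heights into allele frequencies, and the assignment of an allele to a genome from the nullisomic lines, can be sketched as follows. This is a minimal illustration and not the program developed for the study: it assumes that peak heights at one SNP site are already available as numbers, that frequencies are obtained by simple normalization, and that the genome is chosen as the one whose nullisomic line shows the largest drop in frequency. The function names and the toy peak heights are hypothetical.

```python
def allele_frequencies(peak_heights):
    """Normalize peak heights at one SNP site into allele frequencies.

    `peak_heights` maps a nucleotide (allele) to its peak height.
    """
    total = sum(peak_heights.values())
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return {allele: height / total for allele, height in peak_heights.items()}


def assign_genome(allele, euploid_peaks, nullisomic_peaks_by_genome):
    """Return the genome whose nullisomic line loses the allele most strongly.

    `nullisomic_peaks_by_genome` maps a genome label ("A", "B", "D") to the
    peak heights measured in the line lacking that genome's chromosome.
    """
    baseline = allele_frequencies(euploid_peaks).get(allele, 0.0)
    drops = {
        genome: baseline - allele_frequencies(peaks).get(allele, 0.0)
        for genome, peaks in nullisomic_peaks_by_genome.items()
    }
    return max(drops, key=drops.get)


# Toy peak heights, invented for illustration: allele "G" is carried by one genome.
euploid = {"G": 30, "A": 60}
nullisomic_lines = {
    "A": {"G": 0, "A": 90},   # "G" peak disappears when the A chromosome is missing
    "B": {"G": 60, "A": 30},
    "D": {"G": 30, "A": 60},
}
genome_of_g = assign_genome("G", euploid, nullisomic_lines)
```

In this toy case the "G" allele makes up about one third of the signal in the euploid and vanishes in the line lacking the A-genome chromosome, so it is assigned to the A genome. A real implementation would also need to check the matching increase in the tetrasomic line and to handle measurement noise.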
The hierarchical clustering method (Eisen et al. 1998) was employed to compare the EST expression profiles among contigs from the three genomes. The expression profile is displayed based on the number of member sequences in a contig and the number of its constituent ESTs found in each cDNA library (Ogihara et al. 2003).
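The inputs to such a clustering can be sketched as follows: an expression profile is built by counting the ESTs of a contig in each cDNA library, and profiles are then compared with a correlation coefficient. This is a minimal illustration under stated assumptions, not the authors' procedure: each EST is assumed to be labeled with the tissue library it came from, the Pearson correlation is used here as one common similarity measure for hierarchical clustering, and the function names and tissue labels are hypothetical.

```python
from math import sqrt


def expression_profile(est_libraries, tissues):
    """Count the ESTs of one contig found in each tissue library.

    `est_libraries` lists the source library of every EST in the contig;
    `tissues` fixes the order of the profile vector.
    """
    return [est_libraries.count(tissue) for tissue in tissues]


def pearson(x, y):
    """Pearson correlation between two equally long profiles.

    Returns 0.0 when either profile is constant, because the
    correlation is undefined in that case.
    """
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


# Toy example with three hypothetical tissue libraries.
tissues = ["root", "pistil", "spike"]
profile_a = expression_profile(["root", "root", "pistil"], tissues)
profile_b = expression_profile(["root", "root", "root", "root", "pistil", "pistil"], tissues)
similarity = pearson(profile_a, profile_b)
```

Because the second toy profile is exactly twice the first, the two contigs are perfectly correlated even though their absolute EST counts differ, which is the property that makes a correlation-based measure convenient for comparing homoeologues of different abundance.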
Classification of homoeologous genes in silico
Chromosome assignment of homoeologous genes
Table 1 Number of genes assigned to each of three homoeologous chromosomes by pyrosequencing
Virtual display of homoeologous gene expression profiles
Expression of 79 genes was scored in ten wheat tissues. In those 790 gene/tissue combinations, gene expression was detected in 570 plots (72.2%). A chi-squared test was carried out to test the null hypothesis that genes from the three genomes were uniformly expressed. In 152 plots (27.5%), preferential expression of genes from a certain genome was significant. In fact, 31 of 54 genes (57.4%) expressed in the pistil showed selective gene expression from a particular genome. On the other hand, only nine of 64 genes (14.1%) expressed in the spikes at the bolting stage showed significant preferential expression from one genome. The frequencies of genes showing genome preference in the other eight tissues were consistently around 27%. However, in no case was a gene predominantly expressed from a specific genome in all tissues. For individual gene expression patterns, 15 genes (19.0%) were expressed uniformly from all three genomes, whereas the remaining 64 genes showed preferential expression from a certain genome in at least one tissue. The patterns of preferential gene expression from the three genomes differed from tissue to tissue (Fig. 3C). No preference for gene expression from a certain genome over the other genomes, either during the wheat life cycle or during specific developmental stages, such as vegetative growth, reproductive growth or seed maturation, was found. These lines of evidence indicate that one-fifth of the genes are always expressed from all three genomes represented in the hexaploid, but the remaining genes show preferential expression from certain genomes, and the preferred genome can vary from tissue to tissue.
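The chi-squared test against uniform expression can be sketched for a single gene/tissue combination as follows. This is a minimal illustration: it assumes the observed values are EST counts from the A, B and D genomes, that the expected count under the null hypothesis is one third of the total for each genome, and that significance is judged at the 0.05 level, a threshold the text does not state. The function names and the toy counts are hypothetical. With three categories there are two degrees of freedom, for which the p-value has the closed form exp(-x/2).

```python
from math import exp


def chi_squared_uniform(counts):
    """Chi-squared statistic for observed counts against equal expected counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no ESTs observed for this gene/tissue combination")
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def p_value_three_genomes(counts):
    """P-value for three genome counts (two degrees of freedom).

    For 2 degrees of freedom the chi-squared survival function is exp(-x / 2).
    """
    if len(counts) != 3:
        raise ValueError("expected counts for the A, B and D genomes")
    return exp(-chi_squared_uniform(counts) / 2.0)


# Toy EST counts (A, B, D), invented for illustration.
balanced = [10, 10, 10]
skewed = [20, 5, 5]

ALPHA = 0.05  # assumed significance level, not taken from the text
balanced_is_preferential = p_value_three_genomes(balanced) < ALPHA
skewed_is_preferential = p_value_three_genomes(skewed) < ALPHA
```

For the skewed toy counts the statistic is 15 and the p-value is well below 0.05, so the combination would be scored as preferential expression; the balanced counts give a statistic of 0. Small EST counts make the chi-squared approximation unreliable, so a real analysis would need a minimum count or an exact test.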
In eleven cases, continuous shut-down of expression from one of the three genomes was observed. Chromosome assignments of inactivated genes are also presented in Table 1. Six genes were assigned to the A genome, two to the B genome and three to the D genome. Shut-down of gene expression occurred with almost equal frequency on all the seven chromosomes analyzed here (Table 1), showing no preference for shut-down of gene expression among the three genomes and seven homologous chromosomes. This indicates that suppression of gene expression in certain genomes took place for both genes on a pair of homologous chromosomes, but did not occur at the level of the chromosome or genome. As to the mechanism of shut-down of gene expression, mutation in the promoter region (Nemoto et al. 2003), methylation (Shaked et al. 2001) or gene silencing (Scheid et al. 2002) are plausible possibilities, because at least parts of the exons were retained in all genomes.
Expression of eleven genes was scored in ten tissues. In those 110 gene/tissue combinations, gene expression was detected in 68 plots (61.8%). A chi-squared test showed that four genes were not preferentially expressed from certain genomes, whereas the remaining seven genes showed preferential expression from a certain genome over the others in at least one tissue. In fact, three of seven genes (42.9%) expressed in the pistil and two of five genes expressed in the seeds (DPA10 and 30) showed selective gene expression from a certain genome, although sample gene numbers were limited. In contrast, no preference for gene expression from specific genomes was found, except in two cases: (1) a gene homologous to ascorbate peroxidase of barley (NCBI Accession No. AF411228) showed dominant expression from the D genome over the B genome in three out of the nine tissues in which it was expressed (Fig. 3C); and (2) a gene homologous to glycine decarboxylase complex H-protein of Arabidopsis thaliana (NCBI Accession No. NP_181057) revealed dominant expression from the D genome over the B genome in three out of the seven tissues in which it was expressed. It should be pointed out that these two genes were expressed at the same or higher level from the B genome as from the D genome in the other tissues investigated. These lines of evidence suggest that most genes were selected randomly or uniformly for expression from the two genomes in tissues throughout the wheat life cycle, as in the case of genes expressed from all three genomes, but some genes (2/11) showed dominant gene expression from the D genome over that from the B genome under conditions in which only two out of the three genomes could potentially express them.
SNP analysis in a polyploid background
We have developed a new SNP analysis system in hexaploid wheat. Our system consists of two steps. First we classified a large number of wheat ESTs in silico. These classified contigs were expected to correspond to the transcripts from each of the homoeologous loci. We then selected relatively abundant ESTs, and assigned these contigs to each of the homoeologous chromosomes using a nullisomic-tetrasomic series of hexaploid wheat and pyrosequencing. Pyrosequencing was found to be superior to other systems such as DHPLC (Hoogendoorn et al. 1999) and SNaPshot (Makridakis and Reichardt 2001), because of its linear dose-response and high throughput. Furthermore, pyrosequencing data were available for stretches of several dozen nucleotides, so that we could obtain information not only for the ratio of SNPs in the hexaploid background, but also for the linkage of SNPs belonging to each haplotype in the three genomes. Consequently, the pyrosequencing method is the most suitable method for SNP analysis among cultivars or strains of polyploid plants.
We assigned the genes to each of the homoeologous chromosomes by combining the pyrosequencing method with the nullisomic-tetrasomic series of hexaploid wheat (Table 1). Since SNPs that distinguish between the homoeologues of hexaploid wheat were estimated to occur once per 144.9 bp, it is reasonable to expect that every single-copy gene in each genome can be distinguished and assigned to its homoeologous chromosomes. Once the haplotypes of the homoeologous genes have been established, we can map the genes on each homoeologous genetic map, even in populations obtained from crosses between wheat strains. Because SNPs are highly polymorphic, every gene should contain a few SNPs, even among strains (Cho et al. 1999). These high-density EST markers, combined with QTL data for phenotypic characters, will provide a new system of breeding, i.e., gene-mediated breeding instead of marker-assisted selection (Lange and Whittaker 2001). We have now begun a project aimed at analyzing SNPs in common wheat strains.
Distinguishing homoeologous transcripts from three distinct genes
It is well known that polyploidy induces heterosis, but few genome-wide studies have been carried out to clarify the molecular basis of polyploid heterosis (Soltis and Soltis 1999). We have analyzed the expression of 90 genes, whose transcripts are relatively abundant. Expression patterns were classified into four categories (Fig. 3). (1) Genes that were expressed from all three genomes in the tissues in which they were expressed. (2) Genes that could potentially be expressed from all three genomes, but were actually expressed preferentially from one or other genome in different tissues. Genes belonging to these two categories have not been subjected to inactivation/elimination during diploidization of polyploids (Ozkan et al. 2001; Shaked et al. 2001). (3) Genes that were inactive in one genome, while usage of the other two genomes was random. (4) Genes that were inactive in one genome, and showed preferential expression from one of the other two in some tissues. Gene silencing or shut-down of gene expression might be a process that accompanies diploidization. It should be emphasized again that the expression or silencing of genes took place randomly on seven homologous chromosomes and three genomes, and revealed no significant chromosome or genome specificity. Therefore, it can be concluded that the regulation of gene expression and silencing among homoeologues takes place at the level of the individual gene, and not at the chromosome or genome level. Further examination of gene expression patterns inferred from SNP analysis of the three homoeologous genomes of wheat should eventually provide an understanding of the molecular basis of heterosis in polyploid plant species.
This work was supported by Grants-in-Aid for Scientific Research on Priority Areas (C) “Genome Science” and Basic Research (A) (Nos. 13202055 and 13356001) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.