Molecular Genetics and Genomics, Volume 270, Issue 5, pp 371–377

Discrimination of homoeologous gene expression in hexaploid wheat by SNP analysis of contigs grouped from a large number of expressed sequence tags

Authors

  • K. Mochida
    • Kihara Institute for Biological Research, Graduate School of Integrated Science, Yokohama City University
  • Y. Yamazaki
    • Genetic Informatics Laboratory, Center for Genetic Resource Information, National Institute of Genetics
  • Y. Ogihara
    • Kihara Institute for Biological Research, Graduate School of Integrated Science, Yokohama City University
Original Paper

DOI: 10.1007/s00438-003-0939-7

Cite this article as:
Mochida, K., Yamazaki, Y. & Ogihara, Y. Mol Genet Genomics (2004) 270: 371. doi:10.1007/s00438-003-0939-7

Abstract

Single-nucleotide polymorphisms (SNPs) are useful markers for gene diagnosis and mapping of genes on chromosomes. However, polyploidy, which is characteristic of the evolution of higher plants, complicates the analysis of SNPs in the duplicated genes. We have developed a new method for SNP analysis in hexaploid wheat. First, we classified a large number of expressed sequence tags (ESTs) from wheat in silico. Those grouped into contigs were anticipated to correspond to transcripts from homoeologous loci. We then selected relatively abundant ESTs, and assigned these contigs to each of the homoeologous chromosomes using a nullisomic/tetrasomic series of Chinese Spring wheat strains in combination with pyrosequencing. The ninety genes assigned were almost evenly distributed into seven homologous chromosomes. We then created a virtual display of the relative expression of these genes. Expression patterns of genes from the three genomes in hexaploid wheat were classified into two major groups: (1) genes almost equally expressed from all three genomes; and (2) genes expressed with a significant preference, which changed from tissue to tissue, from certain genomes. In 11 cases, one of the three genes in the allopolyploid was found to be silenced. No preference for gene-silencing in particular genomes or chromosomes was observed, suggesting that gene-silencing occurred after polyploidization, and at the gene level, not at the chromosome or genome level. Thus, the use of this SNP method to distinguish the expression profiles of three homoeologous genes may help to elucidate the molecular basis of heterosis in polyploid plants.

Keywords

  • SNP (single-nucleotide polymorphism) analysis
  • Hexaploid wheat
  • Expressed sequence tags (ESTs)
  • Expression patterns of homeologous genes
  • Virtual display

Introduction

Among the landmarks available for mapping traits on chromosomes, single-nucleotide polymorphisms (SNPs) are the most expedient, because of their high number, stability, their distribution as bi-allelic sequence variations (in diploids) throughout the genome, and their ease of assay using high-throughput automated methods. In fact, the discovery and characterization of SNPs has become the main target for studies of the human genome (Altshuler et al. 2000). Such sequence variants have been detected by two methods: analysis of sequence differences in clusters of expressed sequence tags (Picoult-Newber et al. 1999; Buetow et al. 1999) and overall comparison of overlapping genome regions among DNAs inserted into a vector (Taillon-Miller et al. 1998).

Despite the importance of SNPs with respect to the study of quantitative trait loci (QTLs), few SNP analyses have so far been carried out in plants, especially in agronomically important plants (Drenkard et al. 2000; Kanazin et al. 2002; Batley et al. 2003). Some agronomically important plants are polyploids, a feature which is characteristic of many plant genomes (Soltis and Soltis 2000; Vision et al. 2000; Wendel 2000). In fact, about 70% of angiosperms were polyploid at some stage during their evolution (Masterson 1994; Leitch and Bennett 1997). Although polyploidy brought about an increase in genetic information and gave rise to novel system(s) of gene regulation, polyploid genomes are more difficult to analyze for SNPs than diploids. The ratio of SNP alleles varies in polyploid genomes (Rickert et al. 2002; Adams et al. 2003). Furthermore, the haplotypes among multigenomes are difficult to determine.

Wheat is allopolyploid in nature (AABBDD is the genome formula; Lilienfeld 1951). Wheat species (of the Triticum-Aegilops group) can provide a model system for the analysis of SNPs in polyploid plants, as the genetic relationships among wheat species have been extensively characterized (Tsunewaki 1993), and the chromosome assignment of homoeologous SNPs can easily be determined using the aneuploid series of hexaploid wheat (Sears 1965). Recently, we examined a large number of ESTs expressed in ten tissues throughout the wheat life cycle, and were able to classify these ESTs into distinct contigs distributed among the three genomes (Ogihara et al. 2003). Here, we have developed a new system for SNP analysis in hexaploid wheat, which uses pyrosequencing in combination with a nullisomic-tetrasomic series of hexaploid wheat (Sears 1965) to map every transcript to the genetic map of hexaploid wheat and clarify the distinct expression patterns of homoeologous genes among the three genomes.

Unlike analog methods for the analysis of gene expression, such as Northern hybridization and cDNA microarrays or DNA chips, digital gene expression profiling can be conducted by statistically analyzing expressed sequence tags (ESTs) in cDNA libraries, and relative transcript abundance can be estimated from EST frequencies (Ewing et al. 1999; Ogihara et al. 2003).

As the first step in the analysis of SNPs in hexaploid wheat, we assigned the transcripts of 90 genes to their homoeologous chromosomes. Those 90 genes were almost evenly distributed among seven homologous chromosomes. We then went on to create a virtual display of the relative expression levels of these genes from each genome, based on the numbers of sequenced clones (derived from RNA isolated from ten tissues) belonging to each contig.

Materials and methods

Plant materials and DNA extraction

Total DNA was isolated from 14-day-old seedlings of common wheat (Triticum aestivum cv. Chinese Spring) and its nullisomic-tetrasomic series (Sears 1965), according to the method previously described by Ogihara et al. (1994). Each strain in this series lacks one pair of homologous chromosomes, which is replaced by an extra pair of one of its homoeologues.

EST data mining

DNA sequences of plasmid inserts from cDNA libraries constructed from RNA isolated from ten tissues obtained throughout the wheat life cycle were determined previously (Ogihara et al. 2003). In all, 116,232 cDNA sequences obtained from both ends were grouped into 25,971 contigs using the PHRAP program (University of Washington Genome Center; http://www.genome.washington.edu/UWGC) with the options -new_ace, -penalty −5, -minmatch 50 and -minscore 100. A total of 5199 contigs containing five or more constituent ESTs were selected from the 25,971 contigs (Ewing et al. 1999) and classified into 3300 gene clusters by BLAST (E-value threshold 1e-60; Altschul et al. 1990). The contigs corresponding to each gene cluster were re-aligned with PHRAP to detect SNPs that discriminate between distinct gene loci from the three genomes (A, B, and D) of common wheat. Gene clusters with two or three contigs were used for SNP analysis. SNPs located in exons were targeted by identifying the exon regions with a BLAST search (E-value threshold 1e-5) against the genome sequence database of indica rice (Yu et al. 2002).
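The SNP detection step can be illustrated with a minimal sketch. It assumes that the contigs of one gene cluster have already been aligned to equal length; the function names and the toy sequences are hypothetical and are not taken from the study, which used PHRAP re-alignment for this purpose.

```python
def find_snp_columns(aligned):
    """Return 0-based alignment columns at which the sequences disagree.

    `aligned` maps a contig label to its aligned sequence. Columns that
    contain a gap or an ambiguous base in any sequence are skipped.
    """
    seqs = [s.upper() for s in aligned.values()]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    snp_columns = []
    for col, bases in enumerate(zip(*seqs)):
        if any(b not in "ACGT" for b in bases):
            continue  # gap or ambiguous call: not a reliable SNP
        if len(set(bases)) > 1:
            snp_columns.append(col)
    return snp_columns


def haplotypes(aligned, snp_columns):
    """Concatenate the bases at the SNP columns for each contig."""
    return {
        name: "".join(seq[c].upper() for c in snp_columns)
        for name, seq in aligned.items()
    }


# Hypothetical alignment of three contigs from one gene cluster.
toy_cluster = {
    "contig_1": "ATGCCGTAAGCT",
    "contig_2": "ATGCTGTAAGCT",
    "contig_3": "ATGCCGTACGCT",
}
snps = find_snp_columns(toy_cluster)
haps = haplotypes(toy_cluster, snps)
```

In this toy case two columns differ, and the combination of bases at those columns is enough to tell the three contigs apart, which is the sense in which several SNPs together define a haplotype.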

PCR and primers

Primers for PCR were designed based on conserved sequences flanking the polymorphic regions, so that the SNP sites were amplified from all three genomes. For pyrosequencing (see below), either the forward or the reverse PCR primer was 5′-biotinylated. Primers for pyrosequencing were chosen to anneal just upstream of the SNP sites on biotinylated antisense strands. PCR amplifications were carried out with 50 ng of total DNA in a GeneAmp PCR system 9600 (ABI) following the enzyme manufacturer’s instructions (KOD-Plus, Toyobo). The thermal cycling program was as follows: 94°C for 2 min, then 40 cycles of 94°C for 15 s, 65°C for 30 s and 68°C for 15 s. PCR products were analyzed by electrophoresis on 2% agarose gels.

Pyrosequencing

Haplotypes showing the SNPs were identified with the PSQ system (Pyrosequencing AB, Uppsala, Sweden). Biotin-labeled PCR products were immobilized on streptavidin-coated paramagnetic beads. Capture of biotinylated single-stranded PCR products, annealing of the sequencing primer and solid-phase pyrosequencing followed the manufacturer’s instructions.

Chromosome assignment of the haplotypes

We selected multiple SNP sites to distinguish between the haplotypes of the three genomes. In pyrosequencing, the intensity of the light signal is directly proportional to the number of nucleotides incorporated. This information is given in the form of a “pyrogram”. A pyrogram covers a stretch of several dozen nucleotides, so that the haplotype can be inferred from the SNP sites within it. We developed a new program to convert the peak heights at the SNP sites into allele frequencies. We were able to assign the haplotype of each gene to one of the three genomes by observing a decrease in the corresponding peak in a nullisomic line and an increase in the corresponding tetrasomic line.
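The conversion of peak heights into allele frequencies, and the assignment logic, can be sketched as follows. This is not the program used in the study: the function names, the data layout and the peak heights are hypothetical, and for brevity the sketch only uses the decrease in the nullisomic lines, not the increase in the tetrasomic lines.

```python
def allele_frequencies(peak_heights):
    """Convert peak heights at one SNP site into allele frequencies."""
    total = sum(peak_heights.values())
    if total <= 0:
        raise ValueError("no signal at this SNP site")
    return {base: height / total for base, height in peak_heights.items()}


def assign_genome(allele, euploid_peaks, nullisomic_peaks):
    """Return the genome whose nullisomic line shows the largest drop
    in the frequency of `allele` relative to the euploid."""
    reference = allele_frequencies(euploid_peaks).get(allele, 0.0)
    drops = {
        genome: reference - allele_frequencies(peaks).get(allele, 0.0)
        for genome, peaks in nullisomic_peaks.items()
    }
    return max(drops, key=drops.get)


# Hypothetical peak heights (arbitrary luminescence units) at one SNP site.
euploid = {"C": 20.0, "T": 10.0}
nullisomic_lines = {
    "A": {"C": 20.0, "T": 10.0},  # nulli-1A tetra-1D
    "B": {"C": 30.0, "T": 0.0},   # nulli-1B tetra-1D
    "D": {"C": 20.0, "T": 10.0},  # nulli-1D tetra-1A
}
genome = assign_genome("T", euploid, nullisomic_lines)
```

Here the T allele makes up one third of the signal in the euploid and disappears only in the line lacking chromosome 1B, so the haplotype carrying T is assigned to the B genome.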

Hierarchical clustering

The hierarchical clustering method (Eisen et al. 1998) was employed to compare the EST expression profiles among contigs from the three genomes. The expression profile is displayed based on the number of member sequences in a contig and the number of its constituent ESTs found in each cDNA library (Ogihara et al. 2003).
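As an illustration of this kind of clustering, the sketch below groups contigs by the Pearson correlation of their EST counts across libraries, merging at each step the two groups with the highest average pairwise correlation. It is a simplified stand-in under stated assumptions, not the software of Eisen et al. (1998), whose linkage rule may differ in detail; the contig names and counts are hypothetical, and only four libraries are shown instead of ten.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length count profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat profile: correlation undefined, treat as 0
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in order as
    (members_a, members_b, average_correlation) tuples."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [pearson(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        merges.append((clusters[i], clusters[j], score))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical EST counts per contig in four cDNA libraries.
toy_profiles = {
    "contig_a": [5, 0, 1, 0],
    "contig_b": [10, 0, 2, 0],
    "contig_c": [0, 4, 0, 6],
    "contig_d": [0, 2, 0, 3],
}
merges = average_linkage(toy_profiles)
```

Because correlation ignores absolute abundance, contig_a and contig_b join first even though one has twice as many ESTs as the other; the same holds for contig_c and contig_d.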

Results

Classification of homoeologous genes in silico

A large number of wheat ESTs were examined. In all, 116,232 single-pass sequences of cDNAs (from both the 5′ and the 3′ end), derived from ten tissues during the life cycle of wheat, were classified into 25,971 contigs using the PHRAP method (Ogihara et al. 2003). From these, 5199 contigs with five or more members, representing relatively abundant transcripts, were selected for further analysis. The PHRAP parameters were stringent, so that each contig was expected to represent a unigene from a single locus. The 5199 contigs were grouped into 3300 distinct gene clusters using the BLAST method (Altschul et al. 1990). These 3300 gene clusters were searched for their counterparts using BLAST against the remaining 20,772 contigs. About half of the contigs stood alone with no counterparts (Fig. 1), perhaps reflecting the effects of genetic diploidization in the hexaploid (Ozkan et al. 2001; Shaked et al. 2001). From sequence comparison of these gene clusters, it is highly likely that gene clusters with one, two or three member contigs represent genes present as a single copy per genome (93.0%), and that those with four or more member contigs are derived from multigene families within a genome (7.0%). For SNP analyses among the three genomes, we first chose gene clusters with two or three contig members; 1560 gene clusters fell into this category. By re-alignment of the contigs in each gene cluster using the PHRAP method, we were able to detect SNP sites that differentiated the three genomes. In order to amplify these SNP sites by PCR, we identified the exon regions of the 1560 gene clusters by BLAST searches against the genome database of indica rice (Yu et al. 2002). The frequency of SNPs among the three genomes of common wheat was calculated to be one per 144.9 bp. Since a number of SNPs were found in each of the contigs, haplotypes can be distinguished by combinations of homoeologous SNPs among the three genomes.
Fig. 1

Numbers of genes expressed from different homoeoloci. The two numbers associated with each category are the number of expressed loci and the number of genes expressed by that number of loci (given in parentheses). Relatively abundant contigs with five or more members were selected for analysis

Chromosome assignment of homoeologous genes

Haplotypes harboring several SNPs were assigned by pyrosequencing to each chromosome of the three genomes, A, B and D. The purothionin genes were chosen as an example, because the complete sequences of the genes from all three genomes had previously been determined (Van Campenhout et al. 1998). Three contigs, corresponding to Pur-A1, Pur-B1 and Pur-D1, were grouped in the gene cluster 2077-1; sequences within the third exon of these are shown in Fig. 2A. Starting from the position chosen for the sequencing primer (marked by the open blue arrow in Fig. 2A), the corresponding DNA sequences from the three genomes were determined simultaneously by pyrosequencing (the region indicated by the red arrow in Fig. 2A). The resulting pyrograms are presented in Fig. 2B, together with the pyrograms of the nullisomic-tetrasomic lines of Chinese Spring for homoeologous group 1, because the purothionin gene is located on the chromosomes of homoeologous group 1 in wheat (Van Campenhout et al. 1998). From the heights of the sequencing peaks, we calculated the ratio of the polymorphic nucleotides (Fig. 2C). By comparing the ratios at the SNP sites across the nullisomic-tetrasomic series, we were able to assign the contigs (haplotypes) to their homoeologous chromosomes. The pyrogram data fully supported the previous chromosomal assignment of the purothionin gene (Van Campenhout et al. 1998).
Fig. 2A–C

Genotyping of three different haplotypes of the Purothionin gene. A The nucleotide sequences of the exon III of the Purothionin genes from the three genomes were aligned. The sequencing primer is represented as an open blue arrow, and the region shown in the pyrograms is indicated as a red arrow. The SNP sites of each genome are marked in red, blue and green characters, respectively. B Pyrograms of three aneuploids, i.e., nulli 1A-tetra 1D, nulli 1B-tetra 1D and nulli 1D-tetra 1A, together with normal Chinese Spring. Arbitrary luminescence units are represented on the ordinate. C Allelic frequency at each SNP site of individual strains

By repeating the pyrosequencing of Chinese Spring and its nullisomic-tetrasomic series, 270 haplotypes of 90 gene clusters were assigned to chromosomes, as summarized in Table 1. The chromosomal locations of the assigned genes were almost evenly distributed among chromosomes. Several of these markers are located in novel genes that have no homologs in the DNA databases. These SNPs supply potential sources of markers that can be mapped to any transcribed region on the chromosomes.
Table 1. Number of genes assigned to each of three homoeologous chromosomes by pyrosequencing

Genome   Homologous chromosome [a]                                       Total
         1        2        3        4        5        6        7
A        14 (2)   12 (1)   11 (0)   13 (1)   15 (0)   13 (2)   12 (0)    90 (6)
B        14 (0)   12 (0)   11 (0)   13 (0)   15 (1)   13 (0)   12 (1)    90 (2)
D        14 (1)   12 (0)   11 (1)   13 (0)   15 (1)   13 (0)   12 (0)    90 (3)
Total    42 (3)   36 (1)   33 (1)   39 (1)   45 (2)   39 (2)   36 (1)    270 (11)

[a] The number of shut-down genes assigned to each chromosome of the three genomes is given in parentheses

Virtual display of homoeologous gene expression profiles

SNP analysis, in combination with virtual display of ESTs (Ogihara et al. 2003), can be used to assess the expression profiles of homoeologous genes in different tissues. The number of ESTs derived from each of ten cDNA libraries was scored for each contig, producing a two-way expression profile, i.e., contig vs. library. Based on the EST constituent matrix, hierarchical clustering was performed according to the method of Eisen et al. (1998). A virtual display of the expression profile is shown in Fig. 3. In addition to gene clusters expressed in a tissue-specific manner, genes expressed in all ten tissues were found (Fig. 3A). The 90 genes were classified into seven major groups. A remarkable feature of the investigation is the ability to discriminate the expression patterns of homoeologous genes in three different genomes, i.e., A, B and D, as displayed in Fig. 3B. Out of the 90 genes presented in Table 1, 79 were expressed from all three genomes, while the remaining 11 genes were expressed from only two genomes, indicating that the corresponding genes in the third genome were continuously shut down during the life cycle of wheat. Because the exons of all 90 genes could be amplified by PCR, these exon sequences, even if not complete, have been maintained in each of the three genomes.
Fig. 3

Clustered correlation display of wheat EST data. Ninety genes that were subjected to pyrosequencing were used to construct the clustered correlation map of the cDNA libraries from the ten wheat tissues. At the top of the panel, the ten tissues from which cDNA libraries were constructed are indicated: WhdL, crown of seedling; Whr, root; Whyd, spikelet at early flowering; Whyf, spikelet at late flowering; Wh, spike at bolting stage; Whh, spike at heading date; Whoh, pistil at heading date; Whf, spike at flowering date; Whe, seed at 10 days post-anthesis; WhSL, seed 30 days post-anthesis. A Virtual display of the pooled gene expression patterns. The color scale ranges from saturated black for 0 members to red for 85 members. The expression profile of each gene is represented by a single row of colored boxes, and that of each library is represented by a single column. The brackets indicated on the left show the clusters formed based on similarities in expression pattern. B Gene expression patterns of the three homoeologous loci are separately displayed: red indicates gene expression from the A genome, blue indicates expression from the B genome, and green indicates expression from the D genome. The color scale ranges from saturated black for 0 members to each of the three colors for 35 members. C Some contig identities are shown in expanded regions: annotations of each contig deduced from homology searches in the DNA database are given below each panel. Contig number and chromosomal location are also given on the right

Expression of 79 genes was scored in ten wheat tissues. In those 790 gene/tissue combinations, gene expression was detected in 570 plots (72.2%). A chi-squared test was carried out to test the null hypothesis that genes from the three genomes were uniformly expressed. In 152 plots (27.5%), preferential expression of genes from a certain genome was significant. In fact, 31 of 54 genes (57.4%) expressed in the pistil showed selective gene expression from a particular genome. On the other hand, only nine of 64 genes (14.1%) expressed in the spikes at the bolting stage showed significant preferential expression from one genome. The frequencies of genes showing genome preference in the other eight tissues were consistently around 27%. However, in no case was a gene predominantly expressed from a specific genome in all tissues. For individual gene expression patterns, 15 genes (19.0%) were expressed uniformly from all three genomes, whereas the remaining 64 genes showed preferential expression from a certain genome in at least one tissue. The patterns of preferential gene expression from the three genomes differed from tissue to tissue (Fig. 3C). No preference for gene expression from a certain genome over the other genomes, either during the wheat life cycle or during specific developmental stages, such as vegetative growth, reproductive growth or seed maturation, was found. These lines of evidence indicate that one-fifth of the genes are always expressed from all three genomes represented in the hexaploid, but the remaining genes show preferential expression from certain genomes, and the preferred genome can vary from tissue to tissue.
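The test applied to each gene/tissue combination can be sketched as below: the EST counts from the A, B and D genomes are compared with the equal counts expected under uniform expression. The function names and the counts are hypothetical, and the 5% significance level is an assumption, since the threshold used is not stated here. With three genomes the test has two degrees of freedom, for which the p-value is exactly exp(−χ²/2).

```python
from math import exp


def chi2_uniform(counts):
    """Chi-squared statistic for EST counts against equal expected counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("gene not expressed in this tissue")
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def p_value_three_genomes(counts):
    """p-value for three categories (2 degrees of freedom)."""
    if len(counts) != 3:
        raise ValueError("expected counts for the A, B and D genomes")
    return exp(-chi2_uniform(counts) / 2)


# Hypothetical EST counts (A, B, D) for one gene in two tissues.
biased = (12, 3, 3)
uniform = (6, 6, 6)
ALPHA = 0.05  # assumed significance level
biased_is_significant = p_value_three_genomes(biased) < ALPHA
uniform_is_significant = p_value_three_genomes(uniform) < ALPHA
```

With small EST counts the chi-squared approximation is rough, so a result near the threshold should be read with caution.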

In eleven cases, continuous shut-down of expression from one of the three genomes was observed. Chromosome assignments of inactivated genes are also presented in Table 1. Six genes were assigned to the A genome, two to the B genome and three to the D genome. Shut-down of gene expression occurred with almost equal frequency on all seven chromosomes analyzed here (Table 1), showing no preference for shut-down of gene expression among the three genomes and seven homologous chromosomes. This indicates that suppression of gene expression in certain genomes took place for both genes on a pair of homologous chromosomes, but did not occur at the level of the chromosome or genome. As to the mechanism of shut-down of gene expression, mutation in the promoter region (Nemoto et al. 2003), methylation (Shaked et al. 2001) or gene silencing (Scheid et al. 2002) are plausible possibilities, because at least parts of the exons were retained in all genomes. Expression of the eleven genes was scored in ten tissues. In those 110 gene/tissue combinations, gene expression was detected in 68 plots (61.8%). A chi-squared test showed that four genes were not preferentially expressed from certain genomes, whereas the remaining seven genes showed preferential expression from a certain genome over the others in at least one tissue. In fact, three of seven genes (42.9%) expressed in the pistil and two of five genes expressed in the seeds (DPA10 and 30) showed selective gene expression from a certain genome, although sample gene numbers were limited. In contrast, no preference for gene expression from specific genomes was found, except in two cases: (1) a gene homologous to ascorbate peroxidase of barley (NCBI Accession No. AF411228) showed dominant expression from the D genome over the B genome in three out of the nine tissues in which it was expressed (Fig. 3C); and (2) a gene homologous to glycine decarboxylase complex H-protein of Arabidopsis thaliana (NCBI Accession No. NP_181057) showed dominant expression from the D genome over the B genome in three out of the seven tissues in which it was expressed. It should be pointed out that these two genes were expressed at the same or a higher level from the B genome as from the D genome in the other tissues investigated. These lines of evidence suggest that most genes were selected randomly or uniformly for expression from the two genomes in tissues throughout the wheat life cycle, as in the case of genes expressed from all three genomes, but some genes (2/11) showed dominant gene expression from the D genome over that from the B genome under conditions in which only two out of the three genomes could potentially express them.

Discussion

SNP analysis in a polyploid background

We have developed a new SNP analysis system in hexaploid wheat. Our system consists of two steps. First, we classified a large number of wheat ESTs in silico. These classified contigs were expected to correspond to the transcripts from each of the homoeologous loci. We then selected relatively abundant ESTs, and assigned these contigs to each of the homoeologous chromosomes using a nullisomic-tetrasomic series of hexaploid wheat and pyrosequencing. Pyrosequencing was found to be superior to other systems such as DHPLC (Hoogendoorn et al. 1999) and SnapShot (Makridakis and Rechardt 2001), because of its linear dose-response and high throughput. Furthermore, pyrosequencing data were available for stretches of several dozen nucleotides, so that we could obtain information not only on the ratio of SNPs in the hexaploid background, but also on the linkage of SNPs belonging to each haplotype in the three genomes. Consequently, the pyrosequencing method is the most suitable method for SNP analysis among cultivars or strains of polyploid plants.

We assigned the genes to each of the homoeologous chromosomes by combining the pyrosequencing method with the nullisomic-tetrasomic series of hexaploid wheat (Table 1). Since SNPs that distinguish between the homoeologues of hexaploid wheat were estimated to occur once per 144.9 bp, it is reasonable to expect that every single-copy gene in each genome can be distinguished and assigned to its homoeologous chromosome. Once the haplotypes of the homoeologous genes have been established, we can map the genes on each homoeologous genetic map, even in populations obtained from crosses between wheat strains. Because SNPs are highly polymorphic, every gene should contain a few SNPs, even among strains (Cho et al. 1999). These high-density EST markers, combined with QTL data for phenotypic characters, will provide a new system of breeding, i.e., gene-mediated breeding instead of marker-assisted selection (Lange and Whittacker 2001). We have now begun a project aimed at analyzing SNPs in common wheat strains.

Distinguishing homoeologous transcripts from three distinct genes

It is well known that polyploidy induces heterosis, but few genome-wide studies have been carried out to clarify the molecular basis of polyploid heterosis (Soltis and Soltis 1999). We have analyzed the expression of 90 genes whose transcripts are relatively abundant. Expression patterns were classified into four categories (Fig. 3). (1) Genes that were expressed from all three genomes in the tissues in which they were expressed. (2) Genes that could potentially be expressed from all three genomes, but were actually expressed preferentially from one or other genome in different tissues. Genes belonging to these two categories have not been subjected to inactivation/elimination during diploidization of polyploids (Ozkan et al. 2001; Shaked et al. 2001). (3) Genes that were inactive in one genome, while usage of the other two genomes was random. (4) Genes that were inactive in one genome, and showed preferential expression from one of the other two in some tissues. Gene silencing or shut-down of gene expression might be a process that accompanies diploidization. It should be emphasized again that the expression or silencing of genes took place randomly on seven homologous chromosomes and three genomes, and revealed no significant chromosome or genome specificity. Therefore, it can be concluded that the regulation of gene expression and silencing among homoeologues takes place at the level of the individual gene, and not at the chromosome or genome level. Further examination of gene expression patterns inferred from SNP analysis of the three homoeologous genomes of wheat should eventually provide an understanding of the molecular basis of heterosis in polyploid plant species.

Acknowledgements

This work was supported by Grants-in-Aid for Scientific Research on Priority Areas (C) “Genome Science” and Basic Research (A) (Nos. 13202055 and 13356001) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Copyright information

© Springer-Verlag 2003