Potato unigene assembly
A total of 219,507 ESTs passed through the sequence-cleaning pipeline and were used in the unigene assembly. The ESTs were grouped into 18,343 clusters, and the assembly yielded 26,474 contigs with an average length of 991 bp. The contigs comprised 199,636 ESTs. Of the remaining 19,871 singleton ESTs, 9,800 sequences clustered together with other sequences. The singleton ESTs had an average length of 561 bp. The combined contigs and singletons produced a total of 46,345 unique sequences with an average length of 806 bp. From these unigenes, 60-mer oligos were designed for 42,034 of the sequences using Agilent’s proprietary methods. The sequences on the POCI array represent a total of 184,620 ESTs. A comparison of the sequences on the POCI array to those on the TIGR cDNA array indicated that 19,986 (47.5%) of the POCI sequences were not present on the TIGR array. In contrast, only 80 (0.7%) of the TIGR chip sequences were unique at the same BLAST E value (<1 × 10^−10). To give POCI users access to all relevant data concerning sequences on the POCI chip, we set up a searchable POCI database (http://pgrc.ipk-gatersleben.de/poci). Sequences of synthesized oligonucleotides as well as unigenes can be viewed, downloaded, and searched by their unigene identifier. The identifiers of the ESTs that were used to generate the unigene set are available, together with their membership relation to the unigene identifiers. ESTs that did not cluster with any other EST are marked as singletons.
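The counts reported above can be cross-checked with simple arithmetic. The following sketch only restates numbers from the text; the function and variable names are illustrative, and the weighted mean length is approximate because the reported averages are themselves rounded.

```python
# Counts and average lengths as reported in the text.
CONTIGS, CONTIG_MEAN_BP = 26_474, 991
SINGLETONS, SINGLETON_MEAN_BP = 19_871, 561
ESTS_IN_CONTIGS = 199_636
OLIGOS = 42_034
POCI_NOT_ON_TIGR = 19_986


def unique_sequences():
    """Unigenes = contigs + singletons."""
    return CONTIGS + SINGLETONS


def ests_accounted_for():
    """ESTs inside contigs plus singleton ESTs."""
    return ESTS_IN_CONTIGS + SINGLETONS


def weighted_mean_length():
    """Mean unigene length implied by the two reported (rounded) averages."""
    total_bp = CONTIGS * CONTIG_MEAN_BP + SINGLETONS * SINGLETON_MEAN_BP
    return total_bp / unique_sequences()


def percent_not_on_tigr():
    """Share of POCI array sequences without a match on the TIGR array."""
    return round(100 * POCI_NOT_ON_TIGR / OLIGOS, 1)


print(unique_sequences(), ests_accounted_for())
print(weighted_mean_length(), percent_not_on_tigr())
```

The sums reproduce the 46,345 unigenes and the 219,507 cleaned ESTs, and the implied mean length lies within 1 bp of the reported 806 bp.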
A self-BLASTN was also conducted to allow microarray users to identify similar genes on the POCI array. Designed oligos were also compared with the unigenes to identify similar genes. This information is useful for the study of candidate genes and pathways, as well as for explaining similar gene expression profiles. A summary of the self-BLAST and oligo-BLAST data is presented in Table 2, and the complete data set can be searched on the POCI website. The majority (84.3%) of the oligos did not have significant similarity to nontarget unigenes (Table 2). Only 120 oligo sequences perfectly matched nontarget unigenes, and most (6,346 out of 6,588) of those oligos with significant nontarget hits had less than 90% similarity (data not shown).
POCI gene annotation
The POCI unigenes were blasted against the NCBI nr database and the Uniref100 protein database for functional annotation and GO. Figure 1 summarizes the biological functional annotation of the POCI sequences. Of the total number of unigenes on the array, 13,900 (33.1%) did not have a significant BLAST hit against the protein nr database. The computed annotation of the POCI unigenes with NCBI nr and GO is also available and searchable by keywords in the previously mentioned POCI database (http://pgrc.ipk-gatersleben.de/poci). Moreover, complete results of alignments against several relevant databases are available. In detail, these are:
To find possible homologous sequences, the algorithms blastn and blastx were used for Solanaceae databases and less related organisms, respectively. For further investigation, whole-sequence alignments can be downloaded. Because all these results are precomputed and stored, a reverse search by database identifiers is also possible. This is especially helpful to find representatives/homologues of candidate genes derived from other platforms on the POCI chip. If the respective identifier is not known, direct BLAST searches (blastn and tblastx) using single sequences can be performed against the POCI unigene set.
The correct functional annotation of all features on the POCI array is important if one wants to infer biological meaning from obtained expression data. Therefore, the POCI database also contains a POCI annotation tool that works as a platform for manual curation of the POCI annotation. With this tool, the precomputed annotation for a unigene can be changed by every registered user with an editing account. Every annotation that has not been curated manually is marked with “NOT VALIDATED.” This tool will facilitate ongoing improvement of the annotations of sequences on the chip.
Transcriptional changes during potato tuber development
The POCI array design is based on a large EST collection containing genes from a wide variety of tissues, developmental stages, and treatments, allowing for a diverse range of gene transcription studies covering all aspects of potato biology. In this paper, we describe a first use of the POCI array in a study to follow transcriptional changes during the process of potato tuber formation to identify genes driving potato tuber initiation and growth. Since the tuber development range has been studied previously using a small dedicated spotted cDNA microarray (Kloosterman et al. 2005), the newly obtained data can be assessed through comparison with the previously published data set. The tuber development range includes the various physiological events leading up to the induction, formation, and early growth stages of a potato tuber (Fig. 2a). The original developmental range described in Kloosterman et al. (2005) contained two additional developmental stages (stages 6 and 8) that were excluded from the present study to allow for an optimal loop design (Fig. 2b) and to reduce the number of hybridizations. Stages that were included are: stage 1, stolon tip grown under long day conditions; stage 2, stolon tip under short day (SD) conditions; stage 3, subapical stolon swelling 6–7 days after the switch to SD conditions; stage 4, tuber initiation (7–8 days); stages 5 and 7, tuber growth stages (9.5 and 11 days). The six developmental stages were labeled and hybridized to the array, and the data were analyzed as described in the “Materials and methods.”
A total of 31,293 features passed the quality check and showed significant expression levels; their estimated log ratios were selected for further analysis (Supplementary Table S2). Principal component analysis (PCA) showed a clear influence of the developmental time range, with the first component (PC1) explaining 67.8% of the variance (PC1 = 67.8%, PC1 + 2 = 89.9%, PC1 + 2 + 3 = 95.2%). A similar distribution of the developmental stages was found in the previous study, in which the time component (X-axis) explains a large percentage of the observed variance (Fig. 3). The second component (Y-axis) explains 22.1% of the observed variation. This component can probably be attributed to transcriptional changes during stages 3 and 4, during which the transition from stolon to tuber takes place.
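The cumulative percentages quoted for the principal components follow directly from the per-component values. In the short sketch below, the PC1 and PC2 values are taken from the text, whereas the PC3 value is inferred as the difference between the reported cumulative figures (95.2 − 89.9) and is therefore an assumption rather than a reported number.

```python
from itertools import accumulate

# Percent variance explained per component.
# PC1 and PC2 are stated in the text; PC3 is inferred (95.2 - 89.9).
explained = [67.8, 22.1, 5.3]

# Running totals, rounded to one decimal as in the text.
cumulative = [round(value, 1) for value in accumulate(explained)]
print(cumulative)
```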
Genes were tested for differential expression within each of the six developmental stages (p < 0.05) based on the estimated log values and standard errors. A total of 15,959 features showed differential hybridization in at least one of the six developmental stages, while 15,334 features were not significantly differentially expressed (Table 3). The latter group is, however, likely to include genes that are differentially expressed from a biological standpoint and therefore still interesting but could not be classified as such due to large standard deviations. The inclusion of more biological replicates could potentially impart more significance to such data.
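As an illustration of how a per-stage test on estimated log ratios and standard errors could work, the sketch below applies a two-sided test under a normal approximation. This is an assumption made for illustration only: the actual procedure is the one described in the “Materials and methods,” and the function names and example values here are hypothetical.

```python
import math


def two_sided_p(log_ratio, std_err):
    """Two-sided p-value for H0: log ratio = 0 (normal approximation)."""
    z = abs(log_ratio) / std_err
    return math.erfc(z / math.sqrt(2.0))


def differential_in_any_stage(estimates, alpha=0.05):
    """estimates: iterable of (log_ratio, std_err) pairs, one per stage."""
    return any(two_sided_p(lr, se) < alpha for lr, se in estimates)


# Hypothetical feature at six stages (values invented for illustration).
feature = [(0.1, 0.3), (0.2, 0.3), (1.2, 0.4), (1.5, 0.4), (0.4, 0.3), (0.2, 0.3)]
print(differential_in_any_stage(feature))

# Counts from the text: differential + nondifferential = features analyzed.
print(15_959 + 15_334)
```

A feature with the same log ratios but larger standard errors would fail this test, which is the situation described above for genes that could not be classified as differentially expressed.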
Genes exhibiting differential expression during the process of potato tuber formation exhibit both unique and common expression profiles. To identify the most common expression profiles, self-organizing maps were calculated. The most common expression profiles found were categorized in six major groups with one other group containing all remaining profiles (Table 3). Within the nondifferentially expressed class, we were able to identify 1,258 features that showed very low variation in transcript levels during the developmental time range. Such genes were classified as constitutively expressed during tuber development, and the average profile of this group was used for comparison to other expression profiles (Fig. 4).
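To illustrate how self-organizing maps group expression profiles, the following minimal sketch trains a one-dimensional map on a few invented six-stage profiles. It is not the implementation used in this study; the initialization, learning-rate schedule, neighborhood function, and example data are all assumptions chosen for brevity.

```python
import math


def bmu_index(nodes, profile):
    """Index of the node closest (squared Euclidean distance) to the profile."""
    return min(
        range(len(nodes)),
        key=lambda i: sum((w - x) ** 2 for w, x in zip(nodes[i], profile)),
    )


def train_som(profiles, n_nodes, epochs=50, lr0=0.5, sigma0=1.0):
    """Train a 1-D self-organizing map; returns the node weight vectors."""
    # Deterministic initialization: copy evenly spaced input profiles.
    step = max(1, len(profiles) // n_nodes)
    nodes = [list(profiles[(i * step) % len(profiles)]) for i in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.01)  # shrinking neighborhood
        for profile in profiles:
            best = bmu_index(nodes, profile)
            for i, node in enumerate(nodes):
                # Gaussian neighborhood around the best-matching unit.
                h = math.exp(-((i - best) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(node)):
                    node[k] += lr * h * (profile[k] - node[k])
    return nodes


# Invented log-ratio profiles over six stages: two falling, two rising.
profiles = [
    [2.0, 1.5, 0.0, -1.0, -1.5, -2.0],
    [1.8, 1.2, 0.1, -0.9, -1.4, -1.9],
    [-2.0, -1.5, 0.0, 1.0, 1.5, 2.0],
    [-1.7, -1.3, -0.2, 1.1, 1.6, 2.1],
]
nodes = train_som(profiles, n_nodes=2)
labels = [bmu_index(nodes, p) for p in profiles]
print(labels)
```

Each profile is assigned to the map node it matches best, so features sharing a node share an expression profile; a real analysis would use more nodes and the full set of differentially expressed features.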
It is interesting to note that the most common expression profile found among the genes analyzed follows a downregulated expression pattern, which is particularly strong at tuber onset (Fig. 4a) and represents 3,778 features (Table 3). A much smaller set of genes (862 features) also follow a downregulated expression pattern with a delay compared to the first group, dropping in expression from stages 4, 5, and onward (Fig. 4b). In general, these genes showed a less pronounced decrease in comparison to the early downregulated subset (Fig. 4a,b).
The second largest group (Table 3; 3,544 features) exhibits a temporarily upregulated expression pattern at tuber onset (Fig. 4e). As stolon growth is inhibited and cells in the subapical stolon region follow a transition to the tuber state, the expression of new sets of genes and proteins is required to facilitate this transition and the initiation of cell division and expansion. Based on the GO classification of genes represented in the POCI array, the majority of genes in the transient upregulated group could be unequivocally assigned to a specific GO classification associated with translation (52%; GO: 0006412) and cell cycle (24%; GO: 0007049; data not shown). Similarly, a large proportion of genes within this specific profile group are, based on sequence homology, associated with the ribosome (33%; GO: 0005840), with respect to their cellular process (data not shown).
In sharp contrast, few genes exhibiting a transient downregulated expression profile (Fig. 4f) are associated with translational activity or cell cycle (data not shown). During the transition of a stolon to a tuber, the plane of cell division and expansion is reorientated to facilitate radial swelling (Xu et al. 1998b). The cytoskeleton has to be reorientated in a similar fashion during these stages. In fact, the largest proportion of genes associated with the cytoskeleton (32%) follows a transient downregulated trend (data not shown). The transient downregulated profile group, however, is much smaller (1,492 features) in comparison to the transient upregulated profile (3,544 features; Table 3).
Analogous to the early and late downregulated profiles, we could make a distinction between early and late upregulated transcript profiles. The early upregulated genes (Fig. 4c) exhibit an increase in transcription levels during stage 2, often prior to visible swelling, indicating early transcriptional control and a changing metabolic status, as many sucrose-related genes are strongly induced. The import of sucrose, initially apoplastic and subsequently symplastic (Viola et al. 2001), is undoubtedly a major factor in potato tuber formation. In vitro tuberization requires a certain threshold of sucrose concentration, and a clear link with gibberellic acid (GA) metabolism has been made previously (Xu et al. 1998a). Overall, genes in the early upregulated category are strongly induced during stages 3, 4, and 5 after which transcript levels remain relatively stable during further tuber growth. On average, the upregulated genes exhibiting a somewhat delayed increase in transcript levels (stages 4–7), clearly visible from stage 4 onward (Fig. 4d), continue to show a strong increase until the last included developmental stage at which tuber filling is thought to be at its peak.
The time point and developmental stages at which transcriptional changes occur provide clues as to their biological function. They also indicate a shift in the metabolic status during tuber transition and subsequent tuber growth that can be analyzed in more detail. The expression level of a gene that has been shown to be important for tuber formation (StGA2ox1) serves as a good example of the set of genes that are early induced at tuber onset. StGA2ox1 has recently been shown to be strongly upregulated at tuber onset in the subapical stolon region and is thought to be involved in the orientation of initial cell divisions and expansion at tuber organogenesis (Kloosterman et al. 2007). StGA2ox1 was first identified using the small dedicated cDNA microarray, and its strong upregulated expression pattern was later confirmed by quantitative reverse transcriptase polymerase chain reaction (qRT-PCR; Kloosterman et al. 2005). StGA2ox1 expression data obtained with the POCI array provides a similar expression pattern but achieves a much higher level of sensitivity when compared with the cDNA array results (Fig. 5). Similarly, expression profiles of other candidate genes that exhibit strong transcriptional control at tuber onset have been shown to correlate well with qRT-PCR data. Expression data of differentially expressed genes and other gene expression profiles represented on the array can be found in Supplementary Table S2.
Biological function of expression profiles
Based on the observed common expression profiles, gene function of individual members or individual metabolic routes that are over-represented within a particular profile can be assessed. As mentioned above, genes associated with the cell cycle (GO: 0007049) are over-represented (24%) within the transient upregulated profile group (data not shown). This is not surprising, as during the stages of tuber initiation, an increase in cell division has been reported (Xu et al. 1998b), followed by a growth period of active cell division and expansion until the final tuber size is reached. The bulk of the mature tuber is formed by the perimedullary region, which originates from a small band of cells around the vascular bundle during tuber growth (Xu et al. 1998b). Mitotic activity and cell expansion occur particularly during the stages of stolon swelling and initial tuber growth, allowing for a rapid growth period of the newly formed tuber. The transient increase in transcript levels of cell cycle-related genes serves as a good marker for the increase in overall cell cycle activity during stolon swelling (Fig. 6).
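One common way to quantify whether a GO category is over-represented in a profile group is a one-sided hypergeometric test. The text does not state which test, if any, was applied, so the sketch below is purely illustrative: the function name is hypothetical and the counts in the example are invented, not taken from the POCI data.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, drawn, observed):
    """P(X >= observed) when `drawn` genes are sampled without replacement
    from `population` genes, of which `annotated` carry the GO term."""
    upper = min(drawn, annotated)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return favourable / comb(population, drawn)


# Invented example: 1,000 genes on an array, 50 annotated with a GO term,
# 100 genes in a profile group, 24 of which carry the term.
p_value = hypergeom_upper_tail(1000, 50, 100, 24)
print(p_value)
```

A small p-value indicates that the category occurs in the group more often than expected from its frequency on the array as a whole.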
The processes and genes involved in the initiation of cell division in the subapical region of the stolon, as well as the time point of inhibition of further cell divisions, are still largely unknown. We have recently shown that an auxin-dependent regulator of cell growth (EBP1) is transiently upregulated during tuber development. StEBP1 was shown to be required for dose-dependent expression of cell cycle genes in potato and Arabidopsis (Horvath et al. 2006). Using the self-BLAST information, we identified eight unigenes present on the POCI array that show high homology to the StEBP1 nucleotide sequence, of which six gave a significant signal throughout the tuber development range (Fig. 6). The unigene MICRO.127.C2 shares the highest homology to the cloned StEBP1 (99%) and confirms the transient upregulated profile published previously (Horvath et al. 2006). The remaining unigenes could represent closely related family members, different allelic variations, or nonoverlapping 5′- and 3′-derived ESTs from the same transcribed gene and are referred to here as EBP1-like genes. MICRO.127.C1, MICRO.127.C3, PPCAS07TH, BPLI4F10TH, and MICRO.127.C4 have regions of high sequence homology with the coding sequence of StEBP1/MICRO.127.C2, but variation in the nucleotide sequence is significant and is particularly evident in the 3′-untranslated region, allowing differentiating probe designs (Fig. 7). MICRO.127.C4 has a unique 73-bp insertion that is absent in the homologous unigenes and StEBP1. Based on the oligo design for this probe, cross-hybridization is unlikely, and therefore, it is interesting to note that MICRO.127.C4 exhibits an expression pattern similar to that of MICRO.127.C2 (StEBP1; Fig. 6) and thus may reveal allelic variation or the existence of homologous genes that could fulfil redundant or indeed novel functional roles in controlling cell cycle-related processes.
Similarly, MICRO.127.C1 codes for a predicted protein of 377 amino acids, sharing 92% sequence similarity to StEBP1, but it exhibits a more upregulated profile during development, very similar to the expression profile of singleton PPCAS07TH. Singleton BPLI4F10TH and MICRO.127.C3 show a similar expression pattern with elevated transcript levels at stage 1 and stages 4 and 5 (Fig. 6). In addition, the BPLI4F10TH sequence reveals a 16-bp deletion in a relatively conserved region on which the probe design was targeted (Fig. 7). Hence, by designing the unique potato 60-mer oligo in regions of high sequence variation, closely related family members or large allelic variations can be distinguished. This provides additional levels of gene expression information less likely to be obtained through conventional spotted cDNA microarrays. However, observed expression profiles of highly homologous genes should always be verified by qRT-PCR using gene- or allele-specific primers.
Metabolic pathway analysis
In a previous study (Kloosterman et al. 2005), gene expression profiles of enzymes active in the starch biosynthesis route were assessed during potato tuber development and provided an overview of the key metabolic steps. The POCI array comprises roughly 20 times the number of unigenes compared to the dedicated cDNA array that was used in the previous developmental range study. This should allow the analysis of most metabolic routes relevant to potato researchers. As an example, the biosynthesis route of carotenoids during potato tuber development is analyzed in more detail (Fig. 8). Individual enzyme reactions are often represented by multiple features on the POCI array; these are likely to represent either the same gene, with oligos designed on nonoverlapping 5′ and 3′ ESTs (gene redundancy), or allelic variants or closely related gene family members. Within the schematic representation (Fig. 8), only genes with a significant homology to known proteins at the amino acid level (>83%) are represented. Various carotenoids accumulate in different concentrations depending on genotype (Morris et al. 2004). It is interesting to note that the most common expression profile of genes in the carotenoid pathway is transient upregulation at tuber organogenesis (Fig. 8, stages 3–4: phytoene synthase, phytoene desaturase, lycopene beta-cyclase, lycopene epsilon-cyclase, carotenoid hydroxylase 1, 9-cis-epoxidase). This indicates a shift in the metabolic flux within this pathway at these stages, and this is in line with high total carotenoid content early in tuber development, as described by Morris et al. (2004).
A second expression pattern that can be observed for a number of genes within the pathway is characterized by an increase in transcription levels starting at tuber onset (Fig. 8, stage 3: zeta-carotene desaturase, carotenoid isomerase, LCY-b, carotenoid hydroxylase 2 [CHY2], neoxanthin synthase [NXS], and carotenoid cleavage dioxygenase [CCD]). In particular, CHY2 shows a very strong increase in transcript levels during the later stages of tuber growth. It is interesting to note that the CHY2 gene has been mapped to chromosome 3, directly under a quantitative trait locus for flesh color within a diploid potato population and is considered to be a likely candidate for the observed variation in carotenoid content (data not shown). A recent study by Diretto et al. (2007) shows that silencing of both beta-carotene hydroxylase 1 and 2 results in a dramatic shift in carotenoid concentrations. Potato genes with high homology to both a tomato LCY-b (88%) and a potato NXS (89%) show a strong upregulation similar to CHY2. Based on the sequence homology, it currently remains unclear which of the two reactions is catalyzed by the predicted protein, which is therefore represented in both reactions (Fig. 8).
The oxidative cleavage of several major carotenoids leads to the production of apocarotenoids. These reactions are catalyzed by a family of CCDs (Auldridge et al. 2006). Abscisic acid hormone levels are thought to be at least partially controlled by carotenoid cleavage dioxygenases, and several studies have shown that the loss of these cleavage enzymes has an effect on plant development (reviewed in Auldridge et al. 2006). It is, however, important to note that the current analysis was performed in S. tuberosum cv. Bintje, which is generally considered to contain low amounts of carotenoids, and therefore, expression profiles may deviate significantly in comparison with high-carotenoid content tubers as shown by Morris et al. (2004). We have presented the biosynthesis route of the major carotenoids as an example of transcriptional control analysis using the POCI array; however, other metabolic routes could be analyzed equally well in a similar fashion using gene annotation and GO classification (http://pgrc.ipk-gatersleben.de/poci).
Within the POCI consortium, we have constructed a new platform for transcriptional analysis of potato using in situ synthesized 60-mer oligo arrays. The oligo design is based on the extended potato unigene set comprising more than 246,000 EST reads from a wide variety of sources. The oligo array contains 42,034 potato features. Its application was first tested through the hybridization of a potato tuber developmental range, which provided novel insights into the genes driving this complex process. The design, annotation, and expression data will be made available and will serve as a database resource for potato researchers worldwide (http://pgrc.ipk-gatersleben.de/poci). With the potato genome-sequencing project well underway (www.potatogenome.net), the availability of such a high-throughput transcriptional analysis system will prove important in functional gene annotation and unraveling of complex metabolic or developmental pathways such as tuber development.