Assembly, ORF Prediction, and Annotation
In the global assembly, i.e. when the sequence reads of all 5 libraries were pooled, 8573 passed EST reads clustered into 3593 tentative unigenes (TUGs): 2496 singlets and 1097 contigs, i.e. clusters consisting of 2 or more reads.
The annotations obtained with the different protein and domain databases were largely consistent. We considered only TUGs ≥ 100 nt in length and only hits in a positive reading frame. A core of 1893 TUGs could be annotated with all three databases (Fig. 1), whereas 753 TUGs could not be annotated at all, giving a total annotated fraction of 79% of all TUGs (2840/3593). Using the ProDom database, a total of 115 TUGs were associated with the keyword "transcription," and 35 (0.94%) were specifically designated as transcription factors. Most hits to the Gene Ontology (GO) database were to genes of Arabidopsis thaliana (mouse-ear cress; 1857/3593) and Oryza sativa (rice; 853/3593). According to the database comparisons, 761 of all TUGs (singlets or contigs) contained the complete open reading frame (ORF), 899 contained no ORF or were not annotated, 1060 and 545 contained portions of the 3′ and 5′ untranslated regions (UTRs), respectively, and 481 TUGs contained stretches of both UTRs.
After initial annotation, in 188 cases 2 or more TUGs showed the same highly significant BLASTX score for a particular SwissProt protein. In these cases, the assembly into TUGs was edited manually. In 84 cases, we obtained novel "merged" TUGs from 2 or sometimes 3 initial CAP3 TUGs; these are labelled "merge" instead of "contig" in the database. As the criterion for merging, tentative gene clusters were considered to belong to the same gene if they overlapped by < 20% of their read length. Otherwise, they were considered recent duplicates, on the assumption that, given sufficient sequence overlap, CAP3 is correct in producing separate clusters. This slightly altered the overall library statistics (total TUGs: 3496, including 84 merged TUGs; total annotated TUGs: 2743 = 78.5%). Differential expression among libraries was assessed with this modified data set.
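The merging rule can be written as a small decision function. This is a minimal sketch under stated assumptions: the text does not say whether the overlap is measured against the shorter or the longer cluster, so the shorter one is assumed here, and the helper name `classify_pair` is hypothetical rather than part of the authors' pipeline.

```python
def classify_pair(overlap_nt: int, len_a: int, len_b: int,
                  threshold: float = 0.20) -> str:
    """Classify two CAP3 clusters that share the same best BLASTX hit.

    Hypothetical helper. Assumption: overlap is expressed as a fraction of
    the shorter cluster's length (the source only says "read length").
    """
    if overlap_nt < 0 or len_a <= 0 or len_b <= 0:
        raise ValueError("lengths must be positive and overlap non-negative")
    fraction = overlap_nt / min(len_a, len_b)
    # Little overlap: CAP3 had too little shared sequence to join them,
    # so treat them as fragments of one gene and merge.
    if fraction < threshold:
        return "merge"
    # Substantial overlap, yet CAP3 kept them apart: assume the sequences
    # genuinely differ, i.e. putative recent duplicates.
    return "recent duplicates"


if __name__ == "__main__":
    print(classify_pair(50, 600, 400))   # 50/400 = 0.125 -> merge
    print(classify_pair(150, 600, 400))  # 150/400 = 0.375 -> recent duplicates
```

Each merge of m initial TUGs into one reduces the TUG count by m − 1, which is consistent with the drop from 3593 to 3496 TUGs after 84 merges of 2 or 3 TUGs each.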
Microsatellite Identification
In total, we identified 210 genes (6.01%) containing a total of 223 microsatellite motifs under the specified criteria, i.e. some TUGs contained more than one motif. Among these were 82 dinucleotide repeats, mostly AG/TC and AT repeats, 113 trinucleotide repeats, and a few tetra-, penta-, and hexanucleotide repeats (Table 2). As hypothesized, trinucleotide repeat motifs were more abundant within the coding region of genes (55) than outside it (24; Table 3). In contrast, repeat motifs of fewer or more than three nucleotides, which cause frameshifts when they undergo slipped-strand mutations, were found primarily in the untranslated regions of a transcript (65) and were rare in ORFs (16). This difference was statistically significant (2 × 2 contingency table, df = 1, Chi-square = 42, P < 0.0001). Consistent with the prediction that microsatellites causing no frameshift may be more abundant within ORFs, the only hexanucleotide repeat detected ((AATACC)9; unigene ZMD04004) was located within an open reading frame. This TUG had two domains resembling a zinc-finger domain in ProDom (PD007661, E = 4e-16). The identity of the gene itself is unclear; it may be a transcription factor, consistent with the zinc-finger domain, or a salt-tolerance-like gene (SwissProt ID Q9SYM2, Arabidopsis thaliana, E = 4e-20).
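The contingency test above can be recomputed from the counts given in the text. The sketch below computes Pearson's Chi-square for a 2 × 2 table with one degree of freedom, optionally with Yates' continuity correction, since the text does not state whether a correction was used. With the counts 55/24 (trinucleotide motifs inside/outside ORFs) and 16/65 (other motif lengths), the statistic comes out at about 40 without and about 38 with the correction, close to, though not identical with, the reported value of 42; in all cases P is far below 0.0001.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int, yates: bool = False):
    """Pearson chi-square test for the table [[a, b], [c, d]] (df = 1).

    Returns (statistic, p_value). The upper-tail probability of a
    chi-square variable with one degree of freedom equals
    erfc(sqrt(x / 2)), so no statistics library is needed.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction, floored at zero
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Rows: trinucleotide motifs, other motif lengths
    # Columns: inside ORF, outside ORF (UTR)
    stat, p = chi2_2x2(55, 24, 16, 65)
    print(f"chi2 = {stat:.1f}, P = {p:.1e}")
```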
Table 2 Composition and length of microsatellites detected among Zostera marina ESTs
Table 3 Position of microsatellites with respect to putative open reading frames (ORFs)
Comparison of Tentative Unigene Frequencies among Libraries
In a global comparison according to Susko and Roger (2004), divergent patterns of gene expression were detected in all pre-planned library contrasts (Table 4). We therefore proceeded with a more detailed analysis of single genes that were differentially expressed (d.e.). Given the total number of TUGs in the libraries being compared, the minimum frequency for detecting d.e. was four reads, both for library C vs. library D and for libraries D + E vs. F. Accordingly, of the subset of 149 genes for which differential expression was detectable, 7 were down- and 19 were up-regulated under winter conditions (library D) relative to summer conditions (library C; supplementary Table S1). Qualitatively, many genes of the light reaction, in particular light-harvesting proteins (e.g. TUGs contig 62, 114, merge 29, 71, 188) and the reaction center subunits themselves (contig 107, 172, 188), were more abundant under summer conditions (Table S1).
Table 4 Global comparison of EST library composition based on the frequency spectrum of single sequence reads contributing to tentative sequence clusters, according to Susko and Roger (2004)
Note that the comparison of libraries C and D lacked the statistical power of the other comparison, because the first library comprises only 1248 passed sequence reads (Table 5). Therefore, despite the very low P value of the global comparison, relatively few individual genes were detected as d.e. In the remainder of this study, we therefore focus our discussion on the comparison under experimentally induced stress conditions.
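How read counts limit the detection of single d.e. genes can be illustrated with a per-gene test. The text does not name the single-gene test used; the sketch below assumes an exact binomial test conditional on a gene's total read count, in which, under no differential expression, each read falls into library A with probability equal to A's share of all reads. The helper names and the library sizes in the test are hypothetical, and the minimum count returned depends on this choice of test and on α, so it need not match the four-read threshold reported above.

```python
import math


def binom_pmf(x: int, k: int, p: float) -> float:
    """Probability of x successes in k Bernoulli trials with success rate p."""
    return math.comb(k, x) * p ** x * (1 - p) ** (k - x)


def conditional_binomial_p(x_a: int, x_b: int, n_a: int, n_b: int) -> float:
    """Two-sided exact P for x_a reads in library A vs. x_b in library B.

    Conditions on the gene's total x_a + x_b; under the null hypothesis each
    read lands in A with probability n_a / (n_a + n_b), where n_a and n_b are
    the library sizes in reads. Sums all outcomes no more likely than the
    observed one.
    """
    k = x_a + x_b
    p = n_a / (n_a + n_b)
    observed = binom_pmf(x_a, k, p)
    tail = sum(
        binom_pmf(i, k, p)
        for i in range(k + 1)
        if binom_pmf(i, k, p) <= observed * (1 + 1e-9)  # tolerance for ties
    )
    return min(1.0, tail)


def min_detectable_total(n_a: int, n_b: int, alpha: float = 0.05,
                         k_max: int = 200):
    """Smallest total read count at which the most extreme split is significant."""
    for k in range(1, k_max + 1):
        best = min(conditional_binomial_p(k, 0, n_a, n_b),
                   conditional_binomial_p(0, k, n_a, n_b))
        if best < alpha:
            return k
    return None
```

In this conditional test the minimum depends only on the ratio of the two library sizes; a small library limits power mainly because fewer genes accumulate enough reads to reach that minimum.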
Table 5 Summary statistics of five eelgrass (Zostera marina) EST libraries
Our second comparison of libraries concerned the experimental response to heat stress (and possibly also to uprooting and translocation). Of 333 TUGs compared between libraries D + E and F, 27 (8%) were up- and 36 (11%) were down-regulated under heat stress (Tables 6 and 7). Among the strongest responses were a 7-fold up-regulation of a putative photosystem I assembly protein (SwissProt Q3BAN1) and a 6-fold increase in a light-harvesting chlorophyll-binding protein (SwissProt P27495). Down-regulations included a 15-fold reduction in a chloroplast precursor gene (SwissProt Q6K953) and a 6-fold reduction in a metallothionein-like gene (SwissProt Q40256). In several cases, one library contributed no reads to the relevant gene cluster, so that frequencies could not be estimated; fold changes may be even higher in these cases.
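Fold changes such as the 7-fold up-regulation above compare a gene's read frequency, normalized by library size, between libraries. The sketch assumes this normalization (the text does not spell out the formula); `fold_change` is a hypothetical helper, and the counts in the test are invented for illustration. It also shows why no fold change can be estimated when one library contributes no reads.

```python
import math


def fold_change(count_a: int, total_a: int, count_b: int, total_b: int) -> float:
    """Ratio of normalized read frequencies, library A over library B.

    count_*: reads assigned to the gene cluster in each library
    total_*: total passed reads in each library
    Returns math.inf if only B lacks reads (ratio unbounded, not estimable)
    and math.nan if neither library has reads for the gene.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("library sizes must be positive")
    freq_a = count_a / total_a
    freq_b = count_b / total_b
    if freq_b == 0:
        return math.inf if freq_a > 0 else math.nan
    return freq_a / freq_b
```

In this convention, a 15-fold reduction corresponds to a ratio of 1/15 in the F-over-(D + E) direction.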
Table 6 Zostera marina TUGs significantly up-regulated in library F (heat stress) with respect to library D (designated DF), E (EF), or both libraries pooled (DE-F)
Table 7 Zostera marina TUGs significantly down-regulated in library F (heat stress) with respect to library D (designated DF), E (EF), or both libraries pooled (DE-F), in descending order of total expression level
Among temperature-responsive genes, 7 of the 27 up-regulated (26%) and 5 of the 36 down-regulated (14%) TUGs had a role in photosynthesis, predominantly in the light reaction (photosystems I and II). Although several light-harvesting complex proteins were up-regulated under laboratory exposure to higher temperatures (as in contigs 983 and 787 and merges 29 and 73, Table 6), the reaction center subunits themselves were down-regulated (as in contigs 314, 326, and 341, with putative homology to photosystem I and II reaction center subunit genes, Table 7). The dark reaction was also affected: the primary gene of photosynthetic carbon fixation, Rubisco (contig69), was down-regulated 10-fold upon heat exposure.
Frequency of Microsatellites in Differentially Expressed Genes
When the abundance of microsatellites in differentially expressed genes (comparison D + E vs. F only) was compared with that in all other TUGs, no significant difference was detected. Of all up-regulated TUGs, 5.16% carried a microsatellite, compared with 13.89% of all down-regulated TUGs. Neither frequency differed significantly from the global frequency of microsatellites among all TUGs (6.01%) in Chi-square tests.
Abundance and Diversity of Genes Encoding Stress Proteins
Among 3496 TUGs, we found 9 genes encoding diverse families of heat shock proteins (HSPs), which are known to be involved in mediating responses to high temperatures and other stresses (Boston et al. 1996). All of these encode HSPs of high molecular weight (> 60 kDa; Table 8). We also identified one heat shock transcription factor B4 (singleton ZMD01094, Table 8). Because HSP genes were too rare to allow frequency-based tests on single genes, we pooled them according to the GO category (biological process) "response to heat." Interestingly, we found a higher frequency of HSP genes sensu lato under summer conditions (library C vs. D, 7 vs. 11 reads; P = 0.035), but no significant difference in frequency between the "winter" and the heat stress libraries, with largely similar contributions of HSPs to the total number of reads [library D (7 reads), E (9), F (13)]. One stress-mediating gene significantly up-regulated under heat stress, a Mn-superoxide dismutase (contig901, SwissProt P35017, Table 6), may be involved in scavenging reactive oxygen species.
Table 8 Zostera marina putative HSP (heat shock protein) encoding genes and heat shock transcription factors
Comparison of Z. marina Differentially Expressed Genes Against Arabidopsis thaliana
Among the 63 TUGs that were differentially expressed (d.e.) in library F (heat stress, 25°C) vs. D + E, a BLASTX search against the MIPS Arabidopsis database yielded 48 significant hits. These were distributed over all five chromosomes of the Arabidopsis genome; there were also three significant hits in the chloroplast genome (ycf4, rps18, atpA). Among these 48 genes, several GO categories were highly significantly overrepresented. Genes involved in photosynthesis and in chromatin binding were differentially regulated significantly more often than expected by chance (GO molecular function, both P < 0.001). In terms of the GO category biological process, photosynthesis was more affected than any other process (P < 0.001). Finally, in terms of cellular components, mainly photosystem I components and light-harvesting complexes were d.e., confirming the qualitative findings above on the composition of significantly up- or down-regulated genes analysed individually (Tables 6 and 7).
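Over-representation of a GO category among a set of genes is commonly assessed with a one-sided hypergeometric test: given N annotated background genes, K of which carry the term, how surprising is it to find at least k carriers among n selected genes? The text does not name the test or software used, so this is an assumed, generic sketch; the function name is hypothetical, and in practice P values across many GO terms would also need correction for multiple testing.

```python
from math import comb


def go_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N: background genes with GO annotation
    K: background genes annotated with the GO term of interest
    n: selected genes (e.g. d.e. genes with an Arabidopsis hit)
    k: selected genes annotated with the term
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    start, upper = max(k, 0), min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(start, upper + 1))
    return hits / comb(N, n)
```

With the 48 d.e. genes as n, k would be the number of them annotated with a given term (e.g. photosynthesis), while N and K would come from the Arabidopsis annotation used as background.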
When Z. marina was compared with AtGenExpress, a stress-specific database of Arabidopsis thaliana, 24/63 (38%) d.e. eelgrass genes were also reported to change expression levels in A. thaliana in response to stress (Table 9). The major organ of expression varied and included both root and shoot. Interestingly, although many responded to the same stress type as in our Z. marina data (i.e. heat stress), several (10/23 = 43%) respond primarily to osmotic and salt stress in Arabidopsis. Whether this reflects functional changes of these genes in Zostera following adaptation to the marine environment requires further study.
Table 9 Similarity among eelgrass (Zostera marina) TUGs and Arabidopsis thaliana stress response
Only about half of the Arabidopsis thaliana stress response genes with putative homology to Z. marina TUGs are predominantly expressed in the shoot; the others are characteristic of the root, even though the tissue used to construct the Z. marina library contained no root material. Whether this, too, reflects functional divergence driven by the different habitat type or by the taxonomic affiliation of Arabidopsis (dicot) vs. Zostera (monocot) is unclear.
Further Characteristics of the Heat Stress Response
Under all experimental conditions, the transcriptome of Z. marina is dominated by a gene encoding a cysteine-rich metallothionein-like protein (mt3), which accounts for between 2.5% and 15% of all transcripts (contig479, SwissProt ID Q40256). Although one reported primary function of such genes is heavy metal homeostasis, in particular of copper (Guo et al. 2003), such a dominant frequency suggests that this gene likely serves other important functions as well. Note, however, that this putative metallothionein is down-regulated approximately 6-fold under temperature stress. Almost exactly the same down-regulation was observed in winter (library D) compared with average summer conditions (library C, Table S1). Interestingly, the transcriptome of the Mediterranean seagrass Posidonia oceanica also contains a similar TUG that is even more abundant (G. Procaccini, personal communication, 2007). Finally, we have probably identified several genes with high similarity to genes of Dictyostelium, a social amoeba or slime mold (Table 6). One of these, encoding a 26S proteasome regulatory subunit, is significantly up-regulated under heat stress (merge17; SwissProt ID P02889).