Background

White spot syndrome (WSS) is a highly contagious viral disease of penaeid shrimp. The cumulative mortality of diseased shrimp can reach 100% within 3–10 days. Since its first outbreak in 1993, WSS has caused serious economic losses to the shrimp farming industry worldwide. The causative agent, white spot syndrome virus (WSSV), is an enveloped, non-occluded, rod-shaped virus that contains a circular, double-stranded DNA of about 300 kb. This virus has an extremely wide range of potential hosts, infecting not only shrimps, but also other decapods [1, 2]. WSSV infects most shrimp tissues and organs, and it replicates in the nuclei of infected cells. At the late stage of infection, either the nucleus or the whole cell disintegrates, leading to loss of cellular architecture. Both genomic and proteomic approaches have revealed the unique characteristics of the virus, and the virus has been erected as the type species of the new family of Nimaviridae [3–5].

Due to its serious impact on shrimp aquaculture, there is an urgent need to understand WSSV and to unveil the underlying mechanisms involved in WSSV pathogenesis in shrimp. Although considerable progress has been made in characterizing the virus, information on the host genes involved in WSSV pathogenesis is limited. To identify these host genes, one strategy is to isolate genes that are differentially expressed after WSSV infection. For that purpose, a variety of approaches have been used, including an mRNA differential display technique [6], suppression subtractive hybridization (SSH) [7], SSH and differential hybridization [8], cDNA microarrays [9, 10] and expressed sequence tags (ESTs) [11]. Both cDNA microarrays and EST libraries are particularly suitable for large-scale gene expression analysis, and both of these methods have been well developed in several model organisms. However, the application of these methods to shrimp is still in its infancy. Consequently, compared to model organisms, relatively few sequenced ESTs and microarray cDNA targets are available for shrimp. Furthermore, all of the studies cited above focused exclusively on the identification of immune-related genes, with only immune-related organs (i.e., the hemocytes and the hepatopancreas [HP]) being analyzed. However, WSSV is a systemic virus that infects most shrimp tissues and organs, and it is logical to assume that in different cell types, the virus would be likely to modulate the expression of different host genes in order to promote its multiplication in the correspondingly different cellular contexts. If so, then the gene expression changes induced by WSSV in immune-related cells should be different from those in non-immune cells. Therefore, in the present paper, rather than using a specific tissue or organ to investigate only the gene expression patterns of immune-related cells, we instead take a global view by using entire P. monodon postlarvae as our study subject.
Our large-scale EST approach used two different cDNA libraries, one from normal and one from WSSV-infected P. monodon postlarvae. The respective EST data were then compared to predict the gene expression changes in host shrimp after WSSV infection.

As an additional benefit, this large-scale EST study also increases our transcriptomic data for Crustacea and penaeid shrimp. This in turn will improve our understanding of penaeid shrimp biology, which is important because the penaeid shrimp are economically valuable and yet they remain vulnerable to outbreaks of various viral diseases. Research into the genetics and genomics of shrimp has been gaining in importance over the past decade, and a number of penaeid shrimp EST projects have already been undertaken. However, most of these projects and their associated EST libraries were small in scale [12–15]. Currently, the two largest penaeid shrimp EST studies have published 13,656 and 10,100 ESTs, respectively [16, 17]. We hope that, together with the 15,981 additional ESTs released with this report, this will provide a good foundation for further research into the genetics, genomics and even the proteomics of shrimp.

Results

Generation and analysis of EST libraries

Two cDNA libraries, PmTwN and PmTwI, were constructed from normal and WSSV-infected postlarvae of P. monodon, respectively. No normalization was applied to these two libraries. A total of 7,200 and 8,064 clones were randomly selected from the normal and infected libraries for DNA template preparation, respectively. After template quality screening, a total of 6,964 and 7,686 cDNA clones were sequenced from the 3' end from the normal and infected libraries, respectively. After base-calling, vector sequence trimming and screening to eliminate low quality sequences and contamination from WSSV and other sources, a total of 6,658 and 7,276 high quality 3' ESTs were generated from the normal and infected libraries, respectively (Table 1).

Table 1 Summary of cDNA libraries and EST data

After finishing the 3' end sequencing, as well as 3' end sequence assembly and annotation, we next performed DNA sequencing from the 5' end for a fraction of the cDNA clones. We randomly chose cDNA clones from the 3' EST contigs that showed no significant hits in BlastX searches against the NCBI nr database. From the normal and infected libraries, 1,036 and 1,119 clones, respectively, were subjected to 5' end sequencing with the SP6 primer. After base-calling, trimming the vector sequence and eliminating low quality sequences, 978 and 1,069 high quality 5' ESTs were generated (Table 1). These were then combined with the high quality 3' ESTs for the final assembly and annotation.

Overall, these high quality ESTs were derived from 6,671 normal and 7,298 infected cDNA clones. The average lengths of the high quality sequences were 678 bp and 652 bp in the normal and infected libraries, respectively. The CAP3 assembly program produced 9,622 unique sequences. Of these unique sequences, 8,258 were singlets, consisting of only one EST, and the other 1,364 were contigs, consisting of at least two ESTs. Most contigs contained 2–4 ESTs, and the largest contig was formed by 854 ESTs. The average length of the unique sequences was 678 bp (Table 1).

We also noted that when both libraries were checked for contamination by WSSV sequence, 167 ESTs were found to be WSSV contaminants in the PmTwI library (E value < 10^-25, score > 100), whereas there were no WSSV contaminants in PmTwN. This result is consistent with the PCR screening results for the original, unchallenged postlarvae, and it reconfirmed that the shrimp used in the present study were WSSV-free.

Sequence similarity

BlastX found 2,027 (21.07%) unique sequences similar to known protein sequences in the NCBI nr protein database and 2,026 (21.06%) unique sequences that had GO annotations in the UniProt database. Working backwards from the unique sequences to the original ESTs in the normal library (PmTwN), this translates to 3,022 (45.30%) ESTs that match known protein sequences in the nr database, and 2,870 (43.02%) ESTs with matches in the UniProt database. In the infected library (PmTwI), the corresponding number of matches are 3,338 (45.74%) and 3,202 (43.88%), respectively. These data are summarized in Table 1.
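The percentages above can be rechecked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the original analysis pipeline. It assumes that the unique-sequence percentages use the 9,622 unique sequences as the denominator, and that the per-library percentages use the numbers of cDNA clones reported above (6,671 and 7,298); the latter is an inference from the arithmetic rather than something stated in the text. The function name `percent` is hypothetical.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Denominators taken from the Results text (assumed, see lead-in).
UNIQUE_SEQUENCES = 9622
CLONES = {"PmTwN": 6671, "PmTwI": 7298}

# Numbers of matches reported in the text.
nr_unique, uniprot_unique = 2027, 2026
nr_ests = {"PmTwN": 3022, "PmTwI": 3338}
uniprot_ests = {"PmTwN": 2870, "PmTwI": 3202}

unique_pct = (percent(nr_unique, UNIQUE_SEQUENCES),
              percent(uniprot_unique, UNIQUE_SEQUENCES))
library_pct = {
    lib: (percent(nr_ests[lib], CLONES[lib]),
          percent(uniprot_ests[lib], CLONES[lib]))
    for lib in CLONES
}

if __name__ == "__main__":
    print("unique sequences (nr, UniProt):", unique_pct)
    for lib, values in library_pct.items():
        print(lib, "(nr, UniProt):", values)
```

Under these assumptions, all six percentages quoted in the paragraph are reproduced to two decimal places.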

Identification of the most abundant genes in each library

Tables 2 and 3 list the 50 most abundant genes in each library. In the normal library (Table 2), most of these abundant genes can be classified into four major groups: proteins involved in ATP metabolism, proteins involved in translation, proteins highly or specifically expressed in muscle, and proteins highly or specifically expressed in the HP. These results probably reflect the fact that the shrimp postlarvae were in an active growth stage. During active growth, both energy and cellular translation machinery would be needed to synthesize proteins and build muscle, while the HP would be actively engaged in synthesizing digestive enzymes, hemocyanin and other proteins. By contrast, in the infected library (Table 3), the most abundant genes no longer included the HP proteins, but instead included four other groups: immune-related proteins with chitin-binding or lectin domains, proteins involved in glycolysis, cuticle-related proteins and several different actin genes.

Table 2 The fifty most abundant genes in the normal library.
Table 3 The fifty most abundant genes in the WSSV-infected library.

Identification of genes with differential abundance

Among the known genes represented by EST matches, 360 genes were found exclusively in the normal library, 361 genes were found only in the infected library, and 264 genes were cross expressed in both libraries. Based on the number of homologous ESTs, Fisher's exact test found a significant increase in abundance for 23 genes (Table 4), and a significant decrease for 25 genes (Table 5).

Table 4 Unique genes with increased differential abundance in the normal and infected libraries.
Table 5 Unique genes with decreased differential abundance in the normal and infected libraries.

Among the genes with increased abundance, several major groups can be identified, including four proteins with a chitin binding Peritrophin-A domain, seven cuticle-related proteins, four proteins involved in oxidative phosphorylation, two glycolytic enzymes and two ribosomal proteins. Other increased-abundance genes include thioredoxin-1, actin and a protein homologous to CG6055-PA. The decreased-abundance genes can also be classified into several groups, including two SCP calcium-binding proteins, five cytoskeleton/motility-related proteins, three proteins involved in oxidative phosphorylation, three ribosomal proteins and seven proteins that are produced mainly by the HP. These HP-produced proteins include four digestive enzymes, hemocyanin and two immune-related proteins, PmAV and ferritin. Other decreased-abundance genes include opsin and cAMP responsive element binding protein-like 2.

Functional classification based on Gene Ontology

The putative functions assigned to the unique sequences by the Gene Ontology (GO) classification scheme suggested that in the normal and infected libraries, respectively, 3,397 and 3,571 ESTs map to biological processes, 3,408 and 3,188 ESTs map to cellular components, and 2,012 and 2,375 ESTs map to molecular functions (Table 6). In both libraries, most of the corresponding biological process genes are involved in electron transport, transport, protein metabolism, phosphate metabolism and carbohydrate metabolism (Table 6). Table 6 also shows that most of the cellular component genes encode proteins located in the mitochondria, membrane, ribosome, cytoskeleton, and nucleus, and that most of the molecular function genes are associated with catalytic activity, transporter activity, structural molecules, and metal ion binding.

Table 6 Gene Ontology of the sequences with significant Blastx hits in UniProt

Analysis of GO categories showed a statistically significant difference (Fisher's exact test; P < 0.05) between the normal and infected libraries for several biological processes, including carbohydrate metabolism, signal transduction, response to external stimulus, microtubule-based movement, phosphate metabolism, transport and protein metabolism (Table 6). Among the processes that appear to be increased after WSSV infection, carbohydrate metabolism showed the most significant change, mostly because this GO category includes the proteins involved in the glycolytic pathway as well as proteins with the chitin binding Peritrophin-A domain, and the abundance of both of these groups was highly increased (Tables 3 and 4). Two categories, signal transduction and response to external stimulus, showed significantly decreased abundance in the infected library. We note that opsin is included in both of these categories.

Two molecular functions, structural molecule activity and carbohydrate binding, appear to be elevated significantly in the infected library (Table 6). The structural molecule activity category consisted of cuticle-related proteins (including the highly abundant BCS-1 in the infected library), ribosomal proteins, and cytoskeletal proteins. The carbohydrate-binding category included proteins with C-type lectin (CTL) and CTL-like domains and proteins with the chitin binding Peritrophin-A domain, both of which showed significantly increased abundance in the infected library (Tables 3 and 4). Significantly decreased categories included metal ion binding, motor activity, transporter activity, signal transducer activity and protein binding. The metal ion binding category consisted of a diverse array of proteins, including cytochrome c oxidase subunit II, various enzymes, the calcium-binding proteins and cytoskeletal/muscle-related proteins.

In the cellular component group, only the cytoskeleton category was significantly different (Table 6). This category included the cytoskeleton/motility-related proteins, which had a decreased abundance in the infected library.

We have constructed a database to host all the sequence data and the analysis results obtained from this study. The database can be accessed through a web interface [18].

Discussion

Shrimps are economically important cultured aquatic animals. However, compared to other aquacultured animals, there have been relatively few studies on shrimp genomics. In the present study, a global analysis of 15,981 high-quality shrimp ESTs revealed 2,027 known genes and 7,595 unknown unique sequences. These ESTs will not only be a valuable addition to the current archived sequences from shrimps, but will also provide a major resource for the comparative analysis of gene expression profiles between normal and WSSV-infected shrimps.

To the extent that changes in EST abundance (Tables 4 and 5) are predictive of changes in gene expression, the present data suggest that in P. monodon postlarvae, WSSV infection modulates the expression of various kinds of genes. The predicted up-regulated genes include several proteins involved in oxidative phosphorylation, cuticular proteins, a protein with C-type lectin (CTL) and CTL-like domains, proteins with the chitin binding Peritrophin-A domain, two glycolytic enzymes, and thioredoxin-1. The predicted down-regulated genes include several proteins that are synthesized in the HP (digestive enzymes, two immune-related proteins, and hemocyanin), five cytoskeleton/motility-related proteins, three proteins involved in oxidative phosphorylation, and opsin. Several ribosomal protein genes and actin genes are also predicted to be differentially modulated by WSSV.

Cuticular proteins

This is the first study to suggest that WSSV infection strongly up-regulates the expression of cuticular proteins. One prominent feature of arthropods is the cuticular covering of the whole body. Cuticles are highly organized structures made of chitin filaments embedded in a proteinaceous matrix, and they are produced as a layered, extracellular secretion from the underlying epidermis [19]. Pathological studies reveal that the cuticular epidermis is one of the main target tissues of WSSV [20]. At the late stage of infection, this tissue is heavily infected, loses its cellular architecture and becomes necrotic. Loosening of the cuticle and the appearance of white spots in the cuticular epidermis are two of the pathological signs caused by WSSV infection.

The white spots in the cuticle of WSSV-infected shrimp represent abnormal deposits of calcium salts by the cuticular epidermis [21]. Table 4 suggests that after WSSV infection, a gene corresponding to crayfish calcification-associated peptide-1 (CAP-1) is very strongly up-regulated. Crayfish CAP-1 is isolated from the exoskeleton, has chitin-binding ability and, most importantly, anti-calcification activity [22]. CAP-1 mRNA is strongly expressed in the epidermal tissue during the postmolt stage [23]. Inoue et al. [23] proposed that CAP-1 might play an important role in calcification and cuticle formation in the exoskeleton. If so, then the abnormal production of CAP-1 in WSSV-infected shrimp may cause the abnormal deposits of calcium salts, leading to the formation of white spots in the cuticle.

Table 4 includes a protein gene that is a homolog to BCS-1, a gene that was cloned from a subtracted barnacle cypris larval cDNA library by differential screening [24]. BCS-1 mRNA is specifically expressed in barnacle cypris larvae, and during the process of larval attachment and metamorphosis, the amount of BCS-1 mRNA is decreased [24]. The function of BCS-1 remains unknown. However, ScanProsite analysis of the BCS-1 protein sequence revealed that BCS-1 contains the chitin-binding type R&R domain profile, which is a structural feature of insect cuticular proteins. This suggests that BCS-1 is a barnacle cuticular protein, and the homolog to BCS-1 in Table 4 is therefore described as a cuticle protein.

Pathological studies have shown that WSSV infection seriously damages the cuticular epidermis, and this study now suggests that WSSV infection strongly up-regulates the expression of various cuticular protein genes. It remains unclear how and why WSSV infection induces the expression of cuticular protein genes, and whether this benefits WSSV. These are questions that deserve further study.

Gene expression in non-primary WSSV-target organs: HP, muscle and compound eye

Compared to the cuticular epidermis, the HP, muscle and compound eye are only lightly infected by WSSV, and these organs remain intact at the late stage of infection [20]. In the HP, WSSV mainly infects the myoepithelial cells of the hepatopancreatic sheath and the fibroblasts of the connective tissue, whereas the epithelial cells of the tubules, which synthesize the hemocyanin and digestive enzymes [25], are rarely infected. However, our EST analysis suggests that the RNAs of hemocyanin and several digestive enzymes are strongly reduced after WSSV infection. This suggests that although the epithelium of the tubules is refractory to WSSV infection, at least some of its physiological functions (such as gene transcription) are dramatically affected by the infection of other cell types in the HP. If expression of digestive enzymes is down-regulated in the HP, then this, together with the fact that WSSV infection targets the stomach, might well explain why WSSV-infected shrimp reduce their food consumption.

The compound eye and muscle are also only lightly infected by WSSV. Even so, it seems that infection severely decreases the transcription of several genes that are primarily and/or highly expressed in these two organs, suggesting that WSSV infection is likely to affect the functions of both organs.

Proteins with the chitin binding peritrophin-A domain

The chitin binding Peritrophin-A domain is found in chitin binding proteins, particularly the peritrophic matrix proteins of insects and animal chitinases [26, 27]. The peritrophic matrix (PM) lines the midgut of insects and it is believed that the PM facilitates digestion and forms a protective barrier to prevent invasion by bacteria, viruses and parasites [28]. There are several classes of PM proteins, and one of these protein classes, the peritrophins, has been studied extensively and has been found in several insects [26, 27, 29]. Recent studies have shown that peritrophin proteins also exist in crustaceans [30–32]. Khayat et al. [30] were the first to identify two peritrophin-like cDNAs that are highly expressed during oogenesis in Penaeus semisulcatus; the two proteins are components of the cortical rods, forming a jelly layer after fertilization. A similar protein was also isolated from the mature ovary of Marsupenaeus japonicus [31]. Another peritrophin-like protein has also been identified in Fenneropenaeus chinensis. This peritrophin mRNA is constitutively expressed in the ovaries, whereas in hemocytes, heart, stomach, gut, and gills its expression is seen only after induction by E. coli [32]. In addition, the recombinant protein can bind to Gram-negative bacteria and chitin, suggesting that it may play a role in immune defense and other physiological responses. Interestingly, as shown in Table 4, the four proteins with a chitin binding Peritrophin-A domain can only be identified in the infected library, which suggests that their expression is induced after WSSV infection and that they may therefore play some role in the shrimp's antiviral immune response.

C-type lectins

In P. monodon postlarvae, the protein homologous to Drosophila CG6055-PA showed greatly increased abundance after WSSV infection (Table 4). The Drosophila CG6055-PA protein has a C-type lectin (CTL) and CTL-like domain, which is a structural module that has Ca2+-dependent carbohydrate-binding activity. Proteins that have this module are generally known as C-type (Ca2+-dependent) lectins to distinguish them from the other (Ca2+-independent) types of animal lectins. Animal C-type lectins play important roles in innate and adaptive immunity through pathogen recognition and cellular interactions [33]. In invertebrates, C-type lectins are involved in various immune responses, including the activation of the proPO system [34], antibacterial activity [35], and the promotion of phagocytosis [36]. In shrimps, several C-type lectins have been identified. PmLec is a P. monodon C-type lectin that is able to bind to bacterial lipopolysaccharide (LPS) to enhance hemocyte phagocytosis [36]. Fclectin is a C-type lectin gene cloned from the hemocytes of Chinese shrimp Fenneropenaeus chinensis [37]. Fclectin mRNA is mainly expressed in hemocytes and its expression is greatly affected after challenge by bacteria, LPS or WSSV [37]. Neither PmLec nor Fclectin was found in our ESTs, but another C-type lectin, PmAV, was represented. The PmAV gene has been identified in WSSV-resistant P. monodon, and the recombinant protein shows a strong antiviral activity toward a fish virus in vitro [38]. Unlike the CG6055-PA homolog, however, Table 5 suggests that PmAV was strongly down-regulated. A third C-type lectin, the gene homologous to Drosophila RH18728p, was also identified in our EST database, but its expression does not appear to change significantly after WSSV infection.

Calcium-binding proteins

Table 5 suggests that expression of the sarcoplasmic calcium-binding protein (SCP) α and β subunits is strongly down-regulated after WSSV infection. Shrimp, lobster and crayfish SCPs exist as dimers of two different polypeptide chains, α and β [39–41], and they function as cytosolic Ca2+ buffers. In crayfish, although the SCP is ubiquitously expressed in many tissues, it is most abundant in muscle [42]. If the P. monodon SCP α and β subunits are also both highly expressed in muscle, then this would be consistent with our observation that, just like several other muscle-specific transcripts, these two genes showed decreased abundance after WSSV infection (Table 5). In this connection, we also note that WSSV itself encodes several proteins with EF-hand calcium-binding motifs [4]. This means that WSSV can potentially modulate the calcium ion concentration of infected cells by decreasing the expression of shrimp cell SCP and by simultaneously expressing the viral-encoded calcium-binding proteins.

Glycolytic pathway

Until recently, glycolytic enzymes were considered to be "straightforward" enzymes with no sophisticated regulatory properties. However, these enzymes are now known to perform various functions in addition to their innate glycolytic function, and they play an important role in several biological and pathophysiological processes [43–45]. Two glycolytic enzyme genes, phosphopyruvate hydratase and fructose 1,6-bisphosphate aldolase, showed increased abundance after WSSV infection. Phosphopyruvate hydratase is a key protein in the glycolytic pathway, catalyzing the conversion of 2-phosphoglycerate to phosphoenolpyruvate, but it can also be a receptor for plasminogen [46, 47] or a transcriptional repressor [45]. Together with phosphoglycerate kinase and tubulin, phosphopyruvate hydratase forms an active transcription initiation complex that enhances transcriptional elongation of the Sendai virus genome [48]. In addition, this enzyme has also been described as a stress protein induced by hypoxia [49]. The non-glycolytic functions of the other glycolytic enzyme, aldolase, presently remain unknown. Nevertheless, taken together, all these studies suggest that it would be worthwhile to further investigate whether either or both of these enzymes play an essential role during WSSV infection.

Oxidative phosphorylation

Several proteins involved in oxidative phosphorylation and encoded by mitochondrial DNA showed differential abundance at the RNA level after WSSV infection (Tables 4 and 5). WSSV infection decreased the abundance of NADH dehydrogenase subunits 1 and 5 and cytochrome c oxidase subunit II, but simultaneously increased the abundance of cytochrome b, cytochrome c oxidase subunits I and IV and ATPase subunit C. Numerous studies have shown that viruses affect mitochondria in different ways. Morphological changes in mitochondria are induced by human immunodeficiency virus (HIV) [50], human T-cell leukemia virus type 1 [51] and Rubella virus [52]. Changes in location are observed after infection by Human herpesvirus 1 (HHV-1) and Hepatitis B virus [53, 54]. The mitochondrial respiratory chain is affected by simian virus 40 [55], Poliovirus [56], HHV-1 and influenza virus [57]. The present study now suggests for the first time that virus infection might also modulate the amounts of mitochondrial mRNAs. However, it remains to be determined whether such changes could affect mitochondrial oxidative phosphorylation and hence the generation of ATP.

Actin genes

WSSV infection also differentially affected the abundance of actin genes (Tables 4 and 5). Actins are conserved proteins that participate in muscle contraction, cell motility, cell division, and cytoskeletal structure [58]. In almost all eukaryotes, actins are encoded by members of multigene families, and these different actin genes are expressed differently across different cell types, tissues, and developmental stages [59]. In vertebrates, three main groups of actin isoforms, α, β and γ, have been identified. The α-actins are found in muscle tissues and the β- and γ-actins coexist in most cell types as components of the cytoskeleton. The actin and the microtubule cytoskeleton play important roles in the life cycle of every virus, from the beginning of virus attachment to the host cell to the final assembly and egress of virus [60]. In spite of this, however, some viruses actively degrade some specific host mRNAs, including β-actin, to shut off host cell protein synthesis [61, 62]. Two recent studies have revealed the interplay between WSSV and the shrimp actin gene. One of the major WSSV structural proteins, VP26, interacts with actin [63], and the actin mRNA becomes unstable after WSSV infection [10]. The present study now suggests that the expression of these P. monodon actins is differentially modulated after WSSV infection. Two isoforms (gi|113216 and gi|3907620) showed increased abundance (Table 4), whereas one (gi|3907622) showed decreased abundance (Table 5). We also note that, relatively speaking, the P. monodon actins are not very well documented. For instance, while Artemia has been shown to have 8–10 actin genes and 4 isoforms, one of which is muscle specific, and crab (Gecarcinus lateralis) has seven or eight documented actin genes [64], only two P. monodon actin genes have so far been released to a public database. The previously undocumented actin genes listed in our EST libraries (see Table 2) now add several more actins to that list.

The predicted modulation of actin isoforms by WSSV also means that actin is not a good choice of reference gene in RNA level studies of WSSV/host interactions. This is in addition to the already known difficulty that while the vertebrate muscular actins are recognizable from their amino acid sequence, the invertebrate actins cannot be distinguished in the same way because they all resemble the vertebrate cytoplasmic actins [65, 66]. Clearly, further studies will be needed to characterize and classify these P. monodon actin genes by their tissue distribution patterns, with a particular focus on the isoforms whose expression is modulated by WSSV.

Conclusion

In conclusion, the 15,981 high-quality ESTs generated in this study provide a rich source for identification of novel genes in shrimp and for comparative analysis of gene expression patterns in normal and WSSV-infected shrimp. An EST-based strategy not only greatly facilitates in silico expression profiling, but also provides an experimental approach to elucidate WSSV pathogenesis and to investigate the shrimp's response to virus infection. Our data suggest that in postlarval shrimp, WSSV infection strongly affects the physiological functions of several organs/tissues, including the HP, muscle, eyestalk and cuticle, and that the expression of several genes in these organs/tissues is strongly modulated. In addition, WSSV is predicted to affect several basic cellular metabolic processes, including oxidative phosphorylation, protein synthesis, glycolysis, and calcium ion balance.

Methods

Shrimp, virus and challenge

The postlarvae used to construct the cDNA libraries were at the PL20 stage, with an average length and weight of 9 mm and 0.02 g, respectively. These postlarvae were derived from WSSV-free Penaeus monodon spawners. They were cultivated in the Tungkang marine laboratory of Taiwan's Fisheries Research Institute. The WSSV suspension used to challenge the postlarvae was prepared from frozen (-80°C), WSSV-infected black tiger shrimp that had been collected during a natural outbreak of WSS in 1994 [67]. The postlarvae were challenged by immersion. At 66 hours after infection, they were collected in cryotubes and stored in liquid nitrogen for later RNA extraction. They were further confirmed by PCR to be WSSV-infected. Normal, WSSV-free postlarvae were also collected and stored in liquid nitrogen and their WSSV-free status was confirmed by PCR.

RNA extraction and cDNA library construction

For each library, RNA was extracted from a pool of about 50 entire animals weighing approximately 1 g in total. Total RNAs were extracted from normal and WSSV-infected postlarvae using RNAzol B reagent (Teltest Ltd., Friendswood, TX), and the mRNAs were purified from the total RNAs using the QuickPrep™ Micro mRNA Purification Kit (GE Healthcare) following the protocol provided by the manufacturer. The respective cDNA libraries were constructed using the λZAP-cDNA library construction kit (Stratagene) according to the manufacturer's instructions. In brief, the first-strand cDNA was synthesized from 5 μg of mRNA with the oligo-d(T) linker-primer, 5'-(GA)10ACTAGTCTCGAG(T)18-3', and the MMLV (Moloney murine leukemia virus) reverse transcriptase supplied in the kit. The second strand of cDNA was synthesized with DNA polymerase I in the presence of RNase H, and then the cDNA was blunt ended with Pfu DNA polymerase. Both strands of the cDNA were ligated with the Eco R I adapter, and digested by Xho I. cDNA fractionation was performed using Sepharose CL-4B gel filtration, and the cDNA was then inserted into the Eco R I-Xho I site of the Uni-Zap phage vector. The resulting phage libraries were converted to pBluescript phagemid libraries by mass in vivo excision using ExAssist helper phage. About 8,000 white colonies were randomly isolated from each of the normal and infected libraries on LB-ampicillin plates containing IPTG and X-gal, and plasmid DNA was extracted from each library.

Sequencing

Randomly selected clones were inoculated into individual wells of 96-well plates with 170 μl LB media containing 8 μg ampicillin, and the plates were incubated at 37°C for 18 hours. The DNA templates were prepared using Montage Plasmid Miniprep96 kits, and checked by electrophoresis in 1% agarose gel. Sequencing was carried out with the BigDye version 3.0 sequencing reaction kit, following the protocol provided by the manufacturer. DNA sequencing from the 3' end and 5' end of the cDNA was conducted with T7 and SP6 primers, respectively, on a high-throughput automated sequencer (MJ Research BaseStation and ABI3730, USA) using standard protocols.

Sequence analysis

The raw traces for ESTs were subjected to base-calling by running Phred (Q > 13). pBluescript vector sequences were trimmed using Cross_match with default parameters (minimatch 12, penalty -2, minscore 20). Those ESTs having a length of more than 100 bp after vector trimming were subjected to further analysis. BlastN [68] was used to find matching sequences with E value < 10^-5 in the whole WSSV genome (GenBank accession no. AF440570), and also to screen out possible contaminants from bacterial chromosomal DNA, RNA, and lambda phage DNA. The interspersed repeats and low complexity sequences in the ESTs were then masked by RepeatMasker using the Drosophila repeat sequence as reference. Low quality sequences, including short sequences (less than 100 bp) and those with a high percentage of nucleotide A (adenosine) or N (uncertain read), were considered uninformative and were eliminated from further analysis. The ESTs that passed through the above quality check procedures were considered high quality ESTs. The high quality ESTs in both the normal and infected libraries were combined and assembled to form contigs using CAP3 [69] with the overlapping percentage parameter set to > 95% in order to obtain highly reliable contig sequences. ESTs that did not form contigs were designated singlets. Collectively, the resultant contigs and singlets are referred to as unique sequences.
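The final screening step described above (discarding short sequences and sequences dominated by A or N) can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used: the text does not state the cutoffs for a "high percentage" of A or N, so the values `max_a_frac=0.5` and `max_n_frac=0.05` are placeholders chosen for the example, the treatment of a sequence of exactly 100 bp is an assumption, and the function names are hypothetical.

```python
def base_fraction(seq, base):
    """Fraction of positions in seq equal to the given base (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count(base.upper()) / len(seq)


def is_high_quality(seq, min_len=100, max_a_frac=0.5, max_n_frac=0.05):
    """Apply the length and base-composition screens to a trimmed EST.

    min_len follows the 100 bp limit given in the text; max_a_frac and
    max_n_frac are assumed values, since the text gives no cutoffs.
    """
    if len(seq) < min_len:
        return False  # too short after vector trimming
    if base_fraction(seq, "A") > max_a_frac:
        return False  # mostly poly(A) tail, little informative sequence
    if base_fraction(seq, "N") > max_n_frac:
        return False  # too many uncertain base calls
    return True


def filter_ests(ests):
    """Keep only the ESTs (name -> sequence) that pass the screens."""
    return {name: seq for name, seq in ests.items() if is_high_quality(seq)}
```

In practice such a filter would be applied after vector trimming, contaminant screening and repeat masking, so that only informative sequence is passed to the assembler.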

Functional annotation

Putative functions of the unique sequences were discovered by using BlastX to translate each nucleotide query sequence into all reading frames and then searching for matches in the NCBI non-redundant database. Significant hits (with E value < 10^-10) in the NCBI nr database were followed up with protein function searches in the UniProt database [70], which provides value-added information reports for protein functions. The UniProt reports consist of Gene Ontology (GO) annotations that classify proteins by biological process, cellular component, and molecular function. Each unique sequence was tentatively assigned a GO classification based on annotation of the single "best hit" match in UniProt. These data were then used to classify the corresponding genes according to their GO functions.
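The "best hit" rule described above can be illustrated with a short Python sketch. It is not the code used in this study. It assumes that BlastX results have already been parsed into simple records, that the best hit is the significant hit with the lowest E value (ties broken by higher bit score), and that the 10^-10 cutoff stated for the nr search is reused for the UniProt hits. The record type `Hit`, the function names and the identifiers in the usage note are all hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One parsed BlastX hit for a unique sequence (hypothetical record type)."""
    subject_id: str
    evalue: float
    bit_score: float


def best_hit(hits, max_evalue=1e-10):
    """Return the most significant hit below the E-value cutoff, or None."""
    significant = [h for h in hits if h.evalue < max_evalue]
    if not significant:
        return None
    # Lowest E value wins; a higher bit score breaks ties.
    return min(significant, key=lambda h: (h.evalue, -h.bit_score))


def assign_go(hits_by_sequence, go_by_protein, max_evalue=1e-10):
    """Assign each unique sequence the GO terms of its single best hit."""
    assignments = {}
    for seq_id, hits in hits_by_sequence.items():
        top = best_hit(hits, max_evalue)
        if top is None:
            assignments[seq_id] = []  # no significant match: left unannotated
        else:
            assignments[seq_id] = list(go_by_protein.get(top.subject_id, []))
    return assignments
```

For example, calling `assign_go({"contig1": [Hit("protA", 1e-30, 120.0), Hit("protB", 1e-50, 200.0)]}, {"protB": ["carbohydrate metabolism"]})` would annotate `contig1` with the terms of `protB` only, because a single best hit is used rather than a consensus over all hits.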

Expression analysis and statistical evaluation of EST occurrence

The unique sequences were considered to have increased abundance if they had a significantly greater number of hits (i.e., more ESTs) in the WSSV-infected library compared to the normal library. Conversely, unique sequences were considered to have decreased abundance if they had significantly more hits in the normal library. The statistical significance of homologous ESTs with differential abundance was determined using Fisher's exact test [71–73], which is widely used to evaluate 2 × 2 contingency tables. Fisher's exact test produces a significance value P ranging between 0 and 1, where a value close to 0 implies that there is a significant differential abundance of the gene or the annotated function between the normal and infected libraries. The significance of differential abundance genes with a P value smaller than 0.001 was considered "very strong". P values between 0.001 and 0.01, and between 0.01 and 0.05 were considered "strong" and "moderate", respectively.
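To make the test concrete, the following Python sketch computes Fisher's exact test for a single gene from a 2 × 2 table whose rows are the two libraries and whose columns are the ESTs that do or do not match the gene, and then applies the significance classes defined above. It is an illustration under stated assumptions rather than the procedure actually run in this study: the text does not say whether a one- or two-sided test was used (a two-sided test is assumed here), the library totals in the example are taken as the sum of the high quality 3' and 5' ESTs, and the gene counts and function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    a, b: ESTs matching / not matching the gene in the normal library
    c, d: ESTs matching / not matching the gene in the infected library
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def significance_label(p):
    """Map a P value to the classes used in the text."""
    if p < 0.001:
        return "very strong"
    if p < 0.01:
        return "strong"
    if p < 0.05:
        return "moderate"
    return "not significant"


if __name__ == "__main__":
    # Hypothetical gene: 0 ESTs in the normal library, 20 in the infected one.
    normal_total, infected_total = 7636, 8345  # assumed library sizes (3' + 5' ESTs)
    gene_normal, gene_infected = 0, 20
    p_value = fisher_exact_two_sided(gene_normal, normal_total - gene_normal,
                                     gene_infected, infected_total - gene_infected)
    print(p_value, significance_label(p_value))
```

The same function can be applied to a GO category instead of a single gene by counting the ESTs annotated with that category in each library.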

Availability and requirements

Project name: Penaeus monodon Functional Genomics Database

Project home page: http://xbio.lifescience.ntu.edu.tw/pm/

Operating system: Platform independent

Programming language: PHP