Background

The protozoan parasite Perkinsus marinus is a facultative intracellular parasite of mollusks, and the causative agent of "dermo" disease in both wild and farmed Eastern oyster (Crassostrea virginica) populations along the Atlantic and Gulf coasts of the USA [1-4]. Since its initial description, P. marinus has generated substantial controversy with regard to its taxonomic placement [5, 6]. A close relationship with the Apicomplexa was initially proposed based on the ultrastructural analysis of the zoospore, which revealed the presence of organelles resembling an apical complex [7]. Molecular evidence gathered over the following years revealed affinities with the Dinozoa [8], and even suggested this group to be its closest extant taxon [9, 10]. More recently, however, ultrastructural similarities and molecular phylogenetic affinities to Parvilucifera sp., a parasite of microeukaryotes, led to the establishment of the phylum Perkinsozoa, which, like the Apicomplexa, comprises only endoparasites [11]. This new phylum is considered to be one of the earliest diverging groups from the lineage leading to dinoflagellates, albeit close to the ancestor from which the ciliates, dinoflagellates, and apicomplexans originated [12-14].

Together with the various emerging diseases in the estuarine and marine environment [15], infections by P. marinus and other Perkinsus spp. are responsible for devastating losses in shellfisheries of economically relevant mollusk species worldwide, including oysters, clams, and abalone [16]. Further, given the critical role that oysters and other filter-feeding bivalves play in maintaining environmental water quality, the dramatic declines in bivalve populations caused by dermo disease have had correspondingly detrimental impacts on the estuarine environment. Although several intervention strategies have been implemented to control dermo disease, they have had little or no success [17]. During the early 1990s, however, the development of in vitro culture methods for P. marinus [18-20] provided a key resource that enabled studies leading to the identification of new targets for intervention [16]. These studies ranged from fundamental cellular, biochemical, molecular and genetic studies of P. marinus biology to the direct in vitro testing of potential chemotherapeutic drugs that might suppress its proliferation [21-27].

The life cycle of P. marinus includes a free-living motile stage (zoospore) and a non-motile vegetative stage (trophozoite). The parasite is ingested by the host during filter-feeding, and is then phagocytosed by hemocytes present in the alimentary canal. Although the infection mechanism has not been fully elucidated, once inside the host, P. marinus trophozoites are recognized via a galectin present on the surface of phagocytic hemocytes [28, 29], internalized, and localize inside phagosome-like structures. The phagocytosed trophozoites remain viable by abrogating the host's respiratory burst through their effective antioxidative machinery [30, 31], and retain their proliferative capacity [32]. Hemocyte migration throughout host tissues leads to systemic infection and eventually death [6, 33]. Therefore, in addition to those genes that mediate intrahemocytic survival, the trophozoite is likely to express additional genes that are involved in nutrient acquisition, proliferation, and pathogenesis.

Expressed sequence tag (EST) surveys [34] have proven to be a viable approach for gene discovery and therapeutic target identification in a variety of microbial pathogens and parasites [35-38]. EST analysis offers a rapid and valuable first glimpse of gene expression at a particular life cycle stage or under certain environmental conditions. The number of gene sequences published or available in GenBank for the genus Perkinsus is very limited, and consists mostly of ribosomal RNA (rRNA) sequences. In this study, we analyzed 31,727 ESTs generated from trophozoites from two different P. marinus strains, one of which was studied in both the presence and absence of oyster serum. Grouping of the ESTs into clusters and singletons resulted in 7,863 unique sequences. Together, they provide the first broad-based molecular view into the basic biology and cellular metabolism of this protozoan parasite of unique phylogenetic position.

Methods

Parasite cultures

All P. marinus strains were maintained in DME: Ham's F12 (1:2) supplemented with 5% fetal bovine serum (FBS) as reported elsewhere [39]. Two cultured strains of P. marinus trophozoites, CB5D4-ATCC PRA-240 (the strain used for sequencing the P. marinus genome; [40]) grown in standard medium, and TXsc-ATCC 50983 [18] grown in both standard medium and medium supplemented with C. virginica serum (50%), were used for RNA isolation and subsequent cDNA library and EST generation.

RNA extraction and library construction

A total of four (non-normalized) libraries were constructed using P. marinus trophozoites. Three libraries were constructed from strains propagated in standard culture medium (TXsc and CB5D4), and one from the TXsc strain propagated in medium supplemented with C. virginica oyster serum (50%). Oyster serum was prepared as reported earlier [41]. Briefly, shells were notched at the posterior end and dorsal side of the shells, cleaned with ethanol and hemolymph drawn from the adductor muscle (approximately 2 ml hemolymph per individual oyster) with a sterile syringe fitted with a 19-gauge needle. Hemolymph samples were centrifuged at 800 × g at 4°C for 10 min, the supernatant serum separated from the cell pellet, mixed at equal parts with P. marinus culture medium and used for the experimental cultures described above.

Perkinsus marinus TXsc cultures were centrifuged for 5 min at 490 × g, and RNA extracted from the pellets (1.5-2.0 ml packed P. marinus trophozoites) by the guanidine isothiocyanate method [42]. One mg of total RNA was used for Poly A+ isolation and 5 μg of Poly A+ enriched P. marinus RNA was used to construct each P. marinus TXsc Lambda ZAP library following the manufacturer's instructions (Stratagene, La Jolla, CA). Each library was amplified once through Escherichia coli XL1-Blue MRF and stored at -80°C. Total RNA from the strain CB5D4 was extracted as above. A commercial service (Express Genomics, Frederick, MD) was used to construct the cDNA libraries. Since the libraries were not normalized and to avoid redundancy of the ESTs, a large-insert library (average insert size 2.4 kb) and a small-insert library (average insert size 1.4 kb) were constructed in pExpress 1 (Not I-Eco RV cloned into T1 phage-resistant DH10B E. coli).

DNA sequencing

P. marinus TXsc ESTs were obtained using two methods. First, 308 individual recombinant phage were selected randomly and cored out from the LB agar plates into micro-centrifuge tubes containing 400 μl of TMG (Tris-magnesium-gelatin) buffer and two drops (~50 μl) of chloroform. cDNA inserts were amplified with T3 and T7 primers, resolved in a 1.5% agarose gel (w/v), recovered using the QIAex II kit (QIAGEN, Valencia, CA), and used for direct sequencing. Second, pBluescriptKS(-) phagemid from the Lambda ZAP vectors was mass excised from both libraries and individual colonies grown for 24 h in a volume of 3 ml for plasmid preparations. In both cases, single-pass sequencing of the 5' end of the cDNA clone was carried out to generate the ESTs. P. marinus CB5D4 ESTs were sequenced using the M13 primers.

Clustering and assembly of EST/mRNA sequences

EST sequences were pre-processed to determine the sequence quality and to remove cloning vector sequences from the reads using the Phred and Cross-match software (http://www.phrap.org). Poly-A/T tail trimming was done using the 'EMBOSS' Trimmest program [43] before submission to GenBank (accessions #EH059339 - EH090757; GR954914-GR955219; GR955352-GR955353). Sequence assembly was performed by first clustering the ESTs into groups of similar sequences, using TIGR's TGICL [44]. Subsequently, each cluster was separately assembled into consensus sequences consisting of the longest non-redundant stretch of multiple aligned ESTs, using the CAP4 algorithm (Paracel Inc.; http://www.paracel.com). The sequences that did not cluster were treated as singletons. The cluster consensus and singleton sequences are named Pm00001-Pm07863 and are available in Additional File 1.
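The poly-A/T trimming step can be illustrated with a minimal sketch. This is not the EMBOSS program itself: the function name and the `min_run` threshold are hypothetical, and unlike a production trimmer this version tolerates no mismatches inside the tail.

```python
import re

def trim_poly_tails(seq: str, min_run: int = 4) -> str:
    """Remove a 3' poly-A run and a 5' poly-T run from an EST read.

    min_run is an illustrative threshold (an assumption, not a value
    taken from the study); runs shorter than this are left in place.
    """
    seq = seq.upper()
    # Strip a trailing run of A's (poly-A tail of the mRNA).
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)
    # Strip a leading run of T's (poly-A tail read from the opposite strand).
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)
    return seq

if __name__ == "__main__":
    print(trim_poly_tails("TTTTTTGCATGCAAAAAAA"))
```

Short terminal runs such as a final "AA" are kept, since they are as likely to be coding sequence as tail.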

Annotation of the P. marinus EST sequences

Consensus sequences of the EST assemblies and singletons were compared with the NCBI non-redundant (nr) protein database (May 2009) using the BLASTX algorithm and the GenBank dbEST database (May 2009) using the TBLASTX algorithm [45] with the default parameters and with a cut-off E-value ≤ 1e-5. P. marinus ESTs were removed from dbEST to avoid self hits during screening. Determination of the taxonomic affinities of hits was based upon an NCBI taxonomic trace-back of best hits. For ease of presentation we have grouped the red and green algae together with the plants, and the brown algae with the Stramenopiles.
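As an illustration of the best-hit selection described above, the following sketch filters tabular BLAST output at the E-value cut-off and keeps one best hit per query. It assumes the standard 12-column tabular layout (query, subject, ..., E-value in column 11, bit score in column 12); the function name and the `exclude` parameter, used here to drop self hits, are hypothetical and not part of the original pipeline.

```python
def best_hits(lines, max_evalue=1e-5, exclude=frozenset()):
    """Return {query: (subject, evalue, bitscore)} for the best hit per query.

    lines   : iterable of tab-separated BLAST rows (12-column tabular format)
    exclude : subject identifiers to ignore, e.g. self hits (assumption)
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bits = float(fields[10]), float(fields[11])
        if subject in exclude or evalue > max_evalue:
            continue
        # Keep the hit with the highest bit score for each query.
        if query not in best or bits > best[query][2]:
            best[query] = (subject, evalue, bits)
    return best
```

The taxonomic trace-back would then start from the subject identifier of each retained hit.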

The PLAN web system (Personal BLAST Navigator, Noble Foundation) was used to assign functional annotation based on the top BLASTX hit and the gene ontology (GO) sequence database [46]. To identify poorly conserved, or short fragments of genes contained in the ESTs, six-frame translations of the sequences were generated. This resulted in 23,888 open reading frames (ORFs) that are ≥ 75 amino acids. We searched the ORFs with Pfam (Protein Families Database) (ver. 22.0) with an E-value threshold of 0.1 to identify protein family domains. Putative signal peptides were identified using SignalP 3.0 server http://www.cbs.dtu.dk/services/SignalP/ and SecretomeP 2.0 server http://www.cbs.dtu.dk/services/SecretomeP/ using default parameters.
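The six-frame translation step can be sketched as follows. Two assumptions are made that the text does not state: the standard genetic code is used, and an ORF is taken to be any stop-free stretch of at least 75 residues, without requiring a start codon (reasonable for fragmentary ESTs). All names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    # Unknown codons (e.g. containing N) are translated as X.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_orfs(seq, min_len=75):
    """Return (strand, frame, peptide) for every stop-free stretch >= min_len."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for segment in translate(s[frame:]).split("*"):
                if len(segment) >= min_len:
                    orfs.append((strand, frame, segment))
    return orfs
```

Because a single consensus sequence can yield stop-free stretches in several frames, the number of ORFs can exceed the number of input sequences, as it does in this study.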

Analysis of orthology

We used the annotated proteins from 21 genomes [Additional file 2: Supplemental Table S1] from diverse organisms across the tree of life together with the P. marinus proteome (ORF translations of ESTs) to identify orthologs with the OrthoMCL algorithm [47]. The OrthoMCL parameters used for the analysis were: BLASTP E-value < 1e-25 and an inflation parameter of 1.5. Multiple sequence alignment (MSA) was performed on ortholog groups that are shared by ≥ 4 taxa including P. marinus. MSA was carried out using ClustalW [48] enabling the 'TOSSGAPS' option and using the default values for all other parameters. The regions that contained gaps or were highly divergent were removed from the MSA by GBLOCKS [49] using its default settings. Phylogenetic analysis was performed on the filtered multiple sequence alignments using Seqboot, Protdist, Neighbor and Consense Tree (Phylip) [50], and the nearest neighbor of each P. marinus sequence was determined.
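The selection of ortholog groups for alignment can be sketched as below. The data structure (a dictionary mapping a group identifier to a list of (taxon, protein) pairs) and the taxon label "pmar" are hypothetical stand-ins for parsed OrthoMCL output, not the program's native format.

```python
def groups_for_phylogeny(groups, focal="pmar", min_taxa=4):
    """Keep ortholog groups that contain the focal taxon and span >= min_taxa taxa.

    groups : {group_id: [(taxon, protein_id), ...]}  (hypothetical structure)
    """
    selected = {}
    for group_id, members in groups.items():
        taxa = {taxon for taxon, _ in members}
        # Paralogs from one taxon count once: only distinct taxa matter here.
        if focal in taxa and len(taxa) >= min_taxa:
            selected[group_id] = members
    return selected
```

Groups passing this filter would then be aligned, trimmed and passed to tree building.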

Searches for spliced leaders

The first 22 bp from the 5' end of each of the consensus ESTs were extracted and analyzed with the de novo pattern-finding algorithm MEME to identify over-represented patterns. The width for patterns was set at 22 nt and the zoops (zero or one per sequence) mode of occurrence was specified. Patterns that contained similarity or partial similarity to previously identified spliced leader (SL) sequences in dinoflagellates and Perkinsus marinus were examined in further detail. ESTs that contain potential partial SL sequences at the 5' end were enumerated, and the draft sequence of the P. marinus genome (GenBank Project ID: 12736) was searched using regular expressions to find exact SL consensus sequences. Genomic sequence (+/- 200 nt) flanking each of the putative SL loci was obtained, compared, and used to search the P. marinus ESTs.
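The partial-SL enumeration and the exact genomic search can be sketched as follows. Because reverse transcription often stops short of the 5' end, a truncated SL appears in an EST as a suffix of the full leader sitting at the very start of the read. The 22-nt `PLACEHOLDER_SL` below is an arbitrary string used only for illustration; it is not the published SL sequence, and the function names are hypothetical. Only the forward strand is searched in this sketch.

```python
import re

# Arbitrary 22-nt placeholder, NOT the real spliced leader (assumption).
PLACEHOLDER_SL = "ACGTTGCAAGCTTGCATGCCTA"

def partial_sl_length(est, sl=PLACEHOLDER_SL, min_len=5):
    """Length of the longest SL suffix found at the 5' end of est (0 if < min_len)."""
    est = est.upper()
    for k in range(len(sl), min_len - 1, -1):
        if est.startswith(sl[-k:]):
            return k
    return 0

def sl_loci_with_flanks(genome, sl=PLACEHOLDER_SL, flank=200):
    """Exact forward-strand SL matches with up to `flank` nt on each side."""
    loci = []
    for match in re.finditer(re.escape(sl), genome):
        start = max(0, match.start() - flank)
        loci.append((match.start(), genome[start:match.end() + flank]))
    return loci
```

A reverse-strand search would apply the same functions to the reverse complement of the genomic sequence.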

Results and Discussion

EST sequence analysis

Large-scale single-pass sequencing of cDNA clones randomly picked from libraries has proven to be a powerful gene discovery approach and represents a "snapshot" of gene expression in a given tissue and/or developmental stage [51, 52]. We took advantage of in vitro culture methods for P. marinus trophozoites to generate the first EST data set for a member of the Perkinsozoa, a phylum that includes parasites of mollusks and microeukaryotes [11]. Sequencing of the 5' end of cDNA clones from the two strains of P. marinus trophozoites cultured in either standard or oyster serum-supplemented medium resulted in 4 EST libraries. The number of ESTs sequenced for each library is shown in Table 1. Initial clustering using TGICL software (JCVI) divided these sequences into 4,876 clusters. The clusters were then further assembled using CAP4 to produce 5,181 final clusters and 2,682 singletons, yielding a total of 7,863 unique sequences (Pm00001-Pm07863), or potential unigenes [Additional File 1]. The average cluster length was 1,066 bp.

Table 1 EST sequence numbers by library, Perkinsus strain and sequence source.

Comparison of ESTs obtained in the presence and absence of oyster serum

Previously, we have shown that supplementation of the Perkinsus culture medium with plasma from heavily infected C. virginica oysters resulted in a 32% decrease in proliferation of P. marinus; the inhibitory effect is less pronounced with plasma from moderately infected and uninfected oysters [41]. C. virginica plasma can also specifically inhibit protease activity, including that from P. marinus. Analysis of the EST sequences from the serum-supplemented library revealed: aldolase, proteases, histone-specific proteins, serine/threonine-protein kinase, deoxycytidylate deaminase and oxidoreductase enzymes, and various transporters [Additional file 2: Supplemental Table S2]. However, a comparison between the ESTs obtained from the standard and oyster serum-supplemented cultures (Table 1) is not possible: the serum-supplemented library yielded too few ESTs to support statistically significant comparisons between the two conditions. Combined sequences from both culture conditions were used for all subsequent analyses.

Similarity searches of the NCBI nr protein database with the 7,863 P. marinus unique sequences revealed that 4,325 (55%) have significant similarity to proteins in the database with E-value ≤ 1e-5. Figure 1 shows the cumulative distribution of nr BLASTX hits by taxonomic group. The P. marinus sequences had the largest number of hits to protein sequences of alveolates followed by hits to proteins of Metazoa/Fungi and Viridiplantae. The Viridiplantae hits may be significant with respect to the relict plastid organelle and are discussed below. Interestingly, there are also a number of hits to viruses. Since the description of P. marinus and most other Perkinsus spp., transmission electron microscopy (TEM) studies have shown "virus-like particles" in both the nucleus and in the cytoplasm [6, 53, 54]. Searches of the newly generated Perkinsus genome sequence for viral sequences reveal more than 350 sequences annotated as putative retrovirus polyproteins. Especially interesting is the EER13269 sequence, which encodes a 3,947 aa protein with putative hits to viral related proteins including RNase H, integrases, retroposons, reverse transcriptase, transposase, and arginine methyltransferase-interacting proteins. TBLASTN analysis of the Perkinsus ESTs with EER13269 results in numerous significant hits to clusters: Pm01789, Pm01239, Pm01095, Pm07693, Pm07559, Pm06487, Pm00183, Pm04306, Pm01512, Pm02880 and Pm06668. In addition, ESTs Pm00328, Pm01050, Pm04352, Pm06163, Pm07375 and Pm07559 have hits to other viruses including retroviruses. The genome annotation together with EST data provide preliminary evidence for the presence of a virus and/or retrotransposon elements in the Perkinsus genome, a finding with potential implications for Perkinsus biology, tool development, and the development of intervention strategies.

Figure 1

Distribution of nr and dbEST best BLAST hits in GenBank. The comparative cumulative distribution of the best BLASTX and best non-self TBLASTX hits in the NCBI non-redundant (nr) protein database (E-value ≤ 1e-5) and NCBI dbEST database (E-value ≤ 1e-5), respectively, for P. marinus consensus sequences among the major taxonomic groups.

Figure 2 shows the distribution of the most significant hits among the Chromalveolata, a supergroup that encompasses alveolates (dinoflagellates, apicomplexans, ciliates) and related protistan phyla (diatoms, others). Within the chromalveolates, protein sequences from organisms in the phylum Apicomplexa had the highest number of hits (1,213), although this result is biased by the relative paucity of dinoflagellate sequences in nr relative to apicomplexan sequences [Additional file 2: Supplemental Table S3]. The Perkinsozoa (Perkinsus, Parvilucifera, and Rastrimonas) are considered the earliest diverging sister group of the dinoflagellates [5, 8, 55]. Indeed, this phylogenetic position at the base of the dinoflagellate branch makes the Perkinsozoa a key taxon for understanding unique adaptations (e.g. parasitism) within the Alveolata [56]. The similarity of P. marinus sequences to various members of these phyla provides important insight into their relevance in alveolate-specific physiology, and helps prioritize them for functional characterization. Even though protein sequences from ciliates are well represented in the nr database [136,531 from 2 ciliate genome sequences as of May 2009; Additional file 2: Supplemental Table S3], the relatively small number of best hits to ciliates indicates the closer affinity of P. marinus to dinoflagellates and apicomplexans than to the ciliates.

Figure 2

Distribution of nr and dbEST best BLAST hits among Chromalveolates. The comparative cumulative distribution of the best BLASTX and best non-self TBLASTX hits in the NCBI non-redundant (nr) protein database (E-value ≤ 1e-5) and NCBI dbEST database (E-value ≤ 1e-5), respectively, for P. marinus consensus sequences among the Chromalveolata.

EST gene discovery projects for several members of the dinoflagellates including Alexandrium tamarense [57], Alexandrium fundyense [58], Karenia brevis [59], Lingulodinium polyedrum [60, 61], Amphidinium carterae [60], and Symbiodinium sp. [52] have been completed. TBLASTX comparisons of all P. marinus sequences to the NCBI dbEST database were performed. Figure 1 shows the top hits of the Perkinsus sequences to ESTs from various taxonomic groups. TBLASTX analysis of 7,863 sequences against dbEST resulted in 3,698 top hits (47%), within which the Alveolata were the most well-represented. Among the chromalveolates, dinoflagellates had the maximum number of top hits (690 hits; Figure 2), an observation that supports the existing phylogenetic placement of P. marinus closer to the dinoflagellates than to the apicomplexans [12, 13].

Molecular phylogenetic data have shown that P. marinus is a basal alveolate derived from ancestral dinoflagellates just after the split from apicomplexans [62]. Further evidence indicating a relationship of P. marinus to the dinoflagellates was revealed with the discovery of SL trans-splicing in P. marinus and P. chesapeaki cDNAs [63], an mRNA processing phenomenon found in several dinoflagellate species [64]. Trans-splicing in P. marinus has been reported, but the SL sequence bears a single nucleotide change with respect to the canonical sequence in dinoflagellates [64]. A considerable number (14.7%) of the P. marinus ESTs contain partial SL sequences identical to the reported single-nucleotide non-canonical SL sequence. The longest SL identified was 13 nt and was present in 40 different sequences. Partial SL sequences of ≥ 5 nt were observed in 1,156 consensus sequences. Incomplete SL sequences in cDNA are common [65] and believed to be the result of inhibitors, often 5' message modifications, that interfere with reverse transcription. We have identified a putative full-length 22 nt SL in three P. marinus Nramp divalent cation transporters (Lin, Z., Fernández-Robledo, J.A., Cellier, M.F.M., Vasta, G.R., unpublished results). Numerous 22 nt SL sequences are encoded in the P. marinus genome (Joseph, S.J., and Kissinger, J.C., unpublished data), and although no full-length spliced-leader sequences were observed in the ESTs presented here, putative ESTs corresponding to portions of genomic SL loci were identified.
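As a quick arithmetic check, the 14.7% figure appears to correspond to the 1,156 consensus sequences carrying a partial SL of ≥ 5 nt, taken as a fraction of the 7,863 unique sequences. That reading is an inference from the numbers reported here, not an explicit statement in the text; the variable names are illustrative.

```python
# Counts reported in the text.
clusters = 5181
singletons = 2682
with_partial_sl = 1156  # consensus sequences with a partial SL of >= 5 nt

# Unique sequences = final clusters + singletons.
unique_sequences = clusters + singletons

# Fraction of unique sequences carrying a partial SL (assumed denominator).
fraction = with_partial_sl / unique_sequences

if __name__ == "__main__":
    print(unique_sequences, f"{fraction:.1%}")
```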

Functional categorization of Perkinsus sequences

Gene ontology (GO) categories were assigned based on BLASTX hits according to the PLAN web system. Figure 3 shows the distribution of gene ontology terms (1st level GO terms) according to the GO consortium. Cellular process (32.7%) was the most dominant term out of the 8,670 consensus sequences that were assigned to the Biological Process GO category (Figure 3(a)). This was followed by metabolism at 23.4%. Another 9% represented parasite-specific 'developmental' sub-categories. Approximately 7% of the terms were for regulation of protein biosynthesis, transcription, cell proliferation, apoptosis, DNA replication, and glycolysis. Significant representation of proteins involved in translation has also been reported in EST sequencing projects for Toxoplasma tachyzoites [35, 66], Cryptosporidium sporozoites [67] and Eimeria merozoites [68]. Abundant messages for ribosomal proteins are suggestive of the rapid and extensive protein translation that accompanies parasite differentiation and multiplication following host-cell invasion. Catalytic activity (39%) was the most dominant molecular function (Figure 3(b)). Multiple proteases, including cathepsins B, E, H, L, and S, and serine-type peptidase, were identified. The expression of multiple proteases in P. marinus may indicate their role(s) in processing the host protein substrates to facilitate uptake of their metabolic products to sustain normal cell function and proliferation. The expression of superoxide dismutase (SOD) was consistent with prior reports. P. marinus uses SODs to protect itself from reactive oxygen intermediates (ROIs) generated by the host's oxidative enzymes [69-74]. Sequences identified as 'binding' components comprised about 36% of the total, followed by 'transporter' at 8%, the majority being ion transporters and transporters with carrier activity.
Some of the 'transporter' consensus sequences fell into the 'ATPase coupled transporter' sub-category, which includes transporters of sugars, peptides, amino acids, and other small molecules. Around 2% were designated as transcriptional regulators, with the majority designated as having transcription factor activity. Over two-thirds of the consensus sequences were localized by cellular component to either cell or organelle, as shown in Figure 3(c). Of this, the majority was assigned to the nucleus, mitochondrion, endoplasmic reticulum, nucleolus, and a few were assigned to chloroplast thylakoid. Other sequences (4.7%) were categorized as 'membrane enclosed lumen', and included mitochondrial, nucleolar, and cytosolic membranes.

Figure 3

Gene Ontology annotation of P. marinus unique sequences. The top BLASTX hit provided annotation and functional categorization (gene ontology assignment) for each P. marinus assembled consensus sequence. The total numbers of sequences annotated for each main category are (a) 8,670 for Biological Process, (b) 3,618 for Molecular Function and (c) 5,452 for Cellular Component. A single gene product may be associated with multiple GO annotations within a single category, giving rise to more GO annotations than sequences.

Functional analysis of protein sequences

We analyzed conserved protein domains using Pfam as a database to predict the function of ORFs generated from P. marinus consensus sequences. Overall, 2,704 ORFs encode protein domains similar to 1,042 Pfam protein families (E-value < 0.1). Among these Pfam families, 65 were denoted as DUFs (Domains of Unknown Function). We found that 2,009 ORFs contain a single Pfam protein domain while 492 have two domains, 96 have three, 58 have four, 28 have five and 28 ORFs contain six or more domains. The most abundant Pfam domains in P. marinus are presented in Table 2; protein domains that are of evolutionary and therapeutic interest in this parasite are shown in Table 3.

Table 2 The 20 most common Pfam domains
Table 3 Pfam domains of interest identified in P. marinus EST cluster open reading frames.

Orthologous groups

To determine the number of potential orthologs between P. marinus and proteins from 21 organisms representative of the major divisions of the tree of life [Additional file 2: Supplemental Table S1], we performed a clustering analysis on the non-redundant 21 genome protein sets along with the P. marinus EST ORF proteome using the OrthoMCL program [47]. In brief, OrthoMCL defines clusters based on reciprocal best BLASTP hits and sorts proteins without a reciprocal best BLASTP hit into the best matching cluster. The input dataset for OrthoMCL consisted of 325,188 protein sequences from 22 proteomes. P. marinus sequences (5,661 ORFs) were found in 2,878 ortholog groups, of which 1,715 are unique to P. marinus, i.e., no sequences from other taxa are present in these groups. The extent to which P. marinus shares orthologous genes with the 21 other taxa examined is listed in Additional file 2: Supplemental Table S4. As the number of taxa increases, the number of shared genes decreases.
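From the counts above, 2,878 - 1,715 = 1,163 groups contain P. marinus together with at least one other taxon. A profile of how many groups are shared with exactly n other taxa can be tallied as in the sketch below, which assumes a hypothetical parsed structure (group identifier mapped to (taxon, protein) pairs) and the illustrative taxon label "pmar".

```python
from collections import Counter

def taxa_sharing_profile(groups, focal="pmar"):
    """Map 'number of other taxa in the group' -> 'number of focal-containing groups'.

    groups : {group_id: [(taxon, protein_id), ...]}  (hypothetical structure)
    Key 0 counts the groups that are unique to the focal taxon.
    """
    profile = Counter()
    for members in groups.values():
        taxa = {taxon for taxon, _ in members}
        if focal in taxa:
            profile[len(taxa) - 1] += 1
    return dict(sorted(profile.items()))
```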

Phylogenetic analysis

Automated phylogenetic analyses of the orthologous clusters were performed to determine the nearest taxonomic neighbor for each sequence. This approach is useful for establishing gene origins, identifying genes with restricted phylogenetic distribution and identifying putative horizontal gene transfer (HGT) events in the P. marinus genome. Phylogenetic trees were constructed from the 1,042 ortholog groups that shared ≥ 4 taxa including P. marinus [Additional file 2: Supplemental Table S4]. Multiple alignments were created with ClustalW and filtered using GBLOCKS to remove ambiguous regions. Only ortholog groups with ≥ 50 aligned aa were used for phylogenetic analysis, and 291 alignments met this criterion. Neighbor-joining trees were constructed with bootstrap support to determine the P. marinus nearest neighbor. Not surprisingly, 54% of the orthologous groups have alveolates as the closest neighbor to P. marinus (48% apicomplexans and 6% ciliates; dinoflagellates are not represented). Toxoplasma gondii was the most highly represented species. P. marinus sequences also show nearest neighbors of kinetoplastids, bacteria, archaea and red algae. For the gene encoding 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (IspD), one of the seven enzymes involved in plastid metabolism recently discovered in P. marinus [75-77], the nearest neighbor is that from the red alga Cyanidioschyzon merolae. Other important taxonomic groups that were found to be closest neighbors to P. marinus include the Heterokontophyta [Thalassiosira pseudonana (17 genes), and Phytophthora ramorum (36 genes)], Plantae [Arabidopsis thaliana (26 genes)], Animalia [Homo sapiens (24 genes), Drosophila melanogaster (7 genes)], and Fungi [Saccharomyces cerevisiae (11 genes)]. It should be noted that the nearest neighbors are expected to change as more taxa are sequenced and added to the analysis.
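One simple way to read a nearest neighbor off a tree is to take the sister clade of the P. marinus leaf. The sketch below does this for a tree held as nested tuples; the representation, the function names and the taxon labels are all hypothetical, and the study's own procedure may have differed. Neighbor-joining trees are unrooted, so the result depends on how the tree is drawn.

```python
def leaves(node):
    """All leaf labels under a node (a leaf is a string, a clade is a tuple)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def nearest_neighbors(tree, focal):
    """Sorted leaves of the sister clade(s) of `focal`; None if focal is absent."""
    if isinstance(tree, str):
        return None
    # Case 1: focal is a direct child of this node, so its siblings are the sisters.
    for i, child in enumerate(tree):
        if child == focal:
            siblings = [c for j, c in enumerate(tree) if j != i]
            return sorted(leaf for s in siblings for leaf in leaves(s))
    # Case 2: descend into each child clade and search there.
    for child in tree:
        found = nearest_neighbors(child, focal)
        if found is not None:
            return found
    return None
```

When the sister clade holds several taxa, all of them are returned, which makes ambiguous placements visible rather than forcing a single answer.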

Sequences of particular interest

Proteases

The presence of protease sequences in P. marinus deserves special mention. Proteases from Perkinsus are involved in pathogenesis and host-parasite interactions; indeed, oyster homogenate enhances the infectivity of the parasite and reflects changes in the Perkinsus extracellular product (ECP) composition [78]. In vitro protease expression and cellular differentiation also appear to be modulated by oyster tissue extracts; fewer proteases can be observed in Crassostrea ariakensis supplemented medium compared to C. virginica supplemented medium [79]. There is also evidence that oyster plasma protease inhibitor contributes to some oyster species' resistance to Perkinsus [80-82]. Although several proteases have already been reported in P. marinus [21, 83], our EST analysis identified numerous P. marinus sequences with Pfam domains (Table 3) and strong BLAST similarity (Table 4) to cathepsin-like cysteine protease, subtilisin-like serine protease, rhomboid-like protease 1, cysteine protease, ATP-dependent protease, serine protease, metacaspase 1 precursor, and ubiquitin-specific proteases. As observed for other well-characterized parasites, some proteases produced by P. marinus might degrade host protein substrates to acquire the nutrients necessary for normal cell function and proliferation. Interestingly, several of the ESTs encoded proteases with possible signal/secretion peptides, including subtilisin-like serine protease (score 0.882), cathepsin-like cysteine protease (score 0.861), calcium-dependent cysteine protease (score 0.782) and metacaspase 1 precursor (score 0.923). Congruent with the hypothesis that the oyster host has adaptations to contend with parasite peptidases, a serine protease inhibitor that binds tightly to subtilisin and perkinsin was purified from the plasma of C. virginica, suggesting a role in host defense against the parasite's proteolytic activity [84].

Table 4 P. marinus consensus sequences with similarity to known proteases.

Antioxidant enzymes

Upon phagocytosis by oyster hemocytes, Perkinsus marinus trophozoites localize inside phagosome-like structures where they remain viable and undergo proliferation. Available evidence indicates that the parasite survives oxidative stress imposed by the oyster defense mechanisms [30, 31]. Protistan parasites generally contain antioxidant activities, but may lack other enzymes typical of animal antioxidant pathways. During recent years we identified and characterized in P. marinus trophozoites iron SOD and ascorbate-dependent peroxidase (APX) that degrade the ROIs resulting from the oxidative burst associated with phagocytosis [73, 74]. Specifically, P. marinus SOD1 (PmSOD1) encodes a mitochondrial Fe-SOD [69-74], which may contribute to P. marinus resistance to exogenous oxidative damage in host phagocytes [73]. In contrast, although the product of PmSOD2 is predicted to be targeted to a putative plastid, confocal and immunogold studies localized it to the cell periphery and cytoplasmic single membrane compartments [85], raising interesting questions regarding its organellar targeting and the nature of a putative relict plastid described in other Perkinsus species. SOD catalyzes the dismutation of O2- to H2O2, which may be eliminated by either catalase or peroxidases such as glutathione-dependent peroxidase (GPX). GPX requires reduced glutathione produced by glutathione reductase. Analysis of P. marinus ESTs identified sequences for several P. marinus oxidative pathway components that are expressed in the trophozoite stage. These include peroxiredoxin 6, peroxiredoxin V, thioredoxins, glutaredoxins, glutathione reductase, and thioredoxin reductase (Table 5). Sequences highly similar to PmSOD1 and PmSOD2 were identified (Table 5) as well as several confirmatory Pfam domains (Table 3). Because no catalase activity has been detected in P. marinus, and no catalase gene has been identified in its genome, it is likely that some of the abovementioned peroxidases are involved in H2O2 detoxification.

Table 5 P. marinus sequences with similarity to known oxidative enzymes.

Fatty acid synthesis

Lipid analysis of 7 day-old in vitro cultured P. marinus trophozoites indicated that triacylglycerol represents 48.7% of the total lipids [86]. P. marinus trophozoites utilize 13C-acetate to synthesize a range of saturated and unsaturated fatty acids and the parasite's ability to synthesize 20:4(n-6) de novo is unique within parasitic protozoa [87]. Eukaryotes employ either the delta-6 or delta-8 desaturase pathway, or both, to synthesize arachidonic acid, an essential fatty acid. The meront stage of P. marinus synthesizes arachidonic acid through the delta-8 pathway [88]. In addition, it has been suggested that P. marinus cannot synthesize sterols and must sequester them from its host. Perkinsus cells are able to proliferate in complete lipid supplement medium (cod liver oil, cholesterol and alpha tocopherol acetate in detergent) and media containing cholesterol or cholesterol+alpha tocopherol acetate, but fail to proliferate in control medium and medium containing just alpha tocopherol acetate [89]. However, the genome of P. marinus encodes 6 out of 7 methylerythritol phosphate (MEP) pathway genes [75-77], indicating that Perkinsus is able to synthesize de novo sterols (see below: isoprenoid metabolism). ESTs matching enzymes involved in sterol metabolism also include sterol glucosyltransferases (Pm04113, E = 1e-47; Pm01717, E = 4e-42; Pm04859, E = 6e-44), sterol C-24 reductase (Pm01729, E = 3e-50), and sterol desaturase (Pm00771, E = 9e-23). Three P. marinus genes encoding the enzymes responsible for arachidonic acid biosynthesis (C18 delta-9-elongating activity, C20 delta-8 desaturase, C20 delta-5 desaturase) are clustered and co-transcribed as an operon [27].
Sequences highly similar to delta-9 desaturase (Pm03609, E-value = 1e-66), delta-8 fatty acid desaturase (Pm07340, E-value = 1e-161), and delta-5 desaturase (Pm03710, E-value = 0.00) were present, while no sequences similar to delta-6 fatty acid desaturase were identified, in agreement with the abovementioned observations. Pfam analysis identified 8 fatty acid desaturase domains including the above sequences (Table 3).
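
The E-values quoted throughout this section lend themselves to simple ranking and thresholding. The following is a minimal sketch of how such hits might be screened, using only the desaturase hits reported above. The function name `filter_hits` and the 1e-30 cutoff are illustrative assumptions, not the criteria used in this study.

```python
# Hits as reported in the text: (cluster ID, annotation, E-value string).
HITS = [
    ("Pm03609", "delta-9 desaturase", "1e-66"),
    ("Pm07340", "delta-8 fatty acid desaturase", "1e-161"),
    ("Pm03710", "delta-5 desaturase", "0.00"),
    ("Pm00771", "sterol desaturase", "9e-23"),
]


def filter_hits(hits, max_evalue=1e-30):
    """Keep hits with E-value <= max_evalue, most significant first.

    The default cutoff is a hypothetical value chosen for illustration.
    """
    parsed = [(cid, name, float(evalue)) for cid, name, evalue in hits]
    kept = [hit for hit in parsed if hit[2] <= max_evalue]
    # A smaller E-value means a more significant match, so sort ascending.
    return sorted(kept, key=lambda hit: hit[2])


if __name__ == "__main__":
    for cid, name, evalue in filter_hits(HITS):
        print(f"{cid}\t{name}\t{evalue:.0e}")
```

An E-value reported as 0.00 is treated as the most significant hit, and loosening the cutoff (for example `filter_hits(HITS, max_evalue=1e-10)`) also retains the weaker sterol desaturase match.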

Sequences similar to acetyl-CoA carboxylase, a plastid-localized enzyme involved in fatty acid biosynthesis, were also identified in this study (Pm00609, E-value = 2e-41; Pm05907, E-value = 3e-16). Indeed, it has been shown that Perkinsus proliferation is inhibited by triclosan and cerulenin, which has been interpreted as evidence for the presence of a plastidic FASII pathway [24, 25]. However, the effect of triclosan should be interpreted with caution as an indicator of the relevance of apicoplast FASII biosynthesis, since in Plasmodium the antimalarial activity of triclosan is not targeted to FabI [90]. Further, although Theileria is also susceptible to this drug, genes coding for FASII are lacking in the Theileria genome [91].

Heat shock proteins

Expression of heat shock protein 70 (HSP70) in P. marinus suggests that the parasite might use general stress response genes to overcome the stress imposed by the host environment. In Toxoplasma, the HSP70 gene is expressed during the transition from the active to the latent form [92]. In virulent Toxoplasma strains, HSP70 contains seven amino acids not present in the HSP70 from non-virulent strains, and HSP70 expression is elevated 2-fold in virulent versus non-virulent strains [93]. A consensus sequence highly similar to the T. gondii HSP70 protein (Pm00065, E-value = 0.00) was identified in the P. marinus EST analysis. Further, two sequences similar to HSP90 were identified, Pm00171 (E-value = 1e-176) and Pm05367 (E-value = 3e-60). Many heat shock proteins are chaperonins, which function in concert with proteins similar to the delta subunit of the t-complex family of chaperonins, and are found in virtually all organisms including Leishmania [94]. In different species, t-complex chaperonins are involved in protein folding after stress-related denaturation. Two chaperonin homologs, encoded by Pm03177 and Pm01657, are similar to proteins with roles in thermo-tolerance, cell-cycle progression and hematopoiesis. Pfam analysis detected the same HSP90 and HSP70 proteins as well as Cpn_TCP1 (HSP60 chaperonin family-TCP-1 family) domains (Table 3).

Isoprenoid metabolism

Although TEM observations have failed to identify a plastid in Perkinsus species [6, 95], recent studies suggest that genes associated with secondary plastids may be present. P. marinus possesses genes for a plant-type ferredoxin system that possibly encode plastid-targeting signals [25]. Recently, TEM observations in Perkinsus olseni revealed a very large (about 375-800 nm in diameter/larger axis) organelle with four membranes [26], although concerns have been raised about the nature of the structures observed [85]. Ferredoxin and ferredoxin-NADP reductase, proteins predicted to target the putative relict plastid [25], were identified in our analysis (Table 6). The search for P. marinus methylerythritol phosphate (MEP) pathway genes, responsible for de novo isoprenoid synthesis in plastids, has yielded full-length sequences for 6 out of 7 of these genes [75–77]. These provide evidence for a complete MEP pathway in P. marinus, and are indicative of a plastid organelle [76]. Four enzymes of the MEP pathway were identified in the present EST study. 1-deoxy-D-xylulose 5-phosphate synthase (DXP synthase) is responsible for the conversion of pyruvate and glyceraldehyde 3-phosphate into DXP (Table 6). MEP cytidylyltransferase (IspD), which catalyzes an intermediate step in the MEP pathway, was also detected in this EST analysis (Pm05015, E-value = 4e-26), as were 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (ME-CPP synthase) and 4-hydroxy-2-methyl-2-butenyl 4-diphosphate synthase (HMBPP synthase). An IspD Pfam domain was identified in one of the above P. marinus EST ORFs (Table 3).

Table 6 P. marinus consensus sequences with similarity to enzymes involved in plastid isoprenoid metabolism and other plant-type plastid proteins.

Glycan assembly and carbohydrate-binding proteins

Oligosaccharides on cell surface and intracellular glycoconjugates are assembled by the combined activity of glycosyltransferases and glycosidases, and may interact with carbohydrate-binding proteins (lectins). Further, protein-carbohydrate recognition plays a key role in intracellular processes such as protein folding and transport, as well as in interactions between cells, or between cells and the extracellular matrix, in functions related to development, immune responses, and host-parasite interactions. It is widely accepted that recognition of parasite surface glycoconjugates by host humoral or cell-associated lectins is frequently involved in cellular recognition and colonization of the host. Several studies have partially characterized multiple lectins in oyster plasma and hemocytes, which recognize glycoproteins that display terminal galactose and N-acetylated sugars [96, 97]. Flow cytometry analysis of P. marinus trophozoites labeled with commercial lectins [98] enabled a tentative identification of some of the sugars present on the trophozoite surface. Recently, we demonstrated that an oyster galectin present on the hemocyte surface functions as a parasite receptor and facilitates host entry [28]. Once inside the host, trophozoites survive intra-hemocytic killing mechanisms and proliferate [32]. The P. marinus ESTs include several with high similarity to glycosyltransferases and other members of the 'sugar moiety' transferase enzyme family. Four sequences (Pm01326, E-value = 2e-49; Pm02039, E-value = 9e-31; Pm07403, E-value = 1e-08; Pm02460, E-value = 3e-09) show similarity to the genes that encode glycosyltransferases in Cryptosporidium hominis and Verrucomicrobium spinosum. BlastX hits to N-acetylglucosaminyltransferase (Pm07672, E-value = 4e-18), mannosyltransferase (Pm03101, E-value = 6e-36), sterol-glucosyltransferase (Pm04859, E-value = 6e-44), prolipoprotein diacylglyceryl transferase (Pm04168, E-value = 2e-23) and alpha glucosidase (Pm03682, E-value = 2e-44; Pm01066, E-value = 2e-50) were also observed. In addition, Pfam domains for the glycosyltransferase families Glyco_transf_20 (trehalose-6-phosphate transferase), Glyco_transf_22 (Alg9-like mannosyl transferase), Glyco_transf_28 (UDP-N-acetyl glucosamine transferase) and Glyco_transf_8 (lipopolysaccharide galactosyl transferase) were observed (Table 3). A single EST ORF encoding the Alg14 domain (oligosaccharide biosynthesis protein Alg14 like), which represents an important protein in the synthesis of glycoconjugates, was also identified (Table 3). Prior studies have revealed glycosidase activity, including β-D-glucosidase, β-D-xylosidase, N-acetyl β-D-glucosaminidase and N-acetyl β-D-galactosaminidase in Perkinsus trophozoites, and N-acetyl β-D-glucosaminidase in spent culture medium (Ahmed, H., Fernández-Robledo, J.A., Vasta, G.R., unpublished results).

Three unique EST clusters (Pm03024, E-value = 1e-36; Pm03327, E-value = 1e-30; Pm02250, E-value = 7e-26) show similarity to the ERGIC-53-like mannose-binding lectin also present in Cryptosporidium, Toxoplasma and Plasmodium species. ERGIC-53 is a type 1 transmembrane L-type lectin present in the endoplasmic reticulum (ER) that captures correctly folded glycoproteins and mediates their transport along the secretory pathway. The yeast L-type lectins Emp46p and Emp47p, homologues of ERGIC-53, have been proposed to be transport receptors that facilitate recruitment of glycoproteins into vesicles budding from the ER [99]. One unique sequence (Pm04230, E-value = 7e-12) shows similarity to the mannose-binding lectin from Crinum asiaticum, a plant lectin with homology to known monocot mannose-binding lectins from Amaryllidaceae, Orchidaceae, Alliaceae and Liliaceae, high similarity to the gastrodianin-type antifungal proteins, and a predicted structure similar to that of the Galanthus nivalis agglutinin [100]. The sequence Pm07056 shows similarity to a galactose-specific lectin. Pfam analysis also detected similarities with plant lectins, such as domains for the ricin-type beta-trefoil lectin and the legume-like lectin (Table 3).

Nucleotide metabolism

Several nucleotide salvage enzymes were identified: hypoxanthine-guanine phosphoribosyl transferase, HGPRT (Pm04986, E-value = 2e-57); hypoxanthine-xanthine-guanine phosphoribosyl transferase, HXGPRT (Pm07697, E-value = 6e-53); uridine kinase-uracil phosphoribosyltransferase, UK-UPRT (Pm01230, E-value = 1e-40); and uracil phosphoribosyltransferase, UPRT (Pm07824, E-value = 2e-43). Pfam analysis revealed the presence of a phosphoribosyl transferase domain in 3 ESTs, including Pm04986 (Table 3). Evidence for a few de novo biosynthetic enzymes was also detected. There are hits to dihydroorotate oxidase (Pm05343, E-value = 7e-30) and uridylate kinase (Pm05827, E-value = 3e-19), both involved in pyrimidine biosynthesis, and to adenylosuccinate synthetase (Pm02370, E-value = 1e-128), which is involved in purine biosynthesis. Pfam analysis also revealed an adenylosuccinate synthetase domain in Pm02370.

Potential targets for chemotherapy

Several potential candidates for chemotherapy in P. marinus were identified in this EST study. Like the proteases of other parasites, which are known to degrade host proteins for nutrient acquisition, proteases produced by P. marinus could be targeted for chemotherapy. Further, the plasma of the eastern oyster C. virginica contains a serine protease inhibitor that binds tightly to subtilisin and perkinsin, and is potentially involved in blocking the parasite's proteolytic activity [84]. Therefore, dermo disease-resistant oyster populations could be established by using selective breeding to introduce gene variants that exhibit enhanced expression of protease inhibitors. Further, based on the observed in vitro susceptibility of the parasite to selected ROIs [73] and hemocyte-based defense against the parasite in vivo [101], therapeutic agents could be applied to infected oysters to enhance their respiratory burst in response to P. marinus. Similarly, and based on the observation that parasites are usually more susceptible to ROIs than are their hosts, components of the P. marinus anti-oxidative stress pathway could be the basis for the development of therapeutic drugs. Conventional molecular modeling approaches such as the Drug Design by Receptor Fit (DDRF) methods can be applied to parasite enzymes such as glutathione reductase, peroxiredoxin, thioredoxin, and glutaredoxins, whose sequences were identified in this study. The P. marinus EST analysis also identified sequences similar to aldolase, an enzyme that has been used as a target for therapy in several Plasmodium species because of its central role in energy metabolism (glycolysis) [102]. Aldolases have also been used for intervention in Trypanosoma brucei infection [103]. Therefore, the potential for aldolase inhibitors, already in use for other parasites, to inhibit P. marinus growth in vitro warrants further investigation. The recent characterization of the isoprenoid pathway in P. marinus strongly suggests the presence of a cryptic plastid in the parasite; this organelle has been identified as an excellent target for drug therapy in apicomplexan parasites [104–107]. Analytical methods [77] as well as EST evidence from the current study point towards the presence of a DOXP/MEP pathway (Table 6) to produce isopentenyl diphosphate. The enzymatic machinery of the DOXP/MEP pathway in Plasmodium falciparum has been fully characterized [108], and fosmidomycin, a specific inhibitor of the DOXP reductoisomerase, is very effective against malaria [109]. Interestingly, fosmidomycin has also been tested against Perkinsus trophozoites, and concentrations up to 1 mM appear to have no inhibitory effect on growth for up to 5 days [25]. The genes for the DOXP pathway are present in Toxoplasma, and it has been proposed [110] that toxoplasmosis could be treated by targeting a downstream pathway enzyme, farnesyl diphosphate synthase (FPPS), using bisphosphonates, which are specific FPPS inhibitors. A similar strategy could also be applied to dermo disease if bisphosphonates do not affect the oyster or if fosmidomycin does not show differential effectiveness against the parasite DOXP pathway. Therapeutic strategies using existing drugs such as bisphosphonates and fosmidomycin have the advantage of avoiding the costs of de novo drug design and development. Moreover, virtual screening initiatives could provide new avenues for drug development against numerous protozoan parasites [111]. Conversely, these observations also highlight the potential for using P. marinus as a readily-cultured, non-pathogenic model for early screening of potential drugs against a variety of protistan human parasites.

Conclusions

By providing a first glimpse into the expression of genes encoding proteins associated with important metabolic pathways in other parasitic protozoa, such as proteases, oxidative stress enzymes, and enzymes of fatty acid synthesis and isoprenoid metabolism, the sequences generated from the P. marinus cDNA libraries are extremely informative. The identification of proteins implicated in glycan assembly, protein folding and secretion, and parasite-host interactions, and of those that participate in biochemical pathways associated with a putative relict plastid, suggests that potential chemotherapy targets that have proven effective in other protozoan parasites are also expressed in P. marinus and could lead to novel intervention strategies for dermo disease (Figure 4).

Figure 4 Biological role(s) of P. marinus sequences of particular interest. Several aspects of Perkinsus biology are highlighted, including the identification of pathways indicative of a relict/cryptic plastid; the expression of numerous transporters and proteases likely to be involved in the uptake of nutrients and the degradation of host components; and enzymes and lectins involved in glycan assembly, protein folding and secretion, and parasite-host interactions.

P. marinus sequences display the greatest similarity to EST sequences from dinoflagellates. No significant differences were observed between EST populations obtained from parasites propagated under standard conditions and those exposed to oyster serum. This finding is likely a consequence of the small number of ESTs sampled, and a more rigorous analysis should be carried out by both expanding the sample size and incorporating additional experimental strategies. Concerning the latter, to gain a better understanding of P. marinus virulence, approaches such as subtractive techniques [112, 113] and microarray analysis will be very useful. Although our results are a first step in that direction, the application of subtractive techniques should reveal ESTs overlooked in our study due to the limitations of non-normalized libraries. The libraries and ESTs generated here, however, may find further use in the production of microarrays to visualize changes in gene expression, such as expression of parasite genes related to defense against the oyster's immune system.

As indicated above, P. marinus expresses sequences with significant similarity to those of dinoflagellates, followed by those of apicomplexans. Although the fraction of P. marinus transcripts that are trans-spliced is still unknown, the identification of an SL in >14% of the ESTs confirms the previously established affinity with dinoflagellates. It also suggests that PCR amplification based on the Perkinsus variant of the SL would provide a rapid and efficient method of amplifying and cloning full-length transcripts in the future.

The EST analysis reported herein, together with the recently completed P. marinus genome sequencing project (GenBank Project ID: 12736), and the development of a transfection system for Perkinsus trophozoites [40] will enhance the community's ability to improve the status of both natural and farmed oyster stocks by identifying gene products suitable for drug targeting, which will lead to therapeutic applications that may be effective in closed (especially hatchery) systems. The genes relevant to host-parasite interactions, particularly those involved in host-cell entry and/or pathway signaling may lead to genetically-selected or -engineered oysters that either block the entry of the parasite or enhance the response of the oyster defense against P. marinus. Production of seed oysters that remain disease-free and reach marketable size will be critical for full recovery of wild eastern oyster populations, which provide irreplaceable environmental services. Disease-resistant oysters would also form the basis of a viable shellfishery, as well as sustainable production of farmed oysters.