Introduction

Clonorchis sinensis is an important human parasite in eastern Asia, including China, Taiwan, northern Vietnam and Korea. In Korea, human infection with C. sinensis is known to be widely distributed along rivers and streams (Rim 1998; Crompton et al. 1999). When the fluke infects humans, the bile duct becomes severely dilated and the ductal wall thickens because of mucosal hyperplasia and fibrosis. As the infection becomes chronic and heavier in intensity, complications such as obstructive jaundice, dull epigastric pain, biliary stones, ascites and cholangiocarcinoma can develop (Chapman et al. 1999; Kim et al. 1993).

Over the past decade, the world has witnessed the emergence and progress of several genome projects. The last release of the database of expressed sequence tags (dbEST) in 1999 contained a large number of entries for parasite genomes, including >20,000 ESTs for Brugia malayi, ∼12,500 for Schistosoma mansoni, and over 12,000 for Trypanosoma cruzi. In addition, an increasing number of ESTs have been identified for other parasites, including Paragonimus westermani, Strongyloides stercoralis, Toxoplasma gondii and Aedes aegypti (Paul 2000). Analysis of ESTs has proven to be a rapid and efficient method for characterizing the subset of genes that are expressed in a life-stage-specific manner in a wide variety of tissues and organisms (Adams et al. 1995). Knowledge of parasite genomes is increasingly important for understanding a parasite's biology, its drug resistance mechanisms and the antigenic variation that determines escape from the host's immune system (Franco et al. 2000; Tawe et al. 2001). Although C. sinensis infection is diagnosed by identifying its eggs through microscopic examination of stools, immunological methods have recently been adopted for its diagnosis (Kang et al. 1969). The assembly of EST data specific to a protein or a DNA sequence can be used to rapidly produce a gene/protein model for understanding the physical and biological properties of C. sinensis, and is useful for immunodiagnosis, for identifying novel drug targets and for identifying potential vaccine candidates.

Materials and methods

Collection of parasite

The C. sinensis metacercariae were collected from naturally infected fish (Pseudorasbora parva) caught in the Nakdong River, Korea. These metacercariae were orally administered to the experimental rabbits. Eight weeks later, adult C. sinensis worms were obtained from the rabbit bile ducts.

Construction of the cDNA library

C. sinensis mRNA was purified using a messenger RNA isolation kit (Stratagene, La Jolla, Calif.). Briefly, live C. sinensis worms recovered from the infected rabbits were homogenized in a denaturing solution of 4 M guanidinium isothiocyanate and 0.14 M β-mercaptoethanol, and centrifuged at 12,000 g for 10 min. The supernatant was transferred to a tube containing oligo(dT) cellulose resin. The resin was washed with a high-salt buffer, and the poly(A)+ mRNA was eluted with a low-salt buffer. A cDNA library of C. sinensis was constructed using a ZAP Express cDNA Gigapack II Gold cloning kit (Stratagene) according to the manufacturer's instructions. Briefly, the first-strand cDNA was synthesized from 5 μg of C. sinensis mRNA and the second strand was synthesized by nick translation. EcoRI adaptors were ligated to the blunt ends, and digestion with the XhoI restriction enzyme yielded directional cDNA. The cDNA was then inserted into the predigested ZAP Express vector arms. Packaging was carried out in vitro with Gigapack II packaging extract. The library was plated on Escherichia coli XL1-Blue MRF′.

Sequencing

The phage library was converted to the plasmid form by mass excision according to the manufacturer's protocol (Stratagene). The resulting phagemid library was used to infect E. coli strain XLOLR. The bacteria were grown for 45 min and then plated at a low density on Luria-Bertani medium containing tetracycline (10 mg/l). The bacteria were cultured at 37°C overnight, and individual colonies were selected randomly for plasmid DNA purification and sequencing. All the sequencing reactions used the T3 sequencing primer and were read from the 5′ end of each cDNA. The reactions were run and analysed on capillary automated sequencing machines (ABI 377 XL90; Applied Biosystems, USA). The machines generated two computer files for each reaction, a chromatogram file and a plain text file.

The EST nucleotide sequences and completed cDNA sequence data reported in this paper were submitted to the GenBank dbEST database under the accession nos. AF480464, AF480465, AF527454–AF527457, AT006727–AT006731, AT006733–AT006735, AT006737–AT006740, AT006742–AT006744, AT006746, AT006749, AT006753–AT006755, AT006757, AT006760–AT006762, AT006764, AT006765, AT006768, AT006769, AT006772–AT006775, AT006777, AT006778, AT006780, AT006781, AT00678, AT006784–AT006786, AT006790, AT006797, AT006798, AT006800, AT006801, AT006804, AT006806, AT006809, AT006811, AT006817, AT006818, AT006820, AT006823, AT006833, AT006834, AT006838, AT006840, AT006844, AT006846, AT006847, AT006851, AT006853–AT006855, AT006859, AT006862, AT006864, AT006869, AT006870, AT006872, AT006873, AT006879 AT006884, AT006888, AT006891, AT006894, AT006898, AT006902–AT006904, AT006906, AT006907, AT006909, AT006911, AT006912, AT006914–AT006929, AT006931–AT006974.

Homology comparisons

Each edited EST was translated in all six reading frames and compared with the non-redundant database at the National Center for Biotechnology Information using the Basic Local Alignment Search Tool X (BLASTX) program, which compares translated nucleotide sequences with protein sequences. Homologies in the negative reading frames were disregarded, except for clones with the insert in the reverse orientation. Putative identification of the ESTs was based on the BLAST searches and, in some cases, on information contained in the MEDLINE database.
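The six-frame translation step can be illustrated with a minimal Python sketch. It is not the BLASTX implementation: it assumes the standard genetic code, and the function names (`reverse_complement`, `translate`, `six_frame_translation`) and the frame labels are hypothetical choices made for this illustration. Codons containing ambiguous bases are written as `X`.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate one reading frame; incomplete trailing codons are dropped."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with N etc.
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return translations for frames +1..+3 and -1..-3 (hypothetical labels)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
```

In this sketch the negative frames are the ones that would be disregarded unless the insert is known to be in the reverse orientation.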

Results

Sequencing of ESTs

Of the 450 sequencing reactions attempted, 415 produced readable amino acid-encoding sequences from the C. sinensis adult cDNA library (Table 1). The leading and trailing vector sequences, as well as poor-quality sequences, were trimmed from each text file. The 3′ vector and linker sequences were removed when the poly(A)+ tail was included in the sequencing read. Three classes of anomalous sequences were also excluded: sequences without an insert, sequences with a reverse insert, and sequences with an incorrect adaptor. The insert size of the clones ranged from 400 to 3,000 bp, and the mean size was 627 bp.
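The trimming described above can be sketched as follows. This is an illustrative approximation only, not the procedure used to edit the reads: the function name `trim_read`, its parameters, the minimum poly(A) length and the flanking sequences in the example are all assumptions, and exact string matching stands in for whatever vector screening was actually applied.

```python
import re


def trim_read(read, vector_5p="", vector_3p="", min_poly_a=10):
    """Strip flanking vector sequence and a terminal poly(A) run from a read.

    vector_5p / vector_3p are placeholder flanking sequences supplied by the
    caller; they are not the actual adaptor sequences of any cloning kit.
    """
    seq = read.upper()
    # Drop everything up to and including the 5' vector/adaptor, if present.
    if vector_5p:
        idx = seq.find(vector_5p)
        if idx != -1:
            seq = seq[idx + len(vector_5p):]
    # Drop the 3' vector/linker and everything after it, if present.
    if vector_3p:
        idx = seq.rfind(vector_3p)
        if idx != -1:
            seq = seq[:idx]
    # Drop a terminal poly(A) tail of at least min_poly_a bases.
    return re.sub(r"A{%d,}$" % min_poly_a, "", seq)


# Hypothetical read: 5' flank + insert + poly(A) tail + 3' flank.
example = "GGGAATTC" + "ATGGCCTTTGGG" + "A" * 15 + "CTCGAG" + "TTTT"
trimmed = trim_read(example, vector_5p="GAATTC", vector_3p="CTCGAG")

# Fraction of attempted reactions that gave readable sequence (415 of 450).
success_rate = 415 / 450
```

With the figures reported above, `success_rate` is roughly 0.92.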

Table 1. Expressed sequence tags (ESTs) of the Clonorchis sinensis adult cDNA library

Putative identification of EST sequences

BLASTX analysis showed that 277 of the 415 clones were strongly matched (P<10⁻⁹) to other proteins. On the basis of the database searches, these 277 ESTs were classified into two groups; the remaining 138 clones fell into a "no database match" category (Table 1). The first class comprised 53% (220 clones) of the sequences, with an average read of 562±95 bp, and matched genes with predicted or known functions from C. sinensis and other organisms. Forty-one of these clones matched C. sinensis sequences related to cysteine protease, the 28 kDa glutathione-S-transferase and antigen proteins. The other organisms included helminths (Paragonimus westermani, Fasciola hepatica, Schistosoma japonicum, Schistosoma mansoni, Caenorhabditis elegans), mammals (Homo sapiens, Rattus norvegicus, Mus musculus), insects (Drosophila melanogaster) and others. The second class comprised 14% (57 clones) of the sequences, with an average read of 548±93 bp, and matched sequences with no assigned function from C. elegans, H. sapiens, D. melanogaster, M. musculus and P. westermani. The remaining 33% of the sequences (138 clones), with an average read of 417±10 bp, had no significant match in the database. The first class was sorted into seven functional categories: energy metabolism (38), gene expression/RNA metabolism (21), regulatory/signalling components (14), protein metabolism/sorting (98), structure/cytoskeleton (29), membrane transporter (10) and antigen protein (10) (Table 2). Matches in these categories included ribosomal proteins, cysteine protease and heat shock proteins. In addition, the genes related to energy metabolism matched enzymes of the glycolytic pathway, the TCA cycle and oxidative phosphorylation. The other functional categories included genes encoding proteins involved in transcription and translation and genes associated with regulatory and signalling functions.
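As a consistency check on the counts reported above, the following short Python sketch tallies the three classes and the seven functional categories. The dictionary keys are shorthand labels chosen for this illustration; the numbers are taken directly from the text.

```python
TOTAL_READABLE = 415  # readable ESTs

# Clone counts per class, as reported in the text.
classes = {
    "matched, known or predicted function": 220,
    "matched, no assigned function": 57,
    "no database match": 138,
}

# Functional categories within the first class.
functional_categories = {
    "energy metabolism": 38,
    "gene expression/RNA metabolism": 21,
    "regulatory/signalling components": 14,
    "protein metabolism/sorting": 98,
    "structure/cytoskeleton": 29,
    "membrane transporter": 10,
    "antigen protein": 10,
}


def percentages(counts, total):
    """Return each count as a whole-number percentage of total."""
    return {name: round(100 * n / total) for name, n in counts.items()}


class_percentages = percentages(classes, TOTAL_READABLE)
matched_total = (classes["matched, known or predicted function"]
                 + classes["matched, no assigned function"])
category_total = sum(functional_categories.values())
```

The three classes account for all 415 readable ESTs, the two matched classes sum to the 277 clones with significant BLASTX hits, and the seven functional categories sum to the 220 clones of the first class.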

Table 2. Significant matches of C. sinensis adult ESTs with sequences present in DNA and protein databases

Discussion

Although the identification of novel parasite molecules by conventional methods is tedious, some of these molecules might serve as starting points for various studies. The EST analysis of C. sinensis gene expression performed in this study revealed many important genes, which extend and complement our knowledge of the biology of C. sinensis.

Energy production in this fluke depends largely on glycolysis (Kang et al. 1969; Hong et al. 2000). Through glycolysis, it takes up approximately 1.13 mg glucose/h per gram wet weight, produces lactate, and forms several types of amino acids as end products from exogenous glucose (Hong et al. 2000; Han et al. 1961). Therefore, the enzymes involved in glycolysis are essential for energy metabolism. In this work, the ESTs found represented a variety of proteins related to metabolism. It has been suggested that they produce ATP through the glycolytic pathway or aerobic metabolism (Table 2; AT006919, AT006737, AT006806, AT006873, AF480465, AT006919, AT006731, AT006921). However, an initial search of the genomic basis of the fundamental biochemical pathways of C. sinensis revealed that its biosynthetic networks are fairly consistent with those of humans.

A gene homologous to adenylate kinase 1 (AK 1, AF480464) was identified among the ESTs. AK 1 is indispensable for the growth of Escherichia coli (Cronan et al. 1972) and Schizosaccharomyces pombe (Konrad et al. 1993), indicating that it is an essential enzyme for life in a single cell. Under normal conditions, AK 1-knockout mice showed no phenotypic changes. However, under metabolic stress, compromised energetics were detected in the heart and skeletal muscle, suggesting the physiological significance of AK-catalysed phosphoryl transfer between the intracellular compartments in cellular energetic homoeostasis (Qualtieri et al. 1997; Janssen et al. 2001). The nucleoside analogues of AK have been used clinically for treating certain viral infections and malignant diseases (Pucar et al. 2000; Bourdais et al. 1996; Schneider et al. 1998). Therefore, they can also be targeted for drug research with recombinant antigens.

Several genes encoding antioxidant and detoxification enzymes, such as Cu/Zn-superoxide dismutase and glutathione-S-transferase, were identified. These proteins are believed to play a crucial role in protecting the parasite from the host immune effector mechanisms, and are being pursued as drug targets in other parasitic infections (Selkirk et al. 1998; Smooker et al. 1999). Of the proteases identified from the EST analysis, the high frequency of cysteine protease expression (30 out of 415 randomly selected clones) suggests that it has an important role in the metabolism and/or pathogenesis of clonorchiasis. It has also been proposed as a target for a structure-based approach to drug design (Park et al. 2001; Song and Rege 1991). In addition, one protease inhibitor was identified. Specific protease inhibitors have also been isolated from filarial nematodes, and are suggested to play a role in inhibiting the enzymes secreted from host immune cells, blocking antigen processing and controlling the endogenous proteases involved in parasite development (Zang et al. 1999; Yenbutr et al. 1995). The biological role of the protease inhibitor in C. sinensis requires further investigation. The ESTs identified in this study included a significant number of homologues of genes previously reported from closely related parasitic trematodes. A homologue of the S. japonicum fatty acid binding protein (FABP) was identified in C. sinensis; it has been demonstrated that FABP, which has cross-protective efficacy, could be used as a vaccine against trematode infections such as fascioliasis (Hillyer et al. 1987, 1988; Estuningsih et al. 1997) and schistosomiasis (Tendler et al. 1996). The ESTs also contained several antigenic protein genes of C. sinensis or other parasites. The recombinant glycine-rich C. sinensis protein and proline-rich antigen were previously reported to be useful for immunodiagnosis, with a high specificity (Yang et al. 2000; Kim et al. 2001).

AT006742 and AT006744 were found to be homologous to the Schistosoma tegumental antigens. Sm 20.8 was reported to be a member of a family of soluble tegument antigens that contain EF-hand motifs and are recognized as antigenic targets by protective antisera (Mohamed et al. 1998). The most significant feature of the flatworm tegument is that the ultimate boundary between the parasite and host is a living plasma membrane and its associated polyanionic coating or glycocalyx. This knowledge has revolutionized our understanding of symbiotic relationships, in that it raises the significance of the host-parasite relationship to a new conceptual level. In vivo, the alimentary tracts of schistosomes and the liver fluke function primarily in macromolecular digestion and the subsequent absorption of soluble digestive products. However, it is likely that these are augmented by the host-derived sugars and amino acids absorbed by the tegument (Halton et al. 1997). The tegument can therefore also be a target for developing diagnostics with recombinant antigens.

EST analysis of the various stages of the parasite life cycle, such as the metacercariae, which are the stage infective to the final host, will be useful for investigating gene expression and characterizing the purified proteins of interest in C. sinensis. A particular gene or class of genes identified within the EST dataset can be used as an appropriate target for diagnostics, vaccines or drug research and for further study of the development of clonorchiasis.