Findings

Introduction

Taenia solium taeniasis/cysticercosis is a zoonotic helminth infection mainly found in poor and rural regions of Africa, Asia and Latin America, where it has a large impact on public health [1–3]. The adult tapeworm develops in the small intestine of humans (taeniasis). Mature proglottids full of eggs break off from the distal end of the worm and leave the body with the stool. Both humans and pigs can act as intermediate hosts when the infective larval stages (oncospheres) inside the eggs are ingested and liberated in the stomach. The oncospheres then enter the bloodstream through the intestinal mucosa. Cysticercosis is caused when oncospheres lodge themselves in the subcutaneous and muscle tissues and the central nervous system, where they develop into metacestode larval stages (cysts). In humans, epilepsy and other neurological symptoms can be provoked by immunological reactions against degenerating cysts that have developed in the central nervous system (neurocysticercosis).

Diagnosis of porcine and human (neuro)cysticercosis largely depends on antigen and/or antibody detection, but these serological methods have their specific drawbacks [4]. Improving current diagnostic assays requires better knowledge of the proteins secreted and excreted by the metacestodes.

Proteomic experiments involving liquid chromatography and tandem mass spectrometry (LC-MS/MS) typically attempt to match the generated experimental spectra to in silico spectra from a (target) protein database. Ideally, this database contains every protein likely to be in the sample, but obtaining such an all-inclusive protein database proves difficult when there is little to no genomic information available, as was the case for T. solium until recently [5]. In our previous study, we bypassed this limitation by using a custom database with known proteins from related helminths (Taenia, Echinococcus, Schistosoma and Trichinella) as a target database in the LC-MS/MS experiments [6]. We deliberately did not use translated expressed sequence tags (ESTs), because we wanted to investigate the usefulness of a target database made up of protein sequences originating mostly from (closely) related helminths.
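As a rough illustration of what deriving in silico data from a target database involves, the sketch below digests a protein sequence using trypsin rules (cleavage after K or R unless the next residue is P) and computes monoisotopic peptide masses. It is not the search engine's implementation: the function names and the example sequence are hypothetical, only the 20 standard residues are handled, and real search engines additionally model fragment ions, missed cleavages and modifications.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per peptide (free N- and C-terminus)


def tryptic_digest(protein):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(MONOISOTOPIC[residue] for residue in peptide) + WATER


if __name__ == "__main__":
    # Hypothetical sequence, for illustration only.
    for pep in tryptic_digest("MKRPAGKLL"):
        print(pep, round(peptide_mass(pep), 4))
```

Matching then amounts to comparing such theoretical values with the measured precursor and fragment masses within a tolerance, which is why a protein absent from the target database cannot be identified.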

The usefulness of ESTs for the identification of helminth proteins has already been described for, e.g., Haemonchus contortus [7, 8] and Echinococcus granulosus [9]. In the case of T. solium, ESTs from different parasite stages have been made available by different research groups, both published [10, 11] and unpublished (Huang J. et al., Analysis of Taenia solium and Taenia saginata adult gene expression profile, 2009 and Aguilar-Diaz H. et al., Taenia solium larva/adult ESTs, 2007). In this study, we use T. solium ESTs combined with the Basic Local Alignment Search Tool (BLAST) and protein mapping to supercontigs of E. granulosus (a member of the Taeniidae family) to investigate whether we could increase the number of T. solium metacestode excretion/secretion protein identifications from the previous study.

Materials and methods

Generation of the data set

The in vitro production of the T. solium metacestode excretion/secretion proteins from Peru and Zambia at 24 h and 48 h and the generation of line spectra mzXML files have been previously described [6].

Database design and data analysis

To construct the target database, 30,700 expressed sequence tags were downloaded from the National Center for Biotechnology Information (NCBI) website in April 2012 and a six-frame translation was performed using transeq [12]. A Sus scrofa database with 1,388 Swiss-Prot sequences (http://www.uniprot.org/) and the common Repository of Adventitious Proteins database (112 protein sequences; http://ftp.thegpm.org/fasta/cRAP/crap.fasta) were also included to assist detection of host proteins and accidental contaminations, respectively. A decoy database with 185,700 reversed sequences was created using decoyfasta. These databases were fused into one final database. Database searching with X!Tandem (2010.10.01.1) [13] and subsequent analyses with PeptideProphet [14, 15], iProphet [16] and ProteinProphet [17] were also performed as previously described [6]. All of the above-mentioned tools, except transeq, are included with the Trans-Proteomic Pipeline v4.5 RAPTURE rev 2 [18]. The identified translated ESTs were further filtered to a false discovery rate of < 1% and ESTs with an individual probability of zero were discarded. The remaining ESTs were searched with BLAST against the NCBI nonredundant database (E-value < 10⁻¹⁰) and for each recognized EST, the best matching protein was retained. The resulting proteins were then screened by mapping them to the E. granulosus supercontigs using TBLASTN (http://www.sanger.ac.uk/cgi-bin/blast/submitblast/Echinococcus). Identifications with a Score > 200 were considered valid. Identifications with a lower score were manually evaluated and proteins originating from T. solium were retained. This step also helped to filter out host contaminations. The retained proteins were then grouped based on homology. All proteins that could not be grouped and were identified by only one EST were also discarded.
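The database construction described above can be sketched as follows. This is a minimal illustration, not the transeq or decoyfasta code used in the study: it assumes the standard genetic code (NCBI translation table 1), writes unknown codons as X, and uses hypothetical function names. The size calculation assumes one reversed decoy per target entry, which appears consistent with the 185,700 decoy sequences reported.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; ambiguous codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    return [
        translate(strand[offset:])
        for strand in (dna, reverse_complement(dna))
        for offset in range(3)
    ]


def make_decoy(protein):
    """Reversed-sequence decoy for one target entry."""
    return protein[::-1]


def database_sizes(n_ests, n_host, n_contaminants):
    """Target and decoy entry counts, assuming one decoy per target."""
    targets = n_ests * 6 + n_host + n_contaminants
    return targets, targets


if __name__ == "__main__":
    print(six_frame_translation("ATGGCC"))
    print(database_sizes(30_700, 1_388, 112))
```

With the numbers given in the text, 30,700 ESTs in six frames plus 1,388 host and 112 contaminant sequences give 185,700 target entries and an equal number of decoys.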
Finally, Blast2GO was used for Gene Ontology (GO) annotations (biological process, molecular function and cellular component) and the construction of level 2 pie charts [19]. In order to gain more specific information, the largest categories were analyzed to levels 3 and 4.
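The post-search filtering that precedes the GO annotation (BLAST E-value cut-off, best hit per EST, TBLASTN score threshold and removal of ungrouped single-EST proteins) can be sketched as below. The data structures, field names and the tie-breaking rule for the "best" hit (lowest E-value) are assumptions made for illustration; the manual evaluation of low-scoring identifications is represented only as a set of proteins flagged for review.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BlastHit:
    est_id: str
    protein: str
    evalue: float


def best_hit_per_est(hits, max_evalue=1e-10):
    """Keep, for each EST, the hit with the lowest E-value below the cut-off."""
    best = {}
    for hit in hits:
        if hit.evalue >= max_evalue:
            continue
        current = best.get(hit.est_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.est_id] = hit
    return best


def split_by_mapping_score(proteins, mapping_scores, min_score=200):
    """Accept proteins mapping with a score above the threshold;
    the rest are flagged for manual evaluation."""
    accepted = {p for p in proteins if mapping_scores.get(p, 0) > min_score}
    return accepted, set(proteins) - accepted


def drop_ungrouped_singletons(est_to_protein, group_sizes):
    """Discard proteins supported by one EST that belong to no homology group.

    group_sizes maps a protein to the number of proteins in its group
    (1 means the protein could not be grouped).
    """
    est_counts = defaultdict(int)
    for protein in est_to_protein.values():
        est_counts[protein] += 1
    return {
        p for p, n in est_counts.items()
        if n > 1 or group_sizes.get(p, 1) > 1
    }


if __name__ == "__main__":
    # Hypothetical hits, for illustration only.
    hits = [
        BlastHit("est1", "P1", 1e-50),
        BlastHit("est1", "P2", 1e-20),
        BlastHit("est2", "P3", 1e-5),
        BlastHit("est3", "P1", 1e-30),
        BlastHit("est4", "P4", 1e-12),
    ]
    best = best_hit_per_est(hits)
    est_to_protein = {est: hit.protein for est, hit in best.items()}
    print(est_to_protein)
    print(split_by_mapping_score(set(est_to_protein.values()),
                                 {"P1": 450, "P4": 120}))
    print(drop_ungrouped_singletons(est_to_protein, {"P1": 1, "P4": 1}))
```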

Results and discussion

Identified proteins and gene ontology annotation

In this study, 297 proteins (from 1,787 translated ESTs) were identified and organized in 106 protein groups based on homology (Additional file 1). For simplicity, each protein group is represented by one protein. The groups were further organized by Gene Ontology annotation information on biological process and molecular function. A total of 48 protein groups are labelled with an asterisk, indicating that they were also identified in the previous study (Additional file 2) [6]. For brevity, Table 1 shows only the 58 newly identified protein groups. For a number of proteins/protein groups, no Gene Ontology information was available. Nonetheless, many of them, like the 8 kDa protein family [20], have been extensively studied and used in diagnostic assays.

Table 1 Protein groups ( n = 58) newly identified in Taenia solium metacestode excretion/secretion proteins, organized by Gene Ontology annotation information on biological processes and molecular functions

Most of the identified protein groups could be categorized in miscellaneous binding activities (e.g. Actin binding, calcium binding and metal ion binding), various metabolic processes, gluconeogenesis (Triosephosphate isomerase, Enolase, Phosphoenolpyruvate carboxykinase and Phosphoglucose isomerase), glycolysis (Glyceraldehyde-3-phosphate dehydrogenase, Phosphoglycerate kinase, Phosphoglycerate mutase and Fructose-bisphosphate aldolase) and proteins with (endo)peptidase activity, including cysteine-type (Calpain, UDP-glucose 4-epimerase and Cathepsin), threonine-type (Proteasome subunits) and serine-type endopeptidase activity (Trypsin-like protein). Endopeptidase inhibitors with both serine-type (Kunitz protein 8 and Leukocyte elastase inhibitor) and cysteine-type endopeptidase inhibitor activity (Immunogenic protein Ts11) and components of the enzymatic antioxidant system of Taeniidae (Cu/Zn Superoxide dismutase, Glutathione S-transferase and Peroxiredoxin) were also identified [21].

Gene Ontology level 2 pie charts were created for biological process (Figure 1A), molecular function (Figure 1B) and cellular component (Figure 1C). To avoid overly busy charts, the sequence filter was set to 10. The two largest categories of the biological process chart were cellular and metabolic processes. Others included biological regulation, response to stimulus, multicellular organismal processes and cellular component organization or biogenesis. Further investigation of the general cellular and metabolic processes revealed primary and cellular metabolic processes at level 3 and protein, cellular macromolecule and cellular nitrogen compound metabolic processes at level 4 (Additional file 3, tab 1). Molecular function was clearly divided between binding and catalytic activity. GO level 3 showed protein binding and hydrolase activity while level 4 entailed mostly nucleotide binding, hydrolase activity (acting on acid anhydrides), cation binding, peptidase activity, cytoskeletal and identical protein binding (Additional file 3, tab 2). The level 2 pie chart for the cellular component indicated cell and organelle as the largest categories. Further analyses showed mostly cell part and membrane-bound organelle, and intracellular (part) GO terms at levels 3 and 4, respectively (Additional file 3, tab 3). Human Keratin and porcine Trypsin were identified in all samples. As Keratin is a common contamination and Trypsin was deliberately added during the LC-MS/MS experiments, both were omitted from the final results.

Figure 1

Gene Ontology level 2 pie charts displaying the biological processes (A), the molecular functions (B) and the cellular components (C) of the 297 proteins that were identified in the Taenia solium metacestode excretion/secretion proteins. Values within parentheses are the number of sequences associated with each Gene Ontology term. The biological processes are mostly metabolic and cellular processes, while the molecular functions are predominantly catalytic activity and binding. The cellular components reveal a number of intracellular proteins. All charts were created using Blast2GO with the sequence filter set to 10.

The presence of intracellular/non-secreted proteins in the ESPs is interesting and has been observed in other ESP studies before [22, 23]. Although it is highly likely that the majority of those proteins are indeed excreted or secreted by the parasite, the possibility that they are the result of leakage due to cyst damage or death should not be excluded.

In general, the findings reported in this study are comparable to recent studies on other helminth genera like Echinococcus [23], Schistosoma [24] and Clonorchis [25], indicating that excretion/secretion proteomes are not very different between helminth genera/species.

Comparison between the two studies

When comparing the level 2 GO terms identified in both studies (Table 2), all GO terms from the previous study were identified here as well. Additionally, we identified 6 new GO terms with the EST analyses: rhythmic process (GO:0048511), antioxidant activity (GO:0016209), molecular transducer activity (GO:0060089), protein binding transcription factor activity (GO:0000988), receptor activity (GO:0004872) and synapse (GO:0045202). A direct comparison between numbers should be avoided, because proteins can have multiple GO terms and because the protein groups contain homologous proteins (especially in the previous study, where this is a logical result of the target database construction). Nevertheless, the general levels of abundance (i.e. the number of proteins in each GO term) are largely comparable between the two studies. In both studies, cellular process, metabolic process and biological regulation are the largest groups for ‘biological process’, binding and catalytic activity are the largest groups for ‘molecular function’, and cell and organelle are the largest groups for ‘cellular component’. The 6 new GO terms were identified by a very small number of proteins and may be a result of proteins being linked to multiple GO terms. This is supported by the fact that the proteins linked to these GO terms are homologous to other proteins identified in both studies, so none of these GO terms was identified by a ‘new’ protein group.
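The comparison of level 2 GO terms between the two studies is essentially a set comparison, sketched below. The six new terms and their identifiers are taken from the text; the shared terms listed are only the largest categories named above, not the full content of Table 2, and the function name is hypothetical.

```python
# GO terms reported as new in the EST-based analysis (from the text).
NEW_TERMS = {
    "rhythmic process": "GO:0048511",
    "antioxidant activity": "GO:0016209",
    "molecular transducer activity": "GO:0060089",
    "protein binding transcription factor activity": "GO:0000988",
    "receptor activity": "GO:0004872",
    "synapse": "GO:0045202",
}

# Illustrative subset of terms found in both studies (largest categories only).
SHARED_EXAMPLES = {
    "cellular process", "metabolic process", "biological regulation",
    "binding", "catalytic activity", "cell", "organelle",
}


def compare_go_terms(previous, current):
    """Partition GO terms into shared, current-only and previous-only sets."""
    return {
        "shared": previous & current,
        "only_current": current - previous,
        "only_previous": previous - current,
    }


if __name__ == "__main__":
    previous_study = set(SHARED_EXAMPLES)
    this_study = SHARED_EXAMPLES | set(NEW_TERMS)
    result = compare_go_terms(previous_study, this_study)
    for name in sorted(result["only_current"]):
        print(NEW_TERMS[name], name)
    print("terms lost relative to previous study:", len(result["only_previous"]))
```

Because one protein can carry several GO terms, such a comparison says which terms are present in each study, not how many distinct proteins support them.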

Table 2 Gene Ontology level 2 annotations identified in this study alongside the ones identified in the previous study

Concluding remarks

In this study, we have used a library of translated ESTs combined with BLAST and mapping strategies not only to confirm previously identified T. solium metacestode excretion/secretion proteins, but also to identify several new proteins, thereby increasing the overall number of protein identifications.

The larger and more complete the EST database, the better the proteomic coverage that is likely to be obtained. No ESTs from other Taeniidae were used in this study, since the available T. solium ESTs were already a merge of EST submissions by different groups and were therefore likely to offer decent proteome coverage. However, in cases where only a small EST library with low coverage is available, one could also include protein sequences and/or ESTs from related organisms in a combined database. This may be particularly advantageous in proteomic studies on less studied, unsequenced organisms. It should be noted that research on non-sequenced organisms mostly relies on homology to already existing proteins from other (preferably closely related) organisms. Therefore, there is no possibility of finding unique proteins, unless (i) de novo sequencing is performed on the good-quality unmatched experimental spectra or (ii) ESTs that were identified by spectra but remained unmatched during BLAST are further investigated.

Finally, it is important to realize that, although the mapping to the E. granulosus supercontigs helped to remove S. scrofa host proteins (e.g. Albumin, Protegrin and Hemopexin), some may still be present. Heat shock protein 70, for example, is identified both in S. scrofa and E. granulosus.

In future T. solium work, it is sensible to make use of the T. solium genome sequence that was recently published [5]. However, since no curated protein database or convenient mapping solution is currently available and, for many other helminths, no complete genome sequence is available, the method described here is still valid.

Availability of supporting data

The data sets supporting the results of this article are available in the PRIDE repository at http://www.ebi.ac.uk/pride with accession numbers 19232–19267.