Introduction

The secretome constitutes the entire set of secreted proteins, representing up to 30% of the proteome of an organism [1], and includes functionally diverse classes of molecules such as cytokines, chemokines, hormones, digestive enzymes, antibodies, extracellular proteinases, morphogens, toxins and antimicrobial peptides. Some of these proteins are involved in a host of diverse and vital biological processes, including cell adhesion, cell migration, cell-cell communication, differentiation, proliferation, morphogenesis, survival and defense, bacterial virulence and immune responses [2]. Excretory/secretory proteins (ESPs) circulating throughout the body of an organism (for example, in the extracellular space) are localized to or released from the cell surface, making them readily accessible to drugs and/or the immune system. These characteristics make these molecules extremely attractive targets for novel vaccines and therapeutics, which are currently the focus of major drug discovery research programs [2-4]. In particular, proteins secreted by pathogens (bacterial, protozoan, fungal, viral or helminth) mediate interactions with the host: because they are present or active at the interface between the pathogen and the host cells, they can regulate or mediate the host responses and/or cause disease [5, 6].

A brief overview of the currently available methods for generating and analyzing pathogen secretome data is presented, followed by a critical analysis of their contribution to our understanding of pathogen infection and host responses, especially in comparison to other genome analysis approaches. Some early successes in the applications of secretome data in the areas of therapeutic target identification, diagnostic tools and pathogen control are also presented.

Approaches for secretome analysis

Genome sequence analysis

Genome sequence analysis is based on transcript profiling and computational analysis. The computational prediction of secreted proteins seeks to identify the presence of signal peptides, which are considered markers for classically secreted proteins. According to the signal hypothesis, most secreted proteins have an amino-terminal signal peptide sequence that targets proteins to the endoplasmic reticulum (ER) lumen via the sec-dependent protein translocation complex [7]. The genome-based approach is fast but has three major limitations. Firstly, the pathogen genome sequence has to be available. Although the genomes of several pathogens such as Vibrio cholerae [8] and Brugia malayi [9] are now available, several more organisms such as Ascaris lumbricoides and Wuchereria bancrofti are awaiting sequencing. Secondly, this approach depends on the accurate prediction of signal peptides for the detection of secretory proteins, so the many secretory proteins that lack an amino-terminal signal peptide are not predicted by this method. Lastly, secreted proteins are regulated at the post-transcriptional level, resulting in an apparent lack of correlation between the levels of production of secreted proteins and mRNA expression levels.
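The idea behind signal peptide prediction can be illustrated with a deliberately simple heuristic. The sketch below is not the method used by any published predictor, which rely on trained statistical models; it only checks for two well-known features of classical signal peptides, a positively charged amino-terminal region followed by a hydrophobic core. The function names, the 30-residue window, the minimum run length of 7 and the example sequences are all assumptions chosen for illustration.

```python
# Toy heuristic for classical signal peptides (illustrative only).
# Thresholds and the hydrophobic residue set are assumptions, not
# parameters taken from any published predictor.
HYDROPHOBIC = set("AVLIMFWC")


def longest_hydrophobic_run(segment):
    """Return the length of the longest unbroken hydrophobic stretch."""
    best = run = 0
    for aa in segment:
        run = run + 1 if aa in HYDROPHOBIC else 0
        best = max(best, run)
    return best


def looks_like_signal_peptide(protein, window=30, min_run=7):
    """Flag sequences with a charged N-terminus and a hydrophobic core."""
    seq = protein.upper()
    if not seq.startswith("M"):
        return False
    # n-region: at least one Lys/Arg among the residues after the start Met
    has_positive = any(aa in "KR" for aa in seq[1:6])
    # h-region: a hydrophobic stretch somewhere in the first `window` residues
    return has_positive and longest_hydrophobic_run(seq[:window]) >= min_run


# Hypothetical example sequences, invented for this sketch
secreted_like = "MKKLLLLAALLVAGSAQADEDEKTNQ"
cytosolic_like = "MSDEEKTNQDSGEEDKPRSTNQGE"
print(looks_like_signal_peptide(secreted_like))
print(looks_like_signal_peptide(cytosolic_like))
```

A heuristic of this kind also makes the second limitation concrete: a protein exported without an amino-terminal signal peptide would never be flagged, however abundant it is in the secretome.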

Proteomics approaches

With the advent of mass spectrometry (MS) and the ensuing bioinformatics analyses, proteomic approaches have become the preferred route for obtaining secretome data. The two main methods available here are gel-based and gel-free proteomics.

Gel-based proteomic analysis

Two-dimensional gel electrophoresis (2-DE) with MS is the most established proteomic approach. This method allows the separation of complex mixtures of intact proteins at high resolution. These protein mixtures are first separated according to their charge in the first dimension by isoelectric focusing, and according to size in the second dimension by SDS-PAGE (sodium dodecyl-sulfate polyacrylamide gel electrophoresis), and then analyzed by peptide mass fingerprinting after in-gel tryptic digestion. This approach has been widely used in pathogen secretome studies, such as that of Helicobacter pylori [10].
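Peptide mass fingerprinting compares the measured masses of tryptic peptides with masses computed from candidate protein sequences. A minimal sketch of the computational side is given below, assuming the commonly used rule that trypsin cleaves after lysine or arginine unless the next residue is proline, and using standard monoisotopic residue masses. Missed cleavages and chemical modifications are ignored, and the function names and test sequence are hypothetical.

```python
# In-silico tryptic digest and monoisotopic peptide masses (sketch).
# Assumes complete digestion and unmodified residues.
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(protein):
    """Split after K or R, except when the following residue is P."""
    seq = protein.upper()
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # carboxy-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(MONOISOTOPIC[aa] for aa in peptide) + WATER


# Hypothetical sequence, invented for this sketch
for pep in tryptic_digest("AKRPGKC"):
    print(pep, round(peptide_mass(pep), 4))
```

In practice the list of computed masses for every database protein is matched against the observed spectrum within a mass tolerance, and the best-scoring protein is reported.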

Although 2-DE currently remains the most efficient method for the separation of complex mixtures of proteins, this technique has a number of limitations, including poor reproducibility between gels, low sensitivity for low-abundance proteins and hydrophobic membrane proteins, limited sample capacity, and a narrow linear range of visualization procedures. In addition, this technique is time consuming and labor intensive, and its limited amenability to automation restricts its efficiency in protein detection.

Gel-free proteomic analysis

To overcome the drawbacks of gel-based approaches, efforts have been made to introduce gel-free MS-based proteomics approaches. In these newly emerging techniques, instead of depending on gels to separate and analyze proteins, complex mixtures of proteins are first digested into peptides or peptide fragments, then separated by one or several steps of capillary chromatography, and finally analyzed by tandem MS (MS/MS). The secretome analysis of Leishmania donovani [11] adopted liquid chromatography coupled with automated MS/MS. Matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) MS, a popular tool for the analysis of complex molecules, was used to analyze the secretome of HepG2 cells infected with the dengue virus [12].

Bioinformatics approach

With the generation of large-scale expressed sequence tag (EST) and genomic data from worldwide sequencing efforts, secretome analysis can be advantageously carried out using bioinformatics analysis systems such as EST2Secretome [13], a pipeline for the prediction of secretory proteins. EST2Secretome accepts EST data for preprocessing, assembly and conceptual translation into protein sequences. Alternatively, peptide sequences can be directly provided to the pipeline, which then selects secreted proteins by identifying an amino-terminal secretory signal peptide and the absence of transmembrane segments. The secreted protein set is then annotated extensively: with gene ontologies, with protein function through mapping to protein domains and metabolic pathways, with homologs from a well-studied model organism (Caenorhabditis elegans), with protein interaction partners, and by mapping to a manually curated signal peptide database [13, 14]. Figure 1 provides an overview of the EST2Secretome workflow. The application of EST2Secretome to approximately 0.5 million EST sequences from parasitic nematodes resulted in the identification of key ESPs, some of which are already being trialed as vaccine candidates and as targets for therapeutic intervention [13]. Similar studies reporting the ESPs of specific parasitic nematodes have been recently reviewed [14]. The accuracy of EST-based predictions of ESPs was assessed with proteomic data from Fasciola hepatica [15]. The EST2Secretome pipeline was successful in identifying the major secreted proteins of adult F. hepatica. Integration of bioinformatics analysis with proteomics data is important for the study of helminth host-pathogen relationships, to distinguish proteins that are secreted extracorporeally from those secreted within the internal tissues of the parasites. Additionally, this integrated approach has identified major helminth proteins that may be secreted by novel or non-classical secretory pathways.

Figure 1. Overview of the EST2Secretome workflow. Pathogen EST sequences are analyzed by EST2Secretome to predict excretory/secretory (ES) proteins, which are functionally annotated in terms of InterPro domains, KEGG pathways, interaction partners and homologues from pathogenic, non-pathogenic and host databases.
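The selection rule at the core of such a pipeline, keeping proteins that carry a predicted signal peptide and have no predicted transmembrane segments, can be sketched as a simple filter. This is an illustration of the logic only, not the EST2Secretome implementation: the record type, field names and protein identifiers are hypothetical, and the prediction values are assumed to come from upstream signal peptide and transmembrane predictors.

```python
# Sketch of the secreted-protein selection step (illustrative only).
# Field names and identifiers are hypothetical; predictions are assumed
# to have been produced by upstream tools.
from dataclasses import dataclass


@dataclass
class Prediction:
    protein_id: str
    has_signal_peptide: bool       # from a signal peptide predictor
    transmembrane_segments: int    # from a transmembrane helix predictor


def select_secreted(predictions):
    """Keep proteins with a signal peptide and no transmembrane segments."""
    return [
        p.protein_id
        for p in predictions
        if p.has_signal_peptide and p.transmembrane_segments == 0
    ]


candidates = [
    Prediction("contig_001", True, 0),   # signal peptide, soluble
    Prediction("contig_002", True, 3),   # signal peptide, membrane-anchored
    Prediction("contig_003", False, 0),  # no signal peptide
]
print(select_secreted(candidates))
```

Requiring the absence of transmembrane segments removes proteins that enter the secretory pathway but remain anchored in a membrane rather than being released.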

Towards a better understanding of host-pathogen interactions

Proteins secreted by pathogens can influence infection and modify host defense signaling pathways. Proteomic analysis of secreted proteins from Rhodococcus equi [16], Plasmodium falciparum [17], H. pylori [18] and the eggs of Schistosoma mansoni [19] confirms the major role of the secretome in pathogenesis. Secreted proteins from pathogens modify and adapt the host environment for pathogen survival, invoking processes such as helminth immunoregulation [20]. Inside the host environment, the secretome serves the role of a parasite genome, as the secreted proteins fulfill all the requirements of the parasite inside the host. While the secretory proteins of pathogens play a key role in pathogenesis, the secretome of the infected host cell is equally important in understanding secreted proteins underpinning host defense mechanisms against pathogen attack, such as the release of GDSL lipase 2 in Arabidopsis, which plays a role in pathogen defense [21]. Another host defense mechanism is the secretion of secretory immunoglobulin As (IgAs) against mucosal pathogens to limit the entry of bacteria, a process known as 'immune exclusion' [22-24]. A study on the malarial parasite P. falciparum [17] concluded that export of proteins from the intracellular parasite to the erythrocyte is vital for infection. These exported proteins are required for the virulence and rigidity of the P. falciparum-infected erythrocyte, which results in malaria infection [25]. This export is guided by a host targeting (HT) signal present on the parasite proteins engaged in remodeling the erythrocyte. The role of this HT motif in the transport of these parasite proteins is yet to be determined.

The major secretions of adult parasites are proteolytic enzymes that help parasites to penetrate the host skin and to cleave host IgE antibodies to regulate the host immune system. These ESPs are exported through classical and non-classical secretory pathways. Classical secretory pathways are mediated by the presence of short amino-terminal signal peptide sequences that are predicted accurately by algorithms [13, 14]. Non-classical secreted proteins, on the other hand, are hard to track because they are usually secreted by ER/Golgi-independent protein secretion pathways that do not require signal peptide sequences [26]; they are usually predicted using the SecretomeP method [27]. In a study on B. malayi [28], it was found that filarial ESPs are similar to cytokines, chemokines and other immune effector molecules of humans, and are predicted to promote parasite survival and development in the host environment. A comparative secretome analysis [17] identified 11 proteins that are conserved across human- and rodent-infecting Plasmodium species, suggesting a critical role for these proteins in interacting with and remodeling the host erythrocyte. The secretome of a mammalian parasite consists of proteins required for parasite survival, including those involved in metabolism, reproduction and modification of the host immune system. Identifying pathogen ESPs will permit the identification of host receptors and host cells with which these proteins interact, improving our understanding of the molecular mechanisms involved in pathogenesis.

Recent secretome data

Secretome data on pathogenic organisms are sparse and limited to specific experimental methods or sample types. Over the past few years, a wealth of information on bacteria and the malarial and filarial parasites has become available, although there are still very few data on the infectious agents causing 'neglected tropical diseases' [29]. Major secretome analyses of helminth parasites have attempted to address this deficiency [14]. Examples from recent pathogen studies providing secretome data are listed in Table 1, giving details of the pathogen, its preferred host, the disease caused and the experimental approach. The proteomics approach is based on SDS-PAGE coupled with MS techniques for all studies in Table 1, while most of the bioinformatics analyses involve BLAST (Basic Local Alignment Search Tool) searches against the NCBI (National Center for Biotechnology Information, USA) databases and use of the Mascot search engine to match MS data against sequence databases, except for the F. hepatica study by Robinson et al. [15], in which the EST2Secretome pipeline [13] was used for bioinformatics data analysis and annotation.

Table 1 Examples of recent secretome data for major pathogens

Clinical applications

Identification of drug targets and vaccine development

As more secretome analysis studies are conducted around the world, our knowledge of the virulence factors present in the secretome has substantially increased. As many of the proteins present in the pathogen secretome remain unannotated, we can assign function to these proteins by homology searches for similar proteins of known function from different organisms. Furthermore, we can use Gene Ontology (GO) terms ascribed to database matches to glean GO terms for pathogen ESPs [13, 14]. The secretome of a pathogen cell provides a rich source of protein antigens that can be used for vaccine development. A very recent study on Mycobacterium immunogenum has investigated the protein antigens of the virulence factors in infection [30], with implications for vaccine development. The Human Hookworm Vaccine Initiative has spearheaded the identification of several prominent anti-parasite vaccine candidates, including a family of pathogenesis-related proteins, such as the Ancylostoma-secreted proteins [31, 32]. Major vaccine antigens determined as a result of this initiative are hydrolytic enzymes, including proteases and acetylcholinesterases from the infective third-stage larval (L3) and adult stages. Major L3 candidates found are Ancylostoma-secreted proteins (ASPs), astacin-like metalloprotease (MTP), acetylcholinesterase (ACH) and transthyretin (TTR). From the adult stage, major antigens found are tissue inhibitor of metalloproteases (such as Ac-TMP), aspartic proteases and cysteinyl proteases. Clinical trials for hookworm infection vaccines are in progress.

ESPs from B. malayi [28], H. pylori [18] and Bacillus anthracis [33] have been identified, and drug and vaccine development is under way.
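The homology-based annotation described in this section, in which GO terms ascribed to database matches are carried over to unannotated pathogen ESPs, can be sketched as follows. This is a minimal illustration under stated assumptions: the hit tuples stand in for parsed similarity-search results, the E-value cutoff of 1e-5 is an arbitrary choice, and all protein identifiers and the function name are hypothetical.

```python
# Sketch of GO-term transfer by homology (illustrative only).
# `hits` stands in for parsed similarity-search output; the cutoff and
# all identifiers are assumptions made for this example.
def transfer_go_terms(hits, go_by_subject, max_evalue=1e-5):
    """Copy GO terms from significant database matches to each query."""
    annotations = {}
    for query_id, subject_id, evalue in hits:
        if evalue > max_evalue:
            continue  # match too weak to justify transferring function
        terms = go_by_subject.get(subject_id, set())
        if terms:
            annotations.setdefault(query_id, set()).update(terms)
    return annotations


hits = [
    ("esp_1", "ref_A", 1e-30),  # strong match with GO annotation
    ("esp_1", "ref_B", 0.5),    # weak match, ignored
    ("esp_2", "ref_C", 1e-8),   # strong match, but subject unannotated
]
go_by_subject = {
    "ref_A": {"GO:0006508"},    # proteolysis
    "ref_B": {"GO:0005576"},    # extracellular region
}
print(transfer_go_terms(hits, go_by_subject))
```

Annotations obtained this way are inferences rather than experimental evidence, so a stricter cutoff trades coverage for reliability.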

Diagnostic tools

MS has proved to be a successful tool for protein analysis. Secretory proteins serve as a rich source of biomarkers, as reviewed by Chaerkady and Pandey [34]. These biomarkers can be used in array-based methods for the diagnosis of medical conditions that occur as a result of pathogen infection, such as dengue virus infection [35] and meningitis [36]. Array-based approaches are more specific and faster than conventional diagnostic techniques. One such study, of Trypanosoma congolense and Trypanosoma evansi [37], which cause the major strains of animal trypanosomosis, showed differences in their virulence and pathogenicity and has led to the determination of novel ESP targets for species-specific diagnosis and vaccine development.

Host-induced gene silencing using RNA interference technology

The availability of secretome data and the advent of RNA interference (RNAi) technology open up the possibility of host-induced gene silencing in pathogens, making the host resistant to infection. Parasite control in Arabidopsis thaliana has been achieved by host-induced gene silencing of nematode genes [38].

Conclusions

Secretome analysis is a promising area of research providing insights into different pathogenic infections. Recent studies have uncovered a myriad of processes involved in pathogenic infections at the molecular level, enabling us to develop novel therapeutic solutions to eradicate these infections. Although much work remains to be done in generating secretome data for several pathogens, the availability of secretome data for major pathogens such as the malarial and filarial parasites, and the application of bioinformatics tools, will provide us with a working knowledge of host-pathogen interactions and the immune evasion strategies adopted by pathogenic organisms, which will in turn guide the development of therapeutics or vaccines.