Introduction

Primary plastid endosymbiosis and Paulinella chromatophora

Available data clearly demonstrate that plastids evolved from free-living cyanobacteria acquired by heterotrophic eukaryotic cells 1–2 billion years ago (Butterfield 2000; Douzery et al. 2004; Yoon et al. 2004; Kutschera and Niklas 2005; Kutschera 2009). This process, called primary endosymbiosis, resulted in plastids surrounded by two membranes. Such primary plastids are characteristic of three eukaryotic lineages: (i) glaucophytes, (ii) red algae, and (iii) green plants, including green algae and their land-plant descendants (Cavalier-Smith 2000; Palmer 2003; Gould et al. 2008; Archibald 2009). Although some authors still consider polyphyly of these photosynthetic groups and their plastids a reasonable hypothesis (Nozaki et al. 2007; Stiller 2007; Howe et al. 2008; Hampl et al. 2009), most now accept a monophyletic origin (Rodríguez-Ezpeleta et al. 2005; Reyes-Prieto et al. 2007; Burki et al. 2008). Consequently, glaucophytes, red algae, and green plants are grouped together in the kingdom of Archaeplastida or Plantae.

It has been common to assume that such an endosymbiosis was so difficult a process that it must be a unique evolutionary event, occurring only once in the common ancestor of archaeplastid lineages (Cavalier-Smith 2000; Keeling et al. 2005). Recently this view has been challenged by Paulinella chromatophora (Lauterborn 1895; Melkonian and Mollenhauer 2005), a thecate amoeba belonging to the supergroup Rhizaria (Bhattacharya et al. 1995; Yoon et al. 2009). P. chromatophora harbors two cyanobacterium-derived endosymbionts acquired independently of classical primary plastids ~60 million years ago (Marin et al. 2005, 2007; Archibald 2006; Yoon et al. 2006, 2009; Nowack et al. 2008) (Fig. 1). These photosynthetically active endosymbionts retain a peptidoglycan wall; like classical primary plastids they are surrounded by two membranes and are deeply integrated into the host cell’s metabolism and genetics.

Fig. 1

Primary plastid endosymbiosis in the rhizarian amoeba Paulinella chromatophora. About 60 million years ago, a heterotrophic and aplastidal ancestor of P. chromatophora engulfed a cyanobacterium, which was then stably integrated within the host cell as a photosynthetic endosymbiont. Today, the endosymbiont/plastid maintains the peptidoglycan wall but has a significantly reduced genome that has lost many essential genes. It is estimated that more than 30 endosymbiont genes were transferred to the host nuclear genome through the process known as the endosymbiotic gene transfer (EGT). This suggests that protein products of these genes must be imported into Paulinella endosymbionts. Such transport could proceed co-translationally in vesicles derived from the host endomembrane system (1) or by a post-translational pathway involving protein-conducting channels (2)

This tight host–endosymbiont relationship is especially well demonstrated by substantial reductions of the Paulinella endosymbiont genomes, which have been fully sequenced in two different strains, CCAC 0185 and FK01 (Nowack et al. 2008; Reyes-Prieto et al. 2010). The sizes and coding capacities of both genomes have decreased approximately threefold, down to ~1 Mb and ~900 genes compared to a ~3 Mb genome encoding ~3,500 genes in their closest free-living relative, the cyanobacterium Synechococcus WH5701 (see Nowack et al. 2008; Reyes-Prieto et al. 2010). This drastic genome reduction has been accompanied by the loss of many genes with products involved in essential biosynthetic pathways, such as the synthesis of amino acids (e.g., glutamine, arginine, methionine) and co-factors (e.g., riboflavin, biotin, coenzyme A) (Nowack et al. 2008; Reyes-Prieto et al. 2010). Moreover, and more importantly, individual genes were lost from vital and otherwise intact biosynthetic pathways (e.g., hemD encoding uroporphyrinogen III synthase), gene expression machinery (e.g., ligA encoding NAD-dependent DNA ligase), and subcellular structures (e.g., sulA encoding a cell-division inhibitor that blocks FtsZ polymerization). Finally, Paulinella endosymbiont genomes are especially poor in genes coding for solute channels and membrane transporters (Nowack et al. 2008; Reyes-Prieto et al. 2010). All these features strongly suggest that Paulinella endosymbionts import nuclear-encoded proteins in ways similar to other true cell organelles such as classical primary plastids (Bhattacharya and Archibald 2006; Yoon et al. 2006; Bodył et al. 2007; Mackiewicz and Bodył 2010) (Fig. 1).

The best candidates for genes with protein products that are imported into Paulinella endosymbionts/plastids are those transferred from the endosymbiont genome to the host nuclear genome through a process called endosymbiotic gene transfer (EGT) (Timmis et al. 2004; Bock and Timmis 2008) (Fig. 1). Recent Paulinella genome and transcriptome analyses have identified more than 30 nuclear-encoded genes acquired via EGT (Nakayama and Ishida 2009; Reyes-Prieto et al. 2010; Nowack et al. 2011). The actual number of EGT-derived genes is likely much greater, perhaps between 40 and 125 genes as estimated by Nowack et al. (2011), although this still is much lower than the ~1700–2500 genes transferred from the genomes of classical primary plastids to their hosts’ nuclear genomes (for reviews see Bock and Timmis 2008; Kleine et al. 2009). Many of the Paulinella transferred genes are engaged in photosynthesis or photo-acclimation of thylakoid membranes and are transcriptionally regulated by the host cell.

Possible import routes of proteins into Paulinella endosymbionts

Most proteins imported into classical primary plastids carry N-terminal targeting signals known as plastid transit peptides (Bruce 2000, 2001; Lee et al. 2008). These peptides are sufficient for their translocation across the plastid envelope with the help of (i) the translocon at the outer chloroplast membrane (Toc) and (ii) the translocon at the inner chloroplast membrane (Tic) (for reviews see Inaba and Schnell 2008; Jarvis 2008; Agne and Kessler 2009; Benz et al. 2009). Each of these translocons consists of several specialized protein subunits. Toc involves three kinds of such subunits: (i) Toc34, Toc64, and Toc159 function as transit peptide receptors, (ii) Toc75 forms a protein-conducting channel, and (iii) Toc12 is responsible for delivering imported proteins to the Tic translocon. The Tic translocon is composed of (i) Tic20, Tic21, and Tic110 that probably constitute three independent protein-conducting channels, (ii) Tic32, Tic55, and Tic62 that form a redox regulon, (iii) Tic22 that is responsible for the coordination of the Toc and Tic translocons and/or intermembrane space protein targeting, and (iv) the Tic40 co-chaperone that, along with the scaffold-channel Tic110 subunit and the stroma-residing Hsp93 and Hsp70 chaperones, provides a motor machinery to pull imported proteins into the stroma.

Interestingly, searches of the Paulinella endosymbiont genome from the CCAC 0185 strain have identified homologs of plant toc12, tic32, and tic21 genes (Bodył et al. 2010). By analogy to classical primary plastids, the Paulinella Tic21 homolog may form a protein-conducting channel in the inner plastid membrane, while the Tic32 protein could be engaged in the regulation of the import process via redox sensing (Bodył et al. 2010). The Paulinella Toc12 homolog resembles its plant counterpart by the presence of DnaJ and transmembrane domains, but differs in secondary structure and in the location of its transmembrane domain (Bodył et al. 2010). Since the DnaJ domain is also typical of Hsp40 proteins, it is possible that Paulinella Toc12, anchored in the inner endosymbiont membrane as a result of its hydrophobic domain, operates as one of the subunits of the molecular protein import motor (Fig. 2).

Fig. 2

Hypothetical import pathways of nuclear-encoded proteins into the cyanobacterial endosymbionts of Paulinella chromatophora. One model postulates that these proteins carry an N-terminal signal peptide (SP) or signal anchor (SA) and are targeted to the endosymbionts in vesicles derived from the endoplasmic reticulum (ER) (left panel). However, by analogy with classical primary plastids and mitochondria, we cannot exclude the possibility that some proteins are equipped with a transit peptide (TP) or an internal targeting signal (ITS), which results in their translocation via some outer membrane channel (OMC) such as Omp85/Toc75, Tom40, or Tim22/OEP16 (right panel). Proteins released into the intermembrane space could migrate through the peptidoglycan wall freely, or with the assistance of molecular chaperones homologous to the higher plant Hsp70 (DnaK) as well as the bacterial DegP, FkpA, and PpiA, which are encoded in the endosymbiont’s genome (Bodył et al. 2010). Protein translocation across the inner endosymbiont membrane could be mediated by a Tic-like translocon characteristic of classical primary plastids because the Paulinella endosymbiont genome encodes significant homologs to several Tic proteins (Bodył et al. 2010). The homologs of Hsp93, Hsp70, and Hsp40 could provide a pulling force to import proteins into the endosymbiont matrix

Although only three Toc and Tic homologs were found in the Paulinella endosymbiont genome, this does not exclude the possibility that other toc and tic genes were transferred to the host nuclear genome and their encoded proteins now are imported into the endosymbiont to create a Toc-Tic-like protein import apparatus. For example, the Paulinella Tic21, Tic32, and Toc12 homologs could form a translocon in the inner envelope membrane, together with other subunits (e.g., the endosymbiont-encoded Hsp93 and probably the nuclear-encoded Tic20, Tic55, and Tic62), that is very similar to the Tic system of classical primary plastids (Bodył et al. 2010). How proteins could pass across the outer membrane of Paulinella endosymbionts is less clear, however, because no homolog to the Toc75 pore (bacterial Omp85) is found in the endosymbiont’s genome (Bodył et al. 2010). At present, we cannot exclude the possibility that a gene encoding this protein was transferred to the host nuclear genome but has yet to be identified. Nevertheless, alternative ways for protein translocation across the outer membrane of Paulinella endosymbionts/plastids should be considered (Fig. 2).

As mentioned previously, the majority of proteins imported into classical primary plastids carry plastid transit peptides responsible for their Toc-Tic-dependent import, but some are equipped with typical endomembrane signal peptides (for reviews see Bhattacharya et al. 2007; Jarvis 2008; Bodył et al. 2009). It has been demonstrated experimentally that these proteins, specifically α-carbonic anhydrase, nucleotide pyrophosphatase/phosphodiesterase, and α-amylases (represented by αAmy3 and αAmy7), are trafficked to the plastids via the endomembrane system involving either the endoplasmic reticulum (ER) alone or in concert with the Golgi apparatus (Chen et al. 2004; Villarejo et al. 2005; Nanjo et al. 2006; Kitajima et al. 2009). Moreover, Armbruster et al. (2009) estimated that as many as 73 plastid-targeted proteins carry such signal peptides in Arabidopsis thaliana, constituting 5% of the plastid proteome. Therefore, it is reasonable that in P. chromatophora nuclear-encoded, endosymbiont-targeted proteins also could be delivered to the outer endosymbiont membrane in vesicles derived from the host’s endomembrane system. In support of this hypothesis, bioinformatics analyses of the sequence from Paulinella photosynthetic psaE gene, which was transferred to the host nuclear genome in the FK01 strain (Nakayama and Ishida 2009), revealed an upstream sequence with qualities of a typical signal peptide (Mackiewicz and Bodył 2010). Applied programs also showed the gene to encode an unambiguous cleavage site for this hypothesized peptide. These findings imply that the PsaE protein is first translocated into the ER lumen where it is processed, and then it is most likely targeted to the endosymbiont’s outer membrane in vesicles derived from the endomembrane system.

The above studies suggest that protein targeting to Paulinella endosymbionts/plastids proceeds via the endomembrane system; however, other bioinformatics analyses of nine EGT-derived genes from the Paulinella CCAC 0185 strain did not yield evidence for a universal targeting signal (Nowack et al. 2011). Only the products of two transferred genes showed some signal peptide predictions and in one case, PsaE, a weakly supported mitochondrial transit peptide was suggested. Because these analyses provided somewhat ambiguous results, we reanalyzed these sequences by considering more polypeptide variants based on potential translation initiation sites and applying additional bioinformatics tools that predict different targeting signals. We also performed statistical analyses of the basic properties of these hypothesized proteins, including molecular weights, charges, and amino acid compositions, and investigated the origins of their potential targeting signals.

Materials and methods

The sequence of psaE from the P. chromatophora FK01 strain was kindly supplied by Dr. Takuro Nakayama and Dr. Ken-ichiro Ishida (Nakayama and Ishida 2009), whereas other EGT candidates with sequenced 5′ ends from the P. chromatophora CCAC 0185 strain were obtained from Nowack et al. (2011). The set of 867 amino acid sequences encoded in the Paulinella endosymbiont genome was downloaded from GenBank (http://www.ncbi.nlm.nih.gov), and 1,762 sequences of proteins imported into classical primary plastids with annotated transit peptides were extracted from the UniProt database (http://www.uniprot.org).

We applied 25 bioinformatics tools using default settings and all 35 eukaryotic models predicting potential N-terminal targeting signals, including (i) signal peptide, (ii) signal anchor, (iii) plastid transit peptide, and (iv) mitochondrial transit peptide (Table 1). Twelve of these distinguish three types of the targeting signals (i.e., signal peptide, plastid transit peptide, and mitochondrial transit peptide), while 18 algorithms were used to recognize signal peptides/signal anchors, two for plastid transit peptides, and three for mitochondrial transit peptides. To predict transmembrane α-helices in cyanobacterial homologs of Paulinella EGT-derived proteins, we used ConPred II with a selected prokaryote model (Arai et al. 2004).

Table 1 Programs applied in this study that predict different kinds of N-terminal targeting signals including: signal peptide or signal anchor (SP/SA), plastid transit peptide (pTP), and mitochondrial transit peptide (mTP)

Alignments of Paulinella sequences with their 10 top BLAST cyanobacterial homologs found in the GenBank database (http://www.ncbi.nlm.nih.gov) were obtained in M-Coffee (Moretti et al. 2007) and prepared in Jalview (Waterhouse et al. 2009). Hydropathy plots of sequences were made using the Kyte–Doolittle scale (Kyte and Doolittle 1982) and a sliding window length of 11 residues. Molecular weights and charges of proteins were calculated using pepstats from the EMBOSS 3.0.0 package (Rice et al. 2000). The non-parametric Mann–Whitney U test implemented in Statistica software (StatSoft, Inc., 2006) was used to estimate the statistical significance of differences in molecular weight, charge, and amino acid composition of the analyzed protein sets.
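The sliding-window hydropathy calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in this study: the function name `hydropathy_profile` is hypothetical, the input is assumed to be an ungapped one-letter protein sequence, and how gapped alignment positions were handled in the published plots is not specified here.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle 1982).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=11):
    """Return the mean Kyte-Doolittle score of every full window.

    The result has len(seq) - window + 1 values; value i is the mean
    over residues i .. i + window - 1. Non-standard residues raise
    KeyError (an assumption of this sketch).
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD_SCALE[aa] for aa in seq]
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile
```

Higher values indicate more hydrophobic stretches, so a signal peptide-like or transmembrane region would appear as a sustained positive peak near the N-terminus.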

Numbers of amino acid substitutions per site between Paulinella CsoS4A and its 14 closest cyanobacterial homologs were estimated in TREEFINDER (Jobb et al. 2004) as maximum likelihood distances calculated under the best-fit model Dayhoff + Γ(5), whereas in the case of the Paulinella homolog to Synechococcus WH5701_13905, and its ten closest cyanobacterial sequences, the JTT + Γ(5) model was used. The relative numbers of non-synonymous (dN) and synonymous (dS) substitutions for these sequences were calculated according to the modified Nei–Gojobori method, assuming p-distance (Nei and Gojobori 1986) as implemented in MEGA 5.03 (Tamura et al. 2011). Protein domain searches were performed in NCBI CDD database (Marchler-Bauer et al. 2011).

Results

Prediction of targeting signals in Paulinella EGT-derived proteins

We analyzed upstream regions in nine newly identified EGT candidates with determined 5′-ends from the Paulinella CCAC 0185 strain (Nowack et al. 2011). All these proteins are listed in Table 2. The psaE gene from Paulinella strain FK01 (Nakayama and Ishida 2009) also was included for comparison. Following other recent approaches (Mackiewicz and Bodył 2010; Nowack et al. 2011), we considered pre-sequences starting at all possible eukaryotic translation initiation sites: AUG, UUG, CUG, GUG, AUA, AUC, AUU, and ACG. When a translated sequence did not contain a stop codon in-frame in the upstream region of the mature protein, the longest possible peptide also was studied. A mature protein sequence was included in this analysis as a control. To predict potential N-terminal targeting signals, we used eukaryotic models that distinguish different N-terminal targeting signals, and algorithms specialized for prediction of signal peptides/signal anchors, plastid transit peptides, and mitochondrial transit peptides (Table 1).
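The enumeration of candidate pre-sequences can be illustrated with a short sketch. It assumes a transcript string and a known 0-based position of the first codon of the mature protein; the function name `upstream_start_sites` and its return format are hypothetical and do not come from the tools listed in Table 1.

```python
# Candidate initiation codons considered in the text, in RNA alphabet.
START_CODONS = {"AUG", "UUG", "CUG", "GUG", "AUA", "AUC", "AUU", "ACG"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def upstream_start_sites(transcript, mature_start):
    """Scan upstream of the mature protein, in frame, for start codons.

    Returns (sites, longest_start):
      sites         sorted 0-based positions of in-frame candidate start
                    codons located after the nearest upstream stop codon
      longest_start position of the first in-frame codon when no upstream
                    stop codon exists (the "longest possible peptide"),
                    otherwise None
    """
    rna = transcript.upper().replace("T", "U")  # accept DNA or RNA input
    if mature_start < 0 or mature_start > len(rna):
        raise ValueError("mature_start lies outside the transcript")
    sites = []
    pos = mature_start - 3
    while pos >= 0:
        codon = rna[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # translation cannot start upstream of this stop
        if codon in START_CODONS:
            sites.append(pos)
        pos -= 3
    # pos < 0 means the scan ran off the 5' end without meeting a stop.
    longest_start = mature_start % 3 if pos < 0 else None
    return sorted(sites), longest_start
```

Each returned position defines one pre-sequence variant (from that codon up to the mature protein) that could then be submitted to the prediction programs.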

Table 2 Number of algorithms that predict a given targeting signal for the pre-sequences of Paulinella nuclear-encoded endosymbiont-targeted proteins considering all possible translation initiation sites (TIS)

Results of these analyses are presented in Table 2. As previously shown (Mackiewicz and Bodył 2010), 90% of algorithms predicted a signal peptide for the longest polypeptide that could be translated from a psaE open reading frame encoded in the nuclear genome of Paulinella FK01. In the case of the CCAC 0185 strain, a signal peptide also was predicted confidently, by 73% of algorithms for Paulinella PsbN translation initiation site variants and by 53% for a Paulinella homolog of Synechococcus WH5701_13415; however, the mature polypeptides of the two Paulinella PsaK proteins reached only 40 and 33% predictability for signal peptides. For all these proteins with potential signal peptides, except for PsaE from the FK01 strain, there were only ambiguously predicted hypothetical cleavage sites, and these were scattered within regions of their mature protein sequences. These results suggest that the N-terminal part of these proteins functions as a signal peptide enabling their co-translational translocation into the ER lumen, but that this region is not processed, which resembles the signal anchor (see the next section).

Interestingly, in contrast to the PsaE from the Paulinella FK01 strain, the PsaE encoded in the nuclear genome of the Paulinella CCAC 0185 strain shows no traits of a signal peptide. Its longest hypothetical polypeptide was predicted to have a mitochondrial transit peptide by 60% of the algorithms employed. We also note that the remaining proteins analyzed from the Paulinella CCAC 0185 strain did not show significant predictions of any targeting signals (Table 2).

Origin of potential targeting signals in Paulinella EGT-derived proteins

To reconstruct how the potential N-terminal targeting signals could have evolved, we compared sequences of Paulinella EGT-derived proteins with their ten most similar cyanobacterial homologs from BLAST searches (Fig. 3). The alignment of PsaE sequences shows that annotated cyanobacterial homologs wholly align only with the mature part of two PsaE proteins (Fig. 3a). To test whether Paulinella PsaE extensions likely evolved from sequences upstream of their cyanobacterial homologs, we compared these cyanobacterial regions with the Paulinella pre-sequences (data not shown). The respective sequences did not align with each other. Moreover, sensitive TBLASTN searches (with word size = 2 and other parameters adjusted for short sequences) using Paulinella PsaE pre-sequences as a query did not reveal any significant hits in cyanobacterial genomes. This indicates that the sequences encoding the potential signal peptide in the PsaE from FK01 strain, and the mitochondrial transit peptide-like signal of the homologous protein in the CCAC 0185 strain, were probably added after independent EGT events in their host nuclear genomes. The PsaE protein functions as subunit IV of the photosystem I (PSI) reaction center and is located on the stromal side of the thylakoid membrane (Barth et al. 1998; Klukas et al. 1999; Jordan et al. 2001; Jeanjean et al. 2008). Plant homologs of Paulinella PsaE are also equipped with N-terminal targeting signals but in these cases they represent typical plastid transit peptides responsible for import mediated by the Toc-Tic supercomplex.

Fig. 3

Alignments of ten top BLAST cyanobacterial homologs with the following Paulinella sequences: a two PsaE proteins from FK01 (Pau_FK01) and CCAC0185 (Pau_CCAC0185) strains, b the homolog to Synechococcus WH5701_13415 protein (Pau_13415), c PsbN protein (Pau_PsbN), and d two PsaK proteins (Pau_PsaK_1, Pau_PsaK_2). Alignments were prepared in Jalview (Waterhouse et al. 2009) assuming Clustal X color scheme (see online version). Abbreviations: Cya, Cyanobium sp.; Cyt, Cyanothece sp.; Nod, Nodularia spumigena; Nos, Nostoc sp.; Sye, Synechococcus elongatus; Syn, Synechococcus sp.; Pro, Prochlorococcus marinus; The, Thermosynechococcus elongatus. Rectangles under the alignments show transmembrane domains predicted in at least 50% sites of cyanobacterial homologs by ConPred II (Arai et al. 2004)

Interestingly, the Paulinella homolog of Synechococcus WH5701_13415 shows quite a good signal peptide prediction but, unlike its cyanobacterial homologs, does not have any significant N-terminal extension (Fig. 3b). We suggest that, in this case, the N-terminus of the Paulinella protein was adapted to play the role of a signal peptide. In support of this hypothesis, the Paulinella sequence has lost some positively charged and polar residues, which are conserved in cyanobacterial homologs. Consequently, the N-terminal region of the Paulinella sequence has a more hydrophobic character, as is clearly visible in hydropathy profiles (Fig. 4a).

Fig. 4

Hydropathy profiles of Paulinella EGT-derived proteins and their closest cyanobacterial homologs: a homolog to Synechococcus WH5701_13415 protein (Pau_13415), b PsbN protein (Pau_PsbN), and c PsaK protein (Pau_PsaK_1). The plots were made assuming Kyte–Doolittle scale (Kyte and Doolittle 1982) and the sliding window length of 11 residues. Rectangles over the profiles show transmembrane domains predicted in at least 50% of sites in cyanobacterial homologs by ConPred II (Arai et al. 2004). The x-axis corresponds to alignment positions. For the abbreviations of sequence names see legend to Fig. 3

Similarly, the Paulinella PsbN variant that shows high signal peptide predictability is similar in length to its cyanobacterial homologs and is not equipped with a substantial N-terminal extension (Fig. 3c). The function of the PsbN protein is still unknown. In photosynthetic eukaryotes, PsbN is encoded in plastid genomes. Some data suggest that this protein represents a component of photosystem II (PSII) (Ikeuchi et al. 1995; Zouni et al. 2001) but this has not been confirmed in other studies (Kashino et al. 2002a, b). Nevertheless, it cannot be ruled out that this protein can sometimes bind transiently to PSII (Plöscher et al. 2009). The presence of reliably predicted transmembrane α-helices in the N-termini of PsbN sequences (Fig. 3c) suggests that this protein is anchored in the thylakoid membrane. Because both signal peptides and transmembrane domains are enriched in hydrophobic residues, it is reasonable to suppose that the N-terminal region of Paulinella PsbN mimics a signal peptide and can be recognized by the SRP (signal recognition particle) during its targeting to the ER membrane. The SRP would enable co-translational translocation of this protein into the ER lumen and its subsequent transport in ER- or Golgi-derived vesicles to the Paulinella endosymbiont/plastid. However, in contrast to typical signal peptides, the N-terminal region of Paulinella PsbN most likely is not removed and is required to anchor this protein in its subcellular target, the endosymbiont’s thylakoid membrane. Indeed, algorithms that predict cleavage sites of signal peptides gave ambiguous results and located them within the mature region of the protein sequence, which suggests that the potential signal peptide is not processed. Furthermore, in contrast to most cyanobacterial homologs, Paulinella PsbN lacks several negatively charged, polar, and hydroxylated residues at its N-terminus. Consequently, it has a longer hydrophobic region than cyanobacterial proteins, which extends toward the N-terminus of the sequence (Fig. 4b).

We hypothesize that an adaptation of the transmembrane domain toward signal peptide function occurred in the two Paulinella PsaK proteins as well. PsaK is subunit X of photosystem I (PSI) (Jone et al. 1991; Jordan et al. 2001) and has two transmembrane α-helices responsible for its insertion into the thylakoid membrane (Kjaerulff et al. 1993; Mant et al. 2001; Düring et al. 2007). In higher plants, this protein is equipped with a typical transit peptide responsible for its import via the Toc-Tic supercomplex (Kjaerulff et al. 1993). Interestingly, the two Paulinella PsaK proteins, which are devoid of N-terminal extensions, are further shortened at their N-termini by at least seven amino acid residues compared with their cyanobacterial homologs (Fig. 3d). In some cyanobacterial PsaK sequences this region shows a weaker hydrophobic character than the more proximal main transmembrane domain (Fig. 4c). Consequently, the N-terminal transmembrane α-helix is located closer to the beginning of the sequence in Paulinella PsaK proteins, which could enable the evolution of signal peptide-like domains at their N-termini. Moreover, the N-terminal ends of Paulinella PsaK sequences do not contain positively charged residues that are conserved in cyanobacteria and are poorer in hydroxylated residues than their cyanobacterial homologs.

Molecular weights and charges of Paulinella EGT-derived proteins

It is notable that the majority of Paulinella EGT-derived genes identified to date encode small proteins (Nowack et al. 2011). Moreover, Mackiewicz and Bodył (2010) found that PsaE from Paulinella strain FK01 has the same number of positively and negatively charged residues. They proposed this as an adaptation to the passage of this protein through the negatively charged peptidoglycan wall located in the intermembrane space of the endosymbiont’s envelope (Kies and Kremer 1979). To check how generally representative these properties are for Paulinella EGT-derived proteins (i.e., nuclear-encoded, endosymbiont-targeted proteins), we compared their calculated molecular weights and charges against all proteins encoded in the Paulinella endosymbiont genomes, as well as proteins imported into classical primary plastids by means of transit peptides.

Results of these comparisons are summarized in Table 3. Paulinella EGT-derived proteins have, on average, four times lower molecular weight than Paulinella endosymbiont genome-encoded proteins and five times lower molecular weight than proteins targeted to classical primary plastids. Moreover, there are no plastid proteins smaller than the largest Paulinella EGT-derived protein and only 6% of Paulinella endosymbiont genome-encoded proteins fulfil this criterion. In addition, Paulinella EGT-derived proteins have more closely balanced numbers of positively and negatively charged residues than do proteins from the other two sets. All these differences in molecular weight and charge are statistically significant at P < 0.001 when analyzed by the Mann–Whitney U test.

Table 3 Average and minimal-maximal range of molecular weight and absolute value of charge for three sets of plastid proteins
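For readers who wish to reproduce this kind of two-sample comparison, the Mann–Whitney U statistic can be sketched in plain Python. The function below is an illustrative assumption, not the Statistica routine used in this study: the name `mann_whitney_u` is hypothetical, and the two-sided P value relies on a tie-corrected normal approximation without continuity correction, which is only approximate for a set as small as the ten EGT-derived proteins.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, approximate two-sided P value)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the observations, remembering which sample each came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda t: t[0])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for tied values
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j)
                                     if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all observations identical
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided
```

Applied to two lists of molecular weights (or absolute charges), a small P value would indicate that values in one set tend to be systematically lower than in the other.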

The low molecular weights and almost neutral charges of Paulinella EGT-derived proteins fit well with the properties of proteins that can pass freely through the peptidoglycan wall. In agreement with this hypothesis, Demchick and Koch (1996) demonstrated that globular, uncharged proteins up to 24 kDa in molecular weight pass freely through the isolated unstretched peptidoglycan sacculi of Escherichia coli (Gram-negative bacterium) and Bacillus subtilis (Gram-positive bacterium). Interestingly, the molecular weights of all variants of the Paulinella EGT-derived proteins analyzed are well below that threshold (Table 3), whereas only 38% of Paulinella endosymbiont genome-encoded proteins and 22% of plastid-targeted proteins have weights under 24 kDa.
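The two properties discussed here can be estimated directly from a protein sequence. The sketch below is a simplified stand-in and is not guaranteed to reproduce pepstats output: molecular weight is taken as the sum of average residue masses plus one water molecule, net charge is approximated as (Lys + Arg) minus (Asp + Glu) with histidine ignored, and the function names are hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528
SACCULUS_LIMIT_DA = 24000.0  # 24 kDa limit reported by Demchick and Koch (1996)


def molecular_weight(seq):
    """Approximate molecular weight (Da) of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER_MASS


def net_charge(seq):
    """Simplified net charge: basic (K, R) minus acidic (D, E) residues."""
    s = seq.upper()
    return s.count("K") + s.count("R") - s.count("D") - s.count("E")


def below_sacculus_limit(seq, limit_da=SACCULUS_LIMIT_DA):
    """True if the estimated weight is under the peptidoglycan size limit."""
    return molecular_weight(seq) < limit_da
```

Running `below_sacculus_limit` over each sequence in a protein set gives the fraction of proteins under 24 kDa, which is the kind of proportion compared between the three sets above.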

Amino acid composition of Paulinella EGT-derived proteins

Because Paulinella EGT-derived proteins showed distinctive features in their molecular weights and charges, we also examined them for any peculiar characteristics in their amino acid compositions compared with their ancestral set, that is, proteins encoded in the Paulinella endosymbiont genome. We found that sequences from these two sets differ significantly in the proportions of 11 amino acids when analyzed with the Mann–Whitney U test (Table 4). Compared to those still encoded in the endosymbiont genome, Paulinella nuclear-encoded, endosymbiont-targeted proteins are richer in glycine, valine, methionine, and threonine, but poorer in arginine, serine, isoleucine, cysteine, leucine, glutamine, and histidine.

Table 4 Average and quartile (Q1–Q2) range of amino acid percentages for Paulinella endosymbiont- and nuclear-encoded proteins
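The per-protein amino acid percentages that underlie such a comparison can be computed as in the following sketch. It is an illustration only: the function name `composition_percent` is hypothetical, and non-standard symbols (for example X or gap characters) are assumed to be excluded from the total, a choice that may differ from the procedure actually used.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_percent(seq):
    """Return the percentage of each standard amino acid in a sequence."""
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}
```

Collecting, say, the glycine percentage for every protein in each set yields two lists of values that can then be compared with a rank-based test.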

Discussion

Targeting signals of Paulinella EGT-derived proteins

Our analyses show that signal peptide-like sequences are the most commonly predicted N-terminal targeting signals in the Paulinella EGT-derived proteins studied, and were identified in five of the ten proteins analyzed (Table 2). This suggests that the signal peptide-carrying proteins are targeted to Paulinella endosymbionts/plastids via the host endomembrane system (Fig. 2) (see also Mackiewicz and Bodył 2010). The results obtained are rather unexpected because, by analogy to classical primary plastids, we might expect proteins imported into Paulinella endosymbionts/plastids to use N-terminal targeting signals resembling plastid transit peptides (Bruce 2000, 2001; Lee et al. 2008). Interestingly, a comparison of Paulinella EGT-derived proteins with their closest cyanobacterial homologs shows that the signal peptide-like sequences have different origins (Fig. 3). The signal peptide of PsaE protein from FK01 strain probably represents a typical cleavable signal peptide (see also Mackiewicz and Bodył 2010) that was added after transfer of this gene to the host nuclear genome. The Paulinella homolog of Synechococcus WH5701_13415 does not have such an extension, but its existing N-terminal sequence has acquired new properties of signal peptides and could play the same role (Table 2; Fig. 4). The N-terminal ends of PsbN and two PsaK proteins contain transmembrane domains that also show features of signal peptides and also could fulfil this function.

Some similarity of N-terminal transmembrane domains to signal peptides has been observed in plastid and mitochondrial outer membrane proteins (Kanaji et al. 2000; Lee et al. 2001, 2004; Horie et al. 2003; Waizenegger et al. 2003; Hofmann and Theg 2005). It was shown that several charged residues adjacent to the transmembrane domain play a crucial role in distinguishing these proteins from those directed to the endomembrane system. For example, experimental replacement of such residues with uncharged glycine, or their complete deletion, caused mistargeting of plastid AtOEP7 and AtToc64 proteins to the endoplasmic reticulum or the plasma membrane (Lee et al. 2001, 2004). Similar mutations that increase the hydrophobicity of the transmembrane domains and decrease the net positive charge within the flanking regions of mitochondrial outer membrane proteins Tom5 and Tom20 (Kanaji et al. 2000; Horie et al. 2003; Waizenegger et al. 2003) result in mistargeting to the endomembrane compartments. Interestingly, we observed similar changes and substitutions in the Paulinella PsbN and PsaK sequences compared with their cyanobacterial homologs (Fig. 3). This strongly suggests that these changes in the Paulinella transmembrane proteins represent the acquisition of signal peptide properties in their N-terminal transmembrane domains.

In contrast to the PsaE protein from the Paulinella FK01 strain, its counterpart from the CCAC 0185 strain has a putative mitochondrial transit peptide (Table 2), which implies post-translational import, perhaps involving a mitochondrial protein-conducting channel that was relocated to the outer endosymbiont membrane (Fig. 2). Good candidates for such a translocation channel are Tom40 and Tim22. In support of this hypothesis, the outer membrane of higher plant plastids contains the OEP16 channel for protochlorophyllide oxidoreductase A (Reinbothe et al. 2004; Pollmann et al. 2007), which probably evolved from a relocated mitochondrial Tim22 pore (Cavalier-Smith 2006). A third mitochondrial candidate could be the homolog of Omp85 (and, therefore, Toc75), but this gene has been identified to date only in trypanosomatid parasites (Pusnik et al. 2011). At present, we cannot exclude the possibility that the mRNA sequence of psaE obtained from the CCAC 0185 strain is incomplete and could yet be found to contain upstream sequence encoding a signal peptide, because the published sequence is not limited by an in-frame stop codon upstream of the mature protein. Moreover, the available N-terminal extensions of the PsaE proteins from the two Paulinella strains differ significantly (Fig. 3). This suggests that the PsaE proteins from each of these strains evolved distinct targeting signals when their genes were independently transferred to the hosts’ nuclear genomes. This possibility is supported by substantial differences between these two Paulinella psaE genes, including different intron positions, intron sequences, and 5′ and 3′ untranslated regions (Nowack et al. 2011).

The two-membrane envelope of Paulinella endosymbionts/plastids and the endomembrane system-mediated targeting of their nuclear-encoded proteins

Paulinella endosymbionts/plastids are surrounded by two membranes. Their inner membrane is certainly derived from the cyanobacterial plasmalemma, but the origin of the outer membrane is less clear (Bodył et al. 2010; Mackiewicz and Bodył 2010). Because the cyanobacterial ancestor of Paulinella endosymbionts/plastids was surrounded by two membranes, it could be hypothesized that their outer membrane corresponds directly to the outer negibacterial membrane. Alternatively, the cyanobacterial outer membrane could have been lost and replaced entirely by the host phagosomal membrane. It is also important to consider the possibility that the outer membrane of Paulinella endosymbionts/plastids has a chimeric origin and contains components of both bacterial and eukaryotic membranes. The cyanobacterium initially engulfed by P. chromatophora was undoubtedly surrounded by three membranes: the host phagosomal membrane and the two envelope membranes of the endosymbiont, i.e., its plasma membrane and outer membrane (Bodył et al. 2010; Mackiewicz and Bodył 2010). In the initial stages of the endosymbiosis, it is reasonable to assume that uncoordinated divisions of the cyanobacterium and of the phagosome containing it resulted in regular escapes of these endosymbionts into the host cytosol. During these escapes the outer cyanobacterial membrane could have acquired some lipids and proteins from the phagosomal membrane, a kind of membrane mutation as termed by Cavalier-Smith (2000). This would have led to a chimeric bacterial–eukaryotic membrane (see also Bodył et al. 2009). The existence of clear signal peptides in Paulinella nuclear-encoded, endosymbiont-targeted proteins is compatible with this scenario.

The above evolutionary scenario is consistent with the process that led to classical primary plastids. As with Paulinella endosymbionts/plastids, they have a cyanobacterial origin and are surrounded by two membranes (Cavalier-Smith 2000; Palmer 2003; Gould et al. 2008; Archibald 2009). It was argued for many years that the outer plastid membrane was derived directly from the cyanobacterial outer membrane (Cavalier-Smith 2000); however, it contains lipids characteristic of negibacterial outer membranes (e.g., galactolipids), as well as those found in eukaryotic phagosomal membranes (e.g., phosphatidylcholine) (see Kilian and Kroth 2003). This discovery inspired the hypothesis that the outer plastid membrane has a chimeric bacterial–eukaryotic origin (Kilian and Kroth 2003; Bodył et al. 2009). The identification of many nuclear-encoded, plastid-targeted proteins with signal peptides in higher plants and green algae (Bhattacharya et al. 2007; Jarvis 2008; Armbruster et al. 2009; Bodył et al. 2009) provides additional strong support for this hypothesis.

Possible import routes of Paulinella proteins without N-terminal targeting signals

In contrast to the proteins discussed above, four Paulinella EGT-derived proteins show no evidence of any kind of N-terminal targeting signals (Table 2). It is possible that the sequences of Paulinella homologs to CsoS4A and Synechococcus WH5701_06721 are extended upstream and, therefore, could encode N-terminal targeting signals; none of their upstream sequences are constrained by in-frame stop codons. In contrast, the homologs to Hli and Synechococcus WH5701_13905 are most likely complete sequences.
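The completeness criterion used above, namely whether the region upstream of the putative start codon is bounded by an in-frame stop codon, can be sketched as a simple scan. This is an illustrative sketch only: the function name is hypothetical, the standard stop codons TAA, TAG, and TGA are assumed, and the short example transcripts are invented rather than taken from Paulinella.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed


def upstream_inframe_stop(mrna, start):
    """Return the 0-based index of the nearest in-frame stop codon upstream
    of the putative start codon, or None if the upstream region is open.

    mrna  : transcript sequence (DNA or RNA alphabet)
    start : 0-based index of the first base of the putative start codon
    """
    seq = mrna.upper().replace("U", "T")
    pos = start - 3
    # Step backwards codon by codon, staying in the frame of the start codon.
    while pos >= 0:
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos
        pos -= 3
    return None


# Invented examples; in each, the putative start codon ATG is located first.
bounded = "GCTTAAGCCATGAAA"      # TAA lies in frame upstream of ATG
open_upstream = "GCTGCAGCCATGAAA"  # no upstream stop in frame
out_of_frame = "GTAAGGCCATGAAA"   # TAA present, but not in frame

for name, seq in [("bounded", bounded),
                  ("open", open_upstream),
                  ("out of frame", out_of_frame)]:
    print(name, upstream_inframe_stop(seq, seq.index("ATG")))
```

A result of None means only that the available sequence does not exclude an upstream extension; it does not show that such an extension exists.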

The Paulinella Hli (high-light inducible protein), like its homologs, possesses a predicted α-helical transmembrane domain and probably is anchored in the thylakoid membrane (Funk and Vermaas 1999; Montané and Kloppstech 2000; He et al. 2001; Bhaya et al. 2002). The other three proteins do not have such recognizable transmembrane domains and, therefore, could reside in the intermembrane space or in the matrix. This indicates that the four proteins must pass one or two envelope membranes during their import into Paulinella endosymbionts/plastids. In such trafficking, they could be using some targeting signals that escaped detection by the algorithms used in this study, which are specialized for predicting classical N-terminal signals.

One such non-classical targeting signal is represented by the C-terminal cleavable region composed of positively charged residues, which was identified in the mitochondrial matrix-residing DNA helicase Hmi1 (Lee et al. 1999). Targeting signals that are not cleaved also were identified in proteins targeted to the inner membrane of classical primary plastids, including ceQORH (Miras et al. 2002, 2007) and Tic32 (Nada and Soll 2004). Their targeting signals appear to enable import into the plastid stroma, from where they can be inserted into the inner membrane. The plastid-targeting signal in Tic32 probably is located in its first ten N-terminal residues, whereas in ceQORH it is encoded in an internal domain of 40 residues that is essential but not sufficient for correct plastid localization because it must act in concert with two adjacent domains required for import (Miras et al. 2007). It was demonstrated that import of ceQORH and Tic32 is mediated by a Toc-independent pathway because their translocation involves neither the Toc159 receptor nor the Toc75 channel (Miras et al. 2007). Interestingly, the gene for Toc75 was not identified in the Paulinella endosymbiont genome (Bodył et al. 2010) and is probably also absent from the host nuclear genome (Nowack et al. 2011). The lack of evidence for the presence of Toc75 and other Toc proteins, as well as the absence of recognizable plastid transit peptides in P. chromatophora, suggests that endosymbiont-directed proteins use an import route similar to the alternative pathways used by transit peptide-devoid proteins imported into higher plant plastids. Proteins with non-canonical signals targeted to classical primary plastids are likely to be more common than previously thought because proteomic studies of the A. thaliana plastid proteome revealed 142 (of 604) proteins without N-terminal cleavable targeting pre-sequences (Kleffmann et al. 2004).

One very interesting analogy to Paulinella EGT-derived proteins is provided by proteins located in the mitochondrial intermembrane space (IMS). Like the Paulinella proteins, they are characterized by low molecular weight (7–15 kDa) and the absence of N-terminal transit peptides (Lutz et al. 2003; Herrmann and Hell 2005; Neupert and Herrmann 2007). Many contain conserved patterns of cysteine (and histidine) residues that enable them to bind cofactors or form disulfide bridges. Their translocation through the outer membrane translocon (Tom) requires that they fold in the IMS, which is triggered by the acquisition of cofactors or by the formation of intramolecular disulfide bridges. However, the majority of Paulinella EGT-derived protein variants studied do not have any cysteine residue, and eight have only one such residue, which is insufficient to form an intramolecular disulfide bridge. These sequences also are histidine poor; ten have only one His residue, and each of the two translation initiation site variants of the PsaK 2 copy has three histidines. The other class of IMS mitochondrial proteins that lacks classical transit peptides requires binding to affinity sites for translocation (Herrmann and Hell 2005; Neupert and Herrmann 2007). The targeting signal identified in a representative of this class, heme lyase, consists of a complex pattern of hydrophilic residues (Diekert et al. 1999). Concentrations of such residues can be found in Paulinella EGT candidates lacking N-terminal signals, but their importance for targeting should be verified experimentally. Similarly, some targeting information could be carried by several amino acids that are used more frequently in the EGT-derived proteins than in proteins encoded in the Paulinella endosymbiont genome (Table 4). They could, for example, facilitate protein import through some unidentified outer membrane channels (Fig. 2) or be involved in still unknown trafficking mechanisms.
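The residue counts that underlie the comparison with IMS proteins are simple to reproduce for any sequence. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the example sequences are invented, and the only rule encoded is the one given above, that at least two cysteines are needed for an intramolecular disulfide bridge.

```python
def cys_his_profile(seq):
    """Count Cys and His residues and flag whether an intramolecular
    disulfide bridge is possible at all (at least two cysteines)."""
    seq = seq.upper()
    n_cys = seq.count("C")
    n_his = seq.count("H")
    return {"cys": n_cys, "his": n_his, "disulfide_possible": n_cys >= 2}


# Invented sequences for illustration only.
no_cys = "MSTLAAGLLVAGKTE"
one_cys = "MSTLACGLLVAGHTE"
two_cys = "MSCLAAGLLCAGHTE"

for name, seq in [("no Cys", no_cys), ("one Cys", one_cys), ("two Cys", two_cys)]:
    print(name, cys_his_profile(seq))
```

Passing this check shows only that a disulfide bridge is not ruled out by composition; whether the cysteines are positioned to form one is a separate question.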

Are some Paulinella nuclear-encoded proteins targeted to the endosymbiont as mRNAs?

There is one more possible route by which the product of a nuclear-encoded gene could be expressed in an organelle without any targeting signal in the protein product; it operates at the nucleic acid rather than the protein level. Gómez and Pallás (2010) showed recently that a viroid-derived ncRNA acting as a 5′UTR-end mediates the specific import of mRNA for Green Fluorescent Protein into plastids of the tobacco Nicotiana benthamiana. These results suggest the existence of an alternative transport pathway into plastids, in which an ncRNA functions as a key regulatory molecule controlling the import of transcripts of plastid-directed nuclear genes into this organelle. Such import of transcripts instead of proteins would explain the lack of targeting signals in some Paulinella EGT candidates.

An ongoing process of endosymbiotic gene transfer can explain the absence of targeting signals in some Paulinella nuclear-encoded proteins

Data from the above-discussed peculiar plastid and mitochondrial proteins suggest that the Paulinella EGT-derived proteins that lack recognizable targeting signals still could be imported into these endosymbionts; however, it is possible that some of them are not imported or that their import proceeds with low efficiency. Interestingly, copies of Paulinella homologs to both CsoS4A and Synechococcus WH5701_13905 that were transferred to the nucleus, both of which also still are retained in the endosymbiont’s genome, exhibit very high substitution rates (Nowack et al. 2011). Estimated average numbers of amino acid substitutions per site between the nuclear copies and closest cyanobacterial sequences are 1.81 and 1.25 for Paulinella homologs to CsoS4A and Synechococcus WH5701_13905, respectively. The numbers for the corresponding endosymbiont-encoded copies are only 0.35 and 0.18 amino acid substitutions per site. Weaker purifying selection on nuclear copies also is evident at the DNA level: in comparisons with their cyanobacterial homologs, they show higher average dN/dS values than the endosymbiont-encoded copies. The ratios for Paulinella nuclear homologs to CsoS4A and Synechococcus WH5701_13905 are 0.40 and 0.50, whereas for the endosymbiont’s counterparts they are 0.20 and 0.15, respectively. In addition, an SH3 protein domain was not detected in the Paulinella nuclear homolog to Synechococcus WH5701_13905 although it was found in the endosymbiont’s copy and all cyanobacterial homologs. Domains typical for CsoS4A, such as ethanolamine utilization protein and carboxysome structural protein domain, were identified in the Paulinella nuclear gene copy but with 3 × 10^9 times higher E-value and 1.5 times lower bit score than in the endosymbiont and cyanobacterial homologs. These results suggest that the original functions of the Paulinella homologs to CsoS4A and Synechococcus WH5701_13905 could have changed after transfer to the nucleus. It is also possible that the Paulinella nuclear copies have not yet acquired efficient targeting signals and, therefore, the endosymbiont’s copies are still maintained.
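The size of the difference between nuclear and endosymbiont-encoded copies is easier to judge as a ratio. The short calculation below uses only the values quoted in the text; the variable and function names are illustrative.

```python
# Amino acid substitutions per site relative to the closest cyanobacterial
# sequences (values as quoted in the text).
subs_per_site = {
    "CsoS4A": {"nuclear": 1.81, "endosymbiont": 0.35},
    "WH5701_13905": {"nuclear": 1.25, "endosymbiont": 0.18},
}

# Average dN/dS ratios (values as quoted in the text).
dn_ds = {
    "CsoS4A": {"nuclear": 0.40, "endosymbiont": 0.20},
    "WH5701_13905": {"nuclear": 0.50, "endosymbiont": 0.15},
}


def fold_increase(table):
    """Ratio of the nuclear value to the endosymbiont value for each gene."""
    return {gene: v["nuclear"] / v["endosymbiont"] for gene, v in table.items()}


subs_fold = fold_increase(subs_per_site)
dnds_fold = fold_increase(dn_ds)

for gene in subs_per_site:
    print(f"{gene}: substitutions x{subs_fold[gene]:.1f}, dN/dS x{dnds_fold[gene]:.1f}")
```

On these figures the nuclear copies have accumulated roughly five to seven times more amino acid substitutions per site than the endosymbiont-encoded copies, and their dN/dS ratios are about two to three times higher.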

Molecular weights and charges of Paulinella EGT-derived proteins

The most distinctive features of the EGT-derived proteins analysed are their low molecular weights and nearly neutral charges, which fit well with the properties of proteins known to cross peptidoglycan walls. The permeability limit of ~24 kDa with which we compared the Paulinella proteins is based on the peptidoglycan wall of E. coli, which is thinner (2–7 nm) than in cyanobacteria (e.g., 10 nm in the genus Synechococcus) (Hoiczyk and Hansel 2000). In some cases, however, it was found that the peptidoglycan wall can change in thickness locally, which could also be true in P. chromatophora. For example, 75–80% of the E. coli peptidoglycan surface is 2.5 nm thick while the remaining areas are ~7 nm thick (Labischinski et al. 1991). There is also the possibility of local lesions in the peptidoglycan wall, as was discovered at the contact sites of translocation pores in the outer and inner envelope membranes of glaucophyte plastids (Steiner and Löffelhardt 2005); however, if comparable import sites associated with local lesions have not yet evolved in the peptidoglycan wall of Paulinella endosymbionts/plastids, genes coding for small proteins will be preferentially transferred to the host’s nucleus. This could explain why the mass of nuclear-encoded proteins targeted to Paulinella endosymbionts/plastids (4.4–9.1 kDa) is several times lower than the benchmark limit of 24 kDa from E. coli. Moreover, the size of peptidoglycan pores, with diameters of ~2 nm, is probably more restrictive to protein passage than is the wall’s overall thickness. Interestingly, it was found that pores in the peptidoglycans from Gram-negative and Gram-positive bacteria have similar average sizes and are relatively homogeneous in size as well (Demchick and Koch 1996). If we assume that peptidoglycan pores in Paulinella endosymbionts/plastids are the same size as in other eubacteria, then the size limit determined for E. coli proteins should also be valid for Paulinella proteins.
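The comparison with the ~24 kDa permeability limit rests on molecular masses calculated from sequence. A minimal sketch of such a calculation follows. It assumes standard average residue masses and adds one water molecule for the free termini; the function names and test sequences are hypothetical, and the result describes the unmodified polypeptide only.

```python
# Average residue masses in Da (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015  # Da, added once for the N- and C-terminal groups


def molecular_mass_kda(seq):
    """Average molecular mass of an unmodified polypeptide, in kDa."""
    total = sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER
    return total / 1000.0


def below_limit(seq, limit_kda=24.0):
    """True if the calculated mass is under the assumed permeability limit.

    The default of 24 kDa is the E. coli-based benchmark quoted in the text.
    """
    return molecular_mass_kda(seq) < limit_kda


# Hypothetical sequences: a 60-residue peptide and a 200-residue poly-Trp chain.
small = "MKTLAGSVEL" * 6
large = "W" * 200

print(f"small: {molecular_mass_kda(small):.2f} kDa, below limit: {below_limit(small)}")
print(f"large: {molecular_mass_kda(large):.2f} kDa, below limit: {below_limit(large)}")
```

Mass is only a proxy here: as noted above, pore diameter may be the more restrictive constraint, so a protein under the mass limit is not guaranteed to pass.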

Diffusion of charged molecules also can be influenced by peptidoglycan anionic groups (Steiner and Löffelhardt 2005). Consequently, it was suggested initially that the 1-carboxyl groups of d-glutamic acid of peptidoglycans in walls of glaucophyte plastids are amidated with N-acetylputrescine to lower their overall negative charge and polarity, thereby enhancing the passage of nuclear-encoded proteins into the plastid (Pfanzagl et al. 1996a, b; Pfanzagl and Löffelhardt 1999). Although the glaucophyte plastid-targeted proteins investigated to date turn out to be imported post-translationally, through a Toc-Tic supercomplex that bypasses the peptidoglycan wall (Steiner et al. 2005; Steiner and Löffelhardt 2005), problems associated with protein diffusion through the peptidoglycan wall still could be valid for Paulinella endosymbionts. So far, there is no evidence of a Toc-like translocon penetrating the endosymbiont envelope along with the Tic system (Bodył et al. 2010). Moreover, at least some of the Paulinella EGT-derived proteins (those with recognized signal peptides) are likely delivered to the endosymbionts in vesicles budding off from the host endomembrane system and are released into the endosymbiont intermembrane space, from where they must cross the peptidoglycan wall (Fig. 2) (Mackiewicz and Bodył 2010). It should also be pointed out that potential post-translational protein translocation through some protein-conducting channels in the outer endosymbiont membrane does not exclude release of the imported proteins directly into the intermembrane space. It is possible that such channels, even if present, have not developed a stable connection with the Tic-like translocons residing in the inner endosymbiont membrane.