Background

After malaria, Entamoeba histolytica is the second leading cause of death due to parasitic disease in humans [1]. E. histolytica has been cited as infecting one tenth of the world population, although it is now known that these infections are caused by two very similar species, E. histolytica and E. dispar. The former is the cause of all invasive disease, with an estimated 50,000 to 100,000 fatalities each year [2]. This human parasite was traditionally considered a classic example of a primitive eukaryote due to its apparent lack of 'typical' eukaryotic cell structures such as mitochondria, peroxisomes, Golgi apparatus and endoplasmic reticulum [3]. The lack of morphologically identifiable mitochondria led to the suggestion that its ancestors predate the endosymbiotic acquisition of this organelle [4], despite the observation that Entamoeba branches after well-established mitochondrial groups in ribosomal RNA phylogenies [5].

The "primitively amitochondrial" view was overturned by the discovery of genes encoding mitochondrial proteins (e.g., chaperonin 60 (Cpn60), mitochondrial-type Hsp70 (mtHsp70), pyridine nucleotide transhydrogenase (PNT)), and by the demonstration that mitochondrial remnant organelles (mitosomes) housing chaperonin Cpn60 have been retained in this organism [6–9]. Several lines of evidence support the mitochondrial ancestry of mitosomes: i) Cpn60 and mtHsp70 cluster with mitochondrial homologs to the exclusion of prokaryotic sequences in phylogenetic reconstructions; ii) Cpn60, mtHsp70 and PNT contain amino terminal regions rich in hydroxylated and positively charged amino acids, reminiscent of mitochondrial/hydrogenosomal targeting presequences; iii) deletion of amino acids 2–15 from the putative targeting presequence of Cpn60 leads to an accumulation of the truncated protein in the cytosol, a phenotype that can be reversed by the addition of a functional mitochondrial targeting signal from Trypanosoma cruzi Hsp70 to the truncated protein [8].

Since the discovery of mitosomes in E. histolytica, mitochondrial remnant organelles have also been identified in the microsporidian Trachipleistophora hominis [10], the apicomplexan Cryptosporidium parvum [11] and, most recently, in the diplomonad Giardia intestinalis [12]. Giardia mitosomes have been shown to function in FeS cluster biosynthesis and FeS protein maturation [12], essential mitochondrial functions of eukaryotic organisms [13]. FeS proteins are involved in energy metabolism, DNA repair, transcriptional regulation, and biosynthesis of nucleotides and amino acids [14]. The identification of genes encoding putative Isc proteins in the genomes of all amitochondrial protists sequenced so far [15–19] suggests that this mitochondrial function might have been retained in all amitochondrial protists and may be a general functional feature of all mitochondrion-derived organelles [12, 16, 20–22].

Here we report on the cloning, structural characterization and phylogenetic analysis of E. histolytica genes encoding Isc proteins. Both E. histolytica IscU and IscS homologs were found to contain all the structural features required for their biological activity, including substrate and co-factor binding sites, suggesting a fully operational FeS cluster biosynthetic pathway in E. histolytica. Phylogenetic analyses show that both Isc proteins have a different evolutionary history to that of mitochondrial homologs, indicating their lateral acquisition from bacteria. Moreover, the observation that both proteins seem to have been acquired from the same bacterial taxon might suggest a single transfer event of a small bacterial Isc operon.

Results and Discussion

Identification and primary sequence analyses of E. histolytica genes encoding the FeS assembly proteins IscS and IscU

BLAST searches of preliminary data generated by the E. histolytica genome-sequencing project revealed clones with extensive sequence similarity to the G. intestinalis iscS gene. PCR amplification of E. histolytica genomic DNA using primers based on these putative E. histolytica EhiscS sequences and on a putative EhiscU sequence (accession number: AY040613) generated products of the expected size. DNA sequencing confirmed the identity of the amplified clones. The 5' untranslated regions of EhiscU and EhiscS contain distinct putative promoter elements reported to be typical for E. histolytica [23]. All three conserved regions are present in the first 40 bases upstream of the initiation codon of iscU and iscS (Fig. 1), suggesting both genes are functional, although the GAAC-element is less well conserved in the iscU promoter region. The E. histolytica IscU protein is 348 amino acids in length and has a predicted molecular mass of 38.9 kDa and a predicted isoelectric point of 5.71. Its large size indicates it is a long-form IscU, similar to the one described for Azotobacter vinelandii [24], and not a short form as found in other eukaryotes (Fig. 2). For IscS, the corresponding length, predicted molecular mass and predicted isoelectric point are 390 amino acids, 42.8 kDa, and 5.92, respectively. The GC values for the iscU (iscS) genes are 33 % (32 %) for the coding region, 29 % (25 %) for the 5' untranslated region and 29 % (18 %) for the 3' untranslated region (250 bp each). These values are in agreement with GC values reported for other E. histolytica genes based on 75,615 codons analyzed [25]. Codon usage is also similar to that reported for other E. histolytica genes, and no introns are present in either of these two genes.
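The GC percentages quoted above are simple base-composition ratios. A minimal Python sketch of that calculation follows; the function name `gc_percent` and the example sequence are illustrative assumptions and are not taken from the E. histolytica genes, and ambiguous bases are simply ignored.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical AT-rich fragment, for illustration only
# (not an actual E. histolytica sequence).
example = "ATGAAATTTACTGAAGAAATTAAT"
print(f"GC content: {gc_percent(example):.1f} %")
```

Applying the same function separately to a coding region and to its 250 bp flanking regions would give the kind of per-region comparison reported in the text.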

Figure 1

Overview of the 5'-flanking region of E. histolytica iscU (A) and iscS (B) encoding genes. The three typical upstream regulatory elements are depicted as described by Purdy et al. [23]: the putative initiator element, double underlined; the 'GAAC'-element, grey box; and the putative TATA element, boxed.

Figure 2

Primary sequence features of the E. histolytica IscU and IscS proteins. A. Schematic drawing of the E. histolytica IscU and IscS protein sequences indicating the positions and sizes of Pfam [26] signature motifs PF01592 (NifU-N), PF04324 ([2Fe-2S]), and PF01106 (NifU-like) on the putative IscU protein and Pfam motif PF00266 (aminotransferase class V – AtV) on the putative IscS protein. The presence of these domains on a protein is used by the various databases to classify a protein and to infer its function. B. Comparison of the E. histolytica IscU protein depicted as above with homologous proteins from Azotobacter vinelandii (NifU), Campylobacter jejuni (NifU), Rickettsia prowazekii (NifU and RP667), Saccharomyces cerevisiae (Isu1 and Nfu1), and Homo sapiens (IscU2 and HIRIP5).

Both E. histolytica IscU and IscS contain structural motifs typical of FeS assembly proteins. Pfam (PF01106, PF01592), PRODOM (PD002830), and InterProScan (IPR001075, IPR002871) motifs characteristic of IscU and NifU proteins are present in the E. histolytica homolog (Fig. 2A) [26–28]. E. histolytica IscS contains Pfam (PF00266), PROSITE (PS00595), and InterProScan (IPR000192) motifs that are normally associated with aminotransferase class V proteins, a subfamily of the aminotransferase proteins. IscS is one of the eight members of the class V subfamily (Fig. 2A). As indicated above, IscU has an extension at the carboxy-terminus relative to most IscU homologs. This extension is also present on the A. vinelandii NifU gene whose amino-terminal part is homologous to that of IscU. In addition, this C-terminal extension is similar to a completely different gene from Saccharomyces cerevisiae, Nfu1 (NifU-like in Fig. 2B). Since Nfu1- and Isu-like sequences are part of the same gene in Azotobacter, Campylobacter, Entamoeba and Helicobacter, it could be inferred that both proteins interact with each other when found on two separate genes. Such informative fusion proteins (or Rosetta Stone sequences) indicate an interaction between protein pairs [29]. The existence of long IscU isoforms would therefore suggest that the Nfu1 and Isu1/2 proteins do interact in yeast as postulated by Garland et al. [30].

Both proteins align along their whole length to homologous proteins from other organisms (Fig. 3). Residues implicated in function are conserved in both IscU and IscS proteins. The three cysteine residues that are conserved in Escherichia coli IscU and provide a scaffold for the assembly of iron-sulfur clusters [14] are conserved in the E. histolytica protein (Fig. 3A). In addition, in E. coli one of these IscU cysteines interacts with a conserved cysteine from IscS which is also present in the E. histolytica IscS (Figs. 3A and 3B). Most residues considered to be important for IscS function are also present on the E. histolytica protein (Fig. 3B). To test whether the E. histolytica IscS protein assumes a normal three-dimensional conformation, this protein was modeled on the solved NifS protein structure from Thermotoga maritima. The overall topology of both proteins is quite similar and the force field energy of the computed E. histolytica IscS model is -13,800 kJ/mol, indicating an energetically plausible model [31]. The putative active site architecture of E. histolytica IscS and the solved active site of T. maritima NifS show similar structures (Fig. 4). The ring of the cofactor vitamin B6 (or pyridoxal-5'-phosphate; PLP) is sandwiched between EhHis106/TmHis99 and EhThr184/TmVal179 and further fixed by residues EhAsp182/TmAsp177 and EhGln185/TmGln180. The phosphate-group is anchored by six hydrogen bonds from EhThr76/TmThr71, EhHis207/TmHis202, EhThr198/TmSer200, and EhThr243/TmThr238 [32]. The presence of all residues considered to be important for IscU and IscS activity on the E. histolytica proteins suggests that these proteins are indeed involved in FeS cluster assembly.

Figure 3

Alignment of the putative E. histolytica IscU and IscS with homologs from C. jejuni, A. vinelandii, R. prowazekii, S. cerevisiae, and H. sapiens. A. Alignment of the E. histolytica long-form IscU with similar isoforms from C. jejuni and A. vinelandii. In addition, short-form IscU homologs from R. prowazekii, S. cerevisiae, and H. sapiens are aligned concatenated with their Nfu1 homologs (arrow indicates start of Nfu1 homologs) which resemble the C-terminal extension found on the long-form IscU. The conserved cysteine residues which provide a scaffold for the IscS-directed sequential assembly of labile FeS-clusters [14] are boxed. The cysteine residue that forms a disulfide bridge with a conserved cysteine residue on IscS (see B) is indicated by a closed square (■). The yeast Nfu1 mitochondrial transit peptide has been deleted. B. Alignment of the E. histolytica IscS with homologs from the above mentioned organisms. Important residues for function are as described by Tachezy, Sánchez and Müller [20]; the conserved lysine involved in co-factor binding (pyridoxal-5'-phosphate, PLP) is indicated by a closed circle (●), other residues involved in PLP interaction are indicated by open circles (○), the cysteine residue that forms a disulfide bridge with a cysteine residue on IscU (see A) is indicated by a closed box (■), residues involved in substrate binding (L-cysteine) are indicated by open squares (□), the conserved histidine involved in substrate deprotonation is indicated by an arrow. Typical eukaryotic/eubacterial conserved cysteine and C-terminal residues are boxed. Note that organisms that contain a long-form IscU (see Fig. 2B and 3A) do not have these conserved residues, suggesting that the C-terminal IscU extension might take over the role of these residues. Part of the mitochondrial transit peptides from the yeast and human IscS homologs have been omitted (~) for reasons of clarity.
Amino acids were shaded according to similarity/identity scores: dark grey indicates fully conserved residues while light grey indicates similar residues according to the PAM250 matrix [59].

Figure 4

Model of the active site of E. histolytica IscS and Thermotoga maritima NifS. The E. histolytica IscS putative three-dimensional structure (A) was deduced using the conceptually translated iscS sequence. The previously solved crystal structure of T. maritima NifS (B) [PDB accession number: 1EG5, [32]] was used as a template. The E. histolytica IscS sequence was aligned to the T. maritima NifS sequence using DeepView v3.7 [http://www.expasy.org/spdbv/, [60]] and manually improved based on an independent Clustal W alignment [54]. Only residues involved in co-factor (PLP) and substrate (Cys) binding were selected in order to show the active site.

No N-terminal or C-terminal organelle targeting domains could be unambiguously identified in E. histolytica IscS/U proteins using subcellular localization and targeting prediction software (e.g., PSORT II [33], MitoProt [34], NNPSL [35]). The C-terminal signature motif which is considered to be characteristic of proteobacterial and eukaryotic IscS proteins [20] is not present in homologs from E. histolytica, Campylobacter or Azotobacter (Fig. 3B). Because these organisms all possess the long-type IscU isoforms, it is possible that the extended IscU protein might negate the need for the C-terminal signature residues on the interacting IscS protein. However, functional studies using deletion mutants are needed to confirm this hypothesis.

Phylogenetic analyses of the E. histolytica FeS cluster assembly proteins

Bayesian and maximum likelihood (ML) phylogenetic analyses of E. histolytica IscU and IscS protein sequences revealed that the Entamoeba Isc proteins form a well supported clade with Helicobacter pylori and Campylobacter jejuni – two bacteria encountered in the human digestive tract – to the exclusion of all other prokaryotic and eukaryotic homologs (Fig. 5). All three independent Bayesian analyses converged on the same tree with similar posterior probabilities. For IscU, the ML tree had a slightly better likelihood than the Bayesian tree, while for IscS both trees had similar likelihoods. The overall topologies of IscS and IscU phylogenetic trees are very similar to each other and major taxonomic clades like plants, animals, and fungi are well conserved. The position of the microsporidium Encephalitozoon cuniculi in the IscU tree is poorly resolved as indicated by the very low support for this node at the base of the metazoa, contrary to its well-documented association with fungi [36].

Figure 5

Phylogenetic analysis of E. histolytica IscS and IscU protein sequences using a similar taxonomic sampling. Depicted are unrooted maximum likelihood phylogenetic trees of 29 IscS (left) and 28 IscU (right) protein sequences. The E. histolytica sequences are recovered as part of a well supported monophyletic group comprising the gut bacteria H. pylori and C. jejuni. The orange branches represent those sequences containing the long IscU isoform. Numbers in red represent posterior probabilities as determined by MrBayes [55] where a value of 1.0 represents maximum support (only values above 0.75 are shown). Values in blue represent bootstrap values as determined using PHYML [57], only bootstrap values above 50% are shown.

The position of Rickettsia prowazekii IscS basal to the eukaryotes suggests that eukaryotic IscS proteins originated from the mitochondrial endosymbiont, since this bacterium is considered to be a close relative to the mitochondrial ancestor. Indeed, the mitochondrial ancestry of E. cuniculi, T. vaginalis and G. intestinalis IscS proteins is strongly supported by their clustering with mitochondrial homologs [15, 20, 37]. For IscU, the base of the eukaryotic clade is not well resolved. Animals and plants cluster together with a proteobacterial sister clade containing the α-proteobacterium R. prowazekii, while fungi, G. intestinalis, and the alveolates are basal to this clade. However, the well-supported clustering of E. histolytica Isc proteins with homologs from the bacteria H. pylori and C. jejuni, to the exclusion of all other eukaryotes, suggests that E. histolytica acquired its isc genes laterally from ε-proteobacteria (Fig. 5). This suggestion is further supported by the fact that Campylobacter, Helicobacter and E. histolytica all possess long form IscU proteins to the exclusion of the short isoforms found in eukaryotic organisms and in many bacterial taxa (see orange branches in Fig. 5).

Mitochondrial-type IscS/U proteins have been identified in several amitochondrial eukaryotes including Giardia, Encephalitozoon, Trichomonas and Cryptosporidium, and there is significant direct and indirect evidence that these proteins are targeted into their highly derived mitochondrion-related organelles [12, 15, 16, 20]. Thus, E. histolytica appears to be unique amongst eukaryotic organisms that contain mitochondrion-related organelles in harbouring bacterial-type IscS/U proteins. That no mitochondrial-type IscS/U proteins have thus far been identified in E. histolytica would suggest that its original mitochondrial-type iscS/U genes were replaced during the course of evolution by the more recently acquired bacterial homologs. However, since the E. histolytica genome has not yet been fully sequenced, the possibility that mitochondrial type iscS/U genes might have escaped detection cannot be formally excluded.

Since both E. histolytica Isc proteins form a strongly supported clade with homologs from gut bacteria, we investigated whether other intestinal inhabitants would form part of this clade. The genomes of 23 bacterial and 2 eukaryotic inhabitants of the human gut were screened using E. histolytica IscU and IscS as query sequences, but no additional homologs were identified. Only a fraction of the estimated 400–500 bacterial species living in the human intestine [38] have been sequenced and therefore we may not have been able to identify any other members of this clade due to sampling limitations. Nevertheless, the most parsimonious explanation for the clustering of E. histolytica Isc proteins with those of bacteria is that E. histolytica, or its ancestors, acquired its iscS/U genes by horizontal gene transfer (HGT), a well-documented contributor to prokaryotic and eukaryotic genome evolution. In higher eukaryotes the most obvious example of HGT is the relocation of genes from endosymbiosis-derived organelles to the cell nucleus, which might be regarded as a special case of HGT. However, over the past few years evidence has accumulated of the frequent incorporation of genes into the genomes of microbial eukaryotes by HGT [39–47]. The transfer of bacterial genes into eukaryotes might occur in several possible ways. One hypothesis is the 'you are what you eat' gene transfer ratchet of HGT, which suggests that when a genome is continuously bombarded with DNA, some of these genes might eventually replace the host's own genes [48]. Since both Helicobacter and Campylobacter occupy the same ecological niche as E. histolytica, an avid consumer of gut bacteria, HGT via this mechanism seems plausible. Establishing unequivocally the timing of HGT will be important to test this hypothesis.

Analysis of the organization of Isc/Nif loci on the genomes of several bacteria revealed the presence of a small Isc operon consisting exclusively of IscU and IscS in H. pylori and C. jejuni, whilst the well-studied E. coli and A. vinelandii isc operons contained several other genes involved in FeS cluster assembly (see Fig. 6). This observation provides a mechanistic explanation for the presence of two interacting proteins with similar ancestry in the genome of E. histolytica. It is possible that E. histolytica might have incorporated the entire isc operon from Helicobacter/Campylobacter, or from their ancestors, into its genome in a single transfer event. Once freed from the constraints of operon-type prokaryotic gene expression, the iscS/U genes might have become separated in the E. histolytica genome during the course of evolution.

Figure 6

Schematic representation of Isc/Nif operons from different bacteria. Shown is an area of about 10 kb around the IscU/S or NifU/S genes from C. jejuni, H. pylori, A. vinelandii, and E. coli. Isc/Nif genes are indicated by a dark grey box and other genes that are part of the Isc-operon are indicated by a lighter shade of grey. Genes that are not part of the Isc/Nif operon are of yet a lighter shade. Boxes are drawn proportionally with regards to length of the ORF.

Conclusions

E. histolytica or its ancestors appear to have acquired their iscS/U genes by HGT from ε-proteobacteria. The apparent absence of mitochondrial-type IscS/U proteins in an organism with mitochondrion-bearing ancestors such as E. histolytica suggests that its original mitochondrial iscS/U genes might have been replaced with the more recently acquired bacterial homologs. This finding, like several other recently reported cases of prokaryote to eukaryote gene transfers [39–47], highlights the important role played by HGT in protozoan genome evolution. Since no recent HGT events from prokaryotes to humans have been detected in the human genome [49], HGT from bacteria to protozoan parasites might have important implications for public health. Targeting enzymes or metabolic pathways of bacterial origin in human pathogens should have more severe consequences for the parasite than for its host, making these proteins promising targets for chemotherapy.

Methods

Organism and DNA isolation

E. histolytica HM-1:IMSS clone 9 was maintained axenically by subculture in YI-S medium with 15% adult bovine serum as described [50]. Entamoeba genomic DNA was isolated using cetyltrimethylammonium bromide (CTAB) according to Clark [51].

Cloning and sequencing of the E. histolytica iscS and iscU genes

Standard recombinant DNA techniques were used as described elsewhere [52]. PCR was performed on isolated E. histolytica genomic DNA. Primers were designed using Primer3 [53]. The EhiscU gene was amplified using primers based on a NifU-like E. histolytica sequence (accession number AY040613). The primers were Eh_IscU_936F, 5'-CCA ACG TAT CGC CAC GAA AA-3' and Eh_IscU_2270R, 5'-GCA AAA CAA AGT ATG GCA GAA GCA-3' for forward and reverse primers, respectively. The EhiscS gene was identified on the E. histolytica genome by BLAST searches of preliminary data generated by the Entamoeba genome sequencing project [17] using G. intestinalis GiiscS (accession number AAK39427) as the query sequence. Putative EhiscS gene sequences (1000 bases up- and downstream of the ORF) were used for primer design. The EhiscS coding region was amplified using primers Eh_IscS_681F, 5'-CAA GTG CGA ATA CCC AAT TTG AA-3' and Eh_IscS_2515R, 5'-GGC TGA AGC CAT GAC ACC TC-3' (forward and reverse primers, respectively). The resulting PCR fragments were all cloned into pGEM-T-Easy (Promega) and sequenced to confirm their identity. The new E. histolytica IscS sequence has been deposited in Genbank (accession number AY277946).

Phylogenetic analyses

The conceptually translated E. histolytica IscS and IscU amino acid sequences were aligned using Clustal W [54] to reference sequences from Genbank. The alignments were manually refined and only unambiguously aligned regions without gaps were used for phylogenetic analysis, leaving data sets of 28 taxa with 116 amino acid positions (IscU) and a similar taxon set consisting of 29 taxa with 326 amino acid positions (IscS). Likelihood searches were performed in a Bayesian framework under the JTT-f substitution model accommodating site rate variation (fraction of invariable sites plus four variable gamma rates) using the program MrBayes [55]. All analyses started with randomly generated trees and ran for 200,000 generations, with sampling at intervals of 100 generations that produced 2,000 trees. To ensure that the analyses were not trapped on local optima, the data set was run three times independently, each run beginning with a different starting tree. The log-likelihood values of the 2,000 trees in each analysis were plotted against the generation time (not shown). Although the likelihood model stabilized very rapidly, only the last 1,500 trees in each of the three independent analyses were used to estimate a separate 50% majority rule consensus tree for each analysis. The frequency of any particular clade, among the individual trees contributing to the consensus tree, represents the posterior probability of that clade [55]. For the maximum likelihood analyses, protein data sets were resampled 100 times using SEQBOOT from PHYLIP [56]. These resampled datasets were analysed using PHYML [57] with alpha and invariant sites parameters optimized on the Bayesian tree in TREE-PUZZLE 5.0 [58] with a mixed four-category discrete gamma plus invariable sites model of rate heterogeneity. The JTT substitution model was used in the protein analyses. Majority rule consensus trees were obtained from the resulting 100 trees using CONSENSE (PHYLIP).
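The burn-in and clade-frequency steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the MrBayes implementation: each sampled tree is represented simply as a set of clades (each clade a frozenset of taxon labels), and the names `posterior_probabilities`, `majority_rule`, and the constants are hypothetical. The numbers come from the text: 200,000 generations sampled every 100 generations give 2,000 trees, of which the last 1,500 are kept.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]   # a clade is the set of taxa it contains
Tree = Set[Clade]        # assumption: a tree is reduced to its set of clades

GENERATIONS = 200_000
SAMPLE_INTERVAL = 100
KEPT_TREES = 1_500

N_SAMPLED = GENERATIONS // SAMPLE_INTERVAL   # 2,000 sampled trees
BURN_IN = N_SAMPLED - KEPT_TREES             # first 500 trees discarded


def posterior_probabilities(trees: List[Tree], burn_in: int) -> Dict[Clade, float]:
    """Frequency of each clade among the trees retained after burn-in."""
    kept = trees[burn_in:]
    if not kept:
        raise ValueError("no trees left after discarding burn-in")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(probs: Dict[Clade, float], threshold: float = 0.5) -> Dict[Clade, float]:
    """Keep clades found in more than `threshold` of the retained trees."""
    return {clade: p for clade, p in probs.items() if p > threshold}


# Toy sample of four trees over made-up taxon labels (illustration only).
clade_a = frozenset({"Eh", "Hp", "Cj"})
clade_b = frozenset({"Hp", "Cj"})
clade_c = frozenset({"Eh", "Hp"})
toy_trees: List[Tree] = [
    {clade_c},
    {clade_a, clade_b},
    {clade_a, clade_b},
    {clade_a, clade_c},
]
toy_probs = posterior_probabilities(toy_trees, burn_in=1)
for clade, p in sorted(majority_rule(toy_probs).items(), key=lambda kv: -kv[1]):
    print(sorted(clade), round(p, 2))
```

The bootstrap consensus used for the maximum likelihood analyses follows the same counting logic, with the 100 trees from the resampled data sets in place of the post-burn-in sample.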