Background

Amoebiasis, caused by the enteric protozoan parasite Entamoeba histolytica, is a major global health issue affecting about 50 million individuals worldwide each year [1]. The disease is mostly endemic in developing countries with sub-optimal sanitation, improper water treatment and lower socio-economic status. Amoebiasis has been reported all over India, affecting about 15% of the Indian population. Microscopic examination is still considered the first choice for diagnosis [2]. The commercially available E. histolytica IgG antibody detection assay, which detects anti-E. histolytica antibodies in serum samples, poses a challenge in distinguishing past from current infections because IgG persists after past exposure to the pathogen [3]. Biochemical methods such as isoenzyme analysis are no longer used because of poor sensitivity and a high rate of false-negative results [4]. The commercially available TechLab E. histolytica II ELISA, a second-generation monoclonal antibody-based rapid test, has also shown variable sensitivity and specificity in several studies [3, 5,6,7]. Real-time PCR has been reported to show higher diagnostic sensitivity than the antigen detection method [8]. However, high cost remains a barrier to the widespread use of these molecular diagnostic methods in developing countries. Moreover, the tests developed so far for amoebiasis do not properly fit the ASSURED criteria developed by the World Health Organization (WHO), which further intensifies the need to search for a new drug target [9]. Metronidazole and other nitroimidazole derivatives have remained the drugs of choice in the treatment of amoebiasis for over 25 years. Yet, the persistence of the parasite in 40 to 60% of cases even after complete drug therapy has remained a lingering problem [10, 11]. 
Initially, reported cases of drug resistance in the parasite were fairly uncommon, an attribute assumed to result from the pleiotropic mode of action of metronidazole [12]. However, the failure to differentiate pathogenic from non-pathogenic strains of Entamoeba has led to indiscriminate use of these antibiotics and eventually to drug resistance [13]. In vitro experiments have shown that E. histolytica can survive at a 40 μM concentration of metronidazole, which is ten times higher than its sublethal dosage [14]. The carcinogenic and teratogenic effects of metronidazole have been debated for a long time. Carcinogenicity has been demonstrated in rodents given high doses over an extended duration, but the effect in humans is still questioned [15]. However, a limited correlation between metronidazole intake and cancer was found in some cases, which eventually led to the drug being officially classified as ‘reasonably anticipated to be a human carcinogen’ [16]. Considering all these scenarios along with the lack of an effective vaccine, the development of novel anti-amoebic compounds is the only choice to take on this pathogenic parasite [17].

Proteins have long been at the forefront of drug target discovery because they mediate many biological processes. In addition, their capacity for efficient binding of selective compounds, another desirable criterion, makes proteins a better option than other macromolecules [18]. The cell cycles of protozoan parasites, which show several unusual facets in the regulation of cell division, including absent or altered checkpoint proteins, have attracted attention as probable drug targets for quite a while [19, 20]. The existence of a functional proteasome and components of the ubiquitin-proteasome system (UPS) has already been reported in E. histolytica [21, 22]. Ubiquitin-mediated proteolysis is an important process that governs several key regulatory cell cycle control events by degrading components of the cell cycle machinery [20]. Polyubiquitination is a three-step cascade involving three groups of enzymes: ubiquitin-activating enzymes (E1), ubiquitin-conjugating enzymes (E2) and ubiquitin ligases (E3) [23]. The anaphase-promoting complex or cyclosome (APC/C) is a multi-subunit E3 ubiquitin ligase that spatiotemporally coordinates mitosis by controlling anaphase onset, progression to subsequent steps, mitotic exit and re-entry into the next G1 phase by targeting specific substrates for 26S proteasome-mediated destruction, thereby controlling cell proliferation in all eukaryotes [24]. APC/C is made up of 12 subunits in humans and 13 subunits in budding yeast, but Apc2-Apc11 is considered to be the minimal ligase module essential for ubiquitination [25, 26]. There is no clear evidence that Apc11 itself carries ubiquitin onto substrates, but its role as a bridge between E2 and substrate has been reported to favour ubiquitination in Saccharomyces cerevisiae [27]. 
siRNA-induced apoptosis of cancer cells by targeting Cdc20 (an activator of APC/C) has been achieved effectively in humans [28]; it is therefore important to understand whether targeting the catalytic core of APC/C can be another option to arrest the cell cycle. Apc11 dysfunction and subsequent mitotic arrest have been reported previously [29]. The current study was undertaken to characterize EhApc11 in order to shed light on its structural and functional aspects, to evaluate its prospects as an anti-amoebic drug target and, eventually, to provide insights toward novel targets.

Methods

In silico analysis

Sequence retrieval

Human Apc11 protein sequence was retrieved from NCBI (NCBI Acc. No. NP_057560.8). The homologous protein of human Apc11 in E. histolytica was identified using human Apc11 as a query in BLASTP. The homologues from other representative model organisms were further identified using EhApc11 as a query sequence.

Multiple sequence alignment and phylogenetic analysis

Multiple sequence alignment of Apc11 from E. histolytica and other selected model organisms was performed by the ClustalW module of BioEdit v7.2.5. A phylogenetic tree was derived using the neighbour-joining algorithm of MEGA10 with 1000 bootstrap replicates.
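
To illustrate what the 1000 bootstrap replicates involve, the following minimal Python sketch resamples alignment columns with replacement, which is the resampling step underlying bootstrap support values. The toy alignment, the function name and the fixed random seed are illustrative assumptions and are not data or code from this study; MEGA10 performs this step internally and then reconstructs a tree from each replicate.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: alignment columns drawn with replacement."""
    n_cols = len(next(iter(alignment.values())))
    if any(len(seq) != n_cols for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment is wide, with replacement.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

# Toy alignment (hypothetical sequences, not the Apc11 homologues).
toy = {"seqA": "MTVEC-HC", "seqB": "MSVECAHC", "seqC": "MTIDC-HC"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(len(replicates), "replicates of width", len(replicates[0]["seqA"]))
```

The bootstrap value at a branch point is then the percentage of replicate trees in which the same grouping is recovered.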

Physicochemical profiling and primary structure analysis

Physicochemical characterization of the retrieved protein sequence was carried out using ExPASy’s ProtParam tool [30]. Parameters including molecular mass, theoretical isoelectric point (pI), amino acid composition, aliphatic index and grand average of hydropathy (GRAVY) were predicted to give an idea of the intrinsic heterogeneity of the protein. The five most abundant amino acids in EhApc11 were compared. The analysis was run under the default configuration.
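
As an illustration of two of these parameters, the sketch below computes GRAVY (the mean Kyte-Doolittle hydropathy over all residues) and the aliphatic index (a weighted sum of the mole percentages of Ala, Val, Ile and Leu, with weights 2.9 for Val and 3.9 for Ile and Leu). It is a minimal re-implementation under these standard definitions, not the ProtParam code itself, and the demo peptide is a hypothetical sequence rather than EhApc11.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(seq):
    """Grand average of hydropathy: sum of hydropathy values / sequence length."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)

def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))

# Hypothetical demo peptide (not the EhApc11 sequence).
demo = "MCDEHKLVCC"
print(round(gravy(demo), 3), round(aliphatic_index(demo), 2))
```

A negative GRAVY value indicates a predominantly hydrophilic sequence, and a higher aliphatic index indicates a larger relative volume occupied by aliphatic side chains.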

Secondary structure analysis

PSIPRED v4.0 was used to predict the secondary structure of EhApc11 in terms of the proportion of α-helices, strands and coils [31].
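
The proportions of helix, strand and coil can be derived from a per-residue prediction string by simple counting, as in the following sketch. It assumes the three-state alphabet H (helix), E (strand) and C (coil); the function name and the toy string are illustrative assumptions and do not represent the actual PSIPRED output for EhApc11.

```python
from collections import Counter

def ss_composition(ss):
    """Return the percentage of helix (H), strand (E) and coil (C) states."""
    ss = ss.upper()
    if not ss:
        raise ValueError("empty secondary structure string")
    counts = Counter(ss)
    unknown = set(counts) - set("HEC")
    if unknown:
        raise ValueError(f"unexpected state symbols: {sorted(unknown)}")
    return {state: round(100.0 * counts[state] / len(ss), 2) for state in "HEC"}

# Toy three-state string (hypothetical, 20 residues).
toy_ss = "CCCHHHHCCEEEECCCCCCC"
print(ss_composition(toy_ss))
```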

Tertiary structure analysis

Three-dimensional homology modelling of EhApc11 was performed by MODELLER [32]. Human APC/C-Cdc20-Cdk2-cyclinA2-Cks2 complex (PDB ID: 6Q6G chain C at 3.20 Å resolution) with a significant sequence identity was used as a template to model the tertiary structure of EhApc11. Validation of the modelled structure was performed by several bioinformatics servers viz., QMEAN, ProSA and ERRAT [33,34,35]. To find the energetically favourable residues within the modelled protein structure, PROCHECK was used to generate the Ramachandran plot which gives an overview of φ-ψ torsion angles of the protein backbone and also indicates the percentage of residues present within favoured, allowed and outlier regions [36].

Functional analysis and protein-protein interaction

The STRING web server (https://string-db.org/) was used to identify in vivo interacting proteins of EhApc11. The prediction is based on both physical and functional associations [37]. The functional association between two proteins is the basic unit in STRING, and predictions are derived from several sources, including (1) known experimental and co-expression studies, (2) pathway knowledge from databases, (3) automated text mining, (4) information from computational algorithms and (5) pre-computed orthology relations. The sequence was submitted to MEME for the identification of functional motifs or signature sequences [38]. For further analysis, NCBI-CDD-BLAST was used to find conserved domains present within the protein sequence.

Prediction of ligand binding site, subcellular localization and antigenicity

Protein-ligand binding sites were predicted using MODELLER-generated model as an input into COACH [39]. Final ligand binding sites are predicted by combining the results with COFACTOR, FINDSITE and ConCavity [39]. Subcellular localization of EhApc11 was predicted with CELLO v.2.5 as it is an important attribute to understand the protein functions and also carries a great significance in the drug designing process [40, 41]. Antigenic segments within EhApc11 were predicted with Predicted Antigenic Peptides (http://imed.med.ucm.es/Tools/antigenic.pl).

Molecular characterization

Culture and maintenance of E. histolytica

An axenic culture of E. histolytica HM1:IMSS was maintained. Trophozoites were grown in TYI-S-33 medium supplemented with 10% adult bovine serum at 37 °C, and log-phase cultures were used for the present study [42].

RNA isolation and molecular cloning of full-length EhAPC11 cDNA

Total RNA was isolated from the trophozoites of E. histolytica HM1:IMSS using the RNeasy Mini Kit (Qiagen) following the manufacturer’s protocol. cDNA was synthesized from 1 μg of total RNA using the QuantiTect Reverse Transcription kit (Qiagen). The full-length cDNA fragment of EhAPC11 was PCR amplified using a specific set of primers (forward 5′-AGCGGATCCATGACTGTTGAAATAATTGAA-3′ and reverse 5′-AGCGTCGACATTTTCTATTTCCATTAAATTT-3′). The following programme was used for amplification: 95 °C for 3 min, followed by 30 cycles of denaturation at 95 °C for 1 min, annealing at 50 °C for 45 s and final extension at 68 °C for 5 min. The PCR products were analysed on a 1% agarose gel in 1× TAE buffer. The purified product was cloned into the BamHI-SalI sites of the bacterial expression vector pGEX-4T-1 using E. coli XL1B competent cells, and the construct was verified by automated sequencing.
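
The design of the two primers can be checked with a short script. The sketch below locates the BamHI (GGATCC) and SalI (GTCGAC) recognition sequences in the primers given above and separates each primer into a 5′ clamp, the restriction site and the gene-specific annealing region. The function names are illustrative assumptions; only the primer sequences are taken from this study.

```python
BAMHI = "GGATCC"  # BamHI recognition sequence
SALI = "GTCGAC"   # SalI recognition sequence

FORWARD = "AGCGGATCCATGACTGTTGAAATAATTGAA"
REVERSE = "AGCGTCGACATTTTCTATTTCCATTAAATTT"

def split_primer(primer, site):
    """Split a primer into (5' clamp, restriction site, annealing region)."""
    pos = primer.find(site)
    if pos < 0:
        raise ValueError(f"site {site} not found in primer")
    return primer[:pos], site, primer[pos + len(site):]

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

fwd_clamp, _, fwd_anneal = split_primer(FORWARD, BAMHI)
rev_clamp, _, rev_anneal = split_primer(REVERSE, SALI)

# The forward annealing region begins at the ATG start codon.
print("forward:", fwd_clamp, BAMHI, fwd_anneal)
# The reverse primer anneals to the sense strand, so its annealing region
# is shown as the reverse complement (coding-strand orientation).
print("reverse:", rev_clamp, SALI, reverse_complement(rev_anneal))
```

Both recognition sequences are palindromic, so they read identically on either strand, which is why the same site string can be searched for in forward and reverse primers.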

Expression and purification of recombinant EhApc11

The pGEX-4T-1:EhAPC11 construct was expressed in E. coli BL21 (DE3) as an N-terminal GST-tagged fusion protein. Protein expression was induced with 0.1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) for 16 h at 16 °C. Cells were harvested by centrifugation at 5000 rpm, and pellets were resuspended in lysis buffer containing 25 mM Tris-HCl pH 7.4, 10 mM NaCl, 5 mM MgCl2, 5 mM dithiothreitol, 1% Triton X-100 and 1 mM PMSF. Cell lysates were briefly sonicated to disrupt the cells and reduce the viscosity. After a final centrifugation at 13,000 rpm for 20 min, the supernatant fraction was purified using GSH-agarose resin, and expression was analysed by 12% SDS-PAGE.

Results

Sequence retrieval and phylogenetic analysis

BLASTP analysis against the non-redundant database using Homo sapiens Apc11 (NP_057560.8) as the query revealed significant sequence identity and similarity with a zinc finger domain-containing protein (NCBI Acc. No. XP_651657.1) in E. histolytica. To better understand the conservation of Apc11 among other model organisms, the sequence of EhApc11 was used as a query in additional BLASTP searches. Details of the EhApc11 protein and the homologous protein sequences considered for the phylogenetic study are provided in Table S1. The primary amino acid sequence of EhApc11 was aligned with homologues from other organisms; strictly conserved residues are shaded in blue and identical residues are marked in black (Figure S1). Phylogenetic analysis revealed that EhApc11 is closer to other parasites, such as Trypanosoma brucei gambiense, Plasmodium vivax and Leishmania donovani, whereas Danio rerio showed maximum divergence (Fig. 1).

Fig. 1

Phylogenetic analysis showing the evolutionary relationship of EhApc11 with other reference organisms using the neighbour-joining method of MEGA10. The numbers below and above the branch points are indicative of the confidence level of relationships as determined by bootstrap analysis

Primary structure analysis and physicochemical characterization

Analysis of the physicochemical properties gives an overall idea about the nature of a protein. The number of amino acids, pI value (isoelectric point), aliphatic index and grand average of hydropathicity (GRAVY) of EhApc11 are given in Table 1. Computational analysis revealed that EhApc11 is 87 residues long, with a pI value of 4.40. The aliphatic index and GRAVY value of EhApc11 were found to be 62.64 and − 0.313, respectively. The primary structure analysis revealed cysteine as the most abundant amino acid; the percentages of the five most abundant amino acids of EhApc11 are shown in Fig. 2a.

Table 1 Physicochemical characterization of EhApc11
Fig. 2

Primary and secondary structure analysis of EhApc11. a A column graph representing the amino acids dominating in the primary structure. b Secondary structural arrangement predicted by PSIPRED

Secondary structure analysis

The secondary structure analysis revealed that EhApc11 is rich in random coils, followed by strands and helices (Fig. 2b). Random coils have important functions in protein flexibility and allow conformational changes during enzymatic turnover [43].

Tertiary structure analysis and model validation

Sequence similarity often corresponds to structural similarity, and knowledge of three-dimensional protein structures along with functional aspects may help to elucidate the nature of uncharacterized proteins. The output model was visualized with Chimera [44] (Fig. 3). The stereochemical quality and accuracy of the modelled structure were validated with several tools (Table 2). The ERRAT analysis of non-bonded interactions between different atom types within EhApc11 indicated good model quality, as the ERRAT score of EhApc11 was 84.375 (the accepted limit is > 50 for high-quality models) (Figure S2). Global quality estimation by QMEAN placed the EhApc11 model within the acceptable region of the estimated absolute model quality graph with a z-score of − 2.46 (Fig. 4a–c). The stereochemical evaluation of backbone φ and ψ dihedral angles of EhApc11 by PROCHECK revealed that 96.1% of residues were in the most favoured regions (Figure S3a). A good model is expected to have over 90% of residues in the most favoured regions. ProSA was used to find potential errors in the 3D model of the protein structure. It generates two characteristic outputs for an input structure: a z-score and a plot of its residue energies. The z-score signifies the overall model quality and measures the deviation of the structure’s total energy from an energy distribution derived from random conformations [34, 45]. ProSA analysis revealed a z-score of − 3.21 for EhApc11, indicating no notable deviation from typical native proteins of similar size (Figure S3b).
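
The acceptance criteria quoted in this section can be collected into a simple check, as sketched below. The scores are those reported here; the thresholds are the ones stated in the text and in the Fig. 4 legend (ERRAT > 50, more than 90% of residues in the most favoured regions, and a QMEAN z-score above − 4.0). The dictionary layout and function name are illustrative assumptions, and no threshold is applied to the ProSA z-score because none is stated.

```python
# (reported value, threshold that the value must exceed)
CHECKS = {
    "ERRAT overall quality factor": (84.375, 50.0),
    "PROCHECK residues in most favoured regions (%)": (96.1, 90.0),
    "QMEAN z-score": (-2.46, -4.0),
}

def passes(value, threshold):
    """A score passes when it is strictly greater than its threshold."""
    return value > threshold

results = {name: passes(value, threshold)
           for name, (value, threshold) in CHECKS.items()}

for name, ok in results.items():
    print(f"{name}: {'pass' if ok else 'fail'}")
```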

Fig. 3

Modelled tertiary structure of EhApc11 protein. The structure prediction was done by MODELLER. Chimera was used for model visualization

Table 2 Quality assessment scores of 3D modelled structure of EhApc11
Fig. 4

Quality assessment of the modelled EhApc11 structure by the QMEAN server. a Global quality estimation. b z-score value (− 2.46) of EhApc11 shown as a red star. c Local quality estimation. The QMEAN scoring function depends on four individual components. Local geometry is analysed using a torsion angle potential over three consecutive amino acids. Long-range interactions are assessed by two potentials, one based on Cβ atoms and one on all atoms. Solvation energy is calculated to assess the accessibility of residues to water. A QMEAN z-score of − 4.0 or below is an indication of low model quality. The QMEAN z-score is shown on top, and individual z-scores are displayed below. The local quality plot shows, for each residue of the model (on the x-axis), its expected similarity to available native structures (on the y-axis). Residues with a score below 0.6 deviate from the values expected for native structures

Biological function prediction

STRING analysis revealed that EhApc11 is capable of interacting with a variety of proteins (Fig. 5). Alanyl-tRNA synthetase is an interactor of EhApc11 and is of particular relevance, as aminoacyl-tRNA synthetases appear to be promising drug targets against eukaryotic parasites [46]. Another APC/C subunit [(EHI_009380), NCBI Acc. No. XP_654652.1], which is a homologue of human Apc10 (NP_055700.2), is predicted to be a strong interactor of EhApc11. MEME is a web-based tool to identify and characterize sequence motifs in proteins. The positions of the motifs within EhApc11 are depicted in Figure S4. The presence of a conserved domain can give insights into the cellular or molecular function of a protein along with its evolutionary history [47]. Conserved domains are an important factor when assigning proteins to specific families. EhApc11 was found to be a member of the RING (really interesting new gene) superfamily with a ‘cross-brace’ motif. RINGs are cysteine- and histidine-rich zinc-binding domains reported to play a role in the spatial coupling of E2 and E3 ubiquitin ligase, while the cross-brace arrangement helps in binding two zinc atoms [48].

Fig. 5

Protein-protein interactions of EhApc11 by STRING. a Network map of predicted interactions. b Summary of predicted interacting proteins. The network nodes indicate interacting proteins. The position of EhApc11 is labelled as EHI_135100. Differently coloured lines indicate the categories of evidence used in predicting the associations (green = neighbourhood, red = gene fusion, blue = co-occurrence, black = co-expression, pink = experiments, sky blue = databases, light green = text mining, grey = homology)

Ligand binding site, localization and antigenicity prediction

Proteins bind to other biomolecules or ions to participate in various essential biological processes. The specific key amino acid residues where the interaction between a protein and its ligand occurs are termed the ligand binding site. Prediction of ligand binding sites is fundamental for the functional identification of a protein [49]. The amino acids around ligand binding sites are likely to be conserved among homologous proteins [50]. COACH analysis predicted a total of four ligand binding residues for zinc atoms at positions 23, 26, 56 and 59 with a significant support score (C-score) of 0.75 (Fig. 6). CELLO v.2.5 analysis predicted EhApc11 to be an extracellular protein. EhApc11 was predicted to be antigenic, with an average antigenic propensity of 1.0513 (Table 3; Fig. 7).

Fig. 6

Ligand binding site prediction of EhApc11. COACH output indicates that zinc binds to cysteine residues at positions 23, 26 and 59 and to a histidine residue at position 56 with a C-score of 0.75. C-score values lie between 0 and 1, where a higher score indicates a more reliable prediction

Table 3 Antigenic determinants of EhApc11. The table indicates the positions and sequences that might be associated with the antigenic propensity of EhApc11
Fig. 7

Antigenicity plot and antigenic determinants of EhApc11. Two antigenic determinants (spanning the regions 19–61 and 68–74) are indicated by grey lines within the plot

Molecular characterization

Amplification of EhAPC11 resulted in a PCR product of 264 bp (Fig. 8). A specific product of similar size was obtained by BamHI-SalI double digestion of pGEX-4T-1:EhAPC11. This result confirmed the proper cloning of EhAPC11 into pGEX-4T-1, which was further supported by automated sequencing. An increase in protein expression was observed upon IPTG induction. SDS-PAGE of the expressed protein showed a specific band of ~ 36 kDa, which corresponds to the predicted molecular weight of the GST-tagged recombinant protein (Fig. 9).
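
The reported sizes can be related to the sequence length by simple arithmetic, sketched below. The calculation assumes that the 264 bp corresponds to the 87 codons of EhApc11 plus one stop codon, ignoring the restriction-site tails of the primers, and that the GST tag contributes approximately 26 kDa; both are assumptions made for illustration and are not stated in this study. The 87 residues and the ~ 10.21 kDa predicted mass are the values reported here.

```python
def orf_length_bp(n_residues, include_stop=True):
    """Length in bp of an open reading frame encoding n_residues amino acids."""
    return 3 * (n_residues + (1 if include_stop else 0))

GST_TAG_KDA = 26.0    # assumption: approximate mass of the GST tag
EHAPC11_KDA = 10.21   # predicted mass of EhApc11 reported in this study

print(orf_length_bp(87))                      # 264
print(round(GST_TAG_KDA + EHAPC11_KDA, 2))    # 36.21
```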

Fig. 8

EhAPC11 cDNA amplicon separated by 1% agarose gel electrophoresis. Lane 1, 100 bp ladder; lane 2, PCR product of EhAPC11. The ~ 264-bp band represents EhAPC11 cDNA

Fig. 9

Expression and purification analysis of recombinant EhApc11. Lane 1, uninduced 1 ml crude; lane 2, uninduced supernatant; lane 3, uninduced pellet; lane 4, induced 1 ml crude; lane 5, induced supernatant; lane 6, induced pellet; lane 7, purified protein; lane 8, molecular weight marker. A specific band at ~ 36 kDa indicates the induced EhApc11 fusion protein

Discussion

The aim of the present study was to gain structural and functional insights into EhApc11 using bioinformatic tools. The phylogenetic analysis highlights the conservation of the homologous proteins among reference organisms. The physicochemical properties of a protein are important for predicting its general behaviour. The predicted pI (isoelectric point) of EhApc11 is less than 7, indicating that the protein is acidic. The aliphatic index is the relative volume of a protein occupied by aliphatic side chains. A high aliphatic index indicates thermostability of a protein over a wide temperature range [51]. Therefore, an aliphatic index of 62.64 suggests a thermostable nature for EhApc11. The GRAVY value of a protein is calculated as the sum of the hydropathy values of all amino acids divided by the total number of residues in the sequence [52]. A negative GRAVY value is indicative of better interaction between the protein and water. A GRAVY index of − 0.313 indicates the hydrophilic (globular) nature of EhApc11. Primary structure analysis revealed cysteine as the most abundant amino acid (Fig. 2a). Secondary structure analysis by PSIPRED showed that 65.51% of the total amino acids contributed to coils, 11.49% to helices and 22.98% to strands. The high percentage of random coil may reflect the conformational flexibility associated with enzymatic function. Theoretical structure prediction plays a key role in determining the function of an unknown protein. Structure prediction of EhApc11 was completed successfully, and model quality assessment revealed that the predicted 3D model is of good stereochemical quality and accuracy. As is characteristic of RING superfamily proteins, EhApc11 is rich in cysteine and histidine, which is supported by the multiple sequence alignment data. Nearly 200 RING proteins have been discovered to be associated with oncogenesis, apoptosis, transcriptional regulation and ubiquitination [48]. 
RING finger proteins are the largest class of E3s discovered so far [53]. Members of the RING family are thought to aid in the formation of large molecular assemblies. The RING domain is present in all RING E3s and activates E2-ubiquitin conjugates, thus allowing ubiquitin to be transferred directly from E2 to the target protein. Mutation of Saccharomyces cerevisiae Roc1 (ScRoc1), a homologue of the Apc11 protein, has been reported to abolish the ligase activity of the complex [54]. This indicates the functional conservation of Apc11 in maintaining E3 ubiquitin ligase activity. STRING analysis revealed that EhApc11 can potentially interact with various proteins involved in the cell cycle pathway and also with EhApc10. This is significant because it has been reported that human Apc10 may regulate the interaction between Apc2 and Apc11 [55]. Only proteins with confidence scores above 0.9 were included in the result (with a maximum of 20 interactors in the first shell). The predicted extracellular nature of EhApc11 is a positive aspect, as extracellular proteins are generally considered good drug targets because of their accessibility. The presence of ligand binding sites for zinc atoms reflects the conserved nature of the RING domain present in EhApc11. The ligation of zinc is essential for proper folding of the domain and subsequent biological functions. The identification of antigenic epitopes helps to locate protein regions capable of generating a robust immune response, which may be significant for producing antibodies. Our bioinformatic analysis revealed two regions that might be associated with the antigenicity of EhApc11. However, further research is needed to determine their applicability in this respect. The SDS-PAGE analysis showed that recombinant EhApc11 was properly expressed in bacteria.

Conclusion

The UPS plays a key role in maintaining cellular homeostasis. As part of the UPS, APC/C is an important E3 ubiquitin ligase that regulates cell cycle events through controlled proteolysis of specific cell cycle proteins, and it may also serve as a potential anti-amoebic target. Analysis of the structural and functional properties of EhApc11 therefore helps to clarify this potential. In silico analysis revealed that EhApc11 is a hydrophilic, extracellular protein with a molecular weight of ~ 10.21 kDa. The predicted 3D structure may help in further in-depth analysis of protein-protein interactions and in docking studies, as crystal structure-related details of EhApc11 are scarce. The predicted extracellular nature of EhApc11 makes it easily accessible to the host immune system. The protein possesses two antigenic determinants and belongs to the RING superfamily. The expression study in E. coli BL21 (DE3) shows that the molecular weight of GST-tagged EhApc11 is ~ 36 kDa. While the study has provided basic insights into the structural and functional aspects of the protein, detailed experiments to validate the protein-protein interactions identified by STRING, together with phenotypic analysis of Apc proteins, would yield new knowledge about this human pathogen. This would ultimately help to identify new therapeutic targets against amoebiasis in the near future.