Background

Malaria, a mosquito-borne disease, kills roughly 627000 people every year, mostly infants in Africa. It affects about 198 million patients annually (World Health Organization, 2013, http://www.who.int/malaria/media/en/). It is caused by parasites which are clubbed under genus Plasmodium. Among them, P. falciparum is encountered most commonly and is deadliest [1]. Though there are myriad drugs to treat the menace but the increasing instances of resistance against antimalarial drugs are becoming a deepening concern day by day. In recent years, several cases of resistance have been detected across the globe against artemisinin drugs [2]. This underscores the need to discover resilient drugs to combat malaria in future. Therefore, in this effort, several molecular drug targets have been identified to develop new drug candidates. An important drug target is M18 aspartyl aminopeptidase (M18AAP) which is expressed in the cytoplasm of P. falciparum by a single copy of PfM18AAP gene. M18AAP interacts with the human erythrocyte membrane protein Spectrin and other proteins during disease kicking off erythrocytic life cycle, and it is essential for the survival of this parasite in Blood cells. It has been reported that the malaria parasites mutated with M18AAP enzyme are not able to survive, which proves that this plays a critical role in the survival of P. falciparum and could serve as an important molecular target to develop potential therapeutic agents to control malaria infection [3]. In modern times, virtual screening methods like QSAR, pharmacophore modeling, molecular docking have been proved a valuable tool for rapid discovery of novel drug candidates, e.g., the discovery of O-Acetyl-L-Serine Sulfhydrylase of Entamoeba histolytica inhibitors, acetylcholinesterase inhibitors, and antagonists Acetophenazine, fluphenazine and periciazine against Human androgen receptor [46]. In the drug development, the study of Quantitative structure-activity relationships (QSAR) plays an important role to analyze the properties of drugs. QSAR is a mathematical model that relates chemical descriptors of compounds to their quantity showing specific biological or chemical activity [7]. The molecular descriptors for the compounds are calculated and used to derive QSAR Model [8]. In the present study, the known bioactive dataset was used to build 3D QSAR models using partial least square regression (PLSR) [9], principal component regression (PCR) [10, 11] and k-nearest neighbor-molecular field analysis (kNN-MFA) methods [12]. After that, pharmacophore mapping was performed to identify the binding modes and structural features of the ligands and followed by molecular docking. The generated models provided a valuable reference which could be applied in the designing of pharmaceuticals with improved antimalarial activity. In the end, virtual screening of antimalarial compounds from ChEMBL Bioassay, and other dataset were also carried out to identify novel potential inhibitors which could be better as compared to the known inhibitors of PfM18AAP.

Methods

Dataset of experimental PfM18AAP inhibitors

A dataset of 32 compounds known as inhibitors of PfM18AAP was extracted from National Center for Biotechnology Information PubChem bioassay (AID 743024) (https://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=743024). Another high throughput screened dataset of 3502 known bioactive inhibitors of PfM18AAP was extracted from AID 1822 used for docking studies against PfM18AAP (http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=1822). A library of 153,873 compounds was obtained from the ChEMBL antimalarial database used for finding novel inhibitors against PfM18AAP metalloproteinase [https://www.ebi.ac.uk/chembl/]. Additionally, 27 antimalarial drugs described by WHO, 32 analogous of quinine compounds(QN) (AID 660170), 24 analogous of chloroquine (CQ) (AID 404780), and 17 analogous of 8-aminoquinoline(8-AmQN) (AID 554037) were also extracted for molecular docking, 3D QSAR model and pharmacophore similarity search. 2D structures were converted to 3D structures using Corina 2.64v [13] and open babel [14].

Molecular descriptors

The molecular descriptors were calculated by VLifeMDS version 4.3 using Gasteiger-Marsili charge [15, 16]. The PfM18AAP inhibitors compounds along with their activity pIC50 values were given as input for force field calculation. The steric and electrostatic interaction energies are computed using a methyl probe of charge +1.

Development of 3D QSAR models

The biological activity (pIC50) of inhibitors was selected as dependent variables and descriptors as independent variables. The 60 % data for the training set and 40 % for test set were manually selected. The unicolumn statistics were calculated to validate training and test sets. The 3D QSAR models were built using PLSR, PCR, and kNN-MFA by stepwise forward-backward method [17].

3D QSAR Model validation

Internal validation

To perform internal validation (cross validation), a compound is eliminated from the training set and then its biological activity is predicted to validate model accuracy. This step is repeated until the biological activity of every compound in the training set is predicted once. The cross-validated coefficient, q2 is calculated using the given Eq. (1):

$$ {q}^2=1-\frac{{\displaystyle \sum }{\left({y}_i-{\widehat{y}}_i\right)}^2}{{\displaystyle \sum }{\left({y}_i-{y}_{means}\right)}^2} $$
(1)

Where, yi and \( {\hat{y}}_i \) are the actual and predicted activities of the ith molecule in the training set respectively, and ymeans is the average activity of all the molecules in the training set [18, 19].

External validation

External validation (pred_r2) is carried out by calculating predicted correlation coefficient (pred_r2) value using following Eq. (2):

$$ pred\_{r}^2=1-\frac{{\displaystyle \sum }{\left({y}_i-{\widehat{y}}_i\right)}^2}{{\displaystyle \sum }{\left({y}_i-{y}_{means}\right)}^2} $$
(2)

Where, yi and \( {\hat{y}}_i \) are the actual and predicted activities of the ith molecule in the test set, respectively, and ymeans is the average activity of all the molecules in the training set.

A Z-score value is calculated by the following Eq. (3):

$$ Zscore=\frac{\left(h-\mu \right)}{\sigma } $$
(3)

Where, h is the q2 value calculated for the actual dataset, μ is the average q2and σ is the standard deviation calculated for various models built on different random datasets [20].

F-test is Fisher value which indicates statistical significance, a value greater than 30 is considered good, which gives an idea of the chances of failure of the model. On the other hand, q2_se is the standard deviation of cross validated prediction and r2_se is standard deviation is a measure of the absolute quality of a model.

Pharmacophore modeling

The pharmacophore model was built using the Phase module of Schrodinger maestro [21]. The same set of inhibitors of PfM18AAP was subjected to LigPrep module which produces high-quality, all-atom 3D structures with correct chirality. Some pharmacophore hypotheses were generated along with their respective set of aligned conformations. These hypotheses were generated by a systematic variation of many sites and a number of matching active compounds. These selected features were used to build a series of pharmacophore hypotheses by selecting find the common pharmacophore option in phase. The common pharmacophore hypotheses were analyzed using the survival score to yield the best alignment of the active ligands using a maximum overall root mean square deviation (RMSD) value of 2 Å for distance tolerance. Finally, several pharmacophore hypotheses were generated along with their respective set of aligned conformations. All pharmacophore hypotheses were scored for active survival, inactive survival, site, vector, volume, the number of matches, selectivity, energy, active, and inactive terms. Survival score secured by each hypothesis is the measure of the quality of alignment for a particular hypothesis [22].

Docking and scoring

Molecular docking

To understand the nature of the interaction of inhibitors described above [23] with PfM18AAP, molecular docking was performed using GOLD v5.2 (Genetic Optimization for Ligand Docking) [24] and GLIDE module of Schrödinger using [21] against the PfM18AAP. The crystal structure of PfM18AAP (4EME) was obtained from protein data bank (www.rcsb.org/pdb/explore/explore.do?structureId=4eme). Since PfM18AAP requires cofactors for enzymatic activity, Zn was retained during docking analysis [25]. In GOLD docking, the 10 best docked complexes were ranked based on their GOLD fitness score. In GLIDE docking, the top 10 compounds were selected based on G-score. The binding affinity of docked complex was calculated using X-Score v1.2.1 [26]. Protein-ligand interaction was analyzed by using Pymol version 1.1r. www.pymol.org/ and LigPlot + v1.4.5 [27].

Screening of PfM18AAP inhibitors

In this work, High Throughput Virtual Screening (HTVS) used Glide module of the Schrodinger software suite [21]. The ligand libraries were first prepared by adding hydrogen and generating conformations through the LigPrep module. This LigPrep module generated tautomer with the OPLS2005 force field, the total no. of 411,766 output structures were obtained. Then grid on the protein active site was generated. Firstly, HTVS for every ligand library was done and the top 1000 ranked compounds from every library were subjected to Extra-Precision (XP) screening. In both the cases, the structures were flexibly docked on the protein structure. The non-planar conformations were penalized. Structures were having more than 200 atoms or more than 35 rotatable bonds were not docked. Also, the Van Der Waal’s radius scaling factor was set to 0.8, and the partial charge cutoff was set to 0.15. From these 1000 compounds, the top 10 compounds from every library were extracted as target-bound complexes. These complexes were re-scored, and their binding affinity was calculated using X-score software.

Results

3D QSAR modeling using PLSR Method

A dataset known as inhibitors of PfM18AAP (AID: 743024) was used for the unicolumn statistics analysis, which showed that the training and test sets were suitable for 3D QSAR model development. The test set is interpolative i.e. derived within the min-max range of the training set. The unicolumn statistics scores were shown in Table 1. The PLSR model demonstrated that descriptors S_356, S_660, E_996, and S_270 are important features to inhibit the activity of PfM18AAP, which represent steric and electrostatic field energy of interactions. The statistical parameters calculated for developed 3D QSAR model for PLSR shown in Table 2. The number suffixed with descriptors represents its position on the 3D spatial grid.

Table 1 Unicolumn statistics for training and test set
Table 2 The statistical parameters for PLSR, PCR and 3D-QSAR models

Equation 4 represents the PLSR 3D QSAR model:

$$ \mathrm{pIC}50 = \hbox{-} 0.0270\ \left(\mathrm{S}\_356\right) + 0.0182\left(\mathrm{S}\_660\right)\ \hbox{-}\ 0.0905\left(\mathrm{E}\_996\right)\ \hbox{-}\ 0.0125\left(\mathrm{S}\_270\right) + 6.1966 $$
(4)

3D QSAR modeling using PCR

The 3D QSAR Model was developed on the same datasets of molecules by PCR method, and several statistical parameters were calculated which are shown in Table 2. The number suffixed with descriptors represents its position on the 3D spatial grid. This model indicated that descriptors are significant for their biological activities.

Equation 5 represents PCR 3D QSAR model:

$$ \mathrm{pIC}50 = \hbox{-} 0.0321\left(\mathrm{S}\_356\right) + 0.0147\left(\mathrm{S}\_660\right)\ \hbox{-}\ 0.0886\left(\mathrm{E}\_996\right)\ \hbox{-}\ 0.0092\left(\mathrm{S}\_270\right) + 6.3423 $$
(5)

3D QSAR Modeling using (kNN-MFA)

The kNN-MFA model shown that the contributing descriptors E_862 (1.0026 1.1562), S_629 (-0.4639 -0.1045) and S_287 (-0.3372, -0.2663) which indicated that degree of amino group shows potent activity. The range at the lattice point E_862 (1.0026, 1.1562) which is positive that means substitution with more electron density could yield more active molecules.

Pharmacophore-based screening of PfM18AAP inhibitors

From the Phase Software, ten hypotheses (pharmacophore models) were generated having four features DHRR (one hydrogen bond donor (D), hydrophobic groups (H) and two aromatic rings (R)). These features were common to all of the 15 compounds of the assay. Common pharmacophore hypothesis is shown in Fig. 1. The best model was chosen based on the survival score and pharmacophore based QSAR. The final hypothesis, DHRR.31 model, was selected based on the survival score and pharmacophore based QSAR, which showed the best alignment of the active set along with the site score (0.79), vector score (0.949), and volume score (0.527), top 5 model is shown in Table 3.

Fig. 1
figure 1

Diagram showing pharmacophore alignments of known PfM18AAP inhibitors (AID 743024)

Table 3 The statistical values of top 5 the pharmacophore hypotheses

Molecular docking

The same data set used for QSAR and Pharmacophore modeling was subjected to the molecular docking analysis. The top 10 compounds showed GOLD fitness score from 60.62 to 39.81 and predicted binding energy from -6.43 to -7.38 kcal/mol (calculated using the X-Score) and G-score from -7.80 to -4.70 kcal/mol (Table 4). The Ligplot + analysis showed that Ser116 and His87 amino acids interact by h-bond interaction, with docked ligands. Since PfM18AAP requires a cofactor for enzymatic activity, docking was performed along with cofactor bound with specific amino acids. A docked complex is depicted in Fig. 2. These results suggest that the novel PfM18AAP inhibitors could be designed considering parameters of docking results leading to new potent drugs against malaria.

Table 4 Top scoring compounds screened using the selected pharmacophore hypothesis
Fig. 2
figure 2

Docked Complex of PfM18AAP with known ligand 4-[(7-chloroquinolin-4-yl) amino]-2-(diethylaminomethyl) phenol

Molecular docking analysis was done on another dataset (AID1822:3502 molecules from PubChem Bioassay) known inhibitors of PfM18AAP. The top 10 compounds showed G-score from -7.72 to -6.52 kcal/mol. The G-score indicated that these compounds (Table 5) might bind to pfM18AAP with good binding affinity. Further, predicted binding affinity calculated using X-score for best compounds was found to be in between from -9.54 to -6.51 kcal/mol (Table 5).

Table 5 Prediction of pIC50 Value of current antimalarial drugs described in the WHO

HTVS based screening of PfM18AAP inhibitors

ChEMBL antimalarial dataset (153873) was subjected to molecular docking. The top 10 compounds (after docking), based on their G-score are shown in Table 6. The glide score of these compounds varies from -12.50 to -10.45 kcal/mol. The G-score indicated that these compounds (Table 6) have a good binding affinity for PfM18AAP enzyme. Figure 3 shows the docked complex of ligand CHEMBL1506682 (2-(3,4-Dihydroxyphenyl)-5,7-dihydroxy-4-oxo-4H-chromen-3-yl hexopyranosid-uronic acid) in the active site of the receptor with best G-score (-12.50 kcal/mol).To further validate in silico, predicted binding affinity of the best pose obtained from docking studies for each compound was calculated using X-score program was found to be in between -8.28 and -6.89 kcal/mol shown in Table 6.

Table 6 Top scoring of QN, CQ and 8 Amino-QN analogous screened using the selected pharmacophore hypothesis
Fig. 3
figure 3

Ligplot diagram and docked Complex of PfM18AAP with ligand ChEMBL Database Compound [2-(3,4-Dihydroxyphenyl)-5,7-dihydroxy-4-oxo-4H-chromen-3-yl hexopyranosiduronic acid]

Discussion

The best model was selected through the comparison between fitness plots (Fig. 4) and radar plots for training and test sets (Fig. 5 (a, b)). The linear graphical representation of fitness plots shows the observed and predicted activities of the data set. The radar plots show the training and the test sets separately by the red (actual activity) and blue (predicted activity) lines. The radar plot for training set represents a good r2 value because the two lines show a good overlap while for the test set a good overlap represents high pred_r2 value. The PLSR contribution plot for the descriptor is given in Fig. 6 which represents the contribution of various descriptors which are important for the inhibitory activity. In PLSR and PCR models, the negative value in electrostatic field descriptors indicates that negative electronic potential is required to increase antimalarial activity, and more electronegative groups are preferred in that position. Though positive value in kNN-MFA model shows that group that imparting positive electrostatic potential is favorable for antimalarial activity, so less electronegative group should prefer in that region. Similarly, negative values in steric descriptors indicate that negative steric potential is favorable for activity, and less lipophilic substitutions or bulky substituents group should be considered in that region, positive value of steric descriptors reveals that positive steric potential is favorable to increase antimalarial activity as in case of 4-[2-(quinolin-4-ylamino)ethyl] benzene-1,2-diol, and more bulky group is advised to prefer in that region. Comparison of statistical parameters of PLSR, PCR, and kNN-MFA, is shown in (Additional file 1) and the predicted pIC50 values in (Additional file 2).

Fig. 4
figure 4

Scatter plots showing the correlation between actual versus predicted activities for training and test set molecules by using 3D QSAR model- PLSR, PCR, and kNN-MFA

Fig. 5
figure 5

Radar plots showing the actual and predicted activities for a Training set b Test set molecules by using 3D QSAR PLSR model

Fig. 6
figure 6

Plot of the percentage contribution of each descriptor in developed 3D QSAR PLSR model explaining variation in the activity

In the present work, we performed screening of CHEMBL antimalarial library to search antimalarial compounds based on the pharmacophoric hypothesis DHRR.31, which resulted in 29,671 compounds. These compounds were subjected to glide docking against PfM18AAP. The top 10 compounds were selected based on the fitness and G-score; predicted activities are shown in Table 7. Further we also carried out screening of 27 WHO antimalarial drugs which resulted in 14 molecules shown in Table 8. Moreover, 17 compounds of 8-aminoquinolines analogous, 24 compounds of CQ analogous and 32 compounds of 8 amino-QN analogous were subjected to screening resulting 17,19, and 22 PfM18AAP inhibitors respectively (Table 8). The resultant top 2 compounds from each analogous were selected based on the fitness and G-score; predicted activities are shown in Table 9. The study found that WHO current antimalarial compound CHEMBL682 (Amodiaquine) has highest predicted value of pIC50 6.38 which is also present in the known dataset of PfM18AAP with pIC50 value 6.72.

Table 7 Molecular Docking Results for known inhibitors (AID743024) against PfM18AAP
Table 8 Molecular Docking Results for known inhibitors (AID1822) against PfM18AAP
Table 9 Top scoring 10 potential inhibitors from CHEMBL antimalarial Library against PfM18AAP

We analyzed the types of interactions of each top ranked compound for known inhibitors (AID1822) against PfM18AAP; 2D plots were generated using Ligplot + software and ligand-protein complex. The number of hydrogen bonded interactions, lipophilic interactions and the number of non-bonded interactions was counted and tabulated in Table 5. It is observed that overall all compounds from C1 to C10 have formed at least 1 (C1 and C10), mostly 4 (C3, C4, C7, C8, and C9), and at most 5 (C6) hydrogen bonds. The total number of lipophilic interactions for each compound varies in between 9 (for C9, C10) and 4 (for C3 and C7). Also, the total number of non-bonded interactions for each compound varies from 29 (for C5) to 76 (for C2). These observations suggest that the compounds C3, C4, C6, C7, C8, and C9 have better specificity as they have more hydrogen bonds and compounds C1, C2, C9, and C10 have good binding affinity due to a high number of hydrophobic contacts. The Compound C1 showed interaction with Glide score -7.72 kcal/mol. The docking poses analysis of C1shows one hydrogen bond (Gly509) interaction with amino acid residues of the protein. The next favorable interaction is shown by C2 with G-score of -7.71 kcal/mol and four hydrogen bond interactions with the active site residues Ser116, Asp325, Met436 and Lys463, 76 nonbonded interactions and inhibition (75.26 %) and eight hydrophobic interactions. The Compound C6 showed highest five hydrogen bond interaction (His438, Asp325, Glu380, His87, and His535). Asp325 is found to be the most conserved residues, which is present in 6 out of 10 compounds and Ser116 is found to be the most conserved residues, which is present in 5 out of 10 compounds. Hence, based on the Docking analysis against antimalarial PfM18AAP inhibitors, we conclude that these compounds have a better affinity with PfM18AAP enzyme, thus are novel potential candidate to develop drugs against malaria.

Further, we also analyzed the interactions of CHEMBL antimalarial library’s top ranked inhibitors against PfM18AAP (Table 6). The highest X score of - 11.6 kcal/mol was obtained with the ligand (CHEMBL1506682) having three hydrogen bond (Ser116, Glu381, and Met436) interaction with amino acid residues of the protein. The total number of lipophilic interaction for each compound varies in between 9 (CHEMBL602830 and CHEMBL429) and 4 (for CHEMBL511171). This observation suggests that CHEMBL1506682 have better specificity and CHEMBL602830 have a good binding affinity. Ser510 and Glu380 are found to be the most conserved residues, which is present in 5 out of 10 compounds. Hence, based on the comparison between known bioactive antimalarial M18AAP inhibitors (as control) and top ten novel ChEMBL compounds, we conclude that these compounds could bind to PfM18AAP with better affinity, thus are the potential candidate to develop drugs against malaria.

Conclusions

The present study was aimed at generating the predictive 3D QSAR models capable of revealing the structural requirements for antimalarial inhibitors of PfM18AAP. The comparison of the different statistical parameters of the three models suggests that PLSR model is best due to better internal validation q2 = 0.6128 and an external test of pred_r2 = 0.6101. Model 3 (kNN-MFA) also had a good internal validation showing q2 =0.7641, but the external validation had a bad pred_r2 = 0.0366. Therefore both PLSR and PCR models show potential predictive ability as determined by testing the external test set. Thus, 3D QSAR modeling provided a better understanding of the structural requirements of antimalarial compounds, which could help design potent PfM18AAP inhibitors. Also, pharmacophore mapping was applied to identify the binding modes and structural features of the ligands which are important for the biological activity of the inhibitors. The pharmacophore modeling showed that hypothesis DHRR.31 represented the best pharmacophore model for determining PfM18AAP inhibitory activity. Results suggested that the proposed DHRR.31 model can be used to identify the new M18AAP inhibitor and to design a drug rationally for p. falciparum from the extensive 3D database of molecules. Further, HTVS using Glide resulted in several potent PfM18AAP inhibitors from ChEMBL antimalarial data set of 153,873 compounds. These novel compounds having an excellent binding affinity with PfM18AAP are better candidates to design the drug in future. Finally, the 3D QSAR model was deployed on different data set to prioritize PfM18AAP inhibitors and predict new inhibitors. Thus, our study advocates the use of combined approaches of 3D QSAR, pharmacophore modeling, and molecular docking to search for novel potential inhibitors unique to PfM18AAP, which is essential and validated drug target involved in performing various enzymatic functions such as hemoglobin digestion, erythrocyte invasion, and parasite growth in the host cell.