Background

World Health Organization reported 228 million cases and 405,000 deaths worldwide from malaria disease in 2018, where children (< 5 years) accounted for 272,000 deaths. This made it one of the most lives threatening disease prevalent in Africa, where 93% and 94% of the 2018 global cases and deaths respectively came from the region (World Health Organization, 2019). Nigeria is the country with the highest malaria burden responsible for 25% and 24% of the 2018 global cases and deaths, respectively (World Health Organization, 2019). Out of the five species of Plasmodium parasite known to cause malaria, Plasmodium falciparum is the most deadly (Cohen et al., 2012) and commonest in the WHO African region responsible for 99.7% of estimated malaria cases of the region in 2018 (World Health Organization, 2019). Therefore, controlling the disease is important to global health.

Malaria is prevented by vector control through the use of treated mosquito nets and insecticides (World Health Organization, 2019). And the disease is treated by chemotherapies using quinoline derivatives (Salas et al., 2013), artemisinin and its derivatives (Jaen et al., 1990), antifolate drugs (Newton & White, 1999), and some antibiotics (Rathore et al., 2005). But, treatment and elimination of malaria remains a global challenge due to the parasite resistance to existing drugs, mosquito resistance to insecticides, and lack of successful malaria vaccine (Singh et al., 2015). WHO recommends the use of artemisinin-based combination therapies (ACTs) as first-line treatment for P. falciparum (Martinelli et al., 2008). However, resistance to ACTs is increasing in South East Asia Region, the Greater Mekong Subregion (GMS), and Western Pacific Region (World Health Organization, 2019). This could spread to other regions of the world, therefore, the need for highly potent antimalaria with low propensity to resistance.

Krake and coworkers studied 2,5-disubstituted furans as novel inhibitors of P. falciparum (Krake et al., 2017). Their study revealed that the furans have good in vitro activities (in EC50) against P. falciparum NF54 strain, low propensity to generate resistance and long half-life which is a good characteristic of antimalaria in accordance with Medicines for Malaria Venture (MMV) objective (Krake et al., 2017). But the analogues had tight Structure Activity Relationship (SAR) limitations, and their activities need improvement. In silico studies are efficient modeling techniques used to screen chemical databases so as to find novel drug leads (Tropsha, 2010). Herein, we carried out quantitative structure activity relationships (QSAR) on the 2,5-disubstituted furans analogues so as to come up with an empirical relationship between the chemical structures of the furans and their antiplasmodium activities, which could be used to design novel 2,5-disubstituted furans with better activities against P. falciparum. We also performed molecular docking study on the furans with Plasmodium falciparum lactate dehydrogenase (PfLDH) as the potential target so as to investigate the mode of interactions of the compounds with the target and have insight into ways of improving their antiplasmodium activities.

Materials and method

Data collection

2,5-Disubstituted furans with their in vitro activities against Plasmodium falciparum NF54 strain were gotten from the paper of Krake and coworkers (Krake et al., 2017) for used in this research. The antiplasmodial activities of the furans obtained as EC50 (nM) were converted to pEC50 (pEC50 = − logEC50) for the purpose of this research. The structures of the furans molecules and their activities are presented in Table 1.

Table 1 Molecular structures of the 2,5-disubstituted furans with their activities against P. falciparum NF54 strain

Geometric optimization

The molecules presented in Table 1 were drawn with the aid of Chemdraw (Li et al., 2004), and optimized using B3LYP functional and 6-31G basis set with the aid of the Spartan 14 software Version 1.1.4 (Becke, 1993).

Descriptors computation, normalization and pretreatment

The optimized molecules were imported to the PaDEL-Descriptor software in an SD file format, where 1875 molecular descriptors of the furans molecules were calculated (Yap, 2011). The descriptors were normalized based on Eq. (1) in order to give each variable equal opportunity to influence the construction of a high-quality model (Singh, 2013).

$$ X=\frac{X_i-{X}_{\mathrm{min}}}{X_{\mathrm{max}}-{X}_{\mathrm{min}}} $$
(1)

In the equation, X is the normalized descriptors, Xi is the descriptor’s value for each molecule, and Xmin and Xmax are the minimum and maximum values for each descriptor. Redundancy in the normalized data was eliminated using the Data Pretreatment software obtained from Drug Theoretical and Cheminformatics Laboratory (DTC Lab).

Data division

Using the Data Division software of DTC Lab, Kennard Stone method was employed in dividing the pretreated data into training set (70%) and test set (30%) (Kenard & Stone, 1969). The training set was used for model generation and internal validation while the test set was used for external validation.

Model generation

Genetic Function Approximation (GFA) incorporated in the Material Studio software was employed to select descriptors and carry out regression analysis on the training set so as to generate the model using the descriptors and the activities of the furans in pEC50 as the independent and the dependent variable, respectively.

Validation of the model generated

The model generated was assessed using Friedman formula (Friedman, 1991) defined as follows:

$$ \mathrm{LOF}=\frac{\mathrm{SEE}}{{\left(1-\frac{c+ dp}{M}\right)}^2} $$
(2)

LOF is the Friedman’s lack fit which is a measure of fitness of a model; SEE stands for standard error of estimation; p, d, c, and M are total number of descriptors in the model, user-defined smoothing parameter, number of terms in the model, and the number of compound in the training set, respectively.

SEE, the standard deviation of the model whose value has to be low for a good model is defined as follows:

$$ SEE=\sqrt{\frac{{\left({Y}_{\mathrm{exp}}-{Y}_{prd}\right)}^2}{N-P-1}} $$
(3)

Square correlation coefficient, R2, of a built model is another parameter considered and the closer it is to 1.0, the better the model built. This is expressed as follows:

$$ {R}^2=1-\sqrt{\frac{\sum {\left({Y}_{\mathrm{exp}}-{Y}_{\mathrm{prd}}\right)}^2}{\sum {\left({Y}_{\mathrm{exp}}-{Y}_{\mathrm{mtrn}}\right)}^2}} $$
(4)

Yprd, Yexp, and Ymtrn are the predicted, experimental, and mean experimental activities in the training set, respectively.

The value of R2 is directly proportional to the number of descriptors; hence, the stability of the model is not reliable on it. Thus, to have a reliable and stable model, R2 is adjusted according to the expression as follows:

$$ {R}_{\mathrm{adj}}^2=\left(\frac{\left(n-1\right)\left({R}^2-p\right)}{n-p-1}\right) $$
(5)

where p is the number of descriptors in the model, and n is the number of compounds used in training set.

Cross-validation coefficient, Q2cv, is expressed as follows:

$$ {Q}_{\mathrm{cv}}^2=1-\frac{\sum {\left({Y}_{\mathrm{prd}}-{Y}_{\mathrm{exp}}\right)}^2}{\sum {\left({Y}_{\mathrm{exp}}-{Y}_{\mathrm{mtrn}}\right)}^2} $$
(6)

where Yprd, Yexp, and Ymtrn are the predicted, experimental, and average experimental activity in the training set, respectively.

Square correlation coefficient, R2pred, of the generated model using test set was calculated for external validation. The closer the value is to 1.0, the better the model generated (Tropsha et al., 2003). It is expressed as follows:

$$ {R}_{\mathrm{pred}}^2=1-\frac{\sum {\left({Y}_{\mathrm{prd}}-{Y}_{\mathrm{exp}}\right)}^2}{{\left({Y}_{\mathrm{exp}}-{Y}_{\mathrm{mtrn}}\right)}^2} $$
(7)

Yprd and Yexp are respectively the predicted and experimental activities of the test set, and Ymtrn is the mean experimental activity of the training set.

Descriptors contribution

Degree of contribution of each descriptor in affecting the antiplasmodium activities of the furans was accessed by Mean Effect (MFj) value expressed as follows:

$$ {\mathrm{MF}}_{\mathrm{j}}=\frac{\upbeta_j{\sum}_{i=1}^{i=n}{d}_{ij}}{\sum_j^m{\upbeta}_j{\sum}_i^n{d}_{ij}} $$
(8)

bj is the coefficient of descriptor j in the model, dij is the value of the descriptor j in the descriptor matrix for each molecule in the training set, and m and n are the number of descriptors in the model and the number of training set molecules (Habibi-Yangjeh & Danandeh-Jenagharad, 2009). Variance inflation factor, VIF, and correlation analysis were performed on the descriptors in other to check for multi-co-linearity problem.

Y-randomization test

Random multi-linear regression models were generated (using training set) in Y-randomization test whose R2 and Q2 values have to be low for the QSAR model to be robust (Tropsha et al., 2003). Coefficient of determination, cR2p, whose value has to be greater than 0.5 for passing this test is also calculated in the Y-randomization test and is expressed as follows:

$$ {cR}_p^2=R\kern0.5em x{\left({R}^2-{R}_r^2\right)}^2 $$
(9)

R is the correlation coefficient for Y-randomization, and R2r is the average “R” of the random models.

Applicability domain of the generated model

The applicability domain of the model was determined using leverage (hi) (Veerasamy et al., 2011) and is expressed for a compound, i, as follows:

$$ {h}_i={X}_i{\left({X}^TX\right)}^{-1}{X}_I^T $$
(10)

Xi is the training compounds matrix of i. X is the n × k descriptor matrix of the training set compound, and XT is the transpose matrix of X used to generate the model. The warning leverage, h*, is the maximum value for X and is expressed as follows:

$$ {h}^{\ast }=\frac{3\left(p+1\right)}{n} $$
(11)

n is the number of training compounds, and p is the number of descriptors in the model.

Molecular docking

Optimized structures of chloroquine and nine furans molecules having the best antiplasmodium activities were saved as PDB file from the Spartan software and exported to the pyrex software were they were converted to PDBQT file. Crystal structure of the enzyme, Plasmodium falciparum lactate dehydrogenase (pfLDH) was obtained from protein data bank (PDB ID: 1CET) (Read et al., 1999), prepared by removing water molecules, co-crystallized ligand and hetero-atoms using the Discovery Studio software, then saved as PDB file and exported to the pyrex software were it was converted to PDBQT file. The prepared furan molecules (the ligands) and the prepared enzyme (the receptor) were docked using Autodock Vina in the Pyrx software after assessing the reliability of the protocol by re-docking of the enzyme with chloroquine (Trott & Olson, 2010).

Results and discussion

Genetic Function Algorithm (GFA) was used to generate QSAR model that relate the chemical structure of 2,5-disubstituted furans with their biological activities as active antimalarials. The model is presented as follows:

$$ {\displaystyle \begin{array}{l}{\mathrm{pEC}}_{50}=\hbox{-} 1.724449942\mathrm{GATS}5\mathrm{c}+1.042828833\mathrm{minsCl}\\ {}\cdots \kern2em \hbox{-} 0.133382678\mathrm{RDF}130\mathrm{m}\hbox{-} 0.185848627\mathrm{RDF}75\mathrm{p}\\ {}\cdots \kern2.5em \hbox{-} 0.029658471\mathrm{RDF}115\mathrm{s}+8.465152376\end{array}} $$

A predictive, robust, and reliable QSAR model that passed all the validation criteria was generated. The validation parameters of the model were given in Table 2. The model contained autocorrelation (GATS5c), atom type electro-topological (minsCl), and radial distribution function (RDF130m, RDF75p, and RDF115s) descriptors. GATS5c is a 2D Geary autocorrelation of lag 5 weighted by gasteiger charge. Here, gasteiger charge is the physico-chemical property calculated for every atom in the molecule and is the Geary coefficients. Increase in GATS5c decreases the pEC50 of the furans and vice versa. Minimum atom-type E-State: -Cl, minsCl is a 2D descriptor which showed the important of the present of chlorine atoms in the furans molecules. This descriptor incorporated the steric and electronic effects of the atoms surrounding any chlorine atom in the molecules (Kier & Hall, 1999), and it had positive contribution to the model; hence, increase in the descriptor increases the antiplasmodium activities of the furans. The third class of descriptors in the model was a radial distribution function descriptor denoted as RDFЅω where ω is any measurable molecular property, and S is the radius from the geometrical center of the molecule at which ω is measured for any atom or group of atoms in the molecule. S and ω for the descriptors RDF130m, RDF75p, and RDF115s were atomic mass, atomic polarizability, and electronic inductive effect (I-state) and 13 Å, 7.5 Å, and 11.5 Å, respectively. These descriptors contributed negatively to the antiplasmodium activities of the furans that is increase in them decreases the activities of the furans. The power of model in predicting the antiplasmodium activities of the furans is indicated by the low residual values (Table 3).

Table 2 Validation parameters of the model
Table 3 Experimental and predicted activities for the compounds with residual

Table 4 presents the Pearson’s correlation matrix, variance inflation factor and Mean Effect of the five descriptors in the model. The highest coefficient in the correlation matrix was 0.465085 between the descriptors RDF75p and minsCl which showed no significant inter-correlation among the descriptors in the model, and the variance inflation factor values for all the descriptors were less than 2. These mean that the descriptors in the model were good and void of multi-co-linearity (Beheshti et al., 2016).The Mean Effect revealed the extent to which each descriptor contributed to the antiplasmodium activities of the compounds. Therefore, the most important descriptor in the model was GATS5c, and the least was minsCl. The average R, R2, Q2, and the cRp2 from the Y-randomization test result presented in Table 5 confirmed the reliability, robustness, and stability of the built QSAR model (Tropsha et al., 2003; Roy et al., 2016). Hence, the model was not gotten by chance.

Table 4 Pearson’s correlation, VIF, and MF of the descriptors in the model
Table 5 Y-randomization result

The plot of predicted against experimental activities of the compounds as presented in Fig. 1, which indicated the predictive power of the generated model as shown by the linearity of the plot. Therefore, the model is good in predicting the antiplasmodium activities of 2,5-disubstituted furans scaffold. The generated model is void of systematic error as shown by the distribution of standardized residual on both sides of zero (Fig. 2) (Jalali-Heravi & Kyani, 2004). The plot of standardized residuals versus leverages (Williams plot) is given in Fig. 3. It showed that all the training and test set compounds are within the applicability domain as their leverage values was less than the warning leverage (h∗ = 1.125) and their standardized residuals within ± 2.5. Therefore, there is no outlier or influential compound; hence, any of the compound can be used in designing new potent 2,5-disubstituted furans molecules.

Fig. 1
figure 1

Plot of predicted activities against experimental activities of the compounds

Fig. 2
figure 2

Plot of standardized residual activity against experimental activity

Fig. 3
figure 3

Williams plot

The docking result shows that out of the nine docked furans, five had greater binding affinity, and almost all had greater interactions with the enzyme than chloroquine (Table 6). The compounds bind to the receptor via different amino acid; this suggests that they have different mechanisms of action. Compound 22 has the highest binding energy of − 6.9 kcal/mol among the docked compounds. This compound formed two conventional hydrogen bonds between the oxygen atom of the furan moiety and the HN fragment of MET30 and ILE31 amino acids of the enzyme, one conventional hydrogen bond between hydrogen atom from alkyl substituent of the furan and oxygen atom of THR97 residue, and one carbon hydrogen bond between carbon atom from alkyl substituent of the furan and oxygen atom of GLY99 residue (Fig. 4). The compound also formed hydrophobic interactions with the receptor: two of pi-sigma type with ILE31 and ILE31 residues, two of alkyl-alkyl type with ALA236 and PRO246 residues, and one of pi-alkyl type with PRO246 residue. Figure 5 shows the hydrogen bonds and hydrophobic bonds surfaces of the interactions. Although the binding affinities of the docked compounds did not correlate well with their antiplasmodium activities suggesting that pfLDH may not be their target enzyme or rather their main target enzyme, the results could be utilized in developing highly potent furan molecules capable of inhibiting pfLDH.

Table 6 Docking result between the selected ligands and pfLDH
Fig. 4
figure 4

3D and 2D structures of 22-pfLDH interactions

Fig. 5
figure 5

Hydrogen bonds and hydrophobicity surfaces of 22-pfLDH interactions

Conclusion

QSAR model that linked the chemical structures of 2,5-disubstituted furans and their antiplasmodium activities was generated using Genetic Function Algorithm in the Material Studio software. The model was statistically validated to be reliable and robust with internal and external square correlation coefficient of 0.982 and 0.735, respectively. The model revealed that GATS5c, minsCl RDF130m, RDF75p, and RDF115s descriptors connected to the antiplasmodium activities of the furans where increase in all, but minsCl descriptors decrease the activities and vice versa. Molecular docking study conducted revealed that the furans interacted well with pfLDH enzyme, but the enzyme may not be their major target as the binding energy of the docked compound did not correlate well with their antiplasmodium activities. The QSAR and the molecular docking studies could be married together to design novel furans molecule with high potency to inhibit Plasmodium falciparum.