QSAR and molecular docking based design of some indolyl-3-ethanone-α-thioethers derivatives as Plasmodium falciparum dihydroorotate dehydrogenase (PfDHODH) inhibitors

Malaria, a disease caused by Plasmodium falciparum, one of the world's deadliest parasites, is responsible for over a million deaths annually. P. falciparum dihydroorotate dehydrogenase (PfDHODH) is a validated target in this parasite. Quantitative structure–activity relationship (QSAR) modelling and molecular docking were employed in silico to discover novel PfDHODH inhibitors among computationally designed derivatives of indolyl-3-ethanone-α-thioethers, with the models generated by a genetic function algorithm. The best model shows good predictive power, with coefficient of determination R² = 0.9482, adjusted coefficient of determination R²adj = 0.9288, leave-one-out cross-validation coefficient Q² = 0.9201 and external validation coefficient R²pred = 0.6467. The contribution of each descriptor in the model was assessed through its mean effect on the activities (pIC50) of the compounds. MATS5m (−0.11725), RDF75m (−0.12097), VE3_Dzp (0.14697) and MLFER_BH (1.08528) contributed most to the model, while AATSC8p (−0.04833) and minHBa (0.05430) contributed least. Hence, the mean effect indicated MLFER_BH to be the most relevant descriptor, and this guided the design of five derivatives of indolyl-3-ethanone-α-thioethers.
All the designed antimalarial compounds docked deeply within the binding region, forming several hydrogen bonds and hydrophobic interactions that gave better binding affinities and high binding scores (−156.181 kcal/mol) compared with the design template (−138.201 kcal/mol) and the standard drug (−128.467 kcal/mol). Furthermore, all five designed antimalarial compounds were found to bind more strongly to the binding pocket of PfDHODH than compounds reported by other researchers.


Introduction
Plasmodium is the causative organism of malaria, one of the most destructive diseases worldwide [1], and is transmitted between humans by infected Anopheles mosquitoes. Global malaria cases were put at 228 million yearly, with 405,000 deaths. Children under the age of 5 are the most affected, accounting for 585,000 (67%) of all cases [2]. Murray and Perkins [3] reported various species of Plasmodium in 1996, the most virulent of which is P. falciparum [4]. Although malaria death rates may have decreased in recent years, mortality figures remain high even though the disease is preventable and treatable. This is largely due to the loss of efficacy of antimalarial drugs in clinical use, such as chloroquine, amodiaquine, pamaquine and mefloquine, as a result of increasing drug resistance [5].
Therefore, the development of new alternative agents that take multiple drug resistance into account is highly necessary. A series of investigations has assessed the inhibitory efficiencies of several derivatives against a target protein by testing different molecular structures. These investigations centered on natural products such as medicinal plants [6], marine organisms [7] or even bacteria [8], in addition to synthetic routes to new heterocyclic [9] and organometallic [10] compounds. New indolyl-3-ethanone-α-thioether derivatives were reported to have improved antimalarial activity [11]. P. falciparum parasites rely mostly on nucleotide synthesis through the de novo pathway to provide the necessary precursors for DNA and RNA biosynthesis, unlike human cells, which can salvage preformed pyrimidine bases as well as synthesize pyrimidines de novo. Plasmodium metabolic pathways therefore differ from those of the human host, and targeting the purine and pyrimidine metabolic pathways offers a promising route for novel drug development [12]. The oxidation of L-dihydroorotate (DHO) to orotate is catalyzed by the enzyme dihydroorotate dehydrogenase as the fourth and rate-limiting step of the pyrimidine biosynthesis pathway [13].
Plasmodium falciparum dihydroorotate dehydrogenase (PfDHODH) is vital for parasite growth and has been validated as an antimalarial drug target [14]. Several triazolopyrimidine, benzamide, naphthamide and urea agents have been reported to inhibit PfDHODH [15]. The major challenges facing the use of these antimalarial drugs are the lack of an antimalarial vaccine and the resistance of P. falciparum to the available drugs. These challenges motivated the development of a quantitative structure–activity relationship (QSAR) model. A QSAR model is a mathematical model relating the structures of compounds to their biological activities. This research was aimed at relating some structural features of indolyl-3-ethanone-α-thioether derivatives to their biological activities using genetic function algorithm (GFA) calculations, and hence at designing new antimalarial compounds. Moreover, the binding modes of the hypothetical antimalarial compounds in the active site were investigated by molecular docking of the designed compounds.

Experimental dataset
Thirty-one indolyl-3-ethanone-α-thioether derivatives were used as the data set. Their structural formulae and antimalarial activity values against P. falciparum were obtained from the literature [11]. The activities of the compounds, expressed as IC50 (μM) (the concentration giving 50% growth inhibition of the parasite P. falciparum), were converted to pIC50 (−log10 IC50) as presented in Table 1.
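The conversion from IC50 to pIC50 can be sketched as follows. This is a minimal illustration that assumes the IC50 values in μM are first converted to mol/L before the logarithm is taken; the function name and the input value of 0.09 μM are hypothetical and are not taken from Table 1.

```python
import math


def pic50_from_ic50_um(ic50_um: float) -> float:
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L).

    Assumption: the molar convention is used for the logarithm.
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    ic50_molar = ic50_um * 1e-6  # 1 uM = 1e-6 mol/L
    return -math.log10(ic50_molar)


# Hypothetical input, chosen only for illustration.
print(round(pic50_from_ic50_um(0.09), 4))
```

Under this assumption, an IC50 of 0.09 μM gives a pIC50 of about 7.0458, which is on the same scale as the pIC50 values quoted in this study.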

Geometry optimization and calculation of descriptors
The 2D molecular structures of the data set were drawn with the molecular sketcher in Spartan14 [16] and subsequently converted to 3D using the view module of the software. The geometries were optimized using density functional theory (DFT) with the B3LYP functional [17] and the 6-311G* basis set, so that consistent low-energy conformers were obtained across the compounds. The energy-minimized structures were exported to PaDEL-Descriptor, which was used to compute descriptor classes ranging from 0D and 1D to 2D and 3D molecular descriptors [18].

Data pre-treatment
Pre-treatment of the molecular descriptors involved removing constant-valued descriptors together with descriptors that were highly correlated with one another, using the "Data Pre-Treatment GUI 1.2" software, which employs the V-WSP program [19,20].
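The idea behind this pre-treatment can be illustrated with a simple filter. The sketch below is not the V-WSP algorithm used by the software; it is a plain greedy filter that drops near-constant descriptors and then drops any descriptor highly correlated with one already kept. The cut-off values and descriptor names are assumptions made for the example.

```python
from math import sqrt
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pretreat(descriptors, var_cut=1e-4, corr_cut=0.9):
    """Greedy filter over a dict {descriptor name: values per compound}.

    var_cut and corr_cut are illustrative thresholds, not the study's.
    """
    kept = {}
    for name, values in descriptors.items():
        if pvariance(values) <= var_cut:
            continue  # constant or near-constant descriptor
        if any(abs(pearson(values, v)) >= corr_cut for v in kept.values()):
            continue  # redundant with a descriptor already kept
        kept[name] = values
    return kept


# Hypothetical descriptor table for four compounds.
table = {
    "desc_a": [1.0, 2.0, 3.0, 4.0],
    "desc_b": [2.0, 4.0, 6.0, 8.1],   # almost collinear with desc_a
    "desc_c": [5.0, 5.0, 5.0, 5.0],   # constant
    "desc_d": [1.0, -1.0, 1.0, -1.0],
}
print(list(pretreat(table)))
```

In this toy table only desc_a and desc_d survive, since desc_b duplicates the information in desc_a and desc_c carries none.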

Model development and selection
The model was built using the Molegro Data Modeller software, with the descriptors and activities of the compounds imported into the Molegro worksheet. The software randomly divided the data set into a training set comprising 74% of the data (23 compounds) and a test set comprising 26% of the data (8 compounds). After the training set was selected, the modelling option on the toolbar was chosen; the active descriptors were selected while the descriptors not considered useful were frozen, and regression analysis was carried out to develop the model. Models were chosen on the basis of their R², Q² and R²pred values [21,22].
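A random division of the 31 compounds into 23 training and 8 test compounds can be sketched as below. The seed and function name are hypothetical, and the partition produced here is not the one generated by Molegro Data Modeller.

```python
import random


def split_dataset(compound_ids, n_train=23, seed=0):
    """Randomly partition compound identifiers into training and test sets."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(compound_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


train_ids, test_ids = split_dataset(range(1, 32))
print(len(train_ids), len(test_ids))  # 23 8
```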

QSAR model validation
The generated model was applied to the test set to predict its activities, and the results were analyzed for the existence of systematic error in the model [23]. In the absence of systematic error, the model was validated both internally and externally. Internal validation was done with the training set data only, using the leave-one-out (LOO) cross-validation technique. In LOO, the training set is altered by discarding one compound, and the remaining data are used to construct a model with the same descriptors as the model being validated. The new equation is then used to predict the activity of the discarded compound. This cycle is repeated until every molecule of the training set has been removed exactly once.
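The leave-one-out cycle described above can be sketched with ordinary least squares standing in for the regression step. The sketch assumes a multiple linear regression with an intercept and the usual definition Q² = 1 − PRESS/SS_tot; the function names are hypothetical and NumPy is used for the linear algebra.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares fit with intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predict activities for the rows of X from fitted coefficients."""
    return coef[0] + X @ coef[1:]


def q2_loo(X, y):
    """Leave-one-out cross-validated Q2 for a linear model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i           # discard compound i
        coef = fit_mlr(X[keep], y[keep])        # refit on the rest
        pred = predict_mlr(coef, X[i:i + 1])[0]  # predict the discarded one
        press += (y[i] - pred) ** 2
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot


# Synthetic data for illustration only (23 compounds, 2 descriptors).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(23, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(scale=0.3, size=23)
print(q2_loo(X_demo, y_demo))
```

Because each compound is predicted by a model that never saw it, Q² is generally lower than the fitted R² and approaches 1 only when the relationship is both linear and low in noise.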

Descriptor relevance (mean effect)
The influence of each descriptor in the generated model on the activities was measured in terms of its mean effect (ME), obtained from Eq. 1:
$$\mathrm{ME}_j = \frac{\beta_j \sum_{i=1}^{n} D_{ij}}{\sum_{q=1}^{m} \left( \beta_q \sum_{i=1}^{n} D_{iq} \right)} \qquad (1)$$
where β_j is the coefficient of descriptor j, D_ij is the value of descriptor j for training-set compound i in the descriptor matrix, m is the number of descriptors in the model and n is the number of molecules in the training set [24].
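A short sketch of the mean-effect calculation follows. It assumes the commonly used form in which each descriptor's coefficient is multiplied by the sum of its training-set values and the products are normalized by their total, so that the mean effects sum to 1 (as the six values reported in this study do). The coefficients and descriptor values in the example are hypothetical.

```python
def mean_effects(coefs, values):
    """Mean effect per descriptor.

    coefs:  {descriptor: regression coefficient}
    values: {descriptor: list of training-set values}
    Assumed form: ME_j = b_j * sum_i(D_ij) / sum_q(b_q * sum_i(D_iq)).
    """
    weighted = {name: coefs[name] * sum(values[name]) for name in coefs}
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}


# Hypothetical two-descriptor model, for illustration only.
demo = mean_effects(
    {"desc_a": 2.0, "desc_b": -1.0},
    {"desc_a": [1.0, 2.0, 3.0], "desc_b": [1.0, 1.0, 2.0]},
)
print(demo)  # {'desc_a': 1.5, 'desc_b': -0.5}
```

The sign of a mean effect therefore reflects both the sign of the coefficient and the sign of the summed descriptor values, and its magnitude indicates the relative weight of the descriptor in the model.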

Models applicability domain (AD)
The plot of standardized residuals against leverage values (Williams plot) was employed to define the domain in which the predictions of the model are reliable. The leverage approach to the applicability domain was used in this study [25], where each compound is assigned a leverage based on its descriptors, expressed as:
$$h_i = x_i \left( X^{T} X \right)^{-1} x_i^{T}$$
where x_i is the descriptor row vector of the compound of interest i, and X is the n × k descriptor matrix of the training set. The warning leverage (h*) is the limit of normal values for X outliers and is expressed as:
$$h^{*} = 3(p + 1)/n$$
where n is the number of training compounds and p is the number of descriptors in the model. Compounds with leverage h_i greater than the warning leverage (h*) and those with standardized residuals outside ±3 standard deviation units were regarded as anomalies [24].
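The leverage and warning-leverage calculations can be sketched as below, assuming the standard hat-matrix form h_i = x_i (XᵀX)⁻¹ x_iᵀ. Whether X carries an intercept column is not stated in the text, so none is added here; the function names are hypothetical, and a pseudo-inverse is used only to keep the sketch robust to near-singular matrices.

```python
import numpy as np


def leverages(X_train, X_query):
    """Leverage of each row of X_query relative to the training matrix."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    # Row-wise quadratic form x_i (X^T X)^-1 x_i^T
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)


def warning_leverage(n_train, n_descriptors):
    """h* = 3(p + 1)/n."""
    return 3.0 * (n_descriptors + 1) / n_train


# 23 training compounds and 6 model descriptors, as in this study.
print(round(warning_leverage(23, 6), 3))  # 0.913
```

With 23 training compounds and six descriptors the threshold evaluates to 0.913. A compound would be flagged when its leverage exceeds that value or its standardized residual falls outside ±3.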

Molecular docking studies
Molegro Virtual Docker (MVD) predicts protein–ligand interactions using a search method that combines differential evolution with a cavity prediction algorithm [26]. The high-resolution (1.50 Å) crystal structure of P. falciparum dihydroorotate dehydrogenase (PfDHODH) (PDB: 4ORI) was obtained from the Protein Data Bank. It was first opened with the Material Studio software, where the protein was prepared by removing the water molecules, ligand groups, ions and heteroatoms contained in the PDB file and adding hydrogen atoms to the protein component, and then saved. The saved file was imported into Molegro Virtual Docker, where the binding pocket was defined with the aid of the MVD cavity detection algorithm, and docking was performed to predict the binding mode of each ligand with the target protein, ranked by a scoring function. The MolDock scoring function is derived from Gehlhaar's piecewise linear potential (PLP), extended to include new hydrogen-bonding and electrostatic terms [27][28][29].

QSAR results
After careful validation and inspection, the selected model and its validation parameters are presented below. The model was selected on the strength of its statistical parameters, as it has the largest values of R² = 0.9482, R²adj = 0.9288, Q²cv = 0.9201 and R²ext = 0.6467. The internal as well as the external validation parameters of the model were in agreement with the minimum standards for a dependable and robust model. An increase in the values of the descriptors MATS5m, VE3_Dzp, minHBa and RDF75m will increase the inhibitory activities of indolyl-3-ethanone-α-thioether derivatives against P. falciparum dihydroorotate dehydrogenase, since their coefficients are positive. Likewise, the negative coefficients of descriptors such as AATSC8p and MLFER_BH imply that the inhibitory activities of indolyl-3-ethanone-α-thioether derivatives against the PfDHODH enzyme will increase with decreasing values of these descriptors.
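The reported adjusted coefficient of determination can be cross-checked from R² with the standard formula R²adj = 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below assumes n = 23 training compounds and p = 6 descriptors, as stated elsewhere in this study; the function name is hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R2 for n observations and p model descriptors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


print(round(adjusted_r2(0.9482, 23, 6), 4))  # 0.9288
```

The result agrees with the reported R²adj, which suggests that the statistics were computed on the 23-compound training set with six descriptors.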
The plot of experimental activities versus predicted activities of the data set is shown in Fig. 1, and the accuracy of the best model is supported by the agreement of the predicted R² value with the R² = 0.8494 reported on the plot. The predictive strength of the model is reflected in the high linearity of the plot. Table 2 compares the predicted and experimental pIC50 values; the very low residuals confirm the predictive ability of the model. External validation shows a close relationship between the experimental and predicted pIC50 values of the test set.
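The external validation coefficient is not defined explicitly in the text. A commonly used definition, assumed here, is R²pred = 1 − Σ(y_obs − y_pred)² / Σ(y_obs − ȳ_train)², where the sums run over the test set and ȳ_train is the mean activity of the training set. The numbers in the example are hypothetical.

```python
def r2_pred(y_test_obs, y_test_pred, y_train_mean):
    """External predictive R2 relative to the training-set mean activity."""
    press = sum((o - p) ** 2 for o, p in zip(y_test_obs, y_test_pred))
    ss_ref = sum((o - y_train_mean) ** 2 for o in y_test_obs)
    return 1.0 - press / ss_ref


# Hypothetical test-set values, for illustration only.
print(r2_pred([5.0, 6.0, 7.0], [5.5, 6.0, 6.5], 6.0))  # 0.75
```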
The relevance of the descriptors, as well as the correlation between them, is reported in Table 3. Of the six descriptors in the model, the mean effect analysis (Table 3) showed that the MLFER_BH descriptor (overall or summation solute hydrogen bond basicity) [30], with a mean effect value of 1.08528, contributed most to the QSAR model.

Applicability domain of the model
Close inspection of the applicability domain for the training and test set compounds (Fig. 2) shows that no compound lies beyond the warning value (h* = 0.913). This indicates the absence of outliers (h > h* = 0.913); that is, no compound in either the training or the test set falls outside the cut-off of ±3.0σ. Hence, the predictions of the model were considered reliable.

In-silico design of antimalarial compound
The molecule with serial number 22 in the data set (Table 1) was used as a template (Fig. 3) to design several hypothetical novel derivatives. The template was chosen because of its relatively high activity (pIC50 = 7.0458), very good leverage value within the applicability domain and excellent standardized residual. The design of the derivatives was guided by the information obtained from the descriptors contained in the model. For example, the MLFER_BH descriptor, as explained earlier, was found to be the most influential descriptor given the magnitude of its mean effect (Table 3). Therefore, the addition of electrophiles (electron-withdrawing groups) to the template was expected to increase the antimalarial activity of the novel compounds. The template was modified through the addition and removal of a variety of substituents such as -Br, -Cl and -NO2 groups. The five compounds designed (Table 4) were found to have better activities than all the experimental compounds (Table 1).

Molecular docking studies results of indolyl-3-ethanone-α-thioethers derivatives
PfDHODH is a mitochondrial enzyme that catalyzes its reactions in the presence of both flavin mononucleotide (FMN) and coenzyme Q (CoQ). Two half-reactions are required for catalysis: oxidation of dihydroorotate by FMN, followed by reoxidation of FMN by CoQ. In PfDHODH, the catalytic domain (a β/α-barrel fold in the inner membrane space) is formed by amino-acid residues 162-565, and the residues N-terminal to this domain anchor the protein to the inner mitochondrial membrane [15]. The majority of DHODH inhibitors bind to the presumed CoQ binding site, which is located adjacent to FMN between the β/α-barrel domain and the N-terminal α-helical membrane domain. The differences in amino acid sequence between the Plasmodium and human enzymes in the inhibitor-binding site were identified as the basis of the species selectivity of these inhibitors, including the triazolopyrimidine-based PfDHODH inhibitors [31,32]. The docking analysis revealed the preferred binding conformations of the designed derivatives (ligands) in the CoQ binding site of the target protein. These conformations aid in predicting the nature and strength of the interactions between the ligands and the target molecule. The docking results are summarized in Table 5. The designed compounds all have a higher binding affinity than the design template, with the exception of compound 22D, whose binding energy is lower than that of the standard drug. Compound 22C (−141.336 kcal/mol) was found to have the highest binding affinity and as such is more compatible with the receptor than the other designed compounds and even the standard drug. Compounds 22A and 22C are the most active ligands, as reflected by their docking affinities in Table 5.
The interactions between these compounds and the target protein are shown in Fig. 5. For compound 22A, they include, but are not limited to, two hydrogen-bonding interactions, one conventional hydrogen bond and one carbon-hydrogen bond, both between amino acid residue HIS56 and the carbonyl oxygen of the inhibitor, at distances of 2.80 Å and 2.26 Å respectively, in addition to several hydrophobic interactions. Compound 22C has two conventional hydrogen bonds, the first between amino acid residue TYR356 and the carbonyl oxygen atom of the inhibitor (distance 2.33 Å) and the second between a hydrogen atom of the indole ring and SER305 (distance 2.30 Å), in addition to four hydrophobic interactions: an amide-Pi stacked interaction between ALA55 and the benzene ring (distance 4.31 Å), and three alkyl interactions, one between the indole ring and VAL143 (distance 4.28 Å) and two between the benzene ring and ALA143 and ILE360 (distances 4.37 Å and 3.74 Å respectively). These interactions show the binding role of the oxygen, hydrogen and carbon atoms as well as their inhibitory capacities.

Conclusion
QSAR techniques were applied to antimalarial indolyl-3-ethanone-α-thioether derivatives to relate the molecular structures of the compounds to their antimalarial activities. Genetic Function Algorithm (GFA) was used to produce

Compliance with ethical standards
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethical standard
No human or animal subjects were involved in this study.