Introduction

Natural products have been used since ancient times as remedies for a wide range of illnesses and diseases [1]. They have attracted much attention in recent years and have become an important source of new drugs. Some natural products have also shown promise in the fight against the COVID-19 pandemic [2,3,4,5,6,7].

Terpenoids are secondary metabolites derived from isopentenyl pyrophosphate oligomers and form the largest group of plant natural products. They are divided into subclasses according to the number of isoprene units (a) and carbon atoms (b), denoted a:b, e.g., monoterpenes (2:10), sesquiterpenes (3:15), diterpenes (4:20), sesterterpenes (5:25), triterpenes (6:30), carotenoids (8:40), and rubber (> 100: > 500). Among terpenoids, triterpenes are commonly used for medicinal purposes in many countries because of their various pharmacological properties.
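Because each isoprene (C5) unit contributes five carbon atoms, the a:b notation always satisfies b = 5a. The short Python sketch below encodes only the subclasses listed above; the function names are illustrative and not part of any library.

```python
CARBONS_PER_ISOPRENE = 5  # each isoprene (C5H8) unit contributes five carbons

# Subclasses named in the text, keyed by number of isoprene units (a).
SUBCLASSES = {
    2: "monoterpene",
    3: "sesquiterpene",
    4: "diterpene",
    5: "sesterterpene",
    6: "triterpene",
    8: "carotenoid",
}


def notation(isoprene_units: int) -> str:
    """Return the a:b label, where b = 5 * a carbon atoms."""
    return f"{isoprene_units}:{CARBONS_PER_ISOPRENE * isoprene_units}"


def classify(isoprene_units: int) -> str:
    """Map an isoprene-unit count to the subclass names used in the text."""
    if isoprene_units > 100:
        return "rubber"
    return SUBCLASSES.get(isoprene_units, "unclassified")


if __name__ == "__main__":
    for a in (2, 3, 4, 5, 6, 8):
        print(notation(a), classify(a))
```

For instance, `notation(6)` returns `"6:30"` and `classify(6)` returns `"triterpene"`.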

The literature reports a variety of interesting biological activities for oxygenated triterpene compounds. The first studies concerned 7-oxopregnenolone derivatives, evaluated as potent anti-cortisone agents [8]. More recently, derivatives of 7-oxo-dehydroepiandrosterone have been used to reduce hyperglycemia or strengthen the immune system [9]. In the same context, plants of the family Euphorbiaceae, and especially of the genus Euphorbia, one of the largest genera of flowering plants, include species containing cytotoxic, antibacterial, antifungal, antiparasitic, antitumor, and anti-inflammatory triterpenes [10,11,12,13]. Derivatives of triterpenes semi-synthesized from Euphorbia officinarum latex, obtained by conventional oxidation or by oxidation with metalloporphyrin complexes [14,15,16], have shown strong postingestive toxic effects on the insect pest Spodoptera littoralis [17, 18]. In addition, some of these derivatives protected tomato plants from Verticillium dahliae at low concentrations, elicited H2O2 production, and increased antioxidant enzyme activity, suggesting elicitor-like effects [19, 20].

In recent years, research on natural substances has led to the discovery and development of drugs for human use and of cytotoxic agents for use as pesticides [21]. In this context, the present work uses semisynthetic triterpene derivatives to assess their potential as a new source of insecticides and, further, of antibacterial agents. For this purpose, we used 3D-QSAR techniques, which are widely used in drug discovery because they quantitatively correlate the three-dimensional structure of molecules with their biological activity [17, 18]. We then performed molecular docking and in silico prediction of pharmacokinetic and toxicity parameters (ADME-Tox), given their importance for optimizing drug candidates at the preclinical stage [22]. In this paper, 27 triterpene derivatives were used to build 3D-QSAR models by comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) [23]. New molecules were designed, and their activities (pLD50) were predicted, based on the contour maps provided by the 3D-QSAR models. Molecular docking of the semi-synthesized and newly designed compounds was carried out with the MurE protein for antibacterial activity and with the ecdysone receptor (EcR) protein for insecticidal activity, in order to understand the main structural requirements and analyze the key ligand-receptor interactions. The results support the suitability of the semi-synthesized and designed compounds as antibacterial agents and insecticides. The designed compounds were also evaluated for drug-likeness and computational pharmacokinetic (ADME-Tox) parameters.

Material and methods

Preparation of the database

All molecules were sketched and saved as separate files in MOL2 format. The SYBYL X2.1.1 software was used to generate the three-dimensional structures and to minimize the energy of each one with the standard Tripos force field and the Powell method (100 iterations) [24]. Gasteiger-Hückel atomic partial charges were computed for the twenty-seven semi-synthesized triterpenes studied in this work, and their 3D structures were optimized for the construction of the 3D-QSAR models [25]. All structures were minimized in SYBYL X2.1.1 with a distance-dependent dielectric function until a root mean square (RMS) gradient of 0.05 kcal/(mol·Å) was reached.

Analysis of the distribution

To build more robust 3D-QSAR models, a cluster analysis based on molecular features was performed [26]. The set of 27 compounds was divided into a training set of 20 compounds and a test set of 7 compounds; the test set was used to evaluate the predictive ability of the models. Test-set compounds were chosen to cover a diverse range of lethal-dose activities and structures. The structures, LD50, and pLD50 values of the 27 studied compounds are presented in Table 1. These data were used to develop the 3D-QSAR models by comparative molecular field analysis (CoMFA) [27] and comparative molecular similarity indices analysis (CoMSIA) [28], whose contour maps were then used to analyze the physicochemical properties of the studied molecules.
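The activity side of such a split can be sketched in Python. Two assumptions are made for illustration only: pLD50 is taken as -log10 of LD50 expressed in mol/L, and test-set compounds are picked at evenly spaced positions of the activity-sorted list. The selection described above also relied on cluster analysis and structural diversity, which this sketch does not model, and the activity values in the demo are hypothetical.

```python
import math


def pld50(ld50_molar: float) -> float:
    """pLD50 = -log10(LD50); assumes LD50 is given in mol/L (hypothetical convention)."""
    return -math.log10(ld50_molar)


def activity_spread_split(activities, n_test):
    """Return (train, test) index lists with test compounds spread over the activity range.

    Compounds are sorted by activity and test members are taken at evenly spaced
    interior positions, so the least and most active compounds stay in training
    and the models are not asked to extrapolate.
    Assumes len(activities) >= n_test + 2 so that the picked positions are distinct.
    """
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    step = (len(order) - 1) / (n_test + 1)
    test = {order[int(k * step + 0.5)] for k in range(1, n_test + 1)}
    train = [i for i in range(len(activities)) if i not in test]
    return train, sorted(test)


if __name__ == "__main__":
    # 27 hypothetical pLD50 values (not the values of Table 1).
    acts = [4.0 + 0.08 * i for i in range(27)]
    train, test = activity_spread_split(acts, n_test=7)
    print(len(train), "training /", len(test), "test compounds; test indices:", test)
```

Keeping the extreme compounds in the training set is one common design choice; a cluster-based selection would additionally balance structural classes.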

Table 1 Chemical structures of the studied molecules and their lethal dose related to the insecticidal activity

3D-QSAR studies

Molecular alignment is an important step in the development of a 3D-QSAR model. Compound 16, which has the highest activity (pLD50), was chosen as the reference. All 3D structures of the training and test sets were aligned on the common core using the alignment technique [29] (Fig. 1). The 3D-QSAR models were built to predict and explain the lethal dose of the studied compounds by relating the steric (S), electrostatic (E), hydrophobic (H), hydrogen bond donor (D), and hydrogen bond acceptor (A) field descriptors, obtained by the CoMFA and CoMSIA techniques, to the biological activity (pLD50).

Fig. 1

Alignment of the data set

The CoMFA model was built from steric and electrostatic field descriptors, whereas the CoMSIA model was built from steric, electrostatic, hydrophobic, hydrogen bond donor, and hydrogen bond acceptor field descriptors. The 3D-QSAR models were constructed as previously reported [25,26,27,28,29]. All analyses used the Tripos force field and a regular spatial grid with 2 Å spacing in all Cartesian directions. An sp3-hybridized carbon atom with a net charge of +1.0 was used as the probe to calculate the steric and electrostatic energies. The attenuation factor, which controls the steepness of the Gaussian function in CoMSIA, was set to its default value of 0.3, and the energy cutoff was set to its default value of 30 kcal/mol [30].
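For readers unfamiliar with CoMSIA, the similarity index at a grid point q is a Gaussian-weighted sum over the atoms i of a molecule, A_F(q) = -Σ w_probe·w_i·exp(-α r_iq²), where α is the attenuation factor (0.3 here). The Python sketch below evaluates this expression on a 2 Å lattice for a hypothetical two-atom fragment. Coordinates, partial charges, and steric weights are invented for illustration (the original CoMSIA formulation uses the cube of the atomic radius as the steric weight), and the probe is assumed to carry a value of +1 for every property. It is not SYBYL's implementation, which also handles the hydrophobic, donor, and acceptor fields.

```python
import math

ALPHA = 0.3         # attenuation factor quoted in the text (CoMSIA default)
GRID_SPACING = 2.0  # Å, lattice spacing quoted in the text


def similarity_index(point, atoms, prop, probe_value=1.0, alpha=ALPHA):
    """CoMSIA index A_F(q) = -sum_i w_probe * w_i * exp(-alpha * r_iq**2).

    `atoms` holds (x, y, z, properties) tuples; `prop` picks the property
    used as w_i (e.g. "steric" or "charge"). Distances are in Å.
    """
    total = 0.0
    for x, y, z, props in atoms:
        r2 = (point[0] - x) ** 2 + (point[1] - y) ** 2 + (point[2] - z) ** 2
        total += probe_value * props[prop] * math.exp(-alpha * r2)
    return -total


def grid(center, half_width, spacing=GRID_SPACING):
    """Cubic lattice of points within +/- half_width of `center`."""
    steps = int(half_width // spacing)
    offsets = [k * spacing for k in range(-steps, steps + 1)]
    return [(center[0] + dx, center[1] + dy, center[2] + dz)
            for dx in offsets for dy in offsets for dz in offsets]


if __name__ == "__main__":
    # Hypothetical carbonyl fragment: steric weights and partial charges are made up.
    fragment = [
        (0.0, 0.0, 0.0, {"steric": 1.7 ** 3, "charge": 0.45}),
        (1.23, 0.0, 0.0, {"steric": 1.52 ** 3, "charge": -0.45}),
    ]
    points = grid((0.6, 0.0, 0.0), 4.0)
    steric = [similarity_index(q, fragment, "steric") for q in points]
    electro = [similarity_index(q, fragment, "charge") for q in points]
    print(len(points), "grid points; most negative steric index:", min(steric))
    print("electrostatic index range:", min(electro), max(electro))
```

Unlike the CoMFA Lennard-Jones and Coulomb terms, the Gaussian form has no singularity at atomic positions, which is why CoMSIA does not need the 30 kcal/mol energy cutoff.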

CoMFA and CoMSIA study

The CoMFA and CoMSIA models were developed from the training set using the partial least squares (PLS) algorithm [31] and validated by leave-one-out (LOO) cross-validation and bootstrapping [32]. The selected models have high values of the coefficient of determination (R2 > 0.5) and of the cross-validated correlation coefficient (Q2 > 0.5), and a low standard error of estimate (SEE). The predictive power of the 3D-QSAR models was evaluated on the test set with the index R2pred.

PLS analysis and test validation

To obtain statistically valid 3D-QSAR models, partial least squares (PLS) regression was used to correlate the observed pLD50 values (dependent variable) linearly with the CoMFA and CoMSIA descriptors (independent variables), separately for each model. During the PLS analysis, leave-one-out (LOO) cross-validation was used to determine the optimal number of components N and the cross-validated correlation coefficient (Q2). With N fixed, a non-cross-validated analysis was then performed to assess the overall significance of the model through the coefficient of determination (R2), the standard error of estimate (SEE), and the F-value (Fisher test). To further evaluate the robustness and statistical confidence of the models, a bootstrapping analysis of 100 cycles was performed. Bootstrapping generates many new data sets by resampling the original data set with replacement [33]; the statistics are recalculated on each bootstrap sample, and the difference between the parameters of the original data set and the mean of the bootstrap parameters measures the bias of the original calculation. A Q2 value greater than 0.5 is taken to indicate that the probability of a chance correlation is less than 5% [34], and an R2pred value greater than 0.6 in external validation is taken to confirm the reliability of the 3D-QSAR models. The predictive power of the CoMFA and CoMSIA models was evaluated on a test set of 7 molecules, aligned in the same way as the training set, whose activities were predicted with the models built from the training set.
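The validation loop described above can be sketched in plain Python. The sketch implements single-response PLS with the NIPALS algorithm on mean-centred data, chooses N by maximizing the LOO Q2, and estimates a bootstrap mean R2 over 100 resamples. It is a simplified stand-in for the SYBYL PLS module, not a reproduction of it: there is no column filtering or scaling, the descriptor matrix and pLD50 values are randomly generated hypothetical data, and all function names are illustrative.

```python
import random
from statistics import mean


def fit_pls1(X, y, n_comp):
    """Fit single-response PLS (NIPALS) on mean-centred data; no scaling."""
    n_feat = len(X[0])
    x_mean = [mean(col) for col in zip(*X)]
    y_mean = mean(y)
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]
    comps = []
    for _ in range(n_comp):
        # Weight vector w is proportional to X^T y (direction of maximal covariance).
        w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(n_feat)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # y is already fully explained
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in Xc]  # scores
        tt = sum(v * v for v in t)
        p = [sum(row[j] * ti for row, ti in zip(Xc, t)) / tt for j in range(n_feat)]
        q = sum(yi * ti for yi, ti in zip(yc, t)) / tt
        # Deflate X and y before extracting the next component.
        Xc = [[xv - ti * pv for xv, pv in zip(row, p)] for row, ti in zip(Xc, t)]
        yc = [yi - q * ti for yi, ti in zip(yc, t)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def predict_pls1(model, x):
    """Predict one sample by replaying the centring and deflation steps."""
    x_mean, y_mean, comps = model
    xr = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(xr, w))
        y_hat += q * t
        xr = [xv - t * pv for xv, pv in zip(xr, p)]
    return y_hat


def r2_fit(X, y, n_comp):
    """Non-cross-validated R2 of a model fitted on all of (X, y)."""
    model = fit_pls1(X, y, n_comp)
    y_bar = mean(y)
    ss_res = sum((yi - predict_pls1(model, xi)) ** 2 for xi, yi in zip(X, y))
    return 1 - ss_res / sum((yi - y_bar) ** 2 for yi in y)


def loo_q2(X, y, n_comp):
    """Leave-one-out Q2: each compound is predicted by a model built without it."""
    press = 0.0
    for i in range(len(y)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_comp)
        press += (y[i] - predict_pls1(model, X[i])) ** 2
    y_bar = mean(y)
    return 1 - press / sum((yi - y_bar) ** 2 for yi in y)


def bootstrap_r2(X, y, n_comp, n_runs=100, seed=0):
    """Mean R2 over bootstrap resamples drawn with replacement."""
    rng = random.Random(seed)
    n, scores = len(y), []
    while len(scores) < n_runs:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # degenerate resample: R2 undefined
        scores.append(r2_fit([X[i] for i in idx], yb, n_comp))
    return mean(scores)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical field descriptors (5 columns) and pLD50 values for 20 compounds.
    X = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(20)]
    y = [5.0 + 1.5 * x[0] - 0.8 * x[1] + rng.gauss(0, 0.05) for x in X]
    q2_by_n = {n: loo_q2(X, y, n) for n in (1, 2, 3)}
    best_n = max(q2_by_n, key=q2_by_n.get)
    print("LOO Q2 per N:", q2_by_n, "-> optimal N =", best_n)
    print("R2 =", r2_fit(X, y, best_n), "| bootstrap mean R2 =", bootstrap_r2(X, y, best_n))
```

Prediction replays the training deflation for each new sample, which avoids forming and inverting the regression-coefficient matrix explicitly.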
The cross-validated coefficient Q2, the predictive correlation coefficient R2pred, and the modified squared correlation coefficient rm2 were calculated using Eqs. (1), (2), and (3), respectively.

$$Q^{2} = 1 - \frac{{\sum\limits_{i = 1}^{N} {(Y_{\exp } - Y_{pred} )^{2} } }}{{\sum\limits_{i = 1}^{N} {(Y_{\exp } - \overline{Y}_{\exp } )^{2} } }}$$
(1)
$$R_{pred}^{2} = 1 - \frac{PRESS}{{SD}}$$
(2)
$$r_{m}^{2} = r^{2} \times \left[ 1 - \sqrt{\left| r^{2} - r_{0}^{2} \right|} \right]$$
(3)

where Ypred, Yexp, and \(\overline{Y}_{\exp}\) are the predicted, experimental, and mean experimental values of the target property (pLD50), respectively, and N is the number of compounds. PRESS is the sum of squared differences between the predicted and observed activities of each molecule (i) in the test set, and SD is the sum of squared deviations between the observed activities of the test set and the mean activity of the training set. r2 is the squared correlation coefficient between the observed and predicted activities, and r02 is the corresponding squared correlation coefficient for the regression through the origin (without intercept).
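These validation statistics translate directly into code. The Python sketch below assumes the usual definitions: r² is the squared Pearson correlation between observed and predicted values, r0² comes from the least-squares regression of observed on predicted values through the origin (one common convention; some authors regress predicted on observed instead), and rm² = r²(1 - sqrt|r² - r0²|). The example values are hypothetical.

```python
import math
from statistics import mean


def q2(y_obs, y_pred):
    """Eq. (1): 1 - sum((Yexp - Ypred)^2) / sum((Yexp - mean(Yexp))^2)."""
    y_bar = mean(y_obs)
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    return 1 - press / sum((o - y_bar) ** 2 for o in y_obs)


def r2_pred(y_test_obs, y_test_pred, y_train_mean):
    """Eq. (2): 1 - PRESS / SD, with SD taken around the training-set mean."""
    press = sum((o - p) ** 2 for o, p in zip(y_test_obs, y_test_pred))
    sd = sum((o - y_train_mean) ** 2 for o in y_test_obs)
    return 1 - press / sd


def r_squared(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(y_obs), mean(y_pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    var_o = sum((o - mo) ** 2 for o in y_obs)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (var_o * var_p)


def r0_squared(y_obs, y_pred):
    """r0^2 for the regression of observed on predicted values through the origin."""
    k = sum(o * p for o, p in zip(y_obs, y_pred)) / sum(p * p for p in y_pred)
    mo = mean(y_obs)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred))
    return 1 - ss_res / sum((o - mo) ** 2 for o in y_obs)


def rm2(y_obs, y_pred):
    """rm^2 = r^2 * (1 - sqrt(|r^2 - r0^2|))."""
    r2 = r_squared(y_obs, y_pred)
    return r2 * (1 - math.sqrt(abs(r2 - r0_squared(y_obs, y_pred))))


if __name__ == "__main__":
    obs = [5.1, 5.6, 6.0, 6.2]    # hypothetical test-set pLD50 values
    pred = [5.0, 5.7, 5.9, 6.3]   # hypothetical model predictions
    print(q2(obs, pred), r2_pred(obs, pred, 5.5), rm2(obs, pred))
```

A systematic offset between predictions and observations leaves r² at 1 but lowers r0², so rm² penalizes biased predictions that r² alone would miss.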

The CoMFA and CoMSIA models were further tested by Y-randomization, which checks whether their good performance could have arisen by chance [35]. In this approach, the values of the dependent variable (biological activity) are randomly shuffled while the independent variables (selected descriptors) are kept unchanged, and the shuffled data are used to build new QSAR models. For the original model to be considered robust, the squared correlation coefficient of the randomized models (\({R}_{r}^{2}\)) must be lower than that of the non-randomized model (R2). The parameter \({}^{c}R_{p}^{2}\) [36] penalizes the model R2 according to the difference between R2 and the mean \({R}_{r}^{2}\) of the randomized models, as given by Eq. (4); its value should be greater than 0.5 [37].

$${}^{c}R_{p}^{2} = R \times \sqrt {R^{2} - R_{r}^{2} }$$
(4)
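A minimal Python sketch of the Y-randomization test and Eq. (4) follows. To keep it short, an ordinary least-squares fit on a single descriptor stands in for the CoMFA/CoMSIA PLS models (an assumption made for illustration), the activities are shuffled five times as in this study, and the data are hypothetical.

```python
import random
from math import sqrt
from statistics import mean


def fit_r2(x, y):
    """R^2 of an ordinary least-squares line y = a + b*x (stand-in for PLS)."""
    mx, my = mean(x), mean(y)
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    return sxy * sxy / (sxx * syy)


def c_rp2(x, y, n_runs=5, seed=0):
    """Y-randomization: returns (cRp2, R2, mean Rr2) with cRp2 = R * sqrt(R2 - mean Rr2)."""
    rng = random.Random(seed)
    r2 = fit_r2(x, y)
    rr2 = []
    for _ in range(n_runs):
        y_perm = list(y)
        rng.shuffle(y_perm)  # scramble activities; descriptors stay unchanged
        rr2.append(fit_r2(x, y_perm))
    mean_rr2 = mean(rr2)
    # Clamp at zero: a negative difference means the model is no better than chance.
    return sqrt(r2) * sqrt(max(r2 - mean_rr2, 0.0)), r2, mean_rr2


if __name__ == "__main__":
    x = [float(i) for i in range(20)]  # hypothetical descriptor
    y = [4.0 + 0.1 * i + (0.03 if i % 2 else -0.03) for i in range(20)]  # hypothetical pLD50
    crp2, r2, mean_rr2 = c_rp2(x, y)
    print(f"R2 = {r2:.3f}, mean Rr2 = {mean_rr2:.3f}, cRp2 = {crp2:.3f}")
```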

The CoMFA and CoMSIA models with statistically acceptable performance are represented graphically by contour maps. Analysis of these maps reveals the physicochemical properties that most influence the activity of the triterpene derivatives studied in this work, using as reference the synthesized molecule 16, which shows the highest experimental activity (pLD50 = 6.22). These structural insights were used to design new molecules with improved predicted activity (pLD50) and to assess their possible dual use as antibacterial agents and insecticides.

Drug-likeness properties and ADME-Tox pharmacokinetics in silico

An in silico study was performed to design and identify new antibacterial and insecticidal agents using 3D-QSAR. The drug-likeness and ADME-Tox (absorption, distribution, metabolism, excretion, and toxicity) profiles of the newly designed compounds were then determined. These two profiles predict the most important drug properties of the molecules and allow their pharmacokinetics to be evaluated before synthesis [38].

The newly designed molecules screened for drug-like properties were selected on the basis of the high pLD50 values predicted by the 3D-QSAR models. ADME-Tox parameters were predicted only for the designed molecules that passed the drug-likeness screening. In addition, the LD50 values predicted by the 3D-QSAR models for the newly designed molecules should be lower than the LD50 values predicted by the in silico toxicity test, so that the molecules are lethal to insects at doses that are not toxic to humans.
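This selection logic amounts to a chain of filters. The Python sketch below is illustrative only: the `Candidate` fields, the pLD50 threshold, and the numbers in the example are hypothetical, and it assumes that the QSAR-derived and in silico toxicity LD50 values are expressed in the same units, which in practice requires an explicit unit conversion.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    pld50_qsar: float          # activity predicted by the 3D-QSAR models
    passes_druglikeness: bool  # outcome of the Lipinski/Veber/Egan screen
    ld50_insect: float         # LD50 implied by the QSAR prediction (hypothetical units)
    ld50_tox: float            # LD50 from the in silico toxicity model (same units assumed)


def screen(candidates, min_pld50):
    """Keep active, drug-like candidates whose QSAR LD50 is below the in silico
    toxicity LD50, sorted from most to least active."""
    kept = [
        c for c in candidates
        if c.pld50_qsar >= min_pld50
        and c.passes_druglikeness
        and c.ld50_insect < c.ld50_tox
    ]
    return sorted(kept, key=lambda c: c.pld50_qsar, reverse=True)


if __name__ == "__main__":
    pool = [  # hypothetical designed molecules
        Candidate("A", 6.5, True, 1.0, 10.0),
        Candidate("B", 6.8, False, 1.0, 10.0),
        Candidate("C", 6.9, True, 5.0, 2.0),
        Candidate("E", 7.0, True, 0.5, 10.0),
    ]
    print([c.name for c in screen(pool, min_pld50=6.0)])
```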

Molecular docking

In this work, molecular docking simulations were performed with AutoDock Vina [39] to analyze the interaction mechanisms and binding modes, and thus to identify the main structural requirements for triterpene derivatives to be promising antibacterial and insecticidal agents.

The peptidoglycan biosynthesis pathway is one of the best sources of antibacterial targets. The value of Mur ligases as antibacterial therapeutic targets was described by Kouidmi et al. [40]. The amide ligases MurC, MurD, MurE, and MurF share a common catalytic mechanism, which makes them convenient for developing multi-target antibacterial agents while reducing the risk of resistance development [41]. We also investigated the possibility of using compound 16 and the triterpene-based designed molecules as new insecticidal agents by docking them into the ecdysone receptor (EcR) of insects, in order to identify the main types of interaction and the key binding sites involved in inhibiting this protein. EcR regulates larval development and promotes insect reproduction [42]. Inhibiting EcR with the designed molecules could therefore be a valid strategy against Mythimna separata, an insect pest of agricultural crops.

The semisynthetic ligand 16, which showed the highest pLD50 value, and the newly designed ligands were docked into the binding pocket of MurE (PDB code: 1E8C) [43] and into that of the ecdysone receptor (EcR) complexed with the pesticide tebufenozide (BYIO8346) (PDB ID: 1R20) [44]. Ligands and proteins were prepared with Discovery Studio, which was also used to analyze the ligand-receptor interactions. The 3D grid was generated with the AutoGrid algorithm in MGLTools 1.5.6. The grid box measured 60 × 60 × 60 points in the MurE pocket and 40 × 40 × 40 points in the EcR pocket, with a spacing of 0.375 Å between grid points. The grid center was set at (10.40, 50.10, 98.99) Å for MurE and (59.883103, 29.836276, 13.911379) Å for EcR; these coordinates define the binding sites into which the molecules were docked. The docking results were analyzed with Discovery Studio 2016.
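These box settings can be expressed as an AutoDock Vina configuration file. Vina's `center_x/y/z` and `size_x/y/z` options are given in Å, so this Python sketch converts the AutoGrid point counts into box edges (points × 0.375 Å, i.e. 22.5 Å for MurE and 15 Å for EcR), which is our interpretation of the grid described above. The receptor and ligand file names and the `exhaustiveness` value (Vina's default of 8) are assumptions, not settings reported here.

```python
def vina_config(receptor, ligand, center, n_points, spacing=0.375,
                exhaustiveness=8, out="docked.pdbqt"):
    """Build an AutoDock Vina configuration string.

    AutoGrid-style boxes are given as a number of grid points per axis; Vina
    expects the box edge in Å, so each edge is n_points * spacing.
    """
    sizes = [n * spacing for n in n_points]
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    lines += [f"center_{axis} = {c}" for axis, c in zip("xyz", center)]
    lines += [f"size_{axis} = {s}" for axis, s in zip("xyz", sizes)]
    lines += [f"exhaustiveness = {exhaustiveness}", f"out = {out}"]
    return "\n".join(lines) + "\n"


SITES = {
    # Grid centres and point counts from the text; file names are hypothetical.
    "MurE": ("1E8C_receptor.pdbqt", (10.40, 50.10, 98.99), (60, 60, 60)),
    "EcR": ("1R20_receptor.pdbqt", (59.883103, 29.836276, 13.911379), (40, 40, 40)),
}

if __name__ == "__main__":
    for name, (receptor, center, points) in SITES.items():
        print(f"# {name}")
        print(vina_config(receptor, "ligand_16.pdbqt", center, points))
```

Each generated text can be saved to a file and passed to Vina with `vina --config conf.txt`.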

Results and discussion

3D-QSAR models

Database division

Seven molecules were selected for the test set and the remaining twenty for the training set. The database was partitioned so that both sets cover the structural diversity and the range of biological activity (pLD50).

Molecular alignment

All molecules of the data set were aligned on the common core, using molecule 16 as the reference (Fig. 2).

Fig. 2

Common substructure (core) used in the alignment, and the structure of the reference molecule 16

PLS analysis

The results of the PLS analysis are shown in Table 2. For a reliable predictive model, the cross-validated coefficient Q2, which measures the quality of prediction, should be greater than 0.5, while the non-cross-validated coefficient of determination R2 indicates the goodness of fit of a QSAR model. The F-test value reflects the statistical confidence of the model.

Table 2 The statistical parameters of the CoMFA and CoMSIA models obtained by PLS analysis

The statistical results in Table 2 show that both the CoMFA and CoMSIA models have good, statistically significant predictive power. The CoMFA model fits the training data better than the CoMSIA model, as shown by its lower SEE (0.052 vs 0.071) and higher Q2 (0.672 vs 0.534) and R2 (0.998 vs 0.97), whereas the CoMSIA model gives a slightly higher R2pred (0.94 vs 0.918).

As summarized in Table 2, the CoMFA model gives R2 = 0.99, F = 443.73, and SEE = 0.052, and the CoMSIA model gives R2 = 0.97, F = 60.38, and SEE = 0.071.

The high cross-validated coefficients Q2 (0.672 and 0.534 for CoMFA and CoMSIA, respectively) confirm a good correlation between pLD50 activity and the molecular field descriptors calculated for the training-set compounds. The CoMFA and CoMSIA models can therefore be considered statistically acceptable by internal validation. Success in internal validation alone, however, does not demonstrate that the models can predict the pLD50 of molecules outside the training set. Their robustness was therefore verified by external validation, applying both models to the seven test-set molecules. The high \({R}_{\mathrm{pred}}^{2}\) values obtained for the CoMFA and CoMSIA models (0.918 and 0.94, respectively) indicate high predictive power. Both models can thus be used to predict the activity of newly designed triterpene derivatives and to guide the improvement of their insecticidal and antibacterial activity.

In addition, a bootstrap analysis of 100 runs was used to estimate the bias of the statistical parameters computed from the original data set on which the two 3D-QSAR models were built, and thus to verify that this data set was adequate. The bootstrap analysis gave high \({R}_{bs}^{2}\) values (0.96 for CoMFA and 0.97 for CoMSIA) with SEEbs values of 0.008 and 0.053, respectively. These results indicate no significant deviation between the statistical parameters calculated from the original data and the mean parameters over the 100 runs. The data set prepared in this work was therefore adequate for developing the CoMFA and CoMSIA models, and the statistical parameters obtained in the internal and external tests are reliable.

The relative contributions of the molecular field descriptors to the CoMFA and CoMSIA models were also examined. In the CoMFA model, the steric (S) and electrostatic (E) fields contribute 38% and 62%, respectively. In the CoMSIA model, the steric (S), electrostatic (E), hydrophobic (H), hydrogen bond donor (D), and hydrogen bond acceptor (A) fields contribute 11.9%, 29.5%, 19.3%, 12.8%, and 22.6%, respectively. These substantial contributions indicate that the pLD50 activity of the triterpene derivatives is strongly related to their steric, electrostatic, hydrophobic, and hydrogen bonding properties. Tuning these properties in the structure of triterpene derivatives may therefore improve their activity against insects and bacteria.

Table 3 gathers the experimental and predicted pLD50 values and the residuals of the training and test sets obtained with the CoMFA and CoMSIA models.

Table 3 Experimental pLD50 values, values predicted by the CoMFA and CoMSIA analyses, and residuals

The residual plots for the training and test sets obtained from the CoMFA and CoMSIA models are shown in Fig. 3A and B, respectively. Figure 4 shows the correlation between predicted and experimental activities for the training and test sets obtained from the CoMFA and CoMSIA models.

Fig. 3

The residuals plot for the training and test sets obtained from the CoMFA (A) and CoMSIA (B) models

Fig. 4

Experimental versus predicted activity of the training and testing set according to the CoMFA and CoMSIA models

Figure 4 shows that the observed pLD50 values are distributed linearly and closely around the values predicted by CoMFA and CoMSIA. This linear relationship is consistent with the high R2 and low SEE values of the two models and confirms the performance of the 3D-QSAR models. To further confirm that this performance is not due to chance, the Y-randomization test was performed five times; the results are presented in Table 4.

Table 4 The Q2 and R2 values after the Y-randomization tests

Table 4 shows that the Q2 and R2 values obtained after five randomization runs are lower than those of the original models, and that the \({}^{c}R_{p}^{2}\) values are greater than 0.5. The strong correlation between the pLD50 activity of each molecule and the three-dimensional molecular field descriptors S, E, H, D, and A is therefore not due to chance.

Contour maps analysis

The CoMFA contour maps indicate the regions in space where the aligned molecules can interact favorably or unfavorably with the receptor. Mapped onto the 3D structure of the active molecules, the CoMSIA contour maps encode the physicochemical properties and the regions where substituents are likely to interact with the target receptor [45]. The contour maps are generated from the contribution of each molecular field descriptor to the CoMFA and CoMSIA models and are displayed on the structures of the examined molecules, with favorable and unfavorable regions shown at the 80% and 20% contribution levels, respectively. In this work, the structure of molecule 16, which has the highest observed pLD50 (6.22) in the triterpene derivative series, was used as the reference for analyzing the contour maps.

CoMFA steric and electrostatic fields

Figure 5A–C shows the positions of the steric (green and yellow, Fig. 5B) and electrostatic (blue and red, Fig. 5C) contour maps on the structure of template molecule 16 obtained from the CoMFA model.

Fig. 5

A Combination of steric and electrostatic fields, B steric contour maps, and C electrostatic contour maps based on the CoMFA model

In Fig. 5B, green contours indicate regions where bulky groups increase the activity of the template molecule, while yellow contours indicate regions where bulky groups decrease it. In Fig. 5C, red contours indicate regions where negatively charged groups improve pLD50 activity, while blue contours indicate regions where positively charged groups improve it.

CoMSIA fields

Figure 6A–E shows that the S and E fields of the CoMFA and CoMSIA models are similar and consistent, while the other fields (H, D, and A) are specific to the CoMSIA model.

Fig. 6

A Steric fields, B electrostatic fields, C hydrophobic fields, D hydrogen bond donor fields, and E hydrogen bond acceptor fields

The colors of the steric (green and yellow, Fig. 6A) and electrostatic (blue and red, Fig. 6B) contour maps have the same meaning as in the CoMFA model: they mark structural properties favorable or unfavorable to pLD50 activity.

Figure 6C shows the hydrophobic and hydrophilic field regions on the structure of reference molecule 16. Yellow contours indicate positions where hydrophobic groups enhance pLD50 activity, while white contours indicate positions unfavorable to hydrophobic groups but favorable to hydrophilic groups. In Fig. 6E, magenta contours show regions where hydrogen bond acceptor groups increase pLD50 activity, whereas red contours show regions unfavorable to hydrogen bond acceptor groups.

CoMFA contour maps

In Fig. 5B, the yellow contour immediately surrounding the 3,3-dimethylcyclohex-1-ene group indicates that adding bulky groups at this site may decrease the biological activity of molecule 16. In contrast, the green contour surrounding the aromatic ring indicates that this site is suitable for substitution with bulky groups to improve the anti-insect and antibacterial activity (pLD50). The small green region around the forward-pointing methyl group on the 3,3-dimethylcyclohex-1-ene ring suggests that this methyl group could be replaced by another group to achieve the desired activity. Similarly, the large green region around the isobutane group at the right end of the structure suggests that this group could be replaced by a larger one. In Fig. 5C, blue contours dominate over red contours near the 3,3-dimethylcyclohex-1-ene ring, which means that the insecticidal and antibacterial activity of the triterpene derivatives could be enhanced by substituents with electron-donating inductive (+I) effects on this group. Both red and blue contours appear near the oxygen atom of the 4-methylcyclohex-2-en-1-one ring, suggesting that keeping this oxygen atom unmodified is favorable for preserving the activity of the template molecule.

CoMSIA contour maps

The contour maps in Fig. 6 generated from the CoMSIA model show that the positions of the green and yellow contours (Fig. 6A) closely match those of the CoMFA contour maps, so the same structural features deduced from the CoMFA analysis apply.

Similarly, the predominance of blue over red electrostatic contours in Fig. 6B strongly supports the results obtained from the electrostatic contours (Fig. 5C) of the CoMFA model. However, a new red contour, not predicted by the CoMFA model, appears around an ethene group. The pLD50 activity of the template molecule could therefore be improved by adding substituents with electron-withdrawing inductive (-I) effects near the ethene group.

The hydrophobic field contours in Fig. 6C show numerous white contours along the structure of the reference molecule. This suggests that hydrophilic groups along the triterpene scaffold favor activity, a property that may also favor the uptake and absorption of these molecules by living organisms. However, a yellow contour covers the oxygen atom site of the 4-methylcyclohex-2-en-1-one group, indicating that hydrophobic substituents on this ring could enhance the activity of template molecule 16.

The contour maps of the hydrogen bond donor (Fig. 6D) and hydrogen bond acceptor (Fig. 6E) fields show spherical contours near the 3,3-dimethylcyclohex-1-ene ring, indicating that adding hydrogen bond donor substituents to this ring favors pLD50 activity. Magenta contours also surround the two oxygen atoms of the 4,8-dimethyl-2,3,4,6,7,8-hexahydronaphthalene-1,5-dione rings, indicating that hydrogen bond acceptor substituents at the dione sites favor the biological activity of the triterpene derivatives against insects and bacteria. The proposed structural modifications are summarized in Fig. 7.

Fig. 7

Proposed structural modifications of oxo-dehydroepiandrosterone derivatives for the design of new potent and more selective molecules

Design of new compounds

Based on the 3D-QSAR study, we designed thirty-eight new derivatives (28–65) of 7-oxo-dehydroepiandrosterone by modifying the chemical structure of reference compound 16. The modifications were chosen based on our experience in synthetic chemistry, following the reactions proposed in Schemes 1 and 2.

Scheme 1

Proposed synthetic pathways of the designed compounds

Scheme 2

Proposed synthetic pathway of designed compounds 55–65

In the present study, we applied the 3D-QSAR models to predict the pLD50 activity of the proposed new compounds. The proposed new molecules and their predicted pLD50 activities are presented in Table 5.

Table 5 The predicted activities of the newly designed compounds

The pLD50 values predicted by the 3D-QSAR models (Table 5) indicate that the newly designed triterpene derivatives should be highly lethal to the insect Mythimna separata, i.e., have low median lethal doses (LD50). These molecules could be developed in the future as new insecticidal agents that may be harmless to the environment. Their pharmacokinetics can also be assessed to explore biomedical uses, for example, in the treatment of bacterial infections in humans. In the rest of this study, we first predict the pharmacological properties of the new molecules with the highest pLD50 values, to evaluate whether they could be used as antibacterial drugs or whether their high pLD50 activity comes with unfavorable in silico pharmacological properties. Second, we assess the potential of selected designed molecules as insecticides by molecular docking.

ADME-Tox prediction and drug-like character

Evaluation of drug-like properties

Many potential therapeutic agents fail in clinical trials because of unfavorable absorption, distribution, metabolism, and elimination (ADME) characteristics [46]. We therefore investigated these properties in silico for the newly designed molecules to assess their potential as drugs. In silico ADME screening is now widely used to identify drug candidates, which should satisfy the rules of Lipinski et al. [47], Veber et al. [48], and Egan et al. [49]. Drug-likeness assessment determines, from a molecule's properties, whether it is a plausible candidate for use as an antibacterial drug. To select the best candidates rigorously, other significant properties such as the topological polar surface area (TPSA), the number of rotatable bonds, and the molar refractivity were also determined. The pharmacokinetic properties of 17 of the 38 newly designed molecules, those with the highest predicted pLD50 values, were evaluated with the SwissADME online server [50]; the results are presented in Table 6.
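The three filters can be written as simple rule checks. The Python sketch below uses the classic thresholds (Lipinski: MW ≤ 500, logP ≤ 5, at most 5 H-bond donors and 10 acceptors, with one violation tolerated; Veber: at most 10 rotatable bonds and TPSA ≤ 140 Å²; Egan, with the thresholds used by SwissADME: WLOGP ≤ 5.88 and TPSA ≤ 131.6 Å²). For brevity a single logP value feeds both the Lipinski and Egan checks, whereas SwissADME uses different logP estimates for each filter; the property values in the example are hypothetical and not taken from Table 6.

```python
def lipinski_violations(p: dict) -> int:
    """Count rule-of-five violations: MW > 500, logP > 5, HBD > 5, HBA > 10."""
    return sum([p["mw"] > 500, p["logp"] > 5, p["hbd"] > 5, p["hba"] > 10])


def passes_veber(p: dict) -> bool:
    """Veber: at most 10 rotatable bonds and TPSA <= 140 Å^2."""
    return p["rot_bonds"] <= 10 and p["tpsa"] <= 140


def passes_egan(p: dict) -> bool:
    """Egan (SwissADME thresholds): WLOGP <= 5.88 and TPSA <= 131.6 Å^2."""
    return p["logp"] <= 5.88 and p["tpsa"] <= 131.6


def druglike_report(p: dict) -> dict:
    """Apply the three filters; one Lipinski violation is tolerated."""
    return {
        "Lipinski": lipinski_violations(p) <= 1,
        "Veber": passes_veber(p),
        "Egan": passes_egan(p),
    }


if __name__ == "__main__":
    # Hypothetical descriptor values for a triterpene-like molecule.
    mol = {"mw": 440.7, "logp": 5.5, "hbd": 1, "hba": 2, "rot_bonds": 4, "tpsa": 34.1}
    print(druglike_report(mol))
```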

Table 6 Drug-likeness properties of the selected compounds

According to Table 6, all the selected molecules from the designed data set satisfy Lipinski's and Veber's rules, and all of them except molecules 50 and 53 also satisfy Egan's rule. This indicates that oral bioavailability should not be a problem for these compounds, with the possible exception of molecules 50 and 53. The results also show that molecules 48, 49, 54, 55, 56, 57, 58, 59, 61, 64, and 65 have a high absorption capacity, whereas molecules 44, 45, 50, 51, 52, and 53 have a low absorption capacity. In addition, the synthetic accessibility (SA) scores of all the proposed compounds are below 10 on the SwissADME scale, which ranges from 1 (very easy) to 10 (very difficult); lower scores indicate easier synthesis, and the values obtained suggest that these molecules are synthetically feasible. Oral bioavailability (%F) is the fraction of an administered dose that reaches the systemic circulation. The Abbott bioavailability score reported by SwissADME estimates the probability that a compound has F > 10% in rat; a score of 0.55 (55%) indicates that the compound satisfies Lipinski's rule of five. All the proposed molecules obtained a score of 0.55, consistent with good oral bioavailability. Activity artifacts are a major problem in biological screening and medicinal chemistry. They are often caused by aggregate formation or by the reactivity of the tested compounds under assay conditions. Pan-assay interference compounds (PAINS) contain substructures that have been identified as frequent causes of false-positive results [51]. The PAINS alerts for the proposed compounds are shown in Table 6. All compounds returned zero PAINS alerts, so they are unlikely to produce such assay artifacts.
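The 0.55 value follows from the way the Abbott bioavailability score is assigned. A minimal sketch, assuming only the non-anionic branch of that score (0.55 when a compound passes the rule of five, 0.17 when it fails) and assuming that "passing" means at most one rule-of-five violation; the TPSA-dependent branch used for anionic compounds is deliberately not implemented, and the function name is hypothetical.

```python
def abbott_score(ro5_violations: int, is_anion: bool = False) -> float:
    """Simplified Abbott bioavailability score (probability that F > 10% in rat).

    Only the neutral/cationic branch is implemented (an assumption of this
    sketch): 0.55 if the compound passes the rule of five, 0.17 otherwise.
    """
    if is_anion:
        raise NotImplementedError("TPSA-based branch for anions not implemented")
    if ro5_violations < 0:
        raise ValueError("violation count cannot be negative")
    return 0.55 if ro5_violations <= 1 else 0.17


for n in (0, 1, 2):
    print(n, abbott_score(n))
```

Because the score is a step function of rule-of-five compliance, compounds that all pass the rule necessarily share the same 0.55 value, as observed for the proposed molecules.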

ADME-Tox properties predictions

To predict the in silico ADME-Tox properties of the selected molecules (48, 49, 54, 55, 56, 57, 58, 59, 61, 64, and 65) and assess their pharmacological interest, we used the online tools pkCSM [52] and SwissADME [53]. The results are reported in Table 7.

Table 7 In silico predicted ADME properties for the selected compounds

Based on the results depicted in Table 7, we can conclude that:

  • All molecules showed a high predicted intestinal absorption; absorption is considered satisfactory when it exceeds 30% [52].

  • In terms of distribution, the steady-state volume of distribution (log VDss, in log L/kg) is considered high when it exceeds 0.45 [54]. Blood–brain barrier (BBB) permeability is considered good when logBB > 0.3 and poor when logBB < −1. For central nervous system (CNS) permeability, compounds with logPS > −2 are considered able to penetrate the CNS, whereas compounds with logPS < −3 are considered unable to do so [52]. Most of the selected compounds are predicted to cross these barriers, except molecule 48, which shows poor BBB permeability.

  • In terms of metabolism, cytochrome P450 (CYP) enzymes are important for detoxification and are found in most tissues of the body [55]. They oxidize xenobiotics (foreign compounds) to facilitate their excretion. Many drugs are inactivated by CYP enzymes, and some prodrugs are activated by them. Inhibitors of these enzymes can disrupt drug metabolism and produce effects opposite to those desired. Assessing the ability of compounds to inhibit CYP enzymes is therefore important for predicting drug interactions and toxicity. CYP2D6 and CYP3A4 are two of the main isoforms responsible for drug metabolism [52]. All the designed molecules are predicted to be CYP3A4 substrates, and molecules 55, 56, 59, and 64 are predicted to be both substrates and inhibitors of CYP3A4.

  • Total clearance is important for determining the doses needed to achieve steady-state drug concentrations [52]. The lower the total clearance, the longer the drug is likely to remain in the body and the more likely it is to reach its therapeutic target before being eliminated. The predicted total clearance of all newly designed molecules is below 0.5 log(mL min−1 kg−1), which increases the likelihood that an administered dose will reach the therapeutic target.

  • For toxicity, it is essential to check that the predicted compounds are non-toxic, as this is important for selecting drug candidates. We used the AMES test prediction, since this test is widely used to assess the mutagenic potential of compounds [56]. Although all compounds in the data set are toxic (they were studied for their insecticidal activity), none of the proposed molecules is predicted to be AMES toxic. Table 7 shows that the predicted oral rat acute toxicity (LD50) values of the newly designed molecules range from 2.033 to 2.98 mol/kg, which indicates that these compounds would be lethal only at very high doses. Overall, the ADME-Tox predictions for the selected designed compounds indicate good pharmacokinetic properties, so these molecules could be proposed as starting points for the development of new antibacterial drugs.
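The cutoffs quoted in the list above can be collected into a small helper that labels a compound's predicted values. This Python sketch uses only the thresholds stated in the text (pkCSM-style log units); the function name, the category labels, and the example values are hypothetical, and values between two cutoffs are simply labeled "intermediate".

```python
def interpret_adme(hia_pct, log_vdss, log_bb, log_ps, log_cl_tot):
    """Label predicted ADME values using the cutoffs quoted in the text."""
    labels = {}
    # Human intestinal absorption: satisfactory above 30 %.
    labels["absorption"] = "satisfactory" if hia_pct > 30 else "poor"
    # Volume of distribution (log L/kg): high above 0.45.
    labels["VDss"] = "high" if log_vdss > 0.45 else "not high"
    # Blood-brain barrier: good above 0.3, poor below -1.
    if log_bb > 0.3:
        labels["BBB"] = "good"
    elif log_bb < -1:
        labels["BBB"] = "poor"
    else:
        labels["BBB"] = "intermediate"
    # CNS permeability: penetrant above -2, non-penetrant below -3.
    if log_ps > -2:
        labels["CNS"] = "penetrant"
    elif log_ps < -3:
        labels["CNS"] = "non-penetrant"
    else:
        labels["CNS"] = "intermediate"
    # Total clearance (log mL/min/kg): the text treats values below 0.5 as favorable.
    labels["clearance"] = "below 0.5" if log_cl_tot < 0.5 else "0.5 or above"
    return labels


# Hypothetical values for one compound (not taken from Table 7).
result = interpret_adme(hia_pct=95.0, log_vdss=0.1, log_bb=-0.2,
                        log_ps=-2.5, log_cl_tot=0.2)
print(result)
```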

Based on these results, several molecules could be synthesized by modifying the structure of the reference molecule 16, using our expertise in organic synthesis, and then tested for their biological activity against different biological targets. The development of new classes of antibacterial agents against carefully selected targets is a high-priority task. In the next section, as an example, we use molecular docking to study the potential of molecules 16, 55, 56, 59, and 64 to inhibit one of the most important pathways in bacterial diseases. This analysis identifies the most important binding modes and types of interactions between the semi-synthesized ligand 16, the designed molecules (55, 56, 59, and 64), and the active site of the MurE receptor, a target whose inhibition contributes to blocking bacterial growth.

Molecular docking tests

Bacterial growth inhibitors

For this purpose, we docked ligands 16, 55, 56, 59, and 64 into the MurE receptor using the AutoDock Vina software [57]. As a first step, all water molecules and other non-protein entities were removed from the raw structure of the unbound MurE protein. The docking results are presented in Table 8, the crystal structure of the MurE receptor is shown in Fig. 8, and the interaction modes obtained for ligands 16, 55, 56, 59, and 64 are shown in Fig. 9.
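Removing water molecules and other non-protein entities usually amounts to filtering out the HETATM records of the PDB file. A minimal Python sketch, assuming a standard fixed-column PDB file in which the protein is stored as ATOM records; note that it also drops modified residues stored as HETATM (such as selenomethionine, MSE), which may need to be kept in practice. The function name and the example records are hypothetical.

```python
def strip_non_protein(pdb_text: str) -> str:
    """Keep only protein ATOM records (plus TER/END) from PDB-format text.

    The record name occupies columns 1-6 of each line; HETATM records
    (waters such as HOH, ions, co-crystallized ligands) are discarded.
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("ATOM", "TER", "END"):
            kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical records: one protein atom and one crystallographic water.
raw = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "HETATM 2001  O   HOH A 301       1.000   2.000   3.000  1.00  0.00           O",
    "TER",
    "END",
])
cleaned = strip_non_protein(raw)
print(cleaned)
```

The cleaned receptor would then typically be protonated and converted to the PDBQT format expected by AutoDock Vina, for example with AutoDockTools or Open Babel.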

Table 8 Interaction modes of ligands 16, 55, 56, 59, and 64 with the receptor sites
Fig. 8

The crystal structure of the MurE protein

Fig. 9

2D and 3D docking poses of compounds 16 (X-A), 55 (X-B), 56 (X-C), 59 (X-D), and 64 (X-E) in the MurE active site

As shown in Table 8, the designed compounds have binding affinities between −7.9 and −10.0 kcal/mol, whereas the reference molecule 16 has a binding affinity of −8.1 kcal/mol. This suggests that the designed molecules 55, 56, and 59 bind more stably in the MurE pocket than molecule 16, whereas molecule 64 (−7.9 kcal/mol) is slightly less stable. All the selected designed molecules also form hydrogen bonds and hydrophobic interactions with the MurE receptor. The binding of molecules 55, 56, 59, and 64, as well as molecule 16, to MurE could therefore interfere with the function of this enzyme, which catalyzes a step of peptidoglycan cell-wall biosynthesis, and thereby inhibit bacterial growth. Overall, the docking analysis shows that the designed molecules adopt favorable binding modes in the MurE receptor, which supports the ability of the 3D-QSAR study to identify the structural sites that control the antibacterial activity.
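To give a feel for what these differences in binding affinity mean, a docking score can be converted into an approximate inhibition constant with Ki = exp(ΔG/RT). This is only a rough rule of thumb, since Vina scores are empirical estimates rather than measured free energies. The Python sketch below assumes T = 298.15 K and uses the −8.1, −7.9, and −10.0 kcal/mol values quoted above (−10.0 being the upper end of the reported range, not a value attributed to a specific molecule); the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ki_from_affinity(dg_kcal: float, temp_k: float = 298.15) -> float:
    """Approximate Ki (mol/L) from a binding free energy: Ki = exp(dG / RT)."""
    return math.exp(dg_kcal / (R_KCAL * temp_k))


def fold_vs_reference(dg_kcal: float, dg_ref_kcal: float,
                      temp_k: float = 298.15) -> float:
    """Ratio Ki(reference) / Ki(ligand): > 1 means tighter binding than the reference."""
    return ki_from_affinity(dg_ref_kcal, temp_k) / ki_from_affinity(dg_kcal, temp_k)


print(f"Ki(16, -8.1 kcal/mol) ~ {ki_from_affinity(-8.1):.2e} M")
print(f"-10.0 vs 16: {fold_vs_reference(-10.0, -8.1):.1f}-fold")
print(f"64 (-7.9) vs 16: {fold_vs_reference(-7.9, -8.1):.2f}-fold")
```

Under this approximation, a difference of about 1.36 kcal/mol (RT ln 10 at room temperature) corresponds to roughly a tenfold change in Ki.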

Inhibitors of insect proliferation “insect Mythimna separata model”

The inhibition of the ecdysone receptor (EcR) by the triterpene derivatives was studied in the following steps. First, we re-docked the ligand Tebufenozide into the EcR receptor to identify the active sites in the EcR protein pocket that serve as reference sites for EcR inhibitory activity. The re-docking procedure also allowed us to validate the molecular docking protocol carried out with AutoDock Vina in this work. Second, we docked ligands 16, 55, 56, and 59 into the EcR protein pocket and compared the binding energies of the resulting complexes, as well as the number of interactions formed between each ligand and the newly identified active sites involved in EcR inhibition. In this study, we regard the newly identified active sites that form the most stable interactions with the ligands as the most important reference sites for inhibiting EcR protein activity.

The docking protocol was validated by re-docking the native ligand (Tebufenozide) into the EcR receptor pocket. A root mean square deviation (RMSD) below 2 Å between the original and re-docked poses is considered acceptable [58]. Before re-docking, the co-crystallized ligand in the EcR complex interacts with the active sites Val416, Met381, Met380, Trp526, Tyr403, Thr343, Leu420, Ile339, Cys508, Leu511, Tyr408, Met507, and Asn504. Figure 10 shows 3D and 2D visualizations of the EcR protein complexed with the inhibitor Tebufenozide (green) in chain D of the protein structure (purple), and the interactions between Tebufenozide and the active amino acid residues in the EcR pocket (PDB code 1R20).

Fig. 10

A Original model of EcR protein, B and C 2D and 3D visualizations of tebufenozide’s ligand interactions

Figure 11 shows the re-docked conformation (red) superimposed on the original ligand (green); the RMSD between them is 1.97 Å. The close superposition of the two poses and the RMSD below 2 Å indicate that AutoDock Vina reproduces the crystallographic pose reliably, so ligands 16, 55, 56, and 59 can be docked into the EcR receptor pocket with this protocol.
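The RMSD used for this validation is RMSD = sqrt((1/N) Σ ||r_i − r'_i||²) over the N matched ligand atoms. A minimal Python sketch, assuming both poses list the same heavy atoms in the same order and are already expressed in the receptor frame (so no superposition is applied); it ignores molecular symmetry, which docking tools often account for. The function name and coordinates in the example are hypothetical.

```python
import math


def pose_rmsd(ref, pose):
    """Symmetry-unaware RMSD (Å) between two poses of the same ligand.

    `ref` and `pose` are lists of (x, y, z) tuples for the same atoms in the
    same order, already in the receptor frame (no fitting is performed).
    """
    if len(ref) != len(pose) or not ref:
        raise ValueError("poses must have the same, non-zero number of atoms")
    squared = sum((a - b) ** 2 for p, q in zip(ref, pose) for a, b in zip(p, q))
    return math.sqrt(squared / len(ref))


# Hypothetical three-atom ligand, re-docked pose shifted by 1 Å along z.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
redocked = [(x, y, z + 1.0) for x, y, z in crystal]

rmsd = pose_rmsd(crystal, redocked)
print(f"RMSD = {rmsd:.2f} Å, protocol accepted: {rmsd < 2.0}")
```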

Fig. 11

Re-docking pose with an RMSD value of 1.97 Å (green = original, red = docked)

The 3D visualization in Fig. 12 shows that the original and re-docked ligands interact with the same key active sites of the EcR protein.

Fig. 12

3D visualization comparison between ligand pose prediction and the crystallographic ligand pose for EcR

Figure 13 shows the predicted docking poses of ligands 16, 55, 56, and 59 in the EcR receptor pocket and their most important interactions with the reference sites through which the EcR activity responsible for the proliferation of the insect Mythimna separata is inhibited.

Fig. 13

Interactions of ligands 16 (Y-A), 55 (Y-B), 56 (Y-C), and 59 (Y-D) with the most important active sites in the EcR receptor pocket

We have summarized the interactions of ligands 16, 55, 56, and 59 with the novel active sites identified by the molecular docking procedure in this work in Table 9.

Table 9 Docking results of ligands 16, 55, 56, and 59 at the novel receptor sites

Table 9 shows that the binding energies of the four studied ligands are close to each other, ranging from −7.9 to −9 kcal/mol. We therefore consider the four molecules to be similarly and reasonably stable in the EcR receptor pocket. Table 9 also gathers the most important active sites that contribute to the inhibition of EcR activity and hence of the growth of Mythimna separata. In future work, in vitro tests will be needed to confirm these results; this study could then support the development of new agents, based on the studied triterpene derivatives, against environmentally harmful insect pests.

Conclusion

The aim of this study was to improve the antibacterial and insecticidal properties of the studied triterpene derivatives through structural modifications. To this end, we performed an in silico study based on 3D molecular modeling techniques applied to 27 semisynthetic triterpene derivatives. 3D-QSAR models were developed with the CoMFA and CoMSIA techniques, and their performance was checked by internal and external validation as well as the Y-randomization test. From these results, we identified the structural properties most favorable for improving the antibacterial and insecticidal activities of the semi-synthesized triterpene derivatives. The resulting contour maps allowed us to design 38 new derivatives and to predict their pLD50 against bacteria and insects. Among the 38 designed derivatives, 17 molecules showed high and promising predicted pLD50 values.

The 17 newly designed molecules were screened in silico for drug-likeness and ADME-Tox properties to evaluate whether they could serve as antibacterial drugs in addition to their use as insecticides. Based on the ADME-Tox predictions, molecules 55, 56, 59, and 64 were identified as favorable candidates for antibacterial drugs. In addition, the predicted AMES test indicated that these molecules are not mutagenic, supporting their safety as oral drugs. The predicted oral rat acute toxicity (LD50) values also indicate that the designed molecules would become toxic only at high doses. Together, these predictions support the safety of the four designed molecules as promising antibacterial drugs.

Furthermore, molecular docking revealed the most important interactions between the four proposed molecules (55, 56, 59, and 64), the reference molecule 16, and the targeted proteins (MurE and EcR). We thus predicted the reference sites that can be targeted to inhibit the MurE protein (antibacterial activity) and to inhibit insect growth and reproduction through the EcR protein. The docking results also show that the designed molecules 55, 56, and 59 are more stable in the pockets of the targeted proteins than the reference molecule 16, as indicated by a comparison of the binding energies of the resulting ligand–receptor complexes. Finally, the results of this study suggest that the studied triterpene derivatives are promising candidates for development as antibacterial and insecticidal agents.