Introduction

At the end of February 2003, a novel human coronavirus, known as SARS coronavirus (SARS CoV), was identified as the causative agent of severe acute respiratory syndrome (SARS), the first major pandemic of the twenty-first century [2]. The first case of "atypical pneumonia" was reported in China, and the disease spread quickly and unexpectedly to 29 countries, particularly in Asia and North America, alarming the World Health Organization (WHO). Within several months of the 2003 outbreak, the WHO reported 916 deaths among 8422 cases worldwide (case fatality rate of 10–15%) [1].

COVID-19, the ongoing pandemic first reported in late 2019 in Wuhan, China, is caused by SARS-CoV-2, which was announced as the causative agent in February 2020. As of October 24, 2021, 243 million cases and over 4.9 million deaths had been reported. The 3C-like protease (3CLpro), also known as the main protease (Mpro), is essential for viral replication and infection, which makes it an attractive target for antiviral therapy [1]. Coronavirus 3CLpro is a cysteine protease of about 300 amino acids organized in three domains. Domains I (residues 8–99) and II (residues 100–183) are β-barrels that resemble those of chymotrypsin and the 3C proteases, and the substrate-binding site lies between them. A loop of about 16 residues connects domains I and II to the C-terminal domain III (residues 200–300), a cluster of five α-helices that is involved in regulating the proteolytic activity of 3CLpro [3]. The 3CLpro structure is highly conserved among known coronaviruses, and the substrates of different coronavirus 3CLpro enzymes share several common features [4]. Comparative sequence analysis has shown that the 3CLpro enzymes of SARS-CoV-2, SARS-CoV, and MERS-CoV are highly similar in structure and sequence conservation [5]. These findings suggest that 3CLpro could serve as a common target for anti-coronavirus drugs that inhibit the replication of various coronaviruses [4].

Several studies have reported treating SARS with nucleoside analogues such as ribavirin in combination with corticosteroids such as methylprednisolone and hydrocortisone [6,7,8,9]. Since the beginning of the COVID-19 pandemic, various treatment options have been used, including monoclonal antibodies, protease inhibitors, corticosteroids, and convalescent plasma. However, the efficacy of these drugs has not been definitively established.

Previous research has shown that isatin and its derivatives have a broad range of antibacterial and antiviral activities, including anti-HIV [10, 11], anti-rhinovirus [12], and anti-Mycobacterium tuberculosis [13] activity. The derivatized isatin scaffold may be a good starting point for SARS CoV 3CLpro inhibitors, because the SARS CoV and human rhinovirus proteases are both cysteine proteases with structurally similar active sites [14].

In 2005, Chen et al. hypothesized that N-substituted isatin derivatives with anti-rhinovirus activity might also have anti-SARS activity. Based on these compounds, they synthesized new isatin derivatives and evaluated them against SARS CoV 3CLpro; the derivatives inhibited the enzyme in the low micromolar range (IC50 = 0.95–17.50 µM) [15]. Building on this work, Zhou et al. designed and synthesized a series of N-substituted 5-carboxamide isatins and reported several SARS CoV 3CLpro inhibitors, the most potent of which had an IC50 of 0.37 µM [2]. In 2014, to further improve the inhibitory activity of isatin derivatives against SARS CoV 3CLpro, Liu et al. replaced the carboxamide group with a series of substituted sulfonamide groups. Optimization of these 5-sulfonyl isatin derivatives led to the most potent compound of that series (IC50 = 1.04 µM) [16].

Quantitative structure–activity relationship (QSAR) modeling is a key computational technique in ligand-based drug design that statistically correlates the structural features of compounds with their biological activity [17]. Molecular docking is a computational technique that predicts the preferred binding mode of two molecules, typically a small ligand and a protein receptor [18], and is widely used in drug discovery [19]. CORAL is a relatively new software package for developing reliable and predictive QSAR/QSPR models from SMILES or quasi-SMILES representations using Monte Carlo optimization [17, 20].

The main goal of this study is to develop simple and reliable QSAR models with CORAL to predict the inhibitory activity of 81 isatin- and indole-based compounds against SARS CoV 3CLpro. We also investigate the effect of including the index of ideality of correlation (IIC) in the objective function used for modeling in CORAL [21]. The Monte Carlo optimization-based QSAR models are further complemented by molecular docking studies of this pharmacologically important endpoint. We present the SMILES-based optimal descriptors, defined as molecular fragments, that were identified as the main contributors to increases or decreases in biological activity, and we use them to search the ChEMBL database for compounds with the targeted activity. Finally, molecular docking is applied as an additional method to validate the calculated activity of the proposed compounds as novel SARS CoV 3CLpro inhibitors.

Data and methods

Dataset

In this study, 81 isatin- and indole-based SARS 3CLpro inhibitors were collected from the literature [2, 15, 16, 22,23,24,25]: 41 isatin-based and 40 indole-based compounds. The IC50 values (µM) were converted to pIC50 = −log10 IC50, with IC50 expressed in mol/L. Table 1 shows the structures of the molecules together with their pIC50 values, which range from 4.08 to 7.77. The molecular structures were drawn and converted to SMILES strings with BIOVIA Draw 2020. Four independent random splits of the dataset were generated to build the Monte Carlo optimization-based QSAR models. In each split, the compounds were randomly assigned to active training (23–25 compounds, ≈30%), passive training (19–21, ≈25%), calibration (16, ≈20%), and validation (20–22, ≈25%) sets.
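As a minimal sketch of the preprocessing described above, the snippet below converts IC50 values in µM to pIC50 and draws four random partitions using the subset sizes reported with Eqs. 12–15. The compound identifiers and random seeds are hypothetical, and CORAL performs its own splitting; this is only an illustration of the idea.

```python
import math
import random

def pic50_from_um(ic50_um: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); 1 µM = 1e-6 M, so pIC50 = 6 - log10(IC50 in µM)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return 6.0 - math.log10(ic50_um)

def random_split(ids, sizes, seed):
    """Shuffle ids with a fixed seed and cut them into subsets of the given sizes."""
    if sum(sizes.values()) != len(ids):
        raise ValueError("subset sizes must add up to the number of compounds")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    subsets, start = {}, 0
    for name, n in sizes.items():
        subsets[name] = shuffled[start:start + n]
        start += n
    return subsets

# Subset sizes (active training, passive training, calibration, validation)
# as reported with Eqs. 12-15.
SPLIT_SIZES = [
    {"ATRN": 25, "PTRN": 20, "CAL": 16, "VAL": 20},  # Split 1
    {"ATRN": 24, "PTRN": 19, "CAL": 16, "VAL": 22},  # Split 2
    {"ATRN": 23, "PTRN": 20, "CAL": 16, "VAL": 22},  # Split 3
    {"ATRN": 24, "PTRN": 21, "CAL": 16, "VAL": 20},  # Split 4
]

compound_ids = list(range(1, 82))  # hypothetical IDs for the 81 compounds
splits = [random_split(compound_ids, sizes, seed=i) for i, sizes in enumerate(SPLIT_SIZES)]

print(round(pic50_from_um(0.37), 2))  # 0.37 µM -> 6.43
```

Using a seeded `random.Random` instance per split keeps each partition reproducible without touching the global random state.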

Table 1 Molecular structures of isatin and indole derivatives along with their pIC50

Descriptors

CORAL offers three categories of optimal descriptors: SMILES-based, graph-based, and hybrid descriptors that combine SMILES with the molecular graph. The optimal descriptor used in this work is a hybrid of the hydrogen-suppressed graph (HSG) and SMILES attributes, defined for the pIC50 of isatin- and indole-based SARS 3CLpro inhibitors as:

$$\text{DCW}\left(\text{T},\text{ N}\right)=\sum \text{CW}\left({\text{S}}_{\text{k}}\right) +\sum \text{CW}\left({\text{SS}}_{\text{k}}\right)+\sum \text{CW}\left({\text{SSS}}_{\text{k}}\right)+\text{CW}\left(\text{BOND}\right)+\text{CW}\left(\text{NOSP}\right)+\text{CW}\left(\text{HALO}\right)+\text{CW}\left(\text{HARD}\right)+\text{CW}\left(\text{PAIR}\right)+\text{CW}\left(\text{Cmax}\right)+\text{CW}\left(\text{Nmax}\right)+\text{CW}\left(\text{Omax}\right)+\text{CW}\left(\text{Smax}\right)+\text{CW}\left(\text{C}5\right)+\text{CW}\left(\text{C}6\right)$$
(1)

where \({\text{S}}_{\text{k}}\), \({\text{SS}}_{\text{k}}\), and \({\text{SSS}}_{\text{k}}\) are one-, two-, and three-character SMILES features, respectively. BOND is a global SMILES descriptor indicating the presence or absence of double (=), triple (#), and stereochemical (@) bonds. NOSP indicates the presence or absence of nitrogen, oxygen, sulfur, and phosphorus atoms in the SMILES string, and HALO the presence or absence of halogens. HARD combines BOND, NOSP, and HALO. Cmax, Nmax, and Omax denote the maximum number of rings (range 0–9), the maximum number of nitrogen atoms, and the maximum number of oxygen atoms in the molecular structure, respectively, while C5 and C6 indicate the presence of five- and six-membered rings. CW(x) is the correlation weight of a SMILES feature or HSG invariant x.
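To make Eq. 1 more concrete, here is a simplified sketch of how the local attributes \({\text{S}}_{\text{k}}\), \({\text{SS}}_{\text{k}}\), and \({\text{SSS}}_{\text{k}}\) could be extracted from a SMILES string and their correlation weights summed into DCW. The tokenizer rules, the `BOND:` flag format, and the correlation-weight table are hypothetical illustrations; CORAL's own attribute encoding differs, and the HSG invariants and remaining global attributes are omitted.

```python
import re

# A "SMILES atom": a bracket expression, Cl or Br, a two-digit ring label (%nn),
# or any other single character (atom, bond symbol, branch parenthesis, ring digit).
_SMILES_ATOM = re.compile(r"\[[^\]]*\]|Cl|Br|%\d\d|.")

def smiles_atoms(smiles: str) -> list:
    return _SMILES_ATOM.findall(smiles)

def local_attributes(smiles: str) -> list:
    """S_k, SS_k, SSS_k: runs of one, two and three consecutive SMILES atoms."""
    t = smiles_atoms(smiles)
    ss = ["".join(t[i:i + 2]) for i in range(len(t) - 1)]
    sss = ["".join(t[i:i + 3]) for i in range(len(t) - 2)]
    return t + ss + sss

def bond_attribute(smiles: str) -> str:
    """Hypothetical encoding of the global BOND attribute: flags for '=', '#', '@'."""
    return "BOND:" + "".join("1" if s in smiles else "0" for s in "=#@")

def dcw(smiles: str, cw: dict) -> float:
    """Sum of correlation weights; attributes without a weight contribute 0."""
    attributes = local_attributes(smiles) + [bond_attribute(smiles)]
    return sum(cw.get(a, 0.0) for a in attributes)

# Hypothetical correlation weights, for illustration only.
cw_example = {"C": 0.5, "=O": 1.2, "BOND:100": 0.3}
print(dcw("CC=O", cw_example))
```

Attributes that never received a weight (for example, those blocked as rare) simply contribute zero, which mirrors how a sum of correlation weights behaves when a feature is inactive.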

The model relates pIC50 linearly to the sum of correlation weights (DCW) of the optimal descriptors:

$$ {\text{pIC}}_{{{5}0}} = {\text{a}} + {\text{b}} \times {\text{DCW}}\left( {{\text{T}}*,{\text{ N}}*} \right) $$
(2)

where a is the intercept and b the slope of the regression line obtained by the least-squares method. DCW (descriptor of correlation weights) is the sum of the correlation weights of the HSG- and SMILES-based attributes, calculated by Monte Carlo optimization, and T* and N* are the optimal threshold and the optimal number of Monte Carlo optimization epochs, respectively.
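For illustration, a and b in Eq. 2 can be obtained by ordinary least squares once DCW values are available. The sketch below uses the closed-form slope and intercept, assumes they are fitted on training compounds, and then applies the Split 3 coefficients from Eq. 14 to a hypothetical DCW value.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("DCW values must not all be equal")
    b = sxy / sxx
    a = my - b * mx
    return a, b

def predict_pic50(a, b, dcw_value):
    """Eq. 2: pIC50 = a + b * DCW."""
    return a + b * dcw_value

# Hypothetical training data: DCW values and pIC50.
a, b = fit_line([40.0, 45.0, 50.0, 55.0], [4.7, 5.4, 5.9, 6.6])
print(a, b)
# Split 3 model (Eq. 14) applied to a hypothetical DCW of 50.
print(predict_pic50(-0.1674, 0.1226, 50.0))
```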

A flowchart of a Monte Carlo optimization cycle is given by Sokolovic et al. [26]. In the first cycle, the CW(x) values of all features are generated randomly; they are then optimized according to the chosen objective function. CORAL provides several objective functions for building reliable QSAR models; here we used two of them, TF0 and TF1, to obtain the correlation weights of the attributes and compared the resulting models [27, 28].

$${\text{TF}}_{0}={\text{R}}_{\text{ATRN}}+{\text{R}}_{\text{PTRN}}-\left|{\text{R}}_{\text{ATRN}}-{\text{R}}_{\text{PTRN}}\right|\times \text{c}$$
(3)
$$ {\text{TF}}_{1} = {\text{TF}}_{0} + {\text{IIC}} \times c^{\prime} $$
(4)

where \({\text{R}}_{\text{ATRN}}\) and \({\text{R}}_{\text{PTRN}}\) are the correlation coefficients between the experimental and predicted pIC50 for the active training and passive training sets, respectively, and c and c′ are empirical constants.
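A simplified sketch of the optimization loop follows: each epoch perturbs one correlation weight at a time and keeps the change only if TF0 (Eq. 3) increases, and TF1 (Eq. 4) adds IIC × c′ to TF0. The step size `dw`, the initial weight range, the default c = 0.1, and the rule that attributes seen fewer than T times in the active training set stay at zero are assumptions made for illustration, not CORAL's exact settings.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def tf0(r_atrn, r_ptrn, c=0.1):
    """Eq. 3: reward high and similar correlations on the two training subsets."""
    return r_atrn + r_ptrn - abs(r_atrn - r_ptrn) * c

def tf1(tf0_value, iic_cal, c_prime):
    """Eq. 4: TF0 plus the calibration-set IIC weighted by c'."""
    return tf0_value + iic_cal * c_prime

def monte_carlo(atrn, y_atrn, ptrn, y_ptrn, threshold, epochs, dw=0.1, seed=0):
    """atrn/ptrn: attribute lists per molecule. Returns (weights, initial TF0, final TF0)."""
    rng = random.Random(seed)
    counts = {}
    for attrs in atrn:
        for a in attrs:
            counts[a] = counts.get(a, 0) + 1
    active = sorted(a for a, n in counts.items() if n >= threshold)  # rarer ones stay at 0
    cw = {a: rng.uniform(-1.0, 1.0) for a in active}

    def score():
        d_a = [sum(cw.get(a, 0.0) for a in attrs) for attrs in atrn]
        d_p = [sum(cw.get(a, 0.0) for a in attrs) for attrs in ptrn]
        return tf0(pearson(d_a, y_atrn), pearson(d_p, y_ptrn))

    initial = best = score()
    for _ in range(epochs):
        for a in active:
            old = cw[a]
            cw[a] = old + rng.uniform(-dw, dw)
            new = score()
            if new > best:
                best = new      # accept the move
            else:
                cw[a] = old     # reject the move
    return cw, initial, best

# Hypothetical toy data (attribute lists and pIC50 values).
atrn = [["C", "O"], ["C", "N"], ["C", "O", "O"], ["N", "N"]]
y_atrn = [5.0, 4.5, 6.0, 4.0]
ptrn = [["C", "O", "N"], ["O", "O"], ["N"]]
y_ptrn = [5.2, 6.1, 3.9]
weights, tf_start, tf_end = monte_carlo(atrn, y_atrn, ptrn, y_ptrn, threshold=1, epochs=15)
print(round(tf_start, 3), round(tf_end, 3))
```

Because a move is kept only when the objective improves, the final TF0 can never be lower than the starting value; swapping `tf0` for a `tf1`-based score would require computing IIC on a calibration set inside `score`.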

The IIC is calculated for the calibration (CAL) set as follows:

$$\text{IIC}={\text{R}}_{\text{CAL}}\times \frac{\min\left({}^{-}{\text{MAE}}_{\text{CAL}},\ {}^{+}{\text{MAE}}_{\text{CAL}}\right)}{\max\left({}^{-}{\text{MAE}}_{\text{CAL}},\ {}^{+}{\text{MAE}}_{\text{CAL}}\right)}$$
(5)

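where \({\text{R}}_{\text{CAL}}\) is the correlation coefficient for the calibration set, and \({}^{-}{\text{MAE}}_{\text{CAL}}\) and \({}^{+}{\text{MAE}}_{\text{CAL}}\) are the mean absolute errors of the calibration set computed over the negative and non-negative residuals, respectively (Eqs. 6–8):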

$${}^{-}{\text{MAE}}_{\text{CAL}}=-\frac{1}{{\text{N}}^{-}}\sum_{\text{k}=1}^{{\text{N}}^{-}}\left|{\Delta }_{\text{k}}\right|,\quad {\Delta }_{\text{k}}< 0,\quad {\text{N}}^{-}\ \text{is the number of}\ {\Delta }_{\text{k}} < 0$$
(6)
$${}^{+}{\text{MAE}}_{\text{CAL}}=+\frac{1}{{\text{N}}^{+}}\sum_{\text{k}=1}^{{\text{N}}^{+}}\left|{\Delta }_{\text{k}}\right|,\quad {\Delta }_{\text{k}}\ge 0,\quad {\text{N}}^{+}\ \text{is the number of}\ {\Delta }_{\text{k}}\ge 0$$
(7)
$${\Delta }_{\text{k}}={\text{Experimental}}_{\text{k}}-{\text{Predicted}}_{\text{k}}$$
(8)

where k is the compound index (1, 2, …, N), and \({\text{Experimental}}_{\text{k}}\) and \({\text{Predicted}}_{\text{k}}\) are the experimental and predicted pIC50 values of compound k. As an example, the CWs of all 383 attributes for Split 1 are provided in Additional file 1: Table S1.
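The IIC of Eqs. 5–8 can be computed as in the sketch below. It compares the magnitudes of the two MAE terms, which is the usual reading of Eq. 5 given that \({}^{-}{\text{MAE}}\) carries a negative sign by definition; treating an empty residual group as having zero MAE is our assumption.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def iic(observed, predicted):
    """Index of ideality of correlation for one set (Eqs. 5-8)."""
    deltas = [o - p for o, p in zip(observed, predicted)]   # Eq. 8
    neg = [abs(d) for d in deltas if d < 0]
    pos = [abs(d) for d in deltas if d >= 0]
    mae_neg = sum(neg) / len(neg) if neg else 0.0           # magnitude of Eq. 6
    mae_pos = sum(pos) / len(pos) if pos else 0.0           # Eq. 7
    r = pearson(observed, predicted)
    largest = max(mae_neg, mae_pos)
    if largest == 0.0:                                      # all predictions exact
        return r
    return r * min(mae_neg, mae_pos) / largest              # Eq. 5

# Hypothetical calibration data: balanced residuals give IIC = R.
print(iic([5.0, 6.0, 7.0, 8.0], [5.5, 5.5, 7.5, 7.5]))
```

Residuals that all fall on one side of the regression line (a systematic bias) drive the min/max ratio, and hence IIC, toward zero even when the correlation coefficient is high; this is what TF1 rewards the optimization for avoiding.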

QSAR model validation

Various criteria are available for evaluating the predictive ability of QSAR models, including internal validation, external validation, and Y-scrambling. In this study, the validity of the QSAR models was checked with standard statistical criteria: the coefficient of determination (\({\text{R}}^{2}\)), concordance correlation coefficient (CCC), \({\text{Q}}^{2}\), \({\text{Q}}_{\text{F1}}^{2}\), \({\text{Q}}_{\text{F2}}^{2}\), \({\text{Q}}_{\text{F3}}^{2}\), standard error of estimation (s), mean absolute error (MAE), \({\text{r}}_{\text{m}}^{2}\), and the newer Y-scrambling criterion \({\text{C}}_{{\text{R}}_{\text{P}}^{2}}\) [29,30,31,32]. In addition, the IIC was included in the objective function to improve the predictive ability of the models [33, 34].
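Several of the criteria listed above have simple closed forms. The sketch below implements \({\text{R}}^{2}\), CCC, MAE, and \({\text{Q}}_{\text{F1}}^{2}\)–\({\text{Q}}_{\text{F3}}^{2}\) following their common definitions in the QSAR validation literature; the exact variants used in [29–32] may differ in detail, and s, \({\text{r}}_{\text{m}}^{2}\), and the Y-scrambling criterion are not included.

```python
def _mean(v):
    return sum(v) / len(v)

def _press(obs, pred):
    """Sum of squared prediction errors."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred))

def r2(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = _mean(obs), _mean(pred)
    sop = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    soo = sum((o - mo) ** 2 for o in obs)
    spp = sum((p - mp) ** 2 for p in pred)
    return sop ** 2 / (soo * spp)

def ccc(obs, pred):
    """Concordance correlation coefficient (penalizes both scatter and offset)."""
    mo, mp = _mean(obs), _mean(pred)
    sop = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    soo = sum((o - mo) ** 2 for o in obs)
    spp = sum((p - mp) ** 2 for p in pred)
    return 2 * sop / (soo + spp + len(obs) * (mo - mp) ** 2)

def mae(obs, pred):
    return _mean([abs(o - p) for o, p in zip(obs, pred)])

def q2_f1(obs_ext, pred_ext, obs_train):
    """External Q2 using the training-set mean as reference."""
    m = _mean(obs_train)
    return 1 - _press(obs_ext, pred_ext) / sum((o - m) ** 2 for o in obs_ext)

def q2_f2(obs_ext, pred_ext):
    """External Q2 using the external-set mean as reference."""
    m = _mean(obs_ext)
    return 1 - _press(obs_ext, pred_ext) / sum((o - m) ** 2 for o in obs_ext)

def q2_f3(obs_ext, pred_ext, obs_train):
    """External Q2 normalized by the training-set variance."""
    m = _mean(obs_train)
    num = _press(obs_ext, pred_ext) / len(obs_ext)
    den = sum((o - m) ** 2 for o in obs_train) / len(obs_train)
    return 1 - num / den

# Hypothetical values: predictions offset by +1 keep R2 = 1 but lower CCC and Q2.
obs, pred = [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]
print(r2(obs, pred), ccc(obs, pred), q2_f2(obs, pred))
```

The example shows why CCC and the \({\text{Q}}^{2}\) variants are reported alongside \({\text{R}}^{2}\): a constant offset leaves the correlation perfect but is penalized by the agreement-based criteria.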

Applicability domain

The applicability domain (AD) of a model defines the range of compounds for which the QSAR model can make reliable predictions; defining it is required by principle 3 of the Organisation for Economic Co-operation and Development (OECD). Here, the AD is calculated from the distribution of SMILES features in the training and calibration sets and is expressed through the statistical defect \({\text{Defect}}_{{\text{F}}_{\text{K}}}\) [17]:

$${\text{Defect}}_{{\text{F}}_{\text{K}}}=\frac{\left|{\text{P}}_{\text{TRN}}({\text{F}}_{\text{K}})-{\text{P}}_{\text{CAL}}({\text{F}}_{\text{K}})\right|}{{\text{N}}_{\text{TRN}}({\text{F}}_{\text{K}})+{\text{N}}_{\text{CAL}}({\text{F}}_{\text{K}})}$$
(9)

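where \({\text{P}}_{\text{TRN}}({\text{F}}_{\text{K}})\) and \({\text{P}}_{\text{CAL}}({\text{F}}_{\text{K}})\) are the probabilities of the kth feature (\({\text{F}}_{\text{K}}\)) in the training and calibration sets, and \({\text{N}}_{\text{TRN}}({\text{F}}_{\text{K}})\) and \({\text{N}}_{\text{CAL}}({\text{F}}_{\text{K}})\) are its frequencies in the training and calibration sets, respectively. The defect of a molecule is the sum of the defects of the features it contains: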

$${\text{Defect}}_{\text{Molecule}}=\sum_{{\text{F}}_{\text{K}}\in \text{Molecule}}{\text{Defect}}_{{\text{F}}_{\text{K}}}$$
(10)

A molecule is considered to fall within the AD if:

$${\text{Defect}}_{\text{Molecule}}<2\times {\overline{\text{Defect}} }_{\text{TRN}}$$
(11)

where \({\overline{\text{Defect}} }_{\text{TRN}}\) is the average \({\text{Defect}}_{\text{molecule}}\) in the training set.
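The AD rule of Eqs. 9–11 can be sketched as below. We assume that \({\text{P}}({\text{F}}_{\text{K}})\) is the fraction of molecules in a set that contain \({\text{F}}_{\text{K}}\) and that \({\text{N}}({\text{F}}_{\text{K}})\) is the number of such molecules; giving an attribute absent from both sets a defect of 1 is a hypothetical convention so that unseen fragments push a molecule toward the outside of the domain.

```python
def feature_defect(f, trn, cal):
    """Eq. 9 for one attribute; trn and cal are lists of attribute sets (one per molecule)."""
    n_trn = sum(f in m for m in trn)
    n_cal = sum(f in m for m in cal)
    if n_trn + n_cal == 0:
        return 1.0  # hypothetical convention for attributes never seen
    p_trn, p_cal = n_trn / len(trn), n_cal / len(cal)
    return abs(p_trn - p_cal) / (n_trn + n_cal)

def molecule_defect(attrs, trn, cal):
    """Eq. 10: sum of attribute defects over the distinct attributes of a molecule."""
    return sum(feature_defect(f, trn, cal) for f in set(attrs))

def in_domain(attrs, trn, cal):
    """Eq. 11: inside the AD if the defect is below twice the training-set average."""
    mean_trn = sum(molecule_defect(m, trn, cal) for m in trn) / len(trn)
    return molecule_defect(attrs, trn, cal) < 2 * mean_trn

trn = [{"A", "B"}, {"A"}]   # hypothetical attribute sets
cal = [{"A"}, {"B"}]
print(in_domain({"A", "B"}, trn, cal), in_domain({"A", "C"}, trn, cal))  # True False
```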

The interpretation of QSAR models

CORAL offers a simple way to interpret QSAR models. Based on the correlation weights obtained in several independent Monte Carlo optimization runs, features fall into three categories: (I) features with positive correlation weights in all runs, which increase the endpoint; (II) features with negative correlation weights in all runs, which decrease the endpoint; and (III) features whose correlation weights are positive in some runs and negative in others. Features in category III have an undefined role and cannot be classified as promoters of an increase or decrease in the endpoint [35].
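The three-way classification can be expressed compactly. In this sketch each run's correlation weights are a dictionary; an attribute missing from a run is treated as having weight 0 and is therefore left unclassified, which is an assumption of the example.

```python
def classify_attributes(runs):
    """runs: list of {attribute: correlation weight} dicts, one per Monte Carlo run."""
    attributes = set().union(*runs)
    roles = {}
    for a in sorted(attributes):
        weights = [run.get(a, 0.0) for run in runs]
        if all(w > 0 for w in weights):
            roles[a] = "increase"    # category I
        elif all(w < 0 for w in weights):
            roles[a] = "decrease"    # category II
        else:
            roles[a] = "undefined"   # category III
    return roles

runs = [  # hypothetical weights from three probes
    {"=O": 0.8, "S": -0.4, "N": 0.3},
    {"=O": 1.1, "S": -0.2, "N": -0.1},
    {"=O": 0.5, "S": -0.6, "N": 0.2},
]
print(classify_attributes(runs))
```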

Molecular docking study

Molecular docking, a common virtual screening technique in computer-aided drug discovery, helps identify the most favorable binding mode of a ligand in a protein [36,37,38]. The X-ray crystal structure of SARS-CoV-2 3CLpro was obtained from the Protein Data Bank (PDB ID: 6XHO), selected for its good experimental resolution (1.45 Å), R-value free (0.239), and R-value work (0.211). Its native ligand in the active site is ethyl (4R)-4-({N-[(4-methoxy-1H-indol-2-yl)carbonyl]-L-leucyl}amino)-5-[(3S)-2-oxopyrrolidin-3-yl]pentanoate (ligand ID V34), which contains an indole moiety, so this structure was used for docking the indole derivatives. For SARS-CoV-1, the X-ray structure of 3CLpro (PDB ID: 1UK4) was selected based on its experimental resolution (2.5 Å), R-value free (0.231), and R-value work (0.213); its native ligand in the active site is a 5-mer peptide. Both 6XHO and 1UK4 are dimers of two identical chains; chain A was used for docking and chain B was removed. The protein structures were prepared by adding hydrogens and removing water molecules and native ligands, and Kollman charges were then assigned to the receptors. All compounds were sketched with ChemOffice 15 (PerkinElmer Inc.); Gasteiger charges were assigned and the ligand geometries were energy-minimized with the steepest descent algorithm in Open Babel [39]. Docking was performed with Smina, a fork of AutoDock Vina with a modified scoring function that is optimized for high-throughput scoring (http://smina.sf.net) [40].

For SARS-CoV-2 3CLpro, the grid box comprised 20 × 20 × 20 points with 1 Å spacing and was centered on the active site (x = 9.412, y = 1.383, z = 8.836). For SARS-CoV-1 3CLpro, the grid box comprised 14 × 14 × 14 points with 1 Å spacing and was centered on the active site (x = 66.036, y = 3.288, z = 5.254).
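A minimal sketch of how these grid boxes could be passed to Smina on the command line, using the Vina-style options `--center_x/y/z`, `--size_x/y/z`, and `--num_modes` (whose default of 9 matches the nine poses retained in this study). The file names are hypothetical, and converting 20 points × 1 Å spacing into a 20 Å box edge is an assumption.

```python
import shlex

def smina_command(receptor, ligand, out, center, size, num_modes=9):
    """Build a Smina command line for docking one ligand into a fixed box."""
    cx, cy, cz = center
    sx, sy, sz = size
    return [
        "smina", "-r", receptor, "-l", ligand,
        "--center_x", str(cx), "--center_y", str(cy), "--center_z", str(cz),
        "--size_x", str(sx), "--size_y", str(sy), "--size_z", str(sz),
        "--num_modes", str(num_modes), "-o", out,
    ]

# Box centres from the text; box edges assumed to be points x spacing (in Å).
cov2 = smina_command("6XHO_chainA.pdbqt", "ligand.sdf", "ligand_6XHO_out.sdf",
                     center=(9.412, 1.383, 8.836), size=(20.0, 20.0, 20.0))
cov1 = smina_command("1UK4_chainA.pdbqt", "ligand.sdf", "ligand_1UK4_out.sdf",
                     center=(66.036, 3.288, 5.254), size=(14.0, 14.0, 14.0))
print(shlex.join(cov2))
# To run it: subprocess.run(cov2, check=True)  (requires smina on PATH)
```

Building the argument list instead of a shell string avoids quoting problems with file names and makes the box parameters easy to check programmatically.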

The X-ray crystal structures of SARS-CoV-1 and SARS-CoV-2 3CLpro were obtained from the Protein Data Bank (PDB IDs: 1UK4 and 6XHO). The structures of the compounds were drawn with BIOVIA Discovery Studio Visualizer 2021, and their energies were minimized with the steepest descent method. Smina was run with default settings for each receptor, and the nine best-scoring poses of each ligand were retained (Additional file 1: Table S4). The docking protocol was validated by re-docking the co-crystallized native ligand into the active site of the receptor and computing the root-mean-square deviation (RMSD) between the docked and crystallographic poses [41].
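The re-docking check reduces to an RMSD between matched atom coordinates. The sketch assumes both poses list the same atoms in the same order and applies no symmetry correction or superposition, since both poses share the receptor's coordinate frame; the 2 Å cut-off is the one used in the Results.

```python
import math

def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must have the same, non-zero number of atoms")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b))
    return math.sqrt(sq / len(pose_a))

def redocking_ok(native, redocked, cutoff=2.0):
    """Accept the protocol if the re-docked pose lies within the cut-off (Å)."""
    return rmsd(native, redocked) < cutoff

native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]    # hypothetical
redocked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
print(rmsd(native, redocked), redocking_ok(native, redocked))  # 1.0 True
```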

Results and discussion

QSAR models

Two objective functions were used to build the QSAR models: TF0, without IIC, and TF1, with IIC. The optimal threshold (T) and number of epochs (N) were searched over the ranges 1–3 and 1–15, respectively. The QSAR models built with TF1 for the four splits to predict inhibitory activity against SARS 3CLpro are:

Split 1:

$${\text{pIC}}_{50}=2.4816 \left(\pm 0.0328\right)+0.0572 \left(\pm 0.0005\right)\times \text{DCW}\left(\text{1,14}\right)$$
(12)
$${\text{R}}_{\text{ATRN}}^{2}=0.94{,\text{ n}}_{\text{ATRN}}=25; {\text{R}}_{\text{PTRN}}^{2}=0.95{,\text{ n}}_{\text{PTRN}}=20;{\text{ R}}_{\text{CAL}}^{2}=0.92{,\text{ n}}_{\text{CAL}}=16;{\text{ R}}_{\text{VAL}}^{2}=0.88{,\text{ n}}_{\text{VAL}}=20$$

Split 2:

$${\text{pIC}}_{50}=-0.0804 (\pm 0.0679)+0.0972 (\pm 0.0010)\times \text{DCW}(\text{1,12})$$
(13)
$${\text{R}}_{\text{ATRN}}^{2}=0.94{,\text{ n}}_{\text{ATRN}}=24; {\text{R}}_{\text{PTRN}}^{2}=0.94{,\text{ n}}_{\text{PTRN}}=19;{\text{ R}}_{\text{CAL}}^{2}=0.90{,\text{ n}}_{\text{CAL}}=16;{\text{ R}}_{\text{VAL}}^{2}=0.83{,\text{ n}}_{\text{VAL}}=22$$

Split 3:

$${\text{pIC}}_{50}=-0.1674 \left(\pm 0.0477\right)+0.1226 \left(\pm 0.0010\right)\times \text{DCW}\left(\text{1,6}\right)$$
(14)
$${\text{R}}_{\text{ATRN}}^{2}=0.96{,\text{ n}}_{\text{ATRN}}=23; {\text{R}}_{\text{PTRN}}^{2}=0.93{,\text{ n}}_{\text{PTRN}}=20;{\text{ R}}_{\text{CAL}}^{2}=0.87{,\text{ n}}_{\text{CAL}}=16;{\text{ R}}_{\text{VAL}}^{2}=0.92{,\text{ n}}_{\text{VAL}}=22$$

Split 4:

$${\text{pIC}}_{50}=0.3203 (\pm 0.0545)+0.1004 (\pm 0.0011)\times \text{DCW}(\text{1,10})$$
(15)
$${\text{R}}_{\text{ATRN}}^{2}=0.96{,\text{ n}}_{\text{ATRN}}=24; {\text{R}}_{\text{PTRN}}^{2}=0.96{,\text{ n}}_{\text{PTRN}}=21;{\text{ R}}_{\text{CAL}}^{2}=0.88{,\text{ n}}_{\text{CAL}}=16; {\text{R}}_{\text{VAL}}^{2}=0.81{,\text{ n}}_{\text{VAL}}=20$$

where \({\text{R}}_{\text{ATRN}}^{2}\), \({\text{R}}_{\text{PTRN}}^{2}\), \({\text{R}}_{\text{CAL}}^{2}\), and \({\text{R}}_{\text{VAL}}^{2}\) are the coefficients of determination for the active training, passive training, calibration, and validation sets, and \({\text{n}}_{\text{ATRN}}\), \({\text{n}}_{\text{PTRN}}\), \({\text{n}}_{\text{CAL}}\), and \({\text{n}}_{\text{VAL}}\) are the numbers of molecules in these sets, respectively.

Table 2 lists the statistical criteria of the QSAR models for predicting the pIC50 of isatin and indole derivatives with TF0 and TF1 for each split. The models developed with IIC (TF1) are more predictive than those developed with TF0, indicating that the modified objective function TF1 yields more reliable and robust QSAR models. The TF1 model for Split 3 was selected as the best model because it has the highest coefficient of determination for the validation set (\({\text{R}}_{\text{VAL}}^{2}\) = 0.92).

Table 2 Statistical parameters of QSAR models for prediction of pIC50

A Y-randomization test was performed in CORAL to confirm that the developed QSAR models do not reflect chance correlations. In ten repetitions, the randomized models gave average R2 values below 0.1 (Additional file 1: Table S2), confirming that the correlation between pIC50 and the molecular attributes is not due to chance. In addition, the Y-randomization criterion \({\text{C}}_{{\text{R}}_{\text{P}}^{2}}\) exceeded 0.8 for all models (Table 2).
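As a rough sketch of the Y-randomization idea: shuffle pIC50, recompute \({\text{R}}^{2}\) against fixed descriptor values, and combine the average randomized \({\text{R}}_{\text{r}}^{2}\) with the model \({\text{R}}^{2}\) in the commonly used form \({\text{C}}_{{\text{R}}_{\text{P}}^{2}} = \text{R}\times\sqrt{{\text{R}}^{2}-{\text{R}}_{\text{r}}^{2}}\). Unlike CORAL, which re-optimizes the correlation weights for each scrambled response, this example keeps DCW fixed; that simplification, and the data, are hypothetical.

```python
import math
import random

def r_squared(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def mean_random_r2(x, y, repeats=10, seed=0):
    """Average R2 after shuffling the response `repeats` times."""
    rng = random.Random(seed)
    values = []
    for _ in range(repeats):
        shuffled = list(y)
        rng.shuffle(shuffled)
        values.append(r_squared(x, shuffled))
    return sum(values) / repeats

def c_rp2(r2, r2_random):
    """C_Rp^2 = R * sqrt(R^2 - mean randomized R^2), clipped at 0."""
    return math.sqrt(r2) * math.sqrt(max(r2 - r2_random, 0.0))

dcw_values = [40.1, 42.3, 45.0, 47.8, 50.2, 52.9, 55.4, 58.0]  # hypothetical
pic50 = [4.6, 4.9, 5.1, 5.6, 5.8, 6.1, 6.4, 6.7]               # hypothetical
r2_model = r_squared(dcw_values, pic50)
r2_rand = mean_random_r2(dcw_values, pic50)
print(round(c_rp2(r2_model, r2_rand), 3))
```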

Additional file 1: Table S3 lists the SMILES of the isatin and indole derivatives, the set to which each compound was assigned, the observed and calculated pIC50 from the four TF1 models, and the AD status in each split. The average \({\overline{\text{Defect}} }_{\text{TRN}}\) values for Splits 1–4 of the models built with TF0 are 5.91, 3.19, 5.18, and 5.05, respectively, so a compound falls within the AD if \({\text{Defect}}_{\text{Molecule}}\) < 11.82, 6.38, 10.36, and 10.10 for Splits 1–4, respectively. The percentages of the dataset within the AD were 82, 82, 83, and 88% for Splits 1–4, so each of the four models can make reliable predictions for more than 80% of the compounds (Additional file 1: Table S3).

Figure 1 plots the calculated versus observed pIC50 of the SARS 3CLpro inhibitors for the four models developed with TF1 and shows good agreement between the observed and calculated values.

Fig. 1

The graphical representation of the observed versus predicted pIC50 for splits 1 to 4

Mechanistic interpretation

Mechanistic interpretation is the fifth OECD principle of QSAR modeling: whenever possible, the molecular features responsible for increased or decreased activity should be identified. Such an interpretation can help to design and identify new isatin- and indole-based derivatives. Table 3 lists the structural features extracted from the best QSAR model (Split 3) in three independent probes, with a short description of each in the comments column, and indicates whether each feature increases or decreases the pIC50 of isatin and indole derivatives. The identified promoters of increased pIC50 are the presence of nitrogen with a double bond, nitrogen together with oxygen, oxygen with a double bond, at least one ring, aliphatic oxygen combined with a double bond, oxygen with a double bond and branching, and aromatic carbon in the first ring. The promoters of decreased SARS 3CLpro inhibitory activity are the presence of nitrogen together with sulfur, consecutive aliphatic carbon and aliphatic nitrogen with branching, aromatic carbon with branching in the fourth ring, and aliphatic carbon with branching in the fourth ring.

Table 3 Structural attributes that increase or decrease the pIC50 of isatin and indole derivatives according to the Split 3 model in three independent probes

Based on the favorable structural features, the most active of the 81 literature inhibitors was used as a query to retrieve previously synthesized compounds from the ChEMBL database. ChEMBL supports similarity searches that return compounds exceeding a chosen percentage similarity to a query structure, so the ligand with the highest activity was entered into ChEMBL and structurally similar compounds were extracted. The pIC50 of the retrieved structures was calculated with the best QSAR model (Split 3), and the eight compounds with the highest predicted pIC50 (isatin and indole scaffolds) were selected; they are listed in Table 4. Their predicted pIC50, averaged over the four models, ranges from 7.35 to 8.30. According to the AD analysis with the Split 3 model (the best model), all of these compounds fall within the AD except CHEMBL3103276.
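The averaged prediction can be written out directly from Eqs. 12–15. Each split has its own correlation weights, so a compound has a different DCW in each model; the DCW values in the example are hypothetical. The IC50 back-conversion assumes µM units, matching the pIC50 definition used for the dataset.

```python
# (intercept a, slope b) of the TF1 models, Eqs. 12-15
MODELS = {
    1: (2.4816, 0.0572),
    2: (-0.0804, 0.0972),
    3: (-0.1674, 0.1226),
    4: (0.3203, 0.1004),
}

def consensus_pic50(dcw_by_split):
    """Average of the four split-specific predictions a + b * DCW."""
    preds = [a + b * dcw_by_split[s] for s, (a, b) in MODELS.items()]
    return sum(preds) / len(preds)

def ic50_um_from_pic50(pic50):
    """Inverse of pIC50 = 6 - log10(IC50 in µM)."""
    return 10 ** (6.0 - pic50)

dcw_example = {1: 88.0, 2: 78.0, 3: 62.0, 4: 72.0}  # hypothetical DCW values
p = consensus_pic50(dcw_example)
print(round(p, 2), round(ic50_um_from_pic50(p), 3))
```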

Table 4 Average predicted pIC50 and IC50 from the four models, and docking affinities, for the eight compounds extracted from the ChEMBL search

Molecular docking analysis

We first re-docked the V34 ligand into SARS-CoV-2 3CLpro and the 5-mer peptide into SARS-CoV-1 3CLpro (PDB IDs: 6XHO and 1UK4), both to validate the molecular docking protocol and to identify the key active-site residues that interact with the reference ligands in each protein pocket. Figure 2 shows 3D and 2D views of the re-docked poses of V34 in the SARS-CoV-2 3CLpro pocket and of the 5-mer peptide in the SARS-CoV-1 pocket, with binding affinities of −8.07 and −9.4 kcal/mol, respectively. Re-docked V34 interacts with THR26, HIS41, PHE140, CYS145, HIS164, MET165, GLU166, PRO168, HIS172, GLN189, THR190, and ALA191 in the active site of SARS-CoV-2 3CLpro, and the re-docked 5-mer peptide interacts with HIS41, PHE140, GLY143, SER144, and GLU166 in the active site of SARS-CoV-1. These interactions are hydrophobic contacts and hydrogen bonds. The RMSD values between the native and re-docked poses were 0.14 Å for V34 and 1.1 Å for the 5-mer peptide, both below the commonly accepted threshold of 2 Å (Additional file 1: Fig. S1).

Fig. 2

V34 interaction patterns with active residues in the SARS-COV-2 3CLpro pocket (A), 5-mer peptide interaction patterns with active residues in the SARS-COV-1 pocket (B)

Figures 3a and 3b show three-dimensional views of compounds 12 and 53 in the binding pocket of SARS-CoV-1 3CLpro, and Figs. 4a and 4b show the corresponding two-dimensional interaction diagrams; both compounds form important interactions with binding-site residues of SARS-CoV-1 3CLpro. Compound 12 (Fig. 3a) forms two hydrogen bonds with SER144 and CYS145 and two hydrophobic interactions with HIS41 and MET49, while ALA46, CYS44, THR45, THR25, ASN142, GLY143, HIS163, PHE140, LEU141, and GLU166 form van der Waals interactions with the ligand. Compound 53 (Fig. 3b) interacts with HIS41, MET49, and MET165, along with some hydrophobic interactions, and forms hydrogen bonds with SER144, THR26, CYS145, GLY143, and GLN189, while LEU141, PHE140, HIS163, LEU27, THR25, ASN142, GLU166, THR190, ALA191, TYR54, ARG188, LEU167, and PRO168 form van der Waals interactions with the ligand.

Fig. 3

Three-dimensional diagram of compounds 12 (A) and 53 (B) in the binding pocket of SARS-COV-1 3CLpro

Fig. 4

Two‐dimensional diagram of compound 12 (A) and 53 (B) interactions with binding site residues of SARS-COV-1 3CLpro

Comparison with the re-docked native ligands shows that compounds 12 and 53, the most active compounds, interact with most of the active-site residues contacted by the native ligands in the SARS-CoV-2 3CLpro and SARS-CoV-1 pockets.

The docking results are consistent with several QSAR promoters of increased pIC50: for instance, compounds 12 and 53 contain oxygen with a double bond, at least one ring, and branching, and these moieties interact with amino acid residues in the active site through hydrogen bonds and hydrophobic interactions.

Hexachlorophene, a standard SARS 3CLpro inhibitor (IC50 = 5 µM) according to Liu et al. [42], was used as a reference. It was docked into the active site of SARS-CoV-2 3CLpro (PDB ID: 6XHO), and its best binding mode had a binding affinity of −8.05 kcal/mol.

The eight isatin- and indole-based compounds extracted from ChEMBL were also docked into 1UK4 and 6XHO. Two- and three-dimensional diagrams of their interactions with the receptors are presented in Additional file 1: Fig. S2. As with the most active compounds 12 and 53, these ligands interact with most of the active-site residues contacted by the native ligands in the SARS-CoV-2 3CLpro and SARS-CoV-1 pockets, which supports the view that the indole and isatin cores are important for binding to these targets. As shown in Table 4, all eight compounds have more favorable binding energies than the most active compounds in the dataset and hexachlorophene. These results agree well with the Monte Carlo optimization-based QSAR predictions.

ADMET results

In silico ADMET (absorption, distribution, metabolism, excretion, and toxicity) screening can reduce the cost and time associated with in vitro assays and in vivo experiments [43]. The admetSAR online database was used to predict the ADMET properties of the extracted isatin- and indole-based compounds [44]. As shown in Table 5, all eight compounds are predicted to be absorbed in the human intestine. Because toxicity plays an important role in drug selection, the proposed molecules were also checked for toxicity: the Ames test was predicted negative for all compounds except CHEMBL4443007, and all compounds were classified as non-toxic in terms of acute oral toxicity.

Table 5 ADMET prediction for eight extracted compounds from CHEMBL

The Osiris Property Explorer (OPE) tool was used to assess the fragment-based drug-likeness of the extracted compounds [45, 46]. OPE derives drug-likeness from the frequency of occurrence of each fragment in a collection of approved drugs and in the Fluka collection of non-medicinal chemicals; a positive value (0.1–10) indicates that a compound mainly contains fragments frequently found in commercial drugs. OPE also computes an overall drug score that combines drug-likeness, ClogP, ClogS, molecular weight, and toxicity risks into a single value.

Based on the OPE results, compounds CHEMBL4458417 and CHEMBL4565907, both of which contain an indole scaffold, have positive drug-likeness values and the highest drug scores, and are therefore proposed as lead compounds.

Conclusion

Four simple, predictive, and reliable QSAR models were developed for the pIC50 values of 81 isatin and indole derivatives that inhibit SARS 3CLpro, using Monte Carlo optimization with the index of ideality of correlation (IIC) included in the objective function. The models showed satisfactory statistics and high predictive power (\({R}_{Val}^{2}\) = 0.81–0.92, MAE = 0.31–0.40). They are suitable for predicting the activity of new isatin and indole derivatives as candidate SARS 3CLpro inhibitors and can be used to evaluate such compounds before synthesis. The models were interpreted mechanistically by examining the correlation weights of the molecular features extracted in several Monte Carlo optimization runs. These features were used to identify eight new isatin and indole derivatives with higher predicted activity in the ChEMBL database, and their activity was further supported by molecular docking studies, in which their binding energies with active-site residues agreed with the calculated pIC50 values. Finally, compounds CHEMBL4458417 and CHEMBL4565907, both containing an indole scaffold, with positive drug-likeness values and the highest drug scores, were proposed as lead compounds.