Introduction

Docking is a computational modelling technique that predicts the energy-minimized pose adopted by one molecule when it binds to a second molecule to form a stable complex, and it complements experimental methods [1,2,3,4]. It is rapid and cost-effective compared with trial-and-error experimental studies. It can replace or prompt experiments, helps to explain and interpret them, and links the theoretical and practical aspects of molecular interaction. Experimental methods such as solution NMR, X-ray diffraction, electron microscopy, solution scattering, and neutron diffraction cannot always deliver the structure of a stable complex of two docked molecules. The obstacles include cost, limited research timelines, too many parameters to control, experiments that are difficult, impossible, or dangerous to perform, and unavailability of material. In such cases, an in-silico approach based on structural information can be used to search for the best match between two molecules and to predict their intermolecular attraction or repulsion. Knowledge of the preferred conformation can be used to estimate the strength of association (binding affinity) between the two molecules, and the search yields different possible complex structures that are ranked and grouped using scoring functions. A molecular docking study thus evaluates the binding mode of a molecule in the binding pocket of another molecule based on the minimum binding energy.

The binding of a small molecule, atom, or ion (called a ligand) to a biomolecule (termed a macromolecule, receptor, or target) is a common subject of computational studies of noncovalent interactions. Accurate and reliable prediction of docking results between protein and ligand is useful in modern structure-based drug design [5]. Interactions between biological macromolecules (namely DNA, RNA, proteins, carbohydrates, lipids, and enzymes) and small organic/inorganic molecules, atoms, or ions play a central role in biological processes, including signal transduction, transport, cell regulation, gene expression control, enzyme inhibition, antibody-antigen recognition, and the assembly of multi-domain proteins. These molecules interact with each other to perform a certain biological function together inside a cell. Different relative orientations of the interacting biological entities at specific sites may alter their functions. A docking study is therefore valuable for predicting more precisely the pose in which biomolecules dock with each other. The interaction can occur between any two molecules, such as DNA-ligand, RNA-ligand, protein–protein, protein-RNA, protein-DNA, protein-drug, protein-nanoparticle, enzyme–substrate, and enzyme-drug pairs.

Applications of molecular docking

Molecular docking has numerous applications. These include: predicting the lowest binding free energy structure of a receptor-ligand complex; comparing the binding of a ligand to different macromolecular receptors; determining the geometry of a stable receptor-ligand complex; proposing modifications to a lead molecule that improve its binding to a receptor and its ability to act together with it; de novo design of candidate molecules and their libraries based on docking studies; studying side effects of a candidate molecule that arise from its interaction with other molecules; examining the interaction of a potential drug molecule with homologous proteins; understanding protein–protein interactions (PPI); assigning putative roles to unknown proteins; understanding the relationships between proteins that form stable multimolecular complexes such as the proteasome; studying molecular interactions to understand biological pathways (series of interactions among biological molecules in a cell that produce other biomolecules and accomplish different biological functions); and predicting which pollutants can be degraded by enzymes.

The docking technique is also used extensively to investigate applications of nanomaterials in different areas. Nanoparticles are solid colloidal particles (sizes vary from 10 to 1000 nm) with properties such as an increased surface-to-volume ratio, magnetism for bigger particles, chemical stability, non-toxicity, biocompatibility, high saturation magnetization, and high magnetic susceptibility. These properties underlie their beneficial use in antibiotic treatment, as antimicrobial agents (antiviral, antibacterial, antifouling, antifungal), in nanocomposite coatings, as catalysts, and as lubricants, and they make nanoparticles useful in biomedical applications such as targeted drug delivery, hyperthermia, photoablation therapy, bioimaging, biosensors, cell labelling, and gene delivery. Because of these biomedical uses, many docking studies have been carried out to understand the molecular interaction between biomolecules and inorganic or organic (synthetic/natural) nanoparticles [2].

Databases and tools used in the docking study

The increased availability of three-dimensional structural information from major reliable databases, together with protein structure prediction tools, macromolecular structure validation and quality assessment tools, and chemical drawing tools, assists in preparing molecules for docking. Rapid advances in reliable computational docking tools and docking web servers help to address open problems through docking studies between different types of interacting molecules, atoms, or ions. Chen has listed docking databases and web servers in a review [6]. Although many accurate and reliable docking algorithms are emerging, limitations and challenges remain because of imperfections in scoring functions and in the handling of protein flexibility, explicit water, and similar effects. Docking software can be evaluated, and its limitations identified and addressed, by benchmark testing in the Critical Assessment of PRediction of Interactions (CAPRI) [7]. The databases, software programs, and web servers used in the different steps of the docking technique are listed in Table 1.

Table 1 List of databases, software programs and web servers used in different steps in docking method

Molecules used in docking studies and binding energy calculation

When two molecules in close proximity interact favorably with each other, they bind to form a stable complex. The acceptor molecule is termed a receptor, target, or macromolecule (it is usually larger than the second molecule), and the molecule received by the receptor is termed a ligand. The site where binding happens is known as the active site, binding site, or binding pocket. The stable complex formed is termed a receptor-ligand complex, as shown in Fig. 1. Many different types of noncovalent interactions cause the receptor and ligand to bind together to form a stable complex, including torsional, hydrophilic, hydrophobic, van der Waals, electrostatic, hydrogen-bonding, and desolvation contributions. The main goal of docking is therefore to find the most stable receptor-ligand complex, with optimized geometry and minimum binding energy.

Fig. 1 Receptor-ligand docked complex. Docking of a small molecule “Ligand” (cyan) to a bigger size molecule “Receptor” (yellow) to produce a stable docked complex. (Color figure online)

The energy score and energy terms of a binding pose are calculated using scoring functions. The scoring functions currently applied in different docking algorithms are of four types: force-field-based, empirical, knowledge-based, and machine-learning-based [8, 9]. Force-field-based scoring functions use different non-covalent force-field terms to predict the protein–ligand interactions. Non-covalent interactions, such as van der Waals (∆EVDW), electrostatic (∆Eelectrostatic), hydrogen bonding (∆EH-bond), and desolvation (∆Gdesolvation), contribute to the total binding energy. The widely used generalized functional form of the energy calculation in this type of scoring function is given as follows [9]:

$$\Delta G_{{{\text{binding}}}} = \Delta E_{{{\text{VDW}}}} + \Delta E_{{{\text{electrostatic}}}} + \Delta E_{{\text{H - bond}}} + \Delta G_{{{\text{desolvation}}}}$$
(1)

The second type of scoring function is empirical. It calculates the fitness of binding between protein and ligand by adding the contribution from each energetic factor of the protein–ligand binding. In the GOLD docking program, the scoring function ChemScore (the higher the score, the better the docking result) is calculated as follows [10]:

$${\text{ChemScore }} = \, S_{{\text{H - bond}}} + \, S_{{{\text{metal}}}} + \, S_{{{\text{lipophilic}}}} + \, P_{{{\text{rotor}}}} + \, P_{{{\text{strain}}}} + \, P_{{{\text{clash}}}} + P_{{{\text{covalent}}}} + \, P_{{{\text{constraint}}}}$$
(2)

where SH-bond, Smetal, Slipophilic are rewarding scores for hydrogen bonding, coordinate bonding with metal ions and lipophilic contacts, respectively, whereas the Protor, Pstrain, Pclash, Pcovalent and Pconstraint are the penalties assigned for frozen rotatable bonds, ligand’s internal strain energy, steric clashes between protein and ligand, covalent type docking, and restrained docking.

The third category is the knowledge-based scoring function, which computes the binding score as the sum of statistical pairwise potentials between protein and ligand atoms as follows:

$$A=\sum_{i}^{\mathrm{lig}}\sum_{j}^{\mathrm{prot}}{\omega }_{ij}(r)$$
(3)

where ωij(r) is the potential between a pair of atoms i and j at a distance r, calculated by Boltzmann inversion as follows:

$${\omega }_{ij}\left(r\right)={-k}_{\mathrm{B}}T\mathrm{ln}\left[{g}_{ij}\left(r\right)\right]=-{k}_{\mathrm{B}}T\mathrm{ln}\left[\frac{{\rho }_{ij}\left(r\right)}{{\rho }_{ij}^{*}}\right]$$
(4)

where gij(r) is the radial distribution function (RDF) of the atom pair i−j separated by a distance r, kB is the Boltzmann constant, T is the temperature, ρij(r) is the density of the atom pair i−j at a distance r, and ρij* is the density of the atom pair i−j in a reference state in which the interactions between atoms are taken as 0.
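As a minimal sketch of Eqs. 3 and 4, the following Python snippet converts pair densities into potentials by Boltzmann inversion and sums them over a set of contacts. The densities, atom-type keys, distance bins, and function names are hypothetical illustrations, not values or code from any published scoring function.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def pair_potential(rho_obs, rho_ref, temperature=298.15):
    """Eq. 4: w_ij(r) = -k_B * T * ln(rho_ij(r) / rho_ij*) for one distance bin."""
    if rho_obs <= 0 or rho_ref <= 0:
        raise ValueError("densities must be positive")
    return -K_B * temperature * math.log(rho_obs / rho_ref)

def knowledge_based_score(contacts, potentials):
    """Eq. 3: sum the tabulated potentials over all ligand-protein contacts."""
    return sum(potentials[key] for key in contacts)

# Hypothetical observed densities for one atom-type pair in three distance bins;
# keys are (ligand atom type, protein atom type, bin distance in angstrom).
observed = {("C", "O", 3.0): 2.0, ("C", "O", 4.0): 1.0, ("C", "O", 5.0): 0.5}
reference_density = 1.0  # assumed reference-state density

potentials = {key: pair_potential(rho, reference_density)
              for key, rho in observed.items()}

# A pose with one contact in the 3.0 bin and one in the 5.0 bin.
score = knowledge_based_score([("C", "O", 3.0), ("C", "O", 5.0)], potentials)
```

Pairs that occur more often than in the reference state receive a negative (favorable) potential, and pairs that occur less often receive a positive one.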

The fourth category of scoring function is machine-learning (ML) based. It learns the functional form of the binding affinity from training data. This functional form can be complicated and is built with ML methods such as deep neural networks (DNN), convolutional neural networks (CNN), graph neural networks (GNN), support vector machines (SVM), random forests (RF), and eXtreme gradient boosting (XGB) [11,12,13,14]. Quantitative structure–activity relationship (QSAR) analysis can be useful to model biological, physicochemical, and pharmaceutical properties of the ligand. Similarly, the properties of the protein and patterns in the protein–ligand interaction can be modelled using suitable descriptors, and ML methods can be applied in QSAR analysis to obtain a statistical model that calculates the protein–ligand binding score. Scoring functions are discussed in more detail elsewhere [8, 9, 15]. Here we show the functional form of the empirical free energy calculation in AutoDock.

In the AutoDock docking tool, the net free energy calculated in a docking process is the sum of different types of potentials, including van der Waals (ΔGvdW), hydrogen bonding (ΔGH-bond), electrostatic (ΔGelec), torsional (ΔGtor), and desolvation (ΔGsol) terms [16]. The net change in free energy in AutoDock is calculated as follows:

$$\Delta G=\Delta {G}_{\mathrm{vdW}}\sum_{i,j}\left(\frac{{A}_{ij}}{{r}_{ij}^{12}}-\frac{{B}_{ij}}{{r}_{ij}^{6}}\right)+\Delta {G}_{\mathrm{H-bond}}\sum_{i,j}E(\theta)\left(\frac{{C}_{ij}}{{r}_{ij}^{12}}-\frac{{D}_{ij}}{{r}_{ij}^{10}}\right)+\Delta {G}_{\mathrm{elec}}\sum_{i,j}\frac{{q}_{i}{q}_{j}}{\varepsilon ({r}_{ij}){r}_{ij}}+\Delta {G}_{\mathrm{sol}}\sum_{i,j}\left({S}_{i}{V}_{j}+{S}_{j}{V}_{i}\right){e}^{-{r}_{ij}^{2}/2{\sigma }^{2}}+\Delta {G}_{\mathrm{tor}}{N}_{\mathrm{tor}}$$
(5)

where the ΔG factors on the right-hand side are the coefficients of the different potentials, obtained from studies of receptor-ligand complexes with known binding constants. The sums run over each pair of ligand atom, i, and receptor atom, j, and over each pair of atoms in the ligand separated by three or more bonds. The in-vacuo contributions consist of three interaction energy terms: (i) a Lennard–Jones 12–6 dispersion/repulsion term; (ii) a directional 12–10 hydrogen-bonding term, where E(θ) is the directional weight based on the angle, θ, between the probe and the target atom; and (iii) a screened Coulombic electrostatic potential [17, 18]. Details of their parameterization have been described by Morris et al. [19]. A measure of the unfavorable entropy of ligand binding is added to the in-vacuo function. This entropy term arises from the restriction of conformational degrees of freedom and is proportional to the number of sp3 bonds in the ligand, Ntor [20]. The functional forms of various scoring functions used in different docking algorithms are presented elsewhere [21].
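The structure of Eq. 5 can be illustrated with a short Python sketch that evaluates each pairwise term and adds the torsional penalty. This is not AutoDock's implementation: the weights, the linear form chosen for ε(r), the value of σ, and all atom parameters below are placeholder assumptions, and unit-conversion constants are omitted.

```python
import math

# Hypothetical weights standing in for the fitted coefficients of Eq. 5.
WEIGHTS = {"vdw": 1.0, "hbond": 1.0, "elec": 1.0, "sol": 1.0, "tor": 0.3}

def dielectric(r):
    """Assumed placeholder for eps(r): a linear distance-dependent screening."""
    return 4.0 * r

def pair_energy(r, A, B, q_i, q_j, S_i, V_i, S_j, V_j,
                C=0.0, D=0.0, e_theta=0.0, sigma=3.5, w=WEIGHTS):
    """Weighted sum of the four pairwise terms of Eq. 5 for one atom pair."""
    vdw = A / r**12 - B / r**6                      # 12-6 dispersion/repulsion
    hbond = e_theta * (C / r**12 - D / r**10)       # directional 12-10 term
    elec = q_i * q_j / (dielectric(r) * r)          # screened Coulomb term
    sol = (S_i * V_j + S_j * V_i) * math.exp(-r**2 / (2.0 * sigma**2))
    return (w["vdw"] * vdw + w["hbond"] * hbond
            + w["elec"] * elec + w["sol"] * sol)

def binding_energy(pairs, n_tor, w=WEIGHTS):
    """Sum the pairwise terms and add the torsional entropy penalty."""
    return sum(pair_energy(w=w, **p) for p in pairs) + w["tor"] * n_tor

# One uncharged atom pair placed at the minimum of its 12-6 well
# (well depth eps at distance r_min), plus two rotatable bonds.
r_min, eps = 4.0, 0.2
pair = dict(r=r_min, A=eps * r_min**12, B=2 * eps * r_min**6,
            q_i=0.0, q_j=0.0, S_i=0.0, V_i=0.0, S_j=0.0, V_j=0.0)
energy = binding_energy([pair], n_tor=2)
```

In this toy case the pair contributes the well depth with a negative sign, and the torsional term adds a positive penalty proportional to the number of rotatable bonds.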

Important steps in the docking technique

The steps in a docking procedure are explained in the following.

Receptor selection

Depending on the input file format required by the software tool (which varies from one program to another), three-dimensional structural information of a receptor is supplied in a file format such as pdb, mol2, sdf, smi, xyz, or cif. The pdb file format is accepted by most docking software tools and web servers. High-resolution three-dimensional structural information of a receptor can be obtained by experimental methods, namely solution nuclear magnetic resonance (NMR), X-ray diffraction, electron microscopy, solution scattering, and neutron diffraction, or downloaded from a structural database such as the RCSB PDB [22]. If the structure of a molecule is not available from experiments or structural databases, it can be built with a drawing tool such as Avogadro [23]. For proteins, homology modeling or threading can be used to predict unknown structures, but the results are less accurate than structures obtained from the PDB. Machine learning can also be useful for protein structure prediction [24]. The quality of the structure can be checked using validation software such as PROCHECK [25].

Receptor preparation

After the target macromolecule is selected, it is prepared for docking. Ligands, cofactors, ions, lipids, water, solvent, and other unnecessary molecules that accompany the receptor (such co-crystallized molecules are usually present in database entries) are removed with a molecule editor such as PyMOL, Chimera, or AutoDock, because their presence in the binding pocket of the target may obstruct the binding of the ligand [26, 27]. Schiebel and Barillari have elucidated the fascinating role of water in protein–ligand interaction [28, 29].

Although the involvement of water molecules in receptor–ligand interaction is not well understood, they act in one of two ways: (1) they are displaced from the binding cavity of the receptor, which increases the binding affinity because displacement of water molecules may encourage the ligand to bind to its receptor; or (2) they remain in the binding pocket and stabilize the interaction, because water molecules form hydrogen bonds with the ligand and the receptor at the same time, creating a hydrogen-bonding network that stabilizes the binding. Hydrated docking (water-mediated protein–ligand docking) can be performed with the docking software suites AutoDock Vina, GOLD, and GLIDE [30,31,32,33]. These tools permit water molecules to be switched on or off during the docking process [34]. After water molecules are removed, missing atoms are added, multiple or low occupancies are refined, chain breaks are repaired, and hydrogen atoms are added to protonate the molecule. Protonation is especially necessary in the binding site, because these hydrogen atoms may be involved in interactions with the ligand. Accurate protonation of the target receptor gives a good docking result, with correct prediction of the van der Waals surface and dipole moment in the binding pocket. Programs such as AutoDock, Reduce, Maestro (Schrödinger), PropKa, and Chimera are used for protonation of the receptor [26, 27, 35, 36]. Metal ionization states are corrected so that formal charges and force-field parameters are accurate. Bond orders are assigned to HET groups. In the case of proteins or peptides, structures deposited in the PDB may contain amino acids mutated for stability, crystallization, or other biochemical reasons; these can be changed back to wild-type residues, specifically in the binding pocket. Termini are capped with ACE (N-terminal acetyl capping group) and NME (C-terminal N-methyl amide capping group) residues.
The stable conformation of the receptor is obtained by relaxing it: hydrogen atoms are allowed to move freely during minimization, and heavy atoms are allowed to move just enough to relax strained bonds, angles, and clashes. Structural optimization to minimize the energy and obtain the most stable conformation can be done with programs such as Avogadro, UCSF Chimera’s Dock Prep, Schrödinger’s Protein Preparation Wizard, and Molecular Operating Environment (MOE) [23, 37, 38]. Partial charges (also called point charges) are added to each atom to predict the electrostatic potential around the receptor molecule that arises from Coulombic interaction between the point charges. The receptor should be treated as a flexible body to reflect realistic ligand-receptor dynamics (e.g., amino acid residues in the binding site may move when they are involved in binding).
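The clean-up step at the start of receptor preparation (removing co-crystallized waters, ligands, and ions) can be sketched in plain Python for a PDB-format file. The function name and the three example records are hypothetical; the sketch relies only on the fixed-column PDB layout, in which the record name occupies columns 1–6 and the residue name columns 18–20. The dedicated preparation tools named above do considerably more than this.

```python
def strip_heteroatoms(pdb_text, keep_resnames=()):
    """Drop HETATM records (water, ligands, ions) unless the residue is kept."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()      # columns 1-6: record name
        resname = line[17:20].strip()  # columns 18-20: residue name
        if record == "HETATM" and resname not in keep_resnames:
            continue
        kept.append(line)
    return "\n".join(kept)

# Three made-up records: a protein atom, a water oxygen, and a zinc ion.
EXAMPLE_PDB = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "HETATM    2  O   HOH A 101       1.000   2.000   3.000  1.00  0.00           O",
    "HETATM    3 ZN    ZN A 201       4.000   5.000   6.000  1.00  0.00          ZN",
])

# Remove the water but keep a structurally important metal ion.
cleaned = strip_heteroatoms(EXAMPLE_PDB, keep_resnames=("ZN",))
```

The `keep_resnames` argument reflects the point made above that some heteroatoms, such as structural waters or metal ions, may need to stay in the binding pocket.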

Ligand selection and preparation

Structural information of a ligand molecule, atom, or ion can be obtained similarly from a structural database such as PubChem, DrugBank, or the PDB, or drawn using a chemical drawing tool such as ChemDraw, Avogadro, ChemIDplus, ACD/ChemSketch freeware, BIOVIA Draw, ChemSpider, or PubChem Sketcher [22, 23, 39,40,41,42,43,44]. Ligands should also be treated as flexible bodies, except for ring conformations. The number of rotatable bonds is calculated. The more rotatable bonds there are, the more difficult and time-consuming the docking process becomes, because the search space increases. The ligand geometry should be optimized to a stable conformation with a structure optimization tool such as Avogadro, UCSF Chimera’s Dock Prep, or NAMD, and atomic partial charges are then assigned with a tool such as UCSF Chimera (Table 2).
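The effect of rotatable bonds on the search space can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each rotatable bond is sampled exhaustively on a fixed angular grid; practical docking programs avoid such enumeration, and the function name and the 30° step are hypothetical choices.

```python
def torsional_search_size(n_rotatable, step_degrees=30.0):
    """Number of torsion combinations on a fixed angular grid."""
    states_per_bond = int(360 // step_degrees)
    return states_per_bond ** n_rotatable

# Growth of the torsional search space with ligand flexibility.
sizes = {n: torsional_search_size(n) for n in (1, 3, 5, 10)}
```

With a 30° grid each bond has 12 states, so five rotatable bonds already give 12^5 = 248,832 combinations before any translation or rotation of the ligand is considered.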

Table 2 List of software programs/web servers used for different steps in docking process

Docking

The ligand can be docked into the binding pocket of the receptor either by using the known location of the active site (active-site docking) or by searching for it (blind docking). If the structure of a holoprotein (protein bound to a ligand) is available, its binding site information can be used for active-site docking. Otherwise, binding site prediction tools such as LigASite [45] can be used to predict the binding pocket. In the molecular docking process, the search space in which to fit the ligand is explored while considering the flexibility of both receptor and ligand.

The approaches used to predict the binding site are based on (1) templates, (2) energy functions, (3) geometric considerations, and (4) machine learning [46]. The template-based approach predicts the binding site of a given protein using well-studied proteins that have similar structures and known binding pockets. The energy-based approach infers the binding pocket from the regions of the protein where ligand binding is energetically favorable. The geometry-based approach uses the geometry of the protein to find its binding pocket. Machine learning approaches learn from the vast amount of high-resolution protein structure data available in different structure databases, generalize to new data, and predict the binding pocket more accurately than the other three approaches. The available structural protein–ligand binding data are collected, normalized, and used to predict the binding site of the input protein by machine learning approaches such as shallow supervised learning algorithms, artificial neural networks, convolutional neural networks, and ensemble methods. The five important steps in the machine learning approach to finding the binding site of a given protein are data acquisition and preprocessing, feature engineering, model development, training–testing, and hyperparameter tuning and evaluation. A simplified view of the machine learning based approach is shown in Fig. 5. The different methods for finding the binding site of a given protein are discussed in more detail elsewhere [46].

Popular docking algorithms use three main approaches that treat the ligand as flexible when searching conformational space [15]: (i) systematic search (incremental construction, conformational search, databases), (ii) random/stochastic search (Monte Carlo, genetic algorithms, tabu search), and (iii) simulation methods (molecular dynamics, energy minimization). The energy score and energy terms of each binding pose are calculated using scoring functions. Search approaches are discussed in depth by Kitchen et al. [15].
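To illustrate the stochastic class of search methods, the following is a minimal Monte Carlo search with a Metropolis acceptance rule. It is a toy sketch, not the search used by any particular docking program: the "pose" is a two-dimensional point, `toy_score` is a hypothetical stand-in for a real scoring function, and the step size, temperature factor `kT`, and seed are arbitrary assumptions.

```python
import math
import random

def toy_score(pose):
    """Hypothetical score surface with its minimum at (1, -2)."""
    x, y = pose
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

def monte_carlo_search(score, start, n_steps=5000, step=0.5, kT=0.6, seed=0):
    """Random perturbations accepted with the Metropolis criterion."""
    rng = random.Random(seed)
    current, e_current = start, score(start)
    best, e_best = current, e_current
    for _ in range(n_steps):
        # Propose a small random move of the current pose.
        trial = tuple(c + rng.uniform(-step, step) for c in current)
        e_trial = score(trial)
        delta = e_trial - e_current
        # Always accept improvements; accept worse poses with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best

best_pose, best_energy = monte_carlo_search(toy_score, start=(5.0, 5.0))
```

Occasionally accepting a worse pose is what lets this kind of search escape local minima, at the cost of needing many scoring-function evaluations.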

A scoring function is a mathematical function used to predict the binding affinity between a receptor and a ligand when they are bound to each other. It is used to evaluate and rank the predicted ligand conformations in the docking result. Ideally, the best-scored ligand is the best binder, with the highest binding affinity to the receptor. There are four categories of scoring functions used in different docking algorithms [8, 9]: (i) force-field-based, (ii) knowledge-based, (iii) empirical, and (iv) machine-learning-based. The first three categories (i, ii, and iii) are classical scoring functions. Force-field-based or physics-based scoring functions are calculated from van der Waals, electrostatic, and desolvation interactions. GoldScore and early versions of AutoDock and DOCK employ force-field-based scoring functions [33, 47,48,49].

Knowledge-based, or potential of mean force, scoring functions use statistical analysis of experimentally determined three-dimensional protein–ligand structures and derive the potential of mean force by inverse Boltzmann analysis. Examples of such scoring functions are PMF, DrugScore, SMoG, ITScore, and KECSA [50,51,52,53,54].

Empirical or regression-based scoring functions depend on a list of weighted scoring terms contributed by different types of intermolecular interactions, such as van der Waals, hydrogen bonding, hydrophobic, desolvation, electrostatics, number of rotatable bonds (entropy), and many more (similar to force-field-based scoring functions). The weights of the scoring terms are derived by regression analysis of experimentally derived binding energies and three-dimensional structural information of known protein–ligand complexes (similar to knowledge-based scoring functions). Examples of empirical scoring functions are LUDI, ChemScore (employed in GOLD), GlideScore (implemented in Glide), X-Score, F-Score, SCORE, Fresno, Vina, and Lin_F9 [20, 55,56,57,58,59,60].

Machine-learning (ML) or descriptor-based scoring functions use machine learning techniques. The functional form of the binding affinity is learned from existing data (training data). Docking tools that use ML scoring functions have been reported to give more accurate docking results than tools based on classical scoring functions. These ML scoring functions employ methods such as deep neural networks (DNN), convolutional neural networks (CNN), graph neural networks (GNN), support vector machines (SVM), random forests (RF), and eXtreme gradient boosting (XGB) [11,12,13,14]. Scoring functions are discussed in more detail elsewhere [8, 9, 15].

Evaluation and analysis of docking result

After the docking simulation ends, the docking results must be evaluated and analyzed before they can be interpreted. The best binding pose (i.e., orientation and geometry), closest to the native structure and with the top docking score, is chosen based on the lowest estimated binding energy. In the docking result, all potential hydrogen bond acceptors and donors of the ligand should be satisfied [61], and charged groups in the ligand should interact with oppositely charged groups of the receptor. The accuracy of the docking result can be verified by comparison with experimental data, if available. The reliability of the chosen binding pose can be evaluated by the following approaches: (1) the root mean square deviation (RMSD, also called root mean square distance, which measures how well the geometry, orientation, and position of the ligand are reproduced) of a successfully predicted ligand pose from its reference native pose should be less than 2 Å, and (2) comparison of the different types of protein–ligand intermolecular interactions, such as van der Waals (vdW), electrostatic, hydrogen bonding, torsional, and desolvation, between the predicted pose and the experimentally derived one [62,63,64,65]. The second approach, based on key interactions, could be the more relevant measure of docking accuracy. Combining key interactions with shape complementarity between the receptor and ligand could also help to verify the correctness of the docking result [62]. The accuracy of the obtained ligand conformations can be checked by visualizing and comparing the molecular interactions in the predicted and experimental results [62]. Three-dimensional molecular visualization tools, namely PyMOL, VMD, and UCSF Chimera, can be used to view and inspect the docking result [26, 66, 67].
The predicted binding affinity between a receptor and a ligand is measured by the calculated binding free energy, which different docking tools estimate with different scoring functions and report as scores. These scores cannot be compared directly with experimentally derived binding data [8]. A widely used measure, the Pearson correlation coefficient (PCC), can be applied to find the linear correlation between the experimentally derived binding data and the scores predicted by docking tools for sets of test data [68, 69]. An alternative is the Spearman rank correlation coefficient, which ranks the predicted and experimentally derived values and then measures the statistical relationship between the two ranked sets [70].
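The pose-accuracy and correlation measures described above can be sketched in plain Python. The snippet assumes that the predicted and reference poses list the same atoms in the same order and share the receptor's coordinate frame (no superposition is performed); all coordinates, scores, and affinities are made-up illustration values.

```python
import math

def rmsd(pose, reference):
    """Root mean square deviation between matched atom coordinates."""
    if len(pose) != len(reference):
        raise ValueError("poses must contain the same atoms")
    sq = sum((a - b) ** 2 for p, q in zip(pose, reference) for a, b in zip(p, q))
    return math.sqrt(sq / len(pose))

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical two-atom ligand displaced by 1 angstrom along z.
reference_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
pose_rmsd = rmsd(docked_pose, reference_pose)

# Hypothetical docking scores (more negative is better) and measured affinities.
docking_scores = [-9.0, -7.5, -6.0, -8.0]
measured_affinity = [8.5, 6.9, 5.0, 7.8]
pcc = pearson(docking_scores, measured_affinity)
rho = spearman(docking_scores, measured_affinity)
```

Because lower scores are better while higher affinities are better in this example, a good scoring function shows up as a strongly negative correlation.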

Virtual screening is a computational method used in drug discovery to find the best candidates among a large number of molecules available to bind a target molecule (e.g., a protein or enzyme). Its screening capability and its ability to predict good hits depend on the power of the scoring function used in molecular docking. The area under the receiver operating characteristic curve (AUC) and the top 1% enrichment factor (EF1%) can be used to assess the ranking performance of a scoring function [71]. AUC measures how well the best candidates have been ranked: it takes the value 1.0 for a perfect ranking and 0.5 for a random ranking. EF1% is the ratio of the percentage of active compounds in the top 1% of the ranked compounds to the overall percentage of active compounds. EF1% can then be normalized to NEF1% by dividing the estimated EF1% by the best achievable EF1%. NEF1% assesses the ranking performance of a scoring function: a value of 1.0 means that most of the active compounds are ranked in the top 1% of the ranked compounds, and a value of 0 means that none of the selected compounds are active [72]. The comparative assessment of scoring functions (CASF) evaluates the performance of a scoring function [68, 69]. The CASF benchmark analyzes a scoring function by its scoring, ranking, docking, and screening power. The LIT-PCBA benchmarking datasets can be used to evaluate the quality of a scoring function in machine-learning-based virtual screening [73].
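A minimal sketch of these ranking metrics follows, assuming the screened compounds are already sorted from best to worst score and labelled 1 (active) or 0 (inactive). The function names and the example ranking are hypothetical, and tied scores are ignored for simplicity.

```python
def enrichment_factor(labels_ranked, fraction=0.01):
    """Return (EF, NEF) for the top `fraction` of a best-first ranked list."""
    n = len(labels_ranked)
    n_top = max(1, int(round(n * fraction)))
    actives_total = sum(labels_ranked)
    actives_top = sum(labels_ranked[:n_top])
    overall_rate = actives_total / n
    ef = (actives_top / n_top) / overall_rate
    # Best achievable EF: the top slice filled with as many actives as possible.
    ef_best = (min(n_top, actives_total) / n_top) / overall_rate
    return ef, ef / ef_best

def roc_auc(labels_ranked):
    """AUC as the fraction of (active, inactive) pairs ranked in the right order."""
    n_act = sum(labels_ranked)
    n_inact = len(labels_ranked) - n_act
    inactives_seen = 0
    pairs_correct = 0
    for label in labels_ranked:
        if label:
            pairs_correct += n_inact - inactives_seen  # inactives ranked below
        else:
            inactives_seen += 1
    return pairs_correct / (n_act * n_inact)

# Hypothetical screen: 200 compounds, 10 actives, one inactive ranked second.
ranking = [1, 0] + [1] * 9 + [0] * 189
ef1, nef1 = enrichment_factor(ranking, fraction=0.01)
auc = roc_auc(ranking)
```

In this example the top 1% holds two compounds, one of which is active, so the enrichment is half of the best achievable value even though the AUC is close to 1.0; the two metrics answer different questions about the same ranking.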

Moreover, molecular dynamics (MD) simulation can be used in the next step to affirm and refine the model of the docking complex [74,75,76].

The best docking result can be interpreted using all the information that tells how, why, and where the ligand binds to the given receptor. This includes the 3D structure of the receptor-ligand complex; details of the interacting residues; contacts within a certain distance (e.g., 3, 3.5, or 4 Å); polar contacts; π–π and π-cation interactions; the binding affinity; the docking score; the net binding energy with contributions from different interaction potentials, such as van der Waals (vdW), electrostatic, hydrogen bonding, torsional, and desolvation; and the RMSD from the reference conformation. Visualizing the different types of interaction (hydrogen bonding, hydrophobic, ionic, aromatic, π–π, π-cation) and the regions of contact between the receptor and the ligand in the binding cavity helps to confirm the stability of the binding.

Types of docking

Docking studies can be of three types, namely rigid docking, flexible-rigid docking, and flexible docking, depending on the flexibility allowed for the interacting molecules (receptor and ligand), as shown in Fig. 2 [77]. Flexible docking gives more reliable and accurate results because the relative bond angles and bond lengths of the molecules may vary. Pagadala and co-workers have reviewed different molecular docking programs based on rigid and flexible docking [78].

Fig. 2 Different types of docking studies based on flexibility of receptors/ligands considered in molecular interaction. (Color figure online)

Rigid docking

In rigid docking, both ligand and receptor molecules are considered rigid bodies. Their shapes do not change, so the internal geometry of each molecule is kept fixed. Only the position and orientation of each molecule can vary, so only the translational and rotational degrees of freedom are considered. This is an early docking method that can be carried out between macromolecules, such as two protein molecules. Its results are less accurate and less reliable, and it is therefore used less frequently in current docking studies. The lock-and-key principle can be applied in this method [79].
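The rigid-body assumption can be illustrated with a short sketch: a pose change is a rotation plus a translation, and interatomic distances inside the molecule are unchanged. For brevity the example rotates only about the z-axis, whereas a full rigid-body search covers three rotational and three translational degrees of freedom; the coordinates and function name are hypothetical.

```python
import math

def rigid_transform(coords, angle_deg, translation):
    """Rotate coordinates about the z-axis, then translate them."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in coords]

# Hypothetical three-atom ligand.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
moved = rigid_transform(ligand, angle_deg=40.0, translation=(3.0, -2.0, 5.0))

# Internal geometry before and after the move.
before = [math.dist(ligand[i], ligand[j]) for i in range(3) for j in range(i)]
after = [math.dist(moved[i], moved[j]) for i in range(3) for j in range(i)]
```

The lists `before` and `after` agree to within floating-point error, which is exactly the constraint that flexible docking methods relax by also varying torsion angles.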

Flexible-rigid docking

It is a semi-flexible docking method. In this case, either ligand or receptor is taken as a rigid body. Usually, the shape of the receptor is kept fixed, and the conformation of the ligand is varied. This method gives more accurate and better reliable results than the rigid docking method and is thus frequently used.

Flexible (soft) docking

It is a fully flexible docking method, in which both ligand and receptor are considered as flexible bodies, i.e., an enumeration of rotations of the molecules (both receptor and ligand) is done to search for optimized conformation and orientation of the molecules to interact with each other. The molecule’s shape can be varied by changing the torsion angles and rotatable bonds. This method results in the prediction of docked conformation with high accuracy that most probably resembles experimental results but may require heavy computational calculation and time.

In both semi-flexible and fully flexible docking methods, the induced-fit principle can be implemented, and the docking process becomes complicated when the interacting molecule has many conformational degrees of freedom [79].

Importance of structural information of receptor’s binding pocket in docking study

A receptor interacts with a ligand at its binding site. But the docking study can be done with or without using the binding site information as shown in Fig. 3 [4].

Fig. 3

A The docking box (yellow dashed rectangle) is created around the receptor’s whole surface. The whole surface is scanned for a possible binding pocket for the ligand, and the ligand is placed in the best binding site to form a stable complex. B The docking box is created around the surface of the receptor’s binding site only, and the ligand is placed in that binding site to form a stable complex. (Color figure online)

Blind docking

Docking performed without using structural information about the receptor’s binding or active site (where the ligand is expected to bind) is called blind docking. If the ligand under study does not interact with the receptor at its known binding site (the receptor may have a binding pocket where it interacts with another partner ligand in nature), or if the location of the binding site is unknown, the entire surface of the receptor is searched for a pocket that can accommodate the ligand. If it is not possible to target the whole receptor molecule, candidate pockets can be searched instead. This type of docking study is usually done to investigate the possible interaction between a novel protein and a ligand under study.

Active site docking

A docking study based on the binding site information of the receptor is called active site or site-specific docking. Previously reported results on the position of the binding pocket, the active residues of the receptor, and the binding modes of known ligands reduce the search area and thus increase the speed and effectiveness of the docking search. For well-studied proteins, existing knowledge of the binding pocket can be used directly. If the pocket is not known, it can be inferred by mapping the protein to a family of well-characterized proteins with similar structure and function and known active sites. In such a docking process, the ligand is allowed to bind only in the active site and nowhere else.
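
The difference in search volume between blind and site-specific docking can be sketched with a simple axis-aligned docking box. The residue names, coordinates, and the 4 Å padding are hypothetical and chosen only for illustration.

```python
def bounding_box(coords, padding=4.0):
    """Return (center, size) of an axis-aligned box enclosing coords plus padding."""
    mins = [min(c[i] for c in coords) - padding for i in range(3)]
    maxs = [max(c[i] for c in coords) + padding for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple(hi - lo for lo, hi in zip(mins, maxs))
    return center, size

def volume(size):
    return size[0] * size[1] * size[2]

# Hypothetical receptor: residue label -> representative atom coordinate (angstrom).
receptor = {
    "ALA10": (0.0, 0.0, 0.0),
    "HIS57": (12.0, 3.0, 5.0),
    "ASP102": (13.0, 2.0, 8.0),
    "SER195": (14.0, 5.0, 6.0),
    "LYS200": (30.0, 25.0, 20.0),
}
site_residues = ["HIS57", "ASP102", "SER195"]  # assumed known binding-site residues

blind_center, blind_size = bounding_box(list(receptor.values()))
site_center, site_size = bounding_box([receptor[r] for r in site_residues])

print("blind box:", blind_size, volume(blind_size))
print("site box: ", site_size, volume(site_size))
```

The much smaller site-specific box is what reduces the search effort when the binding pocket is already known.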

Models to understand docking study

In molecular docking, molecular recognition is simulated computationally to find the best conformation of ligand and receptor based on the features and complementarity of both molecules, which predicts the binding affinity and interaction between them [79]. Three basic models, shown in Fig. 4, help to understand docking simulation more precisely.

Fig. 4

Different docking models: (1) Lock and key model, (2) Induced-fit model, (3) Conformational selection model. Among these three models, only induced-fit model induces a change in the shape of the binding pocket. (Color figure online)

Lock-and-key

It is the first model, proposed by Fischer, to predict a docked complex of a rigid receptor and a rigid ligand [80]. In this type of rigid docking, the ligand is taken as a key and the receptor as a lock. The key opens the lock based on the shape complementarity between the receptor’s active site and the ligand. The computation must find the relative position and orientation of the “key” with respect to the “lock” that will unlock it. This model is quite simple but may not reflect the dynamic behavior of the molecules (both receptor and ligand). The HINT (Hydropathic INTeractions) docking program uses experimentally derived solvent partition coefficients (LogP values between water and 1-octanol) of model organic molecules to estimate and score the hydropathic interactions between atoms of the protein and the ligand in the docking process [81]. Cozzini and co-workers used this program to estimate the binding free energy of 53 protein–ligand complexes formed by 17 proteins with solved 3D structures available in the Protein Data Bank (PDB) and hydrophobic/polar ligands. They used HINT interaction maps of hydropathic (lock-and-key) complementarity to characterize the binding between protein and ligand at different pH values (3.5, 4.5, and 5.5).
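
The search for the position and orientation of the “key” can be sketched on a toy two-dimensional grid. This is not the HINT scoring scheme: the 5x5 receptor, the L-shaped pocket, and the score (+1 per key cell inside the pocket, -10 per cell clashing with the receptor) are all hypothetical choices made to show an exhaustive rigid search over rotations and translations.

```python
def rotate90(cells, times):
    """Rotate grid cells counterclockwise by 90 degrees, `times` times."""
    for _ in range(times % 4):
        cells = {(-y, x) for x, y in cells}
    return cells

def score_pose(key, rotation, shift, pocket, receptor):
    """+1 per key cell in the pocket, -10 per key cell clashing with the receptor."""
    placed = {(x + shift[0], y + shift[1]) for x, y in rotate90(key, rotation)}
    return len(placed & pocket) - 10 * len(placed & receptor)

# Hypothetical 5x5 receptor ("lock") with an L-shaped pocket open at the top.
pocket = {(2, 4), (2, 3), (3, 3)}
receptor = {(x, y) for x in range(5) for y in range(5)} - pocket
key = {(0, 0), (1, 0), (1, 1)}  # L-shaped ligand ("key") in an arbitrary orientation

best_score, best_pose = None, None
for rotation in range(4):            # four rigid orientations
    for dx in range(-2, 7):          # rigid translations along x
        for dy in range(-2, 7):      # rigid translations along y
            s = score_pose(key, rotation, (dx, dy), pocket, receptor)
            if best_score is None or s > best_score:
                best_score, best_pose = s, (rotation, (dx, dy))

print(best_score, best_pose)
```

Only one rotation and translation places the whole key inside the pocket without clashes, which is the grid analogue of the key fitting the lock.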

Induced-fit

Docking can be better understood through the “hand-in-glove” concept, which considers the flexibility of the molecules. The molecules can change their shape to best fit each other while binding. This type of binding is termed “induced-fit” binding. The model was proposed by Koshland to explain enzyme specificity in protein synthesis, with the concept that structural changes in an enzyme are induced by a substrate as it binds at the active site to form a stable enzyme-ligand complex [82]. The model hypothesizes that the enzyme optimizes its interface through conformational modifications of its active-site residues so that it can interact physically with the substrate. In other words, the binding site of the enzyme is flexible in shape. Before binding, the substrate does not fit into the binding pocket of the enzyme; during binding, the active site changes shape to complement the shape of the substrate. This model is supported by structures of protein–ligand complexes obtained by X-ray crystallography and deposited in the Protein Data Bank (PDB), which show ligands enclosed in the binding pockets of proteins and suggest that binding-site residues cover the ligand after the binding process starts [22, 83]. The difference between the structure of a protein with a bound ligand (holo) and without a ligand (apo) reveals the conformational change of the protein [84]. The ligand binds to the lowest-energy protein conformation, and following this initial binding the protein undergoes a conformational change. Thus, the molecules undergo conformational changes to fit each other and minimize the net binding energy. The results obtained with this model agree best with reality, in which molecules and atoms are physically in motion.
The ligand first binds to the receptor (in its original shape) and then causes a structural change in the receptor to accommodate it. Without the ligand, the structurally changed state of the receptor may not exist. Flexible docking using the induced-fit model permits conformational changes while predicting the binding pose in the interaction between a protein and other molecules, e.g., a ligand, protein, or peptide [34, 85]. The Fleksy program treats both ligand and receptor as flexible and can be used for flexible and induced-fit docking [86]. In this approach, a structural ensemble of receptor conformations is produced from a backbone-dependent rotamer library, and the ligand is docked into it. The best-ranked poses (ranked by a consensus scoring function that depends on docking scores and molecular dynamics force field interaction energies) undergo a refinement step using the Yasara program, which includes steepest descent minimization followed by simulated annealing, and finally a list of minimized docked complexes is obtained [87]. In the minimization step, both receptor and ligand molecules are allowed to move. Using the induced-fit approach, the Molegro Virtual Docker tool docks a ligand against a protein with user-given constraints, optional removal of water molecules, structural alignment of protein and ligand, and ligand-based graphics processing unit (GPU) screening [88]. FiberDock is a protein–protein docking refinement web server [89]. In the docking process, it generates potential candidate docking complexes, models their backbone and side-chain residues based on their movement in the protein–protein interaction, refines the docking complexes, and scores them. The flexible refinement of candidate docking complexes follows the induced-fit model. HADDOCK, a flexible protein-peptide docking software, uses both the induced-fit and conformational selection approaches [90]. This program yields ensembles of favorable peptide structures; these conformations are further refined by allowing movement of side-chain and backbone residues and are then docked into the binding pocket of the target protein using the induced-fit approach.
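
The simulated annealing step mentioned in the refinement above can be illustrated with a generic sketch. It is a textbook Metropolis scheme on a hypothetical one-dimensional energy surface, not the Yasara or Fleksy implementation; the cooling schedule, step size, and seed are arbitrary assumptions.

```python
import math
import random

def energy(x):
    """Toy one-dimensional 'binding energy' surface with its minimum at x = 2."""
    return (x - 2.0) ** 2

def simulated_annealing(start, steps=2000, t_start=5.0, t_end=0.01, seed=0):
    """Metropolis search with geometric cooling from t_start to t_end."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    for i in range(steps):
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        candidate = x + rng.uniform(-0.5, 0.5)   # small random move
        e_candidate = energy(candidate)
        delta = e_candidate - e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = candidate, e_candidate
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = simulated_annealing(start=-8.0)
print(best_x, best_e)
```

At high temperature the search accepts unfavorable moves and can escape local minima; as the temperature falls it behaves increasingly like a plain minimizer.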

Conformational selection or population shift

The conformational selection model was proposed by Monod, Wyman, and Changeux in 1965. It suggests that, before binding to the ligand, the unbound receptor exists in different conformational states populated according to their relative free energies (Gibbs distribution), and the ligand selects the best fit among them to form the final stable docked complex [91]. The difference between the two models lies in the order of events: in the induced-fit model, the ligand binds to the receptor before the structural change in the receptor, whereas in the conformational selection model the structural change of the receptor occurs before ligand binding. This model is extensively used even though it requires more time and high-throughput computing facilities. HADDOCK protein-peptide docking software uses both the conformational selection and induced-fit approaches [90].
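
The population of pre-existing receptor conformers according to relative free energy can be sketched with Boltzmann weights, p_i = exp(-G_i/RT) / sum_j exp(-G_j/RT). The conformer names, free energies, and temperature below are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_populations(free_energies, temperature=298.15):
    """Fractional populations of states from their relative free energies (kcal/mol)."""
    rt = R_KCAL * temperature
    g_min = min(free_energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(g - g_min) / rt) for g in free_energies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical relative free energies of three unbound receptor conformers.
conformer_g = {"closed": 0.0, "half-open": 1.0, "open": 2.5}
populations = boltzmann_populations(list(conformer_g.values()))

for name, p in zip(conformer_g, populations):
    print(f"{name}: {p:.3f}")
```

In this picture a ligand that fits only the rarely populated "open" state still finds it in the ensemble, and binding then shifts the population toward that state.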

Combination lock and key

In docking studies, both the lock-and-key and induced-fit models represent a simplified view of the interaction between ligand and receptor. In reality, the interaction is the net result of many complicated interaction processes. In this view, Tripathi et al. have proposed a combination lock-and-key model [79]. Combined complementary features of receptor and ligand help to understand the stable interaction between them. The better the complementary features fit, the better the binding. The complementary features can be (1) geometric properties, such as shape, size, volume, surface area, bond lengths, bond angles, and torsional angles, and (2) physicochemical properties, such as solvation, electrostatic, hydrophobic, polar, nonpolar, and van der Waals properties. For molecules, energy-based features change, whereas geometry-based features do not change much in three-dimensional space. Pharmacophoric chemical features (including hydrophobic centroids, aromatic rings, hydrogen bond acceptors or donors, cations, and anions), geometric features, and electronic features arising from the electrons of the molecules all contribute to the net interaction between the molecules. Electronic features, such as electrostatic, hydrophobic, and van der Waals interactions, have varying field intensity in different spatial arrangements of the molecules and thus create a unique electric field whose strength depends on the distance between the atoms and molecules and varies from one point to another. The energy contributed by each type of molecular feature (pharmacophoric chemical, geometric, and intrinsic electronic) to the net interaction energy corresponds to a distinct interaction between the molecules. The pattern of these molecular features in three-dimensional space determines molecular recognition.

This hybrid model is a novel approach and awaits development of new docking software using it.

Use of molecular docking in drug design

Molecular docking is an effective method in drug discovery [5]. Its applications in drug discovery include exploring the binding interaction between a protein target and a ligand; hit identification and optimization (virtual screening); predicting new disease-related targets for existing drugs (drug repositioning); reverse screening for target fishing and profiling (prediction of targets by ligand-receptor complementarity); multi-receptor ligand design and repositioning; investigating the connection between different targets associated with a particular disease (polypharmacology); and predicting the structural features that are responsible for effective receptor-ligand binding (ligand-target binding rationalization) [92]. It can be merged with large-scale screening methods to recognize the binding pocket of a receptor, predict new targets for known ligand molecules, and anticipate adverse drug effects.

Molecular docking can be combined with artificial intelligence (AI). Two branches of AI, machine learning and deep learning, have remarkable applications in the pharmaceutical industry. Machine learning (ML), a subset of AI, uses statistical methods that allow machines to improve with experience and new data, whereas deep learning (DL), a subcategory of ML, enables the machine to learn from data using neural networks. ML and DL methods can be used to predict effectively the binding pocket of a protein where the ligand binds, the binding affinity of the ligand for the protein, and its binding orientation and geometry.

ML algorithms are of two types: (1) supervised and (2) unsupervised [93]. Unsupervised machine learning algorithms model the training data even when output data are unavailable, so they are employed to group data based on feature similarity. In supervised machine learning, on the other hand, the available output is fed into the algorithm together with the input data for training. Dhakal and his co-workers have categorized machine learning methods into two groups: (1) classical ML (non-deep learning methods) and (2) modern deep learning, or simply deep learning, methods [46]. They have listed and summarized different classical ML and modern deep learning methods for predicting the binding site, the binding affinity between a protein and its ligand, and the binding pose of protein and ligand and its scoring [46]. Machine learning learns the relationship between physicochemical properties and protein–ligand interactions from known binding complexes and applies statistical methods to predict the interactions of unknown protein–ligand complexes. Figure 5 shows the workflow of the machine learning method for predicting the interaction between protein and ligand [46].

Fig. 5

Workflow of machine learning method in predicting protein–ligand interaction [46]. (Color figure online)

Machine learning (ML) algorithms discover the hidden information underlying existing data. After the training data set is fed into the algorithm, the ML model learns from it automatically. It can identify hidden relationships and patterns in the data that may not be identifiable by experts. During training, the ML model can be checked against a validation data set to examine how well it works. The model can then be run on new test data, and different assessment metrics can be used to assess the quality of its predictions [46]. Because the model learns from data, the more data are fed into it, the better it works. Several widely used datasets provide useful resources for training, validating, and testing ML models [46].
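
The train, validate, and test workflow can be sketched with a deliberately simple supervised model. The data are synthetic (a single made-up descriptor with a noisy linear relation to a made-up affinity), the 60/20/20 split is an arbitrary choice, and the least-squares line stands in for the far richer models used in practice.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(model, xs, ys):
    """Root-mean-square error of the model predictions (assessment metric)."""
    slope, intercept = model
    squared = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / len(xs))

# Synthetic data set: (descriptor, affinity) pairs with Gaussian noise.
rng = random.Random(42)
data = []
for _ in range(100):
    x = rng.uniform(0.0, 10.0)
    data.append((x, -0.8 * x - 2.0 + rng.gauss(0.0, 0.3)))
rng.shuffle(data)

train, valid, test = data[:60], data[60:80], data[80:]
model = fit_line(*zip(*train))           # learn from the training set
valid_error = rmse(model, *zip(*valid))  # check the model during development
test_error = rmse(model, *zip(*test))    # final assessment on unseen data
print(model, valid_error, test_error)
```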

Deep learning (DL), a subcategory of machine learning, is gaining popularity because of its potential to detect hidden complex relationships within data. DL methods can automatically extract biological features from raw data. DL neural networks can be employed to predict the pose of a ligand in a protein’s binding pocket and to rank it [93]. Convolutional neural networks can be used to represent a protein–ligand complex as a three-dimensional grid and predict binding affinity [94]. DL scoring functions can give remarkable results compared with others. Therefore, combining the abundance of physicochemical properties of proteins and ligands with modern DL techniques can achieve very high accuracy. In the future, with multi-task learning, advanced DL methods may predict binding site, binding affinity, and binding pose simultaneously [46].
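
The three-dimensional grid representation can be sketched as a simple occupancy count per voxel. The atom coordinates, grid origin, spacing, and shape are hypothetical; real pipelines typically use one channel per atom type and smoothed densities rather than raw counts.

```python
def voxelize(coords, origin, spacing, shape):
    """Count atoms falling in each cell of a regular 3D grid."""
    nx, ny, nz = shape
    grid = [[[0 for _ in range(nz)] for _ in range(ny)] for _ in range(nx)]
    for x, y, z in coords:
        i = int((x - origin[0]) // spacing)
        j = int((y - origin[1]) // spacing)
        k = int((z - origin[2]) // spacing)
        if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:  # skip atoms outside the box
            grid[i][j][k] += 1
    return grid

# Hypothetical atom positions (angstrom); the last atom lies outside the grid.
atoms = [(0.2, 0.2, 0.2), (0.8, 0.3, 0.1), (2.5, 1.5, 0.5), (9.0, 9.0, 9.0)]
grid = voxelize(atoms, origin=(0.0, 0.0, 0.0), spacing=1.0, shape=(4, 4, 4))

occupied = sum(cell for plane in grid for row in plane for cell in row)
print(grid[0][0][0], grid[2][1][0], occupied)
```

A grid of this kind is the input tensor over which a convolutional network slides its filters.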

Docking in organic, inorganic, and hybrid systems

The molecular docking method can be applied to both organic and inorganic molecules. An organic molecule always contains carbon, whereas a purely inorganic molecule generally does not. The organic molecule can be a protein, a non-protein, or a synthetic organic molecule. In an interaction study, the organic receptor and ligand can be any synthetic organic molecule, such as a synthetic organic nanoparticle, plastic products made from polymers (small repeating molecules), elastomers (flexible rubber material), medicine, organic dye, or artificial sweeteners like stevia and Equal, or a bio-organic molecule, including carbohydrate, lipid, protein, enzyme, DNA, RNA, or an organic nanoparticle derived from natural materials. The inorganic receptor and ligand can be any molecule other than organic molecules, generally not derived from living beings, like ammonia, hydrogen sulfide, all metals, most elements (such as calcium), and metal nanoparticles. In most docking tools, like AutoDock, AutoDock Vina, and ClusPro, inorganic molecules are considered only as ligands, whereas organic molecules are taken as both receptors and ligands [19, 55, 95, 96]. A metal nanoparticle, atom, or ion can be taken as an inorganic ligand, but not as a receptor, in a docking study. In a binding study between a protein and a metal nanoparticle by molecular docking, the smaller protein is assumed to be the receptor and the bigger nanoparticle the ligand. The type of molecular docking problem depends on the molecules under study, which can be organic, inorganic, or hybrid. There are mainly four types of docking problems, namely organic-organic, organic–inorganic, hybrid receptor/ligand, and inorganic-inorganic.

Organic-organic

In this case, both receptor and ligand are organic molecules. Molecular interactions such as protein–protein, protein-DNA/RNA, protein-synthetic organic nanoparticle, protein-organic ligand, protein-organic drug, DNA/RNA-organic ligand, DNA/RNA-organic drug, enzyme-protein, enzyme-organic substrate, and synthetic organic dye-polymer moiety are examples of organic-organic intermolecular docking studies.

Biomolecules-organic ligand docking case studies

Many past studies have focused on this type. Basu et al. found a conformational change in the receptor, a chain of angiotensin-converting enzyme 2 (ACE2), after docking it to a SARS CoV2 spike protein fragment with the ClusPro web server, with binding energy − 779.8 kcal/mol, as shown in Fig. 6A [96, 97]. The bound complex of SARS CoV2 spike protein and ACE2 protein was taken as a therapeutic target for SARS CoV2 treatment with naturally available phytochemicals, among which hesperidin was found to be the best match after docking studies with several phytochemicals. Docking of the complex structure (SARS CoV2 spike protein fragment and ACE2) with different phytochemicals was executed with the SWISSDOCK web server, which is based on EADock DSS [98]. The binding affinity of different docking structures of ACE2 and phytochemicals, in the presence and absence of the SARS CoV2 spike protein fragment, was calculated using the DockThor web server [99].

Fig. 6

A Docked complex structure of spike protein fragment of coronavirus (SARS-CoV2) and human receptor angiotensin-converting enzyme 2 (ACE2) using molecular docking webserver ClusPro 2.2 [96, 97]. Spike protein fragment (331–524) is shown in red cartoon view, and human ACE2 is shown in blue cartoon view. B Docked complex structure of phytochemical “hesperidin” (ligand) and protein-enzyme bound structure of spike protein fragment of coronavirus (SARS-CoV2) and angiotensin-converting enzyme 2 (ACE2) (receptor) using SWISSDOCK web server and EADock DSS [97, 98]. Spike protein fragment (331–524) is shown in red cartoon view, hesperidin molecule is shown in cyan and red stick view, and human ACE2 is shown in blue cartoon view. C Docked complex structure of DNA with aflatoxin B1 exo-8,9-epoxide using AutoDock 4.0 [100, 101, 109]. Aflatoxin B1 exo-8,9-epoxide is shown in red stick view and DNA is shown in violet surface view [97, 101]. (Color figure online)

In the absence of the spike protein fragment, they performed the docking study between hesperidin (ligand) and ACE2 protein (receptor), and the binding affinity was found to be − 9.167 kcal/mol. In the second scenario, they carried out the docking study in the presence of the spike protein fragment: the docking result previously obtained from the ClusPro web server, i.e., the complex of SARS CoV2 spike protein and ACE2 protein, was taken as the receptor and the organic molecule hesperidin as the ligand, and the binding affinity was reduced in magnitude to − 8.639 kcal/mol. The docking result is shown in Fig. 6B. From the comparison of the binding affinities obtained in the absence and presence of the spike protein fragment, they concluded that the presence of the hesperidin molecule destabilizes the bound complex of ACE2 and the spike protein fragment, and thus predicted that hesperidin may show antiviral activity in SARS CoV2 infection. In their work, Ricci and Paulo performed docking between aflatoxin B1 exo-8,9-epoxide and DNA (PDB ID: 1MKL) using AutoDock 4.0 (Fig. 6C) [100, 101].
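
The size of the difference between the two reported affinities can be put in perspective with a short calculation. It treats the docking scores as binding free energies and uses the thermodynamic relation ΔG = RT ln(Kd); both the interpretation and the temperature of 298.15 K are assumptions made here, and docking scores are only approximate estimates of free energy.

```python
import math

R_KCAL = 0.0019872    # gas constant in kcal/(mol*K)
TEMPERATURE = 298.15  # assumed temperature in K (not stated in the study)

dg_without_spike = -9.167  # kcal/mol, hesperidin docked to ACE2 alone
dg_with_spike = -8.639     # kcal/mol, hesperidin docked to the spike-ACE2 complex

delta = dg_with_spike - dg_without_spike

def dissociation_constant(dg):
    """Kd in mol/L from dG = RT ln(Kd), treating the score as a free energy."""
    return math.exp(dg / (R_KCAL * TEMPERATURE))

ratio = dissociation_constant(dg_with_spike) / dissociation_constant(dg_without_spike)
print(f"difference = {delta:.3f} kcal/mol, Kd ratio = {ratio:.2f}")
```

Under these assumptions the difference of about 0.5 kcal/mol corresponds to a change in the dissociation constant of less than one order of magnitude.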

To understand the possible allosteric inhibiting ability of fourteen laboratory-synthesized aromatic compounds (derived from 2,3-dihydroquinazolin-4(1H)-one and considered as ligands) to replace 3,8-diamino-5-ethyl-6-phenylphenanthridinium bromide (ethidium bromide or EB, also treated as a ligand) from calf thymus DNA (CT-DNA, also called 1-BNA dodecamer or B-DNA) (PDB ID: 1BNA) (taken as receptor), Kumar performed a molecular docking study between the ligands and the CT-DNA receptor with the AutoDock Vina tool using the Lamarckian genetic algorithm [16, 95, 102]. Among the 15 ligands, 2-(3-nitrophenyl)-2,3-dihydroquinazolin-4(1H)-one (G13) was found to show better binding affinity than the other ligands, based on the estimated minimum binding energy, with a hydrogen bonding interaction that helps to understand the potential intercalative DNA binding of the compound. The binding energy in the molecular docking study was − 8.6 and − 7.4 kJ/mol for the G13 and EB compounds, respectively. One hydrogen bond, of length 2.224 Å, was found between G13 and nucleotide 22 (DG22, where G stands for guanine) of CT-DNA. No hydrogen bond was found between EB and CT-DNA. The molecular docking analysis indicates that G13 binds to the minor groove of CT-DNA more strongly than EB, which explains the replacement of EB by G13 in the minor groove of the DNA. The docked complexes of CT-DNA with ligands G13 and EB are shown in Fig. 7A, B, respectively.
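
Because the case studies in this section report energies in both kJ/mol and kcal/mol, a small conversion helps when comparing them. The sketch below uses the binding energies quoted above and the thermochemical definition 1 kcal = 4.184 kJ; the function and variable names are illustrative.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie

def kj_to_kcal(value_kj):
    """Convert an energy from kJ/mol to kcal/mol."""
    return value_kj / KJ_PER_KCAL

# Binding energies quoted in the text (kJ/mol).
binding_energy_kj = {"G13": -8.6, "EB": -7.4}
binding_energy_kcal = {name: kj_to_kcal(e) for name, e in binding_energy_kj.items()}

# The most negative binding energy corresponds to the strongest predicted binding.
strongest = min(binding_energy_kj, key=binding_energy_kj.get)

for name, e in binding_energy_kcal.items():
    print(f"{name}: {e:.2f} kcal/mol")
print("strongest binder:", strongest)
```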

Fig. 7

A Docked complex result of calf thymus DNA (CT-DNA) and G13. B Docked complex result of CT-DNA and EB using AutoDock Vina molecular docking tool [95, 102]. The DNA is represented in cartoon view and both the ligands (G13 and EB) are shown in sticks view. H-bond is shown in green dashed line [102]. (Color figure online)

Synthetic organic molecules-organic ligand docking case studies

The majority of docking algorithms recognize biomolecules (e.g., DNA, RNA, protein, and enzyme) exclusively as receptors. They do not support other molecules, such as synthetic organic, synthetic organic–inorganic hybrid, and inorganic molecules, as receptors. In our recent work, the ability of anionic polymerized hydrogel materials (PGM) (poly(N-isopropylacrylamide-co-acrylic acid) anionic microgels) to separate the cationic organic dye methylene blue (MB) from water was studied [103]. The adsorption mechanism of methylene blue dye by organic polymer moieties on the PGM matrix was examined by molecular docking simulation using AutoDock 4.2 with the simulated annealing algorithm [100, 104].

Docking was performed between the MB cationic dye (receptor) and the constituent components, i.e., monomeric units, of the anionic microgel (ligands), namely the carboxylate ion (COO), the carboxylic acid (COOH), and N-isopropylacrylamide (NIPAM), to investigate the possible intermolecular interactions between them, such as hydrogen bonding, electrostatic, van der Waals, desolvation, and torsional. The docking complex models for the pairs (COO, MB), (COOH, MB), and (NIPAM, MB) found in this study are presented in Fig. 8. The net binding energy (BE), along with the contributing potentials due to electrostatic (El), van der Waals-hydrophobic-desolvation (Vhd), torsional (T), and unbound (U) terms calculated for the docking complexes (COO, MB), (COOH, MB), and (NIPAM, MB), is displayed as a column plot in Fig. 9. The total binding energy was found to be − 2.73 kcal/mol for the docking complex between the MB cation and the carboxylate anion (COO), which is the maximum among the docking complexes, i.e., compared with (COOH, MB) and (NIPAM, MB).

Fig. 8

Molecular docking results with least binding energy between methylene blue (MB) cation (recognized as receptor) and organic polymer moieties (identified as ligands), namely carboxylate ion (COO) (deprotonated), carboxylic acid (COOH) (protonated), and N-isopropylacrylamide (NIPAM) using AutoDock 4.2 tool with simulated annealing algorithm [100, 103, 104]. The polymer moieties and MB cationic dye are represented in ball-and-stick models and the inter-atomic distances (within 3.5 Å) between them are displayed in yellow dotted lines. MB, COO, COOH, and NIPAM are colored blue, gray, dark gray, and black, respectively [103]. (Color figure online)

Fig. 9

Column charts showing the total binding energy (BE) and the contributions from different types of potential, namely electrostatic (EI), van der Waals-hydrophobic-desolvation (Vhd), torsional (T), and unbound (U), found in the docking results for the (COO, MB), (COOH, MB), and (NIPAM, MB) bound complexes [103]. (Color figure online)

Organic–inorganic

In this case, the receptor is organic and the ligand is inorganic. Interactions such as enzyme-inorganic substrate, enzyme-inorganic drug, protein-inorganic drug, protein-inorganic ligand, protein-metal atom or ion, protein-metal nanoparticle, DNA/RNA-inorganic drug, DNA/RNA-metal atom or ion, DNA/RNA-metal nanoparticle, and polymer moiety-metal atom or ion fall under the category of organic–inorganic intermolecular docking studies.

Molecular docking of ligands containing metal atoms or ions (e.g., metal nanoparticles) is challenging because the existing force fields do not support larger nanoparticles (having many metal atoms) or all types of metal atoms. Some studies, however, have been carried out using Brownian dynamics (BD) rigid-body docking, AutoDock, AutoDock Vina, and Hex software and are discussed here [55, 95, 100, 105, 106].

Biomolecules-inorganic ligand docking case studies

Aghili Z has predicted the molecular interaction between hen egg lysozyme protein and a water-coated iron nanoparticle (Fe NP) by a docking study using the Hex 6.3 software tool, with net binding energy − 230.92 kJ/mol, as shown in Fig. 10A [105, 107]. This work found hydrogen bond formation between the water molecules surrounding the Fe NP and polar residues on the surface of the lysozyme protein.

Fig. 10

A Docked complex structure of hen egg white lysozyme (HEWL) protein in gold cartoon view with water-coated iron nanoparticle (FeNP) in orange stick view using Hex 6.3 docking software tool [105, 107]. B Docked complex structure of VP6 trimer protein in grey cartoon and surface view with Pd(II) ion in blue sphere view at pH 5.0 using AutoDock 4.0 [100, 108]. The metal binding site is shown in red surface view containing polar residues His173, Ser240, and Asp242 viewed in red stick representation. C Docked complex structure of ubiquitin protein with bare neutral gold nanoparticle (AuNP) using Brownian dynamics (BD) rigid-body docking [106, 186]. The backbone of the protein is shown in cartoon view, the residues that are in contact with the Au surface are shown in stick view, and the rest of the atoms are shown in line view. The Au surface is shown as a gold-colored net structure [107, 108, 186]. (Color figure online)

The molecular docking study between the trimeric structure of VP6 protein and palladium ions (Pd(II)) obtained from PdCl3(H2O) at pH 5 is shown in Fig. 10B [108]. The study predicts binding between VP6 protein and Pd(II) ions, which further helps in the nucleation, growth, and stability of palladium nanoclusters. The AutoDock 4.0 tool was used to estimate the metal binding site of VP6 for Pd(II) ions, comprising residues His173, Ser240, and Asp242, and the binding energy (− 2.0 kcal/mol at pH 5). Pd(II) ions were found to interact with charged and polar residues, including Asp, His, Ser, Gln, and Asn, resulting in hydrogen bond formation between Pd(II) ions and the protonated side chains of Ser and Asp residues, and between Pd(II) ions and the nitrogen atom of His173. Different types of interaction forces, such as hydrogen bond formation between Pd(II) ions and polar residues and electrostatic attraction between Pd(II) ions and charged residues in the binding site of VP6 protein, favor the stabilization, nucleation, and growth of Pd nanoparticles.

Brancolini et al. used Brownian dynamics rigid-body docking simulation to understand the interaction between ubiquitin protein, taken as a rigid body, and a bare neutral gold nanoparticle (Au(111) NP) surface, using the protein surface docking method with SDA 6.0, as shown in Fig. 10C [106, 108]. The residues of ubiquitin that make good contact with the Au(111) surface, at distances ≤ 3 Å, are GLY35, PRO37, ARG74, GLY75, and GLY76. The RMSD of the docked complex structure with respect to the initial representative complex structure was found to be 1.31 Å. The total interaction energy of the resulting docked complex was found to be − 29.12 kcal/mol at 300 K.
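
The root-mean-square deviation (RMSD) used to compare a docked structure with a reference can be sketched as follows. The two coordinate sets are hypothetical and are assumed to be already superimposed with atoms in matching order; being the square root of a mean of squared distances, the RMSD is non-negative by construction.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized, pre-aligned coordinate sets (angstrom)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical reference and docked coordinates of two atoms.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

print(rmsd(reference, docked))
print(rmsd(reference, reference))
```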

Synthetic organic molecules-inorganic ligand docking case studies

In our recent work, we performed molecular docking simulations using the AutoDock 4.2.6 tool between different metal ions (Au3+, Ag+, Fe2+) and atoms (Au0, Ag0, Fe0) and polymer moieties (NIPAM, carboxylate ion (COO)) to understand the molecular interaction that binds the nanoparticle to the polymer matrix and results in stable microgel metal hybrids, as shown in Fig. 11 [27, 49, 100, 104, 109, 110, 111].

Fig. 11

Best docking results with minimum binding energy after docking study between polymer moieties (COO, NIPAM) with ions (Ag+, Au3+, and Fe2+) and atoms (Ag0, Au0, and Fe0), respectively [111]. The polymer moieties, ions, and atoms are represented in ball-and-stick, sphere, and sphere views, respectively. The distances between randomly chosen neighboring atoms of polymer moieties and ions are shown as yellow dashed lines labelled in angstrom units (Å) [111]. (Color figure online)

The simulated annealing algorithm was used for all types of dockings. Ions were found to interact with COO more prominently than atoms. The net binding energy was found to be − 9.28, − 5.7, − 1.46 kcal/mol for interaction between COO (taken as a receptor) and ligands, namely Fe2+, Au3+, and Ag+, respectively. The electrostatic potential contributes the most towards the total binding energy among all types of potentials (van der Waals, hydrophobic, solvation, torsional, and electrostatic). Different types of potentials with the net binding energy found in docking studies for different complexes of polymer moieties (COO, NIPAM) with ions (Ag+, Au3+, and Fe2+) and atoms (Ag0, Au0, and Fe0) are demonstrated using column charts in Fig. 12.
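
Why charged species interact more strongly with the carboxylate group than neutral atoms can be illustrated with a bare Coulomb estimate, E = 332.06 q1 q2 / (ε r) in kcal/mol for charges in units of e and distances in Å. This is not the AutoDock electrostatic term: the formal point charges, the 2.5 Å separation, and the constant dielectric of 4 are assumptions, and a point-charge picture alone is not expected to reproduce the reported ranking of the ions.

```python
COULOMB_CONSTANT = 332.06  # kcal*angstrom/(mol*e^2)

def coulomb_energy(q1, q2, distance, dielectric=1.0):
    """Coulomb interaction energy (kcal/mol) between two point charges."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance)

carboxylate_charge = -1.0  # formal charge of COO, assumed to sit on a single point
partners = {"Ag+": 1.0, "Fe2+": 2.0, "Ag0": 0.0}  # formal charges of ion/atom partners

energies = {
    name: coulomb_energy(carboxylate_charge, charge, distance=2.5, dielectric=4.0)
    for name, charge in partners.items()
}
for name, e in energies.items():
    print(f"{name}: {e:.2f} kcal/mol")
```

The neutral atom has no Coulomb attraction at all in this picture, which is consistent with the electrostatic term dominating the binding of the ions.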

Fig. 12

Column charts that represent the contribution from different types of potential, such as electrostatic (EI), van der Waals-hydrophobic (Vhd), and torsional (T), towards the net binding energy (BE) in docking studies between polymer moieties (COO, NIPAM) with metal ions (Ag+, Au3+, and Fe2+) and atoms (Ag0, Au0, and Fe0) [111]. (Color figure online)

Hybrid receptor/ligand

From the above classification, we understand that the interacting molecules can be organic or inorganic. The macromolecule and/or the ligand can also be an organic–inorganic hybrid molecule, such as a metal–organic framework (MOF). MOFs are advanced synthetic hybrid crystalline materials that are synthesized from metal ions and organic linkers joined by coordination bonds. The adsorption property of MOFs can be used to detect and remove organic pollutants, including dyes and drugs, from different environmental water samples. Different case studies are discussed below.

Owing to its adsorption behavior, UiO-66 (a zirconium (Zr) based metal organic framework (MOF)) can be used to remove pollutants, such as the anionic dye Congo red (CR), from water [112]. To understand the adsorption of Congo red by UiO-66, Panda et al. performed a molecular docking study between Congo red dye (CR) and UiO-66 with the AutoDock Vina tool using the Lamarckian genetic algorithm (LGA); the resulting docked complex is shown in Fig. 13 [27, 55, 95]. The binding energy was found to be −12.82 kcal/mol, and the molecular interaction arose from two hydrogen bonds with bond lengths of 2.68 and 2.66 Å and three hydrophobic interactions with distances of 4.89, 5.46, and 5.75 Å.

Fig. 13

Organic–inorganic hybrid MOF and organic dye molecule complex [112]. Complex structure of Congo red dye (CR) (yellow color) with UiO-66 (a Zr (IV) based metal organic framework (MOF)) (gray and red) using ball-and-stick representation after docking study using AutoDock Vina [55, 95]. The active site is shown with a cyan ellipse [112]. (Color figure online)

Panda et al. have demonstrated that ZIF-8 MOF can be used to remove reactive blue-4 (RB4), a toxic reactive triazine dye, from water because ZIF-8 adsorbs RB4, which helps in water treatment [113]. They carried out a docking study between ZIF-8 MOF (receptor) and RB4 (ligand) using AutoDock Vina to understand the adsorption process computationally; the docking result is shown in Fig. 14. The net binding energy was negative (−22.34 kcal/mol), which means that the molecular interaction is an exothermic adsorption process. The favorable binding interactions were due to ten hydrogen bonds, with bond lengths ranging from 1.87 to 2.8 Å, between oxygen atoms (O) of the –SO3 group in RB4 and hydrogen atoms (H) of the –CH3 group in ZIF-8 MOF, and three π–π interactions with distances of 4.78, 5.2, and 5.8 Å.

Fig. 14

Docked complex structure of zeolitic imidazole framework-8 (ZIF-8), a special class of metal organic framework (MOF) with organic molecule reactive blue-4 (RB4) ions using docking study with AutoDock Vina software [55, 95, 113]. ZIF-8 (receptor) and RB4 (ligand) are represented in stick and ball-and-stick models, respectively. H-bonds are represented with green and π–π interactions are shown with red dashed lines in the active site of ZIF-8 [113]. (Color figure online)

Lu et al. used molecular docking with the AutoDock 4 tool to investigate the intermolecular interactions between porous chromium terephthalate, MIL-101(Cr), and the widely used antibiotics 5-nitroimidazoles (5-NDZs) in order to understand the adsorption mechanism [114]. MIL-101(Cr) is an organic–inorganic hybrid metal organic framework (MOF) material constructed from metal ions and organic linkers, and it can be used for the trace analysis of the organic pollutants 5-NDZs in different water samples. Among the five antibiotics, metronidazole (MNZ) is chosen here to show the molecular interactions. The docking study between MNZ (ligand) and MIL-101(Cr) (receptor) was carried out using the Lamarckian genetic algorithm (LGA); the docked complex is shown in Fig. 15, and the binding energy calculated by AutoDock was −6.21 kcal/mol. The net binding energy arises mostly from the electrostatic force of attraction; the other contributions come from hydrogen bonds, coordination bonds, and hydrophobic, π–π, and van der Waals interactions. Together, these interaction potentials account for the extraction property of MIL-101(Cr) MOF.

Fig. 15

Docked complex structure of porous chromium terephthalate, MIL-101(Cr) and metronidazole (MNZ) using AutoDock 4 [114]. The MIL-101(Cr) (receptor) and MNZ (ligand) are viewed in stick and ball-and-stick models, respectively. For ligand, carbon, oxygen, sulfur, nitrogen, and hydrogen atoms are marked in green, red, yellow, blue, and orchid, respectively. For receptor, oxygen atoms and Cr ions are marked in red and orchid, respectively. The π–π interactions between imidazole rings of MIL-101(Cr) and MNZ, and hydrogen bonds are represented by pink and green dashed lines respectively in the binding site of MIL-101(Cr) [114]. (Color figure online)

A highly porous zirconium metal–organic framework (MOF), such as UiO-66 (Universitetet i Oslo), which is composed of [Zr6O4(OH)4] clusters and 1,4-benzodicarboxylic acid struts (University of Liverpool, ChemTube3D), can be used as an adsorbent material. Adsorption of a polycyclic aromatic hydrocarbon (PAH) (recognized as a hazardous and toxic pollutant), such as chrysene (CRY, molecular formula: C18H12) (a product of coal tar), by UiO-66(Zr) helps in wastewater treatment by removing the pollutant [115]. To understand the mechanism and driving forces of the adsorption of CRY onto the UiO-66(Zr) MOF surface, Zango and co-workers performed molecular docking using the AutoDock 4.2 suite with the Lamarckian genetic algorithm (LGA), as shown in Fig. 16. The predicted binding energy was −2.32 kcal/mol, which indicates a stable adsorption owing to favorable interactions on the surface.

Fig. 16

Docked complex structure of UiO-66(Zr) metal organic framework (MOF) and chrysene (CRY), a toxic and hazardous polycyclic aromatic hydrocarbon (PAH) pollutant using AutoDock 4.2 [115]. UiO-66(Zr) MOF (receptor) and CRY (ligand) are shown in stick and ball-and-stick models, respectively. UiO-66(Zr) is marked in white, violet, and red color. Four benzene rings of CRY are marked in blue and light gray color [115]. (Color figure online)

The calculated energy due to the combined van der Waals, hydrogen bonding, and desolvation potentials was −3.1 kcal/mol, and the electrostatic energy was 0.79 kcal/mol. CRY was found to show electrostatic interaction with the Zr4+ ion of the MOF.
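The reported terms can be checked against the net value with simple arithmetic. The sketch below assumes the AutoDock 4 convention in which the estimated binding energy is the sum of the intermolecular terms and a torsional penalty, and it further assumes a zero torsional term for chrysene because it is a rigid fused-ring molecule; the function and variable names are illustrative and are not taken from [115].

```python
def net_binding_energy(vdw_hb_desolv, electrostatic, torsional=0.0):
    """Sum AutoDock 4-style energy terms (all in kcal/mol).

    Assumption: internal and unbound energies cancel, so only the
    intermolecular terms and the torsional penalty remain.
    """
    return vdw_hb_desolv + electrostatic + torsional


# Values reported for CRY docked onto UiO-66(Zr) [115]
vdw_hb_desolv = -3.1   # van der Waals + hydrogen bonding + desolvation
electrostatic = 0.79   # electrostatic term
reported = -2.32       # net binding energy quoted in the text

estimated = net_binding_energy(vdw_hb_desolv, electrostatic)
difference = abs(estimated - reported)

print(f"sum of terms: {estimated:.2f} kcal/mol")
print(f"reported:     {reported:.2f} kcal/mol")
print(f"difference:   {difference:.2f} kcal/mol")
```

Under these assumptions the two terms sum to −2.31 kcal/mol, which agrees with the reported −2.32 kcal/mol to within the rounding of the individual terms.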

To understand the cause of the instant isotopic exchange reaction in silver nanoparticle clusters, Chakraborty et al. carried out a molecular docking study between two [Ag25(DMBT)18] clusters (DMBT stands for 2,4-dimethylbenzenethiol, which acts as a protecting ligand) using AutoDock 4.2 with the Lamarckian genetic algorithm [100, 116]. [Ag25(DMBT)18] was used as both receptor and ligand in the docking process. The binding energy was found to be −23.7 kcal/mol. The docking result with the lowest binding energy is shown in Fig. 17.

Fig. 17

Docked complex structure of two [Ag25(DMBT)18] clusters using AutoDock 4.2 [100, 115]. Complex is shown in ball-and-stick model. Silver (Ag) and sulfur (S) atoms are shown in gray and yellow, respectively. C–H π interactions are viewed in green dotted lines. Hydrogen (H) atoms and benzene rings associated with these interactions are viewed in red and blue, respectively. Other benzene rings not associated with these interactions are shown in green [116]. (Color figure online)

The fluorescent cobalt oxide (CoO) umbelliferone nanoconjugate, which has anti-cancer activity, can be used both as a drug and as a carrier. Ali et al. conducted molecular docking studies with the protein docking program HEX 8.0.0, which uses the Spherical Polar Fourier Correlations technique, to find the binding interactions of the CoO-drug nanoconjugate with two targets separately: the B-DNA dodecamer (also referred to as calf thymus DNA (CT-DNA)), a DNA duplex with the sequence (CGCGAATTCGCG)2 (PDB ID: 1BNA), and the human serum albumin (HSA) protein (PDB ID: 1H9Z) [117,118,119]. The most stable docking result, with the lowest binding energy, of the CoO-drug nanoconjugate against the DNA molecule and the interactions in the binding cavity of the CT-DNA are shown in Fig. 18A, B. The docking study predicted electrostatic, hydrophobic, and hydrogen bonding intermolecular noncovalent interactions between them. Similarly, the most stable bound conformation of the CoO-drug nanoconjugate and HSA protein, with binding site interactions such as hydrophobic, hydrogen bonding, and metal acceptor interactions, is shown in Fig. 19A, B.

Fig. 18

A Molecular docking complex of calf thymus DNA (CT-DNA) (PDB ID: 1BNA) and cobalt oxide (CoO) umbelliferone drug nanoconjugate using HEX 8.0.0 [117, 118]. DNA is represented in cartoon and surface view. The drug nanoconjugate is a mixture of CoO nanoparticle (blue and red sphere view) and umbelliferone drug (green, red, and white sphere view). B Non-covalent interactions of DNA bases with the nanoconjugate depicted by dashed lines [117]. (Color figure online)

Fig. 19

A Docking result of human serum albumin (HSA) with the CoO-umbelliferone drug nanoconjugate by protein docking program, HEX 8.0.0 using Spherical Polar Fourier Correlations technique [117,118,119]. B Binding interactions of noncovalent type between neighboring amino acid residues in the binding pocket of HSA and the drug nanoconjugate shown in dashed lines [117]. (Color figure online)

Inorganic–inorganic

In this case, both the receptor and the ligand in the docking study are inorganic. The interaction study between any two chemical molecules that do not have any carbon-hydrogen bond comes under this category. To the best of our knowledge, no docking study with both an inorganic receptor and an inorganic ligand has been carried out to date.

Challenges in molecular docking

There are many limitations and challenges in docking techniques [15]. A predicted docking result may be inaccurate and may not match the result found by an experimental approach.

(i) Recognizing the different molecular features that contribute to an interaction can be complex, difficult to understand, and time-consuming to simulate on a computer. Docking techniques try to incorporate this additional complexity at each step.

(ii) The number of conformational degrees of freedom of the molecules makes it difficult and time-consuming to pose a ligand in the binding pocket of a macromolecule. An effective scoring function helps in this respect.

(iii) Low-resolution crystallographic structures, and molecules whose structures are unknown, make it more difficult to find the correct binding pockets of receptors.

(iv) The flexibility of molecules and the changes in their geometry that occur during binding also complicate the search.

(v) Most docking tools do not always support hydrated docking, so the involvement of water molecules in a macromolecule-ligand interaction is difficult to predict. Docking workflows usually remove water molecules from the binding pocket of the receptor before docking. This preprocessing of the receptor is not always appropriate, for example when the water molecules are tightly bound or functionally active in the binding site. In such cases hydrated docking is essential.

(vi) Existing force fields do not support all types of atoms. Finding a suitable force field and the other required parameters is a challenging task in docking.

(vii) Most docking software tools accept only biomolecules, such as DNA, RNA, proteins, and enzymes, as receptors. They do not recognize synthetic organic, synthetic organic–inorganic hybrid, or inorganic molecules as receptors.

(viii) As ligands, docking tools accept a few metal atoms but not all. Similarly, a few docking tools, including AutoDock, Hex, and PatchDock, recognize nanoparticles as ligands, but many docking tools do not recognize all types of nanoparticles, such as gold (Au), silver (Ag), and iron (Fe) [100, 105, 120]. Moreover, a large ligand with numerous atoms, such as a metal nanoparticle with hundreds of atoms or more, cannot be processed by a docking algorithm.

(ix) Ions are supported by only a few docking tools. When ions are taken as ligands, the software may neutralize their charge automatically, so the user must check the charges manually and carefully.

(x) The number of atoms and torsions in a molecule is also restricted in many docking tools.
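Two of the preparation issues above, water removal in item (v) and silently neutralized ion charges in item (ix), can be checked with a few lines of scripting before a docking run. The following is a minimal sketch, assuming fixed-column PDB/PDBQT records (residue name in columns 18–20, residue number in columns 23–26, partial charge in columns 71–76); the helper `pdbqt_line`, the function names, and the sample records are hypothetical and only serve the illustration.

```python
def pdbqt_line(record, serial, name, resname, resseq, charge, atom_type):
    """Build one fixed-column PDBQT-style record (demo helper, dummy coordinates)."""
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} A{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}    "
            f"{charge:>6.3f} {atom_type:<2}")


def strip_waters(lines, keep_resseq=()):
    """Drop water records (HOH/WAT) except residues listed in keep_resseq."""
    kept = []
    for line in lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        is_water = is_atom and line[17:20].strip() in ("HOH", "WAT")
        # Tightly bound or functionally active waters can be retained by number.
        if is_water and int(line[22:26]) not in keep_resseq:
            continue
        kept.append(line)
    return kept


def net_charge(lines):
    """Sum the partial charges stored in columns 71-76 of each atom record."""
    return sum(float(line[70:76]) for line in lines
               if line.startswith(("ATOM", "HETATM")))


# Hypothetical receptor fragment with one crystallographic water (residue 2)
receptor = [
    pdbqt_line("ATOM", 1, "N", "ALA", 1, -0.35, "N"),
    pdbqt_line("HETATM", 2, "O", "HOH", 2, -0.41, "OA"),
]
dry = strip_waters(receptor)                      # water removed
hydrated = strip_waters(receptor, keep_resseq={2})  # water kept on purpose

# Hypothetical Fe2+ ligand whose charge was reset to zero during preparation
ion = [pdbqt_line("HETATM", 1, "FE", "FE", 1, 0.0, "Fe")]
expected_charge = 2.0
charge_ok = abs(net_charge(ion) - expected_charge) < 0.01
if not charge_ok:
    print(f"warning: net charge {net_charge(ion):+.3f}, expected {expected_charge:+.3f}")
```

Keeping selected waters by residue number reflects the caveat in item (v): blanket removal is a default, not a rule, and the choice of which waters to retain remains with the user.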

Concluding remarks

From the above study, we conclude that, although numerous docking tools and web servers are available, each is specific to the types of molecule it accepts as receptor or ligand input. That is why interaction studies of biomolecular receptors, such as protein, DNA, and RNA, with organic, inorganic, or hybrid (natural or synthetic organic–inorganic mixture) ligands are the most commonly seen. Considering the challenges and limitations, the best docking algorithm would probably be a hybrid of many algorithms, offering a novel search method and scoring function, a force field for all types of atoms, and no limit on the number of atoms and rotatable bonds (torsions) in a molecule. We have shown here some case studies in synthetic organic–inorganic, metal–organic hybrid-organic systems. Inorganic–inorganic docking studies are still missing and will hopefully appear in the future with better force fields and the required parameters.