Bringing a new drug into the market is a costly process in terms of money, manpower, and time. Drug discovery and development takes an average of 10–15 years with an approximate cost of US$800 million (DiMasi et al. 2003; Song et al. 2009) to US$1.8 billion (Paul et al. 2010). Innovations in combinatorial chemistry that led to the increase of compound databases covering large chemical spaces aided in the expansion of drug discovery and the development of high-throughput screening (HTS) (Jhoti et al. 2013; Lavecchia and Di Giovanni 2013). Despite this, the number of new molecular entities (NMEs) successfully launched into the market continued to decrease over the last several years (Paul et al. 2010). To this end, employment of computer-aided drug discovery (CADD) techniques by top pharmaceutical companies and other research groups became essential for the preliminary stage of drug discovery to expedite the drug development process in a more cost-efficient way and to minimize failures in the final stage. Recent successes involving rational drug design have been reported elsewhere (Shoichet et al. 2002; Reynolds 2014; Rodrigues and Schneider 2014). While HTS continues to be a big part of the drug discovery process in the pharmaceutical industry due to its high success rate, lack of primary understanding of the molecular mechanism behind the activity of the identified hits can hamper the search of promising candidates (Macarron et al. 2011; Lionta et al. 2014). The use of rational drug design, as applied in CADD, provides a knowledge-driven approach that can yield valuable information about the interaction pattern between protein and ligand (complex), as well as the binding affinity. Furthermore, the availability of supercomputers, parallel processing, and advanced softwares have greatly facilitated the rate of lead identification in pharmaceutical research.

The current scenario of the drug discovery process involves several disciplines such as chemical and structural biology, computational chemistry, organic synthesis, and pharmacology. Accordingly, it is comprised of several stages: (a) Target identification involves the discovery and isolation of individual targets to investigate their functions and association with a specific disease (Anderson 2003). (b) Target validation is the stage where the drug target is linked to the disease of interest, as well as their capacity to regulate biological functions in the body after binding to a partner molecule. Numerous studies are performed to ascertain that the target macromolecule is linked to the diseased state (Chen and Chen 2008). (c) Lead identification entails the discovery of a synthetic chemical that shows a degree of potency and specificity against a biological target and is assumed to have the makings of a drug that can cure the intended disease (Kalyaanamoorthy and Chen 2011). (d) Lead optimization covers the improvement of potency and other significant properties through iterative cycles of evaluation of the lead compound(s) and their analogs. Thus, both in vitro and in vivo experiments are conducted to prioritize and select candidates with optimum potential for development as a safe and efficient drug. Moreover, structure–activity relationships (SARs) are developed to determine pertinent pharmacokinetic and pharmacodynamic properties that can be applied to analogs that will be synthesized for evaluation (Andricopulo et al. 2009). (e) Preclinical stage involves drug synthesis and formulation research, in vivo animal studies for potency and toxicity, and characterization of mechanistic toxicity (Cavagnaro 2002; Silverman 2004). (f) Clinical trials include three phases that investigate safety, adverse side-effects, dosage, efficacy, and pharmacokinetic and pharmacological properties of the candidate drug on human volunteers (Silverman 2004).

In general, modeling approaches are categorized into structure-based and ligand-based methods (Fig. 1). The structure-based approach consists of using the 3D structure of the target (enzyme/receptor) for the generation or screening of potential ligands (modulators), followed by synthesis, biological testing, and optimization. In contrast, ligand-based approach consists of subjecting a collection of molecules with diverse structures and known potency to computational modeling methods to develop theoretical predictive models. These models are then used for structural optimization to enhance potency and for identification of new chemical entities through virtual screening of a large chemical database. Over the decades, these two types of CADD approaches continued to improve and evolve separately. However, combining different structure-based and ligand-based design strategies in the drug discovery effort have been established to be more effective than any single approach since both methods are able to complement their strengths and weaknesses (Prathipati et al. 2007; Grinter and Zou 2014; Lionta et al. 2014). Here, we summarize the recent advances in computational drug design that could provide insights to modern drug discovery.

Fig. 1
figure 1

Representative workflow for computer-aided drug design

Chemical biology in target identification and validation

Target identification can be performed either at the start (target-based or reverse chemical genetics) or at the end (phenotype-based or forward chemical genetics) of a biological screening. Direct biochemical, genetic interaction, and computational inference techniques are used to verify the protein target involved in the biological pathway being studied. Direct biochemical approach pinpoints the target of a small molecule via biochemical affinity purification (Burdine and Kodadek 2004). Enzyme assays and other biochemical tests can provide beneficial information for lead optimization when target structure information is available. In addition, numerous DNA and RNA analysis methods permit target identification using genetic or genomic methods to measure or modify the presence and function of protein targets in a controlled model (Schenone et al. 2013). Computational inference methods are also able to predict the target of identified inhibitors using structure-based and profiling (ligand-based) methods. In particular, these methods are often used for drug repositioning to explain the off-target interactions in drug discovery research. The methods mentioned above are essential in collecting information necessary to perform CADD and have been extensively reviewed in a recent paper by Schenone (Schenone et al. 2013).

Structure-based drug design (SBDD)

In SBDD, the knowledge acquired from the binding site of a 3D macromolecule structure is used to design and evaluate ligands based on their predicted interactions with the protein binding site (Lavecchia and Di Giovanni 2013; Grinter and Zou 2014). Thus, identification of a valid drug target and the acquisition of its structural information are the first vital steps in SBDD. Researches from structural and computational biology aided in the generation of protein structures with the use of X-ray crystallography, nuclear magnetic resonance (NMR), cryo-electron microscopy (EM), homology modeling, and molecular dynamic (MD) simulations (Anderson 2003; Kalyaanamoorthy and Chen 2011). SBDD can be divided into two categories (Kalyaanamoorthy and Chen 2011; Lionta et al. 2014): the de novo approach and the virtual screening approach. De novo drug design exploits information from the 3D receptor to find small fragments that match well with the binding site. These fragments should be linked based on connection rules to ensure synthetic accessibility, providing a structurally novel ligand that can be synthesized for further screening (Kutchukian and Shakhnovich 2010; Rodrigues and Schneider 2014). On the other hand, virtual screening (VS) uses available small molecule libraries (Table 1) to identify compounds with specific bioactivity to act as replacements for existing ligands of target biomolecules or to discover compounds for unexplored known targets with available structural information (Andricopulo et al. 2009; Lavecchia and Di Giovanni 2013).

Table 1 Various software packages for computational drug design

Binding site prediction or identification

The ideal binding site is a concave region containing several chemical functionalities that interact with a ligand to achieve the desired result (activation, modulation, or inhibition) (Anderson 2003; Kalyaanamoorthy and Chen 2011). Proteins co-crystallized with their substrates or known inhibitors, as well as mutation studies identifying key residues for interaction, provide beneficial knowledge in SBDD. However, when no information about the binding site is available, additional analyses are needed in order to perform structure-based rational drug discovery. Currently, several in silico approaches have been reported in a number of papers and are available for the recognition of binding regions in proteins (Laurie and Jackson 2006; Zhang et al. 2011). The binding site of a small molecule compound can be predicted using available tools such as PASS (Brady and Stouten 2000), Q-SiteFinder (Laurie and Jackson 2005), CASTp (Dundas et al. 2006), LIGSITEcsc (Huang and Schroeder 2006), SiteMap (Halgren 2009), FPocket (Le Guilloux et al. 2009), ConCavity (Capra et al. 2009), MED-SuMo (Doppelt-Azeroual et al. 2010), MDPocket (Schmidtke et al. 2011), FTMAP (Schmidtke et al. 2011), POOL (Somarowthu and Ondrechen 2012), and many others. Alternatively, specific approaches that can identify peptide binding sites include PepSite (Trabuco et al. 2012), PeptiMap (Lavi et al. 2013), and PEP-SiteFinder (Saladin et al. 2014). Lastly, difficulties in detecting allosteric sites can be alleviated by recently developed open-access web servers such as SPACER (Goncearenco et al. 2013), MCPath (Kaya et al. 2013), Allosite (Huang et al. 2013), and PARS (Panjkovich and Daura 2014). A systematic assessment of a number of available web servers and stand-alone protein–ligand binding site prediction programs was published previously (Chen et al. 2011). The report detailed that while these methods can be valuable in finding putative binding sites, the predictive quality of the algorithms used may be dependent on several factors, including template similarity and the size of the pocket (Chen et al. 2011).

Docking and scoring

Molecular docking is one of the most well-known SBDD methods that predicts possible binding modes of a compound in a particular target binding site and estimates affinity based on its conformation and complementarity with the features found in the binding pocket (Cheng et al. 2012; Hung and Chen 2014). Accurately scoring the binding and ranking of docked compounds is a crucial step in virtual screening. Frequently used scoring functions, as reported in several papers (Huang et al. 2010; Cheng et al. 2012; Lavecchia and Di Giovanni 2013; Liu and Wang 2015), are divided into three categories: (a) Force field-based functions calculate binding affinity derived from physical atomic contacts in the target-ligand complex. Both solvation and entropy terms can also be evaluated. (b) Empirical-based functions employ simpler energy terms, such as that for hydrogen bonding and hydrophobic interactions, fitted to experimental binding affinity data. (c) Knowledge-based functions derive binding energy from statistical analyses taken from a training set containing protein–ligand atom pair frequencies. Numerous studies have compared different docking programs and their score functions to evaluate their ability to correctly score and rank ligands that bind to a protein target. Even though current scoring functions have greatly advanced, more development is necessary to increase accuracy in hit ranking. Performance of scoring functions varies depending on the purpose (scoring or ranking) and on the protein families and ligand series being considered (Wang et al. 2003; Warren et al. 2006; Cheng et al. 2009; Cross et al. 2009; Xu et al. 2015). Combining the results from two or more scoring functions and forming a consensus has proved to be more beneficial in scoring and ranking compounds. Consensus scoring has superior accuracy, thereby improving the probability of discovering true potential hits (Cheng et al. 2009; Houston and Walkinshaw 2013).

Flexibility and solvent effects

Difficulties experienced in scoring and ranking compounds mainly arise from restraints and approximations applied to conformational flexibility and solvation effects. The dynamic nature of proteins and ligands in a physiological setting allows them to adopt a considerable number of different conformations. However, incorporating molecular flexibility in virtual screening of large chemical libraries is not feasible due to high computational costs (Anderson 2003; Feixas et al. 2014). To solve this, different strategies that can partially account for protein flexibility in docking calculations are being used in various software packages (Table 1): (a) ‘Soft scoring’ or the softening of van der Waals potentials reduces steric energy penalties to permit some degree of overlap between ligand and target atoms (Jiang and Kim 1991; Lavecchia and Di Giovanni 2013). (b) Ensemble-based approach exploits multiple receptor structures that are experimentally determined (X-ray crystallography or NMR) or computationally produced using molecular dynamics [relaxed complex scheme (RCS) (Amaro et al. 2008; Amaro and Li 2010)]. Instead of using a single rigid structure, carefully chosen representative structures accounting for conformational changes in protein (backbone and/or side chain) is capable of yielding a higher enrichment of potential compounds (Cheng et al. 2008; Totrov and Abagyan 2008; Amaro and Li 2010; Korb et al. 2012; Tuffery and Derreumaux 2012). A few of the algorithms that use this approach include BP-Dock (Bolia et al. 2014), MedusaDock (Ding and Dokholyan 2013), and RosettaBackrub (Lauck et al. 2010). (c) Induced fit approach is based on the principle of induced fit effects that take place upon binding of a ligand to a receptor. As introduced by Koshland (Koshland 1958), the mechanism of induced fit happens after the initial formation of receptor-ligand complex where the ligand ‘induces’ conformational change in the protein, prompting a tighter binding (Sotriffer 2011; Feixas et al. 2014). (d) Molecular simulation based on force fields depicting intra- and intermolecular interactions between receptor and ligand is also possible. The downside is that protein flexibility is often constrained to the ligand binding site for computational efficiency, leading to the occurrence of energy barriers in MD calculations. As a result, unrealistic conformations are procured and computationally demanding simulations will be necessary to obtain an accurate complex structure (May et al. 2008). The application of side-chain flexibility, which is critical in receptor flexibility, permits a certain measure of freedom in the binding region of the target (Najmanovich et al. 2000; Bostrom et al. 2006; Waszkowycz et al. 2011). Even so, backbone displacement should also be considered since it can strengthen the accuracy of side chain orientations (Bolia et al. 2014). Existing methods that incorporate partial protein flexibility include Glide, RosettaLigand (Davis and Baker 2009), FLIPDock (Zhao and Sanner 2007), and FITTED (Corbeil et al. 2007, 2008; Corbeil and Moitessier 2009; Englebienne and Moitessier 2009a, b; Pottel et al. 2014; Therrien et al. 2014).

Solvation also plays a crucial role in ligand binding stability by mediating the interaction between receptor and ligand. Entropic and enthalpic contributions of the solvent via displacement of structural water have also been noted to enhance binding affinity (Abel et al. 2008; Yang et al. 2013). Inclusion of explicit solvent assists in accurately predicting the binding mode of ligands in docking studies, especially in complexes where water is required for biomolecular recognition (Lie et al. 2011; Liu et al. 2013). Still, preliminary research regarding the number and location of relevant structural and bridging water molecules is necessary to avoid misleading results (Roberts and Mancera 2008; Thilagavathi and Mancera 2010). Recently developed methods that take into account solvation contributions include WaterMap (Abel et al. 2008; Yang et al. 2013) and AcquaAlta (Rossato et al. 2011).

Ligand-based drug design (LBDD)

In cases where 3D structure of the target protein is lacking, information taken from a set of ligands active against a relevant target (receptor or enzyme) can be used to identify significant structural and physicochemical properties (molecular descriptors) responsible for the observed biological activity. Here, there is an assumption that structurally similar compounds display similar biological response and interaction with the target (Prathipati et al. 2007). The compound set should encompass a wide range of concentration (at least 4 orders of magnitude) (Melo et al. 2014) to generate a reliable ligand-based screening model. Common ligand-based design techniques are quantitative structure–activity relationships (QSARs) and pharmacophore-based methods.

Quantitative structure–activity relationship (QSAR)

QSAR studies are based on the premise that changes in bioactivity are associated with structural and molecular variations in a set of compounds (Lavecchia and Di Giovanni 2013; Melo et al. 2014). A statistical model is generated from this correlation to develop and mathematically predict the biological property of novel compounds (Melo et al. 2014). Several restrictions are required to generate a reliable QSAR model: (a) the bioactivity data should be of sufficient number (minimum of 20 compounds with activity) and acquired from a common experimental protocol such that the potency values are comparable, (b) proper selection of compounds for the training and test sets, (c) molecular descriptors for the ligands should have no autocorrelation to avoid over-fitting, (d) the model should be validated using internal and/or external validation to determine its applicability and predictivity (Cherkasov et al. 2014; Melo et al. 2014). Table 2 lists some of the known programs or web servers commonly used for QSAR studies.

Table 2 Programs for molecular descriptor calculation or QSAR model analysis

Comparative molecular field analysis (CoMFA) (Cramer et al. 1988), which was established more than three decades ago, is still one of the most widely used 3D-QSAR method. More recent 3D-QSAR strategies include Topomer CoMFA (Cramer 2003), spectral structure activity relationship (S-SAR) (Putz and Lacrama 2007), adaptation of the fields for molecular comparison (AFMoC) (Gohlke and Klebe 2002), and comparative residue interaction analysis (CoRIA) (Datar et al. 2006). Despite its notable successes in the drug discovery field, 3D-QSAR still has numerous shortcomings that can be solved by more advanced multidimensional QSAR strategies in the form of 4D, 5D, and 6D-QSAR. 4D-QSAR was developed to address ligand conformation and orientation in the target binding site, while 5D-QSAR incorporates issues like receptor flexibility and induced fit effects. Finally, 6D-QSAR takes note of the solvation effects to include its critical role in receptor-ligand interaction (Damale et al. 2014). Advances in computational power and software performance has also been applied in improving QSAR model development and validation through Discovery Bus (Cartmell et al. 2005) and AutoQSAR (Davis and Wood 2013). In both methods, hundreds of highly predictive statistical models can be objectively discovered, updated, and validated by continuously integrating new machine learning agents and descriptors into the system.

Pharmacophore modeling

Pharmacophore screening aims to identify compounds containing different scaffolds, but with a similar 3D arrangement of key interacting functional groups (Vuorinen and Schuster 2015). Binding site information can be incorporated into the pharmacophore model by exploiting the bioactive conformation of candidate compounds. Pharmacophore modeling is also often performed in the molecular alignment stage of QSAR modeling studies (Melo et al. 2014). Several commonly used programs for automatic pharmacophore generation include Discovery Studio, PHASE (Dixon et al. 2006a, b), LigandScout (Wolber and Langer 2005), and MOE. These softwares and other algorithms have already been extensively reviewed (Vuorinen and Schuster 2015). Accordingly, generating a model with well-balanced sensitivity and specificity is important to reduce false negative and false positive results, respectively. Spatial constraints can be employed in areas occupied by inactive compounds and refined to avoid making the model too restrictive. Furthermore, features not consistently observed in active compounds should be made optional or removed from the model. After model refinement, validation studies must be performed to determine the sensitivity and specificity of a model against an external test set (Vuorinen and Schuster 2015).

In the absence of both receptor 3D information and a set of active ligands, it is possible to create sequence-derived 3D pharmacophore models. From the concept similar receptors can bind with similar ligands (Klabunde et al. 2009; Vuorinen and Schuster 2015), Pharma3D can use homology models and 3D crystal structures to detect common sequence motifs for ligand biomolecular recognition in protein families and create a single-feature pharmacophore database. This approach has been successfully applied in virtual screening for GPCR (family A). Sequences displaying the same motifs can theoretically distinguish the same ligand functionality at an analogous spatial location. With this, the sequence motifs linked to specific single-feature pharmacophores are identified and used to generate the 3D pharmacophore model. In spite of its limitations, this is an attractive technique for drug targets that have limited or no receptor and ligand information available (Klabunde et al. 2009).

Compound selection

Generally, compound selection (or ‘cherry picking’) is done before proceeding to in vitro and in vivo assessment of lead potency. As mentioned in the Docking and Scoring section, one of the serious limitations of the current scoring functions include incorrectly ranking compounds, which lead to difficulty in identifying true hits. Apart from re-scoring poses and creating a consensus list of hits (Lionta et al. 2014), an alternative strategy is to assess the interaction of a ligand in the binding pocket and identify compounds displaying similar interaction patterns (Waszkowycz et al. 2011). The Automatic analysis of Poses using Self-Organizing Map (AuPosSOM) (Bouvier et al. 2010) can be employed for pose clustering and ranking of virtual screening hits. Their strategy is based on the assumption that specific contacts between an active compound and its target are necessary to display the desired activity. In addition, clustering is also often performed to search for common scaffolds among the hit compounds and to select a representative compound per cluster. Evaluation of structurally diverse compounds is a more cost- and time-efficient manner to investigate a large chemical space (Vuorinen and Schuster 2015).


Target specificity is a vital criterion in the search of efficient drugs. Nevertheless, frequent occurrence of false positives due to aggregation, ligand promiscuity, and compound reactivity is still observed during the experimental evaluation of lead candidates, impeding drug discovery and development. The use of surfactants in screening assays minimizes compound aggregation (Ryan et al. 2003; Feng and Shoichet 2006), while reactive compounds are identified by using reactive group filters to improve hit list quality (Roche et al. 2002). Additionally, pan-assay interference compounds (PAINS) filter (Baell and Holloway 2010; Erlanson 2015) can take out compounds that are frequent hitters in various HTS experiments, and thus cannot be used as specific inhibitors. However, the use of PAINS should be considered carefully since this filter was developed based on a limited number of HTS data and may not be applicable to other screening experiments (Vuorinen and Schuster 2015).

Absorption, distribution, metabolism, excretion, and toxicity (ADMET)

High attrition rates due to poor pharmacokinetic profiles produced the need to determine the ADMET properties of leads in the early stages of drug screening. However, experimental evaluation of pharmacokinetic properties of millions of compounds is not a viable option in terms of money and time. Thus, in order to quickly assess the drug-likeness of a lead compound prior to extensive experimental testing, virtual screening can be employed to filter hits and eliminate compounds with undesirable qualities (Bajorath 2002). Similar to QSAR, in silico ADMET filters are derived from chemical or molecular descriptors, and are used to predict drug-like characteristics of compounds. The simplest and most well-known models include Lipinski Rule of Five (Lipinski et al. 2001), Rule of Three for fragments (Congreve et al. 2003), and Veber rules (Veber et al. 2002). Publicly available web servers like ChemBioServer (Athanasiadis et al. 2012) and Free ADMET Filtering-Drugs2 (FAF-Drugs2) (Lagorce et al. 2008, 2011) can be used to filter a large compound database or a list of potential leads. ChemBioServer has the capability to display compounds and graph molecular properties, filter compounds based on different chemical qualities, steric clashes, and toxicity, perform substructure search, cluster compounds, and recommend a representative for each group (Athanasiadis et al. 2012). Alternatively, FAF-Drugs2 features various pre-defined filters that the user can choose from, like the ones mentioned above, along with others such as central nervous system (CNS) Filter (Jeffrey and Summerfield 2010) and reactive group filter. In addition to these, pharmacophore models generated from inhibitors that cause toxicity can also be used to identify compounds with unfavourable moieties (Ananthula et al. 2008). To address the issue of drug metabolism, reactivity models such as those implemented in SMARTCyp are helpful. SMARTCyp is a free web service and downloadable program that predicts sites in 2D compound structures that are likely to undergo Phase I CYP450-mediated metabolism. It calculates the reactivity of ligand fragments using quantum chemical computations and the accessibility of atoms in the molecule to determine possible sites of metabolism (Rydberg et al. 2010). Alternatively, MetaSite (Cruciani et al. 2005) also makes use of a similar algorithm to identify potential metabolic reactivity sites, but with 3D configuration of the compound as the query input. Other ADMET filters and tools are listed in Table 3.

Table 3 Programs for prediction of ADMET properties

In the application of these in silico ADMET models, we should keep in mind that these tools are more helpful in the qualitative analysis of hits or compound sets rather than accurately predicting the quantitative values. These methods are beneficial in the prioritization of an identified class of compounds for in vitro or in vivo assessment or evaluation of a particular descriptor and SAR (Gleeson and Montanari 2012).

Applications of computational drug design

Several CADD studies have been reported in the past years. Here, we briefly illustrate selected studies that apply virtual screening tools in drug discovery.

In 2014, a study by Gao (2014) highlighted the structure-based rational drug design against Tip60 histone acetyltransferase, an attractive target for cancer drug discovery. In their study, the 3D structure of the human Tip60 acetyltransferase domain was retrieved from PDB for modeling experiments. However, since several key residues were missing in the structure, homology modeling was employed using the human Tip60 sequence for the target and the incomplete human Tip60 crystal structure as the template. They used Alpha Site Finder to find the binding site into which Pentamidine (PNT, known inhibitor) and acetyl-CoA (natural substrate) were later docked. Subsequently, PNT derivatives were computationally generated and docked to the Tip60 model to determine their binding affinities. MD simulations were also implemented to account for explicit water molecules and flexibility. Finally, they identified and experimentally validated TH1834 as a potential lead for the inhibition of Tip60 activity. As for studies focused on ligand-based techniques, our laboratory performed 3D-QSAR analyses of heterocyclic quinones to investigate their inhibitory activity on vascular smooth muscle cell (SMC) proliferation (Ryu et al. 2008) and their cytostatic activity (Lee et al. 2009). In both studies, we used genetic algorithm with linear assignment of hypermolecular alignment of database (GALAHAD) to generate and refine the pharmacophore model for molecule alignment, and CoMFA and CoMSIA to relate the molecular properties of the compounds with their observed activities. In the first study, our group determined the inhibitory activity of known heterocyclic quinone inhibitors for SMC proliferation and utilized 3D-QSAR in order to obtain and study their 3D molecular contour maps. The information acquired from this research can then be employed for the design of more potent SMC proliferation inhibitors (Ryu et al. 2008). For the second study, we examined the cytostatic activity of heterocyclic quinones by utilizing semi-empirical calculations and the resulting LUMO energy to optimize the structures and compute the potential for the one-electron reduction of quinones, respectively. Our study provided reasonable correlation between the cytotoxic activity of the compounds with their calculated reduction potential, yielding 3D-QSAR models and contour maps that can be used to design compounds with reduced cytotoxic activity (Lee et al. 2009).

While both SBDD and LBDD were successfully utilized individually in the studies mentioned above, relying solely on one approach greatly limits the probability of finding a plausible lead. Integration of these two methods in a drug discovery study can provide better and more extensive information in the modeling of innovative drug candidates against various diseases. This can be observed in a recent computational study involving the transient receptor potential vanilloid type 1 (TRPV1), wherein the combination of homology modeling, pharmacophore filtering, and molecular docking identified a number of potential antagonists. Initially, they generated a pharmacophore model from known TRPV1 antagonists and used it to filter the NCI database before performing docking experiments against the hTRPV1 homology model. From the docking results, they selected the top-scoring compounds for further experimental testing, yielding novel antagonists for hTRPV1 (Feng et al. 2015). In addition to assisting in the selection of potential leads for a given target, computational tools can also provide viable interaction hypotheses for experimentally validated inhibitors of a target disease. As an example, our group conducted molecular modeling studies of potent DNA methyltransferase (DNMT) inhibitors, SGI-1027 and CBC12, to propose the binding modes of these compounds and give a possible explanation for their observed activity in vitro. Remarkably, the binding scores obtained for SGI-1027 were in excellent agreement with the published experimental results, validating the docking models. Moreover, the docking result of CBC12 corroborates the proposed inhibitory mechanism for DNMTs which suggests the use of “long” scaffolds in the design of DNMT inhibitors (Yoo et al. 2013).

Integration of drug discovery tools

In spite of the apparent advantages of virtual screening, it must be noted that it is not without its own pitfalls (Scior et al. 2012; Kar and Roy 2013). Assimilating both SBDD and LBDD may be necessary to satisfy all the practical requirements in identifying a promising lead. Moreover, combining virtual screening and HTS tools can be used to form an efficient drug discovery workflow to overcome the limitations presented by either approach. Different integration workflows (Polgar and Keseru 2011; Tanrikulu et al. 2013) include: (a) Parallel integration consists of either simultaneous use of multiple virtual screening protocols before HTS and experimental validation or employing both virtual screening and HTS protocols in parallel. This method can provide enriched hit rates since it was often observed that different hits (from a different chemical space) are obtained from different virtual screening protocols (Clark et al. 2004; Pirard et al. 2005; Tomori et al. 2012), as well as from HTS (Doman et al. 2002; Polgar et al. 2005). However, this approach may not be appropriate for some studies as the list of compounds obtained through this method is often large in number. (b) Iterative or Sequential integration consists of combining computational and experimental tools to continually improve the selectivity of the screening workflow. Sequential virtual screening methods use successive filters in the research process to shrink the number of compounds before experimental evaluation (Prathipati et al. 2007; Drwal and Griffith 2013; Kumar and Zhang 2015). Here, the last step before biological testing of a compound is mostly done by visual inspection (includes ligand binding and conformation in the active site or the shape of the pharmacophore) and compound selection (Kumar and Zhang 2015). Another way is subjecting virtual hits to experimental validation, after which the information acquired from in vitro screening will be used to optimize the in silico model (Hofmann et al. 2008; Zander et al. 2010). Theoretically, this will produce more potent hits against the chosen target. (c) Focused integration uses computational methods as a pre-filtering technique to eliminate incompatible and unfavorable molecules and create a focused library for experimental screening (Kiss et al. 2008). Several other methods can be applied upon integration of virtual screening tools, such as interaction-based, pharmacophore-based, and similarity-based approaches (Wilson and Lill 2011). Certainly, the incorporation of computational drug design methods in any stage of the drug discovery process allows great information evolution that can lead to better and more desirable drug candidates.

The discovery and development of LY-517717 (Eli Lilly) for the prevention of venous thromboembolism in hip or knee surgery patients is a useful example of drug discovery integration, combining in silico de novo drug design, structure-based lead optimization, iterative synthesis, and assay studies (Jones et al. 2001; Liebeschuetz et al. 2002; Devabhakthuni et al. 2015). Benzamidine analogs were identified as potent factor Xa inhibitors and subsequent optimization of this compound class provided essential structure-based information. Similar to synthetic combinatorial chemistry, a series of factor Xa focused libraries were generated using de novo method in the PRO_SELECT program (Liebeschuetz et al. 2002). Based on the identified critical binding site interactions, the benzamidine analogs were optimized by changing and adding substituents while measuring the fit to the protein active site. In the end, the benzamidine moiety was replaced with an indole group to improve pharmacokinetic properties, yielding LY-517717 with a Ki value of 5 nM. Phase II clinical trial studies for this compound has already finished (Agnelli et al. 2007), and while continued evaluation is necessary, LY-517717 remains a promising candidate. In another study, parallel integration of virtual and high-throughput screening was applied in the discovery of potential inhibitors against the HIV-1 Nef protein. Computational studies (i.e. drug-like filtering, docking, and pharmacophore screening) were performed in conjunction with high-throughput screening assays for the Diversity compound library. In both methods, PubChem CID 308963 was identified as the most promising inhibitor (Betzi et al. 2007).


Computer aided drug design is a powerful tool in the search of promising drug candidates, particularly when used in tandem with current chemical biology screening techniques. Despite the fact that CADD makes use of several restrictions and approximations, this knowledge-driven approach has become an essential part in the drug design process due to its ability to fast-track drug discovery by utilizing existing knowledge and theories on receptor-ligand interactions, energy and structural optimization, and synthesis. Nonetheless, there is still room for further improvements, especially in the case of scoring functions, incorporating molecular flexibility and solvent effects, targeting receptors with little to no structural information, and increasing computational efficiency. To enhance current in silico tools, continuous developments in the field of chemical and structural biology, bioinformatics, and computational technology is necessary. With this, the numerous shortcomings of virtual drug discovery methods can be alleviated and the full potential of computational drug design can be achieved.