The discovery of safer and more effective new drugs remains critical despite substantial progress in medicinal chemistry and pharmacology [1,2,3]. New drug development is a lengthy and costly endeavor. According to current estimates, developing a new drug and introducing it into practice costs on average 2.8 billion US dollars and takes 8 – 10 years [4]. In Russia, development costs for a new drug are estimated at 150 – 200 million Russian rubles according to competitions of Minpromtorg under the FTP Pharma-2020 [5].

The risk of failure must be minimized considering the high cost and lengthy development of a new drug. Computational methods represent an effective approach to optimizing new drug design because the most promising candidates can be identified in early development stages and those with unsatisfactory pharmacological characteristics and toxicity can be deselected. This reduces the probability of obtaining negative outcomes in clinical trials.

Computer drug design takes two main approaches. One is based on the target structure; the other, on the ligand structure [6, 7]. In the first instance, 3D structural data for the macromolecular target (preferably in a complex with a ligand) are needed in order to analyze the interaction of the studied organic compounds with the target. These data are obtained using X-ray crystal structure analysis, NMR, or molecular modeling. Docking of potential ligands into the binding site is simulated, a scoring function estimating the binding energy is calculated, and the most promising compounds are selected for experimental testing. In the second instance, a so-called training set containing information on the structures and biological activities of previously studied compounds is needed in order to construct a structure-activity relationship model. Quantitative models (QSAR) can be built if quantitative activity data are available. Classification models (SAR) are constructed if only qualitative activity data (active/inactive) are known. Finally, the constructed (Q)SAR models are used to predict the activities of new compounds.

Similarity analysis of chemical structures is the simplest ligand-based method [8,9,10]. It is thought that “similar molecules exert similar pharmacological activities” [11]. However, Kubinyi notes that universal methods for similarity analysis are lacking. This sometimes leads to the paradoxical conclusion that “chemically similar molecules exhibit different pharmacological activities” (actual examples were published [11]). A retrospective review of high-throughput screening data at Abbott [12] showed that compounds that were greater than 85% similar had similar biological activity in only 30% of the cases. Nevertheless, similarity searches are currently a routine tool in many chemical databases (DBs) for selecting chemical analogs (e.g., included in ChemNavigator DB [13]).
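A similarity search of this kind can be illustrated with a minimal sketch. It assumes that each structure has already been converted to a set of "on" fingerprint bits and that similarity is measured by the Tanimoto coefficient, which is a common choice but not necessarily the measure used in the cited studies. The bit sets, compound names, and the 0.85 cutoff below are hypothetical illustration values.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by all distinct 'on' bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints, given as sets of 'on' bit positions.
query = {1, 4, 7, 9, 12, 15, 20}
library = {
    "analog_A": {1, 4, 7, 9, 12, 15, 20, 21},
    "analog_B": {1, 4, 30, 31},
}

THRESHOLD = 0.85  # assumed similarity cutoff for calling two structures "similar"

scores = {name: tanimoto(query, bits) for name, bits in library.items()}
hits = [name for name, score in scores.items() if score >= THRESHOLD]

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("Selected analogs:", hits)
```

A high score only indicates structural resemblance under the chosen fingerprint; as noted above, it does not guarantee similar biological activity.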

Examples of the practical application of an approach based on the macromolecular target structure [14,15,16] and the ligand structure [17,18,19] have been published.

Computer drug design has developed rapidly over the last 20 years because the chemical and pharmaceutical industries were interested in improving such methods. The accuracy and throughput of the employed approaches have increased steadily. Computational methods are now available that predict, with satisfactory accuracy, the physicochemical properties, biological activity, metabolism, toxicity, and other characteristics of drug-like organic compounds [20]. They were created mainly as a result of the emergence of open-access Internet sources containing information on the structures and biological activities of organic compounds. In addition to the well-known research DBs such as PubMed [21], PubChem [22], and ChEMBL [23], a large number of specialized information and computational resources of interest to specialists in new drug design and discovery have recently appeared.

The goal of the present work was to review the most significant open-access Internet biomedical and chemical web resources and to make recommendations for optimizing their use in various discovery and development stages of new safer and more effective drugs.

General review of biomedical and chemical web resources

New drug design requires information on the mechanisms of the pathological processes underlying a certain disease, the molecular targets that can normalize the pathological process, organic drug-like compounds that can interact with the molecular targets, drugs used currently for therapy of the actual disease, and biologically active compounds studied in early stage preclinical and clinical trials in the examined pharmacotherapeutic research area. Computational web resources in addition to information sources are also useful because they enable physicochemical characteristics, biological activity, side effects, and toxicity of particular drugs to be calculated.

Herein, both categories of web resources are discussed (Table 1). Table 1 gives brief descriptions and addresses of the examined Internet web resources that allow a researcher to obtain access directly to the information of interest. All numerical characteristics regarding the number of structures, pharmacological targets, biological test results, etc. are given as of November 2016.

Table 1 List of Internet Resources Examined in the Review

Information resources on disease generation mechanisms

The starting prerequisite for new drug development is information about the generation and development mechanism of the pathology. The ability to access this information expanded considerably with the appearance of open-access web resources that compile information about disease generation mechanisms. Several popular web resources of this type are discussed below.

DisGeNET [24] is a web resource that integrates information about gene-disease associations from open-access DBs and the literature. The current version contains 429,036 associations between 17,381 genes and 15,093 diseases in addition to 72,870 associations between 46,589 single-nucleotide polymorphisms and 6,356 diseases. DisGeNET can be searched by disease name, gene identifier, and single-nucleotide polymorphism. Searches by disease name produce lists of genes and single-nucleotide polymorphisms associated with the disease in addition to lists of other diseases with which these genes are associated. DisGeNET collects data from several sources including Comparative Toxicogenomics Database, Universal Protein Resource, and ORPHANET and can filter data from a particular source. Furthermore, DisGeNET data can be downloaded as a SQLite DB for further processing on a local computer.
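Local processing of gene-disease associations in SQLite could look like the following sketch. The table name, column names, gene and disease labels, and scores are hypothetical placeholders built in memory for illustration; they are not the actual DisGeNET schema or data.

```python
import sqlite3

# In-memory stand-in for a downloaded SQLite file; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gene_disease (gene TEXT, disease TEXT, score REAL);
    INSERT INTO gene_disease VALUES
        ('GENE_A', 'disease X', 0.9),
        ('GENE_B', 'disease X', 0.4),
        ('GENE_A', 'disease Y', 0.7);
    """
)


def genes_for_disease(conn, disease, min_score=0.0):
    """Genes associated with a disease, strongest association first."""
    return conn.execute(
        "SELECT gene, score FROM gene_disease "
        "WHERE disease = ? AND score >= ? ORDER BY score DESC",
        (disease, min_score),
    ).fetchall()


def related_diseases(conn, disease):
    """Other diseases that share at least one associated gene with the given one."""
    rows = conn.execute(
        "SELECT DISTINCT d2.disease FROM gene_disease d1 "
        "JOIN gene_disease d2 ON d1.gene = d2.gene "
        "WHERE d1.disease = ? AND d2.disease <> ? ORDER BY d2.disease",
        (disease, disease),
    ).fetchall()
    return [row[0] for row in rows]


print(genes_for_disease(conn, "disease X"))
print(related_diseases(conn, "disease X"))
```

The two queries mirror the two kinds of result described above for a search by disease name: the associated genes, and the other diseases linked to those genes.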

OMIM [25] is a web resource containing information on human inherited diseases associated with 15,000 genes. Searches by disease name can be made in OMIM. Search results give a detailed description of the mechanism of a particular disease with literature references and additional resources such as Ensembl, NCBI RefSeq, UniProt, KEGG, Reactome, etc. Data can be exported from OMIM in TSV format. However, this requires registration. Also, an API (Application Programming Interface) can be used to access OMIM. The API is a set of allowed queries to the web service that can be used to extract the required data from it.

Information resources for network pharmacology

According to current opinions, disease development in most instances cannot be explained by defects in individual genes but rather by functional disruption of various gene groups. Network pharmacology methods are currently used to develop effective drugs. They consider the functions of genes and proteins and the consequences of their disruption in the context of various network types, which are represented as graphs where genes (proteins) are the nodes and their interactions are the edges. Network pharmacology methods allow disease generation mechanisms to be described on higher biological organization levels, pharmacological targets and their more efficacious combinations to be discovered, and possible side effects of affecting the target to be assessed. Numerous web resources and DBs facilitating these tasks are currently available [26]. Several popular resources are described below.

KEGG (Kyoto Encyclopedia of Genes and Genomes) [27] is a web resource combining several DBs and containing various information on genes, genomes, diseases, and drugs. The KEGG DB on signaling and metabolic pathways is the most requested one. Currently, KEGG can find information on 502 human and animal pathways represented as interactive charts that can also be exported as *.xml files. In addition to signaling and metabolic pathways, KEGG contains pathways describing disease development mechanisms and synthetic pathways for various drugs. Tools included in KEGG allow genes of interest to the user to be marked in color on charts so that the roles of genes in the signaling and metabolic pathway can be found.

DAVID (Database for Annotation, Visualization and Integrated Discovery) [28] is a web resource that integrates information from various DBs and performs functional annotation of gene lists obtained in genomic and proteomic research. DAVID makes it possible to annotate genes by function and localization in the cell, signaling and metabolic pathways, tissue expression, and disease relationships. DAVID uses so-called enrichment analysis, which identifies pathways, processes, tissues, etc. that are over-represented among the studied genes as compared with a reference group that includes all human genes [29]. Information about disease development mechanisms at higher biological organization levels than individual genes, e.g., at the level of signaling pathways and cellular pathological processes, can be obtained from this analysis.
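The idea behind enrichment analysis can be sketched with a plain one-sided hypergeometric test. This is an illustrative assumption and not necessarily the exact statistic implemented in DAVID; the gene counts are invented for the example.

```python
from math import comb


def enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """Probability of seeing at least k pathway genes in a study list by chance.

    N: genes in the reference group (e.g., all human genes)
    K: reference genes annotated to the pathway
    n: genes in the studied list
    k: studied genes annotated to the pathway
    """
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: 5 of 50 studied genes fall in a 100-gene pathway,
# against a reference group of 20,000 genes (0.25 expected by chance).
p = enrichment_pvalue(k=5, n=50, K=100, N=20000)
print(f"p-value: {p:.2e}")
```

A small p-value suggests that the pathway is over-represented in the studied list. In practice many pathways are tested at once, so a multiple-testing correction would normally be applied as well.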

HIPPIE (Human Integrated Protein—Protein Interaction rEference) [30] is a resource providing access to information on protein—protein interactions in human cells. Analyses of the topology of protein—protein interaction networks are widely used in searches for therapeutic targets of human diseases [26]. However, the completeness and quality of the available data must be considered in constructing such networks. HIPPIE integrates data for 282,472 interactions of 16,829 human proteins that were obtained from numerous sources. Each interaction is assigned a confidence from 0 to 1 that is calculated from (1) the number of studies in which it was recorded, (2) the number and type of corresponding experimental methods, and (3) the presence/absence of known interactions between protein orthologs. HIPPIE can find a network fragment that includes proteins set by the user and their closest neighbors considering different limitations on the functions and protein tissue expression in addition to the probability of their interactions. All data on protein—protein interactions can be exported in various formats.
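Extracting a network fragment around user-defined proteins can be sketched as follows. The protein names, confidence values, and the 0.7 cutoff are hypothetical; the sketch only assumes that interactions are available as (protein, protein, confidence) records, as in an exported interaction table.

```python
from collections import defaultdict

# Hypothetical scored interactions: (protein A, protein B, confidence in [0, 1]).
interactions = [
    ("P1", "P2", 0.95),
    ("P1", "P3", 0.40),
    ("P2", "P4", 0.80),
    ("P3", "P4", 0.72),
]


def build_network(interactions, min_confidence):
    """Undirected graph (node -> set of neighbours) keeping only confident edges."""
    graph = defaultdict(set)
    for a, b, confidence in interactions:
        if confidence >= min_confidence:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def neighborhood(graph, seeds):
    """Seed proteins together with their direct interaction partners."""
    nodes = set(seeds)
    for seed in seeds:
        nodes |= graph.get(seed, set())
    return nodes


network = build_network(interactions, min_confidence=0.7)
print(sorted(neighborhood(network, ["P1"])))
```

Raising the confidence cutoff trades completeness of the network for reliability of the retained edges, which is the compromise noted above for data quality.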

Information resources on pharmacological targets

A key stage in new drug development is the identification of potential pharmacological targets that can be affected to normalize the corresponding pathologies. Several open-access information resources are now available on the Internet and contain information on pharmacological targets. Several of the currently more popular resources of this type are examined in this section.

IUPHAR/BPS [31] is an information resource that was developed through collaboration of the British Pharmacological Society (BPS) and International Union of Basic and Clinical Pharmacology (IUPHAR). This web resource gathers information on various protein targets. The resource contains information on 2,789 human pharmacological targets including 1,176 enzymes, 508 transport proteins, 395 G-protein-coupled receptors (GPCRs), 272 ion channels, 240 catalytic receptors, and several other proteins. Reference ligands are indicated for each target. Each ligand is described briefly. Hyperlinks to the corresponding records in the PubChem [22] and ChEMBL [23] DBs are also given. The resource contains descriptions of 8,611 ligands, of which 5,515 are synthetic organic compounds, 2,035 are peptides, 1,290 are drugs approved for medical use, and the remainder are other pharmacological substances. IUPHAR/BPS provides for searching by target name, tissue type affected by the drug, and other terms. Searches can also be made by the amino-acid sequence of the protein target and by identifiers from DBs such as UniProt, ChEMBL, Ensembl, PDB, etc.

PHAROS [32] is a web interface for accessing the Target Central Resource Database (TCRD). PHAROS was developed by staff at the University of New Mexico, NIH-NCATS, Icahn School of Medicine at Mt. Sinai, EMBL-EBI, and the University of Miami. TCRD collects data on human pharmacological targets and diseases and ligands related to them. TCRD contains four classes of targets including 48 nuclear receptors, 342 ion channels, 578 kinases, and 827 GPCRs. All targets included in PHAROS are divided into four categories (Tclin, Tchem, Tbio, Tdark) according to their development level, starting with targets for which drugs with well-known mechanisms of action are known and finishing with practically unstudied targets. PHAROS visualizes TCRD data using various diagrams and also offers tools for filtering data according to various parameters such as target, disease, ligand, tissue type, and many others. Furthermore, TCRD provides an API [33] that allows communication with the DB and extraction of data in JSON format, i.e., one of the most common formats for data exchange. JSON has an extremely simple syntax that makes the data easy both for programs to process automatically and for humans to read. The large number of computer programs that support this format is indicative of its popularity.
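Processing a JSON response takes only a few lines in most languages. The sketch below parses a hypothetical payload and groups targets by development level; the field names ("targets", "name", "family", "tdl") and the target records are assumptions made for illustration and do not reproduce the actual API output.

```python
import json

# Hypothetical JSON text such as an API might return; not real TCRD output.
payload = """
{"targets": [
    {"name": "TARGET_1", "family": "Kinase", "tdl": "Tclin"},
    {"name": "TARGET_2", "family": "GPCR", "tdl": "Tdark"},
    {"name": "TARGET_3", "family": "GPCR", "tdl": "Tchem"}
]}
"""

data = json.loads(payload)

# Group target names by their development-level category.
by_level = {}
for target in data["targets"]:
    by_level.setdefault(target["tdl"], []).append(target["name"])

gpcr_names = [t["name"] for t in data["targets"] if t["family"] == "GPCR"]

print(by_level)
print(gpcr_names)
```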

Therapeutic Targets Database (TTD) [34, 35] contains information on human proteins and nucleic acids that are drug targets. Diseases related to and drugs interacting with each target are given. Moreover, TTD contains references to supplemental resources, e.g., PDB [36], which contain more detailed information on targets such as amino-acid sequences, 3D structures, etc. TTD version 4.3.02 contains information on 2,025 targets and 17,816 drugs.

Resources such as DrugBank [37], PubChem [22], and ChEMBL [23], which will be examined in more detail in subsequent sections, also contain information on pharmacological targets in addition to the web services described in this section. It is noteworthy that other specialized DBs on pharmacological targets are also known. Information on them can be found in reviews [38, 39].

Information resources on drugs

DrugBank [37, 40] is an open-access DB that was created and is maintained by the Canadian Institutes of Health Research (CIHR) and The Metabolomics Innovation Centre (TMIC). DrugBank includes chemical structures and describes pharmacological properties of >8,000 drugs in addition to the amino-acid sequences of targets to which they can bind. DrugBank DB version 5 describes 4,333 molecular targets and 8,206 drugs, of which 1,991 are approved by the US Food and Drug Administration (FDA), 93 are nutraceuticals (biologically active additives), and about 6,000 are experimental drugs. Queries to the DrugBank DB can be made through a web interface [37] that allows searches for drugs by name, structural formula, and amino-acid sequence of protein targets. Information in XML format can be extracted from the DrugBank DB for subsequent local automated processing.
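Local processing of an XML export can be sketched with the Python standard library. The element names and drug records below form a deliberately simplified, hypothetical layout; the real DrugBank XML schema is considerably richer and is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML fragment; not the actual DrugBank schema.
xml_text = """
<drugs>
  <drug><name>drug_1</name><group>approved</group></drug>
  <drug><name>drug_2</name><group>experimental</group></drug>
  <drug><name>drug_3</name><group>approved</group></drug>
</drugs>
"""

root = ET.fromstring(xml_text.strip())

# Collect the names of all drugs whose status group is 'approved'.
approved = [
    drug.findtext("name")
    for drug in root.iter("drug")
    if drug.findtext("group") == "approved"
]

print(approved)
```

For a full DB export, which can be very large, an incremental parser such as `xml.etree.ElementTree.iterparse` would usually be preferable to loading the whole document at once.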

SIDER [41, 42] is a DB containing information on 1,430 drugs and the side effects caused by them. The DB also contains information on the frequency of incidence of one side effect or another, anatomical-therapeutic-chemical drug classification, and references to additional information sources. The web interface [42] can search by drug name. Information from this DB can also be extracted for subsequent local computer processing.

Information resources on pharmacological substances

PubChem [22, 43] is an open-access DB that is maintained by the National Center for Biotechnology Information (NCBI, USA) and contains information on >92 million chemical compounds and >2.2 million biological test results on interaction with >10,000 molecular targets. PubChem can be used in several ways. First, the DB can be accessed through a web interface [22] that allows chemical compounds to be searched by name and structural formula. Structural data in various formats such as SMILES, InChI, and SDF can also be exported. However, data for >500,000 compounds cannot be exported in one iteration. Furthermore, PubChem can be accessed programmatically through an API [44].
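Programmatic access typically amounts to composing a request URL and reading the response. The sketch below only builds a request in the style of the PubChem PUG REST interface (compound looked up by name, selected properties returned as JSON) and does not send it. The helper name `property_url` is hypothetical, and the current API documentation should be consulted for the supported property names and usage limits.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name: str, properties: list, fmt: str = "JSON") -> str:
    """Compose a PUG REST style URL for looking up compound properties by name."""
    return (
        f"{PUG_REST}/compound/name/{quote(name)}"
        f"/property/{','.join(properties)}/{fmt}"
    )


url = property_url("aspirin", ["MolecularFormula", "InChIKey"])
print(url)
```

The resulting URL could then be fetched with, for example, `urllib.request.urlopen(url)`, and the JSON body decoded with the `json` module.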

ChEMBL [23, 45] was created and is supported by the European Bioinformatics Institute (EBI) (United Kingdom). ChEMBL version 21 contains information on 1,592,191 different chemical compounds, 11,019 targets, and 13,967,816 activities. The ChEMBL DB can be accessed through a web interface in which searches can be made using keywords or structural formulas of low-molecular-mass organic compounds. Furthermore, information can be exported from the DB as files in SDF or SQL format for subsequent loading into widely used DB software (Oracle, MySQL, PostgreSQL) and local automated processing. The developers of this DB also provide programmatic access to ChEMBL through an API [46].

ChEBI [47, 48] (Chemical Entities of Biological Interest) was created and is maintained by the EBI. Version 146 of the ChEBI DB contains annotated information on 50,383 biologically active compounds that was collected from various sources and was analyzed by experts. Searches can be made using various compound names, registration numbers, structural formulas, etc. Search results contain references to additional information sources (DrugBank, KEGG, Wikipedia, etc.) and can be loaded as files in TSV, XML, and SDF format. Structural formulas and all supplementary information from this DB can be exported in whole as an SDF file. It is noteworthy that this is apparently the only foreign DB that supports a Russian user interface [49].

Information resources on chemical compounds

ChemSpider [50, 51] was created and is maintained by the Royal Society of Chemistry. It is an open-access DB that provides access to structural and test information on 58 million chemical compounds that was collected from 500 various sources. Searches can be made using various compound names, registration numbers, structural formulas, etc. Structural fragments and structural analogs can be searched. Search results can be examined on the screen. References to external information sources including patents, lists of actual chemical suppliers, etc. are given.

ChemNavigator [52] provides information on 91.5 million chemical compounds (>60 million unique structures) gleaned from data provided from >200 suppliers in various countries. Structural formulas, structure fragments, and structural analogs can be searched. The user can order directly on the website [52] commercial samples of interesting chemical compounds contained in the ChemNavigator DB for experimental testing of their biological activity. Registered users can also request that the compounds required by them (supply conditions are negotiated during discussions) be synthesized.

Other DBs on chemical compounds in addition to ChemSpider and ChemNavigator are available over the Internet. Some of them aggregate information from various suppliers (e.g., ZINC [53] and MolPort [54]), whereas others present libraries of chemical compounds from individual companies (InterBioScreen [55], ChemBridge [56], etc.).

Computational web resources

VCC Lab (Virtual Computational Chemistry Laboratory) [57, 58] facilitates the calculation of various structure descriptors and physicochemical properties of chemical compounds including estimates of aqueous solubility and n-octanol-water partition coefficients (log P); the analysis of structure-activity relationships based on similarity analysis; the construction of regression equations using partial least-squares methods (PLS), etc.

Chembench [59, 60] supports various chemical-information studies by supplying computational resources for constructing QSAR/QSPR models and predicting the activity and properties of new compounds. Registered users can upload their own training set containing information on the structures and properties of previously studied compounds as an SDF file. Structure—activity or structure—property relationships are modeled using the training set. Then, the resulting models are used to predict the properties or biological activities of new compounds.
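The train-then-predict workflow can be illustrated with the simplest possible quantitative model, a least-squares line relating one descriptor to activity. This is a sketch of the general QSAR idea only; it does not represent the modeling algorithms that Chembench provides, and the descriptor and activity values are invented toy numbers.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy training set: one descriptor value and one measured activity per compound.
train_descriptor = [1.0, 2.0, 3.0, 4.0]
train_activity = [5.1, 5.9, 7.1, 7.9]

slope, intercept = fit_line(train_descriptor, train_activity)


def predict(descriptor_value):
    """Predicted activity of a new compound from its descriptor value."""
    return slope * descriptor_value + intercept


print(f"activity = {slope:.2f} * descriptor + {intercept:.2f}")
print(f"predicted activity for descriptor 5.0: {predict(5.0):.2f}")
```

Real models use many descriptors and must be validated on compounds not used for training; predictions far outside the descriptor range of the training set, as in the last line, should be treated with particular caution.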

Web-portal MPDS (Molecular Property Diagnostic Suite) [61] is a collaborative development of several Indian institutes including the Indian Institute of Chemical Technology, Institute of Microbial Technology, National Institute of Pharmaceutical Education and Research, etc. The main task of the portal is to supply researchers with computer tools for assessing various properties of chemical compounds. MPDS is based on the Galaxy platform for creating web resources that aggregate tools designed to solve computational biology problems. Any design based on Galaxy contains by default a tool set for performing standard chemical-information tasks such as calculating very simple statistical parameters for certain data, constructing linear regression models, combining and separating table data, and converting structural data among various formats. Galaxy allows developers to add their own program modules.

MPDS collects data for low-molecular-mass chemicals from DBs such as ZINC (126,369,246 compounds), PubChem (71,575,000 compounds), NCI (265,242 compounds), KEGG (10,384 compounds), and DrugBank (22,257 compounds) and provides tools for searching compounds according to properties, structures, substructures, and fingerprints. A fingerprint is usually a bit string that encodes information about certain properties of the associated item. For example, fingerprints of chemical compounds can encode information such as the number of rings in the structural formula and the presence or absence of particular functional groups. The portal also allows descriptors of chemical compounds to be calculated using PaDEL [62] and CDK [63] tools, QSAR models to be constructed using McQSAR [64] and SVMLight [65], activities of new compounds to be predicted using these models, and ligand docking with proteins to be performed using Autodock Vina [66]. Each of the MPDS program modules at the portal can be used separately or in workflows, for which graphical interfaces are provided.
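How a fingerprint encodes structural features as bits can be shown with a small sketch. The five-feature layout and the feature names are hypothetical and chosen only for illustration; practical fingerprints use hundreds or thousands of bits derived automatically from the structure.

```python
# Hypothetical fingerprint layout: one bit per structural feature.
FEATURES = [
    "has_ring",
    "has_two_or_more_rings",
    "has_hydroxyl",
    "has_carboxyl",
    "has_amine",
]


def encode(features_present: set) -> int:
    """Set bit i when FEATURES[i] is present in the compound."""
    bits = 0
    for position, feature in enumerate(FEATURES):
        if feature in features_present:
            bits |= 1 << position
    return bits


def to_bit_string(bits: int) -> str:
    """Render the fingerprint so that character i corresponds to FEATURES[i]."""
    return "".join("1" if bits >> i & 1 else "0" for i in range(len(FEATURES)))


# A hypothetical compound with a single ring and a hydroxyl group.
fingerprint = encode({"has_ring", "has_hydroxyl"})
print(to_bit_string(fingerprint))
```

Because each fingerprint is just an integer, two compounds can be compared with fast bitwise operations, e.g., `bin(fp1 & fp2).count("1")` gives the number of shared features.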

The Galaxy platform provides an API that can be used for automated access to web resources running on Galaxy. The BioBlend library for the Python language was created to make this API more convenient to use.

The Way2Drug web portal [67] was developed and is maintained by staff of the Laboratory for Structure-Functional Based Drug Design, Institute of Biomedical Chemistry. The computer program PASS Online (previously called PASS INet) was the first open-access web service offered on the portal [68,69,70]. Currently, it can predict about 4,000 types of biological activity. The good predictive capability of PASS Online has been confirmed by years of use of this resource by almost 15,000 researchers from 90 countries. Hundreds of predictions of biological activity by this web service were confirmed experimentally and reviewed [71,72,73]. PASS Online has demonstrated advantages over other open-access Internet web services that predict drug biological activity profiles [74, 75].

The capabilities for computational prediction of drug properties that are provided by these web resources have expanded considerably in recent years. In particular, the GUSAR computer program [76] can be used to predict acute toxicity for rats for four routes of administration [77] and interaction with undesired molecular targets [78]. The web services PASS Targets, which predicts interaction with ~2,500 molecular targets [79], and CLC-Pred, which predicts cytotoxicity against tumor and non-tumor cell lines [80], were created based on data extracted from the ChEMBL DB [23]. The web service MetaTox, which predicts structures and toxicities of metabolites of organic xenobiotics [81], is in operation. A collaborative Russian-Indian project is working on integrating web services based on the Way2Drug and MPDS platforms.

Conclusion

Computer drug design methods have now become an essential part of new drug development [20].

Figure 1 shows how computational methods can be used to design new drugs [82]. Computational methods “from genomes to drugs in silico” [83] are used in all new drug design stages from idea to pharmacy. They can be used to establish mechanisms of pathologies that lead to diseases, to identify promising pharmacological targets for pathogenetic and ideally etiotropic therapy, and to discover drugs with an array of desired pharmacodynamic and pharmacokinetic characteristics.

Fig. 1. Diagram of computational methods in various drug development stages.

Computational methods rely on established biomedical knowledge about how individual biological systems at the molecular, cellular, organ-tissue, and organism levels are involved in physiological and pathological processes, and about biologically active compounds that modulate physiological functions at all of these levels under normal and pathological conditions. An enormous amount of such information has recently been generated by high-throughput post-genomic medical technologies [84, 85].

Post-genomic research has generated an enormous amount of varied biomedical and clinical data (Big Data). Storing, processing, and analyzing these data requires the creation and use of special bioinformatics and cheminformatics methods, computer programs implementing these methods, DBs, and knowledge bases. Vigorous work in this area led to an unexpected conclusion, namely that highly accurate computer predictions can be derived only from data of corresponding quality [86,87,88,89]. Observations over many years of the evolution of web resources such as PubChem, ChEMBL, and ChemSpider show that their developers have made significant efforts to improve the quality of the information presented to researchers. The variety of experimental methods for testing biological activity that is reported in the literature reflects the constant improvement of pharmacology. Therefore, training sets that are as homogeneous as possible should be used to construct structure-property relationships [88].

Herein, the most significant open-access information and computation resources were reviewed. They can be used to increase the efficiency of searching and designing new drugs. Additional information of interest to the reader can be found in recently published reviews [26, 38, 39, 90, 91].