Background

Bioinformatics applies statistics and computer science to heterogeneous biological data, creating opportunities to understand disease genetics and biological processes and to identify therapeutic targets [1]. Over the last decades, drug discovery has turned to combining experimental approaches with modern computational science. Various tools and techniques have been used for target identification, enrichment analysis and network algorithms. To date, several in silico bioinformatic methods have been developed and applied [2,3]. Drug targets have been predicted using a chemical two-dimensional structural similarity approach and a bipartite graph learning method [4], and the inverse-docking approach plays an important role in target identification [5,6].

Inverse docking is a novel technique that docks a compound with known biological activity into the binding sites of all 3D structures in a given protein database [7]. The docking procedure involves multiple-conformer shape-matching alignment of the drug molecule to a cavity, followed by molecular-mechanics torsion optimization and energy minimization of both the molecule and the protein residues at the binding region [8]. Screening is conducted by evaluating the molecular mechanics energy, and potential protein ‘hits’ can be selected by further analysis of binding competitiveness against other ligands that bind to the same receptor site [9]. The most commonly used drug target database is the Potential Drug Target Database (PDTD, http://www.dddc.ac.cn/pdtd/), which contains 1207 entries covering 841 known and potential drug targets with structures from the Protein Data Bank (PDB) [10]. A number of academically or commercially available pathway databases and network building tools also exist, such as MetaCore™ and Integrity SM [11,12]. MetaCore™ is one of the most suitable tools for functional mining of large, inherently noisy experimental datasets, and its network visualization of drug-target, target-disease and disease-gene associations can provide useful information for studying the therapeutic indications and adverse effects of drugs [13,14].

Traditional Chinese medicines (TCMs) have been used to treat many diseases for thousands of years. Dioscin, a natural steroidal saponin, occurs in many Chinese medicinal herbs including Dioscorea nipponica Makino, Dioscorea zingiberensis C. H. Wright and Dioscorea futschauensis Uline. Pharmacological studies have shown that dioscin has anti-tumor, anti-hyperlipidemic, anti-fungal and antiviral activities [15-18]. Our previous studies showed that dioscin has significant hepatoprotective effects against carbon tetrachloride (CCl4)- and acetaminophen-induced liver damage in mice [19-21]. Because of its medical value, dioscin is likely to be studied increasingly in the future. How should such studies be prioritized in terms of the targets, biological activities, signal pathways and regulating networks affected by the compound? In routine work, largely blind experiments must be carried out step by step [22,23], which is time-consuming and laborious. Thus, a prediction of the drug targets, biological activities, signal pathways and regulatory pathways is necessary, and it will provide complementary and supporting evidence for subsequent experimental studies.

In the present paper, the drug targets of dioscin were predicted by inverse docking, and enrichment and network analyses were carried out with GeneGo’s MetaCore™. Possible targets, biological activities, signal pathways and regulating networks of dioscin were thus predicted in advance, which should provide useful information for further investigation.

Methods

System

In the present paper, the 2D chemical structure of dioscin was sketched using MarvinSketch (http://www.chemaxon.com), and the three-dimensional (3D) structure of dioscin was constructed using ISIS/Draw (MDL Information Systems, Inc., San Leandro, CA, USA). The identification and validation of all potential targets of dioscin were then carried out with the MDock software. MDock is automated molecular docking software that uses an ensemble docking algorithm to dock dioscin simultaneously against drug targets from PDTD with known three-dimensional crystal structures, using multiple protein structures/conformations downloaded from the RCSB Protein Data Bank (PDB) [24]. After that, the MetaCore platform was used to analyze the biological activities, signal pathways and regulating networks. MetaCore is a suite of software oriented toward understanding the function of gene sets discovered by expression analysis (Table 1), and it is based on a proprietary, manually curated database of protein-protein and protein-DNA interactions and of metabolic and signaling pathways. The analysis process for predicting drug targets, biological activities, signal pathways and regulating networks is shown in Figure 1.

Table 1 Tools and databases of MetaCore platform
Figure 1 The analysis process of the prediction of drug targets, biological activities, signal pathway and regulating networks.

Target protein screening

To screen targets, the chemical structure of dioscin was converted to a three-dimensional structure with ISIS/Draw for identifying potential biological targets. The structure file was then uploaded to the MDock software, and binding site analysis was applied against PDTD. Binding site analysis identifies and characterizes a protein’s binding site and then uses those characteristics to look for similar features in other proteins. When potential drug target proteins were found, their active sites were defined and their binding energies (kcal/mol) were calculated. The threshold for the predicted binding energy was -50.0 kcal/mol, and all other options were left at their default settings to screen for target proteins with high binding affinity.
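The threshold step can be illustrated with a minimal Python sketch. It is not part of MDock: the `DockingHit` record, the target labels and the energy values are hypothetical, and treating the threshold as inclusive (energy at or below -50.0) is an assumption.

```python
from typing import NamedTuple


class DockingHit(NamedTuple):
    target_id: str   # hypothetical label for a target structure
    energy: float    # predicted binding energy in kcal/mol


def screen_targets(hits, threshold=-50.0):
    """Keep hits whose energy is at or below the threshold, strongest binders first."""
    kept = [h for h in hits if h.energy <= threshold]
    return sorted(kept, key=lambda h: h.energy)


# Made-up scores for illustration only; these are not results from the study.
example_hits = [
    DockingHit("T1", -62.3),
    DockingHit("T2", -41.0),
    DockingHit("T3", -50.0),
]
selected = screen_targets(example_hits)
```

Because binding energies are negative, a lower (more negative) value indicates stronger predicted binding, so the filter keeps values at or below the cutoff.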

Gene ontology analysis

Enrichment analysis

The genes corresponding to the screened target proteins were uploaded to the MetaCore platform, and functional analysis of the data was performed by ontology enrichment analysis based on gene ontology. The GeneGo biological processes category was chosen, which includes prebuilt molecular interaction networks covering protein-protein, protein-compound (metabolite) and protein-nucleic acid interactions across all the networks.

First, the data were activated and the thresholds were set. In the analysis settings, the fold change threshold was set at 0.001 and the p-value threshold at 0.05. That is, genes with fold change values below 0.001 were filtered out, as were genes with a p-value above 0.05. Other parameters were also set before analysis: signals was set to both, and the sorting method was set to statistically significant.
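The two filters can be sketched in a few lines of Python. This is only an illustration of the thresholds stated above, not MetaCore code: the gene names and values are invented, and comparing the absolute fold change (to reflect the "both" signal setting) is an assumption.

```python
def passes_thresholds(fold_change, p_value, fc_threshold=0.001, p_threshold=0.05):
    """Return True if a gene survives both the fold-change and the p-value filter."""
    # Assumption: "both" signals means the magnitude of the fold change is compared.
    return abs(fold_change) >= fc_threshold and p_value <= p_threshold


def filter_genes(records, **thresholds):
    """records is an iterable of (gene, fold_change, p_value) tuples."""
    return [gene for gene, fc, p in records if passes_thresholds(fc, p, **thresholds)]


# Invented records for illustration only.
example_records = [
    ("geneA", -1.8, 0.010),
    ("geneB", 0.0005, 0.020),   # removed: fold change below 0.001
    ("geneC", 2.4, 0.200),      # removed: p-value above 0.05
    ("geneD", 0.9, 0.050),
]
kept_genes = filter_genes(example_records)
```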

The enrichment analysis consists of matching the gene IDs of possible targets for the “common”, “similar” and “unique” sets with gene IDs in the functional ontologies in MetaCore. The probability of a random intersection between a set of IDs the size of the target list and the ontology entities is estimated as the p-value of the hypergeometric intersection. A lower p-value means higher relevance of the entity to the dataset, which gives the entity a higher rating and priority. The ontologies include GeneGo Pathway Maps, GeneGo Process Networks, GO Processes and GeneGo Diseases (by Biomarkers). The distributions are calculated and shown as histograms of the 10 most significant results, ranked by -log(p-value).
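A hypergeometric enrichment p-value of this kind can be computed as follows. This is a generic sketch using only the Python standard library; the symbol names (`N`, `R`, `n`, `r`), the use of the upper tail, and the base-10 logarithm for ranking are assumptions, and the code does not reproduce MetaCore's internal implementation.

```python
from math import comb, log10


def hypergeom_pvalue(N, R, n, r):
    """Probability of seeing at least r annotated IDs in a random draw.

    N: IDs in the whole ontology universe
    R: IDs annotated to the ontology entity (map, process or disease)
    n: IDs in the uploaded target list
    r: IDs shared by the target list and the entity
    """
    total = comb(N, n)
    tail = sum(comb(R, k) * comb(N - R, n - k) for k in range(r, min(n, R) + 1))
    return tail / total


def top_entities(pvalues, k=10):
    """Rank entities by -log10(p), most significant first (assumes every p > 0)."""
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    return [(name, -log10(p)) for name, p in ranked[:k]]


# Toy numbers: 10 IDs in total, 3 annotated, 4 drawn, at least 2 shared.
p_example = hypergeom_pvalue(10, 3, 4, 2)
```

The smaller the p-value, the taller the corresponding bar when results are plotted on a -log(p-value) scale.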

Most relevant networks analysis

The genes in the uploaded files were used as the input list for gene statistics analysis, which gave the correlation between genes and network objects together with the intensities corresponding to the expression values. The results were then used for network statistics based on networks built from the active experiments, and the relevant network objects of each network were listed, including divergence hubs, convergence hubs, edges in and edges out. The biological networks were generated with the Analyze Networks (AN) algorithm with default settings. This is a variant of the shortest paths algorithm whose main parameters are the relative enrichment with the uploaded data and the relative saturation of networks with canonical pathways. These networks are built on the fly and are unique to the uploaded data. In this workflow, the networks are prioritized by the number of fragments of canonical pathways on the network.
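The network statistics named above (edges in, edges out and hubs) can be illustrated on a toy directed graph. In this sketch a divergence hub is taken to be the node with the most outgoing edges and a convergence hub the node with the most incoming edges; these definitions, the function names and the node labels are assumptions made for illustration and are not taken from MetaCore.

```python
from collections import Counter


def degree_summary(edges):
    """Map each node to (edges_in, edges_out) for a list of (source, target) pairs."""
    edges_out = Counter(src for src, _ in edges)
    edges_in = Counter(dst for _, dst in edges)
    nodes = set(edges_out) | set(edges_in)
    return {node: (edges_in[node], edges_out[node]) for node in nodes}


def top_hub(summary, kind):
    """Return the node with the most outgoing ("divergence") or incoming ("convergence") edges."""
    index = 1 if kind == "divergence" else 0
    # Sorting first makes ties resolve alphabetically.
    return max(sorted(summary), key=lambda node: summary[node][index])


# Toy network with placeholder node names.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")]
toy_summary = degree_summary(toy_edges)
divergence_hub = top_hub(toy_summary, "divergence")
convergence_hub = top_hub(toy_summary, "convergence")
```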

Results and discussion

Target proteins

Inverse docking was used to identify new potential biological targets, or to identify targets for compounds among a family of related receptors. In the present paper, 71 potential targets of dioscin from human proteins, 7 from rat and 8 from mouse were identified with the MDock software. These target proteins are enzymes, G-protein-coupled receptors, ion channels and nuclear receptors, and they are listed in Table 2.

Table 2 The target proteins of dioscin from human, rat and mouse searched in PDTD

Enrichment analysis

GeneGo pathway Maps

It is generally recognized that pathway-based analysis can provide much significant information. Canonical pathway maps represent a set of about 650 signaling and metabolic maps covering human biology in a comprehensive way. The profile of network objects (Table 3) was uploaded to search the canonical pathway maps. All maps are drawn from scratch by GeneGo annotators and are manually curated and edited. The distributions in Figure 2 show the multistep pathways, drawn from literature consensus, that were most significantly enriched in the data set. Experimental data are visualized on the maps as histograms (yellow for down-regulation). The height of a histogram corresponds to the relative expression value of a particular gene/protein.

Table 3 The Network objects and its functions
Figure 2 GeneGo Diseases (by Biomarkers) of human, rat and mouse. Sorting is done for the “Statistically significant Diseases” set.

The top scored pathway maps were sorted by the “Statistically significant Maps” set (Figure 3). The three top scored maps were the immune response alternative complement pathway, the G-protein signaling_RhoB regulation pathway and the immune response antiviral actions of interferons. Experimental data from all files are linked to and visualized on the maps as thermometer-like figures. Upward (red) thermometers indicate up-regulated signals, and downward (blue) ones indicate down-regulated expression levels of the genes.

Figure 3 GeneGo Pathway Maps. Sorting is done for the “Statistically significant Maps” set.

The complement system protects the host from microorganisms [25], and the alternative pathway can be directly activated by invading microorganisms. C3/C5 convertases are complex enzymes that are transiently assembled on the surface of biological organisms upon activation of the complement system [26]. The generation of active C3/C5 convertases helps to opsonize, kill and clear bacteria, parasites and pathogens by eliciting cellular functions including phagocytosis and inflammation [27]. In the human map, the pathway is initiated by the spontaneous hydrolysis of C3, a major effector of the humoral branch of the complement system. The down-regulation of C3 by dioscin contributes to the down-regulation of C3a and C3b, which in turn induces the down-regulation of C5 convertase. As a knock-on effect, the specific receptors that bind the cleavage fragments are also affected, including CR1, C3aR, alpha-M/beta-2 integrin, alpha-X/beta-2 integrin and CD21.

In the map of the RhoB regulation pathway, RhoB is a member of the small GTPase family and controls multiple cellular processes, including actin and microtubule dynamics, gene expression, the cell cycle, cell polarity and membrane transport. These abilities depend on numerous downstream effectors, which lead to diverse parallel downstream signaling pathways [28]. Several classes of regulatory proteins affect the activation of RhoB. Among them, GGTase-I (geranylgeranyltransferase type I) and FTase (farnesyltransferase, CAAX box) promote post-translational modification of the RhoB (Ras homolog gene family, member B) protein by geranylgeranylation and farnesylation, which are essential for the biological activity of RhoB. In the prediction, down-regulation of GGTase-I and FTase by dioscin induced down-regulated expression of the RhoB gene.

In the map of immune response antiviral actions of interferons, iNOS (inducible NO synthase) was the network object. iNOS generates copious amounts of NO, presumably to help kill or inhibit the growth of invading microorganisms or neoplastic tissue [29]. Over-expression of iNOS, a common phenomenon during chronic inflammatory conditions, generates sustained amounts of NO. Its reactive intermediates are mutagenic, causing DNA damage or impairment of DNA repair. Recent studies have also implicated NO as a key signaling molecule that can regulate the processes of tumorigenesis. Increased expression of iNOS is involved in tumors of the colon, lung, oropharynx, reproductive organs, breast and CNS (central nervous system) [30]. The map indicated that dioscin can down-regulate the expression level of the iNOS gene; that is, it may be a selective inhibitor of iNOS for the chemoprevention of cancer.

GeneGo process networks and Go processes

In the GeneGo process networks analysis, sorting is done for the ‘Statistically significant Networks’ set. There are about 110 cellular and molecular processes whose content is defined and annotated by GeneGo. From the experimental data (Table 3), the ten process networks with the lowest p-values were obtained (Table 4). The GO processes analysis includes the original Gene Ontology (GO) cellular processes as represented at GeneGo. Since most GO processes have no gene/protein content, the “empty terms” are excluded from p-value calculations, and the ten processes with the lowest p-values were obtained (Table 4). The results are all consistent with the GeneGo pathway maps: they were associated with immune response, inflammation and cell cycle signaling, and the GO processes include regulation of immune response, DNA replication, RNA transport, protein amino acid farnesylation and regulation of cell killing.

Table 4 Ten GeneGo process networks and GO processes with lower p-values

GeneGo Diseases (by Biomarkers)

MetaCore can be used to upload experimental data (Table 3) in order to discover and validate biomarkers. Using bioinformatics approaches, numerous candidate biomarkers associated with the development or prognosis of human disease have been reported. Disease folders represent over 500 human diseases with gene content annotated by GeneGo, and they are organized into a hierarchical tree.

In this paper, enriched diseases were detected through biomarkers. Using the network objects known to be associated with dioscin (Table 3) as the set of interest, the frequency was recomputed by summing object occurrences for each disease. The p-values were then obtained from the probability of picking a network object annotated with a given disease in the reference set. The results are shown in Figure 4. Gene content may differ greatly between complex diseases such as cancers and Mendelian diseases, and the coverage of different diseases in the literature is skewed. These two factors may affect p-value prioritization.

Figure 4 The top scored map (map with the lowest p-value) of human, rat and mouse based on the enrichment distribution sorted by the ‘Statistically significant Maps’ set. Experimental data from all files are linked to and visualized on the maps as thermometer-like figures. Upward (red) thermometers indicate up-regulated signals, and downward (blue) ones indicate down-regulated expression levels of genes.

In human, dioscin may be associated with experimental autoimmune encephalomyelitis, experimental nervous system autoimmune disease, hypertrophic cicatrix, cicatrix, hemolytic-uremic syndrome, complex and mixed neoplasms, uremia, toxic hepatitis, drug-induced chronic hepatitis and brain ischemia. In rat, dioscin may be associated with temporal lobe epilepsy, partial epilepsies, amyotrophic lateral sclerosis, melanoma, motor neuron disease, nevi and melanomas, spinal cord diseases, neuroendocrine tumors, hamartoma and epilepsy. In mouse, dioscin may be associated with ankylosing spondylitis, ulcerative colitis, spondylitis, ankylosis, infectious bone diseases, colitis, spondylarthropathies, spondylarthritis, autoimmune hepatitis and spinal diseases.

The most relevant networks

Network analysis can provide primary information about the physical connectivity and functional relationships between proteins/genes. The MetaCore database is a manually curated interaction database covering over 90% of human proteins with known function [31]. MetaCore has four “Analyze” network algorithms, which are useful when there is a large number of network objects. Among them, Analyze network creates a large network and breaks it up into smaller sub-networks, which are all ranked by p-value; Analyze transcription regulation works in a similar way. The other two “Analyze” network algorithms (transcription factors and receptors) focus on the presence of either start nodes or end nodes of a certain pathway. In this paper, the biological networks were created with the Analyze networks algorithm, and the related objects used for network building are listed in Table 3.

As all objects on the networks are annotated, they can be associated with one or more cellular functions such as DNA repair, cell cycle or apoptosis. The networks can be scored and prioritized based on their statistical “relevance” to the functional processes and maps. Each network is associated with a g-score, a z-score and a p-value. The z-score measures the proportion of nodes carrying data relative to the total number of nodes on the network and is used to define priority. In general, a high positive g-score means that the network is highly saturated with genes from the experimental data.
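One common way to express such a saturation score is the z-score of a hypergeometric count, sketched below in Python. Whether MetaCore computes its z-score in exactly this form is an assumption, the symbol names are chosen here for illustration, and the g-score (which additionally accounts for canonical pathway fragments) is not reproduced.

```python
from math import sqrt


def network_zscore(N, R, n, r):
    """Standard hypergeometric z-score for the saturation of one network.

    N: nodes in the whole database
    R: nodes in the database that carry experimental data
    n: nodes in the network
    r: nodes in the network that carry experimental data
    """
    fraction = R / N
    expected = n * fraction                                  # mean under random sampling
    variance = n * fraction * (1 - fraction) * (N - n) / (N - 1)
    return (r - expected) / sqrt(variance)


# Toy numbers: a 20-node network drawn from 100 nodes, 10 of which carry data.
z_expected = network_zscore(100, 10, 20, 2)   # exactly the expected count
z_enriched = network_zscore(100, 10, 20, 6)   # three times the expected count
```

A z-score near zero means the network holds about as many data-carrying nodes as chance would predict, while a large positive value indicates saturation with nodes from the uploaded data.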

The g-scores, p-values and z-scores of the networks are listed in Table 5, and the top two networks of each species are shown in Figure 5. Relative intensity corresponds to the expression value. Up-regulated genes are marked with red circles, and down-regulated genes with blue circles. The ‘checkerboard’ color indicates mixed expression for the gene between files or between multiple tags for the same gene.

Table 5 g-score, z-score and p-value of the most relevant networks
Figure 5 The top two networks scored by MetaCore (AN network) of human (A and B), rat (C and D) and mouse (E and F). A: mRNA metabolic process, RNA metabolic process and response to chemical stimulus of human; B: cell division, mitosis and nuclear division; C: peptidyl-cysteine S-nitrosylation, drug catabolic process and exogenous drug catabolic process; D: regulation of metal ion transport, ion transport and negative regulation of potassium ion transport; E: N-acetylglucosamine biosynthetic, UDP-N-acetylglucosamine biosynthetic and glucosamine biosynthetic process; F: positive regulation of immune system, regulation of immune system and response to stimulus. Thick cyan lines indicate the fragments of canonical pathways. Upregulated genes are marked with red circles; downregulated with blue circles. The ‘checkerboard’ color indicates mixed expression for the gene between files or between multiple tags for the same gene.

The network (p=4.35e-39, g-score=93.09) resulting from the experimental data is shown in Figure 5A. c-Myc is a divergence hub, and ESR1 (nuclear) is a convergence hub in the network. The c-Myc protein is a key transcription factor, and it is almost universally involved in cell cycle progression, transformation and apoptosis through the targeting of downstream genes [32]. ESR1 (Arabidopsis Enhancer of Shoot Regeneration 1) was identified as a gene that enhances in vitro shoot regeneration efficiency when over-expressed [33]. The network includes 16 possible targets: ARL5, TOP1, HBB, KNSL1, CBP80, CBP20, Calcineurin B, RENT1, SNRPA, CSTA1, CES1, C3, EGFR, EGF, Alphal-globin and PDESA. EGF is a metastatic inducer of tumor cells, which activates the epidermal growth factor receptor (EGFR)-induced signal pathway to induce cancer metastasis [34]. CES1 is the most versatile human carboxylesterase, and it plays critical roles in drug metabolism and lipid mobilization. Excessive induction of CES1 provides a mechanism for the potential protective effect of antioxidants on human health [35]. Newly synthesized messenger ribonucleoproteins (mRNPs) bound by the cap-binding protein heterodimer CBP80-CBP20 initially undergo a pioneer round of translation [36]. C3 is a main component of the complement pathway [37]. TOP1 unwinds DNA by making transient single-strand breaks that relieve the torsion of supercoiled DNA.

The network (p=41.72e−21, g-score=55.79) is shown in Figure 5B. It contains 9 drug targets: Cyclin A, Cyclin A2, HBB, C3a, TOP1, KNSL1, NUP98, TK1 and FBSW7. The divergence hub of the network was c-Jun, and the convergence hubs were c-Jun, the APC/Hcdh1 complex and dTMP cytoplasm. The c-Jun N-terminal kinase (JNK) signaling pathway plays a critical role in inflammation_complement system [38]. JNK can be activated by exposure of cells to cytokines or environmental stress, indicating that this signaling pathway may contribute to inflammatory responses [39]. Genetic and biochemical studies demonstrate that this signaling pathway also regulates cellular proliferation, apoptosis and tissue morphogenesis [40].

The networks of rat are shown in Figure 5C and D, and the networks of mouse are shown in Figure 5E and F. The networks showed well-connected clusters of root nodes and greater flexibility in their connections. Mitomycin C, which appears in Figure 5C, is used to treat a variety of malignancies, such as head and neck cancers and superficial transitional cell carcinoma of the bladder [41]. nNOS and mitomycin C are involved in the same signal pathway, in which nNOS is down-regulated by dioscin while mitomycin C is up-regulated. Thus, dioscin might act against cancer through that pathway. In Figure 5E, N-acetylglucosamine-1-phosphate uridyltransferase catalyzes the formation of UDP-GlcNAc, which is an essential precursor of peptidoglycan and of the rhamnose-GlcNAc linker region of the mycobacterial cell wall [42]. Thus, dioscin may be a potential anti-infective drug through down-regulation of UAP1.

All these results indicate that dioscin may exert its biological effects through multiple channels. For example, as a TOP1 inhibitor, dioscin would inhibit religation and stabilize the DNA-TOP1 complex in the cleaved DNA form, ultimately leading to DNA strand breaks and cell death. Thus, dioscin could be used to treat cancer through the cell cycle transition and termination of DNA replication pathway, and it could inhibit cancer metastasis through the EGFR-induced signal pathway. In addition, it could be used to treat inflammation through the JNK signaling pathway. However, dioscin may induce some side effects by down-regulating the complement system.

Conclusions

In this paper, we presented an application of the in silico inverse docking technique coupled with a bioinformatics approach to predict the possible targets, biological activities, signal pathways and regulating networks of dioscin. These results provide valuable information for future in vitro and in vivo work to validate the in silico findings.

Availability and supporting data

MetaCore is available at http://www.genego.com.