Complex behavior and networked structures emerge in systems composed of many interacting elements [1–4]. The exploration of large databases from drastically different systems has allowed us to construct complex networks and uncover their organizing principles in many disparate fields, from large communication networks [5, 6], transportation infrastructures [7] and social communities [8, 9] to biological systems [1, 10]. Recent drug development strategies suggest that multi-target drug design combined with network-oriented approaches is a promising way to combat complex multi-genetic disorders [11–13]. However, this raises the question of what the direct and indirect network-dependent effects of such drugs are. Addressing such questions requires an accurate description and fundamental knowledge of the large-scale interactions between drugs and therapies, as well as between drugs and diseases.

Breathtaking advances in pharmacology and medical science, together with improvements in data storage and management, have made it possible to organize and classify huge amounts of information on drugs and their associated diseases and therapies. The DrugBank database [14] is one of the largest chemoinformatics resources and contains detailed information about approved drugs and drug targets. Its Drug Category field also describes the therapeutic properties of each drug following the Anatomic Therapeutic Chemical (ATC) classification [15]. This knowledge allowed us, for the first time, to investigate the network of interactions between all US-approved drugs and the associated human therapies, defined by known drug-therapy relationships. This network is a bipartite graph [16] whose nodes fall into two disjoint sets, drugs (D) and therapies (T), such that each edge connects a node in D to one in T (thus, no two nodes within the same set are adjacent). This bipartite graph can be projected onto two networks. The drug projection is composed of nodes from the set D, and two drugs are connected if they share a common therapy. The therapy projection is composed of nodes from the set T, and two therapies are connected if some drug is implicated in both. Because therapies are closely linked to diseases, the therapy network also gives insight into the relations between diseases, complementing previous work on the global organization of the human disease network [17, 18].

Analyses of the bipartite graph and its projections reveal striking properties that characterize this global map of drug-therapy interactions. Our findings indicate that the network has a small average shortest-path length. In particular, the average distance between therapies is less than three steps, suggesting that distant therapies are separated by a small number of chemicals. In addition, our results indicate that much of the chemical information flowing through the network is routed through a small number of drug hubs.

To identify the main set of drugs and therapies that govern the network, we computed several network centrality metrics [19] that characterize the most influential nodes. Exploiting the correlations between pairs of these metrics then provides a complementary perspective on the heterogeneous statistical properties of the network. We identified a sub-network composed of drugs with high betweenness centrality in the drug-therapy network, which represents the structural backbone of this system. The drugs with highest centrality include Scopolamine, Morphine, Tretinoin and Magnesium Sulfate. Special attention should be given to drugs that combine two properties: (a) a high centrality value in the drug-therapy network and (b) action on multiple molecular targets in the human system.


Drug and therapy networks

The hierarchical structure of the ATC classification makes it possible to represent drug and therapy networks at five different levels, progressively revealing more detail on interaction patterns. A striking observation is that the therapy network is fully connected at level 1 (Figure 1 and Additional file 1) and still almost fully connected at level 2, with the exception of one small isolated component. This finding is unexpected, since many drugs have only one specific therapeutic application (Figure 2). By computing the number of connections of each node of the therapy network (i.e., the node degree k) at level 3, we found that the degree distribution follows a power law P(k) ∝ k^(−γ) with degree exponent γ = 1.1 [20]. That is, the probability of finding highly connected therapies, or hubs, is considerably higher than in an equivalent random network. Furthermore, the smaller the value of γ, the more influential the hubs are in the network. Because the observed degree exponent is low, we conclude that these highly connected therapies play an important role in this network. In addition, the scale-free property of the therapy network is conserved across the hierarchy of the ATC classification, as it is observed at both levels 2 and 3 (Figure 3a–b). On the other hand, the drug projection at level 3 shows a skewed degree distribution with a broad tail and a saturation for low degrees (Additional file 2). This pattern resembles the distributions observed in other bipartite networks [21].
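The fitting procedure behind the reported γ is not detailed here. As a rough illustration only, the sketch below assumes a plain least-squares line through log P(k) versus log k over the distinct observed degrees, with no binning or cutoff selection; the function name `fit_degree_exponent` is hypothetical, and maximum-likelihood estimators are generally preferred for real degree data.

```python
import math
from collections import Counter


def fit_degree_exponent(degrees):
    """Estimate gamma in P(k) ~ k^(-gamma) by least squares on log-log axes.

    Assumption: a simple linear fit of log P(k) against log k over the
    distinct observed degrees k >= 1 (no binning, no k_min selection).
    """
    counts = Counter(k for k in degrees if k > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    total = sum(counts.values())
    ks = sorted(counts)
    xs = [math.log(k) for k in ks]
    ys = [math.log(counts[k] / total) for k in ks]  # empirical log P(k)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # P(k) ~ k^(-gamma), so the fitted slope equals -gamma
    return -slope


# Toy degree sequence whose frequencies fall exactly as 1/k
toy_degrees = [1] * 64 + [2] * 32 + [4] * 16 + [8] * 8
print(round(fit_degree_exponent(toy_degrees), 3))  # 1.0
```

A least-squares fit is easy to read but sensitive to noisy low-count tails, which is why it is offered here only as an illustration of the relation P(k) ∝ k^(−γ).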

Figure 1

a, b: The therapy network at level 1 (a) and 2 (b). Nodes are colored according to the first level of the ATC classification. The size of nodes is proportional to the number of therapies in the class. The thickness of edges is proportional to the number of drugs linking the two therapies. c: Distribution of shortest path lengths in level 2 of the therapy network.

Figure 2

Distribution of the number of ATC identifiers associated with each drug (corresponding to level 5 of the ATC classification).

Figure 3

a, b: Degree distribution of the therapy network at level 2 (a), with degree exponent γ = 0.76 ± 0.10, and at level 3 (b), with γ = 1.11 ± 0.14. c, d: Correlation between the node degree k and the betweenness centrality B_i (c) and the closeness centrality C_i (d) in the drug network at level 2. The correlation coefficient r is indicated in each panel. The P-value is below 2.2 × 10^−16 in all cases.

The full bipartite network (Additional files 3, 4, 5, 6, 7, 8) shows that most drugs are grouped in clusters connected to a single specific therapy. However, therapies are linked to one another by drugs spanning different therapeutic classes. These drugs acquire particular significance, since they create the links between different therapies that keep the therapy network connected. This observation can be examined quantitatively with the histogram of the number of complete therapy identifiers (level 5 of the ATC classification) associated with each drug, shown in Figure 2. This histogram reveals that most drugs (79%) are associated with a single therapy. These drugs create no connection in the therapy projection, so all of its edges are due to the remaining 21% of drugs. It is surprising that the therapy network remains fully connected despite this small proportion. Moreover, edges do not predominantly connect therapies belonging to the same first-level class. This is visible in Figure 1, where nodes are colored according to their first-level class and a large number of links run between distinct classes. Note also that since only 21% of drugs create connections at level 5, the proportion can be no larger at lower levels, because moving to a lower level merges therapy nodes: a drug whose codes share the same lower-level prefix maps onto a single node there.
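The argument about lower levels can be checked on toy data. The sketch below assumes that a therapy at a given level is identified by the corresponding prefix of the level-5 ATC code (1, 3, 4, 5 and 7 characters for levels 1 to 5); the drug names, their codes and the function `fraction_linking_drugs` are hypothetical and illustrative only.

```python
# Number of characters of an ATC code at each level of the hierarchy,
# e.g. "A10BA02" -> A (1), A10 (2), A10B (3), A10BA (4), A10BA02 (5)
ATC_PREFIX_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def fraction_linking_drugs(drug_codes, level):
    """Fraction of drugs mapped to two or more distinct therapies at `level`.

    Only these drugs create edges in the therapy projection.
    """
    cut = ATC_PREFIX_LENGTH[level]
    linking = sum(
        1 for codes in drug_codes.values() if len({c[:cut] for c in codes}) >= 2
    )
    return linking / len(drug_codes)


# Hypothetical toy data: drug -> list of level-5 ATC codes
toy = {
    "drug_a": ["A10BB03"],
    "drug_b": ["A10BA02", "A10BB01"],
    "drug_c": ["C01CA24", "B02BC09"],
    "drug_d": ["N02AA01"],
}
for level in (5, 3, 1):
    print(level, fraction_linking_drugs(toy, level))
# 5 0.5, then 3 0.25, then 1 0.25
```

In this toy case, "drug_b" links two therapies at level 5 but collapses onto the single node A10B at level 3, so the fraction drops, as the argument predicts.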

Shortest paths

Shortest paths provide a measure of the efficiency of information flow in a network. For example, the efficiency of the chemical mass flux in a metabolic network can be estimated by computing its average shortest path length. Here, by investigating the therapy network projection constructed at level 2 of the ATC hierarchical classification, we found that the average distance between two randomly selected therapies is less than three steps (2.61), which is very low (Figure 1c). The level 2 therapy network is composed of 66 nodes and 237 edges, and its main connected component has 64 nodes and 236 edges. This implies that, on average, distant therapies are separated by a surprisingly low number of chemical compounds. This value increases to 3.41 when level 3 of the therapy network is considered (Additional file 9). The level 3 therapy network has 123 nodes and 349 edges, and its main connected component consists of 106 nodes and 338 edges.
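A minimal sketch of the average shortest path computation, assuming an unweighted, undirected graph stored as a dictionary of neighbor sets and assuming the average is taken over the main connected component (distances between disconnected nodes are undefined). Breadth-first search gives hop distances; all function names are hypothetical.

```python
from collections import deque


def make_adj(edges):
    """Undirected adjacency map (node -> set of neighbors) from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def main_component(adj):
    """Node set of the largest connected component."""
    seen, best = set(), set()
    for node in adj:
        if node not in seen:
            comp = set(bfs_distances(adj, node))
            seen |= comp
            if len(comp) > len(best):
                best = comp
    return best


def average_shortest_path(adj):
    """Mean shortest-path length over all node pairs of the main component."""
    total, pairs = 0, 0
    for s in main_component(adj):
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1  # ordered pairs; the mean is the same as for unordered
    return total / pairs if pairs else 0.0


# Toy graph: a 4-node path plus a separate 2-node component
adj = make_adj([("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")])
print(round(average_shortest_path(adj), 4))  # 1.6667
```

The isolated pair ("e", "f") plays the role of the small component excluded from the level 2 main component.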

It is known that the average path length ⟨l⟩ is smaller in the Barabási-Albert network than in a random graph, for any network size n. That is, a scale-free topology is more efficient at connecting distant nodes than a random structure. Bollobás and Riordan [22] have shown that the average path length of a scale-free model network scales as ⟨l⟩ ~ ln(n)/ln ln(n). Evaluating this expression for the therapy networks at levels 2 and 3 gives 2.92 and 3.05, respectively. These values are compatible with the observed values of 2.61 and 3.41, which is consistent with the scale-free topology observed in the therapy network (Figure 3a–b).
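The leading-order expression can be evaluated directly. This sketch drops all constant factors, as the expression in the text does, so it is only an order-of-magnitude comparison; the function name is hypothetical. With n = 64, the size of the main connected component of the level 2 therapy network, it evaluates to about 2.92.

```python
import math


def scale_free_path_length(n):
    """Leading-order estimate <l> ~ ln(n) / ln(ln(n)) for a network of n nodes."""
    if n <= math.e:
        raise ValueError("n must exceed e so that ln(ln(n)) is positive")
    return math.log(n) / math.log(math.log(n))


print(round(scale_free_path_length(64), 2))  # 2.92
```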

High-centrality drugs and network backbones

We investigated the betweenness of network nodes, a graph-theoretical centrality metric. While node degrees describe general topological features of the network, the degree k of a single node captures only its local structure (its nearest neighbors). In contrast, the betweenness B_i of a node i measures how frequently that node lies on the shortest paths between all pairs of nodes in the network [16, 23]. Hence, betweenness centrality identifies nodes with great influence over how information reaches distant parts of the network. This metric is relevant because it connects the local network structure to the global network architecture. In another context it has been shown to be an indicator of interdisciplinarity [24], and it has been used successfully in research areas ranging from the yeast protein interactome, to detect essential proteins and their evolutionary age [25], to epidemics, to identify key players in spreading an infection [26].

In Table 1, we show the top-20 drugs with highest betweenness in the drug network projection at level 2 of the ATC hierarchical classification. This information is complemented with the closeness centrality C_i, which measures how close a given node i is to all others [27]. In some contexts, closeness can be understood as a measure of how long it takes for information to spread from a given node to distant nodes in the network: nodes with high closeness can influence others more rapidly. In this network, closeness is relatively high and homogeneous for most nodes. Table 1 shows that, in most cases, the drugs with highest betweenness also have relatively high closeness. The generic drug name and the associated therapy classes are also displayed. The main connected component of the level 2 drug network consists of 828 nodes, with an average path length of 3.15.

Table 1 Top-20 drugs with highest betweenness centrality in the drug network projection corresponding to level 2 of the ATC hierarchical classification. The associated therapy classes as well as closeness centrality, node degree and number of targets are also displayed.

We identified a reduced bipartite drug-therapy network composed of the drugs with highest betweenness centrality (Figure 4). This sub-network reflects the structural backbone of the drug-therapy system and has great influence over the chemical information flowing through the network. Nodes with high betweenness centrality are relevant because they bridge interactions between distinct parts of the network. The drugs with highest centrality include Scopolamine, Morphine, Tretinoin and Magnesium Sulfate. Notably, the backbone is also almost fully connected, with the exception of two small isolated components. Distant therapies can be connected by a few drugs with high betweenness. For example, Tolbutamide and Magnesium Sulfate define a key shortest path of length two between distant therapies such as "Insulins and analogues" (A10) and "Dermatological preparations" (D11). "Cardiac therapy" (C01) is directly connected to "Antihemorrhagics" (B02) via the drug Epinephrine. Apparently unrelated disorders such as diabetes and dermatological lesions are thus separated by far fewer chemicals than might be expected.
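How such a backbone can be extracted and its components counted is sketched below, assuming the backbone is simply the top-k drugs by betweenness together with all of their therapy links. The selection rule, the toy data and the function name `backbone` are hypothetical; a union-find structure counts the connected components.

```python
def backbone(drug_therapies, scores, k=20):
    """Return the (drug, therapy) edges of the k top-scoring drugs and the
    number of connected components of that bipartite sub-network.

    drug_therapies: drug -> set of therapy codes (e.g. level-2 ATC classes)
    scores: drug -> betweenness centrality in the drug projection
    """
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    edges = [(d, t) for d in top for t in sorted(drug_therapies.get(d, ()))]

    # Union-find over tagged nodes so that a drug and a therapy never collide
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for d in top:
        find(("drug", d))  # register every selected drug, even without edges
    for d, t in edges:
        parent[find(("drug", d))] = find(("therapy", t))
    n_components = len({find(x) for x in list(parent)})
    return edges, n_components


# Hypothetical toy data
toy_therapies = {
    "drug_a": {"A10", "C01"},
    "drug_b": {"C01", "D11"},
    "drug_c": {"N02"},
    "drug_d": {"A10"},
}
toy_scores = {"drug_a": 0.30, "drug_b": 0.20, "drug_c": 0.10, "drug_d": 0.05}
edges, n_comp = backbone(toy_therapies, toy_scores, k=3)
print(len(edges), n_comp)  # 5 2
```

In the toy case, "drug_a" and "drug_b" chain A10, C01 and D11 into one component, while "drug_c" forms a small isolated component with N02.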

Figure 4

The top-20 drugs of highest betweenness centrality and their associated therapies at level 2. Drugs are represented by dark blue empty diamonds, therapies are represented by circles and colored following the same code as in Figure 1.

Correlations between network measures

We investigated the correlations between each pair of the three measures of topological importance (degree k, betweenness B_i and closeness C_i) in the drug network by calculating their Pearson correlation coefficients and P-values. Our results show that the node degree k correlates significantly with both closeness and betweenness centralities (Figure 3c–d). This finding can be explained by mechanisms similar to those generating the scale-free property in other networks: in scale-free models, it is well known that high-connectivity nodes also exhibit high betweenness and closeness centralities. This finding is therefore consistent with the scale-free topology observed in the therapy network. However, betweenness centrality does not correlate well with the number of drug targets.
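The Pearson coefficient used for these comparisons can be written in a few lines of plain Python. The sketch computes r only; if SciPy is available, `scipy.stats.pearsonr` returns both r and a two-sided P-value. The degree and betweenness values in the example are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical node degrees and (unnormalized) betweenness scores
degree = [1, 2, 3, 4, 5]
betw = [0, 1, 1, 3, 5]
print(round(pearson_r(degree, betw), 3))  # 0.949
```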

The presence of significant correlations between these measures suggests that organizing principles underlie the man-made drug-therapy/disease system. The chemical information that connects distant diseases and drugs is routed through a small number of drug and therapy nodes with wide influence over the global network. Future multi-target drugs might be designed along similar principles.

Multi-target drugs

Previous studies have revealed that the distribution of the number of targets of approved drugs follows a power law [28]. Most drugs act on only one target, but a small number act on many targets, up to 14. The development of drugs that affect multiple targets is seen as promising for treating complex diseases [29, 30]. A special role is expected to be played by drugs that combine two properties: (a) a high betweenness centrality and (b) action on multiple targets. These drugs occupy pivotal positions in the drug-therapy network, as they not only connect heterogeneous therapies but also influence multiple metabolic pathways. Several drugs in Table 1 meet these two criteria, including Hydroxocobalamin, Vitamin B3, Vitamin B12, Atropine, Orphenadrine and Procaine.


The importance of studying the global interactions involved in the action of drugs has recently been widely recognized. Goh et al. [17] presented a detailed network analysis of the interactions between human genes and diseases. These interactions define a bipartite network consisting of two sets of nodes: one set represents genetic disorders, and the other corresponds to all known disease-related genes. In the disease network, the number of genes related to a given disorder exhibits a broad distribution. In addition, this projection shows a skewed degree distribution described by a generalized power law, where a few disorders like colon cancer (linked to k = 50 other disorders) and breast cancer (k = 30) are connected to a large number of different disorders. Here, in contrast, we investigated the drug-therapy bipartite network. Its therapy projection, in which therapies are connected if some drug is implicated in both, also revealed a fat-tailed degree distribution that deviates from the random case and suggests the prominence of some diseases over others.

Paolini et al. [31] constructed a target protein network by integrating several pharmacological resources and studied the properties of this pharmacological space using several chemical quantities: binding affinities, molecular weights, promiscuity, and octanol/water partition coefficients (clogP). Ma'ayan et al. [29] and Yıldırım et al. [32] explored, in parallel, the relationships between drugs and their targets by means of a drug-target network and its drug and target protein projections. Both projections showed apparently scale-free degree distributions. Furthermore, disease-related genes and drug targets showed a small average shortest distance. A significant shift toward higher weights was observed in the distribution of shortest paths when the targets of drugs approved in the last 10 years were compared with those approved before 1996. This reveals a move toward more rational drug design and reflects the importance of developing new multi-target drugs that shorten the average distance in the drug-target network. Our results in both the drug and therapy projections likewise support the picture of networks in which node pairs are separated by short distances on average.

However, none of these works analyzed the relationships between drugs and therapies. Our analysis provides the first view of the relationships between therapies as defined by drug-therapy interactions, revealing that distant therapies are separated by a low number of chemical compounds. It also reveals the role of particular drugs with high betweenness centrality in connecting network components of distinct therapy classes. In the future, combining this analysis with the above-mentioned works should lead to the construction of an integrated tripartite network. Analyzing this integrated network can be expected to reveal fundamental properties of the global relationships between drugs, their targets, and diseases.


Even though new high-throughput technologies have generated large amounts of genomic data, drug design has not kept pace, and developing new single-target drugs remains complicated and expensive. In contrast, new findings suggest that multi-target drugs not only maximize the number of possible points of action but also enable novel network-disruption and systems-oriented strategies [12, 13]. Therefore, multi-target drug design combined with a network-dependent approach [11] is a promising concept for combating diseases based on multi-genetic disorders, such as cancer, and diseases that involve a variety of cell types, such as immunoinflammatory disorders and diabetes [33]. Although the application of these strategies to drug development is still incipient, early results are encouraging [34] and suggest that controlling a complex disease system should involve the simultaneous disruption of multiple targets located in distant network pathways.


Database and ATC classification

The DrugBank database is a bioinformatics and chemoinformatics resource developed at the University of Alberta that contains detailed drug and drug target information [14]. The database contains nearly 4300 drug entries, including 1200 small-molecule drugs approved by the US Food and Drug Administration (FDA). Each entry contains more than 80 data fields with detailed chemical, pharmacological and pharmaceutical drug data, as well as sequence, structure and pathway information on drug targets.

The Drug Category field contains information on the therapeutic properties and general category of each drug. Therapeutic properties are entered following the Anatomic Therapeutic Chemical (ATC) classification [15]. The ATC system is used by the World Health Organization as an international standard for drug utilization studies. It divides drugs into groups according to the organ or system on which they act and their chemical, pharmacological and therapeutic properties. Drugs are classified in groups at five different levels. The first level of the code indicates the anatomical main group and consists of one letter; there are 14 main groups. The second level indicates the therapeutic main group. The third and fourth levels are chemical/pharmacological/therapeutic subgroups, and the fifth level is the chemical substance. The hierarchical structure of the ATC classification provides an ideal framework for analyzing the relations between drugs and therapeutic applications. Each level reveals complementary information, making it possible to navigate between different resolutions.
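Because each level is a prefix of the full code, the five levels can be read directly off a level-5 ATC code. The sketch relies on the standard layout of ATC codes (one letter, two digits, one letter, one letter, two digits), which the text above does not spell out; the function name `atc_levels` is hypothetical.

```python
# Cumulative code length at levels 1..5: letter, 2 digits, letter, letter, 2 digits
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code):
    """Split a full (level-5) ATC code into its five hierarchical prefixes."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"expected a 7-character level-5 ATC code, got {code!r}")
    return [code[:n] for n in ATC_LEVEL_LENGTHS]


print(atc_levels("A10BB03"))  # ['A', 'A10', 'A10B', 'A10BB', 'A10BB03']
```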

Network construction

We extracted the set of ATC identifiers associated with each FDA-approved drug to construct a bipartite network of drugs and therapies. Of these drugs, 186 had no ATC identifier and were discarded, leaving a network of 1014 drugs. Two projections of this bipartite network can be constructed. In the drug projection, nodes represent drugs, and two nodes are connected if they share a common therapeutic property. In the therapy projection, nodes represent therapies, and two nodes are connected if a common drug belongs to both therapeutic categories. Thanks to the hierarchical structure of the ATC classification, five networks can be constructed for each of these two projections. In the therapy projection, the number of nodes decreases at lower ATC levels as several therapies are merged. In the drug projection, the number of nodes is independent of the ATC level, but the number of edges increases at lower levels. Networks were drawn using the Cytoscape software [35].
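The construction described above can be sketched as follows, assuming the input maps each drug to its list of level-5 ATC codes and that a therapy at a given level is the corresponding code prefix. Edge weights (for example, the number of drugs linking two therapies, used for edge thickness in Figure 1) are not tracked here; the function name and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

ATC_PREFIX_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def build_projections(drug_codes, level):
    """Build the bipartite drug-therapy graph and its two projections.

    drug_codes: drug -> iterable of level-5 ATC codes; drugs without any
    code are discarded. Each returned graph maps a node to a set of nodes.
    """
    cut = ATC_PREFIX_LENGTH[level]
    bipartite = {
        d: {c[:cut] for c in codes} for d, codes in drug_codes.items() if codes
    }

    drugs_by_therapy = defaultdict(set)
    for drug, therapies in bipartite.items():
        for t in therapies:
            drugs_by_therapy[t].add(drug)

    drug_proj = {d: set() for d in bipartite}
    therapy_proj = {t: set() for t in drugs_by_therapy}
    # Two drugs are linked if they share at least one therapy
    for drugs in drugs_by_therapy.values():
        for a, b in combinations(sorted(drugs), 2):
            drug_proj[a].add(b)
            drug_proj[b].add(a)
    # Two therapies are linked if at least one drug belongs to both
    for therapies in bipartite.values():
        for a, b in combinations(sorted(therapies), 2):
            therapy_proj[a].add(b)
            therapy_proj[b].add(a)
    return bipartite, drug_proj, therapy_proj


# Hypothetical toy input; "drug_d" has no ATC code and is discarded
toy_codes = {
    "drug_a": ["A10BB03"],
    "drug_b": ["C01CA24", "B02BC09"],
    "drug_c": ["C01DA02"],
    "drug_d": [],
}
bip, drug_proj, therapy_proj = build_projections(toy_codes, level=2)
print(sorted(drug_proj["drug_b"]), sorted(therapy_proj["C01"]))  # ['drug_c'] ['B02']
```

Calling the function with different `level` values leaves the set of drug nodes unchanged while therapy nodes merge, matching the behavior of the two projections described above.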

Definition of network metrics

Betweenness (B_i) is a centrality measure of a node in a network. This metric goes beyond local information and reflects the role played by a node in the global network architecture. It is calculated as the fraction of shortest paths between node pairs that pass through a given node.

For a graph G = (V, E) with n nodes, the betweenness B_i of a node i reads:

$$B_i = \frac{1}{(n-1)(n-2)} \sum_{s \neq i \neq t \in V} \frac{\sigma_{st}(i)}{\sigma_{st}}$$

where σ_st is the number of shortest paths from s to t, and σ_st(i) is the number of shortest paths from s to t that pass through node i [23]. This measure is normalized by the number of pairs of nodes not including i, that is, (n − 1)(n − 2).
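The formula can be evaluated efficiently with Brandes' dependency-accumulation algorithm, a standard technique swapped in here for illustration (the text does not state which implementation was used). The sketch assumes an unweighted, undirected graph given as a dictionary of neighbor sets. Summing dependencies over every source counts each pair (s, t) in both orders, which matches the ordered-pair sum and the (n − 1)(n − 2) normalization.

```python
from collections import deque


def betweenness(adj):
    """Normalized betweenness B_i for every node of an unweighted, undirected graph."""
    nodes = list(adj)
    n = len(nodes)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Single-source BFS: count shortest paths (sigma) and record predecessors
        stack, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    if n > 2:
        norm = 1.0 / ((n - 1) * (n - 2))
        for v in score:
            score[v] *= norm
    return score


# The center of a star lies on every shortest path between leaves
star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
print(betweenness(star)["c"])  # 1.0
```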

Closeness centrality (C_i) measures how close a node i is to all others in the same network. It is inversely related to the mean shortest-path distance between node i and all other nodes reachable from it:

$$C_i = \frac{n}{\sum_{j \in V} d(i,j)}$$

where d(i,j) is the shortest distance between nodes i and j, and n is the number of nodes in the network [27].
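A minimal sketch of this definition, assuming an unweighted graph stored as a dictionary of neighbor sets, n taken as the number of nodes in the whole network, and the sum restricted to nodes reachable from i (unreachable nodes would make the distance infinite). Returning 0 for an isolated node is an assumption of this sketch; the function name is hypothetical.

```python
from collections import deque


def closeness(adj, i):
    """C_i = n / sum_j d(i, j), summing over the nodes reachable from i."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    # Assumption: an isolated node (no reachable neighbors) gets closeness 0
    return len(adj) / total if total > 0 else 0.0


path3 = {0: {1}, 1: {0, 2}, 2: {1}}
print(closeness(path3, 1), closeness(path3, 0))  # 1.5 1.0
```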

The average shortest path length is defined as the average number of steps along the shortest paths over all possible pairs of network nodes. This metric indicates the efficiency of information flow in a network.

Node degree k is the number of edges connected to a given node.