Systems Pharmacogenomic Landscape of Drug Similarities from LINCS data: Drug Association Networks

Musa, Aliyu; Tripathi, Shailesh; Dehmer, Matthias; Yli-Harja, Olli; Kauffman, Stuart A.; Emmert-Streib, Frank

doi:10.1038/s41598-019-44291-3

Article
Open access
Published: 24 May 2019

Volume 9, article number 7849, (2019)

Scientific Reports

Aliyu Musa^1,2,
Shailesh Tripathi^1,6,
Matthias Dehmer^3,4,6,
Olli Yli-Harja^2,5,7,
Stuart A. Kauffman⁷ &
…
Frank Emmert-Streib ORCID: orcid.org/0000-0003-0745-5641^1,2

Abstract

Modern research in the biomedical sciences is data-driven, utilizing high-throughput technologies to generate big genomic data. The Library of Integrated Network-based Cellular Signatures (LINCS) is an example of a large-scale genomic data repository providing hundreds of thousands of high-dimensional gene expression measurements for thousands of drugs and dozens of cell lines. However, the remaining challenge is how to use these data effectively for pharmacogenomics. In this paper, we use LINCS data to construct drug association networks (DANs) representing the relationships between drugs. By using the Anatomical Therapeutic Chemical (ATC) classification of drugs, we demonstrate that the DANs represent a systems pharmacogenomic landscape of drugs that meaningfully summarizes the entire LINCS repository on a genomic scale. Here we identify the modules of the DANs as therapeutic attractors of the ATC drug classes.

Introduction

The recent availability of large-scale pharmacogenomic data has presented new opportunities but also challenges for tailored patient treatment, drug design and drug safety^1,2. Vast efforts have been invested in discovering the drug mode-of-action (MoA) and understanding the genetic interactions within cells for disease treatment³. Importantly, it has been found that drug-induced transcriptional profiles from cell lines can be used to characterize therapeutic effects, enabling new computational approaches in pharmacogenomics for identifying small drug molecules, compounds and drug-drug similarities solely based on gene expression profiles^4,5,6,7.

The Library of Integrated Network-based Cellular Signatures (LINCS) program⁸ (https://clue.io/), funded by the Big Data to Knowledge (BD2K) Initiative at the National Institutes of Health (NIH), generated genetic and molecular signatures of human cell lines in response to various perturbations. The LINCS data repository is a vast library of gene expression profiles covering seventy-two human cell lines and includes experiments for thousands of chemical perturbagens (small drug molecules) and drugs added to the cell cultures to induce changes in the gene expression profiles. The LINCS data are publicly available from the Gene Expression Omnibus (GEO) database. Based on these data, several advanced computational methods have been proposed for drug repurposing, identification of mode-of-action (MoA) and discovering phenotypic relations^9,10,11; for an overview see¹². Gene expression data can be utilized as surrogates for the structure of chemical compounds to study mechanism of action and phenotypic impact between compounds^{13,14,15,16,17} because it has been shown in¹⁸ that structurally similar compounds have similar gene expression profiles. Furthermore, compounds with similar gene expression signatures tend to interact with similar protein targets¹⁹.

Traditionally, pharmacology approaches focus on single drugs at a time to study their action, effects or safety²⁰. This is similar to traditional molecular biology approaches that focused on single genes or proteins²¹. However, due to modern genomic high-throughput technologies, it is nowadays possible to study many genes or proteins simultaneously²². Pharmacogenomics and Systems Pharmacogenomics aim to utilize such genomic profiles to expand beyond single drugs²³. For instance, in²⁴ drug-target and drug-drug networks have been constructed based on the DrugBank database utilizing information about FDA approved and non-approved drugs and their corresponding targets. However, their analysis focused exclusively on drugs and compounds with known targets and did not take into consideration dynamic activity profiles as represented, e.g., by transcriptomics data. In²⁵ some disadvantages were avoided by using gene expression profiles for which Pearson correlation-based networks were constructed. A problem is that the data used were generated by many independent, uncoordinated laboratories using varying platforms and sample preparations. Another drawback of this study is the small number of profiles used (<7,000) and the very limited number of drugs studied (~200). Similar data were used in^4,17 but the construction of the drug network differed. Also, their analysis focused on drugs with known MoA. A different approach has been taken in²⁶ where a drug-drug network has been constructed based only on known side effects of FDA approved drugs. Drawbacks are the sole focus on negative clinical parameters, the limitation to FDA approved drugs and the neglect of dynamical aspects of drug effects. In²⁷, in addition to gene expression data, information about chemical structures and drug responses has also been used. Unfortunately, the number of drugs for which all three sources of data are available is very limited. A common shortcoming of all these studies is a lack of conceptual explanations of the drug networks.

The ultimate goal in pharmacology is to know all properties, effects and actions of all drugs and compounds²⁸. Hypothetically, this information could be obtained from clinical trials testing each compound for every existing disease including subtypes and stages. From this information one could measure the similarity between different compounds, e.g., based on clinically relevant parameters. This would give the network structure of an ideal compound-space containing all relationships among all compounds, corresponding to an ideal drug association network (iDAN). Due to the practical impossibility of such an approach, the question is whether it is possible to approximate such an iDAN by using genomics data.

The main purpose of our paper is to introduce a computational method that provides such an approximation leading to a systematic organization for the thousands of drugs and small compounds that are available from the LINCS repository. Specifically, we introduce a method for constructing Drug Association Networks (DAN) based on almost two million gene expression profiles for over 20,000 chemical perturbagens and seventy-two human cell lines. In these networks nodes correspond to drugs and two drugs are connected if their profile responses are similar, as measured by the statistical significance of the Jaccard Index (JI). The profile responses for each drug correspond to estimates of “consensus” signature profiles summarizing the transcriptional effect of drugs across multiple treatments on different cell lines and/or different dosages and time points. Overall, the DANs provide a systematic summary of the entire LINCS data repository and the complex pharmacogenomic landscape of drug similarities. For a conceptual overview see Fig. 1A.

For obtaining pharmacogenomically meaningful networks, we construct different DANs based on data from different conditions. Specifically, we construct for each cell line a DAN using only the corresponding drug signature profiles. Furthermore, we construct one DAN limited to FDA approved drugs and one DAN for all drugs and small compounds (comprising FDA approved and non-approved drugs). This leads to condition-specific DANs (see Fig. 1C for their dependencies). In total, we are inferring 74 different DANs.

In order to analyze and interpret the DANs, we investigate the DANs on three different levels. First, we study the structure of the DANs by identifying network modules, also called communities^29,30,31. This will allow us to gain insights into the structural properties of the networks. Second, we study drugs pairwise by identifying the presence of significant Anatomical Therapeutic Chemical (ATC) classes in the entire network. This analysis step will show that drugs with similar ATC classes are actually identified in compound space. Third, we study the enrichment of the network modules with respect to ATC classes. By using the ATC classification of drugs, we will demonstrate that the DANs represent a pharmacogenomic landscape of drugs summarizing the entire LINCS repository on a genomic scale.

As a general result, we will show that the ATC code enriched modules in the DANs can be seen as therapeutic attractors of drug classes. We will see that this allows a conceptual extension of the idea of cancer attractors³² introduced for gene regulatory networks to represent cell states^33,34 to DANs representing pharmacological states.

Furthermore, in order to communicate the wealth of our obtained results efficiently, we developed a web interface accessible at (http://dan-network.herokuapp.com). Our web application provides access to the drug-drug interactions inferred by our method and links to external resources. The features of our DAN user interface enable searching, browsing, exploration and downloading of the network visualizations.

The paper is organized as follows. In the next section we present the Materials and Methods used for our analysis. Then we present our Results and a Discussion. This paper finishes with Conclusions.

Results

In the following, we first construct DANs from different information corresponding to different characteristics of the LINCS data. This results in DANs having a context specific meaning. Then we will analyze the DANs on three different levels. First, we focus on the structure of the DANs identifying modules in the networks. Second, we study drugs pairwise by identifying the presence of significant ATC classes in the entire network. Third, we study the enrichment of the network modules with respect to ATC classes.

Construction of drug association networks

The first network is constructed for FDA-approved drugs with assigned annotations in DrugBank^35,36. For this reason we call this network N_approved. In total, there are 1139 approved drugs in LINCS; however, only 381 have an ATC annotation. The drugs with DrugBank IDs are repeated in multiple experiments; therefore, the landmark genes have multiple z-scores from different experiments. We first average the z-scores for each drug from different experiments and use the consensus of the z-scores to construct the DAN, as described in the Methods section. From this analysis, we obtain a network with 381 nodes and 4251 significant interactions. From this network, we extract the giant connected component (GCC) having 367 drugs (nodes) and 4244 interactions (edges). In Fig. 2A, we show the distribution of the JI of all significant interactions for this network from profiles having between 100 and 150 differentially expressed genes (DEGs).
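The averaging step described above can be sketched as follows. This is only a minimal illustration, assuming each experiment is given as a mapping from landmark gene to z-score; the function name `consensus_profile` and the toy values are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def consensus_profile(experiments):
    """Average the z-scores of each gene over all experiments of one drug.

    `experiments` is a list of dicts mapping gene name -> z-score
    (an assumed input format used only for this illustration).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for profile in experiments:
        for gene, z in profile.items():
            sums[gene] += z
            counts[gene] += 1
    return {gene: sums[gene] / counts[gene] for gene in sums}


# Two hypothetical experiments (e.g. different doses) for the same drug.
experiments = [
    {"G1": 2.0, "G2": -1.0},
    {"G1": 4.0, "G2": -3.0},
]
consensus = consensus_profile(experiments)
```

A plain mean is used here because the text speaks of averaging; any weighting across cell lines, dosages or time points would be a further assumption.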

The second network, which we call N_all, is constructed for all available drugs. In the LINCS data there are in total 2505 different drugs applied in the different experiments (cell line, dosage and time point). For these, we construct a network with 2505 drugs and 86,585 significant interactions. From this network, we extract the GCC having 2451 nodes and 22636 interactions. In Fig. 2B, we show the distribution of the JI of all significant interactions for this network from profiles having between 700 and 800 DEGs. The higher the value of the JI, the more genes are commonly up- or down-regulated between two drugs.
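To make the meaning of the JI concrete, the following sketch computes it for two consensus profiles. It assumes that a gene counts as differentially expressed when its absolute z-score reaches a cutoff and that up- and down-regulation are treated as distinct; the cutoff of 2.0, the helper names and the toy profiles are illustrative assumptions, and the significance test used for edge selection is not reproduced here.

```python
def signed_degs(zscores, cutoff=2.0):
    """Return the set of (gene, direction) pairs with |z| >= cutoff."""
    degs = set()
    for gene, z in zscores.items():
        if z >= cutoff:
            degs.add((gene, "up"))
        elif z <= -cutoff:
            degs.add((gene, "down"))
    return degs


def jaccard_index(a, b):
    """Jaccard index |A & B| / |A | B|; taken as 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy consensus z-scores for two hypothetical drugs.
drug_k = {"G1": 3.1, "G2": -2.5, "G3": 0.4, "G4": 2.2}
drug_l = {"G1": 2.8, "G2": 2.6, "G3": -0.1, "G4": 2.9}

ji = jaccard_index(signed_degs(drug_k), signed_degs(drug_l))
```

In this toy case G1 and G4 are up-regulated by both drugs, whereas G2 is regulated in opposite directions and therefore contributes to the union only.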

Next, we construct 72 networks that are specific for the 72 cell lines. All of these networks are sub-graphs of N_all, i.e., $N_{\mathrm{all}}^{CL_l} \subset N_{\mathrm{all}}$, with CL = {list of cell lines in LINCS}, due to the way we summarize all configurations, see Eqn. 5. In addition, it holds $N_{\mathrm{all}} = \bigcup_{CL_l \in CL} N_{\mathrm{all}}^{CL_l}$. That means, N_all contains all significant interactions identified for any cell line.
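The sub-graph and union relations can be illustrated on edge sets. The sketch below assumes undirected edges stored as sorted pairs; the cell-line keys `CL_a` and `CL_b` and the drug names are hypothetical placeholders.

```python
def canonical_edge(u, v):
    """Store an undirected edge as a sorted pair so (u, v) equals (v, u)."""
    return (u, v) if u <= v else (v, u)


# Hypothetical significant interactions found in two cell lines.
cell_line_edges = {
    "CL_a": {canonical_edge("d1", "d2"), canonical_edge("d2", "d3")},
    "CL_b": {canonical_edge("d3", "d2"), canonical_edge("d4", "d1")},
}

# N_all as the union of all cell-line-specific edge sets.
n_all = set().union(*cell_line_edges.values())

# Every cell-line network is contained in the union by construction.
all_contained = all(edges <= n_all for edges in cell_line_edges.values())
```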

For our further analysis, we select from these 72 networks the five networks having the highest number of interactions between the drugs; see Fig. 2C for the frequency distribution of interactions for all cell lines. These cell lines are {MCF7, VCAP, PC3, A549, A375}. These 5 networks contain the most information, assuming interactions provide informative knowledge. The high number of interactions in each of these networks (more than 10,000) ensures also that a sensible identification of modules is feasible.

In Table 4, we show a summary of these seven networks and their number of nodes and edges. All of these networks correspond to the GCC of the corresponding network. In the following, we will limit our analysis to these seven networks.
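Extracting the giant connected component mentioned above amounts to finding the largest connected component of the network. A minimal sketch using breadth-first search is given below; the function name and the toy edge list are assumptions for illustration, and nodes without any edge are ignored.

```python
from collections import defaultdict, deque


def giant_component(edges):
    """Return (nodes, edges) of the largest connected component."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    visited = set()
    largest = set()
    for start in adjacency:
        if start in visited:
            continue
        # Breadth-first search collects one connected component.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        visited |= component
        if len(component) > len(largest):
            largest = component

    kept_edges = [(u, v) for u, v in edges if u in largest and v in largest]
    return largest, kept_edges


# Toy network: a chain of three drugs and a separate pair.
toy_edges = [("a", "b"), ("b", "c"), ("d", "e")]
gcc_nodes, gcc_edges = giant_component(toy_edges)
```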

Modules in DANs

Our first analysis consists of the identification of the modules in the seven different DANs. For this, we use a multilevel community detection algorithm³⁷ to find the modules in the networks. The modularity and the number of modules for each network are summarized in Table 4. We would like to remark that the module numbers are merely labels, i.e., modules with the same label in different networks do not necessarily contain the same drugs. In general, we find the modularity to be similar among the different networks except for N_approved and N_all, for which it is smaller. This is understandable considering that the data used for these networks differ from the others. For the number of modules we observe similar values ranging from 11 to 25 modules.
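For readers unfamiliar with the modularity score, the following sketch evaluates it for a given partition. It assumes the standard Newman definition for unweighted, undirected graphs, Q = sum over modules of (internal edges / m) - (degree sum / 2m)^2; whether Table 4 uses exactly this variant is an assumption, and the community detection itself is not implemented here.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman modularity of a partition of an unweighted, undirected graph.

    `membership` maps each node to its module label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both ends in the module
    degree_sum = defaultdict(int)  # summed node degrees per module
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy network: two triangles joined by a single edge.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
toy_membership = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
q = modularity(toy_edges, toy_membership)
```

Placing all nodes in a single module gives Q = 0, which is why higher values indicate a more pronounced module structure.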

In Fig. 3, we show the networks for N_approved and N_all and the distribution of the number of drugs in the modules. The networks for the 5 cell lines are shown in Figs. 1–5 in the Supplementary File. From the bar charts of both networks one can see that there are a few modules containing a large number of drugs and the remaining modules contain only a few drugs. These large modules are also clearly visible in the network representation of the DANs on the left-hand side in Fig. 3. In general, the modules in N_all are larger than in N_approved, which is understandable because the former DAN contains 2451 nodes whereas the latter has only 367 (see Table 4).

Significance of ATC interactions in the entire network

Next, we analyze pairwise interactions between drugs in terms of their corresponding ATC classes. For this analysis, we use all the significant interactions which are annotated with ATC codes in the 7 DANs. The number of interactions and the distribution of their JI values are shown in Fig. 4. In this figure, we show only drug pairs belonging to the same ATC class, corresponding to homogeneous interactions, i.e., the label L refers to the interaction of two drugs, both from ATC class L.

For the network N_approved the number of interactions and their JI values are shown in Fig. 4A (left with red label). One can see that interactions between drugs from the ATC class L occur far more often than for any other ATC class. Interestingly, the JI values of these interactions (shown in the boxplot in Fig. 4A) differ only slightly between ATC classes. The results are similar for N_all.

For the other five networks of the cell lines, the frequency of drug annotations and the distribution of JI values are shown in Fig. 4B. From comparing these five networks we make five observations. First, the number of ATC classes is much smaller than for the two networks N_approved and N_all. Second, the ATC class L is present in all networks for the cell lines. Third, the overlap between the five cell line networks with respect to the ATC classes is smaller than for the two generic networks. Fourth, the network N_VCAP is the only one having more interactions for the ATC class G. Also, the difference between the top 4 ATC classes is smaller than for the other networks, except N_PC3. Fifth, all of the networks share that the ATC class with the largest JI values does not correspond to the ATC class with the largest number of interactions.

In order to reveal robust interaction patterns, we randomize the ATC class labels of the drugs and determine statistically significant ATC interaction classes. For this analysis, we study homogeneous as well as heterogeneous interactions (between drugs from different ATC classes) corresponding to the inter-class effect of drugs. Specifically, we obtain the counts of ATC code combinations from each network (i.e. A − A, A − C, B − L etc.) by counting their occurrences in each DAN. Then we randomize each network 10,000 times to obtain the null distribution for each ATC class combination, using the count of each ATC class combination as test statistic. From comparing the null distributions with the test statistics we obtain p-values, to which we apply a Bonferroni multiple testing correction to get the adjusted p-values.
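The randomization procedure can be sketched as a label-permutation test. The version below makes several assumptions that are not stated in the text: the test is one-sided (counts at least as large as observed), the p-value uses the common (r + 1)/(n + 1) estimate, and the Bonferroni factor is the number of observed class combinations. All names and the toy network are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def pair_counts(edges, labels):
    """Count edges per unordered ATC class pair, e.g. ('A', 'L')."""
    counts = Counter()
    for u, v in edges:
        counts[tuple(sorted((labels[u], labels[v])))] += 1
    return counts


def permutation_test(edges, labels, n_perm=10_000, seed=0):
    """Return {pair: (observed count, p-value, Bonferroni-adjusted p)}."""
    rng = random.Random(seed)
    observed = pair_counts(edges, labels)
    drugs = list(labels)
    shuffled = [labels[d] for d in drugs]
    at_least_as_large = Counter()
    for _ in range(n_perm):
        # Shuffle the ATC labels over the drugs; the edges stay fixed.
        rng.shuffle(shuffled)
        null_counts = pair_counts(edges, dict(zip(drugs, shuffled)))
        for pair, obs in observed.items():
            if null_counts[pair] >= obs:
                at_least_as_large[pair] += 1
    n_tests = len(observed)
    results = {}
    for pair, obs in observed.items():
        p = (at_least_as_large[pair] + 1) / (n_perm + 1)
        results[pair] = (obs, p, min(1.0, p * n_tests))
    return results


# Toy network: two cliques of same-class drugs joined by one edge.
group_a = ["d1", "d2", "d3", "d4"]
group_l = ["d5", "d6", "d7", "d8"]
labels = {d: "A" for d in group_a}
labels.update({d: "L" for d in group_l})
edges = (
    list(combinations(group_a, 2))
    + list(combinations(group_l, 2))
    + [("d4", "d5")]
)
results = permutation_test(edges, labels, n_perm=2000)
```

The toy call uses 2000 permutations only to keep the example fast; the default mirrors the 10,000 randomizations mentioned in the text.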

These results demonstrate that the inferred network structure of all DANs captures meaningful drug-specific information, as revealed by the significance of selected ATC classes.

Enrichment analysis of network modules

Finally, in order to obtain a pharmacogenomically meaningful interpretation of the DANs, we perform an enrichment analysis of the modules identified in the previous section.

The constructed DANs have nodes corresponding to known and unknown drugs and some of the nodes (drugs) in these networks have Anatomical Therapeutic Chemical (ATC) annotations³⁸. We categorized these drugs/nodes with ATC annotations into 14 classes, summarized in Table 2. In addition, we use the label ‘X’ to indicate drugs for which no drug annotation is known.

Table 1 Contingency table summarizing the gene regulation profiles R_i and R_j treated by drug D_k and D_l.

Table 2 Description of ATC annotations.

We performed an enrichment analysis of drugs with ATC codes for the modules detected in each network. In order to test the statistical significance of ATC classes, we use Fisher’s Exact Test³⁹. Since we are testing multiple hypotheses for each module, we apply a Benjamini-Hochberg correction to control the false discovery rate (FDR). In the enrichment analysis we first find the total number of drugs in a module which are labelled with ATC codes and then perform Fisher’s Exact Test to determine which ATC labels are overrepresented in a particular module. The results of this enrichment analysis are shown in Fig. 5.
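The enrichment test can be illustrated with a small sketch. It assumes a one-sided Fisher's Exact Test for overrepresentation, which is equivalent to the upper tail of the hypergeometric distribution, followed by the Benjamini-Hochberg step-up procedure; the function names, argument names and toy numbers are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: ATC-annotated drugs in the network
    K: annotated drugs of the tested ATC class
    n: annotated drugs in the module
    k: annotated drugs in the module that belong to the tested class
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the order of the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to enforce monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy module: both annotated drugs in the module are of the tested class.
p_toy = fisher_enrichment_p(k=2, n=2, K=2, N=4)

# Toy p-values for four ATC classes tested in one module.
q_toy = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

A class would then be reported as enriched in a module when its q-value falls below the chosen FDR level.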

In N_approved, the ATC group N (Nervous system) is overrepresented in the first module. The ATC groups R (Respiratory system), S (Sensory organs) and D (Dermatologicals) are enriched in the second module. The ATC groups J (Antiinfectives for systemic use), G (Genito-urinary system and sex hormones) and P (Antiparasitic products, insecticides and repellents) are enriched in modules 3, 4 and 5, respectively. This is interesting to highlight, since drugs of different classes that are overrepresented in the same module perturb common genes or a similar subset of genes. This information can be used for further investigation to see if those drugs can perturb common pathways.

In the network N_all, the ATC group L (Antineoplastic and immunomodulating agents) is overrepresented in the first module. The ATC groups H (Systemic hormonal preparations, excluding sex hormones and insulins) and D (Dermatologicals) are enriched in the sixth module; the group S (Sensory organs) also shows a low q-value (0.073), which is, however, not significant.

For the network N_MCF7, the ATC groups L (Antineoplastic and immunomodulating agents) and R (Respiratory system) are enriched in the first and third modules, respectively. In addition, the ATC group M shows a low q-value (0.090) in module 5.

For the network N_VCAP, no ATC group is enriched in any module; however, the ATC group D (Dermatologicals) shows a low q-value (0.121) in module 6.

In the network N_PC3, the ATC groups G (Genito-urinary system and sex hormones) and C (Cardiovascular system) are enriched in module 2. The ATC group L (Antineoplastic and immunomodulating agents) is enriched in module 3, and the ATC group J (Antiinfectives for systemic use) has a low q-value (0.087) in the same module. The ATC group N (Nervous system) shows a low q-value (0.059) in module 6. The ATC groups S (Sensory organs) and D (Dermatologicals) are enriched in module 8. The ATC group P (Antiparasitic products, insecticides and repellents) is enriched in module 11. The ATC group L (Antineoplastic and immunomodulating agents) shows a low q-value (0.06) in module 12. The ATC group G (Genito-urinary system and sex hormones) is enriched in module 13.

In the network N_A549, the ATC group L (Antineoplastic and immunomodulating agents) is enriched in module 2. The ATC group M is enriched in module 3 and the ATC group C is enriched in module 4. In addition, the ATC groups L (0.062) and S (0.11) show low q-values in modules 3 and 13, respectively.

In the network N_A375, the ATC group L (Antineoplastic and immunomodulating agents) is enriched in modules 3, 8 and 11. The ATC group C (Cardiovascular system) is enriched in module 6.

The summary of the enrichment analysis of the ATC groups for the modules of the different networks is shown in Table 5. In this table, we highlight the ATC groups which are enriched in at least one module in different networks. We also include those ATC groups which are not significant but have low q-values between 0.05 and 0.15.

Web interface for DAN of drugs

Because the complexity of our results makes it difficult to communicate all details, we developed an interactive web application. The web application is publicly available at http://dan-network.herokuapp.com/ and shows visualizations of all 7 DANs summarized in Table 4. For the visualization of the networks we developed our web interface using Node.js⁴⁰ and the Sigma.js⁴¹ library. Each node in the network (drug) has a dedicated pane with a list of the relevant associations and links to external resources such as DrugBank, PubChem, LINCS Portal, ChEMBL and KEGG Ligand with relevant identifiers. That means a user can interactively explore the interactions in all 7 DANs, obtaining pharmacological information from the linked data resources. A screenshot of our web application is shown in Fig. 6.

Discussion

In our paper, we based our analysis on the LINCS data repository providing comprehensive information about the effect of drugs or compounds on gene expression changes. This means LINCS enables an estimation of the linkage between genotype, phenotype and therapies, and the identification of key genes which are a significant part of the biological processes related to phenotype differences as approximated by gene expression values.

For our study, we went beyond single genes because we were aiming at a comprehensive overview of the systems relations among all drugs tested in LINCS. In order to accomplish this, we utilized differential expression profiles to estimate DANs. Specifically, our analysis started by constructing DANs to estimate the similarity between drug pairs using the Jaccard Index, which estimates the proportion of differentially expressed genes that are common in the corresponding expression profiles. If two drugs showed a statistically significant similarity, we connected them by an edge. In this way, we constructed 7 different DANs for 7 different conditions, which we further analyzed. The results of these networks are summarized in Table 4.

We analyzed the DANs on three different levels. First, we studied the structure of the DANs by identifying network modules. Second, we studied the drugs pairwise by identifying the presence of significant ATC classes in the entire network. Third, we studied the enrichment of the network modules with respect to ATC classes.

The significant pairs in the networks show a variable JI distribution, shown in Fig. 2A,B. In general, the effect of drugs in terms of differentially expressed genes varies, i.e., some drugs show a strong effect, which means a large number of differentially expressed genes, while other drugs have a moderate effect, changing the expression of only a small number of genes. If a drug D_i has a moderate effect, i.e., a small number of differentially expressed genes, but a strong overlap with a drug D_j that has a strong effect, i.e., causes a larger number of differentially expressed genes, the JI will be significant but not high. In such cases the interaction may not describe the same functionality of both drugs, but the drugs can have a similar effect on some subset of gene targets. On the other hand, if two drugs have a similar number of differentially expressed genes and overlap strongly, then the corresponding JI is higher.
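A small worked example makes this dependence on the profile sizes explicit. The symbols G_i and G_j (the sets of differentially expressed genes of D_i and D_j) and all numbers below are illustrative assumptions, not values from the LINCS data.

```latex
% Definition of the Jaccard Index for two DEG sets
\mathrm{JI}(D_i, D_j) = \frac{|G_i \cap G_j|}{|G_i \cup G_j|}

% Unequal effect sizes: |G_i| = 50, |G_j| = 500, and G_i \subset G_j
\mathrm{JI}(D_i, D_j) = \frac{50}{500} = 0.10

% Similar effect sizes: |G_i| = |G_j| = 500, |G_i \cap G_j| = 400
\mathrm{JI}(D_i, D_j) = \frac{400}{500 + 500 - 400} = \frac{400}{600} \approx 0.67
```

Even though every gene affected by D_i is shared in the first case, the JI cannot exceed the ratio of the two set sizes, whereas two profiles of similar size can reach much higher values.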

After the construction of the networks, we identified modules in the networks. For this we employed the multilevel community algorithm³⁷. The results of this analysis are summarized in Table 4. In general, the modularity of the networks for the five cell lines is higher than for N_all and N_approved, which has the lowest modularity. For the number of identified modules this distinction is no longer present. It is interesting to note that the number of modules in all networks is of the same order of magnitude as the number of our ATC classes (which is 14).

It is interesting that the modularity of N_all and N_approved is different to the five cell line DANs because these two network types are indeed quite different from each other due to the different information used for their construction.

These results suggest that the modules in the networks could represent drugs or drug classes affecting similar targets. That means drugs in the same module have a similar effect on some common gene targets, because of their significant overlap of differentially expressed genes as measured by the JI. This can also be interpreted as follows: the presence of drugs in different modules suggests that each module can identify a different type of target-set, which is independent from other target-sets for different drugs. For instance, for N_approved, we identify 13 modules, which means that there are 13 distinct effect types of drugs. Interestingly, this number is very close to the total number of ATC classes we were using, which is 14 (see Table 2).

In order to test this idea further, we performed an enrichment analysis of the network modules testing for the enrichment of ATC classes. The results are summarized in Fig. 5. Due to the complexity of these results, we discuss them in three steps. First, we discuss results for all networks combined. Second, we discuss network specific characteristics of significant modules and ATC classes. Third, we discuss networks and modules individually to identify commonalities.

First, from our results (see Table 5) we see that the total number of significant modules (SM (all networks)) enriched for an ATC class across all networks is low, varying between 7 (for ATC class L) and 0 (for ATC classes A, B and V). Most ATC classes are only enriched in 1 or 2 modules in all networks, e.g., ATC classes H, J, M, N, P, R and S.

Second, when looking at the networks individually, we found that the total number of enriched modules (SM) per network varies between 5 (for N_approved) and 0 (for N_VCAP). Similarly, the number of significant ATC classes (SC) per network varies between 7 (for N_approved) and 0 (for N_VCAP), see Table 5. Taken together, these observations confirm our interpretation of the findings for the number of modules, which did not consider ATC enrichments, underlining the representative character of the modules for ATC classes.

Third, we look at networks and modules individually. Overall, we can identify four different types of drug-module enrichments, which are discussed in the following.

Single-drug class in individual modules

For this type of enrichment, we find only one enriched ATC class per module in a DAN. That means there is a unique relation between an ATC class and a module in a network. From our results, we find that N_approved and N_A549 have four modules which are enriched for a single ATC class, N_MCF7 and N_PC3 have two such modules, N_all and N_A375 have one module, and N_VCAP has no significant module.

The interpretation of these results is that each module is characteristic for a set of drugs represented by an ATC code. Because the drugs within such a module are likely to have common targets, this could be used to predict the function of unknown drugs in the module or for drug repositioning.

Single-drug class in multiple modules

For this type, an ATC class is enriched in more than one module. For instance, ATC class L is enriched in 3 modules in N_A375; see the vertical boxes in Fig. 5. Furthermore, ATC class G is enriched in two modules in N_PC3. This suggests that the drug classes L and G possibly have three and two independent target-sets, respectively, affected by these drugs. This means ATC classes G and L have multiple target-sets which are at least partially independent from each other.

The interpretation is that if a single ATC class is enriched in multiple modules of a network, the drugs from this ATC class are heterogeneous and separate into groups targeting different subsets of genes.

Multiple-drug classes in a single module

For this type, we find more than one ATC class enriched in a module. The N_approved network has three ATC classes (D, R, and S) enriched in module 2; see the horizontal boxes in Fig. 5. The network N_PC3 has two modules each enriched by two ATC classes. Specifically, module 2 is enriched by ATC classes C and G, and module 8 is enriched by ATC classes D and S. Finally, N_all has module 6 enriched by ATC classes D and H.

Our interpretation is that if multiple ATC classes are enriched in a single module, then, e.g., two drugs from two different ATC classes have at least partially common targets. These targets might be of higher order, i.e., not directly targeted by a drug but further downstream, yet sufficient to change the differential expression of such genes. This could be used to predict candidates for drug repurposing.

Multiple-drug classes in multiple modules

For this type, we find an ATC class enriched in multiple modules together with further enriched ATC classes; see the intersection of a horizontal and a vertical box in Fig. 5. For this type, we find merely one network, N_PC3, where ATC class G is enriched in modules 2 and 13 and the enrichment in module 2 is shared with ATC class C.

This result indicates that a drug class has multiple independent target-sets and could be used for predicting the repurposing of known drugs as well as predicting the function of unknown drugs.

Combining all our findings, our results have a similarity to the conceptual idea of cancer attractors introduced by^32,42 and, e.g., studied in^33,34. The authors analyzed gene regulatory networks and showed that cell types can be seen as attractors in the epigenetic landscape representing the phenotype space of an organism, see Fig. 1A. That means the developmental states of cells giving rise to different cell fates can be seen as dynamical gene networks changing their structure over time and, as a consequence, changing their position in the epigenetic landscape. Similar studies have been conducted by^43,44,45. In³³ it has been argued that cancer cells are trapped in abnormal attractors, allowing in this way the extension of the conceptual idea of attractors in gene regulatory networks to general abnormal or tumor cell types in diseases beyond cancer^46,47,48.

Our study adds to this in a non-trivial way because we do not study gene regulatory networks but DANs, in which the nodes of the network correspond to drugs/compounds instead of genes. Because we determine the similarity between pairs of drugs based on hundreds or even thousands of expression profiles, for certain conditions, a DAN integrates dozens of individual gene regulatory networks, each representing a particular cell state; see Fig. 1A. This includes a temporal integration over the cells due to the perturbation effect of the drugs to which they were exposed. This means that, although the DANs are static, they nevertheless represent dynamical states of the underlying cells. Hence, a DAN is capable of representing many different states of cells, corresponding to phenotypes, simultaneously and allows an integrated representation of the drug landscape.

It is important to emphasize the difference between the different ‘spaces’ considered. GRNs are embedded into the genotype space describing the activity of genes, whereas the epigenetic landscape, representing the phenotype space, describes cell states and their transitions. Here a cell state can correspond to a normal cell type or to abnormal tumour or disease cells. These states are the attractors of^32,42. Each cell state has a corresponding GRN and, hence, a projection into genotype space. Our DANs are embedded into the compound space representing therapeutic interventions. Each state in the compound space corresponds to a drug/compound that is connected to abnormal and normal cell states in the phenotype space. The connection between these three spaces is visualized in Fig. 1A.

For our DANs, we found that the modules of the networks in the compound space are the graph-theoretical counterpart of ‘attractor’ states in phenotype space. This could be demonstrated by utilizing information about the ATC classification of known drugs. In this way, we complemented LINCS with information from DrugBank about known effects of drugs.

To enable an efficient exploration and reuse of our results, we developed an interactive web interface that can be used to view, explore, and link drug associations. The interface also integrates external resources via added links, curated mappings, and external IDs. Content from other resources such as PubChem has been incorporated into the DAN web interface, enabling end users to view information and explore new hypotheses of drug associations. These features could facilitate further large-scale research in the field and could, in addition, provide health care professionals with a valuable systems pharmacogenomics resource.

Finally, we would like to note that it appears desirable to integrate different types of genomics data, e.g., transcriptomics, proteomics and metabolomics data, to establish an integrated systems pharmacogenomics landscape of drug similarities. Unfortunately, neither the LINCS database, on which our analysis is based, nor any other current database provides the different types of data that would allow this approach to be realized in practice. For this reason, our approach is the most feasible one considering the current practical data constraints and can be seen as an approximation thereof. On a more theoretical note, we would like to add that even if one could realize an integrated systems pharmacogenomics landscape, it is unclear whether all different genomics data types are actually required or whether they are, at least partially, redundant. Only future studies can shed light on this conceptual issue.

Conclusion

In this paper, we developed a systems pharmacogenomics approach and applied it to data from the LINCS repository. As a result, we constructed Drug Association Networks summarizing hundreds of drugs and thousands of compounds systematically with respect to their therapeutic effects. We showed that the modular structure of the DANs represents enriched ATC classes, thus integrating the drug-induced changes on the genotype states of the cells.

Materials and Methods

Drug perturbation data from LINCS data

The LINCS L1000 data comprise 5806 genetic perturbations (e.g., single gene knockdown and over-expression) and 16,425 perturbations induced by chemical compounds (e.g., drugs)⁴⁹. About 1.3 million gene expression profiles have been collected for this project using the L1000 technology⁵⁰. The L1000 platform has been developed at the Broad Institute by the Connectivity Map (CMap) team to facilitate rapid, flexible and high-throughput gene expression profiling at a lower cost. However, the L1000 technology only measures the expression of 978 landmark genes; the expression values for the rest of the transcriptome are estimated using a computational model based on Gene Expression Omnibus (GEO)⁵¹ data. In this paper, we used the level 5 signature data of drug perturbations in various cell lines. Overall, the LINCS data were generated from a multifactorial experimental space; see Fig. 1B.

DrugBank database

DrugBank is a comprehensive drug data resource that contains records about chemical, pharmacological, and pharmaceutical features of more than 8,000 drugs, including the 2016 FDA-approved drugs⁵². We used version 5.0.11 (released 2017-12-20) of the DrugBank database for our analysis. To make the cross-platform comparisons compatible, we used the DrugBank ID as the identifier of drugs across the DrugBank and LINCS databases. For our analysis, we used the Anatomical Therapeutic Chemical (ATC) classification codes, controlled by the WHO, shown in Table 2. This classification categorizes drugs into different groups/classes according to the organ or system on which they act, their therapeutic effect, and their chemical characteristics. For our analysis, we use the first ATC level, which gives 14 main anatomical classes.
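As an illustration of what using the first ATC level means in practice, the following minimal sketch extracts the first-level class from a full ATC code. The first character of an ATC code identifies one of the 14 main anatomical groups; the function name `atc_first_level` and the example code (C07AB02, metoprolol) are chosen here for illustration and are not taken from the analysis itself.

```python
# The 14 main anatomical groups of the ATC classification (first level).
ATC_MAIN_GROUPS = {
    "A": "Alimentary tract and metabolism",
    "B": "Blood and blood forming organs",
    "C": "Cardiovascular system",
    "D": "Dermatologicals",
    "G": "Genito-urinary system and sex hormones",
    "H": "Systemic hormonal preparations, excluding sex hormones and insulins",
    "J": "Antiinfectives for systemic use",
    "L": "Antineoplastic and immunomodulating agents",
    "M": "Musculo-skeletal system",
    "N": "Nervous system",
    "P": "Antiparasitic products, insecticides and repellents",
    "R": "Respiratory system",
    "S": "Sensory organs",
    "V": "Various",
}


def atc_first_level(atc_code: str) -> str:
    """Return the first-level ATC class (a single letter) of a full ATC code."""
    code = atc_code.strip().upper()
    if not code or code[0] not in ATC_MAIN_GROUPS:
        raise ValueError(f"not a valid ATC code: {atc_code!r}")
    return code[0]


# Hypothetical usage: a drug annotated with several ATC codes may belong
# to more than one first-level class.
codes = ["C07AB02", "S01ED01"]
print(sorted({atc_first_level(c) for c in codes}))  # ['C', 'S']
```

Because a drug can carry several ATC codes, a set of first-level classes per drug is a natural representation for the enrichment analysis.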

Metadata pipeline

The LINCS data API provides a programmatic pipeline for accessing annotations and perturbational signatures in the L1000 dataset via a collection of HTTP-based RESTful web services. An example of these services is the Cell Service, which describes the cell line meta-information. The services provided by the LINCS API for querying the L1000 metadata support complex queries via simple HTTP GET requests that can be executed in a web browser or from most programming languages, such as R and Python.

Transcriptional profiles and small molecules diversity

We downloaded the L1000 raw z-score vectors from the GEO repository and pre-processed them using the R L1000 tools⁵³. A signature of a small molecule is defined as a vector of z-score values representing the differential expression between samples treated with the small molecule and control samples. That means a z-score signature summarizes the effect of the treatment with a small molecule. This effect depends on the experimental conditions, e.g., dosage, time point, and cell line.

In total, there are 169,239 z-score signature profiles marked with the highest signature count that satisfied the well- and plate-based quality control. This signature profile subset covers 20,009 small molecules (out of 49,400 perturbagens) that were repeatedly measured with 1 to 8 replicates. For our analysis, we select the time points 6, 24 and 48 h because they represent by far the majority of conditions. This gives in total 158,054 signature profiles (i.e., any combination of small molecule, time point, and cell line) that we use for our analysis. In Table 3, we show some summary statistics of this data set.

Table 3 Summary of z-score signature profiles for DEGs between treatments and controls on the cell line subset.

Table 4 Summary of seven DANs constructed from different information.

Table 5 Summary of module enrichments shown in Table 5 for all DANs.

The z-score signature vectors were used to study the effect of a drug treatment on the differential expression of genes. We used the threshold >2.0 to indicate upregulation and <−2.0 to indicate down-regulation of a gene respectively.
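The thresholding described above can be sketched in a few lines. This is a minimal illustration assuming a z-score signature is available as a plain list of floats; the function name `discretize` and the constant names are hypothetical, while the cut-offs of 2.0 and −2.0 are those stated in the text.

```python
from typing import List

UP_THRESHOLD = 2.0     # z > 2.0  is treated as up-regulation
DOWN_THRESHOLD = -2.0  # z < -2.0 is treated as down-regulation


def discretize(z_scores: List[float]) -> List[int]:
    """Map each z-score to 1 (up), -1 (down) or 0 (no change)."""
    labels = []
    for z in z_scores:
        if z > UP_THRESHOLD:
            labels.append(1)
        elif z < DOWN_THRESHOLD:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Values exactly at a threshold count as "no change" because the
# comparisons are strict.
print(discretize([3.1, -2.4, 0.7, 2.0, -2.0]))  # [1, -1, 0, 0, 0]
```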

Mapping small molecules to external databases

The L1000 small molecules were assayed across multiple cell lines, experimental replicates, dosages and time points. For this reason, we mapped DrugBank compounds and the directly measured (landmark) genes to calculate a single transcriptional profile across multiple signatures for each L1000 small molecule. We also mapped the L1000 small molecules to the external database sources covered by the UniChem database⁵⁴. We achieved this by querying UniChem with the InChIKey of each L1000 compound via the UniChem API. This allows us to map the L1000 small molecules not only to DrugBank, but also to PubChem, ChEMBL, and KEGG Ligand (see Table T1 in Supplementary File 1). The pipeline also enables us to identify FDA-approved drugs and to map them to the L1000 small molecule identifiers.

After mapping the DrugBank identifiers to small molecules, the identifiers were used to calculate the consensus signature profile for each drug. The purpose of computing a consensus is to combine signature profiles for the same perturbation under different conditions (e.g., cell types, dosages, or time points). The consensus signature profiles were obtained as follows. First, we calculated the Spearman rank correlation between all signatures that belong to a drug identifier in DrugBank. Second, we calculated the weights by taking the mean correlation to normalize the similarities (total correlation, see Fig. S1 in Supplementary File 1). Third, we multiplied the z-score signatures by their similarity weights. Last, we summed up the weighted z-score vectors to form a single consensus signature.
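The four steps of the consensus computation can be sketched as follows. This is an illustrative reading of the description, not the original implementation: all function names are hypothetical, the weights are normalized to sum to one, and two details that the text does not specify are filled in by assumption, namely that a drug with a single signature keeps that signature unchanged and that non-positive mean correlations are clipped to a small positive floor.

```python
from statistics import mean
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            result[order[pos]] = avg_rank
        start = end + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def consensus_signature(signatures: Sequence[Sequence[float]],
                        floor: float = 0.01) -> List[float]:
    """Combine all signatures of one drug into a weighted consensus."""
    if len(signatures) == 1:          # assumption: nothing to combine
        return list(signatures[0])
    raw = []
    for i, sig in enumerate(signatures):
        # Step 1: correlation of this signature with every other one.
        others = [spearman(sig, other)
                  for j, other in enumerate(signatures) if j != i]
        # Step 2: mean correlation as raw weight (clipped, by assumption).
        raw.append(max(mean(others), floor))
    total = sum(raw)
    weights = [w / total for w in raw]
    # Steps 3 and 4: weight each signature and sum gene by gene.
    n_genes = len(signatures[0])
    return [sum(w * sig[g] for w, sig in zip(weights, signatures))
            for g in range(n_genes)]


# Hypothetical usage: three signatures of one drug over five genes.
sigs = [[2.5, -1.0, 0.3, 3.1, -2.2],
        [2.1, -0.8, 0.1, 2.7, -2.9],
        [0.2, 1.5, -0.4, 0.9, -0.1]]
print(consensus_signature(sigs))
```

With this weighting, a signature that agrees with the other signatures of the same drug contributes more to the consensus than an outlying one.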

Drug association network

The basic idea of the drug association network (DAN) is to connect drugs that show a similar effect on gene expression, meaning that the genes affected by them show the same type of expression change compared to the control data. For example, for a particular cell line treated by drugs D_i and D_j with observed phenotype changes ${\hat{P}}_{i}$ and ${\hat{P}}_{j}$, these phenotypes will be similar $({\hat{P}}_{i} \sim {\hat{P}}_{j})$ if the two drugs influence (by overexpression or underexpression compared to a control state) similar genes. In order to estimate the similarity between two drugs, we use a Jaccard-like index (JI)⁵⁵ between two vectors of genes whose entries are categorized as 1 (up), −1 (down) and 0 (no change) under drugs D_i and D_j. In the first step, we convert the z-scores of the drug-treated expression data into a matrix of categorical data, where rows correspond to genes and columns to drugs. In this matrix, differentially expressed genes are labelled by 1 if up-regulated and by −1 if down-regulated, and non-differentially expressed genes are labelled by 0. In the second step, we measure the overlap between pairs of drugs by using the JI as described in Eqn. 1. The JI gives the ratio of the differentially expressed genes that are common to a pair of drug-treated data sets to all genes that are differentially expressed in at least one of the two data sets. In the third step, we test the significance of the Jaccard index. We perform the significance test with a non-parametric approach by randomizing the gene labels of each drug data vector independently. This allows us to estimate the sampling distribution under the null hypothesis. A schematic overview of the construction of a DAN is shown in Fig. 1D.
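The third step, the permutation-based significance test, can be sketched as follows. This is a minimal illustration under stated assumptions: regulation profiles are lists with entries in {−1, 0, 1}, the function names are hypothetical, and the p-value is taken as the fraction of permuted indices that are at least as large as the observed one, which is the convention under which a small p-value indicates a significant overlap.

```python
import random
from typing import Sequence


def jaccard_like(r_i: Sequence[int], r_j: Sequence[int]) -> float:
    """Genes regulated in the same direction by both drugs, divided by
    the genes regulated by at least one of the two drugs."""
    agree = sum(1 for a, b in zip(r_i, r_j) if a == b and a != 0)
    active = sum(1 for a, b in zip(r_i, r_j) if a != 0 or b != 0)
    return agree / active if active else 0.0


def permutation_p_value(r_i: Sequence[int], r_j: Sequence[int],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Estimate the p-value of the observed index by shuffling the gene
    labels of both profiles independently."""
    rng = random.Random(seed)
    observed = jaccard_like(r_i, r_j)
    perm_i, perm_j = list(r_i), list(r_j)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm_i)
        rng.shuffle(perm_j)
        if jaccard_like(perm_i, perm_j) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical usage: two profiles over 100 genes that agree on 20 genes.
profile = [1] * 10 + [-1] * 10 + [0] * 80
print(permutation_p_value(profile, profile, n_perm=200))
```

Shuffling the labels keeps the number of up- and down-regulated genes per drug fixed, so the null distribution reflects the overlap expected by chance for profiles of the same composition.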

Jaccard Index

Let D_i and D_j be two drugs with regulation profiles R_i and R_j. R_i and R_j are two vectors of length n, where n is the number of genes. Their components correspond to (I) down-regulation (−1), (II) no change (0) or (III) up-regulation (1). The Jaccard Index (JI) can be estimated from the contingency table (see Table 1) giving the overlap between the two regulation profiles representing the effect of the drugs D_i and D_j:

$${J}_{ij}=J({R}_{i},{R}_{j})=\frac{{\Vert {G}_{i}\cap {G}_{j}\Vert }_{/\Vert 0,0\Vert }}{{\Vert {G}_{i}\cup {G}_{j}\Vert }_{/\Vert 0,0\Vert }}=\frac{{n}_{11}+{n}_{33}}{{n}_{t}}$$

(1)

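Here n_ab denotes the number of genes falling into category a for R_i and category b for R_j, following the numbering (I)–(III) above, and n_t = n₁₁ + n₁₂ + n₁₃ + n₂₁ + n₂₃ + n₃₁ + n₃₂ + n₃₃ is the number of genes showing differential expression under at least one of the two drugs, i.e., all genes except those that are unchanged under both drugs (n₂₂).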
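Eqn. 1 can be illustrated with a short sketch that first builds the 3 × 3 contingency counts and then forms the ratio. The category numbering (1 = down, 2 = no change, 3 = up) is inferred from the ordering (I)–(III) in the text and from the fact that n₂₂ is excluded from n_t; the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence

# Assumed mapping from regulation label to contingency-table index.
CATEGORY_INDEX = {-1: 1, 0: 2, 1: 3}


def contingency_counts(r_i: Sequence[int], r_j: Sequence[int]) -> Counter:
    """Count n_ab: genes in category a under drug i and b under drug j."""
    return Counter((CATEGORY_INDEX[a], CATEGORY_INDEX[b])
                   for a, b in zip(r_i, r_j))


def jaccard_index(r_i: Sequence[int], r_j: Sequence[int]) -> float:
    """Eqn. 1: (n_11 + n_33) / n_t, where n_t excludes the (2, 2) cell."""
    n = contingency_counts(r_i, r_j)
    n_t = sum(count for cell, count in n.items() if cell != (2, 2))
    if n_t == 0:          # no gene is regulated by either drug
        return 0.0
    return (n[(1, 1)] + n[(3, 3)]) / n_t


# Six genes: one shared up, one shared down, one opposite direction,
# two regulated by only one drug, one unchanged under both.
r_i = [1, 1, -1, 0, 0, 1]
r_j = [1, -1, -1, 0, 1, 0]
print(jaccard_index(r_i, r_j))  # 0.4
```

In the example, n₁₁ = 1, n₃₃ = 1 and n_t = 5, so the index is 2/5. Genes regulated in opposite directions count towards n_t but not towards the numerator.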

Construction of the drug association network

The construction procedure for the DAN consists of 11 steps and is based on the z-score vectors available in LINCS. Every z-score vector, Z = {z₁, z₂, …, z_n}, where n is the total number of genes, is a function of experimental conditions, including a drug D_k and a cell line CL_m that was exposed to drug D_k. For brevity, we simply write Z = Z(D_k,γ) to indicate that a z-score vector is a function of drug D_k and further conditions summarized by γ. We call (D_k,γ) a configuration. Due to this dependency, Z = Z(D_k,γ) can be seen as a profile for drug D_k.

For reasons of notational simplicity, we index the configurations (D_k,γ) by an integer number. That means we map (D_k,γ) to c_h ∈ C = {c₁, …, c_t} = {1, …, t}, where t is the total number of configurations. This leads to the notation

$$Z=Z({D}_{k},\,\gamma )=Z({c}_{h})$$

(2)

we will use in the following.

1.
This step is only used for N_approved: Summarize the z-scores for all configurations with the same drug, i.e., DC_k = {c_i, c_j, …, c_k}, where every x ∈ DC_k contains drug D_k. The summarized values are given by
$$Z^{\prime} =\frac{1}{n}\sum _{x\in D{C}_{k}}\,Z(x).$$
(3)

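Note that n in Eq. 3 denotes the number of configurations in DC_k, so that Z′ is the mean z-score vector of drug D_k. In this case, the number of remaining z-score vectors equals the number of drugs rather than the number of configurations. Re-indexing of the configurations gives c_h ∈ C = {c₁, …, c_t}, where t is now the number of different drugs.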
2.
Convert every z-score vector into a p-value vector, P = {p₁, p₂..., p_n}, i.e., P = P(c_h).
3.
Convert every p-value vector into a q-value vector (controlling the FDR with the Benjamini and Hochberg (BH) method⁵⁶), Q = {q₁, q₂, …, q_n}, i.e., Q = Q(c_h).
4.
Construct a matrix R of differentially regulated genes for all configurations c_h, i.e., R is an (n × t) matrix whose components correspond to (I) down-regulation (−1), (II) no change (0) or (III) up-regulation (1):

For each configuration c_h, we have the corresponding z-score vector Z(c_h) and the corresponding q-value vector Q(c_h). The function f:(Z(c_h), Q(c_h))_i → M maps from the q- and z-value of a gene i to its regulation categories, i.e., M = {−1, 0, 1}. Specifically, the function f(z_i(c_h), q_i(c_h)) is defined as follows:
$$f({z}_{i}({c}_{h}),{q}_{i}({c}_{h}))=\begin{cases}-1 & :\ {q}_{i}({c}_{h})\le \alpha \ {\rm{and}}\ {z}_{i}({c}_{h}) < 0\\ 1 & :\ {q}_{i}({c}_{h})\le \alpha \ {\rm{and}}\ {z}_{i}({c}_{h}) > 0\\ 0 & :\ {\rm{otherwise}}\end{cases}$$

This gives ${R}_{i,h}=f({z}_{i}({c}_{h}),{q}_{i}({c}_{h}))$.
5.
Use R to calculate the Jaccard index (J_ij) as defined in Eqn. 1 for each pair of configurations c_i and c_j, with ${c}_{i}\ne {c}_{j}$ and c_i, c_j ∈ C. Specifically, calculate J_ij = J(R_i, R_j), where R_i and R_j are the columns of matrix R for the configurations c_i and c_j.
6.
Test the significance of the Jaccard index for each pair of configurations using the following hypotheses.

H₀: The number of differentially expressed genes overlapping in the two data sets treated by drugs D_i and D_j is zero.

H₁: The number of differentially expressed genes overlapping in the two data sets treated by drugs D_i and D_j is not zero.
7.
The sampling distribution is obtained from gene-label randomizations of each pair of configuration profiles R_i and R_j, from which the corresponding Jaccard index, J_ij = J(R_i, R_j), is recomputed. This results in the permuted Jaccard indices ${J}_{perm}(ij)=\{j_{ij}^{perm_{1}}, j_{ij}^{perm_{2}}, \ldots, j_{ij}^{perm_{L}}\}$ for L = 2000.
8.
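From J_perm(ij), we estimate the p-values as the fraction of permuted indices that are at least as large as the observed index, so that a small p-value indicates a significant overlap:
$${p}_{i,j}=Pr({j}_{i,j}^{perm}\ge {j}_{i,j})=\frac{{\sum }_{k=1}^{L}I({j}_{i,j}^{perm_{k}}\ge {j}_{i,j})}{L}$$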

This gives P^J = {p_1,2, p_1,3, …, p_t−1,t}, containing in total $\frac{t\cdot (t-1)}{2}$ different p-values.
9.
Controlling the FDR by BH, we convert P^J into q-values, Q^J = {q_1,2, q_1,3, …, q_t−1,t}, consisting in total of $\frac{t\cdot (t-1)}{2}$ different q-values.
10.
Construct a matrix B for all configurations C by using the q_ij values:
$${B}_{{c}_{i},{c}_{j}}=\begin{cases}1 & :\ {q}_{i,j}\le \alpha \\ 0 & :\ {\rm{otherwise}}\end{cases}$$
(4)

Here ${c}_{i},{c}_{j}\in C$.
11.
Construct a DAN by summarizing all configurations with the same drug, i.e., DC_k = {c_i, c_j, …, c_k}, where every x ∈ DC_k contains drug D_k:

$${A}_{{D}_{k},{D}_{l}}={\rm{\Theta }}(\sum _{x\in D{C}_{k},y\in D{C}_{l}}\,{B}_{xy})$$

(5)

Here Θ(w) is the theta (Heaviside step) function, which gives 1 for w > 0 and 0 otherwise.
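Steps 2 to 4 and steps 9 to 11 of the procedure can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: the text does not say how z-scores are converted into p-values, so a two-sided test against a standard normal distribution is assumed here; the significance level α = 0.05 is a placeholder; self-associations of a drug are skipped; and all function names are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def z_to_p(z: float) -> float:
    """Step 2 (assumption): two-sided p-value under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Steps 3 and 9: BH-adjusted p-values (q-values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # from largest to smallest p
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q[idx] = running_min
    return q


def regulation_profile(z_scores: Sequence[float],
                       alpha: float = 0.05) -> List[int]:
    """Step 4: one column of R, i.e. f(z_i, q_i) for every gene i."""
    q = benjamini_hochberg([z_to_p(z) for z in z_scores])
    profile = []
    for z, q_i in zip(z_scores, q):
        if q_i <= alpha and z < 0:
            profile.append(-1)
        elif q_i <= alpha and z > 0:
            profile.append(1)
        else:
            profile.append(0)
    return profile


def configuration_edges(pair_p: Dict[Tuple[int, int], float],
                        alpha: float = 0.05) -> Dict[Tuple[int, int], int]:
    """Steps 9 and 10: B = 1 where the BH-adjusted p-value is <= alpha."""
    pairs = list(pair_p)
    q = benjamini_hochberg([pair_p[pair] for pair in pairs])
    return {pair: int(q_ij <= alpha) for pair, q_ij in zip(pairs, q)}


def drug_edges(b: Dict[Tuple[int, int], int],
               drug_of: Dict[int, str]) -> Set[FrozenSet[str]]:
    """Step 11: two drugs are linked if any pair of their configurations
    is linked in B (theta function of the sum over B_xy)."""
    edges = set()
    for (c_i, c_j), is_edge in b.items():
        d_k, d_l = drug_of[c_i], drug_of[c_j]
        if is_edge and d_k != d_l:        # assumption: no self-loops
            edges.add(frozenset((d_k, d_l)))
    return edges


# Hypothetical usage: three configurations belonging to two drugs.
b = configuration_edges({(0, 1): 0.001, (0, 2): 0.8, (1, 2): 0.01})
print(drug_edges(b, {0: "drugA", 1: "drugA", 2: "drugB"}))
```

The permuted p-values that enter `configuration_edges` would come from the randomization in steps 6 to 8; here they are supplied directly as example values.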

References

1. Keiser, M. J. et al. Predicting new molecular targets for known drugs. Nature 462, 175–181 (2009).
2. Dunkel, M., Günther, S., Ahmed, J., Wittig, B. & Preissner, R. Superpred: drug classification and target prediction. Nucleic acids research 36, W55–W59 (2008).
3. Santarius, T., Shipley, J., Brewer, D., Stratton, M. R. & Cooper, C. S. A census of amplified and overexpressed human cancer genes. Nat. Rev. Cancer 10, 59–64 (2010).
4. Iorio, F. et al. Discovery of drug mode of action and drug repositioning from transcriptional responses. Proc. Natl. Acad. Sci. 107, 14621–14626 (2010).
5. Finley, S. D., Chu, L.-H. & Popel, A. S. Computational systems biology approaches to anti-angiogenic cancer therapeutics. Drug discovery today 20, 187–197 (2015).
6. Lamb, J. et al. The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929–1935 (2006).
7. Jiang, W. et al. Identification of links between small molecules and mirnas in human cancers based on transcriptional responses. Sci. reports 2, 282 (2012).
8. Subramanian, A. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452.e17, https://doi.org/10.1016/j.cell.2017.10.049 (2017).
9. Wang, Z., Clark, N. R. & Ma’ayan, A. Drug-induced adverse events prediction with the lincs l1000 data. Bioinformatics 32, 2338–2345 (2016).
10. Li, J. et al. A survey of current trends in computational drug repositioning. Briefings bioinformatics 17, 2–12 (2015).
11. Musa, A., Tripathi, S., Kandhavelu, M., Dehmer, M. & Emmert-Streib, F. Harnessing the biological complexity of big data from lincs gene expression signatures. PloS one 13, e0201937 (2018).
12. Musa, A. et al. A review of connectivity map and computational approaches in pharmacogenomics. Briefings Bioinforma. bbw112–bbw112 (2017).
13. Nassiri, I. & McCall, M. N. Systematic exploration of cell morphological phenotypes associated with a transcriptomic query. Nucleic acids research (2018).
14. Caicedo, J. C., Singh, S. & Carpenter, A. E. Applications in image-based profiling of perturbations. Curr. opinion biotechnology 39, 134–142 (2016).
15. De Wolf, H., De Bondt, A., Turner, H. & Göhlmann, H. W. Transcriptional characterization of compounds: lessons learned from the public lincs data. Assay drug development technologies 14, 252–260 (2016).
16. Aliper, A. et al. Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data. Mol. pharmaceutics 13, 2524–2530 (2016).
17. Sirci, F. et al. Comparing structural and transcriptional drug networks reveals signatures of drug activity and toxicity in transcriptional responses. NPJ systems biology applications 3, 23 (2017).
18. Chen, B. et al. Relating chemical structure to cellular response: an integrative analysis of gene expression, bioactivity, and structural data across 11,000 compounds. CPT: pharmacometrics & systems pharmacology 4, 576–584 (2015).
19. Campillos, M., Kuhn, M., Gavin, A.-C., Jensen, L. J. & Bork, P. Drug target identification using side-effect similarity. Science 321, 263–266 (2008).
20. Piening, S. et al. Impact of safety-related regulatory action on clinical practice. Drug safety 35, 373–385 (2012).
21. Beadle, G. W. & Tatum, E. L. Genetic control of biochemical reactions in neurospora. Proceedings Natl. Acad. Sci. 27, 499–506 (1941).
22. Vidal, M. A unifying view of 21st century systems biology. FEBS letters 583, 3891–3894 (2009).
23. Wang, L. Pharmacogenomics: a systems approach. Wiley Interdiscip. Rev. Syst. Biol. Medicine 2, 3–22 (2010).
24. Yıldırım, M. A., Goh, K.-I., Cusick, M. E., Barabási, A.-L. & Vidal, M. Drug—target network. Nat. biotechnology 25, 1119 (2007).
25. Hu, G. & Agarwal, P. Human disease-drug network based on genomic expression profiles. PloS one 4, e6536 (2009).
26. Ye, H., Liu, Q. & Wei, J. Construction of drug network based on side effects and its application for drug repositioning. PloS one 9, e87864 (2014).
27. El-Hachem, N. et al. Integrative cancer pharmacogenomics to infer large-scale drug taxonomy. Cancer research (2017).
28. Sorger, P. K. et al. Quantitative and systems pharmacology in the post-genomic era: new approaches to discovering drugs and understanding therapeutic mechanisms. In An NIH white paper by the QSP workshop group, vol. 48 (NIH Bethesda, MD, 2011).
29. Danon, L., Diaz-Guilera, A., Duch, J. & Arenas, A. Comparing community structure identification. J. Stat. Mech. Theory Exp. 2005, P09008 (2005).
30. Girvan, M. & Newman, M. E. J. Community structure in social and biological networks. Proc. Natl. Acad. Sci. United States Am. 99, 7821–7826 (2002).
31. Tripathi, S., Moutari, S., Dehmer, M. & Emmert-Streib, F. Comparison of module detection algorithms in protein networks and investigation of the biological meaning of predicted modules. BMC bioinformatics 17, 129 (2016).
32. Kauffman, S. Differentiation of malignant to benign cells. J. Theor. Biol. 31, 429–451 (1971).
33. Huang, S., Ernberg, I. & Kauffman, S. Cancer attractors: a systems view of tumors from a gene network dynamics and developmental perspective. In Seminars in cell & developmental biology, vol. 20, 869–876 (Elsevier, 2009).
34. Mar, J. C. & Quackenbush, J. Decomposition of gene expression state space trajectories. PLoS computational biology 5, e1000626 (2009).
35. Jiang, W. et al. Expression of thyroid hormone receptor alpha in 3t3-l1 adipocytes; triiodothyronine increases the expression of lipogenic enzyme and triglyceride accumulation. J. endocrinology 182, 295–302 (2004).
36. Mai, W. et al. Thyroid hormone receptor a is a molecular switch of cardiac function between fetal and postnatal life. Proc. Natl. Acad. Sci. 101, 10332–10337 (2004).
37. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. statistical mechanics: theory experiment 2008, P10008 (2008).
38. Chen, L., Zeng, W.-M., Cai, Y.-D., Feng, K.-Y. & Chou, K.-C. Predicting anatomical therapeutic chemical (atc) classification of drugs by integrating chemical-chemical interactions and similarities. PloS one 7, e35254 (2012).
39. Raymond, M. & Rousset, F. An exact test for population differentiation. Evolution 49, 1280–1283 (1995).
40. Tilkov, S. & Vinoski, S. Node. js: Using javascript to build high-performance network programs. IEEE Internet Comput. 14, 80–83 (2010).
41. Wang, R., Perez-Riverol, Y., Hermjakob, H. & Vizcaíno, J. A. Open source libraries and frameworks for biological data visualisation: A guide for developers. Proteomics 15, 1356–1374 (2015).
42. Huang, S. & Kauffman, S. How to escape the cancer attractor: rationale and limitations of multi-target drugs. In Seminars in cancer biology, vol. 23, 270–278 (Elsevier, 2013).
43. Cheng, W.-Y., Yang, T.-H. O. & Anastassiou, D. Biomolecular events in cancer revealed by attractor metagenes. PLoS computational biology 9, e1002920 (2013).
44. Li, Q. et al. Dynamics inside the cancer cell attractor reveal cell heterogeneity, limits of stability, and escape. Proc. Natl. Acad. Sci. 113, 2672–2677 (2016).
45. Creixell, P., Schoof, E. M., Erler, J. T. & Linding, R. Navigating cancer network attractors for tumor-specific therapy. Nat. biotechnology 30, 842 (2012).
46. Emmert-Streib, F. The chronic fatigue syndrome: a comparative pathway analysis. J. computational biology 14, 961–972 (2007).
47. Del Sol, A., Balling, R., Hood, L. & Galas, D. Diseases as network perturbations. Curr. opinion biotechnology 21, 566–571 (2010).
48. Emmert-Streib, F. & Glazko, G. V. Network biology: a direct approach to study biological function. Wiley Interdiscip. Rev. Syst. Biol. Medicine 3, 379–391 (2011).
49. Duan, Q. et al. Lincs canvas browser: interactive web app to query, browse and interrogate lincs l1000 gene expression signatures. Nucleic acids research 42, W449–W460 (2014).
50. Vidović, D., Koleti, A. & Schürer, S. C. Large-scale integration of small molecule-induced genome-wide transcriptional responses, kinome-wide binding affinities and cell-growth inhibition profiles reveal global trends characterizing systems-level drug action. Front. genetics 5, 342 (2014).
51. Barrett, T. et al. Ncbi geo: archive for functional genomics data sets—update. Nucleic acids research 41, D991–D995 (2012).
52. Wu, P., Nielsen, T. E. & Clausen, M. H. Small-molecule kinase inhibitors: an analysis of fda-approved drugs. Drug Discov. Today 21, 5–10 (2016).
53. Lincscloud. LINCS L1000 R tools. http://support.lincscloud.org/hc/en-us/articles/202062163-L1000-Code-via-GitHub-(2014). [Online; accessed 19-July-2016].
54. Chambers, J. et al. Unichem: extension of inchi-based compound mapping to salt, connectivity and stereochemistry layers. J. cheminformatics 6, 43, https://doi.org/10.1186/s13321-014-0043-5 (2014).
55. Jaccard, P. Étude comparative de la distribution florale dans une portion des alpes et des jura. Bull Soc Vaudoise Sci Nat 37, 547–579 (1901).
56. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. royal statistical society. Ser. B (Methodological) 289–300 (1995).

Acknowledgements

A.M. thanks the CIMO foundation Finland for a scholarship. M.D. thanks the Austrian Science Funds for supporting this work (Project P30031).

Author information

Authors and Affiliations

Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
Aliyu Musa, Shailesh Tripathi & Frank Emmert-Streib
Institute of Biosciences and Medical Technology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
Aliyu Musa, Olli Yli-Harja & Frank Emmert-Streib
Department for Biomedical Computer Science and Mechatronics, UMIT - The Health and Lifesciences University, Eduard Wallnoefer Zentrum 1, 6060, Hall in Tyrol, Austria
Matthias Dehmer
College of Computer and Control Engineering, Nankai University, Tianjin, 300350, P.R. China
Matthias Dehmer
Computational Systems Biology Lab, Tampere University of Technology, Korkeakoulunkatu 10, 33720, Tampere, Finland
Olli Yli-Harja
Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Wehrgrabengasse 1-3, 4400, Steyr, Austria
Shailesh Tripathi & Matthias Dehmer
Institute for Systems Biology, Seattle, WA, 98109, USA
Olli Yli-Harja & Stuart A. Kauffman

Contributions

A.M., S.T. and F.E.S. conceived the study and conducted the analysis, A.M., M.D. and F.E.S. interpreted the results. All authors wrote the manuscript.

Corresponding author

Correspondence to Frank Emmert-Streib.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

LaTeX Supplementary File

Supplementary Dataset 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Musa, A., Tripathi, S., Dehmer, M. et al. Systems Pharmacogenomic Landscape of Drug Similarities from LINCS data: Drug Association Networks. Sci Rep 9, 7849 (2019). https://doi.org/10.1038/s41598-019-44291-3

Received: 12 October 2018
Accepted: 08 May 2019
Published: 24 May 2019
DOI: https://doi.org/10.1038/s41598-019-44291-3
Springer Nature Limited
