Mechanism-based risk assessment is the gold standard for safety assessment of drugs and chemicals, but is resource- and time-intensive (Lanzoni et al. 2019). Establishing mechanisms generally requires extensive experimentation, leveraging literature data collected over years, and is often based on preclinical (animal) studies, which may or may not translate to human (Bailey and Balls 2019; Clark and Steger-Hartmann 2018; WHO-IPSC 2018). Approaches that rapidly reveal mechanistic detail of preserved responses between animal and human would advance next generation risk assessment and impact both predicting and monitoring toxicity (Rivetti et al. 2020). For example, early in development, drugs that cause preclinical toxicity are often discarded based on preclinical studies without knowing if a toxicity mechanism will translate to human, thus valuable compounds may be discarded. In addition, screening for safer compounds is hampered if testing assays do not interrogate compounds against a mechanism-related event and results are not interpreted in the context of organ-specific toxicity.

Liver is a primary target organ for drugs and chemical toxicants due to its role in metabolism and disposition (Zhang and Venkat 2020). Drug-induced liver injury (DILI) is a major cause of clinical liver failure (Björnsson 2019; Reuben et al. 2016), drug attrition and black box warnings (Onakpoya et al. 2016; Solotke et al. 2018; Watkins 2011). DILI manifests as a variety of clinical pathologies and may depend on genetic and environmental factors making its prediction a challenge for the current preclinical testing paradigm (Koido et al. 2020). Likewise, hepatotoxicity and hepatocarcinogenesis are major concerns for environmental exposures (Colombo et al. 2019; Yorita Christensen et al. 2013). Both non-genotoxic and genotoxic carcinogens often produce hepatoxicity prior to the emergence of liver tumors in longer term preclinical studies (Karin and Dhar 2016). In both situations, hepatotoxicity can be regarded as a multistep, multicellular disease process, where an initial molecular stress is followed by a series of cellular key events that couple the initial stress to an apical endpoint observable as a pathology, e.g. hepatotoxicity or liver tumors. However, in the absence of a mechanism that links cellular (stress) events to a pathology, risk assessment is typically based on the apical endpoint which can take months or years to develop.

Toxicogenomics, the study of transcriptome responses in toxicology, is a promising tool for a comprehensive analysis of toxicity to enable mechanism-based risk assessment (Liu et al. 2019). Current toxicogenomic approaches mainly rely on the analysis of Differentially Expressed Genes (DEGs) or enrichment analysis using annotated pathway analysis tools (Barel and Herwig 2018). Such an approach depends on ontologies with a high degree of redundancy and capture current knowledge of toxicology and general biology. These approaches are useful but biased toward genes that are well annotated and are not designed to interpret mechanisms of toxicity applied to specific testing systems (e.g. hepatocytes). For these and other reasons, and despite several decades of research, toxicogenomics has not produced the anticipated transformation of current safety assessment standard practice (Vahle et al. 2018).

It is known that groups of genes expressed downstream of a (stress-responsive) transcription factor will show co-expression (Yin et al. 2021). For this reason, gene co-expression analysis has been applied to toxicogenomic datasets for rat liver (Podtelezhnikov et al. 2020; Sutherland et al. 2018), but not for human hepatocytes up to now. These co-expressed gene sets mediate the cellular response to stress, providing mechanistic information on the cellular processes and key events involved in adaptation and progression. Herein, we used weighted gene co-expression network analysis (WGCNA) to identify sets of co-expressed genes (termed ‘modules’) using the large TG-GATEs toxicogenomic dataset for primary human hepatocytes (PHH) as a surrogate to address hepatotoxicity and gene expression in the human context (Igarashi et al. 2015; Zhang and Horvath 2005). Gene modules were deployed in an R-Shiny analysis framework accessible via an interactive website, the PHH TXG-MAPr ( to facilitate rapid visualization and mechanistic interpretation of transcriptomic data. Gene modules were annotated with external annotation resources (e.g. pathway enrichment and transcription factor modulation), which are useful in identifying known stress response pathways and modules populated by novel genes responsive to specific stressors. Using endoplasmic reticulum (ER) stress as a case study, we show that canonical ER stress response genes are captured in gene co-expression modules, including novel guilt-by-association genes. We identified ER stress as an early event in cyclosporine A (CSA)-induced toxicity and showed that ER stress modules respond similarly between CSA and the prototypical ER stressor, tunicamycin. ER stress and other known stress response pathways, which were captured in the PHH modules, are conserved across species and test systems. Stress response modules correlate in clusters with defined biological functions triggered by compound exposure. We showed that datasets of different sources can be analyzed with the PHH TXG-MAPr tool for mechanistic interpretation. Finally, we leveraged our PHH WGCNA modules to evaluate donor-to-donor variability upon tunicamycin exposure and identified biological sources of variations. Our results show that co-expression analysis of toxicogenomic data using the PHH TXG-MAPr framework can support human-relevant next generation mechanism-based risk assessment.

Materials and methods

Gene expression data processing

TG-GATEs and GEO microarray data

Microarray data from primary human hepatocytes (PHH) in the TG-GATEs repository were downloaded ( and jointly normalized using the Robust Multi-array Average (RMA) method within the affy R package version 1.52.0 (Gautier et al. 2004). To map probe sets to Entrez IDs, the BrainArray chip description file (CDF) version 20 was used (, HGU133Plus2 array version). Under this annotation, every gene is defined by a single probe set. This resulted in 19,363 unique probe sets, each mapped to a single gene. The TG-GATES (TG) repository contains 941 PHH treatments, where each treatment is defined as a combination of compound, time and concentration. Samples treated with vehicles are available for each individual time point, thus resulting in batch effect removal (Grinberg et al. 2014). For each compound/timepoint combination, the limma R package version 3.30.13 (Ritchie et al. 2015) was used to calculate log2 fold-change values, which was performed by building a linear model fit including concentration levels in the design as covariate and computing the log-odds of differential expression by empirical Bayes moderation. Significance of log2FC was determined by using Benjamini–Hochberg multiple testing correction. TG-GATEs rat primary hepatocytes (RPH) and rat in vivo liver datasets have been analyzed following the same steps using suitable BrainArray CDFs (version 19, Rat2302 array version). Additional human hepatocytes datasets for uploading into the TXG-MAPr tool were first downloaded from Gene Expression Omnibus (GEO: GSE83958; GSE45635; GSE53216; GSE74000; GSE13430; GSE104601. The first four datasets were processed similarly to the TG-GATEs PHH dataset, because the samples were analyzed with the same microarray platform (Affymetrix HGU-133 Plus 2). The GSE13430 and GSE104601 datasets were generated from the Agilent Human 1A Microarray (V2) G4110B and SurePrint G3 Human GE v2 8 × 60 K microarray platforms, respectively. These datasets were analyzed with GEO2R using the provided NCBI gene annotation. In the case of duplicated measurements (probes) for a single gene, the most significant probe (adjusted p value) was used for uploading the data.

TempO-Seq S1500 + set PHH data

The inter-individual variability in chemical-induced ER stress signaling and module activation was evaluated by using available TempO-Seq data of PHHs derived from 50 individuals which were exposed to a wide concentration range of tunicamycin (Niemeijer et al. 2021; Mav et al. 2020). In short, plateable cryopreserved PHHs derived from 50 individuals (KaLy-Cell, Plobsheim, France) were plated in 96 wells BioCoat Collagen I Cellware plates (Corning, Wiesbaden, Germany) at a density of 70,000 viable cells per well. Cells were allowed to attach for 24 h and treated with tunicamycin (Sigma) at a concentration range of 0.0001–10 µM for 8 or 24 h. After exposure, cells were lysed with 1 × TempO-Seq lysis buffer (BioSpyder) and stored at − 80 °C. Lysates were analyzed at BioSpyder (Carlsbad, CA, USA) using the TempO-Seq technology in combination with the S1500 + gene set (Mav et al. 2018). Data analysis is reported in Mav et al. 2020, here briefly summarized: experiments having library size of raw counts lower than 100,000 counts were filtered out. Raw counts were normalized with DESeq2 (R package version 1.28.1) normalization (Love et al. 2014), log2 transformed and analyzed with BMD express version 2.3 (Phillips et al. 2019).

Weighted gene co-expression analysis

To identify co-expressed genes from the PHH data, we used the WGCNA R package version 1.51 (Langfelder and Horvath 2008) and applied it to a matrix consisting of 941 rows (PHH experiments) and 17,500 columns (log2 fold change values for genes). Genes were included into the analyses when they showed FDR (BH) < 0.001 in at least one experiment. We created unsigned gene modules (i.e. grouping together co-induced and co-repressed genes), and selected the optimal soft power threshold maximizing both the scale-free network topology using standard power-law plotting tool in WGCNA (Langfelder and Horvath 2008) and the exclusion of genes having low gene intensity in the DMSO controls (determined via t-tests). We selected 5 as the optimal soft-power parameter. By further refining modules built using WGCNA function to merge similar modules (those having correlation of their eigengene values ≥ 0.8), we obtained 398 modules containing 10,275 genes. As described in (Sutherland et al. 2016), for each experiment we calculated the eigengene score (EGs, or module score) which summarizes log2 fold change of their constituent genes. Briefly, this protocol consisted of performing PCA on the gene matrix of each module, normalizing the log2FC across the entire dataset using Z-score conversion, the 1st principal component corresponds to the EGs. To simplify comparison between modules, the raw module score was normalized to unit variance (fraction between each module score and its standard deviation across the entire dataset) facilitating comparison across modules and across treatment. Therefore, the modules score indicates the level of activation or repression induced by a given treatment, when considering such changes in the context of the large collection of drug perturbations. Represented as Z-score, an eigengene score greater than + 2.0 or smaller than − 2.0 can be considered as a large (and relevant) perturbation in the context of the 941 experiments Although the EGs represent the overall activation or repression of the module gene members, each gene is ordered based on intra-modular connectivity with other genes. For every gene, we calculated the correlation between the log2 fold change versus the eigengene score of the parent module across the 941 experiments (termed ‘corEG’). The gene with the highest correlation between log2FC and module EGs (so called ‘hub gene’) is the most representative of the entire module matrix and show stronger connection to the other module genes. Both positive and negative correlation with the EGs, corEG, were allowed, thus genes that are inversely correlated can be members of the same module. A negative corEG indicates an inverse relationship between the gene log2FCs and its module EGs. The matrix containing EGs across all treatments was organized into a folded hierarchical tree (dendrogram), based on Ward’s hierarchical clustering of pair-wise Pearson correlations for each module across all treatment conditions. Module locations on dendrogram branches were identified with a hierarchical anti-clockwise nomenclature system (Supp. Fig. S1a). Some module branches were artificially elongated to separate module clusters and improve visualization (Supp. Fig. S1b). Compound correlation was calculated with Pearson and Spearman correlation using the module EGs across the entire set of treatment conditions (resulting in 941 instances); similarly, module correlation was calculated across all modules (resulting in 398 instances). Preservation between the modules structure obtained with the PHH TG-GATEs data set and the RPH TG-GATEs and rat in vivo liver data sets has been performed as follows: (1) rat gene IDs have been converted to human gene IDs with the Rat Genome Database, which provided one-to-one ortholog matches (, Smith et al. 2020), (2) preservation statistics have been calculated with the WGCNA R package and thresholds for interpretation were adopted by relevant literature (Langfelder et al. 2011). Briefly, a module showing Z summary >= 2 is considered moderately preserved, >= 10 highly preserved. A lower median rank indicate higher preservation.

Pathway mapping and enrichment analyses

Over Representation Analysis (ORA) was performed via Consensus Pathway DB (cpdb version 34), including the following databases: BioCarta, EHMN, HumanCyc, INOH, KEGG, NetPath, Reactome, Signalink, SMPDB, Wikipathways, UniProt, InterPro (Kamburov et al. 2013). GO enrichment was obtained with the R package topGO version 2.26.0, algorithm = "classic", statistic = "fisher" (Alexa and Rahnenführer 2007). For both resources, we included in the tool enriched terms satisfying hypergeometric test p value < 0.01. However, for module interpretation (Table 1 and Table S8) only the top 10 associated terms per modules were included, that corresponded to terms having always adjusted p value (FDR) for CPDB lower than 0.05. String DB (Szklarczyk et al. 2015) was used to obtain protein–protein interaction networks of specific modules. Edge weights are proportional to the combined score of the nodes that the edges connect (Szklarczyk et al. 2015). Graphical rendering was obtained with Cytoscape (Shannon et al. 2003) using the StringApp ( bridge component.

Table 1 Overview of the properties of the modules associated with the stress response

Transcription factor (TF) enrichment and TF activity scoring

A hypergeometric test was performed on gene members in each WGCNA module to identify its regulatory TFs using the function phyper within the stat package in R (version 3.5.1). The gene set of TFs and their regulated genes (regulons) are derived from DoRothEA (version 2,, Garcia-Alonso et al. 2019) with two sets of confidence levels: the “high confidence” level comprises categories A, B and C while the “high coverage” level comprises categories A, B, C and D. The enriched TFs with p value less than 0.01 were included in the study. In parallel, TFs’ activities were estimated as normalized enrichment scores using the function viper from the viper package (version 1.16.0, Alvarez et al. 2016) with two confidence sets of TF-regulon from DoRothEA as described. All parameter settings were assigned as in the original DoRothEA study (Garcia-Alonso et al. 2019).

Web application TXG-MAPr

The user interface application of the TXG-MAPr tool has been implemented using the R-shiny package version 1.6.0 (Chang et al. 2021). The graphical part of the application has been implemented through the functionalities of the ggplot2, ape, igraph, hclust and pheatmap R-packages (Csardi and Nepusz 2006; Kolde 2019; Paradis and Schliep 2019; Wickham 2016). The credential system to log into the PHH TXG-MAPr tool is established using the shinyauthr package ( The PHH TXG-MAPr tool is available at

Uploading new data files into TXG-MAPr

In order to calculate new EGs for each module from external data, first a modified Z-score is calculated by dividing the gene log2FC by the standard deviation of the gene log2FC across all TG-GATEs conditions, and is further weighted by the correlation eigengene score (corEG) (Eq. 1). New EGs for each module are calculated by summing the Z-scored gene log2FC values of the module genes (n = number of genes in a module), normalized by the standard deviation of the raw module score in the TG-GATEs dataset (Eq. 2). When data of certain genes in a module is not available in the uploaded dataset, then the Z-score for that gene is assumed to be zero, which may create an underestimation of the final module EGs:

$$Z{\text{-score}} = \frac{{\log 2{\text{FC}}}}{{{\text{SD }}(\log 2{\text{FC}})}} *{\text{ corEG,}}$$
$${\text{EGs}} = \frac{{\sum\nolimits_{k = 1}^{n} {Z{\text{-score}}} }}{{{\text{SD }}\left( {\sum\nolimits_{k = 1}^{n} {{\text{all }}Z{\text{-score}}} } \right)}}.$$

The new module EGs will be overlaid onto the PHH TXG-MAPr dendrogram and will be fully integrated into the web application for that particular session. Data will be removed when the session is closed.

Data analyses

Cluster correlation of the expanded seed module set

Figures were made using R and the packages ggplot2, pheatmap, igraph. Heatmap in Fig. 3 is obtained with the following steps: (1) selection of ‘seed modules’ i.e., modules having clear interpretation based on term enrichments and TFs associations (see Table 1). Because TG-GATEs data is a sparse matrix, i.e. not all the compounds have been tested at all concentration levels and time points, we removed the 2-h experiments (missing for roughly 50% of the compounds) and applied imputation of the remaining missing value with Singular Value decomposition via the R package pcaMethods using the svdImpute function (Stacklies et al. 2007; Troyanskaya et al. 2001). (2) Pearson correlations between seed modules and all other preserved modules were calculated considering all remaining EGs from TG-GATEs (119 modules × 779 experiments). Modules were added in (termed ‘add-in modules’) if the (seed x target) correlation values were ≥ 0.7 to yield the ‘expanded set’ of 87 modules. Hierarchical clustering was obtained by applying the Ward D2 method, using 1-cor as distance. In particular, we applied the WardD2 method, including the full implementation of the Ward clustering criterion. Note that the results of Ward’s agglomerative clustering are likely to delineate clusters that visually correspond to regions of high densities of points in PCA ordination, e.g. produce ‘round’ clusters (Murtagh and Legendre 2014).

Targeted genes set analysis (S1500 +)

To evaluate the quality of the genes belonging to the BioSpyder TempO-Seq S1500 + set, we tested for difference of mean absolute corEG with a random draw of genes from the complete pool of genes available in the TG-GATEs dataset. The randomly selected gene set have the same size as the overlap of the S1500 + set with the TG-GATEs gene set (n = 1830) and this process was repeated 10,000 times (Wilcoxon test), adjusting the p value with Bonferroni method. The heatmap in Fig. 5 was obtained with the “complete” clustering algorithm applied to the Euclidean distance between samples. Donors module clusters were obtained with the same method as for TG-based module clusters (see paragraph above). The overlap between TG module clusters and donor module clusters was calculating using the WGCNA R package (Langfelder and Horvath 2008) using the overlap function, which calculates the numerical overlaps between groups and quantify the significance by applying Fisher exact test. Donor traits (phenotypes) associations with modules responses after treatment of donor hepatocyte cultures were determined as previously described in (Sutherland et al. 2018). In brief: (1) different concentrations were analyzed separately, and only after 24 h of exposure (generally higher transcriptomic changes). (2) For cluster to traits associations, cluster behaviors were calculated as mean of the EGs scores of the modules’ members of each cluster. Since only modules with high correlation populate a cluster, we expect that averaging their EGs would lead to a concordant score. To understand driver-modules for each association, we calculated module-to-trait relationships. (3) The association between cluster (or module) and the occurrence of a donor trait was quantified using Cohen’s d, a measure of effect size. Since the average absolute module eigengene (avgAbsEG), a measure of overall transcriptional activity, can have an effect size for traits associations, logistic regression was performed to determine the contribution of a given module in explaining the residual odds of toxicity after accounting for avgAbsEG, and the significance of the module represented as adjusted p values (padj).


The PHH TXG-MAPr: a tool to visualize and explore biological interpretation in primary human hepatocyte transcriptomic data

We applied WGCNA to the TG-GATEs primary human hepatocytes (PHH) dataset (Igarashi et al. 2015), which includes exposure data for 158 compounds at various time points and concentration levels to identify the predominant networks of co-expressed genes (Fig. 1a exemplifies the process for a subset of co-expressed genes). We obtained 398 networks (modules) of highly correlated genes. Modules are labeled with a number inversely proportional to the number of member genes; larger modules have smaller number labels. An eigengene score (EGs) was computed for each module as the first principal component of each module data matrix and further scaled (see Material and Methods). Thus, the EGs represents the trend (induction or repression) of the entire module based on the included genes. By performing WGCNA, the final data matrix was reduced to 941 columns (treatments) and 398 rows (module EGs) which corresponds to a 97.7% reduction in dimensionality of gene expression.

Fig. 1
figure 1

The PHH TXG-MAPr: an innovative tool to visualize and understand PHH toxicogenomic data. a Example gene expression data matrix. Log2FC values of genes (rows) are shown for multiple treatment conditions (columns). Four groups of co-expressed genes (modules PHH:13, 62, 118 and 144), highlighted with boxes and color, show consistent patterns across the experimental conditions and are exemplified with the gene networks on the right side. Treatments are clustered using Euclidean distance to group conditions that regulate the same groups of co-expressed genes. b Hierarchical tree view (dendrogram) of module scores at 24 h cyclosporine A exposure at HI dose level (6 µM). Highlighted modules are PHH:13, PHH:62 and PHH:118 which are strongly induced by CSA (orange/red circles), while module PHH:144 is unchanged (yellow circles). c Comparing modules and compounds. Top: gene log2 fold change (log2FC) for cyclosporine A (HI dose = 6 µM, 24 h) are plotted against gene log2FC for tunicamycin (MED dose = 2 µg/mL, 24 h), showing a Pearson correlation of 0.74. Center: module EGs for cyclosporine A are plotted against module EGs for tunicamycin, showing a Pearson correlation of 0.84. Bottom: EGs for module PHH:62 are plotted against EGs for module PHH:13 (all compounds, concentrations and time points) and show a Pearson correlation of 0.62. Straight blue line represents a fitted linear model. Gray shades represent confidence interval. d Grouping genes into modules. Top: gene network for module 62. Colors qualitatively represent log2FC upon treatment with cyclosporine A at HI dose (6 µM) for 24 h. Edges thickness is proportional to the adjacency value among the genes. The squared gene is the hub-genes. Center: module EGs profile for module PHH:62 at different cyclosporine A concentrations and time points (LO = 0.24 µM, MED = 1.2 µM, HI = 6 µM). Bottom: gene log2FC profile for hub-like genes belonging to module 62 at different cyclosporine A concentrations and time points. Color code of points and edges represents the correlation eigengene (corEG). e Annotating the modules. Top: Hierarchical tree view with module color and size proportional to the − log10 p value for the enrichment for the GO term “response to endoplasmic reticulum stress” (GO:0006520). Significantly enriched modules PHH:13, 15 and 62 with p values (< 0.01) are highlighted in blue. Center: Hierarchical tree view with module color and size proportional to the − log10 p value for the enrichment for the transcription factor ATF6. ATF6 enriched modules PHH:13 and 193 with significant p values (< 0.01) are highlighted in purple. Bottom: ATF6 activation score considering all its targets for cyclosporine A (LO = 0.24 µM, MED = 1.2 µM, HI = 6 µM) and tunicamycin (LO = 0.4 µg/mL, MED = 2 µg/mL, HI = 10 µg/mL) (color figure online)

To facilitate the analysis of transcriptomic information from module responses, we developed a module-based R-Shiny visualization and analysis framework, the PHH toxicogenomic (TXG) MAPr (PHH TXG-MAPr). The EGs-treatment data matrix was organized into a folded hierarchical tree, or dendrogram, based on Ward’s hierarchical clustering of pair-wise Pearson correlations for each module across all treatment conditions (Fig. 1b, Supplemental Fig. S1a, b). The dendrogram allows one to visually appreciate the induction or repression of modules EGs (red to blue color scale, respectively) and the proximity to modules with similar behavior (Pearson R; see Supp. Fig. S2 for dose– and time–response dendrograms for example compounds). We also incorporated module enrichment measures for biological pathways and ontologies, transcription factor-target pairs, compound similarity analysis and a variety of other useful functions for mechanistic analysis and data mining.

To illustrate the utility of the PHH TXG-MAPr analysis framework, we investigated the response of cyclosporine A (CSA), which was transcriptionally active in PHH and has both mild hepatotoxic and more severe renal toxicity potential (Rezzani 2004). Strong induction of modules PHH:13 and PHH:62 can be seen 24 h after treatment with 6 µM CSA (Fig. 1b). Using the compound correlation functionality, we sampled all pair-wise Pearson correlations for all 398 module EGs across all treatment conditions; CSA shows highest similarity to tunicamycin (Pearson R = 0.84), suggesting a common mode of action (Fig. 1c, center). Modules at the extremities of the correlation plot (PHH:13 and PHH:62 in the example) can be highlighted and tabulated to facilitate further analysis. Correlation between gene log2FC from the same treatment conditions had lower similarity (Pearson R = 0.74; Fig. 1c, top). On average, across all TG-GATEs compounds the EGs-based max Pearson correlations were higher than max correlation based on gene log2FC (Supp. Fig. S1c), suggesting that analyzing gene networks (modules) improves the robustness of compound comparison of transcriptomic data by averaging out the variations of individual gene perturbations. Although PHH:13 and PHH:62 are in different branches they did show a Pearson correlation of 0.62 for all TG-GATEs treatment conditions (Fig. 1c, bottom).

Tunicamycin (TUN) is a prototypical inducer of ER stress while there is literature reporting that CSA also perturbs ER functions (Foufelle and Fromenty 2016; Van Summeren et al. 2013; Vickers et al. 2017). Therefore, we investigated modules PHH:13 and PHH:62 for genes associated with ER, ER stress and unfolded protein response (UPR). Module PHH:13 contains 121 genes, including well-known ER stress genes, like HSP90B1 (GRP94), SEL1L, PDIA6, HSPA5 (BiP/GRP78) and its co-chaperone DNAJB9 (Suppl. Table S2). Since PHH:13 was too large to visualize in a gene interaction plot, Fig. 1d (top) shows the module plot for the smaller module PHH:62 (19 genes). This module contains genes involved in endoplasmic reticulum-associated degradation (ERAD), such as SELENOS and SELENOK, but also other ER resident genes with less clear connection to ER stress yet, such as FKBP2, SEC11C and SSR4 (Fig. 1d, top). There is a clear time- and dose-dependent induction of module PHH:62 by CSA, which follows a similar trend for the log2FC induction of the hub-like (highest corEG) module genes (Fig. 1d, center and bottom).

Using the PHH TXG-MAPr module enrichment functionalities, we investigated the enrichment of biological processes and pathways (see Methods) as well as transcription factors (TFs) enrichment using the DoRothEA gene set resource (Garcia-Alonso et al. 2018). Results were overlaid on the module dendrogram (Fig. 1e). Not surprisingly, modules PHH:13 and 62 showed the lowest p-values for GO-CC term “endoplasmic reticulum” as well as terms related to endoplasmic reticulum (ER) stress amongst others (Fig. 1e, top). Module PHH:13 was also enriched for transcription factor ATF6 target genes (Fig. 1e, center) consistent with the presence of more highly annotated ER stress genes. To determine if PHH:13, the primary module enriched for ATF6 gene targets, reflected general ATF6 activation, we calculated the ATF6 activation scores for all ATF6 target genes in the 10275 TXG-MAPr genes (see more details on DoRothEA in the “Materials and Methods” section) and we observed that ATF6 scores also showed time- and dose-dependent activation for both CSA and TUN (Fig. 1e, bottom).

Using the PHH TXG-MAPr analysis framework, we were able to rapidly identify the activation of an ATF6 regulated ER stress response as an early event following cyclosporine A exposure. Data underpinning these functionalities, and others not noted here, are accessible in a tabular format in the supplementary materials (Supp. Tables S1–S7). The dedicated PHH TXG-MAPr application is available at

Module preservation in the PHH-TXG-MAPr

Preservation statistics can be used to determine if networks’ node-edge relationships defined in one biological system are preserved in another (Langfelder et al. 2011). We evaluated network preservation of PHH modules versus rat primary hepatocyte (RPH) and in vivo rat liver TG-GATEs datasets using two different preservation statistics: Z-summary and Median Rank (see Methods, Fig. 2a, b, Supp. Table S8, Supp. Fig. S3). Z-summary preservation statistic shows higher dependency on modules size, as noted in literature; large modules such as PHH:13 are highly preserved (Fig. 2a, color scale) (Langfelder et al. 2011). The complementary Median Rank statistic (Fig. 2b) is less dependent on module size, and can assign high preservation ranking (quantified with low numerical scores) to small modules. Therefore, we considered preserved modules as those with Z-summary preservation score for both systems (RPH and rat liver) higher than 2 or ranked in the top 100 modules for the Median Rank statistics for both systems (for the latter, arbitrary threshold corresponding to 25% of all the PHH modules). This selection led to a set of 102 (26%) preserved modules from the PHH TXG-MAPr (Supp. Table S8), which include the UPR modules PHH:13 and PHH:62.

Fig. 2
figure 2

Some PHH WGCNA modules are preserved in rat systems and connect to stress response pathways. a Z summary preservation score plot. Z summary preservation values of PHH modules in Rat in vivo Liver data (x-axis) are plotted against Z summary preservation values of PHH modules in Rat Primary Hepatocytes data (y-axis). Modules are colored based on their size (log10 transformed) and modules PHH:13 and PHH:62 are labeled. Higher scores imply better preservations. The dashed lines correspond to Z summary = 2 and Z summary = 10, above which a module can be considered, respectively, moderate and high preserved. b Median Rank preservation score plot. Median Rank preservation values of PHH modules in Rat in vivo Liver data (x-axis) are plotted against Median Rank preservation values of PHH modules in Rat Primary Hepatocytes data (y-axis). Modules are colored based on their size (log10 transformed) and modules PHH:13 and PHH:62 are labeled. Lower scores imply a higher rank and greater preservation. c Dose- and time-response EGs plots of modules representing stress response pathways. Modules are grouped by stress processes (roman letters) and their responses upon treatment with representative compounds is shown in dose–response (x axis), faceted by time. Modules are represented by different symbols, while compounds by different colors (color figure online)

Stress response pathways represented by PHH modules

Pathway information facilitates mechanism-based risk assessment (Krewski et al. 2020). Therefore, we used the TXG-MAPr enrichment functionalities to identify modules representing cellular stress response pathways typically involved in toxicity and activated by stress-responsive TFs. These include the ER stress response and the larger integrated stress response (ISR), as well as inflammatory pathway, mitochondrial response, oxidative stress and the DNA damage responses, all of which were previously shown to be relevant for identifying the modes of action of compound-induced toxicity (Wink et al. 2014). We focused on modules which (1) were identified by consistent enriched terms and TF associations with p values < 0.01 and FDR < 0.05, and (2) show response to prototypical compounds (Fig. 2c, Table 1). Below are the selected stress responses pathways represented by PHH modules which we aim to highlight in this study:

ER stress and ISR

As noted above, modules PHH:13 and PHH:62 are enriched for terms associated to endoplasmic reticulum localization, UPR and ERAD as well as transcriptional regulation by ATF6, all of which are components of the larger ISR (Hetz et al. 2020; Ron and Walter 2007). We also looked for modules regulated by ATF4, a transcription factor downstream of the ISR hub, and IF2α, a regulator of translation after stresses including ER stress, starvation, and viral infection (Pakos‐Zebrucka et al. 2016). Modules PHH:15 and PHH:295 were enriched for ATF4 gene targets, and terms associated with amino acid metabolism and transport, respectively (Table 1), which are processes that are regulated by ATF4 (Hetz et al. 2020). All four modules respond to cyclosporine A and tunicamycin (Fig. 2c, panel i).

Heat-shock, proteasome and lysosome

In response to proteotoxic stress, damaged proteins are bound to members of heat shock-inducible chaperone system (heat shock response), which facilitate removal of damaged proteins by lysosomes and/or proteasome degradation. Modules PHH:177, PHH:76 and PHH:82 (proteasome) and PHH:131, PHH:95 (heat shock) are preserved and respond to treatment with allyl alcohol (Mandrekar et al. 2008) (Fig. 2c, panel ii, Table 1). Module PHH:17 (annotated for GO:CC lysosome and includes SQSTM1) is also found to be preserved and is activated by fluoxetine, a known phospholipidosis inducer (Breiden and Sandhoff 2019) (Fig. 2c, panel ii, Table 1).

Immune response

Immune response pathways including inflammatory mediators are activated in hepatocytes during liver injury and in disease states (Campos et al. 2020; Woolbright and Jaeschke 2017). Immune response and inflammation terms are enriched in several preserved modules (PHH:12, PHH:247, PHH26, PHH:22 and PHH:136) that respond to inflammatory agents such as interferon-α and TNF-α (Fig. 2c, panel iii, Table 1). Modules PHH:242, PHH:44 and PHH:70 are also annotated for immune response but not preserved (Table 1). Compound regulation and module enrichment terms identify subgroups of immune response modules. STAT signaling modules (PHH:12, PHH:247) are strongly induced by interferon, while NF-kB signaling modules (PHH:22, PHH:136) are mainly induced by TNFα. Module PHH:26 is induced by both stimuli, but also has a mixed NF-kB and STAT annotation.


Mitochondria are an important target for hepatotoxic chemicals and mitochondrial injury is commonly used to screen for hepatotoxic potential (Rana et al. 2019; Weaver et al. 2020; Yang et al. 2015). Mitochondrial damage is commonly assessed by changes in respiration and the mitochondrial membrane potential. Less is known about the regulation of genes coding for mitochondrial proteins in response to toxic stress. Several preserved PHH modules are found to be annotated with mitochondria-related and mitochondria-component specific terms (PHH:113, PHH:138, PHH:2, PHH:256, PHH:33, PHH:97). These modules respond positively to doxorubicin (Osataphan et al. 2020) and are found to be repressed by ethionine (Zhang et al. 2020) (Fig. 2c, panel iv, Table 1). Notably, PHH:33 and PHH:97 were annotated by terms relating to regulation of mitochondrial ribosome translational control and mitochondrial organization. Module PHH:173 shows enrichment for mitochondria genes, but is not preserved (Table 1).

DNA-damage and oxidative stress

Modules PHH:59 and PHH:83 are enriched for TP53-regulated genes and annotated for terms consistent with the DNA damage response (DDR). The NFE2L2 (NRF2) associated modules PHH:144 and PHH:325 include key genes such as SRXN1 and NQO1 which are direct targets of NRF2, and enrich for terms such as oxidoreductase and glutathione metabolism. In contrast to the other preserved modules from Table 1, modules associated with NRF2 and TP53 activation are not well-preserved, but these modules do respond as expected to specific inducers consistent with interpretation as oxidative stress and DNA damage modules, respectively (Fig. 2c, panels v and vi and Table 1). Only modules PHH:243 (includes GADD45A/B) and PHH:337 (associated with NFE2L2) are found to be preserved (Table 1).

The examples described above exemplify the utility of PHH TXG-MAPr for identifying modules relevant to known stress pathway useful for mechanistic interpretation and benchmarking preservation of co-expression across species and liver in vivo. Other cellular processes and pathways not discussed can be identified using the main enrichment terms and TFs associations of the whole preserved module set (Table S8).

PHH stress response module seeds cluster in an interaction map to describe mechanisms of toxicity

In toxicity pathways, early events are coupled to a cascade of perturbations that lead to an adverse outcome, e.g. cell death. Using the stress pathway modules as seeds, we created an interaction map (cluster correlation) between stress pathways and other highly preserved modules that cluster important cellular processes as well as preserved modules without clear annotation and identify off-diagonal interactions between important biological themes.

We used the well-annotated modules identified above (Table 1 and Table S8) as seeds, and identified preserved ‘add-in’ modules (with or without clear enrichment) if they had a Pearson correlation ≥ 0.7 with a seed module, calculated across all TG-GATEs PHH compound treatments. The final expanded set of 87 seed and add-in modules is in Table S8, identified by the column “Cluster”. Figure 3 shows a cluster correlation analysis of the expanded seed set module EGs based on all TG-GATEs compounds, together with representative compound responses. Eight distinct clusters emerged with modules annotated for similar processes forming macro groups; examples are described below. Note that within each group, modules with limited to no annotation have been included, suggesting they belong to the same response group.

Fig. 3
figure 3

PHH stress modules interaction map. Cluster correlation matrix of the 87 modules correlating with well-annotated modules. On the left, modules are hierarchically clustered with Ward D2 algorithm using Pearson correlation (red–blue color scale) as distance. On the far left, the preservation status of each module is indicated with gray color scale (black—preserved, gray—not preserved). Clusters of modules with concordant annotation are highlighted with dashed squares. On the left, EGs (purple–orange color scale) for the exemplar compounds tunicamycin, cyclosporine A, Acetaminophen, Omeprazole are shown for the 87 modules, in dose– and time–responses. On the far right, modules names are shown together with their main annotation (available for the seed modules), in red highlighted the modules show in Table 1 (color figure online)

Cluster 1: immune response

Cluster 1 includes all the seed modules annotated for immune response pathway terms (Table 1), except PHH:247 (cluster 8). Add-in module PHH:317, which showed no annotation terms, contains an interleukin receptor (IL18R1), but also some genes induced by inflammatory molecules, for which little is known regarding involvement with immune response (GSAP, CSTO). Similar to the observations obtained with TNF-α and interferon-α, inflammatory modules clusters into two subgroups, one enriched with NF-κB modules, the other in STAT regulated modules.

Cluster 5: mitochondria

Cluster 5 contained seed modules annotated with mitochondria related terms, but also add-in modules with less clear annotation. Nevertheless, when looking for protein–protein interactions among genes belonging to this group of modules, high degrees of connections are found for a large subgroup of genes, connecting within modules and showing high corEG (Wilcoxon rank sum test p = 7.72E-07, Fig. S4).

Cluster 3, 4 and 8: cellular stress responses

The above characterized stress modules, like oxidative stress, ER stress response, heat shock and DDR, clustered in isolated sub-groups within clusters 3, 4 and 8. However, these clusters also included other processes less commonly associated with prototypical stress responses. For example, Cluster 3 contained the seed proteasome module (PHH:76), a module containing two growth arrest and DNA-damage inducible (GADD) genes, GADD45A and GADD45B (PHH:243), the ATF4-regulated modules (PHH:15 and PHH:295), a module containing several inducible heat shock protein transcripts (PHH:131) and a module containing two NRF2 target genes, MAFG and MXD1 (PHH:337). Cluster 3 also contained two RNA processing modules (PHH:42, PHH:19) associated with the nucleolus/ribosome biogenesis and mRNA processing, respectively. Cluster 8 contains the seed DDR modules (PHH:59 and PHH:83), both of which contain multiple TP53 regulated genes, and a seed mitochondrion module (PHH:2) along with modules enriched for fatty acid/cholesterol metabolism modules (PHH:40, PHH:240) but also cellular homeostasis modules, like PHH:1 (cell cycle), and PHH:11 (metabolism).

The expanded interaction map describes the interplay between cellular processes in response to chemical treatment

The interplay across module clusters appeared to capture elements of the dynamic interactions among biological response networks (off-diagonal correlation in Fig. 3). Clusters can interact positively, e.g. the mitochondria cluster 5 and the RNA processing cluster 6 (which contains both ribosome RNA (rRNA) and mRNA processing modules), or negatively, e.g. activation of stress cluster 3 is commonly associated with down-regulation of modules in cluster 8 (cell cycle and mitochondrion). Interestingly, half the modules, largely those annotating for NF-κB in immune response cluster 1, showed positive correlation (induced) with activation of processes in clusters 2 and 3, while the other half, including the STAT modules, were inversely correlated and down-regulated. Cluster 1 showed a similar bifurcation with cluster 8 but in the opposing direction, i.e. the NF-κB modules positively correlated with clusters 2 and 3 were negatively correlated with cluster 8 while the STAT modules showed a positive correlation.

An integrated understanding of interaction among complex cellular processes is critical for a mechanism-based risk assessment and requires identifying key event associations with an apical endpoint. Compound with higher versus lower risk may perturb more cellular processes in progressing to cytotoxicity while others may have a more adaptive phenotype. Evidence for progressive perturbation (or lack thereof) can be derived within the module cluster correlation landscape by comparing the module activation by exemplar compounds (Fig. 3, right). For example, tunicamycin is a prototypical ER stress inducer and has a low cytotoxic concentration in human hepatocyte cells and a low LD50 of 2 mg/kg in mice (Morin and Bernacki 1983; Zhang et al. 2014). Tunicamycin clearly activates the ER stress response in PHH both via the ATF6 arm in cluster 4a (PHH:13 and 62), the ATF4 arm (cluster 3) and fatty acid metabolism (cluster 8 and cluster 4), concurrently with strong repression of rRNA and mRNA processing (cluster 6) and immune response activation (cluster 1). In contrast, cyclosporine A, which is less toxic with an LD50 in mice ranging from 96 to 2803 mg/kg (depending on the route of administration (Sax 1975)), primarily triggered the ATF6-ER response (cluster 4a). Taken together, these observations suggest that tunicamycin is inducing a more severe response impacting diverse cellular biological processes, whereas cells challenged with the used concentrations of cyclosporine A showed limited module perturbations.

A different picture emerged when we considered two prototypical drugs that can induce oxidative stress: acetaminophen and omeprazole. For both, immune responses were primarily repressed (cluster 1), while oxidative stress modules (cluster 4) were activated. In addition, omeprazole activates modules in cluster 3, concurrent with a deactivation of modules in cluster 8. Specific transcription regulation modules are activated, including PHH:4 which contains several stress-induced TFs (ATF4, ATF3, NFE2L2, TP53) and the xenobiotic stress module, including CYP enzymes (cluster 4). Interestingly, PHH:358 (cluster 4) contains CYP1A1 and CYP1A2, canonical AHR targets, and omeprazole is a known AHR ligand (Safe et al. 2020). Taken together, these observations suggest that omeprazole activates oxidative stress, coupled with the activation of processes to constrain protein damage (ATF4, proteasome and heat shock modules in cluster 3) and repression of fatty acid metabolism and cell cycle progress (cluster 8).

These examples illustrate the utility of the PHH TXG-MAPr stress-pathway landscape for mechanistic evaluation of compounds and for comparing compounds with similar modes of action but with different degrees of toxicity.

Visualizing external datasets in the TXG-MAPr tool reveals good correlation between different cell types and gene expression platforms in the mode of action

Integrating historical and new datasets to derive mechanistic interpretations is important, but it is often hard to determine if joint information, such as WGCNA modules, provides a consistent view of biological responses across experiments. The high variability of gene-level analyses, and the high penalty for multiple comparisons also complicate data integration. To validate the utility of the PHH TXG-MAPr for integrating and comparing external datasets, we created a data upload function, which calculates new module eigengene scores (EGs) from the gene log2FC of the uploaded data and overlays the scores onto the module dendrogram for visualization (for further details, see Material and Methods paragraph “Uploading new data files”). To further describe CSA mode of action, we processed and uploaded samples from the GEO database of CSA prolonged repeated exposure followed by recovery in collagen sandwich cultured PHH (GSE83958 (Wolters et al. 2016)) into the PHH TXG-MAPr tool (Fig. 4a). The uploaded CSA dataset showed strong correlation (Pearson R > 0.8) in module activation and repression among the three timepoints. A heatmap of module EGs (absolute EGs > 2) separates repressed and activated modules in different clusters (Fig. 4b). ER stress annotated modules (PHH:13 and PHH:62) and an ATF4 module (PHH:15) are located in the same cluster, which are strongly activated by CSA at high dose levels (Fig. 4c). In addition, the 5 day 30 µM CSA treatment with 3 days recovery looks most similar to the CSA_24h_6 µM data from TG-GATEs based on module EGs correlation (R Pearson = 0.67, Fig. 4d). The absolute Pearson correlation between module EGs across the uploaded CSA conditions was higher than the absolute correlation between the gene log2FC (Supp. Fig. S5f). The uploaded data confirm that high dose cyclosporine A induces a strong ER stress response amongst others, which is present for several days during daily exposure and persists even after washout. A similar ER stress response (PHH:13 and PHH:62) was also present in HepG2 hepatocytes exposed to CSA (GEO dataset: GSE45635 (Van den Hof et al. 2015)) and ER stress module activation was seen at low (3 µM) and high (20 µM) dose (Suppl Fig. S5a, S6 cluster 5 red line).

Fig. 4
figure 4

Upload external data into the PHH TXG-MAPr. a Hierarchical tree view of module EGs upon 30 µM cyclosporine A exposure at 3 day (left), 5 day (middle) and 5 day + 3 day recovery (right). b Heatmap of all modules that have at least one condition with absolute EG score > 2 for cyclosporine A treatment. All concentrations and time points from TG-GATEs (0.24, 1.2 and 6 µM) and uploaded GEO: GSE83958 dataset (30 µM) are shown. Modules are clustered by Euclidean distance, Ward.D2 method. The purple color scale indicates to which cluster of Fig. 3 each module belongs, and gray color scale indicates the preservation status. c Zoom in for the third cluster obtained from the heatmap in B. Modules that have a strong annotation for ER stress (PHH:13 and PHH:62) or ISR (PHH:15) are in this module cluster and are strongly induced by 6–30 µM cyclosporine A. d Compound correlation plot comparing the module EG scores of the uploaded 30 µM CSA data at 5 day + 3 day recovery (y-axis), with CSA_24hr_6µM (x-axis) available in the PHH TXG-MAPr (Pearson R = 0.67). e Cluster correlation heatmap showing the Pearson R correlation between the conditions in TG-GATEs data (CSA, TUN, APAP) and the uploaded datasets with CSA and APAP exposures in PHH, HepG2 and HepaRG cells at various time points (see labels). Compounds cluster by mode-of-action, because ER stress inducers TUN and CSA are clustering together and show low correlation with the oxidative stress inducer APAP (color figure online)

Additional datasets of human hepatocytes (PHH, HepG2 and HepaRG) exposed the oxidative stress inducer acetaminophen (APAP) were analyzed to investigate the application of the PHH TXG-MAPr for a different compound (Suppl. Fig. S5b-e). The APAP datasets show a clear oxidative stress response (PHH:144, PHH:325) as well as many other module perturbations, which is independent of the cell type or microarray platform (Fig. S6, cluster 8 red line).

Pearson correlations between CSA and APAP treatments in TG-GATEs and the uploaded data from GEO are shown in a cluster correlation plot (Fig. 4e). Hierarchical clustering based on correlation distance between samples separates the data based on the mode of action, clustering ER stress inducers cyclosporine A and tunicamycin together and separated from oxidative stress inducer acetaminophen. In addition, different hepatocyte cell cultures (3D PHH, HepG2 and HepaRG) show good correlation with the same compound exposure, suggesting that a similar mode of action can be captured by other hepatocyte cells. GEO data of PHH exposed to APAP that was analyzed on two different Agilent microarray platforms correlate well with the TG-GATEs data at the same concentration (GEO_PHH_APAP_24h_5mM / GEO_PHH.3D_APAP_24h_5mM versus TG_PHH_APAP_24h_5mM), even though the average module coverage was 84 or 93% for the Agilent chips (e.g. only 84 or 93% of genes per module were measured in the chip). This indicates that the TXG-MAPr supports the analyses of gene expression data from different platforms/chipsets. In contrast, low dose or early time-points do not correlate well with the same compound, likely due to the minimal module perturbations in these conditions (data not shown).

Overall, the PHH TXG-MAPr upload and analysis functions are powerful tools to compare both global biological responses across disparate datasets and TG-GATEs data, while identifying specific modes-of-action, even for other hepatocyte cell types or other microarray platforms, suggesting that co-expression analysis can obviate technical issues when comparing across expression datasets or platforms.

Leveraging WGCNA to interpret targeted RNA Seq datasets

Although high throughput technologies allow many samples to be processed in parallel, the high costs of whole transcriptome profiling has led to the use of targeted RNA sequencing approaches which are gaining popularity for toxicology studies (Mav et al. 2018). We reasoned that mapping a targeted gene sets to co-expression modules might yield sufficient data to impute more detailed and biologically relevant information. We tested this concept using the PHH TXG-MAPr by uploading a targeted RNA Seq data set, TempO-Seq, obtained from treating plated cryopreserved PHH from 50 different human donors in a detailed dose–response protocol with tunicamycin at two time points (0.0001–10 µM, 8 and 24 h, Niemeijer et al. 2021). We calculated the modules EGs for each donor-concentration–time point combination, a total of 600 different conditions, following the approach detailed in “Material and methods” section.

Of the 2708 genes measured in the TempO-Seq S1500 + gene set, only 1830 genes are mapped to the PHH modules (Supp. Table S9). Genes not mapping were excluded from the PHH network during modules generation because they did not have robust co-expression patterns and are included in the PHH TG-GATEs ‘gray’ module (Langfelder and Horvath 2008). Although some modules were not covered by any genes from the S1500 + set (coverage of 0%), modules selected in Fig. 3 have significantly higher coverage (Wilcoxon test, greater alternative, p < 0.05 and Fig. 5a, b), suggesting that the preserved and well annotated modules are better represented in the S1500 + set. In addition, the coverage was higher when the hub gene was present in the S1500 + set (Fig. 5b). EGs calculation depends on both the fold change and the correlation with the eigengene score of each gene (corEG, a measure of hubness), therefore we assessed the corEG values for S1500 + genes as a gene quality metric. At the module level, the average corEG does not seem to be heavily impacted by the subsampling of gene space, and modules in the expanded seed set (Fig. 3) show higher concordance (Fig. 5c, Pearson R 0.84 versus Pearson R 0.65 including all modules). Additionally, the S1500 + set shows higher average corEG compared with random draws of genes from PHH modules (adj. p value < 0.05, see Material and Methods “Data analysis” section). We also assessed coverage within the set of the preserved modules; only two show coverage of 0%, PHH:103 and PHH:105 (Supp. Table S10). Thus, the S1500 + set represented a set of genes that overlap, to a greater or lesser extent, nearly all preserved modules and higher quality relative to their corEG values.

Fig. 5
figure 5

Upload of 50 donors PHH study S1500 + TempO-Seq set. a Histogram of modules coverage when uploading the targeted TempO-Seq gene set. Frequency (y axis) of modules percentage covered by the uploaded targeted gene set (x axis). b Percentage covered (y axis) for PHH modules, grouped on the x axis by whether they are part of the correlation matrix in Fig. 3 (blue) or not (green). The plot is divided into two sections, the first one showing modules where the hub gene was included in the uploaded gene set, the second one showing modules where the hub gene was not included in the uploaded gene set. c Mean correlation Eigengene for each module, whole genes set (x axis) versus after upload with the targeted TempO-Seq gene set (y axis). Points are colored based on whether they are part of the correlation matrix in Fig. 3 (blue) or not (green). d Pearson R correlation for each module between EGs calculated with the complete gene set, and EGs calculated with the S1500 + gene set, for all TG-GATEs experiments. Points are plotted against module coverage (x-axis) and are colored based on whether they are part of the correlation matrix in Fig. 3 (blue) or not (green). Inside: Pearson correlation calculated as previously grouped by whether they are part of the correlation matrix in Fig. 3 (blue) or not (green). Modules with coverage = 0% had been excluded. e Cluster correlation heatmap (complete clustering, Euclidean distance) showing the Pearson R correlation between modules EGs of the 50 donors dataset samples (10 µM, 24 h, pink color code on the left), TG-GATEs tunicamycin data obtained with the complete gene set (aquamarine color code on the left), and TG-GATEs tunicamycin data using only the S1500 + gene space (salmon color code on the left) (color figure online)

To test the impact of calculating EGs based on the S1500 + gene set, we recalculated EGs for all TG-GATEs samples using only the reduced S1500 + gene set and calculated the Pearson correlation per module compared to EGs calculated using the complete gene set (Fig. 5d). Modules even with minimal, but above zero, gene coverage show good EGs correlation (always higher than 0.5), and modules in the expanded seed set (Fig. 3) show on average higher correlation (Wilcoxon test, p value of 2.72e-09). Using the ER stressor tunicamycin as a model compound, we performed a cluster correlation analysis using EGs for TG-GATE PHH data, calculated using both whole chip or just the S1500 + genes, and compared results with a set of TempO-Seq, S1500 + data derived from 50 donor PHH cultures treated with tunicamycin (Fig. 5e, Supp. Figure 8). The Affymetrix and S1500 + derived scores for TG-GATEs tunicamycin treatments were always showing correlation > 0.7 and clustered together. However, there was considerable variation within the donor set at the highest dose (10 µM). A subset of donors shows higher similarity within themselves and with TG-GATEs tunicamycin samples, especially for medium concentration and when calculated with the S1500 + gene set (Fig. 5e, donors in upper left quadrant, to compared to TG-GATEs sample in the right bottom quadrant). Another subset of donors showed lower similarities within each other and lower similarity with TG-GATEs tunicamycin samples (Fig. 5e, lower right quadrant).

Thus, a targeted gene set can be used as input to derive EGs for modules derived from the whole transcriptome and impute biological responses on EGs.

Assessing donor variability captured by PHH modules

Since modules with similar biological annotations cluster together based on EGs scores (Fig. 3) we reasoned that identifying sources of variability at the level of clusters might be more robust than at the individual module level. Therefore, we clustered modules based on the S1500 + EGs of the 50-donor dataset similarly to Fig. 3, and assessed the overlap between clusters (Fig. S9). Some, but not all, modules clusters derived from the donor data show significant overlap with the groups obtained using the entire TG-GATEs set in Fig. 3 (Fig. S9b, exact Fisher p values and numerical overlaps are shown). In particular, RNA processing, stress clusters (ATF4 and ATF6) and transcriptions regulation modules are clustering similarly. The modest overlap is not surprising since the donor dataset is derived with a single compound with the PHH donors as the primary source of variability (Fig. 5e).

We then made use of modules EGs of S1500 + clusters 1–8 to study donor variability in response to tunicamycin. Donor variability can be attributable to genomic makeup, but also lifestyle, disease and age status of each individual. Consequently, we analyzed relationships between module scores, reflecting induction or repression of underlying genes, and the presence (positives) or absence (negatives) of donors’ traits, i.e. sex, the presence of cancer, liver pathology, hypertension, diabetes, smoking habit (Supp. Table S11).

We applied the methodology described in (Sutherland et al. 2018) and averaged modules EGs within each module clusters and modeling each concentration separately. We calculated Cohen’s d effect sizes (eff, (Cohen 2013)) and performed logistic regression treating avgAbsEG (average absolute cluster score) as a covariate and quantified the p value for each module in explaining the odds of toxicity (signed_log10_p_adj) reflecting the adjustment for differences in overall gene expression, i.e. avgAbsEG (Supp. Table S12). Next, we calculated the same relationship for individual modules EGs (Supp. Table S13).

Following this approach, we were able to identify modules and the underlying biological processes, which are uniquely associated with a donor’s trait upon tunicamycin exposure, and with increasing association levels in a dose–response relationship (Fig. 6). Module clusters have variable number of genes included but still show good gene quality as quantified by the high average correlation with EGs calculated with the entire TG-GATEs gene set, especially compared to modules excluded by these clusters (Figs. 5d and 6). We focused on the presence of liver pathology, that influences donor’s response based on different processes both positively and negatively (Fig. 6a). Modules annotated for DDR (PHH:83), protein folding (PHH:95) and oxidative stress (PHH:325 and PHH:144) are positively associated (induced) with liver pathology phenotype of the donor (Fig. 6a, Supp. Table S12, Supp. Fig. S10). Modules annotated for immune response had a negative association (repressed) with liver pathologies (PHH:12, PHH:26, PHH:44, PHH:22, Fig. 6a, Supp. Table S12 and S8 for module annotation). In particular, donors with liver pathologies show lower EGs for these modules, corresponding to lower log2FC values of the genes belonging to the modules, all annotated for being involved in inflammation and immune response (moderate and high negative Cohen’s d effect sizes, between − 0.4 and − 0.8, and adjusted p value < 0.05, Supp Table S13, Fig. 6b and Supp. Fig. S10c, showing genes in module PHH:12 with corEG > 0.8). To a closer inspection, genes with lower log2FCs are the result of already higher basal expression levels in PHH donors with pre-existing liver pathology (Fig. 6c, showing normalized counts for PHH:12 genes with significant difference between donors with or without liver pathologies, only with DMSO treatment, Wilcoxon test, adj. p value < 0.05).

Fig. 6
figure 6

Some PHH modules are associated with donor’s traits. a Dose–response plot of effect size and adjusted p value (log10 transformation, keeping the sign) resulting from the association of modules clusters to donors’ presence of liver pathologies. Modules clusters indicated by different color, number of donors showing (positive) and not showing (negative) the indicated trait are shown in the plot label. On the right, effect size and adjusted p value (log10 transformation, keeping the sign) of individual modules to trait associations most prominently contributing to the overall cluster associations highlighted with gray shadows. Modules are colored in different saturation level of the same hue of the cluster they belong to. b Boxplot of modules’ EGs negatively associated with liverpath, grouped by increasing concentrations and presence/absence of liver pathologies. **Adjusted p value < 0.01, *adjusted p value < 0.05 in logistic regression of individual modules. c Boxplot of normalized counts of genes belonging to module PHH:12 and significantly different between donors with or without liver pathologies with only DMSO treatment (Wilcoxon test, BH adj. p value < 0.05)

To conclude, pre-existing traits of PHH donors influence the cellular response to an ER stressor (tunicamycin). Such an influence can be attributable not to different magnitude of ER stress response activation, but to variations in the activation of accompanying processes like immune response or oxidative stress.


In this work, we presented a novel Shiny web tool that allows users to apply a gene co-expression network (WGCNA) approach to investigate toxicity mechanisms rapidly and in the context of biological responses preserved between human and animals. This unbiased approach is not weighted toward current knowledge and can reveal unknown relationships between genes and phenotype data, but it also encompasses known biological response networks through modules annotation. In addition, this approach is tailored for a gold standard in vitro testing system: primary human hepatocytes (PHH).

Toxicogenomics data can be difficult to interpret due to its high dimensionality and possible low signal-to-noise ratio (Ideker et al. 2011). We reduced gene expression dimensionality from 104 to 102 by taking a modular approach and constructing networks of co-expressed genes from a large PHH transcriptomic dataset (Igarashi et al. 2015). A set of 398 modules are presented in an intuitive visualization format, the PHH TXG-MAPr. A simplified upload function enables calculation of eigengene scores (EGs) from external sources, allowing new data to be integrated and interpreted in the PHH TXG-MAPr analysis environment, which contains 158 chemical comparators combined in 941 different treatment conditions. Pathway enrichment and TF associations contribute to mechanistic interpretation of modules, allowing the user to identify key processes that are different and in common between compound-induced responses. This represents a starting point for mechanistic interpretation across risk assessment applications, for example, to derive data-driven quantitative AOPs based on molecular initiating and key events represented by modules’ EGs. Stress induced pathways interact in cascades of perturbations, balancing adaptive and progressive responses. We showed how modules and correlation between them can represent connected mechanisms (Fig. 3) and used these to describe compounds’ mode of action (MoA). Although correlation analysis is not sufficient to establish causality, it can suggest potential areas for additional analysis in order to define cause-and-effect relationship, e.g. through validation of a mechanism-based risk assessment framework (Liu et al. 2019). In addition, gene co-expression modules allows to understand how applicable a pathway is to other species (Perkins et al. 2015).

One of the most vexing problems in risk assessment is indeed understanding whether toxicology data are likely to extrapolate across species, e.g., from rodent to human, or from one model to another, e.g. from in vitro to in vivo. Co-expression analysis formalizes methods that allow comparison of node-edge relationships at the gene level (Langfelder et al. 2011). As a result, we were able to define a subset of modules, and the associated biological themes, with higher preservation between in vitro human and rat hepatocytes models. Interestingly, a number of preserved PHH modules show also overlap with published signatures of rat xenobiotic receptor activation (Podtelezhnikov et al. 2020): to mention a few, PHH:358 perfectly overlaps with the AHR signature, PHH:16 shows very high enrichment with gene members of the SREBP signature and prominent annotation for cholesterol biogenesis (Supp. Table S9). A remarkable observation of our results was the notable difference in preservation for some toxicologically relevant and well characterized stress pathways. For example, ER stress response is highly preserved, while DNA damage (p53-driven gene expression) and oxidative stress (Nrf2-driven) responses are not. Interestingly, when comparing module preservation results, we found that in vivo rat liver and RPH show higher similarities (Pearson R 0.85 and 0.60 for Z-summary and Median Rank, respectively). In addition, Nrf2/p53 modules show good preservation between PHH and HepG2 data (data not shown), suggesting the differences in module preservation are influenced by species rather than culture conditions. We expect that preservation results depend, to some extent, on the input dataset composition. For instance, how well a certain process is captured in gene expression data from the input experiments, which can influence gene-to-gene correlation patterns, could be a hypothesis to be explored in future studies. Nevertheless, literature reports that kinetics and activation of the non-preserved Nrf2/p53 processes are different across species (MacRae et al. 2015; Martin and Chang 2018; Monroe et al. 2020), suggesting that preservation of co-expression networks is a promising approach for the evaluation of the applicability domain of new approach methodologies (NAMs) for extrapolating safety assessments across species (Parish et al. 2020).

Comparison with external datasets is a crucial application of the PHH TXG-MAPr tool. Gene level comparisons can be risky given the likelihood of variability across models and the low signal-to-noise ratio. Within TG-GATEs data, we showed that compound correlations are improved at the module level (Fig. 1c, S1c, S5f). In addition, module EGs derived from microarray-based external GEO data shows high similarity across external and TG-GATEs datasets for the same compounds, suggesting that gene-level noise is dampened when results are evaluated at the modules level. Using the external GEO datasets, we also confirmed that ER stress is an early event in the mode of action of CSA in different liver cell types. In contrast, CSA did not show a good correlation with the MoA of APAP datasets, suggesting that oxidative stress is a secondary/late effect to high dose CSA exposure. Although several publications indicate that oxidative stress is involved during CSA induced cholestasis in liver (Kawamoto et al. 2017; Koido et al. 2020; Wolters et al. 2016), our analysis of CSA exposed hepatocytes using the TXG-MAPr tool showed that the ER stress should not be neglected in the mechanism of CSA induced liver injury as was also suggested by others (Koido et al. 2020; Van den Hof et al. 2015b). To validate the applicability of the PHH TXG-MAPr for other transcriptomic platforms, we showed that Agilent microarray data of APAP exposure in PHH showed strong correlation with Affymetrix based data in TG-GATEs.

Integrating and comparing data across platforms represent a great challenge, especially in cases where targeted gene sets have been designed to be representative for the whole transcriptome (Mav et al. 2018; Soufan et al. 2019). A reduced number of features is convenient to diminish complexity and costs, but can weaken the power of enrichment analysis given the smaller gene set background. Imputing the entire transcriptome response prior to analysis is one solution to this problem (Mav et al. 2020), but may not be tailored for the testing system in use. Using the TempO-Seq S1500 + gene set platform, we showed that EGs calculated with PHH co-expression modules can impute biological response patterns directly when analyzing and interpreting PHH data from targeted gene sets in the context of whole transcriptomic data. Although the targeted gene space is dramatically reduced compared to whole transcriptome data in TG-GATEs (~ 17%, 1830/10275 genes), the modules included in our collection of mechanistic relevant modules (Fig. 3) had significantly higher coverage compared to the entire module set, and include genes of high quality (as quantified by the corEG). Module EGs obtained with the complete gene set and with the S1500 + set are in good agreement, and the mechanistic relevant modules had significantly higher agreement than the rest. Taken together, these findings suggest that results based on only genes in the S1500 + platform can be extrapolated to recapitulate the whole transcriptome modules score variability.

Following this strategy, we circumvented the limited gene coverage of the S1500 + set by leveraging the correlation structure of modules to define biological responses relevant to the differential sensitivity across responses of 50 donors’ hepatocytes treated with an ER stress inducer, tunicamycin. Donors’ traits, particularly the presence of liver pathologies, show significant dose–response associations with clusters of modules referring to specific biological responses, namely immune, DNA damage, oxidative stress responses and protein folding. The presence of liver pathologies seems to be associated with a weakened activation of innate immune response upon ER stress induction. Interestingly, genes involved in innate immunity in donors with pre-existing liver pathologies show already higher expression values without treatment compared to donors without liver pathologies. Under conditions of excessive stress, ER stress evokes inflammatory responses via the activation of the three UPR branches, which may sensitize to adverse events (Duvigneau et al. 2019; Fredriksson et al. 2014; Peng et al. 2020). Donors with pre-existing liver diseases show already activated inflammation, therefore possibly resulting in even higher sensitivity. However, only innate immune response could be captured by the PHH model, as it includes isolated primary hepatocytes, but no other cellular components of the liver that contributes to the immune response. Modules related to DNA damage response, protein folding and oxidative stress response appear to be mildly sensitized in patients with liver pathologies upon ER stress induction. That is in accordance with a condition of chronic stress and continuous attempt to restore homeostasis (García‐Ruiz and Fernández‐Checa 2018). Our results partly overlap with findings in recent GWAS DILI studies, where cholestatic injury phenotype has been associated with UPR and mitochondrial stress, but most importantly with an increased sensitivity to ROS upon additional drug treatment (Koido et al. 2020). Interestingly, ER stress response did not show a connection with donor traits: ER stress related modules, collectively, were a stable and preserved set of biomarkers for the unfolded protein response. Ultimately these findings highlight how pre-existing liver diseases can be a confounding factor when interpreting data from in vitro models using donor hepatocytes and that conserve alterations even in 2D culture conditions.

In conclusion, the TXG-MAPr takes advantage of the modular nature of gene co-expression networks to achieve mechanistic relevant, cross-species and cross-platform evaluation of toxicogenomic data. Practically, we envision a multifaceted application of the TXG-MAPr. Toxicogenomic data can be acquired for candidate compounds, even in a high-throughput platform, and a summarized profile of preserved and highly interpretable co-expression module responses can be extracted. The latter can be used to determine the prevalent mode of action and to compare with legacy and internal data. Preserved co-expression PHH modules can be further exploited to derive translational biomarkers to be included in high-throughput in vitro screening methods, or to explore donor-to-donor different response, but also to benchmark donor’s pool responses. Overall, we demonstrated that gene co-expression analysis coupled to a facile visualization environment, the PHH TXG-MAPr, is a promising approach to analyze in vitro human transcriptomic data and derive mechanistic interpretation, and therefore a substantial step forward towards integration of transcriptomic data in mechanistic risk assessment practices.