Self-consistent signal transduction analysis for modeling context-specific signaling cascades and perturbations

Cole, John

doi:10.1038/s41540-024-00404-x

Article
Open access
Published: 19 July 2024

Volume 10, article number 78, (2024)

npj Systems Biology and Applications

John Cole¹

Abstract

Biological signal transduction networks are central to information processing and regulation of gene expression across all domains of life. Dysregulation is known to cause a wide array of diseases, including cancers. Here I introduce self-consistent signal transduction analysis, which utilizes genome-scale -omics data (specifically transcriptomics and/or proteomics) in order to predict the flow of information through these networks in an individualized manner. I apply the method to the study of endocrine therapy in breast cancer patients, and show that drugs that inhibit estrogen receptor α elicit a wide array of antitumoral effects, and that their most clinically impactful ones are through the modulation of proliferative signals that control the genes GREB1, HK1, AKT1, MAPK1, AKT2, and NQO1. This method offers researchers a valuable tool in understanding how and why dysregulation occurs, and how perturbations to the network (such as targeted therapies) affect the network itself, and ultimately patient outcomes.

Introduction

Biological signal transduction networks (STNs) constitute the primary mechanism through which living cells sense their environment and adapt their behavior. Nearly every living thing—from the simplest bacterium swimming toward food, to a developing embryo—utilizes complex biochemical networks of interacting species to activate or inhibit different genes or cellular components. Despite their general robustness¹, aberrant network behaviors can give rise to a variety of diseases including cancer.

A number of methods have been developed for modeling and simulating STNs, including coupled ordinary differential equations, partial differential equations, boolean or logic-based methods and their extensions (which treat elements of the network as though they are “on” or “off”), among many others^{2,3,4,5,6,7,8,9,10,11,12}.

Constraint-based models, including flux balance analysis (FBA), leverage linear programming to study large reaction networks like metabolism (for an outstanding introduction to FBA, see ref. ¹³). FBA does not require detailed kinetic parametrization. Instead, modulation of constraints (e.g., limiting uptake, or preventing flux through enzyme-mediated reactions) is used to model environmental or biomolecular perturbations. Entire -omics data sets can be applied to model disease states, or specific cells—known broadly as context-specific FBA¹⁴. Despite their successes, constraint-based models incorporating signal transduction are rare, with notable applications including the ME model of O’Brien et al.¹⁵, the work of Vardi et al.¹⁶, and the lpNet models of Knapp et al. and Matos et al.^17,18.

In this article I present a constraint-based method for modeling the human STN, dubbed self-consistent signal transduction analysis (SCSTA), and apply it to the study of estrogen receptor (ERα)-targeted endocrine therapy (ET) in breast cancer. En route, I will build a model of the human STN comprising over 200,000 interactions, and employ transcriptomics data from The Cancer Genome Atlas (TCGA Research Network: https://www.cancer.gov/tcga) to impose constraints on gene expression and signal flux throughout the STN in order to model ET in nearly 1000 patients. The model predicts that ET alters gene expression in ways that decrease proliferation and cell cycling, and increase cell death, immune-related behaviors, stemness, and metastasis, with the changes in proliferation being significantly associated with overall survival (OS). The pathways that drive the change in proliferation center on the genes GREB1, HK1, AKT1, MAPK1, AKT2, and NQO1. Finally, I will directly compare predicted changes in gene expression with actual measured changes in patients that underwent ET, finding that the predictions are reasonably well-correlated with experiment.

Results

SCSTA

The main result of this work is the method for constructing and simulating an SCSTA model. Here I describe and discuss various design choices, as well as possible extensions or alterations (see Supplementary Section 1.1 for a concise description of the main assumptions and/or formalisms used).

Network construction and translation to a Linear Program

SCSTA requires some predetermined directed graph representing the signal transduction network of interest. The graph’s nodes represent biochemical species (small molecules, proteins, genes etc.), while edges represent interactions—a small molecule activating a protein, a protein phosphorylating another protein, or some transcription factor inhibiting some gene, etc. The edges are directed, meaning they have a “parent” node and a “child” node, and they may be either activating or inhibiting. Our SCSTA model is cast as a linear program (LP), and, similar to FBA, we assume it is at steady state. Edges correspond to variables in the LP, while nodes correspond to constraints. Figure 1a depicts a typical node; it has two activating and one inhibiting parent edges, and two child edges. We understand that the protein expression level of this node has some value N. We expect the node might display some level of constitutive activation, and it may also become spontaneously deactivated; accordingly we introduce constitutive activation (CA) and spontaneous deactivation (SD) edges (Fig. 1a). We assume the activating edges increase node activation, the inhibitory edges decrease activation, and complete activation occurs when all protein copies are activated, leading to the first constraint, e_CA + e₁ + e₂ − e₃ ≤ N, or more generally:

$$\sum\limits_{{e}_{p}}{w}_{p}{e}_{p}\le N$$

(1)

where e_p represents each parent edge (including the CA edge), and w_p is some constant that, at the very least, encodes whether the edges activate or inhibit. Here I use w_p = 1 or −1 for activation or inhibition, although other values could be chosen (e.g., if a protein more readily interacts with some of its targets than others).

**Fig. 1: Cartoon representing typical nodes and edges.**

Activation “flows” out of the nodes via child edges, and we expect there could be some associated non-unity gain, G (e.g., if a protein activates multiple targets faster than it becomes deactivated). We require the total activation impinging on a node to be passed through to the child edges (including the SD edge), with the out-flowing activation bounded above by the product of G and the impinging activation. This results in the pair of constraints (1/G)(e₄ + e₅ + e_SD) ≤ e_CA + e₁ + e₂ − e₃ ≤ e₄ + e₅ + e_SD, or more generally:

$$\begin{array}{l}\sum\limits_{{e}_{c}}{e}_{c}\ge \sum\limits_{{e}_{p}}{w}_{p}{e}_{p}\\ \sum\limits_{{e}_{c}}{e}_{c}\le G\sum\limits_{{e}_{p}}{w}_{p}{e}_{p}\end{array}$$

(2)

where e_c represents the cth child edge (including the SD edge). In this work, G was set conservatively at 2 (see Methods Section “Gain in SCSTA”).
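To make the node constraints concrete, the following minimal sketch checks Eqs. (1) and (2) for a single node given candidate edge values. The function name, the tolerance, and all numerical values are hypothetical and chosen only for illustration; this is a feasibility check on one node, not the linear program itself.

```python
def node_constraints_hold(parent_edges, child_edges, N, G=2.0, tol=1e-9):
    """Check Eqs. (1) and (2) for one node.

    parent_edges: list of (w_p, e_p) pairs, CA edge included (w_p = +1 or -1).
    child_edges:  list of e_c values, SD edge included.
    N:            expression level of the node.
    G:            gain (2 in this work).
    """
    inflow = sum(w * e for w, e in parent_edges)
    outflow = sum(child_edges)
    capacity_ok = inflow <= N + tol            # Eq. (1)
    passthrough_ok = outflow >= inflow - tol   # Eq. (2), first line
    gain_ok = outflow <= G * inflow + tol      # Eq. (2), second line
    return capacity_ok and passthrough_ok and gain_ok


# Node of Fig. 1a with made-up values: e_CA, e1, e2 activate, e3 inhibits.
parents = [(+1, 0.5), (+1, 2.0), (+1, 1.5), (-1, 1.0)]  # net inflow = 3.0
children = [2.0, 1.5, 0.5]                               # e4, e5, e_SD; outflow = 4.0

feasible = node_constraints_hold(parents, children, N=3.0)
too_little_protein = node_constraints_hold(parents, children, N=2.5)
too_much_outflow = node_constraints_hold(parents, [4.0, 2.5, 0.5], N=3.0)
```

In this example the first call satisfies all three inequalities, the second violates Eq. (1) because the net inflow exceeds N, and the third violates the gain bound because the outflow (7.0) exceeds G times the inflow (6.0).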

Gene nodes (Fig. 1b) do not have child edges (save for an SD edge), but they still produce something important—transcripts. We assume the transcript expression level is proportional to the gene activation (less the SD edge), and that it is measured, denoted here as M. This gives the constraint (e_CA + e₆ + e₇ − e₈ − e_SD)ξ = M, or more generally:

$$\xi \left(-{e}_{{{{\rm{SD}}}}}+\sum\limits_{{e}_{p}}{w}_{p}{e}_{p}\right)=M$$

(3)

where ξ represents the proportionality between the gene’s activation and the transcripts produced (see Methods Section “Estimating ξ”).

Because paired transcriptomics and proteomics experiments are relatively rare, direct measurements of N and M for each gene are often unavailable. Here I assume that N and M are proportional to each other (i.e., there exists some γ_i such that N_i = γ_iM_i for each transcript and protein), and estimate those proportionality constants utilizing data from ref. ¹⁹ (see Methods Section “Estimating γ”, and Supplementary Fig. 1).

Next, we consider the boundary conditions in our model, namely the small molecules and “parentless” nodes (those that have no impinging edges). Here I assume that small molecules are available in excess, and thus place no upper bound on the activation that can flow from them. I note that modulation of these bounds could be used to simulate different environmental conditions on the STN. The parentless nodes I assume are constitutively activated, and set the lower bound on each e_CA to the node’s corresponding N.

Finally, we consider the objective(s) for our LP. My goal is to find the “simplest” solution possible—that is the one that requires the least amount of pathway activity in order to give rise to the observed gene expression state. I think of this as an application of Occam’s razor; given a set of solutions all of which yield the observed gene expression state, I prefer the one with the lowest total activity. In constraint-based metabolic models, this type of optimality is often called the parsimonious solution. Unfortunately, this choice also imbues the solution with certain unintended biases (see Supplementary Section 1.2), including a tendency toward increased reliance on constitutive activation at or near the gene nodes. To circumvent this bias, I first introduce the objective:

$${c}_{1}=\sum\limits_{n}\frac{1}{{2}^{{d}_{n}}}\left({e}_{{{{\rm{CA}}}},n}+{e}_{{{{\rm{SD}}}},n}\right)$$

(4)

where n runs over all nodes, and d_n is the “distance” along the shortest path from node n to its nearest gene. Here, the CA and SD activations are increasingly more disfavored as the network is traversed toward the genes. By minimizing c₁, subject to all other constraints, I find a solution that is intrinsically biased against large CA and SD values in general, and specifically biased against large CA and SD values at or close to the gene nodes. Once I know the minimal value that c₁ can take, I stop using it as an objective and instead use it as a new constraint (that is, I bound from above the value c₁ can take; in this work it was set to 1% larger than its minimal value). I then introduce the second objective:

$${c}_{2}=\sum\limits_{i}{e}_{i}$$

(5)

which represents the total activity flowing through the network. By minimizing c₂, subject to all other constraints, including the newly-added one bounding the value of c₁, I find a parsimonious solution without the aforementioned bias toward constitutive activation at or near the genes. I call this the “baseline” solution.
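The weighting in Eq. (4) can be sketched as follows. This is an illustrative reading, assuming that d_n is counted in edges along the directed graph and that a gene node has d_n = 0; the toy network, node names, and helper functions are hypothetical and not taken from the model.

```python
from collections import deque


def distance_to_nearest_gene(edges, genes):
    """Return {node: d_n}, the number of edges on the shortest directed
    path from each node to its nearest gene (genes have d_n = 0).

    edges: iterable of (parent, child) pairs.
    genes: collection of gene node names.
    """
    parents_of = {}
    for parent, child in edges:
        parents_of.setdefault(child, []).append(parent)
    # Multi-source breadth-first search walking edges backwards from the genes.
    dist = {g: 0 for g in genes}
    queue = deque(sorted(genes))
    while queue:
        node = queue.popleft()
        for parent in parents_of.get(node, []):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def c1_value(dist, e_ca, e_sd):
    """Evaluate Eq. (4): sum over nodes of (e_CA + e_SD) / 2**d_n."""
    return sum((e_ca.get(n, 0.0) + e_sd.get(n, 0.0)) / 2 ** d
               for n, d in dist.items())


# Hypothetical toy network.
edges = [("ligand", "receptor"), ("receptor", "kinase"),
         ("kinase", "TF"), ("TF", "geneA"), ("kinase", "geneB")]
dist = distance_to_nearest_gene(edges, {"geneA", "geneB"})
weights = {n: 1.0 / 2 ** d for n, d in dist.items()}

# One unit of constitutive activation costs more near a gene than far upstream.
cost_at_gene = c1_value(dist, {"geneA": 1.0}, {})
cost_upstream = c1_value(dist, {"ligand": 1.0}, {})
```

Under these assumptions, the same unit of CA activity contributes 1.0 to c₁ at a gene but only 0.125 at the most upstream node, which is the intended bias. In the two-stage procedure described above, c₁ would then be capped at 1.01 times its minimum before c₂ is minimized.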

Minimally-perturbed solution

We want to perturb the network and predict its response. Unfortunately, many perturbations of interest will alter gene expression, and our model was constructed with a pre-defined gene expression state. Here I introduce the minimally-perturbed solution (MPS). We apply perturbations to the network (e.g., by altering constraints), remove the constraints on gene expression (Eq. (3)), use baseline values to impose upper bounds on the CA and SD edges, and then seek the solution that is most like the baseline solution. This is accomplished by introducing slack and surplus variables for each edge, and adding the constraints:

$${e}_{i}^{{{{\rm{MPS}}}}}+{e}_{i}^{{{{\rm{slack}}}}}-{e}_{i}^{{{{\rm{surplus}}}}}={e}_{i}^{{{{\rm{baseline}}}}}$$

(6)

where ${e}_{i}^{{{{\rm{baseline}}}}}$ and ${e}_{i}^{{{{\rm{MPS}}}}}$ represent ith edge values in the baseline and new (minimally) perturbed solutions, the latter being found by minimizing:

$${c}_{3}=\sum\limits_{i}\left({e}_{i}^{{{{\rm{slack}}}}}+{e}_{i}^{{{{\rm{surplus}}}}}\right)$$

(7)

This approach seeks the smallest change that can result from a given perturbation, and thus represents a conservative starting point for numerous lines of inquiry. Drug targets could be elucidated by systematically preventing activation of each node (mimicking a strong inhibitor), or populations of patients that respond to new or existing drugs may be elucidated through screening of their transcriptomes.
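The role of the slack and surplus variables can be illustrated with a small sketch. Assuming both variables are non-negative, at the optimum at most one of each pair is nonzero, so c₃ reduces to the summed absolute deviation from the baseline. The edge names and values below are hypothetical, and the perturbed edge values are simply supplied rather than solved for.

```python
def slack_surplus(baseline, perturbed):
    """Return the smallest non-negative slack and surplus values that
    satisfy Eq. (6) for each edge: e_MPS + slack - surplus = e_baseline."""
    slack = {i: max(baseline[i] - perturbed[i], 0.0) for i in baseline}
    surplus = {i: max(perturbed[i] - baseline[i], 0.0) for i in baseline}
    return slack, surplus


def c3_value(slack, surplus):
    """Evaluate Eq. (7): total slack plus surplus over all edges."""
    return sum(slack[i] + surplus[i] for i in slack)


# Hypothetical edge activities before and after blocking ERα-driven edges.
baseline = {"ESR1->GREB1": 4.0, "ESR1->TFF1": 2.0, "NFKB1->TFF1": 1.0}
perturbed = {"ESR1->GREB1": 0.0, "ESR1->TFF1": 0.0, "NFKB1->TFF1": 2.5}

slack, surplus = slack_surplus(baseline, perturbed)
c3 = c3_value(slack, surplus)
l1_distance = sum(abs(perturbed[i] - baseline[i]) for i in baseline)
```

Here c₃ and the L1 distance between the two solutions coincide (7.5), which is why minimizing c₃ selects the feasible solution closest to the baseline in that sense.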

I note that the MPS described here is not “self-consistent” in the same way that the baseline solution is; it uses unperturbed node expression levels (N values) to predict perturbed M values. This can be rectified in several ways, some of which are detailed in Supplementary Section 1.3. In this article, the simpler approach outlined here is used.

Predicting Pre- and Post-ET in TCGA-BRCA patients

From TCGA, I downloaded every RNAseq dataset associated with the TCGA-BRCA project, excluding patients for which either multiple or no transcriptomics datasets were available. For each patient (970 in total), I computed the baseline and MPS associated with the inhibition of ERα (see Methods Section “Modeling ET in Breast Cancer”). I then computed scores for several cancer-associated behaviors (proliferation, cell death, cell cycle, immune, stemness, and metastasis) based on the gene expression profiles of each patient’s pair of solutions, and SIGNOR’s phenotypes dataset²⁰ (see Methods Section “Computing phenotype scores”).

The ET-associated change in each score was computed (defined as pretreatment subtracted from post-treatment), and patients were categorized based on ER immunohistochemistry (IHC) status (when unavailable or equivocal, status was imputed based on transcriptomic data, see Methods Section “Imputation of feature values for TCGA patients”). As expected, we see much larger score changes associated with ER-positive than ER-negative patients, and for most ER-negative patients, no change was predicted (the interquartile range is zero, see Fig. 2; also see Supplementary Section 1.4 and accompanying Supplementary Tables 1–2, and Supplementary Fig. 2 for pretreatment scores stratified by ER-positivity).

**Fig. 2: Boxplots of predicted changes in each behavior score under ET.**

Among ER-positive patients, score changes tended to be beneficial: the proliferation and cell cycle scores tend to decrease, while the cell death and immune scores tend to increase (Fig. 2 and Supplementary Fig. 3). The model also correctly predicts that the stemness and metastasis scores increase among ER-positive patients, possibly resulting from therapy driving these tumors into a more basal-like regulatory state. This is in keeping with the results of a 2011 study by Al Saleh et al., which found that estrogen receptor silencing in MCF7 cells drove a transition toward mesenchymal phenotype, increased metastatic propensity, and a shift toward basal cell markers²¹.

It has been reported that epithelial-mesenchymal transition (EMT, part of the stemness score) is associated with tamoxifen resistance^22,23, and that this relationship is bidirectional and reversible—reductions in ER signaling (e.g., via targeted therapy) lead to increased EMT, and increases in EMT lead to decreased ER signaling²⁴. In Supplementary Section 1.5, I explore this phenomenon further, recapitulating the results of ref. ²¹, and finding that the model does indeed predict greater (lesser) sensitivity to ET among patients with low (high) EMT scores, and that ER-expressing cell lines with high markers of EMT are predicted to be relatively insensitive to ET.

ET-associated changes in proliferation are associated with enhanced overall survival

Among ER-positive patients, adjuvant ET (extending between 5 and 10 years) has long been standard of care; this was true well before the TCGA-BRCA patients were studied and thus it is likely that many received ET (the TCGA has data on administered drugs like tamoxifen and fulvestrant, but the coverage is relatively low). I performed a multivariate Cox regression over all ER-positive patients (N = 749), including the change in each score (binarized at the median) along with several other covariates associated with survival (see Methods Section “Survival analysis”).

Only the change in proliferation was significantly associated with OS during the 10 years after diagnosis (see Fig. 3a; hazard ratio (HR) 2.38, log-likelihood ratio test p ≈ 0.02). Impressively, this was associated with a greater risk than T, N, and M stage, HER2 status, and ESR1 expression. Only patient age and PR status had greater HRs. In a univariate analysis, the HR associated with proliferation change was 2.0 (p ≈ 0.04); an accompanying Kaplan–Meier plot is shown in Fig. 3b.

The change in proliferation score tended not to correlate with most other covariates (see Supplementary Section 1.6 for details on this and other score changes, and Supplementary Fig. 4); for only two covariates—ESR1 expression level, and the change in stemness score—did the magnitude of its Spearman correlation rise above 0.2 (−0.21 and −0.26, respectively). This indicates that: ET is predicted to be more effective in reducing cancer proliferation among patients with higher ERα expression (as should be expected); and that tumors predicted to undergo larger reductions in proliferation are also predicted to increase in stem-like behaviors.

The widely used 21-gene Oncotype DX® Recurrence Score (RS) is heavily weighted toward proliferative genes, supporting the notion that proliferation strongly influences risk of recurrence²⁵. Although the RS was not designed for prognosticating OS, a recent study showed that when binarized at the median, it appears to show comparable performance to our computed proliferation score change²⁶. I thus sought to determine whether the proliferation score change was correlated with its pretreatment value. I found their Pearson correlation coefficient to be ρ ≈ −0.072, and observed no obvious trend (see Supplementary Fig. 5). I performed additional multivariate Cox regressions to determine if including the pretreatment score (binarized at its median) impacted the HR associated with the score change. In a bivariate analysis, both were found to have comparable HRs (2.20 for pretreatment score, and 2.16 for score change; p-values of 0.02; see Supplementary Fig. 6 for accompanying Kaplan–Meier plot), while in the more comprehensive analysis that included additional covariates, these HRs jumped to approximately 2.87 and 2.61 respectively (p-values of less than 0.005, and 0.01). All of this strongly supports the hypothesis that the changes in gene expression predicted by SCSTA, particularly among proliferative genes, offer insight into which patients benefit most from ET.

Pathways predicted to drive ET-associated changes in proliferation

I investigated which changes in pathway activation were most strongly associated with changes in the behavior scores. Because proliferation change is associated with OS, it will be presented here, while similar analyses of other score changes will be relegated to Supplementary Section 1.7 and accompanying Supplementary Fig. 7.

Over the entire ER-positive population, the fraction of total score change attributable to a given edge, denoted ϕ, was computed (see Methods Section “Determining drivers of changes in behavior scores”). I found a diverse set of edges with relatively small ϕ-values contributing toward reduced proliferation, with the largest seven accounting for just over half (≈53%; see Supplementary Tables 3–8 for ϕ-values associated with each score).

Of these seven driver edges, three were the result of ERα’s direct transcriptional regulation of GREB1, AKT1, and AKT2 (the first, third, and fifth largest ϕ-values). GREB1 is a known mediator of ET resistance^27,28; it is strongly associated with ER-positive breast cancer, it is rapidly induced by estrogen, and that induction is critical for ER-associated proliferation^29,30. AKT1 and 2 are linchpin signal transducers in the PI3K/AKT/MTOR pathway, and represent central players in proliferative and anti-apoptotic signaling in many cancers, including breast cancer^31,32.

The second largest ϕ corresponded to decreased activation of HK1 by HIF1. HK1 is a key glycolytic enzyme, and although it has not been implicated directly in conferring sensitivity or resistance to ET, its isoform Hexokinase 2 has³³. To understand how ERα signaling affects HK1 expression, I computed the connecting pathway along which each edge shows the largest difference in activity between predicted pre- and post-ET solutions (see Methods Section “Maximally-dysregulated pathways” for details). This “maximally-dysregulated pathway” (MDP) involved: the loss of activation of GNA13 by ERα being offset by F2R, at the cost of F2R’s normal activation of EGFR; the dysregulation of EGFR and PTK7 by POSTN, resulting in redirection of activity away from ERCC6; and partial offset of this loss of ERCC6 activity by the redirection of activity from the gene HK1 (see Fig. 4, upper pathway). Several proteins involved in this pathway are associated with breast cancer prognosis (including ERCC6 and POSTN^34,35) and/or sensitivity/resistance to endocrine or other chemotherapy (including EGFR, ABL1, GNA13, and PTK7^36,37,38,39).

**Fig. 4: Pathway diagrams indicating how ET is predicted to decrease proliferation.**

Transcriptional activation of MAPK1 by both E2F1 and STAT1 decreased under ET (the fourth and sixth largest ϕ values). MAPK1 is a key transducer of proliferative and many other signals, while E2F1 and STAT1 are both critical regulators of proliferation and cell cycle control that have been implicated in endocrine resistance^{40,41,42,43,44,45}. The MDP proceeding through E2F1 consists of only one additional step (see Fig. 4), while the pathway through STAT1 involved the redirection of NFKB1’s activity from VEGFB to TFF1 to offset the latter’s loss in activation by ERα, and subsequent redirection of STAT1’s activity from its downstream targets to VEGFB (see Fig. 4).

The seventh largest ϕ corresponded to decreased transcriptional activation of NQO1 by NFKB1. NQO1 encodes a key metabolic enzyme that has been associated with breast cancer progression⁴⁶. The associated MDP again involves redirection of NFKB1’s activity to TFF1, but this time at the cost of NQO1’s transcriptional activation (see Fig. 4).

Two of these pathways involved the redirection of NFKB1’s activity toward TFF1. In addition to being a direct target of ERα within the model, TFF1 is also an indirect target through FOXA1. This suggests that those patients with relatively low levels of FOXA1 and TFF1 activation by ERα, along with relatively high levels of NFKB1 activation should see reduced clinical benefit from ET. This interpretation is in keeping with the results of a 2017 study by Yamaguchi et al., which showed that expression of ERα, FOXA1, and TFF1 were all significantly reduced in an MCF7-derived tamoxifen-resistant breast cancer cell line, and that the canonical and non-canonical NFκB signaling pathways were significantly upregulated⁴⁷.

A 2022 study by Xia et al. included serially-sampled transcriptomics data for 35 patients undergoing ET, 25 of whom had pretreatment data⁴⁸. For each of these, the pretreatment and earliest post-treatment gene expression levels were extracted and the log₂ fold changes (L2FCs) were computed for GREB1, HK1, AKT1, MAPK1, AKT2, and NQO1. One-sided t-tests revealed that GREB1 and HK1—the two most important effectors of the proliferation score—were significantly down-regulated, and all but AKT2 trended toward downregulation (p-values less than 0.5, see Table 1).

Table 1 One-sided t-test p-values for the six genes most strongly associated with the change in proliferation score

I note that considerable variability exists within the population. While patients’ predicted pathway activities generally led to stable or decreased proliferation, two outliers were observed for which the proliferation score was actually predicted to increase. In both cases, these increases were predominantly driven by the loss of transcriptional inhibition of MYC, resulting in increased expression, and, in turn, an increase in the proliferation score. Over the entire ER-positive cohort, increased MYC expression is among the top 20 drivers of proliferation score change (see Supplementary Table 3), but its impact is generally smaller than that of other drivers, including those described above. In the case of these two outliers, the more canonical pathways leading to decreased proliferation play a much smaller role (the ones terminating in GREB1 and HK1, as examples, have ϕ-values of only 0.0044 and 0.0076, respectively), indicating that these patients may be intrinsically less sensitive to ET.

Direct comparison of SCSTA predictions with measured post-ET data

As a final test, I predicted the post-ET transcriptome for each of the 25 patients selected from ref. ⁴⁸, and compared those results to the actual post-treatment transcriptomic measurements (see Methods Section “Modeling ET in 25 patients with pre- and post-treatment transcriptomes” for details).

Over all genes in all patients, a total of 46,456 instances of apparent differential regulation (that is, a given gene in a given patient having an absolute L2FC of >1 between measured post- and pre-treatment expression levels) were observed. As expected, SCSTA was conservative in predicting differential expression, with only 3,025 such instances found. It also tended to predict downregulation with much greater frequency than upregulation (2,992 vs. 33 instances, respectively), and the magnitude of the predicted L2FCs was exaggerated relative to the measurements (see Fig. 5, and Supplementary Section 1.2 and accompanying Supplementary Fig. 8 for discussion of this effect). I note that the post-treatment measurements tended to be made weeks or months after treatment initiation, and may represent some evolution of the tumor cell population (e.g., expansion of less-sensitive clones) that could explain some of the noted discrepancies.
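The counting of differential-regulation instances can be sketched as below. The pseudocount added before taking the logarithm is an assumption made here to avoid division by zero (the text does not state how zero expression was handled), and the gene-level values are invented for illustration.

```python
import math


def l2fc(pre, post, pseudocount=1.0):
    """log2 fold change of post- over pre-treatment expression.
    The pseudocount is an illustrative assumption, not from the article."""
    return math.log2((post + pseudocount) / (pre + pseudocount))


def count_differential(pre, post, threshold=1.0):
    """Count genes with |L2FC| > threshold, split by direction."""
    counts = {"up": 0, "down": 0}
    for gene in pre:
        change = l2fc(pre[gene], post[gene])
        if change > threshold:
            counts["up"] += 1
        elif change < -threshold:
            counts["down"] += 1
    return counts


# Hypothetical expression values for one patient.
pre = {"GREB1": 63.0, "HK1": 31.0, "MYC": 7.0, "AKT2": 15.0}
post = {"GREB1": 15.0, "HK1": 19.0, "MYC": 31.0, "AKT2": 19.0}

counts = count_differential(pre, post)
```

With these made-up values, GREB1 (L2FC = −2) counts as downregulated and MYC (L2FC = 2) as upregulated, while HK1 and AKT2 change by less than a factor of two and are not counted. Summing such counts over all genes and patients gives totals of the kind reported above.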

**Fig. 5: Scatter plot of SCSTA-predicted vs. measured L2FCs among the 3,025 instances of modeled differential expression in the 25 patients from ref. ⁴⁸.**

Among the predicted instances of differential expression, the Pearson and Spearman correlations with measured L2FCs were 0.27 and 0.42, respectively (indicating a modest, if nonlinear, monotonic relationship). As a comparator, I trained a multiple linear regressor (LR) to predict each of the 25 patients’ post-treatment transcriptomes from their pretreatment transcriptomes (see Methods Section “Linear Regression to predict changes in gene expression under ET”). The LR was less conservative in predicting differential regulation, with 55,138 instances overall, and, despite its use of actual post-treatment data during training, yielded both Pearson and Spearman correlation coefficients of 0.46, indicating greater linearity but only marginal improvement in Spearman correlation (see Supplementary Fig. 9).
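The distinction drawn above between the two coefficients can be illustrated with a small standard-library sketch: Spearman correlation is the Pearson correlation of the ranks, so a monotonic but nonlinear relationship scores higher on Spearman than on Pearson. The data below are synthetic and unrelated to the patient measurements.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic monotonic but nonlinear relationship (y = x cubed).
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [v ** 3 for v in predicted]

r_pearson = pearson(predicted, measured)
r_spearman = spearman(predicted, measured)
```

For this toy data the Spearman coefficient is 1 because the ordering is perfectly preserved, while the Pearson coefficient is about 0.94 because the relationship is not a straight line.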

Discussion

In this article, I have presented SCSTA, a genome-scale constraint-based method for investigating STNs. The method’s most important conceptual leap lies in its self-consistency with respect to the transcriptional activation of genes, and their products’ engagement within the network. Specifically, SCSTA requires that the signal flux through the network reproduces the observed transcriptional profile of each patient, while simultaneously requiring that the gene products corresponding to that expression profile are capable of transmitting the required flux. I applied SCSTA to the study of one of the most widely-used classes of anti-cancer therapies in one of the most commonly-occurring cancer types, namely ET in breast cancer. I found that changes in the expression of proliferative genes, including GREB1, HK1, AKT1, MAPK1, AKT2, and NQO1, were associated with overall patient survival. In an independent dataset, all of these genes except AKT2 showed evidence of downregulation under ET, and both GREB1 and HK1—the two genes predicted to be most strongly associated with the change in proliferation—were significantly downregulated. Over all genes predicted to be dysregulated under ET, a modest correlation coefficient was observed.

SCSTA can be used in a variety of high-value applications, particularly in the quantitative systems pharmacology (QSP) sphere. Knock-out or knock-down studies of actual patients can be performed in order to: (1) elucidate new potential drug targets; (2) determine therapeutic levels of inhibition; (3) interrogate targeted multi-drug regimens; (4) predict patient populations that may benefit from a targeted therapy; and (5) elucidate potential mechanisms of action of drugs. The approach may also be integrated into other types of QSP models.

Although its current form admits steady-state approximations, other similar constraint-based models have been used in dynamical approaches. Mahadevan’s dynamic flux balance analysis (dFBA)⁴⁹, for example, uses FBA coupled to a dynamic environment to simulate changes in metabolic behavior, while others, including myself, have extended these ideas to simulate microbial colonies and, more recently, solid tumors^50,51,52. In its simplest form, the SCSTA model could be used as part of a pharmacodynamic description within a broader multicompartment pharmacokinetics/pharmacodynamics (PK/PD) simulation. A modeled drug could be dosed, and the concentration of the drug within the tumor compartment, for example, could be used to infer the degree of inhibition of the drug’s target. The SCSTA model could then be used to predict the effect on various readouts, proliferation or otherwise, at each simulation timepoint. The drugs that could be simulated include many already in the network (the set of small molecules extracted from OmniPath contains several drugs), but many more can be modeled. Indeed, the activity of any signaling molecule within the model may be modulated (e.g., partial or complete inhibition, or even increased activity) in order to simulate various existing, experimental, or hypothesized drugs.

There exist several opportunities for expansion and refinement of the approach. I note that the TCGA transcriptomics data used here represents bulk RNA samples, meaning multiple cell types (including infiltrating immune cells, fibroblasts, etc.) are present, to varying degrees, in each sample. While this naturally introduces some uncertainty when interpreting the results presented in this manuscript, it also presents an exciting opportunity for further development. In particular, Supplementary Section 1.8, and related Supplementary Fig. 10, describes one possible avenue through which bulk RNAseq may be deconvolved, and a compartmentalized SCSTA model representing multiple cell types interacting within a shared microenvironment may be constructed. Such an approach may be used to interrogate various forms of intercellular signaling, including potentially critical interactions between cancer cells and infiltrating lymphocytes. In a similar vein, single-cell RNAseq data could be used to characterize the heterogeneity in signaling and/or drug sensitivity within a sample. This can be accomplished with a straightforward application of SCSTA directly to each cell sample. Such analyses could be used to predict if and why a given patient may develop resistance to a targeted therapy (e.g., if some sub-population of cells is intrinsically insensitive), or to design personalized multi-drug combinations targeting different subsets of cells. One could even envision combining a single-cell analysis with the compartmentalized SCSTA described above; spatially-resolved RNAseq could be leveraged to create a representation of a microenvironment where each cell is its own compartment, and intercellular interactions may proceed among cells making direct contact with each other.

Much work remains to be done. The development of carefully curated models of metabolism has taken decades, and refining the underlying STN used here in a similar manner will be invaluable. Moreover, many assumptions and approximations have been employed that could be refined, including the possible use of other optimality criteria (in light of the biases introduced by parsimony), better estimates of the weights in Eqs. (1)–(3) (perhaps based on high-throughput knockout experiments), the inclusion of protein-specific estimates of signal gain (perhaps based on kinetics data), or the refinement of the weights in the behavior scores, to name just a few.

Methods

Gain in SCSTA

One of the parameters of the model presented in this work is the gain, G, which is intended to account for the possibility that proteins or other effectors may be able to interact with multiple targets before becoming deactivated. More conservative values (near 1) essentially require that activation be conserved as it is passed on through the network. Larger values enable small perturbations in upstream targets to have large downstream consequences, a feature of many biological networks, but they also somewhat decouple the activity flowing through the network from the expression levels of the nodes and the activity levels of upstream pathways (for example, a very large G could allow essentially arbitrarily-valued activation levels to appear at a given node despite that node’s parents’ activation levels being near zero). In principle, different gain parameters could be defined for each node in the network, but in this work, for simplicity, I used a single value for all nodes.

One way to roughly estimate the scale of G is by assuming that every node, when fully activated, might be able to fully activate each of its downstream targets. I considered the ratios of child node expression to parent node expression for each node in the STN. In order to do this, for each node I first selected the median protein expression value over all tissues and replicates in ref. ¹⁹. If a protein was not represented in the data, I used the product of the median γ value and the median transcript level for the gene encoding that protein, or failing that, the median protein expression level over all observations. Proteins were then redistributed to complexes when appropriate (see Methods Sections 4.3–4.4). Then for each node I computed the ratio of the sum of expression levels of all of the node’s targets to the node’s expression level, and then computed the median over all nodes, as:

$$G \sim {{{\rm{med}}}}\left[\frac{{\sum }_{t}{N}_{t}}{{N}_{s}}\right]$$

(8)

where N_s is the expression level of a given (source) node, and N_t is the expression level of the source node’s t-th target node. This computation yielded a value of approximately 5.7. In the interest of erring on the conservative side, I selected a value of 2 for this work, but believe reasonable values could easily be as high as 10.
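As a rough illustration of Eq. (8), and not the code used in this study, the estimate could be sketched as follows. The function name, the dictionary-based inputs, and the choice to skip nodes with zero expression or no targets are all assumptions made for the example.

```python
from statistics import median

def estimate_gain(expression, targets):
    """Median over source nodes of (sum of target expression) / (source expression).

    expression: dict mapping node -> expression level N (hypothetical format)
    targets:    dict mapping source node -> list of its target nodes
    """
    ratios = []
    for source, children in targets.items():
        n_s = expression.get(source, 0.0)
        # Assumption: nodes with no expression or no targets are skipped.
        if n_s <= 0 or not children:
            continue
        ratios.append(sum(expression.get(t, 0.0) for t in children) / n_s)
    return median(ratios)

# Toy network: A -> {B, C}, B -> {C}, C -> {D}
expression = {"A": 10.0, "B": 30.0, "C": 20.0, "D": 5.0}
targets = {"A": ["B", "C"], "B": ["C"], "C": ["D"]}
gain_estimate = estimate_gain(expression, targets)
```

In this toy network the per-node ratios are 5, 2/3, and 1/4, so the median is 2/3; on the full STN the same computation is what yielded the value of roughly 5.7 quoted above.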

Estimating ξ

I split ξ into two parts (ξ = αβ), with α representing the proportionality between the impinging activation and the level of gene activity, and β representing the proportionality between the gene’s activity and the number of transcripts that result. In order to estimate α, I want to know the level of activity impinging on a gene that leads to full transcriptional activation under normal (healthy) conditions. In order to do this, I considered the subset of genes that only have activating parent edges (in order to avoid the complexities associated with inhibitory interactions). For each such gene, I determined its set of transcription factors (that is, all transcription factors within the model that activate the gene), and their respective protein expression levels (taking the median expression level over all tissues and replicates in ref. ¹⁹, which includes paired transcriptomic and proteomic data from 32 healthy tissues taken from multiple patients). If all transcription factors were fully activated, and that activity were distributed evenly among each of their (potentially many) gene targets, then it is possible to sum up an estimate for how much activation is impinging on each gene under fully-activated conditions:

$${{{\Theta }}}_{i}=\sum\limits_{T\in {\{{{{\rm{TF}}}}\}}_{i}}{N}_{T}/{K}_{T}$$

(9)

where Θ_i represents the expected level of activation for gene i under fully activated conditions, T is a transcription factor belonging to the set of transcription factors that activate gene i, N_T is the protein expression level of the transcription factor, and K_T is the number of targets the transcription factor has. Once this is computed for each relevant gene, I simply computed an estimate of α as α = med[{Θ}]⁻¹. This analysis resulted in the estimate α ≈ 0.25, which can be interpreted as meaning that approximately one out of four activated transcription factors goes on to activate one of its targets.
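A minimal sketch of Eq. (9) and the resulting α estimate is given below. It assumes the gene list has already been restricted to genes with only activating parent edges; the function name and input dictionaries are hypothetical.

```python
from statistics import median

def estimate_alpha(tf_expression, tf_targets, gene_activators):
    """Return alpha = 1 / median(Theta_i), with Theta_i as in Eq. (9).

    tf_expression:   dict TF -> protein expression level N_T
    tf_targets:      dict TF -> list of all gene targets of that TF (length K_T)
    gene_activators: dict gene -> list of TFs activating it
                     (assumed pre-filtered to genes with activating edges only)
    """
    theta = []
    for gene, tfs in gene_activators.items():
        # Each TF spreads its activity evenly over its K_T targets.
        theta.append(sum(tf_expression[tf] / len(tf_targets[tf]) for tf in tfs))
    return 1.0 / median(theta)

# Toy example with two transcription factors and four genes.
tf_expression = {"TF1": 8.0, "TF2": 6.0}
tf_targets = {"TF1": ["g1", "g2"], "TF2": ["g2", "g3", "g4"]}
gene_activators = {"g1": ["TF1"], "g2": ["TF1", "TF2"], "g3": ["TF2"], "g4": ["TF2"]}
alpha = estimate_alpha(tf_expression, tf_targets, gene_activators)
```

Here the Θ values are 4, 6, 2, and 2, their median is 3, and so the toy α is 1/3.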

Next, I wanted to estimate β, the number of transcripts produced when a gene is fully activated. One simple way to estimate this is by assuming the highest transcription levels measured for each gene correspond to its full activation. I again rely on data from ref. ¹⁹, in this case focusing on the transcriptomics, and extract the 95th percentile of all TPM values for each gene (expecting that some fraction of the very highest observations are likely to be spurious outliers). These values are then used as the gene-specific β values.

Estimating γ

In order to estimate the set of values of γ such that N_i = γ_iM_i for all genes and their respective gene products, I once again turn to the paired transcriptomics and proteomics data from ref. ¹⁹. For each gene in each sample, the ratio of the protein expression level to its transcript expression level (N/M) was computed. Then, for each gene, the median of those ratios was extracted, yielding an estimate for each γ value (see Supplementary Fig. 1). In the event that no suitable γ value could be found this way (for example, if a gene was in the model, but not represented in ref. ¹⁹), the median γ value over all computed γ values was used.
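The γ estimate and its fallback can be sketched as below. This is an illustrative reading of the procedure rather than the study's code: the function name and data layout are hypothetical, and skipping samples with zero transcript counts is an assumption that the text does not specify.

```python
from statistics import median

def estimate_gamma(protein, transcript):
    """Per-gene median of protein/transcript ratios, with a global-median fallback.

    protein, transcript: dict gene -> list of per-sample values, paired by index.
    Every model gene is assumed to be a key of `transcript`.
    """
    gamma = {}
    for gene, m_values in transcript.items():
        n_values = protein.get(gene)
        if not n_values:
            continue  # no proteomic data for this gene
        # Assumption: samples with zero transcript level are ignored.
        ratios = [n / m for n, m in zip(n_values, m_values) if m > 0]
        if ratios:
            gamma[gene] = median(ratios)
    fallback = median(gamma.values())
    return {gene: gamma.get(gene, fallback) for gene in transcript}, fallback

transcript = {"G1": [1.0, 2.0, 4.0], "G2": [10.0, 10.0, 10.0], "G3": [5.0, 5.0, 5.0]}
protein = {"G1": [2.0, 6.0, 8.0], "G2": [5.0, 10.0, 20.0]}  # G3 has no protein data
gamma_by_gene, gamma_fallback = estimate_gamma(protein, transcript)
```

G3 has no proteomic measurements in the toy data, so it receives the median of the two computed γ values.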

Modeling ET in breast cancer

In order to construct a reasonably comprehensive model of human signal transduction, I first downloaded all human protein-protein interactions graded B or higher from the signaling knowledgebase OmniPath⁵³, along with all transcriptional interactions, and all small molecule interactions. These were compiled into a set of nodes and edges, and an LP was constructed in the manner detailed in section “SCSTA”. The Python package optlang was used as an interface to the LP software package GLPK. All N values were taken to be proportional to their respective M values (taken from the TCGA BRCA data set, and mapped from Ensembl ID to the UniProt IDs used in OmniPath using UniProt’s online gene mapping tool), with γ values computed as described above in Methods section “Estimating γ”. In cases where a node in the model was not available in the patient’s RNAseq data, the median TPM value over all observations for that gene in ref. ¹⁹ was used, or failing that, the median TPM value over all observations of all genes in ref. ¹⁹.

OmniPath includes complexes as nodes; because proteins may be members of different complexes, and may also have signal transduction activity on their own, distributing protein copies to their respective complexes required some care. The approach taken was to begin by assuming all proteins’ expression levels (N values) were associated with the uncomplexed state, and then iteratively looping over every complex, discerning if there are enough uncomplexed copies of each constituent protein to form another complex, and if so, incrementing the complex copy number by 1 while decrementing that of each constituent uncomplexed protein. This process was repeated until no more complexes could be generated. Small molecules (which appear in some complexes) were assumed to be available in excess, and as such were ignored in this calculation.
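The iterative redistribution described above might look like the following sketch. The function name and inputs are hypothetical, copy numbers are treated as whole units, small molecules are assumed to have been removed from the member lists beforehand, and repeated members (e.g., a homodimer) are handled by counting them, which is an assumption about stoichiometry not stated in the text.

```python
from collections import Counter

def assemble_complexes(protein_copies, complexes):
    """Greedy round-robin assignment of free protein copies to complexes.

    protein_copies: dict protein -> number of uncomplexed copies
    complexes:      dict complex name -> list of constituent proteins
    Returns (remaining free copies, complex copy numbers).
    """
    free = dict(protein_copies)
    counts = {name: 0 for name in complexes}
    needs = {name: Counter(members) for name, members in complexes.items()}
    formed_any = True
    while formed_any:
        formed_any = False
        for name, need in needs.items():
            # Form one more copy only if every constituent is still available.
            if need and all(free.get(p, 0) >= k for p, k in need.items()):
                for p, k in need.items():
                    free[p] -= k
                counts[name] += 1
                formed_any = True
    return free, counts

protein_copies = {"A": 3, "B": 2, "C": 1}
complexes = {"AB": ["A", "B"], "AC": ["A", "C"]}
free_copies, complex_copies = assemble_complexes(protein_copies, complexes)
```

Looping over all complexes before returning to the first one spreads shared constituents across competing complexes instead of letting the first complex in the list consume them all. For realistic copy numbers a unit-by-unit loop is slow, so a production version would likely increment in larger steps.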

In total, the model incorporated 14,786 genes, 11,710 proteins, 1321 complexes, 3793 small molecules, and 210,043 total interactions.

For each patient, a baseline solution was first computed. I then perturbed the network by imposing a 99% reduction in the upper bound on ERα-activation (that is, for the ERα node, I altered the constraint in Eq. (1) such that the right-hand side was 0.01 × N). This was meant to mimic a strong inhibitor of ERα. I then computed the MPS as described in section “Minimally-perturbed solution”.

It is worth noting that OmniPath allows users to restrict their downloads to only those data sets with licenses that allow for commercial use; for the purposes of this research article this is unnecessary, although in many other instances such an option may be required. I also note that there is nothing intrinsic to the method that requires a specific network, OmniPath or otherwise. OmniPath was chosen in this work for its comprehensiveness, but other knowledgebases could have been used instead (KEGG, SIGNOR, etc.). Moreover, several methods have been developed for network inference, often relying on high-throughput experiments. Examples include the lpNet models (which use linear programming for network inference¹⁷,¹⁸). These and other types of unsupervised methods could be applied for network inference, and the resulting networks could be used within SCSTA studies. Minimal data requirements are outlined in Supplementary Section 1.9.

Computing phenotype scores

SIGNOR’s phenotypes data set²⁰ is comprised of a list of gene products that either up- or downregulate members of a set of phenotypes (incidentally, SIGNOR is a signaling pathway knowledgebase that forms part of OmniPath). I used it to compute a score for each phenotype for each patient using both the baseline gene expression data, and the gene expression data predicted by the MPS. Specifically, the level of activation of each gene was computed, and then transformed into protein expression values using appropriate values for ξ and γ (see Methods sections “Estimating ξ” and “Estimating γ”). In cases where the MPS resulted in gene activation levels that swung negative, the predicted copy number was set to zero (these correspond to situations in which the sum of inhibiting edges is larger than that of activating edges, and an “overly-inhibited” gene is still presumed to produce no transcripts or proteins). Then, weighted sums associated with each phenotype were computed; if a given gene product upregulates a given phenotype, the corresponding protein expression value was added, and when one downregulates a phenotype, it was subtracted.
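The score computation can be illustrated with the short sketch below. It assumes that predicted protein levels are obtained as γ·ξ·(gene activation), consistent with the prefactor in Eq. (11); the function name, gene names, and phenotype weights are hypothetical.

```python
def phenotype_scores(gene_activation, xi, gamma, phenotype_weights):
    """Signed sums of predicted protein levels for each phenotype.

    gene_activation:   dict gene -> net activation (may be negative in the MPS)
    xi, gamma:         dict gene -> activation-to-transcript and
                       transcript-to-protein ratios
    phenotype_weights: dict phenotype -> {gene: +1 (upregulates) or -1 (downregulates)}
    """
    # Negative activations are clipped: an over-inhibited gene yields no protein.
    protein = {g: max(0.0, gamma[g] * xi[g] * a) for g, a in gene_activation.items()}
    return {
        phenotype: sum(sign * protein.get(g, 0.0) for g, sign in weights.items())
        for phenotype, weights in phenotype_weights.items()
    }

gene_activation = {"G1": 2.0, "G2": -1.0, "G3": 4.0}
xi = {"G1": 0.5, "G2": 0.5, "G3": 0.5}
gamma = {"G1": 10.0, "G2": 10.0, "G3": 1.0}
phenotype_weights = {
    "Proliferation": {"G1": +1, "G3": -1},
    "Apoptosis": {"G2": +1, "G3": +1},
}
scores = phenotype_scores(gene_activation, xi, gamma, phenotype_weights)
```

In the toy data G2 has negative activation and therefore contributes nothing, giving scores of 8 for "Proliferation" and 2 for "Apoptosis".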

Many of these phenotypes correspond to closely related behaviors, and as such were further grouped. For example, the phenotypes “Metabolism,” “Glycolysis,” and “Oxidative phosphorylation” are all clearly associated with metabolism, and more generally, are associated with enhanced cell proliferation. Not only can they be grouped together, they can also reasonably be grouped with the phenotypes “Proliferation” and “Cell growth,” among others. Accordingly, a second round of weighted sums, corresponding to just six behaviors (proliferation, cell death, cell cycle, immune, stemness, and metastasis), was constructed. These are detailed in Supplementary Table 9.

Finally, because the absolute values of each score can vary wildly (in part because they are composed of different numbers of phenotypes, and those are composed of different numbers of gene products) the mean and standard deviation of the computed baseline scores for each behavior were found, and used to standardize both the baseline scores and the MPS scores such that:

$${X}^{* }=\left(X-{\mu }_{{X}_{{{{\rm{baseline}}}}}}\right)/{\sigma }_{{X}_{{{{\rm{baseline}}}}}}$$

(10)

where X represents some raw score, and X^* its corresponding standardized score, ${\mu }_{{X}_{{{{\rm{baseline}}}}}}$ represents the mean of the raw baseline scores, and ${\sigma }_{{X}_{{{{\rm{baseline}}}}}}$ represents the standard deviation of the raw baseline scores. This essentially z-scores the baseline scores, and transforms the MPS scores in terms of standard variates relative to the baseline scores.
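A minimal sketch of Eq. (10) follows. The text does not state whether the population or sample standard deviation was used; the population form (`pstdev`) is assumed here, and the function name is hypothetical.

```python
from statistics import mean, pstdev

def standardize_scores(baseline, mps):
    """Standardize baseline and MPS scores using baseline statistics only (Eq. 10)."""
    mu = mean(baseline)
    sigma = pstdev(baseline)  # assumption: population standard deviation
    z_baseline = [(x - mu) / sigma for x in baseline]
    z_mps = [(x - mu) / sigma for x in mps]
    return z_baseline, z_mps

baseline_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
mps_scores = [0.0, 1.0, 2.0, 3.0, 4.0]  # every patient's score drops by 1
z_baseline, z_mps = standardize_scores(baseline_scores, mps_scores)
```

Because both sets of scores share the baseline mean and standard deviation, the difference between a patient's standardized MPS and baseline scores is the raw score change expressed in baseline standard deviations.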

Imputation of feature values for TCGA patients

Several clinical and pathological features of interest were missing for many of the TCGA patients. These included immunohistochemistry (IHC) statuses for estrogen receptor (ER), progesterone receptor (PR), and Her2/neu, as well as T, N, and M stages.

The IHC statuses, when unavailable or equivocal, were imputed based on available RNAseq data. For each of the genes ESR1, PGR, and HER2, all patients with positive or negative statuses were identified and their corresponding gene expression levels (unstranded TPM) were extracted. Cutoffs were then selected for each gene in order to maximize Youden’s J statistic. This was done by iteratively looping over each gene TPM for each patient in the set, using it as a cutoff to predict IHC status, then computing the J statistic based on the known IHC statuses, and ultimately selecting the cutoff with the highest J statistic overall. The corresponding cutoffs were 2029 for ESR1, 390 for PGR, and 24311 for HER2. These were then used to impute any missing or equivocal IHC statuses.
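The cutoff search can be sketched as follows, using J = sensitivity + specificity − 1. The function name is hypothetical, and two details are assumptions: a patient is predicted positive when TPM is at or above the cutoff, and ties in J are resolved in favor of the lowest cutoff.

```python
def youden_cutoff(tpm, status):
    """Scan every observed TPM as a candidate cutoff and keep the one maximizing J.

    tpm:    list of expression values for patients with known IHC status
    status: list of bools (True = IHC positive), same order as tpm
    """
    n_pos = sum(status)
    n_neg = len(status) - n_pos
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(tpm)):
        # Assumption: predict positive when expression >= cutoff.
        tp = sum(1 for x, s in zip(tpm, status) if s and x >= cutoff)
        tn = sum(1 for x, s in zip(tpm, status) if not s and x < cutoff)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

tpm = [5.0, 20.0, 30.0, 100.0, 400.0, 900.0]
status = [False, False, True, False, True, True]
cutoff, j_stat = youden_cutoff(tpm, status)
```

A patient with missing or equivocal IHC status would then be imputed as positive if their TPM is at or above the selected cutoff.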

Missing T, N, and M stages (including indeterminate values such as those labeled TX, NX, or MX) were imputed based on the mode of the available data. Substages were ignored (e.g., T2a, T3b, etc. were treated as T2, T3, etc.), and then the most frequent stage of those available was computed and used to impute any missing values for the remaining patients.
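The stage imputation can be illustrated with the sketch below. The function name and the regular expression used to strip substages are assumptions; real TCGA stage strings may need additional handling.

```python
import re
from collections import Counter

def impute_stage(stages, prefix):
    """Collapse substages (e.g., 'T2a' -> 'T2') and fill gaps with the modal stage.

    stages: list of raw stage strings; None or 'TX'-style entries count as missing
    prefix: 'T', 'N', or 'M'
    """
    def collapse(raw):
        if not raw:
            return None
        match = re.search(rf"{prefix}(\d)", raw)
        # Indeterminate labels such as 'TX' have no digit and are treated as missing.
        return f"{prefix}{match.group(1)}" if match else None

    cleaned = [collapse(s) for s in stages]
    mode = Counter(c for c in cleaned if c is not None).most_common(1)[0][0]
    return [c if c is not None else mode for c in cleaned]

raw_t = ["T2a", "T2", "TX", None, "T1c", "T3b", "T2b"]
imputed_t = impute_stage(raw_t, "T")
```

In the toy list, T2 is the most frequent stage after collapsing substages, so both the "TX" entry and the missing entry become T2.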

Survival analysis

The Python package lifelines was used to perform Cox regressions and Kaplan–Meier fits. Overall survival data was extracted from TCGA, truncated at 10 years, and used in each analysis.

A Cox proportional hazards model was used to investigate whether the changes in each behavior score were associated with risk of death. I included as covariates the change in each behavior score (binarized at the median value of each score change), progesterone receptor (PR) and HER2 statuses, T-, N-, and M-stages, age at diagnosis (z-scored), and ESR1 expression level (wherein the TPM value was extracted, one was added to it, and the result was logged and z-scored). The latter was included in order to ensure that any risk associated with the predicted changes in behavior was not just a trivial association with high or low ERα expression, since the model itself is predicated (in part) on ERα expression.
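The covariate preparation described here can be sketched without any survival library. The helper names are hypothetical, values strictly above the median are coded as 1 (an assumption about how ties were handled), and the natural logarithm is used; because z-scoring removes any constant scale factor, the base of the logarithm does not change the result.

```python
from math import log
from statistics import mean, median, pstdev

def zscore(values):
    """Center and scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def binarize_at_median(values):
    """Code each value as 1 if it exceeds the cohort median, else 0."""
    med = median(values)
    return [1 if v > med else 0 for v in values]

def esr1_covariate(tpm):
    """log(TPM + 1), then z-scored across patients."""
    return zscore([log(t + 1.0) for t in tpm])

score_change = [-1.0, 0.5, 2.0, 3.0]
binary_change = binarize_at_median(score_change)
esr1_z = esr1_covariate([0.0, 9.0, 99.0, 999.0])
```

Columns prepared this way, together with the stage and receptor-status indicators, would form the covariate table passed to the Cox model.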

Because the change in proliferation score was found to be associated with risk of death, a univariate Cox model was also fit, and an associated Kaplan–Meier plot was produced (see Fig. 3b).

Additional Cox regressions were performed to investigate the impact of including the pretreatment proliferation score. These included a bivariate analysis (pretreatment proliferation score, and the change in score, both binarized at their respective medians), and a more comprehensive model that also included age, T-, N-, and M-stages, PR and HER2 statuses, and ESR1 expression. Finally, patients were grouped into four categories, “+/+,” “+/−,” “−/+,” and “−/−” based on their binarized change in proliferation score and binarized pretreatment proliferation score, respectively, and a Kaplan–Meier plot was produced (see Supplementary Fig. 6).

Determining drivers of changes in behavior scores

In order to determine which gene-associated edges are most responsible for the predicted population-level changes in each behavior score, I first computed the quantity π, representing the per-patient score change in response to a change in activity of a given edge:

$${\pi }_{i,k}^{l}={\gamma }_{i}{\xi }_{i}\left(\sum\limits_{j}{f}_{i}^{l}{\delta }_{i}^{l}{p}_{i,j}{b}_{j,k}\right)$$

(11)

where l represents the index of a given patient, γ_i is the gene product to transcript ratio for the gene with parent edge i, ξ_i is its activation to transcript ratio, ${\delta }_{i}^{l}={e}_{i}^{{{{\rm{MPS}}}}}-{e}_{i}^{{{{\rm{baseline}}}}}$ is the change in activation of edge i in patient l, p_i,j is the weighting of the gene product associated with edge i in phenotype j (either 1 if it upregulates the phenotype, −1 if it downregulates it, or 0 if it does not affect the phenotype), and b_j,k is the weighting of phenotype j on behavior score k (analogous to p_i,j, b_j,k takes values 1, −1, or 0; see Supplementary Table 9). The quantity ${f}_{i}^{l}$ corrects for gene activations that swing negative in the MPS (see Methods section “Computing phenotype scores”); it is the fraction of the change in the activity of the gene associated with edge i that is non-negative. I then used these values to compute the fraction of the total absolute score change (over all genes and all patients) that was associated with each edge:

$${\phi }_{i,k}=\frac{{\sum }_{{{{\rm{patient}}}}\,l}| | {\pi }_{i,k}^{l}| | }{{\sum }_{{{{\rm{patient}}}}\,l}{\sum }_{{{{\rm{edge}}}}\,i}| | {\pi }_{i,k}^{l}| | }$$

(12)

Higher values of ϕ_i,k indicate that edge i is responsible for a larger fraction of the total change in score k.
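Equations (11) and (12) can be illustrated for a single behavior score k with the following sketch. The function name and the nested-list layout are hypothetical, and the toy numbers are invented purely to exercise the arithmetic.

```python
def edge_score_fractions(delta, f, gamma, xi, p, b, k):
    """Compute pi (Eq. 11) and phi (Eq. 12) for behavior k.

    delta[l][i], f[l][i]: change in edge activity and non-negative fraction,
                          for patient l and edge i
    gamma[i], xi[i]:      ratios for the gene whose parent edge is i
    p[i][j]:              weight (+1, -1, 0) of edge i's gene product in phenotype j
    b[j][k]:              weight (+1, -1, 0) of phenotype j in behavior k
    """
    n_edges = len(gamma)
    # Net weight of each edge's gene product on behavior k: sum_j p[i][j] * b[j][k]
    weight = [sum(p[i][j] * b[j][k] for j in range(len(b))) for i in range(n_edges)]
    pi = [
        [gamma[i] * xi[i] * f_l[i] * d_l[i] * weight[i] for i in range(n_edges)]
        for d_l, f_l in zip(delta, f)
    ]
    # Fraction of the total absolute score change attributable to each edge.
    per_edge = [sum(abs(row[i]) for row in pi) for i in range(n_edges)]
    total = sum(per_edge)
    phi = [value / total for value in per_edge]
    return pi, phi

# Two patients, two edges, two phenotypes, one behavior.
delta = [[-4.0, 1.0], [-2.0, 3.0]]
f = [[1.0, 1.0], [0.5, 1.0]]
gamma, xi = [2.0, 1.0], [0.5, 1.0]
p = [[1, 0], [0, -1]]
b = [[1], [1]]
pi, phi = edge_score_fractions(delta, f, gamma, xi, p, b, k=0)
```

In this example the first edge accounts for 5/9 of the total absolute score change and the second for 4/9.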

Maximally-dysregulated pathways

Given a set of paired baseline and MPS solutions, the MDP is the pathway that connects some edge in the network to a perturbed node such that each edge along the pathway best explains the change in the following downstream edge. In order to compute this, I first introduce the quantity ψ_i,n, representing the fraction of the sum over all patients of the absolute values of the changes in activation flowing into or out of each node, n, associated with each child or parent edge i:

$${\psi }_{i,n}=\frac{{\sum }_{{{{\rm{patient}}}}\,l}{\delta }_{i,n}^{l}}{{\sum }_{{{{\rm{patient}}}}\,l}{\sum }_{{{{\rm{edge}}}}\,i}| | {\delta }_{i,n}^{l}| | }$$

(13)

where ${\delta }_{i,n}^{l}={e}_{i}^{{{{\rm{MPS}}}}}-{e}_{i}^{{{{\rm{baseline}}}}}$ is the change in edge i (associated with node n) under the perturbation. These ψ values can be used to construct the MDP straightforwardly.

First, an edge (denoted e₁) is specified. This may be, for example, the transcriptional regulation of some gene by some transcription factor. I then compute the ψ values of all incoming and outgoing edges connecting to e₁’s parent node (the transcription factor, denoted n₁). From there, I determine which other edges connected to n₁ could possibly give rise to the observed change in e₁. For example, if e₁ tends to decrease (negative ψ), then I am only interested in other edges connected to n₁ that can explain that decrease, which include child edges that increase in activity (positive ψ values, representing a diversion of activity to some other pathway), an activating parent edge that decreases in activity, or an inhibiting parent edge that increases in activity. Of those possibilities, the one with the largest absolute value for ψ is selected, and labeled as e₂, with its parent being labeled as n₂. This process is repeated iteratively (explicitly excluding the possibility of traversing the same edge twice) until the perturbed node (e.g., ERα) is reached.
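The traversal can be sketched as a greedy walk over ψ values, as below. This is an illustrative reading of the procedure, not the study's implementation: the function names and edge encoding are hypothetical, the rule for an edge that increases is assumed to mirror the stated rule for one that decreases, and the walk simply stops if no explanatory edge is found.

```python
def psi_at_node(node, edges, delta):
    """Eq. (13): signed change of each edge at `node`, normalized by the total
    absolute change over all edges touching that node.

    edges: dict edge_id -> (source, target, sign), sign +1 activating, -1 inhibiting
    delta: dict edge_id -> list of per-patient changes (MPS minus baseline)
    """
    incident = [e for e, (s, t, _) in edges.items() if node in (s, t)]
    denom = sum(abs(d) for e in incident for d in delta[e])
    return {e: (sum(delta[e]) / denom if denom else 0.0) for e in incident}

def maximally_dysregulated_pathway(start_edge, perturbed_node, edges, delta):
    path, visited = [start_edge], {start_edge}
    edge, node = start_edge, edges[start_edge][0]  # node is the edge's parent
    while node != perturbed_node:
        psi = psi_at_node(node, edges, delta)
        decreasing = psi[edge] < 0
        candidates = []
        for e, value in psi.items():
            if e in visited or value == 0:
                continue
            source, _, sign = edges[e]
            if source == node:
                # Child edge: explains the change if it moves the opposite way.
                explains = (value > 0) == decreasing
            else:
                # Parent edge: activating moves with the change, inhibiting against it.
                same_direction = (value < 0) == decreasing
                explains = same_direction if sign > 0 else not same_direction
            if explains:
                candidates.append((abs(value), e))
        if not candidates:
            break  # assumption: stop when nothing explains the change
        _, edge = max(candidates)
        visited.add(edge)
        path.append(edge)
        node = edges[edge][0]
    return path

edges = {
    "ERa->TF": ("ERa", "TF", +1),
    "KIN->TF": ("KIN", "TF", +1),
    "TF->GENE": ("TF", "GENE", +1),
    "TF->OTHER": ("TF", "OTHER", +1),
}
delta = {
    "ERa->TF": [-3.0, -5.0],
    "KIN->TF": [-1.0, 0.0],
    "TF->GENE": [-2.0, -4.0],
    "TF->OTHER": [-1.0, -1.0],
}
mdp = maximally_dysregulated_pathway("TF->GENE", "ERa", edges, delta)
```

In the toy network the decrease in "TF->GENE" is best explained by the decrease in the activating edge "ERa->TF", whose parent is the perturbed node, so the walk ends after one step.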

Modeling ET in 25 patients with pre- and post-treatment transcriptomes

Transcriptomic data associated with ref. ⁴⁸ was downloaded from ArrayExpress⁵⁴ (accession: E-MTAB-9917). Patients for which at least one sample was associated with a pre-treatment timepoint were selected. Log₂ normalized counts were then transformed to TPM, and SCSTA was used to model ET as described above in Methods section “Modeling ET in breast cancer”.

Linear Regression to predict changes in gene expression under ET

As a comparator for the SCSTA calculations, a multiple linear regression model was developed to predict post-ET gene expression from pre-ET expression. In each case, the earliest post-treatment sample was selected. The LinearRegression implementation in Python’s sklearn package was used. The model was trained to predict an entire post-treatment transcriptome from a pretreatment transcriptome. A leave-one-out approach was used, wherein patient i’s gene expression was predicted using their pre-treatment expression data, and a model trained on all j ≠ i patients’ pre- and post-treatment data (the training set). I tried using TPM values directly as inputs and outputs, as well as log₂-transformed values, and found the model performed better with logged values; the log₂-based model was ultimately used as the comparator described in section “Direct comparison of SCSTA predictions with measured post-ET data”.

Data availability

This study utilized only previously-published and freely available data sets (https://portal.gdc.cancer.gov/projects/TCGA-BRCA, and tables published with ref. ¹⁹); no new primary data (e.g., transcriptomics, proteomics, etc.) were generated.

Code availability

The underlying code for this study is not publicly available but may be made available to qualified researchers on reasonable request from SimBioSys Inc.

References

Schrum, A. G. & Gil, D. Robustness and specificity in signal transduction via physiologic protein interaction networks. Clin. Exp. Pharmacol. 2, S3–001 (2012).
Birtwistle, M. R. et al. Ligand-dependent responses of the erbb signaling network: experimental and modeling analyses. Mol. Syst. Biol. 3, 144 (2007).
Erdem, C. et al. A scalable, open-source implementation of a large-scale mechanistic model for single cell proliferation and death signaling. Nat. Commun. 13, 3555 (2022).
Neves, S. R. & Iyengar, R. Modeling of signaling networks. Bioessays 24, 1110–1117 (2002).
Hughey, J. J., Lee, T. K. & Covert, M. W. Computational modeling of mammalian signaling networks. Wiley Interdiscip. Rev. Syst. Biol. Med. 2, 194–209 (2010).
Samaga, R. & Klamt, S. Modeling approaches for qualitative and semi-quantitative analysis of cellular signaling networks. Cell Commun. Signal. 11, 1–19 (2013).
Klipp, E. & Liebermeister, W. Mathematical modeling of intracellular signaling pathways. BMC Neurosci. 7, 1–16 (2006).
Albert, R. & Wang, R.-S. Discrete dynamic modeling of cellular signaling networks. Methods Enzymol. 467, 281–306 (2009).
Morris, M. K., Saez-Rodriguez, J., Sorger, P. K. & Lauffenburger, D. A. Logic-based models for the analysis of cell signaling networks. Biochemistry 49, 3216–3224 (2010).
Albert, R. & Thakar, J. Boolean modeling: a logic-based dynamic approach for understanding signaling and regulatory networks and for making useful predictions. Wiley Interdiscip. Rev. Syst. Biol. Med. 6, 353–369 (2014).
Abou-Jaoudé, W. et al. Logical modeling and dynamical analysis of cellular networks. Front. Genet. 7, 94 (2016).
Koch, I. & Büttner, B. Computational modeling of signal transduction networks without kinetic parameters: Petri net approaches. Am. J. Physiol.-Cell Physiol. 324, C1126–C1140 (2023).
Orth, J. D., Thiele, I. & Palsson, B. Ø. What is flux balance analysis? Nat. Biotechnol. 28, 245–248 (2010).
Becker, S. A. & Palsson, B. O. Context-specific metabolic networks are consistent with experiments. PLoS Comp. Biol. 4, e1000082 (2008).
O’Brien, E. J., Lerman, J. A., Chang, R. L., Hyduke, D. R. & Palsson, B. Ø. Genome-scale models of metabolism and gene expression extend and refine growth phenotype prediction. Mol. Syst. Biol. 9, 693 (2013).
Vardi, L., Ruppin, E. & Sharan, R. A linearized constraint-based approach for modeling signaling networks. J. Comput. Biol. 19, 232–240 (2012).
Knapp, B. & Kaderali, L. Reconstruction of cellular signal transduction networks using perturbation assays and linear programming. PLoS One 8, e69220 (2013).
Matos, M. R. A., Knapp, B. & Kaderali, L. lpnet: a linear programming approach to reconstruct signal transduction networks. Bioinformatics 31, 3231–3233 (2015).
Jiang, L. et al. A quantitative proteome map of the human body. Cell 183, 269–283 (2020).
Licata, L. et al. Signor 2.0, the signaling network open resource 2.0: 2019 update. Nucleic Acids Res. 48, D504–D510 (2020).
Al Saleh, S., Al Mulla, F. & Luqmani, Y. A. Estrogen receptor silencing induces epithelial to mesenchymal transition in human breast cancer cells. PloS One 6, e20610 (2011).
Tian, M. & Schiemann, W. P. Tgf-β stimulation of emt programs elicits non-genomic er-α activity and anti-estrogen resistance in breast cancer cells. J. Cancer Metastasis Treat. 3, 150 (2017).
Yuan, J. et al. Acquisition of epithelial-mesenchymal transition phenotype in the tamoxifen-resistant breast cancer cell: a new role for g protein-coupled estrogen receptor in mediating tamoxifen resistance through cancer-associated fibroblast-derived fibronectin and β1-integrin signaling pathway in tumor cells. Breast Cancer Res. 17, 1–18 (2015).
Sahoo, S. et al. A mechanistic model captures the emergence and implications of non-genetic heterogeneity and reversible drug resistance in er+ breast cancer cells. NAR Cancer 3, zcab027 (2021).
Baxter, E. et al. Using proliferative markers and oncotype dx in therapeutic decision-making for breast cancer: the bc experience. Curr. Oncol. 22, 192–198 (2015).
Layman, R. M. et al. Clinical outcomes and oncotype dx breast recurrence score® in early-stage brca-associated hormone receptor-positive breast cancer. Cancer Med. 11, 1474–1483 (2022).
Wu, Y. et al. Tamoxifen resistance in breast cancer is regulated by the ezh2–erα–greb1 transcriptional axis. Cancer Res. 78, 671–684 (2018).
Hodgkinson, K. M. & Vanderhyden, B. C. Consideration of greb1 as a potential therapeutic target for hormone-responsive or endocrine-resistant cancers. Expert Opin. Ther. Targets 18, 1065–1076 (2014).
Rae, J. M. et al. Greb1 is a critical regulator of hormone dependent breast cancer growth. Breast Cancer Res. Treat. 92, 141–149 (2005).
Chand, A. L. et al. The orphan nuclear receptor lrh-1 and erα activate greb1 expression to induce breast cancer cell proliferation. PloS One 7, e31593 (2012).
Testa, J. R. & Bellacosa, A. Akt plays a central role in tumorigenesis. Proc. Natl. Acad. Sci. USA 98, 10983–10985 (2001).
Stål, O. et al. Akt kinases in breast cancer and the results of adjuvant therapy. Breast Cancer Res. 5, 1–8 (2003).
Liu, X. et al. Elevated hexokinase II expression confers acquired resistance to 4-hydroxytamoxifen in breast cancer cells. Mol. Cell. Proteom. 18, 2273–2284 (2019).
Moslehi, R. et al. Integrative genomic analysis implicates ercc6 and its interaction with ercc8 in susceptibility to breast cancer. Sci. Rep. 10, 21276 (2020).
González-González, L. & Alonso, J. Periostin: a matricellular protein with multiple functions in cancer development and progression. Front. Oncol. 8, 225 (2018).
Massarweh, S. et al. Tamoxifen resistance in breast tumors is driven by growth factor receptor signaling with repression of classic estrogen receptor genomic function. Cancer Res. 68, 826–833 (2008).
Zhao, H. et al. Enhanced resistance to tamoxifen by the c-abl proto-oncogene in breast cancer. Neoplasia 12, 214–IN3 (2010).
Zhou, C. et al. Proteomic analysis of acquired tamoxifen resistance in mcf-7 cells reveals expression signatures associated with enhanced migration. Breast Cancer Res. 14, 1–21 (2012).
Ataseven, B. et al. Ptk7 as a potential prognostic and predictive marker of response to adjuvant chemotherapy in breast cancer patients, and resistance to anthracycline drugs. OncoTargets Ther. 7, 1723–1731 (2014).
Louie, M. C., McClellan, A., Siewit, C. & Kawabata, L. Estrogen receptor regulates e2f1 expression to mediate tamoxifen resistance. Mol. Cancer Res. 8, 343–352 (2010).
Asghari, A. et al. A novel group of genes that cause endocrine resistance in breast cancer identified by dynamic gene expression analysis. Oncotarget 13, 600 (2022).
Lee, K. Y. et al. Pi3-kinase/p38 kinase-dependent e2f1 activation is critical for pin1 induction in tamoxifen-resistant breast cancer cells. Mol. Cells 32, 107–111 (2011).
Louie, M. C., Zou, J. X., Rabinovich, A. & Chen, H.-W. Actr/aib1 functions as an e2f1 coactivator to promote breast cancer cell proliferation and antiestrogen resistance. Mol. Cell. Biol. 24, 5157–5171 (2004).
Huang, R. et al. Increased stat1 signaling in endocrine-resistant breast cancer. PloS One 9, e94226 (2014).
Hou, Y. et al. Stat 1 facilitates oestrogen receptor α transcription and stimulates breast cancer cell proliferation. J. Cell. Mol. Med. 22, 6077–6086 (2018).
Yang, Y. et al. Clinical implications of high nqo1 expression in breast cancers. J. Exp. Clin. Cancer Res. 33, 1–9 (2014).
Yamaguchi, N., Nakayama, Y. & Yamaguchi, N. Down-regulation of forkhead box protein a1 (foxa1) leads to cancer stem cell-like properties in tamoxifen-resistant breast cancer cells through induction of interleukin-6. J. Biol. Chem. 292, 8136–8148 (2017).
Xia, Y. et al. Integrated DNA and RNA sequencing reveals drivers of endocrine resistance in estrogen receptor–positive breast cancer. Clin. Cancer Res. 28, 3618–3629 (2022).
Mahadevan, R., Edwards, J. S. & Doyle, F. J. Dynamic flux balance analysis of diauxic growth in escherichia coli. Biophys. J. 83, 1331–1340 (2002).
Cole, J. A., Kohler, L., Hedhli, J. & Luthey-Schulten, Z. Spatially-resolved metabolic cooperativity within dense bacterial colonies. BMC Syst. Biol. 9, 1–17 (2015).
Howard, F. M. et al. Highly accurate response prediction in high-risk early breast cancer patients using a biophysical simulation platform. Breast Cancer Res. Treat. 196, 57–66 (2022).
Peterson, J. R. et al. Novel computational biology modeling system can accurately forecast response to neoadjuvant therapy in early breast cancer. Breast Cancer Res. 25, 54 (2023).
Türei, D., Korcsmáros, T. & Saez-Rodriguez, J. Omnipath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods 13, 966–967 (2016).
Sarkans, U. et al. From arrayexpress to biostudies. Nucleic Acids Res. 49, D1502–D1506 (2021).

Acknowledgements

The author would like to thank Dr. Daniel Cook for helpful discussions relating to this work, and everyone else he continues to enjoy working alongside at SimBioSys Inc. In addition, the author thanks his reviewers, whose comments have greatly strengthened this manuscript, suggested new areas of inquiry, and solidified his own thinking on the interpretation of the results presented and on various extensions of the ideas described herein.

Author information

Authors and Affiliations

SimBioSys Inc., Champaign, IL, USA
John Cole

Contributions

J.C. conceived of and performed all calculations, analyzed the results, and wrote the manuscript.

Corresponding author

Correspondence to John Cole.

Ethics declarations

Competing interests

J.C. is a co-founder, stockholder, and employee of SimBioSys Inc.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Cole, J. Self-consistent signal transduction analysis for modeling context-specific signaling cascades and perturbations. npj Syst Biol Appl 10, 78 (2024). https://doi.org/10.1038/s41540-024-00404-x

Received: 19 September 2023
Accepted: 12 July 2024
Published: 19 July 2024
DOI: https://doi.org/10.1038/s41540-024-00404-x
Springer Nature Limited
