Introduction

Human epididymis protein-4 (HE4), encoded by the gene whey acidic protein (WAP) four-disulfide core domain 2 (WFDC2), has been extensively studied as a biomarker in ovarian cancer [1, 2]. Serum HE4, when used in combination with serum CA125 in the Risk of Ovarian Malignancy Algorithm (ROMA), demonstrates significantly greater detection sensitivity at a set specificity of 75% than the Risk of Malignancy Index (RMI), which uses pelvic imaging, menopausal status, and serum CA125 values. Importantly, serum HE4 levels are less commonly elevated in benign gynecologic conditions than CA125 [2]. Moreover, in vitro and in vivo analyses have shown that HE4 promotes ovarian cancer cell growth, extracellular matrix production, metastatic properties, angiogenesis, and immune evasion through regulation of signaling pathways, including focal adhesion kinase (FAK), extracellular signal-regulated kinase (ERK), and signal transducer and activator of transcription 3 (STAT3) (reviewed in [3]). Our previous studies have uncovered a role for HE4 in promoting chemoresistance and extracellular matrix gene expression in ovarian cancer cells, as well as pro-angiogenic gene expression in immune cells [4,5,6]. Nonetheless, a complete understanding of how this enigmatic and multi-functional protein functions in the context of the human ovarian tumor microenvironment to promote chemoresistance and worse survival outcomes remains elusive. While numerous studies have explored its use as a diagnostic and prognostic serum biomarker, no study to date has examined the transcriptomic signature associated with high- and low-WFDC2 mRNA levels in high-grade serous ovarian cancer (HGSOC) patient tumors.

The complexity of the ovarian cancer microenvironment highlights the importance of studying a particular protein in the context of this microenvironment in order to fully understand its significance in human ovarian tumors. Ovarian cancer is unique in that tumorigenic cells originating from the fallopian tubes “seed” onto the ovary, where they find an attractive niche to promote their growth. One factor that may aid in the production of this ovarian niche is the development of fibrosis, which occurs with advancing age [7]. Cancer associated fibroblasts (CAFs) promote tumorigenesis through production of cytokines, growth factors, and extracellular matrix components, leading to worsening fibrosis and tumor growth [8]. Subsequent to the growth of cancer cells in the ovary, these cells are susceptible to spread within the peritoneal cavity, homing to the highly vascular, fatty tissue of the omentum that uses immunologic and metabolic mechanisms to promote tumor growth [9]. In order to understand the role of HE4/WFDC2 in ovarian cancer, it is important to consider its known functions and how these might interplay with this complex ovarian tumor microenvironment.

HE4 is a small secreted glycoprotein and protease inhibitor that was originally identified in the epididymis and suggested to be important in sperm maturation. It was later found to be expressed in several tissue types, namely trachea, salivary gland, kidney, lung, and nasal epithelium, and may be involved in host defense of airways [10,11,12]. In addition to playing a role in ovarian cancer, it is a key fibroblast mediator of kidney fibrosis, inhibiting the ability of matrix metalloproteinases to degrade collagen I, and is also associated with fibrotic diseases of the lung and liver [13,14,15,16,17]. Its role in promoting fibrosis is a critical component to examine given the degree of desmoplastic, fibrotic matrix that characterizes epithelial ovarian cancer (EOC).

To begin investigating the function of HE4/WFDC2 in the context of the ovarian tumor microenvironment, we performed gene expression analysis of 378 HGSOC tumors with RNA-sequencing (RNA-seq) data available through The Cancer Genome Atlas (TCGA), and compared differential gene expression in samples with high- versus low-WFDC2 mRNA levels. We compared these findings to DepMap gene expression correlation data for serous ovarian cancer cell lines. We identified strong co-regulation of WFDC2 and two other WAP-family members, secretory leukocyte peptidase inhibitor (SLPI/WFDC4) and peptidase inhibitor-3 (PI3/WFDC14/elafin), suggesting an important role for this family of protease inhibitors in promoting ovarian tumor pathogenesis. Moreover, our results reveal a potential novel role for HE4 in modulating oxidative phosphorylation metabolic gene expression in HGSOC. Our data support previously identified HE4 functions in mediating extracellular matrix, angiogenesis, and immune response in ovarian cancer cells, while also highlighting the critical importance of examining HE4 effects in the context of the tumor microenvironment.

Results

HE4 (WFDC2) expression correlates with clinical survival outcomes

As a first exploration of WFDC2 expression in the ovarian cancer TCGA dataset (TCGA-OV), we compared WFDC2 mRNA expression with HE4 protein levels using the cBioPortal, revealing a strong correlation between mRNA and protein (Spearman r = 0.74, p = 2.56e-19) (Fig. 1A). These results suggest that WFDC2 mRNA levels are a relatively accurate representation of its protein expression in human ovarian tumors. Secondly, we examined mRNA expression in relationship to copy number alterations (CNAs). Five percent of 594 samples possess putative copy number amplifications, and CNAs were correlated with mRNA expression, with lower mRNA in the samples with shallow deletion, and higher mRNA in the samples with amplification. Most samples were diploid or possessed copy number gains in WFDC2 (Fig. 1B).

Fig. 1

HE4 (WFDC2) expression correlates with clinical survival outcomes. A WFDC2 mRNA and HE4 protein levels were correlated by Spearman rank correlation analysis using all samples of the Firehose Legacy cohort with mRNA (RNA Seq V2 RSEM) and protein (mass spectrometry by CPTAC) levels available (n = 105), showing strong correlation between mRNA and protein. B Putative copy number alterations (CNAs) were plotted against WFDC2 mRNA (RNA Seq V2 RSEM) levels in the Firehose Legacy cohort. A majority of samples were diploid (n = 95) or had CNA gains (n = 169). Median mRNA levels generally corresponded to CNAs. C WFDC2 mRNA levels (RNA Seq V2 RSEM) were correlated to mutation counts using Spearman rank correlation (left panel). Firehose Legacy TCGA samples were split into low (8–43) and high (43–158) mutation count groups and median WFDC2 mRNA levels (RNA Seq V2 RSEM and U133 microarray) determined for each group (right panels). There was a small, non-significant inverse correlation between WFDC2 and mutation count, as well as a small but significant decrease in WFDC2 mRNA levels in patients with higher mutation counts. D Kaplan–Meier curves for overall survival and progression-free survival were determined for WFDC2 using all cohorts (GEO Series and TCGA) available for ovarian cancer at http://KMplot.com. Analysis was restricted to serous Stage III and IV, grade 2 and 3. Patients with higher WFDC2 levels had worse overall survival. E Kaplan–Meier curves for overall survival and progression-free survival were determined for WFDC2 using all cohorts (GEO Series and TCGA) available for ovarian cancer at http://KMplot.com. Analysis was restricted to serous Stage III and IV, grade 2 and 3, optimally debulked only. Patients with higher WFDC2 levels had worse overall survival and progression-free survival.

Next, as it has been reported that lower tumor mutation count is related to decreased progression-free survival (PFS) and overall survival (OS) in ovarian cancer [18], we sought to explore the relationship between WFDC2 and tumor mutation count. There was a small but significant decrease in WFDC2 mRNA expression in tumors with high (43–158) versus low (8–43) raw mutation count, which was also observed in a slight, but non-significant inverse correlation between WFDC2 mRNA expression and mutation count (Spearman r = −0.14, p = 0.058) (Fig. 1C).

To explore the relationship of WFDC2 mRNA with clinical outcomes, we performed Kaplan–Meier analysis for WFDC2 in TCGA-OV and Gene Expression Omnibus (GEO) Series. There was no significant difference in PFS in WFDC2-high versus -low groups when examining all serous samples, and only a non-significant trend toward worse OS in patients with high-WFDC2 (HR = 1.17, p = 0.083) (data not shown). Next, we narrowed down our analysis to stage III and IV, grade 2 and 3 (n = 975). Using the upper quartile cutoff, we again found no significant difference between WFDC2-high and -low groups with regard to PFS; however, WFDC2-high patients had significantly worse OS (HR = 1.22, p = 0.046) (Fig. 1D). When this group was narrowed further to include only optimally debulked patients (n = 495), the WFDC2-high group showed decreased PFS (HR = 1.31, p = 0.037) and OS (HR = 1.38, p = 0.027) (Fig. 1E). Collectively, these results agree with previously published studies showing HE4 serum levels are prognostic for ovarian cancer patients [19], and also suggest that patients with suboptimal debulking may experience poor outcomes regardless of HE4 levels.

Differential gene expression reveals a positive correlation of WFDC2 and SLPI across many cancers

We next performed differential gene expression analysis in the TCGA-OV dataset, which was split into high- and low-WFDC2 groups according to median WFDC2 fragments per kilobase of transcript per million mapped reads (FPKM) levels (Table S1). Principal component analysis (PCA) revealed no strong grouping of the samples, which is unsurprising given the large population with a high degree of biological variability inherently present (Fig. 2A). We identified 512 significant differentially expressed genes (DEGs) (p-adj. < 0.05, |log2 fold-change| ≥ 0.5, protein-coding), with 399 DEGs corresponding to high-WFDC2 expression (“high-WFDC2 DEGs”) and 113 DEGs corresponding to low-WFDC2 expression (“low-WFDC2 DEGs”) (Table S2). A volcano plot was generated to show the top five DEGs, which reassuringly included WFDC2 (Fig. 2B). Notably, another WAP-domain containing protein, secretory leukocyte peptidase inhibitor (SLPI/WFDC4), was also among the top five DEGs that were associated with high-WFDC2 levels. We then performed correlation analyses of all DEGs with log2 fold-change of ≥ 0.5 in either direction. The cBioPortal co-expression feature was used to generate Spearman r values using RNA Seq V2 RSEM data. As expected, the direction of correlations matched well with the differential gene expression analysis (data not shown). All DEGs that were significantly correlated (p < 0.01) with Spearman r ≥ 0.3 in either direction were compared by heatmap analysis to fold-change values, displaying a high degree of similarity between the differential gene expression analysis and correlation analysis results (Fig. 2C).

Fig. 2

Differential gene expression reveals a positive correlation of WFDC2 and SLPI across many cancers. A Principal component analysis (PCA) of all TCGA-OV samples. B Volcano plot analysis showing top five differentially expressed genes (DEGs) between high- and low-WFDC2 levels. Protein-coding genes significantly changed (p-adj. < 0.05) with log2 fold-change ≥ 0.5 in either direction are shown as red dots. C All DEGs were correlated with WFDC2 in cBioPortal. Log2 fold-change (log2 FC) and Spearman r-values are represented in a heat map side-by-side comparison for all genes that significantly correlated with WFDC2 (Spearman r ≥ 0.3 in either direction). Fold-change data versus correlation data show a high degree of similarity. D, E SLPI was determined to be the high-WFDC2 DEG that most strongly correlated with WFDC2 in the Firehose Legacy cohort, while AKAP12 was the low-WFDC2 DEG most negatively correlated with WFDC2. Average FPKM values for SLPI and AKAP12 were plotted for WFDC2-high versus WFDC2-low samples. ****p < 0.0001. F, G Spearman rank correlations are shown for SLPI and AKAP12. H Pan-cancer Spearman rank correlation analysis of WFDC2 and SLPI.

As previously noted, SLPI emerged as a top high-WFDC2 DEG that was also the most strongly positively correlated gene (Spearman r = 0.59, p = 6.9e-30). A-kinase anchor protein-12 (AKAP12) emerged as the low-WFDC2 DEG that was most strongly negatively correlated with WFDC2 (Spearman r = −0.36, p = 1.09e-10) (Fig. 2D–G). Since SLPI was very strongly correlated with WFDC2 and is also a WAP-domain containing protease inhibitor, we suspected that these two proteins may show a tendency toward co-regulation. We performed a pan-cancer correlation analysis of WFDC2 and SLPI, which revealed their strong correlation across many cancers, in particular pancreatic adenocarcinoma (PAAD), thymoma (THYM), and uterine carcinosarcoma (UCS) (Fig. 2H). While SLPI has been reported to play a role in ovarian cancer pathogenesis and is associated with worse outcomes in ovarian cancer patients [20,21,22], there is no clear understanding of the potential overlapping or divergent roles HE4/WFDC2 and SLPI may have in regulating tumorigenic properties.

In addition to SLPI, the gene peptidase inhibitor-3 (PI3), encoding the protein elafin, was also among the top high-WFDC2 DEGs and was strongly correlated with WFDC2 (r = 0.386, p = 2.39e-12). Elafin is also a WFDC protein that is overexpressed in ovarian cancer and related to poor outcomes [23, 24]; however, even less is known about elafin’s role in ovarian cancer than SLPI’s. Nonetheless, the co-regulation of these three WFDC proteins in ovarian cancer suggests an important role for this protein family in this disease.

Gene ontology analysis implicates metabolism and extracellular matrix correlations with WFDC2 mRNA expression

Gene ontology (GO) analysis was performed on the DEGs of high- and low-WFDC2 expressing tumors. The complete lists of enriched categories are shown in Tables S3 and S4. Strikingly, categories related to oxidative phosphorylation (OXPHOS)/mitochondrial metabolism were highly enriched in high-WFDC2 DEGs. Categories termed “antimicrobial humoral response” and “neutrophil activation” were also significantly enriched, which is interesting in light of a previously reported role for HE4 in regulating innate immunity of the respiratory tract [10] (Fig. 3A, Table S3). Of note, in the “neutrophil activation” category, CXCL8 (Interleukin-8; IL8), was among the identified DEGs, which is in agreement with our previous study reporting on the regulation of CXCL8/IL8 by HE4 in immune cells [6].

Fig. 3

Gene ontology analysis implicates metabolism and extracellular matrix correlations with WFDC2 mRNA expression. A Gene ontology analysis was performed for all DEGs associated with high-WFDC2, revealing enrichment in categories related to metabolism/oxidative phosphorylation. The number of genes in each category (“Count”) is indicated by circle size, while adjusted p-value (“p.adjust”) is indicated by color. B Gene ontology analysis was performed for all differentially expressed genes associated with low-WFDC2, revealing enrichment in categories related to extracellular matrix, vascular development, and proliferation. The number of genes in each category (“Count”) is indicated by circle size, while adjusted p-value (“p.adjust”) is indicated by color.

GO assessment of low-WFDC2 DEGs revealed enrichment for categories related to extracellular matrix, vascular development, epithelial cell proliferation, and ERK signaling (Fig. 3B, Table S4). These results were particularly surprising given the consistent reports of the stimulatory role of WFDC2 in metastasis, angiogenesis, proliferation, and ERK signaling [3, 6], raising the possibility of negative feedback mechanisms producing this unexpected result. In support of this hypothesis, we noted that expression of tenascin-C (TNC), which we have previously found to be upregulated by HE4 overexpression or treatment in OVCAR8 ovarian cancer cells [5], was a low-WFDC2 DEG. Alternatively, the enrichment of these categories may occur through selective alterations in genes that negatively regulate these processes. Consistent with this alternative explanation, we noted that early growth response-1 (EGR1) was a low-WFDC2 DEG. We have previously reported that cisplatin-induced EGR1 expression was suppressed in WFDC2-overexpressing cells [4], supporting a role for HE4 in suppressing the apoptosis-promoting effects of EGR1 in response to chemotherapy. Moreover, AKAP12 was listed in the “ERK1 and ERK2 cascade” category, and is described as a tumor suppressor known to suppress the ERK signaling pathway [25]. Collectively, HE4 may regulate these various pathways through a combination of feedback mechanisms or negative regulatory approaches, which in some cases may be more accurately represented in the complex tumor microenvironment than in ovarian cancer cell lines.

Survival outcomes related to top correlated DEGs

We then performed Kaplan–Meier analyses for OS for all of the top correlated DEGs. To keep WFDC2 and these genes directly comparable, we used the same parameters that produced the most prognostic results for WFDC2 in Fig. 1 (stage III, IV; grade 2, 3 disease, optimally debulked, top quartile cutoff). All genes with hazard ratios (HR) ≤ 0.75 or ≥ 1.5 (log-rank p < 0.01) are shown in Fig. 4A–N. Several genes had higher HRs than WFDC2, including SLPI (HR = 1.63 [1.23–2.16], p = 0.00056) and several NADH:ubiquinone oxidoreductase (NDU) family genes. Using average expression of all these NDU genes also demonstrated a significant HR of 1.8 [1.36–2.4] (p = 3.6e-5) (Fig. 4M). The most prognostic gene was reactive oxygen species modulator-1 (ROMO1), with an HR of 2.7 [1.6–4.55] (p = 0.00011) (Fig. 4D). These data suggest that the coordinate expression of specific sets of genes related to WFDC2 expression may indicate patient clinical outcomes in EOC.

Fig. 4

Survival outcomes related to top correlated DEGs. Kaplan–Meier curves were generated at http://KMplot.com for all available datasets (TCGA and GEO Series). Top DEGs that were most associated with overall survival (OS) are shown in A–L (hazard ratio (HR) > 1.5; p < 0.01). M All NADH:Ubiquinone (NDU) genes were combined and analyzed by Kaplan–Meier. N Summary of hazard ratios

WFDC2 levels influence immune cell infiltration

Using TIMER 2.0, we examined all immune deconvolution methods to determine immune cell populations significantly correlating with WFDC2 expression in TCGA. B cells (TIMER) and plasmacytoid dendritic cells (XCELL) were positively correlated with WFDC2 levels (Spearman r = 0.243, p = 1.02e-04 and Spearman r = 0.221, p = 4.41e-04, respectively).

Conversely, neutrophils (MCPCOUNTER) (Spearman r = −0.278, p = 8.45e-06) and endothelial cells (MCPCOUNTER, EPIC) (Spearman r = −0.309, p = 6.43e-07 and Spearman r = −0.206, p = 1.07e-03, respectively) were significantly negatively correlated with WFDC2 levels; however, the correlations were weak overall (Fig. 5A–F). The putative reduction of endothelial cells in WFDC2-high tumors could reflect hypoxia-induced increases in WFDC2 levels, as has been reported to occur during renal fibrosis and in gastric cancer [26, 27].

Fig. 5

WFDC2 levels influence immune cell infiltration. TIMER 2.0 was used to determine the relationship between immune cell infiltration and WFDC2 transcripts per million (TPM) in the TCGA-OV dataset. Purity correction was performed for all analyses (A). Significantly correlated immune subsets using the indicated algorithm (TIMER, XCELL, MCPCOUNTER, or EPIC) are shown in B–F.

Comparison of TCGA data with DepMap ovarian cancer cell line data

We next went on to compare DepMap Cancer Cell Line Encyclopedia (CCLE) cell line expression data at https://depmap.org/portal/ (Table S5). First, we analyzed SLPI and PI3 (elafin) correlation data for 24 HGSOC cell lines. We found these genes to be strongly and significantly correlated with WFDC2, confirming their relationship with WFDC2 (Fig. 6A, B). Next, we stratified the cell lines according to median WFDC2 transcripts per million (TPM) and then analyzed the gene expression of the top five low- and top five high-WFDC2 expressing cell lines using iDEP.94 k-means clustering [28]. PCA revealed a good clustering of low- versus high-WFDC2 cell lines (Fig. 6C). Next, we performed k-means clustering using the top 500 most variable genes (Fig. 6D, Table S6). Gene ontology analysis of the resulting clusters revealed that the genes associated with low-WFDC2 were enriched for categories including extracellular matrix, cell-substrate adhesion, and angiogenesis, which matched the enriched categories we found in our TCGA analysis (Fig. 6E, Table S7). When examining the genes associated with high-WFDC2, we found the enriched categories were involved in epidermis development, epithelial cell proliferation, and negative regulation of peptidase activity (Fig. 6F, Table S8). These results are not entirely unexpected, since HE4 is a protease inhibitor with a known role in promoting proliferation [3]; however, these categories differ from enriched categories in our TCGA analysis. It is possible these differences are related to the lack of tumor microenvironment interactions and hypoxia in the cell lines that are present in ovarian tumors. Overall, these results confirm the role of HE4 in regulating extracellular matrix functions, but suggest it may also have additional effects in an in vivo context versus in vitro.

Fig. 6

Comparison of TCGA data with DepMap ovarian cancer cell line data. A, B Pearson correlation of SLPI and PI3 (elafin) with WFDC2 TPMs in DepMap HGSOC cell lines. C Principal component analysis (PCA) of the top five low- and high-WFDC2 expressing HGSOC cell lines. D k-means clustering analysis (two clusters) of the top five low- and high-WFDC2 expressing cell lines, using the 500 most variable genes. E Gene ontology analysis was performed for k-means cluster B genes (genes associated with low-WFDC2 in DepMap HGSOC cell lines), revealing enrichment in categories related to extracellular matrix and adhesion. The number of genes in each category (“Count”) is indicated by circle size, while adjusted p-value (“p.adjust”) is indicated by color. F Gene ontology analysis was performed for k-means cluster A genes (genes associated with high-WFDC2 in DepMap HGSOC cell lines), revealing enrichment in categories related to epidermis development, proliferation, and peptidase activity. The number of genes in each category (“Count”) is indicated by circle size, while adjusted p-value (“p.adjust”) is indicated by color.

Discussion

In this analysis, we identified SLPI as the gene most prominently correlated with WFDC2. In differential gene expression analysis, it was also among the top five genes with significantly higher levels in HGSOC samples with high-WFDC2 levels. SLPI and HE4 are both WFDC proteins with reported anti-protease functions [11]. While HE4 has been extensively studied as a biomarker and for its tumor promoting functions, SLPI has received overall less attention. Our results confirmed that SLPI is a prognostic factor in ovarian cancer, with its transcript levels showing better prognostication than WFDC2. We also determined that SLPI is broadly correlated with WFDC2 across cancer types, suggesting that these two genes are frequently co-regulated. One unanswered question regarding the functions of SLPI and HE4 is whether they possess redundant anti-protease functions or if their functions are unique. Supporting their potential unique roles is one study showing their co-expression in similar tissues of the respiratory tract, but divergent subcellular localizations [10]. The question of these two proteins’ overlapping versus divergent functions will be important to answer if either protein were to be targeted therapeutically.

One surprising result from gene ontology analysis of high-WFDC2 DEGs was the enrichment of OXPHOS/mitochondrial metabolism categories. No previous studies in ovarian cancer cell lines or mouse models have identified a potential relationship between HE4 and metabolism. While previous dogma stated that tumors undergo a glycolytic switch whereby they move from relying on OXPHOS to glycolysis, more recent studies show that OXPHOS is an important metabolic pathway in several tumor types [29]. In ovarian cancer, OXPHOS is important in mediating resistance to chemotherapy [30,31,32]. Interestingly, one study also reported that ovarian cancer stem cells prioritize OXPHOS for their metabolism [33], suggesting high OXPHOS metabolism to be a potential defining characteristic of a subset of chemoresistant cancer stem cells. Single-cell sequencing of ovarian tumors identified epithelial and stromal cell subsets in ovarian tumors, with the epithelial subset being characterized by OXPHOS genes [34]. These studies point to the importance of OXPHOS as a key energy production pathway in ovarian cancer and our results raise the possibility of a connection between HE4/WFDC2 and hypoxia, fibrosis, and OXPHOS metabolism. However, the interplay between these various features remains to be elucidated.

Other enriched categories in high-WFDC2 DEGs were related to immune response, including “antimicrobial humoral response” and “neutrophil activation”. An early study on HE4 reported its potential role in innate immunity and epithelial host defense in the respiratory tract [10]. Recently, a correlation between innate immune response and serum HE4 was confirmed in COVID-19 patients [35]. We have also studied its role in regulating gene expression in immune cells, finding that it regulates several important immune genes, promotes secretion of Interleukin-8 (IL8/CXCL8) from peripheral blood mononuclear cells (PBMCs), and contributes to suppression of PBMC cytotoxicity against cancer cells [6, 36, 37]. Given that IL8/CXCL8, which was also a high-WFDC2 DEG in this current analysis, is a potent neutrophil chemotactic factor [38], it made sense that categories related to neutrophil activation and degranulation were enriched. However, somewhat contradictorily, we observed a weak association with reduced neutrophil infiltration according to the TIMER analysis. Further studies are required to query this TIMER result and elucidate the reasoning behind these contradictory findings.

Several interesting GO categories were enriched in the low-WFDC2 DEG group, with a heavy focus on categories related to extracellular matrix (ECM) organization. As previously mentioned, this enrichment of ECM categories in the low-WFDC2 group could result from negative feedback mechanisms, or through downregulation of genes that negatively regulate these processes. Notably, many collagen genes were downregulated, which could signal negative feedback mechanisms or an imbalance of various collagen types, leading to a fibrotic phenotype. The relative amounts and spatial location of various collagens may play a role in the development of fibrosis [39]. Furthermore, an interesting connection between OXPHOS categories and ECM categories exists, with studies reporting that metabolic functions and ECM regulation are intricately linked. Modulation of OXPHOS in osteosarcoma cells resulted in significant changes in ECM genes [40], and triple-negative breast cancer demonstrated increased OXPHOS with decreasing collagen density [41]. Metformin, a drug used to treat diabetes and a known inhibitor of fibrosis [42, 43] and OXPHOS [44], suppresses ovarian cancer growth and metastasis [45], further supporting the connection between these tumor features. Further studies are needed to determine if HE4-mediated changes in ECM regulate metabolic pathways in HGSOC.

“Regulation of response to growth factor stimulus” and the “ERK1 and ERK2 cascade” were two other enriched categories in the low-WFDC2 DEG group. It has been well-documented that HE4 promotes activation of the ERK pathway and other signaling pathways in ovarian cancer (reviewed in [3]). Therefore, a possible explanation for this GO finding can be found by examining the specific genes found in the ERK enrichment group, which include AKAP12 and elastin microfibril interfacer-1 (EMILIN1), both negative regulators of the ERK pathway [25, 46]. Finally, the “vascular development” category was also enriched in the low-WFDC2 DEG group, which highly overlapped with ECM genes. EGR1, which mediates apoptosis, was also listed in this category, which supports our previous study finding HE4-mediated suppression of EGR1 expression following cisplatin treatment [4]. It is unclear whether the presence of this category actually signals a reduction in vascular formation or angiogenesis, which would disagree with our previous study showing an angiogenesis-promoting effect of HE4 via regulation of STAT3 and IL8 in vitro [6]. However, the TIMER immune cell data do suggest reduced endothelial cell infiltration, although these computational findings will need to be confirmed experimentally.

The comparison of these TCGA analysis results with correlation data from high- and low-WFDC2 expressing HGSOC cell lines revealed some interesting findings. We observed an agreement in GO analysis of low-WFDC2 associated genes in the categories of extracellular matrix, cell-substrate adhesion, and angiogenesis, confirming an important role for HE4 in regulating these functions. However, the enriched categories among high-WFDC2 associated genes were notably different between TCGA and HGSOC cell lines, with TCGA data pointing to metabolic functions and the DepMap data pointing to peptidase regulation and epithelial cell proliferation. These data do not preclude the regulation of peptidase activity and proliferation in an in vivo context, but suggest there are tumor microenvironmental effects that lead to a strong association of HE4 with OXPHOS metabolism. Despite these differences, we again noted a strong correlation of WFDC2 with SLPI and PI3 (elafin) in the cell lines, confirming the association of these genes in ovarian cancer. In summary, we have identified genes differentially expressed between HGSOC with high- versus low-WFDC2 expression in human ovarian tumors and cell lines, revealing novel insights but also raising new questions regarding the pathogenic processes that may be regulated by the HE4 protein in the context of the tumor microenvironment in HGSOC. Future studies will further elaborate on the role of HE4 in extracellular matrix regulation and angiogenesis that we have already reported in vitro [5, 6], as well as studies on the metabolic effects of HE4 in an in vivo context.

Methods

cBioPortal

For all cBioPortal [47, 48] analyses, the TCGA ovarian cancer Firehose Legacy dataset was used. All gene correlations were performed using the “co-expression” feature. Protein and mRNA correlation and mRNA levels according to copy number alterations were performed using the “plots” feature. Mutation count analyses were performed using the “plots” feature and the “mutations count” feature selecting “median”. RNA Seq V2 RSEM or U133 microarray data were used where indicated.
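As an illustration of the statistic reported by the co-expression feature, the following is a minimal Python sketch of a Spearman rank correlation (Pearson correlation computed on average ranks). It is not the cBioPortal implementation; the function names `average_ranks` and `spearman_r` are hypothetical, and the sketch returns only r, not the associated p-value.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman r: Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Because only ranks are used, any monotonic relationship between two expression vectors yields r = 1 or r = −1 regardless of scale, which is why the statistic suits comparisons between RSEM mRNA values and protein abundance.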

DepMap Portal

Cell line expression data (CCLE Expression Public 21Q4) available in the DepMap portal [49] (https://depmap.org/portal/) were downloaded for k-means clustering analysis or analyzed using the “Data Explorer” feature in “Tools”. Data were analyzed for 24 HGSOC cell lines and correlation data were downloaded for SLPI and PI3 with WFDC2.

Differential gene expression analysis

The Cancer Genome Atlas (TCGA) ovarian cancer dataset with complete RNA-sequencing results in Fragments Per Kilobase of transcript per Million mapped reads (FPKM) (n = 378) was obtained using GenomicDataCommons (version 1.12.0) and RStudio (R version 4.0.0) [50, 51]. The R scripts used for plotting and to identify the high- and low-WFDC2 DEGs in this publication are available to the public: https://github.com/mg859337/WFDC2_TCGA_Analysis. The median FPKM of WFDC2 was calculated using the FPKM table from TCGA. FPKM values were used to split the samples into WFDC2-high and -low groups. Based on this, a metadata file for this dataset was manually created in Excel and saved as a CSV file. The gene count table from TCGA was used to create a PCA plot by variance-stabilizing transformation (vst) of the dds created using DESeq2 (v1.28.1) and plotted using ggplot2 (v3.3.3) in RStudio (Rv4.0.2) (deseq ref, ggplot ref). DESeq2 differential expression analysis was run with the design of “~ WFDC2”. Differentially expressed genes (DEGs) were defined as protein-coding genes with log2 fold-change of ≥ 0.5 or ≤ −0.5 that had a p-adjust value < 0.05. The volcano plot was created using ggplot2, dplyr (v1.0.2), and ggrepel (v0.8.2) with DEGs plotted in red.
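The grouping and filtering rules described above can be sketched in Python as follows. This is an illustrative sketch only, not the authors' R scripts (those are in the linked repository). The function names are hypothetical, and two details are assumptions: samples exactly at the median are assigned to the low group, and a positive log2 fold-change is taken to mean higher expression in the high-WFDC2 group.

```python
from statistics import median


def split_by_median(fpkm_by_sample):
    """Label each sample 'high' or 'low' relative to the median WFDC2 FPKM.

    Assumption: samples exactly at the median are labeled 'low'.
    """
    cutoff = median(fpkm_by_sample.values())
    return {
        sample: ("high" if value > cutoff else "low")
        for sample, value in fpkm_by_sample.items()
    }


def classify_deg(log2_fc, padj, is_protein_coding, lfc_cutoff=0.5, alpha=0.05):
    """Apply the DEG definition: protein-coding, padj < 0.05, |log2FC| >= 0.5.

    Returns 'high-WFDC2', 'low-WFDC2', or None if the gene is not a DEG.
    Assumption: positive log2_fc means higher expression in high-WFDC2 tumors.
    A padj of None (no adjusted p-value available) is treated as not significant.
    """
    if not is_protein_coding or padj is None or padj >= alpha:
        return None
    if abs(log2_fc) < lfc_cutoff:
        return None
    return "high-WFDC2" if log2_fc > 0 else "low-WFDC2"
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the DEG counts are to the chosen fold-change and significance cutoffs.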

iDEP.94 k-means clustering analysis

HGSOC cell line expression data (CCLE Expression Public 21Q4) available in the DepMap portal [49] (https://depmap.org/portal/) were downloaded. The cell lines were stratified according to median WFDC2 transcripts per million (TPM), and then the top five low-WFDC2 and top five high-WFDC2 expressing cell lines were determined. Data for these cell lines were uploaded into the iDEP.94 portal [28], and the data were pre-processed, log-transformed, and very low expressing genes were filtered (genes with 0 TPMs in half the samples). A PCA plot was generated, and k-means clustering analysis was performed for two clusters with the 500 most variable genes.
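The pre-processing steps that precede clustering can be approximated with the Python sketch below. It is not the iDEP.94 code: the function name `top_variable_genes` is hypothetical, the pseudocount of 1 in the log2 transform is an assumption, and "0 TPMs in half the samples" is interpreted here as zero in at least half of the samples.

```python
import math
from statistics import variance


def top_variable_genes(tpm, n_top=500, pseudocount=1.0):
    """Filter, log-transform, and keep the most variable genes.

    tpm: dict mapping gene name -> list of TPM values (one per cell line).
    Assumptions: genes with zero TPM in at least half of the samples are
    dropped, and values are transformed as log2(TPM + pseudocount).
    """
    kept = {}
    for gene, values in tpm.items():
        zeros = sum(1 for v in values if v == 0)
        if zeros * 2 >= len(values):
            continue  # very low expressing gene: filtered out
        kept[gene] = [math.log2(v + pseudocount) for v in values]
    # Rank genes by sample variance of the transformed values, highest first.
    ranked = sorted(kept, key=lambda g: variance(kept[g]), reverse=True)
    return {gene: kept[gene] for gene in ranked[:n_top]}
```

The resulting gene-by-sample matrix is what a two-cluster k-means step would then operate on; the clustering itself is left to the portal or a dedicated library.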

Gene ontology analysis

The R scripts used for gene ontology analysis are available to the public: https://github.com/mg859337/WFDC2_TCGA_Analysis. The DEGs identified using DESeq2 were divided into “high-WFDC2 DEGs” and “low-WFDC2 DEGs” based on their log2 fold-change. The gene lists for the DepMap gene ontology analysis were generated from k-means clustering data. The gene lists were saved into text files and used as input for clusterProfiler (v3.16.1). To use clusterProfiler, the packages org.Hs.eg.db (v3.11.4), DOSE (v3.14.0), and ggplot2 (v3.3.3) were also loaded into RStudio (Rv4.0.2) [52,53,54,55]. From the gene ontology results of clusterProfiler, dotplots were constructed for each of the DEG lists ordered by “GeneRatio”.
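For readers unfamiliar with the quantities shown in the dotplots, the Python sketch below illustrates a gene ratio and a hypergeometric over-representation p-value, the type of test commonly used for GO enrichment. It is a conceptual illustration with hypothetical function names, not the clusterProfiler implementation, and it omits the multiple-testing adjustment that yields “p.adjust”.

```python
from math import comb


def gene_ratio(k, n):
    """Fraction of the input gene list (n annotated genes) found in a category (k)."""
    return k / n


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value, P(X >= k).

    N: annotated genes in the background
    K: background genes belonging to the category
    n: annotated genes in the input list
    k: input-list genes belonging to the category
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For example, `enrichment_p_value(5, 5, 5, 10)` is the probability that all five genes drawn from a background of ten fall into a five-gene category, which equals 1/252.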

Kaplan–Meier curves

The ovarian cancer plotter at http://KMplot.com [56] was used to determine progression-free survival (PFS) and overall survival (OS) in TCGA and Gene Expression Omnibus (GEO) Series cohorts limited to stage III and IV, grade 2 and 3 samples. n = 472 (PFS) and n = 495 (OS) for all genes limited to optimally debulked samples except ROMO1, for which only 384–387 samples were available for analysis. n = 942 (PFS) and n = 975 (OS) for analysis of WFDC2 in all serous stage III and IV, grade 2 and 3 samples. Top quartile of expression was used to delineate low and high expressing groups for all analyses.
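The logic of a top-quartile split followed by a Kaplan–Meier estimate can be sketched in Python as below. This is a simplified illustration with hypothetical function names, not the KMplot.com implementation: the quartile is computed with the standard library's default method, which may differ from the plotter's, and hazard ratios and log-rank p-values are not computed here.

```python
from statistics import quantiles


def split_by_top_quartile(expression_by_sample):
    """Label samples above the third quartile as 'high', all others as 'low'."""
    q3 = quantiles(expression_by_sample.values(), n=4)[2]
    return {
        sample: ("high" if value > q3 else "low")
        for sample, value in expression_by_sample.items()
    }


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time for each patient
    events: 1 if the event (e.g., death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients leave the risk set
    return curve
```

Running `kaplan_meier` separately on the high and low groups gives the two curves that a Kaplan–Meier plot compares; censored patients lower the number at risk without producing a step in the curve.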

TIMER immune cell subset analysis

TIMER 2.0 analysis of immune cell subsets with purity adjustment was performed for WFDC2 in the TCGA-OV dataset (n = 303), using the website http://timer.comp-genomics.org/. WFDC2 and SLPI pan-cancer correlation analysis was also performed using the TIMER 2.0 “Gene_Corr” feature [57].