Construction of co-expression modules of OC
After removal of missing gene-expression values from the raw data, quantile normalization, and filtering with the WGCNA package, datasets containing 2562 lncRNAs and the top 5000 mRNAs were selected for WGCNA analysis. The expression values of the 2562 lncRNAs in 352 OC samples (Supplementary Table 1) and of the 5000 mRNAs in 359 OC samples (Supplementary Table 2) were used to construct co-expression modules with the WGCNA package. The clinical characteristics of the 370 eligible patients (the 352 and 359 OC samples combined, with duplicates removed) are summarized in Supplementary Table 3. Samples were clustered with the FlashClust tool using the average linkage method and Pearson's correlation. Sample clustering identified outliers in the lncRNA and mRNA data, respectively, and the red line marks the cutoff used to filter them (Fig. 1A). After outlier removal, all remaining samples fell within the clusters for both the lncRNA and the mRNA data (Fig. 1B). Sample dendrograms and trait heatmaps were plotted from the lncRNA (or mRNA) expression data and the corresponding clinical data (Fig. 1B); this approach assigned every sample to a cluster and displayed the distribution of the clinical data. The soft-thresholding power β is the most critical parameter, because it largely determines the average connectivity and the independence of each co-expression module. β was therefore selected separately for the lncRNA and mRNA groups. In the lncRNA group, β = 3 gave a scale-free fit R² of 0.88 while retaining a relatively high average connectivity; in the mRNA group, β = 4 gave an R² of 0.91 (Fig. 2A and B). These β values were then used to identify distinct gene co-expression modules in OC. All selected genes were hierarchically clustered on the basis of the adjacency matrix, and the resulting co-expression modules are shown in the cluster dendrogram (Fig. 2C). The modules ranged from small to large according to the number of genes they contained.
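The choice of β above follows WGCNA's scale-free topology criterion. As a rough illustration, the plain-Python sketch below computes, for each candidate β, the unsigned adjacency |cor|^β, each gene's connectivity, the signed scale-free fit index (−sign(slope) × R² of log10 p(k) against log10 k), and the mean connectivity. It is a simplified re-implementation, not the WGCNA package's `pickSoftThreshold` code: the 10 equal-width bins, the skipping of empty bins, the assumption of no constant genes, and all helper names are assumptions, while the 0.85 cutoff mirrors WGCNA's default R² threshold.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def scale_free_r2(k, n_breaks=10):
    """Signed fit index: -sign(slope) * R^2 of log10 p(k) regressed on log10 k."""
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_breaks or 1.0
    bins = [[] for _ in range(n_breaks)]
    for v in k:
        bins[min(int((v - lo) / width), n_breaks - 1)].append(v)
    filled = [b for b in bins if b]  # skip empty bins to avoid log10(0)
    if len(filled) < 3:
        return float("nan")
    xs = [math.log10(sum(b) / len(b)) for b in filled]  # mean connectivity per bin
    ys = [math.log10(len(b) / len(k)) for b in filled]  # frequency p(k) per bin
    r = pearson(xs, ys)
    return -r * abs(r)  # the regression slope has the same sign as r


def soft_threshold_table(genes, powers):
    """(beta, signed R^2, mean connectivity) for the unsigned network |cor|**beta.

    genes: list of genes, each a list of expression values over the same samples.
    """
    g = len(genes)
    cor = [[pearson(genes[i], genes[j]) for j in range(g)] for i in range(g)]
    table = []
    for beta in powers:
        k = [sum(abs(cor[i][j]) ** beta for j in range(g) if j != i) for i in range(g)]
        table.append((beta, scale_free_r2(k), sum(k) / g))
    return table


def pick_power(table, r2_cut=0.85):
    """Lowest beta whose signed R^2 reaches r2_cut, or None if none does."""
    for beta, r2, _ in table:
        if r2 >= r2_cut:
            return beta
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    toy = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(30)]  # random toy data
    for row in soft_threshold_table(toy, [1, 2, 3, 4, 5]):
        print(row)
```

Raising β can only lower each |cor|^β, so mean connectivity falls as β grows; the criterion trades that loss against a better scale-free fit, which is why the lowest β that clears the R² cutoff is usually taken.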
Interactions between co-expression modules were then analyzed. A heatmap was plotted to depict topological overlap, with each row and column representing a gene. Low topological overlap is shown in light color, and progressively higher topological overlap in progressively darker red; modules appear as darker squares. The network heatmap of all genes and their module assignments is shown in Fig. 2D. Module eigengenes were used to summarize the modules and were hierarchically clustered, so that dendrogram branches group together positively correlated eigengenes. In the eigengene heatmap, each row and column corresponds to one color-labeled module eigengene; low adjacency (negative correlation) is shown in blue and high adjacency (positive correlation) in red. The red squares along the diagonal were defined as meta-modules (Fig. 2E).
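Topological overlap, as plotted in the heatmap above, measures how strongly two genes share network neighbors. A minimal sketch of the standard unsigned topological overlap measure, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij) with l_ij = Σ_u a_iu a_uj and k_i = Σ_u a_iu, is given below. It assumes a symmetric adjacency matrix with a zero diagonal and sets the TOM diagonal to 1; function names are illustrative. In WGCNA-style analyses, 1 − TOM is typically used as the dissimilarity for hierarchical clustering.

```python
def topological_overlap(adj):
    """Unsigned TOM from a symmetric adjacency matrix whose diagonal is zero."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * n for _ in range(n)]  # diagonal stays 1
    for i in range(n):
        for j in range(i + 1, n):
            # Shared-neighbor term; u == i or u == j contributes 0 because the diagonal is 0.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n))
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


def tom_dissimilarity(tom):
    """1 - TOM, a distance suitable for hierarchical clustering of genes."""
    return [[1.0 - v for v in row] for row in tom]
```

For example, two genes that are directly connected and also share their only other neighbor get a TOM of 1, whereas an isolated gene has a TOM of 0 with every other gene.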
Gene co-expression modules corresponding to clinical traits
Associations were analyzed between the eigengene of each co-expression module and clinical traits from the TCGA database, including age at initial pathologic diagnosis, Karnofsky performance score, lymphatic invasion, histologic grade, cancer status, clinical stage, tissue source site, tumor residual disease, and vascular invasion (Fig. 3A). Heatmaps of the correlations between clinical traits and module eigengenes in ovarian cancer were constructed, showing r and p values (Fig. 3A). In the module–trait heatmap for lncRNAs, the green module was significantly associated with OC Karnofsky performance score, indicating that the lncRNAs in this module are closely related to patients' activities of daily life (independent, semi-independent, or dependent) after treatment. The blue and turquoise modules were significantly associated with OC tissue source site, indicating heterogeneity of gene expression; that is, expression differed among tissues, and even among tissues from the same organ. In the module–trait heatmap for mRNAs, the black module was significantly associated with OC Karnofsky performance score, indicating that the mRNAs in this module are closely related to patients' activities of daily life (independent, semi-independent, or dependent) after treatment. The blue, green, and purple modules were significantly associated with OC lymphatic invasion, histologic grade, and vascular invasion, respectively, suggesting that the mRNAs in these modules may be closely related to OC metastasis.
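A module eigengene is commonly defined as the first principal component of the standardized expression of a module's genes, and the module–trait r is its Pearson correlation with a trait. The sketch below assumes each gene is a list of expression values over the same samples. It uses power iteration on the sample-by-sample Gram matrix in place of the singular value decomposition used by the WGCNA package's `moduleEigengenes`, aligns the sign with the module's average profile, and correlates the result with a numeric trait. Helper names are hypothetical, and categorical traits such as tissue source site would need a numeric encoding that is not specified here.

```python
import math


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]


def pearson(x, y):
    """Pearson r as the scaled dot product of standardized vectors."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def module_eigengene(module_genes, iterations=200):
    """First principal component (per-sample scores) of standardized module expression.

    module_genes: list of genes, each a list of expression values over the same samples.
    """
    z = [standardize(g) for g in module_genes]  # genes x samples
    n = len(z[0])
    # Samples x samples Gram matrix; its leading eigenvector is the first left
    # singular vector of the samples x genes matrix.
    gram = [[sum(g[s] * g[t] for g in z) for t in range(n)] for s in range(n)]
    avg = [sum(g[s] for g in z) / len(z) for s in range(n)]
    u = avg if any(avg) else [1.0] * n  # start from the average profile
    for _ in range(iterations):
        w = [sum(gram[s][t] * u[t] for t in range(n)) for s in range(n)]
        norm = math.sqrt(sum(v * v for v in w))
        u = [v / norm for v in w]
    # Align the sign so the eigengene tracks the module's average expression.
    if sum(a * b for a, b in zip(u, avg)) < 0:
        u = [-v for v in u]
    return u


def module_trait_r(module_genes, trait):
    """Pearson correlation between a module eigengene and a numeric clinical trait."""
    return pearson(module_eigengene(module_genes), trait)
```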
Multiple co-expression modules were associated with tissue source site in the module–trait relationships for mRNAs, including the green, yellow, red, turquoise, purple, and brown modules (Fig. 3A), indicating substantial heterogeneity among ovarian cancers from different source sites. The brown module in the lncRNA group and the yellow module in the mRNA group (Fig. 3A) were chosen as key modules for further study on the basis of their correlation coefficients (r) and p values, because each was associated with multiple clinical traits. For lncRNAs, the brown module, which contained 168 lncRNAs (Fig. 3A; Supplementary Table 4), was significantly associated with OC clinical traits, including age at initial pathologic diagnosis (r = −0.17, p = 2.0E−03), Karnofsky performance score (r = −0.18, p = 5E−04), clinical stage (r = 0.14, p = 8.0E−08), tissue source site (r = 0.11, p = 4.0E−02), and vascular invasion (r = 0.25, p = 1.0E−06). For mRNAs, the yellow module, which contained 318 mRNAs (Fig. 3A; Supplementary Table 5), was significantly associated with OC clinical traits, including age at initial pathologic diagnosis (r = 0.17, p = 1.0E−03), lymphatic invasion (r = −0.25, p = 2E−06), tumor residual disease (r = −0.14, p = 8.0E−03), and vascular invasion (r = −0.25, p = 2E−06). Furthermore, scatterplots of gene significance (GS, y-axis) against module membership (MM, x-axis) were constructed for the lncRNA-based brown module and the mRNA-based yellow module, respectively.
In these plots, genes with higher MM also tended to have higher GS, suggesting that hub genes in the brown and yellow co-expression modules were also highly associated with the selected clinical characteristics. MM in the lncRNA-based brown module was significantly correlated with GS for age at initial pathologic diagnosis (r = −0.15, p = 3.5E−02), lymphatic invasion (r = 0.36, p = 1.9E−07), tissue source site (r = 0.38, p = 3.4E−08), and vascular invasion (r = 0.28, p = 6.5E−05) (Fig. 3B), and MM in the mRNA-based yellow module was significantly correlated with GS for age at initial pathologic diagnosis (r = 0.24, p = 1.5E−05), lymphatic invasion (r = 0.29, p = 1.4E−07), tumor residual disease (r = 0.17, p = 2.4E−03), and vascular invasion (r = 0.23, p = 3.5E−05) (Fig. 3B).
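Under the usual WGCNA-style definitions, each gene's module membership is MM = cor(gene, module eigengene) and its gene significance is GS = |cor(gene, trait)|; the r reported for each scatterplot is then the correlation between |MM| and GS across the module's genes. The following minimal sketch assumes the eigengene and trait are numeric lists aligned to the same samples; function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mm_gs(module_genes, eigengene, trait):
    """Per-gene (MM, GS) with MM = cor(gene, eigengene) and GS = |cor(gene, trait)|."""
    return [(pearson(g, eigengene), abs(pearson(g, trait))) for g in module_genes]


def mm_gs_correlation(module_genes, eigengene, trait):
    """Correlation between |MM| and GS across the module's genes (the scatterplot r)."""
    pairs = mm_gs(module_genes, eigengene, trait)
    return pearson([abs(mm) for mm, _ in pairs], [gs for _, gs in pairs])
```

A strongly positive value means that genes most central to the module (high MM) are also the ones most associated with the trait, which is the pattern used above to argue that hub genes carry the clinical signal.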
Functional enrichment analysis of mRNAs in an mRNA-based co-expression module
KEGG pathway analysis identified ten statistically significant pathways enriched among the mRNAs in the mRNA-based yellow co-expression module (Supplementary Table 6): Hippo signaling pathway, basal cell carcinoma, melanogenesis, Wnt signaling pathway, pathways in cancer, proteoglycans in cancer, aldosterone synthesis and secretion, gap junction, ovarian steroidogenesis, and signaling pathways regulating pluripotency of stem cells, suggesting that OC cells may depend on multiple signaling pathways. For example, organ growth depends on a series of cellular processes, including cell proliferation, cell division, and programmed cell death. The Hippo signaling pathway inhibits cell proliferation and induces apoptosis, and it is becoming increasingly important in the study of uncontrolled cell division in cancer. The notably enriched mRNAs in the Hippo signaling pathway included WNT5A, DLG4, LEF1, TEAD2, PARD6G, FZD2, BMP7, WNT6, TCF7L1, FZD7, and BMP6. The Wnt signaling pathway plays important roles in many diseases. Alterations in the mRNA profile of CTNNB1, which encodes β-catenin, have been found in melanoma and in breast, colorectal, lung, prostate, and other cancers. One study found that Wnt ligand proteins (Wnt1, Wnt2, and Wnt7A) were significantly upregulated in esophageal cancer, glioblastoma, and OC. Other altered proteins included SFRP4, ROR1, ROR2, WIF1, Wnt5A, and the TCF/LEF family. The notably enriched mRNAs in the Wnt signaling pathway included WNT5A, GPC4, PLCB4, LEF1, FZD2, BAMBI, WNT6, TCF7L1, and FZD7. The hormone hypothesis of OC holds that hormones, including androgens, gonadotropins, insulin-like growth factor I, progesterone, estrogens, and insulin, are OC risk factors, and androgens have been associated with an increased risk of cancers of ovarian origin. The notably enriched mRNAs in the ovarian steroidogenesis pathway included CYP17A1, CYP11A1, STAR, and BMP6.
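The section does not state which test underlies the KEGG enrichment. A common choice, used for example by over-representation tools such as clusterProfiler, is the one-sided hypergeometric test sketched below. The background size N and pathway size K are inputs the reader must supply; nothing here reproduces Supplementary Table 6, and the function name is illustrative.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) for pathway over-representation.

    N: genes in the background universe
    K: background genes annotated to the pathway
    n: genes in the query list (for example, one co-expression module)
    k: query genes annotated to the pathway
    """
    # Sum the probabilities of observing k or more pathway genes in a random list of size n.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)
```

Exact integer arithmetic via `math.comb` avoids underflow in the individual terms; only the final ratio is converted to a float.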
GO enrichment analysis of the mRNAs in the mRNA-based yellow co-expression module was performed for cellular component (CC) (Fig. 4A; Supplementary Table 7), molecular function (MF) (Fig. 4B; Supplementary Table 8), and biological process (BP) (Fig. 4C; Supplementary Table 9) terms. For CC, the mRNAs were mainly enriched in postsynapse, neuron projection, somatodendritic compartment, axon part, Golgi lumen, endocytic vesicle membrane, dendritic shaft, plasma membrane protein complex, membrane microdomain, perinuclear region of cytoplasm, sarcoplasmic reticulum, and proteinaceous extracellular matrix. For MF, the mRNAs were mainly enriched in Wnt-protein binding, adrenergic receptor binding, frizzled binding, transforming growth factor beta receptor binding, fibroblast growth factor binding, potassium channel activity, calcium-ion binding, PDZ-domain binding, copper-ion binding, S100 protein binding, protein serine/threonine kinase inhibitor activity, scaffold protein binding, chemoattractant activity, heparan sulfate proteoglycan binding, and cysteine-type endopeptidase regulator activity involved in apoptotic process. For BP, the mRNAs were classified into ten groups involving major BPs, including urogenital system development, mesoderm formation, mesenchyme development, cardiac muscle tissue development, endocrine system development, kidney morphogenesis, embryonic organ development, epithelial tube morphogenesis, morphogenesis of a branching epithelium, gland morphogenesis, and neuroepithelial cell differentiation.
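GO term lists like these are usually reported after multiple-testing correction, although the correction used here is not stated. Assuming the Benjamini–Hochberg false discovery rate procedure, a common default, adjusted p-values can be computed from the raw per-term p-values as in this sketch (the function name is illustrative).

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        # p * m / rank, made monotone by carrying the minimum downward (capped at 1).
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For instance, raw p-values of 0.01, 0.04, 0.03, and 0.20 become 0.04, about 0.053, about 0.053, and 0.20 after adjustment.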
Hub genes and survival-associated genes
Intramodular connectivity was calculated as the sum of a gene's connection strengths with the other genes in its module, divided by the maximum intramodular connectivity. Genes with high intramodular connectivity, defined as an MCODE score > 6 and p < 0.05, were regarded as intramodular hub genes. A total of 21 hub mRNAs were identified among the 318 mRNAs in the mRNA-based yellow co-expression module: FBN3, EFS, MSI1, TCF7L1, FXYD6, ZNF423, SULT1C4, SBK1, TRO, SMO, SALL2, TUBB2B, PLCG1, LRP4, KIAA1549, PHC1, RHOBTB1, DNMT3A, TMEFF1, LAMA1, and C10orf82.
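The connectivity normalization described above can be sketched as follows, assuming a symmetric weighted adjacency matrix and a module given as a list of row indices; the function name is illustrative. The MCODE scoring used for the final hub cutoff comes from a separate tool and is not reproduced here.

```python
def scaled_intramodular_connectivity(adj, module):
    """kWithin / max(kWithin) for the genes in `module` (row indices into adj)."""
    # Sum of connection strengths to the other genes of the same module.
    k_within = {i: sum(adj[i][j] for j in module if j != i) for i in module}
    k_max = max(k_within.values())
    if k_max == 0:
        return {i: 0.0 for i in module}  # module with no internal connections
    return {i: k / k_max for i, k in k_within.items()}
```

Scaling by the module maximum makes values comparable across modules of different sizes: the best-connected gene in each module scores 1.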
Kaplan–Meier (K-M) analysis revealed that 11 of the 21 hub mRNAs in the mRNA-based yellow co-expression module were significantly associated with OC overall survival (p < 0.05): FBN3 (HR = 1.48, p = 4.9E−04), EFS (HR = 1.27, p = 3.1E−04), TCF7L1 (HR = 1.18, p = 3.3E−02), SBK1 (HR = 1.26, p = 3.5E−02), TRO (HR = 1.19, p = 1.5E−02), TUBB2B (HR = 1.26, p = 6.2E−04), PLCG1 (HR = 1.15, p = 3.4E−02), KIAA1549 (HR = 1.22, p = 2.9E−03), DNMT3A (HR = 1.33, p = 7.0E−03), LAMA1 (HR = 1.48, p = 1.6E−04), and C10orf82 (HR = 1.36, p = 3.4E−03) (Fig. 5). K-M analysis also revealed that 16 of the 168 lncRNAs in the lncRNA-based brown co-expression module were significantly associated with OC overall survival (p < 0.05): ACTA2-AS1 (HR = 1.38, p = 2.1E−03), CARD8-AS1 (HR = 1.31, p = 9.3E−03), HCP5 (HR = 0.81, p = 4.0E−03), HHIP-AS1 (HR = 1.39, p = 1.4E−03), HOTAIRM1 (HR = 1.33, p = 7.0E−03), ITGB2-AS1 (HR = 0.64, p = 9.0E−05), LINC00324 (HR = 0.75, p = 2.2E−02), LINC00605 (HR = 1.32, p = 8.3E−03), LINC01503 (HR = 1.36, p = 5.8E−03), LINC01547 (HR = 1.28, p = 1.9E−03), MIR31HG (HR = 1.39, p = 2.5E−03), MIR155HG (HR = 0.78, p = 1.5E−02), OTUD6B-AS1 (HR = 1.3, p = 1.1E−02), PSMG3-AS1 (HR = 0.78, p = 2.1E−02), SH3PXD2A-AS1 (HR = 0.78, p = 2.4E−02), and ZBED5-AS1 (HR = 0.79, p = 2.3E−02) (Fig. 5).
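The survival curves behind K-M plots are computed with the product-limit (Kaplan–Meier) estimator. The minimal sketch below assumes follow-up times and event indicators (1 = death, 0 = censored) for one patient group, for example the high- or low-expression group of a single gene. How patients were split and how the hazard ratios were estimated (typically by Cox regression) are not stated here and are not reproduced; the function name is illustrative.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    Returns (time, S(time)) at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all subjects sharing this time; censored subjects still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve
```

With times [1, 2, 2, 3, 4] and events [1, 1, 0, 1, 0], survival drops to 0.8, 0.6, and 0.3 at times 1, 2, and 3; the censored subjects leave the risk set without lowering the curve.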
Moreover, co-expression between these lncRNAs and mRNAs was analyzed in RStudio (Fig. 6A), yielding their correlation coefficients (Supplementary Table 10) and p values (Supplementary Table 11). Several highly correlated (|correlation coefficient| ≥ 0.4, p < 0.05) mRNA–lncRNA, mRNA–mRNA, and lncRNA–lncRNA pairs were identified, including EFS–HHIP-AS1, HHIP-AS1–TCF7L1, RHOBTB1–HHIP-AS1, ACTA2-AS1–HHIP-AS1, CARD8-AS1–HCP5, LINC00324–CARD8-AS1, ITGB2-AS1–LINC01547, LRP4–TCF7L1, SALL2–TRO, DNMT3A–PLCG1, and SMO–KIAA1549. These highly correlated hub mRNAs and survival-associated lncRNAs merit further study to clarify their spatiotemporal dynamics.
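Pairwise screening with the stated thresholds (|r| ≥ 0.4, p < 0.05) can be sketched as below. The section does not say how p values were computed in RStudio; to stay within the Python standard library, this sketch swaps in Fisher's z normal approximation for the exact Student t test, so its p values will differ slightly for small sample sizes. The input format (a dict of name to expression values over shared samples) and the function names are assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_p(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via Fisher's z transform."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def correlated_pairs(profiles, r_cut=0.4, p_cut=0.05):
    """All pairs with |r| >= r_cut and p < p_cut, as (name_a, name_b, r, p) tuples.

    profiles: dict mapping an RNA name to its expression values over the same samples.
    """
    hits = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        p = fisher_p(r, len(profiles[a]))
        if abs(r) >= r_cut and p < p_cut:
            hits.append((a, b, r, p))
    return hits
```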
In addition, a survival risk score system was constructed from the 21 identified hub mRNAs and 16 survival-associated lncRNAs using the multivariate regression module of SPSS 20 software. The resulting regression equation was statistically significant (Fig. 6B; p < 0.05) and was used to calculate the survival risk score: survival risk score = (−0.115 × expression level of OTUD6B-AS1) + (−0.129 × expression level of PSMG3-AS1) + (0.18 × expression level of ZBED5-AS1) + (0.223 × expression level of SBK1) + (−0.219 × expression level of PLCG1). In this system, a higher score indicates a longer survival time, that is, a lower mortality risk for OC patients.
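The reported equation translates directly into a weighted sum. The sketch below hard-codes the five published coefficients; it assumes expression levels are supplied on the same scale used to fit the SPSS model (not specified here) and, like the equation, includes no intercept term. The function name is illustrative.

```python
# Coefficients as reported for the SPSS multivariate regression (Fig. 6B).
COEFFICIENTS = {
    "OTUD6B-AS1": -0.115,
    "PSMG3-AS1": -0.129,
    "ZBED5-AS1": 0.18,
    "SBK1": 0.223,
    "PLCG1": -0.219,
}


def survival_risk_score(expression):
    """Weighted sum of expression levels; `expression` maps RNA name -> level."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing expression values for: {sorted(missing)}")
    return sum(coef * expression[name] for name, coef in COEFFICIENTS.items())
```

For example, a sample with every expression level equal to 1 scores about −0.06; only the five RNAs retained in the equation contribute, even though the model was built from a larger candidate set.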
Network analysis and RT-qPCR confirmed the identified molecules
Network analysis of lncRNA–RNA-binding protein–mRNA interactions was used to determine whether lncRNAs regulate hub mRNAs through RNA-binding proteins. This network comprised 8 lncRNAs (ACTA2-AS1, HCP5, HOTAIRM1, ITGB2-AS1, LINC00324, MIR155HG, MIR31HG, and PSMG3-AS1), 17 RNA-binding proteins (HuR, eIF4AIII, FUS, U2AF65, PTB, FMRP, LIN28A, UPF1, IGF2BP1, DGCR8, CAPRIN1, SFRS1, TIAL1, hnRNPC, LIN28B, LIN28, and TDP43), and 20 hub mRNAs (MSI1, PLCG1, SALL2, TUBB2B, DNMT3A, FBN3, KIAA1549, LAMA1, LRP4, SBK1, SMO, SULT1C4, TMEFF1, PHC1, RHOBTB1, TCF7L1, TRO, ZNF423, EFS, and FXYD6) (Fig. 7A). Similarly, ceRNA network analysis was used to determine whether lncRNAs regulate hub mRNAs through miRNAs. The ceRNA network comprised 4 lncRNAs (HOTAIRM1, HCP5, PSMG3-AS1, and MIR155HG), 35 miRNAs (miR-106a-5p, miR-106b-5p, miR-128-3p, miR-139-5p, miR-140-5p, miR-144-3p, miR-17-5p, miR-186-5p, miR-203a, miR-20a-5p, miR-20b-5p, miR-214-3p, miR-216a-5p, miR-27a-3p, miR-27b-3p, miR-299-3p, miR-29a-3p, miR-29b-3p, miR-29c-3p, miR-328-3p, miR-519d-3p, miR-93-5p, miR-103a-3p, miR-107, miR-129-5p, miR-137, miR-148a-3p, miR-148b-3p, miR-152-3p, miR-155-5p, miR-194-5p, miR-490-3p, miR-495-3p, miR-143-3p, and miR-210-3p), and 15 hub mRNAs (KIAA1549, TCF7L1, TUBB2B, LAMA1, RHOBTB1, TMEFF1, PHC1, PLCG1, SBK1, LRP4, MSI1, DNMT3A, SALL2, SMO, and ZNF423) (Fig. 7B).
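Both networks described above link an lncRNA to a hub mRNA through a shared intermediary (an RNA-binding protein or a miRNA). The sketch below enumerates such triplets from two interaction maps and counts the distinct nodes of each type. The interaction sources are not specified here, the toy names in the demo are hypothetical placeholders rather than the data behind Fig. 7, and the sketch omits additional filters (such as expression correlation) that ceRNA pipelines often apply.

```python
def bridged_triplets(lnc_to_bridge, bridge_to_mrna, hub_mrnas):
    """(lncRNA, bridge, mRNA) triplets in which one bridge molecule links both RNAs.

    lnc_to_bridge:  dict lncRNA -> set of bridge molecules (RBPs or miRNAs) it binds
    bridge_to_mrna: dict bridge molecule -> set of mRNAs it binds or targets
    hub_mrnas:      set of hub mRNAs to keep
    """
    triplets = []
    for lnc in sorted(lnc_to_bridge):
        for bridge in sorted(lnc_to_bridge[lnc]):
            for mrna in sorted(bridge_to_mrna.get(bridge, set()) & hub_mrnas):
                triplets.append((lnc, bridge, mrna))
    return triplets


def node_counts(triplets):
    """Number of distinct lncRNAs, bridge molecules, and mRNAs in the network."""
    if not triplets:
        return 0, 0, 0
    lncs, bridges, mrnas = zip(*triplets)
    return len(set(lncs)), len(set(bridges)), len(set(mrnas))


if __name__ == "__main__":
    # Hypothetical toy interactions, not real predictions.
    lnc_map = {"lncA": {"miR-x", "miR-y"}, "lncB": {"miR-z"}}
    mir_map = {"miR-x": {"geneP", "geneQ"}, "miR-y": {"geneR"}}
    net = bridged_triplets(lnc_map, mir_map, {"geneP", "geneR"})
    print(net, node_counts(net))
```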
Furthermore, qRT-PCR was used to validate the expression of the OC survival-associated lncRNAs and hub mRNAs identified by WGCNA, including 16 lncRNAs (ITGB2-AS1, OTUD6B-AS1, PSMG3-AS1, LINC00324, LINC01503, HOTAIRM1, LINC01547, SH3PXD2A-AS1, HCP5, MIR31HG, MIR155HG, ZBED5-AS1, LINC00605, ACTA2-AS1, CARD8-AS1, and HHIP-AS1) and 11 hub mRNAs (LAMA1, KIAA1549, TCF7L1, DNMT3A, EFS, SBK1, PLCG1, C10orf82, TUBB2B, TRO, and FBN3), in three cultured OC cell lines and one control cell line (Fig. 8). Four of these lncRNAs (LINC00605, ACTA2-AS1, CARD8-AS1, and HHIP-AS1) were expressed at levels too low to be quantified by qRT-PCR. No significant difference between the OC cell lines (SK-OV3, TOV-21G, and A2780) and the control cell line IOSE80 was found for three lncRNAs (PSMG3-AS1, LINC01547, and ZBED5-AS1) (p > 0.05), whereas significant differences between OC cells and control cells were found for nine survival-associated lncRNAs (ITGB2-AS1, OTUD6B-AS1, LINC00324, LINC01503, HOTAIRM1, SH3PXD2A-AS1, HCP5, MIR31HG, and MIR155HG) (Fig. 8A) and nine survival-associated hub mRNAs (LAMA1, KIAA1549, TCF7L1, DNMT3A, EFS, SBK1, PLCG1, C10orf82, and TUBB2B) (Fig. 8B).
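The section does not state how relative expression was quantified. Assuming the widely used 2^−ΔΔCt method, with an unspecified reference gene for normalization and the control cell line (IOSE80) as calibrator, fold changes can be computed as in this sketch; the function name and inputs are illustrative.

```python
from statistics import mean


def fold_change_ddct(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """Relative expression by the 2^-ΔΔCt method.

    ΔCt  = mean Ct(target) - mean Ct(reference gene), per cell line
    ΔΔCt = ΔCt(OC cell line) - ΔCt(calibrator, e.g. the control cell line)
    Each argument is a list of replicate Ct values.
    """
    d_sample = mean(target_ct) - mean(reference_ct)
    d_calib = mean(calib_target_ct) - mean(calib_reference_ct)
    return 2.0 ** -(d_sample - d_calib)
```

A target whose ΔCt is 2 cycles lower in an OC line than in the calibrator yields a fold change of 4, that is, higher expression in the OC line; the method assumes near-100% amplification efficiency for both target and reference.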