1 Introduction

With a poor prognosis, gastric cancer (GC) is the fourth most frequent malignancy worldwide and the fifth most common cause of cancer-related mortality [1]. There have been multiple studies investigating the connection between different risk factors and the development of GC, but the exact molecular network processes are yet to be fully understood [2]. Understanding the molecular mechanisms involved in the development of GC can help in finding potential targets for early detection and classification, which may improve patient survival rates.

Carcinogenesis is a complex process that entails numerous genetic and epigenetic alterations [3]. The most common type of GC is adenocarcinoma, accounting for about 95% of cases [4]. Numerous studies have been conducted on the epigenetic and genetic changes in various types of GC, with a particular focus on changes in oncogenes, tumor suppressor genes, DNA repair genes, and cell cycle regulators [5]. Investigating dysregulated genes in various pathways can help in understanding the molecular pathophysiological mechanisms of carcinogenesis and developing new therapeutic approaches. High-throughput techniques for gene and network analysis can help in detecting and classifying cancer, predicting patient response to treatment, and determining disease prognosis. These tools have become increasingly important in cancer research and could significantly contribute to the development of personalized cancer treatments [6, 7]. A more comprehensive understanding of diseases and personalized treatment approaches can be achieved through the use of systems biology techniques, which can help in the identification of gene signatures and the clarification of underlying mechanisms [8]. By utilizing these kinds of systems-level observations, it becomes feasible to generate computational models of biological pathways that govern the development and progression of tumors with greater accuracy [9, 10].

Several investigations have been carried out on the altered genes in GC [11,12,13,14]. However, these studies do not yet provide a complete understanding of the molecular causes of GC [15], and further studies are needed. The Gene Expression Omnibus (GEO) database is a valuable public repository of gene expression data that contains both array-based and sequence-based data. The experiments can be downloaded, and gene expression profiles can be curated [16]. The current study examines gene expression alterations in GC using bioinformatics tools, with a specific focus on probable key genes that may contribute to gastric carcinogenesis, to determine the crucial genes and the pathways through which they could affect the pathogenesis of GC.

2 Materials and methods

2.1 Dataset selection

The whole GEO database, which includes 848 series on human gastric cancer, was searched for datasets that fulfilled the following conditions: (1) data based on expression profiling by microarray, (2) projects on gastric cancer, (3) evaluation of both normal and cancerous tissues, (4) a minimum sample size of two, and (5) CELL as the supplementary filter. The GSE54129 dataset was selected; it contains a total of 132 specimens, comprising 111 gastric tumor tissues and 21 normal gastric tissues, surgically resected from GC patients.

2.2 Selection of differentially expressed genes (DEGs)

GEO2R was used to select the differentially expressed genes between gastric cancerous and normal tissues in the GSE54129 dataset. GEO2R is a web-based tool for analyzing GEO Series [16]. It employs the Bioconductor packages [17], limma in R [18], and GEOquery [19], and by default adjusts p-values with the Benjamini–Hochberg false-discovery rate method [20]. Genes with an adjusted p-value < 0.05 and an absolute log fold-change (|LogFC|) > 1.5, that is, a LogFC below −1.5 or above 1.5, were selected as differentially expressed. For further evaluation of the overexpressed genes, the filter was set to LogFC > 1.5.
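The filtering step can be illustrated with a minimal sketch, assuming a table of raw p-values and LogFC values per gene. The function names `benjamini_hochberg` and `select_degs` and the toy rows are hypothetical and are not GEO2R output; GEO2R itself delegates this work to limma, so the code below only mirrors the thresholds stated above.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(rows, p_cutoff=0.05, lfc_cutoff=1.5):
    """Split genes into up- and down-regulated DEGs.

    rows: list of (gene, raw_p_value, log_fold_change) tuples.
    A gene is kept when adjusted p < p_cutoff and |LogFC| > lfc_cutoff.
    """
    adjusted = benjamini_hochberg([row[1] for row in rows])
    result = {"up": [], "down": []}
    for (gene, _, lfc), adj_p in zip(rows, adjusted):
        if adj_p < p_cutoff and abs(lfc) > lfc_cutoff:
            result["up" if lfc > 0 else "down"].append(gene)
    return result


# Hypothetical toy rows (gene, raw p-value, LogFC); not taken from GSE54129.
toy_rows = [
    ("GENE_A", 0.001, 2.3),
    ("GENE_B", 0.004, -1.9),
    ("GENE_C", 0.020, 0.7),
    ("GENE_D", 0.600, 3.1),
]
degs = select_degs(toy_rows)
print(degs)
```

In this toy input, GENE_C passes the p-value filter but not the fold-change filter, and GENE_D has a large fold-change but is not significant, so neither is selected.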

2.3 Network reconstruction and key genes selection

STRING is a database of protein-protein interactions (PPIs). The interactions among the significantly overexpressed genes were evaluated in STRING v.11.5 (http://string-db.org), and the PPI list was prepared. The PPI list was then imported into Cytoscape (v.3.9.1), a platform for visualizing PPI networks. The CytoHubba plugin (v.0.1) was used to identify key proteins in the whole network [21]. Four topological methods (MNC, DMNC, MCC, and Degree) were applied to rank the importance of the nodes in the biological network, and the top four proteins were proposed as key nodes based on these methods. Finally, the subnetwork showing the interactions among the hub nodes was drawn using CytoHubba.
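To make the node-ranking idea concrete, the following is a minimal sketch of two of the four topological scores on a toy undirected network. It assumes the usual definitions: Degree is the number of neighbours, and MNC (maximum neighborhood component) is the size of the largest connected component in the subgraph induced by a node's neighbours. DMNC and MCC are omitted, the protein labels are hypothetical, and merging the per-method top lists by union is an assumption made for illustration rather than the procedure stated in the text.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def degree_score(graph, node):
    """Degree: number of direct interaction partners."""
    return len(graph[node])


def mnc_score(graph, node):
    """MNC: size of the largest connected component among the
    neighbours of `node`, using only edges between those neighbours."""
    neighbours = graph[node]
    seen = set()
    best = 0
    for start in neighbours:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            current = stack.pop()
            size += 1
            for nxt in graph[current] & neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


def top_k(graph, score, k):
    """Top-k nodes by descending score; ties broken alphabetically."""
    return sorted(graph, key=lambda n: (-score(graph, n), n))[:k]


# Hypothetical toy PPI edges; not taken from the STRING network of the study.
toy_edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"), ("P5", "P6"),
]
toy_graph = build_graph(toy_edges)
by_degree = top_k(toy_graph, degree_score, 2)
by_mnc = top_k(toy_graph, mnc_score, 2)
hub_candidates = sorted(set(by_degree) | set(by_mnc))
print(hub_candidates)
```

Combining several scores in this way reduces the dependence of the hub list on any single centrality definition, which is the motivation for applying more than one method.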

2.4 Enrichment analysis of the key genes

Gene Ontology (GO) enrichment analysis, covering biological process (BP), molecular function (MF), and cellular component (CC) terms, as well as KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis, was performed using STRING [22].
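As a rough illustration of what an enrichment p-value measures, the sketch below computes a one-sided hypergeometric over-representation test, a common choice for this kind of analysis. The text does not state the exact statistical procedure applied by STRING, so this should be read as an assumption; the function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, selected, overlap):
    """P(X >= overlap) for X ~ Hypergeometric.

    background: number of genes in the background set
    annotated:  background genes annotated with the term or pathway
    selected:   number of genes in the query list
    overlap:    query genes annotated with the term or pathway
    """
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy counts: 20 background genes, 5 in the pathway,
# 4 genes in the query list, 3 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
print(round(p_value, 4))
```

In practice, one such p-value is computed per GO term or KEGG pathway and the resulting set is corrected for multiple testing before terms are reported as enriched.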

2.5 Gene network clustering

CytoCluster (v.2.1.0), a Cytoscape plugin, was applied for gene network clustering. The IPCA (Identifying Protein Complex Algorithm) method was selected for cluster analysis of the subnetwork. The threshold was set to 0.8, and the genes in all clusters were imported into STRING to better understand their involvement in KEGG pathways.

2.6 In silico survival analysis

To identify the prognostic potential of the identified key genes, the Kaplan–Meier survival curves of GC patients with expression changes in these genes were extracted from GEPIA2 (Gene Expression Profiling Interactive Analysis) (http://gepia2.cancer-pku.cn).
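For readers unfamiliar with the estimator behind these curves, the following is a minimal sketch of the Kaplan–Meier product-limit calculation for two hypothetical patient groups. The follow-up times, group labels, and function name are invented for illustration and are not GEPIA2 data; how GEPIA2 splits patients into expression groups and computes its p-values is not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time of each patient
    events: True if the patient died at that time, False if censored
    Returns a list of (time, survival_probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months) for high- and low-expression groups.
high_curve = kaplan_meier([5, 8, 8, 12, 15], [True, True, False, True, False])
low_curve = kaplan_meier([10, 20, 25, 30], [True, False, True, False])
print(high_curve)
print(low_curve)
```

A lower curve for the high-expression group, as in this toy input, is the pattern that would be read as decreased overall survival; whether the difference is statistically significant requires a separate test such as the log-rank test, which is not implemented here.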

2.7 Drug target analysis

The five key genes whose overexpression was associated with decreased overall survival times of GC patients (p-value < 0.05) were searched in DrugBank (https://go.drugbank.com/) to identify approved drugs targeting these hub genes.

3 Results

3.1 Selection of differentially expressed genes (DEGs)

Two groups were compared in this study: gastric cancerous and normal tissues. The DEGs were selected using GEO2R (adjusted p-value < 0.05 and |LogFC| > 1.5). Figure 1 shows the results of the GEO2R analysis of the selected dataset, including the Volcano plot, Venn diagram, UMAP plot, and Boxplot of DEGs between human gastric cancerous and noncancerous tissues. A total of 2005 DEGs, including 1035 up-regulated and 970 down-regulated genes, were identified. To find drug targets, further analysis was performed on the up-regulated genes.

Fig. 1 Volcano plot, Venn diagram, UMAP plot, and Boxplot of DEGs between human gastric cancerous and noncancerous tissues. a Volcano plot of the DEGs identified in gastric cancer tissues; Red: up-regulated genes, Blue: down-regulated genes, Black: no difference. b Venn diagram of the DEGs identified in GC tissues showing 2733 identified DEGs with adjusted p-value < 0.05. c UMAP plot of the human gastric cancerous and noncancerous tissues showing a complete difference between these two groups. d Boxplot of the human gastric cancerous and noncancerous tissues

3.2 Network reconstruction and key genes selection

The up-regulated genes (LogFC > 1.5) were analyzed further with the STRING online tool and the Cytoscape platform. Key proteins in the biological network were identified with the CytoHubba plugin (v.0.1), using four topological methods (MNC, DMNC, MCC, and Degree) to determine the importance of the nodes. Based on these methods, the top four proteins were identified as key nodes, and the interactions among the hub nodes were visualized as a subnetwork using CytoHubba. Figure 2 shows the PPI network of the key genes. The ten key genes identified with the four topological methods are listed in Table 1.

Fig. 2 PPI network of the key genes. The overexpressed genes (LogFC > 1.5) in GC and their known neighbors

Table 1 Identified hub genes as the overexpressed genes in GC

3.3 Enrichment analysis of the key genes

GO analysis is one of the most practical and widely used methods for summarizing the biological aspects of high-throughput genome or transcriptome data. The enrichment analysis results are shown in Figs. 3, 4, 5, and 6.

Fig. 3 Gene Ontology enrichment analysis (biological process) of the determined subnetwork of hub genes in GC using STRING v.11.5 (http://string-db.org)

Fig. 4 Gene Ontology enrichment analysis (Cellular Component) of the determined subnetwork of hub genes in GC using STRING v.11.5 (http://string-db.org)

Fig. 5 Gene Ontology enrichment analysis (Molecular Function) of the hub genes subnetwork in GC using STRING

Fig. 6 KEGG pathways enrichment analysis of the hub genes of the subnetwork in GC using STRING v.11.5 (http://string-db.org)

The GO terms for BP that were observed in at least 50% of the genes (Fig. 3) include cellular and biological processes, response to stimuli, organismal and anatomical development, regulation of metabolic processes, and cell communication.

As shown in Fig. 4, the most common GO terms for CC included cellular anatomical entity, intracellular, organelle, membrane-bounded organelle, cytoplasm, extracellular region, cell periphery, plasma membrane, vesicle, and endomembrane system, among others.

As shown in Fig. 5, the predominant GO terms found for MF are significantly enriched in binding, ion binding, protein binding, cation binding, metal ion binding, molecular function regulator, anion binding, signaling receptor binding, carbohydrate derivative binding, identical protein binding, protein-containing complex binding, and enzyme binding.

According to KEGG pathway enrichment analysis (Fig. 6), the significantly enriched pathways include the PI3K-Akt signaling pathway, focal adhesion, pathways in cancer, and human papillomavirus infection.

3.4 Gene network clustering

To find the functional modules in the reconstructed subnetwork, cluster analysis was performed using the CytoCluster tool with the IPCA algorithm. Two hundred and seventeen clusters were found in the subnetwork. The clusters ranked 1 to 4 were selected for discussion in this paper (Fig. 7).

Fig. 7 PPI network of the genes in clusters rank 1 to 4 using CytoCluster (v.2.1.0)

As illustrated in Table 2, the PI3K-Akt signaling pathway, focal adhesion, protein digestion and absorption, proteoglycans in cancer, human papillomavirus infection, pathways in cancer, ECM-receptor interaction, AGE-RAGE signaling pathway in diabetic complications, Relaxin signaling pathway, and microRNAs in cancer were the common pathways among all ranks.

Table 2 Summary of the clusters (rank 1–4) resulting from the cluster analysis of the subnetwork of the overexpressed genes in GC plus their known neighbors using the CytoCluster App

3.5 In silico survival analysis

The Kaplan–Meier overall survival curves of patients with GC are shown in Fig. 8. These curves indicate that overexpression of CTGF, FN1, IL-6, THBS1, and WISP1 is associated with decreased overall survival times of GC patients (p-value < 0.05).

Fig. 8 The Kaplan–Meier overall survival curves of patients with GC based on differential expression of the ten hub genes identified in this study were extracted from the Gene Expression Profiling Interactive Analysis (GEPIA2) database. Kaplan–Meier estimates found that overexpression of (b) CTGF, (e) FN1, (f) IL-6, (h) THBS1, and (j) WISP1 was significantly associated with decreased overall survival times of GC patients (p-value < 0.05). The other panels (a) ACTB, (c) CXCL5, (d) ELN, (g) MMP2, and (i) TP53 show no statistically significant association with overall survival (p-value > 0.05)

3.6 Drug target analysis

The five hub genes (CTGF, FN1, IL-6, THBS1, and WISP1) significantly associated with decreased overall survival times of GC patients (p-value < 0.05) were searched in DrugBank (https://go.drugbank.com/) to find drugs that target them. Currently, the FDA has not approved any medications targeting CTGF, THBS1, or WISP1. However, a few approved drugs do target FN1 and IL-6, such as Ocriplasmin (targeting FN1), Foreskin fibroblast, Binimetinib, and Siltuximab (targeting IL-6). Among these approved drugs, only Siltuximab has a single target (IL-6), while the other drugs have multiple targets. A list of the drugs that target these genes is provided in Table 3.

Table 3 Approved drugs that target the five hub genes with significant association with decreased survival times in GC patients based on DrugBank data

4 Discussion

In recent years, multi-center genomics research, spanning gene-level to systems-level investigations and next-generation sequencing, has clarified various mechanisms specifically involved in tumor progression [23]. This study used bioinformatics analysis to further investigate the genes involved in GC pathogenesis. We focused on up-regulated genes in gastric cancer to identify potential targets for drug repurposing, aiming to utilize existing approved drugs for novel therapeutic strategies. This approach is motivated by the potential to significantly reduce the time and cost of drug development by exploiting the overexpression of genes in the tumor environment that are pivotal for cancer progression. By concentrating on overexpressed genes that play crucial roles in cell proliferation, survival, invasion, and metastasis, our goal was to find ways to disrupt key signaling pathways and biological processes essential for tumor growth. This strategy could also expedite the clinical application of our findings by drawing on the safety and efficacy data of already approved pharmacological inhibitors that target these overexpressed genes.

Ten hub genes were identified in this study, including FN1, CTGF, CXCL5, IL6, ELN, ADAMTS2, MMP2, TP53, WISP1, and THBS1. The relationship between the overexpression of these genes and GC is an area of active scientific investigation. These genes have been implicated in various molecular and cellular pathways relevant to cancer initiation, progression, and metastasis.

CXCL5 is a protein that influences immune responses and angiogenesis, playing a significant role in the progression and metastasis of GC, particularly by enhancing tumor spread and being associated with advanced disease stages [24,25,26]. The presence of immune cells and inflammation in the microenvironment can potentially contribute to the development of GC, promoting tumor growth and metastasis [27]. Up-regulation of ELN, which encodes the elastin protein, may affect tissue integrity, elasticity, and angiogenesis, potentially impacting the invasive behavior of GC cells [28]. ADAMTS2, involved in extracellular matrix remodeling, might influence tumor cell invasion and angiogenesis in GC [29]. MMP2, a matrix metalloproteinase, has implications in extracellular matrix degradation, which is essential for tumor invasion and metastasis in GC [30]. The preservation of genomic stability depends greatly on the TP53 gene, which functions as a tumor suppressor. Mutations or dysregulation of TP53 are commonly associated with GC, contributing to uncontrolled cell growth and reduced DNA repair [31]. In conclusion, the overexpression of these genes is interconnected with various cellular processes and molecular pathways relevant to GC. However, the specific roles of each gene in the context of GC initiation, progression, and metastasis are complex and multifaceted. Further comprehensive research is essential to elucidate the precise contributions of these genes, potentially leading to the development of targeted therapeutic strategies and the identification of biomarkers for GC management.

The results of our study showed that the PI3K-Akt signaling pathway is the most prominent pathway affected in GC. The PI3K-Akt pathway is commonly disrupted in GC, with multiple essential components showing alterations. Activation of this pathway leads to enhanced tumor growth, survival, metastasis, angiogenesis, and resistance to treatments in GC [32]. The primary genomic changes causing abnormal activation of the PI3K-Akt pathway involve PIK3CA amplifications/mutations, loss of PTEN, and the overexpression of receptor tyrosine kinases (such as EGFR, HER2, and cMET), which activate downstream signaling pathways [33]. Clinical investigations have linked active PI3K-Akt signaling to aggressive disease, advanced tumor stages, metastasis, and unfavorable prognoses in patients with GC. Experimental research shows that blocking the PI3K-Akt signaling pathway can reduce the proliferation of gastric cancer cells, trigger apoptosis, and increase their sensitivity to chemotherapy and radiotherapy [34]. Preliminary results from early-stage trials of PI3K/Akt/mTOR inhibitors, either alone or combined, have demonstrated potential efficacy against advanced GC [35]. Noteworthy challenges include treatment-related toxicity, acquired resistance, and identifying the subset of patients who respond positively to PI3K pathway inhibition, which emphasizes the need to explore predictive biomarkers.

This study showed that overexpression of the CTGF, FN1, IL-6, THBS1, and WISP1 genes correlates with decreased survival in individuals diagnosed with GC. Further investigations are warranted to elucidate their potential as therapeutic targets. CTGF, a growth factor involved in fibrosis and tissue remodeling, plays a role in the development of GC by affecting the tumor microenvironment, angiogenesis, and cell proliferation. Its overexpression in GC enhances tumor growth, invasion, and chemotherapy resistance, largely through activating the PI3K-Akt signaling pathway, a process driven by CTGF’s direct binding to PI3K’s p85 subunit [36, 37]. FN1, an essential component of the extracellular matrix, is linked to enhanced tumor cell adhesion, migration, and invasion, thus increasing the metastatic capability of cancer cells, including in GC [38]. This is supported by bioinformatics analyses by Sun et al., who identified FN1 as a crucial gene in GC pathogenesis [39], in line with our findings. The upregulation of FN1 activates FAK/Src signaling in GC cells, which in turn interacts with and intensifies PI3K-Akt signaling, further promoting cellular migration and invasion processes [40]. In addition, IL-6 overexpression in GC triggers the GP130/JAK/STAT3 cascade, leading to the activation of PI3K-Akt signaling, which confers advantages in cell proliferation and survival [41]. As a pro-inflammatory cytokine, IL-6 can activate various signaling pathways that contribute to cell survival, proliferation, and immune modulation in GC, potentially playing a significant role in tumor progression [42]. THBS1, an adhesive glycoprotein, is involved in angiogenesis and cell–matrix interactions, and its overexpression may contribute to tumor angiogenesis and metastasis in GC [43]. Its upregulation in GC leads to PI3K-Akt signaling activation through integrin-mediated pathways, fostering increased tumor growth and resistance to therapy [44]. WISP1, which regulates the Wnt signaling pathway, may promote cell growth and migration in GC, which could contribute to tumor progression [45]. WISP1 partially activates the PI3K-Akt/mTOR axis, consequently intensifying proliferation, invasion, and angiogenesis [46]. Importantly, a reciprocal relationship emerges, wherein PI3K-Akt signaling can upregulate the expression of certain genes among this group, forming positive feedback loops that reinforce cancer-related phenotypes.

The overexpression of these genes and activation of the PI3K-Akt pathway are interconnected in promoting gastric tumorigenesis and progression through their abilities to upregulate each other and mediate downstream oncogenic effects. Targeting this signaling connection may have therapeutic potential in GC. The results of our investigation into these genes using DrugBank indicate that there are currently no drugs approved by the FDA targeting CTGF, THBS1, or WISP1. However, a few approved drugs do target FN1 and IL-6, such as Ocriplasmin (targeting FN1), Foreskin fibroblast, Binimetinib, and Siltuximab (targeting IL-6). These drugs are suggested for further investigation as potential new therapeutic options for GC patients. Among these approved drugs, only Siltuximab has a single target (IL-6), indicating specific action and potentially fewer side effects. Therefore, Siltuximab could be considered a suitable candidate for GC patients.

The exploration of pharmacotherapeutic agents targeting hub genes holds promise for personalized treatment strategies. While limited FDA-approved drugs targeting FN1 and IL6 were identified, the lack of approved agents for CTGF, THBS1, and WISP1 underscores an unmet therapeutic need. The multitarget nature of the drugs introduced in this study necessitates in-depth evaluation to assess potential off-target effects. Our results support the need for further research to investigate the underlying mechanisms of the key hub genes and their related pathways identified in our study. The integration of multi-omics data, coupled with experimental validation, could shed light on their roles in GC initiation, progression, and therapeutic response. Developing targeted therapies addressing these hub genes could potentially revolutionize treatment strategies for GC patients. Despite the comprehensive nature of our study, limitations such as the reliance on bioinformatics predictions and the lack of experimental validation merit consideration. Future investigations should encompass clinical validation to confirm the roles of these hub genes in GC pathogenesis and treatment response.

We understand that gastric cancer exhibits spatiotemporal heterogeneity, which makes genetic analysis challenging due to its diverse pathological types. However, our study aimed to identify common molecular pathways and hub genes that are overexpressed across a wide range of gastric cancer samples. By focusing on these commonalities, we aimed to provide insights that can be applied to multiple subtypes of the disease, potentially guiding more personalized therapeutic approaches in the future. Despite the limitations imposed by this heterogeneity, we believe that our findings contribute to a foundational understanding of gastric cancer's molecular biology. They can serve as a valuable resource for further research aimed at unraveling the disease's complex genetic landscape and informing the development of targeted treatments.

5 Conclusion

The study employed bioinformatics analysis to identify ten hub genes (FN1, CTGF, CXCL5, IL6, ELN, ADAMTS2, MMP2, TP53, WISP1, and THBS1) in GC pathogenesis, expanding the understanding of their overexpression and its implications across molecular and cellular pathways critical for cancer initiation, progression, and metastasis. The study indicated the central involvement of the PI3K-Akt pathway, which is frequently disrupted in GC with pronounced alterations in essential components. Survival analysis revealed significant correlations between CTGF, FN1, IL-6, THBS1, and WISP1 overexpression and reduced overall survival times in GC patients. A mutual interplay emerged, in which PI3K-Akt signaling could upregulate certain genes, forming feedback loops and intensifying cancer phenotypes. The interconnected overexpression of these genes and activation of the PI3K-Akt pathway fosters gastric tumorigenesis, suggesting therapeutic potential. DrugBank analysis identified only a limited number of FDA-approved drugs targeting these hub genes, which calls for further exploration, as targeting these genes could reshape GC treatment. The study acknowledges limitations such as its reliance on bioinformatics and calls for additional clinical validation. We understand the complexity of gastric cancer's heterogeneity, but our study was focused on identifying common molecular pathways and hub genes across various samples. This approach aimed to offer insights applicable to multiple subtypes, potentially aiding the development of personalized treatments. Despite the challenges of heterogeneity, we believe our findings provide a valuable foundation for further research into the molecular biology of gastric cancer and the advancement of targeted therapies.

6 Limitations

Our study on identifying prognostic and therapeutic targets in gastric cancer, while offering valuable insights, is subject to limitations including reliance on a single dataset, which may not fully represent the genetic diversity of gastric cancer worldwide. The inherent heterogeneity of gastric cancer, with its varied pathological subtypes, poses additional challenges to the generalizability of our findings. Furthermore, the bioinformatics approach necessitates experimental validation to confirm the therapeutic potential of the identified genes and pathways. Thus, while our findings contribute to the foundational understanding of gastric cancer’s molecular landscape, they should be interpreted with caution, serving as a basis for future, more detailed investigations.