Identification of novel biomarkers MUC5AC, MUC1, KRT7, GAPDH and CD44 for gastric cancer

Gastric cancer (GC) is one of the most common malignant tumors worldwide and the third leading cause of cancer-related death. To our knowledge, no biomarker has been widely accepted for the early diagnosis or prognosis prediction of GC. The purpose of this study was to identify potential biomarkers for predicting the prognosis of GC. The gene expression profiles of GSE2685 were downloaded from the GEO database, and Morpheus was used to identify differentially expressed genes (DEGs) between primary advanced gastric cancer tissues and noncancerous gastric tissues. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed, and a protein–protein interaction (PPI) network of the DEGs was constructed. Kaplan–Meier Plotter was used to evaluate the association of MUC5AC, MUC1, KRT7, GAPDH and CD44 expression with overall survival (OS), and GEPIA was used for Pearson correlation analysis. In total, 710 DEGs were identified in GC, including 396 upregulated and 314 downregulated genes. GO enrichment showed that the DEGs were mainly enriched in binding, catalytic activity, cellular process and cell. KEGG pathway analysis showed that they were mainly enriched in metabolic pathways, pathways in cancer and the PI3K-Akt signaling pathway. MUC5AC, MUC1, KRT7, GAPDH and CD44 were identified from the PPI network and were demonstrated to have prognostic value for patients with GC. MUC5AC and MUC1 showed low expression levels in GC tissues, whereas KRT7, GAPDH and CD44 showed high expression levels. In particular, KRT7 was barely expressed in normal gastric tissues. MUC5AC and MUC1 were negatively correlated with GAPDH and CD44, respectively, and GAPDH was positively correlated with both CD44 and KRT7. Moreover, MUC5AC, MUC1, KRT7, GAPDH and CD44 are related not only to GC but also to the apoptosis pathway.
Results from the present study suggested that MUC5AC, MUC1, KRT7, GAPDH, CD44 may represent novel prognostic biomarkers for GC.


Introduction
Gastric cancer (GC) is one of the most common causes of tumor-related death worldwide, and it is the second most common malignant tumor in China [1,2]. The high incidence of GC is partly attributed to the widespread use of endoscopy. At present, the sensitivity and specificity of carcinoembryonic antigen (CEA) in clinical application are limited, which contributes to the unsatisfactory rate of early GC diagnosis [3][4][5][6]. Although progress has been made in the diagnosis and treatment of GC [7], the prognosis remains poor, and the 5-year survival rate of patients with GC is less than 20%. In the absence of regional lymph node involvement, the survival rate of patients with GC is high [8]. Unfortunately, GC is difficult to diagnose at an early stage. Therefore, there is considerable interest in identifying prognostic markers for these potentially curable patients [9][10][11][12][13].
High-throughput gene expression platforms such as microarrays have attracted increasing attention for applications ranging from molecular diagnosis and tumor molecular classification to patient stratification, prognosis prediction, discovery of new drug targets and prediction of tumor response, and they are considered promising tools in medical oncology [14][15][16]. Over the past decade, microarray technology has been used to study the expression profiles of many genes in gastric carcinogenesis, and hundreds of DEGs involved in different pathways, biological processes and molecular functions have been identified [17,18]. However, comparisons of DEGs across independent studies show relatively limited overlap, and there is still no reliable biomarker profile that distinguishes cancer from normal tissue. Gene chip technology combined with bioinformatics analysis now makes it possible to comprehensively analyze changes in mRNA expression during the occurrence and development of GC. Hippo et al. used laser capture microdissection to collect tissue samples and detected DEGs between GC tissue and normal tissue [19]. However, the interactions among DEGs, especially the pathways within the interaction network, remain to be clarified.
In this study, we downloaded raw data (GSE2685) from GEO, a public repository for storing and retrieving microarray data, and compared gene expression profiles between GC and normal tissues. DEGs were screened using Morpheus, and gene ontology (GO) and pathway enrichment analyses were then carried out. Analyzing the biological functions and pathways of these DEGs can deepen our understanding of the occurrence and development of GC at the molecular level and help identify potential biomarkers for diagnosis and prognosis as well as potential drug targets.

Microarray data
The gene expression profiles of GSE2685 were downloaded from the GEO database. GSE2685, which is based on the GPL80 platform (Affymetrix Human Full Length HuGeneFL Array Hu6800), was submitted by Hippo et al.

Identification of DEGs
Morpheus was applied to identify differentially expressed genes (DEGs) between normal gastric tissues and GC tissues. An adjusted P < 0.01 and |log fold change (FC)| > 1.5 were set as cut-off values. A total of 710 DEGs were identified, including 396 up-regulated and 314 down-regulated genes.
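The cut-off step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Morpheus's implementation: the section does not say which statistical test Morpheus ran, how P values were adjusted or which logarithm base the fold change uses, so the sketch assumes Benjamini-Hochberg adjustment and log2 fold changes (under that assumption, |log2 FC| > 1.5 corresponds to a fold change above about 2.83). The helper names `benjamini_hochberg` and `select_degs` and the gene values are hypothetical.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    # Indices sorted by ascending raw P value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank, 1.0)
        adjusted[i] = running_min
    return adjusted


def select_degs(
    genes: Dict[str, Tuple[float, float]],  # hypothetical: gene -> (log2 FC, raw P)
    padj_cutoff: float = 0.01,
    lfc_cutoff: float = 1.5,
) -> Tuple[List[str], List[str]]:
    """Split genes passing both cut-offs into up- and down-regulated lists."""
    names = list(genes)
    adjusted = benjamini_hochberg([genes[g][1] for g in names])
    up: List[str] = []
    down: List[str] = []
    for name, padj in zip(names, adjusted):
        lfc = genes[name][0]
        if padj < padj_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(name)
    return up, down
```

For example, with the hypothetical input `{"GENE_A": (2.3, 1e-4), "GENE_B": (-1.8, 4e-4), "GENE_C": (0.9, 1e-5), "GENE_D": (3.0, 0.2)}`, only GENE_A (up) and GENE_B (down) pass both cut-offs: GENE_C fails the fold-change threshold and GENE_D fails the adjusted-P threshold.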

Gene ontology and pathway enrichment analysis of DEGs
Gene Ontology (GO) analysis is a widely used method for annotating genes and gene products and for identifying characteristic biological attributes in high-throughput genomic or transcriptomic data. KEGG is a knowledge base for the systematic analysis of gene functions that links genomic information with higher-order functional information. To analyze the DEGs at the functional level, GO enrichment and KEGG pathway analyses were performed using the DAVID online tool. P < 0.05 was considered statistically significant.
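The over-representation test behind GO and KEGG enrichment can be sketched as a one-sided hypergeometric (Fisher exact) tail probability. DAVID reports a modified Fisher exact P value, the EASE score, which, as we understand it, subtracts one gene from the list-term overlap to make the test more conservative. The sketch below shows both variants; the function names are hypothetical and DAVID's exact choice of background population is not reproduced.

```python
from math import comb


def hypergeom_upper_tail(k: int, list_size: int, term_size: int, population: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, term_size, list_size).

    k: DEGs in the list annotated to the term
    list_size: number of DEGs in the list
    term_size: background genes annotated to the term
    population: background genes in total
    """
    total = comb(population, list_size)
    upper = min(list_size, term_size)
    hits = sum(
        comb(term_size, x) * comb(population - term_size, list_size - x)
        for x in range(k, upper + 1)
    )
    return hits / total


def ease_score(k: int, list_size: int, term_size: int, population: int) -> float:
    """EASE-style score: the same tail probability after removing one hit."""
    return hypergeom_upper_tail(max(k - 1, 0), list_size, term_size, population)
```

For instance, with a hypothetical background of 10 genes, 4 of them annotated to a term, and a list of 3 genes of which 2 carry the term, the plain tail probability is 40/120 = 1/3 and the EASE-style score is 100/120 = 5/6.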

Integration of protein-protein interaction (PPI) network analysis
The Search Tool for the Retrieval of Interacting Genes (STRING) is an online database for evaluating protein-protein interaction (PPI) information. To evaluate the interactions among DEGs, we mapped the DEGs to STRING, and only experimentally validated interactions with a combined score > 0.4 were considered significant. PPI networks were then constructed using Cytoscape. P < 0.05 was considered statistically significant.
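Filtering STRING edges by combined score and ranking nodes by connectivity can be sketched as follows. The section does not state how the five genes were selected from the PPI network, so ranking by degree is an assumption here, as is the 0-1 scale for combined scores (some STRING downloads use a 0-1000 scale instead). The function `rank_hubs` and the toy edges are hypothetical; in the study itself this step was done in Cytoscape.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_hubs(
    edges: Iterable[Tuple[str, str, float]],
    min_score: float = 0.4,
    top_n: int = 5,
) -> List[Tuple[str, int]]:
    """Return the top_n nodes by degree, keeping only edges with score > min_score."""
    degree = Counter()
    seen = set()
    for a, b, score in edges:
        # Drop weak edges and self-loops.
        if score <= min_score or a == b:
            continue
        pair = frozenset((a, b))  # undirected: A-B and B-A are the same edge
        if pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    # Ties keep first-seen order (Counter.most_common is stable for equal counts).
    return degree.most_common(top_n)
```

With the toy edge list `[("A", "B", 0.9), ("B", "A", 0.9), ("A", "C", 0.7), ("A", "D", 0.5), ("C", "D", 0.3)]`, node A ends with degree 3: the duplicate B-A edge is counted once and the C-D edge (score 0.3) is dropped.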

Expression levels, correlation and survival analysis
The prognostic value of the mRNA expression of the candidate genes was evaluated using Kaplan-Meier Plotter, which contains gene expression data and survival information for 1440 patients with GC. To analyze overall survival (OS), patient samples were split into two groups by median expression (high vs. low) and compared using Kaplan-Meier survival plots, with the hazard ratio (HR), its 95% confidence interval (CI) and the log-rank P value reported. Genes associated with OS were selected for further analysis, including Pearson correlation analysis and comparison of expression levels between tumor and normal tissues using GEPIA.
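The median split and log-rank comparison can be sketched in plain Python. This is not Kaplan-Meier Plotter's code: the sketch assumes that samples exactly at the median go to the low-expression group, computes only the log-rank statistic and its P value (not the hazard ratio), and uses hypothetical function names and data.

```python
from math import erfc, sqrt
from statistics import median
from typing import List, Tuple


def median_split(expression: List[float]) -> List[int]:
    """Label samples 1 (high) if above the median, else 0 (low)."""
    cut = median(expression)
    return [1 if x > cut else 0 for x in expression]


def log_rank(times: List[float], events: List[int], group: List[int]) -> Tuple[float, float]:
    """Two-group log-rank test; events[i] is 1 for death, 0 for censored.

    Returns (chi-square statistic, P value) with 1 degree of freedom.
    """
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(group[i] for i in at_risk)
        deaths = [i for i in at_risk if times[i] == t and events[i]]
        d = len(deaths)
        d1 = sum(group[i] for i in deaths)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of group-1 deaths at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    # Survival function of chi-square(1): P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))
```

For example, `median_split([1.0, 5.0, 3.0, 7.0])` returns `[0, 1, 0, 1]`; labels like these, together with follow-up times and death indicators, are what `log_rank` compares.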

Identification of DEGs
In total, 22 GC tissue samples and 8 normal gastric tissue samples were analyzed in this study. First, Morpheus was used to identify DEGs with the following cut-off values: adjusted P < 0.01 and |log FC| > 1.5. As a result, 710 DEGs were identified, including 396 up-regulated and 314 down-regulated genes.

GO term enrichment analysis
We uploaded all DEGs to the online tool DAVID to identify overrepresented GO categories and KEGG pathways. For biological process (BP), the up-regulated DEGs were significantly enriched in cellular process, metabolic process, biological regulation and localization, whereas the down-regulated DEGs were significantly enriched in cellular process, biological regulation, metabolic process and response to stimulus (Table 1).
For molecular function (MF), the up-regulated DEGs were enriched in binding, catalytic activity and transcription regulator activity, whereas the down-regulated DEGs were significantly enriched in catalytic activity, binding and molecular transducer activity (Table 1). For cellular component (CC), the up-regulated DEGs were significantly enriched in cell, organelle, protein-containing complex and extracellular region, whereas the down-regulated DEGs were significantly enriched in cell, organelle and membrane (Table 1). Table 2 lists the most significantly enriched KEGG pathways for the up-regulated and down-regulated DEGs. The up-regulated DEGs were enriched in pathways in cancer, metabolic pathways, the PI3K-Akt signaling pathway, human papillomavirus infection and focal adhesion, while the down-regulated DEGs were enriched in metabolic pathways, neuroactive ligand-receptor interaction, pathways in cancer, the calcium signaling pathway, the PI3K-Akt signaling pathway and cytokine-cytokine receptor interaction.

Protein-protein interaction (PPI) network analysis
This network includes known interactions from curated databases and experimental determination; predicted interactions based on gene neighborhood, gene fusion and gene co-occurrence; and interactions supported by text mining, co-expression and protein homology. Based on the information in the STRING database, MUC5AC, MUC1, KRT7, GAPDH and CD44 were identified from the PPI network (Fig. 1).

Survival curves, expression levels and correlation analysis
MUC5AC, MUC1, KRT7, GAPDH and CD44 were demonstrated to have prognostic value for patients with GC: all five genes were significantly associated with overall survival (log-rank P = 1.9e-5, 0.018, 8.1e-6, 1.1e-10 and 0.011, respectively) (Fig. 2a-e). For each of the five genes, low expression was associated with better survival. These five genes were then subjected to further analysis, and their expression levels are displayed in Fig. 3a-e. MUC5AC and MUC1 showed low expression levels in GC tissues, whereas KRT7, GAPDH and CD44 showed high expression levels. In particular, KRT7 was barely expressed in normal gastric tissues.
Furthermore, Pearson correlation analyses between the genes are presented in Fig. 4a-
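The Pearson correlation coefficient used for the gene-gene comparisons can be sketched as follows. GEPIA computes this internally; whether it transforms expression values first is not stated in this section, so the sketch simply correlates whatever values it is given, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squares; the 1/(n-1) factors cancel in the ratio.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

A positive value (as reported for GAPDH and CD44) means the two genes tend to rise together across samples; a negative value (as reported for GAPDH and MUC5AC) means one tends to fall as the other rises.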

Discussion
In this study, we investigated the potential prognostic association between GC and the DEGs in GSE2685. The results showed 710 DEGs between 8 normal gastric tissues and 22 GC tissues, of which 396 were up-regulated and 314 were down-regulated. MUC5AC, MUC1, KRT7, GAPDH and CD44 have potential prognostic value for patients with GC. Moreover, these five genes are related not only to GC but also to the apoptosis pathway.
The gastric mucosal barrier protects the gastric mucosa from hydrochloric acid and various harmful substances. MUC5AC is a gel-forming mucin known to be the main component of the gastric mucus layer [20][21][22][23][24][25][26]. MUC5AC is a well-known marker of gastric differentiation, is considered an important prognostic indicator of GC and is often used in clinical evaluation [27][28][29]. The expression of MUC5AC in the stomach decreases with the development of intestinal metaplasia, and MUC5AC expression is related to tumor stage: its expression level is lower in advanced GC than in early GC [30]. This is consistent with our finding that MUC5AC expression is higher in normal tissues than in GC tissues.
Keratin 7 (KRT7) is an intermediate filament protein that is mainly expressed in epithelia and epithelial tumors. In GC, KRT7 has been identified as the target of the long noncoding antisense RNA KRT7-AS and has been shown to be involved in the progression of gastric cancer [31]. KRT7 can promote the proliferation, migration and invasion of GC cells and reduce their sensitivity to chemotherapy [32]. Interestingly, we found that KRT7 was barely expressed in normal tissues but strongly expressed in GC tissues, and its methylation level was high. We speculate that methylation may play an important role in regulating KRT7 expression.
Stromal cells have been found to secrete glyceraldehyde 3-phosphate dehydrogenase (GAPDH). Extracellular GAPDH or its N-terminal domain inhibits the growth of gastric cancer cells, and this effect has been confirmed in other cell systems [33][34][35][36][37][38][39][40]. These authors proposed that using GAPDH to negatively regulate tumor growth may be a new anti-cancer strategy [41][42][43][44][45]. Yamaji et al. reported that GAPDH is secreted by some cancer cells and can inhibit cell proliferation [46]. However, the inhibitory activity of GAPDH on the growth of cancer cells had not previously been reported, which is attributed to the sensitivity of cancer cells to GAPDH. They found that the N-terminal domain of GAPDH is necessary for its growth-inhibitory activity. Interestingly, this activity does not require the catalytic domain responsible for the enzyme's original activity. Unexpectedly, the N-terminal peptide of GAPDH has recently been reported to have antifungal activity against Candida albicans through internalization. However, immunofluorescence with an anti-FLAG antibody under cell-permeabilized conditions showed that GAPDH was not incorporated into MKN-7 cells, so its growth-inhibitory mechanism is thought to differ from this internalization-based mechanism. Our results showed that GAPDH expression was higher in GC than in normal tissues and was negatively correlated with both MUC5AC and MUC1.
In human GC, the mechanisms responsible for maintaining malignant stem cells in the tumor microenvironment are largely unknown [47][48][49][50]. Among the stem cell populations in the stomach, the cells that may be targeted and transformed into tumor-initiating cells during chronic Helicobacter pylori infection are those marked by the cell-surface receptor cluster of differentiation 44 (CD44) [51]. Unlike the standard CD44 isoform, CD44 variants (CD44v) are considered key molecules in malignant transformation, and their expression is highly restricted and specific [52,53]. In general, CD44v is considered a marker of GC cells [54,55] that helps increase resistance to chemotherapy- or radiation-induced cell death [56][57][58][59][60][61][62][63][64][65]. Our results indicated that CD44 expression was higher in GC than in normal tissues and was positively correlated with GAPDH.
In conclusion, 710 DEGs were identified between gastric cancer and normal gastric tissues in this study. These genes were mainly enriched in binding, catalytic activity, cellular process and cell, as well as in metabolic pathways, pathways in cancer and the PI3K-Akt signaling pathway. MUC5AC, MUC1, KRT7, GAPDH and CD44 are related not only to gastric cancer but also to the apoptosis pathway, suggesting that they may be potential prognostic biomarkers of gastric cancer. In addition, KRT7, GAPDH and CD44 may play an oncogenic role in gastric cancer, whereas MUC5AC and MUC1 may play a tumor-suppressive role. Further molecular biology experiments are needed to confirm the functions of the identified genes in gastric cancer, especially in metastasis and cancer progression, and to guide clinical application.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.