Abstract
Background
A detailed understanding of the pre-neoplastic, early neoplastic and late neoplastic states of gastric cancer helps in developing better models of the risk of progression to gastric cancer (GC) and medical treatments to intercept such progression.
Methods
We built a Boolean implication network of gastric cancer and deployed machine learning algorithms to develop predictive models of known pre-neoplastic states (e.g., atrophic gastritis, intestinal metaplasia (IM) and low- to high-grade intestinal neoplasia (L/HGIN)) and of GC. Our approach exploits asymmetric Boolean implication relationships that are likely to be invariant across almost all gastric cancer datasets. Such invariant asymmetric relationships can reveal the fundamental time-series underlying the biological data. Using this approach, we developed a model of the healthy mucosa → GC continuum.
Results
Our model outperformed publicly available models in distinguishing healthy from GC samples. Although it was not trained on IM or L/HGIN datasets, the model identified the risk of progression to GC via the metaplasia → dysplasia → neoplasia cascade in patient samples. The model also ranked all publicly available mouse models by how well they recapitulate the gene expression patterns of human GC initiation and progression.
Conclusions
A Boolean implication network enabled the identification of hitherto undefined continuum states during GC initiation. The developed model could now serve as a starting point for rationalizing candidate therapeutic targets to intercept GC progression.
Introduction
Gastric cancer (GC) often presents as advanced disease: the tumor is frequently inoperable, and even when it is not, surgery is the only potentially curative treatment [1]. There is evidence that 75% of all GCs are initiated by Helicobacter pylori, a known carcinogenic pathogen [2, 3]. Other risk factors include age, sex, smoking and family history [4]. H. pylori-driven oncogenesis follows Correa’s cascade, a stepwise progression from normal mucosa through chronic active gastritis, atrophic gastritis, intestinal metaplasia and dysplasia to adenocarcinoma [3]. Intestinal metaplasia has two subtypes, incomplete and complete intestinal metaplasia (IIM and CIM, respectively); IIM carries a higher probability of progressing to GC than CIM [5].
Research into GC has used impactful approaches to investigate the genome [6], therapeutics [7] and survival [8], but these efforts have not translated into actionable prognostic biomarkers, therapeutic targets, novel therapeutics, or changes in screening strategies. Nor have these genomic insights revealed which genes are important in the progression to GC, knowledge that is needed for pre-neoplastic detection and treatment.
Here, we present a network-based approach to biomarker and target discovery that uses artificial intelligence (AI) to select genes and then rigorously validates them in multiple independent GC datasets. We have previously used this approach to identify biomarkers in inflammatory bowel disease (IBD) [9], COVID-19 [10] and macrophages [11]. We demonstrate how Boolean implications allow us to build models that provide insight into the gastric cancer disease continuum.
Methods
Detailed methods for computational modeling and AI-guided target identification are presented in Online Resource 1 and mentioned in brief here.
Construction of a network of Boolean implications
We modeled continuum states within the metaplasia → dysplasia → neoplasia cascade using the Boolean Network Explorer (BoNE) [9], a computational tool that builds an asymmetric gene expression network with a method based on Boolean logic [12]. To build the network for the progression from normal tissue to gastric cancer (GC), we analyzed a publicly available gastric cancer transcriptomic dataset, GSE66229 [13] (n = 400; 300 GC tumors and 100 patient-matched normal tissues). BoNE (see Online Resource 1 for details) uses the asymmetric properties of Boolean implication relationships (BIRs, as in the MiDReG algorithm [12]) to model natural, progressive time-series changes in the major cellular compartments that initiate, propagate and perpetuate cellular state changes and are therefore likely to be important for GC progression. BoNE provides an integrated platform for the construction, visualization and querying of a network of progressive changes, much like a disease map (in this case, a GC map), in three steps. (1) The expression level of each gene in the dataset was converted to a Boolean value (high or low) using the StepMiner algorithm [14]. (2) The expression relationship between each pair of genes was classified into one of six possible BIRs and expressed as a Boolean implication statement: the two symmetric implications, “equivalent” and “opposite”, occur when two diagonally opposite quadrants of the pairwise scatter plot are sparse, and the four asymmetric implications each correspond to a single sparse quadrant. Conventional analyses of transcriptomic datasets recognize the two symmetric relationships through correlation but ignore the asymmetric ones. We used BooleanNet statistics to assess the significance of the BIRs [12].
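Steps (1) and (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the BoNE implementation: the StepMiner-style threshold is taken as the midpoint of the two fitted step levels, no intermediate zone around the threshold is modeled, and the sparsity statistic S = (expected − observed)/√expected with error rate p, using cutoffs S > 3 and p < 0.1, reflects our reading of the BooleanNet description [12]. All function names are hypothetical.

```python
import numpy as np


def stepminer_threshold(values):
    """Fit a one-step function to sorted values and return a threshold.

    The step position minimizes the total squared error of a two-level fit.
    Assumption: the threshold is the midpoint of the two fitted levels.
    """
    x = np.sort(np.asarray(values, dtype=float))
    best_sse, best_k = np.inf, 1
    for k in range(1, len(x)):  # step falls between x[k-1] and x[k]
        left, right = x[:k], x[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_k = sse, k
    return (x[:best_k].mean() + x[best_k:].mean()) / 2.0


def booleanize(values):
    """Return 1 ('high') or 0 ('low') for each value relative to the threshold."""
    values = np.asarray(values, dtype=float)
    return (values >= stepminer_threshold(values)).astype(int)


# Implication implied when exactly this (A state, B state) quadrant is sparse.
SINGLE = {
    (0, 1): "A low => B low",
    (0, 0): "A low => B high",
    (1, 0): "A high => B high",
    (1, 1): "A high => B low",
}


def sparse_quadrants(a, b, s_thr=3.0, p_thr=0.1):
    """Return the set of (A, B) quadrants that pass the sparsity test."""
    a, b = np.asarray(a, dtype=int), np.asarray(b, dtype=int)
    n = len(a)
    counts = np.array([[np.sum((a == i) & (b == j)) for j in (0, 1)] for i in (0, 1)])
    rows, cols = counts.sum(axis=1), counts.sum(axis=0)
    sparse = set()
    for i in (0, 1):
        for j in (0, 1):
            if rows[i] == 0 or cols[j] == 0:
                continue  # no expected count to compare against
            expected = rows[i] * cols[j] / n
            observed = counts[i, j]
            s = (expected - observed) / np.sqrt(expected)  # sparsity statistic
            p = 0.5 * (observed / rows[i] + observed / cols[j])  # error rate
            if s > s_thr and p < p_thr:
                sparse.add((i, j))
    return sparse


def classify_bir(a, b, **kwargs):
    """Classify the Boolean relationship between two Boolean gene vectors."""
    q = sparse_quadrants(a, b, **kwargs)
    if {(0, 1), (1, 0)} <= q:
        return ["equivalent"]
    if {(0, 0), (1, 1)} <= q:
        return ["opposite"]
    return sorted(SINGLE[k] for k in q)
```

For example, if no sample has gene A low while gene B is high, only the (A low, B high) quadrant is sparse and the pair is reported as "A low => B low", a relationship that correlation alone would not single out.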
(3) A Boolean implication network was then built from the identified BIRs. Clusters were defined as groups of genes in which each gene is equivalent to at least half of the other genes in the cluster. Clusters were connected by directed edges according to the majority Boolean relationship between the genes of the two clusters. In the resulting network, clusters of genes are the nodes and the BIRs between clusters are the directed edges. BoNE thus discovers these relationships in an unsupervised way, without bias from sample type. Prior work [9] showed that this Boolean approach offers a distinct advantage over conventional computational methods that rely on symmetric, linear relationships in gene expression data. BIRs are also more robust than other methods to noise from sample heterogeneity (i.e., healthy vs diseased state, genotype, phenotype, ethnicity, interventions and disease severity), because every sample follows the same mathematical rule. BIRs identified by our method are therefore likely to be reproducible in independent validation datasets. Gene expression datasets were visualized using the Hierarchical Exploration of Gene Expression Microarrays Online (HEGEMON) framework [9].
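Step (3) can likewise be sketched. The greedy seeding strategy and the strict-majority rule (a relationship shared by more than half of all gene pairs between two clusters) are assumptions made for illustration; greedy_clusters and majority_edge are hypothetical helper names, and BoNE's actual clustering procedure may differ.

```python
from collections import Counter


def greedy_clusters(equiv):
    """Group genes so that each member is equivalent to >= half of the other members.

    equiv: dict mapping gene -> set of genes it is 'equivalent' to (symmetric).
    Hypothetical greedy strategy: seed with the gene that has the most
    unassigned equivalents, then prune members that fail the half rule.
    """
    unassigned = set(equiv)
    clusters = []
    while unassigned:
        seed = max(sorted(unassigned), key=lambda g: len(equiv[g] & unassigned))
        members = {seed} | (equiv[seed] & unassigned)
        pruned = True
        while pruned:
            pruned = False
            for g in sorted(members):
                others = members - {g}
                if others and len(equiv[g] & others) < len(others) / 2:
                    members.discard(g)  # fails the half rule; re-check the rest
                    pruned = True
                    break
        clusters.append(sorted(members))
        unassigned -= members
    return clusters


def majority_edge(c1, c2, relation):
    """Return the BIR label shared by more than half of all gene pairs, else None.

    relation(g1, g2) returns a BIR label, or None when no significant BIR exists.
    """
    votes = Counter(relation(g1, g2) for g1 in c1 for g2 in c2)
    votes.pop(None, None)  # pairs without a significant BIR do not vote
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count > len(c1) * len(c2) / 2 else None
```

The seed gene is equivalent to every initial member, so it is never pruned and each pass removes at least one gene from the unassigned pool; the loop therefore always terminates.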
Ordering samples based on composite score of Boolean path
A Boolean path contains one or more clusters. For each sample, a composite score summarizing the expression of the genes in the Boolean path was calculated in two steps. (1) The genes in each cluster were normalized and averaged. Expression values were normalized with a modified Z-score centered on the StepMiner threshold: (expr − SThr)/(3 × stddev), where expr is the expression value, SThr is the StepMiner threshold and stddev is the standard deviation. (2) A weighted linear combination of the cluster averages along the Boolean path was used to create a score for each sample. The weights were either monotonically increased or decreased along the path so that the sample order is consistent with the logical order given by the BIRs. Samples were then ordered by this final weighted composite score. Clusters highly expressed in disease received positive weights (e.g., 1, 2, 3) and clusters highly expressed in healthy tissue received negative weights (e.g., − 1, − 2, − 3).
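The composite score can be sketched as follows. The sketch assumes that each cluster is given as a genes × samples matrix with per-gene StepMiner thresholds. Using the population standard deviation of each gene across samples is also an assumption, and genes with zero variance would need to be filtered out first to avoid division by zero. cluster_score and composite_score are hypothetical names.

```python
import numpy as np


def cluster_score(expr, thresholds):
    """Average modified Z-score of one cluster for every sample.

    expr: genes x samples array; thresholds: per-gene StepMiner thresholds.
    Each gene is scaled as (expr - SThr) / (3 * stddev), with stddev taken
    per gene across samples (population SD; an assumption).
    """
    expr = np.asarray(expr, dtype=float)
    thr = np.asarray(thresholds, dtype=float)[:, None]
    sd = expr.std(axis=1, keepdims=True)
    return ((expr - thr) / (3.0 * sd)).mean(axis=0)


def composite_score(clusters, thresholds, weights):
    """Weighted linear combination of cluster scores along one Boolean path."""
    return sum(w * cluster_score(e, t) for e, t, w in zip(clusters, thresholds, weights))


# Toy path: one cluster high in healthy tissue (negative weight) and one
# cluster high in disease (positive weight), three samples each.
healthy = [[4.0, 2.0, 0.0]]
disease = [[0.0, 2.0, 4.0]]
score = composite_score([healthy, disease], [[2.0], [2.0]], [-1, 1])
order = np.argsort(score)  # from most healthy-like to most disease-like
```

Because the healthy-associated cluster carries a negative weight, samples with high expression of that cluster are pushed toward the healthy end of the ordering.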
Multivariate analysis for model selection
We used two microarray datasets to train a network model that distinguishes normal from GC samples: GSE37023 (only samples profiled on the GPL96 Affymetrix Human Genome U133A Array were analyzed; n = 65: 36 non-malignant, 29 GC tumor) and GSE122401 (n = 160: 80 patient-matched normal, 80 GC tumor). We performed multivariate analysis with ordinary least squares (OLS) regression in the Python statsmodels package (version 0.12.2) to determine which models performed best in the two training datasets.
Statistical analysis
Statistical significance between experimental groups was determined using the scipy.stats.ttest_ind function (SciPy version 0.19.0) with Welch’s two-sample t test (two-tailed, unpaired, unequal variances (equal_var = False) and unequal sample sizes). For all tests, p < 0.05 was considered significant. Violin and bar plots were created using the Python seaborn package (version 0.10.1).
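Welch’s t test as described corresponds to calling scipy.stats.ttest_ind with equal_var=False. The sketch below uses made-up scores. As a sanity check, it also compares the returned statistic with the Welch formula t = (mean1 − mean2)/√(s1²/n1 + s2²/n2). The welch_test wrapper is a hypothetical convenience name.

```python
import numpy as np
from scipy import stats


def welch_test(group1, group2):
    """Two-tailed Welch's t test (unequal variances, unequal sample sizes)."""
    return stats.ttest_ind(group1, group2, equal_var=False)


normal = np.array([-0.8, -0.5, -0.6, -0.9, -0.4])   # synthetic composite scores
tumor = np.array([0.7, 0.4, 1.1, 0.9, 0.3, 0.8])    # synthetic composite scores
result = welch_test(normal, tumor)
significant = result.pvalue < 0.05  # significance cutoff used in this study
```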
Results
Machine learning identified two possible Boolean paths in the GC disease map
Using a publicly available GC dataset (GSE66229) comprising tumor (T) and adjacent normal (AN) samples, we built a Boolean implication network (see Methods and Online Resource 1; Fig. 1a). Each cluster was assigned to the healthy or GC side of the disease map according to whether its average gene expression was up or down in healthy samples, yielding a GC map (Fig. 1b). We then used machine learning to identify Boolean paths (clusters connected by Boolean implication relationships) in the GC map that distinguish tumor from AN samples in the training datasets (Fig. 1c, top). Clusters #11-2-4-14 (C#11-2-4-14) performed best in training dataset #1 (GSE37023, AN vs T; ROC-AUC = 0.96), whereas clusters #7-13-14 (C#7-13-14) performed best in training dataset #2 (GSE122401, AN vs T; ROC-AUC = 0.98) (Fig. 1c). Violin plots for both datasets and both Boolean paths are shown in Fig. 1d. We performed Reactome pathway analysis on the clusters in both paths to identify the top five biological processes associated with each cluster (Fig. 1e). Cluster 11 comprises muscle contraction genes that are downregulated in GC. Cluster 2 comprises cell cycle genes, whose relevance to GC has been highlighted by many other studies [15, 16]. Cluster 4 comprises immune system genes, including neutrophil degranulation genes, which others have also linked to GC [17,18,19]. Clusters 7 and 13 comprise ion channel transport genes that are downregulated in GC [20, 21]. Cluster 14 comprises extracellular matrix (ECM) genes that are upregulated in GC, consistent with reports by others that the ECM is altered early during cell transformation [22, 23].
Since both Boolean paths, C#11-2-4-14 and C#7-13-14, distinguish AN from GC samples, we defined a gene signature, GC-BoNE, that classifies samples using whichever of the two paths best separates them (i.e., the path with the higher ROC-AUC).
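ROC-AUC can be computed as the probability that a randomly chosen GC sample scores higher than a randomly chosen normal sample, with ties counted as one half. The sketch below uses that pairwise definition and then selects the path with the higher ROC-AUC, following the GC-BoNE rule described above. The scores are made up, and roc_auc and gc_bone_auc are hypothetical names rather than the BoNE code.

```python
import numpy as np


def roc_auc(scores, labels):
    """P(random positive outscores random negative); ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # all positive-negative pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def gc_bone_auc(path_scores, labels):
    """Return (best path name, ROC-AUC) over the candidate Boolean paths."""
    aucs = {name: roc_auc(s, labels) for name, s in path_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


labels = [0, 0, 0, 1, 1, 1]  # 0 = normal, 1 = GC (toy data)
paths = {
    "C#11-2-4-14": [0.1, 0.2, 0.5, 0.4, 0.8, 0.9],
    "C#7-13-14": [0.1, 0.2, 0.3, 0.6, 0.7, 0.8],
}
best_path, best_auc = gc_bone_auc(paths, labels)
```

The pairwise form is quadratic in sample number but exact, which is adequate for datasets of a few hundred samples.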
We next compared the clusters identified by our Boolean approach with previously established gene signatures (Fig. 2a). Individually, C#11-2-4-14 and C#7-13-14 (Fig. 2b) classified tumor and normal/adjacent normal samples in the 21 validation datasets (see Online Resource 2 for a list of GSE IDs; ROC-AUC ranged from 0.57 to 1.00 for C#11-2-4-14 and from 0.66 to 1.00 for C#7-13-14). We then compared GC-BoNE with other gene signatures (see Online Resource 3 for the genes in each signature; Fig. 2c) and found that our signature outperformed the others (average ROC-AUC of 0.933 for GC-BoNE vs 0.690 to 0.921 for the other signatures). Clusters 11-2-4 (Fig. 2d) and 7-13 (Fig. 2e) overlapped minimally with the top three signatures (DEA (Li 2015), DEA + PPIN and DEA (Junnila 2010) [6]). Cluster 14 and the DEA (Junnila 2010) [6] signature shared 8 genes (Fig. 2f). These findings suggest that GC-BoNE provides a new list of potential GC biomarkers that differ from those in previous signatures.
GC-BoNE identifies progressively increasing risk of GC along the metaplasia–dysplasia continuum
We next asked whether the GC-BoNE signature is induced during progression from normal to GC through the normal → inflammation (gastritis) → metaplasia → dysplasia → neoplasia cascade (Fig. 3a, b). In one dataset (E-MTAB-8889), we examined the normal → inflammation (gastritis) → metaplasia cascade by comparing each pair of sequential steps, i.e., non-atrophic gastritis (NAG) vs chronic active gastritis (CG), CG vs chronic atrophic gastritis (CAG) and CAG vs intestinal metaplasia (IM) (Fig. 3c). We also compared the first step of the cascade with later steps, i.e., NAG vs CAG and NAG vs IM (Fig. 3c). In another dataset (GSE55696), we studied the dysplasia → neoplasia cascade, which is typically scored by histopathological examination according to the Vienna classification [24]; this classification describes a continuum extending from low- to high-grade dysplasia to intramucosal carcinoma. Here, we compared chronic gastritis (CG) vs low-grade intestinal neoplasia (LGIN), LGIN vs high-grade intestinal neoplasia (HGIN), HGIN vs early gastric cancer (EGC), CG vs HGIN and CG vs EGC (Fig. 3d). When we compared GC-BoNE with the other signatures (Fig. 3e), our signature again outperformed them in the progression datasets (see Online Resource 2 for a list of GSE IDs; average ROC-AUC of 0.828 for GC-BoNE vs 0.633 to 0.806 for the other signatures). These findings suggest that the genes in GC-BoNE may provide further insight into what initiates GC progression.
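The comparison design used in both datasets can be generated programmatically. It pairs each stage with the next one, and it also compares the first stage with each later, non-adjacent stage. This is an illustrative sketch with hypothetical function names and synthetic scores, not the authors’ code. Each comparison is scored with a pairwise ROC-AUC, treating the earlier stage as the negative class.

```python
import numpy as np


def roc_auc(neg, pos):
    """P(random later-stage sample outscores random earlier-stage sample)."""
    diff = np.asarray(pos, dtype=float)[:, None] - np.asarray(neg, dtype=float)[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def progression_comparisons(stages):
    """Sequential stage pairs plus first stage vs each stage two or more steps later."""
    sequential = list(zip(stages, stages[1:]))
    baseline = [(stages[0], s) for s in stages[2:]]
    return sequential + baseline


def score_progression(stage_scores):
    """ROC-AUC for each comparison; stage_scores is an ordered dict stage -> scores."""
    stages = list(stage_scores)
    return {(a, b): roc_auc(stage_scores[a], stage_scores[b])
            for a, b in progression_comparisons(stages)}


gastritis_design = progression_comparisons(["NAG", "CG", "CAG", "IM"])
neoplasia_design = progression_comparisons(["CG", "LGIN", "HGIN", "EGC"])
toy_aucs = score_progression({"NAG": [0, 1], "CG": [2, 3], "CAG": [4, 5], "IM": [6, 7]})
```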
GC-BoNE can objectively assess the appropriateness of mouse models for studying human GC
Next, we sought mouse models that recapitulate the differences between human normal and GC tissue. We analyzed 38 mouse models [25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42] from 20 NCBI GEO datasets using C#11-2-4-14 and C#7-13-14 (see Online Resource 2 for a list of GSE IDs; Fig. 3f). Many of the mouse models achieved a perfect ROC-AUC of 1.00 with C#11-2-4-14 and C#7-13-14 (see Online Resource 4). We then used a t test to assess which mouse models showed significant differences and to determine the top ten models (Fig. 3g). Notably, the top two models represent the two common risk factors for GC in humans. The model ranked #1 (GSE13873) reproduces the H. pylori infection → GC cascade in C57BL/6 mice experimentally infected with the closely related H. felis. The authors showed that while most infected mice develop premalignant lesions such as gastric atrophy, compensatory epithelial hyperplasia and IM, a minority are completely protected from pre-neoplasia. The models ranked #2-6 (GSE103639 (NGE vs pCP_GC), GSE45956, GSE103639 (NGE vs pChePS_GC), GSE16902, GSE93774) were all genetically engineered mouse models (GEMMs) carrying targeted deletions of genes associated with GC risk (CDH1, SMAD4, CLDN18, etc.), either because they carry the most common germline mutation in GC (CDH1 [43]), harbor disease-associated SNPs (SMAD4 [44]) or are the target of the most frequent somatic genomic rearrangements (CLDN18 [45]). These results suggest that GC-BoNE can objectively assess the similarity between mouse models (both infection-induced and genetically engineered) and human GC. In doing so, it can pinpoint which mouse models best recapitulate the gene expression patterns observed during the transformation from healthy tissue to GC in human samples.
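Ranking models by statistical significance can be sketched as sorting them by the Welch’s t test p-value between control and GC-like samples. Whether the published ranking used exactly this criterion is an assumption. The model names, scores and the rank_models helper below are hypothetical.

```python
from scipy import stats


def rank_models(models, top=10):
    """Order models by ascending Welch's t test p-value (most significant first).

    models: dict mapping model name -> (control_scores, gc_scores).
    """
    pvals = {
        name: stats.ttest_ind(ctrl, gc, equal_var=False).pvalue
        for name, (ctrl, gc) in models.items()
    }
    return sorted(pvals, key=pvals.get)[:top]


models = {
    "model_weak": ([0.0, 0.5, 1.0, 0.2], [0.3, 0.9, 0.4, 0.8]),
    "model_strong": ([0.0, 0.1, 0.2, 0.1], [2.0, 2.1, 2.2, 1.9]),
}
ranking = rank_models(models)
```

Because the p-value reflects both the size and the consistency of the control-versus-GC separation, it can discriminate between models that all reach a ROC-AUC of 1.00.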
GC-BoNE (C#11-2-4-14) can prognosticate the risk of IM → GC progression
To identify genes associated with progression to GC, we analyzed a dataset of samples curated from a prospective study [46] with long-term follow-up (mean 12 ± 3.4 years) that evaluated the risk of progression to GC among patients with incomplete or complete intestinal metaplasia (IIM and CIM, respectively) (Fig. 4a). Among the types of intestinal metaplasia, IIM is known to carry a greater risk of progression to GC than CIM [47]. A recent meta-analysis showed that, compared with CIM, the pooled relative risk (RR) of cancer/dysplasia in IIM patients was 4.48 (95% CI 2.50–8.03); the RR was 4.96 (95% CI 2.72–9.04) for cancer and 4.82 (95% CI 1.45–16.0) for dysplasia [48]. C#11-2-4-14 best distinguished healthy control patients (HC), patients with high-risk IIM who progressed to GC (IIM-GC) and those who did not progress (IIM-C) (ROC-AUC values: HC vs IIM-C, 0.86; HC vs IIM-GC, 0.94; IIM-C vs IIM-GC, 0.95; Fig. 4b). C#11-2-4-14 could not significantly distinguish low-risk CIM from HC (t test). C#7-13-14 also distinguished HC vs IIM-C (ROC-AUC = 0.80) and HC vs IIM-GC (ROC-AUC = 0.88), but not IIM-C vs IIM-GC (ROC-AUC = 0.71); C#11-2-4-14 performed better (Fig. 4c). Beyond IIM, C#7-13-14 significantly distinguished HC vs CIM-C (ROC-AUC = 0.73) and HC vs CIM-GC (ROC-AUC = 0.82), but not CIM-C vs CIM-GC (ROC-AUC = 0.57). The DEA (Li 2015) gene signature similarly separated HC from the other groups (ROC-AUC: HC vs IIM-C, 0.90; HC vs IIM-GC, 0.87; HC vs CIM-C, 0.91; HC vs CIM-GC, 0.97) but could not distinguish progressors from non-progressors (IIM-C vs IIM-GC, 0.38; CIM-C vs CIM-GC, 0.47) (Fig. 4d). The DEA (Junnila 2010) [6] signature could not significantly distinguish any of the groups (ROC-AUC values ranged from 0.42 to 0.74; Fig. 4e).
These findings suggest genes in C#11-2-4-14 might be key to understanding why some IIM patients progress to GC.
GC-BoNE provides insights into the changes in cellular continuum states during healthy → IIM → GC progression
To understand which cellular processes change during cell transformation and which genes contribute to GC progression, we evaluated the clusters in C#11-2-4-14 and C#7-13-14 separately (Fig. 4f). For HC vs IIM-C (Fig. 4f, row i), cluster 14 could not distinguish the samples (ROC-AUC = 0.63), whereas C#11-2-4 and C#7-13 both separated them (ROC-AUC = 0.87 and 0.89, respectively). For IIM-C vs IIM-GC (Fig. 4f, row ii), cluster 14 distinguished the samples better (ROC-AUC = 0.86), and the full path C#11-2-4-14 classified them best (ROC-AUC = 0.95). These results suggest that genes in C#11-2-4 may drive progression from HC to IIM, whereas C#14 is important for progression from IIM to GC. Although C#7-13-14 could not distinguish progressors among CIM patients, C#7-13 alone separated progressors from non-progressors (CIM-C vs CIM-GC, ROC-AUC = 0.81). These findings suggest that there may be two paths to GC. In the first (HC → IIM → GC), progression from HC to IIM may be influenced by genes related to muscle contraction, cell cycle and the immune system, and progression from IIM to GC by extracellular matrix processes. In the second (HC → CIM → GC), progression may be influenced by genes related to ion transport, whose dysregulation is expected to induce acid/base disturbances and barrier dysfunction, causing gastric acid-related diseases such as CAG and GC [20].
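The per-cluster analysis above amounts to scoring every contiguous sub-path (e.g., C#11-2-4, C#14 and the full C#11-2-4-14) and computing its ROC-AUC. The sketch below makes several assumptions: per-sample cluster averages and signed weights are already available, and sub-path scores are formed as the same weighted sums used for full paths. The cluster labels, values and the subpath_aucs helper are toy data and hypothetical names.

```python
import numpy as np


def roc_auc(scores, labels):
    """P(random positive outscores random negative); ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def subpath_aucs(cluster_scores, weights, labels):
    """ROC-AUC of every contiguous sub-path of a Boolean path.

    cluster_scores: dict cluster -> per-sample average score, in path order.
    weights: dict cluster -> signed weight.
    """
    names = list(cluster_scores)
    out = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names) + 1):
            sub = names[i:j]
            score = sum(weights[c] * np.asarray(cluster_scores[c], dtype=float) for c in sub)
            out["-".join(sub)] = roc_auc(score, labels)
    return out


labels = [0, 0, 1, 1]  # e.g., non-progressors (0) vs progressors (1), toy data
aucs = subpath_aucs({"4": [0.0, 0.1, 1.0, 1.1], "14": [0.5, 0.0, 0.0, 0.5]},
                    {"4": 1, "14": 1}, labels)
```

Comparing sub-path AUCs in this way shows which segment of a path carries the discriminating signal for a given pair of disease states.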
Discussion
Although GC incidence rates have been decreasing worldwide [4], there have been no significant improvements in new therapeutics, diagnostics or screening strategies for pre-neoplastic stages. In this study, we built a Boolean implication network using GSE66229 and used machine learning (on GSE37023 and GSE122401) to identify a gene signature (GC-BoNE) that classifies normal and GC samples. Reactome pathway analysis of GC-BoNE revealed that C#11-2-4-14 contains genes related to infection and inflammation: an increase in cell cycle genes in C#2 may lead to abnormal cell proliferation [15, 16], an increase in immune system genes in C#4 may lead to inflammation [17,18,19], and an increase in ECM genes in C#14 may lead to a remodeled ECM [22, 23]. Changes in the genes of C#7-13-14 signify ion transporter abnormalities, which in parietal cells can lead to the onset of GC [20, 21]. Although previous studies have identified most of these pathways [15,16,17,18,19,20,21,22,23], muscle contraction has not been widely implicated. We then compared GC-BoNE with gene signatures from past studies in both normal vs GC samples (Fig. 2c) and GC progression samples (Figs. 3c and 4f).
First, our Boolean network-based approach improves on past studies by identifying a gene signature (GC-BoNE) that classifies samples along the GC disease continuum better than previous signatures. For normal vs GC samples, many of the signatures performed well (Fig. 2c). However, our main interest was a gene signature that distinguishes samples earlier in the GC disease continuum, and for GC progression our signature outperformed the other gene signatures (Fig. 3e). Because the genes in GC-BoNE overlap little with those of the other gene signatures (Fig. 2d–f), GC-BoNE provides a list of new potential biomarkers for targeting therapeutics at different points along the GC disease continuum.
Second, GC-BoNE may have identified two paths that lead from pre-neoplasia to GC. C#11-2-4-14 reflects immune cell processes and predicted the risk of HC → IIM → GC progression, whereas C#7-13-14 reflects the ion transporter abnormalities seen in HC → CIM → GC progression (Fig. 5). Although the model was built and trained on normal vs GC samples, the Boolean network-based approach allowed us to identify paths that also capture intermediate states of disease progression. The invariant asymmetric Boolean implications in the GC-BoNE signature provide insight into the cellular changes occurring at various time points along the disease continuum. Although it remains unknown which cluster is associated with which pre-neoplastic condition, GC-BoNE provides a list of gene targets that can be tested in the mouse models we identified (Fig. 3g) or in other models.
Although this work provides a new set of genes that could be targeted in GC and precancerous conditions, we could not rigorously test whether GC-BoNE identifies patients with early lesions, such as chronic atrophic gastritis, who are at the highest risk of progression to GC. We identified six additional datasets of gastritis samples (GSE69144, GSE153224, GSE83389, GSE116312, GSE106656, GSE134520). One of them (GSE69144) is a prospective study of whether precancerous gastric lesions progressed over time (multifocal atrophic gastritis to intestinal metaplasia, or intestinal metaplasia to dysplasia). Because these data were profiled on a DASL Human Cancer Panel microarray, many GC-BoNE genes were unavailable for the violin plot (0/240 genes available for C#11, 0/28 for C#7, 0/14 for C#13 and 6/23 for C#14; Online Resource 5c). The resulting violin plot indicates that we may not be able to predict which patients will progress using only the genes on the DASL cancer panel (follow-up samples from progressors had lower scores than baseline samples). The other datasets are small and did not show consistent patterns (Online Resource 5d-h). Given the limited availability of datasets, we conclude that additional prospective studies covering all stages of GC progression are needed before the ability of GC-BoNE-derived gene signatures to identify high-risk patients can be fully evaluated.
Overall, we demonstrate that the genes identified from our Boolean network-based approach were better able to classify samples along the GC disease continuum compared to the genes from previous work. The genes from GC-BoNE provide more opportunities to research the cellular processes behind GC progression. Results from this paper can be used to rationalize gene targets for diagnostics and therapeutics.
Data Availability
All data are available in the main text or the supplementary materials. Publicly available data from the NCBI Gene Expression Omnibus are identified by their GSE numbers. All code is available at https://github.com/sahoo00/BoNE.
References
Van Cutsem E, Sagaert X, Topal B, Haustermans K, Prenen H. Gastric cancer. Lancet. 2016;388(10060):2654–64. https://doi.org/10.1016/S0140-6736(16)30354-3.
Amieva M, Peek RM Jr. Pathobiology of helicobacter pylori-induced gastric cancer. Gastroenterology. 2016;150(1):64–78. https://doi.org/10.1053/j.gastro.2015.09.004.
Correa P, Piazuelo MB. The gastric precancerous cascade. J Dig Dis. 2012;13(1):2–9. https://doi.org/10.1111/j.1751-2980.2011.00550.x.
Karimi P, Islami F, Anandasabapathy S, Freedman ND, Kamangar F. Gastric cancer: descriptive epidemiology, risk factors, screening, and prevention. Cancer Epidemiol Biomarkers Prev. 2014;23(5):700–13. https://doi.org/10.1158/1055-9965.EPI-13-1057.
Gonzalez CA, Sanz-Anquela JM, Companioni O, Bonet C, Berdasco M, Lopez C, et al. Incomplete type of intestinal metaplasia has the highest risk to progress to gastric cancer: results of the Spanish follow-up multicenter study. J Gastroenterol Hepatol. 2016;31(5):953–8. https://doi.org/10.1111/jgh.13249.
Junnila S, Kokkola A, Mizuguchi T, Hirata K, Karjalainen-Lindsberg ML, Puolakkainen P, et al. Gene expression analysis identifies over-expression of CXCL1, SPARC, SPP1, and SULF1 in gastric cancer. Genes Chromosomes Cancer. 2010;49(1):28–39. https://doi.org/10.1002/gcc.20715.
Park S, Nam CM, Kim SG, Mun JE, Rha SY, Chung HC. Comparative efficacy and tolerability of third-line treatments for advanced gastric cancer: a systematic review with Bayesian network meta-analysis. Eur J Cancer. 2021;144:49–60. https://doi.org/10.1016/j.ejca.2020.10.030.
Korhani Kangi A, Bahrampour A. Predicting the survival of gastric cancer patients using artificial and Bayesian neural networks. Asian Pac J Cancer Prev. 2018;19(2):487–90. https://doi.org/10.22034/APJCP.2018.19.2.487.
Sahoo D, Swanson L, Sayed IM, Katkar GD, Ibeawuchi SR, Mittal Y, et al. Artificial intelligence guided discovery of a barrier-protective therapy in inflammatory bowel disease. Nat Commun. 2021;12(1):4246. https://doi.org/10.1038/s41467-021-24470-5.
Sahoo D, Katkar GD, Khandelwal S, Behroozikhah M, Claire A, Castillo V, et al. AI-guided discovery of the invariant host response to viral pandemics. EBioMedicine. 2021;68:103390. https://doi.org/10.1016/j.ebiom.2021.103390.
Ghosh P, Sinha S, Katkar GD, Vo DT, Taheri S, Dang D, et al. Machine learning identifies signatures of macrophage reactivity and tolerance that predict disease outcomes. bioRxiv. 2022;8:10964.
Sahoo D, Dill DL, Gentles AJ, Tibshirani R, Plevritis SK. Boolean implication networks derived from large scale, whole genome microarray datasets. Genome Biol. 2008;9(10):R157. https://doi.org/10.1186/gb-2008-9-10-r157.
Oh SC, Sohn BH, Cheong JH, Kim SB, Lee JE, Park KC, et al. Clinical and genomic landscape of gastric cancer with a mesenchymal phenotype. Nat Commun. 2018;9(1):1777. https://doi.org/10.1038/s41467-018-04179-8.
Sahoo D, Dill DL, Tibshirani R, Plevritis SK. Extracting binary signals from microarray time-course data. Nucleic Acids Res. 2007;35(11):3705–12. https://doi.org/10.1093/nar/gkm284.
Ma X, Huang M, Wang Z, Liu B, Zhu Z, Li C. ZHX1 inhibits gastric cancer cell growth through inducing cell-cycle arrest and apoptosis. J Cancer. 2016;7(1):60–8. https://doi.org/10.7150/jca.12973.
Zhang L, Kang W, Lu X, Ma S, Dong L, Zou B. LncRNA CASC11 promoted gastric cancer cell proliferation, migration and invasion in vitro by regulating cell cycle pathway. Cell Cycle. 2018;17(15):1886–900. https://doi.org/10.1080/15384101.2018.1502574.
Kono K, Nakajima S, Mimura K. Current status of immune checkpoint inhibitors for gastric cancer. Gastric Cancer. 2020;23(4):565–78. https://doi.org/10.1007/s10120-020-01090-4.
Szor DJ, Dias AR, Pereira MA, Ramos M, Zilberstein B, Cecconello I, et al. Prognostic role of neutrophil/lymphocyte ratio in resected gastric cancer: a systematic review and meta-analysis. Clinics. 2018;73:e360. https://doi.org/10.6061/clinics/2018/e360.
Coussens LM, Werb Z. Inflammation and cancer. Nature. 2002;420(6917):860–7. https://doi.org/10.1038/nature01322.
Yuan D, Ma Z, Tuo B, Li T, Liu X. Physiological significance of ion transporters and channels in the stomach and pathophysiological relevance in gastric cancer. Evid Based Complement Alternat Med. 2020;2020:2869138. https://doi.org/10.1155/2020/2869138.
Djamgoz MB, Coombes RC, Schwab A. Ion transport and cancer: from initiation to metastasis. Philos Trans R Soc Lond B Biol Sci. 2014;369(1638):20130092. https://doi.org/10.1098/rstb.2013.0092.
Moreira AM, Pereira J, Melo S, Fernandes MS, Carneiro P, Seruca R, et al. The extracellular matrix: an accomplice in gastric cancer development and progression. Cells. 2020. https://doi.org/10.3390/cells9020394.
Jang M, Koh I, Lee JE, Lim JY, Cheong JH, Kim P. Increased extracellular matrix density disrupts E-cadherin/beta-catenin complex in gastric cancer cells. Biomater Sci. 2018;6(10):2704–13. https://doi.org/10.1039/c8bm00843d.
Schlemper RJ, Riddell RH, Kato Y, Borchard F, Cooper HS, Dawsey SM, et al. The Vienna classification of gastrointestinal epithelial neoplasia. Gut. 2000;47(2):251–5. https://doi.org/10.1136/gut.47.2.251.
Sayi A, Kohler E, Hitzler I, Arnold I, Schwendener R, Rehrauer H, et al. The CD4+ T cell-mediated IFN-gamma response to Helicobacter infection is essential for clearance and determines gastric cancer risk. J Immunol. 2009;182(11):7085–101. https://doi.org/10.4049/jimmunol.0803293.
Garay J, Piazuelo MB, Majumdar S, Li L, Trillo-Tinoco J, Del Valle L, et al. The homing receptor CD44 is involved in the progression of precancerous gastric lesions in patients infected with Helicobacter pylori and in development of mucous metaplasia in mice. Cancer Lett. 2016;371(1):90–8. https://doi.org/10.1016/j.canlet.2015.10.037.
Douchi D, Yamamura A, Matsuo J, Melissa Lim YH, Nuttonmanit N, Shimura M, et al. Induction of gastric cancer by successive oncogenic activation in the corpus. Gastroenterology. 2021;161(6):1907-23 e26. https://doi.org/10.1053/j.gastro.2021.08.013.
Garay J, Piazuelo MB, Lopez-Carrillo L, Leal YA, Majumdar S, Li L, et al. Increased expression of deleted in malignant brain tumors (DMBT1) gene in precancerous gastric lesions: findings from human and animal studies. Oncotarget. 2017;8(29):47076–89. https://doi.org/10.18632/oncotarget.16792.
An L, Nie P, Chen M, Tang Y, Zhang H, Guan J, et al. MST4 kinase suppresses gastric tumorigenesis by limiting YAP activation via a non-canonical pathway. J Exp Med. 2020. https://doi.org/10.1084/jem.20191817.
Choi W, Kim J, Park J, Lee DH, Hwang D, Kim JH, et al. YAP/taz initiates gastric tumorigenesis via upregulation of MYC. Cancer Res. 2018;78(12):3306–20. https://doi.org/10.1158/0008-5472.CAN-17-3487.
Giannakis M, Backhed HK, Chen SL, Faith JJ, Wu M, Guruge JL, et al. Response of gastric epithelial progenitors to helicobacter pylori isolates obtained from Swedish patients with chronic atrophic gastritis. J Biol Chem. 2009;284(44):30383–94. https://doi.org/10.1074/jbc.M109.052738.
Shimada S, Akiyama Y, Mogushi K, Ishigami-Yuasa M, Kagechika H, Nagasaki H, et al. Identification of selective inhibitors for diffuse-type gastric cancer cells by screening of annotated compounds in preclinical models. Br J Cancer. 2018;118(7):972–84. https://doi.org/10.1038/s41416-018-0008-y.
Oshima H, Ishikawa T, Yoshida GJ, Naoi K, Maeda Y, Naka K, et al. TNF-alpha/TNFR1 signaling promotes gastric tumorigenesis through induction of Noxo1 and Gna14 in tumor cells. Oncogene. 2014;33(29):3820–9. https://doi.org/10.1038/onc.2013.356.
Ihler F, Vetter EV, Pan J, Kammerer R, Debey-Pascher S, Schultze JL, et al. Expression of a neuroendocrine gene signature in gastric tumor cells from CEA 424-SV40 large T antigen-transgenic mice depends on SV40 large T antigen. PLoS ONE. 2012;7(1):e29846. https://doi.org/10.1371/journal.pone.0029846.
Liu J, Feng W, Liu M, Rao H, Li X, Teng Y, et al. Stomach-specific c-Myc overexpression drives gastric adenoma in mice through AKT/mammalian target of rapamycin signaling. Bosn J Basic Med Sci. 2021;21(4):434–46. https://doi.org/10.17305/bjbms.2020.4978.
Yu L, Wu D, Gao H, Balic JJ, Tsykin A, Han TS, et al. Clinical utility of a STAT3-regulated miRNA-200 family signature with prognostic potential in early gastric cancer. Clin Cancer Res. 2018;24(6):1459–72. https://doi.org/10.1158/1078-0432.CCR-17-2485.
Karasawa F, Shiota A, Goso Y, Kobayashi M, Sato Y, Masumoto J, et al. Essential role of gastric gland mucin in preventing gastric cancer in mice. J Clin Invest. 2012;122(3):923–34. https://doi.org/10.1172/JCI59087.
Loe AKH, Francis R, Seo J, Du L, Wang Y, Kim JE, et al. Uncovering the dosage-dependent roles of Arid1a in gastric tumorigenesis for combinatorial drug therapy. J Exp Med. 2021. https://doi.org/10.1084/jem.20200219.
Hagen SJ, Ang LH, Zheng Y, Karahan SN, Wu J, Wang YE, et al. Loss of tight junction protein claudin 18 promotes progressive neoplasia development in mouse stomach. Gastroenterology. 2018;155(6):1852–67. https://doi.org/10.1053/j.gastro.2018.08.041.
Park JW, Kim MS, Voon DC, Kim SJ, Bae J, Mun DG, et al. Multi-omics analysis identifies pathways and genes involved in diffuse-type gastric carcinogenesis induced by E-cadherin, p53, and Smad4 loss in mice. Mol Carcinog. 2018;57(7):947–54. https://doi.org/10.1002/mc.22803.
Park JW, Jang SH, Park DM, Lim NJ, Deng C, Kim DY, et al. Cooperativity of E-cadherin and Smad4 loss to promote diffuse-type gastric adenocarcinoma and metastasis. Mol Cancer Res. 2014;12(8):1088–99. https://doi.org/10.1158/1541-7786.MCR-14-0192-T.
Itadani H, Oshima H, Oshima M, Kotani H. Mouse gastric tumor models with prostaglandin E2 pathway activation show similar gene expression profiles to intestinal-type human gastric cancer. BMC Genom. 2009;10:615. https://doi.org/10.1186/1471-2164-10-615.
Luo W, Fedda F, Lynch P, Tan D. CDH1 gene and hereditary diffuse gastric cancer syndrome: molecular and histological alterations and implications for diagnosis and treatment. Front Pharmacol. 2018;9:1421. https://doi.org/10.3389/fphar.2018.01421.
Wu DM, Zhu HX, Zhao QH, Zhang ZZ, Wang SZ, Wang ML, et al. Genetic variations in the SMAD4 gene and gastric cancer susceptibility. World J Gastroenterol. 2010;16(44):5635–41. https://doi.org/10.3748/wjg.v16.i44.5635.
Zhang WH, Zhang SY, Hou QQ, Qin Y, Chen XZ, Zhou ZG, et al. The significance of the CLDN18-ARHGAP fusion gene in gastric cancer: a systematic review and meta-analysis. Front Oncol. 2020;10:1214. https://doi.org/10.3389/fonc.2020.01214.
Companioni O, Sanz-Anquela JM, Pardo ML, Puigdecanet E, Nonell L, Garcia N, et al. Gene expression study and pathway analysis of histological subtypes of intestinal metaplasia that progress to gastric cancer. PLoS ONE. 2017;12(4):e0176043. https://doi.org/10.1371/journal.pone.0176043.
Du S, Yang Y, Fang S, Guo S, Xu C, Zhang P, et al. Gastric cancer risk of intestinal metaplasia subtypes: a systematic review and meta-analysis of cohort studies. Clin Transl Gastroenterol. 2021;12(10):e00402. https://doi.org/10.14309/ctg.0000000000000402.
Wei N, Zhou M, Lei S, Zhong Z, Shi R. A meta-analysis and systematic review on subtypes of gastric intestinal metaplasia and neoplasia risk. Cancer Cell Int. 2021;21(1):173. https://doi.org/10.1186/s12935-021-01869-0.
Li H, Yu B, Li J, Su L, Yan M, Zhang J, et al. Characterization of differentially expressed genes involved in pathways associated with gastric cancer. PLoS ONE. 2015;10(4):e0125013. https://doi.org/10.1371/journal.pone.0125013.
Li L, Zhu Z, Zhao Y, Zhang Q, Wu X, Miao B, et al. FN1, SPARC, and SERPINE1 are highly expressed and significantly related to a poor prognosis of gastric adenocarcinoma revealed by microarray and bioinformatics. Sci Rep. 2019;9(1):7827. https://doi.org/10.1038/s41598-019-43924-x.
Takeno A, Takemasa I, Doki Y, Yamasaki M, Miyata H, Takiguchi S, et al. Integrative approach for differentially overexpressed genes in gastric cancer by combining large-scale gene expression profiling and network analysis. Br J Cancer. 2008;99(8):1307–15. https://doi.org/10.1038/sj.bjc.6604682.
Zang S, Guo R, Xing R, Zhang L, Li W, Zhao M, et al. Identification of differentially-expressed genes in intestinal gastric cancer by microarray analysis. Genom Proteom Bioinform. 2014;12(6):276–83. https://doi.org/10.1016/j.gpb.2014.09.004.
Wang JB, Li P, Liu XL, Zheng QL, Ma YB, Zhao YJ, et al. An immune checkpoint score system for prognostic evaluation and adjuvant chemotherapy selection in gastric cancer. Nat Commun. 2020;11(1):6352. https://doi.org/10.1038/s41467-020-20260-7.
Cho JY, Lim JY, Cheong JH, Park YY, Yoon SL, Kim SM, et al. Gene expression signature-based prognostic risk score in gastric cancer. Clin Cancer Res. 2011;17(7):1850–7. https://doi.org/10.1158/1078-0432.CCR-10-2180.
Wang H, Wu X, Chen Y. Stromal-immune score-based gene signature: a prognosis stratification tool in gastric cancer. Front Oncol. 2019;9:1212. https://doi.org/10.3389/fonc.2019.01212.
Acknowledgements
This work was supported by the Torey Pines Foundation Award (to PG) and the National Institutes of Health (NIH) grant R01-AI155696 (to PG and DS). Other sources of support include T32GM139790 (to DV), R01-GM138385 (to DS), R01-AI141630, CA100768 and CA160911 (to PG), and UG3TR002968 (to DS and PG). DS was also supported by three Padres Pedal the Cause awards (Padres Pedal the Cause/RADY #PTC2017, Padres Pedal the Cause #PTC2021, and San Diego NCI Cancer Centers Council (C3) #PTC2017). DS and PG were also supported by the Leona M. and Harry B. Helmsley Charitable Trust. We thank Saptarshi Sinha, Dharanidhar Dang and Sahar Taheri for their feedback on the manuscript.
Author information
Contributions
Conceptualization: DS and PG. Methodology: DS and DV. Investigation: DV, DS, and PG. Visualization: DV, DS, and PG. Funding acquisition: DS and PG. Project administration: DS and PG. Supervision: DS and PG. Writing – original draft: DV, DS, and PG. Writing – review and editing: DV, DS, and PG.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The electronic supplementary material comprises the following files.
10120_2022_1360_MOESM2_ESM.xlsx
Supplementary Online Resource 2: List of GSE IDs used in the analysis, with sample type (human vs mouse), use (network, training or validation) and corresponding figure panel (XLSX 10 KB)
10120_2022_1360_MOESM3_ESM.xlsx
Supplementary Online Resource 3: Complete list of genes used in all gene signatures (GC-BoNE and signatures from other sources) (XLSX 29 KB)
10120_2022_1360_MOESM4_ESM.pdf
Supplementary Online Resource 4: Bubble plots of ROC-AUC values (circle radius scaled to the ROC-AUC) showing the direction of gene regulation (up: red; down: blue) for the classification of samples in 38 mouse models (GC-BoNE clusters in columns; sample comparisons in rows). P-values from Welch's t-test on the composite score of gene expression values are shown next to each ROC-AUC using the standard notation (*p ≤ 0.05, **p ≤ 0.01, ***p ≤ 0.001) (PDF 565 KB)
10120_2022_1360_MOESM5_ESM.pdf
Supplementary Online Resource 5: Analysis of atrophic gastritis datasets using the GC-BoNE model. a Schematic of the hypothetical disease continuum path from normal through gastritis and intestinal metaplasia to gastric carcinoma. b Schematic describing the study in GSE69144. c–h Violin plots for the gastritis datasets (GSE69144, GSE153224, GSE83389, GSE116312, GSE106656, GSE134520) using the GC-BoNE signatures 11-2-4-14 (left) and 7-13-14 (right) (PDF 3082 KB)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Vo, D., Ghosh, P. & Sahoo, D. Artificial intelligence-guided discovery of gastric cancer continuum. Gastric Cancer 26, 286–297 (2023). https://doi.org/10.1007/s10120-022-01360-3
DOI: https://doi.org/10.1007/s10120-022-01360-3