Identification of a molecular signature of prognostic subtypes in diffuse-type gastric cancer.

Background Although recent advances in high-throughput technology have provided many insights into gastric cancer (GC), few reliable biomarkers for diffuse-type GC have been identified. Here, we aim to identify a prognostic and predictive signature of diffuse-type GC heterogeneity. Methods We analyzed RNA-seq-based transcriptome data to identify a molecular signature in 150 gastric tissue samples including 107 diffuse-type GCs. The predictive value of the signature was verified using other diffuse-type GC samples in three independent cohorts (n = 466). Log-rank and Cox regression analyses were used to estimate the association between the signature and prognosis. The signature was also characterized by somatic variant analyses and tissue microarray analysis between diffuse-type GC subtypes. Results Transcriptomic profiling of RNA-seq data identified a signature which revealed distinct subtypes of diffuse-type GC: the intestinal-like (INT) and core diffuse-type (COD) subtypes. The signature showed high predictability and independent clinical utility in diffuse-type GC prognosis in other patient cohorts (HR 2.058, 95% CI 1.53–2.77, P = 1.76 × 10–6). Integrative mutational and gene expression analyses demonstrated that the COD subtype was responsive to chemotherapy, whereas the INT subtype was responsive to immunotherapy with an immune checkpoint inhibitor (ICI). Tissue microarray analysis showed the practical utility of IGF1 and NXPE2 for predicting diffuse-type GC heterogeneity. Conclusions We present a molecular signature that can identify diffuse-type GC patients who display different clinical behaviors as well as responses to chemotherapy or ICI treatment. Electronic supplementary material The online version of this article (10.1007/s10120-019-01029-4) contains supplementary material, which is available to authorized users.

61 diffuse-type GCs (TCGA cohort, n = 61). Notably, the mRNA quantification and DNA variant data with clinical information on GC for the TCGA cohort were obtained from the GDC portal (https://portal.gdc.cancer.gov/), in which tumor histology was last updated in March 2017. CNV and methylation data of 61 diffuse-type GC patients were obtained from the cBioPortal (https://www.cbioportal.org/). The baseline characteristics of public datasets of the GC patients along with our original cohort are described in the Table S1.

Statistical analysis
To perform gene expression profiling for gastric tissue samples, a hierarchical clustering algorithm with the centered correlation coefficient as the measure of similarity and complete linkage clustering was applied. For cluster analysis, the read counts per million fragments mapped (CPM) of each sample were used to estimate the expression level of each gene. The CPM data were normalized by the quantile method, log2 transformed, and median centered across genes and samples.
To estimate the significance of differences in gene expression between sample subgroups, the edgeR package, which uses a negative binomial model was employed to detect differentially expressed genes from count data [4]. Expression differences in genes were considered statistically significant if the P-value was <0.001 and the fold difference in expression between two sample groups was ≥2. The Kaplan−Meier method was used to calculate disease-free and overall survival, and the difference in survival between two groups was assessed using log-rank statistics. The prognostic association between the signature and known clinico-pathological factors was assessed using multivariate Cox proportional hazard regression models. A backward-forward step procedure was applied to optimize the multivariate model with the most informative variables [5]. Statistical analysis was mainly carried out in the R language environment (ver. 3

.5.3).
Gene set enrichment analysis was carried out to identify the most significant gene sets associated with disease processes, molecular and cellular functions, and physiological and developmental processes. The significance of overrepresented gene sets was estimated by Fisher's exact test. To identify predominant upstream regulators that account for the observed gene expression changes, an upstream regulator analysis was carried out. This analysis determined the number of known targets of each regulator and compared their direction of change to what is expected from the previous literature. An overlap P-value and an activation Z-score for each potential regulator were estimated. The overlap P-value, estimated by Fisher's exact test, measures whether there is statistically significant overlap between the genes in a dataset and the genes regulated by a regulator. The activation Z-score is used to infer likely activation states of regulator candidates by comparison with a model that assigns a random regulation direction. A positive or negative activation Z-score implies that a potential upstream regulator is activated or inhibited, respectively. Gene set enrichment and upstream regulator analyses were performed using the Ingenuity Pathway Analysis (IPA) tool.
For comparing relative linear CNV data for each gene generated from the Affymetrix SNP6 platform in the TCGA cohort, a two-sample t-test was used. When comparing β-values for each gene generated from the Illumina 450K array platform in the TCGA cohort, a Wilcoxon rank sum test was applied.

Tissue microarray analysis
To examine the protein expression levels of IGF1 and NXPE2, TMA blocks were assembled using 2 mm-diameter core tissue samples from 81 individual paraffin-embedded gastric tumors (including 53 diffuse-type GCs matching RNA-seq data) resected at Seoul National University Hospital. The slides from the TMA blocks were subjected to multiplex immunohistochemistry (IHC) with three antibodies, targeting IGF-1 (Cat . No. NBP2-48922,   NOVUS, Abingdon, UK), NXPE2 (FAM55B, NBP2-31001, NOVUS, Abingdon, UK), and cytokeratin (CK, Cat. No. M3515, Agilent). The first IHC assay was performed with anti-IGF-1, followed by the corresponding secondary antibody (rabbit anti-mouse IgG; Invitrogen, Waltham, MA, USA) and an anti-rabbit Envision+ System horseradish peroxidase-labelled polymer. The slides were treated with ImmPact NovaRED (Vector Laboratories, Burlingame, CA, USA) followed by Mayer's hematoxylin. After obtaining scanned images of the whole slides with an AT2 scanner (Leica Biosystems), the slides were treated with stripping buffer (20% SDS, 0.5 M Tris-HCl pH 6.8, β-mercaptoethanol and DW). The slides were then stained with anti-NSPE2, and the above staining-scanning-stripping procedures were repeated. Finally, the slides were stained with anti-cytokeratin, and the images were scanned.
For image analysis, each file of a scanned whole slide was divided into small images comprising single cores as the unit of analysis. After the alignment of three images from IHC, tumour cells were identified using CK images and nuclear images. The ratio of the staining intensity of IGF-1 or NXPE2 to that of cytokeratin was measured in all tumor cells.

Biological insight into the core diffuse-type (COD) signature
To characterize the biological relevance of the signature, we performed a gene set enrichment test of the 1,922 genes in the signature (Fig. S1). The test illustrated significantly altered functions (Fig. S3), among which many genes involved in cellular movement, cell-to-cell signaling and interaction, cellular growth and proliferation, or cell morphology in the category of molecular and cellular functions were significantly enriched, accounting for the poor prognosis of the COD subtype.
To explore the predominant regulators and their biological activities in the COD subtype, an upstream regulator analysis of 1,922 genes in the signature was carried out. This analysis illustrated many important activated or inhibited regulators, among which several genes involved in the epithelial-to-mesenchymal transition (EMT) signature, such as TGFB1, SNAI1 and TWIST1, were significantly activated in the COD subtype. In addition, many regulators playing key roles in cell proliferation, such as GLI1, MYOCD or CCL2, were strongly activated along with the top regulator TGFB1 (Table S2), indicating that an interactive regulatory network consisting of several regulators may be a key genetic determinant associated with the poor prognosis of diffuse-type GC in COD patients.
Based on the identification of upstream regulators, we performed an analysis of the top regulatory effects indicated by the IPA tool to assess which biofunctions were significantly activated by upstream regulators and their effectors revealed in the upstream regulator analysis. Exploration of the regulatory effects of COD regulators (Table S2) revealed a network of top regulatory effects activating the migration of cells and cell movement of tumor cells (Fig. S4). Upstream regulators such as TGFB1, SNAI1, GLI1, HDAC3, and HDAC6 regulated the activity of many downstream effectors, including TGFB1I1 and FGF1, that are involved in the EMT signature, consistent with our previous findings ( Fig. 1; Fig.   S1). We also observed that active TGFB1 regulated IGF1, an indicator of GC with a mesenchymal phenotype [6], and subsequently activated the movement of tumor cells. These results demonstrated a robust property of the COD signature reflecting EMT activity, in which important regulators synergistically activate downstream effectors, leading to cell movement or migration disorders.  It was predicted that an upstream regulator was activated when the activated Z-score was greater than 0, while it was predicted that an upstream regulator was inhibited when the activated Z-score was less than 0. This prediction was carried out using Ingenuity Pathway Analysis software. Abbreviations: COD, core diffuse type. Involved genes Figure S1. Comparative analysis of differentially expressed genes among the molecular subtypes of diffuse type gastric cancer (GC).

Supplementary tables
a Venn diagram of genes selected by the exact test using EdgeR software in the distinct sub patient groups. Genes in the blue circle (gene list A) represent those differentially expressed between normal-like (N) and intestinal-like (INT) subgroups. Genes in the red circle (gene list B) represent those differentially expressed between INT and core diffuse type (COD) subgroups. Cut-off criteria of a P-value of less than 0.001 and a 2-fold or greater relative difference were applied to select genes whose expression were significantly different between the two subtypes. b Expression patterns of selected genes in the Venn diagram. The data are presented in matrix format, in which rows represent individual genes and columns represent each tissue. The red and blue colors reflect high and low expression levels, respectively.   Figure S3. Gene set enrichments analysis of 1,922 genes associated with the CODsignature.
Classification enrichment was determined using Ingenuity Pathway Analysis software. The threshold of significance was P < 0.05. COD, core diffuse type.  Gene expression data from a cohort of GC patients who were treated with an anti-PD-L1 agent (pembrolizumab) was used to estimate an association between COD signature and response to ICI (n=45