Introduction

Pancreatic adenocarcinoma (PAAD) is one of the most common carcinomas globally and ranks 6th in cancer-related deaths1. Although considerable progress has been made in diagnosis and treatment2, the 5-year survival rate of PAAD is still less than 10%3. Therefore, there is still a need for new ways to predict patient prognosis and augment early intervention to maximize long-term survival.

The development of high-throughput sequencing has revolutionized DNA and RNA research4 and broadened the scope of research into the potential biological processes and mechanisms of human disease5. Several studies have revealed differentially expressed mRNA/miRNA/lncRNA and differentially expressed genes (DEGs) of pancreatic carcinoma in recent years6,7,8,9,10. Although their theoretical value for the diagnosis and prognosis of pancreatic carcinoma has been described, the biological mechanisms, clinical significance, and interactions of DEGs during pancreatic carcinoma tumorigenesis are yet to be explored.

Inflammation mediates and participates in various pathophysiological processes, including the classic pathways of infection, immune elimination, tissue repair and regeneration11,12. Recent studies have put forward the view that inflammation is tightly associated with the tumorigenesis, progression and metastasis of cancer13,14. Tumor risk factors can stimulate an extrinsic inflammatory response, while the innate inflammatory response contributes to tumor progression, indicating that a complex network exists in the tumor-immune microenvironment. Furthermore, immune-related genes (IRGs), including interleukin (IL)-1015, IL-616, tumor necrosis factor-α (TNF-α)17 and the C-X-C motif chemokine ligand (CXCL) family18, play a vital role in tumor proliferation, metabolism and metastasis. The occurrence and development of pancreatic cancer are recognized to be closely linked with inflammation. Local and systemic chronic inflammation could elevate the risk of PAAD, and PAAD-related inflammatory infiltration might simultaneously enhance tumor progression and metastasis19. Beyond the imbalance between inflammatory cell infiltration and the immunosuppressive phenotype in the tumor-immune microenvironment, obesity and diabetes are associated with increased inflammation and inhibited autophagy, which create a suitable environment for the tumorigenesis of PAAD through oxidative stress and metabolic impairments20.

Due to the interaction between immune-mediated inflammation and tumorigenesis, identifying whether the immune response influences the prognosis of cancer patients has become a research hotspot. Many prognosis-related biomarkers for carcinoma have been identified and used to create models that predict patient survival21,22,23,24. However, few studies have addressed an IRG signature for PAAD, let alone an immune-related prognostic model. In this study, we used The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) databases to screen out high-risk IRGs and create a novel risk-score signature and nomogram based on the IRGs for predicting the prognosis of PAAD patients. We also identified and comprehensively analyzed potential clinical therapeutic targets. Our findings may highlight the value of the IRG signature in predicting the prognosis of PAAD patients and reveal its potential ability to predict the prognosis of patients with liver hepatocellular carcinoma (LIHC).

Materials and methods

Data acquisition and processing

We downloaded the TCGA-PAAD and TCGA-LIHC datasets, including RNA sequencing data, raw clinical data and prognostic information, from the TCGA database (https://portal.gdc.cancer.gov/). Data from normal tissues in the GTEx database (https://gtexportal.org/) were obtained as a supplement. The gene expression data were converted to transcripts per million (TPM) format and log2 transformed. The remaining data were cleaned and batch corrected, with clinical information retained. Gene expression profiles and prognostic data of GSE2873525 and GSE6245226 were collected from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) and used as validation datasets.
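
The TPM conversion and log2 transformation described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study. It assumes that the input values are FPKM for a single sample and that a pseudocount of 1 is added before taking the logarithm; the function names, gene names and numbers are hypothetical.

```python
import math
from typing import Dict


def fpkm_to_tpm(fpkm: Dict[str, float]) -> Dict[str, float]:
    """Rescale one sample's FPKM values so that they sum to one million."""
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("FPKM values must have a positive sum")
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


def log2_tpm(tpm: Dict[str, float], pseudocount: float = 1.0) -> Dict[str, float]:
    """Apply log2(TPM + pseudocount) gene by gene (pseudocount is an assumption)."""
    return {gene: math.log2(value + pseudocount) for gene, value in tpm.items()}


# Hypothetical FPKM values for a single sample (not taken from TCGA or GTEx).
toy_fpkm = {"GENE_A": 10.0, "GENE_B": 30.0, "GENE_C": 60.0}
toy_tpm = fpkm_to_tpm(toy_fpkm)
toy_log_tpm = log2_tpm(toy_tpm)
```

Because TPM values always sum to one million within a sample, they are more directly comparable across samples than FPKM values, which is one common reason for this rescaling.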

We obtained the complete list of IRG names, totaling 2483, from the “Resources-Gene Lists” module of the Immunology Database and Analysis Portal (ImmPort)27 (https://www.immport.org/home).

DEGs & IRGs screening and intersecting

We first conducted a differential gene expression analysis to screen for genes differentially expressed between pancreatic tumors and normal tissues, based on the RNA sequencing dataset of TCGA-GTEx-PAAD. The log2(fold change) (log2(FC)) and adjusted p-value (P.adj) were calculated using R. Genes with |log2(FC)| > 1 and P.adj < 0.05 were considered significant DEGs. These were subsequently intersected with the IRGs above. The “ggplot2” package in R was used to visualize the results as volcano plots and a Venn diagram.
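
The screening and intersection step can be illustrated with a short sketch. It is written in Python for illustration only (the study itself used R), and the gene names and statistics below are hypothetical placeholders; only the two thresholds come from the text.

```python
from typing import Dict, Iterable, Set, Tuple


def screen_degs(
    stats: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 1.0,
    padj_cutoff: float = 0.05,
) -> Set[str]:
    """Keep genes with |log2(FC)| > lfc_cutoff and P.adj < padj_cutoff.

    `stats` maps a gene symbol to (log2 fold change, adjusted p-value).
    """
    return {
        gene
        for gene, (log2_fc, p_adj) in stats.items()
        if abs(log2_fc) > lfc_cutoff and p_adj < padj_cutoff
    }


def intersect_with_irgs(degs: Iterable[str], irgs: Iterable[str]) -> Set[str]:
    """Return the genes that are both DEGs and immune-related genes."""
    return set(degs) & set(irgs)


# Hypothetical differential expression results and IRG list.
toy_stats = {
    "GENE_A": (2.3, 0.001),   # up-regulated and significant
    "GENE_B": (-1.6, 0.020),  # down-regulated and significant
    "GENE_C": (0.4, 0.001),   # significant, but fold change too small
    "GENE_D": (3.1, 0.200),   # large fold change, but not significant
}
toy_irgs = ["GENE_A", "GENE_C", "GENE_E"]

degs = screen_degs(toy_stats)
overlap = intersect_with_irgs(degs, toy_irgs)
```

In this toy input, GENE_A and GENE_B pass both thresholds, and only GENE_A is also in the IRG list.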

Enrichment analysis for DEGs & IRGs

We performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) enrichment analyses for the intersecting genes using the “clusterProfiler” (version 3.14.3) package in R28, and the results were plotted using “ggplot2” (version 3.3.3). GO terms were analyzed in three categories: biological process (BP), cellular component (CC) and molecular function (MF). Terms with P.adj < 0.05 were considered statistically significant, and the results were visualized as cnetplots.

Construction of PAAD-related IRGs signature (PAAD-IRGS) for prognosis

Based on the gene analysis above, we obtained independent immune-related prognostic risk genes using least absolute shrinkage and selection operator (LASSO) regression analysis29, followed by univariate and multivariate Cox regression analysis for further identification. LASSO is a popular algorithm that is extensively utilized in medical studies30,31,32,33. Next, the Toil procedure from the University of California Santa Cruz (UCSC) Xena34 was used to analyze the difference in the expression of the genes identified above in unpaired samples of PAAD. Expression was assessed on a log scale as log2(TPM + 1). The diagnostic value of these genes was evaluated using receiver operating characteristic (ROC) curves.
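
As an illustration of how a diagnostic ROC analysis summarizes a single gene, the area under the curve (AUC) equals the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, with ties counted as one half. The sketch below computes this quantity directly in Python. It is not the R code used in the study, and the expression values are hypothetical.

```python
from typing import Sequence


def auc_from_expression(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """AUC as the fraction of tumor/normal pairs in which tumor > normal.

    Ties contribute 0.5. This is equivalent to the area under the empirical
    ROC curve when higher expression is taken to indicate tumor tissue.
    """
    if not tumor or not normal:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))


# Hypothetical log2(TPM + 1) values for one gene.
toy_tumor = [5.1, 6.3, 4.8, 7.0]
toy_normal = [3.2, 4.9, 2.7]
toy_auc = auc_from_expression(toy_tumor, toy_normal)
```

Under this definition, an AUC near 1 means the gene is almost always higher in tumor tissue, and an AUC near 0.5 means that expression barely separates the two groups.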

After this procedure, the optimal related IRGs were retained to establish the PAAD-IRGS. We compared the expression levels of these genes across pathologic stages and conducted KEGG and GO analyses specific to them. Based on the expression level (EXP) of each gene and its multivariate Cox regression coefficient (β), the immune-related risk score signature was calculated as follows35:

$$\mathrm{PAAD\text{-}IRGS}=\sum_{k=1}^{n}\mathrm{EXP}_{k}\times {\beta }_{k}.$$

Based on the risk score of each sample, the cohort was divided into two groups (low-risk with 0–50% vs high-risk with 50–100%). The performance of the classifier was assessed using ROC. Finally, we performed survival analyses of overall survival (OS) for single and combined genes using Kaplan–Meier and the log-rank test.
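
The risk score and the median split can be sketched as follows. The three coefficients are those reported for the final signature in the Results. Everything else is an assumption made for illustration: the sample identifiers, the expression values, and the choice to place a score exactly equal to the median in the low-risk group.

```python
from statistics import median
from typing import Dict

# Multivariate Cox coefficients of the final signature, as reported in the Results.
COEFFICIENTS = {"S100P": 0.132, "S100A2": 0.098, "MMP12": 0.095}


def risk_score(expression: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Sum of EXP_k * beta_k over the signature genes."""
    return sum(expression[gene] * beta for gene, beta in coefficients.items())


def split_by_median(scores: Dict[str, float]) -> Dict[str, str]:
    """Label samples above the median 'high' and all others 'low'.

    Assigning scores equal to the median to the low-risk group is an assumption.
    """
    cutoff = median(scores.values())
    return {sample: ("high" if value > cutoff else "low") for sample, value in scores.items()}


# Hypothetical expression values for four samples.
toy_expression = {
    "P1": {"S100P": 8.0, "S100A2": 5.0, "MMP12": 4.0},
    "P2": {"S100P": 2.0, "S100A2": 1.0, "MMP12": 3.0},
    "P3": {"S100P": 6.0, "S100A2": 4.0, "MMP12": 2.0},
    "P4": {"S100P": 10.0, "S100A2": 7.0, "MMP12": 6.0},
}
toy_scores = {s: risk_score(e, COEFFICIENTS) for s, e in toy_expression.items()}
toy_groups = split_by_median(toy_scores)
```

For sample P1, for example, the score is 8.0 × 0.132 + 5.0 × 0.098 + 4.0 × 0.095 = 1.926.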

Assessment of PAAD-IRGS and relevant clinical nomogram

The model's ability to predict 1- to 3-year OS was evaluated using time-dependent ROC and decision curve analysis (DCA). Next, clinicopathologic characteristics of patients from TCGA-PAAD were collected and analyzed using univariate and multivariate Cox regression analysis. Based on the clinical risk indicators (CRI) and PAAD-IRGS, we established a nomogram model to predict the 1- to 3-year OS probability of PAAD patients. The nomogram was calibrated and assessed using DCA to verify its accuracy and reliability. The predictive accuracy of the classical TNM stage, PAAD-IRGS, CRI and the nomogram was compared using the concordance index (C-Index).
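
To make the C-Index concrete, the sketch below implements a simplified version of Harrell's concordance index for right-censored survival data. It is an illustrative Python implementation and not the routine used in the study. Pairs with tied survival times are ignored for simplicity, and the toy follow-up times, event indicators and risk scores are hypothetical.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Simplified Harrell's C-index.

    A pair (i, j) is comparable when sample i had an observed event
    (events[i] == 1) and a strictly shorter follow-up time than sample j.
    The pair is concordant when sample i also has the higher risk score;
    tied risk scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored sample cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (months), event indicators and risk scores.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [2.5, 1.0, 1.2, 0.4]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the TNM stage, PAAD-IRGS, CRI and nomogram are compared.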

Validation and extended application of PAAD-IRGS

To validate the specificity and precision of PAAD-IRGS, we used GSE28735 and GSE62452, which contained sufficient gene expression and prognosis data, to conduct differential expression analysis, survival analysis, and evaluation of diagnostic and prognostic value and of applicability to clinical decision-making.

To assess the extended applicability of PAAD-IRGS, and considering disease category and histological homology, we selected TCGA-LIHC (n = 374) for further validation of the model. The expression levels of these genes were compared between tumor and normal tissues, and their individual and combined diagnostic ability was evaluated. According to the standard established above, the LIHC cohort was divided into low- (0–50%) and high-risk (50–100%) groups. Single-gene and combined-signature OS analyses were performed using Kaplan–Meier curves, followed by time-dependent ROC and DCA analysis. Similarly, we established a nomogram model to predict the 1- to 3-year OS probability of LIHC patients, based on the PAAD-IRGS and the CRI obtained from the TCGA-LIHC cohort through univariate and multivariate Cox regression analysis. Calibration and DCA were performed to verify the reliability and accuracy of the model. Then the classical TNM stage, PAAD-IRGS, CRI of TCGA-LIHC, and the combined nomogram were compared using the C-Index to assess their accuracy and clinical value for LIHC.

Furthermore, we expanded the application of PAAD-IRGS to predict 1–3 years disease-specific survival (DSS) and progression-free interval (PFI) of patients in TCGA-PAAD.

Immuno-correlation analysis and drug prediction of PAAD-IRGS

We analyzed the correlation between the PAAD-IRGS risk score and 24 immune-related cell types36 in PAAD using Spearman’s test37. Subsequently, survival analysis of several significant immune-related cells was conducted using the Tumor Immune Estimation Resource (TIMER), version 2.0 database38,39,40, to identify whether they were risk factors for PAAD. We then downloaded the immunophenoscore (IPS)41 data from The Cancer Immunome Atlas (TCIA) database (https://tcia.at/patients), which provides the results of comprehensive immunogenomic analyses of next-generation sequencing (NGS) data from TCGA42, to analyze the correlation between PAAD-IRGS and immune response in PAAD patients.

Relationships between the PAAD-IRGS risk score and the expression of three kinds of immunomodulators in PAAD, based on TCGA, were explored and visualized with heatmaps. Relevant drug prediction was then performed via the tumor-immune system interaction database (TISIDB)43 (http://cis.hku.hk/TISIDB/index.php), which integrates multiple heterogeneous data types. We searched the website with the gene symbols S100P, S100A2 and MMP12 and downloaded the relevant information in the "drug" module. A circle map with annotations was drawn accordingly.

Analysis of protein expression of the PAAD-IRGS

The Human Protein Atlas (HPA) database44, a spatial map of the human proteome (http://www.proteinatlas.org/humanproteome/pathology), was used to ascertain the physiological and pathological expression of S100P, S100A2 and MMP12. As a supplement, we used UALCAN (http://ualcan.path.uab.edu/index.html) to conduct protein-level analysis of S100P, S100A2 and MMP12. UALCAN is a comprehensive and interactive public resource for cancer OMICS data analysis45, and its protein-level analysis is based on the Clinical Proteomic Tumor Analysis Consortium (CPTAC) dataset46.

Statistical analysis

All statistical analyses were performed with R (version 3.6.3). Normally distributed variables were analyzed using the t-test and one-way ANOVA, and non-normally distributed variables using nonparametric tests. The log-rank test and Cox regression were used for survival analysis, and Pearson’s correlation and Spearman’s rank correlation tests for correlation analysis. P or P.adj < 0.05 was considered statistically significant. Correlation strength was defined as follows: 0.00–0.10 (negligible), 0.10–0.39 (weak), 0.40–0.69 (moderate), 0.70–0.89 (strong), 0.90–1.00 (very strong)47.
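
The correlation categories above can be expressed as a small helper. Two details are assumptions rather than statements from the text: the categories are applied to the absolute value of the coefficient, so that negative correlations are graded by magnitude, and the boundaries are placed at 0.10, 0.40, 0.70 and 0.90. The function name is hypothetical.

```python
def correlation_strength(r: float) -> str:
    """Map a correlation coefficient to the descriptive category used in the text.

    Assumes the category depends on |r| and that the cut points are
    0.10, 0.40, 0.70 and 0.90.
    """
    magnitude = abs(r)
    if magnitude > 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude < 0.10:
        return "negligible"
    if magnitude < 0.40:
        return "weak"
    if magnitude < 0.70:
        return "moderate"
    if magnitude < 0.90:
        return "strong"
    return "very strong"


# Two coefficients reported in the Results: LGALS9 (r = 0.674) and pDC (r = -0.348).
lgals9_category = correlation_strength(0.674)
pdc_category = correlation_strength(-0.348)
```

Under these assumptions, the correlation with LGALS9 is graded as moderate and the negative correlation with plasmacytoid dendritic cells as weak.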

Results

The study design for this work is shown in Fig. 1.

Figure 1

Study design flow chart. TCGA the cancer genome atlas, PAAD pancreatic adenocarcinoma, LIHC liver hepatocellular carcinoma, ROC receiver operating characteristic curve. This cover has been designed using images from Freepik.com.

DEGs & IRGs analysis

A total of 178 PAAD patients with gene expression and prognostic information and 4 matched adjacent normal samples were included in the training cohort. After removing null values, 25,597 gene IDs were analyzed, from which we obtained 539 differentially expressed genes that met the cut-off criterion of |log2(FC)| > 1 and P.adj < 0.05 in PAAD (236 up-regulated and 303 down-regulated) (Fig. 2A). Through the intersection of 490 DEGs and 1744 IRGs, 49 differentially expressed IRGs in PAAD were screened out (Fig. 2B).

Figure 2

Screening of differentially expressed genes and immune-related genes related to pancreatic adenocarcinoma. (A) Volcano plot of 539 DEGs; (B) Venn diagram of intersection of DEGs and IRGs. (C) KEGG pathways analysis of genes in DEGs & IRGs; (D) GO analysis of genes in DEGs & IRGs; (E) Volcano plot of 49 genes in DEGs & IRGs. DEGs differentially expressed genes, IRGs immune-related genes, KEGG Kyoto Encyclopedia of Genes and Genomes, GO Gene Ontology.

Enrichment analysis

The KEGG pathways most associated with immunity were natural killer cell-mediated cytotoxicity (P < 0.001), the B cell receptor signaling pathway (P < 0.001) and the chemokine signaling pathway (P < 0.05) (Fig. 2C). In the GO analysis, regulation of the immune effector process, cell killing and humoral immune response in the biological process (BP) module (all P < 0.001) were associated with immunity, as were major histocompatibility complex (MHC) protein binding and cytokine receptor binding in the molecular function (MF) module (Fig. 2D). The overlapping genes are highlighted in the volcano plot (Fig. 2E).

Construction and assessment of PAAD-IRGS

We further analyzed the genes identified above to assess the potential diagnostic and prognostic value of IRGs in PAAD. Based on LASSO regression analysis, four prognostic risk biomarkers were identified (high expression of S100P, S100A2, and MMP12 was associated with poor prognosis, while low expression of DEFA5 was associated with better prognosis) (Fig. 3A). S100P, S100A2 and MMP12 were expressed at higher levels in tumor tissues than in normal tissues (P < 0.001), while the opposite was true for DEFA5 (P < 0.05) (Fig. 3B). The areas under the curve (AUCs) of S100P, S100A2, and MMP12 were 0.971, 0.968, and 0.981, respectively, indicating excellent diagnostic value. However, DEFA5 was considered an inefficient biomarker for diagnosis (AUC = 0.438) (Fig. 3C). Subsequent univariate and multivariate Cox regression analyses of the four genes excluded DEFA5 (P = 0.164) (Fig. 3D). The final PAAD-IRGS model therefore comprised S100P, S100A2 and MMP12. We plugged the corresponding regression coefficients into the equation to complete the establishment of PAAD-IRGS: PAAD-IRGS = EXP(S100P) × 0.132 + EXP(S100A2) × 0.098 + EXP(MMP12) × 0.095.

Figure 3

Establishment of the IRG signature (IRGS). (A) Ten-fold cross-validation for tuning parameter selection in the LASSO regression model and risk analysis of four immune-related genes in patients with PAAD; (B) differential expression of four IRGs between tumor and normal tissues of patients with PAAD; (C) diagnostic value of four IRGs for patients with PAAD; (D) univariate and multivariate Cox analysis of four IRGs, shown as a forest plot. *P < 0.05, **P < 0.01, ***P < 0.001.

Furthermore, we performed differential expression analysis of the PAAD-IRGS genes across pathologic stages. Using the Gene Expression Profiling Interactive Analysis (GEPIA) database48, a statistically significant difference in S100P expression was observed across the pathologic stages of TCGA-PAAD (P < 0.001) (Fig. 4A). After single-gene correlation analysis of these three genes, we obtained 79 co-correlated genes (Fig. 4B). In further enrichment analysis, the KEGG pathways involved were ECM-receptor interaction, regulation of actin cytoskeleton, the p53 signaling pathway, focal adhesion and pancreatic cancer, and the GO terms focused on cell-membrane organization and connection (Fig. 4C).

Figure 4

A comprehensive evaluation of IRGS. (A) Expression of 3 signature genes in different pathologic stages of PAAD; (B) Venn diagram of intersection of enrichment analysis of 3 signature genes; (C) GO and KEGG analysis of 3 signature genes; (D) diagnostic value of IRGS in PAAD; (E) single-gene survival analysis of OS was shown in Kaplan–Meier curves respectively; (F) Kaplan–Meier curves show that OS was significantly different between the low- and high-risk groups in TCGA-PAAD. OS overall survival; *P < 0.05, **P < 0.01, ***P < 0.001.

The model showed better diagnostic capability than the individual genes, with an AUC of 0.993 (95% confidence interval (CI) = 0.987–0.998) (Fig. 4D). In single-gene survival analysis, patients in the S100A2 high-expression group had worse OS than patients in the S100A2 low-expression group (hazard ratio (HR) = 1.62, 95% CI = 1.07–2.46, P = 0.023). However, there was no statistically significant difference between the low- and high-expression groups of S100P or MMP12 (Fig. 4E). Patients in the PAAD-IRGS high-risk group had much worse OS than patients in the low-risk group (HR = 2.21, 95% CI = 1.45–3.39, P < 0.001) (Fig. 4F).

Establishment of PAAD-IRGS based prognosis model

A total of 182 TCGA-PAAD patients were included in the prognostic analysis, with the baseline characteristics shown in Table 1. Time-dependent ROC analysis was conducted to assess the accuracy of PAAD-IRGS for predicting OS in PAAD patients. It showed above-average performance at 1 (AUC = 0.679), 2 (AUC = 0.696), and 3 years (AUC = 0.713) (Fig. 5A). DCA showed that the model had good clinical utility (Fig. 5B). T3&T4 stage (P = 0.030), N1 stage (P = 0.004), pathological stage II (P = 0.033), radiation therapy (P = 0.013), primary therapy outcome of PR&CR (P < 0.001), R1&R2 resection (P = 0.028), histological grade G2 (P = 0.047)/G3&G4 (P = 0.008), non-head of pancreas neoplasm (P = 0.004) and PAAD-IRGS (P < 0.001) were significantly correlated with OS. Radiation therapy (HR = 0.437, 95%CI = 0.228–0.835, P = 0.012), primary therapy outcome of PR&CR (HR = 0.547, 95%CI = 0.324–0.923, P = 0.024), R1&R2 resection (HR = 1.896, 95%CI = 1.087–3.308, P = 0.024) and PAAD-IRGS (HR = 2.312, 95%CI = 1.245–4.294, P = 0.008) were independent factors impacting the OS of patients with PAAD (Table 2). Based on the above analysis, a nomogram incorporating PAAD-IRGS and multiple clinicopathological characteristics was plotted (Fig. 5C). The C-Index values of TNM stage, PAAD-IRGS, nomogram (clinical indicators only), and nomogram + IRGS were 0.567, 0.639, 0.706 and 0.723, respectively (Table 3). Additionally, the nomogram calibration curves (Fig. 5D) showed good predictive accuracy of the model, and DCA (Fig. 5E) supported its clinical utility.

Table 1 Baseline characteristics of patients (TCGA-PAAD).
Figure 5

Evaluation of PAAD-IRGS and establishment and assessment of relevant nomograms. (A) The time-dependent ROC curve of the PAAD-IRGS for predicting 1-, 2-, and 3-year OS; (B) decision curve analysis for evaluating the PAAD-IRGS; (C) a PAAD-IRGS-based nomogram incorporating 8 clinical components for predicting 1-, 2-, and 3-year OS of PAAD; (D) nomogram calibration curves for 1-, 2-, and 3-year OS; (E) decision curve analysis for evaluating the net benefits of the nomogram at 1, 2, and 3 years. *P < 0.05, **P < 0.01, ***P < 0.001.

Table 2 The univariate and multivariate analysis for the OS (TCGA-PAAD).
Table 3 The C-Index values of TNM-stage, PAAD-IRGS, nomogram and nomogram + IRGS in different cohorts.

Validation and extension of PAAD-IRGS

For further validation of the reliability of PAAD-IRGS, we employed two datasets from the GEO database. Differential expression, survival, diagnostic value and prognostic value analyses and DCA were performed in both datasets. In GSE28735, the three genes had higher expression in tumor tissues than in normal tissues (P < 0.001) (Fig. 6A). Patients in the PAAD-IRGS high-risk group had worse OS than those in the low-risk group (HR = 2.35, 95%CI = 1.08–5.14, P = 0.032) (Fig. 6B). Consistent with the results above, although S100P (AUC = 0.929), S100A2 (AUC = 0.764) and MMP12 (AUC = 0.828) (Fig. 6C) each showed considerable diagnostic value for PAAD, PAAD-IRGS had the best diagnostic ability (AUC = 0.943, 95%CI = 0.896–0.991) (Fig. 6D). In addition, time-dependent ROC showed that the model had an above-average ability to predict 1- (AUC = 0.671), 2- (AUC = 0.600), and 3-year OS (AUC = 0.866) (Fig. 6E). The model also had an acceptable net benefit based on DCA (C-Index = 0.644, 95%CI = 0.598–0.690) (Fig. 6F). Similar results for differential expression (P < 0.001) (Fig. 7A) and OS probability (HR = 1.84, 95%CI = 1.02–3.32, P = 0.044) (Fig. 7B) were obtained in GSE62452, as were the independent diagnostic values of S100P (AUC = 0.865), S100A2 (AUC = 0.745) and MMP12 (AUC = 0.811) (Fig. 7C) and of all three combined (AUC = 0.885, 95%CI = 0.828–0.943) (Fig. 7D). The corresponding ROC analysis showed an above-average performance in predicting 1- (AUC = 0.536), 2- (AUC = 0.672), and 3-year prognosis (AUC = 0.861) (Fig. 7E). Although the 1-year net benefit of prognostic prediction was not satisfactory, the 2- and 3-year predictions showed a much better net benefit (C-Index = 0.580, 95%CI = 0.531–0.629) (Fig. 7F).

Figure 6

Validation of PAAD-IRGS with GSE28735. (A) Expression level of 3 IRGs in GSE28735 cohort; (B) Kaplan–Meier curves show a better OS in the low-risk group than the high-risk group; (C) diagnostic value of 3 IRGs for PAAD patients in GSE28735 cohort; (D) diagnostic value of PAAD-IRGS in GSE28735 cohort; (E) time-dependent ROC curve analysis of the PAAD-IRGS at 1, 2, and 3 years in GSE28735 cohort; (F) decision curve analysis for evaluating the net benefits of PAAD-IRGS in GSE28735 cohort. *P < 0.05, **P < 0.01, ***P < 0.001.

Figure 7

Validation of PAAD-IRGS with GSE62452. (A) Expression level of 3 IRGs in GSE62452 cohort; (B) Kaplan–Meier curves show a better OS in the low-risk group than the high-risk group; (C) diagnostic value of 3 IRGs for PAAD patients in GSE62452 cohort; (D) diagnostic value of PAAD-IRGS in GSE62452 cohort; (E) time-dependent ROC curve analysis of the PAAD-IRGS at 1, 2, and 3 years in GSE62452 cohort; (F) decision curve analysis for evaluating the net benefits of PAAD-IRGS in GSE62452 cohort. *P < 0.05, **P < 0.01, ***P < 0.001.

Hepatobiliary and pancreatic carcinomas are often grouped as a single clinical disease category because of their close anatomical relationship and functional interdependence. To verify the universal applicability of the PAAD-IRGS, the TCGA-LIHC data were used to validate the findings. S100A2, S100P and MMP12 were all overexpressed in tumor tissues based on paired (P < 0.01) (Fig. 8A) and unpaired expression analysis (P < 0.001) (Fig. 8B). The diagnostic ROC curves also showed their independent and combined diagnostic value for LIHC (S100P: AUC = 0.739; S100A2: AUC = 0.723; MMP12: AUC = 0.773; model: AUC = 0.812, 95%CI = 0.767–0.857) (Fig. 8C,D). LIHC patients had worse OS in the S100P (HR = 1.43, 95% CI = 1.01–2.02, P = 0.44)/S100A2 (HR = 1.81, 95% CI = 1.27–2.57, P = 0.001)/MMP12 (HR = 1.58, 95% CI = 1.11–2.23, P = 0.01) high-expression groups (Fig. 8E) and in the PAAD-IRGS high-risk group (HR = 1.83, 95% CI = 1.29–2.60, P = 0.001) (Fig. 8F). PAAD-IRGS also had considerable prognostic value for LIHC patients according to ROC analysis (1-year: AUC = 0.651; 2-year: AUC = 0.612; 3-year: AUC = 0.597) (Fig. 8G) and DCA (Fig. 8H). Furthermore, we extracted the baseline characteristics of TCGA-LIHC shown in Table 4 and conducted univariate and multivariate Cox regression analysis to establish a nomogram based on PAAD-IRGS and multiple clinicopathologic factors (Fig. 8I). T3&T4 stage (P < 0.001), M1 stage (P = 0.017), pathological stage III&IV (P < 0.001), tumor-bearing status (P < 0.001) and PAAD-IRGS (P < 0.001) were significantly correlated with OS. Tumor-bearing status (HR = 1.992, 95%CI = 1.246–3.185, P = 0.004) and PAAD-IRGS (HR = 2.180, 95%CI = 1.180–4.026, P = 0.013) were independent factors impacting the OS of patients with LIHC (Table 5). Nomogram calibration curves (Fig. 8J) showed good predictive accuracy of the model, and DCA (Fig. 8K) confirmed the clinical utility of the nomogram. Consistent with the nomogram of PAAD, the comprehensive nomogram of LIHC showed better accuracy (C-Index = 0.666, 95%CI = 0.630–0.701) than any other indicator (Table 3).

Figure 8

Validation of PAAD-IRGS with TCGA-LIHC. (A) Paired comparison of 3 IRGs expression levels in TCGA-LIHC; (B) unpaired comparison of 3 IRGs expression levels in TCGA-LIHC by including the relevant normal tissues of the GTEx database as controls; (C) diagnostic value of 3 IRGs for TCGA-LIHC patients; (D) diagnostic value of PAAD-IRGS for TCGA-LIHC patients; (E) single-gene survival analysis of OS for TCGA-LIHC; (F) PAAD-IRGS survival analysis of OS for TCGA-LIHC; (G) time-dependent ROC curve analysis of the PAAD-IRGS at 1, 2, and 3 years for TCGA-LIHC; (H) decision curve analysis for evaluating the net benefits of PAAD-IRGS for TCGA-LIHC; (I) a PAAD-IRGS-based nomogram incorporating 4 clinical components for predicting 1-, 2-, and 3-year OS of TCGA-LIHC; (J) nomogram calibration curves for 1-, 2-, and 3-year OS; (K) decision curve analysis for evaluating the net benefits of the nomogram at 1, 2, and 3 years. *P < 0.05, **P < 0.01, ***P < 0.001.

Table 4 Baseline characteristics of patients (TCGA-LIHC).
Table 5 The univariate and multivariate analysis for the OS (TCGA-LIHC).

As a further extension of PAAD-IRGS, we found that it performed well in predicting disease-specific survival (DSS) and progression-free interval (PFI) of PAAD patients. Patients in the PAAD-IRGS high-risk group had a significantly worse DSS (HR = 2.54, 95%CI = 1.55–4.15, P < 0.001) (Fig. 9A). Time-dependent ROC showed its robust prognostic predictive value (1-year: AUC = 0.730; 2-year: AUC = 0.724; 3-year: AUC = 0.749) and DCA further validated its clinical applicability (C-Index = 0.680, 95%CI = 0.649–0.711) (Fig. 9B,C). We constructed a comprehensive nomogram composed of PAAD-IRGS and clinicopathological factors (Table 6) (Fig. 9D). Its accuracy and efficiency were evaluated (C-Index = 0.775, 95%CI = 0.742–0.808, Table 3) (Fig. 9E,F). Similarly, patients in the PAAD-IRGS high-risk group had a significantly worse PFI (HR = 2.28, 95%CI = 1.53–3.40, P < 0.001) (Fig. 10A). The model had good clinical utility (C-Index = 0.649, 95%CI = 0.618–0.681) (Fig. 10B) and predictive value for prognosis (1-year: AUC = 0.666; 2-year: AUC = 0.723; 3-year: AUC = 0.730) (Fig. 10C). The nomogram based on this model is shown in Fig. 10D, using the variables summarized in Table 7. The validation analysis results are given in Table 3 and Fig. 10E,F (C-Index = 0.742, 95%CI = 0.712–0.771).

Figure 9

Establishment and assessment of PAAD-IRGS-based nomograms for DSS in TCGA-PAAD. (A) Single-gene and PAAD-IRGS survival analysis of DSS in PAAD; (B) the time-dependent ROC curve of the PAAD-IRGS for predicting 1-, 2-, and 3-year DSS; (C) decision curve analysis for evaluating the PAAD-IRGS; (D) a PAAD-IRGS-based nomogram incorporating 8 clinical components for predicting 1-, 2-, and 3-year DSS of PAAD; (E) nomogram calibration curves for 1-, 2-, and 3-year DSS; (F) decision curve analysis for evaluating the net benefits of the nomogram at 1, 2, and 3 years. DSS, disease-specific survival; *P < 0.05, **P < 0.01, ***P < 0.001.

Table 6 The univariate and multivariate analysis for the DSS (TCGA-PAAD).
Figure 10

Establishment and assessment of PAAD-IRGS-based nomograms for PFI in TCGA-PAAD. (A) Single-gene and PAAD-IRGS survival analysis of PFI in PAAD; (B) the time-dependent ROC curve of the PAAD-IRGS for predicting 1-, 2-, and 3-year PFI; (C) decision curve analysis for evaluating the PAAD-IRGS; (D) a PAAD-IRGS-based nomogram incorporating 7 clinical components for predicting 1-, 2-, and 3-year PFI of PAAD; (E) nomogram calibration curves for 1-, 2-, and 3-year PFI; (F) decision curve analysis for evaluating the net benefits of the nomogram at 1, 2, and 3 years. PFI progression-free interval; *P < 0.05, **P < 0.01, ***P < 0.001.

Table 7 The univariate and multivariate analysis for the PFI (TCGA-PAAD).

Immunity associated analysis of PAAD-IRGS

Tumor-infiltrating immunocytes (TIICs) play an important role in the complex tumor-immune microenvironment and have been shown to influence the progression of various tumors49,50. We therefore investigated the relationship between PAAD-IRGS and TIICs in PAAD. The correlations with 24 immune-related cell types are shown in a lollipop plot (Fig. 11A). PAAD-IRGS was significantly positively correlated with NK CD56bright cells (r = 0.333, P < 0.001) and Th2 cells (r = 0.367, P < 0.001) and negatively correlated with plasmacytoid dendritic cells (pDC) (r = −0.348, P < 0.001) and follicular helper T cells (TFH) (r = −0.344, P < 0.001). However, only B cell, CD4+ T cell and NK cell infiltration levels were correlated with the OS of PAAD patients. Patients with a high B cell (HR = 0.776, P = 0.0147) or NK cell (HR = 0.788, P = 0.0226) infiltration level had better OS, while a high CD4+ T cell + Th2 cell infiltration level was associated with worse OS (HR = 1.36, P = 0.00337) (Fig. 11B). There was no statistically significant difference in IPS between the high-risk and low-risk groups in patients with PD-1 blocker/CTLA4 blocker/CTLA4&PD-1 blocker or without immune-blocker (Fig. 11C).

Figure 11

Analysis of correlation between PAAD-IRGS and immune-related cells and relevant immunotherapy. (A) Lollipop plot of the correlation between PAAD-IRGS and infiltrating immune cells in TCGA-PAAD; (B) survival analysis of immune-related cell infiltration in PAAD; (C) analysis of immunotherapeutic efficiency based on PAAD-IRGS in TCGA-PAAD. IPS immunophenoscore. *P < 0.05, **P < 0.01, ***P < 0.001.

As a supplement, we conducted correlation analysis between immunomodulators and PAAD-IRGS, which was visualized as heatmaps (Figs. 12A, 13A, 14A). For immune-inhibitors, PAAD-IRGS was positively correlated with TGFB1 (r = 0.372, P < 0.001), LGALS9 (r = 0.674, P < 0.001), IL10RB (r = 0.555, P < 0.001) and CD274 (r = 0.227, P = 0.002), and negatively correlated with KDR (r = −0.330, P < 0.001), CD160 (r = −0.358, P < 0.001), BTLA (r = −0.224, P = 0.003) and ADORA2A (r = −0.243, P = 0.001) (Fig. 12B). For MHC molecules, HLA-B (r = 0.271, P < 0.001), HLA-C (r = 0.229, P = 0.002), B2M (r = 0.482, P < 0.001), HLA-A (r = 0.357, P < 0.001), TAP2 (r = 0.302, P < 0.001), TAPBP (r = 0.330, P < 0.001), HLA-F (r = 0.261, P < 0.001) and TAP1 (r = 0.324, P < 0.001) were positively correlated with PAAD-IRGS (Fig. 13B). As for immune-stimulators, 6 genes were negatively correlated with PAAD-IRGS (Fig. 14B), while 15 genes were positively correlated (Fig. 14C).

Figure 12

Analysis of correlation between PAAD-IRGS and immuno-inhibitors. (A) Immune-inhibitor genes—PAAD-IRGS heatmap; (B) correlation analysis of immune-inhibitor genes and PAAD-IRGS. *P < 0.05, **P < 0.01, ***P < 0.001.

Figure 13

Analysis of correlation between PAAD-IRGS and MHC molecule. (A) MHC molecule genes—PAAD-IRGS heatmap; (B) correlation analysis of MHC molecule genes and PAAD-IRGS. MHC major histocompatibility complex; *P < 0.05, **P < 0.01, ***P < 0.001.

Figure 14

Analysis of correlation between PAAD-IRGS and immuno-stimulators. (A) Immune-stimulator genes and PAAD-IRGS heatmap; (B) correlation analysis of immune-stimulator genes and PAAD-IRGS (negative); (C) correlation analysis of immune-stimulator genes and PAAD-IRGS (positive). *P < 0.05, **P < 0.01, ***P < 0.001.

PAAD-IRGS related drugs

TISIDB is a web portal for tumor and immune system interaction that integrates genomics, transcriptomics, and clinical data from TCGA with mechanism and drug information from public databases. We could obtain only potential drugs associated with PAAD-IRGS, which are shown in a network diagram (Fig. 15). Currently, drugs targeting the PAAD-IRGS genes (S100P, S100A2 and MMP12) remain in the experimental stage, and effective targeted drugs for pancreatic cancer are still lacking.

Figure 15

Prediction of PAAD-IRGS-associated drugs. The network plot shows several potential drugs targeting the IRGs.

Analysis of protein expression of the PAAD-IRGS

We obtained the protein expression patterns of S100P and S100A2 in different cancers from the HPA database. Most pancreatic (83.3%) and liver (54.5%) cancers showed moderate to intense cytoplasmic and nuclear staining for S100P (Fig. S1A). Immunohistochemistry (IHC) results also confirmed that S100P was more highly expressed in PAAD and LIHC than in the corresponding normal tissues (Fig. S1B). Although the level of S100A2 protein expression was lower than that of S100P (Fig. S2A), we still observed moderate S100A2 staining in PAAD and LIHC, stronger than in the corresponding normal tissues (Fig. S2B). Because information on MMP12 was absent from the HPA database, we conducted further verification using the UALCAN database. Consistently, the protein expression of S100P was higher in PAAD and LIHC than in the corresponding normal tissues (P < 0.001) (Fig. S3A), as was that of MMP12 (P < 0.001) (Fig. S3B). Although data for LIHC were absent, the protein expression of S100A2 was higher in PAAD than in normal tissues (P < 0.01) (Fig. S3C).

Discussion

Although pancreatic cancer is still one of the leading causes of cancer-related death worldwide, some improvements in patient outcomes have been made due to advancements in therapeutics51. However, because there are no obvious clinical symptoms in the early stage, pancreatic cancer is usually advanced at diagnosis. In addition, the high mortality of PAAD seems to be inextricably associated with its suppressed immune microenvironment and a significant decrease in T cell infiltration in the tumor52. Although immunotherapy has revolutionized the cancer treatment model, PAAD patients rarely respond to these therapies due to poor activation and infiltration of T cells in the tumor-immunity microenvironment (TIME). Recent research has revealed potential epigenetic-transcriptional mechanisms by which tumor cells remodel their TIME and suggested EGFR inhibitors as potential immunotherapy sensitizers in PAAD53. Intra-tumoral IFN-γ-producing Th22 cells were reported to be associated with TNM staging and the worst outcomes in PAAD54. γδ T cells were also considered to promote pancreatic oncogenesis by restraining αβ T cell activation55. Each T cell subpopulation secretes different cytokines and chemokines that modulate the immune response in synergistic and opposing ways56. Additionally, expansion of immunosuppressive B cells induced by IL-1β might promote PAAD57, and many extracellular matrix (ECM) components, including collagen, growth factors, cytokines, chemokines, and cancer-associated fibroblasts (CAFs), play vital roles in tumor progression58. All tumor-immunity components in the TIME interact continuously, constructing a complex stroma-tumor crosstalk network. Due to the complexity of tumor-immunity mechanisms, there is still no effective way to predict prognosis in clinical practice. Our study aimed to discover immune-related biomarkers and establish a robust model to predict prognosis in PAAD patients.

The TCGA-PAAD dataset was used to screen for potentially immune-related DEGs through differential expression analysis and intersection. GO and KEGG enrichment analyses were also performed and confirmed that these genes were mainly involved in immune-related pathways (Fig. 2D). Furthermore, we narrowed down the results by Lasso regression analysis and finally obtained three key IRGs through univariate and multivariate Cox regression analyses. The PAAD-IRGS, comprising S100P, S100A2 and MMP12, had outstanding diagnostic value (Fig. 4D) and accurately predicted the prognosis of PAAD patients (Fig. 5A,B). We additionally performed a secondary enrichment analysis on PAAD-IRGS, which revealed that this model was associated with ECM and cell-membrane junction pathways as well as immune-related pathways (Fig. 4C).

Among the three genes, S100P, a 95-amino-acid protein belonging to the S100 family, has been regarded as a promising diagnostic59 and prognostic biomarker60 for pancreatic cancer, with a potential mechanism of regulating invasion into the lymphatic endothelial monolayer61, which is consistent with our results. S100A2, another member of the S100 family, was reported as a prognostic biomarker involved in immune infiltration and immunotherapy response prediction in pancreatic cancer62, which matches our findings. MMP12, a member of the matrix metalloproteinase family, encodes an extracellular matrix-related protein that participates in the epithelial-mesenchymal transition (EMT), a strictly programmed shift that plays a crucial role in tumor invasion and metastasis63. MMP12 was also revealed to be a potential diagnostic biomarker for pancreatic carcinoma, and its up-regulation was associated with a poor prognosis64. These genes have been shown to be closely correlated with different cancers, especially with the diagnosis and prognosis of PAAD, a finding we also made in this work. Although many diagnostic or prognostic biomarkers, and to some extent predictive models, have been identified in recent studies60,65,66,67,68, we discovered three IRGs with high specificity and integrated them to establish a novel prognostic model for PAAD. Compared with other models, our model showed remarkable performance in both diagnosis and prognosis prediction in PAAD patients.

In our study, patients in the PAAD-IRGS high-risk group had a significantly worse OS than those in the low-risk group (Fig. 4F), indicating that the PAAD-IRGS score may be an independent risk factor when evaluating the prognosis of PAAD patients. Additionally, time-dependent ROC and DCA results (Fig. 5A,B) showed that PAAD-IRGS performed well in predicting prognosis. The nomogram integrating PAAD-IRGS and multiple clinicopathological variables showed better accuracy and reliability than any single variable (Table 3).
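As an illustration of how a three-gene signature can separate patients into high-risk and low-risk groups, the following Python sketch computes a risk score as a weighted sum of gene expression values and splits patients at the median score. The coefficients shown are placeholders and are not the fitted Cox coefficients of the PAAD-IRGS; the median cutoff and the function names (`risk_score`, `assign_groups`) are likewise assumptions made for this example.

```python
from statistics import median

# Placeholder weights for illustration only, NOT the coefficients fitted in this study.
HYPOTHETICAL_COEFS = {"S100P": 0.10, "S100A2": 0.15, "MMP12": 0.20}


def risk_score(expression, coefs=HYPOTHETICAL_COEFS):
    """Weighted sum of expression values over the signature genes."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def assign_groups(scores):
    """Label each score 'high' if above the cohort median, else 'low' (assumed cutoff)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Toy expression profiles for four hypothetical patients.
patients = [
    {"S100P": 2.0, "S100A2": 1.0, "MMP12": 0.5},
    {"S100P": 6.5, "S100A2": 4.0, "MMP12": 3.0},
    {"S100P": 3.0, "S100A2": 2.5, "MMP12": 1.0},
    {"S100P": 8.0, "S100A2": 5.5, "MMP12": 4.5},
]
scores = [risk_score(p) for p in patients]
groups = assign_groups(scores)
for s, g in zip(scores, groups):
    print(f"score = {s:.2f}, group = {g}")
```

In a survival analysis, the resulting group labels would then be compared with respect to OS, for example with Kaplan-Meier curves and a log-rank test.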

We not only evaluated and validated the PAAD-IRGS using two pancreatic cancer datasets from GEO, but also investigated its application to hepatocellular carcinoma. Hepatobiliary and pancreatic diseases are often classified into the same category since they are anatomically and functionally linked. Although the cholangiocarcinoma dataset of TCGA was discarded due to its small sample size, we found that the PAAD-IRGS had excellent diagnostic and prognostic value in LIHC patients. We also combined relevant clinicopathological variables with PAAD-IRGS to construct a comprehensive nomogram model, which showed good accuracy and robustness. Based on these results, we speculate that the three genes may participate, partially or collectively, in the oncogenesis, progression and metastasis of LIHC and PAAD; however, this needs further exploration. We also examined the use of PAAD-IRGS to predict DSS and PFI in patients with PAAD, and the results of PAAD-IRGS and the relevant prognostic model were encouraging. Unlike other biomarkers that only had diagnostic value, PAAD-IRGS had both diagnostic and prognostic capability with high accuracy. Several multi-gene prognostic models have been established and reported67,68. Compared with them, our model had outstanding general applicability with high accuracy and stability. Compared with the miRNA- or lncRNA-related signatures65,69,70,71, our PAAD-IRGS was more stable and convincing; compared with the previously reported multiple-gene signatures9,68, necroptosis-related gene signature72 and m6A-related gene signatures73,74, our PAAD-IRGS was new and more versatile, with outstanding performance. It can be applied to the prognostic prediction of multiple cancers with different prognostic parameters. Its good diagnostic ability for various cancers and its relationship with tumor immunity make it promising for further research.

This study has several limitations. Firstly, batch effects and differences in sample sizes, which are difficult to eliminate completely, may have affected the results. Secondly, although the prognostic value of the PAAD-IRGS was evaluated in multiple datasets, large-scale clinical research is still necessary for further validation. Thirdly, we conducted correlation analyses between PAAD-IRGS and immune-related cells/immunomodulators and identified some potential immune-related targets. However, the underlying mechanisms and pathways need further investigation and experimental validation.

In conclusion, our study established a novel prognostic model comprising three genes with high specificity for predicting prognosis in patients with PAAD. This model demonstrated excellent performance in both diagnosis and prognosis prediction. Since PAAD-IRGS can be generalized, it may be a beneficial predictive model in clinical practice.