Data source and processing
Expression data and the corresponding clinical follow-up information for the TCGA-PAAD data set were downloaded from the UCSC genome browser database. The GSE62452 microarray data set, which includes survival time, was selected from the Gene Expression Omnibus (GEO) database. GSE62452 contains microarray gene-expression profiles of 69 pancreatic tumors and 61 adjacent non-tumor tissues from patients with pancreatic ductal adenocarcinoma ([HuGene-1_0-st] Affymetrix Human Gene 1.0 ST Array [transcript (gene) version]). Samples lacking clinical follow-up information were removed, and when a gene symbol had multiple expression values, their median was used.
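The median-collapsing step can be illustrated with a minimal Python sketch. The function name `collapse_by_symbol` and the toy values are hypothetical and are not taken from the study; the sketch only assumes that each row carries a gene symbol and one expression value per sample.

```python
from collections import defaultdict
from statistics import median


def collapse_by_symbol(rows):
    """Collapse rows that share a gene symbol to their per-sample median.

    rows: iterable of (gene_symbol, [expression value per sample]).
    Returns {gene_symbol: [median value per sample]}.
    """
    grouped = defaultdict(list)
    for symbol, values in rows:
        grouped[symbol].append(values)

    collapsed = {}
    for symbol, value_rows in grouped.items():
        # zip(*value_rows) walks the samples column by column
        collapsed[symbol] = [median(column) for column in zip(*value_rows)]
    return collapsed


# Hypothetical toy values (three rows map to ALOX5, one to CISD1)
probe_rows = [
    ("ALOX5", [5.0, 6.0, 7.0]),
    ("ALOX5", [7.0, 8.0, 9.0]),
    ("ALOX5", [6.0, 6.5, 10.0]),
    ("CISD1", [3.0, 4.0, 5.0]),
]
collapsed = collapse_by_symbol(probe_rows)
```

Genes represented by a single row pass through unchanged, so the step is safe to apply to the whole expression matrix.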
Molecular subtype identification using the non-negative matrix factorization algorithm
First, 60 ferroptosis-related genes were retrieved from the literature [15,16,17,18] (Table 1). Of these, 58 genes with expression data in the TCGA-PAAD data set were retained, and the PAAD samples were clustered by non-negative matrix factorization (NMF). The standard "Lee" algorithm was selected for NMF, and ten iterations were performed. The cluster number ‘k’ was varied from 2 to 10, the average silhouette width of the consensus membership matrix was determined with the R package "NMF", and the samples were divided into three clusters.
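For readers unfamiliar with NMF, the following Python sketch shows the factorization idea on a toy matrix. It assumes that the "Lee" option corresponds to the Lee and Seung multiplicative updates for the squared Euclidean error; it does not reproduce the R package "NMF", the consensus clustering, or the silhouette-based choice of k. All function names and the toy data are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius distance between V and W @ H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf_lee(v, k, n_iter=300, seed=0, eps=1e-9):
    """Factor a non-negative genes x samples matrix V into W (genes x k) and H (k x samples)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h


def assign_clusters(w, h):
    """Assign each sample to the component with the largest scaled coefficient."""
    # Scale each row of H by the column sum of W so components are comparable
    scale = [sum(row[a] for row in w) for a in range(len(h))]
    return [max(range(len(h)), key=lambda a: h[a][j] * scale[a])
            for j in range(len(h[0]))]


# Hypothetical toy matrix: 4 genes x 6 samples with two obvious sample groups
v = [
    [5.0, 5.0, 5.0, 0.1, 0.1, 0.1],
    [5.0, 5.0, 5.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 5.0, 5.0, 5.0],
    [0.1, 0.1, 0.1, 5.0, 5.0, 5.0],
]
w, h = nmf_lee(v, k=2)
labels = assign_clusters(w, h)
```

In the toy example the first three samples and the last three samples fall into different clusters. On real data the factorization would be repeated from several random starts and the runs combined into a consensus matrix before k is chosen.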
Table 1 The clinical statistical information of the samples
Identification and functional analysis of differentially expressed genes (DEGs)
The limma_3.42.2 package [19] was used to identify the differentially expressed genes (DEGs) for cluster 1, cluster 2, and cluster 3 of the molecular subtypes, using the thresholds false discovery rate (FDR) < 0.05 and |log2FC| > 0.5. The DEGs shared by the three clusters were identified, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and gene ontology (GO) functional enrichment analysis were performed on them with the R package clusterProfiler (v3.16.1) (Additional file 1).
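The filtering thresholds can be sketched in Python as follows. The sketch assumes that the FDR is obtained by the Benjamini-Hochberg adjustment, which the text does not state explicitly, and it takes p-values and log2 fold changes as given rather than reproducing the moderated statistics of limma. The gene names and numbers are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2fc, p_values, fdr_cut=0.05, lfc_cut=0.5):
    """Keep genes with FDR < fdr_cut and |log2FC| > lfc_cut."""
    fdr = benjamini_hochberg(p_values)
    return [g for g, lfc, q in zip(genes, log2fc, fdr)
            if q < fdr_cut and abs(lfc) > lfc_cut]


# Hypothetical toy input
genes = ["G1", "G2", "G3", "G4"]
log2fc = [1.2, -0.8, 0.3, 0.9]
p_values = [0.001, 0.01, 0.02, 0.5]

fdr = benjamini_hochberg(p_values)
degs = select_degs(genes, log2fc, p_values)
```

Here G3 passes the FDR threshold but fails the fold-change threshold, and G4 fails the FDR threshold, so only G1 and G2 are retained.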
Comparative analysis of immune scores among molecular subtypes
The single-sample gene set enrichment analysis (ssGSEA) method of the GSVA (Gene Set Variation Analysis) package was used to identify the immune score-based relationships among the molecular subtypes in the TCGA-PAAD data set. The scores of 28 immune cells were assessed [20] and then the differences in immune scores among the molecular subtypes were compared.
Training set and internal test set construction
A total of 173 samples in the TCGA-PAAD data set were divided into a training set and a test set. To prevent random allocation bias from affecting the stability of subsequent modeling, random grouping with replacement was repeated 200 times in advance. Using the training set, univariate Cox proportional hazards regression models relating each ferroptosis-related gene to survival were fitted with the coxph function of the R package survival. Of the 60 ferroptosis-related genes, only the 58 with expression profile data in our data set were included in the univariate Cox regression analysis, and P < 0.05 was used as the filtering threshold. The R software package “glmnet_4.10–1” [21] was used to carry out the LASSO Cox regression analysis. We first analyzed the trajectory of each independent variable, and then used fivefold cross-validation to build the model and analyze the confidence interval under each lambda. The target genes were selected by multivariate Cox regression analysis, and prognostic Kaplan–Meier (KM) curves were plotted.
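The repeated random grouping can be sketched in Python as below. The split ratio is not stated in the text, so the 1:1 ratio, the seed, the sample identifiers, and the function name `repeated_splits` are all assumptions made for illustration. Each repeat starts again from the full pool of 173 samples.

```python
import random


def repeated_splits(sample_ids, n_repeats=200, train_fraction=0.5, seed=2024):
    """Generate n_repeats random training/test partitions of sample_ids.

    train_fraction and seed are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    n_train = int(len(sample_ids) * train_fraction)
    splits = []
    for _ in range(n_repeats):
        shuffled = list(sample_ids)  # every repeat draws from the full pool
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


# Hypothetical sample identifiers standing in for the 173 TCGA-PAAD samples
sample_ids = [f"sample_{i:03d}" for i in range(173)]
splits = repeated_splits(sample_ids)
train_ids, test_ids = splits[0]
```

A candidate partition would then be chosen from these repeats, for example one in which the clinical characteristics of the two sets are balanced; that selection criterion is not part of this sketch.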
3-Gene signature robustness in different data sets
The risk score of each sample was calculated from the expression levels of the signature genes in that sample. KM curve analysis showed significant differences between the high and low expression groups. Furthermore, we used the R software package timeROC_0.4 [22] to conduct ROC analysis of the prognostic classification by risk score and to assess the classification efficiency at 1, 3, and 5 years. The model and its coefficients, developed on the training data set, were then applied to the entire TCGA-PAAD data set to calculate the risk score of each sample and establish the risk score distribution of the samples. The independent GSE62452 data set was used to assess the robustness of the model.
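A risk score of this kind is commonly computed as a coefficient-weighted sum of gene expression values. The Python sketch below assumes that form and a median cutoff for grouping, neither of which is stated explicitly in the text. The coefficients and expression values are hypothetical and are not the fitted values of the model.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def split_by_median(scores):
    """Label samples 'high' if their score exceeds the median score, else 'low'."""
    cutoff = median(scores.values())
    return {sample: ("high" if score > cutoff else "low")
            for sample, score in scores.items()}


# Hypothetical coefficients for illustration only
coefficients = {"ALOX5": 0.5, "ALOX12": -0.25, "CISD1": -0.5}

# Hypothetical expression values for four samples
samples = {
    "S1": {"ALOX5": 8.0, "ALOX12": 2.0, "CISD1": 4.0},
    "S2": {"ALOX5": 4.0, "ALOX12": 4.0, "CISD1": 6.0},
    "S3": {"ALOX5": 6.0, "ALOX12": 4.0, "CISD1": 2.0},
    "S4": {"ALOX5": 2.0, "ALOX12": 2.0, "CISD1": 2.0},
}

scores = {name: risk_score(expr, coefficients) for name, expr in samples.items()}
groups = split_by_median(scores)
```

Applying the same coefficients unchanged to the full TCGA-PAAD data set and to GSE62452 is what makes the external comparison a test of the model rather than a refit.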
Univariate and multivariate analyses of the 3-gene signature
To assess the independence of the 3-gene signature model in clinical applications, we performed univariate and multivariate Cox regression analyses on the TCGA-PAAD training data set. Based on the results of these analyses, we used the TCGA-PAAD training data set to construct a nomogram. In addition, calibration curves were used to assess the prediction accuracy of the nomogram at 1, 3, and 5 years.
Tissue samples
PAAD tissues were derived from surgically resected specimens and snap-frozen in liquid nitrogen until RNA extraction. None of the patients received chemotherapy or radiation therapy before surgery. All patients signed informed consent forms provided by the Eastern Hepatobiliary Surgery Hospital. This study was approved by the Ethics Committee of the Eastern Hepatobiliary Surgery Hospital.
RNA isolation and RT-qPCR analysis
RNA was extracted from tissues using the TRIzol reagent (Invitrogen, Carlsbad, CA, USA) and was reverse-transcribed into cDNA using the QuantiTect Reverse Transcription Kit (Qiagen, Valencia, CA, USA). Quantitative PCR (qPCR) uses real-time fluorescence to measure the quantity of DNA present at each cycle during a PCR. Real-time qPCR analyses were quantified with SYBR-Green (Takara, Otsu, Shiga, Japan), and expression levels were normalized to GAPDH levels.
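Normalization to GAPDH is often expressed through the 2^(-ΔΔCt) calculation. The text does not state which quantification formula was used, so the Python sketch below is an assumption, and the Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_control, ct_reference_control):
    """Relative expression by the 2^(-ddCt) method (assumed, not stated in the text).

    ct_target / ct_reference: Ct of the gene of interest and of GAPDH in the sample.
    ct_target_control / ct_reference_control: the same two Ct values in the control.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: tumor tissue versus adjacent tissue
fold_change = relative_expression(
    ct_target=24.0, ct_reference=18.0,
    ct_target_control=26.0, ct_reference_control=18.0,
)
```

With these toy values the target gene crosses the threshold two cycles earlier in the tumor sample after normalization, which corresponds to a fourfold higher relative expression.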
Immunohistochemistry
Immunohistochemistry (IHC) was performed using the two-step method according to the manufacturer’s instructions (PV-9000; ZSGB-BIO, Beijing, China). Pancreatic cancer samples were fixed in 10% formalin, embedded in paraffin, and cut into 5-µm sequential sections. The samples were de-waxed with ethanol and blocked to inhibit endogenous peroxidase activity. The samples were then heated in a microwave for antigen retrieval, cooled to room temperature, and blocked with goat serum for 30 min at 37 °C. The samples were incubated overnight at 4 °C with rabbit anti-ALOX5 (ab169755), anti-ALOX12 (ab211506), and anti-CISD1 (ab203096) antibodies (Abcam, USA) (1:200), followed by incubation with horseradish peroxidase-coupled goat anti-rabbit secondary antibody (PV-9000; ZSGB-BIO, Beijing, China) at 37 °C for 30 min. The samples were then stained with 3,3′-diaminobenzidine (DAB), and cell nuclei were stained blue with hematoxylin. The sections were then dehydrated, cleared with xylene, and mounted. ALOX5, ALOX12, and CISD1 expression was determined by IHC using the streptavidin peroxidase method, with adjacent tissues serving as controls. Image-Pro Plus 6.0 software (Media Cybernetics, USA) was used to analyze protein expression and perform statistical analysis of the IHC results.
Cell culture and transfection
The human PAAD cell lines T3M4 and Panc 02.03 were provided by the National Collection of Authenticated Cell Cultures (Shanghai, China). T3M4 cells are epithelial-like cells derived from metastatic lymph node tissue of human pancreatic cancer. They were cultured in DMEM (Dulbecco’s modified Eagle medium) (Gibco, Grand Island, NY, USA) supplemented with 10% fetal bovine serum (Invitrogen, San Diego, CA, USA) at 37 °C under 5% CO2 in a humidified incubator. Panc 02.03 cells are epithelial-like cells derived from human primary pancreatic cancer. They were cultured in RPMI-1640 (Gibco, Grand Island, NY, USA) with 15% fetal bovine serum at 37 °C under 5% CO2 in a humidified incubator. si-ALOX5 (Cat. No. SR319325) was purchased from Origene (Beijing, China). Transfection was performed using Lipofectamine 3000 reagent (No. L3000015, Invitrogen, China) according to the manufacturer’s instructions, and the cell transfection efficiency was 82%. The human ALOX12 and CISD1 coding sequences were cloned into the pEZ-M03 vector.
Cell viability assays
si-ALOX5 was transfected into T3M4 cells, and ALOX12 or CISD1 was transfected into Panc 02.03 cells. Forty-eight hours post-transfection, the cells were collected and seeded into 96-well plates at 2000 cells per well. After 48 h, cell viability was measured with the Cell Counting Kit-8 assay (CCK-8, Dojindo, Japan) according to the manufacturer’s protocol. The absorbance at 450 nm was measured using an automatic microplate reader (BioTek, Winooski, VT, USA). All CCK-8 assays were performed five times.
Cell migration and invasion assays
si-ALOX5 was transfected into T3M4 cells, and ALOX12 or CISD1 was transfected into Panc 02.03 cells. Forty-eight hours post-transfection, the cells were collected. For the migration assay, 800 μl DMEM with 20% serum was added to the lower chamber of a Transwell plate (Corning, NY, USA), and 1.5 × 10⁵ cells were added to the upper chamber. The T3M4 and Panc 02.03 cells were harvested, resuspended in serum-free medium, and placed into the upper chamber of a Transwell membrane filter (Corning, NY, USA) for the migration assays or of a Transwell membrane filter coated with Matrigel (Corning) for the invasion assays. After incubation for 24 h at 37 °C, the Transwell chamber was removed. The cells were fixed with methanol, stained with 0.1% crystal violet, and imaged, and the relative cell density was measured with ImageJ (National Institutes of Health, USA). ImageJ was also used to calculate the migration and invasion area of the cells. The migration and invasion index (%) is the proportion of the area into which cells have invaded, expressed as a percentage, and is calculated as the epithelium area divided by the total area of invaded cells. The area is the overall area (μm²) of invaded cells. Invasive capacity was evaluated by counting invading cells under a microscope (40 × 10). Five random fields of view were analyzed for each chamber. All cell migration and invasion assays were performed five times.
Statistical analysis
All data were analyzed using SPSS 21.0 statistical software (IBM Corporation, Armonk, NY, USA). Graphs were generated with GraphPad Prism 8.0 software (GraphPad Software, Inc., San Diego, CA, USA). Two-tailed Student’s t-tests were performed, and P < 0.05 was considered statistically significant.