Background

Melanoma is a type of skin cancer with a high mortality rate. In 2018, 287,732 new cases and 60,712 deaths of melanoma were registered worldwide [1]. Cutaneous melanoma, which accounts for over 90% of melanoma cases, remains one of the most aggressive forms of skin cancer and shows an increasing incidence and mortality rate globally [2, 3]. Improving prognosis of cutaneous melanoma patients has important implications for a better management of the disease. Routine prognosis method uses clinicopathologic features including Breslow tumor thickness, ulceration, mitotic index, Clark level, and AJCC (The American Joint Committee on Cancer) pathologic tumor stage [4, 5]. Whether such method can be improved with the addition of WSI or high-throughput sequencing data is under active investigation.

Many previous studies had attempted to develop prognostic models using different types of variables including clinicopathologic, mutation, mRNA, microRNA, and methylation variables. Zhao et al. [6] identified a 25-gene signature that can effectively estimate the level of immune cell infiltration in melanoma, providing a robust biomarker significantly related to survival outcome (disease-specific survival, post-recurrence survival, or overall survival). Multidimensional omics data were also utilized to provide more accurate prediction. Jayawardana et al. [7] developed models to classify 1-year and 4-year survival status based on clinicopathologic, mutation, mRNA, microRNA, protein information, and their different combinations. They identified that models based on the combination of clinicopathologic variables and mRNA expression profile performed the best under a cross-validation framework. Jiang et al. [8] used sparse PCA and partial least squares methods to take whole multidimensional omics profiles into consideration. Their methods showed a significant increase of C-index values of overall survival prediction. However, these studies had not extended to including WSI features, while studies that did use image data were not focused on the prediction of survival outcomes. For example, Lu et al. [9] proposed a diagnostic model based on epidermis segmentation, keratinocytes segmentation, melanocytes detection, and feature construction on whole slide images. This technique achieved a classification accuracy of 90% for skin tissue malignancy. Failmezger et al. [10] analyzed the spatial association between different types of nodes in WSI. They identified that two stromal features (stroma clustering and stromal barrier) had significant coefficients in a Cox model. To our knowledge, few studies proposed pathology image-based prediction models for recurrence-free survival analysis.

As the treatment of cutaneous melanoma has improved over years, especially since the advent of targeted immunotherapy [11, 12], the RFS measurement has gained increasing importance for post-surgery management of melanoma patients. An accurate RFS analysis can decrease the cancer-related death rate by not only personalizing treatment options but also prompting active cancer surveillance at an early stage for specific risk groups. On the other hand, as recurrence events have already been used as an effective endpoint for cancer clinical trials, a higher accuracy of RFS measurements allows precise enrollment of patients that are more likely responsive to new therapies.

Our work hence presented a RFS model based on baseline variables and texture features extracted from WSI. This model provided a higher accuracy in predicting recurrence-free survival than baseline variables-based model or models with other types of data integration. The extracted WSI features contributed to the improved prediction by capturing variations in immune-related gene expressions.

Methods

Patients and samples

This study was performed using 152 patients from the TCGA-SKCM [13] (The Cancer Genome Atlas Skin Cutaneous Melanoma) project. We enrolled patients that have complete information in AJCC stage, the dominant clinicopathologic variable used for RFS analysis (Additional file: Table S4). The enrolled patients should also have non-missing values in tumor location (metastatic or locoregional), WSI images, high-throughput gene expression data, and tumor recurrence-related follow-up. The study work flowchart is shown in Fig. 1. For each patient, the recurrence status was defined as 1 for those who had experienced recurrence and as 0 for censoring. The censoring time was set as the death time if one had the record of death or as last follow-up time if otherwise.

Fig. 1
figure 1

The workflow used in this study

We collected 4 types of data for RFS analysis: clinicopathologic variables, high-throughput gene mutational profile, high-throughput gene expressional profile, and WSI features. For clinicopathologic variables, 4 covariates with complete records were included age at diagnosis, gender, primary location, and AJCC pathologic tumor stage. We selected the mutation status of NRAS and BRAF genes to represent the gene mutation data as previously described [7]. These clinicopathologic variables and gene mutations were used as the baseline variables for following model development. The distributions of these variables are summarized in Table 1. For gene expression, we used the FPKM (fragments per kilobase of exon model per million reads mapped) value to represent gene expression levels. A total of 5277 genes were selected for further analysis.

Table 1 Summary of distributions of baseline variables

Whole slide image processing and feature extraction

All the melanoma tissue slides were stained by hematoxylin and eosin (H&E) and scanned by Aperio Digital Pathology Slide Scanner. Ten slides were magnified 20 times (× 20) and the other 142 slides were magnified 40 times (× 40). We extracted 3 types of texture features in 2 regions of interest (ROI): global and nucleus ROI, respectively. This process consisted of two steps: region segmentation and feature extraction. An illustration of these procedures is shown in Additional file: Figure S11. More details on the region segmentation can be found in the supplementary methods.

For feature extraction, we extracted 24 GLCM (gray level cooccurrence matrix) features, 16 GLRLM (gray level run length matrix) features, and 16 GLSZM (gray level size zone matrix) features [14] from each ROI. For nucleus ROIs from the same WSI, we calculated their mean, standard deviation, range, and disorder as the summary statistics [15]. This resulted in a total of 224 nucleus features and 56 global features. Image processing and feature extraction were performed by Python 3.7 and packages including “Pyradiomics” [14].

Modeling and evaluation

We performed QR decomposition-based method [16] to reduce linear dependencies among the gene expressional profile before model fitting. In addition to the three types of features mentioned above, we also combined different types of features as new feature sets for model development. The number of features in each set is summarized in Additional file: Table S1.

We performed threefold cross validation for models developed based on each feature set. In each fold, a lasso Cox model was trained on the training set and was tested on the validation set. Due to the limited sample size, we used C-index computed from validation data to evaluate model performance. We also computed the time-dependent AUCs [17] from day 31 (5% percentile) to day 4631 (95% percentile) to evaluate the performance at each time point. The model development and comparison were performed by R version 4.1.

Differential gene expression analysis

The differential gene expression analysis was performed using edgeR [18] package, and the gene ontology enrichment analysis was performed by GOseq [19] or clusterProfiler [20]. More details on the data processing and parameter setting can be found in the supplementary methods.

Results

Workflow and patient characteristic

The study workflow is shown in Fig. 1. We performed RFS modeling based on three types of variables: baseline variables (including clinicopathologic variables, and mutation status of NRAS and BRAF genes), gene expressional profile and WSI features. We developed 7 sets of features based on these three types of data (Additional file: Table S1). The model performance was evaluated by C-index and time-dependent AUC on threefold cross validation. We further presented the potential application of the best model under clinical setting. We also performed differential gene expression analysis between the higher and lower-risk subgroups stratified by our model for AJCC stage III patients.

We enrolled a total of 152 patients with complete information in recurrence-related survival time, baseline variables, gene expression data, and WSI features. A total of 82 patients had experienced recurrence, while 65 had last follow-up time and 5 died without recurrence. We performed Kaplan-Meier estimation for all patients and the RFS probability curve is shown in Additional file: Figure S2. The median survival time was 1757 days.

Model comparisons

We compared the performance of models developed based on each single type of data to assess the prognostic power of single-type feature sets. The C-index of models based on baseline variables (mean/std = 0.654/0.014) was the highest (Additional file: Table S2, Figure S3). The models based on gene expression or WSI features had a slight difference (mean/std of C-index, expr = 0.639/0.039; im = 0.635/0.033). To evaluate the prediction accuracy at each time point, we computed the time-dependent AUC on the validation results. As shown in Fig. 2a, the models based on baseline variables showed obvious superiority until about day 2500. In contrast, the model based on WSI features had increasing prediction accuracy since about day 1500. This motivated the combinatorial modeling, as combing baseline and WSI feature in survival prediction might utilize the prediction advantage of single-type data-based model within specific time intervals.

Fig. 2
figure 2

The time-dependent AUC of the models developed based on single-type or multi-type data. a Represents the time-dependent AUC of the models based on single-type data. b Represents the time-dependent AUC of the models based on multi-type data. Abbreviations: “cln”, baseline variables-based model; expr, models based on gene expression; “im”, models based on WSI features; “cln_im”, models based on baseline variables and WSI features; “cln_expr”, models based on baseline variables and gene expression; “expr_im”, models based on gene expression and WSI features; “cln_expr_im”, models based on baseline variables, gene expression, and WSI features

We then developed models based on combinations of different types of data, and compared their prediction performance. For WSI image analysis, we extracted texture features from global regions or segmented nucleus regions (Additional file: Figure S1). As summarized in Additional file: Tables S2, the best C-index (mean/std = 0.772/0.029) was achieved by the model combining baseline variables and WSI features (Additional file: Figure S3). As shown in Additional file: Figure 2B, such model also had the best performance at almost every time point as measured by time-dependent AUC (mean/std of time-dependent AUC = 0.785/0.038), indicating a clear benefit of data integration. We therefore used this feature set and the optimal penalty (Additional file: Figure S3) to develop a lasso Cox model on all patients. This model was used as the final prognostic model, and the coefficients of selected features were showed in Additional file: Table S3.

The image-based prognostic model

The proposed model included 20 WSI features and 5 baseline variables. For the WSI features, 14 of them were extracted from nucleus ROIs and 6 were from global ROIs. The computational formula of each feature was shown in supplementary methods. A positive value of coefficient represented that the hazard of recurrence would increase with the feature values. As shown in Additional file: Table S3, the 3 largest absolute values of the coefficients were from GLRLM (RunEntropy_std and ShortRunEmphasis_range) and GLCM (Idn_range) in nucleus ROIs. Both the RunEntropy and ShortRunEmphasis were the measurement of the distribution of run lengths. The Idn (inverse difference normalized) quantified the local homogeneity within nucleus ROI, which could be low value if there was necrosis or dissolution of nucleus. Since the standard deviation or range of these features was significant predictor in the model, we inferred that the variance of nuclei shape, surrounding textures, and homogeneity contributed to an effective prognostic analysis. For the features extracted from global ROIs, the glcm_Idmn and glszm_LargeAreaHighGrayLevelEmphasis had the largest absolute coefficients. The Idmn (inverse difference moment normalized) also measured the local homogeneity, while the LargeAreaHighGrayLevelEmphasis computed the emphasis of regions with large area and high gray level. Both of the features were indicators of the tumor region size.

Risk stratification

To illustrate potential applications of the prognostic model, we computed a risk score by summing over the product of model features and their coefficients. To compare the risk score with traditional prognostic variables, we also computed risk scores for models based on AJCC pathologic tumor stage or baseline variables. We used each risk score to fit a univariate Cox PH model on 5 subgroups of patients (all, AJCC stage < III, AJCC stage ≥ III, metastatic or locoregional) and applied likelihood ratio test to assess the significance of the score. As summarized in Additional file: Table S4, both the likelihood ratio test statistic and its p value showed that adding WSI features to baseline variables greatly improved the prediction accuracy.

To illustrate the independent prognostic value of image-based risk score, we set the median of risk score as the threshold and stratified all the patients into a higher or lower-risk group. We then performed Kaplan-Meier estimation for each group. As shown in Fig. 3a, the survival distributions of the two risk groups characterized among all patients were significantly different (log-rank p value < 0.0001). The median survival time of the two risk groups were 678 days and 3716 days, respectively. We also estimated the survival probability within 4 specific pathologically-defined subgroups of patients (Fig. 3). The survival distributions were all significantly different for each subgroup. The median survival time of each subgroup was shown in Additional file: Table S5. Of note, the lower-risk group in patients with severe stage (AJCC stage ≥ III) had a median RFS time of 5354 days, which is 1.54 times longer than that of with mild or moderate stage (AJCC stage < III), who had a median RFS time of 3488 days. As shown in Additional file: Figure S4, this risk score could significantly stratify the higher and lower-risk group for overall survival as well.

Fig. 3
figure 3

The recurrence-free survival probability of subgroups stratified by the risk score. a Represents all the patients. b For those in AJCC stage < III. c For those in AJCC stage ≥ III. d For those with metastatic tumors. e For those with locoregional tumors

Differential risk-related enrichment of gene sets

To explore the molecular mechanisms underlying the superiority of image-based prognostic model, we performed differential gene expression analysis between the higher (43 patients) and the lower-risk (15 patients) group characterized by our model for patients within AJCC stage III. To reduce the biological variation within each risk group, we calculated the Spearman correlation coefficients between any pairs of samples and filtered out those with a correlation coefficient smaller than 0.85 with more than 20 other samples among the higher-risk group, or with more than 7 other samples among the lower-risk group. This resulted in 9 lower-risk patients and 21 higher-risk patients for further analyses.

A total of 188 downregulated and 28 upregulated genes were identified as significantly differentially expressed. We found 226 enriched BP (biological process), 18 enriched CC (cellular component), and 12 MF (molecular function) GO terms. As shown in Additional file: Figure S5, the most majority of top 20 enriched BP terms was involved in immune response, immune cell activation and proliferation. In particular, the T cell activation and proliferation-related pathways were significantly enriched. These enriched BP terms were also identified by the clusterProfiler package as shown in Additional file: Figure S8, which suggested that the variation in the regulation of T cell activation was the potential key driver for differential risk. For CC terms, the T cell receptor-related GO terms were significantly enriched as shown in Additional file: Figure S6 and S9. For MF terms, MHC protein binding and cytokine activity-related GO were associated with the risk stratification (as shown in Additional file: Figure S7 and S10). These enriched GO terms suggested that the selected image features accounted for the variations of T cell activities and hence provided a more accurate assessment of disease progression.

Discussion

In this study, we performed recurrence-free survival analysis, and developed lasso Cox prediction models based on different types of data (baseline variables, gene expression, and whole slide image features). Accurate RFS measurements have become increasingly important as the treatment of cutaneous melanoma has significantly improved over years. A reliable RFS prediction could thus assist in more precision treatment selection for melanoma patients. Our evaluation criteria included C-index and time-dependent AUC. We identified that combining baseline variables and WSI features achieved the best prediction performance (cross validation C-index, 0.772/0.029; time-dependent AUC, 0.785/0.038). We showed that models trained on single-type data could have varied prediction accuracy within specific time intervals. We were then motivated to combine different types of data to develop a model that achieved uniformly best prediction accuracy at most majority of time after initial diagnosis. We also showed that this combinatorial model provided significant risk stratifications within specific subgroups defined by metastatic status or AJCC pathological tumor stages. We found from gene expressional profiles that T cell activities were significantly associated with the differential risk determined by the image-based prognostic model.

T cell activation is closely related to the adoptive or targeted immunotherapies. Such immunotherapies either provide co-stimulatory signals to trigger T cell proliferation, or block inhibitory molecules to unleash the anti-tumor T cell activities [21]. Our findings indicate that, even within specific risk groups (e.g., AJCC stage III) as defined by routine clinicopathological variables, it is still possible to further identify subgroups characterized with the variation in T cell activation using WSI features. Therefore, whether such subgroups have differential responses to immunotherapy, such as PD-1 or CTLA4-based immune checkpoint therapies, warrants further investigation.

Our study provided a cost-effective way to evaluate the prognosis by only taking baseline variables and automatically generated WSI features as the input. We showed that many top enriched GO terms were related to immune response, T cell activation and proliferation. Genes related to T cell development and function were also identified by Jiang et al. [8] as significantly associated with overall survival by analyses performed based on clinicopathologic variables, methylation, CNA (copy number alteration), gene expression, and mutational data. Pastorfide et al. [21] and others [22, 23] also proved that lymphocytic infiltrates were associated with metastasis and survival outcome by artificial analyses of histologic images. These findings were consistent with our study, but we used computer-aided image processing instead to automatically quantify such information from routinely made WSI.

Our study has limitations. First, due to a large number of missing values, we only enrolled 152 patients with complete information in our study. The best model was determined by threefold cross validation without being tested on an independent dataset. Second, the quality of pathology images varied. The variation in staining intensity and marker-pen pollution were the two most frequent problems. The former problem could affect nucleus segmentation, and we mitigated this by segmenting ROIs centering around the localized nucleus region. We also manually checked each ROI to remove polluted ones. Third, we did not evaluate the effects of interaction between different types of data when performing data integration. Fourth, we did not include some other routinely used prognostic variables, such as Breslow tumor thickness, ulceration, mitotic rate, and Clark levels, due to a large proportion of missing values (Additional file: Table S8). These variables might improve the prognostic prediction performance of our models.

For the future extension of our study, we will perform external validation of the prediction model using multi-center retrospective cohort data. Another key point that needs to be addressed in future studies is the assessment of effect of clinicopathological variables and treatment on RFS prediction using more complete data. Notwithstanding these limitations, we believe our method has potential for clinical translation to reduce the level of heterogeneity in RFS for cutaneous melanoma patients. This could influence the treatment selection for patients, as well as the patient enrollment for related clinical trials. Moreover, considering the capacity of WSI texture features in capturing immune cell activity in this study, we believe that WSI texture features could also have prognostic values in other types of cancer, which warrants exploration in future studies.

Conclusions

In summary, we developed an image-based combinatorial model and demonstrated its prediction ability for recurrence-free survival. The model includes 20 automatically generated WSI features, 3 clinicopathologic variables, and mutation status of 2 genes. We hence provided a cost-effective prognostic model as a substitute for gene expression profiling-based prognosis methods.