Prediction of recurrence risk in endometrial cancer with multimodal deep learning

Volinsky-Fremond, Sarah; Horeweg, Nanda; Andani, Sonali; Barkey Wolf, Jurriaan; Lafarge, Maxime W.; de Kroon, Cor D.; Ørtoft, Gitte; Høgdall, Estrid; Dijkstra, Jouke; Jobsen, Jan J.; Lutgens, Ludy C. H. W.; Powell, Melanie E.; Mileshkin, Linda R.; Mackay, Helen; Leary, Alexandra; Katsaros, Dionyssios; Nijman, Hans W.; de Boer, Stephanie M.; Nout, Remi A.; de Bruyn, Marco; Church, David; Smit, Vincent T. H. B. M.; Creutzberg, Carien L.; Koelzer, Viktor H.; Bosse, Tjalling

doi:10.1038/s41591-024-02993-w

Article
Open access
Published: 24 May 2024

Sarah Volinsky-Fremond ORCID: orcid.org/0000-0002-8195-5735¹,
Nanda Horeweg ORCID: orcid.org/0000-0002-8581-4753²,
Sonali Andani^3,4,5,
Jurriaan Barkey Wolf ORCID: orcid.org/0000-0002-7811-0280¹,
Maxime W. Lafarge⁴,
Cor D. de Kroon⁶,
Gitte Ørtoft⁷,
Estrid Høgdall⁸,
Jouke Dijkstra ORCID: orcid.org/0000-0002-8666-3731⁹,
Jan J. Jobsen¹⁰,
Ludy C. H. W. Lutgens¹¹,
Melanie E. Powell¹²,
Linda R. Mileshkin ORCID: orcid.org/0000-0001-6826-6014¹³,
Helen Mackay¹⁴,
Alexandra Leary¹⁵,
Dionyssios Katsaros¹⁶,
Hans W. Nijman¹⁷,
Stephanie M. de Boer²,
Remi A. Nout¹⁸,
Marco de Bruyn ORCID: orcid.org/0000-0001-9819-9131¹⁷,
David Church^19,20,
Vincent T. H. B. M. Smit¹,
Carien L. Creutzberg²,
Viktor H. Koelzer ORCID: orcid.org/0000-0001-9206-4885^4,21^na1 &
Tjalling Bosse ORCID: orcid.org/0000-0002-6881-8437¹^na1

An Author Correction to this article was published on 01 July 2024

This article has been updated

Abstract

Predicting distant recurrence of endometrial cancer (EC) is crucial for personalized adjuvant treatment. The current gold standard of combined pathological and molecular profiling is costly, hampering implementation. Here we developed HECTOR (histopathology-based endometrial cancer tailored outcome risk), a multimodal deep learning prognostic model using hematoxylin and eosin-stained whole-slide images and tumor stage as input, on 2,072 patients from eight EC cohorts including the PORTEC-1/-2/-3 randomized trials. HECTOR demonstrated C-indices in internal (n = 353) and two external (n = 160 and n = 151) test sets of 0.789, 0.828 and 0.815, respectively, outperforming the current gold standard, and identified patients with markedly different outcomes (10-year distant recurrence-free probabilities of 97.0%, 77.7% and 58.1% for HECTOR low-, intermediate- and high-risk groups, respectively, by Kaplan–Meier analysis). HECTOR also predicted adjuvant chemotherapy benefit better than current methods. Morphological and genomic feature extraction identified correlates of HECTOR risk groups, some with therapeutic potential. HECTOR improves on the current gold standard and may help deliver personalized treatment in EC.

Main

EC is the most common gynecological malignancy in high-income countries and is increasing in incidence¹. Although most women with localized disease are cured by surgery, 10–20% develop distant recurrence², which is typically incurable. Adjuvant chemotherapy can reduce this risk, at the expense of toxicity^3,4. Thus, current guidelines recommend adjuvant treatment on the basis of a combination of clinicopathological risk factors (for example, histological subtype, grade, lymphovascular space invasion (LVSI) and FIGO (International Federation of Gynaecology and Obstetrics) tumor stage) and, if available, the molecular classification of EC. The latter identifies patients with favorable and unfavorable outcomes, defined by POLE mutation (POLEmut) and p53 abnormality (p53abn), respectively, and patients with intermediate outcomes, characterized by mismatch repair deficiency (MMRd) or no specific molecular profile (NSMP)^5,6,7,8. Recent efforts have been made to combine clinicopathological and molecular factors⁹; however, challenges remain in practice owing to the complexity of combining an increasing number of factors, high interobserver variability in the assessment of histopathological factors, and the costs and turnaround times of molecular testing. In addition, histological slides contain a wealth of visual information, some of it with prognostic potential¹⁰, that is only partly captured by pathologists’ grading and histotyping of the tumor.

Deep learning (DL) models, including those using digitized hematoxylin and eosin (H&E)-stained tumor slides, have shown great promise in the prediction of molecular alterations^11,12,13, cell composition¹⁴ and prognosis^15,16,17,18,19,20,21, outperforming standard pathologist-based assessment. This is particularly true of the latest generation of self-supervised learning and whole-slide image (WSI) prediction DL models, which use attention-based networks²², graphs^15,19 or (vision) transformers^23,24 to provide more granular and interpretable image representations. In addition, multimodal DL models for prognosis prediction show promise in outperforming unimodal approaches that rely solely on the morphological information provided by H&E WSIs^16,21. We previously developed a DL model, image-based (im) four molecular classes in EC (im4MEC), which accurately predicts the molecular EC class from tumor H&E WSIs, and showed that the image-based molecular classes are prognostic¹¹. Others have predicted binary EC recurrence²⁵ or used uni-/multimodal DL models to predict EC overall survival^15,16,19,21 (concordance indices (C-indices) of 0.629–0.687), but these approaches relied on more detailed tumor profiling, such as multiplex immunofluorescence staining²⁵ or the combination of H&E WSIs with genomic and/or transcriptomic data¹⁶, neither of which is currently deliverable in clinical practice. Thus, there remains a pressing unmet need for a method that can predict EC distant recurrence from data generated as part of routine clinical diagnostics.

In the present study, we report the development and evaluation of HECTOR (Fig. 1), a multimodal DL model that predicts distant recurrence from the H&E WSI and anatomical stage in women who have undergone surgery for EC, across eight EC cohorts including three large randomized trials^3,26,27,28,29,30,31.

Results

EC cohorts

HECTOR is a two-step DL model wherein the first step consists of self-supervised tumor image representational learning and the second of the distant recurrence prediction task (Fig. 1).

To train and validate the distant recurrence prediction task of HECTOR, we collected and curated tumor-containing, H&E-stained WSIs of the hysterectomy specimen together with comprehensive clinicopathological, molecular and clinical distant recurrence data for 2,072 patients with FIGO 2009 stage I–III EC across eight cohorts, including the PORTEC-1, -2 and -3 randomized trials^3,26,27,28,29,30 (Extended Data Fig. 1; the study CONSORT diagram is shown in Supplementary Figs. 1 and 2 and Supplementary Tables 1 and 2). Of these, two population-based cohorts were held out as external test sets: patients treated at the University Medical Center Groningen³¹ (UMCG; n = 160) and at the Leiden University Medical Center (LUMC; n = 151); the LUMC external test set also simulates a diagnostic scenario with up to three tumor blocks per patient. The remaining patients were randomly divided into a 20% held-out internal test set (n = 353) and an 80% training set (n = 1,408), on which fivefold crossvalidation was performed. The median duration of follow-up in the training set, internal test set, UMCG external test set and LUMC external test set was 7.8, 8.4, 5.3 and 2.9 years, respectively, during which 246 (17.5%), 62 (17.6%), 14 (8.8%) and 24 (15.9%) patients had distant recurrence. Importantly, patients who received chemotherapy, predominantly those in the experimental treatment arm of the PORTEC-3 randomized trial (n = 225), were excluded from training because this treatment influences distant recurrence risk^3,4 (Extended Data Fig. 1). These PORTEC-3 patients were, however, used for the downstream analysis of adjuvant chemotherapy benefit by HECTOR.

To train HECTOR’s self-supervised learning step (which requires a large imaging dataset but no outcome data), we enriched the training set with an additional cohort, TCGA-UCEC³² (The Cancer Genome Atlas Uterine Corpus Endometrial Carcinoma), as well as with the WSIs that had been excluded from the distant recurrence task owing to metastatic disease at diagnosis (FIGO 2009 stage IV) or missing outcome data (n = 1,862; Methods).

Altogether, including the two training steps and the downstream analyses, the present study comprised tumor data from 2,751 patients.

HECTOR design and performance

To design HECTOR and obtain the best-performing DL model for prediction of distant recurrence, as measured by the C-index³³, we conducted ablation studies on the fivefold crossvalidation (Supplementary Table 3). HECTOR’s first step comprises a vision transformer for patch-level, self-supervised representation learning (Fig. 1a). HECTOR’s second step is a multimodal, three-arm architecture that predicts distant recurrence-free probabilities (Fig. 1b). The three-arm architecture fuses prognostic information from the H&E-stained WSI of the tumor-containing uterine section, the image-based molecular class predicted by im4MEC directly from the H&E WSI¹¹ and the surgically assessed anatomical stage (three-tiered on the basis of the FIGO 2009 system, wherein stage I indicates a tumor confined to the uterus, stage II cervical extension and stage III spread beyond, including vaginal, adnexal, pelvic and lymph node involvement)³⁴. To do this, we combined attention-based multiple instance learning with embedding layers that map the discrete risk factors (the image-based molecular class and anatomical stage) to a higher-dimensional continuous vector space, with the importance of each factor controlled by gating-based attention^16,35. The ablation studies (Supplementary Table 3) also included multitask learning³⁶, with a second training objective predicting the image-based molecular class instead of using the frozen im4MEC model, and the replacement of attention-based multiple instance learning by DL models that integrate spatial information of the patches, such as a transformer²³ and an attention-based graph neural network¹⁵. These two architectures did not outperform attention-based multiple instance learning for this task. Further details are provided in Methods, and a summary of the HECTOR configuration is provided in Supplementary Tables 4 and 5.
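The fusion pattern described above can be illustrated with a minimal, untrained sketch. It assumes gated attention pooling in the style of Ilse et al. for the WSI arm, lookup-table embeddings for the two discrete factors and a scalar sigmoid gate per modality before a linear risk head. All dimensions, the gate form, the function names and the random weights are illustrative assumptions, not HECTOR’s published configuration (Supplementary Tables 4 and 5); no training loop or survival loss is shown.

```python
import math
import random

MOL_CLASSES = ["POLEmut", "MMRd", "NSMP", "p53abn"]  # image-based molecular classes
STAGES = ["I", "II", "III"]                          # three-tiered FIGO 2009 stage


def matvec(M, x):
    """Plain-Python matrix-vector product; M is a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def gated_attention_pool(bag, V, U, w):
    """Pool a bag of patch embeddings into one slide-level vector.

    Patch score a_k = w . (tanh(V h_k) * sigmoid(U h_k)); the scores are
    softmax-normalized and used as weights for a sum of the patch embeddings.
    """
    scores = []
    for h in bag:
        gated = [math.tanh(t) * sigmoid(s) for t, s in zip(matvec(V, h), matvec(U, h))]
        scores.append(sum(wi * gi for wi, gi in zip(w, gated)))
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    slide = [sum(a * h[d] for a, h in zip(attn, bag)) for d in range(len(bag[0]))]
    return slide, attn


def fuse_and_score(slide, mol_class, stage, params):
    """Embed the discrete factors, gate each modality, concatenate and score linearly."""
    parts = [
        slide,
        params["mol_emb"][MOL_CLASSES.index(mol_class)],  # embedding lookup
        params["stage_emb"][STAGES.index(stage)],         # embedding lookup
    ]
    fused = []
    for vec, g in zip(parts, params["gates"]):
        gate = sigmoid(sum(gi * vi for gi, vi in zip(g, vec)))  # scalar weight per modality
        fused.extend(gate * v for v in vec)
    return sum(hw * f for hw, f in zip(params["head"], fused))  # unbounded risk score


def init_params(patch_dim=8, attn_dim=4, emb_dim=3, seed=0):
    """Random, untrained weights, for illustrating shapes only."""
    rng = random.Random(seed)
    return {
        "V": rand_matrix(attn_dim, patch_dim, rng),
        "U": rand_matrix(attn_dim, patch_dim, rng),
        "w": [rng.gauss(0.0, 0.5) for _ in range(attn_dim)],
        "mol_emb": rand_matrix(len(MOL_CLASSES), emb_dim, rng),
        "stage_emb": rand_matrix(len(STAGES), emb_dim, rng),
        "gates": [[rng.gauss(0.0, 0.5) for _ in range(d)] for d in (patch_dim, emb_dim, emb_dim)],
        "head": [rng.gauss(0.0, 0.5) for _ in range(patch_dim + 2 * emb_dim)],
    }


params = init_params()
rng = random.Random(1)
bag = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]  # 50 mock patch embeddings
slide, attn = gated_attention_pool(bag, params["V"], params["U"], params["w"])
risk = fuse_and_score(slide, "p53abn", "III", params)
```

Gating each modality with its own scalar lets a trained model down-weight an arm, which is one simple way to realize "importance of each factor controlled by gating"; the actual mechanism in HECTOR may differ.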

HECTOR demonstrated a mean C-index of 0.795 (95% confidence interval (CI): 0.768–0.822) on fivefold crossvalidation. Notably, adding the image-based molecular class arm predicted by im4MEC to the H&E WSI arm (the two-arm model, compared with the one-arm model using the H&E WSI alone) improved performance from 0.775 (95% CI: 0.748–0.802) to 0.782 (95% CI: 0.759–0.805) without requiring extra input data. Adding the anatomical stage (three-tiered FIGO 2009 stage I, II or III) further improved the C-index to 0.795 (95% CI: 0.768–0.822), yielding the final HECTOR architecture (Fig. 2a). The cumulative area under the receiver operating characteristic curve (AUC)³⁷ and integrated Brier score³⁸ are reported in Supplementary Table 6. We also observed that, compared with a model relying on the H&E WSI alone, HECTOR concentrated high attention on fewer regions while ignoring large parts of the H&E WSI (Extended Data Fig. 2).

On the unseen internal test set, HECTOR obtained a C-index of 0.789 and, on the UMCG external test set, a C-index of 0.828. Performance on the LUMC external test set is reported in ‘Performance with multiple WSIs’.
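The C-index reported throughout is the fraction of comparable patient pairs whose predicted risks are correctly ordered. The plain-Python sketch of Harrell’s estimator below uses mock data; it skips pairs with tied follow-up times, a simplification. Established libraries (for example lifelines’ `concordance_index`, which expects scores where higher means longer survival, so risks would be negated) handle ties more carefully.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored data (higher risk = earlier event).

    A pair is comparable when the patient with the strictly shorter time had the event;
    it is concordant when that patient also has the higher predicted risk. Ties in
    predicted risk count 0.5; pairs with tied times are skipped in this simple version.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [2.0, 4.0, 6.0, 8.0]  # mock years to distant recurrence or censoring
events = [1, 1, 0, 1]         # 1 = distant recurrence observed, 0 = censored
print(harrell_c_index(times, events, [0.9, 0.7, 0.3, 0.1]))  # 1.0: perfectly ordered
print(harrell_c_index(times, events, [0.5, 0.5, 0.5, 0.5]))  # 0.5: uninformative
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which frames values such as 0.789 and 0.828.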

To aid clinical interpretation, we first defined categorical HECTOR risk groups as quartiles of the continuous risk scores in the training set. The first two quartiles were combined for simplicity because they had very similar clinical outcomes in the training set (distant recurrence-free probabilities of 98.1% and 95.8% by Kaplan–Meier analysis, respectively; Supplementary Fig. 3), and the resulting groups were then applied to the internal and external test sets. Second, we computed the hazard ratio (HR) of HECTOR using a Cox proportional hazards (CPH) model, with either the continuous or the categorical HECTOR risk score as the independent variable and time to distant recurrence as the dependent variable.
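The grouping rule can be sketched as fitting two cutoffs (the median and the 75th percentile) on the training scores and reusing them unchanged on every test set. This sketch assumes Python’s `statistics.quantiles` default (‘exclusive’) interpolation and inclusive upper boundaries (`<=`); the quantile convention used in the study is not stated here, and the scores are mock values.

```python
import statistics


def fit_risk_cutoffs(train_scores):
    """Cutoffs from training-set quartiles; Q1 and Q2 are merged into the low-risk group."""
    _q1, median, q3 = statistics.quantiles(train_scores, n=4)
    return median, q3


def assign_risk_group(score, cutoffs):
    """Map a continuous risk score to low / intermediate / high (boundaries are assumptions)."""
    median, q3 = cutoffs
    if score <= median:
        return "low"
    if score <= q3:
        return "intermediate"
    return "high"


train = list(range(1, 101))        # mock training-set risk scores
cutoffs = fit_risk_cutoffs(train)  # fitted once, then reused on the test sets
groups = [assign_risk_group(s, cutoffs) for s in train]
print(cutoffs, groups.count("low"), groups.count("intermediate"), groups.count("high"))
```

Freezing the cutoffs on the training set keeps the test-set group sizes free to differ from 50/25/25, as observed for the internal and external test sets.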

HECTOR showed strong prognostic value as a continuous variable in the training set (HR = 5.06; 95% CI: 4.35–5.89; P = 9.00 × 10⁻⁹⁹), the internal test set (HR = 2.69; 95% CI: 2.07–3.49; P = 1.31 × 10⁻¹³) and the UMCG external test set (HR = 5.84; 95% CI: 3.06–11.14; P = 8.37 × 10⁻⁸). On the internal test set, the 10-year distant recurrence-free probabilities for the HECTOR low- (n = 175), intermediate- (n = 82) and high- (n = 96) risk groups were 97.0% (95% CI: 0.930–0.988), 77.7% (95% CI: 0.670–0.854) and 58.1% (95% CI: 0.469–0.677), respectively (log rank P = 1.78 × 10⁻¹⁰; Fig. 2d). With the HECTOR low-risk group as the reference, the corresponding HRs for the HECTOR high- and intermediate-risk groups in the internal test set were 15.63 (95% CI: 6.58–37.13; P = 4.81 × 10⁻¹⁰) and 7.67 (95% CI: 3.06–19.22; P = 1.37 × 10⁻⁵), respectively. A similar stratification was observed in the UMCG external test set, with 5-year distant recurrence-free probabilities for the HECTOR low- (n = 102), intermediate- (n = 44) and high- (n = 14) risk groups of 93.9% (95% CI: 0.859–0.974), 91.4% (95% CI: 0.756–0.972) and 19.0% (95% CI: 0.0097–0.553), respectively (log rank P = 5.56 × 10⁻¹⁰; Supplementary Fig. 4). The corresponding HRs in the UMCG external test set were 2.26 (95% CI: 0.61–8.42; P = 0.225) for the HECTOR intermediate-risk group and 20.42 (95% CI: 5.92–70.50; P = 2.00 × 10⁻⁶) for the high-risk group.
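The 10-year and 5-year distant recurrence-free probabilities above are Kaplan–Meier estimates. A minimal product-limit sketch on mock data follows; confidence intervals and the log-rank test are omitted, and the function names are illustrative.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, S(t)) pairs."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = 0
        n_at_t = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            events_at_t += data[i][1]
            n_at_t += 1
            i += 1
        if events_at_t:
            surv *= 1.0 - events_at_t / at_risk
            curve.append((t, surv))
        at_risk -= n_at_t  # events and censorings at t leave the risk set afterwards
    return curve


def survival_at(curve, horizon):
    """Step-function lookup of S(horizon); 1.0 before the first event."""
    s = 1.0
    for t, value in curve:
        if t > horizon:
            break
        s = value
    return s


times = [1, 2, 3, 4, 5]   # mock years to distant recurrence or censoring
events = [1, 0, 1, 0, 1]  # 1 = distant recurrence, 0 = censored
curve = kaplan_meier(times, events)
print(curve)
print(survival_at(curve, 4))
```

Censored patients contribute to the risk set until they leave follow-up, which is why the estimate differs from a naive proportion of recurrences.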

Comparison with current prognostic gold standard

We compared the DL-based risk scores (that is, the one-arm, two-arm and HECTOR models) with the current standards for EC prognostication, comprising clinicopathological risk factors and the molecular EC classification, on the fivefold crossvalidation (Fig. 2a). For this, we compared C-indices by type of input required: (1) a ‘base’ CPH model including variables defined by pathologists from H&E images alone (histological subtype, grade and LVSI); (2) the base model plus anatomical stage; and (3) the base model plus anatomical stage and molecular EC class. Given the same H&E-based input data, the one- and two-arm models discriminated better than the base CPH model (C-index = 0.681; 95% CI: 0.624–0.738). HECTOR discriminated better than the base CPH model plus anatomical stage, which uses the same inputs (C-index = 0.716; 95% CI: 0.672–0.761), and as well as or better than the base CPH model plus anatomical stage and molecular EC class (C-index = 0.762; 95% CI: 0.732–0.791), which requires sequencing, immunohistochemistry (IHC) and expert pathology.

We further compared the prognostic value of HECTOR with that of current clinicopathological and molecular risk factors in multivariable analysis, using the continuous HECTOR risk score as an independent variable. HECTOR retained its prognostic value in a multivariable model in which the known risk factors (histological subtype, grade, LVSI, FIGO 2009 stage I–III, age and molecular class), combined into a single risk score (referred to as the CLINICAL risk score), were not prognostic (HECTOR HR = 4.62 (95% CI: 3.72–5.73; P = 5.02 × 10⁻⁴⁴) versus CLINICAL HR = 1.08 (95% CI: 0.90–1.30; P = 0.402)) (Fig. 2b). A similar multivariable analysis including the risk factors as individual variables showed independent prognostic value of HECTOR (HR = 5.26; 95% CI: 4.21–6.56; P = 2.30 × 10⁻⁴⁸), with only FIGO 2009 stage III disease retaining statistical significance (HR = 1.50; 95% CI: 1.05–2.14; P = 0.026) (Fig. 2c). Other known risk factors were no longer prognostic after inclusion of the HECTOR risk score, suggesting that they were captured by HECTOR. For instance, the POLEmut and p53abn molecular classes, derived from ground-truth sequencing and IHC, respectively (HR = 0.66 (95% CI: 0.26–1.69; P = 0.384) and HR = 0.90 (95% CI: 0.61–1.34; P = 0.616)), and histological factors such as LVSI (HR = 1.05; 95% CI: 0.77–1.42; P = 0.776) did not add significant prognostic value for the prediction of distant recurrence.
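The HRs above come from Cox proportional hazards models. As a hedged illustration restricted to a single covariate (the multivariable models in the text include several), the sketch below maximizes the Cox log partial likelihood by Newton-Raphson, using the Breslow approximation for tied times, and reports HR = exp(β) with a Wald 95% CI. The data are mock values and the function names are hypothetical; real analyses would normally rely on an established survival package.

```python
import math


def cox_score_info(beta, times, events, x):
    """Score and observed information of the one-covariate Cox log partial likelihood.

    Each event uses the risk set {j : t_j >= t_i} (Breslow handling of tied times).
    """
    grad = 0.0
    info = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        risk_set = [j for j in range(n) if times[j] >= times[i]]
        w = [math.exp(beta * x[j]) for j in risk_set]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk_set))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk_set))
        mean = s1 / s0
        grad += x[i] - mean
        info += s2 / s0 - mean ** 2
    return grad, info


def cox_fit(times, events, x, iters=50, tol=1e-10):
    """Newton-Raphson estimate of beta; returns HR, Wald 95% CI and beta."""
    beta = 0.0
    for _ in range(iters):
        grad, info = cox_score_info(beta, times, events, x)
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    _, info = cox_score_info(beta, times, events, x)
    se = 1.0 / math.sqrt(info)
    hr = math.exp(beta)
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return hr, ci, beta


times = [1, 2, 3, 4, 5, 6]   # mock follow-up times
events = [1, 1, 1, 1, 1, 1]  # all events observed
x = [1, 0, 1, 0, 1, 0]       # mock covariate, e.g. a binary risk indicator
hr, ci, beta = cox_fit(times, events, x)
```

Extending this to several covariates replaces the scalar score and information with a gradient vector and Hessian matrix, which is what multivariable CPH software does internally.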

Because the current prognostic gold standards classify p53abn EC as high risk and MMRd and NSMP EC as intermediate risk with heterogeneous outcomes, we assessed the capacity of HECTOR to refine prognosis within the MMRd, NSMP and p53abn molecular classes in the training and internal test sets. In particular, the HECTOR low-risk group identified about 5.3% (16 of 300) of p53abn EC cases in the entire dataset as having excellent prognosis (Supplementary Fig. 5). Along these lines, we estimated the number of patients whose risk classification differed markedly between HECTOR and the ESGO-ESTRO-ESP 2021 guidelines⁵, which combine clinicopathological and molecular factors (Supplementary Fig. 6). Among all patients with intermediate- to high-risk tumors according to the guidelines (and no reported distant recurrence), 48.2% (552 of 1,146) were predicted to be HECTOR low risk, as were 16.9% (62 of 366) of those with high-risk tumors only. Among all patients with guideline-based low- to high-intermediate-risk tumors, 11.2% (131 of 1,170) were predicted to be HECTOR high risk, as were 4.9% (14 of 287) when restricting to low-risk tumors only.

Performance with multiple WSIs

To evaluate the prognostic value and robustness of HECTOR in a second real-world external test set, we took advantage of the fact that most patients in the LUMC cohort had multiple tumor-containing H&E WSIs derived from different tissue blocks (121 of 151 patients had 3 WSIs, 21 had 2 and 9 had 1; Fig. 2e). This enabled us to validate the external performance of HECTOR in a diagnostic setting and to test its robustness to the choice of H&E WSI. The initial evaluation, using a HECTOR score derived from a single randomly selected WSI per patient and repeated 100 times, yielded a mean C-index of 0.802 (95% CI: 0.799–0.804) for prediction of distant recurrence on the LUMC external test set (Fig. 2f).

HECTOR performance and risk stratification improved slightly with the addition of further WSIs (taking the per-patient HECTOR risk score as either the mean or the median across WSIs), with C-indices of 0.810 (95% CI: 0.808–0.811) with up to 2 WSIs per patient and 0.813 or 0.815 with up to 3 WSIs (Fig. 2f). An alternative approach, in which the WSIs were combined into a single input bag of images, yielded a C-index of 0.805. Using the median HECTOR risk score per patient, the 5-year distant recurrence-free probabilities were 98.4% (95% CI: 0.891–0.998) for HECTOR low risk (n = 70), 74.8% (95% CI: 0.534–0.874) for HECTOR intermediate risk (n = 44) and 52.6% (95% CI: 0.323–0.694) for HECTOR high risk (n = 37; log rank P = 1.00 × 10⁻⁶) (Fig. 2g and Supplementary Fig. 7). The corresponding HR was 3.73 (95% CI: 2.34–5.96; P = 3.17 × 10⁻⁸) for the continuous HECTOR risk score and, for the categorical high- and intermediate-risk groups versus low risk, 34.51 (95% CI: 4.52–263.39; P = 6.37 × 10⁻⁴) and 15.08 (95% CI: 1.91–119.16; P = 0.010), respectively. Furthermore, HECTOR stratification of the LUMC external test set extended to overall survival (5-year probabilities of 88.4% (95% CI: 0.769–0.944), 69.9% (95% CI: 0.468–0.845) and 47.0% (95% CI: 0.289–0.633) for low, intermediate and high risk, respectively; Supplementary Fig. 8).
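The per-patient aggregation and the random single-WSI draw can be sketched as follows. The dictionary layout, function names and scores are hypothetical, and each draw would still need to be scored with a C-index to reproduce the repeated evaluation described above.

```python
import random
import statistics


def aggregate_patient_score(wsi_scores, how="median"):
    """Combine per-WSI risk scores into one per-patient score (mean or median)."""
    if how == "median":
        return statistics.median(wsi_scores)
    if how == "mean":
        return statistics.mean(wsi_scores)
    raise ValueError(f"unknown aggregation: {how}")


def random_single_wsi(patients, rng):
    """Draw one WSI score per patient, mimicking the single-slide evaluation."""
    return {pid: rng.choice(scores) for pid, scores in patients.items()}


patients = {"P1": [0.2, 0.4, 0.9], "P2": [1.3, 1.1], "P3": [-0.5]}  # mock per-WSI scores
per_patient = {pid: aggregate_patient_score(s) for pid, s in patients.items()}
rng = random.Random(0)
draws = [random_single_wsi(patients, rng) for _ in range(100)]  # 100 repeated draws
```

The median is less sensitive than the mean to a single outlying slide, which may be one reason to prefer it when blocks differ in tumor content.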

Potential confounding by intratumoral heterogeneity also appeared to be minimal: 85 of the 142 cases with more than 1 WSI had consistent HECTOR risk group predictions across WSIs, and only 3 cases with 3 WSIs had a different predicted HECTOR risk group for each WSI (Supplementary Figs. 9–12 and Supplementary Notes p16).
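The consistency tally can be expressed as a small function over each patient’s per-WSI risk groups. The labels and mock cases are illustrative. Note that for a patient with two discordant WSIs, ‘different for each WSI’ and ‘partly discordant’ coincide, so the count of 3 in the text refers to patients with 3 WSIs.

```python
from collections import Counter


def classify_consistency(groups):
    """Label one patient's per-WSI HECTOR risk groups (expects at least two WSIs)."""
    distinct = len(set(groups))
    if distinct == 1:
        return "consistent"
    if distinct == len(groups):
        return "different for each WSI"
    return "partly discordant"


cases = [
    ["low", "low", "low"],
    ["low", "intermediate"],
    ["low", "intermediate", "high"],
    ["high", "high", "intermediate"],
]  # mock per-WSI risk groups
tally = Counter(classify_consistency(g) for g in cases)
three_wsi_all_different = sum(
    1 for g in cases if len(g) == 3 and classify_consistency(g) == "different for each WSI"
)
```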

Association with prognostic factors and input contribution

DL prognostic models may provide information on the correlates or features that determine clinical outcome. Initial analysis of the internal test set by multiple linear regression (Fig. 3a,b) revealed that lower HECTOR risk scores were associated with established favorable risk factors (endometrioid (EEC) histological subtype, grade 1 and POLEmut EC), and higher HECTOR risk scores with unfavorable factors, including non-EEC histological subtypes, grade 3, FIGO stage III, LVSI, p53abn EC, estrogen receptor negativity and L1 cell adhesion molecule (L1CAM) positivity (Supplementary Tables 7–9 and Supplementary Fig. 13). MMRd EC, grade 2 and FIGO 2009 stage II were spread throughout the risk score axis and were not significantly associated with the HECTOR risk score.

**Fig. 3: HECTOR explainability by analysis of HECTOR risk score with prognostic factors and analysis of input contribution.**

For deeper explainability, we evaluated the impact of the H&E WSI, im4MEC and anatomical stage on the prediction, that is, whether each modality decreased (negative contribution) or increased (positive contribution) the HECTOR risk score for distant recurrence. In the internal test set, we used normalized Integrated Gradients (IG) values for the H&E WSIs and, for im4MEC and FIGO anatomical stage, the difference in predicted risk scores for the same case when that input was fixed to a given value. The H&E WSIs mainly had a positive contribution, with values increasing linearly alongside HECTOR risk scores (Fig. 3c and Supplementary Fig. 14). Contributions were also of higher magnitude in grade 3 EEC or non-EEC histological subtypes and in tumors with LVSI (Fig. 3d). Both observations may indicate that unfavorable morphological features captured in H&E WSIs are a strong driver of risk score predictions. The use of the image-based molecular class and FIGO 2009 stage I–III was consistent with domain expertise in EC: imPOLEmut and imMMRd mainly decreased and imp53abn strongly increased the HECTOR risk scores given accurate predictions (Fig. 3e, Supplementary Table 8 and Supplementary Fig. 15), and higher anatomical stage increased the HECTOR risk scores (Fig. 3f and Supplementary Fig. 16).
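Integrated Gradients attributes a prediction to its inputs by averaging gradients along a straight path from a baseline x′ to the input x: IG_i = (x_i − x′_i) ∫₀¹ ∂f(x′ + α(x − x′))/∂x_i dα. The sketch below applies this to a hypothetical toy model f(x) = tanh(w·x) with an analytic gradient (not HECTOR’s network), approximates the integral with a midpoint Riemann sum and checks the completeness property (attributions sum to f(x) − f(x′)). The normalization is an assumption, because the scheme is not specified in this section. The last helper mirrors the counterfactual difference described for im4MEC and stage, using a hypothetical additive stage model.

```python
import math


def toy_model(x, w):
    """Hypothetical differentiable risk model f(x) = tanh(w . x), standing in for a network."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def toy_grad(x, w):
    """Analytic gradient of toy_model with respect to its input x."""
    t = toy_model(x, w)
    return [(1.0 - t * t) * wi for wi in w]


def integrated_gradients(x, baseline, w, steps=200):
    """IG_i = (x_i - x'_i) * mean gradient along the straight path (midpoint Riemann sum)."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(toy_grad(point, w)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def normalize(attributions):
    """Assumed normalization: divide by the total absolute attribution."""
    scale = sum(abs(a) for a in attributions)
    return [a / scale for a in attributions] if scale else list(attributions)


def counterfactual_contribution(model, case, key, reference):
    """Change in risk when one discrete input (e.g. stage) is fixed to a reference value."""
    return model(case) - model({**case, key: reference})


w = [0.8, -0.5, 0.3]
x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]
ig = integrated_gradients(x, baseline, w)
gap = toy_model(x, w) - toy_model(baseline, w)  # completeness: sum(ig) should match this

stage_effect = {"I": 0.0, "II": 0.4, "III": 1.1}  # hypothetical additive stage effects


def stage_model(case):
    return stage_effect[case["stage"]] + case["wsi_score"]


contrib = counterfactual_contribution(stage_model, {"stage": "III", "wsi_score": 0.3}, "stage", "I")
```

Checking completeness is a cheap sanity test that the number of integration steps is sufficient before interpreting per-patch attributions.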

These analyses enabled us to dissect the data of the six patients with distant recurrence who were predicted as HECTOR low risk in the internal test set (Supplementary Table 10 and Supplementary Fig. 17). Experiments in which the image-based molecular class was replaced by the true molecular class showed that misclassification by im4MEC did not change the HECTOR risk group in these instances. Review of the single input WSI by an expert gynecopathologist revealed that, in at least two cases, the WSIs lacked unfavorable visual features that were described in the pathology report (substantial LVSI or high-grade tumor areas). We also noted three cases with a POLE mutation that were predicted as HECTOR high risk. Although the same experiment confirmed that the image-based molecular class had little or no effect on the HECTOR predictions in these instances, all three cases notably had FIGO 2009 stage II or III disease (Supplementary Table 11).

Morphological correlates of outcome risk

To identify the prognostic morphological features that HECTOR may have used, the top 5% of H&E WSI regions with the highest impact on the risk scores (decreasing and increasing) were extracted from the internal test set and reviewed by an expert gynecopathologist (Fig. 4a and Supplementary Figs. 18–22). Within the HECTOR low-risk group, the morphological features decreasing the risk score were smooth luminal borders, inflamed stroma and intraepithelial lymphocytes, intraepithelial neutrophils and abundant compact normal myometrium without tumor. Morphological features increasing the risk score in the HECTOR high-risk group were a ragged luminal tumor surface (also referred to as hobnailing), LVSI, solid tumor growth with marked nuclear atypia, desmoplastic stromal reaction and the presence of mitotic figures (Fig. 4a). Within the HECTOR low-risk group, we also observed, although less commonly, morphological features with a positive contribution, such as surface changes mimicking hobnailing, retraction artifacts mimicking LVSI, loose edematous myometrium mimicking desmoplasia and solid tumor growth with scattered high-grade nuclear atypia (Extended Data Fig. 3a).
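Selecting the top 5% of regions in each direction can be sketched as ranking per-patch contributions. The patch identifiers, the rounding of 5% to a whole number of patches and the sign filters (only negative contributions count as decreasing, only positive as increasing) are assumptions for illustration.

```python
def top_fraction_regions(patch_ids, contributions, fraction=0.05):
    """Return ids of the patches that most decrease and most increase the risk score."""
    k = max(1, round(fraction * len(contributions)))
    order = sorted(range(len(contributions)), key=lambda i: contributions[i])
    decreasing = [patch_ids[i] for i in order[:k] if contributions[i] < 0]
    increasing = [patch_ids[i] for i in reversed(order[-k:]) if contributions[i] > 0]
    return decreasing, increasing


ids = [f"patch_{i}" for i in range(100)]  # mock patch identifiers
contribs = [i - 50 for i in range(100)]   # mock per-patch contributions
dec, inc = top_fraction_regions(ids, contribs)
```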

**Fig. 4: Morphological features contributing to HECTOR risk scores.**

Mitotic activity, inflammatory cell density and tumor nuclear size were quantified using DL-based image analysis tools (Fig. 4b and Methods). More inflammatory cells were present in the top 5% regions decreasing the risk scores, and this effect was more pronounced in the HECTOR low-risk group (P = 0.011). Higher mitotic density and larger tumor nuclei were found in the top 5% regions in the HECTOR high-risk group (both P < 0.001). These results remained consistent across image-based molecular classes and FIGO 2009 stages I–III (Supplementary Figs. 23–25) and when restricting the analysis to regions containing tumor cells (Supplementary Fig. 26). In a quantitative spatial analysis, we computed the overlap of the top 5% regions with the tumor and invasive border areas (Extended Data Fig. 3b). This showed that the regions increasing the risk scores were drawn more from the tumor than from the invasive border area, whereas tumor and invasive border areas contributed almost equally to the regions decreasing the risk scores, notably in the HECTOR low-risk group.
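The overlap measure is not defined in this section. One simple, hypothetical version is the fraction of selected patches that fall inside each annotated area, sketched below with mock patch identifiers; the study’s actual computation (Extended Data Fig. 3b) may differ.

```python
def area_overlap(selected, tumor_area, border_area):
    """Fraction of selected patches lying in each annotated area (areas may overlap)."""
    selected = set(selected)
    if not selected:
        return {"tumor": 0.0, "invasive_border": 0.0}
    return {
        "tumor": len(selected & tumor_area) / len(selected),
        "invasive_border": len(selected & border_area) / len(selected),
    }


top_increasing = {"p1", "p2", "p3", "p4"}  # mock top 5% patches increasing risk
tumor = {"p1", "p2", "p3", "p7"}           # mock patches annotated as tumor
border = {"p3", "p4", "p8"}                # mock patches annotated as invasive border
overlap = area_overlap(top_increasing, tumor, border)
print(overlap)
```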

Genomic alterations, immune and transcriptional signatures

For a comprehensive analysis of the molecular correlates of HECTOR risk scores, we analyzed the TCGA-UCEC dataset (n = 381 FIGO stage I–III ECs) (Fig. 5 and Supplementary Fig. 27). Coding driver mutations in ARID1A, CTCF, CTNNB1, FGFR2, KRAS and PTEN were enriched in the HECTOR low-risk group (all P < 0.005), whereas PPP2R1A and TP53 mutations were more frequent in the HECTOR high-risk group (P = 2.19 × 10⁻³ and P = 2.81 × 10⁻⁷, respectively) (Fig. 5a and Supplementary Table 12). Using transcriptional data, we analyzed CIBERSORT-defined immune cell populations with multiple linear regression (Fig. 5b). Increasing HECTOR scores were positively correlated with memory B cells (P = 0.008), activated dendritic cells (P < 0.001) and resting mast cells (P = 0.029), and inversely correlated with CD8⁺ T cells (P < 0.001), follicular helper T cells (P < 0.001), regulatory T cells (P < 0.001) and activated natural killer (NK) cells (P = 0.049). Notably, these associations were independent of EC molecular class and tumor mutational burden (TMB) (Supplementary Table 13). Further transcriptomic analysis (Fig. 5c, Supplementary Fig. 27c and Supplementary Table 15) confirmed that the variation in immune cell populations was reflected in the differential expression of canonical immune cell markers, including CD1C, BTLA and CD40LG (enriched in HECTOR low-risk cases). HECTOR high-risk tumors also showed upregulation of genes predictive of worse outcomes in EC, including L1CAM and CLDN6, whereas HECTOR low-risk cases showed upregulation of genes associated with hormone signaling (C1orf64 and OVGP1).

**Fig. 5: Genomic and transcriptomic correlations of HECTOR risk groups using TCGA-UCEC (n = 381).**

Adjuvant chemotherapy response prediction by HECTOR

We investigated whether HECTOR could predict the benefit of chemotherapy on distant recurrence risk using the PORTEC-3 randomized trial³. In this trial, patients with high-risk stage I–III EC were randomized to external beam radiotherapy alone or to external beam radiotherapy with concurrent and adjuvant platinum- and paclitaxel-based chemotherapy. HECTOR risk scores were predicted for all PORTEC-3 cases with an available WSI (n = 442), including the patients who received chemotherapy (n = 225). Importantly, these 225 cases had not been used in either the training or the test sets (Extended Data Fig. 4, Supplementary Table 14 and Supplementary Fig. 28). Analysis of distant recurrence-free probabilities by treatment arm and HECTOR risk showed a statistically significant interaction between chemotherapy and the HECTOR risk score as a continuous variable (P_INTERACTION = 0.014), with a similar trend for the categorical variable (P_INTERACTION = 0.064).

We examined this in detail across HECTOR risk groups (Fig. 6a). Within HECTOR low- (n = 92) and HECTOR intermediate-risk (n = 177) groups, outcomes were similarly favorable in both treatment arms, as evidenced by similar probability of EC distant recurrence (log rank P = 0.244 and 0.807, respectively). In contrast, among women classified as HECTOR high risk (n = 173), those who received adjuvant chemotherapy had significantly improved distant recurrence-free probabilities compared with those treated with external beam radiotherapy alone (5-year distant recurrence-free probability of 62.2% (95% CI: 0.511–0.715) versus 42.0% (95% CI: 0.311–0.526); log rank P = 0.007; HR = 0.561 (95% CI: 0.366–0.862; P = 0.008)). Exploratory analysis suggested that the predictive accuracy was greater than that provided by prognostic factors currently used to identify patients with high-risk tumors who were likely to benefit from adjuvant chemotherapy, including serous histological subtype, FIGO 2009 stage III and the p53abn molecular class (Fig. 6b). Further exploratory analyses suggested that HECTOR also identified patients who benefited from adjuvant chemotherapy within the NSMP and MMRd molecular classes (Supplementary Figs. 29 and 30). These results remained consistent when sub-stratifying by the image-based molecular class arm of HECTOR (Supplementary Fig. 31). Thus, HECTOR demonstrated significant predictive utility that may exceed that offered by current methods.

**Fig. 6: Impact of the addition of adjuvant chemotherapy to external beam radiotherapy on distant recurrence in the PORTEC-3 randomized trial by HECTOR risk group.**

Discussion

HECTOR, a DL model trained and validated in 2,072 patients with stage I–III EC^3,26,27,28,29,30,31 with long-term follow-up, predicts postoperative distant recurrence risk using only H&E-stained tumor slide(s) of the hysterectomy specimen and anatomical stage. HECTOR obtained C-indices of 0.789, 0.828 and 0.815 for distant recurrence in three unseen test sets. Its performance is on a par with that of clinically implemented prognostic DL tools in other cancer types (C-indices of 0.714 and 0.744 for colorectal cancer recurrence³⁹ and an AUC of 0.78 for 10-year prostate cancer distant recurrence⁴⁰) and compares favorably with molecular prognostic assays such as OncotypeDX (C-index of 0.641 for 10-year breast cancer distant recurrence⁴¹). Notably, HECTOR outperformed the current diagnostic gold standard of combined pathological and molecular analysis for distant recurrence risk prediction, and was also found to be predictive of adjuvant chemotherapy benefit in the PORTEC-3 randomized trial³. Pending prospective validation, our results suggest that HECTOR could be a highly effective tool for individualized prognostication of women with EC, while delivering shorter turnaround times and reducing testing costs. HECTOR may also enable biomarker discoveries that improve targeted treatment decision-making.

HECTOR performance is the result of a new multimodal, integrative, three-arm architecture which leveraged prognostic information from the H&E WSI, the image-based molecular class from im4MEC¹¹ and anatomical stage³⁴. This multimodal architecture outperformed alternative DL models using only H&E-based information, corroborating other studies^16,42. It is interesting that nesting of the im4MEC model within HECTOR boosted the performance, in contrast to other studies where integration of copy number variation or transcriptomics did not improve prediction of overall survival in EC¹⁶. We demonstrated that the prognostic value of categorical clinical risk factors, such as the anatomical stage, can be learned end to end by the DL model to increase predictive accuracy. HECTOR takes a step toward integrating patient-level imaging, image-based molecular and clinical insights, which may benefit similar studies in other cancer types where unimodal DL models have been developed on images only^17,20,39.

Our preliminary investigations of model explainability and risk score correlates offer good prospects for improving our understanding of the biology of EC and other cancer types. For example, the association of HECTOR low-risk scores with immune cell infiltrates is consistent with data showing a better prognosis for immune-infiltrated EC¹⁰, although it is currently unclear whether HECTOR directly quantifies lymphocyte subtypes such as T cells from H&E WSIs. The upregulation of CLDN6 in HECTOR high-risk ECs is consistent with CLDN6 being a predictor of distant recurrence⁴³. HECTOR high-risk cases with CLDN6 upregulation could be actionable, because CLDN6 is a chimeric antigen receptor T cell target⁴⁴. Although a desmoplastic stromal reaction is known to predict poor prognosis in colorectal cancer, the association that we describe in the present study has not previously been reported in EC⁴⁵. Whether this represents a morphological readout of L1CAM overexpression⁴⁶ is presently unclear. We also confirmed that well-established, unfavorable histopathological risk factors in EC align with higher HECTOR risk scores⁵. Thus, we expect that HECTOR’s outperformance of standard histopathology is probably driven by the nonlinear combination of these factors and, more importantly, by the noncategorical processing of the visual information from the WSIs.

HECTOR’s design holds considerable promise for scaling to clinical implementation because it is built on two broadly available and cost-effective inputs routinely obtained in diagnostics: one H&E-stained tumor slide, from which the image-based (rather than the true) molecular class is derived, and high-level clinical information on tumor extension at diagnosis (to the cervix or beyond the uterus, excluding distant spread), which is independent of the evolving FIGO staging system⁹. After appropriate validation in a prospective clinical trial setting, HECTOR may have great potential to individualize the triage of women with EC in the adjuvant setting, from low to high risk of distant recurrence. Clinicians’ subsequent treatment decisions could be guided accordingly: a HECTOR low-risk prediction could provide a means to de-escalate adjuvant treatment, whereas a HECTOR high-risk prediction could support a recommendation for adjuvant systemic therapy (such as chemotherapy^3,4 or targeted therapies in clinical trials^47,48,49). Therapeutic guidance within the HECTOR high-risk group could be supported by selective, targeted molecular testing, such as testing for MMRd, or even by DL-based molecular predictions, provided that their accuracy is good¹¹. In addition to reducing under- and overtreatment of women with EC, which our data support, HECTOR could spare the challenges and expenses of resource-limited environments where molecular testing and expert pathologist review are difficult or not feasible. We speculate that future technical improvements of HECTOR could extend its inputs to consecutive digitized H&E-stained hysterectomy sections followed by three-dimensional reconstruction⁵⁰, routinely performed IHC-stained WSIs⁵¹, preoperative radiology images⁵² or a clinical report encoding patient-level clinical information⁵³. Moreover, DL-based assessment of the anatomical stage from histology images of cervical, ovarian and lymph node sections (or from radiology images of lymph nodes) would make HECTOR independent of pathology review.

Our study has several strengths. Our total cohort of 2,751 patients, including 3 randomized trials, makes this one of the largest DL-based prognostic studies in EC performed to date. Our state-of-the-art multimodal DL methodology allowed us to leverage prognostic information from multiple factors, including those beyond the H&E image alone. Expert pathology review and molecular profiling enabled us to benchmark our methodology against the current gold standard in risk stratification of EC. Our study also has limitations. Our current model, based on multiple instance learning, is unaware of the spatial relationship between regions and was not designed to leverage information across multiple WSIs, both of which may improve performance^54,55, although context-aware architectures were not found to improve performance in this task. In addition, the complex interactions of morphology, molecular class and anatomical stage may be further optimized by experimenting with other early-to-late fusion techniques⁴², or by learning more generalizable morpho-molecular representations using pretext tasks. Some patients in the study did not undergo surgical staging lymphadenectomy^26,27, which may have introduced some noise in the anatomical stage input and may explain the residual prognostic value of advanced disease (stage III) in multivariable analysis. Given that POLEmut ECs rarely metastasize⁵⁶, we acknowledge that HECTOR may overestimate the risk in these rare instances. Furthermore, not all morphological correlates observed in the H&E regions (for example, structural changes) were quantified in the present study, owing to the lack of labeled datasets that could have been used to train DL-based, EC-specific image analysis tools. Importantly, HECTOR performance needs further validation both in unselected cohorts more diverse than the ones of largely European ancestry that we examined and in prospective trials. Prospective validation will first be conducted in the PORTEC-4a trial⁵⁷. Moreover, as the therapeutic landscape of EC is rapidly evolving, the most suitable adjuvant systemic therapy for HECTOR high-risk patients needs to be continuously validated^4,58 or (prospectively) explored in other randomized trials^47,48,49,59.

In summary, validation and extension of HECTOR could help deliver precision medicine by advancing the prognostication of women with stage I–III EC who underwent primary surgery, improving both systemic therapy recommendation and treatment de-escalation worldwide.

Methods

Ethics statement

The PORTEC-1, PORTEC-2 (NCT00376844) and PORTEC-3 (NCT00411138) study protocols were approved by the Medical Ethical Committee Leiden, Den Haag, Delft and the medical ethics committees at participating centers. Studies were conducted in accordance with the principles of the Declaration of Helsinki. Ethical permission for the retrospective use of the clinical trial data and of the retrospective cohorts (TransPORTEC study and Medisch Spectrum Twente (MST)) was obtained from the Medical Ethical Committee Leiden (nos. B21.065 and B21.011), as was permission for the LUMC cohort (nWMO‐D4‐2023‐002); permission for the Danish cohort was obtained from the Center for Regional Udvikling, De Videnskabsetiske Komiteer (H-16025909). All study participants of the clinical trials provided informed consent. The ethical boards granted a waiver of informed consent for the other studies. For the UMCG cohort, the medical ethical committee granted permission for the use of the data and waived informed consent owing to the observational nature of the study.

Cohorts

We used formalin-fixed paraffin-embedded (FFPE) tumor material and clinicopathological data of patients with EC from three randomized trials and six clinical cohorts. We included study participants of the female sex, independent of gender identity.

The PORTEC-1 trial recruited 714 women with early-stage, intermediate-risk EC from 1990 to 1997 who, after primary surgery, were randomly assigned to pelvic external beam radiotherapy or no adjuvant treatment²⁶. The PORTEC-2 trial randomized 427 women with early-stage, high-intermediate-risk EC between 2000 and 2006 to external beam radiotherapy or vaginal brachytherapy²⁷. The PORTEC-3 randomized trial included 660 women with stage I–III high-risk EC between 2006 and 2013 and randomly allocated them to pelvic external beam radiotherapy alone or external beam radiotherapy combined with concurrent and adjuvant chemotherapy³. The retrospective TransPORTEC study included 116 high-risk EC tumors from international patients, selected with the same inclusion criteria as PORTEC-3, from 5 institutions (LUMC and UMCG, the Netherlands; University College London and St Mary’s Hospital, Manchester, UK; and Institute Gustave Roussy, Villejuif, France)²⁸. The prospective MST cohort included 257 patients with stage I–III high-risk EC, with the same inclusion criteria as PORTEC-3, who were treated between 1987 and 2015 at MST, Enschede, the Netherlands²⁹. The Danish cohort consisted of 451 patients with high-grade EC who were prospectively registered in the Danish gynecological cancer database³⁰. The UMCG cohort is a population-based cohort of 278 patients treated at the UMCG between 1984 and 2004, with follow-up data collected until 2010 (ref. ³¹). The LUMC cohort is a retrospectively collected, population-based cohort of 222 patients diagnosed and treated at the LUMC between 2012 and 2021. Finally, the publicly available TCGA-UCEC cohort³² of 529 patients was downloaded from cBioPortal^65,66.

Datasets

One representative H&E-stained slide of the hysterectomy specimen was included for each patient, depending on the availability of tumor material (Supplementary Figs. 1 and 2, and Supplementary Tables 1, 2 and 14). For the LUMC cohort, we collected three diagnostic H&E-stained tumor slides per patient, each from a different FFPE tumor tissue block. H&E slides were scanned at ×40 magnification using two scanners: the 3Dhistech P250 (resolution 0.19 µm per pixel) and the 3Dhistech P1000 (resolution 0.24 µm per pixel). All images provided in the manuscript are unprocessed scans. All WSIs underwent qualitative review by our expert pathologist, after which cases with no tumor, poor tissue quality or out-of-focus scanning were excluded, yielding 2,560 cases with at least one WSI per case (CONSORT chart in Supplementary Figs. 1 and 2).

In the present study, some cases were excluded from the supervised training of HECTOR based on the following criteria: (1) missing time to distant recurrence follow-up data, (2) FIGO 2009 stage IV³⁴, because these patients already have distant disease at diagnosis, and (3) treatment with adjuvant chemotherapy, because it may have lowered the risk of distant recurrence^3,4. The categorical anatomical stages I, II and III are defined following the FIGO 2009 classification³⁴ and represent a tumor confined to the uterus (stage I), a tumor spreading to the cervical stroma (stage II) or a tumor spreading to the vagina, adnexa, pelvis and lymph nodes (stage III) at diagnosis. Distant recurrence in the adjuvant setting was defined as any recurrence outside the pelvis and therefore included abdominal metastasis and para-aortic lymph node metastasis. Time to distant recurrence was measured from randomization (PORTEC-1, -2 and -3) or from the date of primary surgery (MST, TransPORTEC study, Danish, UMCG and LUMC cohorts) to the date of diagnosis of metastasis, or to the date of last follow-up or death in patients without metastasis. We also stress that adjuvant chemotherapy was not the standard of care when the clinical cohorts were collected and that the vast majority of patients treated with adjuvant chemotherapy originated from the PORTEC-3 randomized trial (n = 225).
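The exclusion criteria and the definition of time to distant recurrence can be expressed as a small filter. The sketch below is illustrative only: the `Case` record and its field names are hypothetical, and converting days to years with 365.25 days per year is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Case:
    # Hypothetical record layout; field names are illustrative only.
    cohort: str
    figo2009_stage: str             # "I", "II", "III" or "IV"
    adjuvant_chemo: bool
    randomization: Optional[date]   # start date for PORTEC-1/-2/-3
    surgery: Optional[date]         # start date for all other cohorts
    metastasis: Optional[date]      # date of distant recurrence, if any
    last_contact: Optional[date]    # last follow-up or death

TRIALS = {"PORTEC-1", "PORTEC-2", "PORTEC-3"}

def time_to_distant_recurrence(c: Case):
    """Return (years, event) or None when follow-up data are missing."""
    start = c.randomization if c.cohort in TRIALS else c.surgery
    end = c.metastasis if c.metastasis is not None else c.last_contact
    if start is None or end is None:
        return None
    years = (end - start).days / 365.25  # assumption: Julian-year conversion
    return years, int(c.metastasis is not None)

def eligible(c: Case) -> bool:
    # Criteria (1)-(3): follow-up available, not stage IV, no adjuvant chemotherapy.
    return (time_to_distant_recurrence(c) is not None
            and c.figo2009_stage != "IV"
            and not c.adjuvant_chemo)
```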

Following the aforementioned criteria, 2,072 cases were included for the supervised train–test split: 584 from PORTEC-1 (ref. ²⁶), 395 from PORTEC-2 (ref. ²⁷), 217 from PORTEC-3 (ref. ³), 67 from the TransPORTEC study²⁸, 226 from the MST cohort²⁹, 272 from the Danish cohort³⁰, 160 from the UMCG cohort³¹ and 151 from the LUMC cohort. Then we held out one internal test set and two external test sets, all representing an unselected population. The internal test set was obtained by randomly sampling 20% of the supervised training set, stratified by discrete time intervals and censorship status to ensure the presence of enough events across time (n = 353, of which 116 were from PORTEC-1, 100 from PORTEC-2, 43 from PORTEC-3, 13 from the TransPORTEC study, 35 from the MST cohort and 46 from the Danish cohort; median follow-up of 8.45 years with 62 events). The first external test set is the UMCG cohort (n = 160 patients; 5.32-year median follow-up time with 14 events). The second external test set is the LUMC cohort (n = 151 patients: 121 with 3 WSIs, 21 with 2 WSIs and 9 with 1 WSI; 2.90-year median follow-up time with 24 events). Finally, the remaining 1,408 WSIs were used for supervised training of HECTOR (468 from PORTEC-1, 295 from PORTEC-2, 174 from PORTEC-3, 54 from the TransPORTEC study, 191 from the MST cohort and 226 from the Danish cohort; median follow-up of 7.77 years with 246 events).
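A 20% hold-out stratified by discrete time interval and censorship status could look like the following numpy sketch. Using quartiles of event times as interval edges and rounding the per-stratum sample size are assumptions; the function name `stratified_holdout` is hypothetical.

```python
import numpy as np

def stratified_holdout(times, events, test_frac=0.2, n_bins=4, seed=0):
    """Hold out a test set stratified by (time interval, censorship status).
    Interval edges are quartiles of event times (an assumption)."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    cuts = np.quantile(times[events == 1], np.linspace(0, 1, n_bins + 1)[1:-1])
    strata = np.digitize(times, cuts) * 2 + events  # one label per (bin, status)
    rng = np.random.default_rng(seed)
    test = []
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        n_test = int(round(test_frac * len(idx)))
        test.extend(rng.choice(idx, size=n_test, replace=False).tolist())
    test = np.sort(np.array(test, dtype=int))
    train = np.setdiff1d(np.arange(len(times)), test)
    return train, test
```

Sampling within each stratum keeps the proportion of events, and of late follow-up, similar between the training and internal test sets.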

In addition, the HECTOR risk scores were predicted on the previously excluded, chemotherapy-treated cases from the PORTEC-3 randomized trial³ (n = 225), as well as the patients with stages I–III from TCGA-UCEC (n = 381).

For self-supervised learning, we used only the 1,408 WSIs already reserved for supervised training, thereby excluding all cases in the internal and external test sets. The self-supervised training set was further enriched with cases of any disease stage whose treatment or distant recurrence outcome data were unknown (n = 454: 31 from the TransPORTEC study, 5 from the MST cohort, 16 from the Danish cohort and 402 from TCGA-UCEC), resulting in 1,862 cases for self-supervised learning.

Performance evaluation

Hyperparameter optimization and model comparisons (including architecture choices for patch representation learning with self-supervised learning) were evaluated on the supervised downstream task using the C-index metric³³ (with tau = 10 years, computed with the scikit-survival Python package (v.0.17.2)). To this end, fivefold crossvalidation was performed on the 1,408 WSIs reserved for supervised training. The best-performing architecture and hyperparameters were selected based on the highest mean C-index over the five folds. The final model, referred to as HECTOR, was then retrained on the full training set and evaluated on the internal and the two external test sets (UMCG and LUMC). The cumulative AUC³⁷ and Brier scores³⁸ were additionally computed.
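To illustrate what a C-index truncated at tau measures, the sketch below computes an unweighted, Harrell-type concordance restricted to pairs whose earlier time is an observed event before tau. It is a simplified stand-in, not the authors’ code: the exact estimator used is not stated here, and scikit-survival’s `concordance_index_ipcw` (which accepts a `tau` argument) additionally applies inverse-probability-of-censoring weights.

```python
import numpy as np

def c_index_truncated(times, events, risk, tau=10.0):
    """Fraction of comparable pairs ordered correctly by risk.
    A pair (i, j) is comparable if i has an event before tau and j is
    still event-free after i's event time. Ties in risk count 0.5."""
    times = np.asarray(times, float)
    events = np.asarray(events, bool)
    risk = np.asarray(risk, float)
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i] or times[i] >= tau:
            continue
        later = times > times[i]
        comparable += int(later.sum())
        concordant += float((risk[i] > risk[later]).sum())
        concordant += 0.5 * float((risk[i] == risk[later]).sum())
    return concordant / comparable if comparable else float("nan")
```

A value of 1.0 means that every patient who recurred earlier was assigned a higher risk score than every patient who was still recurrence free at that time.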

Because the LUMC external test set contains up to three WSIs per case, as opposed to one in the internal test set and the UMCG external test set, we performed multiple experiments to derive patient-level risk scores. First, we randomly selected one WSI per case and repeated this experiment 100×, yielding a mean C-index and CI. Second, we randomly selected up to two WSIs per case when available, averaged the two risk scores per patient and repeated this 100×. Third, we used all available WSIs of the external test set (up to three per case) and computed the mean and the median of the two or three risk scores. In an additional experiment, we combined each patient’s WSIs by merging the patch features from all available WSIs into a single feature bag.
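The random-sampling experiments can be sketched as follows, assuming slide-level risk scores are already available per patient. The function names are hypothetical; each repetition’s patient-level scores would then be passed to a C-index routine to obtain the mean and CI over the 100 repetitions.

```python
import numpy as np

def patient_scores(slide_scores, k=None, agg=np.mean, rng=None):
    """slide_scores: dict patient_id -> list of slide-level risk scores.
    k=None uses every available WSI; otherwise up to k WSIs are sampled."""
    if rng is None:
        rng = np.random.default_rng()
    out = {}
    for pid, scores in slide_scores.items():
        scores = np.asarray(scores, float)
        if k is not None and len(scores) > k:
            scores = rng.choice(scores, size=k, replace=False)
        out[pid] = float(agg(scores))
    return out

def repeated_sampling(slide_scores, k, n_rep=100, seed=0):
    """One dict of patient-level scores per repetition."""
    rng = np.random.default_rng(seed)
    return [patient_scores(slide_scores, k=k, rng=rng) for _ in range(n_rep)]
```

Passing `agg=np.median` with `k=None` corresponds to the median-of-all-WSIs variant.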

WSI preprocessing

WSI segmentation was performed using Otsu thresholding. Nonoverlapping 180 × 180 µm patches were extracted and resized to 256 × 256 pixels. On average, this procedure generated a bag of 10,185 patches per WSI.
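A minimal numpy sketch of Otsu-based tissue detection followed by nonoverlapping 180 µm tiling is shown below. At full resolution, 180 µm spans about 947 pixels on the P250 (0.19 µm per pixel) and 750 pixels on the P1000 (0.24 µm per pixel). Thresholding a grayscale, downsampled thumbnail, treating darker pixels as tissue and the 50% tissue-fraction cutoff are all assumptions rather than details from the text.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's threshold for an 8-bit grayscale image; pixels <= t form class 0."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                 # probability of class 0
    mu = np.cumsum(p * np.arange(256))   # cumulative mean of class 0
    mu_t = mu[-1]
    denom = omega * (1.0 - omega)
    valid = denom > 1e-12                # skip empty classes
    sigma_b = np.zeros(256)
    sigma_b[valid] = (mu_t * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

def tile_grid(tissue_mask, mask_mpp, patch_um=180.0, min_tissue=0.5):
    """Top-left corners (mask pixels) of nonoverlapping tiles that are mostly tissue."""
    step = int(round(patch_um / mask_mpp))
    h, w = tissue_mask.shape
    coords = []
    for y in range(0, h - step + 1, step):
        for x in range(0, w - step + 1, step):
            if tissue_mask[y:y + step, x:x + step].mean() >= min_tissue:
                coords.append((y, x))
    return coords

# Usage (hypothetical thumbnail): mask = gray <= otsu_threshold(gray)
```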

Vision transformer-based patch representational learning

We followed advances in self-supervised learning by adopting vision transformer-based DL models capable of learning fine-grained, patch-level representations at multiple resolutions. For this, we trained EsVIT⁶⁰ and compared it with CtransPath⁶⁷, an alternative model trained on the histopathology domain (Supplementary Table 3). We modified the originally proposed four-stage, Swin transformer-based architecture⁶⁸ of EsVIT to capture cell- and region-level tissue information and to fit our computational resources. The patch size of stage 1 was doubled to 8 pixels to reduce the sequence length and to increase the field of view to capture cell-level views. In stages 2–4, we kept the twofold feature map merging rate and resized the input images to 256 × 256 pixels instead of 224 × 224 pixels to avoid a feature map size at stage 4 that is not an integer. Finally, the number of stacked transformer blocks in stage 3 was reduced from six to four, and the other stages were kept at two. The first embedding dimension remained unchanged at 96, as did the number of attention heads per stage (3, 6, 12 and 24; Supplementary Table 4).
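The reason for the 256 × 256 input follows from simple arithmetic: with 8-pixel patches and twofold patch merging after each stage, a 224-pixel input yields a 3.5-token grid at stage 4. The dictionary below restates the configuration given in the text; its key names are illustrative, not EsVIT configuration keys.

```python
# Stage configuration described in the text (key names are illustrative).
MODIFIED_ESVIT = {
    "patch_size": 8,              # doubled from Swin's default of 4
    "img_size": 256,
    "embed_dim": 96,
    "depths": (2, 2, 4, 2),       # stage 3 reduced from 6 to 4 blocks
    "num_heads": (3, 6, 12, 24),
}

def stage_grids(img_px, patch_px, n_stages=4):
    """Token-grid side length per stage: patch embedding, then 2x patch merging."""
    side = img_px / patch_px
    grids = []
    for _ in range(n_stages):
        grids.append(side)
        side /= 2
    return grids

for img in (224, 256):
    g = stage_grids(img, MODIFIED_ESVIT["patch_size"])
    status = "integer at every stage" if all(s.is_integer() for s in g) else "non-integer"
    print(img, g, status)
# 224 [28.0, 14.0, 7.0, 3.5] non-integer
# 256 [32.0, 16.0, 8.0, 4.0] integer at every stage
```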

A dataset of 3,702,447 patches was curated by randomly extracting up to 2,000 patches (180 µm, resized to 256 × 256 pixels) per WSI from the 1,862 WSIs designated for self-supervised learning. The modified EsVIT was then trained on 3 Nvidia RTX 8000 graphics processing units (GPUs) with a batch size of 128 for 100 epochs and a window size of 14 to encourage learning of long-range dependencies between patches. To improve performance, we also used the view- and region-level DINO (self-distillation with no labels) prediction heads, with no weight normalization, layers frozen during the first epoch and the default output dimension of 65,536 (ref. ⁶⁰). Following the EsVIT authors’ recommendations for a smaller batch size, we increased the teacher momentum to 0.9996 and started with an initial teacher temperature of 0.04. The teacher temperature was lowered halfway through training from 0.04 to 0.02 to further decrease the loss. We optimized with AdamW using default parameters and the default schedules for the learning rate (linear warm-up for ten epochs followed by a cosine scheduler to 1 × 10⁻⁶) and weight decay (cosine scheduler from 0.04 to 0.4). Data augmentation was applied exactly as in the original publication⁶⁰.
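The learning-rate and weight-decay schedules can be sketched with a per-iteration cosine scheduler, reimplemented here in the style of the helper found in the public DINO code base that EsVIT builds on. The base learning rate of 5 × 10⁻⁴ and the 10 iterations per epoch in the usage lines are hypothetical placeholders; only the warm-up length, final values and weight-decay range come from the text.

```python
import numpy as np

def cosine_scheduler(base_value, final_value, epochs, niter_per_ep,
                     warmup_epochs=0, start_warmup_value=0.0):
    """Per-iteration schedule: optional linear warm-up, then cosine decay."""
    warmup_iters = warmup_epochs * niter_per_ep
    warmup = np.linspace(start_warmup_value, base_value, warmup_iters)
    iters = np.arange(epochs * niter_per_ep - warmup_iters)
    cosine = final_value + 0.5 * (base_value - final_value) * (
        1 + np.cos(np.pi * iters / len(iters)))
    return np.concatenate([warmup, cosine])

def teacher_temp(epoch, total_epochs=100):
    """Teacher temperature lowered from 0.04 to 0.02 halfway through training."""
    return 0.04 if epoch < total_epochs // 2 else 0.02

# Hypothetical usage: base LR and iterations per epoch are placeholders.
lr = cosine_scheduler(5e-4, 1e-6, epochs=100, niter_per_ep=10, warmup_epochs=10)
wd = cosine_scheduler(0.04, 0.4, epochs=100, niter_per_ep=10)
```

Despite its name, the weight-decay "cosine scheduler" increases from 0.04 to 0.4, because the same formula is applied with a final value larger than the base value.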

After the training was completed, the patch-level features were extracted from the attention heads of the stacked transformers at each stage. For our downstream task, we observed an improvement by extracting the last 8 blocks compared with the default last 4 mentioned in the publication⁶⁰, yielding feature vectors of size 3,456 (Supplementary Table 3).
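The feature size of 3,456 is consistent with average-pooling the output of each of the last 8 transformer blocks and concatenating them, given the depths (2, 2, 4, 2) and the channel width doubling from 96 at each stage. The concatenation scheme is an assumption inferred from that arithmetic.

```python
def features_from_last_blocks(n_last, embed_dim=96, depths=(2, 2, 4, 2)):
    """Concatenated feature size when pooling the last n_last transformer blocks."""
    widths = []
    for stage, depth in enumerate(depths):
        widths += [embed_dim * 2 ** stage] * depth   # 96, 192, 384, 768
    return sum(widths[-n_last:])

print(features_from_last_blocks(8))  # 3456 = 2*192 + 4*384 + 2*768
print(features_from_last_blocks(4))  # 2304 = 2*384 + 2*768
```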

Multimodal DL prognostic model

To build the multimodal model for distant recurrence prediction task, ablation studies were first performed using the H&E WSI modality only (referred to as H&E-based, one-arm model) followed by integrating the image-based molecular classes derived from the H&E-based predictions of im4MEC¹¹ (referred to as two-arm model) and the categorical stage (hence referred to as HECTOR). This section describes HECTOR with Supplementary Table 5 summarizing the architecture and training parameters, whereas ‘Ablation studies’ provides further details about some training experiments and the choice of the architecture.

The H&E-based, one-arm model takes as input the bag of 180-µm patch-level features of size 3,456 extracted with EsVIT⁶⁰, where the number of patches per bag varies. To train the attention-based multiple instance learning (AttentionMIL) model on time-to-event data with a batch size of one, the time scale was discretized into four intervals based on the quartiles of the event-time distribution of uncensored patients, and the negative log-likelihood survival loss (−log(likelihood loss)) was used⁶¹.
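The discretization and the discrete-time survival loss can be illustrated with a per-slide numpy sketch, assuming the widely used formulation in which each output unit is a hazard for one interval (sigmoid of the logit) and survival is the cumulative product of (1 − hazard); whether ref. 61 uses exactly this form is an assumption. Defining the risk score as the negative sum of the survival curve is one common convention and is also an assumption here.

```python
import numpy as np

def interval_cuts(times, events, n_bins=4):
    """Inner cut points: quartiles of event times among uncensored patients."""
    t = np.asarray(times, float)[np.asarray(events, bool)]
    return np.quantile(t, np.linspace(0, 1, n_bins + 1)[1:-1])

def nll_survival_loss(logits, y, censored, eps=1e-7):
    """Discrete-time negative log-likelihood for one slide.
    logits: (K,) one per interval; y: interval index; censored: 1 if no event."""
    hazards = 1.0 / (1.0 + np.exp(-np.asarray(logits, float)))
    surv = np.cumprod(1.0 - hazards)
    surv_pad = np.concatenate([[1.0], surv])   # surv_pad[k]: event-free through interval k-1
    if censored:
        return -np.log(max(surv_pad[y + 1], eps))  # event-free through interval y
    return -(np.log(max(surv_pad[y], eps)) + np.log(max(hazards[y], eps)))

def risk_score(logits):
    """Negative sum of the predicted survival curve (higher = riskier)."""
    hazards = 1.0 / (1.0 + np.exp(-np.asarray(logits, float)))
    return -float(np.cumprod(1.0 - hazards).sum())

# Interval label for a patient: y = np.digitize(time, interval_cuts(times, events))
```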

Within the AttentionMIL model, we observed a slight performance increase after adding another WSI preprocessing step. Specifically, WSI morphological information was spatially and semantically compressed by averaging highly correlated, nearby patch-level features, using an L2 (spatial distance) threshold of three patches and a cosine similarity threshold of 0.8. This step reduced the bag of features from 10,185 patches on average to 1,723 at 180 µm (Supplementary Table 3). Each averaged patch-level feature is compressed by three Fully Connected layers gradually down to 512. The attention module computes attention scores on latent features reduced to 256 before pooling, resulting in a slide-level embedding of size 512.
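The compression step and the attention pooling can be sketched in numpy. The greedy merging order below is a hypothetical variant (the text gives only the two thresholds), and the gated form of the attention (tanh branch multiplied by a sigmoid branch, as in gated attention-based MIL) is an assumption; the three Fully Connected layers that reduce 3,456 features to 512 are omitted, so the pooling function takes already-reduced 512-dimensional instances.

```python
import numpy as np

def compress_bag(coords, feats, max_dist=3.0, min_cos=0.8):
    """Greedily average nearby, highly similar patch features (hypothetical order).
    coords: (N, 2) grid positions in patch units; feats: (N, D)."""
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    unassigned = np.ones(len(feats), bool)
    merged = []
    for i in range(len(feats)):
        if not unassigned[i]:
            continue
        near = np.linalg.norm(coords - coords[i], axis=1) <= max_dist
        similar = unit @ unit[i] >= min_cos
        members = unassigned & near & similar   # always includes i itself
        merged.append(feats[members].mean(axis=0))
        unassigned[members] = False
    return np.stack(merged)

def gated_attention_pool(h, V, U, w):
    """h: (N, 512) instance embeddings; V, U: (512, 256); w: (256,).
    Returns the attention-weighted slide embedding (512,) and the weights."""
    gate = np.tanh(h @ V) * (1.0 / (1.0 + np.exp(-(h @ U))))
    scores = gate @ w
    a = np.exp(scores - scores.max())   # numerically stable softmax
    a /= a.sum()
    return a @ h, a
```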

To leverage the well-established prognostic value of the molecular class (here, the image-based class derived from the H&E-based predictions of im4MEC¹¹) and of the categorical FIGO 2009 stage (I, II and III), and given that the AttentionMIL model computes an H&E slide-level embedding from the patches, we experimented with intermediate-to-late fusion to integrate the slide-level, image-based molecular class and the patient-level anatomical stage information with the H&E slide-level embedding. We encoded each categorical risk factor into a higher-dimensional vector space with a learnable Embedding layer of size 16, followed by an ELU activation function and one Fully Connected layer of size 8. Next, following ref. ¹⁶, a gating-based attention mechanism with a bilinear product was applied to the embeddings from the different modalities to weight the importance of each modality. To capture all interactions while retaining the unimodal embeddings, a constant 1 was appended to each attention-weighted embedding before fusion using the Kronecker product³⁵. Importantly, to use the image-based molecular class as an input modality for HECTOR, we retrained the im4MEC model on the training set designed for the present study. This avoided information leakage, because some cases used to train the original im4MEC model were used for testing or validation in the present study.
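The encoding and fusion steps can be sketched in numpy with random placeholder weights. The gate below is a simplified form of bilinear gating (a sigmoid of a bilinear product between two modality vectors rescales the first vector); which modality conditions which, and any projection before gating, are assumptions not specified in the text.

```python
import numpy as np

def elu(x):
    return np.where(x > 0, x, np.expm1(x))

def encode_category(idx, emb_table, W, b):
    """Embedding lookup (size 16) -> ELU -> Fully Connected layer (size 8)."""
    return elu(emb_table[idx]) @ W + b

def bilinear_gate(x1, x2, A, bias):
    """Rescale x1 by sigmoid(x1^T A_o x2 + b_o) for each output unit o.
    A: (len(x1), len(x1), len(x2)). A simplified, assumed gating form."""
    z = np.einsum("i,oij,j->o", x1, A, x2) + bias
    return x1 * (1.0 / (1.0 + np.exp(-z)))

def kronecker_fusion(*vecs):
    """Append 1 to each vector, then take the Kronecker product, so unimodal
    terms and all pairwise and three-way interactions are retained."""
    out = np.ones(1)
    for v in vecs:
        out = np.kron(out, np.concatenate([v, [1.0]]))
    return out
```

The fused vector has size equal to the product of (dᵢ + 1) over the modality dimensions dᵢ; for example, two scalars a and b fuse to (ab, a, b, 1).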

The final multimodal embedding was further reduced by using two Fully Connected layers of size 256 and 128 before the survival categorical head of a Fully Connected layer with output size as the number of discrete time intervals. Each Fully Connected layer in the architecture was followed by a dropout of 0.25 and a ReLU activation function.

HECTOR was trained for 24 epochs with an initial learning rate of 3 × 10⁻⁵ decayed by a factor of 10 at epochs 2, 5 and 15. The Adam optimizer was used with default parameters and a weight decay of 1 × 10⁻⁵. HECTOR was also developed by adapting sections of open access repositories^11,16,21.
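The step decay can be written as a one-line function. It assumes 0-indexed epochs with the scheduler stepped once per epoch, which matches the behavior of PyTorch’s `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2, 5, 15], gamma=0.1)`; whether HECTOR counted epochs this way is an assumption.

```python
import bisect

def step_lr(epoch, base_lr=3e-5, milestones=(2, 5, 15), gamma=0.1):
    """Learning rate at a 0-indexed epoch, decayed 10x at each milestone reached."""
    return base_lr * gamma ** bisect.bisect_right(milestones, epoch)

for e in (0, 2, 5, 15, 23):
    print(e, step_lr(e))
```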

Ablation studies

To first find the optimal architecture for predicting distant recurrence from the H&E modality (one-arm model), three state-of-the-art WSI classification architectures were adapted to our distant recurrence prediction task: AttentionMIL²², a Graph Attention Network following ref. ¹⁵ with a radius of up to 32 connected patch nodes, and a transformer architecture following ref. ²³. The latter two architectures were adapted from their open access repositories. Both were trained on the same feature bags extracted using EsVIT, with a batch size of one and the same discrete survival loss (−log(likelihood loss)). The AttentionMIL architecture yielded a higher C-index than the Graph Attention Network and the transformer in this prognostic task while having far lower computational complexity (Supplementary Table 3), which corroborates the findings of ref. ¹⁵ for TCGA-UCEC.

To incorporate the image-based molecular class predicted by im4MEC from the H&E WSIs, experiments included: (1) transfer learning, in which the AttentionMIL backbone was pretrained toward the molecular class and subsequently fine-tuned on the prognostic task; (2) multitask learning, in which a second training objective was added to predict the image-based molecular class in addition to prognosis; and (3) fusion of the image-based molecular class derived from the frozen im4MEC model (extracted either from an intermediate layer or as the final predicted categorical class, followed by an Embedding layer and attention gate). In experiment 2, a second classification head was implemented and trained using the weighted sum of the survival loss (−log(likelihood loss)) and the cross-entropy classification loss. The weight factor was treated as a hyperparameter and optimized using fivefold crossvalidation. Experiment 3, in which the predicted categorical class was included using an Embedding layer and attention gate, resulted in the highest mean C-index (Supplementary Table 3).

Experiments on fusing the stage category notably included training with the extended FIGO 2009 taxonomy or with a reduced three-class taxonomy (I, II and III), each followed by an Embedding layer and attention gate; the latter achieved the highest C-index (Supplementary Table 3).
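Collapsing FIGO 2009 substages (IA, IB, II, IIIA, IIIB, IIIC1, IIIC2) into the three-class taxonomy is a simple mapping; the function below is a hypothetical helper that rejects stage IV, because such cases were excluded from training.

```python
def collapse_figo2009(substage: str) -> str:
    """Map a FIGO 2009 (sub)stage label to the three-class taxonomy I / II / III."""
    s = substage.strip().upper()
    if not s.startswith("IV"):
        for roman in ("III", "II", "I"):   # test the longest prefix first
            if s.startswith(roman):
                return roman
    raise ValueError(f"unsupported stage for this model: {substage!r}")
```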

Association with clinicopathological data analysis

We performed separate univariable linear regression analyses using the HECTOR continuous risk scores as the dependent variable and each clinicopathological variable as the regressor. Statistical tests were two sided, with statistical significance accepted for P values <0.050. Regression coefficients and exact P values are reported in Supplementary Table 7.

Input contribution

The IG method⁶³ was used to measure the contribution of the WSI and to identify the patches within a WSI that were relevant to the prediction of the hazard function. Given the discrete time intervals, IG scores were averaged over the four output neuron targets. The IG baseline for feature missingness consisted of patch-level features derived from white patches. All IG scores were normalized per patient to the range −1 to +1 while preserving their sign and zero values, and were further averaged to obtain a WSI-level IG score. A positive IG value (toward +1) means that the patch contributed to increasing the risk score, whereas a negative value means that it contributed to decreasing it. Representative patches were selected once by an expert pathologist from among the top 5% of patches increasing and the top 5% decreasing the risk score for each case.
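The IG computation, the sign-preserving normalization and the top-5% selection can be sketched on a toy differentiable model. The midpoint Riemann approximation, the toy logistic model and the rounding of the 5% count are assumptions; in HECTOR the baseline would be features of white patches and attributions would be averaged over the four interval outputs.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=64):
    """Midpoint Riemann approximation of IG along the straight path baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

def signed_normalize(scores):
    """Scale to [-1, 1] by the maximum absolute value; signs and zeros are kept."""
    m = np.abs(scores).max()
    return scores / m if m > 0 else scores

def top_fraction(scores, frac=0.05):
    """Indices of the most risk-increasing and most risk-decreasing patches."""
    k = max(1, int(round(frac * len(scores))))
    order = np.argsort(scores)
    return order[-k:][::-1], order[:k]
```

A useful check is completeness: the IG scores of a sample sum approximately to the change in model output between the baseline and the input.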

The contributions of the image-based molecular class predicted by im4MEC and of the FIGO stage were calculated by setting the stage and image-based molecular class inputs to a chosen value (the ‘reference group’) and computing the resulting difference in predicted risk scores. As with the IG method, a positive or negative difference indicates a positive or negative contribution to the risk score, respectively.

Cell-level composition

As part of the explainability analysis of HECTOR, to quantify visual features of extracted patches with high contributions, we first used the cell segmentation and classification DL model Hover-Net¹⁴, retrained on EC-specific WSIs¹¹, to obtain inflammatory cell counts. Then, mitotic figures were detected with a pan-cancer DL-based detector⁶⁴ that was fine-tuned on EC tissue for the purpose of the present study. Fine-tuning was performed by extending the original training set⁶⁹ with additional data points that we internally annotated in 10 WSIs from the PORTEC datasets, selected to cover the variability of EC histological types. Region-level inflammatory and mitotic activity densities were defined as absolute counts normalized by the area in square millimeters and were further averaged over the number of regions to obtain a patient-level density value. The size of tumor nuclei was reported in mm² and averaged per patient. The statistical association between the HECTOR risk scores and the patient-level quantity of visual features was tested with linear regressions within the regions of interest, that is, the regions with either a negative or a positive contribution. Statistical tests were two sided, with statistical significance accepted for P values <0.050. The coefficients of the linear regressions and exact P values were as follows: coefficient −0.0109 (95% CI: −0.019 to −0.002), P = 0.011, for the patient-level inflammatory density within the negative regions; coefficient 0.0447 (95% CI: 0.033–0.057), P = 1.96 × 10⁻¹², for the patient-level mitotic density within the positive regions; and coefficient 377.916 (95% CI: 297.677–458.155), P = 3.10 × 10⁻¹⁹, for the patient-level tumor nuclei area within the positive regions.
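The patient-level density computation is simple arithmetic. The sketch assumes that each region is one 180 × 180 µm patch (0.0324 mm²); the text does not state the region size, so the constant is an assumption.

```python
import numpy as np

PATCH_UM = 180.0
PATCH_AREA_MM2 = (PATCH_UM / 1000.0) ** 2   # 0.0324 mm^2, assuming region = one patch

def patient_density(counts_per_region, region_area_mm2=PATCH_AREA_MM2):
    """Per-region count / area (per mm^2), averaged over the patient's regions."""
    counts = np.asarray(counts_per_region, float)
    return float((counts / region_area_mm2).mean())

# Example: 3 mitoses on average per 180-um region is about 92.6 per mm^2.
```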

Outcome analysis

Analysis of distant recurrence-free probabilities was conducted according to the Kaplan–Meier method and the two-sided log rank test, with statistical significance accepted for P < 0.050. Cutoffs for the HECTOR risk groups were defined by taking the quantiles (25%, 50% and 75%) of the distribution of HECTOR risk scores in the training set only. In the training set, the first two groups (<25% and between 25% and 50%) did not show any major difference in prognosis and were therefore merged into one group named the HECTOR low-risk group. As a result, we defined the HECTOR low-risk group as cases with a risk score below the median risk score of the training set, the HECTOR intermediate-risk group as those with a risk score between the median and the third quartile of the training set, and the HECTOR high-risk group as those with a risk score greater than the third quartile of the training set. These same cutoff values were applied to the unseen internal test set, the UMCG and LUMC external test sets, and the TCGA-UCEC and PORTEC-3 cohorts.
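The fixed cutoffs and a product-limit (Kaplan–Meier) estimate can be sketched in numpy. Assigning a score exactly equal to a cutoff to the higher group is an assumption (the text does not specify tie handling), and the log rank test is omitted.

```python
import numpy as np

def risk_cutoffs(train_scores):
    """Median and third quartile of the training-set risk scores only."""
    return np.quantile(train_scores, [0.5, 0.75])

def assign_groups(scores, cutoffs):
    """low: < median; intermediate: median to Q3; high: >= Q3 (ties go up)."""
    labels = np.array(["low", "intermediate", "high"])
    return labels[np.digitize(scores, cutoffs)]

def kaplan_meier(times, events):
    """Distinct event times and the product-limit survival estimate at each."""
    t = np.asarray(times, float)
    e = np.asarray(events, bool)
    uniq = np.unique(t[e])
    surv, s = [], 1.0
    for u in uniq:
        at_risk = np.sum(t >= u)
        d = np.sum((t == u) & e)
        s *= 1.0 - d / at_risk
        surv.append(s)
    return uniq, np.array(surv)
```

Because the cutoffs are computed once on the training set and then frozen, test-set group sizes need not be 50%, 25% and 25%.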

To compare the DL model performance with well-established clinicopathological risk factors, we fitted CPH models on these clinicopathological risk factors in EC and calculated the corresponding C-index. First, we used risk factors that can be visually assigned on histological slides: the histological subtype, the grade and LVSI. Then we added the FIGO 2009 stage I–III variable. Finally, we included the molecular class of EC (POLEmut, MMRd, NSMP and p53abn). To maintain consistency within validation sets in the fivefold crossvalidation and the internal test sets, missing molecular class (115 out of 1,408 in crossvalidation and 38 out of 353 in the internal test set) was imputed using mean substitution.

To estimate HECTOR’s prognostic value relative to the clinicopathological risk factors, we computed HRs using CPH models with the HECTOR continuous risk scores. For these analyses, we included all cases with a complete set of clinicopathological and molecular risk factors (n = 1,254). First, in a multivariable analysis, we corrected the HECTOR risk scores for all clinicopathological risk factors combined into one risk score. To this end, a CPH model was first fitted to these clinicopathological risk factors. The derived risk scores, referred to as ‘clinical’, were then calculated as the linear combination of the CPH coefficients and the variables. In the second analysis, we corrected HECTOR’s continuous risk scores for histological subtype, grade, LVSI, stage and molecular class and, in addition, for L1CAM and age as continuous data in a multivariable analysis.

The histological subtype categorical variable was encoded as two comparisons: grade 3 EEC versus low-grade EEC (reference group) and non-EEC versus EEC (reference group). The reference groups were NSMP for the molecular class and stage I for the FIGO 2009 stage variable.
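The indicator coding with these reference groups, and the ‘clinical’ score as a linear combination of CPH coefficients and covariates, can be sketched as follows. The variable names, the binary LVSI coding and any coefficient values passed in are hypothetical; the actual coefficients come from the fitted CPH model.

```python
import numpy as np

def encode(histotype, grade3, molecular_class, stage, lvsi):
    """Indicator coding with the reference groups stated in the text.
    histotype: 'EEC' or 'non-EEC'; grade3 only matters for EEC.
    LVSI as a single 0/1 indicator is an assumption."""
    row = {
        "grade3_EEC": float(histotype == "EEC" and grade3),   # vs low-grade EEC
        "non_EEC": float(histotype == "non-EEC"),              # vs EEC
        "POLEmut": float(molecular_class == "POLEmut"),        # vs NSMP
        "MMRd": float(molecular_class == "MMRd"),
        "p53abn": float(molecular_class == "p53abn"),
        "stage_II": float(stage == "II"),                       # vs stage I
        "stage_III": float(stage == "III"),
        "LVSI": float(lvsi),
    }
    return np.array(list(row.values())), list(row.keys())

def linear_predictor(x, beta):
    """'Clinical' risk score: sum of CPH coefficients times covariates."""
    return float(np.dot(x, beta))
```

A patient in every reference group (low-grade EEC, NSMP, stage I, no LVSI) has an all-zero covariate vector and therefore a clinical score of 0.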

All statistical tests were two sided with statistical significance accepted for P values <0.050.

Genomic and transcriptomic correlation analysis

To analyze the frequency of driver mutations by HECTOR risk group, the genomic features were extracted from ref. ⁷⁰ using MC3 MAF (mutation annotation format) data. The mutational status of the top 19 oncogenic drivers in EC was downloaded from cBioPortal^65,66 and annotated by OncoKB⁷¹. The proportions of cases with oncogenic mutations were compared between HECTOR risk groups using two-sided χ² tests for each individual gene, with P < 0.050 accepted as significant. Exact P values and sample sizes are reported in Supplementary Table 12.
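For one gene, the comparison across the three HECTOR risk groups is a χ² test on a 3 × 2 table (risk group by mutated or wild type), which has 2 degrees of freedom. For exactly 2 degrees of freedom, the χ² survival function reduces to exp(−x/2), so the P value needs no special functions; for general tables, `scipy.stats.chi2_contingency` is the usual tool. The counts in the usage line are made up for illustration.

```python
import math
import numpy as np

def chi2_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    obs = np.asarray(table, float)
    expected = obs.sum(1, keepdims=True) * obs.sum(0, keepdims=True) / obs.sum()
    stat = float(((obs - expected) ** 2 / expected).sum())
    dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return stat, dof

def chi2_sf_dof2(stat):
    """P value for a chi-squared statistic with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2.0)

# Made-up counts: rows = low / intermediate / high risk, cols = mutated / wild type.
stat, dof = chi2_independence([[10, 90], [20, 80], [30, 70]])
print(stat, dof, chi2_sf_dof2(stat))   # 12.5 2 ~0.00193
```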

The association between the HECTOR continuous risk scores and each immune cell subset was assessed using the log₂-transformed proportion of the immune cell subset, expressed as a fraction of the whole tumor using the leukocyte fraction values. Linear regressions were performed with the HECTOR continuous risk scores as the independent variable. In addition, we tested the associations after correcting for molecular class and TMB as additional independent variables. Two-sided P values <0.050 were considered significant. Regression coefficients and exact P values are reported in Supplementary Table 13.
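
The regression above can be sketched with ordinary least squares. The study used statsmodels; this version uses numpy and SciPy's t distribution to stay dependency-light. The sketch rests on several assumptions:

- The immune-subset proportion is measured within leukocytes and is rescaled to the whole tumor by multiplying it with the leukocyte fraction.
- A small pseudocount avoids log₂(0); the value 1e-4 is a hypothetical choice.
- The outcome below is simulated; the function names are illustrative.

```python
import numpy as np
from scipy import stats


def log2_tumor_fraction(subset_proportion, leukocyte_fraction, pseudocount=1e-4):
    """Scale an immune-subset proportion (within leukocytes) to the whole tumor and take log2."""
    fraction = np.asarray(subset_proportion, float) * np.asarray(leukocyte_fraction, float)
    return np.log2(fraction + pseudocount)  # hypothetical pseudocount avoids log2(0)


def ols_with_pvalues(y, covariates):
    """Ordinary least squares with an intercept; returns coefficients and two-sided P values."""
    y = np.asarray(y, float)
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = residuals @ residuals / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    pvalues = 2 * stats.t.sf(np.abs(beta / se), dof)
    return beta, pvalues


# Simulated example: 200 tumors whose (already log2-transformed) outcome rises with the score.
rng = np.random.default_rng(0)
hector = rng.normal(size=200)
tmb = rng.lognormal(size=200)
y = 1.0 + 2.0 * hector + rng.normal(scale=0.5, size=200)

beta_uni, p_uni = ols_with_pvalues(y, hector)                            # HECTOR only
beta_adj, p_adj = ols_with_pvalues(y, np.column_stack([hector, tmb]))    # adjusted for TMB
print(beta_uni[1], p_uni[1], beta_adj[1], p_adj[1])
```

Molecular class would be added the same way, as reference-coded indicator columns next to TMB.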

Messenger RNA sequencing (mRNA-seq) and clinical data from TCGA-UCEC were downloaded from firebrowse.org. Differentially expressed genes between HECTOR high-risk and HECTOR low-risk cases were identified with DESeq2 (ref. ⁷²) (v.1.40.1). Genes were accepted as differentially expressed if their likelihood ratio test P value, adjusted with the Benjamini–Hochberg false discovery rate (FDR) procedure, was <0.050 (Supplementary Table 15).
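
DESeq2 reports the Benjamini–Hochberg-adjusted P values itself (the `padj` column). For illustration, the adjustment can be sketched in a few lines of numpy.

- This follows the same convention as R's `p.adjust(method = "BH")`.
- The raw P values and the helper name `benjamini_hochberg` are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini–Hochberg adjusted P values (step-up FDR control)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)  # p_(i) * n / i
    # Step-up: running minimum from the largest P value downwards, capped at 1.
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(n)
    adjusted[order] = adjusted_sorted            # back to the input order
    return adjusted


# Hypothetical raw likelihood ratio test P values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
padj = benjamini_hochberg(raw)
accepted = padj < 0.050
print(padj, accepted)
```

The third gene shows the step-up rule at work. Its scaled value (0.065) is lowered to that of the next gene (0.05125). Even so, it does not pass the 0.050 threshold.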

Analysis of adjuvant chemotherapy effect

We predicted the HECTOR risk scores for the patients in the PORTEC-3 (ref. ³) treatment arm who received concurrent and adjuvant chemotherapy (n = 225) and who had therefore been excluded from training and all test sets. The effect of adjuvant chemotherapy combined with external beam radiotherapy, compared with external beam radiotherapy alone, was analyzed in three ways:

(1) comparing distant recurrence-free probabilities by treatment arm, stratified by HECTOR risk group, and measuring the group-wise treatment effect with the Kaplan–Meier method and the two-sided log-rank test and/or the HR of the treatment variable in a univariable Cox model;
(2) calculating the statistical significance of the interaction term between the HECTOR continuous risk scores and the binary treatment variable; and
(3) calculating the statistical significance of the interaction term between the HECTOR high-risk group and the binary treatment variable (corrected for the HECTOR intermediate-risk group, with the HECTOR low-risk group as the reference group).

Each interaction term was defined as the HECTOR risk score (continuous or categorical) multiplied by the binary treatment variable, and its statistical significance was measured in a multivariable Cox regression analysis. Similar analyses were performed to test the interaction between the binary chemotherapy treatment variable and three other factors:

- serous histological subtype (corrected for EEC and clear cell histological subtype);
- FIGO 2009 stage III (corrected for stages I–II);
- p53abn (corrected for MMRd, with NSMP as the reference group and POLEmut tumors removed to reach convergence).
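
The interaction analysis in (2) can be sketched with a small Cox model fitted from scratch. This is an illustrative re-implementation under stated assumptions, not the study's Lifelines code.

- Ties are handled with the Breslow approximation, the fit uses Newton–Raphson, and P values are Wald tests.
- The data come from a simulated trial with a known interaction.
- `fit_cox` and the simulated variables are hypothetical.

The interaction coefficient describes how the treatment log HR changes per unit of risk score: at score s, the treatment HR is exp(β_treat + β_int · s). For analysis (3), the score column would be replaced by intermediate-risk and high-risk indicator columns, with the interaction formed from the high-risk indicator.

```python
import math

import numpy as np


def fit_cox(time, event, X, max_iter=50, tol=1e-9):
    """Cox proportional hazards fit (Breslow ties) by Newton-Raphson.

    Returns the coefficients, their standard errors and two-sided Wald P values.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    X = np.asarray(X, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = np.exp(X @ beta)
        grad = np.zeros_like(beta)
        hess = np.zeros((beta.size, beta.size))
        for i in np.flatnonzero(event):
            at_risk = time >= time[i]            # risk set at the i-th event time
            w_r, X_r = w[at_risk], X[at_risk]
            s0 = w_r.sum()
            x_bar = w_r @ X_r / s0               # risk-weighted covariate mean
            grad += X[i] - x_bar
            hess -= (X_r.T * w_r) @ X_r / s0 - np.outer(x_bar, x_bar)
        step = np.linalg.solve(hess, -grad)      # Newton step on the log partial likelihood
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    se = np.sqrt(np.diag(np.linalg.inv(-hess)))  # observed information from the last iteration
    p = np.array([math.erfc(abs(z) / math.sqrt(2.0)) for z in beta / se])
    return beta, se, p


# Simulated trial: 1,000 patients, continuous risk score, 1:1 treatment, true interaction -0.6.
rng = np.random.default_rng(42)
n = 1000
score = rng.normal(size=n)
treat = rng.integers(0, 2, size=n)
X = np.column_stack([score, treat, score * treat])        # last column: interaction term
true_beta = np.array([0.8, -0.3, -0.6])
event_time = rng.exponential(1.0 / np.exp(X @ true_beta))  # hazard exp(x . beta)
censor_time = rng.exponential(2.0, size=n)
time = np.minimum(event_time, censor_time)
event = event_time <= censor_time

beta, se, p = fit_cox(time, event, X)
print("interaction log HR:", beta[2], "P =", p[2])
```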

All statistical tests were two sided with statistical significance accepted with P values <0.050.

Software and packages

EsVIT and HECTOR were implemented with PyTorch (v.1.8.1 and v.1.10.0, respectively). IG was implemented with the Captum Python package (v.0.6.0), metrics such as the C-index with the scikit-survival Python package (v.0.17.2), CPH models and the Kaplan–Meier method with the Lifelines Python package (v.0.27.1), χ² tests with the SciPy Python package (v.1.5.2), boxplot visualizations with the altair Python package (v.4.2.0) and linear regression with the statsmodels Python package (v.0.13.5). Differential expression analysis was performed using DESeq2 (v.1.40.1)⁷² and R v.4.3.0 (2023-04-21 ucrt). Additional packages for image processing included the OpenSlide Python package (v.1.1.2), OpenCV (v.4.3.0.36) and Pillow (v.7.2.0). Annotations were done with QuPath (v.0.4.1).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The tumor material and datasets generated during or analyzed in the present study are not publicly available owing to restrictions by privacy laws. Data and tumor material from PORTEC-1, PORTEC-2, PORTEC-3, MST and the TransPORTEC study are held by the PORTEC study group and the international TransPORTEC consortium. Data and tumor material from the Danish cohort are held by the coauthor of this article, G.Ø. Data and tumor material from the UMCG cohort are held by the coauthors of this article, H.W.N. and M.d.B., and from the LUMC by the coauthors N.H. and T.B. Requests for sharing of all data and material should be addressed to the corresponding author within 15 years of the date of publication of this article and include a scientific proposal. Depending on the specific research proposal, the TransPORTEC consortium (PORTEC-3 and TransPORTEC study) or the PORTEC study group (PORTEC-1, PORTEC-2 and MST) or coauthors G.Ø., H.W.N. and M.d.B., or N.H. and T.B., will determine when, for how long, for which specific purposes and under which conditions the requested data can be made available, subject to ethical consent. Requests for data access will be processed within a 3-month timeframe. TCGA-UCEC images, mutational status and clinical data are publicly available via the cBioPortal^65,66 for Cancer Genomics at https://www.cbioportal.org/study/clinicalData?id=ucec_tcga_pan_can_atlas_2018. The mRNA-seq data of the TCGA-UCEC were downloaded from http://firebrowse.org/?cohort=UCEC.

Code availability

The code base is available at https://github.com/AIRMEC/HECTOR.

Change history

01 July 2024
A Correction to this paper has been published: https://doi.org/10.1038/s41591-024-03126-z

References

Crosbie, E. J. et al. Endometrial cancer. Lancet 399, 1412–1428 (2022).
Ørtoft, G., Lausten-Thomsen, L., Høgdall, C., Hansen, E. S. & Dueholm, M. Lymph-vascular space invasion (LVSI) as a strong and independent predictor for non-locoregional recurrences in endometrial cancer: a Danish Gynecological Cancer Group Study. J. Gynecol. Oncol. 30, e84 (2019).
de Boer, S. M. et al. Adjuvant chemoradiotherapy versus radiotherapy alone in women with high-risk endometrial cancer (PORTEC-3): patterns of recurrence and post-hoc survival analysis of a randomised phase 3 trial. Lancet Oncol. 20, 1273–1285 (2019).
Hogberg, T. et al. Sequential adjuvant chemotherapy and radiotherapy in endometrial cancer—results from two randomised studies. Eur. J. Cancer 46, 2422–2431 (2010).
Concin, N. et al. ESGO/ESTRO/ESP guidelines for the management of patients with endometrial carcinoma. Int. J. Gynecol. Cancer 31, 12–39 (2021).
Abu-Rustum, N. et al. Uterine neoplasms, version 1.2023, NCCN Clinical Practice Guidelines in Oncology. J. Natl Compr. Cancer Netw. 21, 181–209 (2023).
Oaknin, A. et al. Endometrial cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Ann. Oncol. 33, 860–877 (2022).
Harkenrider, M. M. et al. Radiation therapy for endometrial cancer: an American Society for Radiation Oncology clinical practice guideline. Pract. Radiat. Oncol. 13, 41–65 (2023).
Berek, J. S. et al. FIGO staging of endometrial cancer: 2023. Int. J. Gynecol. Obstet. 162, 383–394 (2023).
Horeweg, N. et al. Prognostic integrated image-based immune and molecular profiling in early-stage endometrial cancer. Cancer Immunol. Res. 8, 1508–1519 (2020).
Fremond, S. et al. Interpretable deep learning model to predict the molecular classification of endometrial cancer from haematoxylin and eosin-stained whole-slide images: a combined analysis of the PORTEC randomised trials and clinical cohorts. Lancet Digit. Health 5, e71–e82 (2023).
Lafarge, M. W. & Koelzer, V. H. Towards computationally efficient prediction of molecular signatures from routine histology images. Lancet Digit. Health 3, e752–e753 (2021).
Sirinukunwattana, K. et al. Image-based consensus molecular subtype (imCMS) classification of colorectal cancer using deep learning. Gut 70, 544–554 (2021).
Graham, S. et al. Hover-Net: simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 58, 101563 (2019).
Lee, Y. et al. Derivation of prognostic contextual histopathological features from whole-slide images of tumours via graph deep learning. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-022-00923-0 (2022).
Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878.e6 (2022).
Wulczyn, E. et al. Interpretable survival prediction for colorectal cancer using deep learning. NPJ Digit. Med. 4, 71 (2021).
Yao, J., Zhu, X., Jonnagaddala, J., Hawkins, N. & Huang, J. Whole slide images based cancer survival prediction using attention guided deep multiple instance learning networks. Med. Image Anal. 65, 101789 (2020).
Chen, R. J. et al. Whole slide images are 2D point clouds: context-aware survival prediction using patch-based graph convolutional networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention Vol. 12908 (eds de Bruijne, M. et al.) 339–349 (Springer Cham, 2021).
Courtiol, P. et al. Deep learning-based classification of mesothelioma improves prediction of patient outcome. Nat. Med. 25, 1519–1525 (2019).
Chen, R. J. et al. Multimodal co-attention transformer for survival prediction in gigapixel whole slide images. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 3995–4005 (IEEE, 2021); https://ieeexplore.ieee.org/document/9710773
Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In Proc. of the 35th International Conference on Machine Learning Vol. 80 (eds Dy, J. & Krause, A.) 2127–2136 (PMLR, 2018).
Wagner, S. J. et al. Transformer-based biomarker prediction from colorectal cancer histology: a large-scale multicentric study. Cancer Cell 41, 1650–1661.e4 (2023).
Using AI to improve the molecular classification of brain tumors. Nat. Med. 29, 793–794 (2023).
Jiménez-Sánchez, D. et al. Weakly supervised deep learning to predict recurrence in low-grade endometrial cancer from multiplexed immunofluorescence images. NPJ Digit. Med. 6, 48 (2023).
Creutzberg, C. L. et al. Surgery and postoperative radiotherapy versus surgery alone for patients with stage-1 endometrial carcinoma: multicentre randomised trial. PORTEC study group. post operative radiation therapy in endometrial carcinoma. Lancet 355, 1404–1411 (2000).
Nout, R. A. et al. Vaginal brachytherapy versus pelvic external beam radiotherapy for patients with endometrial cancer of high-intermediate risk (PORTEC-2): an open-label, non-inferiority, randomised trial. Lancet 375, 816–823 (2010).
Stelloo, E. et al. Refining prognosis and identifying targetable pathways for high-risk endometrial cancer; a TransPORTEC initiative. Mod. Pathol. 28, 836–844 (2015).
Jobsen, J. J. et al. Outcome of endometrial cancer stage IIIA with adnexa or serosal involvement only. Obstet. Gynecol. Int. 2011, 962518 (2011).
Ørtoft, G. et al. Location of recurrences in high-risk stage I endometrial cancer patients not given postoperative radiotherapy: a Danish gynecological cancer group study. Int. J. Gynecol. Cancer 29, 497–504 (2019).
Workel, H. H. et al. CD103 defines intraepithelial CD8⁺ PD1⁺ tumour-infiltrating lymphocytes of prognostic significance in endometrial adenocarcinoma. Eur. J. Cancer 60, 1–11 (2016).
Kandoth, C. et al. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013).
Uno, H., Cai, T., Pencina, M. J., D’Agostino, R. B. & Wei, L. J. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat. Med. 30, 1105–1117 (2011).
Pecorelli, S. Revised FIGO staging for carcinoma of the vulva, cervix, and endometrium. Int. J. Gynaecol. Obstet. 105, 103–104 (2009).
Zadeh, A., Chen, M., Poria, S., Cambria, E. & Morency, L.-P. Tensor fusion network for multimodal sentiment analysis. In Proc. 2017 Conference on Empirical Methods in Natural Language Processing 1103–1114 (Association for Computational Linguistics, 2017).
Mormont, R., Geurts, P. & Maree, R. Multi-task pre-training of deep neural networks for digital pathology. IEEE J. Biomed. Health Inform. 25, 412–421 (2021).
Lambert, J. & Chevret, S. Summary measure of discrimination in survival models based on cumulative/dynamic time-dependent ROC curves. Stat. Methods Med. Res. 25, 2088–2102 (2016).
Graf, E., Schmoor, C., Sauerbrei, W. & Schumacher, M. Assessment and comparison of prognostic classification schemes for survival data. Stat. Med. 18, 2529–2545 (1999).
Pai, R. K. et al. Quantitative pathologic analysis of digitized images of colorectal carcinoma improves prediction of recurrence-free survival. Gastroenterology 163, 1531–1546.e8 (2022).
Esteva, A. et al. Prostate cancer therapy personalization via multi-modal deep learning on randomized phase III clinical trials. NPJ Digit. Med. 5, 71 (2022).
Pece, S. et al. Comparison of StemPrintER with Oncotype DX recurrence score for predicting risk of breast cancer distant recurrence after endocrine therapy. Eur. J. Cancer 164, 52–61 (2022).
Jaume, G. et al. Modeling dense multimodal interactions between biological pathways and histology for survival prediction. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
Kojima, M. et al. Aberrant claudin-6-adhesion signaling promotes endometrial cancer progression via estrogen receptor α. Mol. Cancer Res. 19, 1208–1220 (2021).
Mackensen, A. et al. CLDN6-specific CAR-T cells plus amplifying RNA vaccine in relapsed or refractory solid tumors: the phase 1 BNT211-01 trial. Nat. Med. https://doi.org/10.1038/s41591-023-02612-0 (2023).
Ueno, H. et al. Prognostic value of desmoplastic reaction characterisation in stage II colon cancer: prospective validation in a phase 3 study (SACURA trial). Br. J. Cancer 124, 1088–1097 (2021).
Corrado, G. et al. Endometrial cancer prognosis correlates with the expression of L1CAM and miR34a biomarkers. J. Exp. Clin. Cancer Res. 37, 139 (2018).
Mirza, M. R. et al. Dostarlimab for primary advanced or recurrent endometrial cancer. N. Engl. J. Med. 388, 2145–2158 (2023).
Makker, V. et al. Lenvatinib plus pembrolizumab for advanced endometrial cancer. N. Engl. J. Med. 386, 437–448 (2022).
Eskander, R. N. et al. Pembrolizumab plus chemotherapy in advanced endometrial cancer. N. Engl. J. Med. 388, 2159–2170 (2023).
Kiemen, A. L. et al. Tissue clearing and 3D reconstruction of digitized, serially sectioned slides provide novel insights into pancreatic cancer. Med 4, 75–91 (2023).
Foersch, S. et al. Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat. Med. 29, 430–439 (2023).
Braman, N. et al. Deep orthogonal fusion: multimodal prognostic biomarker discovery integrating radiology, pathology, genomic, and clinical data. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2021 (eds de Bruijne, M. et al.) 667–677 (Springer, 2021).
Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).
Jaume, G., Song, A. H. & Mahmood, F. Integrating context for superior cancer prognosis. Nat. Biomed. Eng. 6, 1323–1325 (2022).
Song, A. H. et al. Analysis of 3D pathology samples using weakly supervised AI. Cell 187, 2502–2520.e17 (2024).
León-Castillo, A. et al. Molecular classification of the PORTEC-3 trial for high-risk endometrial cancer: impact on prognosis and benefit from adjuvant therapy. J. Clin. Oncol. 38, 3388–3397 (2020).
van den Heerik, A. S. V. M. et al. PORTEC-4a: international randomized trial of molecular profile-based adjuvant treatment for women with high–intermediate risk endometrial cancer. Int. J. Gynecol. Cancer 30, 2002–2007 (2020).
Kuoppala, T. et al. Surgically staged high-risk endometrial cancer: randomized study of adjuvant radiotherapy alone vs. sequential chemo-radiotherapy. Gynecol. Oncol. 110, 190–195 (2008).
RAINBO Research Consortium. Refining adjuvant treatment in endometrial cancer based on molecular features: the RAINBO clinical trial program. Int. J. Gynecol. Cancer 33, 109–117 (2022).
Li, C. et al. Efficient self-supervised vision transformers for representation learning. In International Conference on Learning Representations (ICLR, 2022); https://openreview.net/forum?id=fVu3o-YUGQK
Zadeh, S. G. & Schmid, M. Bias in cross-entropy-based training of deep survival networks. IEEE Trans. Pattern Anal. Mach. Intell. 43, 3126–3137 (2021).
Höhn, A. K. et al. 2020 WHO classification of female genital tumors. Geburtshilfe Frauenheilkd. 81, 1145–1153 (2021).
Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Proc. of the 34th International Conference on Machine Learning Vol. 70 (eds Precup, D. & Teh, Y. W.) 3319–3328 (PMLR, 2017).
Lafarge, M. W. & Koelzer, V. H. in Mitosis Domain Generalization and Diabetic Retinopathy Analysis (eds Sheng, B. & Aubreville, M.) 226–233 (Springer Nature Switzerland, 2023).
Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).
Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6, pl1 (2013).
Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).
Liu, Z. et al. Swin transformer: hierarchical vision transformer using shifted windows. In Proc. of the IEEE/CVF International Conference on Computer Vision (ICCV) 9992–10002 (IEEE, 2021); https://ieeexplore.ieee.org/document/9710580
Aubreville, M. et al. MItosis DOmain Generalization Challenge 2022. Zenodo https://doi.org/10.5281/zenodo.6362337 (2022).
Thorsson, V. et al. The immune landscape of cancer. Immunity 48, 812–830.e14 (2018).
Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. https://doi.org/10.1200/PO.17.00011 (2017).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).


Acknowledgements

This work was supported by a translational research project grant from the Hanarth Foundation and the Swiss Federal Institutes of Technology (strategic focus area of personalized health and related technologies; grant no. 2021-367) and a grant from the Promedica Foundation (no. F-87701–41–01) during the conduct of the study. The PORTEC-1, PORTEC-2 and PORTEC-3 trials were funded by grants from the Dutch Cancer Society (DCS; grant nos. CKTO 90–01, CKTO 2001–04 and CKTO 2006–04, respectively). We first and foremost thank the participants in these studies and those who donated a tumor sample for translational research. We are grateful to the international and local investigators and the data management teams who recruited and followed the women who participated in these studies, to the many pathologists who collected samples for the PORTEC-1, PORTEC-2 and PORTEC-3 randomized trial biobanks, and to the TransPORTEC Research Consortium for the establishment of the TransPORTEC study. We thank the investigators of the prospective MST cohort, and G.Ø., E.H. and the investigators of the Danish cohort. We are indebted to T. Rutten and N. ter Haar, LUMC, for excellent technical support, slide collection and scanning. We thank L. Vermij, A. Leon-Castillo and E. Stelloo for their contribution to molecularly classifying the samples. We thank V. S. Hadnagy, University Hospital Zurich, for contributing to the annotation of the EC image dataset used to develop the mitosis detector for the present study. We also thank the Light Microscopy team of the Cell and Chemical Biology Department, LUMC, for technical support and use of the 3DHISTECH P250 scanner, and the Netherlands Cancer Institute for use of their 3DHISTECH P1000 scanner. We acknowledge and thank the SHARK team, the computational cluster of the LUMC, for their technical support and the installation of the Nvidia RTX 8000 GPUs. We also thank K. Yost for her work and support with the figures.
The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

Author information

These authors contributed equally: Viktor H. Koelzer, Tjalling Bosse.

Authors and Affiliations

Department of Pathology, Leiden University Medical Center, Leiden, The Netherlands
Sarah Volinsky-Fremond, Jurriaan Barkey Wolf, Vincent T. H. B. M. Smit & Tjalling Bosse
Department of Radiation Oncology, Leiden University Medical Center, Leiden, The Netherlands
Nanda Horeweg, Stephanie M. de Boer & Carien L. Creutzberg
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Sonali Andani
Department of Pathology and Molecular Pathology, University Hospital, University of Zurich, Zurich, Switzerland
Sonali Andani, Maxime W. Lafarge & Viktor H. Koelzer
Swiss Institute of Bioinformatics, Lausanne, Switzerland
Sonali Andani
Department of Gynecology and Obstetrics, Leiden University Medical Center, Leiden, The Netherlands
Cor D. de Kroon
Department of Gynecology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
Gitte Ørtoft
Department of Pathology, Herlev University Hospital, Herlev, Denmark
Estrid Høgdall
Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
Jouke Dijkstra
Department of Radiation Oncology, Medisch Spectrum Twente, Enschede, The Netherlands
Jan J. Jobsen
Maastricht Radiation Oncology, MAASTRO, Maastricht, The Netherlands
Ludy C. H. W. Lutgens
Department of Clinical Oncology, Barts Health NHS Trust, London, UK
Melanie E. Powell
Department of Medical Oncology, Peter MacCallum Cancer Center, Melbourne, Victoria, Australia
Linda R. Mileshkin
Department of Medical Oncology and Hematology, Odette Cancer Center Sunnybrook Health Sciences Center, Toronto, Ontario, Canada
Helen Mackay
Department of Medical Oncology, Gustave Roussy Institute, Villejuif, France
Alexandra Leary
Department of Surgical Sciences, Gynecologic Oncology, Città della Salute and S Anna Hospital, University of Turin, Turin, Italy
Dionyssios Katsaros
Department of Obstetrics and Gynecology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Hans W. Nijman & Marco de Bruyn
Department of Radiotherapy, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
Remi A. Nout
Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
David Church
Oxford NIHR Comprehensive Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
David Church
Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, Switzerland
Viktor H. Koelzer

Contributions

S.V.-F., N.H., V.H.K. and T.B. conceived the study design. S.V.-F. designed and trained the model. S.V.-F., S.A. and J.B.W. provided coding, implementation and technical support. S.V.-F. and N.H. acquired the data. S.V.-F., N.H., S.A., J.B.W., M.W.L., J.D., M.d.B., D.C., C.L.C., V.H.K. and T.B. analyzed and interpreted the data. S.V.-F. drafted the paper and the figures. S.V.-F., N.H., S.A., J.B.W., M.W.L., M.d.B., D.C., V.H.K. and T.B. substantially reviewed the paper. All authors critically reviewed the paper and the results and approved the final version.

Corresponding author

Correspondence to Tjalling Bosse.

Ethics declarations

Competing interests

S.V.-F., N.H., V.H.K. and T.B. are co-inventors on patent application no. 23315438.4 related to the present study. N.H. declares having received research grants from the DCS and Varian (paid to the institution) unrelated to the present study. C.D.d.K. declares KWF and ZonMW grants unrelated to the project. A.L. received funded research unrelated to the present study from AZ, Clovis, GSK, MSD, Ability, Zentalis, Agenus, Lovance, Sanofi, Roche, OSEimmuno and BMS, is an advisory board member or consultant for AZ, Clovis, GSK, MSD, Merck Serono, Ability, Zentalis, Agenus and Blueprint, and received honoraria and compensation for expenses from AZ, Clovis and GSK. R.A.N. declared research grants to the institution from Elekta, Varian, Accuray and Sensius, unrelated to the present study, and is an advisory board member of MSD. M.d.B. received grants from the DCS, the European Research Council, Health Holland, Mendus, BioNovion, Aduro Biotech, Vicinivax, Genmab and IMMIOS (all paid to the institute) unrelated to the present study, received nonfinancial support from BioNTech, Surflay Nanotec and Merck Sharp & Dohme, and is a stock option holder in Sairopa. D.C. is on an advisory board of MSD, received research funding unrelated to the project from HalioDx and Veracyte (to the TransSCOT consortium), is a spouse of an Amgen employee, is affiliated with the Wellcome Centre for Human Genetics and the National Institute for Health and Care Research (NIHR) Oxford Biomedical Research Centre (BRC), and received funding from the Oxford NIHR Comprehensive BRC and a Cancer Research UK (CRUK) Advanced Clinician Scientist Fellowship (C26642/A27963). C.L.C. received grants from the DCS for the PORTEC-1, -2, -3 and -4a and RAINBO trials and a research grant for translational work on PORTEC unrelated to the present study, and has leadership roles in and is chair of the GCIG Endometrial Cancer Committee. V.H.K. declared being an invited speaker for Sharing Progress in Cancer Care and Indica Labs, is on the advisory board of Takeda and has sponsored research agreements with Roche and IAG, all unrelated to the present study. T.B. received grants from the DCS unrelated to this work. The other authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Ming Lu, Amit Oza, Antonio Raffone and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Lorenzo Righetto and Ulrike Harjes, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of the data split and downstream analyses performed in this study.

One representative WSI per patient from a formalin-fixed paraffin-embedded (FFPE) block was included. Of the cases meeting the inclusion criteria, 20% were randomly held out as an internal test set (n = 353). The remaining 80% were used for fivefold cross-validation (n = 1,408 patients). For self-supervised training (n = 1,862), this training dataset was enriched with WSIs that had been dropped from the main dataset: FIGO 2009 stage IV cases and cases with missing outcome data, such as the TCGA-UCEC cohort²¹. Two cohorts were held out as external test sets: the UMCG external test set (n = 160) and the LUMC external test set (n = 151). The LUMC external test set contains up to three FFPE blocks per case. More details on training and the data split are provided in Methods. Altogether, including the two training steps and all downstream analyses, this analysis comprised data from 2,751 tumors. CT, chemotherapy.

Extended Data Fig. 2 Shifts of attention scores from unimodal to multimodal model.

a, Model using only the H&E WSI (unimodal) and a corresponding example of the normalized attention scores overlaid on the H&E WSI as a heatmap, where red indicates a high and blue a low attention score. b, The two-arm model with the H&E WSI and the image-based molecular class predicted by im4MEC, and a corresponding example of the normalized attention scores overlaid on the H&E WSI. c, The multimodal three-arm HECTOR model with the H&E WSI, image-based molecular class and stage, and a corresponding example of the normalized attention scores overlaid on the H&E WSI. d, Density plot of the normalized attention scores of the heatmaps shown in a–c for each model. e, Quantitative analysis of the distribution shift between the three models in the internal test set (n = 353 patients) using the WSI-level skewness and median of the normalized attention scores.

Extended Data Fig. 3 Morphological features increasing the risk score in the HECTOR high- versus low-risk groups and quantitative spatial analysis.

a, A representative selection of four patches per morphological subtype (each from a different patient), contrasting features that increase the risk score in the HECTOR low-risk group with those that increase it in the HECTOR high-risk group. Each patch measures 180 × 180 μm. b, Spatial analysis of the top 5% of regions decreasing and increasing the risk score in all WSIs of the LUMC test set, based on two manually annotated areas: tumor and invasive border. Left, an example WSI showing the annotated tumor area and invasive border, with a heatmap of regional contributions computed with integrated gradients (IG). Right, relative contribution of the two annotated areas, averaged per WSI, for each HECTOR risk group. Data are presented as mean ± standard deviation (n = 414 WSIs).
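The spatial analysis in panel b can be read as: rank the patches of each WSI by attribution, keep the top 5%, and record which annotated area they fall in. The sketch below is a hypothetical illustration of that reading, not the authors' implementation. It assumes per-patch IG attribution scores (positive values increase risk) and one area label per patch, and it measures "contribution" as the fraction of top-ranked patches in each area. The study may instead have weighted by attribution magnitude.

```python
import statistics


def top_fraction_area_share(attributions, areas, fraction=0.05, increasing=True):
    """Share of the top `fraction` patches of one WSI that fall in each annotated area.

    attributions: per-patch attribution scores (assumed: positive = increases risk).
    areas: per-patch area label, e.g. "tumor" or "invasive_border" (hypothetical labels).
    """
    k = max(1, round(len(attributions) * fraction))
    # Descending order selects risk-increasing patches; ascending selects risk-decreasing ones.
    order = sorted(range(len(attributions)), key=lambda i: attributions[i], reverse=increasing)
    top = order[:k]
    return {label: sum(areas[i] == label for i in top) / k for label in sorted(set(areas))}


def summarize_across_wsis(per_wsi_shares, label):
    """Mean and sample standard deviation of one area's share across WSIs (needs >= 2 WSIs)."""
    values = [shares.get(label, 0.0) for shares in per_wsi_shares]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    attr = list(range(100))                          # hypothetical attributions for 100 patches
    areas = ["tumor"] * 50 + ["invasive_border"] * 50
    print(top_fraction_area_share(attr, areas))                    # risk-increasing top 5%
    print(top_fraction_area_share(attr, areas, increasing=False))  # risk-decreasing top 5%
```

Computing the share per WSI before averaging gives every slide equal weight, regardless of how many patches it contains.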

Extended Data Fig. 4 Overview of the PORTEC-3 randomized trial and analysis of treatment response prediction by HECTOR.

In PORTEC-3, 660 evaluable patients were randomized (1:1) to adjuvant external beam radiotherapy (EBRT) alone or EBRT with concurrent and adjuvant chemotherapy (CT). HECTOR risk scores were inferred for the 442 patients whose WSI was available. The HECTOR risk-group cutoffs were the same as those used in the training set (Methods).
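Keeping the cutoffs fixed means the PORTEC-3 scores are only mapped onto groups defined on the training data and are never used to re-derive thresholds. A minimal sketch of that mapping follows. It assumes two cutoffs on a single continuous risk score separating the low-, intermediate- and high-risk groups. The cutoff values and the function name are hypothetical placeholders, not the values used in the study.

```python
import bisect

# Hypothetical cutoffs on the continuous risk score. The real values were
# derived on the training set (see Methods) and are not reproduced here.
CUTOFFS = (0.3, 0.7)
GROUPS = ("low", "intermediate", "high")


def assign_risk_group(score, cutoffs=CUTOFFS):
    """Map a continuous risk score to a risk group using fixed, precomputed cutoffs."""
    # bisect_right counts the cutoffs that are <= score, so a score exactly
    # equal to a cutoff falls into the higher group.
    return GROUPS[bisect.bisect_right(cutoffs, score)]


if __name__ == "__main__":
    print([assign_risk_group(s) for s in (0.1, 0.5, 0.9)])
```

Because the thresholds are frozen, no information from the evaluation cohort influences how patients are grouped. This keeps the treatment-response analysis independent of the grouping step.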

Supplementary information

Supplementary Information

Supplementary Figs. 1–31, Tables 1–14 and Notes.

Reporting Summary

Supplementary Table 15

Supplementary Table 15 provides the exact P values for the analysis in Fig. 5c, as described in the manuscript.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Volinsky-Fremond, S., Horeweg, N., Andani, S. et al. Prediction of recurrence risk in endometrial cancer with multimodal deep learning. Nat Med (2024). https://doi.org/10.1038/s41591-024-02993-w


Received: 27 November 2023
Accepted: 11 April 2024
Published: 24 May 2024
DOI: https://doi.org/10.1038/s41591-024-02993-w
Springer Nature America, Inc.

Change history

01 July 2024: An Author Correction to this article was published.
