Abstract
Response to androgen receptor signaling inhibitors (ARSI) varies widely in metastatic castration resistant prostate cancer (mCRPC). To improve treatment guidance, biomarkers are needed. We use whole-genomics (WGS; n = 155) with matching whole-transcriptomics (WTS; n = 113) from biopsies of ARSI-treated mCRPC patients for unbiased discovery of biomarkers and development of machine learning-based prediction models. Tumor mutational burden (q < 0.001), structural variants (q < 0.05), tandem duplications (q < 0.05) and deletions (q < 0.05) are enriched in poor responders, coupled with distinct transcriptomic expression profiles. Validating various classification models predicting treatment duration with ARSI on our internal and external mCRPC cohort reveals two best-performing models, based on the combination of prior treatment information with either the four combined enriched genomic markers or with overall transcriptomic profiles. In conclusion, predictive models combining genomic, transcriptomic, and clinical data can predict response to ARSI in mCRPC patients and, with additional optimization and prospective validation, could improve treatment guidance.
Introduction
With approximately 350,000 men dying of it each year, prostate cancer is the fifth leading cause of cancer-related death worldwide1. Although early-phase prostate cancer is known for its favorable prognosis, the prognosis of metastatic prostate cancer is poor, especially when patients progress to the castration-resistant phase of the disease2,3. The treatment of metastatic castration-resistant prostate cancer (mCRPC) has significantly improved since the advent of second-generation androgen receptor signaling inhibitors (ARSI), like abiraterone acetate + prednisone (AAP) and enzalutamide3,4,5. However, response to these treatments varies widely between individual patients4,5. To improve therapy guidance and optimize patient outcome, biomarkers that can predict response before or soon after the start of therapy are needed.
The existing biomarkers for treatment guidance in this setting are increasingly based on so-called liquid biopsies. It has been shown that five or more circulating tumor cells (CTCs) in 7.5 ml of blood and high levels of cell-free DNA (cfDNA) before the start of treatment are associated with a poor prognosis6,7,8. In addition, more detailed molecular analyses can be performed to predict resistance to ARSI. In CTCs, expression of androgen receptor variant 7 (AR-V7) is associated with resistance to ARSI, while this correlation is not found for chemotherapy8,9,10,11,12,13,14. To genotype cfDNA, gene panels targeting known driver and/or resistance-related genes are often used for sequencing or PCR. The most commonly identified alterations that are associated with resistance to ARSI in patients encompass AR mutations and amplifications8,15,16,17,18,19. Furthermore, RB1 loss, TP53 aberrations, ZFHX3 deletions and PI3K pathway defects were associated with worse survival8,15,19. However, liquid biopsy-based analyses are mostly targeted to a certain set of genes and rely on patients having a high tumor-derived cfDNA fraction in the blood. Therefore, liquid biopsies are less suitable for the discovery of potential biomarkers that predict the outcome of treatment.
Whole genome and transcriptome sequencing (WGS and WTS) of tumor tissue provides extensive and detailed information about underlying genomic and functional aberrations of the malignancy. Rather than relying on a priori known targets using targeted gene-panels, studying the genome-wide somatic inventory may enable the unbiased discovery of biomarkers predicting treatment outcome. Prior work shows that genomic clusters could be linked to response to treatment, e.g., patients with microsatellite instability tend to respond well to immunotherapy and patients with a BRCA-like phenotype are likely to benefit from PARP-inhibition20,21. The value of sequencing the entire genome, including non-coding regions, for understanding tumor proliferation mechanisms is shown by the identification of an intergenic enhancer region upstream of AR, that is amplified in 81% of the mCRPC patients and correlates to increased AR expression20,21. Besides, WTS reveals that the Wnt/β-catenin pathway is enriched in enzalutamide-resistant patients in comparison to enzalutamide-naïve patients22. In addition, mutations in β-catenin and loss of 17q22 are solely found in enzalutamide-resistant patients and are associated with poor clinical outcome22,23.
Statistical analysis of WGS and WTS data is challenging due to the extremely high number of features. In recent years, precision oncology has often employed machine learning (ML) approaches to build predictive models in clinical and preclinical settings using genomic and transcriptomic information24. By analyzing the performance of ML models, the predictive power of these features can be assessed. Recently, advanced deep learning models have shown promise in the field. In 2018, a deep learning model effectively integrated multiple data modalities and leveraged large available training data (~1000 drug response experiments per compound)25. Within clinical patient cohorts, sample sizes are usually smaller, leading to inevitable overparameterization and poor generalization performance when complex deep-learning models are used. Therefore, in such scenarios simple and strongly regularized ML models are preferred, as these generally suffer less from overfitting on limited training data and have already been proven to be efficient in similar contexts26,27. In addition, several methodological steps can be performed to handle small datasets. One such method is feature selection, in which features (e.g., transcriptomic features) deemed irrelevant for a particular response or genotype are removed28. Another widely applied procedure is dimensionality reduction, in which small datasets with high feature dimensionality (e.g., transcriptome-wide expression) are represented in a reduced feature space and can consequently be used in ML models with a lower risk of overfitting29.
In this study, we aim to develop an ML-based classification model using WGS and WTS characteristics from biopsied metastatic malignancies to predict response of individual mCRPC patients to ARSI. To this end, we interrogate the full genomic inventory of metastatic malignancies from 155 mCRPC patients, who were subsequently treated with ARSI. In addition, matched WTS of these malignant tissues is available for 113 included mCRPC patients. Based on ARSI treatment duration, patients are categorized into good and poor responders. Subsequently, we determine and validate relevant clinical, genomic, and transcriptomic characteristics for use as features within an ML-based approach to predict response to ARSI. Finally, we validate the performance of this classification model within an internal and external patient cohort.
Results
Included patients in discovery cohort (CPCT-02)
Between February 2015 and October 2019, 235 patients with mCRPC were included within CPCT-02 and treated with AAP or enzalutamide directly after a fresh-frozen biopsy20,30. Two patients were included twice, resulting in the inclusion of 233 unique patients. From 235 biopsies, 155 (66%) could be successfully analyzed by WGS. Eighty biopsies were excluded due to an unevaluable biopsy (n = 42), a biopsy of the primary tumor (n = 13), whole exome sequencing (WES) instead of WGS (n = 11), protocol violation (n = 9), missing treatment information (n = 4) and a second evaluable biopsy in combination with ARSI within one patient (n = 1). The second evaluable biopsy of this patient was excluded to prevent overfitting in the analyses. Matched WTS data of the malignant tissue was available for 113 patients (Fig. 1).
Clinical characteristics and stratification of patients
Patients were stratified in good (≥180 days of treatment; n = 66), ambiguous (101–179 days of treatment; n = 25) and poor (≤100 days of treatment; n = 64) responders, based on treatment duration with ARSI (Fig. 1). Cut-off values were based on clinical practice (see Methods). Baseline characteristics for good, poor and ambiguous responders are shown in Table 1 and were compared for good and poor responders, as only these groups were included for biomarker discovery (see Methods). Good and poor responders were similar in age (mean ± SD: 68.1 ± 7.9 years and 69.5 ± 7.7 years, respectively, adjusted p (q) = 1.000). Poor responders showed a trend towards a higher proportion of biopsies, obtained from liver, compared to good responders (18.8% (n = 12) versus 3.0% (n = 2), q = 0.076). The proportion of patients treated with AAP and enzalutamide, respectively, after biopsy was comparable in the two groups (good responders 40.9% (n = 27) and 59.1% (n = 39), and poor responders 62.5% (n = 40) and 37.5% (n = 24), q = 0.266). Median treatment duration was 445 days (Q1-Q3: 242–NR) and 63 days (Q1 - Q3: 48–83) in the good and poor responder group, respectively. The number of prior systemic treatment lines was higher in poor responders than in good responders (median 2 treatment lines (Q1 - Q3: 1–3) vs 1 treatment line (Q1 - Q3: 0–1), q < 0.001). In detail, poor responders were more often previously treated with enzalutamide (37.5% (n = 24) vs 6.1% (n = 4), q < 0.001) than good responders. In addition, poor responding patients had a higher median PSA value at time of biopsy than good responding patients (140 ug/L (Q1–Q3: 58–390, n = 35) vs 23 ug/L (Q1–Q3: 13–92, n = 44), q < 0.001).
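As an illustration, the stratification rule described above can be written as a small function. This is a minimal sketch and not the authors' code: the function name `stratify_responder` is hypothetical, and the sketch assumes that treatment duration is a completed number of days, so censored (still ongoing) treatment is not handled.

```python
def stratify_responder(treatment_days: int) -> str:
    """Assign a responder class from ARSI treatment duration in days.

    Cut-offs follow the text: good >= 180 days, poor <= 100 days,
    ambiguous for everything in between (101-179 days).
    """
    if treatment_days < 0:
        raise ValueError("treatment duration cannot be negative")
    if treatment_days >= 180:
        return "good"
    if treatment_days <= 100:
        return "poor"
    return "ambiguous"


# Illustrative durations only; these are not patient-level data from the study.
durations = [445, 63, 150, 180, 100, 101, 179]
labels = [stratify_responder(d) for d in durations]
print(list(zip(durations, labels)))
```

Only the good and poor classes enter biomarker discovery; the ambiguous class is kept aside and reappears in the internal validation cohort.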
Exploration of relevant WGS and WTS characteristics relating to response
To investigate relevant WGS and WTS features, relating to treatment response, and to design our subsequent classification models, we split our discovery cohort (CPCT-02) of matched WGS and WTS samples (n = 113) into a training (n = 79; 70%) and internal validation (n = 34; 30%) dataset. The training set contained a balanced number of good and poor responders, n = 38 and 41 respectively (Fig. 2).
The genomic landscape of mCRPC patients, treated with ARSI
By utilizing WGS, we could inventory the genomic landscape of the good, ambiguous and poor responders (n = 155; Fig. 3).
Comparisons between good and poor responders were performed within the training set (n = 79). We observed a significantly higher tumor mutational burden (TMB; q < 0.001), total number of structural variants (SV; q < 0.05), total number of tandem duplications (q < 0.05) and total number of deletions (q < 0.05) within the poor responders compared to the good responders (Suppl. Figure 1a–d). In detail, the medians (Q1–Q3) for poor responders vs. good responders of TMB, total SVs, total number of tandem duplications and total number of deletions were 3.03 (2.32–4.3) vs. 2.21 (1.72–2.77), 349 (268–618) vs. 246 (172–377), 44.5 (24–80) vs. 30.5 (20–44) and 79.5 (59–116) vs. 64.5 (43–90), respectively.
We next assessed the mutational incidence of known genomic aberrations, related to ARSI resistance, which included genomic aberrations within AR, TP53, PTEN, RB1, CTNNB1 and chromosomal arms aneuploidies (Suppl. Fig. 1e, f)8,15,16,17,18,19. No statistically significant differences between good and poor responders for these markers were found in the training set. In addition, we did not observe significant mutual-exclusivity for driver-genes, as detected by unbiased driver detection (dN/dS and/or GISTIC2). Genes with protein-coding aberrations within ≥20% of all samples within either the poor or good responder group were explored as well. No significant differences between good and poor responders were observed.
Differentially expressed genes in good and poor responders to ARSI
Using the matched WTS from the training cohort (n = 79), we investigated differentially expressed genes (DEGs) between the good and poor responders for all protein-coding genes, which were not designated as putative biopsy-site specific markers (see Methods; Fig. 4). Using stringent criteria to select for uniform disparity between the two groups, we designated 151 genes as DEGs between the two groups (Fig. 4, Suppl. Table 1). In addition, we performed Gene-set Enrichment Analysis (GSEA) between the responder classes (Suppl. Figure 2a) and assessed the expression of AR-V7 (Suppl. Figure 2b).
Within the DEGs (n = 151), we observed uniform presence of genes regulating or attributed to epithelial-mesenchymal transition (EMT), such as TIMP-3 and TGFBI, and found genes previously attributed to tumorigenesis, poor survival and/or aggressiveness, such as RGS2 and SLC7A5, to harbor higher expression within the poor responders vs. the good responders31,32. Conversely, genes attributed to the suppression of tumor growth and/or metastatic potential, such as RBM47 and ENDOD1, were expressed at lower levels33,34 (Suppl. Fig. 2a). We did not observe dissimilar expression of AR-V7 between responder classes (Suppl. Fig. 2b).
Concordantly, the GSEA revealed enriched expression of mechanisms and signaling commonly reported within more aggressive forms of prostate cancer, including EMT, coupled with enriched inflammatory responses, TGF-β receptor signaling and TNFα signaling via NF-kB. In addition, the good responders revealed enrichment of the androgen response gene-set.
Robustness assessment of differential expression analysis
After the initial differential gene expression analysis, we performed an out-of-sample analysis in a Leave-One-Out Cross-Validation (LOOCV) scheme to test the robustness of the selected DEGs due to our limited sample size (see Methods). As we observed notable variation within several DEGs between LOOCV folds, we suspected that a straightforward DESeq2-approach might possibly not provide robust results for classification purposes (Fig. 4). Therefore, we opted for an alternative methodology (Independent Component Analysis, described below) for feature selection during classification model development.
Design of a machine learning-based classification model to predict response to ARSI
Our approach to generate classification models included three stages. First, we assessed relevant and robust WGS, WTS and clinical characteristics for treatment response using a LOOCV on the training set of samples with matched WTS and WGS (n = 79). Second, the resulting models were compared and evaluated in the internal validation sets and, third, validated in an external cohort (see Methods; Fig. 2). In total, we utilized four schemes of classification models: 1) WGS-only, 2) WTS-only, 3) combined WGS and WTS, and 4) combined WGS and clinical covariates. For these models, the samples that were not part of the internal training set (n = 79) were used for the internal validation. For the WGS-only and combined WGS/clinical variables models, this included 76 patients, spanning 28 good, 23 poor and 25 ambiguous responders. For the WTS-only and combined WGS/WTS models, this included 34 patients, spanning 9 good, 6 poor and 19 ambiguous responders. In addition, similar mCRPC patients from the West Coast Dream Team (WCDT) cohort, who were treated with ARSI as next therapy after biopsy, were used as external validation cohort35. Within this cohort, relevant WGS and WTS characteristics were available for 56 and 77 patients, respectively.
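The LOOCV stage can be sketched as follows: each sample is predicted by a model fitted on all other samples, so no prediction is made on data the model has seen. This is a minimal, dependency-free sketch and not the authors' pipeline. The nearest-centroid classifier is a stand-in chosen only to keep the example short (the study itself trains Logistic Regression and linear SVC models), and all function names and toy values are hypothetical.

```python
def loocv_splits(n_samples):
    """Yield (train_indices, test_index) pairs, leaving one sample out per fold."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], i


def fit_centroids(X, y):
    """Toy stand-in model: the per-class mean of every feature."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label of the closest class centroid (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))


def loocv_predictions(X, y):
    """Out-of-sample prediction for every sample, one fold per sample."""
    preds = []
    for train_idx, test_idx in loocv_splits(len(X)):
        model = fit_centroids([X[j] for j in train_idx], [y[j] for j in train_idx])
        preds.append(predict(model, X[test_idx]))
    return preds


# Made-up feature rows of the form [TMB, number of structural variants].
X = [[3.0, 350], [3.2, 400], [2.9, 330], [2.1, 240], [2.2, 250], [1.9, 200]]
y = ["poor", "poor", "poor", "good", "good", "good"]
preds = loocv_predictions(X, y)
print(preds)
```

In a real analysis, any feature selection or scaling would also have to be refitted inside each fold; otherwise information from the held-out sample leaks into training.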
Initial model performance assessment with LOOCV
WGS-only classification model
We utilized the four previously observed WGS characteristics, which revealed statistically significant differences between good and poor responders within the internal training set (TMB and the numbers of total structural variants, tandem duplications and deletions) to train a Logistic Regression classifier. Performance was subsequently measured as Area Under the Curve (AUC) of 0.76, with a specificity and sensitivity of 49% and 79%, respectively (Fig. 5a and Table 2). Classifier hyperparameters were further tested in grid search, but no unequivocally better setting was found, when evaluating the model in LOOCV.
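For readers less familiar with these metrics, the sketch below shows how AUC, sensitivity and specificity can be derived from out-of-fold predicted probabilities. It is an illustration under stated assumptions rather than the authors' code: good responders are treated as the positive class, a probability threshold of 0.5 is assumed, and the function names and toy values are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the chance that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold predict 'good'."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted_good = s >= threshold
        if y == 1 and predicted_good:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_good:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels (1 = good responder, 0 = poor responder) and probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(y_true, scores)
sens, spec = sensitivity_specificity(y_true, scores)
print(auc, sens, spec)
```

AUC is independent of the threshold, whereas sensitivity and specificity trade off against each other as the threshold moves, which is why two models with the same AUC can differ markedly on the latter two.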
WTS-only classification model
Prior to WTS-based classification, dimensionality reduction was performed on the full transcriptome using multiple approaches (see Methods). Independent Component Analysis, sparse PCA, and conventional PCA were applied to the data with components ranging from 10 to 50, and consequently used as input in the training of linear Support Vector Classifier (SVC) models. The best overall performance was achieved with 40 independent components (ICs), with an AUC of 0.76, specificity of 83%, and sensitivity of 58% (Fig. 5a and Table 2).
Combining WGS and WTS in ensemble classification models
Notable overlaps could be identified in the predicted true positives (true good responders, n = 18) and predicted true negatives (true poor responders, n = 17) of the WGS-only and WTS-only models (Suppl. Fig. 3). The WGS-only model yielded better classification of good responders than the WTS-only model (79% vs. 58% sensitivity), whilst the WTS-only model yielded better classifications of poor responders (83% vs 49% specificity; Table 2). To investigate whether leveraging both WTS and WGS features would improve performance, we combined our best-performing WGS-only and WTS-only classification models using two ensembling approaches (see Methods). The stacking classifier resulted in an AUC of 0.76 (71% specificity / 71% sensitivity), whilst ensemble averaging resulted in an AUC of 0.81 (73% specificity / 68% sensitivity) (Fig. 5b and Table 2). The four WGS features and the WTS features from the best performing model (40 ICs) were also combined in two additional ensembling experiments (see Methods). The bagging classifier yielded an AUC of 0.76 (66% specificity / 71% sensitivity), while the multi-model averaging ensemble resulted in an AUC of 0.75 (59% specificity / 66% sensitivity) (Fig. 5b). Thus, the ensemble model that outperformed the WGS-only and WTS-only classification models was the averaging ensemble, which yielded an AUC of 0.81 compared to the WGS-only and WTS-only model with both an AUC of 0.76 (Fig. 5e).
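A minimal sketch of an averaging ensemble is given below. It assumes that "ensemble averaging" means an unweighted mean of the probabilities for good response produced by the two single-modality models; the exact definition is given in the Methods, so this should be read as one plausible reading and not as the authors' implementation. The function name and the toy probabilities are hypothetical.

```python
def average_ensemble(prob_wgs, prob_wts):
    """Unweighted mean of two models' probabilities for the same patients."""
    if len(prob_wgs) != len(prob_wts):
        raise ValueError("both models must score the same patients")
    return [(a + b) / 2 for a, b in zip(prob_wgs, prob_wts)]


# Made-up probabilities of good response from a WGS-only and a WTS-only model.
prob_wgs = [0.80, 0.40, 0.30]
prob_wts = [0.60, 0.70, 0.10]

prob_ensemble = average_ensemble(prob_wgs, prob_wts)
predicted = ["good" if p >= 0.5 else "poor" for p in prob_ensemble]
print(prob_ensemble, predicted)
```

Averaging can help when the two models make different errors, which fits the observation above that one model is more sensitive and the other more specific.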
Addition of clinical data to the WGS- and WTS-based classification models
Compared to true good responders, true poor responders received more prior treatment lines for metastatic prostate cancer, including more frequently prior enzalutamide (Table 1). Therefore, we determined whether including information on whether patients had received prior treatment with ARSI and/or taxane-based chemotherapy and the number of respective treatment lines would increase the performance of the best-performing classification models. A classification model based solely on these clinical variables yielded an overall mediocre performance with a maximal AUC of 0.61 and sensitivity and specificity of 45% and 51%, respectively (Fig. 5c, Suppl. Fig. 4). However, we investigated whether a potential synergistic effect could be found by integrating a mixture of these clinical variables with the WGS and WTS data. Out of the models, combining WGS with clinical variables, the addition of prior/no prior ARSI as feature into the WGS-only model resulted in the highest performance increase compared to WGS-only. This combined ‘clinicogenomics’ model yielded an AUC of 0.81 with 66% specificity and 76% sensitivity (Fig. 5c, e and Table 2). The model that used WTS combined with prior/no prior ARSI also performed well, yielding an AUC of 0.82 with 73% specificity and 74% sensitivity (Fig. 5c, e and Table 2). A final combined model, which included both WTS and WGS features with prior/no prior ARSI, resulted in an AUC of 0.84 with 73% specificity and 74% sensitivity (Fig. 5c, e and Table 2).
Shuffled label experiments
To confirm whether the presented classification models operate on meaningful underlying structures, random label permutation experiments were performed on the best models in LOOCV setting. The shuffled label experiments resulted in a median AUC of 0.50–0.51 for all models, with upper quartiles of the shuffled label experiments well below the AUC obtained using correctly labeled data (Fig. 5d). Based on these results, we concluded that our presented classification models indeed capture underlying patterns relating to the treatment response.
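The logic of a shuffled label experiment can be sketched as follows. This is a simplified illustration, not the authors' procedure: it permutes the labels against a fixed set of scores, whereas a full version would refit each model on the permuted labels within LOOCV. The function names, the number of permutations and the toy data are all assumptions.

```python
import random
import statistics


def roc_auc(y_true, scores):
    """AUC: chance that a random positive scores above a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shuffled_label_aucs(y_true, scores, n_permutations=1000, seed=0):
    """AUCs obtained after randomly permuting the labels (a null distribution)."""
    rng = random.Random(seed)
    labels = list(y_true)  # copy, so the caller's labels stay untouched
    aucs = []
    for _ in range(n_permutations):
        rng.shuffle(labels)
        aucs.append(roc_auc(labels, scores))
    return aucs


# Made-up data: 20 patients with scores decreasing from 1.00 to 0.05.
scores = [i / 20 for i in range(20, 0, -1)]
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

observed_auc = roc_auc(y_true, scores)
null_aucs = shuffled_label_aucs(y_true, scores)
null_median = statistics.median(null_aucs)
null_upper_quartile = statistics.quantiles(null_aucs, n=4)[2]
print(observed_auc, null_median, null_upper_quartile)
```

If a model only exploited chance structure, its AUC on the correct labels would fall inside the shuffled-label distribution, whose median is expected to lie near 0.5.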
Validation of final classification models
We validated our best-performing models in an internal and external validation cohort. Here, we describe the validation of one of the best performing models, which utilizes the four significant genomic characteristics and prior treatment with ARSI (clinicogenomics model), in detail (Table 2, Figs. 6 and 7). The other models were also successfully validated and the corresponding results are summarized in Table 2 and Suppl. Figs. 6 and 8.
Internal validation cohort
For internal validation, we used 76 WGS samples of the CPCT-02 cohort, that were not used during training. This internal validation cohort encompassed 28 good, 23 poor and 25 ambiguous responders. For 34 patients, including 9 good, 6 poor, and 19 ambiguous responders, matched WTS was available. This subset was used for the validation of models, that included transcriptomics features (Table 2).
Within our internal validation cohort (n = 76), we correctly predicted 21 out of 28 true good responders and 13 out of 23 true poor responders using the clinicogenomics model, thereby resulting in a sensitivity of 79% and specificity of 57% and an AUC of 0.74. These results are comparable with the results during training. Although limited by the number of samples during validation, the overall distribution of genomic and clinical features for predicted classes resembles those seen during feature selection (Fig. 6a–n). Survival analysis of the complete internal validation cohort, including the true ambiguous responders, revealed an overall longer ARSI-treatment duration (p = 0.015; log-rank test) for predicted good responders vs. poor responders with respectively a median (and 95%CI) of 187.5 days (143-386) and 116.5 days (90–138) (Fig. 6o).
To investigate the confidence of our binary predictions (i.e., predicted poor or good response), we explored whether including a third category of predicted ambiguous responders, capturing uncertain predictions (probability scores of 50 ± 10%), to the clinicogenomics model could result in a better discrimination of poor and good predicted responders (Fig. 6b and p). Using these three prediction categories, similar survival analysis indeed revealed a larger stratification of treatment duration between predicted poor and good responders with a median ARSI-treatment duration (and 95%CI) of 217 days (166–488), 133 days (110–267) and 103 days (84–147) for good, ambiguous and poor predicted responders, respectively, (Fig. 6o) and an increased statistical difference between predicted good and poor responders of q = 0.0013 (pairwise log-rank test with BH-correction), compared to the two-group scheme (p = 0.015), described above. Although all predicted groups consist of patients from all three true responder categories, only 12% (n = 3) of the predicted poor responders is a true good responder, while 23% (n = 7) of the predicted good responders is a true poor responder (Fig. 6b–c).
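The three-category scheme can be illustrated with a short function that maps a predicted probability of good response to a class. This is a sketch under stated assumptions: the 0.4 and 0.6 cut-offs follow the 50 ± 10% band described above, but how values exactly on a boundary are assigned is an assumption, and the function name is hypothetical.

```python
def predict_category(prob_good, lower=0.4, upper=0.6):
    """Map a probability of good response to good / ambiguous / poor.

    Probabilities strictly between `lower` and `upper` are treated as
    uncertain and labelled 'ambiguous'.
    """
    if not 0.0 <= prob_good <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if prob_good >= upper:
        return "good"
    if prob_good <= lower:
        return "poor"
    return "ambiguous"


# Made-up probabilities for four patients.
probabilities = [0.75, 0.55, 0.45, 0.25]
categories = [predict_category(p) for p in probabilities]
print(categories)
```

Widening the band moves more patients into the ambiguous group and leaves fewer, but more confident, good and poor predictions.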
To explore whether the model has additional value in all patient groups, the performance of the clinicogenomics model was tested in uniformly pre-treated subgroups within our internal validation cohort (Suppl. Figure 5). In patients who received 0 or 1 prior therapy, predicted good responders showed a higher median ARSI-treatment duration than predicted poor responders: 266 days (95% CI 172-790, n = 23) vs 129.5 days (95% CI 89-NA, n = 10), p = 0.059. Predicted ambiguous responders showed a median treatment duration of 218 days (95% CI 116-NA) (n = 9) in this subgroup (Suppl. Fig. 5a). In patients who received ≥2 prior therapies, the difference between predicted good and poor responders was less pronounced (median treatment duration (95% CI) 110 days (52-NA) (n = 7), 121.5 days (102-NA) (n = 12) and 84 days (59–147) (n = 15) for good, ambiguous and poor responders, respectively, p = 0.093 for good vs poor responders) (Suppl. Fig. 5b). In addition, in patients who did not receive prior enzalutamide, predicted good responders showed a longer treatment duration than predicted poor responders (median 217 days (95% CI 166-488) (n = 30) and 90 days (95% CI 84-189) (n = 15) respectively, p = 0.012). Predicted ambiguous responders showed a moderate median treatment duration of 142 days (95% CI 112-NA) (n = 15) (Suppl. Fig. 5c). In patients who did receive prior enzalutamide, no good responders were predicted, while ambiguous and poor predicted responders showed a median treatment duration of 117.5 days (95% CI 63-NA, n = 6) and 112 days (95% CI 59-NA, n = 10), respectively (Suppl. Fig. 5d). Despite being limited by the sample sizes per subgroup, the relevance of incorporating genomic features was especially visible in patients who received fewer prior therapies.
Upon internal validation of the other classification models, especially the clinicotranscriptomics model performed well with a specificity of 50%, sensitivity of 89% and AUC of 0.83. Predicted good responders showed a median treatment duration of 243 days (95% CI 110-NA, n = 9), compared to 138 days (95% CI 112-168, n = 14) for poor responders (q = 0.020) (Table 2, Suppl. Fig. 6).
External validation cohort
Next, the models were externally validated in the West Coast Dream Team (WCDT) cohort, which included 56 and 77 mCRPC patients for whom WGS and WTS of metastatic biopsies, respectively, were available, and who were treated with ARSI after these biopsies35. In contrast to the CPCT-02 cohort, clinical outcome in the WCDT cohort was only expressed as overall survival from time of biopsy (OS). Nevertheless, as the correlation between treatment duration and overall survival was clear for patients within the CPCT-02 cohort (mean OS (95% CI) 1613 days (1365–1860) (n = 66), 764 days (547–982) (n = 25) and 774 days (568–979) (n = 64) for true good, ambiguous and poor responders, respectively, p < 0.001 for true good vs poor responders), use of the WCDT cohort for external validation was considered justified (Suppl. Fig. 7a).
After application of the clinicogenomics classification model on the external validation cohort, survival analyses of predicted classes revealed overall longer OS (p = 0.015; log-rank test) for predicted good responders (n = 27) vs. predicted poor responders (n = 29), with a median (and IQR) of 34.1 (25.7-NA) and 17.4 (9.8-31.5) months, respectively, and a hazard ratio (95% CI) of 0.47 (0.28-0.88) (Fig. 7).
In response to the correlation of treatment duration and OS in true responders in the internal cohort and the difference in OS in predicted responders in the external validation cohort, we explored OS in predicted responders in the internal training and validation cohort. Nevertheless, this did not reveal statistically significant differences between predicted poor and good responders (n = 31 vs. n = 36, q = 0.88 and n = 25 vs. n = 30, q = 0.15, respectively; in total 46% of the patients had to be censored for OS) (Suppl. Figure 7b, c).
Finally, we validated the other classification models within the external cohort. As in the internal validation cohort, the clinicotranscriptomics model showed good performance with a hazard ratio of 0.51 (95% CI 0.30–0.86) and a median OS of 1017 days (IQR 771–1691, n = 42) and 589 (IQR 487–959, n = 35) for predicted good and poor responders, respectively (p = 0.001) (Suppl. Fig. 8). Combination of the genomics and transcriptomics in an averaging ensemble model did not result in a better performance than the single models.
Application of an adapted clinicogenomics model to WES data
To explore the possibility of applying the clinicogenomics model to targeted sequencing data, we assessed the importance of the individual features to the clinicogenomics model. TMB and prior ARSI were more valuable than the number of SVs, deletions and tandem duplications (Suppl Fig. 9). As TMB is also the only feature that could be partially extracted from targeted sequencing data, we developed a simplified model based on TMB and prior ARSI only. Subsequently, we applied this model to WES data, that was extracted from the original WGS data, showing a specificity of 56%, sensitivity of 84% and an AUC of 0.71 in the training cohort and good performance in the internal (q = 0.001) and external validation cohort (p = 0.029) (Table 2, Fig. 8).
Discussion
Within this study, we performed an unbiased discovery of biomarkers in whole genomic and transcriptomic data to predict response of mCRPC patients to ARSI. Subsequently, we used machine learning to develop multiple classification models that can predict response to ARSI in individual mCRPC patients. The clinicogenomics model as well as the clinicotranscriptomics model, both based on prior treatment with ARSI and genomic or transcriptomic features, respectively, performed well in the training set and in the internal and external validation cohorts. The averaging ensemble model, based on a combination of the genomics and transcriptomics models, also performed well during training and external validation, but could not distinguish good from poor responders in the internal validation. In addition, we considered it less suitable for clinical application, since it did not outperform the clinicogenomics and clinicotranscriptomics models, and obtaining the combined sequencing data would be more expensive. The exome-only approximation of the clinicogenomics model showed good results. Application of this model in current clinical practice would cost less than the WGS-based model. In addition, genomics-based models might also be effective with liquid biopsy-obtained sequencing data, which offers possibilities for less invasive response prediction.
Clinical differences between true good and poor responders included a lower number of prior treatment lines, less prior treatment with enzalutamide and lower PSA at time of biopsy within good responders, and were expected based on previous studies36,37,38. Prior treatment with enzalutamide could have caused resistance to ARSI within the poor responders, whilst the lower PSA at start of ARSI within the good responders has previously been associated with a better prognosis36. However, as baseline PSA was only available for a part of the patients, this could not be included in the training of the classification models37,38. Nevertheless, it might be interesting to add this feature to the models during future optimization.
Comparison of genomic characteristics in the internal training set revealed four significantly enriched genomic markers within the true poor vs. good responders. These genomic characteristics included TMB, the total number of structural variants and the total number of tandem duplications and deletions. Genomic aberrations within AR, TP53, PTEN, RB1, CTNNB1, and chromosomal arms aneuploidies, that were previously associated with ARSI resistance, could not be confirmed within our internal training set8,15,16,17,18,19,22,23,39. In addition to genomic characteristics, we observed uniform presence of genes and gene-sets regulating or attributed to EMT, tumorigenesis, poor survival and/or aggressiveness, having greater expression within the poor responders vs. the good responders. AR-V7, which was previously associated with ARSI resistance, was not differentially expressed in the internal training cohort8,9,10,11,12,13,14.
These discrepancies might be caused by differences in prior therapy and used clinical outcome between the cohorts of the large tissue-based studies (SU2C West Coast and East Coast cohort). In addition, the SU2C West Coast cohort compared enzalutamide-naïve and enzalutamide-resistant patients whilst we investigated pre-treatment biopsies of good and poor responders for WTS analyses. The cfDNA-based studies often studied only a targeted panel of genes and might be confounded by a varying (unknown) tumor fraction within blood.
The CPCT-02 cohort is a diverse cohort of mCRPC patients, who varied in phase of their disease, as is for example illustrated by the wide distribution in number of prior therapies30. As we expected treatment duration to be less influenced by disease phase than overall survival, we preferred treatment duration over overall survival as clinical endpoint. Additionally, treatment duration was already available for most patients, while overall survival would have needed to be censored for approximately half of the patients, which would have increased uncertainty in training and internal validation of the model. Nevertheless, treatment duration and overall survival appeared to be highly correlated within the CPCT-02 cohort, justifying the use of overall survival as endpoint in the WCDT validation cohort, for which only OS was available35.
For the clinicogenomics model, additional analyses were performed. Adding the ambiguous prediction category in the internal validation cohort improved the overall stratification of ARSI-treatment duration between predicted good and poor responders, who were predicted with a probability of at least 60%. To explore the additional value of the clinicogenomics model in clinical subgroups, its performance was tested in uniformly pre-treated subgroups within our internal validation cohort. Although limited by the number of patients within the subgroups, the relevance of incorporating genomic features was especially visible in patients who were not heavily pre-treated. Interestingly, this is also the patient group that would benefit most from improved treatment guidance, as the number of available therapies is highest at the beginning of the disease course.
By simultaneously interrogating both WGS and WTS with machine learning, we were able to perform an unbiased discovery of biomarkers for response to ARSI in one of the largest cohorts of mCRPC patients with extensive sequencing data. Although clinical studies often have relatively small sample sizes for statistical analyses of whole omics data, machine learning techniques such as dimensionality reduction with Independent Component Analysis, and LOOCV, enabled the selection of predictive features whilst preventing overfitting. ML-based classification models can, in contrast to the traditional statistical models, determine the most predictive combination of biomarkers from a large set of features and predict response of future individual patients.
Up to now, no biomarkers for response prediction to ARSI have been implemented in clinical practice. The most extensively studied biomarker is AR-V7 in CTCs, the presence of which has been associated with a shorter PFS and OS14. However, questions have been raised about the confounding prognostic value of AR-V7, and a randomized controlled trial showing better outcomes for AR-V7 positive patients treated with therapies other than ARSI has not been performed yet40. The observed performance of our classification models is not high enough for direct application in the clinical setting. In addition, we cannot distinguish whether our models have predictive or prognostic value, as patients in this study were only treated with ARSI. To the best of our knowledge, no other machine learning-based models that aim to predict response to ARSI in mCRPC patients have been published.
This study does show the possibilities of response or prognosis prediction based on whole omics data. We also explored the performance of a simplified version of the clinicogenomics model on approximated WES data, which showed good results in the training and validation cohorts as well. Currently, the lower costs of WES are an advantage for the clinical implementation of the simplified model. Nevertheless, the implementation of a wider range of genomic features in the model, as within the original clinicogenomics model, might result in better generalizability in other patient cohorts. Additionally, although whole omics sequencing is not available for all patients nowadays, it is expected that WGS will be more cost-efficient than targeted panel sequencing in the near future due to decreasing costs and increasing number of targeted therapies41.
In theory, the clinicogenomics model might also be extended to liquid biopsies if sequenced deeply enough to reliably obtain tumor mutational burden and structural variant load. However, obtaining comparably detailed sequencing data from liquid biopsies might be challenging due to the often low tumor fraction and low amount of cfDNA that can be isolated from blood. Nevertheless, analysis of cfDNA does harbor the potential to better capture the inherent heterogeneity of (metastatic) cancer and the different clones present throughout the body, and would be a worthwhile endeavor to follow up. However, a similar modeling approach to generate a cfDNA-specific model would likely yield better results and might better take into consideration the landscape as seen within cfDNA.
Prediction models could be used not only to stratify patients with a predicted poor response to ARSI for alternative treatments, but also to identify those patients who are in greatest need of additional therapies. In response, clinical trials can focus on the subgroup of patients who respond poorly or moderately to standard-of-care options, such as ARSI, and who would benefit most from the development of additional therapies.
In conclusion, response to ARSI in mCRPC patients can be predicted using machine learning-based classification models that include whole-genomics, transcriptomics and prior treatment data. After optimization and prospective validation, these models could be used to guide treatment decisions and to select for clinical trials those patients who would benefit most from the development of additional therapies.
Methods
Study design and patients
With 41 participating hospitals within the Netherlands, the Center for Personalized Cancer Treatment (CPCT) aims to improve cancer treatment by selecting patients for clinical trial participation based on Next Generation Sequencing data of tumor tissue. A list of participating hospitals is available via www.cpct.nl/ziekenhuizen. The prospective CPCT-02 biopsy study (NCT01855477) has been approved by the medical ethical committee of the University Medical Center Utrecht and has been conducted in accordance with the Declaration of Helsinki. Inclusion and exclusion criteria have been published previously20,30,42. In short, patients were eligible if they had a locally advanced or metastatic solid tumor for which a next line of systemic treatment with a registered anti-cancer agent was indicated, and a safe tumor biopsy could be obtained. All patients provided written informed consent before any study procedures were performed. Compensation for participation was not provided.
For the current analysis, we included all mCRPC patients who underwent a successfully sequenced biopsy from a metastatic lesion within the CPCT-02 study between February 2015 and October 2019 and who were subsequently treated with AAP or enzalutamide. As CPCT-02 is an ongoing study with more than 4000 patients, we used a snapshot of the clinical data from December 19th, 2021 for the current analysis (ALEA Clinical). Clinical data collection is performed by trained local data managers and supervised by a central data manager.
Stratification of patients based on response to ARSI
As the main reason for stopping ARSI is progression of disease and rarely toxicity, patients were stratified according to treatment duration (TD) as surrogate for treatment response4,5,43. Patients were stratified into good (TD ≥ 180 days), ambiguous (TD 101–179 days), and poor (TD ≤ 100 days) responders. Cut-off values were based on clinical practice. We considered patients with a treatment duration of ≤100 days as true poor responders, as 100 days (~12 weeks) is typically the first major decision point for treatment (dis)continuation according to the PCWG3 criteria44. In addition, another threshold was set at ≥180 days to distinguish the true good responders from the ambiguous responders. To minimize the chance of bias due to incorrectly categorized patients, only the good and poor responder groups were used for biomarker discovery and training of the classification model. Nevertheless, for a complete overview of the patient cohort, the ambiguous responders are visualized in the figures and are included during the testing of the classification model.
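The stratification rule described above can be written as a small function. This is an illustrative sketch only: the function name and the example patients are hypothetical, and TD is assumed to be given in whole days.

```python
def stratify_by_treatment_duration(td_days):
    """Return the responder group for a treatment duration (TD) in days."""
    if td_days <= 100:
        return "poor"
    if td_days >= 180:
        return "good"
    return "ambiguous"  # TD of 101-179 days


# Hypothetical example patients (not study data).
durations = {"patient_a": 62, "patient_b": 100, "patient_c": 150, "patient_d": 180}
groups = {pid: stratify_by_treatment_duration(td) for pid, td in durations.items()}

# Only good and poor responders enter biomarker discovery and model training;
# ambiguous responders are kept for visualization and model testing.
training_ids = [pid for pid, group in groups.items() if group != "ambiguous"]
```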
Study procedures, sample processing, and sequencing strategies
Study procedures consisted of peripheral blood samples for germline DNA and image-guided core needle biopsies of a metastatic lesion. Biopsies were obtained before start of systemic treatment, independent of line of therapy. Detailed study procedures were published before30,42. In short, core needle biopsies were obtained according to standardized protocols and frozen in liquid nitrogen, directly after the procedure. In addition, a tube of blood was drawn. Further sample processing has been performed by the Hartwig Medical Foundation, Amsterdam, the Netherlands. Tumor cellularity was estimated by an experienced pathologist based on a single 6 µm haematoxylin and eosin (H&E) stained section. DNA was isolated from blood and biopsies with ≥30% tumor cellularity, according to supplier’s protocol (Qiagen) using the DSP DNA Midi kit and QIAsymphony DSP DNA Mini kit, respectively. Barcoded DNA libraries were prepared from 50–100 ng of genomic DNA (TruSeq Nano LT library preparation, Illumina) and sequenced on HiSeqX generating 2 × 150 read pairs using standard settings (Illumina).
Whole-transcriptome sequencing was performed according to the manufacturer’s protocols using a minimum of 100 ng total RNA input. Total RNA was extracted using the QIAsymphony RNA kit (QIAGEN, FRITSCH GmbH, Idar-Oberstein, Germany). Paired-end sequencing of (m)RNA was performed on either the Illumina NextSeq 550 platform (2 × 75 bp; Illumina, San Diego, CA, USA) or the NovaSeq 6000 platform (2 × 150 bp; Illumina, San Diego, CA, USA) using the manufacturer’s protocols.
Processing and analysis of the whole-genome sequencing data
Pre-processing of whole-genome sequenced samples
Whole-genome sequencing samples were pre-processed with the GRIDSS, PURPLE and LINX workflow as detailed previously by Priestley et al. and Cameron et al.30,45. PURPLE v3.1, GRIDSS v2.11.1 and LINX v1.16 were run by the Hartwig Medical Foundation (HMF) in a matched-normal design using peripheral blood.
Additional processing of whole-genome sequenced samples
From the WGS data obtained from the HMF, we performed additional processing using a custom workflow as implemented in the R2CPCT (v0.3.2) package. Genomic variants were re-annotated using Variant Effect Predictor46 (VEP; release 104) based on GRCh37 and GENCODE v38 annotations using the custom workflow available from https://github.com/J0bbie/VariantAnnotation_VEP. gnomAD47 (genome and exome v2.1.1) and ClinVar48 (accessed on 27-09-2021) annotations were added to the default VEP annotations.
Genomic (somatic) variants were filtered out if they were present in ≥5 samples in the Panel-Of-Normals (PON) of the HMF. In addition, genomic variants were filtered out if they were present in the gnomAD exome and/or genome populations with an allele frequency (AF) ≥ 0.001 and ≥ 0.005, respectively. Large structural somatic variants (SV), as detected by GRIDSS (PASS-only), were imported and annotated using the StructuralVariantAnnotation package (v1.10.0) into translocations, deletions, insertions, inversions, tandem duplications and single-breakends (in which the partnering break-end could not be detected).
Genome-wide ploidy, overlapping copy-number segments and their estimated tumor purity-corrected absolute copy-number as derived by PURPLE were used in assessing gene-wise copy-number alterations. If the overlapping copy-number segment of a gene harbored an estimated absolute copy-number ≤0.75, the gene would be classified as a “deep deletion”. Similarly, if the estimated absolute copy-number was only half of genome-wide ploidy, it would be classified as a “deletion”. If the estimated absolute copy-number was 1.5 times the genome-wide ploidy, it would be classified as an “amplification” and if the estimated absolute copy-number was 3 times the genome-wide ploidy or constituted ≥15 copies, it would be classified as a “deep amplification”. For chromosome X and Y, a correction of genome-wide ploidy minus one was used to correct for haploidy in these chromosomes. In addition, if the gene-wise B-allele frequency based on heterozygous germline markers was ≤0.15 or ≥0.85, it would also be classified as a Loss-Of-Heterozygosity (LOH) event. Per gene, this approach was also used to detect deleted or amplified exons using the same criteria.
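The gene-wise classification rules above can be illustrated with a short sketch. Several points are assumptions rather than statements from the study: the ploidy-relative thresholds are read as inequalities ("at most half" and "at least 1.5 or 3 times"), the deep categories take precedence over the shallow ones, genes on chromosomes X and Y are compared against ploidy minus one, and the "neutral" label and all function names are hypothetical.

```python
def classify_gene_copy_number(abs_cn, ploidy, on_sex_chromosome=False):
    """Classify a gene from its purity-corrected absolute copy-number."""
    # Assumption: chromosomes X and Y use genome-wide ploidy minus one.
    reference = ploidy - 1 if on_sex_chromosome else ploidy
    if abs_cn <= 0.75:
        return "deep deletion"
    if abs_cn <= 0.5 * reference:
        return "deletion"
    if abs_cn >= 3 * reference or abs_cn >= 15:
        return "deep amplification"
    if abs_cn >= 1.5 * reference:
        return "amplification"
    return "neutral"  # hypothetical label for genes without an alteration


def is_loh(b_allele_frequency):
    """Loss-of-heterozygosity call from the gene-wise B-allele frequency."""
    return b_allele_frequency <= 0.15 or b_allele_frequency >= 0.85
```

For a diploid tumor (ploidy 2.0) under these assumptions, an absolute copy-number of 3.0 is labeled an amplification and 6.0 a deep amplification.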
CHORD (v2.0)49 was used to assess samples for BRCA1/BRCA2-associated homologous recombination deficiency (HRD) using default settings. ShatterSeek (v0.6)50 was used to detect putative chromothripsis events using best-practice settings as detailed by the authors. A chromothripsis-like event was called if it met the following criteria: (a) total number of intra-chromosomal SVs involved in the event ≥25; (b) max. number of oscillating CN segments (2 states) ≥7 or max. number of oscillating CN segments (3 states) ≥14; (c) total size of chromothripsis event ≥20 megabase pairs (Mbp); (d) satisfying the test of equal distribution of SV types (p > 0.05); and (e) satisfying the test of non-random SV distribution within the cluster region or chromosome (p ≤ 0.05).
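Criteria (a) to (e) can be combined into a single check per candidate event, as sketched below. The class and field names are hypothetical and do not correspond to ShatterSeek output columns; criterion (e) is assumed to be summarized by one p-value per candidate.

```python
from dataclasses import dataclass


@dataclass
class CandidateEvent:
    n_intrachromosomal_svs: int
    max_oscillating_cn_2_states: int
    max_oscillating_cn_3_states: int
    size_mbp: float
    p_equal_sv_types: float             # test of equal distribution of SV types
    p_nonrandom_sv_distribution: float  # test of non-random SV distribution


def is_chromothripsis_like(event):
    """Apply criteria (a)-(e) to one candidate event."""
    return (
        event.n_intrachromosomal_svs >= 25                # (a)
        and (event.max_oscillating_cn_2_states >= 7
             or event.max_oscillating_cn_3_states >= 14)  # (b)
        and event.size_mbp >= 20                          # (c)
        and event.p_equal_sv_types > 0.05                 # (d)
        and event.p_nonrandom_sv_distribution <= 0.05     # (e)
    )


# Hypothetical candidates: the second one fails the size criterion (c).
passing = CandidateEvent(40, 9, 3, 35.0, 0.40, 0.001)
too_small = CandidateEvent(40, 9, 3, 12.0, 0.40, 0.001)
```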
Discovery of genes under evolutionary selection
We performed a dN/dS analysis on somatic mutations (SNV and InDels) using dndscv (v0.0.1.0)51 on respective genome sequences and transcript annotations using a custom transcript database based on ENSEMBL Genes (v104)/GENCODE (v38) annotations. We performed a dN/dS analysis over the entire discovery cohort (n = 155) and on the poor and good responders, separately. Genes-of-interest were selected based on the statistical significance, corrected for multiple hypothesis testing (Benjamini-Hochberg), which integrated all mutation types (missense, nonsense, essential splice-site mutations and InDels; qglobal_cv ≤ 0.1) and/or without InDels (qallsubs_cv ≤ 0.1).
Unbiased detection of recurrent and focal copy-number aberrations and overlap with known drivers
We performed GISTIC2 analysis (v2.0.23) for the WGS-discovery cohort on the PURPLE-derived copy-number segments using tumor purity-corrected absolute copy-numbers as input (log2-transformed − 1, i.e., diploid is set to zero); haploid chromosomes in male samples were corrected by adding a pseudo-count (of 1) prior to log2-transformation. Segments with log2-transformed values ≤ −10 were set to −10.
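As a worked illustration of this transformation, the sketch below converts an absolute copy-number into the log2-based segment value, so that a diploid segment maps to zero and very low values are floored at −10. The function name is hypothetical, and mapping non-positive copy-numbers (for which the logarithm is undefined) to the floor is an assumption.

```python
import math

FLOOR = -10.0  # lowest segment value passed on to GISTIC2


def to_gistic_segment_value(abs_cn, haploid_in_male=False):
    """Return log2(copy-number) - 1, floored at -10."""
    # Haploid chromosomes in male samples receive a pseudo-count of 1.
    cn = abs_cn + 1.0 if haploid_in_male else abs_cn
    if cn <= 0:
        return FLOOR  # assumption: log2 is undefined here, so use the floor
    return max(math.log2(cn) - 1.0, FLOOR)
```

Under these assumptions, a diploid autosomal segment (copy-number 2) and a single-copy chromosome X segment in a male sample both yield 0, while a four-copy segment yields 1.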
GISTIC2 (v2.0.23) was performed using the following settings with default GISTIC2-provided GRCh37 annotations:
gistic2 -b <output> -seg <segments> -refgene hg19.UCSC.add_miR.140312.refgene.mat -genegistic 1 -gcm extreme -maxseg 4000 -broad 1 -brlen 0.98 -conf 0.95 -rx 0 -cap 3 -saveseg 0 -armpeel 1 -smallmem 0 -res 0.01 -ta 0.3 -td 0.3 -savedata 0 -savegene 1 -qvt 0.1 -twoside 0
We performed this GISTIC2 analysis for the full discovery cohort (n = 155) and separately on the poor and good responder groups.
GISTIC2 output was imported and re-annotated using GENCODE annotations (v38; min. 10 bp overlap) thereby using the wide-peak limits of the recurrent copy-number peaks (q ≤ 0.1) to classify the region containing the likely target(s) of the recurrent and focal copy-number aberration.
Genes were annotated to GISTIC2 peaks (q ≤ 0.1) based on the following strategy:
1. All overlapping genes (min. 10 bp) were assigned to each GISTIC2 peak.
2. If multiple genes overlapped a GISTIC2 peak, known driver genes were used to annotate that peak. E.g., if a GISTIC2 peak overlapped both MYC and a near-adjacent non-driver gene, only MYC was chosen as possible target.
3. If no overlapping genes could be found, GISTIC2 peaks were annotated with the nearest GENCODE (v38) protein-coding gene.
The peak amplitude thresholds were used to represent the presence (or absence) of the observed GISTIC2 peak within each respective sample: low amplitude (t > −0.3), medium amplitude (−0.3 > t > −1.3) and high amplitude (t < −1.3).
Analysis and quantification of known mutational signatures
Mutational signatures analysis was performed using the MutationalPatterns package (v3.2.0) based on COSMIC signatures (v3.2; single-base substitutions, doublet-substitutions and InDels-based)52. Sample-specific signature refitting was done by finding the optimal contribution of the COSMIC signatures (v3.2). Proposed etiologies for the COSMIC signatures were taken from the COSMIC signature database (v3.2).
Detection of genomic differences between poor and good responders
Differences in genomic characteristics between poor and good responders were tested using a two-sided Mann-Whitney U test with Benjamini-Hochberg correction on the internal training cohort (n = 79). We tested the following genomic characteristics: tumor mutational burden, total number of deletions (SV), total number of inversions (SV), total number of insertions (SV), total number of translocations (SV), total number of tandem duplications (SV), total sum of structural variants per sample and genome-wide ploidy.
Mutual-exclusiveness of mutant genes, chromothripsis status (≥1 chromothripsis event in sample) and HRD-status were assessed between poor and good responders using a two-sided Fisher’s Exact Test with Benjamini–Hochberg correction. Genes with protein-coding mutation(s) and/or deep amplification or deep deletion status were counted as mutants within this analysis.
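Both comparisons above rely on Benjamini-Hochberg correction of the per-feature p-values. The sketch below shows the standard step-up adjustment in plain Python; the function name and the example p-values are hypothetical, and the Mann-Whitney U and Fisher's Exact tests themselves are not reimplemented here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values for four tested characteristics (not study results).
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```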
Processing and analysis of the whole-transcriptome sequencing data
Pre-processing of whole-transcriptome data
Prior to alignment, raw reads (per lane) were pre-processed using fastp (v0.23.2) to trim adapter sequences (paired-end) and low-quality bases and to perform low-complexity trimming, but without a min. length selection on the remainder of the read. Subsequently, these corrected reads were aligned against the human reference genome (GRCh37) with GENCODE (v38)53 annotations using STAR (v2.7.9a)54. Alignment was performed against the full reference genome and also against the transcriptome only, to allow for downstream calculation of the fragments per kilobase per million mapped reads (FPKM). Per sample, all lanes (both R1 and R2; paired-end) were used during alignment using the following command:
STAR --genomeDir <GRCh37> --readFilesIn <R1 lanes> <R2 lanes> --readFilesCommand zcat --outFileNamePrefix <prefix> --outSAMtype BAM SortedByCoordinate --outSAMunmapped Within --outSAMattributes All --outFilterMultimapNmax 10 --outFilterMismatchNmax 3 --limitOutSJcollapsed 3000000 --chimSegmentMin 10 --chimOutType WithinBAM SoftClip --chimJunctionOverhangMin 10 --chimSegmentReadGapMax 3 --chimScoreMin 1 --chimScoreDropMax 30 --chimScoreJunctionNonGTAG 0 --chimScoreSeparation 1 --outFilterScoreMinOverLread 0.33 --outFilterMatchNminOverLread 0.33 --outFilterMatchNmin 35 --alignSplicedMateMapLminOverLmate 0.33 --alignSplicedMateMapLmin 35 --alignSJstitchMismatchNmax 5 -1 5 5 --twopassMode Basic --twopass1readsN -1 --runThreadN 10 --limitBAMsortRAM 10000000000 --quantMode TranscriptomeSAM --outSAMattrRGline <sample-specific readgroup>
Post-alignment, duplicate reads were marked using sambamba markdup (v0.8.1)55 and general alignment metrics (e.g., number of primary-mapped reads) were retrieved using sambamba flagstats (v0.8.1).
Determining per-gene expression
Read-counts per gene, from GENCODE annotations (v38), were retrieved using featureCounts (v2.0.3)56 on primary-aligned reads only with paired-end and strand-specific options:
featureCounts -T 50 -t exon -g gene_id --primary -p -s (1 or 2) -a <GENCODE v38> -o <output> <Genome-aligned BAM files>
featureCounts was run with -s 2 for NextSeq 550 WTS samples and with -s 1 for NovaSeq 6000 WTS samples to account for differences in library read orientation.
Only protein-coding genes were used (n = 19449) in all downstream analysis.
Batch-effect correction
To remove potential bias regarding site of biopsy, we used the full CPCT-02 mCRPC cohort. We performed differential analysis using DESeq2 per major biopsy site (≥5 samples) versus the rest; the major sites being liver, lymph node, bone and “Other”, i.e., liver (n = 54) vs. the rest (n = 267), lymph node (n = 159) vs. the rest (n = 162), bone (n = 72) vs. the rest (n = 249) and “Other” (n = 36) vs. the rest (n = 285) on all protein-coding genes. Following default DESeq2 analysis (Wald test), we performed LFC-shrinkage using the ‘ashr’ method57. Next, genes with the following criteria were designated as putative biopsy-site (batch-effect) markers: adjusted p (q) ≤ 0.05, log2 fold-change standard error (lfcSE) ≤ 1 and |log2 fold-change | ≥ 1. This resulted in a list of 3419 distinct genes, which were significantly enriched (or depleted) within liver, lymph node and/or bone biopsies (Suppl. Table 1). These markers were removed prior to all subsequent whole-transcriptome analysis.
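The designation and removal of biopsy-site markers can be sketched as a simple filter over per-gene results. The gene names, values and dictionary layout below are hypothetical and only mirror the three stated criteria; in practice the markers from all site-vs-rest comparisons would be pooled before removal.

```python
def is_biopsy_site_marker(q, log2fc, lfc_se):
    """Apply the criteria: q <= 0.05, lfcSE <= 1 and |log2 fold-change| >= 1."""
    return q <= 0.05 and lfc_se <= 1 and abs(log2fc) >= 1


# Hypothetical results of one site-vs-rest comparison (not study data).
results = {
    "GENE_A": {"q": 0.001, "log2fc": 2.3, "lfc_se": 0.4},
    "GENE_B": {"q": 0.20, "log2fc": 3.0, "lfc_se": 0.5},
    "GENE_C": {"q": 0.01, "log2fc": -1.5, "lfc_se": 0.3},
    "GENE_D": {"q": 0.01, "log2fc": 0.4, "lfc_se": 0.2},
}

markers = {gene for gene, r in results.items() if is_biopsy_site_marker(**r)}

# Genes retained for all subsequent whole-transcriptome analyses.
retained = sorted(set(results) - markers)
```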
A t-SNE approach (θ = 0.5, perplexity = 15, dims = 2, 1000 iterations) was used to visualize the batch effect of the different biopsy sites and the effectiveness of its removal within the 113 whole-transcriptome sequenced samples used as discovery cohort within this study. No lingering batch effects were observed.
Differential expression analysis between treatment response groups
We performed differential expression analysis using DESeq2 on all protein-coding genes without the designated biopsy-site specific genes (as described above) between good responders (n = 38) vs. poor responders (n = 41) within our internal training cohort. Next, genes meeting the following criteria were designated as differentially expressed genes: adjusted p (q) ≤ 0.05, an average read count over all samples (baseMean) ≥ 25, log2FC standard error ≤ 1 and |log2FC| ≥ 0.5.
Quantification of AR-V7 expression
We quantified the percent spliced in (PSI) of AR-V7 by comparing the number of junction-reads which spanned AR exon 2 and cryptic exon 3 (ARV7) vs. the number of junction-reads spanning AR exon 1 and exon 2 (ARe12) and dividing them appropriately:
Design of ML-classification models for prediction of response to ARSI
The internal cohort of patients with matched WGS and WTS was divided into a training set and an internal validation set of 70% (n = 79) and 30% (n = 34) of the samples, respectively. Good and poor responders were randomly divided. Ambiguous responders were included in the validation set only. The training set was used in LOOCV (‘LeaveOneOut’ from sklearn.model_selection) to determine model performance, after which the full training set was used to train a model that was applied to the internal validation set. To perform validation on the external cohort, the same training set was used to train a classifier, but for WTS data, additional preprocessing steps were applied (see below). Figure 2 shows the machine learning model design and evaluation steps (figure made in BioRender.com).
Additional hyperparameter tuning experiments using grid search were performed for the best performing models in LOOCV. The hyperparameter combinations were evaluated based on accuracy score (‘sklearn.metrics.accuracy_score’).
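The leave-one-out procedure described in this section can be sketched in plain Python. This is not the study's implementation: a toy nearest-centroid classifier stands in for the actual models, and all function names and data are hypothetical. In each fold, one sample is held out, the model is fitted on the remaining samples, and the held-out label is predicted.

```python
def leave_one_out_accuracy(samples, labels, fit, predict):
    """Fraction of held-out samples whose label is predicted correctly."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def fit_centroids(xs, ys):
    """Toy stand-in model: the mean feature vector per class."""
    centroids = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_nearest_centroid(centroids, x):
    """Predict the class whose centroid is closest to x."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical one-feature dataset (not study data).
toy_samples = [[0.0], [0.2], [0.1], [1.0], [1.2], [0.9]]
toy_labels = ["good", "good", "good", "poor", "poor", "poor"]
accuracy = leave_one_out_accuracy(
    toy_samples, toy_labels, fit_centroids, predict_nearest_centroid
)
```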
Classification input preprocessing and model design
The four significantly divergent genomic features between good and poor responders to ARSI, namely TMB, total number of structural variants, total number of tandem duplications and total number of deletions, were centered and scaled prior to classification. Standard scaling was a necessary pre-processing step based on the comparison of genomic feature distributions in the training, internal and external cohorts (Suppl. Figure 10). To train the genomics and genomics-clinical covariate models, a Logistic Regression classifier was applied (‘LogisticRegression’ from sklearn.linear_model, solver = ’liblinear’).
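Centering and scaling of a single genomic feature can be illustrated as follows. The sketch assumes the population standard deviation, which is also what scikit-learn's StandardScaler uses; the function name and example values are hypothetical, and mapping a constant feature to zeros is an assumption.

```python
from statistics import fmean, pstdev


def center_and_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = fmean(values)
    sd = pstdev(values)
    if sd == 0:
        # Assumption: a constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical TMB values for three samples within one cohort.
tmb_per_sample = [2.0, 4.0, 6.0]
scaled_tmb = center_and_scale(tmb_per_sample)
```

After scaling, the feature has mean 0 and unit variance within the cohort, which makes features with very different ranges (such as TMB and structural variant counts) comparable as model inputs.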
The raw transcriptomics data was TMM-normalized (edgeR) and centered and scaled (‘StandardScaler’ from sklearn.preprocessing, with_mean = True, with_std = True). To perform dimensionality reduction, sparse PCA58, conventional PCA59 and Independent Component Analysis60 were evaluated. While evaluating the cumulative explained variance of the principal components is a widely used approach to select the optimal number of components to describe the dataset with PCA, this information is not available when applying sparse PCA and Independent Component Analysis (additionally the latter being an entirely different approach). Therefore, to compare sparse PCA (‘SparsePCA’ from sklearn.decomposition), PCA (‘PCA’ from sklearn.decomposition) and Independent Component Analysis (‘FastICA’ from sklearn.decomposition), these models were first applied on the training dataset with the target number of components (‘n_components’) ranging from 10 to 50. Afterwards, a Linear Support Vector Classifier (Linear SVC) (‘LinearSVC’ from sklearn.svm, penalty = ’l2’, loss = ’squared_hinge’, C = 1.0, max_iter = 10’000) was trained on a given set of components and subsequently calibrated (‘CalibratedClassifierCV’ from sklearn.calibration). Lastly, all dimensionality reduction-based classification models were evaluated based on AUC (Suppl. Fig. 11) and the best performing model was chosen for internal and external validation.
Combining genomic and transcriptomic models and data
As an attempt to exploit the strength of both models and data types, different ensembling techniques were applied. A stacking classifier was built using ‘StackingClassifier’ from sklearn.ensemble where the prediction probability output of both models was used in final_estimator=LogisticRegression() to calculate the final prediction. The averaging ensemble approach was carried out by averaging the prediction probabilities from the best transcriptomic model (40 independent components or ICs) and the genomic model. A multi-model averaging ensemble was built by averaging predictions from transcriptomics-genomics averaging ensemble model pairs from 100 randomized evaluations. In each individual averaging ensemble, the transcriptomics model used n randomly selected components (max. n = 40; from the best performing 40 independent components (ICs) based transcriptomics data decomposition) from which the prediction was averaged with the genomics model. The results of these individual ensembles were then aggregated and averaged over the 100 experiments. Lastly, a bagging classifier was built using ‘BaggingClassifier’ from sklearn.ensemble with n_estimators = 100, max_samples = 1.0, bootstrap = True, oob_score = True and max_features = 0.35, by randomly sampling from a joint set of transcriptomics (40 ICs) and genomics features for each individual learner. Boosting ensemble methods that require extensive subsampling were not assessed due to the limited size of the training set (79 samples). Moreover, the main goal of ensembling was to reduce variance (due to potential overfitting) and not bias. All tested ensemble approaches were evaluated in LOOCV in the initial model design step.
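The averaging and multi-model averaging ensembles can be sketched as below. This is a simplified illustration under stated assumptions: `predict_transcriptomic` is a hypothetical callable that returns per-sample probabilities from a transcriptomics model restricted to the given component indices, and the number of components per run is assumed to be drawn uniformly between 1 and 40, which the text does not specify.

```python
import random


def average_probabilities(p_transcriptomic, p_genomic):
    """Averaging ensemble: mean of the two models' predicted probabilities."""
    return [(a + b) / 2 for a, b in zip(p_transcriptomic, p_genomic)]


def multi_model_average(predict_transcriptomic, p_genomic,
                        n_components=40, n_runs=100, seed=0):
    """Average the pairwise ensembles over runs with random component subsets."""
    rng = random.Random(seed)
    totals = [0.0] * len(p_genomic)
    for _ in range(n_runs):
        n = rng.randint(1, n_components)  # assumption: uniform in 1..40
        subset = rng.sample(range(n_components), n)
        pair = average_probabilities(predict_transcriptomic(subset), p_genomic)
        totals = [t + p for t, p in zip(totals, pair)]
    return [t / n_runs for t in totals]


def stub_transcriptomic_model(component_subset):
    # Hypothetical stand-in: a real model would be trained on these components.
    return [0.8, 0.2]


genomic_probabilities = [0.6, 0.4]  # hypothetical genomic model output
ensemble = multi_model_average(stub_transcriptomic_model, genomic_probabilities)
```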
Addition of prior treatment data to classification models
Additionally, the WGS-only and WTS-only classification models were extended with baseline clinical variables: AAP/enzalutamide pretreatment, chemotherapy pretreatment and the number of treatment lines. The former two clinical variables were binary (0 – not received, 1 – received) while the number of treatment lines ranged from 0 to 9. The clinicogenomics, clinicotranscriptomics and joint WTS + WGS + ARSI models were trained and evaluated in LOOCV.
Shuffled labels experiments on final classification models
Shuffled label experiments were carried out on the best performing WTS-only, WTS + ARSI, WTS + WGS + ARSI, WGS-only, WGS + ARSI and ensemble (WTS + WGS) models. By permuting the sample labels, the corresponding distribution of the null hypothesis (=’there is no meaningful feature pattern that can be used to distinguish between poor and good responders’) can be estimated. The label shuffling procedure measures how likely it is that the observed classifier accuracy can be obtained by chance. For each classification model, the sample labels were randomly shuffled in 10’000 iterations using numpy.random.shuffle(). Afterwards, LOOCV was performed in each iteration, where the shuffled labels were used in model training and then a left-out test sample label was predicted in each fold. The same shuffled labels were used for all classification models in each iteration.
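The shuffled-label procedure amounts to a permutation test, which can be sketched as follows. `score_fn` is a hypothetical callable standing in for "run LOOCV with these labels and return the accuracy"; the toy scoring function, the data and the +1 correction in the empirical p-value are assumptions for illustration and are not taken from the study.

```python
import random


def shuffled_label_p_value(labels, score_fn, n_iterations=1000, seed=0):
    """Empirical p-value: how often shuffled labels score at least as well."""
    rng = random.Random(seed)
    observed = score_fn(labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_iterations):
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= observed:
            at_least_as_good += 1
    # Assumption: +1 correction so the estimate is never exactly zero.
    return (at_least_as_good + 1) / (n_iterations + 1)


# Hypothetical one-feature dataset with well-separated groups.
features = [0.1, 0.2, 0.15, 0.3, 0.25, 0.9, 1.1, 0.95, 1.2, 1.05]
true_labels = ["good"] * 5 + ["poor"] * 5


def toy_score(labels):
    # Stand-in for a full LOOCV run: accuracy of a fixed threshold rule.
    predicted = ["poor" if x > 0.5 else "good" for x in features]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


p_value = shuffled_label_p_value(true_labels, toy_score)
```

A small p-value indicates that the observed accuracy is unlikely to arise when the link between features and labels is broken.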
Validation of classification models in internal cohort
For the validation of the WGS-only and clinicogenomics classification models, the genomic features were centered and scaled within the internal cohort data. To perform dimensionality reduction on the transcriptomics data of the internal validation cohort, the best independent components model that was already fitted on the training cohort was applied to transform the dataset. Following the dimensionality reduction, the transformed data was used not only in the WTS-only model but also in the combined WTS + ARSI, WTS + WGS + ARSI and ensemble (WTS + ARSI) models to predict responder groups in the internal validation cohort.
Diagnostic accuracy and predictive values were evaluated for the true good and poor responders within the internal validation cohort, for comparison with the training set, as well as within the complete internal validation cohort, including true ambiguous responders too. Additionally, treatment duration and overall survival were compared in the predicted groups. Subgroup analyses in similarly pre-treated patients were performed for the clinicogenomics model to evaluate the additional predictive value of genomics to clinical data.
Validation of classification models in external cohort
Classification models were validated in the external West Coast Dream Team cohort (WCDT), which included mCRPC patients treated with ARSI after biopsy21,35. WGS was available for 56 patients, while WTS was available for 77 patients. Clinical outcome was defined as overall survival from time of biopsy to death of any cause.
Data from the WCDT cohort was pre-processed by applying the same steps as in the CPCT-02 cohort. The genomic features were centered and scaled prior to prediction. The transcriptomics data was TMM-normalized, then centered and scaled. To account for batch-effect and general inter-domain variability of the CPCT-02 and the WCDT transcriptomics datasets, a domain adaptation method (PRECISE) was used61. PRECISE first employs Independent Component Analysis separately on the two transcriptomics matrices. Then using the independent component datasets, it infers a so-called consensus representation. First, the consensus representation was fitted on the training data (70% of CPCT) and calculated with ‘ConsensusRepresentation’ with n_factors = 40, n_pv = 40, dim_reduction = ica, n_representations = 40, mean_center = True and std_unit = True. Then, both the training set and the external validation set were transformed and a Linear SVC classifier (with the same parameters as described in ‘Classification input preprocessing and model design’) was trained on the transformed training data. The combined WTS + ARSI, WTS + WGS + ARSI and ensemble (WTS + ARSI) models were also re-trained on the transformed training data. Afterwards, the external validation cohort sample labels were predicted with each model as good or poor responder and OS in both groups was compared.
As the utilized transcript annotations differed between the internal training set and the external cohort (GENCODE v38 and GENCODE v28), missing genes were filtered out from the internal training set. To evaluate the potential effect of the missing genes on the classification, we compared a classifier that was trained on the full transcriptomics dataset and a classifier that was trained on the filtered transcriptomics dataset in the initial LOOCV step (Suppl. Fig. 12).
Application of an adapted clinicogenomics model to WES data
Importance of the features in the clinicogenomics model was assessed in LOOCV. Individual importance values were determined based on the fitted Logistic Regression model coefficients, which were accessible from the trained model (model.coef_[0]). Unbiased interpretation of these coefficients required that all the input features were scaled prior to model training (see Suppl. Fig. 10 and Classification input preprocessing and model design). The obtained importance values were visualized in a bar plot, with error bars indicating the standard deviation of values across all LOOCV folds (Suppl. Fig. 9). To determine the potential of a prediction model based on tumor mutational burden as could be observed with WES (coupled with information on prior ARSI), we subsampled the somatic mutations found within our WGS data to include only those within exonic regions (GENCODE v38). TMB was recalculated (using the total Mb of exonic regions rather than of the entire genome) to generate a WES proxy for TMB.
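The coefficient-based importance assessment described above can be sketched as follows. This is an illustration on synthetic data, not the study code: the three features, the random seed and the choice to fit the scaler inside each fold are assumptions, and scikit-learn defaults are used for the Logistic Regression model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 40 samples, 3 features,
# where only the first feature carries signal about the label.
rng = np.random.default_rng(0)
n_samples = 40
X = rng.normal(size=(n_samples, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

fold_coefs = []
for train_idx, _ in LeaveOneOut().split(X):
    # Center and scale on the training fold so coefficients are comparable.
    scaler = StandardScaler().fit(X[train_idx])
    model = LogisticRegression().fit(scaler.transform(X[train_idx]), y[train_idx])
    # For a binary problem, coef_ has shape (1, n_features).
    fold_coefs.append(model.coef_[0])

fold_coefs = np.array(fold_coefs)            # shape: (n_folds, n_features)
mean_importance = fold_coefs.mean(axis=0)    # bar heights
sd_importance = fold_coefs.std(axis=0)       # error bars across LOOCV folds
print(mean_importance, sd_importance)
```

Because every feature is brought to unit variance before fitting, the magnitude of each coefficient can be compared across features, which is the reason scaling is required for an unbiased reading of the values.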
A Logistic Regression model was trained on the approximated WES feature (exome TMB) and prior ARSI treatment information in LOOCV, then the performance was assessed based on AUC (Fig. 8a). Internal and external validations were carried out in the same fashion as for the WGS-based clinicogenomics model (see Fig. 8b, c).
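The WES proxy for TMB described above amounts to counting the somatic mutations that fall inside exonic intervals and dividing by the total exonic length in Mb. A minimal sketch, assuming 1-based inclusive coordinates and hypothetical toy intervals and mutations (the function names are not from the study code):

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals (1-based, inclusive)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def exome_tmb(mutations, exons):
    """Return (exonic mutation count, mutations per Mb of exonic sequence).

    mutations: iterable of (chrom, position)
    exons:     iterable of (chrom, start, end)
    """
    by_chrom = {}
    for chrom, start, end in exons:
        by_chrom.setdefault(chrom, []).append((start, end))
    merged = {chrom: merge_intervals(iv) for chrom, iv in by_chrom.items()}

    # Total exonic length after merging, so overlapping exons are counted once.
    total_bp = sum(end - start + 1 for iv in merged.values() for start, end in iv)
    n_exonic = sum(
        1
        for chrom, pos in mutations
        if any(start <= pos <= end for start, end in merged.get(chrom, []))
    )
    return n_exonic, n_exonic * 1e6 / total_bp


# Hypothetical toy example: 1,000 bp of merged exonic sequence.
exons = [("chr1", 100, 199), ("chr1", 150, 299), ("chr2", 1000, 1799)]
mutations = [("chr1", 120), ("chr1", 250), ("chr1", 500), ("chr2", 1500), ("chr3", 10)]

n_exonic, tmb = exome_tmb(mutations, exons)
print(n_exonic, tmb)
```

Merging the intervals first avoids counting the same bases twice when transcripts of one gene share exons, which would otherwise inflate the denominator and lower the estimated TMB.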
Statistical designs
Clinical characteristics of good and poor responders were compared using the appropriate statistical test, based on the number of variables and the normality of their distribution (t-test, non-parametric test or Fisher’s Exact test). P-values were adjusted for multiple testing using the Bonferroni method. Unless otherwise stated, statistical tests were performed in a two-sided manner. Treatment duration and OS in the predicted groups were visualized in Kaplan-Meier curves. Good and poor responders were compared using log-rank tests. Statistical tests were performed in IBM SPSS Statistics (v28.0.1.0 (142)) and the statistical platform R (v4.1.1).
Genomic differences (i.e., TMB, total SV and total deletions, translocations, insertions, inversions and tandem duplications and genome-wide ploidy) were tested using the two-sided Wilcoxon Rank-Sum Test with multiple testing correction (Benjamini-Hochberg). Statistical tests were performed in the statistical platform R (v4.1.1). In figures, p-values (or q-values) are denoted as *(p < 0.05), **(p < 0.01) and ***(p < 0.001).
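The multiple-testing correction and the significance notation described above can be sketched as follows. The block covers only the Benjamini-Hochberg adjustment and the mapping to asterisks, not the Wilcoxon Rank-Sum Test itself; the input p-values are hypothetical, and the study performed these steps in R rather than in Python.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def significance_stars(value):
    """Map a p- or q-value to the asterisk notation used in the figures."""
    if value < 0.001:
        return "***"
    if value < 0.01:
        return "**"
    if value < 0.05:
        return "*"
    return ""


# Hypothetical raw p-values for five genomic features.
p_values = [0.0001, 0.003, 0.02, 0.2, 0.5]
q_values = benjamini_hochberg(p_values)
stars = [significance_stars(q) for q in q_values]
print(q_values, stars)
```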
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The WGS, RNA-Seq and corresponding clinical data used in this study were made available by the Hartwig Medical Foundation (Dutch nonprofit biobank organization) after signing a license agreement stating that the data cannot be made publicly available via third-party organizations. Therefore, the data are available under restricted access and can be requested by contacting the Hartwig Medical Foundation (https://www.hartwigmedicalfoundation.nl/applying-for-data/) under the accession code DR-07120. In addition, we performed analyses on the patients of the previously reported WCDT cohort who were treated with ARSI directly after biopsy and for whom WGS and/or WTS was previously performed21,35. For a detailed description of data availability, we refer to this paper21. Requests for data can be directed to prof. dr. Felix Feng (Felix.Feng@ucsf.edu). The data generated in this study are provided in the Source Data file. Source data are provided with this paper.
Code availability
The initial workflows and software for the processing of the WGS data are available at https://github.com/hartwigmedical/. Any additional custom code and scripts used within this study (processing, analysis, and visualization) have been deposited on Zenodo (DOI: 10.5281/zenodo.7712610)62. The custom R-based workflow (R2CPCT) used to further analyze the WGS data obtained from HMF and the CPCT-02 study is available on GitHub under the GPL-3.0 license: https://github.com/J0bbie/R2CPCT. The code used to further annotate genomic variants (as retrieved from HMF) using VEP is available on GitHub under the GPL-3.0 license: https://github.com/J0bbie/VariantAnnotation_VEP.
References
Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424 (2018).
Siegel, R. L., Miller, K. D. & Jemal, A. Cancer statistics, 2020. CA Cancer J. Clin. 70, 7–30 (2020).
Armstrong, A. J. et al. Five-year survival prediction and safety outcomes with enzalutamide in men with chemotherapy-naïve metastatic castration-resistant prostate cancer from the prevail trial. Eur. Urol. 78, 347–357 (2020).
Fizazi, K. et al. Abiraterone acetate for treatment of metastatic castration-resistant prostate cancer: final overall survival analysis of the COU-AA-301 randomised, double-blind, placebo-controlled phase 3 study. Lancet Oncol. 13, 983–992 (2012).
Scher, H. I. et al. Increased survival with enzalutamide in prostate cancer after chemotherapy. N. Engl. J. Med. 367, 1187–1197 (2012).
de Bono, J. S. et al. Circulating tumor cells predict survival benefit from treatment in metastatic castration-resistant prostate cancer. Clin. Cancer Res. 14, 6302–6309 (2008).
Mehra, N. et al. Plasma cell-free dna concentration and outcomes from taxane therapy in metastatic castration-resistant prostate cancer from two phase III trials (FIRSTANA and PROSELICA). Eur. Urol. 74, 283–291 (2018).
Torquato, S. et al. Genetic alterations detected in cell-free dna are associated with enzalutamide and abiraterone resistance in castration-resistant prostate cancer. JCO Precis. Oncol. 3, 18.00227 (2019).
Antonarakis, E. S. et al. Androgen receptor splice variant 7 and efficacy of taxane chemotherapy in patients with metastatic castration-resistant prostate cancer. JAMA Oncol. 1, 582–591 (2015).
Antonarakis, E. S. et al. AR-V7 and resistance to enzalutamide and abiraterone in prostate cancer. N. Engl. J. Med. 371, 1028–1038 (2014).
Armstrong, A. J. et al. Prospective multicenter study of circulating tumor cell ar-v7 and taxane versus hormonal treatment outcomes in metastatic castration-resistant prostate cancer. JCO Precis. Oncol. 4, PO.20.00200 (2020).
Maillet, D. et al. Improved androgen receptor splice variant 7 detections using a highly sensitive assay to predict resistance to abiraterone or enzalutamide in metastatic prostate cancer patients. Eur. Urol. Oncol. 4, 609–617 (2019).
Chung, J. S. et al. Circulating tumor cell-based molecular classifier for predicting resistance to abiraterone and enzalutamide in metastatic castration-resistant prostate cancer. Neoplasia 21, 802–809 (2019).
Armstrong, A. J. et al. Prospective multicenter validation of androgen receptor splice variant 7 and hormone therapy resistance in high-risk castration-resistant prostate cancer: the prophecy study. J. Clin. Oncol. 37, 1120–1129 (2019).
Wyatt, A. W. et al. Genomic alterations in cell-free DNA and enzalutamide resistance in castration-resistant prostate cancer. JAMA Oncol. 2, 1598–1606 (2016).
Azad, A. A. et al. Androgen receptor gene aberrations in circulating cell-free dna: biomarkers of therapeutic resistance in castration-resistant prostate cancer. Clin. Cancer Res. 21, 2315–2324 (2015).
Del Re, M. et al. Androgen receptor gain in circulating free DNA and splicing variant 7 in exosomes predict clinical outcome in CRPC patients treated with abiraterone and enzalutamide. Prostate Cancer Prostatic Dis. 24, 524–531 (2021).
Gurioli, G. et al. Plasma AR copy number changes and outcome to abiraterone and enzalutamide. Front Oncol. 10, 567809 (2020).
Du, M. et al. Plasma cell-free DNA-based predictors of response to abiraterone acetate/prednisone and prognostic factors in metastatic castration-resistant prostate cancer. Prostate Cancer Prostatic Dis. 23, 705–713 (2020).
van Dessel, L. F. et al. The genomic landscape of metastatic castration-resistant prostate cancers reveals multiple distinct genotypes with potential clinical impact. Nat. Commun. 10, 5251 (2019).
Quigley, D. A. et al. Genomic hallmarks and structural variation in metastatic prostate cancer. Cell 174, 758–769.e9 (2018).
Chen, W. S. et al. Genomic drivers of poor prognosis and enzalutamide resistance in metastatic castration-resistant prostate cancer. Eur. Urol. 76, 562–571 (2019).
Guan, X. et al. Copy number loss of 17q22 is associated with enzalutamide resistance and poor prognosis in metastatic castration-resistant prostate cancer. Clin. Cancer Res 26, 4616–4624 (2020).
Adam, G. et al. Machine learning approaches to drug response prediction: challenges and recent progress. NPJ Precis Oncol. 4, 19 (2020).
Chang, Y. et al. Cancer drug response profile scan (cdrscan): a deep learning model that predicts drug effectiveness from cancer genomic signature. Sci. Rep. 8, 8857 (2018).
McVeigh, T. P. et al. The impact of Oncotype DX testing on breast cancer management and chemotherapy prescribing patterns in a tertiary referral centre. Eur. J. Cancer 50, 2763–2770 (2014).
Slodkowska, E. A. & Ross, J. S. MammaPrint 70-gene signature: another milestone in personalized medical care for breast cancer patients. Expert Rev. Mol. Diagn. 9, 417–422 (2009).
Ali, M. & Aittokallio, T. Machine learning and feature selection for drug response prediction in precision oncology applications. Biophys. Rev. 11, 31–39 (2019).
Koras, K. et al. Feature selection strategies for drug sensitivity prediction. Sci. Rep. 10, 9377 (2020).
Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).
Linder, A., Hagberg Thulin, M., Damber, J. E. & Welen, K. Analysis of regulator of G-protein signalling 2 (RGS2) expression and function during prostate cancer progression. Sci. Rep. 8, 17259 (2018).
Najumudeen, A. K. et al. The amino acid transporter SLC7A5 is required for efficient growth of KRAS-mutant colorectal cancer. Nat. Genet 53, 16–26 (2021).
Vanharanta, S. et al. Loss of the multifunctional RNA-binding protein RBM47 as a source of selectable metastatic traits in breast cancer. Elife 3, e02734 (2014).
Qiu, J. et al. Identification of endonuclease domain-containing 1 as a novel tumor suppressor in prostate cancer. BMC Cancer 17, 360 (2017).
Aggarwal, R. et al. Prognosis associated with luminal and basal subtypes of metastatic prostate cancer. JAMA Oncol. 7, 1644–1652 (2021).
Loriot, Y. et al. Antitumour activity of abiraterone acetate against metastatic castration-resistant prostate cancer progressing after docetaxel and enzalutamide (MDV3100). Ann. Oncol. 24, 1807–1812 (2013).
Saad, F. et al. Efficacy outcomes by baseline prostate-specific antigen quartile in the AFFIRM trial. Eur. Urol. 67, 223–230 (2015).
Miller, K. et al. The phase 3 COU-AA-302 study of abiraterone acetate plus prednisone in men with chemotherapy-naive metastatic castration-resistant prostate cancer: stratified analysis based on pain, prostate-specific antigen, and gleason score. Eur. Urol. 74, 17–23 (2018).
Abida, W. et al. Genomic correlates of clinical outcome in advanced prostate cancer. Proc. Natl Acad. Sci. USA 116, 11428–11436 (2019).
Cattrini, C. et al. Optimal sequencing and predictive biomarkers in patients with advanced prostate cancer. Cancers (Basel) 13, 4522 (2021).
Ernst, R. et al. WGS in kanker diagnostiek - Betaalbaar beter. (Hartwig Medical Foundation, 2022).
Bins, S. et al. Implementation of a multicenter biobanking collaboration for next-generation sequencing-based biomarker discovery based on fresh frozen pretreatment tumor tissue biopsies. Oncologist 22, 33–40 (2017).
Armstrong, A. J. et al. Five-year survival prediction and safety outcomes with enzalutamide in men with chemotherapy-naive metastatic castration-resistant prostate cancer from the prevail trial. Eur. Urol. 78, 347–357 (2020).
Scher, H. I. et al. Trial design and objectives for castration-resistant prostate cancer: updated recommendations from the prostate cancer clinical trials working group 3. J. Clin. Oncol. 34, 1402–1418 (2016).
Shale, C. et al. Unscrambling cancer genomes via integrated analysis of structural variation and copy number. Cell Genomics. 2, 100112 (2022).
McLaren, W. et al. The ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
Nguyen, L., Martens, J. W. M., Van Hoeck, A. & Cuppen, E. Pan-cancer landscape of homologous recombination deficiency. Nat. Commun. 11, 5584 (2020).
Cortes-Ciriano, I. et al. Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing. Nat. Genet 52, 331–341 (2020).
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041.e21 (2017).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Frankish, A. et al. Gencode 2021. Nucleic Acids Res. 49, D916–D923 (2021).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Stephens, M. False discovery rates: a new deal. Biostatistics 18, 275–294 (2017).
Zou, H., Hastie, T. & Tibshirani, R. Sparse principal component analysis. J. Computational Graph. Stat. 15, 265–286 (2006).
Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417–441, 498–520 (1933).
Hyvarinen, A. & Oja, E. Independent component analysis: algorithms and applications. Neural Netw. 13, 411–430 (2000).
Mourragui, S., Loog, M., van de Wiel, M. A., Reinders, M. J. T. & Wessels, L. F. A. PRECISE: a domain adaptation approach to transfer predictors of drug response from pre-clinical models to tumors. Bioinformatics 35, i510–i519 (2019).
de Jong, A. C. et al. Predicting response to enzalutamide and abiraterone in metastatic prostate cancer using whole-omics machine learning models; ErasmusMC-MedicalOncology/ResponsePredictionAbiEnza: https://doi.org/10.5281/zenodo.7712610 (2023).
Acknowledgements
This research was financially supported with an unrestricted grant by Johnson & Johnson (ML; 212082PCR3014) and Astellas Pharma (ML; Lolkema/NL-72-RG-11). In addition, we would like to acknowledge the Erasmus MC Cancer Computational Biology Center (CCBC) and Hartwig Medical Foundation (HMF) for sharing their expertise and computational resources.
Author information
Contributions
Ad.J., A.D., and J.V.R. wrote the manuscript, which all authors critically reviewed. Ad.J. managed clinical data assessment, which was supervised by Rd.W. and M.L. A.D., supervised by Jd.R. and Jv.R., performed the bioinformatics analyses. M.L. is PI of the CPCT-02 study. F.F. provided the external validation cohort, for which M.S. performed the validation analyses.
Ethics declarations
Competing interests
RdW has speaker/advisory roles at Sanofi, Bayer, and Astellas, advisory roles at Orion, Hengrui, and Merck US, and received institutional research grants from Sanofi and Bayer. FF received personal fees as a consultant for Janssen, Myovant, Roivant, Novartis, Astellas, Foundation Medicine and Exact Sciences, as a member of the scientific advisory board of Bayer, SerImmune, Bristol Myers Squibb (BMS), Bluestar Genomics, Blue Earth Diagnostics, and Tempus, as an advisor with stock options at Artera, and as former co-founder with ownership interests of PFS Genomics (relationship terminated in April 2021). JdR is co-founder of Cyclomics BV. ML received advisory role/speaker fees from Incyte, Amgen, Janssen Cilag B.V., Bayer, Servier, Roche, Pfizer, Sanofi Aventis Netherlands BV, and Astellas, and has received institutional research funding from Sanofi, JnJ, Merck, and Astellas. All other authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Kuan-lin Huang, Ping Mu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
de Jong, A.C., Danyi, A., van Riet, J. et al. Predicting response to enzalutamide and abiraterone in metastatic prostate cancer using whole-omics machine learning. Nat Commun 14, 1968 (2023). https://doi.org/10.1038/s41467-023-37647-x
DOI: https://doi.org/10.1038/s41467-023-37647-x