Co-clinical FDG-PET radiomic signature in predicting response to neoadjuvant chemotherapy in triple-negative breast cancer

Purpose We sought to exploit the heterogeneity afforded by patient-derived tumor xenografts (PDX) to first, optimize and identify robust radiomic features to predict response to therapy in subtype-matched triple negative breast cancer (TNBC) PDX, and second, to implement PDX-optimized image features in a TNBC co-clinical study to predict response to therapy using machine learning (ML) algorithms.
Methods TNBC patients and subtype-matched PDX were recruited into a co-clinical FDG-PET imaging trial to predict response to therapy. One hundred thirty-one imaging features were extracted from PDX and human-segmented tumors. Robust image features were identified based on reproducibility, cross-correlation, and volume independence. A rank importance of predictors using ReliefF was used to identify predictive radiomic features in the preclinical PDX trial in conjunction with ML algorithms: classification and regression tree (CART), Naïve Bayes (NB), and support vector machines (SVM). The top four PDX-optimized image features, defined as radiomic signatures (RadSig), from each task were then used to predict or assess response to therapy. Performance of RadSig in predicting/assessing response was compared to SUVmean, SUVmax, and lean body mass-normalized SULpeak measures.
Results Sixty-four out of 131 preclinical imaging features were identified as robust. NB-RadSig performed highest in predicting and assessing response to therapy in the preclinical PDX trial. In the clinical study, the performance of SVM-RadSig and NB-RadSig to predict and assess response was practically identical and superior to SUVmean, SUVmax, and SULpeak measures.
Conclusions We optimized robust FDG-PET radiomic signatures (RadSig) to predict and assess response to therapy in the context of a co-clinical imaging trial.
Supplementary Information The online version contains supplementary material available at 10.1007/s00259-021-05489-8.


Introduction
Triple-negative breast cancer (TNBC) is a highly heterogeneous and aggressive cancer characterized by poor outcome and higher relapse rates compared to other subtypes of breast cancer. Pathological complete response (pCR) is often used as a critical endpoint in the treatment of TNBC following neoadjuvant chemotherapy (NAC) as it is often associated with favorable long-term outcome. Therefore, it is critical to identify patients who will respond to NAC therapy to avoid the use of ineffective treatments. Intratumoral heterogeneity is regarded as a major factor in tumor progression and resistance to NAC [1]. Towards that end, advanced quantitative imaging (QI) strategies, including extraction of image features, or radiomics, have been employed to characterize tumor heterogeneity and to predict/assess response to therapy [2,3].
We designed a co-clinical trial to assess the efficacy of docetaxel/carboplatin therapy in patients with TNBC and patient-derived tumor xenografts (PDX) generated from TNBC patient biopsies. Co-clinical trials are an emerging area of investigation in which a clinical trial is coupled with a corresponding (subtype-matched or patient-specific) preclinical trial to inform the corresponding clinical trial [4][5][6][7][8][9][10]. The emergence of PDXs as co-clinical platforms is largely motivated by the realization that established cell lines do not recapitulate the heterogeneity of human tumors and the diversity of tumor phenotypes [11]. Indeed, numerous investigations have demonstrated that PDX accurately reflect patients' tumors in terms of the histomorphology, gene expression profiles, and gene copy number alterations [12][13][14][15][16], as well as the ability to predict therapeutic response in patients, especially when a clinically relevant drug dosage is used [17][18][19]. To that end, the National Cancer Institute's (NCI) Patient-Derived Models Repository (https://pdmr.cancer.gov), EuroPDX (https://www.europdx.eu), academic institutions, and numerous commercial entities have launched wide-ranging PDX repositories to advance the use of PDX in precision medicine.
One of the objectives of the co-clinical trial, which is still underway, is to predict response to therapy using [18F]fluorodeoxyglucose (FDG) with positron emission tomography (PET). We previously identified six TNBC subtypes, including two basal-like (BL1 and BL2), an immunomodulatory (IM), a mesenchymal (M), a mesenchymal stem-like (MSL), and a luminal androgen receptor (LAR) subtype, through molecular signatures of TNBC subtypes [20]. The use of PDX in preclinical imaging offers numerous advantages in translational imaging research, chief among which is the retention of human tumor heterogeneity [12,16,21], which can be exploited to develop image metrics of response to therapy. Thus, the objective of this work was to first, optimize and identify robust radiomic features to predict response to therapy in subtype-matched TNBC PDX, and second, implement PDX-optimized image features in the TNBC co-clinical study to predict response to therapy using machine learning (ML) algorithms.
The scheme outlined in Fig. 1 highlights the paradigm we undertook in this effort. We used the co-clinical imaging trial to define, for the first time, parallels in radiomic features between preclinical and clinical imaging. To address the primary objective, we characterized the reproducibility, cross-correlation (auto-correlation), and volume dependency of FDG-PET radiomic features in PDX. Optimal radiomic features were then used in ML algorithms to define radiomic signatures (RadSig) of response to therapy in the preclinical PDX trial. With RadSig in hand, we predicted response to therapy in the preclinical arm. To address the secondary objective, we performed an interim analysis to implement radiomic signatures optimized in the preclinical PDX trial to predict response to therapy in the clinical arm. Our findings suggest that RadSig performed significantly better than SUV measures to predict (using baseline metrics) and assess (difference in image metrics) response to therapy in both preclinical and clinical arms.

Fig. 1 Overview of methodology in optimizing radiomic features in the co-clinical trial. TNBC PDX were generated from human tumor biopsies. Tumors were segmented following co-clinical imaging to extract radiomic features. Radiomic features were extracted per IBSI guidelines. Repeatability, cross-correlation, and volume dependency analyses were performed to identify the robust features. ReliefF and then ML were used to predict/assess the response to therapy in PDX and to identify radiomic signatures (RadSig). RadSig was implemented in the clinical trial to predict/assess response to therapy

Co-clinical protocol
The co-clinical design is outlined in the scheme of Fig. 2A and described below. Twenty newly diagnosed stage II or III TNBC patients were recruited into an ongoing co-clinical trial. Patient inclusion and exclusion criteria are detailed at ClinicalTrials.gov (ID NCT02124902). A secondary goal of the co-clinical trial was to assess the performance of FDG-PET in predicting/assessing response to therapy. TNBC PDX were generated as previously described [22] from a TNBC patient tumor repository. Briefly, 6- to 10-week-old female NOD scid gamma (NSG) mice were obtained from The Jackson Laboratory (https://www.jax.org). Mice were anesthetized with isoflurane, and an inverted Y-shaped incision was made along the thoracic-inguinal region to expose the 4th inguinal mammary fat pad. Two to four million tumor cells mixed with Matrigel in a volume of 30 µl were injected into the mammary fat pad. Following engraftment, tumor growth in PDX mice was monitored for recruitment into the preclinical trial. TNBC PDX subtypes were identified as described previously [23] based on molecular signature analysis of 93 TNBC PDX to identify TNBC subtypes including basal-like (BL1 and BL2), an immunomodulatory (IM), a mesenchymal (M), and a luminal androgen receptor (LAR) subtype.
Preclinical imaging Small animal PET/CT was performed on the Inveon microPET/CT scanner as described previously [23]. Briefly, 4 h prior to the imaging session, food was removed from cages while water was given ad libitum. Mice were anesthetized with 2-2.5% isoflurane by inhalation via an induction chamber. Anesthesia was maintained throughout the imaging session by delivering 1-1.5% isoflurane via a custom-designed nose cone. A heat lamp was used to maintain body temperature. TNBC PDX were injected with [18F]FDG (6.66-8.14 MBq) by tail vein immediately before a 0-60-min dynamic small animal PET acquisition. Images were reconstructed with a 3D OSEM algorithm with a ramp filter at 0.5 cutoff and a voxel size of ~0.8 mm isotropic.
Preclinical therapeutic studies TNBC PDX (N = 29) were imaged at baseline (BL) and 4 days (4D) following start of therapy (Fig. 2A). Docetaxel (20 mg/kg IP)/carboplatin (50 mg/kg IP) was administered following BL imaging and weekly for a period of 4 weeks. Tumor volumes were measured bi-weekly using the formula volume = 1/6 × L × W², where L and W represent the length and width of the tumor, respectively. All animal experiments were conducted in compliance with the Guidelines for the Care and Use of Research Animals established by Washington University's Animal Studies Committee.
Clinical imaging A simultaneous FDG-PET and MR imaging protocol was implemented on the Siemens Biograph mMR. Subjects were imaged at baseline (BL) prior to therapy and again between the first (C1) and second cycles of docetaxel/carboplatin, which was given for a total of 6 cycles (21 days per cycle). At each imaging time point, patients were fasted for ~4 h prior to injection of ~10 mCi of FDG. After an uptake period, patients were positioned prone on the PET/MR scanner. FDG-PET imaging was performed from 30 to 70 min after FDG administration, for a total acquisition of 40 min, to accommodate the simultaneous MR acquisition protocol. The default Dixon sequence was used for attenuation correction. Images were reconstructed to produce four 10-min frames.

Image analysis and extraction of radiomic features
Preclinical imaging Static 10-min PET/CT images obtained 50 min post-administration of FDG (representative image in Fig. 2B) were processed in two steps. In the first step, co-registered PET/CT images were analyzed using the Inveon Research Workplace (IRW) software (Siemens Healthcare). Volumes of interest (VOIs) were manually drawn on co-registered PET/CT images to include tumor(s). In the second step, VOIs and individual voxels were normalized to SUV in MATLAB.
Clinical imaging Tumor VOIs were manually drawn on 20-min static PET images obtained by averaging two 10-min frames 50-70 min post-administration of FDG (representative image in Fig. 2C). To ensure harmonization of preclinical and clinical pipelines, IRW was used to segment tumors on PET/MR images. Mean SUV (SUVmean) for the entire tumor was calculated as above. SUVmax was determined by identifying the maximum voxel activity in the tumor VOI. Peak SUV was normalized to lean body mass (SULpeak) based on positron emission tomography response criteria in solid tumors (PERCIST) [24].
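The SUV normalization and the VOI summary measures described above can be illustrated with a minimal sketch. It assumes the conventional body-weight definition, SUV = tissue activity concentration / (injected dose / body weight), applied to decay-corrected voxel values; this may differ in detail from the relation the authors implemented in MATLAB. The function names and the numbers in the example are hypothetical.

```python
def suv_map(activity_bq_per_ml, injected_dose_bq, body_weight_g):
    """Convert decay-corrected voxel activities (Bq/mL) to body-weight SUV (g/mL).

    Assumes the conventional definition: SUV = C_tissue / (dose / weight).
    """
    if injected_dose_bq <= 0 or body_weight_g <= 0:
        raise ValueError("injected dose and body weight must be positive")
    scale = body_weight_g / injected_dose_bq
    return [a * scale for a in activity_bq_per_ml]


def suv_mean_max(suv_values):
    """Return (SUVmean, SUVmax) over the voxels of a tumor VOI."""
    if not suv_values:
        raise ValueError("the VOI contains no voxels")
    return sum(suv_values) / len(suv_values), max(suv_values)


# Hypothetical example: three VOI voxels from a 25 g mouse injected with 7.4 MBq.
voi_voxels = [37000.0, 74000.0, 111000.0]
voi_suv = suv_map(voi_voxels, injected_dose_bq=7.4e6, body_weight_g=25.0)
voi_mean, voi_max = suv_mean_max(voi_suv)
```

SULpeak is not covered here because it additionally requires a lean body mass estimate and a peak-region definition per PERCIST.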

Extraction of imaging features
One hundred thirty-one imaging features were extracted from preclinical and clinical tumors. These include one hundred twenty radiomic features, tumor volume, metabolic tumor volume, and nine SUV metrics as tabulated in Supplemental Table S1. Radiomic features were determined per the image biomarker standardization initiative (IBSI) guidelines [25,26]. Equal-probability quantization of the raw data into gray levels (Ng) was implemented using the MATLAB histeq function. Resampling to isotropic voxel size in all three directions was applied to all higher order features. Thirty-seven first-order features were extracted directly from raw data. All higher order features were extracted after applying fixed quantization with Ng = 64 gray levels.
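The idea behind equal-probability quantization can be sketched as follows. This is a rank-based illustration in which each of the Ng gray levels receives approximately the same number of voxels; it is not a reimplementation of MATLAB's histeq, and the function name is hypothetical.

```python
def equal_probability_quantize(values, n_levels=64):
    """Assign each voxel a gray level in 1..n_levels with ~equal voxels per level.

    Voxels are ranked by intensity and the ranks are split into n_levels
    equally sized groups. Tied intensities share the level of the first
    voxel in the tie, so identical values never land in different levels.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    levels = [0] * n
    prev_value, prev_level = None, None
    for rank, idx in enumerate(order):
        value = values[idx]
        if rank > 0 and value == prev_value:
            level = prev_level
        else:
            level = rank * n_levels // n + 1
        levels[idx] = level
        prev_value, prev_level = value, level
    return levels
```

With 128 distinct voxel values and Ng = 64, every gray level receives exactly two voxels, which is the property that distinguishes this scheme from fixed-width binning.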

Robustness of radiomic features
We evaluated the robustness of radiomic features in terms of reproducibility (test-retest), cross-correlation, and the dependency on tumor volume. Robust radiomic features were then used as predictors of response to therapy.
Test-retest A preclinical test-retest protocol was implemented to optimize the reproducibility of radiomic features. PDX (N = 40) were imaged on consecutive days (day 1 and day 2) in identical conditions.

Cross-correlation
The cross-correlation between features was determined using Spearman correlation. A threshold Spearman correlation of ρ ≥ 0.9 and significance value P < 0.001 were chosen to signify high correlation between features.
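A minimal sketch of this redundancy screen is given below. It computes the Spearman correlation from average ranks and keeps a feature only if its correlation with every previously kept feature is below the threshold. The greedy keep-first rule is an assumption (the text does not state which member of a correlated pair was retained), the significance test (P < 0.001) is omitted, and all names are hypothetical.

```python
import math


def average_ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant feature carries no rank information
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    """Spearman rho = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(a), average_ranks(b))


def drop_correlated(features, threshold=0.9):
    """features: dict of name -> values per tumor. Returns the kept names."""
    kept = []
    for name, values in features.items():
        if all(spearman(values, features[k]) < threshold for k in kept):
            kept.append(name)
    return kept
```

Because the result of a greedy screen depends on feature order, a practical pipeline would typically fix that order explicitly, for example by ranking features on reproducibility first.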
Volume-dependent radiomic features Radiomic features were regressed against their corresponding tumor volumes. Linear or nonlinear functional forms were used to fit all significant volume-dependent features.

Prediction and assessment of response to therapy
Prediction vs. assessment of response to therapy We make a distinction between predicting and assessing response to therapy. In predicting response to therapy, BL imaging features were used to predict response to therapy in either the preclinical or the clinical arm. In assessing response to therapy, the change (Δ) in image feature between on-treatment (4 days post baseline imaging in preclinical and post-C1 in clinical) and BL was used to predict response to therapy in the preclinical or the clinical arms.
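The distinction can be made concrete with a short sketch: prediction uses the BL feature vector as is, whereas assessment uses the per-feature difference between the on-treatment and BL scans. The function name, feature names, and values below are hypothetical.

```python
def delta_features(baseline, on_treatment):
    """Per-feature change (on-treatment minus baseline) for one tumor.

    Both arguments are dicts of feature name -> value and must contain
    the same feature names.
    """
    mismatch = set(baseline) ^ set(on_treatment)
    if mismatch:
        raise KeyError(f"features missing from one scan: {sorted(mismatch)}")
    return {name: on_treatment[name] - baseline[name] for name in baseline}


# Hypothetical example for a single tumor.
bl_scan = {"SUVmean": 4.0, "entropy": 2.5}
on_treatment_scan = {"SUVmean": 3.0, "entropy": 2.0}
prediction_input = bl_scan                                   # predicting response
assessment_input = delta_features(bl_scan, on_treatment_scan)  # assessing response
```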

Classification of response to therapy
In preclinical studies, the change in endpoint caliper volume from the start of treatment was used as a surrogate of response to therapy: response corresponded to a > 20% decrease in volume, partial response to a change in volume of ≤ 20% in either direction, and no response to a > 20% increase in volume. Baseline radiomic features and the change in radiomic features between the 4D post-treatment and baseline scans were used as the predictive criteria for ML algorithms. In clinical studies, pCR was used to determine response to therapy. pCR was defined as no histological evidence of invasive tumor cells in the surgical breast specimen and sentinel or axillary lymph nodes.
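The preclinical response categories can be expressed as a small labeling function. This is a sketch of the thresholds stated above; the function name and label strings are hypothetical.

```python
def classify_response(start_volume, endpoint_volume):
    """Label a PDX tumor from its percent change in caliper volume.

    > 20% decrease  -> "response"
    within +/- 20%  -> "partial response"
    > 20% increase  -> "no response"
    """
    if start_volume <= 0:
        raise ValueError("start volume must be positive")
    pct_change = 100.0 * (endpoint_volume - start_volume) / start_volume
    if pct_change < -20.0:
        return "response"
    if pct_change > 20.0:
        return "no response"
    return "partial response"
```

A tumor that shrinks by exactly 20% falls in the partial-response category under this reading of the thresholds.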

Feature selection
In preclinical studies, the relief-based algorithm (RBA) [27] was used to select a subset of features as inputs to the ML algorithms. A relevance threshold (τ = 0.05) [28] was used to select the most relevant weighted features in order to facilitate inexpensive modeling, reduce overfitting, and make the task tractable for the ML algorithms. These optimal features were used to predict response (using BL features) or assess response (using the difference between on-treatment and BL features) to therapy.
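The weighting principle behind relief-based selection can be sketched as follows: a feature gains weight when it differs between an instance and its nearest neighbors of the other class (misses) and loses weight when it differs from its nearest neighbors of the same class (hits). The sketch is a simplified two-class variant with range-scaled Manhattan distances; it is not the ReliefF implementation used in the study, which also handles multiple classes, and all names are hypothetical.

```python
def relief_weights(X, y, k=1):
    """Simplified two-class Relief-style feature weights.

    X: list of feature vectors, y: list of class labels, k: neighbors per class.
    """
    n, p = len(X), len(X[0])
    # Scale each feature difference by the feature's range so it lies in [0, 1].
    spans = []
    for f in range(p):
        column = [row[f] for row in X]
        span = max(column) - min(column)
        spans.append(span if span > 0 else 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(p))

    weights = [0.0] * p
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: distance(X[i], X[j]))
        hits = [j for j in others if y[j] == y[i]][:k]
        misses = [j for j in others if y[j] != y[i]][:k]
        if not hits or not misses:
            continue  # cannot contrast classes for this instance
        for f in range(p):
            hit_term = sum(diff(f, X[i], X[j]) for j in hits) / len(hits)
            miss_term = sum(diff(f, X[i], X[j]) for j in misses) / len(misses)
            weights[f] += (miss_term - hit_term) / n
    return weights


def select_features(weights, tau=0.05):
    """Indices of features whose weight exceeds the relevance threshold tau."""
    return [f for f, w in enumerate(weights) if w > tau]
```

On a toy dataset where the first feature separates the classes and the second is noise, the first feature receives a clearly positive weight and is the only one to pass τ = 0.05.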
Machine learning for outcome prediction The ML algorithms used in this study include CART [29], SVM [30], and NB [31]. In implementing CART, the Gini index was used as the splitting criterion at each binary partition. In implementing SVM, a radial basis function (RBF) kernel was used to define the decision boundary between the classes, and L2-norm regularization of the objective function was used to mitigate overfitting. CART, SVM, and NB work well with datasets as small as N = 20 [32]. Ten-fold cross-validation was used to avoid overfitting the ML model [33].
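As an illustration of one of these classifiers combined with ten-fold cross-validation, the sketch below implements a Gaussian Naïve Bayes model in plain Python. The Gaussian likelihood, the small variance floor, and the interleaved (unshuffled, unstratified) fold assignment are assumptions made for brevity and are not taken from the study; all names are hypothetical.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate a log prior and per-feature (mean, variance) for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for f in range(len(X[0])):
            column = [r[f] for r in rows]
            mean = sum(column) / len(column)
            # Small floor keeps the variance positive for constant features.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (math.log(len(rows) / len(y)), stats)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, stats) in model.items():
        score = log_prior
        for v, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_val_accuracy(X, y, n_folds=10):
    """Accuracy pooled over folds; sample i is held out in fold i % n_folds."""
    n = len(y)
    correct = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        if not test_idx:
            continue
        model = fit_gaussian_nb([X[i] for i in train_idx],
                                [y[i] for i in train_idx])
        correct += sum(predict_gaussian_nb(model, X[i]) == y[i] for i in test_idx)
    return correct / n
```

With roughly 20 to 30 samples, each of the ten folds holds out only two or three tumors, so the pooled accuracy moves in steps of a few percent per misclassified case.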

Statistical analyses
Robustness of features Lin's concordance correlation coefficient (LCC) [34] was used to assess reproducibility using Stata version 12.1. LCC ≥ 0.7 was considered as a threshold of reproducible radiomic feature [35,36]. As indicated above, cross-correlation between features was evaluated using the Spearman correlation ρ ≥ 0.9 at significance value P < 0.001. To display clusters of correlations, hierarchical clustering of the Spearman correlation heatmap was performed. In evaluating volume dependency of features, the Akaike information criterion (AIC) and Bayesian information criterion (BIC) were calculated for each functional form, and the appropriate model was selected based on the minimum value of AIC and BIC. The Spearman correlation (ρ) was used to determine the correlation between each feature and tumor volume.
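For reference, Lin's concordance correlation coefficient between test and retest measurements x and y is 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²), which penalizes both poor correlation and systematic shifts between sessions. A minimal sketch follows; it uses population (1/n) moments as in Lin's definition, whereas the study computed LCC in Stata, and the function names are hypothetical.

```python
def lin_ccc(test, retest):
    """Lin's concordance correlation coefficient for paired measurements."""
    n = len(test)
    if n != len(retest) or n < 2:
        raise ValueError("need two paired samples of equal length (n >= 2)")
    mx, my = sum(test) / n, sum(retest) / n
    sxx = sum((x - mx) ** 2 for x in test) / n
    syy = sum((v - my) ** 2 for v in retest) / n
    sxy = sum((x - mx) * (v - my) for x, v in zip(test, retest)) / n
    denom = sxx + syy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for identical constant samples")
    return 2 * sxy / denom


def is_reproducible(test, retest, threshold=0.7):
    """Apply the LCC >= 0.7 reproducibility criterion used in the text."""
    return lin_ccc(test, retest) >= threshold
```

For example, a feature measured as 1, 2, 3, 4 on day 1 and 2, 3, 4, 5 on day 2 is perfectly correlated but shifted by one unit, giving LCC = 5/7 ≈ 0.71, which only just meets the threshold.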
Performance metrics of response to therapy prediction Common performance metrics including accuracy, F-score, sensitivity, specificity, precision, and negative predictive value (NPV) were used to assess the performance of response prediction [20]. The performance of the radiomic features was additionally compared with SUVmean, SUVmax, and SULpeak based on PERCIST [24].
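These metrics all derive from the four cells of a binary confusion matrix. The sketch below uses their standard definitions, with the F-score taken as F1 (an assumption; the text does not specify the weighting); the function name and label coding are hypothetical.

```python
def performance_metrics(y_true, y_pred, positive=1):
    """Standard binary classification metrics from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f_score": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

Reporting all six measures matters in small cohorts, where accuracy alone can hide an imbalance between sensitivity and specificity.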

Reproducibility of preclinical radiomic features
Test-retest was performed to assess the reproducibility of radiomic features using LCC as a measure of reproducibility. Ninety-four out of 129 radiomic features (72.9%) were identified as reproducible with LCC ≥ 0.7. The frequency of correlations along with the cumulative percent is displayed in Fig. 3A. Approximately 22% of features were highly reproducible with LCC ≥ 0.9. The reproducibility by class of features is depicted in Fig. 3B.

Cross-correlation between features (preclinical and clinical)
We ascertained the cross-correlation between features using the Spearman correlation (ρ). Removing highly correlated features (ρ ≥ 0.9) reduced the set from 129 to 94 features. Hierarchical clustering of the Spearman correlation heatmap is shown in Fig. 4. Twenty-one clusters were identified in the preclinical heatmap (Fig. 4A), and 21 clusters were likewise identified in the clinical cross-correlation heatmap (Fig. 4B). Membership of features to clusters is available in Supplemental Table S2. The distribution of Spearman correlations is available in Fig. 4C and D for preclinical and clinical cross-correlations, respectively.

Volume-dependent radiomic features (preclinical and clinical)
In total, 10 radiomic features were highly correlated to volume (ρ > 0.9; P < 0.001). The functional forms of the volume dependency and the corresponding goodness-of-fit measures for preclinical and clinical images are shown in Fig. 5; they were similar for preclinical and clinical features. Supplemental Table S3 summarizes the statistical analyses for the correlations.

Prediction and assessment of response to therapy
At the intersection of robustness analyses, 62 of the 129 (48.06%) features were found to be optimal and were passed to ReliefF feature selection followed by ML. ReliefF rank importance identified the 15 top-performing features for prediction (based on BL features) and assessment (based on 4D-BL features) of response to therapy (Fig. 6B and C, respectively). The rank importance of radiomic features is given in Supplemental Table S4.

Preclinical PDX studies
The accuracy of ML in predicting/assessing response to therapy as a function of the number of radiomic features is depicted in Fig. 6D and E for BL and 4D-BL, respectively. Prediction accuracy saturated at 4 radiomic features (Fig. 6D), with NB exhibiting the highest accuracy at 86.21%, followed by SVM and CART. In contrast, the accuracy of assessing response to therapy (4D-BL) increased with increasing number of radiomic features; the accuracy of NB was 86.9%, followed by SVM and CART (Fig. 6E). We opted to compare performance between prediction and assessment (i.e., BL vs. 4D-BL) using the smallest number of robust features. For this reason, Table 1 tabulates the performance of ML algorithms to predict/assess response prior to and following optimization for robust features using only the top 4 radiomic features for each classification (prediction vs. assessment of response). The set of 4 radiomic features from each task (prediction and assessment of response) makes up the radiomic signature (RadSig). As tabulated in Table 1, RadSig performs as well as, or marginally better than, nonoptimized features (all features) in predicting response. The performance of prediction/assessment of response to therapy stratified by TNBC subtype is tabulated in Supplemental Table S5 and highlights differences in prediction by TNBC subtype.
The performance of RadSig in comparison to SUVmean, SUVmax, and SULpeak for the top two performing ML algorithms (NB and SVM) is summarized in Fig. 7. NB performed marginally better than SVM in predicting/assessing response to therapy (Fig. 7A) in the preclinical PDX trial. The percent increase in predicting/assessing response to therapy relative to SUVmean, SUVmax, and SULpeak is depicted in Fig. 7B for NB. NB-RadSig improved prediction of response by over 60% in all performance measures. In assessing response to therapy, RadSig performed better than SUVmean in most performance criteria and marginally better than SULpeak and SUVmax (Fig. 7B). Thus, RadSig has greater impact in predicting response to therapy than in assessing response to therapy. Full performance data are available in Supplemental Table S6.
We then performed an interim analysis of the ongoing clinical trial to assess the feasibility of implementing PDX-optimized RadSig to predict/assess response to therapy using ML. Table 2 summarizes patient characteristics, pathological response, SUV metrics at BL, and percent change in SUV metrics between on-treatment (post C1) and BL for the interim analyses (Supplemental Table S7 contains SUV values at baseline and on-treatment). Of the twenty patients, ten exhibited pCR; however, all patients exhibited a reduction in SUV. Average percent (± 1SD) reduction in the non-pCR group was −46 […] 83, −60.32 ± 16.47, and −66.16 ± 13.74 for SUVmean, SULpeak, and SUVmax, respectively, in the pCR group. Figure 7 also depicts the performance of the ML algorithms in predicting and assessing response to therapy in the clinical arm (Fig. 7C). The performance of SVM and NB with RadSig as a predictor was nearly identical, although overall SVM performed better than NB when using SUV metrics as predictors (Supplemental Table S6). SVM-RadSig exhibited higher prediction rates of response to therapy relative to SUVmax, SUVmean, and SULpeak in all performance measures (20-40% higher), as well as in assessing response to therapy (15-75% higher) (Fig. 7D). Overall, RadSig performed better than SUV metrics in predicting and assessing response to therapy.

Fig. 4 Cross-correlation between radiomic features. Hierarchical clustering of the Spearman cross-correlation heatmap for A preclinical and B clinical. The dendrogram shows the clustering, and each color represents a different cluster. The frequency and cumulative sum at each Spearman correlation (ρ) is displayed in C for preclinical and D for clinical

Discussion
The emergence of co-clinical models is largely motivated by the realization that established cell lines do not recapitulate the heterogeneity of human tumors and the diversity of tumor phenotypes [11] and that better oncology models are needed to support high-impact translational cancer research [12,16,21]. An underlying premise in the co-clinical study design is that the heterogeneity of the human tumor is retained in PDX. Indeed, tumor genomic and pathological investigations have confirmed that PDX recapitulate the heterogeneity of human tumors [12][13][14][15][16] and that these can be used to better inform cancer biology, therapeutic design [17][18][19], and therefore by extension imaging studies, albeit with some limitations [21].
With that in mind, in this work, we exploited the heterogeneity of TNBC PDX subtypes to (1) identify robust radiomic features in preclinical TNBC PDX; (2) optimize RadSig-ML algorithms to predict response to therapy in PDX; and (3) implement PDX-optimized RadSig to predict/assess response to therapy in the clinical trial.
To our knowledge, this study represents the first such effort to optimize radiomic features in preclinical PET imaging to predict/assess response to therapy in TNBC PDX. We recently characterized the dependency of preclinical MR radiomic features on tumor volume [37]. In this work, we confirmed the dependency of preclinical PET radiomic features on tumor volume, with strikingly similar clinical parallels. This is particularly relevant in longitudinal studies during which tumor volumes will change with the course of the disease or following therapy. Ideally, volume-independent features should be used so as not to bias image features longitudinally. We further evaluated the cross-correlation of preclinical and clinical radiomic features with the goal of reducing the dimensionality of features. Finally, we evaluated the repeatability of radiomic features in preclinical PET imaging to identify robust features for inclusion in ML-based prediction of response to therapy. At the thresholds defined herein to screen for volume dependency, repeatability, and cross-correlation, we identified 62 optimal features to predict/assess response to therapy.
RBA [27] was used to rank image features using three ML algorithms as to their relevance in predicting/assessing response to therapy. Our data suggest that overall SVM performed better than NB and CART in predicting response to therapy. We used the top four ML-RBA-optimized radiomic features from each task (prediction vs. assessment), referred to as the radiomic signature (RadSig), to either predict or assess response to therapy. In the preclinical arm, RadSig performed significantly better in predicting response to therapy relative to standard SUV measures. Importantly, RadSig also performed better in predicting and assessing response to therapy in the clinical arm. Antunovic et al. [38] reported the utility of FDG-PET radiomic features to assess response to therapy using four different models in 79 patients with heterogeneous breast cancer subtypes. The reported area under the curve of an ROC analysis ranged from 0.70 to 0.73. Li et al. [39] recently assessed the utility of both PET and CT radiomic features to predict response to therapy in a retrospective study that included 100 heterogeneous breast cancer patients. The PET/CT radiomic predictors achieved a prediction accuracy of 87% on the training split set and 77% on the independent validation set.

Fig. 6 ML-based selection of radiomic features. A At the thresholds defined within to screen for volume dependency, repeatability, and cross-correlation, we identified 62 optimal features. Implementation of relief-based algorithm (RBA) to identify a subset of features as inputs to ML-based prediction (B) and assessment (C) of response to therapy. Accuracy of ML algorithms CART, SVM, and Naïve Bayes to predict (D) or assess (E) response to therapy as a function of number of radiomic features in PDX
In this small, albeit homogeneous, dataset of TNBC patients where PDX-optimized radiomic features were implemented in the clinical imaging arm, we observed accuracies of 72% and 71% when predicting and assessing response, respectively, which exceeded those obtained with SUV metrics. We were unable to perform a validation test on an independent dataset. However, the primary objective was to compare the performance of predictive metrics in the training phase relative to standard SUV measures. Prediction of response to therapy can be further enhanced through integration of MR image features [40]. At this stage, we did not include MR radiomic features because of the increased dimensionality they would add given the limited number of patients. Other technologies that could be integrated with imaging to enhance therapeutic prediction include liquid biopsies such as circulating tumor DNA (ctDNA) analyses [41] and molecular/genomic features of tumors [42], both of which are active areas of investigation. Finally, numerous recent studies have documented that pCR rates vary with breast cancer molecular subtypes. TNBC and HER2-positive molecular subtypes have been shown to have higher pCR rates after NAC [43]. Importantly, several studies have demonstrated an association between imaging features and molecular phenotypes, risk of recurrence, and prognosis [44][45][46]. Interestingly, our PDX studies similarly suggest that response to therapy (and prediction thereof) is a function of the TNBC subtype; however, further studies are needed to support this hypothesis and the utility of radiomic features in classifying TNBC subtypes. With that in mind, one of the most critical aspects in practical implementation of radiomics is a consensus on the most effective features and their standardization.

Conclusions
We identified robust FDG-PET radiomic features in terms of volume dependency, reproducibility, and cross-correlation to predict and assess response to therapy in a preclinical PDX trial. The number of radiomic features to maximize accuracy was further optimized to yield ML radiomic signatures (RadSig) of response to therapy. RadSig improved prediction of response to therapy in the preclinical arm. Given that PDX recapitulate the heterogeneity of human tumors, we then assessed the feasibility of implementing PDX-optimized RadSig in an interim analysis of the clinical trial to predict response to therapy. The performance of SVM-RadSig in predicting/assessing response to therapy was superior to SUVmax, SUVmean, and SULpeak metrics in the clinical setting; however, given the small sample size, additional studies are warranted to further validate the utility of PDX-optimized features, such as RadSig, and potentially integrate with multi-scale features to enhance prediction/assessment of response to therapy.

Fig. 7 Performance of ML algorithms. A Performance of RadSig in predicting/assessing response to therapy in the preclinical PDX trial with NB and SVM. B Percent improvement in NB-RadSig prediction/assessment of response relative to SUVmax, SUVmean, and SULpeak. C Similar to A but for the clinical investigation. D Similar to B but using SVM