Quantitative classification and radiomics of [18F]FDG-PET/CT in indeterminate thyroid nodules

Purpose To evaluate whether quantitative [18F]FDG-PET/CT assessment, including radiomic analysis of [18F]FDG-positive thyroid nodules, improved the preoperative differentiation of indeterminate thyroid nodules of non-Hürthle cell and Hürthle cell cytology.
Methods Prospectively included patients with a Bethesda III or IV thyroid nodule underwent [18F]FDG-PET/CT imaging. Receiver operating characteristic (ROC) curve analysis was performed for standardised uptake values (SUV) and SUV-ratios, including assessment of SUV cut-offs at which a malignant/borderline neoplasm was reliably ruled out (≥ 95% sensitivity). [18F]FDG-positive scans were included in radiomic analysis. After segmentation at 50% of SUVpeak, 107 radiomic features were extracted from [18F]FDG-PET and low-dose CT images. Elastic net regression classifiers were trained in a 20-times repeated random split. Dimensionality reduction was incorporated into the splits. Predictive performance of radiomics was presented as mean area under the ROC curve (AUC) across the test sets.
Results Of 123 included patients, 84 (68%) index nodules were visually [18F]FDG-positive. The malignant/borderline rate was 27% (33/123). SUV metrics showed AUCs ranging from 0.705 (95% CI, 0.601–0.810) to 0.729 (0.633–0.824), 0.708 (0.580–0.835) to 0.757 (0.650–0.864), and 0.533 (0.320–0.747) to 0.700 (0.502–0.898) in all (n = 123), non-Hürthle (n = 94), and Hürthle cell (n = 29) nodules, respectively. At SUVmax, SUVpeak, SUVmax-ratio, and SUVpeak-ratio cut-offs of 2.1 g/mL, 1.6 g/mL, 1.2, and 0.9, respectively, sensitivity of [18F]FDG-PET/CT was 95.8% (95% CI, 78.9–99.9%) in non-Hürthle cell nodules. In Hürthle cell nodules, cut-offs of 5.2 g/mL, 4.7 g/mL, 3.4, and 2.8, respectively, resulted in 100% sensitivity (95% CI, 66.4–100%). Radiomic analysis of 84 (68%) [18F]FDG-positive nodules showed a mean test set AUC of 0.445 (95% CI, 0.290–0.600) for the PET model.
Conclusion Quantitative [18F]FDG-PET/CT assessment ruled out malignancy in indeterminate thyroid nodules. Distinctive, higher SUV cut-offs should be applied in Hürthle cell nodules to optimize rule-out ability. Radiomic analysis did not contribute to the additional differentiation of [18F]FDG-positive nodules.
Trial registration number This trial is registered with ClinicalTrials.gov: NCT02208544 (5 August 2014), https://clinicaltrials.gov/ct2/show/NCT02208544.
Supplementary Information The online version contains supplementary material available at 10.1007/s00259-022-05712-0.

Part of this limited specificity is explained by the proportion of HCN/SHCN cytology, which varied from 21 to 52% in previous PET/CT studies including 23% in our trial [4][5][6]. Hürthle cell neoplasms, defined as tumours composed of > 75% Hürthle cells, constitute an extraordinary subgroup: owing to the abundance of mitochondria in their oxyphilic follicular-derived cells, nearly all of these neoplasms are strongly [18F]FDG-positive [4,13–15]. As such, visual [18F]FDG-PET/CT assessment cannot differentiate between benign and malignant Hürthle cell nodules. We previously advocated that a visual [18F]FDG-PET/CT-driven diagnostic workup should be limited to non-Hürthle cell Bethesda III/IV nodules to optimize therapeutic yield [4].
The risk of malignancy in nodules with HCN/SHCN cytology appears lower than in FN/SFN cytology, but Hürthle cell carcinomas typically show more aggressive behaviour and less favourable prognosis than their non-oncocytic follicular counterparts [13,15]. This underlines the currently unmet need for an accurate diagnostic workup for this subgroup.
Several studies have reported the quantitative assessment of [18F]FDG-PET/CT images using the standardised uptake value (SUV, g/mL) of the indeterminate thyroid nodule, most frequently reported as the maximum SUV (SUVmax) [8,10–12,16]. A higher SUVmax was generally reported in thyroid malignancies than in benign lesions [5–7,9]. Threshold analysis using a SUVmax cut-off of 5 g/mL resulted in 42-80% sensitivity and 41-91% specificity to detect malignancy [6,8,10]. To the best of our knowledge, there is no evidence on the quantitative assessment of [18F]FDG-PET/CT in Hürthle cell nodules.
In addition to the traditional quantitative PET features such as the SUVmax, PET/CT images harbour an abundance of information inside the myriad of voxels that could be identified using radiomics [17]. In radiomics, large numbers of quantitative features are extracted from medical images, aiming to find stable and clinically relevant image-derived biomarkers that may provide new insights into tumour biology and guide patient management [18]. After a number of studies suggested that radiomic analysis could contribute to the differentiation of [18F]FDG-positive thyroid incidentalomas, one study recently also indicated its potential in the diagnosis of indeterminate nodules [19][20][21][22][23][24].
In the current study, we sought to optimize the [18F]FDG-PET/CT-driven differentiation of indeterminate thyroid nodules through quantitative [18F]FDG-PET/CT assessment including radiomic analysis, with particular attention to the separate assessment of non-Hürthle and Hürthle cell nodules. We aimed to rule out malignancy and decrease the false-positive rate as compared to visual [18F]FDG-PET/CT assessment. We ultimately aimed to further prevent futile surgeries for benign, [18F]FDG-positive indeterminate nodules.

Study design and patient selection
All patients who participated in a randomised controlled multicentre trial (ClinicalTrials.gov NCT02208544) on the efficacy of [18F]FDG-PET/CT in cytologically indeterminate thyroid nodules (EfFECTS) were assessed for eligibility for the current study. The EfFECTS trial was conducted in eight academic and seven community hospitals in the Netherlands with a high level of experience in the diagnosis and treatment of thyroid nodules and differentiated thyroid carcinoma (Supplementary data). [18F]FDG-PET/CT was performed in 132 patients with a solitary nodule or dominant nodule in multinodular disease from which indeterminate cytology was obtained, defined as at least two Bethesda III or one Bethesda IV cytology result (confirmed on central review). Based on cytology, clinical characteristics, and ultrasound features, diagnostic thyroid surgery was scheduled in all patients, in accordance with current international guidelines [25]. Further inclusion and exclusion criteria of the EfFECTS trial, and its comprehensive study procedures were previously reported [4]. Patients from the original study were only eligible for inclusion in the current study if their [18F]FDG-PET/CT scan was acquired with strict adherence to the EANM guidelines, including a patient fasting time of at least 4 h and an acquisition time between 55 and 75 min (Fig. 1) [26]. Written informed consent was obtained from all participants prior to any study activity. The trial was approved by the Medical Research Ethics Committee on Research Involving Human Subjects region Arnhem-Nijmegen, Nijmegen, the Netherlands. The funder of the trial had no influence on the design or conduct of the trial and was not involved in the collection or analysis of the data, or in the writing of the manuscript.

Image acquisition and reconstruction
During the EfFECTS trial, all participants underwent an [18F]FDG-PET/CT covering skull-base to upper thorax. These scans were acquired by 20 different scanners at 12 EARL-accredited study sites (Supplementary table 1) using a standard acquisition and reconstruction protocol in accordance with European Association of Nuclear Medicine (EANM) guidelines [26]. Patients were advised to fast for at least 6 h. Serum glucose levels were between 4 and 11 mmol/L. PET-acquisition was scheduled 60 (55-75) minutes after intravenous bolus administration of [18F]FDG. The administered activity was dependent on body weight, scan speed, bed overlap, and scanner sensitivity, equivalent to 3.45 MBq/kg (4 min/bed, < 25% bed overlap). Low-dose, non-contrast-enhanced CT (ldCT) scans were acquired for attenuation correction of PET images. Additional details on patient preparation, data acquisition, image reconstruction, and image processing are reported in Supplementary table 2.

[18F]FDG-PET/CT quantitative analysis
Quantitative image analyses were performed using OsiriX Lite DICOM-viewer (Pixmeo SARL, Bernex, Switzerland). SUV-computation was validated after each mandatory software version update. All scans were centrally assessed by two independent, experienced nuclear medicine physicians (DV, LF). They were blinded to patient allocation and all clinical and cytological data except for the ultrasonographic size and location of the index nodule, to ensure its correct identification. For the visual assessment, any focal [18F]FDG-uptake within the thyroid that was visually higher than the physiological background [18F]FDG-uptake of the surrounding normal thyroid tissue and that corresponded to the index nodule in size and location, was considered positive. The SUVmax and peak SUV (SUVpeak, defined as the maximum average SUV within a 1 cm³ spherical volume) of the index nodule were semi-automatically measured (Fig. 2) [27]. Body weight corrected values were used. The SUVmax-ratio and SUVpeak-ratio were respectively calculated by dividing the SUVmax and SUVpeak of the nodule by the background SUVmax of normal thyroid tissue in the contralateral lobe. [18F]FDG-positive foci in the thyroid that did not correspond to the index nodule in size and location (i.e., thyroid incidentalomas) were not analysed.
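The ratio definitions above can be illustrated with a minimal Python sketch. The function name `suv_ratio` is hypothetical and not part of the study software; the input values are the example values given in the Fig. 2 caption of this paper.

```python
def suv_ratio(nodule_suv: float, background_suv_max: float) -> float:
    """Divide a nodule SUV (max or peak) by the background SUVmax of normal thyroid tissue."""
    if background_suv_max <= 0:
        raise ValueError("background SUVmax must be positive")
    return nodule_suv / background_suv_max


# Example values from the Fig. 2 caption (g/mL).
suv_max, suv_peak, background_suv_max = 9.7, 7.0, 1.6

suv_max_ratio = round(suv_ratio(suv_max, background_suv_max), 1)
suv_peak_ratio = round(suv_ratio(suv_peak, background_suv_max), 1)
print(suv_max_ratio, suv_peak_ratio)
```

Both ratios share the same denominator (the background SUVmax), so the SUVpeak-ratio can fall below 1 even for a nodule that is visually [18F]FDG-positive.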

Radiomic analysis
All visually [18F]FDG-positive nodules, defined as index nodules with focal [18F]FDG-uptake that was visually higher than the background [18F]FDG-uptake in the surrounding normal thyroid tissue, were included in the radiomic analysis.

Volume of interest definition
Volumes of interest (VOI) were delineated semi-automatically around visually [18F]FDG-positive [...] software implemented in Python (version 3.6.10; Python Software Foundation, Wilmington, Delaware) [28]. VOIs were delineated on the [18F]FDG-PET scans using an isocontour that applies a threshold of 50% of the SUVpeak, corrected for local background activity (Fig. 2) [29]. Boxing was applied to exclude [18F]FDG-positive tissue surrounding the index nodule and ldCT images were used as a visual reference. VOIs delineated on the PET images were resampled with a nearest neighbour algorithm to derive the ldCT VOIs. Potential mismatch was evaluated visually: no corrections were required.
Fig. 2 [...] demonstrated a SUVmax of 9.7 g/mL and SUVpeak of 7.0 g/mL of the index nodule, and a SUVmax of 1.6 g/mL in the background of surrounding normal thyroid tissue. Consequently, the SUVmax-ratio and SUVpeak-ratio were 6.1 (9.7/1.6) and 4.4 (7.0/1.6), respectively. For radiomic analysis, VOIs were delineated on the [18F]FDG-PET scans using an isocontour that applies a threshold of 50% of the SUVpeak, corrected for local background (c, d) [29]. Boxing was applied to exclude [18F]FDG-positive tissue surrounding the index nodule and ldCT images were used as a visual reference (e, f). VOIs delineated on the PET images were resampled with a nearest neighbour algorithm to derive the ldCT VOIs
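The background-corrected 50% isocontour used for VOI delineation can be sketched in a few lines of Python. This is an illustration only, not the study software: it assumes the threshold is placed halfway between the local background and the SUVpeak, and the function names, the toy 2D slice, and the local background value of 1.0 g/mL are all hypothetical.

```python
def isocontour_threshold(suv_peak, local_background, fraction=0.5):
    """Assumed form of a background-corrected threshold: background + fraction * (peak - background)."""
    return local_background + fraction * (suv_peak - local_background)


def threshold_mask(image, threshold):
    """Binary mask of voxels at or above the threshold (1 = inside the VOI)."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]


# Hypothetical 4 x 4 slice of SUVs (g/mL); not patient data.
toy_slice = [
    [1.0, 1.2, 1.1, 1.0],
    [1.1, 4.5, 6.8, 1.3],
    [1.0, 5.2, 7.0, 3.9],
    [1.0, 1.1, 1.2, 1.0],
]

threshold = isocontour_threshold(suv_peak=7.0, local_background=1.0)
mask = threshold_mask(toy_slice, threshold)
for row in mask:
    print(row)
```

In practice the mask would be restricted to a bounding box around the index nodule (the "boxing" step) before any features are extracted, so that neighbouring [18F]FDG-positive tissue is excluded.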

Image processing and radiomic feature extraction
Radiomic features were extracted from the VOIs on both the interpolated PET (4 × 4 × 4 mm³) and the ldCT (2 × 2 × 2 mm³) images using PyRadiomics (version 2.1.2 in Python version 3.6.10; pyradiomics.readthedocs.io) [30]. From both PET and CT images, 107 standardised features were extracted: 14 shape features, 18 intensity features, and 75 texture features (24 grey level co-occurrence matrix (GLCM), 16 grey level run length matrix, 16 grey level size zone matrix, 14 grey level dependence matrix, five neighbouring grey tone difference matrix). For PET, the total lesion glycolysis (TLG, defined as the product of the SUVmean and the metabolic tumour volume) was also added. A fixed bin size of 0.5 g/mL for PET and 25 HU for ldCT was used (Supplementary table 2) [31].
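To make the fixed bin size and the TLG definition concrete, the following pure-Python sketch discretises a handful of voxel values and computes a TLG. It does not call PyRadiomics; the binning rule (floor of value divided by bin width, shifted so the lowest occupied bin is 1) is assumed to mirror a fixed-bin-width scheme, and all voxel values and function names are hypothetical.

```python
import math


def discretise_fixed_bin_width(values, bin_width):
    """Assign 1-based bin indices using a fixed bin width (assumed scheme)."""
    lowest_bin = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - lowest_bin + 1 for v in values]


def total_lesion_glycolysis(suv_mean, metabolic_volume_ml):
    """TLG as the product of SUVmean and metabolic tumour volume."""
    return suv_mean * metabolic_volume_ml


# Feature counts as listed in the text: shape + intensity + texture.
N_SHAPE, N_INTENSITY = 14, 18
N_TEXTURE = 24 + 16 + 16 + 14 + 5
n_features = N_SHAPE + N_INTENSITY + N_TEXTURE

# Hypothetical voxel values, not patient data.
pet_bins = discretise_fixed_bin_width([1.2, 1.7, 3.4, 6.9], bin_width=0.5)  # g/mL
ct_bins = discretise_fixed_bin_width([-20, 10, 60], bin_width=25)           # HU
tlg = total_lesion_glycolysis(suv_mean=4.0, metabolic_volume_ml=2.5)
print(n_features, pet_bins, ct_bins, tlg)
```

A fixed bin width keeps the meaning of one grey level constant across patients (0.5 g/mL per level for PET), whereas a fixed number of bins would rescale each lesion to its own intensity range.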

Reference standard
During participation in the EfFECTS trial, patients were advised to refrain from the scheduled diagnostic surgery when they were allocated to the [18F]FDG-PET/CT-driven group and the index nodule was visually [18F]FDG-negative. These patients remained under active surveillance and had at least a follow-up ultrasound examination after 12 months. All other patients were advised to proceed to the scheduled diagnostic surgery and were treated according to current guidelines [4,32]. This resulted in the following reference standard for the current study: benign nodules were defined either as benign on final histopathology (i.e., hyperplastic nodules, follicular adenoma or Hürthle cell adenoma) or as index nodules that remained unchanged in size and appearance on ultrasound follow-up, in accordance with definitions from the EfFECTS trial. Malignancies and borderline nodules were defined as index nodules that were histopathologically diagnosed as thyroid carcinoma or borderline tumours, the latter including non-invasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP), follicular tumour of uncertain malignant potential (FT-UMP), and paraganglioma. Throughout the manuscript, malignant and borderline lesions are grouped, as diagnostic surgery is considered the right course of treatment for all these lesions according to current insights. Incidentally detected (micro)carcinomas or borderline tumours located outside the index nodule were not considered for the reference standard. Blinded central revision of all cyto- and histopathology was performed by a dedicated thyroid pathologist. In case of discordance with the local histopathologist, a third pathologist was consulted and consensus was reached.

Outcomes
The primary outcome of the study was the diagnostic accuracy of quantitative [18F]FDG-PET/CT assessment and radiomics in non-Hürthle cell (defined as AUS/FLUS and FN/SFN cytology) and Hürthle cell (defined as HCN/SHCN cytology) nodules. True-positive and false-negative were respectively defined as test-positive and test-negative histopathologically malignant/borderline nodules. False-positive and true-negative were respectively defined as test-positive and test-negative benign nodules.
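These definitions map directly onto a 2 × 2 table. The sketch below derives the usual accuracy measures from the four counts; the counts are hypothetical and chosen only for illustration, and the benign call rate is assumed here to be the fraction of all nodules that test negative.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Accuracy measures from a 2 x 2 table of test result versus reference standard."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Assumption: benign call rate = all test-negative nodules / all nodules.
        "benign_call_rate": (tn + fn) / total,
    }


# Hypothetical counts, not study data.
metrics = diagnostic_metrics(tp=19, fn=1, fp=50, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these toy counts a test can reach 95% sensitivity and a high NPV while specificity stays low, which is the trade-off a rule-out test accepts.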

Statistical and radiomic analysis
Categorical data were expressed as absolute and relative (%) frequencies, and compared using Pearson's chi-squared or Fisher's exact tests, where appropriate. Continuous data were assessed for log-normality, expressed using mean ± standard deviation or median (interquartile range), and compared using independent samples t-tests or Mann-Whitney U tests when (log-)normally or non-normally distributed, respectively. Receiver operating characteristic (ROC) curve analysis was performed for the SUVmax, SUVpeak, SUVmax-ratio, and SUVpeak-ratio, using the area under the curve (AUC) to describe the overall diagnostic accuracy. Next, for each of the SUV metrics, the cut-off value was determined at which an optimal test sensitivity was found, defined as a sensitivity ≥ 95%. This is in accordance with the current ATA recommendations that a useful rule-out test is characterised by a negative predictive value (NPV) similar to a Bethesda II cytological diagnosis (i.e., 96%) [25]. At these SUV cut-offs, we assessed the benign call rate, representing the rate of potentially avoidable diagnostic surgeries. Sensitivity, specificity, negative and positive predictive value (PPV), benign call rate, and 95% confidence intervals (CI) were calculated using the traditional formulas and β-distribution (Clopper-Pearson interval), respectively. Subgroup analysis was performed for [ 18
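The threshold analysis and the exact confidence interval can be sketched with the Python standard library alone. This is a hedged illustration, not the authors' code: it assumes a nodule is test-positive when its SUV is strictly greater than the cut-off, it searches only observed values as candidate cut-offs, and it obtains the Clopper-Pearson bounds by bisection on the binomial tail rather than from a β-distribution quantile function. The SUVs and labels are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(func, target, iterations=60):
    """Solve func(p) = target on [0, 1] for a function increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion."""
    lower = 0.0 if successes == 0 else _bisect_increasing(
        lambda p: 1 - binom_cdf(successes - 1, n, p), alpha / 2)
    upper = 1.0 if successes == n else _bisect_increasing(
        lambda p: 1 - binom_cdf(successes, n, p), 1 - alpha / 2)
    return lower, upper


def rule_out_cutoff(suvs, is_malignant, min_sensitivity=0.95):
    """Highest observed cut-off at which sensitivity is still >= min_sensitivity."""
    n_malignant = sum(is_malignant)
    best = None
    for cutoff in sorted(set(suvs)):
        detected = sum(1 for s, m in zip(suvs, is_malignant) if m and s > cutoff)
        if detected / n_malignant >= min_sensitivity:
            best = cutoff  # sensitivity only falls as the cut-off rises
    return best


# Hypothetical SUVs (g/mL) and reference labels, not study data.
suvs = [1.0, 2.0, 3.0, 3.5, 4.5, 5.0, 6.0]
is_malignant = [False, False, True, False, True, False, True]
cutoff = rule_out_cutoff(suvs, is_malignant)

ci_9_of_9 = clopper_pearson(9, 9)
ci_23_of_24 = clopper_pearson(23, 24)
print(cutoff, ci_9_of_9, ci_23_of_24)
```

As a plausibility check, 23 of 24 and 9 of 9 detected cases give intervals of roughly 78.9–99.9% and 66.4–100%, matching the intervals reported in the abstract for sensitivities of 95.8% and 100%.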

Radiomic classifier
Radiomic analysis was performed in Python (version 3.6.10) and R (version 3.6.0; R Foundation for Statistical Computing, Vienna, Austria). In Python, an elastic net regression classifier was trained and evaluated in a 20-times repeated random split, in which the dataset was split in 80% training and 20% test data. Since the number of extracted features exceeded the number of patients in the dataset, dimensionality reduction incorporating redundancy filtering and factor analysis of radiomic features was performed for each split on the training set using the FMradio (Factor Modelling for Radiomics Data) R-package (version 1.1.1) [33]. One factor was selected for every ten subjects in the training set (details provided in Supplementary table 2) [18]. Factors for the training and test set were calculated. The factors of the training set were used as input for the elastic net regression classifier. The predictive performance of the model is expressed as the mean AUC of the ROC curve over the 20 splits for the test sets. The 95% CIs were constructed using a corrected resampled t-test [34]. Classification models were trained on PET features and subsequently on PET and ldCT features.
The definitions of the factors in the model were determined based on the underlying clusters of features in the different folds. Subgroup analysis was performed for nodules meeting the minimal size recommendation for radiomic analysis of 64 voxels per VOI [35]. The TRIPOD statement (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis, version 1 October 2020) was used (IBSI reporting guidelines, Supplementary table 2) [36].
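Two numerical steps of this procedure lend themselves to a short sketch: the number of factors retained per training set and the confidence interval around the mean test-set AUC. The code below is an assumption-laden illustration, not the study code. It assumes the corrected resampled t-test uses the common variance correction (1/k + n_test/n_train) for k repeated splits, it hard-codes the two-sided 95% critical value of Student's t for 19 degrees of freedom, and the per-split AUCs are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided 95% critical value of Student's t with 19 degrees of freedom (k - 1 for k = 20).
T_CRIT_19_DF = 2.093


def n_factors(n_train, subjects_per_factor=10):
    """One factor for every ten subjects in the training set."""
    return n_train // subjects_per_factor


def corrected_resampled_ci(scores, n_train, n_test, t_crit):
    """CI for the mean score over repeated random splits (assumed correction term)."""
    k = len(scores)
    centre = mean(scores)
    # Overlapping training sets make the splits dependent; the correction inflates the variance.
    standard_error = sqrt((1 / k + n_test / n_train) * variance(scores))
    return centre - t_crit * standard_error, centre + t_crit * standard_error


# Hypothetical test-set AUCs for 20 splits, not study results.
toy_aucs = [0.4, 0.5] * 10
lower, upper = corrected_resampled_ci(toy_aucs, n_train=68, n_test=16, t_crit=T_CRIT_19_DF)
factors = n_factors(68)
print(factors, round(lower, 3), round(upper, 3))
```

With 68 patients per training set the rule yields six factors, in line with the number reported in the Results. The correction term makes the interval considerably wider than a naive standard error of the mean over 20 splits would suggest.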

Patients
The current study included 123 patients between 1 July 2015 and 16 October 2018 ( Fig. 1

Quantitative [18F]FDG-PET/CT assessment
In all 123 nodules, the median SUVmax, SUVpeak, SUVmax-ratio, and SUVpeak-ratio were significantly higher in malignant/borderline nodules than in benign nodules (p < 0.001) (Table 3). ROC curve analysis showed similar AUCs for all SUV metrics (Fig. 3). A 97.0% sensitivity was reached at SUVmax, SUVpeak, SUVmax-ratio, and SUVpeak-ratio cut-offs of 2.1 g/mL, 1.6 g/mL, 1.2, and 0.9, respectively (Table 2). At these cut-offs, the benign call rate varied between 8.9% for the SUVpeak and 28.5% for the SUVmax-ratio. Missed malignant/borderline tumours varied across the SUV metrics and included the two visually false-negative nodules and a 20 mm NIFTP with a SUVmax of 2.1 g/mL and SUVpeak of 1.6 g/mL.
In the 94 non-Hürthle cell nodules, a sensitivity of 95.8% was established at the same cut-offs, with the benign call rate ranging from 10.6 to 35.1% (Table 2). Similar cut-offs for all SUV metrics and similar benign call rates were found in AUS/FLUS as compared to FN/SFN nodules (Supplementary tables 6-7, Supplementary Fig. 1). In the 29 Hürthle cell nodules, no significant differences in SUV metrics were found between malignant/borderline and benign nodules (Table 3). The AUCs ranged from 0.533 for the SUVmax to 0.700 for the SUVpeak-ratio (Fig. 3). Yet, at SUV cut-offs of 5.2 g/mL, 4.7 g/mL, 3.4, and 2.8, sensitivity was 100% with benign call rates ranging from 17.2% for the SUVmax to 24.1% for the SUVpeak and SUVpeak-ratio (Table 2).
Subgroup analysis of the visually [18F]FDG-positive non-Hürthle cell nodules showed similar SUV values in malignant/borderline and benign nodules. Threshold analysis in this subgroup showed that a ≥ 95% sensitivity was only achieved at minimal benign call rates (Supplementary tables 8 and 9).

Radiomic analysis
The 84 (68%) patients with visually [18F]FDG-positive nodules were included in the radiomic analysis, including 56 (67%) non-Hürthle and 28 (33%) Hürthle cell nodules (Fig. 1). Dimensionality reduction of the radiomic feature set retained six factors in every training set (68 patients in training sets). The mean AUC of the PET model was 0.445 in the test set (Fig. 4, Supplementary table 10). The retained [...]
Based on the results of our previous and the current study, we suggest a diagnostic algorithm for the [18F]FDG-PET/CT-driven workup of Bethesda III and IV thyroid nodules (Fig. 5). If externally validated, this workup could prevent more than half of the futile diagnostic surgeries for benign nodules. Additional diagnostics could be considered to further improve the differentiation of [18F]FDG-positive non-Hürthle cell nodules and Hürthle cell nodules, including molecular diagnostics and systematic ultrasound evaluation using the Thyroid Imaging Reporting and Data System (TIRADS) [16]. Combined [18F]FDG-PET/CT and TIRADS assessment previously showed high diagnostic accuracy in indeterminate thyroid nodules [12]. The performance of EU-TIRADS in Hürthle cell nodules seems more limited [37].
The limited number of prior studies on quantitative [18F]FDG-PET/CT assessment in indeterminate thyroid nodules reported major variations in SUV cut-offs and diagnostic accuracy. Deandreis et al. and Rosario et al., who respectively included 56 indeterminate nodules (including 29 [52%] with Hürthle cell cytology) and 63 Bethesda III/IV nodules, showed that a SUVmax of at least 5 g/mL was 91% specific to detect thyroid carcinoma, NIFTP, and FT-UMP [6,10]. In contrast, Merten et al. found that the same cut-off was only 41% specific but 80% sensitive in their study in 51 Bethesda IV nodules (including 24 [47%] Hürthle cell cytology) [8]. Piccardo et al. reported that a SUVmax-ratio of 5 was the most accurate, without reporting an AUC or corresponding sensitivity and specificity in 111 indeterminate nodules [12]. Pathak et al. excluded Hürthle cell nodules and reported that a SUVmax cut-off of 3.25 g/mL best differentiated the remaining 42 non-Hürthle cell nodules with 79% sensitivity and 83% specificity [38]. Part of the mixed results of these studies may be explained by different compositions of the patient populations, including the fractions of Hürthle cell cytology. Unfortunately, none of these studies separately analysed non-Hürthle and Hürthle cell nodules, even though multiple studies have reported higher [18F]FDG uptake in Hürthle cell nodules and it has repeatedly been suggested that Hürthle cell nodules should be treated as separate entities in the diagnostic workup [7,14]. Besides that, SUV calculations strongly depend, amongst others, on image acquisition and reconstruction settings, and PET-scanner model [7,16]. Harmonised [18F]FDG-PET protocols are required to enable the global inter-institution comparison of study results and advancement of PET research [26,39].
None of these previous studies used ROC curve analysis to determine SUV cut-offs that corresponded to optimal test sensitivity, even though threshold analysis seems a suitable method to uphold the ATA recommendations for a useful additional diagnostic (i.e., ≥ 96% NPV for a rule-out test) [25].
To the best of our knowledge, our study is the second to report PET/CT radiomics in indeterminate thyroid nodules. Giovanella et al. recently published the first study in 78 Bethesda III/IV patients, demonstrating a 96% NPV and 58% PPV for a multiparametric model including the cytological classification and two radiomic features [19]. PPV improved to 79% if 13 patients with a histopathological Hürthle cell adenoma were excluded (cytology not reported). Supervised feature selection was performed using redundancy filtering of features strongly correlating to SUVmax and the metabolic tumour volume (ρ > 0.7) and LASSO logistic regression. The included features were GLCM autocorrelation and shape sphericity. In our factor-based analysis, the feature GLCM autocorrelation was frequently the underlying factor for 'high intensity on PET', a factor that was also often accompanied by SUVmax. Shape sphericity was not one of the main features that explained our factors. When incorporated in the proposed multiparametric model, however, clinical application of radiomics seems feasible. Future studies are required to validate their results [19].
One of the strengths of our study is its carefully evaluated radiomic methodology. First, we preferred unsupervised feature selection or dimensionality reduction over supervised feature selection, which uses discriminative values for the outcome. Unsupervised methods take into account the interaction of features and multicollinearity, thereby preventing overfitting of the model [40]. We selected non-redundant features with low multicollinearity, which were not necessarily the features with the highest predictive performance. Second, dimensionality reduction was performed on the training sets in the folds instead of on the dataset as a whole, strictly distinguishing the independent test sets. Third, factor-based dimensionality reduction was chosen over a feature-based approach for generalizability purposes. Instead of selecting features corresponding to the retained factors, the factors were used as input for the model and patterns in corresponding features were compared between folds. In a feature-based approach, different features might have been selected in different folds, resulting in limited insight into these patterns. Along these lines, a factor-based approach improves the generalizability and interpretability of the model and might provide insight into the semantics or underlying tumour biology of the factors [41]. Conversely, it reduces the (mathematical) explainability and reproducibility of the radiomic model during external validation, as it uses derivatives of features. Adherence to the IBSI reporting guidelines and TRIPOD statement may prevent reproducibility issues [31,36]. Another limitation is that eighteen nodules did not meet the minimal size recommendation for radiomic analysis of 64 voxels per VOI [35]. Subgroup analysis of the nodules meeting this requirement showed similar results. It is unlikely that the nodule size had a large impact on the radiomic analysis.
The multicentre design of the study was both a strength and limitation. While the population of our nationwide trial is unique and an adequate reflection of the diverse presentation of thyroid nodules, the different scanners and slight variations in imaging protocols among the 12 hospitals introduced heterogeneity and may have limited the radiomic analysis. Therefore, only scans with strict adherence to the EANM guidelines were assessed, as these reconstructions lead to a larger number of reliable, repeatable, and reproducible radiomic features in a multicentre and multivendor setting [42]. In addition, nodules were delineated using a threshold of 50% of the SUVpeak, corrected for local background, which is recommended in multicentre [18F]FDG-PET/CT studies because of its high feasibility and repeatability [29]. Moreover, all images were interpolated to isotropic voxels in order to allow comparison between image data from different samples and centres [43]. The number of included patients per centre or PET/CT-scanner was not sufficiently large to incorporate post-reconstruction harmonization strategies such as ComBat [44].
In conclusion, the current study showed that quantitative [18F]FDG-PET/CT assessment accurately ruled out malignancy in both Hürthle cell and non-Hürthle cell indeterminate thyroid nodules. Distinctive SUV cut-offs may avoid up to one in three futile diagnostic surgeries for benign Hürthle cell nodules. In non-Hürthle cell nodules, quantitative assessment had no added diagnostic value over visual [...]
Acknowledgements The authors would like to thank all patients who participated in the EfFECTS trial and all members of the EfFECTS trial consortium.
Author contribution All authors contributed to the study conception and design, data acquisition, and interpretation of the data. Data analysis was performed by Elizabeth J. de Koster, Wyanne A. Noortman, Jacob M. Mostert, Floris H.P. van Velden, and Dennis Vriens. The first draft of the manuscript was written by Elizabeth J. de Koster, Wyanne A. Noortman, and Jacob M. Mostert. All authors contributed to data acquisition, critically reviewed the manuscript, and read and approved the final version. Lioe-Fee de Geus-Oei, Wim J.G. Oyen, and Dennis Vriens supervised the study.

Funding
The EfFECTS trial was supported by a project grant from the Dutch Cancer Society (KUN 2014-6514).