Patients and procedures
The inclusion criteria were consecutive patients with non-small-cell lung cancer (NSCLC), or the entire available cohort from The Cancer Imaging Archive (TCIA) (http://cancerimagingarchive.net/, last accessed June 2015), with a target lesion ≥ 5 ml, who had a pre-therapy FDG-PET/CT scan available and who underwent radical radiotherapy with or without chemotherapy between October 2008 and December 2013. The minimum lesion volume of interest (VOI) of 5 ml was chosen in accordance with the work of Soussan et al. [8]. Patients undergoing surgery or palliative treatment were excluded. Institutional ethical approval for retrospective analysis was obtained, and the requirement for informed consent was waived.
The following hospitals took part in the study (Fig. 1): Imperial College Healthcare NHS Trust, London; St James's University Hospital, Leeds; Guy's and St Thomas' Hospitals, London; The Royal Marsden Hospital, Sutton; Nottingham University Hospital, Nottingham; and Mount Vernon Hospital, Northwood. A dataset was also obtained from TCIA. The work was carried out sequentially: training and validation, followed by testing on a final independent dataset (TESTI). Data from four of the hospitals and TCIA (Imperial, King's, Leeds, Royal Marsden, and TCIA) were pooled and randomly split by computer into a training set (n = 133) and a validation set (n = 134). A power calculation based on the training set (HR = 1.78, median survival 2.92 years, censoring rate 0.012, median follow-up 2.17 years) indicated that a sample size of 203 was needed for an alpha of 0.05 and a beta of 0.25 (power of 0.75). Therefore, all 70 cases from another centre were added to the validation set, giving a total of 204. This validation set was used only to test the findings from the training set. We included the maximum number of TCIA patients available at the time. The number of patients originally screened and the reasons for exclusion are given in Supplementary Table 1.
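The random allocation described above can be illustrated with a short Python sketch. It is not the authors' code: the patient identifiers and the seed are hypothetical placeholders, and only the group sizes (133 training, 134 validation, 70 added cases) are taken from the text.

```python
import random


def split_cohort(patient_ids, n_train, seed=2016):
    """Randomly split a pooled cohort into training and validation sets.

    The seed is a hypothetical value used only for reproducibility; how the
    original computer split was seeded is not reported.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


# Pooled cohort from four hospitals plus TCIA (133 + 134 = 267 patients);
# the identifiers are placeholders.
pooled = [f"P{i:03d}" for i in range(267)]
training, validation = split_cohort(pooled, n_train=133)

# The 70 cases from the additional centre go only into the validation set.
extra_centre = [f"X{i:03d}" for i in range(70)]
validation = validation + extra_centre

print(len(training), len(validation))  # 133 204
```

Keeping the extra centre out of the random split means no training patient comes from that centre, so it also acts as a partially external check.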
Pre-therapy clinicopathological data were obtained from medical records (Table 1). Overall survival was defined as the number of months from commencement of treatment to the date of death. Patients who were alive were censored at last follow-up, up to 31 July 2016; hospital records were used to determine who was still alive at the cut-off date. Because this was a multi-institutional analysis, patients were examined on different PET/CT scanners, including the Philips Allegro Body and Philips Gemini TF TOF 16 (Philips Medical Systems, Amsterdam, Netherlands), Siemens Biograph 64 mCT and Siemens Biograph 128 mCT (Siemens Healthcare, Erlangen, Germany), GE Discovery ST and GE Discovery STE (GE Healthcare, Waukesha, Wisconsin, USA), CTI ECAT HR+ (CTI PET Systems Inc., Knoxville, Tennessee, USA), and CPS/Siemens Sensation 16. For PET, slice thickness ranged from 2 to 5.15 mm, and the matrix size ranged from 128 × 128 to 512 × 512. Emission data were acquired after injection of 350–500 MBq 18F-FDG [9] and a 60–90 min uptake period (five or six bed positions, 2–4 min per bed position). In all cases, PET/CT scans were performed from the upper thighs to the base of the skull after a fast of at least 4–6 h, and blood glucose was < 11.0 mmol/l at the time of injection. CT was acquired without oral or intravenous contrast. PET data were reconstructed with ordered-subset expectation maximisation (OSEM) and attenuation-corrected using the CT data.
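The survival end point and censoring rule can be expressed as a small function that turns dates into a (time, event) pair, the input format expected by survival software. This is a sketch under stated assumptions: converting days to months with a mean month of 365.25/12 days is our choice, and the function and argument names are hypothetical.

```python
from datetime import date

CUTOFF = date(2016, 7, 31)    # follow-up cut-off given in the text
DAYS_PER_MONTH = 365.25 / 12  # assumption: mean month length in days


def overall_survival(start, death=None, last_follow_up=None, cutoff=CUTOFF):
    """Return (months, event), measured from the start of treatment.

    event = 1 for a death on or before the cut-off. Otherwise the patient is
    censored (event = 0) at the earlier of last follow-up and the cut-off;
    if no follow-up date is supplied, the cut-off itself is used.
    """
    if death is not None and death <= cutoff:
        return (death - start).days / DAYS_PER_MONTH, 1
    end = cutoff if last_follow_up is None else min(last_follow_up, cutoff)
    return (end - start).days / DAYS_PER_MONTH, 0


months, event = overall_survival(date(2012, 3, 1), death=date(2014, 3, 1))
print(round(months, 1), event)  # 24.0 1
```

A death recorded after the cut-off is treated as censored at the cut-off, which keeps every patient's follow-up within the same window.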
Table 1 Characteristics of the training, validation and test datasets
PET analysis
Central analyses of all PET/CT data were conducted at Imperial College London using a semi-automated adaptive threshold method. The primary tumor was first delineated with a threshold of 40% of SUVmax using semi-automated software (Hermes Gold3; Hermes Medical Solutions Ltd., London, UK), and VOIs were drawn. The PET volume was compared with the primary tumor on CT, and underestimation was identified by checking whether the PET tumor VOI encompassed the whole tumor on the CT component of the PET/CT. If the VOI did not visually cover the tumor, a lower threshold was used [10]. Manual adjustment was employed when the VOI incorporated adjacent normal structures, such as myocardial activity [11]. All segmentations were performed by the same operator (observer 1, a radionuclide specialist radiologist with 4 years' experience of tumor delineation).
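The threshold logic can be sketched on a flattened list of voxel SUVs. This is a simplified illustration, not the Hermes implementation: the 5-percentage-point step and the 20% floor are hypothetical, the CT tumor is represented as a set of voxel indices, and the manual exclusion of structures such as myocardium is not modelled.

```python
def threshold_voi(suv, pct):
    """Indices of voxels whose SUV is at least pct % of SUVmax.

    Comparing v * 100 with pct * SUVmax avoids the rounding error of forming
    fractions such as 0.35 in floating point.
    """
    suv_max = max(suv)
    return {i for i, v in enumerate(suv) if v * 100 >= pct * suv_max}


def adaptive_voi(suv, ct_tumour, start_pct=40, step_pct=5, floor_pct=20):
    """Start at 40 % of SUVmax and lower the threshold until the PET VOI
    contains every voxel of the CT-defined tumor (or the floor is reached).

    step_pct and floor_pct are hypothetical parameters for illustration.
    """
    pct = start_pct
    voi = threshold_voi(suv, pct)
    while not ct_tumour <= voi and pct - step_pct >= floor_pct:
        pct -= step_pct
        voi = threshold_voi(suv, pct)
    return voi, pct


suv = [1.0, 2.0, 10.0, 9.0, 3.2, 1.0]  # toy voxel values
print(adaptive_voi(suv, {2, 3}))     # ({2, 3}, 40)
print(adaptive_voi(suv, {2, 3, 4}))  # ({2, 3, 4}, 30)
```

In the second call the 40% and 35% thresholds both miss voxel 4 (SUV 3.2), so the threshold drops to 30%.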
The SUVmean, SUVmax, SUVpeak, metabolic tumor volume (MTV), and total lesion glycolysis (TLG = SUVmean × MTV) of the primary tumor were recorded. Kaplan–Meier curves were generated using the optimal cut-off for median survival, identified from the receiver operating characteristic (ROC) curve with Youden's J statistic. The VOIs were then extracted and imported into the radiomics software. To assess intra- and inter-observer variability of the segmentation method, 18 patients were selected at random using SPSS, and their tumors were re-segmented (at the 128 gray-level setting) by two additional experienced operators (observers 2 and 3, with 6 and 10 years' experience of tumor delineation, respectively) who were blinded to the original results and clinical data. Lymph nodes were excluded from statistical analyses.
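Youden's J selects the cut-off that maximises sensitivity + specificity - 1 across the ROC curve. A minimal sketch follows, assuming a binary label for each patient (for example, death before the median survival time); how patients censored before the median were labelled is not stated in the text, so the labelling is left to the caller.

```python
def youden_cutoff(scores, labels):
    """Return (cut-off, J) maximising Youden's J = sensitivity + specificity - 1.

    labels: 1 = event of interest (assumed here: death before the median
    survival time), 0 = otherwise. A patient is called positive when
    score >= cut-off. Every observed score is tried as a candidate cut-off.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:  # keep the first (lowest) cut-off on ties
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy example: a PET variable (e.g. MTV) and early-death labels, invented data.
print(youden_cutoff([0.2, 0.4, 0.6, 0.8, 0.9], [0, 1, 0, 1, 1]))
```

The returned cut-off can then dichotomise patients into high and low groups for the Kaplan–Meier curves.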
The intraclass correlation coefficient (ICC) was used to assess intra-observer (observer 1) and inter-observer (observers 1, 2, and 3) differences in texture features. Differences between observers were assessed with a two-way repeated-measures ANOVA with Bonferroni correction.
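The text does not state which ICC form was used. As an assumption, the sketch below computes the two-way random-effects, absolute-agreement, single-rater ICC(2,1) of Shrout and Fleiss from the ANOVA mean squares, with rows as patients and columns as observers (or repeat segmentations for intra-observer agreement).

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is a texture feature value for patient i from observer j.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Partition the total sum of squares into patients, observers, and error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Observer 2 reads every patient exactly 1 unit higher than observer 1.
print(round(icc_2_1([[1, 2], [2, 3], [3, 4]]), 3))  # 0.667
```

Because this form measures absolute agreement, a constant offset between observers lowers the ICC even though their rankings agree perfectly.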
Radiomics analysis
Radiomics analysis (Supplementary Fig. 1) was performed at seven quantisation levels (4, 8, 16, 32, 64, 128, and 256 gray levels) using TexLAB v2, which was developed and implemented in-house in MATLAB R2015b (MathWorks Inc., Natick, MA, USA). From each primary tumor, 665 radiomic features (listed in Supplementary Table 3) were extracted from the segmented VOIs using local, regional, global, fractal, and wavelet techniques. These included intensity, shape, and texture features [gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), and neighbourhood gray-tone difference matrix (NGTDM)], with or without wavelet transformation, as previously reported [5, 6]. Radiomic features were computed with TexLAB v2 from the 133 PET scans of the training set.
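To illustrate what computing features at different gray levels involves, the sketch below quantises SUVs to a chosen number of levels and builds a symmetric GLCM for a single horizontal offset, from which GLCM contrast is derived. TexLAB v2's actual discretisation scheme, offsets, and 3D neighbourhoods are not described here, so linear min-max binning and a 2D, one-offset matrix are assumptions made for brevity.

```python
def quantise(values, levels):
    """Map SUVs linearly onto integer gray levels 1..levels between the VOI
    minimum and maximum (assumed fixed-bin-number scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)
    return [min(int((v - lo) / (hi - lo) * levels) + 1, levels) for v in values]


def glcm_horizontal(image, levels):
    """Symmetric, normalised GLCM for horizontally adjacent pixels of a 2-D
    image whose entries are gray levels 1..levels."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a - 1][b - 1] += 1
            counts[b - 1][a - 1] += 1  # count both directions (symmetric)
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def glcm_contrast(p):
    """GLCM contrast: sum over i, j of p(i, j) * (i - j)^2."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


# Toy 2 x 3 slice of SUVs analysed at the seven gray levels used in the study.
suvs = [2.1, 3.4, 5.0, 7.8, 9.9, 4.2]
for levels in (4, 8, 16, 32, 64, 128, 256):
    q = quantise(suvs, levels)
    image = [q[:3], q[3:]]
    print(levels, round(glcm_contrast(glcm_horizontal(image, levels)), 3))
```

The same VOI yields different feature values at each gray level, which is why each feature is analysed separately per quantisation level.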
Feature selection and radiomics signature discovery
As with other high-throughput analyses, the total number of features must be reduced for prediction, to limit Type I errors and to learn the true basis of a decision rather than noise. We first identified highly correlated features for elimination using heatmaps, because strong correlation indicated that some features could be removed without losing information. Heatmaps were created with R software (http://www.r-project.org/; version 3.0.3; Vienna, Austria). Several texture features are known to correlate with volume [12]. Features with a high Spearman rank correlation with volume (≥ 0.7) were therefore normalised by dividing the feature value by volume to obtain volume-invariant texture features (notably, the two features included in the final analysis did not correlate with volume and thus did not require normalisation).
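The volume-normalisation rule can be sketched as follows. Spearman's rho is computed as the Pearson correlation of average ranks, and any feature whose correlation with volume reaches 0.7 is divided by volume. Applying the threshold to the absolute value of rho, so that strong negative correlations are also caught, is our assumption; the feature names in the usage example are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature has no rank correlation
    return cov / (sx * sy)


def volume_normalise(features, volumes, threshold=0.7):
    """Divide by volume every feature with |rho| >= threshold against volume."""
    out = {}
    for name, values in features.items():
        if abs(spearman(values, volumes)) >= threshold:
            out[name] = [v / vol for v, vol in zip(values, volumes)]
        else:
            out[name] = list(values)
    return out


volumes = [5.0, 8.0, 12.0, 20.0]  # MTV in ml, invented
features = {"feature_a": [10, 16, 24, 40], "feature_b": [3, 1, 4, 2]}
print(volume_normalise(features, volumes))
```

Here feature_a is exactly twice the volume, so it is normalised to a constant, while feature_b has zero rank correlation with volume and is left unchanged.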
From the 665 features at each gray level, we used least absolute shrinkage and selection operator (LASSO) regression for data dimension reduction and radiomics feature vector (composite feature) discovery; the resulting vectors were then assessed with Kaplan–Meier curves and Cox regression analysis. LASSO is a form of penalised regression that mitigates multicollinearity. Briefly, non-contributory variables are assigned zero weight, and numerous iterations are performed to relate the non-zero contributory variables to the chosen clinical outcome (here, overall survival) [13]. Analyses were conducted with R; the R packages used are listed in Supplementary Table 4. Two-sided tests were used, and p ≤ 0.05 was considered statistically significant. SPSS Statistics version 22 (IBM, Armonk, NY, USA) was used for the intraclass correlation and two-way ANOVA.
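The analyses used the R packages listed in Supplementary Table 4, which are not reproduced here. As a self-contained illustration of the idea only, the sketch below fits an L1-penalised (LASSO) Cox model by proximal gradient descent on the Breslow partial likelihood, so that weak features are soft-thresholded to exactly zero, and forms the composite FVX as the coefficient-weighted sum of features. The step size, iteration count, and penalty value are hypothetical; in practice the penalty would be tuned, for example by cross-validation.

```python
import math


def fvx_score(beta, row):
    """Composite feature vector (FVX): features weighted by their coefficients."""
    return sum(b * x for b, x in zip(beta, row))


def soft_threshold(z, a):
    """Proximal operator of the L1 penalty: shrinks z towards 0 by a."""
    if z > a:
        return z - a
    if z < -a:
        return z + a
    return 0.0


def cox_gradient(X, times, events, beta):
    """Gradient of the mean negative Breslow partial log-likelihood."""
    n, p = len(X), len(beta)
    eta = [fvx_score(beta, row) for row in X]
    grad = [0.0] * p
    for i in range(n):
        if not events[i]:
            continue  # censored patients contribute only through risk sets
        risk = [j for j in range(n) if times[j] >= times[i]]
        top = max(eta[j] for j in risk)  # shift exponents for stability
        w = {j: math.exp(eta[j] - top) for j in risk}
        denom = sum(w.values())
        for k in range(p):
            weighted_mean = sum(w[j] * X[j][k] for j in risk) / denom
            grad[k] -= X[i][k] - weighted_mean
    return [g / n for g in grad]


def lasso_cox(X, times, events, lam, step=0.1, n_iter=500):
    """L1-penalised Cox regression by proximal gradient descent (ISTA)."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = cox_gradient(X, times, events, beta)
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta


# Toy data (invented): feature 0 rises as survival shortens; feature 1 is noise.
X = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
times = [6, 5, 4, 3, 2, 1]
events = [1, 1, 1, 1, 1, 1]
beta = lasso_cox(X, times, events, lam=0.05)
print(beta, [round(fvx_score(beta, row), 2) for row in X])
```

A very large penalty keeps every coefficient at zero, which is how LASSO discards non-contributory features; smaller penalties let the informative feature enter the FVX.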
The most predictive feature vectors (FVX) were computed as linear combinations of the selected statistical features of the matrices, weighted by their respective coefficients, and were compared with overall survival (OS). Survival curves were plotted using Kaplan–Meier (KM) methods, stratified by stage or, for FVX, by the Youden's J cut-off on the ROC curve for median survival. KM curves for overall survival were plotted with the ‘survfit’ function from the R ‘survival’ package, using the median cut-off for MTV, TLG, and FVX. The statistical significance of differences between survival curves was assessed with the log-rank test implemented in the ‘survdiff’ function. FVX, stage, MTV, and TLG were compared in a multivariable analysis using a stepwise backward procedure to identify independent predictors of survival. P values ≤ 0.05 were considered statistically significant, and 95% confidence intervals were calculated. Continuous Cox regression and the C-index were computed for each prognosticator in univariate analysis, and for the multivariable model with and without FVX. All four variables (FVX, stage, MTV, and TLG) were entered as continuous variables.
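The study used R's ‘survfit’ and ‘survdiff’. For readers without R, the sketch below re-derives the same two quantities in plain Python: the Kaplan–Meier product-limit estimate and the two-group log-rank chi-square with its 1-degree-of-freedom p-value. It is an illustrative reimplementation, not a wrapper around those functions, and the example data are invented.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    Patients censored at time t are still counted in the risk set at t.
    """
    s, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        s *= 1 - deaths / at_risk
        curve.append((t, s))
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value) with 1 df."""
    times = times_a + times_b
    events = events_a + events_b
    in_a = [True] * len(times_a) + [False] * len(times_b)
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n_a = sum(1 for u, g in zip(times, in_a) if u >= t and g)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d_a = sum(1 for u, e, g in zip(times, events, in_a) if u == t and e and g)
        observed_a += d_a
        expected_a += d * n_a / n  # deaths expected in group A under H0
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


print(kaplan_meier([2, 4, 4, 7, 9, 12], [1, 1, 0, 1, 0, 1]))
print(logrank([2, 4, 5, 7], [1, 1, 1, 1], [6, 9, 11, 14], [1, 0, 1, 1]))
```

Groups would typically be formed by splitting a variable such as FVX or MTV at its cut-off before calling logrank.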
Independent validation and testing
Performance of FVX and stage was tested against OS in an independent validation set of 204 patients and in a further independent set of 21 patients (TESTI, the final institutional dataset accepted into the study). Similar survival comparisons were made for routine PET variables, including SUVmean, SUVmax, SUVpeak, MTV, and TLG.
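Discrimination in the validation and TESTI sets can be summarised with a concordance index. The sketch computes Harrell's C for any risk score (for example FVX or stage), assuming higher scores indicate worse prognosis; the example data are invented for illustration.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for a risk score (higher = worse prognosis).

    A pair (i, j) is usable when patient i has the shorter time and an observed
    death. It is concordant when patient i also has the higher risk score;
    tied risk scores count as half-concordant.
    """
    concordant = usable = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


# Invented validation data: months, death indicator, and an FVX-like score.
months = [8, 14, 20, 27, 33, 41]
died = [1, 1, 0, 1, 0, 1]
score = [2.3, 1.9, 1.2, 1.4, 0.6, 0.9]
print(round(harrell_c(months, died, score), 3))
```

A C-index of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which allows FVX, stage, MTV, and TLG to be compared on the same scale.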