Does Measurement of First-Order and Heterogeneity Parameters Improve Response Assessment of Bone Metastases in Breast Cancer Compared to SUVmax in [18F]fluoride and [18F]FDG PET?

Purpose To establish whether first-order statistical features from [18F]fluoride and 2-deoxy-2-[18F]fluoro-D-glucose ([18F]FDG) positron emission tomography/x-ray computed tomography (PET/CT) demonstrate incremental value in skeletal metastasis response assessment compared with maximum standardised uptake value (SUVmax).
Procedures Sixteen patients starting endocrine treatment for de novo or progressive breast cancer bone metastases were prospectively recruited to undergo [18F]fluoride and [18F]FDG PET/CT scans before and 8 weeks after treatment. Percentage changes in SUV parameters, metabolic tumour volume (MTV), total lesion metabolism (TLM), standard deviation (SD), entropy and uniformity, and absolute changes in kurtosis and skewness, were measured from the same ≤ 5 index lesions. Clinical response up to 24 weeks, assessed by two experienced oncologists blinded to PET/CT imaging findings, was used as a reference standard, and associations between parameters and progression-free survival (PFS) and overall survival (OS) were assessed.
Results [18F]fluoride PET/CT: In four patients (20 lesions) with progressive disease (PD), TLM and kurtosis predicted PD better than SUVmax on a patient basis (4, 4 and 3 out of 4 for TLM, kurtosis and SUVmax, respectively), as did TLM, entropy, uniformity and skewness on a lesion basis (18, 16, 16, 18 and 15 out of 20 for TLM, entropy, uniformity, skewness and SUVmax, respectively). Kurtosis was independently associated with PFS (p = 0.033) and OS (p = 0.008) on Kaplan-Meier analysis. [18F]FDG PET: No parameter provided incremental value over SUVmax in predicting PD or non-PD. TLM was significantly associated with OS (p = 0.041) and skewness with PFS (p = 0.005). Interlesional heterogeneity of response was seen in 11/16 and 8/16 patients on [18F]fluoride and [18F]FDG PET/CT, respectively.
Conclusion With [18F]fluoride PET/CT, some first-order features, including those that take into account lesion volume but also some heterogeneity parameters, provide incremental value over SUVmax in predicting clinical response and survival in breast cancer patients with bone metastases treated with endocrine therapy. With [18F]FDG PET/CT, no first-order parameters were more accurate than SUVmax although TLM and skewness were associated with OS and PFS, respectively. Intra-patient heterogeneity of response occurs commonly between metastases with both tracers and most parameters.


Introduction
Skeletal metastases are common in patients with advanced breast cancer and are associated with significant morbidity [1]. With the introduction of new systemic therapies that improve survival time, early detection and response assessment of skeletal metastases have become more important. Varied tumour response to treatment is undoubtedly an important factor in the clinical outcome, accentuating the need for reliable measures to monitor patients for early disease progression so that ineffective treatment can be discontinued in a timely manner.
Conventionally, bone scintigraphy has been used to assess breast cancer bone metastases but has certain limitations when assessing treatment response [2]. Molecular and functional imaging can improve diagnosis and treatment response assessment of breast cancer bone metastases [3]. Standardised uptake value (SUV) on positron emission tomography/computed tomography (PET/CT) has been used as a standard semi-quantitative method for monitoring treatment response, but other non-heterogeneity parameters such as metabolic tumour volume (MTV) and total lesion metabolism (TLM) have also been used to measure metabolic activity within the tumour [4,5]. Several retrospective studies using 2-deoxy-2-[18F]fluoro-D-glucose ([18F]FDG) PET/CT, mainly focusing on osseous response to treatment, have established that a change in SUVmax can predict disease response or progression [6][7][8], and a feasibility study has also shown that [18F]fluoride PET may be useful in evaluating treatment response in breast cancer [9]. Despite this, there is limited evidence to support the use of either tracer in routine clinical practice. Measuring volumetric parameters or heterogeneity of tracer activity has also been shown to have incremental predictive or prognostic value in a number of cancers [10][11][12][13][14][15][16][17][18][19][20][21]. First-order statistics measure global properties of a tumour from individual voxel values, can be obtained from the histogram of voxel intensities and are the most commonly used [22]. However, there are no reports on the use of first-order heterogeneity parameters in treatment response assessment of breast cancer bone metastases using [18F]fluoride or [18F]FDG PET.
The hypothesis of this study was that global first-order features derived from skeletal metastases, some of which describe heterogeneity in bone metastases, may be better predictors of response to treatment than SUVmax.
The objective of this study was to extract first-order features from both [18F]FDG (a tumour-specific radiotracer that targets glucose metabolism) and [18F]fluoride (a bone-specific tracer that targets osteoblast activity and local blood flow) PET/CT images of breast cancer bone metastases, at baseline and 8 weeks after the start of endocrine treatment, and to compare their ability to predict treatment response (determined by a clinical reference standard) and survival with that of the most commonly described parameter, SUVmax.

Participants
Sixteen female breast cancer patients (mean age 51.6, range 40-79 years) starting endocrine treatment for de novo (n = 5) or progressive bone metastases (n = 11) from an ongoing prospective single-centre exploratory study were included. The endocrine treatments used were letrozole (n = 12), tamoxifen (n = 2), everolimus/exemestane (n = 1) and anastrozole (n = 1). Apart from two patients who had small volume lung and liver metastases, all other patients had bone-only disease.
Scans were performed after an uptake time of 60 min. Imaging comprised a static PET/CT scan using a GE Discovery 710 PET/CT scanner (GE Healthcare, Chicago, USA). Each scan covered the base of the skull to mid-thigh, with an axial field-of-view of 15.7 cm and an 11-slice overlap between bed positions. A low-dose CT scan (140 kV, 10 mA, 0.5 s rotation time and 40 mm collimation) was performed at the start of imaging to provide attenuation correction and an anatomical reference. PET scan duration was set to 3 min per bed position.
PET image reconstruction included standard scanner-based corrections for radiotracer decay, scatter, randoms and dead-time. Emission sinograms were reconstructed on the scanner front end with the manufacturer's time-of-flight ordered subset expectation maximisation (OSEM) algorithm (2 iterations, 24 subsets), with a 256 × 256 matrix and a 4-mm full-width at half-maximum (FWHM) Gaussian post-reconstruction smoothing filter.

Parameter Analysis
Up to five of the most active (SUV ≥ 10) [23] and largest (≥ 1 cm diameter) lesions were first identified for analysis. Image heterogeneity analysis was performed using in-house quantitative analysis software implemented in MATLAB (Mathworks, Natick, MA, USA). First-order statistics derived from regional geometry and the histogram distribution of voxel intensities (standard deviation (SD), entropy, uniformity, kurtosis and skewness) were calculated on both [18F]FDG and [18F]fluoride PET scans, as well as non-heterogeneity parameters such as SUVmax, SUVmean, SUVpeak, TLM and MTV. All parameters on both PET scans were calculated for the same lesions at baseline and 8 weeks, and changes in the values of these parameters from baseline were used for statistical analyses. The tumour volumes were generally small (mean volume = 7.1 cm³ (SD = 8.3) on [18F]fluoride PET/CT and mean volume = 5.7 cm³ (SD = 5.9) on [18F]FDG PET/CT); therefore, in order to avoid bias from small volumes, second and high-order texture features were not calculated [24,25]. We also analysed changes in SUVmax between lesions in each individual patient to assess the degree of interlesional heterogeneity of response with both tracers. Interlesional heterogeneity was considered present when a metastasis showed a change in a parameter that was opposite in direction to the clinical reference standard.
Two experienced oncologists working in consensus, blinded to PET/CT findings, determined clinical response up to 24 weeks after the start of treatment or until progression, whichever came first, and this was used as the reference standard (Table 1). The assessment was based on standard imaging, including bone scans and contrast-enhanced CT; clinical assessment, including pain scores (using the Brief Pain Inventory questionnaire); and alkaline phosphatase and carcinoma antigen 15-3 serology. The changes in clinical parameters were used together in patient assessment and none of the parameters was used in isolation. Any discrepancy was reviewed by a third clinician; only one case went to a third reader. Assessment decisions were made on bone-only disease, given the absence of soft tissue disease in the majority of this group, so soft tissue response (Response Evaluation Criteria In Solid Tumours (RECIST)) was not relevant in our study population. Patients were grouped into progressive disease (PD) and non-progressive disease (non-PD = partial response (PR) and stable disease (SD)). PR and SD patients were assessed together as clinical management is rarely different in these two groups.
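The parameters listed above can be illustrated with a minimal Python sketch. It is not the in-house MATLAB software, whose exact definitions are not given here. The function names `first_order_features` and `change_from_baseline`, the fixed number of histogram bins (`n_bins`), the use of non-excess kurtosis and the definition of TLM as MTV × SUVmean are all assumptions made for illustration. SUVpeak is omitted because it needs the spatial arrangement of voxels rather than a flat list of values. Following the abstract, changes are expressed as percentages except for kurtosis and skewness, which use absolute changes.

```python
import math
from collections import Counter

# Parameters reported as absolute rather than percentage change.
ABSOLUTE_CHANGE = {"kurtosis", "skewness"}


def first_order_features(suv, voxel_volume_cm3, n_bins=64):
    """First-order features of one lesion from a flat list of voxel SUVs.

    Assumptions (not from the source): fixed bin count between the lesion
    minimum and maximum, non-excess kurtosis, TLM = MTV * SUVmean.
    """
    n = len(suv)
    mean = sum(suv) / n
    # Central moments of the voxel-intensity distribution.
    m2 = sum((v - mean) ** 2 for v in suv) / n
    m3 = sum((v - mean) ** 3 for v in suv) / n
    m4 = sum((v - mean) ** 4 for v in suv) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 if m2 > 0 else 0.0

    # Discretise intensities into a histogram and convert counts to probabilities.
    lo, hi = min(suv), max(suv)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for a constant lesion
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in suv)
    probs = [c / n for c in counts.values()]

    mtv = n * voxel_volume_cm3
    return {
        "suv_max": hi,
        "suv_mean": mean,
        "mtv": mtv,
        "tlm": mtv * mean,
        "sd": math.sqrt(m2),
        "entropy": -sum(p * math.log2(p) for p in probs),
        "uniformity": sum(p * p for p in probs),
        "kurtosis": kurtosis,
        "skewness": skewness,
    }


def change_from_baseline(baseline, follow_up):
    """Percentage change per parameter; absolute change for kurtosis and skewness."""
    changes = {}
    for name, before in baseline.items():
        after = follow_up[name]
        if name in ABSOLUTE_CHANGE:
            changes[name] = after - before
        elif before == 0:
            changes[name] = float("nan")  # percentage change is undefined
        else:
            changes[name] = 100.0 * (after - before) / before
    return changes
```

With this convention, a flatter, less peaked voxel histogram at follow-up gives a negative kurtosis change, and a shift of voxels towards higher intensities gives a positive skewness change.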
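The definition of interlesional heterogeneity given above can be sketched as a simple rule applied per patient. This is an illustrative reading, not the study's implementation: the function names are hypothetical, and the sketch assumes that any increase in SUVmax counts as the progression direction and any decrease as the non-progression direction, with no minimum threshold.

```python
def discordant_lesions(suvmax_changes, clinical_response):
    """Indices of lesions whose SUVmax change opposes the clinical label.

    suvmax_changes: percentage change in SUVmax per index lesion.
    clinical_response: "PD" or "non-PD" from the clinical reference standard.
    Assumption: a zero threshold separates increase from decrease.
    """
    if clinical_response not in ("PD", "non-PD"):
        raise ValueError("clinical_response must be 'PD' or 'non-PD'")
    if clinical_response == "PD":
        # In a progressing patient, a falling lesion is discordant.
        return [i for i, c in enumerate(suvmax_changes) if c < 0]
    # In a non-progressing patient, a rising lesion is discordant.
    return [i for i, c in enumerate(suvmax_changes) if c > 0]


def has_interlesional_heterogeneity(suvmax_changes, clinical_response):
    """True if at least one lesion moves against the clinical reference standard."""
    return bool(discordant_lesions(suvmax_changes, clinical_response))
```

For example, a non-PD patient with SUVmax changes of -30 %, -12 % and +18 % would be flagged because the third lesion increased.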

Statistical Analysis
Statistical analysis was performed using SPSS for Windows, version 24 (IBM SPSS Statistics 24). After testing for normality, parametric or nonparametric tests were applied to each set of data. Data that were normally distributed were expressed as mean and standard deviation and evaluated using the paired t test. Data that were not normally distributed were expressed as median and range and evaluated using the Wilcoxon signed rank test or Mann-Whitney U test. For all statistical tests, a p value of ≤ 0.05 was considered statistically significant. Kaplan-Meier analysis was performed using the median value for each parameter to dichotomise the results, with differences in the curves tested with the log-rank test. Progression-free survival (PFS) was defined as the time between the date of the start of endocrine treatment and the date of disease progression, and overall survival (OS) was calculated from the start of endocrine treatment to the date of death or until censoring on the date of the last follow-up.
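The survival analysis described above was run in SPSS. As a rough illustration of the same steps (median dichotomisation, Kaplan-Meier estimation and a two-group log-rank test), a minimal standard-library Python sketch follows. The function names are hypothetical, patients whose value equals the median are assigned to the lower group (an assumption), and the p value uses the chi-square distribution with one degree of freedom.

```python
import math
from statistics import median


def dichotomise_by_median(values):
    """Label each patient 1 if the parameter change exceeds the cohort median, else 0."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event (progression or death) occurred, 0 if censored.
    """
    survival, curve = 1.0, []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve


def log_rank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

In use, `groups = dichotomise_by_median(kurtosis_changes)` followed by `log_rank_test(pfs_months, progressed, groups)` would compare PFS between patients above and below the median change, where `kurtosis_changes`, `pfs_months` and `progressed` are placeholder per-patient lists.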

Results
A total of 16 patients (72 lesions) were analysed. By the clinical reference standard, 4 patients (20 lesions) had PD at or before 24 weeks and 12 patients (52 lesions) had non-PD at 24 weeks. Patients were followed up for between 12 and 49 months (median 31.5). Four patients died during the follow-up period and all 16 patients progressed at between 2 and 32 months (median 11.3).
Kaplan-Meier analysis showed that on [18F]fluoride PET/CT, at 8 weeks, change in kurtosis had a statistically significant association with PFS (p = 0.033) and OS (p = 0.008) (Fig. 3a, b). On [18F]FDG PET/CT, change in TLM was significantly associated with OS (p = 0.041) and skewness with PFS (p = 0.005).

Discussion
Breast cancer is commonly associated with skeletal metastases and early evaluation of response or progression to treatment is vital to the optimisation of patients' clinical management. To our knowledge, this is the first report that has evaluated several first-order statistical features, including some heterogeneity parameters, for early treatment response assessment of breast cancer bone metastases compared to standard SUV measures using [18F]fluoride and [18F]FDG PET/CT. For [18F]fluoride PET/CT, several first-order global parameters showed superiority over SUVmax, either on a patient-based or lesion-based analysis in predicting PD, including volume-based parameters (MTV, TLM) and heterogeneity parameters (entropy, uniformity, kurtosis and skewness). Additionally, kurtosis was associated with both PFS and OS. Whilst recognition of PD is of most clinical importance, as these patients will need an early transition to second-line therapy, the majority of first-order parameters were also better than SUVmax at predicting non-PD. The observed changes in patients with PD were as expected, i.e., an increase in activity, volume and/or heterogeneity. In particular, a decrease in kurtosis was also associated with PFS and OS. This relates to an increase in spread (less peakedness) of the voxel intensity histogram (Fig. 4a, b) or greater "heterogeneity" in voxel values within lesions. Whilst changes in SUV and volume parameters, as well as kinetic analysis, have been reported for monitoring therapy with [18F]fluoride PET in bone metastases [9,27,28,29], to our knowledge, there are no data describing superiority of heterogeneity parameters in this situation. However, first-order heterogeneity parameters have shown predictive and prognostic ability in other cancers with [18F]FDG PET/CT and increased heterogeneity is usually associated with more aggressive tumours and poor treatment response [11,13,21,30].
For [18F]FDG PET/CT, no parameter performed better than SUVmax in predicting response, although an increase in TLM and skewness (shift of histogram to the right with more high intensity voxels) was associated with poor OS and PFS, respectively, whereas SUVmax was not prognostic. In accordance with our findings, changes in SUVmax have previously been shown to be valuable in assessing treatment response in breast cancer skeletal metastases [7,8].
Heterogeneity of response between metastases is a recognised phenomenon [31] and occurred in 11/16 of our patients with [18F]fluoride and 8/16 with [18F]FDG. The more frequent occurrence with [18F]fluoride may be partly explained by the flare phenomenon whereby a temporary increase in osteoblastic activity can occur in healing metastases in non-PD patients [32][33][34]. Despite this and the fact that uptake of the two tracers is dependent on differing underlying biological processes (tumour cell glucose metabolism with [18F]FDG and osteoblastic mineralisation of bone with [18F]fluoride), both tracers offered predictive and prognostic information at a level that would be of clinical utility on a per patient basis. We also observed that entropy, uniformity and kurtosis were significantly different in the 15/20 lesions with a concordant increase in SUVmax in PD patients compared to the 22/52 lesions with a discordant increase in patients with non-PD, the latter of which could be attributed to the flare phenomenon. This would require further prospective validation but the potential to differentiate an increase in uptake due to true progression from the flare phenomenon at 8 weeks would be of great clinical utility when using [18F]fluoride PET to measure early response, overcoming one of the limitations of bone-specific imaging.
Limitations of this study include a relatively small number of patients, although we were able to include a large number of individual metastases in the analysis (n = 72). In addition, there was a smaller number of patients with PD (n = 4) compared to non-PD which may have introduced an element of statistical bias. Whilst all patients had similar treatment, i.e., endocrine therapy, treatment regimens were not exactly the same and there was probably heterogeneity in response. Nevertheless, the main objective of this exploratory study was to measure response rather than treatment-specific effects. Though our study was prospective, these findings deserve further evaluation in larger cohorts as well as in the assessment of different types of therapy and bone metastases from other cancers. Whilst there is no gold standard for determining treatment response in bone metastases, our clinical reference standard was made as robust and clinically relevant as possible by including clinical findings, conventional imaging, biochemistry and tumour markers up to 24 weeks assessed by two oncologists in consensus and we were also able to include a survival analysis as an objective assessment of the measured parameters. Whilst criteria for response assessment have previously been reported for [18F]FDG SUVmax or SUVpeak [3,35], these criteria have not been applied to other first-order parameters and so we used optimal cut-offs in this exploratory study in addition to 25% cut-offs for [18F]FDG and [18F]fluoride conventional SUV parameters. We acknowledge that repeatability of first-order texture features is variable and that the optimal cut-offs for some parameters that have been reported as showing lower repeatability, such as uniformity and skewness [36], may be within the limits of repeatability measurements. Several novel parameters still performed better, even when an optimal SUVmax cut-off was used for comparison.

Conclusions
Our exploratory data demonstrate that certain first-order statistical features from [18F]fluoride and [18F]FDG PET, related to volume and heterogeneity, may provide incremental value over SUVmax in the prediction of treatment response and survival in breast cancer bone metastases treated with endocrine therapy, a finding that deserves confirmation in future prospective studies. In addition, with [18F]fluoride some heterogeneity parameters can potentially differentiate an increase in SUVmax due to flare from that due to progression of disease.
These findings may be of clinical utility as the prediction of early PD helps oncologists decide whether an earlier switch to more effective therapies is required, whereas non-PD patients would generally continue the same treatment if there were no significant toxicities. We also observed that intra-patient heterogeneity of response occurs commonly between metastases with both tracers and most parameters.