Prediction of therapy response in bone-predominant metastatic breast cancer: comparison of [18F] fluorodeoxyglucose and [18F]-fluoride PET/CT with whole-body MRI with diffusion-weighted imaging

Purpose To compare [18F]-fluorodeoxyglucose (FDG) and [18F]-sodium fluoride (NaF) positron emission tomography/computed tomography (PET/CT) with whole-body magnetic resonance with diffusion-weighted imaging (WB-MRI), for endocrine therapy response prediction at 8 weeks in bone-predominant metastatic breast cancer. Patients and methods Thirty-one patients scheduled for endocrine therapy had up to five bone metastases measured [FDG, NaF PET/CT: maximum standardized uptake value (SUVmax); WB-MRI: median apparent diffusion coefficient (ADCmed)] at baseline and 8 weeks. To detect the flare phenomenon, a 12-week NaF PET/CT was also performed if 8-week SUVmax increased. A 25% parameter change differentiated imaging progressive disease (PD) from non-PD and was compared to a 24-week clinical reference standard and progression-free survival (PFS). Results Twenty-two patients (median age, 58.6 years, range, 40–79 years) completing baseline and 8-week imaging were included in the final analysis. Per-patient % change in NaF SUVmax predicted 24-week clinical PD with sensitivity, specificity and accuracy of 60, 73.3, and 70%, respectively. For FDG SUVmax the results were 0, 100, and 76.2% and for ADCmed, 0, 100 and 72.2%, respectively. PFS < 24 weeks was associated with % change in SUVmax (NaF: 41.7 vs. 0.7%, p = 0.039; FDG: − 4.8 vs. − 28.6%, p = 0.005) but not ADCmed (− 0.5 vs. 10.1%, p = 0.098). Interlesional response heterogeneity occurred in all modalities and NaF flare occurred in seven patients. Conclusions FDG PET/CT and WB-MRI best predicted clinical non-PD and both FDG and NaF PET/CT predicted PFS < 24 weeks. Lesional response heterogeneity occurs with all modalities and flare is common with NaF PET/CT.


Introduction
Bone metastases are common in breast cancer, occurring in at least 70% of patients with advanced disease, and cause significant morbidity [1]. Patients with breast cancer and bone metastases have relatively long survival compared with patients with other cancers, which, coupled with the associated morbidity, has significant implications for healthcare costs [2,3]. Despite improved therapeutics, response rates are generally less than 50%, so accurate and timely methods of assessing treatment response are essential for optimal management [4]. There is, however, a recognized unmet clinical need to categorize skeletal metastases correctly as responding or non-responding at an early time point, because conventional methods, e.g., RECIST 1.1 measurements on computed tomography (CT) or magnetic resonance imaging (MRI), usually classify skeletal metastases as non-measurable disease [5,6]. Similarly, the isotope bone scan is considered to have poor sensitivity and specificity for detecting early response or non-response [7]. Without an objective early measure of non-response, patients with bone-predominant metastatic disease may therefore continue ineffective treatment for longer than necessary, delaying the transition to second- or third-line treatment and exposing them to unnecessary treatment-related side effects.
There is increasing evidence that functional imaging methods may address this need, although reported studies have evaluated individual modalities in isolation. The strongest supporting evidence is for [18F]-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT), which measures the reduction in glucose metabolism in responding metastases [8][9][10][11][12]. There is relatively little evidence for alternative imaging methods that nevertheless show promise in breast cancer or other cancers, including [18F]-sodium fluoride (NaF) PET/CT [13][14][15][16][17] and whole-body MRI including diffusion-weighted sequences (WB-MRI) [18][19][20]. NaF uptake depends on altered blood flow and mineralization in the metastatic bone microenvironment [21], whereas WB-MRI measures changes in the diffusivity of water molecules within tumors [20]. There is no robust comparative evidence that any of these methods is significantly superior in measuring treatment response, and no clear guidance on the preferred imaging technique in clinical practice [6].
Our hypothesis was that the three functional imaging methods, WB-MRI, FDG PET/CT, and NaF PET/CT, can detect functional and metabolic changes in breast cancer skeletal metastases as early as 8 weeks after commencing endocrine-based therapy. The aims were to measure baseline parameters and endocrine therapy-related changes with each method, to determine the accuracy of each method in predicting progressive disease (PD) or non-PD compared with a clinical reference standard, and to determine whether the magnitude of change with any method after 8 weeks of treatment was associated with progression-free survival (PFS).

Patients and methods
This prospective study received research ethics committee approval and all patients gave signed informed consent. Thirty-one patients over the age of 18 with histologically confirmed breast cancer, with either de-novo or progressive bone-predominant metastatic disease scheduled for new endocrine therapy, were recruited. Patients who were also scheduled for radiotherapy or colony-stimulating factors were excluded due to potential effects on functional imaging parameters.
All patients underwent standard follow-up with clinical assessments including a pain inventory [22], blood tests, including serum alkaline phosphatase and tumor marker CA15-3, and standard imaging, including bone scintigraphy and/or diagnostic CT. The reference standard for clinical PD or non-PD was determined by two oncologists (IS and JM, with 10 and 27 years of specialist oncology experience) in consensus using all the listed clinical assessments up to 24 weeks, or earlier if there was clinical PD. Patients were categorized as either having clinical PD or non-PD (stable disease or partial response) as this is the most relevant dichotomization for clinical management, i.e., continue treatment if non-PD without treatment toxicity or change treatment if PD [23].
Prior to commencing endocrine therapy, patients underwent baseline WB-MRI, FDG, and NaF PET/CT, which were repeated using the same imaging protocol after 8 weeks of therapy. When an increase in maximum standardized uptake value (SUVmax) was measured in any bone lesion, a further NaF PET/CT scan was performed at 12 weeks when possible, to help determine whether the early increase in activity was due to the flare phenomenon that has been reported with this tracer [24]. As flare is not a recognized phenomenon with WB-MRI or FDG PET/CT, no 12-week scan was performed with these modalities. As RECIST 1.1 precludes using CT to measure response in bone metastases unless there is a measurable soft-tissue component, we did not include stand-alone CT analysis in our protocol.

FDG PET/CT
Scans were acquired 60 min after intravenous injection of FDG (mean 348 ± 18 MBq) and all patients had blood glucose measurements of < 10 mmol/l. Images were acquired from skull base to upper thighs with 3 min per bed position using a GE Discovery 710 PET/CT scanner (GE Healthcare, Chicago, IL, USA). A low-dose CT scan (140 kV, 10 mA, 0.5 s rotation time, and 40-mm collimation) was performed at the start of imaging to provide attenuation correction and an anatomical reference. PET images were reconstructed with a time-of-flight ordered subset expectation maximization algorithm (2 iterations, 24 subsets) with a reconstructed slice thickness of 3.27 mm and pixel size 4.7 mm.

NaF PET/CT
Scans were acquired 60 min after injection of NaF (mean 228 ± 15 MBq). All other acquisition and reconstruction parameters were as for FDG PET/CT.

Scan analysis
Up to five of the largest bone metastases, as assessed on the NaF PET scans, were analyzed in each patient by the same reader (GC, with 25 years of radiology and PET experience), using the identical lesions in each of the three scan types. For WB-MRI, lesions were identified on T1-W and T2-W sequences and regions were drawn on DWI b900 images, which were automatically mapped to the accompanying ADC images for measurement of the median ADC in mm²/s (ADCmed). A reduction in ADCmed of > 25% was used to differentiate imaging PD from imaging non-PD [25,26]. For FDG and NaF PET/CT, the same lesions were selected and regions of interest (ROIs) were outlined semi-automatically using a threshold of 40% of maximum activity. SUVmax was measured from these regions, and a > 25% increase was used to differentiate imaging PD from imaging non-PD [27][28][29]. Individual lesion ADCmed and SUVmax measurements were recorded for per-lesion analysis, and mean values for each patient were recorded for per-patient analysis. The per-patient analysis provides clinically relevant results that can inform management decisions, while the per-lesion analysis allows intra-patient response heterogeneity to be reported, a topic of interest in oncology that may also affect treatment decisions. Intra-patient inter-lesional response heterogeneity was defined as present when a lesion showed a > 25% change that was discordant with the clinical reference standard for that patient. On NaF PET/CT, a flare was defined as present in any lesion that showed an initial increase in SUVmax at 8 weeks followed by a decline on the 12-week scan.
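The per-lesion and per-patient rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis software used in the study: the function names and modality labels are hypothetical, the per-patient value is assumed to be the mean of the lesion measurements at each time point (the text does not state whether lesion values or lesion % changes were averaged), and the heterogeneity check encodes one reading of the definition, in which a lesion in a patient with clinical PD counts as discordant only if it changes by > 25% in the responding direction.

```python
from statistics import mean
from typing import Sequence

THRESHOLD = 0.25  # 25% change, expressed as a fraction

PET = {"FDG", "NaF"}  # hypothetical labels: SUVmax is measured
MRI = {"WB-MRI"}      # hypothetical label: ADCmed is measured


def pct_change(baseline: float, follow_up: float) -> float:
    """Fractional change from baseline (0.25 means +25%)."""
    if baseline <= 0:
        raise ValueError("baseline value must be positive")
    return (follow_up - baseline) / baseline


def is_imaging_pd(modality: str, change: float) -> bool:
    """Imaging PD: SUVmax rise > 25% (PET) or ADCmed fall > 25% (WB-MRI)."""
    if modality in PET:
        return change > THRESHOLD
    if modality in MRI:
        return change < -THRESHOLD
    raise ValueError(f"unknown modality: {modality}")


def is_responding_change(modality: str, change: float) -> bool:
    """A > 25% change in the direction seen in responding lesions."""
    if modality in PET:
        return change < -THRESHOLD
    if modality in MRI:
        return change > THRESHOLD
    raise ValueError(f"unknown modality: {modality}")


def patient_change(baseline: Sequence[float], week8: Sequence[float]) -> float:
    """ASSUMPTION: per-patient value = mean of lesion values at each time point."""
    return pct_change(mean(baseline), mean(week8))


def is_discordant(modality: str, change: float, clinical_pd: bool) -> bool:
    """ASSUMPTION: one reading of the inter-lesional heterogeneity definition."""
    if clinical_pd:
        return is_responding_change(modality, change)
    return is_imaging_pd(modality, change)
```

For example, hypothetical lesion SUVmax values of 4.0 and 6.0 at baseline and 5.0 and 7.5 at 8 weeks give a per-patient change of exactly +25%, which the strict "> 25%" rule classifies as imaging non-PD.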

Statistical methods
It was calculated that 20 patients with baseline and 8-week scans would give 80% power for each modality to differentiate clinical PD from non-PD correctly 80% of the time (deemed a clinically useful level). Parameters were tested for normality, and differences between patients with clinical PD (PFS < 24 weeks) and those with non-PD were compared using Student's t test or the Mann-Whitney U test as appropriate. PFS was defined as the time from the first study scan to clinical progression; patients who had not progressed clinically by the end of the study were censored. Relationships between scan metrics and PFS were tested by comparing the metrics in patients with PFS < 24 weeks with those in patients with PFS > 24 weeks. Statistical analyses were conducted using IBM SPSS Statistics software (version 24). A p value of < 0.05 was considered statistically significant.
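The Mann-Whitney U comparison named above can be illustrated with a minimal pure-Python sketch. It is not the SPSS procedure used in the study: it computes U from mid-ranks (tied values share their average rank) and a two-sided p value from the large-sample normal approximation with a tie correction. With groups as small as the five patients with clinical PD, an exact p value would usually be preferred, so the approximation here is illustrative only; the normality test that chooses between this test and Student's t test is not shown.

```python
import math
from collections import Counter
from statistics import NormalDist
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for x, two-sided p) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance <= 0:
        raise ValueError("all values tied; the test is undefined")
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p
```

Complete separation of two groups of three, e.g. `mann_whitney_u([1, 2, 3], [4, 5, 6])`, gives U = 0 and an approximate two-sided p of about 0.05.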

Patients
Twenty-two patients (median age 58.6 years, range 40–79 years) completed at least one set of both baseline and 8-week imaging (18 with all three modalities, two with FDG alone, one with FDG and NaF, and one with NaF alone). The evaluable datasets were therefore WB-MRI: 18 patients, 76 lesions; FDG PET/CT: 21 patients, 90 lesions; and NaF PET/CT: 20 patients, 85 lesions. Nine patients did not undergo 8-week imaging (eight due to patient choice and one who required radiotherapy for incipient cord compression). Six patients had de novo metastatic disease and 16 had progressive disease prior to recruitment; apart from two patients with small-volume lung and liver metastases, all patients had only skeletal metastases. Endocrine therapy consisted of letrozole (n = 12), exemestane with everolimus (n = 6), tamoxifen (n = 3), and fulvestrant (n = 1). Bisphosphonates (zoledronic acid, n = 11, or ibandronate, n = 4) or denosumab (n = 7) were used as adjunctive therapy. By the clinical reference standard up to 24 weeks, five patients had PD and 17 had non-PD. Median PFS was 10.3 months (range, 2.6–47.5 months), with 15 patients alive at the end of the study, when they were censored.

NaF PET/CT
There was a significant difference in % change in SUVmax between patients with clinical PD (PFS < 24 weeks) and non-PD on a per-patient analysis (41.7 vs. 0.7%, p = 0.039). Of the 20 patients, three of the five with clinical PD showed a > 25% increase in SUVmax, and 11 of the 15 with clinical non-PD were concordant with the clinical reference standard (Table 1, Fig. 1a). On analysis of the 85 lesions, 54 were concordant with the clinical reference standard of non-PD, and 11 of 20 lesions in patients with clinical PD showed a > 25% increase in SUVmax (Table 1, Fig. 1b). Baseline SUVmax was not associated with clinical PD (p = 0.6). Twelve of the 85 lesions (14.1%) in seven patients showed changes discordant with the clinical reference standard and were categorized as showing an inter-lesional heterogeneous imaging response.
Eighteen lesions (21.2%) in seven patients showed an increase in NaF SUVmax at 8 weeks followed by a decrease at 12 weeks and were therefore categorized as showing a flare. Four of these seven patients had clinical non-PD at 24 weeks.
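The 12-week scan trigger and the lesion-level flare rule can be expressed as two small predicates. In this sketch the function names are hypothetical, "declined" is assumed to mean that the 12-week SUVmax is lower than the 8-week value (the text does not state whether the comparison is with the 8-week or the baseline scan), and lesions without a 12-week scan are treated as unassessable rather than as flares.

```python
from typing import List, Optional, Sequence, Tuple


def needs_week12_scan(baseline: Sequence[float], week8: Sequence[float]) -> bool:
    """Protocol trigger: SUVmax increased in any lesion at 8 weeks."""
    return any(w8 > b for b, w8 in zip(baseline, week8))


def is_flare(baseline: float, week8: float, week12: Optional[float]) -> bool:
    """Flare: SUVmax rose at 8 weeks, then fell on the 12-week scan.

    ASSUMPTION: the decline is judged against the 8-week value.
    """
    if week12 is None:  # no 12-week scan, so flare cannot be assessed
        return False
    return week8 > baseline and week12 < week8


def flare_lesions(lesions: Sequence[Tuple[float, float, Optional[float]]]) -> List[int]:
    """Indices of lesions meeting the flare rule; each lesion is (baseline, week8, week12)."""
    return [i for i, (b, w8, w12) in enumerate(lesions) if is_flare(b, w8, w12)]
```

Because the 12-week scan was performed only when possible, a lesion with an 8-week rise but no 12-week value stays unclassified here, which mirrors why an early NaF increase alone cannot separate PD from flare.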

FDG PET/CT
There was a significant difference in % change in SUVmax between patients with clinical PD (PFS < 24 weeks) and non-PD on a per-patient analysis (− 4.8 vs. − 28.6%, p = 0.005) and on a per-lesion analysis (− 5.0 vs. − 29.7%, p = 0.001). Of the 21 patients, 16 showed < 25% increase in SUVmax and were concordant with the clinical reference standard of non-PD. None of the five patients with clinical PD showed an increase in SUVmax > 25% (Table 1, Fig. 1a). On analysis of the 90 individual lesions, 68 were concordant with clinical non-PD, but only one of 20 lesions in patients with clinical PD showed a > 25% increase in SUVmax (Table 1, Fig. 1b). Baseline SUVmax was not associated with clinical PD (p = 0.65).
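The per-patient sensitivity, specificity, and accuracy reported in the abstract follow from a 2 × 2 table of the imaging call against the clinical reference standard. A minimal sketch (the class name is hypothetical), populated with the FDG counts stated above (none of the five clinical-PD patients called imaging PD; all 16 clinical non-PD patients called imaging non-PD), reproduces the reported 0, 100, and 76.2%:

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Per-patient imaging call (PD/non-PD) versus the clinical reference standard."""
    tp: int  # clinical PD, imaging PD
    fn: int  # clinical PD, imaging non-PD
    tn: int  # clinical non-PD, imaging non-PD
    fp: int  # clinical non-PD, imaging PD

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


# FDG PET/CT per-patient counts taken from the Results text
fdg = TwoByTwo(tp=0, fn=5, tn=16, fp=0)
print(f"{100 * fdg.sensitivity():.1f} {100 * fdg.specificity():.1f} {100 * fdg.accuracy():.1f}")
# prints: 0.0 100.0 76.2
```

The same class with the WB-MRI counts (tp=0, fn=5, tn=13, fp=0) gives an accuracy of 72.2%, matching the abstract; with zero true positives, sensitivity is 0% regardless of how well non-PD is identified.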
Five of the 90 lesions (5.6%) in four patients showed discordant changes to the clinical reference standard and were categorized as showing an inter-lesional heterogeneous imaging response.

WB-MRI
There was no significant difference in % change in ADCmed between patients with clinical PD (PFS < 24 weeks) and non-PD on a per-patient analysis (− 0.5 vs. 10.1%, p = 0.098), but there was on a per-lesion analysis (− 3.2 vs. 9.2%, p = 0.012). Of the 18 patients, 13 showed less than a 25% decrease in ADCmed and were concordant with the clinical reference standard of non-PD. None of the five patients with clinical PD showed a > 25% decrease in ADCmed (Table 1, Fig. 1a). On analysis of the 76 individual lesions, 54 were concordant with clinical non-PD, but only three of 20 lesions in patients with clinical PD showed a > 25% decrease in ADCmed (Table 1, Fig. 1b). Baseline ADCmed did not differ between patients with PFS < 24 weeks and those with PFS > 24 weeks (p = 0.46).
Two of the 76 lesions (2.6%) in two patients showed discordant changes to the clinical reference standard and were categorized as showing an inter-lesional heterogeneous imaging response. Representative NaF, FDG PET, and WB-MRI images from a patient who showed a response by the clinical reference standard are illustrated in Figs. 2, 3, and 4, respectively.

Discussion
Recognizing the limitations of conventional imaging in predicting treatment response in skeletal metastases, and the increasing adoption of novel functional imaging into oncologic practice, a direct comparison of three contending functional imaging methods in this role is timely. We have shown in this cohort of patients with bone-predominant metastatic breast cancer treated with endocrine therapy that changes in parameters reflecting tumor cellularity (DWI), tumor glucose metabolism (FDG PET/CT), and the bone microenvironment (NaF PET/CT) can be detected and quantified. All three modalities showed similar overall accuracy in predicting PD/non-PD, as determined by a clinical reference standard that used conventional clinical, blood, and imaging methods up to 24 weeks. In addition, for NaF and FDG PET/CT, the magnitude of parameter change differed significantly between patients with PFS < 24 weeks and those with longer PFS. While FDG PET/CT and WB-MRI performed well in predicting non-PD, which would allow patients to continue with therapy [26], the magnitude of change (reduction in SUVmax or increase in ADCmed) on a per-patient or per-lesion basis was greater with FDG (Fig. 1). However, neither WB-MRI nor FDG PET/CT predicted PD at this early 8-week time point. Both FDG PET/CT and WB-MRI primarily reflect effects on tumor cells (glucose metabolism [30] and restriction of water molecule motion influenced by cellularity and other tumor-related factors [31], respectively). Although both demonstrated > 25% changes within 8 weeks in many responding metastases, the biological changes associated with tumor progression were not of sufficient magnitude to be detected this early at this threshold, implying a non-linear relationship between changes in imaging parameters and clinical PD.
Nevertheless, a significant difference in % change in FDG SUVmax was seen between patients with PFS < 24 weeks and those with longer PFS, with no significant difference in ADCmed, implying that % change in FDG SUVmax may be the better prognostic metric. Our findings augment previous reports in which FDG PET/CT SUVmax was associated with PFS in skeletal, nodal, or visceral metastases from breast cancer [8,9] or with changes in tumor markers [10,11]. To our knowledge, no literature exists on response prediction or assessment using WB-MRI in skeletal metastases from breast cancer in humans. Preclinical data from a breast cancer model treated with the antiangiogenic agent bevacizumab, rather than endocrine treatment or chemotherapy, found DWI to be insensitive [32]. However, several small series report an increase in ADC in responding bone metastases from prostate cancer [19,20,33].
While the feasibility of NaF PET/CT for monitoring treatment response in breast cancer bone metastases has previously been shown [13], to our knowledge definitive results have only been shown in prostate cancer [14][15][16][17]. In our series, NaF PET/CT showed modest sensitivity for predicting clinical PD (three of five patients). However, its clinical utility would be limited, as imaging PD could not be differentiated from a treatment-induced flare, which was observed in some of our patients. Nevertheless, the NaF PET/CT results are of academic interest and suggest that the bone microenvironment changes reflected by this tracer are more rapid and of larger amplitude than the changes in tumor cellular processes demonstrated by FDG PET/CT and WB-MRI.
We observed a heterogeneous response between metastases most frequently with NaF PET/CT, predominantly reflecting the flare phenomenon seen with this tracer. Some interlesional response heterogeneity was also observed with FDG PET/CT (5.6%) and WB-MRI (2.6%), suggesting that biological response heterogeneity exists and may reflect tumor resistance to therapy in some clones.
This study has some potential limitations. Additional test-retest scans were not performed to measure repeatability, because the imaging protocol was already intensive for patients and because good repeatability and low inter-observer variation have previously been reported for all three imaging methods used in this study [19,25,26,28,29,34]. Partly due to the intensity of the protocol, nine patients did not complete any 8-week imaging and a small number of patients did not undergo all three imaging tests. This led to a lower number of evaluable patients than preferred, but nevertheless enabled a comparison of all three modalities in most patients, and a large number of metastases (up to 90 per modality) were included in the analysis. We adopted previously published thresholds of 25% change in image metrics to differentiate PD from non-PD; using alternative thresholds would not have significantly improved differentiation in this series, and 25% appeared to be a satisfactory level across the three modalities. Potential limitations of the clinical reference standard were offset by using consensus between two blinded oncologists, by allowing all standard clinical, blood, and imaging data to be included, and by allowing up to 24 weeks for assessment, a method we have previously shown to be robust [35]. While specific treatments differed slightly between patients, all were endocrine-based regimens without chemotherapy or other non-endocrine treatments, in order to minimize, as far as was practically possible, any heterogeneity due to different classes of treatment. Finally, we have only tested the imaging metrics in breast cancer patients undergoing endocrine-based therapy, and we cannot exclude different biological effects from other therapeutic regimens, e.g., chemotherapy, that could affect imaging parameters differently.

Conclusions
Changes in tumor cell characteristics and the bone microenvironment can be measured with functional imaging methods in bone-predominant metastatic breast cancer 8 weeks after commencing endocrine-based therapy, although the amplitude of change did not always reach the threshold for response categorization. Overall accuracy in predicting PD is similar across the three tested modalities, but FDG PET/CT and WB-MRI are more reliable than NaF PET/CT in determining non-PD at 8 weeks. Given the larger percentage changes in FDG SUVmax compared with ADCmed, and the association between the percentage change in FDG SUVmax and PFS, FDG PET/CT has an advantage for determining non-PD and would allow an early decision to continue therapy if there were no limiting side effects. In contrast, none of the three methods was reliable at 8 weeks in predicting subsequent clinical PD, with NaF PET/CT performing best at a sensitivity of 60%. However, flare and inter-lesional heterogeneity are relatively common at 8 weeks with NaF PET/CT, and because of these factors, this method would not be sufficiently reliable to change a patient's treatment at 8 weeks.

Compliance with ethical standards
Conflict of interest The authors declare they have no conflicts of interest.
Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Informed consent Informed consent was obtained from all individual participants included in the study.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.