Diagnostic and prognostic value of baseline FDG PET/CT skeletal textural features in diffuse large B cell lymphoma

Purpose Our purpose was to evaluate the diagnostic and prognostic value of skeletal textural features (TFs) on baseline FDG PET in diffuse large B cell lymphoma (DLBCL) patients.
Methods Eighty-two patients with DLBCL who underwent a bone marrow biopsy (BMB) and a PET scan between December 2008 and December 2015 were included. Two readers blinded to the BMB results visually assessed PET images for bone marrow involvement (BMI) in consensus, and a third observer drew a volume of interest (VOI) encompassing the axial skeleton and the pelvis, which was used to assess skeletal TFs. ROC analysis was used to determine the best TF able to diagnose BMI among four first-order, six second-order and 11 third-order metrics, which was then compared for diagnosis and prognosis in disease-free patients (BMB−/PET−) versus patients considered to have BMI (BMB+/PET−, BMB−/PET+, and BMB+/PET+).
Results Twenty-two out of 82 patients (26.8%) had BMI: 13 BMB−/PET+, eight BMB+/PET+ and one BMB+/PET−. Among the nine BMB+ patients, one had discordant BMI identified by both visual and TF PET assessment. ROC analysis showed that SkewnessH, a first-order metric, was the best parameter for identifying BMI with sensitivity and specificity of 81.8% and 81.7%, respectively. SkewnessH demonstrated better discriminative power than BMB and PET visual analysis for patient stratification: hazard ratios (HR), 3.78 (P = 0.02) versus 2.81 (P = 0.06) for overall survival (OS) and HR, 3.17 (P = 0.03) versus 1.26 (P = 0.70) for progression-free survival (PFS). In multivariate analysis accounting for IPI score, bulky status, haemoglobin and SkewnessH, the only independent predictor of OS was the IPI score, while the only independent predictor of PFS was SkewnessH.
Conclusion The better discriminative power of skeletal heterogeneity for risk stratification compared to BMB and PET visual analysis in the overall population, and more specifically in BMB−/PET- patients, suggests that it can be useful to identify diagnostically overlooked BMI.


Introduction
Diffuse large B-cell lymphoma (DLBCL) accounts for 30% to 58% of the non-Hodgkin's lymphoma series [1]. Positron emission tomography coupled with computed tomography (PET/CT) has become the standard non-invasive examination for the initial staging of DLBCL [2,3]. It improves the accuracy of staging compared to CT and leads to stage migration in 10 to 30% of patients. Consequently, fewer patients are undertreated or overtreated [4]. With regard to bone marrow involvement (BMI), focal bone marrow FDG uptake with or without increased diffuse uptake is more sensitive than bone marrow biopsy (BMB) but can overlook low-volume diffuse involvement of 10% to 20% of the marrow and discordant lymphoma (small cells) [5][6][7]. The proportion of patients in whom BMB is positive while FDG PET/CT is negative for BMI has been estimated at 3.1% [8]. This could be explained by the lack of consensus on whether diffuse bone marrow FDG uptake should be regarded as a positive or a negative finding, as highlighted in a recent meta-analysis on PET/CT for the detection of BMI in DLBCL [8]. Moreover, Paone et al. [9] found that PET/CT was more sensitive for the detection of concordant BM involvement (large cells) than discordant BM involvement (small cells). Therefore, a positive PET/CT is usually sufficient to designate advanced-stage disease, and BMB is not required. However, if the scan is negative, a BMB could be indicated to identify involvement relevant for a clinical trial or patient management, especially discordant histology.
Regarding the prognostic impact of BM status as determined by FDG PET/CT, discrepant findings have been published. In a retrospective study including 133 patients, Berthet et al. demonstrated that PET was an independent predictor for progression-free survival (PFS) but not overall survival (OS) in a multivariate analysis [5], whereas Hong et al. found no differences in PFS or OS between PET-positive and PET-negative patients [10]. A third study stated that the outcome of patients with positive PET findings was comparable to that of other patients with stage IV disease without positive BMB [6]. Finally, the latest study to date showed that bone marrow status assessed by baseline PET is an independent predictor of OS, with worse survival outcomes for patients with BMI among those with stage IV disease. These conflicting results suggest that further research is needed.
There is currently growing interest in oncology in using alternatives to visual or semi-quantitative PET assessment that are based on SUV metrics as diagnostic and prognostic indicators or probabilistic indicators of response to treatment. Until now, these metrics (textural features, TFs) were applied only to primary tumours and studied mainly as predictors of treatment response or prognostic factors [11,12]. We assume that in the framework of newly diagnosed DLBCL, FDG PET TFs may provide a more comprehensive quantitative assessment of bone involvement, in particular in doubtful patients displaying diffuse and heterogeneous skeleton uptake. Thus, the aim of this study was to evaluate (1) the value of textural features (TFs) for the diagnosis of bone involvement, especially in diffuse and discordant BMI, and (2) the prognostic value of TF-based bone marrow assessment in patients with newly diagnosed DLBCL.

Population
All patients diagnosed with diffuse large B cell lymphoma and who had a bone marrow biopsy (BMB) were retrospectively included from December 2008 to December 2015. In accordance with European regulations, French observational studies without any additional therapy or monitoring procedure do not need the approval of an ethical committee. Nonetheless, we sought approval to collect data for our study from the national committee for data privacy, the National Commission on Informatics and Liberty (CNIL), with registration n°2,081,250 v 0.

PET acquisition and reconstruction parameters
After a 15-min rest in a warm room, patients who had been fasting for 6 h were injected intravenously with 18F-FDG. Height, weight, the injected dose, the capillary glycaemia at the injection time and the exact delay between injection and the start of the acquisition were recorded for each patient. Body mass index was used to separate overweight and obese patients (≥ 25 kg/m²), for whom a longer time per bed position was used, from low to normal-weight patients (< 25 kg/m²).
All PET imaging studies were performed on a Biograph TrueV (Siemens Medical Solutions) with a 6-slice spiral CT (Computed Tomography) component. Additional technical details regarding the system and PSF reconstruction can be found elsewhere [13,14]. CT acquisition was performed first with the following parameters: 60 mAs, 130 kVp, pitch 1 and 6 × 2 mm collimation. Subsequently, the PET emission acquisition was performed in 3D-mode. For low to normal-weight patients and overweight to obese patients, the durations per bed position were 2 min 40 s and 3 min 40 s, respectively. Patients were scanned from the skull to the mid-thighs. All examinations were reconstructed using an OSEM algorithm with point spread function (PSF) modelling (HD; TrueX, Siemens Medical Solutions) with three iterations and 21 subsets without filtering. The matrix size was 168 × 168 voxels, resulting in isotropic voxels of 4.07 × 4.07 × 4.07 mm³. Scatter and attenuation corrections were applied.

PET visual interpretation
PET examinations were reviewed using MIM (MIM software, Cleveland, OH, USA, version 5.6.5). Two experienced readers, blinded to BMB results, visually assessed the bone status of each patient. PET/CT examinations were considered positive when one or more foci of bone uptake were seen on PET images, with or without a bone lesion on CT images. Diffuse and/or heterogeneous skeleton uptake was not considered a positive finding. In case of discrepancy, the examination was jointly reviewed to reach a consensus.

Extraction of PET bone textural features
The following procedure was performed in duplicate by one junior and one senior PET reader. For each examination, a preliminary volume of interest (VOI) involving the axial skeleton was drawn using CT densities [Hounsfield units (HU) >150]. To obtain a single VOI encompassing the spine and half of the pelvis, the VOI was manually adapted to exclude all unwanted bone areas. The chosen pelvic area was the one reported to be the site of BMB in the medical report, when mentioned. Otherwise, the right part of the pelvis was arbitrarily chosen. The only exception was the presence of a hip prosthesis, for which the contralateral pelvic area was chosen to avoid PET attenuation correction artefacts. The final CT VOI was then transferred to PET images and checked for all possible lymph node areas of increased FDG uptake in the vicinity of the skeleton (especially in the retroperitoneum) that could affect texture features because of a spill-over effect [15]. Areas of contiguous bone involvement were also manually excluded. Finally, the VOI was saved in DICOM-RT structure format (Fig. 1) so that it could be processed on LIFEx (version 2.0), third-party freeware developed by Buvat and co-workers [16,17] (www.lifexsoft.org).
Each PET dataset and corresponding VOI were loaded into LIFEx software to extract four first-order, six second-order and 11 third-order textural features. Index values were calculated using a single co-occurrence matrix taking into account all 13 spatial directions simultaneously [18,19].
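As an illustration of what a single co-occurrence matrix over all 13 spatial directions means, the following minimal sketch enumerates the 13 unique neighbour offsets of a 3D voxel grid and accumulates one symmetric matrix over them. It is not the LIFEx implementation: the function names, the nested-list volume layout and the assumption that grey levels are already discretised to integers are all hypothetical choices made for this example.

```python
from itertools import product


def unique_directions_3d():
    """Return the 13 unique neighbour offsets of a 3D voxel grid.

    The 26 neighbours of a voxel come in opposite pairs (d and -d).
    Keeping only the lexicographically positive member of each pair
    leaves 13 offsets.
    """
    return [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]


def cooccurrence_matrix(volume, mask, n_levels):
    """Accumulate one symmetric co-occurrence matrix over all 13 directions.

    volume: nested list [z][y][x] of integer grey levels in 0..n_levels-1
            (discretisation is assumed to have been done beforehand)
    mask:   nested list of the same shape, truthy inside the VOI
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    glcm = [[0] * n_levels for _ in range(n_levels)]
    for dz, dy, dx in unique_directions_3d():
        for z, y, x in product(range(nz), range(ny), range(nx)):
            z2, y2, x2 = z + dz, y + dy, x + dx
            if not (0 <= z2 < nz and 0 <= y2 < ny and 0 <= x2 < nx):
                continue  # neighbour falls outside the image
            if mask[z][y][x] and mask[z2][y2][x2]:
                i, j = volume[z][y][x], volume[z2][y2][x2]
                glcm[i][j] += 1
                glcm[j][i] += 1  # count the pair in both orders
    return glcm
```

Second-order indices such as contrast or correlation would then be computed from the normalised matrix; that step is omitted here.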

Statistical analysis
Quantitative data are presented as the mean ± standard deviation (SD) or median (interquartile range) when appropriate. Characteristics of populations were compared by using Fisher's exact tests or Chi-square tests for discrete variables and Mann-Whitney tests for continuous variables.
The agreement of VOIs and SkewnessH values between observers was evaluated by means of linear regressions, volume concordance indexes (Dice and Jaccard indexes) and Cohen's kappa. Kappa (κ) value was reported using the benchmarks of Landis and Koch [20]. Volume concordance indexes were computed as follows:
$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad \mathrm{Jaccard}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where A and B denote the VOIs drawn by the two observers, and ∩ and ∪ represent the intersection and the union of two VOIs, respectively.
For receiver operating characteristic (ROC) and survival analyses, BMB and PET results were taken as the gold standard. BMB−/PET− patients were considered disease-free (disease− patients), whereas BMB+/PET−, BMB−/PET+ and BMB+/PET+ patients were considered to have BMI (disease+ patients). ROC analyses [21] were used to define area under the curve (AUC), Youden indexes and optimal cut-off values of each metric for the diagnosis of BMI.
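The Dice and Jaccard indexes mentioned above can be sketched as follows, using their standard definitions. Representing each VOI as a collection of voxel coordinates is an assumption made for this example, as are the function names; the study's own software is not reproduced here.

```python
def dice_index(voi_a, voi_b):
    """Dice index: twice the overlap divided by the sum of both volumes."""
    a, b = set(voi_a), set(voi_b)
    if not a and not b:
        return 1.0  # two empty VOIs are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard_index(voi_a, voi_b):
    """Jaccard index: overlap divided by the union of both volumes."""
    a, b = set(voi_a), set(voi_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical VOIs from two observers, as (z, y, x) voxel coordinates.
observer_1 = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
observer_2 = {(0, 0, 1), (0, 1, 0), (1, 0, 0)}

dice = dice_index(observer_1, observer_2)        # 2 * 2 / (3 + 3)
jaccard = jaccard_index(observer_1, observer_2)  # 2 / 4
```

For any single pair of VOIs the two indexes are linked by Jaccard = Dice / (2 − Dice), so they rank observer agreement identically and differ only in scale.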
Survival analysis was performed using univariate and multivariate Cox regression models. OS and PFS univariate survival functions were calculated by using Kaplan-Meier survival analyses with log-rank tests to compare survival curves. For specific overall survival (OS), the end-point was defined as the time from diagnosis to the date of death from the lymphoma disease (lymphoma itself or treatment side-effects). For progression-free survival (PFS) the end-point was defined as the time from diagnosis to the point of relapse or progression.
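For readers unfamiliar with the product-limit estimator behind the univariate survival curves, a minimal sketch is given below. It only illustrates the estimator itself; the function name and the toy follow-up data are hypothetical, and the log-rank test and Cox models used in the study are not reproduced.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up duration of each patient
    events: 1 if the end-point occurred at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Censored and event patients both leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up in months; the second patient is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored patients reduce the number at risk without producing a step in the curve, which is why the drop at month 3 in this toy example is computed over two patients rather than three.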
Further ROC and survival analyses were conducted using overall survival (OS) and progression-free survival (PFS) data as the reference standard.
Statistical significance was considered at p < 0.05. Data were analysed using Graphpad Prism and MedCalc (MedCalc Software, Ostend, Belgium) software.

Population characteristics
The baseline examinations of 84 patients were included. Two patients were excluded because of non-contributive BMB results. The population's characteristics (n = 82) are summarised in Table 1. One PET examination (1.2%) did not fulfil the EANM procedure guidelines for tumour imaging [22]. The mean capillary glycaemia was 1.00 ± 0.20 g/l. The mean injected dose was 4.00 ± 0.29 MBq/kg, and the mean post injection imaging time was 62 ± 5 min. None of the patients
Textural feature ROC analyses for the diagnosis of bone involvement were highly statistically significant for all the first-order parameters, with p values <0.0001. Among second-order and third-order parameters, two of six parameters (contrast and correlation) and five of 11 parameters (SZE, HGZE, SZHGE, LZHGE, ZLNU), respectively, were found to have statistically significant ROC analyses (Table 2). The parameter displaying the highest Youden index (J = 0.6348) and area under the curve (AUC = 0.820) was SkewnessH. ROC analyses with OS and PFS as reference standard are displayed in Tables 3 and 4, together with the corresponding univariate survival analyses.
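The Youden index used to rank the parameters is J = sensitivity + specificity − 1, and the optimal cut-off is the threshold that maximises it. The sketch below shows one simple way to search for that threshold; the function name and the toy data are hypothetical, and the statistical packages used in the study may break ties or interpolate thresholds differently.

```python
def youden_optimal_cutoff(values, has_disease):
    """Scan observed values as candidate cut-offs and maximise Youden's J.

    A case is called positive when its value is strictly greater than
    the cut-off. Returns (cutoff, J, sensitivity, specificity); the
    first cut-off reaching the maximum is kept.
    """
    n_pos = sum(1 for d in has_disease if d)
    n_neg = len(has_disease) - n_pos
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, d in zip(values, has_disease) if d and v > cutoff)
        tn = sum(1 for v, d in zip(values, has_disease) if not d and v <= cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical feature values and reference-standard labels.
toy_values = [0.2, 0.5, 0.9, 1.1, 1.5, 2.0, 2.8, 3.1]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j, sens, spec = youden_optimal_cutoff(toy_values, toy_labels)
```

As a consistency check on the reported figures, the counts given in the Results for SkewnessH (18 of 22 disease+ patients above the cut-off, 49 of 60 disease-free patients at or below it) give 18/22 + 49/60 − 1 ≈ 0.6348, which matches the reported J.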
Linear regressions showed significant association between SkewnessH values and haemoglobin and LDH values, with r² values equal to 0.10 (p = 0.005) and 0.08 (p = 0.01), respectively. There was no significant association between SkewnessH values and lymphocyte level (r² = 0.04, p = 0.07). Nor was there any significant association between SkewnessH value and patient age (r² = 0.003, p = 0.21), thus suggesting that degenerative osteoarthritis was not a confounding factor.
There was a significant difference between mean SkewnessH values extracted from disease-free patients' images and those extracted from disease+ patients' images, with higher values in disease+ patients: 2.75 ± 1.575 versus 1.26 ± 0.968, p < 0.0001 (Fig. 2). With a SkewnessH cut-off value set to 1.26, the sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio and negative likelihood ratio were 81.8%, 81.7%, 62.1%, 92.5%, 4.46 and 0.22, respectively. Fifty-three (64.6%) patients had a SkewnessH value ≤1.26 and 29 (35.4%) patients had a SkewnessH value >1.26 (Table 1). There were four SkewnessH false negative (FN) results (two BMB−/PET+ and two BMB+/PET+ patients) corresponding to two patients with unifocal abnormality on PET images and two patients with bone involvement outside the VOIs. There were also 11 false positive findings among the 60 BMB−/PET− patients. Notably, the only BMB+/PET− patient with concordant bone involvement on BMB was correctly diagnosed with BMI when using SkewnessH: she was a normal-weight 58-year-old woman with bulky disease, an IPI score of 3 and a SkewnessH value equal to 1.40667.
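The diagnostic figures above follow directly from the 2 × 2 table implied by the text: 22 disease+ patients with four false negatives (18 true positives) and 60 BMB−/PET− patients with 11 false positives (49 true negatives). The short sketch below recomputes them; the function name is introduced for illustration only.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard accuracy measures from a 2 x 2 contingency table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Counts derived from the text for the SkewnessH cut-off of 1.26.
m = diagnostic_metrics(tp=18, fn=4, fp=11, tn=49)

for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The same counts account for the 29 patients above the cut-off (18 + 11) and the 53 patients at or below it (49 + 4).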
Representative PET/CT images and corresponding VOIs of two BMB−/PET− patients with SkewnessH values ≤1.26 and >1.26 are shown in Fig. 3.
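To make the meaning of a histogram skewness value concrete, the sketch below computes the conventional moment-based skewness of a set of voxel intensities. This is an assumption about the general form of the metric: LIFEx may compute SkewnessH on a discretised histogram or with a different normalisation, and the function name and toy values are hypothetical.

```python
def histogram_skewness(values):
    """Moment-based skewness: third central moment / variance ** 1.5.

    Positive values indicate a distribution with a long right tail,
    for example a few voxels with high uptake among many voxels with
    low uptake.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant intensities: no asymmetry
    return m3 / m2 ** 1.5


# Hypothetical voxel intensities.
symmetric = [1.0, 2.0, 3.0]
right_tailed = [1.0, 1.0, 1.0, 1.0, 10.0]

skew_symmetric = histogram_skewness(symmetric)
skew_right_tailed = histogram_skewness(right_tailed)
```

In this toy example the symmetric set has zero skewness, whereas a single high-intensity voxel among low-intensity ones pushes the value well above zero.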

Prognostic value of bone textural features at baseline staging
With a median follow-up of 25.7 months (range: 1.4-83.9 months), 13 patients (15.9%) experienced progression or relapse of their DLBCL, and 12 patients (14.6%) died from the lymphoma disease (lymphoma itself or treatment side-effects). For the whole population, the estimated PFS at 2 years was 82.2 ± 4.5%, and the estimated OS at 2 years was 84.5 ± 4.1%. In univariate analysis, using BMB results and PET bone marrow visual assessment, there was no significant difference between the PFS or OS of disease− (BMB−/PET−) patients and disease+ (BMB−/PET+, BMB+/PET− and BMB+/PET+) patients (Fig. 4). Using the quantitative bone marrow assessment based on SkewnessH values, there was a difference between PFS and OS of SkewnessH negative patients (≤1.26) and SkewnessH positive patients (>1.26). The estimated PFS at 2 years was 88.8 ± 4.8% and 70.7 ± 8.8% for SkewnessH negative patients (≤1.26) and SkewnessH positive patients (>1.26), respectively (p = 0.03). The estimated OS at 2 years was 92.0 ± 3.8% and 71.5 ± 8.5% for SkewnessH negative patients (≤1.26) and SkewnessH positive patients (>1.26), respectively (p = 0.02) (Fig. 4). Notably, among 60 BMB−/PET− patients, there was a significant difference between the 2-year PFS of the 11 patients who had a SkewnessH value >1.26 and that of the 49 patients who had a SkewnessH value ≤1.26: 63.6 ± 14.5% and 87.9 ± 5.1%, respectively (p = 0.03). For 2-year OS, a similar result was observed (p = 0.04) (Fig. 5).
Fig. 3 Representative images and VOI histograms of two BMB−/PET− patients. The patient displayed on panel (a) was considered positive for BMI according to PET textural feature assessment with a SkewnessH value equal to 5.30, whereas the patient displayed on panel (b) was considered negative with a SkewnessH value equal to 0.63. For each patient, from left to right, the maximum intensity projection (MIP) image, a coronal slice, a sagittal slice centred on the spine and the VOI histogram are displayed.
In multivariate analyses integrating the international prognostic index (IPI), Bulky status, haemoglobin, and SkewnessH, the only independent predictor of OS was the IPI (Table 5), and the only independent predictor of PFS was SkewnessH (Table 6).

Inter-observer VOI agreement
Linear regression showed a good agreement of VOI volumes (cc) between observers with an r² value equal to 0.87 (p < 0.0001) (Fig. 6a). Moreover, spatial concordance of VOIs was almost perfect with mean Dice and Jaccard indexes equal to 0.89 ± 0.02 and 0.81 ± 0.04, respectively (Fig. 6b). Concerning SkewnessH values, there was also a good agreement between observers with an r² value equal to 0.87 (p < 0.0001) (Fig. 6c). Inter-rater agreement for the diagnosis of BMI using SkewnessH was very good with a κ value of 0.81 (95% CI = 0.68-0.95).
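Cohen's kappa, used here for the binary SkewnessH-based diagnosis, corrects the observed agreement between two raters for the agreement expected by chance. A minimal sketch follows; the function name and the toy ratings are hypothetical, and the confidence interval reported above is not computed.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Proportion of cases on which both raters agree.
    p_observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Agreement expected if the raters labelled independently.
    p_expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if p_expected == 1:
        return 1.0  # both raters used a single identical label
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical binary calls (1 = SkewnessH positive) by two observers.
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

In this toy example the observers agree on three of four cases (0.75), chance agreement is 0.50, and kappa is therefore 0.5.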

Discussion
In the present study, conducted to determine the diagnostic and prognostic value of FDG skeletal TFs in DLBCL, the parameter displaying the best AUC and Youden index in diagnostic ROC analyses was SkewnessH. It is noteworthy that this metric is a first-order TF parameter, meaning that it is one of the simplest ones, based on the VOI histogram [23]. Further ROC analyses with OS and PFS data as the reference standard also showed that first-order metrics performed better than second-order and third-order ones. However, this method had the disadvantage of giving different cut-off values for OS and PFS survival analyses. Using BMB and PET as reference, SkewnessH ROC analysis gave a single cut-off value of 1.26. Using this cut-off, the sensitivity and specificity in detecting BMI were equal to 81.8% and 81.7%, respectively. Four false negative (FN) findings were observed among the 22 disease+ patients (BMB+/PET+, BMB+/PET− and BMB−/PET+ patients). Two of them were related to BMB− patients harbouring small and unifocal PET abnormalities. These patients demonstrated good PFS and OS with no relapse or death at 2 years. Therefore, one could wonder about the clinical relevance of such findings. The two remaining FN findings were due to substantial bone involvement located outside the VOI. Due to technical considerations, more exhaustive VOIs could not yet be applied. Nevertheless, efforts could certainly be made on software development to resolve this issue. However, in clinical practice, the main concern currently lies with negative PET scans for which BMI cannot be definitively excluded. Indeed, a recent meta-analysis estimated that the proportion of patients in whom BMB is positive while FDG PET is negative for BMI reached 3.1% of cases.
One could assume that this proportion of visually PET negative patients having actual BMI is certainly under-estimated because of the randomly selected and restricted BMB exploration, limited to a small zone of the pelvic bone.
Previous studies have shown that discrepancy between biopsy sites may occur in as many as 10% to 60% of non-Hodgkin lymphomas [24][25][26][27]. In our study, 18.3% of BMB−/PET− patients (11 out of 60 patients) were considered positive for BMI using SkewnessH (false positive findings). Moreover, these patients demonstrated worse PFS and OS at 2 years than BMB−/PET− patients who were SkewnessH negative. This result has to be confirmed in larger studies, but it suggests that low-volume involvement of the bone marrow can be overlooked by both BMB and visual PET analysis and that bone heterogeneity assessment could help its diagnosis. Additionally, quantitative PET assessment of BMI on baseline FDG-PET/CT using SkewnessH demonstrated better discriminative power than visual PET assessment for the prognostic stratification of patients in our overall population and may be of some help in PET-negative patients for the diagnosis of low-volume BMI. In addition, SkewnessH appeared to be the only independent predictor of PFS in multivariate analysis. A limitation of this study is that the drawing of the semiautomatic skeleton VOI is time consuming and that TF analysis has to be performed on third-party software that is currently not approved for clinical use. Additionally, most of the TFs have been shown to be sensitive to reconstruction parameters, and the thresholds determined in the present study could not be used for patients scanned on other PET systems. With regard to this issue, our group has shown that some TFs such as entropy are less sensitive to reconstruction variability between PET centres [28], but SkewnessH was not tested in that previous study. However, Shiri et al. [29] and Galavis et al. [30] reported consistent findings on the sensitivity of SkewnessH to reconstruction settings. To overcome this problem, we previously demonstrated that harmonised PET data could be considered [28,31].
Harmonised images were not used in the present study, as our PET centre has been EARL accredited since 2015, and we included patients scanned from 2008 to 2015 to ensure a sufficient follow-up period for the purpose of survival analysis. Of note, PSF reconstruction with no post-filtering step has been shown to have potential for more discriminative power in stratifying or ranking patients; therefore, future studies aiming at confirming our results should ideally be performed on images optimised for diagnosis (for instance with PSF or PSF + TOF and post filtering with a low kernel or no post filtering) as well as on images meeting harmonising standards. Perspectives other than testing FDG skeletal TFs on harmonised PET data in larger and multicentric series would be to investigate the diagnostic value of TFs in other lymphoma subtypes for which BMB is performed on a regular basis, such as Follicular lymphomas (FL) and Hodgkin lymphomas (HL). Depending on the additional diagnostic value of FDG skeletal TFs over visual assessment in these lymphoma subtypes, they might be of some help in obviating BMB in certain cases or for guiding the site of biopsy.

Conclusion
The better discriminative power of skeletal heterogeneity for risk stratification, compared to BMB and PET visual analysis in the overall population and more specifically in BMB−/PET− patients, suggests that it can be useful to identify BMI overlooked by PET visual analysis and BMB. The diagnostic value of FDG skeletal TFs should be confirmed with harmonised PET data in larger and multicentric series and in other lymphoma subtypes for which BMB is performed on a regular basis to determine whether TFs might be of some help in obviating BMB in certain cases or guiding the site of biopsy.
Acknowledgements The authors wish to thank Dr. Buvat and her team for having made the LIFEx software freely available to the scientific community.

Compliance with ethical standards
Conflicts of interest None to declare.
Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Informed consent In accordance with European regulation, French observational studies without any additional therapy or monitoring procedure do not need the approval of an ethical committee. Nonetheless, we sought approval to collect data for our study from the national committee for data privacy, the National Commission on Informatics and Liberty (CNIL), with the registration n°2,081,250 v 0.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.