Introduction

Follicular lymphoma (FL) is the most common indolent B-cell lymphoproliferative disorder of transformed follicular centre B cells, accounting for 20–25% of adult non-Hodgkin’s lymphomas (NHLs) worldwide1. Follicular lymphoma is characterized by diffuse lymphadenopathy, splenomegaly and often bone marrow involvement (BMI)2. Indeed, BMI defined by a positive bone marrow biopsy (BMB) has been reported in 52% to 55% of newly diagnosed FL patients3,4,5,6. BMI is an important factor in most clinical risk stratification indices, including the Follicular Lymphoma International Prognostic Index (FLIPI). The presence of BMI can change the treatment strategy, especially in patients who were thought to have early-stage disease before bone marrow assessment. 18F-fluorodeoxyglucose (18FDG) positron emission tomography (PET)/computed tomography (CT) is now used for the assessment of BMI in Hodgkin lymphoma (HL) patients7,8 and diffuse large B-cell lymphoma (DLBCL) patients9,10,11,12,13. Indeed, several studies have demonstrated sufficient diagnostic performance for these two histologic subtypes of lymphoma. In FL, data are sparse, come from smaller series of patients and show weaker diagnostic performance for BMI13,14,15,16,17. More specifically, the incidence of cases of positive BMB with negative visual PET examinations has been estimated to be 13%18. Therefore, PET is generally not used for BMI assessment in follicular lymphomas, and BMB arbitrarily taken from the iliac crest is preferred as the gold standard. However, the main shortcomings of BMB are inadequate sampling and possible pain, bleeding or infection complications19. Some studies have shown that a combination of PET and BMB could provide a more accurate BMI assessment than either PET or BMB alone20,21.

Today, there is growing interest in haematology in alternatives to visual or semiquantitative PET assessments that are based on textural features (TFs)22,23. Indeed, the basic visual interpretation of diffuse bone marrow involvement without focal bone lesions on PET can be difficult, leading to false-negative results. The diagnostic value of skeletal TFs compared to BMB and PET visual analysis on baseline 18FDG PET/CT in DLBCL patients has been demonstrated24. By extrapolation, we hypothesized that quantifying the metabolic heterogeneity of the skeleton could also significantly improve the pretherapeutic bone evaluation in FL patients. Therefore, the aim of this study was to evaluate the value of TFs for the diagnosis of BMI in FL patients.

Results

Population characteristics

Of the 113 FL patients identified from our database, 66 patients were ultimately included. Twenty-one patients were excluded because of missing BMB and 26 because of missing baseline 18FDG PET/CT. Fifty-nine patients were scanned on the Biograph TrueV PET system, and seven were scanned on the Vereos PET system. There were 36 bone-NEGATIVE patients (54.5%) and 30 bone-POSITIVE patients (45.5%). Among the bone-POSITIVE patients, there were four BMB−/PETVISU+ patients (13.3%), 14 BMB+/PETVISU− patients (46.7%) and 12 BMB+/PETVISU+ patients (40.0%). Focusing on BMB−/PETVISU+ patients, hypermetabolic lesions were located towards the axial skeleton and not in the appendicular skeleton, explaining the negativity of the BMB. Representative examples of each case are shown in Fig. 1.

Figure 1

Representative MIP PET, sagittal PET and axial PET/CT images of patients PET +/BMB + (a); PET −/BMB + (b) and PET +/BMB − (c).

The population characteristics are summarized in Table 1. Among the bone-NEGATIVE patients, there were 4 patients (11.1%) with stage 1 disease, 10 (27.8%) with stage 2, 13 (36.1%) with stage 3 and 9 (25.0%) with stage 4. There was no difference in the technical PET parameters between the bone-NEGATIVE and bone-POSITIVE groups of patients. The mean injected dose (MBq), uptake time (min) and glycaemia (g/L) were 289 ± 54.5 versus 287 ± 56.3 (p = 0.61), 58.32 ± 3.59 versus 59.15 ± 3.52 (p = 0.30) and 1.06 ± 0.29 versus 0.98 ± 0.11 (p = 0.21), respectively. Hip prostheses were encountered in only four patients, two with a unilateral hip prosthesis and two with a bilateral hip prosthesis. No other types of prosthesis were encountered.

Table 1 Patients’ characteristics.

Validation of previous results in the field

A previous study24 found skewness_HISTO to be a promising PET parameter to discriminate between bone-NEGATIVE and bone-POSITIVE DLBCL patients with a cut-off value set to 1.26. In the present database of FL patients, the optimal cut-off for skewness_HISTO was 1.20 (AUC = 0.750, 95% CI = 0.629–0.871, p < 0.0001), with sensitivity, specificity, PPV, NPV and accuracy values of 66.7%, 80.6%, 74.1%, 74.4% and 74.2%, respectively (Fig. 2). Among the 10 false-negative results, 3 would have been reclassified as positive on visual PET assessment because of lesions located outside the field of analysis, specifically on the rib cage or on the atlas and axis vertebrae. Seven false-positive results were also observed.
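The reported percentages can be checked from a 2 × 2 confusion matrix. The sketch below assumes the counts implied by the text (30 bone-POSITIVE and 36 bone-NEGATIVE patients, 10 false negatives and 7 false positives, hence 20 true positives and 29 true negatives); the function name `diagnostic_metrics` is illustrative and not part of the original analysis.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return standard diagnostic performance metrics as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts inferred from the text (assumption): 30 bone-POSITIVE patients with
# 10 false negatives, 36 bone-NEGATIVE patients with 7 false positives.
metrics = diagnostic_metrics(tp=20, fn=10, fp=7, tn=29)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these counts the function reproduces the sensitivity, specificity, PPV, NPV and accuracy quoted for skewness_HISTO.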

Figure 2

ROC curves for the diagnosis of BMI using BMB, visual PET, skewness_HISTO PET and pred.score PET assessments. (a) ROC curves comparison; (b) sensitivity, specificity, positive predictive value (PPV), and negative predictive values (NPV); (c) true positive, true negative, false positive and false negative rates.

Multivariable diagnostic value of PET radiomics for bone involvement at baseline staging

Thirteen 18FDG PET/CT variables out of the 26 analysed were significantly different between the bone-NEGATIVE and bone-POSITIVE patients (Table 2). The LASSO regression including all analysed PET radiomic features (n = 26) found variance_GLCM, correlation_GLCM, joint entropy_GLCM and busyness_NGLDM to have nonzero regression coefficients. Coefficient and cross-validation plots are provided in Fig. 3. Correlations between these four PET radiomic features can be seen in Fig. 4. The corresponding linear equation for the computation of the prediction score was as follows:

$$\begin{aligned} \mathrm{Pred.score} & = - 8.134 + 0.927 \times \mathrm{variance}_{\mathrm{GLCM}} + 10.272 \times \mathrm{correlation}_{\mathrm{GLCM}} \\ & \quad + 0.076 \times \mathrm{joint\ entropy}_{\mathrm{GLCM}} - 0.003 \times \mathrm{busyness}_{\mathrm{NGLDM}} \end{aligned}$$

Table 2 PET characteristics for the entire series, for bone-NEGATIVE and for bone-POSITIVE patients.

Figure 3

Coefficient (left panel) and cross-validation (right panel) plots of the LASSO analysis.

Figure 4

Correlation plots between PET radiomics retained by the LASSO analysis. Red dots represent bone-NEGATIVE patients, and blue dots represent bone-POSITIVE patients.

The mean pred.score of the entire series was −0.096 ± 1.383. Based on ROC analysis, a cut-off of −0.190 was found to be optimal for the diagnosis of BMI: AUC = 0.822 (95% CI = 0.721–0.924, p < 0.0001). The corresponding sensitivity, specificity, PPV and NPV values were 70.0%, 83.3%, 77.8% and 76.9%, respectively (Fig. 2). Twenty-seven patients had a pred.score > −0.190 and were considered positive for BMI, of whom six were false positives (BMB−/PET− patients). Additionally, nine false-negative results were observed, including seven BMB+/PETVISU− patients and two BMB+/PETVISU+ patients whose lesions were outside the field of quantitative PET analysis. These two patients could easily be recovered by visual analysis: one had a lesion on the rib cage and the other a lesion on the upper jaw. Ultimately, only 7 bone-POSITIVE patients (23.3%) would have been missed by combining skeletal PET quantification with visual analysis. When comparing the AUCs from ROC analyses for BMI assessment with BMB alone, visual PET alone, PET skewness_HISTO alone and PET pred.score (Fig. 2, Table 3), significant differences were found between BMB and visual PET assessments (p = 0.010) and between BMB and PET skewness_HISTO assessments (p = 0.015). No significant difference was observed between BMB and PET pred.score assessments (p = 0.097). No significant difference was found among the PET pred.score, visual PET alone and PET skewness_HISTO regarding the assessment of BMI.
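As an illustration, the linear prediction score and the −0.190 cut-off reported above can be written as a short function. This is a minimal sketch: the coefficients and cut-off come from the text, whereas the function names and the example feature values are hypothetical and do not correspond to any patient in the series.

```python
CUTOFF = -0.190  # optimal cut-off reported from the ROC analysis


def pred_score(variance_glcm, correlation_glcm, joint_entropy_glcm, busyness_ngldm):
    """Linear prediction score using the LASSO coefficients given in the text."""
    return (
        -8.134
        + 0.927 * variance_glcm
        + 10.272 * correlation_glcm
        + 0.076 * joint_entropy_glcm
        - 0.003 * busyness_ngldm
    )


def is_bmi_positive(score, cutoff=CUTOFF):
    """A patient is classified as BMI-positive when score > cutoff."""
    return score > cutoff


# Hypothetical feature values, chosen only to exercise the formula.
example = {
    "variance_glcm": 1.5,
    "correlation_glcm": 0.6,
    "joint_entropy_glcm": 4.0,
    "busyness_ngldm": 20.0,
}
score = pred_score(**example)
print(f"pred.score = {score:.4f}, BMI positive: {is_bmi_positive(score)}")
```

The strict inequality mirrors the wording of the text (pred.score > −0.190); how a score exactly equal to the cut-off should be handled is not stated in the source.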

Table 3 ROC curves results regarding the diagnosis of BMI using BMB, visual PET, skewness_HISTO PET and pred.score PET assessments.

The correlation between selected PET radiomics and biological characteristics was explored and is summarized in Table 4. Significant negative correlations were found between haemoglobin blood level and variance_GLCM (ρ = -0.447, p = 0.0003) and joint entropy_GLCM (ρ = -0.498, p < 0.0001).

Table 4 Correlations between biological variables and PET variables retained for pred.score computation.

Discussion

The aim of the present study was to extrapolate previous results obtained for the diagnosis of BMI using PET radiomics in DLBCL patients to FL patients.

Skewness was previously found to be a promising parameter for the identification of patients with BMB involvement without visually assessable focal lesions, with a positive likelihood ratio of 4.46. Interestingly, in our series of FL patients, the optimal cut-off value was consistent: 1.20 versus 1.26 previously in DLBCL. However, the diagnostic performance of skewness_HISTO for BMI was less impressive, with little additional value over visual PET assessment alone: the sensitivity and NPV were 66.7% versus 53.3% and 74.4% versus 72.0% for skewness_HISTO and visual PET assessments, respectively. Well-known differences in metabolic characteristics between FL and DLBCL could explain these results. In particular, FL uptake is usually less intense than that of DLBCL25. Another issue could be the marked difference in BMI rates at diagnosis between DLBCL and FL patients, with the rate of positive BMB estimated to be 15% in newly diagnosed DLBCL and 50% in FL6. Additionally, cases of BMB+/PET− patients were previously estimated to be only 3.1% in DLBCL26 but 13% in FL patients18. It is worth noting that this rate was even higher in our series, reaching 21% of patients. However, it should be emphasized that cases of pure diffuse FDG uptake were considered positive in the study performed by Nakajima et al., whereas they were considered negative in the present study, which could partly explain this difference. Moreover, BMB−/PET+ patients were also estimated at 13% and 12% in another publication26, meaning that BMB can itself miss bone marrow involvement.

All things considered, it seemed to us that a multivariable approach using radiomics could be more accurate.

As has been highlighted in the literature, radiomic index values are highly dependent on the segmentation method27,28,29,30. The CT bone segmentation methods used here to draw VOIs were semiautomatic, with very little manual intervention, and had already been shown to have high interobserver agreement, which supports their robustness24. Regarding other methodological considerations, the robustness of the radiomic indices to the intensity discretization method has been widely evaluated in the literature. Indices can be compared only if the same calculation parameters are used, which is the case here due to absolute resampling31.

In doing so, variance_GLCM, correlation_GLCM, joint entropy_GLCM and busyness_NGLDM were identified by LASSO analysis as potential variables of interest to build a linear prediction model. None of the histogram-based or size-zone matrix features were retained. It seemed that parameters extracted from GLCM or NGLDM were good candidates for describing skeletal tumour heterogeneity. Unlike histogram-based indices, calculated from original images, they reflect the spatial arrangement of voxel intensities. Even though statistical significance was not reached, with an optimal PET pred.score threshold set to −0.19, the sensitivity and NPV were improved compared to visual PET assessment alone: 70.0% versus 53.3% and 76.9% versus 72.0%, respectively (Fig. 2). Finally, even if the performance of the BMB appeared to be better than that of the PET pred.score, it was still notable that there was no statistically significant difference between the AUCs of the ROC curves of these two diagnostic tests (p = 0.097). This may suggest that PET pred.score BMI assessment could perform as well as BMB, provided that the model is strengthened with a larger database.

Notably, some examinations were found to be negative in terms of the PET pred.score but positive on visual PET assessment because of lesions located outside the VOIs. This result means either that CT bone segmentation has to be improved to encompass the whole skeleton or that visual and quantitative PET assessments have to be performed jointly. The current paradigm of radiomic analysis adds quantitative information to visual analysis or biology without totally replacing them32, and it seems that the best option would be to combine visual and quantitative PET assessments. In the present series, using this combined strategy, 7 bone-POSITIVE patients would have been missed compared to 14 patients using visual PET alone.

A more complex strategy combining clinical, biological and PET features should also be explored. However, the number of patients included in the present study did not allow us to test such strategies. We still looked for correlations between biological and PET variables and found significant negative correlations between haemoglobin level and both variance_GLCM and joint entropy_GLCM. Some studies have demonstrated that marrow hypermetabolism correlates with leukocyte and neutrophil levels, both of which are associated with a poor response to treatment33,34, but this was not observed in our series.

Furthermore, the limited number of included patients did not allow internal testing of the model on a held-out subset. Therefore, the reliability of such a model should be evaluated on an independent dataset, ideally acquired on a different PET system or from a different centre, for its performance to be definitively validated.

Applying a multivariable PET radiomics model to baseline 18FDG PET/CT images could be a promising path to improve the diagnosis of BMI in follicular lymphoma patients. Prospective and larger clinical studies are needed to strengthen the model and to definitively confirm this hypothesis.

Methods

Population

In this retrospective two-centre study, we enrolled 113 patients newly diagnosed with FL from November 2014 to May 2019 who were treated with a chemotherapy regimen. The inclusion criteria were as follows: patients over 18 years old, histopathologically proven FL, pretherapeutic bone marrow biopsy and 18FDG PET/CT. Clinical variables, including age at diagnosis, sex, body mass index, Ann Arbor stage, bulky mass, FLIPI score, first-line treatment type, serum haemoglobin level, serum platelet level, serum white cell level, serum β2-microglobulin (β2M) level, serum lactate dehydrogenase (LDH) level, serum albumin level, serum calcium level and serum alkaline phosphatase level, were recorded. All procedures performed in studies involving human participants were approved by the local ethics committee and were in accordance with the 1964 Helsinki Declaration. In accordance with European regulations, observational studies without any additional therapy or monitoring procedures do not need the approval of an ethics committee. Additionally, the need for signed informed consent was waived. The procedure was declared to the National Institute for Health Data, with registration no. F20201023145322.

PET acquisition and reconstruction parameters

Patients fasted for 6 h before undergoing the examination. After a 15-min rest in a warm room, they were injected intravenously with 4.0 MBq/kg of 18FDG. Height, weight, injected dose, capillary glycaemia at the injection time and the delay between injection and the start of the acquisition were recorded for each patient. All images were acquired and reconstructed according to version 2.0 of the European Association of Nuclear Medicine (EANM) guidelines35. PET imaging studies were performed on two different PET/CT systems:

  • A PET/CT Biograph TrueV PET system (Siemens Healthineers), with a reconstruction using 3 iterations and 21 subsets with point spread function (PSF) modelling, resulting in voxels of 2.0 × 4.0 × 4.0 mm. PET emission acquisition was performed from the skull to mid-thighs with 2 min 40 s and 3 min 40 s per bed position for normal-weight and overweight patients, respectively.

  • A Vereos PET system (Philips), with a reconstruction using 2 iterations and 10 subsets with point spread function (PSF) modelling, resulting in isotropic voxels of 2 × 2 × 2 mm. PET emission acquisition was performed from the skull to mid-thighs with 2 min per bed position.

Extraction of PET bone textural features

All images were analysed by the same reviewer with 5 years of experience in PET interpretation using MIM (MIM Software, Cleveland, OH, USA, version 5.6.5). For visual PET/CT assessment, examinations were considered positive when one or more obvious focal bone uptakes were seen on PET images, with or without bone lesions on CT images. Doubtful diffuse and/or heterogeneous skeletal uptake was not considered a positive finding. In case of discrepancy, the examination was jointly reviewed to reach a consensus with a second nuclear medicine physician having more than 10 years of experience in PET.

For textural analysis, the skeleton volumes of interest (VOIs) from the C3 vertebra to the upper third of femurs were automatically extracted from CT images for each examination (Supplemental Fig. 1).

In the case of hip prostheses, the zone was excluded to avoid PET attenuation correction artefacts. The final CT VOIs were then transferred to PET images. All possible lymph node areas of increased FDG uptake in the vicinity of the skeleton (especially in the retroperitoneum) that could affect texture features because of a partial volume effect were checked36. Finally, the VOIs were saved in DICOM-RT structure format so that they could be loaded in LIFEx software (version 5.1)37. For the resampling step, 64 discrete values over an SUV range of 0–30 and a spatial resampling of 2.0 × 4.0 × 4.0 mm were used. The following PET variables were extracted:

  • five conventional PET parameters: SUVmax, SUVpeak, SUVskewness, SUVkurtosis and SUVexcessKurtosis

  • six grey-level co-occurrence matrix (GLCM) parameters: inverse difference, angular second moment, variance, correlation, joint entropy and dissimilarity

  • three neighbourhood grey-level difference matrix (NGLDM) parameters: coarseness, contrast and busyness

  • eleven third-order metrics calculated from size-zone matrices: SZE (Short-Zone Emphasis), LZE (Long-Zone Emphasis), LGZE (Low Grey-Level Zone Emphasis), HGZE (High Grey-Level Zone Emphasis), SZLGE (Short-Zone Low Grey-Level Emphasis), SZHGE (Short-Zone High Grey-Level Emphasis), LZLGE (Long-Zone Low Grey-Level Emphasis), LZHGE (Long-Zone High Grey-Level Emphasis), GLNUZ (Grey-Level Non-Uniformity for Zone), ZLNU (Zone Length Non-Uniformity) and ZP (Zone Percentage).

GLCM index values were calculated using a single co-occurrence matrix simultaneously considering all 13 spatial directions.

All textural features were compliant with the benchmark of the image biomarkers standardisation initiative38.
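To make the resampling and co-occurrence steps concrete, the sketch below discretizes SUVs into 64 grey levels over the fixed 0–30 range and computes a joint entropy from a small co-occurrence matrix. It is a simplified illustration under stated assumptions: the binning rule (floor with clipping) and the helper names are ours and are not taken from LIFEx, and the toy example uses a single offset on a 2D patch rather than the 13 directions in 3D used in the study.

```python
import math
from collections import Counter


def discretize_absolute(suv, n_bins=64, suv_min=0.0, suv_max=30.0):
    """Map an SUV to a grey level in 1..n_bins using fixed (absolute) bounds.

    Assumed binning rule: equal-width bins, values clipped to the bounds.
    """
    width = (suv_max - suv_min) / n_bins
    clipped = min(max(suv, suv_min), suv_max)
    level = int((clipped - suv_min) // width) + 1
    return min(level, n_bins)


def glcm_probabilities(image, offset=(0, 1)):
    """Symmetric, normalised grey-level co-occurrence matrix for one 2D offset.

    Returns a dict mapping (level_i, level_j) to its joint probability.
    """
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + d_row, c + d_col
            if 0 <= rr < rows and 0 <= cc < cols:
                i, j = image[r][c], image[rr][cc]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count both orders to keep the matrix symmetric
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def joint_entropy(probabilities):
    """Joint entropy in bits: -sum(p * log2(p)) over nonzero entries."""
    return -sum(p * math.log2(p) for p in probabilities.values())


# Hypothetical 2 x 2 patch of SUVs (not patient data).
suv_patch = [[1.0, 1.2], [5.0, 5.3]]
levels = [[discretize_absolute(v) for v in row] for row in suv_patch]
p = glcm_probabilities(levels)
print(levels, joint_entropy(p))
```

Because the bounds and the number of bins are fixed, the bin width (30/64 SUV units here) is identical for every patient, which is what allows index values to be compared across examinations.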

Statistical analysis

Quantitative data are presented as the mean ± standard deviation (SD) or median (interquartile range) when appropriate. Characteristics of populations and PET radiomics were compared using Fisher’s exact tests for discrete variables and Mann–Whitney tests for continuous variables with Bonferroni correction. The combination of BMB and visual PET assessment, as described above, was taken as the gold standard for patient classification. BMB−/PET− patients were considered disease-free patients (bone-NEGATIVE patients), whereas BMB+/PET−, BMB−/PET+ and BMB+/PET+ patients were considered bone-POSITIVE patients. A least absolute shrinkage and selection operator (LASSO) regression algorithm with tenfold cross-validation was used to select features of interest, namely, those with nonzero coefficients. This regression method performs both variable selection and regularization to enhance the prediction accuracy and interpretability of the resulting statistical model39. A prediction score (pred.score) was computed for each patient by means of a linear regression combining all selected PET variables. Receiver operating characteristic (ROC) curves were used to define the optimal pred.score cut-off value for the diagnosis of BMI by maximizing the sensitivity and specificity according to the Youden index, and to compare diagnostic performance using the DeLong et al. methodology. Finally, Spearman correlation tests were used to determine the relationship between biological variables and PET radiomics of interest. Statistical analysis and figure preparation were performed using XLSTAT software (XLSTAT 2019: Data Analysis and Statistical Solution for Microsoft Excel. Addinsoft).
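The Youden-index step described above can be sketched as follows. This is an illustrative implementation, not the XLSTAT procedure used in the study: the function name, the choice of candidate cut-offs (the observed scores) and the toy data are assumptions.

```python
def youden_optimal_cutoff(scores, labels):
    """Return (cutoff, J, sensitivity, specificity) maximising Youden's J.

    J = sensitivity + specificity - 1. A case is called positive when its
    score is strictly greater than the cutoff; labels are 1 (bone-POSITIVE)
    or 0 (bone-NEGATIVE). Ties in J keep the lowest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cutoff)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


# Toy scores and reference labels (hypothetical, not patient data).
toy_scores = [-1.5, -1.0, -0.6, -0.2, 0.1, 0.5, 1.0, 1.6]
toy_labels = [0, 0, 0, 1, 1, 0, 1, 1]

cutoff, j, sensitivity, specificity = youden_optimal_cutoff(toy_scores, toy_labels)
print(cutoff, j, sensitivity, specificity)
```

On the toy data, the selected cut-off separates all positives from three of the four negatives; with real data, the same search would be run on the pred.score values against the composite BMB/visual PET reference.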

Ethical approval

The authors are accountable for all aspects of the work and guarantee that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All procedures performed in the studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration (as revised in 2013) and its later amendments or comparable ethical standards.

Consent to participate and for publication

In accordance with European regulations, French observational studies without any additional therapy or monitoring procedures do not need the approval of an ethics committee. Additionally, the need for informed signed consent was waived. Nevertheless, global information for people participating in research was provided, including a specific paragraph on the possibility of using health data for research purposes. The patient had the right to oppose the transmission of data covered by medical confidentiality that may be used and processed in the context of this research. The procedure was declared to the National Institute for Health Data with the registration no. F20201023145322.