A novel analytic approach for outcome prediction in diffuse large B-cell lymphoma by [18F]FDG PET/CT

Purpose This study aimed to develop a novel analytic approach based on 2-deoxy-2-[18F]fluoro-D-glucose positron emission tomography/computed tomography ([18F]FDG PET/CT) radiomic signature (RS) and International Prognostic Index (IPI) to predict the progression-free survival (PFS) and overall survival (OS) of patients with diffuse large B-cell lymphoma (DLBCL). Methods We retrospectively enrolled 152 DLBCL patients and divided them into a training cohort (n = 100) and a validation cohort (n = 52). A total of 1245 radiomic features were extracted from the total metabolic tumor volume (TMTV) and the metabolic bulk volume (MBV) of pre-treatment PET/CT images. The least absolute shrinkage and selection operator (LASSO) algorithm was applied to develop the RS. Cox regression analysis was used to construct hybrid nomograms based on different RS and clinical variables. The performances of hybrid nomograms were evaluated using the time-dependent receiver operator characteristic (ROC) curve and the Hosmer–Lemeshow test. The clinical utilities of prediction nomograms were determined via decision curve analysis. The predictive efficiency of different RS, clinical variables, and hybrid nomograms was compared. Results The RS and IPI were identified as independent predictors of PFS and OS, and were selected to construct hybrid nomograms. Both TMTV- and MBV-based hybrid nomograms had significantly higher values of area under the curve (AUC) than IPI in training and validation cohorts (all P < 0.05), while no significant difference was found between TMTV- and MBV-based hybrid nomograms (P > 0.05). The Hosmer–Lemeshow test showed that both TMTV- and MBV-based hybrid nomograms calibrated well in the training and validation cohorts (all P > 0.05). Decision curve analysis indicated that hybrid nomograms had higher net benefits than IPI. Conclusion The hybrid nomograms combining RS with IPI could significantly improve survival prediction in DLBCL. 
Radiomic analysis on MBV may serve as a potential approach for prognosis assessment in DLBCL.
Trial registration NCT04317313. Registered March 16, 2020. Public site: https://clinicaltrials.gov/ct2/show/NCT04317313
Supplementary Information The online version contains supplementary material available at 10.1007/s00259-021-05572-0.


Introduction
Diffuse large B-cell lymphoma (DLBCL) represents the most common type of lymphoid neoplasm [1]. Since over 30% of patients experience disease progression or relapse, early identification of high-risk patients is important for patient management [2]. Over the past two decades, the International Prognostic Index (IPI) has been recognized as a prognostic model, which is based on the properties of several clinical factors including age, Ann Arbor stage, extranodal involvement, serum lactate dehydrogenase (LDH) level, and performance status [3]. However, IPI is not suitable for predicting refractory disease, which might be due to its lack of information on intratumoral functional and metabolic profiles [4,5].
Positron emission tomography/computed tomography (PET/CT) with 2-deoxy-2-[18F]fluoro-D-glucose ([18F]FDG) is a representative of molecular imaging and transpathology [6], which has been applied as a routine imaging tool for staging and response assessment of lymphoma [7]. Several studies have indicated that PET semi-quantitative parameters, particularly maximum standardized uptake value (SUVmax), total metabolic tumor volume (TMTV), and total lesion glycolysis (TLG), might be independent prognostic factors in DLBCL [8][9][10]. However, those parameters only evaluate gross tumor metabolism and cannot fully depict the subtle metabolic heterogeneity within a targeted lesion. Recently, PET-based radiomics has been introduced as an innovative image analysis approach that can capture intratumoral metabolic heterogeneity and allow accurate prediction of clinical outcome in various malignancies, such as breast cancer, non-small cell lung cancer, and lymphoma [11][12][13]. Studies have shown that several single radiomic features, including long-zone high gray-level emphasis and the skeletal textural feature SkewnessH, were significant predictors of survival in DLBCL [14,15]. Other studies reported that the combination of multiple radiomic features, often defined as the radiomic signature (RS), may hold higher prognostic value than a single feature [16,17]. In addition, RS combined with clinical or genomic data can support more robust medical decision-making [18,19]. However, to the best of our knowledge, an RS based on [18F]FDG PET/CT for prognosis assessment of DLBCL has not yet been described. Furthermore, it remains unclear whether PET-based RS could add prognostic value to the IPI in DLBCL.
We hypothesized that RS combined with IPI score could help improve the prognosis assessment of DLBCL patients. Therefore, for the first time, this study investigated the prognostic value of the RS combined with IPI score in predicting the survival of DLBCL patients by [18F]FDG PET/CT.

Study population
We retrospectively enrolled patients with newly diagnosed DLBCL between July 2013 and July 2019 in The Second Affiliated Hospital of Zhejiang University School of Medicine. The inclusion criteria were (1) histopathologically confirmed DLBCL, (2) age over 18 years, (3) pre-treatment [18F]FDG PET/CT, and (4) initial treatment with R-CHOP (rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone) or R-EPOCH (rituximab, etoposide, prednisone, vincristine, cyclophosphamide, and doxorubicin). Patients were excluded if they had coexistent central nervous system lymphoma or other malignancies, or had an incomplete follow-up. In total, 152 patients were enrolled and divided into a training cohort (n = 100) and a validation cohort (n = 52) according to the time of enrollment. The flowchart of patient enrollment is shown in Supplemental Fig. 1.
Clinical variables including gender, age at diagnosis, cell of origin, performance status, B symptoms, Ann Arbor stage, serum LDH level, serum β2-microglobulin (β2-MG) level, extranodal involvement, and treatment regimens of each patient were recorded. The IPI score was calculated as described previously [3]. This study was approved by the institutional review board, and the requirement to obtain written informed consent was waived (Approval Number: 2019-350).

Patient treatment and follow-up
Patients were initially treated with standard first-line chemotherapy every 21 days. Response to treatment was assessed according to the Lugano classification [20]. Those with no response or progressive disease then received involved-site radiotherapy or autologous stem cell transplantation.
Follow-up was performed every 3 months after the completion of treatment and ended in July 2021. The endpoints were overall survival (OS), defined as the period from the initial diagnosis to death from any cause, and progression-free survival (PFS), defined as the period from the initial diagnosis to progression, relapse, or death from any cause.

PET/CT imaging protocol
All images were acquired and reconstructed according to the European Association of Nuclear Medicine (EANM) guidelines version 2.0 [21]. Patients fasted for at least 6 h and had a blood glucose level below 200 mg/dL before the PET/CT examination. PET/CT imaging was performed at a median uptake time of 67 min (range, 53-81 min) after intravenous injection of [18F]FDG (3.7 MBq/kg). Patients were scanned on a PET/CT scanner (Biograph mCT, Siemens Medical Solutions) with 5 min per bed position. A low-dose CT scan (120 kVp; 40-100 mAs; 5 mm slice thickness) was performed from the upper thigh to the base of the skull, followed by a PET scan. PET images were reconstructed with 4.07 × 4.07 × 3 mm3 voxels using the Siemens-specific TrueX algorithm with CT-based attenuation correction.
All standardized uptake values (SUVs) were normalized for body weight and corrected to the uptake time of 60 min throughout the study. The mean standardized uptake value (SUVmean) of the liver should be between 1.3 and 3.0, consistent with the EANM guidelines [21]. To ensure the accuracy and reproducibility of SUV measurements, a set of quality control (QC) procedures was undertaken, including daily QC, calibration, and cross-calibration. Both daily QC and calibration were performed using a 68Ge cylinder with a known radioactive concentration according to the manufacturer's protocol. The cross-calibration and normalization with time alignment were performed to evaluate the SUV bias according to the manufacturer's recommendations.
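As an illustration of the body-weight normalization mentioned above, the following minimal sketch computes a body-weight SUV from a measured activity concentration. It uses the standard SUV definition with the injected dose decay-corrected to scan time; the function name, argument names, and the example patient are hypothetical, and the additional correction to a 60 min uptake time is not reproduced because the text does not give its formula.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def suv_body_weight(conc_kbq_per_ml, injected_mbq, weight_kg, minutes_since_injection):
    """Body-weight SUV with the injected dose decay-corrected to scan time."""
    decay_factor = 2.0 ** (-minutes_since_injection / F18_HALF_LIFE_MIN)
    dose_kbq = injected_mbq * 1000.0 * decay_factor
    weight_g = weight_kg * 1000.0
    # Assumes a tissue density of 1 g/mL, so the ratio is dimensionless.
    return conc_kbq_per_ml / (dose_kbq / weight_g)


# Hypothetical patient: 70 kg, injected with 3.7 MBq/kg, scanned 60 min later,
# with a measured tissue concentration of 5 kBq/mL.
example_suv = suv_body_weight(5.0, 3.7 * 70.0, 70.0, 60.0)
```

Because the remaining dose shrinks with time, the same measured concentration yields a higher SUV at later scan times, which is one reason uptake time is standardized.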

PET image segmentation and feature extraction
PET images were analyzed by two experienced nuclear medicine physicians who were blinded to the patients' outcome. The volumes of interest (VOIs) were semiautomatically delineated using the LIFEx software (version 6.30, https://www.lifexsoft.org/index.php) with a fixed threshold of 41% SUVmax [21]. To reduce the influence of partial volume effects, lesions with a minimal diameter of 2 cm on CT or a minimum metabolic volume of 4.2 cm3 (if the lesion was not apparent on CT) were selected [22]. Bone marrow was considered involved if focal or multifocal lesions presented higher uptake than the liver [23]. For each patient, SUVmax, TMTV, and TLG were recorded. In addition, the metabolic bulk volume (MBV), defined as the metabolic volume of the largest lesion, was also recorded [24]. Delineations of MBV and TMTV in LIFEx software are shown in Fig. 1. To assess the time-varying effect on SUV measurements, liver SUVmean and SUVmax were compared between the training and validation cohorts, and also between pre-treatment and end-of-treatment PET scans for patients who underwent end-of-treatment PET/CT evaluation, using a 1.2 cm diameter VOI in the posterior right liver lobe, as previously reported [25].
A total of 1245 radiomic features from TMTV and MBV were extracted via PyRadiomics software [26], followed by z-score normalization. Feature extraction on TMTV was performed as previously described [13]. Specifically, TMTV was constructed by using the "save all in one" function in LIFEx, and the radiomic features were extracted across the entire metabolic tumor volume. Detailed descriptions of the extracted features are presented in Supplemental Table 1.
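The fixed-threshold delineation and the volume parameters described above can be illustrated with a minimal sketch. It is not the LIFEx implementation: each lesion is represented here as a flat list of voxel SUVs from a hypothetical pre-drawn region, the inclusive comparison at the threshold is an assumption, and TLG uses the standard definition (SUVmean × metabolic volume). All function and variable names are hypothetical.

```python
def segment_41_percent(suv_voxels, voxel_volume_ml, fraction=0.41):
    """Return (suv_max, metabolic_volume_ml, tlg) for one lesion."""
    suv_max = max(suv_voxels)
    threshold = fraction * suv_max
    # Keep voxels at or above 41% of the lesion SUVmax (inclusive: assumption).
    selected = [v for v in suv_voxels if v >= threshold]
    volume_ml = len(selected) * voxel_volume_ml
    suv_mean = sum(selected) / len(selected)
    return suv_max, volume_ml, suv_mean * volume_ml


def tmtv_and_mbv(lesions, voxel_volume_ml):
    """TMTV is the sum of lesion volumes; MBV is the volume of the largest lesion."""
    volumes = [segment_41_percent(v, voxel_volume_ml)[1] for v in lesions.values()]
    return sum(volumes), max(volumes)


# Voxel size from the imaging protocol: 4.07 x 4.07 x 3 mm3, converted to mL.
VOXEL_ML = 4.07 * 4.07 * 3.0 / 1000.0

# Hypothetical toy lesions (voxel SUVs).
toy_lesions = {"lesion_a": [10.0, 5.0, 4.0, 2.0], "lesion_b": [4.0, 3.0, 2.0, 1.0]}
toy_tmtv, toy_mbv = tmtv_and_mbv(toy_lesions, VOXEL_ML)
```

With a relative threshold, a lesion with a very high SUVmax can end up with a smaller delineated volume than a less avid lesion of the same physical size, which is why the choice of segmentation method matters for volume-based parameters.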

Feature selection and RS construction
Intra-class correlation coefficients (ICCs) were used to evaluate the inter- and intra-observer agreement for 3 conventional and 1245 radiomic features [27]. Features with both inter- and intra-observer ICCs of over 0.75 were retained. The least absolute shrinkage and selection operator (LASSO) Cox regression algorithm with tenfold cross-validation was applied to select the optimal features with non-zero coefficients in the training cohort [28].
Fig. 1 (caption fragment) ... nodes were semiautomatically delineated using the 41% SUVmax threshold method. The VOI of the inguinal lymph node (green) represents the MBV. c TMTV (blue) was constructed using the "save all in one" function in LIFEx.
The RS based on MBV (MBV-RS) and TMTV (TMTV-RS) were developed by linearly combining the selected features weighted by their corresponding LASSO coefficients. The cut-off values of the RS were identified by X-tile software (version 3.6.1, Yale University).
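The reproducibility filter can be sketched as follows. The text does not state which ICC form was used, so this example assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement); the function names and the toy ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per observer."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def retain_reproducible(inter_icc, intra_icc, threshold=0.75):
    """Keep features whose inter- and intra-observer ICCs both exceed the threshold."""
    return sorted(
        name
        for name in inter_icc
        if inter_icc[name] > threshold and intra_icc.get(name, 0.0) > threshold
    )


# Two observers who differ by a constant offset: consistent, but not in absolute agreement.
offset_icc = icc_2_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
```

In the toy example the two observers rank the subjects identically, yet the systematic offset lowers the absolute-agreement ICC below 0.75, so such a feature would be dropped under this assumed ICC form.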
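The linear combination that defines the RS can be written as a short sketch. The feature names, coefficients, and cut-off below are hypothetical placeholders, not the values selected in the study, and treating a score strictly above the cut-off as high risk is an assumption.

```python
def radiomic_signature(z_features, coefficients):
    """Weighted sum of z-scored features using their non-zero LASSO coefficients."""
    return sum(coef * z_features[name] for name, coef in coefficients.items())


def risk_group(rs_value, cutoff):
    """Dichotomize the RS at a pre-specified cut-off (strict inequality: assumption)."""
    return "high" if rs_value > cutoff else "low"


# Hypothetical coefficients for two retained features.
lasso_coefficients = {"glcm_contrast": 0.5, "firstorder_entropy": -0.25}

# Hypothetical z-scored features for one patient; unselected features are ignored.
patient = {"glcm_contrast": 2.0, "firstorder_entropy": 1.0, "shape_sphericity": 3.0}

rs = radiomic_signature(patient, lasso_coefficients)
group = risk_group(rs, cutoff=0.5)
```

Features whose LASSO coefficient was shrunk to zero simply do not appear in the coefficient table, so they contribute nothing to the score.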

Construction of hybrid nomograms
Univariate Cox regression analysis was performed to investigate the prognostic values of the RS and clinical variables. All significant variables were then entered into a multivariate Cox regression. MBV- or TMTV-based hybrid nomograms for PFS and OS prediction (MBV-HN_PFS, TMTV-HN_PFS, MBV-HN_OS, and TMTV-HN_OS) were then established on the basis of the regression coefficient of each variable that remained significant in the multivariate Cox analysis [29]. Based on the established nomogram, a risk score was calculated for each patient in the training cohort. An optimal cut-off point of the risk score was determined by maximizing the Youden index to stratify patients into low-risk or high-risk groups [30].
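The Youden-index cut-off selection can be illustrated with a minimal sketch. It simplifies the setting to a binary outcome at a fixed time point and ignores censoring, which is an assumption and not the time-dependent analysis used in the study; the function name and toy data are hypothetical.

```python
def youden_cutoff(scores, events):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A patient is classified as high-risk when score >= cutoff.
    Requires at least one event and one non-event.
    """
    positives = sum(events)
    negatives = len(events) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, e in zip(scores, events) if s >= cutoff and e == 1)
        tn = sum(1 for s, e in zip(scores, events) if s < cutoff and e == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # ties keep the lowest cut-off
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical nomogram risk scores and event indicators (1 = event).
toy_scores = [10, 20, 30, 40, 50, 60]
toy_events = [0, 0, 1, 0, 1, 1]
toy_cutoff, toy_j = youden_cutoff(toy_scores, toy_events)
```

The cut-off found this way in a training set would then be applied unchanged to a validation set, mirroring the procedure described in the text.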

Model performance assessment
Time-dependent receiver operator characteristic (ROC) curve analysis was conducted to investigate the predictive accuracies of RS, clinical variables, and hybrid nomograms [31]. Sensitivity, specificity, positive predictive value, and negative predictive value were calculated accordingly. The cut-off values of RS, clinical variables, and risk score of hybrid nomograms determined in the training cohort were applied to the validation cohort. The areas under the curve (AUCs) of the hybrid nomograms and clinical variables were compared using the DeLong test [32]. The Hosmer-Lemeshow test was used to evaluate the goodness-of-fit of the hybrid nomograms [33]. Decision curve analysis (DCA) was applied to determine the clinical utilities of the hybrid nomograms by quantifying the net benefits under different threshold probabilities in the whole cohort [34].
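The net benefit quantified in DCA can be sketched with the standard formula, net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. This sketch assumes a binary outcome at a fixed horizon and ignores censoring, which is a simplification of survival-based DCA; the function names and toy data are hypothetical.

```python
def net_benefit(predicted_risk, events, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(events)
    tp = sum(1 for p, e in zip(predicted_risk, events) if p >= threshold and e == 1)
    fp = sum(1 for p, e in zip(predicted_risk, events) if p >= threshold and e == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(events, threshold):
    """Reference strategy: classify every patient as high-risk."""
    prevalence = sum(events) / len(events)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical predicted 2-year event probabilities and observed events.
toy_risk = [0.9, 0.8, 0.2, 0.1]
toy_outcome = [1, 1, 0, 0]
model_nb = net_benefit(toy_risk, toy_outcome, 0.5)
treat_all_nb = net_benefit_treat_all(toy_outcome, 0.5)
```

A decision curve is obtained by evaluating these functions over a range of thresholds; a model is considered useful at thresholds where its net benefit exceeds both the treat-all and treat-none (net benefit of zero) strategies.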

Statistical analysis
All statistical analyses were performed by using R (version 3.6.1, http://www.r-project.org) and SPSS software (version 25.0, IBM). The differences in clinical characteristics between the training and validation cohorts were assessed by using the chi-square test and independent t-test, where appropriate. Survival functions were estimated by Kaplan-Meier analysis, and survival distributions were compared by using the log-rank test. A P value of < 0.05 was considered statistically significant.
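For readers less familiar with the Kaplan-Meier estimator, the following minimal sketch shows the product-limit calculation. It is an illustration in Python rather than the R or SPSS routines used in the study, and the function name and toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time per patient; events: 1 = event observed, 0 = censored.
    """
    curve = []
    survival = 1.0
    event_times = sorted(set(t for t, e in zip(times, events) if e == 1))
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        observed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - observed / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up in months: the second patient is censored at 2 months.
toy_curve = kaplan_meier([1, 2, 3], [1, 0, 1])
```

Censored patients do not produce a step in the curve, but they reduce the number at risk for later event times, which is how incomplete follow-up is accounted for.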

Patient characteristics and outcome
The patients' characteristics are summarized in Table 1. No significant difference in clinical characteristics was observed between the training and the validation cohort (P = 0.062-0.888). No time-varying effect on SUV measurement was identified (Supplemental Table 2). The median follow-up period of the whole cohort was 42.5 months (range 4-96 months). By the end of follow-up, 49 patients (32.2%) had a PFS event (with a median of 11.5 months), while 41 patients (27%) died (with a median of 14 months).

Relationship between MBV and TMTV
The location of the tumor bulk is summarized in Supplemental Table 3. MBV was significantly correlated with TMTV (Pearson's correlation coefficient r = 0.778; P < 0.0001) (Supplemental Fig. 2). No significant difference was found in AUCs between MBV and TMTV for predicting PFS and OS in either the training cohort (P = 0.161 and P = 0.526, respectively) or the validation cohort (P = 0.967 and P = 0.940, respectively) (Supplemental Fig. 3).

Construction of hybrid nomograms
Univariate Cox analysis showed that β2-MG, B symptoms, IPI score, MBV-RS_PFS, and TMTV-RS_PFS were significantly associated with PFS, while IPI score, MBV-RS_OS, and TMTV-RS_OS were significantly associated with OS (Supplemental Table 6). In the multivariate analysis, the RS (MBV-RS_PFS, TMTV-RS_PFS, MBV-RS_OS, and TMTV-RS_OS) and IPI score were independent predictors of PFS and OS (Supplemental Table 7), and were selected to build the hybrid nomograms based on MBV (Fig. 2a) and TMTV (Fig. 2b). The cut-off values of risk score for MBV-HN_PFS, MBV-HN_OS, TMTV-HN_PFS, and TMTV-HN_OS correspond to 51, 69, 54, and 63 total points, respectively.

Model performance assessment
The diagnostic performances of hybrid nomograms, RS, and IPI score are presented in Table 2. For PFS prediction, the TMTV-based hybrid nomogram had significantly higher AUCs than the IPI score in both the training cohort (0.828 vs. 0.701, P < 0.001) and the validation cohort (0.783 vs. 0.663, P = 0.041). Significant differences in AUC were also found between the MBV-based hybrid nomogram and IPI score in both the training cohort (0.835 vs. 0.701, P < 0.001) and the validation cohort (0.787 vs. 0.663, P = 0.017). No significant difference was observed between the two hybrid nomograms in either the training cohort (P = 0.456) or the validation cohort (P = 0.971). There was no significant difference between MBV-RS_PFS/TMTV-RS_PFS and IPI score in the two cohorts (all P > 0.05). The Hosmer-Lemeshow test showed that both TMTV- and MBV-based hybrid nomograms calibrated well in the two cohorts (all P > 0.05). DCA of 2-year PFS indicated that TMTV- and MBV-based hybrid nomograms had higher net benefits than IPI score (threshold probability over 7% and 6%, respectively) (Fig. 3a).
For OS prediction, the AUCs of the TMTV-based hybrid nomogram were significantly higher than those of the IPI score in both the training cohort (0.818 vs. 0.713, P = 0.005) and the validation cohort (0.789 vs. 0.652, P = 0.038). Similar results were also observed between the MBV-based hybrid nomogram and IPI score in both the training cohort (0.831 vs. 0.713, P < 0.001) and the validation cohort (0.792 vs. 0.652, P = 0.013). There was no significant difference between TMTV- and MBV-based hybrid nomograms in either the training cohort (P = 0.242) or the validation cohort (P = 0.965). Also, no significant difference was observed between MBV-RS_OS/TMTV-RS_OS and IPI score (all P > 0.05). Both TMTV- and MBV-based hybrid nomograms calibrated well in the two cohorts (all P > 0.05). TMTV- and MBV-based hybrid nomograms had higher net benefits than IPI score if the threshold probability was over 9% and 4%, respectively (Fig. 3b).

Survival prediction
Kaplan-Meier estimates showed that patients could be stratified into distinct subgroups according to IPI score (Fig. 4) and RS (Fig. 5) (all P < 0.05). By combining RS with IPI, we observed that both MBV- and TMTV-based hybrid nomograms demonstrated a more distinct risk stratification than IPI alone, with larger differences between subgroups and improved hazard ratios (all P < 0.05) (Fig. 6).

Discussion
In the present study, we developed RS and hybrid nomograms from pre-treatment [18F]FDG PET/CT.
The most important finding of our study was that the RS composed of multiple radiomic features could improve the prognostic value beyond the conventional IPI score. The hybrid nomograms combining RS with IPI could help identify high-risk individuals with poorer survival outcomes, achieving significantly higher AUCs and contributing to more distinct risk stratifications than IPI alone. This is consistent with a very recent study indicating that a single radiomic feature, run length non-uniformity, could provide additional prognostic value to the IPI in DLBCL [35]. However, compared with the run length non-uniformity reported in that study, the RS in our study showed more significant P values in the multivariate analysis (e.g., the P value of the MBV-based RS for PFS prediction was 0.001). Similarly, another study identified a single radiomic feature, long-zone high gray-level emphasis, as an independent predictor of 2-year event-free survival (with a sensitivity of 0.60) [14]. By comparison, the RS in our study showed higher sensitivities for survival prediction (e.g., the MBV-based RS had a sensitivity of 0.84 for PFS prediction in the training cohort).
In our study, neither MBV- nor TMTV-based RS showed significantly higher AUCs than IPI score. This observation is in line with previous studies demonstrating that the predictive ability was comparable between RS and clinical variables [36,37]. A possible explanation is that IPI score and RS possess different properties in phenotyping disease characteristics. IPI score is based only on clinical factors, while RS represents imaging features that reflect the intratumoral metabolic heterogeneity. Since the complex nature and biologic processes of malignancy involve multiple components, taking both clinical and imaging features into account may provide a more comprehensive disease characterization and a better prognostication.
In this study, we reported the first attempt to develop hybrid nomograms by combining the PET-based RS with IPI, which is intended to provide an individualized estimate of survival and could serve as an easy-to-use tool for clinical decision-making. Taking previous findings and our results together, we speculate that PET-based radiomics and IPI could be complementary and synergistic for estimating survival in DLBCL.
Radiomic analysis for lymphoma is challenging, at least in part due to the inter- and intratumoral heterogeneity, and the complexity of isolated lesion segmentation, especially when disease is disseminated [38]. In light of these concerns, we performed radiomic analysis on the metabolic volume of the largest lesion, which was defined as MBV in our study, and compared its performance with that based on TMTV. Our results demonstrated that radiomic analyses on MBV and TMTV both perform well in predicting survival, which is in line with previous reports [13,14]. Moreover, no significant difference was found between the performance of MBV- and TMTV-based hybrid nomograms. As measuring MBV is technically easier and faster than measuring TMTV, our results indicated that radiomic analysis on MBV could be a feasible approach for prognosis assessment in DLBCL.
We also compared the prognostic value of MBV and TMTV. While TMTV has been commonly reported as a potential prognostic indicator in DLBCL [9,39], very few studies have focused on the prognostic effect of MBV. A recent study suggested that MBV was an independent predictor of OS and had a strong correlation with TMTV [24]. Consistently, ROC analysis in our study showed no significant difference between MBV and TMTV in survival prediction, suggesting that MBV holds prognostic value comparable to that of TMTV. In addition, for the 25% of patients in the validation cohort who had discordant MBV and TMTV, MBV accurately predicted the outcome regardless of TMTV, which was in accordance with the previous finding that MBV might have greater influence on survival than TMTV [24], and indicated that MBV could be a surrogate marker of TMTV.
Our results showed that conventional PET parameters including SUVmax, TMTV, and TLG were not retained for model construction. However, these parameters were reported to be predictive of survival in DLBCL [8][9][10]. This discrepancy may be attributed to the differences in the methods employed for feature selection and model building. In our study, we applied the LASSO Cox regression algorithm which is considered suitable for screening high-dimensional features that are most strongly associated with patient outcome and avoiding overfitting [40]. As shown in our results, only radiomic features were finally selected via this algorithm, indicating that radiomic features were correlated with tumor volume and might provide more accurate prognostic information than conventional PET parameters in DLBCL.
In our study, we applied the 41% SUVmax method, which has been recommended by the EANM and identified as an effective approach for prognosis assessment in DLBCL [21,41,42]. The results of the ICC analysis showed that the majority of assessed features had good intra- and inter-observer agreement (ICC > 0.75), which is consistent with a recent study demonstrating that the 40% SUVmax method could improve the repeatability of most radiomic features [43]. However, it has been revealed that the radiomic features could be influenced by different segmentation methods [44]. Future studies are required to use multiple segmentation methods and explore optimized methods through more advanced deep learning techniques [45,46].
There are some limitations to this study. First, protein expressions and gene rearrangements of MYC and BCL-2 are acknowledged prognostic factors but were not evaluated in our study because these data were not available for all patients. Second, one should be cautious when extrapolating these findings, as this is a retrospective single-center study with a relatively small sample size. Therefore, our results need to be further validated in prospective multi-center studies involving a larger cohort of patients.

Conclusion
In this study, we developed a novel analytic approach based on RS and IPI score for predicting the outcome of patients with DLBCL by [18F]FDG PET/CT, which showed significant predictive performance. MBV-based radiomic analysis may serve as a potential approach for prognosis assessment in DLBCL.
Funding This work was supported by a grant from the National Science Foundation of China (No. 81725009).

Data availability
The data sets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Code availability Not applicable.

Declarations
Ethical approval All procedures were performed in accordance with the principles of the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. The study design and exemption from informed consent were approved by the Institutional Review Board of The Second Affiliated Hospital of Zhejiang University School of Medicine (Approval No. 2019-350).
Informed consent For this type of study, formal consent is not required.

Conflict of interest The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Fig. 6 Kaplan-Meier estimates of PFS and OS according to MBV- and TMTV-based hybrid nomograms (HN) in (a) the training cohort and (b) the validation cohort. Hazard ratio (HR) with 95% confidence interval and log-rank P value are reported.