Introduction

Multiple myeloma (MM) is considered an incurable hematological malignancy. The duration and quality of life for patients continue to improve [1]. At the same time, MM is a very heterogeneous disease with wide variations in clinical course and outcome among patients, largely due to the fact that plasma cells display distinct malignant potential, e.g., based on their cytogenetic profile or genetic alterations [2]. The prognosis and patient survival in MM are affected by multiple factors, among which tumor burden plays a key role [3]. Tumor burden is estimated using the Durie-Salmon staging [4] and the International Staging System (ISS) [5], and nowadays the Revised International Staging System (R-ISS) [6].

For an accurate individual assessment of the extent of the disease and its prognosis, imaging has evolved into a key component. Among cross-sectional imaging modalities, which have replaced skeletal radiography, traditionally used in the Durie and Salmon system, [18F]FDG PET/CT is gradually gaining in importance. Performed at the initial diagnosis of MM, [18F]FDG PET/CT has been shown to provide significant prognostic information based on the presence and number of focal lesions and diffuse bone marrow (BM) infiltration [7,8,9,10]. In addition, [18F]FDG PET/CT is considered to be the most appropriate method for assessing treatment response in this disease because of its ability to reliably distinguish metabolically active from inactive myeloma lesions [8, 9].

However, [18F]FDG PET/CT does have certain limitations concerning individual MM evaluation. Due to the different patterns of BM involvement as well as the non-negligible incidence of concomitant, myeloma-related events of adverse prognostic impact, such as the development of paramedullary disease (PMD), extramedullary disease (EMD), and pathological fractures, the standardization of assessment and reporting of the PET/CT examinations may be difficult to obtain, which, in turn, negatively affects inter-observer reproducibility in scan interpretation [11, 12].

One approach to address the issue of standardization of [18F]FDG PET/CT evaluation involves the volumetric assessment of the metabolic activity in the whole BM compartment—mainly through the calculation of the parameters metabolic tumor volume (MTV) and total lesion glycolysis (TLG)—which incorporates information both on focal myeloma lesions and diffuse BM infiltration [13, 14]. However, the accurate calculation of these volumetric parameters may prove to be a challenging task, requiring considerable computing power and fast and reproducible computer programs to allow for proper segmentation and correction of background activity and partial volume effect [15].

Our group has recently validated a novel artificial intelligence (AI)–based tool for whole-body volumetric assessment of BM metabolic activity on PET/CT images in MM. We demonstrated that automated BM segmentation and calculation of MTV and TLG are feasible, while AI-derived quantitative PET parameters significantly correlate with the results of visual analysis of scans as well as with the biopsy-derived BM plasma cell infiltration rate, highlighting the potential role of deep learning methods towards optimization and standardization of PET/CT interpretation [16].

In this context, and in an attempt to deepen the existing knowledge on the perspectives of AI in PET/CT for MM diagnostics, we investigated the prognostic role of automatically obtained whole-body volumetric calculations of BM metabolism in patients with newly diagnosed MM in the current study. In addition, the study aimed to identify potential [18F]FDG uptake thresholds for effective BM segmentation in terms of patient prognosis.

Materials and methods

Patients

Forty-four patients (29 male, 15 female; mean age 58.2 years) with previously untreated MM based on the criteria established by the International Myeloma Working Group (2003) were included in this analysis (Table 1) [17]. All patients received bortezomib-based induction therapy followed by high-dose chemotherapy (HDT) and autologous stem cell transplantation (ASCT). Twenty-one patients were enrolled in the prospective, open-label, randomized GMMG-MM5 phase III trial (EudraCT No. 2010–019173-16), which compared two different bortezomib-based induction regimens, followed by HDT, ASCT, and lenalidomide maintenance for 2 years or until complete response [18], while the remaining 23 patients were treated with comparable regimens and ASCT outside the trial. Patients of this cohort were previously studied by our group in other analyses but with different approaches and shorter follow-up as presented here [19, 20]. No patient had received chemotherapy, granulocyte colony-stimulating factor (G-CSF), or erythropoietin before the study inclusion. The study was conducted in accordance with the Declaration of Helsinki principles, with institutional approval by the ethical committee of the University of Heidelberg (S-076/2010) and the Federal Agency of Radiation Protection in Germany (“Bundesamt für Strahlenschutz”). All patients gave written informed consent prior to participation in this study.

Table 1 Patient characteristics (N = 44)

PET/CT data acquisition

All patients underwent whole-body [18F]FDG PET/CT at diagnosis before commencement of treatment. Imaging was performed 60 min after injection of the radiopharmaceutical from the skull to the toes with a scan duration of 2 min per bed position. A dedicated PET/CT system (Biograph mCT, S128, Siemens Co., Erlangen, Germany) with an axial field of view of 21.6 cm with TruePoint and TrueV operated in a three-dimensional mode was used. A low-dose attenuation CT (120 kV, 30 eff mA) was used for attenuation correction of the PET data and for image fusion. All PET images were attenuation-corrected and an image matrix of 400 × 400 pixels was used for iterative image reconstruction. Iterative image reconstruction was based on the ordered subset expectation maximization (OSEM) algorithm with two iterations and 21 subsets as well as time of flight (TOF).

PET/CT data analysis

Automated PET/CT data analysis and quantification

Automated [18F]FDG PET/CT image segmentation and volumetric quantification were performed as described previously [16]. Briefly, the applied deep learning–based methodology consists of the following three steps:

  1. 1.

    CT-based organ segmentation: With the use of a convolutional neural network [21], segmentation of the skeleton, the liver, and muscles was performed. The bones were divided into bones of the axial skeleton, including the vertebrae, scapulae, clavicles, sternum, ribs, sacrum, and pelvic bones, and into bones of the extremities, including the humeri and femora. In order to avoid the effect of the intense physiological [18F]FDG uptake from the brain, the skull was excluded frοm the evaluations.

  2. 2.

    Application of standardized uptake value (SUV) threshold(s) in the relevant organs: The CT-based segmentation was transferred to the SUV PET images, and then, different SUV thresholds were applied to identify BM infiltration. All pixels with SUV above or equal to the threshold were segmented as positive for MM infiltration, after employment of specific steps to mitigate the effect of potential spillover of tracer uptake from adjacent tissues into the bone mask due to the poor resolution of PET images [16]. In the present study, ten different SUV thresholds were employed to define pathological [18F]FDG uptake in the skeleton. Four of these thresholds had already been tested in the initial application of the tool in MM where they showed a significant correlation with the BM plasma cell infiltration rate (approaches 1–4) [16]. The remaining six were developed based on modifications of the previous approaches with the aim of optimizing or complementing the existing thresholds (approaches 5–10). In particular, the thresholds applied were the following:

    • Approach 1 Axial skeleton: SUV ≥ liver SUVmedian × 1.1. Extremities: SUV ≥ muscle SUVmedian × 4.

    • Approach 2 Axial skeleton: SUV ≥ liver SUVmedian × 1.5. Extremities: SUV ≥ muscle SUVmedian × 4.

    • Approach 3 Axial skeleton and extremities: SUV ≥ 2.5, according to Terao et al. [14].

    • Approach 4 Axial skeleton: SUV ≥ 2.5. Extremities: SUV ≥ 2.0.

    • Approach 5 Axial skeleton and extremities: SUV ≥ liver SUVmedian.

    • Approach 6 Axial skeleton: SUV ≥ liver SUVmedian. Extremities: SUV ≥ muscle SUVmedian × 4.

    • Approach 7 Axial skeleton and extremities: SUV ≥ liver SUVmedian × 1.2.

    • Approach 8 Axial skeleton: SUV ≥ liver SUVmedian × 1.2. Extremities: SUV ≥ muscle SUVmedian × 4.

    • Approach 9 Axial skeleton and extremities: SUV ≥ liver SUVmean.

    • Approach 10 Axial skeleton and extremities: SUV ≥ 3.0 (Table 2).

  3. 3.

    Refinement of the resulting regions using postprocessing and subsequent calculation of whole-body MTV and TLG: Using the resulting masks, the total whole-body MTV could be estimated as the volume of the segmented pathological uptake in each patient. In particular, MTV (mL) represents the volume of myeloma lesions visualized on PET/CT with SUV greater than a predefined threshold. Similarly, TLG was calculated as the product of average SUV and MTV for the segmented regions (TLG = SUVmean × MTV).

Table 2 The different SUV thresholds applied for definition of pathological tracer uptake in the BM with the AI tool, their median values (range) in the studied cohort, and the respective r and p-values of the correlation analysis between MTV, TLG, and the BM infiltration rate by malignant plasma cells

Application of IMPeTUs

Two nuclear medicine physicians (first and last authors) independently analyzed the [18F]FDG PET/CT images on an Aycan workstation. Disagreements between the readers were resolved by consensus. Image interpretation was based on IMPeTUs [11], which briefly consider the following parameters:

  • BM metabolic state calculated in the lower lumbar spine in the absence of focal tracer enhancement, based on the 5-point Deauville score (DS): score 1, no uptake at all; score 2, ≤ mediastinal blood pool uptake; score 3, > mediastinal blood pool uptake, ≤ liver uptake; score 4, > liver uptake + 10%; score 5, >  > liver uptake (twice).

  • Number of focal, [18F]FDG-avid medullary lesions, consistent with MM (Fx): F1, no lesions; F2, 1–3 lesions; F3, 4–10 lesions; F4, > 10 lesions.

  • Location of focal, [18F]FDG-avid medullary lesions, consistent with MM: skull, spine, other.

  • Degree of [18F]FDG uptake of the hottest MM lesion, based on DS score: 1, no uptake at all; score 2, ≤ mediastinal blood pool uptake; score 3, > mediastinal blood pool uptake, ≤ liver uptake; score 4, > liver uptake + 10%; score 5, >  > liver uptake (twice).

  • Number of lytic lesions on CT (Lx): L1, no lesions; L2, 1–3 lesions; L3, 4–10 lesions; L4, > 10 lesions.

  • Presence of at least one fracture on CT.

  • Presence of PMD, defined as a hypermetabolic bone lesion extending through the cortical bone and involving the surrounding soft tissues.

  • Presence of EMD, defined as hypermetabolic myeloma lesion in the soft tissues without bone involvement.

Clinical parameters, BM plasma cell infiltration, and fluorescence in situ hybridization

BM aspirates or biopsies were performed within 4 weeks of the [18F]FDG PET/CT examination and prior commencement of treatment. The routine BM biopsy procedure in our center involves collection of 1–3-cm-long BM core taken from the iliac crest. Percentage of BM infiltration by malignant plasma cells was assessed via Giemsa-stained bone marrow smears. The infiltration rate is the ratio of the number of plasma cells to the number of all nucleated cells in BM. Fluorescence in situ hybridization was performed, as described previously [22] on CD138-purified plasma cells using the following probes: 1q21, 5p15, 5q35, 8p21, 9q34, 11q22.3, 13q14, 15q22, 17p13, and 19q13. We also investigated immunoglobulin H (IgH) translocations using an IgH break-apart probe as well as probes for t(11;14), t(4;14), and t(14;16). The R-ISS score was used to define high-risk disease. According to this prognostic system, stage R-ISS I included patients of ISS stage I (serum β2-microglobulin level < 3.5 mg/L and serum albumin level ≥ 3.5 g/dL); no high-risk chromosomal abnormalities, defined as deletion 17p13, and/or translocation t(4;14) and/or translocation t(14;16); and normal lactate dehydrogenase (LDH) level; stage R-ISS III included ISS stage III (serum β2-microglobulin level > 5.5 mg/L) and high-risk chromosomal abnormalities or high LDH level; stage R-ISS II included all the other possible combinations [6].

Statistical analysis

For all approaches, MTV and TLG measurements showed a skewed distribution. Therefore, median and range values are reported. Consequently, correlation analysis of MTV and TLG measurements with BM infiltration rate was based on Spearman rank correlation. Progression-free survival (PFS) was defined as time from PET/CT to disease progression or death from any cause, whichever occurs first, and overall survival (OS) was defined as the time from PET/CT until death from any cause or last follow-up. Kaplan–Meier estimates were generated and median PFS and OS estimated. Median follow-up time was determined by inverse Kaplan–Meier estimation. For univariable comparison of PFS and OS, log-rank test was used, dichotomizing the quantitative variables at their median. Univariable Cox proportional hazards regression analysis was applied for the MTV and TLG measurements and parameters included in IMPeTUs. Multivariable Cox proportional hazards regression analysis was applied to investigate association between survival time of patients and multiple predictors simultaneously. For parameters highly correlated with each other, such as MTV and TLG, only one was included at a time in the model. No correction for multiple testing was performed as the study was exploratory. The receiver operating characteristic (ROC) curve was used to investigate the discrimination performance of MTV and TLG for prediction of survival in the first 2 years after PET/CT. The area under the curve (AUC) was calculated, and the cut point optimizing the Youden index, i.e., the sum of sensitivity and specificity, was determined for each approach. The results were considered significant for p-value less than 0.05 (p < 0.05). Calculations were performed with R version 4.1.1. and R packages survival, survminer, prodlim, timeROC, and pROC (R Foundation for Statistical Computing, Vienna, Austria; https://www.R-project.org/).

Results

Patient characteristics

The median BM plasma cell infiltration rate ranged between 1 and 92%, with a mean value of 41% (median 38%). Twenty-three patients were classified as ISS stage I, 13 patients as ISS stage II, and four patients as ISS stage III. Cytogenetic data were available for 39 patients, with high-risk cytogenetic abnormalities being detected in eight of them. A combination of the ISS and cytogenetic data was available in 35 patients. Based on this, 13 patients were classified as R-ISS-I group, 20 patients as R-ISS-II group, and two patients as R-ISS-III group. The patients’ characteristics at the time of PET/CT as well as data on the second line of treatment after bortezomib induction, ASCT, and lenalidomide maintenance in patients with relapse/progression are summarized in Table 1.

Automated [18F]FDG PET/CT quantification

Ten different SUV thresholds were used to define pathological skeletal tracer uptake and subsequently calculate AI-based, automated whole-body MTV and TLG values. BM segmentation and volumetric calculations were feasible in all patients. The results of this analysis are shown in Table 2. Examples of the application of the tool in two patients of the studied cohort are presented in Figs. 1 and 2.

Fig. 1
figure 1

Example of the application of the AI-based software tool for automated calculation of total MTV and TLG of a MM patient with intense diffuse BM [18F]FDG uptake as well as multiple focal hypermetabolic lesions. The use of different tracer uptake thresholds leads to different BM segmentation patterns and, subsequently, to different whole-body MTV and TLG values

Fig. 2
figure 2

Example of the application of the AI-based software tool for automated calculation of total MTV and TLG of a MM patient with diffuse BM [18F]FDG uptake in the axial skeleton as well as both humeri and femora. The use of different tracer uptake thresholds leads to different BM segmentation patterns and, subsequently, to different whole-body MTV and TLG values. Notably, the application of AI approach 10 could identify no pathologically increased tracer uptake in the BM, leading to zero values for the parameters MTV and TLG

Correlation between automated quantitative PET/CT parameters and BM plasma cell infiltration

Exploratory correlation analysis revealed a significant, moderate, positive correlation between the automated quantitative whole-body PET/CT parameters, MTV and TLG, and BM plasma cell infiltration after employment of all ten thresholds. The results of this analysis can be found in Table 2.

Correlation between automated quantitative PET/CT parameters and patient survival

Median follow-up [95% CI] of the patient cohort from the date of PET/CT was 110 months [105–123 months]. At the time of analysis, the median PFS was 37.5 months [29.9–89.4 months] and 33 patients had progressed or died. Respectively, the median OS was not reached [104.0–NA], with 16 patients having died. Univariable analysis for both PFS and OS was performed including the automated volumetric parameters MTV and TLG derived after application of all ten thresholds. Multivariable analysis was also performed for the PET volumetric parameters, after adjusting for high-risk cytogenetic abnormalities and BM plasma cell infiltration rate.

Univariable analysis showed that both MTV and TLG predicted PFS accurately for all AI approaches (Table 3). In the multivariable analysis, high MTV based on approaches 1, 5, 6, 7, 8, and 9 retained its prognostic significance (Table 4), whereas TLG had no effect on PFS at any of the applied SUV thresholds. Further, the automated quantitative PET parameters were dichotomized at the median to investigate their effect on PFS, estimated by the Kaplan–Meier method and log-rank test. PFS curve was significantly shorter in patients with higher than the median volumetric PET values based on approaches 1, 2, 7, 8, and 10 for MTV, and approaches 1, 5, 6, 7, 8, and 9 for TLG, respectively (Fig. 3, Suppl. Figures 19, Suppl. Tables 1 and 2).

Table 3 Prognostic significance of the AI-derived PET biomarkers, according to the different SUV thresholds, for progression-free survival and overall survival in univariable analysis
Table 4 Multivariable model on clinical parameters and the AI-derived PET metric MTV significantly influencing progression-free survival and overall survival. The PET parameter TLG did not show any significant correlation with patient survival in the multivariable model
Fig. 3
figure 3

Example of Kaplan–Meier estimates of PFS according to AI-derived, whole-body MTV (A) and TLG (B), and estimates of OS according to whole-body MTV (C) and TLG (D), based on approach 7. The number of patients at risk in each group and at each time point is shown below the plots

Univariable analysis for OS showed that whole-body MTV based on approaches 1, 5, 6, 7, 8, and 9 was significantly associated with shorter survival (Table 3), whereas TLG had no effect on OS. Accordingly, in the multivariable regression model for OS, MTV derived from approaches 5, 6, and 9 was significantly associated with shorter OS, while a similar though non-significant trend was also observed for MTV based on approaches 1, 7, and 8 (Table 4). No significant results were obtained with high TLG values in multivariable analysis with any of the SUV thresholds used. After dichotomization of the PET parameters at the median, the log-rank test showed that the OS curve was significantly shorter in patients with automated PET values higher than the median based on approaches 1, 7, and 8 for MTV, and approaches 1, 5, 6, 7, 8, and 9 for TLG, respectively (Fig. 3, Suppl. Figures 19, Suppl. Tables 1 and 2).

Further, in an attempt to evaluate the discrimination performance of the automated tool in correctly predicting PFS over a time interval of 2 years, we performed ROC analysis for all approaches. In line with the previous, liver-based cut-offs, in particular approaches 1, 5, 6, 7, 8, and 9 provided the best results for PFS prediction, as reflected by the respective AUC being significantly different from 0.5 for both MTV and TLG. Cut points of absolute values of the MTV and TLG parameters for optimizing the sum of sensitivity and specificity were identified. Due to the small number of events (n = 4), we did not perform ROC-based calculations for 2-year OS. The detailed results of ROC analysis are presented in Table 5. An example of ROC curves is presented in Fig. 4.

Table 5 Area under the curve (AUC), 95% confidence interval (95% CI) of AUC, p-values for testing whether AUC = 0.5, and MTV and TLG thresholds optimizing the sum of sensitivity and specificity at this threshold according to the different approaches for prediction of 2-year PFS from the time of PET/CT imaging
Fig. 4
figure 4

ROC curves for prediction of 2-year PFS from the time of PET/CT imaging based on approach 5. The cut points of absolute volumetric PET values for optimizing the sum of sensitivity and specificity were 394.0 mL for MTV (sensitivity = 0.75, specificity = 0.75; A) and 1015.1 g for TLG (sensitivity = 0.75, specificity = 0.72; B)

Application of IMPeTUs

The use of IMPeTUs yielded the following results: the median 5-point DS of diffuse BM uptake was 3 (range DS = 2–5). Eleven patients (25%) had no detectable focal medullary hypermetabolic lesions (F1 score), while 33 of them (75%) had at least one focal hypermetabolic lesion (median F score = 2; range F score = 1–4). In the 33 patients with focal lesions, the median 5-point DS of the hottest lesion was 5 (range DS = 3–5). PMD and EMD were present in 22/44 (50%) and 4/44 (9%) patients, respectively. Eight patients (18%) had no osteolysis (L1 score), while 36 of them (82%) had at least one lytic lesion (median L score = 3; range L score = 1–4). Fractures were found in 20 patients (45%) (Suppl. Table 3).

Univariable analysis revealed that the number of focal, [18F]FDG-avid lesions and the presence of EMD were significantly associated with shorter PFS, whereas the other parameters considered in IMPeTUs had no effect on PFS. No parameter had a significant effect on OS, although a non-significant (p = 0.06) trend towards shorter OS was observed in patients with more hypermetabolic lesions. The results of this analysis are shown in Suppl. Table 4.

Similar to the automated PET volumetric parameters, multivariable analysis was performed for the IMPeTUs parameters after adjustment for high-risk cytogenetic abnormalities and BM plasma cell infiltration rate. The number of focal, [18F]FDG-avid, BM lesions and the presence of EMD were associated with significantly shorter PFS, while the other parameters did not have a significant effect on PFS. No parameter had a significant effect on OS.

Discussion

MM represents a heterogeneous hematological malignancy with a highly variable clinical outcome [2]. Therefore, the identification of reliable prognostic factors and robust positivity cut-offs for outcome prediction have beneficial implications for disease management. In this context, the volumetric PET indices MTV and TLG have emerged in recent years as promising metabolic parameters for the quantification of tumor burden and patient prognosis in MM, although no standardized methodology for their calculation has yet been established [13, 14, 23]. Accordingly, the aim of this study was to investigate the prognostic role of automated, baseline, volumetric PET/CT calculations of BM metabolism in patients with newly diagnosed MM, using an AI-based tool that has recently been shown to be satisfactorily applicable to the assessment of myeloma disease burden [16].

The major findings of our study are the following: firstly, we confirm the significant positive correlation between the AI-derived, whole-body parameters MTV and TLG and the degree of BM plasma cell infiltration, initially demonstrated in our previous study [16]. Secondly, based on univariable analysis, the MTV and TLG values are significantly associated with adverse patient outcome after application of several different [18F]FDG uptake thresholds for automated BM segmentation. Lastly, using liver [18F]FDG uptake, partly with some modifications, as a cut-off to define patient-level pathological tracer uptake in the BM, whole-body MTV is associated with poor PFS and OS in multivariable analysis adjusting for high-risk cytogenetic abnormalities and BM plasma cell infiltration rate.

PET-based volumetric calculation of total tumor burden has gained increasing interest as a more reliable way of estimating disease extent and therefore prognosis [24,25,26]. Nevertheless, this approach has not yet become established in daily nuclear medicine practice for clinical interpretation because its application requires accurate identification and segmentation of tumor lesions, as well as a degree of manual correction of segmentation and exclusion of tracer uptake in normal organs or other non-malignant sites [27, 28]. Particularly in MM, this process of exact tumor delineation can be cumbersome and time-consuming as these patients often have multiple lesions and different patterns of BM involvement, with both focal and diffuse bone lesions coexisting with varying degrees of [18F]FDG uptake. It would not be an exaggeration to claim that in some MM patients, accurate quantification of tumor volume is virtually impossible and is hampered by subjective evaluation due to the extensive infiltration of the BM and its variable patterns. Recognizing this difficulty and the lack of consensus on the best method for PET measurements in MM, we have recently applied an AI-based method with the aim of standardizing tumor volume quantification at the whole-body level. Having demonstrated the feasibility of the tool for automated volumetric quantification of BM infiltration, and the significant correlation of the therein generated whole-body MTV and TLG parameters with the results of conventional analysis of PET/CT findings [16], in the present work, we were able to confirm the significant positive correlation of these parameters with the degree of BM infiltration by plasma cells after application of ten different uptake thresholds in another MM cohort. Despite the sampling error in BM biopsies due to the spatial heterogeneity of MM in the BM compartment, this finding underlines the reproducibility and robustness of the AI tool for volumetric calculations, while also highlighting the suitability of these PET parameters for the assessment of histopathological tumor burden in MM, as suggested by previous studies [23, 29].

Nevertheless, the main aim of our work was to investigate the prognostic role of the AI-derived volumetric parameters in relation to outcome in patients with MM. Here, the results did indeed demonstrate the potential of automated quantification of total metabolic load as a tool for predicting survival. Starting with univariable analysis, the parameters MTV and TLG generated by most segmentation approaches were found to be predictive of PFS. Specifically, seven of the ten thresholds used (approaches 1, 2, 5, 6, 7, 8, and 9) showed that high MTV and TLG had a significant adverse effect on PFS. Furthermore, on multivariable analysis, MTV was found to be an independent predictor of PFS after applying six of the above SUV thresholds (approaches 1, 5, 6, 7, 8, and 9). Accordingly, in terms of OS, MTV had a significant effect on patient outcome both on univariable (approaches 1, 5, 6, 7, 8, and 9) and multivariable analysis (approaches 5, 6, and 9). A common feature of the approaches with a significant impact on survival is that, with some modifications, they use liver uptake as a threshold to define pathological [18F]FDG uptake in the BM, a finding also confirmed by the results of Kaplan–Meier analysis. These results support the role of patient-level segmentation using the liver as background reference organ in the evaluation of myeloma with PET/CT [10, 11, 15]. Further, given these findings, a robust yet simple and practical threshold to predict patient survival in MM would be the mean or median value of liver [18F]FDG uptake which could reliably serve as a reference for automatic BM segmentation.

In the same context, although some cut-off values for whole-body MTV and TLG for predicting 2-year PFS have been identified using ROC analysis, we prefer not to recommend these absolute values as general thresholds for predicting survival in MM. In contrast to other studies in this field, which relied exclusively on absolute tracer uptake values to identify abnormal glucose metabolism in the BM compartment and subsequently led to the generation of corresponding volumetric thresholds [12, 14, 23], in our work, thresholds based on liver uptake per patient performed better than absolute SUV cut-off values. On the basis of these findings, we therefore advocate the use of patient-level liver-based segmentation thresholds, in line with previous studies that have highlighted BM metabolism relative to liver uptake as a useful and reproducible approach to PET image interpretation and as a reliable predictor of outcome, especially in MM patients receiving ASCT [10, 30]. Interestingly, the use of liver [18F]FDG uptake as a threshold to classify BM uptake within the IMPeTUs criteria did not have a significant effect on survival, which may be due to the relatively small patient cohort. On the other hand, the number of focal hypermetabolic lesions and EMD had a significant adverse effect on PFS, which is consistent with previous analysis by our group in this cohort [20].

The identification of MTV as a negative predictor in MM is consistent with previous works [12,13,14, 23]. However, the main novelty of the present study is the use of an automated, quick, and simple global thresholding tool for bone segmentation and subsequent quantification of [18F]FDG metabolism, which is not affected by manual or semi-automated region of interest (ROI) definitions, and thus operator-independent. In particular, the deep learning, volumetric quantification method applied here is based on the initial CT-based segmentation of the skeleton, its transfer to the PET images and the application of different tracer uptake thresholds, and the subsequent refinement of the resulting regions using postprocessing. In fact, this is an area where AI can find broad application, as it can surpass older, conventional methods of whole-body volumetric analysis and can be used as an adjunct and very facilitating tool for interpreting physicians, removing a significant amount of repetitive, time-consuming, operator-dependent, and tedious tasks [28]. An additional strength of our analysis lies in the follow-up period of the cohort, which is, to our knowledge, the longest published in the field of PET/CT studies focusing on application of whole-body, volumetric metabolic parameters in MM patients.

This study has certain limitations. First, this is a single-center retrospective analysis of prospectively acquired data from a relatively small cohort. Ideally, data from larger, prospective, multicentric studies would be warranted to validate the findings presented here. However, the cohort studied is homogeneous, consisting of treatment-naïve MM patients at the time of PET/CT, who have received very similar therapies and have been followed for a very long time. Second, there are limitations in the segmentation method used: the calculation of MTV and TLG is SUV-dependent, meaning that the calculation of these parameters is susceptible to a number of factors that affect SUV, such as partial volume correction, blood glucose, reconstruction and acquisition parameters, and time between [18F]FDG injection and image acquisition [16]. Further weaknesses of the applied methodology include the exclusion of the skull from calculations due to the very high [18F]FDG uptake in the brain, which requires an independent assessment of this anatomical area, inevitably making the method more operator-dependent in selected MM cases with cranial involvement. Another source of error that may potentially require manual corrections is extensive lytic or paramedullary lesions, which may be excluded from the BM segmentation, as the AI tool is based on the initial CT-based identification of the skeleton using the Hounsfield unit (HU) scale of each region. These specific lesions will be investigated in a larger patient cohort in terms of another multi-center, randomized phase 3 trial [31] with the aim of validating AI-based volumetric PET calculations against whole-body MRI, which is considered the modality of choice for BM assessment in MM patients [7]. Finally, although the patients studied received novel agents at that time, the therapeutic landscape in MM is rapidly changing, with newer agents such as monoclonal antibodies, bispecific antibodies, and CAR-T cells being approved or tested for the treatment of the disease. It would therefore be interesting to validate the prognostic performance of the tool in cohorts receiving such therapies.

Conclusion

In this study, we investigated the prognostic role of automatically obtained whole-body volumetric calculations of BM metabolism after application of an automated, deep learning–based tool on PET/CT images in a cohort of 44 MM patients. At a median follow-up of 110 months [105–123 months], based on univariable analysis, the AI-derived parameters MTV and TLG had a significant adverse effect on patient outcome after application of several different [18F]FDG uptake thresholds for automated BM segmentation. Importantly, in multivariable analysis adjusted for high-risk cytogenetic abnormalities and BM plasma cell infiltration rate, whole-body MTV was significantly associated with poor PFS and OS. In addition to highlighting the prognostic significance of automated, global volumetric calculations of metabolic tumor burden, using the liver uptake as reference for BM segmentation, these data open up new perspectives towards solving the complex problem of interpreting PET scans in MM patients with a simple, fast, and robust method that is not affected by manual or semi-automated, and thus operator-dependent, interventions.