Introduction

[18F]FDG PET/CT is an imaging modality of high performance in the management of patients with multiple myeloma (MM) [1,2,3,4]). Foremost, due to its ability to reliably differentiate metabolically active from inactive lesions, [18F]FDG PET/CT is considered the appropriate method for treatment response evaluation in the disease [3, 4]. On the other hand, [18F]FDG PET/CT carries some limitations in MM evaluation. Some of these limitations are rather general, derived from the non-specific nature of the tracer, such as several false-positive findings [5], while some are more specific for MM, including a non-negligible (11%) incidence of false-negative results [1, 6]. Moreover, one particular challenge that clinical specialists and radiologists commonly face in MM is the standardization of the evaluation of PET/CT scans. This issue is mainly due to the different patterns of bone marrow (BM) infiltration in the disease, which, in turn, may hamper inter-observer reproducibility in interpreting scan results [7].

In recent years, many different approaches have been developed in order to address the issue of standardization of [18F]FDG PET/CT evaluation in MM. These approaches have made use of visual [7, 8] as well as semi-quantitative and quantitative [1,2,3,4,5,6,7,8,9,10,11,12] approaches. Although all these attempts seem promising, none can yet be considered a standard and widely accepted method in the interpretation of PET/CT. In this context, visual evaluation of the PET/CT scans remains the mainstay in clinical routine.

Deep learning, a subfield of artificial intelligence (AI), has nowadays become the method of choice for automated image analysis [13]. This method provides new opportunities for the development of automated analysis tools for CT, PET/CT, and MRI, which have the potential to improve or replace current methods for the evaluation of these imaging modalities [14]. Still, although the number of studies in this field is constantly growing, a large body of the literature is dominated by retrospective cohort studies with limited external validation and a high probability of bias [15,16,17,18,19]. Particularly with regard to MM, there are no data on the application of deep learning tools for the assessment of malignancy using PET/CT.

Accordingly, the aim of this prospective study is to evaluate a novel three-dimensional deep learning-based tool on PET/CT images for automated assessment of the intensity of BM metabolism in MM patients.

Materials and methods

Patients

Thirty-five consecutive patients (26 male, 9 female; mean age 59.3 years) with previously untreated MM based on the criteria established by the International Myeloma Working Group (2014) were included in this analysis (Table 1) [20]. All patients were investigated in the context of an open-label, multicenter, randomized, active-controlled, phase 3 trial (GMMG-HD7) [21]. No patient had previously received chemotherapy, granulocyte colony-stimulating factor (G-CSF), or erythropoietin. All patients gave written informed consent after the study was fully explained to them. The study was conducted in accordance with the International Conference on Harmonization Good Clinical Practice guidelines and the Declaration of Helsinki principles, and with institutional approval by the ethical committee of the University of Heidelberg (AFmu-412/2018) and the Federal Agency of Radiation Protection in Germany (“Bundesamt für Strahlenschutz”).

Table 1 Patient characteristics (N = 35)

PET/CT data acquisition

All patients underwent whole-body [18F]FDG PET/CT at diagnosis before commencement of treatment including induction therapy, high-dose chemotherapy (HDT), and autologous stem cell transplantation (ASCT), as well as maintenance therapy based on lenalidomide. All patients were normoglycemic at the time of the PET study. Imaging was performed at a mean time of 64 min (range: 51–79 min) postinjection (p.i.) of [18F]FDG from the skull to the toes with an image duration of 2 min per bed position. A dedicated PET/CT system (Biograph mCT, S128, Siemens Co., Erlangen, Germany) with an axial field of view of 21.6 cm with TruePoint and TrueV, operated in a three-dimensional mode, was used. A low-dose attenuation CT (120 kV, 30 eff mA) was used for attenuation correction of the PET data and for image fusion. All PET images were attenuation-corrected, and an image matrix of 400 × 400 pixels was used for iterative image reconstruction. Iterative image reconstruction was based on the ordered subset expectation maximization (OSEM) algorithm with two iterations and 21 subsets as well as time of flight (TOF).

PET/CT data analysis

[18F]FDG PET/CT images were analyzed on an Aycan workstation. Two experienced, board-certified nuclear medicine physicians well versed in MM diagnosis (with 11 and 30 years of clinical experience in MM PET diagnostics—first and last authors, respectively) read and interpreted the datasets in consensus. The qualitative analysis was based on the visual assessment of the PET/CT scans. In particular, BM/skeletal foci presenting with enhanced (higher than background bone) [18F]FDG uptake, for which another benign etiology—for example, trauma or arthritis—was excluded, were considered positive for myeloma. These foci were correlated with the fused low-dose CT findings to ensure higher diagnostic accuracy. Nevertheless, even foci of increased [18F]FDG uptake without corresponding osteolysis on CT were accounted for as MM-positive, since in general terms, metabolic processes take place earlier than morphological changes [22]. The number of [18F]FDG-avid lesions was calculated in each patient since this parameter is of prognostic significance in newly diagnosed MM, with a higher number of lesions being associated with adverse progression-free survival (PFS) and overall survival (OS) [2324].

Moreover, the degree of diffuse [18F]FDG uptake in the BM was estimated both visually, mainly employing the maximum intensity projection (MIP) PET images, and semi-quantitatively, after the calculation of the standardized uptake value (SUV) in the iliac bone and the lower lumbar spine. Due to its reasonably uniform tracer uptake, the liver parenchyma was used for background measurements by positioning the spheric VOIs in the right liver lobe, if without lesions, and at least 1 cm away from the edge of the liver. Based on these, the uptake in the BM was classified as negative/mild (< liver uptake), moderate (> 1.1 × liver uptake), and intense (> 2 × liver uptake) [7].

Furthermore, patients were classified into three groups based on the combination of the aforementioned findings and similar approaches applied in the literature in the field [8, 25]: group A, including patients with no focal lesions and negative/mild diffuse BM uptake (< liver). Group B, including patients with 1–3 focal lesions and/or moderate diffuse BM uptake (> liver uptake, + 10%). Group C, including patients with > 3 focal lesions and/or intense diffuse BM uptake (> > liver uptake, twice).

Automated quantification method

The proposed deep learning-based method consists of the following three steps:

  1. 1.

    CT-based organ segmentation

  2. 2.

    Application of SUV threshold(s) inside the relevant organs

  3. 3.

    Refinement of the resulting regions using postprocessing

The convolutional neural network (CNN) described in [14] was used to segment 17 different bones as well as the liver and the gluteus maximus muscle. The bones were divided into bones of the axial skeleton, including the vertebrae, scapulae, clavicles, sternum, ribs, sacrum, and pelvic bones, and into bones of the extremities, including the humeri, ulnae, radii, hands, femora, patellae, tibiae, fibulae, tali, and feet. The skull was excluded frοm the evaluations to avoid the effect of the intense physiological [18F]FDG uptake from the brain.

The CT-based segmentation was transferred to the SUV PET images, and, subsequently, different SUV thresholds were applied to identify BM infiltration. All pixels with SUV above or equal to the threshold were segmented as positive for infiltration by MM. In the present analysis, six different SUV thresholds were applied for the definition of pathological tracer uptake in the skeleton:

  • Approach 1: For the bones of the axial skeleton, a threshold of (liver SUVmedian) × 1.1 was used. Respectively, for the bones of the extremities, a threshold of (gluteus maximus SUVmedian) × 4 was applied.

  • Approach 2: For the bones of the axial skeleton, a threshold of (liver SUVmedian) × 1.5 was used, and for the bones of the extremities, a threshold of (gluteus maximus SUVmedian) × 4.

  • Approach 3: For the bones of the axial skeleton, a threshold of (liver SUVmedian) × 2 was used, and for the bones of the extremities, a threshold of (gluteus maximus SUVmedian) × 4.

  • Approach 4: Every bone was given a threshold of ≥ 2.5 according to Terao et al. [12].

  • Approach 5: For the bones of the axial skeleton, a threshold of ≥ 2.5 was used, and for the bones of the extremities, a threshold of ≥ 2.0 was used.

  • Approach 6: The SUVmax in the liver was used for all bones as proposed in the respective literature [7, 8, 26] (Table 2).

Table 2 The different SUV thresholds applied for the definition of pathological tracer uptake in the BM with the AI tool, their median values (range) in the studied cohort, and the respective r and p values of the correlation analysis between MTV, TLG, and the biopsy-derived BM infiltration rate by malignant plasma cells

The reason for the use of SUVmedian in Approaches 1–3 lies in the fact that median values are more robust to outlier SUV values, e.g., due to small inconsistencies in organ segmentation or PET/CT misalignment.

Due to the poor resolution of PET images, tracer uptake from adjacent tissue might spill over into the bone mask. Therefore, in order to alleviate this effect, the following steps were employed:

  1. 1.

    Each pixel in the mask was assigned to its local maximum using Meyer’s flooding algorithm [27].

  2. 2.

    Each pixel in the mask was then given the same label as its local maxima.

  3. 3.

    All connected components (18-connectivity) with volumes less than 1 mL were removed.

Using the resulting masks, the total, whole-body metabolic tumor volume (MTV) could be estimated as the volume of the segmented high uptake in each patient. In particular, MTV (mL) represents the myeloma lesions’ volume visualized on PET/CT with SUV greater than a pre-defined threshold (absolute value or relative to other organs). Similarly, the total lesion glycolysis (TLG) was estimated as the product of the average SUV and MTV for the segmented regions (TLG = SUVmean × MTV) (Fig. 1).

Fig. 1
figure 1

Image processing methodology for calculation of whole-body MTV and TLG with the application of the deep learning-based software tool. Myeloma lesions in the bone marrow compartment (red) are visualized on a standard display of PET/CT (left). The lesions are calculated based on an AI-based bone segmentation in CT illustrated in the 3D image (second from the right) and appropriate SUV thresholds for each group of bones resulting in bone lesion segmentations illustrated in the 3D image to the right. The resulting lesion segmentation (red) can be used to compute the total MTV and TLG for the patient

Clinical parameters, BM plasma cell infiltration, and fluorescence in situ hybridization

Thirty-four patients received BM biopsies from the iliac crest performed within 4 weeks of the [18F]FDG PET/CT examination and prior to the commencement of treatment. BM trephines were analyzed using hematoxylin–eosin stain, periodic acid–Schiff stain, and Giemsa stain. The percentage of BM infiltration by plasma cells was assessed via a light microscope. The infiltration rate represents the number of plasma cell in comparison to all nucleated cells in the BM. The monoclonality of plasma cells was confirmed by immunohistochemical staining.

Cytogenetic analyses were performed on CD138-purified BM plasma cells. High-risk cytogenetics was defined as the presence of at least one of the following aberrations (cutoff, ≥ 10% of cells): del(17)(p13), t(4;14)(p16;q32), or t(14;16)(q32;q23) (21). For the definition of high-risk disease, the Revised International Staging System (R-ISS) score was defined. Based on this prognostic system, stage R-ISS I included patients of ISS stage I (serum β2-microglobulin level < 3.5 mg/L and serum albumin level ≥ 3.5 g/dL), no high-risk chromosomal abnormalities, and a normal lactate dehydrogenase (LDH) level; stage R-ISS III included ISS stage III (serum β2-microglobulin level > 5.5 mg/L) and high-risk chromosomal abnormalities or a high LDH level; stage R-ISS II included all the other possible combinations [28].

Statistical analysis

For all approaches, MTV and TLG measurements showed a skewed distribution. Therefore, median and range values are reported. Consequently, the correlation analysis of MTV and TLG measurements with BM infiltration rate and β2-microglobulin was based on Spearman’s rank correlation. For two-group comparisons (high-risk versus standard-risk cytogenetic abnormalities), the Wilcoxon rank sum test was used. To investigate whether there is a positive or negative trend in MTV and TLG measurements with ISS stage, R-ISS-stage, or groups A, B, and C from PET visual analysis, the (non-parametric) Jonckheere-Terpstra test for trend was used from the R package DescTools. The Jonckheere-Terpstra test was used for the investigation of the trend between the results of the PET visual analysis and the BM infiltration rate. The receiver operating characteristic (ROC) curve was used to investigate the performance of MTV and TLG for discrimination of the population according to the degree of BM infiltration rate, based on the BM plasma cell cut-off of 60% (≥ 60% versus < 60%). The area under the curve (AUC) was calculated, and the cut point optimizing the sum of sensitivity and specificity was determined for each Approach. Calculations were performed with R version 4.1.1 with packages DescTools and pROC. p values below 0.05 were considered statistically significant.

Results

Patient cohort

The plasma cell infiltration, as derived from BM biopsies, ranged between 4 and 100%, with a mean value of 42% (median = 38%). Cytogenetic data were available in 31 patients (89%), with high-risk cytogenetic abnormalities being detected in 8/31 (26%) of them. A combination of the ISS and cytogenetic data was available in 31 patients. Based on this, 17 patients were classified in the R-ISS-1 group (55%), 11 patients in the R-ISS-2 group (35%), and three patients in the R-ISS-3 group (10%). The patients’ characteristics are summarized in Table 1.

[18F]FDG PET/CT findings

Visual analysis

Based on the results of the visual (qualitative) analysis of the PET/CT scans, 12 patients were classified into group A, 8 patients into group B, and 15 patients into group C. No statistically significant trend was observed between the results of the PET/CT visual analysis and the BM plasma cell infiltration. There were no cases of marked misalignment between PET and CT due to patient movement. Moreover, no patient exhibited an increased gluteal muscle uptake of reactive origin.

Automated quantification method

Six different SUV thresholds were applied for the definition of pathological tracer uptake in the skeleton and the subsequent AI-based, automated calculation of the MTV and TLG values. The results of this analysis are presented in Table 2. Examples of the application of the AI-based software tool for automated calculation of whole-body MTV and TLG in two MM patients are presented in Figs. 2 and 3.

Fig. 2
figure 2

Example of the application of the AI-based software tool for automated calculation of total MTV and TLG of a MM patient with intense diffuse BM [18F]FDG uptake, therefore visually classified in group C. The use of different tracer uptake thresholds leads to different BM segmentation patterns and, subsequently, to different MTV and TLG values

Fig. 3
figure 3

Example of the application of the AI-based software tool for automated calculation of total MTV and TLG of a MM patient with multiple focal [18F]FDG-avid lesions, therefore visually classified in group C. The use of different tracer uptake thresholds leads to different BM segmentation patterns and, subsequently, to different MTV and TLG values. Of note is the presence of a large myeloma bone lesion originating from the second left rib and infiltrating the adjacent soft tissues, of which only the osseous and not the paramedullary part is identified and segmented. Accurate calculation of paramedullary disease (PMD) represents a potential challenge for the implementation of the software tool

Correlation between automated quantitative PET/CT parameters and visual PET/CT evaluation

A significant (p < 0.001) positive trend was observed between the results of the visual analysis of the PET/CT scans, based on the classification of patients in groups A, B, and C, and the MTV and TLG values after the application of all six [18F]FDG uptake thresholds. In addition, there were significant differences between the three patient groups with regard to their MTV and TLG values for all applied thresholds.

Correlation between automated quantitative PET/CT parameters, BM plasma cell infiltration, and clinical data

Exploratory correlation analysis revealed a significant, moderate, positive correlation between the automated quantitative PET/CT parameters, MTV and TLG, and BM plasma cell infiltration as well as plasma levels of β2-microglobulin after the utilization of the thresholds applied in Approaches 1, 2, 4, and 5 (Table 2). On the other hand, no significant correlations were observed for these parameters when employing Approaches 3 and 6.

In an attempt to evaluate the performance of the automated tool in providing information on myeloma disease severity, we dichotomized the population based on the cut-off of BM plasma cells of 60% and investigated the performance of whole-body MTV and TLG for the discrimination of patients based on this histopathological feature by ROC analysis. In line with the previous, Approaches 1, 2, 4, and 5 provided the best results regarding discrimination of population according to the degree of BM plasma cell infiltration, as reflected by the respective AUC being significantly different from 0.5. Cut points optimizing the sum of sensitivity and specificity were identified. The detailed results of the ROC analysis are presented in Tables 3 and 4.

Table 3 Area under the curve (AUC), 95% confidence interval (95% CI) of AUC, p values for testing whether AUC = 0.5, MTV thresholds optimizing the sum of sensitivity and specificity, as well as sensitivity, specificity, and accuracy at this threshold according to the different approaches for the discrimination of patient population based on the degree of bone marrow plasma cell infiltration (≥ 60% versus < 60%)
Table 4 Area under the curve (AUC), 95% confidence interval (95% CI) of AUC, p values for testing whether AUC = 0.5, TLG thresholds optimizing the sum of sensitivity and specificity, as well as sensitivity, specificity, and accuracy at this threshold according to the different approaches for the discrimination of patient population based on the degree of bone marrow plasma cell infiltration (≥ 60% versus < 60%)

In contrast, no significant correlation was observed with the ISS-stage and the R-ISS stage for any of the applied thresholds. Moreover, no statistically significant differences were found between patients with high-risk abnormalities and those with standard cytogenetic risk, regarding automated PET parameters.

Discussion

The interpretation of [18F]FDG PET/CT in MM may prove particularly challenging since both focal and diffuse bone lesions may coexist with varying degrees of [18F]FDG uptake. In clinical routine, the evaluation of BM involvement is primarily visual and subjective in nature, with quantitative—thus more objective—assessments being mainly restricted in the calculation of the semi-quantitative parameter, SUV, which is, however, susceptible to several factors, such as the reconstruction and acquisition parameters, partial-volume correction, blood glucose, and time between [18F]FDG injection and image acquisition [25], which affect its reliable and reproducible measurement. However, the standardized and reproducible interpretation of [18F]FDG PET/CT scans is clinically relevant in both the pre- and posttreatment settings of MM. Especially, the identification of robust positivity cut-offs for outcome prediction would have beneficial implications in the management of the disease. In this context, MTV and TLG have been proposed as promising metabolic parameters for the quantification of tumor burden and outcome prediction in MM [9, 10, 12]. At the same time, however, the accurate calculation of these parameters can be a very demanding task since it requires great computing power as well as fast and reproducible computer programs, enabling proper segmentation and correction of the background activity and partial volume effect [25]. The herein proposed approach, involving a combination of AI-based segmentation of the skeleton and subsequent thresholding of metabolic activity, aimed to objectively address these issues, enabling an automated, volumetric assessment of the BM metabolism in MM patients.

There are three major findings after the initial application of our deep learning-based tool in MM: firstly, the automated, volumetric, whole-body assessment of the intensity of BM metabolic activity in PET/CT images is feasible. Secondly, the AI-derived PET/CT biomarkers MTV and TLG are significantly correlated with the visual (subjective) analysis of the extent of BM involvement in [18F]FDG PET/CT images. Thirdly, automatically based MTV and TLG values are also significantly correlated with the degree of BM plasma cell infiltration rate and the independent prognostic factor β2-microglobulin after the application of certain [18F]FDG uptake thresholds.

The herein applied deep learning, whole body, volumetric quantification method of [18F]FDG metabolism in the BM is based on the initial CT-based segmentation of the skeleton, its transfer in the PET images, the application of different thresholds of tracer uptake, and the subsequent refinement of the resulting regions using postprocessing. Global thresholding for bone segmentation has only recently been applied in the setting of MM with promising results. Takahashi et al. developed a semi-automated, quantitative parameter, defined as the intensity of bone involvement (IBI), for the assessment of the amount and extent of [18F]FDG uptake based on SUV metrics, using liver SUV as a threshold to determine metabolically active volumes in the skeleton. After the categorization of MM patients into three groups, based on the degree of visually assessed bone involvement in PET/CT, which served as a reference, the authors found significant differences between the three groups regarding the median IBI score [25]. The same group evaluated the parameter IBI for monitoring outcomes in MM. Again, after categorization of patients into three groups based on the visual analysis of PET/CT (PET-remission, PET-stable, and PET-progression), the authors found that the IBI variation (ΔIBI) between two consecutive scans was related to the outcome in PET/CT as evaluated visually, while, moreover, significant differences in ΔIBI were found between the three groups [29]. In our study, patients were also classified into three groups based on visual and semi-quantitative evaluation of the PET/CT scans, after taking into account parameters suggested by the literature [8, 24, 2530]. We could, similarly, demonstrate a significant positive correlation between automatically derived PET parameters for all six thresholds and the degree of BM involvement in PET/CT as assessed by visual analysis. Moreover, significant differences were highlighted between the three patient groups regarding the MTV and TLG values for all applied thresholds. Of note is the—partly marked—variance in the yielded MTV and TLG values between the different approaches (Approaches 1–6) employed, which highlights the sensitivity of whole body calculations depending on the applied [18F]FDG uptake threshold, thus calling for caution in the routine use of the tool depending on the respective clinical setting.

Another distinguishing point between this work and previous ones in the field is that in our study, we went one step further and managed to show a significant moderate correlation between the AI-derived MTV and TLG and two clinically relevant biomarkers in MM. In specific, the demonstration of the correlation between the automated, volumetric PET parameters—derived by four of the evaluated approaches (Approaches 1, 2, 4, and 5)—and the percentage of BM plasma cells derived from biopsies, a main histopathological biomarker in the disease, and the levels of β2-microglobulin, a powerful predictor of survival and a key variable of ISS [31,32,33], significantly enhanced the robustness of our analysis, suggesting four of the applied thresholds as potentially useful cut-off values for reliable segmentation of the pathological skeleton. Moreover, the application of these four thresholds provided the best results in terms of discrimination of the studied population according to the degree of disease severity, using as a cut-off the BM plasma cell infiltration of 60% [2034]. These approaches were based on the comparison of [18F]FDG metabolism in the BM either with the tracer activity in reference organs which show very low variability and a narrow range in tracer uptake (liver and gluteal muscles) [35,36,37] or with absolute SUV values [12]. The reason for the partial use of different pathological uptake thresholds for the axial skeleton and the long bones is based on the fact that [18F]FDG uptake in the skeleton is not uniform, gradually decreasing from the axial to the appendicular skeleton [38]. Based on the present findings, these approaches will be further evaluated in future studies with larger patient cohorts. On the other hand, two of the applied thresholds (Approaches 4 and 6) failed to either demonstrate statistically significant correlations with the abovementioned clinical parameters or to discriminate the patient population based on the degree of BM plasma cell infiltration, which is attributed to the use of high [18F]FDG uptake thresholds, leading to rather low whole body MTV and TLG values.

The interest in volumetric PET measurements in MM is not new. Fonti et al. were the first to explore the predictive role of MTV and TLG in a mixed group of 47 MM patients who received various therapies. Their analysis was based on the identification of focal lesions and the calculation of SUVmax. Afterwards, MTV was calculated in those lesions with a SUVmax > 2.5, which was almost the same as one of the thresholds applied in our analysis (SUVmax ≥ 2.5, Approaches 5 and 6) that led to a significant correlation between the automated MTV and TLG values and the percentage of BM plasma cells and β2-microglobulin. Similarly to our results, the authors noted that MTV positively correlated with the percentage of BM infiltration by plasma cells (r = 0.46), while TLG correlated significantly with β2-microglobulin levels (r = 0.38). They could, moreover, show that an MTV value of 77.6 mL and a TLG value of 201.4 g predicted patients with a good OS [9]. In line with this, in a larger and more homogeneous MM cohort, McDonald et al. found that baseline TLG > 620 g and total MTV > 210 mL of MM lesions were significant factors in poor PFS and OS. In that study, MM lesions were defined as foci of increased [18F]FDG uptake exhibiting a peak SUV (SUVpeak) greater than that of background BM assessed in the most inferior vertebral body [10]. These findings are in agreement with the ones in the present analysis. However, an essential difference between the aforementioned studies and ours is that these approaches were not automated, and were, thus, dependent on ROI definition, which was not the case in our analysis.

Recently, in a retrospective analysis of 185 patients with newly diagnosed MM, Terao et al. investigated the predictive value of pre-treatment MTV and TLG, as assessed by a semi-automated, computer-aided analysis of the PET/CT images, and compared it with conventional PET/CT variables. The authors could show that the high-burden MTV and TLG findings were superior to the conventional high-risk PET/CT variables for outcome prediction, as assessed by PFS and OS [12]. Similarly to our results, in another study of the same group, a significant correlation between TLG and the percentage of plasma cells in the BM was demonstrated, rendering this PET parameter potentially suitable for evaluating the histopathological tumor burden in MM [39]. Notably, in the studies by Terao et al., MTV was defined as the volume of myeloma lesions with SUV ≥ 2.5, a threshold also herein applied (Approaches 5 and 6) that led to significant correlations between the PET, histopathological, and clinical parameters.

We note some limitations in our study. Foremost, the number of patients enrolled and PET/CT scans analyzed was relatively small. However, the studied cohort is homogeneous, consisting of treatment-naive, symptomatic MM patients examined in terms of an ongoing prospective study. Therefore, the presented findings can only be considered the preliminary results of an ongoing study. Secondly, the vast majority of PET/CT findings were not histopathologically confirmed, which is, obviously, not possible in the clinical setting. However, the demonstration of a significant correlation with two commonly accepted reference standards, namely, the percentage of BM infiltration by malignant plasma cells as derived from biopsies of the iliac crest and the plasma levels of β2-microglobulin, essentially contributed to the validation of the results. Moreover, especially with regard to the diffuse BM uptake pattern, in an effort to reduce the incidence of false positive findings, it was ensured that no included patient had previously received agents or medications, which could lead to a diffusely increased tracer accumulation in the BM, at least one month before the PET/CT study [40]. Furthermore, limitations exist with regard to the applied segmentation method: the calculation of MTV and TLG is SUV-dependent, meaning that every factor affecting SUV calculations may also affect the evaluation of these parameters. Moreover, the patient’s skull was excluded from the segmentation analysis and subsequent metabolic parameters’ calculation due to the very high-lying diffuse [18F]FDG uptake of the brain, rendering the skull as an “obscured site” [1]. Although in our sample, no patient had metabolically active, focal, cranial [18F]FDG-avid lesions, this anatomical area must be analyzed independently, inevitably making the method more operator-dependent in selected MM cases with cranial involvement. Finally, extensive lytic or paramedullary lesions, i.e., soft tissue/extraosseous lesions originating from bone lesions (Fig. 3), may be an additional source of error, subsequently leading to the need for manual corrections; since the AI tool initially makes a CT-based identification of the skeleton based on the HU scale of each region, it may be possible that large osteolytic lesions or soft tissue infiltrations linked to skeletal involvement are excluded from the BM segmentation. These issues will be specifically investigated in the future in a larger patient cohort in the context of this multicenter, randomized phase 3 trial, with the goal of validating the AI-based automated PET results in comparison to patient outcome data as well as the findings of whole-body MRI, which is considered the modality of choice for bone marrow evaluation and assessment of disease extent in MM patients [41].

Conclusion

In an attempt to address the issue of the standardization of the [18F]FDG PET/CT interpretation in MM, we validated a novel three-dimensional deep learning-based tool on PET/CT images for automated assessment of the intensity of BM metabolism in a group of 35 consecutive, symptomatic, treatment-naive MM patients. We could show that BM segmentation and calculation of whole-body MTV and TLG after the application of the deep learning tool were feasible in all patients. Moreover, the AI-derived quantitative PET parameters correlated significantly with the results of the visual analysis of the PET/CT scans as well as with the biopsy-derived BM plasma cell infiltration and the plasma levels of β2-microglobulin. These preliminary results suggest that the automated, volumetric, whole-body PET/CT assessment of the BM metabolic activity after the application of the deep learning-based tool is a potentially reliable method in the direction of optimization and standardization of PET/CT interpretation in MM and will be further evaluated in future prospective studies with larger patient cohorts.