Quantification of tumor burden in multiple myeloma by atlas-based semi-automatic segmentation of WB-DWI
Whole-body diffusion weighted imaging (WB-DWI) has proven value to detect multiple myeloma (MM) lesions. However, the large volume of imaging data and the presence of numerous lesions makes the reading process challenging. The aim of the current study was to develop a semi-automatic lesion segmentation algorithm for WB-DWI images in MM patients and to evaluate this smart-algorithm (SA) performance by comparing it to the manual segmentations performed by radiologists.
An atlas-based segmentation was developed to remove the high-signal intensity normal tissues on WB-DWI and to restrict the lesion area to the skeleton. Then, an outlier threshold-based segmentation was applied to WB-DWI images, and the segmented area’s signal intensity was compared to the average signal intensity of a low-fat muscle on T1-weighted images. This method was validated in 22 whole-body DWI images of patients diagnosed with MM. Dice similarity coefficient (DSC), sensitivity and positive predictive value (PPV) were computed to evaluate the SA performance against the gold standard (GS) and to compare with the radiologists. A non-parametric Wilcoxon test was also performed. Apparent diffusion coefficient (ADC) histogram metrics and lesion volume were extracted for the GS segmentation and for the correctly identified lesions by SA and their correlation was assessed.
The mean inter-radiologists DSC was 0.323 ± 0.268. The SA vs GS achieved a DSC of 0.274 ± 0.227, sensitivity of 0.764 ± 0.276 and PPV 0.217 ± 0.207. Its distribution was not significantly different from the mean DSC of inter-radiologist segmentation (p = 0.108, Wilcoxon test). ADC and lesion volume intraclass correlation coefficient (ICC) of the GS and of the correctly identified lesions by the SA was 0.996 for the median and 0.894 for the lesion volume (p < 0.001). The duration of the lesion volume segmentation by the SA was, on average, 10.22 ± 0.86 min, per patient.
The SA provides equally reproducible segmentation results when compared to the manual segmentation of radiologists. Thus, the proposed method offers robust and efficient segmentation of MM lesions on WB-DWI. This method may aid accurate assessment of tumor burden and therefore provide insights to treatment response assessment.
KeywordsDiffusion weighted imaging Semi-automatic segmentation Atlas-based segmentation Total lesion burden Multiple myeloma
Multiple myeloma is a hematologic neoplasia characterized by an abnormal proliferation of malignant plasma cells throughout the bone marrow with specific diagnosis criteria [1, 2]. Whole-body diffusion weighted imaging (WB-DWI) has a proven value to detect and follow-up of MM lesions . The International Myeloma Working Group (IMWG) holds that magnetic resonance imaging (MRI) is the gold standard for the detection of bone marrow involvement in MM . Whole-body MRI (WB-MRI) protocols have now been standardized, which include T1-weighted (T1w), short tau inversion recovery (STIR) and diffusion weighted imaging (DWI) . Infiltrative bone marrow pathologies, such as MM, show high cellularity patterns and water content while decreasing the amount of fat. As a result, MM lesions often present with low signal on T1w spin echo MRI, being easily distinguishable from yellow marrow, while showing restricted diffusion. Apparent diffusion coefficient (ADC) is an imaging biomarker that quantifies diffusion processes within the tissues, and it is related to the ratio of intracellular and extracellular water diffusivity. It has been proposed as a potential imaging biomarker to assess treatment response. Hypercellularity or cell swelling causes contraction of the extracellular space, resulting in restricted diffusion of water molecules, as indicated by a low ADC value; on the contrary, tissues with low cellularity or necrotic areas exhibit an increase of ADC value. Combined with signal intensity on high b-value DWI images, ADC has been shown as a good indicator of the biophysical properties of bone metastases [6, 7, 8, 9].
One of the biggest challenges for radiologists when evaluating WB-MRI images is the increased volume of data. Additionally, in patients with a large number of MM lesions and several patterns of bone marrow involvement, the quantification of the total lesion burden by means of segmentation is challenging when using manual delineation, on a slice by slice manner. Furthermore, the choice of a limited number of lesions, occasionally engaged due to time constraints, instead of the total lesion volume may lead to wrong treatment response assessment. Lesion segmentation is challenging and depend on several factors, including low image quality, a large number of lesions and limited contrast resolution. These problems may be solved by utilizing information from multiple contrasts mechanisms and by combining different segmentation algorithms (thresholding, region-growing, clustering) . Although region-growing algorithms to delineate lesions are emerging , significant user’s interaction is still necessary to define seed points or a threshold that segments the lesion. Another big challenge, related to infiltrative bone marrow pathologies, is the inherent difficulty to distinguish between disease infiltration and hematopoietic marrow due to reconversion after treatment or hematopoietic stimuli in patients with MM .
In this work, we propose a novel smart algorithm (SA) that: i) removes organs presented with high signal intensity on WB-DWI (spleen, kidneys, spinal cord, bladder and testis) based on WB atlas registration, ii) restricts the lesion area to the skeleton and nearby areas, also based on a WB atlas registration and iii) segments suspicious areas on DWI images of MM patients, utilizing T1 information. The purpose of the current study was to develop a fast, semi-automatic segmentation method that is robust irrespectively of the types of MM lesion patterns and that could be used to assist radiologists in the accurate quantification of total lesion burden with implications on treatment response assessment.
Patients and imaging protocol
Forty WB-MRI from MM patients, acquired between 2014 and 2018, were consecutively selected by a hematologist from our MM database. The Hemato-Oncology department of our institution keeps record of all patients with MM that are diagnosed and followed in our institution. Four radiologists (3 specialists and one resident) assessed these datasets. The inclusion criteria were that WB-DWI (b value of 0 and 800 or 1000 s/mm2) and WB-T1w were available, that no severe anatomical deformities, distortion artifacts or implants were present and that at least one lesion was found by all radiologists (N = 22). Consequently, a retrospective analysis was performed on these 22 WB-MRI datasets, obtained from 16 patients (10 male age 67 ± 13 years; 6 female age 70 ± 6 years) diagnosed with MM. Two female and three male patients had multiple WB-MRI exams during treatment. From the 22 WB-MRI, at the time of the exam, four had Monoclonal Gammopathy of Unknown Significance (MGUS) and 18 had MM, from which 13 had active disease according to the IMWG criteria; two MGUS and three MM were diagnosed at that date; all the others relapsed. Images were anonymized being nominated by a prefix (WB) and a number (from 01 to 22).
WB-MRI was performed using a 1.5 T MRI scanner (Ingenia, Philips Healthcare, Best, The Netherlands). At least three sequences were acquired namely T1w, STIR and DWI (0 and 800 or 1000 s/mm2 b-value). T2-weighted images were also acquired in five cases.
This dataset was used for validation of the semi-automatic segmentation method proposed.
The semi-automatic segmentation can be divided into three main steps: removal of the hyperintense organs based on WB atlas registration [13, 14]; selection of the skeleton and nearby regions, also based on a WB atlas registration; and, finally semi-automatic segmentation on the selected regions based on threshold methods.
Since we did not have a DWI atlas, we started our work by constructing one, as described below. Once created, it was used in this study but can be applied in other similar studies.
DWI atlas building
Seventy-four high b-value (800 or 1000 s/mm2) DWI images from 74 patients (42 male age 69 ± 10, 25 diagnosed with MM, 11 with prostate carcinoma and 6 with follicular non-Hodgkin lymphoma or carcinoma of the transverse colon or nodular lymphoma or clear cell kidney carcinoma; 32 female age 62 ± 10, 32 diagnosed with MM) were selected and anonymized to build a representative DWI atlas. There was no overlap between this dataset and the one used to validate the SA. The difference in the b-values played no role in the construction of the algorithm. Patients gave written informed consent. This representative DWI atlas was achieved by registering (geometrical alignment) the images to each other and then computing their average.
The registration algorithm used to build the image atlas comprised the computation of an optimal rigid transformation followed by affine and a free-form transformation. Multi-resolution registration with four levels was used on both rigid and affine transformations, while one resolution was used for the free-form. The adaptive stochastic gradient descent optimization algorithm with mutual information as similarity metric was used with a maximum number of iterations of 255. Linear interpolation was used for image resampling. The rigid transformation was initialized based on the image’s geometric center. For the free-form deformation model, a grid of control points was placed over the fixed image and the deformation field was determined based on the displacements of the grid. The transformation between control points was chosen to be propagated by cubic B-spline, since they provide excellent alignments with a low computational cost .
Four radiologists (3 specialists and one last year resident) with 10, 7 and 5 years of experience manually segmented the normal hyperintense organs (spleen, kidneys, spinal cord, bladder, and testis) in the male and female atlases, using a semi-automatic tool Level Tracing Effect of 3D Slicer 4.8.1 . Their segmentations were merged by union, i.e., the gold standard segmentation contains all voxels that were selected for at least one radiologist. This guarantees that all voxels of each organ were included.
MM lesions are mainly located in the skeleton or nearby areas. Exceptions are the extramedullary plasmacytomas which are less common, with an incidence of 7–18% in newly diagnosed MM and 6–20% in the course of the disease [17, 18]. Therefore, in addition to removing the normal hyperintense organs, the lesion search area was restricted to the skeleton and nearby areas. This was achieved by manually delineating, on 3D Slicer, the skeleton of the male and female atlases. The head was excluded due to neck glands and sinus liquid hyperintense tissues. The goal was not to have a perfect segmentation of bones, but rather a restricted area that surrounds them, approximately 2 mm.
At this point, there were two DWI atlases, one per each gender, their skeleton and hyperintense organs segmentations.
Automatic selection of the lesions search regions
The atlas was registered to each high b-value validation DWI image, using the same algorithm used to build the atlas. Then, for each validation image, the transformation found was applied to the pre-segmented atlas, obtaining this way the location of the hyperintense organs and skeleton.
To guarantee that the hyperintense organs were completely removed and that the appendicular skeleton, especially the proximal and long bones, were included, these registered binary segmentations were dilated using a spherical kernel of 4 and 6 mm, respectively.
Finally, the hyperintense organs of the validation images were removed and the skeleton with nearby areas were selected for lesion search.
Threshold and connected component segmentation
Lesion identification was based on the imaging characteristics of MM lesions. The lesions are of high signal intensity on DWI images but with equivalent or lower signal intensity than the muscle on T1w [19, 20, 21, 22].
The manual segmentation was done by the same four radiologists that segmented the normal hyperintense organs in the atlases (E1, E2, E3, and E4) using ITK-SNAP 3.6.0 . Radiologists were given access to all the WB-MRI images available (DWI, T1w, T2w, STIR) in the reconstructed coronal plane. After visual analyzing the images, they proceeded with the manual segmentation of lesions.
Radiologists’ manual segmentation of MM lesion lacks high agreement, which is supported by previous studies regarding inter-observer measurements [25, 26]. To overcome this, a majority voting method was used, where a voxel is considered as lesion if at least three out of four radiologists considered it as such. This was taken as the lesion segmentation gold standard (GS).
Dice similarity coefficient (DSC)  was the metric used to assess the segmentation agreement: SA vs GS and radiologist vs radiologist. Also, sensitivity in lesion detection was assessed for SA and each radiologist in relation to the GS. A lesion was considered correctly detected if it overlapped, at least partially, with a lesion in the GS. A non-parametric Wilcoxon test was performed on IBM SPSS version 25 to compare the distributions of the DSC of the SA against GS and against the mean inter-radiologist DSC.
Smart algorithm lesion segmentation
The threshold values predefined for the lesion segmentation based on the DWI and T1w images were not optimal for all patients. The k value ranged from 2.5 to 7.5, being 4 the most frequent value. The cut-off percentage to exclude the lesions based on the T1w varied between 60 and 99%, being 90% the most frequent.
The duration of the lesion segmentation by the SA was, on average, 10.22 ± 0.86 min, per patient.
Mean DSC ± standard deviation (SD) for SA vs GS and inter-radiologist
DSC ± SD
SA vs GS
0.274 ± 0.227
E1 vs E2
0.353 ± 0.277
E1 vs E3
0.364 ± 0.261
E1 vs E4
0.263 ± 0.252
E2 vs E3
0.367 ± 0.237
E2 vs E4
0.264 ± 0.244
E3 vs E4
0.379 ± 0.295
The DSC measured between radiologists showed a poor agreement, and therefore high segmentation variability. Considering all combinations of the four radiologists, an average DSC of 0.332 ± 0.261 was achieved. Taking into consideration that a DSC above 0.7 is suggestive of very good agreement , this result is much lower than would be generally acceptable. Even considering the two radiologists that showed the highest level of agreement (E3 vs E4 0.379 ± 0.295), the agreement was very low. Mean sensitivity of the radiologists was 0.836 ± 0.285.
Considering only as input the DWI images, without T1w information, the SA algorithm achieved a mean DSC against the GS of 0.229 ± 0.215, sensitivity of 0.872 ± 0.202 and PPV of 0.166 ± 0.179. Its distribution is significantly different from the mean DSC of inter-radiologist segmentation (p = 0.024, Wilcoxon test).
Considering as input the DWI and T1w images, the SA vs GS achieved a mean DSC of 0.274 ± 0.227, sensitivity of 0.764 ± 0.276 and PPV 0.217 ± 0.207. Its distribution is not significantly different from the mean DSC of inter-radiologist segmentation (p = 0.108, Wilcoxon test).
ICC and p-values of histogram metrics between SA and GS
In this work, we developed an algorithm to improve the process of segmentation of lesions of MM in WB-DWI images, to allow accurately and rapidly quantification of total lesion burden, by validating against radiologist’s manual segmentation. Quantification of bone marrow lesions (bright on DWI) volume without considering normally hyperintense organs and other structures was made possible due to the development of an atlas-based and a smart lesion detector algorithm. The first allowed the removal of normal hyperintense organs and restriction of the search area to the skeleton or nearby areas from the images to be studied, using a suitable registration procedure. The second applied an outlier detector algorithm to delineate lesions. The segmentation performance was improved by introducing T1w information. Percentiles of single lesions obtained from the outlier segmentation were compared to the average of the intensity of psoas muscle, on T1w.
We have shown the feasibility of applying this segmentation algorithm in a group of 22 WB-DWI images, corresponding to 16 patients diagnosed with MM, in different stages of the disease and where the majority checks for the active MM criteria.
According to the revised IMWG diagnostic criteria for MM, the presence of at least two focal lesions on MRI studies with more than 5 mm in size is a biomarker of malignancy and one of myeloma defining events. According to the GS segmentation, eighteen images from our dataset had at least 2 lesions. Four images had more than 40 lesions. In normal clinical practice, total lesion volume is not assessed. Instead, the biggest lesions diameters are measured, which may be problematic due to non-spherical lesions and decision variation. If its biggest diameter is less than 5 mm, it is too small to characterize as pathological. On the other hand, might be excluding clinical findings. We opted by not to define a minimum size for lesions, assigning more freedom on this.
The high disagreement between radiologists was confirmed by the low mean DSC of the manual segmentation (0.332 ± 0.261) and the high variance in the number of lesions detected, which is shown in Fig. 4. It should be noted that, in general, the higher the number of lesions, the higher the disagreement between the radiologists (Fig. 4), which can be explained by the fatigue and complexity when having to analyze more than 60 images per sequence and counting four sequences. Furthermore, manual delineation hindered the reading process, making errors more propitious. This small test allowed to have a proof of concept about the disagreement between radiologists when defining situations with multiple lesions as in MM, which stresses the need for the development of a robust tool to assist radiologists to better and more rapidly identify lesions and to segment them. This divergence shows how difficult it is to detect and segment lesions in the common clinical practice, which results in misinterpretations between experts of the same field. Tight schedules, high workflow and lack of proper segmentation tools maybe some of the reasons for this low agreement.
The DSC of the smart-segmentation algorithm is not significantly different from the majority voting of radiologists (p = 0.108, Wilcoxon test), achieving a very good sensitivity (0.764 ± 0.276). The SA succeeds in identifying lesions as such but still has false positives. One cause of false positives is the presence of hyperintensities at the periphery of the field of view due to radiofrequency coil effects. When these hyperintense areas are in muscle, for instance, they may not be removed based on the T1w, and thus wrongly classified. Although the agreement SA vs GS is not the ideal, the results show that our method provides equally reproducible segmentations than the manual. Preliminary results obtained in the development stage have shown that using both DWI and T1w as inputs seem to improve the segmentation performance compared to using only the DWI images. This was expected since it is accessing further lesion information, which may help to distinguish between MM lesion and hematopoietic marrow due to reconversion.
The ICC, considering the correctly identified lesions by the SA and the GS, was 0.894 for the total lesion volume and 0.996 for the median ADC values (p < 0.001), which shows the reliability of the volume and median ADC measurements using the SA. This information could be used to assess treatment response by measurements of ADC and lesion volume changes between pre- and post-treatment scans and to distinguish between different MM patterns, for instance, diffuse from normal .
To our knowledge, there is no previous relevant work describing a semi-automatic segmentation of MM lesions on WB-DWI, using an atlas-based approach and comparing it to the segmentation of radiologists. Thus, the performance of our method cannot be directly compared to any method described in the literature. However, in a previous study, a semi-automatic segmentation model using a Markov Random Field was used to infer tumor diffusion volume and ADC, using WB-DWI . Although the results were quite promising, it still requires a lot of user interaction to define the contrast between the lesion and normal tissues and to define a threshold that covers the lesions. Also, this segmentation algorithm was not validated against radiologists, since they only evaluated based on responder/non-responder to treatment. Plus, the associated computational time is of the order of 30 min. Our method benefits from the inclusion of two sequences in the analysis, which increases the confidence in the results and is faster (computational time of approximately 10 min). The latter further supports the notion that optimized semi-automatic registration methods combined with algorithms that quantitatively analyze DWI and T1w images can be used to assist radiologists while defining the total lesion burden. Also, the high ICC of the lesion volume and of the median ADC with that of GS may show the potential of extracting this global measurement out of the segmentation and use it as an indicator of response to therapy.
As far as metabolic imaging is concerned, lytic lesions and extramedullary masses present as avid for 18F-fluorodeoxyglucose (18F-FDG). Thus, positron emission tomography/computed tomography (PET/CT) exhibits high value for its identification, besides distinguishing active from inactive metabolic lesions due to the combination of functional and morphological information. According to the IMWG, 18F-FDG PET/CT is recommended to distinguish active from smoldering MM only when WB-MRI is unavailable and WB X-ray is negative . MRI remains as the preferred and more sensitive method for the assessment of diffuse bone marrow involvement of the spine.
Among the limitations of this study is the fact that for technical reasons, some of the selected images suffered from artifactual hyperintensities that contaminated the results. There is a need for the image acquisition technique to be meticulous, thus minimizing these effects. Another limitation was the exclusion of skull and neck lesions due to technical problems in the raw images, which reduce the accuracy to quantify total lesion burden. Nonetheless, we verified that no radiologist segmented lesions above the neck. Also, the fact that the lesion area is restricted to the skeleton and nearby areas maybe excluding extramedullary plasmacytomas. However, this would not have a significant impact on the total lesion volume assessment, since the majority of lesions are found in the spine, proximal and long bones. Among the 22 WB images evaluated, 15 had previous treatment, thus T2 shine-through effect is not completely excluded just by using DWI and T1w images. Moreover, the incorporation of ADC maps or other sequences, as STIR, opposed-phase and post-contrast T1w images may increase the reliability of the segmentation helping to eliminate false positives, what would benefit this study. Another limitation is the fact that the radiologists’ reading, and segmentation time was not measured in this study. Thus, their time performance was not yet compared with the algorithm. Even though, just for a simple reference, a recent report from the Sociedad Española de Radiología Médica state that radiologists should be given 40 min to analyze WB-MRI datasets , without segmentation.
The proposed method provides equivalent segmentations to the manual ones from radiologists on MM lesion and reproducible results on ADC histogram metrics. This study can be considered as part of the development of automatic detection and segmentation methods for MM lesions, with high sensitivity. The current smart segmentation algorithm might be useful to perform a more objective evaluation of total lesion volume and ADC quantification in order to provide a more accurate assessment of treatment response in patients with MM.
Study Concepts: SDA, JS, FPMO, CM, CJ, NP. Data Acquisition: JS. Quality Control of Data: SDA, JS, JI, ML, JL, AU, CM, NP. Data Analysis and Interpretations: SDA, JS, FPMO, JI, ML, JL, AU, CM. Statistical Analysis: SDA, JS, FPMO, CJ, NP. Manuscript Preparation: SDA, FPMO. Manuscript Editing and Reviewing: JS, FPMO, JI, ML, JL, AU, CM, CJ, NP. All authors read and approved the final manuscript.
Ethics approval and consent to participate
The study was approved by the Ethics Research Committee of the Champalimaud Clinical Centre (CCC) while written patient informed consent was waved due to the retrospective, observational nature of the study.
Consent for publication
The authors declare that they have no competing interests.
- 15.Marstal K, Berendsen F, Staring M, Klein S. SimpleElastix: A User-Friendly, Multi-lingual Library for Medical Image Registration, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW): IEEE; 2016. p. 574–82. https://ieeexplore.ieee.org/document/7789568.
- 19.Koutoulidis V, Fontara S, Terpos E, Zagouri F, Matsaridis D, Christoulas D, et al. Quantitative diffusion-weighted imaging of the bone marrow: an adjunct tool for the diagnosis of a diffuse MR imaging pattern in patients with multiple myeloma. Radiology. 2016;000:160363.Google Scholar
- 23.Tukey JW. Exploratory data analysis. Biom J. 1981;23:413–4.Google Scholar
- 29.Blackledge MD, Collins DJ, Tunariu N, Orton MR, Padhani AR, Leach MO, et al. Assessment of treatment response by total tumor volume and global apparent diffusion coefficient using diffusion-weighted MRI in patients with metastatic bone disease: a feasibility study. PLoS One. 2014;9:1–9.CrossRefGoogle Scholar
- 30.Cavo M, Terpos E, Nanni C, Moreau P, Lentzsch S, Zweegman S, et al. Role of18F-FDG PET/CT in the diagnosis and management of multiple myeloma and other plasma cell disorders: a consensus statement by the international myeloma working group. Lancet Oncol. Elsevier Ltd. 2017;18:e206–17.CrossRefGoogle Scholar
- 31.Sociedad Española de Radiología Médica. Listado de exploraciones; 2016. p. 1–15. Available from: https://www.seram.es/~josepr23/sociedades/SERAM/images/site/catalogo/catalogo_2016_exploraciones.pdf. Cited 2019 May 10Google Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.