Study design and population
This was a prospective observational diagnostic test accuracy study, approved by the institutional review board. Patients with a newly established diagnosis of myeloma as per IMWG criteria, who were scheduled to undergo both WB-CT and WB-MRI before starting treatment between 2013 and 2017, were prospectively included after giving written informed consent. Patients with second malignancies were excluded.
Low-dose WB-CT (mean radiation dose 5 mSv) was acquired with a 128-slice CT scanner (Somatom Definition Flash, Siemens Healthineers), 120 kV, 50 mAs, and 0.5-s pitch. Axial images were reconstructed to 3 mm for review. All subjects were scanned supine with arms by their sides and the images were acquired from the skull vertex to the toes. No intravenous iodinated contrast was administered. Axial images of 1-mm slice thickness were reconstructed to secondary coronal and sagittal images for review. Axial images for bone and soft tissue assessment were reconstructed from the raw data obtained during scanning: for bone assessment using sharp (B50f) kernel and for soft tissue assessment using soft (B20f) kernel. Secondary coronal and sagittal reconstructions were generated using a slice thickness of 2 mm and slice increment of 1.5 mm. The typical duration for WB-CT examination was less than 5 min. The dose-length products (DLP) for the WB-CT examinations were recorded.
WB-MRI studies were performed using an Avanto 1.5-T system (Siemens Healthineers). All subjects were scanned supine with arms by their sides. Coil elements were positioned from the skull vertex to the knees. Sagittal T1-weighted images (TR 590 ms, TE 11 ms, FOV 400 mm, slice thickness 4 mm) and T2-weighted images (TR 2690 ms, TE 93 ms, FOV 400 mm, slice thickness 4 mm) of the spine were acquired, followed by axial diffusion-weighted sequences (single-shot double spin-echo echo-planar technique with STIR fat suppression in free breathing) using b values of 50 and 900 s/mm² applied in 3 orthogonal directions and combined into isotropic trace images. Diffusion-weighted images were acquired in multiple contiguous stations of 50 slices per station (slice thickness 5 mm, no gap, FOV 430 mm, phase direction AP, parallel imaging (GRAPPA) factor 2, TR 14800 ms, TE 66 ms, inversion time (TI) 180 ms, voxel size 2.9 mm × 2.9 mm × 5 mm, number of signal averages 4, matrix 150 × 150, bandwidth 1960 Hz per pixel). Axial T1-weighted VIBE Dixon 3D gradient echo breath-hold sequences (52 slices per slab, FOV 470 mm, TR/TE 7/2.38, 4.76 ms, flip angle 30°, matrix 192 × 192) were also acquired, matching the acquisition stacks and partition thickness to the DWI. No intravenous gadolinium contrast was used. The typical duration of a WB-MRI examination was 45 min.
For each body region (skull, cervical spine, thoracic spine, lumbar spine, pelvis, ribs/other, long bones), two radiologists, each with > 10 years of experience and blinded to clinical information and the MRI findings, categorised disease burden on WB-CT using a previously described scoring system [4, 13]. For each body region, this system assesses the number of lesions (> 20, 10–20, < 10, 0) and the largest lesion dimension (> 10, 5–10, < 5, 0 mm), assigning a score from 3 to 0 for each characteristic: for lesion number, score 3 for > 20 lesions, score 2 for 10–20 lesions, score 1 for < 10 lesions and score 0 for no lesions; for lesion size, score 3 for > 10 mm, score 2 for 5–10 mm, score 1 for < 5 mm and score 0 for no lesions. The maximum lesion dimension was measured on the window setting in which the lesion was most readily appreciated. A total score was then calculated for the whole skeleton. To obtain the final observer scores, discrepancies were resolved by a consensus reading facilitated by a third experienced radiologist. At a different time, the image reading was repeated for the WB-MRI data, with readers blinded to the clinical information and CT findings. On WB-MRI, the maximum lesion dimension was measured on the sequence in which the lesion was most readily appreciated. The image reading for WB-CT and WB-MRI was subsequently repeated by a second pair of readers, both junior radiologists (< 1 year of experience as a consultant).
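The per-region scoring rule described above can be illustrated with a minimal Python sketch. The function names and the input layout are hypothetical and chosen only for illustration. The sketch also assumes that the whole-skeleton total is the sum of the lesion-number and lesion-size scores over the seven regions, which the text does not state explicitly.

```python
# Illustrative sketch of the per-region scoring rule; names are hypothetical.
REGIONS = (
    "skull", "cervical spine", "thoracic spine", "lumbar spine",
    "pelvis", "ribs/other", "long bones",
)


def lesion_number_score(n_lesions):
    """Score 3 for > 20 lesions, 2 for 10-20, 1 for < 10, 0 for none."""
    if n_lesions > 20:
        return 3
    if n_lesions >= 10:
        return 2
    if n_lesions > 0:
        return 1
    return 0


def lesion_size_score(max_dim_mm):
    """Score 3 for > 10 mm, 2 for 5-10 mm, 1 for < 5 mm, 0 for no lesion."""
    if max_dim_mm > 10:
        return 3
    if max_dim_mm >= 5:
        return 2
    if max_dim_mm > 0:
        return 1
    return 0


def total_score(findings):
    """Sum both scores over all regions.

    `findings` maps a region name to (number of lesions, largest dimension in mm).
    Regions missing from the mapping are treated as lesion-free.
    ASSUMPTION: the total is a plain sum of number and size scores per region.
    """
    total = 0
    for region in REGIONS:
        n_lesions, max_dim_mm = findings.get(region, (0, 0.0))
        total += lesion_number_score(n_lesions) + lesion_size_score(max_dim_mm)
    return total


# Made-up example input, not study data.
example = {"skull": (25, 12.0), "pelvis": (3, 4.0)}
print(total_score(example))
```

Under this summation assumption, the total ranges from 0 (no lesions in any region) to 42 (both characteristics scored 3 in all seven regions).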
Intraclass correlation coefficient (ICC) estimates and their 95% confidence intervals (95% CI) were calculated using a two-way random-effects, absolute-agreement, single-measures model. Statistical analyses were performed using IBM SPSS Statistics for Windows Version 25.0 (SPSS Inc.). ICC values less than 0.5 were considered indicative of poor reliability, values between 0.5 and 0.75 of moderate reliability, values between 0.75 and 0.9 of good reliability, and values greater than 0.9 of excellent reliability. With two observers, to detect a minimum ICC of 0.5 using a two-sided test with a pre-specified significance level of 5% (α = 0.05) and a power of 80% (β = 0.2), the required sample size is approximately 22. The median and interquartile range (IQR) of the consensus observer scores on WB-MRI were compared with those on WB-CT, and the Wilcoxon signed-rank test was used to test the null hypothesis that the average signed rank of the two samples is zero. Spearman’s rank correlation coefficients were used to evaluate whether the WB-MRI and WB-CT scores of one observer correlated with the analogous scores of the other observer on a per-region and per-patient basis. A value of p < 0.05 was taken to be statistically significant in all tests.
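As an illustration of the reliability measure named above, the following Python sketch computes the point estimate of a two-way random-effects, absolute-agreement, single-measures ICC, often written ICC(2,1), from the standard ANOVA mean squares. It is not the SPSS procedure used in the study and it omits the confidence interval. The function names are hypothetical, and assigning a value of exactly 0.75 to the "good" band is an assumption, since the stated ranges share that boundary.

```python
# Illustrative sketch only; not the SPSS routine used for the analysis.

def icc_2_1(ratings):
    """Point estimate of ICC(2,1) for an n-subjects x k-observers table.

    `ratings` is a list of rows, one row per subject, one column per observer.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (subjects, observers, residual).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-observer mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def interpret_icc(icc):
    """Map an ICC value to the reliability bands quoted in the text.

    ASSUMPTION: a value of exactly 0.75 is placed in the "good" band.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Made-up scores from two observers for five patients, not study data.
scores = [[10, 11], [4, 4], [22, 20], [0, 1], [15, 15]]
value = icc_2_1(scores)
print(round(value, 3), interpret_icc(value))
```

The absolute-agreement form penalises systematic offsets between observers through the between-observer mean square, so two readers whose scores correlate perfectly but differ by a constant would still score below 1.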