Background

Combined PET/MRI has gradually gained ground in routine clinical practice [1, 2]. The feasibility and utility of PET/MRI for evaluating a single body region have been highlighted in several recent reports [3, 4]. Its whole-body application has also gained attention lately as a potentially effective tool for the evaluation of oncologic diseases and detection of malignancies in various organs [5]. However, one of the major limitations of PET/MRI has been the lack of reliable attenuation correction (AC), which is required for accurate quantification of the radiotracer distribution. Several studies have developed innovative techniques for calculating accurate MR-based AC maps that have been shown to improve the quantitative accuracy of PET/MRI on a regional level [6,7,8,9]. However, a demonstration of a whole-body implementation, covering skull to feet, is still lacking.

There is no unique global mapping from MRI intensities to attenuation coefficients. Commercially available MR-based AC methods are based on a quickly acquired Dixon MRI sequence [10], combined with segmentation of different tissue classes and assignment of appropriate linear attenuation coefficients (LACs) to each class [11]. The segmentation can be combined with registration to atlas templates providing constant LACs for bone [12, 13]. However, such methods are not patient-specific, and inaccurate registration between images and bone models can increase the risk of errors in PET quantification. Alternative approaches proposed the acquisition of ultrashort echo time (UTE) [14, 15] or zero echo time (ZTE) [16, 17] sequences to obtain a patient-specific signal from bone. The performance of these methods is limited by issues such as a high level of noise or long acquisition times, which are not suitable for breath-hold imaging.

Moreover, a major limitation of conventional MR-based AC methods has been truncation due to the limited transaxial field of view (FOV) of MRI compared to PET [18]. Different approaches have been proposed to estimate the missing part of the anatomy from non-attenuation-corrected (NAC) PET [19] or using the maximum-likelihood reconstruction of activity and attenuation (MLAA) algorithm [20]. Alternatively, a fully MR-based approach with B0 homogenization using gradient enhancement (HUGE) sequence was proposed to compensate for the B0 inhomogeneities at the periphery of an increased MRI FOV [21]. The method showed significantly improved PET quantification but replaced the truncated anatomy with a single LAC value.

Recently, artificial intelligence algorithms based on deep learning (DL), convolutional neural networks (CNNs), and generative adversarial networks (GANs) have demonstrated remarkable potential as alternatives to conventional AC methods [22,23,24,25]. This approach can be time effective and robust to individual patient variations. Many DL-based studies have focused on cross-domain image translation to derive synthetic CT (sCT) images directly from MR images [26, 27]. This approach was initially developed for brain imaging due to the low geometric variance across different imaging modalities [26] but has proven robust and accurate, especially when using a large training cohort [28]. Beyond the brain, several studies have explored DL-based AC for other regions, including head and neck [29] and pelvis [30, 31]. While these studies offer important contributions to improving PET AC in specific anatomical regions, the limited FOV hampers clinical adoption for whole-body PET/MRI. A few studies attempted to extend their methods to AC of whole-body PET/MRI [32, 33], but their results were not anatomically accurate across the entire body, possibly due to the challenges of producing well-aligned reference data throughout the body. To overcome this issue, DL-based methods have been proposed to generate sCT from NAC PET images [34] or to generate AC PET directly from emission data [35]. However, NAC PET images do not provide explicit information about photon attenuation but rely on information from the PET tracer of choice. Therefore, models generating sCT from NAC PET become tracer dependent and might work best for non-specific tracers like FDG that provide signal from all parts of the body.

With this study, we aim to infer voxel-based sCT suitable for whole-body PET/MRI, covering skull to feet, using standard Dixon MR images routinely acquired for AC purposes. The method is based on paired whole-body Dixon MRI and CT data and a deep learning model and is evaluated with reference to a CT-based PET reconstruction.

Materials and methods

Patients

The data in this study were acquired in two groups. The first was a validation cohort of fifteen patients diagnosed with malignant melanoma who underwent whole-body imaging, consisting of a clinically indicated 18F-FDG-PET/CT examination and a subsequent whole-body PET/MRI acquisition. No additional radiotracer was injected for the PET/MRI acquisition. The second group was employed for model development and consisted of 11 patients scanned over the head and neck (2 bed positions) and 20 patients scanned over the thorax and pelvis (2 bed positions). The imaging protocols and detailed information regarding these patients are described in previous studies [9, 29].

Written informed consent was obtained from all patients before the examination.

Image acquisition

CT data

Each whole-body 18F-FDG-PET/CT examination (Biograph mCT, Siemens Healthineers) was performed with the arms positioned alongside the body. Approximately 3 MBq/kg of 18F-FDG was injected intravenously about 60 min prior to image acquisition. The standardized CT examination protocol included a weight-adapted 90–120 ml dose of intravenous CT contrast agent, as part of the clinical routine. CT imaging was performed with a tube voltage of 120 kV, 0.98 × 0.98 × 2 mm3 voxels, and a reference mAs of 240 using CARE Dose 4D. Data were acquired from the vertex of the skull to the toes using continuous table motion acquisition, except for two patients who were scanned from the skull to mid-thigh.

18F-FDG-PET/MRI data

All PET/MRI examinations were performed on a 3 T PET/MRI whole-body hybrid imaging system (Biograph mMR, Siemens Healthineers) covering the same anatomy as in PET/CT. PET data were acquired in list mode over seven to eight bed positions with an acquisition time of 3 min per bed position.

For MR-based AC, the standard transaxial two-point Dixon, three-dimensional (3D), volume-interpolated, T1-weighted breath-hold MRI sequence (VIBE) was acquired utilizing a head/neck RF coil, a spine-array RF coil, and five flexible body-array RF coils. The sequence was acquired with a repetition time (TR) of 3.85 ms, echo times (TE) of 1.23 and 2.46 ms, a 10° flip angle, and an in-plane matrix of 384 × 312 pixels. It provided four sets of MR images, including T1-weighted in- and opposed-phase, fat, and water images, with a transaxial MRI FOV of 500 mm × 408 mm.

To prevent truncation of the peripheral body parts due to the limited transaxial MRI FOV, the HUGE method was applied [21]. The sequence was acquired in the left and right directions with a repetition time (TR) of 1610 ms, an echo time (TE) of 27 ms, a 180° flip angle, an in-plane resolution of 2.3 × 2.3 mm2, and 8 mm slice thickness.

Data preprocessing

Images were preprocessed using Python (version 3.7) and MINC Toolkit (McConnell Brain Imaging Centre, Montreal).

Truncation correction

The composed Dixon in-phase and opposed-phase images were combined with images acquired with the HUGE technique to correct for truncation. HUGE images from the left and right sides of the body were resampled to match the resolution of the Dixon images using trilinear interpolation. Histogram normalization was then performed twice using the inormalize tool (version 1.5.1 for OS X, part of the MINC toolkit) to match the intensities of the Dixon in-phase and opposed-phase images. The resampled and normalized HUGE images were then used to replace voxels in the Dixon images where the arms were truncated. The Dixon images were extended by 18 voxels on each side, for a total transaxial FOV of 576 mm × 374 mm.
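The voxel-replacement step can be illustrated with a minimal sketch, assuming the HUGE volumes have already been resampled to the extended Dixon grid and intensity-normalized as described above; the array names and padding logic are illustrative and not the authors' code.

```python
# Minimal sketch of the truncation-correction step (illustrative, not the
# authors' implementation). `dixon`, `huge_left`, and `huge_right` are assumed
# to be numpy arrays on the same (already extended) voxel grid, with the HUGE
# images resampled and intensity-normalized beforehand.
import numpy as np

def fill_truncated_arms(dixon: np.ndarray, huge_left: np.ndarray,
                        huge_right: np.ndarray, pad: int = 18) -> np.ndarray:
    """Extend the Dixon FOV transaxially and fill truncated voxels from HUGE."""
    # Extend by 18 voxels on each side of the left-right axis
    extended = np.pad(dixon, ((pad, pad), (0, 0), (0, 0)), mode="constant")

    huge = np.maximum(huge_left, huge_right)      # combined HUGE volume
    truncated = (extended == 0) & (huge > 0)      # voxels missing in the Dixon FOV
    extended[truncated] = huge[truncated]         # replace with HUGE signal
    return extended
```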

CT to MRI registration

Whole-body co-registration for the validation cohort was challenging due to the different positioning of patients between the two scans. To ensure accurate co-registration of the paired whole-body CT and MRI images, the Dixon in-phase images were cropped into sub-volumes and registration was performed independently for the various anatomical sites. Sub-volumes were delimited by joints with a high degree of freedom, yielding anatomical regions such as head, neck, torso, pelvis, upper arm, lower arm, upper leg, lower leg, and feet. For each sub-volume, registration was performed in two steps. First, CT images were rigidly aligned to the corresponding Dixon in-phase images using a set of landmarks. The resulting transformation was used to initialize a deformable registration with the freely available registration package NiftyReg [36] (Centre for Medical Image Computing, University College London). Sub-volumes were drawn with an overlap of two voxels, in which the co-registered CT was averaged. Finally, the co-registered sub-volumes were stitched together to form the whole-body co-registered CT volume. A thorough visual inspection was performed to validate each individual registration.

Registration between CT and MRI for regional data was less challenging due to similar positioning between the two scans. However, registration was performed for each bed position separately using the method described above.
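As an illustration of the two-step registration described above, the sketch below calls NiftyReg's deformable registration tool from Python, initialized with a landmark-derived affine transform; the file names and wrapper function are hypothetical and only indicate how such a step could be scripted.

```python
# Sketch of a sub-volume CT-to-MRI registration using NiftyReg's command-line
# tools via subprocess. File paths and the wrapper are hypothetical; only the
# reg_f3d invocation with its standard options is shown.
import subprocess

def register_subvolume(dixon_ip: str, ct: str, affine: str, out_prefix: str) -> None:
    """Deformably register a CT sub-volume to the Dixon in-phase sub-volume,
    initialized with a rigid/affine transform obtained from landmarks."""
    subprocess.run(
        [
            "reg_f3d",
            "-ref", dixon_ip,                          # fixed image (Dixon in-phase)
            "-flo", ct,                                # moving image (CT)
            "-aff", affine,                            # landmark-based initialization
            "-res", f"{out_prefix}_ct_coreg.nii.gz",   # resampled, co-registered CT
            "-cpp", f"{out_prefix}_cpp.nii.gz",        # control-point grid (deformation)
        ],
        check=True,
    )
```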

Network structure

The deep CNN with 3D U-Net architecture [37] used in this study consists of convolutional encoder and decoder paths for generating sCT, expressed in LAC, from MR images (as shown in Additional file 1: Fig. S1). The overall architecture of our model is a slight modification of the DeepDixon network presented by Ladefoged et al. [23]. Dixon in-phase and opposed-phase MR images were the inputs to the network and were integrated by the first convolutional layer with 32 different kernels. Each of the encoder and decoder paths contains convolutional layers with 3 × 3 × 3 kernels, followed by batch normalization (BN) for faster convergence and a rectified linear unit (ReLU) activation function. At the end of each convolution block, a similar convolutional layer with a stride of 2 is attached for downsampling.
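A minimal TensorFlow/Keras sketch of one encoder level consistent with this description is given below; the full network has additional levels and a symmetric decoder with skip connections following the DeepDixon layout, so the code is illustrative only.

```python
# Illustrative sketch of one encoder level of the 3D U-Net described above:
# a 3x3x3 convolution with 32 kernels, batch normalization, ReLU, and a
# stride-2 convolution for downsampling. The complete network adds further
# levels and a symmetric decoder with skip connections (not shown).
import tensorflow as tf
from tensorflow.keras import layers

def encoder_level(x, filters):
    x = layers.Conv3D(filters, kernel_size=3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    skip = x                                                                  # forwarded to the decoder
    x = layers.Conv3D(filters, kernel_size=3, strides=2, padding="same")(x)   # downsampling
    return x, skip

# 2-channel input: Dixon in-phase and opposed-phase patches of 16 slices
inputs = tf.keras.Input(shape=(288, 192, 16, 2))
x, skip1 = encoder_level(inputs, 32)   # first level uses 32 kernels
```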

Model training

The training of the model was performed using pairs of Dixon MRI and CT-based AC volumes. Prior to training, images were resampled to an isotropic voxel size of 2 mm and normalized to zero mean and unit standard deviation. A binary mask derived from the Dixon in-phase images was used to clear the CT images of elements outside the body before transforming the voxel values into LACs at 511 keV. Subsequently, we extracted 3D patches from volumes of 288 × 192 × S voxels with a stride of 4, where S refers to the number of slices and varies between patients depending on their height. The networks were implemented in TensorFlow (version 2.1.0). Our model used mean absolute error as the loss function and the Adam optimizer with a learning rate of 5 × 10−5, and was trained for 1000 epochs with a batch size of 16 (random selection of patches). Computations were performed on two IBM Power9 systems, each equipped with four Nvidia Tesla V100 GPUs, and a Lenovo SR650 V2 with four Nvidia A40 GPUs. The networks used 3D MRI volumes as a 2-channel input consisting of 16 full adjacent transaxial slices (288 × 192 × 16 voxels) and output the corresponding slices of sCT in LAC (288 × 192 × 16 voxels). The first model was trained using the regional data covering the thorax, pelvis, and head and neck regions, with transfer learning from a model pretrained on 811 brain scans [23]. Subsequently, we created an updated model using the whole-body database with transfer learning from the first model. In this step, we used a leave-one-out cross-validation approach. To avoid artificially increased bias, CT slices with metal artifacts from the pelvis and knee were not included in the training. For each whole-body dataset, the sCT volume was generated from the MRI volumes.
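The training configuration can be summarized by the hedged sketch below; the single-layer stand-in model and random patch pairs are placeholders, not the study's 3D U-Net or Dixon/LAC data, and are included only so the snippet runs as written.

```python
# Illustrative training setup matching the reported hyperparameters (MAE loss,
# Adam with a learning rate of 5e-5, batch size 16). A trivial single-layer
# stand-in model and random patches are used so the snippet is self-contained;
# in the study the model is the 3D U-Net and the patches are Dixon/LAC pairs.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(288, 192, 16, 2)),
    layers.Conv3D(1, kernel_size=3, padding="same"),   # stand-in for the U-Net
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss="mean_absolute_error",
)

# Random patch pairs as placeholders for the extracted Dixon/LAC patches
train_patches = np.random.rand(16, 288, 192, 16, 2).astype("float32")
train_targets = np.random.rand(16, 288, 192, 16, 1).astype("float32")
model.fit(train_patches, train_targets, batch_size=16, epochs=1)  # 1000 epochs in the study
```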

Image reconstruction

PET images from raw data acquired on the PET/MRI scanner were reconstructed offline (e7-tools, Siemens Healthineers) using the 3D ordinary Poisson ordered-subset expectation maximization (3D OP-OSEM) algorithm with 3 iterations and 21 subsets in a 344 × 344 image matrix, and a Gaussian filter with 4 mm full width at half maximum (FWHM). For each patient, PET images were reconstructed using three different attenuation maps: the co-registered CT-based AC map serving as the standard of reference (PETCT), the deep learning-derived sCT map (PETsCT), and the vendor-provided atlas-based map (PETAtlas).

Analysis

Data analysis involved resampling all reconstructed PET images to match the voxel size of the attenuation map.
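This resampling step could be performed, for example, with nibabel as in the short sketch below; the file names are hypothetical and the snippet is not the analysis code used in the study.

```python
# Sketch of resampling a reconstructed PET volume onto the attenuation-map
# grid using nibabel (file names are hypothetical).
import nibabel as nib
from nibabel.processing import resample_from_to

pet = nib.load("pet_reconstructed.nii.gz")
mumap = nib.load("attenuation_map.nii.gz")
pet_on_mumap_grid = resample_from_to(pet, mumap, order=1)   # trilinear interpolation
nib.save(pet_on_mumap_grid, "pet_resampled.nii.gz")
```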

sCT evaluation

The generated sCT images in LAC were converted back to HU and compared to CT on a voxel-wise basis.

For each patient, the accuracy of sCT relative to CT was assessed by measuring the mean absolute error (MAE) within the body contour. The ability of the methods to correctly estimate bony tissue was evaluated using the Dice similarity coefficient, which measures the overlap between the bones segmented on the sCT and CT volumes. In both images, voxels with HU higher than 300 were classified as bone.
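These two evaluation metrics can be written compactly as in the sketch below, assuming aligned numpy volumes in HU and a boolean body-contour mask; the names are illustrative.

```python
# Sketch of the voxel-wise evaluation metrics described above: MAE within the
# body contour and the Dice coefficient for bone (HU > 300). `sct_hu`, `ct_hu`,
# and `body_mask` are assumed to be aligned numpy volumes.
import numpy as np

def mae_within_body(sct_hu: np.ndarray, ct_hu: np.ndarray, body_mask: np.ndarray) -> float:
    return float(np.mean(np.abs(sct_hu[body_mask] - ct_hu[body_mask])))

def bone_dice(sct_hu: np.ndarray, ct_hu: np.ndarray, threshold: float = 300.0) -> float:
    sct_bone = sct_hu > threshold
    ct_bone = ct_hu > threshold
    intersection = np.logical_and(sct_bone, ct_bone).sum()
    return float(2.0 * intersection / (sct_bone.sum() + ct_bone.sum()))
```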

PET evaluation

For quantitative assessment, the intensity values in the PET images were converted to standardized uptake values (SUV). Considering PETCT as the ground truth, the performance of the MR-based AC methods in quantifying radiotracer uptake in PETsCT and PETAtlas was compared for the entire body as well as for specific regions including lung, liver, spinal cord, femoral head, iliac bone, and aorta. These regions were manually segmented on the reference CT. Quantitative accuracy was compared using the relative difference (Rel%) and absolute relative difference (Abs%) for all voxels within the above-mentioned regions, using the following formulas:

$$\text{Rel}\% = \frac{\text{PET}_{X} - \text{PET}_{\text{CT}}}{\text{PET}_{\text{CT}}} \times 100\%$$
$$\text{Abs}\% = \frac{\left|\text{PET}_{X} - \text{PET}_{\text{CT}}\right|}{\text{PET}_{\text{CT}}} \times 100\%$$

For a fair comparison, PET slices with metal artifacts in the CT-based AC map were ignored in the voxel-wise calculation of relative differences. The mean SUV (SUVmean) was calculated for the segmented regions.
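A short sketch of the voxel-wise Rel% and Abs% computation for one segmented region, excluding metal-artifact slices, is given below; the variable names are illustrative and not the study's analysis code.

```python
# Sketch of the voxel-wise Rel% and Abs% computation for one segmented region,
# following the formulas above. `artifact_slices` indexes transaxial slices with
# metal artifacts in the CT-based AC map, which are excluded from the comparison.
import numpy as np

def pet_errors(pet_x, pet_ct, region_mask, artifact_slices=None):
    mask = region_mask & (pet_ct > 0)            # restrict to region, avoid division by zero
    if artifact_slices is not None:
        mask[..., artifact_slices] = False       # drop corrupted slices
    rel = (pet_x[mask] - pet_ct[mask]) / pet_ct[mask] * 100.0
    return rel.mean(), np.abs(rel).mean()        # mean Rel%, mean Abs%
```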

Moreover, all voxels within the specified regions were pooled over all subjects, and the accuracy of PET quantification with the two MR-based AC maps was compared to PETCT in a joint histogram.

Results

Among the validation cohort of fifteen patients, one suffered from severe scoliosis. Two patients had hip and knee implants; the slices with artifacts were excluded from training. In addition, one patient was scanned with the arms up in PET/CT, and the slices containing the arms were therefore excluded from training.

Accuracy of sCT

Figure 1 illustrates the comparison between the MR-based AC maps and the reference CT of a representative patient. The CT images of some patients showed streaking artifacts due to dental implants. However, the CNN could generate sCT without these artifacts, as exemplified in Fig. 1. The sCT appeared very similar to the CT except for some discrepancies in the bony structures and the location of air cavities. The HU difference between the sCT and CT was smaller than that of the vendor-provided atlas-based map, with MAEs within the body contour of 62 ± 110 HU and 109 ± 202 HU, respectively. However, there is a slight underestimation of the sCT HU values, as illustrated by the overall redness of the HU difference map in Fig. 1. The mean Dice coefficients for bone in the sCT and atlas-based maps were 0.68 and 0.34, respectively.

Fig. 1
figure 1

Whole-body coronal images of a representative patient as well as transaxial views of various intersections. Presented from left to right: reference CT, deep learning-derived sCT, vendor-provided atlas-based map, the HU difference between CT and sCT, and the HU difference between CT and atlas

Effect on PET AC

The PET images reconstructed with the different AC maps, PETCT, PETsCT, and PETAtlas, along with the percentage difference maps, are shown in Fig. 2. According to the difference maps, the DL-based method exhibited overall superior performance compared to the vendor-provided method. PETAtlas demonstrated the effect of errors in bone registration and the absence of bone information in some regions, where the activity around the bony regions is underestimated. A similar pattern can be observed for PETsCT, but with a considerably lower deviation. To further evaluate the performance of the different AC maps in the entire body as well as at a regional level, joint histograms are shown in Fig. 3. The SUV values of PETsCT (R2 = 0.98) are distributed closer to the line of identity with PETCT than those of PETAtlas (R2 = 0.96).

Fig. 2
figure 2

Representative images showing reference PETCT, PETsCT, and PETAtlas images as well as the corresponding relative difference map of PETsCT and PETAtlas as compared to the reference PETCT

Fig. 3
figure 3

Joint histograms for A PETCT and PETsCT, and B PETCT and PETAtlas of voxels within the body contour as well as lung, bone, and liver mask

PETsCT produced an average Abs% of 6.1% ± 11.9% and an average Rel% of 1.5% ± 14.1% in SUV values for all voxels within the body contour across all patients, while the corresponding metrics for PETAtlas were 11.8% ± 23.6% and − 0.5% ± 26.2%, respectively. Table 1 provides the average Abs% and Rel% of SUV values in PETsCT and PETAtlas for different anatomical regions. The comparison between the two methods in the different regions is more evident in Fig. 4. PETsCT generally has lower errors than PETAtlas within all regions of interest except the liver. For both methods, the PET uptake in the lung is underestimated and has the highest variation compared to other regions, while the uptake in the liver has the least variation. The standard deviations of the SUV differences with respect to the CT-based AC are consistently lower for PETsCT than for PETAtlas.

Table 1 Statistics for PET quantification errors of the two AC methods: the relative difference (Rel%) and absolute relative difference (Abs%) are reported for all the voxels within the specified regions
Fig. 4
figure 4

Percentage difference in PET SUVmean of regions averaged across all patients for PETAtlas (red) and PETsCT (blue) compared with PETCT

Apart from the overall performance of both methods, we wished to investigate how our DL-based AC method performs on a patient with abnormalities. Figure 5 depicts the AC maps and PET images of the patient with scoliosis, presented with the percentage difference maps. The vendor-provided atlas-based method was not able to adapt to the sideways curvature of the spine, which led to significant quantification errors around the spine. The DL-based method, however, generated an sCT volume that accurately reflected the abnormal anatomy and consequently showed minimal SUV bias.

Fig. 5
figure 5

A clinical case with severe scoliosis. Dixon MRI images (first row); Reference CT, sCT, and atlas-based AC maps (second row); corresponding PET image (third row); percentage difference map (last row)

Discussion

Our study explored the feasibility of a DL approach for more accurate and robust whole-body attenuation correction in PET/MRI systems. The application of the conventional atlas-based method to whole-body PET/MRI is challenging because of its limitations in accurate bone estimation, arm truncation, and anatomical variations in the chest and abdomen. Here we performed truncation correction of the Dixon images and used them as training inputs to generate DL-based sCT, which proved an effective tool for PET AC and outperformed the current clinical approach.

The DL method employed in this study has previously been explored for different regions [23, 30]. However, the novelty of this study is the development of a model addressing whole-body PET/MRI AC. Qualitative analysis of the sCT showed results similar to the reference CT and indicated a better estimation of the patient-specific AC map, particularly in lung and bone. Our method showed a better estimation of the AC map compared to the atlas-based method. The largest inaccuracies of the sCT were observed at the bone boundaries in the pelvis and spine, possibly due to the slightly blurred appearance of the sCT and inaccuracies in the MRI to CT registration. The atlas-based method shows relatively large discrepancies in the bony structures, mostly due to inaccurate atlas registration and missing bones in some regions. Moreover, the sCT was subject to considerable variation in air pocket location in the abdomen due to differences in air configuration between the MRI and CT acquisitions. However, the sCT is expected to align well with the MR images, while the atlas-based method cannot handle air pockets in AC maps.

The quantitative PET evaluation showed an average relative difference of 1.5% for the DL-based method compared to the CT-based reference. Notably, the absolute error was significantly lower for PETsCT (6.1%) than for PETAtlas (11.8%), signifying a more accurate estimation of tracer uptake, as also shown by the smaller standard deviations of the errors. The joint histogram plots indicate that the DL-based AC method produces PET images that are more similar to the reference than the vendor-provided atlas-based method. The narrower distribution for PETsCT suggests that it is more accurate than PETAtlas.

PETsCT demonstrated lower errors than PETAtlas across regions; however, the largest bias was observed in the lung and the smallest biases occurred in the brain and liver. The atlas-based method allocates a predefined attenuation coefficient to the entire lung, which led to an overall underestimation of PET tracer uptake, whereas the sCT images displayed a continuous density signal in this region. An inaccurate estimation of the lung tissue LAC was also observed in the sCT, but its impact on PET quantification was less severe. In earlier PET/MRI AC publications as well, the lung demonstrated a substantially larger error compared with other body regions [12, 37]. Regions representing bone also showed significant errors in PETsCT, especially the iliac bone in the pelvis, due to the high bone density in this region.

A previous evaluation of the conventional method for whole-body PET/MRI AC was in line with our observations, reporting an underestimation of − 4.9% ± 7.7% in bone [12]. Our DL model reduced the error in bone from − 8.7% to − 2.3%, excluding the lower legs, where the atlas-based method does not include bone at all.

Some studies have utilized diagnostic MR images to generate AC maps with DL approaches [28, 38]. Despite the more accurate anatomical information in diagnostic images, this is not practical for whole-body scans, given the long scan time and the different contrast of diagnostic MR images. The development of our DL method relies on a unique dataset of paired, co-registered whole-body Dixon MRI and CT-based AC volumes for training, representing a variety of anatomical features. It has the strength of using a fast sequence that is less sensitive to anatomical variation and patient movement. Dixon MR images have been investigated for the pelvic region as training inputs, both alone [31] and in combination with ZTE images [30], also leading to a reduction in PET quantification errors.

The findings of previous studies involving DL-based AC mainly stem from regional analyses of 18F-FDG PET. In our previous work [9], we demonstrated the feasibility of the proposed method using a different radiotracer, 68Ga-RGD, in the pelvic region. In contrast, only limited research has been done on the whole-body application of DL-based techniques, due to the limited availability of paired whole-body CT and MR images [33]. To overcome the challenge of whole-body registration between the two scans, methods using CycleGAN have been proposed for training on unpaired data and have shown promising results for the brain [39]. Another DL approach relied on NAC PET images for whole-body AC PET, which fails in complex anatomical regions due to the lack of structural information and has shown relatively high errors in the lungs [35]. Recent studies relying on NAC PET images improved the accuracy of the AC maps by simultaneous reconstruction of the activity in the PET images using the MLAA algorithm [40]. This method is not feasible without time-of-flight information and is limited by the activity distribution, which results in a high level of noise in the AC map [41].

Moreover, the main concern about all MRI-based AC methods is their performance in cases with abnormalities. A recent study compared the performance of the atlas-based method with a DL-based approach in cases with severe body truncation, metal artifacts, abnormal anatomy, and lesions in the lungs [42]. They reported overall promising results for sCT and suggested more detailed studies addressing metal artifact and body truncation issues. In our model, the truncation artifact was significantly reduced compared to similar studies [43, 44]. It also outperformed the atlas-based method in a case with severe scoliosis, where the conventional method was unable to correctly predict the sideways curvature of the spine. In some of the cases with dental implants, the artifact in the MR images was more localized than in CT, as shown in Fig. 1, which made the sCT a valuable source for predicting anatomical information in a corrupted region. Other studies, however, reported failures of the DL-based method in patients with dental artifacts [23].

This study had certain limitations. First, our training dataset was relatively small. Training data are a significant contributor to overall model performance [23]; however, transfer learning was used to compensate for the small training group size to some extent. Additionally, due to the different bed shapes and patient positioning between the two scans, the training quality was prone to inaccuracies resulting from registration errors between MRI and CT. A relatively large mismatch was observed for the patient arms, due to their higher degree of freedom. It should also be mentioned that CT is not an ideal reference for training, since CT does not directly reflect the attenuation of photons at 511 keV. It may be possible in the future to use a reproducible positioning system for whole-body imaging and a larger database containing subjects with anatomical abnormalities to improve the robustness of the model. Training and testing datasets obtained from different scanners would also be valuable for evaluating the clinical translation of our method.

Conclusion

This study demonstrated the feasibility of a DL method to improve MR-based PET attenuation correction in whole-body PET/MRI, covering skull to feet. The sCT images, inferred from routinely acquired standard Dixon MR images, demonstrated only a small quantification error compared to the CT reference and better performance than the conventional vendor-provided atlas-based method. The sCT significantly improved PET AC in bone and lung and is likely to be a valuable tool for clinical implementation.