Novel adversarial semantic structure deep learning for MRI-guided attenuation correction in brain PET/MRI
Quantitative PET/MR imaging is challenged by the accuracy of synthetic CT (sCT) generation from MR images. Deep learning-based algorithms have recently gained momentum for a number of medical image analysis applications. In this work, a novel sCT generation algorithm based on deep learning adversarial semantic structure (DL-AdvSS) is proposed for MRI-guided attenuation correction in brain PET/MRI.
Materials and methods
The proposed DL-AdvSS algorithm exploits the ASS learning framework to constrain the synthetic CT generation process to comply with the extracted structural features from CT images. The proposed technique was evaluated through comparison to an atlas-based sCT generation method (Atlas), previously developed for MRI-only or PET/MRI-guided radiation planning. Moreover, the commercial segmentation-based approach (Segm) implemented on the Philips TF PET/MRI system was included in the evaluation. Clinical brain studies of 40 patients who underwent PET/CT and MR imaging were used for the evaluation of the proposed method under a two-fold cross validation scheme.
Results
The accuracy of cortical bone extraction and CT value estimation was investigated for the three different methods. Atlas and DL-AdvSS exhibited similar cortical bone extraction accuracy, resulting in Dice coefficients of 0.78 ± 0.07 and 0.77 ± 0.07, respectively. Likewise, the DL-AdvSS and Atlas techniques performed similarly in terms of CT value estimation in the cortical bone region, where a mean error (ME) of less than −11 HU was obtained. The Segm approach led to a ME of −1025 HU. Furthermore, quantitative analysis of the corresponding PET images, taking CT-based attenuation-corrected PET (PETCTAC) as reference, demonstrated comparable performance of the DL-AdvSS and Atlas techniques, with a mean standardized uptake value (SUV) bias of less than 4% across 63 brain regions. In addition, less than 2% SUV bias was observed in the cortical bone when using the Atlas and DL-AdvSS approaches. However, Segm resulted in a 14.7 ± 8.9% SUV underestimation in the cortical bone.
Conclusion
The proposed DL-AdvSS approach demonstrated competitive performance with respect to the state-of-the-art atlas-based technique, achieving clinically tolerable errors and thus outperforming the commercial segmentation approach used in the clinic.
Keywords: PET/MRI · Brain imaging · Attenuation correction · Quantitative imaging · Deep learning
Introduction
Positron emission tomography (PET) and magnetic resonance imaging (MRI) have emerged as leading medical imaging modalities enabling the early detection and characterization of human disease, either on standalone scanners or on hybrid PET/MRI systems that allow concurrent morphological and molecular characterization of tissues. At present, molecular PET imaging capitalizes on and complements anatomical MR imaging to answer basic research and clinical questions. PET/MRI is attractive owing to MRI's multiparametric imaging capabilities, its superior soft-tissue contrast compared to CT, and the fact that MRI does not use ionizing radiation, with the consequence of reduced radiation dose to patients. The bulk of PET/MRI research to date has focused on optimizing instrumentation design, building MR-compatible PET detectors and readout technologies, and addressing the challenges of quantitative imaging biomarkers through the development of appropriate schemes for MRI-guided PET attenuation correction, partial volume correction, motion compensation and, more recently, synergistic functional-structural image reconstruction [5, 6]. Photon attenuation, one of the major physical degradation factors hindering quantitative PET imaging, is typically dealt with through the use of electron density information provided by computed tomography (CT) on combined PET/CT systems. The major challenge on combined PET/MRI is the lack of direct correlation between MR intensities and the attenuation properties of biological tissues, which renders direct attenuation map estimation from MRI difficult.
The strategies proposed in the literature to generate attenuation correction (AC) maps from MRI can be classified into three generic categories. Segmentation-based approaches (including dedicated MR sequences enabling depiction of bone) classify MR images into a number of tissue classes followed by assignment of predefined attenuation coefficients [8, 9, 10]. Atlas-based mapping and machine learning approaches use a co-registered MRI-CT atlas database to generate a synthetic CT (sCT) image through a mapping function [11, 12, 13] or a learning process that predicts sCT images from patient-specific MR images; machine learning and deep learning techniques, including random forest and neural network methods [18, 19, 20, 21], have emerged as a promising approach enabling generation of AC maps directly from MR images through prior training on pairs of MRI and CT samples. Finally, the availability of time-of-flight (TOF) information has enabled joint activity/attenuation reconstruction of the PET emission data with or without the exploitation of MRI information [14, 15, 16].
Over the past few years, deep convolutional neural networks (DCNN) have been widely employed in medical imaging showing promising results in image segmentation, denoising and reconstruction and radiomics analysis [22, 23]. With respect to synthetic CT generation tasks, a number of pioneering studies have demonstrated the potential of DCNN approaches in brain [18, 21, 24, 25, 26] and pelvic [19, 27, 28, 29, 30] PET/MR attenuation correction or MR-only radiation therapy.
In this work, a novel adversarial semantic structure deep learning method is proposed to predict continuous AC maps from structural MR images suitable for attenuation correction in brain PET/MRI studies. Existing DCNN algorithms, based on either fully convolutional networks or generative adversarial networks, do not explicitly take semantic structure learning into consideration, which might result in incorrect tissue synthesis. This issue is addressed herein by using adversarial semantic structure learning implemented as CT classification into a number of tissue classes to regularize the main adversarial MRI to CT synthesis process. The performance of the proposed technique is compared to previously proposed atlas- and segmentation-based approaches using CT-based PET attenuation correction (PETCTAC) as reference.
Materials and methods
PET/CT and MRI data acquisition
18F-FDG PET/CT and MRI brain studies of 50 patients referred to the nuclear medicine division of Geneva University Hospital were retrospectively employed for the quantitative evaluation of the proposed deep learning method. The patient population included a cohort of 28 women and 32 men (mean age = 61 ± 12 years). The clinical indications included neurodegenerative disease (44), epilepsy (3) and grading of brain tumors (3). Ten of the 50 patients (one epilepsy case, one brain tumor case and eight patients with neurodegenerative disease) were excluded from the evaluation because of minor misalignment between CT and MR images or corrupted PET raw data. The first-line diagnostic step included an MRI scan on a 3 T MAGNETOM Skyra (Siemens Healthcare, Erlangen, Germany) with a 64-channel head coil. The MRI protocol included a 3D T1-weighted magnetization-prepared rapid gradient-echo (MP-RAGE) sequence with the following parameters: TE/TR/TI = 2.3 ms/1900 ms/970 ms, flip angle 8°, NEX = 1. The 50 patients underwent an 18F-FDG PET/CT scan on either the Biograph mCT (15 patients) or the Biograph 64 True Point (35 patients) scanner (Siemens Healthcare, Erlangen, Germany). Low-dose CT scanning (120 kVp, 20 mAs) was performed for PET attenuation correction. This was followed by a 20-min PET acquisition starting ~30 min after injection of 210.2 ± 13.9 MBq of 18F-FDG. The original T1-weighted MR images were acquired in a matrix of 255 × 255 × 250 with a voxel size of 0.86 × 0.86 × 1 mm. The reference CT images were saved in a matrix of 512 × 512 × 150 with a voxel size of 0.97 × 0.97 × 1.5 mm. Owing to the temporal gap between the MRI and CT acquisitions, MR images were aligned to the corresponding CT using a combination of rigid and non-rigid deformation based on the mutual information criterion. After alignment, MR images were resampled to the CT resolution for training and validation.
PET images were reconstructed in 200 × 200 × 109 and 168 × 168 × 100 matrices from the Biograph mCT and Biograph 64 True Point scanners, respectively, with a voxel size of 4 × 4 × 2 mm. PET image reconstruction was performed using the Siemens VG50 e7 tool with an ordinary Poisson ordered subsets-expectation maximization (OP-OSEM) algorithm (4 iterations, 21 subsets). The single scatter simulation (SSS) algorithm with tail-fitting scaling was employed for scatter correction. Post-reconstruction filtering was performed using a Gaussian filter with 5 mm FWHM. For the sake of consistency between the two PET acquisitions, PET image reconstruction was performed without time-of-flight and point spread function information. The study protocol was approved by the ethics committee of Geneva University Hospitals and all patients gave informed consent.
Deep learning adversarial semantic structure (DL-AdvSS) model
Overview of the DL-AdvSS model
The deep learning adversarial semantic structure (DL-AdvSS) model consists of two major components: a synthesis generative adversarial network (SynGAN) and a segmentation generative adversarial network (SegGAN) (Fig. 1). The SynGAN block generates sCTs from the input MR images, whereas the SegGAN block segments the generated sCTs into four tissue classes, namely air cavities, soft-tissue, bone and background air. CT image segmentation was performed automatically using the following intensity thresholds: bone (>160 HU), air cavities (<−400 HU inside the body contour), soft-tissue (between −400 HU and 160 HU) and background (otherwise). SegGAN regularizes the main CT synthesis procedure by back-propagating gradients from the SegGAN block to the sCT generation process (dashed lines from the SegGAN block to the SynGAN block in Fig. 1). The SynGAN component consists of two networks: the synthesis generator (Gsyn) and the synthesis discriminator (Dsyn). Likewise, the SegGAN component includes the segmentation generator (Gseg) and the segmentation discriminator (Dseg), as illustrated in Fig. 1. Gsyn is the core of the DL-AdvSS model, generating sCT images from MRI. The resulting sCT is then fed into Gseg to perform the semantic segmentation. Each of the SynGAN and SegGAN blocks is governed by two loss functions, as described below.
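The threshold-based CT labeling described above can be sketched as follows. This is a minimal illustration assuming HU values and a precomputed body-contour flag; the function and variable names are ours, not from the paper:

```python
def label_ct_voxel(hu, inside_body):
    """Assign one of four tissue classes to a CT voxel, using the
    intensity thresholds quoted in the text (HU = Hounsfield units)."""
    if not inside_body:
        return "background"      # air outside the body contour
    if hu > 160:
        return "bone"            # bone tissue
    if hu < -400:
        return "air_cavity"      # air cavities inside the body
    return "soft_tissue"         # everything between -400 and 160 HU

# Example: a few voxels given as (HU value, inside-body flag)
voxels = [(500, True), (-800, True), (30, True), (-1000, False)]
labels = [label_ct_voxel(hu, inside) for hu, inside in voxels]
# labels == ["bone", "air_cavity", "soft_tissue", "background"]
```

In the model itself this labeling is produced once from the reference CTs and serves as the ground truth that SegGAN tries to reproduce from the synthetic CT.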
SynGAN loss functions: LDsyn and LGsyn are the two loss functions of the SynGAN block. Dsyn aims to distinguish whether its input is a real or a synthetic CT, such that label 1 is returned for a real CT and 0 for a synthetic one. LDsyn (the loss of Dsyn) is defined as a binary cross-entropy loss (LbCE) between the classification result and the corresponding label, as formulated in Eqs. 1 and 2.
SegGAN loss functions: LDseg and LGseg are the two loss functions of the SegGAN block. Gseg attempts to generate segmentations that trick Dseg, whereas Dseg focuses on discriminating real segmentations from the fake ones produced by Gseg. The loss function for Dseg is defined accordingly as a binary cross-entropy term over real and generated segmentations.
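The adversarial losses above take the standard binary cross-entropy form. The following is our own minimal numeric sketch of that form, not a reproduction of the paper's exact Eqs. 1–2: the discriminator loss penalizes misclassifying real CTs (label 1) and synthetic CTs (label 0), while the generator's adversarial term rewards synthetic images the discriminator scores as real.

```python
import math

def bce(prediction, label, eps=1e-12):
    """Binary cross-entropy for a single discriminator output in (0, 1)."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp for numerical safety
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """L_D: a real CT should be classified as 1, a synthetic CT as 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    """Adversarial part of L_G: the generator wants D(sCT) -> 1."""
    return bce(d_fake, 1.0)

# A discriminator that is confident and correct incurs a smaller loss
# than one that is maximally uncertain:
assert discriminator_loss(0.9, 0.1) < discriminator_loss(0.5, 0.5)
```

The same structure applies to both GAN blocks; in SegGAN the discriminator inputs are segmentation maps rather than CT images, and the generator additionally carries a multi-class cross-entropy term against the ground truth labels.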
The architectures of SynGAN and SegGAN are similar; however, they are governed by two different loss functions and provide different outcomes. The network training is elaborated in the Supplemental Materials section.
Architecture of the DL-AdvSS model
Dsyn and Dseg in the DL-AdvSS model (Fig. 1) share the same convolutional neural network architecture, containing three convolutional layers followed by three fully connected layers. Batch normalization (BN) layers were inserted to enhance network performance and accelerate the training process. Rectified linear units (ReLU) were used as the activation function for non-linear transformation. A filter size of 3 × 3 × 3 and a stride of 2 were used in all the 3D convolutional layers of Dsyn and Dseg. All convolutional and deconvolutional layers in Dsyn and Dseg were followed by BN and ReLU layers, except the last layer before the output of the network. The sigmoid and softmax activation functions were used for Gsyn (the sCT generation process) and Gseg, respectively. The output dimensionalities of the three fully connected layers are 512, 128 and 1, respectively. Additional details about the network layers are provided in Supplemental Tables 1 and 2 in the Supplemental Materials section. The training of the network was performed on 3D images using a patch size of 224 × 224 × 32 voxels. The implementation details are covered in the Supplemental Materials section.
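As a rough sanity check of the discriminator geometry described above, the spatial size of the feature maps after each stride-2 convolution can be traced as follows. This is a sketch under assumptions: the padding of 1 is our guess, as the text does not state it.

```python
def conv3d_out_size(size, kernel=3, stride=2, padding=1):
    """Output size of one convolution dimension:
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Track a 224 x 224 x 32 training patch through the three
# stride-2 convolutional layers of Dsyn / Dseg.
shape = (224, 224, 32)
for layer in range(3):
    shape = tuple(conv3d_out_size(s) for s in shape)
    print(f"after conv {layer + 1}: {shape}")
# after conv 1: (112, 112, 16)
# after conv 2: (56, 56, 8)
# after conv 3: (28, 28, 4)
```

Each stride-2 layer halves every spatial dimension, which is what makes the subsequent fully connected layers (512, 128, 1) tractable.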
Comparison with alternative attenuation map generation techniques
Segmentation-based approach (Segm)
The segmentation-based approach implemented on the Philips Ingenuity TF PET/MR system (Philips Healthcare, Cleveland, Ohio) entails segmentation of brain MR images into two tissue classes, soft-tissue and background air, ignoring fat, bone and internal air cavities. Mean CT values of −1000 and 0 HU were assigned to the background air and soft-tissue classes, respectively. The resulting AC maps were then completed by inserting the corresponding scanner bed extracted from the patient's CT image.
Atlas-based approach (Atlas)
The atlas-based approach (Atlas) was originally developed and validated in the context of MRI-guided radiation therapy using in-phase Dixon MR images. The Atlas approach entails pair-wise alignment of atlas MR images to the target MRI using a two-fold cross-validation scheme. Given the MRI-to-MRI transformation maps, the CT atlas images are mapped accordingly to the target MRI. In the first phase of sCT estimation, bony structures are extracted from the target MRI through voxel-by-voxel atlas voting. The outcome of this step is a binary bone map representing the most likely bone tissue delineation of the target MRI. This bone map is later exploited (in the second phase) to guide the atlas CT fusion task with special emphasis on bony structures. Bone extraction from the target image at each voxel relies on the morphology likelihood between the target and atlas MR images as well as on the bone label prior.
The inter-image morphology likelihood was calculated using the phase congruency map (PCM), which is able to detect significant image features and is robust to inter-subject intensity variation and noise. The bone label prior was calculated based on the signed distance transform estimated on the bone label maps. The bone map estimated from the target MRI was then used to define weighting factors for the atlas images, and the continuous-valued sCT images were generated using a voxel-wise weighted atlas fusion framework.
The first part of the evaluation involved the assessment of CT value estimation and bone extraction accuracy using the different sCT generation techniques. To this end, bony structures were extracted from the reference CT and sCT images using two intensity thresholds of 150 HU (entire bone tissue) and 600 HU (cortical bone). The accuracy of bone delineation was assessed with respect to the reference CT images using standard segmentation metrics, including the Dice similarity coefficient (DSC), relative volume difference (RVD) and mean absolute surface distance (MASD).
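Two of the segmentation metrics above can be sketched in a few lines, treating each binary bone mask as a set of voxel indices. This is our own illustration of the standard definitions, with toy masks rather than study data:

```python
def dice(a, b):
    """Dice similarity coefficient between two binary masks given as
    sets of voxel indices: 2|A∩B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))

def relative_volume_difference(test, ref):
    """RVD (%) of a test segmentation volume vs. the reference volume."""
    return 100.0 * (len(test) - len(ref)) / len(ref)

# Toy 1D "masks": bone voxels extracted from an sCT vs. the reference CT
ref_bone = {1, 2, 3, 4}
sct_bone = {2, 3, 4, 5}
print(dice(sct_bone, ref_bone))                        # 0.75
print(relative_volume_difference(sct_bone, ref_bone))  # 0.0
```

Note that Dice and RVD capture different failure modes: the toy masks above have equal volume (RVD = 0) yet imperfect overlap (DSC = 0.75), which is why both are reported. MASD additionally requires surface extraction and distance transforms, omitted here.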
The ground truth soft-tissue regions were extracted from reference CTs using lower and upper intensity thresholds of −465 and 150 HU, respectively. The PET data of each patient were reconstructed four times using the different attenuation maps, namely, reference CT, Segm, Atlas and DL-AdvSS. This was performed using the Siemens VG50 e7 tool with an ordinary Poisson ordered subsets-expectation maximization (OP-OSEM) algorithm (4 iterations, 21 subsets). Prior to attenuation correction, CT and sCT images were down-sampled to PET’s image resolution (from 1 × 1 × 1.5 mm to 4 × 4 × 2 mm) followed by a 4-mm FWHM Gaussian filter to match PET’s spatial resolution. The quantitative evaluation of the brain PET data was performed using the Hermes BRASS analysis tool (Hermes Medical Solutions AB, Sweden) for 63 brain regions.
Moreover, MEPET and MAEPET were estimated for the bone, soft-tissue, air and entire head regions. The correlation between PETCTAC activity concentrations and those obtained with the different AC methods was evaluated using Pearson correlation analysis. The paired t-test was used to assess the statistical significance of the differences between the three AC methods. Differences with a p value less than 0.05 were considered statistically significant.
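The Pearson correlation used for the region-wise activity concentration analysis can be computed as below. This is a self-contained sketch with made-up numbers; in practice a library routine such as scipy.stats.pearsonr would typically be used:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# Perfectly linearly related activity concentrations give r = 1,
# regardless of scale or offset (toy values, not study data):
ctac = [1.0, 2.0, 3.0, 4.0]
sct_ac = [2.1, 4.1, 6.1, 8.1]
print(round(pearson_r(ctac, sct_ac), 6))  # 1.0
```

Because r is invariant to scale and offset, a high correlation alone does not rule out a systematic SUV bias, which is why ME and MAE are reported alongside it.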
In Eq. (14), RMSE = sqrt( Σ (Iref − Itest)² / V ), where V is the total number of voxels in the head region, Iref is the reference image (ground truth CT or PET-CT AC) and Itest is the test image (synthetic CT or PET-sCT AC). In Eq. (15), PSNR = 10 log10( Pk² / MSE ), where Pk is the maximum intensity value of Iref or Itest and MSE is the mean squared error. In Eq. (16), SSIM = (2 Mref Mtest + K1)(2 δref,test + K2) / ((Mref² + Mtest² + K1)(δref + δtest + K2)), where Mref and Mtest denote the mean values of Iref and Itest, respectively, δref,test indicates the covariance between Iref and Itest, and δref and δtest represent the variances of Iref and Itest, respectively. The constant parameters K1 = 0.01 and K2 = 0.02 avoid division by very small numbers.
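The RMSE and PSNR metrics referenced as Eqs. 14 and 15 can be sketched as follows, assuming images flattened to equal-length lists over the head region (toy values, not study data):

```python
import math

def rmse(ref, test):
    """Root mean squared error over the region (Eq. 14 style)."""
    v = len(ref)
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)) / v)

def psnr(ref, test):
    """Peak signal-to-noise ratio in dB (Eq. 15 style), using the
    maximum intensity of either image as the peak value Pk."""
    peak = max(max(ref), max(test))
    mse = rmse(ref, test) ** 2
    return 10.0 * math.log10(peak ** 2 / mse)

ref = [100.0, 200.0, 300.0, 400.0]
test = [110.0, 190.0, 310.0, 390.0]
print(round(rmse(ref, test), 2))  # 10.0
```

SSIM is omitted here because it additionally involves local means, variances and covariance over sliding windows; a library implementation (e.g. scikit-image's structural_similarity) would normally be used for it.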
Quantitative accuracy of the estimated bone tissue using the Atlas, DL-AdvSS and Segm approaches in 40 patients. Bone extraction was carried out using two intensity thresholds of >150 HU (entire bone) and >600 HU (cortical bone)

Threshold | Method | DSC | RVD (%) | ME (HU) | MAE (HU) | MASD (mm)
>150 HU | Atlas | 0.85 ± 0.05 | 27.6 ± 10.0 | −18 ± 80 | 211 ± 70 | 2.0 ± 0.50
>150 HU | DL-AdvSS | 0.80 ± 0.07 | 45.2 ± 20.1 | −46 ± 150 | 302 ± 79 | 2.4 ± 0.65
>150 HU | Segm | – | – | −801 ± 90 | 801 ± 90 | –
>600 HU | Atlas | 0.78 ± 0.07 | 41.4 ± 12.3 | 5 ± 110 | 241 ± 100 | 2.0 ± 0.50
>600 HU | DL-AdvSS | 0.77 ± 0.07 | 46.3 ± 16.3 | −10 ± 167 | 312 ± 101 | 2.01 ± 0.28
>600 HU | Segm | – | – | −1025 ± 100 | 1025 ± 100 | –
Accuracy of CT value estimation using the Atlas, DL-AdvSS and Segm attenuation correction approaches within the entire head contour, air and soft-tissue regions

Region | Method | ME (HU) | MAE (HU)
Head | Atlas | −8 ± 20 | 103 ± 36
Head | DL-AdvSS | −14 ± 18 | 101 ± 40
Head | Segm | −175 ± 34 | 230 ± 33
Air | Atlas | 317 ± 294 | 459 ± 240
Air | DL-AdvSS | 295 ± 282 | 407 ± 228
Air | Segm | 805 ± 47 | 805 ± 47
Soft-tissue | Atlas | 1 ± 5 | 8 ± 4
Soft-tissue | DL-AdvSS | 2 ± 6 | 10 ± 5
Soft-tissue | Segm | −2 ± 1 | 5 ± 2
Mean and mean absolute PET quantification errors (%) in soft-tissue, bone and air within the entire head contour for the different MRI-guided attenuation correction methods, with respect to CTAC used as reference. Values are mean ± SD (absolute mean ± SD)

Method | Soft-tissue | Bone | Air | Head
Atlas | 1.5 ± 9.3 (3.5 ± 8.7) | 1.1 ± 9.9 (6.0 ± 8.3) | 6.3 ± 13.7 (7.1 ± 12.6) | 2.6 ± 2.9 (3.7 ± 8.5)
DL-AdvSS | 3.2 ± 13.6 (5.0 ± 13.1) | 1.2 ± 13.8 (6.7 ± 12.1) | 3.2 ± 13.6 (5.5 ± 13.1) | 3.2 ± 3.4 (4.0 ± 8.6)
Segm | −1.6 ± 10.2 (5.4 ± 8.8) | −14.7 ± 8.9 (15.5 ± 7.3) | 40.8 ± 10.6 (42.6 ± 8.8) | −5.6 ± 4.1 (7.6 ± 9.5)
The statistical analysis showed a statistically significant difference between the performance of the Atlas and DL-AdvSS methods and that of the Segm method over all brain regions (p < 0.001). However, the difference between DL-AdvSS and Atlas was statistically significant only for the left and right Rectus (p < 0.02), Orbitalis (p < 0.02) and Thalamus (p < 0.02) regions.
RMSE, PSNR and SSIM computed between synthetic and ground truth CT images, and between PET corrected with synthetic CTs (PET-sCT AC) and PET corrected with ground truth CT (PET-CT AC)

Comparison | Method | RMSE | PSNR (dB) | SSIM
sCT vs. CT | Atlas | 177 ± 48 | 27.98 ± 1.15 | 0.86 ± 0.03
sCT vs. CT | DL-AdvSS | 168 ± 52 | 28.43 ± 1.16 | 0.87 ± 0.04
sCT vs. CT | Segm | 301 ± 46 | 19.35 ± 1.11 | 0.51 ± 0.03
PET-sCT AC vs. PET-CT AC | Atlas | 0.3 ± 0.01 | 32.7 ± 1.4 | 0.93 ± 0.03
PET-sCT AC vs. PET-CT AC | DL-AdvSS | 0.3 ± 0.01 | 33.3 ± 1.5 | 0.93 ± 0.02
PET-sCT AC vs. PET-CT AC | Segm | 0.7 ± 0.02 | 31.1 ± 1.2 | 0.90 ± 0.02
Among the 40 patients, the DL-AdvSS method failed to generate a proper sCT image in only one case (Supplemental Figure 2). For this case, DL-AdvSS substantially overestimated the soft-tissue CT values in the brain and, as such, both bone delineation (using a threshold of 150 HU) and PET quantification accuracy were compromised.
Accurate attenuation correction has been a major challenge facing quantitative PET/MR imaging since its introduction. As such, considerable efforts have focused on deriving patient-specific attenuation maps using various approaches. More advanced methods attempt to account for bone attenuation, particularly in PET brain imaging, since cortical bone constitutes a large proportion of the skull. We demonstrated in a previous study that the Atlas approach by far outperforms emission-based AC with the current TOF timing resolution of 580 ps. Emission-based methods will be revisited on future-generation TOF PET/MRI scanners with improved TOF resolution.
The proposed DL-AdvSS method incorporates semantic structure learning into MRI-to-CT synthesis using an adversarial framework. The adversarial network used in SynGAN enforces higher-order consistency in the appearance space, while the adversarial network incorporated in SegGAN encourages higher-order consistency in the semantic structure space. The volumetric evaluation of the bony structures extracted by DL-AdvSS demonstrates the efficacy of the proposed method. DL-AdvSS shares some similarities with the work presented by Chartsias et al. and Huo et al., so it is worthwhile to clarify the differences between these works and the DL-AdvSS model. The CT-to-MRI synthesis proposed by Chartsias et al. employs a 2D CycleGAN functioning in parallel with an independent 2D segmentation network that segments the synthesized MR images. As opposed to the DL-AdvSS model, this segmentation network is independent and does not regularize the main synthesis process. Building upon this idea, Huo et al. proposed joint training of a 2D segmentation network and the 2D CycleGAN to achieve end-to-end segmentation and synthesis; however, only a multi-class cross-entropy loss function was defined to govern the discrepancy between predicted and ground truth segmentations. In contrast, SegGAN in the DL-AdvSS model was designed to reinforce higher-order consistency not only in the appearance space but also in the semantic structure space for MRI-to-CT synthesis. The quantitative accuracy of the proposed DL-AdvSS method was compared against the state-of-the-art atlas-based approach and the commercial segmentation-based method to provide a clear picture of the performance and robustness of learning-based methods in the context of brain PET/MR imaging.
The proposed DL-AdvSS method resulted in an overall average ME of −14 HU and MAE of 101 HU for the 40 brain studies included in this work, which is comparable with results reported in the literature. The DCNN model proposed by Han achieved an overall average MAE of 84.8 HU in the head region on data from 18 patients. The fuzzy c-means clustering method implemented on UTE MR sequences by Su et al. resulted in a MAE of 130 HU, whereas the multi-atlas method developed by Burgos et al. produced a MAE of 102 HU on 17 head and neck studies in the context of radiotherapy planning. The deep neural network based on Dixon and ZTE MR images built by Gong et al. achieved a bone extraction accuracy of DSC = 0.76 on 40 clinical studies (vs. DSC = 0.77 achieved by DL-AdvSS). Liu et al. developed a data-driven deep learning approach for PET AC without anatomical imaging, generating continuous-valued sCT images from uncorrected PET images; this approach resulted in a bone delineation accuracy of DSC = 0.75 and a MAE of 111 HU on 128 patients.
Liu et al. developed a deep learning approach to generate PET attenuation maps from T1-weighted MR sequences and reported a mean PET quantification bias of less than 1% for brain imaging studies. It should be noted that this model was evaluated against CT images segmented into three classes (bone, air and soft-tissue) as reference, versus the three-class attenuation map generated by the deep learning algorithm; the different PET quantification biases observed might therefore be partly due to the different validation strategies. Gong et al. reported a mean PET quantification bias of up to 3% using a convolutional neural network trained with Dixon and ZTE MR sequences. The additional ZTE sequence, providing complementary information about bone tissue, helps the network to better extract bony tissue. Spuhler et al. developed a U-Net deep learning architecture to estimate PET attenuation maps from T1-weighted MR images in brain imaging, reporting overall biases of −0.49% and 1.52% for static 11C-WAY-100635 and 11C-DASB PET imaging, respectively. In that work, however, the training of the neural networks was performed using transmission data to avoid the discrepancy between CT-derived and optimal PET attenuation maps, owing to the polyenergetic nature and lower photon energies of CT acquisition compared to PET.
One of the major advantages of deep learning methods is their fast computation time, even though the training phase can take a few days. The training of the model needs to be done only once, and applying the trained model to generate an sCT for a new subject takes less than a minute. Multi-atlas approaches tend to be slow, taking a couple of hours to generate a single sCT image depending on the size of the atlas dataset. The Atlas approach required 6 h to generate one sCT image for an atlas containing 20 MR/CT pairs (out of the 40 MR/CT pairs used under the two-fold cross-validation scheme). Moreover, deep learning approaches can easily accommodate large training databases owing to their high model capacity. Since we used a relatively large number of training samples (50 MR/CT pairs), we believe that the training of the DL-AdvSS model was sufficient, and increasing the number of training samples would not significantly enhance the performance of the model. For the training and validation of the proposed method, a two-fold cross-validation scheme was employed. Similar studies commonly use a larger number of folds (4 to 6). Since the training dataset in this work was sufficiently large, the two-fold cross-validation did not lead to substantial over-fitting of the model, and no significant bias or variance was observed. The burden of increasing the number of training samples falls only on the training phase, while the size of the final model and its computation time remain the same. Conversely, increasing the size of the atlas dataset in multi-atlas approaches proportionally increases their computation time.
In our previous comparison of MRI-guided sCT generation methods, the DCNN method exhibited promising performance when compared to state-of-the-art atlas-based methods, leading to slightly more accurate estimation of sCT values. However, the atlas-based method showed higher robustness to outliers, as it usually produces a realistic outcome representing the average of the atlas dataset, whereas the DCNN method failed to identify bone tissue and provide appropriate sCT estimates in four out of 39 clinical studies. With the DL-AdvSS approach, the sCT generation process failed in only one case, and only in the soft-tissue region (Supplemental Figure 2), thus demonstrating good robustness to MRI intensity variation and minor metal-induced artifacts thanks to the adversarial semantic learning framework.
The proposed DL-AdvSS algorithm effectively incorporates semantic structural features into the synthetic CT generation process. As a result, the proposed method exhibited excellent consistency in deriving accurate anatomical structures in the sCT images. To this end, the DL-AdvSS network benefits from two loss functions, governed by binary and multi-class cross-entropy, intended to maximize structural feature extraction. In practice, however, these loss functions introduce a trade-off between the accuracy of anatomical structure definition and that of CT intensity estimation. Apart from the outlier depicted in Supplemental Figure 2, the method exhibited only minor vulnerability to CT intensity variation across patients. In this regard, the atlas-based method exhibited more robust performance, with no gross CT intensity fluctuations. In this work focusing on brain imaging, the atlas-based technique performed sufficiently well compared to the proposed deep learning approach, whereas the deep learning algorithm showed much less vulnerability to outliers, since the CT synthesis process is governed by semantic structure features. A major finding of this work is that, despite the elaborate design of the proposed algorithm and its promising potential, deep learning techniques may not completely outperform advanced atlas-based methods; the relative performance of atlas-based and deep learning techniques might depend on the selected patient population.
Conclusion
A novel deep learning algorithm was presented and its quantitative performance evaluated against a commercial segmentation-based method and an atlas-based approach. The proposed DL-AdvSS method is capable of estimating accurate patient-specific attenuation maps comparable to those of the Atlas method, thereby significantly reducing the quantification bias in brain PET/MR imaging. The DL-AdvSS technique also exhibited high robustness to anatomical variation and MR intensity fluctuation, as only one outlier was observed. Overall, the DL-AdvSS algorithm demonstrated competitive performance with respect to the state-of-the-art atlas-based approach, achieving clinically tolerable errors and thus outperforming the commercial segmentation approach used in the clinic.
This work was supported by the Swiss National Science Foundation under grant SNFN 320030_176052 and the Swiss Cancer Research Foundation under Grant KFS-3855-02-2016.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
None of the authors have affiliations that present financial or non-financial competing interests for this work.
Research involving human participants
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent was obtained from all individual participants included in the study.
- 13. Burgos N, Cardoso MJ, Guerreiro F, Veiga C, Modat M, McClelland J, et al. Robust CT synthesis for radiotherapy planning: application to the head and neck region. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2015. p. 476–84.
- 19. Leynes AP, Yang J, Wiesinger F, Kaushik SS, Shanbhag DD, Seo Y, et al. Zero-echo-time and Dixon deep pseudo-CT (ZeDD CT): direct generation of pseudo-CT images for pelvic PET/MRI attenuation correction using deep convolutional neural networks with multiparametric MRI. J Nucl Med. 2018;59:852–8.
- 20. Nie D, Trullo R, Lian J, Petitjean C, Ruan S, Wang Q, et al. Medical image synthesis with context-aware generative adversarial network. In: Medical Image Computing and Computer-Assisted Intervention − MICCAI 2017, Quebec, Canada. Springer; 2017. p. 417–25.
- 31. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning, Lille, France, vol. 37; 2015. p. 448–56.
- 36. Chartsias A, Joyce T, Dharmakumar R, Tsaftaris SA. Adversarial image synthesis for unpaired multi-modal cardiac data. In: International Workshop on Simulation and Synthesis in Medical Imaging (SASHIMI); 2017. p. 3–13.
- 37. Huo Y, Xu Z, Bao S, Assad A, Abramson RG, Landman BA. Adversarial synthesis learning enables segmentation without target modality ground truth. In: IEEE 15th International Symposium on Biomedical Imaging (ISBI); 2018. p. 1217–20.
- 38. Zhu J-Y, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: IEEE International Conference on Computer Vision (ICCV); 2017. p. 2223–32.