Introduction

Cardiac radioablation is a new non-invasive modality for the treatment of cardiac arrhythmias. This treatment method is based on delivering a radiation dose to the arrhythmogenic tissues using external beam radiation therapy [1,2,3]. A typical treatment workflow includes acquiring a planning computed tomography (CT) scan, which is used to delineate the arrhythmogenic tissue (target) and the organs-at-risk (OARs) and to calculate the planned radiation dose from the Hounsfield unit (HU) values. At treatment, the dose is then delivered to the target while sparing the OARs as much as possible. However, the complex cardiorespiratory motion may impact the accuracy of dose delivery [4, 5]. Real-time monitoring of the cardiorespiratory motion is therefore of paramount importance for safe and effective treatment delivery.

A possible candidate imaging modality for real-time guidance in cardiac radioablation is transthoracic ultrasound (US) imaging [6, 7]. This approach relies on identifying the cardiac tissue position using US imaging at both the simulation and the treatment delivery stages. Comparing the two makes it possible to compensate for displacements of cardiac structures at treatment. To minimize clinical workflow steps and ease treatment planning, it may be favourable to acquire the US scans simultaneously with the planning CT scan at the simulation stage. However, this approach is prone to creating transducer-induced metal artifacts on the CT scans, caused by the internal metal components of the US transducers [8, 9].

Metal artifacts may generate improper representations of anatomical structures and incorrect HU values, resulting in potentially inaccurate radiation dose calculation [10,11,12]. Several algorithms for metal artifact reduction (MAR) have been developed, mainly focusing on the artifacts generated by implanted metal structures. The majority of these MAR algorithms, both commercially available and research-based, follow a conventional analytical approach. Recently, new works based on deep learning have been proposed [11, 12]. Commercially available MAR algorithms use an iterative approach of correction of CT projection data [12]. Among them, the Orthopaedics Metal Artifact Reduction (O-MAR, Philips Health System) and the iterative Metal Artifact Reduction (iMAR, Siemens Healthcare) algorithms were widely investigated for the improvement of radiation therapy planning [13,14,15,16,17,18,19,20,21]. Among the research-based MAR algorithms, Metal Deletion Technique (MDT, ReVision Radiology) [22] was the one most often compared to the commercially available algorithms [23,24,25,26]. All these algorithms typically suffer from improper restoration of HU values, from distortion of anatomical structures, and even from creation of secondary artifacts [11]. Recently, our group [27] investigated a newly developed MAR algorithm, Combined Clustered Scan-based MAR (CCS-MAR), which followed the traditional analytical approach, and compared its performance to commonly used commercial and research-based MAR algorithms. The results of this study revealed that further development or improvements were needed to reduce residual artifacts and further improve HU value restoration capabilities.

In recent years, the development of deep learning-based algorithms for metal artifact reduction in CT imaging has gained significant interest [28,29,30,31,32,33,34,35,36,37,38]. In general, to learn the complex metal artifact patterns and their propagation in a supervised manner [39, 40], these algorithms require paired data consisting of CT scans with artifacts (CTart) and the corresponding artifact-free (CTref) scans. In the absence of adequate paired data, typical metal artifacts that resemble real clinical scenarios can be simulated [28, 30, 31, 38]. For our work, we have used generative adversarial networks (GAN) [41]. Typically, these architectures consist of a generator and a discriminator, both of which are convolutional neural networks (CNN) [41, 42]. The generator and the discriminator are trained in an adversarial manner to perform the transformation. We have focused in particular on the CycleGAN extension [43], which utilizes two GAN architectures and has already been studied in the literature for the transformation of CTart scans into artifact-reduced CT (CTcor) scans for radiation therapy applications [28, 29, 31].

This work aims to develop a deep learning-based MAR algorithm for the reduction of US transducer-induced metal artifacts on CT scans. The proposed algorithm, referred to as Cycle-MAR, was designed following a supervised learning scheme with a CycleGAN architecture using paired clinical CT scans. It was then evaluated on phantom CT scans with real artifacts and on clinical CT scans with simulated artifacts. In addition, the performance of Cycle-MAR was compared with that of iMAR, O-MAR, MDT and CCS-MAR.

Materials and methods

CT data preparation

Clinical CT scans

Paired clinical CT scans with and without US transducer-induced metal artifacts were used. In particular, DICOM CT scans of the thoracic region were taken from the “COVID19-CT-Dataset” online database [44]. Initially, CT scans from 180 patients were downloaded and visually inspected for suitability. CT scans or CT slices with large COVID-19-induced density changes in the lung region, with suboptimal quality due to low resolution, and/or with metal artifacts resulting from external foreign bodies were excluded. This resulted in CT scans from 84 patients. From each of these CT scans, 22–100 axial slices covering the cardiac structure from the apex to the base were selected. These CT scans consisted of 512 × 512-pixel slices with 1.5 or 3 mm slice thickness [45].

To simulate metal artifacts on CT scans, Sakamoto et al. [46] developed a MATLAB package based on the study by Zhang et al. [38]. This package was modified in our work for the specific simulation of US transducer-induced metal artifacts on the selected COVID19-CT-Dataset scans (see Fig. 1a). In our procedure, the pixels identifying the US transducers on the phantom CTart scans were first manually segmented and stored separately. Then, the segmented US transducers were copied to the clinical CT scans and positioned using rigid rotations and translations so that they would image the cardiac structures. A threshold value of 2000 HU was applied to the imported US transducer to extract the metal components, which were then saved as binary images. This threshold was chosen according to previously published research [27, 47, 48]. Then, pre-defined HU thresholds were used to segment the bone, lung, and water-equivalent tissues on the clinical CTref slices. To convert the HU values of the pixels into linear attenuation coefficients for varying X-ray energies, the corresponding mass attenuation coefficients from the NIST database [49] were used. Subsequently, the polychromatic projection data for the corresponding X-ray energies were simulated from the segmented bone, lung, and water-equivalent tissue and from the metal binary image. As the metal components of a US transducer consist of lead, zirconate, and titanate [50], the average mass attenuation coefficient of these metals was used to generate the projection data. Finally, metal-containing projection data with Poisson-distributed noise were created from those simulated projections for the reconstruction of a CTart slice.
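Three of the steps above (metal extraction by thresholding, HU-to-attenuation conversion, and Poisson noise on the projections) can be illustrated with a minimal Python sketch. This is not the MATLAB package used in this work: it assumes a single effective energy instead of a polychromatic spectrum, and the function names, the water attenuation value, and the photon count per ray are illustrative assumptions.

```python
import numpy as np

# Approximate linear attenuation of water near 70 keV in cm^-1 (assumed value;
# the study instead uses energy-dependent NIST mass attenuation coefficients).
MU_WATER = 0.19


def metal_mask(ct_hu, threshold_hu=2000.0):
    """Binary image of metal pixels: True where HU exceeds the threshold."""
    return np.asarray(ct_hu, dtype=float) > threshold_hu


def hu_to_mu(ct_hu, mu_water=MU_WATER):
    """Invert the HU definition: mu = mu_water * (1 + HU / 1000)."""
    return mu_water * (1.0 + np.asarray(ct_hu, dtype=float) / 1000.0)


def noisy_line_integrals(line_integrals, photons_per_ray=1e5, rng=None):
    """Add Poisson counting noise to ideal attenuation line integrals."""
    rng = np.random.default_rng() if rng is None else rng
    expected_counts = photons_per_ray * np.exp(-np.asarray(line_integrals))
    counts = rng.poisson(expected_counts)
    counts = np.maximum(counts, 1)  # avoid log(0) for fully attenuated rays
    return -np.log(counts / photons_per_ray)
```

In a full simulation, the line integrals would come from forward projecting the attenuation maps of the tissues and of the metal binary image, and the noisy projections would then be reconstructed into a CTart slice.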

Fig. 1

a The procedure for simulation of US transducer-induced metal artifacts on clinical CT scans. The first and second rows show the segmented US transducers from the phantom CT scans and the US transducers aligned with suitable clinical CT slices, respectively. Pixels containing metal parts of a US transducer and the CT slices with simulated artifacts are shown in the last two rows, respectively. b Visual representation of the real artifacts and the simulated artifacts on the phantom scans. [Window level/width: 50/350]

To check the correctness of this artifact simulation method, phantom CT scans were utilized (see Fig. 1b). First, CTref slices and the corresponding CTart slices were selected from a particular phantom CT scan. Then, the US transducer-induced metal artifacts were simulated on the phantom CTref slices following the procedure described above. Finally, the simulated CTart slices were visually compared with the corresponding phantom CTart slices, which contain the real US transducer-induced metal artifacts.

Phantom CT scans

Table 1 shows the combinations of CT scanners, anthropomorphic phantoms, and US transducers used in this work. In particular, three types of adult anthropomorphic phantoms were scanned with and without a total of four types of US transducers. The anthropomorphic phantoms were an ART-211 male phantom (ART, Radiology Support Devices, Long Beach, CA, USA); an ATOM® male phantom (CIRS, Model-701, Norfolk, VA, USA); and a CT torso phantom (CT Torso, Model CTU-41, Kyoto Kagaku Ltd, Japan). These anthropomorphic phantoms were constructed using tissue-equivalent epoxy materials that mimic the density and attenuation characteristics of human tissues. They include a range of components, such as cardiac structures, air-equivalent materials for simulating lungs, and bone-like materials with simulated air pockets. To obtain the paired CT scans, each phantom was CT scanned with and without a US transducer, resulting in a CTart scan and the corresponding CTref scan, respectively (see Fig. 2 for an example of the procedure). The US transducers were positioned on the phantoms at various angles suitable for imaging the heart. The dimensions of the US transducers, including the size and width of their metal components, can affect the creation of metal artifacts on CT scans. Among the US transducers used, the linear volume array transducer was the largest and had the widest metal component, which was measured to be 6 cm on the CT scans, and it induced more metal artifacts than the other transducers.

Fig. 2

Workflow to obtain the dataset pairs: in (a) the CT torso phantom is shown. The positioning of the phantom with (b) and without (c) the linear volume array transducer resulted in CT scans as shown in (d) and (e)

Table 1 The details of utilized CT scanners, anthropomorphic phantoms, and the US transducers

Cycle-MAR network

The CycleGAN model [43] was originally proposed in the literature for unpaired training data. However, we used paired data in this work to improve the restoration accuracy of anatomical structures and HU values [51]. The workflow of the developed Cycle-MAR network is illustrated in Fig. 3.

Fig. 3

Training workflow of the Cycle-MAR network for the reduction of the US transducer-induced metal artifacts. It has two mapping functions: generator (GY) transforms the CTart scan (domain X) into a CTcor scan, while generator (GX) transforms the CTref scan (domain Y) into a CTart scan. Discriminator (DX) aims to distinguish the translated CTref scan from the CTart scan (domain X), and discriminator (DY) aims to distinguish the translated CTart scan from the CTref scan (domain Y)

The CycleGAN translates the metal artifact domain (X) into an artifact-free domain (Y) by using an adversarial loss \(\mathcal{L}_{adv}\), a cycle consistency loss \(\mathcal{L}_{cycle}\), and an identity loss \(\mathcal{L}_{identity}\). This process includes two mapping functions, \(G_Y: X \to Y\) and \(G_X: Y \to X\). The mapping function \(G_Y\) translates a CTart slice into a CTcor slice, whereas \(G_X\) translates a CTref slice into a CTart slice. The network also consists of two adversarial discriminators, \(D_X\) and \(D_Y\), which aim to identify the translated images as fake: \(D_X\) aims to distinguish \(x\) from \(G_X(y)\), and \(D_Y\) aims to distinguish \(y\) from \(G_Y(x)\). For \(G_Y\), \(\mathcal{L}_{adv}\) is the mean squared error (MSE) between the output \(G_Y(x)\) and the target domain \(Y\). \(\mathcal{L}_{cycle}\) calculates the translation error between \(x\) and \(G_X(G_Y(x))\) through the translation \(X \to Y \to X\), and vice versa for \(Y \to X \to Y\). \(\mathcal{L}_{identity}\) was introduced to regularize \(G_X\) and \(G_Y\) so that they do not induce any changes when \(x\) and \(y\), respectively, are given as input.
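For reference, the combined generator objective can be sketched as follows, using the convention of Fig. 3 in which \(G_Y\) maps domain X to domain Y and \(G_X\) maps domain Y to domain X. The sketch assumes the standard CycleGAN formulation [43] with a least-squares adversarial term computed on the discriminator output and L1 norms for the cycle and identity terms. These choices are assumptions made for illustration and are not stated explicitly in the text; \(\lambda_{cycle}\) and \(\lambda_{identity}\) denote the regularization weights of the two terms.

$$\begin{aligned}
\mathcal{L}_{adv}(G_Y) &= \mathbb{E}_{x}\left[\left(D_Y\left(G_Y(x)\right) - 1\right)^2\right], \qquad \mathcal{L}_{adv}(G_X) = \mathbb{E}_{y}\left[\left(D_X\left(G_X(y)\right) - 1\right)^2\right] \\
\mathcal{L}_{cycle} &= \mathbb{E}_{x}\left[\left\| G_X\left(G_Y(x)\right) - x \right\|_1\right] + \mathbb{E}_{y}\left[\left\| G_Y\left(G_X(y)\right) - y \right\|_1\right] \\
\mathcal{L}_{identity} &= \mathbb{E}_{x}\left[\left\| G_X(x) - x \right\|_1\right] + \mathbb{E}_{y}\left[\left\| G_Y(y) - y \right\|_1\right] \\
\mathcal{L}_{G} &= \mathcal{L}_{adv}(G_Y) + \mathcal{L}_{adv}(G_X) + \lambda_{cycle}\,\mathcal{L}_{cycle} + \lambda_{identity}\,\mathcal{L}_{identity}
\end{aligned}$$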

In this study, regularization parameter values of \(\lambda_{cycle}=10\) and \(\lambda_{identity}=15\) were chosen among several examined parameter sets. To implement Cycle-MAR, the ResNet [52] and the PatchGAN [42] architectures were used as the generator and the discriminator, respectively. The network was trained using the Adam optimizer [53] for 500 epochs with a batch size of 1, which produced the best results, and was validated with five-fold cross-validation. Five-fold cross-validation was chosen based on previous studies [29, 33, 54]. To avoid overfitting, data augmentation by rotation, horizontal flip, resized crop, and image perspective transformation was applied during training. The Cycle-MAR network was implemented using Python 3.8.8 with the PyTorch framework (version 1.10.2). An NVIDIA TITAN RTX GPU with 25 GB of memory was used throughout the experiments.
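The five-fold cross-validation can be illustrated with a short NumPy sketch. Splitting at the patient level (so that slices from one patient never appear in both the training and validation folds), the function name, and the random seed are assumptions made for illustration; the text does not specify how the folds were formed.

```python
import numpy as np


def five_fold_splits(patient_ids, n_folds=5, seed=0):
    """Yield (train_ids, val_ids) pairs, one per fold, split by patient."""
    rng = np.random.default_rng(seed)
    ids = np.array(patient_ids)
    rng.shuffle(ids)                      # random assignment of patients to folds
    folds = np.array_split(ids, n_folds)  # fold sizes differ by at most one
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train.tolist(), val.tolist()
```

Each patient is used for validation exactly once, and the model is retrained from scratch on the remaining four folds each time.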

Data split for training and testing

Table 2 shows the data split strategy for the training and testing of Cycle-MAR. The network was trained using randomly selected paired clinical CT scans consisting of CTref scans and the corresponding simulated CTart scans. It was then tested on the phantom CT scans with real artifacts and on the clinical CT scans with simulated artifacts. The HU values on the clinical CT scans were clipped to the range from −1000 to 1000, and the resulting pixel values were normalized to the range from −1 to 1 to improve the training efficiency [28].
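The clipping and normalization step can be written compactly as below. This is a minimal sketch that assumes a linear rescaling of the clipped HU range to [−1, 1]; the function names are illustrative, and the inverse mapping is included because network outputs have to be converted back to HU before any HU-based evaluation.

```python
import numpy as np


def normalize_hu(ct_hu, lo=-1000.0, hi=1000.0):
    """Clip HU values to [lo, hi] and rescale linearly to [-1, 1]."""
    clipped = np.clip(np.asarray(ct_hu, dtype=float), lo, hi)
    return 2.0 * (clipped - lo) / (hi - lo) - 1.0


def denormalize_hu(x, lo=-1000.0, hi=1000.0):
    """Map network values in [-1, 1] back to HU in [lo, hi]."""
    return (np.asarray(x, dtype=float) + 1.0) / 2.0 * (hi - lo) + lo
```

Note that clipping discards information above 1000 HU, so metal and very dense bone pixels all map to the same normalized value of 1.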

Table 2 Data split strategy for training and testing of the Cycle-MAR network

Comparison with commercial and research-based MAR algorithms

The performance of the developed Cycle-MAR network was compared with commercially available MAR algorithms and research-based MAR algorithms. iMAR and O-MAR were directly applied during the reconstruction of the phantom CT scans by the scanners. iMAR was not applied to the Siemens PET-CT scans, because it was not available on this particular scanner. In addition to this, MDT, CCS-MAR, and Cycle-MAR were also applied to all phantom and clinical CT scans.

Image quality metrics analysis

Structural similarity (SSIM) index, root mean square error (RMSE) of the HU values, and peak signal-to-noise ratio (PSNR) [38, 55] were calculated to evaluate the performance of the Cycle-MAR network for metal artifact reduction and image quality improvement. These metrics were calculated for the CTart and the CTcor scans compared to the CTref scans. The analysis was performed using the overall mean values of these image quality metrics calculated for all CTart and CTcor scans.
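For illustration, the three metrics can be computed from a reference image and a test image as in the following NumPy sketch. Two simplifications are assumed here: the `data_range` argument (the HU span used as peak value) must be supplied by the user, and SSIM is evaluated once over the whole image rather than with the sliding local window used in the usual definition, so its values will differ from those of standard library implementations. The function names are illustrative.

```python
import numpy as np


def rmse(ref, img):
    """Root mean square error between two images (same units as the input)."""
    ref, img = np.asarray(ref, dtype=float), np.asarray(img, dtype=float)
    return float(np.sqrt(np.mean((ref - img) ** 2)))


def psnr(ref, img, data_range):
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    err = rmse(ref, img)
    return float("inf") if err == 0 else float(20.0 * np.log10(data_range / err))


def global_ssim(ref, img, data_range, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole image (simplified, no local windows)."""
    ref, img = np.asarray(ref, dtype=float), np.asarray(img, dtype=float)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_r, mu_i = ref.mean(), img.mean()
    var_r, var_i = ref.var(), img.var()
    cov = ((ref - mu_r) * (img - mu_i)).mean()
    num = (2.0 * mu_r * mu_i + c1) * (2.0 * cov + c2)
    den = (mu_r ** 2 + mu_i ** 2 + c1) * (var_r + var_i + c2)
    return float(num / den)
```

Higher SSIM and PSNR and lower RMSE of a CTcor scan relative to the CTart scan, both measured against the CTref scan, indicate a successful artifact reduction.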

HU value restoration evaluation

For the clinical CT scans, HU value measurements on specific regions were performed on the CTart scans, the CTcor scans and the CTref scans. The contour-based mean HU values and standard deviation (STD) were calculated for the entire heart, lungs, and bone regions from all CT slices using MATLAB (The MathWorks Inc, USA) (see Fig. 5). The percentage of mean HU value improvement for the CTcor scans was calculated using the following equation:

$$\text{Mean HU value improvement}\,(\%) = \frac{\left|\Delta \text{HU}\right|_{\text{CT}_{\text{ref}},\,\text{CT}_{\text{art}}} - \left|\Delta \text{HU}\right|_{\text{CT}_{\text{ref}},\,\text{CT}_{\text{cor}}}}{\left|\Delta \text{HU}\right|_{\text{CT}_{\text{ref}},\,\text{CT}_{\text{art}}}} \times 100\%$$
(1)

where \(\left|\Delta \text{HU}\right|_{\text{CT}_{\text{ref}},\,\text{CT}_{\text{art}}}\) and \(\left|\Delta \text{HU}\right|_{\text{CT}_{\text{ref}},\,\text{CT}_{\text{cor}}}\) indicate the absolute difference in the contour-based mean HU values between the CTref scans and the corresponding CTart and CTcor scans, respectively.
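As a worked illustration of Eq. (1), the following Python sketch computes the improvement percentage from three contour-based mean HU values. The function name and the numbers in the usage note are hypothetical and are not taken from the results of this study.

```python
def hu_improvement_percent(mean_ref, mean_art, mean_cor):
    """Eq. (1): relative reduction of the mean HU error after correction."""
    delta_art = abs(mean_ref - mean_art)  # |dHU| between CTref and CTart
    delta_cor = abs(mean_ref - mean_cor)  # |dHU| between CTref and CTcor
    if delta_art == 0:
        raise ValueError("CTart equals CTref in this region; Eq. (1) is undefined")
    return (delta_art - delta_cor) / delta_art * 100.0
```

For example, with hypothetical mean values of 40 HU (CTref), 90 HU (CTart) and 50 HU (CTcor), the error drops from 50 HU to 10 HU, giving an improvement of 80%. A negative result indicates that the correction moved the mean HU value further away from the reference.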

Results

Phantom scans analysis

Figure 4 shows the CTref scans and the corresponding CTart and CTcor scans from the ART, ATOM®, and CT torso phantoms. On visual inspection, Cycle-MAR generally outperformed the other MAR algorithms in reducing the intense dark and bright regions near the US transducer. Residual streak artifacts were observed on the CTcor scans after the Cycle-MAR application, especially in the ART phantom scans (red mark in Fig. 4). In the ATOM® phantom scans, the O-MAR and MDT applications induced secondary dark streak artifacts (yellow marks in Fig. 4). The Cycle-MAR application on the CTart scans generally increased the calculated SSIM and PSNR values and decreased the RMSE values (Table 3).

Fig. 4

From top row to bottom: CT scans of the ATOM®, ART, and CT torso phantoms. The images from left to right show: the CTref scan with the US transducer details, the CTart scan, and the CTcor scans after application of the MAR algorithms [Window level/width: 50/350]. The yellow marks indicate induced secondary artifacts, while the red mark indicates residual streak artifacts

Table 3 Mean values of SSIM, PSNR and RMSE for the CT scans of the ATOM®, ART and CT torso phantoms

Clinical scans analysis

Examples of CTart scans from three randomly selected patients and the effect of the MDT, CCS-MAR, and Cycle-MAR applications on them are shown in Fig. 5. Based on visual inspection, the Cycle-MAR application on the CTart scans restored the soft tissue and bone details better than the MDT and CCS-MAR algorithms (red and green arrows in Fig. 5). The overall mean values of SSIM, PSNR, and RMSE for all clinical CT scans from 14 patients are shown in Table 4. The Cycle-MAR application on the CTart scans generally resulted in higher mean SSIM and PSNR values and lower mean RMSE values.

Fig. 5

CT scans from three patients. The images from left to right show: the CTref scan with the US transducer details, the CTart scan, and the CTcor scans after MDT, CCS-MAR, and Cycle-MAR applications [Window level/width: 50/350]. The red and green arrows indicate the changes of soft tissue density. The contours for the HU value measurements are shown on the CTref scans: the heart (red), bone (blue) and lungs (green)

Table 4 The overall mean values of SSIM, PSNR and RMSE for all clinical CT scans from 14 patients

Table 5 shows the overall mean (± STD) HU values for the CTref scans, together with the absolute differences in the overall mean HU values and in the standard deviation (STD) of the HU values between the CTref scans and the CTart and CTcor scans, for the heart, lungs, and bone regions on the clinical CT scans. The application of the MAR algorithms improved the HU value measurements across all regions. The Cycle-MAR application restored the mean HU values for the heart and lung regions better than MDT and CCS-MAR.

The percentage of mean HU value improvement for the heart, lungs and bone regions is shown in Fig. 6. For the heart region, in which the target is typically located, the mean HU value improvement after the MDT, CCS-MAR, and Cycle-MAR applications was 59.58%, 62.22%, and 72.84%, respectively. The lungs and bone were considered as OARs in this study. For the lungs, the highest improvement percentage was found after the application of Cycle-MAR, and for the bone, after the application of MDT.

Fig. 6

HU values improvement percentage for the heart, lungs and bone regions on the CTcor scans after MDT, CCS-MAR and Cycle-MAR application

Table 5 The overall region-based Hounsfield unit (HU) value measurements and calculation using all clinical CT scans from 14 patients

Discussion

In this study, a Cycle-MAR algorithm which used paired CT scans for training purposes was proposed to reduce the US transducer-induced metal artifacts on planning CT scans for US-guided cardiac radioablation. Cycle-MAR was evaluated for the improvement of image quality and HU value restoration compared to the commonly used commercial and research-based MAR algorithms.

Overall, the proposed model reduced the metal artifacts more effectively on the clinical CT scans than on the phantom CT scans. Only clinical CT scans, and no phantom CT scans, were used for the Cycle-MAR training. This might be a reason for the noticeable residual streaks on the phantom CTcor scans after the Cycle-MAR application (see Fig. 4). This could potentially be improved by adding separate sets of phantom scans to the training set. On the other hand, the algorithm is ultimately intended for clinical use, and good performance on phantom scans is therefore less important. To further reduce the residual streaks, the CTcor scans generated during the training of Cycle-MAR may be added to the training data set; this is under consideration for future work.

All MAR algorithms restored the measured HU values for the cardiac structures to within 21 HU, which is well below the tolerance of 30 HU for water-like material recommended by the American Association of Physicists in Medicine (AAPM) guidelines [56] for image-guided radiation therapy. For the lung and bone, the same guidelines recommend that the HU value deviation be within 50 HU.

The application of conventional MAR algorithms on the CTart scans modified the anatomical structures and induced a number of secondary artifacts (see Figs. 4 and 5). These MAR algorithms apply their correction to the projection data. Small errors in local corrections of the projection data can therefore affect the reconstructed CT scan globally [12]. In contrast, Cycle-MAR works in the image space and does not require any projection data for artifact correction. This means that local changes are applied only to a specific area of the CT scan.

To the best of our knowledge, this is the first study to investigate the application of a deep learning model, CycleGAN, for the reduction of US transducer-induced metal artifacts on CT scans and to compare it with state-of-the-art MAR algorithms. Even though Cycle-MAR generally reduced the metal artifacts better than the other MAR algorithms, especially in the clinical CT scans, a reduction in image contrast was observed on the CTcor scans after the Cycle-MAR application. A possible reason for this is the inherent limitation of the generator in the conversion and/or reassignment accuracy of pixel values during feature extraction and image translation. In addition, the direct optimization of pixel differences through the loss function may also result in reduced image contrast or a blurry appearance on CT scans [31, 57]. Therefore, investigating a different generator, especially DenseNet (Densely Connected Network) [58, 59] instead of ResNet, and examining more appropriate loss functions may solve this issue.

This work has a few limitations. The performance of Cycle-MAR was evaluated using clinical CT scans with simulated US transducer-induced metal artifacts. To draw final conclusions regarding the performance of the proposed MAR algorithm, further evaluation using clinical data with real artifacts is necessary. However, this can be challenging because paired data require an additional CT scan [one with the probe in place (CTart) and another one without it (CTref)], which is difficult to justify ethically.

In addition to the evaluation of image quality improvement and HU value restoration, the dosimetric impact of the metal artifact reduction including the accuracy in contouring, and the calculation of dose distribution for the arrhythmogenic tissue (target) and OARs during the treatment planning is also crucial. Future work will therefore include an evaluation of the dosimetric impacts of the application of the Cycle-MAR network.

Conclusion

This work developed a MAR network based on a deep learning CycleGAN which can be used to reduce metal artifacts resulting from the presence of a US transducer during CT scan acquisition. The performance of the proposed algorithm was evaluated for the metal artifact reduction abilities on phantom and clinical CT scans in comparison with commonly used commercial and research-based MAR algorithms. The results of the study have shown that the proposed Cycle-MAR considerably reduces the metal artifacts, while preserving the bone density and soft tissue details. Future challenges and analysis include exploring appropriate loss functions for the improvement of adversarial training, and dosimetric evaluations using clinical CT scans.