Abstract
Purpose
During the acquisition of MRI data, patient-, sequence-, or hardware-related factors can introduce artefacts that degrade image quality. Four of the most significant tasks for improving MRI image quality are bias field correction, super-resolution, motion correction, and noise correction. Machine learning has achieved outstanding results in improving MR image quality for these tasks individually, yet multitask methods are rarely explored.
Methods
In this study, we developed a model to simultaneously correct for all four aforementioned artefacts using multitask learning. Two datasets were collected, one consisting of brain scans and the other of pelvic scans, which were used to train separate models with their corresponding artefact augmentations. Additionally, we explored a novel loss function that aims to reconstruct not only the individual pixel values, but also the image gradients, to produce sharper, more realistic results. The differences between the evaluated methods were tested for significance using a Friedman test of equivalence followed by a Nemenyi post-hoc test.
Results
Our proposed model generally outperformed other commonly used correction methods for individual artefacts, consistently achieving equal or superior results in at least one of the evaluation metrics. For images with multiple simultaneous artefacts, we show that the performance of a combination of models, each trained to correct an individual artefact, depends heavily on the order in which they are applied. This is not an issue for our proposed multitask model. The model trained using our novel convolutional loss function always outperformed the model trained with a mean squared error loss when evaluated using Visual Information Fidelity, a quality metric connected to perceptual quality.
Conclusion
We trained two models for multitask MRI artefact correction of brain and pelvic scans. We used a novel loss function that significantly improves the image quality of the outputs over using mean squared error. The approach performs well on real-world data, and it provides insight into which artefacts it detects and corrects for. Our proposed model and source code have been made publicly available.
Background
Due to the sensitivity of the magnetic resonance imaging (MRI) data acquisition process, slight changes around the scanner system can produce artefacts in the images. The artefacts generally belong to three main classes: hardware-related (e.g. magnetic field inhomogeneities, zipper artefacts, bias fields), sequence-related (e.g. aliasing, subsampled k-space, Gibbs ringing, low signal-to-noise ratio (SNR) due to short acquisition time) or patient-related (e.g. involuntary patient motion) [1, 2].
Four of the most significant, often addressed MRI artefacts have been identified as: bias fields, subsampled k-space, patient motion, and noise. Proposed machine learning solutions commonly establish state-of-the-art results [3,4,5,6] for correcting these artefacts individually. However, a model trained for an individual task most commonly does not account for any other artefacts. For example, bias field correction is often handled as a preprocessing step in models for other tasks [7, 8] instead of something the model itself considers. In real-life scenarios, multiple artefacts can appear simultaneously, hindering the performance of such models. Multitask learning offers a solution and has been shown to result in more robust models [9], yet it has only been applied to MRI artefact correction in a few cases [10, 11], where multiple artefacts are included in the training dataset but the model architecture does not separate the corrections according to the artefact they correspond to.
Common conclusions from research on correcting these individual artefacts show that after training on artificially augmented artefacts, the performance translates well to real-life data [12,13,14]. Furthermore, meaningful corrections can be performed in both the image space [15] and the Fourier space (k-space) [16] of the data. A recently proposed framework for MRI super-resolution [17] exploits the fact that the problem can be viewed in both the image and Fourier space by implementing Fourier transform layers in the model architecture, an idea that has recently been applied to reconstruct MRI images [11].
In machine learning research for medical imaging, the most common choices for loss functions are the pixel-wise \(L_1\) and \(L_2\) losses, since they are easy to compute and usually lead to quick convergence [18]. However, it is becoming increasingly clear that they correlate poorly with perceptual quality [19, 20]. Alternative loss functions have recently been proposed for more robust results [21] or for better correlation with the perceptual quality of the images. This can be achieved by using the deep features of the model in the loss [22], or by using an adversarial model in a GAN setting [23], but these tend to increase the complexity of training the model, and the self-supervision of GANs introduces other concerns such as hallucinations [24]. Alternatively, small modifications such as including the image gradients in the loss function have been shown to benefit the sharpness of the predictions [25, 26].
In the presented work, we investigate a multitask model, trained using a novel loss function, and how it performs on the task of MRI artefact correction compared to a combination of approaches that each handle only a single task. Our main contributions can be divided into the following components:

Using a multitask learning approach, our model is trained to correct for four types of artefacts simultaneously, even though it has not encountered an image with multiple artefacts during training.

Our model was trained using a novel loss function based on convolutional kernels that is simple to compute, yet contains image gradient information that can help the model reconstruct images of better perceptual quality.

Our implementations of the artefact augmentations and the trained models are publicly available.
Methodology
We begin by discussing our model architecture, followed by a summary of the datasets used. Then we detail our augmentations of four artefacts in MR imaging, identified as some of the most significant: bias fields, subsampled k-space, patient motion, and noise. This is followed by the proposed alternative loss function.
Model
Model architectures that incorporate the k-space information as well as the image-space information about the input image have been shown to perform well for MR image reconstruction tasks [16, 17]; therefore, we decided to implement an interleaved model architecture that simultaneously corrects the images in both domains [11].
Our baseline architecture is the SRResNet [27], originally introduced for super-resolution tasks, chosen for its photo-realistic reconstructions. To exploit both the image and frequency domains, we used a combination of two SRResNet models. The first model operates on the input image and outputs the residuals in the image space. The second model applies a Fourier transform layer to the input image, which means that it operates on the k-space of the data. Additionally, a final layer performing an inverse Fourier transform was added to the outputs of this model. The outputs of the two models, both in the image space, are added together to form the outputs of our proposed architecture.
Both models contain twelve convolutional blocks, starting with five strided blocks for downsampling and ending with five blocks with upsampling layers. The output of each strided block is also concatenated with the input of the corresponding upsampling block, following the U-Net architecture [28].
Each block at downsampling depth d contains three 2D convolutional layers with \(64 \cdot 2^d\) channels, where \(d=\{0, \ldots , 4\}\), with a kernel size of three. These are followed by batch normalization layers and a LeakyReLU activation with a slope of 0.2.
Four individual convolutional layers with a filter size of 1 provide the model outputs. The outputs of the two models are added together, providing the four final output residuals of our proposed model. Each output corresponds to a residual, \(p_i\), for \(i\in \{1,2,3,4\}\), for an artefact. The reconstructed image, v, is obtained by adding all residuals to the input image u, as \(v = u + \sum _i p_i\). A Z-normalization layer is applied to the reconstructed image.
An \(\mathcal {L}_2\) regularization term was also added to each of the final output residuals \(p_i\), defined as
\[ \mathcal {L}_{reg} = \alpha \sum _{i=1}^{4} \Vert p_i \Vert _2^2 . \]
As the regularization term penalizes the residuals of each artefact, it encourages the model to keep each residual small, i.e. to keep the input image unchanged. The parameter \(\alpha\) determines the balance between the regularization term and the main objective loss. Our implementation of the model has input and output sizes of \(320\times 320\). The model returns four residuals, one for each of the four artefacts: bias fields, k-space subsampling, motion, and noise. The model has a total of 75.4 million parameters.
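The residual combination and final Z-normalization described above can be sketched in a few lines (illustrative NumPy only; the array names and random inputs are ours, not the released implementation):

```python
import numpy as np

# Sketch of how the four per-artefact residuals are combined into the
# reconstructed image. Names and inputs are illustrative.
H = W = 320

def z_normalize(img):
    """Z-normalization layer applied to the reconstructed image."""
    return (img - img.mean()) / (img.std() + 1e-8)

u = np.random.rand(H, W)                    # input image with artefacts
residuals = [np.random.randn(H, W) * 0.01   # p_1..p_4: bias, subsampling,
             for _ in range(4)]             # motion, and noise corrections

v = z_normalize(u + np.sum(residuals, axis=0))  # v = u + sum_i p_i
```

Keeping the four residuals separate is what later allows the per-artefact outputs to be inspected or selectively discarded.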
Data
We performed two sets of experiments: first, the models were evaluated quantitatively on artefact correction of brain scans, using a public dataset for training, validation, and testing. Second, a model was trained and validated on an in-house pelvic MRI dataset and then evaluated qualitatively on a publicly available dataset for reproducibility. The datasets are further described below.
Datasets
Brain The public dataset contains 3 969 3T brain MR scans [29, 30] of size \(320\times 320\), introduced for the fastMRI challenge [29, 30]^{Footnote 1} for the task of MRI super-resolution. The dataset was split into training, validation, and testing datasets using 70, 10, and \(20\%\) of the patients, respectively.
Pelvic An in-house dataset that includes \(T_2\)-weighted pelvic MR images from 375 patients, captured using a GE Signa 3T PET/MR (GE Healthcare, Chicago, Illinois, United States) at the University Hospital of Umeå, Sweden. The images were \(512\times 512\) with 131 slices per patient. Each image slice was cropped in the Fourier space to yield a \(320\times 320\) image. The dataset was split into training and validation datasets using 80 and \(20\%\) of the patients, respectively.
For testing, we used the \(T_2\)-weighted images from the publicly available Gold Atlas dataset [31]^{Footnote 2}. Similarly, the image slices were \(512\times 512\); each slice was therefore cropped in the Fourier space to size \(320\times 320\).
Artefacts
The training and validation datasets of both the brain and pelvic scans were augmented to obtain images with a varied set of MRI corruptions. Four different types of artefacts were implemented: bias fields, k-space subsampling, motion, and noise. To each input image, only one of the augmented artefacts, with randomized parameters, was applied. We now describe these augmentations in turn.
Bias field removal
The term bias field can refer to the effect of various artefacts, for example those caused by a non-uniformity in the \(\textrm{B}_0\) static field and the transmitted \(\textrm{B}_1\) field [1], the inhomogeneity of the RF receive coil [32], or heterogeneous \(\textrm{B}_1\) fields. Extensive research on the characteristics of bias fields [33] shows that despite its complex combination of origins, a bias field can be described as a low-frequency multiplicative imaging artefact causing a smooth spatial intensity variation across the image [34,35,36]. The bias fields were generated in the same way as done by Simkó et al. [36], i.e. we simulated each bias field as a spatial random field (SRF) [37]. For each field, a Gaussian covariance model was used, defined by
where r is the distance from a randomly chosen peak of the Gaussian and l a length scale relating to the frequency of the field. In the covariance model we used a variance of 50 and a length scale of \(10< l < 50\), after downscaling the image to a size of \(32 \times 32\) for computational efficiency.
Bias fields can cause an intensity variation in the range of 10% to 40% [34, 35]. In the paper proposing N4ITK, a commonly used correction method [38], bias field ranges of 20% and 40% were used for evaluation. However, we used larger bias field ranges, chosen randomly between 10% and 100%, corresponding to absolute minimum and maximum values between [0.95, 1.05] and [0.5, 2], respectively. This was done to ensure that the bias artefacts introduced a degradation in MSE similar to that of the other artefact types.
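A smooth multiplicative field of this kind can be approximated by low-pass filtering white noise in Fourier space. Note that this is a simplified stand-in for the spatial random field generator [37] used in the paper, and the function name and parameters below are illustrative:

```python
import numpy as np

def random_bias_field(shape=(320, 320), strength=0.4, cutoff=4, seed=0):
    """Generate a smooth multiplicative field in [1 - strength/2, 1 + strength/2].

    Stand-in for the SRF generator used in the paper: white noise is
    low-pass filtered in Fourier space to obtain the characteristic
    slowly varying intensity modulation.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(shape)
    f = np.fft.fftshift(np.fft.fft2(noise))
    cy, cx = shape[0] // 2, shape[1] // 2
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    lowpass = (yy - cy) ** 2 + (xx - cx) ** 2 <= cutoff ** 2
    field = np.real(np.fft.ifft2(np.fft.ifftshift(f * lowpass)))
    field = (field - field.min()) / (field.max() - field.min())  # -> [0, 1]
    return 1.0 - strength / 2 + strength * field

image = np.ones((320, 320))
corrupted = image * random_bias_field(strength=0.4)  # multiplicative artefact
```

With `strength=0.4` the field spans [0.8, 1.2], matching the 40% case described above.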
k-space super-resolution
The fully sampled k-space data were subsampled retrospectively by selecting only a subset of the k-space lines (frequencies) using subsampling masks. We used the sampling masks proposed in the fastMRI challenge [30] and also added our own centered masks, selecting only the centre of the k-space. All masks are based on Cartesian sampling, meaning they follow a rectilinear pattern.
For the fastMRI masks, k-space subsampling was only performed in the phase-encoding direction. When acquiring k-space data, frequency-encoding (FE) and phase-encoding (PE) gradients are applied in perpendicular directions to specify the location of the signal. Consecutive steps in the FE direction can be measured with a single radiofrequency pulse, whereas in the PE direction a separate radiofrequency pulse must be applied for each step. Consequently, PE takes more time than FE and is therefore more susceptible to movement artefacts. For the axial brain scans from the fastMRI dataset, the PE direction was from left to right (LR). In this study, the PE direction was chosen randomly for each image for both subsampling types.
The purpose of k-space subsampling is to accelerate image acquisition. For example, when selecting only half of the k-space lines, the acquisition time is halved, giving an acceleration factor of 2. Here, the total acceleration factor was chosen uniformly at random to be 2, 3, 4, or 8 to combine the acceleration factors used in the fastMRI dataset [30]. Depending on this acceleration factor, a band of 16, 12, 8, or 4% of the total k-space lines was included in the centre of the fastMRI masks to maintain the low-frequency information of the image. The remaining k-space lines were sampled either equidistantly or randomly, chosen uniformly. The equidistant sampling began with a random offset from the start, so that a few k-space lines could be skipped before the sampling began. For the centered masks, only the centre square of k-space data was kept. The width of this square was again 16, 12, 8, or 4% of the total k-space width, depending on the selected acceleration factor.
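The fastMRI-style masking scheme can be sketched as follows (an approximation for illustration; exact line counts and the official fastMRI implementation differ):

```python
import numpy as np

def cartesian_mask(num_cols=320, acceleration=4, center_fraction=0.08,
                   equidistant=True, seed=0):
    """Column-selection mask in the phase-encoding direction.

    Roughly follows the fastMRI masking scheme: a fully sampled centre
    band plus equidistant (or random) outer lines.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros(num_cols, dtype=bool)
    num_center = int(round(num_cols * center_fraction))
    start = (num_cols - num_center) // 2
    mask[start:start + num_center] = True        # low-frequency centre band
    if equidistant:
        offset = int(rng.integers(0, acceleration))  # random initial offset
        mask[offset::acceleration] = True
    else:
        n_extra = max(num_cols // acceleration - num_center, 0)
        extra = rng.choice(np.flatnonzero(~mask), size=n_extra, replace=False)
        mask[extra] = True
    return mask

mask = cartesian_mask(acceleration=4, center_fraction=0.08)
sampled_lines = int(mask.sum())   # lines kept out of 320
```

Applying `mask` along the PE axis of the k-space and zero-filling the rest reproduces the retrospective subsampling described above.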
Motion correction
We added rigid motion to the brain scans, and both rigid and periodic motion to the pelvic scans.
Rigid-motion-corrupted scans were simulated by applying rigid transformations in image space over a series of time steps. First, the number of k-space lines to be corrupted was uniformly selected between 30 and 80% of the total k-space lines, and these lines were then split randomly into 4–24 segments. A piecewise constant approach was used, where all lines within a segment were corrupted with the same motion parameters. The rotation and translation parameters were sampled from a Gaussian distribution with zero mean and standard deviations of 12\(^{\circ }\) and 30 voxels, respectively. These transformations were applied in image space and the corrupted image was converted back to k-space. The k-space lines belonging to each segment were sampled, and the segments were then combined to form the motion-corrupted k-space.
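A reduced, translation-only version of this piecewise constant corruption can be sketched with FFT phase ramps (the paper additionally applies rotations and corrupts only 30 to 80% of the lines; names and parameters here are illustrative):

```python
import numpy as np

def translate(img, dy, dx):
    """Shift an image by (dy, dx) pixels using Fourier phase ramps."""
    ky = np.fft.fftfreq(img.shape[0])[:, None]
    kx = np.fft.fftfreq(img.shape[1])[None, :]
    phase = np.exp(-2j * np.pi * (ky * dy + kx * dx))
    return np.fft.ifft2(np.fft.fft2(img) * phase)

def add_rigid_motion(img, n_segments=6, shift_std=10.0, seed=0):
    """Piecewise constant motion corruption (translations only).

    Each k-space segment is taken from a differently shifted copy of
    the image, mimicking motion between acquisition time steps.
    """
    rng = np.random.default_rng(seed)
    k = np.fft.fft2(img)
    rows = img.shape[0]
    bounds = np.sort(rng.choice(rows, size=n_segments - 1, replace=False))
    for lo, hi in zip(np.r_[0, bounds], np.r_[bounds, rows]):
        dy, dx = rng.normal(0.0, shift_std, size=2)
        k[lo:hi, :] = np.fft.fft2(translate(img, dy, dx))[lo:hi, :]
    return np.abs(np.fft.ifft2(k))

clean = np.zeros((64, 64))
clean[24:40, 24:40] = 1.0
corrupted = add_rigid_motion(clean)
```

Combining segments from inconsistently shifted images is what produces the characteristic ghosting and ripple patterns along the PE direction.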
The periodic motion, which was added to simulate breathing, was implemented similarly to Tamada et al. [39]. A phase error was added to the k-space in the PE direction, which for the pelvic images was anterior–posterior (AP). The corrupted k-space signal S was given by
\[ S(k_x, k_y) = S_{0}(k_x, k_y)\, e^{-i \phi (k_y)}, \]
where \(S_{0}\) is the original k-space signal, \(k_{x}\) and \(k_{y}\) the FE and PE directions with \(-\pi \le k_{x} \le \pi\) and \(-\pi < k_{y} \le \pi\), and \(\phi (k_{y})\) the phase error. The phase error is defined as
where \(k_{y_{0}}\) is the range of centre k-space lines to which no motion was added, with \(\frac{\pi }{10} < k_{y_{0}} \le \frac{\pi }{8}\). This was done to keep the corrupted image better aligned with the clean image. Furthermore, \(\Delta\) is the extent of the motion in pixels with \(20 \le \Delta \le 120\), \(\alpha\) is the frequency of the respiratory wave in Hz with \(0.1 \le \alpha \le 5\), and \(\beta\) is the phase of the respiratory wave with \(0 \le \beta \le \frac{\pi }{4}\). For each of these four parameters, the exact values were uniformly selected from the specified ranges. We used smaller values for \(k_{y_{0}}\) and larger values for \(\Delta\) than Tamada et al. [39], to simulate larger motion artefacts.
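The exact phase-error expression is given in Tamada et al. [39]; the sketch below uses an assumed sinusoidal waveform with the same ingredients (\(\Delta\), \(\alpha\), \(\beta\), and an untouched centre band \(k_{y_0}\)) purely to illustrate the structure of the corruption:

```python
import numpy as np

# Illustrative sketch only: the waveform inside `phi` is an assumption,
# not the exact expression from Tamada et al. [39]. What matters here is
# the structure: a sinusoidal phase error along k_y, with the centre
# band |k_y| <= k_y0 left untouched to keep the image aligned.
def periodic_phase_error(ky, delta=60.0, alpha=1.0, beta=0.0, ky0=np.pi / 9):
    phi = delta * ky * np.sin(alpha * ky + beta)   # assumed waveform
    phi[np.abs(ky) <= ky0] = 0.0                   # keep centre lines clean
    return phi

ny = 320
ky = np.linspace(-np.pi, np.pi, ny, endpoint=False)
phi = periodic_phase_error(ky)
kspace = np.ones((ny, ny), dtype=complex)          # toy k-space
corrupted = kspace * np.exp(-1j * phi)[None, :]    # S = S0 * exp(-i*phi(k_y))
```

Because only the phase is modified, the magnitude of each k-space sample is preserved; the artefact appears as ghosting after the inverse transform.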
Noise removal
We corrupted the k-space of the relatively noise-free images with complex Gaussian white noise such that the SNR of the k-space decreased to a designated value. The following definition of SNR was used, similar to Cohen et al. [40],
\[ \textrm{SNR} = 20 \log _{10} \left( \frac{S}{N} \right), \]
with S being the mean absolute k-space value and N the standard deviation of the added noise, which was equal for the real and imaginary components. The designated SNR for each k-space was selected uniformly in the range \([-12, 10]\) dB, based on visual inspection of the noise levels.
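The noise augmentation can then be sketched as follows, assuming the amplitude dB convention \(\textrm{SNR} = 20\log_{10}(S/N)\); that convention and the helper name are our assumptions:

```python
import numpy as np

def add_complex_noise(kspace, snr_db, seed=0):
    """Add complex Gaussian white noise to k-space for a target SNR.

    Assumes SNR(dB) = 20*log10(S/N), with S the mean absolute k-space
    value and N the per-component noise standard deviation (the exact
    dB convention of Cohen et al. [40] is our assumption here).
    """
    rng = np.random.default_rng(seed)
    s = np.mean(np.abs(kspace))
    n = s / (10 ** (snr_db / 20))            # required noise std
    noise = n * (rng.standard_normal(kspace.shape)
                 + 1j * rng.standard_normal(kspace.shape))
    return kspace + noise, n

image = np.random.default_rng(1).random((320, 320))
kspace = np.fft.fft2(image)
noisy_kspace, sigma = add_complex_noise(kspace, snr_db=0.0)
noisy_image = np.abs(np.fft.ifft2(noisy_kspace))
```

At 0 dB the noise standard deviation equals the mean absolute k-space value, which visibly corrupts the reconstructed image.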
Loss functions
Multitask
We implemented a multitask loss function [41] to correct multiple artefacts simultaneously. For each sample in the training data, the loss function takes into account which artefact was used to corrupt it; the model only minimizes the loss with respect to the output residual corresponding to that artefact and ignores the other residuals. The loss was
\[ \mathcal {L}_{MT}(u, v, p, y) = \Big \Vert v - \Big ( u + \sum _{i=1}^{n} \lambda [y = i]\, p_i \Big ) \Big \Vert _2^2, \]
where v denotes the artefact-free image, u the image with added artefacts, p the list of residuals for each of the \(n = 4\) artefacts, and y the index of the artefact that was used in the current training sample. Each residual is added to the output depending on the indicator function \(\lambda [y = i]\), which returns 1 if y equals i and 0 otherwise. The indicator function thus sets the unknown residuals to zero in the loss function.
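A minimal NumPy version of this masked objective (ignoring the regularization term) might look like:

```python
import numpy as np

def multitask_loss(u, v, residuals, y):
    """Masked reconstruction loss: only the residual for the artefact
    that was actually applied (index y) contributes; the others are
    ignored. Illustrative NumPy version of the training objective."""
    n = len(residuals)
    indicator = [1.0 if y == i else 0.0 for i in range(n)]
    prediction = u + sum(ind * p for ind, p in zip(indicator, residuals))
    return np.mean((v - prediction) ** 2)

rng = np.random.default_rng(0)
v = rng.random((8, 8))                      # artefact-free target
u = v + 0.1                                 # corrupted input
residuals = [np.zeros((8, 8)) for _ in range(4)]
residuals[2] = -0.1 * np.ones((8, 8))       # perfect correction, artefact 2
loss_correct = multitask_loss(u, v, residuals, y=2)  # residual used
loss_ignored = multitask_loss(u, v, residuals, y=0)  # residual ignored
```

When the sample's artefact index is 2, the matching residual cancels the corruption and the loss vanishes; for any other index that residual is masked out and the error remains.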
Perceptual loss
In the area of image restoration with machine learning, new models that outperform the state-of-the-art methods are often introduced, but the loss functions used when training such models have received less attention.
The MSE loss is often used in regression problems due to its simplicity, well-understood statistical interpretation, fast convergence, and low computational cost [18]. However, since the MSE assumes independent Gaussian-distributed errors, it often results in a loss of contrast. The MSE loss has also been shown to have a low correlation with perceptual quality [19].
The MSE loss regards only the differences between pixel intensities. We propose to also include information about the relationship between neighbouring pixels, in the form of image gradients, without significantly increasing the complexity of the loss computation. We begin by looking at the MSE loss,
\[ \mathcal {L}_{MSE}(x, \hat{x}) = \frac{1}{N} \sum _{j=1}^{N} (x - \hat{x})_j^2, \]
where \(x - \hat{x}\) is the residual matrix between the true and predicted images, j is the image pixel index, and N is the number of image pixels.
To this definition, we propose to introduce a convolutional kernel I, applied to both images before taking their element-wise difference, yielding the proposed convolutional loss \(\mathcal {L}\) as
\[ \mathcal {L}(x, \hat{x}; I) = \frac{1}{N} \sum _{j=1}^{N} \big ( I * x - I * \hat{x} \big )_j^2, \]
where \(*\) denotes a (discrete) convolution. If I equals the identity kernel (e.g. the \(3\times 3\) \(I_E\) in Table 1), the proposed convolutional loss function is identical to the MSE. Looking at the kernel \(I_E\), it comes as no surprise that only the differences between the pixel values are considered, while their relationships to their neighbouring pixels are disregarded.
We propose an extension of this loss function by using various kernels in place of \(I_E\). For example, using edge-detection kernels as I means that the loss will, by design, focus on high-contrast changes between the pixels. Our proposed convolutional loss function uses a combination of nine kernels, defined as
\[ \mathcal {L}_{C}(x, \hat{x}) = \sum _{k=1}^{9} \delta _k\, \mathcal {L}(x, \hat{x}; I_k), \]
where the kernels are collected in Table 1 and each component of the loss has a scaling factor \(\delta\). We include the Prewitt top (\(I_{PT}\)) and right (\(I_{PR}\)) operators, the Sobel operators for both the \(3\times 3\) and \(5\times 5\) cases (\(I_{S3T}\), \(I_{S3R}\), \(I_{S5T}\), and \(I_{S5R}\), respectively), which are often used for edge detection, and both the \(3\times 3\) and \(5\times 5\) Laplace operators (\(L_3\) and \(L_5\), respectively). We tune the corresponding \(\delta\) values during optimization to explore which components the models find most beneficial.
\(\mathcal {L}_{C}\) uses convolutions to incorporate the image gradients in the computed loss, while the optimization of the scaling factors allows the model to show which kernels make a significant contribution to the model performance. Once the optimal scaling factors are found, the proposed loss does not significantly increase the computational complexity over the MSE, only introducing convolution operations, which can be performed on a GPU.
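The loss can be sketched in NumPy with a subset of the kernels (the \(\delta\) values below are illustrative, not the optimized ones from Table 2):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """'Valid' 2D sliding-window correlation; kernel flipping is
    irrelevant for the purposes of this loss."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def conv_loss(x, x_hat, kernels, deltas):
    """Sketch of the proposed convolutional loss: a delta-weighted sum
    of MSEs computed on kernel responses instead of raw pixels."""
    total = 0.0
    for kernel, delta in zip(kernels, deltas):
        diff = conv2d_valid(x, kernel) - conv2d_valid(x_hat, kernel)
        total += delta * np.mean(diff ** 2)
    return total

identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float)
sobel_top = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)
sobel_right = sobel_top.T
kernels = [identity, sobel_top, sobel_right]  # subset of the nine kernels
deltas = [0.3, 1.0, 1.0]                      # illustrative scaling factors

rng = np.random.default_rng(0)
x = rng.random((16, 16))
x_hat = x + 0.05 * rng.standard_normal((16, 16))
loss = conv_loss(x, x_hat, kernels, deltas)
mse_only = conv_loss(x, x_hat, [identity], [1.0])  # reduces to cropped MSE
```

With only the identity kernel and \(\delta = 1\), the loss reduces to the MSE (up to the border cropping of the valid convolution), confirming the claim above.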
Experiments
To evaluate the effect of our multitask approach and the convolutional losses, we have trained seven different models.
As a benchmark on how well the proposed model architecture performs for correcting the individual tasks, we have trained four models using the brain scans for correcting only one of the artefacts: bias, subsampling, motion, and noise. For these models, the architecture was modified to return only one residual. The corresponding models are denoted Bias, Subsampling, Motion, and Noise, respectively.
A model was also trained on the brain scans using the multitask approach, correcting for all artefacts simultaneously, denoted MT.
Additionally, we have also trained a model using the multitask approach, as well as the proposed convolutional loss, denoted MT+\(\mathcal {L}_C\).
Finally, a multitask model using the convolutional loss was also trained on the pelvic dataset, denoted Pelvis MT+\(\mathcal {L}_C\).
The models were first evaluated and compared on their performance of correcting the individual artefacts. Afterwards the models were evaluated on how well they perform if multiple artefacts are present simultaneously. The Pelvis MT+\(\mathcal {L}_C\) model was then evaluated qualitatively on the Gold Atlas dataset.
We selected the evaluation metrics based on the findings of Mason et al. [19], who investigated the gap between commonly used image quality metrics and expert human evaluations of image quality. Despite their low correlation with perceptual quality, we used the MSE and SSIM metrics due to their popularity, and we also used the more complex Visual Information Fidelity (VIF) metric [42], since it has been shown to correlate well with human perceptual quality. The differences between the evaluated methods were tested for significance using a Friedman test of equivalence followed by a Nemenyi post-hoc test [43] with a significance threshold of 0.05.
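For reference, the Friedman statistic underlying this test can be computed directly from per-sample ranks (a sketch that ignores ties; in practice a statistics library routine would be used):

```python
import numpy as np

def friedman_statistic(scores):
    """Friedman chi-square statistic for comparing k methods on n samples.

    `scores` has shape (n_samples, k_methods); each row is ranked
    (smallest value gets rank 1; ties are not handled in this sketch).
    In the paper this is followed by a Nemenyi post-hoc test [43].
    """
    n, k = scores.shape
    ranks = scores.argsort(axis=1).argsort(axis=1) + 1  # per-sample ranks
    rank_sums = ranks.sum(axis=0)
    return 12.0 / (n * k * (k + 1)) * np.sum(rank_sums ** 2) - 3 * n * (k + 1)

# Toy example: method 0 is always best, method 2 always worst.
scores = np.array([[0.1, 0.2, 0.3],
                   [0.2, 0.4, 0.9],
                   [0.0, 0.5, 0.6]])
stat = friedman_statistic(scores)
```

For three methods ranked identically across three samples, the statistic reaches its maximum of \(n(k-1) = 6\), indicating perfect agreement among the rankings.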
Bayesian hyperparameter search
A Bayesian optimization was performed to find the best set of model hyperparameters. This method was selected over random or grid search due to its better efficiency, as it typically requires fewer iterations to find the optimal hyperparameters [44]. The training of each model was stopped when the performance on the validation dataset had not improved for 10 epochs, using MSE as the validation metric. The Bayesian process had 35 iterations for the models implementing the convolutional loss (MT+\(\mathcal {L}_C\) and Pelvis MT+\(\mathcal {L}_C\)), and 10 iterations for all other models, as the convolutional loss introduces 9 additional parameters to optimize.
We performed the Bayesian optimization over optimization algorithms (Adam [45] and RMSprop [46]), learning rates (\(10^\gamma\), selecting \(\gamma\) from the range \([-7, -2]\)), regularization parameters (from the range [0, 1]), and scaling factors (from the range [0, 1]), with the optimized values collected in Table 2.
Evaluating the artefact corrections
The performance on each individual task was investigated, with the trained models compared to analytical or established machine learning solutions proposed for the individual tasks. Note that while most of the comparison methods can only be applied to a single task, the MT and MT+\(\mathcal {L}_C\) models are the same across all experiments.
Bias field correction
The models trained for bias field correction (Bias, MT, and MT+\(\mathcal {L}_C\)) were compared to the N4ITK method provided by the SimpleITK package [38] and to a machine learningbased bias field correction model, described by Simkó et al. [36].
We applied artificial bias fields of \(5\%\), \(10\%\), and \(20\%\) (corresponding to normalizing the bias fields between [0.95, 1.05], [0.9, 1.1], and [0.8, 1.2], respectively) and compared the corrections of the three methods using SSIM and VIF.
The N4ITK method was optimized for use on the testing dataset without downsampling and using 4 control points. The input image intensities were scaled to a range between 0 and 10.
Subsampling the k-space
We compared our three models trained for improving the resolution of images (Subsampling, MT, and MT+\(\mathcal {L}_C\)) to bicubic upsampling and to UniRes^{Footnote 3} [47, 48], a machine learning solution for improving the resolution of MRI images. Their performance was evaluated on the brain scans from the testing dataset. Each image slice was downsampled using three acceleration factors (\(\times 2\), \(\times 3\), and \(\times 4\)) with the centered masks. Selecting only the centre of the k-space allows all values excluded by the mask to be removed, yielding a genuinely smaller downsampled image; this enables comparison with bicubic interpolation, which takes an image of smaller size and upsamples it to the original size.
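The centre-cropping operation can be sketched as follows (here `factor` halves each axis, a simplification for illustration; the paper specifies the kept square as a percentage of the k-space width):

```python
import numpy as np

def centered_downsample(image, factor=2):
    """Downsample by keeping only the centre square of k-space.

    Because all discarded values lie outside the centered mask, the
    retained block can be stored as a genuinely smaller image, which
    is what enables a fair comparison with bicubic interpolation back
    to the original size.
    """
    k = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    nh, nw = h // factor, w // factor
    top, left = (h - nh) // 2, (w - nw) // 2
    k_center = k[top:top + nh, left:left + nw]
    # divide by factor**2 so intensities stay comparable after the
    # inverse transform (ifft2 normalizes by the new, smaller size)
    return np.abs(np.fft.ifft2(np.fft.ifftshift(k_center))) / factor ** 2

image = np.random.default_rng(0).random((320, 320))
small = centered_downsample(image, factor=2)  # 160 x 160 output
```

The small image can then be upsampled back to \(320\times 320\) with bicubic interpolation for the comparison described above.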
Motion
For motion artefact correction, our three corresponding trained models (Motion, MT, and MT+\(\mathcal {L}_C\)) were compared to Total Variation denoising (TV), following the work in [49], available from the scikit-image [50] Python library. Performing a Bayesian optimization of the weighting parameter over the range \((0.01, 0.02, \ldots , 0.15)\), the best results were achieved with 0.05. We evaluated the methods for three percentages of the centre of the k-space left unaltered by the artefact augmentation: \(25\%\), \(12.5\%\), and \(5\%\). This means motion artefacts were not introduced in the centre \(25\%\) of the k-space in the first experiment, \(12.5\%\) in the second, and \(5\%\) in the last.
Noise
For noise removal, our three corresponding trained models (Noise, MT, and MT+\(\mathcal {L}_C\)) were compared to a curvature anisotropic diffusion denoising algorithm from the SimpleITK Python package^{Footnote 4}, with time step 0.0625 and 5 iterations, and to BM3D^{Footnote 5} [51] with hard thresholding and \(\sigma = 0.2\). We evaluated the methods for three SNR values: 5, 0, and \(-5\) dB.
Evaluating multitask learning
After evaluating the models on the individual tasks, the multitask approach was further evaluated on a dataset where multiple artefacts are present simultaneously. All artefact combinations were evaluated and demonstrated comparable results, hence only two particularly interesting combinations are presented. In the first scenario, we applied both subsampling and motion artefacts to the brain scans from the testing dataset. Although the origins of the two artefacts are very different, they both create a blurring effect and cause ringing artefacts, making them more difficult for the individual models to correct. The two artefacts were applied in a random order for each slice. We evaluated how applying the Subsampling and Motion models consecutively compared to the performance of the multitask models (MT and MT+\(\mathcal {L}_C\)).
In the second scenario, we applied noise and bias fields to the available brain scans of the testing dataset. Since we added complex Gaussian white noise, a model might always assume that the noise comes from this distribution. However, as the bias field adds a multiplicative corruption over the image, the distribution of the noise term changes. The two artefacts were applied in a random order for each slice. We evaluated how applying the Noise and Bias models consecutively compared to the performance of the multitask models (MT and MT+\(\mathcal {L}_C\)).
An inherent quality of the multitask models is that they output residuals with respect to each artefact correction. First, this makes it possible to disregard any of the corrections if not all artefacts are to be corrected. Second, the mean absolute value of the four outputs gives insight into how much each of the artefacts is corrected for. We collected the mean absolute values of the output residuals of the multitask models for both scenarios.
Qualitative evaluation
To explore the performance on real-world data with simultaneous artefacts, the model trained on the pelvic dataset using the multitask approach and the convolutional loss function was evaluated on the Gold Atlas dataset. We applied the trained model (Pelvis MT+\(\mathcal {L}_C\)) to scans from two patients.
Results
The Bayesian hyperparameter optimization found that all seven models achieved the best results using the RMSprop optimizer, with the optimal learning rates and \(\alpha\) values collected in Table 2.
The performance of the trained models for individual artefact correction is compared to other established methods. For bias field correction, the results are collected in Table 3. For kspace subsampling, the results can be found in Table 4, while for motion and noise correction, the results are collected in Tables 5 and 6, respectively.
This is followed by the evaluations of scenarios where multiple artefacts are present, with the results in Table 7. For each image correction, the mean absolute values of the four residual outputs were collected for the multitask models. For the first scenario, the values were [0.008, 0.032, 0.021, 0.004] and [0.005, 0.026, 0.010, 0.002] for the MT and MT+\(\mathcal {L}_C\) models respectively. For the second scenario, the values were [0.026, 0.002, 0.012, 0.058] and [0.010, 0.002, 0.006, 0.060] for the two models.
The performance of the Pelvis MT+\(\mathcal {L}_C\) model on two examples from the Gold Atlas dataset is visualized in Fig. 2.
Discussion
The Bayesian hyperparameter optimization shows that all models improve by using regularization, with the best values ranging between 0.48 and 0.89. Both models using \(\mathcal {L}_C\) favor a higher \(\alpha\), 0.87 and 0.89 for the brain and pelvic datasets, respectively.
Both models used a small scaling factor for the identity kernel, \(I_E\) (0.30 and 0.13, respectively), which means the MSE loss contributed only a small fraction of the final loss. The models favored the \(3\times 3\) top Sobel operator over the \(5\times 5\) version, while favoring the \(5\times 5\) version of the right Sobel operator over the \(3\times 3\) kernel. Similarly, the \(5\times 5\) Laplace kernel had a much larger contribution to the loss than the \(3\times 3\) kernel. The largest difference between the \(\delta\) values of the two models was for the top Prewitt operator, which had a much larger scaling factor for the MT+\(\mathcal {L}_C\) model. This might be because this model has a smaller \(\delta\) for the other top edge-detection kernels than the Pelvis MT+\(\mathcal {L}_C\) model. In general, out of the kernels performing similar tasks, only one contributed substantially to the loss, while all other \(\delta\) values were minimized.
Despite using training datasets of different anatomies, the two models implementing \(\mathcal {L}_C\), MT+\(\mathcal {L}_C\) and Pelvis MT+\(\mathcal {L}_C\), found similar \(\delta\) values to work best for the individual kernels. From these findings, we propose that the Bayesian optimization does not have to be repeated to find the optimal \(\mathcal {L}_C\); instead, the \(\delta\) values can be set to those presented here.
For bias field correction, the less pronounced fields (\(5\%\)) have a larger effect on SSIM than on VIF. The Bias and MT models achieve the best SSIM results but not the best VIF, while the ‘N4ITK’ method achieves the best VIF results but not the best SSIM; the MT+\(\mathcal {L}_C\) model performs well with regard to both evaluation metrics.
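A bias field augmentation of the kind discussed here can be sketched as multiplying the image by a smooth, slowly varying field. The low-order polynomial field and the `strength` parameter below are illustrative assumptions; the study generated its fields differently (random fields with a truncated power-law variogram [35]):

```python
import numpy as np

def apply_bias_field(image, strength=0.05, rng=None):
    """Multiply a 2D image by a smooth polynomial field in [1-strength, 1+strength]."""
    rng = np.random.default_rng(rng)
    h, w = image.shape
    y, x = np.mgrid[0:h, 0:w] / max(h, w)
    c = rng.uniform(-1, 1, size=6)  # random low-order polynomial coefficients
    field = c[0] + c[1] * x + c[2] * y + c[3] * x * y + c[4] * x**2 + c[5] * y**2
    field = 1.0 + strength * field / np.max(np.abs(field))  # normalize amplitude
    return image * field
```

A `strength` of 0.05 corresponds to the \(5\%\) fields evaluated above.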
For k-space subsampling, the trained models outperformed bicubic interpolation and the UniRes model with regard to VIF. The proposed models can handle a wider variety of subsampling masks than the ones evaluated here.
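The k-space subsampling augmentation can be illustrated with a minimal sketch: transform the image to k-space, retain only a central band of phase-encoding lines, and transform back. The rectangular central mask and `keep_fraction` parameter are simplifying assumptions, not the exact masks used in the study:

```python
import numpy as np

def subsample_kspace(image, keep_fraction=0.5):
    """Zero out peripheral k-space lines of a 2D image, keeping a central band."""
    k = np.fft.fftshift(np.fft.fft2(image))     # centered k-space
    h = image.shape[0]
    keep = int(h * keep_fraction) // 2          # half-width of the retained band
    center = h // 2
    mask = np.zeros(k.shape, dtype=bool)
    mask[center - keep:center + keep, :] = True
    return np.real(np.fft.ifft2(np.fft.ifftshift(k * mask)))
```

Discarding the outer lines removes high spatial frequencies, which blurs the image in the phase-encoding direction, mimicking a reduced-resolution acquisition.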
For motion correction, the smaller motions introduced only a small change with regard to SSIM, which could not be significantly improved by any of the methods. For larger motions, although the trained models achieved the best results with regard to SSIM, Total Variation denoising achieved the best VIF results. Looking at an example correction by the model in Fig. 1, we can see that the model removes a large amount of the ripple effects and retains the sharp edges of the clean image.
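The ripple effects discussed above arise when k-space lines acquired at different patient positions are mixed. A common way to simulate this, sketched below under the assumption of a single rigid shift (the study's motion model may differ), is to replace a random subset of k-space lines with those of a shifted copy of the image:

```python
import numpy as np

def simulate_motion(image, shift=4, corrupt_fraction=0.3, rng=None):
    """Replace a random subset of k-space lines with lines from a shifted image."""
    rng = np.random.default_rng(rng)
    moved = np.roll(image, shift, axis=0)       # rigid shift during 'acquisition'
    k_still = np.fft.fft2(image)
    k_moved = np.fft.fft2(moved)
    n_lines = image.shape[0]
    lines = rng.choice(n_lines, size=int(corrupt_fraction * n_lines), replace=False)
    k = k_still.copy()
    k[lines, :] = k_moved[lines, :]             # inconsistent phase-encoding lines
    return np.real(np.fft.ifft2(k))
```

The inconsistency between the two sets of lines produces the characteristic ghosting and ripple patterns in image space.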
While the models trained for correcting noise generally perform best with regard to SSIM, only the MT+\(\mathcal {L}_C\) model shows results comparable to C.A.D. and BM3D with regard to VIF.
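Noise in magnitude MR images is commonly modelled as Rician: complex Gaussian noise is added before taking the magnitude. A minimal sketch of such an augmentation, assuming this standard model rather than the study's exact noise settings:

```python
import numpy as np

def add_rician_noise(image, sigma=0.05, rng=None):
    """Add complex Gaussian noise to a magnitude image, then take the magnitude."""
    rng = np.random.default_rng(rng)
    real_noise = rng.normal(0.0, sigma, image.shape)
    imag_noise = rng.normal(0.0, sigma, image.shape)
    # Magnitude of (image + real_noise) + i * imag_noise is Rician-distributed.
    return np.sqrt((image + real_noise) ** 2 + imag_noise ** 2)
```

Unlike additive Gaussian noise, the Rician model raises the mean intensity in dark background regions, which is one reason noise estimators that assume homogeneous tissue intensity can be misled by other artefacts.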
Fig. 1 visualizes all augmented artefacts for an example image slice, together with their respective corrections by the MT+\(\mathcal {L}_C\) model. While the error maps of the motion and noise corrections display higher values compared to the original images, both examples still demonstrate an improvement in the SSIM and VIF results. In the case of the error maps for additive noise corrections, the blue contour surrounding the skull indicates a decrease in the pixel values relative to the original image. However, this difference was not perceptible upon visual inspection.
Evaluating the performance of \(\mathcal {L}_C\), we see that for all individual artefact correction tasks, the MT+\(\mathcal {L}_C\) model consistently reached a higher VIF score than the MT model.
Regarding the multitask aspect of the model, contrary to our hypothesis, the performance of the models trained to correct only a single artefact type did not decrease in the presence of another, previously unseen artefact. In fact, for both cases where two different artefacts were applied to an image (k-space subsampling and motion; bias and noise), a combination of two single-task models could achieve similar, or even better, performance than our multitask models. However, the order in which the models are applied changes the performance significantly. For the case of k-space subsampling and motion, applying the Motion model first and the Subsampling model second achieved the best VIF results, whereas applying the models in the opposite order achieved significantly worse results than our multitask models. For the other scenario, applying the Bias model first and then the Noise model yielded significantly better results than the opposite order. A possible reason for the superior results of combining two models, compared to the multitask model, stems from the complexity of the two approaches. Since all models use the same network architecture, the combination of two models has double the number of model parameters compared to the proposed multitask solution. However, the large difference in performance when changing the order of the two models in the sequential approach also suggests that increasing the complexity of the approach increases not only the performance but also the sensitivity.
The mean absolute values of the residuals show that the largest corrections were indeed added to the artefacts that were present in the images.
In the second scenario, the Noise model seems especially sensitive to bias, which could be explained by the model assuming a homogeneous pixel intensity within the tissues to estimate the noise.
The MT+\(\mathcal {L}_C\) model significantly outperformed MT in both scenarios, with regards to both evaluation metrics.
The example corrections of pelvic scans in Fig. 2 show that the residuals correspond well to the artefacts they are meant to capture. The bias term is generally smooth and slowly varying, while the subsampling term is negligible since the input image was already of full resolution (all residuals are plotted using the same scale). The motion term shows ripple effects that are corrected for in the reconstructed output.
The second example illustrates how the model removes bias and noise from the image slices to return a cleaner image, and it also shows that the axial, slice-wise corrections do not introduce artefacts when viewed from an orthogonal angle.
Conclusions
We have developed and trained two multitask deep learning models, using our novel loss function, for artefact correction in brain and pelvic MR data. The approach provides more robust results than using a combination of machine learning solutions that each correct only a single artefact, and it performs all four tasks with similar or superior performance to other, established methods.
Our work included implementing the augmentation of four of the most typical artefacts in MR data: bias fields, subsampled k-space, motion, and noise. The model simultaneously corrects all four artefact types, without any decrease in performance compared to models trained to correct only one artefact type.
The multitask approach also proved more robust in the scenario of multiple simultaneous artefacts, compared to applying the individual models sequentially.
Our proposed convolutional loss function introduces the differences in the image gradients into the computed loss, through the use of image convolutions. The introduction of the Laplace operator and of at least one top and one right edge detection kernel improved the performance of the model for artefact correction in all cases with regard to VIF. The thorough evaluation of the model trained on brain scans showed that it outperforms several alternative methods for the correction of individual artefacts.
In addition, our example corrections on pelvic scans indicated that the performance of the model translates well to real data despite being trained on augmented artefacts.
To improve the reproducibility of our approach, we have made all trained models and source code openly accessible.
Availability of data and materials
The fastMRI dataset [29, 30] (https://fastmri.med.nyu.edu/) is publicly available.
For evaluating the pelvic model, we have used the publicly available Gold Atlas dataset [31] (https://zenodo.org/record/583096).
The source code used for the conclusions of this article is available on GitHub (https://github.com/attilasimko/mriac). The trained models are also available in Hero Imaging (https://heroimaging.com), in its “AI Model Zoo” collection.
We used TensorFlow with two RTX 3090 and four 2080 Ti GPUs for training.
References
McRobbie DW, Moore EA, Graves MJ. MRI from picture to proton. Cambridge: Cambridge University Press; 2007.
Zaitsev M, Maclaren J, Herbst M. Motion artifacts in MRI: A complex problem with many partial solutions. J Magn Reson Imaging. 2015;42(4):887–901. https://doi.org/10.1002/jmri.24850.
Laves MH, Tölle M, Ortmaier T. Uncertainty Estimation in Medical Image Denoising with Bayesian Deep Image Prior. Lect Notes Comput Sci (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2020;12443 LNCS:81–96. https://doi.org/10.1007/978-3-030-60365-6_9.
Wang Z, Chen J, Hoi SCH. Deep Learning for Image Super-Resolution: A Survey. IEEE Trans Pattern Anal Mach Intell. 2021;43(10):3365–87. https://doi.org/10.1109/TPAMI.2020.2982166.
Terpstra ML, Maspero M, Bruijnen T, Verhoeff JJC, Lagendijk JJW, van den Berg CAT. Real-time 3D motion estimation from undersampled MRI using multi-resolution neural networks. Med Phys. 2021;48(11):6597–613. https://doi.org/10.1002/mp.15217.
Liu Y, Wang Y, Li N, Cheng X, Zhang Y, Huang Y, et al. An Attention-Based Approach for Single Image Super-Resolution. Proc Int Conf Pattern Recogn. 2018;2018-August:2777–84. https://doi.org/10.1109/ICPR.2018.8545760.
Duffy BA, Zhao L, Sepehrband F, Min J, Wang DJ, Shi Y, et al. Retrospective motion artifact correction of structural MRI images using deep learning improves the quality of cortical surface reconstructions. NeuroImage. 2021;230:117756. https://doi.org/10.1016/j.neuroimage.2021.117756.
Chen Y, Christodoulou AG, Zhou Z, Shi F, Xie Y, Li D. MRI Super-Resolution with GAN and 3D Multi-Level DenseNet: Smaller, Faster, and Better. Med Image Anal. 2020. arXiv:2003.01217.
Ramsundar B, Kearnes S, Riley P, Webster D, Konerding D, Pande V. Massively Multitask Networks for Drug Discovery. arXiv preprint. 2015. arXiv:1502.02072.
Eslami M, Tabarestani S, Adjouadi M. Feasibility Assessment of Multitasking in MRI Neuroimaging Analysis: Tissue Segmentation, Cross-Modality Conversion and Bias correction. arXiv e-prints. 2021. arXiv:2105.14986.
Singh NM, Iglesias JE. Joint Frequency and Image Space Learning for MRI Reconstruction and Analysis. J Mach Learn Biomed Imaging. 2022;018:1–28.
Lee S, Jung S, Jung KJ, Kim DH. Deep Learning in MR Motion Correction: a Brief Review and a New Motion Simulation Tool (view2Dmotion). Investigative Magn Reson Imaging. 2020;24(4):196. https://doi.org/10.13104/imri.2020.24.4.196.
Muckley MJ, Riemenschneider B, Radmanesh A, Kim S, Jeong G, Ko J, et al. Results of the 2020 fastMRI Challenge for Machine Learning MR Image Reconstruction. IEEE Trans Med Imaging. 2021;40(9):2306–17. https://doi.org/10.1109/TMI.2021.3075856.
Simko AT, Löfstedt T, Garpebring A, Bylund M, Nyholm T, Jonsson J. Changing the Contrast of Magnetic Resonance Imaging Signals using Deep Learning. Proceedings of the Fourth Conference on Medical Imaging with Deep Learning. 2021;143:713–27.
Hyun CM, Kim HP, Lee SM, Lee S, Seo JK. Deep learning for undersampled MRI reconstruction. Phys Med Biol. 2018;63(13):135007. https://doi.org/10.1088/1361-6560/aac71a.
Han Y, Sunwoo L, Ye JC. k-Space Deep Learning for Accelerated MRI. IEEE Trans Med Imaging. 2020;39(2):377–86. https://doi.org/10.1109/TMI.2019.2927101.
Eo T, Jun Y, Kim T, Jang J, Lee HJ, Hwang D. KIKI-net: cross-domain convolutional neural networks for reconstructing undersampled magnetic resonance images. Magn Reson Med. 2018;80(5):2188–201. https://doi.org/10.1002/mrm.27201.
Lim B, Son S, Kim H, Nah S, Lee KM. Enhanced Deep Residual Networks for Single Image Super-Resolution. IEEE Comput Soc Conf Comput Vis Pattern Recogn Workshops. 2017;2017-July:1132–40. https://doi.org/10.1109/CVPRW.2017.151.
Mason A, Rioux J, Clarke SE, Costa A, Schmidt M, Keough V, et al. Comparison of Objective Image Quality Metrics to Expert Radiologists’ Scoring of Diagnostic Quality of MR Images. IEEE Trans Med Imaging. 2020;39(4):1064–72. https://doi.org/10.1109/TMI.2019.2930338.
Sommer K, Saalbach A, Brosch T, Hall C, Cross NM, Andre JB. Correction of motion artifacts using a multiscale fully convolutional neural network. Am J Neuroradiol. 2020;41(3):416–23. https://doi.org/10.3174/ajnr.A6436.
Kervadec H, Bahig H, Letourneau-Guillon L, Dolz J, Ayed IB. Beyond pixel-wise supervision for segmentation: A few global shape descriptors might be surprisingly good! Proc Mach Learn Res. 2021;1–16. arXiv:2105.00859.
Zhang R, Isola P, Efros AA, Shechtman E, Wang O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. Proc IEEE Comput Soc Conf Comput Vis Pattern Recogn. 2018;2018(1):586–95. https://doi.org/10.1109/CVPR.2018.00068.
Boulanger M, Nunes JC, Chourak H, Largent A, Tahri S, Acosta O, et al. Deep learning methods to generate synthetic CT from MRI in radiotherapy: A literature review. Phys Med. 2021;89(July):265–81. https://doi.org/10.1016/j.ejmp.2021.07.027.
Saxena D, Cao J. Generative Adversarial Networks (GANs): Challenges, Solutions, and Future Directions. arXiv. 2020. arXiv:2005.00065.
Chu J, Liu J, Qiao J, Wang X, Li Y. Gradient-based adaptive interpolation in super-resolution image restoration. Int Conf Signal Process Proc ICSP. 2008;2008(1):1027–30. https://doi.org/10.1109/ICOSP.2008.4697303.
Abrahamyan L, Truong AM, Philips W, Deligiannis N. Gradient Variance Loss for Structure-Enhanced Image Super-Resolution. ICASSP IEEE Int Conf Acoust Speech Signal Process Proc. 2022;2022-May:3219–23. https://doi.org/10.1109/ICASSP43922.2022.9747387.
Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. CVPR. 2017;2(3):4.
Singh B, Najibi M, Davis LS. SNIPER: Efficient multi-scale training. Adv Neural Inf Process Syst. 2018;2018-December:9310–20. arXiv:1805.09300.
Knoll F, Zbontar J, Sriram A, Muckley MJ, Bruno M, Defazio A, et al. fastMRI: A publicly available raw k-space and DICOM dataset of knee images for accelerated MR image reconstruction using machine learning. Radiol Artif Intell. 2020;2(1):e190007. https://doi.org/10.1148/ryai.2020190007.
Zbontar J, Knoll F, Sriram A, Murrell T, Huang Z, Muckley MJ, et al. fastMRI: An Open Dataset and Benchmarks for Accelerated MRI. arXiv. 2018;1–35. arXiv:1811.08839.
Nyholm T, Svensson S, Andersson S, Jonsson J, Sohlin M, Gustafsson C, et al. MR and CT data with multiobserver delineations of organs in the pelvic area - Part of the Gold Atlas project. Med Phys. 2018;45(3):1295–300. https://doi.org/10.1002/mp.12748.
Belaroussi B, Milles J, Carme S, Zhu YM, Benoit-Cattin H. Intensity non-uniformity correction in MRI: Existing methods and their validation. Med Image Anal. 2006;10(2):234–46. https://doi.org/10.1016/j.media.2005.09.004.
Sled JG, Zijdenbos AP, Evans AC. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans Med Imaging. 1998;17(1):87–97. https://doi.org/10.1109/42.668698.
Meyer CR, Bland PH, Pipe J. Retrospective correction of intensity inhomogeneities in MRI. IEEE Trans Med Imaging. 1995;14(1):36–41. https://doi.org/10.1109/42.370400.
Sled JG, Zijdenbos AP, Evans AC. A comparison of retrospective intensity nonuniformity correction methods for MRI. Lect Notes Comput Sci (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 1997;1230:459–64. https://doi.org/10.1007/3-540-63046-5_43.
Simkó A, Löfstedt T, Garpebring A, Nyholm T, Jonsson J. MRI bias field correction with an implicitly trained CNN. Proc Mach Learn Res, Under Rev. 2022;1–14. https://doi.org/10.5281/zenodo.3749526.
Heße F, Prykhodko V, Schlüter S, Attinger S. Generating random fields with a truncated power-law variogram: A comparison of several numerical methods. Environ Model Softw. 2014;55:32–48. https://doi.org/10.1016/j.envsoft.2014.01.013.
Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA, et al. N4ITK: Improved N3 bias correction. IEEE Trans Med Imaging. 2010;29(6):1310–20. https://doi.org/10.1109/TMI.2010.2046908.
Tamada D, Kromrey ML, Ichikawa S, Onishi H, Motosugi U. Motion artifact reduction using a convolutional neural network for dynamic contrast-enhanced MR imaging of the liver. Magn Reson Med Sci. 2020;19(1):64–76. https://doi.org/10.2463/mrms.mp.2018-0156.
Cohen O, Rosen MS. Algorithm comparison for schedule optimization in MR fingerprinting. Magn Reson Imaging. 2017;41:15–21. https://doi.org/10.1016/j.mri.2017.02.010.
Zhang Y, Yang Q. A Survey on Multi-Task Learning. IEEE Trans Knowl Data Eng. 2022;34(12):5586–609. https://doi.org/10.1109/TKDE.2021.3070203.
Sheikh HR, Bovik AC. Image information and visual quality. ICASSP IEEE Int Conf Acoust Speech Signal Process Proc. 2004;3(2):430–44. https://doi.org/10.1109/icassp.2004.1326643.
Demšar J. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res. 2006;7:1–30.
Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. Adv Neural Inf Process Syst. 2012;4:2951–9. arXiv:1206.2944.
Kingma DP, Ba JL. Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings. 2015. p. 1–15. arXiv:1412.6980.
Tieleman T, Hinton G. Divide the gradient by a running average of its recent magnitude. Human Mach Hear. 2012;4(2):26–31.
Brudfors M, Nachev P. MRI Super-Resolution using Multi-Channel Total Variation. MIUA 2018. 2018;3(1):1–12.
Brudfors M, Balbastre Y, Nachev P, Ashburner J. A Tool for Super-Resolving Multimodal Clinical MRI. arXiv e-prints. 2019. arXiv:1909.01140.
Küstner T, Armanious K, Yang J, Yang B, Schick F, Gatidis S. Retrospective correction of motion-affected MR images using deep learning frameworks. Magn Reson Med. 2019;82(4):1527–40. https://doi.org/10.1002/mrm.27783.
Van Der Walt S, Schönberger JL, Nunez-Iglesias J, Boulogne F, Warner JD, Yager N, et al. Scikit-image: Image processing in Python. PeerJ. 2014;2014(1):1–18. https://doi.org/10.7717/peerj.453.
Makinen Y, Azzari L, Foi A. Collaborative Filtering of Correlated Noise: Exact Transform-Domain Variance for Improved Shrinkage and Patch Matching. IEEE Trans Image Process. 2020;29:8339–54. https://doi.org/10.1109/TIP.2020.3014721.
Acknowledgements
Not applicable.
Funding
Open access funding provided by Umeå University. We are grateful for the financial support obtained from the Cancer Research Foundation in Northern Sweden (LP 182182, AMP 18912, AMP 201014, LP 222319), the Västerbotten regional county, and from Karin and Krister Olsson. The computations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at the High Performance Computing Center North (HPC2N) in Umeå, Sweden, partially funded by the Swedish Research Council through grant agreement no. 2018-05973.
Author information
Authors and Affiliations
Contributions
A.S., S.R., T.L., A.G., T.N., M.B., and J.J. have all contributed to designing the methods and experiments presented in the paper. A.S. and S.R. wrote the main manuscript text. A.S. prepared the tables and figures. M.B. organized the inhouse pelvic dataset. A.S., S.R., T.L., A.G., T.N., M.B., and J.J. have all reviewed the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
The pelvic model was trained and validated on an in-house dataset. The scans were captured using a GE Signa 3T PET/MR (GE Healthcare, Chicago, Illinois, United States) at the University Hospital of Umeå, Sweden. All experimental protocols were approved by the Swedish Ethical Review Authority (ethical approval Dnr: 2019-02666), and all methods were carried out in accordance with relevant guidelines and regulations. All participants were adults and their informed consent was obtained.
Consent for publication
Not applicable.
Competing interests
The qualitative evaluations were performed using Hero Imaging, NONPI Medical AB, where Attila Simkó is an employee and Tommy Löfstedt, Anders Garpebring, Tufve Nyholm, and Joakim Jonsson are coowners. Simone Ruiter and Mikael Bylund declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Simkó, A., Ruiter, S., Löfstedt, T. et al. Improving MR image quality with a multitask model, using convolutional losses. BMC Med Imaging 23, 148 (2023). https://doi.org/10.1186/s1288002301109z