Towards retrospective motion correction and reconstruction for clinical 3D brain MRI protocols with a reference contrast

Motion artifacts often spoil the radiological interpretation of MR images, and in the most severe cases the scan needs be repeated, with additional costs for the provider. We discuss the application of a novel 3D retrospective rigid motion correction and reconstruction scheme for MRI, which leverages multiple scans contained in a MR session. Typically, in a multi-contrast MR session, motion does not equally affect all the scans, and some motion-free scans are generally available, so that we can exploit their anatomic similarity. The uncorrupted scan is used as a reference in a generalized rigid-motion registration problem to remove the motion artifacts affecting the corrupted scans. We discuss the potential of the proposed algorithm with a prospective in-vivo study and clinical 3D brain protocols. This framework can be easily incorporated into the existing clinical practice with no disruption to the conventional workflow.


Introduction
Magnetic resonance imaging (MRI) is fundamentally prone to motion artifacts, since the data acquisition process usually lasts several minutes for each acquired contrast, and the MR exam can be an uncomfortable experience for the patient.Motion corruption impedes a correct radiological assessment, which then may require a scan repetition, leading to considerable waste of resources for the hospital [Andre et al., 2015].
Motion reduction strategies are broadly classified as preventive, prospective, or retrospective techniques [Zaitsev et al., 2015, Godenschweger et al., 2016].Preventive strategies include physical devices to limit the motion (e.g. head holders) or sedation, but their application is limited by ethical or health considerations, and are often ineffective in eliminating patient movement.Prospective and retrospective strategies, on the other hand, directly or indirectly estimate the motion that the object of interest undergoes inside the scanner, and remove its effect from the data or in the reconstruction phase.This correction step is said to be applied "prospectively" [Maclaren et al., 2013] when the position of the patient is tracked in real time and the scan settings are adjusted accordingly on-the-fly.For example, the relative change of position can be estimated by acquiring additional k-space or image-space navigators [Ehman andFelmlee, 1989b, Welch et al., 2002], or with "self-navigating" sequences [Pipe, 1999, Welch et al., 2004, Bookwalter et al., 2010].Alternatively, camera devices or markers [Zaitsev et al., 2006, Forman et al., 2011] can be used to estimate the imaging object position.However, most tracking modalities are often defective in terms of either precision, patient interaction, or sequence independence [Maclaren et al., 2013].Therefore, although effective in many respects, prospective methods have somewhat limited range of application.
Retrospective algorithms are characterized by the removal of motion artifacts in the final reconstruction phase, after the data acquisition.The main advantage of retrospective schemes is in their flexibility, since they do not necessarily require additional hardware, scanner modifications, MR navigators, markers, and so on.Note, however, that they may benefit from using prior information about the target imaging object and motion pattern.One main challenge for this class of methods is the need for time-intensive computations.The scientific literature on retrospective motion correction is quite rich: examples of retrospective techniques for rigid motion using navigators or markers can be found in Ehman and Felmlee [1989a], Korin et al. [1995], Mendes et al. [2009], Bookwalter et al. [2010], Vaillant et al. [2014], while examples of "blind" techniques (in this context, meaning that they are not using navigators or markers) are presented in Atkinson et al. [1997Atkinson et al. [ , 1999]], Manduca et al. [2000], Lin and Song [2006], Loktyushin et al. [2013].
Retrospective correction schemes are typically formulated as a bi-level optimization problem, where two types of unknown are jointly estimated: the reconstructed (2D/3D) image and the motion parameters.Due to the ill-posedness of the problem here considered, the choice of the regularization method is crucial: see, for example, gradient-entropy regularization in Manduca et al. [2000], Lin and Song [2006], Loktyushin et al. [2013], sparsity regularization in Möller et al. [2015], or iteratively re-weighted least-squares regularization in Cordero-Grande et al. [2020].Another strategy to ease the ill-posedness is to resort to special acquisition patterns in k-space that are more robust in terms of motion correction, as described in the DISORDER method in Cordero-Grande et al. [2020].Alternatively, many machine-learning approaches have been recently proposed for retrospective motion correction [Pawar et al., 2018, Küstner et al., 2019, Haskell et al., 2019, Lee et al., 2020, Ghaffari et al., 2021, Lee et al., 2021, Hossbach et al., 2022].Some previous work in Rizzuti et al. [2022] introduced a retrospective motion correction scheme, whose novel aspect is the use of a contrast free of motion artifacts that can be leveraged as a reference to remove motion effects from any other contrast from the same patient, akin to a generalized rigid motion registration.The chief assumption of this work is the following: in a multicontrast MR session, motion does not typically affect all the scans and some motion-free scans are generally available, so that we can exploit their anatomic similarity.Structural similarity is technically achieved via structure-guided total variation (TV), as originally proposed in Ehrhardt and Betcke [2016] and further developed in Bungert and Ehrhardt [2020] (see also Ehrhardt et al. [2015]).
The goal of this paper is to extend the scope of Rizzuti et al. [2022], limited to 2D synthetic results, to general 3D randomized acquisitions and 3D rigid-motion correction.We experimentally verify that a 3D extension is indeed feasible for brain imaging.We do not assume data-driven priors (so that machine learning is not available), any additional navigator data, nor consider motion-resilient acquisition schemes, in order to conform to more broadly available clinical protocols.Note that the proposed method can employ any acquisition scheme, in principle, but we stick to Cartesian acquisition, which are the standard encoding strategies of clinical protocols.Since we focus on brain imaging, rigid motion can be effectively assumed for our scope.The reference and the corrupted contrast do not need be co-registered or acquired with the same resolution.
We thoroughly validate the method with a prospective in-vivo study based on three volunteers and several motion types.The strength and limitations of the method are highlighted with the comparison of correction quality with varying degrees of motion artifacts and contrast type as a reference prior.

Theory
In this section, we present the basic mathematical formulation underpinning the proposed motion correction method (further details can be found in Rizzuti et al. [2022]).
The contrast volume, in the remainder of this section, will be denoted by u ∈ C nx , where n x is the number of voxels contained in a rectangular field of view.The 3D image undergoes a time-dependent rigid motion where t is a time-related label.In practice, t corresponds to the index of the kspace readout line in the phase-encoding plane.The corresponding rigid transformation is given by T θ θ θt , and is parameterized by a time-dependent motion parameter θ θ θ t ∈ R 6 , which includes translations and rotations in 3D: The rigid motion consists of a 3D rotation (defined by the 2D rotation angles ϕ xy , ϕ xz , ϕ yz , performed in this order in the corresponding planes) followed by a translation (governed by the translation parameters τ x , τ y , τ z ).Without loss of generality, we are assuming a Cartesian acquisition.At each given time t, the MR acquisition process corresponds to the evaluation of the Fourier transform F of u t in a particular subset K t of the k-space.In practice, the acquisition is structured in such a way that all the subsets K t consist of parallel lines in the k-space (the common direction being the readout direction).We refer to the Fourier transform of a rigidly moving object u θ θ θ := T θ θ θ u as the perturbed Fourier transform F θ θ θ u := Fu θ θ θ , and can be directly characterized as where the rotational operator with respect to the 3D angle ϕ ϕ ϕ is indicated by R ϕ ϕ ϕ .This definition is motivated by classical Fourier identities that describe the action of rigid motion under the Fourier transform.Due to rotational effects, one must resort to the non-uniform discrete Fourier transform (NUFFT) to evaluate equation (3) [Barnett et al., 2019, Barnett, 2021].Note that we implicitly assumed that no motion occurs while sampling the elements of K t , since the state of the object at the time t is associated to a single motion parameter θ θ θ t .The assumption is motivated by the fact that K t will correspond, in practice, to a single Cartesian readout line, which lasts few milliseconds.Hence, the data acquisition at time t is symbolized by the application of the selection operator S t to the Fourier-transformed volume: Here, n r is the number of k-space samples in a single readout.
The resulting inverse problem can be cast as an optimization problem over the reconstruction unknowns u and the motion parameters θ θ θ t , that is: where θ θ θ 1:nt = (θ θ θ 1 , . . ., θ θ θ nt ), and n t is the number of time steps.The weighting parameters λ, µ (both positive numbers) set the strength of the corresponding regularization terms.The first term of the objective functional in equation ( 5) corresponds to the data misfit: The least-squares norm is indicated here by • .The regularization terms g u and g θ are crucial in ensuring the well-posedness of the problem.Indeed, the objective in equation ( 6) will be sensitive to the relatively high signal-to-noise ratios (SNR) of the high-frequency components of the data.Moreover, the objective is highly non-convex as a function of θ θ θ 1:nt .The motion-parameter regularization is designed to ensure some form of regularity in time (e.g.smoothness), this can be achieved for example by setting Alternatively, higher-order derivatives may be used.Another strategy, adopted in this paper, is to impose smoothness by setting hard constraints for the motion parameters, rather than via an additive penalty term as in equation ( 7) [Rizzuti et al., 2022].

Reference-guided total variation regularization
The crux of the proposed method is related to the choice of the regularization term g u in equation ( 5).We adopt the structure-guided total variation scheme proposed in Ehrhardt and Betcke [2016] in the context of multi-contrast imaging, that is: where I 3 is the 3 × 3 identity matrix, ∇•| x is the discretized gradient operator evaluated at the voxel with center x, and Π v | x is the projection operator on the linear space that is orthogonal to the vector ξ v | x ∈ C 3 .The symbol H indicates the adjoint operation.The vector ξ v | x corresponds to the normalized gradient of a given motion-free contrast v, e.g.
The actual definition is for some constant η > 0. The regularization term in equation ( 8) enforces the gradient structure of v onto u, when v and u are anatomically compatible.It is important to observe that v is not required to be registered with the target contrast u, since the estimation of the motion parameters in equation ( 5) will automatically compensate for the initial misalignment [see also Bungert and Ehrhardt, 2020].In this work, we actually adopt a constrained formulation of equation ( 8), meaning that structural similarity is imposed by forcing the solution to belong to the constraint set C u = {u : g u (u) ≤ ε}, where ε > 0 is a prescribed regularization level [see Rizzuti et al., 2022, Peters et al., 2018, for more details].

Optimization
In order to solve equation ( 5), we adopt an alternating update scheme based on the proximal alternating minimization algorithm (PALM) described in Bolte et al. [2014].The algorithm of the optimization strategy is exemplified in Algorithm 1.Each update requires the linearization of the smooth objective f and the application of the proximal operators associated to g u and g θ .As it is commonly noted in the image registration literature, we will make use of multi-scale methods to ease the ill-posedness of the problem.Two types of scale are considered, here.One is traditionally associated to the reconstruction grid size, by considering a sequence of optimization problems defined on progressively finer grids.Note that spatial coarsening of the reconstructed image u is intertwined with the temporal coarsening of the motion parameters θ θ θ 1:nt , since they are associated to sample locations in the k-space.The other scale is related to the regularization strength λ, as defined in equation ( 5).Hence, strongly weighted problems are solved first, and the regularization is gradually relaxed as in a continuation strategy.Overall, two nested sequences of optimization problems are considered here.
Algorithm 1 Joint motion correction and reconstruction with alternating proximal operator evaluation Motion parameter proxy end for end for Upscaling of u, θ θ θ end for

Experiments
In this section, we set up several experiments that demonstrate the capabilities of the retrospective motion correction algorithm detailed in Section 2, whose main novel aspect and strength is the use of a reference contrast to guide the correction.Our objective is to tackle motion correction for brain imaging, and we focus on acquisition protocols that are relevant for the clinical practice.All the imaging sequences considered in this study were taken from actual clinical brain protocols of the Radiology and Radiotherapy departments of the UMC Utrecht.The data considered in this section is based on 3D Cartesian acquisition.The sampling pattern used in these acquisitions typically utilizes pseudo-random undersampling.The main assumptions underlying the proposed method are related to the availability of a motion-free reference contrast and the motion artifacts being produced by rigid motion.
We consider several studies with volunteer data (three volunteers in total to actively move during the scan (a certain number of times).While we did not track the type of rigid motion produced by the volunteers, we prompted them to maintain the same position in between our instructions.In this way, we have some fair qualitative expectations about the motion estimated by the correction algorithm (that is, a stepwise behavior, see also Appendix C for the estimated motion unknowns associated to each of the experiments described in this section).The 'ground-truth' acquisition and reconstruction is obtained by simply asking the volunteers not to move.
The volunteer studies aim at investigating several relevant questions related to the application of the proposed retrospective motion correction technique.The first study in Section 3.1 is a qualitative assessment of the robustness of the motion correction with respect to motion complexity, here equated to the number of volunteer poses during the scan.In Section 3.2, we demonstrate that many combinations of corrupted-contrast and reference-contrast types are possible for adequate correction.In the experiment in Section 3.3, we ascertain whether using the scanner reconstruction in the DICOM format, as opposed to the raw k-space data, is suited as input data for the algorithm after applying the Fourier transform (e.g., d in equation 6).We note that the proposed method assumes coil-resolved data as input for computational reasons, therefore it is sensitive to how the raw k-space data is post-processed, and, in particular, to the degree of which the post-processed data can be adequately corrected by rigid-motion estimation.Finally, further experimentation is deferred to the supplemental section in Appendix B, where we demonstrate the effectiveness of the reference-based motion correction against a "blind" motion correction method, which does not use a reference contrast to eliminate the motion artifacts.
All the following investigations use a 1.5 T Philips Ingenia scanner with a 15-channel head coil.We considered several contrast acquisition sequences with the specifications highlighted in Table 1.For all the experiments except the one described in Section 3.3, the raw k-space data (pertaining to corrupted or ground-truth scans) was exported for off-line processing.A pre-processing step is dedicated to remove coil dependency, by performing a SENSE reconstruction and then applying the Fourier transform.

Experiment 1: robustness with respect to motion complexity
In order to test the robustness of the proposed motion correction scheme in terms of motion complexity, we instruct volunteer 1 to move multiple times during acquisition.With "motion complexity" we specifically refer to the number of position changes performed by the volunteer within one prospectively corrupted scan.The goal of this in-vivo study is to provide a qualitative assessment of the degradation of the reconstruction quality as a function of motion complexity.
We consider three levels of motion corruption: (i ) the volunteer moves once, (ii ) the volunteer moves twice, and (iii ) the volunteer moves five times.The volunteer is instructed to change its head position every time it is prompted to do so, and maintain that position in between instructions.We use T2-FLAIRweighted contrasts as corrupted scans, with T1-weighted contrast as a reference (see Table 1 for further details).The corrupted acquisition employs randomized sampling.
The results of this experiment are collected in Section 4.1.Note that, in Appendix B, we use the same settings detailed in this experiment to compare the proposed algorithm with a baseline method without a reference guide.

Experiment 2: on the choice of the reference contrast
This in-vivo experiment tests the proposed correction scheme with respect to a different combination of corrupted and reference contrast, namely a T1-weighted corrupted contrast with a T2-weighted reference contrast (see Table 1).For this experiment, we prompt volunteer 2 to move five times during the acquisition.The corrupted acquisition employs randomized sampling.
In Section 4.2, we gather the results for this experiment.

Experiment 3: scanner reconstruction vs processed raw k-space data as input for retrospective motion correction
With the in-vivo studies presented in this section, we investigate a question related to the nature of the input data d (equation 6) required by the algorithm.Due to the formulation of the problem directly in k-space (by means of the NUFFT), the method assumes coil-resolved data.One must then assess whether the scanner reconstruction (available in the DICOM format) is suitable for this purpose, since many different reconstruction methods are available depending on the acquisition protocol.In particular, the default reconstruction method for linear-filling patterns in k-space employs the SENSE framework [Pruessmann et al., 1999], while compressed-sensing reconstruction (via the wavelet transform) is used for randomized acquisitions [Lustig et al., 2007].Note that our experimentation suggests that without the phase map of the scanner reconstruction our motion correction scheme does not perform adequately.
Therefore, with "scanner reconstruction", we will always refer to the complexvalued scanner reconstruction (comprising both the respective amplitude and phase).
In the first experiment, we asked volunteer 3 to change position once during the prospectively-corrupted acquisition.We consider a corrupted T2-weighted contrast and a reference T1-weighted contrast (see Table 1).One important aspect of this experiment is related to the acquisition protocol of the T2-weighted contrast, based on a linear-filling pattern in k-space.The corrupted data used as input for the proposed motion-correction algorithm is obtained by exporting the reconstructed volume directly from the scanner, followed by a simple Fourier transform.Note that this 3D image has been obtained by a SENSE reconstruction.
The second experiment is set up similarly to the previous one.We asked volunteer 3 to change position only once during the acquisition phase.We consider, now, a corrupted T2-FLAIR-weighted contrast with a reference T1weighted contrast (see Table 1).The most important difference with the previous experiment, besides the type of contrast pair considered, is related to the randomized acquisition protocol.In this case, the scanner reconstruction employs a compressed-sensing reconstruction, and is not suited as input for the proposed motion-correction algorithm (see Appendix A).Therefore, for adequate motion correction, we must set up an intermediate step for processing the raw k-space data via the SENSE reconstruction.
We further discuss the results of this experiment in Section 4.3.

Results
In this section, we display and briefly analyze the results of the experiments presented in the previous section.For ease of exposition, we organized the power signal-to-noise ratio (PSNR) and structural similarity index (SSIM) values of the reconstructions (with respect to a known ground truth) in Table 2.The motion-corrected full-volume scans were analyzed by a neuroradiologist with 16 years of experience.These were generally deemed of good radiological quality.The motion-related artifacts have been completely removed, and the results are quite close to the ground truth.In Table 3, we organized a more detailed qualitative analysis of the 3D results, geared toward a radiological assessment of the corrected scans.

Experiment 1: robustness test
We gather the results for the robustness test described in Section 3.1 (volunteer 1) in Figures 1, 2, and 3 for motion corruption mechanisms associated to one, two, and five changes of position, respectively.Furthermore, we juxtapose the corrected images with varying degrees of corruption in Figure 4. We observe that the proposed method consistently ameliorates the corrupted scan.The quality   indexes based on PSNR and SSIM show only a modest decrease in correction quality as a function of motion complexity (Figure 4).

Experiment 2: choice of the reference contrast
With the experiment described in Section 3.2, we demonstrate the flexibility of the correction scheme with respect to the choice of the reference contrast.
The results are shown in Figure 5. Contrary to the experiments detailed in the previous section, we are now considering a T2-weighted reference contrast to guide the correction of a T1-weighted corrupted contrast.The quality of the correction indicates that the proposed technique is rather flexible in terms of reference contrast.

Experiment 3: scanner reconstruction vs raw k-space data
The results of the two experiments described in Section 3.3 are depicted in Figures 6 and 7.The main difference between the two experiments is related to the input data for the proposed motion-correction algorithm.
In the first experiment, the corrupted contrast has been acquired with a protocol based on a linear filling pattern in k-space.Note that, in this particular case, the scanner reconstruction implements the SENSE method.We then extracted the DICOM of both amplitude and phase produced by the scanner, and used it as input data (after a Fourier transform) for the algorithm.The proposed scheme is able to successfully remove the motion artifacts in Figure 6.
In the case of randomized sampling, the scanner reconstruction is not adequate as input data for the proposed motion-correction algorithm, because it employs a compressed-sensing algorithm.We speculate that compressed-sensing reconstructions degrade the information contained in the corrupted volume, and the corrected contrast cannot be effectively recovered by simply removing rigid-motion artifacts (we defer the degraded results when using scanner reconstruction data in Appendix A).However, when the input data is obtained by directly processing the raw k-space data via the SENSE reconstruction, the motion-correction scheme is able to successfully remove the motion artifacts (Figure 7).

Discussion
Reference-guided TV regularization substantially improves the motion correction quality, both visually and in terms of quality metrics based on PSNR and SSIM, when compared to basic Fourier reconstruction without motion correction.The comparison is also substantially favorable with standard "blind" motion correction techniques, for example based on conventional regularization such as TV, which do not employ a reference to guide the correction (see Appendix B).In fact, for randomized sampling patterns that are now common in the clinical practice, we verified that blind retrospective techniques are wholly inadequate for motion correction of radiological quality (cf. the comparison in Appendix B, Figure 9).
Our experimentation based on volunteer data aimed at assessing the robustness of the correction quality with respect to motion artifacts of increasing complexity.In this study, we equated this complexity to the number of volunteer changes of pose during the acquisition phase.Clearly, this does not fully describe the complexity of motion encountered in practice in the clinic, but it only constitutes a preliminary step in that direction.Nevertheless, the results described in Section 4.1 support the indication that the retrospective motion correction of T2-FLAIR weighted images based on a T1 reference contrast is quite robust in terms of reconstruction quality, with only minor degradations in terms of contrast and resolution.
Furthermore, the flexibility of the proposed motion-correction method is demonstrated with different combinations of motion-corrupted and reference contrasts (Section 4.2).Our experience suggests that an important factor in assessing the effectiveness of the reference contrast as a guide for motion correction lies in the similarity of the k-space distribution of the two contrasts.Good reconstruction quality can be expected when the reference contrast has similar or higher frequency content when compared to the corrupted contrast, regardless of the type of contrast considered.
A significant part of our experimentation was devoted to assess whether the scanner reconstruction (available as DICOM format) can be directly used as input data for the proposed correction method (Section 4.3).We established that the scanner reconstruction is not suitable for this purpose when it is obtained via compressed-sensing algorithms (Appendix A), which is the case for randomized sampling on the 1.5 T Philips Ingenia scanner utilized in this work.In this case, we must resort to the raw k-space data and perform an intermediate SENSE reconstruction for effective motion correction.
The computational times of the motion correction are, generally speaking, problem dependent, since complex motion artifacts require an increasing number of iterations as a function of motion complexity (Section 2.2).The examples illustrated in this study, where a fixed number of iterations was consider irrespectively of motion complexity, are completed within 1 h 30 min for 3D images of approximately 256×256×256 voxels.The current CPU implementation was run on a consumer-grade laptop with the following processor specifications: Intel Core i7-10750H CPU@2.60GHz×12.An effective implementation in a clinical scenario for on-line reconstructions will likely require GPUs.
The basic assumption of the proposed retrospective correction method is related to the availability of a motion-free contrast.While we believe that it is a realistic possibility within an MR session, we note that the reference contrast may come from previous MR sessions (or even different imaging modalities altogether, such as CT).In this particular case, the bias introduced by the structural prior may have an adverse effect in case of an evolving pathology.However, when structural changes involve a limited pathological region, the adverse bias can be easily mitigated by masking the affected zone.
Note that the motion-free reference can be exploited differently than the reference-guided TV regularization introduced in Ehrhardt and Betcke [2016], and adopted in this work.For example, one may consider several competing techniques advanced for multi-contrast MRI, such as Bayesian compressed sensing [Bilgic et al., 2011], group sparsity [Huang et al., 2014], reference-based MRI [Weizman et al., 2015], or multi-contrast graph-based sparsity [Lai et al., 2017[Lai et al., , 2018]].
The method here presented is limited to rigid motion.Indeed, some decrease in correction quality is noticeable in Figure 6 in the neck region (which is not supposed to behave rigidly).However, our technique may be extended to non-rigid motion and, hence, different body regions other than the brain [see, for example, Huttinga et al., 2020].A major challenge for such extension is a computationally effective parameterization of the motion effects, and the result-ing ill-posedness of the inverse problem.Note that a significant computational advantage of rigid motion over non-rigid motion is related to the direct implementation of the rigid motion in k-space, via equation (3), which results in a data model that requires a single NUFFT evaluation, regardless of the number of time samples considered.Other interesting extensions of the method are related to the integration of specialized motion-resilient acquisition patterns, e.g. as described in Cordero-Grande et al. [2020].

Conclusions
We assessed the performance of the proposed retrospective motion correction method based on a reference contrast not affected by motion artifacts.The current prospective in-vivo study targets 3D clinical protocols conventionally used in brain imaging.
The method is tested with several degrees of motion artifacts, by instructing the volunteers to change position during the scan multiple times.While, we observe that the corrupted images are severely degraded as a function of motion complexity, the corrected images are generally robustly estimated.We also verified that the proposed technique is agnostic with respect to the choice of the reference contrast, as long as the frequency content of the reference and target contrasts is comparable.
Further assessment of the proposed method will be devoted to patient data.2, 3).The volunteer is instructed to move a variable number of times during the scan in order to test the robustness of the proposed correction scheme with respect the motion complexity.The corrupted images are increasingly affected by motion artifacts, however only modest decrease in reconstruction quality can be observed for the corrected images (here axial slices).

Coronal detail
Figure 5: Reconstruction results for volunteer 2. The volunteer is instructed to move five times during the scan.The corrupted contrast is T1-weighted, while the reference contrast is T2-weighted.The proposed correction scheme is agnostic about the choice of corrupted/reference contrast combinations with similar spectral content.

Axial detail
Figure 6: Reconstruction results for volunteer 3. The volunteer is instructed to move once, halfway through the scan (the two overlapping positions are clearly visible in the corrupted slices).The corrupted contrast is T2-weighted, while the reference contrast is T1-weighted.In this case, the input data for the correction algorithm is directly extracted from the scanner reconstruction in DICOM format (comprising both amplitude and phase).The acquisition scheme for the T2-weighted contrast follows a linear filling pattern in k-space.The proposed method successfully removes the motion artifacts because the scanner reconstruction is obtained through a conventional SENSE reconstruction (cf. Figure 7).

Axial detail
Figure 7: Reconstruction results for volunteer 3. The volunteer is instructed to move once, halfway through the scan (the two overlapping positions are clearly visible in the corrupted slices).The corrupted contrast is T2-FLAIR-weighted, while the reference contrast is T1-weighted.Unlike Figure 6, the input data for the correction algorithm is not extracted from the scanner, but is obtained via SENSE reconstruction of the raw k-space data.The acquisition scheme for the T2-FLAIR-weighted contrast is randomized in k-space.When the scanner reconstruction is directly used as input data, the motion correction is highly suboptimal (cf. Figure 8 in Appendix A).

A Inadequate motion correction with scanner reconstruction as input data
As anticipated in Section 3.3, directly using the scanner reconstruction (extracted as DICOM files of both the amplitude and phase of the reconstruction) as input data for the proposed motion correction scheme may degrade the performance when compressed-sensing reconstruction tools have been employed in the reconstruction process.To motivate this conclusion, we setup an experiment with the same setting as described in the second experiment in Section 3.3, the only difference being in how the input data is generated.In this case, the input data consist of the Fourier transform of the extracted scanner reconstruction.
The related suboptimal correction is quite evident when comparing Figure 8 with Figure 7.

Corrupted Corrected Ground truth Reference
Figure 8: Reconstruction results for volunteer 3. The volunteer is instructed to move once, halfway through the scan.The corrupted contrast is T2-FLAIRweighted, while the reference contrast is T1-weighted.In this experiment, the proposed motion correction scheme processes the scanner reconstruction directly.Since the reconstruction algorithm implemented in the scanner destroys the coherence of the rigid motion artifact, the proposed method cannot properly recover the correct reconstruction by simply estimating the motion parameters.With contrasts obtained by randomized acquisitions, we advise to use raw k-space data instead (cf. Figure 7).

B Comparison of motion correction with and without a reference guide
The reference-guided motion-correction algorithm described in Section 2 is compared with a standard retrospective motion-correction algorithm based on the TV regularization.We note that most retrospective motion-correction methods follows the basic mathematical framework detailed in Section 2 (see, for example, Loktyushin et al. [2013] or Cordero-Grande et al. [2020]), where the main mathematical difference consists in the choice of the regularization term g u , in equation ( 5).Hence, in order to assess the effect of the reference contrast, we adopt the same formulation described in Section 2 with a simple TV regularization term g u (u) = x ∇u| x (cf.equation 8 for the reference-guided version of TV).
For the comparison with the baseline method, we use the same experimental settings in Section 3.1.Once again, the motion artifacts are prospectively induced by prompting the volunteer to move during the scan.The results are summarized in Figure 9 and Table 4.
Figure 9: Comparison of the reconstruction results for volunteer 1 with a reference-guided (ours) and a baseline motion-correction method (e.g., not guided by a reference contrast).The volunteer is instructed to move a variable number of times during the scan in order to test the robustness of the proposed correction schemes with respect the motion complexity.The corrupted images are increasingly affected by motion artifacts.The decrease in reconstruction quality for the baseline method is substantially more pronounced than the results obtained with our reference-guided correction (see also Figures 1-3).
We note that the difference in performance between the reference-guided and blind motion correction is even more pronounced in this example than what was previously shown in Rizzuti et al. [2022] (which was limited to 2D synthetic data).This might depend on the fact that the problem is substantially more ill-posed in 3D with randomized sampling than the 2D full-acquisition setup considered in Rizzuti et al. [2022].It is also worth noting that, in our experience, the results for blind motion correction depend more sensibly on the choice of the hyper-parameters in equation ( 5) than the proposed reference-based version.

C Motion parameter estimation
The proposed motion correction algorithm described in Section 2 estimates the rigid motion that the object of interest undergoes during the scan, in order to undo its effect on the reconstructed 3D image.In 3D, the rigid motion is performed by: a plane rotation θ xy in the corresponding plane xy, a plane rotation θ xz in the xz plane, a plane rotation θ yz in the yz plane, a translation τ x in the x direction, a translation τ y in the y direction, and a translation τ z in the z direction (in this order).We adopt the following convention: the x direction corresponds to the left-right direction, y to the posterior-anterior direction, and z to the inferior-superior direction, the xy plane corresponds to the axial plane, xz to the coronal plane, and yz to the sagittal plane.Left/right, anterior/posterior, and inferior/superior are meant from the patient perspective.The orientation of the rotation planes is determined by the right-hand rule.By design, the prospectively-induced motion for all the experiments detailed in Section 3 follows a step-wise behavior (each step corresponding to a change of pose).In this appendix, we gather the estimated rigid motion parameters for the results shown in Section 4, as a function of time.As noted in the main body of the paper, time is equated to the phase-encoding plane coordinate index, ordered by the corresponding acquisition ordering.We display the estimated motion parameters in Figure 10

1 ,
Figure 1 T2-FLAIR Completely corrected Some blurring No additional artifacts Good grey white matter differentiation Section 3.1, Figure 2 T2-FLAIR Completely corrected Some blurring No additional artifacts Good grey white matter differentiation Section 3.1, Figure 3 T2-FLAIR Completely corrected Some blurring Darker areas within the white matter Good grey white matter differentiation Section 3.2, Figure 5 T1 Completely corrected Some blurring No additional artifacts Good grey white matter differentiation, some loss of grey matter low signal Section 3.3, Figure 6 T2 Completely corrected No blurring No additional artifacts Section 3.3, Figure 7 T2-FLAIR Completely corrected Some blurring No additional artifacts Good grey white matter differentiation

Figure 1 :Figure 2 :Figure 3 :
Figure 1: Reconstruction results for volunteer 1.The volunteer is instructed to move once during the scan.The corrupted contrast is T2-FLAIR-weighted, while the reference contrast is T1-weighted.Compare these results with the one obtained with different motion complexity in Figures 2, 3.

Figure 4 :
Figure 4: Summary of the reconstruction results for volunteer 1 (see Figures 1,2, 3).The volunteer is instructed to move a variable number of times during the scan in order to test the robustness of the proposed correction scheme with respect the motion complexity.The corrupted images are increasingly affected by motion artifacts, however only modest decrease in reconstruction quality can be observed for the corrected images (here axial slices).

Table 1 :
1), where motion artifacts are prospectively generated by instructing the volunteer Specification of the acquisition sequences utilized in the experiments in Section 3. We use a 1.5 T Philips Ingenia scanner with a 15-channel head coil.For each experiment, the asterisk indicates the reference contrast.The "randomized" sampling pattern indicated in this table more specifically refers to variable density Cartesian randomized undersampling, while the "regular" pattern refers to classical accelerated linear filling undersampling.

Table 2 :
Summary of the motion-correction results shown in Section 4 in terms of PSNR and SSIM

Table 3 :
Qualitative radiological analysis of the motion-corrected results shown in Section 4. The corrected scans are radiologically equivalent to the ground truth.

Table 4 :
Comparison of the motion-correction results for the baseline and reference-guided methods in terms of PSNR and SSIM.The experiment setup is described in Section 3.1.