1 Introduction

Registration, the process of aligning images, is an important technique that allows visual inspection and computational analysis of images in a common coordinate system. For fetal abnormality screening, registered Magnetic Resonance (MR)/Ultrasound (US) images may assist diagnosis, as the two modalities capture complementary anatomical information. For example, in the fetal brain, MR images have better contrast between important structures such as cortical Grey Matter (GM) and White Matter (WM), whereas the higher spatial resolution of US gives better discrimination between fine structures such as the septum pellucidum and the choroid plexus [7].

A voxel-wise image similarity measure, or cost function, is commonly used in medical imaging to register images. This function quantifies the alignment of the images, with an extremum corresponding to the optimal alignment. Unfortunately, image similarity-based methods are ill-suited to the challenging task of US/MR image registration, as there is no global intensity relationship between the two modalities. Primarily this is due to the imaging artefacts present in US images, such as view-dependent shadows, speckle noise, anisotropy, attenuation, reverberation and refraction. Popular similarity measures developed for other multi-modal registration problems, such as Normalised Mutual Information (NMI), often fail, even with a good initialisation [12].

Consequently, an alternative approach has arisen for registration of images with non-global intensity relationships, whereby image intensities are first transformed to a modality-independent representation. These representations are typically derived from hand-crafted descriptors that capture structural information from images, such as edges and corners. Representations used by previous authors include local gradient orientation [4], local phase [9] and local entropy [16]. Notably, [5] use the concept of self-similarity, computing the similarity of small image patches within a local neighbourhood of an image, and achieved state-of-the-art performance on a challenging US/MRI registration dataset. Another approach to this problem is modality synthesis, which aims to transform image intensities from one modality to another, allowing the registration task to be treated as a mono-modal problem. [7] used this approach to register the fetal brain imaged by US and MR for the first time.

More recently, deep neural networks have been applied to the problem of registration. Two common strategies for registration with deep learning are estimating a similarity measure [2, 15] and predicting transformations directly [1, 13]. An advantage of the first approach is that it allows established transformation models and optimisers to be used; however, this can be a hindrance if the learnt similarity function is not smooth or convex. The second approach, predicting the parameters of a transformation model directly, has recently received more research attention as it allows more robust transformation estimates.

1.1 Proposed Method

In this work, we adopt a deep learning approach to tackle the challenging task of paired 3D MR/US fetal brain registration. Our Long Short-Term Memory (LSTM) network simultaneously predicts a joint isotropic rescaling plus independent rigid transformations for both MR/US images, aligning them to a dual-modality spatio-temporal atlas (Sect. 2.6). Transformation estimates are refined iteratively over time, allowing for higher accuracy. For this, we extend the iterative spatial transformer [8] for co-transformation of multiple images (see Fig. 1). The main contributions of this work are as follows:

  • A network architecture inspired by spatial transformer networks [6] for group-wise registration of images to a common pose.

  • A loss function which encourages convergence and fine alignment of images.

Fig. 1.

Proposed LSTM spatial co-transformer for coalignment of 3D MR/US images. Flow of image intensities is shown in blue while flow of transformation parameters is shown in red. An LSTM network predicts residual transformations \(\mathbf {M}_{mr}^{\delta },\,\mathbf {M}_{us}^{\delta }\) conditioned on the current warped images \(O^{us},\, O^{mr}\), iteratively refining their alignment. (Color figure online)

2 Methods

2.1 Overview

The spatial transformer module [6] allows geometric transformation of network inputs or feature maps within a network, conditioned on the input or feature map itself. Importantly, the spatial transformer module is differentiable, allowing end-to-end training of any network it is inserted into. This allows reorientation of an image into a canonical pose, simplifying the task of subsequent layers. [8] proposed an elegant iterative version of the spatial transformer that passes composed transformation parameters through the network instead of warped images, preserving image intensities until the final transformation. This allows the same geometric predictor, with a much simpler network architecture, to be applied in a recurrent manner for more accurate alignment.

In this work, we propose a novel extension, the recurrent/LSTM “spatial co-transformer”, which allows simultaneous transformation of multiple images to a common pose. Commonly, registration algorithms estimate a warp from one image (the source) towards another (the target). However, we found that fine alignment is more easily learnt between images in a common pose. Thus, we simultaneously co-align pairs of MR/US images to a common atlas space (Sect. 2.6), which will also facilitate future computational image analysis.

Additionally, we propose an LSTM-based parameter prediction network (Fig. 2) and a temporally varying loss function (Sect. 2.5) for more accurate alignments.

2.2 Recurrent Spatial Co-transformer

The recurrent spatial co-transformer consists of three main components: (1) the warper, (2) the residual parameter prediction network and (3) the composer. The first component, the warper, is the computational machinery needed to transform an image and does not contain any learnable parameters. For simplicity of discourse, we treat this as a single function \(f_{warp}\) and refer the reader to [6] for a detailed description of grid transformation and differentiable interpolation. The second component, the parameter prediction network, \(f_{predict}\), predicts residual transformations conditioned on the current warped output images. Finally, the third component, the composer, updates the transformation estimates. The recurrent spatial co-transformer cycles through three steps, which are now described in more detail.

Step 1 - Image Warping. For iteration t, let \(\mathcal {I}=(I^{1},\, I^{2},\,\dots \,,\, I^{N})\) denote an N-tuple of input images, \(\varTheta _{t}=(\theta _{t}^{1},\,\theta _{t}^{2},\,\dots \,,\,\theta _{t}^{N})\) denote an N-tuple of corresponding transformation estimates and \(\mathcal {O}_{t}=(O_{t}^{1},\, O_{t}^{2},\,\dots \,,\, O_{t}^{N})\) denote an N-tuple of corresponding warped output images. Each input image \(I^{i}\) is first warped independently, given its last transformation estimate \(\theta _{t-1}^{i}\)

$$\begin{aligned} O_{t-1}^{i}=f_{warp}(I^{i},\,\mathbf {G},\,\theta _{t-1}^{i})\quad \forall i\in [1,\,\dots \,,\, N]. \end{aligned}$$
(1)

Here, \(\mathbf {G}=[\mathbf {g}_{1},\,\dots \,,\mathbf {g}_{g}]\in \mathbb {R}^{4\times g}\) is a matrix of homogeneous grid coordinates.
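
As a concrete illustration of Step 1, the sketch below builds the homogeneous grid \(\mathbf {G}\) and implements \(f_{warp}\) using PyTorch's grid_sample for the trilinear interpolation. It assumes 4 x 4 homogeneous transformation matrices and grid coordinates normalised to [-1, 1]; the function names are ours, not the authors'.

```python
import torch
import torch.nn.functional as F

def make_homogeneous_grid(depth, height, width):
    """Build G in R^{4 x g}: homogeneous coordinates of a regular sampling
    grid, with each axis normalised to [-1, 1] (grid_sample convention)."""
    zs = torch.linspace(-1, 1, depth)
    ys = torch.linspace(-1, 1, height)
    xs = torch.linspace(-1, 1, width)
    z, y, x = torch.meshgrid(zs, ys, xs, indexing="ij")
    ones = torch.ones_like(x)
    return torch.stack([x.flatten(), y.flatten(), z.flatten(), ones.flatten()])  # (4, g)

def f_warp(image, grid, theta):
    """Warp a (1, C, D, H, W) volume: apply the 4 x 4 matrix `theta` to the
    target grid, then resample the image by trilinear interpolation."""
    d, h, w = image.shape[2:]
    coords = (theta @ grid)[:3]                       # transformed (x, y, z) coordinates
    sample_grid = coords.t().reshape(1, d, h, w, 3)   # grid_sample expects (N, D, H, W, 3)
    return F.grid_sample(image, sample_grid, mode="bilinear", align_corners=True)
```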

Step 2 - Residual Parameter Prediction. The warped images \(\mathcal {O}_{t-1}\) are concatenated along the channel axis and passed as a single tensor to \(f_{predict}\), which simultaneously predicts an N-tuple of corresponding residual transformations \(\varDelta _{t}=(\delta _{t}^{1},\,\delta _{t}^{2},\,\dots \,,\,\delta _{t}^{N})\)

$$\begin{aligned} \varDelta _{t}=f_{predict}(\mathcal {O}_{t-1}). \end{aligned}$$
(2)

\(f_{predict}\) can take any form, but typically consists of a feed-forward network with several interleaved convolutional and max-pooling layers, followed by a fully connected layer and a final fully connected regression layer whose number of units equals the number of model parameters. An illustrative sketch is given below.
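
As a hedged illustration of such a predictor (layer counts and sizes are our own choices, not taken from the paper), a feed-forward \(f_{predict}\) for a pair of concatenated volumes might look as follows:

```python
import torch.nn as nn

class FeedForwardPredictor(nn.Module):
    """Illustrative f_predict: interleaved 3D convolutions and max pooling,
    a fully connected layer, then a regression layer with one unit per
    transformation parameter (n_images x params_per_image outputs)."""
    def __init__(self, in_channels=2, n_images=2, params_per_image=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        )
        self.regress = nn.Sequential(
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, n_images * params_per_image),   # final regression layer
        )

    def forward(self, warped_concat):                      # (B, in_channels, D, H, W)
        return self.regress(self.features(warped_concat))
```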

Step 3 - Parameter Composition. Finally, each transformation estimate \(\theta _{t-1}^{i}\) is composed with its residual transformation estimate \(\delta _{t}^{i}\), yielding a new transformation estimate \(\theta _{t}^{i}\)

$$\begin{aligned} \theta _{t}^{i}=f_{update}(\theta _{t-1}^{i},\,\delta _{t}^{i})\quad \forall i\in [1,\,\dots \,,\, N]. \end{aligned}$$
(3)

The composition function \(f_{update}\) will vary depending on the transformation model. For example, if \(\theta \) parametrises a homogeneous transformation matrix, \(f_{update}\) would be matrix multiplication.
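
Putting the three steps together, one iteration of the recurrent spatial co-transformer can be sketched as follows. This is a schematic only: it assumes each \(\theta \) is a 4 x 4 homogeneous matrix (so \(f_{update}\) is matrix multiplication, with the residual post-multiplied as in Sect. 2.4), reuses the f_warp sketch above and leaves the prediction network abstract.

```python
import torch

def co_transform_step(images, thetas, grid, predict_fn):
    """One iteration of Eqs. (1)-(3): warp each image with its current
    estimate, predict one residual 4 x 4 matrix per image from the
    channel-wise concatenation, then compose estimates with residuals."""
    warped = [f_warp(I_i, grid, theta_i) for I_i, theta_i in zip(images, thetas)]
    residuals = predict_fn(torch.cat(warped, dim=1))            # list of 4 x 4 matrices
    thetas = [theta_i @ delta_i for theta_i, delta_i in zip(thetas, residuals)]
    return warped, thetas
```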

2.3 LSTM Spatial Co-transformer

For more accurate parameter prediction, we propose an LSTM network architecture for \(f_{predict}\). LSTMs store information in a cell state, which allows them to learn long-term dependencies in sequential data far more effectively than vanilla recurrent neural networks. For this, we modify the prediction function \(f_{predict}\) (Eq. 2) so that it now takes a feature vector \(\mathbf {x}_{t}\) and a cell state vector \(\mathbf {c}_{t}\)

$$\begin{aligned} \varDelta _{t}=f_{predict}(\mathbf {x}_{t},\,\mathbf {c}_{t}),\quad \text {where}\quad \mathbf {x}_{t}=f_{extract}(\mathcal {O}_{t-1}). \end{aligned}$$
(4)

Here, \(f_{extract}\) is a function that extracts the feature vector \(\mathbf {x}_{t}\) from the concatenation of the output images, \(\mathcal {O}_{t-1}\). For this, we chose a neural network with a series of convolutions and max pooling operations followed by a flattening procedure (see Fig. 2 for a schematic; however, any network architecture that produces a vector may be used). At each iteration t, the cell state \(\mathbf {c}_{t}\) is updated by a linear blend of the previous cell state \(\mathbf {c}_{t-1}\) and a vector of candidate values \(\tilde{\mathbf {c}}_{t}\) [3]

$$\begin{aligned} \mathbf {c}_{t}=\mathbf {f}_{t}\odot \mathbf {c}_{t-1}+(\mathbf {1}-\mathbf {f}_{t})\odot \tilde{\mathbf {c}}_{t}. \end{aligned}$$
(5)

Here, \(\odot \) denotes the Hadamard or element-wise product and \(\mathbf {f}_{t}\) is the forget mask, a real-valued vector that determines which information is forgotten from the cell state and which candidate values are added. We define \(\mathbf {f}_{t}\) as the result of a single function \(f_{forget}\) that takes the extracted feature vector \(\mathbf {x}_{t}\) and also the previous cell state \(\mathbf {c}_{t-1}\). We implement both the forget and candidate functions as a sequence of two dense layers with weight matrices \(\mathbf {W}_{f1}\), \(\mathbf {W}_{f2}\) and \(\mathbf {W}_{c1}\), \(\mathbf {W}_{c2}\), respectively

$$\begin{aligned} \mathbf {f}_{t}=f_{forget}(\mathbf {c}_{t-1},\,\mathbf {x}_{t})=\sigma (\mathbf {W}_{f2}\,.\text {max}(\mathbf {W}_{f1}\,.\,\left[ \mathbf {c}_{t-1},\,\mathbf {x}_{t}\right] ,\,0)), \end{aligned}$$
(6)
$$\begin{aligned} \tilde{\mathbf {c}}_{t}=f_{candidate}(\mathbf {c}_{t-1},\,\mathbf {x}_{t})=\text {tanh}(\mathbf {W}_{c2}\,.\text {max}(\mathbf {W}_{c1}\,.\,\left[ \mathbf {c}_{t-1},\,\mathbf {x}_{t}\right] ,\,0)). \end{aligned}$$
(7)
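
The cell update of Eqs. (4)-(7) can be written compactly as below. This sketch assumes the coupled-gate blend reconstructed in Eq. (5) and adds a final regression matrix (W_out) mapping the cell state to the residual parameters, which is our own assumption rather than a detail given in the paper.

```python
import torch

def lstm_predict(x_t, c_prev, W_f1, W_f2, W_c1, W_c2, W_out):
    """One LSTM-style update: forget mask (Eq. 6), candidate values (Eq. 7),
    cell-state blend (Eq. 5), then regression of the residual parameters."""
    h = torch.cat([c_prev, x_t], dim=-1)                   # [c_{t-1}, x_t]
    f_t = torch.sigmoid(W_f2 @ torch.relu(W_f1 @ h))       # forget mask, Eq. (6)
    c_tilde = torch.tanh(W_c2 @ torch.relu(W_c1 @ h))      # candidate values, Eq. (7)
    c_t = f_t * c_prev + (1.0 - f_t) * c_tilde             # linear blend, Eq. (5)
    delta_params = W_out @ c_t                             # residual transformation parameters
    return delta_params, c_t
```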
Fig. 2.

LSTM parameter prediction architecture for rigid alignment of MR/US images. The image feature extractor encodes a dual-channel image as a vector that is passed into an LSTM network which predicts a residual transformation. Fourteen parameters are predicted: three for rotation, three for translation and one for isotropic scale, per modality (note, weights for scaling are shared between modalities).

2.4 Rigid Parameter Prediction

For rigid coalignment, our network predicts seven residual update parameters per image: an isotropic log scaling s, three rotation parameters \(r_{x}\), \(r_{y}\), \(r_{z}\) and three translation parameters \(t_{x}\), \(t_{y}\), \(t_{z}\). Here, \([r_{x},\, r_{y},\, r_{z}]\) gives the axis of rotation, while \(\phi =\left\| [r_{x},\, r_{y},\, r_{z}]\right\| _{2}\) gives the angle of rotation. Note, weights are shared between images for the scaling parameters. Our transformation parameters now become rigid transformation matrices \(\delta _{t}=\mathbf {M}_{t}^{\delta }\), \(\theta _{t}=\mathbf {M}_{t}\). Note, for simplicity, transformations \(\mathbf {M}\) are applied to the target grid \(\mathbf {G}\) before resampling, i.e. the inverse transformation. For consistency, we define \(\mathbf {M}^{\delta }\) as the inverse update and \((\mathbf {M}^{\delta })^{-1}\) as the forward update. Learning a series of forward update transformations is inherently easier for the network; thus, we post-multiply the current transformation matrix by the residual matrix, \(\mathbf {M}\leftarrow \mathbf {M}\mathbf {M}^{\delta }\). This is equivalent to updating the forward transformation as \(\mathbf {M}^{-1}\leftarrow (\mathbf {M}^{\delta })^{-1}\mathbf {M}^{-1}\). The forward update transformation is composed as a translation, followed by a rotation, followed by an isotropic rescaling, \((\mathbf {M}^{\delta })^{-1}=\mathbf {S}\mathbf {R}\mathbf {T}\). In practice, we predict the inverse of the update directly by reversing the composition and inverting the operations, \(\mathbf {M}^{\delta }=\mathbf {T}^{-1}\mathbf {R}^{-1}\mathbf {S}^{-1}\).
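
The following numpy sketch shows how the seven predicted parameters could be assembled into the inverse update \(\mathbf {M}^{\delta }=\mathbf {T}^{-1}\mathbf {R}^{-1}\mathbf {S}^{-1}\), using the Rodrigues formula for the axis-angle rotation. It illustrates the composition described above and is not the authors' implementation.

```python
import numpy as np

def residual_matrix(s, r, t):
    """Build M_delta = T^{-1} R^{-1} S^{-1} from an isotropic log scale s,
    an axis-angle rotation r = [rx, ry, rz] and a translation t = [tx, ty, tz]."""
    r, t = np.asarray(r, float), np.asarray(t, float)
    phi = np.linalg.norm(r)                                # rotation angle
    if phi > 1e-12:
        k = r / phi                                        # rotation axis
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        R3 = np.eye(3) + np.sin(phi) * K + (1 - np.cos(phi)) * (K @ K)  # Rodrigues formula
    else:
        R3 = np.eye(3)
    S_inv = np.diag([np.exp(-s)] * 3 + [1.0])              # inverse isotropic rescaling
    R_inv = np.eye(4); R_inv[:3, :3] = R3.T                # inverse rotation
    T_inv = np.eye(4); T_inv[:3, 3] = -t                   # inverse translation
    return T_inv @ R_inv @ S_inv
```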

2.5 Training and Loss Function

Let \(\mathbf {X}=\{\mathcal {I}_{1},\,\mathcal {I}_{2},\,\dots \,,\,\mathcal {I}_{n}\}\) denote a training set of n aligned image tuples. Images in the training set are initially aligned to a common pose (in our case, we affinely align our MR and US images to a dual-modality atlas, see Sect. 2.6). For each training iteration, an image tuple \(\mathcal {I}=(I^{1},\, I^{2},\,\dots \,,\, I^{N})\) is selected and each image \(I^{i}\) is transformed by a randomly generated matrix \(\mathbf {D}^{i}\) before being fed into the network. \(\mathbf {D}^{i}\) incorporates an affine augmentation (shared across the input tuple) and an initial rigid disorientation. For augmentation, we randomly sample and compose a shearing, an anisotropic scaling and an isotropic scaling. For disorientation, we compose a random rotation and translation. Crucially, the use of a recurrent network allows us to back-propagate errors through time. We take advantage of this by designing a temporally varying loss function comprising a relative and an absolute term, which allows our network to learn a long-term strategy for alignment. For k alignment iterations of N images, we define our loss

$$\begin{aligned} \mathcal {L}=\sum _{i=1}^{N}\sum _{t=1}^{k}\dfrac{d(\mathbf {M}_{t}^{i}\,\mathbf {D}^{i})}{d(\mathbf {M}_{t-1}^{i}\,\mathbf {D}^{i})}+\lambda \dfrac{t}{k}\, d(\mathbf {M}_{t}^{i}\,\mathbf {D}^{i}). \end{aligned}$$
(8)

Here, d measures the distance of a transformation matrix from the identity and \(\lambda \) weights the two loss terms. The first term rescales the distance error \(d(\mathbf {M}_{t}^{i}\,\mathbf {D}^{i})\) relative to the previous distance error \(d(\mathbf {M}_{t-1}^{i}\,\mathbf {D}^{i})\). This encourages the network to learn fine alignments and to converge. Note, \(d(\mathbf {M}_{t-1}^{i}\,\mathbf {D}^{i})\) is treated as a constant here. The second term penalises the absolute error with increasing weight, encouraging initial exploration while still penalising poor final alignments. The distance function \(d(\mathbf {M})\) is computed by first decomposing the matrix \(\mathbf {M}\) into an isotropic scale s, a translation vector \(\mathbf {t}\) and a rotation matrix \(\mathbf {R}\). We then compute \(d(\mathbf {M})\) as the sum of separate distance measures for each of these components

$$\begin{aligned} d(\mathbf {M})&=d_{scale}(s)+d_{rotate}(\mathbf {R})+d_{translate}(\mathbf {t}),\quad \text {where}\quad d_{translate}(\mathbf {t})=\left\| \mathbf {t}\right\| _{2},\\ d_{scale}(s)&=\mu \left| \log (s)\right| \quad \text {and}\quad d_{rotate}(\mathbf {R})=\dfrac{1}{g}\sum _{i=1}^{g}\left\| \mathbf {g}_{i}-\mathbf {R}\mathbf {g}_{i}\right\| _{2}. \end{aligned}$$
(9)

Here, \(\mu \) weights \(d_{scale}\) relative to the other two distance measures. Rotation distance, \(d_{rotate}\), is given by the mean distance between transformed grid points \(\mathbf {R}\mathbf {g}_{i}\) and their initial locations \(\mathbf {g}_{i}\). This gives a natural weighting between translation and rotation components.
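
The distance of Eq. (9) and one summand of Eq. (8) could be computed as in the sketch below. The decomposition assumes \(\mathbf {M}\) is a similarity transform built as in Sect. 2.4; the helper names and the choice of grid are illustrative.

```python
import numpy as np

def transform_distance(M, grid, mu=1.0):
    """d(M) = d_scale + d_rotate + d_translate for a 4 x 4 similarity matrix M,
    with homogeneous grid points stored as the columns of `grid` (4 x g)."""
    A = M[:3, :3]
    s = np.cbrt(np.linalg.det(A))                  # isotropic scale factor
    R = A / s                                      # rotation once the scale is removed
    t = M[:3, 3]
    d_scale = mu * abs(np.log(s))
    d_rotate = np.mean(np.linalg.norm(grid[:3] - R @ grid[:3], axis=0))
    d_translate = np.linalg.norm(t)
    return d_scale + d_rotate + d_translate

def loss_term(M_t, M_prev, D, grid, t, k, lam=1.0):
    """One (i, t) summand of Eq. (8); the previous distance acts as a constant."""
    d_now = transform_distance(M_t @ D, grid)
    d_prev = transform_distance(M_prev @ D, grid)  # no gradient flows through this
    return d_now / d_prev + lam * (t / k) * d_now
```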

2.6 Joint Affine MR/US Spatio-Temporal Atlas (Ground Truth)

We followed the approach of [14], constructing average image-intensity templates for each week of gestation (20–31 weeks) from 166 3D reconstructed MR/3D US image pairs. A set of templates was constructed for each modality separately, with a final registration step between templates to establish correspondences across modalities. This process comprised three parts: (1) manual reorientation, (2) age-dependent template bootstrapping and (3) unified template bootstrapping. All images were carefully manually reoriented to a standard pose, with the yz plane aligned with the brain midline and the top of the brain stem centred at the origin. Averaging the reoriented image intensities yielded an initial template estimate, which was refined using a bootstrapping procedure. This involved alternating between two steps: (1) affinely registering images to the current template and (2) averaging the registered image intensities. The bootstrapping procedure was then repeated between templates to establish correspondences across time. MR templates were constructed first, allowing us to fix the shearing and scaling parameters for US template construction. For US registration, we restricted the optimisation to three degrees of freedom (rotation around x, and translation along y and z), thus respecting the manual definition of the midline. With additional masking, this allowed robust registration of US images for template construction using [10].
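
The template bootstrapping alternates registration and intensity averaging; a minimal sketch of that loop is given below, assuming a hypothetical affine_register helper (the paper performs these registrations with [10]).

```python
import numpy as np

def bootstrap_template(images, initial_template, affine_register, n_rounds=5):
    """Alternate (1) affinely registering every image to the current template
    and (2) averaging the registered image intensities."""
    template = initial_template
    for _ in range(n_rounds):
        registered = [affine_register(moving, template) for moving in images]
        template = np.mean(registered, axis=0)
    return template
```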

3 Results and Discussion

3.1 Alignment Error

To demonstrate the accuracy of our method (LSTM ST), we compute registration errors with respect to two ground truth alignments: the first derived from our spatio-temporal atlas and the second derived from anatomical landmarks picked by clinical experts (fourteen per image), which offers an unbiased alternative. For comparison, two image similarity-based registration methods were chosen: NMI with block-matching (NMI+BM) [10] and self-similarity context descriptors with discrete optimisation (SSC+DO) [5]. Both of these methods were developed for robust registration and have previously been used for multi-modality registration tasks. To compare the accuracy of the methods and also their ability to register highly misaligned images, we created three test sets with different ranges of disorientation: [, ], [, ] and [, ].

Table 1. Mean alignment error. Mean rotation and translation errors over our test set are shown for three automated registration methods, relative to two ground truth alignments.
Fig. 3.

Median (blue) and 95th percentile (red) alignments by rotation error for SSC+DO and LSTM ST. Alignments for other methods are shown for comparison. Each column shows the same MR image for a subject from our test set with its corresponding US image thresholded, colour-mapped, overlayed and aligned, by each of the automated methods. (Color figure online)

Fig. 4.

Template sharpness. Templates are constructed by averaging image intensities for US images registered to an MR template via their corresponding MR images (see Sect. 3.2). Higher Variance of the Laplacian (VAR) indicates sharper templates and better registration accuracy, while higher Peak Signal-to-Noise Ratio (PSNR) indicates greater similarity with the atlas ground truth template.

As we can see from Table 1 our method outperforms both similarity-based methods for all disorientation levels and both ground truth datasets. Furthermore, our method converges to the same alignment for each image pair, irrespective of initial orientation and positioning, which explains the very similar mean errors seen for the three disorientation levels. Conversely, similarity-based methods failed to register images for higher levels of disorientation. All pairs of images registered by our method were visually inspected and a reasonable alignment was found in all cases (see Fig. 3 for example alignments). The worst rotation and translation errors seen were \(7.9^{\circ }\) and \(1.8\,\text {mm}\) respectively, showing our method is relatively robust.

3.2 Mean Templates

We construct US mean templates by first registering each US image to its corresponding MR image rigidly, then affinely transforming the image pair to the MR atlas space and finally averaging the intensities of all transformed US images. If registration between modalities is accurate, the constructed US template should be crisp. To evaluate the constructed templates, we compute two measures: Peak Signal-to-Noise Ratio (PSNR) with respect to our ground truth US template (Sect. 2.6), and the Variance of the image Laplacian (VAR), which provides an unbiased measure of sharpness [11]. Figure 4 shows that our method produces the sharpest template as measured by VAR and also has the highest PSNR. Furthermore, templates for our method have the same sharpness at any level of initial disorientation.
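
The two template measures could be computed as in the sketch below, using scipy's Laplacian filter; the intensity maximum used for PSNR is an assumption and would need to match the templates' normalisation.

```python
import numpy as np
from scipy.ndimage import laplace

def sharpness_var(template):
    """Variance of the Laplacian (VAR): higher values indicate a sharper template."""
    return float(np.var(laplace(template.astype(np.float64))))

def psnr(template, reference, max_value=1.0):
    """Peak Signal-to-Noise Ratio of the constructed template with respect to
    the ground-truth atlas template."""
    mse = np.mean((template.astype(np.float64) - reference.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(max_value ** 2 / mse))
```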

4 Conclusion

In this work, we proposed the LSTM spatial co-transformer, a deep learning-based method for group-wise registration of images to a standard pose. We applied it to the challenging task of fetal MR/US brain image registration. Our method automatically coaligns brain images with a dual-modality spatio-temporal atlas, in which future computational image analysis may be performed. Our results show that our method registers images more accurately than the state-of-the-art similarity-based registration method, self-similarity context descriptors [5]. Furthermore, it is able to robustly register highly misaligned images, where similarity-based methods fail.