
Hierarchical Geodesic Models in Diffeomorphisms

Published in: International Journal of Computer Vision

Abstract

Hierarchical linear models (HLMs) are a standard approach for analyzing data where individuals are measured repeatedly over time. However, such models are only applicable to longitudinal studies of Euclidean data. This paper develops the theory of hierarchical geodesic models (HGMs), which generalize HLMs to the manifold setting. Our proposed model quantifies longitudinal trends in shapes as a hierarchy of geodesics in the group of diffeomorphisms. First, individual-level geodesics represent the trajectory of shape changes within individuals. Second, a group-level geodesic represents the average trajectory of shape changes for the population. Our proposed HGM is applicable to longitudinal data from unbalanced designs, i.e., varying numbers of timepoints for individuals, which is typical in medical studies. We derive the solution of HGMs on diffeomorphisms to estimate individual-level geodesics, the group geodesic, and the residual diffeomorphisms. We also propose an efficient parallel algorithm that easily scales to solve HGMs on a large collection of 3D images of several individuals. Finally, we present an effective model selection procedure based on cross validation. We demonstrate the effectiveness of HGMs for longitudinal analysis of synthetically generated shapes and 3D MRI brain scans.


References

  • Adams, J. F. (1969). Lectures on Lie groups. Chicago: University of Chicago Press.


  • Amit, Y., Grenander, U., & Piccioni, M. (1991). Structural image restoration through deformable templates. Journal of the American Statistical Association, 86(414), 376–387.


  • Arnol’d, V. I. (1966). Sur la géométrie différentielle des groupes de Lie de dimension infinie et ses applications à l’hydrodynamique des fluides parfaits. Annales de l’institut Fourier, 16, 319–361.


  • Burke, S. N., & Barnes, C. A. (2006). Neural plasticity in the ageing brain. Nature Reviews Neuroscience, 7(1), 30–40.


  • Chevalley, C. (1999). Theory of Lie groups: 1 (Vol. 1). Princeton: Princeton University Press.


  • Davis, B. C., Fletcher, P. T., Bullitt, E., & Joshi, S. (2010). Population shape regression from random design data. International Journal of Computer Vision, 90(2), 255–266.


  • Durrleman, S., Pennec, X., Trouvé, A., Gerig, G., & Ayache, N. (2009). Spatiotemporal atlas estimation for developmental delay detection in longitudinal datasets. In MICCAI (pp. 297–304). Berlin: Springer.

  • Fishbaugh, J., Prastawa, M., Durrleman, S., Piven, J., & Gerig, G. (2012). Analysis of longitudinal shape variability via subject specific growth modeling. MICCAI. Berlin: Springer.

  • Fishbaugh, J., Prastawa, M., Gerig, G., & Durrleman, S. (2013). Geodesic image regression with a sparse parameterization of diffeomorphisms. In Geometric Science of Information (pp. 95–102). New York: Springer.

  • Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2012). Applied longitudinal analysis (Vol. 998). New Jersey: Wiley.


  • Fletcher, P. T. (2013). Geodesic regression and the theory of least squares on Riemannian manifolds. International Journal of Computer Vision, 105(2), 171–185.

  • Fox, N. C., & Schott, J. M. (2004). Imaging cerebral atrophy: Normal ageing to Alzheimer's disease. The Lancet, 363(9406), 392–394.


  • Grenander, U., & Miller, M. I. (1998). Computational anatomy: An emerging discipline. Quarterly of Applied Mathematics, LVI(4), 617–694.


  • Hinkle, J., Muralidharan, P., Fletcher, P. T., & Joshi, S. (2012). Polynomial regression on Riemannian manifolds. In Computer Vision–ECCV 2012 (pp. 1–14). New York: Springer.

  • Hong, Y., Joshi, S., Sanchez, M., Styner, M., & Niethammer, M. (2012). Metamorphic geodesic regression. In N. Ayache, H. Delingette, P. Golland, & K. Mori (Eds.), Medical image computing and computer-assisted intervention – MICCAI 2012. Lecture Notes in Computer Science (Vol. 7512, pp. 197–205). Berlin: Springer. doi:10.1007/978-3-642-33454-2_25.

  • Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38(4), 963–974.


  • Lorenzi, M., Ayache, N., Frisoni, G. B., & Pennec, X. (2011). Mapping the effects of Aβ1–42 levels on the longitudinal changes in healthy aging: Hierarchical modeling based on stationary velocity fields. In: MICCAI 2011. Heidelberg: Springer.

  • Lorenzi, M., Pennec, X., Ayache, N., & Frisoni, G. (2012). Disentangling the normal aging from the pathological Alzheimer's disease progression on cross-sectional structural MR images. MICCAI Workshop on Novel Imaging Biomarkers for Alzheimer's Disease and Related Disorders (NIBAD'12) (pp. 145–154). Nice, France.

  • Marcus, D. S., Fotenos, A. F., Csernansky, J. G., Morris, J. C., & Buckner, R. L. (2010). Open access series of imaging studies: Longitudinal MRI data in nondemented and demented older adults. Journal of Cognitive Neuroscience, 22(12), 2677–2684.


  • Micheli, M., Michor, P. W., & Mumford, D. (2012). Sectional curvature in terms of the cometric, with applications to the Riemannian manifolds of landmarks. SIAM Journal on Imaging Sciences, 5(1), 394–433.


  • Miller, M., Banerjee, A., Christensen, G., Joshi, S., Khaneja, N., Grenander, U., et al. (1997). Statistical methods in computational anatomy. Statistical Methods in Medical Research, 6(3), 267–299.


  • Miller, M. I. (2004). Computational anatomy: Shape, growth, and atrophy comparison via diffeomorphisms. NeuroImage, 23, 19–33.


  • Miller, M. I., Trouvé, A., & Younes, L. (2006). Geodesic shooting for computational anatomy. Journal of Mathematical Imaging and Vision, 24(2), 209–228.

  • Muralidharan, P., & Fletcher, P. (2012). Sasaki metrics for analysis of longitudinal data on manifolds. In: IEEE Conference on CVPR (pp. 1027–1034).

  • Niethammer, M., Huang, Y., & Vialard, F. X. (2011). Geodesic regression for image time-series. In MICCAI 2011 (Vol. 6892, pp. 655–662). Berlin: Springer.

  • Pinheiro, J. C., & Bates, D. M. (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics, 4(1), 12–35.


  • Raz, N., & Rodrigue, K. M. (2006). Differential aging of the brain: Patterns, cognitive correlates and modifiers. Neuroscience & Biobehavioral Reviews, 30(6), 730–748.


  • Reuter, M., Rosas, H. D., & Fischl, B. (2010). Highly accurate inverse consistent registration: A robust approach. NeuroImage, 53(4), 1181–1196.


  • Reuter, M., Schmansky, N. J., Rosas, H. D., & Fischl, B. (2012). Within-subject template estimation for unbiased longitudinal image analysis. NeuroImage, 61(4), 1402–1418.


  • Singh, N., & Niethammer, M. (2014). Splines for diffeomorphic image regression. In: P. Golland, N. Hata, C. Barillot, J. Hornegger, & R. Howe (Eds.), Medical image computing and computer-assisted intervention – MICCAI 2014. Lecture Notes in Computer Science (Vol. 8674, pp. 121–129). Springer. doi:10.1007/978-3-319-10470-6_16.

  • Singh, N., Hinkle, J., Joshi, S., & Fletcher, P. (2013a). A hierarchical geodesic model for diffeomorphic longitudinal shape analysis. In: J. Gee, S. Joshi, K. Pohl, W. Wells, & L. Zöllei (Eds.), Information processing in medical imaging. Lecture Notes in Computer Science (Vol. 7917, pp. 560–571). Berlin: Springer.

  • Singh, N., Hinkle, J., Joshi, S., & Fletcher, P. (2013b). A vector momenta formulation of diffeomorphisms for improved geodesic regression and atlas construction. In: 2013 IEEE 10th International Symposium on Biomedical imaging (ISBI) (pp. 1219–1222). doi:10.1109/ISBI.2013.6556700

  • Singh, N., Hinkle, J., Joshi, S., & Fletcher, P. (2014). An efficient parallel algorithm for hierarchical geodesic models in diffeomorphisms. In: Proceedings of the 2014 IEEE International Symposium on Biomedical Imaging (ISBI).

  • Sowell, E. R., Peterson, B. S., Thompson, P. M., Welcome, S. E., Henkenius, A. L., & Toga, A. W. (2003). Mapping cortical change across the human life span. Nature Neuroscience, 6, 309–315.


  • Thompson, D. W. (1942). On growth and form. Cambridge: Cambridge University Press.

  • Thompson, P. M., & Toga, A. W. (2002). A framework for computational anatomy. Computing and Visualization in Science, 5(1), 13–34.


  • Winer, B. J. (1962). Statistical principles in experimental design. New York: McGraw-Hill Book Company.


  • Younes, L. (2010). Shapes and diffeomorphisms (Vol. 171). New York: Springer.


  • Younes, L., Qiu, A., Winslow, R. L., & Miller, M. I. (2008). Transport of relational structures in groups of diffeomorphisms. Journal of Mathematical Imaging and Vision, 32(1), 41–56.


  • Younes, L., Arrate, F., & Miller, M. I. (2009). Evolution equations in computational anatomy. NeuroImage, 45(1 Suppl), S40–S50.


  • Zhang, M., Singh, N., & Fletcher, P. (2013). Bayesian estimation of regularization and atlas building in diffeomorphic image registration. In: J. Gee, S. Joshi, K. Pohl, W. Wells, & L. Zöllei (Eds.), Information processing in medical imaging. Lecture Notes in Computer Science (Vol. 7917, pp. 37–48). Berlin: Springer. doi:10.1007/978-3-642-38868-2_4

Download references

Acknowledgments

This research is supported by NIH Grants U01NS082086, 5R01EB007688, U01 AG024904, R01 MH084795 and P41 RR023953, and NSF Grant 1054057.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nikhil Singh.

Additional information

Communicated by Pushmeet Kohli.

Appendices

Appendix 1: Review of Classical Mixed-Effects Models and the HGM Simplifications

Our proposed hierarchical geodesic model (HGM) is inspired by the work of Laird and Ware (1982), who proposed mixed-effects models in linear Euclidean space. In this section, we briefly review the classical hierarchical linear models, or mixed-effects models, seen in the literature. We then discuss how the modeling choices and the assumptions on variance parameters in our HGM differ from the standard models. See the book by Fitzmaurice et al. (2012) and the works of Laird and Ware (1982) and Pinheiro and Bates (1995) for more details.

Similar to the data setup explained in Sect. 2.1, say we have data consisting of a sequence of \(M_i\) measurements for each of N individuals, \(i=1,\ldots, N\), such that the \(j^\mathrm{th}\) measurement of the \(i^\mathrm{th}\) individual is denoted by \(y_{ij}\). Let us further denote the measurements of an individual by the vector, \(Y_i=\begin{pmatrix}y_{i0}\\ \vdots \\ y_{iM_i}\end{pmatrix}\).

The standard linear mixed-effects model can be expressed in two equivalent ways: (a) a single-stage formulation, and (b) a two-stage formulation (see Fitzmaurice et al. (2012, page 200)).

Single-stage formulation: The standard linear mixed effects model expresses the model as a combination of fixed and random effects as,

$$\begin{aligned} Y_i = X_iB+ Z_ib_i+e_i, \end{aligned}$$
(18)

where \(X_i\) and \(Z_i\) are matrices of covariates, B is the vector of fixed effects and \(b_i\) is a vector of random effects. The random effects are normally distributed, \(b_i\sim {\mathscr {N}}(0, G)\), where G is an arbitrary covariance matrix. The vector of errors, \(e_i\), can be thought of as measurement or sampling errors; these are normally distributed with identical variance and zero mean, i.e., \(e_i\sim {\mathscr {N}}(0,\sigma ^2 I_{M_i})\), where \(I_{M_i}\) denotes the \((M_i\times M_i)\) identity matrix. The matrix \(Z_i\) is known and links the vector of random effects, \(b_i\), to \(Y_i\). In fact, the columns of \(Z_i\) are a subset of the columns of \(X_i\). This can be seen clearly using the two-stage random-effects formulation of the same model.

Two-stage formulation: The above mixed effects model can be better motivated as arising from its two-stage specification, where the random and the fixed effects are split in two separate stages. The model at Stage 1 is written at the individual level as,

$$\begin{aligned} Y_i=Z_iB_i+e_i, \end{aligned}$$
(19)

where, as in the single-stage formulation above, \(e_i\) is the vector of errors distributed as \(e_i\sim {\mathscr {N}}(0,\sigma ^2 I_{M_i})\). This specification says that the longitudinal responses of the \(i^\mathrm{th}\) individual are assumed to follow the individual-specific response path given by \(Z_i B_i\), with an added sampling error given by \(e_i\).

The model at Stage 2, at the group level, assumes that the individual effects, \(B_i\), are random with mean given by a linear function of the vector of fixed effects B, and with covariance G. In particular, the model at this stage takes the form,

$$\begin{aligned} B_i=A_iB+b_i, \end{aligned}$$
(20)

where \(b_i\sim {\mathscr {N}}(0, G)\). This specification says that the \(i^\mathrm{th}\) individual deviates from the population mean response by a random amount represented by \(b_i\).

Finally, to see the equivalence between the above two specifications of the mixed-effects model, we substitute the subject-specific effect, \(B_i\), from Eq. (20) into Eq. (19) to get,

$$\begin{aligned} Y_i&=Z_i(A_iB + b_i)+e_i,\\&=Z_iA_iB+Z_i b_i + e_i, \end{aligned}$$

which reduces to the single-stage model by observing that \(X_i=Z_iA_i\).
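The reduction above can be checked numerically. The following sketch simulates one individual from the two-stage model and verifies that the single-stage formulation with \(X_i=Z_iA_i\) produces the identical response; the specific timepoints, covariances and noise level are toy values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup for one individual i: M_i = 4 visits at times t_ij.
t = np.array([0.0, 1.0, 2.5, 4.0])
Z = np.column_stack([np.ones_like(t), t])    # individual-level design (intercept, slope)
A = np.eye(2)                                # links fixed effects to individual effects
B = np.array([10.0, -0.5])                   # fixed effects (group intercept, group slope)

G = np.diag([1.0, 0.1])                      # random-effects covariance (diagonal, as in the HGM)
b = rng.multivariate_normal(np.zeros(2), G)  # random effects b_i ~ N(0, G)
e = rng.normal(0.0, 0.2, size=t.size)        # measurement noise e_i ~ N(0, sigma^2 I)

# Two-stage formulation: Stage 2 gives B_i = A_i B + b_i, Stage 1 gives Y_i = Z_i B_i + e_i.
B_i = A @ B + b
Y_two_stage = Z @ B_i + e

# Single-stage formulation: Y_i = X_i B + Z_i b_i + e_i, with X_i = Z_i A_i.
X = Z @ A
Y_single_stage = X @ B + Z @ b + e
```

Since the two formulations are algebraically identical, `Y_two_stage` and `Y_single_stage` agree to machine precision.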

1.1 From Linear Mixed-Effects Models to the HGMs

We now discuss the connections of the above model with the Euclidean version of our proposed HGM in Sect. 2.1 that we subsequently generalize to the manifold of diffeomorphisms in Sect. 2.2.

Our model at the individual level reduces to Eq. (19) of the Stage 1 with time as the covariates and for the simplest case when \(t_{i0}=0\), when we observe that,

$$\begin{aligned} B_i=\begin{pmatrix} a_i \\ b_i\end{pmatrix} \text {, } Z_i= \begin{pmatrix} 1 &{} t_{i0} \\ \vdots &{} \vdots \\ 1 &{} t_{iM_i}\end{pmatrix} \text { and } e_i\sim {\mathscr {N}}(0,\sigma ^2 I_{M_i}). \end{aligned}$$

For the case when \(t_{i0}\) is not zero, subtract \(t_{i0}\) from the second column of \(Z_i\) to see the equivalence.

Similarly, the group level of the HGM reduces to Eq. (20) of Stage 2, when we observe that,

$$\begin{aligned} B=\begin{pmatrix} \alpha \\ \beta \end{pmatrix} \text {, } A_i= \begin{pmatrix} 1 &{} 0 \\ 0 &{} 1\end{pmatrix} \text { and } G=\begin{pmatrix}\sigma ^2_I &{} 0\\ 0 &{} \sigma ^2_S\end{pmatrix}. \end{aligned}$$

For the case when \(t_{i0}\) is not zero, the entry in the first row and second column of \(A_i\) is replaced by \(t_{i0}\).

We make a few critical simplifying assumptions in the HGM that make its generalization to the manifold of diffeomorphisms possible, and that subsequently allow estimation of the model parameters in the intrinsic sense. The simplifications are that:

  1. the individual random intercepts \(a_i\) are modeled at the starting point of each individual,

  2. we perform a stage-wise estimation of model parameters, and

  3. the variance parameters of the models are known a priori.

The first assumption is critical to the HGM because, unlike the Euclidean case, the group of diffeomorphisms is not flat and has nonzero curvature. This necessitates that slope comparisons be performed within a tangent space specific to the individual.

For the second assumption: similar to standard mixed-effects parameter estimation, we could estimate the model parameters using an EM-based approach and integrate out \(a_i\) and \(b_i\). However, this is not feasible on the manifold of diffeomorphisms, because the slope-intercept parameter pairs are elements of the tangent bundle, and the theoretical development and analysis of distributions on the tangent bundle is itself an open problem. Instead, a possible improvement over our proposed method is to estimate the group- and individual-level parameters jointly by minimizing the joint log-likelihood using the single-stage combined model, rather than estimating them in two stages. The main challenge with such a formulation would be to address the computational expense and memory requirements of a joint optimization scheme before it becomes feasible for practical applications. Another direction for future improvement could be to explore approximations to the log-likelihood that generalize to curved spaces, similar to those proposed by Pinheiro and Bates (1995) for the Euclidean case. We make the last assumption also to reduce the computational expense of the algorithm; one strategy could be to investigate sampling-based methods on the manifold of diffeomorphisms to estimate these model parameters, similar to the approach proposed by Zhang et al. (2013) for image registration and atlas estimation.

Appendix 2: Derivations for Regression with Vector Momenta

The forward evolution along geodesics in diffeomorphisms is governed by the set of three time dependent constraints written as the following PDEs:

$$\begin{aligned} \left. \begin{aligned} \partial _t I + \nabla I\cdot v&= 0\quad \\ \partial _t m + \mathrm{ad}^*_vm&= 0\\ m-Lv&=0 \end{aligned} \right\} \end{aligned}$$
(21)
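The first constraint in Eq. (21) is a transport equation for the image. As a minimal illustration, the sketch below integrates that single equation on a 1D periodic grid with a semi-Lagrangian step; the velocity field is held constant (an assumption made for simplicity), so the momentum equation and the relation \(m = Lv\) are not integrated here.

```python
import numpy as np

# Toy 1D illustration of the image-transport constraint from Eq. (21):
#   dI/dt + (dI/dx) * v = 0,
# solved with a semi-Lagrangian step: I(t+dt, x) = I(t, x - v(x) dt).
n, steps, dt = 128, 25, 0.02
x = np.linspace(0.0, 1.0, n, endpoint=False)
I = np.exp(-((x - 0.3) ** 2) / 0.005)   # initial image: a bump centered at x = 0.3
v = 0.5 * np.ones(n)                    # constant velocity field (assumed, for simplicity)

for _ in range(steps):
    # trace characteristics backwards and interpolate (periodic domain)
    I = np.interp((x - v * dt) % 1.0, x, I, period=1.0)

# after T = steps*dt = 0.5 time units the bump has been advected by v*T = 0.25
peak = x[np.argmax(I)]
```

Linear interpolation introduces some numerical diffusion, but the peak of the bump stays at the advected position, near x = 0.55.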

Along the geodesic, each one of m(t), I(t), v(t), evolve with time. The energy functional for geodesic regression with M measured image scans is of the form:

$$\begin{aligned} {\mathscr {S}}(m(0)) =&\frac{1}{2}\langle m(0),K\star m(0)\rangle _{L^2}\nonumber \\&+\frac{1}{2\sigma ^2}\sum _{i=0}^{M-1}||I(t^i)-J^i||^2 \end{aligned}$$
(22)

Here \(t^i\) are the timepoints where the noisy data, \(J^i\), are observed, with \(0\le t^i\le 1\). Augmenting the functional \({\mathscr {S}}\) with Lagrange multipliers (adjoint variables), we get:

$$\begin{aligned} \hat{{\mathscr {S}}}&= {\mathscr {S}}+\int _0^1 \langle \hat{m}, \dot{m} + \mathrm{ad}^*_vm \rangle _{L^2} \end{aligned}$$
(23)
$$\begin{aligned}&\quad +\int _0^1 \langle \hat{I}, \dot{I} + \nabla I\cdot v \rangle _{L^2} \end{aligned}$$
(24)
$$\begin{aligned}&\quad +\int _0^1 \langle \hat{v}, m-Lv\rangle _{L^2} \end{aligned}$$
(25)
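Before taking variations, it may help to see the unconstrained energy in Eq. (22) evaluated numerically. The sketch below does this on a 1D periodic grid, applying the smoothing kernel K in the Fourier domain; the choice \(L = (1-\alpha \varDelta )^2\) for the differential operator, and all numerical values, are assumptions made for this illustration only.

```python
import numpy as np

# Numerical sketch of the geodesic-regression energy in Eq. (22):
#   S(m0) = 0.5 <m0, K * m0>_{L2} + 1/(2 sigma^2) sum_i ||I(t^i) - J^i||^2,
# with K = L^{-1} applied via FFT, assuming L = (1 - alpha * Laplacian)^2.
n, alpha, sigma = 64, 0.1, 0.5
freq = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)
L_hat = (1.0 + alpha * freq ** 2) ** 2   # Fourier symbol of L
K_hat = 1.0 / L_hat                      # Fourier symbol of K

def energy(m0, I_at_ti, J):
    """I_at_ti, J: lists of images at the observed timepoints t^i."""
    Km0 = np.real(np.fft.ifft(K_hat * np.fft.fft(m0)))
    reg = 0.5 * np.mean(m0 * Km0)        # 0.5 <m0, K * m0> (mean approximates the integral)
    data = sum(np.mean((Ii - Ji) ** 2) for Ii, Ji in zip(I_at_ti, J))
    return reg + data / (2.0 * sigma ** 2)

x = np.linspace(0.0, 1.0, n, endpoint=False)
m0 = np.sin(2.0 * np.pi * x)                 # toy initial momentum
I_traj = [np.cos(2.0 * np.pi * x)]           # stand-in for the shot image I(t^0)
J_obs = [np.cos(2.0 * np.pi * x) + 0.1]      # noisy observation J^0
E = energy(m0, I_traj, J_obs)
```

In a real implementation, `I_traj` would come from integrating Eq. (21) forward from m(0) and I(0); here it is a stand-in so that the energy itself can be evaluated in isolation.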

We now evaluate variations of \(\hat{{\mathscr {S}}}\) with respect to paths of each of the time-dependent variables, m, I and v.

For the variation of the augmented functional, \(\hat{{\mathscr {S}}}\), with respect to the momenta, m, we have:

$$\begin{aligned} \partial _m\hat{{\mathscr {S}}}&= \langle \delta m(0),K\star m(0)\rangle +\frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} \bigg (\int _0^1 \langle \hat{m}, \partial _t(m+\epsilon \delta m)\\&\quad + \mathrm{ad}^*_v(m+\epsilon \delta m) \rangle + \int _0^1 \langle \hat{v}, m+\epsilon \delta m -Lv\rangle \bigg )\\&= \langle \delta m(0),K\star m(0)\rangle + \int _0^1\langle \hat{m},\delta \dot{m}+\mathrm{ad}_v^*\delta m\rangle \\&\quad +\int _0^1\langle \hat{v},\delta m\rangle \\&= \langle \delta m(0),K\star m(0)\rangle + \int _0^1\langle \hat{m},\delta \dot{m}\rangle \\&\quad + \int _0^1\langle \hat{m},\mathrm{ad}_v^*\delta m\rangle +\int _0^1\langle \hat{v},\delta m\rangle \\&= \langle \delta m(0),K\star m(0)\rangle +\langle \hat{m},\delta m\rangle \bigg |_{t=0}^{t=1}\\&\quad - \int _0^1\langle \dot{\hat{m}},\delta m\rangle + \int _0^1\langle \mathrm{ad}_v\hat{m},\delta m\rangle +\int _0^1\langle \hat{v},\delta m\rangle \end{aligned}$$

Thus the variation takes the form:

$$\begin{aligned} \partial _m\hat{{\mathscr {S}}}&= \langle \delta m(0),K\star m(0)\rangle +\langle \hat{m}(1),\delta m(1)\rangle \nonumber \\&\quad -\langle \hat{m}(0),\delta m(0)\rangle \nonumber \\&\quad - \int _0^1\langle \dot{\hat{m}},\delta m\rangle + \int _0^1\langle \mathrm{ad}_v\hat{m},\delta m\rangle +\int _0^1\langle \hat{v},\delta m\rangle \end{aligned}$$
(26)

For the variation of the augmented functional, \(\hat{{\mathscr {S}}}\), with respect to the image, I, we have:

$$\begin{aligned} \partial _I\hat{{\mathscr {S}}}&= \frac{1}{\sigma ^2}\sum _{i=0}^{M-1}\langle \delta I(t^i),I(t^i)-J^i\rangle \\&\quad + \frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} \bigg (\int _0^1 \langle \hat{I}, \partial _t(I+\epsilon \delta I) + \nabla (I+\epsilon \delta I)\cdot v \rangle \bigg ) \\&=\frac{1}{\sigma ^2}\sum _{i=0}^{M-1}\langle \delta I(t^i),I(t^i)-J^i\rangle + \int _0^1\langle \hat{I},\delta \dot{I}+ \nabla \delta I\cdot v\rangle \\&=\frac{1}{\sigma ^2}\sum _{i=0}^{M-1}\langle \delta I(t^i),I(t^i)-J^i\rangle + \int _0^1\langle \hat{I},\delta \dot{I}\rangle \\&\quad + \int _0^1\langle \hat{I},\nabla \delta I\cdot v\rangle \\&=\frac{1}{\sigma ^2}\sum _{i=0}^{M-1}\langle \delta I(t^i),I(t^i)-J^i\rangle + \langle \hat{I},\delta I\rangle \bigg |_{t=0}^{t=1} \\&\quad - \int _0^1 \langle \dot{\hat{I}},\delta I\rangle + \int _0^1\langle \hat{I},\nabla \delta I\cdot v\rangle \\&=\frac{1}{\sigma ^2}\sum _{i=0}^{M-1}\langle \delta I(t^i),I(t^i)-J^i\rangle + \langle \hat{I}(1),\delta I(1)\rangle \\&\quad - \langle \hat{I}(0),\delta I(0)\rangle \\&\quad - \int _0^1 \langle \dot{\hat{I}},\delta I\rangle + \int _0^1\langle \hat{I}v,\nabla \delta I\rangle \end{aligned}$$

Thus the variation takes the form:

$$\begin{aligned} \partial _I\hat{{\mathscr {S}}}&=\frac{1}{\sigma ^2}\sum _{i=0}^{M-1}\langle \delta I(t^i),I(t^i)-J^i\rangle + \langle \hat{I}(1),\delta I(1)\rangle \nonumber \\&\quad - \langle \hat{I}(0),\delta I(0)\rangle \nonumber \\&\quad - \int _0^1 \langle \dot{\hat{I}},\delta I\rangle - \int _0^1\langle \nabla \cdot (\hat{I}v),\delta I\rangle \end{aligned}$$
(27)

For the variation of the augmented functional, \(\hat{{\mathscr {S}}}\), with respect to the velocity, v, we have:

$$\begin{aligned} \partial _v \hat{{\mathscr {S}}}&= \frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} \bigg (\int _0^1 \langle \hat{m}, \dot{m} + \mathrm{ad}^*_{v+\epsilon \delta v}m \rangle \nonumber \\&\quad + \int _0^1 \langle \hat{I},\dot{I} + \nabla I\cdot (v + \epsilon \delta v)\rangle _{L^2} \nonumber \\&\quad + \int _0^1 \langle \hat{v}, m -L(v+\epsilon \delta v)\rangle \bigg )\nonumber \\&=\frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} \bigg (\int _0^1 \langle \mathrm{ad}_{v+\epsilon \delta v}\hat{m}, m \rangle \nonumber \\&\quad + \int _0^1 \langle \hat{I},\dot{I} + \nabla I\cdot (v + \epsilon \delta v)\rangle _{L^2} \nonumber \\&\quad + \int _0^1 \langle \hat{v}, m -L(v+\epsilon \delta v)\rangle \bigg )\nonumber \\&=\frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} \bigg (\int _0^1 \langle -\mathrm{ad}_{\hat{m}}(v+\epsilon \delta v), m \rangle \nonumber \\&\quad +\int _0^1 \langle \hat{I},\dot{I} + \nabla I\cdot (v + \epsilon \delta v)\rangle _{L^2} \nonumber \\&\quad + \int _0^1 \langle \hat{v}, m -L(v+\epsilon \delta v)\rangle \bigg )\nonumber \\&=\int _0^1 \langle -\mathrm{ad}_{\hat{m}}\delta v, m \rangle {+}\int _0^1\langle \hat{I},\nabla I \cdot \delta v\rangle {+} \int _0^1 \langle \hat{v},{-}L( \delta v)\rangle \nonumber \\&=\int _0^1 \langle -\mathrm{ad}^*_{\hat{m}} m ,\delta v\rangle +\int _0^1\langle \hat{I}\nabla I,\delta v\rangle - \int _0^1 \langle L\hat{v},\delta v\rangle \end{aligned}$$
(28)

Collecting all variations together:

$$\begin{aligned} \left. \begin{aligned} -\dot{\hat{m}}+\mathrm{ad}_v\hat{m}+\hat{v}&= 0\quad \\ -\dot{\hat{I}}-\nabla \cdot (\hat{I}v)&= 0\\ -\mathrm{ad}_{\hat{m}}^*m+\hat{I}\nabla I-L\hat{v}&=0 \end{aligned} \right\} \end{aligned}$$
(29)

subject to boundary condition,

$$\begin{aligned} \left. \begin{aligned} \hat{m}(1)&= 0\\ \hat{I}(1)&= 0\\ \end{aligned} \right\} \end{aligned}$$
(30)

and, adding jump conditions at the observed data points \(t^i\), for \(i=0,\cdots ,M-1\) (while integrating \(\hat{I}\) backwards), i.e., \(\hat{I}(t^{i+}) - \hat{I}(t^{i-}) = \frac{1}{\sigma ^2}(I(t^i)-J^i)\), or equivalently

$$\begin{aligned} \left. \begin{aligned} \hat{I}(t^{i-}) = \hat{I}(t^{i+})+ \delta ^i \quad \end{aligned} \right\} \end{aligned}$$
(31)

where \(\hat{I}(t^{i+})\) and \(\hat{I}(t^{i-})\) denote the values of the integrated \(\hat{I}\) just to the right and left, respectively, of the observed data point at \(t^i\). The jumps are \(\delta ^i = -\frac{1}{\sigma ^2}(I(t^i)-J^i)\) for \(i=0,\cdots ,M-1\).
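The bookkeeping of these jumps can be sketched in a few lines. The example below integrates \(\hat{I}\) backwards from t = 1 with \(\hat{I}(1)=0\) and applies the jump \(\delta ^i\) at each observation; transport of \(\hat{I}\) between observations is deliberately omitted (identity flow assumed), and the observation times and residuals are toy values.

```python
import numpy as np

# Sketch of the jump conditions in Eqs. (30)-(31): integrate the adjoint
# image I_hat backwards from t = 1 with I_hat(1) = 0, adding the jump
# delta^i = -(1/sigma^2) * (I(t^i) - J^i) when passing each observation.
sigma = 0.5
t_obs = [0.25, 0.6, 0.9]                                 # observation times t^i (toy)
residuals = [np.full(4, r) for r in (0.1, -0.2, 0.05)]   # I(t^i) - J^i (toy values)

I_hat = np.zeros(4)                                      # boundary condition I_hat(1) = 0
for ti, res in sorted(zip(t_obs, residuals), key=lambda p: -p[0]):
    # ... transport of I_hat from the previous time down to t^i would go here ...
    I_hat = I_hat + (-1.0 / sigma ** 2) * res            # jump: I_hat(t^i-) = I_hat(t^i+) + delta^i

# after passing all observations, I_hat(0) has accumulated every jump
```

With identity transport, the final value is simply the sum of the jumps, \(-\frac{1}{\sigma ^2}\sum _i (I(t^i)-J^i)\).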

Finally the variation of \(\hat{{\mathscr {S}}}\) with respect to \(\delta m(0)\) is:

$$\begin{aligned} \delta \hat{{\mathscr {S}}} = \langle K\star m(0)-\hat{m}(0),\delta m(0) \rangle \end{aligned}$$
(32)

and, the variation of \(\hat{{\mathscr {S}}}\) with respect to \(\delta I(0)\) is:

$$\begin{aligned} \delta \hat{{\mathscr {S}}} = \langle -\hat{I}(0),\delta I(0) \rangle \end{aligned}$$
(33)

Note that Equation set (29) can be written as:

$$\begin{aligned} \left. \begin{aligned} -\dot{\hat{m}}+\mathrm{ad}_v\hat{m}+ K\star (\hat{I} \nabla I - \mathrm{ad}^*_{\hat{m}}m)&= 0\quad \\ \dot{\hat{I}}+\nabla \cdot (\hat{I}v)&= 0 \end{aligned} \right\} \end{aligned}$$
(34)

or equivalently,

$$\begin{aligned} \left. \begin{aligned} -\dot{\hat{m}}+\mathrm{ad}_v\hat{m}-\mathrm{ad}^\dagger _{\hat{m}}v + K\star (\hat{I} \nabla I)&= 0\quad \\ \dot{\hat{I}}+\nabla \cdot (\hat{I}v)&= 0 \end{aligned} \right\} \end{aligned}$$
(35)

1.1 Backward Integration of Adjoint System

Note that the solution to the equation for \(\hat{I}\) with no jump conditions is:

$$\begin{aligned} \hat{I}(t)&= |D\phi _{t,1}|\hat{I}(1)\circ \phi _{t,1} \end{aligned}$$
(36)

With jumps in \(\hat{I}\) along the integral, the solution takes the form:

$$\begin{aligned} \hat{I}(t)&= |D\phi _{t,1}|\hat{I}(1)\circ \phi _{t,1} + \sum _{t^i>t} |D\phi _{t,t^i}|\delta ^i\circ \phi _{t,t^i} \end{aligned}$$
(37)

Notice that we can further simplify Eq. (37) using the splatting operator \(S_\phi (a) = |D\phi ^{-1}|\,a\circ \phi ^{-1}\):

$$\begin{aligned} \hat{I}(t)&= S_{\phi _{1,t}}(\hat{I}(1)) +\sum _{t^i>t} S_{\phi _{t^i,t}}(\delta ^i) \end{aligned}$$
(38)
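As a minimal illustration of the splatting operator, the sketch below evaluates \(S_\phi (a) = |D\phi ^{-1}|\,a\circ \phi ^{-1}\) in 1D for a periodic translation \(\phi (x)=x+c\), the simplest diffeomorphism, for which \(\phi ^{-1}(x)=x-c\) and \(|D\phi ^{-1}|=1\). A general diffeomorphism would require an interpolated inverse map and its Jacobian determinant; this sketch assumes the translation case only.

```python
import numpy as np

# Toy 1D version of the splatting operator in Eq. (38),
#   S_phi(a) = |D phi^{-1}| * (a o phi^{-1}),
# for a periodic translation phi(x) = x + c on the domain [0, 1).
def splat_translation(a, x, c):
    inv = (x - c) % 1.0                       # phi^{-1} on the periodic domain
    jac = np.ones_like(x)                     # |D phi^{-1}| = 1 for a translation
    return jac * np.interp(inv, x, a, period=1.0)

n = 128
x = np.linspace(0.0, 1.0, n, endpoint=False)
a = np.exp(-((x - 0.5) ** 2) / 0.01)          # a bump centered at x = 0.5
out = splat_translation(a, x, c=0.25)         # the bump is carried to x = 0.75
```

Since the shift c = 0.25 is an exact multiple of the grid spacing here, the interpolation is exact and the bump lands precisely at x = 0.75.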

1.2 Closed Form Update for I(0)

Looking closely at the original energy functional in (22), we notice that the second term is the only one that depends on I(0), since \(I(t^i) = I(0)\circ \phi _{t^i,0}\). Expanding the norm in the second term, we write:

$$\begin{aligned} {\mathscr {S}}(m(0),I(0))&= \frac{1}{2}\langle m(0),K\star m(0)\rangle _{L^2} \nonumber \\&\quad +\frac{1}{2\sigma ^2}\sum _{i=0}^{M-1}\int _\varOmega \langle I(0)\circ \phi _{t^i,0}(x)\nonumber \\&\quad -J^i(x), I(0)\circ \phi _{t^i,0}(x)-J^i(x)\rangle _{L^2}dx \end{aligned}$$
(39)

A change of variable, \(x=\phi _{0,t^i}(y)\) such that \(dx=|D\phi _{0,t^i}(y)|dy\) gives,

$$\begin{aligned}&{\mathscr {S}}(m(0),I(0)) = \frac{1}{2}\langle m(0),K\star m(0)\rangle _{L^2} \nonumber \\&\quad +\frac{1}{2\sigma ^2}\sum _{i=0}^{M-1}\int _\varOmega \langle I(0)(y)-J^i\circ \phi _{0,t^i}(y), I(0)(y)\nonumber \\&\quad -J^i\circ \phi _{0,t^i}(y)\rangle _{L^2} |D\phi _{0,t^i}(y)|dy \end{aligned}$$
(40)

which gives,

$$\begin{aligned}&{\mathscr {S}}(m(0),I(0)) \nonumber \\&\quad = \frac{1}{2}\langle m(0),K\star m(0)\rangle _{L^2} \nonumber \\&\qquad +\frac{1}{2\sigma ^2}\sum _{i=0}^{M-1} ||(I(0)-J^i\circ \phi _{0,t^i})\sqrt{|D\phi _{0,t^i}|}||^2 \end{aligned}$$
(41)

This implies the derivative with respect to I(0) becomes:

$$\begin{aligned} \partial _{I(0)}{\mathscr {S}}&= \sum _{i=0}^{M-1} \partial _{I(0)}||(I(0)-J^i\circ \phi _{0,t^i})\sqrt{|D\phi _{0,t^i}|}||^2\nonumber \\&=\sum _{i=0}^{M-1} \langle (I(0)-J^i\circ \phi _{0,t^i})\sqrt{|D\phi _{0,t^i}|},\sqrt{|D\phi _{0,t^i}|}\rangle \nonumber \\&=\sum _{i=0}^{M-1} (I(0)-J^i\circ \phi _{0,t^i})|D\phi _{0,t^i}| \end{aligned}$$
(42)

Setting Eq. (42) to zero at the optimum,

$$\begin{aligned}&\sum _{i=0}^{M-1} (I(0)-J^i\circ \phi _{0,t^i})|D\phi _{0,t^i}|=0\nonumber \\&\sum _{i=0}^{M-1} I(0)|D\phi _{0,t^i}| - \sum _{i=0}^{M-1}J^i\circ \phi _{0,t^i}|D\phi _{0,t^i}|=0\nonumber \\&I(0)\sum _{i=0}^{M-1} |D\phi _{0,t^i}|= \sum _{i=0}^{M-1}J^i\circ \phi _{0,t^i}|D\phi _{0,t^i}|\nonumber \\&I(0)=\frac{\sum _{i=0}^{M-1}J^i\circ \phi _{0,t^i}|D\phi _{0,t^i} |}{\sum _{i=0}^{M-1} |D\phi _{0,t^i}|} \end{aligned}$$
(43)
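The closed-form update in Eq. (43) is just a Jacobian-weighted average of the pulled-back observations, as the sketch below shows. Here the warped observations \(J^i\circ \phi _{0,t^i}\) and the Jacobian determinants are supplied directly as toy arrays; in practice they come from integrating the geodesic flow.

```python
import numpy as np

# Sketch of the closed-form template update in Eq. (43):
#   I(0) = sum_i (J^i o phi_{0,t^i}) |D phi_{0,t^i}|  /  sum_i |D phi_{0,t^i}|
warped_J = [np.array([1.0, 2.0, 3.0]),        # J^0 o phi_{0,t^0}  (toy values)
            np.array([1.2, 1.8, 3.2])]        # J^1 o phi_{0,t^1}
jac_dets = [np.array([1.0, 1.0, 1.0]),        # |D phi_{0,t^0}|
            np.array([0.5, 1.5, 1.0])]        # |D phi_{0,t^1}|

num = sum(J * d for J, d in zip(warped_J, jac_dets))
den = sum(jac_dets)
I0 = num / den    # Jacobian-weighted average of the pulled-back observations
```

Where a Jacobian determinant is larger, the corresponding observation contributes more to the template at that voxel, which is exactly the weighting that the change of variables in Eq. (40) produces.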

Appendix 3: Derivations for Hierarchical Geodesic Model

1.1 Group Geodesic Initial Conditions in Hierarchical Geodesic Model (HGM)

At the group level (see Fig. 4), the idea is to estimate the average geodesic, \(\psi (t)\), that is representative of the population of geodesic trends denoted by the initial intercept-slope pairs, \((J_i,n_i)\), for N individuals, \(i=1, \ldots , N\). The required estimate of \(\psi (t)\) must span the entire range of time over which measurements are made for the population, and must minimize the residual diffeomorphisms \(\rho _i\) from \(\psi (t)\).

The augmented Lagrangian for the group geodesic as presented in Eq. (8) is:

$$\begin{aligned} \tilde{{\mathscr {E}}}&= {\mathscr {E}}+\int _0^1 \langle \hat{m},\dot{m}+\mathrm{ad}^*_{v} m \rangle _{L^2} dt+ \int _0^1 \langle \hat{I},\dot{I}+ \nabla I\cdot v\rangle _{L^2} dt\\&\quad +\int _0^1 \langle \hat{v},m-Lv\rangle _{L^2} dt\\&\quad +\sum _{i=1}^{N}\int _0^1 \langle \hat{p}_i,\dot{p}_i+\mathrm{ad}^*_{u_i} p_i \rangle _{L^2} ds{+}\int _0^1 \langle \hat{u}_i, p_i{-}Lu_i\rangle _{L^2} ds\\&\quad +\int _0^1 \langle \hat{\rho }_i,\dot{\rho }_i\circ \rho ^{-1}_i-u_i \rangle _{L^2}ds. \end{aligned}$$

The added constraints in the form of integrals represent geodesic constraints on \(\psi (t)\) and \(\rho _i\) for \(i=1, \ldots , N\). Notice that \(\sigma ^2_I\) and \(\sigma ^2_S\) represent the variances corresponding to the likelihoods of the intercept and slope terms, respectively. Also, \(\rho _i\cdot I(t_i)\) is the group action of the residual diffeomorphism \(\rho _i\) on the image, \(I(t_i)\), and \(\rho _i\cdot m(t_i)\) is its group action on the momenta, \(m(t_i)\); the group action on momenta coincides with the co-adjoint transport in the group of diffeomorphisms. This optimization problem corresponds to jointly estimating the group geodesic flow, \(\psi \), the residual geodesic flows, \(\rho _i\), and the group baseline template, I(0).

The variation of the energy functional \(\tilde{{\mathscr {E}}}\) with respect to all time-dependent variables results in ODEs in the form of dependent adjoint equations with boundary conditions and added jump conditions. We first report the derivatives for the residual geodesics, followed by those for the group geodesic.

1.1.1 For the Residual Geodesics, \(\rho _i\) Parameterized by s

For the sake of clarity we omit the subscript i representing each individual's residual. For each of the residual geodesics, the derivation proceeds as follows:

For the variation of the augmented energy, \(\tilde{{\mathscr {E}}}\), with respect to the momenta, p, we have:

$$\begin{aligned} \partial _p\tilde{{\mathscr {E}}}&= \frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} \bigg (\int _0^1 \langle \hat{p}, \partial _s(p+\epsilon \delta p ) + \mathrm{ad}^*_u(p+\epsilon \delta p) \rangle \\&\quad + \int _0^1 \langle \hat{u}, p+\epsilon \delta p -Lu\rangle \bigg )\\&= \int _0^1\langle \hat{p},\delta \dot{p}+\mathrm{ad}_u^*\delta p\rangle +\int _0^1\langle \hat{u},\delta p\rangle \\&= \int _0^1\langle \hat{p},\delta \dot{p}\rangle + \int _0^1\langle \hat{p},\mathrm{ad}_u^*\delta p\rangle +\int _0^1\langle \hat{u},\delta p\rangle \\&= \langle \hat{p},\delta p\rangle \bigg |_{s=0}^{s=1} - \int _0^1\langle \dot{\hat{p}},\delta p\rangle + \int _0^1\langle \mathrm{ad}_u\hat{p},\delta p\rangle \\&\quad +\int _0^1\langle \hat{u},\delta p\rangle \end{aligned}$$

Thus the variation takes the form:

$$\begin{aligned} \partial _p\tilde{{\mathscr {E}}}&= \langle \hat{p}(1),\delta p(1)\rangle -\langle \hat{p}(0),\delta p(0)\rangle \nonumber \\&\quad - \int _0^1\langle \dot{\hat{p}},\delta p\rangle + \int _0^1\langle \mathrm{ad}_u\hat{p},\delta p\rangle +\int _0^1\langle \hat{u},\delta p\rangle \end{aligned}$$
(44)

For the variation of the energy, \(\tilde{{\mathscr {E}}}\), with respect to \(\rho \), we have:

$$\begin{aligned} \partial _\rho \tilde{{\mathscr {E}}}&= \frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} \bigg ( \frac{1}{2\sigma ^2_I}\langle I(t^i)\circ \rho _\epsilon ^{-1}-J^i,I(t^i)\circ \rho _\epsilon ^{-1}-J^i\rangle \\&\quad + \frac{1}{2\sigma ^2_S}\langle \mathrm{Ad}^*_{\rho _\epsilon ^{-1}}m(t^i)-n^i, K\star (\mathrm{Ad}^*_{\rho _\epsilon ^{-1}}m(t^i)-n^i)\rangle \\&\quad + \int _0^1 \langle \hat{\rho }, \dot{\rho }_\epsilon \rho _\epsilon ^{-1}-u\rangle \bigg )\\&=\frac{1}{\sigma ^2_I}\langle \delta \rho ,(I(t^i)\circ \rho ^{-1} - J^i)\nabla (I(t^i)\circ \rho ^{-1})\rangle \\&\quad + \frac{1}{\sigma ^2_S}\langle \delta \mathrm{Ad}^*_{\rho ^{-1}}m(t^i), K\star (\mathrm{Ad}^*_{\rho ^{-1}}m(t^i)-n^i)\rangle \\&\quad + \int _0^1 \langle \hat{\rho }, (\dot{\delta \rho })\rho ^{-1}-(\dot{\rho }\rho ^{-1})(\delta \rho \rho ^{-1})\rangle \\ \partial _\rho \tilde{{\mathscr {E}}}&=\frac{1}{\sigma ^2_I}\langle \delta \rho ,(I(t^i)\circ \rho ^{-1} - J^i)\nabla (I(t^i)\circ \rho ^{-1})\rangle \\&\quad + \frac{1}{\sigma ^2_S}\langle -\mathrm{ad}^*_{\delta \rho \circ \rho ^{-1}}\mathrm{Ad}_\rho ^*m(t^i), K\star (\mathrm{Ad}^*_{\rho ^{-1}}m(t^i)-n^i)\rangle \\&\quad + \int _0^1 \langle \hat{\rho }, (\frac{d}{ds}{\delta \rho }\rho ^{-1})-\mathrm{ad}_u(\delta \rho \rho ^{-1})\rangle \\&=\frac{1}{\sigma ^2_I}\langle \delta \rho ,(I(t^i)\circ \rho ^{-1} - J^i)\nabla (I(t^i)\circ \rho ^{-1})\rangle \\&\quad + \frac{1}{\sigma ^2_S}\langle \mathrm{Ad}_\rho ^*m(t^i),-\mathrm{ad}_{\delta \rho \circ \rho ^{-1}} K\star (\mathrm{Ad}^*_{\rho ^{-1}}m(t^i)-n^i)\rangle \\&\quad + \int _0^1 \langle -\frac{d}{ds}\hat{\rho }-\mathrm{ad}^*_u\hat{\rho }, \delta \rho \rho ^{-1}\rangle \end{aligned}$$

Thus the variation takes the form:

$$\begin{aligned} \partial _\rho \tilde{{\mathscr {E}}}&=\frac{1}{\sigma ^2_I}\langle \delta \rho ,(I(t^i)\circ \rho ^{-1} - J^i)\nabla (I(t^i)\circ \rho ^{-1})\rangle \nonumber \\&\quad + \frac{1}{\sigma ^2_S}\langle \mathrm{Ad}_\rho ^*m(t^i),\mathrm{ad}_{ K\star (\mathrm{Ad}^*_{\rho ^{-1}}m(t^i)-n^i)}\delta \rho \circ \rho ^{-1} \rangle \nonumber \\&\quad + \int _0^1 \langle -\frac{d}{ds}\hat{\rho }-\mathrm{ad}^*_u\hat{\rho }, \delta \rho \rho ^{-1}\rangle \nonumber \\&=\frac{1}{\sigma ^2_I}\langle \delta \rho ,(I(t^i)\circ \rho ^{-1} - J^i)\nabla (I(t^i)\circ \rho ^{-1})\rangle \nonumber \\&\quad + \frac{1}{\sigma ^2_S}\langle \mathrm{ad}^*_{K\star (\mathrm{Ad}^*_{\rho ^{-1}}m(t^i)-n^i)}\mathrm{Ad}_\rho ^*m(t^i),\delta \rho \circ \rho ^{-1}\rangle \nonumber \\&\quad + \int _0^1 \langle -\frac{d}{ds}\hat{\rho }-\mathrm{ad}^*_u\hat{\rho }, \delta \rho \rho ^{-1}\rangle \end{aligned}$$
(45)

For the variation of the energy, \(\tilde{{\mathscr {E}}}\), with respect to \(u\), we have:

$$\begin{aligned} \partial _u \tilde{{\mathscr {E}}}&= \frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} \bigg (\int _0^1 \langle \hat{p}, \dot{p} + \mathrm{ad}^*_{u+\epsilon \delta u}p \rangle \nonumber \\&\quad + \int _0^1 \langle \hat{u}, p -L(u+\epsilon \delta u)\rangle \nonumber \\&\quad + \int _0^1 \langle \hat{\rho }, \dot{\rho }\rho ^{-1} -(u+\epsilon \delta u)\rangle \bigg )\nonumber \\&=\int _0^1 \langle -\mathrm{ad}_{\hat{p}}\delta u, p \rangle + \int _0^1 \langle \hat{u},-L( \delta u)\rangle +\int _0^1 \langle \hat{\rho }, \delta u\rangle \nonumber \\&=\int _0^1 \langle -\mathrm{ad}^*_{\hat{p}} p ,\delta u\rangle - \int _0^1 \langle L\hat{u},\delta u\rangle + \int _0^1\langle \hat{\rho }, \delta u\rangle \end{aligned}$$
(46)

Collecting variations together, the resulting adjoint systems for the residual geodesics for \(i=1,\ldots ,N\) are:

$$\begin{aligned} \left. \begin{aligned} \hat{u}_i-\dot{\hat{p}}_i+\mathrm{ad}_{u_i}\hat{p}_i&=0\quad \\ \hat{\rho }_i - L\hat{u}_i - \mathrm{ad}^*_{\hat{p}_i}p_i&=0\\ -\dot{\hat{\rho }}_i-\mathrm{ad}^*_{u_i}\hat{\rho }_i&=0 \end{aligned} \right\} \end{aligned}$$
(47)

with boundary conditions:

$$\begin{aligned} \left. \begin{aligned} \hat{p}_i(1)=&0, \mathrm{\ and\ \ } \\ \hat{\rho }_i(1) =&-\frac{1}{\sigma ^2_I}\big (I(t_i)\circ \rho _i^{-1} - J_i\big )\nabla (I(t_i)\circ \rho ^{-1}_i) \\ {}&-\frac{1}{\sigma ^2_S}\, \mathrm{ad}^*_{K\star [\mathrm{Ad}^*_{{\rho _i}^{-1}}m(t_i)-n_i ]}\mathrm{Ad}^*_{\rho ^{-1}_i} m(t_i) \end{aligned} \right\} \end{aligned}$$
(48)

The gradient for the update of the initial momentum, \(p_i(0)\), of each residual diffeomorphism is:

$$\begin{aligned} \delta _{p_i(0)}\tilde{{\mathscr {E}}}&= \frac{1}{\sigma ^2_I}K\star p_i(0) - \hat{p}_i(0). \end{aligned}$$
(49)
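To make the update rule concrete, the following is a minimal numerical sketch of one gradient-descent step on the initial residual momentum using Eq. (49). It is not the authors' implementation: the 1-D domain, the dense Gaussian stand-in for the smoothing kernel \(K\), and all names (`gaussian_kernel_1d`, `update_residual_momentum`, the step size) are illustrative assumptions; in practice \(\hat{p}_i(0)\) comes from the backward sweep of the adjoint system (47)–(48).

```python
import numpy as np

def gaussian_kernel_1d(n, sigma=2.0):
    """Toy smoothing operator: a dense, row-normalized Gaussian matrix
    standing in for the translation-invariant kernel K of the paper."""
    x = np.arange(n)
    K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * sigma ** 2))
    return K / K.sum(axis=1, keepdims=True)

def update_residual_momentum(p0, p_hat0, K, sigma_I, step=0.1):
    """One gradient-descent step on p_i(0) following Eq. (49):
    grad = (1/sigma_I^2) K * p_i(0) - p_hat_i(0)."""
    grad = (1.0 / sigma_I ** 2) * (K @ p0) - p_hat0
    return p0 - step * grad

n = 16
K = gaussian_kernel_1d(n)
p0 = np.zeros(n)            # initial residual momentum
p_hat0 = np.ones(n)         # adjoint value at s = 0 (stand-in)
p0_new = update_residual_momentum(p0, p_hat0, K, sigma_I=1.0)
# starting from p0 = 0, the step moves p0 by +step * p_hat0, i.e. to 0.1
```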

1.1.2 For the Group Geodesic, Parameterized by \(t\)

The derivation of the adjoint system for the group geodesic is exactly the same as that for individual geodesic regression, except for the extra slope-matching term, which results in added jumps in the adjoint equation for the momenta.

For the variation of the energy, \(\tilde{{\mathscr {E}}}\), with respect to \(m\), we have:

$$\begin{aligned} \partial _m\tilde{{\mathscr {E}}}&= \langle \delta m(0),K\star m(0)\rangle \\&\quad +\frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} \frac{1}{2\sigma ^2_S}\sum _{i=0}^{M-1}\langle \mathrm{Ad}^*_{\rho ^{-1}}(m(t^i)\\&\quad +\epsilon \delta m(t^i)){-}n^i, K\star (\mathrm{Ad}^*_{\rho ^{-1}}(m(t^i){+}\epsilon \delta m(t^i)){-}n^i)\rangle \\&\quad + \int _0^1 \langle \hat{m}, \partial _t(m+\epsilon \delta m) + \mathrm{ad}^*_v(m+\epsilon \delta m) \rangle \\&\quad + \int _0^1 \langle \hat{v}, m+\epsilon \delta m -Lv\rangle \\&= \langle \delta m(0),K\star m(0)\rangle \\&\quad + \frac{1}{\sigma ^2_S}\sum _{i=0}^{M-1}\langle \mathrm{Ad}^*_{\rho ^{-1}}\delta m(t^i),K\star (\mathrm{Ad}^*_{\rho ^{-1}}m(t^i)-n^i)\rangle \\&\quad +\int _0^1\langle \hat{m},\delta \dot{m}+\mathrm{ad}_v^*\delta m\rangle +\int _0^1\langle \hat{v},\delta m\rangle \\&=\langle \delta m(0),K\star m(0)\rangle \\&\quad + \frac{1}{\sigma ^2_S}\sum _{i=0}^{M-1}\langle \delta m(t^i),\mathrm{Ad}_{\rho ^{-1}} K\star (\mathrm{Ad}^*_{\rho ^{-1}}m(t^i)-n^i)\rangle \\&\quad +\int _0^1\langle \hat{m},\delta \dot{m}\rangle + \int _0^1\langle \hat{m},\mathrm{ad}_v^*\delta m\rangle +\int _0^1\langle \hat{v},\delta m\rangle \\&= \langle \delta m(0),K\star m(0)\rangle \\&\quad + \frac{1}{\sigma ^2_S}\sum _{i=0}^{M-1}\langle \delta m(t^i),\mathrm{Ad}_{\rho ^{-1}} K\star (\mathrm{Ad}^*_{\rho ^{-1}}m(t^i)-n^i)\rangle \\&\quad +\langle \hat{m},\delta m\rangle \bigg |_{t=0}^{t=1} - \int _0^1\langle \dot{\hat{m}},\delta m\rangle \\&\quad + \int _0^1\langle \mathrm{ad}_v\hat{m},\delta m\rangle +\int _0^1\langle \hat{v},\delta m\rangle \end{aligned}$$

Thus the variation takes the form:

$$\begin{aligned} \partial _m\tilde{{\mathscr {E}}}&= \langle \delta m(0),K\star m(0)\rangle \nonumber \\&\quad + \frac{1}{\sigma ^2_S}\sum _{i=0}^{M-1}\langle \delta m(t^i),\mathrm{Ad}_{\rho ^{-1}} K\star (\mathrm{Ad}^*_{\rho ^{-1}}m(t^i)-n^i)\rangle \nonumber \\&\quad +\langle \hat{m}(1),\delta m(1)\rangle -\langle \hat{m}(0),\delta m(0)\rangle \nonumber \\&\quad - \int _0^1\langle \dot{\hat{m}},\delta m\rangle + \int _0^1\langle \mathrm{ad}_v\hat{m},\delta m\rangle +\int _0^1\langle \hat{v},\delta m\rangle \end{aligned}$$
(50)

For the variation of the energy, \(\tilde{{\mathscr {E}}}\), with respect to the image, \(I\), we have:

$$\begin{aligned} \partial _I\tilde{{\mathscr {E}}}&= \frac{1}{\sigma ^2_I}\sum _{i=0}^{M-1}\langle \delta I(t^i),I(t^i)-J^i\rangle \\&\quad + \frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} \bigg (\int _0^1 \langle \hat{I}, \partial _t(I+\epsilon \delta I) + \nabla (I+\epsilon \delta I)\cdot v \rangle \bigg )\\&=\frac{1}{\sigma ^2_I}\sum _{i=0}^{M-1}\langle \delta I(t^i),I(t^i)-J^i\rangle + \int _0^1\langle \hat{I},\delta \dot{I}+ \nabla \delta I\cdot v\rangle \\&=\frac{1}{\sigma ^2_I}\sum _{i=0}^{M-1}\langle \delta I(t^i),I(t^i)-J^i\rangle + \int _0^1\langle \hat{I},\delta \dot{I}\rangle \\&\quad + \int _0^1\langle \hat{I},\nabla \delta I\cdot v\rangle \\&=\frac{1}{\sigma ^2_I}\sum _{i=0}^{M-1}\langle \delta I(t^i),I(t^i)-J^i\rangle + \langle \hat{I},\delta I\rangle \bigg |_{t=0}^{t=1} \\&\quad - \int _0^1 \langle \dot{\hat{I}},\delta I\rangle + \int _0^1\langle \hat{I},\nabla \delta I\cdot v\rangle \\&=\frac{1}{\sigma ^2_I}\sum _{i=0}^{M-1}\langle \delta I(t^i),I(t^i)-J^i\rangle + \langle \hat{I}(1),\delta I(1)\rangle \\&\quad - \langle \hat{I}(0),\delta I(0)\rangle - \int _0^1 \langle \dot{\hat{I}},\delta I\rangle \\&\quad + \int _0^1\langle \hat{I}v,\nabla \delta I\rangle \end{aligned}$$

Thus, the variation takes the form:

$$\begin{aligned} \partial _I\tilde{{\mathscr {E}}}&=\frac{1}{\sigma ^2_I}\sum _{i=0}^{M-1}\langle \delta I(t^i),I(t^i)-J^i\rangle \nonumber \\&\quad + \langle \hat{I}(1),\delta I(1)\rangle - \langle \hat{I}(0),\delta I(0)\rangle \nonumber \\&\quad - \int _0^1 \langle \dot{\hat{I}},\delta I\rangle - \int _0^1\langle \nabla \cdot (\hat{I}v),\delta I\rangle \end{aligned}$$
(51)

For the variation of the energy, \(\tilde{{\mathscr {E}}}\), with respect to the velocity, \(v\), we have:

$$\begin{aligned} \partial _v \tilde{{\mathscr {E}}}&= \frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} \bigg (\int _0^1 \langle \hat{m}, \dot{m} + \mathrm{ad}^*_{v+\epsilon \delta v}m \rangle \nonumber \\&\quad + \int _0^1 \langle \hat{I},\dot{I} + \nabla I\cdot (v + \epsilon \delta v)\rangle _{L^2} \nonumber \\&\quad + \int _0^1 \langle \hat{v}, m -L(v+\epsilon \delta v)\rangle \bigg )\nonumber \\&=\frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} \bigg (\int _0^1 \langle \mathrm{ad}_{v+\epsilon \delta v}\hat{m}, m \rangle \nonumber \\&\quad + \int _0^1 \langle \hat{I},\dot{I} + \nabla I\cdot (v + \epsilon \delta v)\rangle _{L^2} \nonumber \\&\quad + \int _0^1 \langle \hat{v}, m -L(v+\epsilon \delta v)\rangle \bigg )\nonumber \\&=\frac{\partial }{\partial \epsilon }\bigg |_{\epsilon =0} \bigg (\int _0^1 \langle -\mathrm{ad}_{\hat{m}}(v+\epsilon \delta v), m \rangle \nonumber \\&\quad + \int _0^1 \langle \hat{I},\dot{I} + \nabla I\cdot (v + \epsilon \delta v)\rangle _{L^2} \nonumber \\&\quad + \int _0^1 \langle \hat{v}, m -L(v+\epsilon \delta v)\rangle \bigg )\nonumber \\&=\int _0^1 \langle -\mathrm{ad}_{\hat{m}}\delta v, m \rangle +\int _0^1\langle \hat{I},\nabla I {\cdot } \delta v\rangle + \int _0^1 \langle \hat{v},{-}L( \delta v)\rangle \nonumber \\&=\int _0^1 \langle -\mathrm{ad}^*_{\hat{m}} m ,\delta v\rangle +\int _0^1\langle \hat{I}\nabla I,\delta v\rangle - \int _0^1 \langle L\hat{v},\delta v\rangle \end{aligned}$$
(52)

Collecting all variations together, the resulting adjoint system for the group geodesic is:

$$\begin{aligned} \left. \begin{aligned} -\dot{\hat{m}}+\mathrm{ad}_v\hat{m}+\hat{v}&= 0\\ -\dot{\hat{I}}-\nabla \cdot (\hat{I}v)&= 0\\ -\mathrm{ad}_{\hat{m}}^*m+\hat{I}\nabla I-L\hat{v}&=0 \end{aligned} \right\} \end{aligned}$$
(53)

with boundary conditions:

$$\begin{aligned} \begin{aligned} \hat{I}(1) = 0, \mathrm{\ and\ \ } \hat{m}(1) = 0, \end{aligned} \end{aligned}$$
(54)

with added jumps at the measurement times, \(t^i\), such that

$$\begin{aligned} \left. \begin{aligned} \hat{I}(t^{i+})-\hat{I}(t^{i-}) =\frac{1}{\sigma ^2_I}|D\rho _i|(I(t_i)\circ \rho _i^{-1} - J_i)\circ \rho _i \\ \hat{m}(t^{i+})-\hat{m}(t^{i-}) =\frac{1}{\sigma ^2_S}\mathrm{Ad}_{\rho _i^{-1}}\big (K\star (\mathrm{Ad}^*_{\rho _i^{-1}} m(t_i) - n_i) \big ) \end{aligned} \right\} \end{aligned}$$
(55)
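The jump conditions above mean that the adjoint variables are integrated backward in time from the terminal conditions (54), with a discontinuity added each time a measurement is crossed. The following is a minimal scalar sketch of that backward sweep, not the authors' implementation: the backward-Euler scheme, the function names, and the scalar stand-in for \(\hat{m}\) are illustrative assumptions, with `rhs` standing in for the smooth part of the adjoint ODE and `jumps` for the right-hand sides of Eq. (55).

```python
import numpy as np

def integrate_adjoint_with_jumps(t_grid, jump_times, jumps, rhs):
    """Backward-Euler integration of a scalar adjoint variable m_hat on
    [0, 1] with terminal condition m_hat(1) = 0 (Eq. 54).

    `rhs(t, m_hat)` is the smooth part of d m_hat / dt; `jumps` maps each
    measurement time to the jump value m_hat(t^{i+}) - m_hat(t^{i-})
    from Eq. (55)."""
    m_hat = 0.0                                  # terminal condition
    for k in range(len(t_grid) - 1, 0, -1):
        t_hi, t_lo = t_grid[k], t_grid[k - 1]
        # crossing a measurement from t^{i+} down to t^{i-} removes the jump
        for tj in jump_times:
            if t_lo < tj <= t_hi:
                m_hat -= jumps[tj]
        m_hat -= (t_hi - t_lo) * rhs(t_lo, m_hat)  # backward Euler step
    return m_hat

t_grid = np.linspace(0.0, 1.0, 11)
# one measurement at t = 0.5 with jump value 2.0, and zero smooth dynamics:
m_hat0 = integrate_adjoint_with_jumps(t_grid, [0.5], {0.5: 2.0},
                                      lambda t, m: 0.0)
# with rhs = 0, only the jump contributes, so m_hat(0) = -2.0
```

The same sweep, with vector- or image-valued states and the actual right-hand sides of (53), yields \(\hat{m}(0)\) and \(\hat{I}(0)\) for the gradient computation.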

Finally, the gradient for the update of the initial group momentum is:

$$\begin{aligned} \delta _{m(0)}\tilde{{\mathscr {E}}}&= K\star m(0)-\hat{m}(0) \end{aligned}$$
(56)
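As a sanity check on Eq. (56), the following toy sketch iterates the corresponding gradient-descent update. It is an illustrative assumption, not the authors' code: the kernel \(K\) is replaced by the identity and the adjoint \(\hat{m}(0)\) is held fixed, in which case the fixed point of the update is \(m(0) = \hat{m}(0)\).

```python
import numpy as np

def group_momentum_step(m0, m_hat0, step=0.2):
    """One gradient-descent step on the initial group momentum m(0),
    following Eq. (56) with the simplifying assumption K = Id:
    grad = K * m(0) - m_hat(0) = m(0) - m_hat(0)."""
    grad = m0 - m_hat0
    return m0 - step * grad

m0 = np.zeros(4)
m_hat0 = np.array([1.0, -1.0, 0.5, 0.0])   # fixed adjoint value (stand-in)
for _ in range(200):
    m0 = group_momentum_step(m0, m_hat0)
# the iteration contracts toward the stationary point m(0) = m_hat(0)
```

In the full algorithm, \(\hat{m}(0)\) is recomputed by the backward adjoint sweep after every update, so this fixed-adjoint convergence is only a caricature of one inner step.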


Cite this article

Singh, N., Hinkle, J., Joshi, S. et al. Hierarchical Geodesic Models in Diffeomorphisms. Int J Comput Vis 117, 70–92 (2016). https://doi.org/10.1007/s11263-015-0849-2
