Intrinsic Polynomials for Regression on Riemannian Manifolds

Hinkle, Jacob; Fletcher, P. Thomas; Joshi, Sarang

doi:10.1007/s10851-013-0489-5

Published: 22 February 2014

Volume 50, pages 32–52, (2014)

Journal of Mathematical Imaging and Vision

Abstract

We develop a framework for polynomial regression on Riemannian manifolds. Unlike recently developed spline models on Riemannian manifolds, Riemannian polynomials offer the ability to model parametric polynomials of all integer orders, odd and even. An intrinsic adjoint method is employed to compute variations of the matching functional, and polynomial regression is accomplished using a gradient-based optimization scheme. We apply our polynomial regression framework in the context of shape analysis in Kendall shape space as well as in diffeomorphic landmark space. Our algorithm is shown to be particularly convenient in Riemannian manifolds with additional symmetry, such as Lie groups and homogeneous spaces with right or left invariant metrics. As a particularly important example, we also apply polynomial regression to time-series imaging data using a right invariant Sobolev metric on the diffeomorphism group. The results show that Riemannian polynomials provide a practical model for parametric curve regression, while offering increased flexibility over geodesics.

1 Introduction

Comparative studies are essential to biomedical statistical analysis. In the context of shape, such analyses are used to discriminate between healthy and disease states based on observations of anatomical shapes within individuals in the two populations [35]. Commonly, in these methods the shape data are modelled on a Riemannian manifold and intrinsic coordinate-free manifold-based methods are used [8]. This prevents bias due to arbitrary choice of coordinates and avoids the influence of unwanted effects. For instance, by modelling shapes with a representation incapable of representing scale and rotation of an object and using intrinsic manifold-based methods, scale and rotation are guaranteed not to affect the analysis [19].

Many conditions such as developmental disorders and neurodegeneration are characterized not only by shape characteristics, but by abnormal trends in anatomical shapes over time. Thus it is often the temporal dependence of shape that is most useful for comparative shape analysis. The field of regression analysis involves studying the connection between independent variables and observed responses [34]. In particular, this includes the study of temporal trends in observed data.

In this work, we extend the recently developed geodesic regression model [12] to higher order polynomials using intrinsic Riemannian manifold-based methods. We show that this Riemannian polynomial model is able to provide increased flexibility over geodesics, while remaining in the parametric regression setting. The increase in flexibility is particularly important, as it enables a more accurate description of shape trends and, ultimately, more useful comparative regression analysis.

While our primary motivation is shape analysis, the Riemannian polynomial model is applicable in a variety of applications. For instance, directional data is commonly modelled as points on the sphere $\mathbb{S}^{2}$, and video sequences representing human activity are modelled in Grassmannian manifolds [36].

In computational anatomy applications, the primary objects of interest are elements of a group of symmetries acting on the space of observable data. For instance, rigid motion is studied using the groups SO(3) and SE(3), acting on a space of landmark points or scalar images. Non-rigid motion and growth is modelled using infinite-dimensional diffeomorphism groups, such as in the currents framework [37] for unlabelled landmarks or the large deformation diffeomorphic metric mapping (LDDMM) framework of deforming images [30]. We show that in the presence of a group action, optimization of our polynomial regression model using an adjoint method is particularly convenient.

This work is an extension of the Riemannian polynomial regression framework first presented by Hinkle et al. [15]. In Sects. 5–7, we give a new derivation of polynomial regression for Lie groups and Lie group actions with Riemannian metrics. By performing the adjoint optimization directly in the Lie algebra, the computations in these spaces are greatly simplified over the general formulation. We show how this Lie group formulation can be used to perform polynomial regression on the space of images acted on by groups of diffeomorphisms.

1.1 Regression Analysis and Curve-Fitting

The study of the relationship between measured data and descriptive variables is known as the field of regression analysis. As with most statistical techniques, regression analyses can be broadly divided into two classes: parametric and non-parametric. The most widely used parametric regression methods are linear and polynomial regression in Euclidean space, wherein a linear or polynomial function is fit in a least-squares fashion to observed data. Such methods are the staple of modern data analysis. The most common non-parametric regression approaches are kernel-based methods and spline smoothing approaches, which provide great flexibility in the class of regression functions. However, their non-parametric nature presents a challenge for inference, for example when one wishes to perform a hypothesis test to determine whether the trend for one group of data is significantly different from that of another group.

In previous work, non-parametric kernel-based and spline-based methods have been extended to observations that lie on a Riemannian manifold with some success [8, 18, 22, 26], but intrinsic parametric regression on Riemannian manifolds has received limited attention. Recently, Fletcher [12] and Niethammer et al. [31] have each independently developed a form of parametric regression, geodesic regression, which generalizes the notion of linear regression to Riemannian manifolds. Geodesic models are useful, but are limited by their lack of flexibility when modelling complex trends.

Fletcher [12] defines a geodesic regression model by introducing a manifold-valued random variable Y,

$$\begin{aligned} Y=\operatorname{Exp}\bigl(\operatorname{Exp}(p,Xv),\epsilon\bigr), \end{aligned}$$

(1)

where $p\in M$ is an initial point and $v\in T_{p}M$ an initial velocity. The geodesic curve $\operatorname{Exp}(p,Xv)$ then relates the independent variable $X\in\mathbb{R}$ to the dependent random variable Y, via this equation and the Gaussian random vector $\epsilon\in T_{\operatorname{Exp}(p,Xv)}M$. In this paper, we extend this model to a polynomial regression model

$$\begin{aligned} Y=\operatorname{Exp}\bigl(\gamma(X),\epsilon\bigr), \end{aligned}$$

(2)

where the curve γ(X) is a Riemannian polynomial of integer order k. In the case that M is Euclidean space, this model is simply

$$\begin{aligned} Y=p + \sum_{i=1}^k \frac{v_i}{i!} X^i + \epsilon, \end{aligned}$$

(3)

where the point p and vectors $v_i$ constitute the parameters of our model.
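
As a small illustration of the Euclidean special case in Eq. (3), the noise-free curve can be evaluated directly from its parameters. This is only a sketch: the function name `euclidean_polynomial` and the example parameter values are assumptions made here for illustration and do not come from the paper.

```python
from math import factorial

def euclidean_polynomial(p, vs, x):
    """Evaluate p + sum_i vs[i-1] * x**i / i! for a scalar x (Eq. 3 without the noise term)."""
    point = list(p)
    for i, v in enumerate(vs, start=1):
        coeff = x ** i / factorial(i)
        for d in range(len(point)):
            point[d] += coeff * v[d]
    return point

# Hypothetical parameters: start at the origin, unit velocity along the
# first axis, constant acceleration of 2 along the second axis.
p = [0.0, 0.0]
vs = [[1.0, 0.0], [0.0, 2.0]]
curve = [euclidean_polynomial(p, vs, x) for x in (0.0, 1.0, 2.0)]
```

With the factorial scaling, each vector in `vs` is the corresponding derivative of the curve at X=0, which is the role the initial conditions play in the Riemannian setting.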

In this work we use the common term regression to describe methods of fitting polynomial curves using a sum of squared error penalty function. In Euclidean spaces, this is equivalent to solving a maximum likelihood estimation problem using a Gaussian noise model for the observed data. In Riemannian manifolds, the situation is more nuanced, as there is no consensus on how to define Gaussian distributions on general Riemannian manifolds, and in general the least-squares penalty may not correspond to a log likelihood. Many of the examples we will present are symmetric spaces: Kendall shape space in two dimensions, the rotation group, and the sphere, for instance. As Fletcher [12, Sect. 4] explains, least-squares regression in symmetric spaces does, in fact, correspond to maximum likelihood estimation of model parameters, using a natural definition of Gaussian distribution.

1.2 Previous Work: Cubic Splines

Noakes et al. [32] first introduced the notion of Riemannian cubic splines. They fix the endpoints $y_{0},y_{1}\in M$ of a curve, as well as the derivatives of the curve at those points, $y_{0}'\in T_{y_{0}}M, y_{1}'\in T_{y_{1}}M$. A Riemannian cubic spline is then defined as any differentiable curve γ:[0,1]→M taking on those endpoints and derivatives and minimizing

$$\begin{aligned} \varPhi(\gamma) = \int_0^1 \biggl\langle \nabla_{\frac{d}{dt}\gamma}\frac{d}{dt}\gamma(t), \nabla_{\frac{d}{dt}\gamma}\frac{d}{dt}\gamma(t) \biggr\rangle dt. \end{aligned}$$

(4)

As is shown by Noakes et al. [14, 32], between endpoints, cubic splines satisfy the following Euler-Lagrange equation:

$$\begin{aligned} (\nabla_{\frac{d}{dt}\gamma})^3\frac{d}{dt}\gamma+ R \biggl( \nabla_{\frac{d}{dt}\gamma}\frac{d}{dt}\gamma,\frac{d}{dt}\gamma \biggr) \frac{d}{dt}\gamma&= 0. \end{aligned}$$

(5)

Cubic splines are useful for interpolation problems on Riemannian manifolds. However, cubic splines provide an insufficient model for parametric curve regression. Although they can be generalized to higher order curves by increasing the order of the derivatives in Eq. (4), only odd order splines may be defined in this way, and there is no clear way to define even order splines.

Riemannian splines are parametrized by the endpoint conditions, meaning that the space of curves is naturally explored by varying control points. This is convenient if control points such as observed data are given at the outset. However, for parametric curve regression, curve models are preferred that do not depend on the data, such as the initial conditions of a geodesic [12]. Although Eq. (5) provides an ODE which could be used as such a parametric model in a “spline shooting” algorithm, estimating initial position and derivatives as parameters, the curvature term complicates integration and optimization.

1.3 Contributions in This Work

The goal of the current work is to extend the geodesic regression model in order to accommodate more flexibility while remaining in the parametric setting. The increased flexibility introduced by the methods in this manuscript allows a better description of the variability in the data. The work presented in this paper allows one to fit polynomial regression curves on a general Riemannian manifold, using intrinsic methods and avoiding the need for unwrapping and rolling. Since our model includes time-reparametrized geodesics as a special case, information about time dependence is also obtained from the regression without explicit modeling by examining the collinearity of the estimated parameters.

We derive practical algorithms for fitting polynomial curves to observations in Riemannian manifolds. The class of polynomial curves we use, described by Leite & Krakowski [24], is more suited to parametric curve regression than are spline models. These polynomial curves are defined for any integer order and are naturally parametrized via initial conditions instead of control points. We derive explicit formulas for computing derivatives with respect to the initial conditions of these polynomials in a least-squares curve-fitting setting.

In the following sections, we describe our method of fitting polynomial curves to data lying in various spaces. We develop the theory for general Riemannian manifolds, Lie groups with right invariant metrics, and finally for spaces acted on by such Lie groups. In order to keep each application somewhat self-contained, results will be shown in each case in the section in which the associated space is treated, instead of in a separate results section following all the methods.

2 Riemannian Geometry Preliminaries

Before defining Riemannian polynomials, we first review a few basic results from Riemannian geometry and establish a common notation. For a more in-depth treatment of this background material see, for instance, do Carmo [9]. Let (M,g) be a Riemannian manifold. At each point p∈M, the metric g defines an inner product on the tangent space $T_{p}M$. The metric also provides a method to differentiate vector fields with respect to one another, referred to as the covariant derivative. For smooth vector fields $v,w\in\mathfrak{X}(M)$ and a smooth curve γ:[0,1]→M the covariant derivative satisfies the following product rule:

$$\begin{aligned} \frac{d}{dt}\bigl\langle v\bigl(\gamma(t)\bigr),w\bigl(\gamma (t)\bigr)\bigr\rangle &= \bigl\langle \nabla_{\frac{d}{dt}{\gamma}}v\bigl(\gamma (t)\bigr),w\bigl( \gamma(t)\bigr)\bigr\rangle \\ &\quad{}+ \bigl\langle v\bigl(\gamma(t)\bigr),\nabla_{\frac {d}{dt}{\gamma}}w\bigl( \gamma(t)\bigr)\bigr\rangle . \end{aligned}$$

(6)

A geodesic γ:[0,1]→M is characterized (for instance) by the conservation of kinetic energy along the curve:

$$\begin{aligned} \frac{d}{dt} \biggl\langle \frac{d}{dt}{\gamma},\frac{d}{dt}{ \gamma} \biggr\rangle = 0 = 2 \biggl\langle \nabla_{\frac {d}{dt}{\gamma}} \frac{d}{dt}{\gamma},\frac{d}{dt}{\gamma} \biggr\rangle , \end{aligned}$$

(7)

which leads to the differential equation

$$\begin{aligned} \nabla_{\frac{d}{dt}{\gamma}} \frac{d}{dt}{\gamma} &= 0. \end{aligned}$$

(8)

This is called the geodesic equation and uniquely determines geodesics, parametrized by the initial conditions $(\gamma(0),\frac{d}{dt}{\gamma}(0))\in TM$. The mapping from the tangent space at p into the manifold M, defined by integration of the geodesic equation, is called the exponential map and is written $\operatorname{Exp}_{p}:T_{p}M\to M$. The exponential map is injective on a zero-centered ball B in $T_{p}M$ of some non-zero radius. Thus, for a point q within a neighborhood of p, there exists a unique vector $v\in T_{p}M$ corresponding to a minimal length path under the exponential map from p to q. The mapping of such points q to their associated tangent vectors v at p is called the log map of q at p, denoted $v = \operatorname{Log}_{p} q$.

Given a curve γ:[0,1]→M, the covariant derivative $\nabla_{\frac{d}{dt}\gamma}$ provides a way to relate tangent vectors at different points along γ. A vector field w is said to be parallel transported along γ if it satisfies the parallel transport equation,

$$\begin{aligned} \nabla_{\frac{d}{dt}{\gamma}} w\bigl(\gamma(t)\bigr) = 0. \end{aligned}$$

(9)

Notice that the geodesic equation is a special case of parallel transport, under which the velocity is parallel along the curve itself.
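
To make these three operations concrete, the following sketch implements the exponential map, log map, and parallel transport along a geodesic for the unit sphere embedded in Euclidean space, where all three have well-known closed forms. The function names and the choice of the sphere as the example manifold are assumptions made here for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def sphere_exp(p, v):
    """Exponential map: follow the great circle from p with initial velocity v."""
    theta = norm(v)
    if theta < 1e-12:
        return list(p)
    return [math.cos(theta) * pc + math.sin(theta) * vc / theta
            for pc, vc in zip(p, v)]

def sphere_log(p, q):
    """Log map: tangent vector at p pointing toward q with length d(p, q)."""
    c = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(c)
    w = [qc - c * pc for pc, qc in zip(p, q)]  # part of q orthogonal to p
    n = norm(w)
    if n < 1e-12:
        # q coincides with p (or is antipodal, where the log map is not unique)
        return [0.0] * len(p)
    return [theta * wc / n for wc in w]

def sphere_transport(p, v, w):
    """Parallel transport w from p to sphere_exp(p, v) along that geodesic."""
    theta = norm(v)
    if theta < 1e-12:
        return list(w)
    u = [vc / theta for vc in v]
    a = dot(u, w)  # component of w along the direction of travel
    # The component along u rotates in the (p, u) plane; the rest is unchanged.
    return [wc + a * ((math.cos(theta) - 1.0) * uc - math.sin(theta) * pc)
            for pc, uc, wc in zip(p, u, w)]

north = [0.0, 0.0, 1.0]
v = [math.pi / 2, 0.0, 0.0]
q = sphere_exp(north, v)
```

Transporting the velocity `v` itself along its own geodesic returns the velocity of the great circle at the endpoint, which is the sense in which the geodesic equation is a special case of parallel transport.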

3 Riemannian Polynomials

We now introduce Riemannian polynomials as a generalization of geodesics [15]. Geodesics are generalizations to the Riemannian manifold setting of curves in $\mathbb{R}^{d}$ with constant first derivative. In the previous section we briefly reviewed how the covariant derivative provides a way to define vector fields which are analogous to constant vector fields along γ, via parallel transport.

We refer to the vector field $\nabla_{\frac{d}{dt}{\gamma}}\frac{d}{dt}{\gamma}(t)$ as the acceleration of the curve γ. Curves with parallel acceleration are generalizations of curves in $\mathbb{R}^{d}$ whose coordinates are second order polynomials, and satisfy the second order polynomial equation,

$$\begin{aligned} (\nabla_{\frac{d}{dt}{\gamma}})^2\frac{d}{dt}{\gamma}(t) &= 0. \end{aligned}$$

(10)

Extending this idea, a cubic polynomial is a curve with parallel jerk (time derivative of acceleration), and so on. Generally, a kth order polynomial in M is defined as a curve γ:[0,1]→M satisfying

$$\begin{aligned} (\nabla_{\frac{d}{dt}{\gamma}})^k\frac{d}{dt}{\gamma}(t) &= 0 \end{aligned}$$

(11)

for all times t∈[0,1]. As with polynomials in Euclidean space, polynomials are fully determined by initial conditions at t=0:

$$\begin{aligned} &\gamma(0)\in M, \end{aligned}$$

(12)

$$\begin{aligned} &\frac{d}{dt}{\gamma}(0)\in T_{\gamma(0)}M, \end{aligned}$$

(13)

$$\begin{aligned} &(\nabla_{\frac{d}{dt}\gamma} )^i\frac{d}{dt}\gamma(0)\in T_{\gamma(0)}M,\quad i=1,\ldots,k-1. \end{aligned}$$

(14)

Introducing vector fields $v_{1}(t),\ldots,v_{k}(t)\in T_{\gamma(t)}M$, we write the following system of covariant differential equations, which is equivalent to Eq. (11):

$$\begin{aligned} &\frac{d}{dt}{\gamma}(t) = v_1(t) \end{aligned}$$

(15)

$$\begin{aligned} &\nabla_{\frac{d}{dt}{\gamma}}v_{i}(t) = v_{i+1}(t),\quad i=1, \ldots,k-1 \end{aligned}$$

(16)

$$\begin{aligned} &\nabla_{\frac{d}{dt}{\gamma}}v_{k}(t) = 0. \end{aligned}$$

(17)

In this notation, the initial conditions that determine the polynomial are γ(0) and $v_{i}(0)$, i=1,…,k.

The Riemannian polynomial equations cannot, in general, be solved in closed form, and must be integrated numerically. In order to discretize this system of covariant differential equations, we implement a covariant Euler integrator, depicted in Algorithm 1. A time step Δt is chosen and, at each step of the integrator, γ(t+Δt) is computed using the exponential map:

$$\begin{aligned} \gamma(t+\varDelta t) &= \operatorname{Exp}_{\gamma(t)}\bigl(\varDelta t v_1(t) \bigr). \end{aligned}$$

(18)

Each vector $v_{i}$ is incremented within the tangent space at γ(t) and the results are parallel transported infinitesimally along a geodesic from γ(t) to γ(t+Δt). For a proof that this algorithm approximates the polynomial equations, see Appendix A. The only ingredients necessary to integrate a polynomial are the exponential map and parallel transport on the manifold.
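
The integrator described above can be sketched as follows on the unit sphere. This is a minimal reading of the prose description, not a reproduction of Algorithm 1: the function names, the use of the sphere, the fixed interval [0,1], and the example initial conditions are all assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def sphere_exp(p, v):
    """Exponential map on the unit sphere."""
    theta = norm(v)
    if theta < 1e-12:
        return list(p)
    return [math.cos(theta) * pc + math.sin(theta) * vc / theta
            for pc, vc in zip(p, v)]

def sphere_transport(p, v, w):
    """Parallel transport w from p to sphere_exp(p, v) along the geodesic."""
    theta = norm(v)
    if theta < 1e-12:
        return list(w)
    u = [vc / theta for vc in v]
    a = dot(u, w)
    return [wc + a * ((math.cos(theta) - 1.0) * uc - math.sin(theta) * pc)
            for pc, uc, wc in zip(p, u, w)]

def integrate_polynomial(p, vs, n_steps):
    """Covariant Euler integration of Eqs. (15)-(17) over [0, 1]; returns the path."""
    dt = 1.0 / n_steps
    gamma = list(p)
    vs = [list(v) for v in vs]
    path = [gamma]
    k = len(vs)
    for _ in range(n_steps):
        step = [dt * c for c in vs[0]]  # displacement dt * v_1(t), as in Eq. (18)
        updated = []
        for i in range(k):
            if i + 1 < k:
                # increment v_i by dt * v_{i+1} in the tangent space at gamma(t)
                w = [a + dt * b for a, b in zip(vs[i], vs[i + 1])]
            else:
                w = vs[i]  # the highest-order vector is only transported
            updated.append(sphere_transport(gamma, step, w))
        gamma = sphere_exp(gamma, step)
        vs = updated
        path.append(gamma)
    return path

north = [0.0, 0.0, 1.0]
velocity = [math.pi / 2, 0.0, 0.0]
acceleration = [0.0, 1.0, 0.0]
geodesic_path = integrate_polynomial(north, [velocity], 100)
quadratic_path = integrate_polynomial(north, [velocity, acceleration], 100)
```

For k=1 each step simply continues the same great circle, so the scheme reproduces the geodesic up to rounding error. For higher orders the result depends on the step size, as with any Euler method.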

Figure 1 shows the result of integrating polynomials of order one, two, and three on the sphere. The parameters, the initial velocity, acceleration, and jerk, were chosen a priori and a cubic polynomial was integrated to obtain the blue curve. Then the initial jerk was set to zero and the blue quadratic curve was integrated, followed by the black geodesic whose acceleration was also set to zero.

3.1 Polynomial Time Reparametrization

Geodesic curves propagate at a constant speed as a result of their extremal action property. Polynomials provide flexibility not only in the class of paths that are possible, but in the time dependence of the curves traversing those paths. If the parameters of a polynomial γ consist of collinear vectors $v_{i}(0)\in T_{\gamma(0)}M$, then the path of γ (the image of the mapping γ) matches that of a geodesic, but the time dependence has been reparametrized by some polynomial transformation $t\mapsto c_{0}+c_{1}t+c_{2}t^{2}+c_{3}t^{3}$. This generalizes the existence of polynomials in Euclidean space which are merely polynomial transformations of a straight line path. Regression models could even be implemented in which the operator wishes to estimate geodesic paths, but is unsure of parametrization, and so constrains the estimated parameters to be collinear.
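
The Euclidean statement can be checked directly from the noise-free form of Eq. (3). As a worked illustration (the scalars $c_{i}$ and the fixed direction $u$ are notation introduced here, not in the original), suppose every parameter vector is a multiple of a single direction, $v_{i}=c_{i}u$. Then

$$\begin{aligned} \gamma(t) = p + \sum_{i=1}^k \frac{c_i u}{i!} t^i = p + s(t)\,u, \qquad s(t)=\sum_{i=1}^k \frac{c_i}{i!} t^i, \end{aligned}$$

so the image of γ lies on the straight line through p in the direction u, and the scalar polynomial s(t) only controls how that line is traversed in time.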

4 Polynomial Regression via Adjoint Optimization

In order to regress polynomials against observed data $J_{j}\in M$, j=1,…,N, at known times $t_{j}\in\mathbb{R}$, we define the following objective function

$$\begin{aligned} E_0\bigl(\gamma(0),v_1(0),\ldots,v_k(0) \bigr)&= \frac{1}{N}\sum_{j=1}^N d \bigl(\gamma(t_j),J_j\bigr)^2 \end{aligned}$$

(19)

subject to the constraints given by Eqs. (15)–(17). Note that in this expression d represents the geodesic distance: the minimum length of a path from the curve point $\gamma(t_{j})$ to the data point $J_{j}$. The function $E_{0}$ is minimized in order to find the optimal initial conditions γ(0) and $v_{i}(0)$, i=1,…,k, which we will refer to as the parameters of our model.

In order to determine the optimal parameters of the polynomial, we introduce Lagrange multiplier vector fields $\lambda_{i}$ for i=0,…,k, often called the adjoint variables, and define the augmented Lagrangian function

$$\begin{aligned} &E\bigl(\gamma,\{v_i\},\{\lambda_i\}\bigr) \\ &\quad = \frac{1}{N}\sum_{j=1}^N d\bigl( \gamma(t_j),J_j\bigr)^2 \\ &\qquad{}+\int_0^T \biggl\langle \lambda_0(t), \frac{d}{dt}{\gamma}(t)-v_1(t) \biggr\rangle dt \\ &\qquad{}+\sum_{i=1}^{k-1}\int _0^T \bigl\langle \lambda_i(t), \nabla_{\frac{d}{dt}{\gamma}}v_i(t) - v_{i+1}(t) \bigr\rangle dt \\ &\qquad{}+\int_0^T \bigl\langle \lambda_k(t), \nabla_{\frac{d}{dt}{\gamma}}v_k(t) \bigr\rangle dt. \end{aligned}$$

(20)

As is standard practice, the optimality conditions for this equation are obtained by taking variations with respect to all arguments of E, integrating by parts when necessary. The resulting variations with respect to the adjoint variables yield the original dynamic constraints: the polynomial equations. Variations with respect to the primal variables give rise to the following system of equations, termed the adjoint equations (see Appendix B for derivation).

$$\begin{aligned} &\nabla_{\frac{d}{dt}{\gamma}}\lambda_i(t) = -\lambda _{i-1}(t)\quad i=1,\ldots,k \end{aligned}$$

(21)

$$\begin{aligned} &\nabla_{\frac{d}{dt}{\gamma}}\lambda_0(t) = - \sum _{i=1}^kR\bigl(v_i(t), \lambda_i(t)\bigr)v_1(t) , \end{aligned}$$

(22)

where R is the Riemannian curvature tensor and the adjoint variable $\lambda_{0}$ takes jump discontinuities at time points where data are present:

$$\begin{aligned} \lambda_0\bigl(t_j^-\bigr)-\lambda_0 \bigl(t_j^+\bigr) &= \operatorname{Log}_{\gamma(t_j)}J_j. \end{aligned}$$

(23)

Note that this jump discontinuity corresponds to the variation of E with respect to $\gamma(t_{j})$. The Riemannian curvature tensor is defined by the formula [9]

$$\begin{aligned} R(u,v)w = \nabla_u \nabla_v w - \nabla_v \nabla_u w - \nabla_{[u,v]} w, \end{aligned}$$

(24)

and can be computed in closed form for many manifolds. Gradients of E with respect to initial and final conditions give rise to the terminal endpoint conditions for the adjoint variables,

$$\begin{aligned} \lambda_i(1) = 0,\quad i=0,\ldots,k \end{aligned}$$

(25)

as well as expressions for the gradients with respect to the parameters γ(0) and $v_{i}(0)$:

$$\begin{aligned} &\delta_{\gamma(0)} E = -\lambda_0(0), \end{aligned}$$

(26)

$$\begin{aligned} &\delta_{v_i(0)} E = -\lambda_i(0). \end{aligned}$$

(27)

In order to determine the value of the adjoint vector fields at t=0, and thus the gradients of the functional $E_{0}$, the adjoint variables are initialized to zero at time 1, then Eqs. (21)–(22) are integrated backward in time to t=0, applying the jumps of Eq. (23) at the data times.

Given the gradients with respect to the parameters, a simple steepest descent algorithm is used to optimize the functional. At each iteration, γ(0) is updated using the exponential map and the vectors $v_{i}(0)$ are updated via parallel translation. This algorithm is depicted in Algorithm 2.

Note that in the special case of a zero-order polynomial (k=0), the only gradient $\lambda_{0}$ is simply the mean of the log map vectors at the current estimate of the Fréchet mean. So this method generalizes the common method of Fréchet averaging on manifolds via gradient descent [13]. In the case of geodesic polynomials, k=1, the curvature term in Eq. (22) indicates that $\lambda_{1}$ is a sum of Jacobi fields. So this approach subsumes geodesic regression as presented by Fletcher [12]. For higher order polynomials, the adjoint equations represent a generalization of Jacobi fields.
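
The zero-order case can be sketched in a few lines on the unit sphere: the estimate is repeatedly moved along the mean of the log map vectors. The function names, unit step length, stopping tolerance, and synthetic data below are assumptions chosen for illustration, and convergence is only expected for data concentrated in a small enough region.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def sphere_exp(p, v):
    theta = norm(v)
    if theta < 1e-12:
        return list(p)
    return [math.cos(theta) * pc + math.sin(theta) * vc / theta
            for pc, vc in zip(p, v)]

def sphere_log(p, q):
    c = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(c)
    w = [qc - c * pc for pc, qc in zip(p, q)]
    n = norm(w)
    if n < 1e-12:
        return [0.0] * len(p)
    return [theta * wc / n for wc in w]

def frechet_mean(points, n_iter=100, tol=1e-12):
    """Gradient descent for the Frechet mean using the mean of the log map vectors."""
    mean = list(points[0])  # start from the first data point
    for _ in range(n_iter):
        logs = [sphere_log(mean, q) for q in points]
        direction = [sum(l[d] for l in logs) / len(points)
                     for d in range(len(mean))]
        if norm(direction) < tol:
            break
        mean = sphere_exp(mean, direction)  # update via the exponential map
    return mean

# Synthetic data placed symmetrically around the north pole.
a = 0.3
s, c = math.sin(a), math.cos(a)
data = [[s, 0.0, c], [-s, 0.0, c], [0.0, s, c], [0.0, -s, c]]
mean = frechet_mean(data)
```

By symmetry the log map vectors at the north pole cancel, so the iteration should settle there for this data set.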

As we will see later, in some cases these adjoint equations take a simpler form not involving curvature. In the case that the manifold M is a Lie group, the adjoint equations can be computed by taking variations in the Lie algebra, avoiding explicit curvature computation.
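
The simplest such case is $M=\mathbb{R}^{d}$, which gives a useful sanity check. As a worked illustration (the residual shorthand $r_{j}=J_{j}-\gamma(t_{j})$ and the assumption $0<t_{j}<1$ are introduced here), the curvature vanishes, covariant derivatives reduce to ordinary derivatives, and $\operatorname{Log}_{\gamma(t_{j})}J_{j}=r_{j}$. Equations (22), (23), and (25) then make $\lambda_{0}$ piecewise constant, and integrating Eq. (21) backward from $\lambda_{i}(1)=0$ gives

$$\begin{aligned} \lambda_0(t) = \sum_{j:\,t_j>t} r_j, \qquad \lambda_i(t) = \sum_{j:\,t_j>t} \frac{(t_j-t)^i}{i!}\, r_j, \quad i=1,\ldots,k. \end{aligned}$$

At t=0 this yields $-\lambda_{i}(0)=-\sum_{j}\frac{t_{j}^{i}}{i!}r_{j}$, which agrees, up to a positive constant factor, with the derivative of the mean squared error in Eq. (19) with respect to $v_{i}(0)$ for the Euclidean polynomial of Eq. (3).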

4.1 Coefficient of Determination (R²) in Metric Spaces

In order to characterize how well our model fits a given set of data, we define the coefficient of determination of our regression curve γ(t), denoted R² [12]. As with the usual definition of R², we first compute the variance of the data. Naturally, as the data lie on a non-Euclidean metric space, instead of the standard sample variance, we substitute the Fréchet variance, defined as

$$\begin{aligned} \operatorname{var}\{y_1,\ldots,y_N\} = \frac{1}{N} \min _{\bar{y}\in M} \sum_{j=1}^N d( \bar{y},y_j)^2. \end{aligned}$$

(28)

The sum of squared error for a curve γ is the value $E_{0}(\gamma)$:

$$\begin{aligned} \mathit{SSE} = \frac{1}{N} \sum_{j=1}^N d \bigl(\gamma(t_j),y_j\bigr)^2. \end{aligned}$$

(29)

We then define R² as the fraction of the variance that is removed by the curve γ:

$$\begin{aligned} R^2 &= 1- \frac{\mathit{SSE}}{\operatorname{var}\{y_1,\ldots,y_N\}}. \end{aligned}$$

(30)

Clearly a perfect fit will remove all error, resulting in an R² value of one. The worst case (R²=0) occurs when no polynomial can improve over a stationary point at the Fréchet mean, which can be considered a zero-order polynomial regression against the data.
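
Eqs. (28)–(30) translate into a short computation once a distance function and the Fréchet mean are available. The sketch below uses the unit sphere and assumes the Fréchet mean has already been computed elsewhere and is passed in. The function names and the two-point data set are illustrative assumptions.

```python
import math

def sphere_dist(p, q):
    """Geodesic distance on the unit sphere (great-circle angle)."""
    c = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, c)))

def mean_squared_distance(points_a, points_b):
    """Average of squared geodesic distances between paired points."""
    total = sum(sphere_dist(a, b) ** 2 for a, b in zip(points_a, points_b))
    return total / len(points_a)

def r_squared(fitted, data, frechet_mean):
    """R^2 = 1 - SSE / var, with var evaluated at the supplied Frechet mean."""
    sse = mean_squared_distance(fitted, data)                        # Eq. (29)
    var = mean_squared_distance([frechet_mean] * len(data), data)    # Eq. (28)
    return 1.0 - sse / var                                           # Eq. (30)

# Two synthetic data points placed symmetrically about the north pole,
# which is their geodesic midpoint and hence their Frechet mean.
a = 0.3
data = [[math.sin(a), 0.0, math.cos(a)], [-math.sin(a), 0.0, math.cos(a)]]
north = [0.0, 0.0, 1.0]

r2_perfect = r_squared(data, data, north)             # curve passes through the data
r2_constant = r_squared([north, north], data, north)  # zero-order fit at the mean
```

The two extreme cases described in the text appear here as `r2_perfect` close to one and `r2_constant` equal to zero.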

4.2 Example: Kendall Shape Space

A common challenge in medical imaging is the comparison of shape features which are independent of easily explained differences such as differences in pose (relative position and rotation). Additionally, scale is often uninteresting as it is easily characterized by volume calculation and explained mostly by intersubject variability or differences in age. It was with this perspective that Kendall [19] originally developed his theory of shape space. Here we briefly describe Kendall’s shape space of m-landmark point sets in $\mathbb{R}^{d}$, denoted $\varSigma_{d}^{m}$. For a complete treatment of Kendall’s shape space, the reader is encouraged to consult Kendall and Le [20, 23].

Given a point set $x=(x_{i})_{i=1,\ldots,m},x_{i}\in\mathbb{R}^{d}$, translation and scaling effects are removed by centering and uniform scaling. This is achieved by translating the point set so that the centroid is at zero, then scaling so that $\sum_{i=1}^{m} \|x_{i}\|^{2}=1$. After this standardization, x constitutes a point in the sphere $\mathbb{S}^{(m-1)d-1}$. This representation of shape is not yet complete as it is affected by global rotation, which we wish to ignore. Thus points on $\mathbb{S}^{(m-1)d-1}$ are referred to as preshapes and the sphere $\mathbb{S}^{(m-1)d-1}$ is referred to as preshape space. Kendall shape space $\varSigma_{d}^{m}$ is obtained by taking the quotient of the preshape space by the action of the rotation group SO(d). In practice, points in the quotient (referred to as shapes) are represented by members of their equivalence class in preshape space. We describe now how to compute exponential maps, log maps, and parallel transport in shape space, using representatives in $\mathbb{S}^{(m-1)d-1}$. The work of O’Neill [33] concerning Riemannian submersions characterizes the link between the shape and preshape spaces.
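
The centering and scaling step can be sketched as below. The function name `to_preshape` and the square example are assumptions for illustration. Note also that this sketch keeps all m×d coordinates and simply enforces a zero centroid, rather than changing to an explicit (m−1)d-dimensional coordinate system.

```python
import math

def to_preshape(points):
    """Center a landmark configuration and scale it to unit total squared norm."""
    m = len(points)
    d = len(points[0])
    centroid = [sum(pt[c] for pt in points) / m for c in range(d)]
    centered = [[pt[c] - centroid[c] for c in range(d)] for pt in points]
    scale = math.sqrt(sum(x * x for pt in centered for x in pt))
    if scale == 0.0:
        raise ValueError("all landmarks coincide; the preshape is undefined")
    return [[x / scale for x in pt] for pt in centered]

# Hypothetical configuration: the four corners of a square of side 2.
square = [[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]]
preshape = to_preshape(square)
```

Translating or uniformly rescaling `square` before the call leaves `preshape` unchanged up to rounding, whereas rotating it does not, which is why the quotient by SO(d) is still needed.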

The case d>2 is complicated in that these spaces contain degeneracies: points at which the mapping from preshape space to $\varSigma_{d}^{m}$ fails to be a submersion [1, 11, 17]. Despite these pathologies, outside of a singular set, the shape spaces are described by the theory of Riemannian submersions. We assume the data lie within a single “manifold part” away from any singularities, and show experiments in two dimensions, so that these technical issues can be safely ignored.

Each point p in preshape space projects to a point π(p) in shape space. The shape π(p) is the orbit of p under the action of SO(d). Viewed as a subset of $\mathbb{S}^{(m-1)d-1}$, this orbit is a submanifold whose tangent space is a subspace of that of the sphere. This subspace is called the vertical subspace of $T_{p}\mathbb {S}^{(m-1)d-1}$ and its orthogonal complement is the horizontal subspace. Projections onto the two subspaces of a vector $v\in T_{p}\mathbb {S}^{(m-1)d-1}$ are denoted by $\mathcal{V}(v)$ and $\mathcal{H}(v)$, respectively. Curves moving along vertical tangent vectors result in rotations of a preshape, and so do not indicate any change in actual shape.

A vertical vector in preshape space arises as the derivative of a rotation of a preshape. The derivative of such a rotation is a skew-symmetric matrix W, and its action on a preshape x has the form $(Wx_{1},\ldots,Wx_{m})\in T\mathbb{S}^{(m-1)d-1}$. The vertical subspace is then spanned by such tangent vectors arising from any linearly independent set of skew-symmetric matrices. The projection $\mathcal{H}$ is performed by taking such a spanning set, performing Gram-Schmidt orthonormalization, and removing each component.
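
For d=2 the skew-symmetric matrices are spanned by a single generator, so the vertical subspace is one-dimensional and the Gram-Schmidt step reduces to removing one component. The sketch below covers only this planar case and assumes its input is already tangent to the preshape sphere. The generator W = [[0, −1], [1, 0]], the function names, and the example vectors are choices made here for illustration.

```python
import math

def rotate90(x):
    """Apply W = [[0, -1], [1, 0]] to every landmark: the vertical direction at x."""
    return [[-pt[1], pt[0]] for pt in x]

def inner(a, b):
    """Euclidean inner product of two landmark configurations."""
    return sum(ac * bc for pa, pb in zip(a, b) for ac, bc in zip(pa, pb))

def horizontal_projection(x, v):
    """Remove from the tangent vector v its component along the vertical direction at x."""
    vert = rotate90(x)
    coeff = inner(v, vert) / inner(vert, vert)
    return [[vc - coeff * wc for vc, wc in zip(pv, pw)]
            for pv, pw in zip(v, vert)]

# Preshape of a square (centered, unit norm) and a tangent vector at it.
a = 1.0 / math.sqrt(8.0)
x = [[-a, -a], [a, -a], [a, a], [-a, a]]
horizontal = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
vertical = rotate90(x)
mixed = [[h + 0.5 * w for h, w in zip(ph, pw)]
         for ph, pw in zip(horizontal, vertical)]
projected = horizontal_projection(x, mixed)
```

Here `mixed` combines a horizontal vector with half of the vertical direction, and the projection recovers the horizontal part. For d>2 several generators would be needed, together with the orthonormalization described above.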

The horizontal projection allows one to relate the covariant derivative on the sphere to that on shape space. Lemma 1 of O’Neill [33] states that if X,Y are horizontal vector fields at some point p in preshape space, then

$$\begin{aligned} \mathcal{H}\nabla_XY &= \nabla_{X^*}^*Y^*, \end{aligned}$$

(31)

where ∇ denotes the covariant derivative on preshape space and $\nabla^{*}$, $X^{*}$, and $Y^{*}$ are their counterparts in shape space.

For the manifold part of a general shape space $\varSigma_{d}^{m}$, the exponential map and parallel translation are performed using representative preshapes in $\mathbb {S}^{(m-1)d-1}$. For d>2, this must be done in a time-stepping algorithm, in which at each time step an infinitesimal spherical parallel transport is performed, followed by the horizontal projection. The resulting algorithm can be used to compute the exponential map as well. Computation of the log map is less trivial, as it requires an iterative optimization routine. A special case arises when d=2, in which the entire space $\varSigma_{d}^{m}$ is a manifold. In this case the exponential map, parallel transport, and log map are computed in closed form [12]. With the exponential map, log map, and parallel transport, one performs polynomial regression on Kendall shape space via the adjoint method described previously.

4.2.1 Rat Calvaria Growth

We have applied polynomial regression in Kendall shape space to the data first analyzed by Bookstein [2], which consists of m=8 landmarks on a midsagittal section of rat calvaria (skulls excluding the lower jaw). The positions of eight identifiable landmarks on the skull are available for 18 rats at eight ages apiece. Figure 2 shows Riemannian polynomial fits of orders k=0,1,2,3. Curves of the same color indicate the synchronized motion of landmarks within a preshape, and the collection of curves for all eight landmarks represents a curve in shape space. While the geodesic curve in Kendall shape space shows little curvature, the quadratic and cubic curves are less linear, which demonstrates the added flexibility provided by higher order polynomials. The R² values agree with this qualitative difference: the geodesic regression has R²=0.79, while the quadratic and cubic regressions have R² values of 0.85 and 0.87, respectively. While this shows that there is a clear improvement in the fit due to increasing k from one to two, it also shows that little is gained by increasing the order of the polynomial beyond k=2. Qualitatively, Fig. 2 shows that the slight increase in R² obtained by moving from a quadratic to cubic model corresponds to a marked difference in the curves, indicating that the cubic curve is likely overfitting the data. As seen in Table 1, increasing the order of polynomial to four or five has very little effect on R² as well.

Table 1 R² for regression of rat dataset

These results indicate that moving from a geodesic to quadratic model provides an important improvement in fit quality. This is consistent with the results of Kenobi et al. [21], who also found that quadratic and possibly cubic curves are necessary to fit this dataset. However, whereas Kenobi et al. use polynomials defined in the tangent space at the Fréchet mean of the data points, the polynomials we use are defined intrinsically, independent of base point.

4.2.2 Corpus Callosum Aging

The corpus callosum, the major white matter bundle connecting the two hemispheres of the brain, is known to shrink during aging [10]. Fletcher showed [12] that more nuanced modes of shape change are observed using geodesic regression. In particular, the volume change observed in earlier studies corresponds to a thinning of the corpus callosum and increased curling of the anterior and posterior regions. In order to investigate even higher modes of shape change of the corpus callosum during normal aging, polynomial regression was performed on data from the OASIS brain database [27]. Magnetic resonance imaging (MRI) scans from 32 normal subjects with ages between 19 and 90 years were obtained from the database and a midsagittal slice was extracted from each volumetric image. The corpus callosum was then segmented on the 2D slices using the ITK-SNAP program [39]. Sets of 64 landmarks for each patient were obtained using the ShapeWorks program [6], which generates samplings of each shape boundary with optimal correspondences among the population.

Regression results for geodesic, quadratic, and cubic regression are shown in Fig. 3. At first glance the results appear similar for the three different models, since the motion envelopes each show the thinning and curling observed by Fletcher. Indeed, the optimal quadratic curve is quite similar to the optimal geodesic, as reflected by their similar R² values (0.13 and 0.12, respectively). However, moving from a quadratic to cubic polynomial model delivers a substantial increase in R² (from 0.13 to 0.21). This suggests that there are interesting third-order phenomena at work. As seen in Table 2, increasing the order beyond three results in very little increase in R², indicating that those orders overfit the data, as was also the case in the rat calivaria study.

Table 2 R² for regression of corpus callosum dataset

Inspection of the estimated parameters for the optimal cubic curve, shown in Fig. 4, reveals that the tangent vectors appear to be collinear. As discussed in Sect. 3.1, this suggests that the cubic curve is a geodesic that has undergone a cubic time reparametrization.

Note that the R² values are quite low in this study. Similar values were observed using geodesic regression in [12]. As noted there, this is likely due to high inter-subject variability: age explains only an effect that is small compared to the differences between subjects. Fletcher [12] also notes that although the effect may be small, geodesic regression gives a result that is significant (p=0.009) under a non-parametric permutation test.

Model selection, which in the case of polynomial regression amounts to the choice of polynomial order, is an important issue. R² never decreases with increasing k, as we have seen in these two studies, because every polynomial of order k−1 is also a polynomial of order k. As a result, other measures are sought which balance goodness of fit with complexity of the curve model. Tools often used for model selection in Euclidean polynomial regression, such as the Akaike information criterion and the Bayesian information criterion [5], make assumptions about the distribution of data that are difficult to generalize to the manifold setting. Extension of permutation testing for geodesic regression to higher orders would be useful for this task, but such an extension is not trivial on a Riemannian manifold. We expect that it is possible in certain cases where one can define “exchangeability” under the null hypothesis that the data follow a given order k trend. Currently, we select models based on R² values and on qualitative analysis of the fit curves, as in the rat calivaria study.
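The behavior of R² under increasing order is easiest to see in the Euclidean analogue. The following sketch is not the manifold algorithm: it fits ordinary least-squares polynomials of increasing order to synthetic data and reports R² for each. The data, the random seed, and the helper name `r_squared` are assumptions made for illustration only.

```python
import numpy as np

def r_squared(t, y, k):
    """R^2 of an ordinary least-squares polynomial fit of order k."""
    coeffs = np.polynomial.polynomial.polyfit(t, y, k)
    fitted = np.polynomial.polynomial.polyval(t, coeffs)
    ss_res = np.sum((y - fitted) ** 2)        # variation left unexplained
    ss_tot = np.sum((y - np.mean(y)) ** 2)    # total variation about the mean
    return 1.0 - ss_res / ss_tot

# Synthetic quadratic trend with noise (hypothetical data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 40)
y = 1.0 + 2.0 * t - 3.0 * t**2 + 0.1 * rng.standard_normal(t.size)

scores = [r_squared(t, y, k) for k in range(6)]
for k, s in enumerate(scores):
    print(f"k={k}: R^2 = {s:.4f}")
```

Because the models are nested, the printed scores cannot decrease with k, even though the data were generated from a quadratic. This is why R² alone cannot identify the appropriate order.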

4.3 LDDMM Landmark Space

Analysis of landmarks is commonly done in an alternative fashion when scale and rotation invariance is not desired. In this section, we present polynomial regression using the large deformation diffeomorphic metric mapping (LDDMM) framework. This framework consists of a Lie group of diffeomorphisms endowed with a right invariant Sobolev metric acting on a space of landmark configurations. For a more detailed description of the group action approach, the reader is encouraged to consult Bruveris et al. [4]. We will instead focus on the Riemannian structure of landmarks and use the formulas for general Riemannian manifolds.

Given m landmarks in d dimensions, let $M\cong\mathbb{R}^{md}$ be the space of all possible configurations. We denote by $x_{i}\in\mathbb{R}^{d}$ the location of the ith landmark point. Tangent vectors are also represented as tuples of vectors, $v=(v_{i})_{i=1,\ldots,m}\in\mathbb{R}^{md}$, as are cotangent vectors $\alpha=(\alpha_{i})_{i=1,\ldots,m}\in\mathbb{R}^{md}$. In contrast to ordinary differential geometric methods, in which vectors and metrics are the objects of interest, it is more convenient here to work with landmark covectors (which we refer to as momenta). In this case the inverse metric (also called the cometric) is generally written using a shift-invariant scalar kernel $K:\mathbb{R}\to\mathbb{R}$. The inner product of two covectors is given by

$$\begin{aligned} \langle\alpha,\beta\rangle_{T_x^*M} = \sum_{i,j} K\bigl(|x_i-x_j|^2\bigr) \alpha_i^T\beta_j. \end{aligned}$$

(32)
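As a concrete reading of Eq. (32), the following sketch evaluates the cometric inner product for landmarks stored as (m, d) arrays. The text leaves the kernel K generic; the Gaussian kernel and the names `gaussian_kernel` and `cometric_inner` are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(r2, sigma=1.0):
    """Assumed kernel K(r^2) = exp(-r^2 / (2 sigma^2)); the text leaves K generic."""
    return np.exp(-r2 / (2.0 * sigma**2))

def cometric_inner(x, alpha, beta, kernel=gaussian_kernel):
    """<alpha, beta> = sum_{i,j} K(|x_i - x_j|^2) alpha_i^T beta_j for (m, d) arrays."""
    diff = x[:, None, :] - x[None, :, :]         # pairwise differences, shape (m, m, d)
    K = kernel(np.sum(diff**2, axis=-1))         # kernel matrix, shape (m, m)
    return float(np.sum(K * (alpha @ beta.T)))   # (alpha @ beta.T)[i, j] = alpha_i . beta_j

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 2))        # m = 5 landmarks in d = 2 dimensions
alpha = rng.standard_normal((5, 2))    # covectors (momenta) attached to the landmarks
beta = rng.standard_normal((5, 2))

print(cometric_inner(x, alpha, beta))
print(cometric_inner(x, alpha, alpha))  # squared norm of alpha
```

With a positive definite kernel such as the Gaussian and distinct landmarks, the squared norm of any nonzero covector is positive, and the product is symmetric in its two arguments.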

The following Hamilton’s equations describe geodesics in landmark space [38, Eq. (21)]:

$$\begin{aligned} &\frac{d}{dt}x_i = \sum_j K \bigl(|x_i-x_j|^2\bigr)\alpha_j \end{aligned}$$

(33)

$$\begin{aligned} &\frac{d}{dt}\alpha_i = -\sum_j 2(x_i-x_j)K'\bigl(|x_i-x_j|^2 \bigr)\alpha_i^T\alpha_j \end{aligned}$$

(34)

where K′ denotes the derivative of the kernel.
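The geodesic equations can be integrated numerically by shooting from an initial configuration and momentum. The sketch below takes the positions to evolve by the kernel-weighted momenta and the momenta by the negative gradient of the Hamiltonian H = ½⟨α,α⟩, and uses conservation of H as a sanity check. The Gaussian kernel, the classical fourth-order Runge-Kutta scheme, and all function names are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def kernel(r2, sigma=1.0):
    """Assumed Gaussian kernel K(r^2)."""
    return np.exp(-r2 / (2.0 * sigma**2))

def kernel_prime(r2, sigma=1.0):
    """Derivative K'(r^2) of the assumed kernel with respect to r^2."""
    return -kernel(r2, sigma) / (2.0 * sigma**2)

def hamiltonian(x, alpha):
    """H = (1/2) sum_{i,j} K(|x_i - x_j|^2) alpha_i^T alpha_j."""
    diff = x[:, None, :] - x[None, :, :]
    return 0.5 * np.sum(kernel(np.sum(diff**2, axis=-1)) * (alpha @ alpha.T))

def rhs(x, alpha):
    """Right-hand sides of Hamilton's equations for landmark geodesics."""
    diff = x[:, None, :] - x[None, :, :]              # diff[i, j] = x_i - x_j
    r2 = np.sum(diff**2, axis=-1)
    dx = kernel(r2) @ alpha                           # sum_j K_ij alpha_j
    w = 2.0 * kernel_prime(r2) * (alpha @ alpha.T)    # 2 K'_ij alpha_i . alpha_j
    dalpha = -np.sum(w[:, :, None] * diff, axis=1)    # minus dH/dx_i
    return dx, dalpha

def shoot(x, alpha, steps=200):
    """Integrate from t = 0 to t = 1 with classical RK4."""
    h = 1.0 / steps
    for _ in range(steps):
        k1x, k1a = rhs(x, alpha)
        k2x, k2a = rhs(x + 0.5 * h * k1x, alpha + 0.5 * h * k1a)
        k3x, k3a = rhs(x + 0.5 * h * k2x, alpha + 0.5 * h * k2a)
        k4x, k4a = rhs(x + h * k3x, alpha + h * k3a)
        x = x + h / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
        alpha = alpha + h / 6.0 * (k1a + 2 * k2a + 2 * k3a + k4a)
    return x, alpha

rng = np.random.default_rng(3)
x0 = rng.standard_normal((4, 2))            # hypothetical initial landmarks
alpha0 = 0.5 * rng.standard_normal((4, 2))  # hypothetical initial momenta
x1, alpha1 = shoot(x0, alpha0)
print(hamiltonian(x0, alpha0), hamiltonian(x1, alpha1))
```

The two printed values should agree up to integration error, since the Hamiltonian is constant along geodesics.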

Introducing tangent vectors v=Kα and w=Kβ, parallel transport in LDDMM landmark space is computed in coordinates using the following formula, derived by Younes et al. [38, Eq. (25)]:

$$\begin{aligned} \frac{d}{dt}\beta_i &= K^{-1} \Biggl( \sum _{j=1}^m(x_i-x_j)^T(w_i-w_j)K' \bigl(|x_i-x_j|^2\bigr)\alpha_j \\ &\quad{}-\sum_{j=1}^m(x_i-x_j)^T(v_i-v_j)K' \bigl(|x_i-x_j|^2\bigr)\beta_j \Biggr) \\ &\quad-\sum_{j=1}^m(x_i-x_j) K'\bigl(|x_i-x_j|^2\bigr) \bigl(\alpha_j^T\beta_i+\alpha_i^T \beta_j\bigr). \end{aligned}$$

(35)

In order to integrate the adjoint equations, it is also necessary to compute the Riemannian curvature tensor, which in this case is more complicated. For an in-depth treatment, see Micheli et al. [29, Theorem 2.2].

Using these approaches to computing parallel transport and curvature, we implemented the general polynomial adjoint optimization method. We applied this approach to the rat calivaria data, treating the data as absolute landmark positions (after Procrustes alignment) instead of as scale and rotation invariant Kendall shapes.

Shown in Fig. 5 are the results of LDDMM landmark polynomial regression. Notice that while the geodesic curve in this case corresponds to nonlinear trajectories for the individual landmarks, these paths do not fit the data quite as well as the quadratic curve. In particular, the point at the crown of the skull (labelled point A in Fig. 5) appears to change directions in the quadratic curve, which is not possible using a geodesic. These qualitative improvements correspond to a slight increase in R², from 0.92 with the geodesic to 0.94 with the quadratic curve.

5 Riemannian Polynomials in Lie Groups

In this section, we consider the case when the configuration manifold is a Lie group G. A tangent vector v∈T_gG at a point g∈G can be identified with a tangent vector at the identity element e∈G via either right or left translation by g⁻¹. The resulting element of T_eG is referred to as the right (respectively, left) trivialization of v. We call a vector field $X\in\mathfrak{X}(G)$ right (respectively, left) invariant if the right (respectively, left) trivialization of X(g) is constant for all g. Both left and right translation, considered as mappings T_gG→T_eG, are linear isomorphisms, and we will use the common notation $\mathfrak {g}$ to refer to T_eG. The vector space $\mathfrak{g}$, endowed with the vector product given by the right trivialization of the negative Jacobi-Lie bracket of right invariant vector fields, is called the Lie algebra of G.

Of particular importance to the study of Lie groups is the adjoint representation, which for each group element g determines a linear action $\operatorname{Ad}_{g}$ on $\mathfrak{g}$ called the adjoint action and its dual action $\operatorname{Ad}_{g}^{*}$ on $\mathfrak{g}^{*}$ which is called the coadjoint action of g. In a Riemannian Lie group, the inner product on $\mathfrak{g}$ can be used to compute the adjoint of the adjoint action, which we term the adjoint-transpose action $\operatorname{Ad}_{g}^{\dagger}$, defined by

$$\begin{aligned} \bigl\langle \operatorname{Ad}_g^{\dagger} X,Y\bigr\rangle = \langle X, \operatorname{Ad}_g Y\rangle \end{aligned}$$

(36)

for all $X,Y\in\mathfrak{g}$. The infinitesimal versions of these actions at the identity element are termed the infinitesimal adjoint action, $\operatorname{ad}_{X}$, and the infinitesimal adjoint-transpose action, $\operatorname{ad}_{X}^{\dagger}$. These operators, along with the metric at the identity, encode all geometric properties such as covariant derivatives and curvature in a Lie group with right invariant Riemannian metric. For a more complete review of Lie groups and the adjoint representation, see [28]. Following [25], we introduce the symmetric product of two vectors $X,Y\in\mathfrak{g}$ as

$$\begin{aligned} \operatorname{sym}_X Y = \operatorname{sym}_Y X = - \bigl(\operatorname{ad}_X^{\dagger} Y + \operatorname{ad}_Y^{\dagger} X \bigr). \end{aligned}$$

(37)

Extending X and Y to right invariant vector fields $\widetilde{X},\widetilde{Y}$, the covariant derivative $\nabla_{\widetilde{X}}\widetilde{Y}$ is also right invariant (c.f. [7, Proposition 3.18]) and satisfies

$$\begin{aligned} (\nabla_{\widetilde{X}}\widetilde{Y} )g^{-1} = -{\overline{\nabla}}_XY \end{aligned}$$

(38)

where we have introduced the notation ${\overline{\nabla}}$ for the reduced Levi-Civita connection:

$$\begin{aligned} {\overline{\nabla}}_XY = \frac{1}{2}\operatorname{ad}_XY + \frac{1}{2}\operatorname{sym}_XY. \end{aligned}$$

(39)

Notice that in this notation, $\operatorname{ad}$ represents the skew-symmetric component of the Levi-Civita connection, while $\operatorname{sym}$ represents the symmetric component.

We use ξ₁ to denote the right trivialized velocity of the curve γ(t)∈G. Using our formula for the covariant derivative, one sees that the geodesic equation in a Lie group with right invariant metric is the right “Euler-Poincaré” equation:

$$\begin{aligned} \frac{d}{dt} \xi_1 = {\overline{\nabla}}_{\xi_1}\xi_1 = -\operatorname{ad}_{\xi_1}^{\dagger}\xi_1. \end{aligned}$$

(40)

The left Euler-Poincaré equation is obtained by removing the negative sign from the right hand side. For polynomials, the Euler-Poincaré equation is generalized to higher order. Introducing ξ_i, i=1,…,k, to represent the right trivialized higher-order velocity vectors v_i,

$$\begin{aligned} v_i(t) = \xi_i(t)\gamma(t), \end{aligned}$$

(41)

the reduced Riemannian polynomial equations are

$$\begin{aligned} &\frac{d}{dt}\gamma(t) = \xi_1\gamma(t) \end{aligned}$$

(42)

$$\begin{aligned} &\frac{d}{dt}\xi_i(t) = {\overline{\nabla}}_{\xi_1} \xi_i(t) + \xi_{i+1}(t), \quad i=1,\ldots,k-1 \end{aligned}$$

(43)

$$\begin{aligned} &\frac{d}{dt}\xi_k(t) = {\overline{\nabla}}_{\xi_1} \xi_k(t). \end{aligned}$$

(44)

Notice that these equations correspond precisely to the polynomial equations (Eq. (15)).

6 Polynomial Regression in Lie Groups

We have seen that the geodesic equation is simplified in a Lie group with right invariant metric, using the Euler-Poincaré equation. In this section, we derive the adjoint equations used to perform geodesic and polynomial regression in a Lie group. Using right-trivialized adjoint variables, we will see that the symmetries provided by the Lie group structure result in adjoint equations more amenable to computation than those in Sect. 4.

6.1 Geodesic Regression

Before moving on to polynomial regression, we first present an adjoint optimization approach to geodesic regression in a Lie group with right invariant metric. Suppose N data points J _j∈G are observed at times t _j∈[0,1]. Using the geodesic distance $d:G\times G\to\mathbb{R}$, the least squares geodesic regression problem is to find the minimum of

$$\begin{aligned} E(\gamma)= \frac{1}{2}\sum_{j=1}^N d\bigl(\gamma(t_j),J_j\bigr)^2, \end{aligned}$$

(45)

subject to the constraint that the curve γ:[0,1]→G is a geodesic.

In order to determine optimality conditions for γ, consider a variation of the geodesic γ(t), which is a vector field along γ that we denote δγ(t)∈T _γ(t) G. We denote by Z(t) the right trivialization of δγ(t). The variation of γ induces the following variation in the trivialized velocity ξ ₁ [16]:

$$\begin{aligned} \delta\xi_1(t) = \frac{d}{dt}Z(t) - \operatorname{ad}_{\xi_1}Z(t). \end{aligned}$$

(46)

Constraining δγ to be a Jacobi field, we take the variation of the Euler-Poincaré equation to obtain

$$\begin{aligned} \frac{d}{dt}\delta\xi_1 = \delta\biggl(\frac{d}{dt} \xi_1 \biggr) = \delta\bigl(-\operatorname{ad}_{\xi_1}^{\dagger} \xi_1 \bigr) = \operatorname{sym}_{\xi_1}\delta\xi_1. \end{aligned}$$

(47)

Combining these results, we write the ordinary differential equation (ODE) that determines, along with initial conditions, the vector field Z:

$$\begin{aligned} \frac{d}{dt}\left( \begin{array}{c}Z\\ \delta\xi_1 \end{array} \right)=\left( \begin{array}{c@{\quad}c} \operatorname{ad}_{\xi_1}&I\\ 0&\operatorname {sym}_{\xi_1} \end{array} \right)\left( \begin{array}{c}Z\\ \delta\xi_1 \end{array} \right). \end{aligned}$$

(48)

This ODE constitutes a general perturbation of a geodesic and the vector field Z(t) is a right trivialized Jacobi field. In order to compute the variations of E with respect to the initial position γ(0) and velocity ξ₁(0) of the geodesic γ(t), the variations of E with respect to γ(1) and ξ₁(1) are transported backward to t=0 by the adjoint ODE. Introducing adjoint variables $\lambda_{0}(t),\lambda_{1}(t)\in\mathfrak {g}$, the right trivialized variation of E with respect to γ(0) and the variation with respect to ξ₁(0) are given by

$$\begin{aligned} &\delta_{\gamma(0)}E = -\lambda_0(0) \end{aligned}$$

(49)

$$\begin{aligned} &\delta_{\xi_1(0)}E = -\lambda_1(0). \end{aligned}$$

(50)

These variations are computed by initializing λ ₀(1)=λ ₁(1)=0 and integrating the adjoint ODE backward to t=0. The adjoint ODE is obtained by simply computing the adjoint of the ODE governing geodesic perturbations, Eq. (48), with respect to the $L^{2}([0,1]\to\mathfrak{g})$ inner product. The resulting adjoint ODE is

$$\begin{aligned} \frac{d}{dt}\left( \begin{array}{c}\lambda_0\\ \lambda_1 \end{array} \right)=\left( \begin{array}{c@{\quad}c} -\operatorname{ad}_{\xi_1}^{\dagger}&0\\ -I&-\operatorname{sym}_{\xi_1}^{\dagger} \end{array} \right)\left( \begin{array}{c}\lambda_0\\ \lambda_1 \end{array} \right), \end{aligned}$$

(51)

where the adjoint of the symmetric product is given by

$$\begin{aligned} \operatorname{sym}_X^{\dagger} Y = -\operatorname{ad}_X Y + \operatorname{ad}_Y^{\dagger} X. \end{aligned}$$

(52)

The adjoint variable λ₀ takes jump discontinuities when passing over data points:

$$\begin{aligned} \lambda_0\bigl(t_j^-\bigr)-\lambda_0 \bigl(t_j^+\bigr) = (\operatorname{Log}_{\gamma(t_j)}J_j ) \gamma(t_j)^{-1}. \end{aligned}$$

(53)

The jumps represent the residual vectors, obtained by right trivialization of the Riemannian log map from the predicted point γ(t_j) to the data J_j. Notice that the adjoint variable λ₀ satisfies an equation resembling the Euler-Poincaré equation and can likewise be solved in closed form:

$$\begin{aligned} \lambda_0(t) = \sum_{j,t_j>t} \operatorname{Ad}_{\gamma^{-1}(t)\gamma(t_j)}^{\dagger}\operatorname {Log}_{\gamma(t_j)}J_j. \end{aligned}$$

(54)

This is particularly useful because it reduces the second order ODE, Eq. (51), to an ODE of first order, since the first equation is solved in closed form. We will soon see that this simplification occurs even when using higher order polynomials.

Finally, E is minimized using the variations $\delta_{\gamma(0)}E,\delta_{\xi_{1}(0)}E$ in, for example, the following gradient descent steps:

$$\begin{aligned} &\gamma(0)^{k+1} = {\operatorname{Exp}}(-\alpha\delta_{\gamma(0)^k}E) \gamma(0)^k \end{aligned}$$

(55)

$$\begin{aligned} &\xi_1(0)^{k+1} = \xi_1(0)^k - \alpha\delta_{\xi_1(0)^k} E \end{aligned}$$

(56)

for some positive step size α, where k denotes the step of the iterative optimization process. Note that commonly the Riemannian exponential map $\operatorname{Exp}$ in the above expression is replaced by a numerically efficient approximation such as the Cayley map [3].
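The update in Eqs. (55) and (56) can be sketched for 3-by-3 rotation matrices, the group treated in the next section. The sketch assumes a biinvariant metric, so that the Riemannian exponential at the identity coincides with the matrix exponential (computed here with Rodrigues' formula), and it also shows the Cayley map as a drop-in approximation. The gradient values passed in are placeholders; all function names are hypothetical.

```python
import numpy as np

def hat(x):
    """Skew-symmetric matrix associated with a 3-vector."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def expm_so3(x):
    """Matrix exponential of hat(x) via Rodrigues' formula."""
    theta = np.linalg.norm(x)
    X = hat(x)
    if theta < 1e-12:
        return np.eye(3) + X
    return (np.eye(3) + np.sin(theta) / theta * X
            + (1.0 - np.cos(theta)) / theta**2 * (X @ X))

def cayley_so3(x):
    """Cayley map (I - X/2)^{-1} (I + X/2), an orthogonal approximation of exp."""
    X = hat(x)
    return np.linalg.solve(np.eye(3) - 0.5 * X, np.eye(3) + 0.5 * X)

def gradient_step(gamma0, xi0, grad_gamma, grad_xi, step, group_exp=expm_so3):
    """One descent step: the position moves by left multiplication, the velocity additively."""
    gamma_new = group_exp(-step * grad_gamma) @ gamma0
    xi_new = xi0 - step * grad_xi
    return gamma_new, xi_new

gamma0 = expm_so3(np.array([0.2, 0.1, -0.3]))   # current initial rotation
xi0 = np.array([0.5, 0.0, 0.1])                 # current initial velocity
grad_gamma = np.array([0.05, -0.02, 0.03])      # placeholder gradient values
grad_xi = np.array([0.1, 0.2, -0.1])
gamma_exp, xi_new = gradient_step(gamma0, xi0, grad_gamma, grad_xi, step=1.0)
gamma_cay, _ = gradient_step(gamma0, xi0, grad_gamma, grad_xi, step=1.0,
                             group_exp=cayley_so3)
print(np.linalg.norm(gamma_exp - gamma_cay))
```

Both maps keep the iterate exactly on the group up to rounding. The Cayley map agrees with the exponential to second order in the step, so the two iterates differ only slightly for small steps.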

6.2 Example: Rotation Group SO(3)

As an example, in this section we derive the algorithm for polynomial regression in the group of rotations in three dimensions, SO(3). This group consists of orthogonal matrices with determinant one, and its associated Lie algebra $\mathfrak{so}(3)$ consists of skew-symmetric 3-by-3 matrices. Skew-symmetric matrices can be bijectively identified with vectors in $\mathbb{R}^{3}$ using the following mapping ∗:

$$\begin{aligned} * &: \mathbb{R}^3 \leftrightarrow\mathfrak{so}(3),\qquad * \left( \begin{array}{c}a\\ b\\ c \end{array} \right) = \left( \begin{array}{c@{\quad}c@{\quad}c} 0&-c&b\\ c&0&-a\\ -b&a&0 \end{array} \right). \end{aligned}$$

(57)

We use a star to indicate both this mapping $\mathbb{R}^{3}\to\mathfrak {so}(3)$ and its inverse, a notation which emphasizes that it is the Hodge dual in $\mathbb{R}^{3}$, though it is also commonly written using a hat symbol [28]. Using the cross product on $\mathbb{R}^{3}$, the star map is also a Lie algebra isomorphism, so that

$$\begin{aligned} *\operatorname{ad}_{*x}*y = x\times y. \end{aligned}$$

(58)

The adjoint action under the star is also quite convenient, as it is given simply by matrix-vector multiplication:

$$\begin{aligned} \operatorname{Ad}_g (*x) = *(gx) \end{aligned}$$

(59)

for any $g\in\mathrm{SO}(3),x\in\mathbb{R}^{3}$.
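Both identities are easy to confirm numerically. The sketch below implements the star map and its inverse under the hypothetical names `hat` and `vee`, takes the Lie bracket to be the matrix commutator, and checks Eqs. (58) and (59) on random inputs.

```python
import numpy as np

def hat(x):
    """Star map from R^3 to so(3), following Eq. (57)."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def vee(X):
    """Inverse star map from so(3) back to R^3."""
    return np.array([X[2, 1], X[0, 2], X[1, 0]])

rng = np.random.default_rng(2)
x, y = rng.standard_normal(3), rng.standard_normal(3)

# Infinitesimal adjoint action as the matrix commutator.
bracket = hat(x) @ hat(y) - hat(y) @ hat(x)

# A random rotation: orthogonalize, then flip the sign if needed so that det = +1.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
g = Q * np.sign(np.linalg.det(Q))

# Adjoint action as conjugation, which for rotations is g X g^T.
conjugated = g @ hat(x) @ g.T

print(np.allclose(vee(bracket), np.cross(x, y)))
print(np.allclose(vee(conjugated), g @ x))
```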

We will use a left invariant metric given by a symmetric positive definite 3-by-3 matrix A. For vectors $x,y\in\mathbb{R}^{3}$, the inner product is

$$\begin{aligned} \langle *x,*y\rangle_{\mathfrak{g}}= x^TAy. \end{aligned}$$

(60)

With this inner product, the infinitesimal adjoint transpose action is

$$\begin{aligned} *\operatorname{ad}_{*x}^{\dagger}*y = -A^{-1}(x\times A y). \end{aligned}$$

(61)

The most natural metric is that in which A is the identity matrix. In that case, left invariance also implies right invariance and skew-symmetry of $\operatorname{ad}^{\dagger}$, so that for any $X,Y\in\mathfrak{so}(3)$:

$$\begin{aligned} \operatorname{sym}_XY &= 0,\qquad{\overline{\nabla}}_X Y = \frac{1}{2} \operatorname{ad}_XY. \end{aligned}$$

(62)
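The operators of this section can be written directly on 3-vectors. The sketch below, with hypothetical function names, implements the adjoint-transpose action of Eq. (61) for a general metric matrix A, builds the symmetric product and reduced connection from Eqs. (37) and (39), and checks the defining adjoint property together with the biinvariant special case A = I.

```python
import numpy as np

def ad(x, y):
    """Infinitesimal adjoint action on 3-vectors: the cross product."""
    return np.cross(x, y)

def ad_dagger(x, y, A):
    """Adjoint-transpose action for the metric <x, y> = x^T A y."""
    return -np.linalg.solve(A, np.cross(x, A @ y))

def sym(x, y, A):
    """Symmetric product built from the adjoint-transpose action."""
    return -(ad_dagger(x, y, A) + ad_dagger(y, x, A))

def nabla_bar(x, y, A):
    """Reduced Levi-Civita connection: half of ad plus half of sym."""
    return 0.5 * ad(x, y) + 0.5 * sym(x, y, A)

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3))
A = B @ B.T + 3.0 * np.eye(3)          # a symmetric positive definite metric
x, y, z = rng.standard_normal((3, 3))  # three random 3-vectors

lhs = ad_dagger(x, y, A) @ A @ z       # <ad_x^dagger y, z>
rhs = y @ A @ ad(x, z)                 # <y, ad_x z>
print(lhs, rhs)

I3 = np.eye(3)
print(sym(x, y, I3))                   # vanishes for the biinvariant metric
```

For a general A the symmetric product does not vanish, and the quantity `nabla_bar(x, x, A)` equals `-ad_dagger(x, x, A)`, which is the right-hand side of the Euler-Poincaré equation.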

The Euler-Poincaré equation in the biinvariant case is

$$\begin{aligned} \frac{d}{dt} \xi= \operatorname{ad}_{\xi}^{\dagger}\xi= - *\xi \times * \xi= 0, \end{aligned}$$

(63)

implying that geodesics using the biinvariant metric have constant trivialized velocity. The geodesic can then be integrated in closed form:

$$\begin{aligned} \frac{d}{dt}\gamma(t) &= \xi\gamma(t) \quad\implies\quad\gamma (t) = \exp(t\xi). \end{aligned}$$

(64)

Notice that the adjoint-transpose action of a rotation matrix g∈SO(3) on a 3-vector x is given by

$$\begin{aligned} *\operatorname{Ad}_g^{\dagger}(*x) = g^Tx. \end{aligned}$$

(65)

So the first adjoint equation is given by

$$\begin{aligned} \lambda_0(t) &= \gamma(t)^T\gamma(1) \lambda_0(1) \end{aligned}$$

(66)

$$\begin{aligned} &= \exp(-t\xi)\exp(\xi)\lambda_0(1) \end{aligned}$$

(67)

$$\begin{aligned} &= \exp\bigl((1-t)\xi\bigr)\lambda_0(1) \end{aligned}$$

(68)

$$\begin{aligned} &= \lambda_0(1)\cos\bigl((1-t)\|\xi\|\bigr) \\ &\quad{}- \frac{1}{\|\xi\|}\bigl(*\xi\times\lambda_0(1)\bigr)\sin \bigl((1-t)\|\xi\|\bigr) \\ &\quad{}+ \frac{1}{\|\xi\|^2}*\xi\bigl(*\xi\cdot\lambda_0(1)\bigr) \bigl(1- \cos\bigl((1-t)\|\xi\|\bigr)\bigr). \end{aligned}$$

(69)

where the last line is Rodrigues’ rotation formula. The second adjoint equation, which determines the variation used to update the velocity, is obtained by integrating this. For geodesic regression with biinvariant metric, a closed form solution is available for the second adjoint variable as well:

$$\begin{aligned} \frac{d}{dt}\lambda_1(t) &= -\lambda_0(t) \end{aligned}$$

(70)

$$\begin{aligned} \lambda_1(t) &= \int_t^1 \lambda_0(s) ds \end{aligned}$$

(71)

$$\begin{aligned} &= \lambda_0(1)\frac{1}{\|\xi\|}\sin\bigl((1-t)\|\xi\|\bigr) \\ &\quad{}- \frac{1}{\|\xi\|^2}\bigl(*\xi\times\lambda_0(1)\bigr) \bigl(1-\cos\bigl((1-t)\|\xi\|\bigr)\bigr) \\ &\quad{}+ \frac{1}{\|\xi\|^3}*\xi\bigl(*\xi\cdot\lambda_0(1)\bigr) \bigl((1-t)\|\xi\|-\sin\bigl((1-t)\|\xi\|\bigr)\bigr). \end{aligned}$$

(72)
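The closed form for the second adjoint variable follows from integrating the Rodrigues expression for λ₀ term by term over [t,1], and it can be checked against numerical quadrature. In the sketch below the function names, the test values, and the trapezoid rule are assumptions for illustration; `xi` and `lam` are the 3-vector representations of ξ and λ₀(1).

```python
import numpy as np

def lambda0(t, xi, lam):
    """First adjoint variable from the Rodrigues expression, with lam = lambda_0(1)."""
    n = np.linalg.norm(xi)
    th = (1.0 - t) * n
    return (lam * np.cos(th)
            - np.cross(xi, lam) * np.sin(th) / n
            + xi * np.dot(xi, lam) * (1.0 - np.cos(th)) / n**2)

def lambda1(t, xi, lam):
    """Integral of lambda0 over [t, 1], taken term by term."""
    n = np.linalg.norm(xi)
    th = (1.0 - t) * n
    return (lam * np.sin(th) / n
            - np.cross(xi, lam) * (1.0 - np.cos(th)) / n**2
            + xi * np.dot(xi, lam) * (th - np.sin(th)) / n**3)

xi = np.array([0.4, -0.7, 0.2])    # hypothetical constant velocity
lam = np.array([1.0, 0.5, -0.3])   # hypothetical terminal value lambda_0(1)
t = 0.25

# Composite trapezoid rule on a fine grid over [t, 1].
s = np.linspace(t, 1.0, 4001)
vals = np.array([lambda0(si, xi, lam) for si in s])
h = s[1] - s[0]
numeric = h * (0.5 * vals[0] + vals[1:-1].sum(axis=0) + 0.5 * vals[-1])

print(lambda1(t, xi, lam))
print(numeric)
```

The two printed vectors should agree to within the quadrature error, and `lambda1` vanishes at t = 1, matching the terminal condition λ₁(1) = 0.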

6.3 Polynomial Regression

We apply a method similar to that of the previous section to derive an adjoint optimization scheme for Riemannian polynomial regression in a Lie group with right invariant metric. A variation of the first equation gives Eq. (46). Taking variations of the other equations, noting that ${\overline{\nabla}}$ is linear in each argument, we have

$$\begin{aligned} \frac{d}{dt}\delta\xi_i = {\overline{\nabla}}_{\delta\xi_1} \xi_i + {\overline{\nabla}}_{\xi_1}\delta\xi_i + \delta \xi_{i+1}. \end{aligned}$$

(73)

Along with Eq. (46), these provide the essential equations for a polynomial perturbation Z of γ, which can be considered a kind of higher-order Jacobi field. Introducing adjoint variables $\lambda_{0},\ldots,\lambda_{k}\in \mathfrak{g}$, the adjoint system is (see Appendix C for derivation)

$$\begin{aligned} &\frac{d}{dt}\lambda_0 = -\operatorname{ad}_{\xi_1}^{\dagger} \lambda_0 \end{aligned}$$

(74)

$$\begin{aligned} &\frac{d}{dt} \lambda_1 = -\lambda_0 - \operatorname{sym}_{\xi_1}^{\dagger}\lambda_1 + \sum _{i=2}^k \bigl(-{\overline{\nabla}}_{\xi_i}- \operatorname{sym}_{\xi_i}^{\dagger}\bigr)\lambda_i \end{aligned}$$

(75)

$$\begin{aligned} &\frac{d}{dt}\lambda_i = -\lambda_{i-1} + {\overline{\nabla}}_{\xi_1}\lambda_i,\quad i=2,\ldots,k, \end{aligned}$$

(76)

or, using only $\operatorname{ad}$ and $\operatorname{ad}^{\dagger}$, as

$$\begin{aligned} &\frac{d}{dt}\lambda_0 = -\operatorname{ad}_{\xi_1}^{\dagger} \lambda_0 \end{aligned}$$

(77)

$$\begin{aligned} &\frac{d}{dt} \lambda_1 = -\lambda_0 + \operatorname{ad}_{\xi_1}\lambda_1 - \operatorname{ad}_{\lambda _1}^{\dagger} \xi_1 \\ &\hphantom{\frac{d}{dt} \lambda_1 =} {}+ \frac{1}{2}\sum_{i=2}^k \bigl(\operatorname{ad}_{\xi_i}\lambda_i+\operatorname{ad}_{\xi _i}^{\dagger} \lambda_i-\operatorname{ad}_{\lambda_i}^{\dagger}\xi_i \bigr) \end{aligned}$$

(78)

$$\begin{aligned} &\frac{d}{dt}\lambda_i = -\lambda_{i-1} + \frac{1}{2} \bigl(\operatorname{ad}_{\xi_1}\lambda_i- \operatorname{ad}_{\xi_1}^{\dagger}\lambda_i -\operatorname {ad}_{\lambda_i}^{\dagger} \xi_1 \bigr). \end{aligned}$$

(79)

For i=2,…,k, these equations resemble the original polynomial equations. However, the evolution of λ ₁ is influenced by all adjoint variables and higher-order velocities in a non-trivial way. The first adjoint equation again resembles the Euler-Poincaré equation, and its solution is given by Eq. (54).

6.3.1 Polynomial Regression in SO(3)

Revisiting the rotation group, we can extend the geodesic regression results to polynomials. Representing Lie algebra elements as 3-vectors ξ _i, the equations for higher order polynomials in SO(3) are

$$\begin{aligned} &\frac{d}{dt} \gamma(t) = \bigl(*\xi_1(t)\bigr)\gamma(t) \end{aligned}$$

(80)

$$\begin{aligned} &\frac{d}{dt}\xi_1(t) = \xi_2(t) \end{aligned}$$

(81)

$$\begin{aligned} &\frac{d}{dt}\xi_i(t) = \frac{1}{2}\xi_1(t) \times\xi_i(t) + \xi_{i+1}(t),\quad i=2,\ldots,k-1 \end{aligned}$$

(82)

$$\begin{aligned} &\frac{d}{dt}\xi_k(t) = \frac{1}{2}\xi_1(t) \times\xi_k(t). \end{aligned}$$

(83)

In this case, closed form integration of the polynomial equations isn’t available, even with a biinvariant metric. The first adjoint equation, however, is still integrated in closed form for higher order polynomials, giving

$$\begin{aligned} \lambda_0(t) = \gamma(t)^T\gamma(1) \lambda_0(1). \end{aligned}$$

(84)
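Since the forward equations (80) to (83) have no closed form in general, they are integrated numerically. The sketch below is one possible scheme, not the authors' implementation: the velocities are advanced with classical RK4, and the rotation is advanced by the exponential of the averaged first velocity so that it stays on the group. The function names and test values are hypothetical.

```python
import numpy as np

def hat(x):
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def expm_so3(x):
    """Matrix exponential of hat(x) via Rodrigues' formula."""
    theta = np.linalg.norm(x)
    X = hat(x)
    if theta < 1e-12:
        return np.eye(3) + X
    return (np.eye(3) + np.sin(theta) / theta * X
            + (1.0 - np.cos(theta)) / theta**2 * (X @ X))

def xi_rhs(xi):
    """Time derivative of the stacked velocities xi_1..xi_k, shape (k, 3)."""
    d = 0.5 * np.cross(xi[0], xi)   # (1/2) xi_1 x xi_i; this is zero for i = 1
    d[:-1] += xi[1:]                # add xi_{i+1} for every i < k
    return d

def integrate(gamma, xi, steps=1000):
    """Integrate the polynomial equations from t = 0 to t = 1."""
    h = 1.0 / steps
    for _ in range(steps):
        k1 = xi_rhs(xi)
        k2 = xi_rhs(xi + 0.5 * h * k1)
        k3 = xi_rhs(xi + 0.5 * h * k2)
        k4 = xi_rhs(xi + h * k3)
        xi_next = xi + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        # Group update with the average of xi_1 over the step.
        gamma = expm_so3(0.5 * h * (xi[0] + xi_next[0])) @ gamma
        xi = xi_next
    return gamma, xi

# k = 1: a geodesic, which should reproduce the closed form exp(xi).
xi_geo = np.array([[0.3, -0.2, 0.5]])
gamma_geo, _ = integrate(np.eye(3), xi_geo)

# k = 2 with collinear velocities: a geodesic with quadratic time reparametrization.
xi_col = np.array([[0.3, -0.2, 0.5], [0.6, -0.4, 1.0]])
gamma_col, _ = integrate(np.eye(3), xi_col)

# k = 2 with generic velocities.
xi_quad = np.array([[0.3, -0.2, 0.5], [1.0, 0.4, -0.6]])
gamma_quad, xi_end = integrate(np.eye(3), xi_quad)
print(gamma_quad)
```

In the collinear case the cross products vanish, the first velocity grows linearly, and the endpoint is the rotation by ξ₁(0) + ξ₂(0)/2, which gives a simple correctness check.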

7 Lie Group Actions

So far, we’ve seen that polynomial regression is particularly convenient in Lie groups with right invariant metrics, reducing the adjoint system from second to first order using the closed form integral of λ₀. We now consider the case when a Lie group G acts on another manifold M which is itself equipped with a Riemannian metric. For our purposes, the group action need not be transitive; when it is transitive, the target space is called a “homogeneous space” for G.

Although the two approaches sometimes coincide, generally one must choose between using polynomials defined by the metric in M, ignoring the action of G, or using curves defined by the action of polynomials in G on points in M. In cases when a Riemannian Lie group is known to act on the space M, the primary object of interest is usually not the path in the object space M, but the path of symmetries described by the group elements. Therefore it is most natural to use the Lie group structure to define paths in object space. We employ this approach, in which polynomial regression under a Riemannian Lie group action is studied primarily using the Lie group elements.

Following this plan, we model a polynomial in M as a curve p(t) defined using the group action:

$$\begin{aligned} p(t) &= \gamma(t).p_0 \end{aligned}$$

(85)

where γ is a polynomial of order k in G with parameters

$$\begin{aligned} \gamma(0)\in G,\qquad\xi_1,\ldots,\xi_k\in\mathfrak{g} \end{aligned}$$

(86)

and p ₀∈M is a base point in the object space. Invariance of the metric on G allows us to assume, without loss of flexibility in the model, that the base deformation is the identity: γ(0)=e∈G. Optimization is done by fixing γ(0)=e∈G and minimizing a least squares objective function defined using the metric on M, with respect to the base point p ₀∈M and the parameters of the Lie group polynomial, $\xi_{1},\ldots,\xi_{k}\in\mathfrak{g}$. This is accomplished using a similar adjoint method to that presented in the previous sections, but where the jump discontinuities in λ ₀ are modified due to this change in objective function. In the following sections, we discuss this in more detail and also derive the gradients with respect to the base point p ₀.

7.1 Action on a General Manifold

A smooth group action can be differentiated to obtain a mapping from the Lie algebra $\mathfrak{g}$ to the tangent space T _p M at any point p∈M. Given a curve g(t):(−ϵ,ϵ)→G such that g(0)=e and $\frac{d}{dt}|_{t=0}g(t)=\xi\in\mathfrak{g}$, define the following mapping (c.f. [16]):

$$\begin{aligned} \rho_p(\xi) &:=\frac{d}{dt}\bigg|_{t=0} g(t).p. \end{aligned}$$

(87)

The function ρ _p is a linear mapping from $\mathfrak{g}$ to T _p M, and as such it has a dual $\rho_{p}^{*}:T_{p}^{*}M\to\mathfrak{g}^{*}$ that maps cotangent vectors in M to the Lie coalgebra $\mathfrak{g}^{*}$. This dual mapping we refer to as the cotangent lift momentum map and use the notation $\mathbf{J}:T^{*}M\to\mathfrak{g}^{*}$.

The most important property of J is that it is preserved under the coadjoint action:

$$\begin{aligned} \operatorname{Ad}_g^* \mathbf{J}m = \mathbf{J}g.m\quad\forall m\in T^*M. \end{aligned}$$

(88)

The action of g on the cotangent bundle, which appears on the right-hand side above, maps a cotangent vector μ at point p to the vector $g.\mu \in T_{g.p}^{*}M$. Replacing squared norm with squared geodesic distance on the Riemannian manifold M, the first adjoint variable is then given by

$$\begin{aligned} \lambda_0(t) = \sum_{j,t_j>t} \mathbf{J}\gamma(t) \gamma(t_j)^{-1}.(\operatorname{Log}_{\gamma(t_j).p_0}J_j)^{\flat}. \end{aligned}$$

(89)

Of particular interest is the case when the metric on G and the metric on the manifold M coincide, in the sense that for any vectors $\xi,\mu\in \mathfrak{g}$ and points p∈M:

$$\begin{aligned} \langle\xi,\mu\rangle_{\mathfrak{g}}= \langle\xi.p,\mu.p \rangle_{T_pM}. \end{aligned}$$

(90)

Fixing a base point p ₀∈M, this means the mapping g→g.p ₀ is a Riemannian submersion. If, additionally, the metric on G is biinvariant, this implies that the covariant derivative satisfies [33]

$$\begin{aligned} \nabla_{\xi.p}\mu.p = ({\overline{\nabla}}_\xi\mu).p \end{aligned}$$

(91)

so that geodesics and polynomials in M are generated by polynomials in G along with the action on the base point p ₀.

7.1.1 Example: Rotations of the Sphere

Consider the sphere of radius one in $\mathbb{R}^{3}$, which is denoted $\mathbb{S}^{2}$. The group SO(3) acts naturally on the sphere. For this example, we will use the biinvariant metric on SO(3), which corresponds to using the identity for the A matrix in Sect. 6.2. Representing points on the sphere as unit vectors in $\mathbb{R}^{3}$, the group action is simply left multiplication by a matrix in SO(3):

$$\begin{aligned} &\gamma.p = \gamma p \end{aligned}$$

(92)

$$\begin{aligned} &\xi.p = \xi p \end{aligned}$$

(93)

for all $\gamma\in\mathrm{SO}(3),\xi\in\mathfrak{so}(3),p\in \mathbb{S}^{2},v\in T_{p}\mathbb{S}^{2}$. The infinitesimal action is in fact a cross product, which is easily seen using the star map:

$$\begin{aligned} \xi.p = \xi p = (*\xi)\times p. \end{aligned}$$

(94)

Representing elements in $\mathfrak{so}(3)^{*}$ as 3-vectors, we derive the cotangent lift momentum map as well; letting $a\in T_{p}\mathbb{S}^{2}$,

$$\begin{aligned} \mathbf{J}a = *(p\times a). \end{aligned}$$

(95)

This can be interpreted as converting a linear momentum on the surface of the sphere into an angular momentum in $\mathfrak{so}(3)$ using the cross product with the moment arm p. The standard metric on the sphere corresponds to the standard biinvariant metric on SO(3) so that, as discussed previously, polynomials on $\mathbb{S}^{2}$ correspond to polynomials in SO(3) acting on points on the sphere.

The polynomial equations for the sphere are precisely those for SO(3), along with the action of γ(t) on the base point $p_{0}\in\mathbb{S}^{2}$. The derivative of γ(t) is replaced by the equation

$$\begin{aligned} \frac{d}{dt}p(t) = \frac{d}{dt}\bigl(\gamma(t).p_0\bigr) = \xi_1(t).p(t). \end{aligned}$$

(96)

The evolution of ξ _i is the same as that for SO(3). Figure 6 shows example polynomial curves in the rotation group and their action on a point on the sphere. Notice that the example polynomials on the sphere are precisely those shown in Fig. 1, although they were generated here using polynomials on SO(3) instead of integrating directly on the sphere.

In order to integrate the adjoint equations, the jump discontinuities must be computed using the log map on the sphere:

$$\begin{aligned} \operatorname{Log}_x y = \theta\biggl(\frac{y-\cos\theta x}{\sin \theta} \biggr), \quad\cos \theta= x^Ty. \end{aligned}$$

(97)

The flatting operation acts trivially on this vector, and the action of SO(3) on covectors corresponds to matrix-vector multiplication. Using this, along with the momentum map J, we have the jump discontinuities for the first adjoint variable λ ₀:

$$\begin{aligned} \lambda_0\bigl(t_j^-\bigr)-\lambda_0 \bigl(t_j^+\bigr) = p(t_j)\times( \operatorname{Log}_{p(t_j)}J_j ),\qquad p(t_j)=\gamma(t_j).p_0. \end{aligned}$$

(98)

The higher adjoint variables satisfy the same ODEs as in Sect. 6.3.1.
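The jump computation for the sphere combines the log map of Eq. (97) with the momentum map of Eq. (95), evaluated at the predicted point γ(t_j).p₀. A minimal sketch follows, with hypothetical function names and test values; the log map is undefined for antipodal points, which the sketch does not handle.

```python
import numpy as np

def sphere_log(x, y):
    """Log map on the unit sphere: tangent vector at x pointing toward y."""
    c = np.clip(np.dot(x, y), -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros(3)
    return theta * (y - c * x) / np.sin(theta)

def momentum_map(p, a):
    """Cotangent lift momentum map, as a 3-vector: p x a."""
    return np.cross(p, a)

def adjoint_jump(gamma_tj, p0, J_j):
    """Jump in the first adjoint variable at a data point J_j."""
    p = gamma_tj @ p0                      # predicted point on the sphere
    return momentum_map(p, sphere_log(p, J_j))

# Hypothetical inputs: a rotation about the x-axis, a base point, a data point.
c, s = np.cos(0.3), np.sin(0.3)
gamma_tj = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
p0 = np.array([0.0, 0.0, 1.0])
J_j = np.array([0.6, 0.0, 0.8])

p = gamma_tj @ p0
residual = sphere_log(p, J_j)
jump = adjoint_jump(gamma_tj, p0, J_j)
print(residual, jump)
```

The length of the residual equals the angle between the predicted point and the data point, and the jump is orthogonal to the predicted point, as expected of an angular momentum about the origin.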

7.2 Lie Group Actions on Vector Spaces

We will assume in this section that the manifold is a vector space V and that G acts linearly on the left on V. Given a smooth linear group action, a vector ξ in the Lie algebra $\mathfrak{g}$ acts linearly on a vector v∈V in the following way

$$\begin{aligned} \xi.v = \frac{d}{d\epsilon}\bigg|_{\epsilon=0} g(\epsilon).v \end{aligned}$$

(99)

where g(ϵ) is a curve in G satisfying g(0)=e and $\frac{d}{d\epsilon}|_{\epsilon=0}g(\epsilon)=\xi$. Again we use the notation $\rho_{v}:\mathfrak{g}\to V$ to denote right-multiplication under this action:

$$\begin{aligned} \rho_v \xi:=\xi.v\quad\forall v\in V,\xi\in\mathfrak{g}. \end{aligned}$$

(100)

In the vector space setting, the cotangent lift momentum map (again defined as the dual of ρ _v), is written using the diamond notation introduced in [16]:

$$\begin{aligned} &v\diamond a\in\mathfrak{g}^*\quad\forall v\in V, a\in V^*, \end{aligned}$$

(101)

$$\begin{aligned} &(v\diamond a,\xi)_{(\mathfrak{g}^*,\mathfrak{g})} :=(a,\rho_v \xi)_{(V^*,V)}\quad\forall\xi\in\mathfrak{g}. \end{aligned}$$

(102)

The diamond map interacts with the coadjoint action $\operatorname {Ad}^{*}$ in a convenient way:

$$\begin{aligned} \operatorname{Ad}_{g^{-1}}^*(v\diamond a) = (g.v)\diamond(g.a). \end{aligned}$$

(103)

This relation is fundamental in that it shows that the diamond map is preserved under the coadjoint action. This is quite useful in our case, as we will soon see that diamond maps show up commonly in variational problems on inner product spaces.
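
As a concrete illustration, consider SO(3) acting on $\mathbb{R}^{3}$ by matrix-vector multiplication. Assuming the usual identification of $\mathfrak{so}(3)$ and its dual with $\mathbb{R}^{3}$ through the hat map and the dot product (an assumption made here for the example), the diamond map reduces to the cross product, and the coadjoint action $\operatorname{Ad}_{g^{-1}}^{*}$ reduces to multiplication by g. The sketch below checks Eqs. (102) and (103) numerically; the helper names are hypothetical.

```python
import numpy as np

def hat(xi):
    """Map a vector in R^3 to the corresponding skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -xi[2], xi[1]],
                     [xi[2], 0.0, -xi[0]],
                     [-xi[1], xi[0], 0.0]])

def rotation(axis, angle):
    """Rodrigues formula for the rotation about `axis` by `angle`."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = hat(k)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def diamond(v, a):
    """Diamond map for SO(3) on R^3: (v <> a, xi) = a . (xi x v) = xi . (v x a)."""
    return np.cross(v, a)

rng = np.random.default_rng(0)
v, a, xi = rng.normal(size=(3, 3))
g = rotation([1.0, 2.0, -1.0], 0.7)

# Defining pairing of Eq. (102)
pairing_lhs = np.dot(diamond(v, a), xi)
pairing_rhs = np.dot(a, hat(xi) @ v)

# Equivariance of Eq. (103): under these identifications Ad^*_{g^{-1}} acts as g
equiv_lhs = g @ diamond(v, a)
equiv_rhs = diamond(g @ v, g @ a)
```

Both pairs agree to round-off, reflecting the familiar fact that rotations commute with the cross product.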

Commonly, data is provided in the form of points J _i in the vector space V. In that case, the inner product on V is used to write the regression problem as a minimization of

$$\begin{aligned} E(\gamma, v_0) = \frac{1}{2}\sum _{j=1}^N \big\| \gamma(t_j).v_0 - J_j\big\| _V^2, \end{aligned}$$

(104)

subject to the constraint that γ is a polynomial in G and v ₀∈V is an evolving template vector. Without loss of generality, γ(0) can also be constrained to be the identity so that v ₀ is the template vector at time zero. Optimization of Eq. (104) with respect to v ₀ requires the variation

$$\begin{aligned} \delta_{v_0} E = \sum_{j=1}^N \gamma(t_j)^{-1}\bigl(\gamma(t_j).v_0 - J_j\bigr)^{\flat}. \end{aligned}$$

(105)

Here the musical flat symbol ♭ denotes lowering of indices using the metric on V, an operation mapping V to V ^∗. If the group G acts by isometries on V, then the group action commutes with flatting and the optimal base vector v ₀ can be computed in closed form

$$\begin{aligned} \hat{v}_0 = \frac{1}{N} \sum_{j=1}^N \gamma(t_j)^{-1}.J_j. \end{aligned}$$

(106)

Even when G does not act by isometries, the optimal base vector can often be solved for in closed form.
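
For the isometric case, Eq. (106) amounts to pulling each datum back to time zero and averaging. The sketch below illustrates this for rotations acting on $\mathbb{R}^{3}$; the rotation curve, the noise level, and all function names are assumptions chosen for the example, and the curve is not claimed to be a fitted polynomial.

```python
import numpy as np

def rotation_z(angle):
    """Rotation about the z axis by `angle`."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def optimal_base_vector(gammas, data):
    """Eq. (106): average of the pulled-back data gamma(t_j)^{-1}.J_j."""
    # For rotations the inverse is the transpose.
    return np.mean([g.T @ J for g, J in zip(gammas, data)], axis=0)

def energy_gradient(gammas, data, v0):
    """Eq. (105), with covectors identified with vectors via the Euclidean metric."""
    return sum(g.T @ (g @ v0 - J) for g, J in zip(gammas, data))

rng = np.random.default_rng(1)
times = np.linspace(0.0, 1.0, 5)
gammas = [rotation_z(1.3 * t) for t in times]  # hypothetical curve in SO(3)
v_true = np.array([1.0, 0.5, -0.2])
data = [g @ v_true + 0.05 * rng.normal(size=3) for g in gammas]

v0_hat = optimal_base_vector(gammas, data)
grad_at_optimum = energy_gradient(gammas, data, v0_hat)  # vanishes up to round-off
```

Because the gradient in Eq. (105) is linear in v ₀ when G acts by isometries, no iteration is needed for this part of the optimization.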

The variation with respect to γ(t _j) is more interesting:

$$\begin{aligned} \delta_{\gamma(t_j)}E = \bigl(\gamma(t_j).v_0\bigr) \diamond\bigl(\gamma(t_j).v_0-J_j \bigr)^{\flat}. \end{aligned}$$

(107)

Using this along with the relation between the coadjoint action and diamond map, we can write the first polynomial adjoint variable in closed form

$$\begin{aligned} \lambda_0(t) = \sum_{j,t_j>t} \bigl( \gamma(t).v_0\bigr)\diamond\bigl(\gamma(t)\gamma(t_j)^{-1}. \bigl(\gamma(t_j).v_0-J_j \bigr)^{\flat}\bigr). \end{aligned}$$

(108)

7.2.1 Example: Diffeomorphically Deforming Images

Right invariant Sobolev metrics on groups of diffeomorphisms are the main objects of study in computational anatomy [30]. Describing an image I as a square integrable function of a domain $\varOmega\subset\mathbb{R}^{d}$, the left action of a diffeomorphism γ∈Diff(Ω) is

$$\begin{aligned} \gamma.I = I\circ\gamma^{-1}. \end{aligned}$$

(109)

The corresponding infinitesimal action of a velocity field ξ on an image is

$$\begin{aligned} \xi.I = -\xi^T\nabla I \end{aligned}$$

(110)

and the diamond map is

$$\begin{aligned} (I\diamond\alpha) (y) = -\alpha(y)\nabla I(y). \end{aligned}$$

(111)
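
The relation between Eqs. (110) and (111) can be checked on a discrete grid. The following sketch assumes a two-dimensional image on a uniform grid, central finite differences via `numpy.gradient`, and a Riemann-sum approximation of the $L^{2}$ pairing; the test image, covector field α, and velocity field ξ are arbitrary choices made for illustration.

```python
import numpy as np

def infinitesimal_action(xi, image, spacing):
    """Eq. (110): xi.I = -xi^T grad I, with xi of shape (2, H, W)."""
    grad = np.stack(np.gradient(image, spacing))  # shape (2, H, W)
    return -np.sum(xi * grad, axis=0)

def diamond(image, alpha, spacing):
    """Eq. (111): (I <> alpha)(y) = -alpha(y) grad I(y), shape (2, H, W)."""
    grad = np.stack(np.gradient(image, spacing))
    return -alpha * grad

n = 64
spacing = 1.0 / n
coords = np.arange(n) * spacing
yy, xx = np.meshgrid(coords, coords, indexing="ij")
image = np.exp(-((xx - 0.5) ** 2 + (yy - 0.4) ** 2) / 0.02)
alpha = np.sin(2 * np.pi * xx) * np.cos(2 * np.pi * yy)
xi = np.stack([0.1 * np.cos(2 * np.pi * yy), 0.1 * np.sin(2 * np.pi * xx)])

cell = spacing ** 2
lhs = np.sum(diamond(image, alpha, spacing) * xi) * cell               # (I <> alpha, xi)
rhs = np.sum(alpha * infinitesimal_action(xi, image, spacing)) * cell  # (alpha, xi.I)
```

The two pairings coincide because the identity holds pointwise, so the agreement does not depend on the accuracy of the finite-difference gradient.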

Geodesic regression in this context, using an adjoint optimization method, has been previously studied [31]. Using their method, the initial momentum of a geodesic is constrained to be horizontal: that is, Lξ ₁(0)=I ₀⋄α(0). As a result, changes in the base image I ₀ influence the behavior of the deformation itself.

Using our method, the base velocity vectors ξ _i are not constrained to be horizontal. Implementation of polynomial regression involves the expression above for the diamond map, along with the $\operatorname{ad}$ and $\operatorname {ad}^{*}$ operators [28]

$$\begin{aligned} &\operatorname{ad}_\xi X = \mathrm{D}\xi X - \mathrm{D}X \xi, \end{aligned}$$

(112)

$$\begin{aligned} &\operatorname{ad}_\xi^* m = \mathrm{D}m \xi+ m\operatorname {div}\xi+ (\mathrm{D}\xi)^T m. \end{aligned}$$

(113)

Inserting this into the right Euler-Poincaré equation yields the well-known EPDiff equation for geodesic evolution in the diffeomorphism group [16]:

$$\begin{aligned} \frac{d}{dt}m = - \mathrm{D}m \xi- m\operatorname{div}\xi- (\mathrm{D}\xi)^T m. \end{aligned}$$

(114)
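
The duality between the operators in Eqs. (112) and (113) can be verified numerically. The sketch below restricts to one spatial dimension on a periodic domain, an assumption made purely to keep the example short. There Dξ reduces to ξ′, the divergence is also ξ′, and Eq. (113) becomes $\operatorname{ad}_\xi^* m = m'\xi + 2m\xi'$. Derivatives are computed spectrally, which is exact for the band-limited test functions chosen here.

```python
import numpy as np

n = 64
x = 2.0 * np.pi * np.arange(n) / n
wavenumbers = np.fft.fftfreq(n, d=1.0 / n)  # integer wavenumbers on [0, 2*pi)

def ddx(f):
    """Spectral derivative of a periodic, band-limited function."""
    return np.real(np.fft.ifft(1j * wavenumbers * np.fft.fft(f)))

def ad(xi, X):
    """One-dimensional form of Eq. (112): ad_xi X = xi' X - X' xi."""
    return ddx(xi) * X - ddx(X) * xi

def ad_star(xi, m):
    """One-dimensional form of Eq. (113): ad*_xi m = m' xi + m xi' + xi' m."""
    return ddx(m) * xi + 2.0 * m * ddx(xi)

def pairing(m, X):
    """L2 pairing on the circle, approximated by the trapezoidal rule."""
    return np.sum(m * X) * (2.0 * np.pi / n)

xi = np.sin(x) + 0.3 * np.cos(2 * x)
X = np.cos(x) - 0.5 * np.sin(3 * x)
m = 1.0 + 0.2 * np.sin(2 * x)

lhs = pairing(ad_star(xi, m), X)  # (ad*_xi m, X)
rhs = pairing(m, ad(xi, X))       # (m, ad_xi X)
```

The agreement of the two pairings is integration by parts in disguise, and the same check is a useful unit test for a higher-dimensional implementation of these operators.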

For polynomials, momenta m _i=Lξ _i are introduced and this EPDiff equation is generalized to

$$\begin{aligned} &\frac{d}{dt}m_1 = - \mathrm{D}m_1 \xi_1 - m_1\operatorname{div}\xi_1 - (\mathrm{D}\xi_1)^T m_1 + m_2 \end{aligned}$$

(115)

$$\begin{aligned} &\frac{d}{dt}m_i = m_{i+1} + \frac{1}{2} \bigl( L (\mathrm{D}\xi_1\xi_i-\mathrm{D}\xi_i\xi_1 ) \\ &\hphantom{\frac{d}{dt}m_i =} {}-\mathrm{D}m_i\xi_1-(\mathrm{D}\xi_1)^Tm_i-m_i \operatorname{div}\xi_1 \\ &\hphantom{\frac{d}{dt}m_i =} {}-\mathrm{D}m_1\xi_i-(\mathrm{D}\xi_i)^Tm_1-m_1 \operatorname{div}\xi_i \bigr) \end{aligned}$$

(116)

$$\begin{aligned} &\frac{d}{dt}m_k = \frac{1}{2} \bigl( L (\mathrm{D} \xi_1\xi_k-\mathrm{D}\xi_k\xi_1 ) \\ &\hphantom{\frac{d}{dt}m_k =} {}-\mathrm{D}m_k\xi_1-(\mathrm{D}\xi_1)^Tm_k-m_k \operatorname{div}\xi_1 \\ &\hphantom{\frac{d}{dt}m_k =} {}-\mathrm{D}m_1\xi_k-(\mathrm{D}\xi_k)^Tm_1-m_1 \operatorname{div}\xi_k \bigr) \end{aligned}$$

(117)

The estimation of the base image I ₀ is simplified, as Eq. (105) is solved in closed form using

$$\begin{aligned} I_0(y) = \frac{\sum_j |\mathrm{D}\gamma_j(y)|J_j\circ\gamma_j (y)}{\sum_j |\mathrm{D}\gamma_j(y)|}. \end{aligned}$$

(118)

As an example of image regression, synthetic data were generated and geodesic regression was performed using the adjoint method described above. Figure 7 shows the input images, as well as the estimated geodesic trend, which matches the input data well. Note that although the method presented in [31] is similar, using our abstraction, geodesic regression can be generalized to polynomials of any order, and to data which are not necessarily scalar-valued images.

8 Discussion

The Riemannian polynomial framework we have presented provides a general approach to regression for manifold-valued data. The greatest limitation to performing polynomial regression on a general Riemannian manifold is that it requires computation of the Riemannian curvature tensor, which is often tedious [29]. In a Lie group or homogeneous space, we have shown that the symmetries provided by the group allow for not only simple integration using parallel transport in the Lie algebra, but also simplified adjoint equations that do not require explicit curvature computation.

The theory of rolling maps on the sphere, introduced by Jupp & Kent [18], offers another perspective on Riemannian polynomials. On the sphere, this interpretation is related to the group action described above. Given a curve $\gamma:[0,1]\to\mathbb{S}^{2}$, consider embedding both the sphere and a plane in $\mathbb{R}^{3}$ such that the plane is tangent to the sphere at the point γ(0). Now roll the sphere along the plane so that it remains tangent at γ(t) at every time, and such that no slipping or twisting occurs. The resulting path, $\gamma_{u}:[0,1]\to\mathbb{R}^{2}$, traced out on the plane is called the unwrapped curve. Remarkably, the property that γ is a k-order polynomial on $\mathbb{S}^{2}$ is equivalent to the unwrapped curve γ _u being a k-order polynomial in the conventional sense. For more information regarding this connection to Jupp & Kent’s rolling maps, as well as a comparison to Noakes’ cubic splines [32], the reader is referred to the work of Silva Leite & Krakowski [24].

References

1. Bandulasiri, A., Gunathilaka, A., Patrangenaru, V., Ruymgaart, F., Thompson, H.: Nonparametric shape analysis methods in glaucoma detection. I. J. Stat. Sci. 9, 135–149 (2009)
2. Bookstein, F.L.: Morphometric Tools for Landmark Data: Geometry and Biology. Cambridge Univ. Press, Cambridge (1991)
3. Bou-Rabee, N.: Hamilton-Pontryagin integrators on Lie groups. Ph.D. thesis, California Institute of Technology (2007)
4. Bruveris, M., Gay-Balmaz, F., Holm, D., Ratiu, T.: The momentum map representation of images. J. Nonlinear Sci. 21(1), 115–150 (2011)
5. Burnham, K., Anderson, D.: Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer, New York (2002)
6. Cates, J., Fletcher, P.T., Styner, M., Shenton, M., Whitaker, R.: Shape modeling and analysis with entropy-based particle systems. In: Proceedings of Information Processing in Medical Imaging (IPMI) (2007)
7. Cheeger, J., Ebin, D.G.: Comparison Theorems in Riemannian Geometry, vol. 365. AMS Bookstore, Providence (1975)
8. Davis, B.C., Fletcher, P.T., Bullitt, E., Joshi, S.C.: Population shape regression from random design data. Int. J. Comput. Vis. 90(2), 255–266 (2010)
9. do Carmo, M.P.: Riemannian Geometry, 1st edn. Birkhäuser, Boston (1992)
10. Driesen, N., Raz, N.: The influence of sex, age, and handedness on corpus callosum morphology: A meta-analysis. Psychobiology (1995). doi:10.3758/BF03332028
11. Dryden, I.L., Kume, A., Le, H., Wood, A.T.: A multi-dimensional scaling approach to shape analysis. Biometrika 95(4), 779–798 (2008)
12. Fletcher, P.T.: Geodesic regression and the theory of least squares on Riemannian manifolds. Int. J. Comput. Vis. (2012). doi:10.1007/s11263-012-0591-y
13. Fletcher, P.T., Liu, C., Pizer, S.M., Joshi, S.C.: Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Trans. Med. Imaging 23(8), 995–1005 (2004)
14. Giambò, R., Giannoni, F., Piccione, P.: An analytical theory for Riemannian cubic polynomials. IMA J. Math. Control Inf. 19(4), 445–460 (2002)
15. Hinkle, J., Muralidharan, P., Fletcher, P.T., Joshi, S.C.: Polynomial regression on Riemannian manifolds. In: ECCV, Florence, Italy, vol. 3, pp. 1–14 (2012)
16. Holm, D.D., Marsden, J.E., Ratiu, T.S.: The Euler-Poincaré equations and semidirect products with applications to continuum theories. Adv. Math. 137, 1–81 (1998)
17. Huckemann, S., Hotz, T., Munk, A.: Intrinsic shape analysis: geodesic principal component analysis for Riemannian manifolds modulo Lie group actions. Discussion paper with rejoinder. Stat. Sin. 20, 1–100 (2010)
18. Jupp, P.E., Kent, J.T.: Fitting smooth paths to spherical data. Appl. Stat. 36(1), 34–46 (1987)
19. Kendall, D.G.: Shape manifolds, procrustean metrics, and complex projective spaces. Bull. Lond. Math. Soc. 16(2), 81–121 (1984)
20. Kendall, D.G.: A survey of the statistical theory of shape. Stat. Sci. 4(2), 87–99 (1989)
21. Kenobi, K., Dryden, I.L., Le, H.: Shape curves and geodesic modelling. Biometrika 97(3), 567–584 (2010)
22. Kume, A., Dryden, I.L., Le, H.: Shape-space smoothing splines for planar landmark data. Biometrika 94(3), 513–528 (2007). doi:10.1093/biomet/asm047
23. Le, H., Kendall, D.G.: The Riemannian structure of Euclidean shape spaces: a novel environment for statistics. Ann. Stat. 21(3), 1225–1271 (1993)
24. Silva Leite, F., Krakowski, K.: Covariant differentiation under rolling maps. Departamento de Matemática, Universidade of Coimbra, Portugal (2008), No. 08-22, 1–8
25. Lewis, A., Murray, R.: Configuration controllability of simple mechanical control systems. SIAM J. Control Optim. 35(3), 766–790 (1997)
26. Machado, L., Leite, F.S., Krakowski, K.: Higher-order smoothing splines versus least squares problems on Riemannian manifolds. J. Dyn. Control Syst. 16(1), 121–148 (2010)
27. Marcus, D., Wang, T., Parker, J., Csernansky, J., Morris, J., Buckner, R.: Open access series of imaging studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J. Cogn. Neurosci. 19(9), 1498–1507 (2007)
28. Marsden, J., Ratiu, T.: Introduction to Mechanics and Symmetry: a Basic Exposition of Classical Mechanical Systems, vol. 17. Springer, Berlin (1999)
29. Micheli, M., Michor, P., Mumford, D.: Sectional curvature in terms of the cometric, with applications to the Riemannian manifolds of landmarks. SIAM J. Imaging Sci. 5(1), 394–433 (2012)
30. Miller, M.I., Trouvé, A., Younes, L.: Geodesic shooting for computational anatomy. J. Math. Imaging Vis. 24(2), 209–228 (2006). doi:10.1007/s10851-005-3624-0
31. Niethammer, M., Huang, Y., Vialard, F.X.: Geodesic regression for image time-series. In: Proceedings of Medical Image Computing and Computer Assisted Intervention (MICCAI) (2011)
32. Noakes, L., Heinzinger, G., Paden, B.: Cubic splines on curved surfaces. IMA J. Math. Control Inf. 6, 465–473 (1989)
33. O’Neill, B.: The fundamental equations of a submersion. Mich. Math. J. 13(4), 459–469 (1966)
34. Shi, X., Styner, M., Lieberman, J., Ibrahim, J.G., Lin, W., Zhu, H.: Intrinsic regression models for manifold-valued data. In: Medical Image Computing and Computer-Assisted Intervention—MICCAI 2009, pp. 192–199. Springer, Berlin (2009)
35. Singh, N., Wang, A., Sankaranarayanan, P., Fletcher, P., Joshi, S.: Genetic, structural and functional imaging biomarkers for early detection of conversion from MCI to AD. In: Proceedings of Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 132–140. Springer, Berlin (2012)
36. Turaga, P., Veeraraghavan, A., Srivastava, A., Chellappa, R.: Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition. IEEE Trans. Pattern Anal. Mach. Intell. 33(11), 2273–2286 (2011)
37. Vaillant, M., Glaunes, J.: Surface matching via currents. In: Information Processing in Medical Imaging, pp. 1–5. Springer, Berlin (2005)
38. Younes, L., Qiu, A., Winslow, R., Miller, M.: Transport of relational structures in groups of diffeomorphisms. J. Math. Imaging Vis. 32(1), 41–56 (2008)
39. Yushkevich, P.A., Piven, J., Cody Hazlett, H., Gimpel Smith, R., Ho, S., Gee, J.C., Gerig, G.: User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. NeuroImage 31(3), 1116–1128 (2006)

Author information

Authors and Affiliations

SCI Institute, University of Utah, 72 Central Campus Dr., Salt Lake City, Utah, 84112, USA
Jacob Hinkle, P. Thomas Fletcher & Sarang Joshi

Corresponding author

Correspondence to Jacob Hinkle.

Appendices

Appendix A: Numerical Integration of the Polynomial Equations

By definition, in the limit Δt→0, the exponential map satisfies $\dot{\gamma}(t) = v_{1}(t)$. To see that the forward integration algorithm shown in Algorithm 1 approximates the polynomial equations, let w(t) be any vector field parallel along γ(t). That is,

$$\begin{aligned} \nabla_{\dot{\gamma}(t)} w(t) = 0. \end{aligned}$$

(119)

Denote by $P_{\varDelta t}(t)=\operatorname{ParTrans}(p,\varDelta t v,w)$ the parallel transport of a vector w∈T _p M along a geodesic from point p for time Δt in the direction of vector v∈T _p M. Then

$$\begin{aligned} \frac{d}{dt}\langle w, v_i\rangle&= \langle \nabla_{\dot{\gamma}}w,v_i\rangle+ \langle w, \nabla_{\dot{\gamma}}v_i \rangle= \langle w, \nabla_{\dot{\gamma}}v_i\rangle \end{aligned}$$

(120)

Now consider approximation of this inner product derivative under our integration scheme:

$$\begin{aligned} \frac{d}{dt}\langle w, v_i\rangle&\approx\lim _{\varDelta t\to0} \frac{1}{\varDelta t} \bigl( \bigl\langle P_{\varDelta t}w(t),P_{\varDelta t}\bigl(v_i(t)+\varDelta t v_{i+1}(t)\bigr) \bigr\rangle \\ &\quad{}- \bigl\langle w(t),v_i(t)\bigr\rangle \bigr). \end{aligned}$$

(121)

The parallel transport operator is linear in the vectors being transported, so

$$\begin{aligned} \frac{d}{dt}\langle w, v_i\rangle&\approx\lim _{\varDelta t\to0} \frac{1}{\varDelta t} \bigl( \bigl\langle P_{\varDelta t}w(t),P_{\varDelta t}v_i(t) \bigr\rangle \\ &\quad{}+\varDelta t \bigl\langle P_{\varDelta t}w(t),P_{\varDelta t}v_{i+1}(t) \bigr\rangle - \bigl\langle w(t),v_i(t)\bigr\rangle \bigr) \\ &= \lim_{\varDelta t\to0} \frac{1}{\varDelta t} \bigl( \bigl\langle P_{\varDelta t}w(t),P_{\varDelta t}v_i(t) \bigr\rangle -\bigl\langle w(t),v_i(t)\bigr\rangle \bigr) \\ &\quad{}+\lim_{\varDelta t\to0} \bigl\langle P_{\varDelta t}w(t),P_{\varDelta t}v_{i+1}(t) \bigr\rangle \end{aligned}$$

(122)

The first limit is zero because parallel transport preserves inner products. Also note that $\lim_{\varDelta t\to0}P_{\varDelta t}w=w$, and likewise for $v_{i+1}$, so that

$$\begin{aligned} \frac{d}{dt}\langle w, v_i\rangle= \langle w, \nabla_{\dot{\gamma}}v_i\rangle&\approx\langle w,v_{i+1} \rangle. \end{aligned}$$

(123)

As this holds for any parallel vector field w, this implies that our integration algorithm approximates the polynomial equation

$$\begin{aligned} \nabla_{\dot{\gamma}}v_i = v_{i+1}. \end{aligned}$$

(124)
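
The integration scheme analyzed above can be sketched on the unit sphere, where the exponential map and parallel transport have closed forms. Algorithm 1 is not reproduced in this section, so the update rule below is read off from Eq. (121): each v _i is incremented by Δt v _{i+1} and then parallel transported along the geodesic step Δt v ₁. The function names, the step count, and the initial conditions are assumptions made for the example.

```python
import numpy as np

def sphere_exp(p, u):
    """Exponential map on the unit sphere."""
    theta = np.linalg.norm(u)
    if theta < 1e-14:
        return p.copy()
    return np.cos(theta) * p + np.sin(theta) * (u / theta)

def sphere_par_trans(p, u, w):
    """Parallel transport of w in T_p S^2 along the geodesic t -> Exp_p(t u), t in [0, 1]."""
    theta = np.linalg.norm(u)
    if theta < 1e-14:
        return w.copy()
    e = u / theta
    # Only the component of w along the direction of travel rotates.
    return w + np.dot(e, w) * ((np.cos(theta) - 1.0) * e - np.sin(theta) * p)

def integrate_polynomial(p0, v0, n_steps=1000, t_end=1.0):
    """Integrate grad_{gamma'} v_i = v_{i+1} (and grad_{gamma'} v_k = 0) forward in time."""
    dt = t_end / n_steps
    p = np.asarray(p0, dtype=float)
    v = [np.asarray(vi, dtype=float) for vi in v0]
    k = len(v)
    for _ in range(n_steps):
        step = dt * v[0]
        # v_i <- P_dt(v_i + dt v_{i+1}); the highest-order vector is only transported.
        updated = [v[i] + dt * v[i + 1] if i + 1 < k else v[i] for i in range(k)]
        v = [sphere_par_trans(p, step, w) for w in updated]
        p = sphere_exp(p, step)
    return p, v

p0 = np.array([0.0, 0.0, 1.0])
v1 = np.array([1.0, 0.0, 0.0])
p_geo, _ = integrate_polynomial(p0, [v1])  # k = 1: a geodesic
p_quad, v_quad = integrate_polynomial(p0, [v1, np.array([0.0, 2.0, 0.0])])  # k = 2
```

For k=1 the scheme reproduces the geodesic Exp applied to v ₁ up to round-off, since every step is an exact geodesic segment. For k≥2 it is a first-order approximation whose iterates nevertheless stay on the sphere and in its tangent spaces.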

Appendix B: Derivation of Adjoint Equations in Riemannian Manifolds

In this appendix we derive the adjoint system for the polynomial regression problem. The approach to calculus of variations on Riemannian manifolds described here is very similar to that employed by Noakes et al. [32]. Consider a simplified objective function containing only a single data term, at time T:

$$\begin{aligned} E\bigl(\gamma,\{v_i\},\{\lambda_i\}\bigr) &= \frac{1}{2}d\bigl( \gamma(T),y\bigr)^2 +\int_0^T \langle\lambda_0,\dot{\gamma}-v_1\rangle dt \\ &\quad{}+ \sum_{i=1}^{k-1}\int _0^T\langle\lambda_i, \nabla_{\dot{\gamma}}v_i-v_{i+1}\rangle dt \\ &\quad{}+ \int_0^T\langle\lambda_k, \nabla_{\dot{\gamma}}v_k\rangle dt. \end{aligned}$$

(125)

Now consider taking variations of E with respect to the vector fields v _i. For each i there are only two terms containing v _i, so if W is a test vector field along γ, then the variation of E with respect to v _i in the direction W satisfies

$$\begin{aligned} \int_0^T\langle\delta_{v_i}E,W \rangle dt = \int_0^T\langle \lambda_i,\nabla_{\dot{\gamma}} W\rangle dt -\int_0^T \langle\lambda_{i-1},W\rangle dt. \end{aligned}$$

(126)

The first term is integrated by parts to yield

$$\begin{aligned} \int_0^T\langle\delta_{v_i}E,W \rangle dt &= \langle\lambda_i, W\rangle|_0^T - \int_0^T\langle\nabla_{\dot{\gamma}} \lambda_i, W\rangle dt \\ &\quad{}-\int_0^T\langle\lambda_{i-1},W \rangle dt. \end{aligned}$$

(127)

The variation with respect to v _i for i=1,…,k is then given by

$$\begin{aligned} &\delta_{v_i(t)}E = 0 = -\nabla_{\dot{\gamma}}\lambda_i - \lambda_{i-1} ,\quad t\in(0,T) \end{aligned}$$

(128)

$$\begin{aligned} &\delta_{v_i(T)}E = 0 = \lambda_i(T) \end{aligned}$$

(129)

$$\begin{aligned} &\delta_{v_i(0)}E = -\lambda_i(0). \end{aligned}$$

(130)

In order to determine the differential equation for λ ₀, the variation with respect to γ must be computed. Let W again denote a test vector field along γ. For some ϵ>0, let {γ _s:s∈(−ϵ,ϵ)} be a differentiable family of curves satisfying

$$\begin{aligned} &\gamma_0 = \gamma \end{aligned}$$

(131)

$$\begin{aligned} &\frac{d}{ds}\gamma_s\bigg|_{s=0} = W. \end{aligned}$$

(132)

If ϵ is chosen small enough, the vector field W can be extended to a neighborhood of γ such that $[W,\dot{\gamma_{s}}]=0$, where a dot indicates the derivative in the $\frac{\partial}{\partial t}$ direction. The vanishing Lie bracket implies the following identities

$$\begin{aligned} &\nabla_W\dot{\gamma}_s = \nabla_{\dot{\gamma}_s}W \end{aligned}$$

(133)

$$\begin{aligned} &\nabla_W\nabla_{\dot{\gamma}_s} = \nabla_{\dot{\gamma}_s} \nabla_W + R(W,\dot{\gamma}_s). \end{aligned}$$

(134)

Finally, the vector fields v _i,λ _i are extended along γ _s via parallel translation, so that

$$\begin{aligned} &\nabla_W v_i = 0 \end{aligned}$$

(135)

$$\begin{aligned} &\nabla_W \lambda_i = 0. \end{aligned}$$

(136)

The variation of E with respect to γ satisfies

$$\begin{aligned} \int_0^T\langle\delta_\gamma E,W \rangle dt &= \frac{d}{ds}E\bigl(\gamma_s,\{v_i\}, \{\lambda_i\}\bigr)\big|_{s=0} \\ &= -\bigl\langle \operatorname{Log}_{\gamma(T)}y,W(T)\bigr\rangle \\ &\quad{}+ \frac{d}{ds}\int_0^T \langle \lambda_0,\dot{\gamma_s}-v_1\rangle dt\bigg|_{s=0} \\ &\quad{}+ \frac{d}{ds}\sum_{i=1}^{k-1} \int_0^T\langle\lambda_i, \nabla_{\dot{\gamma}_s}v_i-v_{i+1}\rangle dt\bigg|_{s=0} \\ &\quad{}+ \frac{d}{ds}\int_0^T\langle \lambda_k,\nabla_{\dot{\gamma}_s}v_k\rangle dt\bigg|_{s=0}. \end{aligned}$$

(137)

As the λ _i are extended via parallel translation, their inner products satisfy

$$\begin{aligned} \frac{d}{ds}\langle\lambda_i, U\rangle|_{s=0} = \langle\nabla_W\lambda_i,U\rangle+ \langle \lambda_i,\nabla_W U\rangle= \langle \lambda_i,\nabla_W U\rangle. \end{aligned}$$

(138)

Then applying this to each term in the previous equation,

$$\begin{aligned} \int_0^T\langle\delta_\gamma E,W \rangle dt &= -\bigl\langle \operatorname{Log}_{\gamma (T)}y,W(T)\bigr\rangle \\ &\quad{}+ \int_0^T \langle\lambda_0, \nabla_W\dot{\gamma}-\nabla_Wv_1\rangle dt \\ &\quad{}+ \sum_{i=1}^{k-1}\int _0^T\langle\lambda_i, \nabla_W\nabla_{\dot{\gamma}}v_i-\nabla_Wv_{i+1} \rangle dt \\ &\quad{}+ \int_0^T\langle\lambda_k, \nabla_W\nabla_{\dot{\gamma}}v_k\rangle dt. \end{aligned}$$

(139)

Then by construction, since ∇_W v _i=0,

$$\begin{aligned} \int_0^T\langle\delta_\gamma E,W \rangle dt &= -\bigl\langle \operatorname{Log}_{\gamma (T)}y,W(T)\bigr\rangle \\ &\quad{}+ \int_0^T \langle\lambda_0, \nabla_W\dot{\gamma}\rangle dt \\ &\quad{}+ \sum_{i=1}^{k}\int _0^T\langle\lambda_i, \nabla_W\nabla_{\dot{\gamma}}v_i\rangle dt. \end{aligned}$$

(140)

Then using the Lie bracket and curvature identities, this is written as

$$\begin{aligned} \int_0^T\langle\delta_\gamma E,W \rangle dt &= -\bigl\langle \operatorname{Log}_{\gamma (T)}y,W(T)\bigr\rangle \\ &\quad{}+ \int_0^T \langle\lambda_0, \nabla_{\dot{\gamma}}W\rangle dt \\ &\quad{}+ \sum_{i=1}^{k}\int _0^T\bigl\langle \lambda_i, \nabla_{\dot{\gamma}}\nabla_W v_i + R(W,\dot{ \gamma})v_i\bigr\rangle dt, \end{aligned}$$

(141)

which is further simplified, again using the identity ∇_W v _i=0:

$$\begin{aligned} \int_0^T\langle\delta_\gamma E,W \rangle dt &= -\bigl\langle \operatorname{Log}_{\gamma (T)}y,W(T)\bigr\rangle \\ &\quad{}+ \int_0^T \langle\lambda_0, \nabla_{\dot{\gamma}}W\rangle dt \\ &\quad{}+ \sum_{i=1}^{k}\int _0^T\bigl\langle \lambda_i,R(W, \dot{\gamma})v_i\bigr\rangle dt, \end{aligned}$$

(142)

Using the Bianchi identities, it can be demonstrated that the curvature tensor satisfies the identity [9]:

$$\begin{aligned} \bigl\langle A,R(B,C)D\bigr\rangle = -\bigl\langle B,R(D,A)C\bigr\rangle , \end{aligned}$$

(143)

for any vectors A,B,C,D. The covariant derivative along γ is also integrated by parts to arrive at

$$\begin{aligned} \int_0^T\langle\delta_\gamma E,W \rangle dt &= -\bigl\langle \operatorname{Log}_{\gamma (T)}y,W(T)\bigr\rangle \\ &\quad{}+ \langle\lambda_0,W\rangle|_0^T - \int_0^T \langle\nabla_{\dot{\gamma}} \lambda_0,W\rangle dt \\ &\quad{}- \sum_{i=1}^{k}\int _0^T\bigl\langle R(v_i, \lambda_i)\dot{\gamma},W\bigr\rangle dt. \end{aligned}$$

(144)

Finally, gathering terms, the adjoint equation for λ ₀ and its gradients are obtained:

$$\begin{aligned} &\delta_{\gamma(t)}E = 0 = -\nabla_{\dot{\gamma}}\lambda_0 - \sum_{i=1}^k R(v_i, \lambda_i)\dot{\gamma},\quad t\in(0,T) \end{aligned}$$

(145)

$$\begin{aligned} &\delta_{\gamma(T)}E = 0 = -\operatorname{Log}_{\gamma(T)}y + \lambda_0(T) \end{aligned}$$

(146)

$$\begin{aligned} &\delta_{\gamma(0)}E = -\lambda_0(0). \end{aligned}$$

(147)

Along with the variations with respect to v _i, this constitutes the full adjoint system. Extension to the case of multiple data at multiple time points is trivial, and results in the adjoint system presented in Sect. 4.
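
The curvature identity of Eq. (143), which drives the step from Eq. (142) to Eq. (144), is easy to sanity-check on the unit sphere. The sketch below assumes the constant-curvature formula R(X,Y)Z=⟨Y,Z⟩X−⟨X,Z⟩Y. The identity is linear in R, so the check does not depend on the overall sign convention chosen for the curvature tensor. All names are illustrative.

```python
import numpy as np

def sphere_curvature(X, Y, Z):
    """R(X, Y)Z on the unit sphere (constant sectional curvature one)."""
    return np.dot(Y, Z) * X - np.dot(X, Z) * Y

def tangent_projection(p, w):
    """Project an ambient vector onto the tangent plane at p."""
    return w - np.dot(p, w) * p

rng = np.random.default_rng(2)
p = np.array([0.0, 0.0, 1.0])
A, B, C, D = (tangent_projection(p, w) for w in rng.normal(size=(4, 3)))

lhs = np.dot(A, sphere_curvature(B, C, D))   # <A, R(B, C)D>
rhs = -np.dot(B, sphere_curvature(D, A, C))  # -<B, R(D, A)C>
```

On a general manifold the same check applies once a routine for the curvature tensor is available, which is the step the Lie group formulation avoids.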

Appendix C: Derivation of Adjoint Equations in Lie Groups

Let G be a Lie group with Lie algebra $\mathfrak{g}$, equipped with a right invariant metric. Let γ:[0,1]→G be a polynomial in G of order k with right-trivialized velocities $\xi_{i}:[0,1]\to\mathfrak{g}$. Recall the equations for a perturbation Z,δξ _i of this polynomial:

$$\begin{aligned} &\frac{d}{dt}Z = \delta\xi_1 + \operatorname{ad}_{\xi_1} Z \end{aligned}$$

(148)

$$\begin{aligned} &\frac{d}{dt}\delta\xi_i = {\overline{\nabla}}_{\delta\xi_1} \xi_i + {\overline{\nabla}}_{\xi_1}\delta\xi_i + \delta \xi_{i+1}. \end{aligned}$$

(149)

The second equation can be rewritten

$$\begin{aligned} \frac{d}{dt}\delta\xi_i &= \frac{1}{2}\operatorname{ad}_{\delta \xi_1} \xi_i + \frac{1}{2}\operatorname{sym}_{\delta\xi_1}\xi_i + {\overline{\nabla}}_{\xi_1}\delta\xi_i + \delta\xi_{i+1} \end{aligned}$$

(150)

$$\begin{aligned} &= -\frac{1}{2}\operatorname{ad}_{\xi_i}\delta\xi_1 + \frac{1}{2}\operatorname{sym}_{\xi_i}\delta\xi_1 + {\overline{\nabla}}_{\xi_1}\delta\xi_i + \delta\xi_{i+1} \end{aligned}$$

(151)

$$\begin{aligned} &= (-{\overline{\nabla}}_{\xi_i} + \operatorname{sym}_{\xi_i} )\delta\xi_1 + {\overline{\nabla}}_{\xi_1}\delta\xi_i + \delta\xi_{i+1}. \end{aligned}$$

(152)

This suggests the following matrix form ODE:

$$\begin{aligned} &\frac{d}{dt}\left( \begin{array}{c} Z \\ \delta\xi_1 \\ \vdots\\ \delta\xi_k \end{array} \right) \\ &\quad = \left( \begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c} \operatorname{ad}_{\xi_1} & I & 0 & 0 & \cdots & 0\\ 0 & \operatorname{sym}_{\xi_1} & I & 0 & \cdots & 0 \\ 0 & -{\overline{\nabla}}_{\xi_2}+\operatorname{sym}_{\xi_2} & {\overline{\nabla}}_{\xi_1} & I &\cdots& 0 \\ \vdots& \vdots & & \ddots & \ddots & \vdots\\ 0 & -{\overline{\nabla}}_{\xi_k}+\operatorname{sym}_{\xi_k} & 0 &\cdots& 0 & {\overline{\nabla}}_{\xi_1} \end{array} \right) \\ &\qquad{}\times \left( \begin{array}{c} Z \\ \delta\xi_1 \\ \vdots\\ \delta\xi_k \end{array} \right). \end{aligned}$$

(153)

In order to derive the adjoint Jacobi field, one simply computes the negative adjoint of the matrix in the above equation, which is

$$\begin{aligned} \left( \begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c} -\operatorname{ad}_{\xi_1}^{\dagger}& 0 & 0 & \cdots & 0 \\ -I & -\operatorname{sym}_{\xi_1}^{\dagger}& {\overline{\nabla}}_{\xi _2}^{\dagger}-\operatorname{sym}_{\xi_2}^{\dagger} & \cdots& {\overline{\nabla}}_{\xi_k}^{\dagger}- \operatorname{sym}_{\xi_k}^{\dagger}\\ 0 & -I & -{\overline{\nabla}}_{\xi_1}^{\dagger} & \cdots & 0 \\ \vdots& & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots& -I & -{\overline{\nabla}}_{\xi_1}^{\dagger} \end{array} \right). \end{aligned}$$

(154)

Now note that the adjoint of the ${\overline{\nabla}}_{\xi}$ operator is $-{\overline{\nabla}}_{\xi}$, since (using Eq. (52))

$$\begin{aligned} 2{\overline{\nabla}}_X^{\dagger} Y &= \operatorname{ad}_X^{\dagger} Y + \operatorname{sym}_X^{\dagger} Y \end{aligned}$$

(155)

$$\begin{aligned} &= \operatorname{ad}_X^{\dagger} Y - \operatorname{ad}_X Y + \operatorname{ad}_Y^{\dagger} X \end{aligned}$$

(156)

$$\begin{aligned} &= -\operatorname{ad}_X Y - \operatorname{sym}_X Y \end{aligned}$$

(157)

$$\begin{aligned} &= -2{\overline{\nabla}}_X Y. \end{aligned}$$

(158)
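
The skew-adjointness derived in Eqs. (155)–(158) can be confirmed numerically. Eq. (52) lies outside this section, so the sketch below uses the definitions implied by Eqs. (150) and (156)–(157): ${\overline{\nabla}}_X Y = \frac{1}{2}(\operatorname{ad}_X Y + \operatorname{sym}_X Y)$ with $\operatorname{sym}_X Y = -\operatorname{ad}_X^{\dagger}Y - \operatorname{ad}_Y^{\dagger}X$. The Lie algebra is $\mathfrak{so}(3)$, identified with $\mathbb{R}^{3}$ so that the bracket is the cross product, and the diagonal metric matrix `A` is a hypothetical choice that makes the metric non-bi-invariant.

```python
import numpy as np

A = np.diag([1.0, 2.0, 3.5])  # hypothetical inertia-like metric on so(3) ~ R^3
A_inv = np.linalg.inv(A)

def inner(X, Y):
    """Inner product on the Lie algebra induced by A."""
    return X @ A @ Y

def ad(X, Y):
    """Lie bracket on so(3) under the hat-map identification with R^3."""
    return np.cross(X, Y)

def ad_dagger(X, Y):
    """Metric adjoint of ad_X, defined by <ad_dagger_X Y, Z> = <Y, ad_X Z>."""
    return A_inv @ np.cross(A @ Y, X)

def sym(X, Y):
    """Symmetric operator implied by Eqs. (156)-(157)."""
    return -ad_dagger(X, Y) - ad_dagger(Y, X)

def nabla_bar(X, Y):
    """Reduced covariant derivative as used in Eq. (150)."""
    return 0.5 * (ad(X, Y) + sym(X, Y))

rng = np.random.default_rng(3)
X, Y, Z = rng.normal(size=(3, 3))

# Both quantities should vanish up to round-off.
adjoint_check = inner(ad_dagger(X, Y), Z) - inner(Y, ad(X, Z))
skew_check = inner(nabla_bar(X, Y), Z) + inner(Y, nabla_bar(X, Z))
```

The vanishing of `skew_check` is the numerical counterpart of Eq. (158) and is what allows the adjoint system to be written with ${\overline{\nabla}}$ itself instead of its adjoint.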

Now let $\lambda_{0},\ldots,\lambda_{k}:[0,1]\to\mathfrak{g}$ be adjoint variables representing gradients with respect to position γ and velocities ξ ₁,…,ξ _k. Using the equations above, we write the reduced polynomial adjoint equations as

$$\begin{aligned} &\frac{d}{dt}\lambda_0 = -\operatorname{ad}_{\xi_1}^{\dagger} \lambda_0 \end{aligned}$$

(159)

$$\begin{aligned} &\frac{d}{dt}\lambda_1 = -\lambda_0 - \operatorname{sym}_{\xi_1}^{\dagger}\lambda_1 + \sum _{i=2}^k \bigl(-{\overline{\nabla}}_{\xi_i}- \operatorname{sym}_{\xi_i}^{\dagger}\bigr)\lambda_i \end{aligned}$$

(160)

$$\begin{aligned} &\frac{d}{dt}\lambda_i = -\lambda_{i-1} + {\overline{\nabla}}_{\xi_1}\lambda_i\quad i=2,\ldots,k. \end{aligned}$$

(161)

The first adjoint variable, λ ₀, takes on jump discontinuities when passing data points, which are derived identically to the geodesic case. Also note that this derivation is for right invariant metrics using right trivialized vectors, but the equivalent derivation in the case of left invariance is essentially identical.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

About this article

Cite this article

Hinkle, J., Fletcher, P.T. & Joshi, S. Intrinsic Polynomials for Regression on Riemannian Manifolds. J Math Imaging Vis 50, 32–52 (2014). https://doi.org/10.1007/s10851-013-0489-5

Received: 11 February 2013
Accepted: 28 December 2013
Published: 22 February 2014
Issue Date: September 2014
DOI: https://doi.org/10.1007/s10851-013-0489-5
