1 Introduction

Beam and rod theories have been developed to model three-dimensional solid structures that are much longer in one dimension than in the other two. The classical Euler-Bernoulli beam theory considers the extension and compression of a rod or beam and allows stretching, compressing or bending loads [6, 24]; it is generally suitable for modelling a thin rod undergoing small deformation. The Timoshenko-Ehrenfest beam theory was developed to take into account shear deformation induced by rotational bending effects, making it suitable for modelling thick beams with larger deformation [19, 23]. The Kirchhoff-Love and Cosserat rod theories were developed to model rods with finite deformation: the former allows bending and twisting while ignoring stretching, compressing and shearing deformation [21, 56], whereas the latter allows all types of loads and deformation [2, 18, 29, 74, 75]. This paper formulates an optimal control problem based upon the special Cosserat theory of rods. Cosserat rod theory is geometrically exact for modelling bending and torsion as well as extension and shear. It can be regarded as a geometrically nonlinear generalisation of the Timoshenko-Ehrenfest beam (while Kirchhoff-Love is a geometrically nonlinear generalisation of the Euler-Bernoulli beam) [2, 74], and has been adopted to model the locomotion of Caenorhabditis elegans (C. elegans) in recent years [31, 68].

Optimal control is a branch of mathematical optimisation which seeks to optimise an objective function of a dynamical system, usually described by partial differential equations, by controlling meaningful variables of the system [78]. Optimal control theory has a wide range of applications, from classical control of solid structures [45, 60, 64, 77, 84] to optimal flow control [33, 43, 44, 61, 65], including recent control formulations for fluid-structure interaction systems [13, 14, 62, 80, 81, 82]. The objective can be a desired deformed configuration of an elastic solid controlled by a set of load parameters [45], drag force reduction of a flow system by shape optimisation [33, 61, 65] or active turbulence control at the boundary layer [15, 22, 46, 50, 59, 61]; it could also be velocity tracking by controlling a body force [3, 36, 38, 40, 43, 44, 55, 57, 58, 66] or a boundary force [3, 4, 26, 28, 37, 38, 41]; the objective may also be reducing vorticity [1, 3, 66] or matching a turbulence kinetic energy [3, 57, 58]. Velocity-tracking optimal control has a rigorous mathematical theory for the existence of its solution [1, 26, 37, 39] and the stability of its numerical algorithms [39, 41, 43, 44].

In the context of optimal control of a rod or beam, the existence of solutions for optimal control of the longitudinal vibration of a viscoelastic rod, by either a contact force or a distributed force, is discussed in [73]. Minimisation of the mean mechanical energy by a boundary force is studied using the calculus of variations [30], the maximum principle [70] and the Ritz method [51]. Minimisation of the mean square deviation of the Timoshenko beam is investigated by controlling a distributed force [71] or the angular acceleration [83], and the singularity of its solution is discussed in [69]. Optimal control of the transverse vibration of an Euler-Bernoulli beam is introduced in [79]. In this paper we consider displacement tracking of the Cosserat rod, which, to the best of our knowledge, has not been studied before. Because rods are long and thin, accurately capturing their rotation vectors is challenging, whereas there are several well-studied approaches for reconstructing the centreline [8, 27, 32]. To investigate the applicability of our method, we consider a case study on the nematode C. elegans: we apply the optimal control formulation to reconstruct the locomotion of C. elegans from laboratory data [72].

Locomotion, the ability of an organism to move from one place to another, is achieved by animals through a variety of methods [9, 16, 35]. C. elegans is a transparent nematode about \(1\,\textrm{mm}\) long [11, 76], whose planar undulatory locomotion has been widely studied [17, 48] by laboratory experiments [8, 25, 47, 54] or mathematical modelling [31, 63, 68]. In its natural habitat, this nematode moves in three-dimensional environments. However, such locomotion has only recently begun to be recorded [52, 72]. Of particular interest is a recent modelling study of a roll manoeuvre modelled as a torsional turn [10]. One key challenge in interpreting 3D video footage of locomotion is that the images (and hence the centreline reconstructions) lack information linking the body shape to the local, anatomically meaningful frame of the body (its left, right, ventral and dorsal directions; see Figure 4), as well as information about internal torsion or twist along the body.

In this paper, we propose a method that combines laboratory data of the motion of the C. elegans centreline with mathematical modelling to reconstruct the whole picture of C. elegans locomotion: how does the worm wriggle and wiggle locally through its body (how does its anatomical frame evolve)? The centreline data of C. elegans can be constructed using videos from three different perspectives [72]. However, it is challenging to construct the local frames (information about the internal torsion or twist along the body), which is the motivation for developing the method proposed in this paper. We point out that the proposed optimal control formulation is general and not limited to C. elegans, but for simplicity we restrict the example formulation to neglect inertial terms [17] (consistent with C. elegans being a low Reynolds number swimmer).

The contributions of this paper are as follows: a monolithic optimal control method is developed based upon the special Cosserat theory of rods; an incompressibility condition is derived and integrated into the forward as well as the control problem; the formulation is implicit, and the primal and adjoint equations are solved in a fully-coupled manner; the new optimal control method is applied to a challenging inverse problem, namely the reconstruction of C. elegans locomotion from laboratory centreline data; and the method is implemented in the open-source software package FreeFEM++, which is available on the public GitHub site.

The paper is organised as follows. The governing partial differential equations of the Cosserat rod are introduced in Section 2, with a focus on expressing these governing equations in a closed component form. The optimisation problem and the corresponding primal and adjoint equations are derived in Section 3, followed by a monolithic optimal control formulation in Section 4. Numerical experiments are carried out in Section 5 to validate both the forward and the optimal control formulations, and the proposed optimal control method is applied to the reconstruction of C. elegans locomotion in Section 6. Finally, conclusions are drawn and future work is discussed in Section 7.

2 Governing equations for the Cosserat rod

First, two coordinate systems (frames) and the relations between them are introduced in order to describe the geometry of the Cosserat rod. Then, the mechanics of the Cosserat rod is described by the conservation of linear momentum and angular momentum, and a set of constitutive equations is introduced to close the system. Finally, the governing equations are expressed in terms of six unknown variables: three components of the position vector \((x, y, z)\) and three components of the rotation vector \((\alpha , \beta , \gamma ) \). This formulation is based on the one presented in [12]: we derive the angular velocity and generalised curvature using a new method in Section 2.3 and rewrite all the governing equations in a matrix-vector format; in addition, we account for dilation of the cross section of the rod by distinguishing between the reference and current arc lengths and by deriving an incompressibility condition in Section 2.4.

Fig. 1 A sketch of the Cosserat rod

2.1 Global and local frames

In order to describe all types of deformation of the Cosserat rod, a global coordinate system \(\left[ \textbf{e}_1, \textbf{e}_2, \textbf{e}_3 \right] \), assumed to form a fixed right-handed orthonormal basis (also called the fixed frame), is first introduced as shown in Figure 1. It defines the centreline of the rod as a three-dimensional curve \(\textbf{r}(s, t)=x(s,t)\textbf{e}_1 + y(s,t)\textbf{e}_2 + z(s,t)\textbf{e}_3\), where \(s\in \left[ a(t), b(t)\right] \) is the arc-length parameter of the curve and \(t\) denotes time. A local coordinate system (the moving frame) \(\left[ \textbf{d}_1 (s,t), \textbf{d}_2(s,t), \textbf{d}_3(s,t)\right] \), also an orthonormal basis, is introduced at every point of the centreline to describe the motion of the rod’s cross section. It is assumed that \(\textbf{d}_3(s,t)\) is always perpendicular to the cross section (but does not necessarily coincide with the tangent \(\partial _s\textbf{r}(s,t)\) of the centreline), which facilitates the expressions of the moment of inertia and the constitutive relations. This local coordinate system can be constructed from the global coordinate system by the following three successive rotations.

Remark 1

The arc length \(s\) refers to the current configuration and is generally not constant in time, especially in the case of large extension or compression [31]. We introduce a reference (initial) configuration with arc length \(\tilde{s}=s_0\in \left[ a_0, b_0\right] \) to compute the strain (equations (9) and (10)), and consider \(s=s(\tilde{s},t)\) as a function of \(\tilde{s}\) and time \(t\). For notational convenience in the following sections, we also introduce the deformation scalar \(j(\tilde{s},t)=ds(\tilde{s},t)/d\tilde{s}\).

Step 1: Rotate the \(\textbf{e}_1-\textbf{e}_3\) plane clockwise around the \(\textbf{e}_2\) axis by an angle \(\gamma \), so that the \(\textbf{e}_3\) axis sits on the \(\textbf{d}_2-\textbf{d}_3\) plane as shown in Figure 2 (left), i.e.: perform a rotation operation \(\left[ \textbf{e}_1, \textbf{e}_2, \textbf{e}_3 \right] \textbf{R}_y^T(\gamma )\) with

$$\begin{aligned} \textbf{R}_y = \left[ \begin{array}{ccc} \cos \gamma &{} 0 &{} \sin \gamma \\ 0 &{} 1 &{} 0 \\ -\sin \gamma &{} 0 &{} \cos \gamma \\ \end{array}\right] . \end{aligned}$$

Step 2: Rotate the \(\textbf{e}_2-\textbf{e}_3\) plane clockwise around the \(\textbf{e}_1\) axis by an angle \(\beta \), so that the \(\textbf{e}_3\) axis overlaps with the \(\textbf{d}_3\) axis as shown in Figure 2 (middle), i.e.: perform another rotation operation \(\left[ \textbf{e}_1, \textbf{e}_2, \textbf{e}_3 \right] \textbf{R}_y^T(\gamma )\textbf{R}_x^T(\beta )\) with

$$\begin{aligned} \textbf{R}_x=\left[ \begin{array}{ccc} 1 &{} 0 &{} 0 \\ 0 &{} \cos \beta &{} -\sin \beta \\ 0 &{} \sin \beta &{} \cos \beta \\ \end{array}\right] . \end{aligned}$$

Step 3: Rotate the \(\textbf{e}_1-\textbf{e}_2\) plane clockwise around the \(\textbf{e}_3\) axis by an angle \(\alpha \), so that \(\left[ \textbf{e}_1, \textbf{e}_2, \textbf{e}_3 \right] \) overlaps with \(\left[ \textbf{d}_1, \textbf{d}_2, \textbf{d}_3\right] \) as shown in Figure 2 (right), i.e.: perform the final rotation operation \(\left[ \textbf{e}_1, \textbf{e}_2, \textbf{e}_3 \right] \textbf{R}_y^T(\gamma )\textbf{R}_x^T(\beta )\textbf{R}_z^T(\alpha )\) with

$$\begin{aligned} \textbf{R}_z = \left[ \begin{array}{ccc} \cos \alpha &{} -\sin \alpha &{} 0 \\ \sin \alpha &{} \cos \alpha &{} 0\\ 0 &{} 0 &{} 1 \\ \end{array}\right] . \end{aligned}$$

Fig. 2 Three rotations of the coordinate system

The overall rotation matrix can be expressed as:

$$\begin{aligned} \begin{aligned} \textbf{Q}&=\textbf{R}_z\textbf{R}_x\textbf{R}_y\\&=\left[ \begin{array}{ccc} \cos \alpha \cos \gamma - \sin \alpha \sin \beta \sin \gamma &{} - \sin \alpha \cos \beta &{} \cos \alpha \sin \gamma + \sin \alpha \sin \beta \cos \gamma \\ \sin \alpha \cos \gamma + \cos \alpha \sin \beta \sin \gamma &{} \cos \alpha \cos \beta &{} \sin \alpha \sin \gamma - \cos \alpha \sin \beta \cos \gamma \\ -\cos \beta \sin \gamma &{} \sin \beta &{} \cos \beta \cos \gamma \\ \end{array}\right] \end{aligned} \end{aligned}$$
(1)

where all three angles \(\alpha \), \(\beta \) and \(\gamma \) are functions of the arc length \(s\) and time \(t\): \(\alpha =\alpha (s,t)\), \(\beta =\beta (s,t)\) and \(\gamma =\gamma (s,t)\). Therefore, the local coordinate system can be obtained by

$$\begin{aligned} \left[ \textbf{d}_1 (s,t), \textbf{d}_2(s,t), \textbf{d}_3(s,t)\right] =\left[ \textbf{e}_1, \textbf{e}_2, \textbf{e}_3 \right] \textbf{Q}^T. \end{aligned}$$
(2)

The components of any vector \(\textbf{v}\) in these two coordinate systems are related as follows: if \(\textbf{v}\) is expanded in the global frame as \(\textbf{v}=\sum _{i=1}^{3}v_i^g\textbf{e}_i\) and in the local frame as \(\textbf{v}=\sum _{i=1}^{3}v_i^l\textbf{d}_i\), then (noticing that \(\textbf{Q}\) is an orthogonal matrix)

$$\begin{aligned} \textbf{v}=\left[ \textbf{e}_1, \textbf{e}_2, \textbf{e}_3 \right] \begin{pmatrix} v_1^g \\ v_2^g \\ v_3^g \end{pmatrix} =\left[ \textbf{d}_1, \textbf{d}_2, \textbf{d}_3 \right] \textbf{Q} \begin{pmatrix} v_1^g \\ v_2^g \\ v_3^g \end{pmatrix}, \end{aligned}$$
(3)

which implies

$$\begin{aligned} \begin{pmatrix} v_1^l \\ v_2^l \\ v_3^l \end{pmatrix} =\textbf{Q} \begin{pmatrix} v_1^g \\ v_2^g \\ v_3^g \end{pmatrix}. \end{aligned}$$
(4)

In the rest of this article, we use the superscript ‘g’ to indicate the components of a vector expanded in the global frame and ‘l’ in the local frame.
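The construction in equations (1), (2) and (4) can be checked numerically. The following is a minimal sketch in plain Python, assuming the global basis vectors are the standard unit vectors (so that each \(\textbf{d}_i\) is a row of \(\textbf{Q}\)); the helper names and the test angles are illustrative choices and are not taken from the paper's implementation.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose of a 3x3 matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def matvec(a, v):
    """Product of a 3x3 matrix with a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def rotation_q(alpha, beta, gamma):
    """Q = R_z(alpha) R_x(beta) R_y(gamma), as in equation (1)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    r_y = [[cg, 0.0, sg], [0.0, 1.0, 0.0], [-sg, 0.0, cg]]
    r_x = [[1.0, 0.0, 0.0], [0.0, cb, -sb], [0.0, sb, cb]]
    r_z = [[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]]
    return matmul(r_z, matmul(r_x, r_y))

q = rotation_q(0.3, -0.5, 1.1)      # arbitrary (hypothetical) test angles

# Q is orthogonal, so Q Q^T should be the identity matrix.
qqt = matmul(q, transpose(q))

# With e_i the standard basis, d_3 is the third row of Q (equation (2)).
d3 = q[2]

# Equation (4): local components of a vector given in the global frame.
v_global = [1.0, 2.0, 3.0]
v_local = matvec(q, v_global)

# Applying Q^T recovers the global components.
v_back = matvec(transpose(q), v_local)
```

Because \(\textbf{Q}\) is orthogonal, the inverse transformation needs only a transpose, which is why the later equations move between frames using \(\textbf{Q}\) and \(\textbf{Q}^T\) alone.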

2.2 Conservation laws

The governing equations of the Cosserat rod are based on the conservation of linear momentum and conservation of angular momentum as follows [2]:

$$\begin{aligned}{} & {} \rho (s)A(s,t)\partial _{tt}{} \textbf{r}(s,t) =\partial _s\textbf{n}(s,t) + \textbf{f}(s,t), \end{aligned}$$
(5)
$$\begin{aligned}{} & {} \partial _t\textbf{h}(s,t) =\partial _s\textbf{m}(s,t) + \partial _s\textbf{r}(s,t)\times \textbf{n}(s,t)+ \textbf{l}(s,t), \end{aligned}$$
(6)

where \(\textbf{n}\) and \(\textbf{m}\) are the internal force and torque respectively, \(\textbf{f}\) and \(\textbf{l}\) are the external force and torque densities (per unit reference length) respectively, \(\rho (s)\) and \(A(s,t)\) are the density and the area of the cross section respectively, and \(\textbf{h}\) is the angular momentum (per unit reference length).

In equations (5) and (6), it is convenient to express all the vectors in the local frame except \(\textbf{r}(s,t)\). Therefore, we first express these vectors in the local frame, and then transform them to the global frame using (4), which will finally be substituted into (5) and (6) in order to obtain an equation system in its component form.

The angular momentum \(\textbf{h}=\sum _{i=1}^{3}h_i^l\textbf{d}_i\) can be expressed as:

$$\begin{aligned} \begin{pmatrix} h_1^l \\ h_2^l \\ h_3^l \end{pmatrix} =\textbf{I}(s) \begin{pmatrix} \omega _1^l \\ \omega _2^l \\ \omega _3^l \end{pmatrix} =\left[ \begin{array}{ccc} I_{11} &{} 0 &{} 0 \\ 0 &{} I_{22} &{} 0\\ 0 &{} 0 &{} I_{33} \\ \end{array}\right] \begin{pmatrix} \omega _1^l \\ \omega _2^l \\ \omega _3^l \end{pmatrix} \end{aligned}$$
(7)

with \({\varvec{\omega }}=\sum _{i=1}^{3}\omega _i^l\textbf{d}_i\) denoting the generalised angular velocity, and \(\textbf{I}(s)\) denoting the moment of inertia (per unit reference length). Let \((\xi , \eta , \zeta )\) denote the coordinates in the local frame, then \(\textbf{I}(s)\) can be computed as follows, noticing that \(\textbf{d}_3\) is perpendicular to the cross section:

$$\begin{aligned} I_{11}= & {} \int _{A(s)}\rho \eta ^2d\xi d\eta , \quad I_{22}=\int _{A(s)}\rho \xi ^2d\xi d\eta , \nonumber \\ I_{33}= & {} \int _{A(s)}\rho \left( \xi ^2+\eta ^2\right) d\xi d\eta . \end{aligned}$$
(8)

In order to close the equation system (5) and (6), constitutive relations for \(\textbf{n}\) and \(\textbf{m}\) have to be established in terms of the unknown variables. In the local frame, we adopt a linear relation between the internal force \(\textbf{n}(s,t)\) and the strain \({\varvec{\epsilon }}(s,t)\), and a linear relation between the internal torque \(\textbf{m}(s,t)\) and the curvature \({\varvec{\kappa }}(s,t)\) [12, 31] as follows. Let

$$\begin{aligned} \textbf{n}(s,t)= & {} \sum _{i=1}^{3}n_i^g(s,t)\textbf{e}_i, \quad {\varvec{\epsilon }}(s,t):=\partial _{\tilde{s}}\textbf{r}(s,t) \nonumber \\= & {} \sum _{i=1}^{3}\epsilon _i^g(s,t)\textbf{e}_i, \end{aligned}$$
(9)

and

$$\begin{aligned} \textbf{n}(s,t)= & {} \sum _{i=1}^{3}n_i^l(s,t)\textbf{d}_i(s,t), \quad {\varvec{\epsilon }}(s,t):=\partial _{\tilde{s}}\textbf{r}(s,t)\nonumber \\= & {} \sum _{i=1}^{3}\epsilon _i^l(s,t)\textbf{d}_i(s,t), \end{aligned}$$
(10)

a linear relation, in the local frame, between \(\textbf{n}(s,t)\) and \(\varvec{\epsilon }(s,t)\) can be expressed as

$$\begin{aligned} \begin{pmatrix} n_1^l \\ n_2^l \\ n_3^l \end{pmatrix} =\textbf{K} \begin{pmatrix} \epsilon _1^l \\ \epsilon _2^l \\ \epsilon _3^l - 1 \end{pmatrix} =\left[ \begin{array}{ccc} K_{11} &{} 0 &{} 0 \\ 0 &{} K_{22} &{} 0\\ 0 &{} 0 &{} K_{33} \\ \end{array}\right] \begin{pmatrix} \epsilon _1^l \\ \epsilon _2^l \\ \epsilon _3^l-1 \end{pmatrix},\nonumber \\ \end{aligned}$$
(11)

where

$$\begin{aligned} K_{11}=K_{22}=kGA(s,t), \quad K_{33}=EA(s,t). \end{aligned}$$
(12)

\(E\) and \(G\) are the Young’s and shear moduli respectively, \(k\) is a numerical factor depending on the shape of the cross section at \(s\) [34], and \(A(s,t)\) is the area of the rod’s cross section. We assume \(A(s,t)\) is a function of space \(s\) and time \(t\), and an incompressibility assumption will be used to determine \(A(s,t)\) in Section 2.4. Using the transformation (4) between the local and global coordinates, (11) can be expressed as

$$\begin{aligned} \begin{aligned} \begin{pmatrix} n_1^g \\ n_2^g \\ n_3^g \end{pmatrix}&=\textbf{Q}^T\textbf{K}{} \textbf{Q} \begin{pmatrix} \epsilon _1^g \\ \epsilon _2^g \\ \epsilon _3^g \end{pmatrix} - \textbf{Q}^T \begin{pmatrix} 0 \\ 0 \\ K_{33} \end{pmatrix} \\&=j(\tilde{s},t) \textbf{Q}^T\textbf{K}{} \textbf{Q} \partial _s \begin{pmatrix} x(s,t) \\ y(s,t) \\ z(s,t) \end{pmatrix} - \textbf{Q}^T \begin{pmatrix} 0 \\ 0 \\ K_{33} \end{pmatrix}. \end{aligned} \end{aligned}$$
(13)
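As an illustration of the constitutive law (11) and its global form (13), the sketch below evaluates \(\textbf{n}^g=\textbf{Q}^T\textbf{K}\left( \textbf{Q}{\varvec{\epsilon }}^g-(0,0,1)^T\right) \) in plain Python. The stiffness values, the rotation angle and the function name are hypothetical and chosen only for this example. An unstretched, unsheared rod (\({\varvec{\epsilon }}=\textbf{d}_3\)) should carry no internal force, and a pure stretch along \(\textbf{d}_3\) should give a force along \(\textbf{d}_3\).

```python
import math

def internal_force_global(q, k_diag, eps_global):
    """Global components of n from n^g = Q^T K (Q eps^g - e_3)."""
    # Strain components in the local frame, equation (4).
    eps_local = [sum(q[i][k] * eps_global[k] for k in range(3))
                 for i in range(3)]
    eps_local[2] -= 1.0          # remove the unit stretch along d_3
    # Diagonal stiffness K acting in the local frame, equation (11).
    n_local = [k_diag[i] * eps_local[i] for i in range(3)]
    # Back to the global frame with Q^T.
    return [sum(q[i][k] * n_local[i] for i in range(3)) for k in range(3)]

beta = 0.6                                     # hypothetical rotation about e_1
cb, sb = math.cos(beta), math.sin(beta)
q = [[1.0, 0.0, 0.0], [0.0, cb, -sb], [0.0, sb, cb]]   # Q with alpha = gamma = 0
d3 = q[2]                                      # third row of Q is d_3

k_diag = [2.0, 2.0, 5.0]                       # hypothetical K_11, K_22, K_33

n_rest = internal_force_global(q, k_diag, d3)                       # eps = d_3
n_stretch = internal_force_global(q, k_diag, [1.1 * c for c in d3]) # 10% stretch
```

In this setting `n_stretch` equals \(0.1\,K_{33}\,\textbf{d}_3\), which reflects the role of the \(\epsilon _3^l-1\) term in (11): only the deviation from unit stretch along \(\textbf{d}_3\) generates axial force.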

Similarly, let

$$\begin{aligned} \textbf{m}(s,t)=\sum _{i=1}^{3}m_i^l(s,t)\textbf{d}_i(s,t), \quad {\varvec{\kappa }}(s,t)=\sum _{i=1}^{3}\kappa _i^l(s,t)\textbf{d}_i(s,t). \end{aligned}$$
(14)

Then, a linear relation, in the local frame, between \(\textbf{m}(s,t)\) and \(\varvec{\kappa }(s,t)\) can be expressed as

$$\begin{aligned} \begin{pmatrix} m_1^l \\ m_2^l \\ m_3^l \end{pmatrix} = \textbf{J} \begin{pmatrix} \kappa _1^l \\ \kappa _2^l \\ \kappa _3^l \end{pmatrix} = \left[ \begin{array}{ccc} J_{11} &{} 0 &{} 0 \\ 0 &{} J_{22} &{} 0\\ 0 &{} 0 &{} J_{33} \\ \end{array} \right] \begin{pmatrix} \kappa _1^l \\ \kappa _2^l \\ \kappa _3^l \end{pmatrix}, \end{aligned}$$
(15)

where

$$\begin{aligned} J_{11}= & {} \int _{A(s)}E\eta ^2d\xi d\eta , \quad J_{22}=\int _{A(s)}E\xi ^2d\xi d\eta , \nonumber \\ J_{33}= & {} \int _{A(s)}G\left( \xi ^2+\eta ^2\right) d\xi d\eta \end{aligned}$$
(16)

Remark 2

For the main part of this paper, we consider a circular cross section (with the exception of a rectangular cross section used in numerical test 5.1 for validation against a published result), and constant density \(\rho \), Young’s modulus \(E\) and shear modulus \(G\). In this case, \(I_{11}=I_{22}=\rho A^2/(4\pi )\), \(I_{33}=\rho A^2/(2\pi )\), \(J_{11}=J_{22}=E A^2/(4\pi )\) and \(J_{33}=G A^2/(2\pi )\).
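As a brief check of the expressions in Remark 2, consider a circular cross section of radius \(R\) (a symbol introduced here only for illustration), so that \(A=\pi R^2\). Using polar coordinates \(\xi =r\cos \theta \), \(\eta =r\sin \theta \) in (8) and (16):

$$\begin{aligned} \int _{A}\eta ^2d\xi d\eta&=\int _0^{2\pi }\int _0^R r^2\sin ^2\theta \, r\,dr\,d\theta =\frac{\pi R^4}{4}=\frac{A^2}{4\pi }, \\ \int _{A}\left( \xi ^2+\eta ^2\right) d\xi d\eta&=\int _0^{2\pi }\int _0^R r^3\,dr\,d\theta =\frac{\pi R^4}{2}=\frac{A^2}{2\pi }. \end{aligned}$$

By symmetry the integral of \(\xi ^2\) equals that of \(\eta ^2\). Multiplying by the constants \(\rho \), \(E\) or \(G\) then gives the stated values of \(I_{ii}\) and \(J_{ii}\).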

In the spirit of expressing all the unknown variables in terms of \((x, y, z)\) and \((\alpha ,\beta ,\gamma )\), we further express the angular velocity \(\varvec{\omega }\) and curvature \(\varvec{\kappa }\) in terms of the rotation angles \((\alpha ,\beta ,\gamma )\) in the following section.

2.3 Expressions of angular velocity and curvature in terms of the angles of rotation

For any vector function \(\textbf{v}\) of fixed length, say a function of \(t\),

$$\begin{aligned} \textbf{v}\cdot \textbf{v}=c \Rightarrow (\partial _t\textbf{v})\cdot \textbf{v}=0, \end{aligned}$$

where \(c\) is a constant. This implies that \(\partial _t\textbf{v}\) is always perpendicular to \(\textbf{v}\). If the vector \(\textbf{v}\) rotates with an angular velocity \(\varvec{\omega }\) as shown in Figure 3, we have

$$\begin{aligned} \partial _t\textbf{v}={\varvec{\omega }}\times \textbf{v}. \end{aligned}$$
(17)

Fig. 3 Rotation of vector \(\textbf{v}\) with angular velocity \(\varvec{\omega }\), where \(r=\Vert \textbf{v}\Vert \sin \theta \) and \(\Vert \partial _t\textbf{v}\Vert =r\Vert {\varvec{\omega }}\Vert \)

Since \(\textbf{d}_1\), \(\textbf{d}_2\) and \(\textbf{d}_3\) are all unit vectors, we can apply property (17) to these three vectors to obtain

$$\begin{aligned}{} & {} \sum _{i=1}^{i=3}{} \textbf{d}_i\times \partial _t\textbf{d}_i =\sum _{i=1}^{i=3}\textbf{d}_i\times \left( {\varvec{\omega }}\times \textbf{d}_i\right) =\sum _{i=1}^{i=3}{\varvec{\omega }}\left( \textbf{d}_i\cdot \textbf{d}_i\right) \nonumber \\{} & {} \quad -\sum _{i=1}^{i=3}\textbf{d}_i\left( {\varvec{\omega }}\cdot \textbf{d}_i\right) =2{\varvec{\omega }}. \end{aligned}$$
(18)

Following the same argument, we also have:

$$\begin{aligned} \sum _{i=1}^{i=3}{} \textbf{d}_i\times \partial _s\textbf{d}_i =2{\varvec{\kappa }}. \end{aligned}$$
(19)

Now, let \(\textbf{Q}^T=\left[ \textbf{q}_1, \textbf{q}_2, \textbf{q}_3\right] \) with \(\textbf{q}_i^T=\left( q_{i1}, q_{i2}, q_{i3}\right) \), \(i=1,2,3\), being the row vectors of \(\textbf{Q}\), then from (2) we have

$$\begin{aligned}{} & {} \textbf{d}_i=\left[ \textbf{e}_1, \textbf{e}_2, \textbf{e}_3\right] \begin{pmatrix} q_{i1} \\ q_{i2} \\ q_{i3} \end{pmatrix},\nonumber \\{} & {} \quad i=1,2,3, \end{aligned}$$
(20)

and

$$\begin{aligned}{} & {} \partial _t\textbf{d}_i =\left[ \textbf{e}_1, \textbf{e}_2, \textbf{e}_3\right] \partial _t \begin{pmatrix} q_{i1} \\ q_{i2} \\ q_{i3} \end{pmatrix}\nonumber \\{} & {} =\left[ \textbf{d}_1, \textbf{d}_2, \textbf{d}_3\right] \textbf{Q} \partial _t \begin{pmatrix} q_{i1} \\ q_{i2} \\ q_{i3} \end{pmatrix}, \quad i=1,2,3. \end{aligned}$$
(21)

Using the fact that for any two vectors \(\textbf{u}=\sum _{i=1}^{3}u_i^l\textbf{d}_i\) and \(\textbf{v}=\sum _{i=1}^{3}v_i^l\textbf{d}_i\),

$$\begin{aligned} \textbf{u}\times \textbf{v}= \left[ \textbf{d}_1, \textbf{d}_2, \textbf{d}_3\right] \left[ \begin{array}{ccc} 0 &{} -u_3^l &{} u_2^l \\ u_3^l &{} 0 &{} -u_1^l\\ -u_2^l &{} u_1^l &{} 0 \\ \end{array}\right] \begin{pmatrix} v_1^l \\ v_2^l \\ v_3^l \end{pmatrix}, \end{aligned}$$
(22)

we can compute the cross products in (18):

$$\begin{aligned} \textbf{d}_1\times \partial _t\textbf{d}_1&=\left[ \textbf{d}_1, \textbf{d}_2, \textbf{d}_3\right] \left[ \begin{array}{ccc} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} -1\\ 0 &{} 1 &{} 0 \\ \end{array}\right] \textbf{Q}\, \partial _t \begin{pmatrix} q_{11} \\ q_{12} \\ q_{13} \end{pmatrix} \\&=\left[ \textbf{d}_1, \textbf{d}_2, \textbf{d}_3\right] \begin{pmatrix} 0 \\ -\textbf{q}_3\cdot \partial _t\textbf{q}_1 \\ \textbf{q}_2\cdot \partial _t\textbf{q}_1 \end{pmatrix}, \end{aligned}$$
(23)
$$\begin{aligned} \textbf{d}_2\times \partial _t\textbf{d}_2&=\left[ \textbf{d}_1, \textbf{d}_2, \textbf{d}_3\right] \left[ \begin{array}{ccc} 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 0\\ -1 &{} 0 &{} 0 \\ \end{array}\right] \textbf{Q}\, \partial _t \begin{pmatrix} q_{21} \\ q_{22} \\ q_{23} \end{pmatrix} \\&=\left[ \textbf{d}_1, \textbf{d}_2, \textbf{d}_3\right] \begin{pmatrix} \textbf{q}_3\cdot \partial _t\textbf{q}_2 \\ 0 \\ -\textbf{q}_1\cdot \partial _t\textbf{q}_2 \end{pmatrix}, \end{aligned}$$
(24)
$$\begin{aligned} \textbf{d}_3\times \partial _t\textbf{d}_3&=\left[ \textbf{d}_1, \textbf{d}_2, \textbf{d}_3\right] \left[ \begin{array}{ccc} 0 &{} -1 &{} 0 \\ 1 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 \\ \end{array}\right] \textbf{Q}\, \partial _t \begin{pmatrix} q_{31} \\ q_{32} \\ q_{33} \end{pmatrix} \\&=\left[ \textbf{d}_1, \textbf{d}_2, \textbf{d}_3\right] \begin{pmatrix} -\textbf{q}_2\cdot \partial _t\textbf{q}_3 \\ \textbf{q}_1\cdot \partial _t\textbf{q}_3 \\ 0 \end{pmatrix}. \end{aligned}$$
(25)

Finally using (18) and (23) to (25), the angular velocity \(\varvec{\omega }\), in the local frame, can be expressed as:

$$\begin{aligned} \begin{pmatrix} \omega _1^l \\ \omega _2^l \\ \omega _3^l \end{pmatrix} =\frac{1}{2} \begin{pmatrix} \textbf{q}_3\cdot \partial _t\textbf{q}_2-\textbf{q}_2\cdot \partial _t\textbf{q}_3 \\ \textbf{q}_1\cdot \partial _t\textbf{q}_3-\textbf{q}_3\cdot \partial _t\textbf{q}_1 \\ \textbf{q}_2\cdot \partial _t\textbf{q}_1-\textbf{q}_1\cdot \partial _t\textbf{q}_2 \end{pmatrix} =\begin{pmatrix} \textbf{q}_3\cdot \partial _t\textbf{q}_2 \\ \textbf{q}_1\cdot \partial _t\textbf{q}_3 \\ \textbf{q}_2\cdot \partial _t\textbf{q}_1 \end{pmatrix}, \end{aligned}$$
(26)

noticing that for \(i\ne j\) (\(i,j=1,2,3\))

$$\begin{aligned} \textbf{q}_i\cdot \textbf{q}_j=0 \Rightarrow \partial _t\textbf{q}_i\cdot \textbf{q}_j+\textbf{q}_i\cdot \partial _t\textbf{q}_j=0. \end{aligned}$$

A further calculation based on (1) and (26) expresses the angular velocity, in the local frame, in terms of the rotation angles as follows:

$$\begin{aligned} \begin{pmatrix} \omega _1^l \\ \omega _2^l \\ \omega _3^l \end{pmatrix} =\textbf{A} \partial _t \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}, \quad \textbf{A} = \left[ \begin{array}{ccc} 0 &{} -\cos \alpha &{} \sin \alpha \cos \beta \\ 0 &{} -\sin \alpha &{} -\cos \alpha \cos \beta \\ -1 &{} 0 &{} -\sin \beta \\ \end{array} \right] . \end{aligned}$$
(27)
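Equation (27) can be cross-checked against (26) numerically. The sketch below, in plain Python, differentiates the rows of \(\textbf{Q}\) by a central finite difference along a hypothetical angle trajectory and compares the result with \(\textbf{A}\,\partial _t(\alpha ,\beta ,\gamma )^T\). The angle values, rates and step size are illustrative assumptions.

```python
import math

def rotation_q(alpha, beta, gamma):
    """Closed form of Q from equation (1); the rows are q_1, q_2, q_3."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cg - sa * sb * sg, -sa * cb, ca * sg + sa * sb * cg],
        [sa * cg + ca * sb * sg, ca * cb, sa * sg - ca * sb * cg],
        [-cb * sg, sb, cb * cg],
    ]

def dot(u, v):
    """Euclidean inner product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))

def omega_from_rows(q, dq):
    """Equation (26): (q_3 . dq_2, q_1 . dq_3, q_2 . dq_1)."""
    return [dot(q[2], dq[1]), dot(q[0], dq[2]), dot(q[1], dq[0])]

def omega_from_angles(angles, rates):
    """Equation (27): local angular velocity as A times the angle rates."""
    alpha, beta, _ = angles
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    a = [[0.0, -ca, sa * cb], [0.0, -sa, -ca * cb], [-1.0, 0.0, -sb]]
    return [dot(row, rates) for row in a]

angles = [0.3, -0.5, 1.1]   # hypothetical (alpha, beta, gamma)
rates = [0.7, 0.2, -0.4]    # hypothetical time derivatives of the angles
h = 1.0e-6                  # time step of the central difference

q_plus = rotation_q(*[a + h * r for a, r in zip(angles, rates)])
q_minus = rotation_q(*[a - h * r for a, r in zip(angles, rates)])
dq = [[(q_plus[i][j] - q_minus[i][j]) / (2.0 * h) for j in range(3)]
      for i in range(3)]

omega_fd = omega_from_rows(rotation_q(*angles), dq)
omega_a = omega_from_angles(angles, rates)
max_diff = max(abs(x - y) for x, y in zip(omega_fd, omega_a))
```

The two results should agree up to the finite-difference error. Note that \(\textbf{A}\) depends only on \(\alpha \) and \(\beta \), so \(\gamma \) enters the angular velocity only through its rate.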

Using the same procedure, the curvature at a point along the centreline can be expressed in the local frame as:

$$\begin{aligned} \begin{pmatrix} \kappa _1^l \\ \kappa _2^l \\ \kappa _3^l \end{pmatrix} = \textbf{A} \partial _s \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}. \end{aligned}$$
(28)

Substituting equation (13) into equation (5), we express the conservation of linear momentum in its component form as follows.

$$\begin{aligned}{} & {} \rho (s)A(s,t) \partial _{tt} \begin{pmatrix} x \\ y \\ z \end{pmatrix}\nonumber \\{} & {} \quad =\partial _s \left( j(\tilde{s},t) \textbf{Q}^T\textbf{K}{} \textbf{Q} \partial _s \begin{pmatrix} x \\ y \\ z \end{pmatrix} - \textbf{Q}^T \begin{pmatrix} 0 \\ 0 \\ K_{33} \end{pmatrix} \right) \nonumber \\{} & {} \qquad +\begin{pmatrix} f_1^g \\ f_2^g \\ f_3^g \end{pmatrix}, \end{aligned}$$
(29)

with \(\textbf{f}(s,t)=\sum _{i=1}^{3}f_i^g\textbf{e}_i\).

Transforming the local coordinates in (27), (28), (7) and (15) into global coordinates by (4), then substituting them into equation (6), we express the conservation of angular momentum equation in its component form as follows:

$$\begin{aligned}{} & {} \partial _t \left( \textbf{Q}^T \textbf{I} \textbf{A} \partial _t \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} \right) =\partial _s \left( \textbf{Q}^T \textbf{J} \textbf{A} \partial _s \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} \right) \nonumber \\{} & {} \quad + \left[ \begin{array}{ccc} 0 &{} -\partial _s z &{} \partial _s y \\ \partial _s z &{} 0 &{} -\partial _s x\\ -\partial _s y &{} \partial _s x &{} 0 \\ \end{array} \right] \left( j(\tilde{s},t) \textbf{Q}^T\textbf{K}{} \textbf{Q} \partial _s \begin{pmatrix} x \\ y\\ z \end{pmatrix} - \textbf{Q}^T \begin{pmatrix} 0 \\ 0 \\ K_{33} \end{pmatrix} \right) \nonumber \\{} & {} \quad + \begin{pmatrix} l_1^g \\ l_2^g \\ l_3^g \end{pmatrix}, \end{aligned}$$
(30)

with \(\textbf{l}(s,t)=\sum _{i=1}^{3}l_i^g\textbf{e}_i\).

Remark 3

It can be seen from (2) that \(\textbf{d}_i\equiv \textbf{q}_i\) (\(i=1,2,3\)) if we choose \(\textbf{e}_1=\left( 1,0,0\right) ^T\), \(\textbf{e}_2=\left( 0,1,0\right) ^T\) and \(\textbf{e}_3=\left( 0,0,1\right) ^T\). This observation will be adopted in Section 5 for numerical implementation.

2.4 Incompressibility assumption

We assume the rod is incompressible and derive a condition for its cross-sectional area \(A(s,t)\) in this section. An incompressible material requires the total volume to be constant, i.e.:

$$\begin{aligned} \frac{d}{dt}\int _{a(t)}^{b(t)}A(s,t)ds=\frac{d}{dt}\int _{a_0}^{b_0}A(s(\tilde{s},t),t)j(\tilde{s},t)d\tilde{s}=0, \end{aligned}$$
(31)

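Since the same argument applies to any segment of the rod, the integrand itself must have zero time derivative, from which we get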

$$\begin{aligned} \frac{dA(s,t)}{dt}j(\tilde{s},t)+A(s,t)\frac{dj(\tilde{s},t)}{dt}=0. \end{aligned}$$
(32)

This equation can be solved by separation of variables as follows:

$$\begin{aligned} \frac{dA}{A}=-\frac{dj}{j}. \end{aligned}$$
(33)

Considering the initial condition \(j(\tilde{s},0)=1\) and \(A(\tilde{s},0)=A_0\), and noticing that A and j are both positive, the solution of (33) can be expressed as:

$$\begin{aligned} \ln A=-\ln j+\ln A_0 \Rightarrow A(s,t)=\frac{A_0}{j(\tilde{s},t)}. \end{aligned}$$
(34)
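A discrete illustration of (34) is sketched below in plain Python: the deformation scalar \(j\) is approximated per element by a finite difference of the current and reference arc lengths, the area is set to \(A_0/j\), and the total volume is compared with its reference value. The mesh, the stretch map and the value of \(A_0\) are hypothetical, and the element-wise approximation of \(j\) is an assumption of this example rather than the discretisation used in the paper.

```python
def element_stretch(s_ref, s_cur):
    """Per-element deformation scalar j = ds / d(s_tilde), by finite differences."""
    return [(s_cur[i + 1] - s_cur[i]) / (s_ref[i + 1] - s_ref[i])
            for i in range(len(s_ref) - 1)]

def current_area(a0, j):
    """Equation (34): A = A_0 / j on each element."""
    return [a0 / ji for ji in j]

a0 = 2.0e-3                               # hypothetical reference area A_0
s_ref = [0.0, 0.25, 0.5, 0.75, 1.0]       # reference arc-length nodes
s_cur = [s + 0.2 * s * s for s in s_ref]  # hypothetical non-uniform stretch

j = element_stretch(s_ref, s_cur)
area = current_area(a0, j)

# Volume as the sum of area times element length in each configuration.
volume_ref = a0 * (s_ref[-1] - s_ref[0])
volume_cur = sum(area[i] * (s_cur[i + 1] - s_cur[i]) for i in range(len(area)))
```

Every element is stretched here (\(j>1\)), so the area shrinks everywhere while the summed volume is unchanged, which is the discrete counterpart of (31).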

2.5 Finite element weak form

Equations (29) and (30) can be solved either on the reference configuration \(\tilde{s}\) (total Lagrangian formulation) or on the current configuration \(s\) (updated Lagrangian formulation). Each formulation can be transformed into the other using equation (34), and we introduce both in this section. Let

$$\begin{aligned} \textbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \quad {\varvec{\alpha }} = \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}, \quad \textbf{B}(\textbf{x}) =\left[ \begin{array}{ccc} 0 &{} -\partial _s z &{} \partial _s y \\ \partial _s z &{} 0 &{} -\partial _s x\\ -\partial _s y &{} \partial _s x &{} 0 \\ \end{array}\right] , \end{aligned}$$
(35)

then the weak form of (29) and (30) on the current configuration \(\left[ a(t), b(t)\right] \) can be expressed as

$$\begin{aligned} \begin{aligned}&\int _a^b\rho A(s,t){\delta \textbf{x}}^T\partial _{tt}{} \textbf{x} ds +\int _a^b \partial _s{\delta \textbf{x}}^T \left[ j\textbf{Q}^T\textbf{K}\textbf{Q}\partial _s\textbf{x}-K_{33}{} \textbf{q}_3\right] ds \\&\qquad +\int _a^b{\delta {\varvec{\alpha }}}^T\partial _t\left[ \textbf{Q}^T\textbf{I}{} \textbf{A}\partial _t{\varvec{\alpha }}\right] ds +\int _a^b\partial _s {\delta {\varvec{\alpha }}}^T \left[ \textbf{Q}^T\textbf{J} \textbf{A}\partial _s{\varvec{\alpha }}\right] ds\\&\quad =\int _a^b {\delta {\varvec{\alpha }}}^T\textbf{B}\left[ j\textbf{Q}^T\textbf{K}{} \textbf{Q}\partial _s\textbf{x}-K_{33}{} \textbf{q}_3\right] ds\\&\qquad +\int _a^b {\delta {\varvec{\alpha }}}^T\textbf{l} ds +\int _a^b {\delta \textbf{x}}^T \textbf{f} ds \end{aligned} \end{aligned}$$
(36)

with \(\delta \textbf{x}\) and \({\delta {\varvec{\alpha }}}\) denoting the test functions corresponding to \(\textbf{x}\) and \({\varvec{\alpha }}\) respectively. Let

$$\begin{aligned} K_{11}^0= & {} K_{22}^0=kGA_0, \quad K_{33}^0=EA_0,\nonumber \\ \textbf{K}_0= & {} \text {diag}\left( K_{11}^0, K_{22}^0 , K_{33}^0\right) , \end{aligned}$$
(37)
$$\begin{aligned} I_{11}^0= & {} I_{22}^0=\rho A_0^2/4\pi , \quad I_{33}^0=\rho A_0^2/2\pi ,\nonumber \\ \textbf{I}_0= & {} \text {diag}\left( I_{11}^0, I_{22}^0 , I_{33}^0\right) , \end{aligned}$$
(38)

and

$$\begin{aligned}{} & {} J_{11}^0=J_{22}^0=EA_0^2/4\pi , \quad J_{33}^0=GA_0^2/2\pi , \nonumber \\{} & {} \quad \textbf{J}_0=\text {diag}\left( J_{11}^0, J_{22}^0 , J_{33}^0\right) . \end{aligned}$$
(39)

Then, (36) can be rewritten, in the reference configuration \(\tilde{s}\), as:

$$\begin{aligned} \begin{aligned}&\int _{a_0}^{b_0}\rho A_0{\delta \textbf{x}}^T\partial _{tt}{} \textbf{x} d\tilde{s} +\int _{a_0}^{b_0} \partial _{\tilde{s}}{\delta \textbf{x}}^T \left[ \textbf{Q}^T\textbf{K}_0\textbf{Q}\partial _{\tilde{s}}{} \textbf{x}-K_{33}^0\textbf{q}_3\right] d\tilde{s} \\&\qquad +\int _{a_0}^{b_0}j{\delta {\varvec{\alpha }}}^T\partial _t\left[ j^{-2}\textbf{Q}^T\textbf{I}_0\textbf{A}\partial _t{\varvec{\alpha }}\right] d\tilde{s}\\&\qquad +\int _{a_0}^{b_0}j^{-3}\partial _{\tilde{s}} {\delta {\varvec{\alpha }}}^T \left[ \textbf{Q}^T\textbf{J}_0\textbf{A}\partial _{\tilde{s}}{\varvec{\alpha }}\right] d\tilde{s}\\&\quad =\int _{a_0}^{b_0} j^{-1}{\delta {\varvec{\alpha }}}^T\textbf{B}\left[ \textbf{Q}^T\textbf{K}_0\textbf{Q}\partial _{\tilde{s}}\textbf{x}-K_{33}^0\textbf{q}_3\right] d\tilde{s} \\&\qquad +\int _{a_0}^{b_0} j{\delta {\varvec{\alpha }}}^T\textbf{l} d\tilde{s} +\int _{a_0}^{b_0} j{\delta \textbf{x}}^T \textbf{f} d\tilde{s}. \end{aligned} \end{aligned}$$
(40)
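As an illustration, the reference coefficients in (37) to (39) can be evaluated numerically. The following is a minimal sketch, assuming a circular cross section with \(A_0=\pi r_0^2\) and the material values quoted later in Section 5.2; the function name `reference_coefficients` and the density value are assumptions made only for this example.

```python
import numpy as np

def reference_coefficients(A0, E, G, k, rho):
    """Return the diagonal matrices K_0, I_0 and J_0 of (37)-(39)."""
    K0 = np.diag([k * G * A0, k * G * A0, E * A0])                    # (37)
    I0 = rho * A0**2 / (4.0 * np.pi) * np.diag([1.0, 1.0, 2.0])       # (38)
    J0 = A0**2 / (4.0 * np.pi) * np.diag([E, E, 2.0 * G])             # (39)
    return K0, I0, J0

# Values from Section 5.2; rho is an assumed, water-like density.
L0 = 1.0e-3
r0 = L0 / 40.0
A0 = np.pi * r0**2
E, G, k, rho = 1.1e5, 5.0e4, 4.0 / 3.0, 1.0e3
K0, I0, J0 = reference_coefficients(A0, E, G, k, rho)
print(np.diag(K0), np.diag(I0), np.diag(J0))
```

For a circle, \(A_0^2/4\pi =\pi r_0^4/4\) is the second moment of area, so \(J_{11}^0\) is the familiar bending stiffness \(E\pi r_0^4/4\).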

Remark 4

It is convenient to solve a forward problem on \(\tilde{s}\), updating the deformation scaler j by, for example, a fixed-point iteration, whereas it is convenient to solve a backward problem on s (see Section 3) because the current mesh s is already available and j can be computed directly.

3 The optimal control problem

In this section, we formulate an optimal control problem based on the Cosserat rod model described in the previous section. The motivation is to reconstruct the full rod configuration by computing \((\alpha , \beta , \gamma )\) from observed data \((x_g, y_g, z_g)\). This is an inverse problem, which we formulate as a control problem. For rods at low Reynolds number, we neglect the inertia terms and consider the following optimisation problem: reduce the discrepancy between the centreline \(\textbf{x}\) and an objective position given by the observed data \(\textbf{x}_g = (x_g, y_g, z_g)\) by optimising the external force \(\textbf{f}\) and torque \(\textbf{l}\) in (29) and (30).

Problem 1

(piecewise-in-time control) Given the state variables \(\textbf{x}_{n-1}\) and \({\varvec{\alpha }}_{n-1}\) at the previous time \(t_{n-1}\) (\(n=1, 2, \ldots \)), and an objective position vector \(\textbf{x}_g(t_n)\) of the worm’s centreline at current time \(t_n\),

$$\begin{aligned} \begin{aligned}&\underset{\textbf{f}_n,\textbf{l}_n\in L^2\left( [a,b]\right) }{\text {minimise}} \quad J(\textbf{x}_n,{\varvec{\alpha }}_n,\textbf{f}_n,\textbf{l}_n) =\frac{\lambda _g}{2}\int _{a}^{b}\left| \textbf{x}_n-\textbf{x}_g(t_n)\right| ^2 \\&\quad +\frac{\lambda _f}{2}\int _{a}^{b}\left| \textbf{f}_n\right| ^2 +\frac{\lambda _l}{2}\int _{a}^{b}\left| \textbf{l}_n\right| ^2 \\&\quad +\frac{\lambda _d}{2}\int _{a}^{b}\left| \partial _s\textbf{x}_n-\textbf{d}_3({\varvec{\alpha }}_n)\right| ^2 +\frac{1}{2}\int _{a}^{b}\left| {\varvec{\varLambda }}_\kappa ^{1/2}\textbf{A}\partial _s{\varvec{\alpha }}_n\right| ^2\\&\quad +\frac{1}{2}\int _{a}^{b}\left| {\varvec{\varLambda }}_\omega ^{1/2}\textbf{A}\frac{\left( {\varvec{\alpha }}_n-{\varvec{\alpha }}_{n-1}\right) }{\varDelta t}\right| ^2, \end{aligned} \end{aligned}$$
(41)

subject to

$$\begin{aligned} \partial _s\left[ \textbf{Q}^T({\varvec{\alpha }}_n)\textbf{K}_0\textbf{Q} ({\varvec{\alpha }}_n)\partial _s\textbf{x}_n-j^{-1}K_{33}\textbf{q}_3({\varvec{\alpha }}_n)\right] +\textbf{f}_n=0, \end{aligned}$$
(42)

and

$$\begin{aligned}{} & {} \partial _s\left[ \textbf{Q}^T({\varvec{\alpha }}_n)\textbf{J}\textbf{A}({\varvec{\alpha }}_n)\partial _s{\varvec{\alpha }}_n\right] +\textbf{B}(\textbf{x}_n)\left[ \textbf{Q}^T({\varvec{\alpha }}_n)\textbf{K}_0\textbf{Q}({\varvec{\alpha }}_n)\partial _s\textbf{x}_n\right. \nonumber \\{} & {} \quad \left. -j^{-1}K_{33}{} \textbf{q}_3({\varvec{\alpha }}_n)\right] +\textbf{l}_n=0. \end{aligned}$$
(43)

In the above, \({\varvec{\varLambda }}_\kappa =\text {diag}\left( \lambda _{\kappa _1}, \lambda _{\kappa _2}, \lambda _{\kappa _3}\right) \) and \({\varvec{\varLambda }}_\omega =\text {diag}\left( \lambda _{\omega _1}, \lambda _{\omega _2}, \lambda _{\omega _3}\right) \) are two diagonal matrices. The first term in (41) is the real objective to be minimised, and we choose \(\lambda _g=10^9 \sim 1/\int _a^b|\textbf{x}_g|^2\), so that the first term does not become vanishingly small during the minimisation. All the other terms are regularisation terms with regularisation parameters \(\lambda _f\), \(\lambda _l\), \(\lambda _d\), \({\varvec{\varLambda }}_\kappa \) and \({\varvec{\varLambda }}_\omega \). Regularisation parameters that are too large could make it difficult to achieve the real objective, while parameters that are too small may cause convergence issues for the numerical scheme. These regularisation terms serve different mathematical and computational purposes (see Section 3.4 for the biological meaning of each term). Generally speaking, we want to add reasonable constraints so that the control problem is solvable and a unique solution can be obtained. The second term (\(\lambda _f-\)term) and the third term (\(\lambda _l-\)term) constrain the control variables so that they do not grow without bound. The \(\lambda _d-\)term is used to disallow arbitrary rotation angles and ensure the problem is solvable. The regularisation parameters of the last two terms in (41), the \({\varvec{\varLambda }}_\kappa -\)term and the \({\varvec{\varLambda }}_\omega -\)term, are diagonal matrices whose elements control the three components of the generalised curvature and of the angular velocity, respectively.
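To make the roles of the individual terms concrete, the following minimal sketch evaluates the first four terms of (41) on a discrete mesh using the trapezoidal rule. It is an illustration only: the function names, the array layout (one row per mesh node, columns ordered as \((\alpha , \beta , \gamma )\)) and the finite-difference derivative are assumptions, the expression for \(\textbf{d}_3\) is the one quoted in Section 4, and the \({\varvec{\varLambda }}_\kappa \) and \({\varvec{\varLambda }}_\omega \) terms are omitted because they require the matrix \(\textbf{A}\) defined in the previous section.

```python
import numpy as np

def trapz(y, s):
    """Composite trapezoidal rule for samples y on the mesh s."""
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(s))

def d3(alpha):
    """Director d_3 from the angles (alpha, beta, gamma), one row per node."""
    beta, gamma = alpha[:, 1], alpha[:, 2]
    return np.stack([-np.cos(beta) * np.sin(gamma),
                     np.sin(beta),
                     np.cos(beta) * np.cos(gamma)], axis=1)

def partial_objective(s, x, alpha, f, l, x_g, lam_g, lam_f, lam_l, lam_d):
    """Tracking, force, torque and tangent terms of (41), returned separately."""
    dxds = np.gradient(x, s, axis=0)  # finite-difference approximation of d/ds
    return {
        "tracking": 0.5 * lam_g * trapz(np.sum((x - x_g) ** 2, axis=1), s),
        "force": 0.5 * lam_f * trapz(np.sum(f ** 2, axis=1), s),
        "torque": 0.5 * lam_l * trapz(np.sum(l ** 2, axis=1), s),
        "tangent": 0.5 * lam_d * trapz(np.sum((dxds - d3(alpha)) ** 2, axis=1), s),
    }

# Straight rod along z with d_3 aligned to the tangent; data shifted by 1e-4 in x.
s = np.linspace(0.0, 1.0e-3, 11)
x = np.stack([np.zeros_like(s), np.zeros_like(s), s], axis=1)
alpha = np.zeros((s.size, 3))
zero = np.zeros((s.size, 3))
x_g = x + np.array([1.0e-4, 0.0, 0.0])
terms = partial_objective(s, x, alpha, zero, zero, x_g, 1e9, 1.0, 1e-10, 1e-6)
print(terms)
```

Returning the terms separately makes it easy to see which contribution dominates for a given choice of regularisation parameters.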

Remark 5

Because we lack data to control the rotation angle \(\varvec{\alpha }\), we set a control of the variation of \(\varvec{\alpha }\), with respect to space (\({\varvec{\varLambda }}_\kappa -\)term) and time (\({\varvec{\varLambda }}_\omega -\)term), in the last two terms in (41), which is necessary for the solution existence and for the convergence of our numerical method as formulated in Problem 2 in Section 4.

We introduce the Lagrange multipliers (or adjoint variables) \(\hat{\textbf{x}}, \hat{\varvec{\alpha }}\) to eliminate the constraints of Problem 1 (dropping the subscript ‘n’ for simplicity).

$$\begin{aligned} \begin{aligned}&L\left( \textbf{x}, {\varvec{\alpha }}, \textbf{f}, \textbf{l}, \hat{\textbf{x}}, \hat{\varvec{\alpha }}\right) = J(\textbf{x},\textbf{f},\textbf{l}) \\&\quad -\int _{a}^{b}\hat{\textbf{x}}^T\left[ \partial _s\left( \textbf{Q}^T\textbf{K}_0\textbf{Q}\partial _s\textbf{x}-j^{-1}K_{33}{} \textbf{q}_3\right) +\textbf{f}\right] \\&\quad -\int _{a}^{b}\hat{\varvec{\alpha }}^T\left[ \partial _s\left( \textbf{Q}^T\textbf{J}{} \textbf{A}\partial _s{\varvec{\alpha }}\right) \right. \\&\quad \left. +\textbf{B}\left( \textbf{Q}^T\textbf{K}_0\textbf{Q}\partial _s\textbf{x}-j^{-1}K_{33}{} \textbf{q}_3\right) +\textbf{l}\right] \\&\quad +\hat{\textbf{x}}(b)^T\left[ \textbf{x}(b)-\textbf{x}_g(b)\right] - \hat{\textbf{x}}(a)^T\left[ \textbf{x}(a)-\textbf{x}_g(a)\right] \\&\quad + \hat{\varvec{\alpha }}(b)^T\textbf{m}(b) - \hat{\varvec{\alpha }}(a)^T\textbf{m}(a). \end{aligned} \end{aligned}$$
(44)

The following boundary conditions are also included in the above functional L:

$$\begin{aligned} \textbf{x}(a)-\textbf{x}_g(a)=\textbf{x}(b)-\textbf{x}_g(b)=0, \end{aligned}$$
(45)

and

$$\begin{aligned} \textbf{m}(a)=\textbf{m}(b)=0. \end{aligned}$$
(46)

Remark 6

We find that the proposed optimal control formulation is solvable with either a Dirichlet boundary condition \(\alpha (a)=\alpha _a\) (on the first component of the rotation \(\varvec{\alpha }\)) or the regularisation \({\varvec{\varLambda }}_\omega -\)term in (41). Unfortunately, we do not have the Dirichlet data for all frames, and we notice from the last term in (41) that \({\varvec{\alpha }}_0\) must be given. Therefore, we solve the first frame using the Dirichlet boundary condition \(\alpha (a)=0\) instead of the \({\varvec{\varLambda }}_\omega -\)term, and from the second frame onwards we use the \({\varvec{\varLambda }}_\omega -\)term.

Remark 7

The solvability of Problem 1 is generally a difficult question, and there is one case that we are sure is unsolvable: suppose the rod undergoes a pure twist induced only by the third component of the torque \(\textbf{l}\) in (43); then there is no way to determine the frames using only the data from the centreline. In all other cases (\(l_3^l=0\)), the deformation of the centreline is coupled with the frames, and we expect to be able to detect the frames through this coupling by solving Problem 1. We shall validate this idea in the numerical tests of Section 5.2. Fortunately, it is parsimonious to assume \(l_3^l=0\) when modelling C. elegans, because its longitudinal body wall muscles may not generate an \(l_3^l\) torque.

After integration by parts, L can be further expressed as follows:

$$\begin{aligned} \begin{aligned}&L\left( \textbf{x}, {\varvec{\alpha }}, \textbf{f}, \textbf{l}, \hat{\textbf{x}}, \hat{\varvec{\alpha }}\right) = J(\textbf{x}, \textbf{f},\textbf{l}) \\&\quad +\int _{a}^{b}\partial _s\hat{\textbf{x}}^T\left( \textbf{Q}^T\textbf{K}_0\textbf{Q}\partial _s\textbf{x}-j^{-1}K_{33}{} \textbf{q}_3\right) -\int _{a}^{b}\hat{\textbf{x}}^T\textbf{f} \\&\quad +\int _{a}^{b}\partial _s\hat{\varvec{\alpha }}^T\left( \textbf{Q}^T\textbf{J}{} \textbf{A}\right) \partial _s{\varvec{\alpha }}\\&\qquad -\int _{a}^{b}\hat{\varvec{\alpha }}^T\textbf{B}\left( \textbf{Q}^T\textbf{K}_0\textbf{Q}\partial _s\textbf{x}-j^{-1}K_{33}{} \textbf{q}_3\right) \\&\quad -\int _{a}^{b}\hat{\varvec{\alpha }}^T\textbf{l} \\&\quad + \hat{\textbf{x}}(b)^T\left[ \textbf{x}(b)-\textbf{x}_g(b)\right] - \hat{\textbf{x}}(a)^T\left[ \textbf{x}(a)-\textbf{x}_g(a)\right] \\&\quad - \hat{\textbf{x}}(b)^T\textbf{n}(b) + \hat{\textbf{x}}(a)^T\textbf{n}(a). \end{aligned} \end{aligned}$$
(47)

The following Karush-Kuhn-Tucker (KKT) conditions are the first-order necessary conditions to minimise (47).

$$\begin{aligned}{} & {} \delta {L}(\cdot )\left[ \left( \hat{\textbf{x}}, \hat{\varvec{\alpha }}\right) ; \left( \delta \hat{\textbf{x}}, \delta \hat{\varvec{\alpha }}\right) \right] =0, \end{aligned}$$
(48)
$$\begin{aligned}{} & {} \delta {L}(\cdot )\left[ \left( \textbf{x}, {\varvec{\alpha }}\right) ;\left( {\delta \textbf{x}}, \delta {\varvec{\alpha }}\right) \right] =0, \end{aligned}$$
(49)
$$\begin{aligned}{} & {} \delta {L}(\cdot )\left[ \left( \textbf{f},\textbf{l}\right) ; \left( \delta \textbf{f},\delta \textbf{l}\right) \right] =0, \end{aligned}$$
(50)

with

$$\begin{aligned} \delta {L}(\cdot )[\textbf{p}; \textbf{q}]=\left. \frac{d}{d\epsilon }L\left( \textbf{p}+\epsilon \textbf{q}\right) \right| _{\epsilon =0} \end{aligned}$$
(51)

being the Gâteaux derivative with respect to the variable \(\textbf{p}\) along the direction \(\textbf{q}\) [67]. If \(\textbf{q}\) is an arbitrary direction from \(\textbf{p}\), it is usually expressed as \(\textbf{q}=\delta \textbf{p}\) (variation of \(\textbf{p}\)) [7], in which case it is convenient to abbreviate \(\delta {L}(\textbf{p})[\textbf{p}; \delta \textbf{p}]\) as \(\delta {L}(\textbf{p})\).
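The definition (51) can be checked numerically on a simple functional. The sketch below is an illustration under stated assumptions: it takes the scalar analogue of the \(\lambda _f-\)term, \(J_f(f)=\frac{\lambda _f}{2}\int _a^b f^2\), whose Gâteaux derivative along \(q\) is \(\lambda _f\int _a^b fq\), and compares it with a central difference in \(\epsilon \). The helper names and the test functions are hypothetical.

```python
import numpy as np

def trapz(y, s):
    """Composite trapezoidal rule for samples y on the mesh s."""
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(s))

def gateaux(L, p, q, eps=1.0e-6):
    """Central-difference approximation of d/d(eps) L(p + eps*q) at eps = 0."""
    return (L(p + eps * q) - L(p - eps * q)) / (2.0 * eps)

s = np.linspace(0.0, 1.0, 201)
lam_f = 2.0

def J_f(f):
    """Scalar analogue of the lambda_f term in (41)."""
    return 0.5 * lam_f * trapz(f ** 2, s)

f = np.sin(np.pi * s)      # point at which the derivative is taken
q = s * (1.0 - s)          # direction of variation
numeric = gateaux(J_f, f, q)
analytic = lam_f * trapz(f * q, s)
print(numeric, analytic)
```

Because the functional is quadratic, the central difference agrees with the analytic expression up to rounding error.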

Let \(L^2\left( [a,b]\right) \) be the space of square integrable functions on the domain \([a,b]\) with inner product \((u,v)=\int _a^buv\) and the induced norm \(\Vert u\Vert =(u,u)^{1/2}\), \(\forall u,v\in L^2([a,b])\). For a vector function \(\textbf{u}\in L^2([a,b])^d\) (\(d=6\) for three components of the position vector and three components of the rotation vector), the norm is defined component-wise as \(\Vert \textbf{u}\Vert ^2=\sum _{i=1}^{d}\Vert u_i\Vert ^2\). Let \(H^1([a,b])=\left\{ \textbf{u}: \textbf{u}, \partial _s\textbf{u}\in L^2([a,b])^d\right\} \), and let \(H_D^1([a,b])\) be the subspace of \(H^1([a,b])\) whose functions satisfy the Dirichlet boundary condition in (45); in particular, \(H_0^1([a,b])\) denotes the subspace with homogeneous Dirichlet boundary conditions. The optimality conditions (48) to (50) lead to the following partial differential equations (in weak form).
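For reference, the component-wise norm defined above can be approximated on a mesh as follows. This is a minimal sketch, assuming nodal values stored row-wise and a trapezoidal quadrature; the function names are illustrative and do not come from the implementation used in this paper.

```python
import numpy as np

def l2_inner(u, v, s):
    """Trapezoidal approximation of the inner product (u, v) on the mesh s."""
    w = u * v
    return 0.5 * np.sum((w[1:] + w[:-1]) * np.diff(s))

def l2_norm_vector(U, s):
    """Norm of a vector function: square root of the sum of component norms squared."""
    return np.sqrt(sum(l2_inner(U[:, i], U[:, i], s) for i in range(U.shape[1])))

# Six constant components equal to 1 on [0, 2]: the squared norm is 6 * 2 = 12.
s = np.linspace(0.0, 2.0, 101)
U = np.ones((s.size, 6))
norm_U = l2_norm_vector(U, s)
print(norm_U)
```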

3.1 Primal equation

The optimality condition (48) gives the primal equation in its weak form as follows. Find \(\left( \textbf{x}, {\varvec{\alpha }}\right) \in H_D^1\), such that \(\forall \left( \delta \hat{\textbf{x}}, \delta \hat{\varvec{\alpha }}\right) \in H_0^1\):

$$\begin{aligned} \begin{aligned}&\int _{a}^{b}\partial _s{\delta \hat{\textbf{x}}}^T\left( \textbf{Q}^T\textbf{K}_0\textbf{Q}\right) \partial _s\textbf{x} +\int _{a}^{b}\partial _s{\delta \hat{\varvec{\alpha }}}^T\left( \textbf{Q}^T\textbf{J}{} \textbf{A}\right) \partial _s{\varvec{\alpha }} \\&\quad -\int _{a}^{b} {\delta \hat{\varvec{\alpha }}}^T\left( \textbf{B} \textbf{Q}^T\textbf{K}_0\textbf{Q}\right) \partial _s\textbf{x}\\&\qquad =\int _{a}^{b}{\delta \hat{\textbf{x}}}^T\textbf{f} +\int _{a}^{b}{\delta \hat{\varvec{\alpha }}}^T\textbf{l} +\int _{a}^{b}j^{-1}K_{33}\partial _s{\delta \hat{\textbf{x}}}^T\textbf{q}_3\\&\quad -\int _{a}^{b} j^{-1}K_{33}{\delta \hat{\varvec{\alpha }}}^T\textbf{B} \textbf{q}_3. \end{aligned} \end{aligned}$$
(52)

3.2 Adjoint equation

The optimality condition (49) gives the adjoint equation in its weak form as follows (neglecting the variation of the matrices \(\textbf{Q}\), \(\textbf{A}\), \(\textbf{B}\) and the vector \(\textbf{q}_3\)). Find \(\left( \hat{\textbf{x}}, \hat{\varvec{\alpha }} \right) \in H_0^1\), such that \(\forall \left( \delta \textbf{x},\delta {\varvec{\alpha }}\right) \in H_0^1\):

$$\begin{aligned} \begin{aligned}&\int _{a}^{b}\partial _s{\hat{\textbf{x}}}^T\left( \textbf{Q}^T\textbf{K}_0\textbf{Q}\right) \partial _s\delta \textbf{x} +\int _{a}^{b}\partial _s{\hat{\varvec{\alpha }}}^T\left( \textbf{Q}^T\textbf{J}{} \textbf{A}\right) \partial _s{\delta \varvec{\alpha }} \\&\quad -\int _{a}^{b} {\hat{\varvec{\alpha }}}^T\left( \textbf{B} \textbf{Q}^T\textbf{K}_0\textbf{Q}\right) \partial _s{\delta \textbf{x}} + \lambda _g \int _{a}^{b}{\delta \textbf{x}}^T\left( \textbf{x}-\textbf{x}_g\right) \\&\quad + \int _{a}^{b}\partial _s{\delta {\varvec{\alpha }}}^T\textbf{A}^T{\varvec{\varLambda }}_\kappa \textbf{A} \partial _s{\varvec{\alpha }}\\&\quad + \frac{1}{\varDelta t^2} \int _{a}^{b}{\delta {\varvec{\alpha }}}^T\textbf{A}^T{\varvec{\varLambda }}_\omega \textbf{A}\left( {\varvec{\alpha }}-{\varvec{\alpha }}_{n-1}\right) \\&\quad +\lambda _d \int _{a}^{b}\partial _s\delta \textbf{x}^T\left( \partial _s\textbf{x}-\textbf{d}_3({\varvec{\alpha }})\right) \\&\quad +\lambda _d \int _{a}^{b}\delta \textbf{d}_3^T\left( \partial _s\textbf{x}-\textbf{d}_3({\varvec{\alpha }})\right) =0. \end{aligned} \end{aligned}$$
(53)

Remark 8

We have implemented the adjoint equation with consideration of variation of \(\textbf{q}_3\), and we found that our optimal control algorithm in Section 4 struggled to converge. It is worth investigating the reason for this convergence issue, and testing the case in which the variation of all these terms is included in the future.

3.3 Optimality equation

The optimality condition (50) gives the relation between the control force and adjoint variable:

$$\begin{aligned} \lambda _f\int _a^b\delta \textbf{f}^T\textbf{f} +\lambda _l\int _a^b\delta \textbf{l}^T\textbf{l} =\int _a^b\delta \textbf{f}^T\hat{\textbf{x}} +\int _a^b\delta \textbf{l}^T\hat{\varvec{\alpha }}. \end{aligned}$$
(54)
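As a short check of (54), only the \(\lambda _f\) and \(\lambda _l\) terms of J and the two integrals \(-\int _a^b\hat{\textbf{x}}^T\textbf{f}\) and \(-\int _a^b\hat{\varvec{\alpha }}^T\textbf{l}\) in (47) depend on the controls, so the variation with respect to \((\textbf{f},\textbf{l})\) reads

$$\begin{aligned} \delta {L}(\cdot )\left[ \left( \textbf{f},\textbf{l}\right) ; \left( \delta \textbf{f},\delta \textbf{l}\right) \right] =\int _a^b\delta \textbf{f}^T\left( \lambda _f\textbf{f}-\hat{\textbf{x}}\right) +\int _a^b\delta \textbf{l}^T\left( \lambda _l\textbf{l}-\hat{\varvec{\alpha }}\right) =0. \end{aligned}$$

Since \(\delta \textbf{f}\) and \(\delta \textbf{l}\) are arbitrary, this is equivalent to the pointwise relations \(\textbf{f}=\hat{\textbf{x}}/\lambda _f\) and \(\textbf{l}=\hat{\varvec{\alpha }}/\lambda _l\).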

3.4 Relation to C. elegans locomotion problem

Our locomotion dataset, obtained from a 3D microscopic setup [72], contains many different trajectories of centreline positions \(\textbf{x}_g\) over time, which will be used in Problem 1. The controls, namely the forces and the torques, can be interpreted as the reaction force from the surrounding fluids, which is initially activated by the worm’s muscles.

Fig. 4

A sketch of C. elegans. DNC: dorsal nerve cord; VNC: ventral nerve cord

The muscles generating locomotion in C. elegans are called body wall muscles (see Figure 4) because they are tethered to the ‘wall’ (or cuticle) of the animal, acting longitudinally to contract or relax the local side of the body. As C. elegans contains 95 body wall muscles that span the entire body length, we consider the action of the muscles continuously along the body. The directionality of the muscle contraction at every point along the animal is determined by \(\alpha \), \(\beta \) and \(\gamma \), which are themselves unknowns of the control problem. To represent this muscle action, we only allow \(\textbf{d}_3\) to be close to the tangential direction of the body and also restrict the twisting movement of the worm, as captured by the \(\lambda _d-\)term.

The regularisations given by the \({\varvec{\varLambda }}_\kappa -\)term and the \({\varvec{\varLambda }}_\omega -\)term are also biologically motivated. For example, it is known from the anatomy that the left and right muscle quadrants do not receive distinct neural connections along the body and tail (i.e. the posterior two thirds of the body which lie beyond the neck of the animal). This results in the majority of bending occurring in the dorsal-ventral directions and less in the left-right directions, with the exception of the head and tail. We therefore adjust the magnitude of \(\lambda _{\kappa _1}\) and \(\lambda _{\kappa _2}\) to favour solutions that have more bending around \(\textbf{d}_1\) than \(\textbf{d}_2\). The longitudinal muscles also restrict the twisting motion of the body, which can be accounted for by setting a larger \(\lambda _{\kappa _3}\). These additional adjustments of constraints may result in a frame that more closely matches the anatomically meaningful frame of the animal as it moves around in 3D. The \({\varvec{\varLambda }}_\omega -\)term models the internal friction, which can also be different in the three local directions based on the worm’s anatomy.

For the forward simulations of biological worms, the Neumann boundary condition is usually adopted at both ends of the rod. For the backward simulations, we can fully use the information from the data and adopt appropriate Dirichlet boundary conditions as considered above in (45).

Remark 9

We have the data \(\textbf{x}_g\) of the worm’s centreline for every time frame, which means the mesh (arc length) \(s(\tilde{s},t_n)\) is known at the current time frame \(t_n\). Therefore, we can directly compute the deformation scaler \(j=\partial _{\tilde{s}}s(\tilde{s},t_n)\). This is generally not true for a forward problem, which usually requires j to be iteratively computed.
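The direct computation of j described in this remark can be sketched as follows. This is a minimal illustration, assuming the centreline data are given as points on a known reference mesh \(\tilde{s}\), that the arc length is approximated by the cumulative length of the polyline through the data points, and that the derivative is taken by finite differences; the function name is hypothetical.

```python
import numpy as np

def deformation_scaler(x_g, s_ref):
    """Approximate the arc length s and j = ds/d(s_ref) from centreline points.

    x_g   : (N, 3) array of centreline positions at the current time frame
    s_ref : (N,) array, reference arc-length coordinate of each point
    """
    segment_lengths = np.linalg.norm(np.diff(x_g, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(segment_lengths)])  # polyline arc length
    j = np.gradient(s, s_ref)                                # finite differences
    return s, j

# A straight rod uniformly stretched to twice its reference length gives j = 2.
s_ref = np.linspace(0.0, 1.0, 51)
x_g = np.stack([np.zeros_like(s_ref), np.zeros_like(s_ref), 2.0 * s_ref], axis=1)
s, j = deformation_scaler(x_g, s_ref)
print(s[-1], j.min(), j.max())
```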

4 A monolithic optimal control formulation

Substituting the optimality condition (54), specifically its strong form \(\textbf{f}=\hat{\textbf{x}}/\lambda _f\) and \(\textbf{l}=\hat{\varvec{\alpha }}/\lambda _l\), into equation (52) and adding the adjoint equation (53), we have a monolithic scheme to solve the optimisation Problem 1 as follows.

$$\begin{aligned} \begin{aligned}&\int _{a}^{b}\partial _s{\delta \hat{\textbf{x}}}^T\left( \textbf{Q}^T\textbf{K}_0\textbf{Q}\right) \partial _s\textbf{x} +\int _{a}^{b}\partial _s{\delta \hat{\varvec{\alpha }}}^T\left( \textbf{Q}^T\textbf{J}{} \textbf{A}\right) \partial _s{\varvec{\alpha }} \\&\qquad -\int _{a}^{b} {\delta \hat{\varvec{\alpha }}}^T\left( \textbf{B} \textbf{Q}^T\textbf{K}_0\textbf{Q}\right) \partial _s\textbf{x}\\&\qquad +\int _{a}^{b}\partial _s{\hat{\textbf{x}}}^T\left( \textbf{Q}^T\textbf{K}_0\textbf{Q}\right) \partial _s{\delta \textbf{x}} +\int _{a}^{b}\partial _s{\hat{\varvec{\alpha }}}^T\left( \textbf{Q}^T\textbf{J}{} \textbf{A}\right) \partial _s{\delta \varvec{\alpha }} \\&\qquad -\int _{a}^{b} {\hat{\varvec{\alpha }}}^T\left( \textbf{B} \textbf{Q}^T\textbf{K}_0\textbf{Q}\right) \partial _s{\delta \textbf{x}} +\lambda _g \int _{a}^{b}{\delta \textbf{x}}^T\left( \textbf{x}-\textbf{x}_g\right) \\&\qquad + \int _{a}^{b}\partial _s{\delta {\varvec{\alpha }}}^T\textbf{A}^T{\varvec{\varLambda }}_\kappa \textbf{A} \partial _s{\varvec{\alpha }} + \frac{1}{\varDelta t^2} \int _{a}^{b}{\delta {\varvec{\alpha }}}^T\textbf{A}^T{\varvec{\varLambda }}_\omega \textbf{A}\left( {\varvec{\alpha }}-{\varvec{\alpha }}_{n-1}\right) \\&\qquad +\lambda _d \int _{a}^{b}\partial _s\delta \textbf{x}^T\left( \partial _s\textbf{x}-\textbf{d}_3({\varvec{\alpha }})\right) +\lambda _d \int _{a}^{b}\delta \textbf{d}_3^T\left( \partial _s\textbf{x}-\textbf{d}_3({\varvec{\alpha }})\right) \\&\quad =\int _{a}^{b}\frac{1}{\lambda _f}{\delta \hat{\textbf{x}}}^T{\hat{\textbf{x}}} +\int _{a}^{b}\frac{1}{\lambda _l}{\delta \hat{\varvec{\alpha }}}^T{\hat{\varvec{\alpha }}} +\int _{a}^{b}j^{-1}K_{33}\partial _s{\delta \hat{\textbf{x}}}^T\textbf{q}_3\\&\qquad -\int _{a}^{b} j^{-1}K_{33}{\delta \hat{\varvec{\alpha }}}^T\textbf{B} \textbf{q}_3. \end{aligned} \end{aligned}$$
(55)

The above equation is highly non-linear and coupled between the state variables \(\textbf{x}, {\varvec{\alpha }}\) and adjoint variables \(\hat{\textbf{x}}, \hat{\varvec{\alpha }}\). For the coefficient matrices \(\textbf{Q}\), \(\textbf{A}\) and \(\textbf{B}\), we use the fixed-point iterations to compute \(\textbf{Q}({\varvec{\alpha }}_n^k)\rightarrow \textbf{Q}({\varvec{\alpha }}_n)\), \(\textbf{A}({\varvec{\alpha }}_n^k)\rightarrow \textbf{A}({\varvec{\alpha }}_n)\) and \(\textbf{B}(\textbf{x}_n^k)\rightarrow \textbf{B}(\textbf{x}_n)\) as \(k\rightarrow +\infty \), starting from \({\varvec{\alpha }}_n^0={\varvec{\alpha }}_{n-1}\) and \(\textbf{x}_n^0=\textbf{x}_{n-1}\). In addition, \(\delta \textbf{d}_3({\varvec{\alpha }})\) is also linearised as \(\delta \textbf{d}_3({\varvec{\alpha }}^k)\) based on the fixed-point iteration. We use Newton’s method to linearise the non-linear term \(\textbf{d}_3({\varvec{\alpha }})\) as follows. From (1) and Remark 3, \(\textbf{d}_3=\left[ -\cos \beta \sin \gamma , \sin \beta , \cos \beta \cos \gamma \right] ^T\), we compute the variation of \(\textbf{d}_3\) with respect to \({\varvec{\alpha }}\) along \(\delta {\varvec{\alpha }}\):

$$\begin{aligned} \delta \textbf{d}_3\left( {\varvec{\alpha }}\right) = \delta \beta \textbf{d}_{\beta }(\beta , \gamma )+\delta \gamma \textbf{d}_\gamma (\beta , \gamma ), \end{aligned}$$
(56)

with

$$\begin{aligned} \textbf{d}_\beta (\beta , \gamma ) =\left[ \sin \beta \sin \gamma , \cos \beta , -\sin \beta \cos \gamma \right] ^T, \end{aligned}$$
(57)

and

$$\begin{aligned} \textbf{d}_\gamma (\beta , \gamma ) =\left[ -\cos \beta \cos \gamma , 0, -\cos \beta \sin \gamma \right] ^T. \end{aligned}$$
(58)

The first order Taylor approximation of \(\textbf{d}_3({\varvec{\alpha }})\) at \({\varvec{\alpha }}^k\) is expressed as:

$$\begin{aligned} \textbf{d}_3({\varvec{\alpha }})\approx \textbf{d}_3({\varvec{\alpha }}^k) + \delta \textbf{d}_{3}\left[ {\varvec{\alpha }}^k; {\varvec{\alpha }}-{\varvec{\alpha }}^k\right] . \end{aligned}$$
(59)

Substituting (57) and (58) into (59), we then can linearise \(\textbf{d}_3({\varvec{\alpha }})\) as follows:

$$\begin{aligned} \textbf{d}_3({\varvec{\alpha }})\approx & {} \textbf{d}_3({\varvec{\alpha }}^k) + \left( \beta -\beta ^k\right) \textbf{d}_\beta (\beta ^k, \gamma ^k) \nonumber \\{} & {} + \left( \gamma -\gamma ^k\right) \textbf{d}_\gamma (\beta ^k, \gamma ^k). \end{aligned}$$
(60)
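The expressions (57), (58) and (60) can be verified numerically. The following minimal sketch (function names are illustrative) compares \(\textbf{d}_\beta \) and \(\textbf{d}_\gamma \) with central finite differences of \(\textbf{d}_3\), and checks that the error of the linearisation (60) decreases quadratically as \({\varvec{\alpha }}\) approaches \({\varvec{\alpha }}^k\).

```python
import numpy as np

def d3(beta, gamma):
    return np.array([-np.cos(beta) * np.sin(gamma), np.sin(beta), np.cos(beta) * np.cos(gamma)])

def d_beta(beta, gamma):   # equation (57)
    return np.array([np.sin(beta) * np.sin(gamma), np.cos(beta), -np.sin(beta) * np.cos(gamma)])

def d_gamma(beta, gamma):  # equation (58)
    return np.array([-np.cos(beta) * np.cos(gamma), 0.0, -np.cos(beta) * np.sin(gamma)])

def d3_linearised(beta, gamma, beta_k, gamma_k):
    """First-order Taylor approximation (60) of d_3 about (beta_k, gamma_k)."""
    return (d3(beta_k, gamma_k)
            + (beta - beta_k) * d_beta(beta_k, gamma_k)
            + (gamma - gamma_k) * d_gamma(beta_k, gamma_k))

beta_k, gamma_k, eps = 0.3, -0.7, 1.0e-6

# Central differences of d_3 with respect to beta and gamma.
fd_beta = (d3(beta_k + eps, gamma_k) - d3(beta_k - eps, gamma_k)) / (2.0 * eps)
fd_gamma = (d3(beta_k, gamma_k + eps) - d3(beta_k, gamma_k - eps)) / (2.0 * eps)

# Linearisation error for two step sizes; it should scale roughly like h**2.
errors = [np.linalg.norm(d3(beta_k + h, gamma_k + h)
                         - d3_linearised(beta_k + h, gamma_k + h, beta_k, gamma_k))
          for h in (1.0e-1, 1.0e-2)]
print(errors)
```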

Finally, by substituting (56) with \(\beta =\beta ^k\) and \(\gamma =\gamma ^k\) and (60) into equation (55), and denoting \(\textbf{Q}\left( {\varvec{\alpha }}_n^k\right) =\textbf{Q}_k\), \(\textbf{A}\left( {\varvec{\alpha }}_n^k\right) =\textbf{A}_k\), \(\textbf{B}(\textbf{x}_n^k)=\textbf{B}_k\), \(\textbf{d}_\beta (\beta ^k, \gamma ^k)=\textbf{d}_\beta ^k\) and \(\textbf{d}_\gamma (\beta ^k, \gamma ^k)=\textbf{d}_\gamma ^k\), we have the following (Problem 2) monolithic formulation to solve Problem 1.

Problem 2

Given the state variables \(\left( \textbf{x}_{n-1}, {\varvec{\alpha }}_{n-1}\right) \) at the previous time \(t_{n-1}\) (\(n=1, 2, \ldots \)), and an objective position vector \(\textbf{x}_g(t_n)\) at the current time \(t_n\), compute \(\left( \textbf{x}^k, {\varvec{\alpha }}^k\right) \rightarrow \left( \textbf{x}, {\varvec{\alpha }}\right) \in H_D^1\), \(\left( \hat{\textbf{x}}, \hat{\varvec{\alpha }} \right) \in H_0^1\) iteratively from \(\left( \textbf{x}^0, {\varvec{\alpha }}^0\right) =\left( \textbf{x}_{n-1}, {\varvec{\alpha }}_{n-1}\right) \), such that \(\forall \left( \delta \textbf{x}, \delta {\varvec{\alpha }}\right) \in H_0^1\) and \(\forall \left( \delta \hat{\textbf{x}}, \delta \hat{\varvec{\alpha }}\right) \in H_0^1\):

$$\begin{aligned}&\int _{a}^{b}\partial _s{\delta \hat{\textbf{x}}}^T\left( \textbf{Q}_k^T\textbf{K}_0\textbf{Q}_k\right) \partial _s\textbf{x} +\int _{a}^{b}\partial _s{\delta \hat{\varvec{\alpha }}}^T\left( \textbf{Q}_k^T\textbf{J}{} \textbf{A}_k\right) \partial _s{\varvec{\alpha }} \nonumber \\&\qquad +\int _{a}^{b}\partial _s{\hat{\textbf{x}}}^T\left( \textbf{Q}_k^T\textbf{K}_0\textbf{Q}_k\right) \partial _s{\delta \textbf{x}} +\int _{a}^{b}\partial _s{\hat{\varvec{\alpha }}}^T\left( \textbf{Q}_k^T\textbf{J}\textbf{A}_k\right) \partial _s{\delta \varvec{\alpha }} \nonumber \\&\qquad -\int _{a}^{b} {\hat{\varvec{\alpha }}}^T\left( \textbf{B}_k \textbf{Q}_k^T\textbf{K}_0\textbf{Q}_k\right) \partial _s{\delta \textbf{x}} -\int _{a}^{b} {\delta \hat{\varvec{\alpha }}}^T\left( \textbf{B}_k \textbf{Q}_k^T\textbf{K}_0\textbf{Q}_k\right) \partial _s\textbf{x}\nonumber \\&\qquad +\lambda _g \int _{a}^{b}{\delta \textbf{x}}^T\textbf{x} -\frac{1}{\lambda _f}\int _{a}^{b}{\delta \hat{\textbf{x}}}^T{\hat{\textbf{x}}} -\frac{1}{\lambda _l}\int _{a}^{b}{\delta \hat{\varvec{\alpha }}}^T{\hat{\varvec{\alpha }}} \nonumber \\&\qquad +\int _{a}^{b}\partial _s{\delta {\varvec{\alpha }}}^T\textbf{A}_k^T{\varvec{\varLambda }}_\kappa \textbf{A}_k \partial _s{\varvec{\alpha }} +\frac{1}{\varDelta t^2} \int _{a}^{b}{\delta {\varvec{\alpha }}}^T\textbf{A}_k^T{\varvec{\varLambda }}_\omega \textbf{A}_k{\varvec{\alpha }}\nonumber \\&\qquad +\lambda _d\int _a^b \partial _s{\delta \textbf{x}}^T\partial _s\textbf{x} +\lambda _d\int _a^b \left( \delta \beta \textbf{d}_\beta ^k+ \delta \gamma \textbf{d}_\gamma ^k\right) ^T\partial _s\textbf{x}\nonumber \\&\qquad -\lambda _d\int _a^b \partial _s{\delta \textbf{x}}^T\left( \beta \textbf{d}_\beta ^k+ \gamma \textbf{d}_\gamma ^k\right) \nonumber \\&\qquad -\lambda _d\int _a^b \left( \delta \beta \textbf{d}_\beta ^k+ \delta \gamma \textbf{d}_\gamma ^k\right) ^T \left( \beta \textbf{d}_\beta ^k+ \gamma \textbf{d}_\gamma ^k\right) \nonumber \\&\quad = \lambda _g \int _{a}^{b}\delta \textbf{x}^T\textbf{x}_g +\int _{a}^{b}j^{-1}K_{33}\partial _s{\delta \hat{\textbf{x}}}^T\textbf{q}_3^k\nonumber \\&\qquad -\int _{a}^{b} j^{-1}K_{33}{\delta \hat{\varvec{\alpha }}}^T\textbf{B}_k \textbf{q}_3^k\nonumber \\&\qquad +\lambda _d\int _a^b \partial _s{\delta \textbf{x}}^T\left( \textbf{d}_3^k -\beta ^k\textbf{d}_\beta ^k-\gamma ^k\textbf{d}_\gamma ^k\right) \nonumber \\&\qquad +\lambda _d\int _a^b \left( \delta \beta \textbf{d}_\beta ^k+ \delta \gamma \textbf{d}_\gamma ^k\right) ^T\textbf{d}_3^k\nonumber \\&\qquad -\lambda _d\int _a^b \left( \delta \beta \textbf{d}_\beta ^k+ \delta \gamma \textbf{d}_\gamma ^k\right) ^T \left( \beta ^k\textbf{d}_\beta ^k+ \gamma ^k\textbf{d}_\gamma ^k\right) \nonumber \\&\qquad +\frac{1}{\varDelta t^2} \int _{a}^{b}{\delta {\varvec{\alpha }}}^T\textbf{A}_k^T{\varvec{\varLambda }}_\omega \textbf{A}_k{\varvec{\alpha }}_{n-1}. \end{aligned}$$
(61)

Remark 10

For the above fixed-point iteration, a relaxation parameter \(0\le w\le 1\) is introduced to stabilise the algorithm: instead of directly updating \(\left( \textbf{x}^k, {\varvec{\alpha }}^k\right) \) after solving (61), the weighted average \(w\left( \textbf{x}^k, {\varvec{\alpha }}^k\right) + (1-w)\left( \textbf{x}^{k-1}, {\varvec{\alpha }}^{k-1}\right) \) is adopted. We use \(w=0.5\) for all our simulations.
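The relaxed update in this remark can be illustrated on a toy problem. The sketch below is not the solver used for (61): the map `G` stands in for "solve the linearised system with coefficients frozen at the current iterate", and the scalar example \(G(u)=\cos u\), the tolerance and the function name are assumptions made for the illustration.

```python
import math

def relaxed_fixed_point(G, u0, w=0.5, tol=1.0e-12, max_iter=200):
    """Iterate u <- w*G(u) + (1-w)*u until successive iterates agree to tol."""
    u = u0
    for k in range(1, max_iter + 1):
        u_new = w * G(u) + (1.0 - w) * u  # weighted average of new and old iterate
        if abs(u_new - u) <= tol * max(1.0, abs(u_new)):
            return u_new, k
        u = u_new
    raise RuntimeError("relaxed fixed-point iteration did not converge")

# Toy example: the fixed point of cos(u), starting from u = 1 with w = 0.5.
u_star, iterations = relaxed_fixed_point(math.cos, 1.0)
print(u_star, iterations)
```

For any \(w>0\) the relaxed iteration has the same fixed points as the unrelaxed one, so relaxation changes the path to convergence but not the converged solution.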

Fig. 5

Displacement and rotation at the tip of the cantilever beam

5 Numerical tests

We first validate the formulation (40) for the simulation of a forward problem with a time discretisation scheme introduced in Appendix A, and then apply the optimal control formulation (61) to data from a forward simulation, in which case we have the ground truth rotations of the local frames and a quantitative comparison can be performed. Finally, we apply the optimal control method to data from laboratory experiments and infer the frames of rotation. All the numerical tests are implemented using the open-source library FreeFem++ [42]. For code and results, see the Data Availability statement in the Declarations section below.

5.1 Forward simulation of a cantilever beam

We consider a cantilever beam with a dynamic load and reproduce the result presented in [12]. The beam’s length is \(L=1 m\) (which is a constant for this test due to a small deformation), with density \(\rho =2.73\times 10^3 kg/m^3\), Young’s modulus \(E=7.10\times 10^{10} Pa\) and shear modulus \(G=2.69\times 10^{10} Pa\). The cross section of the beam is a rectangle with width \(a=0.06 m\) and height \(b=0.04 m\), and the numerical shear correction factor for this cross section is set to be \(k=0.833\). The moments of inertia are \(I_{11}=\rho ab^3/12\), \(I_{22}=\rho a^3b/12\) and \(I_{33}=I_{11}+I_{22}\). The stiffnesses for the torque in (15) are \(J_{11}=E ab^3/12\), \(J_{22}=E a^3b/12\) and \(J_{33}=G (ab^3+a^3b)/12\). The external force, corresponding to \(\textbf{f}\) in equation (40), is expressed as:

$$\begin{aligned}{} & {} f_1^g(s,t)=f_2^g(s,t)=2\sin (\pi s)\sin (8\omega _0 t) kNm^{-1}, \nonumber \\{} & {} f_3^g(s,t)=0, \end{aligned}$$
(62)

with the natural frequency of the system \(\omega _0=207.0236 s^{-1}\).
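For concreteness, the load (62) can be coded as below. This is a small sketch with a hypothetical function name; the amplitude is converted from \(kNm^{-1}\) to \(Nm^{-1}\), and s and t are assumed to be given in metres and seconds.

```python
import numpy as np

OMEGA_0 = 207.0236  # natural frequency of the system in 1/s, as quoted above

def distributed_load(s, t):
    """External force (62) in N/m at arc length s (m) and time t (s)."""
    f = 2.0e3 * np.sin(np.pi * s) * np.sin(8.0 * OMEGA_0 * t)
    return np.stack([f, f, np.zeros_like(f)], axis=-1)  # (f_1, f_2, f_3)

# The load peaks at mid-span when 8*omega_0*t = pi/2.
t_peak = np.pi / (16.0 * OMEGA_0)
print(distributed_load(0.5, t_peak))
```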

The beam is discretised by 100 segments and the total computational time \(T=0.06\) is divided into 1000 steps. The displacement and rotation at the end of the beam are plotted in Figure 5, which quantitatively reproduces the results (Fig. 6 and Fig. 7) in [12].

5.2 Optimal control using data from a forward simulation

In this example, we modify the previous test of the cantilever beam so that it has similar material properties to C. elegans and undergoes a large deformation. As discussed in Section 3, we also neglect the inertia terms in equation (40) to model C. elegans locomotion. Our motivation is to generate a dataset to validate the proposed optimal control method. The new beam now has an initial length \(L_0=10^{-3} m\) and circular cross section with radius \(r_0=L_0/40 m\) [5]. The numerical correction factor for a circular cross section is taken to be \(k=4/3\) [34]. We adopt values for its Young’s modulus \(E=1.1\times 10^5 Pa\) and shear modulus \(G=5.0\times 10^4 Pa\) [5].

Fig. 6

Diagram of a cantilever beam

Before generating the dataset by applying a complicated external force and torque, we first apply a simple force F at the end of the beam, as shown in Figure 6, to validate the approach against the analytical solution based on the Timoshenko beam theory [34]:

$$\begin{aligned} y=-\frac{F}{kAG}s-\frac{FL_0}{2J_{11}}s^2+\frac{F}{6J_{11}}s^3. \end{aligned}$$
(63)

The rod is discretised by 100 segments (for which the mesh has converged), and we compute the deflection of the rod by solving the primal equation (52). It can be seen from Figure 7 that the result of the Cosserat model agrees very well with the prediction of the Timoshenko theory for up to \(10\%\) deflection of the rod.
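The analytical reference (63) is straightforward to evaluate. The sketch below uses the parameters of this section and the two forces quoted in the caption of Figure 7; it assumes \(A=\pi r_0^2\) and \(J_{11}=EA^2/4\pi \) as in (39), and the function name is illustrative.

```python
import numpy as np

def timoshenko_deflection(s, F, L0, A, G, k, J11):
    """Deflection (63) of a cantilever under a tip force F, at arc length s."""
    return -F / (k * A * G) * s - F * L0 / (2.0 * J11) * s**2 + F / (6.0 * J11) * s**3

L0 = 1.0e-3                      # initial length (m)
r0 = L0 / 40.0                   # cross-section radius (m)
A = np.pi * r0**2                # cross-section area (m^2)
E, G, k = 1.1e5, 5.0e4, 4.0 / 3.0
J11 = E * A**2 / (4.0 * np.pi)   # bending stiffness, see (39)

# Tip deflection relative to the rod length for the two forces in Figure 7.
tip_ratio = {F: timoshenko_deflection(L0, F, L0, A, G, k, J11) / L0
             for F in (5.0e-9, 1.0e-8)}
print(tip_ratio)
```

With these values the relative tip deflections come out at approximately \(4.9\%\) and \(9.9\%\), in line with the \(5\%\) and \(10\%\) cases; the shear contribution (the first term of (63)) is below \(0.1\%\) of the total.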

Fig. 7

Deflection of a cantilever beam under a concentrated force at the end of the beam. \(F=5\times 10^{-9} N\) for the case of \(5\%\) deflection and \(F=10^{-8} N\) for \(10\%\) deflection

The first test of the proposed control algorithm is to use a dataset generated by a distributed force along the rod:

$$\begin{aligned} f_2^l=-F_{\max }\left( e^{4z/L_0}-1\right) /3, \quad f_1^l=f_3^l=0, \end{aligned}$$
(64)

with \(F_{\max }=10^{-4}\). The beam undergoes a large deformation as shown in Figure 8 (left); meanwhile a curvature \(\kappa _1^l\) (see formula (28)) is also generated along the rod. Using the proposed control formulation in Section 4 and control parameters of \(\lambda _f=1\), \(\lambda _l=10^{-10}\), \(\lambda _d=10^{-6}\), \(\lambda _{\kappa _3}=10^{-10}\) (\(\lambda _{\kappa _1}=\lambda _{\kappa _2}=0\)) and \({\varvec{\varLambda }}_\omega =\textbf{0}\), both the position and the curvature can be recovered accurately as shown in Figure 8. Note that all the other components of the position vector and generalised curvature are zero, although they are not presented here. Before moving to other test cases, let us test, with regard to the control parameters \(\lambda _f\), \(\lambda _l\) and \(\lambda _d\), the convergence of the following quantities: the objective \(\Vert \textbf{x}-\textbf{x}_g\Vert \); the output curvature \(\Vert {\varvec{\kappa }}^l-{\varvec{\kappa }}^l_{f}\Vert \) (\({\varvec{\kappa }}^l_{f}\) is from the forward simulation); the tangential direction \(\Vert \partial _s\textbf{x}-\textbf{d}_3\Vert \); and the algorithm itself, measured by the relative error of \(\left( \textbf{x}, {\varvec{\alpha }}\right) \) between the current and previous fixed-point iterations.

Fig. 8

Comparison of the position vector (left) and curvature (right) between forward and backward simulations for the first test case in 5.2

The main findings are summarised as follows: (1) If we only use \(\textbf{f}\) as the control (notice that this does not mean \(\lambda _l=0\); we have to remove the \(\textbf{l}\) term in (61)), the proposed algorithm struggles to converge however the other parameters are tuned. Therefore, the regularisation \(\lambda _l-\)term in (41) does play an important stabilisation role, although \(\textbf{l}=0\) when we generate the dataset; (2) If we plot the above convergence measures as shown in Figure 9 (\(\lambda _f=1\), \(\lambda _d=10^{-6}\) and \(\lambda _{\kappa _3}=10^{-20}\)) and vary \(\lambda _l\) from magnitude \(10^{-20}\) to \(10^{3}\), we find that these convergence curves are exactly the same. The only difference we notice is that the magnitude of the adjoint variable \(\hat{\varvec{\alpha }}\) varies correspondingly from \(10^{-13}\) to \(10^{10}\) so that the control torque \(\textbf{l}=\hat{\varvec{\alpha }}/\lambda _l\) always has a magnitude of \(10^{-7}\). Therefore, the algorithm (at least for this test) is not sensitive to the regularisation parameter \(\lambda _l\), although it is required to stabilise the algorithm as pointed out above; (3) The proposed algorithm can converge stably with the regularisation parameter \(\lambda _d\) varying from \(10^1\) to \(10^{-10}\), and the convergence of \(\Vert {\varvec{\kappa }}^l-{\varvec{\kappa }}^l_{f}\Vert \) and \(\Vert \partial _s\textbf{x}-\textbf{d}_3\Vert \) is plotted in Figure 10 with \(\lambda _f=1\), \(\lambda _l=10^{-10}\) and \(\lambda _{\kappa _3}=10^{-20}\); (4) The proposed algorithm converges for a range of parameters \(\lambda _f\) from \(10^{-10}\) to \(10^6\). Despite the steady convergence of the algorithm, the value of \(\lambda _f\) must be sufficiently small for the objectives to be sufficiently reduced. To demonstrate this, convergence plots for the relevant quantities given two extreme values (\(10^{-10}\) and \(10^6\)) of \(\lambda _f\) are compared in Figure 11; (5) The purpose of the \(\lambda _{\kappa _3}\) parameter is to control twist along the rod. In this test, the algorithm can converge stably and the curvature error can be reduced sufficiently with \(\lambda _{\kappa _3}\) from 1 to \(10^{-15}\).

We next consider a dataset generated by the following force and torque, which create both curvature and torsion along the rod as shown in Figure 12.

$$\begin{aligned} f_1^l=-2.2F_{\max }, \quad f_2^l=f_3^l=0, \quad l_1^l=10^{-7}, \quad l_2^l=l_3^l=0. \end{aligned}$$
(65)

Again we test all the regularisation parameters systematically, and our findings are summarised as follows: (1) \(\lambda _l\) is necessary for stability, and it can be chosen from a magnitude of \(10^{-30}\) to \(10^{-2}\) (based on a test with \(\lambda _f=1\), \(\lambda _d=10^{-6}\) and \(\lambda _{\kappa _3}=10^{-20}\)); (2) the recommended value for \(\lambda _d\) ranges between \(10^{-10}\) and \(10^3\): the objective cannot be sufficiently reduced when \(\lambda _d>10^3\), and the algorithm cannot converge when \(\lambda _d<10^{-10}\) (based on a test with \(\lambda _f=1\), \(\lambda _l=10^{-6}\) and \(\lambda _{\kappa _3}=10^{-20}\)); (3) the proposed algorithm converges steadily for a range of \(\lambda _f\) from \(10^{-30}\) to \(10^2\), and the suggested values are \(\lambda _f<1\), otherwise the objective cannot be reduced sufficiently (based on a test with \(\lambda _l=10^{-6}\), \(\lambda _d=10^{-6}\) and \(\lambda _{\kappa _3}=10^{-20}\)); (4) the algorithm converges stably with \(\lambda _{\kappa _3}\) of magnitude \(10^{-15}\) to 1, but the recommended value is \(>10^{-25}\), otherwise too large a \(\kappa _3^l\) could be generated. The convergence of the relevant quantities for a specific parameter set is plotted in Figure 13. Comparisons of the position and curvature between the forward and backward computations are displayed in Figures 12 and 14 respectively. It can be seen that the positions from the forward and backward simulations match very well along the rod, and the curvatures also match well except at the end where the Dirichlet boundary condition is applied.

Fig. 9
Log-log plot of the convergence of relevant quantities as functions of the number of iterations for the first test case in 5.2

Fig. 10
Convergence of \(\Vert {\varvec{\kappa }}^l-{\varvec{\kappa }}^l_{f}\Vert \) (red) and \(\Vert \partial _s\textbf{x}-\textbf{d}_3\Vert \) (blue) in terms of regularisation parameter \(\lambda _d\) for the first test case in 5.2

Fig. 11
Comparison of convergence between two extreme values of \(\lambda _f\) for the first test case in 5.2

Fig. 12
Comparison of the position vector between forward and backward simulations using (65) for the second test case in 5.2

Fig. 13
Convergence of different measures with \(\lambda _f=10^{-2}\), \(\lambda _l=10^{-6}\), \(\lambda _d=10^{-6}\) and \(\lambda _\kappa =10^{-20}\) for the second test case in 5.2

Remark 11

We find that non-trivial forward data, involving large bending and torsion for example, is not easy to generate, because it is not straightforward to design a force \(\textbf{f}\) or torque \(\textbf{l}\) for which the forward problem converges easily. Once a dataset is given, however, the control problem converges easily: it converges to the same position vector \({ \textbf{x}}\) and rotation \({\varvec{\alpha }}\) as the forward simulation, even with a different control force and torque. This can be understood by noting that we do not expect the force and torque producing the same \({ \textbf{x}}\) and \({\varvec{\alpha }}\) to be unique. As an example, we show the control force and torque for the previous test in Figure 15, from which it can be seen that the magnitude is similar to the designed one in (65), but the distribution is different.

Fig. 14
Comparison of the curvature between forward and backward simulations for the second test case in 5.2

Fig. 15
Control force (left) and torque (right) for the second test case in 5.2

6 Reconstruction of C. elegans locomotion based on experimental data

We tested the proposed method on three examples of C. elegans locomotion in 3D volumes [72]. The data represent the body midlines that were reconstructed from microscopy video footage of freely and spontaneously moving worms immersed in different fluids. We present one test case in this section; all three tests (including the dataset, FreeFem++ code and simulation results) can be found in the public GitHub repository: https://github.com/yongxingwang. The first test case consists of a sequence of 1160 time frames with a sampling interval of 0.04 s (inertia is not considered in our model, so the time step only appears in the regularisation term in (41)), and each reconstructed body centreline consists of 128 discrete three-dimensional spatial points. These examples are of interest due to the three-dimensional postures and motion of the swimmer: in this clip, the worm exhibits large bending and large torsion, and moves forward and backward in three-dimensional space for about 46 s. The physical body of the worm is modelled by a cylindrical Cosserat rod with a circular cross section of initial radius \(2\times 10^{-5}\) m, Young’s modulus \(E=1.1\times 10^5\) Pa and shear modulus \(E/\left( 2(1+\nu )\right) \) with \(\nu =0.4\) [5, 20]. Four typical postures of the worm are shown in Figure 16: the worm initially moves from right to left and starts a manoeuvre to reverse its motion at around the \(400^{th}\) time frame; after another 350 steps, the worm suddenly bends to resemble a capital \(\varOmega \) (bottom-left in Figure 16) and moves to the right.
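As a quick check of the numbers quoted above, the following Python sketch evaluates the clip duration and the shear modulus from the stated values, and shows one way of estimating the length of a discretised centreline. The function names and the polyline approximation of the arc length are assumptions for illustration only; they are not taken from the published code.

```python
import math

# Values quoted in the text for the first test case.
N_FRAMES = 1160        # number of time frames in the clip
DT = 0.04              # sampling interval in seconds
N_POINTS = 128         # points per reconstructed centreline
YOUNG_MODULUS = 1.1e5  # Pa
POISSON_RATIO = 0.4


def clip_duration(n_frames, dt):
    """Approximate clip length in seconds (frames times sampling interval)."""
    return n_frames * dt


def shear_modulus(young, nu):
    """Shear modulus E / (2 (1 + nu)) of an isotropic elastic material."""
    return young / (2.0 * (1.0 + nu))


def polyline_length(points):
    """Length of the piecewise-linear curve through a list of 3D points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


if __name__ == "__main__":
    print(clip_duration(N_FRAMES, DT))                  # about 46.4 s
    print(shear_modulus(YOUNG_MODULUS, POISSON_RATIO))  # about 3.93e4 Pa
```

Applied to one frame of the data (a list of 128 coordinate triples), `polyline_length` would give an estimate of the current body length for that frame.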

To construct the first frame, we set \(\alpha _a=0\) and \({\varvec{\varLambda }}_\omega =0\); and from the second frame onwards, we use a non-zero \({\varvec{\varLambda }}_\omega \) (at least non-zero \(\lambda _{\omega _3}\) for the sake of convergence) without any Dirichlet data. The three components of the generalised curvature are plotted along the worm’s body for all the time frames in a two-dimensional plane as shown in Figure 17, from which a bending (\(\kappa _1^l\) and \(\kappa _2^l\)) wave can be seen propagating from the worm’s head to the tail; the twisting (\(\kappa _3^l\)) wave is not obvious but some twisting can still be observed. The propagating wave is consistent with the moving direction of the worm as shown in Figure 16 and described above.

Fig. 16
Four typical postures of C. elegans. The arrows show the local frames; the coloured one is \(\textbf{d}_3\), pointing from the worm’s head to tail, with colour showing the magnitude of generalised curvature. (Color figure online)

Starting from a converged parameter set as shown in Table 1 (used for the results in Figures 16 and 17), with the converged objectives in (41) shown in Figure 18, we vary these parameters, study the convergence of the algorithm and compare the corresponding results below. Notice that this parameter set contains the minimal non-zero parameters needed for the proposed algorithm to converge (please refer to Remarks 5 and 6 and Section 3.4 for explanations).

Parameter \(\lambda _f\): with other parameters frozen, the proposed algorithm converges stably for \(\lambda _f\) from \(10^{-20}\) to 10; the fixed-point iteration becomes slower for larger \(\lambda _f\). We find that all the objectives stay the same except the control \(\textbf{f}\) (see Figure 19).

Parameter \(\lambda _l\): we then keep \(\lambda _f=10^{-20}\) and the proposed algorithm still converges stably with the magnitude of \(\lambda _l\) from \(10^{-10}\) to 1. We observe that all the objectives are still almost the same except the control \(\textbf{l}\) as shown in Figure 19.

Parameter \(\lambda _d\): based on the above two tests, we realise that the total objective function (41) is dominated by the \(\lambda _d-\)term. With other parameters frozen, we find that the convergence range for \(\lambda _d\) is approximately between \(10^{-2}\) and \(10^3\). A comparison of converged objectives between Parameter-0 in Table 1 and its variant with \(\lambda _d=10^{-2}\) (reduced from \(\lambda _d=10^2\)) is plotted in Figure 20, from which it can be seen that (i) the real objective \(\Vert \textbf{x}-\textbf{x}_g\Vert /\Vert \textbf{x}_g\Vert \) is reduced by two orders of magnitude, with oscillations for some frames, which is expected for such a small regularisation parameter; (ii) the \(\lambda _d-\)term increases as its regularisation parameter decreases from \(\lambda _d=10^2\) to \(\lambda _d=10^{-2}\). We notice that the magnitude of \(\Vert \partial _s\textbf{x}-\textbf{d}_3\Vert /\Vert \textbf{d}_3\Vert \) increases to \(10^{-1}\) for some frames, which means that \(\textbf{d}_3\) deviates from the tangent \(\partial _s\textbf{x}\) of the centreline by about 10%. For example, Figure 21 shows frame number 700, where the normal direction \(\textbf{d}_3\) of the cross section detaches from the tangential direction \(\partial _s\textbf{x}\) of the centreline; (iii) none of the other objective terms shows a significant change except the control \(\textbf{f}\), which varies according to \(\Vert \textbf{x}-\textbf{x}_g\Vert /\Vert \textbf{x}_g\Vert \) as expected.
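The relative detachment \(\Vert \partial _s\textbf{x}-\textbf{d}_3\Vert /\Vert \textbf{d}_3\Vert \) discussed above can be estimated from discrete data along the following lines. This Python sketch rests on several assumptions that are not taken from the paper: the centreline is sampled on a uniform grid of spacing `h` in the reference arc length, \(\partial _s\textbf{x}\) is approximated by finite differences (central in the interior, one-sided at the two ends), and the norm is a plain discrete \(L^2\) norm. The function names are hypothetical.

```python
import math


def tangent(points, h):
    """Finite-difference approximation of dx/ds on a uniform grid of spacing h."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    out = []
    for i in range(n):
        lo = max(i - 1, 0)      # one-sided difference at the first point
        hi = min(i + 1, n - 1)  # one-sided difference at the last point
        scale = (hi - lo) * h
        out.append(tuple((points[hi][k] - points[lo][k]) / scale for k in range(3)))
    return out


def relative_detachment(points, d3, h):
    """Discrete ||dx/ds - d3|| / ||d3|| over all sample points."""
    if len(points) != len(d3):
        raise ValueError("points and d3 must have the same length")
    t = tangent(points, h)
    num = sum((t[i][k] - d3[i][k]) ** 2 for i in range(len(points)) for k in range(3))
    den = sum(d3[i][k] ** 2 for i in range(len(points)) for k in range(3))
    return math.sqrt(num / den)
```

For a straight rod whose directors are all tilted by an angle \(\theta \) from the tangent, this measure equals \(2\sin (\theta /2)\), so a value of about \(10^{-1}\) corresponds to a tilt of roughly 0.1 rad.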

Table 1 Parameter-0: minimal non-zero parameters for the sake of convergence of the proposed algorithm

Remark 12

The detachment of \(\textbf{d}_3\) from \(\partial _s\textbf{x}\) is an important feature of Cosserat rods. Without it, the Cosserat rod approaches the Kirchhoff rod (if the deformation scalar \(j=1\), which is true for our case study of C. elegans: we observe that \(|j-1|<10^{-3}\) always holds numerically), in which case it is assumed that \(\textbf{d}_3=\partial _s\textbf{x}\) [31].

Fig. 17
Local curvature \(\kappa _1^l, \kappa _2^l, \kappa _3^l\) (from top to bottom) along the C. elegans body centerline (vertical axis: 0 = head, 127 = tail) as a function of time (horizontal axis), as computed from data (clip 1) for all the time frames. \(\lambda _f=10^{-3}\), \(\lambda _l=10^{-6}\), \(\lambda _d=10^{2}\), \({\varvec{\varLambda }}_\kappa =\textbf{0}\) and \({\varvec{\varLambda }}_\omega =\text {diag}\left( 0,0,10^{-20}\right) \)

Fig. 18
Objective terms in (41) as a function of time, using \(\lambda _f=10^{-3}\), \(\lambda _l=10^{-6}\), \(\lambda _d=10^{2}\), \({\varvec{\varLambda }}_\kappa =\textbf{0}\) and \({\varvec{\varLambda }}_\omega =\text {diag}\left( 0,0,10^{-20}\right) \)

Fig. 19
Objective terms in (41) as a function of time; solid lines are the initial parameter set as shown in Figure 18 and dashed lines are a variation of \(\lambda _f\) and \(\lambda _l\) in the initial parameter set with \(\lambda _f=10^{-20}\) and \(\lambda _l=10^{-10}\)

Fig. 20
Objective terms in (41) as a function of time; solid lines (—) are the initial parameter set as shown in Figure 18 and dashed lines (- - -) are a variation of \(\lambda _d\) in the initial parameter set with \(\lambda _d=10^{-2}\)

Remark 13

Keeping \(\lambda _d=10^{-2}\) (which now cannot dominate the total objective function (41)) and varying \(\lambda _f\) and \(\lambda _l\), we again observe a variation of the other objective terms, although we do not present all these tests here. However, regularisation parameters that are too small can cause stability issues, as we have already seen when reducing \(\lambda _d\) from \(10^2\) to \(10^{-2}\), although the algorithm still converged.

The parameter \({\varvec{\varLambda }}_\kappa \) provides a constraint on the curvature along the worm’s body, which allows us to consider the anatomical muscle structure of C. elegans as explained in Section 3.4. Based on Parameter-0, we now simply choose \({\varvec{\varLambda }}_\kappa =\text {diag}\left( 0,0,10^{-20}\right) \) to restrict the twist motion of the worm, and the generalised curvature is plotted in Figure 23. Comparing with Figure 17, we can see that not only does the magnitude of \(\kappa _3^l\) dramatically decrease, but a clear twisting wave also appears along the worm. In addition, there is also an influence on the first two components \(\kappa _1^l\) and \(\kappa _2^l\): the wave propagation is clearer although the curvature magnitudes are almost the same as in the case of \({\varvec{\varLambda }}_\kappa =\textbf{0}\). We also notice that the reversing manoeuvre (starting at around frame 700) becomes more distinct: \(\kappa _3^l\) is much larger near the worm’s head at frame 700 than elsewhere along the body or in any other frame, as can be observed from Figures 22 and 23. Similarly, using non-zero \(\lambda _{\kappa _1}\) or \(\lambda _{\kappa _2}\) would allow us to favour bending in the dorsal-ventral directions, which is consistent with the C. elegans neuromusculature.

Fig. 21
Frame No. 700 using \(\lambda _d=10^{-2}\), where the normal direction \(\textbf{d}_3\) of the cross section detaches from the tangential direction \(\partial _s\textbf{x}\) of the centreline: \(\textbf{d}_3\) is shown in white and \(\partial _s\textbf{x}\) is coloured by the magnitude of the generalised curvature. (Color figure online)

Fig. 22
Frame No. 700 using \({\varvec{\varLambda }}_\kappa =\text {diag}\left( 0,0,10^{-20}\right) \), where the worm undergoes a strong twist at its head: the magnitude of \(\kappa _3^l\) is much larger at the worm’s head than elsewhere along the body

Fig. 23
Local curvature \(\kappa _1^l, \kappa _2^l, \kappa _3^l\) (from top to bottom) along the C. elegans body centerline (vertical axis: 0 = head, 127 = tail) as a function of time (horizontal axis), as computed from data (clip 1) for all the time frames. \(\lambda _f=10^{-3}\), \(\lambda _l=10^{-6}\), \(\lambda _d=10^{2}\), \({\varvec{\varLambda }}_\kappa =\text {diag}\left( 0,0,10^{-20}\right) \) and \({\varvec{\varLambda }}_\omega =\text {diag}\left( 0,0,10^{-20}\right) \)

\({\varvec{\varLambda }}_\omega \) is used to model the internal friction of the worm, which is necessary from the second time frame onwards. For the sake of convergence of our algorithm, only \(\lambda _{\omega _3}\) is required. If we increase \({\varvec{\varLambda }}_\omega \) in Parameter-0 from \(\text {diag}\left( 0,0,10^{-20}\right) \) to some value less than \(\text {diag}\left( 1,1,1\right) \), the proposed algorithm converges stably and the angular velocity stays almost the same as shown in Figure 24. However, there is a big change in the curvature (see Figures 24 and 25): wave propagation is no longer apparent because a larger \({\varvec{\varLambda }}_\omega \) tends to keep the rotation angles along the body the same over time; consequently the space derivative (curvature) along the worm’s body does not change significantly in time.
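The mechanism by which a larger weight on the change between frames freezes the rotation angles can be illustrated with a scalar toy problem. This is only an analogue and not the functional (41): we assume a single unknown \(a\), a value `target` that the data would favour, the value `previous` from the preceding frame, and a non-negative `weight` playing the role of the friction parameter. Minimising \((a-\text {target})^2+\text {weight}\,(a-\text {previous})^2\) and setting the derivative to zero gives \(a=(\text {target}+\text {weight}\cdot \text {previous})/(1+\text {weight})\).

```python
def damped_update(target, previous, weight):
    """Minimiser of (a - target)**2 + weight * (a - previous)**2 over a.

    Toy scalar analogue of a friction-type regularisation between frames;
    `weight` must be non-negative.
    """
    if weight < 0.0:
        raise ValueError("weight must be non-negative")
    return (target + weight * previous) / (1.0 + weight)


if __name__ == "__main__":
    # Sweep the weight to see the update move from the target to the previous value.
    for w in (0.0, 1e-2, 1.0, 1e2, 1e6):
        print(w, damped_update(2.0, 0.0, w))
```

With zero weight the update follows the data exactly; as the weight grows, the update stays close to the previous frame, which mirrors the loss of wave propagation observed for larger \({\varvec{\varLambda }}_\omega \).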

Fig. 24
Converged curvature and angular velocity as a function of time, using \(\lambda _f=10^{-3}\), \(\lambda _l=10^{-6}\), \(\lambda _d=10^{2}\), \({\varvec{\varLambda }}_\kappa =\textbf{0}\) and \({\varvec{\varLambda }}_\omega =\text {diag}\left( 0,0,10^{-10}\right) \)

Fig. 25
Local curvature \(\kappa _3^l\) along the C. elegans body centerline (vertical axis: 0 = head, 127 = tail) as a function of time (horizontal axis). \(\lambda _f=10^{-3}\), \(\lambda _l=10^{-6}\), \(\lambda _d=10^{2}\), \({\varvec{\varLambda }}_\kappa =\text {diag}\left( 0,0,10^{-20}\right) \) and \({\varvec{\varLambda }}_\omega =\text {diag}\left( 0,0,10^{-10}\right) \)

7 Conclusion and discussion

This paper presents three contributions: the forward formulation of a Cosserat rod, the optimal control method and the reconstruction of C. elegans locomotion.

The forward formulation of the Cosserat rod is developed from [12], in which the Cosserat rod is described by three components of the position vector \((x, y, z)\) and three components of the rotation vector \((\alpha , \beta , \gamma )\). We derive the angular velocity and generalised curvature using a new method in Section 2.3 and rewrite all the control equations in a matrix-vector format; in addition, we consider dilation of the cross section of the rod by distinguishing between the reference and current arc lengths and deriving the incompressibility condition in Section 2.4. We define a forward problem: to solve for the position vector and rotation vector given external forces and torques. We have reproduced the numerical examples in [12] and found that this formulation is robust and convenient for analysis of complex dynamic behaviour of slender rods.

A well-posed inverse problem may be solving for external forces and torques, given both the position vector and rotation vector. However, accessing both position and rotational information may not be practical. For example, for a biological worm moving freely in a fluid environment, it is difficult to measure the worm’s local orientation (rotation vector) while its centreline (position vector in the global frame) can be reconstructed from video footage [72]. A complementary problem that may be tackled with an analogous approach considers a robotic worm exploring an unknown space. Given sufficient local body sensors, the body posture (including bending and twisting) would be reliably detected by such a robot. However, in the absence of external location data, the robot may lack positional information. Motivated by these biological and engineering problems, we consider an ill-posed inverse problem which solves for rotation vector, external forces and torques given only the position vector. We present a robust and efficient optimal control method to solve this inverse problem: the objective is to minimise the discrepancy between the position vector and a given centreline of the Cosserat rod, and the control variables are the external forces and torques, with regularisations of the rotation vector. The regularisation terms provide constraints of the rotation vector so that the inverse problem is solvable. We have tested the proposed optimal control formulation using data from forward simulations and shown that the rotation vector can be accurately computed with appropriate and controllable regularisation parameters.

The proposed optimal control is applied to the reconstruction of C. elegans locomotion based upon its centreline data from laboratory recordings. The solvability of this challenging inverse problem relies on meaningful regularisation terms. The proposed approach allows us to add different terms conveniently to model C. elegans’ neuromusculature, and our inverse model is demonstrated to be robust to a range of regularisation parameters. There are five parameters (nine if considering components of \({\varvec{\varLambda }}_\kappa \) and \({\varvec{\varLambda }}_\omega \)) as shown in Table 1, which also indicates the minimally required non-zero parameters for the sake of convergence of our proposed method.

\(\lambda _f\) and \(\lambda _l\) correspond to the control force \(\textbf{f}\) and torque \(\textbf{l}\) respectively; they prevent \(\Vert \textbf{f}\Vert \) and \(\Vert \textbf{l}\Vert \) from becoming unbounded and have the effect of stabilising the proposed method. \(\lambda _f\) and \(\lambda _l\) can be robustly chosen from a range of values based on our numerical experiments, without a significant influence on the main outputs such as the centreline \(\textbf{x}\) and curvature \({\varvec{\kappa }}\).

\(\lambda _d\) has a biological and anatomical grounding because it keeps the normal direction \(\textbf{d}_3\) of the worm’s cross section close to the tangential direction \(\partial _s\textbf{x}\) of its body’s centreline. It also has the numerical effect of stabilising the proposed method and distinguishing between the Cosserat rod and Kirchhoff rod models: a larger \(\lambda _d\) tends to force \(\textbf{d}_3\) to be the same as \(\partial _s\textbf{x}\) (consequently approaching the Kirchhoff rod).

\(\lambda _{\omega _3}\) in \({\varvec{\varLambda }}_\omega =\text {diag}\left( \lambda _{\omega _1},\lambda _{\omega _2},\lambda _{\omega _3}\right) \) has a clear numerical purpose, because our method needs it to be non-zero for convergence. The other two components of \({\varvec{\varLambda }}_\omega \) and all three components of \({\varvec{\varLambda }}_\kappa =\text {diag}\left( \lambda _{\kappa _1},\lambda _{\kappa _2},\lambda _{\kappa _3}\right) \) can be zero. However, setting \({\varvec{\varLambda }}_\omega \) and \({\varvec{\varLambda }}_\kappa \) to non-zero values allows us to model the musculature of C. elegans as pointed out in Section 3.4.

Several interesting topics have been stimulated by this study, which are briefly summarised as follows:

The proposed optimal control formulation is based on a combination of laboratory data and modelling of the worm’s muscle structure. If we can collect data on the movement of at least one cross section of C. elegans, we can then apply a Dirichlet boundary condition of \({\varvec{\alpha }}\) and use fewer regularisation terms as commented in Remark 6 (\(\lambda _{\omega _3}\) can then be zero). Measuring the movement of cross sections of a hair-thin C. elegans in the laboratory is technically difficult. However, setting a mark and following one cross section may be possible and would provide a way to validate the predictions of our inverse model based on the above assumptions.

Having computed the local frames (rotation vectors), we can then formulate these rotations into the objective function, and compute the external force \(\textbf{f}\) and torque \(\textbf{l}\) without the regularisation \(\lambda _d-\), \(\lambda _\omega -\), and \(\lambda _\kappa -\) terms. These force and torque terms will provide us with C. elegans’ muscle forces quantitatively, which will help us to model and understand its neuromuscular system.

One more interesting topic is modelling time evolution. In this paper, a friction term (\(\lambda _\omega -\) term) is introduced to link different time frames. An alternative approach is to introduce a viscoelastic constitutive model as adopted in [53], which is expected to be more appropriate for modelling nematode locomotion [5].

Another way to formulate the underlying problem is to apply a model for the force \(\textbf{f}\), such as slender body theory [49], and then only use \(\textbf{l}\) as a control variable. Hopefully, this would lead to a well-posed problem without additional regularisation terms.