1 Introduction

This work presents a geometrically exact Reissner-Mindlin shell formulation with consistent interpolation of the director field. It is singularity free, path independent and objective as well as numerically efficient and robust. The origins of geometrically non-linear shell formulations can be traced back to [24]. For a historical review of non-linear shell theory in general we refer to [80] or [94] and for the historical context of Reissner’s developments we refer to [64]. For a concise summary of geometrically exact shell formulations using stress resultants we refer to [101].

Classical shell theories can be grouped into Kirchhoff–Love and Reissner–Mindlin type models. The former rely on a kinematic description that is based on the midsurface position only, whereas the latter incorporate independent rotations of the director field, thus taking into account transverse shear deformation. Here, the director field is usually associated with the material fibers in normal direction in the undeformed configuration.

The treatment of this non-linear director field is discussed with respect to two major aspects in this paper: first, correct linearization of potential energy and second, feasible interpolation of the director in order to meet the requirements of being singularity-free, path independent and objective. Special attention is devoted to interpolation schemes with higher continuity, for instance using non-uniform rational B-splines (NURBS) as ansatz functions.

Such an inextensional single director shell model leads to a finite element formulation with five degrees of freedom per node. Three of them are the midsurface positions and the remaining two are increments associated with rotational change of the director, e.g., incremental rotations or director increments. Historically, these non-standard degrees of freedom gave rise to several uncertainties concerning a correct or optimal finite element formulation with respect to accuracy and efficiency. Mainly, two questions are crucial:

  • Which is the correct linearization process required to develop a well-defined and consistent iterative solution scheme?

  • Which interpolation scheme yields accurate results and does not violate fundamental properties, such as objectivity?

2 Historical Remarks

Correct linearization to obtain the stiffness matrix and, in particular, symmetry of the stiffness matrix are often discussed issues. In [82], Simo derived a tangent operator in the context of beams that may be unsymmetric away from equilibrium for a potential taking values from \({\mathcal {S}}{\mathcal {O}}(3)\). This is a result of using the second variation as a tool to construct the tangent operator.

Later, Simo [83] concluded that the Hessian (more precisely, the Riemannian Hessian) can be obtained by symmetrizing the unsymmetric second variation. We stress that this procedure is only useful for manifolds that can be classified as compact Lie groups, which does not apply to the two-dimensional unit sphere. A short note on the history of unsymmetric tangent operators can be found in ([66], Ch. 1.7). Some of the questions concerning linearization were discussed in [53]. There, the authors concluded that a tangent operator can be unsymmetric if it is constructed from the second variation. But they also pointed out that this construction does not lead to a well-defined operator.Footnote 1 The authors concluded that, therefore, the Hessian is the correct quantity to use as tangent operator since it is a well-defined tensorial quantity. For this Hessian, several works concluded that it needs to be symmetric for a torsion-free connectionFootnote 2, [2, 53, 56, 67, 70, 83, 91].

Furthermore, symmetry of the Riemannian Hessian does also hold during iteration. Nevertheless, symmetry of the tangent operator is still controversially discussed, see e.g. [92]. Therefore, to avoid uncertainties, we present the variation and linearization in detail and try to avoid vagueness. The resulting process of variation and linearization strongly depends on the results from [2]. We also refer to [20] for the interested reader. Using these results, the problem of optimization on non-linear manifolds can be solved elegantly using only Euclidean quantities and projection onto the manifold to simplify the construction of the gradient and the Hessian.

Historically, director rotations in Reissner-Mindlin type shell formulations were first formulated in terms of angle pairs that live in linear spaces in \({\mathbb {R}}^2\). This linearity naturally leads to a symmetric stiffness matrix. The shortcoming of this description is that it contains singularities, which may lead to convergence problems and limit the magnitude of the rotation, see e.g. [18, 23, 45, 63]. It is worth noting that the hairy ball theorem, see ([21], Sect. 2), excludes a singularity-free parametrization over a single domain in \(\mathbb {R}^2\) of the unit sphere \({{\mathcal {S}}}^2\). Therefore, the singularity requires a switch of parametrization in its vicinity. Additionally, these formulations require an involved evaluation of trigonometric functions for the residual and the stiffness matrix, see [63]. A similar history can be observed for Kirchhoff-Love rods, starting from [5], where Euler angles are used to rotate the cross section frame. Earlier, [7] proposed to include the Taylor expansion of the rotations only up to the quadratic term, which leads to a formulation allowing moderate rotations but non-objective results. An alternative solution of the problem is to avoid parametrization of the unit sphere in the first place.

To overcome the singularities of a parametrization, direct interpolation of the director was used in [43]. Here, the update of the nodal directors was done in terms of director increments that are expressed in the midsurface tangent vectors. This formulation leads to an objective and path independent approach. This approach and some of the aforementioned interpolation approaches are compared in [49]. An alternative approach was proposed by Simo and co-workers for Timoshenko beams [82] and for Reissner-Mindlin shells [84]. There, the authors exploited the manifold structure of \({{\mathcal {S}}}{{\mathcal {O}}}(3)\) and \({{\mathcal {S}}}^2\). The Riemannian manifold \({{\mathcal {S}}}{{\mathcal {O}}}(3)\) is the compact three-dimensional Lie Group called special orthogonal group and the Riemannian manifold \({{\mathcal {S}}}^2\) is the two dimensional unit sphere. These can be defined as \({{\mathcal {S}}}{{\mathcal {O}}}(3) = \{ {\textbf {X}}\in {{\,\textrm{Mat}\,}}_3(\mathbb {R})~|~ {\textbf {X}}^T {\textbf {X}}= {\textbf {I}}~\wedge ~ \det {\textbf {X}}=1 \}\) and \({{\mathcal {S}}}^2 = \{ {{\textbf {x}}}\in {{\,\textrm{Vec}\,}}_3(\mathbb {R})~|~ {{\textbf {x}}}^T {{\textbf {x}}}= 1 \}\). Additionally, \({{\,\textrm{Mat}\,}}_n(\mathbb {R})\) and \({{\,\textrm{Vec}\,}}_n(\mathbb {R})\) define all \(n \times n\) matrices and all n-sized vectors, respectively. The interest in \({{\mathcal {S}}}^2\) stems from the fact that the kinematic description of the Reissner-Mindlin model does not contain a stretch of transverse fibers (thickness change in direction of the director).Footnote 3 The approach from Simo and co-workers for the numerical treatment of the Reissner-Mindlin shell model leads to a singularity free formulation in terms of rotational parameters.

Nevertheless, this formulation is path dependent and non-objective due to the interpolation of rotation increments. Moreover, keeping track of fields of history variables at every quadrature point is necessary. [26] pointed out these shortcomings and presented a cure for the case of a non-linear Timoshenko beam formulation. In this formulation, the nodal quantities living in \({{\mathcal {S}}}{{\mathcal {O}}}(3)\) are interpolated, in contrast to the erroneous interpolation of quantities in the tangent space \(T{{\mathcal {S}}}{{\mathcal {O}}}(3)\). This approach yields an objective and path independent formulation. The proposed formula is well-known in computer graphics and is called SLERP (Spherical Linear intERPolation), see [70]. In ([26], Ch. 5b), this concept is extended to higher order polynomials, but it is still restricted to one-dimensional spaces. Therefore, it is only useful for beams. For a summary of different interpolation schemes for beams and their drawbacks we refer to [68].
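To make the SLERP formula concrete, the following minimal Python sketch evaluates it for two unit vectors on \({{\mathcal {S}}}^2\). This is an illustrative assumption: the beam formulation of [26] interpolates in \({{\mathcal {S}}}{{\mathcal {O}}}(3)\), and the function name `slerp` as well as the small-angle fallback are our choices, not taken from the cited works. Antipodal inputs are not handled.

```python
import numpy as np

def slerp(p0, p1, s):
    """Spherical linear interpolation between unit vectors p0 and p1 for s in [0, 1]."""
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    # Angle between the two unit vectors; clip guards against round-off outside [-1, 1].
    theta = np.arccos(np.clip(np.dot(p0, p1), -1.0, 1.0))
    if theta < 1e-12:
        # Nearly coincident vectors: fall back to a normalized linear blend,
        # which avoids dividing by sin(theta) close to zero.
        p = (1.0 - s) * p0 + s * p1
        return p / np.linalg.norm(p)
    return (np.sin((1.0 - s) * theta) * p0 + np.sin(s * theta) * p1) / np.sin(theta)

# Usage: midpoint between two orthogonal unit vectors.
midpoint = slerp([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.5)
```

Since the result depends on the inputs only through their inner product and a linear combination of the two vectors, rotating both inputs rotates the result by the same rotation, which is the objectivity property discussed above.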

For the two-dimensional representation in the context of shells, generalization of SLERP does not lead to a satisfying concept. For example, the formulation proposed in [6] generalizes SLERP and leads to an objective and path independent approach but it suffers from a spurious dependency of the computational results on node numbering, see ([6], Eq. 40). Furthermore, since the nodal director components are arguments inside trigonometric functions, evaluation and linearization are expensive. Additionally, the inverse trigonometric function \(\arccos (x)\) (and its derivatives) exhibits poor numerical behavior near \(x=1\). As a result, a perturbation has to be applied for the special case of plane reference states and in general it is not possible for the nodal directors to be parallel.

Recently proposed isogeometric formulations limit the transverse shear to be geometrically linear, thus circumventing the treatment of large rotations [61, 16], where the latter approach inherits this property directly from the formulation of [14, 15]. In [61, 51] the ansatz space needs to be \(C^1\)-continuous between elements due to the presence of second derivatives in the weak form. This necessity is inherited from the Kirchhoff-Love model, as the shear deformation is hierarchically added to the Kirchhoff-Love formulation. Within patches, this continuity constraint can be trivially fulfilled using splines as shape functions. However, \(C^1\)-continuous patch coupling, as in [47], requires special attention and may compromise the elegance of the approach. Nevertheless, the formulations [61] and [51] are path independent and objective.

An isogeometric approach that includes the director rotation in a non-linear fashion has been proposed by [29]. It is based on the non-linear formulation of [86] and thus suffers from similar shortcomings, such as path dependence and non-objectivity inherited from the history fields at every integration point, see ([29], Table A.2.).

Finally, the formulation of [78] has to be mentioned, which inherits objectivity and path independence from the continuous model at the cost of solving a small non-linear minimization problem at every integration point to obtain the interpolated value. This formulation is constructed for Cosserat shell models with drilling rotations, which necessitates interpolation in \({{\mathcal {S}}}{{\mathcal {O}}}(3)\). The resulting finite elements are called geodesic finite elements.

Table 1 summarizes various shell formulations and their properties concerning objectivity, path independence, history variables and singularities.

In view of the aim to construct a formulation that enjoys all desired properties, the following can be concluded: Due to the difficulties arising from an interpolation of incremental rotations (non-objective, path dependent) and the interpolation of angles (singularities), interpolating the director field appears to be the most attractive option. An important issue in this context is the constraint that the director needs to have unit length, which is not naturally satisfied by standard interpolations. In the following we discuss three different approaches to interpolate the director field.

Nodal Finite Elements (NFE)

If the directors have unit length at the nodes only, as it is done in [15, 16, 43, 63], the constraint is violated in the domain. We will refer to this approach by the name Nodal Finite Elements.

In the following, we discuss two recent approaches to satisfy the unit length condition also in the domain. They are deduced from general constructions found in mathematical literature.

Geodesic Finite Elements (GFE)

The first approach is based on the works of Sander [73, 75] and Grohs [34] to generalize the concept of interpolation from vector spaces to manifolds. If such an interpolation is used, the resulting finite elements are called geodesic finite elements (GFE). These finite elements automatically inherit objectivity and path independence of the continuous formulation. This is due to the fact that the interpolation scheme is constructed from a weighted (geodesic) distance measure on the corresponding manifold. Since distances are invariant to rotations by definition, objectivity follows directly. Furthermore, due to the intrinsic nature, the interpolation always stays on the manifold and therefore this approach preserves unit length of the director in the domain. The major drawback is the implicit definition of the interpolation, which involves a non-linear minimization problemFootnote 4 at each integration point. This interpolation scheme is applied on the Reissner-Mindlin shell for the first time in this work.
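The implicit definition can be illustrated by a small sketch that computes the weighted geodesic mean of nodal directors on \({{\mathcal {S}}}^2\) with a fixed-point iteration. This is only one possible way to solve the minimization problem and is not claimed to be the solver used in [73, 75] or in this work. The helper names `log_map`, `exp_map` and `geodesic_interpolation` are hypothetical, and the sketch assumes non-negative weights summing to one and nodal directors that lie close together (no antipodal pairs).

```python
import numpy as np

def log_map(q, p):
    """Tangent vector at q that points towards p and has length dist(q, p)."""
    c = np.clip(np.dot(q, p), -1.0, 1.0)
    v = p - c * q                      # part of p tangential to the sphere at q
    nv = np.linalg.norm(v)
    if nv < 1e-14:
        return np.zeros(3)             # p coincides with q (antipodal case not handled)
    return np.arccos(c) * v / nv

def exp_map(q, v):
    """Point reached from q by following the geodesic with initial velocity v."""
    nv = np.linalg.norm(v)
    if nv < 1e-14:
        return q
    return np.cos(nv) * q + np.sin(nv) * v / nv

def geodesic_interpolation(weights, directors, tol=1e-12, max_iter=100):
    """Approximate argmin_q sum_i w_i dist(t_i, q)^2 on the unit sphere."""
    w = np.asarray(weights, dtype=float)
    T = np.asarray(directors, dtype=float)   # shape (n_nodes, 3), rows are unit vectors
    q = w @ T
    q = q / np.linalg.norm(q)                # initial guess: normalized weighted average
    for _ in range(max_iter):
        # Weighted average of the nodal directors seen from the tangent space at q.
        step = sum(wi * log_map(q, ti) for wi, ti in zip(w, T))
        q = exp_map(q, step)
        if np.linalg.norm(step) < tol:
            break
    return q
```

The loop makes the cost of the approach visible: every evaluation of the interpolated director requires an iteration of its own, in addition to the global Newton iteration.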

Projection-Based Finite Elements (PBFE)

The second approach is projection-based interpolation, based on the works of Sprecher [90] and subsequent papers, e.g. Grohs et al. [36], which present a framework for finite elements that interpolate on manifolds by a closest point projection from an embedding space onto the manifold. The finite elements are constructed as in the nodal approach and then projected onto the corresponding manifold.

Luckily, the closest point projection of a vector in \(\mathbb {R}^3\) onto \({{\mathcal {S}}}^2\) has a closed form, namely the trivial normalization of the vector. This interpolation of projecting the vector back onto \({{\mathcal {S}}}^2\) was also used in [85], but only for the reference director field. A more involved version of this approach can be found in [30]. We refer to the resulting finite elements by the name Projection-Based Finite Elements (PBFE) in this paper.

Projection-based finite elements can be regarded as a special case of geodesic finite elements, where the distance measure for the interpolation is the one of the embedding space. For example, for the interpolation on \({{\mathcal {S}}}^2\) the distance measure is the Euclidean distance from \(\mathbb {R}^3\). Due to the construction of the interpolation as a distance measure, objectivity and path independence are inherited from geodesic finite elements.
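The difference between nodal and projection-based interpolation can be sketched in a few lines of Python. The function names `nfe_director` and `pbfe_director` are hypothetical, and the shape function values are passed in as plain weights. The sketch assumes the blended vector does not vanish, which excludes for instance two antipodal nodal directors with equal weights.

```python
import numpy as np

def nfe_director(weights, directors):
    """Nodal interpolation: plain shape-function blend in R^3.

    The result has unit length at the nodes only.
    """
    return np.asarray(weights, dtype=float) @ np.asarray(directors, dtype=float)

def pbfe_director(weights, directors):
    """Projection-based interpolation: blend in R^3, then normalize onto S^2."""
    t = nfe_director(weights, directors)
    return t / np.linalg.norm(t)

# Usage: two orthogonal nodal directors, evaluated halfway between the nodes.
nodal_directors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
t_nfe = nfe_director([0.5, 0.5], nodal_directors)    # shorter than unit length
t_pbfe = pbfe_director([0.5, 0.5], nodal_directors)  # unit length
```

No trigonometric function is evaluated, and since both the blend and the normalization commute with rotations, rotating all nodal directors rotates the interpolated director accordingly.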

Additionally, geodesic finite elements and projection-based finite elements are summarized as the group of geometric finite elements. Both interpolations (geodesic and projection-based) yield a path independent and objective discrete problem. The corresponding proofs can be found in ([36], Ch. 1.3) and ([74], Ch. 2.4, Lemma 2.5–2.6).

Table 1 Comparison of geometrically non-linear (Reissner-Mindlin) finite element shell formulations

3 Scope of this Work

We use the projection-based finite element concept and the linearization results from [2, 3] to construct a sound shell formulation. The underlying shell theory is identical to the one put forward by Simo and Fox [84]. In summary, the presented shell finite element formulation enjoys the following desirable features:

  • It inherits the objectivity of the continuous model.

  • The magnitude of total rotations is not limited.

  • The unit length constraint of the interpolated director is satisfied in the domain.

  • As no history fields at the integration points are introduced, interpolation of the director is path independent.

  • The formulation goes without trigonometric functions. This simplifies linearization and results in a compact and fast implementation.

  • No singularities occur, as parametrization of the unit sphere is avoided.

  • The stiffness matrix is symmetric without neglecting any terms and without applying symmetrization procedures.

  • The resulting element vectors and matrices are invariant to node numbering.

Each of the above features can be found in shell formulations in the literature. However, to the authors’ best knowledge, there is no shell finite element formulation that enjoys all of them.

Beyond presentation of a novel shell finite element formulation, the following aspects are covered in the paper:

  • It is shown that the unit length of the director in the domain is crucial for using higher order interpolation schemes to circumvent a degeneration of the convergence order.

  • We show the superiority of the radial return normalization (projection-based retraction) for the nodal director instead of using the exponential map.

  • The non-linear space \({{\mathcal {S}}}^2\), in which the director is defined, needs special attention. Therefore, the consistent linearization process is presented in detail. A symmetric tangent operator is consistently derived rather than symmetrizing an initially unsymmetric result.

  • We show that the consistent linearization process yields an additional contribution to the stiffness matrix and provide a physical interpretation of this term.

  • We compare three update schemes for the nodal director tangent base, where two can be found in the literature and one is newly introduced.

  • We show the advantageous features of projection-based finite elements for the Reissner-Mindlin model.
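For orientation, the two nodal update rules mentioned in the list above can be written down as follows. This is a minimal sketch under the assumption that the director increment `v` lies in the tangent plane of the current unit director `t`. The function names are hypothetical, and the sketch only shows the two formulas, not the comparison carried out in the paper.

```python
import numpy as np

def retract_projection(t, v):
    """Radial return: add the increment in R^3 and normalize back onto S^2."""
    w = t + v
    return w / np.linalg.norm(w)

def retract_exponential(t, v):
    """Exponential map on S^2: follow the great circle starting at t in direction v."""
    nv = np.linalg.norm(v)
    if nv < 1e-14:
        return t.copy()
    return np.cos(nv) * t + np.sin(nv) * v / nv

# Usage: both updates return unit vectors and agree up to higher-order terms in |v|.
t = np.array([0.0, 0.0, 1.0])
v = np.array([0.1, 0.0, 0.0])
t_proj = retract_projection(t, v)
t_exp = retract_exponential(t, v)
```

The projection-based update requires a square root only, whereas the exponential map needs trigonometric functions and a special case for vanishing increments.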

The paper is organized as follows: In Sect. 4 we present the Reissner-Mindlin shell model up to the total potential energy in the total Lagrangian setting. After this, we deduce the construction of the correct gradient (residual, internal forces) and the correct Hessian (stiffness matrix) for an arbitrary energy living on \(M=\mathbb {R}^3 \times {{\mathcal {S}}}^2\) in Sect. 5. We apply these results onto the Reissner-Mindlin shell problem where only the Euclidean partial derivatives, i.e., the variation and linearization are needed to construct the correct Riemannian gradient and Riemannian Hessian following the results from [2, 3]. We obtain these Euclidean quantities in Sect. 5.3.

After the introduction of the correct operators, we establish our consistent interpolation in Sect. 6. In particular, we use projection-based finite elements (PBFE) for our shell discretization living in the space \(M=\mathbb {R}^3 \times {{\mathcal {S}}}^2\). Here, the midsurface displacement lives in \(\mathbb {R}^3\) and the unit director lives on \({{\mathcal {S}}}^2\). With this at hand, we explore several numerical examples to compare the presented approach with geodesic finite elements and nodal finite elements and conclude the superior behavior of the projection-based finite elements, especially in the context of higher order ansatz functions. This is done in Sect. 10. Furthermore, we investigate in our numerical examples the usage of the mixed interpolation point (MIP) technique to improve the convergence behavior, as introduced in [52]. Additionally, we compare the geometric meaning of the three interpolations (NFE, PBFE, GFE) in Sect. 10.2.

The notation used in the present work is a mixture of the one used in the classical papers of Simo and co-workers and the one of [36] and [2, 74]. Moreover, some quantities are newly introduced, see Table 2.

Table 2 Notation and definitions

4 Non-linear Shell Theory Including Transverse Shear Deformation

4.1 Geometry and Kinematics

First, we define the geometric description of a shell structure as

$$\begin{aligned} {{\mathcal {M}}}:= \{ ({\varvec{\varphi }},{{\textbf {t}}}): {\varOmega }\rightarrow \mathbb {R}^3 \times {{\mathcal {S}}}^2 =M \}. \end{aligned}$$
(4.1)

Thus, \({{\mathcal {M}}}\) represents the set of functions mapping from \({\varOmega }\) into \(M = \mathbb {R}^3 \times {{\mathcal {S}}}^2\). The set \({\varOmega }\subset \mathbb {R}^2\) is the two-dimensional parameter space with the points

$$\begin{aligned} {\varvec{\xi }}= \xi ^1 {\tilde{\textbf {{\textbf {E}}}}}_1+\xi ^2 {\tilde{\textbf {{\textbf {E}}}}}_2\,, \end{aligned}$$
(4.2)

where \({\tilde{\textbf {{\textbf {E}}}}}_{{\alpha }}\) denote Cartesian base vectors. Furthermore, we introduce the map \({\varvec{\varphi }}: {\varOmega }\rightarrow \mathbb {R}^3\), defining the position vector of the midsurface of the shell, and the so-called director \({{\textbf {t}}}: {\varOmega }\rightarrow {{\mathcal {S}}}^2\), a field of unit vectors which are initially normal to the midsurface. The independent representation of \({\varvec{\varphi }}\) and \({{\textbf {t}}}\) allows the kinematic description of transverse shear deformation and thus realizes a Reissner-Mindlin type model. Using these quantities we can define the stress-free shell body as reference configuration

$$\begin{aligned} \begin{aligned}&{{\mathcal {B}}}_0:=\{ {\textbf {X}}\in \mathbb {R}^3 ~|~ {\textbf {X}}= {\varvec{\varphi }}_0+ \xi ^3 {{\textbf {t}}}_0 \\&\quad \text { with } ({\varvec{\varphi }}_0,{{\textbf {t}}}_0) \in {{\mathcal {M}}}\text { and } \xi ^3 \in [h^-,h^+] \subset \mathbb {R}\} \,. \end{aligned} \end{aligned}$$
(4.3)

Here, \(h^-\) and \(h^+\) denote the bottom and top surface coordinates of the shell and \(h=(h^+-h^-)\) is the shell thickness. Similarly, a configuration at time t is given by

$$\begin{aligned} \begin{aligned}&{{\mathcal {B}}}_t:= \{ {{\textbf {x}}}\in \mathbb {R}^3 ~|~ {{\textbf {x}}}= {\varvec{\varphi }}+ \xi ^3 {{\textbf {t}}}\\&\quad \text { with } ({\varvec{\varphi }},{{\textbf {t}}}) \in {{\mathcal {M}}}\text { and } \xi ^3 \in [h^-,h^+] \subset \mathbb {R}\} \,. \end{aligned} \end{aligned}$$
(4.4)

For the total Lagrangian setting the kinematics are summarized in Fig. 1. With this, the reference and current position of a point of the shell body are given as

$$\begin{aligned} \begin{aligned} {\textbf {X}}&= \hat{\varvec{{\Phi }}}_0(\xi ^1,\xi ^2,\xi ^3)= {\varvec{\varphi }}_0(\xi ^1,\xi ^2)+\xi ^3 {{\textbf {t}}}_0(\xi ^1,\xi ^2), \\ {{\textbf {x}}}&= \hat{\varvec{{\Phi }}}(\xi ^1,\xi ^2,\xi ^3)= {\varvec{\varphi }}(\xi ^1,\xi ^2)+\xi ^3 {{\textbf {t}}}(\xi ^1,\xi ^2), \end{aligned} \end{aligned}$$
(4.5)

where we introduced the maps \(\hat{\varvec{{\Phi }}}: {{\mathcal {A}}}\rightarrow {{\mathcal {B}}}_t\) and \(\hat{\varvec{{\Phi }}}_0: {{\mathcal {A}}}\rightarrow {{\mathcal {B}}}_0\). The set \({{\mathcal {A}}}={\varOmega }\times [h^-,h^+]\) represents the three-dimensional parameter space, which can also be seen in Fig. 1. The deformation is then defined as a mapping \(\chi _t: {{\mathcal {B}}}_0 \rightarrow {{\mathcal {B}}}_t\) with

$$\begin{aligned} \chi _t := \hat{\varvec{{\Phi }}} \,\circ\, \hat{\varvec{{\Phi }}}^{-1}_0. \end{aligned}$$
(4.6)
Fig. 1

Kinematics and mappings of the Reissner-Mindlin shell model. We denote \({{\mathcal {B}}}_0 \) and \( {{\mathcal {B}}}_t \) as the Lagrangian and Eulerian manifold. \({{\mathcal {B}}}^C \) denotes the corresponding midsurfaces. Furthermore, we denote the two-dimensional and three-dimensional parameter space by \({\varOmega }, {{\mathcal {A}}}\). Here, \(\hat{\varvec{{\Phi }}},\hat{\varvec{{\Phi }}}_0 \) and \({\varvec{\chi }}\) denote the non-linear point maps between the spaces. The standard Cartesian bases associated with \({{\mathcal {B}}}_0,{{\mathcal {B}}}_t \) and \({\varOmega }\) are \(\{{\textbf {E}}_i \}_{i=1,3}, \{{\textbf {e}}_i \}_{i=1,3} \) and \(\{{\tilde{\textbf {{\textbf {E}}}}}_i \}_{i=1,3}\) respectively. \(\{{\textbf {A}}_1,{\textbf {A}}_2,{{\textbf {t}}}_0 \}\) and \(\{{\textbf {a}}_1,{\textbf {a}}_2,{{\textbf {t}}}\}\) are curvilinear co-variant bases at \({\textbf {X}}\) and \({{\textbf {x}}}\), similar to the definitions from [55]

For the base vectors \({{\textbf {g}}}_i \) and \({\textbf {G}}_i \) of both configurations it follows from Eq. (4.5)

$$\begin{aligned} \begin{aligned} \frac{{\partial }{{\textbf {x}}}}{{\partial }\xi ^{\alpha }}&={{\textbf {g}}}_{\alpha }= {\varvec{\varphi }}_{, {\alpha }}+\xi ^3 {{\textbf {t}}}_{, {\alpha }}= {\textbf {a}}_{{\alpha }}+\xi ^3 {{\textbf {t}}}_{ ,{\alpha }},\\ \frac{{\partial }{{\textbf {x}}}}{{\partial }\xi ^3}&= {{\textbf {g}}}_3 = {{\textbf {t}}}, \\ \frac{{\partial }{\textbf {X}}}{{\partial }\xi ^{\alpha }}&={\textbf {G}}_{\alpha }= {\varvec{\varphi }}_{0,{\alpha }}+\xi ^3 {{\textbf {t}}}_{0,{\alpha }}= {\textbf {A}}_{{\alpha }}+\xi ^3 {{\textbf {t}}}_{0,{\alpha }}, \\ \frac{{\partial }{\textbf {X}}}{{\partial }\xi ^3}&= {\textbf {G}}_3 = {\textbf {A}}_3 = {{\textbf {t}}}_0\,, \end{aligned} \end{aligned}$$
(4.7)

where the base vectors of the current and reference midsurfaces, \({\textbf {a}}_{\alpha }={\varvec{\varphi }}_{,{\alpha }}\) and \({\textbf {A}}_{\alpha }={\varvec{\varphi }}_{0,{\alpha }}\), are introduced. These relations are summarized in Fig. 1.

In view of the later introduction of the so-called effective stress resultants in Eq. (4.13), we define kinematic tensors based on the reference midsurface basis \( \{{\textbf {A}}_{\alpha },{{\textbf {t}}}_0\} \). The tensor of effective strains

$$\begin{aligned} {\textbf {E}}= E_{ij}{\textbf {A}}^i \otimes {\textbf {A}}^j \end{aligned}$$
(4.8)

is composed of the components of the Green-Lagrange strain tensor,

$$\begin{aligned} \begin{aligned} E_{{\alpha }{\beta }}&= \frac{1}{2}\left( {{\textbf {g}}}_{{\alpha }} \cdot {{\textbf {g}}}_{{\beta }} -{\textbf {G}}_{{\alpha }} \cdot {\textbf {G}}_{{\beta }} \right) \\&= {\varepsilon }_{{\alpha }{\beta }} +\xi ^3 {\kappa }_{{\alpha }{\beta }} +{(\xi ^3)}^2 \rho _{{\alpha }{\beta }}, \\ 2 E_{{\alpha }3}&= 2 E_{3{\alpha }}= {{\textbf {g}}}_{{\alpha }} \cdot {{\textbf {g}}}_{3} -{\textbf {G}}_{{\alpha }} \cdot {\textbf {G}}_{3}= {\gamma }_{{\alpha }}, \\ 2 E_{33}&= {{\textbf {g}}}_{3} \cdot {{\textbf {g}}}_{3} -{\textbf {G}}_{3} \cdot {\textbf {G}}_{3} = 0 \end{aligned} \end{aligned}$$
(4.9)

referring to the midsurface metric \({\textbf {A}}^i \otimes {\textbf {A}}^j \). Here, the usual Reissner-Mindlin kinematic assumptions \({{\textbf {t}}}_{,{\alpha }}\cdot {{\textbf {t}}}=0 \) and \({{\textbf {t}}}\cdot {{\textbf {t}}}={{\textbf {t}}}_0\cdot {{\textbf {t}}}_0 = 1 \) apply. The quadratic part \(\rho _{{\alpha }{\beta }}\) of \(E_{{\alpha }{\beta }}\) in \(\xi ^3\) is usually neglected. For the implications of this, see ([22], Chapter 9.1.3, Equation 9.34) or ([22], Chapter 3.7, Annahme A4). The effective membrane strain \({\varvec{\varepsilon }}\), curvature \({\varvec{\kappa }}\) and transverse shear strain \({\varvec{\gamma }}\)Footnote 5, implied by Eq. (4.9), read

$$\begin{aligned} \begin{aligned} {\varvec{\varepsilon }}&={\varepsilon }_{{\alpha }{\beta }} {\textbf {A}}^{\alpha }\otimes {\textbf {A}}^{\beta },&{\varepsilon }_{{\alpha }{\beta }}&= \frac{1}{2}({\textbf {a}}_{{\alpha }} \cdot {\textbf {a}}_{{\beta }}-{\textbf {A}}_{{\alpha }} \cdot {\textbf {A}}_{{\beta }}), \\ {\varvec{\kappa }}&={\kappa }_{{\alpha }{\beta }}{\textbf {A}}^{\alpha }\otimes {\textbf {A}}^{\beta },&{\kappa }_{{\alpha }{\beta }}&=\frac{1}{2} ({\textbf {a}}_{{\alpha }} \cdot {{\textbf {t}}}_{,{\beta }}+ {{\textbf {t}}}_{,{\alpha }} \cdot {\textbf {a}}_{{\beta }} \\&\quad -{\textbf {A}}_{{\alpha }} \cdot {{\textbf {t}}}_{0,{\beta }}- {{\textbf {t}}}_{0,{\alpha }} \cdot {\textbf {A}}_{{\beta }}), \\ {\varvec{\gamma }}&= {\gamma }_{\alpha }{\textbf {A}}^{\alpha },&{\gamma }_{\alpha }&= {\textbf {a}}_{{\alpha }}\cdot {{\textbf {t}}}-{\textbf {A}}_{{\alpha }}\cdot {{\textbf {t}}}_0. \end{aligned} \end{aligned}$$
(4.10)
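As an illustration, the strain components of Eq. (4.10) can be evaluated at a single point of the midsurface as sketched below. The function name `effective_strains` and the array layout are assumptions made for this example: the midsurface base vectors \({\textbf {a}}_{\alpha }={\varvec{\varphi }}_{,{\alpha }}\), \({\textbf {A}}_{\alpha }={\varvec{\varphi }}_{0,{\alpha }}\) and the director derivatives \({{\textbf {t}}}_{,{\alpha }}\), \({{\textbf {t}}}_{0,{\alpha }}\) are stored row-wise in arrays of shape (2, 3).

```python
import numpy as np

def effective_strains(a, t, dt, A, t0, dt0):
    """Components of membrane strain, curvature and transverse shear strain.

    a, A    : (2, 3) current / reference midsurface base vectors a_alpha, A_alpha
    t, t0   : (3,)   current / reference unit director
    dt, dt0 : (2, 3) director derivatives t_{,alpha}, t_{0,alpha}
    """
    a, A = np.asarray(a, dtype=float), np.asarray(A, dtype=float)
    t, t0 = np.asarray(t, dtype=float), np.asarray(t0, dtype=float)
    dt, dt0 = np.asarray(dt, dtype=float), np.asarray(dt0, dtype=float)
    # eps[alpha, beta] = 1/2 (a_alpha . a_beta - A_alpha . A_beta)
    eps = 0.5 * (a @ a.T - A @ A.T)
    # kappa[alpha, beta] = 1/2 (a_alpha . t_,beta + t_,alpha . a_beta - reference part)
    kappa = 0.5 * (a @ dt.T + dt @ a.T - A @ dt0.T - dt0 @ A.T)
    # gamma[alpha] = a_alpha . t - A_alpha . t0
    gamma = a @ t - A @ t0
    return eps, kappa, gamma
```

A useful sanity check for such a routine is a rigid body rotation of a flat reference state: all three strain measures then vanish, in line with the objectivity of the continuous model.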

4.2 Continuous Potential Energy of the Reissner-Mindlin Model

In the case of the Reissner-Mindlin shell model, the total potential energy functional depends on the function of the midsurface position \({\varvec{\varphi }}\in {{\mathcal {X}}}(\mathbb {R}^3)\) and the function of the director field \({{\textbf {t}}}\in {{\mathcal {X}}}({{\mathcal {S}}}^2)\), as in Eq. (4.5). Therefore, the functional \( {{\hat{{\varPi }}}}\) takes values from the non-linear manifold \({{\mathcal {M}}}\), which results in some non-trivial considerations concerning mainly the interpolation and linearization of these values. The total potential energy \({{\hat{{\varPi }}}}: {{\mathcal {M}}}\rightarrow \mathbb {R}\) reads

$$\begin{aligned} \begin{aligned} {{\hat{{\varPi }}}}(\hat{\varvec{{\Phi }}})&= \int _{{{\mathcal {B}}}_0} {\bar{\psi }}(E_{ij}({\varvec{\xi }},\xi ^3) ) \,{\textrm {d}}V- {{\hat{{\varPi }}}}^\text {ext}(\hat{\varvec{{\Phi }}}) \\&\approx \int _{{{\mathcal {B}}}_0^C} {{\hat{\psi }}}(\varepsilon _{{\alpha }{\beta }}({\varvec{\xi }}), {\kappa }_{{\alpha }{\beta }}({\varvec{\xi }}),{\gamma }_{\alpha }({\varvec{\xi }})) \,{\textrm {d}}A\\&\quad - {{\hat{{\varPi }}}}^\text {ext}(\hat{\varvec{{\Phi }}})\,, \end{aligned} \end{aligned}$$
(4.11)

where \({{\mathcal {B}}}_0^C\) denotes the shell midsurface and \({{\mathcal {B}}}_0\) is the three-dimensional shell body. Here, \(\hat{\varvec{{\Phi }}} \in {{\mathcal {M}}}={{\mathcal {X}}}(M)\) is an element of the continuous configuration space. Furthermore, \({\bar{\psi }}\) denotes a generic strain energy volume density functional and \({{\hat{\psi }}}\) denotes a generic strain energy midsurface density functional.

We introduce the constitutive relations from the pre-integrated potential \({{\hat{\psi }}}\) resulting from a standard Coleman-Noll procedure as

$$\begin{aligned} {{\tilde{N}}}^{{\alpha }{\beta }} =\frac{{\partial }{{\hat{\psi }}}}{{\partial }\varepsilon _{{\alpha }{\beta }}},\quad {{\tilde{M}}}^{{\alpha }{\beta }} =\frac{{\partial }{{\hat{\psi }}}}{{\partial }{\kappa }_{{\alpha }{\beta }}},\quad {{\tilde{Q}}}^{{\alpha }} =\frac{{\partial }{{\hat{\psi }}}}{{\partial }{\gamma }_{{\alpha }}}. \end{aligned}$$
(4.12)

The corresponding tensor quantities

$$\begin{aligned} \begin{aligned} {\tilde{{\textbf {N}}}}&= {{\tilde{N}}}^{{\alpha }{\beta }}{\textbf {A}}_{\alpha }\otimes {\textbf {A}}_{\beta },\quad {\tilde{{\textbf {M}}}} = {{\tilde{M}}}^{{\alpha }{\beta }}{\textbf {A}}_{\alpha }\otimes {\textbf {A}}_{\beta },\\ {\tilde{{{\textbf {Q}}}}}&= {{\tilde{Q}}}^{{\alpha }}{\textbf {A}}_{\alpha }\end{aligned} \end{aligned}$$
(4.13)

are called second Piola-Kirchhoff effective symmetric stress resultants. They do not represent the physical membrane forces, moments and shear forces, but they are energetically conjugate to the effective strains defined in Eq. (4.10). For an explanation why the energy can be written in terms of the effective symmetric stress resultants, see [84, 19]. In Appendix 1, we show how to obtain the physical Cauchy stress resultants from these quantities. Again, since this energy expression contains values from \(\hat{\varvec{{\Phi }}}\in {{\mathcal {M}}}\), i.e. from \({{\mathcal {S}}}^2\), we have to take special care of its variation and linearization, which is carried out in Sect. 5. Before this, we make some considerations about non-linear infinite-dimensional function spaces.

4.3 Dealing with Non-linear Infinite-Dimensional Function Spaces

Variation and linearization of a functional depending on quantities living in continuous vector spaces is a well understood topic in the context of solving PDEs in such spaces. In the case of quantities living in non-linear spaces, the situation becomes more difficult. For instance, in the continuous (infinite-dimensional) case, the important Sobolev space \(W^{1,2}({\varOmega },M)\) does not always possess the structure of a Banach manifold ([35], Ch. 3.2). Linearization and smoothness in these spaces is non-trivial, because the geometric structure can be unknown.

In Fig. 2, this situation corresponds to going down from the continuous potential \({{{\hat{{\varPi }}}}} \) on the left to the continuous weak form \({{{\hat{G}}}}\). The continuous weak form also needs to be linearized and therefore a connection on \({{\mathcal {M}}}\) has to be introduced to arrive at the linearized continuous weak form. This may or may not result in a consistent formulation if one moves from there to the right by discretization.

In order to manage this delicate situation, we restart from the continuous potential at the top left corner of Fig. 2. From there, we move to the discrete potential by direct discretization, following the red dash-dotted path to the right. Subsequently, there are several paths to develop an iterative solution scheme for a functional potential. Nevertheless, we introduce the Sobolev spaces in which the continuous quantities live. We first introduce the corresponding spaces as \( W^{k,p}({\varOmega },\mathbb {R}^3) \times W^{l,q}({\varOmega },{{\mathcal {S}}}^2) = {\mathbb {W}}_{k,p}^{l,q}({\varOmega },\mathbb {R}^3 \times {{\mathcal {S}}}^2)\) where \({\varvec{\varphi }}\in W^{k,p}({\varOmega },\mathbb {R}^3) \) and \({{\textbf {t}}}\in W^{l,q}({\varOmega },{{\mathcal {S}}}^2) \).

Numerical evidence suggests that only weak first order derivatives \((k=l=1)\) are needed, but we are not aware of any proofs. Lacking a more rigorous choice, we use the space \({\mathbb {W}}_{k,p}^{l,q}({\varOmega },\mathbb {R}^3 \times {{\mathcal {S}}}^2)={\mathbb {W}}_{k,p}^{l,q}({\varOmega },M)\) with \(M= \mathbb {R}^3 \times {{\mathcal {S}}}^2\) as surrogate. For more details we refer to [59]. Our model coincides with the one in Ch. 7.6 therein. Moreover, Corollary 8.1 of [59] states that for a pure bending case at least one minimizer exists for \({\varvec{\varphi }}\in H^2({\varOmega },\mathbb {R}^3) = W^{2,2}({\varOmega },\mathbb {R}^3)\) and \({{\textbf {t}}}\in H^1({\varOmega },{{\mathcal {S}}}^2) = W^{1,2}({\varOmega },{{\mathcal {S}}}^2)\). Unfortunately, this is not useful for our case, since the minimizer corresponds to a deformation of Kirchhoff-Love type. For a theoretical treatment of the discrete and continuous non-linear function spaces discussed in this section, we refer to [35, 40, 41, 42].

5 Discretization, Variation and Linearization

5.1 Dealing with Non-linear Discrete Configuration Spaces

We start in the top left corner of Fig. 2 at the continuous problem represented by the potential \({{\hat{\varPi }}}(\hat{\varvec{{\Phi }}}) \). Next, we discretize the continuous quantities in the potential to end up with the discrete finite element functions in the space \(V_h^{M}({\varOmega }) \subset {\mathbb {W}}_{k,p}^{l,q}({\varOmega },M)\). At this point we can formulate the first-order optimality condition in terms of the discrete test function space \( T_{\varvec{{\Phi }}^{\textrm {h}}}V_{\textrm {h}}^M({\varOmega }) \) by moving down from the discrete potential to the discrete weak form.

Alternatively, and more easily, we may first move to the algebraic setting with the following procedure: We extract the nodal quantities from the finite element functions via the operator \({{\mathcal {E}}}\) and end up in the algebraic space \(M^n\), where n denotes the number of nodes. Here, the nodal evaluation operator \({{\mathcal {E}}}\) is defined as in [77] after Theorem 2.3. In the next step (now in the top right corner), using the construction of Absil et al. [3], the discrete variation and linearization can be obtained purely in the Euclidean embedding space, followed by a projection. In this space, we can simply use partial derivatives instead of covariant derivatives. Moreover, we can treat the cumbersome Riemannian linearization (from weak form and discrete weak form, respectively, downwards via covariant derivatives) as a problem in Euclidean space. Thus, we can simply use the Gâteaux derivative in the Euclidean vector space, which is easy to compute. The result is projected onto the tangent space of the manifold to obtain the correct Riemannian gradient and Riemannian Hessian, i.e., the residual and stiffness matrix of the Reissner-Mindlin formulation. With these algebraic quantities at hand, we can solve the non-linear minimization problem iteratively, ending up at the bottom right of Fig. 2.

The derivations sketched above are described in more detail in the following, starting in Sect. 5.2 with the generic construction of the algebraic Riemannian gradient and Riemannian Hessian and eventually arriving at the element vectors and matrices in Sect. 7. Sect. 8 introduces the so-called retractions, defined in [2], which project nodal updates from the corresponding tangent space back onto the manifold. They are needed to generalize the standard addition used to incrementally update the nodal displacements.

Fig. 2 General solution process of the manifold-valued PDE or potential functionals. The path taken in this paper is shown in red (dash-dotted)

5.2 Computing the Gradient and the Hessian for the Algebraic Setting

5.2.1 The Intrinsic Approach

This section explains why the intrinsic algebraic linearization is cumbersome and why we use the extrinsic constructions of [2, 3], see Sects. 5.2.2 and 5.2.3. The starting point is the algebraic minimization problem of the functional \({\varPi }\)

$$\begin{aligned} \varvec{{\Phi }}^*= {\mathop {\textrm{argmin}}\limits _{\varvec{{\Phi }} \in M^n}} {\varPi }(\varvec{{\Phi }}), \end{aligned}$$
(5.1)

which corresponds to the top right corner of Fig. 2. In the following we assume a Newton-Raphson-type solution process. For such an approach we need a residual quantity that typically arises from a weak form due to some virtual work principle, or, as in this case, simply as derivatives of \({\varPi }\). Furthermore, the Newton method requires the linearization of the residual at a specific point, which yields the Hessian operator. For the intrinsic approach we have to introduce a parametrization of the space M. For the midsurface vector space \(\mathbb {R}^3 \) this parametrization and the derivatives can be trivially constructed. Therefore, we focus on a single director \({{\textbf {t}}}\) living in \({{\mathcal {S}}}^2\). To point out the crucial aspects, we switch to index notation \({{\textbf {t}}}= t^{i} {\textbf {e}}_i\), where \({\textbf {e}}_i\) is the i-th Euclidean unit vector of \(\mathbb {R}^3 \) and \(i\in \{1,2, 3 \}\). A feasible parametrization of \({{\textbf {t}}}({\varvec{\alpha }})\in {{\mathcal {S}}}^2 \) is

$$\begin{aligned} \begin{aligned} t^1&= \cos {\alpha }_1 \sin {\alpha }_2 , \\ t^2&= \sin {\alpha }_1 \sin {\alpha }_2 , \\ t^3&= \cos {\alpha }_2 , \end{aligned} \end{aligned}$$
(5.2)

similar to [63], with \({\alpha }_1 \in [0, 2\pi [\) and \({\alpha }_2 \in [0, \pi ]\). The two-dimensional parameter space \(\mathbb {R}^2\) is then parametrized as \({\varvec{\alpha }}= {\alpha }^{\beta }{\hat{{\textbf {{\textbf {e}}}}}}_{\beta }\) where \({\hat{{\textbf {{\textbf {e}}}}}}_{\beta }\) is the \({\beta }\)-th Euclidean unit vector of \(\mathbb {R}^2\) with \({\beta }\in \{1,2\} \). Therefore, for the director the minimization reads

$$\begin{aligned} {\varvec{\alpha }}^*= {\mathop {\textrm{argmin}}\limits _{{\varvec{\alpha }}\in \mathbb {R}^2}} {\varPi }({{\textbf {t}}}({\varvec{\alpha }})). \end{aligned}$$
(5.3)

With the chain rule, the first derivative of the potential \( {\varPi }({{\textbf {t}}}({\varvec{\alpha }}))\) is obtained as

$$\begin{aligned} \partial _{\alpha ^{\beta }}{\varPi }({{\textbf {t}}}({\varvec{\alpha }})) = \left[ \frac{{\partial }{\varPi }({{\textbf {t}}}({\varvec{\alpha }}))}{{\partial }t^{i}}\frac{{\partial }t^i}{{\partial }{\alpha }^{\beta }}\right] {\hat{{\textbf {{\textbf {e}}}}}}^{\beta }, \end{aligned}$$
(5.4)

where \(\frac{{\partial }t^i}{{\partial }{\alpha }^{\beta }} \) can be identified as the components of the two base vectors \({{\textbf {g}}}_{\beta }({\varvec{\alpha }})\) of the tangent space at \({{\textbf {t}}}\), such that

$$\begin{aligned} {{\,\textrm{grad}\,}}{\varPi }({{\textbf {t}}}({\varvec{\alpha }})) = \left[ \frac{{\partial }{\varPi }({{\textbf {t}}}({\varvec{\alpha }}))}{{\partial }{{\textbf {t}}}}\cdot {{\textbf {g}}}_{\beta }({\varvec{\alpha }}) \right] {\hat{{\textbf {{\textbf {e}}}}}}^{\beta }. \end{aligned}$$
(5.5)

Due to the inevitable singularity in the parametrization Eq. (5.2) according to the hairy ball theorem, \({{\textbf {g}}}_{\beta }({\varvec{\alpha }})\) are doomed to vanish at some point \({\varvec{\alpha }}\), which leads to divergence of the solution procedure.
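The loss of the tangent base can be made concrete for the parametrization of Eq. (5.2). The following is a minimal numerical sketch, not part of the formulation itself; the function names `director`, `tangent_basis` and `g1_norm` are chosen here for illustration only. It evaluates \({{\textbf {g}}}_1 = {\partial }{{\textbf {t}}}/{\partial }{\alpha }_1\), whose norm equals \(|\sin {\alpha }_2|\) and therefore vanishes at the poles \({\alpha }_2 \in \{0, \pi \}\).

```python
import numpy as np

def director(alpha):
    """Spherical parametrization of the unit director, Eq. (5.2)."""
    a1, a2 = alpha
    return np.array([np.cos(a1) * np.sin(a2),
                     np.sin(a1) * np.sin(a2),
                     np.cos(a2)])

def tangent_basis(alpha):
    """Columns are g_1 = dt/dalpha_1 and g_2 = dt/dalpha_2 (analytic derivatives)."""
    a1, a2 = alpha
    g1 = np.array([-np.sin(a1) * np.sin(a2), np.cos(a1) * np.sin(a2), 0.0])
    g2 = np.array([np.cos(a1) * np.cos(a2), np.sin(a1) * np.cos(a2), -np.sin(a2)])
    return np.column_stack([g1, g2])

def g1_norm(alpha):
    """Length of the first base vector; equals |sin(alpha_2)|."""
    return np.linalg.norm(tangent_basis(alpha)[:, 0])

equator = g1_norm((0.3, np.pi / 2))  # regular point of the parametrization
pole = g1_norm((0.3, 0.0))           # singular point: g_1 degenerates
print(equator, pole)
```

At the pole the two base vectors no longer span the tangent plane, so the component of the gradient in Eq. (5.5) associated with \({\alpha }_1\) carries no information there.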

The second derivatives are straightforwardly obtained as

$$\begin{aligned} \begin{aligned} \partial _{\alpha ^{\gamma }}\partial _{\alpha ^{\beta }}{\varPi }({{\textbf {t}}}({\varvec{\alpha }}))&= \Big [\frac{{\partial }^2{\varPi }({{\textbf {t}}}({\varvec{\alpha }}))}{{\partial }t^{i}\partial t^{j}}\frac{{\partial }t^i}{{\partial }{\alpha }^{\beta }} \frac{{\partial }t^j}{{\partial }{\alpha }^{\gamma }} \\&\quad + \frac{{\partial }{\varPi }({{\textbf {t}}}({\varvec{\alpha }}))}{{\partial }t^{i}}\frac{{\partial }^2 t^i}{{\partial }{\alpha }^{\beta }{\partial }{\alpha }^{\gamma }}\Big ] {\hat{{\textbf {{\textbf {e}}}}}}^{\beta }\otimes {\hat{{\textbf {{\textbf {e}}}}}}^{\gamma }, \end{aligned} \end{aligned}$$
(5.6)

which can be recast to

$$\begin{aligned} \begin{aligned} {{\,\textrm{Hess}\,}}{\varPi }({{\textbf {t}}}({\varvec{\alpha }}))&= \Big [{{\textbf {g}}}_{{\beta }}\cdot \frac{{\partial }^2{\varPi }({{\textbf {t}}}({\varvec{\alpha }}))}{{\partial }{{\textbf {t}}}\partial {{\textbf {t}}}}{{\textbf {g}}}_{{\gamma }} \\&\quad + \frac{{\partial }{\varPi }({{\textbf {t}}}({\varvec{\alpha }}))}{{\partial }t^{i}}\varGamma _{{\beta }{\gamma }}^{\kappa }g_{\kappa }^i\Big ] {\hat{{\textbf {{\textbf {e}}}}}}^{\beta }\otimes {\hat{{\textbf {{\textbf {e}}}}}}^{\gamma }. \end{aligned} \end{aligned}$$
(5.7)

The Christoffel symbol \(\varGamma _{{\beta }{\gamma }}^{\kappa }\) is defined as \(\frac{{\partial }^2{{\textbf {t}}}}{{\partial }{\alpha }^{\beta }{\partial }{\alpha }^{\gamma }}=\frac{{\partial }{{\textbf {g}}}_{\beta }}{{\partial }{\alpha }^{\gamma }} = \varGamma _{{\beta }{\gamma }}^{\kappa }{{\textbf {g}}}_{\kappa }\). To find the stationary point \({\varvec{\alpha }}^*\), for which \({{\,\textrm{grad}\,}}{\varPi }({{\textbf {t}}}({\varvec{\alpha }})) =0\), a Newton-Raphson scheme can be used, where the first order Taylor expansion of the gradient is set to zero,

$$\begin{aligned} {{\,\textrm{grad}\,}}{\varPi }({{\textbf {t}}}({\varvec{\alpha }}_K))+ {{\,\textrm{Hess}\,}}{\varPi }({{\textbf {t}}}({\varvec{\alpha }}_K)) \Updelta {\varvec{\alpha }}_K {\mathop {=}\limits ^{!}}0. \end{aligned}$$
(5.8)

Eq. (5.8) is then solved for \(\Updelta {\varvec{\alpha }}_K \) and with this the configuration is simply updated as

$$\begin{aligned} {\varvec{\alpha }}_{K+1} = {\varvec{\alpha }}_K + \Updelta {\varvec{\alpha }}_K. \end{aligned}$$
(5.9)

To summarize, we recall the singularities of the gradient and Hessian in Eqs. (5.5) and (5.7). Furthermore, the resulting quantities tend to be expensive to evaluate, due to the involved dependence on trigonometric functions inherited from the parametrization of Eq. (5.2). Both drawbacks can be circumvented using the extrinsic approach from [2, 3], which we use and discuss in the following. As a historical note, we mention [60, 65], who use a similar concept, which was also exploited by [6] to obtain the derivatives of their formulation. Despite these anticipated advantages, we point out that the trivial update formula of Eq. (5.9) becomes more involved in the extrinsic approach, which leads to the so-called retractions described in Sect. 8. Furthermore, we mention that \({{\textbf {g}}}_{\beta }\) plays a role similar to that of the 3\(\times \)2 matrix \(\varvec{{\Lambda }}\) introduced later.

5.2.2 The Extrinsic Approach: Projection of the Euclidean Gradient

The construction of the gradient and Hessian from [2] can only be applied in an algebraic setting. Therefore, we start from the discretized functional \({\varPi }^h: V_h^{{{\mathcal {M}}}}({\varOmega }) \rightarrow \mathbb {R}\). With the operator \({{\mathcal {E}}}: V_{\textrm {h}}^{{{\mathcal {M}}}} \rightarrow M^n\) it can be cast into a discrete function \({\varPi }: M^n \rightarrow \mathbb {R}\), which depends on n nodal values of the manifold M. The relation of \(V_{\textrm {h}}^{{{\mathcal {M}}}}\) and \(M^n\) for non-linear manifolds is more delicate than in the case where \({{\mathcal {M}}}\) is a vector space. In particular, the inverse \({{\mathcal {E}}}^{-1}\) is not unique in every case, as stated in ([74], Theorem 3.2). However, this non-uniqueness is not an issue for Reissner-Mindlin energies, since the director does not vary by \(180^{\circ }\) from node to node. With this at hand, we move in Fig. 2 from the top left to the top right corner. In the following we first describe the projection of gradients and Hessians.

We first introduce the standard concept of projecting a gradient, defined in an embedding space, onto a submanifold in such a way that it yields the correct gradient on the submanifold. This idea goes back at least to Rosen [71, 72]. The projection of the Hessian, discussed later, does not have such a long tradition; at least, the authors could not find anything older than [65]. Here, M can be an arbitrary manifold and is not restricted to \({\mathbb {R}}^{3}\times {{{\mathcal {S}}}}^{2}\).

Any (co-)vector of a Euclidean embedding space can be decomposed into a part tangential and a part normal to a Riemannian submanifold \(M \subset {\mathbb {R}}^n\), see Fig. 3 and ([2], Chapter 3.6.1). For an arbitrary vector \({\varvec{\eta }}\in T_{{\textbf {x}}}\mathbb {R}^n\cong \mathbb {R}^n\) with its base point at \({{\textbf {x}}}\in \mathbb {R}^n\) this can be written as

$$\begin{aligned} {\varvec{\eta }}= P_{{\textbf {x}}}{\varvec{\eta }}+ P_{{\textbf {x}}}^\perp {\varvec{\eta }}. \end{aligned}$$
(5.10)

Here, \(P_{{\textbf {x}}}\) denotes the projection onto \(T_{{\textbf {x}}}M\) and \(P_{{\textbf {x}}}^\perp \) is the projection onto \({(T_{{\textbf {x}}}M)}^\perp \).

Fig. 3 Decomposition of a vector \({\varvec{\eta }}\), which lies in the embedding space, into a tangential and a normal part with respect to the submanifold M

We consider a function \({\bar{{\varPi }}}: \mathbb {R}^n \rightarrow \mathbb {R}\) defined on \(\mathbb {R}^n\) and a function \({\varPi }: M \rightarrow \mathbb {R}\), which is the same as \({\bar{{\varPi }}}\) with the restriction to take only values from M. The gradient of \({\varPi }\), which is a co-tangent vector, can then be expressed as the gradient of \({\bar{{\varPi }}}\), projected onto the submanifold M,

$$\begin{aligned} {{\,\textrm{grad}\,}}{\varPi }({{\textbf {x}}}) =P_{{\textbf {x}}}{{\,\textrm{grad}\,}}{\bar{{\varPi }}}({{\textbf {x}}}), \end{aligned}$$
(5.11)

see also Fig. 3.
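For the unit sphere, the decomposition of Eq. (5.10) and the projected gradient of Eq. (5.11) can be illustrated with a few lines of NumPy. This is a sketch under stated assumptions: the linear toy energy \({\bar{{\varPi }}}({{\textbf {x}}}) = {{\textbf {b}}}\cdot {{\textbf {x}}}\), the vector `b` and the helper names are hypothetical and only serve the illustration; the projector used is the standard one for the unit sphere, \({\textbf {I}}-{{\textbf {x}}}\otimes {{\textbf {x}}}\).

```python
import numpy as np

def sphere_projector(x):
    """Orthogonal projector onto the tangent space of the unit sphere at x."""
    return np.eye(x.size) - np.outer(x, x)

def riemannian_gradient(egrad, x):
    """Project the Euclidean gradient onto the tangent space, cf. Eq. (5.11)."""
    return sphere_projector(x) @ egrad

# Hypothetical toy energy Pi_bar(x) = b . x, so that grad Pi_bar = b everywhere.
b = np.array([1.0, 2.0, 3.0])
x = np.array([0.0, 0.0, 1.0])          # point on the unit sphere

egrad = b
rgrad = riemannian_gradient(egrad, x)   # tangential part of egrad
normal_part = np.outer(x, x) @ egrad    # part normal to the sphere, cf. Eq. (5.10)

print(rgrad, normal_part)
```

The tangential part is orthogonal to `x`, and the two parts add up to the Euclidean gradient, which is exactly the splitting sketched in Fig. 3.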

5.2.3 The Extrinsic Approach: Construction of the Riemannian Hessian

From the aforementioned projection to construct the Riemannian gradient, we can derive a similar reasoning to obtain the Riemannian Hessian. For two vectors \({\varvec{\eta }},{\varvec{\xi }}\in T_{{\textbf {x}}}M\) we get

$$\begin{aligned} \begin{aligned} \nabla _{\varvec{\eta }}{{\,\textrm{grad}\,}}{\varPi }({{\textbf {x}}}) \cdot {\varvec{\xi }}&= {\varvec{\xi }}\cdot {{\,\textrm{Hess}\,}}{\varPi }({{\textbf {x}}}) {\varvec{\eta }}\\&= {\varvec{\xi }}\cdot \nabla _{\varvec{\eta }}(P_{{\textbf {x}}}{{\,\textrm{grad}\,}}{\bar{{\varPi }}}({{\textbf {x}}})) \\&= {\varvec{\xi }}\cdot P_{{\textbf {x}}}{\textrm {D}}_{\varvec{\eta }}(P_{{\textbf {x}}}{{\,\textrm{grad}\,}}{\bar{{\varPi }}}({{\textbf {x}}})) \\&={\varvec{\xi }}\cdot [P_{{\textbf {x}}}P_{{\textbf {x}}}{\textrm {D}}_{\varvec{\eta }}{{\,\textrm{grad}\,}}{\bar{{\varPi }}}({{\textbf {x}}})+P_{{\textbf {x}}}({\textrm {D}}_{\varvec{\eta }}P_{{\textbf {x}}}) {{\,\textrm{grad}\,}}{\bar{{\varPi }}}({{\textbf {x}}})] \\&={\varvec{\xi }}\cdot [P_{{\textbf {x}}}{\textrm {D}}_{\varvec{\eta }}{{\,\textrm{grad}\,}}{\bar{{\varPi }}}({{\textbf {x}}})+P_{{\textbf {x}}}({\textrm {D}}_{\varvec{\eta }}P_{{\textbf {x}}}) {{\,\textrm{grad}\,}}{\bar{{\varPi }}}({{\textbf {x}}})] \\&={\varvec{\xi }}\cdot [P_{{\textbf {x}}}{{\,\textrm{Hess}\,}}{\bar{{\varPi }}}({{\textbf {x}}}) {\varvec{\eta }}+P_{{\textbf {x}}}({\textrm {D}}_{\varvec{\eta }}P_{{\textbf {x}}}) {{\,\textrm{grad}\,}}{\bar{{\varPi }}}({{\textbf {x}}})]. \end{aligned} \end{aligned}$$
(5.12)

Here, we made use of the relation \(\nabla _{\varvec{\eta }}=P_{{\textbf {x}}}{\textrm {D}}_{\varvec{\eta }}\), which stems from ([2], Ch. 5.3.3, Eq. 5.15). Additionally, \(\nabla _{\varvec{\eta }}\) denotes the covariant derivative, whereas \({\textrm {D}}_{\varvec{\eta }}\) describes a standard Euclidean directional derivative. Since \(P_{{\textbf {x}}}\) is a projection matrix, we additionally have the idempotency property \(P_{{\textbf {x}}}P_{{\textbf {x}}}=P_{{\textbf {x}}}\).

Thus, the Riemannian Hessian can be expressed by using four quantities, namely (i) the projection \(P_{{\textbf {x}}}\) from the embedding space onto the tangent space of the submanifold, (ii) the directional derivative \({\textrm {D}}_{\varvec{\eta }}P_{{\textbf {x}}}\) of this projection, (iii) the gradient \({{\,\textrm{grad}\,}}{\bar{{\varPi }}}\) and (iv) the Hessian \({{\,\textrm{Hess}\,}}{\bar{{\varPi }}}\) of the Euclidean extension of the functional. This construction is obtained from [3]. With this we can calculate the Riemannian Hessian without any Christoffel symbols, using an extrinsic view without any singularities, since no parametrization, e.g. by angles, is introduced.

In particular, for the case of the unit sphere \({{\textbf {x}}}\in {{\mathcal {S}}}^n\) the projection is

$$\begin{aligned} P_{{{\textbf {x}}}}= {\textbf {I}}-{{\textbf {x}}}\otimes {{\textbf {x}}}, \end{aligned}$$
(5.13)

where \({\textbf {I}}\) is the identity matrix. For the directional derivative \({\textrm {D}}_{\varvec{\eta }}P_{{\textbf {x}}}\) we obtain

$$\begin{aligned} \begin{aligned} {\textrm {D}}_{\varvec{\eta }}P_{{\textbf {x}}}&= {\textrm {D}}_{\varvec{\eta }}({\textbf {I}}-{{\textbf {x}}}\otimes {{\textbf {x}}}) =-{\textrm {D}}_{\varvec{\eta }}({{\textbf {x}}}\otimes {{\textbf {x}}}) \\&= -({\varvec{\eta }}\otimes {{\textbf {x}}}+ {{\textbf {x}}}\otimes {\varvec{\eta }}). \end{aligned} \end{aligned}$$
(5.14)

With this result we compute the product \(P_{{\textbf {x}}}({\textrm {D}}_{\varvec{\eta }}P_{{\textbf {x}}}) \) in Eq. (5.12) as follows

$$\begin{aligned} \begin{aligned} P_{{\textbf {x}}}({\textrm {D}}_{\varvec{\eta }}P_{{\textbf {x}}})&= -P_{{\textbf {x}}}({\varvec{\eta }}\otimes {{\textbf {x}}}+ {{\textbf {x}}}\otimes {\varvec{\eta }})\\&=-P_{{\textbf {x}}}{\varvec{\eta }}\otimes {{\textbf {x}}}-P_{{\textbf {x}}}{{\textbf {x}}}\otimes {\varvec{\eta }}\\&=-P_{{\textbf {x}}}{\varvec{\eta }}\otimes {{\textbf {x}}}=- {\varvec{\eta }}\otimes {{\textbf {x}}}, \end{aligned} \end{aligned}$$
(5.15)

where the identities \({\varvec{\eta }}=P_{{\textbf {x}}}{\varvec{\eta }}\) and \(P_{{\textbf {x}}}{{\textbf {x}}}= {\varvec{0}}\) of the projection have been exploited. Naturally, we have for \(M={{\mathcal {S}}}^2\)

$$\begin{aligned} \begin{aligned} {\varvec{\xi }}\cdot {{\,\textrm{Hess}\,}}{\varPi }({{\textbf {x}}}) {\varvec{\eta }}&= {\varvec{\xi }}\cdot [P_{{\textbf {x}}}{{\,\textrm{Hess}\,}}{\bar{{\varPi }}}({{\textbf {x}}}) {\varvec{\eta }}- {\varvec{\eta }}\otimes {{\textbf {x}}}{{\,\textrm{grad}\,}}{\bar{{\varPi }}}({{\textbf {x}}})] \\&={\varvec{\xi }}\cdot [P_{{\textbf {x}}}{{\,\textrm{Hess}\,}}{\bar{{\varPi }}}({{\textbf {x}}}) - ({{\textbf {x}}}\cdot {{\,\textrm{grad}\,}}{\bar{{\varPi }}}({{\textbf {x}}})) {\textbf {I}}] {\varvec{\eta }}. \end{aligned} \end{aligned}$$
(5.16)

By comparing coefficients, we obtain

$$\begin{aligned} {{\,\textrm{Hess}\,}}{\varPi }({{\textbf {x}}})=P_{{\textbf {x}}}{{\,\textrm{Hess}\,}}{\bar{{\varPi }}}({{\textbf {x}}}) - ({{\textbf {x}}}\cdot {{\,\textrm{grad}\,}}{\bar{{\varPi }}}({{\textbf {x}}})) {\textbf {I}} \end{aligned}$$
(5.17)

as Riemannian Hessian. Using \({\varvec{\eta }}=P_{{\textbf {x}}}{\varvec{\eta }}\), we can rewrite Eq. (5.16) in the form

$$\begin{aligned} \begin{aligned} {\varvec{\xi }}\cdot {{\,\textrm{Hess}\,}}{\varPi }({{\textbf {x}}}) {\varvec{\eta }}= {\varvec{\xi }}\cdot [P_{{\textbf {x}}}{{\,\textrm{Hess}\,}}{\bar{{\varPi }}}({{\textbf {x}}})P_{{\textbf {x}}}- ({{\textbf {x}}}\cdot {{\,\textrm{grad}\,}}{\bar{{\varPi }}}({{\textbf {x}}}) ){\textbf {I}}] {\varvec{\eta }}, \end{aligned} \end{aligned}$$
(5.18)

which underlines the symmetry of the Riemannian Hessian.
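Eq. (5.18) can be checked numerically against the second derivative of the energy along a great circle. The sketch below assumes a hypothetical quadratic toy energy \({\bar{{\varPi }}}({{\textbf {x}}}) = \tfrac{1}{2}{{\textbf {x}}}\cdot {\textbf {A}}{{\textbf {x}}}+ {{\textbf {b}}}\cdot {{\textbf {x}}}\); the data `A`, `b` and all function names are illustrative choices and not taken from the formulation. For a unit tangent vector \({\varvec{\eta }}\), the curve \(\cos (s){{\textbf {x}}}+\sin (s){\varvec{\eta }}\) is a geodesic of the sphere, so the second derivative of the energy along it should equal \({\varvec{\eta }}\cdot {{\,\textrm{Hess}\,}}{\varPi }({{\textbf {x}}}){\varvec{\eta }}\).

```python
import numpy as np

# Hypothetical data for the toy energy 0.5 x.A x + b.x (A symmetric).
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 3.0]])
b = np.array([0.2, -1.0, 0.7])

def energy(x):
    return 0.5 * x @ A @ x + b @ x

def egrad(x):
    return A @ x + b      # Euclidean gradient

def ehess(x):
    return A              # Euclidean Hessian

def riemannian_hessian(x):
    """Riemannian Hessian on the unit sphere in the form of Eq. (5.18)."""
    P = np.eye(3) - np.outer(x, x)
    return P @ ehess(x) @ P - (x @ egrad(x)) * np.eye(3)

x = np.array([1.0, 2.0, 2.0]) / 3.0           # unit vector
eta = np.cross(x, [0.0, 0.0, 1.0])            # tangent vector at x
eta /= np.linalg.norm(eta)

H = riemannian_hessian(x)
quad = eta @ H @ eta

def geodesic(s):
    return np.cos(s) * x + np.sin(s) * eta    # great circle through x

h = 1e-4                                       # finite difference step
fd = (energy(geodesic(h)) - 2.0 * energy(x) + energy(geodesic(-h))) / h**2
print(quad, fd)
```

Note that no angles and no Christoffel symbols enter; only the Euclidean gradient and Hessian and the projector are needed.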

5.2.4 Gradient and Hessian Final Form

We go back to the algebraic Reissner-Mindlin potential \({\varPi }(\varvec{{\Phi }})\), which takes values from \(M^n = (\mathbb {R}^3 \times {{\mathcal {S}}}^2)^n\). The aim is to move from the top right corner to the bottom right corner in Fig. 2. We restrict the subsequent derivations to the contributions of two nodes I and J, which can then be assembled to the full gradient and Hessian. Furthermore, no summation convention is applied to uppercase indices; sums over nodes are written explicitly using the usual notation \(\sum _{I=1}^N\). First, the results from Eqs. (5.17) and (5.18) can be generalized for the midsurface \({\varvec{\varphi }}_I\). The tangent space of \(\mathbb {R}^3\) can be identified with itself, \(T_{\varvec{\varphi }}\mathbb {R}^3\cong \mathbb {R}^3\), and we have as projection the identity \(P_{{\varvec{\varphi }}_I}={\textbf {I}}\). With this result at hand we find the projector \(P_{\varvec{{\Phi }}_I}: \mathbb {R}^6 \rightarrow T_{\varvec{{\Phi }}_I} M=T_{{\varvec{\varphi }}_I}\mathbb {R}^3 \times T_{{{\textbf {t}}}_I}{{\mathcal {S}}}^2 \cong \mathbb {R}^3 \times T_{{{\textbf {t}}}_I}{{\mathcal {S}}}^2 \) as

$$\begin{aligned} P_{\varvec{{\Phi }}_I}=\begin{bmatrix}{} {\textbf {I}}_{3\times 3} &{}\quad {\varvec{0}}_{3\times 3} \\ {\varvec{0}}_{3\times 3} &{} P_{{{\textbf {t}}}_I} \\ \end{bmatrix}, \quad P_{{{\textbf {t}}}_I} = {\textbf {I}}_{3\times 3}- {{\textbf {t}}}_I \otimes {{\textbf {t}}}_I. \end{aligned}$$
(5.19)

We can now fully define the contribution of node I to the Riemannian gradient \({{\,\textrm{grad}\,}}_I {\varPi }(\varvec{{\Phi }})\), similar to Eq. (5.11), as

$$\begin{aligned} {{\,\textrm{grad}\,}}_I {\varPi }{(\varvec{{\Phi }})}_{6\times 1} = P_{\varvec{{\Phi }}_I} {{\,\textrm{grad}\,}}_I {\bar{{\varPi }}}(\varvec{{\Phi }})= \begin{bmatrix} \frac{{\partial }{\bar{{\varPi }}}}{{\partial }{\varvec{\varphi }}_I} \\ P_{{{\textbf {t}}}_I}\frac{{\partial }{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_I} \\ \end{bmatrix}. \end{aligned}$$
(5.20)

To obtain the Hessian we need to specify the quantity \(P_{\varvec{{\Phi }}_I} ({\textrm {D}}_{\Updelta \varvec{{\Phi }}_I} P_{\varvec{{\Phi }}_J} ) \) as

$$\begin{aligned} \begin{aligned} P_{\varvec{{\Phi }}_I} ({\textrm {D}}_{\Updelta \varvec{{\Phi }}_I} P_{\varvec{{\Phi }}_J} ) =\begin{bmatrix} {\varvec{0}}_{3\times 3} &{} \quad {\varvec{0}}_{3\times 3} \\ {\varvec{0}}_{3\times 3} &{} \quad -\Updelta {{\textbf {t}}}_I \otimes {{\textbf {t}}}_J \delta _{IJ} \\ \end{bmatrix}, \end{aligned} \end{aligned}$$
(5.21)

where \(\delta _{IJ}\) is the Kronecker delta, taking care of the fact that \({\textrm {D}}_{\Updelta \varvec{{\Phi }}_I} P_{\varvec{{\Phi }}_J} ={\varvec{0}}\) for \(I\ne J\).

Using the results from Sect. 5.2.3, the contribution of nodes I and J to the Hessian reads

$$\begin{aligned} \begin{aligned} \nabla _{\Updelta \varvec{{\Phi }}_I}&{{\,\textrm{grad}\,}}_J {\varPi }(\varvec{{\Phi }}) \cdot \updelta \varvec{{\Phi }}_J =\updelta \varvec{{\Phi }}_J {{\,\textrm{Hess}\,}}_{JI}{\varPi }(\varvec{{\Phi }}) \Updelta \varvec{{\Phi }}_I \\&=\updelta \varvec{{\Phi }}_J \cdot [P_{\varvec{{\Phi }}_J} {{\,\textrm{Hess}\,}}_{JI} {\bar{{\varPi }}}(\varvec{{\Phi }} ) P_{\varvec{{\Phi }}_I}\Updelta \varvec{{\Phi }}_I\\&\quad+P_{\varvec{{\Phi }}_I} ({\textrm {D}}_{\Updelta \varvec{{\Phi }}_I} P_{\varvec{{\Phi }}_J} ) {{\,\textrm{grad}\,}}_J {\bar{{\varPi }}}(\varvec{{\Phi }} )]. \end{aligned} \end{aligned}$$
(5.22)

With the definitions from Eqs. (5.19) and (5.21) this can be expanded to

$$\begin{aligned} \begin{aligned}&\updelta \varvec{{\Phi }}_J {{\,\textrm{Hess}\,}}_{JI}{\varPi }(\varvec{{\Phi }}) \Updelta \varvec{{\Phi }}_I \\&= \begin{bmatrix} \updelta {\varvec{\varphi }}_J \\ \updelta {{\textbf {t}}}_J \\ \end{bmatrix}^T_{6\times 1} \Bigg \{ \begin{bmatrix} \frac{{\partial }^2{\bar{{\varPi }}}}{{\partial }{\varvec{\varphi }}_J \partial {\varvec{\varphi }}_I} &{} \frac{{\partial }^2{\bar{{\varPi }}}}{{\partial }{\varvec{\varphi }}_J \partial {{\textbf {t}}}_I} P_{{{\textbf {t}}}_I} \\ P_{{{\textbf {t}}}_J}\frac{{\partial }^2{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_J \partial {\varvec{\varphi }}_I} &{} P_{{{\textbf {t}}}_J}\frac{{\partial }^2{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_J \partial {{\textbf {t}}}_I}P_{{{\textbf {t}}}_I} \\ \end{bmatrix} \\&\quad -\begin{bmatrix} {\varvec{0}}_{3\times 3} &{} {\varvec{0}}_{3\times 3} \\ {\varvec{0}}_{3\times 3} &{} {{\textbf {t}}}_J^T\frac{{\partial }{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_J} {\textbf {I}}_{3\times 3}\delta _{IJ} \\ \end{bmatrix} \Bigg \}_{6\times 6}\begin{bmatrix} \Updelta {\varvec{\varphi }}_I \\ \Updelta {{\textbf {t}}}_I \\ \end{bmatrix}_{6\times 1}. \end{aligned} \end{aligned}$$
(5.23)

The Hessian inherits the dimensions \(6\times 6\) from the embedding space \(M_{{\textrm {E}}}=\mathbb {R}^3 \times \mathbb {R}^3\). With reference to the five-dimensional configuration space \(M=\mathbb {R}^3 \times {{\mathcal {S}}}^2\), however, it has to be reduced to \(5\times 5\). Moreover, the six-dimensional Hessian has a non-trivial kernel, since \(P_{{{\textbf {t}}}_I}\) projects a normal at \({{\textbf {t}}}_I\) onto the zero vector, \(P_{{{\textbf {t}}}_I} {{\textbf {t}}}_I={\varvec{0}}\). In particular, for a vector \({{\textbf {q}}}_I={[0\; 0\; 0\; {\alpha }{{\textbf {t}}}_I^T]}^T, {\alpha }\in \mathbb {R}\), we obtain

$$\begin{aligned} {{\,\textrm{Hess}\,}}_{JI}{\varPi }(\varvec{{\Phi }}) {{\textbf {q}}}_I = {\varvec{0}}_{6\times 1}. \end{aligned}$$
(5.24)

Therefore, Hessian-vector products are only non-zero for vectors in the tangent space \(T_{{{\textbf {t}}}_I} {{\mathcal {S}}}^2\). Using this information, we apply a change of basis to the Hessian to get a final stiffness matrix with dimensions \(5 \times 5\), see ([78], Eq. 23) or similarly [84].

At this point, we postulate a generic basis of the tangent space at the director \({{\textbf {t}}}_I\), which we introduce as

$$\begin{aligned} \varvec{{\Lambda }}_I=\begin{bmatrix} {{\textbf {t}}}^1_I&\quad {{\textbf {t}}}^2_I \end{bmatrix}_{3\times 2}, \end{aligned}$$
(5.25)

see Fig. 4. Several options to explicitly obtain this nodal tangent space are presented in Section C.3. A generalization of this tangent space base that includes the midsurface can be written as

$$\begin{aligned} \varvec{{\Lambda }}_{\varvec{{\Phi }}_I}=\begin{bmatrix} {\textbf {I}}_{3\times 3} &{}\quad {\varvec{0}}_{3\times 2} \\ {\varvec{0}}_{3\times 3} &{}\quad \varvec{{\Lambda }}_{I,3\times 2} \end{bmatrix}_{6\times 5}, \end{aligned}$$

since the tangent space of \(\mathbb {R}^3\) is \(\mathbb {R}^3\) itself.

Fig. 4 The unit sphere with a unit vector \({{\textbf {t}}}_I\) and the corresponding tangent space basis \(\varvec{{\Lambda }}_I=[{{\textbf {t}}}^1_I~{{\textbf {t}}}^2_I]\)

In Appendix 3, we present three different tangent base update schemes. The first two are based on parallel transport of the tangent base vectors from the old state to the new one. We will call these IncPT for incremental parallel transport and IncVT for incremental vector transport, see Algos. 4 and 3. The first one (IncPT) is similar to the approaches by, e.g. [29, 86]. The second one (IncVT) is newly introduced in this work. Both methods differ only slightly, but IncVT is computationally faster since it avoids evaluation of trigonometric functions and uses fewer multiplications. The third one is based on stereographic projection (SP), see Algo. 5. This tangent base construction is proposed in ([74], Eq. 31) for the general unit sphere \({{\mathcal {S}}}^{n-1} \).

In contrast to the first two schemes, the stereographic projection does not need information about the old state. Instead, the only information needed to construct the new basis is the current nodal director.

Since, by definition, the columns of \(\varvec{{\Lambda }}_I\) are elements of \(T_{{{\textbf {t}}}_I}{{\mathcal {S}}}^2\), we have \(P_{{{\textbf {t}}}_I} \varvec{{\Lambda }}_I=\varvec{{\Lambda }}_I\). Moreover, we always assume \({{\textbf {t}}}^{\alpha }_I\) to be unit vectors and \({{\textbf {t}}}_I^1 \cdot {{\textbf {t}}}_I^2 =0\). This yields \(\varvec{{\Lambda }}_I^T \varvec{{\Lambda }}_I= {\textbf {I}}_{2\times 2}\), which simplifies the formulation. Therefore, by rewriting Eq. (5.23), we can finally define the Riemannian stiffness matrix

$$\begin{aligned} \begin{aligned} {\textbf {K}}^{JI,\text {riem}}_{5\times 5}&= {{\,\textrm{Hess}\,}}_{JI}{\varPi }{(\varvec{{\Phi }})}_{5\times 5} \\&= \varvec{{\Lambda }}_{\varvec{{\Phi }}_J,5\times 6}^T{{\,\textrm{Hess}\,}}_{JI}{\varPi }{(\varvec{{\Phi }})}_{6\times 6} \varvec{{\Lambda }}_{\varvec{{\Phi }}_I,6\times 5} \\&= \begin{bmatrix} \frac{{\partial }^2{\bar{{\varPi }}}}{{\partial }{\varvec{\varphi }}_J \partial {\varvec{\varphi }}_I} &{} \quad \frac{{\partial }^2{\bar{{\varPi }}}}{{\partial }{\varvec{\varphi }}_J \partial {{\textbf {t}}}_I}\varvec{{\Lambda }}_I \\ \varvec{{\Lambda }}_J^T \frac{{\partial }^2{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_J \partial {\varvec{\varphi }}_I} &{} \quad \varvec{{\Lambda }}_J^T\frac{{\partial }^2{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_J \partial {{\textbf {t}}}_I}\varvec{{\Lambda }}_I \\ \end{bmatrix}\\&\quad -\begin{bmatrix} {\varvec{0}}_{3\times 3} &{} \quad {\varvec{0}}_{3\times 2} \\ {\varvec{0}}_{2\times 3} &{} \quad {{\textbf {t}}}_J^T\frac{{\partial }{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_J} {\textbf {I}}_{2\times 2}\delta _{IJ} \\ \end{bmatrix}. \end{aligned} \end{aligned}$$
(5.26)

With this tangent space representation we can also rewrite the three-dimensional director update \(\Updelta {{\textbf {t}}}_I \in T_{{{\textbf {t}}}_I}{{\mathcal {S}}}^2 \) as

$$\begin{aligned} \Updelta {{\textbf {t}}}_I = \varvec{{\Lambda }}_I \Updelta {\textbf {T}}_I= \Updelta T_I^1 {{\textbf {t}}}^1_I+ \Updelta T_I^2 {{\textbf {t}}}^2_I \end{aligned}$$
(5.27)

with two degrees of freedom \(\Updelta T_I^{\alpha }\) for the director update in the tangent plane \(T_{{{\textbf {t}}}_I}{{\mathcal {S}}}^2\). This construction is identical to the notation used by Simo et al. [86]. Additionally, since \({\textbf {K}}^{JI,\text {riem}}_{5\times 5}\) is expressed in the tangent space of the nodal director, we also need to express the residual in the base given by Eq. (5.25). The corresponding version of Eq. (5.20) reads

$$\begin{aligned} \begin{aligned} {\textbf {R}}_{I,5\times 1}^\text {riem}&= {{\,\textrm{grad}\,}}_I {\varPi }{(\varvec{{\Phi }})}_{5\times 1} \\&= \varvec{{\Lambda }}_{\varvec{{\Phi }}_I,5\times 6}^T {{\,\textrm{grad}\,}}_I {\varPi }{(\varvec{{\Phi }})}_{6\times 1} \\&= \begin{bmatrix} \frac{{\partial }{\bar{{\varPi }}}}{{\partial }{\varvec{\varphi }}_I} \\ \varvec{{\Lambda }}_I^T\frac{{\partial }{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_I} \\ \end{bmatrix}_{5\times 1}. \end{aligned} \end{aligned}$$
(5.28)
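The reduction from the six-dimensional embedding space to five nodal degrees of freedom in Eqs. (5.26) and (5.28) can be sketched for a single node \((I=J)\) as follows. Several ingredients are assumptions made for this illustration only: the quadratic toy energy defined by the random data `A` and `c`, and in particular the cross-product construction in `tangent_basis`, which is not one of the update schemes of Appendix 3 but merely delivers some orthonormal basis of \(T_{{{\textbf {t}}}_I}{{\mathcal {S}}}^2\).

```python
import numpy as np

def tangent_basis(t):
    """Hypothetical construction of an orthonormal basis [t1 t2] of the tangent plane at t."""
    a = np.zeros(3)
    a[np.argmin(np.abs(t))] = 1.0        # coordinate axis least aligned with t
    t1 = a - (a @ t) * t                 # remove the component along t
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(t, t1)
    return np.column_stack([t1, t2])     # 3 x 2 matrix Lambda_I, cf. Eq. (5.25)

def nodal_basis(t):
    """6 x 5 matrix combining the identity for the midsurface with Lambda_I."""
    L = np.zeros((6, 5))
    L[:3, :3] = np.eye(3)
    L[3:, 3:] = tangent_basis(t)
    return L

def reduced_residual_and_stiffness(egrad, ehess, t):
    """Single-node residual and stiffness in the spirit of Eqs. (5.28) and (5.26)."""
    L = nodal_basis(t)
    R = L.T @ egrad                                   # 5 x 1 residual
    K = L.T @ ehess @ L                               # projected Euclidean Hessian
    K[3:, 3:] -= (t @ egrad[3:]) * np.eye(2)          # curvature term, delta_IJ = 1
    return R, K

# Hypothetical toy energy 0.5 z.A z + c.z with z = [phi, t] in R^6.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B + B.T                                           # symmetric Euclidean Hessian
c = rng.standard_normal(6)

phi = np.array([0.1, -0.2, 0.3])
t = np.array([1.0, 2.0, 2.0]) / 3.0                   # unit director
z = np.concatenate([phi, t])

R, K = reduced_residual_and_stiffness(A @ z + c, A, t)
Lam = tangent_basis(t)
print(R.shape, K.shape)
```

Because the columns of `Lam` are orthonormal and tangential, the projector \(P_{{{\textbf {t}}}_I}\) drops out of the reduced expressions, which is why only \(\varvec{{\Lambda }}_I\) appears in Eq. (5.26).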

At this point, we want to discuss why it is problematic to directly insert this base \(\varvec{{\Lambda }}_I\) into the energy like \({\varPi }({{\textbf {t}}}_I+ \varvec{{\Lambda }}_I {\textbf {T}}_I)\) and differentiate everything to obtain a formulation with five degrees of freedom. For the geometrically linear case this is a valid procedure, since the tangent space does not change during deformation. For the geometrically non-linear case this does not hold, since \(\varvec{{\Lambda }}_I\) is a function of \({{\textbf {t}}}_I\), which, in turn, changes during deformation. This results in a complicated linearization procedure (e.g. linearizing Algorithm 5). Furthermore, this ends up in the situation where the derivatives are not continuous, since the parametrization of the tangent space, due to the hairy ball theorem, contains singularities. The linearization of an incremental approach, see Algorithms 4 and 3 in the appendix, is complicated and not well-defined, because of the non-additive relation between the degrees of freedom and the tangent space of the director.

Finally, we have expressions for the Riemannian gradient and Riemannian Hessian that can be constructed purely from the Euclidean gradient and the Euclidean Hessian of the function \({\bar{{\varPi }}}\). The missing quantities can thus be computed by standard Euclidean partial derivatives, and we can finally move down in Fig. 2 from the top right to the bottom right. Here, we apply the iterative Riemannian Newton-Raphson scheme to solve our non-linear optimization problem \(\min {\varPi }(\varvec{{\Phi }})\) or to find the root of the residual \({\textbf {R}}\). The resulting iterative scheme is summarized in Algo. 1, which differs from ([2], Alg. 5) by using the tangent space representation, similar to [78, 86].
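To indicate how such an iteration may look for a single director, the following sketch applies a Riemannian Newton-Raphson scheme to a hypothetical toy energy \({\bar{{\varPi }}}({{\textbf {t}}}) = \tfrac{1}{2}{{\textbf {t}}}\cdot {\textbf {A}}{{\textbf {t}}}- {{\textbf {b}}}\cdot {{\textbf {t}}}\) on \({{\mathcal {S}}}^2\). It is not a transcription of Algo. 1: the data `A` and `b`, the cross-product tangent basis and, in particular, the update by normalization (used here as a simple stand-in for the retractions of Sect. 8) are assumptions of this example.

```python
import numpy as np

# Hypothetical toy energy 0.5 t.A t - b.t restricted to the unit sphere.
A = np.diag([0.1, 0.2, 0.3])
b = np.array([1.0, 2.0, 2.0])

def egrad(t):
    return A @ t - b          # Euclidean gradient

def ehess(t):
    return A                  # Euclidean Hessian

def tangent_basis(t):
    """Assumed orthonormal basis of the tangent plane at t (not a scheme from Appendix 3)."""
    a = np.zeros(3)
    a[np.argmin(np.abs(t))] = 1.0
    t1 = a - (a @ t) * t
    t1 /= np.linalg.norm(t1)
    return np.column_stack([t1, np.cross(t, t1)])

def newton_on_sphere(t, tol=1e-12, max_iter=20):
    """Riemannian Newton-Raphson iteration with two tangential unknowns per step."""
    for _ in range(max_iter):
        Lam = tangent_basis(t)
        g = egrad(t)
        R = Lam.T @ g                                      # reduced residual
        if np.linalg.norm(R) < tol:
            break
        K = Lam.T @ ehess(t) @ Lam - (t @ g) * np.eye(2)   # reduced stiffness
        dT = np.linalg.solve(K, -R)                        # tangential increment
        t = t + Lam @ dT                                   # step in the tangent plane
        t = t / np.linalg.norm(t)                          # assumed retraction: normalization
    residual = np.linalg.norm(tangent_basis(t).T @ egrad(t))
    return t, residual

t_star, residual = newton_on_sphere(np.array([0.0, 0.0, 1.0]))
print(t_star, residual)
```

The additive update of Eq. (5.9) is replaced by a step in the tangent plane followed by a map back onto the sphere, so the director keeps unit length in every iteration.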

[Algorithm 1: Riemannian Newton-Raphson scheme]

5.3 Euclidean Variation and Linearization of the Continuous Reissner-Mindlin Energy

The last missing ingredients are the partial derivatives of the algebraic potential energy, see Eqs. (5.26) and (5.28). These can simply be obtained using, e.g., automatic differentiation. Nevertheless, we choose another route in the following. Since all quantities live in the six-dimensional Euclidean embedding space \(M_{\textrm {E}}=\mathbb {R}^3 \times \mathbb {R}^3\), the usual rules of differentiation apply. We can therefore take a step back and derive everything in the continuous space \({\mathbb {W}}_{k,p}^{l,q}({\varOmega },M_E)\). Here, the stiffness matrix and the residual of Eqs. (5.26) and (5.28) can be computed by using standard Gâteaux directional derivatives. Using this approach we obtain a template for the differential operators, the residual and the stiffness matrix. This template is independent of a particular interpolation scheme. Accordingly, in subsequent derivations we can start from the continuous Euclidean setting in the space \({\mathbb {W}}_{k,p}^{l,q}({\varOmega },M_E)\), which contains \(\hat{\bar{\varvec{{\Phi }}}}\). We start by applying the principle of minimum potential energy in this setting and obtain the first variation of \(\hat{{\bar{{\varPi }}}}(\hat{\bar{\varvec{{\Phi }}}})\) as

$$\begin{aligned} \begin{aligned} \updelta \hat{{\bar{{\varPi }}}}(\hat{\bar{\varvec{{\Phi }}}})&=\frac{{\partial }\hat{{\bar{{\varPi }}}}(\hat{\bar{\varvec{{\Phi }}}}+\epsilon \updelta \hat{\bar{\varvec{{\Phi }}}} )}{{\partial }\epsilon }\Bigg |_{\epsilon =0} =\hat{{{\bar{G}}}}(\hat{\bar{\varvec{{\Phi }}}};\updelta \hat{\bar{\varvec{{\Phi }}}})\\&= \int _{{{\mathcal {B}}}_0^C}\frac{{\partial }\hat{{\bar{\psi }}}}{{\partial }\varepsilon _{{\alpha }{\beta }}} \updelta \varepsilon _{{\alpha }{\beta }}+ \frac{{\partial }\hat{{\bar{\psi }}}}{{\partial }{\kappa }_{{\alpha }{\beta }}} \updelta {\kappa }_{{\alpha }{\beta }}+ \frac{{\partial }\hat{{\bar{\psi }}}}{{\partial }{\gamma }_{{\alpha }}} \updelta {\gamma }_{{\alpha }} \,{\textrm {d}}A\\&=\int _{{{\mathcal {B}}}_0^C} {{\tilde{N}}}^{{\alpha }{\beta }} \updelta \varepsilon _{{\alpha }{\beta }}+ {{\tilde{M}}}^{{\alpha }{\beta }} \updelta {\kappa }_{{\alpha }{\beta }}+ {{\tilde{Q}}}^{{\alpha }} \updelta {\gamma }_{{\alpha }}\,{\textrm {d}}A, \\&\quad \forall \updelta \hat{\bar{\varvec{{\Phi }}}} \in {\mathbb {W}}_{k,p}^{l,q}({\varOmega },M_E) . \end{aligned} \end{aligned}$$
(5.29)

In the notation used for the energy \(\hat{{\bar{{\varPi }}}}(\hat{\bar{\varvec{{\Phi }}}})\), the hat indicates a continuous quantity and the bar denotes a function which lives in the embedding space. For simplicity and better readability, however, we refrain from introducing expressions like \(\hat{{\bar{{\textbf {a}}}}}_i\) for the base vectors and just stick to \({\textbf {a}}_i\). Instead, we restrict this notation to the total potential energy \(\hat{{\bar{{\varPi }}}}\), its density \(\hat{{\bar{\psi }}}\), the weak form \(\hat{{{\bar{G}}}}\) and the state variables \(\hat{\bar{\varvec{{\Phi }}}}\). The variation \(\updelta \hat{\bar{\varvec{{\Phi }}}}\) lives in \({\mathbb {W}}_{k,p}^{l,q}({\varOmega },\hat{\bar{\varvec{{\Phi }}}}^{-1}TM_{\textrm {E}})\), but we can identify this space with \({{\mathcal {M}}}_{\textrm {E}}={{\mathcal {X}}}(\mathbb {R}^6)\), since \(TM_{\textrm {E}}\) is a linear space. This simplifies the space in which the variation takes place to \({\mathbb {W}}_{k,p}^{l,q}({\varOmega },M_{\textrm {E}})\).

From Eq. (5.29) we can directly introduce the variations of the kinematic quantities.

5.3.1 Variations of Kinematic Quantities

The variation of the kinematics, Eq. (4.10), reads

$$\begin{aligned} \begin{aligned}&\updelta \varepsilon _{{\alpha }{\beta }} =\frac{1}{2}(\updelta {\textbf {a}}_{{\alpha }} \cdot {\textbf {a}}_{{\beta }}+ {\textbf {a}}_{{\alpha }} \cdot \updelta {\textbf {a}}_{{\beta }}) \\&\updelta \kappa _{{\alpha }{\beta }} =\frac{1}{2}(\updelta {\textbf {a}}_{{\alpha }} \cdot {{\textbf {t}}}_{,{\beta }}+ \updelta {\textbf {a}}_{{\beta }} \cdot {{\textbf {t}}}_{,{\alpha }} \\&\quad \quad \quad \,\, + {\textbf {a}}_{{\alpha }} \cdot \updelta {{\textbf {t}}}_{,{\beta }}+ {\textbf {a}}_{{\beta }} \cdot \updelta {{\textbf {t}}}_{,{\alpha }}) \\&\updelta \gamma _{{\alpha }} = \updelta {\textbf {a}}_{{\alpha }} \cdot {{\textbf {t}}}+ {\textbf {a}}_{{\alpha }} \cdot \updelta {{\textbf {t}}}. \end{aligned} \end{aligned}$$
(5.30)

These quantities can be rearranged according to Voigt notation, since the effective stress resultants are symmetric.

$$\begin{aligned} \begin{aligned} \updelta {\varvec{\varepsilon }}_{\textrm {V}}&= \begin{bmatrix} \updelta {\varepsilon }_{11} \\ \updelta {\varepsilon }_{22} \\ 2\updelta {\varepsilon }_{12} \end{bmatrix},\quad \updelta {\varvec{\kappa }}_{\textrm {V}}=\begin{bmatrix} \updelta {\kappa }_{11} \\ \updelta {\kappa }_{22} \\ 2\updelta {\kappa }_{12} \end{bmatrix},\quad \\ \updelta {\varvec{\gamma }}_{\textrm {V}}&= \begin{bmatrix} \updelta {\gamma }_{1} \\ \updelta {\gamma }_{2} \end{bmatrix}, \quad \updelta {\textbf {E}}_{\textrm {V}}=\begin{bmatrix} \updelta {\varvec{\varepsilon }}_{\textrm {V}}\\ \updelta {\varvec{\kappa }}_{\textrm {V}}\\ \updelta {\varvec{\gamma }}_{\textrm {V}}\end{bmatrix}. \end{aligned} \end{aligned}$$
(5.31)
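The factor 2 on the mixed component can be checked with a few lines of Python. The sketch below compares the full contraction \({{\tilde{N}}}^{{\alpha }{\beta }} \updelta \varepsilon _{{\alpha }{\beta }}\) with the product of the corresponding Voigt vectors; the function names and all numerical values are made up for illustration.

```python
def strain_to_voigt(e):
    # Symmetric 2x2 strain variation -> [e11, e22, 2*e12]
    return [e[0][0], e[1][1], 2.0 * e[0][1]]

def stress_to_voigt(n):
    # Symmetric 2x2 stress resultant -> [n11, n22, n12] (no factor 2)
    return [n[0][0], n[1][1], n[0][1]]

d_eps = [[0.1, 0.25], [0.25, -0.3]]   # made-up symmetric strain variation
N_res = [[5.0, 2.0], [2.0, 7.0]]      # made-up symmetric stress resultant

# Full double contraction over alpha, beta = 1, 2
full = sum(N_res[a][b] * d_eps[a][b] for a in range(2) for b in range(2))

# Same quantity from the Voigt vectors
voigt = sum(s * e for s, e in zip(stress_to_voigt(N_res), strain_to_voigt(d_eps)))

print(full, voigt)
```

Both numbers agree because the two identical off-diagonal contributions of the symmetric tensors are collected in the single doubled Voigt entry.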

Introducing the quantities

$$\begin{aligned}&{{\mathcal {B}}}^\text {mm} =\begin{bmatrix} {\textbf {a}}_1^T \frac{{\partial }}{{\partial }\xi ^1} \\ {\textbf {a}}_2^T \frac{{\partial }}{{\partial }\xi ^2} \\ {\textbf {a}}_1^T \frac{{\partial }}{{\partial }\xi ^2}+{\textbf {a}}_2^T \frac{{\partial }}{{\partial }\xi ^1} \end{bmatrix}_{3\times 3}, \end{aligned}$$
(5.32)
$$\begin{aligned}&\begin{aligned} {{\mathcal {B}}}^\text {bm}&=\begin{bmatrix} {{\textbf {t}}}_{,1}^T \frac{{\partial }}{{\partial }\xi ^1} \\ {{\textbf {t}}}_{,2}^T \frac{{\partial }}{{\partial }\xi ^2} \\ {{\textbf {t}}}_{,1}^T \frac{{\partial }}{{\partial }\xi ^2}+{{\textbf {t}}}_{,2}^T \frac{{\partial }}{{\partial }\xi ^1} \end{bmatrix}_{3\times 3},\\ {{\mathcal {B}}}^\text {bd}&=\begin{bmatrix} {\textbf {a}}_{1}^T \frac{{\partial }}{{\partial }\xi ^1} \\ {\textbf {a}}_{2}^T \frac{{\partial }}{{\partial }\xi ^2} \\ {\textbf {a}}_{1}^T \frac{{\partial }}{{\partial }\xi ^2}+{\textbf {a}}_{2}^T \frac{{\partial }}{{\partial }\xi ^1} \end{bmatrix}_{3\times 3}= {{\mathcal {B}}}^\text {mm}, \end{aligned} \end{aligned}$$
(5.33)
$$\begin{aligned}&{{\mathcal {B}}}^\text {sm} = \begin{bmatrix} {{\textbf {t}}}^T \frac{{\partial }}{{\partial }\xi ^1} \\ {{\textbf {t}}}^T \frac{{\partial }}{{\partial }\xi ^2} \end{bmatrix}_{2\times 3},\quad {{\mathcal {B}}}^\text {sd}= \begin{bmatrix} {\textbf {a}}_{1}^T \\ {\textbf {a}}_{2}^T \end{bmatrix}_{2\times 3} \end{aligned}$$
(5.34)

the continuous strain-displacement differential operator of the Euclidean problem is obtained as

$$\begin{aligned} {{\mathcal {B}}}=\begin{bmatrix} {{\mathcal {B}}}^\text {mm} &{}\quad {\varvec{0}}_{3\times 3} \\ {{\mathcal {B}}}^\text {bm} &{}\quad {{\mathcal {B}}}^\text {bd} \\ {{\mathcal {B}}}^\text {sm} &{}\quad {{\mathcal {B}}}^\text {sd} \\ \end{bmatrix}_{8\times 6}, \end{aligned}$$
(5.35)

where the first superscript denotes the corresponding strain: “m” for membrane, “b” for bending and “s” for shear. The second superscript denotes the variables for which the variation takes place: “m” for midsurface displacement and “d” for the director. The vector of strain variations is thus \(\updelta {\textbf {E}}_{\textrm {V}}= {{\mathcal {B}}}\updelta \hat{\bar{\varvec{{\Phi }}}}\). In line with this notation, we introduce the following notation for the effective stress resultants

$$\begin{aligned} \begin{aligned} {\tilde{\textbf {{\textbf {N}}}}}&={[{{\tilde{N}}}^{11}~{{\tilde{N}}}^{22}~ {{\tilde{N}}}^{12}]}^T, \quad {\tilde{\textbf {{\textbf {M}}}}}={[{{\tilde{M}}}^{11}~{{\tilde{M}}}^{22}~ {{\tilde{M}}}^{12}]}^T, \\ {\tilde{\textbf {{{\textbf {Q}}}}}}&={[{{\tilde{Q}}}^1~{{\tilde{Q}}}^2]}^T, \quad {\tilde{\textbf {{\textbf {S}}}}}= {[{\tilde{\textbf {{\textbf {N}}}}}~{\tilde{\textbf {{\textbf {M}}}}}~{\tilde{\textbf {{{\textbf {Q}}}}}}]}^T. \end{aligned} \end{aligned}$$
(5.36)

We can now write the Euclidean weak form by inserting \({{\mathcal {B}}}\) and \({\tilde{\textbf {{\textbf {S}}}}} \) into Eq. (5.29) as

$$\begin{aligned} \begin{aligned} \hat{{{\bar{G}}}}(\hat{\bar{\varvec{{\Phi }}}};\updelta \hat{\bar{\varvec{{\Phi }}}})&= \int _{{{\mathcal {B}}}_0^C} [{{\mathcal {B}}}^\text {mm} \updelta {\varvec{\varphi }}]\cdot {\tilde{\textbf {{\textbf {N}}}}} \\&\quad +[{{\mathcal {B}}}^\text {bm} \updelta {\varvec{\varphi }}+ {{\mathcal {B}}}^\text {bd} \updelta {{\textbf {t}}}]\cdot {\tilde{\textbf {{\textbf {M}}}}}\\&\quad +[{{\mathcal {B}}}^\text {sm} \updelta {\varvec{\varphi }}+ {{\mathcal {B}}}^\text {sd} \updelta {{\textbf {t}}}]\cdot {\tilde{\textbf {{{\textbf {Q}}}}}} \,{\textrm {d}}A, \end{aligned} \end{aligned}$$
(5.37)

or, in a more compact notation,

$$\begin{aligned} \begin{aligned} \hat{{{\bar{G}}}}(\hat{\bar{\varvec{{\Phi }}}},\updelta \hat{\bar{\varvec{{\Phi }}}})&= \int _{{{\mathcal {B}}}_0^C} [{{\mathcal {B}}}(\hat{\bar{\varvec{{\Phi }}}})\updelta \hat{\bar{\varvec{{\Phi }}}}] \cdot {\tilde{\textbf {{\textbf {S}}}}}(\hat{\bar{\varvec{{\Phi }}}}) \,{\textrm {d}}A\\&= \int _{{{\mathcal {B}}}_0^C} \updelta {\textbf {E}}_{\textrm {V}}(\hat{\bar{\varvec{{\Phi }}}}) \cdot {\tilde{\textbf {{\textbf {S}}}}}(\hat{\bar{\varvec{{\Phi }}}}) \,{\textrm {d}}A. \end{aligned} \end{aligned}$$
(5.38)

5.3.2 Linearization of the Continuous Euclidean Weak Form

Linearization of a weak form living in a vector space is a standard exercise in finite element analysis. With the Gâteaux derivative we obtain the following expression for the Euclidean linearization of the weak form

$$\begin{aligned} \begin{aligned} {\textrm {D}}_{\Updelta \hat{\bar{\varvec{{\Phi }}}}} \hat{{{\bar{G}}}}(\hat{\bar{\varvec{{\Phi }}}},\updelta \hat{\bar{\varvec{{\Phi }}}})&=\int _{{{\mathcal {B}}}_0^C}\underbrace{[{\textrm {D}}_{\Updelta \hat{\bar{\varvec{{\Phi }}}}}{{\mathcal {B}}}\updelta \hat{\bar{\varvec{{\Phi }}}}]}_{\text {geometric part}} \cdot ~{\tilde{\textbf {{\textbf {S}}}}} \\&\quad + \underbrace{{{\mathcal {B}}}\updelta \hat{\bar{\varvec{{\Phi }}}}\cdot [{\textrm {D}}_{\Updelta \hat{\bar{\varvec{{\Phi }}}}}{\tilde{\textbf {{\textbf {S}}}}}]}_{\text {material part}}\,{\textrm {d}}A,\, \Updelta \hat{\bar{\varvec{{\Phi }}}} \in {\mathbb {W}}_{k,p}^{l,q}({\varOmega },M_E). \end{aligned} \end{aligned}$$
(5.39)

The two individual contributions resulting from application of the product rule of differentiation represent the classical separation of the tangent stiffness into a geometric and a material part. In the following derivations we take care of these contributions separately.

5.3.3 Material Part

The material part can be straightforwardly computed as

$$\begin{aligned} \begin{aligned} {[}{{\mathcal {B}}}(\hat{\bar{\varvec{{\Phi }}}})\updelta \hat{\bar{\varvec{{\Phi }}}}]\cdot {\textrm {D}}_{\Updelta \hat{\bar{\varvec{{\Phi }}}}}{\tilde{\textbf {{\textbf {S}}}}}({\textbf {E}}_{\textrm {V}}(\hat{\bar{\varvec{{\Phi }}}}))&= [{{\mathcal {B}}}(\hat{\bar{\varvec{{\Phi }}}})\updelta \hat{\bar{\varvec{{\Phi }}}}]\cdot \frac{{\partial }{\tilde{\textbf {{\textbf {S}}}}}({\textbf {E}}_{\textrm {V}}(\hat{\bar{\varvec{{\Phi }}}}+\epsilon \Updelta \hat{\bar{\varvec{{\Phi }}}}))}{{\partial }\epsilon }\Bigg |_{\epsilon =0} \\&=\updelta \hat{\bar{\varvec{{\Phi }}}}^T_{1\times 6} {{\mathcal {B}}}^T_{6\times 8}\mathbb {C}_{8\times 8} {{\mathcal {B}}}_{8\times 6} \Updelta \hat{\bar{\varvec{{\Phi }}}}_{6\times 1} \end{aligned} \end{aligned}$$
(5.40)

The material tangent moduli can be written in a local Cartesian coordinate system as

$$\begin{aligned} \mathbb {C}_{8\times 8}= \begin{bmatrix} {\left[ \frac{{\partial }^2\hat{{\bar{\psi }}}}{{\partial }\varepsilon _{i}\partial \varepsilon _{j}}\right] }_{3\times 3} &{}\quad {\left[ \frac{{\partial }^2\hat{{\bar{\psi }}}}{{\partial }\varepsilon _{i}\partial {\kappa }_{j}}\right] }_{3\times 3} &{}\quad {\left[ \frac{{\partial }^2\hat{{\bar{\psi }}}}{{\partial }\varepsilon _{i}\partial {\gamma }_{j}}\right] }_{3\times 2} \\ {\left[ \frac{{\partial }^2\hat{{\bar{\psi }}}}{{\partial }{\kappa }_{i}\partial \varepsilon _{j}}\right] }_{3\times 3} &{}\quad {\left[ \frac{{\partial }^2\hat{{\bar{\psi }}}}{{\partial }{\kappa }_{i}\partial {\kappa }_{j}}\right] }_{3\times 3} &{}\quad {\left[ \frac{{\partial }^2\hat{{\bar{\psi }}}}{{\partial }{\kappa }_{i}\partial {\gamma }_{j}}\right] }_{3\times 2} \\ {\left[ \frac{{\partial }^2\hat{{\bar{\psi }}}}{{\partial }{\gamma }_{i}\partial \varepsilon _{j}}\right] }_{2\times 3} &{}\quad {\left[ \frac{{\partial }^2\hat{{\bar{\psi }}}}{{\partial }{\gamma }_{i}\partial {\kappa }_{j}}\right] }_{2\times 3} &{}\quad {\left[ \frac{{\partial }^2\hat{{\bar{\psi }}}}{{\partial }{\gamma }_{i}\partial {\gamma }_{j}}\right] }_{2\times 2} \\ \end{bmatrix}. \end{aligned}$$
(5.41)

5.3.4 Geometric Part

By computing the Gâteaux derivative, the geometric part is obtained as

$$\begin{aligned} \left\{ {\textrm {D}}_{\Updelta \hat{\bar{\varvec{{\Phi }} } } }{{\mathcal {B}}}\updelta \hat{ \bar{ \varvec{{\Phi }} } } \right\} \cdot {\tilde{\textbf {{\textbf {S}}}}} = \left\{ \frac{{\partial }{{\mathcal {B}}}(\hat{\bar{\varvec{{\Phi }}}}+\epsilon \Updelta \hat{\bar{\varvec{{\Phi }}}})}{{\partial }\epsilon }\Big |_{\epsilon =0}\updelta \hat{\bar{\varvec{{\Phi }}}} \right\} \cdot {\tilde{\textbf {{\textbf {S}}}}}, \end{aligned}$$
(5.42)

which, in turn, can be rewritten as

$$\begin{aligned} \begin{aligned} \{{\textrm {D}}_{\Updelta \hat{\bar{\varvec{{\Phi }}}}}{{\mathcal {B}}}\updelta \hat{\bar{\varvec{{\Phi }}}}\} \cdot {\tilde{\textbf {{\textbf {S}}}}}&=\updelta \hat{\bar{\varvec{{\Phi }}}}^T [{{\textbf {k}}}^{\textrm {g}}] \Updelta \hat{\bar{\varvec{{\Phi }}}}\\&= \Updelta \updelta {\varvec{\varepsilon }}\cdot {\tilde{\textbf {{\textbf {N}}}}} +\Updelta \updelta {\varvec{\kappa }}\cdot {\tilde{\textbf {{\textbf {M}}}}} +\Updelta \updelta {\varvec{\gamma }}\cdot {\tilde{\textbf {{{\textbf {Q}}}}}}, \end{aligned} \end{aligned}$$
(5.43)

to implicitly define \({{\textbf {k}}}^{\textrm {g}}\). Furthermore, we used the definitions

$$\begin{aligned} \begin{aligned}&\Updelta \updelta \varepsilon _{{\alpha }{\beta }} =\frac{1}{2}(\updelta {\textbf {a}}_{{\alpha }} \cdot \Updelta {\textbf {a}}_{{\beta }}+ \Updelta {\textbf {a}}_{{\alpha }} \cdot \updelta {\textbf {a}}_{{\beta }}), \\&\Updelta \updelta \kappa _{{\alpha }{\beta }} =\frac{1}{2}(\updelta {\textbf {a}}_{\alpha }\cdot \Updelta {{\textbf {t}}}_{,{\beta }}+ \updelta {\textbf {a}}_{\beta }\cdot \Updelta {{\textbf {t}}}_{,{\alpha }} + \Updelta {\textbf {a}}_{\alpha }\cdot \updelta {{\textbf {t}}}_{,{\beta }} \\&\quad \quad \quad \quad \quad + \Updelta {\textbf {a}}_{\beta }\cdot \updelta {{\textbf {t}}}_{,{\alpha }}+{\textbf {a}}_{\alpha }\cdot \Updelta \updelta {{\textbf {t}}}_{,{\beta }}+ {\textbf {a}}_{\beta }\cdot \Updelta \updelta {{\textbf {t}}}_{,{\alpha }}), \\&\Updelta \updelta {\gamma }_{{\alpha }} = \updelta {\textbf {a}}_{\alpha }\cdot \Updelta {{\textbf {t}}}+ \Updelta {\textbf {a}}_{\alpha }\cdot \updelta {{\textbf {t}}}+{\textbf {a}}_{\alpha }\cdot \Updelta \updelta {{\textbf {t}}}. \end{aligned} \end{aligned}$$
(5.44)

We finally obtain

$$\begin{aligned} \begin{aligned}&{\textrm {D}}\hat{{{\bar{G}}}}(\hat{\bar{\varvec{{\Phi }}}},\updelta \hat{\bar{\varvec{{\Phi }}}}) \cdot \Updelta \hat{\bar{\varvec{{\Phi }}}} \\&\quad = \int _{{{\mathcal {B}}}_0^C}\begin{bmatrix} \updelta {\varvec{\varphi }}\\ \updelta {{\textbf {t}}}\\ \end{bmatrix}^T\left( \underbrace{{{\mathcal {B}}}^T \mathbb {C}{{\mathcal {B}}}}_{\text {material}}+ \underbrace{{{\textbf {k}}}^{\textrm {g}}}_{\text {geometric}}\right) \begin{bmatrix} \Updelta {\varvec{\varphi }}\\ \Updelta {{\textbf {t}}}\\ \end{bmatrix}\,{\textrm {d}}A. \end{aligned} \end{aligned}$$
(5.45)

We stress that discretizing this quantity would yield the Euclidean algebraic stiffness matrix of an extensible director formulation living in \(\mathbb {R}^3\). This would result in a 6-parameter formulation, but without any stiffness associated with the thickness stretch, due to the missing corresponding strain in Eq. (4.9). The reduction to a 5-parameter model is done as follows: We discretize Eqs. (5.45) and (5.38) and extract the Euclidean algebraic Hessian and the Euclidean algebraic residual, respectively. These are inserted into Eqs. (5.26) and (5.28) to obtain the Riemannian algebraic stiffness matrix and the Riemannian algebraic residual.

So far, these derivations are general and no specific director interpolation was introduced. Therefore, we next specify interpolations of the director.

6 Director Interpolation

6.1 Motivation

We want to establish a Euclidean algebraic version of the internal forces of Eq. (5.38) and the tangent stiffness of Eq. (5.45). For this, we first need to establish a consistent interpolation of the inextensible director. Here, consistency means that the interpolation is objective, path independent, invariant with respect to node numbering, free of singularities and of unit length within the domain. Based on this interpolation, we arrive at the algebraic element vectors and matrices in Sect. 7, which can be implemented directly. Furthermore, we show a possible C++ implementation of these quantities in Appendix 7. Before we derive the algebraic element vectors and matrices, however, we discuss properties of interpolation schemes for the director field found in the literature.

6.2 Classical Interpolation Schemes

In the literature there are innumerable ways of interpolating the director in the non-linear Reissner-Mindlin model. Each of them has its own advantages and drawbacks. In order to clarify some underlying issues and their origins, we take a small detour. It is tempting to simply define an angle pair \({\alpha },\beta \) to obtain a parametrization of quantities living on the unit sphere. Inevitably, this comes along with singularities according to the “hairy ball theorem”, see [37, 57, 62, 63, 97]. It is apparently straightforward to identify these angles as degrees of freedom and to apply standard interpolation such that \({\alpha }=\sum _{I=1}^n N^I {\alpha }_I,{\beta }=\sum _{I=1}^n N^I {\beta }_I\), see ([97], Eq. 54) and ([37], Eq. 5.1). The director can then be constructed as \({{\textbf {t}}}={\textbf {R}}({\alpha },{\beta }){{\textbf {t}}}_0\). Unfortunately, using an additive update, such that \({\alpha }_I^{k+1}= {\alpha }_I^k + \Updelta \alpha _I\), leads to a non-objective formulation, since rigid body rotations do not cancel in the strain measures. Similar drawbacks can be found in [38, 79], where an incremental rotation vector \(\Updelta {\varvec{\theta }}\) is interpolated. The incremental quantities live in a linear space, where standard interpolation schemes apply. Because this contradicts the intrinsically non-linear nature of the problem, this construction leads to a non-objective formulation. This is also in line with the results of [26].

The problem can be avoided by constructing nodal directors as \({{\textbf {t}}}_I={\textbf {R}}_I{{\textbf {t}}}_0^I\), and directly interpolating them, \({{\textbf {t}}}=\sum _{I=1}^n N^I {\textbf {R}}_I{{\textbf {t}}}_0 \), instead of the (rotational) degrees of freedom. The rotation matrix \({\textbf {R}}\) is updated multiplicatively as \({\textbf {R}}_I^{k+1} = \Updelta {\textbf {R}}(\Updelta {\alpha }_I,\Updelta {\beta }_I){\textbf {R}}_I^{k}\). This formulation uses incremental degrees of freedom \(\Updelta {\alpha }_I,\Updelta {\beta }_I\), see [30]. The relation between the director at the interpolation point and the nodal degrees of freedom is then

$$\begin{aligned} {{\textbf {t}}}= \sum _{I=1}^n N^I \Updelta {\textbf {R}}(\Updelta {\alpha }_I,\Updelta {\beta }_I){\textbf {R}}_I{{\textbf {t}}}_0. \end{aligned}$$
(6.1)

Equation (6.1) illustrates the complicated dependency of the interpolation scheme on the degrees of freedom.

The formulation is objective and the singularity that comes along with the parametrization of the unit sphere is practically irrelevant, since the iterative changes of the angles \(\Updelta {\alpha }_I,\Updelta {\beta }_I\) are typically small.

However, the procedure leads to a non-compact formulation and involves numerically expensive evaluations of trigonometric functions. Moreover, the interpolation does not conserve the director length. For low order finite elements and fine meshes this is not a big issue, but for higher order elements, which are typically larger, the effect is not only stronger but also degrades the convergence order. This dramatic consequence, which is barely mentioned in the literature, is studied in detail in Sect. 10.2. The director \({{\textbf {t}}}\) can be normalized to remove this problem. However, as a consequence the expressions get even more involved. As an alternative to preserve the director length within the domain, several formulations introduce the director \({{\textbf {t}}}_{GP}\) as a history field at each Gauss point, see [17, 29, 86]. In these formulations, only the increment \(\Updelta {{\textbf {t}}}\) is interpolated from the nodes as \(\Updelta {{\textbf {t}}}_{GP} =\sum _{I=1}^n N^I(\xi ^1,\xi ^2) \Updelta {{\textbf {t}}}_I\). This is then used to update the directors at each Gauss point as follows

$$\begin{aligned} \begin{aligned} {{\textbf {t}}}_{GP}^{k+1}&= \exp _{{{\textbf {t}}}_{GP}^{k}} (\Updelta {{\textbf {t}}}_{GP}) \\&= \cos (||\Updelta {{\textbf {t}}}_{GP}||) {{\textbf {t}}}_{GP}^{k} + \frac{\sin ||\Updelta {{\textbf {t}}}_{GP}|| }{||\Updelta {{\textbf {t}}}_{GP}||} \Updelta {{\textbf {t}}}_{GP} \end{aligned} \end{aligned}$$
(6.2)

or

$$\begin{aligned} {{\textbf {t}}}_{GP}^{k+1} = \Updelta {\textbf {R}}\left( \sum _{I=1}^n N^I \Updelta {\varvec{\theta }}_I\right) {\textbf {R}}_{GP}^k{\textbf {e}}_3. \end{aligned}$$
(6.3)

where \(\Updelta {\varvec{\theta }}_I = \Updelta {{\textbf {t}}}_I \times {{\textbf {t}}}_I\) and \({\textbf {e}}_3={[0,0,1]}^T\). The resulting scheme is non-objective and path dependent in nature, which was also proven by [26]. Additionally, the interpolated increment \(\Updelta {{\textbf {t}}}_{GP}\) is not automatically in the tangent space of \({{\textbf {t}}}_{GP}\), which can also lead to undesired consequences. The drawbacks of some of these formulations are also discussed in [73].
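To make the update of Eq. (6.2) and the remark on the tangent space concrete, the following Python sketch evaluates the exponential map on the unit sphere once for a tangential increment and once for an increment that is not tangential. The function name `exp_map_sphere` and the numerical values are assumptions made for illustration; the small-norm guard is likewise an addition of this sketch.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def exp_map_sphere(t, dt):
    # Eq. (6.2). The result has unit length only if dt lies in the
    # tangent plane at t, i.e. t . dt = 0.
    n = norm(dt)
    s = math.sin(n) / n if n > 1e-12 else 1.0  # limit of sin(x)/x for x -> 0
    return [math.cos(n) * ti + s * di for ti, di in zip(t, dt)]

t_old = [0.0, 0.0, 1.0]

# Increment in the tangent plane of t_old: the update stays on the sphere.
t_new = exp_map_sphere(t_old, [0.3, 0.4, 0.0])

# Increment along t_old (not tangential): the update leaves the sphere.
t_bad = exp_map_sphere(t_old, [0.0, 0.0, 0.5])

print(norm(t_new), norm(t_bad))
```

In this example the second result has a length of \(\cos (0.5)+\sin (0.5)\), i.e. clearly larger than one, which illustrates why an interpolated increment that has left the tangent space is problematic.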

The entire procedure seems to be error prone in terms of singularities, non-objectivity and path dependence. Moreover, the evaluation and linearization can be expensive due to the involved interpolation schemes.

An apparently attractive option to circumvent these drawbacks is to avoid the parametrization of the unit sphere and the introduction of rotation matrices in the first place. This can be done in a simple manner if relations of submanifolds and their embedding are exploited, as proposed in [2] on an abstract level, not related to finite elements. In the following, it is shown how this can be applied to the Reissner-Mindlin model.

6.3 Interpolation, Variation and Linearization of the Midsurface Position Field

At first, we present the quantities of the midsurface interpolation. These can be trivially obtained by the standard interpolation procedure

$$\begin{aligned} \begin{aligned}&{\varvec{\varphi }}^{\textrm {h}}_0 = \sum _{I=1}^{n}N^I {\varvec{\varphi }}_{I,0},\, {\varvec{\varphi }}_{0,{\alpha }}^{\textrm {h}}= {\textbf {A}}_{\alpha }^{\textrm {h}}= \sum _{I=1}^{n}N^I_{,{\alpha }} {\varvec{\varphi }}_{I,0}, \\&{\varvec{\varphi }}^{\textrm {h}}= \sum _{I=1}^{n}N^I {\varvec{\varphi }}_I, \, {\varvec{\varphi }}_{,{\alpha }}^{\textrm {h}}= {\textbf {a}}_{\alpha }^{\textrm {h}}= \sum _{I=1}^{n}N^I_{,{\alpha }} {\varvec{\varphi }}_I, \\&\updelta {\varvec{\varphi }}^{\textrm {h}}= \sum _{I=1}^{n}N^I \updelta {\varvec{\varphi }}_I, \, \updelta {\varvec{\varphi }}_{,{\alpha }}^{\textrm {h}}= \updelta {\textbf {a}}_{\alpha }^{\textrm {h}}= \sum _{I=1}^{n}N^I_{,{\alpha }} \updelta {\varvec{\varphi }}_I, \\&\Updelta \updelta {\varvec{\varphi }}^{\textrm {h}}= 0, \, \Updelta \updelta {\varvec{\varphi }}_{,{\alpha }}^{\textrm {h}}=\Updelta \updelta {\textbf {a}}_{\alpha }^{\textrm {h}}= 0. \end{aligned} \end{aligned}$$
(6.4)

Here, \(N^I(\xi ^1,\xi ^2)= N^I\) is the basis function of node I. Obviously, the linearization of the variation \(\Updelta \updelta {\varvec{\varphi }}^{\textrm {h}}\) of the field \({\varvec{\varphi }}^{\textrm {h}}\) vanishes, since the variation does not depend on the nodal values \({\varvec{\varphi }}_I\). This is different for the director field \({{\textbf {t}}}\), which will be treated next.
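The standard interpolation of Eq. (6.4) can be sketched in Python as follows. The bilinear four-node shape functions on \([-1,1]^2\), the function names and the nodal coordinates are assumptions chosen for brevity; any other set of basis functions, e.g. NURBS, enters in the same way.

```python
def shape_q1(xi, eta):
    # Bilinear shape functions and their parametric derivatives (assumed element type).
    N = [(1 - xi) * (1 - eta) / 4, (1 + xi) * (1 - eta) / 4,
         (1 + xi) * (1 + eta) / 4, (1 - xi) * (1 + eta) / 4]
    dN_1 = [-(1 - eta) / 4, (1 - eta) / 4, (1 + eta) / 4, -(1 + eta) / 4]
    dN_2 = [-(1 - xi) / 4, -(1 + xi) / 4, (1 + xi) / 4, (1 - xi) / 4]
    return N, dN_1, dN_2

def interpolate(weights, nodal_values):
    # sum_I weights[I] * nodal_values[I], evaluated per Cartesian component
    return [sum(w * x[k] for w, x in zip(weights, nodal_values)) for k in range(3)]

# Flat 2 x 2 square element in the x-y plane (made-up nodal positions phi_I)
phi_nodes = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [2.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

N, dN_1, dN_2 = shape_q1(0.0, 0.0)
phi = interpolate(N, phi_nodes)      # midsurface position phi^h
a_1 = interpolate(dN_1, phi_nodes)   # base vector a_1^h
a_2 = interpolate(dN_2, phi_nodes)   # base vector a_2^h
print(phi, a_1, a_2)
```

Since the nodal values enter linearly, the same function `interpolate` applied to nodal variations yields \(\updelta {\varvec{\varphi }}^{\textrm {h}}\) and \(\updelta {\textbf {a}}_{\alpha }^{\textrm {h}}\), and no second-order term arises.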

6.4 Interpolation, Variation and Linearization of the Director Field

In contrast to interpolation of the midsurface position, for the director field we use a non-linear interpolation scheme to ensure unit length. Here, we apply the projection-based approach used in [36, 90]. A similar approach can also be identified in a somewhat involved format in [30].

First, we construct the reference nodal directors \({{\textbf {t}}}_{0,I}\) according to the algorithm proposed in [28]. The reference director field and its spatial derivatives read

$$\begin{aligned} {{\textbf {t}}}_0^{\textrm {h}}= \sum _{I=1}^{n}N^I {{\textbf {t}}}_{0,I}, \quad {{\textbf {t}}}_{0,{\alpha }}^{\textrm {h}}= \sum _{I=1}^{n}N^I_{,{\alpha }} {{\textbf {t}}}_{0,I}. \end{aligned}$$
(6.5)

The algorithm in [28] returns the nodal reference directors \({{\textbf {t}}}_{0,I}\) in such a way that the interpolated directors of Eq. (6.5) at the integration points are as normal as possible to the reference midsurface. Furthermore, the unit length constraint is also only fulfilled approximately. In general, however, the interpolated reference directors at each integration point are neither unit vectors nor are they normal to the midsurface. Still, in this algorithm the error is minimized in a least squares sense. To cure at least the non-unit length of the interpolated reference director, we normalize it and obtain for the reference director field and its spatial derivatives

$$\begin{aligned} \begin{aligned}&{{\textbf {t}}}_0^{\textrm {h}}= \varvec{{{\mathcal {P}}}}_0({{\textbf {w}}}^{\textrm {h}}_0 )=\frac{{{\textbf {w}}}_0^{\textrm {h}}}{||{{\textbf {w}}}_0^{\textrm {h}}||},\quad {{\textbf {w}}}_0^{\textrm {h}}= \sum _{I=1}^{n}N^I {{\textbf {t}}}_{0,I} , \\&{{\textbf {t}}}_{0,{\alpha }}^{\textrm {h}}=\frac{{\partial }\varvec{{{\mathcal {P}}}}_0({{\textbf {w}}}^{\textrm {h}}_0 )}{{\partial }{{\textbf {w}}}^{\textrm {h}}_0 }\frac{{\partial }{{\textbf {w}}}^{\textrm {h}}_0 }{{\partial }\xi ^{\alpha }}= \frac{{\textbf {I}}-{{\textbf {t}}}_0^{\textrm {h}}\otimes {{\textbf {t}}}_0^{\textrm {h}}}{||{{\textbf {w}}}_0^{\textrm {h}}||}\sum _{I=1}^{n}N^I_{,{\alpha }} {{\textbf {t}}}_{0,I}\\&\quad \quad = \varvec{{{\mathcal {P}}}}'_0 \sum _{I=1}^{n}N^I_{,{\alpha }} {{\textbf {t}}}_{0,I}. \end{aligned} \end{aligned}$$
(6.6)

where the derivative of the closest point projection w.r.t. its argument reads

$$\begin{aligned} \begin{aligned} \frac{{\partial }\varvec{{{\mathcal {P}}}}_0({{\textbf {w}}}^{\textrm {h}}_0 )}{{\partial }{{\textbf {w}}}^{\textrm {h}}_0 }&= \frac{{\partial }\frac{{{\textbf {w}}}_0^{\textrm {h}}}{{({{\textbf {w}}}_0^{\textrm {h}}\cdot {{\textbf {w}}}_0^{\textrm {h}})}^{1/2}}}{{\partial }{{\textbf {w}}}^{\textrm {h}}_0 }= \frac{{\textbf {I}}}{||{{\textbf {w}}}_0^{\textrm {h}}||} -\frac{{{\textbf {w}}}_0^{\textrm {h}}\otimes {{\textbf {w}}}_0^{\textrm {h}}}{||{{\textbf {w}}}_0^{\textrm {h}}||^3}\\&= \frac{{\textbf {I}}}{||{{\textbf {w}}}_0^{\textrm {h}}||} -\frac{{{\textbf {t}}}_0^{\textrm {h}}\otimes {{\textbf {t}}}_0^{\textrm {h}}}{||{{\textbf {w}}}_0^{\textrm {h}}||}. \end{aligned} \end{aligned}$$
(6.7)

Thus, the only remaining potential error for the reference interpolation of the director is its angle deviation from the surface normal. Note that the nodal directors do not, in general, have unit length; this is only true for the interpolated director. The treatment of this effect in the nodal update algorithm can be seen in Algo. 2.

Second, we introduce the projection-based interpolation for the current director field as

$$\begin{aligned} \begin{aligned} {{\textbf {w}}}^{\textrm {h}}&= \sum _{I=1}^{n}N^I {{\textbf {t}}}_I, {{\textbf {t}}}^{\textrm {h}}= \varvec{{{\mathcal {P}}}}({{\textbf {w}}}^{\textrm {h}}) = \frac{{{\textbf {w}}}^{\textrm {h}}}{||{{\textbf {w}}}^{\textrm {h}}||}, \\ {{\textbf {t}}}^{\textrm {h}}_{,{\alpha }}&= \frac{{\textbf {I}}-{{\textbf {t}}}^{\textrm {h}}\otimes {{\textbf {t}}}^{\textrm {h}}}{||{{\textbf {w}}}^{\textrm {h}}||} \sum _{I=1}^{n}N^I_{,{\alpha }} {{\textbf {t}}}_I = \varvec{{{\mathcal {P}}}}' \sum _{I=1}^{n}N^I_{,{\alpha }} {{\textbf {t}}}_I . \end{aligned} \end{aligned}$$
(6.8)
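A minimal Python sketch of Eq. (6.8) is given below. For brevity it assumes a single parametric direction with two linear shape functions and two made-up unit nodal directors that are 90 degrees apart; the function names are hypothetical. The derivative obtained from \(\varvec{{{\mathcal {P}}}}'\) can be compared with a central finite difference of the normalized director.

```python
import math

def shape_1d(xi):
    # Linear shape functions on [-1, 1] and their derivatives (assumed for brevity).
    return [(1 - xi) / 2, (1 + xi) / 2], [-0.5, 0.5]

def director(xi, t_nodes):
    N, dN = shape_1d(xi)
    w = [sum(N[I] * t_nodes[I][k] for I in range(2)) for k in range(3)]
    dw = [sum(dN[I] * t_nodes[I][k] for I in range(2)) for k in range(3)]
    w_norm = math.sqrt(sum(c * c for c in w))
    t = [c / w_norm for c in w]                        # t = P(w) = w / ||w||
    t_dw = sum(a * b for a, b in zip(t, dw))
    dt = [(dw[k] - t[k] * t_dw) / w_norm for k in range(3)]  # P' w_,xi
    return t, dt, w_norm

t_nodes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # made-up unit nodal directors

t, dt, w_norm = director(0.0, t_nodes)
print(w_norm)   # length of the unnormalized interpolant at the element center
print(t, dt)
```

At the element center the unnormalized interpolant \({{\textbf {w}}}^{\textrm {h}}\) has length \(\cos (45^\circ )\), so without the projection the director would be shortened by roughly 29 percent in this deliberately coarse example, whereas \({{\textbf {t}}}^{\textrm {h}}\) keeps unit length and its derivative is orthogonal to it.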

In the following, for a more compact notation we omit the superscript \({\textrm {h}}\), denoting discretized quantities. Moreover, the following operators are introduced:

$$\begin{aligned} \begin{aligned} \varvec{{{\mathcal {P}}}}'&= \frac{{\textbf {I}}-{{\textbf {t}}}\otimes {{\textbf {t}}}}{||{{\textbf {w}}}||}, \quad \varvec{{{\mathcal {P}}}}'_0 = \frac{{\textbf {I}}-{{\textbf {t}}}_0 \otimes {{\textbf {t}}}_0 }{||{{\textbf {w}}}_0||}, \\ \varvec{{{\mathcal {Q}}}}_{{\alpha }}&= \frac{1}{||{{\textbf {w}}}||^2} \Big [ ( {{\textbf {t}}}\cdot {{\textbf {w}}}_{,{\alpha }})\left( 3{{\textbf {t}}}\otimes {{\textbf {t}}}- {\textbf {I}}\right) \\&\quad - 2 {{\,\textrm{sym}\,}}({{\textbf {w}}}_{,{\alpha }} \otimes {{\textbf {t}}})\Big ], \\ \varvec{{{\mathcal {S}}}}_{\alpha }&= \frac{1}{||{{\textbf {w}}}||^2}\Big [({{\textbf {t}}}\cdot {\textbf {a}}_{\alpha }) \left( 3 {{\textbf {t}}}\otimes {{\textbf {t}}}-{\textbf {I}}\right) \\&\quad -2{{\,\textrm{sym}\,}}({\textbf {a}}_{\alpha }\otimes {{\textbf {t}}}) \Big ], \\ \varvec{{{\mathcal {W}}}}_{\alpha }^I&=\varvec{{{\mathcal {Q}}}}_{\alpha }N^I + \varvec{{{\mathcal {P}}}}' N^I_{,{\alpha }},\\ \varvec{{{\mathcal {X}}}}_{{\alpha }{\beta }}&= \frac{2{{\,\textrm{sym}\,}}}{||{{\textbf {w}}}||^3}\Big [3({{\textbf {t}}}\cdot {{\textbf {w}}}_{,{\beta }})[{\textbf {a}}_{\alpha }\otimes {{\textbf {t}}}\\&\quad +\frac{1}{2}({\textbf {a}}_{\alpha }\cdot {{\textbf {t}}}) ({\textbf {I}}-5 {{\textbf {t}}}\otimes {{\textbf {t}}})] \\&\quad + 3\left[ \frac{1}{2}({\textbf {a}}_{\alpha }\cdot {{\textbf {w}}}_{,{\beta }}) ({{\textbf {t}}}\otimes {{\textbf {t}}}-\frac{1}{3}{\textbf {I}})\right. \\&\quad +\left. ({\textbf {a}}_{\alpha }\cdot {{\textbf {t}}}) {{\textbf {w}}}_{,{\beta }} \otimes {{\textbf {t}}}\right] -{\textbf {a}}_{\alpha }\otimes {{\textbf {w}}}_{,{\beta }}\Big ]. \end{aligned} \end{aligned}$$
(6.9)

With these operators the variation and linearization of the director quantities can be derived as

$$\begin{aligned} \begin{aligned}&\updelta {{\textbf {t}}}=\sum _{I=1}^{n} \frac{{\partial }{{\textbf {t}}}}{{\partial }{{\textbf {t}}}_I} \updelta {{\textbf {t}}}_I = \varvec{{{\mathcal {P}}}}' \sum _{I=1}^{n}N^I \updelta {{\textbf {t}}}_I,\\&\updelta {{\textbf {t}}}_{,{\alpha }}=\sum _{I=1}^{n} \frac{{\partial }{{\textbf {t}}}_{,{\alpha }}}{{\partial }{{\textbf {t}}}_I} \updelta {{\textbf {t}}}_I =\sum _{I=1}^{n}\varvec{{{\mathcal {W}}}}_{\alpha }^I \updelta {{\textbf {t}}}_I, \\&\Updelta {{\textbf {t}}}=\sum _{I=1}^{n} \frac{{\partial }{{\textbf {t}}}}{{\partial }{{\textbf {t}}}_I} \Updelta {{\textbf {t}}}_I = \varvec{{{\mathcal {P}}}}' \sum _{I=1}^{n}N^I \Updelta {{\textbf {t}}}_I, \\&\Updelta {{\textbf {t}}}_{,{\alpha }}=\sum _{I=1}^{n} \frac{{\partial }{{\textbf {t}}}_{,{\alpha }}}{{\partial }{{\textbf {t}}}_I} \Updelta {{\textbf {t}}}_I =\sum _{I=1}^{n}\varvec{{{\mathcal {W}}}}_{\alpha }^I \Updelta {{\textbf {t}}}_I \\ \end{aligned} \end{aligned}$$
(6.10)

and the linearization of the variation reads

$$\begin{aligned} \begin{aligned} \Updelta \updelta {{\textbf {t}}}&=\sum _{I=1}^{n}\sum _{J=1}^{n} \Updelta {{\textbf {t}}}_J \frac{{\partial }^2{{\textbf {t}}}}{{\partial }{{\textbf {t}}}_J \partial {{\textbf {t}}}_I }\updelta {{\textbf {t}}}_I = \Updelta \updelta t^k {\textbf {e}}_k \\&=\sum _{J=1}^{n}\sum _{I=1}^{n}\updelta t_J^l{({{\mathcal {P}}}'')}^k_{lj} N^J N^I\Updelta t_I^j {\textbf {e}}_k, \\ \Updelta \updelta {{\textbf {t}}}_{,{\alpha }}&=\sum _{I=1}^{n} \sum _{J=1}^{n} \Updelta {{\textbf {t}}}_J \frac{{\partial }^2{{\textbf {t}}}_{,{\alpha }}}{{\partial }{{\textbf {t}}}_J \partial {{\textbf {t}}}_I }\updelta {{\textbf {t}}}_I = {(\Updelta \updelta t_{,{\alpha }})}^k {\textbf {e}}_k \\&=\sum _{I=1}^{n} \sum _{J=1}^{n}\updelta t_J^l [N^I N^J {({{\mathcal {P}}}''')}^k_{lmj}w^m_{,{\alpha }} \\&\quad + {({{\mathcal {P}}}'')}^k_{lj}(N_{,{\alpha }}^I N^J + N^I N^J_{,{\alpha }} ) ]\Updelta t_I^j {\textbf {e}}_k. \end{aligned} \end{aligned}$$
(6.11)

The partial derivatives of the projector onto the unit sphere \(\varvec{{{\mathcal {P}}}}\) are summarized in Appendix 4. Note that the first derivative \(\varvec{{{\mathcal {P}}}}'\) does not coincide with the nodal projection \(P_{{{\textbf {t}}}_I}\). Only its numerator is a projection matrix, so that \({(\varvec{{{\mathcal {P}}}}')}^n= \frac{1}{||{{\textbf {w}}}||^{n-1}}\varvec{{{\mathcal {P}}}}'\) instead of \({(\varvec{{{\mathcal {P}}}}')}^n= \varvec{{{\mathcal {P}}}}'\). Additionally, the quantities of Eq. (6.11) always occur in a scalar product with \({\textbf {a}}_{\alpha }\). They can be rewritten as

$$\begin{aligned} \begin{aligned}&{\textbf {a}}_{\alpha }\cdot \Updelta \updelta {{\textbf {t}}}=\sum _{J=1}^{n}\sum _{I=1}^{n}\updelta {{\textbf {t}}}_J\varvec{{{\mathcal {S}}}}_{\alpha }N^J N^I\Updelta {{\textbf {t}}}_I , \\&{\textbf {a}}_{\alpha }\cdot \Updelta \updelta {{\textbf {t}}}_{,{\beta }} =\sum _{I=1}^{n} \sum _{J=1}^{n}\updelta {{\textbf {t}}}_J[N^J N^I \varvec{{{\mathcal {X}}}}_{{\alpha }{\beta }} \\&\quad \quad \quad \quad \quad \quad + \varvec{{{\mathcal {S}}}}_{\alpha }(N_{,{\beta }}^I N^J + N^I N^J_{,{\beta }} ) ]\Updelta {{\textbf {t}}}_I. \end{aligned} \end{aligned}$$
(6.12)
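The remark on \(\varvec{{{\mathcal {P}}}}'\) can be checked numerically. The following sketch assumes the standard closed form \(\varvec{{{\mathcal {P}}}}'=({\textbf {I}}-{{\textbf {t}}}\otimes {{\textbf {t}}})/||{{\textbf {w}}}||\) for the derivative of the normalization \({{\textbf {t}}}={{\textbf {w}}}/||{{\textbf {w}}}||\). Appendix 4 is not reproduced here, so this form, the helper names and the sample values are assumptions made for illustration only.

```python
import math

def normalize(w):
    """Projection onto the unit sphere: t = w / ||w||."""
    n = math.sqrt(sum(c * c for c in w))
    return [c / n for c in w]

def p_prime(w):
    """Assumed closed form of dt/dw: (I - t t^T) / ||w||."""
    n = math.sqrt(sum(c * c for c in w))
    t = [c / n for c in w]
    return [[((1.0 if i == j else 0.0) - t[i] * t[j]) / n for j in range(3)]
            for i in range(3)]

def fd_jacobian(w, h=1e-6):
    """Central finite-difference approximation of dt/dw, entry [i][j] = dt_i/dw_j."""
    jac = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        wp, wm = list(w), list(w)
        wp[j] += h
        wm[j] -= h
        tp, tm = normalize(wp), normalize(wm)
        for i in range(3):
            jac[i][j] = (tp[i] - tm[i]) / (2.0 * h)
    return jac

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

w = [0.3, -0.4, 0.5]                      # arbitrary non-unit interpolated director
norm_w = math.sqrt(sum(c * c for c in w))
P1 = p_prime(w)                           # analytical first derivative
P1_fd = fd_jacobian(w)                    # numerical reference
P1_sq = matmul(P1, P1)                    # equals P1 / ||w||, not P1 itself
```

In this sketch the analytical and the finite-difference derivative agree, while the square of \(\varvec{{{\mathcal {P}}}}'\) reproduces \(\varvec{{{\mathcal {P}}}}'\) only up to the factor \(1/||{{\textbf {w}}}||\), i.e. \(\varvec{{{\mathcal {P}}}}'\) is not idempotent unless \(||{{\textbf {w}}}||=1\).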

7 Element Vectors and Matrices

In the following, we present the quantities required to obtain the algebraic optimization problem. First, we present the Euclidean quantities and afterwards we apply the base change using \(\varvec{{\Lambda }}_{\varvec{{\Phi }}_I}\) and the projections to obtain the Riemannian quantities.

7.1 Internal Forces and Material Part of the Stiffness Matrix

Using the aforementioned definitions for the interpolation, the Euclidean algebraic strain-displacement operator — resulting from the continuous one from Eq. (5.35) — for a generic node I can be given as

$$\begin{aligned} {{\mathcal {B}}}_I=\begin{bmatrix}\begin{bmatrix} {\textbf {a}}_1^T N^I_{,1} \\ {\textbf {a}}_2^T N^I_{,2} \\ {\textbf {a}}_1^T N^I_{,2}+{\textbf {a}}_2^T N^I_{,1} \\ \end{bmatrix} &{}\quad \!\!\!\!\!{\varvec{0}}_{3\times 3} \\ \begin{bmatrix} {{\textbf {t}}}_{,1}^T N^I_{,1} \\ {{\textbf {t}}}_{,2}^T N^I_{,2} \\ {{\textbf {t}}}_{,1}^T N^I_{,2}+{{\textbf {t}}}_{,2}^T N^I_{,1} \end{bmatrix} &{}\quad \!\!\!\!\!\begin{bmatrix} {\textbf {a}}_1^T \varvec{{{\mathcal {W}}}}_1^I \\ {\textbf {a}}_2^T \varvec{{{\mathcal {W}}}}_2^I \\ {\textbf {a}}_1^T \varvec{{{\mathcal {W}}}}_2^I + {\textbf {a}}_2^T \varvec{{{\mathcal {W}}}}_1^I \end{bmatrix} \\ \begin{bmatrix} {{\textbf {t}}}^T N^I_{,1} \\ {{\textbf {t}}}^T N^I_{,2} \end{bmatrix} &{}\quad \!\!\!\!\!\begin{bmatrix} {\textbf {a}}_1^T N^I \\ {\textbf {a}}_2^T N^I \end{bmatrix}\varvec{{{\mathcal {P}}}}' \\ \end{bmatrix}_{8\times 6} . \end{aligned}$$
(7.1)

With this and the fundamental lemma of variational calculus we can derive the Euclidean algebraic internal forces as

$$\begin{aligned} {\textbf {F}}_\text {int,I}^\text {euk} = \int _{{{\mathcal {B}}}_0^C} {{\mathcal {B}}}_I^T {\textbf {S}}~\,{\textrm {d}}A \end{aligned}$$
(7.2)

and the material part of the Euclidean stiffness matrix as

$$\begin{aligned} {\textbf {K}}_\text {mat,JI}^\text {euk} = \int _{{{\mathcal {B}}}_0^C} {{\mathcal {B}}}_J^T\mathbb {C}{{\mathcal {B}}}_I ~\,{\textrm {d}}A. \end{aligned}$$
(7.3)

7.2 Geometric Part of the Stiffness Matrix

With the linearization of the variation of the strains and the definitions in Eq. (5.44) we obtain the following discrete quantities.

7.2.1 Contribution from Membrane Strains

For the membrane strains we have

$$\begin{aligned} \begin{aligned} \Updelta \updelta {\varvec{\varepsilon }}: {\tilde{\textbf {{\textbf {N}}}}}&=\frac{1}{2}(\updelta {\textbf {a}}_{{\alpha }} \cdot \Updelta {\textbf {a}}_{{\beta }}+ \Updelta {\textbf {a}}_{{\alpha }} \cdot \updelta {\textbf {a}}_{{\beta }}){{\tilde{N}}}^{{\alpha }{\beta }} \\&=\sum _{J=1}^{n}\sum _{I=1}^{n}\updelta {\varvec{\varphi }}_J [{{\tilde{N}}}^{11} N_{,1}^{J} N_{,1}^{I}+ {{\tilde{N}}}^{22} N_{,2}^{J} N_{,2}^{I}\\&\quad + {{\tilde{N}}}^{12}(N_{,1}^{J} N_{,2}^{I}+ N_{,2}^{J} N_{,1}^{I})]{\textbf {I}}_{3\times 3} \Updelta {\varvec{\varphi }}_I \\&=\sum _{J=1}^{n}\sum _{I=1}^{n}\updelta {\varvec{\varphi }}_J {\hat{{\textbf {{\textbf {N}}}}}}^{JI} \Updelta {\varvec{\varphi }}_I, \end{aligned} \end{aligned}$$
(7.4)

using the abbreviation

$$\begin{aligned} \begin{aligned} {\hat{{\textbf {{\textbf {N}}}}}}^{JI}&=[{{\tilde{N}}}^{11} N_{,1}^{J} N_{,1}^{I}+ {{\tilde{N}}}^{22} N_{,2}^{J} N_{,2}^{I}\\&\quad + {{\tilde{N}}}^{12}(N_{,1}^{J} N_{,2}^{I}+ N_{,2}^{J} N_{,1}^{I})]{\textbf {I}}_{3\times 3}. \end{aligned} \end{aligned}$$
(7.5)

The membrane contribution to the geometric stiffness matrix is thus

$$\begin{aligned} \Updelta \updelta {\varvec{\varepsilon }}: {\tilde{\textbf {{\textbf {N}}}}} =\sum _{J=1}^{n}\sum _{I=1}^{n}\begin{bmatrix} \updelta {\varvec{\varphi }}_J \\ \updelta {{\textbf {t}}}_J \\ \end{bmatrix}^T \begin{bmatrix} {\hat{{\textbf {{\textbf {N}}}}}}^{JI} &{} {\varvec{0}}_{3\times 3} \\ {\varvec{0}}_{3\times 3} &{} {\varvec{0}}_{3\times 3} \\ \end{bmatrix}\begin{bmatrix} \Updelta {\varvec{\varphi }}_I \\ \Updelta {{\textbf {t}}}_I \\ \end{bmatrix}. \end{aligned}$$
(7.6)

7.2.2 Contribution from Curvature

For the linearization of the variation of the curvature we get

$$\begin{aligned} \begin{aligned} \Updelta \updelta {\varvec{\kappa }}: {\tilde{\textbf {M}}}&=\frac{1}{2}(\updelta {\textbf {a}}_{\alpha }\cdot \Updelta {{\textbf {t}}}_{,{\beta }}+ \updelta {\textbf {a}}_{\beta }\cdot \Updelta {{\textbf {t}}}_{,{\alpha }}+ \Updelta {\textbf {a}}_{\alpha }\cdot \updelta {{\textbf {t}}}_{,{\beta }}\\&\quad + \Updelta {\textbf {a}}_{\beta }\cdot \updelta {{\textbf {t}}}_{,{\alpha }}+{\textbf {a}}_{\alpha }\cdot \Updelta \updelta {{\textbf {t}}}_{,{\beta }}+ {\textbf {a}}_{\beta }\cdot \Updelta \updelta {{\textbf {t}}}_{,{\alpha }}) {{\tilde{M}}}^{{\alpha }{\beta }} \\&= \frac{1}{2}\sum _{J=1}^{n}\sum _{I=1}^{n}[N^J_{,{\alpha }} \updelta {\varvec{\varphi }}_J \cdot \!\varvec{{{\mathcal {W}}}}_{\beta }^I \Updelta {{\textbf {t}}}_I +N^J_{,{\beta }} \updelta {\varvec{\varphi }}_J \!\cdot \! \varvec{{{\mathcal {W}}}}_{\alpha }^I \Updelta {{\textbf {t}}}_I \\&\quad + N^J_{,{\alpha }} \Updelta {\varvec{\varphi }}_J \!\cdot \!\varvec{{{\mathcal {W}}}}_{\beta }^I \updelta {{\textbf {t}}}_I + N^J_{,{\beta }} \Updelta {\varvec{\varphi }}_J \cdot \varvec{{{\mathcal {W}}}}_{\alpha }^I \updelta {{\textbf {t}}}_I \\&\quad +\updelta {{\textbf {t}}}_J [N^J N^I (\varvec{{{\mathcal {X}}}}_{{\alpha }{\beta }}+\varvec{{{\mathcal {X}}}}_{{\beta }{\alpha }}) \\&\quad + \varvec{{{\mathcal {S}}}}_{\alpha }(N_{,{\beta }}^I N^J + N^I N^J_{,{\beta }} ) \\&\quad + \varvec{{{\mathcal {S}}}}_{\beta }(N_{,{\alpha }}^I N^J + N^I N^J_{,{\alpha }} ) ]\Updelta {{\textbf {t}}}_I] {{\tilde{M}}}^{{\alpha }{\beta }}. \end{aligned} \end{aligned}$$
(7.7)

Using the short cuts

$$\begin{aligned} \begin{aligned}&{{\mathcal {N}}}^{IJ}_1 = (N^I_{,1}N^J+N^J_{,1}N^I), \\&{{\mathcal {N}}}^{IJ}_2 = (N^I_{,2}N^J+N^J_{,2}N^I), \\&{\hat{{\textbf {M}}}}^{JI} =N^J_{,1}\varvec{{{\mathcal {W}}}}_1^I {{\tilde{M}}}^{11}+ N^J_{,2} \varvec{{{\mathcal {W}}}}_2^I {{\tilde{M}}}^{22}\\&\quad \quad \quad \,\, +(N^J_{,1} \varvec{{{\mathcal {W}}}}_2^I + N^J_{,2} \varvec{{{\mathcal {W}}}}_1^I) {{\tilde{M}}}^{12} , \\&{\hat{{\textbf {M}}}}^{IJ} =\varvec{{{\mathcal {W}}}}_1^J N^I_{,1} {{\tilde{M}}}^{11}+\varvec{{{\mathcal {W}}}}_2^J N^I_{,2}{{\tilde{M}}}^{22}\\&\quad \quad \quad \,\, + (\varvec{{{\mathcal {W}}}}_2^J N^I_{,1} +\varvec{{{\mathcal {W}}}}_1^J N^I_{,2}) {{\tilde{M}}}^{12} , \\&{\hat{{\textbf {M}}}}^{JI}_{\varvec{{{\mathcal {X}}}}} =N^J N^I ({{\tilde{M}}}^{11}\varvec{{{\mathcal {X}}}}_{11}+ {{\tilde{M}}}^{22}\varvec{{{\mathcal {X}}}}_{22}\\&\quad \quad \quad \,\, +( \varvec{{{\mathcal {X}}}}_{21}+ \varvec{{{\mathcal {X}}}}_{12}){{\tilde{M}}}^{12}) , \\&{\hat{{\textbf {M}}}}^{JI}_{\varvec{{{\mathcal {S}}}}} = \varvec{{{\mathcal {S}}}}_1 {{\mathcal {N}}}^{IJ}_1 {{\tilde{M}}}^{11}+ \varvec{{{\mathcal {S}}}}_2 {{\mathcal {N}}}^{IJ}_2 {{\tilde{M}}}^{22}\\&\quad \quad \quad \,\, + [\varvec{{{\mathcal {S}}}}_1 {{\mathcal {N}}}^{IJ}_2 + \varvec{{{\mathcal {S}}}}_2 {{\mathcal {N}}}^{IJ}_1] {{\tilde{M}}}^{12}, \end{aligned} \end{aligned}$$
(7.8)

the bending contribution to the geometric stiffness matrix reads

$$\begin{aligned} \begin{aligned} \Updelta&\updelta {\varvec{\kappa }}: {\tilde{\textbf {{\textbf {M}}}}} =&\sum _{J=1}^{n}\sum _{I=1}^{n}\begin{bmatrix} \updelta {\varvec{\varphi }}_J \\ \updelta {{\textbf {t}}}_J \\ \end{bmatrix}^T \begin{bmatrix} {\varvec{0}}_{3\times 3} &{}\quad {\hat{{\textbf {M}}}}^{JI} \\ {\hat{{\textbf {M}}}}^{IJ} &{}\quad {\hat{{\textbf {M}}}}^{JI}_{\varvec{{{\mathcal {X}}}}} + {\hat{{\textbf {M}}}}^{JI}_{\varvec{{{\mathcal {S}}}}} \\ \end{bmatrix}\begin{bmatrix} \Updelta {\varvec{\varphi }}_I \\ \Updelta {{\textbf {t}}}_I \\ \end{bmatrix}. \end{aligned} \end{aligned}$$
(7.9)

7.2.3 Contribution from Transverse Shear

Similarly, we obtain for transverse shear

$$\begin{aligned} \begin{aligned} \Updelta \updelta {\varvec{\rho }}: {\tilde{\textbf {{{\textbf {Q}}}}}}&= [\updelta {\textbf {a}}_{{\alpha }} \cdot \Updelta {{\textbf {t}}}+ \Updelta {\textbf {a}}_{{\alpha }} \cdot \updelta {{\textbf {t}}}+{\textbf {a}}_{{\alpha }} \cdot \Updelta \updelta {{\textbf {t}}}]{{\tilde{Q}}}^{\alpha }\\&= \sum _{J=1}^{n}\sum _{I=1}^{n}[N^J_{,{\alpha }} \updelta {\varvec{\varphi }}_J \cdot \varvec{{{\mathcal {P}}}}' N^I \Updelta {{\textbf {t}}}_I \\&\quad + N^I_{,{\alpha }} \Updelta {\varvec{\varphi }}_I \cdot \varvec{{{\mathcal {P}}}}' N^J \updelta {{\textbf {t}}}_J+\updelta {{\textbf {t}}}_J N^J \varvec{{{\mathcal {S}}}}_{\alpha } N^I \Updelta {{\textbf {t}}}_I]{{\tilde{Q}}}^{\alpha } \\&= \sum _{J=1}^{n}\sum _{I=1}^{n} [\updelta {\varvec{\varphi }}_J \cdot \varvec{{{\mathcal {P}}}}' (N^J_{,1}N^I {{\tilde{Q}}}^{1}+ N^J_{,2}N^I {{\tilde{Q}}}^{2})\Updelta {{\textbf {t}}}_I \\&\quad + \Updelta {\varvec{\varphi }}_I \cdot \varvec{{{\mathcal {P}}}}'( N^I_{,1} N^J {{\tilde{Q}}}^1+ N^I_{,2} N^J{{\tilde{Q}}}^2) \updelta {{\textbf {t}}}_J \\&\quad +\updelta {{\textbf {t}}}_J N^J(\varvec{{{\mathcal {S}}}}_1 {{\tilde{Q}}}^{1} +\varvec{{{\mathcal {S}}}}_2 {{\tilde{Q}}}^{2}) N^I \Updelta {{\textbf {t}}}_I]. \end{aligned} \end{aligned}$$
(7.10)

This can be rearranged using the following short cuts

$$\begin{aligned} \begin{aligned} {\hat{{\textbf {{{\textbf {Q}}}}}}}^{JI}&= \varvec{{{\mathcal {P}}}}' N^I (N^J_{,1}{{\tilde{Q}}}^{1}+ N^J_{,2} {{\tilde{Q}}}^{2}), \\ {\hat{{\textbf {{{\textbf {Q}}}}}}}^{IJ}&= \varvec{{{\mathcal {P}}}}' N^J( N^I_{,1} {{\tilde{Q}}}^1 + N^I_{,2} {{\tilde{Q}}}^2), \\ {\hat{{\textbf {{{\textbf {Q}}}}}}}^{JI}_{\varvec{{{\mathcal {S}}}}}&= N^J N^I(\varvec{{{\mathcal {S}}}}_1 {{\tilde{Q}}}^{1} +{\varvec{{{\mathcal {S}}}}}_2 {{\tilde{Q}}}^{2}). \end{aligned} \end{aligned}$$
(7.11)

The corresponding contribution to the geometric stiffness matrix is

$$\begin{aligned} \Updelta \updelta {\varvec{\rho }}: {\tilde{\textbf {{{\textbf {Q}}}}}}=\sum _{J=1}^{n}\sum _{I=1}^{n}\begin{bmatrix} \updelta {\varvec{\varphi }}_J \\ \updelta {{\textbf {t}}}_J \\ \end{bmatrix}^T \begin{bmatrix} {\varvec{0}}_{3\times 3} &{} {\hat{{\textbf {{{\textbf {Q}}}}}}}^{JI} \\ {\hat{{\textbf {{{\textbf {Q}}}}}}}^{IJ} &{} {\hat{{\textbf {{{\textbf {Q}}}}}}}^{JI}_{\varvec{{{\mathcal {S}}}}} \\ \end{bmatrix}\begin{bmatrix} \Updelta {\varvec{\varphi }}_I \\ \Updelta {{\textbf {t}}}_I \\ \end{bmatrix}. \end{aligned}$$
(7.12)

Finally, the Euclidean geometric stiffness contribution for a pair of nodes I and J is

$$\begin{aligned} \begin{aligned}&{\textbf {K}}_{\text {geo},JI}^\text {euk} = \int _{{{\mathcal {B}}}_0^C} \begin{bmatrix} {\hat{{\textbf {{\textbf {N}}}}}}^{JI} &{} {\hat{{\textbf {M}}}}^{JI}+{\hat{{\textbf {{{\textbf {Q}}}}}}}^{JI} \\ {\hat{{\textbf {M}}}}^{IJ}+{\hat{{\textbf {{{\textbf {Q}}}}}}}^{IJ} &{} {\hat{{\textbf {M}}}}^{JI}_{\varvec{{{\mathcal {X}}}}} + {\hat{{\textbf {M}}}}^{JI}_{\varvec{{{\mathcal {S}}}}}+{\hat{{\textbf {{{\textbf {Q}}}}}}}^{JI}_{\varvec{{{\mathcal {S}}}}} \\ \end{bmatrix}~\,{\textrm {d}}A. \end{aligned} \end{aligned}$$
(7.13)

This results in the total Euclidean algebraic stiffness matrix

$$\begin{aligned} {\textbf {K}}_{JI}^\text {euk} = \begin{bmatrix} \frac{{\partial }^2{\bar{{\varPi }}}}{{\partial }{\varvec{\varphi }}_J \partial {\varvec{\varphi }}_I} &{} \quad \frac{{\partial }^2{\bar{{\varPi }}}}{{\partial }{\varvec{\varphi }}_J \partial {{\textbf {t}}}_I} \\ \frac{{\partial }^2{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_J \partial {\varvec{\varphi }}_I} &{} \frac{{\partial }^2{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_J \partial {{\textbf {t}}}_I} \\ \end{bmatrix}= {\textbf {K}}_{\text {mat},JI}^\text {euk} + {\textbf {K}}_{\text {geo},JI}^\text {euk}, \end{aligned}$$
(7.14)

as needed for the left part in Eq. (5.26). The last missing part for the stiffness matrix is the product in the last part in Eq. (5.26), i.e. \({{\textbf {t}}}_J^T \frac{{\partial }{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_J}\).

Introducing the external potential \({\varPi }^{\text {ext}}\), as defined in Appendix 6, and inserting Eq. (7.2) the Euclidean residual \({\textbf {R}}_I^\text {euk}\) is obtained as

$$\begin{aligned} \begin{aligned} {\textbf {R}}_I^\text {euk}&={{\,\textrm{grad}\,}}_I {\bar{{\varPi }}}(\varvec{{\Phi }}) =\begin{bmatrix} \frac{{\partial }{\bar{{\varPi }}}}{{\partial }{\varvec{\varphi }}_I} \\ \frac{{\partial }{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_I} \\ \end{bmatrix}_{6\times 1} \\&= \begin{bmatrix} {\textbf {F}}_\text {int}^{\text {euk},{\varvec{\varphi }}_I} \\ {\textbf {F}}_\text {int}^{\text {euk},{{\textbf {t}}}_I} \\ \end{bmatrix}-\begin{bmatrix} {\textbf {F}}_\text {ext}^{\text {euk},{\varvec{\varphi }}_I} \\ {\textbf {F}}_\text {ext}^{\text {euk},{{\textbf {t}}}_I} \\ \end{bmatrix}. \end{aligned} \end{aligned}$$
(7.15)

From the results of Appendix 6 we know that the external moment load vector lies in the tangent space of \({{\textbf {t}}}_J\) (for conservative loading) and this results in

$$\begin{aligned} {{\textbf {t}}}_J^T\frac{{\partial }{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_J}={{\textbf {t}}}_J^T \left( {\textbf {F}}_\text {int}^{\text {euk},{{\textbf {t}}}_J}- {\textbf {F}}_\text {ext}^{\text {euk},{{\textbf {t}}}_J} \right) = {{\textbf {t}}}_J^T {\textbf {F}}_\text {int}^{\text {euk},{{\textbf {t}}}_J}, \end{aligned}$$
(7.16)

since \({{\textbf {t}}}_J^T{\textbf {F}}_\text {ext}^{{{\textbf {t}}}_J}=0\). The stiffness matrix thus further simplifies, because it is now independent of the external loads.

We now introduce the tangent base matrix, which combines the tangent base of \(\mathbb {R}^3\), i.e. the identity, with the tangent base \(\varvec{{\Lambda }}_I\) of \({{\mathcal {S}}}^2\) at node I:

$$\begin{aligned} \varvec{{\Lambda }}_{\varvec{{\Phi }}^I}=\begin{bmatrix} {\textbf {I}}_{3\times 3} &{} {\varvec{0}}_{3\times 2} \\ {\varvec{0}}_{3\times 3} &{} \varvec{{\Lambda }}_I \\ \end{bmatrix}_{6\times 5}. \end{aligned}$$
(7.17)

Recalling Eq. (5.26) and plugging in Eqs. (7.3), (7.13) and (7.15) to (7.17) we can derive the reduced Riemannian stiffness matrix

$$\begin{aligned} \begin{aligned} {\textbf {K}}^{JI,\text {riem}}_{5\times 5}&=\underbrace{{(\varvec{{\Lambda }}_{\varvec{{\Phi }}^J}^T)}_{5\times 6} \int _{{{\mathcal {B}}}_0^C} {[{{\mathcal {B}}}^T \mathbb {C}{{\mathcal {B}}}]}_{6\times 6} ~\,{\textrm {d}}A~ {\varvec{{\Lambda }}_{\varvec{{\Phi }}^I}}_{6\times 5}}_{{\textbf {K}}^{JI,\text {eu},\text {riem}}} \\&\quad +\underbrace{{(\varvec{{\Lambda }}_{\varvec{{\Phi }}^J}^T)}_{5\times 6} \int _{{{\mathcal {B}}}_0^C} \begin{bmatrix} {\hat{{\textbf {{\textbf {N}}}}}}^{JI} &{} {\hat{{\textbf {M}}}}^{JI}+{\hat{{\textbf {{{\textbf {Q}}}}}}}^{JI} \\ {\hat{{\textbf {M}}}}^{IJ}+{\hat{{\textbf {{{\textbf {Q}}}}}}}^{IJ} &{} {\hat{{\textbf {M}}}}^{JI}_{\varvec{{{\mathcal {X}}}}} + {\hat{{\textbf {M}}}}^{JI}_{\varvec{{{\mathcal {S}}}}}+{\hat{{\textbf {{{\textbf {Q}}}}}}}^{JI}_{{\mathcal {S}}}\\ \end{bmatrix}_{6\times 6}\!\!\,{\textrm {d}}A~{\varvec{{\Lambda }}_{\varvec{{\Phi }}^I}}_{6\times 5}}_{{\textbf {K}}^{JI,{\textrm {g}},\text {riem}}}- \underbrace{\begin{bmatrix} {\varvec{0}}_{3\times 3} &{} {\varvec{0}}_{3\times 2} \\ {\varvec{0}}_{2\times 3} &{} {{\textbf {t}}}_J^T {\textbf {F}}_{\text {int}}^{\text {euk},{{\textbf {t}}}_J} {\textbf {I}}_{2\times 2} \delta _{IJ} \\ \end{bmatrix}}_{{\textbf {K}}^{JI,\text {g2},\text {riem}}} \end{aligned} \end{aligned}$$
(7.18)

With the reduced discrete Riemannian strain-displacement operator of node I

$$\begin{aligned} \begin{aligned} {{\mathcal {B}}}^\text {riem}_I&= {{\mathcal {B}}}_I \varvec{{\Lambda }}_{\varvec{{\Phi }}^I} \\&=\begin{bmatrix}\begin{bmatrix} {\textbf {a}}_1^T N^I_{,1} \\ {\textbf {a}}_2^T N^I_{,2} \\ {\textbf {a}}_1^T N^I_{,2}+{\textbf {a}}_2^T N^I_{,1} \\ \end{bmatrix} &{}\quad {\varvec{0}}_{3\times 2} \\ \begin{bmatrix} {{\textbf {t}}}_{,1}^T N^I_{,1} \\ {{\textbf {t}}}_{,2}^T N^I_{,2} \\ {{\textbf {t}}}_{,1}^T N^I_{,2}+{{\textbf {t}}}_{,2}^T N^I_{,1} \end{bmatrix} &{}\quad \begin{bmatrix} {\textbf {a}}_1^T \varvec{{{\mathcal {W}}}}_1^I \\ {\textbf {a}}_2^T \varvec{{{\mathcal {W}}}}_2^I \\ {\textbf {a}}_1^T \varvec{{{\mathcal {W}}}}_2^I + {\textbf {a}}_2^T \varvec{{{\mathcal {W}}}}_1^I \end{bmatrix}\varvec{{\Lambda }}_I \\ \begin{bmatrix} {{\textbf {t}}}^T N^I_{,1} \\ {{\textbf {t}}}^T N^I_{,2} \end{bmatrix} &{} \quad \begin{bmatrix} {\textbf {a}}_1^T N^I \\ {\textbf {a}}_2^T N^I \end{bmatrix}\varvec{{{\mathcal {P}}}}'\varvec{{\Lambda }}_I \\ \end{bmatrix}_{8\times 5} \end{aligned} \end{aligned}$$
(7.19)

we obtain the stiffness matrix

$$\begin{aligned} \begin{aligned} {\textbf {K}}^{JI,\text {riem}}_{5\times 5}&= \int _{{{\mathcal {B}}}_0^C} {{\mathcal {B}}}^\text {riem,T}_J \mathbb {C}{{\mathcal {B}}}^\text {riem}_I\\&\quad + \begin{bmatrix} {\hat{{\textbf {{\textbf {N}}}}}}^{JI} &{}\quad \left( {\hat{{\textbf {M}}}}^{JI}+{\hat{{\textbf {{{\textbf {Q}}}}}}}^{JI}\right) \varvec{{\Lambda }}_I \\ \varvec{{\Lambda }}_J^T \left( {\hat{{\textbf {M}}}}^{IJ}+{\hat{{\textbf {{{\textbf {Q}}}}}}}^{IJ}\right) &{}\quad \varvec{{\Lambda }}_J^T \left( {\hat{{\textbf {M}}}}^{JI}_{\varvec{{{\mathcal {X}}}}} + {\hat{{\textbf {M}}}}^{JI}_{\varvec{{{\mathcal {S}}}}}+{\hat{{\textbf {{{\textbf {Q}}}}}}}^{JI}_{{\mathcal {S}}}\right) \varvec{{\Lambda }}_I -{{\textbf {t}}}_J^T {\textbf {F}}_{\text {int}}^{\text {euk},{{\textbf {t}}}_J} {\textbf {I}}_{2\times 2} \delta _{IJ} \\ \end{bmatrix}~\,{\textrm {d}}A. \end{aligned} \end{aligned}$$
(7.20)

The part \({{\textbf {t}}}_J^T {\textbf {F}}_{\text {int}}^{\text {euk},{{\textbf {t}}}_J} {\textbf {I}}_{2\times 2} \delta _{IJ}\) is missing in similar formulations in the literature, with the exception of [86], where it can be found as last part in Eq. (B.5) and in Chapter C.2.4 (v) Geometric-diagonal. Furthermore, we stress that this contribution does not vanish at equilibrium, since only the tangential part of the residual vanishes at equilibrium. Therefore, an eigenvalue analysis to study stability problems does not yield the correct results, if this quantity is neglected. Additionally, it also does not vanish with mesh refinement. We study the influence of this quantity on the number of iterations in Sect. 10.

This additional quantity can be represented more explicitly by expanding the involved products as

$$\begin{aligned} \begin{aligned} {\textbf {K}}^{{\textrm {g}}2,\text {riem}}_{5n \times 5n} = -{{\,\textrm{diag}\,}}[&({{\bar{M}}}^1+{{\bar{Q}}}^1) {\textbf {H}}_{5 \times 5} ,\ldots , ({{\bar{M}}}^n+{{\bar{Q}}}^n) {\textbf {H}}_{5 \times 5} ], \end{aligned} \end{aligned}$$
(7.21)

with

$$\begin{aligned} \begin{aligned}&{{\bar{M}}}^I = {{\textbf {t}}}_I \cdot \int _{{{\mathcal {B}}}_0^C} \varvec{{{\mathcal {W}}}}_1^I {\textbf {a}}_1 {{\tilde{M}}}^{11}+ \varvec{{{\mathcal {W}}}}_2^I {\textbf {a}}_2 {{\tilde{M}}}^{22}\\&\quad \quad \quad +( \varvec{{{\mathcal {W}}}}_2^I {\textbf {a}}_1 + \varvec{{{\mathcal {W}}}}_1^I {\textbf {a}}_2) {{\tilde{M}}}^{12} ~\,{\textrm {d}}A, \\&{{\bar{Q}}}^I = {{\textbf {t}}}_I \cdot \int _{{{\mathcal {B}}}_0^C}\varvec{{{\mathcal {P}}}}' ( {\textbf {a}}_1 {{\tilde{Q}}}^{1}+ {\textbf {a}}_2 {{\tilde{Q}}}^{2} )N^I ~\,{\textrm {d}}A, \\&{\textbf {H}}_{5\times 5} =\begin{bmatrix} {\varvec{0}}_{3\times 3} &{}\quad {\varvec{0}}_{3\times 2} \\ {\varvec{0}}_{2\times 3} &{}\quad {\textbf {I}}_{2\times 2} \end{bmatrix}. \end{aligned} \end{aligned}$$
(7.22)

The interested reader is referred to Appendix 8 for a geometric and physical interpretation of \({\textbf {K}}^{{\textrm {g}}2,\text {riem}}\).
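A toy problem may illustrate why this contribution does not vanish at equilibrium. The sketch below is not the shell formulation: it assumes a hypothetical linear potential \(f({{\textbf {t}}})={\textbf {c}}\cdot {{\textbf {t}}}\) on the unit sphere, whose Euclidean Hessian is zero, and compares the curvature of f along a geodesic (computed by finite differences) with the correction \(-{{\textbf {t}}}\cdot {{\,\textrm{grad}\,}}f\), which plays the role of \(-{{\textbf {t}}}_J^T {\textbf {F}}_{\text {int}}^{\text {euk},{{\textbf {t}}}_J}\). All names and values are chosen for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def exp_map(t, v):
    """Exponential map on the unit sphere, cf. Eq. (8.2)."""
    a = math.sqrt(dot(v, v))
    if a < 1e-14:
        return list(t)
    return [math.cos(a) * ti + math.sin(a) / a * vi for ti, vi in zip(t, v)]

c = [0.0, 0.0, 2.0]            # Euclidean gradient of the toy potential f(t) = c . t

def f(t):
    return dot(c, t)

t = [0.0, 0.0, -1.0]           # minimizer of f on the unit sphere (equilibrium)
v = [1.0, 0.0, 0.0]            # unit tangent vector at t

# Second derivative of f along the geodesic through t in direction v
s = 1e-4
f_plus = f(exp_map(t, [s * x for x in v]))
f_minus = f(exp_map(t, [-s * x for x in v]))
d2f = (f_plus - 2.0 * f(t) + f_minus) / s**2

hess_euclid = 0.0              # v . (Euclidean Hessian) v vanishes, f is linear
correction = -dot(t, c)        # analogue of -t_J^T F_int, nonzero at equilibrium
hess_riem = hess_euclid + correction
```

In this example the projected Euclidean Hessian alone would indicate zero stiffness at the equilibrium point, whereas the geodesic curvature and the corrected value both equal \(||{\textbf {c}}||\). A stability assessment based on the uncorrected matrix would therefore be misleading.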

Furthermore, with Eqs. (7.17) and (7.15) the final Riemannian gradient or residual in the tangent space representation reads

$$\begin{aligned} \begin{aligned} {\textbf {R}}_{J,5\times 1}^\text {riem}&={{\,\textrm{grad}\,}}_J {\varPi }(\varvec{{\Phi }}) = \varvec{{\Lambda }}_{\varvec{{\Phi }}_J}^T P_{\varvec{{\Phi }}_J}{{\,\textrm{grad}\,}}_J {\bar{{\varPi }}}(\varvec{{\Phi }})\\&=\begin{bmatrix} \frac{{\partial }{\bar{{\varPi }}}}{{\partial }{\varvec{\varphi }}_J} \\ \varvec{{\Lambda }}_{{{\textbf {t}}}_J}^T\frac{{\partial }{\bar{{\varPi }}}}{{\partial }{{\textbf {t}}}_J} \\ \end{bmatrix}= \begin{bmatrix} {\textbf {F}}_\text {int}^{\text {euk},{\varvec{\varphi }}_J} \\ \varvec{{\Lambda }}_{{{\textbf {t}}}_J}^T {\textbf {F}}_\text {int}^{\text {euk},{{\textbf {t}}}_J} \\ \end{bmatrix}-\begin{bmatrix} {\textbf {F}}_\text {ext}^{\text {euk},{\varvec{\varphi }}_J} \\ \varvec{{\Lambda }}_{{{\textbf {t}}}_J}^T {\textbf {F}}_\text {ext}^{\text {euk},{{\textbf {t}}}_J} \end{bmatrix}, \end{aligned} \end{aligned}$$
(7.23)

where \(\varvec{{\Lambda }}_{\varvec{{\Phi }}_J}^T P_{\varvec{{\Phi }}_J}=\varvec{{\Lambda }}_{\varvec{{\Phi }}_J}^T\) has been used.
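The reduction from six Euclidean to five tangent-space components per node can be sketched as follows. The construction of the tangent base below is only one of many valid choices and is an assumption; the paper's own construction and update of \(\varvec{{\Lambda }}_I\) may differ. Function names and the sample residual are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def tangent_base(t):
    """Two orthonormal vectors spanning the tangent plane of S^2 at the unit vector t."""
    k = min(range(3), key=lambda i: abs(t[i]))     # axis least aligned with t
    e = [1.0 if i == k else 0.0 for i in range(3)]
    b1 = [e[i] - dot(e, t) * t[i] for i in range(3)]
    n1 = math.sqrt(dot(b1, b1))
    b1 = [c / n1 for c in b1]
    return b1, cross(t, b1)

def reduce_residual(r_euk, t):
    """Map the 6x1 Euclidean nodal residual to the 5x1 tangent-space residual."""
    b1, b2 = tangent_base(t)
    r_phi, r_t = r_euk[:3], r_euk[3:]
    return list(r_phi) + [dot(b1, r_t), dot(b2, r_t)]

t = [0.0, 0.6, 0.8]                               # unit nodal director
r_euk = [1.0, -2.0, 0.5, 0.3, 0.1, -0.7]          # made-up Euclidean residual
r_riem = reduce_residual(r_euk, t)

# Adding any multiple of t to the director part leaves the reduced residual unchanged.
r_shift = r_euk[:3] + [r_euk[3 + i] + 5.0 * t[i] for i in range(3)]
r_riem_shift = reduce_residual(r_shift, t)
```

Since the columns of the tangent base are orthogonal to the director, the component of the Euclidean residual along \({{\textbf {t}}}_J\) is removed automatically, which is why an explicit projection is not needed before applying \(\varvec{{\Lambda }}_{\varvec{{\Phi }}_J}^T\).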

Now, with Eqs. (7.23) and (7.20) we have all ingredients to apply Algo. 1, except the definition of retractions.

8 Retractions

Incremental solution procedures of non-linear problems require the update of the unknown variables. In displacement based methods in mechanics, these variables are the positions of the nodes, which are updated by the displacement increments. This is trivially accomplished by addition of the variables

$$\begin{aligned} {{\textbf {x}}}_{i+1}={{\textbf {x}}}_{i}+\Updelta {{\textbf {x}}}. \end{aligned}$$
(8.1)

Since all occurring variables live in \(\mathbb {R}^n\), this addition is well-defined.

In the following, the generic variable \({{\textbf {x}}}\) denotes an arbitrary quantity living in the manifold M. In our specific application, this quantity is the director \( {{\textbf {t}}}\). In the general case of M being a non-linear manifold, the update of these variables is non-trivial, since the variable \({{\textbf {x}}}\) lives in the manifold M, but the update \(\Updelta {{\textbf {x}}}\) lives in the tangent space \(T_{{\textbf {x}}}M\) at \({{\textbf {x}}}\).

Therefore, a naive addition \({{\textbf {x}}}+\Updelta {{\textbf {x}}}\) results in a quantity which is not an element of M anymore. Hence, we need a retraction, which is an operator \(R_{{\textbf {x}}}(\Updelta {{\textbf {x}}}): T_{{\textbf {x}}}M \rightarrow M\), that maps the tangent vector \(\Updelta {{\textbf {x}}}\) back onto the manifold. This is illustrated in Figs. 5 and 6, for the general case and the case of the unit sphere, respectively.

Fig. 5

Retraction from the tangent space \(T_{{{\textbf {x}}}} M\) back onto the manifold M at the position \(R_{{\textbf {x}}}(\Updelta {{\textbf {x}}})\) with the geodesic curve \({\varvec{\gamma }}\)

The most prominent example of such an operator is the Riemannian exponential map, which maps tangent vectors onto locally uniquely defined geodesic curves \({\varvec{\gamma }}\) of the manifold. Since in most cases the computation of the exponential map is expensive or even not possible in closed form, the relaxed concept of retractions can be used instead, see ([2], Chapter 4.1) or more recently [1]. The notion of retractions goes back to [4]. Luckily, for the unit sphere a closed form of the exponential map is available. For a shell formulation with an inextensible director in the non-linear manifold \({{\mathcal {S}}}^2\) we end up with two feasible retractions. The first retraction is the exponential map \(R_{{\textbf {x}}}^{em}\) of the unit sphere, which maps the quantity from the tangent space onto a geodesic curve. It can be written as

$$\begin{aligned} \begin{aligned} R_{{\textbf {x}}}^{em}(\Updelta {{\textbf {x}}})&= \exp _{{\textbf {x}}}(\Updelta {{\textbf {x}}}) \\&=\cos (||\Updelta {{\textbf {x}}}||){{\textbf {x}}}+\frac{\sin (||\Updelta {{\textbf {x}}}||)}{||\Updelta {{\textbf {x}}}||}\Updelta {{\textbf {x}}}. \end{aligned} \end{aligned}$$
(8.2)

The second possible retraction, namely the projection-based retraction \(R_{{\textbf {x}}}^{pb}\), see [2] or [90], is defined as closest point projection onto the manifold, that is

$$\begin{aligned} \begin{aligned} R_{{\textbf {x}}}^{pb}(\Updelta {{\textbf {x}}})= {{\mathcal {P}}}_{{\textbf {x}}}(\Updelta {{\textbf {x}}})&=\frac{{{\textbf {x}}}+\Updelta {{\textbf {x}}}}{||{{\textbf {x}}}+\Updelta {{\textbf {x}}}||} \\&= \frac{{{\textbf {x}}}+\Updelta {{\textbf {x}}}}{\sqrt{1+\Updelta {{\textbf {x}}}\cdot \Updelta {{\textbf {x}}}}}. \end{aligned} \end{aligned}$$
(8.3)

The projection-based retraction and the Riemannian exponential map coincide up to second order in a Taylor expansion. As shown in ([2], Chapter 4, p. 76) or ([90], Chapter 1.5.1), the Taylor expansions read

$$\begin{aligned} \begin{aligned} R_{{\textbf {x}}}^{em}(\Updelta {{\textbf {x}}})&=\cos (||\Updelta {{\textbf {x}}}||){{\textbf {x}}}+\frac{\sin (||\Updelta {{\textbf {x}}}||)}{||\Updelta {{\textbf {x}}}||}\Updelta {{\textbf {x}}} \\&= {{\textbf {x}}}+\Updelta {{\textbf {x}}}- \frac{||\Updelta {{\textbf {x}}}||^2}{2}{{\textbf {x}}}+ {{\mathcal {O}}}(||\Updelta {{\textbf {x}}}||^3), \\ R_{{\textbf {x}}}^{pb}(\Updelta {{\textbf {x}}})&=\frac{{{\textbf {x}}}+\Updelta {{\textbf {x}}}}{||{{\textbf {x}}}+\Updelta {{\textbf {x}}}||} \\&= {{\textbf {x}}}+\Updelta {{\textbf {x}}}- \frac{||\Updelta {{\textbf {x}}}||^2}{2}{{\textbf {x}}}+ {{\mathcal {O}}}(||\Updelta {{\textbf {x}}}||^3). \end{aligned} \end{aligned}$$
(8.4)

Consequently, for small increments \(||\Updelta {{\textbf {x}}}||\) both retractions (or update schemes) yield similar updated quantities. Nevertheless, in the neighborhood of the solution even a first-order retraction still leads to quadratic convergence of Algo. 1. For a proof we refer to ([20], Theorem 6.5). The projection-based retraction was also used in [43] and was named radial return normalization in [18]. In Sect. 10.2.2 we will show the superiority of this procedure. In order to avoid confusion with the projection-based interpolation of the director, we use the term radial return normalization in the following, in spite of the fact that projection-based retraction appears to be the denomination mostly used today, especially in the mathematical literature [90]. This renaming only makes sense for the case of the unit sphere, since for other manifolds the projection-based retraction is of course not a radial return.
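Both retractions of Eqs. (8.2) and (8.3) can be sketched in a few lines. The function names are hypothetical, and the increment is assumed to lie in the tangent space, i.e. \({{\textbf {x}}}\cdot \Updelta {{\textbf {x}}}=0\) with \(||{{\textbf {x}}}||=1\).

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def retract_exp(x, dx):
    """Exponential map on the unit sphere, Eq. (8.2)."""
    a = norm(dx)
    # sin(a)/a is replaced by its series for tiny a to avoid division by zero
    sinc = math.sin(a) / a if a > 1e-8 else 1.0 - a * a / 6.0
    return [math.cos(a) * xi + sinc * di for xi, di in zip(x, dx)]

def retract_radial(x, dx):
    """Radial return normalization (projection-based retraction), Eq. (8.3)."""
    y = [xi + di for xi, di in zip(x, dx)]
    n = norm(y)
    return [c / n for c in y]

def distance(a, b):
    return norm([ai - bi for ai, bi in zip(a, b)])

x = [0.0, 0.0, 1.0]                        # current unit director
gap_coarse = distance(retract_exp(x, [0.10, 0.0, 0.0]),
                      retract_radial(x, [0.10, 0.0, 0.0]))
gap_fine = distance(retract_exp(x, [0.05, 0.0, 0.0]),
                    retract_radial(x, [0.05, 0.0, 0.0]))
ratio = gap_coarse / gap_fine              # close to 2**3 for a third-order gap
```

Halving the increment reduces the distance between the two updated directors by a factor of roughly eight in this example, which is consistent with both retractions agreeing up to second order.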

Fig. 6

Two different retractions for the unit sphere; red: the exponential map, which maps lines onto great circles, blue: the radial return normalization, which normalizes the vector \({{\textbf {x}}}+\Updelta {{\textbf {x}}}\)

9 Improving the Convergence Properties

We additionally use the mixed integration point (MIP) technique from [52] to further improve the convergence of the equilibrium iteration. This technique can be traced back to [48] and [49], where it is recommended to compute the geometric stiffness matrix in the first few iterations with the stresses of the last converged step. In the MIP method a Hellinger-Reissner functional is introduced at every integration point, with the stresses at the integration points as free variables. These are eliminated via static condensation at integration point level. The corresponding stress values are only used for the computation of the geometric stiffness matrices \({\textbf {K}}^{{\textrm {g}},\text {riem}}\) and \({\textbf {K}}^{{\textrm {g}}2,\text {riem}}\) as

$$\begin{aligned} \begin{aligned} {\textbf {S}}^{\textrm {MIP}}_{k+1}&=\mathbb {C}_{k} \left( {\textbf {E}}_{k}^V+{{\mathcal {B}}}_{k}^{\text {riem}}\Updelta \varvec{{\Phi }}\right) \\&= {\textbf {S}}_{k}^{\textrm {disp}} +\mathbb {C}_{k}{{\mathcal {B}}}_{k}^{\text {riem}}\Updelta \varvec{{\Phi }}, \end{aligned} \end{aligned}$$
(9.1)

but the stresses for the internal forces are computed in the usual way as

$$\begin{aligned} {\textbf {S}}^{\textrm {disp}}_{k+1}=\mathbb {C}_{k+1} {\textbf {E}}_{k+1}^V, \end{aligned}$$
(9.2)

where \({\textbf {E}}^V={[{\varvec{\varepsilon }}^V~{\varvec{\kappa }}^V~{\varvec{\gamma }}^V]}^T\). The subscript k denotes values from the previous iteration. For linear material laws the procedure to obtain the alternative stresses for the geometric stiffness can also be interpreted as a linearized Taylor expansion from the previous iteration to the current one. This reads

$$\begin{aligned} \begin{aligned} {\textbf {S}}_{k+1}^{\textrm {MIP}}&={\textbf {S}}_{k}^{\textrm {disp}} + \frac{{\partial }{\textbf {S}}({\textbf {E}}^V(\varvec{{\Phi }}))}{{\partial }\varvec{{\Phi }}}\Bigg |_{\varvec{{\Phi }}=\varvec{{\Phi }}_k}\Updelta \varvec{{\Phi }} \\&= {\textbf {S}}_{k}^{\textrm {disp}}+ \underbrace{\frac{{\partial }{\textbf {S}}({\textbf {E}}^V)}{{\partial }{\textbf {E}}^V}\Bigg |_{\varvec{{\Phi }}=\varvec{{\Phi }}_k}}_{\mathbb {C}_{k}}\underbrace{\frac{{\partial }{\textbf {E}}^V(\varvec{{\Phi }})}{{\partial }\varvec{{\Phi }}}\Bigg |_{\varvec{{\Phi }}=\varvec{{\Phi }}_k}}_{{{\mathcal {B}}}_{k}^{\text {riem}}}\!\!\Updelta \varvec{{\Phi }}. \end{aligned} \end{aligned}$$
(9.3)

Thus, the current stresses for the geometric tangent are extrapolated from the stresses and the strain-displacement operator of the previous iteration. This leads to significantly fewer iterations and allows much larger load steps. The corresponding improvements are numerically studied in Sect. 10.

For the converged state, where \(\Updelta \varvec{{\Phi }}={\varvec{0}}\), the stresses from the purely displacement based formulation are recovered and therefore at equilibrium the stiffness matrix corresponds to the one obtained with a primal formulation in terms of displacements. Therefore, the MIP technique does not pollute the final result, but only influences the convergence behavior. The reason for this can be found in [52]. The benefit of this method can be interpreted as the circumvention of an iteration locking phenomenon caused by the highly different membrane and bending stiffnesses.
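The difference between Eqs. (9.1) and (9.2) can be made concrete with a scalar toy model. The sketch below is not the shell element: it assumes a single degree of freedom g (a displacement gradient), a Green-Lagrange type strain \(E=g+\frac{1}{2}g^2\) and a constant material stiffness C, all of which are hypothetical choices for illustration.

```python
def strain(g):
    """Toy non-linear strain measure E(g) = g + g^2 / 2."""
    return g + 0.5 * g * g

def b_operator(g):
    """Strain-displacement operator dE/dg of the toy model."""
    return 1.0 + g

def stress_disp(g, C):
    """Displacement-based stress, cf. Eq. (9.2): S = C E(g)."""
    return C * strain(g)

def stress_mip(g_k, dg, C):
    """MIP stress, cf. Eq. (9.1): extrapolation from the previous iterate g_k."""
    return C * (strain(g_k) + b_operator(g_k) * dg)

C = 10.0
g_k, dg = 0.2, 0.1
s_mip = stress_mip(g_k, dg, C)            # used only in the geometric stiffness
s_disp = stress_disp(g_k + dg, C)         # used for the internal forces
gap = s_disp - s_mip                      # equals C * dg^2 / 2 in this toy model
s_mip_converged = stress_mip(g_k, 0.0, C) # coincides with stress_disp(g_k, C)
```

In this toy model the two stresses differ only by a term that is quadratic in the increment, so the difference disappears as the iteration converges.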

10 Numerical Examples and Discussion

10.1 Overview

In the following we want to emphasize various properties of the presented formulation using numerical examples. Completely geometrically non-linear kinematic equations are used in all simulations. First, we compare the chosen projection-based director interpolation (PBFE) to the nodal (NFE) and geodesic (GFE) approach in Sect. 10.2. Furthermore, we study the influence of the MIP scheme as mentioned in Sect. 9 on the number of iterations required for convergence. The influences of the chosen tangent base update scheme and nodal director update scheme are also shown. After this, we study the problem of a doubly curved shell subject to a deformation and a subsequent rigid body rotation to show the objectivity and path independence of the formulation. We also investigate the consequences of neglecting the additional contribution \({\textbf {K}}^{{\textrm {g}}2,\text {riem}}_{5n \times 5n}\), Eq. (7.21), to the geometric stiffness matrix. We proceed with the study of a path following example with branch switching of an L-shaped shell. Finally, a simulation of wrinkling patterns and qualitative comparison to an experimental result demonstrates the applicability of the shell element formulation to more complex problems.

For all simulations, Non-Uniform Rational B-Splines (NURBS) are used as ansatz and test spaces, as proposed in [44]. In the following, we label these spaces by their order and continuity, e.g. “P2C1” denotes quadratic NURBS with \(C^1\)-continuity between elements. The cases P1C0 and P2C0 reproduce the standard Q1 and Q2 finite elements, respectively. Unless stated otherwise, we use the radial return normalization as nodal director update and the incremental vector transport (IncVT) as tangent base update as shown in Algo. 4. As default we use the projection-based finite elements (PBFE). Within the incremental-iterative solution procedure, load control with equidistant load increments is used. Additionally, we use a St.-Venant-Kirchhoff material law. In local Cartesian coordinates, the corresponding material tangent reads

$$\begin{aligned} \mathbb {C}_{8\times 8} ={{\,\textrm{diag}\,}}\left[ {\textbf {D}}_{2\times 2},~G h,~{\textbf {K}}_{2\times 2},~ G h^3 ,~ G h~{\textbf {I}}_{2\times 2}\right] \end{aligned}$$
(10.1)

with

$$\begin{aligned} \begin{aligned} {\textbf {D}}&= \frac{h E}{1-\nu ^2} \begin{bmatrix} 1 &{} \nu \\ \nu &{} 1 \end{bmatrix} , \quad {\textbf {K}}= \frac{h^3 E}{12(1-\nu ^2)} \begin{bmatrix} 1 &{} \nu \\ \nu &{} 1 \end{bmatrix}, \\ G&= \frac{E}{2(1+\nu )}, \end{aligned} \end{aligned}$$
(10.2)

where the vanishing normal stress condition is already enforced. Here, E is Young’s modulus, \(\nu \) Poisson’s ratio and h the shell thickness. A shear correction factor is not applied. No specific measures are taken to avoid membrane locking and transverse shear locking in this context. The focus is on the aforementioned aspects of consistency, efficiency, objectivity and path-independence.

Furthermore, to use a Cartesian material law we construct a Cartesian reference frame \(\theta ^i{\tilde{{\textbf {A}}}}_i\) from \(\xi ^i{\textbf {A}}_i\) identical to ([29], Eq. 9–13). The only change concerns the shape function derivatives: instead of \(N_{,{\alpha }}=\frac{{\partial }N}{{\partial }\xi ^{\alpha }}\), they are now evaluated as \(N_{,{\alpha }}=\frac{{\partial }N}{{\partial }\theta ^{\alpha }}\), which requires the Jacobian \(\frac{{\partial }\xi ^{\beta }}{{\partial }\theta ^{\alpha }}\) between both coordinate frames. Explicitly, this yields \(N_{,{\alpha }}=\frac{{\partial }N}{{\partial }\theta ^{\alpha }}= \frac{{\partial }N}{{\partial }\xi ^{\beta }}\frac{{\partial }\xi ^{\beta }}{{\partial }\theta ^{\alpha }}\). This is done only once before the start of the simulation and the derivative values are stored at the integration points. Additionally, this simplifies the strain measures due to \({\tilde{{\textbf {A}}}}_{\alpha }\cdot {\tilde{{\textbf {A}}}}_{\beta }={\delta }_{{\alpha }{\beta }} \).
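The change of derivatives can be sketched in a few lines of Python. The snippet below assumes that the Cartesian frame is obtained by a Gram-Schmidt orthonormalization of the covariant base vectors; this is an illustrative assumption and not necessarily the construction of [29]. All function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def cartesian_frame(A1, A2):
    """Orthonormal in-plane frame from covariant base vectors (Gram-Schmidt, assumed)."""
    e1 = normalize(A1)
    proj = dot(A2, e1)
    e2 = normalize([a - proj * e for a, e in zip(A2, e1)])
    return e1, e2

def jacobian_xi_theta(A1, A2):
    """Return J[b][a] = d xi^b / d theta^a at one integration point."""
    e1, e2 = cartesian_frame(A1, A2)
    # T[a][b] = d theta^a / d xi^b = e_a . A_b
    T = [[dot(e1, A1), dot(e1, A2)],
         [dot(e2, A1), dot(e2, A2)]]
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    # Inverse of the 2x2 matrix T
    return [[T[1][1] / det, -T[0][1] / det],
            [-T[1][0] / det, T[0][0] / det]]

def to_cartesian_derivatives(dN_dxi, J):
    """Chain rule: dN/dtheta^a = sum_b dN/dxi^b * dxi^b/dtheta^a."""
    return [sum(dN_dxi[b] * J[b][a] for b in range(2)) for a in range(2)]

# Skewed parametrization X = 2*xi1 + xi2, Y = xi2 and the function N = xi1
J = jacobian_xi_theta([2.0, 0.0, 0.0], [1.0, 1.0, 0.0])
dN_dtheta = to_cartesian_derivatives([1.0, 0.0], J)
```

In a simulation, `J` would be evaluated once per integration point and the transformed derivatives stored, as described above.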

10.2 Comparison of Nodal, Projection-Based and Geodesic Interpolation

10.2.1 Theoretical Comparison

In this section, we discuss the differences between three director interpolation schemes, all of which are path independent and objective. In the projection-based approach (PBFE) from Eq. (6.8) the interpolation is

$$\begin{aligned} {{\textbf {t}}}_{{\textrm {P}}{\textrm {B}}}= \frac{\sum _{I=1}^n N^I {{\textbf {t}}}_I}{||\sum _{I=1}^n N^I {{\textbf {t}}}_I||}. \end{aligned}$$
(10.3)

The first alternative is the nodal approach (NFE), in which the director is interpolated without normalization

$$\begin{aligned} {{\textbf {t}}}_{\textrm {N}}= \sum _{I=1}^n N^I {{\textbf {t}}}_I. \end{aligned}$$
(10.4)

This scheme corresponds to the interpolation approaches used in [15, 16, 43, 63, 87], where the inextensibility condition is only fulfilled at the nodes. The aforementioned schemes from literature differ by the definition of the degrees of freedom and the stiffness matrix. Therefore, it may not be possible to directly compare the NFE approach described herein to the mentioned references. The NFE residual and stiffness matrix can be constructed from the PBFE approach by skipping normalization of the director and by replacing \(\varvec{{{\mathcal {P}}}}'={\textbf {I}}, \varvec{{{\mathcal {Q}}}}_{{\alpha }}=\varvec{{{\mathcal {X}}}}_{{\alpha }{\beta }}=\varvec{{{\mathcal {S}}}}_{{\alpha }}={\varvec{0}}\) in all quantities. Compared to PBFE this results in a simpler formulation.
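As a minimal illustration of Eqs. (10.3) and (10.4), the following Python sketch evaluates both interpolations for two nodal directors at an element midpoint. The two-node setting, the shape function values and the 60 degree angle are assumptions chosen only for illustration.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def interpolate_nfe(N, t_nodes):
    """Nodal interpolation, Eq. (10.4): weighted sum without normalization."""
    return [sum(N[I] * t_nodes[I][k] for I in range(len(N))) for k in range(3)]

def interpolate_pbfe(N, t_nodes):
    """Projection-based interpolation, Eq. (10.3): normalized weighted sum."""
    s = interpolate_nfe(N, t_nodes)
    n = norm(s)
    if n == 0.0:
        # e.g. two exactly opposite directors at the midpoint
        raise ValueError("weighted director sum vanishes, projection undefined")
    return [c / n for c in s]

# Two unit directors enclosing 60 degrees, linear shape functions at the midpoint
phi = math.radians(60.0)
t1 = [1.0, 0.0, 0.0]
t2 = [math.cos(phi), 0.0, math.sin(phi)]
N = [0.5, 0.5]

t_nfe = interpolate_nfe(N, [t1, t2])
t_pbfe = interpolate_pbfe(N, [t1, t2])
print("NFE  length:", norm(t_nfe))   # shorter than one inside the element
print("PBFE length:", norm(t_pbfe))  # unit length by construction
```

Both directors point in the same direction here; they differ only in length, which is exactly the inextensibility violation of NFE away from the nodes.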

The third alternative is the geodesic finite element approach (GFE), which uses the interpolation scheme

$$\begin{aligned} \begin{aligned} {{\textbf {t}}}_{\text {GFE}}&={\mathop {\textrm{argmin}}\limits _{{{\textbf {t}}}\in {{\mathcal {S}}}^2}} \sum _{I=1}^n N^I{{\,\textrm{dist}\,}}_{{{\mathcal {S}}}^2}^2({{\textbf {t}}}_I, {{\textbf {t}}})\\&={\mathop {\textrm{argmin}}\limits _{{{\textbf {t}}}\in {{\mathcal {S}}}^2}} \sum _{I=1}^n N^I\arccos ^2({{\textbf {t}}}_I\cdot {{\textbf {t}}}), \end{aligned} \end{aligned}$$
(10.5)

taken from ([74], Eq. 29). The solution of the local minimization problem Eq. (10.5) at each integration point is documented in Appendix 5.
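To give an impression of what such a local minimization involves, the sketch below solves Eq. (10.5) with a simple fixed-point iteration based on the logarithmic and exponential map of the unit sphere. This is a generic textbook scheme used here as an assumption; it is not the algorithm of Appendix 5, and all function names are hypothetical. The ambiguous case of exactly opposite directors is not handled.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_map(t, q):
    """Tangent vector at t pointing towards q, with length dist_S2(t, q)."""
    c = dot(t, q)
    v = [qi - c * ti for qi, ti in zip(q, t)]  # part of q orthogonal to t
    nv = math.sqrt(dot(v, v))
    if nv < 1e-15:  # q coincides with t (antipodal case not handled)
        return [0.0, 0.0, 0.0]
    theta = math.atan2(nv, c)
    return [theta * vi / nv for vi in v]

def exp_map(t, v):
    """Follow the geodesic starting at t in direction v for the length |v|."""
    nv = math.sqrt(dot(v, v))
    if nv < 1e-15:
        return list(t)
    return [math.cos(nv) * ti + math.sin(nv) * vi / nv for ti, vi in zip(t, v)]

def interpolate_gfe(N, t_nodes, tol=1e-14, max_iter=100):
    """Weighted geodesic mean on S^2 for shape function values N summing to one."""
    # Initial iterate: projection-based interpolation
    s = [sum(Ni * tI[k] for Ni, tI in zip(N, t_nodes)) for k in range(3)]
    n = math.sqrt(dot(s, s))
    t = [c / n for c in s]
    for _ in range(max_iter):
        logs = [log_map(t, tI) for tI in t_nodes]
        # Descent direction of the objective in the tangent space at t
        g = [sum(Ni * l[k] for Ni, l in zip(N, logs)) for k in range(3)]
        if math.sqrt(dot(g, g)) < tol:
            break
        t = exp_map(t, g)
    return t

# Two directors enclosing 90 degrees, shape function values 0.25 and 0.75
t_gfe = interpolate_gfe([0.25, 0.75], [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
angle_from_t1 = math.atan2(t_gfe[2], t_gfe[0])
```

For two nodes the minimizer lies on the connecting geodesic at the arc length fraction given by the second shape function, which the example reproduces.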

We obtain the values \(\frac{{\partial }{{\textbf {t}}}_{\text {GFE}}}{{\partial }{{\textbf {t}}}_I}\) and \(\frac{{\partial }^2{{\textbf {t}}}_{\text {GFE}}}{{\partial }{{\textbf {t}}}_I\partial {{\textbf {t}}}_J}\) for the Euclidean gradient and Euclidean Hessian, respectively, using the forward mode of the automatic differentiation tool Autodiff [50]. Here, we first solve the director interpolation minimization problem using the double number format. After this, we use the converged director as initial iterate and repeat the iteration in the dual2nd number format to obtain the derivatives. With the converged director as predictor, this calculation mostly takes only one or two iterations in dual2nd format to converge to machine precision. For higher order interpolations, e.g. P4C3, the algorithm sometimes needs up to four iterations to converge. The idea to first converge the values and then differentiate them can be found in ([32], Ch. 4). Furthermore, in [32] and [13] the convergence to the correct derivatives for Newton methods is proven.
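The principle of converging first and differentiating afterwards can be demonstrated on a scalar toy problem. The following sketch uses a hand-written first-order dual number class and Newton's method for \(x^2=a\); it does not use the API of Autodiff [50], and it is restricted to first derivatives, whereas the dual2nd format also carries second derivatives.

```python
import math

class Dual:
    """First-order dual number val + der*eps with eps**2 = 0 (illustrative only)."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.val * o.der + self.der * o.val)

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual.lift(other)
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / (o.val * o.val))

def newton_step(x, a):
    """One Newton step for f(x) = x*x - a; works for floats and Dual numbers."""
    return x - (x * x - a) / (2.0 * x)

a = 2.0

# Phase 1: converge the value in plain floating point arithmetic
x = 1.0
for _ in range(30):
    x = newton_step(x, a)

# Phase 2: a single step in dual arithmetic, seeded with da/da = 1
xd = newton_step(Dual(x, 0.0), Dual(a, 1.0))
dx_da = xd.der  # derivative of sqrt(a) with respect to a
```

Since the value is already converged, the dual iteration only has to propagate the derivative information, which is why very few iterations suffice.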

The differences between the three interpolation schemes are first explained on a theoretical basis. Similar to the GFE definition, the nodal and the projection-based approach can be reformulated as closed form solutions of minimization problems. The three approaches can thus be represented as

$$\begin{aligned} \begin{aligned} {{\textbf {t}}}_{{\textrm {N}}{\textrm {F}}{\textrm {E}}}&= \sum _{I=1}^n N^I {{\textbf {t}}}_I \\&= {\mathop {\textrm{argmin}}\limits _{{{\textbf {t}}}\in {\mathbb {R}}^{3}}} \sum _{I=1}^n N^I{{\,\textrm{dist}\,}}^2_{{\mathbb {R}}^{3}}({{\textbf {t}}}_I, {{\textbf {t}}}) \\&= {\mathop {\textrm{argmin}}\limits _{{{\textbf {t}}}\in {\mathbb {R}}^{3}}} \sum _{I=1}^n N^I|| {{\textbf {t}}}_I- {{\textbf {t}}}||^2, \\ {{\textbf {t}}}_{{\textrm {P}}{\textrm {B}}{\textrm {F}}{\textrm {E}}}&= \frac{\sum _{I=1}^n N^I {{\textbf {t}}}_I}{||\sum _{I=1}^n N^I {{\textbf {t}}}_I||} \\&= {\mathop {\textrm{argmin}}\limits _{{{\textbf {t}}}\in {{{\mathcal {S}}}}^{2}}} \sum _{I=1}^n N^I{{\,\textrm{dist}\,}}^2_{{\mathbb {R}}^{3}}({{\textbf {t}}}_I, {{\textbf {t}}}) \\&= {\mathop {\textrm{argmin}}\limits _{{{\textbf {t}}}\in {{{\mathcal {S}}}}^{2}}} \sum _{I=1}^n N^I|| {{\textbf {t}}}_I- {{\textbf {t}}}||^2, \\ {{\textbf {t}}}_{\text {GFE}}&= {\mathop {\textrm{argmin}}\limits _{{{\textbf {t}}}\in {{{\mathcal {S}}}}^{2}}} \sum _{I=1}^n N^I{{\,\textrm{dist}\,}}^2_{{{\mathcal {S}}}^{2}}({{\textbf {t}}}_I, {{\textbf {t}}}) \\&= {\mathop {\textrm{argmin}}\limits _{{{\textbf {t}}}\in {{{\mathcal {S}}}}^{2}}} \sum _{I=1}^n N^I\arccos ^2({{\textbf {t}}}_I\cdot {{\textbf {t}}}). \end{aligned} \end{aligned}$$
(10.6)

These abstract definitions can be interpreted geometrically as shown in Fig. 7, see also [76] for a discussion of interpolation on manifolds. One can see that the nodal approach violates the unit length condition in the domain. For GFE and PBFE this constraint is exactly satisfied, since the minimization takes place on the manifold. Since GFE and PBFE satisfy this constraint exactly, both of them suffer from the following problem: Consider two directors that point in exactly opposite directions. The geodesic that connects them is not unique and the interpolation is ambiguous. This problem only arises for very coarse meshes and even then we consider this event rare. Therefore, we argue that this is an interesting issue but a theoretical one. For the interested reader, we refer to Chapter 4 of [73]. Furthermore, GFE also uses the distance measure of the manifold and therefore it is fully intrinsic since it does not depend on a particular embedding. The hybrid nature of the projection-based approach also becomes obvious, as it relies on the distance measure of the embedding space but the minimization is solved for values on the manifold. In particular, loosely speaking, PBFE inherits the simplicity of NFE and the accuracy of GFE.
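The closed-form solution stated for PBFE in Eq. (10.6) can be verified with a short calculation. Assuming unit nodal directors, the partition of unity \(\sum _{I=1}^n N^I=1\) and \(\sum _{I=1}^n N^I {{\textbf {t}}}_I\ne {\varvec{0}}\), the objective function evaluated for \(||{{\textbf {t}}}||=1\) reads

$$\begin{aligned} \sum _{I=1}^n N^I|| {{\textbf {t}}}_I- {{\textbf {t}}}||^2 = \sum _{I=1}^n N^I\left( ||{{\textbf {t}}}_I||^2 + ||{{\textbf {t}}}||^2\right) - 2\, {{\textbf {t}}}\cdot \sum _{I=1}^n N^I {{\textbf {t}}}_I = 2 - 2\, {{\textbf {t}}}\cdot \sum _{I=1}^n N^I {{\textbf {t}}}_I. \end{aligned}$$

Minimizing over \({{\mathcal {S}}}^2\) is therefore equivalent to maximizing \({{\textbf {t}}}\cdot \sum _{I=1}^n N^I {{\textbf {t}}}_I\) over all unit vectors, and by the Cauchy-Schwarz inequality the maximum is attained by the normalized sum. Without the unit length constraint, setting the gradient \(-2\sum _{I=1}^n N^I({{\textbf {t}}}_I-{{\textbf {t}}})\) to zero directly yields the NFE interpolation.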

Fig. 7 Graphical comparison of interpolation between two nodal directors \({{\textbf {t}}}_1,{{\textbf {t}}}_2\), using the schemes of Eq. (10.6). The blue line indicates the distance measure used and the red dashed line indicates the space in which the minimization problem is formulated, i.e. the space in which the interpolated director lives

In contrast to nodal interpolation, projection-based interpolation and geodesic interpolation are not exactly integrated by Gauss quadrature even for the simple case of \( N^I(\xi ^1,\xi ^2)\) being Lagrange polynomials. This is because the interpolation formula of PBFE involves irrational fractions of the ansatz functions, whereas for GFE the dependence of the director \({{\textbf {t}}}\) on \(\xi ^1,\xi ^2 \) cannot even be expressed in closed form.

Nevertheless, in the numerical examples we use Gauss integration that matches the order of the NURBS function spaces used, since numerical studies revealed that increasing the number of quadrature points leaves the results practically unchanged.

In [36] numerical evidence is given that for the case of the unit sphere geodesic finite elements are superior to the projection-based formulation in terms of \({\textrm {h}}\)-convergence. However, in ([36], Ch. 5.1) the authors also mention that the projection-based approach is 10 times as fast and therefore an overall superiority of PBFE can be deduced in terms of computational efficiency. As already mentioned, this is due to the implicit definition of the geodesic interpolation, which leads to a small minimization problem at each integration point, see [75], whereas for projection-based finite elements an explicit formula is available. The implicit definition of GFE also necessitates automatic differentiation for the derivatives, which explains the major speed difference. We stress that the obtained convergence rates for \({\textrm {h}}\)-refinement are not backed up by theory, since no error estimates for this type of energy are available. Moreover, since refraining from interpolating the directors directly can lead to non-objective results, difficulties in conserving the total angular momentum arise in dynamic simulations. Therefore, Simo et al. refrained in [87] from the involved interpolations of [86], which used history fields at the integration points, and simply stuck to NFE to be able to conserve total angular momentum. A similar reasoning for rods can be found in [69]. We hypothesize that this also holds for PBFE and GFE. In the following example we finally compare the performance of the different interpolations applied to the Reissner-Mindlin shell.

10.2.2 Pure Bending of a Straight Beam

Fig. 8 Deformed and undeformed configuration of the bending of a straight beam, including boundary conditions and parameters

In order to compare the interpolation schemes NFE, PBFE and GFE, we consider the example of pure bending of a straight beam. The straight beam is clamped at one end and a moment load is applied at the opposite free end, see Fig. 8. The moment that is needed to coil the beam to a full circle is obtained as follows: Since the deformed configuration is a full circle, the geometry of the final deformed configuration is known. This can be used to obtain the analytic (Green-Lagrangian) strain function through the cross section. In combination with the material law from Eq. (10.1) we can derive the stresses, which we integrate through the cross section to obtain the stress resultants, using Eq. (A.8). The normal force and the transverse shear force must be identically zero. From these constraints, the correct moment load can be derived as

$$\begin{aligned} M = \frac{2 EI \pi }{L}\left( 1-\frac{2}{3}\frac{ h^2 \pi ^2}{L^2}\right) = M^{\text {lin}}(1-\frac{2}{3}\frac{h^2 \pi ^2}{L^2}), \end{aligned}$$
(10.7)

with \(I=bh^3/12\). This moment load can then be translated to a moment line load at the corresponding edge, see Fig. 8. Furthermore, we point out that this is in line with the value \(M^{\text {lin}}=2\pi EI/L\) found in the literature [22, 63, 86]. This solution assumes a linear stress-strain relation using Biot strains. It is also valid for very thin beams. Additionally, if we do not neglect the quadratic part \(\rho _{{\alpha }{\beta }}\) in the strains in Eq. (4.9) the factor of the difference term in M changes from 2/3 to 1.
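Eq. (10.7) is easily evaluated in code. The sketch below uses hypothetical values for \(E\), \(b\), \(h\) and \(L\), since the parameters of Fig. 8 are not repeated in the text, and it assumes that the moment line load is obtained by distributing \(M\) uniformly over the width \(b\).

```python
import math

def coiling_moment(E, b, h, L):
    """Return (M, M_lin) of Eq. (10.7) for coiling a straight beam into a full circle."""
    I = b * h ** 3 / 12.0
    M_lin = 2.0 * math.pi * E * I / L
    correction = 1.0 - (2.0 / 3.0) * (h * math.pi / L) ** 2
    return M_lin * correction, M_lin

# Hypothetical parameters in consistent units (not taken from Fig. 8)
E, b, h, L = 1000.0, 1.0, 0.1, 12.0
M, M_lin = coiling_moment(E, b, h, L)
m_line = M / b  # moment per unit length of the loaded edge (assumed uniform)
print("M / M_lin =", M / M_lin)
```

For thin beams the ratio approaches one, so the correction term only matters for large ratios \(h/L\).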

Fig. 9 Pure bending of a straight beam with \(h=1\,\hbox {cm}\), comparison of \({\textrm {h}}\)-convergence and runtime in seconds for different interpolation schemes and different orders of interpolation

Comparison of Interpolations

All results for this example are obtained using the IncVT update for the tangent space of the director, which is explained in Appendix 3 in Algorithm 4. As director update we use the radial return normalization. This combination allows the moment that turns the straight beam into a full circle to be applied in a single load step. The iteration of the load step is considered converged if the Euclidean norm of the displacement correction drops below \(1 \times 10^{-12}\).

The different interpolations are tested for several polynomial orders and an increasing element count for a thickness of \(h=1\hbox {cm}\). The results are summarized in Fig. 9. The diagrams on the left show the absolute error of the tip rotation plotted versus the number of elements in length direction. More precisely, for one of the directors at the tip \({{\textbf {t}}}^{\text {tip}}\) the quantity \(|\pi -{{\,\textrm{atan2}\,}}( -t^{\text {tip}}_3,t^{\text {tip}}_1 )| \) is computed. The diagrams on the right show the runtime needed for evaluation of all element matrices, accumulated over all iterations.
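The error measure for the tip rotation translates directly into code. In the following sketch the function name and the test director are hypothetical; the director is constructed such that its rotation falls short of \(\pi \) by \(10^{-6}\).

```python
import math

def tip_rotation_error(t_tip):
    """Absolute error |pi - atan2(-t3, t1)| for a tip director (t1, t2, t3)."""
    t1, _, t3 = t_tip
    return abs(math.pi - math.atan2(-t3, t1))

# Hypothetical tip director whose rotation angle is pi - 1e-6
theta = math.pi - 1.0e-6
t_tip = (math.cos(theta), 0.0, -math.sin(theta))
err = tip_rotation_error(t_tip)
```

Note that `atan2` has its branch cut at \(\pm \pi \), so a director that overshoots the target angle is mapped to an angle close to \(-\pi \) and would need to be unwrapped before evaluating the error.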

All interpolation schemes require the same number of iterations to satisfy the convergence criterion. Additionally, we use up to 3600 elements, if the absolute tip rotation error is above \(1 \times 10^{-7}\). When trying to obtain errors smaller than \(1\times 10^{-8}\) by introducing more elements, we witnessed numerical issues due to the conditioning of the stiffness matrix. This results in a diminishing convergence order and the error cannot be further reduced. These results are not reported in the figures.

Figure 9a and b show that for the case of linear Q1 elements the convergence behavior is practically identical for all three interpolation schemes. This indicates that omitting director normalization, as it is proposed by many authors, is in fact acceptable for linear elements. The runtimes shown in Fig. 9b reveal that the GFE approach is significantly more expensive than both PBFE and NFE. The latter two exhibit almost identical performance, with a marginal advantage for the NFE approach.

For second order elements with \(C^1 \)-continuity (P2C1) the results are quite different, see Fig. 9c and d. For the NFE approach the underlying Newton-Raphson process does not converge for all discretizations, even if the process is subdivided into 100 increments. For sufficiently fine meshes convergence is achieved, but the resulting curve indicates only quadratic \({\textrm {h}}\)-convergence, while both the GFE and PBFE approach achieve convergence within a single load step and the curves indicate cubic \({\textrm {h}}\)-convergence. Both schemes yield similar results but PBFE is again superior in terms of runtime, as seen in Fig. 9d. The inferior behavior of the NFE approach can be attributed to the fact that for higher order elements the geometric effect of not normalizing the director is more pronounced.
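Convergence orders such as "quadratic" or "cubic" can be read off from two successive refinement levels. The sketch below shows the standard estimate; the error values are manufactured from a power law purely to exercise the function and are not results of this study.

```python
import math

def observed_order(n_coarse, e_coarse, n_fine, e_fine):
    """Estimate p in e ~ C * n**(-p) from errors on two meshes with n elements."""
    return math.log(e_coarse / e_fine) / math.log(n_fine / n_coarse)

# Manufactured error sequence e = 2 * n**(-3), for illustration only
errors = {n: 2.0 * n ** -3 for n in (10, 20, 40)}
p_1 = observed_order(10, errors[10], 20, errors[20])
p_2 = observed_order(20, errors[20], 40, errors[40])
print(p_1, p_2)
```

In a log-log plot of the error over the number of elements, this estimate corresponds to the negative slope between two neighboring data points.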

Further elevation of the polynomial order to \(p=4\) and \(C^3\)-continuity increases these effects, see Fig. 9e and f. Here, for the NFE approach, the Newton-Raphson process does not converge for an element count below 20. Furthermore, the convergence order remains quadratic for NFE, in contrast to PBFE and GFE, for which the curves indicate a quintic convergence order. Moreover, the PBFE and GFE approach start to differ: for fine meshes, the absolute error in the tip rotation obtained with GFE is smaller by almost an order of magnitude. Comparing runtimes, however, the PBFE approach clearly outperforms GFE, although it can be expected that GFE will eventually surpass NFE also in terms of runtime for very small errors due to the degeneration of convergence order of the latter.

In summary, these results strongly indicate a superiority of PBFE to GFE in terms of overall efficiency. However, they do of course not prove it. The investigations here only deal with the one-dimensional interpolation quality since the deformation is uni-axial. In [36] there is a comparison for \({{\mathcal {S}}}^2\) harmonic maps up to a polynomial order of three, but there the differences in \({\textrm {h}}\)-convergence are small. Additionally, we refer the interested reader to [76], where “Abbildung 4” provides a nice visualization of the PBFE and GFE interpolation schemes. Furthermore, the difference between PBFE and GFE in terms of computational speed is reported in the literature as a 10-fold difference [36]. These results are obtained with ADOL-C [33], which uses a mixed forward and backward mode. In contrast to this, we witnessed for our experiments a 100-fold speed difference using the forward mode of [50], which indicates that this is not the best choice to differentiate the geodesic director interpolation algorithm. Nevertheless, the overall picture remains unchanged concerning the speed of the different interpolations. Furthermore, since automatic differentiation is not always an option and we expect the differences in accuracy of PBFE and GFE to only play a role for high interpolation orders, we still recommend PBFE.

Fig. 10 Pure bending of a straight beam with \(h=1\,\hbox {cm}\), comparison of \({\textrm {h}}\)-convergence and runtime in seconds for different interpolation schemes and \(p=2\) and \(C^0\)-continuity

Fig. 11 Pure bending of a straight beam, normalized tip rotation for different polynomial orders and director interpolation schemes

Figure 10a and b show results for a \(p=2,C^0\) ansatz basis in order to study the case of not exploiting the highest possible continuity of \(C^{p-1}\). The same effects as in Fig. 9 occur, except for the problem of non-convergence of the NFE approach for coarse meshes. In particular, it must be stressed that also here the convergence order of the nodal approach is not optimal.

The specific behavior of the NFE approach for higher order interpolation can be explained by regarding plots of the normalized tip rotation versus the number of degrees of freedom, Fig. 11. Apparently, NFE overestimates the deformation and reacts too softly. This behavior results from the fact that for higher polynomial orders the dimensions of individual elements are larger for a given total number of degrees of freedom. Therefore, the differences of the nodal directors also increase and consequently the deviation from unit length of the interpolated director increases. In particular, this results in directors being too short and an artificial thinning of the shell. This, in turn, reduces bending stiffness, which leads to the mentioned artificial softening. In contrast to this, PBFE and GFE strictly converge from the “stiff” side even for the case of P4C3, since they always exactly preserve the unit length of the interpolated director.
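The shortening of the director can be quantified for the simplest setting. Assuming two unit nodal directors \({{\textbf {t}}}_1,{{\textbf {t}}}_2\) that enclose an angle \(\varphi \in [0,\pi ]\) and shape function values \(N^1=N^2=\frac{1}{2}\), the nodal interpolation of Eq. (10.4) has the length

$$\begin{aligned} ||{{\textbf {t}}}_{\textrm {N}}|| = \frac{1}{2}||{{\textbf {t}}}_1+{{\textbf {t}}}_2|| = \sqrt{\frac{1+\cos \varphi }{2}} = \cos \frac{\varphi }{2} \le 1. \end{aligned}$$

For small angles the deviation from unit length is approximately \(\varphi ^2/8\), so it grows quadratically with the difference of the nodal directors. This two-node estimate is only meant as an illustration; for higher order ansatz functions more than two directors contribute at each point.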

Table 3 Pure bending of a straight beam; iteration count for several combinations of tangent base update and geometric tangent stiffness; \(p=5\), 50 elements, \(C^4\) continuity, convergence threshold for the norm of displacement correction \(=1 \times 10^{-11}\)

Comparison of Director Update and Tangent Base Update

The example of pure bending of a straight beam is now used to study the influence of using the MIP method for the geometric stiffness and to compare three different update procedures for the tangent base of the director with regard to the required number of iterations. A fine discretization of 50 \(5^\textrm {th}\) order elements is used in order to minimize the effect of locking. Only one single load step is taken for coiling the beam to a full circle. The two incremental approaches, incremental vector transport (IncVT, Algo. 4) and incremental parallel transport (IncPT, Algo. 3), as well as the stereographic projection (SP, Algo. 5) are taken into account in combination with a pure displacement scheme (disp) and the MIP scheme for calculation of the geometric stiffness. All results are obtained using projection-based interpolation (PBFE) as well as radial return normalization \(R_{{\textbf {x}}}^{\text {pb}}\).

The results are shown in Table 3 for various values of the beam thickness. For a thick beam (\(h=2.0\,\hbox {cm}\)) applying the MIP method reduces the number of required iterations from 29 to 16 for all versions of the tangent base update. The results remain independent of the update procedure down to a thickness of \(h=0.01\,\hbox {cm}\), but the absolute number of iterations increases significantly for the standard geometric stiffness, while along with MIP the required number of iterations is constantly equal to 16.

Overall, no significant and systematic difference between the incremental approaches (IncVT, IncPT) and the stereographic one (SP) can be observed. For \(h=0.01\,\hbox {cm}\) and larger the results are exactly identical. For extremely thin structures, some differences are observed, however, with no distinct trend in favor of one of the three versions. During the numerical studies it has been found that different results are obtained when using direct solvers. This indicates that arithmetic effects, probably related to the conditioning of the system of equations, are significant. Therefore, no conclusions concerning the effects of the tangent base algorithm can be drawn for the extremely thin cases.

The convergence behavior of the Newton-Raphson scheme observed for this example is superior to that of formulations found in literature. In [28] 5 load steps are needed for convergence and in [93] 125 load steps are needed with 715 accumulated iterations.

Fig. 12 Pure bending of a straight beam; deformed geometry after the first iteration; left: exponential map \(R^{em}_{{\textbf {t}}}(\Updelta {{\textbf {t}}})= \exp _{{\textbf {t}}}(\Updelta {{\textbf {t}}})\), right: radial return normalization \(R^{\text {pb}}_{{\textbf {t}}}(\Updelta {{\textbf {t}}})= {{\mathcal {P}}}_{{\textbf {t}}}(\Updelta {{\textbf {t}}})\). The plot of the surface, the control polygons and control points are obtained using [89]

When using the exponential map \(R_{{\textbf {x}}}^{\text {em}}\) no convergence can be achieved in only one load step for this example. The dramatic failure of the exponential map procedure for this specific setting can be explained by examining the deformed geometry after the first iteration, which corresponds to the linear solution. The diagrams in Fig. 12 show the deformed center line along with the nodal directors for the exponential map (left) and the radial return normalization (right). Obviously, the directors are far from being normal to the midsurface for the former case, even including self-penetration of the material, while for the latter the rotated directors and the deformed center line match quite well. The reason is that the radial return normalization approach naturally limits the rotations within one iteration to \(< 90^{\circ }\). In fact, the underlying problem is almost linear in the rotations but highly non-linear in the midsurface displacements. The exponential map yields the correct solution of the rotations in one iteration but the mid-surface displacements lag behind and, in combination, yield an unphysical state with self-penetration. This explains the divergence using the exponential map.

In summary, for the director update the normalization procedure appears to be superior to the exponential map due to the following reasons:

  • The radial return update is cheaper due to the easy normalization, circumventing trigonometric functions.

  • The Taylor expansions of both procedures coincide up to second order, which yields similar results for small updates, see Eq. (8.4).

  • For large increments the normalization seems to be more robust due to the natural limitation of the rotation increments within one iteration, which can be incoherent with the mid-surface displacements.
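The different behavior of the two director updates for large increments can be reproduced with a few lines of Python. The sketch assumes that the radial return normalization has the form \(({{\textbf {t}}}+\Updelta {{\textbf {t}}})/||{{\textbf {t}}}+\Updelta {{\textbf {t}}}||\) for a tangent increment \(\Updelta {{\textbf {t}}}\perp {{\textbf {t}}}\); the function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def exp_map_update(t, dt):
    """Exponential map on S^2: rotates t by the angle |dt| towards dt."""
    n = math.sqrt(dot(dt, dt))
    if n == 0.0:
        return list(t)
    return [math.cos(n) * ti + math.sin(n) * di / n for ti, di in zip(t, dt)]

def radial_return_update(t, dt):
    """Radial return normalization (assumed form): add the increment, then normalize."""
    s = [ti + di for ti, di in zip(t, dt)]
    n = math.sqrt(dot(s, s))
    return [c / n for c in s]

def rotation_angle_deg(t, t_new):
    c = max(-1.0, min(1.0, dot(t, t_new)))
    return math.degrees(math.acos(c))

t = [0.0, 0.0, 1.0]
dt = [2.0, 0.0, 0.0]  # large tangent increment of length 2

angle_em = rotation_angle_deg(t, exp_map_update(t, dt))        # 2 rad
angle_pb = rotation_angle_deg(t, radial_return_update(t, dt))  # atan(2)
print(angle_em, angle_pb)
```

For a tangent increment of length \(||\Updelta {{\textbf {t}}}||\) the exponential map rotates the director by exactly this angle, whereas the radial return rotates it by \(\arctan ||\Updelta {{\textbf {t}}}||\), which stays below \(90^{\circ }\) for any finite increment.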

Table 4 shows the iteration history for this example with a thickness of \(h=1\,\hbox {cm}\), confirming quadratic convergence of the Riemannian Newton-Raphson scheme for both the displacement-based and the MIP approach.

Table 4 Pure bending of a straight beam; iteration history of Euclidean residual norm and Euclidean norm of displacement correction; \(p=5\), 50 elements, \(C^4\) continuity, thickness \(h=1\,\hbox {cm}\)

10.3 Deformation and Subsequent Rigid Body Rotation of a Doubly Curved Freeform Surface

Fig. 13 System and boundary conditions of the free form surface

Inspired by numerical examples from [45] and [46] we consider a curved shell structure undergoing a rigid body rotation, while a constant force is acting on the shell. The purpose of this study is to evaluate the objectivity of the presented formulation. Again, we restrict this investigation to the projection-based interpolation (PBFE). A doubly curved free form surface is used to obtain a challenging setup, since several formulations exist for which objectivity is obtained only for the special case of plane reference configurations. Geometry and material data are taken from [28].

The system and boundary conditions are presented in Fig. 13. The bottom edge is simply supported and all other edges are free. In [28] the problem data are given without units. At the top edge a line load \(p_Y=10\) per unit length in global Y-direction is prescribed. Material data are given as Young’s modulus \(E=1.2\cdot 10^6\) and Poisson’s ratio \(\nu =0.3\), and the shell thickness is \(h=0.1\). We study the objectivity on the coarsest mesh, because this is the most challenging case, using three cubic B-Spline elements per direction. This configuration, including the directors at the control points, is shown in Fig. 14.

Fig. 14 Coarsest free form surface representation with control polygons, control points and directors at the control points. The plot of the surface, the control polygons and control points are obtained using [36]

The results obtained with the present formulation are compared with the full and reduced integrated four-node shell elements S4 and S4R of the finite element software ABAQUS [88]. The two different meshes used are shown in Fig. 15.

First, we apply the load \(p_Y\). The resulting Y-displacements of point A for different methods and discretizations are summarized in Table 5. Obviously, the displacements are severely underestimated with the present formulation and a coarse discretization with cubic elements. This can be explained by the effect of locking, mainly membrane locking, which has to be expected in this purely displacement-based approach. This, however, does not compromise the discussion of objectivity.

Fig. 15 Meshes of ABAQUS computation

Table 5 Displacement \(u_y^A\) after applying the load \(p_Y\) and before starting the rotation

After this, the free form surface is subjected to a rigid body rotation around the Y-axis, see Fig. 16, by applying inhomogeneous Dirichlet boundary conditions at the bottom edge. The rotation is applied in increments of \(22.5^{\circ }\) and a total of 100000 load steps are applied, resulting in 6250 full turns. For the simulations using the present approach, the corresponding equilibrium iteration is considered as converged if the Euclidean norm of the displacement correction drops below \(1\times 10^{-12}\). For the simulation with ABAQUS we changed the ABAQUS control routine values \(R_n^{\alpha }, C_n^{\alpha }\) from default (0.005, 0.01) to (\(1 \times 10^{-12}\),\(1\times 10^{-12}\)), which usually leads to a termination of the iterations when the maximum entry in the displacement corrections drops at least below \(1 \times 10^{-8}\). Nevertheless, no major difference in the results was obtained using the ABAQUS default values.
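The prescribed support motion can be sketched as follows. The snippet assumes that the rotation axis is the global Y-axis through the origin and that the supported points are undisplaced before the rotation starts; both are assumptions made for illustration, and the function names and the sample point are hypothetical.

```python
import math

def rotate_about_y(X, phi):
    """Rotate the point X = (x, y, z) by the angle phi about the global Y-axis."""
    c, s = math.cos(phi), math.sin(phi)
    x, y, z = X
    return [c * x + s * z, y, -s * x + c * z]

def prescribed_displacement(X, step, dphi_deg=22.5):
    """Dirichlet value u = R(step * dphi) X - X for a support point X."""
    phi = math.radians(step * dphi_deg)
    x = rotate_about_y(X, phi)
    return [xi - Xi for xi, Xi in zip(x, X)]

n_steps = 100000
n_turns = n_steps * 22.5 / 360.0  # number of full turns

X_support = [1.0, 0.0, 2.0]       # hypothetical point on the bottom edge
u_quarter = prescribed_displacement(X_support, 4)   # 90 degrees
u_full = prescribed_displacement(X_support, 16)     # one full turn
```

After 16 increments the prescribed displacement returns to zero up to round-off, so any remaining change in the computed deformation stems from the formulation itself.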

Fig. 16 Snapshots of the deformed and rotated free form surface. Snapshots are taken for every second load step, which represents a support rotation of \({\varphi }_Y=45^{\circ }\)

Fig. 17 Deviation of Y-displacement of point A in Fig. 13 during rotation

Figure 17 shows the absolute changes of the Y-displacement of point A (which is supposed to remain constant during rotation), plotted versus the number of turns. For the proposed formulation the relative error is bounded by \(1 \times 10^{-12}\), which underlines the objectivity of the formulation. This property is backed up by theory, see ([36], Ch. 1.3). Since the deformation does not depend on the number of turns it also confirms path independence of the formulation. In contrast to this, the results obtained with the S4 and S4R element of ABAQUS indicate non-objectivity. The absolute errors are several orders of magnitude larger and, more importantly, these errors do not diminish with mesh refinement. For the S4R element the non-objective error even increases. Nevertheless, after every full turn the original displacement is obtained, which indicates path independence.

Influence of the second geometric stiffness contribution

This numerical example studies the influence of the non-standard contribution to the geometric stiffness matrix \({\textbf {K}}^{{\textrm {g}}2,\text {riem}}_{5n \times 5n}\) (Eq. (7.21)) on the iteration history. Its effect can be interpreted as preventing an artificial geometric stiffness with respect to the nodal director change, see Appendix 8. As one can see in Eq. (7.21), only moments and transverse shear forces contribute to this geometric stiffness. Furthermore, it only contributes to diagonal terms of the stiffness matrix. Departing from these algebraic facts, we reuse the example of Sect. 10.3 to show the influence in a simulation. To increase the moment stress resultants we increase the load \(p_Y\) to 30. This load is applied for different mesh sizes in three equidistant load steps. After the load has been applied, a prescribed rotation is imposed with a magnitude of \(11.25^{\circ }\) within each load step. These load steps are considered converged if the Euclidean norm of the displacement correction drops below \(1 \times 10^{-11}\).

The resulting iteration counts are summarized in Table 6. Also the effect of whether or not the MIP method is used is documented. It is apparent that this stiffness quantity saves between one and three iterations. This is true for both load cases, \(p_Y\) and support rotation.

Table 6 Influence of using \({\textbf {K}}^{{\textrm {g}}2,\text {riem}}_{5n \times 5n}\) on the number of iterations required for the doubly curved shell problem from Fig. 13. The MIP method is not used here

10.4 Out-of-Plane Buckling of an L-Shaped Plate

This example is designed to demonstrate the capability of the proposed formulation to follow complex load-displacement characteristics. For this we use the L-shaped plate example, which can be found in [9, 78, 81, 86, 97]. The L-shaped plate and the corresponding boundary conditions are shown in Fig. 18. The material parameters are set to \(E=71240\,\hbox {N}/\hbox {m}\hbox {m}^{2}\) and \(\nu = 0.31\). The thickness is \(h=0.6\,\hbox {mm}\). The left end is fully clamped and the upper edge A-B is subject to a line load \(q_X= {\lambda }\frac{1}{30}~\hbox {N}/\hbox {mm}\). In this definition of the load, we follow [78], although in most publications a point load at point A or B or at the center of the edge is applied, which gives rise to a singularity. We use a discretization with bi-quadratic \(C^0\)-continuous ansatz functions. Figure 18 shows a mesh with 63 elements; in addition a uniformly refined mesh with 2800 elements is used.

Fig. 18 L-shaped plate, problem setup and mesh with \(9\times 3\), \(3\times 3\) and \(9 \times 3\) elements (63 elements in total)

An arclength method is used for path following. Accompanying eigenvalue analyses are used to check for critical points. As soon as a negative eigenvalue is observed, the corresponding critical load factor \({\lambda }_{\text {crit}}\) and eigenvector \({\varvec{\phi }}\) are identified via extended systems [98, 99]. For the given problem, these belong to a bifurcation point, with the eigenvector indicating the buckling mode. Our results — also including the cases of point loads at A and B — are compared to results from the literature in Table 7. Here, for better comparability, a finite element mesh with 99 elements (\(15 \times 3\) elements for each of the longer rectangular patches) is used.

For branch switching, the eigenvector is then used as predictor in the next load step in order to follow the secondary equilibrium path and to study the post-buckling behavior. For the extended systems the directional derivative \(\frac{\textrm {d} }{\textrm {d} \epsilon } {\textbf {K}}(\varvec{{\Phi }}+\epsilon {\varvec{\phi }})|_{\epsilon =0}\) of the stiffness matrix is needed, where \({\varvec{\phi }}\in T_{\varvec{{\Phi }}} M^n\). For this we use the lifted function (see Footnote 7) to evaluate the derivative in the tangent space. To obtain this derivative within machine precision, we use a complex step derivative [54].
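The complex step idea can be illustrated with a minimal sketch. It assumes a flat vector space instead of the manifold \(M^n\), so no lifted function is involved, and it uses a hypothetical analytic \(2\times 2\) toy matrix `stiffness` in place of the shell stiffness matrix. The directional derivative is then obtained as \(\text {Im}\,{\textbf {K}}(\varvec{{\Phi }}+ \textrm {i} h {\varvec{\phi }})/h\). Since no difference of nearly equal numbers is formed, the step size \(h\) can be chosen extremely small.

```python
import cmath


def stiffness(x):
    """Hypothetical analytic 2x2 toy matrix; it only stands in for K(Phi)."""
    x0, x1 = x
    return [[cmath.exp(x0) + x1 * x1, x0 * x1],
            [x0 * x1, cmath.cos(x1) + 3.0 * x0]]


def complex_step_directional_derivative(K, x, phi, h=1e-30):
    """Approximate d/d(eps) K(x + eps*phi) at eps = 0 by Im(K(x + i*h*phi)) / h."""
    # Perturb the argument along phi in the imaginary direction only.
    x_pert = [xi + 1j * h * pi for xi, pi in zip(x, phi)]
    K_pert = K(x_pert)
    # The imaginary part carries the first-order change; no subtraction occurs.
    return [[entry.imag / h for entry in row] for row in K_pert]


x = [0.3, -0.7]    # point of evaluation (arbitrary illustrative values)
phi = [1.0, 2.0]   # direction, playing the role of the eigenvector
dK = complex_step_directional_derivative(stiffness, x, phi)
print(dK)
```

The approach requires that every operation inside `K` accepts complex arguments and is analytic. Operations such as absolute values or comparisons on the perturbed quantities would break it.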

Fig. 19 Z-displacements of point A and B for different discretizations with a zoomed-in subplot of the \(\lambda \)-range from 1.15 to 1.34. The curves are obtained using the distributed load \(q_X\)

Fig. 20 Deformed and undeformed configuration, including nodal directors. The deformed configuration snapshot is obtained for \(\lambda =2.35\) and applying the distributed load \(q_X\)

Figure 19 shows the convergence of the load-displacement curves of points A and B and their corresponding out-of-plane Z-displacement. Both the critical load and the post-buckling path converge with mesh refinement. The deformed structure, including the nodal directors, is shown in Fig. 20 for a load value of \(\lambda =2.35\). Figure 21 shows the convergence of the critical load factor for all three discussed load cases. The refinement factor k refers to the multiplicity of elements per edge compared to the coarsest mesh shown in Fig. 18.

Fig. 21 Convergence of the critical load factor \({\lambda }_{\text {crit}}\) for the three load cases. The first point on the left corresponds to a mesh with 63 elements and the last to a mesh with 2800 elements

Table 7 Literature results of the critical load. Expanded version of ([78], Table 2). The question marks indicate where the load application point is not directly clear to the authors

10.5 Shearing of a Rectangular Sheet

In this last example we demonstrate the performance of the formulation for a more challenging setup. Similar to Example 6.4 in [78] we simulate an experiment of Wong and Pellegrino [95]. The experiment consists of a rectangular sheet with dimensions \(380\,\hbox {mm} \times 128\,\hbox {mm}\) and a thickness of \(0.025\,\hbox {mm}\). This results in an extremely large slenderness of \(15\,200\) if the thickness is compared to the longer sides of the rectangular sheet. The material properties mentioned in [95] are \(E=3500\,\hbox {N}/\hbox {m}\hbox {m}^{2}\) as Young’s modulus and \(\nu =0.31\) as Poisson’s ratio. The two short vertical edges are free and the two long horizontal edges are fully clamped. These boundary conditions and the mesh are shown in Fig. 22. In the experiment the upper edge is moved horizontally by \({\delta }_h=3\,\hbox {mm}\) and during the process a different number of wrinkles occur and vanish due to mode jumping. Wong and Pellegrino mention a small vertical prestress before the experiment is started but no numbers are given. We use an initial vertical displacement of \({\delta }_v=0.05\,\hbox {mm}\), which was also used in the simulations in [96].

Due to the slender geometry, it is extremely demanding, if at all possible, to compute the problem using an arc-length method. Wong and Pellegrino therefore resort to a stabilization technique available in the commercial code ABAQUS [88] as “pseudo dynamic solution procedure”. As an alternative, we solve this challenging problem similarly to Sander et al. [78] using a Riemannian trust-region method, see [2] for details. This method guarantees global convergence [2], which makes it possible to solve the problem by applying the displacement in one single load step.

In short, a standard Newton-Raphson scheme (Algo. 1) can be interpreted as a method to find the stationary points of the second order model \(m_{\varvec{{\Phi }}_k}\) of the true energy \({\varPi }\) in a neighborhood of \(\varvec{{\Phi }}_{k}\). To construct the next update \(\Updelta \varvec{{\Phi }}\) in the solution process the incremental Newton-Raphson problem can be formulated as

$$\begin{aligned} \begin{aligned}&m_{\varvec{{\Phi }}_k}(\Updelta \varvec{{\Phi }}) = {\varPi }(\varvec{{\Phi }}_k) + {\textbf {R}}^\text {riem}(\varvec{{\Phi }}_k)\cdot \Updelta \varvec{{\Phi }}+ \frac{1}{2} \Updelta \varvec{{\Phi }} \cdot ({\textbf {K}}^\text {riem} \Updelta \varvec{{\Phi }}) \\&{{\,\textrm{stat}\,}}_{\Updelta \varvec{{\Phi }}} ~ m_{\varvec{{\Phi }}_k}(\Updelta \varvec{{\Phi }}) \implies \Updelta \varvec{{\Phi }} = -{({\textbf {K}}^\text {riem})}^{-1} {\textbf {R}}^\text {riem} . \end{aligned} \end{aligned}$$
(10.8)

With this, the nodal values are updated and the new model around \(\varvec{{\Phi }}_{k+1}\) is constructed. Since this method returns a stationary point of the model in every iteration, which is not necessarily a minimum, the energy can increase between iterations. As a consequence, the Newton-Raphson iteration can diverge. The trust-region method instead trusts this model \(m_{\varvec{{\Phi }}_k}(\Updelta \varvec{{\Phi }})\) in Eq. (10.8) of the true energy \({\varPi }\) only in a region around the center \(\varvec{{\Phi }}_k\) and restricts the length of the update such that \(||\Updelta \varvec{{\Phi }}||_{\textbf {M}}< \Updelta _{{\textrm {T}}{\textrm {R}}}\). Additionally, the update \(\Updelta \varvec{{\Phi }}\) is constrained to yield a decreasing energy value. This yields the constrained minimization problem

$$\begin{aligned} \begin{aligned} \min _{\Updelta \varvec{{\Phi }}, \text { s.t. }||\Updelta \varvec{{\Phi }}||_{\textbf {M}}< \Updelta _{{\textrm {T}}{\textrm {R}}}} m_{\varvec{{\Phi }}_k}(\Updelta \varvec{{\Phi }}) , \end{aligned} \end{aligned}$$
(10.9)

where \(||{{\textbf {x}}}||_{\textbf {M}}= \sqrt{{{\textbf {x}}}^T {\textbf {M}}{{\textbf {x}}}}\) denotes the norm with respect to a generic preconditioner \({\textbf {M}}\). Several ways to solve this trust-region subproblem of Eq. (10.9) exist in the literature, see [25]. We choose the preconditioned Steihaug-Toint truncated conjugate gradient method using an incomplete Cholesky decomposition (Eigen::IncompleteCholesky) of the Eigen library [39] as preconditioner \({\textbf {M}}\). If the minimum of the model lies on the boundary of the trust region, this method does not always return the exact minimizer but only a step with sufficient energy decrease. Furthermore, depending on how much the algorithm trusts the current model \(m_{\varvec{{\Phi }}_k}\), the trust-region radius \(\Updelta _{{\textrm {T}}{\textrm {R}}}\) is adjusted in every iteration.
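The structure of such a solver can be sketched for a small problem in a flat vector space. This is an illustration only and not the implementation used for the results in this work. It rests on several assumptions: a diagonal (Jacobi) preconditioner replaces the incomplete Cholesky decomposition, the acceptance threshold `eta` and the radius update constants 0.25, 0.75 and 2 are common textbook choices, the constraint is treated as \(\le \) instead of \(<\), and the double-well functions `energy`, `gradient` and `hessian` are hypothetical test functions. No retraction onto a manifold is performed.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def steihaug_toint(g, H, m_diag, radius, tol=1e-12, max_iter=50):
    """Approximately minimise g.s + 0.5*s.H.s subject to ||s||_M <= radius.

    m_diag is the diagonal of a positive preconditioner M (assumption).
    Returns the step and a flag telling whether it ends on the boundary."""
    def mdot(u, v):
        return sum(m * a * b for m, a, b in zip(m_diag, u, v))

    s = [0.0] * len(g)
    r = list(g)                                  # gradient of the model at s
    if math.sqrt(dot(r, r)) < tol:
        return s, False
    y = [ri / mi for ri, mi in zip(r, m_diag)]   # preconditioned residual
    d = [-yi for yi in y]                        # first search direction
    for _ in range(max_iter):
        Hd = matvec(H, d)
        dHd = dot(d, Hd)
        if dHd > 0.0:
            alpha = dot(r, y) / dHd
            s_new = [si + alpha * di for si, di in zip(s, d)]
        if dHd <= 0.0 or math.sqrt(mdot(s_new, s_new)) >= radius:
            # Negative curvature or step leaving the region:
            # follow d up to the boundary ||s + tau*d||_M = radius.
            a, b, c = mdot(d, d), mdot(s, d), mdot(s, s) - radius ** 2
            tau = (-b + math.sqrt(b * b - a * c)) / a
            return [si + tau * di for si, di in zip(s, d)], True
        r_new = [ri + alpha * hi for ri, hi in zip(r, Hd)]
        if math.sqrt(dot(r_new, r_new)) < tol:
            return s_new, False
        y_new = [ri / mi for ri, mi in zip(r_new, m_diag)]
        beta = dot(r_new, y_new) / dot(r, y)
        d = [-yi + beta * di for yi, di in zip(y_new, d)]
        s, r, y = s_new, r_new, y_new
    return s, False


def trust_region_minimize(energy, gradient, hessian, x, radius=1.0,
                          max_radius=10.0, eta=0.1, tol=1e-8, max_iter=100):
    for _ in range(max_iter):
        g = gradient(x)
        if math.sqrt(dot(g, g)) < tol:
            break
        H = hessian(x)
        # Jacobi stand-in for the incomplete Cholesky preconditioner.
        m_diag = [max(abs(H[i][i]), 1e-8) for i in range(len(x))]
        s, on_boundary = steihaug_toint(g, H, m_diag, radius)
        predicted = -(dot(g, s) + 0.5 * dot(s, matvec(H, s)))  # model decrease
        x_trial = [xi + si for xi, si in zip(x, s)]
        rho = (energy(x) - energy(x_trial)) / predicted        # trust in the model
        if rho < 0.25:
            radius *= 0.25                    # poor prediction: shrink the region
        elif rho > 0.75 and on_boundary:
            radius = min(2.0 * radius, max_radius)
        if rho > eta:
            x = x_trial                       # accept only with energy decrease
    return x


# Hypothetical double-well energy with minima at (+-1, 0) and a saddle at (0, 0).
def energy(x):
    return 0.25 * x[0] ** 4 - 0.5 * x[0] ** 2 + 0.5 * x[1] ** 2


def gradient(x):
    return [x[0] ** 3 - x[0], x[1]]


def hessian(x):
    return [[3.0 * x[0] ** 2 - 1.0, 0.0], [0.0, 1.0]]


x_min = trust_region_minimize(energy, gradient, hessian, [0.1, 0.5])
print(x_min, energy(x_min))
```

At the chosen starting point the Hessian is indefinite. A plain Newton-Raphson step from there moves towards the saddle point at the origin, whereas the truncated conjugate gradient iteration detects the negative curvature, moves to the trust-region boundary and the outer iteration ends in a minimum.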

We use a mesh of \(240\times 80\) elements with quadratic \(C^0\)-continuous shape functions as in [78], along with the projection-based interpolation from Eq. (10.6)\(_2\). The mesh is shown in Fig. 22. This results in \(19\,200\) elements and \(387\,205\) degrees of freedom. Additionally, in contrast to the elaborate inclusion of imperfections in [96], we start the solution of every trust-region subproblem in Eq. (10.9) from a random vector \(\Updelta \varvec{{\Phi }} \) that is scaled such that it lies inside the initial trust-region. This is done for the first load step, which applies a prestress via a prescribed vertical displacement \({\delta }_v=0.05\,\hbox {mm}\). This load step is accepted if the displacement norm drops below \(1 \times 10^{-8}\). The remaining distortion in \(Z\)-direction is enough to break the initial symmetry and to trigger buckling.
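The scaling of the random start vector can be sketched as follows. The function name, the uniform distribution, the parameter `fraction` and the diagonal stand-in `m_diag` for the preconditioner \({\textbf {M}}\) are assumptions made for this illustration. The text above only requires that the vector lies inside the initial trust region.

```python
import math
import random


def random_start_inside_trust_region(m_diag, radius, fraction=0.5, seed=0):
    """Random vector rescaled so that its M-norm equals fraction * radius.

    m_diag: diagonal of a positive preconditioner M (illustrative assumption).
    fraction: value in (0, 1) so that the vector lies strictly inside."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in m_diag]
    norm_m = math.sqrt(sum(m * vi * vi for m, vi in zip(m_diag, v)))
    scale = fraction * radius / norm_m
    return [scale * vi for vi in v]


m_diag = [2.0, 0.5, 1.0, 4.0]   # arbitrary illustrative values
start = random_start_inside_trust_region(m_diag, radius=0.1)
print(start)
```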

Due to the globally convergent nature of the trust-region algorithm, we may apply the entire horizontal displacement of the upper edge, \(\delta _h=0.5\,\hbox {mm}\) or \(\delta _h=3\,\hbox {mm}\), respectively, in one single load step. Obviously, the deformation history of the developing wrinkles is captured better if more intermediate equilibrium points are used. For this example, however, we observed that the final state does not differ much qualitatively if we use one hundred increments instead of a single load step. The iteration is accepted as converged if the norm of the displacement correction drops below \(1 \times 10^{-6}\) in this second load step. The left-hand side of Fig. 23 corresponds to a horizontal displacement of \(\delta _h=0.5\,\hbox {mm}\). In the diagram on the bottom left, the wavenumber (15 “valleys”) of the simulation coincides with the one observed in the experiment. Furthermore, the amplitudes of the wrinkles are captured quite satisfactorily. For the larger displacement \(\delta _h=3\,\hbox {mm}\) (pictures on the right in Fig. 23) the number of valleys in the experiment is 19 as opposed to 20 in our simulation. The amplitude of the wrinkles is less well captured, especially for the wrinkles at the right and left boundary. However, the results still coincide qualitatively; in particular, the different inclination of the wrinkles and their hierarchical geometry at the top and bottom edge are clearly visible in the contour plot.

Fig. 22 System sketch with dimensions, boundary conditions and the \(240 \times 80\) mesh

Fig. 23 Comparison of the shearing experiment of [95] and results of our simulation; \(240 \times 80\) elements with quadratic \(C^0\)-continuous shape functions; left: \(\delta _h=0.5\,\hbox {mm}\), right: \(\delta _h=3\,\hbox {mm}\)

11 Conclusion

Various aspects of a consistent geometrically non-linear Reissner-Mindlin shell formulation and its finite element discretization are discussed. We paid special attention to interpolation of the director, the inextensibility condition and consistent linearization.

Objectivity, path independence and efficiency of the formulation are discussed and confirmed by numerical examples. As one of the most important results, we conclude that a combination of the projection-based interpolation and a radial return normalization is the most recommended choice.

Projection-based interpolation is particularly useful for higher order ansatz functions, for instance in the context of isogeometric analysis. We conclude that, at least for uni-axial bending, fulfilling the inextensibility condition for the director only at the nodes and neglecting it in the domain, as is frequently done, is acceptable for (bi-)linear shape functions. For higher order shape functions it deteriorates the rate of convergence.

For the update of the tangent bases of the nodal directors, incremental vector transport (IncVT) has been proposed as a numerically more efficient alternative to incremental parallel transport. Both schemes are free from singularities. In many cases, however, stereographic projection is the recommended scheme, because it requires the least numerical effort. For the case of external moment loads, however, depending on the implementation, stereographic projection may be unfeasible because of the singularity. In such cases, IncVT may be the better choice.

Additionally, we presented the steps required to derive an algebraic problem from a continuous minimization problem formulated on a manifold. Here, we pointed out the role of an additional part of the stiffness matrix, which improves convergence behavior and allows even larger load steps.

We expect that the entirety of issues discussed in this work can improve standard large rotation shell formulations found in the literature, even if they use only (bi-)linear ansatz functions and fulfill the inextensibility condition for the director at the nodes only.

Additionally, if all these recommendations are applied, one ends up with a formulation completely free from trigonometric functions, despite its capability to track large rotation deformations. This is not only beneficial from a numerical point of view. We also highlight the more compact theory and a formulation which is, in our view, easier to understand. To facilitate further benchmarking of the proposed element, the C++ routines for the strain-displacement operator and the geometric stiffness matrix are published in Appendix 7. Additionally, we provide an implementation of the element using projection-based interpolation in MATLAB [58]. It contains the example from Sect. 10.2.2 and the example of Sect. 10.3 without the support rotation. The element and its stress-based counterpart are also currently implemented in the publicly available multi-disciplinary finite element code KRATOS [27] in the IgaApplication.