1 Introduction

Continuum-based theories combined with the finite element (FE) method provide a powerful tool for the virtual design and testing of engineering components. Regarding classical solid mechanical problems, the FE formulation is based on the weak form of the balance of linear momentum which is a universal and well known physical principle corresponding to the set of material independent equations [24, 49]. Furthermore, in order to simulate components which consist of a specific material, constitutive equations which describe the behavior of the considered material have to be formulated mathematically and parameterized based on experimental or micro-mechanically generated data. Since this process is however still a challenging task for single phase materials or composites which reveal complex behavior as highly nonlinear elasticity, anisotropy or dissipative properties, it could be useful to automate or circumvent the formulation of constitutive equations. For this purpose, several approaches – generally referred to as data-based or data-driven methods – exist which have become increasingly popular in the computational mechanics community during the last years [4, 47]. In the following, a brief overview of these methods is given.

Pure data-based methods in solid mechanics A first possibility to replace classical constitutive equations is the direct data-driven mechanics approach, recently proposed by Kirchdoerfer and Ortiz [31]. This method totally avoids the formulation of constitutive equations and uses sets of stress-strain tuples which characterize the material’s behavior instead. Consequently, a data-driven solver seeks to minimize the distance between the searched solution \((\varvec{\sigma },\varvec{\varepsilon })\) and the material data set within a proper energy norm, while compatibility and equilibrium have to be satisfied simultaneously. In the meanwhile, several extensions and applications of the direct data-driven approach have been proposed, e. g., for noisy data sets [3, 32], dynamics [33], finite strains [48], inelasticity [12], fracture mechanics [5] or bio-mechanics [51].

In contrast to classical constitutive modeling, the data-driven approach obviously requires a huge set of data characterizing the specific behavior of a material. These data could be collected from experiments or, due to the steadily increasing computing power, generated in silicio from lower scale simulations. Within the data-driven formalism the latter one has been applied by Karapiperis et al. [30] for sand recently. In order to enable the collection of data from experiments, Leygue et al. [38] and Stainier et al. [55] have introduced the data-driven identification which allows to collect strain-stress tuples from displacement fields and boundary conditions in inhomogeneous samples. An application of the data-driven identification to real experiments is shown in [10], the robustness regarding incomplete input data is discussed in [11].

A further possibility to circumvent the formulation of classical constitutive equations has been proposed by Ibañez et al. [25]. Therein, the construction of constitutive manifolds from collected data is described for elasticity and inelasticity. Based on this, an application to nonlinear elasticity is given in [26]. Finally, the strict fulfillment of the 2nd law of thermodynamics, i.e., the thermodynamic consistency, during the construction of constitutive manifolds is possible by using the GENERIC paradigm (General Equation for Non-Equilibrium Reversible-Irreversible Coupling) [22].

Besides the discussed direct data-driven approach and the construction of constitutive manifolds which are comparatively new data-based methods, the application of artificial neural networks (ANNs) as a mechanical constitutive model have already been proposed in the early 90s by Ghabussi et al. [19]. Due to their easy use, straight forward integrability into FE codes [23] and the excellent ability to learn complex relationships, the usage of ANNs is the most common data-based constitutive modeling method in the computational mechanics community. Thereby, the topology of the ANN, i.e., the network architecture, plays a decisive role in its overall functioning. Thus, the so called input, hidden and output layers differ in their basic tasks. Accordingly, input variables, e. g., strains \(\varvec{\varepsilon }\), enter into the system in the input layer and are processed further in the hidden layer. Finally, the signals that reach the output layer are the output variables, e. g., stresses \(\varvec{\sigma }\). In so called feedforward neural networks (FNNs) connections between the neurons are always uniquely directed. In contrast, ANNs for which feedback is explicitly desired are called recurrent neural networks (RNNs), cf. [34]. Both types of networks have been used for the mechanical constitutive modeling intensively. For instance, to model one-dimensional viscoelasticity [1], viscoplasticity [17] or time dependent behavior of concrete [28]. Likewise, several two- and three-dimensional ANN-based models could be found, e. g., to describe the behavior of geomaterials [20] or nonlinear viscoplasticity [56]. In order to ensure thermodynamic consistency of the trained ANN-models, Masi et al. [44] recently suggested the concept of thermodynamics-based ANNs for the constitutive modeling. Thereby, a scalar energy function is approximated by the network instead of the stress \(\varvec{\sigma }\). In order to still enable the training with respect to \(\varvec{\sigma }\), customized training loops in which gradients of the energy with respect to the input variables occur in the loss, see also [9, 58].

Finally, ANNs have been used to set up so called decoupled multiscale schemes [29, 57] which enable computationally efficient macroscopic simulations based on the homogenized response of lower scale representative volume elements (RVEs). Thereby, in contrast to computationally expensive FE\({}^2\) schemes [35] which require an RVE-simulation at each quadrature point in each iteration, the small scale processes within the heterogeneous microstructure that lead to the effective macroscopic response are implicitly captured via an appropriate macro- or surrogate-model. In this context, ANN-based models are predestined for the approximation of the homogenized data basis. For instance, this has been done for the scale bridging of composites with cubic microstructures [37], the simulation of the elastic-plastic deformation behavior of open-cell foam structures [52] and to link stresses and strains or traction-separation curves obtained from molecular dynamics simulations to the continuum scale [6, 14], respectively. A combination of classical FE\({}^2\) simulations with on-the-fly adaptive switching to ANN-based surrogate models is shown by Fritzen et al. [16].

Hybrid approaches in solid mechanics The big advantage of pure data-based methods, i.e., the potential to model the reality with high accuracy in an automatized manner, is also countered by some disadvantages: namely the loss of physical sense, the lack of data and the possibly high computational cost. According to that, a combination of classical constitutive models with data-based methods or even an enrichment with physical knowledge – the so called hybrid approach – seems to be a promising modeling strategy.

A first possibility to apply the hybrid approach is to enhance classical well established constitutive models with information from data. Thus, the stress \(\varvec{\sigma }\) is calculated as the sum of the model prediction \(\varvec{\sigma }^\text {mod}\) and a correction obtained from a data-based method which describes the deviation between the chosen model and the real data, i.e. \(\varvec{\sigma }- \varvec{\sigma }^\text {mod}\). This technique has been used for the correction of hyperelastic models in the GENERIC formalism by González et al. [21] or for the correction of plasticity models by Ibañez et al. [27].

Likewise, it is possible to replace parts of classical models with data-based approaches as ANNs. This has been done by Settgast et al. [53] for irreversible elastic-plastic deformations and damage of metal foams, whereby the yield function and the evolution equations are described by ANNs. An application for the description of uncured natural rubber by combining the micro sphere model with FNNs and RNNs is shown by Zopf and Kaliske [59]. In Liu et al. [42], a collection of connected mechanistic building blocks with analytical homogenization solutions is used to describe complex macroscopic responses without the loss of essential physics. In the context of isotropic hyperelasticity Shen et al. [54] as well as Liang and Chandrashekhara [39] approximate the free energy function by an ANN-based model with three deformation-type invariants as the input and thus incorporate knowledge about the anisotropy class of the considered material. This methodology, i.e., the usage of invariants as the input instead of the strain, has been continued for materials with more general anisotropy [40, 41].

Content Within this contribution, an ANN-based hybrid approach for the efficient automated modeling and simulation of isotropic hyperelastic solids is presented. Following Shen et al. [54], three deformation-type invariants are used as the input in order to incorporate physical knowledge. Unlike previous works, three corresponding stress coefficients which follow from simple linear least square optimizations replace the stress tensor in the output layer likewise. Thus, a fully physically-based reduction of the problem’s dimensionality is applied for both, input and output quantities, respectively. With the reduced data set, an ANN is trained by using standard machine learning methods. A combination of the predicted stress coefficients with derivatives of the invariants with respect to the deformation finally enables the calculation of the stress. Furthermore, in order to ensure thermodynamic consistency a posteriori, the previously trained network is modified by constructing a pseudo-potential within an integration step and a subsequent derivation which leads to a further ANN-based model. The proposed strategy enables the efficient application of ANNs for the constitutive modeling by using standard training algorithms. It is exemplarily applied to the description of a highly nonlinear Ogden type material [49], where only two-dimensional plane stress data are needed for the training of a fully three-dimensional model. The trained ANNs are included into an open source FE software to test their approximation quality in different three-dimensional simulation examples.

The organization of the paper is as follows: In Sect. 2, the basic equations of the finite strain continuum mechanics theory as well as general constitutive relations for isotropic hyperelastic solids are given. After this, a general framework for the ANN-based modeling of isotropic hyperelasticity is presented in Sect. 3. The developed approach is exemplarily applied to hypothetic data which were generated by using a Ogden model in Sect. 4. Therein, the data processing and training processes as well as a validation by means of representative FE simulations are shown. After a discussion of the results, the paper is closed by concluding remarks and an outlook to necessary future work in Sect. 5.

2 Continuum solid mechanics and hyperelasticity

In this section, the main relations of hyperelastic constitutive models including the relevant basic continuum mechanical equations are summarized. For a detailed overview, the reader is referred to the textbooks of Holzapfel [24] or Ogden [49]. In order to describe the physical state of the considered solids, the space of tensors

$$\begin{aligned} {\mathcal {L}}_{n}:=\underbrace{\mathbb {R}^3 \otimes \cdots \otimes \mathbb {R}^3}_{n\text {-times}} \ \forall n\in \mathbb {N}_{\ge 1} , \end{aligned}$$
(1)

except for a tensor of rank zero, is defined, where \(\mathbb {R}^3\) is the Euclidean vector space and \(\mathbb {N}\) is the set of natural numbers.

2.1 Kinematics and balance laws

Regarding the reference and current configurations of a material body given by \({\mathcal {B}}_0 \subset \mathbb {R}^3\) and \({\mathcal {B}}\subset \mathbb {R}^3\), the displacement vector \(\varvec{u} \in {\mathcal {L}}_{1}\) of a material point capturing the positions \(\varvec{X} \in {\mathcal {B}}_0\) at time \(t_0 \in \mathbb {R}_{\ge 0}\) and \(\varvec{x} \in {\mathcal {B}}\) at time \(t>t_0\) is given by \(\varvec{u}(\varvec{X},t):=\varvec{\varphi }(\varvec{X},t) - \varvec{X}\). Therein, \(\varvec{\varphi }(\varvec{X},t)\) represents the bijective motion function which is continuous in space and time. With that, in order to describe the state of deformation, the deformation gradient \(\varvec{F} \in {\mathcal {L}}_{2}\) and its determinant are defined by

$$\begin{aligned} \varvec{F} := (\nabla _{\!\!{\varvec{X}}}\varvec{\varphi })^\text {T} \quad \text {and} \quad J:=\det \varvec{F} > 0 , \end{aligned}$$
(2)

where \(\nabla _{\!\!{\varvec{X}}}:=\varvec{E}_K \partial _{X_K}\) is the nabla-operator with respect to the reference configuration. Therein, \(\varvec{E}_K \in {\mathcal {L}}_{1}\) denotes the Cartesian basis vector. Furthermore, the positive definite right Cauchy-Green deformation tensor \(\varvec{C}:=\varvec{F}^\text {T} \cdot \varvec{F} \in \mathscr {S}\mathscr {y}\mathscr {m}\) with \(\mathscr {S}\mathscr {y}\mathscr {m}:=\left\{ \varvec{\tau }\in {\mathcal {L}}_{2} \, |\, \varvec{\tau }= \varvec{\tau }^\text {T}\right\} \) is introduced.

To complete the necessary general equations, relevant balance laws are given in short. Supposing mass conservation, the balance of linear momentum with neglected inertia terms is given by

$$\begin{aligned} \nabla \cdot \varvec{\sigma }+ \varrho \varvec{f} = \varvec{0} \quad \text {or} \quad \nabla _{\!\!{\varvec{X}}}\cdot (\varvec{T} \cdot \varvec{F}^\text {T}) + \varrho _0 \varvec{f} = \varvec{0} , \end{aligned}$$
(3)

where the symmetry of the introduced Cauchy and 2nd Piola-Kirchhoff stress tensors \(\varvec{\sigma }\in \mathscr {S}\mathscr {y}\mathscr {m}\) and \(\varvec{T} \in \mathscr {S}\mathscr {y}\mathscr {m}\) follows from the balance of angular momentum. These tensors are related via the equation \(\varvec{T} := J \varvec{F}^{-1}\cdot \varvec{\sigma }\cdot \varvec{F}^{-\text {T}}\). The symbols \(\varvec{f}\), \(\varrho \) and \(\varrho _0\) denote a mass specific force and the mass density with respect to \({\mathcal {B}}\) and \({\mathcal {B}}_0\), respectively. Finally, in order to ensure thermodynamic consistency, all physical processes have to fulfill the Clausius-Duhem-inequality. If thermal effects are neglected, it is given by

$$\begin{aligned} -{{\dot{\Psi }}} + \frac{1}{2} \varvec{T} : \dot{\varvec{C}} \ge 0 . \end{aligned}$$
(4)

In the equation above, \(\Psi \) denotes the Helmholtz free energy density with respect to \(\mathrm {d}V_0 \subset {\mathcal {B}}_0\) and \(\dot{(\cdot )}\) is the material time derivative.

2.2 Constitutive relations for isotropic hyperelastic materials

Regarding a hyperelastic solid, one finds from Eq. (4) and applying the principle of Coleman and Noll [7], that the stress tensor \(\varvec{T}\) follows from the Helmholtz free energy density \(\Psi \) by

$$\begin{aligned} \varvec{T} = 2\frac{\partial \Psi }{\partial \varvec{C}} . \end{aligned}$$
(5)

If the behavior is restricted to isotropy, i.e. the relation \(\Psi (\varvec{C}) = \Psi (\varvec{Q} \cdot \varvec{C} \cdot \varvec{Q}^\text {T})\) holds for all special orthogonal tensors \(\varvec{Q} \in \mathscr {O}\mathscr {r}\mathscr {t}\mathscr {h}^+ :=\{ \varvec{\tau }\in {\mathcal {L}}_{2} \; | \; \varvec{\tau }^\text {T} = \varvec{\tau }^{-1} \; ,\det \varvec{\tau }\equiv 1 \}\), the free energy function can be given by \(\Psi = \Psi (I_1,I_2,I_3)\) including

$$\begin{aligned} I_1 := {{\,\mathrm{tr}\,}}\varvec{C} , \; I_2 := \frac{1}{2} \left( {{\,\mathrm{tr}\,}}^2 \varvec{C} - {{\,\mathrm{tr}\,}}\varvec{C}^2 \right) \; \text {and} \; I_3:=\det \varvec{C} , \end{aligned}$$
(6)

where \(I_\alpha \) with \(\alpha \in \left\{ 1,2,3\right\} \) denote the invariants of \(\varvec{C}\). Using the definition (6), the stress follows to

$$\begin{aligned} \varvec{T} = \sum _{\alpha =1}^{3}\underbrace{2 \frac{\partial \Psi }{\partial I_\alpha }}_{\displaystyle =:f_\alpha } \; \underbrace{\frac{\partial I_\alpha }{\partial \varvec{C}}}_{\displaystyle =:\varvec{G}^\alpha } = \sum _{\alpha =1}^{3} f_\alpha \varvec{G}^\alpha . \end{aligned}$$
(7)

Therein, \(f_\alpha \) and \(\varvec{G}^\alpha \in {\mathcal {L}}_{2}\) denote stress coefficients related to the invariants \(I_\alpha \) and corresponding tensor generators, respectively.

3 ANN-based constitutive modeling framework for isotropic elasticity

After the continuum mechanical basics, the following section introduces the main idea of this work, i.e. the way how the constitutive ANN is efficiently trained by a suitable data set \({\mathcal {D}}:=\{{\mathcal {D}}_1,{\mathcal {D}}_2, \dots , \mathcal D_n\}\) consisting of \(n\in \mathbb {N}\) data tuples \({\mathcal {D}}_i := ({}^{i} \underline{\mathbf {C}} , {}^{i}\underline{\mathbf {T}}) \in \mathbb {R}^{6 \times 1} \times \mathbb {R}^{6 \times 1}\). The introduced vectors \({}^{i} \underline{\mathbf {C}}\) and \({}^{i} \underline{\mathbf {T}}\) contain the coordinates of the symmetric deformation and stress tensors, respectively: \({}^{i} \underline{\mathbf {C}} = ({}^i C_{11}, {}^i C_{22}, \dots , {}^i C_{12})^\text {T}\), \({}^{i} \underline{\mathbf {T}} = ({}^i T_{11}, {}^i T_{22}, \dots , {}^i T_{12})^\text {T}\). Within the proposed framework, known physical conditions and constraints are incorporated, where – as mentioned in the beginning – a restriction to isotropic hyperelasticity is made for now. The framework is subdivided into the steps

  1. (a)

    data mining,

  2. (b)

    data processing,

  3. (c.1)

    training process, and

  4. (c.2)

    thermodynamically consistent correction.

A scheme representing the procedure of the approach is depicted in Fig. 1. The single steps are described detailed in the following.

Fig. 1
figure 1

Procedure of the ANN-based constitutive modeling framework for isotropic elasticity with four steps. a data mining: collection from stess-strain tuples from FE simulations, b data processing: reduction of the data set’s dimensionality, c.1 training process: training of the constitutive ANN within the reduced data space, and c.2 thermodynamically consistent correction: adjustment of the weights based on the Clausius-Duhem inequality

3.1 Data mining (a)

To start with, the data basis for the training of the network has to be achieved. Thereby it is important to collect a broad set of stress strain states to ensure the correct description of the material’s behavior later on. Ideally, the data mining has to be done based on real experiments of complex shaped samples in combination with digital image correlation [10]. The evaluation of such experiments with the algorithm proposed by Leygue et al. [38] enables the extraction of a high number of multiaxial stress-strain states for the considered material. However, in want of such experiments, the data basis is generated numerically by means of virtual experiments for now. To this end, a virtual sample is tested in several FE simulations. Within these virtual experiments which are described in the examples given in Sect. 4, the required data tuples \({\mathcal {D}}_i\) are collected at the quadrature points of the finite elements.

3.2 Data processing – transforming the data into the space of invariants (b)

Now it is assumed that a suitable data set \({\mathcal {D}}\) of a specific material is available. In order to achieve a constitutive ANN which is able to compute the stress \(\varvec{T}\) from a given state of deformation \(\varvec{C}\) with high accuracy and at the same time as few neurons as possible, it is essential to reduce the dimensionality of the input and output quantities. To this end, a variety of methods exist, e. g. the locally linear embedding (LLE) technique which is used in Ibañez et al. [25].

Following the works of Shen et al. [54] or Liang and Chandrashekhara [39], such a transformation is naturally given for the input values by using the invariants according to Eq. (6) instead of the coordinates of the deformation tensor \(\varvec{C}\).Footnote 1 However, in contrast to these works in which the free energy density \(\Psi : \mathscr {S}\mathscr {y}\mathscr {m}\rightarrow \mathbb {R}_{\ge 0}\) is approximated by the ANN and the stress tensor \(\varvec{T}\) is not trained directly, quantities related to \(\varvec{T}\) should serve as the output in the following. Consequently, the evaluation of the path integral

$$\begin{aligned} \Psi (\varvec{C}) = \frac{1}{2} \int \limits _{{\mathcal {P}}_{\varvec{C}}} \varvec{T}(\tilde{\varvec{C}}) : \mathrm {d}\tilde{\varvec{C}} \end{aligned}$$
(8)

with \({\mathcal {P}}_{\varvec{C}}:[a,b] \rightarrow \mathscr {S}\mathscr {y}\mathscr {m},\; {\mathcal {P}}_{\varvec{C}}(a)=\varvec{1}, \; P_{\varvec{C}}(b)=\varvec{C}\) from the data tuples \({\mathcal {D}}_i\) could be avoided.Footnote 2 In order to reduce the dimensionality of the output variables anyhow, the isotropy property according to Eqs. (6) and (7) is used so that the variables \(f_\alpha \) replace the coordinates of \(\varvec{T}\). The ANN thus directly approximates on the level of stresses which is preferable for the solution of the balance of linear momentum (3).

Fig. 2
figure 2

Transformation of a hyperelastic stress-strain data set \({\mathcal {D}}\) into the reduced space \({\mathcal {D}}^\text {red}\) consisting of strain invariants \(I_\alpha \) and stress coefficients \(f_\alpha \). Correlation plots of the transformed quantities: a deformation components and strain invariants, as well as b stress components and stress coefficients

In contrast to the deformation type invariants \(I_\alpha \), the stress coefficients \(f_\alpha \) cannot be determined directly since the analytical expression for the free energy function \(\Psi \) is unknown. However, if it is assumed that \(\Psi (I_1,I_2,I_3)\) exists, the relation

$$\begin{aligned} {}^i\varvec{T} = \sum _{\alpha =1}^{3} {}^i \!f_\alpha {}^i \varvec{G}^\alpha \end{aligned}$$
(9)

holds for each tuple \({\mathcal {D}}_i\). Consequently, the three stress coefficients \({}^{i}\!f_\alpha \) could be determined from a simple least square optimization for each tuple. For the cases of states with different eigenvalues \(\lambda _\alpha ^2 \in \mathbb {R}_{\ge 0}\) of the deformation tensor \(\varvec{C}\), the optimization problem is given by

$$\begin{aligned} {}^i \underline{\mathbf {f}} = \underset{{}^i\underline{\mathbf {f}} \in \mathbb {R}^{3 \times 1}}{\arg \min } \; \Big \Vert {}^i\varvec{T} - \sum _{\alpha =1}^3 {}^i \!f_{\alpha } {}^i \varvec{G}^{\alpha } \; \Big \Vert ^2 \; \forall \, i \in \{1,\dots ,n\}, \end{aligned}$$
(10)

where \(\Vert \cdot \Vert \) denotes the Frobenius norm of a rank two tensor. Due to the convexity with respect to \({}^i\!f_\alpha \), the solution of Eq. (10) is unique and follows from a linear optimization. With that, each data tuple \({\mathcal {D}}_i\) could be transformed to a tuple \({\mathcal {D}}_i^\text {red} := ({}^i\underline{\mathbf {I}}, {}^i\underline{\mathbf {f}}) \in \mathbb {R}^{3 \times 1} \times \mathbb {R}^{3 \times 1}\) of reduced dimensionality.

In Fig. 2, the transformation step is exemplarily visualized for deformation-stress states collected from an FE simulation of a uniaxially pulled disc with a hole. Thereby, the constitutive behavior is given by a hyperelastic Ogden model [49], cf. Sect. 4.1.1. In the transformation process, a transition from a rather disordered set of data to well-ordered structures in the reduced space takes place. Furthermore, as part of the transformation process, the cardinality n of the reduced data set \({\mathcal {D}}^\text {red}:=\{\mathcal D_1^\text {red},{\mathcal {D}}_2^\text {red}, \ldots , \mathcal D_n^\text {red}\}\) can be decreased by several orders of magnitude compared to the original set \({\mathcal {D}}\), by just removing similar tuples within a given tolerance. Consequently, the training of the constitutive ANN within the reduced space \({\mathcal {D}}^\text {red}\) is much more efficient later on.

Considering deformation states with multiple eigenvalues such as the undeformed one, where \(\varvec{C} = \varvec{1}\), the uniqueness of Eq. (10) is not given anymore. Thus, the true stress coefficients, which – according to Eq. (7) – have to follow from a free energy function, are not determinable as described above. Due to this, the sought values \(f_\alpha \) have to be calculated by means of an interpolation. Each function \(f_\alpha (I_1,I_2,I_3)\) is now approximated by a hyperplane in the invariant space. Thus, within the surrounding of the deformation state \(\varvec{C}^*\), the ansatz

$$\begin{aligned}&f_\alpha ^\text {int}(I_1,I_2,I_3,\underline{\varvec{\kappa }}) := \kappa _{\alpha 0} + \sum _{\beta =1}^{3} \kappa _{\alpha \beta } I_\beta \; \text {with} \end{aligned}$$
(11)
$$\begin{aligned}&\underline{\varvec{\kappa }}:= (\kappa _{10}, \ldots , \kappa _{13}, \kappa _{20}, \ldots , \kappa _{32}, \kappa _{33})^\text {T} \in \mathbb {R}^{12\times 1} \end{aligned}$$
(12)

for all \(\alpha \in \{1, 2, 3\}\) is chosen. The parameters \(\underline{\varvec{\kappa }}\) can now be find from a constrained optimization problem which ensures that the corresponding state of stress \(\varvec{T}^*\) is correct. In this case with \(\beta \in \{1, 2, 3\}\) it is given by

$$\begin{aligned} \underline{\varvec{\kappa }}= \underset{\underline{\varvec{\kappa }}\in {\mathcal {C}}}{\arg \min } \frac{1}{2}\sum _{k=1}^{n} \sum _{\alpha =1}^{3} \left( f_\alpha ^\text {int}(I_\beta ({}^k\varvec{C}),\underline{\kappa }) - {}^k \! f_\alpha \right) ^2 \end{aligned}$$
(13)

with respect to the corresponding constraint subset

$$\begin{aligned} {\mathcal {C}}:=\left\{ \underline{\varvec{\kappa }}\in \mathbb {R}^{12\times 1}: \; \varvec{T}^* = \sum _{\alpha =1}^{3} f_\alpha ^\text {int}(I_\beta (\varvec{C}^*),\underline{\varvec{\kappa }}) \; \partial _{\varvec{C}} I_\alpha (\varvec{C}^*) \right\} . \end{aligned}$$
(14)

3.3 Training of the constitutive ANN (c.1)

Based on the extracted and transformed data basis \(\mathcal D^\text {red}\), a feed forward neural network has to be trained which could be done by using conventional training loops. In the examples given in Sect. 4, the fasted training could be achieved with the deep learning toolbox included in Matlab, where a Levenberg-Marquardt optimization procedure has been used. Since input data \(I_\alpha \) and output data \(f_\alpha \) are normalized to the range \([-1,1]\), the ANN-based model for the stress coefficients is given by

$$\begin{aligned} f_\alpha ^\text {ANN}({\mathfrak {i}}_\beta ) := \underbrace{\frac{f_{(\alpha )}^\text {max} - f_{(\alpha )}^\text {min}}{2}}_{\displaystyle =: X_\alpha } \mathfrak f_\alpha ({\mathfrak {i}}_\beta ) + \underbrace{\frac{f_{\alpha }^\text {max} + f_{\alpha }^\text {min}}{2}}_{\displaystyle =: Y_\alpha } , \end{aligned}$$
(15)

where the superscripts \((\cdot )^\text {max}\) and \((\cdot )^\text {min}\) denote the maximum and minimum components of a given data set, respectively. Choosing hyperbolic tangent activation functions and restricting the network architecture to only one hidden layer, the normalized stress coefficients are given by

$$\begin{aligned}&{\mathfrak {f}}_\alpha ({\mathfrak {i}}_\beta )&:= B_\alpha + \sum _{\beta =1}^{N} W_{\alpha \beta } \tanh \left( \sum _{\gamma =1}^{3} w_{\beta \gamma } {\mathfrak {i}}_\gamma + b_\beta \right) \;, \end{aligned}$$
(16)
$$\begin{aligned}&{\mathfrak {i}}_\beta (I_{(\beta )})&:= \Biggl [I_\beta - \underbrace{\frac{I_{\beta }^\text {max} + I_{\beta }^\text {min}}{2}}_{\displaystyle =: V_\alpha }\Biggr ]\underbrace{\frac{2}{I_{(\beta )}^\text {max} - I_{(\beta )}^\text {min}}}_{\displaystyle =: U_\alpha ^{-1}} , \end{aligned}$$
(17)

where \({\mathfrak {f}}_\alpha ,\; {\mathfrak {i}}_\beta \in [-1,1]\). In the equations above, \(W_{\alpha \beta }\), \(w_{\alpha \beta }\) and \(B_\alpha \), \(b_\beta \) are weights and bias values, so that the total number of training parameters follows to \(p=3+7N\). Using the introduced Eqs. (15)–(17), the linkage of the 2nd Piola-Kirchhoff stress tensor \(\varvec{T}\) and the right Cauchy-Green deformation tensor \(\varvec{C}\) is modeled by

$$\begin{aligned} \varvec{T}^\text {ANN}(\varvec{C}) = \sum _{\alpha =1}^3 f_\alpha ^\text {ANN}(I_\beta (\varvec{C})) \, \varvec{G}^\alpha (I_\gamma (\varvec{C}),\varvec{C}) . \end{aligned}$$
(18)

The model equations have been implemented within the FE toolbox FEniCS [2, 43]. Therein, the tangent

$$\begin{aligned} \begin{aligned} {\mathbb {C}}^\text {ANN}&:= 2 \frac{\partial \varvec{T}}{\partial \varvec{C}} \in {\mathcal {L}}_{4} \\&= 2 \sum _{\alpha =1}^3\sum _{\beta =1}^3 \frac{\partial f_\alpha ^\text {ANN}}{\partial I_\beta } \frac{\partial I_\alpha }{\partial \varvec{C}}{\otimes } \frac{\partial I_\beta }{\partial \varvec{C}} {+} 2 \sum _{\alpha =1}^3 f_\alpha ^\text {ANN} \frac{\partial ^2 I_\alpha }{\partial \varvec{C} \partial \varvec{C}}, \end{aligned} \end{aligned}$$
(19)

which is required within the solution via a Newton–Raphson scheme, is calculated automatically by means of automatic differentiation. The reader is referred to Hashash et al. [23] for details concerning the implementation into classical FE codes.

3.4 Thermodynamically consistent correction (c.2)

With the conventional training of the constitutive ANN, an accurate approximation of the deformation-stress relation is possible. However, the thermodynamic consistency of the ANN-based model is not ensured in any case, since this would require the existence of a free energy density \(\Psi \) that is related to the stresses via Eq. (5). Thus, in order to satisfy thermodynamic consistency a priori, customized training loops which optimize the network not only with respect to the output which have to be chosen to \(\Psi \), but also with respect to gradients \(\partial _{\varvec{C}} \Psi \) of the output quantities have to be used, cf. [9, 41, 44, 58].

Within this work, an alternative approach which allows to easily use standard machine learning algorithms is presented. It obligatorily enforces the fulfillment of the 2nd law of thermodynamics a posteriori and is based on the calculation of a pseudo-potential from a network which has been previously trained according to (c.1).

In a first step, the mechanical work W along a piecewise smooth path \({\mathcal {P}}_{\varvec{C}}:[a,b] \rightarrow \mathscr {S}\mathscr {y}\mathscr {m}\) has to be calculated by the integration

$$\begin{aligned} W = \frac{1}{2} \int \limits _{{\mathcal {P}}_{\varvec{C}}} \varvec{T}(\tilde{\varvec{C}}) : \mathrm d \tilde{\varvec{C}} \end{aligned}$$
(20)

using the approximated relation (18). By exploiting the differential relationship \(\mathrm {d}I_\alpha = \partial _{\varvec{C}}I_\alpha : \mathrm {d}\varvec{C} = \varvec{G}^\alpha : \mathrm {d}\varvec{C}\), the introduced integral (20) can be transformed into the space of invariants. Consequently, the path \({\mathcal {P}}_{\varvec{C}}\) is now replaced by \({\mathcal {P}}_I:[a,b]\rightarrow \mathbb {R}^{3\times 1}\) and it follows

$$\begin{aligned} W&= \sum _{\alpha =1}^{3}\;\int \limits _{{\mathcal {P}}_I} f_\alpha ({{\tilde{I}}}_1,{{\tilde{I}}}_2,{{\tilde{I}}}_3) \, \mathrm d {{\tilde{I}}}_\alpha \nonumber \\&= \sum _{\alpha =1}^{3}\int \limits _a^b f_\alpha ({{\tilde{I}}}_1(p),\tilde{I}_2(p),{{\tilde{I}}}_3(p)) {{\tilde{I}}}_\alpha '(p) \, \mathrm d p . \end{aligned}$$
(21)

To facilitate the calculation of Eq. (21), a path given by the parametrization \({{\tilde{I}}}_\alpha = ({{\hat{I}}}_\alpha - I_\alpha ^0) p + I_\alpha ^0\) with \(p\in [0,1]\) is chosen in the following, where \(I_\alpha ^0\) denotes the values of the invariants according to Eq. (6) at \(\varvec{C}=\varvec{1}\). A graphical visualization of an arbitrary and the chosen path is given in Fig. 3

Fig. 3
figure 3

Possible integration paths from the undeformed state \(\varvec{C}^0\) to a deformed state \(\hat{\varvec{C}}\) within the invariant space. The chosen straight path is given by \(\tilde{I}_\alpha = ({{\hat{I}}}_\alpha - I_\alpha ^0) p + I_\alpha ^0\)

Considering a trained constitutive ANN with only one hidden layer, the integral could be solved analytically. By using Eqs. (15)–(17), the mechanical work W follows to

$$\begin{aligned} \begin{aligned} W&= \frac{1}{2}\sum _{\alpha =1}^3 \left[ \sum _{\beta =1}^N \frac{({{\hat{I}}}_\alpha - I_\alpha ^0) X_\alpha W_{\alpha \beta }}{\displaystyle \sum \limits _{\mu =1}^3 \frac{w_{\beta \mu }}{U_\mu } ({{\hat{I}}}_\mu - I_\mu ^0)} \Lambda _\beta + B_\alpha X_\alpha + Y_\alpha \right] , \\ \Lambda _\beta&:= \ln \left( \frac{\cosh \bigg (\displaystyle \sum \limits _{\gamma =1}^3 \frac{w_{\beta \gamma }}{U_\gamma } ({{\hat{I}}}_\gamma - V_\gamma ) + b_\beta \bigg )}{\cosh \bigg ( \displaystyle \sum \limits _{\kappa =1}^3 \frac{w_{\beta \kappa }}{U_\kappa } ( I_\kappa ^0 - V_\kappa ) + b_\beta \bigg )}\right) . \end{aligned} \end{aligned}$$
(22)

Now, the determined work W is assumed to define a pseudo-potential \(\Psi ^* := W\)Footnote 3, so that an alternative model for the stress response can be obtained by \(\varvec{T}^* = 2 \partial _{\varvec{C}} \Psi ^*\). Since the corrected ANN-based model fulfills Eq. (5), the thermodynamic consistency is guaranteed in any case. The updated stress coefficients are given by

$$\begin{aligned} \begin{aligned} f_\varepsilon ^\text {ANN*}&= Y_\varepsilon + X_{(\varepsilon )} B_\varepsilon + \sum \limits _{\beta =1}^N \frac{X_{(\varepsilon )} W_{\varepsilon \beta }}{\sum \limits _{\mu =1}^3 \frac{w_{\beta \mu }}{U_\mu } (I_\mu - I_\mu ^0)} \Lambda _\beta \\&\quad + \sum \limits _{\alpha =1}^3 \sum \limits _{\beta =1}^N \frac{(I_\alpha - I_\alpha ^0) \frac{w_{\beta \varepsilon }}{U_{(\varepsilon )}} X_\alpha W_{\alpha \beta }}{\sum \limits _{\kappa =1}^3 \frac{w_{\beta \kappa }}{U_\kappa } (I_\kappa - I_\kappa ^0)} \Phi _\beta \\&\quad - \sum \limits _{\alpha =1}^3 \sum \limits _{\beta =1}^N \frac{(I_\alpha - I_\alpha ^0) \frac{w_{\beta \varepsilon }}{U_{(\varepsilon )}} X_\alpha W_{\alpha \beta }}{\Bigl (\sum \limits _{\kappa =1}^3 \frac{w_{\beta \kappa }}{U_\kappa } (I_\kappa - I_\kappa ^0)\Bigr )^2} \Lambda _\beta , \\ \Phi _\beta&:= \tanh \bigg ( \sum \limits _{\gamma =1}^3 \frac{w_{\beta \gamma }}{U_\gamma } (I_\gamma - V_\gamma ) + b_\beta \bigg ) \end{aligned} \end{aligned}$$
(23)

including normalization weights \(X_{\alpha }\), \(Y_{\alpha }\), \(U_{\alpha }\) and \(V_{\alpha }\). This results in the corrected ANN-based model

$$\begin{aligned} \varvec{T}^\text {ANN*}(\varvec{C}) = \sum _{\alpha =1}^{3} f_\alpha ^\text {ANN*}(I_\beta (\varvec{C})) \, \varvec{G}^\alpha (I_\gamma (\varvec{C}),\varvec{C}) , \end{aligned}$$
(24)

where the weights \(b_{\alpha }\), \(B_{\alpha }\), \(w_{\alpha \beta }\) and \(W_{\alpha \beta }\) – determined in the training step (c.1) –  stay unchanged. Consequently, no customized training loops have to be designed for the application of the proposed approach. Similar to the original model (18) , the thermodynamically corrected equations are implemented into FEniCS.

4 Examples

4.1 Data generation

In a first step, the data set for the training of the ANN has to be achieved. As mentioned above, the data basis is generated numerically in want of real experiments for now. To this end, a virtual sample is tested in several FE simulations. Within these virtual experiments, the required data set \({\mathcal {D}}\) consisting of the tuples \({\mathcal {D}}_i\) is collected at the quadrature points of the finite elements.

Table 1 Constitutive parameters of the Ogden model (27) within the virtual experiments. Initial shear modulus \(G^\mathrm{init}\) and Poisson’s ratio \(\nu ^\mathrm{init}\) as well as parameter sets \(\mu _p\), \(\alpha _p\) and \(\kappa \)

4.1.1 Constitutive behavior

In order to demonstrate the ability of the proposed ANN-based method, a highly nonlinear stress-strain relation is chosen for the constitutive behavior of the sample’s material. To this end, an Ogden model [49] which is given by the free energy density function

$$\begin{aligned} \Psi := \sum _{p=1}^{N_\text {O}} \frac{\mu _p}{\alpha _p} \left( \sum _{\beta =1}^{N_\lambda } \nu _\beta {\bar{\lambda }}_\beta ^{\alpha _p} -3 \right) + \frac{\kappa }{4} \left( J^2 - 2 \ln J -1 \right) \end{aligned}$$
(25)

is used. In the equation above, \(\mu _p\), \(\alpha _p\) and \(\kappa \) denote parameters of the Ogden model and the compression modulus. The symbols \(\nu _\beta \in \{1,2,3\}\) and \(N_\lambda \in \{1,2,3\}\) are the algebraic multiplicity of \(\lambda _\beta \) and the number of independent eigenvalues. Furthermore, \({\bar{\lambda }}_\beta := J^{-1/3} \lambda _\beta \) are the isochoric principal stretches which follow from the Flory split [15]

$$\begin{aligned} \varvec{F} = J^{1/3} \varvec{1} \cdot \bar{\varvec{F}} \quad \text {with} \quad \bar{\varvec{F}}=J^{-1/3} \varvec{F} \; \text {and} \; \det \bar{\varvec{F}} \equiv 1 , \end{aligned}$$
(26)

i.e. the decomposition of the deformation gradient into volumetric \(J^{1/3} \varvec{1}\) and isochoric \(\bar{\varvec{F}}\) parts. The stress of the Ogden model is given by

$$\begin{aligned} \begin{aligned} \varvec{T} = \sum _{\beta =1}^{N_\lambda } \Biggl [&\frac{1}{\lambda _\beta ^2} \sum _{p=1}^{N_\text {O}} \mu _p \left( \bar{\lambda }_\beta ^{\alpha _p} -\frac{1}{3} \sum _{\gamma =1}^{N_\lambda } \nu _\gamma {\bar{\lambda }}_\gamma ^{\alpha _p}\right) \\&+ \frac{\kappa }{2} \lambda _\beta ^{-2} (J^2-1) \Biggr ] \varvec{M}^\beta , \end{aligned} \end{aligned}$$
(27)

where \(\varvec{M}^\beta \in {\mathcal {L}}_2\) denotes the projection tensors of the right Cauchy-Green deformation tensor:

$$\begin{aligned} \varvec{C} = \sum _{\beta =1}^{N_\lambda } \lambda _\beta ^2 \varvec{M}^\beta \; \text {with} \; \varvec{M}^\beta := \delta _{1 N_\lambda } \varvec{1} + \prod _{\beta \ne \alpha }^{N_\lambda } \frac{\varvec{C} - \lambda _\beta ^2 \varvec{1}}{\lambda _\alpha ^2 - \lambda _\beta ^2} .\nonumber \\ \end{aligned}$$
(28)

For further details on the calculation of isotropic tensor functions, the interested reader is referred to Miehe [45, 46].

Fig. 4
figure 4

Uniaxial tension test of the Ogden model (27) for the parameters given in Table 1: Stretch-stress curves for the Cauchy stress \(\sigma _{11}\) as well as the 1st and 2nd Piola-Kirchhoff stresses \(P_{11}\) and \(T_{11}\)

The material parameters chosen for the virtual experiments are given in Table 1, where they are related to initial shear modulus \(G^\text {init}\) and Poisson’s ratio \(\nu ^\text {init}\) via

$$\begin{aligned} G^\text {init}=\frac{1}{2} \sum _{p=1}^{N_\text {O}} \alpha _p \mu _p \quad \text {and} \quad \kappa = \frac{2}{3} G^\text {init} \frac{1+\nu ^\text {init}}{1-2\nu ^\text {init}} . \end{aligned}$$
(29)

Regarding the stretch-stress curves for uniaxial loading, a highly nonlinear response is achieved with this parameter set, cf. Fig. 4. Here, the quantity \(\varvec{P}:=J \varvec{F}^{-1} \cdot \varvec{\sigma }\in {\mathcal {L}}_2\) is the non-symmetric 1st Piola–Kirchhoff stress tensor.

4.1.2 Virtual experiments

In order to enable the training process of the constitutive ANN within a broad subset of the invariant space \({\mathcal {I}}\subset \mathbb {R}^{3\times 1}\), the deformation states of the virtual samples have to be as heterogeneous as possible. With regard to real experiments [10, 50], non-uniform thin discs, e. g., with holes, are chosen for this purpose. Thereby it has to be taken into account that only displacements on the surface of the sample are measurable in an experimental setting. A thin sample which could be loaded into a close to plane stress state is thus preferable.

Consequently, in order to obtain perfect plane stress data, the virtual experiments are realized within a two-dimensional setting.Footnote 4 A study analyzing two main influences – loading and hole geometry – with respect to the resulting sets of invariants is presented in the following. Thereby, the FE meshes were generated by using Gmsh [18].

Influence of the specimen loading on the data set In a first study, the two load cases (i) uniaxial tension and (ii) equi-biaxial tension are considered for a specimen with circular hole (sample I). As shown in the schematic representations given in Fig. 5, displacement boundary conditions with a maximum value of \({{\hat{u}}} \le {60}\,{\mathrm{mm}}\) for the uniaxial tension and a maximum value of \({{\hat{u}}} \le {40}\,{\mathrm{mm}}\) for the equi-biaxial tension are prescribed within 40 increments. Each data set is depicted in the three invariant-planes \(I_1\)\(I_2\), \(I_1\)\(I_3\) and \(I_2\)\(I_3\) within Fig. 6a, whereby the homogeneous uniaxial as well as equi-biaxial tension and compression load cases are included for reasons of orientation. For the case of incompressible solids, where \(I_3 \equiv 1\) holds, these curves are independent of the material’s behavior and delimit the area of admissible invariant sets within the \(I_1\)\(I_2\)-plane [8, 41]. However, these statements do not hold without restriction for the considered case of a compressible material.

Fig. 5
figure 5

Specimen loading with bearings and prescribed displacement \(\varvec{u}\) within the virtual experiments: a uniaxial tension and b equibiaxial tension

Fig. 6
figure 6

Representation of the extracted data sets by means of the strain invariants \(I_\alpha \) for the analyzed plane stress samples: a sample I with a circular hole, b sample II with an elliptical hole, and c sample III with arbitrarily arranged elliptical holes. The plotted gray lines represent the homogeneous uniaxial as well as equi-biaxial tension and compression states for the considered material

Fig. 7
figure 7

Visualization of the reduced Ogden-type elastic data set used for the training of the constitutive ANN. The data set is shown by means of the three stress coefficients \(f_\alpha \) in dependence of the three strain invariants \(I_\alpha \)

Regarding the collected data sets in the invariant space, it can be seen that \({\mathcal {I}}\) is widely sampled between the plotted curves for homogeneous uniaxial as well as equi-biaxial tension and compression load cases, see Fig. 6a. Furthermore, although only tensile tests were performed, data points in the compression range with \(I_3 < 1\) can also be found due to the inhomogeneous stress and strain states occurring in the specimen. Finally, it is observed that \({\mathcal {I}}\) is sampled over a much wider range by the equi-biaxial tensile test compared to the uniaxial one. As a resulting consequence for the investigated disc containing a circular hole geometry, the calibration of ANN-based models can be significantly improved if equi-biaxial tension tests are available.

Influence of the hole geometry on the data set Besides the loading type, the two-dimensional specimen geometry is varied in the following second study. For better comparability, the plots for sample II and III given in the Fig. 6b, c are truncated with respect to those of specimen I.

A comparison of the simulation results for sample I and II reveals that a replacement of the circular by a more complex elliptical hole geometry leads to an improved sampling of \({\mathcal {I}}\). The ratios of the collected uniaxial and biaxial data sets however stay approximately unchanged. Considering the data for sample III which includes several randomly distributed ellipses, so once again more complex than sample II, the information content of the simple uniaxial tension test is clearly increased compared to the samples I and II. Here, much more data points could be collected, especially in the compression range.

Accordingly, in order to test the proposed method schematized in Fig. 1 for the most easily realizable experimental setup, only the uniaxial tension data of sample III will be used for the ANN training process later on. The highly nonlinear character of the investigated Ogden-type material manifests in the corresponding stress coefficients \(f_\alpha \) shown in Fig. 7. To close the virtual experiments study, it is mentioned again that the collected data set consists of true plane stress tuples with \({}^{i}\underline{\mathbf {T}} = ({}^i T_{11}, {}^i T_{22}, 0, 0, 0, {}^i T_{12})^\text {T}\), since a two-dimensional simulation setting were used during the virtual experiments. In three-dimensional specimens, of course, gradients in the thickness direction will always appear under real loading conditions. However, the resulting deformation and stress components on the sample’s surface are in good agreement compared to a realistic three-dimensional setting. This is shown in the Appendix A for the considered example, Fig. 14.

4.2 Training of the constitutive ANN

After the selection of a suitable data set \({\mathcal {D}}\) and the transformation to \({\mathcal {D}}^\text {red}\), the ANN training process according to Sect. 3 (c.1) – i.e. the calibration of the normalized stress coefficients \(\mathfrak f_\alpha \) defined in Eq. (15) – is performed. In order to find a network which contains a minimum number of parameters while approximating \({\mathfrak {f}}_\alpha (\mathfrak i_\beta )\) with sufficient quality, the number of neurons \(N \in \mathbb {N}\) in the only hidden layer is systematically varied within the range \([1, 20] \cap \mathbb {N}\). For each particular neuron number, the respective ANN is trained 100 times, where the weights of the best achieved training state are stored at the end.Footnote 5 For this purpose, the reduced data set \({\mathcal {D}}^\text {red}\) is randomly divided into training (\(70\,\%\)), validation (\(15\,\%\)) and test (\(15\,\%\)) data. These subsets serve for the calibration in the training, generalization after each training progress and for the evaluation of the network’s quality afterwards, respectively. In order to evaluate the trained networks with respect to the prediction of the stress tensor \(\varvec{T}\), the following mean and maximum error measures are used:

$$\begin{aligned} {}^i\epsilon _\text {mean}&:= \frac{({}^i T_{11}^\text {ANN} - {}^i T_{11}) + \cdots + ({}^i T_{12}^\text {ANN} - {}^i T_{12})}{6 \Vert {}^i \underline{\mathbf {T}}\Vert _\infty } \; \text {and} \end{aligned}$$
(30)
$$\begin{aligned} {}^i \epsilon _\text {max}&:= \frac{\Vert {}^i \underline{\mathbf {T}}^\text {ANN} - {}^i \underline{\mathbf {T}}\Vert _\infty }{6 \Vert {}^i \underline{\mathbf {T}}\Vert _\infty } . \end{aligned}$$
(31)

In the equations above, \(\Vert \cdot \Vert _\infty \) denotes the maximum norm of a vector valued quantity. To avoid singularities, the introduced errors are set to zero if \(\Vert {}^i \underline{\mathbf {T}}\Vert _\infty \le 10^{-4}\) kPa holds which is a negligible stress value according to the material’s shear modulus \(G^\text {init}={100}~{\mathrm{kPa}}\).

Fig. 8
figure 8

Magnitude of the a maximum error measures \(\max _i \vert {}^i\!\epsilon _\text {mean}\vert \) and b \(\max _i {}^i\!\epsilon _\text {max}\) for training, validation data and test data in relation to the number of neurons in hidden layer

The performance of the trained ANNs measured by the maximum values \(\max |{}^i\epsilon _\text {mean}|\) and \(\max {}^i\epsilon _\text {max}\) is given in Fig. 8. According to that, a clear tendency to decrease the errors while increasing the number of neurons N in the hidden layer is visible. Defining the criteria \(\max _i \vert {}^i\epsilon _\text {mean}\vert \le 0.015\,\%\) and \(\max _i {}^i\epsilon _\text {max} \le 0.015\,\%\) lead to the selection of an ANN with \(N = 11\). Regarding other ANN-based approaches, this is a rather small network, see [6, 13]. Moreover, the transformation step \({\mathcal {D}} \rightarrow {\mathcal {D}}^\text {red}\) enables to use a set containing only \(n\approx 300\) tuples \({\mathcal {D}}^\text {red}_i\) for the training which is, again, comparatively small, cf. [37, 41].

Finally, the predictions of the chosen network are compared with the true plane stress values generated by using the Ogden model in Fig. 9. Analogously, this comparison is depicted for the thermodynamically corrected model, cf. the formalism given in Sect. 3 (c.2), within Fig. 10. Summarizing the approximation quality, both ANN-based models provide nearly perfect predictions for the in-plane stress components. However, for the out-of-plane components, the corrected model performs slightly worse compared to the simple original ANN model, although the predictions are still small in magnitude compared to the in-plane components.

4.3 Validation

In order to prove the suitability of both ANN-based models for the simulation of complex shaped samples and components, a comparison to reference results generated with the Ogden model (27) which has been used for the calibration is shown in the following. Thereby, two three-dimensional problems are considered: the uniaxial tension of a cuboid sample, and the torsion of a prismatic torsional sample, cf. the sketches given in Fig. 11a, b.

Fig. 9
figure 9

Verification of the ANN-based stress prediction (18) for a model with \(N=11\) neurons trained with pure plane stress data: a in-plane components and b out-of-plane components

Fig. 10
figure 10

Verification of the thermodynamically corrected stress prediction for an ANN-based model with \(N=11\) neurons trained with pure plane stress data: a in-plane components and b out-of-plane components

Fig. 11
figure 11

Reaction loads for the considered validation cases: a force-displacement curves of the cuboid sample with edge length \({100}\,{\mathrm{mm}} \times {100}\,{\mathrm{mm}} \times {25}\,{\mathrm{mm}}\) under uniaxial tension, and b torque-angle curves of the torsional sample with edge length \({200}\,{\mathrm{mm}} \times {100}\,{\mathrm{mm}} \times {100}\,{\mathrm{mm}}\)

Cuboid under uniaxial tension To start with, the three-dimensional cuboid with a thickness-length ratio of 1/4 is considered. It is loaded under uniaxial tension, where a displacement of \({{\hat{u}}} = {60}\,{\mathrm{mm}}\) – this is equal to \({60}{\%}\) effective strain of the sample – is prescribed on the top surface. Although the overall strain is identical to the utilized training load case, complex three-dimensional stress states are expected within the cuboid. Thus, regarding the highly nonlinear material response of the reference constitutive law, this example is a good possibility to validate and explore potential limits of the ANN-based approaches.

Fig. 12
figure 12

Simulated local distribution of secondary field quantities for the cuboid under uniaxial tension as well as relative errors of the ANN-based models (18) and (24) with respect to the reference Ogden model: 11-components of a right Cauchy-Green deformation tensor \(\varvec{C}\), and b 1st Piola-Kirchhoff stress tensor \(\varvec{P}\). The surface plots are given on the deformed configuration \({\mathcal {B}}\)

For all of the three models, the global reaction force depending on the prescribed displacement has been calculated within the simulations. Considering these curves which are given in Fig. 11a, a nearly perfect alignment of both ANN-solutions with the reference curve is observed.

Furthermore, in order to evaluate the ANN-predictions of local deformation \(\varvec{C}\) and stress fields \(\varvec{P}\), the simulated 11-components within the cuboid domain are compared to each other in Fig. 12a, b, respectively. According to that, relative errors below \({0.06}{\%}\) and \({0.16}{\%}\) are achievable with the original and the corrected ANN-based model, respectively. Slightly increased deviations below \({0.47}{\%}\) and \({0.42}{\%}\) are observed for the stress component \(P_{11}\), whereby – in order to avoid singularities – the absolute errors \(|P_{11}-P_{11}^\text {ANN}|\) and \(|P_{11}-P_{11}^\text {ANN*}|\) are related to the maximum value of \(|P_{11}|\). As expected, the predictive performance of the corrected model (24) is of similar quality compared to the initial model (18). However, a significantly aggravated convergence behavior within the Newton iteration is observed within the performed FE simulation for the corrected model. This behavior is supposably attributable to the singular points occurring in the corrected stress coefficients, cf. Eq. (23). In total, slightly twice as many iterations are required compared to the uncorrected ANN model.

Altogether, a rather high accuracy could be achieved for \(\varvec{C}\) and \(\varvec{P}\), although the occurring maximum stress of \(P_{11}={237}\,{\mathrm{kPa}}\) is increased compared to the training data set which is moreover purely two-dimensional with respect to the stresses, cf. Fig. 9a. In view of the inability of ANNs to extrapolate, this is a quite astonishing result. However, since the trained network lives in the reduced \(I_\alpha \)\(f_\beta \)-space and the final stress prediction is obtained in combination with the tensor-valued generators \(\varvec{G}^\alpha \), the network actually does not, or only slightly, have to extrapolate at all. More precisely, the deformation-type invariant set of the considered cuboid under uniaxial loading is almost completely included within the data set used for the training process. This is shown in Fig. 15a within the Appendix B. However, since the cuboid’s data set within the space \({\mathcal {I}}\) is not completely covered by the training data set, extrapolation is at least partially necessary.

Torsional sample In the second validation setup, a three-dimensional torsion sample with three circular holes is loaded by specifying a distortion of \({\hat{\phi }} = {45}^{\circ }\).

Similar to the previous validation case, the global reaction torque depending on the prescribed distortion has been calculated. Again, a nearly perfect alignment of both ANN-solutions with the reference curve is visible in Fig. 11b. Likewise, a comparison of the predictions for local strain \(C_{12}\) and stress \(P_{12}\) reveal relative errors in a similar range as for the first example, see Fig. 13. The stress prediction of the corrected model is now apparently worth compared to the original one but however still below \({0.46}\%\).

Fig. 13
figure 13

Simulated local distribution of secondary field quantities for the torsional sample as well as relative errors of the ANN-based models (18) and (24) with respect to the reference Ogden model: 11-components of a right Cauchy-Green deformation tensor \(\varvec{C}\), and b 1st Piola-Kirchhoff stress tensor \(\varvec{P}\). The surface plots are given on the deformed configuration \({\mathcal {B}}\)

Thus, also in this example, the ANN-based approach is very well able to describe the trained nonlinear constitutive behavior within the FE simulation of a comparatively complex load case. This is once again due to the fact that the deformation-type invariant space of the torsion sample is included into the used training data set within wide areas which is shown in the It has to be Appendix B, Fig. 15b. Thereby, the used ANN with not more than \(N=11\) neurons in only one hidden layer is very small compared to other ANN-based constitutive models [41].

5 Conclusions

In this work, a novel ANN-based approach for the automated constitutive modeling of isotropic hyperelastic solids is presented. It is based on a physically motivated reduction of the problem’s dimensionality for the deformation and stress type quantities. In contrast to the most ANN-based constitutive models, the proposed procedure enables a highly accurate stress prediction with only small network architectures. Furthermore, it is possible to resolve fully three-dimensional stress fields while only two-dimensional plane stress data are used for the training process of the ANN.

Starting from basic continuum mechanical equations, a short revision of hyperelastic constitutive models is given. Based on this, the developed ANN-based modeling approach is presented, where four steps – (a) data mining, (b) data processing, (c.1) training process, and (c.2) thermodynamically consistent correction – are discussed in detail. This approach is exemplarily applied to data collected from virtual experiments with samples comprising a highly nonlinear Ogden type material behavior. Thereby, different influences on the resulting data sets are investigated. In view of realistic experiments, where only information on the sample’s surface are available and thin specimens are thus preferable, the virtual samples are loaded in pure plane stress states. Finally, the trained ANN-based models are used for the simulation of two three-dimensional samples, a cuboid under uniaxial tension and a prismatic torsional sample. A comparison of these simulations with the results achieved with the Ogden type model shows a rather high accuracy for both ANN-based models with a small architecture consisting of only 11 neurons in one hidden layer.

Altogether, the presented ANN-based approach has shown to be an efficient tool for the description of isotropic hyperelastic solids which reveal a strongly nonlinear stress-strain-response. Due to the use of standard machine learning toolboxes, it is easy to use and could be integrated as a user subroutine into commercial or non-commercial FE codes.

In order to enable the robust application of the proposed ANN-based procedure for the analysis and design of complex engineering components, several extensions have to be made in the future. For instance, an extension for the processing of noisy data sets [3, 36] which enables to use data collected from real experiments is essential. Furthermore, to overcome the poor convergence behavior of the a posteriori corrected model, it is favorable to approximate the free energy function \(\Psi \) instead of variables on the stress level with the ANN. With that in combination with customized training processes which incorporate gradients \(\partial _{\varvec{C}}\Psi \) in the loss, an a priori thermodynamically consistent training process is possible [41, 44, 58]. Finally, an extension to more general anisotropy classes [13, 41] or dissipative constitutive behavior [44, 56, 59] is needed.