Viscoelasticity with physics-augmented neural networks: Model formulation and training methods without prescribed internal variables



Introduction
Despite a large number of existing classical constitutive models for nonlinear elastic and inelastic materials [1,2], the description of novel materials with complex constitutive behavior remains a challenging task. The choice of a suitable model and the identification of the corresponding material parameters is time-consuming and does not necessarily lead to results with the desired accuracy, so that the development of new specialized models may be necessary. For this reason, a new class of approaches has emerged in recent years, which are often referred to as data-driven or data-based methods [3,4]. In particular, the application of artificial neural networks (NNs) [5] has become popular and has the potential to replace conventional modeling strategies under certain circumstances. As universal function approximators, they are capable of expressing complex relations that are difficult to grasp with manual modeling efforts, provided that the training is successful.

Constitutive modeling with neural networks
Historically, the pioneering work of Ghaboussi et al. [6] from the beginning of the 1990s is particularly noteworthy. In this work, NNs, specifically feedforward neural networks (FNNs), are used for the first time to predict hysteresis in uniaxial and multiaxial stress states. Therein, the FNN is fed with information from several previous time steps to enable it to learn history-dependent behavior. In Furukawa and Yagawa [7], a constitutive model based on an FNN is presented that can be used to train uniaxial viscoplastic behavior. However, this requires internal variables in the training process, whose experimental identification is also described. Despite the rather simple nature of these models from today's point of view, approaches of this kind indicate the potential of data-driven constitutive modeling. Without having to decide on a specific model, it is possible to learn complex material behavior. With the recent rise in popularity of NNs and the associated rapid progress in efficiency and accessibility, many different methods have emerged in an extremely short time, extending and improving these approaches. For example, the works [8-12] propose advanced techniques using FNNs, while [11-15] make use of recurrent neural networks (RNNs), which are capable of making predictions based on past events due to their internal structure [16].
Thus, recurrent architectures are an appealing way to model inelastic behavior, since the provision of history variables or internal variables can be avoided [17]. In particular, the development of advanced RNN cells such as long short-term memory (LSTM) [18] or gated recurrent units (GRUs) [19], which provide increased memory capacity and enable efficient training, has led to great popularity and rapid progress in this field.
What remains a challenge for NN-based approaches is the lack of a fundamental inclusion of physics, especially the second law of thermodynamics. The networks map input quantities directly to the variable of interest, for example the stress, and thus approximate only the phenomenological relationship between input and output [6,7]. Compared to most classical constitutive models, the physical motivation is completely missing. This has some significant downsides. Such models, often denoted as black box models, cannot guarantee robustness of the predictions beyond the training range covered by the used training data. Exceeding this range may lead not only to wrong but also to physically implausible predictions [20,21]. Furthermore, the training process is entirely driven by the training data and is not supported by existing knowledge from classical constitutive modeling. This may lead to more complex optimization problems that exhibit gradient conflicts and require more training data [21]. For this reason, it seems natural to integrate the existing knowledge from continuum mechanics and constitutive modeling into the NN architecture to combine the advantages of both concepts. Such NN-based constitutive models, and scientific machine learning approaches in general, are labeled as physics-informed [22,23], mechanics-informed [24], physics-augmented [25], physics-constrained [26], thermodynamics-informed [27] or thermodynamics-based [21]. These approaches have been intensively pursued for a few years with great success in constitutive modeling for elastic [20,23,25,26,28-30], elastoplastic [21,27,31-36] or viscoelastic behavior [36-40].
Thereby, a distinction must be made between methods with weak fulfillment of the physical principles via an additional term in the loss function and methods with strong fulfillment, i.e., a priori compliance with the respective principle by constraining the architecture of the network [41]. According to the comparative study presented in Rosenkranz et al. [17], the second approach is more promising: it is more efficient in terms of required data, robust, and can extrapolate very well due to the high degree of incorporated physics, but it involves some difficulties. The challenge here is to efficiently restrict the network without losing too much flexibility.
For elastic and especially hyperelastic materials, many works have been published on that topic, e.g., [20,24,26,28,29,42-44], among others, in which different requirements for a constitutive model are incorporated in a strong sense. Thereby, an elastic potential is approximated by an FNN with the deformation or strain invariants as input. To allow the calibration of the network directly by tuples of stress and strain, the derivative of the potential with respect to the deformation, i.e., the network's input, is included in the loss, which is also called Sobolev training [20,45]. With easily accessible implementations of automatic differentiation [46] in popular libraries like TensorFlow, PyTorch or JAX, this is no longer a major difficulty and is used in a wide variety of research areas [22,47]. Furthermore, polyconvex NN models [20,29,48] have been formulated by using fully input convex neural networks (FICNNs) as introduced by Amos et al. [49]. Also, parametrized polyconvex models [50] have been formulated with partially input convex neural networks (PICNNs) [49].
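To make the Sobolev idea concrete, the following minimal 1D sketch calibrates a potential only through its strain derivative. The quartic polynomial `psi` and its parameters are hypothetical stand-ins for a potential FNN, and central finite differences stand in for the automatic differentiation used in practice:

```python
import math

# psi is a stand-in for the potential FNN; a and b play the role of weights
def psi(eps, a, b):
    return a * eps**2 + b * eps**4

def stress(eps, a, b, h=1e-6):
    # Sobolev idea: the model output entering the loss is sigma = d(psi)/d(eps);
    # central differences stand in for automatic differentiation
    return (psi(eps + h, a, b) - psi(eps - h, a, b)) / (2.0 * h)

def sobolev_loss(data, a, b):
    # only (strain, stress) tuples enter the loss -- the potential itself is
    # never observed, it is calibrated through its derivative
    return sum((stress(e, a, b) - s) ** 2 for e, s in data) / len(data)

# synthetic data from a "true" material with a = 2, b = 0 (sigma = 4*eps)
data = [(n / 100.0, 4.0 * n / 100.0) for n in range(-5, 6)]
```

With this construction, the loss vanishes for the parameters of the "true" material, although the potential values themselves never appear in the data.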
Regarding NN-based constitutive modeling of inelastic behavior with a strong physical background, a variety of works have been published meanwhile. Thereby, approaches applying the concept of internal variables have shown to be particularly well suited. Remarkable works on the NN-based modeling of plasticity in recent years are, for example, [21,27,31-33,51-53], among others. In many of these works, however, the thermodynamic consistency is only fulfilled in a weak sense by adding a penalty term to the loss, or internal variables must be known prior to training. When using the models in multiscale problems, internal variables can be determined, e.g., by autoencoders [54,55], but in real experimental setups the determination might not be practical. Thus, there are still some challenges to overcome, in particular with regard to the fulfillment of physical conditions by construction or the provision of internal variables during training. Only the elasto-plastic NN models [32,33] a priori fulfill the second law of thermodynamics and at the same time do not require internal variables in the training data set. A pioneering NN-based approach to model viscoelastic behavior is presented in [56]. Thereby, thermodynamic consistency is ensured by using a convex dissipation potential, but internal variables are required for training. Several approaches based on a similar modeling strategy can be found in [17,39,44,57]. In contrast to [56], however, no prior knowledge of the internal variable(s) is needed for training there.

Objectives and contributions of this work
As outlined above, NN-based constitutive models for viscoelastic problems using internal variables and accounting for fundamental physics such as the second law of thermodynamics have so far received comparatively little attention. In addition, the techniques for providing internal variables during training are not satisfactory in every case, as either a high computational effort is required or flexibility and accuracy are not sufficient. Thus, within this contribution, we present a physics-augmented NN model for viscoelastic materials and an efficient training method which only requires stress and strain paths for training. The model is built on the concept of generalized standard materials and is therefore thermodynamically consistent by construction. It consists of a free energy and a dissipation potential, which can be expressed either by the coordinates of their tensor arguments or by a suitable set of invariants. The two potentials are expressed by FICNNs/PICNNs [49]. For the training of the NN model by paths of stress and strain, an efficient and flexible training method based on an LSTM cell is developed to automatically provide the internal variable(s) during the training process. The focus of this work is on a comprehensive benchmark test of the proposed method and a comparison with existing approaches. These include a method that obtains the internal variable by integrating the evolution equation over the entire sequence [37], while the other method uses an auxiliary FNN for the internal variable(s) [24]. Databases for training are generated by using a conventional nonlinear viscoelastic reference model, where 3D and 2D plane strain data with either ideal or noisy stresses are generated, respectively. The coordinate-based and the invariant-based formulations are compared and the three training methods are applied.
The article is organized as follows: After a short summary of the concept of generalized standard materials in Sec. 2, the NN-based model is described in Sec. 3. In Sec. 4, three methods to calibrate the NN model are explained. Subsequently, the presented model and the training methods are tested and compared within numerical examples in Sec. 5 using different data sets with ideal or noisy stress data. After a discussion of the results, some closing remarks are given in Sec. 6.
Notation. Throughout this work, several important quantities are tensors of different ranks. The space of tensors of rank n ≥ 1 is denoted as L_n. Especially, the cases n = 2 and n = 4, i.e., tensors of rank two and four, are of importance herein. A tensor of rank two is denoted with upright bold symbols as T = T_ij e_i ⊗ e_j ∈ L_2 and a tensor of rank four with blackboard bold symbols as 𝕋 = T_ijkl e_i ⊗ e_j ⊗ e_k ⊗ e_l ∈ L_4. Therein, the Einstein summation convention is used, ⊗ is the dyadic product and e_i ∈ L_1 is the i-th Cartesian basis vector. For tensors of rank two, only symmetric second-order tensors S ∈ S_2 ⊂ L_2 are relevant in the scope of this work, with S_2 = { S ∈ L_2 | S = S^⊤ } and S^⊤ = S_ji e_i ⊗ e_j denoting the transpose of S. For tensors of rank four, the subset S_4 ⊂ L_4 of tensors with major and minor symmetries is defined as S_4 = { 𝕋 ∈ L_4 | T_ijkl = T_jikl = T_ijlk = T_klij }. Some special fourth-order tensors required here are the fully symmetric fourth-order identity tensor I^S = ½ (δ_ik δ_jl + δ_il δ_jk) e_i ⊗ e_j ⊗ e_k ⊗ e_l, the spherical projector I^K = ⅓ δ_ij δ_kl e_i ⊗ e_j ⊗ e_k ⊗ e_l and the deviatoric projector given by I^D = I^S − I^K. In the above definitions, δ_ij denotes the Kronecker delta and ·· denotes double contraction of adjacent indices. Furthermore, tr(T) = T_ii is the trace of T ∈ L_2, the square of a tensor is T² = T_ik T_kj e_i ⊗ e_j, and the Frobenius norm of T is denoted by ∥T∥ in this work and is defined as ∥T∥ = √(T ·· T^⊤).

Generalized standard materials
The concept of generalized standard materials (GSMs) [58-65] allows the formulation of thermodynamically consistent constitutive models that are entirely described by two scalar-valued functions. In the context of this work, thermodynamic consistency is equivalent to satisfying the Clausius-Duhem inequality

D = σ ·· ε̇ − ψ̇ ≥ 0 , (1)

where D is the dissipation rate, σ ∈ S_2 is the stress, ε̇ ∈ S_2 is the strain rate, and ψ is the Helmholtz free energy density, or free energy for short. This free energy ψ(ε, q_i) depends on the current strain ε ∈ S_2 and a set of internal variables q_i ∈ S_2. The stress σ and the internal forces τ_i ∈ S_2 are obtained by differentiating the free energy with respect to the strain or the internal variables, respectively, i.e.,

σ = ∂ψ/∂ε and τ_i = −∂ψ/∂q_i . (2)

In the context of GSMs, there exists another potential besides the free energy, the so-called dissipation potential φ(q̇_i, q_i, ε), which depends on the rates of the internal variables q̇_i and possibly on the strain and the internal variables themselves. Differentiating the dissipation potential with respect to the rates of the internal variables again yields internal forces

τ̂_i = ∂φ/∂q̇_i , (3)

which are denoted with an additional (ˆ•) to distinguish them from the internal forces τ_i obtained with the free energy. To construct a constitutive model that complies with Ineq. (1), the dissipation potential must satisfy all of the following three requirements: (i) φ(q̇_i, q_i, ε) must be convex in all q̇_i, (ii) φ(q̇_i = 0, q_i, ε) = 0 and (iii) φ(q̇_i, q_i, ε) ≥ 0.
The material law is formulated with the Biot equation [66], which relates both potentials via

∂ψ/∂q_i + ∂φ/∂q̇_i = 0 , i.e., τ_i = τ̂_i . (4)

Evaluating Eq. (4) gives rise to the evolution laws for the internal variables. The stated conditions on φ automatically construct these evolution laws such that Ineq. (1) is satisfied. This framework of GSMs contains various classical constitutive models, including the viscoelastic reference model in Sec. 5.1.1 that is used to generate the data for the numerical experiments.
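For intuition, the GSM structure can be made concrete with a 1D standard linear solid; the potentials and parameters below are illustrative choices, not the reference model of this paper:

```python
# 1D generalized standard material sketch (illustrative parameters):
# psi = 0.5*E_eq*eps**2 + 0.5*E_ov*(eps - q)**2,  phi = 0.5*eta*qdot**2
E_eq, E_ov, eta = 1.0, 2.0, 0.5

def tau_psi(eps, q):
    # internal force from the free energy: tau = -d(psi)/dq = E_ov*(eps - q)
    return E_ov * (eps - q)

def qdot_biot(eps, q):
    # Biot equation d(phi)/d(qdot) = tau  =>  eta*qdot = E_ov*(eps - q)
    return tau_psi(eps, q) / eta

def dissipation(eps, q):
    # D = tau * qdot = eta*qdot**2 >= 0: the GSM structure enforces Ineq. (1)
    return tau_psi(eps, q) * qdot_biot(eps, q)
```

Here the evolution law follows from the Biot equation alone, and the dissipation rate is non-negative by construction, regardless of the loading state.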

Formulation of the physics-augmented neural network-based model
Based on the theoretical background from Sec. 2, an NN-based constitutive model for viscoelastic materials is presented. The model adapts the concept of generalized standard materials, where the two potentials are expressed as FNNs with incorporated convexity constraints [20,25,37,39,43,49,50], see App. A. First, the modeling approaches for the free energy and the dissipation potential are described, followed by the prediction process using the adapted free energy and dissipation potential. The training methods to find the weights and biases of the FNNs without requiring the internal variable in the training data set are explained in Sec. 4.

Modeling of the potentials
The potentials ψ and φ can be expressed either in terms of tensor coordinates, i.e., ψ(ε, q) and φ(q̇, q, ε), or with a set of invariants of those tensors, i.e., ψ(I_ψ) and φ(I_φ) [20,57,67]. Since some details must be taken into account in the invariant formulation, this formulation is explained in more detail below. The coordinate formulation is obtained analogously with the corresponding simplifications.
To this end, the free energy is required to fulfill the following conditions: (i) ψ is an isotropic function, (ii) the free energy vanishes in the initial state, i.e., ψ(0, 0) = 0, (iii) the stress vanishes in the initial state, (iv) the internal force vanishes in the initial state, and (v) ψ is convex with respect to ε and q. Condition (v) is used for several reasons. On the one hand, the convexity of ψ results in the positive definiteness of the material tangent ℂ_εε = ∂²ψ/∂ε∂ε, which offers numerical advantages in the application [68]. On the other hand, the resulting restrictions simplify the training of the model. Furthermore, (v) is fulfilled for the used reference model from Sec. 5.1.1. It should also be noted that other works assume the convexity of ψ as well [39,69].
These constraints have to be incorporated into the FNN representation of ψ, where the equilibrium part ψ^eq and the non-equilibrium part ψ^ov are each described by an FNN. These FNNs are denoted as ψ̂^eq and ψ̂^ov, respectively. Requirements (i)-(v) can be satisfied as follows:

(i) In order to ensure an isotropic free energy, ψ^eq and ψ^ov are expressed in terms of invariants of their tensor arguments [20], summarized in the invariant sets I_ψ^eq = (I_k^eq)_{k=1}^{3} (5) and I_ψ^ov = (I_k^ov)_{k=1}^{3} (6), respectively. That is, ψ^eq(ε) becomes ψ^eq(I_ψ^eq) and ψ^ov(p) becomes ψ^ov(I_ψ^ov). The specific choice of the invariant sets is explained in (v), where the convexity properties are discussed.

(ii) In order to ensure ψ(0, 0) = 0, each part ψ^eq(I_ψ^eq(ε)) and ψ^ov(I_ψ^ov(p)) is set to zero in the initial state, i.e., ψ^eq(I_ψ^eq(0)) = 0 and ψ^ov(I_ψ^ov(0)) = 0. These requirements are not incorporated into the network functions themselves. Instead, a correction term is appended to the definitions of ψ^eq(I_ψ^eq) and ψ^ov(I_ψ^ov), that is, ψ^eq(I_ψ^eq) = ψ̂^eq(I_ψ^eq) − ψ_0^eq and ψ^ov(I_ψ^ov) = ψ̂^ov(I_ψ^ov) − ψ_0^ov, where ψ_0^eq = ψ̂^eq(I_ψ^eq(ε = 0)) and ψ_0^ov = ψ̂^ov(I_ψ^ov(p = 0)).

(iii) For the stress to vanish in the initial state, the equilibrium stress ∂_ε ψ^eq and the overstress ∂_ε ψ^ov must vanish. This is enforced with another set of correction terms ψ_σ^eq and ψ_σ^ov, such that ψ^eq(I_ψ^eq) = ψ̂^eq(I_ψ^eq) − ψ_0^eq − ψ_σ^eq(I_ψ^eq) and ψ^ov(I_ψ^ov) = ψ̂^ov(I_ψ^ov) − ψ_0^ov − ψ_σ^ov(I_ψ^ov) [20,24,56]. These correction terms are defined such that, when differentiated with respect to ε, they compensate the error in the initial stress prediction by ψ̂^eq(I_ψ^eq) or ψ̂^ov(I_ψ^ov), respectively. A possible, but inconvenient way to achieve this is to set

ψ_σ^eq(ε) = (∂ψ̂^eq/∂ε)|_{ε=0} ·· ε (7)

and analogously for ψ^ov. However, this leads to a free energy that depends not only on the invariants, but also on the strain tensor itself, which may violate (i). Therefore, ψ_σ^eq(I_ψ^eq) and ψ_σ^ov(I_ψ^ov) may only depend on the invariants. This is achieved with

ψ_σ^eq(I_ψ^eq) = (∂ψ̂^eq/∂I_1^eq)|_{ε=0} I_1^eq and ψ_σ^ov(I_ψ^ov) = (∂ψ̂^ov/∂I_1^ov)|_{p=0} I_1^ov . (8)

Since I_1^eq = tr ε and I_1^ov = tr p are linear functions in ε and p, this does not affect the convexity of ψ.

(iv) The internal force vanishes if −(∂_q ψ^ov)|_{p=0} = 0. Since −∂_q ψ^ov = ∂_ε ψ^ov holds and (∂_ε ψ^ov)|_{p=0} = 0 is already assured with (iii), this requirement is already secured.

(v) Convexity in ε and q can be achieved if two conditions are met: (a) the chosen invariants are convex with respect to ε and q, and (b) ψ̂^eq and ψ̂^ov are convex and non-decreasing in those invariants [29]. Since both ψ^eq(ε) and ψ^ov(p) depend on a single symmetric tensor of rank two, a complete set of invariants is given, e.g., by (tr S, tr S², tr S³), where S ∈ S_2. The third invariant tr S³, however, is not convex in S, as shown in App. B. Therefore, the functional basis is adjusted suitably by replacing tr S³ with the convex invariant tr S⁴. This is allowed, since tr S³ can be expressed in terms of tr S, tr S² and tr S⁴ using the Cayley-Hamilton theorem, so the original basis can be recovered. A comment on the convexity of the used invariants can be found in App. B. Consequently, the invariant bases in Eqs. (5) and (6) allow for the construction of convex functions using non-decreasing fully input convex neural networks (FICNNs)² [20,29,49,70] as described in App. A.1.
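A quick numerical check with arbitrarily chosen diagonal tensors illustrates why tr S³ is traded for tr S⁴: midpoint convexity fails for tr S³ at this pair, while tr S⁴, being convex, satisfies it everywhere:

```python
import numpy as np

def invariants(S):
    # convex invariant basis (tr S, tr S^2, tr S^4) replacing (tr S, tr S^2, tr S^3)
    return np.trace(S), np.trace(S @ S), np.trace(S @ S @ S @ S)

tr3 = lambda S: np.trace(S @ S @ S)
tr4 = lambda S: np.trace(S @ S @ S @ S)

# arbitrarily chosen symmetric tensors for a midpoint-convexity check
A = np.diag([1.0, -2.0, 0.5])
B = np.diag([-1.5, 1.0, 0.0])
M = 0.5 * (A + B)
# tr S^3 violates the convexity inequality tr3(M) <= (tr3(A)+tr3(B))/2 here ...
print(tr3(M), 0.5 * (tr3(A) + tr3(B)))
# ... while tr S^4 satisfies the corresponding inequality
print(tr4(M), 0.5 * (tr4(A) + tr4(B)))
```

A single counterexample like this already rules out convexity of tr S³; the convexity of tr S⁴ itself of course requires the proof referenced in App. B, not a point check.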
Summarizing, the free energy is modeled with an additive decomposition, where each part is a non-decreasing FICNN with additional correction terms and is expressed in a set of three convex invariants, such that

ψ(ε, q) = ψ^eq(ε) + ψ^ov(p(ε, q)) , (10)

where

ψ^eq(ε) = ψ^eq(I_ψ^eq(ε)) = ψ̂^eq(I_ψ^eq) − ψ_0^eq − ψ_σ^eq(I_ψ^eq) and ψ^ov(p) = ψ^ov(I_ψ^ov(p)) = ψ̂^ov(I_ψ^ov) − ψ_0^ov − ψ_σ^ov(I_ψ^ov) . (11)

With that, the free energy representation is physically meaningful, even in the untrained state of the NN model. Details on the hyperparameters of the FICNNs can be found in App. C. To simplify notation, the weights and biases of the two FICNNs are summarized in a single parameter set θ_ψ. The dissipation potential is constructed using similar techniques.

² Since the invariants tr ε and tr p are linear functions of ε and q, the constraints on the corresponding weights in the passthrough layers of the non-decreasing FICNN as described in App. A are in general too restrictive. However, for the NN-based model shown here and the used reference material, this does not affect the prediction quality and is henceforth ignored.
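A minimal sketch of a non-decreasing FICNN with hand-picked, purely hypothetical weights: because all weights are non-negative and softplus is convex and non-decreasing, the scalar output is convex and non-decreasing in each input invariant:

```python
import math

def softplus(x):
    # numerically stable softplus: convex and non-decreasing
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

# hand-picked non-negative weights (hypothetical); non-negativity makes the
# scalar output convex AND non-decreasing in every input invariant
W1 = [[0.3, 0.8, 0.1], [0.5, 0.2, 0.9]]
b1 = [0.1, -0.2]
w2 = [0.7, 0.4]

def ficnn(x):
    h = [softplus(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return sum(w * hi for w, hi in zip(w2, h))
```

In a real implementation, the non-negativity constraints are enforced during optimization (e.g. by projection or reparametrization), and the correction terms ψ_0 and ψ_σ are subtracted from this raw network output.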

Dissipation potential
The NN-based dissipation potential φ depends on (q̇, ε, q) [17]. Similar to the non-equilibrium part of the free energy, the dependency on ε and q is expressed in terms of p = ε − q. The dissipation potential is required to fulfill the following conditions: (a) φ is an isotropic function of its arguments. (b) φ is convex with respect to its first argument, the rate of the internal variable q̇. (c) If the internal variable does not evolve, the dissipation potential vanishes, i.e., φ(q̇ = 0, p) = 0. (d) If the inelastic strain does not evolve, the dissipation potential is at a minimum, i.e., (∂φ/∂q̇)(q̇ = 0, p) = 0.
The respective network to model this function is denoted as φ̂. The stated requirements are enforced with techniques similar to those used to construct the free energy.
(a) Formulating φ in invariants enforces isotropy of the resulting function. The network φ̂(I_φ(q̇, p)) depends on the six invariants

I_φ(q̇, p) = (tr q̇, tr q̇², tr q̇⁴, tr p, tr p², tr p³) . (13)

Here, no mixed invariants between q̇ and p are used.

(b) To enforce convexity with respect to the rate of the internal variable, a non-decreasing partially input convex neural network (PICNN) [49,50] is used, which is convex with respect to the three invariants (tr q̇, tr q̇², tr q̇⁴) and not necessarily convex with respect to (tr p, tr p², tr p³).

(c) Analogous to the free energy, a correction term φ_0 is appended to the definition of φ, such that φ = φ̂ − φ_0. The term φ_0 compensates the offset in φ̂ and is defined as φ_0 = φ̂(I_φ(q̇ = 0, p)).

(d) By adapting the stress correction term for the free energy, a correction term φ_τ is appended, such that φ(I_φ) = φ̂(I_φ) − φ_0 − φ_τ(I_φ).

Summarizing, the dissipation potential is modeled with a non-decreasing PICNN and additional correction terms.
Fig. 1: Structure of the invariant-based two-potential model and prediction process. The rate of the internal variables is determined iteratively so that the internal force τ calculated from the free energy and the internal force τ̂ calculated from the dissipation potential are equal within a specified tolerance.
The dependency on q̇ is expressed with convex invariants of q̇, that is, φ(q̇, ε, q) = φ(I_φ(q̇, p)). The NN model is thus thermodynamically consistent and isotropic by construction. All weights and biases of the PICNN with the hyperparameters from App. C are summarized in the parameter set θ_φ.
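Analogously, a minimal PICNN sketch with hand-picked hypothetical weights: the output is convex and non-decreasing in the "convex" inputs y (the rate invariants), while the x-path (the p-invariants) may use unconstrained weights and nonlinearities:

```python
import math

def softplus(x):
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

# Hand-picked hypothetical weights. Weights acting on y and on previous
# convex-path units are >= 0; the x-path weights are unconstrained.
def picnn(y, x):
    u = math.tanh(0.9 * x[0] - 0.4 * x[1] + 0.2)        # non-convex x-path
    z1 = softplus(0.6 * y[0] + 0.3 * y[1] + 1.5 * u)     # convex path, layer 1
    z2 = softplus(0.8 * z1 + 0.2 * y[0] + 0.5 * u * u)   # convex path, layer 2
    return z2
```

The design choice is exactly the split described above: convexity (and monotonicity) is guaranteed only along the y-arguments, which is all that requirement (b) demands.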

Coordinate formulation
The previous explanations refer to the invariant formulation. For the formulation with coordinates [39], requirements (i) and (a), i.e., the isotropy of the potentials, cannot be fulfilled. In principle, the same techniques are used to comply with the remaining requirements, but instead of convex and non-decreasing NNs, only convex NNs are used, as the tensors themselves are inputs of the networks. The coordinate formulations of the additional terms for zero energy and zero stress in the initial state are briefly explained using the example of ψ^eq: the term ψ_0^eq for zero energy in the initial state is calculated as ψ_0^eq = ψ̂^eq(ε = 0), and the term ψ_σ^eq for the stress-free initial state becomes ψ_σ^eq(ε) = (∂ψ̂^eq/∂ε)|_{ε=0} ·· ε. The other additional terms for ψ^ov and φ are calculated analogously.

Incremental predictions with the trained models
Once the weights and biases of the networks for free energy and dissipation potential have been adapted, the constitutive model can be used to predict the constitutive response to a prescribed strain path consisting of multiple discrete time steps. In order to solve the evolution equation numerically, an implicit Euler scheme is applied for the temporal discretization, in which the rate of the internal variable is approximated as ⁿq̇ ≈ (ⁿq − ⁿ⁻¹q)/ⁿΔt, where the left superscript n corresponds to the n-th time step of a sequence and ⁿΔt = ⁿt − ⁿ⁻¹t is the time increment. In each time step, the rate of the internal variable is adapted iteratively, such that Eq. (4) is fulfilled up to a prescribed tolerance. That is, the internal force calculated from the free energy, i.e., τ = −∂_q ψ, and the internal force calculated from the dissipation potential, i.e., τ̂ = ∂_q̇ φ, are supposed to be identical within a given tolerance. This iterative solution of Eq. (4) is done using a Newton-Raphson scheme, as depicted in Fig. 1.
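The implicit Euler/Newton update can be sketched with the 1D standard linear solid as a toy material (illustrative parameters, not the NN model). For these quadratic potentials the Biot residual is linear in q̇, so Newton converges in one step, but the loop mirrors the general scheme:

```python
# 1D sketch of the implicit Euler / Newton-Raphson update (illustrative
# quadratic potentials: psi = 0.5*E_eq*eps**2 + 0.5*E_ov*(eps - q)**2,
# phi = 0.5*eta*qdot**2)
E_eq, E_ov, eta = 1.0, 2.0, 0.5

def solve_step(eps, q_prev, dt, tol=1e-10, max_iter=20):
    qdot = 0.0
    for _ in range(max_iter):
        q = q_prev + dt * qdot                # implicit Euler for q
        r = eta * qdot - E_ov * (eps - q)     # Biot residual: tau_hat - tau
        if abs(r) < tol:
            break
        qdot -= r / (eta + dt * E_ov)         # Newton update with exact tangent
    q = q_prev + dt * qdot
    sigma = E_eq * eps + E_ov * (eps - q)     # sigma = d(psi)/d(eps)
    return sigma, q
```

For the NN-based potentials, the residual and tangent are obtained by (automatic) differentiation of ψ and φ instead of the closed-form expressions used here.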

Training methods
Since the internal variable and its rate are arguments of the free energy and the dissipation potential, they must be provided in the training process to render training possible.
In the following, however, it is assumed that they are not present in the training data set. For data obtained through homogenization, a few approaches have been developed to enable the determination of internal variables using autoencoders [54,55]. The training data set D = (ε(t), σ(t), q(t)) then comprises sequences of stresses, strains and internal variable(s). In real experiments, however, the determination of internal variables is in general not possible, shrinking the data set D = (ε(t), σ(t)) to sequences of only stresses and strains. This section introduces three methods to address this issue. Two methods from the literature determine the internal variable by integrating the evolution equation over the entire sequence or by means of an additional auxiliary network in the form of an FNN. In the newly developed third method, the auxiliary network is replaced by an RNN.

Integration
The training approach denoted as integration in the following is probably the most intuitive among the three presented and has been applied in [37,57]. In every training epoch, the stress response to individual sequences is determined following the scheme provided in Fig. 1. Starting from the initial state ⁰ε = ⁰σ = ⁰q = 0, the stress ¹σ for the new strain ¹ε and the corresponding time step ¹Δt is calculated. The obtained internal variable is then passed on to the next time step and so forth until the last time step of the sequence is reached. The stresses σ̄ determined this way are compared to the expected stresses σ from the training data set using the loss function

L_σ = MAE(σ̄, σ)/n_σ , (18)

where MAE(σ̄, σ) is the mean absolute error between predicted stress σ̄ and expected stress σ. Compared to the mean squared error, the mean absolute error performed best for the problems in this article and is therefore used within the scope of this work. Furthermore, n_σ is the normalization factor for the stress, which normalizes the loss to magnitude 1. The calculation of n_σ is explained in App. A.3.
To adapt the weights and biases of the networks, the loss function Eq. (18) is minimized in the optimization problem

(θ_ψ, θ_φ) = arg min L_σ subject to θ_ψ ∈ C_ψ and θ_φ ∈ C_φ ,

where the parameters θ_ψ and θ_φ correspond to the weights and biases of the two FICNNs of the free energy and the PICNN of the dissipation potential. They are constrained by the sets C_ψ and C_φ containing restrictions for weights and biases of these ICNNs as described in App. A. Due to the iterative solution scheme, the evaluation of the loss function is very expensive [37]. Hence, an optimizer with fast convergence is favorable. The optimizer Sequential Least Squares Programming (SLSQP) has shown to perform well for this application [17,20,26,71] and is therefore used in the numerical examples in Sec. 5.
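One loss evaluation of the integration method can be sketched as a full rollout over the sequence before the MAE is taken (1D toy material with illustrative parameters; the closed-form implicit Euler step of the quadratic model replaces the Newton iteration for brevity):

```python
# "integration" training: every loss evaluation rolls out the whole sequence
# (1D toy material as above; closed-form implicit Euler replaces Newton)
def rollout_sigma(E_eq, E_ov, eta, eps_seq, dt):
    q, out = 0.0, []
    for eps in eps_seq:
        q = (q + dt * E_ov / eta * eps) / (1.0 + dt * E_ov / eta)
        out.append(E_eq * eps + E_ov * (eps - q))
    return out

def rollout_mae(E_eq, E_ov, eta, eps_seq, sig_seq, dt):
    pred = rollout_sigma(E_eq, E_ov, eta, eps_seq, dt)
    return sum(abs(a - b) for a, b in zip(pred, sig_seq)) / len(sig_seq)

eps_seq = [0.0005 * n for n in range(40)]                 # hypothetical ramp
sig_seq = rollout_sigma(1.0, 2.0, 0.5, eps_seq, 0.05)     # "training data"
```

The cost structure is visible here: the stress at step n depends on all earlier steps, so every candidate parameter set requires a complete re-integration of the sequence.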

Auxiliary feedforward network
To avoid the challenges of the integration method, another method has been proposed by As'ad and Farhat [39] and is adopted with slight modifications herein [17]. This method introduces an additional FNN that is supposed to learn the temporal course of the internal variable. Let q̂(t) denote this additional FNN. It depends on only a single input, the time t, and has six outputs corresponding to the six independent entries of the symmetric tensor q. For the training of the networks for free energy and dissipation potential, values for the internal variable and its rate are taken from this network, where the rate is approximated as ⁿq̇ ≈ (ⁿq − ⁿ⁻¹q)/ⁿΔt to be consistent with the prediction process. The weights and biases of q̂ are adapted such that the predicted temporal course of q̂(t) allows the loss function to become as small as possible. The loss function consists of a term L_σ for the accurate stress prediction and another term L_Biot to comply with the Biot relation Eq. (4) [17]. Thus, the reformulated optimization problem reads

(θ_ψ, θ_φ, θ_q) = arg min (L_σ + L_Biot) subject to θ_ψ ∈ C_ψ and θ_φ ∈ C_φ ,

where the parameter set θ_q contains the weights and biases of the auxiliary FNN as specified in App. C. This leads to a reasonable representation of the internal variable without explicitly specifying its value in advance or having to integrate every sequence in every iteration of training. However, we have found that starting the optimization with randomly initialized weights for q̂ makes the problem too complex for common optimizers and does not lead to the desired results. In order to facilitate the training, the auxiliary network is pre-trained using the strain ε(t) as an initial guess for q(t). Once the pre-training and the subsequent actual training are finished, the auxiliary network is no longer necessary and the prediction process can be carried out using only free energy and dissipation potential.
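The pre-training idea can be sketched as follows, with a cubic polynomial as a stand-in for the auxiliary FNN q(t) and a hypothetical strain history; the least-squares fit to ε(t) plays the role of the pre-training stage:

```python
import numpy as np

# pre-training sketch: a cubic polynomial stands in for the auxiliary FNN q(t)
t = np.linspace(0.0, 2.0, 50)
eps = 0.02 * np.sin(1.5 * t)            # hypothetical strain history eps(t)

coeffs = np.polyfit(t, eps, deg=3)      # "pre-training" = least-squares fit
q_init = np.polyval(coeffs, t)          # initial guess q(t) ~ eps(t)
residual = np.max(np.abs(q_init - eps))
```

After this initialization, the parameters of q(t) would be refined jointly with those of ψ and φ through the combined stress and Biot loss.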
Remark 1: With this architecture, the temporal course of the internal variable for a single sequence can be modeled via a single FNN. However, if data from several sequences are available in the data set, another FNN must be added for each additional sequence, making the optimization problem increasingly difficult.

Auxiliary recurrent network
RNNs have proven to be particularly suitable for describing the behavior of path-dependent systems [11,13,15,51]. Therefore, the auxiliary FNN is now replaced by an RNN. Using an RNN cell to generate the internal variable makes it possible to use multiple sequences for training without requiring a new auxiliary network for every training sequence. We use an LSTM cell [18] as RNN cell. This LSTM cell has two different state vectors that allow the cell to store information from past time steps: the hidden state h and the cell state c, where h is the output of the cell for every time step. The training with the auxiliary RNN works as follows: In each time step, the RNN cell receives the current strain ⁿε, the associated stress ⁿσ from the data set as well as the time increment ⁿΔt. An FNN reduces the output of the RNN cell, the hidden state ⁿh, to six entries corresponding to the six independent entries of ⁿq. The rate of the internal variable ⁿq̇ is calculated using ⁿΔt and ⁿ⁻¹q from the previous time step. Through differentiation of ψ(ⁿε, ⁿq) and φ(ⁿq̇, ⁿε, ⁿq), the stress ⁿσ̄ and the internal forces ⁿτ and ⁿτ̂ are obtained. The state of the LSTM cell, i.e., ⁿh and ⁿc, is passed on to the next time step to obtain the next material state, and so on. Using the calculated stresses and internal forces, the loss function

L_σ = MAE(σ̄, σ) and L_Biot = MAE(τ, τ̂) (24)

is evaluated. This loss function again contains a term for the error in the stress prediction and a term for compliance with the Biot equation. The optimization problem thus reads as in the FNN case, where the weights and biases of the auxiliary LSTM and the connected FNN are summarized in the parameter set θ_q.
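To make the state handling concrete, the following is a minimal single-unit LSTM step with hand-picked hypothetical weights; a real model would use a library LSTM cell with many units, followed by an FNN head mapping h to the six entries of q (one entry in this scalar sketch):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# single-unit LSTM step with hand-picked hypothetical weights
def lstm_step(x, h, c, W):
    f = sigmoid(W['f'] * x + W['uf'] * h)    # forget gate
    i = sigmoid(W['i'] * x + W['ui'] * h)    # input gate
    o = sigmoid(W['o'] * x + W['uo'] * h)    # output gate
    g = math.tanh(W['g'] * x + W['ug'] * h)  # candidate cell state
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

W = {'f': 0.5, 'uf': 0.1, 'i': 0.8, 'ui': 0.2,
     'o': 1.0, 'uo': 0.0, 'g': 0.9, 'ug': 0.3}

h = c = 0.0
for x in [0.1, 0.2, -0.05]:   # e.g. a scalar input feature sequence
    h, c = lstm_step(x, h, c, W)

q = 0.7 * h                    # "FNN head" reducing h to the internal variable
```

The essential point is that h and c are carried across the loop, so the value of q at any step can depend on the entire input history, which is what allows a single auxiliary network to serve all training sequences.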

A Preprint
Fig.: Schematic representation of the training process using the integration strategy. In each time step, the new material state is obtained iteratively with the procedure given in Fig. 3. The internal variable is passed on to every time step as starting point for the next integration step until the last time step of the sequence is reached. Consequently, the calculation of the stress for time step n requires the evaluation of all time steps 1, 2, ..., n − 1 in advance.

Fig.: Schematic representation of the training process using an FNN as auxiliary network for the internal variable. This FNN receives a single input, the time ⁿt, and outputs the six independent entries of ⁿq. To calculate the rate ⁿq̇ ≈ (ⁿq − ⁿ⁻¹q)/ⁿΔt, ⁿ⁻¹q is obtained by evaluating this FNN with ⁿ⁻¹t as input. Note that each time step can be evaluated detached from the others, which allows fast training and the creation of minibatches within a sequence, if required.

Fig.: Schematic representation of the proposed training process using an RNN as auxiliary network for the internal variable. In each time step n, this RNN receives three inputs: the strain ⁿε, the stress ⁿσ and the time increment ⁿΔt. Together with the hidden state ⁿ⁻¹h and cell state ⁿ⁻¹c from the previous time step, the RNN cell processes these data into a new hidden state that contains the information about the new internal variable. This new hidden state is forwarded to an FNN that reduces the dimensionality to six, for the six independent entries of ⁿq. The new hidden state and cell state are passed on to the next time step, until the final time step of the sequence is reached.

Reference material
To generate the training data, a reference model consisting of a spring and a parallel Maxwell element as depicted in Fig. 5 is used. The free energy and the dissipation potential for such a model are defined as

ψ = ½ ε : C_eq : ε + ½ ε_el : C_ov : ε_el (26) and φ = ½ ε̇_in : V(ε_in, ε) : ε̇_in, (27)

where ε_in is defined to be the internal variable, ε_el = ε − ε_in,

C_eq = 3κ_eq I^K + 2μ_eq I^D and C_ov = 3κ_ov I^K + 2μ_ov I^D (28)

are the equilibrium and non-equilibrium stiffness tensors, respectively, and V(ε_in, ε) is a positive semidefinite fourth-order tensor describing the viscous properties of the material. This tensor is not constant, but depends on the overstress σ_ov = C_ov : ε_el; this ansatz is motivated by [72]. The specific material parameters are given in Tab. 1.
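As a concrete illustration of such a spring–Maxwell model, the following one-dimensional sketch integrates the internal variable with an implicit Euler step and evaluates the stress. It uses made-up parameter values (not those of Tab. 1) and a constant viscosity instead of the overstress-dependent V:

```python
import numpy as np

# 1D sketch of the reference model: spring (E_eq) in parallel with a Maxwell
# element (E_ov, eta). A constant viscosity is assumed here for brevity; in
# the paper, the viscosity tensor V depends on the overstress. The parameter
# values are illustrative only.
E_eq, E_ov, eta = 1.0, 2.0, 0.5

def response(eps, dt):
    """Integrate eps_in with implicit Euler and return the stress history."""
    eps_in = 0.0
    sig = np.empty_like(eps)
    for k, e in enumerate(eps):
        # evolution law: eta * d(eps_in)/dt = E_ov * (eps - eps_in)
        eps_in = (eps_in + dt * E_ov / eta * e) / (1.0 + dt * E_ov / eta)
        sig[k] = E_eq * e + E_ov * (e - eps_in)
    return sig

t = np.arange(0.0, 10.0, 0.05)
sig = response(0.02 * np.sin(t), dt=0.05)
```

For a held strain, this model relaxes from the instantaneous stiffness (E_eq + E_ov) toward the equilibrium stiffness E_eq, the characteristic behavior of the generalized Maxwell element.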

Generation of the data base
A data set D = {ε(t), σ(t)} contains information about the temporal course of strain and stress for a single or multiple sequences. To generate such a sequence, the reference material model is exposed to a randomized strain path ε(t) as shown in Fig. 6. This strain path is created with cubic splines that connect a set of randomly sampled knots [17,39]. Starting from t_0^knot = 0 and ε_0^knot = 0, a time increment Δt^knot is sampled from a uniform distribution between Δt^knot_min = 0.2 s and Δt^knot_max = 1 s. The increments Δε^knot of the six independent coordinates of the strain tensor are sampled from a normal distribution with standard deviation s^knot = 0.5 % around mean 0. If the resulting absolute value of a strain coordinate |ε^knot + Δε^knot| exceeds 2 %, the strain increment Δε^knot is sampled again. Once this strain path ε(t) is generated, it is applied to the reference material with random time increments Δt from a uniform distribution between Δt_min = 0.03 s and Δt_max = 0.07 s to obtain the corresponding stresses σ(t).
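The path generator described above can be sketched as follows (function and variable names are our own assumptions):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Sketch of the randomized strain-path generator described in the text.
# One call produces the time grid and strain history of a single sequence.
rng = np.random.default_rng(seed=0)

def random_strain_path(n_knots=12, eps_max=0.02):
    # knot times: increments uniform in [0.2 s, 1 s]
    t_knot = np.concatenate([[0.0], np.cumsum(rng.uniform(0.2, 1.0, n_knots - 1))])
    eps_knot = np.zeros((n_knots, 6))
    for k in range(1, n_knots):
        d_eps = rng.normal(0.0, 0.005, 6)          # std deviation 0.5 %
        while np.any(np.abs(eps_knot[k - 1] + d_eps) > eps_max):
            d_eps = rng.normal(0.0, 0.005, 6)      # resample if |eps| would exceed 2 %
        eps_knot[k] = eps_knot[k - 1] + d_eps
    path = CubicSpline(t_knot, eps_knot)           # one spline per strain coordinate
    # scan the path with random time increments Delta t in [0.03 s, 0.07 s]
    t = [0.0]
    while t[-1] < t_knot[-1]:
        t.append(t[-1] + rng.uniform(0.03, 0.07))
    t = np.array(t[:-1])
    return t, path(t)

t, eps = random_strain_path()
```

The generated strain history would then be fed to the reference model to produce the corresponding stress sequence.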
We investigate the performance of the training methods for ideal stress data σ and for noisy stress data σ̃. To obtain the noisy stress data, Gaussian noise is added to the ideal stress, such that σ̃_n = σ_n + Δσ_n, where Δσ_n is sampled from a normal distribution with standard deviation s_Δσ = 1.5 MPa and mean 0. The ideal and noisy data are shown in Fig. 6.
For the numerical examples, different data sets with different numbers of sequences and time steps per sequence are used with either ideal or noisy data. These data sets are indicated as, for example, D^ideal_5x200 for a data set with 5 sequences of 200 time steps each and ideal stress data. To further indicate a data set that was generated using plane strain paths, i.e., with ε_13 = ε_23 = ε_33 = 0, the data set is marked with a bar as D̄^ideal_5x200. To investigate the performance of the trained models, they are tested on a test strain path that is generated in the same way as the sequences for the training data, that is, by sampling random knots and connecting them with cubic splines. This path is scanned with a constant time increment Δt = 0.05 s for in total 250 time increments.

Results
The presented NN-based constitutive model and training methods are now tested in different scenarios. In the first part of the subsection, it is exploited that the used data is generated synthetically, and it is assumed that the internal variable is given in the data set, i.e., D = {ε(t), σ(t), q(t)} additionally contains information about the internal variable, in order to compare the invariant formulation and the coordinate formulation of the potentials in a simple test case. Afterwards, the internal variable is removed from the data set and the three training methods are compared for the invariant formulation, using both ideal and noisy stress data. For the comparison of the invariant formulation with the formulation using tensor coordinates as inputs of the potentials, a data set D^ideal_5x200 with 5 sequences of 200 time steps each and the corresponding plane strain data set D̄^ideal_5x200 are used. These data sets are sufficiently large to show the full potential of both models and contain the internal variable. The training is carried out using the Adam optimizer with 20,000 epochs, an initial learning rate of 0.01 and an exponential learning rate decay such that the learning rate is multiplied by 0.1 every 4,000 epochs. To reduce the effect of the random weight initialization, both models are trained 25 times for each data set. The models with the lowest final value of the loss function are then tested on the unseen test strain path. The results are shown as correlation plots in Fig. 7a.

Comparison invariants vs. coordinates
For the coordinate formulation, it can be seen that the model is able to make fairly accurate predictions for the test path based on data set D^ideal_5x200. Using data set D̄^ideal_5x200, however, the model fails to extrapolate from the plane strain data to the full strain states. To illustrate this behavior, Fig. 7b shows the actual course of the predicted stresses σ_11, σ_12 and σ_13. It becomes evident that the large deviations from the expected values mainly concern the out-of-plane coordinates, for which the training data set lacks sufficient data.
However, looking at the invariant formulation, it can be seen that it produces very accurate results regardless of the used data set, even exceeding the accuracy of the coordinate formulation with data set D^ideal_5x200. Thus, the invariant formulation enables extrapolation from plane strain data to full strain states without notable loss of accuracy. Such well-developed extrapolation behavior also occurs with elastic NN models based on invariants [20,26]. This benefit arises from the choice of a specific set of invariants that provides additional information about the anisotropy class of the underlying material law to the model. In fact, the change from plane strain to full strain states may not correspond to an extrapolation in the invariant space at all [28]. For this reason, only the invariant formulation is used in the following studies, and the data set is reduced to one sequence of length 200.

Training without given internal variable
The internal variable is now removed from the data set and has to be generated during the training process. To do so, the methods described in Sec. 4 are applied, where the data sets D^ideal_1x200 and D^noisy_1x200 are used to compare the methods. Again, all models are trained 25 times. The results in Fig. 8 are generated using the model with the lowest final value of the loss function.

Integration
The integration training method yields highly accurate results for the ideal data set and outperforms the other methods in terms of accuracy. However, it should be noted that this method uses the SLSQP optimizer for computational reasons, which has been shown to provide better results for smaller networks than Adam [71]. Since no additional network is required for this method, SLSQP can be used in a computationally efficient manner with fewer function evaluations compared to Adam. Nevertheless, the evaluation of the loss function consumes the majority of the computational time for training, since an iterative solution is required at each time step, regardless of the optimizer, see Fig. 2. Adding another sequence to the data set or doubling its length roughly doubles the cost of evaluating the loss function, making it more and more inefficient. However, very good results can be achieved with both the ideal and the noisy training data set, as Figs. 8 and 9 show.

Figure 7: Comparison of the formulation with invariants and coordinates as inputs to the networks: (a) shows correlation plots for the prediction results of both formulations using the data sets D^ideal_5x200 (full strain states) and D̄^ideal_5x200 (plane strain states) with given internal variable. (b) shows the stress response σ(t) for the coordinate formulation with the data set D̄^ideal_5x200 to emphasize the lacking extrapolation capability for the out-of-plane coordinates on the example of σ_13.
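The per-step iteration that dominates the cost of the integration method can be made concrete with a hypothetical scalar sketch. Quadratic placeholder potentials and made-up parameter values stand in for the trained networks; each step enforces the Biot equation with a backward-Euler rate before the stress is evaluated:

```python
import numpy as np

# Scalar sketch of the integration strategy with placeholder potentials
# psi = 0.5*E_eq*eps**2 + 0.5*E_ov*(eps - q)**2 and phi = 0.5*eta*q_dot**2.
# Each step enforces the Biot equation d(psi)/dq + d(phi)/d(q_dot) = 0 with
# a backward-Euler rate q_dot = (q - q_old)/dt.
E_eq, E_ov, eta = 1.0, 2.0, 0.5

def integrate(eps, dt, n_iter=5):
    q, sig = 0.0, np.empty_like(eps)
    for k, e in enumerate(eps):
        q_old = q
        for _ in range(n_iter):
            # residual -E_ov*(e - q) + eta*(q - q_old)/dt = 0, solved for q;
            # exact after one pass here, but a trained network needs iteration
            q = (eta / dt * q_old + E_ov * e) / (eta / dt + E_ov)
        sig[k] = E_eq * e + E_ov * (e - q)  # stress from d(psi)/d(eps)
    return sig

sig = integrate(0.02 * np.ones(200), dt=0.05)  # stress relaxation under held strain
```

Because step k needs the converged q of step k − 1, the whole sequence must be swept for every loss evaluation, which is exactly why the cost grows linearly with the number of time steps.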
Auxiliary feedforward network

Although this training method shows the least accurate predictions for both ideal and noisy training data, the results are still sufficient in both cases. In addition, the computational time per iteration is very low. However, before the actual training starts, the auxiliary FNN must be pre-trained to a reasonable initial guess for q(t) in order to have good starting values for the weights and biases. Using randomly initialized parameters at the beginning of the actual training did not yield useful results in the numerical experiments shown here. We used q(t) = ε(t) as initial guess to pre-train the FNN. Another major drawback of this method is the increasing number of trainable parameters for a longer sequence or additional sequences in the training data set. Since a single FNN only predicts the temporal course of one sequence, every new sequence requires a new FNN, making the optimization more and more complex. For a single sequence of moderate length, however, the method can yield useful results, see Figs. 8 and 9.
Auxiliary recurrent network

The RNN as auxiliary network for the generation of the internal variable also provides precise results with both data sets. Compared to the integration method, the training is very fast, and in contrast to the FNN, it does not need pre-training.
Moreover, using a training data set with several sequences instead of only one does not increase the number of trainable parameters, as the RNN implicitly learns how the internal variable evolves instead of memorizing its temporal course. Thus, we consider a test case which is more complex than those of the works [37,39,57], where only one multiaxial or even uniaxial path has been used for training. To do so, several sequences with multiaxial states are added to the training data set, i.e., D^ideal_100x100 is used. The prediction results for the architecture trained with the set D^ideal_100x100 are shown in Fig. 14. As can be seen, the stress response for the unseen strain path is even more precise, while the required time per training iteration remains short. Although the data set contains 50 times the number of time steps, the time per iteration increases from 0.059 s to only 0.112 s. This shows that the RNN method allows the efficient use of multiple sequences without an increasing amount of trainable parameters and within reasonable training time.

Conclusions
This paper proposes a fast and widely applicable method to calibrate inelastic constitutive models without prior knowledge of the internal variables. This method is compared comprehensively with two existing methods. For this comparison, we propose and use a physics-augmented NN-based model for viscoelastic materials based on the concept of GSMs.
In the beginning of this work, the concept of GSMs is briefly described. Subsequently, the NN-based model is presented, which consists of two potentials: the free energy and the dissipation potential. These potentials are constrained in a physically meaningful way so that thermodynamic consistency is ensured in advance. The potentials can be expressed using either tensor coordinates or invariants of the tensor arguments. Following the presentation of the NN-based constitutive model, three training methods are introduced. These comprise a method that integrates entire sequences and two methods that use an additional NN, either an FNN or an RNN, to provide the internal variable.
Based on these techniques, numerical experiments are conducted using synthetically generated data from classical constitutive models. First, it is assumed that the internal variable is present in the training data set. Using this data set, the invariant formulation is compared to the coordinate formulation. It is shown that the invariant formulation requires less data, yields more accurate results and exhibits far better extrapolation capabilities when predicting stresses for arbitrary strain paths with plane strain training data. Hence, only the invariant formulation is used in the subsequent studies. In order to compare the three training methods, the internal variable is erased from the data set and the performance of the methods is examined using only stress and strain data. It turns out that all three methods are able to provide models that yield accurate predictions for unseen data, even for a small data set with noisy stress data. However, only the proposed RNN model achieves fast and accurate results with large data sets comprising many sequences and is therefore the method with the broadest applicability. Thus, this paper proposes a flexible constitutive model for viscoelastic materials and an efficient method for calibrating such models without prior knowledge of the internal variable. The presented training method exceeds the application possibilities of existing approaches and can form the basis for future extensions in the context of NN-based data-driven constitutive modeling.
However, this study also has limitations. Specifically, the presented algorithms assume that the constitutive behavior can be modeled with a single internal variable expressed by a symmetric second-order tensor. While this assumption is valid for the synthetically generated data used herein, it may not be sufficient for more complex behavior. In this case, the number of internal variables can be increased stepwise until the desired accuracy is achieved. Possible future studies therefore include the implementation and testing of such algorithms, the application to elastoplastic materials [27,32], the extension to viscoelasticity at finite strains [39], as well as the use of experimental [38] or homogenization data [26].

A Neural Networks
This section describes the concept of input convex feedforward neural networks and the normalization of inputs and outputs. In order to provide a compact explanation, some symbols are introduced: The network function is denoted as f(x) and depends on an input vector x. The output z_l of the layer l is calculated using the activation function a_l, the biases b_l and the weights W_l connecting this layer to the previous layer l − 1. The following concepts are mainly based on Amos et al. [49], with extensions to functions that are not only convex with respect to the inputs x, but also with respect to some u in a network f(x(u)), by making the network non-decreasing in x [20,50].

A.1 Fully input convex neural networks
A fully input convex neural network (FICNN) is an FNN that is convex in all of its arguments. There are several ways to construct such an FNN, but not all variants are equivalent. Consider, for example, a normal FNN with non-negative weights across all layers as well as convex and non-decreasing activation functions. Such an FNN is convex, but also extremely restrictive. Therefore, the more flexible approach of Amos et al. [49], shown in Fig. 15, is adopted. It is characterized by two different sets of weights: the weights in W^z_l connect the layer l − 1 with the layer l, and the passthrough weights in W^x_l connect the input x with the layer l. The outputs of the layers l ∈ (1, . . . , L) are now calculated as

z_1 = a_1(W^x_1 x + b_1) and z_l = a_l(W^z_l z_{l−1} + W^x_l x + b_l) ∀ l ∈ (2, . . . , L). (32)

The network f(x) = z_L is convex if the following three conditions are fulfilled: (i) all weights in W^z_l are non-negative, (ii) all a_l are convex, and (iii) all a_l are non-decreasing. This approach offers greater flexibility, since the weights W^x_l remain unconstrained, as shown in Fig. 15(a). A common choice for a convex and non-decreasing activation function is the softplus function SP(x) = ln(1 + e^x). So far, the network has been constructed in such a way that it is convex with respect to its direct inputs. However, this must not always be sufficient. For example, in the context of this work, networks are to be implemented that are convex with respect to a tensor, although the network inputs are invariants. The arbitrariness of the W^x_l follows from the fact that the second derivatives of W^x_l x with respect to x vanish and consequently do not have any influence on the convexity in x of the network function f(x). This changes if f is not supposed to be convex in its inputs, but in another variable u, on which the inputs depend. That is, f(x(u)) is supposed to be convex in u. Then, the second derivatives of W^x_l x(u) with respect to u do not vanish if x(u) is a nonlinear function of u. Consequently, further restrictions on the weights arise, see Fig. 15(b): besides the conditions (i)-(iii), it must also hold that (iv) all entries in x(u) are convex functions of u, and (v) all weights in W^x_l are non-negative [29].
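Conditions (i)-(iii) can be checked numerically with a minimal sketch (layer sizes and variable names are our own assumptions); the midpoint inequality f((x + y)/2) ≤ (f(x) + f(y))/2 must hold for any pair of inputs:

```python
import numpy as np

# Minimal FICNN forward pass: non-negative weights on the z-path, unrestricted
# passthrough weights, softplus activations (convex and non-decreasing).
rng = np.random.default_rng(1)

def softplus(x):
    return np.logaddexp(0.0, x)  # ln(1 + e^x), numerically stable

n_in, n_h = 4, 8
W1x, b1 = rng.normal(size=(n_h, n_in)), rng.normal(size=n_h)  # layer 1: passthrough only
Wz = np.abs(rng.normal(size=(1, n_h)))                        # condition (i): W^z >= 0
W2x, b2 = rng.normal(size=(1, n_in)), rng.normal(size=1)      # passthrough, unrestricted

def ficnn(x):
    z1 = softplus(W1x @ x + b1)
    return softplus(Wz @ z1 + W2x @ x + b2).item()

# numerical midpoint check of convexity for one random pair
x, y = rng.normal(size=n_in), rng.normal(size=n_in)
gap = 0.5 * (ficnn(x) + ficnn(y)) - ficnn(0.5 * (x + y))      # should be >= 0
```

Flipping the sign constraint on Wz breaks the guarantee, which is a quick way to see that condition (i) is doing real work.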

A.2 Partially input convex neural networks
A partially input convex neural network (PICNN) is, in contrast to an FICNN, only convex in some of its inputs.
Let f(x^∪, x^∽) denote the network that is convex in x^∪, but not necessarily convex in x^∽. The architecture of such a network is shown in Fig. 16. The figure shows the two different paths corresponding to the convex (bottom) and the non-convex part (top). The non-convex part with the layer outputs u_l is a regular FNN, i.e., each layer is connected only to the previous layer of the non-convex path. A layer in the convex path with layer outputs z_l, however, depends not only on the previous layer of the convex path, but also on the output of the previous layer in the non-convex path and on the convex inputs. The outputs of the layers in the non-convex path are defined as

u_l = a^∽_l(W^∽_l u_{l−1} + b^∽_l),

and for the convex path as

z_l = a^∪_l(W^z_l (z_{l−1} ∘ a^{zu}_l(W^{zu}_l u_{l−1} + b^{zu}_l)) + W^{x∪}_l (x^∪ ∘ (W^{x∪u}_l u_{l−1} + b^{x∪}_l)) + W^u_l u_{l−1} + b_l),

where ∘ denotes the element-wise product.

Figure 16: The PICNN architecture is both convex and non-decreasing in x^∪, but allows arbitrary functional relations in x^∽. For this purpose, the PICNN is divided into two paths, one for the convex part (z_l) and one for the non-convex part (u_l). The non-convex path u_l is independent of the convex path z_l, while the layers in the convex path take into account both the output of the non-convex path as well as multiplications between z_l, u_l and the convex input x^∪. The final output is the last layer in the convex path. The illustration is based on Klein et al. [50].
The final output of the network is the last layer of the convex path, i.e., f(x^∪, x^∽) = z_L. Instead of two sets of weights, a PICNN comprises six different sets of weights (W^z_l, W^{zu}_l, W^{x∪}_l, W^{x∪u}_l, W^∽_l, W^u_l) to take into account possible products between the two paths and the convex input. The network is convex in x^∪ if the following four conditions are fulfilled: (i) all weights in W^z_l are non-negative, (ii) all activations a^∪_l are convex, (iii) all activations a^∪_l are non-decreasing, and (iv) all activations a^{zu}_l map to non-negative values. All other weights may as well take negative values, and the activations a^∽_l and a^{x∪u}_l can be chosen arbitrarily. A valid choice for a^∪_l is, as for the FICNN, the softplus function, which additionally is non-negative and thus can be used as a^{zu}_l as well. The mentioned conditions yield a network f(x^∪, x^∽) that is convex in all entries of x^∪. As already discussed for the FICNN, this is not always sufficient. If a network f(x^∪(u), x^∽) is supposed to be convex in u, and x^∪ is not the first function acting on u, further restrictions on the weights and activations arise. f(x^∪(u), x^∽) is convex in u if, in addition to (i)-(iv), it holds that: (v) all entries of x^∪ are convex functions of u, (vi) all weights in W^{x∪}_l are non-negative, and (vii) all a^{x∪u}_l map to non-negative values. These additional restrictions are only necessary if the entries of x^∪ are nonlinear functions of u, i.e., if the second derivatives do not vanish. Otherwise, the standard architecture is also valid.
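A minimal PICNN sketch in the spirit of Amos et al. [49] (sizes, names and the reduced layer structure are our own assumptions) makes conditions (i)-(iv) concrete; convexity in x_c at a fixed non-convex input x_nc can again be verified with the midpoint inequality:

```python
import numpy as np

# Minimal PICNN sketch: convex in x_c, arbitrary in x_nc.
rng = np.random.default_rng(2)
softplus = lambda x: np.logaddexp(0.0, x)

nc, ncv, nh = 3, 4, 8          # non-convex input, convex input, hidden width
Wu, bu = rng.normal(size=(nh, nc)), rng.normal(size=nh)
Wx1, Wu1, b1 = rng.normal(size=(nh, ncv)), rng.normal(size=(nh, nh)), rng.normal(size=nh)
Wg, bg = rng.normal(size=(nh, nh)), rng.normal(size=nh)
Wz = np.abs(rng.normal(size=(1, nh)))                          # (i): W^z >= 0
Wx2, Wu2, b2 = rng.normal(size=(1, ncv)), rng.normal(size=(1, nh)), rng.normal(size=1)

def picnn(x_c, x_nc):
    u1 = np.tanh(Wu @ x_nc + bu)                  # non-convex path, unrestricted
    z1 = softplus(Wx1 @ x_c + Wu1 @ u1 + b1)      # convex in x_c (affine inside softplus)
    gate = softplus(Wg @ u1 + bg)                 # (iv): non-negative gating from u-path
    z2 = softplus(Wz @ (z1 * gate) + Wx2 @ x_c + Wu2 @ u1 + b2)  # (ii), (iii): softplus
    return z2.item()

x_nc = rng.normal(size=nc)
x, y = rng.normal(size=ncv), rng.normal(size=ncv)
gap = 0.5 * (picnn(x, x_nc) + picnn(y, x_nc)) - picnn(0.5 * (x + y), x_nc)
```

Because the gate and the u-path terms do not depend on x_c, they only rescale or shift convex functions of x_c with non-negative factors, which is exactly why conditions (i)-(iv) suffice.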

A.3 Normalization
The training process of neural networks is typically more stable and efficient if the inputs and outputs of the network are values of magnitude 1. In particular, since the free energy and the dissipation potential take on large numerical values in the range > 10^6, normalization is essential for successful training. Normalization of a known quantity (•) to the range (−1, 1) ∋ (•̄) can be carried out with

(•̄) = m_(•) (•) + n_(•) with m_(•) = 2/((•)_max − (•)_min) (37) and n_(•) = −((•)_max + (•)_min)/((•)_max − (•)_min), (38)

where (•̄) represents the normalized quantity and (•)_max and (•)_min are the maximum and minimum values of (•) across the whole training data set. For the problem described here, the inputs ε, q, q̇, t and Δt or the respective invariants have to be normalized, as well as the outputs ψ_eq, ψ_ov and φ. The difficulty for the proposed model arises from the fact that neither q and q̇ as inputs nor ψ_eq, ψ_ov or φ as outputs are known before the training; only ε(t), σ(t) and t are known in advance. For this reason, the order of magnitude of the expected values for the unknown variables is estimated based on the available variables. The normalization parameters m and n are therefore determined on the basis of the following assumptions: the internal variable q and the variable p = ε − q are normalized with the values of ε, i.e., m_q = m_p = m_ε, while the rate q̇ is normalized with ε̇. The normalization parameters for the potentials are defined to be m_ψeq = m_ψov = m_φ = m_ε · m_σ, where the choice of the corresponding offsets n exploits the fact that ψ_eq, ψ_ov, φ ≥ 0. Using these normalization parameters, the network itself maps only values of magnitude around 1 to values of magnitude 1, which enables efficient training. Note that, due to the linear nature of the transformation Eq. (37), the convexity properties of the networks are preserved.
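The min-max map itself, applied to a quantity that is known in advance, can be sketched as follows (function names are our own):

```python
import numpy as np

# Linear min-max normalization to (-1, 1); the parameter names m and n follow
# the normalization scheme described above.
def norm_params(x):
    x_min, x_max = x.min(), x.max()
    m = 2.0 / (x_max - x_min)
    n = -(x_max + x_min) / (x_max - x_min)
    return m, n

sig = np.array([0.0, 2.5e6, 5.0e6])   # e.g. potential values of magnitude > 1e6
m, n = norm_params(sig)
sig_bar = m * sig + n                  # normalized to the range [-1, 1]
```

Since the map is affine, it neither affects the convexity of the normalized potentials nor the location of their minima.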

Figure 5 :
Figure 5: Rheological model of the viscoelastic reference solid with a single internal variable q = ε_in.
Figure 6: A strain path ε(t) from the training data set with the respective ideal stress response σ(t) for the data sets D^ideal, and noisy data for the data sets D^noisy. The curve ε(t) is generated using cubic splines connecting a set of randomly sampled knots, indicated as red crosses.

Figure 8: Comparison of the training methods with correlation plots for the results of the stress prediction for the unseen test path. The models were trained with the data sets D^ideal_1x200 and D^noisy_1x200, respectively. For the noisy data, the reference stress refers to the underlying ground truth, i.e., the noiseless data.

Figure 14 :
Figure 14: Correlation plot for the prediction results for the unseen test path using the RNN training method with the data set D^ideal_100x100, i.e., with 100 sequences.

Table 1 :
Material parameters for the viscoelastic reference material that is used to generate the training data.
The concrete architecture of the LSTM and the FNN can be found in Sec. C. Finally, after training, the RNN and the attached FNN can be discarded, leaving ψ and φ as the constitutive model.

Table 2 :
Calculation time comparison of different training methods for the model for one epoch.

Table 3 :
Advantages and disadvantages of the different training methods.

Summary

Each of the analyzed techniques exhibits certain advantages compared to the others. In order to provide a comprehensive summary of this comparison, the advantages and disadvantages of each approach are listed in Tab. 3. The integration training is very accurate. However, the biggest disadvantage of this method is the training time, which also affects the applicability to large data sets with many sequences, such that this combination is not practically manageable for cost reasons. Using an FNN as an auxiliary network is extremely fast, but requires a pre-training of the FNN. It is also limited to one or very few sequences. In contrast, the RNN is fast, does not require pre-training, and can be applied to large data sets with multiple sequences without further adjustments. This feature is particularly relevant for real-world applications with experimental or homogenization data.
Figure 15: Two types of FICNN architectures: the architecture in (a) is convex with respect to all entries of the input vector x, with unrestricted weights in the passthrough layers. The architecture in (b) is constructed such that it is convex and non-decreasing in x(u), making it convex in u. This affects the weights in the passthrough layers, which may no longer take negative values. This significantly limits the effectiveness of the passthrough layers, but enforces convexity in u. The illustrations are based on Klein et al. [50].

Table 4 :
Hyperparameters that were used for the NNs of the potentials.