1 Introduction

Electro Active Polymers (EAPs) have emerged as a category of intelligent materials capable of undergoing substantial changes in shape in response to electrical stimuli [9, 32, 53,54,55]. Among these, dielectric elastomers are particularly noteworthy due to their remarkable actuation capabilities, encompassing attributes such as light weight, rapid response times, flexibility, and low stiffness properties. Notably, these materials can undergo electrically induced substantial strains (with reported area expansions of up to 1962% [37] as observed in research conducted at Harvard’s Suo Lab). Their potential is exceptionally promising, with applications spanning bio-inspired robotics [8, 11, 29, 36, 47], humanoid robotics, and advanced prosthetics [9, 40, 68], as well as implications for tissue regeneration [48].

The realm of nonlinear continuum mechanics has reached an advanced stage of development, encompassing the variational formulation, finite element implementation, and principles related to the constitutive modeling of EAPs [6, 7, 16, 17, 67]. In the context of the latter, the reversible constitutive model for dielectric elastomers is encapsulated within the free energy density, contingent upon the deformation gradient tensor \(\varvec{F}\) and the material electric field \(\varvec{E}_0\). Complementary to this potential, which exhibits a saddle point nature, is the internal energy density, contingent upon the deformation gradient tensor and the electric displacement field \(\varvec{D}_0\). Building upon this foundation, researchers in [22, 51, 52] introduced an extension of the concept of polyconvexity, originally from the field of hyperelasticity [2, 12,13,14, 64], into this coupled electromechanical scenario. This novel definition of polyconvexity played a pivotal role in establishing the existence of minimizers in this context for the first time [65], serving as a sufficient condition for the extension of the rank-one convexity criterion within electromechanics.

In spite of the considerable inherent potential exhibited by EAPs, a primary limitation arises from their demand for a substantial electric field magnitude to induce significant deformation, rendering them prone to electromechanical instabilities or electrical breakdown [4, 57, 70]. To mitigate the requirement for high voltage operation, some researchers propose the adoption of composite materials as the basis for EAPs. These composites often amalgamate a low-stiffness, low-permittivity elastomer matrix with stiffer, higher-permittivity inclusions distributed randomly as fibers or particles. Experimental investigations have evidenced a substantial enhancement in the coupled electromechanical performance of electroactive composites, thereby reducing the voltage prerequisites for actuation. A noteworthy development in recent years pertains to rank-one laminate composite dielectric elastomers [15, 43, 44].

The determination of the macroscopic constitutive response of the composite material hinges upon the specific type of microstructure under consideration. In the case of laminated composite materials, the homogenization challenge at the microstructure level is governed by a system of nonlinear equations that implicitly establish the microstructural parameters with respect to the macroscopic deformation gradient tensor and the electric displacement field [15, 21, 58]. In the case of more intricate microstructures, such as randomly distributed inclusions embedded within an elastomeric matrix, the determination of the macroscopic constitutive response of the composite material necessitates the utilization of computationally intensive homogenization techniques. However, these methods come with a limitation. EAPs exhibit nonlinear behavior, leading to a nonlinear dependency of their macroscopic response on macroscopic deformations and electric or magnetic fields. Essentially, this signifies that a boundary value problem must be solved at the micro level, considering suitable boundary conditions, for every stress and electric/magnetic field combination [63].

In the effort to mitigate the high computational demands associated with methods like computational homogenization, recent developments in the realm of nonlinear continuum mechanics have witnessed the emergence of Machine Learning algorithms. These methods enable the generation of diverse constitutive models through the utilization of data gathered from experimental tests or in-silico (computational) simulations. This paper aims to explore a specific type of Machine Learning technique, namely Gaussian Process Regression (GPR), to showcase its viability in approximating analytical constitutive models in electromechanics, particularly within a specific category of composites known as rank-one laminates. By applying rank-n homogenization principles, it becomes feasible to derive the homogenized or effective response of rank-one laminates in an almost analytical manner, without necessitating computational homogenization. Utilizing in-silico data generated from this model, the objective is to create surrogate models capable of replicating the behaviour of such constitutive models. This initial focus pertains to simpler scenarios, with the intention of demonstrating the feasibility and accuracy of the employed GPR technique. This approach sets the stage for addressing more complex composite cases in future work, eliminating the need for computational homogenization and facilitating a computationally efficient evaluation of their effective behaviour [63].

Artificial Neural Networks (ANNs) have been employed for learning or discovering constitutive models based on data generated either in silico or in physical laboratories, as demonstrated in several studies [10, 20, 31, 38, 39]. The work by Klein [31] represents a pioneering effort in the successful application of ANNs for uncovering constitutive laws in nonlinear electromechanics. Additionally, Gaussian Process Regression (GPR) has gained traction and found application in the development of data-based constitutive models for moderate strains in soft tissue applications, as shown by Aggarwal et al. [27]. A distinctive feature of GPR compared to the ANN approach lies in its inherent probabilistic nature. GPR allows for the specification of prior knowledge, the generation of a distribution encompassing potential predictive functions, and the direct calculation of prediction uncertainties [5, 41, 56]. Moreover, GPR or Kriging offers control over the degree of interpolation between known points through the specification of noise in the correlation function [20].

Kriging [33, 46, 62] predicts Gaussian random field values using observed data from a finite set of points, finding applications in geostatistics, numerical code approximation, global optimization, and machine learning [28, 46, 56, 61]. It employs Gaussian distributions to define a joint distribution based on observations and predictions, utilizing spatially correlated covariance to weigh observation importance. The joint distribution conditions on observed data, yielding a prediction distribution characterized by mean and covariance, facilitating sampling for predictions [49]. This emulator type has grown popular due to its nonlinear function capture and statistical output [28, 45], yielding confidence intervals and adaptive metamodel refinement strategies.

This paper proposes a gradient-enhanced Gaussian process regression metamodelling technique for emulating internal energy densities characterizing soft/flexible EAP behavior. The method enforces physical constraints upfront by incorporating principal invariants as inputs. Gradient Kriging excels in the precise interpolation of both the internal energy and the first Piola-Kirchhoff stress tensor. The incorporation of derivative information reduces the number of sampling points while maintaining accuracy. In contrast to neural networks, Kriging's interpolatory nature precisely matches stress tensors at sample points, ensuring stress-free origin compliance.

The structure of this paper unfolds as follows: In Sect. 2, we establish the foundational concepts by introducing the essential elements of nonlinear continuum electromechanics, emphasizing constitutive modeling. Moving forward, Sect. 3 provides a comprehensive and self-contained overview of Gaussian Process Regression (GPR) or Kriging. Proceeding to Sect. 4, we undertake the calibration of Kriging-based surrogate models by employing synthetic data derived from well-established ground truth internal energy densities. Lastly, Sect. 5 exemplifies the practical application of these surrogate models within a 3D Finite Element computational framework. A thorough assessment is conducted to gauge the precision of these models across diverse and demanding scenarios, juxtaposing displacement and stress fields against their corresponding ground-truth analytical model counterparts.

Notation Throughout this paper, \(\varvec{A}:\varvec{B}=A_{IJ}B_{IJ}\), \(\forall \varvec{A},\varvec{B}\in {\mathbb {R}}^{3\times 3}\), and the use of repeated indices implies summation. The tensor product is denoted by \(\otimes \) and the second order identity tensor by \(\varvec{I}\). The tensor cross product operation between two arbitrary second order tensors \(\varvec{A}\) and \(\varvec{B}\) entails \(\left( \varvec{A}\times \varvec{B}\right) _{iI}={\mathcal {E}}_{ijk}{\mathcal {E}}_{IJK}A_{jJ}B_{kK}\). Furthermore, \(\varvec{{\mathcal {E}}}\) represents the third-order alternating tensor. The full and special orthogonal groups in \({\mathbb {R}}^3\) are represented as \(\text {O}(3)=\{\varvec{A}\in {\mathbb {R}}^{3\times 3},\vert \,\varvec{A}^T\varvec{A}=\varvec{I}\}\) and \(\text {SO}(3)=\{\varvec{A}\in {\mathbb {R}}^{3\times 3},\vert \,\varvec{A}^T\varvec{A}=\varvec{I},\,\text {det}\varvec{A}=1\}\), respectively, and the set of invertible second order tensors with positive determinant is denoted by \(\text {GL}^+(3)=\{\varvec{A}\in {\mathbb {R}}^{3\times 3} \vert \,\text {det}{\varvec{A}}>0\}\).

2 Finite strain electromechanics

2.1 Differential governing equations in finite strain electromechanics

Let \({\mathcal {B}}_0\) denote a subset of three-dimensional Euclidean space \({\mathbb {R}}^3\), representing the initial, undeformed state of an Electro Active Polymer (EAP) material. We postulate the existence of an injective function \(\varvec{\phi }\), which uniquely maps each point \(\varvec{X}\) in the material configuration \({\mathcal {B}}_0\) to a corresponding point \(\varvec{x}\) in the deformed, spatial configuration \({\mathcal {B}}\subset {\mathbb {R}}^3\). This mapping relationship is mathematically expressed as \(\varvec{x}=\varvec{\phi }(\varvec{X})\) (as illustrated in Fig. 1). Associated with the mapping \(\varvec{\phi }\), we define the deformation gradient tensor \(\varvec{F}\in \text {GL}^+(3)\) as \(\varvec{F} = \partial _{\varvec{X}}\varvec{\phi }\).

Fig. 1 Mapping of material quantities to the spatial quantities

The behavior of the EAP represented by \({\mathcal {B}}_0\) is governed by the ensuing coupled boundary value problem:

$$\begin{aligned} \left. \begin{aligned} \varvec{F}&= \partial _{\varvec{X}}\varvec{\phi },&\quad&\text {in }{\mathcal {B}}_0\\ \text {DIV}\varvec{P}&= -\varvec{f}_0,&\quad&\text {in }{\mathcal {B}}_0\\ \varvec{\phi }&=\varvec{\phi }^{\star },&\quad&\text {on }\partial _{\varvec{\phi }}{\mathcal {B}}_0\\ \varvec{P}\varvec{N}&=\varvec{t}_0,&\quad&\text {on }\partial _{\varvec{t}}{\mathcal {B}}_0 \end{aligned}\right\} \qquad \qquad \qquad \left. \begin{aligned} \varvec{E}_0&= -\partial _{\varvec{X}}{\varphi },&\quad&\text {in }{\mathcal {B}}_0\\ \text {DIV}\varvec{D}_0&= \rho _0,&\quad&\text {in }{\mathcal {B}}_0\\ {\varphi }&={\varphi }^{\star },&\quad&\text {on }\partial _{{\varphi }}{\mathcal {B}}_0\\ \varvec{D}_0\cdot \varvec{N}&=-{\omega }_0,&\quad&\text {on }\partial _{\omega }{\mathcal {B}}_0 \end{aligned}\right\} \nonumber \\ \end{aligned}$$
(1)

where the equations on the left correspond to the purely mechanical physics and those on the right, to electrostatics. In (1), \(\text {DIV}(\bullet )\) signifies the divergence with respect to the material coordinates \(\varvec{X}\in {\mathcal {B}}_0\), while \(\varvec{f}_0\) represents the force applied per unit volume of \({\mathcal {B}}_0\). Dirichlet boundary conditions for the field \(\varvec{\phi }\) are imposed on \(\partial _{\varvec{\phi }}{\mathcal {B}}_0\), and \(\varvec{t}_0\) represents a force per unit undeformed area, with \(\varvec{N}\) the outward normal at \(\varvec{X}\in \partial _{\varvec{t}}{\mathcal {B}}_0\). Furthermore, on the right hand side of (1), \(\rho _0\) represents an electric charge per unit undeformed volume \({\mathcal {B}}_0\). Dirichlet boundary conditions are prescribed on \(\partial _{{\varphi }}{\mathcal {B}}_0\) for the field \(\varphi \), and \(\omega _0\) represents an electric charge per unit undeformed area \(\partial _{{\omega }}{\mathcal {B}}_0\), with \(\varvec{N}\) the outward normal at \(\varvec{X}\in \partial _{{\omega }}{\mathcal {B}}_0\). For both coupled physical problems, the boundaries where Dirichlet and Neumann boundary conditions are prescribed satisfy the following

$$\begin{aligned} \left. \begin{aligned} \partial {\mathcal {B}}_0&=\partial _{\varvec{\phi }}{\mathcal {B}}_0\cup \partial _{\varvec{t}}{\mathcal {B}}_0\\ \emptyset&=\partial _{\varvec{\phi }}{\mathcal {B}}_0\cap \partial _{\varvec{t}}{\mathcal {B}}_0 \end{aligned}\right\} \qquad \qquad \qquad \left. \begin{aligned} \partial {\mathcal {B}}_0&=\partial _{{\varphi }}{\mathcal {B}}_0\cup \partial _{\omega }{\mathcal {B}}_0\\ \emptyset&=\partial _{\varphi }{\mathcal {B}}_0\cap \partial _{\omega }{\mathcal {B}}_0 \end{aligned}\right\} \nonumber \\ \end{aligned}$$
(2)

Finally, \(\varvec{P}\) and \(\varvec{D}_0\) symbolize the first Piola-Kirchhoff stress tensor and the material electric displacement field, respectively. These tensors are interlinked with the deformation gradient tensor \(\varvec{F}\) and the material electric field \(\varvec{E}_0\) by means of an appropriate constitutive law, as described in Sect. 2.2.

2.2 The internal energy density in electromechanics

The constitutive model of the undeformed solid \({\mathcal {B}}_0\) is encapsulated in the internal energy density per unit undeformed volume, denoted as

$$\begin{aligned} e:\text {GL}^+(3)\times {\mathbb {R}}^3\rightarrow {\mathbb {R}},\qquad (\varvec{F},\varvec{D}_0)\mapsto e(\varvec{F},\varvec{D}_0) \end{aligned}$$
(3)

Taking the derivative of the internal energy density with respect to both \(\varvec{F}\) and \(\varvec{D}_0\) gives rise to the first Piola-Kirchhoff stress tensor \(\varvec{P}\) and the material electric field \(\varvec{E}_0\) as defined in Eq. (1)

$$\begin{aligned} \varvec{P}=\partial _{\varvec{F}}e(\varvec{F},\varvec{D}_0)\qquad \varvec{E}_0=\partial _{\varvec{D}_0}e(\varvec{F},\varvec{D}_0) \end{aligned}$$
(4)
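As a minimal numerical illustration of the derivative relations in Eq. (4) (not part of the original formulation), the following Python sketch recovers \(\varvec{P}\) and \(\varvec{E}_0\) by central finite differences of a deliberately simple quadratic toy energy; the energy itself and the parameters `mu` and `eps` are illustrative assumptions.

```python
import numpy as np

mu, eps = 1.0, 2.0  # assumed toy material parameters

def energy(F, D0):
    # toy convex internal energy density e(F, D0) (illustrative only):
    # e = mu/2 * F:F + 1/(2 eps) * D0 . D0
    return 0.5 * mu * np.sum(F * F) + 0.5 / eps * (D0 @ D0)

def P_and_E0(F, D0, h=1e-6):
    """P = de/dF and E0 = de/dD0 by central finite differences, cf. Eq. (4)."""
    P = np.zeros((3, 3))
    E0 = np.zeros(3)
    for i in range(3):
        for j in range(3):
            dF = np.zeros((3, 3)); dF[i, j] = h
            P[i, j] = (energy(F + dF, D0) - energy(F - dF, D0)) / (2 * h)
        dD = np.zeros(3); dD[i] = h
        E0[i] = (energy(F, D0 + dD) - energy(F, D0 - dD)) / (2 * h)
    return P, E0

F = np.eye(3) + 0.1 * np.random.default_rng(0).standard_normal((3, 3))
D0 = np.array([0.3, -0.1, 0.2])
P, E0 = P_and_E0(F, D0)

# for this toy energy, P = mu*F and E0 = D0/eps hold in closed form
assert np.allclose(P, mu * F, atol=1e-6)
assert np.allclose(E0, D0 / eps, atol=1e-6)
```

Because the assumed energy is quadratic, the finite-difference check matches the closed-form derivatives to rounding precision.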

The internal energy density \(e(\varvec{F},\varvec{D}_0)\) is required to adhere to the principle of objectivity, also known as material frame indifference. This entails its invariance with respect to rotations \(\varvec{Q}\in \text {SO}(3)\) applied to the spatial configuration, as follows

$$\begin{aligned}{} & {} e(\varvec{Q}\varvec{F},\varvec{D}_0) = e(\varvec{F},\varvec{D}_0)\quad \forall \,\varvec{F}\in \text {GL}^+(3),\,\varvec{D}_0\in {\mathbb {R}}^3,\,\nonumber \\{} & {} \varvec{Q}\in \text {SO}(3). \end{aligned}$$
(5)

Moreover, the internal energy density must conform to the material symmetry group \({\mathcal {G}}\subseteq \text {O}(3)\), a defining factor in determining the isotropic or anisotropic attributes of the underlying material. This requirement can be succinctly expressed in mathematical terms as follows

$$\begin{aligned}{} & {} e(\varvec{F}\varvec{Q},\varvec{Q}^T\varvec{D}_0) = e(\varvec{F},\varvec{D}_0)\qquad \forall \,\varvec{F}\in \text {GL}^+(3),\,\varvec{D}_0\in {\mathbb {R}}^3,\,\nonumber \\{} & {} \varvec{Q}\in {\mathcal {G}}\subseteq \text {O}(3). \end{aligned}$$
(6)

Furthermore, the internal energy density \(e(\varvec{F},\varvec{D}_0)\), along with the first Piola-Kirchhoff stress tensor \(\varvec{P}\) and the material electric field \(\varvec{E}_0\), must all vanish when no deformations are present, i.e.

$$\begin{aligned}{} & {} \left. e(\varvec{F},\varvec{D}_0)\right| _{\varvec{F}=\varvec{I},\varvec{D}_0=\varvec{0}}=0,\qquad \left. \varvec{P}(\varvec{F},\varvec{D}_0)\right| _{\varvec{F}=\varvec{I},\varvec{D}_0=\varvec{0}}=\varvec{0};\nonumber \\{} & {} \quad \left. \varvec{E}_0(\varvec{F},\varvec{D}_0)\right| _{\varvec{F}=\varvec{I},\varvec{D}_0=\varvec{0}}=\varvec{0}. \end{aligned}$$
(7)
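The conditions (5) and (7) can be verified numerically for any candidate energy. The sketch below does so for an assumed compressible neo-Hookean-type toy energy augmented with a simple electric term (not a model from this paper): objectivity is checked against a random rotation, and the energy and its gradient are checked to vanish at \(\varvec{F}=\varvec{I}\), \(\varvec{D}_0=\varvec{0}\).

```python
import numpy as np

mu, eps = 1.0, 2.0  # assumed toy parameters

def energy(F, D0):
    # toy objective energy: mu/2 (F:F - 3) - mu ln(det F) + 1/(2 eps) D0.D0
    J = np.linalg.det(F)
    return 0.5 * mu * (np.sum(F * F) - 3.0) - mu * np.log(J) + 0.5 / eps * (D0 @ D0)

def grad_F(F, D0, h=1e-6):
    # first Piola-Kirchhoff stress by central finite differences
    P = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            dF = np.zeros((3, 3)); dF[i, j] = h
            P[i, j] = (energy(F + dF, D0) - energy(F - dF, D0)) / (2 * h)
    return P

F = np.array([[1.1, 0.2, 0.0],
              [0.0, 0.9, 0.3],
              [0.1, 0.0, 1.05]])   # det F > 0
D0 = np.array([0.3, -0.2, 0.5])

# random proper rotation Q in SO(3) via QR factorization
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]

# objectivity, Eq. (5): e(QF, D0) = e(F, D0)
assert np.isclose(energy(Q @ F, D0), energy(F, D0))

# stress-free origin, Eq. (7): e and P vanish at F = I, D0 = 0
assert np.isclose(energy(np.eye(3), np.zeros(3)), 0.0)
assert np.allclose(grad_F(np.eye(3), np.zeros(3)), 0.0, atol=1e-6)
```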

The conditions in Eqs. (5), (6), and (7) embody essential physical criteria. Alongside these, there is a requisite for the internal energy density function to satisfy pertinent mathematical criteria. Specifically, the internal energy density function conventionally adheres to mathematical constraints rooted in the concept of convexity. One of the simplest conditions is that of convexity of \(e(\varvec{F},\varvec{D}_0)\), that is

$$\begin{aligned} \begin{aligned}&D^2e(\varvec{F},\varvec{D}_0)[\delta \varvec{{\mathcal {U}}};\delta \varvec{{\mathcal {U}}}]=\delta \varvec{{\mathcal {U}}} \bullet [{\mathbb {H}}_e] \bullet \delta \varvec{{\mathcal {U}}} \ge 0;\\&\quad \forall \,\{\varvec{F},\varvec{D}_0\}\in \text {GL}^+(3)\times {\mathbb {R}}^3,\\&\quad \forall \delta \varvec{{\mathcal {U}}}=\{\delta \varvec{F},\delta \varvec{D}_0\}\in {\mathbb {R}}^{3\times 3}\times {\mathbb {R}}^3, \end{aligned} \end{aligned}$$
(8)

which requires positive semi-definiteness of the Hessian operator \([{\mathbb {H}}_e]\), defined as

$$\begin{aligned}{}[{\mathbb {H}}_e]=\begin{bmatrix} \partial ^2_{\varvec{F}\varvec{F}}e &{} \partial ^2_{\varvec{F}\varvec{D}_0}e\\ \partial ^2_{\varvec{D}_0\varvec{F}}e &{} \partial ^2_{\varvec{D}_0\varvec{D}_0}e \end{bmatrix}. \end{aligned}$$
(9)
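Condition (8) can be probed numerically by assembling the \(12\times 12\) Hessian of Eq. (9) over the packed variable \(\{\varvec{F},\varvec{D}_0\}\) and checking its eigenvalues. The sketch below does this for an assumed quadratic (hence globally convex) toy energy, for which (8) must hold everywhere.

```python
import numpy as np

mu, eps = 1.0, 2.0  # assumed toy parameters

def energy(x):
    # x packs (F, D0): first 9 entries are F (row-major), last 3 are D0
    F, D0 = x[:9].reshape(3, 3), x[9:]
    # convex quadratic toy energy, so Eq. (8) holds for all states
    return 0.5 * mu * np.sum(F * F) + 0.5 / eps * (D0 @ D0)

def hessian(x, h=1e-4):
    # full Hessian [H_e] of Eq. (9) by central second differences
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (energy(x + ei + ej) - energy(x + ei - ej)
                       - energy(x - ei + ej) + energy(x - ei - ej)) / (4 * h * h)
    return 0.5 * (H + H.T)   # symmetrize away finite-difference noise

x = np.concatenate([np.eye(3).ravel() + 0.1, [0.2, -0.1, 0.3]])
Hess = hessian(x)

# positive semi-definiteness of [H_e], i.e. convexity in the sense of Eq. (8)
assert np.min(np.linalg.eigvalsh(Hess)) >= -1e-6
```

For non-quadratic energies the same check only certifies convexity at the sampled state, which is precisely why the weaker conditions discussed next are needed.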

However, convexity away from the origin (i.e., \(\varvec{F} \approx \varvec{I}\), \(\varvec{D}_0\approx \varvec{0}\)) is not physically suitable, as it does not encompass realistic material behaviors like buckling [2]. An alternative mathematical constraint is the quasiconvexity of \(e(\varvec{F},\varvec{D}_0)\) [3]. Unfortunately, quasiconvexity is a nonlocal condition that is challenging, and even infeasible, to verify. A requirement implied by quasiconvexity is the generalized rank-one convexity of \(e(\varvec{F},\varvec{D}_0)\). A generalized rank-one convex energy density satisfies

$$\begin{aligned} \begin{aligned}&D^2e(\varvec{F},\varvec{D}_0)[\delta \varvec{{\mathcal {U}}};\delta \varvec{{\mathcal {U}}}]=\delta \varvec{{\mathcal {U}}} \bullet [{\mathbb {H}}_e] \bullet \delta \varvec{{\mathcal {U}}} \ge 0;\\&\quad \forall \,\{\varvec{F},\varvec{D}_0\}\in \text {GL}^+(3)\times {\mathbb {R}}^3,\\&\quad \forall \delta \varvec{{\mathcal {U}}}=\{\varvec{u}\otimes \varvec{V},\varvec{V}_{\perp }\},\,\, \varvec{u},\varvec{V}\in {\mathbb {R}}^3,\varvec{V}_{\perp }\cdot \varvec{V}=0 \end{aligned} \end{aligned}$$
(10)

Remark 1

Notice that the vector \(\varvec{V}_{\perp }\) in (10) is orthogonal to \(\varvec{V}\). The reason for this choice has its roots in the analysis of the hyperbolicity of the system of PDEs in (1) in the dynamic context. In this case, it is customary to express the fields \(\varvec{\phi }\) and \(\varvec{D}_0\) as a perturbation with respect to equilibrium states \(\varvec{\phi }^{\text {eq}}\) and \(\varvec{D}_0^{\text {eq}}\), respectively, by means of the addition of travelling wave functions as

$$\begin{aligned} \varvec{\phi } = \varvec{\phi }^{\text {eq}} + \varvec{u}\hat{\phi }(\varvec{X}\cdot \varvec{V} - c t); \varvec{D}_0 = \varvec{D}_0^{\text {eq}} + \varvec{V}_{\perp }\hat{\phi }(\varvec{X}\cdot \varvec{V} - c t)\nonumber \\ \end{aligned}$$
(11)

where \(\varvec{V}\) represents the direction of propagation of the travelling wave and c the associated speed of propagation of the perturbation, with amplitudes \(\varvec{u}\) and \(\varvec{V}_{\perp }\). Introduction of the ansatz for \(\varvec{D}_0\) into Gauss's law in Eq. (1) reveals that

$$\begin{aligned} \text {DIV}\varvec{D}_0 - \rho _0&= \text {DIV}\varvec{D}^{\text {eq}}_0 - \rho _0 + \left( \varvec{V}_{\perp }\cdot \varvec{V}\right) \hat{\phi }^{\prime }(\varvec{X}\cdot \varvec{V} - ct)\nonumber \\&=0 \end{aligned}$$
(12)

and therefore, \(\varvec{V}_{\perp }\) must be orthogonal to \(\varvec{V}\).

Condition (10) is known as the Legendre-Hadamard condition or ellipticity of \(e(\varvec{F},\varvec{D}_0)\). It is associated with the propagation of travelling plane waves in the material, propagating along a direction \(\varvec{V}\) with speed c [50]. Importantly, the existence of real wave speeds for the governing equations in (1) is assured when the electromechanical acoustic tensor \(\varvec{Q}_{ac}\) is positive definite, with

$$\begin{aligned}{}[\varvec{Q}_{ac}]_{ij}=[\widetilde{\varvec{{\mathcal {C}}}}]_{iIjJ}V_{I}V_{J} \end{aligned}$$
(13)

with

$$\begin{aligned}{} & {} \widetilde{\varvec{{\mathcal {C}}}}=\partial ^2_{\varvec{FF}}e + \partial ^2_{\varvec{F}\varvec{D}_0}e\left( \partial ^2_{\varvec{D}_0\varvec{D}_0}e \right) ^{-1}\nonumber \\{} & {} \Bigg (\frac{\varvec{V}\otimes \left( \partial ^2_{\varvec{D}_0\varvec{D}_0}e \right) ^{-1}\varvec{V}}{\varvec{V}\cdot \left( \partial ^2_{\varvec{D}_0\varvec{D}_0}e \right) ^{-1}\varvec{V}} - \varvec{I}\Bigg )\partial ^2_{\varvec{D}_0\varvec{F}}e. \end{aligned}$$
(14)
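Equations (13) and (14) can be assembled directly from finite-difference Hessian blocks. The sketch below does so for an assumed weakly coupled convex toy energy (illustrative only; `mu`, `kap`, the state and the direction \(\varvec{V}\) are all assumptions) and checks that \(\varvec{Q}_{ac}\) is symmetric positive definite, i.e. that real wave speeds exist for this toy model.

```python
import numpy as np

mu, kap = 1.0, 0.1  # assumed toy parameters

def energy(x):
    # x packs (F, D0); toy coupled energy mu/2 F:F + 1/2 |D0|^2 + kap/2 |F D0|^2
    F, D0 = x[:9].reshape(3, 3), x[9:]
    d = F @ D0
    return 0.5 * mu * np.sum(F * F) + 0.5 * (D0 @ D0) + 0.5 * kap * (d @ d)

def hessian(x, h=1e-4):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (energy(x + ei + ej) - energy(x + ei - ej)
                       - energy(x - ei + ej) + energy(x - ei - ej)) / (4 * h * h)
    return 0.5 * (H + H.T)

x = np.concatenate([np.eye(3).ravel(), [0.0, 0.0, 0.4]])
H = hessian(x)
A  = H[:9, :9]   # d2e/dFdF,   flattened indices (iI, jJ)
B  = H[:9, 9:]   # d2e/dFdD0,  (9 x 3)
Ci = H[9:, 9:]   # d2e/dD0dD0, (3 x 3)

V = np.array([0.6, 0.0, 0.8])  # unit propagation direction (assumed)
CiV = np.linalg.solve(Ci, V)
# middle factor of Eq. (14): (e_DD)^{-1} ( V (x) e_DD^{-1}V / (V . e_DD^{-1}V) - I )
M = np.linalg.solve(Ci, np.outer(V, CiV) / (V @ CiV) - np.eye(3))

Ctil = A + B @ M @ B.T                  # condensed tensor of Eq. (14)
C4 = Ctil.reshape(3, 3, 3, 3)           # indices (i, I, j, J)
Q_ac = np.einsum('iIjJ,I,J->ij', C4, V, V)  # acoustic tensor, Eq. (13)

assert np.allclose(Q_ac, Q_ac.T, atol=1e-6)
assert np.min(np.linalg.eigvalsh(Q_ac)) > 0.0  # real wave speeds
```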

A sufficient and local condition implying the rank-one convexity condition in (10) is the polyconvexity of e. The internal energy density is said to be polyconvex [2, 22] if a convex and lower semicontinuous function \({\mathbb {W}}:\text {GL}^+(3)\times \text {GL}^+(3)\times {\mathbb {R}}^+\times {\mathbb {R}}^3\times {\mathbb {R}}^3 \rightarrow {\mathbb {R}}\cup \{+\infty \}\) (generally non-unique) exists such that

$$\begin{aligned} e(\varvec{F},\varvec{D}_0) = {\mathbb {W}}(\varvec{{\mathcal {U}}}), \qquad \varvec{{\mathcal {U}}}=(\varvec{F},\varvec{H},J,\varvec{D}_0,\varvec{d}), \end{aligned}$$
(15)

where \(\varvec{H}\) and J represent the co-factor and determinant of \(\varvec{F}\), respectively, and \(\varvec{d}\) denotes a vector in the spatial configuration, the three of them being defined as

$$\begin{aligned} \varvec{H}=\frac{1}{2}\varvec{F}\times \varvec{F};\qquad J=\text {det}\,\varvec{F}=\frac{1}{3}\varvec{H}:\varvec{F};\qquad \varvec{d}=\varvec{F}\varvec{D}_0 \end{aligned}$$
(16)
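Assuming the standard definitions of \(\varvec{H}\), J and \(\varvec{d}\) (co-factor, determinant, and push-forward of \(\varvec{D}_0\)), the tensor cross product construction of the co-factor can be checked numerically, since \(\varvec{H}=J\varvec{F}^{-T}\) and \(J=\tfrac{1}{3}\varvec{H}:\varvec{F}\):

```python
import numpy as np

# third-order alternating (Levi-Civita) tensor E_ijk
E = np.zeros((3, 3, 3))
E[0, 1, 2] = E[1, 2, 0] = E[2, 0, 1] = 1.0
E[0, 2, 1] = E[2, 1, 0] = E[1, 0, 2] = -1.0

def tensor_cross(A, B):
    # (A x B)_{iI} = E_{ijk} E_{IJK} A_{jJ} B_{kK}
    return np.einsum('ijk,IJK,jJ,kK->iI', E, E, A, B)

F = np.array([[1.1, 0.2, 0.0],
              [0.0, 0.9, 0.3],
              [0.1, 0.0, 1.05]])   # det F > 0
D0 = np.array([0.3, -0.2, 0.5])

H = 0.5 * tensor_cross(F, F)       # co-factor of F
J = np.linalg.det(F)
d = F @ D0                          # spatial counterpart of D0

assert np.allclose(H, J * np.linalg.inv(F).T)   # H = J F^{-T}
assert np.isclose(J, np.sum(H * F) / 3.0)       # J = (1/3) H : F
```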

Polyconvexity of the internal energy density entails the satisfaction of the following inequality

$$\begin{aligned} \begin{aligned}&D^2{\mathbb {W}}(\varvec{{\mathcal {U}}})[\delta \varvec{{\mathcal {U}}};\delta \varvec{{\mathcal {U}}}]=\delta \varvec{{\mathcal {U}}} \bullet \partial ^2_{\varvec{{\mathcal {U}}}\varvec{{\mathcal {U}}}}{\mathbb {W}} \bullet \delta \varvec{{\mathcal {U}}} \ge 0;\\&\quad \forall \,\,\varvec{{\mathcal {U}}}\in \text {GL}^ +(3)\times \text {GL}^+(3)\times {\mathbb {R}}^+\times {\mathbb {R}}^3\times {\mathbb {R}}^3,\\&\quad \forall \delta \varvec{{\mathcal {U}}}\in {\mathbb {R}}^{3\times 3}\times {\mathbb {R}}^{3\times 3}\times {\mathbb {R}}\times {\mathbb {R}}^3\times {\mathbb {R}}^3 \end{aligned} \end{aligned}$$
(17)

2.3 Invariant-based electromechanics

A simple manner to accommodate the principle of objectivity or material frame indifference and the requirement of material symmetry is through the dependence of the internal energy density function \(e(\varvec{F},\varvec{D}_0)\) on invariants of the right Cauchy-Green deformation tensor \(\varvec{C}=\varvec{F}^T\varvec{F}\) and \(\varvec{D}_0\). Let \(\textbf{I}=\{I_1,I_2,\dots ,I_n\}\) represent the n objective invariants required to characterise a given material symmetry group \({\mathcal {G}}\). Then, it is possible to express the internal energy density \(e(\varvec{F},\varvec{D}_0)\) equivalently as

$$\begin{aligned} e(\varvec{F},\varvec{D}_0)=U(\textbf{I}) \end{aligned}$$
(18)

Application of the chain rule to Eq. (4) yields the first Piola-Kirchhoff stress tensor \(\varvec{P}\) and the material electric field \(\varvec{E}_0\) in terms of the derivatives of \(U(\textbf{I})\) as

$$\begin{aligned} \begin{aligned} \varvec{P}=\sum _{i=1}^n\Big (\partial _{I_i}U\Big ) \partial _{\varvec{F}}I_i;\qquad \varvec{E}_0=\sum _{i=1}^n\Big (\partial _{I_i}U\Big ) \partial _{\varvec{D}_0}I_i \end{aligned} \end{aligned}$$
(19)

2.3.1 Isotropic electromechanics

For the case of isotropy, the invariants required to characterise this material symmetry group, and their first derivatives with respect to \(\varvec{F}\) and \(\varvec{D}_0\) (featuring in the definition of \(\varvec{P}\) and \(\varvec{E}_0\) in (19)), are

$$\begin{aligned} \begin{array}{lll} I_1=\varvec{F}:\varvec{F}; &{}\quad \partial _{\varvec{F}}I_1=2\varvec{F}; &{}\quad \partial _{\varvec{D}_0}I_1=\varvec{0};\\ I_2=\varvec{H}:\varvec{H}; &{}\quad \partial _{\varvec{F}}I_2=2\varvec{H}\times \varvec{F}; &{}\quad \partial _{\varvec{D}_0}I_2=\varvec{0};\\ I_3=J; &{}\quad \partial _{\varvec{F}}I_3=\varvec{H}; &{}\quad \partial _{\varvec{D}_0}I_3=\varvec{0};\\ I_4=\varvec{D}_0\cdot \varvec{D}_0; &{}\quad \partial _{\varvec{F}}I_4=\varvec{0}; &{}\quad \partial _{\varvec{D}_0}I_4=2\varvec{D}_0;\\ I_5=\varvec{d}\cdot \varvec{d}; &{}\quad \partial _{\varvec{F}}I_5=2\varvec{d}\otimes \varvec{D}_0; &{}\quad \partial _{\varvec{D}_0}I_5=2\varvec{F}^T\varvec{d} \end{array} \end{aligned}$$
(20)

Inserting the expressions in (20) into (19) yields the following expression for the first Piola-Kirchhoff stress tensor \(\varvec{P}\) and for \(\varvec{E}_0\)

$$\begin{aligned} \begin{aligned} \varvec{P}&=2\Big (\partial _{I_1}U\Big )\varvec{F}+2\Big (\partial _{I_2}U\Big )\varvec{H}\times \varvec{F}+\Big (\partial _{I_3}U\Big )\varvec{H}+2\Big (\partial _{I_5}U\Big )\varvec{d}\otimes \varvec{D}_0;\\ \varvec{E}_0&=2\Big (\partial _{I_4}U\Big )\varvec{D}_0+2\Big (\partial _{I_5}U\Big )\varvec{F}^T\varvec{d} \end{aligned} \end{aligned}$$
(21)
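The chain-rule construction (19) can be validated numerically: for a toy choice of \(U\) depending on an assumed subset of isotropic invariants (here \(\varvec{F}:\varvec{F}\), \(\text {det}\,\varvec{F}\), \(\varvec{D}_0\cdot \varvec{D}_0\) and \(\varvec{d}\cdot \varvec{d}\); coefficients are illustrative), the analytic \(\varvec{P}\) and \(\varvec{E}_0\) must coincide with finite differences of \(e=U(\textbf{I})\):

```python
import numpy as np

a, b, c, g = 0.5, 1.0, 0.25, 0.1  # assumed toy coefficients

def invariants(F, D0):
    d = F @ D0
    return np.sum(F * F), np.linalg.det(F), D0 @ D0, d @ d  # I1, I3, I4, I5

def energy(F, D0):
    I1, I3, I4, I5 = invariants(F, D0)
    return a * I1 + b * (I3 - 1.0) ** 2 + c * I4 + g * I5

def P_E0_chain_rule(F, D0):
    # Eq. (19) with dI1/dF = 2F, dI3/dF = H = J F^{-T},
    # dI5/dF = 2 d (x) D0, dI4/dD0 = 2 D0, dI5/dD0 = 2 F^T d
    I1, I3, I4, I5 = invariants(F, D0)
    H = I3 * np.linalg.inv(F).T
    d = F @ D0
    P = 2 * a * F + 2 * b * (I3 - 1.0) * H + 2 * g * np.outer(d, D0)
    E0 = 2 * c * D0 + 2 * g * (F.T @ d)
    return P, E0

def fd_grads(F, D0, h=1e-6):
    P = np.zeros((3, 3)); E0 = np.zeros(3)
    for i in range(3):
        for j in range(3):
            dF = np.zeros((3, 3)); dF[i, j] = h
            P[i, j] = (energy(F + dF, D0) - energy(F - dF, D0)) / (2 * h)
        dD = np.zeros(3); dD[i] = h
        E0[i] = (energy(F, D0 + dD) - energy(F, D0 - dD)) / (2 * h)
    return P, E0

F = np.array([[1.1, 0.2, 0.0],
              [0.0, 0.9, 0.3],
              [0.1, 0.0, 1.05]])
D0 = np.array([0.3, -0.2, 0.5])
P, E0 = P_E0_chain_rule(F, D0)
Pfd, E0fd = fd_grads(F, D0)
assert np.allclose(P, Pfd, atol=1e-5)
assert np.allclose(E0, E0fd, atol=1e-5)
```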

2.3.2 Transversely isotropic electromechanics

In the context of transverse isotropy, a preferred direction \(\varvec{N}\) emerges, perpendicular to the material’s plane of isotropy, imparting anisotropic characteristics. Our focus centers on the material symmetry group \({\mathcal {D}}_{\infty h}\) [25], where the structural tensor takes the form \(\varvec{N}\otimes \varvec{N}\). This group is distinct from \({\mathcal {C}}_{\infty }\), also present in transversely isotropic materials, characterized by the structural vector \(\varvec{N}\) and encompassing the potential for piezoelectricity. The \({\mathcal {D}}_{\infty h}\) group, beyond the invariants \(\{I_1,I_2,I_3,I_4,I_5\}\) in (20), is distinguished by three additional invariants, detailed below

(22)

In this case, the first Piola-Kirchhoff stress tensor \(\varvec{P}\) and the electric field \(\varvec{E}_0\) adopt the following expressions

(23)

2.4 Application to rank-one laminates

Section 2.3 presented phenomenological internal energy densities formulated in terms of principal invariants. For composite materials, computing the effective strain energy density requires homogenization. This section focuses on rank-one laminates, composed of two constituents arranged in layers perpendicular to \(\varvec{N}\). Rank-n homogenization theory [15] relates the macroscopic fields \(\bar{\varvec{F}}\) and \(\bar{\varvec{D}}_0\) to the microscopic fields \(\varvec{F}^a\), \(\varvec{F}^b\), \(\varvec{D}_0^a\), \(\varvec{D}_0^b\) as

$$\begin{aligned} \begin{aligned} \bar{\varvec{F}}&=c^a\varvec{F}^a+c^b\varvec{F}^b;\qquad \llbracket \varvec{F} \rrbracket \varvec{T}=\varvec{0},\quad \forall \,\varvec{T}\cdot \varvec{N}=0;\\ \bar{\varvec{D}}_0&=c^a\varvec{D}_0^a+c^b\varvec{D}_0^b;\qquad \llbracket \varvec{D}_0 \rrbracket \cdot \varvec{N}=0;\qquad \llbracket \left( \bullet \right) \rrbracket =\left( \bullet \right) ^a-\left( \bullet \right) ^b \end{aligned} \end{aligned}$$
(24)

where indices a and b differentiate the constituents and \(c^a\) and \(c^b\) denote their respective volume fractions, with \(c^b=1-c^a\). A possible definition for \(\varvec{F}^a\), \(\varvec{F}^b\), \(\varvec{D}_0^a\) and \(\varvec{D}_0^b\) compatible with (24) is

$$\begin{aligned} \begin{aligned} \varvec{F}^a \left( \bar{\varvec{F}},\varvec{\alpha }\right)&= \bar{\varvec{F}} + c^b \varvec{\alpha } \otimes \varvec{N};&\quad \varvec{F}^b \left( \bar{\varvec{F}},\varvec{\alpha }\right)&= \bar{\varvec{F}} - c^a \varvec{\alpha } \otimes \varvec{N};\\ \varvec{D}_0^a \left( \bar{\varvec{D}}_0,\varvec{\beta }\right)&= \bar{\varvec{D}}_0 + c^b \varvec{{\mathcal {T}}_N}\varvec{\beta };&\quad \varvec{D}_0^b \left( \bar{\varvec{D}}_0,\varvec{\beta }\right)&= \bar{\varvec{D}}_0 - c^a \varvec{{\mathcal {T}}_N}\varvec{\beta }. \end{aligned} \end{aligned}$$
(25)

where \(\varvec{\alpha }\in {\mathbb {R}}^3\) and \(\varvec{\beta }\in {\mathbb {R}}^2\) represent the mechanical and electric amplitude vectors, respectively, which need to be determined. Furthermore, \(\varvec{{\mathcal {T}}_N}=\varvec{T}_1\otimes \varvec{E}_1+\varvec{T}_2\otimes \varvec{E}_2\), where \(\varvec{T}_1\) and \(\varvec{T}_2\) are any two vectors perpendicular to \(\varvec{N}\), and \(\varvec{E}_1=\begin{bmatrix} 1&0 \end{bmatrix}^T\) and \(\varvec{E}_2=\begin{bmatrix} 0&1 \end{bmatrix}^T\).

Remark 2

Notice that in (25), although it might seem unintuitive a priori, the definition of \(\varvec{F}^a\) in terms of \(c^b\) and vice-versa (and likewise for \(\varvec{D}_0^a\) and \(\varvec{D}_0^b\)) is necessary in order to comply with Eq. (24). The first of these two equations entails

$$\begin{aligned}{} & {} c^a\varvec{F}^a + c^b\varvec{F}^b = c^a\Big (\bar{\varvec{F}} + c^b\varvec{\alpha }\otimes \varvec{N}\Big ) \nonumber \\{} & {} \quad + c^b\Big (\bar{\varvec{F}} - c^a\varvec{\alpha }\otimes \varvec{N}\Big )=\underbrace{\Big (c^a + c^b\Big )}_{=1}\bar{\varvec{F}} =\bar{\varvec{F}} \end{aligned}$$
(26)

which is clearly satisfied, and the second

$$\begin{aligned} \llbracket \varvec{F} \rrbracket \varvec{T}=\left( \varvec{F}^a-\varvec{F}^b\right) \varvec{T} =\underbrace{\Big (c^b + c^a\Big )}_{=1}\left( \varvec{\alpha }\otimes \varvec{N}\right) \varvec{T} =\left( \varvec{N}\cdot \varvec{T}\right) \varvec{\alpha }=\varvec{0},\quad \forall \,\varvec{T}\cdot \varvec{N}=0 \end{aligned}$$
(27)

which is also satisfied. The same derivations and conclusions are obtained when considering \(\varvec{D}_0^a\) and \(\varvec{D}_0^b\) as in (25).

The determination of \(\varvec{\alpha }\) and \(\varvec{\beta }\) can be done by postulating the effective energy \(e(\bar{\varvec{F}},\bar{\varvec{D}}_0)\) as

$$\begin{aligned} \begin{aligned} e\left( \bar{\varvec{F}},\bar{\varvec{D}}_0\right)&= \underset{\varvec{\alpha },\varvec{\beta }}{\min } \,\hat{e}\left( \bar{\varvec{F}},\bar{\varvec{D}}_0,\varvec{\alpha },\varvec{\beta }\right) , \\ \hat{e}\left( \bar{\varvec{F}},\bar{\varvec{D}}_0,\varvec{\alpha },\varvec{\beta }\right)&= c^a e^a\Big (\varvec{F}^a \left( \bar{\varvec{F}},\varvec{\alpha }\right) ,\varvec{D}_0^a \left( \bar{\varvec{D}}_0,\varvec{\beta }\right) \Big ) \\&\quad + c^b e^b\Big (\varvec{F}^b \left( \bar{\varvec{F}},\varvec{\alpha }\right) ,\varvec{D}_0^b \left( \bar{\varvec{D}}_0,\varvec{\beta }\right) \Big ). \end{aligned} \end{aligned}$$
(28)

The stationary conditions of \(\hat{e}\) with respect to \(\varvec{\alpha }\) and \(\varvec{\beta }\) yield

$$\begin{aligned} \begin{aligned} D\hat{e}\left[ \delta \varvec{\alpha }\right]&= c^a c^b \left( \varvec{P}^a - \varvec{P}^b\right) : \left( \delta \varvec{\alpha }\otimes \varvec{N}\right) = 0\quad \forall \,\delta \varvec{\alpha }\\&\quad \implies \quad \llbracket \varvec{P} \rrbracket \varvec{N} = \varvec{0};\\ D\hat{e}\left[ \delta \varvec{\beta }\right]&= c^a c^b \left( \varvec{E}_0^a - \varvec{E}_0^b\right) \cdot \left( \varvec{{\mathcal {T}}_N}\delta \varvec{\beta }\right) = 0\quad \forall \,\delta \varvec{\beta } \\&\quad \implies \quad \varvec{{\mathcal {T}}_N}^T\llbracket \varvec{E}_0 \rrbracket = \varvec{0}. \end{aligned} \end{aligned}$$
(29)

which represent two nonlinear vector equations from which \(\{\varvec{\alpha },\varvec{\beta }\}\) can be obtained. Finally, once \(\{\varvec{\alpha },\varvec{\beta }\}\) are computed, the effective first Piola-Kirchhoff stress tensor and electric field are obtained as

$$\begin{aligned} \begin{aligned}&\varvec{P} = c^a \varvec{P}^a + c^b \varvec{P}^b; \qquad \varvec{P}^a = \partial _{\varvec{F}^a}e^a\Big (\varvec{F}^a\left( \bar{\varvec{F}},\varvec{\alpha }\right) , \varvec{D}_0^a\left( \bar{\varvec{D}}_0,\varvec{\beta }\right) \Big ),\\&\quad \varvec{P}^b=\partial _{\varvec{F}^b}e^b \Big (\varvec{F}^b\left( \bar{\varvec{F}},\varvec{\alpha }\right) ,\varvec{D}_0^b\left( \bar{\varvec{D}}_0,\varvec{\beta }\right) \Big );\\&\varvec{E}_0 = c^a \varvec{E}_0^a + c^b \varvec{E}_0^b;\\&\quad \varvec{E}_0^a=\partial _{\varvec{D}_0^a}e^a\Big (\varvec{F}^a\left( \bar{\varvec{F}},\varvec{\alpha }\right) , \varvec{D}_0^a\left( \bar{\varvec{D}}_0,\varvec{\beta }\right) \Big ),\\&\quad \varvec{E}_0^b=\partial _{\varvec{D}_0^b}e^b \Big (\varvec{F}^b\left( \bar{\varvec{F}},\varvec{\alpha }\right) ,\varvec{D}_0^b\left( \bar{\varvec{D}}_0,\varvec{\beta }\right) \Big ) \end{aligned} \end{aligned}$$
(30)
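The workflow of Eqs. (25), (28) and (29) can be sketched end to end: minimize \(\hat{e}\) over \(\{\varvec{\alpha },\varvec{\beta }\}\) and verify the resulting traction and tangential electric-field continuity. The constituent energies, volume fractions and macroscopic state below are illustrative assumptions (decoupled quadratic phases), chosen so the phase derivatives are available in closed form.

```python
import numpy as np
from scipy.optimize import minimize

# assumed toy constituent energies: e = mu/2 F:F + 1/(2 eps) D0.D0
def e_phase(F, D0, mu, eps):
    return 0.5 * mu * np.sum(F * F) + 0.5 / eps * (D0 @ D0)

ca, cb = 0.4, 0.6                     # volume fractions, cb = 1 - ca
mu_a, eps_a = 1.0, 1.0                # soft phase (assumed values)
mu_b, eps_b = 5.0, 4.0                # stiff phase (assumed values)
N = np.array([0.0, 0.0, 1.0])         # lamination direction
TN = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])           # T_N = T1 (x) E1 + T2 (x) E2

Fbar = np.eye(3); Fbar[0, 2] = 0.1    # macroscopic deformation gradient
Dbar = np.array([0.0, 0.0, 0.5])      # macroscopic electric displacement

def micro_fields(alpha, beta):
    # Eq. (25)
    Fa = Fbar + cb * np.outer(alpha, N); Fb = Fbar - ca * np.outer(alpha, N)
    Da = Dbar + cb * (TN @ beta);        Db = Dbar - ca * (TN @ beta)
    return Fa, Fb, Da, Db

def e_hat(x):
    # Eq. (28), with x = [alpha (3), beta (2)]
    Fa, Fb, Da, Db = micro_fields(x[:3], x[3:])
    return ca * e_phase(Fa, Da, mu_a, eps_a) + cb * e_phase(Fb, Db, mu_b, eps_b)

res = minimize(e_hat, np.zeros(5), method='BFGS')
Fa, Fb, Da, Db = micro_fields(res.x[:3], res.x[3:])

# stationarity, Eq. (29): traction and tangential E-field continuity
Pa, Pb = mu_a * Fa, mu_b * Fb         # P = mu F for the toy phases
Ea, Eb = Da / eps_a, Db / eps_b       # E0 = D0 / eps
assert np.allclose((Pa - Pb) @ N, 0.0, atol=1e-4)
assert np.allclose(TN.T @ (Ea - Eb), 0.0, atol=1e-4)

# effective response, Eq. (30)
P_eff = ca * Pa + cb * Pb
E0_eff = ca * Ea + cb * Eb
```

In practice (29) is solved with a Newton scheme rather than a generic optimizer; the minimization above is simply the most compact way to sketch the same stationarity conditions.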

3 Gaussian process predictors

In the realm of computer experiments, metamodelling or surrogate modelling entails substituting a resource-intensive model or simulator \({U}=\mathscr {M}(\textbf{I})\) with a computationally efficient emulator \(\hat{\mathscr {M}}(\textbf{I})\). Both the simulator and emulator share the same input space \({\mathcal {D}}_I \subseteq {\mathbb {R}}^n\) and output space \({\mathcal {D}}_U \subseteq {\mathbb {R}}\). In our context, \(\mathscr {M}\) represents the response of an actual internal energy density U, dependent on principal invariants \(\textbf{I}\) (as discussed in Sect. 2.3). Thus, we replace the common input field \(\varvec{x}\) with \(\textbf{I}\) and the output y with U. As the internal energy U is scalar, the theoretical developments presented in this paper exclusively pertain to scalar outputs. Our approach employs Kriging models [30, 60], also known as Gaussian Process (GP) modelling. Succinctly, the key components of this method will be detailed in Sects. 3.1–3.3.

3.1 Gaussian process based prediction

GP modelling assumes that the output \({U}=\mathscr {M}(\textbf{I})\) is characterised by

$$\begin{aligned} {U}=\varvec{g}(\textbf{I})\cdot \varvec{\beta } + {Z}(\textbf{I}), \end{aligned}$$
(31)

where \(\varvec{g}(\textbf{I})\cdot \varvec{\beta } \) signifies the prior mean of the Gaussian Process (GP), representing a linear regression model over a specific functional basis \(\{g_i,\ i=1,\ldots ,p\} \subset {\mathcal {L}}_2({\mathcal {D}}_I, {\mathbb {R}})\). The subsequent component, denoted as \({Z}(\textbf{I})\), characterizes a GP with a zero mean, a constant variance \(\sigma ^2_{U}\), and a stationary autocovariance function, defined as follows

$$\begin{aligned} C({\textbf{I}},{\textbf{I}^{'}})=\sigma ^2_{U} {\mathcal {R}}(\textbf{I},\textbf{I}^{'}, {\varvec{\theta }}), \end{aligned}$$
(32)

where \({\mathcal {R}}\) is a symmetric positive definite autocorrelation function and \(\varvec{\theta }\) is the vector of hyperparameters. In this work we employ the Gaussian kernel for the definition of \({\mathcal {R}}\), defined as

$$\begin{aligned} {\mathcal {R}}(\varvec{I},\varvec{I}^{'}, {\varvec{\theta }})= \exp \Bigg (\sum _{k=1}^n-\theta _k \left| I_k-I^{'}_k\right| ^2\Bigg ) \end{aligned}$$
(33)
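For concreteness, the anisotropic Gaussian kernel of Eq. (33) amounts to only a few lines of code. The following NumPy sketch (variable names are of our choosing) is illustrative and not part of the original formulation:

```python
import numpy as np

def gaussian_correlation(I1, I2, theta):
    """Anisotropic Gaussian autocorrelation R(I, I', theta) of Eq. (33).

    I1, I2 : length-n arrays of invariants.
    theta  : length-n array of non-negative hyperparameters.
    """
    d = np.asarray(I1, float) - np.asarray(I2, float)
    # exp(-sum_k theta_k |I_k - I'_k|^2)
    return np.exp(-np.sum(np.asarray(theta, float) * d**2))
```

The correlation equals one at zero separation and decays with distance, at a rate controlled per input dimension by \(\theta _k\).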

The construction of a Kriging model consists of the two-stage framework described in the upcoming Sects. 3.2 and 3.3.

3.2 The conditional distribution of the prediction

The Bayesian prediction methodology assumes that the observations are gathered in the vector

$$\begin{aligned} \varvec{U} =\left[ U(\varvec{I}^{(1)}),U(\varvec{I}^{(2)}),\ldots ,U(\varvec{I}^{(m)})\right] ^T \end{aligned}$$

These observed values, together with any unobserved value \( U(\varvec{I})\), constitute a realization of a random vector following a parametric joint distribution. This section seeks to derive a stochastic prediction for the unobserved quantity by harnessing this statistical interdependence. The Gaussian assumption for \(Z(\textbf{I})\) in Eq. (31) and the linear nature of the regression model enable the inference that the observation vector \(\varvec{U}\) is also Gaussian, characterized by

$$\begin{aligned} \varvec{U} \sim {\mathcal {N}}(\varvec{G}\varvec{\beta },\sigma _U^2 \varvec{R}), \end{aligned}$$
(34)

where \(\varvec{G}\) and \(\varvec{R}\) are the regression and correlation matrices, defined as

$$\begin{aligned} G_{ij}=g_j(\textbf{I}^{(i)}),\quad i=1,\ldots ,m,\ j=1,\ldots ,p, \end{aligned}$$
(35)

and

$$\begin{aligned} R_{ij}={\mathcal {R}}(\textbf{I}^{(i)},\textbf{I}^{(j)}, {\varvec{\theta }}) \quad i=1,\ldots ,m,\ j=1,\ldots ,m. \end{aligned}$$
(36)

Likewise, a new random vector, encompassing the observed set \(\varvec{U}\) alongside any unobserved value \(U(\varvec{I})\), follows a joint Gaussian distribution, presented as

$$\begin{aligned} \begin{Bmatrix} \varvec{U}\\ U(\varvec{I}) \end{Bmatrix} \sim {\mathcal {N}}\Bigg ( \begin{Bmatrix} \varvec{G}\\ {\varvec{ g}}(\textbf{I})^T \end{Bmatrix} {\varvec{\beta }}, \sigma _U^2 \begin{bmatrix} {\varvec{R}}&{} {\varvec{ r}}({\textbf{I}})\\ {\varvec{ r}}({\textbf{I}})^T&{}1 \end{bmatrix} \Bigg ), \end{aligned}$$
(37)

where \( {\varvec{ g}}(\textbf{I})\) is the vector of regressors evaluated at \(\textbf{I}\) and \(\varvec{ r}(\textbf{I})\) is the vector of cross-correlations between the observations and prediction given by

$$\begin{aligned} r_{i}(\textbf{I})={\mathcal {R}}(\textbf{I}^{(i)},\textbf{I}, {\varvec{\theta }}) \quad i=1,\ldots ,m. \end{aligned}$$
(38)

Assuming that the autocovariance function given by Eq. (32) is known, the conditional distribution of the prediction \(\hat{U}(\varvec{I})=U(\varvec{I})|\varvec{U}\) is governed by the Best Linear Unbiased Predictor (BLUP) theorem [62]. As per BLUP, the unobserved quantity \(U(\varvec{I})=\mathscr {M}(\textbf{I})\) in the prior model of Eq. (31) follows a Gaussian distribution, represented by the Gaussian random variable \(\hat{U}\) with mean

$$\begin{aligned} \begin{aligned} \mu _{\hat{U}}(\textbf{I})&=\varvec{g}(\textbf{I})\cdot \hat{\varvec{\beta }} + \varvec{r}({\textbf{I}})\cdot \varvec{R}^{-1}\Big (\varvec{U} - \varvec{G}\hat{\varvec{\beta }}\Big ), \end{aligned} \end{aligned}$$
(39)

and variance

$$\begin{aligned} \begin{aligned}&\sigma ^2_{\hat{U}}(\textbf{I})=\sigma _U^2\Big (1-\varvec{r}({\textbf{I}})\cdot \varvec{R}^{-1}\varvec{r}({\textbf{I}}) + \varvec{u}({\textbf{I}})\cdot \Big (\varvec{G}^T\varvec{R}^{-1}\varvec{G}\Big )^{-1}\varvec{u}({\textbf{I}}) \Big ), \end{aligned} \end{aligned}$$
(40)

where

$$\begin{aligned} \begin{aligned} \hat{\varvec{\beta }}&=\Big (\varvec{G}^T\varvec{R}^{-1}\varvec{G}\Big )^{-1}\varvec{G}^T\varvec{R}^{-1}\varvec{U}, \end{aligned} \end{aligned}$$
(41)

is the generalised least-squares estimate of the underlying regression problem, and

$$\begin{aligned} \begin{aligned} \varvec{u}( {\textbf{I}})=\varvec{G}^T\varvec{R}^{-1}\varvec{r}({\textbf{I}}) - \varvec{g}({\textbf{I}}). \end{aligned} \end{aligned}$$
(42)
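The BLUP formulas (39)–(42) can be assembled directly once a training set is available. The following sketch assumes the Gaussian kernel of Eq. (33), a constant regression basis \(g(\textbf{I})=1\) (ordinary Kriging) and known hyperparameters; it is an illustration under those assumptions, not the implementation used in this work:

```python
import numpy as np

def blup_predict(I_train, U_train, I_new, theta, sigma2):
    """Kriging mean (Eq. 39) and variance (Eq. 40) with a constant basis g(I) = 1."""
    m = I_train.shape[0]

    def corr(a, b):  # Gaussian kernel of Eq. (33)
        return np.exp(-np.sum(theta * (a - b)**2))

    # Correlation matrix R (Eq. 36) and cross-correlation vector r (Eq. 38)
    R = np.array([[corr(I_train[i], I_train[j]) for j in range(m)] for i in range(m)])
    r = np.array([corr(I_train[i], I_new) for i in range(m)])
    G = np.ones((m, 1))   # regression matrix (Eq. 35) for g(I) = 1
    g = np.ones(1)
    Rinv = np.linalg.inv(R)

    # Generalised least-squares estimate of beta (Eq. 41)
    beta = np.linalg.solve(G.T @ Rinv @ G, G.T @ Rinv @ U_train)

    # Mean (Eq. 39), auxiliary vector u (Eq. 42) and variance (Eq. 40)
    mu = g @ beta + r @ Rinv @ (U_train - G @ beta)
    u = G.T @ Rinv @ r - g
    var = sigma2 * (1.0 - r @ Rinv @ r + u @ np.linalg.solve(G.T @ Rinv @ G, u))
    return mu, var
```

At a training point the mean reproduces the observed value and the variance vanishes, reflecting the interpolation property of the predictor.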

Readers unfamiliar with the previous derivations may consult Reference [18], which provides a thorough treatment of the underlying mathematical principles and methodologies.

3.3 Joint maximum likelihood estimation of the GP parameters

In the previous sections, we operated under the assumption of a known autocovariance function. However, the specifics of the correlation functions \({\mathcal {R}}(\textbf{I},\textbf{I}^{'}, {\varvec{\theta }})\) and the variance value \(\sigma _U^2\) are typically not known in advance. In this study, we pre-define the correlation function type (specifically we make use of Gaussian kernels in (33) [18]), and the determination of hyperparameters \(\varvec{\theta }\) and variance \(\sigma _U^2\) is achieved using the observation dataset via the technique of maximum likelihood estimation (MLE). The outcome of this process yields the empirical best linear unbiased predictors (EBLUP) [62]. The estimation of GP parameters involves solving the following minimization problem

$$\begin{aligned} \left\{ \varvec{\beta }^*,{\sigma _U^2}^*,\varvec{\theta }^*\right\}= & {} \textrm{arg}\min _{\varvec{\beta },\sigma ^2_U,\varvec{\theta }}\ \ \mathscr {L}(\varvec{U}|\varvec{\beta },\sigma _U^2,\varvec{\theta }), \end{aligned}$$
(43)

where \(\mathscr {L}(\varvec{U}|\varvec{\beta },\sigma _U^2,\varvec{\theta })\) is the negative log-likelihood of the observations \(\varvec{U}\) with respect to their multivariate normal distribution, given by

$$\begin{aligned}&\mathscr {L}(\varvec{U}|\varvec{\beta },\sigma _U^2,\varvec{\theta })= \frac{1}{2\sigma _U^2}(\varvec{U}-\varvec{G}\varvec{\beta })^T\varvec{R}(\varvec{\theta })^{-1}(\varvec{U} -\varvec{G}\varvec{\beta })\nonumber \\&\quad +\frac{m}{2}\log (2\pi )+\frac{m}{2}\log (\sigma _U^2)+\frac{1}{2}\log (|\varvec{R}(\varvec{\theta })|). \end{aligned}$$
(44)

The MLE of \(\varvec{\beta }\) and \(\sigma _{U}^2\) are obtained from the first order optimality conditions of \(\mathscr {L}(\varvec{U}| \varvec{\beta },\sigma _U^2,\varvec{\theta })\), namely

$$\begin{aligned} \left\{ \begin{aligned} \partial _{\varvec{\beta }}\mathscr {L}&=\frac{1}{\sigma _U^2}\varvec{G}^T\varvec{R}^{-1}\left( \varvec{G}\varvec{\beta }-\varvec{U}\right) =0;\\ \partial _{\sigma _U^2}\mathscr {L}&=\frac{1}{2\sigma _U^2}\Bigg (m-\frac{\Big (\varvec{U} -\varvec{G}\varvec{\beta }\Big ) \cdot \varvec{R}^{-1} \Big (\varvec{U}-\varvec{G}\varvec{\beta }\Big )}{\sigma _U^2}\Bigg )=0; \end{aligned} \right. \end{aligned}$$
(45)

from which the following optimal values can be obtained

$$\begin{aligned} \begin{aligned} \varvec{\beta }^*(\varvec{\theta })&=\Big (\varvec{G}^T\varvec{R}(\varvec{\theta })^{-1}\varvec{G}\Big )^{-1}\varvec{G}^T \Big (\varvec{R}(\varvec{\theta })\Big )^{-1}\varvec{U};\\ {\sigma _U^2}^*(\varvec{\theta })&=\frac{1}{m}\Big (\varvec{U}-\varvec{G}\varvec{\beta }^*(\varvec{\theta })\Big )\cdot \Big (\varvec{R}(\varvec{\theta })\Big )^{-1}\cdot \Big (\varvec{U}-\varvec{G}\varvec{\beta }^*(\varvec{\theta })\Big ). \end{aligned} \end{aligned}$$
(46)

Substituting \(\varvec{\beta }^{*}(\varvec{\theta }) \) and \({\sigma _U^2}^*(\varvec{\theta }) \) into the log-likelihood function (44) enables it to be re-written as

$$\begin{aligned} \begin{aligned}&\mathscr {L}(\varvec{U}|\varvec{\beta }^*,{\sigma _U^2}^*,\varvec{\theta })= \frac{m}{2}+\frac{m}{2}\log (2\pi )\\&\quad +\frac{m}{2}\log \Big ({\sigma _U^2}^*(\varvec{\theta })\Big )+\frac{1}{2}\log \Big (|\varvec{R}(\varvec{\theta })|\Big )\\&\quad =\frac{m}{2}\log (\psi (\varvec{\theta }))+\frac{m}{2}(\log (2\pi )+1), \end{aligned} \end{aligned}$$
(47)

where the reduced likelihood function has been introduced as

$$\begin{aligned} \psi (\varvec{\theta })={\sigma _U^2}^*(\varvec{\theta })|\varvec{R}(\varvec{\theta })|^{1/m}. \end{aligned}$$
(48)

This entails that the minimisation problem in Eq. (43) is equivalent to

$$\begin{aligned} \begin{aligned}&\varvec{\theta }^*=\textrm{arg}\min _{\varvec{\theta }}\ \ \psi (\varvec{\theta }),\qquad&s.t. \ \ [\varvec{\theta }]_i\ge 0\,\,\,\,i=\{1,2,\cdots ,n\} \end{aligned}\nonumber \\ \end{aligned}$$
(49)

Unfortunately, an analytical solution for the optimal hyperparameters \(\varvec{\theta }\in {\mathbb {R}}^n\) is unavailable. Instead, a numerical minimization approach is typically employed. In our research, we employ the box-min algorithm [69].
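As an illustration of Eqs. (46)–(49), the reduced likelihood \(\psi (\varvec{\theta })\) can be evaluated and minimized numerically. In the sketch below, a bounded quasi-Newton method from SciPy stands in for the box-min algorithm of [69] (an assumption on our part), and a constant regression basis is used:

```python
import numpy as np
from scipy.optimize import minimize

def reduced_likelihood(theta, I_train, U_train):
    """psi(theta) of Eq. (48) for a constant-mean Kriging model."""
    theta = np.asarray(theta, float)
    m = I_train.shape[0]
    d = I_train[:, None, :] - I_train[None, :, :]
    R = np.exp(-np.sum(theta * d**2, axis=2))        # Gaussian kernel, Eq. (33)
    Rinv = np.linalg.inv(R)
    G = np.ones((m, 1))
    beta = np.linalg.solve(G.T @ Rinv @ G, G.T @ Rinv @ U_train)  # Eq. (46), first line
    res = U_train - G @ beta
    sigma2 = (res @ Rinv @ res) / m                               # Eq. (46), second line
    return sigma2 * np.linalg.det(R)**(1.0 / m)                   # Eq. (48)

def fit_theta(I_train, U_train, theta0):
    """Minimise psi(theta) subject to theta >= 0, as in Eq. (49)."""
    out = minimize(reduced_likelihood, theta0, args=(I_train, U_train),
                   bounds=[(1e-6, None)] * len(theta0))
    return out.x
```

For larger training sets, \(\log |\varvec{R}|/m\) (e.g., via a Cholesky factorisation) is preferable to \(|\varvec{R}|^{1/m}\) for numerical robustness.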

3.4 Gradient-enhanced Gaussian-process based prediction

In addition to function observations, output derivatives with respect to the input variables can be leveraged to enhance predictor accuracy. This gives rise to what is termed Gradient Enhanced Kriging in the literature [23, 35], in contrast to the conventional Kriging detailed in Sects. 3.1–3.3. To establish a gradient-enhanced predictor, the observation vector is extended to encompass the derivatives of the strain energy density U with respect to its input variables \(\textbf{I}\), resulting in:

$$\begin{aligned} \varvec{U} =\left[ U^{(1)},\ldots ,U^{(m)}, \partial _{\textbf{I}}U^{(1)},\ldots ,\partial _{\textbf{I}}U^{(m)}\right] ^T, \end{aligned}$$
(50)

where

$$\begin{aligned} U^{(i)}=U(\varvec{I}^{(i)})\quad \quad \partial _{\textbf{I}}U^{(i)} = \left[ \partial _{I_1}U^{(i)}, \ldots , \partial _{I_n}U^{(i)} \right] ^T. \end{aligned}$$
(51)

To interpolate both the variable and its gradient at any unobserved location, the extension of the correlation matrix \(\varvec{R}\) is necessary to incorporate the correlation between the variable and its gradient, formulated as

$$\begin{aligned} \varvec{R}=\begin{bmatrix} \varvec{R}_{\varvec{UU}} &{} \varvec{R}_{\varvec{U}\varvec{U}^{\prime }}\\ \varvec{R}_{\varvec{U}\varvec{U}^{\prime }}^T &{} \varvec{R}_{\varvec{U}^{\prime }\varvec{U}^{\prime }} \end{bmatrix}, \end{aligned}$$
(52)

where \(\varvec{R}_{\varvec{UU}} \) is the correlation matrix presented in (36) for the non-gradient case. \(\varvec{R}_{\varvec{UU'}} \) includes the partial derivatives of \({\mathcal {R}}\) according to

$$\begin{aligned} \varvec{R}_{\varvec{U}\varvec{U}^{\prime }} = \begin{bmatrix} \partial _{\textbf{I}^{(1)}}{\mathcal {R}}(\textbf{I}^{(1)},\textbf{I}^{(1)},\varvec{\theta }) &{} \dots &{} \partial _{\textbf{I}^{(m)}}{\mathcal {R}}(\textbf{I}^{(1)},\textbf{I}^{(m)},\varvec{\theta })\\ \vdots &{} \ddots &{} \vdots \\ \partial _{\textbf{I}^{(1)}}{\mathcal {R}}(\textbf{I}^{(m)},\textbf{I}^{(1)},\varvec{\theta }) &{} \dots &{} \partial _{\textbf{I}^{(m)}}{\mathcal {R}}(\textbf{I}^{(m)},\textbf{I}^{(m)},\varvec{\theta })\end{bmatrix}, \end{aligned}$$
(53)

given

$$\begin{aligned} \partial _{\textbf{I}^{(j)}} {\mathcal {R}}(\textbf{I}^{(i)},\textbf{I}^{(j)},\varvec{\theta }) =\left[ \frac{\partial {\mathcal {R}}(\textbf{I}^{(i)},\textbf{I}^{(j)},\varvec{\theta })}{\partial {I}^{(j)}_1}, \frac{\partial {\mathcal {R}}(\textbf{I}^{(i)},\textbf{I}^{(j)},\varvec{\theta })}{\partial {I}^{(j)}_2},\ldots , \frac{\partial {\mathcal {R}}(\textbf{I}^{(i)},\textbf{I}^{(j)},\varvec{\theta })}{\partial {I}^{(j)}_n}\right] ^T. \end{aligned}$$
(54)

The submatrix \(\varvec{R}_{\varvec{U^{\prime }U^{\prime }}} \) contains the second derivatives

$$\begin{aligned} \varvec{R}_{\varvec{U}^{\prime }\varvec{U}^{\prime }} = \begin{bmatrix} \partial ^2_{\textbf{I}^{(1)}\textbf{I}^{(1)}}{\mathcal {R}}(\textbf{I}^{(1)},\textbf{I}^{(1)},\varvec{\theta }) &{} \dots &{} \partial ^2_{\textbf{I}^{(1)}\textbf{I}^{(m)}}{\mathcal {R}}(\textbf{I}^{(1)},\textbf{I}^{(m)},\varvec{\theta })\\ \vdots &{} \ddots &{} \vdots \\ \partial ^2_{\textbf{I}^{(m)}\textbf{I}^{(1)}}{\mathcal {R}}(\textbf{I}^{(m)},\textbf{I}^{(1)},\varvec{\theta }) &{} \dots &{} \partial ^2_{\textbf{I}^{(m)}\textbf{I}^{(m)}}{\mathcal {R}}(\textbf{I}^{(m)},\textbf{I}^{(m)},\varvec{\theta }) \end{bmatrix}, \end{aligned}$$
(55)

where

$$\begin{aligned} \partial ^2_{\textbf{I}^{(i)}\textbf{I}^{(j)}} {\mathcal {R}}(\textbf{I}^{(i)},\textbf{I}^{(j)},\varvec{\theta }) = \begin{bmatrix} \dfrac{\partial ^2{\mathcal {R}}(\textbf{I}^{(i)},\textbf{I}^{(j)},\varvec{\theta })}{\partial I^{(i)}_1\partial I^{(j)}_1} &{} \dots &{} \dfrac{\partial ^2{\mathcal {R}}(\textbf{I}^{(i)},\textbf{I}^{(j)},\varvec{\theta })}{\partial I^{(i)}_1\partial I^{(j)}_n}\\ \vdots &{} \ddots &{} \vdots \\ \dfrac{\partial ^2{\mathcal {R}}(\textbf{I}^{(i)},\textbf{I}^{(j)},\varvec{\theta })}{\partial I^{(i)}_n\partial I^{(j)}_1} &{} \dots &{} \dfrac{\partial ^2{\mathcal {R}}(\textbf{I}^{(i)},\textbf{I}^{(j)},\varvec{\theta })}{\partial I^{(i)}_n\partial I^{(j)}_n} \end{bmatrix}. \end{aligned}$$
(56)

Similarly the vector of cross-correlations between the observations and the prediction is extended as follows

$$\begin{aligned} \varvec{r} (\textbf{I})=\left[ {\mathcal {R}}(\textbf{I},\textbf{I}^{(1)},\varvec{\theta }), \ldots , {\mathcal {R}}(\textbf{I},\textbf{I}^{(m)},\varvec{\theta }), \partial _{\textbf{I}^{(1)}}{\mathcal {R}}(\textbf{I},\textbf{I}^{(1)},\varvec{\theta }), \ldots , \partial _{\textbf{I}^{(m)}}{\mathcal {R}}(\textbf{I},\textbf{I}^{(m)},\varvec{\theta })\right] ^T. \end{aligned}$$
(57)
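For the Gaussian kernel of Eq. (33), the kernel derivatives entering the blocks (53)–(56) are available in closed form: with \(d_k = I_k - I'_k\), one has \(\partial {\mathcal {R}}/\partial I'_l = 2\theta _l d_l {\mathcal {R}}\) and \(\partial ^2{\mathcal {R}}/\partial I_k \partial I'_l = (2\theta _k\delta _{kl} - 4\theta _k\theta _l d_k d_l){\mathcal {R}}\). A sketch of these formulas (ours, not the original implementation) follows:

```python
import numpy as np

def kernel(a, b, theta):
    """Gaussian autocorrelation of Eq. (33)."""
    d = a - b
    return np.exp(-np.sum(theta * d**2))

def dkernel_db(a, b, theta):
    """First derivative w.r.t. the second argument: 2 theta_l d_l R (Eqs. 53-54)."""
    d = a - b
    return 2.0 * theta * d * kernel(a, b, theta)

def d2kernel_dadb(a, b, theta):
    """Mixed second derivatives, entry [k, l] = (2 th_k delta_kl
    - 4 th_k th_l d_k d_l) R (Eqs. 55-56)."""
    d = a - b
    R = kernel(a, b, theta)
    return (2.0 * np.diag(theta) - 4.0 * np.outer(theta * d, theta * d)) * R
```

These closed forms populate \(\varvec{R}_{\varvec{U}\varvec{U}^{\prime }}\) and \(\varvec{R}_{\varvec{U}^{\prime }\varvec{U}^{\prime }}\) without resorting to numerical differentiation.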

Once these adaptations have been made, the revised definitions for the respective quantities can be inserted into the descriptions provided in Sects. 3.2 and 3.3. To begin, let us revisit the mean prediction

$$\begin{aligned} \begin{aligned} \mu _{\hat{U}}(\textbf{I})&=\varvec{g}(\textbf{I})\cdot \hat{\varvec{\beta }} + \varvec{r}({\textbf{I}})\cdot \varvec{R}^{-1}\Big (\varvec{U} - \varvec{G}\hat{\varvec{\beta }}\Big )\\ \end{aligned}, \end{aligned}$$
(58)

and the variance

$$\begin{aligned} \begin{aligned} \sigma ^2_{\hat{U}}(\textbf{I})&=\sigma _U^2\Big (1-\varvec{r}({\textbf{I}})\cdot \varvec{R}^{-1}\varvec{r}({\textbf{I}}) + \varvec{u}({\textbf{I}})\cdot \Big (\varvec{G}^T\varvec{R}^{-1}\varvec{G}\Big )^{-1}\varvec{u}({\textbf{I}}) \Big ) \end{aligned},\nonumber \\ \end{aligned}$$
(59)

with

$$\begin{aligned} \begin{aligned} \varvec{\beta }^*(\varvec{\theta })&=\Big (\varvec{G}^T\varvec{R}(\varvec{\theta })^{-1}\varvec{G}\Big )^{-1} \varvec{G}^T\Big (\varvec{R}(\varvec{\theta })\Big )^{-1}\varvec{U};\\ {\sigma _U^2}^*(\varvec{\theta })&=\frac{1}{m(1+n)}\Big (\varvec{U}-\varvec{G}\varvec{\beta }^*(\varvec{\theta })\Big ) \cdot \Big (\varvec{R}(\varvec{\theta })\Big )^{-1}\cdot \Big (\varvec{U}-\varvec{G}\varvec{\beta }^*(\varvec{\theta })\Big ); \end{aligned} \end{aligned}$$
(60)

and

$$\begin{aligned}&\varvec{G}(\textbf{I})= \begin{bmatrix} \varvec{G}_{\varvec{U}}\\ \varvec{G}_{\varvec{U}^{\prime }} \end{bmatrix};\qquad \varvec{G}_{\varvec{U}}=\begin{bmatrix} \left( \varvec{g}(\textbf{I}^{(1)})\right) ^T\\ \vdots \\ \left( \varvec{g}(\textbf{I}^{(m)})\right) ^T \end{bmatrix};\nonumber \\&\quad \varvec{G}_{\varvec{U}^{\prime }}=\begin{bmatrix} \left( \partial _{\textbf{I}^{(1)}}\varvec{g}(\textbf{I}^{(1)})\right) ^T\\ \vdots \\ \left( \partial _{\textbf{I}^{(m)}}\varvec{g}(\textbf{I}^{(m)})\right) ^T \end{bmatrix}. \end{aligned}$$
(61)

Finally, the optimal hyperparameters are obtained by minimizing the concentrated log-likelihood function

$$\begin{aligned} \begin{aligned}&\mathscr {L}(\varvec{U}|\varvec{\beta }^*,{\sigma _U^2}^*,\varvec{\theta })=\frac{m(1+n)}{2} \log (\psi (\varvec{\theta }))\\&\quad +\frac{m(1+n)}{2}(\log (2\pi )+1). \end{aligned} \end{aligned}$$
(62)

Remark 3

The conceptual framework presented in Sects. 2 and 3 has been constructed around the internal energy density \(e(\varvec{F},\varvec{D}_0)\). Its dual, the free energy density \(\varPsi (\varvec{F},\varvec{E}_0)\), can nevertheless be defined through the following Legendre transformation (Fig. 2)

$$\begin{aligned} \varPsi (\varvec{F},\varvec{E}_0) = -\sup _{\varvec{D}_0}\{\varvec{E}_0\cdot \varvec{D}_0 - e(\varvec{F},\varvec{D}_0)\} \end{aligned}$$
(63)
Fig. 2

a Convex nature of the internal energy density \(e(\varvec{F},\varvec{D}_0)\) and b saddle point nature of the free energy density \(\varPsi (\varvec{F},\varvec{E}_0)\) in the vicinity of \(\varvec{F}\approx \varvec{I}\) and \(\varvec{E}_0\approx \varvec{0}\)

The free energy density \(\varPsi (\varvec{F},\varvec{E}_0)\) obeys distinct convexity constraints compared to its dual counterpart \(e(\varvec{F},\varvec{D}_0)\). As a consequence, in the proximity of \(\varvec{F}\approx \varvec{I}\) and \(\varvec{E}_0\approx \varvec{0}\), it behaves as a saddle point function, convex with respect to \(\varvec{F}\) while concave with respect to \(\varvec{E}_0\). This divergence in convexity/concavity attributes between the mechanical and electrical arguments can introduce challenges in the application of Kriging or Gradient Enhanced Kriging interpolation models that rely on invariants of \(\varvec{F}\) and \(\varvec{E}_0\) (i.e., free energy-based Kriging), as opposed to those formulated in terms of \(\varvec{F}\) and \(\varvec{D}_0\) (i.e., internal energy-based Kriging). Notably, our observations underscore that the internal energy-based approach yields markedly superior outcomes compared to the free energy density methodology.

Remark 4

Applying the Kriging and Gradient Enhanced Kriging techniques discussed in Sects. 3.1 to 3.4 can be criticized for requiring strain energy values U at each observation or training point. This constraint limits their suitability for datasets stemming from physical experiments, in which energy is difficult to measure directly, as opposed to the in-silico or numerical data on which this paper focuses. Gradient Enhanced Kriging, however, is adaptable enough to accommodate cases with a single energy observation (\(\varvec{F}=\varvec{I}\)), where U typically equals zero; derivative information at this point, combined with derivatives at other points, can then be seamlessly integrated. This tailored approach is elaborated in Appendix C.

3.5 Derivatives of strain energy density for Gradient Enhanced Kriging

As detailed in Sect. 3.4, the gradient-enhanced Kriging method incorporates both the internal energy density U and its derivatives with respect to the invariants \(\textbf{I}=\{I_1,I_2,\dots ,I_n\}\). In cases involving isotropy or transverse isotropy within material symmetry groups, coupled with a principal invariant approach (see Sect. 2.3), the derivatives of U with respect to \(\textbf{I}\) must therefore be addressed. While obtaining these derivatives for analytical energies, such as those derived from a Mooney–Rivlin model, is straightforward, intricate internal energy densities arising from complex homogenization techniques in composites (e.g., rank-one laminates in Sect. 2.4) may lack readily available derivatives. In such scenarios, these derivatives can be recovered from the first Piola-Kirchhoff stress tensor \(\varvec{P}\) and the electric field \(\varvec{E}_0\) using conventional linear algebra. To facilitate this, let us revisit the invariant-based expressions for \(\varvec{P}\) and \(\varvec{E}_0\) in (19), conveniently restated below

$$\begin{aligned} \begin{aligned} \varvec{{P}}&= \Big (\partial _{I_1} U\Big ) \varvec{V}_1 + \dots + \Big (\partial _{I_n}U\Big )\varvec{V}_n\\ \varvec{E}_0&= \Big (\partial _{I_1} U\Big ) \varvec{W}_1 +\dots + \Big (\partial _{I_n}U\Big )\varvec{W}_n \end{aligned} \end{aligned}$$
(64)

where

$$\begin{aligned} \varvec{V}_i=\partial _{\varvec{F}}I_i;\qquad \qquad \varvec{W}_i=\partial _{\varvec{D}_0}I_i. \end{aligned}$$
(65)

Let us introduce now the following notation

$$\begin{aligned} \varvec{{\mathcal {A}}}=\begin{bmatrix} \hat{\varvec{P}}\\ \varvec{E}_0 \end{bmatrix};\qquad \qquad \varvec{{\mathcal {W}}}_i=\begin{bmatrix} \hat{\varvec{V}}_i\\ \varvec{W}_i \end{bmatrix},\,\,\,i=\{1,\dots ,n\} \end{aligned}$$
(66)

where \(\hat{\varvec{P}}\in {\mathbb {R}}^9\) and \(\hat{\varvec{V}}_i\in {\mathbb {R}}^9\) represent the vectorised expressions of \(\varvec{P}\) and \(\varvec{V}_i\), respectively. This entails that \(\varvec{{\mathcal {A}}}\) can be conveniently written in terms of \(\varvec{{\mathcal {W}}}_i\) as

$$\begin{aligned} \begin{aligned} \varvec{{{\mathcal {A}}}} = \Big (\partial _{I_1} U\Big ) \varvec{{\mathcal {W}}}_1 + \dots + \Big (\partial _{I_n}U\Big )\varvec{{\mathcal {W}}}_n \end{aligned} \end{aligned}$$
(67)

In (67), \(\varvec{{\mathcal {W}}}_i\) can be understood as the linearly independent vectors of a basis, whilst \(\Big (\partial _{I_i} U\Big )\) represent the coordinates of \(\varvec{{\mathcal {A}}}\) along the vectors \(\varvec{{\mathcal {W}}}_i\) (\(i=\{1,\dots ,n\}\)). As is standard in linear algebra, given \(\varvec{{\mathcal {A}}}\), the coordinates \(\Big (\partial _{I_i} U\Big )\) can be obtained by projecting \(\varvec{{\mathcal {A}}}\) onto the n vectors of the basis, which yields the following linear system of equations

$$\begin{aligned} \begin{bmatrix} \varvec{{\mathcal {A}}}\cdot \varvec{{\mathcal {W}}}_1\\ \varvec{{\mathcal {A}}}\cdot \varvec{{\mathcal {W}}}_2\\ \vdots \\ \varvec{{\mathcal {A}}}\cdot \varvec{{\mathcal {W}}}_n \end{bmatrix} ={\varvec{M}} \begin{bmatrix} \partial _{I_1} U\\ \partial _{I_2} U\\ \vdots \\ \partial _{I_n} U \end{bmatrix},\qquad \left[ {\varvec{M}}\right] _{ij}=\varvec{{\mathcal {W}}}_i\cdot \varvec{{\mathcal {W}}}_j,\,\,\,\, i,j=\{1,\dots ,n\} \end{aligned}$$
(68)

Remark 5

By examining the algebraic system of equations presented in (68), it becomes feasible to ascertain the conditions under which a solution for \(\{\partial _{I_1}U,\dots ,\partial _{I_n}U\}\), representing the components of \(\varvec{{\mathcal {A}}}\) with respect to the basis \(\{\varvec{{\mathcal {W}}}_1,\dots ,\varvec{{\mathcal {W}}}_n\}\), can be derived. Notably, the solvability of (68) (i.e., the linear independence of \(\{\varvec{{\mathcal {W}}}_1,\dots ,\varvec{{\mathcal {W}}}_n\}\)) hinges on the determinant of the system, which must not vanish. Ill-conditioning of the equation system (68) can stem from several factors: identical principal stretches of deformation (found in both isotropic and transversely isotropic models), alignment of principal deformation directions with the preferred direction of transverse isotropy, or alignment of \(\varvec{D}_0\) with one of the principal directions of \(\varvec{F}\). To rectify this numerical issue, we propose a perturbation approach, introducing slight variations to the identical principal stretches and a misalignment of the coinciding principal direction with the preferred direction. This ensures the solvability of (68).
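The projection step of Eq. (68) reduces to assembling the Gram matrix of the basis and solving a small linear system. The sketch below uses randomly generated (hypothetical) basis vectors purely to illustrate the mechanics and the conditioning check discussed in Remark 5:

```python
import numpy as np

def invariant_derivatives(A, W):
    """Solve the projection system of Eq. (68): M x = b, with
    M_ij = W_i . W_j and b_i = A . W_i, giving x_i = dU/dI_i.

    A : concatenated vector [P-hat; E_0] (length 12 for 3D problems).
    W : array whose rows are the basis vectors W_i of Eq. (66).
    """
    W = np.asarray(W, float)
    M = W @ W.T                      # Gram matrix of the basis
    b = W @ np.asarray(A, float)     # projections A . W_i
    # A nearly singular M signals the degenerate cases of Remark 5
    # (repeated stretches, aligned principal directions, ...).
    if np.linalg.cond(M) > 1e12:
        raise np.linalg.LinAlgError("basis close to linear dependence")
    return np.linalg.solve(M, b)
```

When the conditioning check fails, the perturbation strategy of Remark 5 (slightly splitting repeated stretches and misaligning coincident directions) restores solvability.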

3.5.1 Noise regularisation

In cases of substantial training data and the incorporation of derivative information into the training strategy (e.g., in the context of gradient-enhanced Kriging), the correlation matrix \(\varvec{R}\) defined in Eq. (52) can become ill-conditioned. To mitigate this issue, a customary practice is to regularize the correlation matrix by augmenting it with a diagonal matrix as follows [19, 49, 66]:

$$\begin{aligned} \varvec{R}=\begin{bmatrix} \varvec{R}_{\varvec{UU}} + \varepsilon _1\varvec{I}_{m\times m} &{} \varvec{R}_{\varvec{U}\varvec{U}^{\prime }}\\ \varvec{R}_{\varvec{U}\varvec{U}^{\prime }}^T &{} \varvec{R}_{\varvec{U}^{\prime }\varvec{U}^{\prime }} + \varepsilon _2\varvec{I}_{m\cdot n\times m\cdot n} \end{bmatrix}, \varepsilon _1,\varepsilon _2\in {\mathbb {R}}^+ \end{aligned}$$
(69)

While our paper primarily highlights the interpolation properties of this technique, we consistently employ sufficiently small values of \(\varepsilon _1\) and \(\varepsilon _2\) to mitigate potential challenges. It is noteworthy, as elucidated in Remark 3, that Kriging and its gradient counterpart achieve interpolation when \(\varvec{R}\) remains unregularized, i.e., for \(\varepsilon _1=\varepsilon _2=0\). However, when \(\varepsilon _1\ne 0\) and \(\varepsilon _2\ne 0\), and in the extreme scenario of both parameters assuming larger values, Kriging transitions from an interpolation technique to a regression technique, thus enabling noisy data to be filtered.
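The effect of the nugget terms in Eq. (69) on conditioning can be illustrated with a deliberately degenerate example: two nearly coincident training points make the Gaussian correlation matrix almost singular, and a small \(\varepsilon \) on the diagonal restores a workable condition number (the numbers below are illustrative, not taken from the paper):

```python
import numpy as np

# Two nearly coincident training points make the Gaussian correlation
# matrix of Eq. (36) almost singular.
I = np.array([[0.0], [1e-5], [1.0]])
d = I[:, None, :] - I[None, :, :]
R = np.exp(-np.sum(1.0 * d**2, axis=2))   # theta = 1

# Nugget regularisation: the epsilon * I term of Eq. (69).
eps = 1e-6
R_reg = R + eps * np.eye(len(I))

print(np.linalg.cond(R), np.linalg.cond(R_reg))
```

The regularized matrix trades exact interpolation for a dramatic improvement in conditioning, which is precisely the interpolation-to-regression transition described above.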

To illustrate this technique we explore the performance of Gradient-Enhanced Kriging in a regression context, particularly when confronted with severely ill-conditioned correlation matrices arising from noise-contaminated data. To elucidate this aspect, we employ two designated training samples, denoted as:

  • Unperturbed sample: a training sample devoid of noise in the output variables, including the values of the energy \(\varPsi (\varvec{F})=U(I_1,I_2,I_3,I_5)\) and its derivatives \(\{\partial _{I_1}U,\partial _{I_2}U,\partial _{I_3}U,\partial _{I_5}U\}\). The ground-truth constitutive model from which these data have been generated in-silico corresponds with the Mooney–Rivlin/ideal dielectric model described in Appendix A.

  • Noisy sample: this training sample has been obtained by perturbing the unperturbed sample according to:

    $$\begin{aligned} \widetilde{U}=U + {\mathcal {N}}(0,\sigma _U);\qquad \widetilde{\partial _{I_i}U}=\partial _{I_i}U + {\mathcal {N}}(0,\sigma _{\partial _{I_i}U}),\quad i=\{1,2,3,5\} \end{aligned}$$
    (70)

    with

    $$\begin{aligned} \sigma _U=0.2\cdot \text {mean}(U);\qquad \sigma _{\partial _{I_i}U}=0.2\cdot \text {mean}({\partial _{I_i}U}),\quad i=\{1,2,3,5\} \end{aligned}$$
    (71)

For both datasets, Fig. 3 illustrates the performance of interpolation- and regression-based approaches. In the case of the unperturbed sample (see Fig. 3a and b), Kriging perfectly reproduces the training data points (represented by circles). Conversely, for the noisy sample, Kriging strives to replicate the perturbed and irregular data to the greatest extent possible. Discrepancies are observed at certain points, resulting from the ill-conditioning of the correlation matrix and the consequent loss of interpolation properties. Notably, the condition number of the matrix \(\varvec{R}\) increases substantially when dealing with the noisy sample, as illustrated in Fig. 3c. This observation aligns with expectations and raises concerns regarding the predictive accuracy of Kriging between training points, potentially leading to undesired oscillations.

Alternatively, we have explored a regression-based methodology, as detailed in [19, 49]. In this context, the regularization parameters \(\{\varepsilon _1,\varepsilon _2\}\) are treated as supplementary hyperparameters. Consequently, both sets of hyperparameters, namely \(\{\theta _1,\theta _2,\theta _3\}\) and \(\{\varepsilon _1,\varepsilon _2\}\), are optimized through the minimization of the reduced likelihood function \(\psi (\widetilde{\varvec{\theta }})\)

$$\begin{aligned} \widetilde{\varvec{\theta }}^*=\textrm{arg}\min _{\widetilde{\varvec{\theta }}}\ \ \psi (\widetilde{\varvec{\theta }}),\quad s.t. \ \ [\widetilde{\varvec{\theta }}]_i\ge 0\,\,\,\,i=\{1,2,\cdots ,5\} \end{aligned}$$
(72)

where the augmented set of hyperparameters is defined as \(\widetilde{\varvec{\theta }}=\{\theta _1,\theta _2,\theta _3,\varepsilon _1,\varepsilon _2\}\). Applying this approach only to the noisy sample yields the outcomes depicted in Fig. 3. The values of \(\{\varepsilon _1,\varepsilon _2\}\) are determined to strike a balance between the interpolation and regression properties of the Kriging response. Naturally, the response does not precisely match the noisy data, thereby avoiding the introduction of undesirable oscillations caused by data perturbations (see Fig. 3f).

Fig. 3

Performance of regression-based Gradient-Enhanced Kriging. a, b interpolation using an unperturbed training sample; c, d interpolation using a perturbed training sample; e, f Regression using a perturbed training sample

4 Calibration of Kriging and Gradient Enhanced Kriging predictors

4.1 Design of Experiments

In this section, we present the procedure used for generating synthetic data, utilizing a diverse set of ground truth constitutive models. The internal energy densities and material parameters for these models can be found in Appendix A. To acquire the dataset, we adhere to the procedure outlined in [34], extended to the coupled context of electromechanics. The deformation gradient tensor \(\varvec{F}\) is parameterized via a chosen set of deviatoric directions, amplitudes, and Jacobians (J, i.e., the determinant of \(\varvec{F}\)). The process of generating sample points for deviatoric directions, amplitudes, and Jacobians is elucidated in Algorithm 1. Similarly, the electric displacement \(\varvec{D}_0\) is also parametrised in terms of unitary directions and amplitudes. Concerning the deviatoric directions for \(\varvec{F}\), denoted as \(\varvec{V}_{\varvec{F}}\), we formulate them using a spherical parametrization in \({\mathbb {R}}^5\), precisely representing these directions using four pertinent angular measures (\(\phi _1, \phi _2, \phi _3\in [0,\pi ]\) and \(\phi _4\in [0,2\pi ]\)) within this 5-dimensional space. The directions employed for the parametrisation of \(\varvec{D}_0\), denoted as \(\varvec{V}_{\varvec{D}_0}\), are created using a spherical parametrization in \({\mathbb {R}}^3\), using as angular measures \((\theta ,\psi )\in [0,2\pi ]\times [0,\pi ]\), namely

Algorithm 1

Pseudo-code for sample generation

$$\begin{aligned} \varvec{V}_{\varvec{F}}^i = \begin{bmatrix} \cos {\phi _1^i} \\ \sin {\phi _1^i}\cos {\phi _2^i} \\ \sin {\phi _1^i}\sin {\phi _2^i}\cos {\phi _3^i} \\ \sin {\phi _1^i}\sin {\phi _2^i}\sin {\phi _3^i}\cos {\phi _4^i} \\ \sin {\phi _1^i}\sin {\phi _2^i}\sin {\phi _3^i}\sin {\phi _4^i} \end{bmatrix};\qquad \varvec{V}_{\varvec{D}_0}^i=\begin{bmatrix} \cos \theta ^i\sin \psi ^i\\ \sin \theta ^i\sin \psi ^i\\ \cos \psi ^i\end{bmatrix};\qquad 1\le i\le n_{\varvec{V}} \end{aligned}$$
(73)
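The spherical parametrizations of Eq. (73) can be sketched as follows (function names are ours); by construction, both outputs are unit vectors:

```python
import numpy as np

def direction_F(phi):
    """Unit deviatoric direction V_F in R^5 from the four angles of Eq. (73)."""
    p1, p2, p3, p4 = phi
    s1, s2, s3 = np.sin(p1), np.sin(p2), np.sin(p3)
    return np.array([np.cos(p1),
                     s1 * np.cos(p2),
                     s1 * s2 * np.cos(p3),
                     s1 * s2 * s3 * np.cos(p4),
                     s1 * s2 * s3 * np.sin(p4)])

def direction_D0(theta, psi):
    """Unit direction V_D0 in R^3 of Eq. (73)."""
    return np.array([np.cos(theta) * np.sin(psi),
                     np.sin(theta) * np.sin(psi),
                     np.cos(psi)])
```

The telescoping sine products guarantee \(\Vert \varvec{V}_{\varvec{F}}^i\Vert =\Vert \varvec{V}_{\varvec{D}_0}^i\Vert =1\) for any admissible angles.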

Once the sample is generated following Algorithm 1, the reconstruction of the deformation gradient tensor \(\varvec{F}\) and of \(\varvec{D}_0\) becomes possible at each of the sampling points. This reconstruction process is demonstrated in Algorithm 2, where \({\varvec{\Psi }}\) represents the basis for symmetric and traceless tensors (refer to Appendix B for details on \(\varvec{\Psi }\)).

Algorithm 2

Pseudo-code for construction of the set of deformation gradient tensors and electric displacement fields

Fig. 4

Evolution of the metric \(\hat{E}_{\varvec{P}}\) for both ordinary and Gradient Enhanced Kriging with the number of infill training points for: a Mooney–Rivlin/ideal dielectric (MR/ID); b rank-one laminate composite model (ROL)

4.2 Calibration and Validation

The synthetic data, generated as per Sect. 4.1, are used to calibrate the Kriging and Gradient Enhanced Kriging surrogates, following the principles in Sect. 3. To assess surrogate accuracy at non-observation points, evaluation points are generated following the same procedure as in Sect. 4.1; these points test model performance but do not take part in the calibration. For validation, a substantial set of 10,000 data points is used, dense enough to reliably assess the accuracy achieved with the much smaller calibration sets. This validation comprehensively evaluates the surrogate model's performance, verifying its reliability and generalizability.

The calibration and validation process has been carried out across a diverse range of constitutive models. These include: (a) Mooney–Rivlin/ideal dielectric model (MR/ID); (b) Arruda–Boyce/ideal dielectric (AB/ID) (see Reference [1]); (c) Gent/ideal dielectric (Gent/ID); (d) Quadratic Mooney–Rivlin/ideal dielectric (QMR/ID); (e) Yeoh/ideal dielectric (Yeoh/ID); (f) Rank-one laminate composite (ROL). Specific expressions for strain energy densities and material parameters are available in Appendix A. For each model, two training datasets are generated, containing \(N=\{45, 100\}\) training points, respectively. Kriging and Gradient Enhanced Kriging models are calibrated for all 6 ground truth models within each training set. Results include the coefficient of determination (\(R^2(\varvec{P})\) and \(R^2(\varvec{E}_0)\)) for the first Piola–Kirchhoff stress tensor \(\varvec{P}\) and the material electric field \(\varvec{E}_0\), and the values of \(\hat{E}_{\varvec{P}}\) and \(\hat{E}_{\varvec{E}_0}\), defined below

$$\begin{aligned} \hat{E}_{\varvec{P}} = \max _{i}\left( \frac{ \Vert {\varvec{P}^{An}}^i - {\varvec{P}^{Kr}}^i \Vert }{ \Vert {\varvec{P}^{An}}^i \Vert }\right) ; \qquad \hat{E}_{\varvec{E}_0} = \max _{i}\left( \frac{ \Vert {\varvec{E}_0^{An}}^i - {\varvec{E}_0^{Kr}}^i \Vert }{ \Vert {\varvec{E}_0^{An}}^i \Vert }\right) , \qquad i=\{1,\ldots ,n=10{,}000\}, \end{aligned}$$
(74)

are presented in Table 1 (for \(N=45\) training points) and Table 2 (for \(N=100\) training points). In Eq. (74), \(||\varvec{A}||\) denotes the Frobenius norm of \(\varvec{A}\), n is the number of experiments, and \({\varvec{P}^{An}}^i\) and \({\varvec{P}^{Kr}}^i\) represent the analytical and Kriging-predicted first Piola–Kirchhoff stress tensors, respectively. Similarly, \({\varvec{E}_0^{An}}^i\) and \({\varvec{E}_0^{Kr}}^i\) represent the analytical and Kriging-predicted material electric fields, respectively.
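A direct transcription of the worst-case metric in Eq. (74), together with the \(R^2\) score reported in the tables, might look as follows; the array layout (one \(3\times 3\) tensor per validation point) is an assumption for illustration.

```python
import numpy as np

def error_metrics(P_an, P_kr):
    """Worst-case relative error (Eq. (74)) and coefficient of
    determination R^2 between analytical and Kriging-predicted
    first Piola-Kirchhoff stress tensors.

    P_an, P_kr: arrays of shape (n, 3, 3), one tensor per validation point."""
    diff = np.linalg.norm(P_an - P_kr, axis=(1, 2))   # Frobenius norm per point
    ref = np.linalg.norm(P_an, axis=(1, 2))
    E_hat = np.max(diff / ref)                        # Eq. (74)

    # R^2 computed over the flattened tensor components
    res = np.sum((P_an - P_kr) ** 2)
    tot = np.sum((P_an - P_an.mean(axis=0)) ** 2)
    return E_hat, 1.0 - res / tot
```

The same two-line pattern applies verbatim to \(\hat{E}_{\varvec{E}_0}\) with vector-valued inputs.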

Table 1 \(R^2(\varvec{P})\), \(R^2(\varvec{E}_0)\), \(\hat{E}_{\varvec{P}}\) and \(\hat{E}_{\varvec{E}_0}\) for all six models for number of training points \(N=45\), for both Kriging and Gradient Enhanced Kriging
Table 2 \(R^2(\varvec{P})\), \(R^2(\varvec{E}_0)\), \(\hat{E}_{\varvec{P}}\) and \(\hat{E}_{\varvec{E}_0}\) for all six models for number of training points \(N=100\), for both Kriging and Gradient Enhanced Kriging

The findings of the analysis, as presented in Tables 1 and 2, offer insights into the performance of Kriging and Gradient Enhanced Kriging techniques. In both tables, the achieved \(R^2(\varvec{P})\) and \(R^2(\varvec{E}_0)\) values are notably high, approaching unity, signifying an impressive level of accuracy in predicting the first Piola–Kirchhoff stress tensor. This is true for all the models except for the rank-one laminate composite material, where the performance of the ordinary Kriging approach is extremely poor. Furthermore, under the consideration of the alternative metrics \(\hat{E}_{\varvec{P}}\) and \(\hat{E}_{\varvec{E}_0}\), Gradient Enhanced Kriging demonstrates a significantly superior accuracy, consistently yielding values approximately an order of magnitude smaller compared to the Kriging counterpart.

For comprehensive understanding, Fig. 4 depicts the evolution of the \(\hat{E}_{\varvec{P}}\) metric for both conventional and Gradient Enhanced Kriging methodologies. This illustration pertains to two specific constitutive models considered in Tables 1 and 2, namely the Mooney–Rivlin/ideal dielectric model and the rank-one laminate composite model. Notably, as the number of training points increases, the Gradient Enhanced technique adeptly diminishes this metric, substantiating its efficacy. On the contrary, the ordinary Kriging method is incapable of decreasing the metric \(\hat{E}_{\varvec{P}}\) as the number of infill points increases.

These observations emphasize the distinct advantage of adopting the Gradient Enhanced technique, as it facilitates precise predictions of the first Piola–Kirchhoff stress tensor \(\varvec{P}\) and of the material electric field \(\varvec{E}_0\) even when operating with a considerably small number of training points. This characteristic positions Gradient Enhanced Kriging as an exceedingly expedient and efficacious alternative in comparison to the conventional Kriging methodology.

Remark 6

Appendix A contains the material parameters used in each of the constitutive models (MR/ID, AB/ID, Gent/ID, QMR/ID, Yeoh/ID, ROL) considered for calibration of their respective Kriging and Gradient Kriging predictors. Notice that the values of these material parameters do not correspond with those of typical dielectric elastomers such as the VHB 4910 by 3M. It is important to emphasize that our Kriging and Gradient Kriging predictors are flexible enough to deal with any realistic value of the material constants, and in particular, those typical of the popular VHB 4910. These materials exhibit a large disparity between the values of mechanical and electrical constants. For instance, the shear modulus \(\mu \) and electric permittivity \(\varepsilon \) of the VHB 4910 material [26] take values of approximately \(\mu \approx 10^3-10^5\) (Pa) and \(\varepsilon \approx 10^{-11}-10^{-12}\) (F/m).

The enormous gap between the material constants of both physics (i.e. mechanics and electric physics) can ultimately pose challenges for the accurate calibration of the Kriging predictor (i.e. yielding ill-conditioning of the correlation matrix \(\varvec{R}\) (36)) or of any other type of machine learning technique. In order to remedy this, instead of considering the data generated by the model with such material constants, and in particular, the first Piola-Kirchhoff stress tensor \(\varvec{P}\), the material electric field \(\varvec{E}_0\) and the material electric displacement \(\varvec{D}_0\), we can alternatively perform the calibration with their dimensionless counterparts, \(\widetilde{\varvec{P}}\), \(\widetilde{\varvec{E}}_0\) and \(\widetilde{\varvec{D}}_0\) (notice that the deformation gradient tensor \(\varvec{F}\) is already dimensionless), respectively, defined as

$$\begin{aligned} \widetilde{\varvec{P}}=\frac{\varvec{P}}{\mu },\qquad \widetilde{\varvec{E}}_0=\sqrt{\frac{\varepsilon }{\mu }}\varvec{E}_0,\qquad \widetilde{\varvec{D}}_0=\frac{\varvec{D}_0}{\sqrt{\varepsilon \mu }}. \end{aligned}$$
(75)
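For illustration, with VHB-4910-like orders of magnitude the rescaling of Eq. (75) brings all fields to comparable scales. The specific numerical values below are assumptions chosen within the ranges quoted in Remark 6, not measured constants.

```python
import numpy as np

# Orders of magnitude quoted in Remark 6 for VHB 4910 (assumed values)
mu = 1.0e4     # shear modulus (Pa)
eps = 1.0e-11  # electric permittivity (F/m)

def nondimensionalise(P, E0, D0, mu, eps):
    """Dimensionless fields of Eq. (75); the deformation gradient F
    is already dimensionless and needs no rescaling."""
    return P / mu, np.sqrt(eps / mu) * E0, D0 / np.sqrt(eps * mu)
```

A stress of a few multiples of \(\mu \) and an electric field of the order of 10 MV/m both map to \(O(1)\) dimensionless quantities, which is precisely what keeps the correlation matrix \(\varvec{R}\) well conditioned.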

Remark 7

With regards to the rank-one laminate material, in our previous publications (see Reference [42]), we demonstrated that whenever each of the phases a and b of the rank-one laminate complies with the polyconvexity condition (15), and therefore with the ellipticity condition (10), the solvability of \(\varvec{\alpha }\) and \(\varvec{\beta }\) in (29) is always guaranteed. This entails that at the microscopic level there is no apparent difficulty. However, the homogenised response of phases which are individually elliptic does not necessarily inherit this desirable property, and can indeed exhibit loss of ellipticity. We have not excluded this situation from the calibration of the Kriging and Gradient Kriging predictors, and in fact, some points within the generated data violate the ellipticity condition (at the macroscopic level). Despite this, the predictors can handle these situations.
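A numerical ellipticity check of the kind alluded to here can be sketched by scanning the acoustic tensor over unit directions. The sketch below treats only the purely mechanical Legendre–Hadamard condition with a fourth-order elasticity tensor \(\mathcal {C}_{iAjB}\); the coupled electromechanical condition (10) involves additional terms, and the isotropic test tensor used in the usage example is purely illustrative.

```python
import numpy as np

def min_acoustic_eig(C, n_dirs=500, rng=0):
    """Numerical Legendre-Hadamard (ellipticity) probe: smallest
    eigenvalue of the acoustic tensor Q_ij(N) = C_{iAjB} N_A N_B
    over random unit directions N.  A negative return value signals
    loss of ellipticity.  C has shape (3, 3, 3, 3), indexed (i, A, j, B)."""
    rng = np.random.default_rng(rng)
    N = rng.normal(size=(n_dirs, 3))
    N /= np.linalg.norm(N, axis=1, keepdims=True)
    worst = np.inf
    for n in N:
        Q = np.einsum('iajb,a,b->ij', C, n, n)   # acoustic tensor for direction n
        worst = min(worst, np.linalg.eigvalsh(Q).min())
    return worst
```

For an isotropic tensor \(\mathcal {C}_{iAjB}=\lambda \delta _{iA}\delta _{jB}+\mu (\delta _{ij}\delta _{AB}+\delta _{iB}\delta _{jA})\) with \(\lambda ,\mu >0\), the acoustic tensor is \(\mu \varvec{I}+(\lambda +\mu )\varvec{n}\otimes \varvec{n}\) and the probe returns \(\mu \), confirming ellipticity.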

5 Numerical three-dimensional examples

The analysis in Sect. 4 strongly supports the superiority of gradient enhanced Kriging over its energy-only counterpart, which lacks first derivatives. Inspired by these promising results, the primary objective of this section entails the seamless integration of gradient enhanced Kriging models into an in-house Finite Element computational framework. This assimilation endeavors to establish the accuracy and efficacy of these metamodels through meticulous juxtaposition with the Finite Element solutions provided by their respective ground truth counterparts. Specifically, this evaluation embraces intricate and exacting scenarios including complex bending and wrinkling, thus furnishing a robust appraisal of the metamodels’ performance within demanding contexts.

5.1 Electrically induced bending example: isotropic ground truth model

The inaugural exemplification within the Finite Element domain revolves around a cantilever beam configuration, as illustrated in Fig. 5. The geometric attributes and boundary conditions underpinning this scenario are succinctly elucidated in Fig. 5. Pertaining to the discretization framework, tri-quadratic Q2 Finite Elements have been judiciously employed to effectuate the interpolation of the displacement field.

Fig. 5

Electrically induced actuation. Geometry and boundary conditions. Beam fixed at \(X_1=0\). \(\{a,b,c\}=\{120,10,1\}\) (mm)

In this illustrative instance, we examine a Mooney–Rivlin/ideal dielectric model (as expressed in Eq. (76)) as the ground truth internal energy density. We adopt the material parameters specified in Table 3. Upon subjecting the system to a voltage differential \(\Delta V\sqrt{\varepsilon /\mu _1}=0.5\) (see Table 3 for the value of \(\mu _1\)), the ensuing deformation is depicted in Fig. 6a and b for the ground truth Mooney–Rivlin model and its gradient enhanced Kriging counterpart, respectively. Evident congruity emerges between both figures. This congruence is also manifest in the contour plot of \(\sigma _{13}\), where \(\varvec{\sigma }\) denotes the Cauchy stress tensor, i.e., \(\varvec{\sigma }=J^{-1}\varvec{P}\varvec{F}^T\). These consistent parallels serve to affirm the precision and robustness of the gradient enhanced Kriging models, signifying their potential in effectively capturing the intricate behavior of the underlying physical systems.
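The push-forward used for the contour plots, \(\varvec{\sigma }=J^{-1}\varvec{P}\varvec{F}^T\), is a one-liner; the sketch below assumes single \(3\times 3\) arrays for \(\varvec{P}\) and \(\varvec{F}\).

```python
import numpy as np

def cauchy_stress(P, F):
    """Cauchy stress from the first Piola-Kirchhoff stress:
    sigma = J^{-1} P F^T, with J = det F the volume ratio."""
    return (P @ F.T) / np.linalg.det(F)
```

At the reference configuration (\(\varvec{F}=\varvec{I}\), \(J=1\)) the two stress measures coincide, which is the sanity check used when post-processing the Finite Element results.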

Fig. 6

Electrically induced actuation: a Mooney–Rivlin ground truth model; b isotropic Gradient Enhanced Kriging counterpart. Contour plot distribution of \(\sigma _{13}/\mu _1\) (see Table 3 for the value of \(\mu _1\))

5.2 Complex electrically induced bending example: rank one laminate ground truth model

The second example considers the same cantilever beam as in the preceding section, subjected to a more complex set of boundary conditions for the electric potential \(\varphi \). This can be seen in Fig. 7. Pertaining to the discretization framework, tri-quadratic Q2 Finite Elements have been judiciously employed to effectuate the interpolation of the displacement field.

Fig. 7

Complex electrically induced actuation. Geometry and electrical boundary conditions. Displacements fixed at \(X_1=0\). \(\{a,b,c\}=\{120,10,1\}\) (mm). Electrodes highlighted in green (two regions on the lowest surface across the thickness and one region on the top surface) and in red (one region on the mid surface across the thickness)

In this instance, we consider a more complex constitutive model than in the preceding section, where an isotropic ground truth model was selected. Our present investigation is directed towards a rank-one laminate composite ground truth model. The homogenized internal energy governing this model is encapsulated within Eq. (81), while the associated material parameters are cataloged in Table 9. Upon the imposition of a voltage gradient \(\Delta V\sqrt{\varepsilon ^a/\mu _1^a}=2.5\) across the electrodes (see Table 9 for the values of \(\mu _1^a\) and \(\varepsilon ^a\)), the intricate phenomenon of electrically induced bending is explored for both the ground truth and Gradient Enhanced Kriging models. The outcomes of this analysis are presented in Fig. 8, effectively showcasing the marked concordance evident in the electrically induced deformations, as well as the alignment in stress distributions, between the two models.

Fig. 8

Complex electrically induced actuation. Contour plot distribution of \(\sigma _{13}/\mu _1^a\) for various values of \(\Delta V\sqrt{\mu _1^a/\varepsilon ^a}\) (see Table 9 for the values of \(\mu _1^a\) and \(\varepsilon ^a\)): a Rank-one laminate composite ground truth model; b Transversely isotropic Gradient Enhanced Kriging counterpart

For a more comprehensive evaluation of the electrically induced deformation in the context of both models (the rank-one laminate composite ground truth model and its corresponding counterpart developed using the Gradient Enhanced Kriging approach), an enhanced comparative perspective is available in Fig. 9. This representation serves to underscore the notable concurrence observed between the deformation predictions of the two models.

5.3 Electrically induced wrinkles: rank one laminate ground truth model

The last example considers the geometry and boundary conditions shown in Fig. 10. This example has been previously analysed in other works by the authors [43, 44]. The square plate is completely fixed along its borders. The voltage is grounded at the maximum value of coordinate \(X_3\), whilst a surface charge \(\omega _0=220/\sqrt{\mu ^a_1\varepsilon ^a}\) (Q\(\cdot \text {mm}^{-2} \)) (see Table 9 for the values of \(\mu _1^a\) and \(\varepsilon ^a\)) is applied at the minimum value of coordinate \(X_3\). The Finite Element discretisation considers Q2 (tri-quadratic) hexahedral Finite Elements with \(80\times 80\times 2\) elements in the \(X_1\), \(X_2\) and \(X_3\) directions.

Fig. 9

Complex electrically induced actuation for various values of \(\Delta V\). Rank-one laminate composite ground truth model represented by meshed domain. Gradient Enhanced Kriging counterpart represented by magenta domain

Fig. 10

Electrically induced wrinkles. Geometry and boundary conditions. Square plate with side 0.06 (m) and thickness 1 (mm). Maximum applied surface charge \(\omega _0=20/\sqrt{\mu _1^a\varepsilon ^a}\) (Q\(\cdot \text {mm}^{-2} \)) (see Table 9 for the values of \(\mu _1^a\) and \(\varepsilon ^a\)). Volumetric force of value \(9.8\times 10^{-2}\) (N/\(\hbox {m}^3\)) acting along the \(X_3\) axis (in the positive direction)

The primary objective of this illustrative instance is to assess the precision of the Gradient Enhanced Kriging model within scenarios characterized by the emergence of wrinkles induced through electrical stimuli. In a specific context, our focus centers on employing the rank-one laminate model, as defined by Eq. (81), as the baseline model representing ground truth.

The pertinent material parameters integral to this model can be found in Table 9. Upon the application of an electric charge denoted as \(\omega _0\), the progression of electrically induced wrinkles is portrayed in Fig. 11, these predictions being furnished by the Gradient Enhanced Kriging model across a spectrum of escalating \(\omega _0\) values. In addition, this figure represents the evolution of the displacement along the Z or \(X_3\) direction of points A and B (see Fig. 11) predicted by both the ground truth and Gradient Enhanced Kriging (Emulator) models, showing a clear similarity between the paths of both models.

Fig. 11

Electrically induced wrinkles. Wrinkling patterns for various values of surface charge \(\Lambda \omega _0\), with \(\Lambda \) the load factor. Top row: results predicted by the transversely isotropic Gradient Enhanced Kriging model, calibrated against the rank-one laminate composite ground truth model. The graphics represent the evolution of the displacement along the Z or \(X_3\) direction of points A and B predicted by both the ground truth and Gradient Enhanced Kriging (Emulator) models

Crucially, Fig. 12 offers a side-by-side comparison of the wrinkles projected by the rank-one laminate composite model, serving as the veritable benchmark, and its concomitant representation through the Gradient Enhanced Kriging methodology. Evidently, the congruence between the patterns of electrically induced wrinkling is remarkable, further corroborated by the similarity observed in the distribution of stress patterns as depicted in the contour plots of both models.

Fig. 12

Electrically induced wrinkles. Contour plot distribution of \(\sigma _{13}/\mu _1^a\) (see Table 9 for the value of \(\mu _1^a\)) for \(\Lambda =0.1\). a Rank-one laminate composite ground truth model; b transversely isotropic Gradient Enhanced Kriging model

6 Concluding Remarks

This manuscript introduced an innovative metamodelling technique that leverages gradient-enhanced Gaussian Process Regression or Kriging to emulate a diverse range of internal energy densities. The methodology seamlessly incorporates principal invariants as input variables for the surrogate internal energy density, thereby enforcing crucial physical constraints such as material frame indifference and symmetry. This advancement has facilitated precise interpolation not only of energy values, but also their derivatives, including the first Piola–Kirchhoff stress tensor and material electric field. Furthermore, it ensures stress and electric field-free conditions at the origin, a challenge typically encountered when employing regression-based methodologies like neural networks.

The research has indicated the inadequacy of utilizing invariants derived from the dual potential of the internal energy density, particularly the free energy density. The inherent saddle point nature of the latter diverges from the convex nature of the internal energy density, engendering complexities for models based on GPR or Gradient Enhanced GPR that rely on invariants of \(\varvec{F}\) and \(\varvec{E}_0\) (free energy-based GPR). This is contrasted with models formulated using \(\varvec{F}\) and \(\varvec{D}_0\) (internal energy-based GPR).

Table 3 Material parameters used with the Mooney–Rivlin/ideal dielectric model

Numerical examples within a 3D Finite Element framework have been thoughtfully incorporated, rigorously assessing the accuracy of surrogate models across intricate scenarios. The comprehensive analysis juxtaposing displacement and stress fields with ground-truth analytical models encompasses situations involving extreme bending and electrically induced wrinkles, thus showcasing the utility and accuracy of the proposed approach.