1 Introduction

Since the discovery of their unique properties, shape memory alloys have been used in numerous medical and engineering applications. Their widespread use stems from features such as the shape memory effect and superelasticity. Both effects emerge from the characteristic first-order phase transition from the austenite to the martensite state and vice versa. While the underlying crystallographic effects, including the detwinning of the martensite phase that enables the shape memory effect, are well understood, the thermomechanical modeling of these effects is not straightforward. In recent years, considerable effort has been made to improve shape memory alloy models and to obtain fitting models for specific use cases (for reviews, see Section 4 of Lester et al. [27] or Cisse et al. [8]). These models can be roughly categorized into three classes: models based on statistical thermodynamics, models founded in micromechanics and phenomenological models.

The models based on statistical thermodynamics rely on finding the phase equilibrium through the minimization of a three-well potential energy (e.g., Seelecke and Müller [43] or Govindjee et al. [16]). Since these models yield results that also consider the microstructure of the materials, they usually come with a computational cost which is too high for large structural simulations. Additionally, gathering the required micromechanical material parameters is sometimes an elaborate task.

On the other hand, models based on micromechanics usually consider the mechanics of shape memory alloy (SMA) single crystals. Many models are then extended to polycrystals by means of homogenization techniques (e.g., see Patoor et al. [35] and Lagoudas et al. [25] or the more recent models by Mirzaeifar et al. [29] and Yu et al. [55]). While these models capture the underlying phenomena of shape memory alloys, this advantage again comes with a high computational cost, which makes it challenging to use them in large structural simulations of shape memory alloy actuators.

The third group of models is the class of phenomenological models. Usually, they come with the advantage of only having macroscopic material parameters, which are mostly obtained through tensile tests at different temperatures. In recent years, due to the plethora of shape memory alloy effects and applications, many new shape memory alloy models have been published, which try to include more and more physical phenomena. They can be divided into two subgroups: models which include a geometrically linear theory (see, e.g., Auricchio et al. [3], Auricchio et al. [4] and Sedlak et al. [42]), and models which include a geometrically nonlinear theory.

The two main ways to include a geometric nonlinearity in a shape memory alloy model are to either employ a multiplicative split of the deformation gradient going back to [10, 24, 26] or to make use of an additive split of the rate of deformation (see Nemat-Nasser [32]), both of which are well known from plasticity. While models based on the additive split are computationally enticing, they only allow for small strains, although they capture large rotations well (see, e.g., Qidwai and Lagoudas [37], Müller and Bruhns [31] or Zhang and Baxevanis [57]). On the other hand, models employing a multiplicative split are computationally more elaborate, but can represent finite stretches well (see, e.g., Reese and Christ [38], Arghavani et al. [1] or Wang et al. [49]). Additionally, there exist many new models considering geometric nonlinearities, which are limited to superelasticity (see, e.g., Bellini et al. [5], Wang et al. [48] and Rezaee-Hajidehi et al. [39]). Furthermore, some models also consider transformation-induced plasticity (see, e.g., Hartl et al. [19], Xu et al. [53] and its extension to partial phase transformations by Scalet et al. [41]).

Many of these models capture thermomechanics, martensite reorientation and detwinning as well as superelasticity at different temperatures or allow for different elastic properties of the materials while being numerically efficient and robust. However, when trying to model bistable shape memory actuators (see the actuator design in Arivanandhan [2]), one needs a numerically robust, fully thermomechanically coupled finite strain model that also models thermal expansion and volumetric effects during phase transition (see, e.g., Potapov et al. [36]). To our knowledge, there is no model in the literature fulfilling all of the aforementioned requirements, which led to the model described in this paper. For example, the model of Wang et al. [49] considers a thermomechanically coupled finite strain theory which is capable of modeling the shape memory effect as well as superelasticity, but neglects volumetric effects due to transformation as well as different expansion coefficients of the SMA phases. The model at hand falls into the aforementioned category of phenomenological models. It is embedded into the generalized standard materials framework developed by Halphen and Nguyen [17], which was extended to thermomechanics by Yang et al. [54] and which makes it possible to ensure thermodynamic consistency. The energies as well as the dissipation potential can be seen as an extension of Sedlak et al. [42] to the finite strain case. Because satisfying the incompressibility of inelastic strains is not as straightforward for finite strains as for the small strain case, a projection method developed for plasticity is incorporated into the model (see Hurtado et al. [21] and Sielenkämper et al. [44]). Further, due to the character of the energies used in this model, special numerical treatment is necessary to solve the model equations using a Newton scheme. Since our aim is to model microactuators in which the R-phase is not present, it is not incorporated into the model. Additionally, tension-compression anisotropy, which is an important effect in many shape memory alloys (for experimental publications see, e.g., Gall et al. [15] or Wang and Zhu [47], and for modeling approaches see, e.g., Zaki et al. [56] or Sedlak et al. [42]), is not included in the model.

In the past, numerous advancements to current microelectromechanical systems (MEMS) based on electrostatics, magnetism and electrothermal principles have been made. For example, Hoffmann et al. [20] proposed a microactuator based on electrothermal activation making use of bimetal effects. This, however, comes with the downside of low actuation frequencies and a high power consumption. Han et al. [18] proposed an electrostatic actuator-based micro-switch for photonics. Devices based on electrostatics can usually be actuated at high frequencies and are adaptable to many applications. Devices using optomechanics were developed by, e.g., Eichenfield et al. [11]; they come with a rather small tuning range, but allow for a very high operation speed [9]. Despite their unique advantages, current MEMS devices are challenging to use in downsized applications, where a high work output combined with a high power efficiency and bistability is crucial. The microactuator modeled in this paper is based on a concept published by Winzek et al. [50], which utilizes a high temperature shape memory alloy with a large thermal hysteresis. This concept may overcome the aforementioned weaknesses of other actuation principles, as shape memory alloys usually have a large work output density and favorable downscaling capabilities (see, e.g., Kohl [22]). Additionally, downscaling this design is expected to drastically increase the possible actuation frequency in comparison to other electrothermally activated actuators due to the decreasing masses and increasing thermal gradients. One further key aspect is that the proposed actuator design requires no power in the stable states.

The paper is structured as follows: First, in Sect. 2, the energies as well as the dissipation potential are derived. Then, in Sect. 3, numerical strategies necessary to solve the model equations are discussed. The numerical results are subsequently shown in Sect. 4 before a summary and outlook conclude the paper in Sect. 5.

Notation

Throughout this paper, a direct tensor notation is preferred. Scalars and scalar-valued functions are typeset using light-face italic characters, e.g., a or A. First- and second-order tensors and tensor-valued functions are represented by bold-face italic letters, e.g., \(\varvec{a}\) or \(\varvec{A}\). Further, blackboard bold-faced letters are used to denote fourth-order tensors, e.g., \(\mathbb {A}\).

Additionally, the transpose of a second-order tensor \(\varvec{A}\) is designated by \({\varvec{A}}^\mathsf{T}\); the major transpose of a fourth-order tensor is denoted analogously. The symmetric and deviatoric part of a second-order tensor \(\varvec{A}\) are denoted by \(\text {sym}( \varvec{A} ) = \frac{1}{2}( \varvec{A}+ {\varvec{A}}^\mathsf{T})\) and \(\varvec{A}^\prime = \varvec{A}- \frac{1}{3}\text {tr}( \varvec{A} )\varvec{I}\), respectively. Here, \(\varvec{I}\) is the second-order identity tensor and \(\text {tr}( \varvec{A} )\) denotes the trace of \(\varvec{A}\). A double contraction of two tensors \(\varvec{A}\) and \(\varvec{B}\) is denoted by \(\varvec{A}:\varvec{B}\), while the dyadic product is denoted by \(\varvec{a}\otimes \varvec{b}\). The determinant of a tensor \(\varvec{A}\) is designated either by \(\text {det}( \varvec{A} )\) or by \(\varvec{A}\)’s third invariant.
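The tensor operations introduced above can be mirrored numerically. The following is a minimal sketch using NumPy; the function names `sym`, `dev` and `ddot` as well as the sample values are assumptions chosen for illustration and are not part of the paper.

```python
import numpy as np

def sym(A):
    """Symmetric part: sym(A) = (A + A^T) / 2."""
    return 0.5 * (A + A.T)

def dev(A):
    """Deviatoric part: A' = A - tr(A)/3 * I."""
    return A - np.trace(A) / 3.0 * np.eye(3)

def ddot(A, B):
    """Double contraction A : B = A_ij B_ij."""
    return float(np.tensordot(A, B, axes=2))

# Arbitrary sample data (illustrative only).
A = np.array([[2.0, 1.0, 0.0],
              [3.0, 4.0, 5.0],
              [0.0, 1.0, 6.0]])
a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, -1.0])

dyad = np.outer(a, b)          # dyadic product a (x) b
trace_A = ddot(A, np.eye(3))   # A : I equals tr(A)
```

By construction, the deviatoric part is trace-free and the double contraction with the identity returns the trace.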

2 Modeling of shape memory alloys

2.1 Kinematics

The deformation gradient \(\varvec{F}\) maps a line element from the reference configuration of a body with volume \(V_0\) into the current configuration of the body with volume V and is defined as

$$\begin{aligned} \varvec{F}= \text {Grad\!}\left( \varvec{x}(\varvec{X},t) \right) \!, \end{aligned}$$
(1)

where \(\text {Grad\!}\left( \bullet \right) \) refers to the gradient with respect to the reference configuration while \(\varvec{X}\) and \(\varvec{x}\) are the position vectors of a material point in the reference and current configuration, respectively. We consider a multiplicative split of the deformation gradient in the form

$$\begin{aligned} \varvec{F}= \varvec{F}^{\text {e}}\varvec{F}^{\text {i}}\varvec{F}^{\theta }\end{aligned}$$
(2)

(see Wang et al. [49]), where \(\varvec{F}^{\text {e}}\) is the elastic part, \(\varvec{F}^{\text {i}}\) the isochoric part of the deformation due to transformation (Footnote 1) and \(\varvec{F}^{\theta }\) the part of the deformation gradient which describes the volume change due to thermal expansion and transformation (Footnote 2). This is similar to the multiplicative split in plasticity going back to the works of Eckart [10], Kröner [24] and Lee [26]. Further, we define the determinant \(J^{\theta }= J^{\theta }(\theta ,\xi ) = \text {det}( \varvec{F}^{\theta } )\) to be a function of the absolute temperature \(\theta \) and the martensite volume fraction \(\xi \in [0,1]\). Since the thermal deformation is assumed to be volumetric, we can express \(\varvec{F}^{\theta }\) in terms of \(J^{\theta }\) as

$$\begin{aligned} \varvec{F}^{\theta }= (J^{\theta })^{\frac{1}{3}}\varvec{I}. \end{aligned}$$
(3)

As is commonly done and will be useful later, we define the elastic and inelastic left Cauchy-Green tensors as

$$\begin{aligned} \varvec{b}^{\text {e}}= \varvec{F}^{\text {e}}\varvec{F}^{\text {e} \mathsf T}, \ \ \varvec{b}^{\text {i}}= \varvec{F}^{\text {i}}\varvec{F}^{\text {i} \mathsf T}. \end{aligned}$$
(4)

Likewise, we define the inelastic right Cauchy-Green tensor and the inelastic Green-Lagrange strain

$$\begin{aligned} \varvec{C}^{\text {i}}= \varvec{F}^{\text {i} \mathsf T}\varvec{F}^{\text {i}}, \ \ \varvec{E}^{\text {i}}= \frac{1}{2}(\varvec{C}^{\text {i}}- \varvec{I}). \end{aligned}$$
(5)

Motivated by the observation that the shape memory effect is caused by an almost (Footnote 3) volume-preserving transformation of the crystal lattice, we assume \(\varvec{C}^{\text {i}}\) to be volume preserving. Therefore, because \(\text {det}( \varvec{F}^{\text {i}} )= 1\) has to hold, the determinant of the deformation gradient is given by

$$\begin{aligned} J = \text {det}( \varvec{F} ) = J^{\text {e}}J^{\theta }\end{aligned}$$
(6)

with \(J^{\text {e}}=\text {det}( \varvec{F}^{\text {e}} )\). Additionally, we define the velocity gradient \(\varvec{l}\) and its symmetric part \(\varvec{d}\) as

$$\begin{aligned} \varvec{l}= \dot{\varvec{F}}\varvec{F}^{\text {-1}}, \ \ \varvec{d}= \text {sym}( \varvec{l} ). \end{aligned}$$
(7)

Finally, we define the inelastic ’velocity gradient’ \(\varvec{L}^{\text {i}}\) and its symmetric part \(\varvec{D}^{\text {i}}\) in analogy to \(\varvec{l}\) and \(\varvec{d}\) by

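$$\begin{aligned} \varvec{L}^{\text {i}}= \dot{\varvec{F}}^{\text {i}}\varvec{F}^{\text {i} -1}, \ \ \varvec{D}^{\text {i}}= \text {sym}( \varvec{L}^{\text {i}} ). \end{aligned}$$
(8)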
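The kinematic quantities of this section can be checked with a small numerical sketch. All numerical values below are hypothetical and serve only to illustrate the multiplicative split in Eq. (2) and the volumetric thermal part in Eq. (3); since the determinant is multiplicative and \(\text {det}( \varvec{F}^{\text {i}} ) = 1\), the total volume ratio reduces to the product of the elastic and thermal volume ratios.

```python
import numpy as np

def thermal_part(J_theta):
    """Volumetric thermal part F^theta = (J^theta)^(1/3) I, Eq. (3)."""
    return J_theta ** (1.0 / 3.0) * np.eye(3)

# Hypothetical sample values, chosen only for illustration.
F_e = np.array([[1.02, 0.01, 0.00],
                [0.00, 0.99, 0.00],
                [0.00, 0.00, 1.00]])

# Isochoric inelastic part: the principal stretches multiply to one.
lam = 1.05
F_i = np.diag([lam, 1.0 / np.sqrt(lam), 1.0 / np.sqrt(lam)])

J_theta = 1.003
F_theta = thermal_part(J_theta)

F = F_e @ F_i @ F_theta            # multiplicative split, Eq. (2)

b_e = F_e @ F_e.T                  # elastic left Cauchy-Green tensor, Eq. (4)
C_i = F_i.T @ F_i                  # inelastic right Cauchy-Green tensor, Eq. (5)
E_i = 0.5 * (C_i - np.eye(3))      # inelastic Green-Lagrange strain, Eq. (5)

J = np.linalg.det(F)
J_e = np.linalg.det(F_e)
```

Because \(\varvec{F}^{\theta }\) is a multiple of the identity, it commutes with the other factors, so its position in the product does not affect \(\varvec{F}\).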

2.2 (Im-)Balance equations

2.2.1 Momentum balances

The quasistatic linear momentum balance for a body with volume V in the current configuration is given by

$$\begin{aligned} \text {div}\!\left( {\varvec{\sigma }} \right) + \rho \varvec{b}= \varvec{0}\ \ \text {in}\ \ V, \end{aligned}$$
(9)

where \(\varvec{b}\) is the body force and \(\rho \) is the mass density. Further, Cauchy’s lemma \(\varvec{t}= {\varvec{\sigma }}\varvec{n}\) as well as the angular momentum balance \({\varvec{\sigma }}= {{\varvec{\sigma }}}^\mathsf{T}\) are assumed to hold.

2.2.2 Energy balance

Next, the energy balance is given by

$$\begin{aligned} \dot{u} = {\varvec{\tau }}: \varvec{d}- J \text {div}\!\left( \varvec{q} \right) + w, \end{aligned}$$
(10)

where u is the internal energy density per unit reference volume, \({\varvec{\tau }}\) denotes the Kirchhoff stress tensor, \(\varvec{q}\) is the heat flux vector in the current configuration and w represents the energy source term. Further, the divergence of the heat flux vector \(\varvec{Q}= J \varvec{F}^{\text {-1}}\varvec{q}\) with respect to the reference configuration is then given by \(J \text {div}\!\left( \varvec{q} \right) = \text {Div\!}\left( \varvec{Q} \right) \).

2.2.3 Clausius–Planck inequality

To be thermodynamically consistent, the Clausius–Planck inequality

$$\begin{aligned} \mathcal {D} = {\varvec{\tau }}: \varvec{d}- \dot{u} + \theta \dot{s} \ge 0, \end{aligned}$$
(11)

where \(\mathcal {D}\) is the mechanical dissipation density per unit reference volume and s is the entropy density per unit reference volume, has to be fulfilled at any time. Further, the Helmholtz free energy, which is defined by \(\psi = u - \theta s\), is introduced. We assume \(\psi \) to be a function of the deformation gradient \(\varvec{F}\), the absolute temperature \(\theta \) and the internal variables \(\xi \) and \(\varvec{C}^{\text {i}}\), i.e., \(\psi = \psi (\varvec{F},\varvec{C}^{\text {i}},\xi ,\theta )\). Furthermore, we make the common assumption that

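$$\begin{aligned} \left. \left( {\varvec{\tau }}: \varvec{d}- \dot{\psi } - s \dot{\theta } \right) \right| _{\dot{\xi } = 0,\ \dot{\varvec{C}}^{\text {i}}= \varvec{0}} = 0, \end{aligned}$$
(12)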

i.e., there is no energy dissipation when the internal variables are virtually fixed. As Eq. (12) must hold for arbitrary processes, one can show the following standard results:

$$\begin{aligned} {\varvec{\tau }}= \frac{\partial \psi }{\partial \varvec{F}} \varvec{F}^\mathsf{T}, \ \ s = - \frac{\partial \psi }{\partial \theta } . \end{aligned}$$
(13)

Now, it is easy to show that

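$$\begin{aligned} {\varvec{\tau }}: \varvec{d}- \dot{\psi } - s \dot{\theta } = - \frac{\partial \psi }{\partial \varvec{C}^{\text {i}}} : \dot{\varvec{C}}^{\text {i}}- \frac{\partial \psi }{\partial \xi } \dot{\xi } = {\varvec{\Sigma }}^{\text {i}}: \varvec{D}^{\text {i}}+ q \dot{\xi } \ge 0 \end{aligned}$$
(14)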

with shorthand notations for the effective Mandel stress with respect to the intermediate configuration \({\varvec{\Sigma }}^{\text {i}}= -2\varvec{F}^{\text {i}}\left( \partial \psi / \partial \varvec{C}^{\text {i}}\right) \varvec{F}^{\text {i} \mathsf T}\) and the thermodynamic force \(q = -\partial \psi /\partial \xi \) associated with the martensite volume fraction \(\xi \).

2.3 Helmholtz free energy

The Helmholtz free energy density \(\psi \) for this model is assumed to be a sum of elastic, chemical and hardening-like contributions in the form

$$\begin{aligned} \psi = \psi _\text {e}(\varvec{b}^{\text {e}},\xi ) + \psi _\text {c}(\xi ,\theta ) + \psi _\text {h}(\varvec{C}^{\text {i}},\xi ) . \end{aligned}$$
(15)

2.3.1 Elastic energy

The elastic energy \(\psi _\text {e}\) is assumed to be isotropic and to follow a modified Neo-Hookean formulation:

$$\begin{aligned} \psi _\text {e}(\varvec{b}^{\text {e}},\xi ) = \frac{\lambda (\xi )}{4} (J^\text {e 2}- 1 - 2 \ln {J^{\text {e}}}) + \frac{\mu (\xi )}{2}\left( \text {tr}( \varvec{b}^{\text {e}} ) -3 -2 \ln {J^{\text {e}}}\right) . \end{aligned}$$
(16)

Here, \(\lambda (\xi )\) and \(\mu (\xi )\) are the Lamé parameters in dependence of the martensite volume fraction \(\xi \). Further, we use a Reuss-like mixture rule to estimate the elastic constants, i.e.,

$$\begin{aligned} \mu (\xi ) = \left( \frac{\xi }{\mu _{\text {M}}} + \frac{1-\xi }{\mu _{\text {A}}} \right) ^{-1}, \ \ \lambda (\xi ) = \left( \frac{\xi }{\lambda _{\text {M}}} + \frac{1-\xi }{\lambda _{\text {A}}} \right) ^{-1}. \end{aligned}$$
(17)
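The Reuss-like mixture rule in Eq. (17) is a harmonic average of the phase properties weighted by the phase fractions. A minimal sketch follows; the function name `reuss_mix` and the material values are hypothetical and only serve as an illustration.

```python
def reuss_mix(xi, p_M, p_A):
    """Reuss-like (harmonic) mixture of a martensite and an austenite property.

    xi  : martensite volume fraction in [0, 1]
    p_M : property of the martensite phase
    p_A : property of the austenite phase
    """
    return 1.0 / (xi / p_M + (1.0 - xi) / p_A)

# Hypothetical Lame parameters in MPa (illustrative only).
mu_A, mu_M = 25000.0, 10000.0
lam_A, lam_M = 40000.0, 16000.0

mu_half = reuss_mix(0.5, mu_M, mu_A)
lam_half = reuss_mix(0.5, lam_M, lam_A)
```

For \(\xi = 0\) and \(\xi = 1\), the rule recovers the pure austenite and martensite values, respectively; for intermediate fractions, the harmonic average lies closer to the softer phase than a linear interpolation would.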

Here, and subsequently, the indices \(\bullet _\text {A}\) and \(\bullet _\text {M}\) refer to the austenite and martensite phase, respectively. Manipulating Eq. (13), one can show that the ordinary form

$$\begin{aligned} {\varvec{\tau }}= 2\varvec{b}^{\text {e}}\frac{\partial \psi _\text {e}}{\partial \varvec{b}^{\text {e}}} \end{aligned}$$
(18)

holds for elastic isotropy. Using Eqs. (14) and (18), one can show that

$$\begin{aligned} {\varvec{\Sigma }}^{\text {i}}= - 2 \varvec{F}^{\text {i}}\frac{\partial \psi }{\partial \varvec{C}^{\text {i}}}\varvec{F}^{\text {i} \mathsf T}= {\varvec{\Sigma }}^{\text {e}}- \underbrace{2 \varvec{F}^{\text {i}}\frac{\partial \psi _\text {h}}{\partial \varvec{C}^{\text {i}}} \varvec{F}^{\text {i} \mathsf T}}_{:={\varvec{\Sigma }}^{\text {h}}}, \end{aligned}$$
(19)

where \({\varvec{\Sigma }}^{\text {e}}= \varvec{C}^{\text {e}}\varvec{F}^{\text {e} -1}{\varvec{\tau }}\varvec{F}^{\text {e} -\mathsf{T}}\) is the Mandel stress with respect to the intermediate, elastically unloaded configuration, which is symmetric due to the assumption of elastic isotropy.
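As a plausibility check of Eqs. (16) and (18), the Kirchhoff stress can be evaluated both in closed form and by numerical differentiation of the elastic energy. The closed-form expression \({\varvec{\tau }}= \frac{\lambda }{2}(J^\text {e 2}- 1)\varvec{I}+ \mu (\varvec{b}^{\text {e}}- \varvec{I})\) used below is derived here from Eqs. (16) and (18) with \(J^{\text {e}}= \sqrt{\text {det}( \varvec{b}^{\text {e}} )}\) and is not quoted from the paper; the parameter values are hypothetical and the Lamé parameters are treated as given numbers at fixed \(\xi \).

```python
import numpy as np

def psi_e(b_e, lam, mu):
    """Modified Neo-Hookean energy of Eq. (16) at fixed Lame parameters."""
    J_e = np.sqrt(np.linalg.det(b_e))
    return (lam / 4.0 * (J_e**2 - 1.0 - 2.0 * np.log(J_e))
            + mu / 2.0 * (np.trace(b_e) - 3.0 - 2.0 * np.log(J_e)))

def kirchhoff_stress(b_e, lam, mu):
    """Closed form of tau = 2 b_e dpsi_e/db_e (derived for this sketch)."""
    J_e = np.sqrt(np.linalg.det(b_e))
    I = np.eye(3)
    return lam / 2.0 * (J_e**2 - 1.0) * I + mu * (b_e - I)

def kirchhoff_stress_fd(b_e, lam, mu, h=1e-6):
    """Same stress via central finite differences of psi_e, Eq. (18)."""
    dpsi = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            db = np.zeros((3, 3))
            db[i, j] = h
            dpsi[i, j] = (psi_e(b_e + db, lam, mu)
                          - psi_e(b_e - db, lam, mu)) / (2.0 * h)
    return 2.0 * b_e @ dpsi

# Hypothetical values (GPa) and a sample elastic deformation.
lam, mu = 40.0, 25.0
F_e = np.array([[1.02, 0.01, 0.00],
                [0.00, 0.99, 0.00],
                [0.00, 0.00, 1.00]])
b_e = F_e @ F_e.T

tau = kirchhoff_stress(b_e, lam, mu)
tau_fd = kirchhoff_stress_fd(b_e, lam, mu)
```

Both evaluations agree up to the finite-difference error, and the stress vanishes for \(\varvec{b}^{\text {e}}= \varvec{I}\), i.e., the undeformed state is stress-free.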

2.3.2 Chemical energy

For the chemical energy, we assume a standard relationship (see, e.g., Lexcellent et al. [28] or Panico and Brinson [34]):

$$\begin{aligned} \begin{aligned} \psi _\text {c}&= \underbrace{\left( u_0^{\text {A}} - \theta s_0^\text {A}\right) }_{\psi _0^\text {A}} - \xi \underbrace{\left( \Delta u^{\text {A}\text {M}} - \theta \Delta s^{\text {A}\text {M}} \right) }_{\Delta \psi ^{\text {A}\text {M}}} + c\left( \theta - \theta _0 - \theta \ln {\frac{\theta }{\theta _0}} \right) \\&= u_0^\text {A}- \theta s_0^\text {A}+ \xi \left( \theta - \theta _0 \right) \Delta s^{\text {A}\text {M}} + c\left( \theta - \theta _0 - \theta \ln {\frac{\theta }{\theta _0}} \right) , \end{aligned} \end{aligned}$$
(20)

where c is the specific heat capacity and \(\Delta s^{\text {AM}}\) is the difference in specific entropy of the austenite and martensite phase: \(\Delta s^{\text {AM}}= s_0^\text { A} - s_0^\text { M}\). Here, c is assumed constant (compare [42]). In Eq. (20), we made use of the definition of the equilibrium temperature of austenite and martensite, which is \(\theta _0 = \Delta u^{\text {A}\text {M}}/\Delta s^{\text {A}\text {M}}\). Further, we assume for the equilibrium temperature \(A_\text {s}>\theta _0>M_\text {s}\), where \(A_\text {s}\) is the temperature where the reverse transformation starts and \(M_\text {s}\) is the starting temperature of the forward transformation.
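The equivalence of the two lines of Eq. (20) relies on \(\Delta u^{\text {A}\text {M}} = \theta _0 \Delta s^{\text {A}\text {M}}\). The following sketch evaluates both forms; the function names and all parameter values are hypothetical and only illustrate the structure of the chemical energy.

```python
import math

def psi_c_first(xi, theta, u0A, s0A, du_AM, ds_AM, c, theta0):
    """First line of Eq. (20)."""
    return ((u0A - theta * s0A)
            - xi * (du_AM - theta * ds_AM)
            + c * (theta - theta0 - theta * math.log(theta / theta0)))

def psi_c_second(xi, theta, u0A, s0A, ds_AM, c, theta0):
    """Second line of Eq. (20), using du_AM = theta0 * ds_AM."""
    return (u0A - theta * s0A
            + xi * (theta - theta0) * ds_AM
            + c * (theta - theta0 - theta * math.log(theta / theta0)))

# Hypothetical parameters (illustrative only).
u0A, s0A = 0.0, 0.0
ds_AM = 0.3            # entropy difference, MPa/K
theta0 = 320.0         # equilibrium temperature, K
du_AM = theta0 * ds_AM
c = 3.0                # heat capacity, MPa/K

val_first = psi_c_first(0.4, 350.0, u0A, s0A, du_AM, ds_AM, c, theta0)
val_second = psi_c_second(0.4, 350.0, u0A, s0A, ds_AM, c, theta0)
```

At \(\theta = \theta _0\), the term proportional to \(\xi \) vanishes, so both phases have the same chemical energy; above \(\theta _0\), martensite is energetically penalized for positive \(\Delta s^{\text {AM}}\), and below \(\theta _0\) it is favored.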

2.3.3 Hardening energy

Since the inelastic strains vanish as \(\xi \rightarrow 0\), the inelastic strains \(\varvec{E}^{\text {i}}\) are assumed to satisfy the relation \(\varvec{E}^{\text {i}}= \xi \varvec{E}^{\text {t}}\) (see Otsuka and Ren [33]), where \(\varvec{E}^{\text {t}}\) is a measure for the effective transformation strain. Now, the hardening-like energy adapted from Sedlak et al. [42] for finite strains is assumed to be given by

$$\begin{aligned} \psi _\text {h}= k E^{\text {int}}\xi \frac{\big \langle \varvec{E}^{\text {t}}\big \rangle ^2}{1-\big \langle \varvec{E}^{\text {t}}\big \rangle ^4}, \ \ \big \langle \varvec{E}^{\text {t}}\big \rangle = \sqrt{\tfrac{2}{3}}\frac{\left\| \varvec{E}^{\text {t}} \right\| }{k} . \end{aligned}$$
(21)

Here, k is the maximum transformation strain, \(E^{\text {int}}\) is a hardening related parameter and \(\big \langle \varvec{E}^{\text {t}}\big \rangle \) is a modified von Mises equivalent strain. Obviously, \(\psi _\text {h}\rightarrow \infty \) for \(\big \langle \varvec{E}^{\text {t}}\big \rangle \rightarrow 1\). This is desired, since it captures the martensite becoming fully detwinned, which is assumed to cost additional energy.
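The barrier-like behavior of Eq. (21) can be illustrated numerically. In the sketch below, the function names and parameter values are hypothetical, and returning infinity for \(\big \langle \varvec{E}^{\text {t}}\big \rangle \ge 1\) is an assumption of this sketch (the text only states the limit behavior).

```python
import numpy as np

def equivalent_strain(E_t, k):
    """Modified von Mises equivalent strain <E_t> of Eq. (21)."""
    return np.sqrt(2.0 / 3.0) * np.linalg.norm(E_t) / k

def psi_h(E_t, xi, k, E_int):
    """Hardening-like energy of Eq. (21)."""
    e = equivalent_strain(E_t, k)
    if e >= 1.0:
        # Assumption of this sketch: treat the inadmissible range as infinite energy.
        return np.inf
    return k * E_int * xi * e**2 / (1.0 - e**4)

# Hypothetical parameters (illustrative only).
k = 0.06        # maximum transformation strain
E_int = 50.0    # hardening-related parameter, MPa
xi = 1.0

def uniaxial(a):
    """Trace-free sample tensor with axial component a."""
    return a * np.diag([1.0, -0.5, -0.5])

e_half = equivalent_strain(uniaxial(0.03), k)
energies = [psi_h(uniaxial(a), xi, k, E_int) for a in (0.01, 0.03, 0.05, 0.059)]
```

For the trace-free uniaxial sample tensor, the factor \(\sqrt{2/3}\) normalizes the equivalent strain to the axial component divided by k, and the energy grows without bound as this ratio approaches one.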

2.3.4 Thermal strains

Modeling thermal strains of multi-phase materials requires attention and special treatment. In this work, the determinant of the thermal part of the deformation gradient is connected to the coefficients of thermal expansion (CTEs) by

(22)

Here, \(\varepsilon ^\theta (\xi ,\theta )\) is the thermal strain. Additionally, \(\alpha _\text {M}\) and \(\alpha _\text {A}\) are the CTEs of the martensite and austenite phase, respectively. Further, \(\theta _{\text { refM}}\) and \(\theta _{\text { refA}}\) are the reference temperatures for martensite and austenite, respectively. We note that it is necessary to distinguish the two in order to properly represent the transformation-induced volume change which is present in some shape memory alloys (see below).

Example

We assume a one-dimensional SMA rod with only one reference temperature. Further, we simplify the SMA model by assuming that the martensite and austenite transformations occur at the distinct temperatures \(\theta _\text {M}\) and \(\theta _\text {A}\), and not over temperature ranges (see Fig. 1). The thermal strains are then given by \(\varepsilon ^\theta = \alpha _{\text {A}\text {M}} (\theta - \theta _\text { ref})\), with \(\alpha _{\text {A}\text {M}}=\xi \alpha _\text {M}+ (1-\xi )\alpha _\text {A}\). Furthermore, we assume that we know the shape of our rod at the thermal annealing temperature \(\theta _\text { ref}\) (see Fig. 1).

Fig. 1 Thermal strains when only considering one reference temperature \(\theta _\text { ref}\)

During forward transformation, the jump of \(\varepsilon ^\theta \) at \(\theta _\text {M}\) can be analytically calculated and is given by

$$\begin{aligned} \Delta \varepsilon ^\theta = (\theta _\text { ref}-\theta _\text {M})(\alpha _\text {A}- \alpha _\text {M}). \end{aligned}$$
(23)

Here, we can not only see that this jump is dependent on the reference temperature, but that it can be positive or negative, depending on the reference temperature \(\theta _\text { ref}\) being larger or smaller than the forward transformation temperature \(\theta _\text {M}\).
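The dependence of the jump in Eq. (23) on the reference temperature can be reproduced with a few lines of code. The CTE values and temperatures below are hypothetical and are chosen only to show the change of sign; the jump is evaluated as the thermal strain of pure martensite minus that of pure austenite at \(\theta _\text {M}\).

```python
def thermal_strain(theta, xi, alpha_M, alpha_A, theta_ref):
    """One-dimensional thermal strain with a single reference temperature."""
    alpha_AM = xi * alpha_M + (1.0 - xi) * alpha_A
    return alpha_AM * (theta - theta_ref)

def jump_forward(theta_M, theta_ref, alpha_M, alpha_A):
    """Jump of the thermal strain at theta_M according to Eq. (23)."""
    return (theta_ref - theta_M) * (alpha_A - alpha_M)

# Hypothetical values (illustrative only).
alpha_A, alpha_M = 11.0e-6, 6.6e-6   # 1/K
theta_M = 300.0                      # K

def jump_direct(theta_ref):
    """Strain after (xi = 1) minus strain before (xi = 0) the transformation."""
    return (thermal_strain(theta_M, 1.0, alpha_M, alpha_A, theta_ref)
            - thermal_strain(theta_M, 0.0, alpha_M, alpha_A, theta_ref))

jump_high_ref = jump_forward(theta_M, 700.0, alpha_M, alpha_A)  # theta_ref > theta_M
jump_low_ref = jump_forward(theta_M, 250.0, alpha_M, alpha_A)   # theta_ref < theta_M
```

With these sample values, the jump is positive for the high reference temperature and negative for the low one, although the material is the same in both cases.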

Using two reference temperatures \(\theta _{\text {ref M}}\) and \(\theta _{\text {ref A}}\), we can not only circumvent this problem, but also model the magnitude of this jump in compliance with, e.g., the experimental results of Potapov et al. [36], where they calculated the jump \(\Delta V\) in volume from the lattice parameters for \({\hbox {Ni}_{49.8}\hbox {Ti}_{35.2}\hbox {Hf}_{15}}\) to be \({0.47\%}\). However, we want to emphasize that this effect does not play a significant role in many other shape memory alloys.

2.4 Dissipation potential

The dissipative behavior of the shape memory alloy is assumed to be governed by the dissipation potential

$$\begin{aligned} \phi (\dot{\xi }, \varvec{D}^{\text {i}},\xi ) ={\left\{ \begin{array}{ll} \phi _\text {M}, \ \ &{}\text { if}\ \dot{\xi } \ge 0 \wedge \text {tr}( \varvec{D}^{\text {i}} ) = 0,\\ \phi _\text {A}, \ \ &{}\text { if}\ \dot{\xi } <0 \wedge \text {tr}( \varvec{D}^{\text {i}} ) = 0,\\ \infty , \ \ &{}\text { else} . \end{array}\right. } \end{aligned}$$
(24)

Here, depending on the direction of transformation, either \(\phi _\text {M}\) or \(\phi _\text {A}\) is used as the active dissipation potential. As is commonly done for plasticity, the dissipation is assumed to be infinite if \(\text {tr}( \varvec{D}^{\text {i}} ) \ne 0 \), which guarantees volume-preserving inelastic deformations (compare Rockafellar [40]). Additionally, we define the dissipation potential for an increasing martensite volume fraction as

$$\begin{aligned} \phi _\text {M}= \dot{\xi } \underbrace{\Delta s^{\text {AM}}\big ( (\theta _0 - M_{\text {s}}) + \xi (M_{\text {s}}- M_{\text {f}}) \big )}_{=:Q_\text {M}(\xi )} + \sqrt{\tfrac{2}{3}}\sigma ^{\text {reo}}\left\| \varvec{D}^{\text {i}} \right\| , \end{aligned}$$
(25)

where \(M_{\text {f}}\) is the finish temperature of forward transformation and \(\sigma ^{\text {reo}}\) is the reorientation stress. Likewise, the dissipation potential when going to a higher austenite volume fraction is defined as

$$\begin{aligned} \phi _\text {A}= \dot{\xi } \underbrace{\Delta s^{\text {AM}}\big ( (\theta _0 - A_{\text {f}}) + \xi (A_{\text {f}}- A_{\text {s}}) \big )}_{=:Q_\text {A}(\xi )} + \sqrt{\tfrac{2}{3}}\sigma ^{\text {reo}}\left( \left\| \frac{\dot{\xi }}{\xi }{\varvec{\varepsilon }}^{\text {i}} \right\| + \left\| \varvec{D}^{\text {i}}- \frac{\dot{\xi }}{\xi }{\varvec{\varepsilon }}^{\text {i}} \right\| \right) . \end{aligned}$$
(26)

Here, \(A_{\text {f}}\) is the finish temperature of reverse transformation and \({\varvec{\varepsilon }}^{\text {i}}= \frac{1}{2}\ln {\varvec{b}^{\text {i}}}\). The significance of using the logarithmic inelastic strain \({\varvec{\varepsilon }}^{\text {i}}\) as a strain measure for the dissipation is explained below. The dissipation potential represents a geometrically nonlinear generalization of the small strain potential proposed in Sedlak et al. [42], which is based on the works of Bernardini and Pence [6], Panico and Brinson [34] and Moumni et al. [30].
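The structure of Eqs. (24) to (26) can be sketched as follows. The parameter dictionary, the function names, the numerical tolerance on the trace constraint and passing \({\varvec{\varepsilon }}^{\text {i}}\) as an explicit argument are assumptions of this sketch; all values are hypothetical.

```python
import numpy as np

# Hypothetical parameters (temperatures in K, stress in MPa, entropy in MPa/K).
p = {"ds": 0.3, "theta0": 320.0, "Ms": 300.0, "Mf": 280.0,
     "As": 340.0, "Af": 360.0, "sigma_reo": 100.0}

def Q_M(xi, p):
    """Forward transformation term Q_M(xi) of Eq. (25)."""
    return p["ds"] * ((p["theta0"] - p["Ms"]) + xi * (p["Ms"] - p["Mf"]))

def Q_A(xi, p):
    """Reverse transformation term Q_A(xi) of Eq. (26)."""
    return p["ds"] * ((p["theta0"] - p["Af"]) + xi * (p["Af"] - p["As"]))

def phi(xi_dot, D_i, xi, eps_i, p, tol=1e-12):
    """Dissipation potential of Eq. (24); tol is an assumption of this sketch."""
    if abs(np.trace(D_i)) > tol:
        return np.inf                      # constraint tr(D_i) = 0 violated
    s = np.sqrt(2.0 / 3.0) * p["sigma_reo"]
    if xi_dot >= 0.0:                      # forward transformation, Eq. (25)
        return xi_dot * Q_M(xi, p) + s * np.linalg.norm(D_i)
    r = xi_dot / xi * eps_i                # strain recovery rate, Eq. (26)
    return xi_dot * Q_A(xi, p) + s * (np.linalg.norm(r) + np.linalg.norm(D_i - r))

eps_i = 0.02 * np.diag([1.0, -0.5, -0.5])   # sample logarithmic inelastic strain
D_fwd = 1e-3 * np.diag([1.0, -0.5, -0.5])   # trace-free inelastic rate

phi_fwd = phi(0.1, D_fwd, 0.5, eps_i, p)
phi_rev = phi(-0.1, (-0.1 / 0.5) * eps_i, 0.5, eps_i, p)
phi_bad = phi(0.1, 1e-3 * np.eye(3), 0.5, eps_i, p)
```

In the reverse case, the sample rate is chosen as \(\varvec{D}^{\text {i}}= (\dot{\xi }/\xi ){\varvec{\varepsilon }}^{\text {i}}\), so the last norm in Eq. (26) vanishes; for the sample temperatures, both branches yield a positive dissipation.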

2.5 Transformation/yield criteria and inelastic evolution equations

It is assumed that the evolution of \(\varvec{C}^{\text {i}}\) and \(\xi \) follows from the minimization problem

$$\begin{aligned} \inf \limits _{{{\varvec{D}}^\text {i}},\dot{\xi }} \dot{\psi } + \phi , \end{aligned}$$
(27)

which is equivalent to the Legendre–Fenchel transformed problem (compare Eqs. (14), (19))

$$\begin{aligned} \sup \limits _{{{\varvec{D}}^\text {i}},\dot{\xi }} \left( {\varvec{\Sigma }}^{\text {e}}- {\varvec{\Sigma }}^{\text {h}}\right) : \varvec{D}^{\text {i}}+ q \dot{\xi } - \phi (\dot{\xi }, \varvec{D}^{\text {i}}, \xi ) =: \phi ^{*}({\varvec{\Sigma }}^{\text {i}}, q, \xi ), \end{aligned}$$
(28)

where \(\phi ^{*}\) is the dual dissipation potential of \(\phi \):

$$\begin{aligned} \phi ^{*}({\varvec{\Sigma }}^{\text {i}},q,\xi ) = \sup \limits _{\overset{\scriptstyle {{\varvec{D}}^\text {i}},\dot{\xi }}{\text {tr}( {{\varvec{D}}^\text {i}} )=0}} {\varvec{\Sigma }}^{\text {i}}: \varvec{D}^{\text {i}}+ q \dot{\xi } - {\left\{ \begin{array}{ll} \phi _\text {M}, \ &{}\text { if} \ \dot{\xi } \ge 0 \\ \phi _\text {A}, \ &{}\text { if} \ \dot{\xi } < 0 . \end{array}\right. } \end{aligned}$$
(29)

Due to the minimization in Eq. (27), there is a tendency that for \(\dot{\xi }<0\) we get \(\varvec{D}^{\text {i}}-(\dot{\xi }/\xi ){\varvec{\varepsilon }}^{\text {i}}= \varvec{0}\) as a result of the last term in Eq. (26). In that case \(\varvec{b}^{\text {i}}\rightarrow \varvec{I}\) as \(\xi \rightarrow 0\) in time (for a proof, see Appendix A), i.e., for vanishing martensite content, the inelastic strain vanishes, which is a physical necessity. With the reparametrization \(\varvec{D}^{\text {t}}= \varvec{D}^{\text {i}}- ({\dot{\xi }}/{\xi }) {\varvec{\varepsilon }}^{\text {i}}\) (compare Eq. (26)) we find

$$\begin{aligned} \phi ^{*}({\varvec{\Sigma }}^{\text {i}},q,\xi ) = \sup {\left\{ \begin{array}{ll} \sup \limits _{{\mathop {\text {tr}( {{\varvec{D}}^\text {i}} )=0}\limits ^{\scriptstyle {{\varvec{D}}^\text {i}},\dot{\xi }\ge 0}}} {\varvec{\Sigma }}^{\text {i}}: \varvec{D}^{\text {i}}- \sqrt{\tfrac{2}{3}}\sigma ^{\text {reo}}\left\| \varvec{D}^{\text {i}} \right\| + \big ( q - Q_\text {M}(\xi ) \big )\dot{\xi }, \\ \sup \limits _{{\mathop {\text {tr}( {\varvec{D}}^\text {t} )=0}\limits ^{\scriptstyle {\varvec{D}}^\text {t},\dot{\xi }<0}}} {\varvec{\Sigma }}^{\text {i}}: \varvec{D}^{\text {t}}- \sqrt{\tfrac{2}{3}}\sigma ^{\text {reo}}\left\| \varvec{D}^{\text {t}} \right\| + \left( \frac{1}{\xi } {\varvec{\Sigma }}^{\text {i}}: {\varvec{\varepsilon }}^{\text {i}}+ q - Q_\text {A}(\xi ) + \sqrt{\tfrac{2}{3}}\sigma ^{\text {reo}}\frac{\left\| {\varvec{\varepsilon }}^{\text {i}} \right\| }{\xi } \right) \dot{\xi }, \end{array}\right. } \end{aligned}$$
(30)

where \(Q_\text {M}\) and \(Q_\text {A}\) are defined in Eqs. (25) and (26) and capture the transformation hysteresis due to temperature change. Further reparametrizing \(\varvec{D}^\text { i / t} = \lambda \varvec{N}\) with \(\lambda \ge 0\) and using the obvious solution

$$\begin{aligned} \varvec{N}= \frac{{\varvec{\Sigma }}^{\text {i} \prime }}{\left\| {\varvec{\Sigma }}^{\text {i} \prime } \right\| }, \end{aligned}$$
(31)

it follows that

$$\begin{aligned} \phi ^{*}({\varvec{\Sigma }}^{\text {i}},q,\xi ) = \sup {\left\{ \begin{array}{ll} \sup \limits _{\lambda \ge 0,\dot{\xi }\ge 0} \lambda f({\varvec{\Sigma }}^{\text {i}}) + \dot{\xi }g_\text {M}(q)\\ \sup \limits _{\lambda \ge 0,\dot{\xi }< 0} \lambda f({\varvec{\Sigma }}^{\text {i}}) + \dot{\xi }g_\text {A}(q,{\varvec{\Sigma }}^{\text {i}}) \end{array}\right. } = {\left\{ \begin{array}{ll} 0,\ &{}\text { if}\ f\le 0 \wedge g_\text {M}\le 0 \wedge g_\text {A}\ge 0,\\ \infty , \ &{}\text { else}. \end{array}\right. } \end{aligned}$$
(32)

Thus, we find the following transformation and yield criteria in the classical Karush-Kuhn-Tucker form as well as evolution equations:

$$\begin{aligned} \begin{aligned} f({\varvec{\Sigma }}^{\text {i}})&= \left\| {\varvec{\Sigma }}^{\text {i} \prime } \right\| - \sqrt{\tfrac{2}{3}}\sigma ^{\text {reo}}\le 0; \ &\lambda f&= 0;\ &\lambda&\ge 0; \ &\varvec{D}^{\text {i}}&= \lambda \frac{\partial f}{\partial {\varvec{\Sigma }}^{\text {i}}};\\ g_\text {M}(q)&= q - Q_\text {M}(\xi ) \le 0; \ &\langle \dot{\xi }\rangle g_\text {M}&= 0; \ &\dot{\xi }&> 0 &\text {possible if } g_\text {M}&= 0; \\ g_\text {A}(q,{\varvec{\Sigma }}^{\text {i}})&= \frac{1}{\xi } {\varvec{\Sigma }}^{\text {i}}:{\varvec{\varepsilon }}^{\text {i}}+ q - Q_\text {A}(\xi ) + \sqrt{\tfrac{2}{3}}\frac{\sigma ^{\text {reo}}}{\xi }\left\| {\varvec{\varepsilon }}^{\text {i}} \right\| \ge 0 ;\ &\langle \dot{\xi }\rangle _- g_\text {A}&= 0; \ &\dot{\xi }&< 0 &\text {possible if } g_\text {A}&= 0; \end{aligned} \end{aligned}$$
(33)

with the Macaulay bracket \(\langle \bullet \rangle = ( \bullet + |\bullet |)/2\) and its modified form \(\langle \bullet \rangle _- = ( \bullet - |\bullet |)/2\). It is then straightforward to prove thermodynamic consistency (see Eq. (14)). The ’0’-branch in Eq. (32) implies that (see Eq. (28))

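$$\begin{aligned} {\varvec{\Sigma }}^{\text {i}}: \varvec{D}^{\text {i}}+ q \dot{\xi } = \phi (\dot{\xi }, \varvec{D}^{\text {i}}, \xi ) . \end{aligned}$$
(34)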

Thus, we find with Eqs. (4) and (10) the following alternative form of the energy balance:

$$\begin{aligned} \theta \dot{s} = \phi - \text {Div\!}\left( \varvec{Q} \right) + w . \end{aligned}$$
(35)
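The reorientation criterion and flow direction of Eqs. (31) and (33) can be evaluated with a short sketch. The function names and the value of \(\sigma ^{\text {reo}}\) are hypothetical; the example uses a uniaxial Mandel stress, for which the criterion becomes active exactly when the axial stress reaches \(\sigma ^{\text {reo}}\).

```python
import numpy as np

def dev(A):
    """Deviatoric part A' = A - tr(A)/3 * I."""
    return A - np.trace(A) / 3.0 * np.eye(3)

def f_reo(Sigma_i, sigma_reo):
    """Reorientation (yield) function f of Eq. (33)."""
    return np.linalg.norm(dev(Sigma_i)) - np.sqrt(2.0 / 3.0) * sigma_reo

def flow_direction(Sigma_i):
    """Normalized flow direction N of Eq. (31)."""
    S = dev(Sigma_i)
    return S / np.linalg.norm(S)

def macaulay_pos(x):
    """Macaulay bracket (x + |x|) / 2."""
    return 0.5 * (x + abs(x))

def macaulay_neg(x):
    """Modified Macaulay bracket (x - |x|) / 2."""
    return 0.5 * (x - abs(x))

sigma_reo = 100.0                                   # hypothetical value, MPa
Sigma_below = np.diag([80.0, 0.0, 0.0])             # elastic range, f < 0
Sigma_at = np.diag([sigma_reo, 0.0, 0.0])           # on the surface, f = 0

f_below = f_reo(Sigma_below, sigma_reo)
f_at = f_reo(Sigma_at, sigma_reo)
N = flow_direction(Sigma_at)
```

The flow direction is trace-free and has unit norm, so an inelastic rate \(\varvec{D}^{\text {i}}= \lambda \varvec{N}\) automatically satisfies the constraint \(\text {tr}( \varvec{D}^{\text {i}} ) = 0\).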

2.6 Rate potential

To simplify the discussion, we start with the isothermal case, i.e., in a first step, the temperature \(\theta \) is considered to be a given parameter. The rate potential and its time-discretized form read

$$\begin{aligned} \begin{aligned} \pi&= \dot{\psi }+ \phi \simeq \frac{1}{\Delta t}\left( \psi (\varvec{F},\varvec{C}^{\text {i}},\xi ,\theta ) - \psi _n +\phi _{\Delta }\right) . \end{aligned} \end{aligned}$$
(36)

Here, \(\psi _n\) refers to the Helmholtz free energy at the previous time, \(\Delta t\) is the time step from \(t_n\) to \(t_{n+1}\) (Footnote 4) and \(\phi _{\Delta }\) is the time-discretized version of the dissipation potential multiplied by \(\Delta t\), defined by

(37)

with \(\text {sg}\left( \bullet \right) \) referring to the sign function and \(\Delta \xi = \xi - \xi _n\). Additionally, is the third invariant of \(\varvec{C}^{\text {i}}\), i.e.,

(38)

The constraint in Eq. (37) is consistent with the requirement \(\text {tr}( \varvec{D}^{\text {i}} ) = 0\) (see Eq. (24)). Additionally, we compute the effective inelastic strain increment \(\Delta \alpha \) as

$$\begin{aligned} \Delta \alpha = \sqrt{\tfrac{2}{3}}\left\| \frac{1}{2}\varvec{U}^{\text {i}- 1}_n\Delta _\text {r}\varvec{C}^{\text {i}}\varvec{U}^{\text {i}- 1}_n \right\| =: \sqrt{\tfrac{2}{3}}\left\| \frac{1}{2}\Delta _\text {r}\varvec{C}^{\text {i}} \right\| _{{{\varvec{U}}^{\text {i}- 1}_n}}, \end{aligned}$$
(39)

where the inelastic right stretch is \(\varvec{U}^{\text {i}}= \sqrt{\varvec{C}^{\text {i}}}\), \(\Delta _\text {r}\varvec{C}^{\text {i}}= \varvec{C}^{\text {i}}- \varvec{C}^{\text {i}}_\text {r}\) and the shorthand notation \(\left\| \bullet \right\| _{{{\varvec{U}}^{\text {i}- 1}_n}}=\left\| \varvec{U}^{\text {i}- 1}_n\bullet \varvec{U}^{\text {i}- 1}_n \right\| \) is used. Further, we define \(\varvec{C}^{\text {i}}_\text {r}\) as

$$\begin{aligned} \varvec{C}^{\text {i}}_\text {r}= \left( \varvec{C}_n^{\text {i}}\right) ^{1+\big \langle \frac{\Delta \xi }{\xi _n}\big \rangle _-}. \end{aligned}$$
(40)

Now, if \(\xi \) goes to zero in a given step, i.e., \(\xi = \xi _n+ \Delta \xi = 0 \Leftrightarrow \Delta \xi = -\xi _n\), the exponent in Eq. (40) vanishes and \(\varvec{C}^{\text {i}}_\text {r}= \left( \varvec{C}_n^{\text {i}}\right) ^0=\varvec{I}\), i.e., \(\varvec{C}^{\text {i}}_\text {r}\) returns to the identity. Thus, the time-discrete potential (37) based on definition (40) of \(\varvec{C}^{\text {i}}_\text {r}\) is the key to ensuring that the inelastic strain consistently disappears during the transformation from martensite to austenite. Further, the function \(Q(\varvec{C}_n^{\text {i}},\xi _{n+\frac{1}{2}},\text {sg}\left( \Delta \xi \right) )\) is given by

$$\begin{aligned} Q(\varvec{C}_n^{\text {i}},\xi _{n+\frac{1}{2}},\text {sg}\left( \Delta \xi \right) ) = {\left\{ \begin{array}{ll} \Delta s^{\text {AM}}\big ((\theta _0 - M_{\text {s}}) + \xi _{n+\frac{1}{2}}(M_{\text {s}}- M_{\text {f}})\big ), \ &{}\text { if} \ \Delta \xi \ge 0,\\ \Delta s^{\text {AM}}\big ((\theta _0 - A_{\text {f}}) + \xi _{n+\frac{1}{2}}(A_{\text {f}}- A_{\text {s}})\big ) - \sqrt{\tfrac{2}{3}}\frac{\sigma ^{\text {reo}}}{\xi _n} \left\| {\varvec{\varepsilon }}^{\text {i}}_n \right\| ,\ &{}\text { if}\ \Delta \xi < 0, \end{array}\right. } \end{aligned}$$
(41)

where \(\xi _{n+\frac{1}{2}} = (\xi +\xi _n)/2\) is the midpoint evaluation of \(\xi \), which is employed to obtain a reasonable transformation when the material is not stressed (see Frost et al. [14] for a similar concept). Moreover, we used \(\varvec{C}_n^{\text {i}}\) and \(\xi _n\) for the reverse transformation in Eq. (41) to circumvent the eigenvalue problem as well as its linearization in every local Newton iteration. In this way, it suffices to solve the eigenvalue problem once per time step.
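The behavior of \(\varvec{C}^{\text {i}}_\text {r}\) in Eq. (40) can be illustrated with a minimal numerical sketch. It is not the implementation used in this work; the function names and the sample values of \(\varvec{C}_n^{\text {i}}\) and \(\xi _n\) are assumptions chosen for illustration. The tensor power is evaluated through a single spectral decomposition of \(\varvec{C}_n^{\text {i}}\), in line with solving the eigenvalue problem only once per time step.

```python
import numpy as np

def macaulay_minus(x):
    """Modified Macaulay bracket <x>_- = (x - |x|)/2, which is zero for x >= 0."""
    return 0.5 * (x - abs(x))

def reference_tensor(C_n, xi_n, d_xi):
    """C_r = C_n^(1 + <d_xi/xi_n>_-), cf. Eq. (40); requires xi_n > 0."""
    exponent = 1.0 + macaulay_minus(d_xi / xi_n)
    # C_n is symmetric positive definite, so eigh gives real positive eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(C_n)
    return (eigvecs * eigvals**exponent) @ eigvecs.T

# Hypothetical unimodular inelastic state (det = 1) and martensite fraction.
C_n = np.diag([1.21, 1.0 / 1.1, 1.0 / 1.1])
xi_n = 0.4

C_forward = reference_tensor(C_n, xi_n, +0.1)   # forward step: C_r equals C_n
C_half = reference_tensor(C_n, xi_n, -0.2)      # half of the martensite vanishes
C_full = reference_tensor(C_n, xi_n, -xi_n)     # xi reaches zero: C_r equals I
```

In this sketch, any real power of a unimodular tensor stays unimodular, so the intermediate state `C_half` also has unit determinant.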

Using some further shorthand notations for constant terms in Q, we obtain

$$\begin{aligned} \begin{aligned} \phi _{\Delta }&= \sigma ^{\text {reo}}\Delta \alpha \\&\quad + {\left\{ \begin{array}{ll} \overbrace{\Delta s^{\text {AM}}\big ((\theta _0-M_{\text {s}})+\xi _n(M_{\text {s}}-M_{\text {f}})\big )}^{Q^{\text {M}0}} \Delta \xi + \frac{1}{2}\overbrace{\Delta s^{\text {AM}}(M_{\text {s}}-M_{\text {f}})}^{H^\text {M}} \Delta \xi ^2, &{}\text {if}\ \Delta \xi \ge 0\\ \overbrace{\Big (\Delta s^{\text {AM}}\big ((\theta _0-A_{\text {f}})+\xi _n(A_{\text {f}}-A_{\text {s}})\big ) - \sqrt{\tfrac{2}{3}}\frac{\sigma ^{\text {reo}}}{\xi _n}\left\| {\varvec{\varepsilon }}^{\text {i}}_n \right\| \Big )}^{Q^{\text {A}0}} \Delta \xi + \frac{1}{2}\overbrace{\Delta s^{\text {AM}}(A_{\text {f}}-A_{\text {s}})}^{H^\text {A}} \Delta \xi ^2, \ &{}\text {if}\ \Delta \xi<0 \end{array}\right. }\\&=\sigma ^{\text {reo}}\Delta \alpha + {\left\{ \begin{array}{ll} Q^{\text {M}0}\Delta \xi + \frac{1}{2}H^\text {M}(\Delta \xi )^2, \ &{}\text {if}\ \Delta \xi \ge 0 \\ Q^{\text {A}0}\Delta \xi + \frac{1}{2}H^\text {A}(\Delta \xi )^2, \ &{}\text {if}\ \Delta \xi <0 \end{array}\right. }\\&= \sigma ^{\text {reo}}\Delta \alpha + Q (\text {sg}\left( \Delta \xi \right) )\Delta \xi + \frac{1}{2}H(\text {sg}\left( \Delta \xi \right) )(\Delta \xi )^2, \end{aligned} \end{aligned}$$
(42)

where H and Q summarize \(H^\text {M}\), \(H^\text {A}\) and \(Q^{\text {M}0}\), \(Q^{\text {A}0}\), respectively. Further, \(\text {det}( \varvec{C}^{\text {i}}_\text {r} ) = \text {det}( \varvec{C}_n^{\text {i}} ) = 1\) follows directly from the definition of \(\varvec{C}^{\text {i}}_\text {r}\) in Eq. (40). Additionally, the consistency of \(\phi _\Delta \) with the time-continuous theory is trivially proven except for \(\Delta \alpha \). Therefore, we have a look at the approximation of \(a^{1+x}\) for \(|x|\ll 1\):

$$\begin{aligned} a^{1+x} {\mathop {\approx }\limits ^{|x|\ll 1}} a + x a \ln {a} \end{aligned}$$
(43)

as \(\ln {a^x} \approx a^x-1\). Hence, we can approximate \(\varvec{C}^{\text {i}}_\text {r}\) as

$$\begin{aligned} \varvec{C}^{\text {i}}_\text {r}{\mathop {\approx }\limits ^{|\Delta \xi |\ll 1}}\varvec{C}_n^{\text {i}}+ \varvec{C}_n^{\text {i}}\ln ({\varvec{C}_n^{\text {i}}}) \left\langle \frac{\Delta \xi }{\xi _n}\right\rangle _{-}. \end{aligned}$$
(44)
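The first-order approximation in Eqs. (43) and (44) can be checked numerically. The following sketch is an illustration only; the helper `mat_fun` and the sample tensor are assumptions. It compares the exact tensor power with the linearization for two small negative exponents (the bracket \(\langle \bullet \rangle _-\) is never positive) and confirms that the remainder scales quadratically.

```python
import numpy as np

def mat_fun(C, f):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    w, V = np.linalg.eigh(C)
    return (V * f(w)) @ V.T

# Hypothetical unimodular inelastic tensor.
C_n = np.diag([1.21, 1.0 / 1.1, 1.0 / 1.1])

def approximation_error(x):
    """Norm of C_n^(1+x) - (C_n + x C_n ln C_n), cf. Eq. (44)."""
    exact = mat_fun(C_n, lambda w: w ** (1.0 + x))
    linear = C_n + x * C_n @ mat_fun(C_n, np.log)
    return np.linalg.norm(exact - linear)

err_coarse = approximation_error(-1e-2)
err_fine = approximation_error(-1e-3)
# A second-order remainder shrinks by about 100 when x shrinks by 10.
ratio = err_coarse / err_fine
```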

We can use this result for computation of the time-discrete derivative of \(\alpha \):

$$\begin{aligned} \begin{aligned} \frac{\Delta \alpha }{\Delta t}&\approx \sqrt{\tfrac{2}{3}}\left\| \frac{1}{2}\varvec{U}^{\text {i}- 1}_n\frac{1}{\Delta t} \Delta \varvec{C}^{\text {i}}\varvec{U}^{\text {i}- 1}_n- \frac{1}{2}\ln (\varvec{C}_n^{\text {i}}) \left\langle \frac{\Delta \xi }{\Delta t \xi _n}\right\rangle _- \right\| \\&=\sqrt{\tfrac{2}{3}}\left\| \frac{1}{2}\varvec{F}^{\text {i} -\mathsf{T}}_{\text {n}}\frac{1}{\Delta t} \Delta \varvec{C}^{\text {i}}\varvec{F}^{\text {i} -1}_{\text {n}}- \frac{1}{2\xi _n} \ln {(\varvec{b}^\text {i}_n)} \left\langle \frac{\Delta \xi }{\Delta t}\right\rangle _- \right\| . \end{aligned} \end{aligned}$$
(45)

Therefore, with the time step width going to zero, we obtain

$$\begin{aligned} \Delta t \rightarrow 0 \Rightarrow \sqrt{\tfrac{2}{3}}\left\| \varvec{D}^{\text {i}}- \frac{\dot{\xi }}{2\xi } \ln {(\varvec{b}^{\text {i}})} \right\| = \dot{\alpha }, \end{aligned}$$
(46)

which is consistent with the time-continuous Eq. (26). Furthermore, since , we get for the time derivative of :

(47)

Hence, since \(\phi _{\Delta }\) is consistent with the time-continuous theory, we can use the discretized potential to solve for \(\varvec{C}^{\text {i}}\) and \(\xi \):

$$\begin{aligned} (\varvec{C}^{\text {i}},\xi ) = \arg \inf \limits _{{\varvec{C}}^{\text {i}},\xi } ( \underbrace{\psi -\psi _n+\phi _{\Delta }}_{\pi _{\Delta }}) = \arg \inf \limits _{{\varvec{C}}^{\text {i}},\xi } \pi _{\Delta }(\varvec{F},\varvec{C}^{\text {i}},\xi ,\theta ). \end{aligned}$$
(48)
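The link between the midpoint evaluation of Q in Eq. (41) and the quadratic form in Eq. (42) can be verified with a short sketch for the forward branch (\(\Delta \xi \ge 0\)). All numerical values below are hypothetical and only serve the comparison; the reverse branch works analogously with the additional constant reorientation term.

```python
def phi_midpoint(d_xi, xi_n, ds_am, theta0, m_s, m_f):
    """Q(xi_{n+1/2}) * d_xi with Q linear in xi (forward branch of Eq. (41))."""
    xi_mid = xi_n + 0.5 * d_xi
    q_mid = ds_am * ((theta0 - m_s) + xi_mid * (m_s - m_f))
    return q_mid * d_xi

def phi_expanded(d_xi, xi_n, ds_am, theta0, m_s, m_f):
    """Q^{M0} d_xi + 0.5 H^M d_xi^2 (forward branch of Eq. (42))."""
    q_m0 = ds_am * ((theta0 - m_s) + xi_n * (m_s - m_f))
    h_m = ds_am * (m_s - m_f)
    return q_m0 * d_xi + 0.5 * h_m * d_xi**2

# Hypothetical parameter values, chosen only for the comparison.
params = dict(xi_n=0.3, ds_am=-0.35, theta0=350.0, m_s=330.0, m_f=310.0)
value_midpoint = phi_midpoint(0.2, **params)
value_expanded = phi_expanded(0.2, **params)
```

Both functions return the same value for any increment, because the midpoint rule integrates a function that is linear in \(\xi \) exactly.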

3 Numerical strategies

The potential \(\pi _{\Delta }\), as formulated in Eq. (48), carries two major numerical difficulties. First, \(\varvec{C}^{\text {i}}\) is constrained to be volume preserving, i.e., \(\text {det}( \varvec{C}^{\text {i}} ) = 1\). This constraint must be satisfied exactly in order to remain physically meaningful and to avoid the accumulation of errors over the time steps. Second, \(\pi _{\Delta }\) is not differentiable at \(\Delta _\text {r}\varvec{C}^{\text {i}}= 0\). The strategies employed to overcome these and other numerical difficulties as well as general numerical approaches are presented in this section.

3.1 Inelastic volume preservation

To deal with the constraint in \(\varvec{C}^{\text {i}}\), we employ a strategy using a projection of \(\varvec{C}^{\text {i}}\) into the space of unimodular tensors, which was introduced by Hurtado et al. [21] for crystal plasticity. Our approach here closely follows the approach presented in Sielenkämper et al. [44].

First, we express \(\varvec{C}^{\text {i}}\) in terms of the unconstrained inelastic auxiliary right Cauchy-Green tensor :

(49)

Thus, \(\text {det}( \varvec{C}^{\text {i}} ) = 1\) is automatically satisfied. This idea is borrowed from various formulations in hyperelasticity, where similar approaches are used to decouple volumetric and deviatoric deformations (see, e.g., Flory and Volkenstein [13] or Simo et al. [45]). Now, we replace \(\varvec{C}^{\text {i}}\) by in the minimization problem in Eq. (48):

(50)

However, while this removes the constraint from the minimization problem, \(\pi _{\Delta }\) is now invariant with respect to changes of , and is therefore not uniquely solvable. For this reason, we add the regularization energy

(51)

to \(\pi _{\Delta }\). We want to emphasize that this has no effect on the solution. This is due to the fact that is exactly satisfied after converging to a solution, since \(\psi _\text {r}\) is the only term in \(\pi _{\Delta }\) that is dependent on . Therefore, \(\psi _\text {r}\) is zero at any solution of the minimization problem. Further, the constants A and B can be chosen arbitrarily. They do not have to be chosen particularly large. In this paper, we chose \(B = 1\), but any other positive value could be chosen and would lead to the exact same results. In this special case, is exactly satisfied once the solution algorithm is converged. However, we want to note that this equality does not hold in the not yet converged state.
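The effect of the projection can be illustrated with a minimal sketch. It assumes that the projection takes the usual isochoric-scaling form known from hyperelasticity, i.e., the auxiliary tensor is multiplied by its determinant to the power \(-1/3\); the names `unimodular` and `C_aux` as well as the sample values are ours.

```python
import numpy as np

def unimodular(C_aux):
    """Scale a symmetric positive definite tensor to unit determinant."""
    return np.linalg.det(C_aux) ** (-1.0 / 3.0) * C_aux

# Hypothetical unconstrained auxiliary tensor.
C_aux = np.array([[1.30, 0.10, 0.00],
                  [0.10, 0.90, 0.05],
                  [0.00, 0.05, 1.10]])

C_i = unimodular(C_aux)
# Scaling the auxiliary tensor by any positive factor gives the same projection.
C_i_scaled = unimodular(7.5 * C_aux)
```

The second call shows why a regularization is needed: the projected tensor, and hence the potential, does not change when the auxiliary tensor is rescaled, so the unconstrained problem has no unique minimizer without an additional term.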

The following tensor, which is used in the residuals and stiffness terms later, is also known from hyperelastic models:

(52)

where is the fourth-order identity on symmetric second-order tensors.

3.2 Differentiability at \(\Delta _\text {r}\varvec{C}^{\text {i}}= 0\)

The ansatz renders the solution \(\varvec{C}^{\text {i}}\) a priori volume preserving, but it does not solve the lack of differentiability of \(\phi _{\Delta }\) for \(\varvec{C}^{\text {i}}= \varvec{C}^{\text {i}}_\text {r}\). However, this is achieved by the following reparametrization:

(53)

where \(\tilde{{\varvec{N}}}^\text {s}\) is an unconstrained, symmetric second-order tensor. The final unconstrained minimization problem now reads

(54)

It is easy to show that \(\pi _{\Delta }\) is invariant with respect to . In order to render the solution unique, we further modify \(\psi _\text {r}\):

(55)

where the choice of A, with the same argument as in Sect. 3.1, still has no effect on the solution. Additionally, the last term with a very small and constant \(\epsilon \) is explained in Sect. 3.5. Now, the minimization problem from Eq. (54) is computed by solving the stationary conditions:

$$\begin{aligned} \frac{\partial \pi _{\Delta }}{\partial \Delta \gamma } = 0, \ \ \frac{\partial \pi _{\Delta }}{\partial \tilde{{\varvec{N}}}^\text {s}} = \varvec{0}, \ \ \frac{\partial \pi _{\Delta }}{\partial \xi } = 0, \end{aligned}$$
(56)

where in general only a subset of these three equations is involved, depending on the ’active’ variables. The active set of variables is determined by the activation or yield criteria presented in the sequel (see Sects. 3.3 and 3.10).

3.3 Algorithmic yield criterion

In a given time step, \(\varvec{C}^{\text {i}}\) will only change (i.e., \(\varvec{C}^{\text {i}}\ne \varvec{C}_n^{\text {i}}\Leftrightarrow \Delta \gamma > 0\)) if this decreases the potential. In order to decide whether the couple \((\Delta \gamma ,\tilde{{\varvec{N}}}^\text {s})\) is activated for a given state \((\varvec{F},\varvec{C}^{\text {i}}_\text {r},\xi ,\theta )\), we evaluate the algorithmic yield condition

(57)

If \(f>0\), \(\Delta \gamma \) is activated, i.e., \((\Delta \gamma ,\tilde{{\varvec{N}}}^\text {s})\) are put into the active set. Otherwise, \(\Delta \gamma = 0\) minimizes \(\pi _{\Delta }\). In the case where \((\Delta \gamma ,\tilde{{\varvec{N}}}^\text {s})\) are part of the active set, the related equations and corresponding residuals are \(\partial \pi _{\Delta }/ \partial \Delta \gamma {\mathop {=}\limits ^{!}}0\) and \(\partial \pi _{\Delta }/ \partial \tilde{{\varvec{N}}}^\text {s} {\mathop {=}\limits ^{!}}\varvec{0}\) (see Eq. (56)). For a given \(\varvec{F},\xi \) and \(\theta \), we obtain the explicit form of f by variation of \(\pi _{\Delta }\): \(\Delta \gamma = 0\) if \(\pi _{\Delta }(\varvec{F},\varvec{C}^{\text {i}},\xi ,\theta )\) is minimized by \(\varvec{C}^{\text {i}}= \varvec{C}^{\text {i}}_\text {r}\). In other words, \(\Delta \gamma = 0\) is a minimizer of \(\pi _{\Delta }\) if

(58)

with the shorthand notation

(59)

for an arbitrary and \(\delta \gamma \ge 0\). Further, taking a closer look at the variation of \(\pi _{\Delta }\), we obtain (compare Appendix B)

(60)

Using the definition of the unimodular projector in Eq. (52), we obtain

$$\begin{aligned} \delta \pi _{\Delta }{\mathop {=}\limits ^{(52)}} \Bigg (\underbrace{\left( 2\frac{\partial \psi }{\partial \varvec{C}^{\text {i}}}\Bigr |_{{\varvec{C}}^{\text {i}}= {\varvec{C}}^\text {i}_\text {r}} - \frac{1}{3}\left( \varvec{C}^{\text {i}}_\text {r}:2\frac{\partial \psi }{\partial \varvec{C}^{\text {i}}}\Bigr |_{{\varvec{C}}^{\text {i}}= {\varvec{C}}^\text {i}_\text {r}}\right) \varvec{C}^{\text {i}- 1}_\text {r}\right) }_{=:\,{\text {DEV}}_{{\varvec{C}}^\text {i}_\text {r}}\left( 2\frac{\partial \psi }{\partial {\varvec{C}}^{\text {i}}}\bigr |_{{\varvec{C}}^{\text {i}}= {\varvec{C}}^\text {i}_\text {r}}\right) \,=\, {\varvec{U}}^{\text {i}- 1}_\text {r}\left( 2 {\varvec{U}}^{\text {i}}_\text {r}\frac{\partial \psi }{\partial {\varvec{C}}^{\text {i}}}\Bigr |_{{\varvec{C}}^{\text {i}}= \varvec{C}^\text {i}_\text {r}}{\varvec{U}}^{\text {i}}_\text {r}\right) ^\prime {\varvec{U}}^{\text {i}- 1}_\text {r}} : {\varvec{N}^{\text {s}}}+ \sqrt{\tfrac{2}{3}}\sigma ^{\text {reo}}\Bigg ) \delta \gamma . \end{aligned}$$
(61)

With the definitions of \({\varvec{N}^{\text {s}}}\) in Eq. (59) and \({\text {DEV}}_{{\varvec{C}}^\text {i}_\text {r}}(\bullet )\) in Eq. (61) as well as making use of the fact that \(\varvec{U}^{\text {i}- 1}_n\) and \(\varvec{C}^{\text {i}- 1}_\text {r}\) are coaxial, we can simplify Eq. (60) to

(62)

Now, since inequality (58) must hold for arbitrary symmetric , we get

$$\begin{aligned} \delta \pi _{\Delta }\ge - \left( {\left\| {\text {DEV}}_{{\varvec{C}}^\text {i}_\text {r}}\left( 2\frac{\partial \psi }{\partial \varvec{C}^{\text {i}}}\Bigr |_{{\varvec{C}}^{\text {i}}= {\varvec{C}}^\text {i}_\text {r}}\right) \right\| }_{{\varvec{U}}^{\text {i}}_n} -\sqrt{\tfrac{2}{3}}\sigma ^{\text {reo}}\right) \delta \gamma \ge 0 \ \ \ \forall \delta \gamma \ge 0 . \end{aligned}$$
(63)

Thus, the algorithmic yield criterion reads

$$\begin{aligned} f = {\left\| {\text {DEV}}_{{\varvec{C}}^\text {i}_\text {r}}\left( 2\frac{\partial \psi }{\partial \varvec{C}^{\text {i}}}\Bigr |_{{\varvec{C}}^{\text {i}}= {\varvec{C}}^\text {i}_\text {r}}\right) \right\| }_{{\varvec{U}}^{\text {i}}_n} - \sqrt{\tfrac{2}{3}}\sigma ^{\text {reo}}\le 0. \end{aligned}$$
(64)

However, note that for \(\Delta \xi =0\), and thus \(\varvec{C}^{\text {i}}_\text {r}= \varvec{C}_n^{\text {i}}\) (see Eq. (40)), we have a simplified yield criterion

$$\begin{aligned} \begin{aligned} f&= \left\| \varvec{R}^{\text {i}}_n\left( 2 \varvec{U}^{\text {i}}_n\frac{\partial \psi }{\partial \varvec{C}^{\text {i}}}\Bigr |_{{\varvec{C}}^{\text {i}}= \varvec{C}_n^{\text {i}}} \varvec{U}^{\text {i}}_n\right) ^\prime \varvec{R}^{\text {i}\mathsf T}_n \right\| - \sqrt{\tfrac{2}{3}}\sigma ^{\text {reo}}\\&{\mathop {=}\limits ^{(19)}} \left\| \left( {\varvec{\Sigma }}^{\text {e,tr}}- {\varvec{\Sigma }}^{\text {h,tr}}\right) ^\prime \right\| - \sqrt{\tfrac{2}{3}}\sigma ^{\text {reo}}, \end{aligned} \end{aligned}$$
(65)

where the superscript ’\({\text {tr}}\)’ denotes the trial state and \(\varvec{R}^{\text {i}}= \varvec{F}^{\text {i}}\varvec{U}^{\text {i}- 1}\). This result again shows the consistency with the time-continuous theory (see Eq. (33)).
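For illustration, the simplified criterion in Eq. (65) can be evaluated for a uniaxial driving stress. The sketch below is not taken from the implementation of this work; the stress values and the name `sigma_driving` (standing for the difference of the elastic and hardening trial stresses) are assumptions.

```python
import numpy as np

def deviator(S):
    """Deviatoric part of a second-order tensor."""
    return S - np.trace(S) / 3.0 * np.eye(3)

def yield_function(sigma_driving, sigma_reo):
    """f = ||dev(sigma_driving)|| - sqrt(2/3) * sigma_reo, cf. Eq. (65)."""
    return np.linalg.norm(deviator(sigma_driving)) - np.sqrt(2.0 / 3.0) * sigma_reo

sigma_reo = 100.0  # hypothetical reorientation stress in MPa

# Uniaxial driving stresses below and above the reorientation stress.
f_below = yield_function(np.diag([80.0, 0.0, 0.0]), sigma_reo)
f_above = yield_function(np.diag([120.0, 0.0, 0.0]), sigma_reo)
```

For a uniaxial stress \(\sigma \), the Frobenius norm of the deviator equals \(\sqrt{2/3}\,|\sigma |\), so the factor \(\sqrt{2/3}\) makes \(\sigma ^{\text {reo}}\) directly comparable to a uniaxial stress: here `f_below` is negative (no reorientation) and `f_above` is positive (reorientation is activated).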

3.4 Algorithmic transformation criteria

In analogy to Sect. 3.3, the activation criteria for forward (’\(\text {M}\)’, \(\Delta \xi \ge 0\)) and reverse (’\(\text {A}\)’, \(\Delta \xi <0\)) transformations read

$$\begin{aligned} g_\text {M}= -\frac{\partial }{\partial \xi }\pi _{\Delta }\Bigr |_{\Delta \xi \rightarrow 0^+}, \ \ \ \ g_\text {A}= -\frac{\partial }{\partial \xi }\pi _{\Delta }\Bigr |_{\Delta \xi \rightarrow 0^-}, \end{aligned}$$
(66)

where \(0^+\) and \(0^-\) denote the right- and left-hand limit, respectively (recall that \(\text {sg}\left( \Delta \xi \right) \) is an argument of Q in Eq. (41)).
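The one-sided limits in Eq. (66) can be made concrete with a scalar toy problem. In the sketch below, the free energy is replaced by a linear function with a constant driving force `q`, and the transformation part of \(\phi _{\Delta }\) follows the structure of Eq. (42); all names and numbers are assumptions made for illustration.

```python
def incremental_potential(xi, xi_n, q, q_m0, h_m, q_a0, h_a):
    """Toy potential: linear energy -q*xi plus the kinked dissipation of Eq. (42)."""
    d_xi = xi - xi_n
    psi = -q * xi
    if d_xi >= 0.0:
        phi = q_m0 * d_xi + 0.5 * h_m * d_xi**2
    else:
        phi = q_a0 * d_xi + 0.5 * h_a * d_xi**2
    return psi + phi

def one_sided_criteria(xi_n, step=1e-7, **p):
    """Finite-difference estimates of g_M and g_A as one-sided limits, cf. Eq. (66)."""
    pi_0 = incremental_potential(xi_n, xi_n, **p)
    g_m = -(incremental_potential(xi_n + step, xi_n, **p) - pi_0) / step
    g_a = -(pi_0 - incremental_potential(xi_n - step, xi_n, **p)) / step
    return g_m, g_a

# Hypothetical values with the driving force inside the hysteresis.
g_m, g_a = one_sided_criteria(0.3, q=5.0, q_m0=8.0, h_m=1.0, q_a0=2.0, h_a=1.0)
```

In this toy setting the limits reduce to \(g_\text {M}= q - Q^{\text {M}0}\) and \(g_\text {A}= q - Q^{\text {A}0}\), which mirrors the structure of the time-continuous criteria in Eq. (33). The two values differ because the potential has a kink at \(\Delta \xi = 0\).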

3.5 Initial guess for \(\tilde{{\varvec{N}}}^\text {s}\)

In general, the direction of inelastic flow is determined by the minimization of \(\pi _{\Delta }\) through a Newton scheme, which requires a reasonable initial guess when \(\Delta \gamma \) is activated. Fortunately, an analytical solution for \(\tilde{{\varvec{N}}}^\text {s}\) exists in the limit \(\Delta \gamma \rightarrow 0\):

(67)

The proof is given in Appendix B.

It is noted that for \(\Delta \gamma = 0\), it follows that \(\varvec{C}^{\text {i}}= \varvec{C}^{\text {i}}_\text {r}\), which is independent of \(\tilde{{\varvec{N}}}^\text {s}\). That means that any choice of \(\tilde{{\varvec{N}}}^\text {s}\) minimizes \(\pi _{\Delta }\), i.e., there is no unique solution. For very small \(\Delta \gamma \), the sensitivity of \(\pi _{\Delta }\) with respect to \(\tilde{{\varvec{N}}}^\text {s}\) is also small, which can render the nonlinear equation system that needs to be solved to minimize \(\pi _{\Delta }\) ill-conditioned. To stabilize the solution process, the last term in Eq. (55) is added to \(\pi _{\Delta }\), since for \(\Delta \gamma \rightarrow 0\) it is known that the solution in Eq. (67) minimizes \(\pi _{\Delta }\). In theory, \(\epsilon \) can be chosen arbitrarily small; in practice, a finite value is necessary due to the limited numerical accuracy. Thus, the last term in Eq. (55) is used to ’guide’ the algorithm toward the right solution when it is no longer able to find it by itself.

3.6 Computing derivatives of \(\Delta \alpha \) for \(\Delta \gamma \rightarrow 0\)

The derivatives of \(\Delta \alpha \) are numerically difficult to obtain for \(\Delta \gamma \rightarrow 0\), due to an almost vanishing denominator of \(\tilde{\varvec{M}}_{{\varvec{C}}^{\text {i}}}\) (see Eq. (112) and preceding equations). To overcome this issue, we derived the derivatives of \(\Delta \alpha \) with respect to \(\xi \), \(\tilde{{\varvec{N}}}^\text {s}\) and \(\Delta \gamma \) separately for \(\Delta \gamma \rightarrow 0\), which are then numerically feasible to obtain. However, this seems to be a rather theoretical issue, as we never observed \(\Delta \gamma \) being too close to zero to obtain the derivatives in the usual way shown in Appendix C. Therefore, and for brevity, we do not discuss them here.

3.7 Regularization of \(\psi _\text {h}\)

Numerically, \(\psi _\text {h}\) is challenging, since for \(\big \langle \varvec{E}^{\text {t}}\big \rangle \rightarrow 1\), \(\psi _\text {h}\) and \(\partial \psi _\text {h}/ \partial \big \langle \varvec{E}^{\text {t}}\big \rangle \) go to infinity. This is problematic, because values of \(\big \langle \varvec{E}^{\text {t}}\big \rangle \ge 1\) can occur during the iterations of the Newton scheme, leading to a loss of convergence. The term (compare to Eq. (21))

$$\begin{aligned} \varphi (\big \langle \varvec{E}^{\text {t}}\big \rangle ) = \frac{\big \langle \varvec{E}^{\text {t}}\big \rangle ^2}{1-\big \langle \varvec{E}^{\text {t}}\big \rangle ^4} \end{aligned}$$
(68)

is illustrated in Fig. 2 in black.

Fig. 2 Linear approximation of \(\psi _\text {h}\)

Fig. 3 Penalty function penalizing nonphysical states

To stabilize the solution process, we partially replace the function \(\varphi \) by a linear approximation in the region where \(\big \langle \varvec{E}^{\text {t}}\big \rangle >c\), where c is close to 1 (see the green curve in Fig. 2). This regularization approach is adopted from crystal plasticity, where it successfully improved the numerical treatment of the power law in Wulfinghoff and Böhlke [52]. In order to prevent \(\big \langle \varvec{E}^{\text {t}}\big \rangle \) from taking nonphysical values beyond 1, we add a penalty-type energy as illustrated in Fig. 3. We penalize all states \((\big \langle \varvec{E}^{\text {i}}\big \rangle ,\xi )\) below the dashed half-line starting at \(\xi _0\) (see Fig. 3) by the energy

$$\begin{aligned} \psi _\text {p}= {\left\{ \begin{array}{ll} \frac{1}{2}H_\text {p}l_\text {F}^2 \ \ &{} \xi < \frac{\big \langle \varvec{E}^{\text {i}}\big \rangle }{c} + \xi _0 \\ 0 &{} \text { else} \end{array}\right. } \end{aligned}$$
(69)

where \(l_\text {F}\) is the minimum distance of the point \((\big \langle \varvec{E}^{\text {i}}\big \rangle ,\xi )\) from the half-line and \(H_\text {p}\) is a large penalty parameter.
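The linear replacement of \(\varphi \) from Eq. (68) beyond the threshold c can be sketched as follows. The sketch assumes a tangent continuation at c, which keeps the function and its first derivative continuous; whether the approximation in Fig. 2 is constructed exactly this way is not stated here, and the value of c is hypothetical.

```python
def phi_exact(x):
    """phi(x) = x^2 / (1 - x^4), singular at x = 1, cf. Eq. (68)."""
    return x**2 / (1.0 - x**4)

def dphi_exact(x):
    """Analytical derivative of phi: (2x + 2x^5) / (1 - x^4)^2."""
    return (2.0 * x + 2.0 * x**5) / (1.0 - x**4) ** 2

def phi_regularized(x, c=0.95):
    """Exact phi up to c, tangent line beyond c (assumed form of the linearization)."""
    if x <= c:
        return phi_exact(x)
    return phi_exact(c) + dphi_exact(c) * (x - c)

# The regularized function stays finite at and beyond x = 1.
value_at_one = phi_regularized(1.0)
value_beyond = phi_regularized(1.05)
```

With this continuation, a Newton iterate that temporarily overshoots to values of 1 or larger still yields finite energies and slopes, while the separate penalty energy pushes the converged solution back into the admissible range.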

3.8 Viscosity of the martensite volume fraction \(\xi \)

To further improve the numerical robustness, we added an artificial viscosity-like term for the martensite volume fraction. Since it is a dissipative term, we add this term to the discretized dissipation potential:

$$\begin{aligned} \phi _{\Delta \eta } = \frac{1}{2 \Delta t} \eta _\xi \Delta \xi ^2, \end{aligned}$$
(70)

where \(\eta _\xi \) is a small, positive constant.

3.9 Thermomechanical coupling

Like the displacements, the temperature is an unknown field in our model. The two fields are coupled through the thermomechanical energy and through the dissipative terms, which heat the material. Additionally, the temperature changes the material behavior and leads to thermal strains. The coupling is treated using a potential-based monolithic approach similar to the one presented in Yang et al. [54]. Further, we assume Fourier’s heat conduction law with heat conductivity \(\kappa \). The thermomechanical quasistatic time-discrete potential then reads

$$\begin{aligned} \pi _{\Delta }= \psi - (u_n - \theta s_n) + \frac{\theta }{\theta _n}\phi _{\Delta }- \frac{\Delta t}{2\theta _n}\kappa \left\| \text {Grad\!}\left( \theta \right) \right\| ^2 + \Delta t \frac{\theta }{\theta _n^2}\kappa \left\| \text {Grad\!}\left( \theta _n \right) \right\| ^2, \end{aligned}$$
(71)

where we neglect the influence of body forces. The integral form \(\Pi _{\Delta }\) of the local potential \(\pi _{\Delta }\) is then given by

$$\begin{aligned} \Pi _{\Delta }= \int \limits _{V_0} \pi _{\Delta }\text {d}V - \int \limits _{\partial V_{0\text {t}}} {\bar{\hat{\varvec{t}}}} \cdot \varvec{u}\text {d}S - \int \limits _{\partial V_{0\text {Q}}} \frac{\Delta t}{\theta _n} \bar{Q} \theta \text {d}S + \int \limits _{V_0} \frac{\Delta t}{\theta _n}\theta w \text {d}V , \end{aligned}$$
(72)

where w denotes the heat source and \(\bar{\hat{\varvec{t}}}\) as well as \(\bar{Q}\) denote prescribed tractions and normal heat flux at the Neumann-type boundaries \(\partial V_{{0\text {t}}}\) and \(\partial V_{0\text {Q}}\), respectively. Now, the classical weak form of the linear momentum balance is obtained by variation of Eq. (72) with respect to \(\varvec{u}\):

$$\begin{aligned} \delta _{\varvec{u}} \Pi _{\Delta }= \int \limits _{V_0} {\varvec{\tau }}: \varvec{d}_\delta \text {d}V - \int \limits _{\partial V_{0\text {t}}} \bar{\hat{\varvec{t}}} \cdot \delta \varvec{u}\text {d}S {\mathop {=}\limits ^{!}} 0, \end{aligned}$$
(73)

with \(\varvec{d}_\delta = \text {sym}( \text {Grad\!}\left( \delta \varvec{u} \right) \varvec{F}^{\text {-1}} )\). Likewise, we obtain the weak form of the energy balance by variation of Eq. (72) with respect to \(\theta \):

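$$\begin{aligned} \delta _\theta \Pi _{\Delta }= \int \limits _{V_0}\left[ \left( \frac{\partial \psi }{\partial \theta } + s_n + \frac{1}{\theta _n}\phi _{\Delta }+ \frac{\kappa \Delta t}{\theta _n^2}\left\| \text {Grad\!}\left( \theta _n \right) \right\| ^2 + \frac{\Delta t}{\theta _n} w \right) \delta \theta - \frac{\kappa \Delta t}{\theta _n}\text {Grad\!}\left( \theta \right) \cdot \text {Grad\!}\left( \delta \theta \right) \right] \text {d}V - \int \limits _{\partial V_{0\text {Q}}} \frac{\Delta t}{\theta _n} \bar{Q} \delta \theta \text {d}S {\mathop {=}\limits ^{!}} 0. \end{aligned}$$
(74)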

Subsequently, we find \(\varvec{u}, \theta , \xi , \varvec{C}^{\text {i}}\) by finding the solution to the saddle point problem

$$\begin{aligned} (\varvec{u}, \theta , \xi , \varvec{C}^{\text {i}}) = \arg \inf \limits _{\varvec{u} \in \kappa _{\varvec{u}}} \sup \limits _{\theta \in \kappa _\theta } \inf \limits _{\varvec{z}} \Pi _{\Delta }, \end{aligned}$$
(75)

where \(\kappa _{{\varvec{u}}} = \{\varvec{u}: \varvec{u}= {\bar{\varvec{u}}}\ \text {on}\ \partial V_{0\text {u}}\}\) is the set of admissible displacements satisfying the Dirichlet boundary conditions imposed on the boundary \(\partial V_{0\text {u}}\) and \(\kappa _\theta = \{\theta : \theta = {\bar{\theta }}\ \text {on}\ \partial V_{0\theta }\}\) is the set of admissible temperatures satisfying the Dirichlet boundary conditions imposed on the boundary \(\partial V_{0\theta }\). Additionally, \(\varvec{z}\) is the vector of internal variables, i.e., \(\varvec{z}= (\xi ,\varvec{C}^{\text {i}})\).

Manipulating Eq. (74) using the definition \(s = -\partial \psi / \partial \theta \) as well as applying Gauss's theorem, we obtain

$$\begin{aligned} \delta _\theta \Pi _{\Delta }=&\int \limits _{V_0}\left( -\Delta s+\frac{1}{\theta _n}\phi _{\Delta }+ \frac{\kappa \Delta t}{\theta _n} \text {Div\!}\left( \text {Grad\!}\left( \theta \right) \right) -\frac{\kappa \Delta t}{\theta _n^2}\text {Grad\!}\left( \theta \right) \cdot \text {Grad\!}\left( \theta _n \right) + \frac{\kappa \Delta t}{\theta _n^2}\left\| \text {Grad\!}\left( \theta _n \right) \right\| ^2\right. \nonumber \\&\left. +\frac{\Delta t}{\theta _n} w \right) \delta \theta \text {d}V + \int \limits _{\partial V_{0\text {Q}}} -\Delta t \kappa \frac{\text {Grad\!}\left( \theta \right) \cdot \varvec{N}}{\theta _n}\delta \theta \text {d}S - \int \limits _{\partial V_{0\text {Q}}} \frac{\Delta t}{\theta _n} \bar{Q} \delta \theta \text {d}S = 0, \end{aligned}$$
(76)

where \(\varvec{N}\) is the outward normal on the boundary \(\partial V_0\) in the reference configuration. To show consistency with the time-continuous theory, we take a look at the integrand of the volume integral in Eq. (76). Multiplying it by \(\theta _n/\Delta t\), we get

$$\begin{aligned} \theta _n \frac{\Delta s}{\Delta t} = \frac{1}{\Delta t} \phi _{\Delta }- \text {Div\!}\left( -\kappa \text {Grad\!}\left( \theta \right) \right) + w + \frac{\kappa }{\theta _n} \text {Grad\!}\left( \theta _n \right) \cdot (\text {Grad\!}\left( \theta _n \right) - \text {Grad\!}\left( \theta \right) ). \end{aligned}$$
(77)

Here, we want to note that this requires \(\kappa \) to be independent of the temperature \(\theta \). However, one possibility is to consider \(\kappa =\kappa (\theta _n)\), which could circumvent this limitation.

Now, for \(\Delta t\rightarrow 0\), we get (also see Eq. (35))

$$\begin{aligned} \theta \dot{s} = \phi - \text {Div\!}\left( -\kappa \text {Grad\!}\left( \theta \right) \right) + w. \end{aligned}$$
(78)

Hence, we arrived at the energy balance (Eq. (35)) with Fourier’s law \(\varvec{Q}= - \kappa \text {Grad\!}\left( \theta \right) \), which proves the consistency with the time-continuous theory. Further, the surface integrals in Eq. (76) imply the Neumann boundary condition \(\bar{Q} = -\kappa \text {Grad\!}\left( \theta \right) \cdot \varvec{N}\) on \(\partial V_{0\text {Q}}\). Alternatively, one can include Robin-type boundary conditions into the model by replacing the integrand \(\Delta t \bar{Q} \theta /\theta _n\) in Eq. (72) by \(1/2 \Delta t h(\theta -\theta _\text {s})^2/\theta _n\), where h is the heat convection coefficient and \(\theta _\text {s}\) is the temperature of the surrounding medium. In that case, the variation of \(\Pi _{\Delta }\) yields the boundary condition \(-\kappa \text {Grad\!}\left( \theta \right) \cdot \varvec{N}= h (\theta - \theta _\text {s})\) on \(\partial V_{0\text {Q}}\).

3.10 Active set search

When solving the set of equations (56), one has to decide which variables will evolve using the activation criteria given in Eqs. (64) and (66). These variables are then put into the active set. If, at a later stage, an activation criterion is no longer fulfilled, the corresponding variable is removed from the active set again. This is done using the active set search algorithm outlined in Algorithm 1.

Algorithm 1 Active set search

Algorithm 2 Determination of the variable to activate

The algorithm is structured as follows: First, if or no variable from the active set was rescaled (see the end of this subsection for an explanation of rescaling) during the last iteration, we need to evaluate the activation criteria. With them at hand, we determine our new active set. The details for determining which variable to activate are given in Algorithm 2. Subsequently, we check if the active set changed from the last iteration. If this is not the case, the solution from the last iteration is confirmed as the solution of the equation system. In that case, we compute the algorithmic tangent (see Appendix E for details), save the history variables and exit the material routine. Otherwise, we solve the minimization problem for the current active set using a Newton scheme.

Within the Newton scheme, we first compute the residual and stiffness for the active set, i.e., only derivatives with respect to the active variables are computed. If the maximum number of iterations is exceeded or the norm of the residual is lower than the tolerance, we exit the loop. Otherwise, we check if updating \(\xi ^{j+1}\) would result in a negative \(\xi \) when applying the Newton step. If this is the case, we rescale, i.e., multiply the increments by a scalar such that \(\xi ^\star _{k+1}\) is set to \(\xi ^\text { hard}_\text { min}\), which is \({5\times 10^{-5}}\) in this work. Subsequently, we compute \(\Delta \xi ^\star _{k+1}\), \(\Delta \Delta \gamma ^\star _{j+1}\) and the increment of \(\tilde{{\varvec{N}}}^\text {s}\). In fact, we do not update \(\xi ^{j+1}\), \(\Delta \gamma ^{j+1}\) or \(\tilde{{\varvec{N}}}^\text {s}\) yet. We update them only after exiting the Newton scheme and being sure that \(\xi \) incremented in a direction matching \(g_\text {A}\) and \(g_\text {M}\), which were calculated before the Newton loop. If, due to a bad starting solution, this is not the case, we rescale the Newton step such that \(\xi ^{j+1}=\xi _n\). If this happens, we make sure that \(\xi \) is not activated again in the next iteration of the active set algorithm.
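The rescaling of a Newton step that would drive \(\xi \) below zero can be sketched as follows. This is a simplified illustration, not the implementation used in this work: the function name and signature are assumptions, all increments are treated as scalars, and the current \(\xi \) is assumed not to lie below the hard minimum.

```python
XI_HARD_MIN = 5e-5  # hard lower bound for xi as quoted in the text

def rescale_newton_step(xi, d_xi, other_increments, xi_min=XI_HARD_MIN):
    """Scale all increments by one common factor if xi + d_xi would be negative.

    Returns the (possibly scaled) increment of xi, the (possibly scaled) other
    increments and a flag telling whether rescaling took place.
    Assumes xi >= xi_min, so the scaling factor lies in [0, 1).
    """
    if xi + d_xi >= 0.0:
        return d_xi, list(other_increments), False
    # Choose the factor such that the updated xi lands exactly on xi_min.
    scale = (xi_min - xi) / d_xi
    return scale * d_xi, [scale * inc for inc in other_increments], True

# Hypothetical Newton step that would overshoot to a negative martensite fraction.
xi_old = 0.02
d_xi_new, others_new, rescaled = rescale_newton_step(xi_old, -0.05, [0.004])
xi_new = xi_old + d_xi_new
```

Using one common factor for all increments keeps the direction of the Newton step unchanged and only shortens it, which is the behavior described above for the rescaled update.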

4 Numerical results

In this section, the previously presented model is tested for different examples. First, to test the model’s time convergence behavior and to show the superelastic as well as martensite reorientation behavior at high and low temperatures, respectively, thermomechanical Gauss point evaluations are conducted. Finally, a full actuator model is simulated. To cope with the thin structures occurring in the actuator, the SMA model at hand is embedded into a hexahedral element formulation with reduced integration and hourglass stabilization for the displacement, while the heat conduction terms are fully integrated.

4.1 Gauss point evaluations

For the Gauss point evaluations, thermal expansion as well as the transformation-induced volume change, which is discussed in Sect. 2.3.4, are neglected for simplicity. The simulations are conducted using a reduced integration hexahedral element with hourglass stabilization, which is embedded into the finite element program FEAP [46]. The material constants used in the Gauss point evaluations are given in Table 1.

Table 1 Material parameters

Here, it is noted that \(\theta _{\text {refA}}\) and \(\theta _{\text {refM}}\) are chosen such that \(\Delta V = 0.47\%\) (see Eq. (23)), matching the findings reported in Potapov et al. [36]. The numerical parameters are summarized in Table 2.

Table 2 Numerical parameter set 1 for simulation
Fig. 4 Tensile test for 49 and 40,000 time steps. No error due to large load steps is visible

Table 3 Numerical parameter set 2, only for comparison in Fig. 5

Figure 4 shows a tensile test at a temperature of \({270}\,^{\circ }{\hbox {C}}\) for 49 and 40,000 time steps.

Fig. 5 Investigation of the influence of the numerical parameters (set 1 is given in Table 2, set 2 in Table 3)

Clearly, convergence with regard to time step width is not an issue, as both time step widths yield accurate results.

To demonstrate that the model results indeed do not depend on the numerical parameters, we compare results for two different sets of numerical parameters in a tensile test. The first set of numerical parameters is the one given in Table 2, which is used in the remainder of this paper. The second set is defined in Table 3 just for this comparison. The results are shown in Fig. 5. Clearly, the numerical parameters do not have a noticeable effect on the results.

Next, we modeled tensile tests at \({50}\,^{\circ }{\hbox {C}}\), \({100}\,^{\circ }{\hbox {C}}\), \({160}\,^{\circ }{\hbox {C}}\) and \({200}\,^{\circ }{\hbox {C}}\). The resulting stress–strain curves are shown in Fig. 6.

Fig. 6 Tensile test at various temperatures

For low temperatures, i.e. \({50}\,^{\circ }{\hbox {C}}\) and \({100}\,^{\circ }{\hbox {C}}\), the model predicts a remaining martensite reorientation. On the other hand, for rather high temperatures, the model captures superelastic material behavior when unloading. However, we want to emphasize that these results do not consider any damage or plastic deformations which might already occur.

Finally, the model captures the shape memory effect well. This is shown in Fig. 7, where we applied different stresses (160 MPa, 300 MPa, 400 MPa and 500 MPa) and then started a thermal cycle.

Fig. 7 Thermal cycling tests with different stresses (160 MPa, 300 MPa, 400 MPa and 500 MPa)

For \({160}\,{\hbox {MPa}}\), one obtains only twinned martensite, i.e., \(\varvec{E}^{\text {t}}= \varvec{0}\) in the context of this work. Therefore, one only gets a small hysteresis due to the different elastic constants of austenite and martensite. For the larger prestresses, the austenite is transformed into detwinned martensite, i.e., \(\big \langle \varvec{E}^{\text {t}}\big \rangle \ne 0\). This leads to the typical shape memory effect.

4.2 Plate with a hole

In this example, we simulate a plate with a cylindrical hole (see Fig. 8), which is at first loaded by a traction \(\bar{\hat{\varvec{t}}}\), then unloaded and subsequently heated, which lets it recover the initial shape.

Fig. 8

Sketch of the plate with a hole subjected to tension (dimensions in \({\hbox {mm}}\))

The material and numerical parameters are unchanged from Sect. 4.1, except that the reference temperatures are now \(\theta _{\text {refA}}={80}\,^{\circ }{\hbox {C}}\) and \(\theta _{\text {refM}} = {87.66}\,^{\circ }{\hbox {C}}\). Exploiting the symmetry of the problem with appropriate symmetry boundary conditions, only one-eighth of the plate is simulated.

During the loading by the traction, the temperature at the upper and lower end is held constant at the initial temperature of \(\bar{\theta }= {80}\,^{\circ }\hbox {C}\) (\(M_{\text {s}}<\bar{\theta }<A_{\text {s}}\)). Additionally, in the initial state, the plate is fully austenitic. First, the traction \(\bar{\hat{\varvec{t}}}\) is increased linearly to a maximum of \({330}\,\hbox {MPa}\) in the longitudinal direction over a duration of \({50}\,{\text {s}}\). Subsequently, the traction \(\bar{\hat{\varvec{t}}}\) is decreased to zero over a duration of \({50}\,{\text {s}}\). Finally, the temperature at both ends of the plate is increased to \({200}\,^{\circ }\hbox {C}\) over the course of \({20}\,{\text {s}}\).
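The loading protocol described above can be sketched as two piecewise-linear functions of time. This is only an illustration: the function and array names are hypothetical, and the linear shape of the temperature ramp is an assumption (the text states linearity only for the traction increase).

```python
import numpy as np

# Breakpoints of the protocol: load (0-50 s), unload (50-100 s), heat (100-120 s).
TIMES = np.array([0.0, 50.0, 100.0, 120.0])              # s
TRACTION = np.array([0.0, 330.0, 0.0, 0.0])              # MPa
END_TEMPERATURE = np.array([80.0, 80.0, 80.0, 200.0])    # deg C (linear ramp assumed)


def traction(t):
    """Prescribed longitudinal traction in MPa at time t (in s)."""
    return np.interp(t, TIMES, TRACTION)


def end_temperature(t):
    """Prescribed temperature in deg C at both plate ends at time t (in s)."""
    return np.interp(t, TIMES, END_TEMPERATURE)


if __name__ == "__main__":
    for t in (0.0, 25.0, 50.0, 100.0, 110.0, 120.0):
        print(f"t = {t:5.1f} s: traction = {traction(t):6.1f} MPa, "
              f"end temperature = {end_temperature(t):6.1f} deg C")
```

Such a table-based description is convenient because the same breakpoints can be reused when reading off events in the displacement history, e.g., the release of the traction at 50 s and the start of heating at 100 s.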

The mesh used in the eighth of the plate consists of 2907 uniformly distributed elements and is shown in Fig. 9.

Fig. 9

Mesh of the plate with a hole with 2907 elements

Fig. 10

Temperature and martensite volume fraction at different loading stages (see text)

For the entire simulation, 293 time steps were required, taking a total CPU time of \({2524}\,{\text {s}}\).

At first, during the loading, almost the entire plate is transformed to martensite. The plate at maximum traction is shown in Fig. 10a. Here, the \(E_{xx}\) strain at the edge of the hole reaches roughly up to 18% (see Fig. 11).

Fig. 11

Strain \(E_{xx}\) at max. traction (\(t={50}\,{\text {s}}\))

Additionally, the plate heats up slightly due to the latent heat and mechanical dissipation (Fig. 10a). However, most of the heat is conducted out of the plate at the temperature Dirichlet boundaries due to the small size of the plate and the long simulated time (see footnote 5). Subsequently, after unloading, the plate does not transform back, which is shown in Fig. 10b. Finally, when increasing the temperature at both ends, the martensite is transformed back to austenite, leading to a recovery of the initial shape (see Fig. 10c). Further, Fig. 12 shows the temperature and displacement of point A (see Fig. 8) over time. Here, the increase in slope of the displacement at roughly \({15}\,{\text {s}}\) marks the onset of the forward transformation in the vicinity of the hole. The second change in slope of the displacement at roughly \({36}\,{\text {s}}\) marks the point at which the rest of the plate starts to undergo forward transformation. At \({50}\,{\text {s}}\), when releasing the traction, the displacements decrease. Then again, due to thermal expansion, the displacement increases slightly after the temperature at the ends is increased at \({100}\,{\text {s}}\). However, when the temperature reaches the backward transformation temperatures, the material almost recovers the initial shape. The nonzero displacement at \(t={120}\,{\text {s}}\) stems from thermal expansion and the \(\Delta V\) effect.

Overall, the examples have been chosen such that the stresses in the vicinity of the hole reach unphysical values, at which a real material is expected to show irrecoverable strains or even fracture. Thus, this simulation rather serves as an example showing that the implementation is able to find a solution, even under severe loads.

4.3 Finite element actuator model

In this section, we model a bistable shape memory microactuator using the finite element program FEAP [46]. The actuator concept was published by Winzek et al. [50] for large structures and is built with three layers: a bottom layer of molybdenum, a middle layer of NiTiHf and a top layer of polymethyl methacrylate (PMMA) (Winzek et al. [50] used PMMA layers on both sides). For simplicity, a hexahedral element formulation with reduced integration and hourglass stabilization for the displacements is used for all materials. The material and numerical parameters for the shape memory alloy remain unchanged (see Tables 1 and 2).

4.3.1 Actuating principle

The actuator works by interlacing the large hysteresis of NiTiHf with the polymer’s hysteresis. This is shown in Fig. 13.

Fig. 12

Temperature and displacement at point A (see Fig. 8) over time

Fig. 13

Schematic of the interlaced hysteresis of the NiTiHf-molybdenum films and the polymer. Dashed line for the polymer stiffness, solid line for the SMA film stress

Fig. 14

A bimorph with SMA and molybdenum (left) in comparison to the bistable actuator with an additional layer of PMMA (right). The temperatures are indicated by different thermometer colorings

It is actuated by Joule heating to specific temperatures and cooling back to room temperature. Depending on the heat cycle, the martensite state or austenite state is held in place by the polymer. This is best understood by considering the following example, depicted in Fig. 14, where we neglected the CTE of PMMA for simplicity. Here, we compare a bimorph of SMA and molybdenum on the left to the proposed actuator on the right. When heating from ambient temperature (blue, Fig. 14a) to a temperature above \(A_{\text {f}}\) (red), the glass transition temperature \(\theta _\text {g}\) is reached first, which drastically decreases the stiffness of the PMMA. Subsequently, the SMA in the bimorph as well as in the bistable actuator reaches the reverse transformation temperatures, which makes both bend up (Fig. 14b). Now, upon cooling, the actuator reaches \(\theta _\text {g}\), which ‘freezes’ its current shape. Thus, when subsequently reaching the forward transformation temperatures, it will, unlike the bimorph, not revert to the original shape at room temperature (Fig. 14c). Now, we can heat the actuator to a temperature above \(\theta _\text {g}\) but below \(A_{\text {s}}\) (orange in Fig. 14d) to soften the polymer, which makes the actuator adopt the deformation of the bimorph. When cooling back to room temperature, the actuator is in its initial stable state again and the actuation cycle can be restarted. The two stable states at room temperature are shown in Figs. 14c and 14e.

For the actuation behavior, it is important to have a homogeneous temperature profile in the actuator. Therefore, a wing-shaped actuator developed in Arivanandhan et al. [2] is used to obtain rather homogeneous temperatures in the double beam cantilever.

Fig. 15

Zoom into the layers of the actuator (molybdenum in red, NiTiHf in blue and PMMA in orange)

To rapidly optimize the actuator’s properties, building and modeling only the beam section of the actuator turned out to be advantageous. Therefore, we simulated just the beam part with suitable boundary conditions instead of simulating the entire wing structure including the attached wafer material. This thin film actuator with a zoom into the layers is shown in Fig. 15, where the Dirichlet boundary condition for the displacements and the Robin boundary condition for the temperature are indicated as well.

4.3.2 Polymer and molybdenum model

Since the interest in modeling the actuator lies in the states at \(\theta _1\), \(\theta _2\) and \(\theta _3\) rather than in the states between them, we chose a very simple polymer model. It is governed by a thermally coupled viscoelastic Maxwell model for finite strains (Young’s modulus and Poisson’s ratio are \({500} \hbox { MPa}\) and 0.4, respectively), where the viscosity is high (\(10^{7} \hbox { MPa\,s}\)) at low temperatures and low (\({1} \hbox { MPa\,s}\)) at high temperatures. Therefore, it has almost no stiffness at high temperatures while it behaves almost elastically at low ones. Additionally, the glass transition temperature is given by \(\theta _\text {g}= {77}\,^{\circ } \hbox {C}\). The molybdenum is modeled with a thermally coupled Neo-Hookean elastic model. For the molybdenum, Young’s modulus, Poisson’s ratio, the thermal expansion coefficient as well as the reference temperature are \(E={65\times 10^{3}} {\hbox { MPa}}\), \(\nu = 0.31\), \(\alpha =5 \times 10^{-6}\,{\hbox {K}^{-1}}\) and \(\theta _{\text {ref}} = {500}\,^{\circ }{\hbox {C}}\), respectively.
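Why the polymer is nearly elastic below \(\theta _\text {g}\) and nearly stiffness-free above it can be illustrated with a strongly simplified one-dimensional, small-strain Maxwell element. This is not the finite-strain model used in this work: the relaxation time is taken as viscosity over Young’s modulus, the viscosity is assumed to switch sharply at \(\theta _\text {g}\), and all function names are hypothetical.

```python
import math

E_POLYMER = 500.0     # MPa, Young's modulus of the polymer
THETA_G = 77.0        # deg C, glass transition temperature
ETA_GLASSY = 1.0e7    # MPa s, viscosity at low temperatures
ETA_RUBBERY = 1.0     # MPa s, viscosity at high temperatures


def viscosity(theta):
    """Viscosity in MPa s; the sharp switch at theta_g is an assumption."""
    return ETA_GLASSY if theta < THETA_G else ETA_RUBBERY


def relaxation_time(theta):
    """Relaxation time in s of a 1D Maxwell element (eta / E)."""
    return viscosity(theta) / E_POLYMER


def retained_stress_fraction(theta, hold_time):
    """Fraction of the initial stress left after hold_time (s) at fixed strain."""
    return math.exp(-hold_time / relaxation_time(theta))


if __name__ == "__main__":
    for theta in (20.0, 100.0):
        print(f"theta = {theta:5.1f} deg C: "
              f"tau = {relaxation_time(theta):.3g} s, "
              f"retained after 180 s = {retained_stress_fraction(theta, 180.0):.3g}")
```

Under these assumptions, the relaxation time is \(2\times 10^{4}\) s below \(\theta _\text {g}\), far longer than an actuation cycle, and \(2\times 10^{-3}\) s above it, so stresses in the polymer persist at room temperature and vanish almost instantly once it is heated past \(\theta _\text {g}\).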

4.3.3 Mesh convergence

The actuator considered is shown in Fig. 15. It is clamped on the left side and has a length of 10 mm and a width of 5 mm. The layer thicknesses of the molybdenum and NiTiHf are for this work and , respectively. The polymer layer thickness is for now, before different layer thicknesses are compared in Sect. 4.3.4. For the initial conditions, we assume zero displacements as well as a temperature of \({500}\,^{\circ }{\hbox {C}}\), i.e., the temperature at which the actuator is thermally annealed in a flat state. Furthermore, we assume the material to be in its austenite state at \(t=0\), which directly implies that initially \(\varvec{C}^{\text {i}}=\varvec{I}\). Subsequently, it is cooled down to room temperature at \({20}\,^{\circ }{\hbox {C}}\), which bends the actuator due to the mismatch in the coefficients of thermal expansion and the difference in cell volume between the austenite and martensite phases.

The heating cycle is realized through applying a heat source in the Mo and NiTiHf material. The heat source magnitude is modeled by a sine function, which is cut off when below 0. It is given by

$$\begin{aligned} w_h = \max (2250(\sin {(0.03t)}-0.6),0)\,\hbox {mW mm}^{-3},\ \ \ w_l = \max (0.65\cdot 2250(\sin {(0.03t)}-0.6),0)\,\hbox {mW mm}^{-3} \end{aligned}$$

for the higher and lower heat cycles, respectively.
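The two heat source histories can be evaluated directly from the expression above. The following sketch simply codes the formula; the function names and the helper that computes the switch-on and switch-off times within the first period are hypothetical additions for illustration.

```python
import math

AMPLITUDE = 2250.0   # mW mm^-3, prefactor of the heat source
OMEGA = 0.03         # s^-1, angular frequency of the sine
OFFSET = 0.6         # cut-off level subtracted from the sine
LOW_SCALE = 0.65     # scaling factor of the lower heat cycle


def heat_source(t, scale=1.0):
    """Heat source in mW mm^-3 at time t (s); negative values are cut off."""
    return max(scale * AMPLITUDE * (math.sin(OMEGA * t) - OFFSET), 0.0)


def w_high(t):
    return heat_source(t)


def w_low(t):
    return heat_source(t, LOW_SCALE)


def heating_window():
    """Switch-on and switch-off times (s) within the first sine period."""
    t_on = math.asin(OFFSET) / OMEGA
    t_off = (math.pi - math.asin(OFFSET)) / OMEGA
    return t_on, t_off


if __name__ == "__main__":
    t_on, t_off = heating_window()
    t_peak = math.pi / (2.0 * OMEGA)
    print(f"source active from {t_on:.2f} s to {t_off:.2f} s")
    print(f"peak values: w_h = {w_high(t_peak):.1f}, w_l = {w_low(t_peak):.1f} mW mm^-3")
```

With the given constants, the source is active from roughly 21 s to 83 s within the first period of the sine and peaks at 900 mW mm\(^{-3}\) for the higher and 585 mW mm\(^{-3}\) for the lower cycle.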

At the top and at the bottom face of the actuator a Robin boundary condition is applied, which cools the thin film to room temperature over time. Due to their small areas, the heat transfer at the lateral faces is neglected. The surrounding air’s temperature is \({20}\,^{\circ }{\hbox {C}}\) while the convective heat transfer coefficient is assumed to be \({70}\hbox { W m}^{-2}\,\hbox { K}^{-1}\) in accordance with Kohl et al. [23].
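As a hedged illustration, the convective cooling at the top and bottom faces can be written in the common Robin form below. The symbols \(\varvec{q}\) (heat flux vector), \(\varvec{n}\) (outward unit normal), \(h\) (heat transfer coefficient) and \(\theta _\infty \) (ambient temperature) are notation assumed here for illustration, as is the sign convention of a positive flux leaving the body; the unit conversion matches the units of the heat source above.

$$\begin{aligned} \varvec{q}\cdot \varvec{n} = h\,(\theta -\theta _\infty ),\qquad h = {70}\hbox { W m}^{-2}\,\hbox { K}^{-1} = {0.07}\hbox { mW mm}^{-2}\,\hbox { K}^{-1},\qquad \theta _\infty = {20}\,^{\circ }{\hbox {C}} \end{aligned}$$

For example, a face at a hypothetical surface temperature of \({200}\,^{\circ }{\hbox {C}}\) would lose \(0.07\cdot 180 = 12.6\hbox { mW mm}^{-2}\) under these assumptions.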

For these thin structures, a sufficiently fine mesh is crucial to obtain converged results. Therefore, we conducted convergence studies with regard to the number of elements needed in each layer and over the length and width of the actuator. First, it turned out that, due to the bending deformation of the thin film, it is sufficient to use only one element over the width of the actuator. Then, the convergence with respect to the number of elements over the length is tested using four elements over the thickness for each material layer. The results are shown in Fig. 16, where the stroke and temperature of the SMA at the tip are plotted over time. Here, only the temperature for 20 elements over the length is shown, since there was virtually no difference in temperature for the different discretizations. Further, we concluded that using 20 elements over the length leads to an acceptable error in stroke while the main features of the actuator are conveyed well. Additionally, roughly 160 time steps were used for one actuation cycle, e.g., from \({0}~\hbox {s}\) to \({180}~{\hbox {s}}\) in Fig. 16.

Fig. 16

Stroke (left y-axis) and temperature (right y-axis, dashed in black) over time for different numbers of elements over the length of the actuator

Fig. 17

Stroke (left y-axis) and temperature (right y-axis, dashed in black) over time for different numbers of elements over the thickness of each material layer

Fig. 18

Stroke and temperature over time for \(\Delta V= 0.49\%\) and \(\Delta V = 0.0\%\)

Next, the convergence with regard to the number of elements used over the thickness of each material layer was studied. The results are depicted in Fig. 17.

Here, for the number of elements over the SMA thickness, one element is already enough to obtain results with an acceptable error. For the molybdenum and polymer layer, one needs at least two elements to obtain converged results.

To illustrate the importance of modeling the \(\Delta V\) effect, Fig. 18 shows the stroke and temperature over time for \(\Delta V\) being 0 and \(0.49\%\). Here, one can see that the volume change contributes to large parts of the achieved stroke between the two stable states at room temperature.
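To give a rough sense of scale, and assuming purely for illustration that the transformation-induced volume change is isotropic, a relative volume change \(\Delta V\) corresponds to a linear stretch \(\lambda \) (a symbol introduced here, not part of the model notation) of

$$\begin{aligned} \lambda = (1+\Delta V)^{1/3} \approx 1+\frac{\Delta V}{3},\qquad \Delta V = 0.49\% \ \Rightarrow \ \lambda -1 \approx 0.16\% \end{aligned}$$

In a thin multilayer, such a strain in the SMA layer alone acts as a mismatch strain relative to the neighboring layers, in the same way as a mismatch in thermal expansion does.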

4.3.4 Influence of the polymer thickness

The polymer layer thickness plays a key role for this actuator design: if it is too thin, the high temperature shape cannot be held at room temperature by the polymer layer. On the other hand, it should not be too thick, since body forces and manufacturing problems arise in that case. Further, since the polymer thermally insulates the actuator on one side, thinner polymer layers allow faster actuation cycles. Therefore, we tested several polymer layer thicknesses; the results are shown in Fig. 19.

Fig. 19

Stroke (left y-axis) and temperature (right y-axis, dashed in black) over time for different PMMA layer thicknesses

First, at (see Fig. 19), the actuator is heated. Initially, the larger CTE of the polymer makes the actuator bend down. Then, when reaching \(\theta _\text {g}\), the polymer softens and the actuator relaxes. Subsequently, at , the actuator is cooled down to room temperature by the surrounding air. The polymer hardens again and the device reaches a stable configuration at room temperature. Now, when heating again, the actuator bends down due to the large CTE of the PMMA at . As soon as the temperature reaches \(\theta _\text {g}\), the polymer softens. Afterward, at the shape memory alloy reaches \(A_{\text {f}}\) and the martensite is transformed back to austenite, which makes the actuator deflect up. Then, when we remove the source at , the polymer hardens before \(M_{\text {f}}\) is reached. Thus, depending on the polymer thickness, the polymer may hold the actuator’s shape or release some of its stroke. Finally, the actuator reaches its second stable state at room temperature, which is also its initial state.

For many actuator designs, one wants to maximize the achievable stroke while keeping the device as small and power efficient as possible. Therefore, we try to maximize the difference in stroke between states and . In turn, one must choose a sufficient polymer thickness, such that the actuator does not release too much stroke at . For example, 0.1 mm of PMMA is not thick enough to hold the shape, while 0.3 mm holds the shape perfectly (see Fig. 19 at ). Depending on the intended actuation frequency and the required stroke, any value in between might be a suitable choice.

5 Summary and outlook

In this paper, a new thermomechanical shape memory alloy model for finite strains is presented. It uses a projection method to fulfill the incompressibility constraint for the inelastic stretches. Further, the model is realized in the generalized standard material formulation extended to thermomechanics. The optimization of the global, incremental mixed thermomechanical potential by variation yields the mechanical balance principles as well as the evolution equations of the internal variables and boundary condition integrals. The presented model employs a thermal strain formulation for the shape memory alloy, which makes it possible to describe the transformation-induced volume changes found in some shape memory alloys. Using a logarithmic strain formulation, a finite strain dissipation potential incorporates vanishing inelastic strains upon austenite transformation in a manner consistent with the time-continuous case. Additionally, yield and transformation criteria as well as the algorithmically consistent tangent for the coupled problem are given and discussed. Due to numerical difficulties, a regularization of the hardening energy term is implemented.

The numerical results show that the model is capable of producing reasonable results in tensile tests. The dependency on the temperature at which the tensile tests are carried out is captured accurately. Furthermore, it enables the solution of thin film problems.

Using the model, we are now able to estimate viable layer thicknesses and device sizes as well as fitting Joule heating parameters. Additionally, it turned out that applying polymer to only one side increases the maximum actuation frequency while also introducing more deflection between the stable states and requiring more power. Moreover, we found that the \(\Delta V\) effect as well as the thermal expansion of the SMA are fairly important for the actuator’s stroke, since the design is very sensitive to volume changes, and therefore need to be modeled accurately as well. Furthermore, the model results show that the bistability at room temperature, which is enabled by the polymer locking in the deformation, is achievable for the shown actuator concept.

In the future, it remains interesting to find ways of solving the model problem without any penalty terms, which are not physically motivated and are numerically challenging to deal with. A possible solution to this problem could be to make use of the Fischer-Burmeister [12] complementarity function. In fact, this has been done in Auricchio et al. [4] for shape memory alloys and in Brepols et al. [7] for damage-plasticity with great success. Furthermore, there is still room for improvement in the algorithm’s convergence behavior, which could increase the speed of the proposed material model, especially under severe loading conditions. Additionally, including functional fatigue properties in the model, e.g., a shifting of the NiTiHf transformation temperatures over several transformation cycles, would make it possible to predict the long-term behavior of the actuators.