1 Introduction

The Augmented Lagrangian Method (ALM) has a long history in optimisation. In its standard form it can be seen as augmenting standard Lagrange multiplier methods with a penalty term that penalises the constraint equations. It was introduced independently by Hestenes and Powell [1, 2] in order to combine the advantages of the penalty method and the multiplier method in constrained optimisation. It was then extended to optimization with inequality constraints by Rockafellar [3, 4]. Soon afterwards the potential of ALM for the numerical approximation of partial differential equations (pde) and for computational mechanics was explored by Glowinski and Marrocco [5] and by Fortin [6]. For overviews of the early results on augmented Lagrangian methods for the approximation of pde we refer to the monographs by Glowinski and co-workers [7, 8].

In computational mechanics, Lagrange multiplier methods have the drawback that an inf-sup condition must be satisfied to ensure stability of the discrete scheme, so the discretisations of the primal variable and of the multiplier must be balanced carefully. Adding a penalty term does not change this situation, and in computational mechanics the ALM has therefore mostly been used either in an iterative approach (improving the conditioning of the discrete system) [7,8,9,10,11,12,13], or as a way of strengthening control of the constraints in cases where the discretisation is under-constrained. In [14] it was also shown that making the penalty parameter mesh dependent can improve convergence in some cases. Recently, similar ideas have been applied to preconditioning solution methods for discretisations of incompressible flows [15, 16]. The extension of the ALM to variational inequalities from [3, 4] was introduced in contact mechanics by Alart and Curnier [17].

An early approach to weak imposition of boundary conditions in finite element methods was introduced by Nitsche [18], using a method that is related to the ALM but has no multiplier: the multiplier is replaced by its physical representation, the normal boundary flux. Only recently has this possibility of substituting the multiplier by its physical interpretation in the discrete augmented Lagrangian formulation been explored in its full generality. This approach gives rise to schemes that are formally equivalent to stabilised Lagrange multiplier methods with stabilisation of Galerkin/Least Squares (GLS) type [19].

There is, however, a crucial difference between the ALM and GLS stabilisation methods, namely the treatment of variational inequalities. The classical GLS formulation for variational inequalities of Barbosa and Hughes [20] is very close to standard multiplier schemes, whereas the ALM supplies an alternative way to define the stabilisation mechanism that transforms the variational inequalities into nonlinear equalities, to which iterative schemes can readily be applied.

There is a very large literature on variational inequalities in pde and we cannot survey the whole field herein. Below we focus on works on finite element formulations and error analysis. For theoretical background relevant to the present work we refer to [21,22,23], and for a review of computational aspects, including the design of special finite element spaces, adaptive methods and solvers, we refer to [24] and the references therein.

The theoretical foundation for finite element approximation of variational inequalities was laid in the seminal works by Falk [25, 26], by Brezzi et al. [27, 28] and Haslinger [29]. For early overviews on computational aspects we refer to the monographs by Glowinski and co-workers [30, 31] and Kikuchi and Oden [32]. More recent studies of the numerical analysis of finite element methods for variational inequalities include [33,34,35,36,37,38,39]. For further work on mixed finite element methods we refer to [40,41,42,43,44,45,46]. For stabilised finite element methods in the context of variational inequalities see [20, 47,48,49,50,51,52]. More recently discontinuous Galerkin methods and other non-conforming methods allowing for polygonal elements have been developed for different types of contact problems [53,54,55,56,57,58,59,60]. Another recent development is the application of isogeometric analysis to contact problems [61,62,63,64]. Some results on fourth order problems have been reported in [41, 52, 65,66,67,68]. Some early error analyses for augmented Lagrangian finite element methods applied to variational inequalities have been proposed in [35, 69].

Optimal error estimates for the unilateral contact problem, however, remained elusive and typically required additional assumptions on the interface between the zones of contact and no contact. The Nitsche ALM, where the multiplier is replaced by its physical interpretation, was first introduced and analysed for variational inequalities by Chouly and Hild [70] in the setting of friction-free small-deformation elastic contact (without explicit reference to augmented Lagrangians). In this context they also proved optimal error estimates without additional a priori assumptions on the contact set. A similar result for the Signorini problem using a Lagrange multiplier approach (without ALM) was derived in [71]. The idea of using the ALM with eliminated multiplier for contact problems was then extended to various other models in [72,73,74,75,76,77]; for an overview, cf. [78]. Finite element methods using the ALM in the form of a nonlinear equality without eliminating the multiplier were analysed in [79]. The approach has been applied to non-conforming approximation in [80] and combined with IGA in [63, 81]. It has also been explored for CutFEM applications in [82, 83], for obstacle problems in [84, 85], and for Signorini boundaries in the plate model in [86]. Typically, the analysis of Nitsche's method requires additional regularity assumptions in order to make sense of the non-conforming terms, and we consider this case below. An analysis for low-regularity solutions of Nitsche-type methods applied to contact problems was proposed in [52, 87]. The reformulation of the variational inequality as a nonlinear equality with elimination of the multiplier is also advantageous in multiphysics applications, as illustrated in [88, 89], and for imposing positivity in flow problems [90].

Our main objective in this paper is to introduce the ALM in a model context, starting with the original formulation for optimization under constraints and then presenting the extension to pde approximation in an abstract framework. Particular focus is given to variational inequalities that are rewritten as nonlinear equalities in the ALM framework. Here we prove existence and best approximation estimates for the multiplier method under the assumption that the multiplier is sufficiently smooth. We discuss stabilised methods and sketch how these results generalize to the case where the multiplier is eliminated. The versatility of the approach is then shown by applying it in several different settings. Although the analysis is presented in a relatively simple framework, we believe that the ideas have the potential to extend to a larger class of optimization problems. A fruitful future approach could be to interpret the regularization of the inequality constraint in the framework of Moreau–Yosida regularization [91,92,93], allowing for more general inequality constraints [94], nonsmooth functionals [95] and further applications, for instance in control problems [96].

In Sect. 2 we start by recalling the augmented Lagrangian method in the finite dimensional setting for equality and inequality constraints, and we derive the augmented Lagrangian formulation for inequality constraints from the equality constraint formulation using slack variables. In Sect. 3 we recall the classical iterative solution algorithm based on the augmented Lagrangian. In Sect. 4 we then discuss the use of the augmented Lagrangian in the context of partial differential equations and present the properties of the formulation in an abstract framework. We show how the a priori bounds necessary for the existence of discrete solutions are obtained, and we derive best approximation estimates for the augmented Lagrangian finite element method. In Sect. 5 we give a number of applications drawn from fluid and solid mechanics. The paper finishes with some numerical experiments in Sect. 6 showing the versatility of the proposed framework.

2 The Finite Dimensional Setting

We begin by recalling the ALM for finite dimensional optimisation problems and by giving an informal introduction to some key ideas to be used in the following. Below we will frequently use the notation \(a \lesssim b\) for \(a \le C b\), where C is a constant that, in particular, is independent of the mesh size in the finite element discretizations considered below.

2.1 Optimisation with Equality Constraints

We consider the quadratic optimisation problem:

$$\begin{aligned} \min _{{{\varvec{x}}}\in \mathbb {R}^n} f({{\varvec{x}}}) \quad \text {subject to} \quad g_i({{\varvec{x}}}) = 0, \quad i=1,\dots , m \end{aligned}$$
(1)

This problem can be solved by the Lagrange multiplier method, seeking stationary points of the function

$$\begin{aligned} \mathcal {L}({{\varvec{x}}},\lambda _1,\ldots ,\lambda _m) = f({{\varvec{x}}}) -\sum _{i}\lambda _i g_i({{\varvec{x}}}) \end{aligned}$$
(2)

solving the system of equations

$$\begin{aligned} \nabla f -\sum _i\lambda _i \nabla g_i = {}&\textbf{0} \end{aligned}$$
(3)
$$\begin{aligned} g_i={}&0, \; i=1,\ldots , m \end{aligned}$$
(4)

It can also be solved approximately by the penalty method, seeking the minimiser of the function

$$\begin{aligned} \mathcal {L}_\gamma ({{\varvec{x}}}) = f({{\varvec{x}}}) +\frac{\gamma }{2}\sum _{i}g_i({{\varvec{x}}})^2 \end{aligned}$$
(5)

where \(\gamma \in \mathbb {R}^+\) is a given (large) penalty parameter. We note that the penalty method has a strong regularising effect on the problem: if some of the side conditions are (close to being) linear combinations of each other, this does not matter; indeed, even if \(g_j({{\varvec{x}}}) = g_1({{\varvec{x}}})\) for all j, we simply solve

$$\begin{aligned} \mathcal {L}_\gamma ({{\varvec{x}}}) = f({{\varvec{x}}}) +m\frac{\gamma }{2} g_1({{\varvec{x}}})^2 \end{aligned}$$
(6)

which is a well posed problem. This is not the case for the multiplier method, where the system (3)–(4) would then be ill posed. The key point is that the side conditions do not enter the penalty method explicitly. On the other hand, in general the minimiser of (5) coincides with that of (1) only in the limit \(\gamma \rightarrow \infty\). The ALM combines the penalty method and the multiplier method: we seek the stationary point of

$$\begin{aligned} \mathcal {L}_{\gamma }({{\varvec{x}}},{\varvec{\lambda }})&= f({{\varvec{x}}}) -\sum _{i}\lambda _i g_i({{\varvec{x}}}) +\frac{\gamma }{2}\sum _{i}g_i({{\varvec{x}}})^2 \end{aligned}$$
(7)

This problem has the same stationary point as (2) and the same stability problem in case of linearly dependent side conditions. We note, however, that the multiplier can be eliminated by first solving (3) for the multipliers, which we symbolically denote by

$$\begin{aligned} \lambda _i = \frac{df}{dg_i}({{\varvec{x}}}) \end{aligned}$$
(8)

(the multipliers can be interpreted as the change in the objective with respect to a change in the corresponding side condition), and then seek the minimum of the reduced Lagrangian

$$\begin{aligned} \mathcal {L}_{A}({{\varvec{x}}}) = f({{\varvec{x}}}) +\sum _{i}\left( \frac{\gamma }{2} g^2_i({{\varvec{x}}})-\frac{df}{dg_i}({{\varvec{x}}}) g_i({{\varvec{x}}})\right) \end{aligned}$$
(9)

As in the penalty method, the side conditions are then no longer explicit; however, in case of linear dependence solving (3) is still ill posed and we cannot obtain the representation (8). Suppose instead that we had an alternative way of computing the multiplier, so that symbolically we had

$$\begin{aligned} \lambda _i^*({{\varvec{x}}}) \approx \frac{df}{dg_i}({{\varvec{x}}}), \quad \lambda _i^*({{\varvec{x}}})\; \text {computable} \end{aligned}$$
(10)

Then we could consider the problem of minimising

$$\begin{aligned} \mathcal {L}_{A}^{*}({{\varvec{x}}}) = f({{\varvec{x}}}) +\sum _{i}\left( \frac{\gamma }{2} g_i({{\varvec{x}}})^2-\lambda _i^*({{\varvec{x}}}) g_i({{\varvec{x}}})\right) \end{aligned}$$
(11)

The accuracy of this method would then depend on the accuracy of the approximation (10) and on the stability of the formulation. A typical situation is that there is a constant C such that

$$\begin{aligned} \sum _i \vert \lambda _i^*({{\varvec{x}}})\vert ^2 \le C f({{\varvec{x}}}) \end{aligned}$$
(12)

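which, using Young's inequality \(\vert \lambda _i^* g_i\vert \le \delta \vert \lambda _i^*\vert ^2 + \frac{1}{4\delta } g_i^2\) for any \(\delta > 0\), gives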

$$\begin{aligned} \mathcal {L}_{A}^{*}({{\varvec{x}}}) = {}&\sum _{i}\left( \frac{\gamma }{2} g_i({{\varvec{x}}})^2-\lambda _i^*({{\varvec{x}}}) g_i({{\varvec{x}}})\right) + f({{\varvec{x}}}) \nonumber \\ \ge {}&\sum _{i}\left( \frac{\gamma }{2} g_i({{\varvec{x}}})^2- \delta \vert \lambda _i^*({{\varvec{x}}})\vert ^2 - \frac{1}{4 \delta } g_i^2({{\varvec{x}}})\right) \nonumber \\&+f({{\varvec{x}}}) \nonumber \\ = {}&f({{\varvec{x}}}) - \delta \sum _{i} \vert \lambda _i^*({{\varvec{x}}})\vert ^2 +\sum _{i} \left( \frac{\gamma }{2} - \frac{1}{ 4 \delta } \right) g_i^2({{\varvec{x}}}) \nonumber \\ \ge {}&( 1- \delta C ) f({{\varvec{x}}}) +\sum _{i} \left( \frac{\gamma }{2} - \frac{1}{ 4 \delta } \right) g_i^2({{\varvec{x}}}) \nonumber \\ \gtrsim {}&f({{\varvec{x}}}) + \sum _{i} g_i^2({{\varvec{x}}}) \end{aligned}$$
(13)

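where we obtained the last estimate by taking \(\delta\) sufficiently small and \(\gamma\) sufficiently large; for instance, \(\delta = 1/(2C)\) gives \(1-\delta C = 1/2\) and \(\frac{\gamma }{2} - \frac{1}{4\delta } = \frac{\gamma - C}{2}\), which is positive for \(\gamma > C\). We conclude that the minimization problem for \(\mathcal {L}_{A}^{*}({{\varvec{x}}})\) is well posed if \(\gamma > \gamma _C\), for a threshold \(\gamma _C\) depending on C (\(\gamma _C = C\) in the example above). This is the basic idea that underlies the application of the ALM as a stabilisation method in cases where the multiplier can be eliminated.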
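The regularising effect of the penalty, and the breakdown of the multiplier system under linear dependence, can be checked on a small example. The following Python/NumPy sketch is illustrative only: it assumes a hypothetical quadratic objective \(f(x) = \frac{1}{2} x^T A x - b^T x\) with linear side conditions \(g(x) = Bx - c\) whose second row duplicates the first, assembles the stationarity system (3)–(4), and compares it with the penalty minimiser of (5).

```python
import numpy as np

# Hypothetical data: f(x) = 1/2 x^T A x - b^T x, g(x) = B x - c,
# where the second side condition duplicates the first.
A = np.diag([2.0, 3.0, 4.0])
b = np.array([1.0, 0.0, 0.0])
B = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])
c = np.array([1.0, 1.0])

# Multiplier method: for L = f - sum_i lam_i g_i the system (3)-(4) reads
# A x - B^T lam = b, B x = c; its matrix is singular for duplicated constraints.
kkt = np.block([[A, -B.T], [B, np.zeros((2, 2))]])
print("rank of (3)-(4):", np.linalg.matrix_rank(kkt), "of", kkt.shape[0])

# Penalty method (5): (A + gamma B^T B) x = b + gamma B^T c is solvable for every gamma > 0.
def penalty_solution(gamma):
    return np.linalg.solve(A + gamma * B.T @ B, b + gamma * B.T @ c)

# Reference constrained minimiser, computed with the single independent constraint.
B1, c1 = B[:1], c[:1]
kkt1 = np.block([[A, -B1.T], [B1, np.zeros((1, 1))]])
x_ref = np.linalg.solve(kkt1, np.concatenate([b, c1]))[:3]

for gamma in [1e1, 1e3, 1e5]:
    err = np.linalg.norm(penalty_solution(gamma) - x_ref)
    print(f"gamma = {gamma:.0e}: |x_gamma - x*| = {err:.2e}")
```

The rank deficiency of the assembled matrix reflects the ill-posedness of (3)–(4), whereas the penalty problem is well defined for every \(\gamma\) and its minimiser approaches the constrained one as \(\gamma\) grows.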

2.2 Optimisation with Inequality Constraints

We consider next a quadratic optimisation problem of the type:

$$\begin{aligned} \min _{{{\varvec{x}}}\in \mathbb {R}^n} f({{\varvec{x}}}) \;\; \text {subject to} \;\; g_i({{\varvec{x}}}) \le 0, \quad i=1,\dots , m \end{aligned}$$
(14)

The augmented Lagrangian proposed by Rockafellar for this problem [3, Eq. (7)], written here with \(\gamma =2 r\) and with the multiplier chosen negative, takes the following form for \(\gamma \in \mathbb {R}^+\):

$$\begin{aligned} \mathcal {L}_{A}({{\varvec{x}}},{\varvec{\lambda }}) =&f({{\varvec{x}}}) +\frac{1}{2\gamma } \sum _{i} \left( [ \gamma g_i({{\varvec{x}}}) -\lambda _i]_+^2 -\lambda _i^2\right) \end{aligned}$$
(15)

where \([x]_+ = \max (x,0)\).

Observe that another equivalent reformulation is given by

$$\begin{aligned} \mathcal {L}_{A}({{\varvec{x}}},{\varvec{\lambda }}) =&{} f({{\varvec{x}}}) -\sum _{i}\lambda _i g_i({{\varvec{x}}}) +\frac{\gamma }{2}\sum _{i}g_i({{\varvec{x}}})^2 \nonumber \\&- \frac{1}{2\gamma } \sum _{i} [ \gamma g_i({{\varvec{x}}}) -\lambda _i]_-^2 \end{aligned}$$
(16)

where \([x]_- = \min (x,0)\). This is easily seen by using that \(x = [x]_+ + [x]_-\) and hence

$$\begin{aligned}{}[ x]_+^2 = {}&([x]_+ +[x]_-)^2 - [x]_-^2 - \underbrace{2 [x]_+ [x]_-}_{=0} \\ = {}&x^2 - [x]_-^2 \end{aligned}$$

Applying this in (15) with \(x = \gamma g_i({{\varvec{x}}}) -\lambda _i\) leads to (16). In (16) we recognise the augmented Lagrangian for the equality constraint (7) in the first three terms, while the last term is the nonlinear switch that introduces the inequality constraint.
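Since (15) and (16) agree pointwise in the values \(f(x)\), \(g_i(x)\) and \(\lambda_i\), the identity can be spot-checked numerically. The sketch below (Python with NumPy; the helper names and the random sample values are our own choices) evaluates both expressions and compares them.

```python
import numpy as np

rng = np.random.default_rng(0)

def pos(x):  # [x]_+ = max(x, 0)
    return np.maximum(x, 0.0)

def neg(x):  # [x]_- = min(x, 0)
    return np.minimum(x, 0.0)

def alm_15(f, g, lam, gamma):
    # Rockafellar form (15): f + 1/(2 gamma) sum([gamma g - lam]_+^2 - lam^2)
    return f + np.sum(pos(gamma * g - lam) ** 2 - lam ** 2) / (2 * gamma)

def alm_16(f, g, lam, gamma):
    # Equality-constraint ALM (7) plus the nonlinear switch term of (16)
    return (f - np.sum(lam * g) + 0.5 * gamma * np.sum(g ** 2)
            - np.sum(neg(gamma * g - lam) ** 2) / (2 * gamma))

for _ in range(1000):
    f = rng.normal()                 # sampled value f(x)
    g = rng.normal(size=5)           # sampled values g_i(x)
    lam = rng.normal(size=5)         # multipliers of either sign
    gamma = rng.uniform(0.1, 100.0)
    assert np.isclose(alm_15(f, g, lam, gamma), alm_16(f, g, lam, gamma))
print("(15) and (16) agree on all samples")
```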

To see that (15) is indeed the natural formulation we introduce slack variables \(z_i \in \mathbb {R}_+\) and rewrite (14) in the form

$$\begin{aligned}&\min _{({{\varvec{x}}},{{\varvec{z}}}) \in \mathbb {R}^n \times \mathbb {R}^m_+} f({{\varvec{x}}}) \;\; \text {subject to} \;\; g_i({{\varvec{x}}})+z_i = 0, \nonumber \\&\qquad i=1,\dots ,m \end{aligned}$$
(17)

with corresponding augmented Lagrangian

$$\begin{aligned} \nonumber \mathcal {L}_{A}({{\varvec{x}}},{{\varvec{z}}},{\varvec{\lambda }}) = {}&f({{\varvec{x}}}) -\sum _i (g_i({{\varvec{x}}}) + z_i) \lambda _i \\ {}&+ \sum _i\frac{\gamma }{2} (g_i({{\varvec{x}}}) + z_i)^2 \end{aligned}$$
(18)

for which we seek stationary points, minimizing in \(({{\varvec{x}}}, {{\varvec{z}}})\). Here we may now perform the optimization over \({{\varvec{z}}}\in \mathbb {R}_+^m\) explicitly by noting that for each \({{\varvec{x}}}\) and \({\varvec{\lambda }}\) we obtain a sum of quadratic polynomials in \(z_i\) of the form

$$\begin{aligned} \nonumber&-(g_i({{\varvec{x}}}) + z_i) \lambda _i + \frac{\gamma }{2} (g_i({{\varvec{x}}}) + z_i)^2 \\&=\frac{1}{2\gamma } \Big ( (\gamma (g_i({{\varvec{x}}}) + z_i ) -\lambda _i )^2 - \lambda _i^2 \Big ) \end{aligned}$$
(19)

and therefore the unconstrained minimum is attained at \(\gamma z_i = - (\gamma g_i -\lambda _i)\); taking the constraint \(z_i \in \mathbb {R}_+\) into account, we find that \(\gamma z_i = [-(\gamma g_i({{\varvec{x}}}) -\lambda _i)]_+\). Inserting this expression for \(\gamma z_i\) into (19) and using the identity \(a+[-a]_+ = [a]_+\), we arrive at

$$\begin{aligned} \nonumber \mathcal {L}_{A}({{\varvec{x}}},{\varvec{\lambda }}) ={}&f({{\varvec{x}}}) + \frac{1}{2\gamma }\sum _i ([\gamma g_i({{\varvec{x}}}) -\lambda _i]_+^2 - \lambda _i^2) \end{aligned}$$
(20)
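The elimination of the slack variable can likewise be checked for a single constraint. The sketch below compares the closed form appearing in (20) with a brute-force minimisation of one summand of (18) over a grid of \(z \ge 0\); the grid, the sampling ranges and the function names are illustrative assumptions.

```python
import numpy as np

def slack_term(z, g, lam, gamma):
    # One summand of (18) as a function of the slack z >= 0, cf. (19)
    return -(g + z) * lam + 0.5 * gamma * (g + z) ** 2

def eliminated_term(g, lam, gamma):
    # Closed form after minimising over z >= 0, cf. (20)
    return (max(gamma * g - lam, 0.0) ** 2 - lam ** 2) / (2 * gamma)

rng = np.random.default_rng(1)
z_grid = np.linspace(0.0, 20.0, 200001)   # brute-force search over z >= 0
for _ in range(100):
    g, lam = rng.uniform(-3.0, 3.0, size=2)
    gamma = rng.uniform(0.5, 5.0)
    z_opt = max(-(gamma * g - lam), 0.0) / gamma   # minimiser derived in the text
    closed = eliminated_term(g, lam, gamma)
    assert np.isclose(slack_term(z_opt, g, lam, gamma), closed)
    assert abs(slack_term(z_grid, g, lam, gamma).min() - closed) < 1e-6
print("closed form (20) matches the minimum over z >= 0")
```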

Alternatively, we may seek stationary points of the standard Lagrangian

$$\begin{aligned} \mathcal {L}({{\varvec{x}}},{\varvec{\lambda }}) = f({{\varvec{x}}}) -\sum _{i}\lambda _i g_i({{\varvec{x}}}) \end{aligned}$$
(21)

under the Karush–Kuhn–Tucker (KKT) conditions

$$\begin{aligned} g_i\le {}&0, \; i=1,\ldots , m \end{aligned}$$
(22)
$$\begin{aligned} \lambda _i\le {}&0, \; i=1,\ldots , m \end{aligned}$$
(23)
$$\begin{aligned} \lambda _ig_i= {}&0, \; i=1,\ldots , m \end{aligned}$$
(24)

The KKT conditions (22)–(24) are equivalent to the single statement

$$\begin{aligned} \lambda _i = -[\gamma g_i -\lambda _i]_+ \end{aligned}$$
(25)

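where \(\gamma \in \mathbb {R}^+\) is an arbitrary positive number: if \(\gamma g_i - \lambda _i > 0\), then (25) gives \(g_i = 0\) and \(\lambda _i < 0\), and otherwise it gives \(\lambda _i = 0\) and \(g_i \le 0\). We may then rewrite the Lagrangian in the form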

$$\begin{aligned} \nonumber f({{\varvec{x}}}) -\sum _{i}\lambda _i g_i({{\varvec{x}}}) = {}&f({{\varvec{x}}}) -\sum _{i}\left(\lambda _i \Big ( g_i({{\varvec{x}}}) -\frac{1}{\gamma }\lambda _i \Big ) + \frac{1}{\gamma } \lambda _i^2\right)\\ \nonumber = {}&f({{\varvec{x}}}) \\ \nonumber&+ \frac{1}{\gamma } \sum _i \underbrace{[\gamma g_i({{\varvec{x}}}) -\lambda _i]_+ ( \gamma g_i({{\varvec{x}}}) - \lambda _i )}_{[\gamma g_i({{\varvec{x}}}) -\lambda _i]_+^2} \\&- \frac{1}{\gamma }\sum_i \lambda _i^2 \end{aligned}$$
(26)

where we used (25) and the fact that \([a]_+a = [a]_+^2\). The substitutions \(\lambda _i \mapsto \lambda _i/2\) and \(\gamma \mapsto \gamma /2\) then yield the Lagrangian (20).
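The equivalence between the KKT conditions (22)–(24) and the single statement (25) can be tested case by case. In the sketch below the sample pairs \((g, \lambda)\) are hand-picked (three satisfy the KKT conditions, four violate them) so that exact floating-point comparison is meaningful; it is a sanity check for a single constraint, not a proof.

```python
def kkt_holds(g, lam):
    # Conditions (22)-(24) for a single constraint
    return g <= 0 and lam <= 0 and lam * g == 0

def fixed_point_holds(g, lam, gamma):
    # Single nonlinear statement (25): lam = -[gamma*g - lam]_+
    return lam == -max(gamma * g - lam, 0.0)

# Pairs (g, lam): the first three satisfy the KKT conditions, the rest violate them.
samples = [(-1.5, 0.0), (0.0, -2.0), (0.0, 0.0),
           (0.5, 0.0), (-1.0, -1.0), (0.0, 1.0), (0.3, -0.3)]
for gamma in [0.1, 1.0, 7.5]:
    for g, lam in samples:
        assert kkt_holds(g, lam) == fixed_point_holds(g, lam, gamma), (g, lam, gamma)
print("KKT conditions and (25) agree on all samples")
```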

Writing the optimality system of (20) results in the system of equations

$$\begin{aligned} \nabla f +\sum _i[\gamma g_i-\lambda _i]_+ \nabla g_i = {}&\textbf{0} \end{aligned}$$
(27)
$$\begin{aligned} \left[\gamma g_{i}-\lambda _i\right]_+ = {}&-\lambda _i, \; i=1,\ldots , m \end{aligned}$$
(28)

which is a nonlinear system of equations that explicitly includes the KKT conditions, since (28) is precisely the statement (25).

Again, if an approximation of the type (10) is available, we may instead seek the minima of

$$\begin{aligned} \nonumber \mathcal {L}_{A}^{*}({{\varvec{x}}}) := {}&f({{\varvec{x}}}) +\frac{1}{2\gamma } \sum _{i} [\gamma g_i({{\varvec{x}}})-\lambda _i^*({{\varvec{x}}})]_+^2 \\&-\frac{1}{2\gamma } \sum _{i}(\lambda _i^*({{\varvec{x}}}))^2 \end{aligned}$$
(29)
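As an illustration of how the nonlinear system (27)–(28) can be solved, the sketch below uses the classical multiplier update \(\lambda_i \leftarrow -[\gamma g_i(x) - \lambda_i]_+\), whose fixed points satisfy (28), and for fixed \(\lambda\) minimises (20) in \(x\) with a semismooth Newton (active-set) iteration based on the gradient (27). The quadratic objective \(f(x) = \frac{1}{2}|x - a|^2\), the linear constraints \(g(x) = Bx - c\), the value \(\gamma = 10\) and the inner solver are assumptions made for this sketch only.

```python
import numpy as np

def alm_inequality(a, B, c, gamma=10.0, tol=1e-12, max_outer=200):
    """Multiplier iteration for min 1/2|x - a|^2 subject to g(x) = B x - c <= 0.

    Multipliers follow the sign convention lam <= 0 of (23).
    """
    n = a.size
    lam = np.zeros(B.shape[0])
    x = a.copy()
    for _ in range(max_outer):
        # Inner step: minimise f(x) + 1/(2 gamma) sum [gamma g_i(x) - lam_i]_+^2 in x
        # by semismooth Newton; the gradient below is the left-hand side of (27).
        for _ in range(50):
            s = gamma * (B @ x - c) - lam
            act = s > 0                                  # currently active constraints
            grad = x - a + B[act].T @ s[act]
            if np.linalg.norm(grad) < tol:
                break
            J = np.eye(n) + gamma * B[act].T @ B[act]    # generalized Jacobian
            x = x - np.linalg.solve(J, grad)
        # Multiplier update; its fixed points satisfy (28)
        lam_new = -np.maximum(gamma * (B @ x - c) - lam, 0.0)
        converged = np.linalg.norm(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    return x, lam

x, lam = alm_inequality(np.array([1.0, 1.0]), np.array([[1.0, 0.0]]), np.array([0.0]))
print(x, lam)
```

For \(a = (1,1)\) and the single constraint \(x_1 \le 0\), the exact solution is \(x = (0,1)\) with \(\lambda = -1\), which the iteration reproduces to within the tolerance; for \(a = (-1,1)\) the constraint is inactive and the iteration returns \(x = a\), \(\lambda = 0\).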

3 Iterative Solution Using the Augmented Lagrangian

The augmented Lagrangian is perhaps best known as the basis for iterative algorithms for constrained optimization problems. The stationary points of the functional (7) can be approximated using the following classical algorithm attributed to Uzawa, whose application to augmented Lagrangian methods was developed in the works of Glowinski and co-workers [7, 8, 30, 97]. Following [7], we consider the situation where the model problem is to minimize

$$\begin{aligned} J({{\varvec{x}}}) := \frac{1}{2} {{\varvec{x}}}^T {{\varvec{A}}}{{\varvec{x}}}- {{\varvec{b}}}^T {{\varvec{x}}}\end{aligned}$$

over \({{\varvec{x}}}\in \mathbb {R}^n\) under the constraint \({{\varvec{B}}}{{\varvec{x}}}= {{\varvec{c}}}\in \mathbb {R}^m\). Here \({{\varvec{A}}}\in \mathbb {R}^{n\times n}\) is symmetric positive definite, \({{\varvec{b}}}\in \mathbb {R}^n\) and \({{\varvec{B}}}\in \mathbb {R}^{m \times n}\). The augmented Lagrangian (7) then takes the form (here with the opposite sign convention for the multiplier)

$$\begin{aligned} \mathcal {L}_{A}({{\varvec{x}}},{\varvec{\lambda }}) := {}&\frac{1}{2} {{\varvec{x}}}^T {{\varvec{A}}}{{\varvec{x}}}- {{\varvec{b}}}^T {{\varvec{x}}}+ {\varvec{\lambda }}^T({{\varvec{B}}}{{\varvec{x}}}-{{\varvec{c}}}) \\&+ \frac{\gamma }{2} \vert {{\varvec{B}}}{{\varvec{x}}}- {{\varvec{c}}}\vert ^2 \end{aligned}$$
(30)
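(Algorithm: augmented Lagrangian iteration for the saddle point of (30) with steplength \(\rho_k\); the algorithm figure is not reproduced here.)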

We note that step 2 is equivalent to solving the linear system, find \({{\varvec{x}}}^k \in \mathbb {R}^n\) such that

$$\begin{aligned} ({{\varvec{A}}}+ \gamma {{\varvec{B}}}^T {{\varvec{B}}}) {{\varvec{x}}}^k = - {{\varvec{B}}}^T {\varvec{\lambda }}^k + {{\varvec{b}}}+ \gamma {{\varvec{B}}}^T {{\varvec{c}}}\end{aligned}$$

The iterates \({{\varvec{x}}}^k, {\varvec{\lambda }}^k\) of the iterative method converge to the saddle point of (30) provided the steplength \(\rho_k\) satisfies

$$\begin{aligned} 0< \alpha _0 \le \rho_k \le \alpha _1 < 2 \left( \gamma + \frac{1}{\beta ^2} \right) \end{aligned}$$

where \(\beta ^2\) is the largest eigenvalue of the matrix \({{\varvec{A}}}^{-1} {{\varvec{B}}}^T {{\varvec{B}}}\) defined by

$$\begin{aligned} \beta ^2 = \max _{{{\varvec{v}}}\ne 0} \frac{\vert {{\varvec{B}}}{{\varvec{v}}}\vert ^2}{{{\varvec{v}}}^T {{\varvec{A}}}{{\varvec{v}}}} \end{aligned}$$

For a proof of the convergence result we refer to [30, Chap. 2, Sect. 4] or [7, Chap. 1, Sect. 2].
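The steps of the algorithm are not spelled out in the text above, so the following Python sketch assumes the standard structure described in [7]: initialise \(\lambda^0\), compute \(x^k\) from the linear system of step 2, and update \(\lambda^{k+1} = \lambda^k + \rho_k(Bx^k - c)\), which is an ascent step for the sign convention of (30). It uses a constant steplength \(\rho_k = \rho\), checks it against the bound involving \(\beta^2\), and compares the result with a direct solve of the saddle point system; the test matrices are hypothetical.

```python
import numpy as np

def alm_uzawa(A, b, B, c, gamma=1.0, rho=1.0, tol=1e-10, max_iter=500):
    """Uzawa-type iteration for the saddle point of (30) with constant steplength rho."""
    lam = np.zeros(B.shape[0])
    M = A + gamma * B.T @ B               # matrix of the linear system in step 2
    for k in range(max_iter):
        # Minimise L_A(., lam) in x: (A + gamma B^T B) x = b - B^T lam + gamma B^T c
        x = np.linalg.solve(M, b - B.T @ lam + gamma * B.T @ c)
        r = B @ x - c                     # constraint residual
        lam = lam + rho * r               # multiplier update
        if np.linalg.norm(r) < tol:
            break
    return x, lam, k + 1

A = np.diag([2.0, 3.0, 4.0])
b = np.array([1.0, 0.0, 0.0])
B = np.array([[1.0, 1.0, 1.0]])
c = np.array([1.0])
gamma = 1.0

# beta^2: largest eigenvalue of A^{-1} B^T B; admissible steplengths satisfy rho < 2 (gamma + 1/beta^2)
beta2 = np.max(np.linalg.eigvals(np.linalg.solve(A, B.T @ B)).real)
rho = gamma
assert 0 < rho < 2 * (gamma + 1 / beta2)

x, lam, its = alm_uzawa(A, b, B, c, gamma=gamma, rho=rho)

# Reference: direct solution of the saddle point system A x + B^T lam = b, B x = c
K = np.block([[A, B.T], [B, np.zeros((1, 1))]])
ref = np.linalg.solve(K, np.concatenate([b, c]))
print(its, np.allclose(x, ref[:3]), np.allclose(lam, ref[3:]))
```

Choosing \(\rho = \gamma\) is a common default and lies inside the admissible range for any \(\gamma > 0\); larger \(\gamma\) speeds up the multiplier iteration at the price of a worse conditioned matrix \(A + \gamma B^T B\).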

4 Augmented Lagrangian Methods and Galerkin/Least Squares

We now turn to the case where the Lagrangian is a functional defined on some Sobolev space and the numerical method is obtained by finding its stationary points in a finite dimensional approximation space. Typically we are interested in the discretisation of a problem where some energy is minimised under a constraint. To illustrate this we consider the case of equality constraints. Let V and H denote two Hilbert spaces, with dual spaces \(V'\) and \(H'\), respectively. Let \(F:V \rightarrow \mathbb {R}\) denote a strictly convex \(C^2\)-functional and \(B:V \rightarrow H\) a linear operator. We are interested in minimising F under a constraint defined by B. Given the data \(f\in V'\) and \(g\in H\) we consider the optimization problem

$$\begin{aligned} u = \text{ arginf}_{v \in V} F(v) - \left\langle f,v\right\rangle _{V',V} \text{ such } \text{ that } B v = g \end{aligned}$$
(31)

The Lagrangian takes the form

$$\begin{aligned} \mathcal {L}(v,\mu ) := F(v) - \left\langle f,v\right\rangle _{V',V}- \left\langle \mu ,Bv-g \right\rangle _{H',H} \end{aligned}$$
(32)

This problem can be shown to have a unique solution under suitable hypotheses on the spaces V and H, the functional F, the operator B and the data f and g (see for instance [98, Chap. 1, Sect. 2.1, Theorems 2.1 and 2.2]). Augmenting the Lagrangian does not change the solution on the continuous level, but formally an augmented version of (32), in the spirit of (7), can be written

$$\begin{aligned} \nonumber \mathcal {L}_\text {A}(v,\mu ) :={}&F(v) - \left\langle f,v\right\rangle _{V',V}- \left\langle \mu ,Bv-g \right\rangle _{H',H} \\&+ \frac{\gamma }{2} \Vert Bv-g\Vert _H^2 \end{aligned}$$
(33)

The discrete version of the ALM based on (33) would then be obtained by restricting \(\mathcal {L}_\text {A}\) to finite dimensional spaces. As we saw in the previous sections, the ALM on the discrete level combines the control of the constraint given by the Lagrange multiplier with that given by the penalty. It also gives us an iterative procedure to find the minimiser. In the context of pde problems the ALM also gives enhanced control of the side condition in the sense of a GLS method or a variational multiscale method. To see this we assume that \(H=H'=L^2\) and that \(H'_h \subset H'\), \(V_h \subset V\) are finite dimensional approximation spaces. Here h denotes the characteristic lengthscale (or mesh parameter) of the discrete space. We let \(\pi _H:H \rightarrow H'_h\) denote the \(L^2\)-orthogonal projection onto \(H'_h\). Since

$$\begin{aligned} \left\langle \mu _h,Bv-g \right\rangle _{L^2} = \left\langle \mu _h,\pi _H (Bv-g) \right\rangle _{L^2} \end{aligned}$$

we see that the Lagrange multiplier only gives control of the projection of \(Bv-g\) onto the finite dimensional subspace \(H_h'\). This may be insufficient for the stability of the method, in particular since \(H_h'\) may need to be chosen small compared to \(V_h\) for stability reasons, i.e. to satisfy the inf-sup condition discussed below. A classical example is the stability of the incompressibility constraint (in which case B is the divergence operator) in the Brinkman problem when the viscosity becomes negligible. Adding the term \(\Vert Bv-g\Vert _H^2\) enhances the stability by adding control of \((I - \pi _H)(Bv-g)\) compared to the pure Lagrange multiplier method. This also shows that sufficient stabilization can be achieved by augmenting with \(\Vert (I - \pi _H)(Bv-g)\Vert _H^2\) only. We recognise this as an orthogonal subscale stabilization, which is a member of the family of variational multiscale methods. In the associated Euler–Lagrange equations these terms take the form of GLS stabilizations of some residual quantities. Indeed, a number of ideas from the field of stabilized methods can be brought to bear on the ALM, but we do not explore this further herein. Instead, we show in the examples below how the design of finite element methods using the ALM allows us to recover some well known GLS methods from computational mechanics.
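The projection argument can be made concrete in a finite dimensional surrogate, in which \(\mathbb{R}^N\) with the Euclidean inner product stands in for \(L^2\) and a random k-dimensional subspace stands in for \(H'_h\); both choices are assumptions for illustration only. For a residual \(r\) playing the role of \(Bv - g\), the sketch checks that the multiplier pairing only sees \(\pi_H r\), while \(\Vert r\Vert^2\) splits into the projected and the orthogonal parts.

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 40, 8                       # surrogate for H = L^2 and dimension of H'_h
Q, _ = np.linalg.qr(rng.normal(size=(N, k)))    # orthonormal basis of H'_h
P = Q @ Q.T                        # L^2-orthogonal projection pi_H onto H'_h

r = rng.normal(size=N)             # stands in for the residual Bv - g
r_perp = r - P @ r                 # (I - pi_H)(Bv - g)

# The multiplier term only sees the projected residual ...
mu_h = Q @ rng.normal(size=k)      # arbitrary element of H'_h
assert np.isclose(mu_h @ r, mu_h @ (P @ r))
assert np.allclose(Q.T @ r_perp, 0.0)           # ... so r_perp is invisible to every mu_h

# ... whereas the augmentation controls the full residual:
# |r|^2 = |pi_H r|^2 + |(I - pi_H) r|^2
assert np.isclose(r @ r, (P @ r) @ (P @ r) + r_perp @ r_perp)
print("norm of the component not seen by the multiplier:", np.linalg.norm(r_perp))
```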

We distinguish two different situations for the continuous problem (33):

  A. The multiplier has enough regularity to define a scalar product with the side condition.

  B. The multiplier only has enough regularity to support a duality pairing with the side condition.

In the first case we can use an analogue of the reformulation (15), which is convenient for the treatment of inequality conditions, and formulate the problem on the continuous level; in the second case this is not formally correct. Indeed, if the multiplier does not have sufficient regularity, the augmented continuous formulation does not lead to a well-defined problem unless the augmentation is taken in the continuous H-norm, which may be inconvenient from a computational standpoint. In this case the reformulation (15) is not available. We emphasize that this is not a problem in the discrete setting, since we can use norm equivalence on discrete spaces to obtain an ALM that has the right asymptotic scaling. However, in order to carry out a rigorous numerical analysis of the resulting finite element method, the assumption of additional regularity of the exact solution must be justifiable. This is often, but not always, the case. In this sense, ALM methods in situation B can be seen as non-conforming methods.

For the discrete as well as the continuous problem we have two further cases:

  C. The multiplier has a physical interpretation in terms of the primal variable.

  D. The multiplier cannot be interpreted (or be easily interpreted) in terms of the primal variable.

In the discrete case we also face the problem of choosing approximation spaces that satisfy a discrete inf–sup condition. In case C we can use a trick analogous to that of (10), which gives a class of problems where the multiplier has been eliminated beforehand; alternatively, the multiplier can be retained and stabilised by the addition of a GLS term, in the spirit of [20, 99]. These approaches give stability without the need to balance the discretisation of the multiplier space against that of the primal variable. In case D the multiplier has to be retained, but the inequality case can still be handled in the same way as above, and stabilisation is still possible, for instance using interior penalty stabilization acting on the multiplier alone [79, 100].

4.1 Abstract Framework

Since the method is motivated by numerical approximation, we only consider formulations that are meaningful in the finite dimensional setting; cases A and B above are then treated in the same way. However, only in case A does the discussion also hold for the continuous problem. The resulting numerical methods can be shown to converge optimally for sufficiently smooth exact solutions, but convergence has not been established for exact solutions without additional regularity. The question of how to design methods that are also valid formulations of the original pde problem is subtle and requires the design of sophisticated stabilization operators; for an interesting work in this direction we refer to [101].

We are interested in minimising F under a constraint defined by B, either as an equality or an inequality constraint. We now introduce some sufficient conditions for the abstract analysis below to hold. In the examples we then verify that these assumptions are satisfied.

  1.

    We assume that the operator B is bounded and surjective from V to H, so that for every \(\zeta \in H\) there exists \(\xi \in V\) such that \(B \xi = \zeta\) and \(\Vert \xi \Vert _V \le C \Vert \zeta \Vert _{H}\). It follows that there exists \(\alpha >0\) such that for every \(\mu \in H'\) there holds

    $$\begin{aligned} \alpha \Vert \mu \Vert _{H'} \le \sup _{v \in V} \frac{\left\langle B v, \mu \right\rangle _{H,H'}}{\Vert v\Vert _{V}} \end{aligned}$$
    (34)
  2.

    We also assume that \(V_h\) and \(H_h'\) are chosen in such a way that this property carries over to the finite dimensional setting, in the sense that a so-called Fortin interpolant exists: for all \(v \in V\) such that \(B v \in H\), there exists \(i_F v \in V_h\) such that for all \(q_h \in H_h'\),

    $$\begin{aligned} \nonumber \left\langle B (v - i_F v), q_h\right\rangle = {}&0, \\ \Vert i_F v\Vert _V+ \Vert B i_F v\Vert _{H_h} \lesssim {}&\Vert v\Vert _V + \Vert B v\Vert _{H_h} \end{aligned}$$
    (35)

    and note that for \(v \in V_h\) there holds \(i_F v = v\). Here and below \(\left\langle \cdot ,\cdot \right\rangle\) denotes the \(L^2\) scalar product over the domain of definition of functions in H, and we denote the associated norm by \(\Vert v\Vert := \left\langle v,v\right\rangle ^{\frac{1}{2}}\).

  3.

    We assume that surjectivity also holds for the discrete spaces in the following form: for all \(\mu _h \in H_h'\) there exists \(v_h \in V_h\) such that for all \(q_h \in H_h'\),

    $$\begin{aligned} \nonumber \left\langle \mu _h -B v_h, q_h\right\rangle = {}&0, \\ \Vert v_h\Vert _V + \Vert B v_h\Vert _{H_h} \lesssim {}&\Vert \mu _h\Vert _{H_h} \end{aligned}$$
    (36)

Discrete surjectivity is a consequence of the discrete inf-sup condition, which is typically equivalent to the existence of the Fortin interpolant [102, Lemma 26.9]. We state (35) and (36) separately here for future reference and to highlight the difference between the norms required on the right hand side. In a non-conforming situation it is not immediately clear that this equivalence holds. Note, however, that if the spaces are such that \(\Vert B v - \pi _{H_h} B v\Vert _{H_h} \le \Vert v\Vert _V\), then (36) implies (35).

The stability bounds in (35) and (36) may appear somewhat ad hoc here, but as we shall see below they are the natural stability requirements for the analysis. The norm \(\Vert \cdot \Vert _{H_h}\) is an h-weighted \(L^2\)-norm and will be discussed below.
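In matrix form, the discrete counterpart of the inf-sup condition (34) reduces to a generalized eigenvalue problem. Assuming bases \(\{\phi_j\}\) of \(V_h\) and \(\{\psi_i\}\) of \(H_h'\) with Gram matrices \(M_V\) and \(M_Q\) for the chosen norms, and a matrix \(B_{ij} = \langle B\phi_j, \psi_i\rangle\), the discrete inf-sup constant is the square root of the smallest eigenvalue of \(B M_V^{-1} B^T y = \lambda M_Q y\). The function name and matrix conventions below are illustrative; checking that this constant stays bounded away from zero under mesh refinement is the usual numerical test of the discrete inf-sup condition.

```python
import numpy as np

def discrete_inf_sup(Bmat, Mv, Mq):
    """inf over mu of sup over v of mu^T Bmat v / (|v|_V |mu|_Q), for SPD Gram matrices Mv, Mq.

    Bmat[i, j] = <B phi_j, psi_i>, with phi_j a basis of V_h and psi_i a basis of H'_h.
    """
    # The sup over v of mu^T Bmat v / sqrt(v^T Mv v) equals sqrt(mu^T S mu), S = Bmat Mv^{-1} Bmat^T
    S = Bmat @ np.linalg.solve(Mv, Bmat.T)
    # Smallest eigenvalue of S y = lambda Mq y, reduced to a standard problem via Cholesky Mq = L L^T
    L = np.linalg.cholesky(Mq)
    Linv = np.linalg.inv(L)
    lam_min = np.linalg.eigvalsh(Linv @ S @ Linv.T).min()
    return np.sqrt(max(lam_min, 0.0))   # clip tiny negative round-off

print(discrete_inf_sup(np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]), np.eye(3), np.eye(2)))
```

For the example call the constant is 1 (up to rounding), whereas a matrix B with two identical rows gives a value close to zero, signalling a multiplier mode that the constraint operator does not control.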

4.2 Equality Constraints

We wish to solve the optimization problem (31) and recall the formal augmented Lagrangian similar to (7) given by

$$\begin{aligned} \nonumber \mathcal {L}_\text {A}(v,\mu ) := {}&F(v) - \left\langle f,v\right\rangle _{V',V}- \left\langle \mu ,Bv-g \right\rangle _{H',H} \\&+ \frac{ \gamma }{2} \Vert Bv-g\Vert _H^2 \end{aligned}$$
(37)

For later use with inequality constraints, we would now like to write (37) in a form analogous to (15). However, this is not possible unless \(H' = H := L^2\), where \(L^2\) denotes the space of square-integrable functions over the pertinent domain; this is case A above. In this particular case, completing the square, \(-2 a b + b^2 = (a-b)^2 - a^2\), results in the following equivalent formulation

$$\begin{aligned} \nonumber \mathcal {L}_\text {A}(v,\mu ) := {}&F(v) - \left\langle f,v\right\rangle _{V',V} \\&+ \frac{\gamma }{2} \Vert Bv-g -\frac{1}{\gamma } \mu \Vert ^2 - \frac{1}{2 \gamma } \Vert \mu \Vert ^2 \end{aligned}$$
(38)

analogous to (15) and we recall that \(\Vert \cdot \Vert\) is the \(L^2\) norm, see Assumption 2 above. We let the semi-linear form \(a:V \times V \rightarrow \mathbb {R}\) be defined by the Gateaux derivative of F(v),

$$\begin{aligned} a(u;v) := \left\langle \frac{\partial F}{\partial u}(u), v \right\rangle _{V',V} \end{aligned}$$
(39)

and we assume that the form a satisfies the positivity, monotonicity and continuity conditions

$$\begin{aligned} a(v;v)\ge & {} \alpha \Vert v\Vert _V^2, \quad \alpha >0 \end{aligned}$$
(40)
$$\begin{aligned} a(w_1;w_1-w_2) - a(w_2;w_1 - w_2 )\ge & {} \alpha \Vert w_1 - w_2\Vert _V^2 \end{aligned}$$
(41)
$$\begin{aligned} \vert a(w_1;v) - a(w_2;v)\vert\le & {} C \Vert w_1-w_2\Vert _V \Vert v\Vert _V \end{aligned}$$
(42)
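As a quick sanity check of the completing-the-square step from (37) to (38), the Python sketch below compares the constraint terms of both forms on given data. It assumes that vectors with the Euclidean inner product stand in for \(L^2\) functions (for instance after mass lumping) and that `r` plays the role of \(Bv-g\); the function names are illustrative only.

```python
import numpy as np


def penalty_multiplier_terms(r, mu, gamma):
    """Constraint part of (37): -<mu, r> + gamma/2 |r|^2, with r = Bv - g."""
    return -mu @ r + 0.5 * gamma * r @ r


def completed_square(r, mu, gamma):
    """Constraint part of (38): gamma/2 |r - mu/gamma|^2 - |mu|^2 / (2 gamma)."""
    d = r - mu / gamma
    return 0.5 * gamma * d @ d - mu @ mu / (2.0 * gamma)


rng = np.random.default_rng(0)
for gamma in (0.1, 1.0, 250.0):
    r = rng.standard_normal(8)   # Euclidean stand-in for Bv - g
    mu = rng.standard_normal(8)  # Euclidean stand-in for the multiplier
    assert np.isclose(penalty_multiplier_terms(r, mu, gamma),
                      completed_square(r, mu, gamma))
```

The shift inside the square must be \(\mu/\gamma\); any other factor leaves a residual term linear in \(\left\langle \mu , Bv-g\right\rangle\).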

The optimality system obtained by differentiating (38) then reads: find \((u,\lambda ) \in V \times H'\) such that

$$\begin{aligned} \left\langle f,v\right\rangle _{V',V} + \left\langle g, {\gamma } Bv - \mu \right\rangle _H = {}&a(u;v) - \left\langle \lambda , Bv\right\rangle _{H',H} \nonumber \\&- \left\langle \mu , Bu\right\rangle _{H',H} \nonumber \\&+ {\gamma } \left\langle Bu, Bv\right\rangle _H \end{aligned}$$
(43)

for all \((v,\mu )\in V\times H'\). Here we simply replace V and \(H'\) by \(V_h\) and \(H_h'\) to obtain the discrete method.
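For a linear problem the discrete optimality system is a symmetric saddle-point system. The Python sketch below assembles and solves it under simplifying assumptions of our own: \(F(v) = \frac{1}{2} v^T K v\) with a symmetric positive definite matrix `K` (so that \(a(u;v) = v^T K u\)), a full-row-rank constraint matrix `B`, and Euclidean inner products in place of the \(L^2\) pairings (case A). The right-hand side is obtained by differentiating (37) directly. Since \(Bu = g\) at the solution, the penalty term does not change the solution, which the example checks by comparing two values of \(\gamma\).

```python
import numpy as np


def solve_alm_equality(K, B, f, g, gamma):
    """Stationary point of the discrete augmented Lagrangian (37), linear case.

    Block system (Euclidean inner products):
        [K + gamma B^T B   -B^T] [u  ]   [f + gamma B^T g]
        [     -B             0 ] [lam] = [      -g       ]
    """
    n, m = K.shape[0], B.shape[0]
    A = np.block([[K + gamma * B.T @ B, -B.T],
                  [-B, np.zeros((m, m))]])
    rhs = np.concatenate([f + gamma * B.T @ g, -g])
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n:]


# Toy data: 1D Laplacian-like K, constraints fixing the first and last entries.
K = 2.0 * np.eye(6) - np.eye(6, k=1) - np.eye(6, k=-1)
B = np.zeros((2, 6))
B[0, 0], B[1, 5] = 1.0, 1.0
f = 0.1 * np.ones(6)
g = np.array([1.0, 0.5])
u_zero, lam_zero = solve_alm_equality(K, B, f, g, gamma=0.0)   # plain Lagrangian
u_big, lam_big = solve_alm_equality(K, B, f, g, gamma=100.0)   # augmented
```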

We also want to handle case B. Then typically \(Bv\in H := H^r\), where \(H^r\) denotes a (potentially fractional-order) Sobolev space with \(r > 0\), and consequently \(\mu \in H' := H^{-r}\), the dual of \(H^r\). Since \(H^{-r} \not \subset H^r\), the formulation (38) no longer makes sense. Instead, in the spirit of "discretize first, then optimize", we move to the discrete counterpart of (31) and introduce discrete spaces \(V_h\subset V\) and \(H'_h \subset H'\). The finite element method then amounts to seeking stationary points of the augmented Lagrangian (37) in \(V_h\) and \(H'_h\). On the finite-dimensional finite element spaces we can approximate the continuous norms \(\Vert \cdot \Vert _H\) and \(\Vert \cdot \Vert _{H'}\) by discrete counterparts

$$\begin{aligned} \Vert Bv \Vert _H^2 \approx \Vert Bv \Vert _{H_h}^2 := \Vert h^{-r} Bv \Vert _{L^2}^2 \end{aligned}$$
(44)

and

$$\begin{aligned} \Vert \mu \Vert _{H'}^2 \approx \Vert \mu \Vert _{H'_h}^2 := \Vert h^{r} \mu \Vert _{L^2}^2 \end{aligned}$$
(45)

where h is the local mesh size (assumed constant in the following for simplicity) and \(r\ge 0\) depends on the space H; loosely speaking, r corresponds to the number of derivatives present in the norm \(\Vert \cdot \Vert _H\). It is also immediate from the Cauchy-Schwarz inequality that the following discrete duality property holds

$$\begin{aligned} \left\langle v,\mu \right\rangle _{H_h,H_h'} := \left\langle v,\mu \right\rangle \le \Vert v \Vert _{H_h} \Vert \mu \Vert _{H'_h}. \end{aligned}$$

This is done for two reasons:

  1.

    To obtain a well-conditioned method, we want the penalty term to produce the same condition number scaling as the form \(a(\cdot ,\cdot )\).

  2.

    The analysis of the resulting methods requires that the discrete norms can be bounded in terms of the form \(a(\cdot ,\cdot )\), which is only possible if they scale in the same way.
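To illustrate the weighted norms (44) and (45), the Python sketch below represents functions as piecewise constants on a uniform mesh of size h (our assumption, chosen only so that the \(L^2\) product becomes a weighted dot product). It checks the discrete duality bound and the identity \(\gamma \Vert v\Vert ^2 = \gamma _0 \Vert v\Vert _{H_h}^2\) that results from the choice (46) of \(\gamma\) below.

```python
import numpy as np


def l2_inner(a, b, h):
    """L^2 inner product of piecewise-constant functions on a uniform mesh of size h."""
    return h * np.dot(a, b)


def norm_H_h(v, h, r):
    """Weighted norm (44): |h^{-r} v|_{L^2}."""
    w = h ** (-r) * v
    return np.sqrt(l2_inner(w, w, h))


def norm_H_h_dual(mu, h, r):
    """Weighted norm (45): |h^{r} mu|_{L^2}."""
    w = h ** r * mu
    return np.sqrt(l2_inner(w, w, h))


rng = np.random.default_rng(1)
h, r, gamma0 = 0.05, 0.5, 10.0
v, mu = rng.standard_normal(20), rng.standard_normal(20)
# Discrete duality: <v, mu> <= |v|_{H_h} |mu|_{H_h'} (Cauchy-Schwarz).
assert abs(l2_inner(v, mu, h)) <= norm_H_h(v, h, r) * norm_H_h_dual(mu, h, r) + 1e-12
# With gamma = gamma_0 / h^{2r}: gamma |v|^2 = gamma_0 |v|_{H_h}^2.
gamma = gamma0 / h ** (2 * r)
assert np.isclose(gamma * l2_inner(v, v, h), gamma0 * norm_H_h(v, h, r) ** 2)
```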

Now we can use the arbitrariness of \(\gamma\) to set

$$\begin{aligned} \gamma = \gamma _0/h^{2r} \end{aligned}$$
(46)

where \(\gamma _0\) is a problem- and discretization-dependent constant. Proceeding as above, we find that on discrete spaces

$$\begin{aligned} \mathcal {L}^h_\text {A}(v,\mu ) := {}&F(v) - \left\langle f,v\right\rangle _{V',V} \\&+ \frac{\gamma _0}{2h^{2r}} \Vert Bv-g -\frac{h^{2r}}{\gamma _0} \mu \Vert ^2 - \frac{h^{2r}}{2 \gamma _0} \Vert \mu \Vert ^2 \end{aligned}$$

and the discrete optimality system reads: find \((u_h,\lambda _h) \in V_h \times H'_h\) such that

$$\begin{aligned} \nonumber&a(u_h;v) - \left\langle \lambda _h, Bv\right\rangle - \left\langle \mu , Bu_h\right\rangle + \frac{\gamma _0}{h^{2r}} \left\langle Bu_h, Bv\right\rangle \\&\quad = \left\langle f,v\right\rangle _{V',V} + \left\langle g, \frac{\gamma _0}{h^{2r}} Bv - \mu \right\rangle \end{aligned}$$
(47)

for all \((v,\mu )\in V_h\times H'_h\), where \(\langle \cdot ,\cdot \rangle\) denotes the standard \(L^2\) scalar product. Introducing the global form

$$\begin{aligned} A[(w,\eta );(v,\mu )] := {}&a(w; v) - \left\langle \eta , Bv\right\rangle - \left\langle \mu , Bw\right\rangle \\ {}&+ \frac{\gamma _0}{h^{2r}} \left\langle Bw, Bv\right\rangle \end{aligned}$$

we can cast the optimality system in the compact form: find \((u_h,\lambda _h) \in V_h \times H'_h\) such that

$$\begin{aligned} A[(u_h,\lambda _h);(v,\mu )] = \left\langle f,v\right\rangle _{V',V}+ \left\langle g, \frac{\gamma _0}{h^{2r}} Bv - \mu \right\rangle \end{aligned}$$
(48)

for all \((v,\mu ) \in V_h \times H'_h\).

It follows by inspection that any sufficiently smooth solution to (31), i.e. one with \((u,\lambda ) \in V \times (H' \cap L^2)\), is a solution to (47), and hence the formulation is consistent. Indeed, the stationary point of (32) is given by the solution to

$$\begin{aligned} a(u;v) - \left\langle \lambda , B v\right\rangle _{H',H} = \left\langle f, v\right\rangle _{V',V},\quad \forall v \in V \end{aligned}$$

and

$$\begin{aligned} \left\langle B u , \mu \right\rangle _{H,H'} = \left\langle g , \mu \right\rangle _{H,H'}, \quad \forall \mu \in H' \end{aligned}$$

If the solution is sufficiently regular, these equalities hold with the duality pairings replaced by the \(L^2\) scalar product, and we see that in that case the exact solution satisfies the finite element formulation,

$$\begin{aligned}&\underbrace{a(u;v) - \left\langle \lambda , Bv\right\rangle }_{=\left\langle f, v\right\rangle _{V',V}} \underbrace{- \left\langle \mu , Bu\right\rangle + \frac{\gamma _0}{h^{2r}} \left\langle Bu, Bv\right\rangle }_{=\left\langle g, \frac{\gamma _0}{h^{2r}} Bv - \mu \right\rangle } \\&\quad =\left\langle f,v\right\rangle _{V',V} + \left\langle g, \frac{\gamma _0}{h^{2r}} Bv - \mu \right\rangle \end{aligned}$$

We do not give a full analysis of the linear problem herein, but focus on the nonlinear case in the next section; that analysis applies directly to the linear case as well.

4.3 Inequality Constraints

For the subsequent analysis we consider the discrete case, and hence we use the space \(V_h\) for the primal variable and \(H_h'\) for the dual variable. For simplicity, we omit the subscript h on most variables below. We wish to solve the continuous optimization problem

$$\begin{aligned} u = \text{ arginf}_{v \in V} F(v) - \left\langle f,v\right\rangle _{V',V} \text{ such } \text{ that } B v \le 0 \end{aligned}$$
(49)

where the inequality constraint is interpreted in the sense of distributions on H as follows,

$$\begin{aligned} -\left\langle B u, \phi \right\rangle _{H,H'} \le 0, \quad \forall \phi \in \{C^\infty : \phi \le 0\}. \end{aligned}$$
(50)

We will denote the continuous multiplier appearing in the constrained optimization by \(\lambda \in H'\). The weak formulation characterizing the solution to the continuous problem is as follows: find \((u,\lambda ) \in V \times K\), where \(K= \{ \mu \in H' : \mu \le 0 \}\), such that

$$\begin{aligned} a(u;v) - \left\langle \lambda , B v\right\rangle _{H',H} = \left\langle f, v\right\rangle _{V',V},\quad \forall v \in V \end{aligned}$$
(51)
$$\begin{aligned} \left\langle B u, \lambda - \mu \right\rangle _{H,H'} \le 0,\quad \forall \mu \in K \end{aligned}$$
(52)

Here the inequality constraint with test functions in K is defined by (50) and the fact that \(\{C^\infty : \phi \le 0\}\) is dense in K. Choosing \(\mu = \lambda + \phi\) in (52) with \(\phi \le 0\), so that \(\mu \in K\) and \(\lambda - \mu = -\phi \ge 0\), shows that \(Bu \le 0\). Taking \(\mu =0\) gives \(\left\langle B u , \lambda \right\rangle _{H,H'} \le 0\); since both Bu and \(\lambda\) are non-positive, \(\lambda B u \ge 0\) pointwise, and hence \(\lambda B u = 0\).

We have arrived at the following Kuhn–Tucker conditions on the multiplier and side condition:

$$\begin{aligned} B u \le 0,\quad \lambda \le 0, \quad \lambda B u=0. \end{aligned}$$
(53)

Using the analogue of (25), one shows that (53) is formally equivalent to

$$\begin{aligned} \lambda = -{\gamma }\,[Bu-\gamma ^{-1} \, \lambda ]_+,\, \text{ a.e. } \end{aligned}$$
(54)
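The pointwise equivalence of (53) and (54) can be checked mechanically. The following Python sketch, an illustration rather than a proof, evaluates both conditions for scalar values `b` (standing for Bu at a point) and `lam` on a small grid that includes the degenerate cases \(b = 0\) and \(\lambda = 0\); the helper names and the tolerance are our own choices.

```python
def kkt_holds(b, lam, tol=1e-12):
    """Pointwise Kuhn-Tucker conditions (53): b <= 0, lam <= 0, lam * b = 0."""
    return b <= tol and lam <= tol and abs(lam * b) <= tol


def fixed_point_holds(b, lam, gamma, tol=1e-12):
    """Pointwise relation (54): lam = -gamma * [b - lam / gamma]_+."""
    return abs(lam + gamma * max(b - lam / gamma, 0.0)) <= tol


# Check on a small grid that contains the corner cases b = 0 and lam = 0.
for gamma in (0.3, 1.0, 4.0):
    for b in (-1.0, -0.5, 0.0, 0.5):
        for lam in (-1.0, 0.0, 0.5):
            assert kkt_holds(b, lam) == fixed_point_holds(b, lam, gamma)
```

The relation holds for every \(\gamma > 0\): if \(b - \lambda/\gamma \le 0\) it forces \(\lambda = 0\) and \(b \le 0\), and otherwise it forces \(b = 0\) and \(\lambda < 0\).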

To derive the finite element formulation, we again proceed formally, following the discussion of Sect. 2.1 applied to problem (49) with the minimum taken over the finite-dimensional space \(V_h\), and write the augmented Lagrangian, for \(\gamma \in \mathbb {R}^+\) and \((v,\mu ) \in V_h \times H_h'\),

$$\begin{aligned} \mathcal {L}_\text {A}(v,\mu ) := {}&F(v) - \left\langle f,v\right\rangle _{V',V}\nonumber \\&+ \frac{\gamma }{2} \Vert [Bv -\mu /\gamma ]_+\Vert ^2 - \frac{1}{ 2\gamma }\Vert \mu \Vert ^2 \end{aligned}$$
(55)

we note that if \(\gamma\) is chosen as in (46) we may use (44) and (45) to write

$$\begin{aligned} \mathcal {L}_\text {A}(v,\mu ) := {}&F(v) - \left\langle f,v\right\rangle _{V',V} + \frac{\gamma _0}{2} \Vert [Bv -\mu /\gamma ]_+\Vert _{H_h}^2 \nonumber \\&- \frac{1}{ 2\gamma _0}\Vert \mu \Vert _{H_h'}^2 \end{aligned}$$
(56)

The finite element optimality system reads: find \((u_h,\lambda _h) \in V_h \times H_h'\) such that

$$\begin{aligned} A[(u_h,\lambda _h);(v,\mu )] = \left\langle f,v\right\rangle _{V',V} \end{aligned}$$
(57)

for all \((v,\mu ) \in V_h \times H_h'\), where

$$\begin{aligned} A[(w,\eta );(v,\mu )] := {}&a(w; v) - \left\langle \gamma ^{-1} \eta , \mu \right\rangle \\&+ \left\langle \gamma [Bw -\eta /\gamma ]_+ , Bv -\mu / \gamma \right\rangle \end{aligned}$$

Note that in general \(H \ne H'\), and then it is not possible to prove well-posedness of (57) in the continuous spaces. Nevertheless, also in this case a sufficiently smooth solution of the original continuous problem is a solution to (57), showing that the formulation remains consistent.

First we note that for smooth solutions \(\lambda \in K\) and (52) are equivalent to (54). Then evaluating (57) at a sufficiently smooth exact solution \((u,\lambda )\) we see that for all \((v,\mu ) \in V_h \times H_h'\)

$$\begin{aligned} A[(u,\lambda );(v,\mu )] := {}&a(u; v) - \left\langle \gamma ^{-1} \lambda , \mu \right\rangle \nonumber \\&+ \underbrace{\left\langle \gamma [Bu -\lambda /\gamma ]_+, Bv -\mu / \gamma \right\rangle }_{= -\left\langle \lambda , Bv -\mu / \gamma \right\rangle \,\hbox {by } (54)} \nonumber \\ = {}&a(u; v)- \left\langle \lambda , B v\right\rangle _{H',H} \end{aligned}$$
(58)

and hence, by (51), the formulation (57) is consistent for exact solutions \((u,\lambda ) \in V \times (H'\cap L^2)\).

To see the effect of the nonlinear formulation for active and non-active constraints, first assume

$$\begin{aligned} {[}Bw-\eta /\gamma ]_+>0 \end{aligned}$$

in (57). Then the constraint is active and the equation becomes

$$\begin{aligned} a(w; v) - \left\langle \mu , Bw \right\rangle - \left\langle \eta , Bv \right\rangle + \left\langle \gamma Bw, Bv \right\rangle = \left\langle f,v\right\rangle _{V',V} \end{aligned}$$

which we recognise as the augmented Lagrangian form from (47) imposing the equality constraint \(Bw = 0\). If on the other hand \([Bw-\eta /\gamma ]_+=0\) then the constraint is not active and the equation (57) takes the form

$$\begin{aligned} a(w; v) - \left\langle \gamma ^{-1} \eta , \mu \right\rangle = \left\langle f,v\right\rangle _{V',V} \end{aligned}$$

and we see that Bw is free and \(\eta = 0\) is imposed. As expected, the formulation expresses the conditions (53) and acts as a nonlinear switch between imposing either \(B u = 0\) or \(\lambda = 0\).
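The nonlinear switch can be exercised on a small algebraic model. The Python sketch below is not the authors' solver; it assumes \(B = I\) (the constraint \(u \le 0\) at the nodes), a quadratic energy with a 1D P1 stiffness matrix, and Euclidean inner products, and it solves the two equations of (57) with a primal-dual active set iteration, a standard semismooth Newton method for switches of this kind. In each sweep the active set is where \(u - \lambda/\gamma > 0\); there \(u_i = 0\) is imposed, and elsewhere \(\lambda_i = 0\), exactly as in the two cases above. The helper `alm_contact_active_set` is hypothetical.

```python
import numpy as np


def alm_contact_active_set(K, f, gamma, max_iter=100):
    """Hypothetical helper: solve (57) for B = I with Euclidean products.

    The two rows of (57) read
        K u + gamma [u - lam/gamma]_+ = f,
        lam = -gamma [u - lam/gamma]_+ .
    Where u - lam/gamma > 0 (active) this imposes u_i = 0, elsewhere lam_i = 0.
    """
    n = f.size
    u = np.linalg.solve(K, f)  # unconstrained initial guess
    lam = np.zeros(n)
    previous = None
    for _ in range(max_iter):
        active = (u - lam / gamma) > 0.0
        if previous is not None and np.array_equal(active, previous):
            return u, lam  # (u, lam) reproduces its own active set: (57) holds
        free = ~active
        u = np.zeros(n)  # u_i = 0 on the active set
        if free.any():
            u[free] = np.linalg.solve(K[np.ix_(free, free)], f[free])
        lam = np.zeros(n)  # lam_i = 0 on the inactive set
        lam[active] = (K @ u - f)[active]  # from K u - lam = f on active nodes
        previous = active
    raise RuntimeError("active set iteration did not converge")


n = 30
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h  # 1D P1 stiffness
f = h * np.sin(2.0 * np.pi * x)  # lumped load, pushes upward on (0, 1/2)
gamma = 10.0 / h  # gamma_0 / h^{2r} with gamma_0 = 10 and r = 1/2 (assumed)
u, lam = alm_contact_active_set(K, f, gamma)
```

For M-matrices such as this stiffness matrix the iteration typically terminates after a few sweeps; for general forms some globalization may be needed.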

Using the parameter \(\gamma\) introduced in (46) and the h-weighted norms introduced in (44) and (45) together with the inequality \(\vert [a]_+ - [b]_+\vert \le \vert a - b\vert\) [70] we see that the following continuity holds

$$\begin{aligned}&\left\langle \gamma ([Bw_1-\eta _1/\gamma ]_+ - [Bw_2-\eta _2/\gamma ]_+) , Bv - \mu / \gamma \right\rangle \nonumber \\&\quad \lesssim (\Vert B (w_1 - w_2)\Vert _{H_h} + \Vert \eta _1 - \eta _2\Vert _{H_h'})(\Vert Bv \Vert _{H_h} + \Vert \mu \Vert _{H_h'}) \end{aligned}$$
(59)

Together with (42), this shows that the form A is continuous. If \(H\equiv L^2\), the formulation (57) and the bound (59) make sense on the continuous level. Observe that unless \(r=0\) the norms are h-dependent, and hence the bound degenerates for decreasing h.
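The continuity estimate (59) rests on the pointwise Lipschitz bound for \([\cdot ]_+\). The Python sketch below checks that bound on random data, together with the resulting estimate for the nonlinear term measured in plain Euclidean norms (an assumption made for illustration; in (59) the same bound appears in the h-weighted norms).

```python
import numpy as np


def pos(x):
    """Positive part [x]_+ = max(x, 0), elementwise."""
    return np.maximum(x, 0.0)


rng = np.random.default_rng(2)
a, b = rng.standard_normal(10_000), rng.standard_normal(10_000)
# Pointwise Lipschitz bound |[a]_+ - [b]_+| <= |a - b| used for (59).
assert np.all(np.abs(pos(a) - pos(b)) <= np.abs(a - b))

# Consequence for the nonlinear term, with Bw_i and eta_i given as vectors:
# |gamma([Bw1 - eta1/gamma]_+ - [Bw2 - eta2/gamma]_+)| <= gamma|B(w1 - w2)| + |eta1 - eta2|.
gamma = 50.0
bw1, bw2, eta1, eta2 = rng.standard_normal((4, 100))
lhs = np.linalg.norm(gamma * (pos(bw1 - eta1 / gamma) - pos(bw2 - eta2 / gamma)))
rhs = gamma * np.linalg.norm(bw1 - bw2) + np.linalg.norm(eta1 - eta2)
assert lhs <= rhs + 1e-9
```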

4.3.1 Stability, Existence and Uniqueness of Solutions

We will now show that, thanks to the properties (40) - (42), we can derive a priori bounds on \((w,\eta )\) that allow us to prove existence of a solution in the spaces \(V_h \times H_h'\) using fixed point arguments.

Proposition 1

Assume that (34)-(36) and (40)-(42) hold. Then for every fixed h the formulation (57) admits a unique solution \((u_h,\lambda _h) \in V_h\times H_h'\). The solution satisfies the a priori bound

$$\begin{aligned}&\Vert u_h\Vert _V + \gamma _0^{\frac{1}{2}} \Vert [Bu_h- \gamma ^{-1} \lambda _h ]_+ + \gamma ^{-1}\lambda _h \Vert _{H_h} \nonumber \\&\qquad + \gamma _0^{-\frac{1}{2}} \Vert \lambda _h\Vert _{H_h'} \lesssim \Vert f\Vert _{V'} \end{aligned}$$
(60)

Proof

If we can show that the operator A is continuous and satisfies a stability condition, then existence follows using Brouwer’s fixed point theorem and the arguments of [98, Chapter 2, Theorem 4.3] (see also [79, Proposition 4.3] for a discussion of finite element methods and augmented Lagrangian methods). First note that continuity of A follows by (59) and (42). Since h is fixed, there is no need for the continuity constant to be independent of h. Existence of discrete solutions follows from the stability estimate, for all \((w, \eta ) \in V_h \times H_h'\),

$$\begin{aligned} A[(w,\eta ),(w + \alpha _\xi \xi ,-\eta )] \ge {}&\frac{1}{2} \alpha \Vert w\Vert ^2_V + \frac{1}{2} \gamma _0 \Vert [Bw- \gamma ^{-1} \eta ]_+ \nonumber \\&+ \gamma ^{-1}\eta \Vert _{H_h}^2 + \frac{1}{2} \gamma _0^{-1} \alpha _\xi \Vert \eta \Vert _{H_h'}^2 \end{aligned}$$
(61)

where \(\xi \in V_h\) is a function such that

$$\begin{aligned} \left\langle B \xi (\eta ) , q_h\right\rangle = {}&-\left\langle \eta /\gamma , q_h\right\rangle , \forall q_h \in H_h' \nonumber \\&\text{ and } \Vert \xi \Vert _V+ \|B \xi\|_{H_h} \lesssim \gamma _0^{-1} \Vert \eta \Vert _{H_h'} \end{aligned}$$
(62)

(cf. (36)), \(\gamma _0\ge 1\) and \(\alpha _\xi = 1/2 \min (C_{(36)}^{-2}, C_{(42)}^{-2}) \min (1,\gamma _0 \alpha )\), where \(C_{(36)}\) and \(C_{(42)}\) are the constants in the bounds (36) and (42), respectively. The bound (60) follows from (61) since for a solution \((u_h,\lambda _h)\) there holds

$$\begin{aligned} A[(u_h,\lambda _h),(u_h + \alpha _\xi \xi (\lambda _h),-\lambda _h)] = \left\langle f, u_h + \alpha _\xi \xi (\lambda _h)\right\rangle _{V,V'} \end{aligned}$$

Using the duality pairing we see that

$$\begin{aligned} \left\langle f, u_h + \alpha _\xi \xi (\lambda _h)\right\rangle _{V,V'} \le {}&\Vert f\Vert _{V'}(\Vert u_h\Vert _V + \alpha _\xi \Vert \xi (\lambda _h)\Vert _V) \\ \lesssim {}&\Vert f\Vert _{V'}(\Vert u_h\Vert _V + \alpha _\xi \gamma _0^{-1} \Vert \lambda _h\Vert _{H_h'}) \end{aligned}$$

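and the claim follows by absorbing the terms in \(\Vert u_h\Vert _V\) and \(\Vert \lambda _h\Vert _{H_h'}\) into the left-hand side of (61) using Young's inequality.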

To show (61) observe that by testing with \((v,\mu ) = (w,-\eta )\) we have

$$\begin{aligned} \nonumber A[(w,\eta ),(w,-\eta )] = {}&a(w; w) + \gamma _0^{-1}\Vert \eta \Vert _{H_h'}^2 \\&+ \left\langle \gamma [Bw-\eta /\gamma ]_+ , Bw + \eta / \gamma \right\rangle \end{aligned}$$
(63)

By completing the square we see that

$$\begin{aligned} \nonumber&\gamma ^{-1} \Vert \eta \Vert _{L^2}^2 + \left\langle \gamma [Bw -\eta /\gamma ]_+ , Bw + \eta / \gamma \right\rangle = \\&= \gamma _0 \Vert [Bw-\eta / \gamma ]_+ + \gamma ^{-1}\eta \Vert _{H_h}^2 \end{aligned}$$
(64)
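Identity (64) is the key algebraic step of the positivity argument. The following Python sketch checks it on random vectors, using Euclidean products as a stand-in for \(L^2\) (an assumption; recall that \(\gamma \Vert \cdot \Vert ^2 = \gamma _0 \Vert \cdot \Vert _{H_h}^2\) for \(\gamma\) as in (46)). The identity holds because \(\left\langle [x]_+, x\right\rangle = \Vert [x]_+\Vert ^2\) pointwise.

```python
import numpy as np


def lhs_64(bw, eta, gamma):
    """gamma^{-1}|eta|^2 + <gamma [Bw - eta/gamma]_+, Bw + eta/gamma>."""
    p = np.maximum(bw - eta / gamma, 0.0)
    return eta @ eta / gamma + gamma * p @ (bw + eta / gamma)


def rhs_64(bw, eta, gamma):
    """gamma |[Bw - eta/gamma]_+ + eta/gamma|^2, i.e. gamma_0 |.|_{H_h}^2."""
    q = np.maximum(bw - eta / gamma, 0.0) + eta / gamma
    return gamma * q @ q


rng = np.random.default_rng(3)
for gamma in (0.5, 10.0, 1e3):
    bw, eta = rng.standard_normal(50), rng.standard_normal(50)
    assert np.isclose(lhs_64(bw, eta, gamma), rhs_64(bw, eta, gamma))
```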

We conclude that A satisfies the following positivity property, for all \((w,\eta ) \in V_h \times H_h'\),

$$\begin{aligned} \nonumber A[(w,\eta ),(w,-\eta )] = {}&a(w; w) +\\&\gamma _0 \Vert [Bw-\eta /\gamma ]_+ + \gamma ^{-1}\eta \Vert _{H_h}^2 \end{aligned}$$
(65)

Then, since \(\eta \in H_h'\) we can use (36) to choose \(\xi (\eta ) \in V_h\) satisfying (62), and test with \(v =\xi\) and \(\mu =0\) to obtain

$$\begin{aligned} A[(w,\eta ),(\xi ,0)]&= a(w;\xi )+ \gamma \left\langle [Bw-\eta /\gamma ]_+ , B \xi (\eta )\right\rangle \end{aligned}$$
(66)

Now observe that

$$\begin{aligned} \gamma \left\langle [Bw-\eta /\gamma ]_+, B \xi (\eta )\right\rangle = {}&\gamma \left\langle [Bw-\eta /\gamma ]_+ + \gamma ^{-1} \eta , B \xi (\eta )\right\rangle \\ {}&- \underbrace{\left\langle \eta , B \xi (\eta )\right\rangle }_{ = -\gamma _0^{-1} \Vert \eta \Vert _{H_h'}^2} \\ \ge {}&\gamma _0^{-1} \Vert \eta \Vert _{H_h'}^2 - \frac{1}{2} \gamma _0 C_{(36)}^{-2} \Vert B \xi (\eta )\Vert _{H_h}^2 \\ {}&- \frac{1}{2} C_{(36)}^2 \gamma _0 \Vert [Bw-\eta /\gamma ]_+ + \gamma ^{-1} \eta \Vert _{H_h}^2 \end{aligned}$$

and since \(\gamma _0^{\frac{1}{2}} \Vert B \xi (\eta )\Vert _{H_h} \le C_{(36)} \gamma _0^{\frac{1}{2}} \Vert \eta /\gamma \Vert _{H_h} = C_{(36)} \gamma _0^{-\frac{1}{2}} \Vert \eta \Vert _{H_h'}\) we see that

$$\begin{aligned} \gamma \left\langle [Bw-\eta /\gamma ]_+, B \xi (\eta )\right\rangle \ge {}&\frac{1}{2} \gamma _0^{-1} \Vert \eta \Vert _{H_h'}^2 \\ {}&- \frac{1}{2} C_{(36)}^2 \gamma _0 \Vert [Bw-\eta /\gamma ]_+ + \gamma ^{-1} \eta \Vert _{H_h}^2 \end{aligned}$$

Combining (36) with (42), we obtain the bound

$$\begin{aligned} a(w;\xi ) \le C_{(42)} \Vert w\Vert _V \Vert \eta /\gamma \Vert _{H_h} \le C_{(42)} \gamma _0^{-1} \Vert w\Vert _V \Vert \eta \Vert _{H_h'}, \end{aligned}$$

and hence

$$\begin{aligned} \nonumber&a(w;\alpha _\xi \xi )+ \gamma \left\langle [Bw-\eta /\gamma ]_+ , \alpha _\xi B \xi (\eta )\right\rangle \ge \\ \nonumber&\ge - \gamma _0^{-1} \alpha _\xi C_{(42)}^2 \Vert w\Vert _V^2 - \alpha _\xi C_{(36)}^2 \gamma _0 \Vert [Bw-\eta /\gamma ]_+ + \gamma ^{-1} \eta \Vert _{H_h}^2 \\&\qquad + \frac{1}{2} \gamma _0^{-1} \alpha _\xi \Vert \eta \Vert _{H_h'}^2 \end{aligned}$$
(67)

The desired inequality (61) then follows by adding (65) and (67), for \(\gamma _0 \ge 1\) and

$$\begin{aligned} \alpha _\xi = 1/2 \min (C_{(36)}^{-2}, C_{(42)}^{-2}) \min (1,\gamma _0 \alpha ). \end{aligned}$$

If \(H\equiv L^2\), the analysis can be extended to the continuous case; for details see [98, Chapter 1, Lemma 4.3].

Uniqueness follows in principle from [98, Chapter 2, Theorem 2.2], but for completeness we give a simple proof below. For the nonlinearity expressing the constraint, we set \(e=w_1-w_2\) and \(\zeta = \eta _1-\eta _2\) and use the monotonicity property \(([a]_+-[b]_+)(a - b) \ge ([a]_+-[b]_+)^2\) to obtain

$$\begin{aligned} \nonumber&\left\langle \gamma ([Bw_1-\eta _1/\gamma ]_+ - [Bw_2-\eta _2/\gamma ]_+), Be + \zeta / \gamma \right\rangle + \left\langle \gamma ^{-1} (\eta _1 - \eta _2), \zeta \right\rangle \\ \nonumber&\quad = \left\langle \gamma ([Bw_1-\eta _1/\gamma ]_+ - [Bw_2-\eta _2/\gamma ]_+), Be -\zeta / \gamma \right\rangle \\ \nonumber&\qquad +2 \left\langle \gamma ([Bw_1-\eta _1/\gamma ]_+ - [Bw_2-\eta _2/\gamma ]_+), \zeta / \gamma \right\rangle + \gamma _0^{-1} \Vert \zeta \Vert ^2_{H_h'} \\&\quad \ge \gamma _0 \Vert [Bw_1-\eta _1/\gamma ]_+ - [Bw_2-\eta _2/\gamma ]_+ + \gamma ^{-1}\zeta \Vert _{H_h}^2 \end{aligned}$$
(68)
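The inequality (68) reduces to the pointwise monotonicity of \([\cdot ]_+\). The Python sketch below evaluates its left-hand side in the form \(\left\langle \gamma (P_1 - P_2), Be + \zeta /\gamma \right\rangle + \gamma ^{-1}\Vert \zeta \Vert ^2\), with \(P_i = [Bw_i - \eta _i/\gamma ]_+\), which is what the nonlinear part of \(A[(w_1,\eta _1),(e,-\zeta )] - A[(w_2,\eta _2),(e,-\zeta )]\) amounts to, and compares it with the right-hand side on random data. Vectors with Euclidean products replace \(L^2\) functions, an assumption made only for illustration.

```python
import numpy as np


def sides_68(b1, b2, eta1, eta2, gamma):
    """Return (lhs, rhs) of (68) for vectors b_i = B w_i and multipliers eta_i."""
    p1 = np.maximum(b1 - eta1 / gamma, 0.0)
    p2 = np.maximum(b2 - eta2 / gamma, 0.0)
    e_b, zeta = b1 - b2, eta1 - eta2  # Be and zeta
    lhs = gamma * (p1 - p2) @ (e_b + zeta / gamma) + zeta @ zeta / gamma
    d = p1 - p2 + zeta / gamma
    return lhs, gamma * d @ d


rng = np.random.default_rng(4)
for gamma in (0.1, 1.0, 100.0):
    for _ in range(200):
        lhs, rhs = sides_68(*rng.standard_normal((4, 30)), gamma)
        assert lhs >= rhs - 1e-9 * max(1.0, abs(rhs))
```

The difference of the two sides equals \(\gamma \left\langle P_1 - P_2, (a - b) - (P_1 - P_2)\right\rangle\) with \(a - b = Be - \zeta /\gamma\), which is non-negative by the monotonicity property.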

It follows from (41) and (68) that

$$\begin{aligned} \nonumber E_C[(w_1,\eta _1),(w_2,\eta _2)]^2 + \alpha \Vert e\Vert _V^2 \le {}&A[(w_1,\eta _1),(e,-\zeta )] \\ {}&- A[(w_2,\eta _2),(e,-\zeta )] \end{aligned}$$
(69)

where \(E_C\) is the error in the approximation of the contact zone defined by

$$\begin{aligned} E_C[(w_1,\eta _1),(w_2,\eta _2)] := {}&\gamma _0^{\frac{1}{2}} \Vert [B w_1-\eta _1/\gamma ]_+ - [B w_2-\eta _2/\gamma ]_+ + \gamma ^{-1}\zeta \Vert _{H_h} \end{aligned}$$

If we assume that both \(\{w_1,\eta _1\}\) and \(\{w_2,\eta _2\}\) are solutions to (57) it follows that the right hand side of (69) is zero and

$$\begin{aligned} E_C[(w_1,\eta _1),(w_2,\eta _2)]^2 + \alpha \Vert e\Vert _V^2 = 0 \end{aligned}$$

It follows that \(e=0\), so the primal solution is unique. To see that the multiplier is also unique, we once again choose \(\xi (\zeta ) \in V_h\) such that \(B \xi (\zeta ) = -\zeta /\gamma\), in the sense that \(\left\langle B \xi (\zeta ) , q_h\right\rangle = -\left\langle \zeta /\gamma , q_h\right\rangle\) for all \(q_h \in H_h'\), test with \(v = \xi\) and \(\mu =0\), and use arguments similar to those leading to (61) to see that

$$\begin{aligned} 0 ={}&A[(w_1,\eta _1),(\xi (\zeta ) ,0)] - A[(w_2,\eta _2),(\xi (\zeta ) ,0)] \nonumber \\ \ge {}&-C^2\gamma _0^{-1} \underbrace{\Vert e\Vert _V ^2}_{I_1} + \frac{1}{2} \gamma _0^{-1} \Vert \zeta \Vert ^2_{H_h'} \nonumber \\&- \gamma _0 C^2 \underbrace{E_C[(w_1,\eta _1),(w_2,\eta _2)]^2}_{I_2} \end{aligned}$$
(70)

We have already shown in (69) that \(I_1 = I_2 = 0\) if both \(\{w_1,\eta _1\}\) and \(\{w_2, \eta _2\}\) are solutions, hence we conclude that \(\Vert \zeta \Vert _{H_h'} = 0\) which finishes the discussion of (discrete) well-posedness. \(\square\)

4.3.2 Best Approximation Results

In this section we will derive a best approximation result for the solution of (57). Due to the nonconforming character of the ALM we need to assume that the multiplier is in \(H' \cap L^2\). By specifying the approximation properties of our finite element spaces optimal a priori error estimates can be deduced.

Proposition 2

Assume that (34)-(36) and (40)-(42) hold. Let \((u,\lambda ) \in V \times (H' \cap L^2)\) be the solution to (49)-(53) and \((u_h,\lambda _h) \in V_h \times H_h'\) be the solution of (57). Then if \(\Phi [(u,\lambda ),(u_h,\lambda _h)] := E_C[(u,\lambda ),(u_h,\lambda _h)] + \Vert u - u_h\Vert _V + \gamma _0^{-\frac{1}{2}} \Vert \lambda - \lambda _h\Vert _{H_h'}\) there holds

$$\begin{aligned} \nonumber \Phi [(u,\lambda ),(u_h,\lambda _h)] \lesssim {}&\inf _{(v_h,\mu _h) \in V_h \times H_h'} \Bigl (\Vert u - v_h\Vert _V \\ \nonumber&+ \gamma _0^{\frac{1}{2}} \Vert B ( u - v_h)\Vert _{H_h} \\ {}&+ \gamma _0^{-\frac{1}{2}} \Vert \lambda - \mu _h\Vert _{H_h'}\Bigr ) \end{aligned}$$
(71)

Proof

Since (69) holds for all \(w_1,w_2 \in V\) and \(\eta _1, \eta _2 \in H'\cap L^2\), and the exact solution satisfies \((u,\lambda ) \in V \times (H'\cap L^2)\), we may apply it with \(w_1= u\), \(\eta _1 = \lambda\) and \(w_2=u_h\), \(\eta _2 = \lambda _h\) to obtain, with \(e = u - u_h\) and \(\zeta = \lambda - \lambda _h\),

$$\begin{aligned} \nonumber E_C[(u,\lambda ),(u_h,\lambda _h)]^2 + \alpha \Vert e\Vert _V^2 \le {}&A[(u,\lambda ),(e,-\zeta )] \\ {}&- A[(u_h,\lambda _h),(e,-\zeta )] \end{aligned}$$
(72)

Using the consistency of the method we have

$$\begin{aligned}&A[(u,\lambda ),(e,-\zeta )] - A[(u_h,\lambda _h),(e,-\zeta )] = \\&= A[(u,\lambda ),(u - i_F u,\pi _H \lambda - \lambda )] \\&\qquad - A[(u_h,\lambda _h),(u - i_F u,\pi _H \lambda - \lambda )] \end{aligned}$$

By the continuity of a we have

$$\begin{aligned} a(u;u - i_F u) - a(u_h;u - i_F u) \le C \Vert e\Vert _V \Vert u - i_F u\Vert _V \end{aligned}$$

For the nonlinearity imposing the constraint we notice that by the \(L^2\)-orthogonality of \(\pi _H\),

$$\begin{aligned} \gamma ^{-1} \left\langle \zeta , \lambda - \pi _H \lambda \right\rangle = \gamma _0^{-1} \Vert \lambda - \pi _H \lambda \Vert _{H_h'}^2 \end{aligned}$$

and using in addition the properties of \(i_F u\) we have \(\left\langle \pi _H \zeta , B ( u - i_F u) + (\lambda - \pi _H \lambda )/\gamma \right\rangle = 0\) and hence using that \(\pi _H \zeta = \zeta + \pi _H \zeta - \zeta = \zeta - (\lambda -\pi _H \lambda )\),

$$\begin{aligned} \left\langle \gamma ([Bu -\lambda /\gamma ]_+ - [B u_h -\lambda _h/\gamma ]_+), B ( u - i_F u)\right\rangle \nonumber \\ + \left\langle \gamma ([Bu -\lambda /\gamma ]_+ - [B u_h -\lambda _h/\gamma ]_+),(\lambda - \pi _H \lambda )/\gamma \right\rangle = \nonumber \\ = \left\langle \gamma ([B u-\lambda /\gamma ]_+ - [B u_h-\lambda _h/\gamma ]_+) + \zeta , B ( u - i_F u)\right\rangle \nonumber \\ + \left\langle \gamma ([B u-\lambda /\gamma ]_+ - [B u_h-\lambda _h/\gamma ]_+)+ \zeta,(\lambda - \pi _H \lambda )/\gamma \right\rangle \nonumber \\ - \left\langle \lambda - \pi _H \lambda , B ( u - i_F u) + (\lambda - \pi _H \lambda )/\gamma \right\rangle \end{aligned}$$
(73)

Collecting the above estimates and applying the Cauchy-Schwarz inequality and the arithmetic-geometric mean inequality to each term on the right-hand side, we obtain

$$\begin{aligned}&A[(u,\lambda ),(e,-\zeta )] - A[(u_h,\lambda _h),(e,-\zeta )] \le \\&\le \frac{1}{2} (E_C[(u,\lambda ),(u_h,\lambda _h)]^2 + \alpha \Vert e\Vert _V^2) \\&\quad + \frac{C}{\alpha } (\Vert u - i_F u\Vert _V^2 + \gamma _0 \Vert B ( u - i_F u) \Vert _{H_h}^2+ \gamma _0^{-1} \Vert \lambda - \pi _H \lambda \Vert _{H_h'}^2) \end{aligned}$$

It follows that the following error bound holds,

$$\begin{aligned}&E_C[(u,\lambda ),(u_h,\lambda _h)]^2 + \alpha \Vert e\Vert _V^2\\&\lesssim \Vert u - i_F u\Vert _V^2 + \gamma _0 \Vert B ( u - i_F u)\Vert ^2_{H_h} + \gamma _0^{-1} \Vert \lambda - \pi _H \lambda \Vert _{H_h'}^2 \end{aligned}$$

By adding and subtracting \(v_h\), applying the triangle inequality followed by the stability of the Fortin operator (right inequality of (35)) there holds

$$\begin{aligned} \Vert u - i_F u\Vert _V + \gamma _0^{\frac{1}{2}} \Vert B ( u - i_F u)\Vert _{H_h} \lesssim \Vert u - v_h\Vert _V + \gamma _0^{\frac{1}{2}} \Vert B ( u - v_h)\Vert _{H_h} \end{aligned}$$

and we conclude using also the definition of the \(L^2\)-projection \(\pi _H\), that

$$\begin{aligned} \nonumber&E_C[(u,\lambda ),(u_h,\lambda _h)] + \Vert e\Vert _V \\ \nonumber&\lesssim \inf _{(v_h,\mu _h) \in V_h \times H_h'} \Bigl (\Vert u - v_h\Vert _V + \gamma _0^{\frac{1}{2}} \Vert B ( u - v_h)\Vert _{H_h}\\&\qquad \qquad \qquad \qquad \qquad + \gamma _0^{-\frac{1}{2}} \Vert \lambda - \mu _h\Vert _{H_h'}\Bigr ) \end{aligned}$$
(74)

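Turning to the error in the multiplier, let \(\xi (\pi _H \zeta ) \in V_h\) be the function given by (36) with \(\mu _h = \gamma ^{-1} \pi _H \zeta\). Taking \(q_h = \pi _H \zeta\) in (36), we have

$$\begin{aligned} \gamma _0^{-1} \Vert \pi _H \zeta \Vert _{H_h'}^2 = \left\langle \pi _H \zeta , B \xi ( \pi _H \zeta ) \right\rangle \end{aligned}$$

and, by the stability in (36), \(\Vert \xi (\pi _H \zeta )\Vert _V + \Vert B \xi (\pi _H \zeta )\Vert _{H_h} \lesssim \gamma _0^{-1} \Vert \pi _H \zeta \Vert _{H_h'}\).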

Using Galerkin orthogonality, i.e. the consistency of (57) tested with \((v,\mu ) = (\xi (\pi _H \zeta ),0)\), we see that

$$\begin{aligned}&\gamma _0^{-1} \Vert \pi _H \zeta \Vert _{H_h'}^2 = \left\langle \pi _H\lambda - \lambda , B \xi (\pi _H \zeta ) \right\rangle \\&\quad + \left\langle \gamma ([Bu -\lambda /\gamma ]_+ - [B u_h -\lambda _h/\gamma ]_+) + \zeta , B \xi (\pi _H \zeta )\right\rangle \\&\quad + a(u; \xi (\pi _H \zeta )) - a(u_h; \xi (\pi _H \zeta )) \end{aligned}$$

Applying the bound (42) to the last two terms of the right-hand side, the Cauchy-Schwarz inequality to the other terms, and the stability in (36), we see that

$$\begin{aligned}&a(u; \xi (\pi _H \zeta )) - a(u_h; \xi (\pi _H \zeta )) \le \\&\le C_{(42)} \Vert e\Vert _V \Vert \xi (\pi _H \zeta )\Vert _V \lesssim C_{(42)} \Vert e\Vert _V \gamma _0^{-\frac{1}{2}} \Vert \pi _H \zeta \Vert _{H_h'}, \end{aligned}$$
$$\begin{aligned}&\left\langle \pi _H \lambda - \lambda , B \xi (\pi _H \zeta ) \right\rangle \le \\&\le \gamma _0^{-\frac{1}{2}} \Vert \pi _H \lambda - \lambda \Vert _{H_h'} \gamma _0^{\frac{1}{2}} \Vert B \xi (\pi _H \zeta )\Vert _{H_h} \lesssim \\&\lesssim \gamma _0^{-\frac{1}{2}} \Vert \pi _H \lambda - \lambda \Vert _{H_h'} \gamma _0^{-\frac{1}{2}} \Vert \pi _H \zeta \Vert _{H_h'} \end{aligned}$$

and

$$\begin{aligned}&\left\langle \gamma ([Bu -\lambda /\gamma ]_+ - [B u_h -\lambda _h/\gamma ]_+) + \zeta , B \xi (\pi _H \zeta )\right\rangle \\&\quad \lesssim E_C[(u,\lambda ),(u_h,\lambda _h)] \gamma _0^{-\frac{1}{2}} \Vert \pi _H \zeta \Vert _{H_h'}. \end{aligned}$$

Collecting terms and dividing through by \(\gamma _0^{-\frac{1}{2}} \Vert \pi _H \zeta \Vert _{H_h'}\) we have

$$\begin{aligned} \gamma _0^{-\frac{1}{2}} \Vert \pi _H \zeta \Vert _{H_h'} \lesssim E_C[(u,\lambda ),(u_h,\lambda _h)] + \Vert e\Vert _V + \gamma _0^{-\frac{1}{2}}\Vert \pi _H \lambda - \lambda \Vert _{H_h'} \end{aligned}$$

We conclude by applying (74) to the right hand side and the triangle inequality \(\Vert \zeta \Vert \le \Vert \lambda - \pi _H \lambda \Vert + \Vert \pi _H \zeta \Vert\) to obtain,

$$\begin{aligned} \nonumber \gamma _0^{-\frac{1}{2}} \Vert \zeta \Vert _{H_h'} \lesssim {}&\inf _{(v_h,\mu _h) \in V_h \times H_h'} \Bigl (\Vert u - v_h\Vert _V + \gamma _0^{\frac{1}{2}} \Vert B ( u - v_h)\Vert _{H_h} \\&+ \gamma _0^{-\frac{1}{2}} \Vert \lambda - \mu _h\Vert _{H_h'}\Bigr ) \end{aligned}$$
(75)

The claim now follows by combining (74) and (75). \(\square\)

We observe that the natural norm for \(\lambda\) here would be the \(H'\) norm, whereas we measure the error in the corresponding h-weighted \(L^2\)-norm \(\Vert \cdot \Vert _{H_h'}\) instead. Since this norm is h-weighted, the resulting \(L^2\) error estimate is suboptimal compared to the approximation properties of \(H_h'\) in \(L^2\). Recovering control of the error in the \(H'\) norm would require an additional duality argument, which is beyond the scope of this work.

4.3.3 Remark on Stabilized Methods

If the discrete spaces \(V_h\), \(H_h'\) do not satisfy the inf-sup condition (35), one can introduce a stabilization operator \(s(\cdot ,\cdot )\) designed to control the unstable modes. If a stable pair \(V_h\), \({\tilde{H}}_h'\) is known, i.e. a pair for which (35) and (36) are satisfied, where \({\tilde{H}}_h'\) has the same approximation properties as \(H_h'\) up to a constant factor, then a convenient way of choosing s is to use the following design criteria:

  1.

    Control of unstable modes:

    $$\begin{aligned} \gamma _0^{-1/2} \Vert \mu - {\tilde{\pi }}_H \mu \Vert _{H_h'} \lesssim s(\mu ,\mu )^{\frac{1}{2}}, \; \forall \mu \in H_h' + L^2 \end{aligned}$$
    (76)

    where \({\tilde{\pi }}_H\) denotes the \(L^2\) projection on \({\tilde{H}}_h'\).

  2.

    Weak consistency:

    $$\begin{aligned} s(\mu - {\tilde{\pi }}_H \mu ,\mu - {\tilde{\pi }}_H \mu ) \sim \gamma _0^{-1} \Vert \mu - {\tilde{\pi }}_H \mu \Vert _{H_h'}^2, \; \forall \mu \in L^2 \end{aligned}$$
    (77)

    Here the \(\sim\) notation means that the two quantities have the same asymptotics in h for smooth enough \(\mu\).

The simplest choice of s is

$$\begin{aligned} s(\eta ,\mu ) = \gamma ^{-1} \left\langle (\pi _H - {\tilde{\pi }}_H) \eta , \mu \right\rangle \end{aligned}$$
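To see what this simplest stabilization does, the Python sketch below uses hypothetical spaces of our own choosing: \(H_h'\) is piecewise constants on a uniform mesh of size h and \({\tilde{H}}_h'\) is piecewise constants on cells twice as large, so that \({\tilde{\pi }}_H\) is a pairwise average. For \(\eta \in H_h'\) we have \(\pi _H \eta = \eta\), and the check confirms that, for \(\mu \in H_h'\), \(s(\mu ,\mu ) = \gamma _0^{-1} \Vert \mu - {\tilde{\pi }}_H \mu \Vert _{H_h'}^2\), i.e. (76) and (77) hold with equality for this choice, while s vanishes on the coarse space.

```python
import numpy as np


def coarse_projection(mu):
    """Hypothetical tilde pi_H: L^2 projection of a fine P0 function onto P0
    on cells twice as large, i.e. the mean over each pair of fine cells."""
    pair_means = mu.reshape(-1, 2).mean(axis=1)
    return np.repeat(pair_means, 2)


def stabilization(eta, mu, h, gamma):
    """s(eta, mu) = gamma^{-1} <(pi_H - tilde pi_H) eta, mu> for eta in H_h'
    (so pi_H eta = eta); <a, b> = h * sum(a * b) on a uniform mesh."""
    return h * np.dot(eta - coarse_projection(eta), mu) / gamma


n, r, gamma0 = 16, 0.5, 10.0
h = 1.0 / n
gamma = gamma0 / h ** (2 * r)
rng = np.random.default_rng(5)
mu = rng.standard_normal(n)
jump = mu - coarse_projection(mu)
# s(mu, mu) equals gamma_0^{-1} |mu - tilde pi_H mu|_{H_h'}^2, cf. (76)-(77).
norm_dual_sq = h * np.sum((h ** r * jump) ** 2)
assert np.isclose(stabilization(mu, mu, h, gamma), norm_dual_sq / gamma0)
```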

The optimality system of the finite element formulation then reads: find \((u_h,\lambda _h) \in V_h \times H_h'\) such that

$$\begin{aligned} A[(u_h,\lambda _h);(v,\mu )]- s(\lambda _h,\mu ) = \left\langle f,v\right\rangle _{V',V} \end{aligned}$$
(78)

for all \((v,\mu ) \in V_h \times H_h'\), with A as defined following (57).

It is then possible to use the monotonicity, the inf-sup stability (35) together with (76) and (77) to obtain bounds similar to (71) for the error of the stabilized Galerkin approximation. We only sketch the arguments. The only modification of the stability is that the stabilization operator appears in the left hand side. If \(e = u- u_h\) and \(\zeta = \lambda -\lambda _h\) then

$$\begin{aligned} \nonumber&E_C[(u,\lambda ),(u_h,\lambda _h)]^2 + \alpha \Vert e\Vert _V^2+s(\zeta ,\zeta ) \le \\&\le A[(u,\lambda ),(e,-\zeta )] - A[(u_h,\lambda _h),(e,-\zeta )]- s(\zeta ,-\zeta ) \end{aligned}$$
(79)

The key observation for obtaining optimal approximation is to use Galerkin orthogonality with \(u_h - i_F u\) and \(\lambda _h - \tilde{\pi }_H \lambda\), and then to apply a modified continuity estimate. Indeed, by the assumptions we have \(\left\langle {\tilde{\pi }}_H \zeta , B ( u - i_F u) \right\rangle = 0\), and hence we can modify the continuity (73) in the following way,

$$\begin{aligned} \left\langle \gamma ([Bu +\lambda /\gamma ]_+ - [B u_h +\lambda _h/\gamma ]_+), B ( u - i_F u)\right\rangle \\ +\left\langle \gamma ([Bu +\lambda /\gamma ]_+ - [B u_h +\lambda _h/\gamma ]_+),(\lambda - {\tilde{\pi }}_H \lambda )/\gamma \right\rangle =\\ =\left\langle \gamma ([Bu +\lambda /\gamma ]_+ - [B u_h +\lambda _h/\gamma ]_+)+ {\tilde{\pi }}_H \zeta , B ( u - i_F u)\right\rangle \\+ \left\langle \gamma ([Bu +\lambda /\gamma ]_+ - [B u_h +\lambda _h/\gamma ]_+)+ {\tilde{\pi }}_H \zeta , (\lambda - {\tilde{\pi }}_H \lambda )/\gamma \right\rangle \\ = \left\langle \gamma ([B u+\lambda /\gamma ]_+ - [B u_h+\lambda _h/\gamma ]_+) + \zeta , B ( u - i_F u)\right\rangle \\ + \left\langle \gamma ([B u+\lambda /\gamma ]_+ - [B u_h+\lambda _h/\gamma ]_+) + \zeta ,(\lambda - {\tilde{\pi }}_H \lambda )/\gamma \right\rangle \\ - \left\langle \zeta - {\tilde{\pi }}_H \zeta , B( u - i_F u) +(\lambda - {\tilde{\pi }}_H \lambda )/\gamma \right\rangle \end{aligned}$$

where we used that

$$\begin{aligned} \left\langle {\tilde{\pi }}_H \zeta , B( u - i_F u) +(\lambda - {\tilde{\pi }}_H \lambda )/\gamma \right\rangle = 0. \end{aligned}$$

In this expression all but the last term can be bounded in the same fashion as before. For the last term we apply the Cauchy-Schwarz inequality and then (76) to see that

$$\begin{aligned}&\left\langle \zeta - {\tilde{\pi }}_H \zeta , B( u - i_F u) + (\lambda - {\tilde{\pi }}_H \lambda )/\gamma \right\rangle \lesssim \\&\lesssim s(\zeta ,\zeta )^{\frac{1}{2}} (\gamma _0^{\frac{1}{2}} \Vert B( u - i_F u)\Vert _{H_h}+ \gamma _0^{-\frac{1}{2}} \Vert \lambda - {\tilde{\pi }}_H \lambda \Vert _{H_h'}) \end{aligned}$$

where the first factor on the right-hand side is controlled by stability and the second by approximation. This leads to an error estimate for \(u - u_h\). The error in the multiplier can also be estimated using the bound

$$\begin{aligned} \Vert \zeta \Vert _{H'_h} \lesssim \Vert {\tilde{\pi }}_H \zeta \Vert _{H'_h}+ \gamma _0^{\frac{1}{2}} s(\zeta ,\zeta )^{\frac{1}{2}} \end{aligned}$$

which follows from the triangle inequality and (76); the first term on the right-hand side can be controlled as in the inf-sup stable case, and the second is bounded by (79).

4.4 Eliminating the Multiplier

Now we assume that the multiplier can be expressed in terms of the primal variable through a linear operator T on the continuous level, i.e. \(\lambda = T u\), such that the following inequality, which is typically of inverse type, holds for all \(v_h \in V_h\):

$$\begin{aligned} \Vert T v_h\Vert _{H_h'}^2 \le C_\text {I} \Vert v_h\Vert ^2_V \end{aligned}$$
(80)

where \(C_\text {I}\) is a constant that may depend on the mesh geometry but not on the mesh size. We may then write the Nitsche-type form of equation (57): find \(u_h\in V_h\) such that

$$\begin{aligned} A[(u_h,T u_h);(v,T v)] = \left\langle f,v\right\rangle _{V',V} \end{aligned}$$
(81)

for all \(v \in V_h\), where \(A_h\) denotes the form A defined in (58). This formulation, in which the multiplier is eliminated, can be identified as a nonlinear GLS method. For this GLS formulation, existence and uniqueness are ensured without any inf-sup condition [70, Theorem 3.3]. Stability is obtained thanks to the continuity property (80) of the operator T.

We now revisit the analysis of the previous section and show that the same results hold for the case when the multiplier has been eliminated.

4.4.1 Continuity and Stability

We only need to verify (59) for the method (81). We immediately have for \(w_1,w_2,v \in V_h\),

$$\begin{aligned} \left\langle \gamma ([Bw_1-T w_1/\gamma ]_+ - [Bw_2-T w_2/\gamma ]_+) , Bv + T v/\gamma \right\rangle \nonumber \\ \lesssim (\Vert B (w_1 - w_2)\Vert _{H_h} + \Vert T (w_1 - w_2)\Vert _{H_h'})(\Vert Bv \Vert _{H_h} + \Vert T v\Vert _{H_h'}) \nonumber \\ \le C (\Vert B (w_1 - w_2)\Vert _{H_h} + \Vert w_1 - w_2\Vert _{V})(\Vert Bv \Vert _{H_h} + \Vert v\Vert _{V}) \end{aligned}$$
(82)

where we used (80) for the second inequality. To prove the a priori estimate that, together with the continuity, allows for the fixed-point analysis, we test (81) with \(v=u_h\) and use (40) to obtain

$$\begin{aligned}&\alpha \Vert u_h\Vert _V^2 + \gamma _0 \Vert [B u_h -T u_h/\gamma ]_+ \Vert ^2_{H_h} - \gamma _0^{-1} \Vert T u_h\Vert _{H_h'}^2 \le \\&\le A[(u_h,T u_h);(u_h,T u_h)] \end{aligned}$$

Applying (80) to the last term on the left-hand side, we see that

$$\begin{aligned}&(\alpha - C_I/\gamma _0) \Vert u_h\Vert _V^2 + \gamma _0 \Vert [B u_h -T u_h/\gamma ]_+\Vert ^2_{H_h} \le \\&\le A[(u_h,T u_h);(u_h,T u_h)] \end{aligned}$$

We conclude that the stability holds for \(\gamma _0> C_I/\alpha\). Hence under this condition there exists a discrete solution to (81).

4.4.2 Uniqueness and Best Approximation Estimates

Uniqueness and best approximation follow from similar arguments; we only detail the best approximation case. We assume that the exact solution u to (49) is sufficiently smooth that

$$\begin{aligned} A_h[(u,T u);(v,T v)] = \left\langle f,v\right\rangle _{V',V}, \forall v \in V_h \end{aligned}$$
(83)

Then, writing \(e=u-u_h\) and using the notation \(E_C(u,u_h) := \gamma _0 \Vert [Bu -T u/\gamma ]_+ - [B u_h-T u_h/\gamma ]_+\Vert _{H_h}^2\), the monotonicity of a (41) and of \([\cdot ]_+\) gives

$$\begin{aligned}&A_h[(u,T u);(e,T e)] - A_h[(u_h,T u_h);(e,T e)] \ge \\&\ge \alpha \Vert e\Vert _V^2 + E_C(u,u_h) - \gamma _0^{-1} \Vert T e\Vert _{H_h'}^2 \end{aligned}$$

For the last term on the right-hand side, observe that for any \(v_h \in V_h\)

$$\begin{aligned} \Vert T e\Vert _{H_h'}&\le \Vert T (u - v_h)\Vert _{H_h'}+\Vert T (u_h - v_h)\Vert _{H_h'} \\&\le \Vert T (u - v_h)\Vert _{H_h'}+C_I^{1/2} \Vert u_h - v_h\Vert _{V}\\&\le \Vert T (u - v_h)\Vert _{H_h'}+C_I^{1/2} (\Vert e\Vert _{V}+\Vert u - v_h\Vert _{V}) \end{aligned}$$

Hence

$$\begin{aligned}&A_h[(u,T u);(e,T e)] - A_h[(u_h,T u_h);(e,T e)] \ge \nonumber \\&\ge (\alpha - 3 C_I/\gamma _0) \Vert e\Vert _V^2 + E_C(u,u_h)\nonumber \\&\qquad - 3 \gamma _0^{-1} \Vert T (u - v_h)\Vert _{H_h'}^2 - 3 C_I \gamma _0^{-1} \Vert u - v_h\Vert _{V}^2 \end{aligned}$$
(84)

Fix \(\gamma _0 = 6 C_I/\alpha\), so that \(\alpha - 3 C_I/\gamma _0 = \alpha /2\). Considering the left-hand side, we have, using (83), for all \(v_h \in V_h\)

$$\begin{aligned}&A_h[(u,T u);(e,T e)] - A_h[(u_h,T u_h);(e,T e)] = \\&=A_h[(u,T u);(u-v_h,T (u-v_h))] \\&\qquad - A_h[(u_h,T u_h);(u-v_h,T (u-v_h))] \end{aligned}$$

To conclude we use the continuity (42) and the arithmetic-geometric inequality,

$$\begin{aligned} a(u;u-v_h) - a(u_h;u-v_h) \le {}&C \Vert e\Vert _V \Vert u-v_h\Vert _V \\ \le {}&\frac{\alpha }{4} \Vert e\Vert _V^2 + \frac{C^2}{\alpha } \Vert u-v_h\Vert _V^2 \end{aligned}$$

together with the Cauchy-Schwarz inequality and the arithmetic-geometric inequality,

$$\begin{aligned} \left\langle \gamma ([Bu -T u/\gamma ]_+ - [B u_h-T u_h/\gamma ]_+) , B(u - v_h)\right\rangle \\ + \left\langle \gamma ([Bu -T u/\gamma ]_+ - [B u_h-T u_h/\gamma ]_+) ,T (u - v_h)/\gamma \right\rangle \\ \le \frac{1}{2} E_C(u,u_h) + \gamma _0 \Vert B(u - v_h)\Vert _{H_h}^2 + \gamma _0^{-1} \Vert T (u - v_h)\Vert _{H_h'}^2. \end{aligned}$$

Applying these inequalities in (84) we see that for all \(v_h \in V_h\)

$$\begin{aligned} \alpha \Vert e\Vert _V^2 + E_C(u,u_h) \lesssim {}&\Vert u-v_h\Vert _V^2 + \Vert B(u - v_h)\Vert _{H_h}^2 \\&+\Vert T (u - v_h)\Vert _{H_h'}^2 \end{aligned}$$

Taking square roots of both sides and the infimum over \(v_h \in V_h\) on the right-hand side, we conclude

$$\begin{aligned} E_C(u,u_h)^{\frac{1}{2}} +\Vert e\Vert _V \lesssim {}&\inf _{v_h \in V_h} \Bigl (\Vert u-v_h\Vert _V + \Vert B(u - v_h)\Vert _{H_h} \nonumber \\&+\Vert T (u - v_h)\Vert _{H_h'}\Bigr ) \end{aligned}$$
(85)

We have sketched a best approximation result for the formulation (81). Observe that no condition needs to be imposed on the finite element space in this case. Instead, stability is ensured by the inverse inequality (80), which bounds the \(H_h'\)-norm of the multiplier expressed in the primal variable by the V-norm of the primal variable. By equivalence of norms on finite-dimensional spaces, a bound of this form always holds; that the constant \(C_\text {I}\) is independent of the mesh size relies on the proper h-scaling of the discrete norms given in (44) and (45), which is also the key to the optimality of the estimate.

We now turn to specific examples.

5 Applications

5.1 The Stokes Problem with Cavitation

Consider a domain \(\Omega\) in \({\mathbb {R}}^n\), \(n=2\) or \(n=3\), whose boundary \(\partial \Omega\) is composed of two subsets \(\Gamma _D\) and \(\Gamma _N\) such that \(\partial \Omega = {\bar{\Gamma }}_D \cup {\bar{\Gamma }}_N\). We consider a lubricant with viscosity \(\mu\). The Stokes equations can then be written

$$\begin{aligned} -\mu \Delta {\varvec{u}} +\nabla p = {{\varvec{f}}}\;\text {and}\; \nabla \cdot {\varvec{u}} = 0\quad \text {in}\;\Omega , \end{aligned}$$
(86)

with \({\varvec{u}}=0\) on \(\Gamma _D\) and \((-p{\varvec{I}} + \mu \nabla {\varvec{u}})\cdot {\varvec{n}} = {\varvec{0}}\) on \(\Gamma _N\). Here, \({\varvec{u}}\) is the velocity of the lubricant, p is the pressure, and \({ {\varvec{f}}}\) is a force term. The lubricant cannot support subatmospheric pressure, so an additional condition is \(p\ge 0\) in \(\Omega\). To incorporate this condition into the model, we write the problem as a variational inequality as follows. Let

$$\begin{aligned} a(\varvec{u},\varvec{v}) := \int _{\Omega }\mu \nabla \varvec{u}:\nabla \varvec{v}\, \textrm{d}\Omega , \quad L(\varvec{v}) := \int _{\Omega } \varvec{f}\cdot \varvec{v}\, \textrm{d}\Omega \end{aligned}$$

and

$$\begin{aligned} K=\{p\in L^2(\Omega ):\quad p\ge 0\} \end{aligned}$$

Seek \({\varvec{u}}\in [H^1(\Omega )]^n\) with \({\varvec{u}}={\varvec{0}}\) on \(\Gamma _D\), and \(p\in K\), such that

$$\begin{aligned} a(\varvec{u},\varvec{v}) -\int _{\Omega } p\,\nabla \cdot {{\varvec{v}}} \, d\Omega = L(\varvec{v}) , \end{aligned}$$
(87)

for all \({{\varvec{v}}} \in [H^1(\Omega )]^n\) with \({{\varvec{v}}}={\varvec{0}}\) on \(\Gamma _D\), and

$$\begin{aligned} -\int _{\Omega }\nabla \cdot {\varvec{u}}\, (q-p) \, d\Omega \le 0,\quad \forall q\in K \end{aligned}$$
(88)

To rewrite this problem as a variational equality, we use the Kuhn-Tucker conditions

$$\begin{aligned} p\ge 0, \quad \nabla \cdot \varvec{u} \ge 0, \quad p\,\nabla \cdot \varvec{u} = 0 \end{aligned}$$
(89)

and again replace conditions (89) by the equivalent statement

$$\begin{aligned} p = \gamma _0[\gamma _0^{-1} p -\nabla \cdot \varvec{u}]_+ \end{aligned}$$
(90)

with \(\gamma _0\) a positive number. Note that the abstract spaces H and \(H'\) can be identified with \(L^2(\Omega )\), and that here the pressure cannot easily be interpreted as coming from a linear operator acting on the velocity, so we are in cases A and D of Sect. 4: the pressure has to be retained, but \(r=0\) in the discrete norms.
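As a small illustration, not part of the original analysis, the pointwise equivalence of the complementarity conditions (89) and the fixed-point form (90) can be checked numerically. In the Python sketch below, the scalars `p` and `div_u` stand for the values of \(p\) and \(\nabla \cdot \varvec{u}\) at a single point, and the helper names `satisfies_kkt` and `projection_residual` are hypothetical.

```python
def satisfies_kkt(p, div_u, tol=1e-12):
    """Check the complementarity conditions (89) at a single point."""
    return p >= -tol and div_u >= -tol and abs(p * div_u) <= tol


def projection_residual(p, div_u, gamma0):
    """Residual of the fixed-point form (90): p - gamma0 * [p/gamma0 - div u]_+."""
    return p - gamma0 * max(p / gamma0 - div_u, 0.0)


# The residual of (90) vanishes exactly on the pairs that satisfy (89),
# whatever positive value of gamma0 is used.
sample_values = [-1.0, 0.0, 0.5, 2.0]
for gamma0 in (0.1, 1.0, 10.0):
    for p in sample_values:
        for div_u in sample_values:
            kkt = satisfies_kkt(p, div_u)
            zero_residual = abs(projection_residual(p, div_u, gamma0)) < 1e-12
            assert kkt == zero_residual
```

The loop runs silently when the two characterisations agree on every sample pair, for each of the three choices of \(\gamma _0\).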

Defining the function spaces

$$\begin{aligned} V =\{\varvec{v}\in [H^1(\Omega )]^n: \; \varvec{v} =\varvec{0}\;\text {on } \Gamma _D\}, \; Q = L^2(\Omega ) \end{aligned}$$
(91)

we seek stationary points \((\varvec{u},p)\in V\times Q\) of the functional

$$\begin{aligned} \nonumber \mathcal {L}_\text {A}(\varvec{u},p) := {}&\frac{1}{2} a(\varvec{u},\varvec{u}) - L(\varvec{u}) \\ \nonumber&+ \int _{\Omega }\frac{\gamma _0}{2}\left[ \gamma _0^{-1} p-\nabla \cdot \varvec{u}\right] _+^2d\Omega \\ {}&-\int _{\Omega }\frac{1}{2\gamma _0} p^2d\Omega \end{aligned}$$
(92)

analogously to (55).

For the discrete problem, we use the inf-sup stable Taylor–Hood approximation, which utilises the finite element space

$$\begin{aligned} \textbf{V}_h = {}&\{\varvec{v}:\varvec{v} \in \left[ C^0(\Omega )\right] ^n,\; \varvec{v}\vert _K\in [ P^2(K)]^n, \ \forall K\in {{\mathcal {T}}}^h,\; \\&\varvec{v}=\varvec{0} \text { on } \Gamma _D\} \end{aligned}$$

for the velocity, where \(P^2(K)\) denotes the space of polynomials of degree at most two on K, and the space \(Q_h\) of continuous piecewise linears for the pressure:

$$\begin{aligned} Q_{h} = \{p\in C^0(\Omega ):\; p\vert _{K} \in P^{1}(K),\,\forall K\in {{\mathcal {T}}}^h\}. \end{aligned}$$
(93)

The finite element method based on (92) is to find \((\varvec{u}_h,p_h)\in \textbf{V}_h\times Q_{h}\) such that

$$\begin{aligned} a(\varvec{u}_h,\varvec{v}) -\int _{\Omega }{\gamma _0}[\gamma _0^{-1} p_h-\nabla \cdot \varvec{u}_h ]_+ \nabla \cdot \varvec{v}d\Omega = (\varvec{f},\varvec{v}) \end{aligned}$$
(94)

for all \(\varvec{v}\in \textbf{V}_h\), and

$$\begin{aligned} \int _{\Omega }\left( {\gamma _0}[\gamma _0^{-1} p_h-\nabla \cdot \varvec{u}_h ]_+- p_h\right) qd\Omega =0 \end{aligned}$$
(95)

for all \(q\in Q_{h}\).

5.1.1 Satisfaction of Assumptions for the Abstract Analysis

For the present problem we have \(V=[H^1(\Omega )]^n\) and \(H = H_h=H_h' = L^2(\Omega )\). The constraint operator B is the divergence operator. It is well known that the Taylor–Hood element admits a Fortin interpolant satisfying

$$\begin{aligned} \Vert \nabla \pi _F \varvec{v}\Vert _\Omega \lesssim \Vert \nabla \varvec{v}\Vert _\Omega , \forall \varvec{v} \in [H^1(\Omega )]^n. \end{aligned}$$

Since \(\Vert \nabla \cdot \pi _F \varvec{v}\Vert _\Omega \lesssim \Vert \nabla \pi _F \varvec{v}\Vert _\Omega\), the relation (35) holds. This means that for all \(\mu _h \in Q_h\) there exists \(\varvec{v}_h \in \textbf{V}_h\) such that \((\nabla \cdot \varvec{v}_h, q_h)_\Omega = (\mu _h, q_h)_\Omega\) for all \(q_h \in Q_h\) and \(\Vert \varvec{v}_h\Vert _V \lesssim \Vert \mu _h\Vert _\Omega\). Hence (36) is also satisfied.

Since \(a(\cdot ,\cdot )\) is a bilinear form in this case, (40)–(42) follow from standard arguments. Hence the assumptions of Sect. 4 are satisfied, and we conclude that the best approximation estimate (71) holds.

5.2 Weak Imposition of Dirichlet Boundary Conditions

5.2.1 Model Problem

Let us first consider the Poisson model problem: find \(u:\Omega \rightarrow \mathbb {R}\) such that

$$\begin{aligned} -\Delta u= f ~\text {in} ~\Omega ,\quad u=g ~ \text {on} ~\Gamma :=\partial \Omega \end{aligned}$$
(96)

where \(\Omega\) is a bounded domain in two or three space dimensions, with outward pointing normal \(\varvec{n}\), and f and g are given functions. For simplicity, we shall assume that \(\Omega\) is polyhedral (polygonal). A classical way of prescribing \(u=g\) on the boundary is to pose the problem (96) as a minimisation problem with side conditions and seek stationary points to the functional

$$\begin{aligned} \mathcal {L}(v,\mu ) := \frac{1}{2} a(v,v) - \left\langle \mu ,v-g\right\rangle _{H^{-1/2}(\Gamma ),H^{1/2}(\Gamma )} - (f,v)_\Omega \end{aligned}$$
(97)

where

$$\begin{aligned} (f,v)_\Omega := \int _{\Omega }f v \, d\Omega , \quad a(u,v) := \int _\Omega \nabla u\cdot \nabla v\, d\Omega \end{aligned}$$
(98)

and \(\left\langle \mu ,v-g\right\rangle _{H^{-1/2}(\Gamma ),H^{1/2}(\Gamma )}\) is interpreted as a duality pairing on \(H^{-1/2}(\Gamma ) \times H^{1/2}(\Gamma )\). We are thus in case B of Sect. 4, and the proposed method only makes sense on discrete spaces.

The stationary points to (97) are given by finding \((u,\lambda )\in H^1(\Omega )\times H^{-1/2}(\Gamma )\) such that

$$\begin{aligned} a(u,v) - \left\langle \lambda ,v\right\rangle _{H^{-1/2}(\Gamma ),H^{1/2}(\Gamma )}= & {} (f,v)\quad \forall v\in H^1(\Omega ) \end{aligned}$$
(99)
$$\begin{aligned} \left\langle \mu ,u\right\rangle _{H^{-1/2}(\Gamma ),H^{1/2}(\Gamma )}= & {} \left\langle \mu ,g\right\rangle _{H^{-1/2}(\Gamma ),H^{1/2}(\Gamma )}\; \forall \mu \in H^{-1/2}(\Gamma ) \end{aligned}$$
(100)

As mentioned above, the discretisation of this problem requires balancing of the discrete spaces for the multiplier \(\lambda\) and the primal solution u in order for the method to be stable.

5.2.2 The Augmented Lagrangian Method for Boundary Conditions

The Lagrangian in (97) is augmented by a penalty term scaled by a parameter \(\gamma \in \mathbb {R}^+\) so that we seek stationary points to

$$\begin{aligned} \nonumber \mathcal {L}_A(v,\mu ) := {}&\frac{1}{2} a(v,v) - \left\langle \mu ,v-g\right\rangle _{H^{-1/2}(\Gamma ),H^{1/2}(\Gamma )} \\ {}&+ \frac{\gamma }{2}\Vert (v-g)\Vert ^2_{H^{1/2}(\Gamma )} - (f,v)_\Omega \end{aligned}$$
(101)

We note that the continuous norms imply \(r=1/2\) in the discrete norms. To find the stationary points we seek \((u,\lambda )\) such that, for all \((v,\mu )\in H^1(\Omega )\times H^{-1/2}(\Gamma )\),

$$\begin{aligned}&a(u,v ) - \left\langle \lambda ,v\right\rangle _{H^{-1/2}(\Gamma ),H^{1/2}(\Gamma )} + \gamma (u, v)_{H^{1/2}(\Gamma )} \\&\qquad + \left\langle \mu,u \right\rangle _{H^{-1/2}(\Gamma ),H^{1/2}(\Gamma )} = \\ {}&= (f,v)_\Omega + \gamma \langle g, v\rangle _{H^{1/2}(\Gamma )} + \left\langle \mu,g \right\rangle _{H^{-1/2}(\Gamma ),H^{1/2}(\Gamma )} \end{aligned}$$

To determine the Lagrange multiplier \(\lambda\), we set \(\mu = 0\) and integrate by parts; since the exact solution satisfies \(u = g\) on \(\Gamma\), the penalty term vanishes, which gives

$$\begin{aligned} (-\Delta u - f, v )_\Omega + \langle \nabla _n u - \lambda , v\rangle _{H^{-1/2}(\Gamma ),H^{1/2}(\Gamma )} = 0 \end{aligned}$$
(102)

Since \(-\Delta u = f\) in \(\Omega\), the first term vanishes and we conclude that \(\lambda = \nabla _n u\).

We now wish to find a stable discrete counterpart to this optimisation problem. To this end, let \(\mathcal {T}_h\) be a family of quasi-uniform partitions, with mesh parameter h, of \(\Omega\) into shape-regular triangles or tetrahedra T, and define the discrete space

$$\begin{aligned} V_h := \{v_h \in H^1(\Omega ): v_h\vert _T \in \mathbb {P}_k(T), \, \forall T \in {\mathcal {T}_h} \}, \end{aligned}$$
(103)

for \(k\ge 1\), and some discrete space \(Q_h\) (not explicitly defined) for the approximation of the Lagrange multiplier.

We first follow the idea of (44) and replace the \(H^{1/2}\)-norm by the discrete counterpart \(h^{-1/2} \Vert \cdot \Vert _{L^2(\Gamma )}\), which, by an inverse estimate, dominates the \(H^{1/2}(\Gamma )\)-norm on \(V_h\):

$$\begin{aligned} \Vert v\Vert ^2_{H^{1/2}(\Gamma )}\lesssim h^{-1} \Vert v\Vert ^2_{L^2(\Gamma )} \quad \forall v\in V_h \end{aligned}$$
(104)

and introduce the problem of finding the stationary point in \(V_h\times Q_h\) of the discrete Lagrangian

$$\begin{aligned} \nonumber \mathcal {L}_A^h(v,\mu ) :={}&\frac{1}{2} a(v,v) - ({\mu },{v-g})_\Gamma +\frac{\gamma _0}{2h}\Vert v-g\Vert ^2_{L^2(\Gamma )}\\ {}&- (f,v)_\Omega \end{aligned}$$
(105)

Recalling next that, formally, the Lagrange multiplier in (99) is given by \(\lambda = \nabla _n u\), which provides a direct way of computing the multiplier from the primal solution, we replace \(\mu\) by \(\nabla _n v\) in (105) and obtain

$$\begin{aligned} \nonumber \mathcal {L}_A^h(v) := {}&\frac{1}{2} a(v,v) - (\nabla _n{v},{v-g})_{\Gamma } + \frac{\gamma _0}{2h}\Vert v-g \Vert ^2_{L^2(\Gamma )} \\ {}&- (f,v)_\Omega \end{aligned}$$
(106)

This is our stabilised ALM; its minimiser solves the problem of finding \(u_h\in V_h\) such that

$$\begin{aligned} \nonumber&a(u_h,v)-({\nabla _n u_h},{v})_{\Gamma }-({\nabla _n v},{u_h})_{\Gamma } +\gamma _0 h^{-1} ({u_h},{v})_\Gamma =\\&= l(v)\quad \forall v\in V_h \end{aligned}$$
(107)

where

$$\begin{aligned} l(v) :=(f,v)+(\gamma _0 h^{-1}{v} - {\nabla _n v},{g})_\Gamma \end{aligned}$$
(108)

We identify the classical method of Nitsche [18], stable if \(\gamma _0\) is chosen so that \(\gamma _0> \gamma _C\), where \(\gamma _C\) is the constant in the inverse inequality

$$\begin{aligned} h \Vert \nabla _n v\Vert _{L^2(\Gamma )}^2 \le \gamma _C \Vert \nabla v\Vert _{L^2(\Omega )}^2 \end{aligned}$$
(109)
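For continuous piecewise linear elements on a uniform one-dimensional mesh, \(h\vert v'\vert ^2\) at an end point equals the squared \(L^2\)-norm of \(v'\) over the adjacent element, so (109) holds with \(\gamma _C = 1\) as soon as there are at least two elements. The Python sketch below is a one-dimensional analogue of (107), chosen for this illustration and not taken from the text: it assembles the symmetric Nitsche matrix for \(-u''\) on (0, 1) with weak conditions at both end points and inspects its smallest eigenvalue. The function name `nitsche_matrix_1d` is hypothetical.

```python
import numpy as np


def nitsche_matrix_1d(n_el, gamma0):
    """Matrix of the symmetric Nitsche form (107) for -u'' on (0, 1).

    Continuous P1 elements on a uniform mesh; u = g is imposed weakly at
    both end points, which together form the boundary Gamma.
    """
    h = 1.0 / n_el
    n = n_el + 1
    A = np.zeros((n, n))
    # a(u, v) = int u' v' dx, assembled element by element
    for e in range(n_el):
        idx = [e, e + 1]
        A[np.ix_(idx, idx)] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    # (boundary node, interior neighbour) at x = 0 and at x = 1
    for node, nbr in ((0, 1), (n - 1, n - 2)):
        val = np.zeros(n)
        val[node] = 1.0                         # v -> v at the boundary point
        dn = np.zeros(n)
        dn[node], dn[nbr] = 1.0 / h, -1.0 / h   # v -> outward normal derivative
        # -(d_n u, v) - (d_n v, u) + gamma0/h (u, v) at this boundary point
        A += -np.outer(val, dn) - np.outer(dn, val) + (gamma0 / h) * np.outer(val, val)
    return A


for gamma0 in (0.9, 1.5):
    lam_min = np.linalg.eigvalsh(nitsche_matrix_1d(20, gamma0)).min()
    print(f"gamma0 = {gamma0}: smallest eigenvalue {lam_min:+.3e}")
```

The smallest eigenvalue is negative for \(\gamma _0 = 0.9\), since the function that is nonzero only at the last node already makes the form negative, and positive for \(\gamma _0 = 1.5\), in line with the condition \(\gamma _0 > \gamma _C = 1\).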

Remark 1

As shown by Stenberg [19] (and discussed in Sect. 4.4), Nitsche’s method can be viewed as a particular instance of the GLS stabilisation method of Barbosa–Hughes [99]; in this sense the ALM is a variant of GLS, with the multiplier eliminated.

Remark 2

We note that the ALM leads to the symmetric form of Nitsche’s method. The corresponding unsymmetric forms, as discussed, e.g., in [73], are derived using different arguments.

5.3 Inequality Boundary Conditions

An important feature of the augmented Lagrangian approach is that it can be extended to the case of inequality constraints. We consider the problem: find \(u:\Omega \rightarrow \mathbb {R}\) such that

$$\begin{aligned} -\Delta u= f ~\text {in} ~\Omega ,\quad u-g\le 0 ~ \text {on} ~\Gamma \end{aligned}$$
(110)

We have the following Kuhn–Tucker conditions on the multiplier and side condition:

$$\begin{aligned} u -g \le 0,\quad \lambda \le 0, \quad \lambda (u-g)=0. \end{aligned}$$
(111)

We now use the analogue of (25): the conditions (111) are equivalent to

$$\begin{aligned} \lambda = -{\gamma }\,[u-g-\gamma ^{-1}\, \lambda ]_+ \end{aligned}$$
(112)

first used in this context by Alart and Curnier [17]. This opens another route to the augmented Lagrangian method. Taking the discrete counterpart of the standard multiplier equilibrium equation (99), we find

$$\begin{aligned} \nonumber (f,v) = {}&a(u_h,v)-({\lambda _h},{v})_\Gamma \\ \nonumber = {}&a(u_h,v)-({\lambda _h},{v-\gamma ^{-1}\mu })_\Gamma \\&-({\gamma ^{-1}\lambda _h},{\mu })_\Gamma \end{aligned}$$
(113)

for all \(v\in V_h\) and arbitrary \(\mu \in Q_h\). Using now (112), we find

$$\begin{aligned} \nonumber (f,v) = {}&a(u_h,v)+({{\gamma }\,[u_h-g-\gamma ^{-1}\, \lambda _h]_+},{v-\gamma ^{-1}\mu })_\Gamma \\&-({\gamma ^{-1}\lambda _h},{\mu })_\Gamma \quad \forall (v,\mu )\in V_h\times Q_h. \end{aligned}$$
(114)

This is the optimality system for the Lagrangian

$$\begin{aligned} \nonumber \mathcal {L}_A^h(v,\mu ) := {}&\frac{1}{2} a(v,v) +\frac{1}{2} \Vert \gamma ^{1/2}[v-g-\gamma ^{-1} \mu ]_+\Vert _{L^2(\Gamma )}^2 \\ {}&-\frac{1}{2}\Vert \gamma ^{-1/2}\mu \Vert ^2_{L^2(\Gamma )} - (f,v)_\Omega \end{aligned}$$
(115)

cf. [17]. Approximating \(\lambda _h\approx \partial _n u_h\), and setting \(\mu =\partial _n v\), leads to the Nitsche method first introduced by Chouly and Hild in the context of elastic contact [70]. We seek \(u_h\in V_h\) such that

$$\begin{aligned} \nonumber&a(u_h,v) +({{\gamma }\,[u_h-g-\gamma ^{-1}\, \partial _nu_h]_+},{v-\gamma ^{-1}\partial _n v})_\Gamma \\&-({\gamma ^{-1}\partial _n u_h},{\partial _n v})_\Gamma =(f,v)_\Omega \quad \forall v \in V_h \end{aligned}$$
(116)

The solution to this problem is the minimiser of the nonlinear augmented Lagrangian

$$\begin{aligned} \nonumber \mathcal {L}_A^h(v) :={}&\frac{1}{2} a(v,v) + \frac{1}{2} \Vert \gamma ^{1/2}[v-g-\gamma ^{-1} \partial _nv]_+\Vert _{L^2(\Gamma )}^2 \\&-\frac{1}{2}\Vert \gamma ^{-1/2}\partial _n v\Vert ^2_{L^2(\Gamma )} - (f,v)_\Omega \end{aligned}$$
(117)

Again, we choose \(\gamma = \gamma _0/h\). Variants and several extensions of (116) can be found in [83]. We remark that (116) coincides with (107) where there is contact, while where there is no contact the boundary term reduces to \(-({\gamma ^{-1}\partial _n u_h},{\partial _n v})_\Gamma\), which is consistent with \(\partial _n u = 0\) there. This term does not destroy the coercivity of the problem provided \(\gamma _0 > \gamma _C\), with \(\gamma _C\) the constant in (109).
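To indicate how (116) may be solved in practice, the Python sketch below applies it to a one-dimensional analogue that is an assumption of this example, not part of the text: \(-u'' = 1\) on (0, 1), \(u(0) = 0\) imposed strongly, and the inequality condition of (110) only at \(x = 1\), with continuous P1 elements and \(\gamma = \gamma _0/h\). Because \([\cdot ]_+\) is piecewise linear, the nonlinear equation is solved with a primal active-set iteration, each step of which coincides with a semismooth Newton step: the boundary point is treated as in contact whenever \(u_h - g - \gamma ^{-1}\partial _n u_h > 0\). The function name `nitsche_contact_1d` is hypothetical.

```python
import numpy as np


def nitsche_contact_1d(n_el, g, gamma0=10.0, max_iter=20):
    """P1 solution of (116) for -u'' = 1 on (0, 1) with u(0) = 0 imposed
    strongly and u - g <= 0, d_n u <= 0, (u - g) d_n u = 0 at x = 1."""
    h = 1.0 / n_el
    n = n_el + 1
    A = np.zeros((n, n))
    b = np.zeros(n)
    for e in range(n_el):
        idx = [e, e + 1]
        A[np.ix_(idx, idx)] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        b[idx] += h / 2.0                      # exact load vector for f = 1
    gamma = gamma0 / h
    val = np.zeros(n)
    val[-1] = 1.0                              # v -> v(1)
    dn = np.zeros(n)
    dn[-1], dn[-2] = 1.0 / h, -1.0 / h         # v -> d_n v(1) = v'(1)
    r = val - dn / gamma                       # v -> v(1) - gamma^{-1} d_n v(1)
    base = A - np.outer(dn, dn) / gamma        # a(u, v) - (gamma^{-1} d_n u, d_n v)
    u = np.zeros(n)
    active = False                             # start from the no-contact guess
    for _ in range(max_iter):
        M, rhs = base.copy(), b.copy()
        if active:                             # [.]_+ term on the contact branch
            M += gamma * np.outer(r, r)
            rhs += gamma * g * r
        u[1:] = np.linalg.solve(M[1:, 1:], rhs[1:])
        new_active = bool(r @ u - g > 0.0)     # sign of u - g - gamma^{-1} d_n u
        if new_active == active:
            return u, active
        active = new_active
    raise RuntimeError("active-set iteration did not converge")
```

For this model problem the exact solution is \(u = -x^2/2 + (g + 1/2)x\) with contact when \(g < 1/2\), and \(u = -x^2/2 + x\) without contact otherwise; for instance, `nitsche_contact_1d(200, 0.2)` reports contact and its nodal values agree closely with \(-x^2/2 + 0.7x\).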

Remark 3

In the GLS stabilisation for variational inequalities proposed by Barbosa and Hughes [20], no penalty is added to the Lagrangian; the multiplier is not eliminated, and their approach is a stabilised Lagrange multiplier method which requires the solution of an inequality problem. It is also possible to retain the multiplier in the ALM and add GLS stabilisation to the augmented Lagrangian. This approach, which also leads to a nonlinear equality problem, was explored in [103].

5.3.1 Satisfaction of Assumptions for the Abstract Analysis

In this case \(V=H^1(\Omega )\), \(H=H^{\frac{1}{2}}(\partial \Omega )\) and \(H' = H^{-\frac{1}{2}}(\partial \Omega )\). However, the solution to (110) is known to have the additional regularity \(u \in H^{\frac{3}{2}+\epsilon }(\Omega )\), \(\epsilon >0\), see [114]. It follows that \(\partial _n u \in L^2(\partial \Omega )\) and that the discrete norms \(H_h\) and \(H_h'\) defined by (44) and (45) are well defined on the exact solution. While (109) is then enough for the formulation (116) to satisfy the assumptions needed for the analysis of Sect. 4.4, the formulation (115) still requires (35) and (36) to be satisfied. For a characterisation of spaces satisfying these conditions (in the h-weighted \(L^2\)-norm) we refer to [104]. An example of such a construction in two space dimensions is to take an elementwise constant approximation for \(Q_h\) and let \(V_h\) consist of continuous piecewise quadratic functions, or of continuous piecewise affine functions enriched with a quadratic bubble on each boundary face of the elements adjacent to the boundary. The Fortin interpolant can then be constructed by first defining the nodal degrees of freedom using any \(H^1\)-stable interpolant and then fixing the degree of freedom associated with the bubble on each boundary face so that (35) and (36) are satisfied; indeed, here the two conditions are equivalent. The same construction may be used in the following sections.

5.4 A Model for Elastic Contact

5.4.1 Treatment of Robin Boundary Conditions

To show the versatility of the ALM we shall consider the equations of linear elasticity in contact with a springy substrate. We start with the linear case of a Robin boundary condition: Find the displacement \({{\varvec{u}}}= \left[ u_i\right] _{i=1}^n\) and the symmetric stress tensor \({\varvec{\sigma }}= \left[ \sigma _{ij}\right] _{i,j=1}^n\) such that

$$\begin{aligned} \nonumber {\varvec{\sigma }}= {}&\frac{\nu E}{(1+\nu )(1-2\nu )} ~\text {tr}\,{\varvec{\varepsilon }}({{\varvec{u}}})\,{{\varvec{I}}}+ \frac{E}{(1+\nu )}{\varvec{\varepsilon }}({{\varvec{u}}})\quad \text {in}\quad \Omega\end{aligned}$$
(118)
$$\begin{aligned} -\nabla \cdot {\varvec{\sigma }}= {}&{{\varvec{f}}}\quad \text {in}\quad \Omega \end{aligned}$$
(119)
$$\begin{aligned} {{\varvec{S}}}{{\varvec{u}}}={}&-{\varvec{\sigma }}\cdot {{\varvec{n}}}\quad \text {on}\quad \partial \Omega _{\text {S}} \end{aligned}$$
(120)
$$\begin{aligned} {\varvec{\sigma }}\cdot {{\varvec{n}}}= {}&\textbf{0} \quad \text {on}\quad \partial \Omega \setminus \partial \Omega _{\text {S}} \end{aligned}$$
(121)

Here \(\Omega\) is a bounded domain in \(\mathbb {R}^n\), \(n=2\) or \(n=3\), E is Young’s modulus and \(\nu\) is Poisson’s ratio. \({\varvec{\varepsilon }}\left( {{\varvec{u}}}\right) = \left[ \varepsilon _{ij}({{\varvec{u}}})\right] _{i,j=1}^n\) is the strain tensor with components

$$\begin{aligned} \varepsilon _{ij}({{\varvec{u}}}) = \frac{1}{2}\left( \frac{\partial u_i}{\partial x_j}+\frac{\partial u_j}{\partial x_i}\right) \end{aligned}$$

and trace

$$\begin{aligned} \text {tr}\,{\varvec{\varepsilon }}({{\varvec{u}}}) = \sum _{i} \varepsilon _{ii}({{\varvec{u}}}) = \nabla \cdot {{\varvec{u}}} \end{aligned}$$

Furthermore, \(\nabla \cdot {\varvec{\sigma }}= \left[ \sum _{j=1}^n\partial \sigma _{ij}/\partial x_j\right] _{i=1}^n\), \({{\varvec{I}}}= \left[ \delta _{ij}\right] _{i,j=1}^n\) with \(\delta _{ij} =1\) if \(i=j\) and \(\delta _{ij}= 0\) if \(i\ne j\), and \({{\varvec{f}}}\) is a given load. Finally, we assume that the boundary stiffness \({{\varvec{S}}}\) is of the form

$$\begin{aligned} {{\varvec{S}}}= \alpha ^{-1} {{\varvec{n}}}\otimes {{\varvec{n}}}+ \beta ^{-1} {{\varvec{P}}}, \quad {{\varvec{P}}}:=({{\varvec{I}}}-{{\varvec{n}}}\otimes {{\varvec{n}}}) \end{aligned}$$

where \(\alpha\) and \(\beta\) are flexibility parameters in the normal and tangential directions, respectively. The solution to (118)–(121) minimises the functional

$$\begin{aligned} \mathcal {L}_S({{\varvec{u}}}) := \frac{1}{2}a({{\varvec{u}}},{{\varvec{u}}}) -({{\varvec{f}}},{{\varvec{u}}})_{\Omega } + \frac{1}{2}\langle {{\varvec{S}}}{{\varvec{u}}},{{\varvec{u}}}\rangle _{\partial \Omega _\text {S}} \end{aligned}$$
(122)

where

$$\begin{aligned} a({{\varvec{u}}},{{\varvec{v}}}) := ( {\varvec{\sigma }}({{\varvec{u}}}),{\varvec{\varepsilon }}({{\varvec{v}}}))_{\Omega }=\int _{\Omega } {\varvec{\sigma }}({{\varvec{u}}}):{\varvec{\varepsilon }}({{\varvec{v}}})\, d\Omega \end{aligned}$$

which is the usual foundation for a discrete method. However, to obtain a robust method for the case of \(\alpha \rightarrow 0\) or \(\beta \rightarrow 0\), we can introduce a new variable \({\varvec{\lambda }}\in [ L^2(\partial \Omega _\text {S})]^n\) and seek stationary points to

$$\begin{aligned} \mathcal {L}({{\varvec{u}}},{\varvec{\lambda }}) := {}&\frac{1}{2}a({{\varvec{u}}},{{\varvec{u}}}) -({{\varvec{f}}},{{\varvec{u}}})_{\Omega } -\frac{1}{2} \langle {{\varvec{K}}}{\varvec{\lambda }},{\varvec{\lambda }}\rangle _{\partial \Omega _\text {S}}\nonumber \\&- \langle {\varvec{\lambda }},{{\varvec{u}}}\rangle _{\partial \Omega _\text {S}} \end{aligned}$$
(123)

where \({{\varvec{K}}}:= {{\varvec{S}}}^{-1}\) is a flexibility matrix which tends to the zero matrix as \(\alpha ,\beta \rightarrow 0\), in which case the Robin condition becomes a Dirichlet condition. The stationary point of (123) satisfies the variational equations of finding \(({{\varvec{u}}},{\varvec{\lambda }})\in [H^1(\Omega )]^n\times [L^2(\partial \Omega _\text {S})]^n\) such that

$$\begin{aligned} a({{\varvec{u}}},{{\varvec{v}}}) - \langle {\varvec{\lambda }},{{\varvec{v}}}\rangle _{\partial \Omega _\text {S}} = {}&({{\varvec{f}}},{{\varvec{v}}})_{\Omega }\quad \forall {{\varvec{v}}}\in [H^1(\Omega )]^n \end{aligned}$$
(124)
$$\begin{aligned} \langle {{\varvec{K}}}{\varvec{\lambda }}+{{\varvec{u}}},{\varvec{\mu }}\rangle _{\partial \Omega _\text {S}}= {}&{ 0}\quad \forall {\varvec{\mu }}\in [L^2(\partial \Omega _\text {S})]^n \end{aligned}$$
(125)

and we note that, formally,

$$\begin{aligned} {\varvec{\lambda }}= {\varvec{\sigma }}({{\varvec{u}}})\cdot {{\varvec{n}}}\end{aligned}$$
(126)

In the discrete case, we can now formulate an ALM by adding a penalty term and replacing \({\varvec{\lambda }}\) using (126), looking for the minimiser of

$$\begin{aligned} \mathcal {L}_A^h({{\varvec{u}}}) :={}&\frac{1}{2}a({{\varvec{u}}},{{\varvec{u}}}) -({{\varvec{f}}},{{\varvec{u}}})_{\Omega } - \langle {\varvec{\sigma }}({{\varvec{u}}})\cdot {{\varvec{n}}},{{\varvec{u}}}\rangle _{\partial \Omega _\text {S}}\nonumber \\&+\frac{1}{2}\langle {{\varvec{S}}}_h ({{\varvec{K}}}{\varvec{\sigma }}({{\varvec{u}}})\cdot {{\varvec{n}}}+{{\varvec{u}}}), {{\varvec{K}}}{\varvec{\sigma }}({{\varvec{u}}})\cdot {{\varvec{n}}}+{{\varvec{u}}}\rangle _{\partial \Omega _\text {S}} \nonumber \\&-\frac{1}{2} \langle {{\varvec{K}}}{\varvec{\sigma }}({{\varvec{u}}})\cdot {{\varvec{n}}},{\varvec{\sigma }}({{\varvec{u}}})\cdot {{\varvec{n}}}\rangle _{\partial \Omega _\text {S}} \end{aligned}$$
(127)

where \({{\varvec{S}}}_h\) is a discrete stiffness matrix, to be chosen. The minimiser of (127) satisfies the variational equation of finding \({{\varvec{u}}}_h\in V := [V_h]^n\) such that

$$\begin{aligned} a_{{{\varvec{S}}}_h}({{\varvec{u}}}_h,{{\varvec{v}}}) = ({{\varvec{f}}},{{\varvec{v}}})_{\Omega }\quad \forall {{\varvec{v}}}\in V \end{aligned}$$
(128)

where

$$\begin{aligned} a_{{{\varvec{S}}}_h}({{\varvec{u}}},{{\varvec{v}}}) := {}&a({{\varvec{u}}},{{\varvec{v}}}) - \langle {{\varvec{u}}}+{{\varvec{K}}}{\varvec{\sigma }}({{\varvec{u}}})\cdot {{\varvec{n}}},{\varvec{\sigma }}({{\varvec{v}}})\cdot {{\varvec{n}}}\rangle _{\partial \Omega _\text {S}}\nonumber \\&- \langle {\varvec{\sigma }}({{\varvec{u}}})\cdot {{\varvec{n}}},{{\varvec{v}}}+{{\varvec{K}}}{\varvec{\sigma }}({{\varvec{v}}})\cdot {{\varvec{n}}}\rangle _{\partial \Omega _\text {S}}\nonumber \\&+ \langle {{\varvec{S}}}_h({{\varvec{u}}}+{{\varvec{K}}}{\varvec{\sigma }}({{\varvec{u}}})\cdot {{\varvec{n}}}),{{\varvec{v}}}+{{\varvec{K}}}{\varvec{\sigma }}({{\varvec{v}}}) \cdot {{\varvec{n}}}\rangle _{\partial \Omega _\text {S}} \nonumber \\&+ \langle {{\varvec{K}}}{\varvec{\sigma }}({{\varvec{u}}})\cdot {{\varvec{n}}},{\varvec{\sigma }}({{\varvec{v}}}) \cdot {{\varvec{n}}}\rangle _{\partial \Omega _\text {S}} \end{aligned}$$
(129)

which is related to the Nitsche method for interfaces in [105, 106], and a variant of the method of Juntunen and Stenberg [107] for Poisson’s problem with Robin boundary conditions. With the particular choice

$$\begin{aligned} {{\varvec{S}}}_h = \left( (h/\gamma _0){{\varvec{I}}}+ {{\varvec{K}}}\right) ^{-1} \end{aligned}$$
(130)

we recover the standard Nitsche method for the Dirichlet problem if \({{\varvec{K}}}\) is the zero matrix, and if \({{\varvec{K}}}\) is nonzero we approach the minimiser of (122) as \(h\rightarrow 0\). Thus the method remains robust in the limit of zero flexibility.
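The effect of the choice (130) can be illustrated on a one-dimensional scalar analogue of (118)–(121), which is an assumption of this sketch: \(-u'' = 1\) on (0, 1), \(u(0) = 0\) imposed strongly, and the Robin condition \(u + K u' = 0\) at \(x = 1\), where \({\varvec{\sigma }}\cdot {{\varvec{n}}}\) reduces to \(u'(1)\) and \(K \ge 0\) is a scalar flexibility. The Python code assembles the scalar version of the form (129) with \(S_h = (h/\gamma _0 + K)^{-1}\) for continuous P1 elements. For \(K = 0\) it reduces to Nitsche's method (107) with \(g = 0\); for any \(K \ge 0\) the exact solution is \(u(x) = -x^2/2 + c x\) with \(c = (K + 1/2)/(1 + K)\). The function names are hypothetical.

```python
import numpy as np


def robin_nitsche_1d(n_el, K, gamma0=10.0):
    """P1 solution of -u'' = 1 on (0, 1), u(0) = 0, u(1) + K u'(1) = 0,
    using the scalar analogue of the form (129) with S_h from (130)."""
    h = 1.0 / n_el
    n = n_el + 1
    A = np.zeros((n, n))
    b = np.zeros(n)
    for e in range(n_el):
        idx = [e, e + 1]
        A[np.ix_(idx, idx)] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        b[idx] += h / 2.0                      # exact load vector for f = 1
    val = np.zeros(n)
    val[-1] = 1.0                              # w -> w(1)
    dn = np.zeros(n)
    dn[-1], dn[-2] = 1.0 / h, -1.0 / h         # w -> w'(1), i.e. sigma(w) . n
    r = val + K * dn                           # w -> w(1) + K w'(1)
    S_h = 1.0 / (h / gamma0 + K)               # choice (130)
    # boundary terms of (129) as rank-one matrices (row = test, column = trial)
    A += (-np.outer(dn, r) - np.outer(r, dn)
          + S_h * np.outer(r, r) + K * np.outer(dn, dn))
    u = np.zeros(n)
    u[1:] = np.linalg.solve(A[1:, 1:], b[1:])  # strong condition u(0) = 0
    return np.linspace(0.0, 1.0, n), u


def robin_exact(x, K):
    """Exact solution of the 1D model problem."""
    c = (K + 0.5) / (1.0 + K)
    return -x**2 / 2.0 + c * x


for K in (0.0, 1e-2, 1.0):
    x, u = robin_nitsche_1d(200, K)
    print(K, np.abs(u - robin_exact(x, K)).max())
```

The same code handles the Dirichlet limit \(K = 0\) and flexible boundaries \(K > 0\) without any change, which is the robustness property described above.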

5.4.2 One-Sided Conditions in Contact

We now wish to activate the Robin boundary condition only if \({{\varvec{u}}}\cdot {{\varvec{n}}}-g > 0\), corresponding to contact with a springy foundation at a distance g from the elastic body. Since this condition acts only on the normal part of the displacement, we consider the case of slip, i.e., we choose

$$\begin{aligned} {{\varvec{K}}}= \alpha {{\varvec{n}}}\otimes {{\varvec{n}}}\end{aligned}$$

Setting \(\sigma _n := {{\varvec{n}}}\cdot {\varvec{\sigma }}\cdot {{\varvec{n}}}\) and \(u_n ={{\varvec{u}}}\cdot {{\varvec{n}}}\), the linear case is then to find stationary points of (124), which simplifies to

$$\begin{aligned} \nonumber \mathcal {L}({{\varvec{u}}},\lambda _n) := {}&\frac{1}{2} a({{\varvec{u}}},{{\varvec{u}}}) -({{\varvec{f}}},{{\varvec{u}}})_{\Omega } \\&-\frac{1}{2} \langle \alpha \lambda _n ,\lambda _n \rangle _{\partial \Omega _\text {S}}- \langle \lambda _n ,u_n-g\rangle _{\partial \Omega _\text {S}} \end{aligned}$$
(131)

where formally \(\lambda _n = \sigma _n({{\varvec{u}}})\). In the case of contact we now have the KKT conditions

$$\begin{aligned} u_n -g+\alpha \lambda _n \le {}&0 \end{aligned}$$
(132)
$$\begin{aligned} \lambda _n \le {}&0 \end{aligned}$$
(133)
$$\begin{aligned} \lambda _n \left( u_n -g+\alpha \lambda _n\right) = {}&0 \end{aligned}$$
(134)

which we can formally rewrite as

$$\begin{aligned} \lambda _n =-\gamma [ ( u_n -g+\alpha \lambda _n)-\gamma ^{-1}\lambda _n ]_+ \end{aligned}$$
(135)
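The equivalence between the KKT conditions and the projection form can be checked pointwise. The sketch below assumes \(\alpha >0\), so that for a given \(u_n\) the multiplier is determined by the gap: \(\lambda _n = -(u_n-g)/\alpha\) if \(u_n>g\) and \(\lambda _n=0\) otherwise. The helper names are hypothetical.

```python
def kkt_multiplier(u_n, g, alpha):
    """Pointwise multiplier satisfying the KKT conditions for a given u_n.
    Assumes alpha > 0 (flexible foundation)."""
    d = u_n - g
    # Penetration d > 0 is balanced by the spring: u_n - g + alpha*lam = 0.
    return -d / alpha if d > 0.0 else 0.0


def projection_residual(lam, u_n, g, alpha, gamma):
    """Residual of lam = -gamma [ (u_n - g + alpha lam) - lam / gamma ]_+ ."""
    return lam + gamma * max((u_n - g + alpha * lam) - lam / gamma, 0.0)


# The KKT multiplier satisfies the projection form for any gamma > 0 ...
for u_n in (-0.3, 0.0, 0.05, 0.4):
    for gamma in (0.5, 1.0, 100.0):
        lam = kkt_multiplier(u_n, 0.0, 0.1)
        print(u_n, gamma, projection_residual(lam, u_n, 0.0, 0.1, gamma))
# ... while a multiplier violating complementarity does not.
print(projection_residual(-0.5, -0.3, 0.0, 0.1, 1.0))
```

The residual vanishes (up to rounding) for every \(\gamma >0\), which is why \(\gamma\) is free to be chosen for stability rather than accuracy.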

Proceeding as in (113), the equilibrium equation resulting from (132) is

$$\begin{aligned} ({{\varvec{f}}},{{\varvec{v}}})_{\Omega } = a({{\varvec{u}}},{{\varvec{v}}}) -\langle \lambda _n ,v_n\rangle _{\partial \Omega _\text {S}} \end{aligned}$$
(136)

and since

$$\begin{aligned} - \langle \lambda _n ,v_n\rangle _{\partial \Omega _\text {S}} = {}&- \langle \lambda _n ,v_n+\alpha \mu _n\rangle _{\partial \Omega _\text {S}} +\langle \alpha \lambda _n ,\mu _n\rangle _{\partial \Omega _\text {S}} \nonumber \\ = {}&- \langle \lambda _n ,v_n+(\alpha -\gamma ^{-1})\mu _n\rangle _{\partial \Omega _\text {S}} \nonumber \\&+\langle (\alpha -\gamma ^{-1}) \lambda _n, \mu _n\rangle _{\partial \Omega _\text {S}} \end{aligned}$$
(137)

with \(\mu _n\) arbitrary, we find that the discrete augmented Lagrangian can be written

$$\begin{aligned} \mathcal {L}_A^h({{\varvec{u}}},\lambda _n) :={}&\frac{1}{2}a({{\varvec{u}}},{{\varvec{u}}}) -({{\varvec{f}}},{{\varvec{u}}})_{\Omega } \\ \nonumber&+\frac{1}{2}\Vert \gamma ^{1/2}[(u_n-g + (\alpha -\gamma ^{-1}) \lambda _n )]_+\Vert ^2_{\partial \Omega _\text {S}}\nonumber \\&+\frac{1}{2} \langle (\alpha -\gamma ^{-1})\lambda _n,\lambda _n\rangle _{\partial \Omega _\text {S}} \end{aligned}$$
(138)

and with \(\lambda _n \approx \sigma _n({{\varvec{u}}})\),

$$\begin{aligned} \nonumber \mathcal {L}_A^h({{\varvec{u}}}) :={}&\frac{1}{2}a({{\varvec{u}}},{{\varvec{u}}}) -({{\varvec{f}}},{{\varvec{u}}})_{\Omega } \\ \nonumber&+ \frac{1}{2}\Vert \gamma ^{1/2}[u_n-g + (\alpha -\gamma ^{-1})\sigma _n({{\varvec{u}}})]_+ \Vert ^2_{\partial \Omega _\text {S}}\\&+\frac{1}{2} \langle (\alpha -\gamma ^{-1})\sigma _n({{\varvec{u}}}) ,\sigma _n({{\varvec{u}}})\rangle _{\partial \Omega _\text {S}} \end{aligned}$$
(139)

the minimiser of which is \({{\varvec{u}}}\in V\) satisfying

$$\begin{aligned} \nonumber&({{\varvec{f}}},{{\varvec{v}}})_{\Omega } = a({{\varvec{u}}},{{\varvec{v}}}) \\ \nonumber&+ \langle \gamma [u_n-g + (\alpha -\gamma ^{-1}) \sigma _n({{\varvec{u}}})]_+,v_n + (\alpha -\gamma ^{-1}) \sigma _n({{\varvec{v}}})\rangle _{\partial \Omega _\text {S}} \\&+\langle (\alpha -\gamma ^{-1})\sigma _n({{\varvec{u}}}) ,\sigma _n({{\varvec{v}}})\rangle _{\partial \Omega _\text {S}} \quad \forall {{\varvec{v}}}\in V \end{aligned}$$
(140)

which coincides with (127) in contact, and gives an additional penalty on the condition \(\sigma _n({{\varvec{u}}}) =0\) if there is no contact. Choosing now

$$\begin{aligned} \gamma = (h/\gamma _0+\alpha )^{-1} \; \Rightarrow \; \alpha -\gamma ^{-1} = -\frac{h}{\gamma _0} \end{aligned}$$
(141)

we obtain the same penalty on the normal stress as in [70], which does not destroy the positive definite nature of the problem if we take \(\gamma _0 > \gamma _C\) where \(\gamma _C\) is the (stiffness dependent) constant in the inverse inequality

$$\begin{aligned} \Vert h^{1/2}{\varvec{\sigma }}({{\varvec{v}}})\cdot {{\varvec{n}}}\Vert _{\partial \Omega _\text {S}}^2 \le \gamma _C a({{\varvec{v}}},{{\varvec{v}}})\, \quad \forall {{\varvec{v}}}\in V \end{aligned}$$
(142)
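A one-dimensional analogue makes the structure of the contact formulation concrete. The sketch below is an illustration under stated assumptions, not the authors' implementation: a bar \((0,1)\) with \(-u''=f\), \(u(0)=0\), P1 elements, and frictionless contact at \(x=1\) with a springy foundation of flexibility \(\alpha\) at gap g, with \(\gamma = (h/\gamma _0+\alpha )^{-1}\). Because of \([\cdot ]_+\) the discrete equation is nonlinear; here it is solved by a primal-dual active-set (semismooth Newton) iteration, a solver we choose for the sketch since the text does not specify one. For P1 in 1D, \(h\vert v'(1)\vert ^2 = \Vert v'\Vert ^2\) on the last element, so \(\gamma _C=1\) and the assumed \(\gamma _0=10\) satisfies \(\gamma _0>\gamma _C\). All names are hypothetical.

```python
import numpy as np


def solve_spring_contact(n_el, alpha, g, gamma0=10.0, f=1.0, max_iter=20):
    """Hypothetical 1D sketch: -u'' = f on (0, 1), u(0) = 0, and at x = 1 a
    frictionless contact with a springy foundation (flexibility alpha) at gap g.
    Returns nodes, nodal values and whether contact is active."""
    h = 1.0 / n_el
    n = n_el + 1
    A = np.zeros((n, n))
    b = np.zeros(n)
    ke = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    for e in range(n_el):
        idx = [e, e + 1]
        A[np.ix_(idx, idx)] += ke
        b[idx] += f * h / 2.0
    tr = np.zeros(n)
    tr[-1] = 1.0                        # u_n = u(1)
    fl = np.zeros(n)
    fl[-2:] = [-1.0 / h, 1.0 / h]       # sigma_n(u) = u'(1) on the last element
    gamma = 1.0 / (h / gamma0 + alpha)  # so that alpha - 1/gamma = -h/gamma0
    beta = alpha - 1.0 / gamma
    w = tr + beta * fl                  # u_n + (alpha - 1/gamma) sigma_n(u)
    # The stress penalty beta <sigma_n(u), sigma_n(v)> is present in all cases.
    A0 = A + beta * np.outer(fl, fl)
    u = np.zeros(n)
    active = False
    for _ in range(max_iter):
        # Active-set step: [.]_+ is replaced by its argument where contact is active.
        M = A0.copy()
        rhs = b.copy()
        if active:
            M += gamma * np.outer(w, w)
            rhs += gamma * g * w
        u[1:] = np.linalg.solve(M[1:, 1:], rhs[1:])
        new_active = bool(w @ u - g > 0.0)
        if new_active == active:
            break
        active = new_active
    return np.linspace(0.0, 1.0, n), u, active


x, u, active = solve_spring_contact(64, alpha=0.1, g=0.2)
c = (0.2 + 0.5 + 0.1) / 1.1             # exact contact solution for f = 1
print(active, np.max(np.abs(u - (-0.5 * x**2 + c * x))))
```

In contact the exact solution is \(u=-fx^2/2+cx\) with \(c=(g+f/2+\alpha f)/(1+\alpha )\); without contact the boundary is traction free and \(u(1)=f/2\). Setting \(\alpha =0\) gives the rigid-obstacle case with \(\gamma =\gamma _0/h\).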

5.5 Stabilising the Kirchhoff Plate Model

5.5.1 Approximation with Independent Rotations and Displacement

In the Kirchhoff plate model, posed on a domain \(\Omega \subset \mathbb {R}^2\) with boundary \(\partial \Omega\), we seek an out–of–plane (scalar) displacement u to which we associate the strain (curvature) tensor

$$\begin{aligned} {\varvec{\varepsilon }}(\nabla u) := \frac{1}{2}\left( \nabla \otimes (\nabla u) + (\nabla u ) \otimes \nabla \right) = \nabla ^2 u \end{aligned}$$
(143)

and the plate stress (moment) tensor

$$\begin{aligned} {\varvec{\sigma }}_P (\nabla u)&:= \text {D}\left( {\varvec{\varepsilon }}(\nabla u) + \nu (1- {\nu })^{-1} \nabla \cdot \nabla u \, {{\varvec{I}}}\right) \end{aligned}$$
(144)
$$\begin{aligned}&= \text {D}\left( \nabla ^2 u + \nu (1-\nu )^{-1} \Delta u {{\varvec{I}}}\right) \end{aligned}$$
(145)

where

$$\begin{aligned} \text {D}= \frac{E t^3}{12(1+\nu )} \end{aligned}$$
(146)

with E Young’s modulus, \(\nu\) Poisson’s ratio, and t the plate thickness.
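The constant D differs from the familiar flexural rigidity \(Et^3/(12(1-\nu ^2))\) only because the factor \(1-\nu\) has been absorbed into it: \(Et^3(1-\nu )/(12(1-\nu ^2)) = Et^3/(12(1+\nu ))\). The NumPy check below (function names hypothetical; sign conventions for moments vary between texts) confirms that the moment tensor defined above equals the textbook form \(D_c((1-\nu )\nabla ^2u+\nu \Delta u\,{{\varvec{I}}})\) for an arbitrary symmetric curvature tensor.

```python
import numpy as np


def plate_moment(hess, E, nu, t):
    """sigma_P from the curvature (Hessian) of u, with D = E t^3 / (12 (1 + nu))."""
    D = E * t**3 / (12.0 * (1.0 + nu))
    return D * (hess + nu / (1.0 - nu) * np.trace(hess) * np.eye(2))


def textbook_moment(hess, E, nu, t):
    """D_c ((1 - nu) H + nu tr(H) I) with D_c = E t^3 / (12 (1 - nu^2))."""
    Dc = E * t**3 / (12.0 * (1.0 - nu**2))
    return Dc * ((1.0 - nu) * hess + nu * np.trace(hess) * np.eye(2))


rng = np.random.default_rng(0)
B = rng.standard_normal((2, 2))
H = 0.5 * (B + B.T)                    # an arbitrary symmetric curvature tensor
print(np.allclose(plate_moment(H, 200.0, 0.3, 0.1),
                  textbook_moment(H, 200.0, 0.3, 0.1)))
```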

The Kirchhoff clamped problem then takes the form: given the out–of–plane (scaled) load \(t^3f\), find the displacement u such that

$$\begin{aligned} \nabla \cdot \left( \nabla \cdot {\varvec{\sigma }}_P ( \nabla u )\right) = t^3f&\qquad \text {in } \Omega \end{aligned}$$
(147)
$$\begin{aligned} u = 0&\qquad \text {on } \partial \Omega \end{aligned}$$
(148)
$$\begin{aligned} {{\varvec{n}}}\cdot \nabla u = 0&\qquad \text {on } \partial \Omega \end{aligned}$$
(149)

The corresponding variational problem takes the form: Find the displacement \(u \in H^2_0(\Omega )\) such that

$$\begin{aligned} a_P(\nabla u, \nabla v )= (f,v)_\Omega \qquad \forall v \in H^2_0(\Omega ) \end{aligned}$$
(150)

where

$$\begin{aligned} a_P(\nabla v, \nabla w)&:= (t^{-3}{\varvec{\sigma }}_P(\nabla v), {\varvec{\varepsilon }}(\nabla w))_\Omega \end{aligned}$$
(151)
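The link between this variational form and the strong form (147) consists of two integrations by parts. For smooth u and \(v\in H^2_0(\Omega )\), both v and \(\nabla v\) vanish on \(\partial \Omega\), \({\varvec{\varepsilon }}(\nabla v)=\nabla ^2 v\), and \({\varvec{\sigma }}_P\) is symmetric, so

$$\begin{aligned} a_P(\nabla u,\nabla v) &= (t^{-3}{\varvec{\sigma }}_P(\nabla u),\nabla ^2 v)_\Omega = -(t^{-3}\nabla \cdot {\varvec{\sigma }}_P(\nabla u),\nabla v)_\Omega \\ &= (t^{-3}\nabla \cdot (\nabla \cdot {\varvec{\sigma }}_P(\nabla u)), v)_\Omega = (f,v)_\Omega \end{aligned}$$

where the last equality uses (147).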

From a computational point of view (150) is cumbersome since it requires \(C^1\)–conforming elements or carefully constructed nonconforming approximations. It is therefore common to use instead the Mindlin–Reissner model which is described by the following partial differential equations:

$$\begin{aligned} \begin{aligned} -t^{-3}\nabla \cdot {\varvec{\sigma }}_P ({\varvec{\theta }})-\kappa \,t^{-2}\left( \nabla u -{\varvec{\theta }}\right)&= 0, \quad \text {in } \Omega \subset \mathbb {R}^2, \\ -\kappa \,t^{-2}\,\nabla \cdot \left( \nabla u-{\varvec{\theta }}\right)&= f, \quad \text {in } \Omega , \end{aligned} \end{aligned}$$
(152)

where \({\varvec{\theta }}\) is the rotation of the median surface and \(\kappa\) is a shear correction factor. This model relaxes the continuity requirement on u and tends to the Kirchhoff model as \(t\rightarrow 0\). However, the requirement that the discrete spaces allow \(\vert \nabla u -{\varvec{\theta }}\vert \rightarrow 0\) is difficult to satisfy, and if it cannot be met, shear locking occurs, destroying the approximation properties of the discrete model. The ALM offers an alternative approach in which the requirement \(\nabla u = {\varvec{\theta }}\) is enforced by a Lagrange multiplier. To this end we consider the Lagrangian

$$\begin{aligned} \mathcal {L}(u,{\varvec{\theta }},{\varvec{\lambda }}) := \frac{1}{2} a_P({\varvec{\theta }},{\varvec{\theta }})+({\varvec{\lambda }},\nabla u-{\varvec{\theta }})_{\Omega }-(f,u)_{\Omega } \end{aligned}$$
(153)

The stationary points of (153) satisfy the weak system

$$\begin{aligned} a_P({\varvec{\theta }},{\varvec{\vartheta }}) +({\varvec{\lambda }},\nabla v-{\varvec{\vartheta }})_{\Omega } = {}&(f, v)_{\Omega } \end{aligned}$$
(154)

for all \((v,{\varvec{\vartheta }})\in H_0^1(\Omega )\times [H_0^1(\Omega )]^2\), and

$$\begin{aligned} (\nabla u-{\varvec{\theta }},{\varvec{\mu }})_{\Omega } = {}&0 \end{aligned}$$
(155)

for all \({\varvec{\mu }}\in [L^2(\Omega )]^2,\) corresponding to the strong form

$$\begin{aligned} -\nabla \cdot {\varvec{\sigma }}_P ({\varvec{\theta }})&= t^3{\varvec{\lambda }}\quad \text {in } \Omega \end{aligned}$$
(156)
$$\begin{aligned} -\nabla \cdot {\varvec{\lambda }}&= f \quad \text {in } \Omega \end{aligned}$$
(157)
$$\begin{aligned} \nabla u-{\varvec{\theta }}&= 0 \quad \text {in } \Omega \end{aligned}$$
(158)

We now wish to stabilise (153) using the ALM. To this end, we use (156) to eliminate \({\varvec{\lambda }}\) and add a penalty term on the side condition to obtain the augmented discrete functional

$$\begin{aligned} \nonumber \mathcal {L}_A^h(u_h,{\varvec{\theta }}_h):= {}&\frac{1}{2} a_P({\varvec{\theta }}_h,{\varvec{\theta }}_h) -(t^{-3}\nabla \cdot {\varvec{\sigma }}_P ({\varvec{\theta }}_h),\nabla u_h-{\varvec{\theta }}_h)_{h} \\&+ \frac{\gamma }{2}\Vert \nabla u_h -{\varvec{\theta }}_h\Vert ^2_{\Omega }-(f,u_h)_{\Omega } \end{aligned}$$
(159)

where \(u_h\in V^h_1\) and \({\varvec{\theta }}_h\in [V^h_2]^2\) for some discrete spaces \(V^h_1\) and \(V^h_2\). Here we use the notation

$$\begin{aligned} ({{\varvec{u}}},{{\varvec{v}}})_h := \sum _{T\in \mathcal {T}_h} \int _{T} {{\varvec{u}}}\cdot {{\varvec{v}}}\, dxdy \end{aligned}$$
(160)

The Euler equations corresponding to the augmented system are

$$\begin{aligned} A_h(({\varvec{\theta }}_h,u_h),({\varvec{\vartheta }},v))= (f, v)_{\Omega} \end{aligned}$$
(161)

for all \((v,{\varvec{\vartheta }})\in V^h_1\times [V^h_2]^2\), where

$$\begin{aligned} \nonumber A_h(({\varvec{\theta }},u),({\varvec{\vartheta }},v)) := {}&a_P({\varvec{\theta }},{\varvec{\vartheta }}) -(t^{-3}\nabla \cdot {\varvec{\sigma }}_P ({\varvec{\theta }}),\nabla v-{\varvec{\vartheta }})_{h}\\ \nonumber&-(\nabla u-{\varvec{\theta }},t^{-3}\nabla \cdot {\varvec{\sigma }}_P ({\varvec{\vartheta }}))_{h} \\&+\gamma (\nabla u -{\varvec{\theta }},\nabla v-{\varvec{\vartheta }})_{\Omega} \end{aligned}$$
(162)

Now, if \(V_2^h\) is the space of piecewise linears, the terms \((\cdot ,\cdot )_h\) vanish and, since \({\varvec{\theta }}\in H^{1}(\Omega )\) and thus \({\varvec{\lambda }}\in H^{-1}(\Omega )\), we choose \(r=1\) in (45) and \(\gamma = \gamma _0/h^2\) to obtain a scheme proposed by Pitkäranta [108]; for higher order polynomial approximations we recover a GLS stabilisation method due to Stenberg [109, 110].
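To see why the terms \((\cdot ,\cdot )_h\) drop out for piecewise linear rotations: on each element T, \({\varvec{\varepsilon }}({\varvec{\theta }}_h)\) and \(\nabla \cdot {\varvec{\theta }}_h\) are constant, hence so is \({\varvec{\sigma }}_P({\varvec{\theta }}_h)\), and the bilinear form reduces to a pure penalty formulation with penalty parameter of order \(h^{-2}\):

$$\begin{aligned} \nabla \cdot {\varvec{\sigma }}_P({\varvec{\theta }}_h)\vert _T = 0 \quad \Rightarrow \quad A_h(({\varvec{\theta }}_h,u_h),({\varvec{\vartheta }},v)) = a_P({\varvec{\theta }}_h,{\varvec{\vartheta }}) + \frac{\gamma _0}{h^2}(\nabla u_h-{\varvec{\theta }}_h,\nabla v-{\varvec{\vartheta }})_{\Omega } \end{aligned}$$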

5.5.2 The Plate Obstacle Problem

We next apply the model from the previous section to a regularised plate obstacle problem. The continuous model is

$$\begin{aligned}&-\nabla \cdot {\varvec{\sigma }}_P ({\varvec{\theta }}) = t^3{\varvec{\lambda }} \end{aligned}$$
(163)
$$\begin{aligned}&-\nabla \cdot {\varvec{\lambda }}+ p = f \end{aligned}$$
(164)
$$\begin{aligned}&\nabla u-{\varvec{\theta }}= 0 \end{aligned}$$
(165)
$$\begin{aligned}&p\ge 0,\; u-g + \beta p\ge 0, \; p(u-g+\beta p) = 0 \end{aligned}$$
(166)

Here, \(\beta\) is a given compliance which regularises the problem; in the limit case \(\beta =0\) (rigid obstacle) we instead have the KKT conditions \(p\ge 0\), \(u-g\ge 0\), and \(p(u-g)= 0\). Note that the regularity in the limit case is insufficient for the analysis above: it is well known that then \(u \not \in H^4(\Omega )\), which is insufficient for the multiplier to be in \(L^2\). For \(\beta >0\), however, it is known that \(u \in H^4(\Omega )\) if the interior angles of the domain are smaller than \(126^\circ\) (see [111]). Therefore the analysis is valid for all \(\beta >0\), since we have

$$\begin{aligned} p\in Q = \left\{ \begin{array}{l} L^2(\Omega )\quad \text {if } \beta > 0\\ H^{-2}(\Omega )\quad \text {if } \beta = 0\end{array} \right. \end{aligned}$$
(167)

We see that, again, formally \({\varvec{\lambda }}= -t^{-3}\nabla \cdot {\varvec{\sigma }}_P ({\varvec{\theta }})\) and that \(p = f -t^{-3}\nabla \cdot \left( \nabla \cdot {\varvec{\sigma }}_P ({\varvec{\theta }})\right)\). Following the strategy from Sec. 5.4.2 we write

$$\begin{aligned} p =\epsilon [ ( u -g+\beta p)-\epsilon ^{-1}p ]_+ \end{aligned}$$
(168)

We also need to stabilise the rotations, and to this end we consider the discrete Lagrangian

$$\begin{aligned} \nonumber&\mathcal {L}_A^h({\varvec{\vartheta }},v) := \frac{1}{2} A_h(({\varvec{\vartheta }},v),({\varvec{\vartheta }},v)) \\ \nonumber&+ \frac{1}{2} \Vert \epsilon ^{1/2}[v-g-(\epsilon ^{-1}-\beta ) (f -t^{-3}\nabla \cdot ( \nabla \cdot {\varvec{\sigma }}_P ({\varvec{\vartheta }}) ) ) ]_+\Vert _h^2 \\ \nonumber&-\frac{1}{2}\Vert (\epsilon ^{-1}-\beta )^{1/2} (f -t^{-3}\nabla \cdot ( \nabla \cdot {\varvec{\sigma }}_P ({\varvec{\vartheta }}) ) )\Vert _h^2\\&- (f,v)_\Omega \end{aligned}$$
(169)

where, considering the limit case \(p\in H^{-2}(\Omega )\), we choose \(r=2\) and thus

$$\begin{aligned} \epsilon = (h^4/\gamma _1+\beta )^{-1} \end{aligned}$$
(170)

with \(\gamma _1\) a sufficiently large constant. A similar approach has been suggested by Gustafsson et al. [52, 112] in the context of \(C^1\) approximations of the clamped Kirchhoff plate with GLS stabilisation, without specific reference to augmented Lagrangian methods.
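As in the contact case, this choice of \(\epsilon\) reduces the weights to a simple power of h:

$$\begin{aligned} \epsilon = (h^4/\gamma _1+\beta )^{-1} \; \Rightarrow \; \epsilon ^{-1}-\beta = \frac{h^4}{\gamma _1} \end{aligned}$$

so the last norm in the discrete Lagrangian is an elementwise least-squares term on \(f -t^{-3}\nabla \cdot ( \nabla \cdot {\varvec{\sigma }}_P ({\varvec{\vartheta }}))\) weighted by \(h^4/\gamma _1\), and for the rigid obstacle (\(\beta =0\)) we have \(\epsilon = \gamma _1/h^4\).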

6 Numerical Examples

6.1 Cavitation

The problem formulation is that of (94)–(95). In our experience, \(\gamma _0\) should not be taken too large for the chosen discretisation; in this example we used \(\gamma _0=1/100\).

We consider a channel with an elliptically shaped pocket; the mesh is shown in Fig. 1. Natural boundary conditions

$$\begin{aligned} (-p{\varvec{I}} + \mu \nabla \varvec{u})\cdot {\varvec{n}} = {\varvec{0}} \end{aligned}$$

are imposed at the left- and right-hand sides. The velocity is set to zero along the floor of the channel and the pocket boundary, and the flow is driven by setting \(\varvec{u} = (1,0)\) at the ceiling. The viscosity is \(\mu =1\). We compare the pressure solutions with and without cavitation in Figs. 2 and 3 and note that in the cavitation case there is a nonzero pressure resultant, which produces a lifting force, cf. [113].

6.2 Elastic Contact with Flexible Plane

In this example, we consider an elastic sphere of radius 1 under the load \({{\varvec{f}}}=(0,0,-50)\) in contact with a flexible plane. The contact is assumed frictionless, in accordance with the form (141). The elastic parameters were chosen as \(E=200\) and \(\nu =0.33\), and the stabilisation parameter was taken as \(\gamma =100 E\). In Figs. 4, 5, and 6 we show the deformation and contact pressure for increasing flexibility of the contact plane.

Fig. 1

Mesh for cavitation computations

Fig. 2

Pressure isolines without (left) and with (right) cavitation

Fig. 3

Pressure elevation without (left) and with (right) cavitation

Fig. 4

Deformations for \(\alpha =0\) and associated contact pressure

Fig. 5

Deformations for \(\alpha =10^{-3}\) and associated contact pressure

Fig. 6

Deformations for \(\alpha =10^{-2}\) and associated contact pressure

Fig. 7

Computational mesh and elevation of displacements with obstacle indicated.

Fig. 8

Deformation isoplot

6.3 Plate Obstacle Problem

The example considered, taken from [52], concerns a clamped square plate \(\Omega = (0,1)\times (0,1)\) in contact with a rigid obstacle (\(\beta =0\)) in the center of the plate, \(g=100((x-1/2)^2+(y-1/2)^2)\). Here \(E=1\), \(\nu =0\), \(t=1\), and we chose \(\gamma _1=10 E\) and \(\gamma _2 =E/10\). We present a sample computation using continuous, piecewise \(P^2\) approximations for both displacement and rotations on triangular meshes, based on the variational equations resulting from minimization of the Lagrangian (169). The mesh is shown in Fig. 7 (left), and the corresponding solution is given in Figs. 7 (right, with obstacle indicated) and 8. The computed solution agrees well with that of [52].