1 Introduction

Consider the generalized equation (GE)

$$\begin{aligned} 0\in H(x):=f(x)+Q(x), \end{aligned}$$
(1.1)

where \(f:\mathbb {R}^n\rightarrow \mathbb {R}^n\) is continuously differentiable and \(Q:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\) is a set-valued mapping with a closed graph. For the numerical solution of (1.1) various methods are available, including the semismooth\(^{*}\) Newton method developed in [13]. In this method, the approximation/linearization of the multi-valued term in (1.1) is performed on the basis of either the graph of the respective strict derivative or the limiting coderivative. In each Newton step, one has to solve a linear square system with a non-singular matrix. The method thus differs both from the approach of Josephy [20, 21], where the multi-valued part is not approximated at all, and from the Newton-type methods in [1] and [8], where the multi-valued term is approximated in a different way.

In [14] the authors have shown that for a class of the so-called SCD (subspace containing derivatives) mappings the semismooth\(^{*}\) Newton can be improved. In particular, at these mappings, we dispose at each point with linear subspaces belonging to the graphs of the above-mentioned generalized derivatives, which generate the linear systems in the Newton step in a straightforward way. Moreover, the "regularity" requirement, needed to ensure the (locally) superlinear convergence, could have been substantially relaxed. In [15] this so-called SCD semismooth\(^{*}\) Newton method has been implemented in a class of variational inequalities (VIs) of the second kind that includes, among various problems of practical importance, also a class of discretized contact problems with Tresca friction ([16]). The very good performance of the new method, when applied to those problems, has led us to consider more complicated problems, in which Q does not amount to the subdifferential of a convex function. This clearly requires a generalization of the theory of [15]. As a concrete representative problem of this type, we have chosen a discretized static contact problem, where the Tresca friction is replaced by the (physically more realistic) Coulomb friction.

Starting with the pioneering paper [27] there are many papers and a comprehensive monograph [11] devoted to static, quasi-static and dynamic contact problems with Coulomb friction for various types of material of the bodies in contact. Concerning the static contact of two elastic bodies or an elastic body with a rigid obstacle, it is known [27] that this problem has a (not necessarily unique) solution whenever the friction coefficient belongs to the interval (0, b], where \(b > 0\) is a bound depending on the Poisson constant. This is a great difference from the corresponding discretized problems where, for small (mesh-dependent) values of the friction coefficient, one has to do with a unique solution. However, when the friction coefficient increases, discretized models may allow multiple solutions, as shown, e.g., in [26, 30],

For simplicity, we consider only the contact of an elastic body with a rigid obstacle (Signorini problem with friction). From the algebraic point of view, each two-body problem can be rewritten formally as a one-body problem, see [33]. Such problems can be modeled as quasi-variational inequalities (QVIs) expressed in terms of the so-called dual variables (having the physical meaning of stresses). This enables us to employ a variety of methods developed for the numerical solution of QVIs, cf., e.g., [18, Chapter 5], [12]. As shown in [23, 28], to these methods we can also count the classical semismooth Newton method [25, 29]. Another approach has been used in [2, 3], where a solution is computed as a fixed point of a mapping generated by solving the corresponding contact problems with the Tresca friction. These problems can be solved, for example, by a specially tailored minimization routine; see [24]. In this paper, we use a new discrete model formulated in displacements, which is obtained by a modification of the GE used in [3]. We thus work with a purely "primal" model, well suited for a direct application of the SCD semismooth\(^{*}\) Newton method.

The plan of the paper is as follows: In Sect. 2 we recall some standard notions from variational analysis, which are extensively used throughout the paper. Section 3 is devoted to the SCD mappings; we list their basic properties and provide some calculus rules, indispensable in the construction of the SCD semismooth\(^{*}\) Newton method in Sect. 4 and its subsequent implementation. In Sect. 4 we present the main conceptual algorithm which is thereafter, in Sect. 5, implemented to the numerical solution of GE (1.1). As a result, we obtain an efficient tool, applicable to a wide range of equilibrium models including VIs of the first and second kind, hemivariational inequalities [19] and many others. The first part of Sect. 6 is devoted to the construction of the new model of the discrete contact problem with Coulomb friction mentioned above (Section 6.1). In Section 6.2, it is shown that the multifunction, which arises in the respective GE, is an SCD mapping and possesses the semismooth\(^{*}\) property. This finally enables us to specialize the formulas for the approximation and Newton step, developed in Sect. 5, for the GE considered. The results of the numerical experiments are presented in Sect. 7.

Our notation is basically standard. Given a linear subspace \(L \subseteq \mathbb {R}^n\), \(L^\perp\) denotes its orthogonal complement. For an element \(u \in \mathbb {R}^n\), \(\Vert u\Vert\) denotes its Euclidean norm, \({\mathscr{B}}_\epsilon (u)\) denotes the closed ball around u with radius \(\epsilon\) and \(\mathbb {S}_{\mathbb {R}^n}\) stands for the unit sphere in \(\mathbb {R}^n\). For a matrix A, \(\mathrm{rge\;}A\) signifies its range. To avoid possible confusion, in some situations the dimension of a unit matrix I will be indicated by a subscript (\(I_n\)). Given a set \(\Omega \subset \mathbb {R}^s\), we define the distance from a point x to \(\Omega\) by \(d_{\Omega }(x):= \mathrm{dist}(x, \Omega ):= \inf \{\Vert y-x\Vert \,\big |\,y\in \Omega \}\), the respective indicator function is denoted by \(\delta _{\Omega }\) and \({\mathop {x \rightarrow \bar{x}}\limits ^{\Omega }}\) means convergence within \(\Omega\). When a mapping \(F: \mathbb {R}^n \rightarrow \mathbb {R}^m\) is differentiable at x, we denote by \(\nabla F(x)\) its Jacobian.

2 Preliminaries

Throughout the paper, we will frequently use the following basic notions of modern variational analysis.

Definition 2.1

Let A be a set in \(\mathbb {R}^{s}\), \(\bar{x} \in A\) and A be locally closed around \(\bar{x}\). Then

  1. (i)

    The tangent (contingent, Bouligand) cone to A at \(\bar{x}\) is given by

    $$\begin{aligned} T_{A}(\bar{x}):=\mathop {\mathrm{Lim}\,\mathrm{sup}}\limits _{t\downarrow 0} \frac{A-\bar{x}}{t}. \end{aligned}$$

    A tangent \(u\in T_A(\bar{x})\) is called derivable if \(\lim _{t\downarrow 0}\mathrm{dist}(\bar{x}+tu,A)/t=0\). The set A is geometrically derivable at \(\bar{x}\) if every tangent vector u to A at \(\bar{x}\) is derivable.

  2. (ii)

    The set

    $$\begin{aligned} \widehat{N}_{A}(\bar{x}):=(T_{A}(\bar{x}))^{\circ } \end{aligned}$$

    is the regular (Fréchet) normal cone to A at \(\bar{x}\), and

    $$\begin{aligned}N_{A}(\bar{x}):=\mathop {\mathrm{Lim}\,\mathrm{sup}}\limits _{{\mathop {x \rightarrow \bar{x}}\limits ^{A}}} \widehat{N}_{A}(x)\end{aligned}$$

    is the limiting (Mordukhovich) normal cone to A at \(\bar{x}\).

In this definition ”\(\mathop {\mathrm{Lim}\,\mathrm{sup}}\)” stands for the Painlevé-Kuratowski outer (upper) set limit, see, e.g., [32]. The above listed cones enable us to describe the local behavior of set-valued maps via various generalized derivatives. Let \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^m\) be a (set-valued) mapping with the domain and the graph

$$\begin{aligned}\mathrm {dom}\,F:=\{x\in \mathbb {R}^n\,\big |\,F(x) \not = \emptyset \},\quad \mathrm {gph}\,F:=\{(x,y)\in \mathbb {R}^n\times \mathbb {R}^m \,\big |\,y\in F(x)\}.\end{aligned}$$

Definition 2.2

Consider a (set-valued) mapping \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^m\) and let \(\mathrm {gph}\,F\) be locally closed around some \((\bar{x},\bar{y})\in \mathrm {gph}\,F\).

  1. (i)

    The multifunction \(DF(\bar{x},\bar{y}):\mathbb {R}^n\rightrightarrows \mathbb {R}^m\), given by \(\mathrm {gph}\,DF(\bar{x},\bar{y})=T_{\mathrm {gph}\,F}(\bar{x},\bar{y})\), is called the graphical derivative of F at \((\bar{x},\bar{y})\).

  2. (ii)

    The multifunction \(D^*F(\bar{x},\bar{y}): \mathbb {R}^m \rightrightarrows \mathbb {R}^n\), defined by

    $$\begin{aligned} \mathrm {gph}\,D^*F(\bar{x},\bar{y})=\{(y^*,x^*)\,\big |\,(x^*,-y^*)\in N_{\mathrm {gph}\,F}(\bar{x},\bar{y})\}\end{aligned}$$

    is called the limiting coderivative of F at \((\bar{x},\bar{y})\).

Let us now recall the following regularity notions.

Definition 2.3

Let \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^m\) be a (set-valued) mapping and let \((\bar{x},\bar{y})\in \mathrm {gph}\,F\).

  1. 1.

    F is said to be metrically subregular at \((\bar{x},\bar{y})\) if there exists a real \(\kappa \ge 0\) along with some neighborhood X of \(\bar{x}\) such that

    $$\begin{aligned} \mathrm{dist}(x,F^{-1}(\bar{y}))\le \kappa \,\mathrm{dist}(\bar{y},F(x))\ \forall x\in X. \end{aligned}$$
  2. 2.

    F is said to be strongly metrically subregular at \((\bar{x},\bar{y})\) if it is metrically subregular at \((\bar{x},\bar{y})\) and there exists a neighborhood \(X'\) of \(\bar{x}\) such that \(F^{-1}(\bar{y})\cap X'=\{\bar{x}\}\).

  3. 3.

    F is said to be metrically regular around \((\bar{x},\bar{y})\) if there is some \(\kappa \ge 0\) together with neighborhoods X of \(\bar{x}\) and Y of \(\bar{y}\) such that

    $$\begin{aligned} \mathrm{dist}(x,F^{-1}(y))\le \kappa \,\mathrm{dist}(y,F(x))\ \forall (x,y)\in X\times Y. \end{aligned}$$
  4. 4.

    F is said to be strongly metrically regular around \((\bar{x},\bar{y})\) if it is metrically regular around \((\bar{x},\bar{y})\) and \(F^{-1}\) has a single-valued localization around \((\bar{y},\bar{y})\), i.e., there are open neighborhoods \(Y'\) of \(\bar{y}\), \(X'\) of \(\bar{x}\) and a mapping \(h:Y'\rightarrow \mathbb {R}^n\) with \(h(\bar{y})=\bar{x}\) such that \(\mathrm {gph}\,F\cap (X'\times Y')=\{(h(y),y)\,\big |\,y\in Y'\}\).

It is easy to see that the strong metric regularity around \((\bar{x},\bar{y})\) implies the strong metric subregularity at \((\bar{x},\bar{y})\) and the metric regularity around \((\bar{x},\bar{y})\) implies the metric subregularity at \((\bar{x},\bar{y})\). To check the metric regularity one often employs the so-called Mordukhovich criterion, see, e.g. [32, Theorem 9.43], according to which this property around \((\bar{x},\bar{y})\) is equivalent to the condition

$$\begin{aligned} 0\in D^*F(\bar{x},\bar{y})(y^*)\ \Rightarrow \ y^*=0. \end{aligned}$$
(2.1)

Another useful characterization of metric regularity in terms of the graphical derivative is provided by the so-called Aubin-criterion by Dontchev et al [9]. For pointwise characterizations of the other stability properties from Definition 2.3, the reader is referred to [14, Theorem 2.7].

In this preparatory section, we end with a definition of the semismooth\(^{*}\) property, which paved the way for both the semismooth\(^{*}\) Newton method in [13] and the SCD semismooth\(^{*}\) Newton method in [14].

Definition 2.4

We say that \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\) is semismooth\(^{*}\) at \((\bar{x},\bar{y})\in \mathrm {gph}\,F\) if for every \(\epsilon >0\) there is some \(\delta >0\) such that the inequality

$$\begin{aligned} \vert \langle x^*,x-\bar{x}\rangle -\langle y^*,y-\bar{y}\rangle \vert&\le \epsilon \Vert (x,y)-(\bar{x},\bar{y})\Vert \Vert (x^*,y^*)\Vert \end{aligned}$$

holds for all \((x,y)\in \mathrm {gph}\,F\cap {\mathscr{B}}_\delta (\bar{x},\bar{y})\) and all \((y^*,x^*)\) belonging to \(\mathrm {gph}\,D^*F(x,y)\).

3 On SCD mappings

3.1 Basic properties

In this section, we wish to recall the basic definitions and characteristics of the SCD property introduced in the recent paper [14].

In what follows, we denote by \(\mathcal{Z}_n\) the metric space of all n-dimensional subspaces of \(\mathbb {R}^{2n}\) equipped with the metric

$$\begin{aligned}d_{\mathcal{Z}_n}(L_1,L_2):=\Vert P_{L_1}-P_{L_2}\Vert ,\end{aligned}$$

where \(P_{L_i}\) is the symmetric \(2n\times 2n\) matrix representing the orthogonal projection onto \(L_i\), \(i=1,2\).

Sometimes, we will also work with bases for the subspaces \(L\in \mathcal{Z}_n\). Let \({\mathscr{M}}_n\) denote the collection of all \(2n\times n\) matrices with full column rank n and define for \(L\in \mathcal{Z}_n\) the set

$$\begin{aligned}{\mathscr{M}}(L):=\{Z\in {\mathscr{M}}_n\,\big |\,\mathrm{rge\;}Z =L\},\end{aligned}$$

i.e., the columns of \(Z\in {\mathscr{M}}(L)\) create a basis for L.

We treat every element of \(\mathbb {R}^{2n}\) as a column vector. To keep the notation simple, we write (uv) instead of \(\begin{pmatrix}u\\ v\end{pmatrix}\in \mathbb {R}^{2n}\) when this does not cause confusion.

Let \(L\in \mathcal{Z}_n\) and consider \(Z\in {\mathscr{M}}(L)\). Then we can divide Z into two \(n\times n\) matrices A and B and write \(Z=(A,B)\) instead of \(Z=\begin{pmatrix}A\\ B\end{pmatrix}\). It follows that \(\mathrm{rge\;}(A,B):=\{(Au,Bu)\,\big |\,u\in \mathbb {R}^n\}\doteq \{\begin{pmatrix}Au\\ Bu\end{pmatrix}\,\big |\,u\in \mathbb {R}^n\}=L\).

Furthermore, for every \(L\in \mathcal{Z}_n\) we can define the adjoint space

$$\begin{aligned} L^*&:=\{(-v^*,u^*)\,\big |\,(u^*,v^*)\in L^\perp \}. \end{aligned}$$
(3.1)

It can be shown that \((L^*)^*=L\) and \(d_{\mathcal{Z}_n}(L_1,L_2)=d_{\mathcal{Z}_n}(L_1^*,L_2^*)\). Thus, the mapping \(L\rightarrow L^*\) defines an isometry on \(\mathcal{Z}_n\).

Definition 3.1

Consider a mapping \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\).

  1. 1.

    We call F graphically smooth of dimension n at \((x,y)\in \mathrm {gph}\,F\), if \(T_{\mathrm {gph}\,F}(x,y)=\mathrm {gph}\,DF(x,y)\in \mathcal{Z}_n\). In addition, we denote by \(\mathcal{O}_F\) the set of all points where F is graphically smooth of dimension n.

  2. 2.

    We associate with F the four mappings \(\widehat{\mathscr{S}}F:\mathrm {gph}\,F\rightrightarrows \mathcal{Z}_n\), \(\widehat{\mathscr{S}}^* F:\mathrm {gph}\,F\rightrightarrows \mathcal{Z}_n\), \({\mathscr{S}}F:\mathrm {gph}\,F\rightrightarrows \mathcal{Z}_n\), \({\mathscr{S}}^* F:\mathrm {gph}\,F\rightrightarrows \mathcal{Z}_n\), given by

    $$\begin{aligned}\widehat{\mathscr{S}}F(x,y)&:={\left\{ \begin{array}{ll}\{\mathrm {gph}\,DF(x,y)\}&{} \text{ if }\quad (x,y)\in \mathcal{O}_F,\\ \emptyset &{}\text{ else, }\end{array}\right. }\\ \widehat{\mathscr{S}}^* F(x,y)&:={\left\{ \begin{array}{ll}\{\mathrm {gph}\,DF(x,y)^*\}&{} \text{ if }\quad (x,y)\in \mathcal{O}_F,\\ \emptyset &{}\text{ else, }\end{array}\right. }\\ {\mathscr{S}}F(x,y)&:=\mathop {\mathrm{Lim}\,\mathrm{sup}}_{(u,v)\mathop {\longrightarrow }\limits ^{{\mathrm {gph}\,F}}(x,y)} \widehat{\mathscr{S}}F(u,v) \\&=\{L\in \mathcal{Z}_n\,\big |\,\exists (x_k,y_k)\mathop {\longrightarrow }\limits ^{{\mathcal{O}_F}}(x,y):\ \lim _{k\rightarrow \infty } d_{\mathcal{Z}_n}(L,\mathrm {gph}\,DF(x_k,y_k))=0\},\\ {\mathscr{S}}^* F(x,y)&:=\mathop {\mathrm{Lim}\,\mathrm{sup}}_{(u,v)\mathop {\longrightarrow }\limits ^{{\mathrm {gph}\,F}}(x,y)} \widehat{\mathscr{S}}^* F(u,v)\\&=\{L\in \mathcal{Z}_n\,\big |\,\exists (x_k,y_k)\mathop {\longrightarrow }\limits ^{{\mathcal{O}_F}}(x,y):\ \lim _{k\rightarrow \infty } d_{\mathcal{Z}_n}(L,\mathrm {gph}\,DF(x_k,y_k)^*)=0\}. \end{aligned}$$
  3. 3.

    We say that F has the SCD (subspace containing derivative) property at \((x,y)\in \mathrm {gph}\,F\), if \({\mathscr{S}}^*F(x,y)\not =\emptyset\). F is said to have the SCD property around \((x,y)\in \mathrm {gph}\,F\), if there is a neighborhood W of (xy) such that F has the SCD property at every \((x',y')\in \mathrm {gph}\,F\cap W\). Finally, we call F an SCD mapping if F has the SCD property at every point of its graph.

Since \(L\rightarrow L^*\) is an isometry on \(\mathcal{Z}_n\) and \((L^*)^*=L\), the mappings \({\mathscr{S}}^*F\) and \({\mathscr{S}}F\) are related through

$$\begin{aligned} {\mathscr{S}}^* F(x,y)=\{L^*\,\big |\,L\in {\mathscr{S}}F(x,y)\},\ {\mathscr{S}}F(x,y)=\{L^*\,\big |\,L\in {\mathscr{S}}^* F(x,y)\}.\end{aligned}$$
(3.2)

The name SCD property is motivated by the following statement.

Lemma 3.2

(cf.[14, Lemma 3.7]) Let \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\) and let \((x,y)\in \mathrm {gph}\,F\). Then \(L\subseteq \mathrm {gph}\,D^*F(x,y)\) \(\forall L\in {\mathscr{S}}^*F(x,y)\).

In the recent paper [14] one can find several calculus rules to work with the SCD property including the next result.

Proposition 3.3

(cf.[14, Proposition 3.15]) Let \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\) have the SCD property at \((x,y)\in \mathrm {gph}\,F\) and let \(h:U\rightarrow \mathbb {R}^n\) be continuously differentiable at \(x\in U\) where \(U\subseteq \mathbb {R}^n\) is open. Then \(F+h\) has the SCD property at \((x,y+h(x))\) and

$$\begin{aligned} {\mathscr{S}}(F+h)(x,y+h(x))=\Big \{\left( \begin{matrix}I&{}0\\ \nabla h(x)&{}I\end{matrix}\right) L\,\big |\,L\in {\mathscr{S}}F(x,y)\Big \}, \end{aligned}$$
(3.3)
$$\begin{aligned} {\mathscr{S}}^* (F+h)(x,y+h(x))=\Big \{\left( \begin{matrix}I&{}0\\ \nabla h(x)^T&{}I\end{matrix}\right) L\,\big |\,L\in {\mathscr{S}}^* F(x,y)\Big \}. \end{aligned}$$
(3.4)

Note that these sum rules are also valid at the points \((x,y)\in \mathrm {gph}\,F\) where F does not have the SCD property: In this case, we simply have \({\mathscr{S}}(F+h)(x,y+h(x))={\mathscr{S}}^*(F+h)(x,y+h(x))={\mathscr{S}}F(x,y)={\mathscr{S}}^* F(x,y)=\emptyset\). In addition, we will need some calculus rules for the Cartesian product of mappings. Consider the mapping \(F:\prod _{i=1}^p\mathbb {R}^{n_i}\rightrightarrows \prod _{i=1}^p\mathbb {R}^{n_i}\) defined by

$$\begin{aligned} F(x_1,\ldots ,x_p):=\prod _{i=1}^pF_i(x_i),\end{aligned}$$
(3.5)

where each multifunction \(F_i:\mathbb {R}^{n_i}\rightrightarrows \mathbb {R}^{n_i}\), \(i=1,\ldots ,p\), has a closed graph. Note that

$$\begin{aligned}&T_{\mathrm {gph}\,F}\big ((x_1,\ldots ,x_p),(y_1,\ldots ,y_p)\big )\nonumber \\&\quad =\left\{ \big ((u_1,\ldots ,u_p),(v_1,\ldots ,v_p)\big )\,\big |\,\big ((u_1,v_1),\ldots ,(u_p,v_p)\big )\in T_{\mathrm {gph}\,F_1\times \ldots \times \mathrm {gph}\,F_p}\big ((x_1,y_1),\ldots ,(x_p,y_p)\big )\right\} \end{aligned}$$
(3.6)
$$\begin{aligned}&\subset \left\{ \big ((u_1,\ldots ,u_p),(v_1,\ldots ,v_p)\big )\,\big |\,(u_i,v_i)\in T_{\mathrm {gph}\,F_i}(x_i,y_i),\ i=1,\ldots ,p\right\} ,\end{aligned}$$
(3.7)

where the first equation follows from the identity

$$\begin{aligned}\mathrm {gph}\,F=\left\{ \big ((x_1,\ldots ,x_p),(y_1,\ldots ,y_p)\big )\,\big |\,\big ((x_1,y_1),\ldots ,(x_p,y_p)\big )\in \mathrm {gph}\,F_1\times \ldots \times \mathrm {gph}\,F_p\right\} \end{aligned}$$

together with [32, Exercise 6.7] and the inclusion (3.7) is a consequence of [32, Proposition 6.41].

Lemma 3.4

Let F be given by (3.5) and let \((x,y):=\big ((x_1,\ldots ,x_p),(y_1,\ldots ,y_p)\big )\in \mathrm {gph}\,F\). Then we have the following:

  1. 1.

    If \((x,y)\in \mathcal{O}_F\) then each tangent cone \(T_{\mathrm {gph}\,F_i}(x_i,y_i)\), \(i=1,\ldots ,p\), is a subspace of \(\mathbb {R}^{2n_i}\).

  2. 2.

    On the contrary, if \((x_i,y_i)\in \mathcal{O}_{F_i}\), \(i=1,\ldots ,p\), and all but at most one of the sets \(\mathrm {gph}\,F_i\), \(i=1,\ldots ,p\), are geometrically derivable at \((x_i,y_i)\), then \((x,y)\in \mathcal{O}_F\).

Proof

In order to prove the first assertion, let \((x,y)\in \mathcal{O}_F\), let \(i\in \{1,\ldots ,p\}\) be arbitrarily fixed and consider the set

$$\begin{aligned} L_i:=\{0_{m_1}\}\times T_{\mathrm {gph}\,F_i}(x_i,y_i)\times \{0_{m_2}\}\quad \text{ with }\quad m_1:=2\sum _{k=1}^{i-1}n_k,\ m_2:=2\sum _{k=i+1}^{p}n_k. \end{aligned}$$

Then it readily follows from Definition 2.1(i) that \(L_i\) is a subset of \(T:=T_{\mathrm {gph}\,F_1\times \ldots \times \mathrm {gph}\,F_p}\big ((x_1,y_1),\ldots ,(x_p,y_p)\big )\), which is a subspace by (3.6). Consider two tangents \(t_1,t_2\in T_{\mathrm {gph}\,F_i}(x_i,y_i)\) together with two scalars \(\mu _1,\mu _2\). Since \((0_{m_1},t_j,0_{m_2})\in L_i\subset T\), \(j=1,2\), we conclude

$$\begin{aligned} \mu _1(0_{m_1},t_1,0_{m_2})+\mu _2(0_{m_1},t_2,0_{m_2})=(0_{m_1},\mu _1t_1+\mu _2t_2t_1,0_{m_2})\in T \end{aligned}$$

and from (3.7) we deduce \(\mu _1t_1+\mu _2t_2\in T_{\mathrm {gph}\,F_i}(x_i,y_i)\). This proves that \(T_{\mathrm {gph}\,F_i}(x_i,y_i)\) is a subspace.

The second statement follows from the fact that under the stated assumption, inclusion (3.7) holds with equality, cf. [17, Proposition 1].

Proposition 3.5

Let F be given by (3.5) and assume that all the mappings \(F_i\), \(i=1,\ldots ,p\), are SCD mappings. If all, but at most one, of the mappings \(F_i\), \(i=1,\ldots ,p\), have the property that \(\mathrm {gph}\,F_i\) is geometrically derivable at every point \((x_i,y_i)\in \mathcal{O}_{F_i}\), then F is an SCD mapping, and

$$\begin{aligned} \mathcal{O}_F\supset \left\{ \big ((x_1,\ldots ,x_p),(y_1,\ldots ,y_p)\big )\,\big |\,(x_i,y_i)\in \mathcal{O}_{F_i},\ i=1,\ldots ,p\right\} . \end{aligned}$$
(3.8)

Moreover, for every \((x,y)=\big ((x_1,\ldots ,x_p),(y_1,\ldots ,y_p)\big )\in \mathrm {gph}\,F\) there holds

$$\begin{aligned} {\mathscr{S}}F(x,y)\supset \Big \{\left\{ \big ((u_1,\ldots ,u_p),(v_1,\ldots ,v_p)\big )\,\big |\,(u_i,v_i)\in L_i\right\} \,\big |\,L_i\in {\mathscr{S}}F_i(x_i,y_i),\ i=1,\ldots ,p\Big \}, \end{aligned}$$
(3.9)
$$\begin{aligned} {\mathscr{S}}^* F(x,y)\supset \Big \{\left\{ \big ((v_1^*,\ldots ,v_p^*),(u_1^*,\ldots ,u_p^*)\big )\,\big |\,(v_i^*,u_i^*)\in L_i\right\} \,\big |\,L_i\in {\mathscr{S}}^* F_i(x_i,y_i),\ i=1,\ldots ,p\Big \}. \end{aligned}$$
(3.10)

The equality holds in the inclusions (3.8), (3.9) and (3.10) if, in addition, all but at most one of the mappings \(F_i\), \(i=1,\ldots ,p\), have the following property: For every \((x_i,y_i)\in \mathrm {gph}\,F_i\) such that \(T_{\mathrm {gph}\,F_i}(x_i,y_i)\) is a subspace, the dimension of this subspace is \(n_i\).

Proof

The inclusions (3.8) and (3.9) follow immediately from Lemma 3.4. Since \(F_i\), \(i=1,\ldots , p\), are SCD mappings, \(\mathcal{O}_{F_i}\) is dense in \(\mathrm {gph}\,F_i\) and from (3.8) we conclude that \(\mathcal{O}_F\) is dense in \(\mathrm {gph}\,F\). This proves that F is an SCD mapping. Now consider subspaces \(L_i\in {\mathscr{S}}F_i(x_i,y_i)\), \(i=1,\ldots ,p\). By taking into account the relation

$$\begin{aligned}\left\{ \big ((u_1,\ldots ,u_p),(v_1,\ldots ,v_p)\big )\,\big |\,(u_i,v_i)\in L_i\right\} ^*=\left\{ \big ((v_1^*,\ldots ,v_p^*),(u_1^*,\ldots ,u_p^*)\big )\,\big |\,(v_i^*,u_i^*)\in L_i^*\right\} ,\end{aligned}$$

the inclusion (3.10) follows from (3.9) and (3.2). The statement about equality in (3.8) is again a consequence of Lemma 3.4 implying equality in (3.9) and (3.10).

The following large class of graphically Lipschitzian mappings covers many mappings important in applications; cf. [31], and is also important in the context of SCD mappings.

Definition 3.6

(cf.[32, Definition 9.66]) A mapping \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^m\) is graphically Lipschitzian of dimension d at \((\bar{x},\bar{y})\in \mathrm {gph}\,F\) if there is an open neighborhood W of \((\bar{x},\bar{y})\) and a one-to-one mapping \(\Phi\) from W onto an open subset of \(\mathbb {R}^{n+m}\) with \(\Phi\) and \(\Phi ^{-1}\) continuously differentiable, such that \(\Phi (\mathrm {gph}\,F\cap W)\) is the graph of a Lipschitz continuous mapping \(f:U\rightarrow \mathbb {R}^{n+m-d}\), where U is an open set in \(\mathbb {R}^d\).

Every multifunction \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\) that is graphically Lischitzian of dimension n at some point \((x,y)\in \mathrm {gph}\,F\), has the SCD property around (xy) by [14, Proposition 3.17]. We now state another property related to Proposition 3.5.

Lemma 3.7

Let \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\) be graphically Lipschitzian of dimension n at \((\bar{x},\bar{y})\in \mathrm {gph}\,F\). Then there is an open neighborhood W of \((\bar{x},\bar{y})\) such that for all \((x,y)\in \mathrm {gph}\,F\cap W\) the following properties hold:

  1. (i)

    If \((x,y)\in \mathcal{O}_F\) then \(\mathrm {gph}\,F\) is geometrically derivable at (xy).

  2. (ii)

    If \(T_{\mathrm {gph}\,F}(x,y)\) is a subspace, then the dimension of this subspace is n.

Proof

Regarding property (i), we refer to [14, Remark 3.18] and [32, Proposition 8.41]. To show the second statement, let W, \(\Phi\), U and f be as in Definition 3.6 and consider \((x,y)\in \mathrm {gph}\,F\cap W\) such that \(T_{\mathrm {gph}\,F}(x,y)\) is a subspace. Denoting \((u,f(u))= \Phi (x,y)\), we obtain that

$$\begin{aligned}T_{\mathrm {gph}\,f}(u,f(u))=\nabla \Phi (x,y)T_{\mathrm {gph}\,F}(x,y)\end{aligned}$$

is a subspace with the same dimension as \(T_{\mathrm {gph}\,F}(x,y)\). Therefore, the graphical derivative Df(uf(u)) is a linear mapping and f is Fréchet differentiable at u by [32, Exercise 9.25]. Since \(T_{\mathrm {gph}\,f}(u,f(u))=\mathrm{rge\;}(I,\nabla f(u))\), the dimension of \(T_{\mathrm {gph}\,f}(u,f(u))\) is n, which is the same as the dimension of \(T_{\mathrm {gph}\,F}(x,y)\).

Next, we turn to the notion of SCD regularity.

Definition 3.8

  1. 1.

    We denote by \(\mathcal{Z}_n^\mathrm{reg}\) the collection of all subspaces \(L\in \mathcal{Z}_n\) such that

    $$\begin{aligned} (y^*,0)\in L\ \Rightarrow \ y^*=0. \end{aligned}$$
  2. 2.

    A mapping \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\) is called SCD regular around \((x,y)\in \mathrm {gph}\,F\), if F has the SCD property around (xy) and

    $$\begin{aligned} (y^*,0)\in L \Rightarrow \ y^*=0\ \forall L\in {\mathscr{S}}^*F(x,y), \end{aligned}$$
    (3.11)

    i.e., \(L\in \mathcal{Z}_n^\mathrm{reg}\) for all \(L\in {\mathscr{S}}^*F(x,y)\). Further, we will denote by

    $$\begin{aligned}\mathrm{scd\,reg\;}F(x,y):=\sup \{\Vert y^*\Vert \,\big |\,(y^*,x^*)\in L, L\in {\mathscr{S}}^*F(x,y), \Vert x^*\Vert \le 1\}\end{aligned}$$

    the modulus of SCD regularity of F around (xy).

Since the elements of \({\mathscr{S}}^*F(x,y)\) are contained in \(\mathrm {gph}\,D^*F(x,y)\), it follows from the Mordukhovich criterion (2.1) that SCD regularity is weaker than metric regularity, and consequently SCD regularity is also weaker than strong metric regularity.

In the following proposition, we state some basic properties of subspaces \(L\in \mathcal{Z}_n^\mathrm{reg}\).

Proposition 3.9

(cf.[14, Proposition 4.2]) We have \(L\in \mathcal{Z}_n^\mathrm{reg}\) if and only if for every \((A,B)\in {\mathscr{M}}(L)\) the matrix B is not singular. Thus, for every \(L\in \mathcal{Z}_n^\mathrm{reg}\) there is a unique \(n\times n\) matrix \(C_L\) such that \(L=\mathrm{rge\;}(C_L,I)\). Further, \(L^*=\mathrm{rge\;}(C_L^T,I)\in \mathcal{Z}_n^\mathrm{reg}\),

$$\begin{aligned} \langle x^*,C_L^Tv\rangle =\langle y^*,v\rangle \ \forall (y^*,x^*)\in L\forall v\in \mathbb {R}^n. \end{aligned}$$

and

$$\begin{aligned} \Vert y^*\Vert \le \Vert C_L\Vert \Vert x^*\Vert \ \forall (y^*,x^*)\in L. \end{aligned}$$

Note that for every \(L\in \mathcal{Z}_n^{\mathrm{reg\,}}\) there is \(C_L=AB^{-1}\) for all \((A,B)\in {\mathscr{M}}(L)\). Combining [14, Equation (34), Lemma 4.7 and Proposition 4.8] we obtain the following lemma.

Lemma 3.10

Assume that \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\) is SCD regular around \((\bar{x},\bar{y})\in \mathrm {gph}\,F\). Then

$$\begin{aligned}\mathrm{scd\,reg\;}F(\bar{x},\bar{y})=\sup \{\Vert C_L\Vert \,\big |\,L\in {\mathscr{S}}^*F(\bar{x},\bar{y})\}<\infty .\end{aligned}$$

Moreover, F is SCD regular around every \((x,y)\in \mathrm {gph}\,F\) sufficiently close to \((\bar{x},\bar{y})\) and

$$\begin{aligned}\limsup _{(x,y)\mathop {\longrightarrow }\limits ^{\mathrm {gph}\,F}(\bar{x},\bar{y})}\mathrm{scd\,reg\;}F(x,y)\le \mathrm{scd\,reg\;}F(\bar{x},\bar{y}).\end{aligned}$$

4 On semismooth* Newton methods for SCD mappings

In this section we recall the general framework for the semismooth\(^{*}\) Newton method introduced in [13] and adapted to SCD mappings in [14]. Consider the inclusion

$$\begin{aligned} 0\in F(x), \end{aligned}$$
(4.1)

where \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\) is a mapping having the SCD property around some point \((\bar{x},0)\in \mathrm {gph}\,F\).

The following notion relaxes the semismooth\(^{*}\) property from Definition 2.4.

Definition 4.1

We say that \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\) is SCD semismooth\(^{*}\) at \((\bar{x},\bar{y})\in \mathrm {gph}\,F\) if F has the SCD property around \((\bar{x},\bar{y})\) and for every \(\epsilon >0\) there is some \(\delta >0\) such that the inequality

$$\begin{aligned} \vert \langle x^*,x-\bar{x}\rangle -\langle y^*,y-\bar{y}\rangle \vert&\le \epsilon \Vert (x,y)-(\bar{x},\bar{y})\Vert \Vert (x^*,y^*)\Vert \end{aligned}$$

holds for all \((x,y)\in \mathrm {gph}\,F\cap {\mathscr{B}}_\delta (\bar{x},\bar{y})\) and all \((y^*,x^*)\) belonging to any \(L\in {\mathscr{S}}^*F(x,y)\).

Clearly, every mapping with the SCD property around \((\bar{x},\bar{y}) \in \mathrm {gph}\,F\) which is semismooth\(^{*}\) at \((\bar{x},\bar{y})\) is automatically SCD semismooth\(^{*}\) at \((\bar{x},\bar{y})\). Therefore, the class of SCD semismooth\(^{*}\) mappings is even richer than the class of semismooth\(^{*}\) maps. In particular, it follows from [22, Theorem 2] that every mapping whose graph is a closed subanalytic set is SCD semismooth\(^{*}\) , cf. [14].

The following proposition provides the key estimate for the semismooth\(^{*}\) Newton method for SCD mappings.

Proposition 4.2

([14, Proposition 5.3]) Assume that \(F:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\) is SCD semismooth\(^{*}\) at \((\bar{x},\bar{y})\in \mathrm {gph}\,F\). Then for every \(\epsilon >0\) there is some \(\delta >0\) such that the estimate

$$\begin{aligned} \Vert x-C_L^T(y-\bar{y})-\bar{x}\Vert \le \epsilon \sqrt{n(1+\Vert C_L\Vert ^2)}\Vert (x,y)-(\bar{x},\bar{y})\Vert \end{aligned}$$

holds for every \((x,y)\in \mathrm {gph}\,F\cap {\mathscr{B}}_\delta (\bar{x},\bar{y})\) and every \(L\in {\mathscr{S}}^*F(x,y)\cap \mathcal{Z}_n^\mathrm{reg}\).

We now describe the SCD variant of the semismooth\(^{*}\) Newton method. Given a solution \(\bar{x}\in F^{-1}(0)\) of (4.1) and some positive scalar, we define the mappings \(\mathcal{A}_{\eta ,\bar{x}}:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\times \mathbb {R}^n\) and \(\mathcal{N}_{\eta ,\bar{x}}:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\) by

$$\begin{aligned}&\mathcal{A}_{\eta ,\bar{x}}(x):=\{(\hat{x},\hat{y})\in \mathrm {gph}\,F\,\big |\,\Vert (\hat{x},\hat{y})-(\bar{x},0)\Vert \le \eta \Vert x-\bar{x}\Vert \},\\&\mathcal{N}_{\eta ,\bar{x}}(x):=\{\hat{x}-C_L^T\hat{y}\,\big |\,(\hat{x},\hat{y})\in \mathcal{A}_{\eta ,\bar{x}}(x), L\in {\mathscr{S}}^*F(\hat{x},\hat{y})\cap \mathcal{Z}_n^\mathrm{reg}\}. \end{aligned}$$

Proposition 4.3

([15, Proposition 4.3]) Assume that F is SCD semismooth\(^{*}\) in \((\bar{x},0) \in \mathrm {gph}\,F\) and SCD regular around \((\bar{x},0)\) and let \(\eta >0\). Then there is some \(\bar{\delta }>0\) such that for every \(x\in {\mathscr{B}}_{\bar{\delta }}(\bar{x})\) the mapping F is SCD regular around every point \((\hat{x},\hat{y})\in \mathcal{A}_{\eta ,\bar{x}}(x)\). Furthermore, for every \(\epsilon >0\) there is some \(\delta \in (0,\bar{\delta }]\) such that

$$\begin{aligned}\Vert z-\bar{x}\Vert \le \epsilon \Vert x-\bar{x}\Vert \ \forall x\in {\mathscr{B}}_\delta (\bar{x}), \forall z\in \mathcal{N}_{\eta ,\bar{x}}(x).\end{aligned}$$

Assuming that we are given some iterate \(x^{(k)}\), the next iterate is given formally by \({x}^{(k+1)}\in \mathcal{N}_{\eta ,\bar{x}}({x}^{(k)})\). Let us take a closer look at this rule. Since we are dealing with set-valued mappings F, we cannot expect, in general, that \(F(x^{(k)})\not =\emptyset\) or that 0 is close to \(F(x^{(k)})\), even if \(x^{(k)}\) is close to a solution \(\bar{x}\). Therefore, we first perform some step that produces \((\hat{x}^{(k)},\hat{y}^{(k)})\in \mathrm {gph}\,F\) as an approximate projection of \((x^{(k)},0)\) onto \(\mathrm {gph}\,F\). We require that

$$\begin{aligned} \Vert (\hat{x}^{(k)},\hat{y}^{(k)})-(\bar{x},0)\Vert \le \eta \Vert x^{(k)}-\bar{x}\Vert \end{aligned}$$
(4.2)

for some constant \(\eta >0\), i.e., \(({\hat{x}}^{(k)},{\hat{y}}^{(k)})\in \mathcal{A}_{\eta ,\bar{x}}({x}^{(k)})\). For instance, if

$$\begin{aligned}\Vert (\hat{x}^{(k)},\hat{y}^{(k)})-(x^{(k)},0)\Vert \le \beta \mathrm{dist}((x^{(k)},0),\mathrm {gph}\,F) \end{aligned}$$

is true with some \(\beta \ge 1\), then

$$\begin{aligned}\Vert (\hat{x}^{(k)},\hat{y}^{(k)})-(\bar{x},0)\Vert&\le \Vert (\hat{x}^{(k)},\hat{y}^{(k)})-(x^{(k)},0)\Vert +\Vert (x^{(k)},0)-(\bar{x},0)\Vert \\&\le \beta \mathrm{dist}((x^{(k)},0),\mathrm {gph}\,F)+\Vert (x^{(k)},0)-(\bar{x},0)\Vert \le (\beta +1)\Vert (x^{(k)},0)-(\bar{x},0)\Vert \end{aligned}$$

and thus (4.2) holds with \(\eta =\beta +1\) and we can fulfill the inequality (4.2) without knowing the solution \(\bar{x}\). Furthermore, we require that \({\mathscr{S}}^*F(\hat{x}^{(k)},\hat{y}^{(k)})\cap \mathcal{Z}_n^\mathrm{reg}\not =\emptyset\) and compute the new iterate as \(x^{(k+1)}=\hat{x}^{(k)}-C_L^T\hat{y}^{(k)}\) for some \(L\in {\mathscr{S}}^*F(\hat{x}^{(k)},\hat{y}^{(k)})\cap \mathcal{Z}_n^\mathrm{reg}\). In fact, in our numerical implementation we will not compute the matrix \(C_L\), but two \(n\times n\) matrices AB such that \(L=\mathrm{rge\;}(B^T,A^T)\). The next iterate \(x^{(k+1)}\) is then obtained by \(x^{(k+1)}=\hat{x}^{(k)}+\Delta x^{(k)},\) where \(\Delta x^{(k)}\) is a solution of the system \(A\Delta x=-B\hat{y}^{(k)}\). Alternatively, in view of Proposition 3.9, we can also choose a subspace \(L\in {\mathscr{S}}F(\hat{x}^{(k)},\hat{y}^{(k)})\cap \mathcal{Z}_n^\mathrm{reg}\) and compute the Newton direction as \({\Delta x}^{(k)}=-C_L{\hat{y}}^{(k)}\), that is, given \((A,B)\in {\mathscr{M}}(L)\) we have \({\Delta x}^{(k)}=-Ap\) where p solves \(Bp={y}^{(k)}\).

This leads to the following conceptual algorithm.

figure a

For the choice between the two approaches to calculate the Newton direction, it is important to consider whether an element from \({\mathscr{S}}^*F(\hat{x}^{(k)},\hat{y}^{(k)})\) or from \({\mathscr{S}}F(\hat{x}^{(k)},\hat{y}^{(k)})\) is easier to compute.

For this algorithm, locally superlinear convergence follows from Proposition 4.3, see also [14, Corollary 5.6].

Theorem 4.4

Assume that F is SCD semismooth\(^{*}\) at \((\bar{x},0) \in \mathrm {gph}\,F\) and SCD regular around \((\bar{x},0)\). Then for every \(\eta >0\) there is a neighborhood U of \(\bar{x}\) such that for every starting point \(x^{(0)}\in U\) Algorithm 1 is well defined and stops after finitely many iterations at a solution of (4.1) or produces a sequence \(x^{(k)}\) that superlinearly converges to \(\bar{x}\) for any choice of \((\hat{x}^{(k)},\hat{y}^{(k)})\) satisfying (4.2) and any \(L^{(k)}\in {\mathscr{S}}^*F(\hat{x}^{(k)},\hat{y}^{(k)})\) in Step 4.a) and any \(L^{(k)}\in {\mathscr{S}}F(\hat{x}^{(k)},\hat{y}^{(k)})\) in Step 4.b).

In particular, if F is strongly metrically regular around \((\bar{x},0)\), then F has the SCD property around \((\bar{x},0)\) by [14, Corollary 3.19] and it is also SCD regular around \((\bar{x},0)\) as pointed out in the previous section. Therefore, if F also happens to be SCD semismooth\(^{*}\) around \((\bar{x},0)\), then the assumptions of the above statement are fulfilled.

Note that for an implementation of the Newton step, we need not know the whole derivative \({\mathscr{S}}^*F(\hat{x}^{(k)},\hat{y}^{(k)})\) (or \({\mathscr{S}}F(\hat{x}^{(k)},\hat{y}^{(k)})\)) but only one element \(L^{(k)}\).

5 On the implementation of the SCD semismooth* Newton method

When trying to implement the SCD semismooth\(^{*}\) Newton method directly for (1.1), it turns out that it can be rather difficult to perform the approximation step. Hence, we consider another equivalent approach which is more flexible. Consider an equivalent reformulation of (1.1) by the (decoupled) GE

$$\begin{aligned} 0\in G(x,d)=\begin{pmatrix}f(x)+Q(d)\\ x-d\end{pmatrix} \end{aligned}$$
(5.1)

in variables \((x,d)\in \mathbb {R}^n\times \mathbb {R}^n\). Obviously, \(0\in H(\bar{x})\) holds if and only if \((0,0)\in G(\bar{x},\bar{x})\).

The new variable d acts only as an auxiliary variable. The approximation step now reads as follows: Given \({x}^{(k)}\) close to a solution \(\bar{x}\) (and arbitrary \({d}^{(k)}\), for example, \({d}^{(k)}={x}^{(k)}\)), set \({\hat{x}}^{(k)}:={x}^{(k)}\) and find a point \({\hat{d}}^{(k)}\) close to \({x}^{(k)}\) such that \(\mathrm{dist}(0,f({x}^{(k)})+Q({\hat{d}}^{(k)}))\) is small. An approach to solving this problem could be to rewrite GE (1.1) in fixed point form \(x\in T(x)\) and select \({\hat{d}}^{(k)}\in T({x}^{(k)})\). For example, for any \(\lambda >0\) there is

$$\begin{aligned} \hbox {(1.1)}\quad \Leftrightarrow \ x-\lambda f(x)\in (I+\lambda Q)(x)\ \Leftrightarrow \ x\in (I+\lambda Q)^{-1}\big (x-\lambda f(x)\big ).\end{aligned}$$
(5.2)

If we choose \({\hat{d}}^{(k)}\in (I+\lambda Q)^{-1}\big ({x}^{(k)}-\lambda f({x}^{(k)})\big )\), we have \({x}^{(k)}-\lambda f({x}^{(k)})\in {\hat{d}}^{(k)}+\lambda Q({\hat{d}}^{(k)})\) and

$$\begin{aligned} {\hat{y}}^{(k)}:=\big (\frac{1}{\lambda }({x}^{(k)}-{\hat{d}}^{(k)}),{x}^{(k)}-{\hat{d}}^{(k)}\big )\in G({x}^{(k)},{\hat{d}}^{(k)})\end{aligned}$$
(5.3)

follows. In order to show that this approach is feasible as an approximation step, we have to verify that a bound of the form (4.2) holds, at least for \({x}^{(k)}\) close to \(\bar{x}\).

Proposition 5.1

Let \(\bar{x}\) be a solution of (1.1) and assume that there is some \(\lambda >0\) such that the resolvent \((I+\lambda Q)^{-1}\) has a single-valued Lipschitz continuous localization S around \(\bar{x}-\lambda f(\bar{x})\) for \(\bar{x}\), i.e., there are neighborhoods V of \(\bar{x}-\lambda f(\bar{x})\) and U of \(\bar{x}\) such that \(S:V\rightarrow U\) is Lipschitz continuous and \((I+\lambda Q)^{-1}(z)\cap U =\{S(z)\}\) for \(z\in V\). Then there is some \(\delta >0\) and some \(\eta >0\) such that for every \(x\in {\mathscr{B}}_\delta (\bar{x})\) there holds \(x-\lambda f(x)\in V\) and the vectors \(\hat{d}:=S(x-\lambda f(x))\), \(\hat{y}:=(\frac{1}{\lambda }(x-\hat{d}), x-\hat{d})\in \mathrm {gph}\,G(x,\hat{d})\) satisfy the estimate

$$\begin{aligned}\Vert \big ((x,\hat{d}), \hat{y}\big )-\big ((\bar{x},\bar{x}),0\big )\Vert \le \eta \Vert x-\bar{x}\Vert .\end{aligned}$$

Proof

Choose \(\delta >0\) such that \({\mathscr{B}}_\delta (\bar{x})\subseteq (I-\lambda f)^{-1}(V)\) and let \(L_f, L_S>0\) denote the moduli of Lipschitz continuity of f on \({\mathscr{B}}_\delta (\bar{x})\) and of S on V, respectively. Consider \(x\in {\mathscr{B}}_\delta (\bar{x})\). Then, by construction, \(x-\lambda f(x)\in V\) and hence \(\hat{d}:=S(x-\lambda f(x))\) is well defined. Further, by (5.2), we have \(\bar{x}=S(\bar{x}-\lambda f(\bar{x}))\) implying

$$\begin{aligned}\Vert \hat{d}-\bar{x}\Vert =\Vert S(x-\lambda f(x))-S(\bar{x}-\lambda f(\bar{x}))\Vert \le L_S\Vert x-\lambda f(x)-(\bar{x}-\lambda f(\bar{x}))\Vert \le L_S(1+\lambda L_f)\Vert x-\bar{x}\Vert .\end{aligned}$$

In addition we have \(x-\lambda f(x)\in \hat{d}+\lambda Q(\hat{d})\) and \(\hat{y}\in \mathrm {gph}\,G(x,\hat{d})\) follows. Since \(\Vert \hat{y}\Vert \le (1+1/\lambda )\Vert \hat{d}-x\Vert \le (1+1/\lambda )(\Vert \hat{d}-\bar{x}\Vert +\Vert x-\bar{x}\Vert )\), we obtain

$$\begin{aligned}\Vert \big ((x,\hat{d}), \hat{y}\big )-\big ((\bar{x},\bar{x}),0\big )\Vert&\le \Vert x-\bar{x}\Vert +\Vert \hat{d}-\bar{x}\Vert +\Vert \hat{y}\Vert \le (2+\frac{1}{\lambda })(\Vert x-\bar{x}\Vert +\Vert \hat{d}-\bar{x}\Vert )\\&\le (2+\frac{1}{\lambda })(1+L_S(1+\lambda L_f))\Vert x-\bar{x}\Vert \end{aligned}$$

and the assertion follows.

In particular, if Q is a maximal hypomonotone mapping, i.e., there exists some \(\gamma \ge 0\) such that \(\gamma I+Q\) is maximal monotone, then for every \(0<\lambda <1/\gamma\) the mapping \((I+\lambda Q)\) is strongly monotone and hence \((I+\lambda Q)^{-1}\) is a single-valued Lipschitz continuous function on \(\mathbb {R}^n\), cf. [32, Proposition 12.54]. However, hypomonotonicity is only a sufficient condition ensuring that \((I+\lambda Q)^{-1}\) has this property. In Sect. 6 we will encounter a non-hypomonotone mapping \(\tilde{Q}\), such that \((I+\lambda \tilde{Q})^{-1}\) is single-valued and Lipschitz continuous for every \(\lambda >0\).

Note that the choice \(\hat{d}\in (I+\lambda Q)^{-1}\big (x-\lambda f(x)\big )\) corresponds to one step of the so-called Forward-Backward method for solving (1.1).

In the next proposition, we summarize some properties of G.

Proposition 5.2

  1. (i)

    For every \(x\in \mathbb {R}^n\) and \((d,z)\in \mathrm {gph}\,Q\) we have

    $$\begin{aligned}{\mathscr{S}}G\big ((x,d),(f(x)+z,x-d)\big )&=\left\{ \mathrm{rge\;}\left[ \left( \begin{matrix}I&{}0\\ 0&{}X\end{matrix}\right) , \left( \begin{matrix}\nabla f(x)&{}Y\\ I&{}-X\end{matrix}\right) \right] \,\bigg |\,\mathrm{rge\;}(X,Y)\in {\mathscr{S}}Q(d,z)\right\} ,\\ {\mathscr{S}}^* G\big ((x,d),(f(x)+z,x-d)\big )&=\left\{ \mathrm{rge\;}\left[ \left( \begin{matrix}Y^*&{}0\\ 0&{}I\end{matrix}\right) , \left( \begin{matrix}\nabla f(x)^TY^*&{}I\\ X^*&{}-I\end{matrix}\right) \right] \,\bigg |\,\mathrm{rge\;}(Y^*,X^*)\in {\mathscr{S}}^* Q(d,z)\right\} . \end{aligned}$$
  2. (ii)

    Let \(x\in \mathbb {R}^n\) and assume that Q has the SCD property around \((d,z)\in \mathrm {gph}\,Q\). Then the following statements are equivalent:

    1. (a)

      G is SCD regular around \(\big ((x,d),( f(x)+z,x-d)\big )\).

    2. (b)

      For every \(L\in {\mathscr{S}}^* Q(d,z)\) and every \((X,Y)\in {\mathscr{M}}(L)\) the matrix \(\nabla f(x)X+Y\) is nonsingular.

    3. (c)

      For every \(L\in {\mathscr{S}}^* Q(d,z)\) and every \((Y^*,X^*)\in {\mathscr{M}}(L)\) the matrix \(\nabla f(x)^TY^*+X^*\) is nonsingular.

  3. (iii)

    The mapping H is SCD regular around \((\bar{x},0)\) if and only if G is SCD regular around \(\big ((\bar{x},\bar{x}),(0,0))\).

Proof

G has the representation \(G(x,d)=h(x,d)+\tilde{Q}(x,d)\) with

$$\begin{aligned}h(x,d):=\begin{pmatrix}f(x)\\ x-d\end{pmatrix} \text{ and } \tilde{Q}(x,d)=\begin{pmatrix}Q(d)\\ 0\end{pmatrix}.\end{aligned}$$

Since \(\mathrm {gph}\,D\tilde{Q}\big ((x,d),(z,0)\big )=\{\big ((u,e), (v,0)\big )\,\big |\,(e,v)\in \mathrm {gph}\,DQ(d,z)\}\), we obtain \(\mathcal{O}_{\tilde{Q}}=\mathbb {R}^n\times \mathcal{O}_Q\times \{0\}\). For every \((d,z)\in \mathcal{O}_Q\) and every \(x\in \mathbb {R}^n\) the orthogonal projection onto \(\tilde{L}:=\mathrm {gph}\,D\tilde{Q}\big ((x,d),(z,0)\big )=\{\big ((u,e), (v,0)\big )\,\big |\,(e,v)\in \mathrm {gph}\,DQ(d,z)\}\) is represented by the matrix

$$\begin{aligned}P_{\tilde{L}}=\left( \begin{matrix}I&{}0&{}0\\ 0&{}P_L&{}0\\ 0&{}0&{}0\end{matrix}\right) ,\end{aligned}$$

where \(P_L\) corresponds to the orthogonal projection onto the subspace \(L:=\mathrm {gph}\,DQ(d,z)\). Hence, for every \((d,z)\in \mathrm {gph}\,Q\) and every \(x\in \mathbb {R}^n\) we obtain

$$\begin{aligned}{\mathscr{S}}\tilde{Q}\big ((x,d), (z,0)\big )&=\{\mathbb {R}^n\times L\times \{0\}\,\big |\,L\in {\mathscr{S}}Q(d,z)\}\\&=\left\{ \mathrm{rge\;}\left[ \left( \begin{matrix}I&{}0\\ 0&{}X\end{matrix}\right) , \left( \begin{matrix}0&{}Y\\ 0&{}0\end{matrix}\right) \right] \,\bigg |\,\mathrm{rge\;}(X,Y)\in {\mathscr{S}}Q(d,z) \right\} \end{aligned}$$

and from Proposition 3.3 we conclude

$$\begin{aligned}{\mathscr{S}}G\big ((x,d),(f(x)+z,x-d)\big )&=\left\{ \mathrm{rge\;}\left[ \left( \begin{matrix}I&{}0&{}0&{}0\\ 0&{}I&{}0&{}0\\ \nabla f(x)&{}0&{}I&{}0\\ I&{}-I&{}0&{}I\end{matrix}\right) \left( \begin{matrix}I&{}0\\ 0&{}X\\ 0&{}Y\\ 0&{}0\end{matrix}\right) \right] \,\bigg |\,\mathrm{rge\;}(X,Y)\in {\mathscr{S}}Q(d,z)\right\} \\&=\left\{ \mathrm{rge\;}\left[ \left( \begin{matrix}I&{}0\\ 0&{}X\end{matrix}\right) , \left( \begin{matrix}\nabla f(x)&{}Y\\ I&{}-X\end{matrix}\right) \right] \,\bigg |\,\mathrm{rge\;}(X,Y)\in {\mathscr{S}}Q(d,z)\right\} .\end{aligned}$$

Similarly, we have

$$\begin{aligned}{\mathscr{S}}^*\tilde{Q}\big ((x,d),(z,0)\big )&=\left\{ \mathrm{rge\;}\left[ \left( \begin{matrix}Y^*&{}0\\ 0&{}I\end{matrix}\right) , \left( \begin{matrix}0&{}0\\ X^*&{}0\end{matrix}\right) \right] \,\bigg |\,\mathrm{rge\;}(Y^*,X^*)\in {\mathscr{S}}^* Q(d,z)\right\} , \end{aligned}$$

yielding, together with Proposition 3.3,

$$\begin{aligned}&{\mathscr{S}}^* G\big ((x,d),(f(x)+z,x-d)\big )\\&\quad =\left\{ \mathrm{rge\;}\left[ \left( \begin{matrix}I&{}0&{}0&{}0\\ 0&{}I&{}0&{}0\\ \nabla f(x)^T&{}I&{}I&{}0\\ 0&{}-I&{}0&{}I\end{matrix}\right) \left( \begin{matrix}Y^*&{}0\\ 0&{}I\\ 0&{}0\\ X^*&{}0\end{matrix}\right) \right] \,\bigg |\,\mathrm{rge\;}(Y^*,X^*)\in {\mathscr{S}}^* Q(d,z)\right\} \\&\quad =\left\{ \mathrm{rge\;}\left[ \left( \begin{matrix}Y^*&{}0\\ 0&{}I\end{matrix}\right) , \left( \begin{matrix}\nabla f(x)^TY^*&{}I\\ X^*&{}-I\end{matrix}\right) \right] \,\bigg |\,\mathrm{rge\;}(Y^*,X^*)\in {\mathscr{S}}^* Q(d,z)\right\} .\end{aligned}$$

By virtue of (i) and the definition of SCD regularity, G is SCD regular around \(\big ((x,d),(f(x)+z,x-d)\big )\) if and only if for every pair XY with \(\mathrm{rge\;}(X,Y)\in {\mathscr{S}}Q(d,z)\) the matrix

$$\begin{aligned}\left( \begin{matrix}\nabla f(x)&{}Y\\ I&{}-X\end{matrix}\right) =\left( \begin{matrix}\nabla f(x)&{}\nabla f(x)X+Y\\ I&{}0\end{matrix}\right) \left( \begin{matrix}I&{}-X\\ 0&{}I\end{matrix}\right) \end{aligned}$$

is nonsingular. By the representation above, this holds if and only if \(\nabla f(x)X+Y\) is nonsingular. Thus the equivalence between (a) and (b) is established. Similarly, G is SCD regular around \(\big ((x,d),(f(x)+z,x-d)\big )\) if and only if for every pair \(Y^*,X^*\) with \(\mathrm{rge\;}(Y^*,X^*)\in {\mathscr{S}}^* Q(d,z)\) the matrix

$$\begin{aligned}\left( \begin{matrix}\nabla f(x)^TY^*&{}I\\ X^*&{}-I\end{matrix}\right) = \left( \begin{matrix}I&{}-I\\ 0&{}I\end{matrix}\right) \left( \begin{matrix}\nabla f(x)^TY^*+X^*&{}0\\ X^*&{}-I\end{matrix}\right) \end{aligned}$$

is nonsingular and the equivalence between (a) and (c) follows.

To establish (iii), just note that by Proposition 3.3 we have

$$\begin{aligned}{\mathscr{S}}H(\bar{x},0)=\left\{ \mathrm{rge\;}\left[ \left( \begin{matrix}I&{}0\\ \nabla f(\bar{x})&{}I\end{matrix}\right) \begin{pmatrix}X\\ Y\end{pmatrix}\right] =\mathrm{rge\;}(X,\nabla f(x)X+Y)\,\Big |\,\mathrm{rge\;}(X,Y)\in {\mathscr{S}}Q(\bar{x},-f(\bar{x}))\right\} \end{aligned}$$

and the assertion follows from (ii) and the definition of SCD regularity.

Let us now consider the Newton step. Assume that, emanating from the iterate \({x}^{(k)}\), we have computed \(\big (({\hat{x}}^{(k)},{\hat{d}}^{(k)}),({\hat{y}_1}^{(k)}, {\hat{y}_2}^{(k)})\big )\in \mathrm {gph}\,G\) as the result of the approximation step.

Case (i): We compute the Newton direction according to step 4.a) of Algorithm 1.

By Proposition 5.2, we have to compute two \(n\times n\) matrices \({{Y^*}}^{(k)}, {X^*}^{(k)}\) with \(\mathrm{rge\;}({Y^*}^{(k)}, {X^*}^{(k)})\in {\mathscr{S}}^*Q({\hat{d}}^{(k)},{\hat{y}_1}^{(k)}-\nabla f({\hat{x}}^{(k)}))\) and to solve the system

$$\begin{aligned} \left( \begin{array}{cc} \nabla f({\hat{x}}^{(k)})^T{Y^*}^{(k)}&{}I\\ {X^*}^{(k)}&{}-I\end{array}\right) ^T\begin{pmatrix}{\Delta x}^{(k)}\\ {\Delta d}^{(k)}\end{pmatrix}=\left( \begin{array}{cc}{{Y^*}^{(k)}}^T\nabla f({\hat{x}}^{(k)})&{}{{X^*}^{(k)}}^T\\ I&{}-I\end{array}\right) \begin{pmatrix}{\Delta x}^{(k)}\\ {\Delta d}^{(k)}\end{pmatrix}=-\left( \begin{array}{cc}{Y^*}^{(k)}&{}0\\ 0&{}I\end{array}\right) ^T\begin{pmatrix}{\hat{y}_1}^{(k)}\\ {\hat{y}_2}^{(k)}\end{pmatrix}. \end{aligned}$$

Using the second equation we can eliminate \({\Delta d}^{(k)}={\Delta x}^{(k)}+{\hat{y}_2}^{(k)}\) and arrive at the linear system

$$\begin{aligned} \left( {{Y^*}^{(k)}}^T\nabla f({\hat{x}}^{(k)})+{{X^*}^{(k)}}^T\right) {\Delta x}^{(k)}=-\left( {{Y^*}^{(k)}}^T{\hat{y}_1}^{(k)}+{{X^*}^{(k)}}^T{\hat{y}_2}^{(k)}\right) .\end{aligned}$$
(5.4)

Case (ii): The Newton direction is computed by step 4.b) of Algorithm 1.

In this case we determine two \(n\times n\) matrices \({X}^{(k)},{Y}^{(k)}\) with \(\mathrm{rge\;}({X}^{(k)},{Y}^{(k)})\in {\mathscr{S}}Q({\hat{d}}^{(k)},{\hat{y}_1}^{(k)}-\nabla f({\hat{x}}^{(k)}))\), solve the linear system

$$\begin{aligned} \left( \begin{matrix}\nabla f({\hat{x}}^{(k)})&{}{Y}^{(k)}\\ I&{}-{X}^{(k)}\end{matrix}\right) \begin{pmatrix}p_1\\ p_2\end{pmatrix}=-\begin{pmatrix}{\hat{y}_1}^{(k)}\\ {\hat{y}_2}^{(k)}\end{pmatrix}\end{aligned}$$

and set

$$\begin{aligned}\begin{pmatrix}{\Delta x}^{(k)}\\ {\Delta d}^{(k)}\end{pmatrix}=\left( \begin{matrix}I&{}0\\ 0&{}{X}^{(k)}\end{matrix}\right) \begin{pmatrix}p_1\\ p_2\end{pmatrix}.\end{aligned}$$

By eliminating \(p_1={X}^{(k)} p_2-{\hat{y}_2}^{(k)}\) we obtain the linear system

$$\begin{aligned}\left( \nabla f({\hat{x}}^{(k)}){X}^{(k)}+ {Y}^{(k)}\right) p_2=\nabla f({\hat{x}}^{(k)}){\hat{y}_2}^{(k)}-{\hat{y}_1}^{(k)},\end{aligned}$$

whose solution yields

$$\begin{aligned}\begin{pmatrix}{\Delta x}^{(k)}\\ {\Delta d}^{(k)}\end{pmatrix}=\begin{pmatrix}{X}^{(k)} p_2-{\hat{y}_2}^{(k)}\\ {X}^{(k)} p_2\end{pmatrix}.\end{aligned}$$

In both cases, the new iterate is given by \({x}^{(k+1)}:={\hat{x}}^{(k)}+{\Delta x}^{(k)}\). Further, we have \({\Delta x}^{(k)}-{\Delta d}^{(k)}=-{\hat{y}_2}^{(k)}={\hat{d}}^{(k)}-{\hat{x}}^{(k)}\) resulting in \({x}^{(k+1)}={\hat{d}}^{(k)}+{\Delta d}^{(k)}\).

6 Algebraic form of the discrete contact problem with Coulomb friction

We consider an elastic body represented by a bounded domain \(\Omega \subset \mathbb {R}^3\) with a sufficiently smooth boundary \(\partial \Omega\). The body is made of elastic, homogeneous, and isotropic material. The boundary consists of three non-empty disjoint parts: \(\partial \Omega =\overline{\Gamma _u}\cup \overline{\Gamma _p}\cup \overline{\Gamma _c}\). Zero displacements are prescribed on \(\Gamma _u\), surface tractions act on \(\Gamma _p\), and the body is subject to volume forces. We seek a displacement field and a corresponding stress field satisfying the Lamé system of PDEs in \(\Omega\), the homogeneous Dirichlet boundary conditions on \(\Gamma _u\), and the Neumann boundary conditions on \(\Gamma _p\). The body is unilaterally supported along \(\Gamma _c\) by some flat rigid foundation given by the halfspace \(\mathbb {R}^2\times \mathbb {R}_-\) and the initial gap between the body and the rigid foundation is denoted by d(x), \(x\in \Gamma _c\). In the contact zone, we consider a static Coulomb Friction condition.

This problem can be described by partial differential equations and boundary conditions for the displacements, which we are looking for. We refer the reader to, e.g., [11], where also a weak formulation can be found. We consider here only the discrete algebraic problem, which arises after some suitable finite element approximation.

Let n denote the number of degrees of freedom of the nodal displacement vector and let p denote the number of contact nodes \(x_i\in \overline{\Gamma _c}\setminus \overline{\Gamma _u}\). After some suitable reordering of the variables, such that the first 3p positions are occupied by the displacements of the nodes lying in the contact part of the boundary, we arrive at the following nodal block structure for an arbitrary vector \(y\in \mathbb {R}^n\):

$$\begin{aligned}y=(y^1,\ldots ,y^p,y^R)\quad \text{ with }\quad y^i\in \mathbb {R}^3,\ i=1,\ldots ,p,\ y^R\in \mathbb {R}^{n-3p}.\end{aligned}$$

In what follows, \(A\in \mathbb {R}^{n\times n}\), \(\tilde{l}\in \mathbb {R}^n\) are the stiffness matrix and the load vector, respectively. Further we are given two matrices \(N\in \mathbb {R}^{p\times n}\), \(T\in \mathbb {R}^{2p\times n}\), where, for a given displacement vector v, Nv yields the normal components at the p contact points, and \(Tv=(T^1v,\ldots ,T^pv)\), where \(T^iv\in \mathbb {R}^2\) is the tangential nodal displacement vector at the i-th contact node. The symbol \(\vert Tv\vert \in \mathbb {R}^p\) denotes a vector defined by

$$\begin{aligned}\vert Tv\vert =(\Vert T^1v\Vert ,\ldots ,\Vert T^pv\Vert ).\end{aligned}$$

We denote with \(\alpha \in \mathbb {R}^p\) the vector of nodal distances with \(\alpha _i:=d(x_i)\) and the friction coefficient is denoted by \({\mathscr{F}}\).

Definition 6.1

([3, Definition 3.6]) As a solution of a discrete contact problem with Coulomb friction we declare any couple \((\tilde{u}, \lambda )\in \mathbb {R}^n\times \mathbb {R}^p_+\) satisfying

$$\begin{aligned}&\langle A\tilde{u},v-\tilde{u}\rangle +{\mathscr{F}}\langle \lambda ,\vert Tv\vert -\vert T\tilde{u}\vert \rangle \ge \langle \tilde{l},v-\tilde{u}\rangle +\langle \lambda , Nv-N\tilde{u}\rangle \ \forall v\in \mathbb {R}^n, \end{aligned}$$
(6.1)
$$\begin{aligned}&\langle \mu -\lambda , N\tilde{u}+\alpha \rangle \ge 0,\ \forall \mu \in \mathbb {R}^p_+. \end{aligned}$$
(6.2)

Since the stiffness matrix A is positive definite and \(\lambda \ge 0\), condition (6.1) is equivalent to the requirement that \(\tilde{u}\) is a minimizer of the convex minimization problem

$$\begin{aligned}\min _v\frac{1}{2}v^T Av-\langle \tilde{l},v\rangle -\langle \lambda , Nv\rangle +{\mathscr{F}}\langle \lambda ,\vert Tv\vert \rangle .\end{aligned}$$

Given a vector \(z=(z_1,z_2,z_3)^T\in \mathbb {R}^3\), we denote by \(z_{12}:=(z_1,z_2)^T\in \mathbb {R}^2\) the vector formed by the first two components. Using this notation, we have

$$\begin{aligned}Nv=(v_3^1,\ldots ,v_3^p)^T\quad \text{ and }\quad T^iv=v^i_{12},\ i=1,\ldots ,p\end{aligned}$$

due to the ordering of the nodal displacements.

Next consider the transformation of variables \(u=\tilde{u}+d\), where \(d=(d^1,\ldots ,d^p,d^R)^T\in \mathbb {R}^n\) is given by

$$\begin{aligned}d^i_{12}:=0,\ d^i_3:=\alpha _i,\ i=1,\ldots ,p,\quad d^R:=0.\end{aligned}$$

Then u is a solution of the problem

$$\begin{aligned}\min _v\frac{1}{2}v^T Av-\langle l,v\rangle -\sum _{i=1}^p\lambda _iv^i_3+\sum _{i=1}^p{\mathscr{F}}\lambda _i\Vert v^i_{12}\Vert ,\end{aligned}$$

where \(l:=\tilde{l}-Ad\). Since the objective in this minimization problem is convex, this is in turn equivalent to the first-order optimality condition

$$\begin{aligned}0&\in (Au-l)^i-\lambda _i(0,0,1)^T+{\mathscr{F}}\lambda _i\partial \Vert u^i_{12}\Vert ,\ i=1,\ldots ,p,\\ 0&=(Au-l)^R. \end{aligned}$$

Further, (6.2) is the same as \(-\lambda _i\in N_{\mathbb {R}_+}(\tilde{u}^i_3+\alpha _i)=N_{\mathbb {R}_+}(u^i_3)\), \(i=1,\ldots ,p\). After eliminating \(\lambda\) from explicit variables, we have thus arrived at the GE

$$\begin{aligned} 0 \in H(u):=Au-l+Q(u), \end{aligned}$$
(6.3)

where

$$\begin{aligned} Q(u)=\prod ^{p}_{i=1} \tilde{Q}(u^{i})\times Q^R(u^R) \end{aligned}$$
(6.4)

with \(\tilde{Q}:\mathbb {R}^3\rightrightarrows \mathbb {R}^3\) and \(Q^R:\mathbb {R}^{n-3p}\rightrightarrows \mathbb {R}^{n-3p}\) defined by

$$\begin{aligned} \tilde{Q}(v):= \left\{ \begin{pmatrix}-{\mathscr{F}}\vartheta \partial \Vert v_{12}\Vert \\ \vartheta \end{pmatrix}\,\bigg |\, \vartheta \in N_{\mathbb {R}_+}(v_3)\right\} \quad \text{ and }\quad Q^R(v):=\{0\}. \end{aligned}$$
(6.5)

GE (6.3) is dependent solely on transformed displacements u. Multipliers \(\lambda _i\) appear only implicitly as \(-\vartheta\) in the description of \(\tilde{Q}\). This is a big difference with respect to other approaches, where the semismooth Newton method for equations is applied to mixed primal-dual systems or purely dual systems using some NCP-functions, see, e.g., [33, 5, 28].

Remark 6.2

We have derived the GE (6.3) for the Signorini problem with static Coulomb friction. We claim also that, for other contact problems with friction involving two elastic bodies, one can derive a GE of the same type. The interested reader is referred to [33, Section 5.2] for an algebraic transformation of a two-body problem to a one-body problem.

Note that

$$\begin{aligned}\mathrm {gph}\,\tilde{Q}=\left\{ (v,g,\vartheta )\in \mathbb {R}^3\times \mathbb {R}^2\times \mathbb {R}\,\big |\,g\in -{\mathscr{F}}\vartheta \partial \Vert v_{12}\Vert ,\ \vartheta \in N_{\mathbb {R}_+}(v_3)\right\} ,\end{aligned}$$

which enables us to prove the following statement.

Proposition 6.3

H is semismooth\(^{*}\) at each point in its graph.

Proof

Consider a point \((\bar{u}, \bar{w}) \in \mathrm {gph}\,H\). By [13, Proposition 3.6] it suffices to show that Q is semismooth\(^{*}\) at \((\bar{u},\bar{w}-A\bar{u}-l)\), which definitely holds true provided \(\tilde{Q}\) is semismooth\(^{*}\) at all points of its graph. Thus, invoking [22, Theorem 3] and using the connection between the semismooth\(^{*}\) property of sets and the respective distance functions, it suffices to show that \(\mathrm {gph}\,\tilde{Q}\) is a subanalytic set. Let us pick a reference point \((\bar{v},\bar{g},\bar{\vartheta }) \in \mathrm {gph}\,\tilde{Q}\) and consider the set

$$\begin{aligned} P=\left\{ (v,g,\vartheta ,p)\in \mathbb {R}^3\times \mathbb {R}^2\times \mathbb {R}\times \mathbb {R}^2\,\left| \; \begin{aligned}&\Vert v-\bar{v}\Vert ^2+\Vert (g,\vartheta )-(\bar{g}, \bar{\vartheta })\Vert ^2\le 1,\ \Vert p\Vert ^2\le 1,\\&\Vert v_{12}\Vert ^2p_1^2=v_1^2, \Vert v_{12}\Vert ^2p_2^2=v_2^2,\\&v_1p_1\ge 0,\ v_2p_2\ge 0,\\&g_1=-{\mathscr{F}}\vartheta p_1,\ g_2=-{\mathscr{F}}\vartheta p_2,\\&v_3\ge 0,\ \vartheta \le 0,\ v_3\vartheta =0 \end{aligned}\right. \right\} .\end{aligned}$$

Clearly, P is semianalytic (as the intersection of finitely many polynomial equalities and inequalities, it is even semialgebraic) and compact. Moreover, by construction, \(\mathrm {gph}\,\widetilde{Q} \cap \mathcal {B}_1(\bar{v},\bar{g},\bar{\vartheta })\) is the canonical projection of P onto the space of variables \((v,g,\vartheta )\) and hence subanalytic, cp. [4]. The proof is complete.

Proposition 6.4

For every \(\gamma >0\), the mapping \((\gamma I_3+\tilde{Q})^{-1}:\mathbb {R}^3\rightrightarrows \mathbb {R}^3\) is single-valued and Lipschitz continuous on \(\mathbb {R}^3\). In particular, \(\tilde{Q}\) is graphically Lipschitzian of dimension 3 at every point of its graph.

Proof

We have \(v\in (\gamma I_3+\tilde{Q})^{-1}(w)\) if and only if

$$\begin{aligned} \gamma v_3+\vartheta =w_3\end{aligned}$$
(6.6)
$$\begin{aligned} \gamma v_{12}+{\mathscr{F}}(-\vartheta ) v_{12}^*=w_{12} \end{aligned}$$
(6.7)

for some \(\vartheta \in N_{\mathbb {R}_+}(v_3)\) and some \(v_{12}^*\in \partial \Vert v_{12}\Vert\). Since \(\gamma I_1+N_{\mathbb {R}_+}\) is both maximal monotone and strongly monotone, \(v_3\) and \(\vartheta\) are uniquely given by

$$\begin{aligned} v_3=(\gamma I_1+N_{\mathbb {R}_+})^{-1}(w_3)=\frac{\max \{w_3,0\}}{\gamma },\quad \vartheta =w_3-\gamma v_3=\min \{w_3,0\}.\end{aligned}$$
(6.8)

For given \(\vartheta \le 0\) the mapping \(\gamma I_2+{\mathscr{F}}(-\vartheta )\partial \Vert \cdot \Vert\) is again maximal monotone and strongly monotone and thus \(v_{12}\) is uniquely given by

$$\begin{aligned}v_{12}=(\gamma I_2+{\mathscr{F}}(-\vartheta )\partial \Vert \cdot \Vert )^{-1}(w_{12})={\left\{ \begin{array}{ll}0&{}\text{ if }\quad \Vert w_{12}\Vert \le {\mathscr{F}}(-\vartheta ),\\ \frac{1}{\gamma }\left( 1-\frac{{\mathscr{F}}(-\vartheta )}{\Vert w_{12}\Vert }\right) w_{12}&{}\text{ if }\quad \Vert w_{12}\Vert >{\mathscr{F}}(-\vartheta ).\end{array}\right. }\end{aligned}$$

These arguments prove that \((\gamma I_3+\tilde{Q})^{-1}\) is single-valued on \(\mathbb {R}^3\) and there remains to show the Lipschitz continuity. Consider two points \(w^j\), \(j=1,2\), together with \((\gamma I_3+\tilde{Q})^{-1}(w^j)=\{v^j\}\) and the corresponding \(\vartheta ^j\in N_{\mathbb {R}_+}(v_3^j)\), \(v_{12}^{*j}\in \partial \Vert v_{12}^j\Vert\), \(j=1,2\), according to (6.6), (6.7). Then we deduce from (6.7) that

$$\begin{aligned}\langle w^1_{12}-w^2_{12}, v^1_{12}-v^2_{12}\rangle&=\gamma \Vert v^1_{12}-v^2_{12}\Vert ^2+{\mathscr{F}}\langle -\vartheta ^1 v_{12}^{*1}+\vartheta ^2 v_{12}^{*2},v^1_{12}-v^2_{12}\rangle \\&= \gamma \Vert v^1_{12}-v^2_{12}\Vert ^2+{\mathscr{F}}(-\vartheta ^1)\langle v_{12}^{*1}-v_{12}^{*2},v^1_{12}-v^2_{12}\rangle +{\mathscr{F}}(\vartheta ^2-\vartheta ^1)\langle v_{12}^{*2},v^1_{12}-v^2_{12}\rangle \\&\ge \gamma \Vert v^1_{12}-v^2_{12}\Vert ^2-{\mathscr{F}}\vert \vartheta ^1-\vartheta ^2\vert \Vert v^1_{12}-v^2_{12}\Vert ,\end{aligned}$$

where we have used the facts that \(-\vartheta ^1\ge 0\), that the subdifferential mapping \(\partial \Vert \cdot \Vert\) is monotone and that \(\Vert v_{12}^{*2}\Vert \le 1\). Since the functions \(t\rightarrow \min \{t,0\}\) and \(t\rightarrow \max \{t,0\}\) are Lipschitz continuous on \(\mathbb {R}\) with constant 1, we obtain from (6.8) that \(\vert \vartheta ^1-\vartheta ^2\vert \le \vert w_3^1-w_3^2\vert\) yielding

$$\begin{aligned}\gamma \Vert v^1_{12}-v^2_{12}\Vert ^2\le \langle w^1_{12}-w^2_{12}, v^1_{12}-v^2_{12}\rangle +{\mathscr{F}}\vert w_3^1-w_3^2\vert \Vert v^1_{12}-v^2_{12}\Vert \le \big (\Vert w^1_{12}-w^2_{12}\Vert +{\mathscr{F}}\vert w_3^1-w_3^2\vert \big )\Vert v^1_{12}-v^2_{12}\Vert \end{aligned}$$

and consequently \(\gamma \Vert v^1_{12}-v^2_{12}\Vert \le \Vert w^1_{12}-w^2_{12}\Vert +{\mathscr{F}}\vert w_3^1-w_3^2\vert\). Since we also have \(\gamma \vert v_3^1-v_3^2\vert \le \vert w_3^1-w_3^2\vert\), we obtain

$$\begin{aligned}\gamma ^2\Vert v^1-v^2\Vert ^2&\le (\Vert w^1_{12}-w^2_{12}\Vert +{\mathscr{F}}\vert w_3^1-w_3^2\vert )^2\\&\quad +\vert w_3^1-w_3^2\vert ^2\le 2(1+{\mathscr{F}}^2)(\Vert w^1_{12}-w^2_{12}\Vert ^2+\vert w_3^1-w_3^2\vert ^2)\\&=2(1+{\mathscr{F}}^2)\Vert w^1-w^2\Vert ^2\end{aligned}$$

establishing Lipschitz continuity of \((\gamma I_3+\tilde{Q})^{-1}\). To see that \(\tilde{Q}\) is graphically Lipschitzian of dimension 3, just take \(\Phi (x,y)=(\gamma x+y,x)\) and \(f:=(\gamma I+\tilde{Q})^{-1}\) to obtain \(\mathrm {gph}\,f=\Phi (\mathrm {gph}\,\tilde{Q})\).

Remark 6.5

Note that the mapping \(\tilde{Q}\) is not hypomonotone, that is, for every \(\gamma >0\) the mapping \(\gamma I_3+\tilde{Q}\) is not monotone. Indeed, consider \(\gamma >0\) and let

$$\begin{aligned}(v^1,w^1):=\big ((1,0,0),(2\gamma ,0,-2\frac{\gamma }{{\mathscr{F}}})\big )\in \mathrm {gph}\,\tilde{Q},\quad (v^2,w^2):=\big ((2,0,0),(0,0,0)\big )\in \mathrm {gph}\,\tilde{Q}.\end{aligned}$$

Then

$$\begin{aligned}\langle (\gamma v^1+w^1)-(\gamma v^2+w^2),v^1 -v^2\rangle =\gamma \Vert v^1-v^2\Vert ^2+\langle (2\gamma ,0,-2\frac{\gamma }{{\mathscr{F}}}),(-1,0,0)\rangle =-\gamma <0\end{aligned}$$

and therefore \(\gamma I_3+\tilde{Q}\) is not monotone.

Further note that in the case when \(\vartheta =0\) and \(v_{12}=0\) the subgradient \(v_{12}^*\in \partial \Vert v_{12}\Vert\) fulfilling (6.7) is not uniquely given.

Throughout the sequel, it is convenient to refer to [3] and express the graph of \(\tilde{Q}\) in the form

$$\begin{aligned} \mathrm {gph}\,\tilde{Q} = L \cup M_{1} \cup M_{3}^{+} \cup M_{2} \cup M_{3}^{-} \cup M_{4}, \end{aligned}$$
(6.9)

where the single sets arising in (6.9) do have a clear mechanical interpretation. Their definitions are provided in the following table.

Table 1 Definitions and mechanic interpretations of the sets from (6.9)

Note that in Table 1 the impossible combinations of variables are crossed out.

Proposition 6.6

\(\tilde{Q}\) is an SCD mapping and \(\mathcal {O}_{\tilde{Q}}= L \cup M_{1} \cup M_{3}^{+}\). In particular, for \((\bar{v},\bar{g},\bar{\vartheta }) \in L\)

$$\begin{aligned} \widehat{\mathscr{S}}\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })= {\mathscr{S}}\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })= \widehat{\mathscr{S}}^*\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })={\mathscr{S}}^*\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })=\{\mathrm{rge\;}(I_3,0)\}, \end{aligned}$$
(6.10)

for \((\bar{v},\bar{g},\bar{\vartheta })\in M_{3}^{+}\)

$$\begin{aligned} \widehat{\mathscr{S}}\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })={\mathscr{S}}\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })= \widehat{\mathscr{S}}^*\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })={\mathscr{S}}^*\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })=\{\mathrm{rge\;}(0,I_3)\}, \end{aligned}$$
(6.11)

and for \((\bar{v},\bar{g},\bar{\vartheta })\in M_{1}\) one has

$$\begin{aligned}&\widehat{\mathscr{S}}\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })={\mathscr{S}}\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })=\left\{ \mathrm{rge\;}\left[ \left( \begin{array}{ll} I_{2} &{} 0\\ 0 &{} 0 \end{array} \right) , \left( \begin{matrix} -{\mathscr{F}}\bar{\vartheta }\frac{1}{\Vert \bar{v}_{12}\Vert } \left( I_2- \frac{\bar{v}_{12}\bar{v}_{12}^{T}}{\Vert \bar{v}_{12} \Vert ^{2}} \right) &{} -{\mathscr{F}}{\bar{v}_{12}}{\Vert \bar{v}_{12} \Vert }\\ 0 &{} 1 \end{matrix}\right) \right] \right\} , \end{aligned}$$
(6.12)
$$\begin{aligned}&\widehat{\mathscr{S}}^*\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })={\mathscr{S}}^*\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })=\left\{ \mathrm{rge\;}\left[ \left( \begin{matrix} I_{2} &{} 0\\ {\mathscr{F}} \frac{\bar{v}_{12}^T}{\Vert \bar{v}_{12} \Vert } &{} 0 \end{matrix} \right) , \left( \begin{matrix} -{\mathscr{F}}\bar{\vartheta }\frac{1}{\Vert \bar{v}_{12}\Vert } \left( I_2- \frac{\bar{v}_{12}\bar{v}_{12}^{T}}{\Vert \bar{v}_{12} \Vert ^{2}} \right) &{} 0\\ 0 &{} 1 \end{matrix}\right) \right] \right\} . \end{aligned}$$
(6.13)

Proof

Since \(\tilde{Q}\) is graphically Lipschitzian of dimension 3 at every point of its graph, it is an SCD mapping.

Note that the sets \(L, M_1\) and \(M_{3}^{+}\) exhibit a stable behavior in the sense that, for a sufficiently small \(\varrho > 0\),

$$\begin{aligned} \left. \begin{array}{l} (\bar{v},\bar{g},\bar{\vartheta })\in L( \mathrm{or}~ M_{1}, \mathrm{or}~ M_{3}^{+})\\ (v,g,\vartheta ) \in \mathrm {gph}\,\tilde{Q} \cap {\mathscr{B}}_\rho (\bar{v},\bar{g},\bar{\vartheta }) \end{array}\right\} \Rightarrow (v,g,\vartheta ) \in L ( \mathrm{or}~ M_{1}, \mathrm{or}~ M_{3}^{+}). \end{aligned}$$

In particular, we have

$$\begin{aligned} \mathrm {gph}\,\tilde{Q}\cap {\mathscr{B}}_\rho (\bar{v},\bar{g},\bar{\vartheta })={\left\{ \begin{array}{ll}(\mathbb {R}^3\times \{0\})\cap {\mathscr{B}}_\rho (\bar{v},\bar{g},\bar{\vartheta })&{}\text{ if }\quad (\bar{v},\bar{g},\bar{\vartheta })\in L,\\ (\{0\} \times \mathbb {R}^3)\cap {\mathscr{B}}_\rho (\bar{v},\bar{g},\bar{\vartheta })&{}\text{ if } \quad (\bar{v},\bar{g},\bar{\vartheta })\in M_3^+,\\ \{(v_{12},0,-{\mathscr{F}}\vartheta \frac{v_{12}}{\Vert v_{12}\Vert },\vartheta )\,\big |\,v_{12}\in \mathbb {R}^2,\vartheta \in \mathbb {R}\}\cap {\mathscr{B}}_\rho (\bar{v},\bar{g},\bar{\vartheta })&{}\text{ if }\quad (\bar{v},\bar{g},\bar{\vartheta })\in M_1.\end{array}\right. }\end{aligned}$$

It follows from Definition 2.1 that

$$\begin{aligned}&T_{\mathrm {gph}\,\tilde{Q}}(\bar{v},\bar{g},\bar{\vartheta })=\\&{\left\{ \begin{array}{ll}\mathbb {R}^3\times \{0\}&{}\text{ if }\quad (\bar{v},\bar{g},\bar{\vartheta })\in L,\\ \{0\} \times \mathbb {R}^3&{}\text{ if }\quad (\bar{v},\bar{g},\bar{\vartheta })\in M_3^+,\\ \left\{ \left( h_{12},0,-{\mathscr{F}}\left( \bar{\vartheta }\frac{1}{\Vert \bar{v}_{12}\Vert }\left( I-\frac{\bar{v}_{12}\bar{v}_{12}^T}{\Vert \bar{v}_{12}^2\Vert }\right) h_{12}+\omega \frac{\bar{v}_{12}}{\Vert \bar{v}_{12}\Vert }\right) ,\omega \right) \,\big |\,h_{12}\in \mathbb {R}^2,\omega \in \mathbb {R}\right\} &{}\text{ if }\quad (\bar{v},\bar{g},\bar{\vartheta })\in M_1.\end{array}\right. } \end{aligned}$$

In all three cases, we have therefore to do with linear subspaces of dimension three, which yield \(\mathcal {O}_{\tilde{Q}} \supset L \cup M_{1} \cup M_{3}^{+}\) and the expressions for \(\widehat{\mathscr{S}}\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })\) in (6.10), (6.11), and (6.12). Concerning the expressions for \(\widehat{\mathscr{S}}^* \tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })\), they can be derived by first computing the respective orthogonal complements and then using the relation (3.1). The equalities \(\widehat{\mathscr{S}}\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })= {\mathscr{S}}\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })\) and \(\widehat{\mathscr{S}}^* \tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })={\mathscr{S}}^* \tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })\) in (6.10)–(6.13) follow from the observation that the matrices that describe the subspaces continuously depend on the argument \((\bar{v},\bar{g},\bar{\vartheta })\).

It remains to show that actually \(\mathcal{O}_{\tilde{Q}} = L \cup M_{1} \cup M_{3}^{+}\), that is, \((M_2\cup M_4\cup M_3^-)\cap \mathcal{O}_{\tilde{Q}}=\emptyset\). Consider first \((\bar{v},\bar{g},\bar{\vartheta })\in M_2\cup M_4\). Then \(\bar{v}_3=\bar{\vartheta }=0\) and it follows that \(\big ((0,0,1),(0,0,0)\big )\in T_{\mathrm {gph}\,\tilde{Q}}(\bar{v},\bar{g},\bar{\vartheta })\), but the opposite direction \(\big ((0,0,-1),(0,0,0)\big )\) cannot belong to the tangent cone because \(-1\not \in T_{\mathbb {R}_+}(0)\). Hence, \(T_{\mathrm {gph}\,\tilde{Q}}(\bar{v},\bar{g},\bar{\vartheta })\) is not a subspace and \((\bar{v},\bar{g},\bar{\vartheta })\not \in \mathcal{O}_{\tilde{Q}}\) follows. Finally, let \((\bar{v},\bar{g},\bar{\vartheta })\in M_3^-\). Then for all \(t>0\) we have \(\big ((t\bar{g},0),(-{\mathscr{F}}\bar{\vartheta }\bar{g},\bar{\vartheta })\big )\in \mathrm {gph}\,\tilde{Q}\) implying \(\big ((\bar{g},0,0),(0,0,0)\big )\in T_{\mathrm {gph}\,\tilde{Q}}(\bar{v},\bar{g},\bar{\vartheta })\). Now assume that \(-\big ((\bar{g},0,0),(0,0,0)\big )\in T_{\mathrm {gph}\,\tilde{Q}}(\bar{v},\bar{g},\bar{\vartheta })\). By definition, there are sequences \(t_k\downarrow 0\) and \((v^k,g^k,\vartheta ^k)\mathop {\longrightarrow }\limits ^{\mathrm {gph}\,\tilde{Q}}(\bar{v},\bar{g},\bar{\vartheta })\) that satisfy \(\big ((v^k,g^k,\vartheta ^k)-(\bar{v},\bar{g},\bar{\vartheta })\big )/t_k\rightarrow \big ((-\bar{g},0),(0,0,0)\big )\). From \((v^k_{12}-\bar{v}_{12})/t_k=v^k_{12}/t_k\rightarrow -\bar{g}\), \(\vartheta ^k\rightarrow \bar{\vartheta }\) and \(\Vert \bar{g}\Vert =-{\mathscr{F}}\bar{\vartheta }\) we deduce

$$\begin{aligned}\lim _{k\rightarrow \infty }g^k=-\lim _{k\rightarrow \infty }{\mathscr{F}}\vartheta ^k\frac{ v_{12}^k}{\Vert v_{12}^k\Vert }=-\lim _{k\rightarrow \infty }{\mathscr{F}}\vartheta ^k\frac{ v_{12}^k/t_k}{\Vert v_{12}^k\Vert /t_k}=-{\mathscr{F}}\bar{\vartheta }\frac{-\bar{g}}{\Vert \bar{g}\Vert }= -\bar{g}\end{aligned}$$

contradicting \(g^k\rightarrow \bar{g}\) and we conclude that \((\bar{v},\bar{g},\bar{\vartheta })\not \in \mathcal{O}_{\tilde{Q}}\). This completes the proof.

Note that in the formulas (6.12) and (6.13) the matrices \(-{\mathscr{F}}\bar{\vartheta }/\Vert \bar{v}_{12}\Vert \left( I_2- \bar{v}_{12}\bar{v}_{12}^{T}/\Vert \bar{v}_{12} \Vert ^{2} \right)\) are unbounded for \(\bar{v}_{12}\rightarrow 0\). Theoretically, this does not cause problems, because convergence is related to SCD regularity, which is independent from the basis representation used of the underlying subspaces. However, the use of ill-conditioned bases might produce numerical instability when computing the Newton direction. For this reason, we present another representation of the collections \({\mathscr{S}}\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })\) and \({\mathscr{S}}^*\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })\) with a well-conditioned base when \((\bar{v},\bar{g},\bar{\vartheta })\in M_1\). Observe that for any two \(n\times n\) matrices AB and every nonsingular \(n\times n\) matrix C there are \(\mathrm{rge\;}(A,B)=\mathrm{rge\;}(AC,BC)\). Thus, the following corollary follows from (6.12), (6.13) by using the scaling matrix

$$\begin{aligned}C=\left( \begin{array}{cc} \left( I_2-{\mathscr{F}}\bar{\vartheta }\frac{1}{\Vert \bar{v}_{12}\Vert } \left( I_2- \frac{\bar{v}_{12}\bar{v}_{12}^{T}}{\Vert \bar{v}_{12} \Vert ^{2}} \right) \right) ^{-1}&{}0\\ 0&{}1 \end{array}\right) =\left( \begin{array}{cc}\frac{\Vert \bar{v}_{12}\Vert }{\Vert \bar{v}_{12}\Vert -{\mathscr{F}}\bar{\vartheta }} I_2- \frac{{\mathscr{F}}\bar{\vartheta }}{\Vert \bar{v}_{12}\Vert -{\mathscr{F}}\bar{\vartheta }}\frac{\bar{v}_{12}\bar{v}_{12}^T}{\Vert \bar{v}_{12}\Vert ^2}&{}0\\ 0&{}1 \end{array}\right) .\end{aligned}$$

Corollary 6.7

For \((\bar{v},\bar{g},\bar{\vartheta })\in M_1\) we have

$$\begin{aligned} {\mathscr{S}}\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })=\left\{ \mathrm{rge\;}\left[ \left( \begin{array}{cc} \frac{\Vert \bar{v}_{12}\Vert }{\Vert \bar{v}_{12}\Vert -{\mathscr{F}}\bar{\vartheta }} I_2- \frac{{\mathscr{F}}\bar{\vartheta }}{\Vert \bar{v}_{12}\Vert -{\mathscr{F}}\bar{\vartheta }}\frac{\bar{v}_{12}\bar{v}_{12}^T}{\Vert \bar{v}_{12}\Vert ^2}&{}0\\ 0&{}0 \end{array}\right) , \left( \begin{array}{cc}\frac{-{\mathscr{F}}\bar{\vartheta }}{\Vert \bar{v}_{12}\Vert -{\mathscr{F}}\bar{\vartheta }}\left( I_2-\frac{\bar{v}_{12}\bar{v}_{12}^T}{\Vert \bar{v}_{12}\Vert ^2}\right) &{} -{\mathscr{F}}\frac{\bar{v}_{12}}{\Vert \bar{v}_{12}\Vert }\\ 0&{}1 \end{array}\right) \right] \right\} \end{aligned}$$
(6.14)
$$\begin{aligned} {\mathscr{S}}^*\tilde{Q}(\bar{v},\bar{g},\bar{\vartheta })=\left\{ \mathrm{rge\;}\left[ \left( \begin{array}{cc}\frac{\Vert \bar{v}_{12}\Vert }{\Vert \bar{v}_{12}\Vert -{\mathscr{F}}\bar{\vartheta }} I_2- \frac{{\mathscr{F}}\bar{\vartheta }}{\Vert \bar{v}_{12}\Vert -{\mathscr{F}}\bar{\vartheta }}\frac{\bar{v}_{12}\bar{v}_{12}^T}{\Vert \bar{v}_{12}\Vert ^2}&{}0\\ {\mathscr{F}}\frac{\bar{v}_{12}^T}{\Vert \bar{v}_{12}\Vert }&{}0 \end{array}\right) , \left( \begin{array}{cc}\frac{-{\mathscr{F}}\bar{\vartheta }}{\Vert \bar{v}_{12}\Vert -{\mathscr{F}}\bar{\vartheta }}\left( I_2-\frac{\bar{v}_{12}\bar{v}_{12}^T}{\Vert \bar{v}_{12}\Vert ^2}\right) &{} 0\\ 0&{}1 \end{array}\right) \right] \right\} . \end{aligned}$$
(6.15)

From Table 1 one can further infer that

  1. (i)

    every point from \(M_{2}\) is accessible by sequences belonging solely to L or to \(M_{1}\);

  2. (ii)

    every point from \(M_{3}^{-}\) is accessible by sequences belonging to \(M_{1}\) or to \(M_{3}^{+}\), and

  3. (iii)

    the singleton \(M_{4}=\{(0,0,0)\}\) is accessible by sequences belonging to L or to \(M_{1}\) or to \(M_3^-\) or to \(M_{3}^{+}\).

This implies in particular that

$$\begin{aligned} \begin{array}{l} \mathrm{rge\;}(I_3,0)\in {\mathscr{S}}^* \tilde{Q}(\bar{v},\bar{g},\bar{\vartheta }) \text{ for } (\bar{v},\bar{g},\bar{\vartheta }) \in M_{2} \cup M_{4} \text{ and } \\ \mathrm{rge\;}(0,I_3)\in {\mathscr{S}}^* \tilde{Q}(\bar{v},\bar{g},\bar{\vartheta }) \text{ for } (\bar{v},\bar{g},\bar{\vartheta }) \in M_{3}^{-}. \end{array} \end{aligned}$$
(6.16)

Formulas (6.16) are used in the implementation of the Newton step of the SCD semismooth\(^{*}\) Newton method to the numerical solution of (6.3) in the next section.

Obviously, the mapping \(Q^R\) given by (6.5) is Lipschitzian and, therefore, graphically Lipschitzian of dimension \(n-3p\) as well. Further \(\mathcal{O}_{Q^R}=\mathrm {gph}\,Q^R=\mathbb {R}^{n-3p}\times \{0\}\) and

$$\begin{aligned}{\mathscr{S}}Q^R(v,0)={\mathscr{S}}^* Q^R(v, 0)=\{\mathrm{rge\;}(I_{n-3p},0)\},\ v\in \mathbb {R}^{n-3p}.\end{aligned}$$

Combining Lemma 3.4, Proposition 3.5 and Lemma 3.7 with Proposition 6.4 and Proposition 6.6 we arrive at the following corollary.

Corollary 6.8

The mapping Q given by (6.4) is a SCD mapping,

$$\begin{aligned} \mathcal{O}_Q=\left\{ \big ((u^1,\ldots ,u^p,u^R), (w^1,\ldots ,w^p,0)\big )\,\big |\,(u^i,w^i)\in \mathcal{O}_{\tilde{Q}}, u^R\in \mathbb {R}^{n-3p}\right\} \end{aligned}$$

and for every \((u,w)=\big ((u^1,\ldots ,u^p,u^R),(w^1,\ldots ,w^p,0)\big )\in \mathrm {gph}\,Q\) we have

$$\begin{aligned}&{\mathscr{S}}Q(u,w)=\\&\left\{ \left. \mathrm{rge\;}\left[ \left( \begin{array}{cccc} U^1&{}&{}&{}0\\ {} &{}\ddots \\ &{}&{}U^p\\ 0&{}&{}&{}I_{n-3p}\end{array}\right) , \left( \begin{array}{cccc} W^1&{}&{}&{}0\\ {} &{}\ddots \\ &{}&{}W^p\\ 0&{}&{}&{}0\end{array}\right) \right] \,\right| \,\begin{array}{l}\mathrm{rge\;}(U^i,W^i)\in {\mathscr{S}}\tilde{Q}(u^i,w^i),\\ i=1,\ldots ,p\end{array}\right\} \\&{\mathscr{S}}^* Q(u,w)=\\&\left\{ \left. \mathrm{rge\;}\left[ \left( \begin{array}{cccc} {W^*}^1&{}&{}&{}0\\ {} &{}\ddots \\ &{}&{}{W^*}^p\\ 0&{}&{}&{}I_{n-3p}\end{array}\right) , \left( \begin{array}{cccc} {U^*}^1&{}&{}&{}0\\ {} &{}\ddots \\ &{}&{}{U^*}^p\\ 0&{}&{}&{}0\end{array}\right) \right] \,\right| \,\begin{array}{l}\mathrm{rge\;}({W^*}^i,{U^*}^i)\in {\mathscr{S}}^*\tilde{Q}(u^i,w^i),\\ i=1,\ldots ,p\end{array}\right\} . \end{aligned}$$

To implement the semismooth\(^{*}\) Newton method, we must also specify the approximation step. For every \(\gamma >0\) the mapping \(\big (\gamma I_{n-3p}+Q^R\big )^{-1}=I_{n-3p}/\gamma\) is obviously single-valued and Lipschitzian on \(\mathbb {R}^{n-3p}\). Together with Proposition 6.4 we obtain that

$$\begin{aligned}(\gamma I_n+Q)^{-1}=\begin{pmatrix}(\gamma I_3+\tilde{Q})^{-1}\\ \vdots \\ (\gamma I_3+\tilde{Q})^{-1}\\ (\gamma I_{n-3p}+Q^R)^{-1}\end{pmatrix}\end{aligned}$$

and consequently also the mapping \((I_n+\frac{1}{\gamma }Q)^{-1}=(\gamma I_n+Q)^{-1}\circ \gamma I_n\) has these properties. Thus, for a given iterate \({u}^{(k)}\), the choice

$$\begin{aligned} {\hat{d}}^{(k)}:=(I_n+\frac{1}{\gamma }Q)^{-1}\big ( {u}^{(k)}-\frac{1}{\gamma }(A{u}^{(k)}-l)\big )=(\gamma I_n+ Q)^{-1}\big ( \gamma {u}^{(k)}-(A{u}^{(k)}-l)\big ) \end{aligned}$$
(6.17)

is feasible for the approximation step by Proposition 5.1.

7 Numerical experiments

We treat various geometries arising from the cuboid \((0,2)\times (0,1)\times (0,1) \subset \mathbb {R}^3\) by modifying its bottom surface. Given a function \(d:(0,2)\times (0,1)\mapsto \mathbb {R}\), we consider the elastic body represented by the domain

$$\begin{aligned}\Omega (d):=\{(x_1,x_2,x_3)\,\big |\,(x_1,x_2)\in (0,2)\times (0,1),\ d(x_1,x_2)<x_3<1\}.\end{aligned}$$

At the left surface \(x_1=0\) of the body, we impose Dirichlet boundary conditions and on the top surface \(x_3=1\) and the right surface \(x_1=2\) act surface tractions with densities \(P_T\) and \(P_R\). The rigid obstacle is given by the half-space \(\mathbb {R}^2\times \mathbb {R}_-\) so that the contact boundary is the bottom surface of the body.

Depending on the discretization parameter lev, the elastic body is uniformly cut into \(n_{x_1}\cdot n_{x_2}\cdot n_{x_3}\) hexahedrons, where

$$\begin{aligned}n_{x_1}=\lceil 4\cdot 2^{lev/2}\rceil ,\ n_{x_2}=n_{x_3}=\lceil 2\cdot 2^{lev/2}\rceil .\end{aligned}$$

This results in a hexahedral mesh with \((n_{x_1}+1)\cdot (n_{x_2}+1)\cdot (n_{x_3}+1)\) vertices, where \((n_{x_2}+1)\cdot (n_{x_3}+1)\) vertices are in the Dirichlet part of the boundary and \(p:=n_{x_1}\cdot (n_{x_2}+1)\) vertices belong to the contact part of the boundary. The total number of degrees of freedom of the nodal displacements is

$$n:=3n_{x_1}\cdot (n_{x_2}+1)\cdot (n_{x_3}+1).$$

The setting is shown in Fig. 1.

Fig. 1
figure 1

An undeformed and deformed cuboid domain are shown in the left and middle pictures. The zero Dirichlet boundary condition for displacements is assumed on the blue part of the boundary \(\Gamma _u\), surface tractions are applied to the green part of the boundary \(\Gamma _p\) and the contact boundary \(\Gamma _c\) is pressed against the (red) rigid plane foundation. Front faces are not visualized. The right picture shows the deformed cuboid domain decomposed in hexahedrons

The resulting GE (6.3) is solved with the SCD semismooth\(^{*}\) Newton method described in Sects. 5 and 6 and the implementation was carried out in MATLAB on a PC with an i7-7700 CPU. The part of the code that describes the model is built on the original code of [3] with the accelerated assembly of the elastic stiffness matrix as described in [6]. This part of the code was also applied to the Tresca friction solver of [16]. The main difference between the implementations is that in the new code, no reduction is done to the nodes on the contact boundary, and the complete problem is treated with all domain nodes.

For the scalar \(\gamma\) used in the approximation step (6.17) we used an approximation of the largest eigenvalue of A obtained by five iterations of the power method. The system (5.4) that defines the Newton direction was solved iteratively using the GMRES method with ILU factorization as a preconditioner. We stopped the GMRES method when the relative residual (non-preconditioned) is less than the prescribed tolerance tol. Clearly, in this case, we will lose the superlinear convergence, but we can expect linear convergence with the rate tol. Of course, we can use more advanced methods to solve the linear system that determines the Newton direction, but the main task of this paper is to demonstrate the efficiency of the semismooth\(^{*}\) Newton method and not the solution of linear systems.

To improve the global convergence properties, we use a non-monotone line search heuristic as introduced in [15]. Recall that the quantity \({\hat{y}}^{(k)}\) given by (5.3) acts as a residual for GE (5.1) at the point \(({x}^{(k)}, {\hat{d}}^{(k)})\). In the context of our contact problem with Coulomb friction, given an iterate \({u}^{(k)}\) of nodal displacements and \({\hat{d}}^{(k)}\) by (6.17), we consider

$$\begin{aligned}{\hat{y}}^{(k)}:=(\gamma ({u}^{(k)}-{\hat{d}}^{(k)}),{u}^{(k)}-{\hat{d}}^{(k)})\end{aligned}$$

as a residual. If the Newton direction used is denoted by \(\Delta {u}^{(k)}\), the next iterate is calculated as \({u}^{(k+1)}={u}^{(k)}+{\alpha }^{(k)}\Delta {u}^{(k)}\), where \({\alpha }^{(k)}\) is chosen as the first element of the sequence \(1,\frac{1}{2},\frac{1}{4},\frac{1}{8},\frac{1}{32},\frac{1}{128}, \frac{0.1}{128},\frac{0.01}{128},\ldots\) such that

$$\begin{aligned}\Vert {\hat{y}}^{(k+1)}\Vert \le \left( 1-0.1{\alpha }^{(k)}+\frac{0.1}{k+1}\right) \Vert {\hat{y}}^{(k)}\Vert .\end{aligned}$$

We considered three different elastic bodies \(\Omega (d_i)\), \(i=1,2,3\), given by

$$\begin{aligned}&d_1(x_1,x_2)=0.01,\\&d_2(x_1,x_2)=\max \{0.01-0.015\sqrt{0.5(x_1-1)^2+2(x_2-0.5)^2},0.0025\},\\&d_3(x_1,x_2)=0.01+0.005\big (\sin (2\pi x_1)+\cos (2\pi x_2)\big ) \end{aligned}$$

and two load cases \(L_1,L_2\) with surface tractions

$$\begin{aligned}&L_1:\ P_T=(0,0,-1\,\mathrm{GPa}),\ P_R=(-0.2\,\mathrm{GPa},0,0),\\&L_2:\ P_T=(0,0,-1\,\mathrm{GPa}),\ P_R=(-0.17\,\mathrm{GPa},-0.1\,\mathrm{GPa},0). \end{aligned}$$

As material parameters, we chose Young’s modulus \(E:=70\,\mathrm{GPa}\) and Poisson’s ratio \(\nu =0.334\) (aluminum). The friction coefficient was always chosen as \({\mathscr{F}}=0.23\). In Table 2 we report for different discretization levels lev the number p of nodes in the contact part of the boundary and the number n of degrees of freedom. Furthermore, using the relative accuracy \(tol=0.1\) to calculate Newton’s direction, for each of the six possible combinations of geometries \(d_1,d_2,d_3\) and load cases \(L_1,L_2\) we list the number of Newton iterations it and the total number gmres of GMRES iterations needed to reduce the initial residual \(\Vert {\hat{y}}^{(0)}\Vert\) by a factor of \(10^{-12}\). The starting point \({u}^{(0)}\) for the semismooth\(^{*}\) Newton method was always chosen as the origin. The value gmres characterizes the computational complexity.

Table 2 Iteration numbers for \(tol=0.1\) and a starting point \({u}^{(0)}=0\)

We can see that for every geometry and every load case the iteration numbers are nearly equal and increase only slightly with finer discretizations.

In Fig. 2 we depict for the different cases the undeformed bottom surface and the contact states for the deformed contact boundary.

Fig. 2
figure 2

Undeformed bottom surface and states in the deformed contact boundary: no contact (blue), sliding (yellow), sticking (red) (Color figure online)

The number of iterations will decrease when a better starting point is available. In Table 3 we display the iteration numbers when we use as a starting point the solution of the previous discretization level interpolated to the finer mesh.

Table 3 Iteration numbers for \(tol=0.1\) and a starting point \({u}^{(0)}\) set as the interpolated solution of the previous discretization level

We observe that the number of GMRES iterations is reduced by \(35-45\%\) at the highest discretization level. A closer analysis shows that, as expected, we essentially avoid iterations to localize the solution. In fact, after 1–3 iterations, we have identified the correct state of all nodes in the contact part of the boundary, and the remaining iterations are only needed to reach the desired accuracy. Note that we use \(tol=0.1\) and therefore expect linear convergence with the rate 0.1. Since we want to reduce the residual by a factor of \(10^{-12}\), we expect about 12 semismooth\(^{*}\) Newton steps to achieve this goal. In most cases, we need fewer iterations: The reason is that the relative residual of the calculated direction is sometimes significantly less than tol.

Finally, we investigate the impact of the parameter tol on the performance of the semismooth\(^{*}\) Newton method. Here, we consider only the load case \(L_2\) and that the bottom surface is given by \(d_3\) with the discretization level \(lev=10\). In Table 4 we report the iteration numbers it of the semismooth\(^{*}\) Newton method and the total number gmres of GMRES iterations for the starting point \({u}^{(0)}=0\). We can see that for \(tol=0.1\) we need the most semismooth\(^{*}\) Newton steps; however, the total number of GMRES iterations, which measures computational complexity, is the lowest.

Table 4 Iteration numbers for various values of tol (case \(d_3/L_2\), \(lev=10\), \({u}^{(0)}=0\))

We show the convergence of the semismooth\(^{*}\) Newton method for the four values of tol in Fig. 3. We see that during the first 5 or 6 iterations, when the semismooth\(^{*}\) Newton method tries to localize the solution, the accuracy tol does not play any role in decreasing the residual, and we only need a lot of GMRES iterations to compute the search directions with higher accuracy. As soon as we are sufficiently close to the solution, the increased accuracy for computing the search direction also yields better convergence rates and consequently fewer iterations for the semismooth\(^{*}\) Newton method. However, we also need more GMRES iterations to calculate the search direction, which defeats the advantage of a better convergence rate.

Fig. 3
figure 3

Convergence of the semismooth\(^{*}\) Newton method for different values of tol (Case \(d_3/L_2\), \(lev=10\), \({u}^{(0)}=0\))

8 Conclusion

The paper shows the abilities of the SCD semismooth\(^{*}\) Newton method to compute, apart from the variational inequalities of the first and second kind, also a more complicated class of equilibria which can be modeled as GEs with an SCD and semismooth\(^{*}\) multi-valued part. This is documented by a large-scale highly complicated contact problem, where the efficiency of the method enables us, in contrast to most existing approaches, to solve the respective GE on the whole domain without the time-consuming reduction to nodes lying on the contact boundary. We do hope that the SCD semismooth\(^{*}\) Newton method will exhibit a similar performance also in some other mechanical problems having a similar structure as the considered contact problem with Coulomb friction.

The paper is dedicated to our friend A.L. Dontchev, for whom nonsmooth Newton methods definitely belonged to favorite research topics and who contributed to the development of this area in a remarkable way, cf., e.g., [7, 10].