1 Introduction

Starting with the work of Guermond [9], recent papers like [11, 12, 15, 16, 18] have approximated linear PDEs by minimal residual methods in Banach spaces. The reasons for using Banach spaces like \(L^p(\Omega )\) or \(W^{1,p}_0(\Omega )\) rather than Hilbert spaces like \(L^2(\Omega )\) or \(W^{1,2}_0(\Omega )\) are manifold. For example, rough data might lead to solutions that are not in \(L^2(\Omega )\), see [11, 16]. Furthermore, minimizing in \(L^1(\Omega )\) seems to allow for computations of viscosity solutions, see [9, Sec. 4.6]. Moreover, for singularly perturbed or convection-dominated diffusion problems, classical finite element methods produce non-physical oscillations that can be overcome by the use of minimal residual methods in spaces like \(L^1(\Omega )\), see for example [12]. Unfortunately, the resulting numerical schemes are non-linear minimization problems which are difficult to solve. We overcome this downside for a minimal residual method in \(W^{-1,p'}(\Omega )\) with \(p>2\) by introducing a regularized Kačanov scheme that converges even for large exponents \(p\gg 2\) towards the exact discrete minimizer. More precisely, we do the following.

Suppose we have some linear operator \(B:W^{1,p'}_0(\Omega ) \rightarrow W^{-1,p'}(\Omega )\) that maps the Sobolev space \(W^{1,p'}_0(\Omega )\) equipped with homogeneous Dirichlet boundary data into the dual space \(W^{-1,p'}(\Omega ) :=(W^{1,p}_0(\Omega ))^*\) with \(1/p + 1/p' = 1\). Given a right-hand side \(F\in W^{-1,p'}(\Omega )\) and discrete subspaces \(U_h\subset W^{1,p'}_0(\Omega )\) and \(V_h\subset W^{1,p}_0(\Omega )\), we approximate the solution \(\mathfrak {u}\in W^{1,p'}_0(\Omega )\) to \(B\mathfrak {u}= F\) by a minimizer

$$\begin{aligned} \mathfrak {u}_h \in \mathop {\textrm{arg min}}\limits _{u_h \in U_h} \, \Vert Bu_h - F \Vert _{V^*_h}\text { with } \Vert Bu_h - F \Vert _{V^*_h} = \sup _{v_h \in V_h\setminus \lbrace 0 \rbrace } \frac{ Bu_h(v_h) - F(v_h) }{\Vert \nabla v_h \Vert _{L^p(\Omega )}}. \end{aligned}$$
(1)

For \(p=2\) the computation of the minimizer in (1) has been discussed in [17, Sec. 3.2]. For \(p>2\) we modify the saddle point problem therein by introducing a computable weight \(\sigma _n^\zeta \) with values within some relaxation interval \(\zeta = [\zeta _-,\zeta _+]\subset (0,\infty )\) in the sense that \(\sigma _n^\zeta (x) \in \zeta \) for almost all \(x\in \Omega \). The resulting scheme introduced in (19) seeks \(\psi _{h,{n+1}}\in V_h\) and \(\mathfrak {u}_{h,n+1} \in U_h\) with

$$\begin{aligned} \begin{aligned} \int _\Omega (\sigma _n^\zeta )^{2-p'} \nabla \psi _{h,n+1} \cdot \nabla v_h \, \textrm{d}x+ B\mathfrak {u}_{h,n+1}(v_h)&= F(v_h){} & {} \text {for all }v_h\in V_h,\\ Bu_h(\psi _{h,n+1})&= 0{} & {} \text {for all }u_h\in U_h. \end{aligned} \end{aligned}$$
(2)

Solving this problem allows us to update the weight and to proceed inductively.

To verify the convergence of the iterative scheme, we introduce in Sect. 2 equivalent formulations of the problem in (1) using duality. Since the resulting problems share similarities with the p-Laplace problem, we can exploit recent ideas for the p-Laplace operator from [1, 5]. In particular, we introduce a regularization of the dual problem via a relaxation interval \(\zeta = [\zeta _-,\zeta _+]\) and show convergence of the minimizers of the regularized problem towards the exact minimizer as \(\zeta _- \rightarrow 0\) and \(\zeta _+ \rightarrow \infty \) in Sect. 3. We verify the convergence of the Kačanov iterations towards the minimizers of the regularized dual problems in Sect. 4. Additionally, we use duality again to rewrite the Kačanov iterations as a primal problem, leading to the scheme in (2). We conclude our analysis with a study of a priori and a posteriori error estimates in Sect. 5 and suggest an adaptive scheme in Sect. 6. Finally, we study numerically the beneficial properties of the scheme and discuss strategies to solve challenging problems like convection-dominated diffusion with vanishing viscosity in Sect. 7.

2 Primal and Dual Formulation

Before we discuss the problem in (1), let us introduce some notation:

  • The operator \(B :W^{1,p'}_0(\Omega ) \rightarrow W^{-1,p'}(\Omega )\) with exponent \(p>2\) is a bounded linear mapping, defining a bilinear form \(b(u,v) :=Bu(v)\) for all \(u\in W^{1,p'}_0(\Omega )\) and \(v \in W^{1,p}_0(\Omega )\). Moreover, let \(F\in W^{-1,p'}(\Omega )\) be some given data.

  • Given a regular triangulation \(\mathcal {T}\) of the bounded Lipschitz domain \(\Omega \subset \mathbb {R}^d\), set for all \(\ell \in \mathbb {N}_0\) the space of piece-wise polynomials \(\mathcal {L}^0_\ell (\mathcal {T}) :=\lbrace w \in L^2(\Omega ) :w|_T\) is a polynomial of maximal degree \(\ell \) for all \(T\in \mathcal {T}\rbrace \) and set for some fixed degrees \(k,\delta \in \mathbb {N}\) the Lagrange finite element spaces

    $$\begin{aligned} U_h&:=\mathcal {L}^1_{k,0}(\mathcal {T}) :=\mathcal {L}^0_{k}(\mathcal {T}) \cap W^{1,p'}_0(\Omega ),\\ V_h&:=\mathcal {L}^1_{k+\delta ,0}(\mathcal {T}) :=\mathcal {L}^0_{k+\delta }(\mathcal {T}) \cap W^{1,p}_0(\Omega ). \end{aligned}$$
  • Set for all \(G\in W^{-1,p'}(\Omega )\) the discrete dual seminorm

    $$\begin{aligned} \Vert G \Vert _{V^*_h} :=\sup _{v_h\in V_h\setminus \lbrace 0 \rbrace } \frac{G(v_h)}{\Vert \nabla v_h \Vert _{L^p(\Omega )}}. \end{aligned}$$
  • Set the subspace \((BU_h)^\perp :=\lbrace v_h \in V_h:b(u_h,v_h) = 0\) for all \(u_h \in U_h\rbrace \subset V_h\).

We can characterize the solution \(\mathfrak {u}_h \in U_h\) to (1) via the saddle point problem: Seek \(\psi _h\in V_h\) and \(\mathfrak {u}_h \in U_h\) such that

$$\begin{aligned} \begin{aligned} \int _\Omega |\nabla \psi _h|^{p-2} \nabla \psi _h \cdot \nabla v_h \, \textrm{d}x+ b(\mathfrak {u}_h,v_h)&= F(v_h){} & {} \text {for all }v_h \in V_h,\\ b(u_h,\psi _h)&= 0{} & {} \text {for all }u_h \in U_h. \end{aligned} \end{aligned}$$
(3)

A further related problem seeks the minimizer

$$\begin{aligned} \psi _h = \mathop {\textrm{arg min}}\limits _{v_h \in (BU_h)^\perp } \frac{1}{p} \int _\Omega |\nabla v_h |^p \, \textrm{d}x- F(v_h). \end{aligned}$$
(4)

Lemma 1

(Existence and equivalent characterization)

  (a) There exists a unique solution \(\psi _h \in (BU_h)^\perp \) to the minimization problem in (4).

  (b) There exists a solution \(\mathfrak {u}_h \in U_h\) to the minimization problem in (1). The solution \(\mathfrak {u}_h\) is unique up to the kernel \(\ker B|_{U_h} :=\lbrace u_h \in U_h:Bu_h = 0\) in \(V_h^*\rbrace \).

  (c) The pair \((\psi _h,\mathfrak {u}_h) \in V_h \times U_h\) solves (3) if and only if \(\mathfrak {u}_h \in U_h\) solves (1) and \(\psi _h \in (BU_h)^\perp \) solves (4).

This lemma is shown in [18, Thm. 4.1] within an abstract framework. We give a direct proof utilizing the following statement.

Lemma 2

(Duality mapping) Let \(G\in V_h^*\).

  (a) There exists a unique solution \(R(G) \in V_h\) to the problem

    $$\begin{aligned} \int _\Omega |\nabla R(G)|^{p-2} \nabla R(G) \cdot \nabla v_h \, \textrm{d}x= G(v_h)\qquad \text {for all }v_h\in V_h. \end{aligned}$$
  (b) If \(G \ne 0\), the function \(\Vert \nabla R(G) \Vert _{L^p(\Omega )}^{-1} R(G)\) is the unique normed function that attains the supremum in the definition of the \(V_h^*\) norm of G in the sense that any \(\Theta _h \in V_h \text { with }\Vert \nabla \Theta _h \Vert _{L^p(\Omega )} = 1\) satisfies

    $$\begin{aligned} G(\Theta _h) = \Vert G \Vert _{V_h^*}\quad \text {if and only if}\quad \Theta _h = \Vert \nabla R(G) \Vert _{L^p(\Omega )}^{-1} R(G). \end{aligned}$$

Proof

Let \(G\in V_h^*\). The direct method in calculus of variations yields the existence of a unique minimizer \(R(G) \in V_h\) with

$$\begin{aligned} R(G) = \mathop {\textrm{arg min}}\limits _{v_h\in V_h} \frac{1}{p} \int _\Omega |\nabla v_h|^p \, \textrm{d}x- G(v_h). \end{aligned}$$

Differentiation shows that this existence result is equivalent to the statement in (a).

Let \(G\ne 0\). Hölder’s inequality and testing with \(v_h = R(G)\) show that

$$\begin{aligned} \Vert G \Vert _{V_h^*} = \sup _{v_h\in V_h\setminus \lbrace 0 \rbrace } \frac{\int _\Omega |\nabla R(G)|^{p-2} \nabla R(G) \cdot \nabla v_h\, \textrm{d}x}{\Vert \nabla v_h \Vert _{L^p(\Omega )}} = \Vert \nabla R(G)\Vert _{L^p(\Omega )}^{p-1}. \end{aligned}$$

This yields \(G(\Vert \nabla R(G)\Vert _{L^p(\Omega )}^{-1} R(G)) = \Vert G \Vert _{V_h^*}\). Let \(\Theta _h\in V_h\) with \(\Vert \nabla \Theta _h \Vert _{L^p(\Omega )} = 1\) be a further function that attains the supremum in the sense that \( G(\Theta _h) = \Vert G \Vert _{V_h^*}. \) The linearity of G implies

$$\begin{aligned} \sup _{v_h\in V_h\setminus \lbrace 0 \rbrace } \frac{G(v_h)}{\Vert \nabla v_h \Vert _{L^p(\Omega )}} = \Vert G \Vert _{V_h^*} = G\left( \tfrac{1}{2}\Vert \nabla R(G)\Vert _{L^p(\Omega )}^{-1} R(G) + \tfrac{1}{2} \Theta _h\right) , \end{aligned}$$

which yields in particular that \(1 \le \big \Vert \tfrac{1}{2}\Vert \nabla R(G)\Vert _{L^p(\Omega )}^{-1} \nabla R(G) + \tfrac{1}{2}\nabla \Theta _h\big \Vert _{L^p(\Omega )}\). This estimate and the triangle inequality show that

$$\begin{aligned} 2&\le \left\Vert \Vert \nabla R(G)\Vert _{L^p(\Omega )}^{-1}\nabla R(G) + \nabla \Theta _h \right\Vert _{L^p(\Omega )}\\&\le \left\Vert \Vert \nabla R(G)\Vert _{L^p(\Omega )}^{-1} \nabla R(G) \right\Vert _{L^p(\Omega )} +\Vert \nabla \Theta _h\Vert _{L^p(\Omega )} = 2. \end{aligned}$$

Since \(W^{1,p}_0(\Omega )\) is a strictly convex space [10], this identity yields

$$\begin{aligned} \Theta _h&= \Vert \nabla R(G)\Vert _{L^p(\Omega )}^{-1} R(G). \end{aligned}$$

\(\square \)
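The statement of Lemma 2 can be checked numerically in a toy setting. The following sketch is not the finite element problem of the lemma: it assumes that \(V_h\) is replaced by \(\mathbb {R}^m\) equipped with the \(\ell ^p\) norm (the gradient is replaced by the identity), in which case the duality mapping is explicit, \(R(G)_i = |g_i|^{p'-2} g_i\). The names `duality_map` and `dual_norm` are illustrative and do not stem from the source.

```python
import numpy as np

def duality_map(g, p):
    """Toy analogue of R(G): solves |r_i|^(p-2) r_i = g_i componentwise."""
    p_prime = p / (p - 1.0)
    # sign * |g|^(p'-1) avoids the singular factor |g|^(p'-2) at g_i = 0
    return np.sign(g) * np.abs(g) ** (p_prime - 1.0)

def dual_norm(g, p):
    """Dual norm of v -> g @ v with respect to the l^p norm (the l^p' norm)."""
    p_prime = p / (p - 1.0)
    return np.sum(np.abs(g) ** p_prime) ** (1.0 / p_prime)

p = 4.0
g = np.array([1.0, -2.0, 0.0, 0.5])
r = duality_map(g, p)
norm_r = np.sum(np.abs(r) ** p) ** (1.0 / p)
theta = r / norm_r  # normalized maximizer, cf. Lemma 2 (b)
```

In this toy setting the three quantities `g @ theta`, `dual_norm(g, p)`, and `norm_r ** (p - 1)` agree up to rounding, which mirrors the identity \(\Vert G \Vert _{V_h^*} = \Vert \nabla R(G)\Vert _{L^p(\Omega )}^{p-1}\) used in the proof.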

Proof of Lemma 1

Step 1 (Proof of (a) and (b)). The direct method in calculus of variations yields the existence of a unique minimizer \(\psi _h \in (BU_h)^\perp \) of the strictly convex energy in (4), that is, it verifies (a). Similarly, we conclude the existence of a unique minimizer

$$\begin{aligned} B\mathfrak {u}_h = \mathop {\textrm{arg min}}\limits _{\lbrace Bu_h :u_h \in U_h\rbrace } \Vert Bu_h - F\Vert _{V_h^*}. \end{aligned}$$

This yields the existence of a minimizer \(\mathfrak {u}_h \in U_h\) to the problem (1) and shows (b).

Step 2 (Proof of (c), trivial case). Let \(\mathfrak {u}_h\in U_h\) and \(\psi _h\in (BU_h)^\perp \) satisfy (1) and (4). If \(B\mathfrak {u}_h = F\) in \(V_h^*\), the problem in (3) is satisfied with \(\psi _h = 0\) and vice versa.

Step 3 (Proof of \(\Leftarrow \) in (c)). Let \(\mathfrak {u}_h\in U_h\) satisfy (1) and let \(\psi _h\in (BU_h)^\perp \) satisfy (4) with \(B\mathfrak {u}_h \ne F\) in \(V_h^*\). Since \(\lbrace Bu_h:u_h \in U_h\rbrace \) is a closed subspace of \(V_h^*\), a consequence of the Hahn-Banach theorem (see for example [21, Prop. 3]) yields the existence of a function \(\Theta _h \in V_h\) with \(\Vert \nabla \Theta _h \Vert _{L^p(\Omega )} = 1\),

$$\begin{aligned} (F - B \mathfrak {u}_h)(\Theta _h) = \Vert B \mathfrak {u}_h - F \Vert _{V_h^*},\quad \text {and}\quad B u_h (\Theta _h) = 0\quad \text {for all }u_h \in U_h. \end{aligned}$$
(5)

Lemma 2 characterizes the function \(\Theta _h \in V_h\) due to the first identity in (5) as \(\Theta _h = \Vert \nabla \phi _h \Vert _{L^p(\Omega )}^{-1} \phi _h\), where \(\phi _h \in V_h\) solves the problem

$$\begin{aligned} \int _\Omega |\nabla \phi _h|^{p-2} \nabla \phi _h \cdot \nabla v_h \, \textrm{d}x= (F -B\mathfrak {u}_h)(v_h) \qquad \text {for all }v_h \in V_h. \end{aligned}$$
(6)

In particular, the function \(\phi _h\) solves

$$\begin{aligned} \int _\Omega |\nabla \phi _h|^{p-2} \nabla \phi _h \cdot \nabla v_h \, \textrm{d}x= F( v_h)\qquad \text {for all }v_h \in (BU_h)^\perp . \end{aligned}$$

Since this characterizes the minimizer in (4) and \(\phi _h \in (BU_h)^\perp \) due to the second identity in (5), we have \(\phi _h = \psi _h\). Hence, the functions \((\mathfrak {u}_h,\psi _h) \in U_h\times V_h\) solve (3).

Step 4 (Proof of \(\Rightarrow \) in (c)). If there exists a solution \((\mathfrak {u}_h,\psi _h) \in U_h\times V_h\) to (3), the function \(\psi _h\in V_h\) is an element of \((BU_h)^\perp \) and satisfies in particular

$$\begin{aligned} \int _\Omega |\nabla \psi _h|^{p-2} \nabla \psi _h \cdot \nabla v_h \, \textrm{d}x= F (v_h)\qquad \text {for all }v_h \in (BU_h)^\perp . \end{aligned}$$
(7)

This identity characterizes the unique (Step 1) solution to (4), that is, \(\psi _h\) must be the minimizer in (4). The solution \(\mathfrak {u}_h\in U_h\) to (3) is characterized via the identity

$$\begin{aligned} b(\mathfrak {u}_h,v_h) = F(v_h) - \int _\Omega |\nabla \psi _h|^{p-2} \nabla \psi _h\cdot \nabla v_h\, \textrm{d}x\qquad \text {for all }v_h \in V_h. \end{aligned}$$
(8)

Since the right-hand side equals zero for all \(v_h \in (BU_h)^\perp \) due to (7), it is in the range of the operator \(B:U_h\rightarrow V_h^*\), that is, there exists a unique solution \(\mathfrak {u}_h \in U_h/\ker B|_{U_h}\) to (8). We know from Step 3 that the solution to (1) solves the problem in (8) as well. The uniqueness of these solutions up to the kernel \(\ker B|_{U_h}\) (Step 1) implies that they must coincide. \(\square \)

The minimization problem in (4) shares similarities with the p-Laplace problem, which can be solved by the regularized Kačanov scheme introduced in [5]. Unfortunately, this scheme converges only for \(p \le 2\). We remedy this downside as in [1] by the use of duality. The dual problem of (4) involves the affine space

$$\begin{aligned} \Sigma :=\left\{ \tau \in L^{p'}(\Omega ;\mathbb {R}^d) :\int _\Omega \nabla v_h \cdot \tau \, \textrm{d}x= F(v_h) \text { for all } v_h \in (BU_h)^\perp \right\} . \end{aligned}$$
(9)

It seeks the minimizer to the problem

$$\begin{aligned} \sigma = \mathop {\textrm{arg min}}\limits _{\tau \in \Sigma } \frac{1}{p'} \int _\Omega |\tau |^{p'}\, \textrm{d}x. \end{aligned}$$
(10)

Let us show the equivalence of the problems in (4) and (10). The solution to (4) is characterized via the Euler-Lagrange equation as the unique solution \(\psi _h \in (BU_h)^\perp \) to

$$\begin{aligned} \int _\Omega |\nabla \psi _h|^{p-2} \nabla \psi _h \cdot \nabla v_h \, \textrm{d}x= F(v_h)\qquad \text {for all } v_h\in (BU_h)^\perp . \end{aligned}$$
(11)

The solution \(\sigma \in L^{p'}(\Omega ;\mathbb {R}^d)\) to (10) solves, together with a unique function \(\phi _h \in (BU_h)^\perp \), the saddle point problem

$$\begin{aligned} \begin{aligned} \int _\Omega |\sigma |^{p'-2} \sigma \cdot \tau \, \textrm{d}x- \int _\Omega \nabla \phi _h \cdot \tau \, \textrm{d}x&= 0{} & {} \text {for all } \tau \in L^{p'}(\Omega ;\mathbb {R}^d), \\ - \int _\Omega \nabla v_h \cdot \sigma \, \textrm{d}x&= - F(v_h){} & {} \text {for all } v_h\in (BU_h)^\perp . \end{aligned} \end{aligned}$$
(12)

Lemma 3

(Duality) The solutions to (11) and (12) are related via the identities

$$\begin{aligned} \sigma = |\nabla \psi _h|^{p-2} \nabla \psi _h,\qquad \nabla \psi _h = |\sigma |^{p'-2} \sigma ,\qquad \text {and}\qquad \phi _h = \psi _h. \end{aligned}$$
(13)

Furthermore, the minimal energies satisfy

$$\begin{aligned} \frac{1}{p} \int _\Omega |\nabla \psi _h|^p \, \textrm{d}x- F(\psi _h) = - \frac{1}{p'} \int _\Omega |\sigma |^{p'}\, \textrm{d}x. \end{aligned}$$
(14)

Proof

Let \(\psi _h \in (BU_h)^\perp \) solve (4) and define the functions

$$\begin{aligned} \sigma :=|\nabla \psi _h|^{p-2} \nabla \psi _h \in L^{p'}(\Omega ;\mathbb {R}^d)\qquad \text {and}\qquad \phi _h :=\psi _h \in (BU_h)^\perp . \end{aligned}$$

Direct calculations show that these functions solve the saddle point problem in (12). Since the solution to (12) is unique (due to the uniqueness of the minimizer \(\sigma \) and the fact that the first line in (12) uniquely determines \(\phi _h\) via the identity \(|\sigma |^{p'-2} \sigma = \nabla \phi _h\)), we obtain the equivalence stated in (13). Since \(1/p+1/p'=1\) implies with (13) that \(|\nabla \psi _h|^p = |\sigma |^{p'}\), the identity in (11) yields

$$\begin{aligned} \frac{1}{p} \int _\Omega |\nabla \psi _h|^p \, \textrm{d}x- F(\psi _h) = \left( \frac{1}{p}-1\right) \int _\Omega |\nabla \psi _h|^p \, \textrm{d}x= -\frac{1}{p'} \int _\Omega |\sigma |^{p'}\, \textrm{d}x. \end{aligned}$$

This shows (14) and concludes the proof. \(\square \)
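The exponent arithmetic behind (13) and the identity \(|\nabla \psi _h|^p = |\sigma |^{p'}\) can be spelled out as follows. It uses only \(p' = p/(p-1)\), that is, \((p-1)p' = p\) and \((p-1)(p'-1) = 1\); at points where \(\nabla \psi _h = 0\) both sides vanish.

$$\begin{aligned} |\sigma | = |\nabla \psi _h|^{p-1},\qquad |\sigma |^{p'} = |\nabla \psi _h|^{(p-1)p'} = |\nabla \psi _h|^{p},\qquad |\sigma |^{p'-2}\sigma = |\nabla \psi _h|^{(p-1)(p'-2) + p-2}\, \nabla \psi _h = \nabla \psi _h. \end{aligned}$$

The last exponent equals \((p-1)(p'-1) - (p-1) + p - 2 = 0\).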

We want to solve the non-linear problem in (12) via the iterative scheme

$$\begin{aligned} \begin{aligned} \int _\Omega |\sigma _n|^{p'-2} \sigma _{n+1} \cdot \tau \, \textrm{d}x- \int _\Omega \nabla \phi _{h,n+1} \cdot \tau \, \textrm{d}x&= 0{} & {} \text {for all } \tau \in L^{p'}(\Omega ;\mathbb {R}^d), \\ - \int _\Omega \nabla v_h \cdot \sigma _{n+1} \, \textrm{d}x&= - F(v_h){} & {} \text {for all } v_h\in (BU_h)^\perp . \end{aligned} \end{aligned}$$

However, the resulting problems are in general not well posed since \(\sigma _n\) might degenerate. We thus introduce the following regularization.

3 Regularization

Following [5] and [1], we define for any relaxation interval \(\zeta = [\zeta _-,\zeta _+] \subset (0,\infty )\) and all \(t \ge 0\) the integrand

$$\begin{aligned} \kappa ^*_\zeta (t) :={\left\{ \begin{array}{ll} \frac{1}{2}\zeta _-^{p'-2}t^2 + \left( \frac{1}{p'}-\frac{1}{2}\right) \zeta _-^{p'}&{}\text {for }t\le \zeta _-,\\ \frac{1}{p'}t^{p'}&{}\text {for } \zeta _-\le t\le \zeta _+,\\ \frac{1}{2}\zeta _+^{p'-2}t^2+\left( \frac{1}{p'}-\frac{1}{2}\right) \zeta _+^{p'}&{}\text {for }\zeta _+ \le t. \end{array}\right. } \end{aligned}$$

We furthermore define for all \(\tau \in L^{p'}(\Omega ;\mathbb {R}^d)\) the energies

$$\begin{aligned} \mathcal {J}^*_\zeta (\tau ) :=\int _\Omega \kappa ^*_\zeta (|\tau |)\, \textrm{d}x\qquad \text {and}\qquad \mathcal {J}^*(\tau ):=\frac{1}{p'} \int _\Omega |\tau |^{p'} \, \textrm{d}x. \end{aligned}$$

Notice that the regularized energy \(\mathcal {J}^*_\zeta (\tau )\) equals infinity if \(\tau \in L^{p'}(\Omega ;\mathbb {R}^d) {\setminus } L^2(\Omega ;\mathbb {R}^d)\). Furthermore, the relaxed energy is monotone with respect to the relaxation interval in the sense that all \(\tau \in L^{p'}(\Omega ;\mathbb {R}^d)\) and relaxation intervals \(\zeta ^2 = [\zeta ^2_-,\zeta ^2_+] \subset \zeta ^1 = [\zeta ^1_-,\zeta ^1_+] \subset (0,\infty )\) satisfy

$$\begin{aligned} \mathcal {J}^*(\tau ) \le \mathcal {J}^*_{\zeta ^1}(\tau )\le \mathcal {J}^*_{\zeta ^2}(\tau ). \end{aligned}$$
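The relaxed integrand and its monotonicity in the relaxation interval can be explored with a few lines of NumPy. The sketch below is a direct transcription of the piecewise definition of \(\kappa ^*_\zeta \); the function name `kappa_star`, the exponent \(p' = 4/3\), and the two sample intervals are choices made here for illustration only.

```python
import numpy as np

def kappa_star(t, zeta_minus, zeta_plus, p_prime):
    """Relaxed integrand kappa*_zeta(t) for t >= 0 and 1 < p' < 2."""
    t = np.asarray(t, dtype=float)
    shift = 1.0 / p_prime - 0.5
    # quadratic branch below zeta_minus
    below = 0.5 * zeta_minus ** (p_prime - 2.0) * t**2 + shift * zeta_minus**p_prime
    # unrelaxed branch inside the relaxation interval
    inside = t**p_prime / p_prime
    # quadratic branch above zeta_plus
    above = 0.5 * zeta_plus ** (p_prime - 2.0) * t**2 + shift * zeta_plus**p_prime
    return np.where(t <= zeta_minus, below, np.where(t <= zeta_plus, inside, above))

p_prime = 4.0 / 3.0                           # corresponds to p = 4
t = np.linspace(0.0, 50.0, 2001)
unrelaxed = t**p_prime / p_prime              # integrand of J*
wide = kappa_star(t, 0.1, 10.0, p_prime)      # zeta^1 = [0.1, 10]
narrow = kappa_star(t, 0.5, 2.0, p_prime)     # zeta^2 = [0.5, 2], contained in zeta^1
```

On the sample grid the three arrays are ordered pointwise as `unrelaxed <= wide <= narrow`, which is the integrand-level version of the inequality above.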

The direct method in calculus of variations verifies the existence of a unique minimizer \(\sigma _\zeta \) of \(\mathcal {J}_\zeta ^*\) in \(\Sigma \) in the sense that

$$\begin{aligned} \sigma _\zeta = \mathop {\textrm{arg min}}\limits _{\tau \in \Sigma } \mathcal {J}_\zeta ^*(\tau ). \end{aligned}$$
(15)

In the following we investigate the convergence of \(\sigma _\zeta \) towards the minimizer \(\sigma \in \Sigma \) in (10). Rather than investigating convergence in the \(L^{p'}(\Omega )\) norm, we investigate the convergence of the energies. This energy difference leads to the following bound.

Lemma 4

(Notion of distance) Let \(\sigma \in \Sigma \) be the minimizer in (10) and let \(\tau \in \Sigma \). Then we have

$$\begin{aligned} \Vert |\sigma | + |\sigma - \tau | \Vert _{L^{p'}(\Omega )}^{p'-2} \Vert \sigma - \tau \Vert _{L^{p'}(\Omega )}^2&\lesssim \mathcal {J}^*(\tau ) - \mathcal {J}^*(\sigma ) \lesssim \Vert \sigma - \tau \Vert _{L^{p'}(\Omega )}^{p'}. \end{aligned}$$

Furthermore, we have the lower bound

$$\begin{aligned} \Vert |\tau | + |\sigma - \tau | \Vert _{L^{p'}(\Omega )}^{p'-2} \Vert \sigma - \tau \Vert _{L^{p'}(\Omega )}^2 \lesssim \mathcal {J}^*(\tau ) - \mathcal {J}^*(\sigma ). \end{aligned}$$

The hidden constants depend on p but are independent of the solution \(\sigma \).

Proof

Since this result is well-known in the context of the p-Laplacian, let us briefly summarize its derivation. Let \(\sigma \) and \(\tau \) be as in the lemma. Since \((\mathcal {J}^*)'(\sigma )(\tau - \sigma ) = 0\) due to the minimization property of \(\sigma \), the convexity of \(\mathcal {J}^*\) yields

$$\begin{aligned} \mathcal {J}^*(\tau ) - \mathcal {J}^*(\sigma )&\le (\mathcal {J}^*)'(\tau ) (\tau - \sigma ) = \big ((\mathcal {J}^*)'(\tau ) - (\mathcal {J}^*)'(\sigma )\big ) (\tau - \sigma )\\&= \int _\Omega (|\tau |^{p'-2} \tau - |\sigma |^{p'-2} \sigma ) \cdot (\tau - \sigma ) \, \textrm{d}x. \end{aligned}$$

Further arguments for the integrand as for example shown in [5, Lem. 42] lead to the lower bound

$$\begin{aligned} \int _\Omega (|\tau |^{p'-2} \tau - |\sigma |^{p'-2} \sigma ) \cdot (\tau - \sigma ) \, \textrm{d}x\lesssim \mathcal {J}^*(\tau ) - \mathcal {J}^*(\sigma ). \end{aligned}$$

Additionally, the equivalence \((|P|^{p'-2}P - |Q|^{p'-2}Q) \cdot (P-Q) \eqsim (|Q| + |P-Q|)^{p'-2} |P-Q|^2\) for all \(P,Q\in \mathbb {R}^d\) as shown in [5, Lem. 39] implies

$$\begin{aligned} \int _\Omega (|\tau |^{p'-2} \tau - |\sigma |^{p'-2} \sigma ) \cdot (\tau - \sigma ) \, \textrm{d}x\eqsim \int _\Omega (|\sigma | + |\sigma - \tau |)^{p'-2} |\sigma - \tau |^2\, \textrm{d}x. \end{aligned}$$
(16)

These observations lead to the upper bound in the lemma. The lower bound follows from the reverse Hölder inequality

$$\begin{aligned}&\left( \int _\Omega \big ((|\sigma | + |\sigma - \tau |)^{p'-2}\big )^{\tfrac{1}{1-q}} \, \textrm{d}x\right) ^{1-q} \left( \int _\Omega \big (|\sigma - \tau |^2\big )^{\tfrac{1}{q}} \, \textrm{d}x\right) ^q\\&\qquad \le \int _\Omega (|\sigma | + |\sigma - \tau |)^{p'-2} |\sigma - \tau |^2\, \textrm{d}x\qquad \text {with }q :=\frac{2}{p'}. \end{aligned}$$

Exchanging the role of \(\sigma \) and \(\tau \) in (16) leads to the alternative lower bound. \(\square \)
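For completeness, the pointwise step behind the upper bound can be written out. It uses only that the exponent \(p'-2\) is negative and that \(|\sigma | + |\sigma - \tau | \ge |\sigma - \tau |\); on the set where \(\sigma = \tau \) the integrand vanishes.

$$\begin{aligned} \mathcal {J}^*(\tau ) - \mathcal {J}^*(\sigma ) \lesssim \int _\Omega (|\sigma | + |\sigma - \tau |)^{p'-2} |\sigma - \tau |^2\, \textrm{d}x\le \int _\Omega |\sigma - \tau |^{p'-2} |\sigma - \tau |^2\, \textrm{d}x= \Vert \sigma - \tau \Vert _{L^{p'}(\Omega )}^{p'}. \end{aligned}$$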

We have the following convergence result for the energy differences.

Proposition 5

(Convergence in \(\zeta \)) Let \(\psi _h \in V_h\) denote the solution to (4) and let \(\sigma \) and \(\sigma _\zeta \) denote the minimizers in (10) and (15), respectively. Their energy difference is bounded for all relaxation intervals \(\zeta = [\zeta _-,\zeta _+]\subset (0,\infty )\) and \(r > 2\) by

$$\begin{aligned} \mathcal {J}^*(\sigma _\zeta ) - \mathcal {J}^*(\sigma ) \le \mathcal {J}_\zeta ^*(\sigma _\zeta ) - \mathcal {J}^*(\sigma )&\le \frac{|\Omega |}{p'}\zeta _-^{p'} + \frac{1}{p'}\zeta _+^{-(r-p')} \Vert \sigma \Vert ^{r}_{L^{r}(\Omega )} \\&= \frac{|\Omega |}{p'}\zeta _-^{p'} + \frac{1}{p'}\zeta _+^{-(r-p')} \Vert \nabla \psi _h \Vert ^{r(p-1)}_{L^{r(p-1)}(\Omega )}. \end{aligned}$$

Proof

The first two inequalities follow as in [1, Thm. 3.1]. Since \(|\nabla \psi _h|^p = |\sigma |^{p'}\) due to Lemma 3, the equality then follows from the identity

$$\begin{aligned}&\int _\Omega |\sigma |^r \, \textrm{d}x= \int _\Omega |\sigma |^{p' \tfrac{r}{p'}} \, \textrm{d}x= \int _\Omega |\nabla \psi _h|^{r \tfrac{p}{p'}} \, \textrm{d}x= \int _\Omega |\nabla \psi _h |^{r(p-1)} \, \textrm{d}x. \end{aligned}$$

\(\square \)
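As an illustration of how the bound in Proposition 5 might be used (this choice is not taken from the source), suppose a tolerance \(\varepsilon > 0\) is prescribed and \(\sigma \ne 0\). Balancing the two terms so that each contributes \(\varepsilon /2\) suggests the end points

$$\begin{aligned} \zeta _- = \left( \frac{p'\varepsilon }{2|\Omega |}\right) ^{1/p'} \qquad \text {and}\qquad \zeta _+ = \left( \frac{2\, \Vert \sigma \Vert ^{r}_{L^{r}(\Omega )}}{p'\varepsilon }\right) ^{1/(r-p')}, \end{aligned}$$

for which the right-hand side of the estimate equals \(\varepsilon \), provided \(\varepsilon \) is small enough that \(\zeta _- \le \zeta _+\). Since \(\Vert \sigma \Vert _{L^r(\Omega )}\) is unknown in practice, this is a heuristic guide only.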

Remark 6

(Regularity) The convergence result in Proposition 5 assumes the regularity property \(\psi _h \in W_0^{1,r(p-1)}(\Omega )\). Such a result is indeed true for all \(r \le \infty \), since \(\psi _h \in V_h= \mathcal {L}^1_{k+\delta ,0}(\mathcal {T})\) is a function in a finite dimensional space. However, the norm might increase as the mesh is refined. In practical computations this issue does not seem to cause problems, since we can control the impact of the regularization by comparing the energies \(\mathcal {J}_\zeta ^*(\tau )\) and \(\mathcal {J}^*(\tau )\), cf. Sect. 6, and our numerical experiments in Sect. 7 do not indicate a significantly decreased rate of convergence.

4 Relaxed Kačanov Scheme

In this section we introduce our iterative scheme that converges towards the minimizer \(\sigma _\zeta \) in (15) with relaxation interval \(\zeta = [\zeta _-,\zeta _+]\subset (0,\infty )\). Set \(b \vee c :=\max \lbrace b,c\rbrace \) and \(b\wedge c:=\min \lbrace b,c\rbrace \) for all \(b,c\in \mathbb {R}\). Given some initial value \(\sigma _0\in L^{p'}(\Omega ;\mathbb {R}^d)\), we compute iteratively for any \(n\in \mathbb {N}_0\) the solution \(\sigma _{n+1} \in L^{p'}(\Omega ;\mathbb {R}^d)\) and \(\psi _{h,n+1}\in (BU_h)^\perp \) satisfying for all \(\tau \in L^{p'}(\Omega ;\mathbb {R}^d)\) and \(v_h\in (BU_h)^\perp \)

$$\begin{aligned} \begin{aligned} \int _\Omega (\zeta _-\vee |\sigma _n|\wedge \zeta _+) ^{p'-2} \sigma _{n+1} \cdot \tau \, \textrm{d}x- \int _\Omega \nabla \psi _{h,n+1} \cdot \tau \, \textrm{d}x&= 0,\\ - \int _\Omega \nabla v_h \cdot \sigma _{n+1} \, \textrm{d}x&= - F(v_h). \end{aligned} \end{aligned}$$
(17)

The following proposition shows convergence of the solutions \(\sigma _n\) towards the minimizer \(\sigma _\zeta \) in (15).

Proposition 7

(Convergence) We have with constant \(\rho \lesssim (\zeta _-/\zeta _+)^{2-p'}\) the linear convergence result

$$\begin{aligned} \mathcal {J}_\zeta ^*(\sigma _{n+1}) - \mathcal {J}_\zeta ^*(\sigma _\zeta ) \le (1-\rho )^n \big ( \mathcal {J}_\zeta ^*(\sigma _0) - \mathcal {J}_\zeta ^*(\sigma _\zeta ) \big )\qquad \text {for all }n\in \mathbb {N}. \end{aligned}$$

Proof

This result follows as in [1, Sec. 4]. \(\square \)
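To get a feeling for the iteration counts implied by a linear convergence estimate of this type, the following sketch computes the smallest \(n\) with \((1-\rho )^n \le \varepsilon \) for a given contraction parameter \(\rho \) and a target reduction \(\varepsilon \). Both helper names are hypothetical, and `rho_scale` returns only the scaling \((\zeta _-/\zeta _+)^{2-p'}\) without the hidden constant, so the resulting numbers are indicative at best.

```python
import math

def kacanov_iterations(rho, reduction):
    """Smallest n with (1 - rho)**n <= reduction, for rho and reduction in (0, 1)."""
    if not (0.0 < rho < 1.0 and 0.0 < reduction < 1.0):
        raise ValueError("rho and reduction must lie in (0, 1)")
    # log1p(-rho) evaluates log(1 - rho) accurately for small rho
    return math.ceil(math.log(reduction) / math.log1p(-rho))

def rho_scale(zeta_minus, zeta_plus, p_prime):
    """Scaling (zeta_-/zeta_+)**(2 - p') from Proposition 7, hidden constant omitted."""
    return (zeta_minus / zeta_plus) ** (2.0 - p_prime)
```

For instance, `kacanov_iterations(rho_scale(1e-3, 1e3, 4 / 3), 1e-6)` indicates how strongly a wide relaxation interval can affect the guaranteed rate, even though the observed convergence may be considerably faster than this worst-case count.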

To solve the problem in (17), we utilize duality to obtain a primal problem which seeks \(\psi _{h,n+1} \in (BU_h)^\perp \) such that for all \(v_h\in (BU_h)^\perp \)

$$\begin{aligned} \int _\Omega (\zeta _-\vee |\sigma _n|\wedge \zeta _+)^{2-p'} \nabla \psi _{h,n+1} \cdot \nabla v_h \, \textrm{d}x= F(v_h). \end{aligned}$$
(18)

The corresponding saddle point problem seeks \(\psi _{h,n+1}\in V_h\) and \(\mathfrak {u}_{h,n+1}\in U_h\) with

$$\begin{aligned} \begin{aligned} \int _\Omega (\zeta _-\vee |\sigma _n|\wedge \zeta _+)^{2-p'} \nabla \psi _{h,n+1} \cdot \nabla v_h \, \textrm{d}x+ b(\mathfrak {u}_{h,n+1},v_h)&= F(v_h){} & {} \text {for all }v_h\in V_h,\\ b(u_h,\psi _{h,n+1})&= 0{} & {} \text {for all }u_h\in U_h. \end{aligned} \end{aligned}$$

Proposition 8

(Equivalence) The solution \(\psi _{h,n+1} \in (BU_h)^\perp \) to (18) and \(\sigma _{n+1}\in \Sigma \) to (17) are related via the identity

$$\begin{aligned} \sigma _{n+1} = (\zeta _-\vee |\sigma _n|\wedge \zeta _+)^{2-p'} \nabla \psi _{h,n+1}. \end{aligned}$$

Proof

The same arguments as in the proof of Lemma 3 yield the proposition. \(\square \)

The problem in (18) can be solved iteratively. More precisely, we compute with given initial value \(\sigma _0 \in L^{p'}(\Omega ;\mathbb {R}^d)\) for all \(n\in \mathbb {N}_0\) iteratively the solution \(\psi _{h,n+1} \in (BU_h)^\perp \) to the problem in (18) and define \(\sigma _{n+1} :=(\zeta _-\vee |\sigma _n|\wedge \zeta _+)^{2-p'} \nabla \psi _{h,n+1}\). Notice that the problem in (18) can be equivalently reformulated as the saddle point problem in (2), that is, we seek \(\psi _{h,n+1}\in V_h\) and \(\mathfrak {u}_{h,n+1} \in U_h\) such that

$$\begin{aligned} \begin{aligned} \int _\Omega (\zeta _-\vee |\sigma _n|\wedge \zeta _+)^{2-p'} \nabla \psi _{h,n+1} \cdot \nabla v_h \, \textrm{d}x+ b(\mathfrak {u}_{h,n+1},v_h)&= F(v_h){} & {} \text {for all }v_h \in V_h,\\ b(u_h,\psi _{h,n+1})&= 0{} & {} \text {for all }u_h \in U_h. \end{aligned} \end{aligned}$$
(19)

Due to Proposition 8 the resulting functions \(\sigma _{n+1}\) are the same as the functions obtained by (17) and thus the sequence converges according to Proposition 7 towards the exact discrete minimizer \(\sigma _\zeta \).
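The iteration (19) can be sketched in a few lines of NumPy. The example below is a minimal one-dimensional illustration under assumptions that are not part of the source: \(\Omega = (0,1)\), \(V_h\) consists of piece-wise affine functions on a uniform mesh with \(N\) cells, \(U_h\) of piece-wise affine functions on the mesh with \(N/2\) cells, \(b(u,v) = \int _\Omega u'v'\,\textrm{d}x\), and \(F(v) = \int _\Omega v\,\textrm{d}x\). In this setting \(\sigma _n\) is piece-wise constant, so the weight is evaluated exactly. All function names are hypothetical.

```python
import numpy as np

def relaxed_energy(sigma, m, p_prime, zeta):
    """J*_zeta for a piecewise constant scalar flux sigma with cell sizes m."""
    lo, hi = zeta
    t = np.abs(sigma)
    c = 1.0 / p_prime - 0.5
    kappa = np.where(
        t <= lo, 0.5 * lo ** (p_prime - 2) * t**2 + c * lo**p_prime,
        np.where(t <= hi, t**p_prime / p_prime,
                 0.5 * hi ** (p_prime - 2) * t**2 + c * hi**p_prime))
    return float(np.sum(m * kappa))

def kacanov_step(sigma, D, m, Bmat, f, p_prime, zeta):
    """One step of the saddle point problem (19); returns (psi, u, sigma_new)."""
    lo, hi = zeta
    # cellwise weight (zeta_- v |sigma_n| ^ zeta_+)^(2 - p')
    w = np.clip(np.abs(sigma), lo, hi) ** (2.0 - p_prime)
    A = D.T @ (D * (m * w)[:, None])          # weighted stiffness matrix on V_h
    n_v, n_u = Bmat.shape
    K = np.block([[A, Bmat], [Bmat.T, np.zeros((n_u, n_u))]])
    sol = np.linalg.solve(K, np.concatenate([f, np.zeros(n_u)]))
    psi, u = sol[:n_v], sol[n_v:]
    return psi, u, w * (D @ psi)              # sigma_{n+1} = weight * grad psi

def toy_problem(N=8):
    """Assumed 1D setup: fine P1 test space, coarse P1 trial space, Laplace form."""
    h = 1.0 / N
    D = np.zeros((N, N - 1))                  # cellwise derivative of fine hat functions
    for i in range(N):
        if i <= N - 2:
            D[i, i] = 1.0 / h
        if i >= 1:
            D[i, i - 1] = -1.0 / h
    m = np.full(N, h)
    P = np.zeros((N - 1, N // 2 - 1))         # coarse hats expressed in the fine basis
    for k in range(1, N // 2):
        P[2 * k - 1, k - 1] = 1.0
        P[2 * k - 2, k - 1] = 0.5
        P[2 * k, k - 1] = 0.5
    Bmat = (D.T @ (D * m[:, None])) @ P       # Bmat[i, j] = b(u_j, v_i)
    f = np.full(N - 1, h)                     # F(v_i) = integral of the i-th hat function
    return D, m, Bmat, f

D, m, Bmat, f = toy_problem()
p_prime, zeta = 4.0 / 3.0, (1e-3, 1e3)
sigma = np.zeros(D.shape[0])                  # initial value sigma_0
energies = []
for n in range(30):
    psi, u, sigma = kacanov_step(sigma, D, m, Bmat, f, p_prime, zeta)
    energies.append(relaxed_energy(sigma, m, p_prime, zeta))
```

The recorded values in `energies` are non-increasing up to rounding, which reflects the energy decrease underlying Proposition 7. In higher dimensions `np.abs(sigma)` would have to be replaced by the Euclidean norm of the cellwise flux, and the dense solve by a sparse solver.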

Remark 9

(Finer mesh) The contribution \(\sigma _{n+1} = (\zeta _-\vee |\sigma _n|\wedge \zeta _+)^{2-p'} \nabla \psi _{h,n+1}\) is in general not a polynomial. In the experiments shown below we thus simplified our computations by using the \(L^2(\Omega )\) orthogonal projection \(\Pi _{k+\delta -1} \sigma _{n+1}\) onto the space of piece-wise polynomials of maximal degree \(k+\delta -1\). Alternatively, it is possible to use instead of a higher polynomial degree \(k + \delta \) the space \(V_h = \mathcal {L}^1_{1,0}(\mathcal {T}^+)\) with a sufficiently fine refinement \(\mathcal {T}^+\) of \(\mathcal {T}\). In this case the contribution \((\zeta _-\vee |\sigma _n|\wedge \zeta _+)^{2-p'} \nabla \psi _{h,n+1}\) is piece-wise constant with respect to the underlying triangulation \(\mathcal {T}^+\), which allows for the exact evaluation of \(\sigma _{n+1}\).

5 Error Control

The a priori and a posteriori error control for minimal residual methods is well established, see for example [3, 4, 17, 18, 19]. Let us briefly adapt the proofs therein to our situation. We assume that

  (a) there exists a unique solution \(\mathfrak {u}\in W^{1,p'}_0(\Omega )\) with \(B\mathfrak {u}= F\) in \(W^{-1,p'}(\Omega )\) and

  (b) there exists a Fortin operator \(\Pi :W^{1,p}_0(\Omega ) \rightarrow V_h\) with continuity constant \(\Vert \Pi \Vert < \infty \) in the sense that for all \(u_h\in U_h\) and \(v\in W^{1,p}_0(\Omega )\)

    $$\begin{aligned} b(u_h,v-\Pi v) = 0 \quad \text {and}\quad \Vert \nabla \Pi v \Vert _{L^p(\Omega )} \le \Vert \Pi \Vert \, \Vert \nabla v \Vert _{L^p(\Omega )}. \end{aligned}$$
    (20)

Proposition 10

(Error control) Suppose that the assumptions in (a) and (b) are true. Then the solution \(\mathfrak {u}\) to \(B\mathfrak {u}= F\) in \(W^{-1,p'}(\Omega )\) and \(\mathfrak {u}_h\in U_h\) to (1) satisfy

$$\begin{aligned} \Vert B \mathfrak {u}- B\mathfrak {u}_h \Vert _{W^{-1,p'}(\Omega )} \le (1+2\,\Vert \Pi \Vert ) \min _{u_h\in U_h} \Vert B \mathfrak {u}- B u_h \Vert _{W^{-1,p'}(\Omega )}. \end{aligned}$$
(21)

Moreover, with oscillation \(osc (F) :=\sup _{v\in W^{1,p}_0(\Omega ){\setminus } \lbrace 0 \rbrace } F(v-\Pi v)/\Vert \nabla v \Vert _{L^p(\Omega )}\) we have for any \(u_h \in U_h\) the a posteriori error estimate

$$\begin{aligned} \Vert B \mathfrak {u}- B u_h \Vert _{W^{-1,p'}(\Omega )} \le \Vert \Pi \Vert \, \Vert B \mathfrak {u}- B u_h \Vert _{V_h^*} + osc (F). \end{aligned}$$
(22)

Proof

Let \(u_h \in U_h\). Any \(v \in W^{1,p}_0(\Omega )\) with \(\Vert \nabla v \Vert _{L^p(\Omega )} = 1\) satisfies

$$\begin{aligned} b( \mathfrak {u}- u_h , v) = b( \mathfrak {u}- u_h , \Pi v) + b( \mathfrak {u}- u_h ,v - \Pi v) \le \Vert \Pi \Vert \, \Vert B\mathfrak {u}-B u_h \Vert _{V^*_h} + osc (F). \end{aligned}$$

This proves the a posteriori estimate in (22). To obtain the a priori estimate in (21), we use the minimization property in (1), that is,

$$\begin{aligned} \Vert B\mathfrak {u}-B \mathfrak {u}_h \Vert _{V^*_h} = \min _{u_h \in U_h} \Vert B\mathfrak {u}-B u_h \Vert _{V^*_h} \le \min _{u_h \in U_h} \Vert B\mathfrak {u}-B u_h \Vert _{W^{-1,p'}(\Omega )}. \end{aligned}$$

Moreover, the oscillation satisfies

$$\begin{aligned} osc (F)&= \sup _{v\in W^{1,p}_0(\Omega )\setminus \lbrace 0 \rbrace } \frac{F(v-\Pi v)}{\Vert \nabla v \Vert _{L^p(\Omega )}} = \min _{u_h\in U_h} \sup _{v\in W^{1,p}_0(\Omega )\setminus \lbrace 0 \rbrace } \frac{(B\mathfrak {u}- B u_h)(v-\Pi v)}{\Vert \nabla v \Vert _{L^p(\Omega )}}\\&\le (1 + \Vert \Pi \Vert )\, \min _{u_h\in U_h} \Vert B\mathfrak {u}- B u_h \Vert _{W^{-1,p'}(\Omega )}. \end{aligned}$$

Combining these estimates with (22) concludes the proof of (21). \(\square \)
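Written out, the combination reads as follows. It uses (22) with \(u_h = \mathfrak {u}_h\), the minimization property of \(\mathfrak {u}_h\), and the observation that \(F(v-\Pi v) = (B\mathfrak {u}- Bu_h)(v - \Pi v)\) for every \(u_h \in U_h\) due to (20), so that the oscillation is bounded by \((1+\Vert \Pi \Vert )\) times the best-approximation error.

$$\begin{aligned} \Vert B \mathfrak {u}- B\mathfrak {u}_h \Vert _{W^{-1,p'}(\Omega )}&\le \Vert \Pi \Vert \, \Vert B \mathfrak {u}- B \mathfrak {u}_h \Vert _{V_h^*} + osc (F)\\&\le \Vert \Pi \Vert \min _{u_h\in U_h} \Vert B \mathfrak {u}- B u_h \Vert _{W^{-1,p'}(\Omega )} + (1+\Vert \Pi \Vert ) \min _{u_h\in U_h} \Vert B \mathfrak {u}- B u_h \Vert _{W^{-1,p'}(\Omega )}\\&= (1+2\,\Vert \Pi \Vert ) \min _{u_h\in U_h} \Vert B \mathfrak {u}- B u_h \Vert _{W^{-1,p'}(\Omega )}. \end{aligned}$$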

Corollary 11

(A posteriori for exact solution) Let \((\psi _h,\mathfrak {u}_h) \in V_h\times U_h\) solve (3) and assume that the assumptions in (a) and (b) are satisfied. Moreover, let \(\sigma \in \Sigma \) denote the solution to (10). Then we have the a posteriori error estimate

$$\begin{aligned} \Vert B\mathfrak {u}- B\mathfrak {u}_h \Vert _{W^{-1,p'}(\Omega )} \eqsim \Vert \nabla \psi _h \Vert ^{p-1}_{L^p(\Omega )} + osc (F) = \Vert \sigma \Vert _{L^{p'}(\Omega )} + osc (F). \end{aligned}$$
(23)

Proof

Hölder’s inequality, the first equation in (3), and the identity in (13) show

$$\begin{aligned} \Vert B\mathfrak {u}- B\mathfrak {u}_h \Vert _{V_h^*} = \sup _{v_h \in V_h\setminus \lbrace 0 \rbrace } \frac{\int _\Omega |\nabla \psi _h|^{p-2}\nabla \psi _h \cdot \nabla v_h \, \textrm{d}x}{\Vert \nabla v_h \Vert _{L^p(\Omega )}} = \Vert \nabla \psi _h \Vert ^{p-1}_{L^p(\Omega )} = \Vert \sigma \Vert _{L^{p'}(\Omega )}. \end{aligned}$$

Using this identity in the a posteriori estimate in (22) leads to the upper bound in (23). Equivalence follows from the upper bound for the oscillation

$$\begin{aligned} osc (F)&= \sup _{v \in W^{1,p}_0(\Omega )} \frac{F(v-\Pi v)}{\Vert \nabla v \Vert _{L^{p}(\Omega )}} = \sup _{v \in W^{1,p}_0(\Omega )} \frac{b(\mathfrak {u}- \mathfrak {u}_h,v-\Pi v)}{\Vert \nabla v \Vert _{L^{p}(\Omega )}}\\&\le (1 + \Vert \Pi \Vert )\, \Vert B\mathfrak {u}- B \mathfrak {u}_h \Vert _{W^{-1,p'}(\Omega )}. \end{aligned}$$

\(\square \)

We conclude this section with a discussion of the following additional assumption:

  1. (c)

    The operator \(B:W^{1,p'}_0(\Omega ) \rightarrow W^{-1,p'}(\Omega )\) is bounded from above and below in the sense that

    $$\begin{aligned} \Vert \nabla u \Vert _{L^{p'}(\Omega )} \eqsim \Vert Bu\Vert _{W^{-1,p'}(\Omega )}\qquad \text {for all }u \in W^{1,p'}_0(\Omega ). \end{aligned}$$

Under this additional assumption the error estimates in Proposition 10 and Corollary 11 allow for an estimate of the more natural error quantity \(\Vert \nabla \mathfrak {u}- \nabla \mathfrak {u}_h \Vert _{L^{p'}(\Omega )}\) due to the equivalence

$$\begin{aligned} \Vert \nabla \mathfrak {u}- \nabla \mathfrak {u}_h \Vert _{L^{p'}(\Omega )} \eqsim \Vert B \mathfrak {u}- B \mathfrak {u}_h \Vert _{W^{-1,p'}(\Omega )}. \end{aligned}$$

The assumption in (a) seems to be natural. The assumption in (b) can in many situations be achieved by choosing sufficiently large polynomial degrees \(k+\delta \) for the test space \(V_h = \mathcal {L}^1_{k+\delta }(\mathcal {T})\) as for example investigated in [17, Sec. 4]. The assumption in (c) has been investigated in [11] but seems to be rather restrictive. Indeed, there exist counterexamples for the Laplace problem \(Bu = \int _\Omega \nabla u \cdot \nabla {\varvec{\cdot }}\, \textrm{d}x\) for exponents \(p>4\) and non-smooth non-convex domains \(\Omega \) as shown in [13]. Notice that even in cases where (c) is satisfied, the Galerkin scheme investigated in [11] requires stability of the \(W^{1,2}_0(\Omega )\)-projection in \(W^{1,p}_0(\Omega )\). Such stability results are known for uniform and mildly graded meshes [6, 7], but are an open problem for adaptively refined meshes. Our minimal residual method circumvents this problem by suitable designs of Fortin operators in (b).

6 Adaptive Scheme

As pointed out in Corollary 11, the minimizer \(\sigma \) with (10) allows us to drive an adaptive mesh refinement scheme. However, our iterative scheme does not compute the exact solution \(\sigma \). We thus introduce an adaptive scheme that additionally takes the distance of the current iterate \(\sigma _{n}\) to \(\sigma \) into account. The error indicator that quantifies the error caused by

  1. (a)

    the upper interval bound \(\zeta _+\) reads \(\eta _{\zeta _+}^2(\sigma _{n}) :=\mathcal {J}^*_{\zeta }(\sigma _{n}) - \mathcal {J}^*_{[\zeta _-,\infty )}(\sigma _{n})\),

  2. (b)

    the lower interval bound \(\zeta _-\) reads \( \eta _{\zeta _-}^2(\sigma _{n}) :=\mathcal {J}^*_{\zeta }(\sigma _{n}) - \mathcal {J}^*_{[0,\zeta _+]}(\sigma _{n})\),

  3. (c)

    the error due to the fixed-point iteration reads

  4. (d)

    the error due to the discretization reads

    $$\begin{aligned} \eta _h^{p'} :=\sum _{T\in \mathcal {T}} \eta ^{p'}_h(T)\quad \text {with}\quad \eta ^{p'}_h(T) :=\Vert \sigma _{n} \Vert _{L^{p'}(T)}^{p'}. \end{aligned}$$

The indicators in (a) and (b) provide some information on the impact of the relaxation interval \(\zeta \) on the current iterate. The indicator in (c) is motivated by the convergence result in Proposition 7. The error indicator in (d) is motivated by the a posteriori error estimate for \(\sigma \) in Corollary 11. Notice that \(\sigma _n\) is indeed a good approximation of \(\sigma \) if \(\Vert \sigma - \sigma _n \Vert _{L^{p'}(\Omega )} \ll \Vert \sigma \Vert _{L^{p'}(\Omega )}\), which can be seen by the triangle inequality

$$\begin{aligned} |\Vert \sigma _n \Vert _{L^{p'}(\Omega )} - \Vert \sigma - \sigma _n \Vert _{L^{p'}(\Omega )} | \le \Vert \sigma \Vert _{L^{p'}(\Omega )} \le \Vert \sigma _n \Vert _{L^{p'}(\Omega )} + \Vert \sigma - \sigma _n \Vert _{L^{p'}(\Omega )}. \end{aligned}$$

Lemma 4 states that

$$\begin{aligned} \Vert \sigma - \sigma _n \Vert ^2_{L^{p'}(\Omega )} \lesssim \Vert |\sigma _n| + |\sigma - \sigma _n|\Vert _{L^{p'}(\Omega )}^{2-p'} \big (\mathcal {J}(\sigma _n) - \mathcal {J}(\sigma ) \big ). \end{aligned}$$

Hence, the estimate \(\Vert \sigma - \sigma _n \Vert _{L^{p'}(\Omega )} \ll \Vert \sigma \Vert _{L^{p'}(\Omega )}\) follows from an estimate like

$$\begin{aligned} \mathcal {J}(\sigma _n) - \mathcal {J}(\sigma ) \ll \Vert |\sigma _n| + |\sigma - \sigma _n|\Vert _{L^{p'}(\Omega )}^{p'-2} \Vert \sigma _n\Vert _{L^{p'}(\Omega )}^2 \le \Vert \sigma _n\Vert _{L^{p'}(\Omega )}^{p'} = \eta _h^{p'}. \end{aligned}$$

This motivates the following refinement strategy with some small weight \(w>0\):

  1. (a)

    If , refine the mesh adaptively with the local error contributions \(\eta ^{p'}_h(T)\) as refinement indicator.

  2. (b)

    Otherwise, if , increase \(\zeta _+\).

  3. (c)

    Otherwise, if , decrease \(\zeta _-\).

Then we perform another Kačanov iteration and continue with the evaluation of the resulting error indicators. This leads to an adaptive loop which solves (19) and then might refine the mesh or adapt \(\zeta \) according to (a)–(c). Then the loop resets and proceeds with solving (19) again.

Remark 12

(Primal-dual error estimator) In [1, Sec. 6.2] we use the dual problem with energy \(\mathcal {J}_\zeta \) of the minimization problem in (10) to define the estimator

(24)

This error estimator is a guaranteed upper bound for the error

However, in [1] we focused on a lowest-order scheme in the sense that \(V_h = \mathcal {L}^1_{1,0}(\Omega )\), which allows for accurate evaluations of \(\sigma _{n+1} = (\zeta _-\vee |\sigma _n|\wedge \zeta _+)^{2-p'} \nabla \psi _{h,n+1}\) with a piece-wise constant function \(\sigma _n\). Since in this paper’s minimal residual method the space \(V_h\) is of higher polynomial degree, the evaluation of \(\sigma _{n+1}\) becomes more intricate, cf. Remark 9. Our numerical experiments indicate that this challenge does not impact the convergence of the Kačanov scheme, but it causes difficulties when evaluating the duality gap in (24). We thus use the alternative indicator in (c). Replacing the test space \(V_h\) of higher polynomial degree by a test space \(V_h =\mathcal {L}^1_{1,0}(\mathcal {T}^+)\) with finer mesh \(\mathcal {T}^+ \ge \mathcal {T}\) as discussed in Remark 9 circumvents these issues and thus allows us to use the primal-dual error estimator in (24).

Remark 13

(Cheaper approaches) The adaptive loop suggested in this section is much more costly than the adaptive scheme with linear minimal residual methods for \(p=2\). This might be a price we have to pay to solve challenging PDE’s. On the other hand, for less challenging problems, we might use cheaper versions of the suggested scheme. For example, the scheme performed well in our experiments with a fixed relaxation interval and a fixed small number of Kačanov iterations after each mesh refinement, as for example done in Sect. 7.3. Alternatively, one might use our scheme only in the last step of an adaptive finite element loop to smooth out oscillations.

7 Applications

We conclude this paper with an application of our algorithm to convection-diffusion problems. Given a bounded Lipschitz domain \(\Omega \subset \mathbb {R}^d\), a diffusion coefficient \(\varepsilon > 0\), an incompressible advection field \(\beta \in L^\infty (\Omega ;\mathbb {R}^d)\), a function \(c \in L^\infty (\Omega )\), and a right-hand side \(f \in L^2(\Omega )\), this problem seeks \(u\in W^{1,2}_0(\Omega )\) with

$$\begin{aligned} - div (\varepsilon \nabla u - \beta u ) + cu = f. \end{aligned}$$
(25)

Set for all \(u,v\in W^{1,2}_0(\Omega )\) the functional \(F(v) :=\int _\Omega fv\, \textrm{d}x\) and the bilinear form

$$\begin{aligned} b(u,v) :=\int _\Omega \varepsilon \nabla u \cdot \nabla v \, \textrm{d}x- \int _\Omega u\beta \cdot \nabla v\, \textrm{d}x+ \int _\Omega c uv\, \textrm{d}x. \end{aligned}$$

The variational formulation of (25) seeks the solution \(u\in W^{1,2}_0(\Omega )\) to

$$\begin{aligned} b(u,v) = F(v) \qquad \text {for all }v\in W^{1,2}_0(\Omega ). \end{aligned}$$
(26)

This formulation allows for the application of our minimal residual method. We therefore discretize the spaces \(W_0^{1,p'}(\Omega )\) and \(W_0^{1,p}(\Omega )\) with \(p :=100\) by

$$\begin{aligned} U_h :=\mathcal {L}^1_{1,0}(\mathcal {T})\qquad \text {and}\qquad V_h :=\mathcal {L}^1_{2,0}(\mathcal {T}). \end{aligned}$$

Suitable Fortin operators (20), which might in fact require higher polynomial degrees in \(V_h\), are discussed in [17, Sec. 4]. To compare the results, we apply the following alternative schemes:

  1. (a)

    Our first alternative numerical scheme is the classical Galerkin FEM. It seeks the solution \(\mathfrak {u}^G_h\in U_h\) to the problem

    $$\begin{aligned} b(\mathfrak {u}^G _h, w_h) = F(w_h) \qquad \text {for all }w_h\in U_h. \end{aligned}$$

    Adaptive mesh refinements are driven by the standard residual error estimator investigated for example in [20, Sec. 1.2].

  2. (b)

    The second alternative is the classical first-order system least squares method [2] with Raviart-Thomas space \(RT_0(\mathcal {T}) :=\lbrace q \in H(div ,\Omega ):\) for all \(T\in \mathcal {T}\) there exist \(A\in \mathbb {R}^d\) and \(b\in \mathbb {R}\) with \(q(x)|_T = A + b x\) for all \(x\in T\rbrace \). It seeks the minimizer \((\mathfrak {u}_h^{\textrm{LS}} ,\sigma _h^{\textrm{LS}} ) \in U_h\times RT_0(\mathcal {T})\) that minimizes over all \((u_h,\tau _h) \in U_h\times RT_0(\mathcal {T})\) the functional

    $$\begin{aligned} \Vert \tau _h - \varepsilon \nabla u_h + \beta u_h\Vert _{L^2(\Omega )}^2 + \Vert div \, \tau _h - cu_h + f\Vert _{L^2(\Omega )}^2. \end{aligned}$$

    Adaptive mesh refinements are driven by the local contributions

    $$\begin{aligned} \Vert \sigma _h^{\textrm{LS}} - \varepsilon \nabla \mathfrak {u}_h^{\textrm{LS}} + \beta \mathfrak {u}_h^{\textrm{LS}} \Vert _{L^2(T)}^2 + \Vert div \, \sigma _h^{\textrm{LS}} - c\mathfrak {u}_h^{\textrm{LS}} + f\Vert _{L^2(T)}^2\quad \text {for all }T\in \mathcal {T}. \end{aligned}$$
  3. (c)

    The third alternative is the minimal residual method introduced in [17, Example 2.2 (i)], which seeks the solution to the minimization problem

    $$\begin{aligned} \mathfrak {u}^{\textrm{Min}}_h = \mathop {\textrm{arg min}}\limits _{u_h \in U_h} \sup _{v_h\in V_h\setminus \lbrace 0 \rbrace } \frac{b(u_h,v_h) - F(v_h) }{\Vert \nabla v_h \Vert _{L^2(\Omega )}}. \end{aligned}$$

    Adaptive mesh refinements are driven by the local contributions \(\Vert \eta _h \Vert _{L^2(T)}^2\) for all \(T\in \mathcal {T}\) of the Riesz representative \(\eta _h \in V_h\) with

    $$\begin{aligned} \int _\Omega \nabla \eta _h\cdot \nabla v_h \, \textrm{d}x= b(\mathfrak {u}^{\textrm{Min}}_h,v_h) - F(v_h) \qquad \text {for all }v_h \in V_h. \end{aligned}$$

The adaptive schemes use the Dörfler marking strategy with bulk parameter 0.5.
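For illustration, the Dörfler marking can be sketched in a few lines of Python. This is a minimal sketch and not the implementation used in the experiments: the function name `doerfler_marking` and the list-based interface are hypothetical, and we assume that the bulk criterion is applied directly to the local contributions (for instance \(\eta ^{p'}_h(T)\)) as they are passed in. The routine selects elements in decreasing order of their indicator until the marked elements carry at least the fraction \(\theta \) of the total.

```python
def doerfler_marking(indicators, theta=0.5):
    """Return indices of marked elements such that the marked indicators
    sum to at least theta times the sum of all indicators.

    indicators: list of non-negative local error contributions (one per element).
    theta: bulk parameter in (0, 1]; 0.5 mirrors the value quoted in the text.
    """
    total = sum(indicators)
    # Visit elements from largest to smallest contribution (stable for ties).
    order = sorted(range(len(indicators)), key=lambda i: indicators[i], reverse=True)
    marked = []
    accumulated = 0.0
    for i in order:
        if accumulated >= theta * total:
            break
        marked.append(i)
        accumulated += indicators[i]
    return marked
```

For example, `doerfler_marking([1.0, 4.0, 2.0, 1.0])` marks only the element with index 1, since it already carries half of the total contribution.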

7.1 Experiment 1 (Viscosity Solution 1D)

Fig. 1: Approximations of the viscosity solution \(\mathfrak {u}\) to (27)

Our first experiment considers the one dimensional problem

$$\begin{aligned} \mathfrak {u}' + \mathfrak {u}= 1\text { in }\Omega :=(0,1)\qquad \text {with}\qquad \mathfrak {u}(0) = \mathfrak {u}(1) = 0. \end{aligned}$$
(27)

In other words, we solve the problem in (25) with \(\varepsilon = 0\) and \(\beta = c = 1\). This overdetermined ODE has no classical solution, but can be seen as the limiting case \(\varepsilon \rightarrow 0\) of the problem

$$\begin{aligned} -\varepsilon \mathfrak {u}_\varepsilon '' + \mathfrak {u}_\varepsilon ' + \mathfrak {u}_\varepsilon = 1\text { in }\Omega \qquad \text {with}\qquad \mathfrak {u}_\varepsilon (0) = \mathfrak {u}_\varepsilon (1) = 0. \end{aligned}$$

These functions \(\mathfrak {u}_\varepsilon \) converge towards the viscosity solution \(\mathfrak {u}(x) = 1 - \exp (-x)\), cf. [14, Chap. 1] and [9, Sec. 4.6]. Figure 1 displays the resulting approximations on a partition of the unit interval into \(2^5\) equidistant intervals. The solution \(\mathfrak {u}_h\) to (1) yields, apart from a tiny oscillation on the last intervals, a very accurate approximation of the viscosity solution. In contrast, the Galerkin FEM results in a highly oscillating function that does not resemble any of the solution’s characteristics at all. The solutions to the minimization methods in (b) and (c) experience some fast decay on the first interval. Thereafter, the approximations increase and experience a similar (but stronger) oscillation at the last two intervals. The accuracy of the solutions to (a)–(c) does not improve under uniform mesh refinement. Adaptive mesh refinements, driven by the local residuals of these methods, overcome this problem partially for the methods in (b)–(c). Apart from an oscillation near \(x=1\), the adaptive LSFEM shows some small oscillation near the origin \(x=0\) and the adaptive minimal residual method (c) shows an oscillation near \(x=1/2\). This indicates severe difficulties of the methods in (a)–(c) for problems with small viscosity parameter \(\varepsilon \ll 1\), which can be overcome by the minimal residual method (1) in this paper.
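The limiting behaviour \(\mathfrak {u}_\varepsilon \rightarrow \mathfrak {u}\) can be checked by hand, since the regularized two-point boundary value problem has a closed-form solution. The following Python sketch is an illustration only and is not part of the experiments; the function names `u_eps` and `u_visc` are hypothetical. It writes \(\mathfrak {u}_\varepsilon (x) = 1 + a \exp (r_1(x-1)) + b \exp (r_2 x)\) with the roots \(r_1 > 0 > r_2\) of \(\varepsilon r^2 - r - 1 = 0\) and fixes \(a,b\) by the boundary conditions. The term with \(r_1\) is scaled by \(\exp (-r_1)\) to avoid overflow for small \(\varepsilon \).

```python
import math


def u_eps(x, eps):
    """Exact solution of -eps*u'' + u' + u = 1 on (0,1) with u(0) = u(1) = 0."""
    s = math.sqrt(1.0 + 4.0 * eps)
    r1 = (1.0 + s) / (2.0 * eps)  # large positive root: boundary layer at x = 1
    r2 = -2.0 / (1.0 + s)         # equals (1 - s)/(2*eps), written without cancellation
    e1 = math.exp(-r1)            # underflows harmlessly to 0.0 for tiny eps
    e2 = math.exp(r2)
    # Boundary conditions: 1 + a*e1 + b = 0 and 1 + a + b*e2 = 0.
    a = (e2 - 1.0) / (1.0 - e1 * e2)
    b = -1.0 - a * e1
    return 1.0 + a * math.exp(r1 * (x - 1.0)) + b * math.exp(r2 * x)


def u_visc(x):
    """Viscosity solution 1 - exp(-x) of the limit problem."""
    return 1.0 - math.exp(-x)
```

Away from the boundary layer at \(x=1\), the values `u_eps(x, eps)` approach `u_visc(x)` as `eps` decreases, whereas `u_eps(1.0, eps)` stays zero for every `eps > 0`.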

7.2 Experiment 2 (Viscosity Solution 2D)

Fig. 2: Approximations with about 1000 degrees of freedom of the viscosity solution \(\mathfrak {u}\) with (28) evaluated at (x, 1/2) (top left) and (3/4, y) (top right) with uniform mesh refinements and at (x, 1/2) (bottom) with adaptive mesh refinement

In our second experiment we extend the first experiment to two dimensions: We seek the viscosity solution to

$$\begin{aligned} \frac{d}{dx} \mathfrak {u}+ \mathfrak {u}= 1\text { in }\Omega :=(0,1)^2\qquad \text {with }\mathfrak {u}(0,{\varvec{\cdot }}) = \mathfrak {u}(1,{\varvec{\cdot }}) = 0. \end{aligned}$$
(28)

In other words, we solve the problem in (25) with \(\varepsilon = 0\), \(\beta = (1,0)^\top \), and \(c = 1\) with modified boundary conditions. The viscosity solution reads \(\mathfrak {u}(x,y) = 1- \exp (-x)\). The resulting approximations are displayed in Fig. 2. All methods fail on uniform meshes: While the Galerkin FEM leads to strong oscillations along the x-axis, the minimal residual method in (c) and our suggested method in (1) lead to strong oscillations along the y-axis. The LSFEM seems to be more robust, but does not provide a good approximation either. Uniform mesh refinements do not seem to overcome these difficulties. However, adaptive mesh refinement overcomes this problem for the minimal residual methods. The adaptive LSFEM solution seems to converge to the exact solution but is still much worse than the adaptively computed solutions to the method in (c) and our approach in (1). Indeed, the solution to (c) shows only some tiny oscillation near \(x=1\), while the solution to our scheme in (1) does not show any oscillation at all and provides a very accurate approximation, cf. Fig. 2. This shows that adaptivity might be a key to the convergence of our approximation.

7.3 Experiment 3 (Eriksson and Johnson)

Our third experiment has been introduced by Eriksson and Johnson in [8]. We seek the solution to (26) with \(\beta = (1,0)^\top \), right-hand side \(f=0\), and boundary data \(\mathfrak {u}(0,y) = \sin (\pi y)\) and \(\mathfrak {u}(x,y)=0\) for \(x=1\) or \(y\in \{0,1\}\) with unit square domain \(\Omega = (0,1)^2\). In other words, we seek the solution to

$$\begin{aligned} \begin{aligned} -\varepsilon \Delta \mathfrak {u}+ \frac{d}{d x}\mathfrak {u}&= 0{} & {} \text {in }\Omega ,\\ \mathfrak {u}(x,y)&= 0{} & {} \text {if } y\in \{0,1\}\text { or }x = 1,\\ \mathfrak {u}(x,y)&= \sin (\pi y){} & {} \text {if } x = 0. \end{aligned} \end{aligned}$$
(29)

Let \(s_1 :=(1 + \sqrt{1+4\pi ^2\varepsilon ^2})(2\varepsilon )^{-1}\) and \(s_2 :=(1 - \sqrt{1+4\pi ^2\varepsilon ^2})(2\varepsilon )^{-1}\). The exact solution reads

$$\begin{aligned} \mathfrak {u}(x,y) = \frac{\exp (s_1(x-1)) - \exp (s_2(x-1))}{\exp (-s_1)-\exp (-s_2)}\sin (\pi y). \end{aligned}$$
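Evaluating this formula naively can suffer from cancellation in \(s_2\) when \(\varepsilon \) is tiny. The following Python sketch shows one possible way to evaluate the exact solution; it is an illustration under our own assumptions and not the code used for the experiments, and the function name `eriksson_johnson` is hypothetical. It uses the identity \(s_2 = -2\pi ^2\varepsilon /(1 + \sqrt{1+4\pi ^2\varepsilon ^2})\), which follows from \(s_1 s_2 = -\pi ^2\), and relies on the fact that all exponentials with \(s_1\) have non-positive arguments on \(\overline{\Omega }\).

```python
import math


def eriksson_johnson(x, y, eps):
    """Exact solution of -eps*Laplace(u) + du/dx = 0 on the unit square with
    u = sin(pi*y) at x = 0 and u = 0 on the remaining boundary."""
    root = math.sqrt(1.0 + 4.0 * math.pi**2 * eps**2)
    s1 = (1.0 + root) / (2.0 * eps)
    # Same value as (1 - root)/(2*eps), but without subtracting nearly equal numbers.
    s2 = -2.0 * math.pi**2 * eps / (1.0 + root)
    # For 0 <= x <= 1 the arguments s1*(x-1) and -s1 are non-positive,
    # so these exponentials cannot overflow (they may underflow to 0.0).
    numerator = math.exp(s1 * (x - 1.0)) - math.exp(s2 * (x - 1.0))
    denominator = math.exp(-s1) - math.exp(-s2)
    return numerator / denominator * math.sin(math.pi * y)
```

The sketch reproduces the boundary values: `eriksson_johnson(0.0, y, eps)` equals \(\sin (\pi y)\) and `eriksson_johnson(1.0, y, eps)` vanishes.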

In our first computation we set \(\varepsilon :=10^{-3}\) and use uniformly refined meshes. In contrast to the previous calculations all methods converge as the mesh is uniformly refined. On coarse grids our minimization scheme leads to superior results as depicted in Fig. 3.

Fig. 3: Approximations of the solution to (29) with \(\varepsilon = 10^{-3}\), uniform mesh, and \(\dim U_h = 4225\) evaluated at (x, 1/2)

The situation changes drastically when we solve the problem with very small diffusion coefficient \(\varepsilon :=10^{-6}\). For uniform mesh refinements the solutions to the minimal residual method in (1) and the minimal residual method in (c) show, as in Experiment 2, strong oscillations along the y-axis. The direct solver in FEniCS (MUMPS) was not able to solve the resulting system for the Galerkin FEM solution with more than 80 degrees of freedom. The LSFEM solution seems to converge towards a function \(u \approx \gamma \, \sin (\pi y)\) with some constant \(\gamma \) slightly larger than 1/2 as the mesh is uniformly refined. Unfortunately, adaptivity does not overcome this problem: The direct solver in FEniCS (MUMPS) was not able to compute a solution to the Galerkin FEM and the LSFEM with adaptive mesh refinements for meshes with more than about 300 and 1000 degrees of freedom, respectively. The adaptive scheme for our method in (1) refines strongly near \(x=0\) and the approximation seems to converge point-wise to zero, cf. Fig. 4. The adaptive minimal residual method in (c) refines strongly near \(x=0\) and \(x=1\) and the approximations look roughly like \(u \approx \gamma \, \sin (\pi y)\) with some constant \(\gamma \) slightly larger than 1/2. All in all, none of the schemes converges towards the exact solution. We overcome this challenge by slowly adapting the diffusion parameter \(\varepsilon \) in our computations in the sense that we set

$$\begin{aligned} \varepsilon :={\left\{ \begin{array}{ll} 10^{-2}&{}\text {if }\dim U_h \in [0,1000),\\ 10^{-3}&{}\text {if }\dim U_h \in [1000,5000),\\ 10^{-4}&{}\text {if } \dim U_h \in [5000,10 000),\\ 10^{-5} &{} \text {if }\dim U_h \in [10 000,50 000),\\ 10^{-6} &{} \text {else}. \end{array}\right. } \end{aligned}$$
(30)
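In an implementation, the schedule in (30) amounts to a simple lookup before each solve. The following Python sketch is purely illustrative; the function name `diffusion_parameter` and its argument `ndof` (standing for \(\dim U_h\)) are hypothetical.

```python
def diffusion_parameter(ndof):
    """Return the diffusion parameter eps from the schedule (30),
    where ndof stands for the current dimension of U_h."""
    if ndof < 1000:
        return 1e-2
    if ndof < 5000:
        return 1e-3
    if ndof < 10000:
        return 1e-4
    if ndof < 50000:
        return 1e-5
    return 1e-6
```

For instance, `diffusion_parameter(4225)` returns `1e-3`, and the target value `1e-6` is reached once the number of degrees of freedom is at least 50 000.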

Figure 4 shows the resulting convergence history plot of the error measured in the \(L^2(\Omega )\) norm. Initially adapting \(\varepsilon \) helps all methods, but as \(\dim U_h\) exceeds \(10^3\) (and so \(\varepsilon \) is set to \(10^{-3}\)), the LSFEM starts to struggle. The same happens for the Galerkin and minimal residual method in (a) and (c) as \(\dim U_h\) exceeds \(10^4\) (and so \(\varepsilon \) is set to \(10^{-5}\)). In contrast, our method in (1) still converges as the number of degrees of freedom is increased. In order to save computational power, we did not use the adaptive scheme suggested in Sect. 6. Instead, we fixed the relaxation interval \(\zeta :=[10^{-2},10^2]\) and computed only two Kačanov iterations on each mesh. An alternative calculation, using the adaptive strategy in Sect. 6 with the large weight \(w = 100\) causing about five Kačanov iterations on each mesh, led to similar convergence results.

Fig. 4: The left-hand side shows approximations evaluated at (x, 1/2) with adaptive mesh refinements and fixed parameter \(\varepsilon = 10^{-6}\) with \(\dim U_h \approx 5000\) and the right-hand side shows the convergence history plot of the \(L^2(\Omega )\) error for \(\varepsilon = 10^{-6}\) with adapted diffusion parameter in (30)

8 Conclusion

We have introduced a novel numerical scheme that solves minimal residual methods in \(W^{-1,p'}(\Omega )\). Additionally, we suggested an iterative scheme that converges towards the discrete solution of the resulting non-linear minimization problem. The scheme converges even for large exponents like \(p=100\). The resulting approximations are beneficial for solving challenging PDE’s like convection-dominated diffusion problems compared to other schemes like the Galerkin FEM or minimal residual methods in Hilbert spaces. However, in these challenging situations the convergence of our scheme seems to require some suitable mesh design. This can be done adaptively with some suitably designed initial mesh. We thus suggest a scheme where we adapt the diffusion parameter depending on the number of degrees of freedom. This allowed for the approximation of convection-dominated diffusion problems with tiny diffusion parameters like \(\varepsilon = 10^{-6}\).