1 Introduction

Let \(\Omega \subset \mathbb {R}^d\). For \(1 \le p < \infty \), the \(p-\)Laplace equation is

$$\begin{aligned} \nabla \cdot (\Vert \nabla v\Vert _2^{p-2}\nabla v) = f \text { in } \Omega \text { and } v = g \text { on } \partial \Omega , \end{aligned}$$
(1)

where \(\Vert w\Vert _2 = \left( \sum _{j=1}^d |w_j|^2 \right) ^{1/2}\) is the usual \(2-\)norm on \(\mathbb {R}^d\). Prolonging g from \(\partial \Omega \) to the interior \(\Omega \) and setting \(u = v-g\), the variational form is

$$\begin{aligned} \text {Find } u \in W^{1,p}_0(\Omega ) \text { such that } J(u) = {1 \over p}\int _{\Omega } \Vert \nabla (u+g)\Vert _2^p - \int _{\Omega } fu \text { is minimized}. \end{aligned}$$
(2)

A similar definition can be made in the case \(p=\infty \) and will be discussed in Sect. 3.1.

For \(p=1\), the p-Laplacian is also known as Mean Curvature, and a solution with \(f=0\) is known as a minimal surface [31]. The 1-Laplacian is related to a certain “pusher-chooser” game [19] and compressed sensing [7]. The general p-Laplacian is used for nonlinear Darcy flow [11], modelling sandpiles [2] and image processing [8]. We also mention the standard text of Heinonen et al. [16]; as well as the lecture notes of Lindqvist [21].

One may discretize the variational form (2) using finite elements; we briefly outline this procedure in Sect. 2.1 and refer to Barrett and Liu [3] for details. One chooses piecewise linear basis functions \(\{\phi _j(x)\}\) on \(\Omega \) and sets \(u_h(x) = \sum _{j} u_j \phi _j(x)\). The energy \(J(u_h)\) can be approximated by quadrature; the quadrature is exact when the elements are piecewise linear, since \(\Vert \nabla (u_h+g)\Vert _2^p\) is then piecewise constant. This leads to a finite-dimensional energy functional

$$\begin{aligned}&\text {Find }u \in \mathbb {R}^n \text { such that } J(u) = c^Tu\nonumber \\&\quad + {1 \over p}\sum _{i=1}^m \omega _i \left( \sum _{j=1}^d (D^{(j)} u+b^{(j)})_i^2 \right) ^{p \over 2} \text { is minimized,} \end{aligned}$$
(3)

where \(D^{(j)}\) is a numerical partial derivative, \(b^{(j)} = D^{(j)}g\) comes from the boundary conditions g and c comes from the forcing term f.

Several algorithms have been proposed to minimize the convex functional J(u). Huang et al. [18] proposed a steepest descent algorithm on a regularized functional \(J_{h,\epsilon }(u)\) which works well when \(p > 2\). Tai and Xu [36] proposed a subspace correction algorithm which works best when p is close to 2 but whose convergence deteriorates when \(p \rightarrow 1\) or \(p \rightarrow \infty \). Algorithms based on a multigrid approach (e.g. Huang et al. [9]) suffer from the same problems when p approaches 1 or \(\infty \). The algorithm of Oberman [30] also works for \(p \ge 2\), although the convergence factor deteriorates after several iterations so it is difficult to reach high accuracy with this method.

The problem of minimizing J(u) has much in common with the problem of minimizing a p-norm, which is by now well-understood. The motivation for optimizing a p-norm is often given as a facility location problem [1, 6]. Efficient algorithms for solving such problems can be obtained within the framework of convex optimization and barrier methods; see Hertog et al. [17] and Xue and Ye [38] specifically for p-norm optimization; and for general convex optimization, see Nesterov and Nemirovskii [28], Boyd and Vandenberghe [5] and Nesterov [27] and references therein.

Given a \(\nu \)-self-concordant barrier for a convex problem, it is well-known that the solution can be found in \(O(\sqrt{\nu }\log \nu )\) Newton iterations. However, the “hidden constant” in the big-O notation depends on problem parameters, including the number n of grid points in a finite element discretization. Our main result is to estimate these hidden constants and show that the overall performance of our algorithm is indeed \(O(\sqrt{n} \log n)\).

[Theorem 1, containing the main iteration-count estimates (5) and (6), appears here as a displayed figure in the original.]

The iteration count \(O(\sqrt{n}\log n)\) also holds if \(\epsilon \) is not frozen, provided that \(\epsilon ^{-1}\) grows at most polynomially in n.

We emphasize that the \(p=1,\infty \) cases have up to now been considered especially hard, and there is no other algorithm that offers performance guarantees in these situations. The algorithm behind estimate (6) is the first that is known to converge even when \(p=1\). We also have an algorithm for the \(p = \infty \) case and we provide a corresponding iteration count estimate in Theorem 3.

The algorithm mentioned in Theorem 1 is the barrier method of convex optimization. Consider the problem of minimizing \(c^Tx\) subject to \(x \in Q\), where Q is some convex set. Assume we have a \(\nu \)-self-concordant barrier F for Q. The barrier method works by minimizing \(tc^Tx + F(x)\) for larger and larger values of the barrier parameter t, which is increased iteratively according to some schedule. In the short step variant, t increases slowly and the method is very robust; the estimates of Theorem 1 are for the short step variant of the barrier method. It is well-known that the short step barrier method has the best theoretical convergence estimates, whereas long step variants (where t increases more rapidly) usually work better in practice despite worse theoretical guarantees. Nevertheless, as we will see in the numerical experiments, long step algorithms can sometimes require a large number of Newton iterations to converge.

In order to get the best convergence, we have devised a new, very simple adaptive stepping algorithm for the barrier method. There are already many adaptive stepping strategies (see e.g. Nocedal et al. [29] and references therein), but it is often difficult to prove “global convergence” for them, and we are not aware of global estimates of Newton iteration counts. For our new algorithm, we are able to prove “quasi-optimal” convergence of the adaptive scheme. Here, quasi-optimal means that our adaptive algorithm requires \(\tilde{O}(\sqrt{n})\) Newton iterations, neglecting logarithmic terms, which matches the theoretically optimal short-step algorithm of Theorem 1.

The p-Laplacian is subject to roundoff problems when p is large but finite, as we now briefly describe. Consider the problem of minimizing \(\Vert v\Vert _p^p\) over the space \(\{v = (1,y)\}\). Assume that we are given a machine epsilon \(\epsilon \) (for example, \(\epsilon \approx 2.22\times 10^{-16}\) in double precision) and consider a vector \(v = [1,\delta ]^T\). In this situation, \(\Vert v\Vert _p^p = 1^p + \delta ^p = 1\) in machine arithmetic, provided \(\delta < \epsilon ^{1/p}\). This means that a region of size \(\epsilon ^{1/p}\) near the minimum is numerically indistinguishable from the true minimum when computing the energy, causing a very large relative error in the solution. This phenomenon becomes worse in higher dimensions and when composing with matrices with large condition numbers as in (3). This means that all algorithms, including our own, will struggle to produce highly accurate solutions when \(p \gg 2\). In particular, for \(p=5\), an accuracy of about \(\delta \approx 7.4 \times 10^{-4}\) is the best one can hope for, and this is made worse by the condition number of the differentiation matrices. However, although the problem is numerically challenging for finite \(p \gg 2\), it becomes easy again when \(p=\infty \). Our second main result is an estimate for the \(p=\infty \) case in Sect. 3.1, and we confirm by numerical experiments that there are no numerical issues for \(p=\infty \).
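
The effect is easy to reproduce; the following minimal check (in Python, purely illustrative and not part of our algorithm) confirms that a perturbation \(\delta \) slightly below \(\epsilon ^{1/p}\) is invisible in the computed energy:

```python
import numpy as np

p = 5.0
eps = np.finfo(float).eps          # machine epsilon, about 2.22e-16 in double precision
delta = 0.8 * eps ** (1.0 / p)     # a perturbation just below the threshold eps^(1/p)

# ||v||_p^p for v = [1, delta]: the delta^p term is lost to roundoff
energy = 1.0 ** p + delta ** p
print(energy == 1.0)               # True: the perturbation is invisible in the energy
print(delta)                       # roughly 6e-4, a large error in the minimizer
```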

Our algorithm is an iterative scheme for a high-dimensional problem arising from a partial differential equation. Each iteration involves the solution of a linear problem that can be interpreted as a numerical elliptic boundary value problem. One can estimate pessimistically that solving each linear problem requires \(O(n^3)\) FLOPS, for a total cost of \(O(n^{3.5}\log n)\) FLOPS for the overall algorithm. This estimate can be improved by using an \(O(n^{2.373})\) FLOPS fast matrix inverse [20], making the overall algorithm \(O(n^{2.873}\log n)\) FLOPS; we mention that this matrix inversion algorithm is mostly of theoretical interest since it is not practical for any reasonable value of n. We have taken special care to preserve the sparsity of this problem so that, if one assumes a bandwidth of b (typically \(b = O(\sqrt{n})\) for \(d=2\) and \(b = O(n^{2/3})\) for \(d=3\)), one obtains an \(O(b^2n)\) FLOPS sparse matrix solve, resulting in \(O(n^{2.5}\log n)\) (\(d=2\)) or \(O(n^{2.84}\log n)\) (\(d=3\)) FLOPS for our overall algorithms. In addition, we mention many preconditioning opportunities [4, 10, 12, 14, 15, 22,23,24,25,26, 35]. Although solution by preconditioning is possible, it is difficult to estimate the number of iterations a priori because the diffusion coefficient of the stiffness matrix is itself difficult to estimate; in the best case (“optimal preconditioning”), where the elliptic solve at each Newton iteration can be done in O(n) FLOPS, our algorithms cost \(O(n^{1.5} \log n)\) FLOPS.

Our paper is organized as follows. In Sect. 2, we give some preparatory material on the p-Laplacian and the barrier method. In Sect. 3, we prove our main theorem for \(1 \le p < \infty \) and a separate theorem for the case \(p=\infty \). In Sect. 4, we validate our algorithms with numerical experiments. We end with some conclusions.

2 Preparatory material

We now discuss some preparatory material regarding the \(p-\)Laplacian, for \(1 \le p \le \infty \). The \(\infty \)-Laplacian can be interpreted as the problem of minimizing

$$\begin{aligned} J(u) = \Vert u+g \Vert _{X^{\infty }(\Omega )} - \int _{\Omega } fu \text { where } \Vert v\Vert _{X^{\infty }(\Omega )} = \sup _{x \in \Omega } \Vert \nabla v(x)\Vert _2. \end{aligned}$$
(7)

Note that (7) is not a limit as \(p \rightarrow \infty \) of (2), e.g. because (2) uses the pth power of \(\Vert \cdot \Vert _{X^p}\) in its definition.

Lemma 1

For \(1\le p \le \infty \), J(u) is convex on \(W^{1,p}(\Omega )\). For \(1<p<\infty \), J(u) is strictly convex on \(W^{1,p}_0(\Omega )\).

Proof

We consider the case \(1 \le p < \infty \) in detail; the case \(p=\infty \) is handled in a similar fashion. Convexity (and strict convexity) is unaffected by linear shifts, so without loss of generality we assume that \(f=0\); we also drop the constant factor \(1 \over p\), which does not affect convexity. Let \(0 \le t \le 1\). We must show that \(J(tu+(1-t)v) \le tJ(u) + (1-t)J(v)\). To simplify the notation, let \(q = \nabla u\), \(r = \nabla v\) and \(s = \nabla g\).

$$\begin{aligned} J(tu+(1-t)v)&= \int _{\Omega } \Vert tq+(1-t)r+s\Vert _2^p {\mathop {\le }\limits ^{(*)}} \int _{\Omega } \left( t\Vert q+s\Vert _2+(1-t)\Vert r+s\Vert _2 \right) ^p \\&{\mathop {\le }\limits ^{(**)}} \left( \int _{\Omega } t\Vert q+s\Vert _2^p + (1-t)\Vert r+s\Vert _2^p \right) = tJ(u) + (1-t)J(v), \end{aligned}$$

where we have used the triangle inequality for \(\Vert \cdot \Vert _2\) at \((*)\) and the convexity of \(\phi (z) = z^p\) at \((**)\).

We now prove strict convexity for the \(1<p<\infty \) case. If we have equality at \((*)\) then \(q(x)+s(x)\) and \(r(x)+s(x)\) are non-negative multiples of one another, i.e. \(q+s = aw\) and \(r+s = bw\) where \(a(x),b(x) \ge 0\) and w(x) is vector-valued. Then \((**)\) becomes \(\int _{\Omega } ((ta+(1-t)b)\Vert w\Vert _2)^p \le \int _{\Omega } (ta^p+(1-t)b^p) \Vert w\Vert _2^p\). Note that \((ta+(1-t)b)^p < ta^p+(1-t)b^p\) unless \(a=b\) so the inequality (**) is strict unless \(\nabla u = \nabla v\) almost everywhere. Since \(u,v \in W^{1,p}_0\) can be identified by their gradients, we have proven strict convexity. \(\square \)

From the norm equivalence \(\Vert u\Vert _p \le d^{\max \left\{ 0,{1 \over p} - {1 \over q}\right\} } \Vert u\Vert _q\) for vectors \(u \in \mathbb {R}^d\), one obtains

$$\begin{aligned} d^{-\max \left\{ 0,{1 \over p} - {1 \over 2}\right\} } |u|_{W^{1,p}} \le \Vert u\Vert _{X^p(\Omega )} \le d^{\max \left\{ 0,{1 \over 2} - {1 \over p}\right\} } |u|_{W^{1,p}}. \end{aligned}$$
(8)

We can give a modified Friedrichs inequality for \(\Vert \cdot \Vert _{X^p}\).

Lemma 2

(Friedrichs inequality for \(\Vert \cdot \Vert _{X^p}\)) Assume that \(\Omega \subset \mathbb {R}^d\) fits inside of a strip of width \(\ell \) and assume that \(\phi \in W_0^{1,p}(\Omega )\), where \(1 \le p \le \infty \). Then, \(\Vert \phi \Vert _{L^p} \le \ell p^{-{1 \over p}} \Vert \phi \Vert _{X^p}\), where we define \(\infty ^{-{1 \over \infty }} = 1\).

Proof

Without loss of generality, assume that \(\Omega \) is inside the strip \(0 \le x_1 \le \ell \). From the fundamental theorem of calculus, the following argument proves the \(p=\infty \) case: \(|\phi (x_1,\ldots ,x_d)| \le \int _0^{x_1} |\phi _{x_1}(t,x_2,\ldots ,x_d)| \, dt \le \int _0^{\ell } \sup _{x \in \Omega } \Vert \nabla \phi (x)\Vert _2 \, dx_1 \le \ell \Vert \phi \Vert _{X^{\infty }}\). Now assume \(1 \le p < \infty \).

$$\begin{aligned} \int _0^\ell |\phi |^p \, dx_1&= \int _0^\ell \left| \int _0^{x_1} \phi _{x_1}(t,x_2,\ldots ,x_d) \, dt \right| ^p \, dx_1 \end{aligned}$$
(9)
$$\begin{aligned}&\le \int _0^\ell \int _0^{x_1} |\phi _{x_1}(t,x_2,\ldots ,x_d)|^px_1^{p-1} \, dt \, dx_1 \text { (Jensen's ineq.)} \end{aligned}$$
(10)
$$\begin{aligned}&\le \int _0^{\ell } \Vert \nabla \phi (t,x_2,\ldots ,x_d)\Vert _2^p \int _t^\ell x_1^{p-1} \, dx_1 \, dt \end{aligned}$$
(11)
$$\begin{aligned}&= \int _0^{\ell } \Vert \nabla \phi (t,x_2,\ldots ,x_d)\Vert _2^p {1 \over p}(\ell ^p-t^p)\, dt \end{aligned}$$
(12)
$$\begin{aligned}&\le {\ell ^p \over p} \int _0^{\ell } \Vert \nabla \phi (t,x_2,\ldots ,x_d)\Vert _2^p \, dt. \end{aligned}$$
(13)

The result follows by integrating over \(x_2,\ldots ,x_d\). \(\square \)

We now give an a priori estimate on the magnitude of the minimizer of J(u). This estimate will be important in the design of our algorithm in order to limit the search volume to some ball of reasonable size.

Lemma 3

Let \(1<p<\infty \) and \({1 \over p}+{1 \over q}=1\) and assume that \(\Omega \subset \mathbb {R}^d\) is a domain of width L. Let \(\Vert v\Vert _{X^p}^p = \int _{\Omega } \Vert \nabla v\Vert _2^p\). Assume \(\{u_k\} \subset W_0^{1,p}(\Omega )\) is a minimizing sequence for J(u). Then, for large enough k,

$$\begin{aligned} \Vert u_k\Vert _{X^p}^p \le 4\Vert g\Vert _{X^p}^p + 2L^q\left( p\over 2\right) ^{1 \over 1-p}(p-1)\Vert f\Vert _{L^q}^q. \end{aligned}$$
(14)

If \(\{p,q\} = \{1,\infty \}\) and \(L\Vert f\Vert _{L^{q}}<1\) then a minimizing sequence must eventually lie in \(\Vert u_k\Vert _{X^p} \le \Vert g\Vert _{X^p}/(1-L\Vert f\Vert _{L^{q}})\).

Proof

Case \(1<p<\infty \): For convenience, we write \(J(u) = {1 \over p}\Vert u+g\Vert _{X^p}^p - \int _{\Omega } fu\). Assume \(\Vert u\Vert _{X^p} \ge \Vert g\Vert _{X^p}\); then:

$$\begin{aligned} J(u)&\ge {1 \over p}(\Vert u\Vert _{X^p}-\Vert g\Vert _{X^p})^p - \Vert f\Vert _{L^q} \Vert u\Vert _{L^p}\\&\ge {1 \over p}\Vert u\Vert _{X^p}^p - {1 \over p}\Vert g\Vert _{X^p}^p - \Vert f\Vert _{L^q} Lp^{-1/p}\Vert u\Vert _{X^p}. \end{aligned}$$

Next, we use Young’s inequality \(ab \le {1 \over q} a^q + {1 \over p} b^p\) with \(a=2^{1/p}p^{-1/p}L\Vert f\Vert _{L^q}\), \(b=2^{-1/p}\Vert u\Vert _{X^p}\) to obtain

$$\begin{aligned} J(u)-J(0)&\ge {1 \over 2p}\Vert u\Vert _{X^p}^p - {2 \over p}\Vert g\Vert _{X^p}^p - {L^q(p/2)^{1 \over 1-p} \over q }\Vert f\Vert _{L^q}^q. \end{aligned}$$
(15)

Hence, if \(\Vert u\Vert _{X^p}^p > 4\Vert g\Vert _{X^p}^p + 2L^q(p/2)^{1 \over 1-p}(p-1)\Vert f\Vert _{L^q}^q\), then \(J(u)-J(0) > 0\) and hence a minimizing sequence must satisfy (14).

The \(p=1\) case is as follows:

$$\begin{aligned} J(u) - J(0) \ge \Vert u\Vert _{X^1} - \Vert g\Vert _{X^1} - \Vert f\Vert _{L^{\infty }}L\Vert u\Vert _{X^1} > 0, \end{aligned}$$

if \(\Vert u\Vert _{X^1} > \Vert g\Vert _{X^1}/(1-\Vert f\Vert _{L^{\infty }}L)\). The \(p = \infty \) case is done in a similar fashion. \(\square \)

The a priori estimate above can also be used to show the existence of a minimizer of J(u).

Lemma 4

Let \(1<p<\infty \). There is a unique \(u \in V \subset W^{1,p}_0(\Omega )\) that minimizes J(u).

Proof

Let \(\alpha = \inf _v J(v)\). We now show how to produce a minimizing sequence \(\{u_k\} \subset W^{1,p}_0(\Omega )\). For \(k=1,2,\ldots \), let \(B_k = \{ u \in W^{1,p}_0(\Omega ) \; | \; J(u)< \alpha +1/k \text { and } \Vert u\Vert _{X^p} < 4\Vert g\Vert _{X^p}^p + 4L^2(p-1)\Vert f\Vert _{L^q}^q+1 \}\), see (14). Note that each \(B_k\) is open and nonempty so pick \(u_{k} \in B_k\). Furthermore, the \(B_k\) are nested: \(B_1 \supset B_2 \supset \ldots \); the convexity of J implies that the \(B_k\) are also convex.

According to (8), we see that each \(B_k\) is contained in a closed ball \(F = \{ u \in W^{1,p}_0(\Omega ) \; | \; |u|_{W^{1,p}(\Omega )} \le r \}\) where \(r = d^{\max \left\{ 0,{1 \over p} - {1 \over 2}\right\} } \left( 4\Vert g\Vert _{X^p}^p {+} 4L^2(p{-}1)\Vert f\Vert _{L^q}^q{+}1\right) \). Recall that F is weakly compact. Passing to a subsequence if necessary, we may assume that \(\{u_k\}\) converges weakly to some u. By Mazur’s lemma, we can now find some convex linear combinations \(v_k = \sum _{j=k}^{J(k)} \alpha _j u_j \in B_k\) such that \(\{v_k\}\) converges to u strongly. This shows that u belongs to every \(B_k\) and hence J(u) is minimal. Uniqueness follows by strict convexity. \(\square \)

2.1 Finite elements

Assume that \(\Omega \) is a polygon. We introduce a triangulation \(T_h\) of \(\Omega \), parametrized by \(0<h<1\), and piecewise linear finite element basis functions \(\{\phi _1(x),\ldots ,\phi _n(x)\} \subset W^{1,p}(\Omega )\). As usual, we define a “reference element” \(\hat{K} = \{ x \in \mathbb {R}^d \; : \; x_i \ge 0 \text { for } i=1,\ldots ,d \text { and } \sum _{i=1}^d x_i \le 1 \}\). Each simplex \(K_k\) in \(T_h\) can be written as \(K_k = \Phi _k(\hat{K}) = P^{(k)}\hat{K} + q^{(k)}\), where \(P^{(k)} \in \mathbb {R}^{d \times d}\) and \(q^{(k)} \in \mathbb {R}^d\). If \(T_h\) is a uniform lattice of squares or d-cubes, then each \(P^{(k)}\) is of the form \({{\,\mathrm{diag}\,}}(\pm h, \ldots , \pm h)\), and all the singular values of \(P^{(k)}\) equal h. In general, if \(T_h\) is not necessarily a uniform lattice, we say that the family of triangulations \(T_h\), parametrized by \(0<h<1\), is quasi-uniform with parameter \(\rho < \infty \) if the singular values of every \(P^{(k)}\) lie in the interval \([h,\rho h]\). Note that on the reference simplex, the basis functions are \(\hat{\phi }_i(\hat{x}) = \hat{x}_i\) for \(i=1,\ldots ,d\) and \(\hat{\phi }_0(\hat{x}) = 1-\sum _i \hat{x}_i\). As a result, \(\Vert \nabla \hat{\phi }\Vert _2 \le \sqrt{d}\) and, from the chain rule, \(\Vert \nabla \phi _i(x)\Vert _2 \le h^{-1}\sqrt{d}\).

Let \({{\,\mathrm{span}\,}}\{\phi _k(x) \; | \; k=1,\ldots ,n\} \subset W^{1,p}(\Omega )\) be the finite element space of piecewise linear elements over \(T_h\) and let \(\int _{\Omega } w(x) \, dx \approx \sum _{i=1}^m \omega _i w(x^{(i)})\) be the midpoint quadrature rule, which is exact for piecewise linear or piecewise constant functions. We can construct a “discrete derivative” matrix \(D^{(j)}\) whose (ik) entry is \(D^{(j)}_{i,k} = {\partial \phi _k \over \partial x_j}(x^{(i)})\). Then,

$$\begin{aligned} {1 \over p}\int _{\Omega } \Vert \nabla (u+g)\Vert _2^p = \sum _{i=1}^m {\omega _i \over p}\left( \sum _{j=1}^d\left( (D^{(j)} (u+g))_i \right) ^2\right) ^{p \over 2}; \end{aligned}$$

note that the quadrature is exact provided that g is also piecewise linear. For the midpoint rule, \(\omega _i\) is the volume of the simplex \(K_i\); if the triangulation \(T_h\) is quasi-uniform then we find that

$$\begin{aligned} {h^d \over d!} \le \omega _i \le {\rho ^d h^d \over d!}; \end{aligned}$$
(16)

we write \(\omega _i = \Theta (h^d)\), which means both that \(\omega _i = O(h^d)\) and \(h^d = O(\omega _i)\). We abuse the notation and use the same symbol u to represent both the finite element coefficient vector \([u_1,\ldots ,u_n]^T\) and the finite element function \(u(x) = \sum _{k=1}^n u_k \phi _k(x)\).

We further denote by \(D^{(j)}_{\Gamma }\) the columns of \(D^{(j)}\) corresponding to the vertices of \(T_h\) in \(\partial \Omega \), and \(D^{(j)}_I\) corresponding to the interior vertices in \(\Omega \), such that \(D^{(j)} = \begin{bmatrix} D^{(j)}_I&D^{(j)}_\Gamma \end{bmatrix}\). Denoting \(u = \begin{bmatrix} u_I^T&u_\Gamma ^T \end{bmatrix}^T = \begin{bmatrix} u_I^T&0^T \end{bmatrix}^T\), note that \(D^{(j)}(u+g) = D_I^{(j)} u_I + D^{(j)} g\). Putting \(b^{(j)}=D^{(j)} g\) and dropping the subscript I leads to the discretized system (3). The matrix \(A = \sum _{k=1}^d [D_I^{(k)}]^T {{\,\mathrm{diag}\,}}(\omega _1,\ldots ,\omega _m) D_I^{(k)}\) is the usual discretization of the Laplacian or Poisson problem, and we have that \(u^TAu = |u|^2_{H^1} = \int _{\Omega } \Vert \nabla u\Vert _2^2 \, dx\). For a domain of width \(\ell \), the Friedrichs inequality \(\Vert u\Vert _{L^2} \le \ell |u|_{H^1}\) (see [33, (18.1) and (18.19)]) proves that the smallest eigenvalue of the Laplacian differential operator is at least \(\ell ^{-2}\); however, the smallest eigenvalue of the finite-dimensional matrix A is actually \(\Theta (h^2)\) because the relevant Rayleigh quotient in \(\mathbb {R}^n\) is \(u^*Au/u^*u \ne |u|_{H^1}^2/\Vert u\Vert _{L^2}^2\).
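
To fix ideas, the following sketch assembles the discrete objects of (3) in the simplest setting \(d = 1\), \(\Omega = (0,1)\), with a uniform mesh: the “simplices” are intervals of length h, the midpoint weights are \(\omega _i = h\), and \(D^{(1)}\) maps nodal values to per-element slopes. The function names, the boundary data and the use of Python are our own illustrative choices, not part of the analysis.

```python
import numpy as np

def discrete_p_energy_1d(n, p, g_left=0.0, g_right=1.0, f=None):
    """Assemble the objects of (3) for d = 1 on a uniform mesh of (0,1).

    n : number of interior vertices, p : p-Laplacian exponent.
    Returns a callable J(u) for u in R^n (interior nodal values).
    """
    m = n + 1                       # number of elements (intervals)
    h = 1.0 / m
    w = np.full(m, h)               # quadrature weights omega_i = |K_i| = h

    # D maps all nodal values (boundary nodes included) to per-element slopes.
    D = np.zeros((m, n + 2))
    for i in range(m):
        D[i, i], D[i, i + 1] = -1.0 / h, 1.0 / h

    D_I = D[:, 1:-1]                # interior columns
    g = np.zeros(n + 2)             # boundary lift: zero inside, data on the boundary
    g[0], g[-1] = g_left, g_right
    b = D @ g                       # b^(1) = D^(1) g

    # Load vector c_j = -int f*phi_j, approximated by the midpoint rule.
    c = np.zeros(n)
    if f is not None:
        x_mid = (np.arange(m) + 0.5) * h
        fe = f(x_mid) * h * 0.5     # each element contributes to its two nodes
        c[:] = -(fe[:-1] + fe[1:])

    def J(u):
        grad = D_I @ u + b          # per-element derivative of u + g
        return c @ u + (w / p) @ np.abs(grad) ** p

    return J

# Example: the linear interpolant of the boundary data has unit slope everywhere.
J = discrete_p_energy_1d(n=9, p=3.0)
print(J(np.linspace(0.1, 0.9, 9)))  # = sum_i (h/p)*1^p = 1/p
```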

We now prove that the finite element method converges for the p-Laplacian.

Lemma 5

Assume that \(\Omega \) is a polytope and \(1< p < \infty \). Let \(u_h\) be the finite element minimizer of J(u) in a finite element space that contains the piecewise linear functions. Then, \(J(u_h)\) converges to \(\inf _v J(v)\) as \(h \rightarrow 0\).

Proof

Let u be a minimizer of J(u) and denote by \(V_h \subset W_0^{1,p}(\Omega )\) the finite element space with grid parameter \(0<h<1\). Recall that finite element functions are dense in \(W^{1,p}_0(\Omega )\) [13, Proposition 2.8, page 316]. Hence, we can find finite element functions \(\{v_{h} \in V_{h} \}\) that converge to u in the \(W^{1,p}_0(\Omega )\) norm as \(h \rightarrow 0\). Since J is continuous and since \(u_{h}\) minimizes J(u) in the finite element space \(V_{h}\), we find that \(J(u_{h}) \le J(v_{h}) \rightarrow J(u) = \inf _v J(v)\), as required. \(\square \)

Lemma 5 is very general (no regularity assumptions are made on the solution u) but also very weak since it does not give a rate of convergence. If one assumes regularity of the solution, then one can use quasi-interpolation [34] to estimate the convergence more precisely. However, we will see in Sect. 2.2 (Example 1) that it is difficult to prove regularity. Since the present paper focuses on the numerical solver, and not on the discretization, we do not investigate this aspect any further. The lemma also does not specify whether \(u_h\) converges as h tends to 0. In the case \(1<p<\infty \), the strict convexity of J ensures that \(u_h\) will indeed converge to a minimizer u in \(W^{1,p}(\Omega )\), but for \(p=1\) there may be multiple minimizers, and then \(u_h\) could oscillate between the many minimizers or converge to a “minimizer” in the double-dual of \(W^{1,1}(\Omega )\).

2.2 Pathological situations for extreme p values

The p-Laplace problem varies in character as p ranges over \(1 \le p \le \infty \). When \(p=2\), minimizing J(u) is equivalent to solving a single linear problem, which is clearly faster than solving hundreds of linear problems as required by a barrier method. As p gets further away from \(p=2\), naive solvers work less well and proper convex optimization algorithms are required, such as our barrier methods. The extreme cases \(p=1\) and \(p=\infty \) have traditionally been considered hardest. For example, J(u) may not be differentiable at u when \(p \in \{1,\infty \}\), typically when \(\nabla (u+g)\) vanishes at some point \(x \in \Omega \). In Sect. 2, we have introduced several lemmas, some of which work for all cases \(1 \le p \le \infty \), while others are restricted to \(1<p<\infty \). Briefly speaking, we have shown that for all \(1 \le p \le \infty \), J(u) is convex and possesses minimizing sequences (with some restrictions on the forcing f when \(p\in \{1,\infty \}\)). These facts are sufficient to deploy barrier methods, because barrier methods do not require the objective to be differentiable, to be strictly convex, or to have a unique minimizer. As a “bonus”, we have also shown that J(u) is strictly convex and has a unique minimum when \(1<p<\infty \), but this is not required for the successful application of our barrier methods.

We now illustrate the pathological behavior for extreme values of p with several simple examples. For \(1<p<\infty \), strict convexity ensures the uniqueness of the minimum of J(u). However, in the case \(p=1\), the minimizer may be “outside” of \(W^{1,1}(\Omega )\) or nonunique.

Example 1

Consider \(\Omega = (0,1)\) in dimension \(d=1\) and \(f=0\), with boundary conditions \(u(0) = 0\) and \(u(1) = 1\) and with \(p=1\). Then,

$$\begin{aligned} J(u) = \int _0^1 |u'(x)| \, dx = TV(u) \ge 1, \end{aligned}$$

where TV(u) denotes the usual total variation of u. Any monotonically nondecreasing function u(x) with \(u(0) = 0\) and \(u(1) = 1\) will minimize J(u) and satisfy the boundary conditions.

A minimizing sequence for J(u) is given by the piecewise linear functions \(u_n(x) = \min (1,\max (0,0.5+n(x-0.5)))\). This sequence converges to the indicator function of [0.5, 1), which is not in \(W^{1,1}(0,1)\). This is because \(W^{1,1}(0,1)\) is not reflexive and hence its unit ball is not weakly compact. Instead, the limit of \(u_n\) is in BV, the double-dual of \(W^{1,1}(0,1)\).

We now briefly show why the minimization of J(u) for \(u \in V_h\) is numerically challenging.

Example 2

Consider \(J(x) = |x|^p\) where \(x \in \mathbb {R}\) and \(1 \le p < \infty \); this corresponds to a 1-dimensional discrete p-Laplacian with a single grid point. The Newton iteration \(x_{k+1} = x_k - J'(x_k)/J''(x_k)\) is

$$\begin{aligned} x_{k+1} = x_k - {{{\,\mathrm{sgn}\,}}(x_k)p|x_k|^{p-1} \over p(p-1)|x_k|^{p-2}} = {p-2 \over p-1}x_k. \end{aligned}$$

Hence, the Newton iteration converges linearly for \(p \in (1.5,2) \cup (2,\infty )\) (and in a single step for \(p=2\)), but it fails to converge for \(1 < p \le 1.5\). The Newton iteration is undefined for \(p=1\) since \(J'' = 0\).
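
A short numerical check of the contraction factor \((p-2)/(p-1)\) (illustrative only):

```python
# Newton's method for J(x) = |x|^p reduces to x_{k+1} = ((p-2)/(p-1)) * x_k.
for p in (3.0, 1.6, 1.4):
    x = 1.0
    for _ in range(20):
        x = (p - 2.0) / (p - 1.0) * x
    print(p, x)   # shrinks for p = 3 and p = 1.6, blows up for p = 1.4
```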

The p-Laplacian for \(p=1\) is particularly hard; we now show two types of difficulties. First, the Hessian may be singular, and regularizing the Hessian leads to gradient descent.

Example 3

Consider \(J(x) = \sqrt{x_1^2 + x_2^2} = \Vert x\Vert _2\); this corresponds to a 2-dimensional 1-Laplacian discretized with a single grid point. The gradient is \(J'(x) = {x \over \Vert x\Vert _2}\) and the Hessian is

$$\begin{aligned} J''(x) = {1 \over \Vert x\Vert _2^3} \begin{bmatrix} x_2^2 &{}\quad -x_1x_2 \\ -x_1x_2 &{}\quad x_1^2 \end{bmatrix}. \end{aligned}$$

The Hessian matrix \(J''(x)\) is singular which makes the Newton iteration undefined. To make matters worse, the kernel of \(J''\) is spanned by \(J'\) and hence any “regularization” \(J''+\epsilon I\) leads to a simple gradient descent.
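
A quick numerical confirmation (illustrative, with an arbitrary test point) that the gradient spans the kernel of the Hessian, so that the regularized Newton step \([J''+\epsilon I]^{-1}J'\) is just \(J'/\epsilon \), i.e. a gradient descent direction:

```python
import numpy as np

x = np.array([0.3, -1.2])            # an arbitrary nonzero test point
nrm = np.linalg.norm(x)
grad = x / nrm                       # J'(x)
hess = (1.0 / nrm ** 3) * np.array([[x[1] ** 2, -x[0] * x[1]],
                                    [-x[0] * x[1], x[0] ** 2]])   # J''(x)
print(np.allclose(hess @ grad, 0.0))                   # True: J' spans ker J''
print(np.linalg.solve(hess + 1e-8 * np.eye(2), grad))  # = grad / 1e-8, a pure gradient step
```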

Yet another difficulty is that the 1-Laplacian may have nonunique solutions or no solutions when the forcing is nonzero.

Example 4

Let \(c \in \mathbb {R}\) and \(J(x) = |x|+cx\); this corresponds to a 1-dimensional 1-Laplacian with a nonzero forcing term, discretized with a single grid point. Then, J(x) is convex for all \(c \in \mathbb {R}\). However, J(x) has a unique minimum \(x=0\) if and only if \(|c|<1\). When \(|c| = 1\), J(x) has infinitely many minima. When \(|c|>1\), J(x) is unbounded below and there is no minimum.

As a result, the energy J(u) of the 1-Laplacian may not be bounded below when the forcing \(f \ne 0\); see also Lemma 3.

2.3 Convex optimization by the barrier method

In this section, we briefly review the theory and algorithms of convex optimization and refer to Nesterov [27, Section 4.2] for details, including the notion of self-concordant barriers.

Let \(Q \subset \mathbb {R}^n\) be a bounded closed convex set that is the closure of its interior, \(c \in \mathbb {R}^n\) be a vector and consider the convex optimization problem

$$\begin{aligned} c^* = \min \{ c^Tx \; : \; x \in Q \}. \end{aligned}$$
(17)

The barrier method (or interior point method) for solving (17) is to minimize \(tc^Tx + F(x)\) for increasing values of \(t \rightarrow \infty \), where the barrier function F(x) tends to \(\infty \) when \(x \rightarrow \partial Q\). The minimizer \(x^*(t)\), parametrized by \(t \ge 0\), of \(tc^Tx + F(x)\), is called the central path, and \(x^*(t)\) forms a minimizing sequence for (17) as \(t \rightarrow \infty \). Assume we have a \(\nu \)-self-concordant barrier F(x) for Q. Define the norm \(\Vert v\Vert _x^* = \sqrt{v^T[F''(x)]^{-1}v}.\) The main path-following scheme is

  1. Set \(t_0 = 0\), \(\beta = 1/9\) and \(\gamma = 5/36\). Choose an accuracy \(\epsilon >0\) and \(x^{(0)} \in Q\) such that \(\Vert F'(x^{(0)})\Vert _{x^{(0)}}^* \le \beta .\)

  2. The kth iteration (\(k \ge 0\)) is

    $$\begin{aligned} t_{k+1} = t_k + {\gamma \over \Vert c\Vert _{x^{(k)}}^*} \text { and } x^{(k+1)} = x^{(k)} - [F''(x^{(k)})]^{-1}(t_{k+1}c+F'(x^{(k)})). \end{aligned}$$
    (18)

  3. Stop if \( t_k \ge \left( \nu + {(\beta + \sqrt{\nu }) \beta \over 1-\beta } \right) \epsilon ^{-1} =: {{\,\mathrm{tol}\,}}^{-1}. \)

The invariant of this algorithm is that, if \(\Vert t_k c + F'(x^{(k)})\Vert _{x^{(k)}}^* \le \beta \) then also \(\Vert t_{k+1} c + F'(x^{(k+1)})\Vert _{x^{(k+1)}}^* \le \beta \). The stopping criterion guarantees that, at convergence, \(c^Tx^{(k)} - c^* \le \epsilon \). Starting this iteration can be difficult, since it is not always obvious how to find an initial point \(x^{(0)} \in Q\) such that \(\Vert F'(x^{(0)})\Vert _{x^{(0)}}^* \le \beta .\) Define the analytic center \(x_F^*\) by \(F'(x_F^*) = 0\). We use an auxiliary path-following scheme to approximate the analytic center \(x_F^*\) of Q:

  1. Choose \(x^{(0)} \in Q\) and set \(t_0 = 1\) and \(G = -F'(x^{(0)})\).

  2. For the kth iteration (\(k \ge 0\)):

    $$\begin{aligned} t_{k+1}&= t_k - {\gamma \over \Vert G\Vert _{x^{(k)}}^*} \text { and } \end{aligned}$$
    (19)
    $$\begin{aligned} x^{(k+1)}&= x^{(k)} - {[F''(x^{(k)})]^{-1}(t_{k+1}G+F'(x^{(k)})) }. \end{aligned}$$
    (20)

  3. Stop if \(\Vert F'(x^{(k)})\Vert _{x^{(k)}}^* \le {\sqrt{\beta } \over 1+\sqrt{\beta }}\). Set \(\bar{x} = x^{(k)} - [F''(x^{(k)})]^{-1} F'(x^{(k)})\).

The invariant of the auxiliary scheme is that \(\Vert t_kG + F'(x^{(k)})\Vert _{x^{(k)}}^* \le \beta \) for every k. At convergence, one can show that \(\Vert F'(\bar{x})\Vert _{\bar{x}}^* \le \beta \). Let \(\hat{x} \in Q\) be some starting point for the auxiliary path-following scheme. Running the auxiliary path-following scheme to find the approximate analytic center \(\bar{x}\) of Q, followed by the main path-following scheme to solve the optimization problem (17) starting from \(x^{(0)} = \bar{x}\), requires at most N Newton iterations in total, where

$$\begin{aligned} N = 7.2 \sqrt{\nu } \left[ 2\log \nu + \log \Vert F'(\hat{x})\Vert _{x_F^*}^* + \log \Vert \hat{x}\Vert _{x_F^*}^* + \log (1/\epsilon ) \right] . \end{aligned}$$
(21)
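
For concreteness, the main path-following scheme (18) can be sketched in a few lines of Python; this is an illustration of the update rule under the assumption that the user supplies the gradient and Hessian of a \(\nu \)-self-concordant barrier and a well-centred feasible starting point (as produced by the auxiliary scheme), not the implementation used in our experiments.

```python
import numpy as np

def short_step_barrier(c, F_grad, F_hess, x0, nu, eps=1e-6, max_iter=10**6):
    """Short-step path following for min c^T x over Q (Sect. 2.3).

    Assumes ||F'(x0)||_{x0}^* <= beta, i.e. x0 is close to the analytic center.
    """
    beta, gamma = 1.0 / 9.0, 5.0 / 36.0
    t, x = 0.0, x0.copy()
    t_stop = (nu + (beta + np.sqrt(nu)) * beta / (1.0 - beta)) / eps
    for _ in range(max_iter):
        H = F_hess(x)
        # dual local norm ||c||_x^* = sqrt(c^T [F''(x)]^{-1} c)
        t = t + gamma / np.sqrt(c @ np.linalg.solve(H, c))   # (18), barrier update
        x = x - np.linalg.solve(H, t * c + F_grad(x))        # (18), Newton step
        if t >= t_stop:                                      # stopping criterion
            break
    return x, t
```

The auxiliary scheme (19)–(20) is analogous, except that t starts at 1 and decreases, c is replaced by \(G = -F'(x^{(0)})\), and the stopping test is on \(\Vert F'(x^{(k)})\Vert _{x^{(k)}}^*\).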

2.3.1 Long-step algorithms

The path-following schemes of Sect. 2.3 are so-called “short step”, meaning that the barrier parameter t increases fairly slowly when \(\nu \) is large. It is well-known that long-step algorithms, where t increases more rapidly, often converge faster overall than short-step algorithms, even though the worst case estimate \(O(\nu \log \nu )\) is worse than the short-step estimate \(O(\sqrt{\nu } \log \nu )\), see Nesterov and Nemirovskii [28] for details. The main path-following scheme can be made “long-step” as follows:

  1. Assume \(x^{(0)} \in Q\) such that \(\Vert F'(x^{(0)})\Vert _{x^{(0)}}^* \le \beta \) and let \(t_0 = 0\).

  2. Set

    $$\begin{aligned} t_{k+1}&= {\left\{ \begin{array}{ll} \max \left\{ \kappa t_k, t_k + {\gamma \over \Vert c\Vert _{x^{(k)}}^*} \right\} &{}\quad \text { if } \Vert t_k c + F'(x^{(k)})\Vert _{x^{(k)}}^* \le \beta , \\ t_k &{}\quad \text { otherwise;} \end{array}\right. } \end{aligned}$$
    (22)
    $$\begin{aligned} x^{(k+1)}&= x^{(k)} - r_k [F''(x^{(k)})]^{-1}(t_{k+1}c + F'(x^{(k)})), \end{aligned}$$
    (23)

    where \(0 < r_k \le 1\) is found by line search, see e.g. Boyd and Vandenberghe [5, Algorithm 9.2 with \(\alpha = 0.01\) and \(\beta = 0.25\)].

  3. Stop if \( t_k \ge \left( \nu + {(\beta + \sqrt{\nu }) \beta \over 1-\beta }\right) \epsilon ^{-1} = {{\,\mathrm{tol}\,}}^{-1}. \)

The parameter \(\kappa \ge 1\) determines the step size of the scheme. In convex optimization, step sizes \(\kappa = 10\) or even \(\kappa = 100\) are often used, but we will see in Sect. 4 that shorter step sizes are better suited for the p-Laplacian.

The long-step variant of the auxiliary path-following scheme is implemented in a similar fashion; the criterion for decreasing \(t_{k+1}\) is then \(\Vert t_k G + F'(x^{(k)})\Vert _{x^{(k)}}^* \le \beta \).

2.3.2 Adaptive stepping

We finally introduce an algorithm whose step parameter \(\kappa _k\) is indexed by the iteration counter k. We first introduce some terminology. If \(\Vert t_k c + F'(x^{(k)})\Vert _{x^{(k)}}^* \le \beta \) (main phase) or \(\Vert t_k G + F'(x^{(k)})\Vert _{x^{(k)}}^* \le \beta \) (auxiliary phase), we say that \(x^{(k)}\) was accepted, else we say that \(x^{(k)}\) was a slow step. Let \(\kappa _0\) be an initial step size (we will take \(\kappa _0 = 10\).)

  1. If \(x^{(k)}\) is accepted after 2 or fewer slow steps, put \(\kappa _{k+1} = \min \{\kappa _0,\kappa _k^2\}\).

  2. If \(x^{(k)}\) is accepted after 8 or more slow steps, put \(\kappa _{k+1} = \sqrt{\kappa _k}\).

  3. If \(x^{(k)}\) is still not accepted after 15 slow steps, replace \(x^{(k+1)}\) and \(t_{k+1}\) by the most recently accepted step and put \(\kappa _{k+1} = \kappa _k^{1/4}\). We call this procedure a rejection.

  4. Otherwise, put \(\kappa _{k+1} = \kappa _k\).

The quantity \(t_{k+1}\) is computed as in the long step algorithm (22), with \(\kappa = \kappa _{k+1}\). Note that whenever \(t_{k+1}\) coincides with the short step (20), then the step is automatically accepted. The rejection is “wasteful” in that it discards possibly useful information, but we will see in the numerical experiments that this adaptive scheme is quite efficient in practice. Furthermore, the rejection step is the key that unlocks a very simple analysis of our algorithm.
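
The bookkeeping of the adaptive rule fits in a few lines; a sketch of the \(\kappa \) update alone (the argument names are ours):

```python
def update_kappa(kappa, kappa0, slow_steps, accepted):
    """Adaptive step-size rule of Sect. 2.3.2.

    slow_steps : consecutive slow steps since the last accepted iterate
    accepted   : whether x^(k) was accepted
    Returns (new_kappa, reject), where reject requests a rollback to the
    most recently accepted iterate.
    """
    if accepted:
        if slow_steps <= 2:
            return min(kappa0, kappa ** 2), False   # accepted quickly: speed up
        if slow_steps >= 8:
            return kappa ** 0.5, False              # accepted slowly: slow down
        return kappa, False
    if slow_steps >= 15:
        return kappa ** 0.25, True                  # rejection: roll back and shrink
    return kappa, False
```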

Theorem 2

For given \(c,F,\epsilon \), let \(N_S\) and \(N_A\) be the number of Newton steps of the short step and adaptive step algorithms, respectively. Then,

$$\begin{aligned} N_A&\le 16\lceil 0.76 + 0.73 \log (1+9\sqrt{\nu })\rceil N_S. \end{aligned}$$
(24)

Proof

By construction, on each accepted step of the main path-following algorithm, we find that \(t_{k+1} \ge t_k + {\gamma \over \Vert c\Vert _{x^{(k)}}^*}\), the short step size, see (22). Thus, we only need to estimate the maximum number of slow steps before a step is accepted. According to [27, p.202], the short step size satisfies

$$\begin{aligned} t_{k+1} \ge \overbrace{\left( 1 + {5 \over 4+36\sqrt{\nu }}\right) }^{\kappa _{\min }} t_k. \end{aligned}$$

Starting from \(\kappa = 10\), after r rejections, the step size is \(\kappa = 10^{(1/4)^r}\). When \(\kappa \le \kappa _{\min }\), the short step is automatically accepted and hence the maximum number of rejections is \(r = \lceil r_- \rceil \), where

$$\begin{aligned} 10^{(1/4)^{r_-}} = \kappa _{\min } \implies r_- = -{\log (\log (\kappa _{\min })/\log (10)) \over \log 4}. \end{aligned}$$

Hence,

$$\begin{aligned} r \le \lceil 0.76 + 0.73 \log (1+9\sqrt{\nu })\rceil . \end{aligned}$$

Since all the adaptive steps are at least as large as the short steps and the stopping criterion is purely based on the barrier parameter \(t_k\), and noting that each rejection corresponds to 15 slow steps (plus the initial accepted step), we obtain the estimate for the main phase. The estimate for the auxiliary phase is obtained in a similar fashion. \(\square \)

Theorem 2 states that the adaptive algorithm cannot be much worse than the short step algorithm, which means that the adaptive algorithm scales at worst like \(\tilde{O}(\sqrt{\nu })\), where we have neglected some logarithms. The reader may be surprised that the estimate for the adaptive scheme is slightly worse than the estimate for the short step scheme, but this is a well-known phenomenon in convex optimization: the long step estimates are quite pessimistic, and in practice long step and adaptive schemes work much better than the theoretical estimates suggest. Our result is especially interesting because it is well-known that estimates for long step algorithms scale like \(\tilde{O}(\nu )\), whereas our new algorithm scales like \(\tilde{O}(\sqrt{\nu })\).

3 Proof of Theorem 1

The proof of Theorem 1 is rather technical, so we begin by outlining the plan of our proof. The idea is to estimate all the quantities in the bound (21) for the number N of Newton iterations. The barrier parameter \(\nu \) is estimated in Lemma 6. Some “uniform” or “box” bounds are given for the central path in Lemma 7; these are an intermediate step in converting as many estimates as possible into functions of h. Because (21) depends on the Hessian \(F''\) of the barrier, the lowest eigenvalue \(\lambda _{\min }\) of \(F''\) is estimated in Lemma 8. This bound itself depends on extremal singular values of the discrete derivative matrices, which are estimated in Lemma 9, and these bounds are rephrased in terms of h in Lemma 10. In Lemma 11, we establish the connection between the number m of simplices and the grid parameter h, which is used in Lemma 12 to estimate the quantities \(\Vert \hat{x}\Vert _2\) and \(\Vert F'(\hat{x})\Vert _2\), which can be converted to estimates for \(\Vert \hat{x}\Vert _{x_F^*}^*\) and \(\Vert F'(\hat{x})\Vert _{x_F^*}^*\) in (21) by dividing by \(\lambda _{\min }\); here \(\hat{x}\) is a starting point for the barrier method. Finally, the quantities R appearing in Theorem 1 are obtained by starting from the estimates of Lemma 3, adding 1, and doubling them. This ensures that the central path will be well inside the ball of radius R.

In the present section, we treat in detail the case \(1 \le p < \infty \). The case \(p=1\), which is considered especially difficult, poses no special difficulty in the present section, provided that the hypotheses of Lemma 3 are satisfied. The case \(p=\infty \) is deferred to Sect. 3.1.

Let \(1 \le p < \infty \) and define the barrier

$$\begin{aligned} F(u,s)&= F_p(u,s) = - \sum _i \log z_i - \sigma \sum _i \log s_i - \sum _i \log \tau _i \text { where } \end{aligned}$$
(25)
$$\begin{aligned} z_i&= s_i^{2/p} - \sum _{j=1}^d [(\overbrace{D^{(j)}u+D^{(j)}g}^{y^{(j)}})_i]^2, \qquad \tau _i = R - \omega _i s_i \text { and } \end{aligned}$$
(26)
$$\begin{aligned} \sigma&= \sigma (p) = {\left\{ \begin{array}{ll} 2 &{}\quad \text { if } 1 \le p < 2 \text { and } \\ 1 &{}\quad \text { if } p \ge 2. \end{array}\right. } \end{aligned}$$
(27)

Lemma 6

The function F(u, s) is an \(m(\sigma +2)\)-self-concordant barrier for the set

$$\begin{aligned} Q = \{ (u,s) \; : \; s_i \ge \Vert \nabla (u+g)|_{K_i}\Vert _2^p, \; s_i \ge 0 \text { and } \max _i \omega _i s_i \le R \}, \end{aligned}$$
(28)

The problem of minimizing J(u) over \(u \in V_h\) subject to the additional constraint that \(\max _i \omega _i \Vert \nabla (u+g)|_{K_i}\Vert _2 \le R\) is equivalent to:

$$\begin{aligned} \min c^Tx \text { subject to } x \in Q \text { where } c = \begin{bmatrix} -f \\ \omega \end{bmatrix}. \end{aligned}$$
(29)

Here, we have abused the notation and used the symbol f for the vector whose ith component is \(\int _{\Omega } f(x) \phi _i(x) \, dx\).

Proof

The functions \(B_p(x,s) = -\log (s^{2/p} - x^Tx)\) are \(\sigma +1\)-self-concordant so \(-\sum _i \log \tau _i +\sum _{i=1}^m B_p([\sum _k D_{i,k}^{(j)}(u+g)_k]_{j=1}^d,s_i)\) is \(m(\sigma +2)\) self-concordant, see Nesterov and Nemirovskii [28]. The rest is proved by inspection. \(\square \)

From Lemma 3, it is tempting to use a bound such as \(\Vert u\Vert _{X^p} < R\), i.e. \(\sum _i \omega _i s_i \le R\), but this leads to a dense Hessian \(F_{ss}\). Instead, we have used the “uniform” bound:

$$\begin{aligned} \omega _i s_i \le \sum _i \omega _i s_i = \int _{\Omega } s \le R. \end{aligned}$$

With this “looser” bound, the Hessian \(F_{ss}\) is sparse. Furthermore, by using the value of R from the a priori estimate of Lemma 3, one can ensure that Q is non-empty and contains minimizing sequences for J(u). Thus, put:

$$\begin{aligned} R \ge R^* = 2(1+\Vert g\Vert _{X^{p}}^p) = 2+2\left( \sum _{j=1}^d [(D^{(j)} g)_i]^2 \right) ^{p \over 2}. \end{aligned}$$
(30)

Set

$$\begin{aligned} \hat{s}_i&= 1+\left( \sum _{j=1}^d [(D^{(j)} g)_i]^2 \right) ^{p \over 2};\,\text { hence } \hat{s}_i \le {R \over 2}. \end{aligned}$$
(31)

In this way, \((0,\hat{s}) \in Q\).

Lemma 7

For all \((u,s) \in Q\),

$$\begin{aligned} \tau _i&\le R, \quad s_i \le {R\over \omega _i}, \quad z_i \le \left( R \over \omega _i \right) ^{2 \over p}. \end{aligned}$$
(32)

Proof

From \(\omega _i s_i \ge 0\) and (26), we find \(\tau _i \le R\). From (26), we find \(z_i \le s_i^{2/p}\) and from \(0 \le \tau _i = R-\omega _i s_i\), we find \(\omega _i s_i \le R\).

We further find that:

$$\begin{aligned} \hat{\tau }_i&\ge R/2, \qquad \hat{s}_i \ge 1, \qquad \hat{z}_i \ge 1. \end{aligned}$$
(33)

The gradient of F is:

$$\begin{aligned} F' = \begin{bmatrix} F_u \\ F_s \end{bmatrix} = \begin{bmatrix} \sum _j 2 [D^{(j)}]^T{y^{(j)} \over z} \\ -{2 \over p}{1 \over z}s^{2/p-1} - {\sigma \over s} + {\omega \over \tau } \end{bmatrix}, \end{aligned}$$
(34)

where vector algebra is defined entrywise.
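
The barrier (25)–(27) and its gradient (34) translate almost literally into code; the following sketch assumes the matrices \(D^{(j)}\), the vectors \(b^{(j)} = D^{(j)}g\), the weights \(\omega \) and the bound R have been assembled as in Sect. 2.1 (the function and variable names are illustrative):

```python
import numpy as np

def barrier_and_gradient(u, s, D_list, b_list, w, R, p):
    """Evaluate the barrier F(u,s) of (25)-(27) and its gradient (34).

    D_list : the d matrices D^(j) of (3), each of shape (m, n)
    b_list : the d vectors b^(j) = D^(j) g
    w      : quadrature weights omega, shape (m,)
    """
    sigma = 2.0 if p < 2 else 1.0                      # (27)
    y = [D @ u + b for D, b in zip(D_list, b_list)]    # y^(j) = D^(j)u + b^(j)
    z = s ** (2.0 / p) - sum(yj ** 2 for yj in y)      # (26)
    tau = R - w * s                                    # (26)
    if np.any(z <= 0.0) or np.any(s <= 0.0) or np.any(tau <= 0.0):
        return np.inf, None                            # (u,s) is outside Q
    F = -np.log(z).sum() - sigma * np.log(s).sum() - np.log(tau).sum()
    F_u = sum(2.0 * D.T @ (yj / z) for D, yj in zip(D_list, y))        # (34)
    F_s = -(2.0 / p) * s ** (2.0 / p - 1.0) / z - sigma / s + w / tau  # (34)
    return F, np.concatenate([F_u, F_s])
```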

The Hessian of F is

$$\begin{aligned} F''&= \begin{bmatrix} F_{uu} &{}\quad F_{us} \\ F_{su} &{}\quad F_{ss} \end{bmatrix} = \begin{bmatrix} F_{uu} &{}\quad F_{us} \\ F_{us}^T &{}\quad F_{ss} \end{bmatrix} \text { where } \end{aligned}$$
(35)
$$\begin{aligned} F_{uu}&= 2\sum _{j=1}^d [D^{(j)}]^T Z^{-1} D^{(j)} + 4\sum _{j,r=1}^d (Y^{(j)}D^{(j)})^T Z^{-2}(Y^{(r)}D^{(r)}), \end{aligned}$$
(36)
$$\begin{aligned} F_{us}&= -{4 \over p}\sum _{j=1}^d (Y^{(j)}D^{(j)})^TZ^{-2}S^{2/p-1}, \end{aligned}$$
(37)
$$\begin{aligned} F_{ss}&= -{2 \over p}\left( {2 \over p}-1 \right) Z^{-1}S^{2/p-2} + {4 \over p^2} Z^{-2} S^{4/p-2} + \sigma S^{-2} + W^2Z^{-2}, \end{aligned}$$
(38)
$$\begin{aligned} S&= {{\,\mathrm{diag}\,}}(s), \; W = {{\,\mathrm{diag}\,}}(\omega ), \; Y = {{\,\mathrm{diag}\,}}(y), \; Z = {{\,\mathrm{diag}\,}}(z). \end{aligned}$$
(39)

Lemma 8

Let \(d_{\min }^2\) be the smallest eigenvalue of \(\sum _{k=1}^d [D^{(k)}]^TD^{(k)}\) and assume \(0<h<1\). Let \(\omega _{\min } = \min _i \omega _i\), and similarly for \((z_F^*)_{\max }\), etc... The smallest eigenvalue \(\lambda _{\min }\) of \(F''(u_F^*,s_F^*)\) is bounded below by

$$\begin{aligned} \lambda _{\min }&\ge \min \left\{ 2(z_F^*)_{\max }^{-1}d_{\min }^2, \omega _{\min }^2 (z_F^*)_{\max }^{-2}\right\} . \end{aligned}$$
(40)

Proof

We consider the “Rayleigh quotient” \(x^TF''x/x^Tx\), the extremal values of which are the extremal eigenvalues of \(F''\). We put \(x = \begin{bmatrix} v \\ w \end{bmatrix}\) so that

$$\begin{aligned} x^TF''x = v^TF_{uu}v + 2 v^TF_{us}w + w^TF_{ss}w \end{aligned}$$

We use the Cauchy-Schwarz inequality together with Young’s inequality to find that

$$\begin{aligned} 2|v^TF_{us}w|&= {4 \over p} \left| \left( \sum _{j=1}^d v^T(Y^{(j)}D^{(j)})^TZ^{-1}\right) \left( Z^{-1}S^{2/p-1}w \right) \right| \\&\le {8 \over p} \sqrt{\sum _{j,r=1}^d v^T(Y^{(j)}D^{(j)})^TZ^{-2}(Y^{(r)}D^{(r)})v}\, \sqrt{w^TS^{2/p-1}Z^{-2}S^{2/p-1}w} \\&\le 4\sum _{j,r=1}^d v^T(Y^{(j)}D^{(j)})^TZ^{-2}(Y^{(r)}D^{(r)})v + {4 \over p^2}w^TS^{2/p-1}Z^{-2}S^{2/p-1}w. \end{aligned}$$

Hence we find:

$$\begin{aligned} x^TF''x&\ge 2\sum _{j=1}^d v^T[D^{(j)}]^T Z^{-1} D^{(j)}v + \left( {2 \over p}-1 \right) w^T\left( -{2 \over p}Z^{-1}S^{2/p-2}\right) w \\&\quad +\,{w^T\sigma S^{-2}w} + { w^TW^2Z^{-2}w}. \end{aligned}$$

We use that \(F_s=0\), which implies that \(-{2 \over p}Z^{-1}S^{2/p-1}=T^{-1}W - \sigma S^{-1}\), where \(T = {{\,\mathrm{diag}\,}}(\tau )\) and hence

$$\begin{aligned} x^TF''x&\ge 2\sum _{j=1}^d v^T[D^{(j)}]^T Z^{-1} D^{(j)}v + {\left( {2 \over p}-1 \right) }{w^TT^{-1}WS^{-1}w} + { w^TW^2Z^{-2}w} \nonumber \\&\ge \Vert x\Vert _2^2\min \left\{ 2z_{\max }^{-1}d_{\min }^2, \omega _{\min }^2 z_{\max }^{-2}\right\} . \end{aligned}$$

\(\square \)

A domain \(\Omega \) is said to be of width L when \(\Omega \subset S\), where S is a strip of width L. The Friedrichs inequality states that, for domains of width \(L>0\) and for \(u \in H^{1}_0(\Omega )\), \(\Vert u\Vert _{L^2} \le L |u|_{H^1(\Omega )}\).

Lemma 9

Let \(\Omega \) be a polytope of width \(L<\infty \), and assume that the triangulation \(T_h\), which depends on the grid parameter \(0<h<1\), is quasi-uniform. Then, there is a constant \(c_{\Omega }>0\), which depends on \(\Omega \) and the quasi-uniformity parameter \(\rho \) of \(T_h\), such that the smallest eigenvalue \(d_{\min }^2\) of \(\sum _k [D^{(k)}]^T D^{(k)}\) satisfies

$$\begin{aligned} d_{\min }^2&\ge c_{\Omega }>0. \end{aligned}$$
(41)

Proof

Consider the matrix \(A = \sum _{k=1}^d [D^{(k)}]^TWD^{(k)}\) and note that

$$\begin{aligned} u^*Au \le \omega _{\max } u^* \left( \sum _k [D^{(k)}]^T D^{(k)} \right) u \le {(\rho h)^d \over d!} u^* \left( \sum _k [D^{(k)}]^T D^{(k)} \right) u. \end{aligned}$$

Furthermore, \(u^TAu = |u|_{H^1}^2,\) and the Friedrichs inequality states that \(\Vert u\Vert _{L^2} \le L|u|_{H^1}.\) Furthermore, according to [32, Proposition 6.3.1], there is a constant \(K_{\Omega }\) such that \(u^Tu \le K_{\Omega } h^{-d} \Vert u\Vert _{L^2}^2\). Finally, we use that the quadrature weights \(\{\omega _i\}\) are \(\Theta (h^{d})\) to find that \(u^Tu \le C_{\Omega } h^{-d} |u|_{H^1}^2 = C_{\Omega } h^{-d} u^TAu \le C_{\Omega } h^{-d}{(\rho h)^d \over d!} u^* \big (\sum _k [D^{(k)}]^T D^{(k)}\big ) u\), as required. \(\square \)

Lemma 10

Assume \(T_h\) is quasi-uniform and that \(R \ge 1\) and \(0 < h \le 1\). There is a constant \(c'_{\Omega }>0\), which depends on \(\Omega \), such that

$$\begin{aligned} \lambda _{\min }&\ge c'_{\Omega }R^{-4} h^{6d}. \end{aligned}$$
(42)

Proof

Using (32) and (40), we arrive at:

$$\begin{aligned} \lambda _{\min }&\ge \min \left\{ 2\left( R \over \omega _{\min }\right) ^{-{2 \over p}}d_{\min }^2, \omega _{\min }^2 \left( R \over \omega _{\min }\right) ^{-{4 \over p}}\right\} . \nonumber \end{aligned}$$
(43)

Note that \(R \ge R^* \ge 1 = |\Omega |^{-1}\,|\Omega | \ge |\Omega |^{-1} \omega _{\min }\) and hence \(R/\omega _{\min } \ge |\Omega |^{-1}\) and

$$\begin{aligned} \lambda _{\min } \ge \min \{2c_{\Omega },1\} \max \{|\Omega |,1\}^2 R^{-{4 \over p}} \omega _{\min }^{2 + {4 \over p}}. \end{aligned}$$

Since \(R \ge 1\) and \(\omega _{\min } \le 1\) (because \(h \le 1\)), we can find a lower bound by putting \(p=1\) in the exponents. Under the quasi-uniform hypothesis, all the quadrature weights are bounded below by \(\omega _i \ge h^d/(d!)\), which yields (42). \(\square \)

Lemma 11

For \(0<h<1\), assume \(T_h\) is a quasi-uniform triangulation of \(\Omega \). The number n of vertices of \(T_h\) inside \(\Omega \) and the number m of simplices in \(T_h\) satisfy

$$\begin{aligned} {n \over d + 1} \le m \le |\Omega | h^{-d} d!, \end{aligned}$$
(44)

where \(|\Omega |\) is the volume of \(\Omega \).

Proof

The inequality \(n \le (d+1)m\) follows from the fact that each of the m simplices has precisely \(d+1\) vertices; we may indeed have \(n < (d+1)m\) since some vertices may be shared between multiple simplices. The upper bound for m follows from \(|\Omega | = \sum _{i=1}^m \omega _i \ge m \omega _{\min } \ge mh^{d}/(d!)\). \(\square \)

Lemma 12

Consider the point \(\hat{x} = (0,\hat{s})\).

$$\begin{aligned} \Vert \hat{x}\Vert _2&\le C_{\Omega }^* h^{-1.5d} R, \text { and } \Vert F'(\hat{x})\Vert _2 \le C_{\Omega }^* h^{-1-1.5d}R (1+\Vert g\Vert _{X^p}), \end{aligned}$$
(45)

where \(C_{\Omega }^*<\infty \) is a constant that depends on \(\Omega \) and the quasi-unifomity parameter \(\rho \) of \(T_h\).

Proof

From (31) we have

$$\begin{aligned} \Vert \hat{x}\Vert _2^2 = \sum _{i=1}^m \hat{s}_i^2 \le m\left( R \over 2\omega _{\min } \right) ^2. \end{aligned}$$
(46)

Using (44) and (16), we obtain the first estimate in (45).

We now estimate \(F'(\hat{x})\). Using (34), we find

$$\begin{aligned} \Vert F'(\hat{x})\Vert _2&\le \Vert F_u\Vert _2 + \Vert F_s\Vert _2 \end{aligned}$$
(47)
$$\begin{aligned}&\le \sum _j \Vert [D^{(j)}]^T\hat{Z}^{-1}D^{(j)}g \Vert _2 + {2 \over p}\hat{z}_{\min }^{-1}\Vert \hat{s}^{2/p-1}\Vert _2 + \sigma \Vert \hat{s}^{-1}\Vert _2 + {1 \over \hat{\tau }_{\min }} \Vert \omega \Vert _2. \end{aligned}$$
(48)

We bound the first term as follows:

$$\begin{aligned} \sum _j \Vert [D^{(j)}]^T\hat{Z}^{-1}D^{(j)}g \Vert _2&\le \hat{z}_{\min }^{-1}\sum _j \Vert D^{(j)}\Vert _2\, \Vert D^{(j)}g\Vert _2 \end{aligned}$$
(49)
$$\begin{aligned}&\le \Vert \delta \Vert _2 \left( \sum _j \Vert D^{(j)}g\Vert _2^2 \right) ^{1/2}. \end{aligned}$$
(50)

Here, \(\delta _j = \Vert D^{(j)}\Vert _2 \le \omega _{\min }^{-1/2}\sqrt{\rho ([D^{(j)}]^TWD^{(j)})}\), where \(\rho (\cdot )\) is the spectral radius, and we have used \(\hat{z}_{\min } \ge 1\) and the Cauchy-Schwarz inequality. We estimate the spectral radius as follows:

$$\begin{aligned} w^T[D^{(j)}]^TWD^{(j)}w = \int _{\Omega } w_{x_j}^2 \, dx \le |w|_{H^1}^2 \le C_{IS}^2 h^{-2} \Vert w\Vert _{L^2}^2 \le C_{IS}^2 [K_{\Omega }']^2 h^{2d-2}\Vert w\Vert _2^2, \end{aligned}$$

where we have used the inverse Sobolev inequality \(|w|_{H^1} \le C_{IS} h^{-1} \Vert w\Vert _{L^2}\) for \(w \in V_h\) (see e.g. Toselli and Widlund [37, Lemma B.27]) and the norm equivalence \(\Vert w\Vert _{L^2} \le K'_{\Omega } h^{d} \Vert w\Vert _2\). Thus,

$$\begin{aligned} \Vert \delta \Vert _2 \le \omega _{\min }^{-1/2}\sqrt{d}C_{IS}[K_{\Omega }']h^{d-1}. \end{aligned}$$

Furthermore, using equivalence of \(p-\)norms in m dimensions,

$$\begin{aligned} \left( \sum _j \Vert D^{(j)}g\Vert _2^2\right) ^{1/2}&\le \omega _{\min }^{-1/2} \left( \sum _{k=1}^m \omega _k \left[ \left( \sum _{j=1}^d (D^{(j)}g)_k^2\right) ^{1/2}\right] ^2\right) ^{1/2} \end{aligned}$$
(51)
$$\begin{aligned}&\le \omega _{\min }^{-1/2}m^{1/2}\left( \sum _{k=1}^m \omega _k\left[ \left( \sum _{j=1}^d (D^{(j)}g)_k^2\right) ^{1/2}\right] ^p\right) ^{1/p} \end{aligned}$$
(52)
$$\begin{aligned}&= \omega _{\min }^{-1/2}m^{1/2}\Vert g\Vert _{X^p}. \end{aligned}$$
(53)

As a result,

$$\begin{aligned} \sum _j \Vert [D^{(j)}]^T\hat{Z}^{-1}D^{(j)}g\Vert _2&\le \left( \omega _{\min }^{-1/2}\sqrt{d}C_{IS}[K_{\Omega }']h^{d-1}\right) \left( \omega _{\min }^{-1/2}m^{1/2}\Vert g\Vert _{X^{p}} \right) \end{aligned}$$
(54)
$$\begin{aligned}&\le C_{IS} K''_{\Omega } h^{-1-0.5d} \Vert g\Vert _{X^p}. \end{aligned}$$
(55)

From (31) and (33), we further estimate

$$\begin{aligned} \Vert \hat{s}^{2/p-1}\Vert _2&\le {\left\{ \begin{array}{ll} \sqrt{m} \hat{s}_{\max }^{2/p-1} \le \sqrt{m} \left( R \over 2\omega _{\min } \right) ^{2/p-1} &{} \text { if } 1 \le p < 2 \\ \sqrt{m} &{} \text { if } p \ge 2 \end{array}\right. } \end{aligned}$$
(56)
$$\begin{aligned}&\le \sqrt{m}\left( R \over 2 \omega _{\min } \right) \text { and } \end{aligned}$$
(57)
$$\begin{aligned} \Vert \hat{s}^{-1}\Vert _2&\le \sqrt{m} \text { and } \Vert \omega \Vert _2 \le \sqrt{m}(\rho h)^d/(d!). \end{aligned}$$
(58)

Hence,

$$\begin{aligned} \Vert F'(\hat{x})\Vert _2&\le C_{IS} K''_{\Omega } h^{-1-0.5d} \Vert g\Vert _{X^p} \\&\quad +\, \sqrt{m}\left( R \over 2\omega _{\min } \right) + 2 \sqrt{m} + {2 \over R} \sqrt{m}(\rho h)^d/(d!). \end{aligned}$$

\(\square \)

Proof of Theorem 1

Using (42), we find

$$\begin{aligned} \Vert \hat{x}\Vert _{x_F^*}^*&\le \lambda _{\min }^{-1} \Vert \hat{x}\Vert _2 \le \left( c'_{\Omega }R^{-4} h^{6d} \right) ^{-1} \left( C_{\Omega }^* h^{-1.5d} R\right) = [c'_{\Omega }]^{-1}C_{\Omega }^* R^{5}h^{-7.5d}. \end{aligned}$$
(59)

Also

$$\begin{aligned} \Vert F'(\hat{x})\Vert _{x_F^*}^*&\le \lambda _{\min }^{-1} \Vert F'(\hat{x})\Vert _2 \le \left( c'_{\Omega }R^{-4} h^{6d} \right) ^{-1} \left( C_{\Omega }^* h^{-1-1.5d}R (1+\Vert g\Vert _{X^p}) \right) \end{aligned}$$
(60)
$$\begin{aligned}&= [c'_{\Omega }]^{-1} C_{\Omega }^* h^{-1-7.5d}R^5 (1+\Vert g\Vert _{X^p}). \end{aligned}$$
(61)

We substitute these estimates into (21) to get

$$\begin{aligned} N^*&\le 7.2 \sqrt{4m} \left[ 2\log (4m) \right. \end{aligned}$$
(62)
$$\begin{aligned}&\quad + \log \left( [c'_{\Omega }]^{-1} C_{\Omega }^* h^{-1-7.5d}R^5 (1+\Vert g\Vert _{X^p}) \right) \end{aligned}$$
(63)
$$\begin{aligned}&\quad \left. + \log \left( [c'_{\Omega }]^{-1}C_{\Omega }^* R^{5}h^{-7.5d} \right) + \log (1/\epsilon ) \right] \end{aligned}$$
(64)

Using \(m \le |\Omega | d! h^{-d}\), we get

$$\begin{aligned}&\le 14.4 \sqrt{|\Omega |d!h^{-d}} \left[ \log \left( h^{-1-17d} R^5 (1+\Vert g\Vert _{X^p}) \epsilon ^{-1} \right) +K^* \right] . \end{aligned}$$
(65)

The estimates \(R_{p=1}\), \(R_{1<p<\infty }\) were obtained by starting from the estimates of Lemma 3, adding 1, and doubling them. Substituting these into \(N^*\) produces \(N_{p=1}\) and \(N_{1<p<\infty }\). \(\square \)

3.1 The case \(p=\infty \)

Recall the \(\infty \)-Laplacian of (7). As in the \(p=1\) case, J(u) is non-differentiable and may be unbounded below when f is large. As per Lemma 3, assume that \(L\Vert f\Vert _{L^1} < 1\) and set \(R_{p=\infty } = \max _i \omega _i \left( 2+{2\Vert g\Vert _{X^{\infty }(\Omega )} \over 1-L\Vert f\Vert _{L^1}}\right) \) and impose \(\omega _i s \le R_{p=\infty }\). The problem of minimizing J(u) over \(u \in V_h\) is equivalent to

$$\begin{aligned} \min s \text { over } Q := \left\{ (u,s) \; : \; s \ge \Vert \nabla (u+g)|_{K_i}\Vert _{2} \; \forall i, \text { and } R \ge \omega _i s \right\} \end{aligned}$$
(66)

We notice that this definition of Q coincides with the definition (28) with \(p=1\) subject to the additional restriction that \(s_1=\ldots =s_m\) and subsequently dropping the index i from \(s_i\). As a result, one can obtain a barrier for Q by taking the barrier (25) with \(p=1\) on the subspace of constant valued s vectors, hence the barrier \(F_{\infty }\) and its derivatives are

$$\begin{aligned} F_{\infty }(u,s)&= F_1(u,se), \; F_{\infty }'(u,s) = \mathcal {E}F_1'(u,se), \; F_{\infty }''(u,s) = \mathcal {E}F_1''(u,se)\mathcal {E}^T, \end{aligned}$$
(67)
$$\begin{aligned}&\text {where } e = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \; \mathcal {E} = \begin{bmatrix} I &{}\quad O\\ O &{}\quad e^T \end{bmatrix}. \end{aligned}$$
(68)

The starting point for the optimization is \((\hat{u},\hat{s})\) with \(\hat{u}=0\) and

$$\begin{aligned} \hat{s} = 1+\max _i \left( \sum _{j=1}^d [(D^{(j)} g)_i]^2 \right) ^{1 \over 2}. \end{aligned}$$
(69)
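
Relations (67)–(68) say that the \(p=\infty \) barrier is obtained from the \(p=1\) barrier by restriction to constant slack vectors; as an illustration, given routines returning \(F_1'(u,s)\) and \(F_1''(u,s)\), the reduction takes only a few lines (the names are ours):

```python
import numpy as np

def barrier_infty_derivatives(u, s, grad1, hess1, m, n):
    """Gradient and Hessian of F_infty via (67)-(68).

    grad1, hess1 : callables returning F_1'(u, s_vec) and F_1''(u, s_vec)
    s            : the single scalar slack variable
    m, n         : number of simplices and of interior vertices
    """
    e = np.ones(m)
    E = np.zeros((n + 1, n + m))    # the matrix (68): block diag(I, e^T)
    E[:n, :n] = np.eye(n)
    E[n, n:] = 1.0
    g1 = grad1(u, s * e)            # F_1' evaluated at (u, s e)
    H1 = hess1(u, s * e)
    return E @ g1, E @ H1 @ E.T     # F_infty'(u,s), F_infty''(u,s)
```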

Theorem 3

With the notation as in Theorem 1, let

$$\begin{aligned} R = R_{p=\infty } = \max _i \omega _i \left( 2+{2\Vert g\Vert _{X^{\infty }(\Omega )} \over 1-L\Vert f\Vert _{L^1}}\right) \end{aligned}$$
(70)

and assume \(p=\infty \), \(L\Vert f\Vert _{L^1}<1\). The barrier method to solve (66) requires at most \(N_{p=\infty }\) Newton iterations, where

$$\begin{aligned} N_{p=\infty }&\le 14.4 \sqrt{|\Omega |h^{-d}d!} \left[ \log \left( h^{-1-6.5d}\left( 2+{2\Vert g\Vert _{X^{\infty }(\Omega )} \over 1-L\Vert f\Vert _{L^1}}\right) ^{5}\epsilon ^{-1}\right) + K^* \right] . \end{aligned}$$
(71)

The computational complexity as a function of the number n of grid points (and freezing all other parameters) is \(O(\sqrt{n} \log n)\).

Proof

The proof of Theorem 3 follows the same logic as that of Theorem 1, so we merely sketch it here. First, (34) and (35) are replaced by:

$$\begin{aligned} F'&= \begin{bmatrix} F_u \\ F_s \end{bmatrix} = \begin{bmatrix} \sum _j 2 [D^{(j)}]^T{y^{(j)} \over z} \\ -{2}s\sum _j {1 \over z_j} - {m\sigma \over s} + \sum _j {\omega _j \over \tau _j} \end{bmatrix}, \end{aligned}$$
(72)
$$\begin{aligned} F''&= \begin{bmatrix} F_{uu} &{}\quad F_{us} \\ F_{su} &{}\quad F_{ss} \end{bmatrix} = \begin{bmatrix} F_{uu} &{}\quad F_{us} \\ F_{us}^T &{}\quad F_{ss} \end{bmatrix} \text { where } \end{aligned}$$
(73)
$$\begin{aligned} F_{uu}&= 2\sum _{j=1}^d [D^{(j)}]^T Z^{-1} D^{(j)} + 4\sum _{j,r=1}^d (Y^{(j)}D^{(j)})^T Z^{-2}(Y^{(r)}D^{(r)}), \end{aligned}$$
(74)
$$\begin{aligned} F_{us}&= -{4}\sum _{j=1}^d (Y^{(j)}D^{(j)})^Tz^{-2}s, \end{aligned}$$
(75)
$$\begin{aligned} F_{ss}&= -{2} \sum _j z_j^{-1} + {4 } \sum _jz_j^{-2} s^{2} + \sigma m s^{-2} + \sum _j \omega ^2_jz_j^{-2}. \end{aligned}$$
(76)

The proof of (40) holds (changing what must be changed), ending with

$$\begin{aligned} x^TF''x&\ge \Vert x\Vert _2^2 \min \{2z_{\max }^{-1}d_{\min }^2,\sum _k \omega _k^{2}z_k^{-2}\}, \end{aligned}$$
(77)

which is slightly stronger than (40).

The estimate (44) also holds verbatim. The estimate for \(\Vert \hat{x}\Vert _2\) is by inspection of (68) and (69). The estimate for \(\Vert F_u(\hat{x})\Vert _2\) is identical to the proof of (44), and \(|F_s(\hat{x})|\) is estimated as follows:

$$\begin{aligned} |F_s|&\le {2}\left( R \over 2 \omega _{\min }\right) z_{\min }^{-1}m +\sigma m +|\Omega |\tau _{\min }^{-1} \le C_{\Omega } Rh^{-2d}, \end{aligned}$$
(78)

where we have used (16), \(1 \le \hat{s} \le R/(2\min \omega _i)\), \(\hat{\tau }_i \ge R/(2\min \omega _i)\) and \(\hat{z}_j \ge 1\), and where \(C_{\Omega }\) is a constant that depends only on \(\Omega \). Thus,

$$\begin{aligned} \Vert \hat{x}\Vert _{x_F^*}^* \le \lambda _{\min }^{-1} \Vert \hat{x}\Vert _2 \le [c'_{\Omega }]^{-1}C_{\Omega }^* R^{5}h^{-7.5d}, \end{aligned}$$

see (58). Furthermore,

$$\begin{aligned} \Vert F'(\hat{x})\Vert _{x_F^*}^*&\le \lambda _{\min }^{-1} \Vert F'(\hat{x})\Vert _2 \end{aligned}$$
(79)
$$\begin{aligned}&\le \left( c'_{\Omega }R^{-4} h^{6d} \right) ^{-1} \left( C_{IS} K''_{\Omega } h^{-1-0.5d} \Vert g\Vert _{X^\infty } + C_{\Omega } Rh^{-2d} \right) \end{aligned}$$
(80)
$$\begin{aligned}&\le K'''_{\Omega } h^{-1-3d} \left( \Vert g\Vert _{X^\infty } \over 1+\Vert f\Vert _{X^1}\right) , \end{aligned}$$
(81)

yielding the final estimate

$$\begin{aligned} N^*&\le 7.2 \sqrt{4m} \left[ 2\log (4m) \right. \end{aligned}$$
(82)
$$\begin{aligned}&\quad + \log \left( K'''_{\Omega } h^{-1-3d} \left( \Vert g\Vert _{X^\infty } \over 1+\Vert f\Vert _{X^1}\right) \right) \end{aligned}$$
(83)
$$\begin{aligned}&\quad + \log \left( [c'_{\Omega }]^{-1}C_{\Omega }^* R^{5}h^{-7.5d} \right) \\&\quad \left. + \log (\epsilon ^{-1})\right] , \end{aligned}$$
(84)

as required. \(\square \)

3.2 Implementation notes

In principle, the vector \((f_i)\) is defined by \(f_i = \int _{\Omega } f\phi _i\); we have not analyzed an inexact scheme for computing these integrals. If f is assumed to lie in a suitable finite element space (e.g. piecewise constant), then these integrals can be computed exactly with the same quadrature we use on the diffusion term. In addition, we can then compute \(\Vert f\Vert _{L^q}\) exactly by quadrature since \(|f|^q\) is also piecewise constant. Assuming g is piecewise linear, the quantities \(|\Omega |\) and \(\Vert g\Vert _{X^p}^p\) can also be computed exactly, see (30). Thus, one can compute \(R_{1< p < \infty }\), \(R_{p = 1}\), and so on, exactly.
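For piecewise-constant f and piecewise-linear (P1) elements in 2d, these integrals reduce to sums over triangles, since \(\int _K \phi _i = |K|/3\) for each vertex i of a triangle K. A minimal sketch, with our own helper names and assumed mesh data structures:

```python
import numpy as np

def load_vector_and_lq_norm(tri, area, f_elem, q):
    """Exact load vector c_i = int_Omega f*phi_i and ||f||_{L^q} for
    piecewise-constant f on a 2d P1 triangulation.

    tri    : (m, 3) vertex indices of each triangle
    area   : (m,)   triangle areas
    f_elem : (m,)   constant value of f on each triangle
    """
    c = np.zeros(tri.max() + 1)
    for K in range(tri.shape[0]):
        # int_K f * phi_i = f_K * |K| / 3 for each of the 3 vertices of K
        c[tri[K]] += f_elem[K] * area[K] / 3.0
    # |f|^q is piecewise constant, so the L^q norm is a weighted sum of areas
    norm_f = (np.sum(np.abs(f_elem) ** q * area)) ** (1.0 / q)
    return c, norm_f
```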

In the strong form (1), the function g is given on \(\partial \Omega \) (i.e. it is a trace), but in the variational form (2), the function g has domain \(\Omega \). Regarding \(v = u+g\) as the solution, the choice of \(g|_{\Omega }\) does not affect the value of v, provided that \(g|_{\partial \Omega }\) is fixed. The simplest way to choose \(g|_{\Omega }\) as a piecewise linear function on \(T_h\) is to set all nodal values to 0 inside \(\Omega \), but this is a somewhat “rough” prolongation that is furthermore dependent on h. Using such a prolongation of g causes the estimates (5) and (6) to become dependent on h wherever g appears. To avoid this dependence on h, one can proceed in one of two ways. First, if the meshes \(T_h\) are all nested within one coarse mesh \(T_{h_0}\), then one can do the prolongation on \(T_{h_0}\) and use the same prolongation on all \(T_h\).

Another method is to solve the linear Laplacian with boundary conditions \(g|_{\partial \Omega }\) on the mesh \(T_h\). This choice of \(g|_{\Omega }\) does vary slightly with h but it converges to the continuous solution as \(h \rightarrow 0\). Furthermore, this choice of g minimizes \(\Vert g\Vert _{X^2} = |g|_{H^1}\) so it may result in a smaller value of R than prolongation by truncation. We call this choice of g the discrete harmonic prolongation of \(g|_{\partial \Omega }\) to \(\Omega \). We use the discrete harmonic prolongation in all our numerical experiments.
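A minimal sketch of the discrete harmonic prolongation, assuming a P1 stiffness matrix A assembled on \(T_h\) without boundary conditions (the function and variable names are ours, not the paper's):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def discrete_harmonic_prolongation(A, boundary, g_boundary):
    """Prolong g|_{boundary} into the interior by solving a linear Laplacian."""
    A = sparse.csr_matrix(A)
    g_boundary = np.asarray(g_boundary, dtype=float)
    n = A.shape[0]
    interior = np.setdiff1d(np.arange(n), boundary)
    g = np.zeros(n)
    g[boundary] = g_boundary
    # Discretely harmonic in the interior: A_II g_I = -A_IB g_B.
    A_II = A[interior][:, interior].tocsc()
    rhs = -A[interior][:, boundary] @ g_boundary
    g[interior] = spsolve(A_II, rhs)
    return g
```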

4 Numerical experiments

We consider the p-Laplacian for \(p=1, \; 1.1, \; 1.2, \; 1.5, \; 2, \; 3, \; 4, \; 5, \; \infty \) on a square domain subject to Dirichlet boundary conditions and with forcing \(f = 0\), see Fig. 1. For the boundary conditions g, we have taken the piecewise linear interpolant on the discrete grid of the trace \((1_X(x,y))|_{\partial \Omega }\) of the characteristic function \(1_X(x,y)\), where \(X = (\{0\} \times [0.25,0.75]) \cup ([0.6,1] \times [0.25,1])\). Note that this creates very challenging numerical and functional-analytic problems: traces of \(W^{1,\infty }\) functions are themselves \(W^{1,\infty }\), so the \(\infty -\)Laplacian is here solving a problem that approximates one outside the usual trace space. The forcing \(f=0\) means that solutions must satisfy minimum and maximum principles, so the solution lies between the extremal boundary values 0 and 1 for all values of p and all \(x \in \Omega \). The zero forcing thus provides some “protection” against the “bad” boundary data.
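For reference, the boundary data can be generated by evaluating the indicator \(1_X\) at the boundary nodes and interpolating; the sketch below assumes \(\Omega \) is the unit square \((0,1)^2\), and the helper name is ours.

```python
def g_on_boundary(x, y):
    """Indicator 1_X at a boundary node (x, y) of the assumed unit square,
    where X = ({0} x [0.25, 0.75]) U ([0.6, 1] x [0.25, 1])."""
    on_left_edge = (x == 0.0) and (0.25 <= y <= 0.75)
    in_right_patch = (0.6 <= x <= 1.0) and (0.25 <= y <= 1.0)
    return 1.0 if (on_left_edge or in_right_patch) else 0.0
```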

Fig. 1

Solving the p-Laplacian for \(p=1,2,\infty \) with the same boundary conditions g and zero forcing \(f=0\) on a \(200 \times 200\) grid. Because of the zero forcing, the minimum and maximum principles hold, which provides some protection against the near-discontinuities in the boundary data, e.g. when \(p=\infty \)

We have varied the number n of grid points from \(n=16\) (a \(4 \times 4\) grid) up to \(n=40,000\) (a \(200 \times 200\) grid) and, in all cases, solved to a tolerance of \(\epsilon \approx 10^{-6}\). We have reported the number of Newton iterations required for convergence in Table 1. This detailed table reveals those combinations of \(\kappa ,n,p\) that failed to converge within five minutes. Most of these convergence failures are due to purely numerical problems. Indeed, we have noted in the introduction that when p is large, minimizing J(u) is intrinsically challenging because it exhausts the accuracy of double precision floating point arithmetic. Thus, the difficulty in solving p-Laplacians accurately for large p is not particular to our algorithm but indeed affects all algorithms for solving p-Laplacians. For large values of p and n, MATLAB also issued warnings that the Hessian was singular to machine precision.

Table 1 Newton iteration counts for various problem sizes n, various step strategies \(\kappa \) and various values of p

The scaling properties of our algorithms are not immediately obvious from Table 1. In order to visualize them, we have plotted the iteration counts of Table 1 in Fig. 2. Note that both axes are logarithmic, so straight lines of slope \(\alpha \) correspond to \(O(n^{\alpha })\) scaling. We see that the short step algorithm of Sect. 2.3 requires the largest number of Newton iterations to converge (blue lines); this is consistent with experience in convex optimization. For this reason, we were not able to solve larger problems with the short-step algorithm. The scaling of the short-step algorithm is consistent with the theoretical prediction \(O(\sqrt{n}\log n)\) of Theorems 1 and 3.

Fig. 2

The number of Newton iterations for various grid sizes n and parameters p and step sizes \(\kappa \)

The long step algorithms (black lines) all require fewer Newton steps than the short step algorithm, even though the theoretical estimate \(O(n\log n)\) for long step algorithms is worse than that for short step algorithms. This is a well-known phenomenon: in practice, long step algorithms perform better, as is the case here.

In Fig. 2, most of the black curves are approximately straight lines, indicating \(O(n^\alpha )\) scaling, but there are notable exceptions when \(p=1\) or \(p = \infty \), especially when \(\kappa \) is also large. By contrast, the adaptive step size algorithm (red lines), with \(\kappa _0 = 10\), is seen to be the best algorithm in most cases, and these red lines are much straighter than the black lines. We denote by \(N_p(n)\) the number of iterations required for a certain value of p and problem size n for the adaptive step size algorithm. We have fitted straight lines to the red curves of Fig. 2 in the least-squares sense and obtained the following approximations:

$$\begin{aligned} \begin{array}{rccccccccc} p= & 1.0 & 1.1 & 1.2 & 1.5 & 2.0 & 3.0 & 4.0 & 5.0 & \infty \\ N_p(n)\approx & 62 n^{0.18} & 33 n^{0.21} & 31 n^{0.17} & 43 n^{0.11} & 47 n^{0.10} & 60 n^{0.09} & 30 n^{0.22} & 17 n^{0.36} & 18 n^{0.28} \end{array} \end{aligned}$$

Thus, it appears that the adaptive scheme requires about \(O(n^{1 \over 4})\) Newton iterations, regardless of the value of p.
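The fitted exponents above were obtained by least squares in log–log coordinates; a minimal sketch of this fit (variable names ours):

```python
import numpy as np

def fit_power_law(n_values, iteration_counts):
    """Fit N(n) ~ C * n^alpha by linear least squares on log N vs. log n."""
    alpha, log_C = np.polyfit(np.log(n_values), np.log(iteration_counts), 1)
    return np.exp(log_C), alpha
```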

Note that the case \(p=2\) is a linear Laplacian that can be computed by solving a single linear problem. When we embed this linear problem into the machinery of convex optimization, the overall algorithm is very inefficient since it may require hundreds of linear solves. We are including this test case for completeness, not as a recommendation.

4.1 3d experiments

Consider the following function:

$$\begin{aligned} \phi&= {\frac{9}{20}}-\sqrt{ \left( {x}^{2}+{y}^{2} \right) \left( 1/10+ \left( \left| x-\cos \left( y \right) \right| \right) ^{2} \right) + \left( z+{\frac{3\,{\mathrm{e}^{-x}}}{25}} \right) ^{2}}. \end{aligned}$$
(85)

We define the “spaceship domain” \(\tilde{\Omega } = \{ (x,y,z) \in \mathbb {R}^3 \; : \; \phi > 0 \};\) this domain is slightly rescaled so that it is aesthetically pleasing. In practice, the domain \(\tilde{\Omega }\) is approximated on a discrete grid by a tetrahedral mesh \(T_h\) and the corresponding polyhedral approximation \(\Omega \) of \(\tilde{\Omega }\). On this tetrahedral mesh, we solve the \(p-\)Laplacian with forcing \(f=1\) for \(p \in \{1,\infty \}\). The boundary values g are the indicator function of the set \(\{y>0.45\}\), as approximated by a piecewise linear function on the finite element grid. This problem features \(n=11,224\) unknowns and \(m=47,956\) elements. The solutions are displayed in Fig. 3.
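For reference, (85) transcribes directly into code; the spaceship domain is then the set where the function below is positive (the slight rescaling mentioned above is not included).

```python
import numpy as np

def phi(x, y, z):
    """Level-set function (85); the domain is {(x, y, z) : phi(x, y, z) > 0}."""
    return (9.0 / 20.0
            - np.sqrt((x**2 + y**2) * (1.0 / 10.0 + np.abs(x - np.cos(y))**2)
                      + (z + 3.0 * np.exp(-x) / 25.0)**2))
```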

Fig. 3

Solving the 1-Laplacian (top row) and \(\infty \)-Laplacian (bottom row) in 3d. The left column shows the solutions on the whole volumetric domain \(\Omega \) with transparency, while the right column shows a slice through \(\Omega \) of the same solutions with opaque colors

For these problems, the solution of the 1-Laplacian seems to approximate the indicator function of \(\{y>0.45\}\), as expected. However, the solution of the \(\infty \)-Laplacian is very large (exceeding 2,000 somewhere in the middle of the spaceship). This is because the traces of \(W^{1,\infty }(\Omega )\) functions are in \(W^{1,\infty }(\partial \Omega )\), but our boundary data g is a piecewise linear approximation of a discontinuous trace with jumps (an indicator function), and hence \(\Vert g\Vert _{X^{\infty }}\) is very large and so is the solution \(u+g\). The 1-Laplacian is better able to tolerate boundary data g with (near-)jumps because the trace of a \(W^{1,1}(\Omega )\) function is merely \(L^1(\partial \Omega )\), thus allowing jumps.

The solution for the \(p=1\)-Laplacian seems very close to what one would obtain if one were to put \(f=0\) instead of \(f=1\). This is not surprising, because the 1-Laplacian is a linear program and the solutions of linear programs change in discrete steps when the forcing changes continuously. For example, the unique minimizer of \(\tilde{J}(x) = |x|+fx\) (\(x \in \mathbb {R}\)) is \(x=0\) whenever \(|f|<1\), and switches to “undefined” (or \(\pm \infty \)) when \(|f|>1\) because then \(\tilde{J}\) is unbounded below.

For the \(p=\infty \)-Laplacian, the solution \(u+g\) is a large positive bump because \(f>0\) and there is a minimum principle stating that the minimum of \(u+g\) must be attained on the boundary \(\partial \Omega \). When one takes \(f<0\) instead, the solution \(u+g\) is a large negative bump because in that scenario \(u+g\) satisfies a maximum principle. In the 2d experiments, the \(\infty \)-Laplacian did not develop large bumps because the boundary data was between 0 and 1 and the forcing was 0. This meant that \(u+g\) had to satisfy both minimum and maximum principles, and so was constrained by \(0 \le u+g \le 1\), preventing the formation of large bumps in the solution.

5 Conclusions and outlook

We have presented new algorithms for solving the p-Laplacian efficiently for any given tolerance and for all \(1 \le p \le \infty \). We have proven that our algorithms compute a solution to any given tolerance in polynomial time, using \(O(\sqrt{n} \log n)\) Newton iterations, and an adaptive stepping variant converges in \(O(\sqrt{n} \log ^2 n)\) Newton iterations. We have confirmed these scalings with numerical experiments. We have further shown by numerical experiments that the adaptive step variant of the barrier method converges much faster than the short-step variant for the p-Laplacian and also usually faster than long-step barrier methods, thus achieving the practical speedup of long-step algorithms while avoiding the \(O(n\log n)\) worst-case behavior of long-step algorithms. We have numerically estimated that the adaptive step algorithm requires \(O(n^{1 \over 4})\) Newton iterations across all values of \(1 \le p \le \infty \). We have observed numerical difficulties for \(p \ge 5\), which are expected since large powers exhaust the accuracy of double precision floating point arithmetic; this difficulty is not specific to our algorithm but is inherent to the p-Laplacian for large values of p. Our algorithms are particularly attractive when \(p \approx 1\) and \(p = \infty \), where there are no other algorithms that are efficient at all tolerances.