Efficient algorithms for solving the $p$-Laplacian in polynomial time

The $p$-Laplacian is a nonlinear partial differential equation, parametrized by $p \in [1,\infty]$. We provide new numerical algorithms, based on the barrier method, that solve the $p$-Laplacian in $O(\sqrt{n}\log n)$ Newton iterations for all $p \in [1,\infty]$, where $n$ is the number of grid points. We confirm our estimates with numerical experiments.


Introduction
Let $\Omega \subset \mathbb{R}^d$. For $1 \le p < \infty$, the $p$-Laplace equation is
$$-\nabla \cdot \left( \|\nabla v\|_2^{p-2} \, \nabla v \right) = f \text{ in } \Omega, \qquad v = g \text{ on } \partial\Omega, \qquad (1)$$
where $\|w\|_2 = \big( \sum_{j=1}^d |w_j|^2 \big)^{1/2}$ is the usual $2$-norm on $\mathbb{R}^d$. Prolonging $g$ from $\partial\Omega$ to the interior and setting $u = v - g$, the variational form is
$$\min_{u \in W^{1,p}_0(\Omega)} J(u) = \frac{1}{p} \int_\Omega \|\nabla(u+g)\|_2^p \, dx - \int_\Omega f u \, dx. \qquad (2)$$
A similar definition can be made in the case $p = \infty$ and will be discussed in Sect. 3.1. For $p = 1$, the $p$-Laplacian is also known as mean curvature, and a solution with $f = 0$ is known as a minimal surface [31]. The $1$-Laplacian is related to a certain "pusher-chooser" game [19] and to compressed sensing [7]. The general $p$-Laplacian is used for nonlinear Darcy flow [11], modelling sandpiles [2] and image processing [8]. We also mention the standard text of Heinonen et al. [16], as well as the lecture notes of Lindqvist [21].
One may discretize the variational form (2) using finite elements; we briefly outline this procedure in Sect. 2.1 and refer to Barrett and Liu [3] for details. One chooses piecewise linear basis functions $\{\phi_j(x)\}$ on $\Omega$ and lets $u_h(x) = \sum_j u_j \phi_j(x)$. The energy $J(u_h)$ can be approximated by quadrature; the quadrature is exact if the elements are piecewise linear. This leads to a finite-dimensional energy functional
$$J(u) = \frac{1}{p} \sum_{i} \omega_i \left\| \left( D^{(1)} u + b^{(1)}, \ldots, D^{(d)} u + b^{(d)} \right)_i \right\|_2^p - c^T u, \qquad (3)$$
where $D^{(j)}$ is a numerical partial derivative, $b^{(j)} = D^{(j)} g$ comes from the boundary conditions $g$, and $c$ comes from the forcing term $f$. Several algorithms have been proposed to minimize the convex functional $J(u)$. Huang et al. [18] proposed a steepest descent algorithm on a regularized functional $J_{h,\epsilon}(u)$ which works well when $p > 2$. Tai and Xu [36] proposed a subspace correction algorithm which works best when $p$ is close to $2$ but whose convergence deteriorates when $p \to 1$ or $p \to \infty$. Algorithms based on a multigrid approach (e.g. Huang et al. [9]) suffer from the same problems when $p$ approaches $1$ or $\infty$. The algorithm of Oberman [30] also works for $p \ge 2$, although the convergence factor deteriorates after several iterations, so it is difficult to reach high accuracy with this method.
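To make the discrete functional concrete, the following sketch assembles a one-dimensional analogue of (3) using finite differences; the grid, boundary values, and forcing are hypothetical stand-ins for the finite element quantities $D^{(j)}$, $b^{(j)}$ and $c$.

```python
import numpy as np

def discrete_energy(u, p, h, g_left=0.0, g_right=1.0, f=None):
    """Evaluate J(u) = (1/p) * sum_i w_i |D(u+g)_i|^p - c^T u on a 1-D grid.

    u holds interior nodal values; the boundary values g_left/g_right
    play the role of the prolonged boundary data g.
    """
    v = np.concatenate(([g_left], u, [g_right]))   # u + g, with boundary data
    du = np.diff(v) / h                            # per-cell derivative D(u+g)
    w = h * np.ones_like(du)                       # midpoint quadrature weights
    J = (w * np.abs(du) ** p).sum() / p
    if f is not None:                              # forcing term c^T u
        J -= h * np.dot(f, u)
    return J

# For p = 2 and zero forcing, the minimizer is the linear interpolant of
# the boundary data, whose energy is (1/2) * integral of |v'|^2 = 1/2 here.
n = 99
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)                       # nodal values of v(x) = x
assert discrete_energy(x, p=2, h=h) <= discrete_energy(np.zeros(n), p=2, h=h)
```

The same structure carries over to $d > 1$, with one derivative matrix per coordinate direction and Euclidean norms per simplex.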
The problem of minimizing J (u) has much in common with the problem of minimizing a p-norm, which is by now well-understood. The motivation for optimizing a p-norm is often given as a facility location problem [1,6]. Efficient algorithms for solving such problems can be obtained within the framework of convex optimization and barrier methods; see Hertog et al. [17] and Xue and Ye [38] specifically for p-norm optimization; and for general convex optimization, see Nesterov and Nemirovskii [28], Boyd and Vandenberghe [5] and Nesterov [27] and references therein.
Given a ν-self-concordant barrier for a convex problem, it is well-known that the solution can be found in O( √ ν log ν) Newton iterations. However, the "hidden constant" in the big-O notation depends on problem parameters, including the number n of grid points in a finite element discretization. Our main result is to estimate these hidden constants and show that the overall performance of our algorithm is indeed O( √ n log n).
Theorem 1 Assume that $\Omega \subset \mathbb{R}^d$ is a polytope and that $T_h$ is a quasi-uniform triangulation of $\Omega$, parametrized by $0 < h < 1$ and with quasi-uniformity parameter $1 \le \rho < \infty$. Let $1 \le p < \infty$. Assume $g \in W^{1,p}(\Omega)$ is piecewise linear on $T_h$ and let $V_h \subset W^{1,p}_0(\Omega)$ be the piecewise linear finite element space on $T_h$ whose trace vanishes. Let $R \ge R_* := 2(1 + \|g\|_{X_p}^p)$, where $|\Omega|$ denotes the volume of $\Omega$ and $\|g\|_{X_p}^p = \int_\Omega \|\nabla g\|_2^p \, dx$.
Let $\epsilon > 0$ be a tolerance. In exact arithmetic, the barrier method of Sect. 2.3, with barrier (25), applied to minimizing $J(u)$ over $u \in V_h$ and starting from $(u, s) = (0, \hat{s})$ given by (31), converges in at most $N_*$ iterations. Here, the constant $K_* = K_*(\Omega, \rho)$ depends on the domain $\Omega$ and on the quasi-uniformity parameter $\rho$ of $T_h$. At convergence, $u$ minimizes $J$ over the ball of radius $R$ to within the tolerance $\epsilon$. The global minimizer in $V_h$ can be found by choosing a sufficiently large value of $R$. We give two cases where $R$ is sufficiently large ($1 < p < \infty$ and $p = 1$). Case $1 < p < \infty$: assume that $f \in L^q(\Omega)$, where $\frac{1}{p} + \frac{1}{q} = 1$, and that $\Omega$ fits inside a strip of width $L$. The value $R = R_{1<p<\infty} = 2 + 8\|g\|_{X_p}^p + 4(L\|f\|_{L^q})^q$ is then sufficiently large. Case $p = 1$: assume that $L\|f\|_{L^\infty} < 1$. The choice $R = R_{p=1} = 2 + 2\|g\|_{X_1}/(1 - L\|f\|_{L^\infty})$ always produces a global minimizer $u$ of $J(u)$, and the number of iterations is bounded by (6). We emphasize that the $p = 1, \infty$ cases have up to now been considered especially hard, and no other algorithm offers performance guarantees in these situations. The algorithm behind estimate (6) is the first that is known to converge even when $p = 1$. We also have an algorithm for the $p = \infty$ case, and we provide a corresponding iteration count estimate in Theorem 3.
The algorithm mentioned in Theorem 1 is the barrier method of convex optimization. Consider the problem of minimizing c T x subject to x ∈ Q where Q is some convex set. Assume we have a ν-self-concordant barrier F for Q. The barrier method works by minimizing tc T x + F(x) for larger and larger values of the barrier parameter t, which is increased iteratively according to some schedule. In the short step variant, t increases slowly and the method is very robust; the estimates of Theorem 1 are for the short step variant of the barrier method. It is well-known that the short step barrier method has the theoretically best convergence estimates, but that long step variants (where t increases more rapidly) work better in practice. However, long step algorithms have theoretically worse convergence estimates and, as we will see in the numerical experiments, can sometimes require a large number of Newton iterations to converge.
In order to get the best convergence, we have devised a new, very simple adaptive stepping algorithm for the barrier method. There are already many adaptive stepping algorithms (see e.g. Nocedal et al. [29] and references therein). It is often difficult to prove "global convergence" of these algorithms, and we are not aware of global estimates of Newton iteration counts. With our new algorithm, we are able to prove "quasi-optimal" convergence of the adaptive scheme. Here, quasi-optimal means that our adaptive algorithm requires $\tilde{O}(\sqrt{n})$ Newton iterations, neglecting logarithmic terms, which matches the theoretically optimal short-step algorithm of Theorem 1.
The $p$-Laplacian is subject to roundoff problems when $p$ is large but finite, as we now briefly describe. Consider the problem of minimizing $\|v\|_p^p$ over vectors of the form $v = (1, y)$. Assume that we are given a machine epsilon $\epsilon_{\mathrm{mach}}$ (for example, $\epsilon_{\mathrm{mach}} \approx 2.22 \times 10^{-16}$ in double precision) and consider an arbitrary vector $v = [1, \delta]^T$. In this situation, $\|v\|_p^p = 1^p + \delta^p = 1$ in machine precision, provided $\delta < \epsilon_{\mathrm{mach}}^{1/p}$. This means that a region of size $\epsilon_{\mathrm{mach}}^{1/p}$ near the minimum is numerically indistinguishable from the true minimum when computing the energy, causing a very large relative error in the solution. This phenomenon becomes worse in higher dimensions and when composing with matrices with large condition numbers as in (3). This means that all algorithms, including our own, will struggle to produce highly accurate solutions when $p \gg 2$. In particular, for $p = 5$, we see that $\delta < 7.4 \times 10^{-4}$ is the best possible resolution, and this is made worse by the condition number of the differential matrices. However, although the problem is numerically challenging for finite $p \gg 2$, the problem becomes easy again when $p = \infty$. Our second main result is an estimate for the $p = \infty$ case in Sect. 3.1, and we confirm by numerical experiments that there are no numerical issues for $p = \infty$.
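This loss of precision is easy to reproduce; a minimal double-precision sketch:

```python
import numpy as np

eps_machine = np.finfo(float).eps           # about 2.22e-16 in double precision

# A perturbation delta with delta**p below machine epsilon is invisible
# in the energy ||v||_p^p = 1 + delta**p.
p = 5
delta = 1e-4                                # delta**p = 1e-20, far below eps
assert 1.0 + delta ** p == 1.0              # the energy cannot see delta

# The resolution threshold is roughly eps_machine**(1/p): about 7.4e-4 for p = 5.
print(eps_machine ** (1.0 / p))
```

The same computation with $p = 2$ resolves perturbations down to about $10^{-8}$, which is why moderate $p$ is much less problematic.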
Our algorithm is an iterative scheme for a high-dimensional problem arising from a partial differential equation. Each iteration involves the solution of a linear problem that can be interpreted as a numerical elliptic boundary value problem. One can estimate pessimistically that solving each linear problem requires $O(n^3)$ FLOPS, for a total cost of $O(n^{3.5} \log n)$ FLOPS for our entire algorithm. This estimate can be improved by using an $O(n^{2.373})$ FLOPS fast matrix inverse [20], making our overall algorithms $O(n^{2.873} \log n)$ FLOPS; we mention that this matrix inversion algorithm is mostly of theoretical interest, since it is not practical for any reasonable value of $n$. We have taken special care to preserve the sparsity of this problem so that, if one assumes a bandwidth of $b$, each linear solve costs $O(nb^2)$ FLOPS; with $b = O(\sqrt{n})$ ($d = 2$) or $b = O(n^{2/3})$ ($d = 3$), this gives $O(n^{2.5} \log n)$ ($d = 2$) or $O(n^{2.84} \log n)$ ($d = 3$) FLOPS for our overall algorithms. In addition, we mention many preconditioning opportunities [4,10,12,14,15,22-26,35]. Although solution by preconditioning is possible, it is difficult to estimate the number of iterations a priori, since the diffusion coefficient of the stiffness matrix is itself difficult to estimate a priori; in the best case ("optimal preconditioning"), where the elliptic solve at each Newton iteration can be done in $O(n)$ FLOPS, our algorithms are then $O(n^{1.5} \log n)$ FLOPS.
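As an illustration of the banded-solve cost model, the following sketch solves a 1-D tridiagonal Poisson problem (bandwidth $b = 1$, so $O(nb^2) = O(n)$ work) with a banded LAPACK solver; the mesh and forcing are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_banded

n = 1000
h = 1.0 / (n + 1)

# Tridiagonal 1-D Laplacian in LAPACK banded storage: row 0 holds the
# superdiagonal, row 1 the diagonal, row 2 the subdiagonal.
ab = np.zeros((3, n))
ab[0, 1:] = -1.0 / h**2
ab[1, :] = 2.0 / h**2
ab[2, :-1] = -1.0 / h**2

b = np.ones(n)                        # constant forcing f = 1
u = solve_banded((1, 1), ab, b)       # O(n) work for bandwidth 1

# The exact solution of -u'' = 1, u(0) = u(1) = 0 is u(x) = x(1-x)/2,
# and the 3-point stencil is exact for quadratics.
x = np.linspace(h, 1 - h, n)
assert np.allclose(u, x * (1 - x) / 2, atol=1e-8)
```

In $d = 2$ or $d = 3$, the same banded structure appears with $b = O(n^{1/2})$ or $O(n^{2/3})$ after a suitable ordering of the unknowns.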
Our paper is organized as follows. In Sect. 2, we give some preparatory material on the p-Laplacian and the barrier method. In Sect. 3, we prove our main theorem for 1 ≤ p < ∞ and a separate theorem for the case p = ∞. In Sect. 4, we validate our algorithms with numerical experiments. We end with some conclusions.

Preparatory material
We now discuss some preparatory material regarding the $p$-Laplacian, for $1 \le p \le \infty$. The $\infty$-Laplacian can be interpreted as the problem of minimizing
$$J(u) = \|u + g\|_{X_\infty} - \int_\Omega f u \, dx, \qquad (7)$$
where $\|v\|_{X_\infty} = \operatorname{ess\,sup}_{x \in \Omega} \|\nabla v(x)\|_2$. Note that (7) is not a limit as $p \to \infty$ of (2), e.g. because (2) uses the $p$th power of $\|\cdot\|_{X_p}$ in its definition.
We now prove strict convexity for the $1 < p < \infty$ case. If we have equality at $(*)$, then $q(x) + s(x)$ and $r(x) + s(x)$ are non-negative multiples of one another, i.e. $q + s = aw$ and $r + s = bw$, where $a(x), b(x) \ge 0$ and $w(x)$ is vector-valued. Substituting into $(**)$ then forces the two gradients to agree almost everywhere; since functions in $W^{1,p}_0(\Omega)$ can be identified by their gradients, we have proven strict convexity.
From the equivalence of the $p$- and $2$-norms on $\mathbb{R}^d$, $\|u\|_p \le d^{|1/p - 1/2|} \|u\|_2$, the quantity $\|\cdot\|_{X_p}$ is comparable to the analogous quantity built from any other vector norm. We can also give a modified Friedrichs inequality for $\|\cdot\|_{X_p}$. Lemma 2 (Friedrichs inequality for $\|\cdot\|_{X_p}$) Assume that $\Omega \subset \mathbb{R}^d$ fits inside of a strip of width $L$ and that $\phi \in W^{1,p}_0(\Omega)$; then $\|\phi\|_{L^p(\Omega)} \le L \|\phi\|_{X_p}$. Proof Without loss of generality, assume that $\Omega$ is inside the strip $0 \le x_1 \le L$. From the fundamental theorem of calculus, $|\phi(x)| \le \int_0^L |\partial_1 \phi(t, x_2, \ldots, x_d)| \, dt$, which proves the $p = \infty$ case; the result follows by integrating over $x_2, \ldots, x_d$.
We now give an a priori estimate on the magnitude of the minimizer of J (u). This estimate will be important in the design of our algorithm in order to limit the search volume to some ball of reasonable size.
Lemma 3 For $1 < p < \infty$, any minimizing sequence eventually satisfies the bound (14). If $\{p, q\} = \{1, \infty\}$ and $L\|f\|_{L^q} < 1$, then a minimizing sequence must eventually lie in $\|u_k\|_{X_p} \le \|g\|_{X_p} / (1 - L\|f\|_{L^q})$.
Proof Case $1 < p < \infty$: for convenience, we write $J(u) = \frac{1}{p} \|u + g\|_{X_p}^p - \int_\Omega f u$. Assume $\|u\|_{X_p} \ge \|g\|_{X_p}$ and bound $J(u)$ from below. Next, we use Young's inequality $ab \le \frac{1}{q} a^q + \frac{1}{p} b^p$ to absorb the forcing term; the leading coefficient remains positive, and hence a minimizing sequence must satisfy (14).
The $p = 1$ case follows from a similar lower bound, and the $p = \infty$ case is done in a similar fashion.
The a priori estimate above can also be used to show the existence of a minimizer of $J(u)$. Let $B_k$ be the set of functions $u$ with $J(u) < \inf J + 1/k$ that also satisfy (14). Note that each $B_k$ is open and nonempty, so pick $u_k \in B_k$. Furthermore, the $B_k$ are nested: $B_1 \supset B_2 \supset \cdots$; the convexity of $J$ implies that the $B_k$ are also convex.
According to (8), we see that each $B_k$ is contained in a closed ball $F$. Recall that, since $W^{1,p}(\Omega)$ is reflexive for $1 < p < \infty$, the ball $F$ is weakly compact. Passing to a subsequence if necessary, we may assume that $\{u_k\}$ converges weakly to some $u$. By Mazur's lemma, we can now find some convex linear combinations $v_k = \sum_{j=k}^{J(k)} \alpha_j u_j \in B_k$ such that $\{v_k\}$ converges to $u$ strongly. This shows that $u$ belongs to the closure of every $B_k$ and hence $J(u)$ is minimal. Uniqueness follows by strict convexity.

Finite elements
Assume that $\Omega$ is a polygon. We introduce a triangulation $T_h$ of $\Omega$, parametrized by $0 < h < 1$, and piecewise linear finite element basis functions $\{\phi_1(x), \ldots, \phi_n(x)\} \subset W^{1,p}(\Omega)$. As usual, we define a "reference element" $\hat{K} = \{x \in \mathbb{R}^d : x_i \ge 0 \text{ and } \sum_i x_i \le 1\}$. In general, if $T_h$ is not necessarily a uniform lattice, we say that the family of triangulations $T_h$, parametrized by $0 < h < 1$, is quasi-uniform with parameter $\rho$ when each simplex has diameter at most $h$ and contains a ball of radius $h/\rho$. Let $\{(\omega_i, x_i)\}$ be the midpoint quadrature rule, which is exact for piecewise linear or piecewise constant functions. We can construct a "discrete derivative" matrix $D^{(j)}$ whose $i$th row evaluates $\partial_j$ on the simplex $K_i$; note that the quadrature is exact provided that $g$ is also piecewise linear. For the midpoint rule, $\omega_i$ is the volume of the simplex $K_i$; if the triangulation $T_h$ is quasi-uniform, then the $\omega_i$ are comparable to $h^d$. We abuse the notation and use the same symbol $u$ to represent both the finite element coefficient vector $[u_1, \ldots, u_n]^T$ and the finite element function $u(x) = \sum_{k=1}^n u_k \phi_k(x)$. We further denote by $D^{(j)}$ the columns of $D^{(j)}$ corresponding to the interior vertices of $T_h$. The matrix $A = \sum_j (D^{(j)})^T \operatorname{diag}(\omega) D^{(j)}$ is the usual discretization of the Laplacian or Poisson problem, and we have that $u^T A u = |u|_{H^1}^2$. We now prove that the finite element method converges for the $p$-Laplacian.
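In one dimension, the discrete derivative matrix, the midpoint weights, and the resulting matrix $A$ can be written down explicitly; this sketch, on a hypothetical uniform mesh, checks that $A = D^T \operatorname{diag}(\omega) D$ reduces to the usual tridiagonal stiffness matrix.

```python
import numpy as np

n = 50                      # interior vertices
m = n + 1                   # simplices (intervals)
h = 1.0 / m

# Discrete derivative: row i evaluates the derivative on interval K_i.
# Columns correspond to interior vertices only; boundary values go into b.
D = np.zeros((m, n))
for i in range(m):
    if i < n:
        D[i, i] += 1.0 / h          # right endpoint of K_i
    if i > 0:
        D[i, i - 1] -= 1.0 / h      # left endpoint of K_i

w = h * np.ones(m)                   # midpoint quadrature weights = |K_i|

# A = D^T diag(w) D is the usual (interior) discretization of the Laplacian.
A = D.T @ (w[:, None] * D)
assert np.allclose(np.diag(A), 2.0 / h)        # diagonal 2/h
assert np.allclose(np.diag(A, 1), -1.0 / h)    # off-diagonal -1/h
```

In higher dimensions, one such derivative matrix per coordinate direction is assembled simplex by simplex in the same way.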

Lemma 5
Assume that $\Omega$ is a polytope and $1 < p < \infty$. Let $u_h$ be the finite element minimizer of $J(u)$ in a finite element space that contains the piecewise linear functions. Then $J(u_h) \to \inf_{u \in W^{1,p}_0(\Omega)} J(u)$ as $h \to 0$.
Lemma 5 is very general (no regularity assumptions are made on the solution $u$) but also very weak, since it does not give a rate of convergence. If one assumes regularity of the solution, then one can use quasi-interpolation [34] to estimate the convergence more precisely. However, we will see in Sect. 2.2 (Example 1) that it is difficult to prove regularity. Since the present paper focuses on the numerical solver, and not on the discretization, we do not investigate this aspect any further. The lemma also does not specify whether $u_h$ converges as $h$ tends to $0$. In the case $1 < p < \infty$, the strict convexity of $J$ ensures that $u_h$ will indeed converge to a minimizer $u$ in $W^{1,p}(\Omega)$, but for $p = 1$ there may be multiple minimizers, and then $u_h$ could oscillate between the many minimizers or converge to a "minimizer" in the double-dual of $W^{1,1}(\Omega)$.

Pathological situations for extreme p values
The $p$-Laplace problem varies in character as $p$ ranges over $1 \le p \le \infty$. When $p = 2$, minimizing $J(u)$ is equivalent to solving a single linear problem, which is clearly faster than solving hundreds of linear problems as required by a barrier method. As $p$ gets further away from $2$, naive solvers work less well and proper convex optimization algorithms are required, such as our barrier methods. The extreme cases $p = 1$ and $p = \infty$ have traditionally been considered hardest. For example, $J(u)$ may not be differentiable at $u$ when $p \in \{1, \infty\}$, typically when $\nabla(u + g)$ vanishes at some point $x \in \Omega$. In Sect. 2, we have introduced several lemmas, some of which work for all cases $1 \le p \le \infty$, while others are restricted to $1 < p < \infty$. Briefly speaking, we have shown that, for all $1 \le p \le \infty$, $J(u)$ is convex and possesses minimizing sequences (with some restrictions on the forcing $f$ when $p \in \{1, \infty\}$). These facts are sufficient to deploy barrier methods, because barrier methods do not require the objective to be differentiable, strictly convex, or uniquely minimized. As a "bonus", we have also shown that $J(u)$ is strictly convex and has a unique minimum when $1 < p < \infty$, but this is not required for the successful application of our barrier methods.
We now illustrate the pathological behavior for extreme values of p with several simple examples. For 1 < p < ∞, strict convexity ensures the uniqueness of the minimum of J (u). However, in the case p = 1, the minimizer may be "outside" of W 1,1 ( ) or nonunique.
Example 1 Let $\Omega = (0, 1)$, $p = 1$ and $f = 0$, with boundary values $0$ at $x = 0$ and $1$ at $x = 1$. A minimizing sequence for $J(u)$ is given by the piecewise linear functions $u_n(x) = \min(1, \max(0, 0.5 + n(x - 0.5)))$. This sequence converges to the indicator function of $[0.5, 1)$, which is not in $W^{1,1}(0,1)$. This is possible because $W^{1,1}(0,1)$ is not reflexive and hence its unit ball is not weakly compact. Instead, the limit of $u_n$ is in $BV$, the double-dual of $W^{1,1}(0,1)$.
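The behavior of this minimizing sequence can be checked numerically: every $u_n$ has total variation $\int_0^1 |u_n'| = 1$, while the pointwise limit is a step function. A small sketch:

```python
import numpy as np

def u_n(x, n):
    """Piecewise linear ramp: 0, then a slope-n segment through (0.5, 0.5), then 1."""
    return np.minimum(1.0, np.maximum(0.0, 0.5 + n * (x - 0.5)))

x = np.linspace(0.0, 1.0, 200001)
for n in (1, 10, 100, 1000):
    # total variation of the monotone ramp on (0,1) is always exactly 1
    tv = np.abs(np.diff(u_n(x, n))).sum()
    assert abs(tv - 1.0) < 1e-9

# the pointwise limit is the indicator of [0.5, 1), a step function
limit = u_n(x, 10**9)
assert limit[x < 0.5 - 1e-6].max() == 0.0
assert limit[x > 0.5 + 1e-6].min() == 1.0
```

The $W^{1,1}$ seminorms of the $u_n$ stay bounded while the derivatives concentrate at $x = 0.5$, which is exactly the mass-escaping-to-a-Dirac behavior that $BV$ accommodates and $W^{1,1}$ does not.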
We now briefly show why the minimization of J (u) for u ∈ V h is numerically challenging.
The p-Laplacian for p = 1 is particularly hard; we now show two types of difficulties. First, the Hessian may be singular, and regularizing the Hessian leads to gradient descent.
Example 3 Let $J(x) = \|x\|_2$ for $x \in \mathbb{R}^2$; this corresponds to a 2-dimensional 1-Laplacian discretized with a single grid point. The gradient is $J'(x) = x / \|x\|_2$ and the Hessian is $J''(x) = \|x\|_2^{-1} \left( I - x x^T / \|x\|_2^2 \right)$. The Hessian matrix $J''(x)$ is singular, which makes the Newton iteration undefined. To make matters worse, the kernel of $J''$ is spanned by $J'$, and hence any "regularization" $J'' + \epsilon I$ leads to a simple gradient descent.
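This degeneracy is easy to verify numerically; the sketch below evaluates the gradient and Hessian of $J(x) = \|x\|_2$ at a generic point.

```python
import numpy as np

x = np.array([3.0, 4.0])
r = np.linalg.norm(x)                            # ||x||_2 = 5

grad = x / r                                     # J'(x) = x / ||x||_2
hess = (np.eye(2) - np.outer(x, x) / r**2) / r   # J''(x)

# The Hessian is singular, and its kernel is spanned by the gradient:
assert abs(np.linalg.det(hess)) < 1e-15
assert np.allclose(hess @ grad, 0.0)

# Regularizing by eps*I makes the Newton direction parallel to the
# gradient as eps -> 0, i.e. plain gradient descent:
eps = 1e-8
step = np.linalg.solve(hess + eps * np.eye(2), grad)
assert np.allclose(step / np.linalg.norm(step), grad, atol=1e-6)
```

The barrier method sidesteps this degeneracy because the Hessian of the self-concordant barrier is positive definite on the interior of the feasible set.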
Yet another difficulty is that the 1-Laplacian may have nonunique solutions or no solutions when the forcing is nonzero.

Example 4
Let $c \in \mathbb{R}$ and $J(x) = |x| + cx$; this corresponds to a 1-dimensional 1-Laplacian with a nonzero forcing term, discretized with a single grid point. Then $J(x)$ is convex for all $c \in \mathbb{R}$. However, $J(x)$ has a unique minimum $x = 0$ if and only if $|c| < 1$. When $|c| = 1$, $J(x)$ has infinitely many minima. When $|c| > 1$, $J(x)$ is unbounded below and there is no minimum. As a result, the energy $J(u)$ of the 1-Laplacian may not be bounded below when the forcing $f \ne 0$; see also Lemma 3.

Convex optimization by the barrier method
In this section, we briefly review the theory and algorithms of convex optimization and refer to Nesterov [27,Section 4.2] for details, including the notion of self-concordant barriers.
Let $Q \subset \mathbb{R}^n$ be a bounded closed convex set that is the closure of its interior, let $c \in \mathbb{R}^n$ be a vector, and consider the convex optimization problem
$$c^* = \min_{x \in Q} c^T x. \qquad (17)$$
The barrier method (or interior point method) for solving (17) is to minimize $t c^T x + F(x)$ for an increasing sequence of barrier parameters $t$, where $F$ is a $\nu$-self-concordant barrier for $Q$. The main path-following scheme is:
1. Set $t_0 = 0$, $\beta = 1/9$ and $\gamma = 5/36$. Choose an accuracy $\epsilon > 0$ and a starting point $x^{(0)}$ near the analytic center of $Q$.
2. At the $k$th iteration, increase the barrier parameter to $t_{k+1}$ and perform one Newton step on $t_{k+1} c^T x + F(x)$.
The stopping criterion guarantees that, at convergence, $c^T x^{(k)} - c^* \le \epsilon$. Starting this iteration can be difficult, since it is not always obvious how to find a suitable initial point. We use an auxiliary path-following scheme to approximate the analytic center $x^*_F$ of $Q$:
1. Choose $x^{(0)} \in Q$ and set $t_0 = 1$ and $G = -F'(x^{(0)})$.
2. For the $k$th iteration ($k \ge 0$): decrease the parameter to $t_{k+1}$ and perform one Newton step on $-t_{k+1} G^T x + F(x)$.
The invariant of the auxiliary scheme is that $\|t_k G + F'(x^{(k)})\|^*_{x^{(k)}} \le \beta$ for every $k$. At convergence, one can show that $\|F'(x)\|^*_x \le \beta$. Let $\bar{x} \in Q$ be some starting point for the auxiliary path-following scheme. Combining the auxiliary path-following scheme, started from $\bar{x}$, to find an approximate analytic center of $Q$, followed by the main path-following scheme started from that center to solve the optimization problem (17), completes in at most $N$ iterations, where $N$ is given by (21).
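To make the main scheme concrete, here is a minimal short-step path-following sketch for a hypothetical one-dimensional toy problem: minimize $cx$ over $Q = [0, 1]$ with the $\nu = 2$ self-concordant barrier $F(x) = -\log x - \log(1 - x)$. The constant $\gamma = 5/36$ is that of the main scheme; the stopping rule is a simplified duality-gap bound.

```python
import math

def solve_short_step(c=1.0, eps=1e-3, max_iter=1000):
    """Short-step barrier method for min c*x over [0, 1] (barrier nu = 2)."""
    nu, gamma = 2.0, 5.0 / 36.0
    x, t = 0.5, 0.0                 # x = 0.5 is the analytic center: F'(x) = 0
    for _ in range(max_iter):
        F1 = -1.0 / x + 1.0 / (1.0 - x)            # F'(x)
        F2 = 1.0 / x**2 + 1.0 / (1.0 - x)**2       # F''(x)
        t += gamma / (abs(c) / math.sqrt(F2))      # t_{k+1} = t_k + gamma/||c||*_x
        x -= (t * c + F1) / F2                     # one Newton step
        if t >= (nu + 1.0) / eps:                  # gap <= roughly nu/t <= eps
            break
    return x

x = solve_short_step()
assert 0.0 < x < 5e-3       # the true minimizer is x = 0
```

The iterate tracks the central path $x(t) \approx 1/t$, and $t$ grows by a factor of roughly $1 + \gamma/\sqrt{\nu}$ per iteration, which is the source of the $O(\sqrt{\nu}\log\nu)$ iteration count.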

Long-step algorithms
The path-following schemes of Sect. 2.3 are so-called "short step", meaning that the barrier parameter $t$ increases fairly slowly when $\nu$ is large. It is well-known that long-step algorithms, where $t$ increases more rapidly, often converge faster overall than short-step algorithms, even though the worst-case estimate $O(\nu \log \nu)$ is worse than the short-step estimate $O(\sqrt{\nu} \log \nu)$; see Nesterov and Nemirovskii [28] for details. The main path-following scheme can be made "long-step" by increasing $t$ by the larger factor allowed in (22) and replacing the full Newton step with a damped step $x^{(k+1)} = x^{(k)} - r_k [F''(x^{(k)})]^{-1} (t_{k+1} c + F'(x^{(k)}))$, where $0 < r_k \le 1$ is found by line search; see e.g. Boyd and Vandenberghe [5, Algorithm 9.2] with $\alpha = 0.01$. The parameter $\kappa \ge 1$ determines the step size of the scheme. In convex optimization, step sizes $\kappa = 10$ or even $\kappa = 100$ are often used, but we will see in Sect. 4 that shorter step sizes are better suited for the $p$-Laplacian.
The long-step variant of the auxiliary path-following scheme is implemented in a similar fashion; the criterion for decreasing $t_{k+1}$ is then $\|t_k G + F'(x^{(k)})\|^*_{x^{(k)}} \le \beta$.

Adaptive stepping
We finally introduce an algorithm whose step parameter $\kappa_k$ is indexed by the iteration counter $k$. We first introduce some terminology. If $\|t_k c + F'(x^{(k)})\|^*_{x^{(k)}} \le \beta$ (main phase) or $\|t_k G + F'(x^{(k)})\|^*_{x^{(k)}} \le \beta$ (auxiliary phase), we say that $x^{(k)}$ was accepted; otherwise, we say that $x^{(k)}$ was a slow step. Let $\kappa_0$ be an initial step size (we will take $\kappa_0 = 10$).
1. If $x^{(k)}$ is accepted after 2 or fewer slow steps, put $\kappa_{k+1} = \min\{\kappa_0, \kappa_k^2\}$.
2. If $x^{(k)}$ is accepted after 8 or more slow steps, put $\kappa_{k+1} = \sqrt{\kappa_k}$.
The quantity t k+1 is computed as in the long step algorithm (22), with κ = κ k+1 . Note that whenever t k+1 coincides with the short step (20), then the step is automatically accepted. The rejection is "wasteful" in that it discards possibly useful information, but we will see in the numerical experiments that this adaptive scheme is quite efficient in practice. Furthermore, the rejection step is the key that unlocks a very simple analysis of our algorithm.
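The update of $\kappa_k$ can be isolated as a small rule; the thresholds (2 and 8 slow steps) are those given above, while the intermediate case (3 to 7 slow steps), which the rules above leave unspecified, is here assumed to keep $\kappa$ unchanged.

```python
def next_kappa(kappa, kappa0, num_slow_steps):
    """Adaptive step-size update applied once a step is finally accepted.

    Few slow steps -> be more aggressive (square, capped at kappa0);
    many slow steps -> be more conservative (square root).
    """
    if num_slow_steps <= 2:
        return min(kappa0, kappa ** 2)
    if num_slow_steps >= 8:
        return kappa ** 0.5
    return kappa                      # 3..7 slow steps: keep kappa (assumed)

kappa0 = 10.0
assert next_kappa(3.0, kappa0, num_slow_steps=0) == 9.0    # 3^2 = 9 < kappa0
assert next_kappa(4.0, kappa0, num_slow_steps=1) == 10.0   # 4^2 capped at kappa0
assert next_kappa(9.0, kappa0, num_slow_steps=8) == 3.0    # sqrt(9) = 3
```

Because $\kappa$ can only shrink toward the short-step size under repeated rejection, the adaptive steps are never smaller than short steps, which is what drives the bound in Theorem 2.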
Theorem 2 For given $c$, $F$, $\epsilon$, let $N_S$ and $N_A$ be the number of Newton steps of the short step and adaptive step algorithms, respectively. Then $N_A$ exceeds $N_S$ by at most a constant factor depending only on $\kappa_0$ and the short step size. Proof By construction, on each accepted step of the main path-following algorithm, the barrier parameter increases by at least the short step size; see (22). Thus, we only need to estimate the maximum number of slow steps before a step is accepted. According to [27, p. 202], the short step size admits a lower bound $\kappa_{\min} > 1$. Starting from $\kappa = 10$, after $r$ rejections, the step size is $\kappa = 10^{(1/4)^r}$. When $\kappa \le \kappa_{\min}$, the short step is automatically accepted, and hence the maximum number of rejections is $r_- = \lceil \log(\log 10 / \log \kappa_{\min}) / \log 4 \rceil$. Since all the adaptive steps are at least as large as the short steps, the stopping criterion is purely based on the barrier parameter $t_k$, and each rejection corresponds to at most 15 slow steps (plus the initial accepted step), we obtain the estimate for the main phase. The estimate for the auxiliary phase is obtained in a similar fashion.
Theorem 2 states that the adaptive algorithm cannot be much worse than the short step algorithm, which means that the adaptive algorithm scales at worst like $\tilde{O}(\sqrt{\nu})$, where we have neglected some logarithms. The reader may be surprised that the estimate for the adaptive scheme is slightly worse than the estimate for the short step scheme, but this is a well-known phenomenon in convex optimization: long step estimates are quite pessimistic, and in practice, long step and adaptive schemes work much better than the theoretical estimates. Our result is especially interesting because estimates for long step algorithms scale like $\tilde{O}(\nu)$, whereas our new algorithm scales like $\tilde{O}(\sqrt{\nu})$.

Proof of Theorem 1
The proof of Theorem 1 is rather technical, so we begin by outlining the plan of our proof. The idea is to estimate all the quantities in the bound (21) for the number $N$ of Newton iterations. The barrier parameter $\nu$ is estimated in Lemma 6. Some "uniform" or "box" bounds are given for the central path in Lemma 7; these are an intermediate step in converting as many estimates as possible into functions of $h$. Because (21) depends on the Hessian $F''$ of the barrier, the lowest eigenvalue $\lambda_{\min}$ of $F''$ is estimated in Lemma 8. This bound itself depends on extremal singular values of the discrete derivative matrices, which are estimated in Lemma 9, and these bounds are rephrased in terms of $h$ in Lemma 10. In Lemma 11, we establish the connection between the number $m$ of simplices and the grid parameter $h$, which is used in Lemma 12 to estimate the quantities $\|\hat{x}\|_2$ and $\|F'(\hat{x})\|_2$; these can be converted to estimates for the local norms appearing in (21) by dividing by $\lambda_{\min}$. Here, $\hat{x}$ is a starting point for the barrier method. Finally, the quantities $R$ appearing in Theorem 1 are obtained by starting from the estimates of Lemma 3, adding 1, and doubling them. This ensures that the central path will be well inside the ball of radius $R$.
In the present section, we treat in detail the case 1 ≤ p < ∞. The case p = 1, which is considered especially difficult, poses no special difficulty in the present section, provided that the hypotheses of Lemma 3 are satisfied. The case p = ∞ is deferred to Sect. 3.1.
Let $1 \le p < \infty$ and define the barrier $F(u, s)$ by (25).

Lemma 6 The function $F(u, s)$ is an $m(\sigma + 2)$-self-concordant barrier for the set $Q$ defined in (28).
The problem of minimizing $J(u)$ over $u \in V_h$, subject to the additional constraint that $\max_i \omega_i \|\nabla(u+g)|_{K_i}\|_2^p \le R$, is equivalent to the constrained problem (28). Here, we have abused the notation and used the symbol $f$ for the vector whose $i$th component is $f_i = \int_\Omega f \phi_i \, dx$. Each term of the barrier (25) is self-concordant; see Nesterov and Nemirovskii [28]. The rest is proved by inspection.
From Lemma 3, it is tempting to use a bound such as $\|u + g\|_{X_p}^p \le R$, i.e. $\sum_i \omega_i s_i \le R$, but this leads to a dense Hessian $F_{ss}$. Instead, we have used the "uniform" bound $\omega_i s_i \le R$ for each $i$. With this "looser" bound, the Hessian $F_{ss}$ is sparse. Furthermore, by using the $R$ value from the a priori estimate of Lemma 3, one can ensure that $Q$ is non-empty and contains minimizing sequences for $J(u)$. Thus, we put $(u, s) = (0, \hat{s})$, with $\hat{s}$ as in (31); in this way, $(0, \hat{s}) \in Q$.

Lemma 7 For all $(u, s) \in Q$, the variables $\tau$, $z$ and $s$ satisfy uniform bounds, as follows. Proof From $w^T s \ge 0$ and (26), we find $\tau \le R$. From (26), we find $z_i \le s_i^{2/p}$, and from $0 \le \tau_i = R - \omega_i s_i$, we find $\omega_i s_i \le R$.
We further find lower bounds on $\tau$. The gradient of $F$ is given by (33), where vector algebra is defined entrywise, and the Hessian of $F$ is given by (34). Lemma 8 bounds the smallest eigenvalue $\lambda_{\min}$ of the Hessian $F''$. Proof We consider the "Rayleigh quotient" $x^T F'' x / x^T x$, the extremal values of which are the extremal eigenvalues of $F''$. We put $x = (v, w)$ and expand the quadratic form. We use the Cauchy-Schwarz inequality together with Young's inequality to bound the cross terms, and hence obtain a lower bound on the Rayleigh quotient; the structure of the $s$-block of $F''$ then yields the claim. (A domain is said to be of width $L$ when $\Omega \subset S$, where $S$ is a strip of width $L$. The Friedrichs inequality states that, for domains of width $L > 0$ and for $u \in W^{1,p}_0(\Omega)$, $\|u\|_{L^2} \le L |u|_{H^1(\Omega)}$.)

Lemma 9
Let $\Omega$ be a polytope of width $L < \infty$, and assume that the triangulation $T_h$, which depends on the grid parameter $0 < h < 1$, is quasi-uniform. Then there is a constant $c > 0$, which depends on $\Omega$ and the quasi-uniformity parameter $\rho$ of $T_h$, such that the extremal singular values of the discrete derivative matrices are bounded in terms of $c$ and $h$. Proof We have $u^T A u = |u|_{H^1}^2$, and the Friedrichs inequality states that $\|u\|_{L^2} \le L |u|_{H^1}$. Furthermore, according to [32, Proposition 6.3.1], there is a constant $K$ such that $u^T u \le K h^{-d} \|u\|_{L^2}^2$. Finally, we use that the quadrature weights $\omega_i$ are comparable to $h^d$.

Lemma 11 For $0 < h < 1$, assume $T_h$ is a quasi-uniform triangulation of $\Omega$. The number $n$ of vertices of $T_h$ inside $\Omega$ and the number $m$ of simplices in $T_h$ satisfy $n \le (d+1) m$ and $m \le d! \, |\Omega| \, h^{-d}$, where $|\Omega|$ is the volume of $\Omega$.
Proof The inequality $n \le (d+1) m$ follows from the fact that each of the $m$ simplices has precisely $d+1$ vertices; we may indeed have $n < (d+1) m$, since some vertices may be shared between multiple simplices. The upper bound for $m$ follows from the fact that each simplex of a quasi-uniform triangulation has volume on the order of at least $h^d / d!$.
Lemma 12 Consider the point $\hat{x} = (0, \hat{s})$. Then the estimates (44) and (45) hold, where $C_* < \infty$ is a constant that depends on $\Omega$ and the quasi-uniformity parameter $\rho$ of $T_h$.
Proof From (31) we obtain (45), and using (43), we obtain (44). We now estimate $F'(\hat{x})$. Using (34), we bound the first term by the spectral radius of the relevant matrix, which we estimate via the inverse Sobolev inequality for $|w|_{H^1}$. Furthermore, using the equivalence of $p$-norms in $m$ dimensions, together with (31) and (33), we bound the remaining terms, which yields the desired estimate. Proof of Theorem 1 Using (42), we estimate the quantities appearing in (21) and substitute; using $m \le d! \, |\Omega| \, h^{-d}$, we obtain the iteration bound $N_*$. The estimates $R_{p=1}$, $R_{1<p<\infty}$ were obtained by starting from the estimates of Lemma 3, adding 1, and doubling them. Substituting these into $N_*$ produces $N_{p=1}$ and $N_{1<p<\infty}$.

The case p = ∞
Recall the $\infty$-Laplacian of (7). As in the $p = 1$ case, $J(u)$ is non-differentiable and may be unbounded below when $f$ is large. As per Lemma 3, assume that $L\|f\|_{L^1} < 1$, set
$$R_{p=\infty} = \left( \max_i \omega_i \right) \frac{2 + 2\|g\|_{X_\infty(\Omega)}}{1 - L\|f\|_{L^1}},$$
and impose $\omega_i s \le R_{p=\infty}$. The problem of minimizing $J(u)$ over $u \in V_h$ is equivalent to
$$\min s \quad \text{over} \quad Q := \left\{ (u, s) : s \ge \|\nabla(u+g)|_{K_i}\|_2 \ \forall i, \text{ and } R \ge \omega_i s \right\}. \qquad (65)$$
We notice that this definition of $Q$ coincides with the definition (28) with $p = 1$, subject to the additional restriction that $s_1 = \cdots = s_m$, subsequently dropping the index $i$ from $s_i$. As a result, one can obtain a barrier for $Q$ by taking the barrier (25) with $p = 1$ on the subspace of constant-valued $s$ vectors; the barrier $F_\infty$ and its derivatives are obtained by this restriction. The starting point for the optimization is $(\hat{u}, \hat{s})$ with $\hat{u} = 0$ and $\hat{s}$ given by (68). Theorem 3 With the notation as in Theorem 1, assume $p = \infty$ and $L\|f\|_{L^1} < 1$. The barrier method to solve (65) requires at most $N_{p=\infty}$ Newton iterations. The computational complexity as a function of the number $n$ of grid points (freezing all other parameters) is $O(\sqrt{n} \log n)$.
Proof The proof of Theorem 3 follows the same logic as that of Theorem 1, so we merely sketch it here. First, (34) and (35) are replaced by their restrictions to constant-valued $s$ vectors. The proof of (40) holds, changing what must be changed, and ends with a bound slightly stronger than (40). The estimate (44) also holds verbatim. The estimate for $\|\hat{x}\|_2$ is by inspection of (68) and (69). The estimate for $\|F'_u(\hat{x})\|_2$ is identical to the proof of (44), and $|F'_s(\hat{x})|$ is estimated using (16), $1 \le \hat{s} \le R/(2 \min_i \omega_i)$, $\hat{\tau}_i \ge R/2$, and $\hat{z}_j \ge 1$; here $C$ is some constant that depends only on $\Omega$. This yields the required bound.
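In one space dimension, $\|\nabla(u+g)\|_2$ reduces to an absolute value, so the constraint structure of (65) with $f = 0$ can be illustrated by a small linear program. This is only an illustration of the feasible set, not of the barrier method itself; the mesh and the boundary data $0$ and $1$ are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

N = 20                        # intervals; interior unknowns u_1, ..., u_{N-1}
n = N - 1
h = 1.0 / N
g_left, g_right = 0.0, 1.0

# Variables z = (u_1, ..., u_n, s); minimize s.
c = np.zeros(n + 1)
c[-1] = 1.0

# For each interval: +(u_{i+1} - u_i)/h - s <= 0 and -(u_{i+1} - u_i)/h - s <= 0,
# with boundary values moved to the right-hand side.
A, b = [], []
for i in range(N):
    row = np.zeros(n + 1)
    rhs = 0.0
    if i < n:
        row[i] = 1.0 / h              # right endpoint is the unknown u_{i+1}
    else:
        rhs -= g_right / h            # right endpoint is the boundary value
    if i > 0:
        row[i - 1] -= 1.0 / h         # left endpoint is the unknown u_i
    else:
        rhs += g_left / h             # left endpoint is the boundary value
    row[-1] = -1.0
    A.append(row.copy()); b.append(rhs)        # +slope - s <= rhs
    row2 = -row; row2[-1] = -1.0
    A.append(row2); b.append(-rhs)             # -slope - s <= -rhs

res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
              bounds=[(None, None)] * n + [(0, None)])

# The minimal Lipschitz extension of the boundary data is linear with slope 1.
assert res.success and abs(res.fun - 1.0) < 1e-6
```

In $d \ge 2$, the Euclidean norm makes the constraints second-order cones rather than linear inequalities, which is precisely why the self-concordant barrier machinery is used instead of linear programming.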

Implementation notes
In principle, the vector $(f_i)$ is defined by $f_i = \int_\Omega f \phi_i \, dx$; we have not analyzed an inexact scheme for computing these integrals. If $f$ is assumed to lie in a suitable finite element space (e.g. piecewise constant), then these integrals can be computed exactly by the same quadrature we use on the diffusion term. In addition, we can then compute $\|f\|_{L^q}$ exactly by quadrature, since $|f|^q$ is also piecewise constant. Assuming $g$ is piecewise linear, the quantities $|\Omega|$ and $\|g\|_{X_p}^p$ can be computed exactly; see (30). Thus, one can compute $R_{1<p<\infty}$, $R_{p=1}$, etc. exactly.
In the strong form (1), the function $g$ is given on $\partial\Omega$ (i.e. it is a trace), but in the variational form (2), the function $g$ has domain $\Omega$. Regarding $v = u + g$ as the solution, the choice of $g|_\Omega$ does not affect the value of $v$, provided that $g|_{\partial\Omega}$ is fixed. The simplest way to choose $g|_\Omega$ as a piecewise linear function on $T_h$ is to set all nodal values to $0$ inside of $\Omega$, but this is a somewhat "rough" prolongation that is furthermore dependent on $h$. Using such a prolongation of $g$ causes the estimates (5) and (6) to become dependent on $h$ where $g$ appears. In order to avoid this dependence on $h$, one can proceed in one of two ways. First, if the meshes $T_h$ are all included in one coarse mesh $T_{h_0}$, then one can do the prolongation on $T_{h_0}$ and use the same prolongation on all $T_h$.
Another method is to solve the linear Laplacian with boundary conditions $g|_{\partial\Omega}$ on the mesh $T_h$. This choice of $g|_\Omega$ does vary slightly with $h$, but it converges to the continuous solution as $h \to 0$. Furthermore, this choice of $g$ minimizes $\|g\|_{X_2} = |g|_{H^1}$, so it may result in a smaller value of $R$ than prolongation by truncation. We call this choice of $g$ the discrete harmonic prolongation of $g|_{\partial\Omega}$ to $\Omega$. We use the discrete harmonic prolongation in all our numerical experiments.

Numerical experiments

In our experiments, the boundary data takes the extremal values $0$ and $1$ and is discontinuous, which we approximate on the discrete grid by piecewise linear elements. Note that this creates very challenging numerical and functional analytical problems; e.g. traces of $W^{1,\infty}$ functions are again $W^{1,\infty}$, so the $\infty$-Laplacian here is solving a problem approximating one outside the usual trace space. The forcing $f = 0$ means that solutions must satisfy minimum and maximum principles, and so the solution is always between the extremal $0$ and $1$ boundary values for all values of $p$ and all $x \in \Omega$. The zero forcing provides some "protection" against the "bad" boundary data.
We have varied the number $n$ of grid points from $n = 16$ (a $4 \times 4$ grid) up to $n = 40{,}000$ ($200 \times 200$) and, in all cases, solved to a tolerance of $\epsilon \approx 10^{-6}$. We report the number of Newton iterations required for convergence in Table 1. This detailed table reveals those values of $\kappa$, $n$, $p$ that failed to converge within five minutes. Most of these convergence failures are due to purely numerical problems. Indeed, we have noted in the introduction that when $p$ is large, minimizing $J(u)$ is intrinsically challenging because it exhausts the accuracy of double precision floating point. Thus, the difficulty in solving $p$-Laplacians accurately for large $p$ is not particular to our algorithm but indeed affects all algorithms for solving $p$-Laplacians. MATLAB has also issued warnings that the Hessian was singular to machine precision, for large values of $p$ and $n$.
The scaling properties of our algorithms are not immediately obvious from Table 1, so we have plotted the iteration counts of Table 1 in Fig. 2. Both axes are in logarithmic scale, so straight lines of slope α correspond to O(n^α) scaling. We see that the short-step algorithm of Sect. 2.3 requires the largest number of Newton iterations to converge (blue lines); this is consistent with experience in convex optimization. For this reason, we were not able to solve larger problems with the short-step algorithm. The scaling of the short-step algorithm is consistent with the theoretical prediction O(√n log n) of Theorems 1 and 3.
The long-step algorithms (black lines) all require fewer Newton steps than the short-step algorithm, even though the theoretical estimate O(n log n) for long-step algorithms is worse than for short-step algorithms. This is a well-known phenomenon: in practice, long-step algorithms perform better, as is the case here. (In Table 1, the symbol – indicates that the algorithm failed to converge within the five-minute time limit.) In Fig. 2, most of the black curves are approximately straight lines, indicating O(n^α) scaling, but there are notable exceptions when p = 1 or p = ∞, especially when κ is also large. By contrast, the adaptive step size algorithm (red lines), with κ_0 = 10, is the best algorithm in most cases, and the red lines are much straighter than the black lines. We denote by N_p(n) the number of iterations required by the adaptive step size algorithm for a given value of p and problem size n. We have fitted straight lines to the red curves of Fig. 2 in the least-squares sense and obtained the following approximations: Note that the case p = 2 is a linear Laplacian that can be computed by solving a single linear problem. When we embed this linear problem into the machinery of convex optimization, the overall algorithm is very inefficient, since it may require hundreds of linear solves. We include this test case for completeness, not as a recommendation.
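For reference, the least-squares slope fit used to extract estimates of the form N_p(n) = O(n^α) from a log-log plot can be sketched as follows; the data in the test below are synthetic, not the entries of Table 1.

```python
import numpy as np

def fit_scaling_exponent(ns, iters):
    """Fit log(iters) = alpha * log(ns) + c in the least-squares sense.

    On a log-log plot, a straight line of slope alpha corresponds to
    O(n^alpha) scaling of the iteration count N_p(n).
    """
    alpha, _intercept = np.polyfit(np.log(ns), np.log(iters), 1)
    return alpha
```

Applied to iteration counts that scale exactly like 3 n^{1/4}, the fit recovers the exponent 0.25.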
We define the "spaceship domain" Ω̃ = {(x, y, z) ∈ R^3 : φ > 0}; this domain is slightly rescaled so that it is aesthetically pleasing. In practice, the domain Ω̃ is approximated on a discrete grid by a tetrahedral mesh T_h and the corresponding polyhedral approximation of Ω̃. On this tetrahedral mesh, we solve the p-Laplacian with forcing f = 1 for p ∈ {1, ∞}. The boundary values g are the indicator function of the set {y > 0.45}, as approximated by a piecewise linear function on the finite element grid. This problem features n = 11,224 unknowns and m = 47,956 elements. The solutions are displayed in Fig. 3. For these problems, the solution of the 1-Laplacian seems to approximate the indicator function of {y > 0.45}, as expected. However, the solution of the ∞-Laplacian is very large (exceeding 2,000 somewhere in the middle of the spaceship). This is because the traces of W^{1,∞}(Ω) functions are in W^{1,∞}(∂Ω), but our boundary data g is a piecewise linear approximation of a discontinuous trace with jumps (an indicator function), and hence ‖g‖_{X_∞} is very large, and so is the solution u + g. The 1-Laplacian is better able to tolerate boundary data g with (near-)jumps because the trace of a W^{1,1}(Ω) function is merely in L^1(∂Ω), thus allowing jumps.
The solution of the 1-Laplacian seems very close to what one would obtain with f = 0 instead of f = 1. This is not surprising, because the 1-Laplacian is a linear program, and the solutions of linear programs change in discrete steps as the forcing changes continuously. For example, the unique minimizer of J(x) = |x| + f x (x ∈ R) is x = 0 whenever |f| < 1, and the minimizer becomes undefined (or ±∞) when |f| > 1, because then J is unbounded below.
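This piecewise-constant behaviour of the minimizer is easy to check numerically. The following sketch (the function name is ours) minimizes J over a finite grid of candidate points; for |f| > 1 the grid minimizer runs to the edge of the grid, reflecting that J is then unbounded below on R.

```python
import numpy as np

def argmin_J(f, grid):
    """Return the grid point minimizing J(x) = |x| + f*x over the grid."""
    return grid[np.argmin(np.abs(grid) + f * grid)]

# Candidate x values; the step 0.5 is exactly representable in binary
# floating point, so x = 0 is a grid point.
grid = np.arange(-10.0, 10.5, 0.5)
```

Scanning f over (-1, 1) always returns the minimizer x = 0, while any |f| > 1 sends the grid minimizer to the boundary of the grid.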
For the ∞-Laplacian, the solution u + g is a large positive bump because f > 0 and there is a minimum principle stating that the minimum of u + g must be attained on the boundary ∂Ω. When one takes f < 0 instead, the solution u + g is a large negative bump, because in that scenario u + g satisfies a maximum principle. In the 2d experiments, the ∞-Laplacian did not develop large bumps because the boundary data was between 0 and 1 and the forcing was 0. This meant that u + g had to satisfy both minimum and maximum principles, so the solution was constrained by 0 ≤ u + g ≤ 1, preventing the formation of large bumps.

Conclusions and outlook
We have presented new algorithms for solving the p-Laplacian efficiently to any given tolerance for all 1 ≤ p ≤ ∞. We have proven that our algorithms compute a solution to any given tolerance in polynomial time, using O(√n log n) Newton iterations, and that an adaptive stepping variant converges in O(√n log² n) Newton iterations. We have confirmed these scalings with numerical experiments. We have further shown by numerical experiments that the adaptive step variant of the barrier method converges much faster than the short-step variant for the p-Laplacian, and usually faster than long-step barrier methods, thus achieving the practical speedup of long-step algorithms while avoiding their O(n log n) worst-case behavior. We have numerically estimated that the adaptive step algorithm requires O(n^{1/4}) Newton iterations across all values of 1 ≤ p ≤ ∞. We have observed numerical difficulties for p ≥ 5; these are expected, since large powers exhaust the accuracy of double precision floating point arithmetic, and this difficulty is not specific to our algorithm but is inherent to the p-Laplacian for large values of p. Our algorithms are particularly attractive when p ≈ 1 and p = ∞, where no other algorithms are efficient at all tolerances.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.