1 Introduction

Mean Field Games (MFG in short) theory, introduced in [25, 27], arises in the study of differential games with an infinite number of rational agents. The corresponding literature is now vast and concerns both theoretical and applied aspects, see [4, 17, 30] and references therein. A significant part of it is dedicated to the study of numerical methods and algorithms for the computation of the solution to the MFG model, both in the formulation as a PDE system and as an optimal control problem of a PDE. Such approaches, just to mention a few, include finite differences, semi-Lagrangian methods and Fourier expansions with regard to the approximation methods (see [1, 2, 6, 7, 11, 13, 14, 28, 29, 31, 33, 34]). Many of these methods exploit the variational structure of the problem, which is available in the case in which the coupling term involving the distribution of the population is separated from the Hamiltonian, while relatively few works have been dedicated to the so-called non-separable case

$$\begin{aligned} \left\{ \begin{aligned} (i) \qquad&-\partial _t u -\Delta u+H(x,m,Du)=f(m) \qquad{} & {} \textrm{in}\,\,Q,\\ (ii) \qquad&\partial _t m- \Delta m - \text {div} \big ( mH_p(x,m,Du)\big ) =0{} & {} \textrm{in}\,\,Q, \\&m(x,0)=m_0(x), \; u(x,T)=u_T(x){} & {} \textrm{in}\,\,{\mathbb {T}}^d. \end{aligned}\right. \end{aligned}$$
(1.1)

Here \({\mathbb {T}}^d\) is the unit torus and \(Q={\mathbb {T}}^d\times (0,T)\). Moreover, the non-separable case is very important in applications to model congestion effects, i.e., situations in which the cost of displacement of the agents increases in those regions where the density is large. MFG models including congestion were introduced in [30] and a typical Hamiltonian in such cases is

$$\begin{aligned} H(x,m,p)=\frac{h(x)|p|^2}{(1+m)^\alpha }, \quad \alpha >0. \end{aligned}$$

Global-in-time weak solutions to Eq. (1.1) have been considered in [5, 23], short-time existence and uniqueness of regular solutions in [8, 19] and the stationary case in [22]. In general, MFGs with non-separable Hamiltonian do not have a variational structure and this restricts the choice of numerical methods. Moreover, implicit schemes are generally preferred as they enhance stability and efficiency compared to explicit schemes. To design implicit finite difference schemes, iterative methods are needed to reduce the problem to a sequence of linear systems. Iterative methods employed in solving MFGs include Newton’s method [6, 7, 9, 28], fixed point iteration, fictitious play, policy iteration [15, 29], smoothed policy iteration [34], etc. In particular, numerical solutions of MFGs with non-separable Hamiltonian have been discussed in, e.g., [6, 7, 23, 28, 29].

In this paper, we consider Newton’s method from a continuous standpoint, viewing it as a linear system of partial differential equations (PDEs) which approximate the nonlinear problem (1.1). Newton’s method (also known as the Newton-Kantorovich method) is effective for convex optimization problems [10] or for solving nonlinear functional equations in Banach spaces, cf. [18]. The novelty and main contributions of this work are theoretical. We rigorously establish a quadratic rate of convergence of the method in a neighborhood of the solution of Eq. (1.1). In the study of Newton’s method for Eq. (1.1), a critical point is establishing the well posedness of the linearized MFG system. To address this, we broaden the theoretical framework developed in [12, 16] for analyzing master equations. This extension is applicable to MFGs with non-separable Hamiltonians, subject to certain Hessian-type monotonicity conditions. Recently, the convergence analysis of Newton’s method has been considered in [9] for stationary MFGs with separable Hamiltonian.

The Newton method for system Eq. (1.1) reads as follows. Writing the MFG system as an operator equation \({\mathcal {F}}(u,m)=0\) and denoting by \(\mathcal{L}\mathcal{F}_{({\check{u}},{\check{m}})}\) the linearized \({\mathcal {F}}\) at \(({\check{u}},{\check{m}})\) and by \(\mathcal{L}\mathcal{F}_{({\check{u}},{\check{m}})}^{-1}\) the inverse of \(\mathcal{L}\mathcal{F}_{({\check{u}},{\check{m}})}\), then we get

$$\begin{aligned} (u^{n},m^n)=(u^{n-1},m^{n-1})-\mathcal{L}\mathcal{F}^{-1}_{(u^{n-1},m^{n-1})}\cdot {\mathcal {F}}(u^{n-1},m^{n-1}), \end{aligned}$$

or equivalently

$$\begin{aligned} \mathcal{L}\mathcal{F}_{(u^{n-1},m^{n-1})}\cdot \big (u^{n}- u^{n-1},m^n-m^{n-1}\big )=-{\mathcal {F}}(u^{n-1},m^{n-1}). \end{aligned}$$
(1.2)

The previous identity in PDE form reads as

$$\begin{aligned} \left\{ \begin{aligned} (i)\qquad&-\partial _t (u^{n}- u^{n-1}) -\Delta (u^{n}- u^{n-1})+H_p(x,m^{n-1},Du^{n-1})D(u^{n}- u^{n-1})\\&+H_m(x,m^{n-1},Du^{n-1}) (m^{n}- m^{n-1})-f'(m^{n-1})(m^n-m^{n-1})\\ ={}&\partial _t u^{n-1} +\Delta u^{n-1}-H(x,m^{n-1},Du^{n-1})+f(m^{n-1})\qquad{} & {} \textrm{in}\,\,Q, \\ (ii)\qquad&\partial _t (m^{n}- m^{n-1})- \Delta (m^{n}- m^{n-1}) - \text {div} \big ( (m^{n}- m^{n-1}) H_p(x,m^{n-1},Du^{n-1})\big )\\&-\text {div} \Big (m^{n-1} H_{pm}(x,m^{n-1},Du^{n-1})(m^{n}- m^{n-1})\Big ) \\&-\mathrm{{div}}\Big (m^{n-1}H_{pp}(x,m^{n-1},Du^{n-1})(Du^n-Du^{n-1})\Big )\\ ={}&-\partial _t m^{n-1}+ \Delta m^{n-1} +\text {div} \big ( m^{n-1}H_p(x,m^{n-1},Du^{n-1})\big ){} & {} \textrm{in}\,\,Q, \\&m^n(x,0)=m_0(x), \; u^n(x,T)=u_T(x){} & {} \textrm{in}\,\,{\mathbb {T}}^d, \end{aligned}\right. \end{aligned}$$
(1.3)

and, after simplification, we get the coupled linear system in the unknown \((u^{n},m^{n})\)

$$\begin{aligned} \left\{ \begin{aligned} (i)\qquad&-\partial _t u^n -\Delta u^n+H_p(x,m^{n-1},Du^{n-1})D(u^{n}- u^{n-1})\\&+H_m(x,m^{n-1},Du^{n-1}) (m^{n}- m^{n-1})\\ ={}&-H(x,m^{n-1},Du^{n-1})+f(m^{n-1})+f'(m^{n-1})(m^n-m^{n-1})\qquad{} & {} \textrm{in}\,\,Q, \\ (ii)\qquad&\partial _t m^n- \Delta m^n - \text {div} \big ( m^nH_p(x,m^{n-1},Du^{n-1})\big ) \\ ={}&\mathrm{{div}}\Big (m^{n-1}H_{pp}(x,m^{n-1},Du^{n-1})(Du^n-Du^{n-1})\Big )\\&+\text {div} \Big (m^{n-1} H_{pm}(x,m^{n-1},Du^{n-1})(m^{n}- m^{n-1})\Big ){} & {} \textrm{in}\,\,Q, \\&m^n(x,0)=m_0(x), \; u^n(x,T)=u_T(x){} & {} \textrm{in}\,\,{\mathbb {T}}^d. \end{aligned}\right. \end{aligned}$$
(1.4)

Assuming that the Hamiltonian is regular and satisfies a classical monotonicity condition (see [5, 30]), we obtain existence and uniqueness of a classical solution \((u,m)\) to Eq. (1.1), see Proposition 2.4. Then, we prove the well posedness of Eq. (1.4) at each iteration and the quadratic rate of convergence of the Newton iteration to the solution of the MFG system when the initial guess \((u^0,m^0)\) is sufficiently close to \((u,m)\). We remark that, even though the algorithm is presented for evolutive MFGs, the ideas extend naturally to stationary MFGs as well.
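To illustrate the structure of the iteration (1.2) and what a quadratic rate means in practice, the following minimal Python sketch applies Newton's method to a toy two-dimensional algebraic system rather than to the PDE system (1.1). The system `F`, its Jacobian `J`, the starting point and the helper `newton_2d` are hypothetical choices introduced only for illustration; this is not the discretization used in the works cited above.

```python
import math


def F(z):
    """Toy nonlinear system x^2 + y^2 = 4, x*y = 1 (illustrative assumption)."""
    x, y = z
    return (x * x + y * y - 4.0, x * y - 1.0)


def J(z):
    """Jacobian of F: the finite-dimensional analogue of the linearized operator."""
    x, y = z
    return ((2.0 * x, 2.0 * y), (y, x))


def newton_2d(F, J, z0, steps):
    """Iterate z_n = z_{n-1} + delta, where J(z_{n-1}) delta = -F(z_{n-1})."""
    z = z0
    history = [z]
    for _ in range(steps):
        f1, f2 = F(z)
        (a, b), (c, d) = J(z)
        det = a * d - b * c
        # Solve the 2x2 linear system J * delta = -F by Cramer's rule.
        dx = (-f1 * d + f2 * b) / det
        dy = (-f2 * a + f1 * c) / det
        z = (z[0] + dx, z[1] + dy)
        history.append(z)
    return history


# Root with x > y > 0, known in closed form for this toy system.
root = ((math.sqrt(6.0) + math.sqrt(2.0)) / 2.0,
        (math.sqrt(6.0) - math.sqrt(2.0)) / 2.0)
history = newton_2d(F, J, (2.0, 0.5), steps=5)
errors = [math.hypot(z[0] - root[0], z[1] - root[1]) for z in history]
for n, e in enumerate(errors):
    print(n, e)
```

With a starting point close to the root, each error is bounded by a fixed multiple of the square of the previous one until machine precision is reached, which is the finite-dimensional counterpart of the local quadratic convergence discussed here.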

This paper primarily focuses on analyzing the convergence of the Newton method for the MFG system at the level of PDEs. In the MFG literature, this iterative method has been applied to solve the nonlinear finite dimensional system which results from a finite difference approximation of Eq. (1.1), see [4, 7]. The algorithm is usually coupled with a continuation method (typically with respect to the viscosity parameter). Indeed, it is important to have a good initial guess of the solution and, for that, it is possible to take advantage of the continuation method by choosing the initial guess as the solution obtained with the previous value of the parameter (see [28]). Alternatively, the Newton method may be selectively employed to tackle the Hamilton-Jacobi equation at each iteration while using a fixed point iteration for the MFG system (as in, e.g., [23, 31]). Another approach involves employing a nonlinear discretized system, as presented in [1, 2], followed by the application of automatic differentiation for Newton’s iteration. Within this context, a significant challenge involves establishing a priori estimates for the finite difference scheme, ensuring the stability of the region of attraction of the method with respect to the discretization parameters. Our convergence result can be interpreted as an intermediate step in the proof of the convergence of the Newton method for finite dimensional approximations of the MFG system. However, it is important to note that the convergence analysis presented in this work does not readily extend to the discretized system, and addressing this is left for future work.

The paper is organized as follows. In Sect. 2, we introduce some notations and preliminary results. In Sect. 3, we discuss the convergence of the Newton method for a non-separable Hamiltonian and local coupling, while Sect. 4 treats the case of a separable Hamiltonian and nonlocal coupling. Finally, Appendix A is devoted to the proof of some basic results necessary for the rest of the paper.

2 Preliminaries

In this section, we introduce the assumptions on the Hamiltonian, prove the well posedness of (1.1) and establish some preliminary results necessary for the estimate of the rate in the next section. Throughout the paper, we work in the \(d\)-dimensional torus \({\mathbb {T}}^d\) (i.e., periodic boundary conditions). The set \({\mathcal {P}}({\mathbb {T}}^d)\) of Borel probability measures on \({\mathbb {T}}^d\) is endowed with the Monge-Kantorovich (Wasserstein) distance: for \(m,m' \in {\mathcal {P}}({\mathbb {T}}^d)\), \(\mathbf{{d}}_1(m,m')=\sup _\phi \int _{{\mathbb {T}}^d}\phi (x)\textrm{d}(m-m')(x)\) where the supremum is taken over all Lipschitz maps \(\phi : {\mathbb {T}}^d\rightarrow {\mathbb {R}}\) with a Lipschitz constant bounded by 1. In particular, we have that \(\mathbf{{d}}_1(m,m')\le \Vert m-m'\Vert _{{\mathcal {C}}^0({\mathbb {T}}^d)}\) if \(m,m'\in {\mathcal {P}}({\mathbb {T}}^d)\cap {\mathcal {C}}^0({\mathbb {T}}^d)\). Given a map \(f:{\mathbb {T}}^d\times {\mathcal {P}}({\mathbb {T}}^d)\rightarrow {\mathbb {R}}\), we will use the notation \(\frac{\delta f}{\delta m}\) for the derivative of f w.r.t. m, as introduced in [16, Section 2.2]: \(\frac{\delta f}{\delta m}:{\mathbb {T}}^d\times {\mathcal {P}}({\mathbb {T}}^d)\times {\mathbb {T}}^d\rightarrow {\mathbb {R}}\) is a continuous map such that

$$\begin{aligned} f[m'](x)-f[m](x)= \int _0^1\int _{{\mathbb {T}}^d}\frac{\delta f}{\delta m}[(1-s)m+sm'](x)(y)d (m'-m)(y) ds. \end{aligned}$$

The above relation defines the map \(\frac{\delta f}{\delta m}\) only up to a constant. We always use the normalization

$$\begin{aligned} \int _{{\mathbb {T}}^d} \frac{\delta f}{\delta m}[m](x)(y)\textrm{d}m(y)=0. \end{aligned}$$

Higher-order derivatives are defined similarly.
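For instance, assume (purely for illustration, this example is not taken from [16]) that \(f[m](x)=\int _{{\mathbb {T}}^d}\phi (x,y)\,\textrm{d}m(y)\) for a continuous kernel \(\phi :{\mathbb {T}}^d\times {\mathbb {T}}^d\rightarrow {\mathbb {R}}\). Since \(f[m'](x)-f[m](x)=\int _{{\mathbb {T}}^d}\phi (x,y)\,\textrm{d}(m'-m)(y)\) and \(m'-m\) has zero total mass, the defining relation holds for \(\phi (x,y)\) plus any function that does not depend on y, and the normalization selects

$$\begin{aligned} \frac{\delta f}{\delta m}[m](x)(y)=\phi (x,y)-\int _{{\mathbb {T}}^d}\phi (x,z)\,\textrm{d}m(z). \end{aligned}$$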

The set \({\mathcal {C}}^{1,0}(Q)\) with the norm

$$\begin{aligned} \Vert u\Vert _{{\mathcal {C}}^{1,0}(Q)}= \Vert u\Vert _{{\mathcal {C}}^{0}(Q)}+ \Vert D u\Vert _{{\mathcal {C}}^{0}(Q;{\mathbb {R}}^d)} \end{aligned}$$

is the space of continuous functions on Q with continuous derivatives in the \(x\)-variable, up to the parabolic boundary. We also recall the definition of parabolic Hölder spaces on the torus (we refer to [26] for a more comprehensive discussion). For \(0<\alpha <1\), we denote

$$\begin{aligned}{}[u]_{C^{\alpha ,\frac{\alpha }{2}}(Q)}:=\sup _{(x_1,t_1),(x_2,t_2)\in Q}\frac{\vert u(x_1,t_1)-u(x_2,t_2)\vert }{(\textrm{d}(x_1,x_2)^2+\vert t_1-t_2\vert )^{\frac{\alpha }{2}}}, \end{aligned}$$
(2.1)

where \(\textrm{d}(x,y)\) stands for the geodesic distance from x to y in \({{\mathbb {T}}^d}\). The parabolic Hölder space \(C^{\alpha ,\frac{\alpha }{2}}(Q)\) is the space of functions \(u\in L^\infty (Q)\) for which \([u]_{C^{\alpha ,\frac{\alpha }{2}}(Q)}<\infty \). It is endowed with the norm:

$$\begin{aligned} \Vert u\Vert _{{\mathcal {C}}^{\alpha ,\frac{\alpha }{2}}(Q)}:=\Vert u\Vert _{{\mathcal {C}}^{0}(Q)}+[u]_{C^{\alpha ,\frac{\alpha }{2}}(Q)}. \end{aligned}$$
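The seminorm (2.1) can be approximated from below by restricting the supremum to a finite grid. The following Python sketch does this for \(d=1\), assuming \(T=1\) and a uniform grid; the helper names `torus_dist` and `holder_seminorm` and the test function \(u(x,t)=t\) are hypothetical and serve only as an illustration.

```python
def torus_dist(x, y):
    """Geodesic distance between x and y on the one-dimensional unit torus."""
    r = abs(x - y) % 1.0
    return min(r, 1.0 - r)


def holder_seminorm(u, xs, ts, alpha):
    """Discrete analogue of (2.1): supremum over pairs of distinct grid points.

    This is a lower bound for the true seminorm, since the supremum is
    taken over grid points only.
    """
    pts = [(x, t) for x in xs for t in ts]
    best = 0.0
    for i, (x1, t1) in enumerate(pts):
        for (x2, t2) in pts[i + 1:]:
            denom = (torus_dist(x1, x2) ** 2 + abs(t1 - t2)) ** (alpha / 2.0)
            if denom > 0.0:
                best = max(best, abs(u(x1, t1) - u(x2, t2)) / denom)
    return best


xs = [i / 16.0 for i in range(16)]  # grid on the torus, identified with [0, 1)
ts = [j / 16.0 for j in range(17)]  # grid on [0, T] with T = 1 (assumption)
value = holder_seminorm(lambda x, t: t, xs, ts, alpha=0.5)
print(value)
```

For \(u(x,t)=t\) the quotient in (2.1) is bounded by \(\vert t_1-t_2\vert ^{1-\alpha /2}\), so the supremum is attained at \(x_1=x_2\) and \(\vert t_1-t_2\vert =T\).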

The spaces \({\mathcal {C}}^{1+\alpha ,\frac{1+\alpha }{2}}(Q)\) and \({\mathcal {C}}^{2+\alpha ,1+\alpha /2}(Q)\) are endowed with the norms

$$\begin{aligned} \Vert u\Vert _{{\mathcal {C}}^{1+\alpha ,\frac{1+\alpha }{2}}(Q)}:= & {} \Vert u\Vert _{{\mathcal {C}}^{0}(Q)}+\sum _{i=1}^d\Vert \partial _{x_i}u\Vert _{C^{\alpha ,\frac{\alpha }{2}}(Q)}\nonumber \\{} & {} \quad +\sup _{(x_1,t_1),(x_2,t_2)\in Q}\frac{\vert u(x_1,t_1)-u(x_2,t_2)\vert }{\vert t_1-t_2\vert ^{\frac{1+\alpha }{2}}},\end{aligned}$$
(2.2)
$$\begin{aligned}{} & {} \Vert u\Vert _{{\mathcal {C}}^{2+\alpha ,1+\alpha /2}(Q)}:=\Vert u\Vert _{{\mathcal {C}}^0(Q)}+\sum _{i=1}^d\Vert \frac{\partial u}{\partial x_i}\Vert _{{\mathcal {C}}^{1+\alpha ,\frac{1+\alpha }{2}}(Q)}+\Vert \frac{\partial u}{\partial t}\Vert _{{\mathcal {C}}^{\alpha ,\alpha /2}(Q)}.\nonumber \\ \end{aligned}$$
(2.3)

We now introduce some useful anisotropic Sobolev spaces to handle time-dependent problems. First, given a Banach space X, \(L^p(0,T;X)\) denotes the usual vector-valued Lebesgue space for \(p\in [1,\infty ]\). For any \(r\ge 1\), we denote by \(W^{2,1}_r(Q)\) the space of functions u such that \(\partial _t^{{\delta }}D^{\sigma }_x u\in L^r(Q)\) for all multi-indices \(\sigma \) and \({\delta }\) such that \(\vert \sigma \vert +2{\delta }\le 2\), endowed with the norm

$$\begin{aligned} \Vert u\Vert _{W^{2,1}_r(Q)}=\Big (\int _{Q}\sum _{\vert \sigma \vert +2{\delta }\le 2}\vert \partial _t^{{\delta }}D^{\sigma }_x u\vert ^rdxdt\Big )^{\frac{1}{r}}. \end{aligned}$$

We recall that, by classical results in interpolation theory, the sharp space of initial (or terminal) trace of \(W^{2,1}_r(Q)\) is given by the fractional Sobolev class \(W^{2-\frac{2}{r}}_r({{\mathbb {T}}^d})\). We define \(W^{1,0}_r(Q)\) as the space of functions on Q such that the norm

$$\begin{aligned} \Vert {u} \Vert _{W^{1,0}_r(Q)}:=\Vert u\Vert _{L^r(Q)}+ \sum _{i=1}^d\Vert \frac{\partial u}{\partial x_i}\Vert _{L^r(Q)} \end{aligned}$$

is finite and we denote with \({\mathcal {H}}_r^{1}(Q)\) the space of functions \(u\in W^{1,0}_r(Q)\) with \(\partial _t u\in (W^{1,0}_{r'}(Q))'\), equipped with the natural norm

$$\begin{aligned} \Vert u\Vert _{{\mathcal {H}}_r^{1}(Q)}:=\Vert u\Vert _{W^{1,0}_r(Q)} +\Vert {\partial _tu} \Vert _{(W^{1,0}_{r'}(Q))'}. \end{aligned}$$

From [32, Theorem A.3 (iii)] and [21, Proposition 2.1 (iii)], for \(r>d+2\) the space \({\mathcal {H}}^1_r(Q)\) is continuously embedded in \({\mathcal {C}}^{\alpha ,\alpha /2}(Q)\), for some \(\alpha \in (0,1)\).

We consider the following set of assumptions for the non-separable case with local coupling, while specific assumptions in the case of a nonlocal coupling will be discussed in Sect. 4. The notation \(|\cdot |\) both refers to the modulus of a vector and to the norm of a matrix in the appropriate space.

  1. (A1)

    \(m_0\in {\mathcal {P}}({\mathbb {T}}^d) \cap {\mathcal {C}}^{2+\alpha }({\mathbb {T}}^d)\) and \(m_0(x)\ge \vartheta >0\), \(u_T\in {\mathcal {C}}^{2+\alpha }({\mathbb {T}}^d)\).

  2. (A2)

    \(H\in {\mathcal {C}}^4({\mathbb {T}}^d\times {\mathbb {R}}^+ \times {\mathbb {R}}^d)\) and for all \(x\in {\mathbb {T}}^d\), \(m\in {\mathbb {R}}^+\), \(p\in {\mathbb {R}}^d\) and some \({\bar{C}}>0\):

    $$\begin{aligned} \begin{aligned}&|H_{px}(x,m,p)|\le {\bar{C}}(|p| + 1), \,\,\,|H_{xx}(x,m,p)|+|H_{xxm}(x,m,p)| \le {\bar{C}} (|p|^2 + 1),\\&\quad |H_{mp}(x,m,p)|\le {\bar{C}}\vert p\vert ,\,\,|H_{mm}(x,m,p)|\le {\bar{C}}\vert p\vert ^2,\\&\quad |H_{pp}(x,m,p) | + |H_{ppm}(x,m,p) |\le {\bar{C}},\\&\quad |H_{ppp}(x,m,p)|+|H_{pppm}(x,m,p)| \le {\bar{C}}. \end{aligned} \end{aligned}$$
    (2.4)
    $$\begin{aligned} \begin{pmatrix} -H_m(x,m,p) &{} \dfrac{m}{2}H_{pm}(x,m,p)^T\\ \dfrac{m}{2}H_{pm}(x,m,p) &{} mH_{pp}(x,m,p) \end{pmatrix}>0,\quad \forall m>0.\nonumber \\ \end{aligned}$$
    (2.5)
  3. (A3)

    \(f(\cdot )\), \(f'(\cdot )\) and \(f''(\cdot )\) are uniformly bounded mappings from \({\mathbb {R}}^+\) to \({\mathbb {R}}\). Moreover, \(f'(\cdot )\ge 0\).

Some remarks about these assumptions are in order.

Remark 2.1

Equation (2.5), first proposed by P. L. Lions in [30] and then exploited in [5, 6], is a uniqueness condition for MFG systems with non-separable Hamiltonian. In particular, it implies that H is convex with respect to p and nonincreasing with respect to m and, when H has a separable form \(H = ~H( x, p)-{\overline{f}}(m)\), it reduces to \(H_{pp}> 0\) and \({\overline{f}} '>0\). Besides uniqueness, we use this condition to prove the estimate in Lemma 2.6, which is crucial for the rate of convergence.

Remark 2.2

A typical example of Hamiltonian which satisfies (A2) is

$$\begin{aligned} H(x,m,p)=\frac{h(x)|p|^2}{(1+m)^\alpha }, \end{aligned}$$
(2.6)

where \(0<\alpha \le 2\), \(h(x)\in {\mathcal {C}}^2({\mathbb {T}}^d)\) and \(h(x)>0\) for all \(x\in {\mathbb {T}}^d\). Existence and uniqueness of weak solutions to MFGs with such Hamiltonians, under some additional assumptions, can be obtained from results in [5, 24]. In Proposition 2.4, under the stronger assumptions (A1)–(A3), we prove existence and uniqueness of a classical solution to Eq. (1.1).
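The role of the restriction \(\alpha \le 2\) in condition (2.5) can be checked numerically. The following Python sketch evaluates the matrix in (2.5) for the Hamiltonian (2.6) in dimension \(d=1\) with constant \(h\), using the derivatives \(H_m=-\alpha h p^2(1+m)^{-\alpha -1}\), \(H_{pm}=-2\alpha h p(1+m)^{-\alpha -1}\) and \(H_{pp}=2h(1+m)^{-\alpha }\). The sample grid, the constant \(h=1\) and the function names are assumptions made for illustration. The sample excludes \(p=0\), where the upper-left entry \(-H_m\) vanishes for this Hamiltonian.

```python
def congestion_matrix(h, alpha, m, p):
    """Entries (a, b, c) of the symmetric matrix [[a, b], [b, c]] in (2.5), d = 1."""
    H_m = -alpha * h * p**2 / (1.0 + m) ** (alpha + 1)
    H_pm = -2.0 * alpha * h * p / (1.0 + m) ** (alpha + 1)
    H_pp = 2.0 * h / (1.0 + m) ** alpha
    return -H_m, 0.5 * m * H_pm, m * H_pp


def is_positive_definite(a, b, c):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return a > 0.0 and a * c - b * b > 0.0


def check(alpha, h=1.0):
    """Test (2.5) on a sample of densities m in (0, 10] and momenta p != 0."""
    ms = [0.1 * k for k in range(1, 101)]
    ps = [0.5 * k for k in range(-6, 7) if k != 0]
    return all(
        is_positive_definite(*congestion_matrix(h, alpha, m, p))
        for m in ms
        for p in ps
    )


for alpha in (1.0, 2.0, 3.0):
    print(alpha, check(alpha))
```

In this one-dimensional setting the determinant equals \(\alpha h^2p^2m\,(1+m)^{-2\alpha -2}\big (2(1+m)-\alpha m\big )\), which is positive for every \(m>0\) and \(p\ne 0\) exactly when \(\alpha \le 2\); for \(\alpha =3\) the check fails as soon as \(m>2\).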

Remark 2.3

An example of f which satisfies (A3) is the sigmoid function

$$\begin{aligned} f(m)=\frac{1}{1+e^{-m}}. \end{aligned}$$

In fact, the uniform boundedness of f is included in (A3) only to obtain a relatively simple proof of existence of a solution in the non-separable case, see Proposition 2.4, but one can obtain small time existence and uniqueness results with fewer restrictions on H and f, see [19]. If we assume a priori that Eq. (1.1) has a classical solution, independently of assuming (A1)–(A3), the key assumption for proving the convergence of the Newton method is the uniform boundedness of \(f''(\cdot )\), see also Remark 3.3. In this case, we can also include examples such as

  • \(f(m)=m\).

  • \(f(m)=(1+m)^\alpha \), \(0<\alpha <2\).

Therefore, in some practical applications, we can apply the Newton method without requiring all the assumptions in this paper to be satisfied. In any case, restrictions on the uniform boundedness of \(H_{ppp}(x,m,p)\) and \(f''(m)\) are very typical for Newton iterations, cf. S. Boyd [10, Section 9.5.3]. Some possible generalizations to the coupling \(f(m)=m^\alpha \), \(\alpha \ge 2\), will be discussed later in the paper, see Remark 3.4. It is also possible to include, under appropriate assumptions, a dependence of f on t, but for simplicity we omit it.
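As a sanity check on the sigmoid example of this remark, the following Python sketch samples \(f\), \(f'\) and \(f''\) on a grid in \({\mathbb {R}}^+\), using the identities \(f'=f(1-f)\) and \(f''=f(1-f)(1-2f)\). The grid and the helper names `df`, `d2f` are illustrative assumptions; the sampled values only corroborate, and do not prove, the bounds \(0\le f'\le 1/4\) and \(\vert f''\vert \le 1/(6\sqrt{3})\) required by (A3).

```python
import math


def f(m):
    """Sigmoid coupling from Remark 2.3."""
    return 1.0 / (1.0 + math.exp(-m))


def df(m):
    """First derivative, written as f * (1 - f)."""
    s = f(m)
    return s * (1.0 - s)


def d2f(m):
    """Second derivative, written as f * (1 - f) * (1 - 2 f)."""
    s = f(m)
    return s * (1.0 - s) * (1.0 - 2.0 * s)


grid = [0.01 * k for k in range(5001)]  # sample of m in [0, 50]
sup_f = max(abs(f(m)) for m in grid)
min_df = min(df(m) for m in grid)
sup_df = max(df(m) for m in grid)
sup_d2f = max(abs(d2f(m)) for m in grid)
print(sup_f, min_df, sup_df, sup_d2f)
```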

We will consider classical solutions to the MFG system. Recall that a classical solution of (1.1) is a couple \((u,m)\) such that u and m belong to \({\mathcal {C}}^{2, \alpha }(Q)\) for some \(\alpha \in (0,1)\) and satisfy the problem in the pointwise sense. For the proof of the next result, see the Appendix.

Proposition 2.4

Under assumptions (A1), (A2) and (A3), the system (1.1) has a unique classical solution.

The next two lemmas are devoted to proving an estimate for a perturbation of the linearized MFG system. This result is the main ingredient in our analysis of the convergence of the Newton algorithm.

Lemma 2.5

Assume (A1), (A2) and (A3) and let \((u,m)\) be the solution of Eq. (1.1). Then, the unique weak solution \((v,\rho )\) of the system

$$\begin{aligned} \left\{ \begin{aligned} (i) \qquad&-\partial _t v -\Delta v+H_p(x,m,Du)Dv+H_m(x,m,Du)\rho =f'(m)\rho \qquad{} & {} \textrm{in}\,\,Q,\\ (ii) \qquad&\partial _t \rho - \Delta \rho - \mathrm{{div}} \big ( \rho H_p(x,m,Du)\big )-\mathrm{{div}} \big ( m\rho H_{pm}(x,m,Du)\big ) \\&=\mathrm{{div}}\big (mH_{pp}(x,m,Du)Dv\big ){} & {} \textrm{in}\,\,Q, \\&\rho (x,0)=0, \; v(x,T)=0{} & {} \textrm{in}\,\,{\mathbb {T}}^d \end{aligned}\right. \nonumber \\ \end{aligned}$$
(2.7)

is the trivial solution \((v,\rho )=(0,0)\).

Proof

Multiply by \(\rho \) on both sides of (i), integrate on Q and exploit (ii) to get

$$\begin{aligned} \begin{aligned}&\int _Qf'(m)|\rho |^2dxdt\\&\quad ={}\int _QH_m(x,m,Du)|\rho |^2dxdt+\int _Qv\mathrm{{div}}\big (mH_{pp}(x,m,Du)Dv\big )dxdt\\&\qquad +\int _Qv\mathrm{{div}}\big (m\rho H_{pm}(x,m,Du)\big )dxdt\\&\quad ={}\int _QH_m(x,m,Du)|\rho |^2dxdt-\int _QmH_{pp}(x,m,Du)Dv\cdot Dvdxdt\\&\qquad -\int _QmH_{pm}(x,m,Du)\rho Dvdxdt. \end{aligned} \end{aligned}$$

It follows from (A1) and the parabolic maximum principle (cf. [34]) that \(m>0\). Hence with (A3), we obtain \(f'(m)\ge 0\) and

$$\begin{aligned} \int _Qf'(m)|\rho |^2dxdt\ge 0. \end{aligned}$$

Hence, from Eq. (2.5), we get that

$$\begin{aligned} \begin{pmatrix} \rho&Dv \end{pmatrix} \begin{pmatrix} -H_m(x,m,Du) &{} mH_{pm}(x,m,Du)/2\\ mH_{pm}(x,m,Du)/2 &{} mH_{pp}(x,m,Du) \end{pmatrix} \begin{pmatrix} \rho \\ Dv \end{pmatrix} = 0, \end{aligned}$$

otherwise we obtain a contradiction. Therefore \((\rho ,Dv)\equiv (0,0)\). From \(v(x,T)=0\), it also follows that \(v=0\). \(\square \)

The estimate in the next lemma is similar to [12, Lemma 5.2], with the key differences that we consider a non-separable Hamiltonian and local couplings.

Lemma 2.6

Assume (A1), (A2) and (A3) and let \((u,m)\) be the classical solution of Eq. (1.1). Given \(a\in {\mathcal {C}}^0(Q)\) and a vector field \(b\in {\mathcal {C}}^0(Q;{\mathbb {R}}^d)\), let \((v,\rho )\) be a classical solution of the perturbed linear system

$$\begin{aligned} \left\{ \begin{aligned} (i)\qquad&-\partial _t v -\Delta v+H_p(x,m,Du)Dv+H_m(x,m,Du)\rho =f'(m)\rho +a(x,t) \qquad{} & {} \textrm{in}\,\,Q,\\ (ii)\qquad&\partial _t \rho - \Delta \rho - \mathrm{{div}} \big ( \rho H_p(x,m,Du)\big ) -\mathrm{{div}} \big ( m\rho H_{pm}(x,m,Du)\big ) \\ ={}&\mathrm{{div}}\big (mH_{pp}(x,m,Du)Dv\big )+\mathrm{{div}}(b(x,t)){} & {} \textrm{in}\,\,Q, \\&\rho (x,0)=0, \; v(x,T)=0{} & {} \textrm{in}\,\,{\mathbb {T}}^d. \end{aligned}\right. \end{aligned}$$
(2.8)

Then, there exists a constant \(C>0\), depending only on the coefficients of the problem, such that

$$\begin{aligned} \Vert v\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho \Vert _{{\mathcal {C}}^{0}}\le C\Big (\Vert a\Vert _{{\mathcal {C}}^{0}}+\Vert b\Vert _{{\mathcal {C}}^{0}}\Big ). \end{aligned}$$
(2.9)

Proof

First observe that, since the system (2.8) is linear, \((v,\rho )/(\Vert a\Vert _{{\mathcal {C}}^{0}}+\Vert b\Vert _{{\mathcal {C}}^{0}})\) is the solution of the problem corresponding to the perturbation \((a,b)/(\Vert a\Vert _{{\mathcal {C}}^{0}}+\Vert b\Vert _{{\mathcal {C}}^{0}})\). Hence, proving (2.9) is equivalent to showing that \(\Vert a\Vert _{{\mathcal {C}}^{0}}+\Vert b\Vert _{{\mathcal {C}}^{0}}\le 1\) implies \(\Vert v\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho \Vert _{{\mathcal {C}}^{0}}\le C\) for some \(C>0\). We argue by contradiction and suppose that the estimate is not true. Hence, we assume that there exist \(a^k\), \(b^k\) and \((v^k,\rho ^k)\) with

$$\begin{aligned} \Vert a^k\Vert _{{\mathcal {C}}^{0}}+\Vert b^k\Vert _{{\mathcal {C}}^{0}}\le 1,\,\,\theta ^k:=\Vert v^k\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^k\Vert _{{\mathcal {C}}^{0}}\ge k. \end{aligned}$$
(2.10)

Set

$$\begin{aligned} {\tilde{v}}^k:=\frac{v^k}{\theta ^k},\,\,\,{\tilde{\rho }}^k:=\frac{\rho ^k}{\theta ^k}. \end{aligned}$$

By definition, \(\Vert {\tilde{v}}^k\Vert _{{\mathcal {C}}^{1,0}}+\Vert {\tilde{\rho }}^k\Vert _{{\mathcal {C}}^{0}}=1\) for all k and the pair \(({\tilde{v}}^k,{\tilde{\rho }}^k)\) solves:

$$\begin{aligned} \left\{ \begin{aligned} (i) \qquad&-\partial _t {\tilde{v}}^k -\Delta {\tilde{v}}^k+H_p(x,m,Du)D{\tilde{v}}^k+H_m(x,m,Du){\tilde{\rho }}^k=f'(m){\tilde{\rho }}^k+\frac{a^k(x,t)}{\theta ^k} \qquad{} & {} \textrm{in}\,\,Q,\\ (ii) \qquad&\partial _t {\tilde{\rho }}^k- \Delta {\tilde{\rho }}^k - \mathrm{{div}} \big ( {\tilde{\rho }}^k H_p(x,m,Du)\big ) -\mathrm{{div}} \big ( m{\tilde{\rho }}^k H_{pm}(x,m,Du)\big ) \\ ={}&\mathrm{{div}}\big (mH_{pp}(x,m,Du)D{\tilde{v}}^k\big )+\mathrm{{div}}\big (\frac{b^k(x,t)}{\theta ^k}\big ){} & {} \textrm{in}\,\,Q, \\&{\tilde{\rho }}^k(x,0)=0, \; {\tilde{v}}^k(x,T)=0{} & {} \textrm{in}\,\,{\mathbb {T}}^d. \end{aligned}\right. \nonumber \\ \end{aligned}$$
(2.11)

Observe that \({\tilde{v}}^k\) is a solution of a linear parabolic equation with bounded coefficients. Hence, \({\tilde{v}}^k\) and \(D{\tilde{v}}^k\) are bounded in \({\mathcal {C}}^{\alpha ,\alpha /2}\) for some \(\alpha \in (0,1)\). Similarly, \({\tilde{\rho }}^k\), solution of a linear equation in divergence form, is bounded in \({\mathcal {C}}^{\beta ,\beta /2}\) for some \(\beta \in (0,1)\). By taking subsequences we obtain a cluster point \((v,\rho )\) of \(({\tilde{v}}^k,{\tilde{\rho }}^k)\) such that

$$\begin{aligned} \Vert v\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho \Vert _{{\mathcal {C}}^{0}}=\lim _{k\rightarrow +\infty }(\Vert {\tilde{v}}^k\Vert _{{\mathcal {C}}^{1,0}}+\Vert {\tilde{\rho }}^k\Vert _{{\mathcal {C}}^{0}})=1. \end{aligned}$$
(2.12)

By Eq. (2.10), we know \( a^k(x,t)/\theta ^k\) and \({ \textrm{div}} ( b^k(x,t)/\theta ^k )\) actually vanish for \(k\rightarrow \infty \) and therefore \((v,\rho )\) is a solution of (2.7). Hence by Lemma 2.5, we have \((v,\rho )=(0,0)\), a contradiction to Eq. (2.12). \(\square \)

3 The Newton Method for the Mean Field Games System with Non-separable Hamiltonian and Local Coupling

In this section, we give an estimate for the rate of convergence of the Newton method to the MFG system in the case of a non-separable Hamiltonian and local coupling. We first prove the well posedness of the system (1.4) for each n.

Proposition 3.1

For any \(n\in {\mathbb {N}}\), there exists a unique solution \((u^{n},m^{n})\in {\mathcal {C}}^{2+\alpha ,1+\alpha /2}(Q)\times {\mathcal {C}}^{2+\alpha ,1+\alpha /2}(Q)\) to the system (1.4).

Proof

Assume to have proved the statement at step \(n-1\). Hence, given \((u^{n-1},m^{n-1})\in {\mathcal {C}}^{2+\alpha ,1+\alpha /2}(Q)\times {\mathcal {C}}^{2+\alpha ,1+\alpha /2}(Q)\), Eq. (1.4) is a strongly coupled linear system for \((u^{n},m^{n})\).

We first prove existence of a weak solution \((u^n,m^n)\in W^{2,1}_r(Q)\times {\mathcal {H}}^1_r(Q)\), \(r>d+2\) by means of a fixed point argument. Define \({\textbf{X}}:=\{\varrho \in {\mathcal {C}}^0(Q): \varrho \ge 0, \varrho (x,0)=m_0(x), \int _{{\mathbb {T}}^d}\varrho (x,t)\textrm{d}x=1\}\) and consider the compact mapping \({\hat{\varrho }}={\textbf{T}}(\varrho ): {\mathcal {C}}^0(Q)\rightarrow {\mathcal {C}}^{\alpha ,\alpha /2}(Q)\) defined by solving in sequence

$$\begin{aligned} \left\{ \begin{aligned} (i) \qquad&-\partial _t {\hat{u}} -\Delta {\hat{u}}+H_p(x,m^{n-1},Du^{n-1})D{\hat{u}}+H_m(x,m^{n-1},Du^{n-1}) (\varrho - m^{n-1})\\ ={}&H_p(x,m^{n-1},Du^{n-1})Du^{n-1}-H(x,m^{n-1},Du^{n-1})+f(m^{n-1})+f'(m^{n-1})(\varrho -m^{n-1}) \qquad{} & {} \textrm{in}\,\,Q,\\&{\hat{u}}(x,T)=u_T(x){} & {} \textrm{in}\,\,{\mathbb {T}}^d,\\ (ii) \qquad&\partial _t {\hat{\varrho }}- \Delta {\hat{\varrho }} - \text {div} \big ( \varrho H_p(x,m^{n-1},Du^{n-1})\big )-\text {div} \big (m^{n-1} H_{pm}(x,m^{n-1},Du^{n-1})\varrho \big )\\ ={}&\mathrm{{div}}\big (m^{n-1}H_{pp}(x,m^{n-1},Du^{n-1})(D{\hat{u}}-Du^{n-1})\big )-\text {div} \big ((m^{n-1})^2H_{pm}(x,m^{n-1},Du^{n-1})\big ){} & {} \textrm{in}\,\,Q, \\&{\hat{\varrho }}(x,0)=m_0(x){} & {} \textrm{in}\,\,{\mathbb {T}}^d. \end{aligned}\right. \nonumber \\ \end{aligned}$$
(3.1)

We rewrite equation (i) in Eq. (3.1) as

$$\begin{aligned} -\partial _t {\hat{u}} -\Delta {\hat{u}}+H_p(x,m^{n-1},Du^{n-1})D{\hat{u}}={\textsf{f}} \end{aligned}$$

with

$$\begin{aligned} \begin{aligned} {\textsf{f}}:={}&H_p(x,m^{n-1},Du^{n-1})Du^{n-1}-H(x,m^{n-1},Du^{n-1})+f(m^{n-1})\\&+f'(m^{n-1})(\varrho -m^{n-1})-H_m(x,m^{n-1},Du^{n-1}) (\varrho - m^{n-1}) \in L^\infty (Q). \end{aligned} \end{aligned}$$
(3.2)

By Proposition A.2, we obtain the existence of \({\hat{u}}\in W^{2,1}_r(Q)\) solving (i) in (3.1). By Sobolev embedding \(D{\hat{u}} \in {\mathcal {C}}^{\alpha ,\alpha /2}(Q;{\mathbb {R}}^d)\). Next we rewrite equation (ii) in Eq. (3.1) as

$$\begin{aligned} \partial _t {\hat{\varrho }}- \Delta {\hat{\varrho }} - \textrm{div}({\textsf{F}})=0 \end{aligned}$$

with

$$\begin{aligned} {\textsf{F}}:={}&\varrho H_p(x,m^{n-1},Du^{n-1})+m^{n-1} H_{pm}(x,m^{n-1},Du^{n-1})\varrho \\&+m^{n-1}H_{pp}(x,m^{n-1},Du^{n-1})(D{\hat{u}}-Du^{n-1}) -(m^{n-1})^2H_{pm}(x,m^{n-1},Du^{n-1}). \end{aligned}$$

From \(D{\hat{u}} \in {\mathcal {C}}^{\alpha ,\alpha /2}(Q;{\mathbb {R}}^d)\) and the assumptions on \((u^{n-1},m^{n-1})\), we have \({\textsf{F}}\in L^\infty (Q;{\mathbb {R}}^d)\) and, by Proposition A.3, there exists a solution \({\hat{\varrho }}\in {\mathcal {H}}^1_r(Q)\) to (ii) in (3.1), thus \({\hat{\varrho }}\in {\mathcal {C}}^{\alpha ,\alpha /2}(Q)\).

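To check the continuity of \({\textbf{T}}\), take \(\varrho _1,\varrho _2\in {\textbf{X}}\), let \(({\hat{u}}_i,{\hat{\varrho }}_i)\), \(i=1,2\), be the corresponding solutions of Eq. (3.1) and set \(\delta \varrho =\varrho _1-\varrho _2\), \(\delta {\hat{\varrho }}={\hat{\varrho }}_1-{\hat{\varrho }}_2\), \(\delta {\hat{u}}={\hat{u}}_1-{\hat{u}}_2\). Then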

$$\begin{aligned} \left\{ \begin{aligned} (i) \qquad&-\partial _t \delta {\hat{u}} -\Delta \delta {\hat{u}}+H_p(x,m^{n-1},Du^{n-1})D\delta {\hat{u}}+H_{m}(x,m^{n-1},Du^{n-1})\delta \varrho \\ ={}&f'(m^{n-1})\delta \varrho \qquad{} & {} \textrm{in}\,\,Q,\\ (ii) \qquad&\partial _t \delta {\hat{\varrho }}- \Delta \delta {\hat{\varrho }} - \text {div} \big ( \delta \varrho H_p(x,m^{n-1},Du^{n-1})\big )-\text {div} \big (m^{n-1} H_{pm}(x,m^{n-1},Du^{n-1})\delta \varrho \big )\\ ={}&\mathrm{{div}}\big (m^{n-1}H_{pp}(x,m^{n-1},Du^{n-1})D\delta {\hat{u}}\big ){} & {} \textrm{in}\,\,Q, \\&\delta {\hat{\varrho }}(x,0)=0, \; \delta {\hat{u}}(x,T)=0{} & {} \textrm{in}\,\,{\mathbb {T}}^d. \end{aligned}\right. \end{aligned}$$
(3.3)

We obtain by Proposition A.2 that

$$\begin{aligned} \Vert \delta {\hat{u}}\Vert _{W^{2,1}_r(Q)}\le C\Vert \delta \varrho \Vert _{L^\infty (Q)},\,\,\,\Vert D\delta {\hat{u}}\Vert _{L^\infty (Q;{\mathbb {R}}^d)}\le C\Vert \delta \varrho \Vert _{L^\infty (Q)}, \end{aligned}$$

then by Proposition A.3

$$\begin{aligned} \Vert \delta {\hat{\varrho }}\Vert _{{\mathcal {C}}^{\alpha ,\alpha /2}}\le C(\Vert \delta \varrho \Vert _{L^\infty (Q)}+\Vert D\delta {\hat{u}}\Vert _{L^\infty (Q;{\mathbb {R}}^d)})\le C\Vert \delta \varrho \Vert _{L^\infty (Q)}. \end{aligned}$$

It then follows that \({\textbf{T}}\) is a continuous map. We conclude, by Schauder fixed point theorem, the existence of a solution to (1.4). It follows that \(({\hat{u}},{\hat{\varrho }})\in W^{2,1}_r(Q)\times {\mathcal {C}}^{\alpha ,\alpha /2}(Q)\) is a fixed point defined by Eq. (3.1), with \({\textsf{f}}\) replaced by

$$\begin{aligned} \begin{aligned} \hat{{\textsf{f}}}:={}&H_p(x,m^{n-1},Du^{n-1})Du^{n-1}-H(x,m^{n-1},Du^{n-1})+f(m^{n-1})\\&+f'(m^{n-1})({\hat{\varrho }}-m^{n-1})-H_m(x,m^{n-1},Du^{n-1}) ({\hat{\varrho }}- m^{n-1}). \end{aligned} \end{aligned}$$

Since \(\hat{{\textsf{f}}}\in {\mathcal {C}}^{\alpha ,\alpha /2}(Q)\), from Proposition A.1 it follows that \({\hat{u}}\in {\mathcal {C}}^{2+\alpha ,1+\alpha /2}(Q)\). By assumption (A2), \({\hat{u}}\in {\mathcal {C}}^{2+\alpha ,1+\alpha /2}(Q)\) and \((u^{n-1},m^{n-1})\in {\mathcal {C}}^{2+\alpha ,1+\alpha /2}(Q)\times {\mathcal {C}}^{2+\alpha ,1+\alpha /2}(Q)\), we obtain \(\textrm{div}({\textsf{F}})\in {\mathcal {C}}^{\alpha ,\alpha /2}(Q)\). Using Proposition A.1 again, we obtain \({\hat{\varrho }}\in {\mathcal {C}}^{2+\alpha ,1+\alpha /2}(Q)\). We now prove uniqueness. Assume that there are two solutions \(({\hat{u}}_1,{\hat{\varrho }}_1)\) and \(({\hat{u}}_2,{\hat{\varrho }}_2)\) to Eq. (3.1) and set \(\delta {\hat{u}}={\hat{u}}_1-{\hat{u}}_2\), \(\delta {\hat{\varrho }}={\hat{\varrho }}_1-{\hat{\varrho }}_2\). Clearly \((\delta {\hat{u}},\delta {\hat{\varrho }})\) solves

$$\begin{aligned} \left\{ \begin{aligned} (i) \qquad&-\partial _t \delta {\hat{u}} -\Delta \delta {\hat{u}}+H_p(x,m^{n-1},Du^{n-1})D\delta {\hat{u}}+H_{m}(x,m^{n-1},Du^{n-1})\delta {\hat{\varrho }} \\ ={}&f'(m^{n-1})\delta {\hat{\varrho }} \qquad{} & {} \textrm{in}\,\,Q,\\ (ii) \qquad&\partial _t \delta {\hat{\varrho }}- \Delta \delta {\hat{\varrho }} - \text {div} \big ( \delta {\hat{\varrho }} H_p(x,m^{n-1},Du^{n-1})\big )-\text {div} \big (m^{n-1} H_{pm}(x,m^{n-1},Du^{n-1})\delta {\hat{\varrho }} \big )\\ ={}&\mathrm{{div}}\big (m^{n-1}H_{pp}(x,m^{n-1},Du^{n-1})D\delta {\hat{u}}\big ){} & {} \textrm{in}\,\,Q, \\&\delta {\hat{\varrho }}(x,0)=0, \; \delta {\hat{u}}(x,T)=0{} & {} \textrm{in}\,\,{\mathbb {T}}^d. \end{aligned}\right. \end{aligned}$$
(3.4)

Testing equation (i) with \(\delta {\hat{\varrho }}\), equation (ii) with \(\delta {\hat{u}}\) and subtracting the resulting identities, an easy computation gives

$$\begin{aligned} \begin{aligned}&\int _Qf'(m^{n-1})|\delta {\hat{\varrho }}|^2dxdt\\&\quad ={}\int _QH_m(x,m^{n-1},Du^{n-1})|\delta {\hat{\varrho }}|^2dxdt-\int _Qm^{n-1}H_{pp}(x,m^{n-1},Du^{n-1})D\delta {\hat{u}}\cdot D\delta {\hat{u}}dxdt\\&\qquad -\int _Qm^{n-1}H_{pm}(x,m^{n-1},Du^{n-1})\delta {\hat{\varrho }} D\delta {\hat{u}}dxdt\le 0. \end{aligned} \end{aligned}$$

By (A2), Eq. (2.5), and (A3) we get \((\delta {\hat{\varrho }},D\delta {\hat{u}})=(0,0)\). From \(\delta {\hat{u}}(x,T)=0\), it also follows that \(\delta {\hat{u}}=0\). \(\square \)

Proposition 3.1 is concerned with the invertibility of the linear operator \(\mathcal{L}\mathcal{F}\) at each step n, as defined in Eq. (1.2). Hence, it is not surprising that one needs some Hessian-type condition. The constant C in Proposition 3.1 may depend on the previous step \((u^{n-1},m^{n-1})\). In the discretized setting, invertibility of a linearized system similar to Eq. (3.4) has been discussed in [3, Section 4.1] for solving a mean field planning problem with separable Hamiltonian. We believe the ideas in [3, Section 4.1] can also be extended to MFGs with non-separable Hamiltonian. However, solvability at each iteration is not enough for the convergence of the Newton method, since it is also necessary to prove a priori estimates that are independent of the iteration index n. In the next result, we obtain a local quadratic rate of convergence.

Theorem 3.2

Let \((u,m)\) be the solution of system (1.1) and let \((u^n,m^n)\) be the sequence generated by Newton’s algorithm (1.4). Set \(v^n=u^n-u\), \(\rho ^n=m^n-m\). There exists a constant \(\eta >0\) such that if \(\Vert v^0\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^0\Vert _{{\mathcal {C}}^{0}}\le \eta \) then \(\Vert v^n\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^n\Vert _{{\mathcal {C}}^{0}}\rightarrow 0\) with a quadratic rate of convergence.
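As a finite-dimensional analogue of the statement (not the MFG system itself), the following sketch runs Newton's iteration on the scalar equation \(x^2-2=0\) and records the ratios \(e_{n+1}/e_n^2\) of successive errors. The test equation, the starting point and the function names are illustrative assumptions that do not appear in the paper; the only point is that a bounded ratio is what a quadratic rate of convergence means.

```python
import math

def newton_errors(x0, steps):
    """Newton iteration for the toy equation F(x) = x**2 - 2; returns |x_n - sqrt(2)|."""
    root = math.sqrt(2.0)
    x = x0
    errors = [abs(x - root)]
    for _ in range(steps):
        x = x - (x * x - 2.0) / (2.0 * x)  # x_{n+1} = x_n - F(x_n) / F'(x_n)
        errors.append(abs(x - root))
    return errors

def quadratic_ratios(errors):
    """Ratios e_{n+1} / e_n**2; they stay bounded when convergence is quadratic."""
    return [errors[i + 1] / errors[i] ** 2 for i in range(len(errors) - 1)]

# Start close to the root, as required by a local convergence result.
errors = newton_errors(1.5, 3)
ratios = quadratic_ratios(errors)
```

Only three steps are taken because a fourth would push the error below double-precision resolution, where the ratio is no longer meaningful.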

Proof

We emphasize that from here on and for the rest of the proof, C denotes a generic constant which may increase from line to line. This constant may depend on the data of the problem and on the solution \((u,m)\), but it is always independent of n. We observe that \(v^n\) solves the equation

$$\begin{aligned} \begin{aligned}&-\partial _t v^n -\Delta v^n\\&={}f(m^{n-1})-f(m)+f'(m^{n-1})(m^n-m^{n-1})+H(x,m,Du)-H(x,m^{n-1},Du^{n-1})\\&\quad -H_p(x,m^{n-1},Du^{n-1})D(u^{n}- u^{n-1})-H_m(x,m^{n-1},Du^{n-1}) (m^{n}- m^{n-1}), \end{aligned} \end{aligned}$$

which can be rewritten as

$$\begin{aligned} -\partial _t v^n -\Delta v^n+H_p(x,m,Du)\cdot Dv^n+H_m(x,m,Du)\rho ^n=f'(m)\rho ^n+a, \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} a:={}&H_p(x,m,Du)(Du^n-Du)+H_m(x,m,Du)(m^n-m)+H(x,m,Du)\\&-H(x,m^{n-1},Du^{n-1})\\&-H_p(x,m^{n-1},Du^{n-1})D(u^{n}- u^{n-1})-H_m(x,m^{n-1},Du^{n-1}) (m^{n}- m^{n-1})\\&-f'(m)(m^n-m)+f(m^{n-1})-f(m)+f'(m^{n-1})(m^n-m^{n-1}). \end{aligned} \end{aligned}$$
(3.5)

In order to apply Lemma 2.6, we need to estimate \(\Vert a\Vert _{{\mathcal {C}}^0}\). We rewrite the terms involving H in Eq. (3.5) as

$$\begin{aligned} \begin{aligned}&H_p(x,m,Du)(Du^n-Du)+H_m(x,m,Du)(m^n-m)+H(x,m,Du)\\&-H(x,m^{n-1},Du^{n-1})\\&-H_p(x,m^{n-1},Du^{n-1})D(u^{n}- u^{n-1})-H_m(x,m^{n-1},Du^{n-1}) (m^{n}- m^{n-1})\\ ={}&H(x,m,Du)-H(x,m^{n-1},Du^{n-1})-H_p(x,m^{n-1},Du^{n-1})D(u- u^{n-1})\\&-H_m(x,m^{n-1},Du^{n-1}) (m- m^{n-1})+\Big (H_p(x,m,Du)\\&-H_p(x,m^{n-1},Du^{n-1})\Big )(Du^n-Du)\\&+\Big (H_m(x,m,Du)-H_m(x,m^{n-1},Du^{n-1})\Big ) (m^n-m). \end{aligned} \end{aligned}$$
(3.6)

We now estimate the terms on the right hand side of the previous identity. It is clear from (A2) that for any \(\tau \in (0,1)\),

$$\begin{aligned} \begin{aligned} \sup _\tau&\big \vert H_{mm}(x,m+\tau (m^{n-1}-m),Du+\tau (Du^{n-1}-Du))\big \vert \\&\le C|Du+\tau (Du^{n-1}-Du)|^2 \le C\big (\vert Du\vert +\vert Du^{n-1}-Du\vert \big )^2\\&\le C\big (2\vert Du\vert ^2+2\vert Du^{n-1}-Du\vert ^2 \big ). \end{aligned} \end{aligned}$$
(3.7)

Likewise,

$$\begin{aligned} \begin{aligned} \sup _\tau \big \vert H_{mp}(x,m+\tau (m^{n-1}-m),Du+\tau (Du^{n-1}-Du))\big \vert&\le C|Du+\tau (Du^{n-1}-Du)|\\&\le C\big (\vert Du\vert +\vert Du^{n-1}-Du\vert \big ), \end{aligned} \end{aligned}$$
(3.8)

and

$$\begin{aligned} \sup _\tau \vert H_{pp}(x,m+\tau (m^{n-1}-m),Du+\tau (Du^{n-1}-Du))\vert \le {\bar{C}}. \end{aligned}$$

Moreover, by the mean value theorem, we have

$$\begin{aligned} \begin{aligned}&H(x,m,Du)-H(x,m^{n-1},Du^{n-1})-H_p(x,m^{n-1},Du^{n-1})D(u- u^{n-1})\\&\quad -H_m(x,m^{n-1},Du^{n-1}) (m- m^{n-1})\\&={}\int _0^1\Big (H_p\big (x,m^{n-1}+\tau (m-m^{n-1}),Du^{n-1}+\tau (Du-Du^{n-1})\big )\\&\quad -H_p(x,m^{n-1},Du^{n-1})\Big )(Du-Du^{n-1})\textrm{d}\tau \\&\quad +\int _0^1\Big (H_m\big (x,m^{n-1}+\tau (m-m^{n-1}),Du^{n-1}+\tau (Du-Du^{n-1})\big )\\&\quad -H_m(x,m^{n-1},Du^{n-1})\Big )(m-m^{n-1})\textrm{d}\tau . \end{aligned} \end{aligned}$$
(3.9)

By using the mean value theorem again, we rewrite the integrand in Eq. (3.9) as

$$\begin{aligned} \begin{aligned}&H_p\big (x,m^{n-1}+\tau (m-m^{n-1}),Du^{n-1}+\tau (Du-Du^{n-1})\big )\\&\quad -H_p(x,m^{n-1},Du^{n-1})\\&= {} \int _0^1 H_{pp}\big (x,m^{n-1}+(1-\tau ')\tau (m-m^{n-1}),Du^{n-1}\\&\quad +(1-\tau ')\tau (Du-Du^{n-1})\big )\tau (Du-Du^{n-1})\textrm{d}\tau ' \\&\quad +\int _0^1 H_{pm}\big (x,m^{n-1}+(1-\tau ')\tau (m-m^{n-1}),Du^{n-1}+(1-\tau ')\tau (Du-Du^{n-1})\big )\tau (m-m^{n-1})\textrm{d}\tau ' \end{aligned} \end{aligned}$$

and therefore, as \(0<(1-\tau ')\tau <1\), \(0<\tau <1\), we get

$$\begin{aligned} \begin{aligned}&\sup _\tau \vert H_p\big (x,m^{n-1}+\tau (m-m^{n-1}),Du^{n-1}+\tau (Du-Du^{n-1})\big )-H_p(x,m^{n-1},Du^{n-1})\vert \\&\quad \le \sup _\tau \vert H_{pp}\big (x,m^{n-1}+\tau (m-m^{n-1}),Du^{n-1}+\tau (Du-Du^{n-1})\big )\vert \vert Du-Du^{n-1}\vert \\&\qquad +\sup _\tau \vert H_{pm}\big (x,m^{n-1}+\tau (m-m^{n-1}),Du^{n-1}+\tau (Du-Du^{n-1})\big )\vert \vert m-m^{n-1}\vert \\&\quad \le {\bar{C}}\vert Du-Du^{n-1}\vert +C(\vert Du\vert +\vert Du-Du^{n-1}\vert ) \vert m-m^{n-1}\vert , \end{aligned} \end{aligned}$$
$$\begin{aligned} \begin{aligned}&\left| \int _0^1\Big (H_p\big (x,m^{n-1}+\tau (m-m^{n-1}),Du^{n-1}+\tau (Du-Du^{n-1})\big )\right. \\&\qquad \left. -H_p(x,m^{n-1},Du^{n-1})\Big )(Du-Du^{n-1})\textrm{d}\tau \right| \\&\quad \le \left( \int _0^1\tau \textrm{d}\tau \right) \Big ({\bar{C}}\vert Du-Du^{n-1}\vert +C(\vert Du\vert +\vert Du-Du^{n-1}\vert ) \vert m-m^{n-1}\vert \Big )\vert Du-Du^{n-1}\vert . \end{aligned} \end{aligned}$$

In a similar way, we obtain

$$\begin{aligned}{} & {} \left| \int _0^1\Big (H_m\big (x,m^{n-1}+\tau (m-m^{n-1}),Du^{n-1}+\tau (Du-Du^{n-1})\big )\right. \\{} & {} \qquad \left. -H_m(x,m^{n-1},Du^{n-1})\Big )(m-m^{n-1})\textrm{d}\tau \right| \\{} & {} \quad \le {} (\int _0^1\tau \textrm{d}\tau )\Big (C(1+\vert Du-Du^{n-1}\vert ^2 )\vert m-m^{n-1}\vert +C(1+\vert Du-Du^{n-1}\vert )\vert Du\\{} & {} \qquad -Du^{n-1}\vert \Big )\vert m-m^{n-1}\vert . \end{aligned}$$

Therefore, replacing in Eq. (3.9), we have

$$\begin{aligned} \begin{aligned}&\left| H(x,m,Du)-H(x,m^{n-1},Du^{n-1})-H_p(x,m^{n-1},Du^{n-1})D(u- u^{n-1})\right. \\&\qquad \left. -H_m(x,m^{n-1},Du^{n-1}) (m- m^{n-1})\right| \\&\quad \le C(\vert Dv^{n-1}\vert ^2)+C(1+\vert Dv^{n-1}\vert )\vert Dv^{n-1}\vert \vert \rho ^{n-1}\vert +C(1+\vert Dv^{n-1}\vert ^2)\vert \rho ^{n-1}\vert ^2\\&\quad \le C\Big (\vert Dv^{n-1}\vert ^2+\vert \rho ^{n-1}\vert ^2+\vert Dv^{n-1}\vert ^4+\vert \rho ^{n-1}\vert ^4\Big ). \end{aligned} \end{aligned}$$
(3.10)
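The quadratic character of the bound (3.10) can be checked numerically on an example. The sketch below evaluates the first-order Taylor remainder of the congestion-type Hamiltonian from the introduction in one space dimension, under the illustrative assumptions \(h\equiv 1\) and \(\alpha =1\), and with hypothetical base point and perturbation direction. Halving the perturbation size should divide the remainder by roughly four.

```python
def H(m, p, alpha=1.0):
    """Congestion-type Hamiltonian |p|^2 / (1+m)^alpha in one dimension, with h = 1."""
    return p * p / (1.0 + m) ** alpha

def H_p(m, p, alpha=1.0):
    """Partial derivative of H with respect to p."""
    return 2.0 * p / (1.0 + m) ** alpha

def H_m(m, p, alpha=1.0):
    """Partial derivative of H with respect to m."""
    return -alpha * p * p / (1.0 + m) ** (alpha + 1.0)

def taylor_remainder(m0, p0, dm, dp, eps):
    """H(m,p) - H(m0,p0) - H_p(m0,p0)(p-p0) - H_m(m0,p0)(m-m0) at (m,p) = (m0,p0) + eps*(dm,dp)."""
    m, p = m0 + eps * dm, p0 + eps * dp
    return H(m, p) - H(m0, p0) - H_p(m0, p0) * (p - p0) - H_m(m0, p0) * (m - m0)

# (m0, p0) plays the role of (m^{n-1}, Du^{n-1}); eps*(dm, dp) plays the role of -(rho^{n-1}, Dv^{n-1}).
r_coarse = taylor_remainder(1.0, 1.0, 1.0, 1.0, 1e-2)
r_fine = taylor_remainder(1.0, 1.0, 1.0, 1.0, 5e-3)
ratio = r_fine / r_coarse  # close to 1/4 when the remainder is quadratic in eps
```

This checks only the leading quadratic behaviour at one point; the higher powers in (3.10) come from the growth of the derivatives of H and are not visible for small perturbations.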

For the other terms in Eq. (3.6), we first observe that by (A2) and Eq. (3.8) we get

$$\begin{aligned} \begin{aligned}&\big \vert H_p(x,m,Du)-H_p(x,m^{n-1},Du^{n-1})\big \vert \le {} \sup _\tau \vert H_{pm}(x,m+\tau (m^{n-1}-m),Du\\&\qquad +\tau (Du^{n-1}-Du)) \vert \vert \rho ^{n-1}\vert \\&\qquad +\sup _\tau \vert H_{pp}(x,m+\tau (m^{n-1}-m),Du+\tau (Du^{n-1}-Du)) \vert \vert Dv^{n-1}\vert \\&\quad \le C(\vert Du\vert +\vert Dv^{n-1}\vert )\vert \rho ^{n-1}\vert +C\vert Dv^{n-1}\vert , \end{aligned} \end{aligned}$$

and, by Eqs. (3.7) and (3.8),

$$\begin{aligned} \begin{aligned}&\big \vert H_m(x,m,Du)-H_m(x,m^{n-1},Du^{n-1})\big \vert \le \sup _\tau \vert H_{pm}\big (x,m+\tau (m^{n-1}-m),Du\\&\qquad +\tau (Du^{n-1}-Du)\big ) \vert \vert Dv^{n-1}\vert \\&\qquad +\sup _\tau \vert H_{mm}\big (x,m+\tau (m^{n-1}-m),Du+\tau (Du^{n-1}-Du)\big ) \vert \vert \rho ^{n-1}\vert \\&\quad \le C(\vert Du\vert +\vert Dv^{n-1}\vert )\vert Dv^{n-1}\vert +C(\vert Du\vert ^2+\vert Dv^{n-1}\vert ^2)\vert \rho ^{n-1}\vert . \end{aligned} \end{aligned}$$

We then obtain

$$\begin{aligned} \begin{aligned}&\left| \Big (H_p(x,m,Du)-H_p(x,m^{n-1},Du^{n-1})\Big )(Du^n-Du)+\Big (H_m(x,m,Du)\right. \\&\qquad \left. -H_m(x,m^{n-1},Du^{n-1})\Big ) (m^n-m)\right| \\&\quad \le \vert H_p(x,m,Du)-H_p(x,m^{n-1},Du^{n-1})\vert \vert Du^n-Du\vert \\&\qquad +\vert H_m(x,m,Du)-H_m(x,m^{n-1},Du^{n-1})\vert \vert m^n-m\vert \\&\quad \le \Big (C(1+\vert Dv^{n-1}\vert )\vert \rho ^{n-1}\vert \\&\qquad +C\vert Dv^{n-1}\vert \Big )\vert Dv^n\vert +\Big (C(1+\vert Dv^{n-1}\vert ^2)\vert \rho ^{n-1}\vert +C(1+\vert Dv^{n-1}\vert )\vert Dv^{n-1}\vert \Big )\vert \rho ^n\vert \\&\quad \le C\Big (\vert Dv^{n-1}\vert +\vert Dv^{n-1}\vert ^2+\vert Dv^{n-1}\vert \vert \rho ^{n-1}\vert +\vert Dv^{n-1}\vert ^2 \vert \rho ^{n-1}\vert +\vert \rho ^{n-1}\vert \Big )(\vert Dv^n\vert +\vert \rho ^n\vert )\\&\quad \le C\Big (\vert Dv^{n-1}\vert +\vert Dv^{n-1}\vert ^2+\vert Dv^{n-1}\vert ^4+\vert \rho ^{n-1}\vert +\vert \rho ^{n-1}\vert ^2\Big )\big (\vert Dv^n\vert +\vert \rho ^n\vert \big ). \end{aligned} \end{aligned}$$
(3.11)

To estimate the terms containing f in Eq. (3.5), we observe that

$$\begin{aligned} \begin{aligned}&-f'(m)(m^n-m)+f(m^{n-1})-f(m)+f'(m^{n-1})(m^n-m^{n-1})\\ ={}&f(m^{n-1})-f(m)-f'(m)(m^{n-1}-m)+ (f'(m^{n-1})-f'(m))(m^n-m^{n-1}). \end{aligned} \end{aligned}$$

Exploiting that \(f'(\cdot )\) is globally Lipschitz, see (A3), we have

$$\begin{aligned} \vert f(m^{n-1})-f(m)-f'(m)(m^{n-1}-m)\vert \le C|m^{n-1}-m|^2, \end{aligned}$$
(3.12)

and

$$\begin{aligned} \begin{aligned}&\left| \Big (f'(m^{n-1})-f'(m)\Big )(m^n-m^{n-1})\right| \\&\quad \le \vert f'(m^{n-1})-f'(m)\vert \vert m^n-m\vert +\vert f'(m^{n-1})-f'(m)\vert \vert m-m^{n-1}\vert \\&\quad \le C\Big (|\rho ^n||\rho ^{n-1}|+|\rho ^{n-1}|^2\Big ). \end{aligned} \end{aligned}$$
(3.13)

Finally, by Eqs. (3.10), (3.11), (3.12) and (3.13), we get

$$\begin{aligned} \begin{aligned} \Vert a\Vert _{{\mathcal {C}}^{0}} \le {}&C\Big (\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}^2+\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}^2+\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}^4+\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}^4\\&+\big (\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}} +\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}^2+\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}^4+\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}\\&+\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}^2\big )\big (\Vert v^{n}\Vert _{{\mathcal {C}}^{1,0}}+ \Vert \rho ^{n}\Vert _{{\mathcal {C}}^{0}}\big )\Big ). \end{aligned} \end{aligned}$$
(3.14)

Now we consider the equation satisfied by \(\rho ^n\). We have

$$\begin{aligned}{} & {} \partial _t \rho ^n -\Delta \rho ^n-\mathrm{{div}}(\rho ^nH_p(x,m,Du))-\mathrm{{div}}(m\rho ^nH_{pm}(x,m,Du))\\{} & {} \quad = \mathrm{{div}}\big (mH_{pp}(x,m,Du)Dv^n\big )+\mathrm{{div}}(b), \end{aligned}$$

with

$$\begin{aligned} \begin{aligned} b:={}&-\rho ^nH_p(x,m,Du)-m\rho ^nH_{pm}(x,m,Du)\\&-mH_{pp}(x,m,Du)Dv^n\\&+ m^{n-1}H_p(x,m^{n-1},Du^{n-1}) -mH_p(x,m,Du) \\&+ (m^{n}- m^{n-1}) H_p(x,m^{n-1},Du^{n-1})\\&+m^{n-1} H_{pm}(x,m^{n-1},Du^{n-1})(m^{n}- m^{n-1}) \\&+m^{n-1}H_{pp}(x,m^{n-1},Du^{n-1})(Du^n-Du^{n-1}). \end{aligned} \end{aligned}$$

To estimate \(\Vert b\Vert _{{\mathcal {C}}^0}\), we start observing that

$$\begin{aligned} \begin{aligned} b={}&m^{n-1}H_p(x,m^{n-1},Du^{n-1})\\&-mH_p(x,m,Du)-m^{n-1}H_{pp}(x,m^{n-1},Du^{n-1})(Du^{n-1}-Du)\\&-(m^{n-1}-m) H_p(x,m^{n-1},Du^{n-1})-m^{n-1} H_{pm}(x,m^{n-1},Du^{n-1})(m^{n-1}-m) \\&+\rho ^n\Big (H_p(x,m^{n-1},Du^{n-1})-H_p(x,m,Du)\Big )\\&+\rho ^n\Big (m^{n-1} H_{pm}(x,m^{n-1},Du^{n-1})-mH_{pm}(x,m,Du)\Big )\\&+Dv^n\Big (m^{n-1}H_{pp}(x,m^{n-1},Du^{n-1})-mH_{pp}(x,m,Du)\Big ). \end{aligned} \end{aligned}$$
(3.15)

Denoting \(\Phi (x,m,p)=mH_p(x,m,p)\), then by elementary calculation

$$\begin{aligned}&\partial _{p}\Phi (x,m,p)=mH_{pp}(x,m,p),\,\partial _{pp}\Phi (x,m,p)=mH_{ppp}(x,m,p),\\&\partial _{m}\Phi (x,m,p)=H_p(x,m,p)+mH_{pm}(x,m,p),\,\partial _{mm}\Phi (x,m,p)\\&\quad =2H_{pm}(x,m,p)+mH_{mmp}(x,m,p). \end{aligned}$$

It is clear from (A2) that

$$\begin{aligned} \begin{aligned}&\sup _\tau \left| \big (m+\tau (m^{n-1}-m)\big )H_{ppp}(x,m+\tau (m^{n-1}-m),Du\right. \\&\quad \left. +\tau (Du^{n-1}-Du))\right| \le C\big (\vert m\vert +|\rho ^{n-1}|\big ). \end{aligned} \end{aligned}$$

From Eq. (3.8) and

$$\begin{aligned}{} & {} \sup _\tau \big \vert H_{mmp}(x,m+\tau (m^{n-1}-m),Du\nonumber \\{} & {} \quad +\tau (Du^{n-1}-Du))\big \vert \le C\big (\vert Du\vert +\vert Du^{n-1}-Du\vert \big ), \end{aligned}$$
(3.16)

we obtain

$$\begin{aligned} \begin{aligned}&\sup _\tau \left| 2H_{pm}(x,m+\tau (m^{n-1}-m),Du+\tau (Du^{n-1}-Du))\right| \\&\qquad +\sup _\tau \left| \big (m+\tau (m^{n-1}-m)\big )H_{pmm}(x,m+\tau (m^{n-1}-m),Du+\tau (Du^{n-1}-Du))\right| \\&\quad \le {} C\big (1+\vert Dv^{n-1}\vert +\vert \rho ^{n-1}\vert +\vert Dv^{n-1}\vert \vert \rho ^{n-1}\vert \big ). \end{aligned} \end{aligned}$$

In addition, we have

$$\begin{aligned} \begin{aligned}&\sup _\tau \Big \vert H_{pp}\big (x,m+\tau (m^{n-1}-m),Du+\tau (Du^{n-1}-Du)\big )\\&\qquad +\big (m+\tau (m^{n-1}-m)\big )H_{ppm}\big (x,m+\tau (m^{n-1}-m),Du+\tau (Du^{n-1}-Du)\big )\Big \vert \\&\quad \le {} C(1+\vert \rho ^{n-1}\vert ). \end{aligned} \end{aligned}$$

Collecting these estimates, we obtain

$$\begin{aligned} \begin{aligned}&\Big \vert m^{n-1}H_p(x,m^{n-1},Du^{n-1}) -mH_p(x,m,Du)-m^{n-1}H_{pp}(x,m^{n-1},Du^{n-1})(Du^{n-1}-Du)\\&\qquad -(m^{n-1}-m) H_p(x,m^{n-1},Du^{n-1})-m^{n-1} H_{pm}(x,m^{n-1},Du^{n-1})(m^{n-1}-m) \Big \vert \\&\quad \le C\big (1+|\rho ^{n-1}|\big )\vert Dv^{n-1}\vert ^2+C\big (1+\vert Dv^{n-1}\vert +\vert \rho ^{n-1}\vert \\&\qquad +\vert Dv^{n-1}\vert \vert \rho ^{n-1}\vert \big )\vert \rho ^{n-1}\vert ^2+C(1+\vert \rho ^{n-1}\vert )\vert Dv^{n-1}\vert \vert \rho ^{n-1}\vert \\&\quad \le C\big (\vert Dv^{n-1}\vert ^2+\vert \rho ^{n-1}\vert ^2+\vert \rho ^{n-1}\vert ^3\\&\qquad +\vert Dv^{n-1}\vert \vert \rho ^{n-1}\vert +\vert Dv^{n-1}\vert ^2 \vert \rho ^{n-1}\vert +\vert Dv^{n-1}\vert \vert \rho ^{n-1}\vert ^2+\vert Dv^{n-1}\vert \vert \rho ^{n-1}\vert ^3\big )\\&\quad \le C\big (\vert Dv^{n-1}\vert ^2+\vert \rho ^{n-1}\vert ^2+\vert Dv^{n-1}\vert ^4+\vert \rho ^{n-1}\vert ^4+\vert \rho ^{n-1}\vert ^3+\vert \rho ^{n-1}\vert ^6\big ). \end{aligned} \end{aligned}$$
(3.17)

Moreover, from Eq. (2.4), it follows that

$$\begin{aligned}{} & {} \left| H_p(x,m^{n-1},Du^{n-1})-H_p(x,m,Du)\right| \le C\left( \vert Dv^{n-1}\vert \right. \nonumber \\{} & {} \quad \left. +(1+\vert Dv^{n-1}\vert )\vert \rho ^{n-1}\vert \right) , \end{aligned}$$
(3.18)

and, from Eqs. (2.4) and (3.16),

$$\begin{aligned} \begin{aligned}&\left| m^{n-1} H_{pm}(x,m^{n-1},Du^{n-1})-mH_{pm}(x,m,Du)\right| \\&\quad \le {}C(1+\vert \rho ^{n-1}\vert )\vert Dv^{n-1}\vert +C\Big (1+\vert Dv^{n-1}\vert +(1+\vert Dv^{n-1}\vert )(1+\vert \rho ^{n-1}\vert )\Big )\vert \rho ^{n-1}\vert \\&\quad \le {} C\big (\vert Dv^{n-1}\vert +\vert \rho ^{n-1}\vert +\vert Dv^{n-1}\vert ^2+\vert \rho ^{n-1}\vert ^2+\vert \rho ^{n-1}\vert ^4\big ), \end{aligned} \end{aligned}$$
(3.19)
$$\begin{aligned} \begin{aligned}&\left| m^{n-1}H_{pp}(x,m^{n-1},Du^{n-1})-mH_{pp}(x,m,Du)\right| \le C(1+\vert \rho ^{n-1}\vert )\vert Dv^{n-1}\vert \\&\qquad +C\vert \rho ^{n-1}\vert +C(1+\vert \rho ^{n-1}\vert )\vert \rho ^{n-1}\vert . \end{aligned} \end{aligned}$$
(3.20)

Substituting the estimates (3.17)–(3.20) into Eq. (3.15), we get

$$\begin{aligned} \begin{aligned}&\Vert b\Vert _{{\mathcal {C}}^{0}} \le C\Big (\Vert v^{n-1}\Vert ^2_{{\mathcal {C}}^{1,0}}+\Vert v^{n-1}\Vert ^4_{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n-1}\Vert ^2_{{\mathcal {C}}^{0}}+\Vert \rho ^{n-1}\Vert ^3_{{\mathcal {C}}^{0}}+\Vert \rho ^{n-1}\Vert ^4_{{\mathcal {C}}^{0}}+\Vert \rho ^{n-1}\Vert ^6_{{\mathcal {C}}^{0}}\\&\quad +\big (\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}+ \Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}+\Vert v^{n-1}\Vert ^2_{{\mathcal {C}}^{1,0}}\\&\quad + \Vert \rho ^{n-1}\Vert ^2_{{\mathcal {C}}^{0}}+ \Vert \rho ^{n-1}\Vert ^4_{{\mathcal {C}}^{0}}\big )\big (\Vert v^{n}\Vert _{{\mathcal {C}}^{1,0}}+ \Vert \rho ^{n}\Vert _{{\mathcal {C}}^{0}}\big )\Big ). \end{aligned} \end{aligned}$$
(3.21)

Finally, from Lemma 2.6 and estimates Eqs. (3.14) and (3.21), we have

$$\begin{aligned} \begin{aligned}&\Vert v^n\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^n\Vert _{{\mathcal {C}}^{0}} \le C(\Vert a\Vert _{{\mathcal {C}}^{0}}+\Vert b\Vert _{{\mathcal {C}}^{0}})\\&\quad \le K\Big [\Vert v^{n-1}\Vert ^2_{{\mathcal {C}}^{1,0}}+\Vert v^{n-1}\Vert ^4_{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n-1}\Vert ^2_{{\mathcal {C}}^{0}}+\Vert \rho ^{n-1}\Vert ^3_{{\mathcal {C}}^{0}}+\Vert \rho ^{n-1}\Vert ^4_{{\mathcal {C}}^{0}}+\Vert \rho ^{n-1}\Vert ^6_{{\mathcal {C}}^{0}}\\&\qquad +\big (\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}+ \Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}+\Vert v^{n-1}\Vert ^2_{{\mathcal {C}}^{1,0}}+\Vert v^{n-1}\Vert ^4_{{\mathcal {C}}^{1,0}}+ \Vert \rho ^{n-1}\Vert ^2_{{\mathcal {C}}^{0}}\\&\qquad + \Vert \rho ^{n-1}\Vert ^4_{{\mathcal {C}}^{0}}\big )\big (\Vert v^{n}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n}\Vert _{{\mathcal {C}}^{0}}\big )\Big ], \end{aligned} \end{aligned}$$
(3.22)

where K is a constant which depends only on the data of the problem. Without loss of generality, we assume that \(K>1\). Assume that the initial guess \((u^0,m^0)\) of the Newton method satisfies

$$\begin{aligned} \Vert v^{0}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{0}\Vert _{{\mathcal {C}}^{0}}\le \frac{1}{12K}, \end{aligned}$$

where K is as in Eq. (3.22). Since \(K>1\), we have

$$\begin{aligned} \Vert v^0\Vert ^4_{{\mathcal {C}}^{1,0}}\le \Vert v^{0}\Vert _{{\mathcal {C}}^{1,0}},\,\Vert \rho ^{0}\Vert ^6_{{\mathcal {C}}^{0}}\le \Vert \rho ^{0}\Vert _{{\mathcal {C}}^{0}}, \end{aligned}$$
$$\begin{aligned}{} & {} \Vert v^{0}\Vert _{{\mathcal {C}}^{1,0}}+ \Vert \rho ^{0}\Vert _{{\mathcal {C}}^{0}}+\Vert v^{0}\Vert ^2_{{\mathcal {C}}^{1,0}}+\Vert v^{0}\Vert ^4_{{\mathcal {C}}^{1,0}}+ \Vert \rho ^{0}\Vert ^2_{{\mathcal {C}}^{0}}+ \Vert \rho ^{0}\Vert ^4_{{\mathcal {C}}^{0}}\le 3(\Vert v^{0}\Vert _{{\mathcal {C}}^{1,0}}\nonumber \\{} & {} \quad +\Vert \rho ^{0}\Vert _{{\mathcal {C}}^{0}})\le \frac{1}{4K},\nonumber \\{} & {} \Vert v^{0}\Vert ^2_{{\mathcal {C}}^{1,0}}+\Vert \rho ^{0}\Vert ^2_{{\mathcal {C}}^{0}}\le 2(\Vert v^{0}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{0}\Vert _{{\mathcal {C}}^{0}})^2\le \frac{1}{72K^2},\nonumber \\{} & {} \Vert v^{0}\Vert ^2_{{\mathcal {C}}^{1,0}}+\Vert v^{0}\Vert ^4_{{\mathcal {C}}^{1,0}}+\Vert \rho ^{0}\Vert ^2_{{\mathcal {C}}^{0}}+\Vert \rho ^{0}\Vert ^3_{{\mathcal {C}}^{0}}+\Vert \rho ^{0}\Vert ^4_{{\mathcal {C}}^{0}}+\Vert \rho ^{0}\Vert ^6_{{\mathcal {C}}^{0}}\le 4(\Vert v^{0}\Vert ^2_{{\mathcal {C}}^{1,0}}\nonumber \\{} & {} \quad +\Vert \rho ^{0}\Vert ^2_{{\mathcal {C}}^{0}})\le \frac{1}{18K^2}. \end{aligned}$$
(3.23)

Replacing the previous estimates in Eq. (3.22) for \(n=1\), we get

$$\begin{aligned} \Vert v^1\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^1\Vert _{{\mathcal {C}}^{0}}\le \frac{1}{18K}+\frac{1}{4}(\Vert v^1\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^1\Vert _{{\mathcal {C}}^{0}}). \end{aligned}$$

Hence, by absorbing the term \(\frac{1}{4}(\Vert v^1\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^1\Vert _{{\mathcal {C}}^{0}})\) on the right hand side, we get

$$\begin{aligned} \Vert v^1\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^1\Vert _{{\mathcal {C}}^{0}}\le \frac{2}{27K}<\frac{1}{12K}. \end{aligned}$$
(3.24)
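The arithmetic behind (3.24) can be reproduced exactly with rational numbers. In the sketch below, which is only an illustration with a hypothetical function name, the initial error is taken at its largest admissible value \(s_0=1/(12K)\). The squared terms in (3.22) are then bounded by \(8s_0^2=1/(18K^2)\), the coefficient of the unknown \(\Vert v^1\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^1\Vert _{{\mathcal {C}}^{0}}\) by \(3s_0=1/(4K)\), and both are multiplied by K before the unknown is absorbed on the left.

```python
from fractions import Fraction

def first_step_bound(K):
    """Bound on ||v^1|| + ||rho^1|| from (3.22) when the initial error equals 1/(12K)."""
    K = Fraction(K)
    s0 = 1 / (12 * K)                # largest admissible initial error
    quadratic_part = K * 8 * s0**2   # K * 4 * 2 * s0^2 = 1/(18K)
    absorbed = K * 3 * s0            # coefficient of the unknown, equal to 1/4
    # x <= quadratic_part + absorbed * x  implies  x <= quadratic_part / (1 - absorbed)
    return quadratic_part / (1 - absorbed)

bound_for_K2 = first_step_bound(2)   # equals 2/(27*2) = 1/27
```

For any \(K>1\) the returned value is \(2/(27K)\), which is below the threshold \(1/(12K)\), so the first iterate stays in the same ball as the initial guess.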

Arguing iteratively, we have that, if \(\Vert v^{0}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{0}\Vert _{{\mathcal {C}}^{0}}\le \frac{1}{12K}\), then

$$\begin{aligned} \Vert v^{n}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n}\Vert _{{\mathcal {C}}^{0}}\le \frac{1}{12K}, \qquad \text {for any } n\in {\mathbb {N}}. \end{aligned}$$

Repeating an estimate similar to Eq. (3.23) for \(n-1\), we have

$$\begin{aligned} \Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}+ \Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}+\Vert v^{n-1}\Vert ^2_{{\mathcal {C}}^{1,0}}+\Vert v^{n-1}\Vert ^4_{{\mathcal {C}}^{1,0}}+ \Vert \rho ^{n-1}\Vert ^2_{{\mathcal {C}}^{0}}+ \Vert \rho ^{n-1}\Vert ^4_{{\mathcal {C}}^{0}}\le \frac{1}{4K}. \end{aligned}$$

From Eq. (3.22), we obtain

$$\begin{aligned} \Vert v^n\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^n\Vert _{{\mathcal {C}}^{0}}\le K\Big [8\big (\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}\big )^2+\frac{1}{4K}\big (\Vert v^{n}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n}\Vert _{{\mathcal {C}}^{0}}\big )\Big ], \end{aligned}$$

hence we get the estimate

$$\begin{aligned} \Vert v^{n}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n}\Vert _{{\mathcal {C}}^{0}}\le \frac{32K}{3}\big (\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}\big )^2. \end{aligned}$$

Multiplying both sides of the previous estimate by \(\frac{32K}{3}\), we have

$$\begin{aligned} \frac{32K}{3}\big (\Vert v^{n}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n}\Vert _{{\mathcal {C}}^{0}}\big )\le \Big (\frac{32K}{3}\big (\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}\big )\Big )^2. \end{aligned}$$

Hence, as we have assumed \(\big (\Vert v^{0}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{0}\Vert _{{\mathcal {C}}^{0}}\big )\le \frac{1}{12K}\), we obtain by induction that

$$\begin{aligned} \Vert v^{n}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n}\Vert _{{\mathcal {C}}^{0}}\le \frac{3}{32K}\Big (\frac{8}{9}\Big )^{2^n}. \end{aligned}$$

\(\square \)
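The last induction step of the proof can be checked with exact arithmetic. The following sketch, with hypothetical function names and an arbitrary sample value of K, iterates the extreme case \(e_n=\frac{32K}{3}e_{n-1}^2\) of the recursion starting from \(e_0=\frac{1}{12K}\) and compares it with the closed-form bound \(\frac{3}{32K}\big (\frac{8}{9}\big )^{2^n}\). In this extreme case the two coincide, because \(\frac{32K}{3}e_0=\frac{8}{9}\).

```python
from fractions import Fraction

def worst_case_errors(K, steps):
    """Iterate e_n = (32K/3) * e_{n-1}**2 from e_0 = 1/(12K), the extreme case of the recursion."""
    K = Fraction(K)
    e = 1 / (12 * K)
    errors = [e]
    for _ in range(steps):
        e = Fraction(32, 3) * K * e * e
        errors.append(e)
    return errors

def closed_form_bound(K, n):
    """Right-hand side (3/(32K)) * (8/9)**(2**n) of the final estimate."""
    return Fraction(3, 32) / Fraction(K) * Fraction(8, 9) ** (2 ** n)

K = 2  # sample value; any K > 1 behaves the same way
errors = worst_case_errors(K, 5)
bounds = [closed_form_bound(K, n) for n in range(6)]
```

Since the exponent \(2^n\) doubles at every step, the bound decays doubly exponentially even though the base \(8/9\) is close to one.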

Remark 3.3

We can extend our main convergence rate result to MFG systems with superquadratic Hamiltonians, if we assume that the system admits a classical solution. In this case, for the rate of convergence in Theorem 3.2, it is sufficient to assume that H is smooth, without the uniform bounds on the derivatives in Eq. (2.4). Indeed, a careful inspection of the previous proof shows that the constant K in Eq. (3.22) depends on \((u,m)\), the data of the problem and, in particular, on the derivatives of the Hamiltonian computed in \(Dv^{n-1}\), \(\rho ^{n-1}\). If we assume that \(\Vert v^0\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^0\Vert _{{\mathcal {C}}^{0}}<1/(12K)\), then, arguing as in the proof, we have that \(\Vert v^n\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^n\Vert _{{\mathcal {C}}^{0}}<1/(12K)\) for any \(n\in {\mathbb {N}}\). Hence, the constant K in Eq. (3.22), which depends on \(\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}\), does not change with the iterations. The restriction to quadratic H in Theorem 3.2 is made to prove existence and uniqueness of a classical solution to the MFG system.

Remark 3.4

Assumption (A3) requires the uniform Lipschitz continuity of \(f'\) and therefore it excludes some interesting cases such as \(f(m)=m^\alpha \), \(\alpha \ne 1\). We show that we can at least consider the case \(\alpha \ge 2\). The main difference from using (A3) is in the estimate Eq. (3.13). We replace the argument in the proof with

$$\begin{aligned} \left| f'(m^{n-1})-f'(m)\right|&\le (\alpha -1)(|m^{\alpha -2}|+|(m^{n-1})^{\alpha -2}|)|\rho ^{n-1}|\\&\le (\alpha -1)(2|m^{\alpha -2}|+|(\rho ^{n-1})^{\alpha -2}|)|\rho ^{n-1}| \end{aligned}$$

and therefore

$$\begin{aligned}&\left| \Big (f'(m^{n-1})-f'(m)\Big )(m^n-m^{n-1})\right| \\&\quad \le {}(\alpha -1)(2|m^{\alpha -2}|+|(\rho ^{n-1})^{\alpha -2}|)|\rho ^{n-1}|^2+ (\alpha -1)(2|m^{\alpha -2}|+|(\rho ^{n-1})^{\alpha -2}|)|\rho ^{n-1}||\rho ^n|. \end{aligned}$$

Then one can proceed similarly as in Theorem 3.2. So far we do not have a corresponding result for \(0<\alpha <2\), \(\alpha \ne 1\). For global (in time) solutions to MFGs with separable Hamiltonians and \(f(m)=m^\alpha \), \(\alpha >0\) we refer to the paper of Cirant and Goffi [20, Theorem 1.4].

4 The Newton Method for the Mean Field Games System with Separable Hamiltonian and Nonlocal Coupling

In this section, we consider the MFG system with Hamiltonian independent of m and nonlocal coupling

$$\begin{aligned} \left\{ \begin{aligned} (i)\qquad&-\partial _t u -\Delta u+H(x,Du)=f[m](x) \qquad{} & {} \textrm{in}\,\,Q,\\ (ii) \qquad&\partial _t m- \Delta m - \text {div} \big ( mH_p(x,Du)\big ) =0{} & {} \textrm{in}\,\,Q, \\&m(x,0)=m_0(x), \; {u(x,T)=g[m(T)](x)}{} & {} \textrm{in}\,\,{\mathbb {T}}^d. \end{aligned}\right. \end{aligned}$$
(4.1)

We assume that \(m_0\) is as in (A1) and that the Hamiltonian H satisfies (A2), while the assumption on \(u_T\) in (A1) and assumption (A3) are replaced by

  1. (A3’)

    \(f,g: {\mathbb {T}}^d \times {\mathcal {P}}({\mathbb {T}}^d) \rightarrow {\mathbb {R}}\). f, g and their space derivatives \(\partial _{x_i}f\), \(\partial _{x_i}g\), \(\partial _{x_ix_j}g\) are all globally Lipschitz continuous. The measure derivatives \(\frac{\delta f}{\delta m}\) and \(\frac{\delta g}{\delta m}\) \(:{\mathbb {T}}^d\times {\mathcal {P}}({\mathbb {T}}^d)\times {\mathbb {T}}^d\rightarrow {\mathbb {R}}\) are also Lipschitz continuous. For any \(m,m'\in {\mathcal {P}}({\mathbb {T}}^d)\),

    $$\begin{aligned} \begin{aligned} \int _{{\mathbb {T}}^d} \left( f[m](x)-f[m'](x)\right) \textrm{d}(m-m')(x)\ge 0, \,\,\\ \int _{{\mathbb {T}}^d} \left( g[m](x)-g[m'](x)\right) \textrm{d}(m-m')(x)\ge 0. \end{aligned} \end{aligned}$$
    (4.2)

Remark 4.1

Equation (4.2) implies that \(\frac{\delta f}{\delta m}\) and \(\frac{\delta g}{\delta m}\) satisfy the following monotonicity property (explained for f):

$$\begin{aligned} \int _{{\mathbb {T}}^d}\int _{{\mathbb {T}}^d}\frac{\delta f}{\delta m}[m](x)(y)\rho (x)\rho (y)\textrm{d}x\textrm{d}y\ge 0 \end{aligned}$$
(4.3)

for any centered measure \(\rho \), c.f. [16, p. 36].
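As a concrete illustration of (4.3), consider the hypothetical coupling \(f[m](x)=\int _{{\mathbb {T}}}k(x-y)\textrm{d}m(y)\) on the one-dimensional torus with \(k(z)=1+\cos (2\pi z)\), for which \(\frac{\delta f}{\delta m}[m](x)(y)=k(x-y)\) up to an additive normalization that plays no role for centered measures. This example is an assumption made for illustration and is not taken from the paper. The sketch below approximates the double integral in (4.3) by a Riemann sum for a randomly generated centered grid function.

```python
import math
import random

def kernel(z):
    """Hypothetical interaction kernel on the 1D torus with nonnegative Fourier coefficients."""
    return 1.0 + math.cos(2.0 * math.pi * z)

def quadratic_form(rho, grid):
    """Riemann-sum approximation of the double integral in (4.3) for the kernel above."""
    h = 1.0 / len(grid)
    total = 0.0
    for xi, ri in zip(grid, rho):
        for yj, rj in zip(grid, rho):
            total += kernel(xi - yj) * ri * rj
    return total * h * h

def centered_sample(n, seed=0):
    """Random grid function with zero mean, standing in for a centered measure rho."""
    rng = random.Random(seed)
    values = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mean = sum(values) / n
    return [v - mean for v in values]

grid = [i / 64 for i in range(64)]
value = quadratic_form(centered_sample(64), grid)  # nonnegative up to rounding
```

The sum is nonnegative because \(\cos (2\pi (x-y))=\cos (2\pi x)\cos (2\pi y)+\sin (2\pi x)\sin (2\pi y)\), so the quadratic form is a sum of squares.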

Remark 4.2

From assumption (A3’), there exists a constant \(C>0\) such that

$$\begin{aligned} \sup _{x\in {\mathbb {T}}^d}\big |f[m'](x)-f[m](x)\big |+\sup _{x,y \in {\mathbb {T}}^d}\big |\frac{\delta f}{\delta m}[m'](x)(y)-\frac{\delta f}{\delta m}[m](x)(y)\big |&\le C \mathbf{{d}}_1(m,m'),\end{aligned}$$
(4.4)
$$\begin{aligned} \sup _{x\in {\mathbb {T}}^d}\big |f[m' ](x) - f[m](x) - \int _{{\mathbb {T}}^d} \frac{\delta f}{\delta m}[m](x)(y)\textrm{d}(m'-m)(y)\big |&\le C\mathbf{{d}}^2_1(m,m') ,\end{aligned}$$
(4.5)
$$\begin{aligned} \sup _{x\in {\mathbb {T}}^d}\big |g[m' ](x) - g[m](x) - \int _{{\mathbb {T}}^d} \frac{\delta g}{\delta m}[m](x)(y)\textrm{d}(m'-m)(y)\big |&\le C\mathbf{{d}}^2_1(m,m') . \end{aligned}$$
(4.6)

Remark 4.3

For simplicity, we will use the shortened notation, c.f. [16, p. 60],

$$\begin{aligned} \int _{{\mathbb {T}}^d} \frac{\delta f}{\delta m}[m](x)(y)\textrm{d}m'(y)=\frac{\delta f}{\delta m}[m](x)m' \end{aligned}$$

for the duality bracket between \(\frac{\delta f}{\delta m}[m]\) and \(m'\) at x.

The next two lemmas are proved in Cardaliaguet and Briani [12, Lemma 5.2]. A similar result with different functional spaces is discussed in Cardaliaguet, Delarue, Lasry and Lions [16, Lemma 3.3.1].

Lemma 4.4

Under assumptions (A1), (A2) and (A3’), let \((u,m)\) be a classical solution to the system (4.1). Then, the unique weak solution of the system

$$\begin{aligned} \left\{ \begin{aligned} (i) \qquad&-\partial _t v -\Delta v+H_p(x,Du)Dv=\frac{\delta f}{\delta m}[m(t)]\rho{} & {} \textrm{in}\,\,Q,\\ (ii) \qquad&\partial _t \rho - \Delta \rho - \mathrm{{div}} \big ( \rho H_p(x,Du)\big )=\mathrm{{div}}\big (mH_{pp}(x,Du)Dv\big ){} & {} \textrm{in}\,\,Q, \\&\rho (x,0)=0, \; v(x,T)=\frac{\delta g}{\delta m}[m(T)]\rho (T){} & {} \textrm{in}\,\,{\mathbb {T}}^d \end{aligned}\right. \end{aligned}$$
(4.7)

is the trivial solution \((v,\rho )=(0,0)\).

Lemma 4.5

Let \(a\in {\mathcal {C}}^0(Q)\), \(b\in {\mathcal {C}}^0(Q;{\mathbb {R}}^d)\) and \(c\in {\mathcal {C}}^0({\mathbb {T}}^d)\) be given. Let \((u,m)\) be a classical solution to the system (4.1) and \((v,\rho )\) be a classical solution of the perturbed linear system

$$\begin{aligned} \left\{ \begin{aligned} (i)\qquad&-\partial _t v -\Delta v+H_p(x,Du)Dv=\frac{\delta f}{\delta m}[m(t)](x)\rho +a(x,t)\qquad{} & {} \textrm{in}\,\,Q,\\ (ii)\qquad&\partial _t \rho - \Delta \rho - \mathrm{{div}} \big ( \rho H_p(x,Du)\big ) =\mathrm{{div}}\big (mH_{pp}(x,Du)Dv\big )+\mathrm{{div}}(b(x,t)){} & {} \textrm{in}\,\,Q, \\&\rho (x,0)=0, \; v(x,T)=\frac{\delta g}{\delta m}[m(T)](x)\rho (T)+c(x){} & {} \textrm{in}\,\,{\mathbb {T}}^d. \end{aligned}\right. \nonumber \\ \end{aligned}$$
(4.8)

Then, there exists a constant \(C>0\) depending on the coefficients of the problem, such that

$$\begin{aligned} \Vert v\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho \Vert _{{\mathcal {C}}^{0}}\le C\left( \Vert a\Vert _{{\mathcal {C}}^{0}}+\Vert b\Vert _{{\mathcal {C}}^{0}}+\Vert c\Vert _{{\mathcal {C}}^{0}}\right) . \end{aligned}$$

An existence and uniqueness result for classical solutions to Eq. (4.1), under rather general assumptions which include in particular (A1), (A2) and (A3’), can be found in [27]. The Newton system for solving Eq. (4.1), analogous to Eq. (1.4), can be written as

$$\begin{aligned} \left\{ \begin{aligned} (i)\qquad&-\partial _t u^n -\Delta u^n+H_p(x,Du^{n-1})D(u^n-u^{n-1})\\ ={}&-H(x,Du^{n-1})+f[m^{n-1}(t)](x)+\frac{\delta f}{\delta m}[m^{n-1}(t)](x)(m^n-m^{n-1})\qquad{} & {} \textrm{in}\,\,Q,\\ (ii)\qquad&\partial _t m^n- \Delta m^n - \text {div} \big ( m^nH_p(x,Du^{n-1})\big ) =\mathrm{{div}}\big (m^{n-1}H_{pp}(x,Du^{n-1})(Du^n-Du^{n-1})\big ){} & {} \textrm{in}\,\,Q, \\&m^n(x,0)=m_0(x), \; u^n(x,T)=g[m^{n-1}(T)]+\frac{\delta g}{\delta m}[m^{n-1}(T)](x)\big (m^n(T)-m^{n-1}(T)\big ){} & {} \textrm{in}\,\,{\mathbb {T}}^d. \end{aligned}\right. \end{aligned}$$
(4.9)

Existence and uniqueness of a classical solution to Eq. (4.9) can be proved as in Proposition 3.1.

Theorem 4.6

Let \((u,m)\) be the solution of system (4.1) and let \((u^n,m^n)\) be the sequence generated by Newton’s algorithm (4.9). Set \(v^n=u^n-u\), \(\rho ^n=m^n-m\). There exists a constant \(\eta >0\) such that if \(\Vert v^0\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^0\Vert _{{\mathcal {C}}^{0}}\le \eta \) then \(\Vert v^n\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^n\Vert _{{\mathcal {C}}^{0}}\rightarrow 0\) with a quadratic rate of convergence.

Proof

We first observe that \(v^n=u^n-u\), \(\rho ^n=m^n-m\) satisfy

$$\begin{aligned}&-\partial _t v^n -\Delta v^n+H_p(x,Du)\cdot Dv^n=\frac{\delta f}{\delta m}[m](x)(\rho ^n)+a,\\&\partial _t \rho ^n -\Delta \rho ^n-\mathrm{{div}}(\rho ^nH_p(x,Du))= \mathrm{{div}}\big (mH_{pp}(x,Du)Dv^n\big )+\mathrm{{div}}(b),\\&\rho ^n(x,0)=0,\,v^n(x,T)= \frac{\delta g}{\delta m}[m(T)](x)(\rho ^n(T)) +c(x), \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} a&:={} H_p(x,Du)(Du^n-Du)+H(x,Du)-H(x,Du^{n-1})-H_p(x,Du^{n-1})D(u^{n}- u^{n-1})\\&\quad -\frac{\delta f}{\delta m}[m](x)(m^n-m)+f(m^{n-1})-f(m)+\frac{\delta f}{\delta m}[m^{n-1}](x)(m^n-m^{n-1}), \end{aligned} \end{aligned}$$
(4.10)
$$\begin{aligned} \begin{aligned} b&:={}-\rho ^nH_p(x,Du)-mH_{pp}(x,Du)Dv^n+ m^{n-1}H_p(x,Du^{n-1}) -mH_p(x,Du) \\&\quad + (m^{n}- m^{n-1}) H_p(x,Du^{n-1})+m^{n-1}H_{pp}(x,Du^{n-1})(Du^n-Du^{n-1}), \end{aligned} \end{aligned}$$
$$\begin{aligned} c:= & {} g[m^{n-1}(T)](x)-g[m(T)](x)-\frac{\delta g}{\delta m}[m(T)](x)(\rho ^n(T))\\{} & {} +\frac{\delta g}{\delta m}[m^{n-1}(T)](x)\big (m^n(T)-m^{n-1}(T)\big ). \end{aligned}$$

We first consider the nonlocal coupling terms as they constitute the main differences with respect to the Proof of Theorem 3.2. Rewrite the terms containing f in Eq. (4.10) as

$$\begin{aligned} \begin{aligned}&-\frac{\delta f}{\delta m}[m](x)(m^n-m)+f(m^{n-1})-f(m)+\frac{\delta f}{\delta m}[m^{n-1}](x)(m^n-m^{n-1})\\ ={}&f(m^{n-1})-f(m)-\frac{\delta f}{\delta m}[m^{n-1}](x)(\rho ^{n-1})+\Big (\frac{\delta f}{\delta m}[m^{n-1}](x)-\frac{\delta f}{\delta m}[m](x)\Big )(\rho ^n). \end{aligned} \end{aligned}$$

By Lipschitz continuity of \(\frac{\delta f}{\delta m}\), i.e., Eqs. (4.4) and (4.5), we get

$$\begin{aligned}{} & {} \Vert -\frac{\delta f}{\delta m}[m](\cdot )(m^n-m)+f(m^{n-1})-f(m)+\frac{\delta f}{\delta m}[m^{n-1}](\cdot )(m^n-m^{n-1})\Vert _{{\mathcal {C}}^{0}}\\{} & {} \quad \le C(\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}^2+\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}\Vert \rho ^n\Vert _{{\mathcal {C}}^{0}}). \end{aligned}$$
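
One way to see where the quadratic term comes from is sketched below. The sketch assumes, beyond what is restated in this section, that \(f\) and \(\frac{\delta f}{\delta m}\) are defined along the segment \(m+s\rho ^{n-1}\), \(s\in [0,1]\), that the usual integral identity for the derivative holds along it, and that Eqs. (4.4) and (4.5) yield a bound of the form \(\Vert (\frac{\delta f}{\delta m}[m_1]-\frac{\delta f}{\delta m}[m_2])(\rho )\Vert _{{\mathcal {C}}^{0}}\le C\Vert m_1-m_2\Vert _{{\mathcal {C}}^{0}}\Vert \rho \Vert _{{\mathcal {C}}^{0}}\). Since \(m+s\rho ^{n-1}-m^{n-1}=-(1-s)\rho ^{n-1}\), these assumptions give

$$\begin{aligned} \begin{aligned}&\Big \Vert f(m^{n-1})-f(m)-\frac{\delta f}{\delta m}[m^{n-1}](\cdot )(\rho ^{n-1})\Big \Vert _{{\mathcal {C}}^{0}}\\ ={}&\Big \Vert \int _0^1\Big (\frac{\delta f}{\delta m}[m+s\rho ^{n-1}](\cdot )-\frac{\delta f}{\delta m}[m^{n-1}](\cdot )\Big )(\rho ^{n-1})\,ds\Big \Vert _{{\mathcal {C}}^{0}}\\ \le {}&C\int _0^1(1-s)\,ds\,\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}^2=\frac{C}{2}\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}^2, \end{aligned} \end{aligned}$$

while the remaining term \(\big (\frac{\delta f}{\delta m}[m^{n-1}]-\frac{\delta f}{\delta m}[m]\big )(\rho ^n)\) is bounded by \(C\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}\Vert \rho ^n\Vert _{{\mathcal {C}}^{0}}\) under the same assumed Lipschitz bound.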

Similarly, rewriting

$$\begin{aligned} c= & {} g[m^{n-1}(T)](x)-g[m(T)](x)-\frac{\delta g}{\delta m}[m^{n-1}(T)](x)(\rho ^{n-1}(T))\\{} & {} \quad +\Big (\frac{\delta g}{\delta m}[m^{n-1}(T)](x)-\frac{\delta g}{\delta m}[m(T)](x)\Big )(\rho ^n(T)), \end{aligned}$$

we obtain

$$\begin{aligned} \Vert c\Vert _{{\mathcal {C}}^{0}}\le C\left( \Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}^2+\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}\Vert \rho ^n\Vert _{{\mathcal {C}}^{0}}\right) . \end{aligned}$$

By a straightforward adaptation of the proof of Theorem 3.2, we estimate the remaining terms in \(a\) and \(b\). Indeed, we have

$$\begin{aligned} \begin{aligned}&\left| H_p(x,Du)(Du^n-Du)+H(x,Du)-H(x,Du^{n-1})-H_p(x,Du^{n-1})D(u^{n}- u^{n-1})\right| \\ \le {}&C(\vert Dv^{n-1}\vert ^2+\vert Dv^{n-1}\vert \vert Dv^{n}\vert ), \end{aligned} \end{aligned}$$

hence

$$\begin{aligned} \Vert a\Vert _{{\mathcal {C}}^{0}}\le C\left( \Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}^2+\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}\Vert v^n\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}^2+\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}\Vert \rho ^n\Vert _{{\mathcal {C}}^{0}}\right) . \end{aligned}$$
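
The pointwise bound on the Hamiltonian terms used here is a Taylor estimate. As a sketch, assume (this is not restated in this section) that \(H\) is twice continuously differentiable in \(p\) with \(H_{pp}\) bounded on the range of the gradients involved. Using \(Du^n-Du=Dv^n\), \(Du^n-Du^{n-1}=Dv^n-Dv^{n-1}\) and \(Du-Du^{n-1}=-Dv^{n-1}\), the Hamiltonian terms can be regrouped as

$$\begin{aligned} \begin{aligned}&H_p(x,Du)Dv^n+H(x,Du)-H(x,Du^{n-1})-H_p(x,Du^{n-1})(Dv^n-Dv^{n-1})\\ ={}&\big [H(x,Du)-H(x,Du^{n-1})-H_p(x,Du^{n-1})(Du-Du^{n-1})\big ]+\big (H_p(x,Du)-H_p(x,Du^{n-1})\big )Dv^n. \end{aligned} \end{aligned}$$

The bracket is the first-order Taylor remainder of \(H(x,\cdot )\) at \(Du^{n-1}\), so it is bounded by \(C\vert Dv^{n-1}\vert ^2\). The last term is bounded by \(C\vert Dv^{n-1}\vert \vert Dv^n\vert \) by the mean value theorem applied to \(H_p(x,\cdot )\).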

Moreover, by

$$\begin{aligned} \begin{aligned} b:={}&\rho ^n(H_p(x,Du)-H_p(x,Du^{n-1}))+\rho ^{n-1}H_{pp}(x,Du)Dv^n\\&\quad + m^{n-1}H_p(x,Du^{n-1}) -mH_p(x,Du) \\&\quad -(m^{n-1}-m) H_p(x,Du^{n-1})-m^{n-1}H_{pp}(x,Du^{n-1})(Du^{n-1}-Du), \end{aligned} \end{aligned}$$

we obtain

$$\begin{aligned} \begin{aligned}&\Vert b\Vert _{{\mathcal {C}}^{0}}\\&\quad \le {}C\left( \Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}\Vert \rho ^{n}\Vert _{{\mathcal {C}}^{0}}\right. \\&\quad \left. + \Vert v^{n}\Vert _{{\mathcal {C}}^{1,0}}\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}+\Vert \rho ^{n-1}\Vert ^2_{{\mathcal {C}}^{0}}+\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}}+(1+\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}})\Vert v^{n-1}\Vert ^2_{{\mathcal {C}}^{1,0}}\right) . \end{aligned} \end{aligned}$$

Collecting the estimates of \(a\), \(b\) and \(c\), by Lemma 4.5 we obtain

$$\begin{aligned}{} & {} \Vert v^n\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^n\Vert _{{\mathcal {C}}^{0}} \le C\left( \Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}^2+\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}^4+\Vert \rho ^{n-1}\Vert ^2_{{\mathcal {C}}^{0}}+(\Vert v^{n-1}\Vert _{{\mathcal {C}}^{1,0}}\right. \\{} & {} \quad \left. +\Vert \rho ^{n-1}\Vert _{{\mathcal {C}}^{0}})(\Vert v^{n}\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^{n}\Vert _{{\mathcal {C}}^{0}})\right) . \end{aligned}$$

We omit the rest of the proof, as it is very similar to that of Theorem 3.2. \(\square \)
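
For the reader's convenience, one possible way to pass from the last inequality to the quadratic rate is sketched here. The argument in Theorem 3.2 may be organized differently, and the notation \(e_n\) and the explicit threshold on \(\eta \) are introduced only for this illustration. Set \(e_n:=\Vert v^n\Vert _{{\mathcal {C}}^{1,0}}+\Vert \rho ^n\Vert _{{\mathcal {C}}^{0}}\) and suppose \(e_{n-1}\le \eta \) with \(\eta \le \min \{1,\frac{1}{8C}\}\). Since \(e_{n-1}\le 1\), the fourth-power term is bounded by \(e_{n-1}^2\), and the last inequality gives

$$\begin{aligned} e_n\le C\big (2e_{n-1}^2+e_{n-1}e_n\big )\le 2Ce_{n-1}^2+\frac{1}{2}e_n, \qquad \text {hence}\qquad e_n\le 4Ce_{n-1}^2\le 4C\eta \,e_{n-1}\le \frac{1}{2}e_{n-1}. \end{aligned}$$

In particular \(e_n\le \eta \), so by induction the bound \(e_n\le 4Ce_{n-1}^2\) holds for every \(n\ge 1\) whenever \(e_0\le \eta \). This gives \(e_n\rightarrow 0\) with a quadratic rate.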

Remark 4.7

The monotonicity conditions in Eq. (4.2) guarantee uniqueness of the solution to the MFG system with nonlocal coupling. If we do not assume Eq. (4.2), the result and the method of proof in this section can be adapted to prove local convergence to a stable solution of a potential MFG. Recall that, for a potential MFG, a solution \((u,m)\) is said to be stable if the only solution to the MFG system linearized at \((u,m)\) is the trivial one (see [12]). In other words, instead of proving that \((v,\rho )=0\) as in Lemma 4.4, we take this property as part of the definition of a stable solution. We plan to study this problem in the future.

Remark 4.8

In this paper, we have treated MFGs with local and nonlocal couplings separately for simplicity. One can easily replace one or both of the terms \(f\) and \(g\) in Eq. (1.1) by nonlocal couplings and obtain results similar to Proposition 2.4 and Theorem 3.2. One can also consider non-separable Hamiltonians with nonlocal congestion, e.g., replacing \(m\) in \(H(x,m,p)\) by a convolution with some kernel, as in [6]. However, even though the existence of solutions to this type of MFG has been demonstrated in [6], it is not clear how to apply the Hessian condition Eq. (2.5) to show the uniqueness of a global (in time) classical solution and the stability property (Lemma 2.6) in the nonlocal congestion case. We leave further developments in this direction for the future.