1 Introduction

Systems of partial differential equations with cross-diffusion have developed into a large field of research in the last decades. Cross-diffusion, the phenomenon in which the gradient in the concentration of one species causes a flux of another species, appears in various applications such as the modelling of population dynamics, e.g. [7, 8, 9, 21, 33], or electrochemistry, e.g. [5]. Another important biological field that is mathematically described by systems with cross-diffusion is that of cell-sorting or chemotaxis-like problems, e.g. [29, 30]. Chemotaxis denotes the process of cell movement provoked by chemical signals. Classical examples involve pattern formation of bacteria, e.g. [22, 36], or biomedical processes such as tumour invasion, e.g. [14, 16]. For more detailed background information regarding the biological and modelling processes, we refer the reader to [28].

In the present work, we study cross-diffusion systems that are dominated by linear diffusion. More precisely, we study a system of diffusion equations that are coupled through nonlinear reaction terms

$$\begin{aligned} \partial _t w_i - \Delta w_i = \nabla \cdot F_i(w,\nabla w) \quad&\text { in } (0,\infty )\times {\mathbb {T}}^n,&i=1,\dots ,d. \end{aligned}$$
(1)

Here, \(w_i\) is the mass density, concentration or volumic fraction of the ith species, depending on the particular model under consideration. We choose the reaction term in divergence form for mathematical convenience. This way, the evolution is conservative, i.e. the total masses \(\int _{{\mathbb {T}}^n} w_i\,\mathrm{d}x\) are preserved over time. If the reaction originates from (nonlinear) drift or diffusion processes in the absence of external forces, it can be modelled by \(F_i(w, \nabla w) = \sum _j A_{ij}(w)\nabla w_j\) for some matrices \(A(w) = \{A_{ij}(w)\}\). We suppose that the matrix is nonlinear and Lipschitz, in the sense that

$$\begin{aligned} |A_{ij}(w)| \lesssim |w|^{\mu } \ \ \ \text { and } \ \ \ |A_{ij}(w)-A_{ij}(v)|\lesssim \max \{ |w|^{\nu },|v|^{\nu }\} |w-v|, \ \ \ \forall 1\le i,j \le d, \end{aligned}$$
(2)

for some positive real numbers \(\mu \) and \(\nu \). For mathematical convenience, we choose to work on the n-dimensional torus and thus neglect any boundary effects. Furthermore, we equip system (1) with initial data \(h_1,\dots , h_d\).
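The conservation property can be checked directly. The following is a minimal sketch, assuming that w is smooth enough to differentiate under the integral: integrating (1) over the torus and using that integrals of divergences vanish on \({\mathbb {T}}^n\) gives

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t} \int \nolimits _{{\mathbb {T}}^n} w_i \,\mathrm{d}x = \int \nolimits _{{\mathbb {T}}^n} \nabla \cdot \big ( \nabla w_i + F_i(w,\nabla w)\big ) \,\mathrm{d}x = 0, \qquad i=1,\dots ,d, \end{aligned}$$

so that the total mass of each species remains equal to that of its initial datum \(h_i\).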

In its full generality, it is very challenging to study the well-posedness for (1) without further assumptions. In the present work, our goal is to exploit the particular structure of the nonlinearity, in order to derive a well-posedness result for weak solutions with small initial data.

Theorem 1.1

For every sufficiently small set of initial data \(h=(h_1,\dots ,h_d)\), there exists a solution \(w=(w_1,\dots ,w_d)\) to system (1). The solution is unique in the class of functions satisfying

$$\begin{aligned} \left\Vert w\right\Vert _{L^\infty }+\sup _t \sqrt{t} \left\Vert \nabla w(t)\right\Vert _{L^{\infty }} \lesssim \left\Vert h\right\Vert _{L^{\infty }}. \end{aligned}$$
(3)

Moreover, if \({\tilde{w}}=(\tilde{w}_1,\dots ,\tilde{w}_d)\) is another solution with initial data \(\tilde{h}=(\tilde{h}_1,\dots ,\tilde{h}_d)\), it holds that

$$\begin{aligned} \left\Vert w-\tilde{w}\right\Vert _{L^{\infty }} \lesssim \left\Vert h-\tilde{h}\right\Vert _{L^{\infty }}. \end{aligned}$$
(4)

In fact, our result is a bit stronger, in the sense that we consider a class of functions that is slightly larger than (3). The corresponding function space is defined via suitable Carleson measures or, a little more accurately, via \(L^{\infty }\) norms of certain Hardy–Littlewood maximal functions. We will discuss these spaces and their origin later in Sect. 2. A more detailed version of Theorem 1.1 will be given in Theorem 2.1. Moreover, we will see that our solutions w are of class \(C^m\), class \(C^{\infty }\) or analytic if \(F(w,\nabla w)\) is of the corresponding class \(C^m\), \(C^{\infty }\) or \(C^{\omega }\) as well. Estimates analogous to the gradient estimate (3) also hold for all derivatives in time and space,

$$\begin{aligned} \sup _{t,x}\, {t }^{k + \frac{|\beta |}{2}} |\partial _t^k\partial _x^{\beta } w(t,x)| \lesssim \left\Vert h\right\Vert _{L^{\infty }}, \end{aligned}$$

for any \(k\in {\mathbb {N}}_0\) and any \(\beta \in {\mathbb {N}}_0^n\) such that the derivatives exist, see Theorem 2.2.

Remark

In this work, we write \(x \lesssim y\) if \(x\le Cy\) holds for some positive constant \(C< \infty \). For the arguments used, the precise values of these constants are irrelevant.

Our choice to work in the setting of bounded functions is motivated in particular by the following specific example, which evidently belongs to the class of cross-diffusion systems modelled in (1), (2). We study a cross-diffusion system that can be modelled by a multi-dimensional advection–diffusion equation with linear drift and diffusion matrices,

$$\begin{aligned} \partial _t u_i = \nabla \cdot \left[ \sum \limits _{j=1, j \ne i}^d K_{ij}(u_j\nabla u_i - u_i \nabla u_j)\right] \quad&\text { in } (0,\infty )\times {\mathbb {T}}^n,&i=1,\dots ,d. \end{aligned}$$
(5)

This system describes the evolution of d different species, and \(u_i(t,x)\) plays the role of the density or volumic fraction of the ith species at time t and point x. The \(K_{ij}\)’s are the cross-diffusion coefficients, which relate the gradient of the jth species’ concentration with the flux of the ith species’ concentration. To illustrate the structure of (5), we note that the evolution of the ith species can be rewritten as the linear conservative advection–diffusion equation

$$\begin{aligned} \partial _t u_i + \nabla \cdot (b u_i) = \nabla \cdot (a\nabla u_i), \end{aligned}$$

in which the diffusion coefficient a is proportional to the concentration of the concurrent species, while the advecting velocity field b is linearly dependent on their concentration gradients. The system can be derived as a formal limit from a hopping model with size exclusion, see [6]. It was recently studied mathematically in [4].
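For concreteness, comparing the two formulations suggests the following identification. The symbols a and b are those of the advection–diffusion equation above; the explicit formulas are our own reading of (5) and are not quoted from [6] or [4]. For the ith species,

$$\begin{aligned} a = \sum \limits _{j=1, j \ne i}^d K_{ij} u_j, \qquad b = \sum \limits _{j=1, j \ne i}^d K_{ij} \nabla u_j, \qquad \nabla \cdot (a\nabla u_i) - \nabla \cdot (b u_i) = \nabla \cdot \left[ \sum \limits _{j=1, j \ne i}^d K_{ij}(u_j\nabla u_i - u_i \nabla u_j)\right] . \end{aligned}$$

Note that both a and b depend on the index i through the range of summation.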

Since the solution \(u_i\) for \(i=1,\dots ,d\) represents the volumic fraction of the ith species, it is reasonable to require that the solutions form a partition of unity,

$$\begin{aligned} \forall 1\le i \le d,\ \ \ u_i(t,x) \ge 0 \ \ \ \text { and } \ \ \ \sum \limits _{i=1}^d u_i(t,x)=1 \ \ \ \text { in } (0,\infty ) \times {\mathbb {T}}^n. \end{aligned}$$
(6)

The same condition thus has to be satisfied by the initial data \(g= (g_1,\dots ,g_d)\), that is,

$$\begin{aligned} \forall 1\le i \le d,\ \ \ g_i(x) \ge 0 \ \ \ \text { and } \ \ \ \sum \limits _{i=1}^d g_i(x)=1 \ \ \ \text { in }{\mathbb {T}}^n. \end{aligned}$$
(7)

In order to ensure that the partition condition in (6) is satisfied, even on a formal level, it is necessary to impose that the diffusion coefficients are symmetric in the sense that

$$\begin{aligned} K_{ij} = K_{ji}\quad \text{ for } \text{ all } 1\le i \ne j\le d. \end{aligned}$$
(8)
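To see why symmetry enters, one can sum (5) over i. The following formal computation is a sketch for smooth solutions; it pairs the terms with indices (i, j) and (j, i):

$$\begin{aligned} \partial _t \sum \limits _{i=1}^d u_i = \nabla \cdot \left[ \sum \limits _{i=1}^d \sum \limits _{j=1, j \ne i}^d K_{ij}(u_j\nabla u_i - u_i \nabla u_j)\right] = \nabla \cdot \left[ \sum \limits _{1\le i<j\le d} (K_{ij}-K_{ji})(u_j\nabla u_i - u_i \nabla u_j)\right] . \end{aligned}$$

Under (8) the right-hand side vanishes, so the sum of the \(u_i\) is constant in time at every point and equals one if the initial data satisfy (7). The nonnegativity in (6) is a separate matter that this computation does not address.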

Even for this specific model, proving uniqueness and pointwise bounds as in (6) is rather challenging. In the following, inspired by [4], we will restrict our attention to the case in which the cross-diffusion coefficients \(K_{ij}\) satisfy certain closeness assumptions. This way, despite the constraint in (6), we are in a situation in which our system under consideration is equivalent to that in (1), (2), and thus, Theorem 1.1 applies.

To be more specific, thanks to the partition condition in (6), we can elegantly generate a linear diffusion term in (5),

$$\begin{aligned} \partial _t u_i - K \Delta u_i = \nabla \cdot \left[ \sum \limits _{j=1, j \ne i}^d( K_{ij}-K)(u_j \nabla u_i - u_i \nabla u_j) \right]&\text { in } (0,\infty )\times {\mathbb {T}}^n,&i=1,\dots ,d, \end{aligned}$$
(9)

for any positive constant K. In order to treat the right-hand side as a perturbation, we have to assume that the coefficients are sufficiently close to each other. This is achieved, for instance, by choosing

$$\begin{aligned} K :=\frac{1}{2} \left( \max \limits _{1\le i \ne j \le d} K_{ij} + \min \limits _{1\le i \ne j \le d} K_{ij}\right) , \end{aligned}$$

and demanding that

$$\begin{aligned} \max \limits _{1\le i \ne j \le d} |K_{ij}-K|\ll K/d. \end{aligned}$$
(10)

This assumption enables us to translate (5) or (9) into a diffusion-dominant system, see Sect. 2.
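The linear diffusion term in (9) can be traced back to the partition condition. A short sketch, using only the identity \(\sum _{j\ne i} u_j = 1-u_i\) from (6):

$$\begin{aligned} \sum \limits _{j=1, j \ne i}^d K (u_j\nabla u_i - u_i \nabla u_j) = K\big [ (1-u_i)\nabla u_i - u_i \nabla (1-u_i)\big ] = K\big [ (1-u_i)\nabla u_i + u_i \nabla u_i\big ] = K\nabla u_i. \end{aligned}$$

Writing \(K_{ij} = K + (K_{ij}-K)\) in (5) and taking the divergence then yields (9).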

Theorem 1.1 provides us, by a scaling argument, with a unique solution to (9) in the class of functions satisfying

$$\begin{aligned} \Vert u\Vert _{L^{\infty }}+ \sup _t \sqrt{Kt} \left\Vert \nabla u(t)\right\Vert _{L^\infty } \lesssim 1. \end{aligned}$$
(11)

In fact, we will see that this system can be transferred back into the original cross-diffusion system (5), (6). We thus have the following well-posedness result.

Theorem 1.2

Suppose that the coefficients \(K_{ij}\) are symmetric and sufficiently close to each other in the sense of (8) and (10). Then, for every set of initial data \(g_1,\dots ,g_d \) satisfying (7), there exists a smooth solution \(u_1,\dots ,u_d\) to the cross-diffusion system (5), (6). This solution is unique in the class of functions satisfying (11). Moreover, solutions are stable in the sense of (4).

Remark 1

We remark that Theorem 1.2 (as well as Theorems 2.1 and 2.3) is also valid for more general classes of cross-diffusion coefficients that vary in space and time, \(K_{ij}=K_{ij}(t,x)\), as long as (8) and (10) remain true.

We note that solutions are automatically bounded thanks to the modelling assumption (6), which makes \(L^{\infty }\) a natural space for the study of well-posedness. Moreover, the gradient estimate (3) or (11) is natural in this perturbative setting (10), as it is the standard gradient estimate for the homogeneous heat equation with \(L^{\infty }\) data—observe that the control over the gradient deteriorates as \(t\rightarrow 0\) with a rate proportional to the diffusion length. In this sense, we consider the conditions for well-posedness imposed in the present paper as optimal. Our well-posedness result for the system under consideration improves upon earlier results which require the solutions and data to be of higher regularity [4].

We finally remark that, in general, the analytic treatment of many cross-diffusion problems in the form of

$$\begin{aligned} \partial _t u - \nabla \cdot \big ( a(u)\nabla u\big ) = f(u) \end{aligned}$$

can be very challenging since the diffusion matrix a(u) need be neither symmetric nor positive definite, which makes it hard to ensure modelling assumptions such as (6). Another difficulty lies in the absence of a maximum principle or general parabolic regularity theory if the diffusion matrix is not diagonal. Sufficient conditions for the global existence of weak or strong solutions of nonlinear parabolic equations are obtained, for instance, in [1, 11, 20, 26, 31]. The problem of uniqueness is in general much harder. For mildly coupled cross-diffusion equations, uniqueness has been proved by duality methods [13, 19, 27]. In some situations, the structure of the equations also allows for the application of entropy methods [10, 21, 37]. We finally mention results on weak–strong uniqueness in [4, 11, 15].

The paper is organized as follows: In Sect. 2, we introduce and discuss the precise function spaces in which we establish well-posedness. Section 3 is devoted to the study of the linear problem in these spaces. In Sect. 4, we come back to the nonlinear problem and provide the proofs of the main theorems.

2 Reformulation and results

The systems that we investigate in this work can be considered as nonlinear perturbations of multi-dimensional heat equations. Moreover, the particular (semilinear) structure of the nonlinearity considered in (1), more precisely, the properties formulated in (2), which are in turn motivated by the particular example mentioned in (5) or (9), leads to the study of bounded solutions to the respective equations in a natural way. Indeed, for any “well-behaved” norm \(\Vert \cdot \Vert \) for which we have maximal regularity estimates for the heat equation, we expect that

$$\begin{aligned} \Vert \nabla w\Vert \lesssim \Vert F(w,\nabla w)\Vert + \Vert h\Vert \lesssim \Vert |w|^{\mu }\nabla w\Vert + \Vert h\Vert \lesssim \Vert w\Vert _{L^{\infty }}^{\mu } \Vert \nabla w\Vert + \Vert h\Vert \end{aligned}$$

by virtue of (2), and the nonlinear term on the right-hand side can be absorbed into the left-hand side provided that \(\Vert w\Vert _{L^{\infty }}\) is sufficiently small. We are thus led to considering \(\Vert h\Vert = \Vert h\Vert _{L^{\infty }}\) in the case of the initial datum, a choice that is consistent with the partition of unity condition imposed in (6), (7). The space–time maximal regularity norm has to be scale-invariant accordingly. Following [25], we use the following (semi-)norms, which are motivated by Carleson measure characterizations of the BMO space, see Theorem 3 of Chapter 4.4 in [34].

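For notational convenience, we denote hereinafter the average integral \(\frac{1}{|\Omega |}\int _{\Omega }\), where \(|\Omega |\) is the volume of \(\Omega \), by \(\mathop{\,\rlap{-}\!\!\int }\nolimits _{\Omega }\). Given functions \(w:(0,\infty ) \times {\mathbb {T}}^n \rightarrow {\mathbb {R}}\) and \(F: (0,\infty ) \times {\mathbb {T}}^n \rightarrow {\mathbb {R}}^n\) and \(p\in (1,\infty )\), we define

$$\begin{aligned} \left\Vert w\right\Vert _{X^p}&:=\left\Vert w\right\Vert _{L^{\infty }((0,\infty )\times {\mathbb {T}}^n)} + \left\Vert w\right\Vert _{\dot{X}^p}, \qquad \left\Vert w\right\Vert _{\dot{X}^p} :=\sup \limits _{z\in {\mathbb {T}}^n,\, R>0} R \left( \mathop{\,\rlap{-}\!\!\int }\nolimits _{Q_R(z)} |\nabla w|^p \,\mathrm{d}x\mathrm{d}t\right) ^{\frac{1}{p}},\\ \left\Vert F\right\Vert _{Y^p}&:=\sup \limits _{z\in {\mathbb {T}}^n,\, R>0} R \left( \mathop{\,\rlap{-}\!\!\int }\nolimits _{Q_R(z)} |F|^p \,\mathrm{d}x\mathrm{d}t\right) ^{\frac{1}{p}}, \end{aligned}$$

where \(Q_R(z) :=[\frac{R^2}{2},R^2] \times B_R(z) \subseteq {\mathbb {R}}\times {\mathbb {R}}^n\). If necessary, we identify w or F with its spatially periodic extension. Based on these norms, we define two Banach spaces \(X^p\) and \(Y^p\) by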

$$\begin{aligned}&X^p:=\bigl \{ w:(0,\infty ) \times {\mathbb {T}}^n \rightarrow {\mathbb {R}}\mid \left\Vert w\right\Vert _{X^p}< \infty \bigr \} \text { and }\\&Y^p :=\bigl \{ F : (0,\infty ) \times {\mathbb {T}}^n \rightarrow {\mathbb {R}}^n \mid \left\Vert F\right\Vert _{Y^p} < \infty \bigr \}. \end{aligned}$$

The underlying concept of using such norms was introduced in [25], in order to prove well-posedness for the Navier–Stokes equations with small initial data in BMO\(^{-1}\). This concept was further developed in order to establish existence and uniqueness results for various (degenerate) parabolic equations, including geometric flows with rough data [24, 35], the porous medium equation [23], the thin film equation [18, 32] and the Landau–Lifshitz–Gilbert equation [17].
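To connect these spaces with the class (3) of Theorem 1.1, here is a brief check. It is a sketch under the assumption that the Carleson part of the \(X^p\) norm is the scaled average \(R\,\big (\frac{1}{|Q_R(z)|}\int _{Q_R(z)}|\nabla w|^p\big )^{1/p}\), which is how this quantity is used in Sect. 3. Since \(t\ge \frac{R^2}{2}\) on \(Q_R(z)\), we have \(R\le \sqrt{2t}\) there, and hence

$$\begin{aligned} R \left( \frac{1}{|Q_R(z)|}\int \nolimits _{Q_R(z)} |\nabla w|^p \,\mathrm{d}x\mathrm{d}t\right) ^{\frac{1}{p}} \le R \sup \limits _{t\in [\frac{R^2}{2},R^2]} \left\Vert \nabla w(t)\right\Vert _{L^{\infty }} \le \sqrt{2}\, \sup \limits _{t} \sqrt{t} \left\Vert \nabla w(t)\right\Vert _{L^{\infty }}. \end{aligned}$$

Every function in the class (3) therefore has finite \(X^p\) norm, which is consistent with the statement that the \(X^p\) class is the larger one.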

By a slight abuse of notation, we generalize these norms and spaces to vector or matrix valued functions by setting

$$\begin{aligned} \left\Vert w\right\Vert _{X^p} :=\max \limits _{i=1,\dots ,d} \left\Vert w_i\right\Vert _{X^p}, \\ \left\Vert F\right\Vert _{Y^p} :=\max \limits _{i=1,\dots ,d} \left\Vert F_i\right\Vert _{Y^p}, \\ \left\Vert h\right\Vert _{L^{\infty }} :=\max \limits _{i=1,\dots ,d} \left\Vert h_i\right\Vert _{L^{\infty }}, \end{aligned}$$

for tuples \(w=(w_1,\dots ,w_d)\), \(F=(F_1,\dots ,F_d)\), and \(h=(h_1,\dots ,h_d)\).

We are now in the position to present our first result (Theorem 1.1) in a more precise manner.

Theorem 2.1

Suppose that (2) holds and let \(p > n+2\). There exist \(\delta _0 > 0\) and \(C >0\) such that for every \(\delta \le \delta _0\) and every initial data h with \(\left\Vert h\right\Vert _{L^{\infty } }\le \delta \), there exists a unique solution w to the system (1) in the class \(\left\Vert w\right\Vert _{X^p} \le C\delta \). Moreover, if \({\tilde{w}}\) is another solution with initial datum \({\tilde{h}}\), it holds that

$$\begin{aligned} \Vert w-{\tilde{w}}\Vert _{X^p} \lesssim \Vert h-{\tilde{h}}\Vert _{L^{\infty }}. \end{aligned}$$
(12)

Under additional assumptions concerning the nonlinearity F, we are able to show higher regularity of the solutions.

Theorem 2.2

Let A be of class \(C^m\), of class \(C^\infty \) or analytic. Then there exists \(\delta _0>0\), possibly smaller than the one needed in Theorem 2.1, such that the dependence of the solution w from Theorem 2.1 on the initial data h is of class \(C^m\), class \(C^\infty \) or analytic. Moreover, the solution is of class \(C^m\), class \(C^\infty \) or analytic in time and space. For every \(k \in {\mathbb {N}}_0\) and every multi-index \(\beta \in {\mathbb {N}}_0^n\) such that the derivative exists, it holds that

$$\begin{aligned} \sup _{t,x} t^{k+\frac{|\beta |}{2}}\big |\partial _t^k\partial _x^{\beta }w(t,x)\big | \lesssim \left\Vert h\right\Vert _{L^{\infty } }. \end{aligned}$$
(13)

In the analytic case, there exist constants \(\Gamma >0\) and \(C>0\) independent of k and \(\beta \) such that

$$\begin{aligned} \sup _{t,x}t^{k+\frac{|\beta |}{2}}\big |\partial _t^k\partial _x^{\beta }w(t,x)\big | \le C \Gamma ^{k+|\beta |}k!\beta !\left\Vert h\right\Vert _{L^{\infty } }, \end{aligned}$$
(14)

for every \(k \in {\mathbb {N}}_0\) and every multi-index \(\beta \in {\mathbb {N}}_0^n\).

We finally turn to the explicit system given in (5), (6) and show how it fits into the general framework considered in Theorems 2.1 and 2.2. We have already seen that under the partition condition in (6), (5) is equivalent to (9). Our goal is to transfer the latter into a diffusion-dominated system with small initial data. By rescaling time, the diffusivity constant on the left-hand side can be absorbed into the cross-diffusion coefficients; that is, we consider

$$\begin{aligned} \partial _t u_i - \Delta u_i = \nabla \cdot \left[ \sum \limits _{j=1, j \ne i}^d \delta _{ij}(u_j \nabla u_i - u_i \nabla u_j) \right]&\text { in } (0,\infty )\times {\mathbb {T}}^n,&i=1,\dots ,d, \end{aligned}$$

with coefficients \(\delta _{ij} :=\frac{K_{ij}}{K}-1\). At this point, we note that the scaling factor K has to be positive. The closeness condition (10) now translates into the smallness condition \(\delta :=\max \limits _{1\le i \ne j \le d} |\delta _{ij}| \ll \frac{1}{d}\) on the new coefficients. We now use the nonlinearity of the equation to shift the smallness condition further to the initial datum. This is achieved by setting \(w_i:=\delta u_i\) and \(h_i:=\delta g_i\). The new partition conditions are thus

$$\begin{aligned} \forall 1\le i \le d,\ \ \ w_i(t,x) \ge 0 \ \ \ \text { and } \ \ \ \sum \limits _{i=1}^d w_i(t,x)=\delta \ \ \ \text { in } (0,\infty ) \times {\mathbb {T}}^n, \end{aligned}$$
(15)

and

$$\begin{aligned} \forall 1\le i \le d,\ \ \ h_i(x) \ge 0 \ \ \ \text { and } \ \ \ \sum \limits _{i=1}^d h_i(x)=\delta \ \ \ \text { in }{\mathbb {T}}^n, \end{aligned}$$
(16)

and the cross-diffusion equations become

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t w_i - \Delta w_i = \nabla \cdot \left[ \sum \limits _{j=1, j \ne i}^d \alpha _{ij} (w_j\nabla w_i - w_i \nabla w_j)\right] &{} \text { in } \left( 0,\infty \right) \times {\mathbb {T}}^n,\\ w_i(0,\cdot )=h_i &{} \text { in } {\mathbb {T}}^n, \end{array}\right. } \qquad&i = 1,\dots ,d, \end{aligned}$$
(17)

where the \(\alpha _{ij}\)’s are given by \(\frac{\delta _{ij}}{\delta }\) and are thus bounded, \(|\alpha _{ij}|\le 1\).
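The passage from the rescaled system to (17) can be verified in one line. A sketch, assuming \(\delta >0\) and using that the bracket is quadratic in u, so that substituting \(u_i = w_i/\delta \) produces a factor \(\delta ^{-2}\):

$$\begin{aligned} \partial _t w_i - \Delta w_i = \delta \, \nabla \cdot \left[ \sum \limits _{j=1, j \ne i}^d \delta _{ij}(u_j \nabla u_i - u_i \nabla u_j) \right] = \nabla \cdot \left[ \sum \limits _{j=1, j \ne i}^d \frac{\delta _{ij}}{\delta }(w_j \nabla w_i - w_i \nabla w_j) \right] . \end{aligned}$$

The bound \(|\alpha _{ij}|\le 1\) is then immediate from the definition of \(\delta \) as the maximum of the \(|\delta _{ij}|\).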

Remark

To be accurate, we have to exclude the case \(K_{ij} \equiv K\). Since we would obtain \(\delta =0\), the change of variables would not be permitted. However, this is not a significant restriction, because the cross-diffusion system would untangle into a system of d independent heat equations, which is much easier to solve.

We see that (17) has the same structure as our general model (1), where A(w) is given by

$$\begin{aligned} A_{ij}(w)={\left\{ \begin{array}{ll} \sum \limits _{k=1, k \ne i}^d \alpha _{ik}w_k &{}\text{ if } i=j,\\ -\alpha _{ij}w_i &{}\text{ if } i\not =j. \end{array}\right. } \end{aligned}$$

Evidently, (16) provides an upper bound for the initial data and the nonlinearity A(w) depends analytically on w. We are thus allowed to apply Theorems 2.1 and 2.2 and obtain well-posedness for (17) in the class \(X^p\) together with analyticity in time and space and analytic dependence on the initial data. It only remains to verify that solutions obey the partition condition (15); we provide the argument in Sect. 4, following [4]. Our result for the cross-diffusion system (5), (6), or equivalently, (17), (15), is thus the following.
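As a quick plausibility check of the structural assumptions, one can read off bounds of the type (2) for this particular matrix. The following is only a sketch; it uses \(|\alpha _{ik}|\le 1\) and the linearity of A in w:

$$\begin{aligned} |A_{ii}(w)| \le \sum \limits _{k=1, k \ne i}^d |w_k|, \qquad |A_{ij}(w)| \le |w_i| \ \text { for } i\ne j, \qquad |A_{ij}(w)-A_{ij}(v)| \le \sum \limits _{k=1}^d |w_k-v_k|. \end{aligned}$$

The growth bound thus holds with \(\mu =1\) and a constant depending on d, while the Lipschitz bound holds without any additional power of \(|w|\) or \(|v|\), which formally corresponds to the case \(\nu =0\).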

Theorem 2.3

Suppose that the coefficients \(K_{ij}\) are symmetric in the sense of (8). Let \(p > n+2\) be given. There exist \(\delta _0 > 0\) and \(C >0\) such that for every \(\delta \le \delta _0\) and every \(h \in L^{\infty }\) with (16), there exists a unique solution w to the system (17), (15) in the class \(\left\Vert w\right\Vert _{X^p} \le C\delta \). Moreover, the solution depends analytically on time, space and the initial data, and estimates (14) and (12) hold.

3 Linear theory

Our proof of Theorem 2.1 is based on a fixed point argument. We will thus start with the study of the linear problem. Our goal in this section is to prove the following maximal regularity estimate.

Proposition 3.1

Let w be a solution of the inhomogeneous heat equation

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t w - \Delta w = \nabla \cdot F &{} \text { in } \left( 0, \infty \right) \times {\mathbb {T}}^n\\ w(0,\cdot )=h &{} \text { in } {\mathbb {T}}^n. \end{array}\right. } \end{aligned}$$
(18)

Then, it holds that

$$\begin{aligned} \left\Vert w\right\Vert _{X^p} \lesssim \left\Vert F\right\Vert _{Y^p} + \left\Vert h\right\Vert _{L^{\infty }}. \end{aligned}$$

The argument for establishing this proposition is similar to those in [18, 23, 24, 32].

It will be convenient to translate the problem onto the full space by extending all involved functions periodically from \({\mathbb {T}}^n\) to \({\mathbb {R}}^n\). It is clear that the corresponding norms remain unchanged under periodic extension.

We denote the heat kernel in \({\mathbb {R}}^n\) by \(\Phi \), i.e. \(\Phi (t,x)=(4\pi t)^{-\frac{n}{2}} \mathrm{e}^{-\frac{|x|^2}{4t}}\), so that solutions to (18) have the representation

$$\begin{aligned} w(t,x)&= \int \nolimits _{{\mathbb {R}}^n} \Phi (t,x-y)h(y)\mathrm{d}y + \int \nolimits _0^t \int \nolimits _{{\mathbb {R}}^n} \nabla \Phi (t-s,x-y) \cdot F(s,y)\mathrm{d}y\mathrm{d}s\\&=: {\tilde{w}}(t,x) + {\hat{w}}(t,x). \end{aligned}$$
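The gradient on the heat kernel in the second term comes from an integration by parts. A sketch, assuming that F is smooth and bounded so that no boundary terms appear: Duhamel's formula for (18) gives the inhomogeneous part as the space–time convolution of \(\Phi \) with \(\nabla \cdot F\), and for each fixed s,

$$\begin{aligned} \int \nolimits _{{\mathbb {R}}^n} \Phi (t-s,x-y)\, \nabla _y \cdot F(s,y)\,\mathrm{d}y = - \int \nolimits _{{\mathbb {R}}^n} \nabla _y \big [ \Phi (t-s,x-y)\big ] \cdot F(s,y)\,\mathrm{d}y = \int \nolimits _{{\mathbb {R}}^n} \nabla \Phi (t-s,x-y) \cdot F(s,y)\,\mathrm{d}y, \end{aligned}$$

where the last step uses \(\nabla _y [\Phi (t-s,x-y)] = -(\nabla \Phi )(t-s,x-y)\).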

We will estimate the homogeneous part \({\tilde{w}}\) and the inhomogeneous part \({\hat{w}}\) separately. Before doing so, we recall a standard estimate on the gradient of the heat kernel.

Lemma 3.2

For every \(p\in [1,\infty ]\), it holds that

$$\begin{aligned} \left\Vert \nabla \Phi (t, \cdot )\right\Vert _{L^p({\mathbb {R}}^n)} ~ \lesssim ~ t^{-\frac{n}{2}-\frac{1}{2}+\frac{n}{2p}}. \end{aligned}$$

We provide the simple proof for the convenience of the reader.

Proof

For any \(\alpha >0\), the function \(y^{\alpha }\mathrm{e}^{-\frac{y}{2}}\) is bounded on \([0,\infty )\) and thus

$$\begin{aligned} y^{\alpha }\mathrm{e}^{-y} = y^{\alpha }\mathrm{e}^{-\frac{y}{2}}\mathrm{e}^{-\frac{y}{2}} \lesssim \mathrm{e}^{-\frac{y}{2}}. \end{aligned}$$

Using \(y=\frac{|x|^2}{4t}\) and \(\alpha = \frac{1}{2}\), we thus obtain the pointwise estimate

$$\begin{aligned} |\nabla \Phi (t,x)| = \frac{1}{(4 \pi t)^{\frac{n}{2}}} \frac{|x|}{2t} \mathrm{e}^{-\frac{|x|^2}{4t}} \lesssim \frac{1}{t^{\frac{n+1}{2}}}\mathrm{e}^{-c\frac{|x|^2}{t}}. \end{aligned}$$

This proves the case \(p=\infty \). For smaller values of p, using a change of variables, we compute

$$\begin{aligned} \left\Vert \nabla \Phi (t, \cdot )\right\Vert _{L^p({\mathbb {R}}^n)}^{p}&\lesssim t^{-\frac{np}{2}-p} \int \nolimits _{{\mathbb {R}}^n}|x|^p \mathrm{e}^{-\frac{p|x|^2}{4t}}\mathrm{d}x \\&\lesssim t^{-\frac{np}{2}-p+\frac{p}{2}+\frac{n}{2}}\int \nolimits _{{\mathbb {R}}^n}|y|^p \mathrm{e}^{-\frac{|y|^2}{4}}\mathrm{d}y \\&\lesssim t^{-\frac{np}{2}-\frac{p}{2}+\frac{n}{2}}. \end{aligned}$$

This proves the lemma. \(\square \)
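Two instances of Lemma 3.2 are worth recording explicitly. They are obtained simply by substituting \(p=1\) and \(p=\infty \) into the exponent:

$$\begin{aligned} \left\Vert \nabla \Phi (t, \cdot )\right\Vert _{L^1({\mathbb {R}}^n)} \lesssim t^{-\frac{1}{2}}, \qquad \left\Vert \nabla \Phi (t, \cdot )\right\Vert _{L^{\infty }({\mathbb {R}}^n)} \lesssim t^{-\frac{n+1}{2}}. \end{aligned}$$

The first bound is the source of the factor \(\sqrt{t}\) in the gradient estimate (3).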

We first turn to the estimate of the solution to the homogeneous problem \({\tilde{w}}\).

Lemma 3.3

It holds that \( \left\Vert \tilde{w}\right\Vert _{X^p} \lesssim \left\Vert {h}\right\Vert _{L^{\infty }}\).

Proof

The maximum principle for the heat equation immediately implies the bound on the \(L^{\infty }\) norm of \({\tilde{w}}\). In order to estimate the Carleson measure part of the \(X^p\) norm, we observe that

$$\begin{aligned} \sup \limits _{x \in {\mathbb {R}}^n, t\in [\frac{R^2}{2},R^2]} | \nabla \tilde{w}(t,x)|&\le \sup \limits _{x \in {\mathbb {R}}^n, t\in [\frac{R^2}{2},R^2]} \int \nolimits _{{\mathbb {R}}^n} | \nabla \Phi (t,y) {h}(x-y)|\mathrm{d}y \\&\le \sup \limits _{ t\in [\frac{R^2}{2},R^2]} \left\Vert \nabla \Phi (t, \cdot )\right\Vert _{L^1({\mathbb {R}}^n)} \left\Vert {h}\right\Vert _{L^{\infty }({\mathbb {R}}^n)}\\&\lesssim \frac{1}{\sqrt{R^2}} \left\Vert {h}\right\Vert _{L^{\infty }({\mathbb {R}}^n)}, \end{aligned}$$

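due to Lemma 3.2. Using this estimate, we get

$$\begin{aligned} R \left( \frac{1}{|Q_R(z)|}\int \nolimits _{Q_R(z)} |\nabla \tilde{w}|^p \,\mathrm{d}x\mathrm{d}t\right) ^{\frac{1}{p}} \le R \sup \limits _{x \in {\mathbb {R}}^n, t\in [\frac{R^2}{2},R^2]} | \nabla \tilde{w}(t,x)| \lesssim \left\Vert {h}\right\Vert _{L^{\infty }({\mathbb {R}}^n)}, \end{aligned}$$

which proves the lemma. \(\square \)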

Lemma 3.4

It holds that \(\left\Vert \hat{w}\right\Vert _{X^p} \lesssim \left\Vert {F}\right\Vert _{Y^p}\).

Proof

We start with the bound on the \(L^{\infty }\)-norm of \(\hat{w}\). We set \(R=\sqrt{t}\) and split the space–time integral into a diagonal and an off-diagonal part,

$$\begin{aligned} |\hat{w}(t,x)|&\le \big | \int \nolimits _{Q_R(x)} \nabla \Phi (t-s,x-y)\cdot {F}(s,y)\mathrm{d}y\mathrm{d}s \big |\\&\qquad + \big | \int \limits _{[0,R^2]\times {\mathbb {R}}^n \setminus Q_R(x)} \nabla \Phi (t-s,x-y) \cdot {F}(s,y)\mathrm{d}y\mathrm{d}s\big |\\&=:A+B. \end{aligned}$$

To bound the diagonal part, we use Hölder’s inequality and get

$$\begin{aligned} A \le \left\Vert \nabla \Phi \right\Vert _{L^q([0, \frac{R^2}{2}]\times B_R(0))} \left\Vert {F}\right\Vert _{L^p(Q_R(x))}, \end{aligned}$$

for any Hölder conjugates p and q. We have to choose q small enough such that the \(L^q\)-norm of \(\nabla \Phi \) is finite. From Lemma 3.2, we get

$$\begin{aligned} \left( \int \nolimits _0^{\frac{R^2}{2}} \int \nolimits _{B_R(0)} | \nabla \Phi |^q\mathrm{d}x\mathrm{d}s\right) ^{\frac{1}{q}} \lesssim \left( \int \nolimits _0^{\frac{R^2}{2}} t^{-\frac{nq}{2}- \frac{q}{2}+ \frac{n}{2}}\mathrm{d}t\right) ^{\frac{1}{q}}. \end{aligned}$$

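The right-hand side is finite if and only if \(-\frac{nq}{2}- \frac{q}{2}+ \frac{n}{2} >-1\), that is, \(q<\frac{n+2}{n+1}\), which is equivalent to requiring that \(p >n+2\), as in the assumption of Theorem 2.1. We evaluate the integral on the right-hand side and obtain for the diagonal part of \({\hat{w}}\) that

$$\begin{aligned} A \lesssim R^{1-\frac{n+2}{p}} \left\Vert {F}\right\Vert _{L^p(Q_R(x))} \lesssim R \left( \frac{1}{|Q_R(x)|}\int \nolimits _{Q_R(x)} |F|^p \,\mathrm{d}y\mathrm{d}s\right) ^{\frac{1}{p}} \le \left\Vert {F}\right\Vert _{Y^p}. \end{aligned}$$

Let us now consider the off-diagonal term B. Applying elementary arguments, we observe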

$$\begin{aligned} B \lesssim \int \nolimits _{[0,R^2]\times {\mathbb {R}}^n} \frac{1}{R^{n+1}}\mathrm{e}^{-c\frac{|x-y|}{R}}|{F}(s,y)|\mathrm{d}y\mathrm{d}s. \end{aligned}$$
(19)

In order to control the term on the right by the Carleson measure expression which defines the \(Y^p\) norm, we have to invoke a covering argument. Using the triangle inequality and the fact that \(\sum \limits _{\tilde{x}\in R \cdot {\mathbb {Z}}^n} \mathrm{e}^{-c\frac{|x-\tilde{x}|}{R}}\) is controlled by a constant only depending on the dimension n, we notice that

$$\begin{aligned} B&\le \sum \limits _{m=0}^{\infty } \sum \limits _{\tilde{x} \in R\cdot {\mathbb {Z}}^n} \int \nolimits _{R^2\cdot 2^{-(m+1)}}^{R^2 \cdot 2^{-m}} \int \nolimits _{B_R(\tilde{x})} \mathrm{e}^{-c\frac{|x-y|}{R}} \frac{1}{R^{n+1}}| {F}(s,y)| \mathrm{d}y\mathrm{d}s \\&\lesssim \sum \limits _{m=0}^{\infty } \sum \limits _{\tilde{x} \in R \cdot {\mathbb {Z}}^n} \mathrm{e}^{-c\frac{|x-\tilde{x}|}{R}}\int \nolimits _{R^2\cdot 2^{-(m+1)}}^{R^2 \cdot 2^{-m}} \int \nolimits _{B_R(x-\tilde{x})} \frac{1}{R^{n+1}}| {F}(s,x-z)|\mathrm{d}z\mathrm{d}s\\ {}&\lesssim \sum \limits _{m=0}^{\infty } \sup \limits _{\tilde{x} \in R \cdot {\mathbb {Z}}^n} \int \nolimits _{R^2\cdot 2^{-(m+1)}}^{R^2 \cdot 2^{-m}} \int \nolimits _{B_R(\tilde{x})} \frac{1}{R^{n+1}}| {F}(s,y)|\mathrm{d}y\mathrm{d}s. \end{aligned}$$

Now we claim that there exists a constant \( 0< \gamma <1\) independent of m, such that

$$\begin{aligned} \int \nolimits _{R^2\cdot 2^{-(m+1)}}^{R^2 \cdot 2^{-m}} \int \nolimits _{B_R(\tilde{x})} \frac{1}{R^{n+1}}| {F}(s,y)|\mathrm{d}y\mathrm{d}s \lesssim \gamma ^m \left\Vert {F}\right\Vert _{Y^p}. \end{aligned}$$
(20)

This estimate directly implies that \(B \lesssim \left\Vert {F}\right\Vert _{Y^p}\) as a conclusion from the geometric series’ convergence, which in turn establishes the control of the \(L^{\infty }\) norm as desired.

To prove the claim in (20), we have to refine the spatial covering. Indeed, we cover the set \((R^2 \cdot 2^{-(m+1)}, R^2 \cdot 2^{-m})\times B_R(\tilde{x} )\) by about \(2^{\frac{mn}{2}}\) many cylinders \(Q_m(z)\) of the form \(Q_m(z) :=(R^2 \cdot 2^{-(m+1)}, R^2 \cdot 2^{-m})\times B_{R\cdot 2^{-\frac{m}{2}}}(z)\). We now obtain

$$\begin{aligned} \int \nolimits _{R^2\cdot 2^{-(m+1)}}^{R^2\cdot 2^{-m}} \int \nolimits _{B_R(\tilde{x})}\frac{1}{R^{n+1}}| {F}|\mathrm{d}y\mathrm{d}s&\lesssim 2^{\frac{nm}{2}} \frac{1}{R^{n+1}}\sup \limits _z \left\Vert {F}\right\Vert _{L^1(Q_m(z))}\\&\lesssim 2^{\frac{nm}{2}} \frac{1}{R^{n+1}} \big ( R \cdot 2^{-\frac{m}{2}} \big )^{n+2}\frac{1}{R2^{-\frac{m}{2}}} \left\Vert {F}\right\Vert _{Y^1} \\&=2^{-\frac{m}{2}} \left\Vert {F}\right\Vert _{Y^1}. \end{aligned}$$

Since \(\Vert F\Vert _{Y^1}\le \Vert F\Vert _{Y^p}\) by Jensen’s inequality, we see that \(\gamma = 2^{-\frac{1}{2}}\) is a valid constant.
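For the reader who wishes to check the bookkeeping, the powers of R and 2 in the last display combine as follows, and the resulting geometric series can be summed explicitly:

$$\begin{aligned} 2^{\frac{nm}{2}} \frac{1}{R^{n+1}} \big ( R \cdot 2^{-\frac{m}{2}} \big )^{n+2}\frac{1}{R2^{-\frac{m}{2}}} = 2^{\frac{nm}{2}}\, 2^{-\frac{(n+1)m}{2}} = 2^{-\frac{m}{2}}, \qquad \sum \limits _{m=0}^{\infty } 2^{-\frac{m}{2}} = \frac{1}{1-2^{-\frac{1}{2}}} = 2+\sqrt{2}. \end{aligned}$$

In particular, the constant in \(B \lesssim \left\Vert {F}\right\Vert _{Y^p}\) does not depend on R.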

It remains to estimate the Carleson measure part of the \(X^p\) norm. Again, we consider separately the diagonal and the off-diagonal contribution, this time, however, by distinguishing the two cases \({{\,\mathrm{supp}\,}}( {F})\subset [0,\infty )\times {\mathbb {R}}^n \setminus Q_R(z)\) and \({{\,\mathrm{supp}\,}}( {F})\subseteq \check{Q}_R(z) :=(\frac{R^2}{4},R^2)\times B_{2R}(z)\). The general case is obtained by a standard cut-off procedure via the triangle inequality.

Case 1: We assume \({{\,\mathrm{supp}\,}}( {F})\subset [0,\infty )\times {\mathbb {R}}^n \setminus Q_R(z)\).

Then, we get for the absolute value of \(\nabla \hat{w}(t,x)\) with \(R^2=t\):

$$\begin{aligned} |\nabla \hat{w}(t,x)|&\le \int \limits _{{\mathbb {R}}^n}\int \nolimits _0^{R^2}|\nabla ^2\Phi (t-s,x-y)| | {F}(s,y)|\mathrm{d}y\mathrm{d}s \\&= \int \nolimits _{ [0,R^2]\times {\mathbb {R}}^n \setminus Q_R(x)} |\nabla ^2\Phi (t-s,x-y)| | {F}(s,y)|\mathrm{d}y\mathrm{d}s\\&\lesssim \int \nolimits _{ [0,R^2]\times {\mathbb {R}}^n } \frac{1}{R^{n+2}}\mathrm{e}^{-c\frac{|x-y|}{R}}| {F}(s,y)|\mathrm{d}y\mathrm{d}s. \end{aligned}$$

Up to a factor 1/R, the term on the right-hand side is precisely the term that we have to bound in our previous argument for B, see (19). We thus find

$$\begin{aligned} |\nabla \hat{w}(t,x)| \lesssim \frac{1}{R}\left\Vert {F}\right\Vert _{Y^p}, \end{aligned}$$

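and averaging over the cylinder \(Q_R(z)\) gives

$$\begin{aligned} R \left( \frac{1}{|Q_R(z)|}\int \nolimits _{Q_R(z)} |\nabla \hat{w}|^p \,\mathrm{d}x\mathrm{d}t\right) ^{\frac{1}{p}} \lesssim \left\Vert {F}\right\Vert _{Y^p}, \end{aligned}$$

as desired.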

Case 2: We assume \({{\,\mathrm{supp}\,}}({F})\subseteq \check{Q}_R(z) \).

Our argument for this case is based on the maximal regularity estimate for the heat equation with forcing in divergence form,

$$\begin{aligned} \left\Vert \nabla \hat{w}\right\Vert _{L^p((0,\infty )\times {\mathbb {R}}^n)} \lesssim \left\Vert {F}\right\Vert _{L^p((0,\infty ) \times {\mathbb {R}}^n)}. \end{aligned}$$

Given the restriction on the support of the forcing, we get \(\left\Vert \nabla \hat{w}\right\Vert _{L^p(Q_R(z))} \lesssim \left\Vert {F}\right\Vert _{ L^p(\check{Q}_R(z))}\). We can cover \(\check{Q}_R(z)\) by \(Q_R(z)\cup Q_{2R}(z) \cup Q_{\frac{R}{\sqrt{2}}}(z)\), and thus, we obtain

$$\begin{aligned}&R^{1-\frac{n+2}{p}}\left\Vert \nabla \hat{w}\right\Vert _{L^p(Q_R(z))} \\&\quad \le R^{1-\frac{n+2}{p}}\big ( \left\Vert {F}\right\Vert _{L^p(Q_R(z))} + \left\Vert {F}\right\Vert _{L^p(Q_{2R}(z))}+ \left\Vert {F}\right\Vert _{L^p(Q_{R/\sqrt{2}}(z))}\big )\\&\quad \lesssim R^{1-\frac{n+2}{p}} \left\Vert {F}\right\Vert _{L^p(Q_R(z))} + (2R)^{1-\frac{n+2}{p}} \left\Vert {F}\right\Vert _{L^p(Q_{2R}(z))} + \left( \frac{R}{\sqrt{2}}\right) ^{1-\frac{n+2}{p}} \left\Vert {F}\right\Vert _{L^p(Q_{R/\sqrt{2}}(z))} \\&\quad \lesssim \left\Vert {F}\right\Vert _{Y^p}. \end{aligned}$$

Taking the supremum over R and z on the left-hand side yields the missing estimate. \(\square \)

4 The nonlinear problem

In this section, we prove Theorems 2.1, 2.2 and 2.3. Our first concern is the well-posedness of the system (1) under the assumption (2) on the nonlinearity, which we derive by a fixed point argument. To apply this argument, we need the following lemma.

Lemma 4.1

It holds that

$$\begin{aligned} \left\Vert F(v,\nabla v)-F(w,\nabla w)\right\Vert _{Y^p} \lesssim d \max \big \{\left\Vert v\right\Vert _{X^p}^{\mu },\left\Vert w\right\Vert _{X^p}^{\mu }, \left\Vert v\right\Vert _{X^p}^{\nu +1}, \left\Vert w\right\Vert _{X^p}^{\nu +1} \big \} \left\Vert v-w\right\Vert _{X^p} \end{aligned}$$
(21)

and

$$\begin{aligned} \left\Vert F(v,\nabla v)\right\Vert _{Y^p} \lesssim d \left\Vert v\right\Vert _{X^p}^{\mu +1}. \end{aligned}$$
(22)

Proof

Since \(\left\Vert F(v,\nabla v)\right\Vert _{Y^p}\) is defined as the maximum of \(\left\Vert F_i(v,\nabla v)\right\Vert _{Y^p}\), it suffices to show the statements of the lemma for some component \(F_i\) of F. We restrict our attention to the proof of the Lipschitz estimate (21). The argument for (22) is similar and even shorter. By the definition of the nonlinearity \(F_i\) and an application of the triangle inequality, it holds that

$$\begin{aligned}&\left\Vert F_i(v,\nabla v)-F_i(w,\nabla w)\right\Vert _{Y^p} = \left\Vert \sum \limits _{j=1}^d A_{ij}(v)\nabla v_j - \sum \limits _{j=1}^d A_{ij}(w)\nabla w_j\right\Vert _{Y^p}\\&\quad \le \sum \limits _{j=1}^d \left\Vert A_{ij}(v)\left( \nabla v_j -\nabla w_j\right) \right\Vert _{Y^p} + \sum \limits _{j=1}^d \left\Vert \left( A_{ij}(v)-A_{ij}(w)\right) \nabla w_j\right\Vert _{Y^p}. \end{aligned}$$

We now make use of the assumptions on the reaction matrix A in (2) and the fact that \(\Vert \nabla w\Vert _{Y^p} = \Vert w\Vert _{\dot{X}^p}\) to estimate

$$\begin{aligned}&\left\Vert F_i(v,\nabla v)-F_i(w,\nabla w)\right\Vert _{Y^p} \\&\quad \le \sum \limits _{j=1}^d \left\Vert A_{ij}(v)\right\Vert _{L^{\infty }} \left\Vert v_j-w_j\right\Vert _{X^p} + \sum \limits _{j=1}^d \left\Vert A_{ij}(v)-A_{ij}(w)\right\Vert _{L^{\infty }} \left\Vert w_j\right\Vert _{X^p}\\&\quad \lesssim d \left\Vert v\right\Vert _{L^{\infty }}^{\mu } \left\Vert v-w\right\Vert _{X^p} + d \max \left\{ \left\Vert v\right\Vert _{L^{\infty } }^{\nu }, \left\Vert w\right\Vert _{L^{\infty } }^{\nu } \right\} \left\Vert v-w\right\Vert _{L^{\infty } } \left\Vert w\right\Vert _{X^p} \\&\quad \lesssim d \max \Big \{\left\Vert v\right\Vert _{X^p}^{\mu }, \left\Vert v\right\Vert _{X^p}^{\nu +1}, \left\Vert w\right\Vert _{X^p}^{\nu +1} \Big \} \left\Vert v-w\right\Vert _{X^p}. \end{aligned}$$

This proves (21). \(\square \)
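For completeness, the growth bound (22) can be sketched along the same lines. The sketch uses only the first assumption in (2), the bound \(\Vert \nabla v_j\Vert _{Y^p} \le \Vert v\Vert _{X^p}\), and the inequality \(\Vert v\Vert _{L^{\infty }} \le \Vert v\Vert _{X^p}\), both of which already entered the last steps of the proof above:

$$\begin{aligned} \left\Vert F_i(v,\nabla v)\right\Vert _{Y^p} \le \sum \limits _{j=1}^d \left\Vert A_{ij}(v)\right\Vert _{L^{\infty }} \left\Vert \nabla v_j\right\Vert _{Y^p} \lesssim d \left\Vert v\right\Vert _{L^{\infty }}^{\mu } \left\Vert v\right\Vert _{X^p} \lesssim d \left\Vert v\right\Vert _{X^p}^{\mu +1}. \end{aligned}$$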

We now have all prerequisites to prove Theorem 2.1.

Proof of Theorem 2.1

Let \(w\in X^p\) be given, and let \(T[h,w]\) be the solution to the linear problem (18) with inhomogeneity \(\nabla \cdot F(w,\nabla w)\) and initial data h. By Proposition 3.1, we obtain the estimate \(\left\Vert T[h,w]\right\Vert _{X^p} \lesssim \left\Vert h\right\Vert _{L^{\infty }} + \left\Vert F(w,\nabla w)\right\Vert _{Y^p}\). Applying Lemma 4.1 and using the assumptions on h, we furthermore have \({ \left\Vert h\right\Vert _{L^{\infty }}+ \left\Vert F(w, \nabla w)\right\Vert _{Y^p} \lesssim \delta + d \left\Vert w\right\Vert ^{\mu +1}_{X^p}}\), and thus, combining both estimates, we get the following bound on the solution of the linear problem

$$\begin{aligned} \left\Vert T[h,w]\right\Vert _{X^p} \le C \big ( \delta + d \left\Vert w\right\Vert ^{\mu +1}_{X^p}\big ) \end{aligned}$$

for some constant C that we keep fixed for a moment. In order to obtain a contraction map, we set \(\delta _0 = \left( \frac{1}{d (2C)^{\mu +1}}\right) ^{\frac{1}{\mu }}\), to the effect that

$$\begin{aligned} \left\Vert T[h,w]\right\Vert _{X^p} \le C \big ( \delta + d (2C\delta )^{\mu +1} \big )\le 2C\delta \end{aligned}$$

for any \(w \in B_{2C\delta }(0)\subseteq X^p\), provided that \(\delta \le \delta _0\). Hence, for every such \(\delta \) and every fixed h, the map \(T[h, \cdot ]\) maps the set \(B_{2C\delta }(0)\subseteq X^p\) into itself.
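The role of \(\delta _0\) can be checked by a short computation, sketched here under the assumption \(\mu >0\). For \(\delta \le \delta _0\), the definition of \(\delta _0\) gives \(d(2C)^{\mu +1}\delta ^{\mu }\le 1\), so that

$$\begin{aligned} d (2C\delta )^{\mu +1} = d (2C)^{\mu +1}\delta ^{\mu }\, \delta \le \delta \ \ \ \text { and hence } \ \ \ C \big ( \delta + d (2C\delta )^{\mu +1} \big ) \le 2C\delta . \end{aligned}$$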

Furthermore, by a similar argument, given \(w_1\) and \(w_2\), the linearity and Lemma 4.1 yield

$$\begin{aligned}&\left\Vert T[h,w_1]-T[h,w_2]\right\Vert _{X^p} \\&\quad \le {\tilde{C}} d \max \big \{\left\Vert w_1\right\Vert _{X^p}^{\mu },\left\Vert w_2\right\Vert _{X^p}^{\mu },\left\Vert w_1\right\Vert _{X^p}^{\nu +1}, \left\Vert w_2\right\Vert _{X^p}^{\nu +1} \big \} \left\Vert w_1-w_2\right\Vert _{X^p}, \end{aligned}$$

for some constant \({\tilde{C}}\). Choosing \(\delta _0\) even smaller if necessary, we thus find the contraction estimate

$$\begin{aligned} \left\Vert T[h,w_1]-T[h,w_2]\right\Vert _{X^p} \le \theta \left\Vert w_1-w_2\right\Vert _{X^p}, \end{aligned}$$

for any \({{w_1, w_2\in B_{2C\delta }(0)\subseteq X^p}}\) and some \(\theta <1\) fixed.
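One admissible choice of \(\theta \) can be read off from the Lipschitz bound of Lemma 4.1. As a sketch, assuming \(\mu >0\) and \(\nu +1>0\) and writing \({\tilde{C}}\) for the constant in that bound, every norm in the maximum is at most \(2C\delta \le 2C\delta _0\) on the ball \(B_{2C\delta }(0)\), which gives

$$\begin{aligned} {\tilde{C}} d \max \big \{\left\Vert w_1\right\Vert _{X^p}^{\mu },\left\Vert w_2\right\Vert _{X^p}^{\mu },\left\Vert w_1\right\Vert _{X^p}^{\nu +1}, \left\Vert w_2\right\Vert _{X^p}^{\nu +1} \big \} \le {\tilde{C}} d \max \big \{ (2C\delta _0)^{\mu }, (2C\delta _0)^{\nu +1} \big \} =:\theta . \end{aligned}$$

The right-hand side tends to zero as \(\delta _0 \rightarrow 0\), so \(\theta <1\) holds once \(\delta _0\) is small enough.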

An application of Banach’s fixed point theorem thus provides a unique solution \(w^* \) in \( B_{2C\delta }(0)\subseteq X^p\) to the equation \(T[h,w^*]=w^*\), which is nothing but (1). As a by-product, we also have the stability estimate (12). \(\square \)

The idea of how to prove the regularity of the solution was introduced in [2, 3] and is commonly referred to as “Angenent’s trick.”

Proof of Theorem 2.2

To show that the dependence of the solution on the initial datum is of class \(C^m\), \(C^{\infty }\) or \(C^{\omega }\), we consider the operator \(L: L^{\infty }({\mathbb {T}}^n, {\mathbb {R}}^d)\times X^p \rightarrow X^p\) defined by \({L[h,w] = w - T[h,w]}\), where T is the fixed point map introduced in the proof of Theorem 2.1. Defined on \(B_{\delta }(0)\times B_{2C\delta }(0) \subseteq L^{\infty }({\mathbb {T}}^n,{\mathbb {R}}^d)\times X^p\), this map T is of the same differentiability class as the nonlinearity \(F(w,\nabla w)\) through A(w), and so is the operator L by definition. Indeed, if, for instance, A is \(C^1\), we notice that

$$\begin{aligned}&\left| F_i(w_1,\nabla w_1) - F_i( w_2,\nabla w_2) - D_wF_i(w_2,\nabla w_2) (w_1-w_2)\right. \\&\qquad \left. - D_{\nabla w}F_i(w_2,\nabla w_2)(\nabla w_1-\nabla w_2)\right| \\&\quad \le \sum _{j} |A_{ij}(w_1) - A_{ij}(w_2) - A'_{ij}(w_2)(w_1-w_2) ||\nabla w_j| \\&\qquad + \sum _j |A_{ij}'(w_2)||w_1-w_2||\nabla w_1-\nabla w_2|, \end{aligned}$$

and the right-hand side is an \(o(\Vert w_1-w_2\Vert _{X^p})\) term. Accordingly, the derivative of the fixed point map \(T[h,w]\) with respect to w is given by the solution of the heat equation with inhomogeneities \(\nabla \cdot (D_wF_i(w,\nabla w) v+ D_{\nabla w}F_i(w,\nabla w)\nabla v)\). Next, we observe that \(L[0,0]=0\) holds and that \(D_wL[0,0]= id\) is invertible. We are thus in the position to apply the (analytic) implicit function theorem (see, for example, [12]) to deduce the existence of balls \({B_{\hat{\delta }}(0) \subseteq L^{\infty }({\mathbb {T}}^n,{\mathbb {R}}^d)}\) and \(B_{\varepsilon }(0)\subseteq X^p\) and of a function \({S:L^{\infty }({\mathbb {T}}^n, {\mathbb {R}}^d) \supseteq B_{\hat{\delta }}(0) \rightarrow B_{\varepsilon }(0)\subseteq X^p}\) of class \(C^m\), \(C^\infty \) or \(C^\omega \) with \(S[0]=0\) and \(L[h,S[h]]=0\). For \(\tilde{\delta } = \min (\delta , \hat{\delta })\) and \(\tilde{\varepsilon }=\min (\varepsilon , \varepsilon _0)\), we obtain, due to the definition of L, a unique solution \(w^* \in B_{\tilde{\varepsilon }}(0)\subseteq X^p\) whose dependence on the initial data \(h \in B_{\tilde{\delta }}(0) \subseteq L^{\infty }({\mathbb {T}}^n,{\mathbb {R}}^d)\) is of class \(C^m\), of class \(C^\infty \) or analytic.

Finally, we show the regularity of the solution \(w^*\). For this purpose, we define a translation operator \(\Psi _{s,a}:{\mathbb {R}}\times {\mathbb {R}}^n\rightarrow {\mathbb {R}}\times {\mathbb {R}}^n\) by

$$\begin{aligned} \Psi _{s,a}(t,x) :=(st,x+t^{\frac{1}{2}}a)\ \ \ \text { and set } \ \ \ w^*_{s,a} :=w^* \circ \Psi _{s,a}. \end{aligned}$$

We notice that \(w^*_{s,a}\) solves the equation

$$\begin{aligned} \partial _t w^*_{s,a} -\Delta w^*_{s,a} =\nabla \cdot F_{s,a}(w^*_{s,a}, \nabla w^*_{s,a}), \end{aligned}$$

where

$$\begin{aligned} F_{s,a}(w,\nabla w) :=s F(w,\nabla w)+(s-1)\nabla w + \frac{1}{2} a{ t^{-\frac{1}{2}}} w. \end{aligned}$$
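This form of \(F_{s,a}\) can be verified by the chain rule. In the following sketch, the computation is understood componentwise, and we use that spatial derivatives commute with the composition with \(\Psi _{s,a}\), because \(\Psi _{s,a}\) shifts x by a vector that does not depend on x:

$$\begin{aligned} \partial _t w^*_{s,a}&= s\, (\partial _t w^*)\circ \Psi _{s,a} + \frac{1}{2} t^{-\frac{1}{2}} a\cdot \nabla w^*_{s,a}, \\ \Delta w^*_{s,a}&= (\Delta w^*)\circ \Psi _{s,a}, \\ \partial _t w^*_{s,a} - \Delta w^*_{s,a}&= s \nabla \cdot F(w^*_{s,a},\nabla w^*_{s,a}) + (s-1)\Delta w^*_{s,a} + \frac{1}{2} t^{-\frac{1}{2}} a\cdot \nabla w^*_{s,a}. \end{aligned}$$

Since \(a t^{-\frac{1}{2}}\) is independent of x, the right-hand side of the last line is the divergence of \(F_{s,a}(w^*_{s,a},\nabla w^*_{s,a})\).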

By definition, it holds that \(F_{1,0}(w, \nabla w)= F(w, \nabla w)\). Let \(T_{s,a}[h,w]\) denote the solution to the linear problem with inhomogeneity \(\nabla \cdot F_{s,a}(w,\nabla w)\) and initial data h. Since \(\Vert a{ t^{-\frac{1}{2}}}w\Vert _{Y^p} \lesssim \left\Vert w\right\Vert _{L^{\infty }}\), Lemma 4.1 holds true for \(F_{s,a}\) as well. We set, similarly as above, \(L_{s,a}[h,w] = w - T_{s,a}[h,w]\). Again it holds that \(L_{1,0}[0,0]=0\) and \(D_wL_{1,0}[0,0]=id\). Another application of the implicit function theorem thus yields the existence of two numbers \(\lambda >0\) and \(\delta _0>0\) as well as a function \(S_{s,a}[h]=S[s,a,h]\) from \(B_{\lambda }(1) \times B_{\lambda }(0)\times B_{\delta }(0) \subseteq {\mathbb {R}}\times {\mathbb {R}}^n \times L^{\infty }({\mathbb {T}}^n,{\mathbb {R}}^d)\) to \(B_{2C\delta }(0)\subseteq X^p\) of class \(C^m\), \(C^\infty \) or \(C^\omega \) for every \(\delta \le \delta _0\). The function \(S_{s,a}\) satisfies \(L_{s,a}[h,S_{s,a}[h]]=0\) and thus \(S_{s,a}[h]=T_{s,a}[h,S_{s,a}[h]]\). From the above uniqueness results, we deduce that \(S_{s,a}[h]= S[h] \circ \Psi _{s,a}\). Moreover, since \({S[h](0,\cdot ) = h = S_{s,a}[h](0,\cdot )}\) and \(L_{s,a}[h,S[h]\circ \Psi _{s,a}]=0\), it holds that the dependence of \(S[h]\circ \Psi _{s,a}(t,x)\) on the parameters a and s is of class \(C^m\), \(C^\infty \) or \(C^\omega \) in a small neighbourhood of \((1,0) \in {\mathbb {R}}\times {\mathbb {R}}^n\). For finite t, we can calculate the derivatives,

$$\begin{aligned} \partial _s^k \partial _a^{\beta }\big |_{(s,a)=(1,0)}S[h]\circ \Psi _{s,a}(t,x) = t^{k+\frac{|\beta |}{2}}\partial _t^k\partial _x^{\beta }w^*(t,x). \end{aligned}$$

This shows that S[h] and thereby \(w^*\) as well are of class \(C^m\), class \(C^\infty \) or analytic in space and time for every \(x \in {\mathbb {T}}^n\) and every \(0<t<\infty \). Since \(\left\Vert S[h]\circ \Psi _{s,a}\right\Vert _{L^{\infty } } \le \left\Vert S[h]\right\Vert _{X^p} \lesssim \left\Vert h\right\Vert _{L^{\infty } }\), we deduce (13).
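The lowest-order instances of the derivative formula above can be checked directly from the definition of \(\Psi _{s,a}\). As a sketch, for a single derivative in s or in one component \(a_l\) of a, the chain rule gives

$$\begin{aligned} \partial _s\big |_{(s,a)=(1,0)} w^*(st,x+t^{\frac{1}{2}}a) = t\, \partial _t w^*(t,x) \ \ \ \text { and } \ \ \ \partial _{a_l}\big |_{(s,a)=(1,0)} w^*(st,x+t^{\frac{1}{2}}a) = t^{\frac{1}{2}} \partial _{x_l} w^*(t,x). \end{aligned}$$

Each further derivative in s produces another factor t and each further derivative in a another factor \(t^{\frac{1}{2}}\), which is consistent with the weight \(t^{k+\frac{|\beta |}{2}}\).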

To cover the analytic case, it only remains to recall the elementary fact that we can estimate arbitrary derivatives of an analytic function f locally by \( |f^{(j)}(y)| \le C \frac{j!}{\Gamma ^j}\left\Vert f\right\Vert _{L^{\infty }} \) for some positive reals C and \( \Gamma \). This concludes the proof of Theorem 2.2. \(\square \)

We finally turn to the proof of Theorem 2.3. Thanks to the results obtained so far for the general systems, it is enough to show that solutions to (17) satisfy the partition of unity condition (15). For this purpose, it is convenient to truncate the nonlinearities. Inspired by [4], we consider

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t w_i - \Delta w_i = \nabla \cdot \hat{F}_i(w,\nabla w) &{} \text { in } \left( 0,\infty \right) \times {\mathbb {R}}^n,\\ w_i(0,\cdot )=h_i &{} \text { in } {\mathbb {R}}^n, \end{array}\right. } \qquad&i = 1,\dots ,d, \end{aligned}$$
(23)

with nonlinearities

$$\begin{aligned} \hat{F}_i (w,\nabla w) :=\sum \limits _{j=1, j \ne i}^d \alpha _{ij} (\hat{w}_j\nabla w_i - \hat{w}_i \nabla w_j) , \end{aligned}$$

where \(\hat{w}_i\) is obtained from \(w_i\) by restriction to the range \([0,\delta ]\), i.e. \(\hat{w}_i :=\max \left( 0, \min (\delta ,w_i)\right) \). We have to show that solutions to the truncated problem satisfy (15) and that \({\hat{w}}_i = w_i\) to deduce the statement of Theorem 2.3.

Proof of Theorem 2.3

The general well-posedness result of Theorem 2.1 applies to the modified problem (23). A closer inspection of its proof shows that \(\delta _0 \) has to be chosen much smaller than 1/d. We denote the unique solution to (23) by \(w^*\).

Our goal is to show that \(w^*\) fulfils the partition of unity condition (15). To this end, we start by adding up all d equations of (23). Due to the symmetry condition \(K_{ij}=K_{ji}\) imposed in (8), which is inherited by the \(\alpha _{ij}\)’s, the nonlinear terms cancel, and we are led to the homogeneous heat equation

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t W - \Delta W = 0 &{}\text { in } (0,\infty ) \times {\mathbb {T}}^n,\\ W(0, \cdot )= \delta &{} \text { in } {\mathbb {T}}^n, \end{array}\right. } \end{aligned}$$

for \(W:=\sum \limits _{i=1,\dots ,d} w_i^*\), which is solved by \(W=\delta \).
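The cancellation behind this step can be sketched as follows, assuming \(\alpha _{ij}=\alpha _{ji}\) and reading the sum in \(\hat{F}_i\) as running over \(j\ne i\). Relabelling \(i\leftrightarrow j\) in the second sum gives

$$\begin{aligned} \sum \limits _{i=1}^d \hat{F}_i(w,\nabla w) = \sum \limits _{i\ne j} \alpha _{ij} \hat{w}_j\nabla w_i - \sum \limits _{i\ne j} \alpha _{ij} \hat{w}_i \nabla w_j = \sum \limits _{i\ne j} (\alpha _{ij}-\alpha _{ji}) \hat{w}_j\nabla w_i = 0, \end{aligned}$$

so the right-hand sides of the d equations sum to zero.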

To show that the \(w_i^*\)’s stay nonnegative, we consider the negative parts of \(w_i^*\), namely \(w_i^{*-} :=\min (0,w_i^*)\). Multiplying the ith equation of (23) by \(w_i^{*-}\) and integrating over \({\mathbb {T}}^n\) leads to

$$\begin{aligned}&\int \limits _{{\mathbb {T}}^n}w_i^{*-} \partial _t w_i^* \mathrm{d}x - \int \limits _{{\mathbb {T}}^n}w_i^{*-} \Delta w_i^*\mathrm{d}x \\&\quad = \int \limits _{{\mathbb {T}}^n}w_i^{*-} \nabla \cdot \left( \sum \limits _{j=1, j \ne i}^d \alpha _{ij} \hat{w}_j^*\nabla w_i^* \right) \mathrm{d}x - \int \limits _{{\mathbb {T}}^n}w_i^{*-} \nabla \cdot \left( \sum \limits _{j=1, j \ne i}^d \alpha _{ij} \hat{w}^*_i\nabla w_j^* \right) \mathrm{d}x. \end{aligned}$$

Integrating by parts repeatedly and taking into account that \(\hat{w}^*_i \nabla w_i^{*-} = 0\) and \(w_i^{*-} \hat{w}^*_i =0\), we derive the energy identity

$$\begin{aligned} 0 = \frac{1}{2} \frac{\mathrm{d}}{\mathrm{d}t} \left\Vert w_i^{*-}\right\Vert _{L^2({\mathbb {T}}^n)}^2+ \int \limits _{{\mathbb {T}}^n}|\nabla w_i^{*-}|^2 \left( 1 + \sum \limits _{j=1, j \ne i}^d \alpha _{ij}\hat{w}^*_j \right) \mathrm{d}x. \end{aligned}$$
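The individual terms of the energy identity can be traced as follows. This is a formal sketch for each fixed \(j\ne i\). It uses that \(\nabla w_i^{*-} = \nabla w_i^*\) on the set \(\{w_i^*<0\}\) and \(\nabla w_i^{*-}=0\) elsewhere, that no boundary terms occur on \({\mathbb {T}}^n\), and that \(\hat{w}^*_i \nabla w_i^{*-} = 0\):

$$\begin{aligned} \int \limits _{{\mathbb {T}}^n}w_i^{*-} \partial _t w_i^* \mathrm{d}x&= \frac{1}{2} \frac{\mathrm{d}}{\mathrm{d}t} \left\Vert w_i^{*-}\right\Vert _{L^2({\mathbb {T}}^n)}^2, \\ - \int \limits _{{\mathbb {T}}^n}w_i^{*-} \Delta w_i^*\mathrm{d}x&= \int \limits _{{\mathbb {T}}^n}|\nabla w_i^{*-}|^2 \mathrm{d}x, \\ \int \limits _{{\mathbb {T}}^n}w_i^{*-} \nabla \cdot \left( \alpha _{ij} \hat{w}_j^*\nabla w_i^* \right) \mathrm{d}x&= - \int \limits _{{\mathbb {T}}^n} \alpha _{ij} \hat{w}_j^* |\nabla w_i^{*-}|^2 \mathrm{d}x, \\ \int \limits _{{\mathbb {T}}^n}w_i^{*-} \nabla \cdot \left( \alpha _{ij} \hat{w}^*_i\nabla w_j^* \right) \mathrm{d}x&= - \int \limits _{{\mathbb {T}}^n} \alpha _{ij} \hat{w}^*_i \nabla w_i^{*-}\cdot \nabla w_j^* \mathrm{d}x = 0. \end{aligned}$$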

Since \(w^* \in B_{2C\delta }(0) \subseteq X^p\), we know \(|w_j^*| \le 2C\delta \) for every j and therefore \(\hat{w}^*_j \in [0, 2C\delta ]\), where \(2C\delta < \frac{1}{2(d-1)C}\). We can assume that the positive constant C is greater than one and thus, using \(\alpha _{ij} \in [-1,1]\), we obtain

$$\begin{aligned} 1 + \sum \limits _{j=1, j \ne i}^d \alpha _{ij}\hat{w}^*_j \ge 1 -\sum \limits _{j=1, j \ne i}^d \hat{w}^*_j > 1 - (d-1)\frac{1}{2(d-1)}=\frac{1}{2}. \end{aligned}$$

Hence, the second term in the energy identity above is nonnegative. This implies that the \(L^2\)-norm of \(w_i^{*-}\) is nonincreasing in time. Together with the fact that \(h_i = w_i^*(0,\cdot )\) is nonnegative for every \(i=1,\dots ,d\), we obtain \(w_i^{*-}=0\) and thus \(w_i^*\ge 0\) almost everywhere in \((0,\infty ) \times {\mathbb {T}}^n\).

We have thus seen that \(w^*\) satisfies the partition of unity condition (15) almost everywhere in \((0,\infty ) \times {\mathbb {T}}^n\), and thus, \(w^* = {\hat{w}}^*\) almost everywhere. It remains to note that, thanks to the regularity established in Theorem 2.2 and the continuity of the nonlinearity \(\hat{F}(w,\nabla w)\), the solution \(w^*\) is continuous as well, and this property extends to the whole domain \((0,\infty )\times {\mathbb {T}}^n\). \(\square \)