1 Introduction

The aim of this survey is to serve as a general introduction to the many models and techniques used to study the aggregation-diffusion family of equations

$$\begin{aligned}\frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}\Big ( \rho \nabla ( U'(\rho ) + V + W * \rho ) \Big ). \end{aligned}$$
(ADE)

This survey does not intend to provide an encyclopedic presentation including all possible references and results in maximal generality (this would be impossible), but to give newcomers a general overview of the subject.

2 Modelling

Let us first discuss the modelling; we then devote Sect. 3 to several modelling problems which belong to this family. Diffusion is usually modelled as a conservation law.

2.1 Continuity equations

Let \(\rho \) be a density and \(\omega \subset {{\mathbb {R}}^d}\) any control volume. If \({\textbf{F}}\) is the out-going flux, then the balance of mass in \(\omega \) together with the divergence theorem gives

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} \int _\omega \rho \mathop {}\!\textrm{d}x =- \int _{\partial \omega } {\textbf{F}} \cdot {\textbf{n}} \mathop {}\!\textrm{d}S= -\int _{ \omega } {{\,\textrm{div}\,}}\, {\textbf{F}} \mathop {}\!\textrm{d}x \end{aligned}$$
(2.1)

Since \(\omega \) is arbitrary, we arrive at the continuity equation

$$\begin{aligned} \frac{\partial \rho }{\partial t} = -{{\,\textrm{div}\,}}\, {\textbf{F}}. \end{aligned}$$
(2.2)

For the transport of heat, we use Fourier’s law to model the flux as \({\textbf{F}} = -D \nabla \rho \), which yields the heat equation

$$\begin{aligned} \frac{\partial \rho }{\partial t} = D \Delta \rho . \end{aligned}$$
(HE)

A generalisation of this problem is used to model the flow of a gas through porous media. We derive the model as a mass balance by writing the flux in terms of a velocity \({\textbf{v}}\), i.e., \({\textbf{F}} = \rho {\textbf{v}}\); we use Darcy’s law to relate the velocity to the pressure p as \({\textbf{v}} = - \frac{k}{\mu }\nabla p\); and lastly we use a general state equation \(p = \phi (\rho )\), where for perfect gases \(\phi (\rho ) = p_0 \rho ^\gamma \). Eventually, we recover that \({\textbf{F}} = - \nabla \Phi (\rho )\) for some non-decreasing \(\Phi : {\mathbb {R}}\rightarrow {\mathbb {R}}\), and this yields the so-called porous medium equation

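$$\begin{aligned} \frac{\partial \rho }{\partial t} = \Delta \Phi (\rho ). \end{aligned}$$
(\(\hbox {PME}_\Phi \))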

It is common to select \( \Phi (\rho ) = \rho ^m\) for \(m > 0\). The most complete reference for this equation is [171]. Notice \(\Delta \Phi (\rho ) = {{\,\textrm{div}\,}}( \Phi '(\rho ) \nabla \rho )\) so we take

$$\begin{aligned} U'' (\rho ) = \frac{\Phi ' (\rho )}{\rho }. \end{aligned}$$
(2.3)
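
To spell out why (2.3) is the natural choice, one can compare the flux of (ADE) for \(V = W = 0\) with the porous medium flux. The following is only a formal check, assuming \(\rho > 0\) and enough smoothness to apply the chain rule:

$$\begin{aligned} \rho \nabla U'(\rho ) = \rho \, U''(\rho ) \nabla \rho = \Phi '(\rho ) \nabla \rho = \nabla \Phi (\rho ), \qquad \text {so that } \quad {{\,\textrm{div}\,}}\big ( \rho \nabla U'(\rho ) \big ) = \Delta \Phi (\rho ). \end{aligned}$$

For instance, \(\Phi (\rho ) = \rho \) gives \(U''(\rho ) = 1/\rho \), which is satisfied by \(U(\rho ) = \rho \log \rho \).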

The classical modelling also allows for a transport velocity field \({\textbf{v}} (t,x)\) to be included. For example, if a pollutant is being transported by water, then \({\textbf{v}}(t,x)\) is the velocity of the water. The flux of mass being moved is recovered from the density and the velocity as \(\rho {\textbf{v}}\). A general flux then looks like

$$\begin{aligned} {\textbf{F}} = - \nabla \Phi (\rho ) + \rho {\textbf{v}}. \end{aligned}$$

Remark 2.1

We do not discuss the case of sign-changing solutions. In that setting \(u^m\) must be replaced by \(|u|^{m-1} u\).

2.2 Particle systems

We can also understand transport from the particle perspective. Consider a known velocity field \({\textbf{v}}(t,x)\). Consider N particles with positions \(X_i (t)\) of equal masses 1/N moving in the velocity field

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}X_i}{\mathop {}\!\textrm{d}t} = {\textbf{v}}(t, X_i(t)), \quad i = 1, \ldots , N. \end{aligned}$$

Notice that the evolution of the particles is decoupled. To recover a PDE we consider the so-called empirical distribution defined as

$$\begin{aligned} \mu _t^N = \sum _{j=1}^N \frac{1}{N} \delta _{X_j(t)}, \end{aligned}$$
(2.4)

where \(\delta _x\) is the Dirac delta at a point x. First, notice that

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} \int _{{\mathbb {R}}^d}\varphi (t,x) \mathop {}\!\textrm{d}\mu ^N_t (x)&= \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} \frac{1}{N} \sum _j \varphi (t, X_j(t)) \\&= \frac{1}{N} \sum _j \left( \frac{\partial \varphi }{\partial t}(t, X_j(t)) + \nabla \varphi (t,X_j(t)) \cdot \frac{\mathop {}\!\textrm{d}X_j}{\mathop {}\!\textrm{d}t} \right) \\&= \frac{1}{N} \sum _j \left( \frac{\partial \varphi }{\partial t}(t, X_j(t)) + \nabla \varphi (t,X_j(t)) \cdot {\textbf{v}}(t, X_j(t)) \right) \\&= \int _{{\mathbb {R}}^d}\left( \frac{\partial \varphi }{\partial t}(t, x) + \nabla \varphi (t,x) \cdot {\textbf{v}}(t, x) \right) \mathop {}\!\textrm{d}\mu ^N_t (x) \end{aligned}$$

If \(\varphi \in C^1_c ((0,\infty ) \times {{\mathbb {R}}^d})\) (i.e., \(\varphi (0,x) = 0\) and for T large enough \(\varphi (t,x) = 0\) for \(t \ge T\)) we integrate in time to recover

$$\begin{aligned} \int _0^\infty \int _{{\mathbb {R}}^d}\Big ( \frac{\partial \varphi }{\partial t} + {\textbf{v}} \cdot \nabla \varphi \Big ) \mathop {}\!\textrm{d}\mu ^N_t \mathop {}\!\textrm{d}t= 0. \end{aligned}$$

Integrating by parts, we observe that this is the distributional formulation of the continuity equation

$$\begin{aligned} \partial _t \mu ^N + {{\,\textrm{div}\,}}(\mu ^N {\textbf{v}} ) = 0. \end{aligned}$$

Aggregation and confinement can be modelled by considering interacting particles where the velocity field is aware of other particles

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}X_i}{\mathop {}\!\textrm{d}t} = - \sum _{\begin{array}{c} j=1 \\ j \ne i \end{array}}^N \frac{1}{N} \nabla W (X_i - X_j) - \nabla V (X_i), \quad i = 1, \ldots , N. \end{aligned}$$

The first term models the interactions between particles, which attract/repel each other as a function of their relative position, and the second term models a field attracting/repelling every particle. Noticing that

$$\begin{aligned} \sum _{\begin{array}{c} j=1 \\ j \ne i \end{array}}^N \frac{1}{N} \nabla W (x - X_j(t)) = \int _{{\mathbb {R}}^d}\nabla W (x-y) \mathop {}\!\textrm{d}\mu ^N_t (y) \end{aligned}$$

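we deduce from the previous reasoning (evaluating at \(x = X_i(t)\) and assuming \(\nabla W(0) = 0\), so that the term \(j = i\) does not contribute) that the empirical measure \(\mu ^N\) is a distributional solution of the aggregation-confinement equation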

$$\begin{aligned} \partial _t \mu = {{\,\textrm{div}\,}}(\mu \nabla ( W * \mu + V ) ). \end{aligned}$$
(2.5)
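
The particle system behind (2.5) is easy to experiment with. The following is a minimal sketch, assuming an explicit Euler time discretisation in dimension \(d = 1\) and the purely illustrative choice \(W(x) = V(x) = x^2/2\); the function names are hypothetical and not taken from any reference implementation.

```python
def euler_step(xs, grad_W, grad_V, dt):
    """One explicit Euler step of dX_i/dt = -(1/N) sum_{j != i} W'(X_i - X_j) - V'(X_i)."""
    n = len(xs)
    new_xs = []
    for i, xi in enumerate(xs):
        # Interaction felt by particle i; every other particle carries mass 1/N.
        interaction = sum(grad_W(xi - xj) for j, xj in enumerate(xs) if j != i) / n
        new_xs.append(xi - dt * (interaction + grad_V(xi)))
    return new_xs


def simulate_particles(xs0, grad_W, grad_V, dt, n_steps):
    """Iterate the Euler step n_steps times starting from the positions xs0."""
    xs = list(xs0)
    for _ in range(n_steps):
        xs = euler_step(xs, grad_W, grad_V, dt)
    return xs


def centre_of_mass(xs):
    return sum(xs) / len(xs)


def grad_quadratic(x):
    """Derivative of x^2 / 2 (assumed here for both W and V)."""
    return x


initial_positions = [1.0 + 0.1 * i for i in range(20)]
final_positions = simulate_particles(
    initial_positions, grad_quadratic, grad_quadratic, dt=0.01, n_steps=300
)
```

For this quadratic choice the interaction terms cancel in the centre of mass, which is therefore multiplied by \(1 - \Delta t\) at every step, while the distances to the centre of mass contract by \(1 - 2\Delta t\). The particles thus concentrate, as expected from an attractive W and a confining V.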

Linear diffusion can be added to the particle system by introducing noise and working with stochastic ODEs and a mean-field approximation (see Sect. 3.8).

As we will see below, special attention has been paid to the case \(V = 0\), which is simply called the aggregation equation

$$\begin{aligned} \frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}(\rho \nabla W * \rho ). \end{aligned}$$
(AE)

As a toy model we will also discuss the confinement problem

$$\begin{aligned} \frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}(\rho \nabla V ). \end{aligned}$$
(CE)

Joining the many-particle approximation for aggregation with the porous medium diffusion we recover (ADE). If we look for solutions defined in the whole space, then it is natural to consider only the case of finite mass. Hence, we assume \(\rho \in L^1 ({{\mathbb {R}}^d})\) and, in some sense,

$$\begin{aligned} \rho _t(x) \rightarrow 0 \text { as } |x| \rightarrow \infty . \end{aligned}$$
(2.6)

2.3 Boundary conditions in bounded domains of \({{\mathbb {R}}^d}\)

The problems above are often posed in the whole space \({{\mathbb {R}}^d}\). However, sometimes it is convenient to restrict them to bounded domains, especially for the numerical analysis. Here and below we will denote by \(\Omega \) an open and bounded set of \({{\mathbb {R}}^d}\) with a smooth boundary. Although it is possible to study these kinds of problems in less smooth domains (Lipschitz boundary, interior point conditions, etc.), this introduces additional difficulties we will not discuss. Since with this kind of equation we intend to model systems that conserve mass, there are two natural approaches.

2.3.1 No-flux conditions

We can establish that no mass will cross the boundary of \(\Omega \), \(\partial \Omega \). Going back to (2.1), a natural way is to establish no-flux conditions \({\textbf{F}} \cdot {\textbf{n}} = 0\) on \(\partial \Omega \). With our choice of flux this means

$$\begin{aligned} \rho \nabla ( U'(\rho ) + V + W*\rho ) \cdot {\textbf{n}} = 0 \quad \text { for } t > 0 \text { and } x \in \partial \Omega \end{aligned}$$
(BC)

where again \(\rho _0\) is known, and we have to be a little careful with the notion of convolution when \(\rho \) is only defined in \(\Omega \). For this, we consider the extended notion

$$\begin{aligned} W*\rho \equiv W* {\widetilde{\rho }}, \quad \text {where } {\widetilde{\rho }} = \left\{ \begin{array}{ll} \rho &{} \text {in } \Omega , \\ 0 &{} \text {elsewhere}. \end{array} \right. \end{aligned}$$

In fact, the problem (ADE) with general V and W in a bounded domain is rather delicate in terms of a priori estimates (see Sect. 4.2.3). It is sometimes preferable to discuss the generalisation

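$$\begin{aligned} \frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}\left( \rho \nabla \left( U'(\rho ) + V + \int _\Omega K(\cdot ,y) \rho (y) \mathop {}\!\textrm{d}y \right) \right) \end{aligned}$$
(\(\hbox {ADE}^*\))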

and with the corresponding boundary conditions

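$$\begin{aligned} \rho \nabla \left( U'(\rho ) + V + \int _\Omega K(\cdot ,y) \rho (y) \mathop {}\!\textrm{d}y \right) \cdot {\textbf{n}} = 0 \quad \text { for } t > 0 \text { and } x \in \partial \Omega . \end{aligned}$$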

If \(\Omega \) is bounded we will choose

$$\begin{aligned} \nabla V \cdot {\textbf{n}} = 0 \text { on } \partial \Omega , \quad \text { and } \quad K \in C_c (\Omega \times \Omega ). \end{aligned}$$
(2.7)

A very typical example is

$$\begin{aligned} K(x,y) = \eta (x) W(x-y) \eta (y) \end{aligned}$$
(2.8)

with \(\eta \in C^\infty _c (\Omega )\).

2.3.2 Periodic boundary conditions

Another approach is to establish that any mass that crosses the boundary will “come back” on the opposite side. This happens, for example, if all data are periodic. Some papers have studied periodic solutions defined over the unit cube \([0,1]^d\) (or, equivalently, the torus \({\mathbb {T}}^d\)). In \(d = 1\) this is equivalent to studying solutions defined over the unit circle \({\mathbb {S}}^1\). This is the case of the Kuramoto model described below. An example of the case \(d = 2\) can be found in [64].

2.4 Free energy

We conclude this section by introducing a functional which will be very useful below. We will show how (ADE) is related to the free energy

$$\begin{aligned} {\mathcal {F}}(\rho ) =\int _\Omega U(\rho ) + \int _\Omega V \rho + \frac{1}{2} \iint _{\Omega \times \Omega } W(x-y) \rho (x) \rho (y) \mathop {}\!\textrm{d}x \mathop {}\!\textrm{d}y , \end{aligned}$$
(FE)

whereas (\(\hbox {ADE}^*\)) is related to the more general free energy

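$$\begin{aligned} {\mathcal {F}}(\rho ) =\int _\Omega U(\rho ) + \int _\Omega V \rho + \frac{1}{2} \iint _{\Omega \times \Omega } K(x,y) \rho (x) \rho (y) \mathop {}\!\textrm{d}x \mathop {}\!\textrm{d}y . \end{aligned}$$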

For convenience, let us define the function

$$\begin{aligned} \frac{\delta {\mathcal {F}}}{\delta \rho } [\rho ] {:}{=}U'(\rho ) + V + \int _\Omega K(\cdot ,y)\rho (y) \mathop {}\!\textrm{d}y. \end{aligned}$$
(2.9)

This is called the first variation of the free energy, and we will explain its importance below in Sect. 4.2.4. Hence, (\(\hbox {ADE}^*\)) can be re-written as

$$\begin{aligned} \frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}\left( \rho \nabla \frac{\delta {\mathcal {F}}}{\delta \rho }\right) \end{aligned}$$
(2.10)

for \({{\mathcal {F}}}\) given by (FE).

Occasionally, it will be convenient for us to use the free energies of each of the terms. Hence, we define

$$\begin{aligned} {\mathcal {U}}[\rho ] = \int _{{\mathbb {R}}^d}U(\rho ), \qquad {\mathcal {V}}[\rho ] = \int _{{\mathbb {R}}^d}V \rho , \qquad {\mathcal {W}}[\rho ] = \frac{1}{2} \iint _{{{\mathbb {R}}^d}\times {{\mathbb {R}}^d}} W(x-y) \rho (x) \rho (y) \mathop {}\!\textrm{d}x \mathop {}\!\textrm{d}y. \end{aligned}$$
(2.11)

2.5 A comment on mass conservation

Our evolution problem needs, of course, an initial state \(\rho _0\). Throughout this survey we will usually restrict ourselves to non-negative unit mass initial data, i.e.,

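$$\begin{aligned} \rho _0 \ge 0 \quad \text { and } \quad \int _{{\mathbb {R}}^d}\rho _0 = 1. \end{aligned}$$
(\(\hbox {U}_0\))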

All the equations presented above are of “conservation type”. This means we expect preservation of mass, i.e., that provided unit initial mass (\(\hbox {U}_0\)) the solution will remain non-negative with unit mass for all times, i.e.,

$$\begin{aligned} \int _{{\mathbb {R}}^d}\rho _t = 1 , \quad \forall t > 0. \end{aligned}$$
(U)
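
Mass conservation is also easy to observe at the discrete level. The following is a minimal sketch, assuming \(d = 1\), linear diffusion \(U = \rho \log \rho \), \(W = 0\), the illustrative confinement \(V(x) = x^2/2\) and a bounded interval with no-flux conditions. It uses an explicit finite-volume scheme in which the two boundary fluxes are set to zero, so the interior fluxes cancel in pairs. All names and parameter values are hypothetical.

```python
def finite_volume_step(rho, h, dt, grad_V, x_left):
    """One explicit step for d(rho)/dt = -dF/dx with F = -(d(rho)/dx + rho V')."""
    n = len(rho)
    # flux[i] is the flux through the left face of cell i.
    # flux[0] and flux[n] stay equal to zero: this is the no-flux condition.
    flux = [0.0] * (n + 1)
    for i in range(1, n):
        x_face = x_left + i * h
        rho_face = 0.5 * (rho[i - 1] + rho[i])
        flux[i] = -((rho[i] - rho[i - 1]) / h + rho_face * grad_V(x_face))
    return [rho[i] - dt / h * (flux[i + 1] - flux[i]) for i in range(n)]


def total_mass(rho, h):
    return h * sum(rho)


def run_no_flux_example(n_cells=50, n_steps=500, dt=0.001):
    """Evolve a unit-mass bump on (-2, 2) under V(x) = x^2 / 2 (assumed example)."""
    x_left, x_right = -2.0, 2.0
    h = (x_right - x_left) / n_cells
    centres = [x_left + (i + 0.5) * h for i in range(n_cells)]
    bump = [max(0.0, 1.0 - (x - 0.5) ** 2) for x in centres]
    initial_mass = h * sum(bump)
    rho = [value / initial_mass for value in bump]  # normalise to unit mass
    for _ in range(n_steps):
        rho = finite_volume_step(rho, h, dt, lambda x: x, x_left)
    return rho, h
```

The time step above is chosen below \(h^2/2\), the usual stability restriction for explicit schemes of this kind; this is an assumption of the sketch rather than a sharp condition.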

3 Famous particular cases

3.1 Heat equation and Fokker–Planck

As explained above, the heat equation (HE) is the most classical example in our family of equations. It corresponds to (ADE) with the choices \(U = \rho \log \rho \) and \(V = W = 0\). In \({{\mathbb {R}}^d}\) it is well known that (HE) admits the Gaussian solution

$$\begin{aligned} K_t (x) = (4 \pi t)^{-\frac{d}{2}} \exp \left( - \frac{|x|^2}{4t} \right) . \end{aligned}$$

All solutions (with suitable initial data) are precisely of the form

$$\begin{aligned} \rho _t = K_t * \rho _0, \end{aligned}$$

and it is not difficult to check that

$$\begin{aligned} \Vert \rho _t\Vert _{L^1} = M, \quad \Vert \rho _t - M K_t \Vert _{L^1} \rightarrow 0 \quad \text { as } t \rightarrow \infty . \end{aligned}$$

A good survey on the matter can be found in [172].

The fundamental solution \(K_t\) is of the so-called self-similar type

$$\begin{aligned} K_t (x) = A(t) F\left( \frac{x}{\sigma (t)} \right) . \end{aligned}$$
(SS)

In this expression, F is usually called the self-similar profile and \(\sigma (t), A(t)\) the scaling parameters. Due to mass conservation the natural choice is \(A(t) = \sigma (t)^{-d}\). Plugging (SS) into (HE) and setting the total mass \(\int K_t = 1\) yields the Gaussian. Since (HE) is linear, \(MK_t\) is a solution of mass M. Usually, the self-similar solution has \(\sigma (t)\) increasing, and it only makes sense to consider \(\sigma (t) \ge 0\). Hence, we make the choice \(\sigma (0) = 0\), which corresponds to \(K_0 = \delta _0\), a Dirac delta at 0. Solutions of this type, with \(\rho _0 = \delta _0\), are called source-type solutions.
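
These properties of the Gaussian can be checked numerically. The following is a minimal sketch in dimension \(d = 1\) with \(D = 1\), assuming centred finite differences and a midpoint quadrature. It tests that \(K_t\) solves (HE), that it has the form (SS) with \(\sigma (t) = \sqrt{t}\) and \(F = K_1\), and that it has unit mass. The function names, step sizes and sample points are hypothetical.

```python
import math


def heat_kernel(t, x, d=1):
    """K_t(x) = (4 pi t)^(-d/2) exp(-|x|^2 / (4 t)); x stands for the radius |x|."""
    return (4.0 * math.pi * t) ** (-d / 2.0) * math.exp(-x * x / (4.0 * t))


def heat_residual(t, x, dt=1e-5, h=1e-3):
    """Finite-difference estimate of dK/dt - d^2K/dx^2 in d = 1 (should be close to 0)."""
    time_derivative = (heat_kernel(t + dt, x) - heat_kernel(t - dt, x)) / (2.0 * dt)
    laplacian = (
        heat_kernel(t, x + h) - 2.0 * heat_kernel(t, x) + heat_kernel(t, x - h)
    ) / (h * h)
    return time_derivative - laplacian


def self_similar_form(t, x, d=1):
    """Right-hand side of (SS) with sigma(t) = sqrt(t), A = sigma^(-d) and F = K_1."""
    sigma = math.sqrt(t)
    return sigma ** (-d) * heat_kernel(1.0, x / sigma, d)


def mass_1d(t, half_width=30.0, n=6000):
    """Midpoint-rule approximation of the integral of K_t over (-half_width, half_width)."""
    h = 2.0 * half_width / n
    return h * sum(heat_kernel(t, -half_width + (i + 0.5) * h) for i in range(n))
```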

This nice convolutional representation is not available in more general settings. There are several other ways to study the asymptotic behaviour. One of the key tricks is the study of the self-similar change of variable

$$\begin{aligned} \rho (x,t) = u(y,\tau ) (1 + 2t)^{-\frac{d}{2}}, \quad 2\tau = \log (1 + 2t), \quad y = (1+2t)^{-\frac{1}{2}} x, \end{aligned}$$

which leads to the so-called linear Fokker–Planck equation

$$\begin{aligned} \frac{\partial u}{\partial \tau } = \Delta _y u + {{\,\textrm{div}\,}}(u y ). \end{aligned}$$
(HE-FP)

This equation corresponds to

$$\begin{aligned} U_1 (\rho ) = \rho \log \rho , \end{aligned}$$

\(V = \frac{|x|^2}{2}\) and \(W = 0\). Notice that stationary states can be obtained as solutions of the equation for the flux \(\log u + \frac{|y|^2}{2} = -h\), for a constant h. This yields the Gaussian profile

$$\begin{aligned} {{\widehat{u}}}(y) = AG(y), \quad G(y) = \frac{1}{\sqrt{2\pi }} e^{-\frac{|y|^2}{2}}. \end{aligned}$$
(3.1)

The constant A is the unique value such that \(\int \rho _t = M\). In the original variables, we recover \(M K_t\) up to a shift in time.

3.2 Porous medium equation

As presented in the introduction, the most common example of (\(\hbox {PME}_\Phi \)) is the porous medium equation

$$\begin{aligned} \frac{\partial \rho }{\partial t} = \Delta \rho ^m . \end{aligned}$$
(PME)

The range \(m \in (0,1)\) is sometimes called fast diffusion equation. The (PME) corresponds, for \(m \ne 1\), to

$$\begin{aligned} U_m (\rho ) = \frac{\rho ^m}{m-1} \end{aligned}$$

and \(V = W = 0\). Notice that as \(m \rightarrow 1\) we have that \((\rho ^{m-1} - 1)/(m-1)\) tends to \(\log \rho \) and hence, up to a multiple of \(\rho \) (which does not change the equation), \(U_m\) tends to \(U_1\). Notice that, due to the homogeneity of the equation, after rescaling space and time, we can preserve the equation (without introducing constants) and can work with (\(\hbox {U}_0\)).

Similarly to the heat equation, a suitable change of variable (detailed below in Sect. 5.1.1) converts (PME) into

$$\begin{aligned} \frac{\partial u}{\partial t} = \Delta u^m + {{\,\textrm{div}\,}}(u x) \end{aligned}$$
(PME-FP)

This corresponds to \(U = U_m\) and \(V = \frac{|x|^2}{2}\) and \(W = 0\). (PME-FP) admits a stationary solution given by the so-called Barenblatt profiles

$$\begin{aligned} {{\hat{u}}}(x) = F_B(x), \quad F_B(x) {:}{=}\left( C - \tfrac{m-1}{2m}{|x|^2}\right) ^{\frac{1}{m-1}}_+, \end{aligned}$$
(3.2)

where \(C \ge 0\). For \(m > 1\) they have compact support, and for \(m \in (0,1)\) they have power-like tails. To check whether finite-mass states are available we compute

$$\begin{aligned} \int _{{\mathbb {R}}^d}F_B \approx c + \int _1^\infty r^{\frac{2}{m-1}} r^{d-1} \mathop {}\!\textrm{d}r. \end{aligned}$$

Hence, finite mass steady states exist only when \(m > \frac{d-2}{d}\). This number is critical, as we will see below. Undoing the change of variables we arrive at the self-similar solution

$$\begin{aligned} B_t = t^{-\alpha } (C - k|x|^2t^{-2\beta })^{\frac{1}{m-1}}_+, \quad \text {where } \alpha = \tfrac{d}{d(m-1)+2},\, \beta = \tfrac{\alpha }{d},\, k = \tfrac{\alpha (m-1)}{2md}. \end{aligned}$$
(ZKB)

This solution is usually denoted ZKB after Zel’dovich and Kompaneets, and Barenblatt (see [171] and the references therein). Usually this is shortened to Barenblatt. When \(m > \frac{d-2}{d}\) there is a unique constant C so that (U) holds. It can be explicitly computed (see [171, Section 17.5]).
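
As a sanity check of the self-similar solution, the following minimal sketch evaluates the Barenblatt solution in dimension \(d = 1\) and tests numerically, for \(m = 2\), that it solves (PME) inside its support and that its mass does not depend on t. It assumes the classical constant \(k = \beta (m-1)/(2m)\), centred finite differences and a midpoint quadrature; the function names and the value \(C = 1\) are hypothetical.

```python
import math


def barenblatt(t, x, m=2.0, d=1, C=1.0):
    """Barenblatt solution t^(-alpha) (C - k |x|^2 t^(-2 beta))_+^(1/(m-1)); x is the radius."""
    alpha = d / (d * (m - 1.0) + 2.0)
    beta = alpha / d
    k = beta * (m - 1.0) / (2.0 * m)  # classical value, assumed in this sketch
    base = C - k * x * x * t ** (-2.0 * beta)
    return t ** (-alpha) * max(base, 0.0) ** (1.0 / (m - 1.0))


def pme_residual(t, x, m=2.0, dt=1e-5, h=1e-3):
    """Finite-difference estimate of dB/dt - d^2(B^m)/dx^2 in d = 1."""
    time_derivative = (barenblatt(t + dt, x, m) - barenblatt(t - dt, x, m)) / (2.0 * dt)
    laplacian_of_power = (
        barenblatt(t, x + h, m) ** m
        - 2.0 * barenblatt(t, x, m) ** m
        + barenblatt(t, x - h, m) ** m
    ) / (h * h)
    return time_derivative - laplacian_of_power


def barenblatt_mass_1d(t, m=2.0, half_width=8.0, n=16000):
    """Midpoint-rule mass over (-half_width, half_width), which must contain the support."""
    h = 2.0 * half_width / n
    return h * sum(barenblatt(t, -half_width + (i + 0.5) * h, m) for i in range(n))
```

For \(d = 1\), \(m = 2\) and \(C = 1\) a direct integration gives the mass \(\frac{4}{3}\sqrt{12}\), independently of t, which the quadrature above should reproduce.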

As we let \(m \searrow \frac{d-2}{d}\) we get \(C \nearrow \infty \), so for \(m = \frac{d-2}{d}\) we expect \(B_t = \delta _0\). In fact, Brezis and Friedman [32] showed that if \(m \le \frac{d-2}{d}\) and \(\rho _0^{(j)} \in L^1 ({{\mathbb {R}}^d})\) is a sequence of initial data converging to \(\delta _0\), then the corresponding solutions satisfy \(\rho ^{(j)} (t,x) \rightarrow \delta _0 (x) \otimes 1(t)\). Therefore, very fast diffusion cannot diffuse a \(\delta _0\). This is because the diffusion coefficient is \(m\rho ^{m-1}\). For \(m \ll 1\) this means that small values of \(\rho \) are indeed diffused fast (hence the name of “fast diffusion”). However, in regions where \(\rho \) is large the diffusion is slow.

It is also worth pointing to the equivalent formulation in the so-called pressure variable \(u = U'(\rho )\). We recover the Hamilton–Jacobi type equation

$$\begin{aligned} \frac{\partial u}{\partial t} = \Phi '(\rho ) \Delta u + |\nabla u|^2, \end{aligned}$$

where we recall the relation (2.3). Notice that \(\Phi '(\rho )\) can be written as a non-linear function of u. This formulation is convenient when \(m > 1\) since \(u = \frac{m}{m-1} \rho ^{m-1}\). This presentation is much better for the study of viscosity solutions (see [31, 40]).
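
For the reader's convenience, here is a short formal derivation of the pressure equation. It assumes \(V = W = 0\) and a smooth positive solution, and it only uses \(\frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}(\rho \nabla u)\), the relation (2.3) and the chain rule:

$$\begin{aligned} \frac{\partial u}{\partial t}&= U''(\rho ) \frac{\partial \rho }{\partial t} = U''(\rho ) {{\,\textrm{div}\,}}( \rho \nabla u ) = \rho U''(\rho ) \Delta u + U''(\rho ) \nabla \rho \cdot \nabla u \\&= \Phi '(\rho ) \Delta u + |\nabla u|^2, \end{aligned}$$

where the last step uses \(\nabla u = U''(\rho ) \nabla \rho \). For \(\Phi (\rho ) = \rho ^m\) with \(m > 1\) one finds \(\Phi '(\rho ) = m \rho ^{m-1} = (m-1) u\), so the equation closes in the variable u alone.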

First results of asymptotic behaviour for \(m > 1\) were obtained by regularity, self-similarity, and comparison arguments in the late 1970s and early 1980s (see [121] and the references therein). Self-similar analysis and relative entropy arguments were later developed to improve the theoretical understanding (see Sects. 5.1.1 and 5.2.3). For a detailed explanation of the theory see [170, 171].

Remark 3.1

(HE) and (PME) are “diffusive”. This can be seen in different directions. For example, at a point \((t_0,x_0)\) of local maximum of \(\rho \) we get \(\Delta \rho ^m \le 0\) and hence \(\frac{\partial \rho }{\partial t} \le 0\), i.e., the maximum value decays. This is strongly related to the maximum principle. The converse happens to the minimum if it exists (notice that non-negative solutions in \({{\mathbb {R}}^d}\) will tend to 0 as \(|x| \rightarrow \infty \)).

Remark 3.2

Let us recall some “numerological values”. The value \(m = 1-\frac{2}{d}\) is the critical value for finite-mass Barenblatt profiles. It is also the critical value below which there is finite-time extinction (see Sect. 4.2.1). The Barenblatt profile (3.2), and hence \(B_t\), has finite \(\lambda \)-moment, i.e.,

$$\begin{aligned} \int _{{\mathbb {R}}^d}|x|^\lambda B_t(x) \mathop {}\!\textrm{d}x < \infty \end{aligned}$$

if and only if \(m > 1 - \frac{2}{d+\lambda } \). In Sect. 4.3.4 we will present a third relevant value \(m = 1 - \frac{1}{d}\) coming from the Wasserstein gradient-flow analysis.

Remark 3.3

We will not discuss here the limit \(m \rightarrow 0\). In this direction there are two possibilities. On the one hand, some authors have discussed the rescaled limit \(\frac{u^m - 1}{m} \rightarrow \log u\), and we arrive at the logarithmic diffusion equation \(\partial _t u = \Delta \log (u)\). We point to [170, Section 8.2] for a discussion of this topic and [170, Section 8.4] for historical notes. On the other hand, some authors choose to look at the limit \(|u|^{m-1} u \rightarrow \text {sign}(u)\), which leads to the so-called Sign Fast Diffusion Equation \(\partial _t u = \Delta \text {sign}(u)\). Some results in this direction are available [30].

3.3 Transport problem with known potential

To understand the transport terms in the (ADE) family, we will discuss the toy problem

$$\begin{aligned} \frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}(\rho \nabla V) \end{aligned}$$

where V is known and does not depend on t. Imagine that \(\rho _0 \ge 0\) is a radially symmetric function. Computing the balance of mass of a ball \(B_r\) we get

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t}\int _{B_r} \rho _t = \int _{B_r} {{\,\textrm{div}\,}}(\rho \nabla V) = \int _{\partial B_r} \rho \nabla V \cdot \frac{x}{|x|}. \end{aligned}$$
(3.3)

If \(\nabla V \cdot x \le 0\) all balls \(B_r\) are “losing mass”, i.e., mass is always flowing “away” from 0 and towards infinity. Hence, we can think of these kinds of potentials as “diffusive” (but not as strongly as in the sense of Remark 3.1). We will show below by explicitly solving these problems that potentials such that \(\nabla V \cdot x \ge 0\) will attract mass towards 0, asymptotically (or even instantaneously) creating a singularity.
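
The sign condition can be illustrated with the simplest explicit example. The following is a minimal sketch in \(d = 1\), assuming \(V(x) = \pm x^2/2\), for which the characteristics \(\frac{\mathop {}\!\textrm{d}X}{\mathop {}\!\textrm{d}t} = -\nabla V(X)\) are \(X(t) = X(0) e^{\mp t}\), and an initial datum uniform on \((-1,1)\) represented by equally weighted particles. The function names are hypothetical.

```python
import math


def transported_positions(xs0, t, attractive=True):
    """Exact characteristics of dX/dt = -V'(X) for V(x) = +x^2/2 (attractive) or -x^2/2."""
    factor = math.exp(-t) if attractive else math.exp(t)
    return [x * factor for x in xs0]


def mass_in_ball(xs, r):
    """Fraction of the (equal-mass) particles lying in the ball B_r."""
    return sum(1 for x in xs if abs(x) < r) / len(xs)


n_particles = 1000
# Equally spaced particles approximating the uniform density on (-1, 1).
initial_positions = [-1.0 + (2 * i + 1) / n_particles for i in range(n_particles)]
```

With \(\nabla V \cdot x = x^2 \ge 0\) the mass of \(B_{1/2}\) grows from about 1/2 to 1 by time \(t = 1\), whereas with \(\nabla V \cdot x = -x^2 \le 0\) it decays like \(\frac{1}{2} e^{-t}\), in agreement with (3.3).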

3.4 Keller–Segel model

The Keller–Segel model [133] (or Patlak–Keller–Segel model [149]) describes the motion of cells by chemotactic attraction by means of the coupled system

$$\begin{aligned} \left\{ \begin{array}{l} \frac{\partial \rho }{\partial t} = \Delta \rho - {{\,\textrm{div}\,}}(\rho \nabla V_t ) \\ -\Delta V_t= \rho _t . \end{array} \right. \end{aligned}$$
(KSE)

This model was first studied for \(d = 2\). Some authors replace the second equation by a more general evolutionary problem \(\varepsilon \frac{\partial V_t}{\partial t} - \Delta V_t + \alpha V_t = \rho _t\). We will discuss (KSE), which can be thought of as the limit \(\varepsilon , \alpha \rightarrow 0\). This simplification was first introduced by [146] to model the case where the diffusion of the chemical is much faster than its production.

So far, this model is not in the (ADE) family. However, we can analyse the equation for \(V_t\) to recover some information. In \({{\mathbb {R}}^d}\) we can use the Fourier transform \(\mathrm F\) for the elliptic problem to recover

$$\begin{aligned} |\xi |^2 \textrm{F} [V_t] (\xi ) = \textrm{F}[\rho _t] (\xi ) \end{aligned}$$

and hence we can solve and recover \( V_t = W_{\textrm{N}} * \rho \), where \(W_{\mathrm N} = \textrm{F}^{-1}[|\xi |^{-2}]\). In fact, we have the closed expression

$$\begin{aligned} W_{\mathrm N} (x) = \left\{ \begin{array}{ll} -\frac{1}{2\pi } \log |x| &{} \text {if } d = 2, \\ \frac{1}{d(d-2)\omega _d} |x|^{2-d} &{} \text {if } d > 2. \end{array} \right. \end{aligned}$$
(3.4)

Notice that \(-\Delta W_{\textrm{N}} = \delta _0\). The so-called Newtonian potential corresponds to \(\mathrm N = -W_{\textrm{N}}\). Eventually, we can write this problem as (ADE) where \(U = \rho \log \rho \), \(V = 0\) and \(W = -W_{\textrm{N}}\).
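
The normalisation constants in (3.4) can be checked through the divergence theorem: since \(-\Delta W_{\textrm{N}} = \delta _0\), the outward flux of \(-\nabla W_{\textrm{N}}\) through any sphere centred at 0 must be 1. The following is a minimal sketch of this check, assuming that \(\omega _d\) denotes the volume of the unit ball (so that \(|\partial B_r| = d \omega _d r^{d-1}\)) and using a centred finite difference for the radial derivative; the function names are hypothetical.

```python
import math


def unit_ball_volume(d):
    """Volume omega_d of the unit ball of R^d."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)


def newtonian_kernel(r, d):
    """Radial profile of W_N in (3.4), evaluated at radius r > 0."""
    if d == 2:
        return -math.log(r) / (2.0 * math.pi)
    if d > 2:
        return r ** (2 - d) / (d * (d - 2) * unit_ball_volume(d))
    raise ValueError("this sketch only covers d >= 2")


def outward_flux(r, d, h=1e-5):
    """Flux of -grad W_N through the sphere of radius r (expected to be 1)."""
    radial_derivative = (newtonian_kernel(r + h, d) - newtonian_kernel(r - h, d)) / (2.0 * h)
    sphere_area = d * unit_ball_volume(d) * r ** (d - 1)
    return -radial_derivative * sphere_area
```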

We can think about the convolution term \(W*\rho _t\) as an aggregation potential. The justification is the following. If \(\rho _t\) is radially non-increasing, then \(V_t = W_{\textrm{N}}*\rho _t\) is also radially non-increasing, and hence \(\nabla V_t \cdot x \le 0\). Since \(V_t\) enters (KSE) with the opposite sign to the potential of Sect. 3.3, this is what we called an aggregation potential there. Unfortunately, we do not have in general the sign \(\nabla V_t \cdot x \le 0\) for all \(t > 0\) and \(x \in {{\mathbb {R}}^d}\). In fact, (KSE) does not “prefer” \(x = 0\) over other points.

Changing the sign of the aggregation to \(\frac{\partial \rho }{\partial t} = \Delta \rho + {{\,\textrm{div}\,}}(\rho \nabla V_t )\) makes the problem more regularising. In [24] the author proves global well-posedness of that problem for \(d = 3\) in a bounded domain with no-flux conditions.

Sometimes it is convenient to express the problem in a single equation using operator notation:

$$\begin{aligned} \frac{\partial \rho }{\partial t} = \Delta \rho - {{\,\textrm{div}\,}}(\rho \nabla (-\Delta )^{-1} \rho ). \end{aligned}$$

When written in this form, the authors usually allow for any initial mass \(\Vert \rho _0\Vert _{L^1}\). We can rescale to impose (\(\hbox {U}_0\)). If we let \(\chi = \Vert \rho _0\Vert _{L^1}\) and \(u(t,x) = \rho (t,x) / \chi \) we arrive at the equation

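$$\begin{aligned} \frac{\partial u}{\partial t} = \Delta u - \chi {{\,\textrm{div}\,}}(u \nabla (-\Delta )^{-1} u). \end{aligned}$$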

In bounded domains some authors introduce Neumann boundary conditions \(\frac{\partial u}{\partial n} = \frac{\partial V}{\partial n} = 0\). The authors of [129] extend the problem to \(-\Delta V = \beta u - \mu \), which a few authors later called the Jäger–Luckhaus problem (e.g., [156]). This is equivalent to replacing \((-\Delta )^{-1}\) by \((-\Delta + \textrm{id})^{-1}\).

Some authors later introduced the Keller–Segel model with porous medium diffusion. One of the modelling arguments is that \(m > 1\) would deal with “over-crowding” (i.e., the formation of Dirac deltas). We arrive at the problem

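$$\begin{aligned} \frac{\partial \rho }{\partial t} = \Delta \rho ^m - \chi {{\,\textrm{div}\,}}(\rho \nabla (-\Delta )^{-1} \rho ). \end{aligned}$$
(\(\hbox {KSE}_{m,\chi }\))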

This model has the advantage of not having finite time blow-up (see [44]). Existence and uniqueness of solutions for \(m > 1\) and a family of W covering \(W_{\textrm{N}}\) can be found in [15]. Some authors replace \((-\Delta )^{-1}\) by \((-\Delta + \gamma \textrm{id})^{-1}\), see, e.g., [142, 163,164,165].

As we will discuss below, (\(\hbox {KSE}_{m,\chi }\)) admits global solutions in the sub-critical range \(m > 2-\frac{2}{d}\). However, for \(m < 2-\frac{2}{d}\) there is finite-time blow-up. In the critical case blow-up depends on \(\chi \). Curiously enough, the first case studied was \(d = 2\) and \(m = 1\), which is critical. A good explanation of this dichotomy is the free-energy minimisation argument which we will provide in Sect. 6.3.

3.5 Chapman–Rubinstein–Schatzman

In the context of superconductivity, [90] introduced a mean-field model for “the motion of rectilinear vortices in the mixed state of a type-II superconductor”. They argue that, in the limit where the Ginzburg–Landau parameter \(\kappa \) tends to \(+\infty \) and the number of vortices becomes large, one recovers the problem

$$\begin{aligned} \left\{ \begin{array}{l} \frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}( |\rho | \nabla V_t ) \\ -\Delta V_t + V_t = \rho _t . \end{array} \right. \end{aligned}$$
(CRSE)

The model was first introduced in \({\mathbb {R}}^2\). Then, the authors also particularise to a bounded domain \(\Omega \), where they introduce the Dirichlet condition \(V_t(x) = h_0\) on \(\partial \Omega \), and flux conditions prescribing \(|\rho |\nabla V_t \cdot {\textbf{n}}\), the number of vortices crossing the boundary.

In fact, when \(\rho \ge 0\) it belongs to our class (ADE) (or (\(\hbox {ADE}^*\)) in bounded domains). On \({{\mathbb {R}}^d}\) we can solve as before to recover \( V_t = W * \rho _t \) where \(W = \mathrm {F^{-1}}[ (1 + |\xi |^2)^{-1} ]\). As long as \(\rho \ge 0\), this model corresponds to (ADE) where \(U, V = 0\). If we work in a bounded domain \(\Omega \) with the homogeneous Dirichlet condition we recover (\(\hbox {ADE}^*\)) where \(U, V = 0\) and K is the Green kernel of the equation. Notice that K is not compactly supported in \(\Omega \times \Omega \).

There is extensive literature on this problem. First, there were several works applying PDE theory. In [90] the authors proved existence of solutions for the problem posed in bounded domains. Then in [127] the authors work with non-negative and compactly supported \(\rho _0\), using generalised characteristics. We also point to [154] where the authors perform a vanishing viscosity analysis in bounded domains.

Later, some authors applied optimal transport methods, in particular the 2-Wasserstein gradient-flow approach. In [5] the authors worked with non-negative probability measures in a bounded domain, i.e., \({\mathcal {P}}_2 ({\overline{\Omega }})\). Later, the work was extended in [4] to a bounded domain and signed measures. They extend the Wasserstein distance to signed measures and in fact they minimise a localised energy in the space of signed unit measures in \({{\mathbb {R}}^d}\) to recover a problem of the form

$$\begin{aligned} \frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}( {\textbf{1}}_\Omega |\rho | \nabla V_t ), \quad \text {for } t> 0, x \in {{\mathbb {R}}^d}. \end{aligned}$$

This sparked broader interest in the family of aggregation equations (AE).

3.6 Newtonian vortex problem

In [139] the authors justify that a different limit from the Ginzburg–Landau equations leads to the problem without zero-order term

$$\begin{aligned} \left\{ \begin{array}{l} \frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}(\rho \nabla V_t ) \\ -\Delta V_t = \rho . \end{array} \right. \end{aligned}$$
(NVE)

Again, this can be set in \({{\mathbb {R}}^d}\) or in a bounded domain (with suitable boundary conditions).

This problem also attracted several authors, who brought different approaches. In [139] the authors construct solutions by characteristics with the so-called “vortex-blob” method. In [143] the authors prove existence of renormalised solutions. The authors of [21] study positive and negative solutions. This is equivalent to studying non-negative solutions of the problems

$$\begin{aligned} \frac{\partial \rho }{\partial t} = \pm {{\,\textrm{div}\,}}(\rho \nabla W_{\textrm{N}} * \rho ). \end{aligned}$$

The case \(W = -W_{\textrm{N}}\) can also be seen as the problem with \(W = W_{\textrm{N}}\) taken backwards in time. In fact, the authors prove that the \(+\) sign leads to asymptotic dispersion whereas the − sign leads to blow-up.

3.7 The Caffarelli–Vázquez problem

In [38], Caffarelli and Vázquez introduced a non-local porous medium-type equation given by

$$\begin{aligned} \left\{ \begin{array}{l} \frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}(\rho \nabla V_t ) \\ (-\Delta )^s V_t = \rho , \end{array} \right. \end{aligned}$$

where \((-\Delta )^s\) denotes the fractional Laplacian. In \({{\mathbb {R}}^d}\) this operator is the fractional power of the Laplacian, which can be rigorously defined as the operator with Fourier symbol \(|\xi |^{2s}\). This problem is often called the Caffarelli–Vázquez problem and is often written in the compact form

$$\begin{aligned} \frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}(\rho \nabla (-\Delta )^{-s} \rho ). \end{aligned}$$
(CVE)

This is a “diffusive” operator. To solve the fractional Laplace equation we take \(W = \mathrm F^{-1}[|\xi |^{-2s}]\), and we recover the so-called Riesz potential

$$\begin{aligned} W_{\textrm{R}} (x) = c_{d,s} \left\{ \begin{array}{ll} |x|^{2s-d} &{} \text {if } 2s < d, \\ \log |x| &{} \text {if } 2s = d \end{array} \right. \end{aligned}$$

The convolution properties of the Riesz potential were already well known in harmonic analysis. There has been a lot of analysis for this equation: asymptotic analysis [41], sign-changing case and self-similar solutions [25], regularity theory [37, 39], and in [157] it is shown that the limit as \(s \rightarrow 1\) is (NVE). In [140] the authors construct a solution through the Wasserstein gradient-flow theory described in Sect. 4.3.

3.8 McKean–Vlasov and Kuramoto problems

The McKean–Vlasov process is given by the stochastic differential equation

$$\begin{aligned} \mathop {}\!\textrm{d}X_t = a_t \mathop {}\!\textrm{d}{\mathcal {B}}_t + b_t \mathop {}\!\textrm{d}t \end{aligned}$$

where \(a_t, b_t\) may depend on \(X_t\) and its law, and \({\mathcal {B}}_t\) is a Wiener process, i.e., we introduce Brownian motion. We can extend this idea to a system of N particles driven by coupled McKean–Vlasov equations. If we consider that \(a_t\) is constant, say \(a_t = \sqrt{2 \beta ^{-1}}\), and \(b_t\) is given by the interaction we can arrive at

$$\begin{aligned} \mathop {}\!\textrm{d}X_t^{(i)} = \Bigg ( -\nabla V(X_t^{(i)}) -\frac{1}{N}\sum _{\begin{array}{c} j=1 \\ j \ne i \end{array}}^N \nabla W(X_t^{(i)}-X_t^{(j)}) \Bigg ) \mathop {}\!\textrm{d}t + \sqrt{2 \beta ^{-1}} \mathop {}\!\textrm{d}{\mathcal {B}}_t. \end{aligned}$$

Through arguments of mean-field theory (or propagation of chaos; see, e.g., [128]), one arrives asymptotically as \(N \rightarrow \infty \) at (ADE) with linear diffusion \(U = \beta ^{-1} \rho \log \rho \) (see, e.g., [64]). This is the reason the problem

$$\begin{aligned} \frac{\partial \rho }{\partial t} = \beta ^{-1} \Delta \rho + {{\,\textrm{div}\,}}( \rho \nabla (V + W* \rho ) ) \end{aligned}$$
(McKVE)

is sometimes called the McKean–Vlasov problem (see, e.g., [64]). However, this terminology is not universal (see, e.g., [61]).
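The particle system above can be explored numerically. The following is a minimal sketch in \(d = 1\), assuming an explicit Euler–Maruyama discretisation (our choice, not a scheme taken from the references); the function names `euler_maruyama_step` and `simulate`, the step size, and the quadratic potentials in the usage lines are all hypothetical.

```python
import math
import random


def euler_maruyama_step(x, grad_V, grad_W, beta_inv, dt, rng):
    """One explicit step for the N-particle system in d = 1.

    Drift of particle i: -grad_V(x_i) - (1/N) * sum_{j != i} grad_W(x_i - x_j).
    Noise amplitude: sqrt(2 * beta_inv * dt) times a standard Gaussian.
    """
    n = len(x)
    new_x = []
    for i in range(n):
        interaction = sum(grad_W(x[i] - x[j]) for j in range(n) if j != i) / n
        drift = -grad_V(x[i]) - interaction
        noise = math.sqrt(2.0 * beta_inv * dt) * rng.gauss(0.0, 1.0)
        new_x.append(x[i] + drift * dt + noise)
    return new_x


def simulate(x0, grad_V, grad_W, beta_inv, dt, n_steps, seed=0):
    """Iterate the step above; all particles are updated from the old state."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        x = euler_maruyama_step(x, grad_V, grad_W, beta_inv, dt, rng)
    return x


if __name__ == "__main__":
    # Illustrative choices (assumptions): V(x) = x^2 / 2 and W(x) = x^2 / 2.
    start = [0.1 * k - 1.5 for k in range(30)]
    final = simulate(start, lambda x: x, lambda d: d, beta_inv=0.5, dt=0.01, n_steps=100)
    print(sum(final) / len(final))
```

For large N the empirical measure of the particles is expected to approximate a solution of (McKVE), although this sketch makes no claim about convergence rates.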

Kuramoto [135] introduced a model for the synchronisation of chemical and biological oscillators, and it has been well received by the neuroscience community. It is a particular case of the McKean–Vlasov problem in \(d = 1\) where \(X_t^{(i)}\) models the phase of a coupled oscillator. Therefore, we take \(X_t^{(i)} \in [0,2\pi )\) or, equivalently, periodic boundary conditions. The particular choices are \(\nabla V(x) = -\omega _i\), where \(\omega _i\) is the natural frequency of the oscillator, and \(\nabla W(x) = \sin (x)\). The resulting (ADE) is known as the Kuramoto model

$$\begin{aligned} \left\{ \begin{array}{ll} \frac{\partial \rho }{\partial t} = \beta ^{-1} \Delta \rho + {{\,\textrm{div}\,}}( \rho (-\omega + \sin * \rho ) ) &{} t > 0, x \in (0,2\pi ), \\ \rho _t \text { is } 2\pi - \text {periodic}. \end{array} \right. \end{aligned}$$
(KE)

The case without diffusion (i.e., \(\beta ^{-1} = 0\)) is also interesting. An analysis of this problem can be found in [48].

3.9 The neural networks of machine learning

A neural network is no more than a specific type of parametric function accepting an input vector \(x \in {{\mathbb {R}}^d}\) and returning \(y \in {\mathbb {R}}\). If the output is higher dimensional, then we can create a neural network for each output. In this context “training” corresponds to optimising the parameters to fit a set of prescribed data (supervised learning) or to achieve some objective (unsupervised learning). Neural networks are formed by connecting so-called perceptrons. A perceptron is a parametric function of the form

$$\begin{aligned} x \longmapsto \sigma ( \theta _{1} \cdot x + \theta _{2} ) \end{aligned}$$

where \(\sigma \) is called the activation function and \(\theta _{1} \in {{\mathbb {R}}^d}, \theta _{2} \in {\mathbb {R}}\). A one-layer neural network is a linear combination of the outputs of N perceptrons, and can be expressed as

$$\begin{aligned} f_N( x | w, \theta ) {:}{=}\sum _{i=1}^N \frac{w_i}{N} \sigma ( \theta _{1,i} \cdot x + \theta _{2,i} ) \end{aligned}$$

where \(w_i \in {\mathbb {R}}\), \(\theta _{1,i} \in {{\mathbb {R}}^d}\), and \(\theta _{2,i} \in {\mathbb {R}}\). The aim is, for N fixed, to find the parameters that best approximate a target function f, in the sense that

$$\begin{aligned} \min _{(w, \theta )} \int _{{\mathbb {R}}^d}| f(x) - f_N( x | w, \theta ) |^2\mathop {}\!\textrm{d}x. \end{aligned}$$
(3.5)

The quadratic error can be replaced by more general functions.

In [118] the authors give a nice presentation of the following continuous formulation, i.e., the limit as \(N \rightarrow \infty \). We can let \(\xi _i {:}{=}(w_i,\theta _{1,i},\theta _{2,i})\), \(\Xi (\xi ,x) {:}{=}w \sigma (\theta _{1} \cdot x + \theta _{2})\) for \(\xi = (w, \theta _1, \theta _2)\), and define the empirical distribution

$$\begin{aligned} \mu _N {:}{=}\frac{1}{N} \sum _{i=1}^N \delta _{\xi _i}. \end{aligned}$$
(3.6)

For convenience, define \(\Omega \) as the set of admissible \(\xi \). Then, the value of the neural network can be computed as

$$\begin{aligned} f_N( x | w, \theta ) = \int _\Omega \Xi (\xi ,x) \mathop {}\!\textrm{d}\mu _N(\xi ). \end{aligned}$$
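The identity above simply says that integrating against the empirical distribution is an average over the particles \(\xi _i\). A small sketch makes this concrete; the names `feature`, `network_direct`, and `network_as_measure` are hypothetical, and \(\tanh \) is an assumed activation function chosen only for illustration.

```python
import math


def feature(xi, x, sigma=math.tanh):
    """Xi(xi, x) = w * sigma(theta1 . x + theta2) for xi = (w, theta1, theta2)."""
    w, theta1, theta2 = xi
    pre_activation = sum(a * b for a, b in zip(theta1, x)) + theta2
    return w * sigma(pre_activation)


def network_direct(x, w, theta1, theta2, sigma=math.tanh):
    """f_N(x | w, theta) written as the sum over N perceptrons with weights w_i / N."""
    n = len(w)
    total = 0.0
    for i in range(n):
        pre_activation = sum(a * b for a, b in zip(theta1[i], x)) + theta2[i]
        total += w[i] / n * sigma(pre_activation)
    return total


def network_as_measure(x, particles, sigma=math.tanh):
    """Integral of Xi(., x) against mu_N = (1/N) sum_i delta_{xi_i}, i.e. a plain average."""
    return sum(feature(xi, x, sigma) for xi in particles) / len(particles)


if __name__ == "__main__":
    w = [0.5, -1.0, 2.0]
    theta1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    theta2 = [0.0, 0.1, -0.2]
    x = [0.3, -0.7]
    particles = list(zip(w, theta1, theta2))
    print(network_direct(x, w, theta1, theta2), network_as_measure(x, particles))
```

Both functions return the same number up to rounding, which is the content of the displayed identity.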

We can rewrite (3.5) as a minimisation problem over the set of empirical distributions given by (3.6). The limit as \(N \rightarrow \infty \) is now clear: we pass to the minimisation problem over \({\mathcal {P}} (\Omega )\). To regularise the problem the authors allow for a penalisation \({{\widehat{V}}}\) (of type \(|\xi |^2/2\)), which can be understood as trying to prevent overfitting. We land on the problem

$$\begin{aligned} \min _{\mu \in {\mathcal {P}} (\Omega )} \frac{1}{2} \int _{{\mathbb {R}}^d}\left| f(x) - \int _\Omega \Xi (\xi , x) \mathop {}\!\textrm{d}\mu (\xi ) \right| ^2\mathop {}\!\textrm{d}x + \int _\Omega {{\widehat{V}}}(\xi ) \mathop {}\!\textrm{d}\mu (\xi ). \end{aligned}$$

Expanding the square, this is precisely (\(\hbox {FE}^*\)) for \(U = 0\) and

$$\begin{aligned} K(x,y)&= \frac{1}{2} \int _{{\mathbb {R}}^d}\Xi (x,z) \Xi (y,z) \mathop {}\!\textrm{d}z, \\ V(x)&= -\int _{{\mathbb {R}}^d}\Xi (x,z) f(z) \mathop {}\!\textrm{d}z + {{\widehat{V}}}(x). \end{aligned}$$

To find local minima of this energy over \({\mathcal {P}}(\Omega )\), it is natural to consider the associated 2-Wasserstein gradient flow. As we will show below, this leads to (\(\hbox {ADE}^*\)).

In [94] the authors generalise this result to problems with more than one layer.

3.10 Granular flow equation

Another popular example of the (ADE) family is the so-called granular flow equation. This phenomenon is usually modelled in phase space \((t, x, v)\) where v is the velocity. In that model \(\rho (t,x,v)\) represents the phase-space distribution. We arrive at (ADE) if \(\rho (0,x,v)\) does not depend on x. In this setting, U models the random interactions of the granules with their environment (a fluid or heat bath), V models friction, and W models inelastic collisions between granules with different velocities. For more details see [16, 77, 176].

3.11 The power-type family of aggregation-diffusion equations

Many authors have devoted their attention to the power-type family of non-linearities given by \(U = U_m\),

$$\begin{aligned} V_\lambda (x) = \left\{ \begin{array}{ll} \frac{|x|^\lambda }{\lambda } &{} \text {if } \lambda \ne 0 \\ \log |x| &{} \text {if } \lambda = 0, \end{array} \right. , \qquad W_k (x) = \left\{ \begin{array}{ll} \frac{|x|^k}{k} &{} \text {if } k \in (-d,d) \setminus \{0\} \\ \log |x| &{} \text {if } k = 0, \end{array} \right. \end{aligned}$$

Notice that

$$\begin{aligned} W_{\textrm{N}} = - C_{d} W_{2-d} \quad \text { and } \quad W_{\textrm{R}} = - C_{d,s} W_{2s-d} \quad \text { with } C_{d}> 0, C_{d,s} > 0 \end{aligned}$$

This family covers most of the relevant examples above. We introduce the associated free energies

$$\begin{aligned} {\mathcal {U}}_m[\rho ]= & {} \int _{{\mathbb {R}}^d}U_m(\rho ), \qquad {\mathcal {V}}_\lambda [\rho ] = \int _{{\mathbb {R}}^d}V_\lambda \rho , \qquad \nonumber \\ {\mathcal {W}}_k[\rho ]= & {} \frac{1}{2} \iint _{{\mathbb {R}}^d}W_k(x-y) \rho (x) \rho (y) \mathop {}\!\textrm{d}x \mathop {}\!\textrm{d}y. \end{aligned}$$
(3.7)

4 Well-posedness frameworks

After setting a model like the ones described in Sect. 3, and before any other analysis can take place, the mathematician immediately aims to solve the question of existence and uniqueness. For this, one needs a suitable framework in which to define what is a “solution”.

The very first approach to a PDE is to consider the existence of so-called classical solutions. These are solutions that are smooth enough so that all the derivatives in the equation can be taken, and the equation is satisfied at all points in the domain.

The hope for classical solutions quickly vanishes for many singular problems (e.g., (PME) when \(m > 1\)). To deal with this difficulty, many authors in the 20th century dedicated their efforts to the notion of weak solutions, with their derivatives taken in the distributional sense. The equation then need only be satisfied at almost every point, together with additional reasonable conditions. We will make some comments on this in Sect. 4.1.

We devote Sect. 4.2 to presenting some good and bad expected properties in different settings. In particular, we will also make a brief stop in Sect. 4.1.6 to discuss stationary solutions (i.e., solutions not evolving in time). When not even weak solutions may exist (like in the transport equation), the community turned towards Optimal Transport techniques, which we will discuss in Sect. 4.3.

4.1 The PDE approach

First, we discuss the more classical approach to understand “solutions” of (ADE).

4.1.1 Local PDEs

Non-linear diffusion problems of the general type

$$\begin{aligned} \frac{\partial \rho }{\partial t} = \Delta \Phi (\rho ) + {{\,\textrm{div}\,}}( {\textbf{E}}(t,x, \rho (t,x))) \end{aligned}$$
(4.1)

were widely studied in the 20th century. Through the work of many authors, different types of solutions have been studied: classical, weak, entropy, viscosity, etc.

Let us start by discussing the nicer problem given by non-degenerate parabolic problems. When \(\Phi , E\) are smooth and we assume \(\Phi \) is uniformly elliptic, in the sense that there exist constants such that

$$\begin{aligned} 0< c_1 \le \Phi '(u) \le c_2 < \infty , \end{aligned}$$
(4.2)

existence, uniqueness, and maximum principle hold from the classical theory. The literature is extensive: in \({{\mathbb {R}}^d}\) this issue was solved at the beginning of the twentieth century (see [136]). This book already contains several results for bounded domains (see [136, Chapter V.6 and V.7]). However, the development of regular solutions would leave room for later improvement. A good reference for Dirichlet boundary conditions is [108]. There is a series of works by [1] for Neumann/Robin boundary conditions, although they provide very general ellipticity conditions which are written for systems and are not easy to apply to specific cases. There are later references which are more concise, e.g., [180].

The theory of weak solutions is based on the notion of weak (or distributional) derivative, and hence solutions are found in Sobolev spaces. This is typical of parabolic problems where \(\Phi \ne 0\). There are more general theories that allow for well-posedness even when \(\Phi = 0\): entropy solutions (see [47, 132] and the references therein) and viscosity solutions (e.g., [97] and the references therein).

The theory of weak and energy solutions for (PME) is well presented in the book [171]. Notice that for \(m > 1\) we have \(\Phi '(0) = 0\), and we say that the diffusion is degenerate, and for \(m < 1\) we have \(\Phi '(0) = \infty \), and we say the diffusion is singular. The problems with \(U = 0\) (or, equivalently, \(\Phi = 0\)) are purely of first order and hence may yield discontinuous solutions.

Remark 4.1

Occasionally, Hölder spaces are used in addition to Sobolev spaces. For this, one can use the method of intrinsic scaling (see [108, 169] and the references therein).

Remark 4.2

Purely-diffusive problems like (PME) admit semigroup-type solutions. In the linear theory this is the Hille–Yosida theorem (see, e.g., [33, Chapter 7]) and in the non-linear setting the Crandall–Liggett theorem (see [96]). The semigroup is constructed as the limit \(\tau \rightarrow 0\) of solutions of the implicit Euler scheme \(u_0 = \rho _0\) and

$$\begin{aligned} u_{k+1} + \tau (-\Delta u_{k+1}^m) = u_k. \end{aligned}$$

To be precise, we take \(t_k = k \tau \) and for \(t \in [t_k, t_{k+1})\) we define

$$\begin{aligned} \rho ^{(\tau )}_t = (1-\tfrac{t-t_k}{\tau }) u_k + \tfrac{t-t_k}{\tau } u_{k+1}. \end{aligned}$$
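To illustrate the implicit Euler scheme and the interpolation in time, here is a minimal sketch restricted to the linear case \(m = 1\) (the Hille–Yosida setting) in one space dimension. The finite-difference Laplacian with homogeneous Dirichlet data, the grid, and the names `implicit_euler_heat_step` and `interpolate` are assumptions made for the example; the non-linear case \(m \ne 1\) would need a non-linear solver at each step and is not covered here.

```python
def implicit_euler_heat_step(u, tau, h):
    """Solve u_new - tau * Laplacian_h(u_new) = u for the interior grid values.

    Laplacian_h is the standard three-point stencil with zero boundary values;
    the tridiagonal system is solved with the Thomas algorithm.
    """
    n = len(u)
    r = tau / (h * h)
    off = -r                # sub- and super-diagonal entries
    diag = 1.0 + 2.0 * r    # diagonal entries
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = off / diag
    d_prime[0] = u[0] / diag
    for i in range(1, n):   # forward elimination
        denom = diag - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (u[i] - off * d_prime[i - 1]) / denom
    new_u = [0.0] * n
    new_u[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):   # back substitution
        new_u[i] = d_prime[i] - c_prime[i] * new_u[i + 1]
    return new_u


def interpolate(u_k, u_next, t, t_k, tau):
    """Piecewise-linear interpolation in time between u_k and u_{k+1}."""
    s = (t - t_k) / tau
    return [(1.0 - s) * a + s * b for a, b in zip(u_k, u_next)]


if __name__ == "__main__":
    h, tau = 0.1, 0.02
    u = [0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0]
    for _ in range(5):
        u = implicit_euler_heat_step(u, tau, h)
    print(u)
```

Each step is unconditionally stable and preserves non-negativity, which is the discrete counterpart of the maximum principle mentioned above.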

Remark 4.3

Homogeneous Dirichlet (i.e., \(u = 0\)) and Neumann (i.e., \(\nabla u \cdot n = 0\)) boundary conditions are fairly well understood in terms of existence and uniqueness (see, e.g., [136]). In linear problems, even so-called Robin-type boundary conditions \(\nabla u \cdot n + \alpha u= 0\) are well understood. The operator \(Au = -\Delta u - {{\,\textrm{div}\,}}(u \nabla V)\) with no-flux condition is monotone (i.e., \(\langle Au, u \rangle \ge 0\)) if \(\Vert \nabla V \Vert _{L^\infty }\) is small enough, so well-posedness follows from the Hille–Yosida theory. Non-linear diffusion problems are more difficult. Works for (KSE) and related problems usually take advantage of an auxiliary boundary condition for \(V_t\).

When there is no diffusion or interaction (i.e., \(U, W = 0\)), the choice \(V = -|x|^2\) leads to mass trying to exit any ball \(B_r\). If we work on a ball \(B_R\) and we set the no-flux condition \(\rho \nabla V \cdot n = 0\) on \(\partial B_R\), the result will be the formation of Dirac deltas on \(\partial B_R\). This can be properly understood by the optimal transport theory below. It points to a subtle trade-off between diffusion and aggregation.

Existence and uniqueness for (\(\hbox {ADE}^*\)) is only understood with (\(\hbox {BC}^*\)) if we assume (2.7) (see [57] for \(U = U_m\) with \(m \in (0,1)\)). The general case when (2.7) does not hold seems to be open. We provide a tricky example with no-diffusion in Remark 4.14.

4.1.2 Weak vs very weak solution

If we work in \({{\mathbb {R}}^d}\) we can take \(\varphi \in C_c^\infty ((0,T) \times {{\mathbb {R}}^d})\), multiply the equation by \(\varphi \) and integrate by parts once to recover

$$\begin{aligned} -\int _0^\infty \int _{{\mathbb {R}}^d}\rho _t \frac{\partial \varphi }{\partial t} = - \int _0^\infty \int _{{\mathbb {R}}^d}\rho \nabla ( U'(\rho ) + V + W*\rho ) \cdot \nabla \varphi \end{aligned}$$

Functions \(\rho \) that satisfy this equation are so-called weak solutions of the problem. Here, \(U(\rho )\) must have a gradient in some sense. In \({{\mathbb {R}}^d}\) we can integrate by parts again in the diffusion term to recover

$$\begin{aligned} -\int _0^\infty \int _{{\mathbb {R}}^d}\rho _t \frac{\partial \varphi }{\partial t} = \int _0^\infty \int _{{\mathbb {R}}^d}\Phi (\rho ) \Delta \varphi - \int _0^\infty \int _{{\mathbb {R}}^d}\rho \nabla ( V + W*\rho ) \cdot \nabla \varphi . \end{aligned}$$

These are the so-called very weak solutions to the problem.

If we work in a bounded domain \(\Omega \) and consider the no-flux condition we can recover weak solutions. If we want to recover the no-flux condition from this weak formulation, we need to take \(\varphi \) not vanishing in the boundary, for example \(C_c^\infty ((0,T); C^\infty ({\overline{\Omega }}))\). To integrate by parts again we would need that \(\partial _n \Phi (\rho ) = 0\) on the boundary. This is only possible if we assume that V and K do not interfere with the boundary (2.7).

4.1.3 Solutions of (\(\hbox {ADE}^*\)) by approximation

We will not carry out a detailed analysis in Sobolev spaces. Instead, let us comment on the structure the argument would have if the solutions were smooth. We define \(\Phi '(s) = s U''(s)\) and \(\Phi (0) = 0\). To give a hint on the procedure:

  1. For uniformly elliptic \(\Phi _\varepsilon \), e.g.,

    $$\begin{aligned} \Phi '_\varepsilon \sim \varepsilon + (\Phi ' \wedge \varepsilon ^{-1}) \end{aligned}$$

    and \(E_t(x)\) known, existence follows from the classical theory above.

  2. Letting \(\varepsilon \rightarrow 0\) yields solutions of the degenerate/singular diffusion term. This is the approach of [171].

  3. Take \(V_\kappa , K_\kappa \) smooth approximations of V and K. Then replace \(E_t(x)\) by \(\rho \nabla (V + {{\mathcal {K}}}\rho )\) through a fixed-point argument.

  4. Pass to the limit in \(\kappa \).

The order of steps 2 and 3 can be suitably exchanged depending on convenience. They depend on delicate a priori estimates. For particular examples of global-in-time existence and uniqueness theory using this approach we point to [66] for \(m > 1\), [61] for \(m = 1\), and [57] for \(m \in (0,1)\) (the arguments in [171] make \(m > 1\) simpler). For (AE) we point the reader to [137] for a vanishing viscosity argument.

4.1.4 Push-forward solutions

When the velocity field v is known, the transport can be written as

$$\begin{aligned} \frac{\partial \rho }{\partial t} + {{\,\textrm{div}\,}}(\rho v_t ) = 0. \end{aligned}$$

If v is smooth, then the problem can be solved by characteristics. This problem can also be recovered by trying to find solutions along generalised characteristics \(\rho _t(X_t(y)) = A_t \rho _0(y)\). This leads to the set of ODEs

$$\begin{aligned} {\left\{ \begin{array}{ll} \dfrac{\partial X_t}{\partial t} = v_t(X_t) &{} t > 0, \\ X_0(y) = y. \end{array}\right. } \end{aligned}$$
(4.3)

If \(v_t\) is Lipschitz in x, the field of characteristics is the unique solution of this problem. We do not concern ourselves with \(A_t\) for now.

A more elegant approach is to look for solutions of the Lagrangian formulation of the problem.

$$\begin{aligned} \int _{X_t(A)} \rho _t(x) \mathop {}\!\textrm{d}x = \int _{A} \rho _0(x) \mathop {}\!\textrm{d}x. \end{aligned}$$
(4.4)

This is equivalent to

$$\begin{aligned} \int _{{{\mathbb {R}}^d}} \rho _t(x) \varphi (x) \mathop {}\!\textrm{d}x = \int _{{\mathbb {R}}^d}\rho _0(x) \varphi (X_t(x)) \mathop {}\!\textrm{d}x. \end{aligned}$$

For the details we recommend [2, Lecture 16]. This taps into a well-understood theory coming from the optimal transport world (we will come back to this in Sect. 4.3). Given a measure \(\mu \in {\mathcal {P}}(\Omega )\) and a measurable map \(T: \Omega \rightarrow \Omega \) we can define the push-forward of \(\mu \) by T, which we denote \(T_\sharp \mu \in {\mathcal {P}}(\Omega )\), as

$$\begin{aligned} T_\sharp \mu (B) {:}{=}\mu (T^{-1}(B)) \quad \text { for all } B \text { measurable.} \end{aligned}$$

In particular, we can rewrite (4.4) as

$$\begin{aligned} \rho _t = (X_t)_\sharp \rho _0. \end{aligned}$$

There are many properties of \(\rho _t\) that can be derived directly from this structure and the analysis of the ODEs (4.3).
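For a discrete measure the push-forward can be computed by simply moving the atoms along the characteristics. The sketch below assumes the illustrative velocity field \(v(x) = -x\) in \(d = 1\), for which (4.3) has the closed-form solution \(X_t(y) = y e^{-t}\); the names `flow_map`, `integrate_against_pushforward`, and `pushforward_mass` are hypothetical.

```python
import math


def flow_map(y, t):
    """Characteristic X_t(y) for the assumed velocity v(x) = -x: X_t(y) = y * exp(-t)."""
    return y * math.exp(-t)


def integrate_against_pushforward(phi, points, weights, t):
    """Integral of phi against (X_t)_# mu_0, where mu_0 = sum_i weights[i] * delta_{points[i]}.

    By definition of the push-forward this equals sum_i weights[i] * phi(X_t(points[i])).
    """
    return sum(w * phi(flow_map(y, t)) for y, w in zip(points, weights))


def pushforward_mass(interval, points, weights, t):
    """(X_t)_# mu_0 (B) = mu_0(X_t^{-1}(B)) for the closed interval B = [lo, hi]."""
    lo, hi = interval
    return sum(w for y, w in zip(points, weights) if lo <= flow_map(y, t) <= hi)


if __name__ == "__main__":
    points = [-2.0, -1.0, 0.0, 1.0, 2.0]
    weights = [0.2] * 5
    for t in (0.0, 0.5, 1.0):
        second_moment = integrate_against_pushforward(lambda x: x * x, points, weights, t)
        print(t, second_moment, pushforward_mass((-1.5, 1.5), points, weights, t))
```

The total mass is unchanged while the second moment decays like \(e^{-2t}\), a property read off directly from the ODEs rather than from the PDE.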

4.1.5 Solutions by characteristics when \(U = 0\)

If \(U, W = 0\), then we can simply solve the decoupled ODEs

$$\begin{aligned} {\left\{ \begin{array}{ll} \dfrac{\partial X_t}{\partial t} = -\nabla V(X_t) &{} t > 0, \\ X_0(y) = y, \end{array}\right. } \end{aligned}$$

and set \(\rho _t = (X_t)_\# \rho _0\). When V is of class \(C^2\), a unique solution exists by the Picard–Lindelöf theorem. Furthermore, coming back to Sect. 3.3, if we have \(\nabla V(x) \cdot x \ge 0\) then

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} | X_t | ^2 = - 2 \nabla V(X_t) \cdot X_t \le 0 \end{aligned}$$

so all the mass is moving towards 0. Since \(X_t = 0\) is a solution, when \(V \in C^2\) mass will concentrate in infinite time. A posteriori, for (ADE) when \(U = 0\) one can always define

$$\begin{aligned} v_t = -\nabla (V + W*\rho _t) \end{aligned}$$

and arrive at a transport problem.
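As a worked instance of the decoupled case \(U, W = 0\) above, consider the illustrative choice \(V(x) = |x|^2/2\) (an assumption made for the sake of the example). The characteristics and the push-forward density can then be written explicitly:

$$\begin{aligned} X_t(y) = e^{-t} y, \qquad \rho _t(x) = e^{dt} \rho _0(e^{t} x), \qquad \int _{{\mathbb {R}}^d}\rho _t \mathop {}\!\textrm{d}x = \int _{{\mathbb {R}}^d}\rho _0 \mathop {}\!\textrm{d}x. \end{aligned}$$

Here \(|X_t(y)| = e^{-t} |y| > 0\) for every finite t when \(y \ne 0\), so no characteristic reaches the origin in finite time, while \(\rho _t\) converges weakly to a Dirac delta at 0 carrying the total mass as \(t \rightarrow \infty \).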

The problem (AE) is more involved. The specific case (CRSE) was done in [127], where the author arrives at closed-form formulas for the characteristics. The author of [138] proves global existence of solutions when W is good enough by a fixed-point argument with suitable a priori estimates in Sobolev spaces. This result was later applied to construct local-in-time solutions by approximation [18, 19, 22], where a key difficulty is to then decide if there is global existence or finite-time blow-up. Later works use characteristics to study blow-up behaviours (see, e.g., [20]).

4.1.6 Stationary solutions

There is a particular type of solutions which will be of interest for the asymptotics. A stationary solution is a solution that does not evolve in time. We recall the equivalent writing (2.10). If \({\widehat{\rho }}\) is a stationary solution, using \(\varphi = \frac{\delta {{\mathcal {F}}}}{\delta \rho } [{\widehat{\rho }}]\) as a test function yields

$$\begin{aligned} \int _\Omega {\widehat{\rho }} \left| \nabla \frac{\delta {{\mathcal {F}}}}{\delta \rho } [{\widehat{\rho }}] \right| ^2 = 0. \end{aligned}$$

Hence, we arrive at the equation for stationary states

$$\begin{aligned} {\widehat{\rho }} \, \nabla \frac{\delta {{\mathcal {F}}}}{\delta \rho } [{\widehat{\rho }}] = 0, \qquad \text {a.e. in } \Omega . \end{aligned}$$
(ADE-S)

This means that in open connected sets where \({\widehat{\rho }} > 0\), \(\frac{\delta {{\mathcal {F}}}}{\delta \rho } [{\widehat{\rho }}]\) is constant.

Usually it is also required that \(\rho \in L^1 \cap L^\infty \), that \(\Phi (\rho ) \in W^{1,1}_{loc}\) (where \(\Phi '(s) = s U''(s)\) and \(\Phi (0) = 0\)), and that \(\nabla V, \nabla W * \rho \in L^1_{loc}\). This regularity is not always achievable.

Remark 4.4

When they are available, these kinds of solutions will typically be recovered from a minimisation problem. We will discuss particular examples in Sect. 6.3.

Remark 4.5

Notice that when \(U = U_m\) and \(V = V_2\), the Barenblatt profile (3.2) (i.e., (ZKB) in re-scaled variables) is a solution of (ADE-S).

4.2 A priori estimates for smooth solutions

In this section we make some formal computations to understand properties of the solution of (\(\hbox {ADE}^*\)). We will mostly assume that \(\rho \) is smooth. However, due to weak lower semi-continuity of norms, the a priori estimates still usually hold after approximation.

4.2.1 Conservation of mass

Let us consider (\(\hbox {ADE}^*\)), and take a smooth set \(\omega \subset {{\mathbb {R}}^d}\). We can formally compute the balance of mass

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} \int _\omega \rho \mathop {}\!\textrm{d}x= & {} \int _\omega \frac{\partial \rho }{\partial t} = \int _\omega {{\,\textrm{div}\,}}\left( \rho \nabla (U'(\rho ) + V + {\mathcal {K}} \rho ) \right) \mathop {}\!\textrm{d}x \\ {}= & {} \int _{\partial \omega } \rho \nabla (U'(\rho ) + V + {\mathcal {K}} \rho ) \cdot n \mathop {}\!\textrm{d}S. \end{aligned}$$

Hence, if we have the boundary conditions (BC) we have conservation of mass. In \({{\mathbb {R}}^d}\) we will need that the decay as \(|x| \rightarrow \infty \) is fast enough. This is not always the case, as shown by the next result (see [17]).

Proposition 4.6

Let \(\rho \) be a solution of (PME) in \({{\mathbb {R}}^d}\) with \(m \in (0, \frac{d-2}{d})\) and \(d \ge 3\). Then

  1. If \(\rho _0 \in L^{q} ({{\mathbb {R}}^d})\) with \(q = \frac{(1-m)d}{2}\) then there is finite-time extinction.

  2. If \(\rho _0 \in L^1 ({{\mathbb {R}}^d})\) then there is (at least) infinite-time extinction.

Proof

First let \(\rho _0 \in L^q\). Then, we compute for sufficiently good solutions that

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} \frac{1}{q} \int _{{\mathbb {R}}^d}\rho ^q&= - C \int _{{\mathbb {R}}^d}| \nabla \rho ^{\frac{m+q-1}{2} }|^2 \le -C { \left( \int _{{\mathbb {R}}^d}\rho ^{\frac{m+q-1}{2}2^*} \right) ^{\frac{2}{2^*}} } \end{aligned}$$
(4.5)

using Sobolev’s inequality, where \(\frac{1}{2^*} = \frac{1}{2} - \frac{1}{d}\). By the choice of q we have \(\frac{m+q-1}{2} 2^* = q\), so \(X = \int _{{\mathbb {R}}^d}\rho ^q\) satisfies \(\frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} X \le - C X^{\alpha }\) with \(\alpha = \frac{2}{2^*} < 1\). The equation \(\frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} X = - C X^{\alpha }\) where \(\alpha < 1\) has finite-time extinction in the sense \(X_T = 0\). Thus, we have the finite-time extinction

$$\begin{aligned} \int _{{\mathbb {R}}^d}\rho ^q \rightarrow 0, \qquad t \nearrow T. \end{aligned}$$

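Let \(\rho _0 \in L^1({{\mathbb {R}}^d})\). Take \(\varepsilon > 0\) and an approximating sequence \(\rho ^{(\varepsilon )}_0 \in L^1 \cap L^q\) with \(\Vert \rho _0 - \rho _0^{(\varepsilon )} \Vert _{L^1} \le \varepsilon \). Let \(T_\varepsilon ^*\) be the extinction time of the corresponding solution \(\rho ^{(\varepsilon )}\). By the \(L^1\) contraction property, for \(t \ge T_\varepsilon ^*\)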

$$\begin{aligned} \Vert \rho (t) \Vert _{L^1}&\le \Vert \rho (t) - \rho ^{(\varepsilon )}(t) \Vert _{L^1} + \Vert \rho ^{(\varepsilon )} (t) \Vert _{L^1} \\&\le \Vert \rho _0 - \rho ^{(\varepsilon )}_0 \Vert _{L^1} + 0 \\&\le \varepsilon . \end{aligned}$$

\(\square \)
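The extinction claim for the comparison ODE used in the proof can be made explicit. For \(\alpha \in (0,1)\), \(C > 0\) and \(X(0) = X_0 > 0\), separation of variables in \(\frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} X = -C X^\alpha \) gives

$$\begin{aligned} X(t) = \Big ( X_0^{1-\alpha } - (1-\alpha ) C t \Big )_+^{\frac{1}{1-\alpha }}, \qquad T = \frac{X_0^{1-\alpha }}{(1-\alpha ) C}, \end{aligned}$$

so \(X(t) = 0\) for all \(t \ge T\). By comparison, any non-negative solution of the differential inequality \(\frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} X \le -C X^\alpha \) vanishes no later than T.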

This effect is known as “loss of mass through infinity”, and is typical of very fast diffusion. In some settings, this can be offset by the aggregation term. To prevent this loss of mass it suffices that \(\rho _t\) is a tight family (a notion we discuss in the next section).

4.2.2 Mass escaping through infinity: tightness

In the previous section we have discussed the problem of mass escaping through infinity “as time passes”. This phenomenon is also related to a difficult technical problem when working on \({{\mathcal {P}}}({{\mathbb {R}}^d})\). We recall some classical results for working with probability measures.

First, let \(\rho _n\) be a sequence of functions in \({{\mathcal {P}}}({{\mathbb {R}}^d}) \cap L^1 ({{\mathbb {R}}^d})\). If they have an a.e. limit \({\hat{\rho }}\), Fatou's lemma only gives

$$\begin{aligned} \int _{{\mathbb {R}}^d}{\hat{\rho }} \le \liminf _n \int _{{\mathbb {R}}^d}\rho _n = 1. \end{aligned}$$

But the equality need not hold. Furthermore, let us take a sequence \(\mu _n \in {{\mathcal {P}}}({{\mathbb {R}}^d})\). Since they are uniformly bounded measures, and \({\mathcal {M}} ({{\mathbb {R}}^d})\) is the dual of \(C_0({{\mathbb {R}}^d})\), by the Banach–Alaoglu theorem the sequence \(\mu _n\) is weak-\(\star \) pre-compact. Let \({\widehat{\mu }}\) be an accumulation point. It need not lie in \({{\mathcal {P}}}({{\mathbb {R}}^d})\).

To ensure that the limit is a probability measure we need to add some information. Prokhorov’s theorem (see, e.g., [3, Theorem 5.1.3]) states that a family of measures is pre-compact in \({{\mathcal {P}}}({{\mathbb {R}}^d})\) if and only if it is tight.

Definition 4.7

A family of probability measures \((\mu _a)_{a \in A}\) is tight if, for every \(\varepsilon > 0\) there exists \(K_\varepsilon \subset {{\mathbb {R}}^d}\) compact such that

$$\begin{aligned} \mu _a ({{\mathbb {R}}^d}{\setminus } K_\varepsilon ) < \varepsilon , \quad \forall a \in A. \end{aligned}$$

This means, informally speaking, that the tails of \(\mu _a\) hold “uniformly small” mass.

Remark 4.8

To show that a family of probability measures \((\mu _a)_{a\in A}\) is tight it suffices to show that it is uniformly integrable against a suitable function. Following [3, Remark 5.1.5], it suffices to find \(\varphi :{{\mathbb {R}}^d}\rightarrow [0,\infty ]\) such that its sublevel sets (i.e., \(\{ x \in {{\mathbb {R}}^d}: \varphi (x) \le c \}\) for \(c \in {\mathbb {R}}\)) are compact, and we have

$$\begin{aligned} \sup _{a \in A} \int _{{\mathbb {R}}^d}\varphi (x) \mathop {}\!\textrm{d}\mu _a(x) < \infty . \end{aligned}$$
(4.6)

A very typical example is \(\varphi (x) = |x|^p\) with \(p \ge 1\). The quantity \(\int _{{\mathbb {R}}^d}|x|^p \mathop {}\!\textrm{d}\mu \) is called the pth order moment of \(\mu \).

It is clear that (4.6) can follow for minimisers of (FE) if V is good enough. We point out the following estimate that allows us to use \({\mathcal {W}}\) for the same purpose.
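For the moment function \(\varphi (x) = |x|^p\) the link between (4.6) and Definition 4.7 is just Markov's inequality. Writing M for the supremum in (4.6) (notation introduced only for this computation), for every \(R > 0\) and \(a \in A\)

$$\begin{aligned} \mu _a ( \{ |x| > R \} ) \le \frac{1}{R^p} \int _{{\mathbb {R}}^d}|x|^p \mathop {}\!\textrm{d}\mu _a(x) \le \frac{M}{R^p}, \qquad M {:}{=}\sup _{a \in A} \int _{{\mathbb {R}}^d}|x|^p \mathop {}\!\textrm{d}\mu _a(x). \end{aligned}$$

Choosing R so large that \(M / R^p < \varepsilon \), the closed ball \(K_\varepsilon = \overline{B_R}\) satisfies the requirement of the definition. Conversely, the sequence \(\mu _n = \delta _{x_n}\) with \(|x_n| \rightarrow \infty \) is not tight, and its pth order moments \(|x_n|^p\) are indeed unbounded.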

Lemma 4.9

[84, 145] Let \(\rho \) be radially symmetric, \(\Omega = {{\mathbb {R}}^d}\), w non-decreasing such that \(w(r_0) \ge 0\) for some \(r_0\), and let \(r_1 > r_0\). Then

$$\begin{aligned} \int _{|x| \ge r_1} w(|x|) \rho (x) dx \le \frac{ \iint _{{{\mathbb {R}}^d}\times {{\mathbb {R}}^d}} | w(|x-y|) | \rho (x) \rho (y) \mathop {}\!\textrm{d}x \mathop {}\!\textrm{d}y }{ \frac{1}{2} \int _{{{\mathbb {R}}^d}} \rho (x) \mathop {}\!\textrm{d}x } \end{aligned}$$

Fig. 1. Sets used in the proof of Lemma 4.9

Proof

Taking into account the geometry shown in Fig. 1 we can estimate

$$\begin{aligned}&\int _{|x| \ge r_1} \int _{|x-y| \ge r_0} w(|x-y|) \rho (x) \rho (y) \mathop {}\!\textrm{d}y \mathop {}\!\textrm{d}x \\&\quad \ge \int _{|x| \ge r_1} \int _{y \cdot x \le 0} w(|x-y|) \rho (x) \rho (y) \mathop {}\!\textrm{d}y \mathop {}\!\textrm{d}x \end{aligned}$$

Since \(x \cdot y \le 0\) then \(|x-y|^2 = |x|^2 + |y|^2 - 2 x \cdot y \ge |x|^2\) and since w is non-decreasing

$$\begin{aligned}&\ge \int _{|x| \ge r_1} \int _{y \cdot x \le 0} w(|x|) \rho (x) \rho (y) \mathop {}\!\textrm{d}y \mathop {}\!\textrm{d}x \\&\ge \int _{|x| \ge r_1} w(|x|) \rho (x) \int _{y \cdot x \le 0} \rho (y) \mathop {}\!\textrm{d}y \mathop {}\!\textrm{d}x. \end{aligned}$$

Since \(\rho \) is radially symmetric, for any \(x \in {{\mathbb {R}}^d}\) we have

$$\begin{aligned} \int _{y \cdot x \le 0} \rho (y) \mathop {}\!\textrm{d}y = \frac{1}{2} \int _{{{\mathbb {R}}^d}} \rho (y) \mathop {}\!\textrm{d}y. \end{aligned}$$

4.2.3 An \(L^p\) estimate

We can proceed similarly to the approach in (4.5) in more general settings. If we drop the term coming from the diffusion, then we get

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} \frac{1}{p} \int _\omega \rho ^p&= \int _{\omega } \rho ^{p-1} \frac{\partial \rho }{\partial t} \\&= \int _{\omega } \rho ^{p-1} {{\,\textrm{div}\,}}\, \rho \nabla \frac{\delta {{\mathcal {F}}}}{\delta \rho }\\&= -(p-1)\int _{\omega } \rho ^{p-1} \nabla \rho \nabla (U'(\rho ) + V + {{\mathcal {K}}}\rho ) + \int _{\partial \omega } \rho ^p \nabla \frac{\delta {{\mathcal {F}}}}{\delta \rho } \cdot n \\&\le - \frac{p-1}{p} \int _\omega \nabla {\rho _t^{p}} \nabla (V+ {{\mathcal {K}}}\rho ) + \int _{\partial \omega } \rho ^p \nabla \frac{\delta {{\mathcal {F}}}}{\delta \rho } \cdot n \end{aligned}$$

Integrating by parts one final time leads to

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} \frac{1}{p} \int _\omega \rho ^p\le & {} \frac{p-1}{p} \int _\omega \rho ^p \Delta (V+{{\mathcal {K}}}\rho ) \nonumber \\{} & {} - \frac{p-1}{p} \int _{\partial \omega } \rho ^p \nabla (V+{{\mathcal {K}}}\rho ) \cdot n(x) + \int _{\partial \omega } \rho ^p \nabla \frac{\delta {{\mathcal {F}}}}{\delta \rho } \cdot n. \end{aligned}$$
(4.7)

If we work in \({{\mathbb {R}}^d}\) with \(V, W \in W^{2,\infty }\), and we assume good decay of \(\rho ^p \nabla (V+{{\mathcal {K}}}\rho )\) and \(\rho ^p \nabla \frac{\delta {{\mathcal {F}}}}{\delta \rho }\), then the boundary terms vanish and we arrive at an estimate of the form

$$\begin{aligned} \Vert \rho _t \Vert _{L^p({{\mathbb {R}}^d})} \le \Vert \rho _0 \Vert _{L^p({{\mathbb {R}}^d})} e^{Ct}. \end{aligned}$$

When we work in \(\Omega \) with (\(\hbox {BC}^*\)) and we take \(\omega = \Omega \), the last term vanishes. However, we cannot relate the boundary term \(\rho ^p \nabla (V+{{\mathcal {K}}}\rho ) \cdot n(x) \) to the \(L^p\) norm of \(\rho \). In some cases it can be compensated by the negative diffusion term, but not in general. This is one of the reasons we assume (2.7) on the boundary.

4.2.4 Free-energy dissipation

Definition 4.10

(First variation) Let X be a normed space, \({{\mathcal {F}}}: X \rightarrow {\mathbb {R}}\) and fix \(\rho _0 \in X\). We say that \({{\mathcal {F}}}\) is Gateaux differentiable at \(\rho _0\) if for all \(\varphi \in X\) the following limit exists

$$\begin{aligned} \mathop {}\!\textrm{d}{{\mathcal {F}}}[\rho _0, \varphi ] {:}{=}\lim _{\varepsilon \rightarrow 0} \frac{{{\mathcal {F}}}[\rho _0 + \varepsilon \varphi ] - {{\mathcal {F}}}[\rho _0]}{\varepsilon }. \end{aligned}$$

The function \(\mathop {}\!\textrm{d}{{\mathcal {F}}}\) is usually called first variation of \({{\mathcal {F}}}\). Some authors extend this definition and require just that the limit exists for \(\varphi \) in a dense subset of X. Functionals that admit first variation are called Gateaux differentiable.

When \(X \subset L^1 (\Omega )\), the first variation can sometimes be represented by a function \(f_0 \in L^1(\Omega )\) in the sense that

$$\begin{aligned} \int _\Omega f_0(x) \varphi (x) \mathop {}\!\textrm{d}x = \mathop {}\!\textrm{d}{{\mathcal {F}}}[\rho _0, \varphi ], \qquad \forall \varphi \in L^\infty (\Omega ). \end{aligned}$$

If this is the case, we denote \(\frac{\delta {{\mathcal {F}}}}{\delta \rho }[\rho _0] {:}{=}f_0\). It is easy to check formally that, when \({\mathcal {F}}\) is given by (\(\hbox {FE}^*\)) and \(K(x,y) = K(y,x)\), then (2.9) holds. In many of the examples the first variation can be rigorously computed.

If \(K(x,y) = K(y,x)\) then the solutions of (\(\hbox {ADE}^*\)) admit (\(\hbox {FE}^*\)) as a Lyapunov functional. This functional is usually called the free energy. For smooth solutions of (\(\hbox {ADE}^*\)) (with sufficiently good decay as \(|x| \rightarrow \infty \) in unbounded domains) we can compute

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} {\mathcal {F}}[\rho ] = - \int _\Omega \rho \left| \nabla \frac{\delta {\mathcal {F}}}{\delta \rho } \right| ^2 + \int _{\partial \Omega } \frac{\delta {\mathcal {F}}}{\delta \rho } \, \rho \nabla \frac{\delta {\mathcal {F}}}{\delta \rho } \cdot n. \end{aligned}$$
(4.8)

When \(\Omega = {{\mathbb {R}}^d}\) (with sufficiently good decay of \(\rho \nabla \frac{\delta {\mathcal {F}}}{\delta \rho }[\rho _t]\)) or we have the no-flux condition (\(\hbox {BC}^*\)), we recover the decay of the free energy, i.e., along solutions of (\(\hbox {ADE}^*\)) we have the free-energy dissipation identity

$$\begin{aligned} {\mathcal {F}}[\rho _t] - {\mathcal {F}}[\rho _s] = -\int _s^t \int _\Omega \rho \left| \nabla \frac{\delta {\mathcal {F}}}{\delta \rho } [\rho ] \right| ^2. \end{aligned}$$
(4.9)

In particular, if \(s < t\) then \({{\mathcal {F}}}[\rho _s] \ge {{\mathcal {F}}}[\rho _t]\).

Remark 4.11

Often the free energy dissipation allows us to prove tightness. One approach is to recall Remark 4.8 and notice that the free energy includes the term \(\int _\Omega V \rho \). The term \(\iint W(x-y) \rho (x) \rho (y)\) can also be useful due to Lemma 4.9.

4.2.5 Negative Sobolev spaces

Given a set \(\Omega \), the negative Sobolev space \(W^{-1,1}(\Omega )\) can be defined as the space of distributions with finite norm

$$\begin{aligned} \Vert f \Vert _{W^{-1,1} (\Omega )} = \inf _{ f = {{\,\textrm{div}\,}}\, F } \Vert F \Vert _{L^1 (\Omega )} \end{aligned}$$

Using the free-energy dissipation, we can estimate

$$\begin{aligned} \Big \Vert \frac{\partial \rho _t}{\partial t} \Big \Vert _{W^{-1,1} (\Omega )}&= \inf _{ \frac{\partial \rho _t}{\partial t} = {{\,\textrm{div}\,}}\, F } \Vert F \Vert _{L^1 (\Omega )} \le \Vert \rho \nabla \tfrac{\delta {\mathcal {F}}}{\delta \rho } \Vert _{L^1} \\&\le \Vert \rho _t\Vert _{L^1}^{\frac{1}{2}} \left( \int _\Omega \rho _t \left| \nabla \tfrac{\delta {\mathcal {F}}}{\delta \rho } \right| ^2 \right) ^{\frac{1}{2}} = \Vert \rho _t\Vert _{L^1}^{\frac{1}{2}} \left( - \frac{d}{dt} {\mathcal {F}}[\rho _t] \right) ^{\frac{1}{2}} \end{aligned}$$

Also \(\Vert \rho _t \Vert _{L^1} \le \Vert \rho _0 \Vert _{L^1}\). In the smooth setting, we can integrate in time

$$\begin{aligned} \Vert \rho _t - \rho _s \Vert _{W^{-1,1}} \le C |t-s|^{\frac{1}{2}} ({\mathcal {F}}[\rho _s] - {\mathcal {F}}[\rho _t])^{\frac{1}{2}}. \end{aligned}$$

Since \(L^1\) is compactly embedded in \(W^{-1,1}\), when \({{\mathcal {F}}}\) is bounded below, the sequence

$$\begin{aligned} \rho ^{[n]}_t = \rho _{n+t} \end{aligned}$$

is relatively compact in \(C([0,1]; W^{-1,1}(B_R))\) due to the Ascoli–Arzelà theorem. Let \({\widehat{\rho }} \in C([0,1]; W^{-1,1}(B_R))\) be the limit of a sub-sequence. Since \({\mathcal {F}}[\rho _t]\) is bounded below and non-increasing, it has a limit \({\widehat{{{\mathcal {F}}}}}\) as \(t \rightarrow \infty \). Then

$$\begin{aligned} \Vert {\widehat{\rho }}_t - {\widehat{\rho }}_s \Vert _{W^{-1,1}} \le C|t-s|^{\frac{1}{2}} ({\widehat{{{\mathcal {F}}}}} - {\widehat{{{\mathcal {F}}}}})^{\frac{1}{2}} = 0. \end{aligned}$$

So \({\widehat{\rho }}\) does not depend on t. In the cases where one can prove stronger convergences, it can be shown that it is a stationary solution, i.e.,

$$\begin{aligned} {{\,\textrm{div}\,}}\left( {\widehat{\rho }} \nabla \frac{\delta {\mathcal {F}}}{\delta \rho }[{\widehat{\rho }}]\right) = 0. \end{aligned}$$
(4.10)

4.3 Optimal-transport approach: Wasserstein spaces and the JKO scheme

As we already mentioned in Sect. 4.1.4, there is a clear connection between some of our equations and Optimal Transport problems. Otto realised in [147] (see also [148]) that the (PME) in \({{\mathbb {R}}^d}\) corresponds, in fact, to the gradient flow of a free energy \({\mathcal {F}}\) with respect to the 2-Wasserstein distance. This idea led to the so-called “Otto calculus” that was later perfected.

The following pages are meant as an extremely brief introduction. For the reader interested in deepening their understanding of these spaces and their connection to PDEs we suggest reading the lecture notes [2] first, and then the very detailed books [3, 177]. A very nice presentation with an emphasis on the examples can be found in [153].

4.3.1 The Wasserstein distances

The Wasserstein metrics were a tool already used by people studying optimal transport between probability measures. We give a brief definition. Let X be a metric space with its Borel \(\sigma \)-algebra, and recall the definition of push-forward given in Sect. 4.1.4. If we are given \(\mu , \nu \in {\mathcal {P}}(X)\), a transport map is a measurable function T such that \(\nu = T_\sharp \mu \). Formally, we can think of the p-Wasserstein distance between probability distributions as being defined, for \(p \ge 1\), by

$$\begin{aligned} \left( \inf _{T:\nu = T_\sharp \mu } \int _{X} |x - T(x)|^p \mathop {}\!\textrm{d}\mu (x) \right) ^{\frac{1}{p}} \end{aligned}$$
(4.11)

This is the so-called Monge optimisation problem. The infimum is not always achieved, and sometimes no valid T exists. Kantorovich improved this idea by introducing transport plans. A transport plan is a probability distribution \(\pi \in {\mathcal {P}}(X \times X)\) that has \(\mu \) and \(\nu \) as marginals, i.e.,

$$\begin{aligned} \mu (A) = \pi (A \times X), \quad \nu (A)= \pi (X \times A), \quad \text { for all } A \subset X \text { measurable}. \end{aligned}$$

This set is denoted by \(\Pi (\mu ,\nu )\), and it is never empty because it contains the product measure \(\mu \otimes \nu \), the unique measure such that

$$\begin{aligned} (\mu \otimes \nu ) (A \times B) = \mu (A) \nu (B). \end{aligned}$$

The Wasserstein distances are defined as

$$\begin{aligned} {{\mathfrak {W}}}_p(\mu ,\nu ) {:}{=}\left( \min _{\pi \in \Pi (\mu , \nu )} \iint _{X \times X} |x-y|^p \mathop {}\!\textrm{d}\pi (x,y) \right) ^{\frac{1}{p}}. \end{aligned}$$
(4.12)

These distances are sometimes also called Kantorovich–Rubinstein distances. The distance between two measures in \({\mathcal {P}} (X)\) can be infinite unless their pth-order moments are finite. Hence, we define the p-Wasserstein space

$$\begin{aligned} {\mathcal {P}}_p (X) {:}{=}\left\{ \mu \in {\mathcal {P}}(X): \int _{X} |x|^p \mathop {}\!\textrm{d}\mu (x) < \infty \right\} . \end{aligned}$$

The pair \(({\mathcal {P}}_p (X), {{\mathfrak {W}}}_p)\) is a metric space. If X is complete, then this space is also complete, and convergence in \({{\mathfrak {W}}}_p\) is equivalent to narrow convergence together with convergence of the pth-order moments (see [3, Proposition 7.1.5]). This is why we pick \(X = {\overline{\Omega }}\).
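
To make the definition (4.12) concrete, the following minimal Python sketch computes \({{\mathfrak {W}}}_p\) between two empirical measures on the real line. It assumes (this is not part of the discussion above) that both measures are uniform over the same number of atoms, in which case the monotone pairing of the sorted atoms is an optimal plan for the cost \(|x-y|^p\) with \(p \ge 1\). The function name `wasserstein_1d` is hypothetical.

```python
def wasserstein_1d(xs, ys, p=2.0):
    """p-Wasserstein distance between two empirical measures on the real line.

    Assumption of this sketch: both measures are uniform over the same
    number n of atoms, mu = (1/n) sum_i delta_{xs[i]} and
    nu = (1/n) sum_i delta_{ys[i]}.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty samples of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    n = len(xs)
    # In one dimension, pairing the sorted atoms is an optimal plan
    # for the cost |x - y|^p when p >= 1.
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)


# Illustrative data: nu is mu translated by c = 2.
mu = [0.0, 1.0, 3.0]
nu = [x + 2.0 for x in mu]
print(wasserstein_1d(mu, nu, p=1.0), wasserstein_1d(mu, nu, p=2.0))
```

For a translation \(\nu = (\textrm{id} + c)_\sharp \mu \) every atom moves by c, so the sketch returns \(|c|\) for every p.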

As it usually happens in these families of spaces, there are three highlighted cases: \(p = 1,2, \infty \). The case \(p = 2\) will appear in the next section, and it is the one directly related to (ADE). The advantage of \(p = 1\) is the so-called Kantorovich–Rubinstein duality

$$\begin{aligned} {{\mathfrak {W}}}_1 (\mu , \nu ) = \sup \left\{ \int \psi \mathop {}\!\textrm{d}\mu - \int \psi \mathop {}\!\textrm{d}\nu : \text {for } \psi \text { such that } \sup _{x \ne y}\frac{|\psi (x) - \psi (y)|}{|x-y|} \le 1 \right\} . \end{aligned}$$
(4.13)

A very interesting property of the Wasserstein distance is that it admits a dynamic characterisation through the so-called Benamou–Brenier formula, which is most easily stated in \({{\mathbb {R}}^d}\)

$$\begin{aligned} {{\mathfrak {W}}}_2 (\mu _0,\mu _1)^2 = \inf _{ (\mu ,v) \in {{\mathcal {A}}(\mu _0,\mu _1)} } \int _0^1 \int _{{\mathbb {R}}^d}|v_t|^2 \mathop {}\!\textrm{d}\mu _t \mathop {}\!\textrm{d}t, \end{aligned}$$
(4.14)

where

$$\begin{aligned} {\mathcal {A}} (\mu _0, \mu _1)&= \Big \{ (\mu , v) : \mu \in C( [0,1], {\mathcal {P}}_2 ({{\mathbb {R}}^d}) ), v \in L^2([0,1] \times {\mathbb {R}}^d,\mu )^d, \\ \mu (0)&= \mu _0, \mu (1) = \mu _1 \text {, and } \frac{\partial \mu _t}{\partial t} + {{\,\textrm{div}\,}}(\mu _t v_t) = 0 \Big \} \end{aligned}$$

is the set of admissible paths between \(\mu _0\) and \(\mu _1\).

4.3.2 Otto’s calculus

The notion of calculus in these spaces is a little tricky, since \({\mathcal {P}}_p({{\mathbb {R}}^d})\) are not vector spaces. However, they are subsets of the linear space of measures. In fact, each of them can be thought of as a manifold: we can define tangent vectors through curves. In particular, let \(\rho _0 \in {\mathcal {P}}_2({{\mathbb {R}}^d})\) and take a test function \(\varphi \in C_c^\infty ({{\mathbb {R}}^d})\). Consider the curve of probabilities

$$\begin{aligned} \rho _t = ( \textrm{id} + t \nabla \varphi )_\sharp \rho _0. \end{aligned}$$

It is not too difficult to see [2, Lecture 16] that \(\rho _t\) is a distributional solution of the continuity equation

$$\begin{aligned} \frac{\partial \rho _t}{\partial t} + {{\,\textrm{div}\,}}( \rho _t v_t ) = 0 \end{aligned}$$

where \(v_t = \nabla \varphi \circ ( \textrm{id} + t \nabla \varphi )^{-1}\). Then formally the tangent space \(T_{\rho _0} {\mathcal {P}}_2({{\mathbb {R}}^d})\) is made of

$$\begin{aligned} s {:}{=}\frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t}\Big |_{t=0} \rho _t = - {{\,\textrm{div}\,}}( \rho _0 \nabla \varphi ). \end{aligned}$$

This allows us to formally construct the gradient of functionals. Take the free energy of the diffusion term \({\mathcal {U}}\). In distributional sense the Wasserstein gradient is

$$\begin{aligned} \langle \nabla _{{{\mathfrak {W}}}_2} {\mathcal {U}} [\rho _0] , s \rangle&{:}{=}\frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t}\Big |_{t=0} {\mathcal {U}}[\rho _t] = \int _{{\mathbb {R}}^d}U'(\rho _0 ) \frac{\partial }{\partial t}\Big |_{t=0} \rho _t \\&= - \int _{{\mathbb {R}}^d}U'(\rho _0 ) {{\,\textrm{div}\,}}(\rho _0 \nabla \varphi ) \\&=-\int _{{\mathbb {R}}^d}\varphi {{\,\textrm{div}\,}}( \rho _0 \nabla U'(\rho _0) ). \end{aligned}$$

In general

$$\begin{aligned} \langle \nabla _{{{\mathfrak {W}}}_2} {\mathcal {F}}[\rho _0], s \rangle = -\int _\Omega \varphi {{\,\textrm{div}\,}}\left( \rho _0 \nabla \frac{\delta {\mathcal {F}}}{\delta \rho }[ \rho _0 ]\right) . \end{aligned}$$
(4.15)

Hence, (ADE) in \({{\mathbb {R}}^d}\) can be formally written as the 2-Wasserstein gradient flow in the sense that

$$\begin{aligned} \frac{\partial \rho }{\partial t} = - \nabla _{{{\mathfrak {W}}}_2} {\mathcal {F}}[\rho _t] \end{aligned}$$

of the free energy given by (FE).

Remark 4.12

If we work in a bounded domain, then we must study the space \({\mathcal {P}}_2 ({\overline{\Omega }})\) and understand how the no-flux condition (BC) appears naturally in this context. One approach is to consider \(\varphi \in C_c^\infty ({\overline{\Omega }})\) (meaning that its support is bounded and does not reach the boundary). In order for \(\textrm{id} + t \nabla \varphi \) to map \({\overline{\Omega }}\) into \({\overline{\Omega }}\) when \(t \in (-\varepsilon ,\varepsilon )\), we must impose \(\rho \nabla \varphi \cdot n = 0\). This is related to how the no-flux condition appears. We will continue this discussion below in Remark 4.13.

4.3.3 Rigorous gradient flow structure. The JKO scheme

The idea of writing gradient flows in terms of minimising movements is due to De Giorgi (see [101]). To draw a quick analogy, suppose that we want to minimise a function \(F:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\). A continuous ODE going to local minima is the gradient flow in \({{\mathbb {R}}^d}\)

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}x}{\mathop {}\!\textrm{d}t} = - \nabla F(x) \end{aligned}$$

The implicit Euler scheme leads to the implicit gradient-descent method

$$\begin{aligned} \frac{x_{n+1} - x_n}{\tau } = - \nabla F(x_{n+1}). \end{aligned}$$

We have avoided making explicit the dependence of \(x_n\) on \(\tau \). We can rewrite this as the minimisation of the so-called proximal function

$$\begin{aligned} x_{n+1} = \mathop {{{\,\textrm{argmin}\,}}}\limits _{x \in {{\mathbb {R}}^d}} \left( \frac{|x-x_n|^2}{2} + \tau F(x) \right) \end{aligned}$$
(4.16)

If F is smooth this minimisation problem has a unique solution for \(\tau \) small. These are the so-called minimising movements. Then one can do a piece-wise constant interpolation \(x^{(\tau )} (t) {:}{=}x_n\) when \(t \in [\tau n, \tau (n+1))\), or even linear interpolation.
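
The scheme (4.16) can be sketched in a few lines of Python. The sketch below is only illustrative: it works in \(d = 1\), assumes that F is smooth and convex, and solves the optimality condition \(x - x_n + \tau F'(x) = 0\) of (4.16) by Newton's method (our choice, not something prescribed above). The names `proximal_step` and `minimising_movements`, as well as the test energy \(F(x) = x^2/2\), are assumptions made for the example.

```python
import math


def proximal_step(x_n, tau, grad_F, hess_F, newton_iters=50, tol=1e-12):
    """One minimising-movement step in dimension one.

    Solves x - x_n + tau * F'(x) = 0, the optimality condition of (4.16),
    by Newton's method. Assumes F is smooth and convex.
    """
    x = x_n
    for _ in range(newton_iters):
        residual = x - x_n + tau * grad_F(x)
        if abs(residual) < tol:
            break
        x -= residual / (1.0 + tau * hess_F(x))
    return x


def minimising_movements(x0, tau, n_steps, grad_F, hess_F):
    """Return the discrete trajectory [x_0, x_1, ..., x_{n_steps}]."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(proximal_step(xs[-1], tau, grad_F, hess_F))
    return xs


# Illustrative energy F(x) = x^2 / 2, whose gradient flow is x(t) = x0 * exp(-t).
path = minimising_movements(1.0, tau=0.01, n_steps=100,
                            grad_F=lambda x: x, hess_F=lambda x: 1.0)
print(path[-1], math.exp(-1.0))  # discrete value at t = 1 versus the exact flow
```

For this quadratic energy each step reduces to \(x_{n+1} = x_n / (1+\tau )\), so the discrete trajectory approaches the exact flow as \(\tau \rightarrow 0\).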

The notion of minimising movement can be generalised to metric spaces and, in particular, we can construct in 2-Wasserstein space

$$\begin{aligned} \mu _{n+1} = \mathop {{{\,\textrm{argmin}\,}}}\limits _{\mu \in {\mathcal {P}}_2({{\mathbb {R}}^d})} \left( \frac{{{\mathfrak {W}}}_2(\mu ,\mu _{n})^2}{2} + \tau {\mathcal {F}}(\mu ) \right) . \end{aligned}$$
(4.17)

Jordan, Kinderlehrer, and Otto proved in the seminal paper [130] that this procedure works for (HE-FP). The book [3] is devoted to proving how these minimising movements lead to (ADE) in \({{\mathbb {R}}^d}\) for fairly general \({\mathcal {F}}\). A good notion of solution for the limit of minimising movements is that of curves of maximal slope. Often, it is even possible to get back a suitable solution of a PDE.

Remark 4.13

In bounded domains, we continue the comment in Remark 4.12. In general, if one minimises over \({\mathcal {P}}_2({\overline{\Omega }})\), then we expect to arrive at (ADE) with the no-flux condition (\(\hbox {BC}^*\)). We recommend [153, Chapter 8].

However, the general setting is tricky, as shown by the following example.

Remark 4.14

Take \(\Omega = (-1,1)\) (i.e., \(d=1\)), \(U = W = 0\) and \(V(x) = a x\). Then we recover the transport equation \(\frac{\partial \rho }{\partial t} = a \frac{\partial \rho }{\partial x}\). The solutions of this equation are always of the form \(\rho (t,x) = \rho _0(x+at)\). The boundary condition \(a \rho = 0\) can be forced on one side, but never on the other. From the perspective below one can always consider a Dirac delta on the right-hand side. This somehow negates \(a\rho = 0\) on that side, but it is the only possible solution. In fact, it is recovered from the vanishing viscosity formulation.

Notice that this example does not satisfy (2.7). If we replaced V by a different smooth potential such that (2.7) holds, then the characteristics would not reach the boundary in finite time. Hence, we would expect a smooth solution.

Remark 4.15

Otto’s original paper [147] already covered p-Wasserstein distances with \(p \ne 2\). In particular, it showed that the doubly nonlinear equation

$$\begin{aligned} \frac{\partial u}{\partial t} = \Delta _p u^m \end{aligned}$$

has a p-Wasserstein gradient-flow structure. Recently, a new article has appeared justifying the JKO scheme for these problems, see [42].

4.3.4 Convexity and McCann’s condition

In the same way that (4.16) works better if \(F:{{\mathbb {R}}^d}\rightarrow {\mathbb {R}}\) is convex, there is a suitable notion of convexity that works for (4.17). The correct extension is convexity along geodesics, i.e., if \(\rho _t\) is a geodesic from \(\rho _0\) to \(\rho _1\) then \( t \mapsto {\mathcal {F}}[\rho _t] \) is a convex function \([0,1] \rightarrow {\mathbb {R}}\). It is usually called displacement convexity and was introduced by McCann in [145].

Before we present this result, it is interesting to discuss the structure of geodesics in Wasserstein space (see, e.g., [2]). As the simplest case, assume that \(\mu , \nu \) are absolutely continuous probability distributions and compactly supported. Then, Monge’s problem (4.11) for \(p = 2\) admits an optimal transport map \(T = \nabla \varphi \) such that \(\nu = T_\sharp \mu \) (see, e.g., [2, Lecture 5]). Furthermore, the unique geodesic between \(\mu \) and \(\nu \) is given by

$$\begin{aligned} \rho _t = ((1-t) \textrm{id} + t T)_\sharp \mu . \end{aligned}$$

In [145] the author proves that if V and W are strictly convex and \(W(x) = W(-x)\), then

$$\begin{aligned} \int _{{\mathbb {R}}^d}V \rho , \quad \iint _{{{\mathbb {R}}^d}\times {{\mathbb {R}}^d}} W(x-y) \rho (x) \rho (y) \mathop {}\!\textrm{d}x \mathop {}\!\textrm{d}y \end{aligned}$$

are displacement convex. For the diffusion term there is a more interesting condition, known as McCann’s condition. It states that if we define

$$\begin{aligned} P'(s) {:}{=}s U''(s), \quad P(0) {:}{=}0 \end{aligned}$$

then (2.11) is displacement convex if and only if

$$\begin{aligned} P'(s) s \ge (1-1/d)P(s) \ge 0, \quad \forall s \in (0,\infty ). \end{aligned}$$
(4.18)

This can also be written as the condition that \(\lambda \mapsto \lambda ^d U(\lambda ^{-d})\) is convex. When \(U(s) = \frac{s^m}{m-1}\) this means \(m \ge \frac{d-1}{d}\).

Remark 4.16

In the case \(m > \frac{d-1}{d}\) and \(V(x) = \frac{|x|^2}{2}\), the functional is 0-convex and bounded below. However, recalling Remark 3.2, for \(m < \frac{d}{2+d}\) the profile (3.2) is not in \({\mathcal {P}}_2\), so we cannot hope for asymptotic convergence in the 2-Wasserstein sense.

Besides “regular” convexity, a powerful tool in the arsenal is the notion of \(\lambda \)-convexity. A function \(f: (0,1)\rightarrow {\mathbb {R}}\) is \(\lambda \)-convex if \(f(t) - \lambda \frac{|t|^2}{2}\) is convex. Analogously to the above, we introduce the notion of displacement (or geodesic) \(\lambda \)-convexity. In this setting, for two gradient flows \(\mu _t\) and \({\widehat{\mu }}_t\) starting from \(\mu _0\) and \({\widehat{\mu }}_0\) we have

$$\begin{aligned} {{\mathfrak {W}}}_2(\mu _t, {\widehat{\mu }}_t) \le e^{-\lambda t} {{\mathfrak {W}}}_2(\mu _0, {\widehat{\mu }}_0). \end{aligned}$$

This implies uniqueness of the gradient flow and, if \({\mathcal {F}}\) is bounded below, of its minimisers.

A similar analysis of a free energy in \({\mathbb {S}}^1\) is done in [82]. The sense in which the equation is satisfied is delicate. For \(U = V = 0\) and W “pointy”, we send the reader to [56].

4.4 Global existence vs finite-time blow-up

As explained above, local-in-time existence is usually achieved through “standard” PDE methods or the use of gradient flow arguments, especially if starting from a nice initial datum \(\rho _0\). Whether these solutions behave “nicely” for large t is a more difficult problem.

Transport for given potential. Going back to the simplest example (CE) we can look at the examples \(V = \frac{|x|^\alpha }{\alpha }\) with \(\alpha > 0\). Then \(v = -\nabla V = - |x|^{\alpha -2} x\) and \(\rho _t = (X_t)_\sharp \rho _0\), where the corresponding characteristic flow \(X_t\) is obtained by solving

$$\begin{aligned} \frac{\partial X_t}{\partial t} = -|X_t|^{\alpha -2} X_t, \end{aligned}$$

with \(X_0(y) = y\). We conclude that

$$\begin{aligned} X_t (y) = \left\{ \begin{array}{ll} y e^{-t} &{}\quad \text {if }\alpha =2, \\ ( |y|^{2-\alpha } - (2-\alpha ) t )^{\frac{1}{2-\alpha }} \frac{y}{|y|} &{}\quad \text {if } \alpha \ne 2. \end{array} \right. \end{aligned}$$

For \(\alpha < 2\) the characteristics arrive at 0 in finite time and hence we get a Dirac delta at 0 at a certain finite time for any \(\rho _0 \not \equiv 0\). This is not problematic in the distributional sense in this case. However, this behaviour also happens for other problems.
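
The explicit formula for the characteristics is easy to evaluate numerically. The Python sketch below computes the radius \(|X_t(y)|\) and, for \(\alpha < 2\), the arrival time at the origin \(|y|^{2-\alpha }/(2-\alpha )\), which follows by setting the bracket in the formula above to zero. We assume \(y \ne 0\), and we take the positive part of the bracket so that a characteristic stays at the origin once it arrives there; the function names and this convention are choices made for the illustration.

```python
import math


def characteristic_radius(r0, t, alpha):
    """Radius |X_t(y)| for V = |x|^alpha / alpha, starting from |y| = r0 > 0."""
    if alpha == 2:
        return r0 * math.exp(-t)
    base = r0 ** (2 - alpha) - (2 - alpha) * t
    if alpha < 2:
        # Convention of this sketch: after reaching the origin the
        # characteristic stays there, hence the positive part.
        return max(base, 0.0) ** (1.0 / (2 - alpha))
    # For alpha > 2 the base is positive for all t >= 0.
    return base ** (1.0 / (2 - alpha))


def arrival_time(r0, alpha):
    """Time at which the characteristic starting at radius r0 reaches 0."""
    if alpha >= 2:
        return math.inf  # the origin is only approached as t -> infinity
    return r0 ** (2 - alpha) / (2 - alpha)


for alpha in (1.0, 1.5, 2.0, 3.0):
    print(alpha, arrival_time(1.0, alpha), characteristic_radius(1.0, 0.5, alpha))
```

Since the arrival time is increasing in \(|y|\), characteristics starting closer to the origin arrive there first.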

Keller–Segel problem. Some authors noticed by direct techniques that the Keller–Segel model has finite-time blow-up for initial data with large enough mass. It was shown in [129], for (KSE) in a bounded domain with no-flux, that there exists a critical mass \(M^*\) such that, if \(\Vert \rho _0 \Vert _{L^1} > M^* \), then \(\rho (T^*)\) contains a Dirac delta. Later this result was obtained also in \({{\mathbb {R}}^d}\) in [126] using radial arguments, where the authors characterise the critical mass as \(M^* = 8 \pi \). This was later proved in more generality in [112]. Conversely, for \(\Vert \rho _0 \Vert _{L^1} < M^*\) global existence for (KSE) is known. In \(d = 2\) it was shown in [28].

A nice proof of the formation of a Dirac comes through the analysis of the second-order moment. For (KSE) in \(d = 2\) it was first noticed by [112] that, working with solutions of initial mass \(\int _{{\mathbb {R}}^d}\rho _0 = M\),

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} \int _{{\mathbb {R}}^d}|x|^2 \rho (t,x) \mathop {}\!\textrm{d}x = 4 M \Big ( 1- \frac{M}{8\pi }\Big ). \end{aligned}$$

Hence, if \(M > {8 \pi }\) necessarily the second-order moment becomes negative in finite time. This is incompatible with our non-negative solutions. There is complete concentration to a Dirac delta. This was later extended by [27] for (KSE) when \(d > 2\) where the authors prove that

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} \int _{{\mathbb {R}}^d}|x|^2 \rho (t,x) \mathop {}\!\textrm{d}x = 2 (d-2) {\mathcal {F}}[\rho (t,\cdot )] \le 2 (d-2) {\mathcal {F}}[\rho _0]. \end{aligned}$$

When \(M > M^*\), there exist initial data \(\rho _0\) with \(\Vert \rho _0 \Vert _{L^1} = M\) such that \({\mathcal {F}}[\rho _0] < 0\). We will perform the analysis of the free energy for this problem in Remark 6.15. We will discuss below in Sect. 5.1.3 the critical case \(M = 8\pi \) and some interesting ad-hoc constructions which are available in the literature.
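
The second-moment argument for \(d = 2\) can be turned into a small computation. Since the identity above says that the second moment is affine in time, for \(M > 8\pi \) it would reach zero at time \(m_2(0) / \big ( 4M (\tfrac{M}{8\pi } - 1) \big )\), where \(m_2(0)\) (our notation) is the initial second moment; this time therefore bounds the existence time of a non-negative solution from above. The following Python sketch evaluates this bound. The function names are hypothetical.

```python
import math


def second_moment_rate(mass):
    """Right-hand side 4 M (1 - M / (8 pi)) of the second-moment identity, d = 2."""
    return 4.0 * mass * (1.0 - mass / (8.0 * math.pi))


def blow_up_time_bound(mass, second_moment_0):
    """Time at which the affine-in-time second moment would reach zero.

    Returns infinity when the mass does not exceed 8 pi, since the
    second moment is then non-decreasing and the argument gives no bound.
    """
    rate = second_moment_rate(mass)
    if rate >= 0:
        return math.inf
    return second_moment_0 / (-rate)


# Illustrative values: unit initial second moment, several masses.
for mass in (4.0 * math.pi, 8.0 * math.pi, 16.0 * math.pi):
    print(mass, blow_up_time_bound(mass, 1.0))
```

The bound decreases as the mass grows and as the initial datum becomes more concentrated (smaller \(m_2(0)\)).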

5 Asymptotics

The aim of this section is to highlight some techniques coming from the PDE community that allow us to understand the behaviour as \(t \rightarrow \infty \) (and sometimes \(|x| \rightarrow \infty \)) of “typical” solutions of the problem (namely those with “good” initial values).

5.1 Self-similarity

Some of the equations we are analysing have linear operators or homogeneous non-linearities, and it is therefore interesting to exploit this structure. One of the main approaches comes from considering the mass-preserving change of variable

$$\begin{aligned} u(\tau ,y) = \sigma (\tau )^{d} \rho \Big ({{\tilde{t}}}(\tau ), \sigma (\tau )y\Big ). \end{aligned}$$
(5.1)

5.1.1 Self-similar scaling for (PME)

Let \(\rho \) be a solution of (PME). Differentiating (5.1) with respect to \(\tau \) and using \(\frac{\partial \rho }{\partial t} = \Delta _x \rho ^m\), we obtain

$$\begin{aligned} \frac{\partial u}{\partial \tau }&= \frac{\frac{\mathop {}\!\textrm{d}\sigma }{\mathop {}\!\textrm{d}\tau }}{\sigma } u{{\,\textrm{div}\,}}_y y + \sigma ^{d}\frac{\mathop {}\!\textrm{d}\sigma }{\mathop {}\!\textrm{d}\tau } \, y \cdot \nabla _x \rho + \sigma ^{d}\frac{\mathop {}\!\textrm{d}{{\tilde{t}}}}{\mathop {}\!\textrm{d}\tau } \Delta _x \rho ^m \end{aligned}$$

since \({{\,\textrm{div}\,}}_x x = d = {{\,\textrm{div}\,}}_y y\). Due to the chain rule

$$\begin{aligned} \frac{\partial u}{\partial y_i}&= \sigma ^{d+1} \frac{\partial \rho }{\partial x_i} , \quad \frac{\partial ^2 u^m}{\partial y_i^2} = \sigma ^{2+md} \frac{\partial ^2 \rho ^m}{\partial x_i^2} \end{aligned}$$

Eventually, we simplify the equation for u to

$$\begin{aligned} \frac{\partial u}{\partial \tau }&= \frac{\frac{\mathop {}\!\textrm{d}\sigma }{\mathop {}\!\textrm{d}\tau }}{\sigma } {{\,\textrm{div}\,}}_y(uy) + \sigma ^{-(d(m-1) + 2)} \frac{\mathop {}\!\textrm{d}{{\tilde{t}}}}{\mathop {}\!\textrm{d}\tau } \Delta _y u^m \end{aligned}$$

The only sensible choice is \(\frac{\mathop {}\!\textrm{d}\sigma }{\mathop {}\!\textrm{d}\tau } = \sigma \) and \(\frac{\mathop {}\!\textrm{d}{{\tilde{t}}}}{\mathop {}\!\textrm{d}\tau } = \sigma ^{\frac{1}{\kappa }}\) where \(\kappa = (d(m-1)+2)^{-1},\) so we arrive at the simplified equation

$$\begin{aligned} \frac{\partial u}{\partial \tau }&= {{\,\textrm{div}\,}}_y(uy) + \Delta _y u^m. \end{aligned}$$

Forward self-similar solutions. Notice that adding the condition \(\sigma (0) = 1\) we get \(\sigma (\tau ) = e^\tau \). We also have that \(\frac{\mathop {}\!\textrm{d}{{\tilde{t}}}}{\mathop {}\!\textrm{d}\tau } = e^{\tau / \kappa }\). We have \(t = {{\tilde{t}}}(\tau )\) so we set \({{\tilde{t}}}(0) = 0\). Integrating, we eventually obtain

$$\begin{aligned} t&= \kappa \left( e^{\frac{\tau }{\kappa }} - 1 \right) , \\ \kappa&= (d(m-1)+2)^{-1}, \\ x&= e^\tau y. \end{aligned}$$

Notice that \(\kappa > 0\) if and only if \(m > m_c = \frac{d-2}{d}\). The Barenblatt solution (ZKB) corresponds to setting u as stationary. In fact, for \(m > m_c\) we can write the family of self-similar solutions

$$\begin{aligned} \rho _t(x)&:= \frac{1}{\sigma _B(t+T)^{d}} F_B\left( \frac{x}{\sigma _B(t+T)} \right) , \\ F_B(y)&:= \left( D - \frac{m-1}{2m}|y|^2\right) ^{\frac{1}{m-1}}_+, \\ \sigma _B(t)&:= ( d|m-m_c|t )^{\frac{1}{d(m-m_c)}}. \end{aligned}$$

Here we have chosen the presentation in [26, 83].
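
As a sanity check on the scaling, the following Python sketch evaluates the self-similar family above in \(d = 1\) with \(T = 0\) and verifies numerically that its mass does not depend on t. The parameter values \(m = 2\) and \(D = 1\), the trapezoidal quadrature, and the names `barenblatt_1d` and `mass` are assumptions made for the illustration; the sketch is restricted to \(m > 1\).

```python
def barenblatt_1d(x, t, m=2.0, D=1.0):
    """Self-similar profile rho_t(x) in dimension d = 1 with T = 0, for m > 1."""
    d = 1
    m_c = (d - 2) / d
    # sigma_B(t) = (d |m - m_c| t)^(1 / (d (m - m_c)))
    sigma = (d * abs(m - m_c) * t) ** (1.0 / (d * (m - m_c)))
    y = x / sigma
    # F_B(y) = (D - (m - 1) / (2 m) |y|^2)_+^(1 / (m - 1))
    profile = max(D - (m - 1.0) / (2.0 * m) * y * y, 0.0) ** (1.0 / (m - 1.0))
    return profile / sigma ** d


def mass(t, half_width=10.0, n=20001, **kwargs):
    """Trapezoidal approximation of the integral of rho_t over [-half_width, half_width]."""
    h = 2.0 * half_width / (n - 1)
    values = [barenblatt_1d(-half_width + i * h, t, **kwargs) for i in range(n)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


print(mass(1.0), mass(5.0))  # the two values should agree up to quadrature error
```

For \(m = 2\), \(D = 1\) the profile is \(F_B(y) = (1 - |y|^2/4)_+\), whose integral is \(8/3\), so both printed values should be close to this number as long as the support stays inside the integration window.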

Backwards self-similar solutions. When \(\kappa < 0\) (i.e., \(m < m_c = \frac{d-2}{d}\)), we can build another type of solution. In [170, Section 5.2], Vázquez introduces the pseudo-Barenblatt solutions for the sub-critical regime \(0< m < m_c\), which can be written

$$\begin{aligned} \rho _t(x)&{:}{=}\frac{1}{\sigma _B(T-t)^{d}} F_B\left( \frac{x}{\sigma _B(T-t)} \right) . \\ \end{aligned}$$

Here, T denotes the extinction time. They are suitable to deal with finite-time extinction [26].

5.1.2 Self-similar analysis in more general contexts

In the previous computation we can trade \(\Delta _x \rho ^m\) for any non-linear operator \({\mathcal {L}}\) with the scaling property

$$\begin{aligned} {\mathcal {L}} [u] = \sigma ^q {\mathcal {L}}[\rho ]. \end{aligned}$$

Self-similarity analysis for (CVE) was performed in [41, Theorem 3.1]. The result passes by an obstacle problem. For (NVE) the self-similar analysis is done in [157, Section 6.2]. For the fractional heat equation see [122].

Self-similar solutions are particular to problems where all terms have the same scaling, and it is not reasonable to expect solutions of this nature to exist in general. Consider the problem

$$\begin{aligned} \frac{\partial \rho }{\partial t} = {\mathcal {L}} \rho . \end{aligned}$$

When \({\mathcal {L}}\) is the sum of two terms with different scaling, it is not reasonable to expect self-similar structure. For (ADE) with \(U = U_m, V = 0, W=W_k\) this is only possible in the so-called fair-competition regime

$$\begin{aligned} d(m-1) + k = 0. \end{aligned}$$

The self-similar analysis in this setting was performed in [43].

5.1.3 The Keller–Segel problem: bubbles

For \(d = 2\) a self-similar solution is not available, although [126] give some hints on the shape by matching asymptotics. Later, there were more advanced studies on stability [173,174,175]. In this direction, see also [95, 151].

In [125] the authors prove that there are no self-similar solutions of (\(\hbox {KSE}_\chi \)) when \(d = 2\). However, for \(d = 3\) they show that there exists a sequence of radial self-similar solutions \(\rho ^{(n)}\)

$$\begin{aligned} \rho ^{(n)}_t (x) = \frac{1}{T-t} F_n \left( \frac{x}{\sqrt{T-t}}\right) ,\qquad \text {where } F_n(y) \sim \left( \frac{8\pi }{\chi } + a_n \right) \frac{1}{4 \pi |y|^2} \text { for } |y| \sim 0, \end{aligned}$$

and \(a_n \rightarrow 0\). This construction was later generalised for \(d \ge 3\) by several authors [123, 156, 160]. For blow-up in the range \(U = U_m\) with \(m \ge 1\), \(V = 0\) and \(W = W_k\) with \(k > 2-d\), see [179].

It was later conjectured in [102] and proved in [100] that for \(d = 2\) one can construct “approximate bubbles”. In particular, there exists an initial datum \(\rho _0^\star \) with mass \(8 \pi \) such that, for \(\rho _0\) close to \(\rho _0^\star \), the solution of the Keller–Segel problem (KSE) satisfies

$$\begin{aligned} \rho (t,x) \approx \frac{1}{\sigma (t)^2} F\left( \frac{x-\chi (t)}{\sigma (t)} \right) , \qquad F(y) = \frac{8}{(1+|y|^2)^2} \end{aligned}$$

as \(t \rightarrow \infty \) where \(\sigma (t) \approx c/ \sqrt{\log t}\), \(\chi (t) \rightarrow q\).
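
One can check numerically that the bubble profile F above carries exactly the critical mass \(8\pi \), independently of the scaling \(\sigma (t)\). The Python sketch below integrates F over a large disc in polar coordinates with a midpoint rule; the quadrature, the cut-off radius and the function names are choices made for the illustration.

```python
import math


def bubble_profile(r):
    """Radial profile F(y) = 8 / (1 + |y|^2)^2 evaluated at |y| = r."""
    return 8.0 / (1.0 + r * r) ** 2


def radial_mass(radius, n=200000):
    """Midpoint rule for the integral of F over the disc of given radius in R^2."""
    h = radius / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        # Area element in polar coordinates: 2 pi r dr.
        total += bubble_profile(r) * 2.0 * math.pi * r * h
    return total


print(radial_mass(1000.0), 8.0 * math.pi)
```

The exact mass inside the disc of radius R is \(8\pi R^2/(1+R^2)\), which tends to \(8\pi \) as \(R \rightarrow \infty \).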

5.2 Relative entropy

If we are able to construct a global solution \(B_t\) of which we know some properties, we would like to see whether this is the “generic behaviour”. Assume we are in a case where \(\Vert \rho _t \Vert _{L^1}\) is preserved. We would say that \(B_t\) is attractive if

$$\begin{aligned} \Vert \rho _t - B_t \Vert _{L^1} \rightarrow 0, \qquad \text {as } t \rightarrow \infty . \end{aligned}$$

For \(L^p\) norms in general (which typically change over time), we would like to see if the relative error tends to zero

$$\begin{aligned} \frac{\Vert \rho _t - B_t \Vert _{L^p}}{\Vert \rho _t \Vert _{L^p}} \rightarrow 0, \qquad \text {as } t \rightarrow \infty . \end{aligned}$$
(5.2)

These are usually called intermediate asymptotics, as opposed to the long-time asymptotic limit.

Naïve computations do not yield good results. Sometimes it is possible to do this computation directly via comparison arguments but, in general, this is not possible. A useful tool is the so-called relative entropy. We start with a simple example, (HE).

5.2.1 \(L^2\) relative entropy for (HE)

An interesting alternative way of writing (HE-FP) is

$$\begin{aligned} \frac{\partial u}{\partial t} = {{\,\textrm{div}\,}}\left( G \nabla \frac{u}{G}\right) , \end{aligned}$$

where G is the Gaussian profile (3.1). Taking \(w = \tfrac{u}{G}\), we rewrite this problem as

$$\begin{aligned} \frac{\partial w}{\partial t} = G^{-1} {{\,\textrm{div}\,}}\left( G \nabla w \right) = \Delta w - x \cdot \nabla w. \end{aligned}$$

This is the famous Ornstein–Uhlenbeck problem, which is also the dual of (HE-FP). We can write the free-energy dissipation formula

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} \int _{{\mathbb {R}}^d}|w-1|^2 {G} \mathop {}\!\textrm{d}x = - 2 \int _{{\mathbb {R}}^d}|\nabla w|^2 G\mathop {}\!\textrm{d}x. \end{aligned}$$

We can now take advantage of the Gaussian Poincaré inequality (see [93]),

$$\begin{aligned} \int _{{\mathbb {R}}^d}w^2 G\mathop {}\!\textrm{d}x - \left( \int _{{\mathbb {R}}^d}w G \mathop {}\!\textrm{d}x \right) ^2 \le \int _{{\mathbb {R}}^d}|\nabla w|^2 {G}. \end{aligned}$$

Hence, assuming that \( \int _{{\mathbb {R}}^d}w G = \int _{{\mathbb {R}}^d}u = 1, \) we recover that

$$\begin{aligned} \int _{{\mathbb {R}}^d}|w_t - 1|^2 G\mathop {}\!\textrm{d}x \le e^{-2t} \int _{{\mathbb {R}}^d}|w_0 - 1|^2 G \mathop {}\!\textrm{d}x. \end{aligned}$$

This approach can be generalised in many directions. We point the reader to [6] and the references therein.

5.2.2 \(L^1\) Relative entropy argument for (HE)

For convenience, we will denote the solution of (HE-FP) by \(v\). Notice that, with our choice of re-scaling, \(v_0 = \rho _0\). The relative entropy is defined as

$$\begin{aligned} {\mathcal {E}} (v) = \int _{{\mathbb {R}}^d}v\log \frac{v}{G} = \int _{{\mathbb {R}}^d}v\left( \log v+ \frac{|x|^2}{2} \right) + C. \end{aligned}$$

It is easy to check that

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} {\mathcal {E}}(v) = - {\mathcal {I}} (v), \end{aligned}$$

where \({\mathcal {I}}\) is the so-called Fisher information

$$\begin{aligned} {\mathcal {I}} (v) = \int _{{\mathbb {R}}^d}v\left| \frac{\nabla v}{v} + x \right| ^2 = \int _{{\mathbb {R}}^d}v\left| \nabla \log \frac{v}{G} \right| ^2. \end{aligned}$$

The Gaussian logarithmic Sobolev inequality (see [113, 124, 168]) reads

$$\begin{aligned} \frac{1}{2} \int _{{\mathbb {R}}^d}(|f|^2 \log |f|^2) G \mathop {}\!\textrm{d}x \le \int _{{\mathbb {R}}^d}|\nabla f|^2 G, \quad \text {if } \int _{{\mathbb {R}}^d}|f|^2 G = 1. \end{aligned}$$
(5.3)

This inequality ensures the relationship \({\mathcal {E}} \le \tfrac{1}{2} {\mathcal {I}}\), and hence we recover an ordinary differential inequality that yields \({\mathcal {E}}(v_t) \le {\mathcal {E}}(\rho _0) e^{-2t}\). Lastly, we will take advantage of the Csiszár–Kullback inequality, taking \(G = K_1\),

$$\begin{aligned} \Vert f - G \Vert _{L^1}^2 \le 2 {\mathcal {E}}(f). \end{aligned}$$

Eventually, we deduce that if \(\rho \) is a solution of (HE)

$$\begin{aligned} \Vert \rho _t - K_t \Vert _{L^1} \le \sqrt{2 {\mathcal {E}}(\rho _0)} t^{-\frac{1}{2}}. \end{aligned}$$

Remark 5.1

It is also worth pointing out that (5.3) is equivalent to the Euclidean log-Sobolev inequality, which in scale-invariant form (see [113, 178]) is given by

$$\begin{aligned} \frac{d}{2} \log \left( \frac{2}{\pi d e} \int _{{\mathbb {R}}^d}|\nabla f|^2 \right) \ge \int _{{\mathbb {R}}^d}|f|^2 \log |f|^2, \qquad \text {if } \int _{{\mathbb {R}}^d}|f|^2 = 1. \end{aligned}$$
(5.4)

Notice that it is usually applied to \(f = \sqrt{\rho }\), and hence the condition is simply that \(\rho \in L^1 ({{\mathbb {R}}^d}) \cap {{\mathcal {P}}}({{\mathbb {R}}^d})\).

5.2.3 \(L^1\) relative entropy for (PME)

In [83, 103] the authors extend the relative-entropy study to \(m > 1\). Now we denote by \(v\) the solution of (PME-FP). Suppose there exists \({\hat{v}}\) of suitable mass and define the relative entropy and the Fisher information

$$\begin{aligned} H(v) = \int _{{\mathbb {R}}^d}\left( \frac{2}{m-1} v^m + |x|^2 v\right) \qquad I(v) = \int _{{\mathbb {R}}^d}v\left| x + \frac{m}{m-1} \nabla v^{m-1} \right| ^2. \end{aligned}$$

The suitable relative entropy for \(m > 1\) is \(H(v|{{\hat{v}}}) = H (v) - H({{\hat{v}}})\). This was later extended to more general families than (PME-FP) in [72]. These relative-entropy arguments rely heavily on Gagliardo–Nirenberg–Sobolev inequalities. In fact, there is a deep connection between the smoothing properties of (PME) and these types of inequalities (see, e.g., [103]). In [26, 29] the authors develop the correct relative entropies and Fisher information functions to cover the cases \(m < 1\) with the correct re-scaling of the Barenblatt/pseudo-Barenblatt explained in Sect. 5.1.1.

Remark 5.2

These relative-entropy arguments for diffusive problems (which do not have a natural stationary state) rely on the self-similar solution. It is possible to prove intermediate asymptotics without homogeneity or self-similarity techniques. For diffusive problems of the form \(\frac{\partial \rho }{\partial t} = \Delta \Phi (\rho )\), a very nice analysis using Wasserstein spaces can be found in [59].

5.2.4 Relative entropy argument for (ADE)

The entropy arguments for (PME) are stable enough to allow some families of perturbations within the range of (ADE). For \(W = 0\) and confining potentials see [72]. The reader may find information on strictly displacement-convex free-energy functionals in [76, 77]. For (\(\hbox {KSE}_{m,\chi }\)) with \(m \in [1,2-\frac{2}{d}]\) (and, in fact, a larger family of W radially decreasing), intermediate asymptotics in the sense of (5.2) are obtained in [14]. In [61] the authors take advantage of relative entropy arguments to prove that, in the case \(U(\rho ) = \rho \log \rho , V = 0\) and W smooth and bounded, (5.2) holds with \(p = 1\) and \(B_t = K_t\) (i.e., the heat kernel). See [46] for previous results in this direction.

5.3 Study of the mass variable of radial solutions

One of the significant obstacles to proving the above results rigorously is the lack of regularity of \(\rho \). Some regularity can be regained by passing to the so-called mass variable.

Let \(d = 1\). Assume \(x \in {\mathbb {R}}\) and

$$\begin{aligned} M(t,x) = \int _{-\infty }^x \rho _t(y) \mathop {}\!\textrm{d}y. \end{aligned}$$
(5.5)

Integrating (ADE), we see that M satisfies the PDE

$$\begin{aligned} \begin{aligned} \frac{\partial M}{\partial t}&= \Phi ' \left( \frac{\partial M}{\partial x} \right) \frac{\partial ^2 M}{\partial x^2 } + \frac{\partial M}{\partial x} \frac{\partial V}{\partial x} + \frac{\partial M}{\partial x} \frac{\partial }{\partial x} \left( W*\frac{\partial M}{\partial x}\right) \end{aligned} \end{aligned}$$

If \(\Phi (s) = |s|^{m-1}s\) (PME for \(\rho \)), this is the p-Laplacian for M with \(p = m+1\). If \(x \in {\mathbb {R}}^d\) with \(d > 1\) it is better to define the mass in volumetric coordinates

$$\begin{aligned} M(t,v) = \int _{A_v} \rho (t, x) \mathop {}\!\textrm{d}x, \quad \text {where } A_v = B(0,r) \text { such that } |A_v| = v. \end{aligned}$$

Then, if \(\rho \) is radially symmetric, we recover, for a suitable weight \(\kappa \),

$$\begin{aligned} \frac{\partial M}{\partial t}&= \kappa (v)^2 \Phi ' \left( \frac{\partial M}{\partial v} \right) \frac{\partial ^2 M}{\partial v^2 } + \kappa (v)^2\frac{\partial M}{\partial v} \frac{\partial V}{\partial v} + \kappa (v)^2 \frac{\partial M}{\partial v} \frac{\partial }{\partial v} \left( W*_{{{\mathbb {R}}^d}}\frac{\partial M}{\partial v}\right) \end{aligned}$$

This is a Hamilton–Jacobi type equation, and is well suited for the theory of viscosity solutions. In [60] the authors discuss the problem when \(W = 0\). They show that one can take advantage of stability properties of viscosity solutions to characterise the steady state. Since \(\kappa (0) = 0\) and \(\kappa (v) > 0\) for \(v>0\), this equation can be used to show that for smooth V and W and radially symmetric \(\rho \), singularities can only form at \(x = 0\). They also show that a Dirac may form, but only in the limit \(t \rightarrow \infty \).

Many authors have taken advantage of this kind of idea in their arguments. For example, we point to [123, 125, 134, 156].

There is also an interesting connection between the mass variable and the Wasserstein distance for \(\rho \). The simplest case states that if \(\rho _1, \rho _2 \in {\mathcal {P}}({\mathbb {R}})\), and \(M_i\) are their primitives, then

$$\begin{aligned} {{\mathfrak {W}}}_1(\rho _1, \rho _2) = \Vert M_1 - M_2 \Vert _{L^1 ({\mathbb {R}})}. \end{aligned}$$

A similar formula holds if \(\rho _i \in {\mathcal {P}}({{\mathbb {R}}^d})\) are radially symmetric.

The inverse of the mass function. Furthermore, if we consider the generalised inverse

$$\begin{aligned} M^{-1}(s) = \inf \{ v \in [-\infty ,\infty ): M(v) \ge s\} \end{aligned}$$

and \(\rho _1, \rho _2 \in {\mathcal {P}}({\mathbb {R}})\), and \(M_i\) are their primitives, then

$$\begin{aligned} {{\mathfrak {W}}}_p(\rho _1, \rho _2) = \Vert M_1^{-1} - M_2^{-1} \Vert _{L^p (0,1)}. \end{aligned}$$
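As a small numerical sanity check of the two formulas above, the following Python sketch evaluates them for empirical measures on \({\mathbb {R}}\). It assumes, purely to keep the code short, that both measures consist of equally many atoms of equal weight; the function names are illustrative and are not taken from the references.

```python
from bisect import bisect_right


def w1_from_cdfs(xs, ys):
    """W_1 between two empirical measures on R as the L^1 norm of M_1 - M_2."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both primitives are constant on [left, right).
        m1 = bisect_right(xs, left) / len(xs)
        m2 = bisect_right(ys, left) / len(ys)
        total += abs(m1 - m2) * (right - left)
    return total


def wp_from_quantiles(xs, ys, p=1.0):
    """W_p via the generalised inverses (assumes equally many atoms)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many atoms")
    n = len(xs)
    # With n atoms of weight 1/n, M^{-1} equals the i-th order statistic
    # on ((i-1)/n, i/n], so the L^p(0,1) norm reduces to a finite sum.
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)


xs = [0.0, 1.0, 3.0]
ys = [0.5, 2.0, 2.5]
print(w1_from_cdfs(xs, ys), wp_from_quantiles(xs, ys, p=1.0))
```

For \(p = 1\) the two printed values should agree up to rounding, which reflects the fact that the area between the graphs of \(M_1\) and \(M_2\) is the same whether it is integrated in x or in s.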

Interestingly, if \(\rho (t,x)\) solves a conservation problem, we can also deduce an equation for \(u(t,s) = M^{-1} (t,s)\). This equation is usually degenerate, but the coefficients do not depend on s (see, e.g., [68]).

5.4 On the attractiveness of attractors

Relative entropy arguments allow us to show that the steady state (or energy minimiser) in \(L^1\) is an attractor for a large range of equations: from (PME-FP) to all the examples discussed in Sect. 5.2.4.

A more difficult question is to understand the case where the energy minimiser may contain a singular part, e.g.,

$$\begin{aligned} {\widehat{\mu }} = {\widehat{\rho }} + (1-\Vert {\widehat{\rho }}\Vert _{L^1}) \delta _0, \end{aligned}$$
(5.6)

and \(\Vert {\widehat{\rho }}\Vert _{L^1} < 1\). We will see an example in the aggregation-dominated range (see Corollary 6.13 below). It is difficult to say whether a measure of this kind can be called a steady state, since it cannot be plugged into even the distributional formulation. If the energy is \(\lambda \)-displacement convex, there is no doubt that the minimiser is a global attractor.

In [60] the authors prove that for (\(\hbox {ADE}^*\)) with \(m \in (0,1)\), V smooth, and \(K = 0\), the solution converges to the free energy minimiser, even when this contains a singular part. In [57] the authors use the mass variable to prove the analogous convergence for (\(\hbox {ADE}^*\)) with \(m \in (0,1)\) and a large class of K.

The recent preprint [23], covering (ADE) when \(k = 2 - 2s \in (2-d,0)\) and \(m = \frac{2d}{d+2s}\) (part of the fair-competition regime described in Sect. 6.3.3), shows that there is an explicit steady state (given in (6.3) below), and that solutions with \(\Vert \rho _0\Vert _{L^m} > \Vert {\widehat{\rho }}\Vert _{L^m}\) have \(\Vert \rho _t\Vert _{L^m}\) blow up in finite time (and hence we cannot even study them in the distributional sense).

6 Minimisation of the free energy

As mentioned in Sect. 4.3.4, displacement \(\lambda \)-convexity with \(\lambda > 0\) guarantees exponential contraction with rate

$$\begin{aligned} {{\mathfrak {W}}}_2(\mu _t, {\widehat{\mu }}_t) \le e^{-\lambda t} {{\mathfrak {W}}}_2(\mu _0, {\widehat{\mu }}_0) \end{aligned}$$

In particular, if \({\widehat{\mu }}\) is a global minimiser of \({\mathcal {F}}\) (which is unique by convexity), then the JKO scheme is stationary and so is the gradient flow. This means that

$$\begin{aligned} {{\mathfrak {W}}}_2(\mu _t, {\widehat{\mu }}) \le e^{-\lambda t} {{\mathfrak {W}}}_2(\mu _0, {\widehat{\mu }}) \end{aligned}$$

Therefore, \({\widehat{\mu }}\) is a global attractor with exponential rate. First, we will try to characterise the local minimisers using variational arguments. This will lead us to Euler–Lagrange conditions, which generally correspond to 2-Wasserstein local minimisers. On each connected component of the support of a steady state, the Euler–Lagrange condition also comes from the free-energy dissipation (4.9), and is simply \(\frac{\delta F}{\delta \rho } = C\).

However, (ADE) can have steady states that do not satisfy the Euler–Lagrange conditions, especially when the support has several components. A well-known example is the case \(U = U_m\) with \(m > 1\), \( W = 0\), and a potential V with two wells, e.g., \(V(x) = |x-x_0|^2 + |x-x_1|^2\). Then we can build formal steady states

$$\begin{aligned} {\hat{\rho }} (x) = \Big ((U_m')^{-1}(h_1 - V(x))\Big )_+ + \Big ((U_m')^{-1}(h_2 - V(x))\Big )_+ \text { even with } h_1 \ne h_2,\nonumber \\ \end{aligned}$$
(6.1)

provided the supports of these two terms are disjoint. These are not 2-Wasserstein local minimisers, but they are, actually, \(\infty \)-Wasserstein local minimisers. Furthermore, these steady states attract some initial data. The study of \(\infty \)-Wasserstein local minimisation became quite popular, and we cite several references below.

6.1 Local minimisers: Euler–Lagrange conditions

The aim of this section is to show that if \({\hat{\rho }} \in L^1 ({{\mathbb {R}}^d}) \cap {\mathcal {P}} ({{\mathbb {R}}^d})\) is a local minimiser of \({{\mathcal {F}}}\) over \(L^1 ({{\mathbb {R}}^d}) \cap {\mathcal {P}} ({{\mathbb {R}}^d})\) then we expect it to satisfy the following: there exists \(h \in {\mathbb {R}}\) such that

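$$\begin{aligned} \frac{\delta {{\mathcal {F}}}}{\delta \rho }[{\hat{\rho }}] (x) = h \qquad \text {for a.e. } x \text { such that } {\hat{\rho }}(x) > 0, \end{aligned}$$
(\(\hbox {EL}_1\))

$$\begin{aligned} \frac{\delta {{\mathcal {F}}}}{\delta \rho }[{\hat{\rho }}] (x) \ge h \qquad \text {for a.e. } x \in {{\mathbb {R}}^d}. \end{aligned}$$
(\(\hbox {EL}_2\))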

We recall that our main interest is the free energy \({\mathcal {F}}\) given by (\(\hbox {FE}^*\)) for which the first variation \(\frac{\delta {{\mathcal {F}}}}{\delta \rho }\) is given by (2.9).

6.1.1 First variation over the space of absolutely continuous probability measures

We follow the argument of [10, 27, 43, 45, 60, 66, 67, 84] where the reader may find rigorous proofs in different ranges. Our aim is to take variations of the form

$$\begin{aligned} {\widehat{\rho }}_\varepsilon = {\widehat{\rho }} + \varepsilon \varphi \end{aligned}$$

where, if we pick test functions \(\varphi \) suitably, we still have \({\widehat{\rho }}_\varepsilon \in L^1 \cap {{\mathcal {P}}}\) so

$$\begin{aligned} {{\mathcal {F}}}({\widehat{\rho }}) \le {{\mathcal {F}}}({\widehat{\rho }}_\varepsilon ). \end{aligned}$$

Variations on the support of \({\hat{\rho }}\). Consider a test function \(\psi \in C_c^\infty ({{\mathbb {R}}^d})\) and introduce

$$\begin{aligned} \varphi (x) = \left( \psi (x) - \int _{{\mathbb {R}}^d}\psi (y) {\hat{\rho }}(y) \mathop {}\!\textrm{d}y \right) {\hat{\rho }}(x) \end{aligned}$$

Then for \(\varepsilon < \frac{1}{2} \Vert \psi \Vert _{L^\infty }^{-1}\) we have \({\hat{\rho }}_\varepsilon \in L^1 \cap {\mathcal {P}}_2\) and, using that \({\widehat{\rho }}\) is a minimiser and the definition of first variation, we get

$$\begin{aligned} \int _{{\mathbb {R}}^d}\frac{\delta {{\mathcal {F}}}}{\delta \rho }[{\hat{\rho }}] \varphi = \lim _{\varepsilon \rightarrow 0} \frac{1}{\varepsilon } \left( {{\mathcal {F}}}({\widehat{\rho }}_\varepsilon ) - {{\mathcal {F}}}({\widehat{\rho }}) \right) \ge 0. \end{aligned}$$

Expanding this integral and using \(\Vert {\hat{\rho }} \Vert _{L^1} = 1\) we get

$$\begin{aligned} 0&\le \int _{{\mathbb {R}}^d}\frac{\delta {{\mathcal {F}}}}{\delta \rho }[{\hat{\rho }}] (x) \left( \psi (x) - \int _{{\mathbb {R}}^d}\psi (y) {\hat{\rho }}(y) \mathop {}\!\textrm{d}y \right) {\hat{\rho }}(x) \mathop {}\!\textrm{d}x\\&= \int _{{\mathbb {R}}^d}\frac{\delta {{\mathcal {F}}}}{\delta \rho }[{\hat{\rho }}] (x) {\hat{\rho }}(x) \psi (x) \mathop {}\!\textrm{d}x - \left( \int _{{{\mathbb {R}}^d}} \frac{\delta {{\mathcal {F}}}}{\delta \rho }[{\hat{\rho }}] (x) {\hat{\rho }}(x) \mathop {}\!\textrm{d}x \right) \left( \int _{{{\mathbb {R}}^d}} \psi (y) {\hat{\rho }}(y) \mathop {}\!\textrm{d}y \right) \end{aligned}$$

Lastly, trading x and y on the last two integrals we recover that for all \(\psi \in C_c^\infty ({{\mathbb {R}}^d})\) we have

$$\begin{aligned} \int _{{\mathbb {R}}^d}\left( \frac{\delta {{\mathcal {F}}}}{\delta \rho }[{\hat{\rho }}] (x) - h({\hat{\rho }}) \right) \psi (x) {\hat{\rho }}(x) \mathop {}\!\textrm{d}x \ge 0, \qquad \text {where } h({\hat{\rho }}) = \int _{{{\mathbb {R}}^d}} \frac{\delta {{\mathcal {F}}}}{\delta \rho }[{\hat{\rho }}] (y) {\hat{\rho }}(y) \mathop {}\!\textrm{d}y. \end{aligned}$$

Since this also holds for \(-\psi \), equality holds above, and hence

$$\begin{aligned} \left( \frac{\delta {{\mathcal {F}}}}{\delta \rho }[{\hat{\rho }}] (x) - h({\hat{\rho }}) \right) {\widehat{\rho }}(x) = 0, \qquad \text {for a.e. } x \in {{\mathbb {R}}^d}\end{aligned}$$

Hence we recover (\(\hbox {EL}_1\)). This gives us complete information on the boundary of the support. But it could jump uncontrollably from 0 to the other profile.

Variations outside the support of \({\hat{\rho }}\). Now we take the variations

$$\begin{aligned} \varphi (x) = \psi (x) - {\hat{\rho }} (x) \int _{{{\mathbb {R}}^d}} \psi (y) \mathop {}\!\textrm{d}y. \end{aligned}$$

If \(\psi \ge 0\) and \(\varepsilon < \Vert \psi \Vert _{L^1}^{-1}\) then \({\hat{\rho }}_\varepsilon \ge 0\). Notice that we cannot hope for \(\psi < 0\) outside the support of \({\hat{\rho }}\). With a similar argument as before we get

$$\begin{aligned} \int _{{\mathbb {R}}^d}\psi \left( \frac{\delta {{\mathcal {F}}}}{\delta \rho }[{\hat{\rho }}] (x) - h({\hat{\rho }}) \right) \ge 0. \end{aligned}$$

Since this holds true for all \(\psi \) satisfying the conditions above, we have proven (\(\hbox {EL}_2\)).

Remark 6.1

For (ADE) we have that

$$\begin{aligned} h({\hat{\rho }})&= \int _{{\mathbb {R}}^d}(U'({\hat{\rho }}){\hat{\rho }} + V {\hat{\rho }} + (W*{\hat{\rho }}) {\hat{\rho }} ) \end{aligned}$$

With PME diffusion

$$\begin{aligned} h({\hat{\rho }}) = {\mathcal {F}}({\hat{\rho }}) + (m-1) \int _{{\mathbb {R}}^d}{\hat{\rho }}^m + \frac{1}{2} \int _{{\mathbb {R}}^d}(W*{\hat{\rho }}) {\hat{\rho }} \end{aligned}$$

6.1.2 Solving the Euler–Lagrange equation when \(U \ne 0\)

When \(U \not \equiv 0\) then \(U'\) is non-decreasing, and we get from (\(\hbox {EL}_2\))

$$\begin{aligned} {\hat{\rho }} \ge (U')^{-1} ( h - V - W* {\hat{\rho }} ) \end{aligned}$$
(6.2)

We also know that \({\hat{\rho }} \ge 0\), and that (6.2) holds with equality when \({\hat{\rho }} > 0\). If the right-hand side of (6.2) is positive, so is \({\widehat{\rho }}\) and equality holds. When the right-hand side of (6.2) is non-positive, then \({\widehat{\rho }} = 0\). Therefore, we have that

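$$\begin{aligned} {\hat{\rho }} = \Big ( (U')^{-1} ( h - V - W* {\hat{\rho }} ) \Big )_+ . \end{aligned}$$
(\(\hbox {EL}_3\))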

In particular, observe that when \(U = U_m\), \(V = \frac{|x|^2}{2}\) and \(W = 0\) (i.e., (PME-FP)) this is the famous Barenblatt solution (ZKB). More generally, the same approach holds true for the case \(W = 0\). In this case, (\(\hbox {EL}_3\)) is a one-parameter family \({\hat{\rho }}_h\). Furthermore, since \(U'\) is non-decreasing we have that \({\hat{\rho }}_h\) is point-wise non-decreasing with h. Hence, in most cases h can be recovered from the fixed value \(\int _{{\mathbb {R}}^d}{\widehat{\rho }}_h\).
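The recovery of h from the mass constraint can be illustrated numerically. The sketch below makes several assumptions that are not in the text: it works in \(d = 1\), takes \(U_m(s) = \frac{s^m}{m-1}\) with \(m = 2\), \(V = \frac{|x|^2}{2}\) and \(W = 0\), uses a midpoint rule on a fixed interval, and finds h by bisection, relying only on the monotonicity of \({\hat{\rho }}_h\) in h. All function names are hypothetical.

```python
def stationary_profile(x, h, m=2.0):
    """rho_h(x) = ((U_m')^{-1}(h - V(x)))_+ with U_m(s) = s^m/(m-1), V = x^2/2, W = 0."""
    arg = h - 0.5 * x * x
    if arg <= 0.0:
        return 0.0
    # U_m'(s) = m/(m-1) * s^(m-1), inverted for s > 0.
    return ((m - 1.0) / m * arg) ** (1.0 / (m - 1.0))


def mass(h, m=2.0, half_width=4.0, n=4000):
    """Midpoint-rule approximation of the integral of rho_h over [-half_width, half_width]."""
    dx = 2.0 * half_width / n
    return sum(stationary_profile(-half_width + (i + 0.5) * dx, h, m) for i in range(n)) * dx


def solve_h(target_mass=1.0, m=2.0, lo=0.0, hi=4.0, iterations=50):
    """Bisection on h, using that the mass of rho_h is non-decreasing in h."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mass(mid, m) < target_mass:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


h = solve_h()
print(h)
```

In this particular case the mass can be computed by hand, \(\int {\hat{\rho }}_h = \frac{1}{3}(2h)^{3/2}\), so unit mass corresponds to \(h = \frac{1}{2} 3^{2/3}\), which the bisection should reproduce up to quadrature error.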

Remark 6.2

For the fast diffusion case, \(U'(0) = -\infty \). Hence, if \(|h|<\infty \) and \(W*\rho \in L^\infty _{loc}\) then \({\widehat{\rho }} > 0\).

Notice that when \({\hat{\rho }} > 0\) then the solutions of (\(\hbox {EL}_3\)) are precisely stationary solutions in the sense of (ADE-S). When the support of \({\hat{\rho }}\) is not the whole space, minimisers are still formally stationary; however, the regularity at the free boundary (the boundary of the support) is more challenging.

The cases \(U = 0\) are also interesting. Take, for example, (CVE) and re-scale to a Fokker–Planck problem. Then \( \frac{|x|^2}{2} + (-\Delta )^{-s} {\widehat{\rho }} \ge h. \) Let \(v {:}{=}(-\Delta )^{-s} {\hat{\rho }}\). Hence, \((-\Delta )^s v = {\hat{\rho }} \ge 0\). We also get, due to (\(\hbox {EL}_1\)), that when \({\hat{\rho }} > 0\) then equality holds in the last equation. Hence, we can write

$$\begin{aligned} \min \left\{ (-\Delta )^s v(x), v(x) + \frac{|x|^2}{2} - h \right\} = 0, \qquad \text {for all } x \in {{\mathbb {R}}^d}. \end{aligned}$$

This is the well-known fractional obstacle problem. In this particular application it is discussed in [41]. Regularity of solutions was proved in [158].

Lastly, there is the case of the Newtonian/Riesz potential \(W = - cW_k\) for \(k = 2-2\,s \in [2-d,0)\). Then \(u {:}{=}W * \rho = (-\Delta )^{-s} \rho \). Hence, we can re-write (\(\hbox {EL}_3\)) as

$$\begin{aligned} (-\Delta )^s u = ((U')^{-1} (u + h - V))_+ \end{aligned}$$

The case of \(U_m\) with \(m > 1\), \(V = 0\), and \(s \in (0,1)\) was studied in [89].

The particular case \(k \in (2-d,0)\) and \(m = \frac{2d}{2d+k}\) has the correct scaling (sometimes called conformal). The explicit self-similar solution can be found using the result in [92] and the references therein (see also [23]) to be

$$\begin{aligned} {\hat{\rho }} (x) = B \left( \frac{\lambda }{\lambda ^2 + |x-x_0|^2} \right) ^{\frac{d+2s}{2}}. \end{aligned}$$
(6.3)

Remark 6.3

This also shows that in \({{\mathbb {R}}^d}\), when \(U'\) is strictly increasing, \(U'(0) = -\infty \), and V, W are bounded, there can exist no stationary states. We compute the lower bound

$$\begin{aligned} {\hat{\rho }}_h (x) \ge (U')^{-1} ( h - \Vert V \Vert _\infty - \Vert W \Vert _\infty \Vert \rho \Vert _\infty ) > 0. \end{aligned}$$

This means that \({\hat{\rho }}_h \notin L^1({{\mathbb {R}}^d})\) for any \(h \in {\mathbb {R}}\).

Remark 6.4

(Bifurcation analysis) Several authors have paid attention to the structure of local minimisers in terms of bifurcations. These analyses can be done for (KE) in terms of Fourier coefficients. In [64] the authors study bifurcations of (McKVE) (with periodic boundary conditions, or equivalently in \({\mathbb {T}}^d\), with \(W = \kappa {{\bar{W}}}\), and \(V = 0\)) by applying Crandall–Rabinowitz theory. They see local minimisers as zeros of the map

$$\begin{aligned} (\rho , \kappa ) \mapsto \rho - \frac{e^{-\beta \kappa {{\bar{W}}}*\rho } }{\int e^{-\beta \kappa ({{\bar{W}}}*\rho )(y)} \mathop {}\!\textrm{d}y}, \end{aligned}$$

over the space of even functions. This was later extended in [65] to (ADE) (again in the torus but with \(U = U_m\) and \(V = 0\)) by taking the suitable extension of the energy.

Remark 6.5

When \(U = 0\) the equation \(V + W * \rho = h\) has been studied using different approaches, but we include these references in Sect. 6.3.2 below.

6.2 Extension of the free energy to measure space

Our free energy \({{\mathcal {F}}}\) is defined for \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^d}) \cap L^1 ({{\mathbb {R}}^d})\). This set is not dense in \({\mathcal {P}}_2({{\mathbb {R}}^d})\) with the total variation metric, since \(\Vert \rho - \delta _0\Vert _{TV} = 2\) for every such \(\rho \). However, it is dense in the narrow topology. We can therefore try to extend \({\mathcal {F}}\). The natural weak lower-semicontinuous extension is given by

$$\begin{aligned} \widetilde{{{\mathcal {F}}}} (\mu ) = \inf _{\rho _n \rightharpoonup \mu } \liminf _n {{\mathcal {F}}}(\rho _n). \end{aligned}$$

Clearly, this is not so easy to compute directly.

Let us look first at the diffusion term in the homogeneous range \(U_m\). As an extreme example, we have to make sense of \(\delta _0^m\). An easy approach is to take \(\rho _1 \in {\mathcal {P}}_2({{\mathbb {R}}^d}) \cap C_c^\infty ({{\mathbb {R}}^d})\), define \(\rho _\varepsilon = \varepsilon ^{-d} \rho _1(\varepsilon ^{-1} x)\). Then \(\rho _\varepsilon \rightharpoonup \delta _0\). Integrating we recover

$$\begin{aligned} {\mathcal {U}}_m (\rho _\varepsilon ) = \varepsilon ^{d(1-m)} {\mathcal {U}}_m(\rho _1). \end{aligned}$$

Therefore, the extension has to be

$$\begin{aligned} \widetilde{{\mathcal {U}}_m} (\delta _0) = \left\{ \begin{array}{ll} +\infty &{} \text {if } m > 1, \\ 0 &{} \text {if } m \in (0,1). \end{array} \right. \end{aligned}$$

The rigorous construction of \(\widetilde{\mathcal{U}}\) in a general setting can be found in [107]. In fact

$$\begin{aligned} {\widetilde{{\mathcal {U}}}}(\mu ) = \left\{ \begin{array}{ll} +\infty &{} \text {if } { \lim _{s \rightarrow \infty }\tfrac{U(s)}{s} = \infty } \text { and } { \mu _{\textrm{sing}} } \not \equiv 0, \\ {\mathcal {U}}(\mu _{ac}) &{} \text {if } { \lim _{s \rightarrow \infty }\tfrac{U(s)}{s} =0, } \end{array} \right. \end{aligned}$$

where \(\mu = \mu _{\textrm{ac}} + \mu _{\textrm{sing}}\) is the decomposition of the measure into its absolutely continuous and singular parts. If V, W are well-behaved (e.g., bounded by \(C(1+|x|^2)\)) then, writing \(\mu = \alpha \delta _0 + \rho \),

$$\begin{aligned} \widetilde{{{\mathcal {F}}}} (\alpha \delta _0 + \rho ) = \widetilde{{\mathcal {U}}} (\rho ) + \int V \mathop {}\!\textrm{d}\mu + \frac{1}{2} \int (W*\mu ) \mathop {}\!\textrm{d}\mu . \end{aligned}$$

For example, if U is sublinear and V, W are well-behaved (with W even), expanding the last integral gives

$$\begin{aligned} \widetilde{{{\mathcal {F}}}} (\alpha \delta _0 + \rho ) = {{\mathcal {F}}}(\rho ) + \alpha V(0) + \alpha \int _{{\mathbb {R}}^d}W(x) \rho (x) \mathop {}\!\textrm{d}x + \frac{\alpha ^2}{2} W(0). \end{aligned}$$

The first variation in this setting is a little more involved, but it follows the same general scheme.

However, if W is singular at 0 the landscape can be richer; for example, as mentioned in Remark 6.15 for (\(\hbox {KSE}_\chi \)), the correct extension is more difficult to construct. In fact, we point the reader to [106] where the authors construct examples of W with infinitely many radially decreasing steady states.

6.3 Existence of minimisers

6.3.1 General comments

For example, it is easy to show that when \(m > 1\), and V, W are \(\lambda \)-convex for \(\lambda > 0\) and bounded below, then the free-energy functional admits a unique minimiser with the properties above, using the direct method of the calculus of variations. First, the functional is bounded below since \(U_m \ge 0\) for \(m > 1\). We take a minimising sequence \(\rho _n\), and we use Prokhorov’s theorem to prove it has a narrow limit \({\hat{\mu }}\). Using weak lower semi-continuity we show that the limit is indeed a minimiser.

The situation in a bounded domain \(\Omega \) is easier, since all sequences of measures are tight (i.e., mass cannot escape to infinity) and the Lebesgue spaces are embedded in each other. We focus therefore on the more complicated case \({{\mathbb {R}}^d}\).

To find energy minimisers in \({{\mathbb {R}}^d}\), one can try to construct minimising profiles by scaling. Given a profile \(\rho _1\), we can rescale it like

$$\begin{aligned} \rho _\lambda (x) = \lambda ^{d} \rho _1( \lambda x ). \end{aligned}$$

It might happen that full diffusion (i.e., \(\lambda \rightarrow 0\)) is energy beneficial. Indeed, we have the scaling

$$\begin{aligned} {\mathcal {U}}_m (\rho _\lambda ) = \left\{ \begin{array}{ll} \lambda ^{d(m-1)} {\mathcal {U}}_m (\rho _1) &{}\text {if } m \ne 1, \\ {\mathcal {U}}_m(\rho _1) + d \log \lambda &{} \text {if } m = 1. \end{array} \right. \end{aligned}$$

Notice that, when \(m > 1\), then \(d(m-1)\) and \({\mathcal {U}}_m(\rho _1)\) are both positive, whereas if \(m < 1\) they are both negative. Furthermore, we observe that, when \(m > 1\), the energy is bounded below, whereas for \(m \in (0,1]\) it is not.
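The scaling of \({\mathcal {U}}_m\) can be checked numerically. The following sketch assumes \(d = 1\), takes the standard Gaussian as the profile \(\rho _1\) (an arbitrary choice made for illustration), uses \({\mathcal {U}}_m(\rho ) = \frac{1}{m-1}\int \rho ^m\) for \(m \ne 1\) and \(\int \rho \log \rho \) for \(m = 1\), and approximates the integrals by a midpoint rule on a truncated interval. The helper names are hypothetical.

```python
import math


def gaussian(x):
    """Standard Gaussian, used here as the reference profile rho_1 in d = 1."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def rescale(rho_1, lam, d=1):
    """Return rho_lambda(x) = lambda^d * rho_1(lambda * x)."""
    return lambda x: lam ** d * rho_1(lam * x)


def internal_energy(rho, m, half_width=40.0, n=20000):
    """Midpoint rule for U_m(rho): int rho^m / (m - 1), or int rho log(rho) if m = 1."""
    dx = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        r = rho(-half_width + (i + 0.5) * dx)
        if r > 0.0:  # skip points where the Gaussian underflows to zero
            total += (r * math.log(r) if m == 1 else r ** m / (m - 1.0)) * dx
    return total


lam = 2.0
for m in (0.5, 2.0):
    ratio = internal_energy(rescale(gaussian, lam), m) / internal_energy(gaussian, m)
    print(m, ratio, lam ** (m - 1.0))  # the last two columns should agree

shift = internal_energy(rescale(gaussian, lam), 1) - internal_energy(gaussian, 1)
print(shift, math.log(lam))  # for m = 1 and d = 1 the energy shifts by log(lambda)
```

Both for \(m = \frac{1}{2}\) and for \(m = 2\) the computed ratio should match \(\lambda ^{d(m-1)}\) with \(d = 1\), while the sign of \({\mathcal {U}}_m(\rho _1)\) changes between the two cases.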

It might also happen that full concentration (i.e., \(\lambda \rightarrow \infty \)) is best for the energy, for example for \(U, W = 0\) and \(V = |x|^2\). Then we expect the minimiser to be a \(\delta _0\).

6.3.2 The case \(U = 0\)

There is a long literature looking for minimisers when \(U = 0\). In this setting, some authors have studied the \({{\mathfrak {W}}}_p\) local minimisers with \(p \in (1,\infty ]\). In [10] the authors realised that, when \(V = 0\), the minimisers can be supported over sets of different dimensions, depending on the attractive-repulsive nature of W. The stability of these minimisers was later studied in [11]. In [12] the authors discuss the compact support of minimisers under general settings. For general conditions on existence of global minimisers see [159]. The regularity of compactly supported \(\infty \)-Wasserstein local minimisers when \(V = 0\) was studied in [53]. Recently, analysis of the Fourier transform has been applied to determine further structure of minimisers [79, 81].

There is the particularly interesting case of \(V = 0\) and the power-type attractive-repulsive potential

$$\begin{aligned} W(x) = \frac{|x|^\gamma }{\gamma } - \frac{|x|^\alpha }{\alpha }. \end{aligned}$$
(6.4)

In this direction, there have been a number of works. The authors of [85] show existence of minimisers coming as a limit of empirical measures (recall (2.4)) when \(\gamma > \alpha \). In dimension 1 one can construct solutions of \(W*\rho = E\) by inverting Fredholm operators (see [70]). Nevertheless, the measures

$$\begin{aligned} \mu _{{\mathfrak {m}}} = {\mathfrak {m}} \delta _0 + (1-{\mathfrak {m}}) \delta _1 \end{aligned}$$

are of special relevance. In [131] the authors show that, in \(d = 1\), if \(\alpha \ge 2\) and \(\gamma \) is large enough, then \(\mu _{\frac{1}{2}}\) is the unique \({\mathfrak {W}}_p\) minimiser for \(p \in [1,\infty )\). The \({\mathfrak {W}}_\infty \) minimisation when \(d = 1\) is richer:

  • If \(\gamma> \alpha > 2\), for every \({\mathfrak {m}} \in (0, 1)\), \(\mu _{{\mathfrak {m}}}\) is a \({{\mathfrak {W}}}_\infty \)-strict local minimiser.

  • If \(\gamma > 3, \alpha = 2\), for every \({\mathfrak {m}}\in (\frac{1}{\gamma -1}, \frac{\gamma -2}{\gamma -1})\), \(\mu _{{\mathfrak {m}}}\) is a \({{\mathfrak {W}}}_\infty \)-strict local minimiser.

  • If \(\gamma > 3, \alpha = 2\), for every \({\mathfrak {m}} \in (0, \frac{1}{\gamma -1} ] \cup [ \frac{\gamma -2}{\gamma -1}, 1)\), \(\mu _{{\mathfrak {m}}}\) is a \({{\mathfrak {W}}}_\infty \)-saddle point.

  • If \(3> \gamma > \alpha = 2\), for every \({\mathfrak {m}}\in (0, 1)\), \(\mu _{{\mathfrak {m}}}\) is a \({{\mathfrak {W}}}_\infty \)-saddle point.

  • If \(\gamma = 3, \alpha = 2\), for every \({\mathfrak {m}}\in (0, \frac{1}{2} ) \cup (\frac{1}{2}, 1)\), \(\mu _{{\mathfrak {m}}}\) is a \({{\mathfrak {W}}}_\infty \)-saddle point.

  • If \(\gamma = 3, \alpha = 2\), \(\mu _{\frac{1}{2}}\) is a \({{\mathfrak {W}}}_\infty \)-strict local minimiser.

When \(d > 1\) the suitable extension of \(\mu _{{\mathfrak {m}}}\) is a “Dirac delta” supported on the surface of a ball, see [98, 99]. The explicit shape of the \({\mathfrak {W}}_\infty \)-minimiser when \(\gamma \in (2,3)\) and \(\alpha = 2\) is given in [119, 120]. There is also significant interest in the case \(\gamma = 2, \alpha \in (-2,0)\), see [80] and the references therein.

There are several related problems with different choices of V, W which have been studied over the last few years for modelling reasons; we point the reader to [74, 75, 144], and the references therein.

6.3.3 The power cases \(U = U_m\), \(V=0\), and \(W = \chi W_k\)

The family of cases \(U_m (\rho ) = \frac{\rho ^m}{m-1}\), \(V = 0\), and \(W = \chi W_k\) has been studied in extensive detail. We analyse the scaling of the second term of the energy to recover

$$\begin{aligned} {\mathcal {W}}_k(\rho _\lambda ) = \left\{ \begin{array}{ll} \lambda ^{-k} {\mathcal {W}}_k(\rho _1)&{}\text {if } k \ne 0, \\ {\mathcal {W}}_k(\rho _1) - \log \lambda &{} \text {if } k = 0. \end{array} \right. \end{aligned}$$

The scaling of the energy is given by

$$\begin{aligned} {{\mathcal {F}}}_{m,\chi ,k}(\rho _\lambda ) = \lambda ^{-k} \left( \lambda ^{d(m-1) + k} {\mathcal {U}}_m (\rho _1) + \chi {\mathcal {W}}_k(\rho _1) \right) \end{aligned}$$

The sign inside the parenthesis is crucial. As pointed out in [43, 67] the two terms are in balance when \(d(m-1) + k = 0\), and this leads to the critical value

$$\begin{aligned} m_c = \frac{d-k}{d} \end{aligned}$$
(6.5)

The case \(m = m_c\) is called the fair-competition regime. For \(m > m_c\) we have the so-called diffusion-dominated regime and for \(m \in (0,m_c)\) the aggregation-dominated regime. Notice that \(k > 0\) implies \(m_c<1\) (fast-diffusion) whereas \(k < 0\) implies \(m_c>1\) (slow-diffusion).
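The classification into regimes amounts to comparing m with \(m_c\) from (6.5). A minimal Python sketch follows; exact rational arithmetic is used so that the borderline case \(m = m_c\) is detected without rounding issues, and the function names are illustrative only.

```python
from fractions import Fraction


def critical_exponent(d, k):
    """Critical value m_c = (d - k)/d as in (6.5)."""
    return Fraction(d - k, d)


def regime(m, d, k):
    """Name of the regime for U = U_m, V = 0, W = chi * W_k in dimension d."""
    m_c = critical_exponent(d, k)
    if m == m_c:
        return "fair-competition"
    return "diffusion-dominated" if m > m_c else "aggregation-dominated"


# Newtonian interaction in d = 3 corresponds to k = 2 - d = -1.
print(critical_exponent(3, -1), regime(2, 3, -1))
```

For instance, with \(d = 3\) and \(k = -1\) one gets \(m_c = \frac{4}{3}\), so quadratic diffusion \(m = 2\) falls in the diffusion-dominated regime, while for \(k = 1\) the critical value drops to \(\frac{2}{3}\), in the fast-diffusion range.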

Remark 6.6

The same scaling argument can be repeated for \({{\mathcal {F}}}= {\mathcal {U}}_m + \chi {\mathcal {V}}_\lambda \), and we also recover the critical value \(m_c = \frac{d-\lambda }{d}\). The critical value of (PME-FP) corresponds precisely to \(\lambda = 2\).

Remark 6.7

Notice that (\(\hbox {KSE}_{m,\chi }\)) corresponds precisely to \(k = 2 - d\), so we recover \(m_c = \frac{2d - 2}{d}\). The case \(U = U_m\) with \(m \ge 2\), \(V = 0\) and W radially decreasing was studied in [13], where the main tool is radially decreasing rearrangement of \(\rho \) (which will be presented in Sect. 6.4).

Remark 6.8

(Hardy–Littlewood–Sobolev) To show that the free energy functionals are bounded below, we recall the Hardy–Littlewood–Sobolev inequality: for \(k \in (-d,0) \) and \(\rho \in {\mathcal {P}}({{\mathbb {R}}^d})\),

$$\begin{aligned} \iint _{{{\mathbb {R}}^d}\times {{\mathbb {R}}^d}} |x-y|^k \rho (x) \rho (y) \mathop {}\!\textrm{d}x \mathop {}\!\textrm{d}y \le C_{{\textrm{HLS}}}(k,d) \int _{{\mathbb {R}}^d}\rho (x) ^{m_c} \mathop {}\!\textrm{d}x, \end{aligned}$$
(6.6)

where we recall the definition of \(m_c\) given in (6.5). A proof can be found in [43, Theorem 3.1]. The logarithmic version is that if \(\log (1+|\cdot |^2) \rho \in L^1\) and \(\rho \in {\mathcal {P}}({{\mathbb {R}}^d})\) then

$$\begin{aligned} - \iint _{{{\mathbb {R}}^d}\times {{\mathbb {R}}^d}} \log |x-y| \rho (x) \rho (y) \mathop {}\!\textrm{d}x \mathop {}\!\textrm{d}y \le \tfrac{1}{d} \,{\mathcal {U}}_1(\rho ) + C_0. \end{aligned}$$
(6.7)

The fair-competition regime was discussed in [43], the diffusion-dominated regime in [67], and the aggregation-dominated regime in [54].

Since there is fair competition, the rescaling done for (PME) suitably rescales the equation. There is, as usual, a new term \({{\,\textrm{div}\,}}(x \rho )\) coming from the time derivative. This new equation is itself the 2-Wasserstein gradient flow of the rescaled free energy

$$\begin{aligned} {{\mathcal {F}}}_{\textrm{resc}} {:}{=}{{\mathcal {F}}}+ {\mathcal {V}}_2, \end{aligned}$$

where we recall the definition (3.7). The main results are as follows.

Theorem 6.9

(Fair-competition case [43]) Let \(m = m_c\) and \(k \in (-d,0)\). Then

  1. Stationary states satisfy \({{\mathcal {F}}}({\hat{\rho }}) = 0\).

  2. We have that

    $$\begin{aligned} {{\mathcal {F}}}(\rho ) \ge \frac{1 - \chi C_{{\textrm{HLS}}}(k,d)}{d(m-1)} \Vert \rho \Vert _{L^m}^m. \end{aligned}$$

    In view of this bound, we define \(\chi _c = 1/C_{{\textrm{HLS}}}(k,d)\).

  3. If \(\chi = \chi _c\), then there exists a global minimiser.

  4. If \(\chi < \chi _c\), then there exists a global minimiser for \({{\mathcal {F}}}_{\textrm{resc}}\), but not for \({{\mathcal {F}}}\).

  5. If \(\chi > \chi _c\), then \({\mathcal {F}}\) and \({{\mathcal {F}}}_{\textrm{resc}}\) are not bounded below.

Theorem 6.10

(Diffusion-dominated regime [67]) Let \(m> m_c, d \ge 1, \chi > 0,\) and \(k \in (-d,0)\). Then there exists a global minimiser \({\hat{\rho }} \in L^1_+({{\mathbb {R}}^d}) \cap L^m({{\mathbb {R}}^d})\), and it is radially symmetric and compactly supported.

The work [66] also contains contributions towards existence of minimisers in the diffusion-dominated range. However, the key contribution is the radial symmetry as described in Sect. 6.4. More or less in parallel, work was done on the aggregation-dominated regime.

Theorem 6.11

(Aggregation-dominated regime [54]) Assume that \(W \in L^1_{loc} ({{\mathbb {R}}^d})\) and \(W(x) = W(-x)\).

  1. (Linear diffusion) Assume, furthermore, that \(W \in L^\infty ({{\mathbb {R}}^d}{\setminus } B_\delta )\). Then, for any \(\varepsilon > 0\) the energy \(\varepsilon {\mathcal {U}}_1 + {\mathcal {W}}\) does not admit any \({{\mathfrak {W}}}_p\)-local minimisers. Furthermore, if W is Lipschitz continuous, then it does not admit any critical points.

  2. (Fast diffusion) If \(m < \frac{d}{d+k}\) then the free energy \(\varepsilon {\mathcal {U}}_m + {\mathcal {W}}_k\) is not bounded below in \({\mathcal {P}}({{\mathbb {R}}^d}) \cap L^\infty ({{\mathbb {R}}^d})\).

  3. (Slow diffusion) If \(m > 1\) and \(k \in ((1-m)d, 0)\) then the energy

    $$\begin{aligned} \mathcal F(\rho ) = \left\{ \begin{array}{ll} {\mathcal {U}}_m(\rho ) + {\mathcal {W}}_k (\rho ) &{} \quad \text {if } \rho \in L^m ({{\mathbb {R}}^d}) \\ +\infty &{}\quad \text {if } \rho \in {\mathcal {P}}({{\mathbb {R}}^d}) {\setminus } L^m ({{\mathbb {R}}^d}) \end{array} \right. \end{aligned}$$

    has a global minimiser.

The authors also provide a condition for sharp existence of global minimisers. In [55] the authors introduce a reversed HLS inequality to deal with the aggregation-dominated regime.

Theorem 6.12

(Reversed HLS [55])

  1. If \(k > 0\) and \(m \in ( \frac{d}{d+k},1)\) then there exists a constant \(C \in {\mathbb {R}}\) such that

    $$\begin{aligned} \dfrac{ \iint _{{{\mathbb {R}}^d}\times {{\mathbb {R}}^d}} f(x) |x-y|^k f(y) \mathop {}\!\textrm{d}x \mathop {}\!\textrm{d}y}{\left( \int _{{\mathbb {R}}^d}f(x) \mathop {}\!\textrm{d}x \right) ^{\alpha } \left( \int _{{\mathbb {R}}^d}f(x)^m \mathop {}\!\textrm{d}x \right) ^{\frac{2-\alpha }{m}}} \ge C, \end{aligned}$$
    (6.8)

    where

    $$\begin{aligned} \alpha = \frac{2d-m(2d+k)}{d(1-m)}. \end{aligned}$$

  2. Equality in (6.8) is achieved if \(d=1,2\), or if \(d = 3\) and \(m \ge \min \{ \frac{d-2}{d}, \frac{2d}{2d+k} \} \).

  3. If \(m \in ( \frac{d}{d+k},1)\) and either \(\lambda \in [2,4]\), or \(\lambda \ge 1\) and \(m \ge \frac{d-1}{d}\), then the minimiser of (6.8) is unique (up to translation, dilation, or multiplication by a constant).
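The value of \(\alpha \) in (6.8) is precisely the one that makes the quotient invariant under dilations. As a quick check (a direct computation, not taken from [55]; the dilation parameter \(\ell \) is notation introduced only for this check), let \(f_\ell (x) = f(\ell x)\) with \(\ell > 0\). Changing variables,

$$\begin{aligned} \iint _{{{\mathbb {R}}^d}\times {{\mathbb {R}}^d}} f_\ell (x) |x-y|^k f_\ell (y) \mathop {}\!\textrm{d}x \mathop {}\!\textrm{d}y = \ell ^{-2d-k} \iint _{{{\mathbb {R}}^d}\times {{\mathbb {R}}^d}} f(x) |x-y|^k f(y) \mathop {}\!\textrm{d}x \mathop {}\!\textrm{d}y, \qquad \int _{{\mathbb {R}}^d}f_\ell ^p \mathop {}\!\textrm{d}x = \ell ^{-d} \int _{{\mathbb {R}}^d}f^p \mathop {}\!\textrm{d}x \quad \text {for } p \in \{1, m\}. \end{aligned}$$

Hence the quotient in (6.8) is multiplied by \(\ell ^{-(2d+k) + d \alpha + \frac{d(2-\alpha )}{m}}\), and this exponent vanishes if and only if

$$\begin{aligned} d \alpha (m-1) = m(2d+k) - 2d, \qquad \text {i.e.,} \qquad \alpha = \frac{2d-m(2d+k)}{d(1-m)}. \end{aligned}$$

The quotient is also invariant under \(f \mapsto c f\) with \(c > 0\), since numerator and denominator are both homogeneous of degree \(\alpha + (2 - \alpha ) = 2\).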

Part of the results in [55] appeared first as a preprint [51] written in terms of (ADE) instead of functional inequalities. We present the results as stated there as a corollary of the published paper.

Corollary 6.13

Let \(m \in ( \frac{d}{d+k}, 1)\) and \(k > 0\). Then

  1. There exists a unique \({\widehat{\mu }}\) that minimises the free energy \({\widetilde{{\mathcal {U}}}}_m ({\widehat{\mu }}) + \chi {\mathcal {W}}({\widehat{\mu }})\), and it coincides with the minimiser of (6.8).

  2. It is of the form (5.6) for some \({\widehat{\rho }} \in L^1 \).

  3. For \(m > \frac{d-1}{d}\) and \(k \ge 1\) the functional is geodesically convex in \({{\mathfrak {W}}}_2\), and hence, the minimiser is unique.

The authors also present the suitable Euler–Lagrange equation for \({\widehat{\mu }}\). The computation is more involved than (6.1.1), but it follows the same philosophy.

The case \(\lambda = 4\) was studied in full detail in [52]. Here the authors characterise precisely the dimensions d and indices m for which the minimiser has a Dirac delta, i.e., it is of the form (5.6) with \(\Vert {\widehat{\rho }}\Vert _{L^1} < 1\).

Remark 6.14

A considerable amount of work has been devoted to the case \(V = 0\), whereas \(V \ne 0\) has attracted much less interest. The extension of the results is more or less direct, see [57, 60].

Remark 6.15

Keller–Segel corresponds precisely to the logarithmic case \(d = 2\), \(m = 1\), and \(W = \frac{-1}{2 \pi } W_0 \), which yields (assuming \(\Vert \rho \Vert _{L^1} = 1\))

$$\begin{aligned} {{\mathcal {F}}}[ \rho _\lambda ] = {{\mathcal {F}}}(\rho _1) + \left( 2 - \frac{ \chi }{4 \pi }\right) \log \lambda . \end{aligned}$$

This leads to the critical value of \(\chi ^* = 8 \pi \) or, equivalently recalling the change of variable in Sect. 3.4, critical mass \(M^* = 8 \pi \).

6.4 Radial symmetry

There are several approaches to prove that, when V, W are radially symmetric, the minimisers must be radially symmetric. One approach is to use the associated stationary problem, and Alexandrov reflexions.

A different approach is to prove that we can find radially symmetric minimising sequences. For this, we can take advantage of the rearrangement theory (see [166] and the references therein). First, let us “re-arrange” a set. If \(A \subset {{\mathbb {R}}^d}\) we define the rearrangement of A as the ball centred at 0 with the same measure, i.e.,

$$\begin{aligned} A^\star {:}{=}B(0,R) \qquad \text { such that } |A^\star | = |A|. \end{aligned}$$

We define the radially decreasing re-arrangement of a non-negative function f as the function \(f^\star \) such that

$$\begin{aligned} \{ x: f^\star (x) \ge t \} = \{ x: f(x) \ge t \}^\star . \end{aligned}$$

We can write f in terms of its level sets via the layer-cake representation

$$\begin{aligned} f(x) = \int _0^{f(x)} \mathop {}\!\textrm{d}t = \int _0^\infty 1_{[0,f(x)]} (t) \mathop {}\!\textrm{d}t = \int _0^\infty 1_{L_t^+(f)} (x) \mathop {}\!\textrm{d}t, \end{aligned}$$

where \(1_A\) is the indicator function of A, and we have introduced the super-level set notation

$$\begin{aligned} L_t^+ (f) {:}{=}\{ x: f(x) \ge t \}. \end{aligned}$$

Hence, we can simply define

$$\begin{aligned} f^\star (x) {:}{=}\int _0^\infty 1_{L_t^+(f)^\star } (x) \mathop {}\!\textrm{d}t. \end{aligned}$$

It is not difficult to check that if f, g, h are non-negative functions vanishing at infinity we have the conservation of all \(L^p\) norms, which can be more generally written as

$$\begin{aligned} \int _{{\mathbb {R}}^d}U(f^\star (x)) \mathop {}\!\textrm{d}x = \int _{{\mathbb {R}}^d}U(f(x)) \mathop {}\!\textrm{d}x, \qquad \text {for all }U\text { convex}. \end{aligned}$$

Furthermore, we have Hardy–Littlewood’s inequality

$$\begin{aligned} \int _{{\mathbb {R}}^d}f^\star (x) g^\star (x) \mathop {}\!\textrm{d}x \ge \int _{{\mathbb {R}}^d}f(x) g(x) \mathop {}\!\textrm{d}x, \end{aligned}$$

and Riesz’s inequality

$$\begin{aligned} \iint _{{{\mathbb {R}}^d}\times {{\mathbb {R}}^d}} f^\star (x) g^\star (y) h^\star (x-y) \mathop {}\!\textrm{d}x \mathop {}\!\textrm{d}y \ge \iint _{{{\mathbb {R}}^d}\times {{\mathbb {R}}^d}} f(x) g(y) h(x-y) \mathop {}\!\textrm{d}x \mathop {}\!\textrm{d}y. \end{aligned}$$

By reversing these inequalities, if V, W are non-negative and radially increasing we have that

$$\begin{aligned} {{\mathcal {F}}}[\rho ^\star ] \le {{\mathcal {F}}}[\rho ]. \end{aligned}$$

Hence, the expected behaviour is that if \({\hat{\rho }}\) is a minimiser, then so is its radially decreasing rearrangement. Furthermore, any minimising sequence \(\rho _k\) can be replaced by a radially-decreasing minimising sequence. However, rigorously proving this intuition is rather technical. A fairly general theorem can be seen in [66], where the authors take advantage of the continuous Steiner rearrangement.
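The rearrangement properties above can be explored numerically. The following is a minimal sketch on a uniform one-dimensional grid, not a construction from [66] or [166]: the helper `symmetric_decreasing_rearrangement`, the random test data and the choice \(V(x) = x^2\) are all assumptions made for illustration. It checks the conservation of \(\int U(f)\) for \(U(s) = s^3\), the Hardy–Littlewood inequality, and that the potential energy with a radially increasing V does not increase under rearrangement.

```python
import numpy as np

def symmetric_decreasing_rearrangement(f):
    """Discrete stand-in for f^* on a uniform 1D grid (hypothetical helper).

    The values of f are sorted in decreasing order and placed on the cells
    ordered by distance to the central cell, so that every super-level set
    becomes a block of cells around the centre with the same cardinality.
    """
    f = np.asarray(f, dtype=float)
    n = f.size
    centre = n // 2
    # Cells ordered by distance to the centre: c, c+1, c-1, c+2, c-2, ...
    cells = sorted(range(n), key=lambda i: (abs(i - centre), -i))
    f_star = np.empty(n)
    f_star[cells] = np.sort(f)[::-1]
    return f_star

# Illustrative data (assumption): random non-negative values on [-1, 1].
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 101)
f, g = rng.random(x.size), rng.random(x.size)
f_star = symmetric_decreasing_rearrangement(f)
g_star = symmetric_decreasing_rearrangement(g)

V = x**2  # a radially increasing confinement potential (assumption)

# Conservation of sum U(f) with U(s) = s^3 (the values are only permuted).
lp_gap = abs(np.sum(f_star**3) - np.sum(f**3))
# Hardy-Littlewood: sum f^* g^* >= sum f g.
hardy_littlewood_gap = np.dot(f_star, g_star) - np.dot(f, g)
# Potential energy: sum V f^* <= sum V f when V is radially increasing.
potential_gap = np.dot(V, f) - np.dot(V, f_star)

print(lp_gap, hardy_littlewood_gap, potential_gap)
```

On a grid both inequalities reduce to the classical rearrangement inequality for finite sequences; Riesz's inequality is deliberately not tested here, since its discrete analogue needs more care with ties.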

7 Numerical methods

7.1 Finite volumes

Finite volume schemes are a family of schemes for conservation equations which incorporate the idea of transport of mass. They are easiest to introduce in \(d = 1\). Taking a spatial mesh \(x_i = i \Delta x\), we aim to approximate numerically the averages

$$\begin{aligned} {\rho _i} \approx \frac{1}{x_{i+\frac{1}{2}} - x_{i-\frac{1}{2}}} \int _{x_{i-\frac{1}{2}}}^{x_{i+\frac{1}{2}}} \rho (x) \mathop {}\!\textrm{d}x. \end{aligned}$$

The conservation equation (2.2) yields

$$\begin{aligned} \frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} \int _{x_{i-\frac{1}{2}}}^{x_{i+\frac{1}{2}}} \rho (t,x) \mathop {}\!\textrm{d}x= -\Big ( F(x_{i+\frac{1}{2}}) - F(x_{i-\frac{1}{2}})\Big ). \end{aligned}$$

Discretising in time we arrive at the general form

$$\begin{aligned} \frac{\rho _i^{n+1} - \rho _i^n }{\Delta t} + \frac{F_{i+\frac{1}{2}}^{n+1} - F^{n+1}_{i-\frac{1}{2}}}{\Delta x} = 0. \end{aligned}$$

The no-flux condition is approximated by \(F_{-N-\frac{1}{2}} = F_{N + \frac{1}{2}} = 0\), where the cells are indexed by \(i = -N, \ldots , N\). For transport problems in which \({\textbf{F}} = \rho {\textbf{v}}\), the flux is taken up-wind to preserve positivity of \(\rho _i^n\), namely

$$\begin{aligned} F_{i+\frac{1}{2} } = \rho _i (v_{i+\frac{1}{2}})_+ + \rho _{i+1} (v_{i+\frac{1}{2}})_-, \end{aligned}$$
(7.1)

where \(u_+ = \max \{u,0\}, u_- = \min \{u,0\}\) so that \(v = v_+ + v_-\). For (ADE) the velocity field \(v_{i+\frac{1}{2}}\) is an approximation of the velocity field

$$\begin{aligned} v_{i+\frac{1}{2}} \approx -\nabla (U'(\rho ) + V + W*\rho ) (x_{i+\frac{1}{2}}). \end{aligned}$$
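As an illustration of the up-wind flux (7.1), the following sketch advances the one-dimensional problem with \(W = 0\). Several choices here are assumptions and not part of the schemes discussed in this section: the time stepping is explicit Euler (swapped in for simplicity, with a CFL-type restriction on the time step), and the example uses \(U'(\rho ) = 2\rho \) and \(V(x) = x^2/2\). The function names `upwind_step` and `solve` are hypothetical.

```python
import numpy as np

def upwind_step(rho, x, dx, dt, dU, V):
    """One explicit Euler step of the up-wind finite-volume scheme (W = 0)."""
    xi = dU(rho) + V(x)                     # U'(rho_i) + V(x_i) on cell centres
    v = -(xi[1:] - xi[:-1]) / dx            # v_{i+1/2} on interior interfaces
    # Up-wind flux (7.1): mass leaves cell i if v > 0 and cell i+1 if v < 0.
    flux = rho[:-1] * np.maximum(v, 0.0) + rho[1:] * np.minimum(v, 0.0)
    flux = np.concatenate(([0.0], flux, [0.0]))   # no-flux boundary condition
    return rho - dt * (flux[1:] - flux[:-1]) / dx

def solve(rho0, x, dx, n_steps, dU, V, cfl=0.4):
    """Iterate upwind_step with an adaptive time step."""
    rho = rho0.copy()
    for _ in range(n_steps):
        xi = dU(rho) + V(x)
        v_max = np.max(np.abs(np.diff(xi))) / dx
        # dt <= dx / (2 max|v|) keeps every rho_i non-negative.
        dt = cfl * dx / max(v_max, 1.0)
        rho = upwind_step(rho, x, dx, dt, dU, V)
    return rho

# Illustrative data (assumption): a compactly supported bump of unit mass.
x = np.linspace(-3.0, 3.0, 121)             # cell centres
dx = x[1] - x[0]
rho0 = np.maximum(1.0 - x**2, 0.0)
rho0 /= rho0.sum() * dx
rho = solve(rho0, x, dx, 2000, dU=lambda r: 2.0 * r, V=lambda y: 0.5 * y**2)

print("mass:", rho.sum() * dx, "min:", rho.min())
```

The fluxes telescope, so the discrete mass \(\Delta x \sum _i \rho _i\) is conserved up to rounding, and the step-size restriction makes each \(\rho _i^{n+1}\) a non-negative combination of \(\rho _{i-1}^n, \rho _i^n, \rho _{i+1}^n\). The explicit variant does not come with the energy-decay guarantee of the implicit schemes.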

In [7] the authors introduce a scheme with energy decay for (ADE). When \(W = 0\), if U is convex then so is \(\rho \mapsto E(x,\rho ) = U(\rho ) + V(x)\rho \), and hence we can prove a version of (4.8) with an inequality

$$\begin{aligned} \frac{1}{\Delta t} \Bigg ( \sum _i E(x_i, \rho _i^{n+1}) - \sum _i E(x_i, \rho _i^{n}) \Bigg )&\le \sum _i \frac{\partial E}{\partial \rho }(x_i, \rho _i^{n+1}) \frac{\rho _i^{n+1} - \rho _i^n }{\Delta t} \\&= - \sum _i \frac{\partial E}{\partial \rho }(x_i, \rho _i^{n+1}) \frac{F_{i+\frac{1}{2}}^{n+1} - F^{n+1}_{i-\frac{1}{2}}}{\Delta x} \\&= \sum _i \frac{ \frac{\partial E}{\partial \rho }(x_{i+1}, \rho _{i+1}^{n+1}) - \frac{\partial E}{\partial \rho }(x_i, \rho _{i}^{n+1}) }{\Delta x} F_{i+\frac{1}{2}}^{n+1} . \end{aligned}$$

Here the first line uses convexity, the second uses the scheme, and the third follows from summation by parts together with the no-flux conditions. Due to the convexity trick used above, it is natural to take the implicit method

$$\begin{aligned} v_{i+\frac{1}{2}}^{n+1}&= -\frac{ \frac{\partial E}{\partial \rho }(x_{i+1}, \rho _{i+1}^{n+1}) - \frac{\partial E}{\partial \rho }(x_i, \rho _{i}^{n+1}) }{\Delta x}\\ F_{i+\frac{1}{2} }^{n+1}&= \rho _i^{n+1} (v_{i+\frac{1}{2}}^{n+1})_+ + \rho _{i+1}^{n+1} (v_{i+\frac{1}{2}}^{n+1})_-. \end{aligned}$$

Then we have the energy decay

$$\begin{aligned} \frac{1}{\Delta t} \Bigg ( \sum _i E(x_i, \rho _i^{n+1})&- \sum _i E(x_i, \rho _i^{n}) \Bigg ) \le - \sum _i \left( \rho _i^{n+1} ( v_{i+\frac{1}{2}}^{n+1})_+^2 + \rho _{i+1}^{n+1} (v_{i+\frac{1}{2}}^{n+1})_-^2 \right) \le 0. \end{aligned}$$
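For completeness, the sign of the right-hand side can be checked directly from the definitions of \(v_{i+\frac{1}{2}}^{n+1}\) and \(F_{i+\frac{1}{2}}^{n+1}\) in the implicit method. Since \(u \, u_+ = u_+^2\) and \(u \, u_- = u_-^2\), we have

$$\begin{aligned} \sum _i \frac{ \frac{\partial E}{\partial \rho }(x_{i+1}, \rho _{i+1}^{n+1}) - \frac{\partial E}{\partial \rho }(x_i, \rho _{i}^{n+1}) }{\Delta x} F_{i+\frac{1}{2}}^{n+1}&= - \sum _i v_{i+\frac{1}{2}}^{n+1} \left( \rho _i^{n+1} (v_{i+\frac{1}{2}}^{n+1})_+ + \rho _{i+1}^{n+1} (v_{i+\frac{1}{2}}^{n+1})_- \right) \\&= - \sum _i \left( \rho _i^{n+1} (v_{i+\frac{1}{2}}^{n+1})_+^2 + \rho _{i+1}^{n+1} (v_{i+\frac{1}{2}}^{n+1})_-^2 \right) , \end{aligned}$$

which is non-positive as long as \(\rho _i^{n+1} \ge 0\) for all i.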

When W is introduced similar estimates are possible, although the bilinear terms need delicate handling. In [7] the authors prove that a successful scheme is

$$\begin{aligned} \xi _{i}^{n+1}&= U'(\rho _i^{n+1}) + V(x_i) + \Delta x\sum _{j} W(x_i - x_j) \frac{\rho _j^{n+1} + \rho _j^n}{2}, \end{aligned}$$
(7.2a)
$$\begin{aligned} v_{i+\frac{1}{2}}^{n+1}&= - \frac{\xi _{i+1}^{n+1} - \xi _i^{n+1}}{\Delta x}, \end{aligned}$$
(7.2b)
$$\begin{aligned} F_{i+\frac{1}{2} }^{n+1}&= \rho _i^{n+1} (v_{i+\frac{1}{2}}^{n+1})_+ + \rho _{i+1}^{n+1} (v_{i+\frac{1}{2}}^{n+1})_-, \end{aligned}$$
(7.2c)

with the corresponding boundary no-flux conditions. This scheme has energy decay for any even W. The authors of [7] also prove that, under certain assumptions on W, the decay of the energy holds for other discretisations without the midpoint \(({\rho _j^{n+1} + \rho _j^n})/{2}\). A proof of convergence through compactness for the cases \(m > 2\) was later given in [9]. This scheme can be adapted to (\(\hbox {ADE}^*\)) by

$$\begin{aligned} \begin{aligned} \xi _{i}^{n+1}&= U'(\rho _i^{n+1}) + V(x_i) + \Delta x\sum _{j} K(x_i, x_j) \frac{\rho _j^{n+1} + \rho _j^n}{2}. \end{aligned} \end{aligned}$$
(7.2a')

A very similar scheme for (AE) is studied in [105], where the authors prove convergence in Wasserstein sense even for pointy potentials. This scheme was later extended to linear diffusion in [137].

A higher order method for the case \(U, V = 0\) and W Lipschitz was constructed in [58]. The authors prove convergence in 1-Wasserstein norm.

Remark 7.1

Some finite-volume methods that have been studied actually correspond to the gradient flow of a discrete energy in a discrete version of the Wasserstein metric, see [152].

Remark 7.2

This kind of method has been generalised to \(\partial _t \rho = {{\,\textrm{div}\,}}( m(\rho ) \nabla \frac{\delta F}{\delta \rho })\) when m is concave (see [8]).

7.2 Methods based on discretising the Wasserstein distance

There are several possibilities for numerically computing the Wasserstein gradient flow. One could combine a method for numerically computing the Wasserstein distance (see, e.g., [150]) with a numerical optimisation in the JKO scheme. A very interesting paper in a similar direction is [50], where the authors combine the JKO scheme with the Benamou–Brenier formula (recall (4.14) above) to construct a discrete gradient-flow method, which they call primal-dual method.

A different approach is to use a modified energy for which the gradient flow is easier to compute. This leads to the particle/blob methods we will discuss in the next section.

7.3 Particle/Blob methods

When \(U = 0\) and V, W are Lipschitz, (ADE) admits exact solutions given by empirical distributions

$$\begin{aligned} \mu _t^{(K)} = \sum _{i=1}^K w_i \delta _{X_t^{(i)}}. \end{aligned}$$

Hence, the PDE can be reduced to a finite set of ODEs, that can be solved by any method for ODEs. If we are given \(\rho _0 \in L^1\), then for any K we approximate it by an empirical measure \(\mu _0^{(K)}\).
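For \(U = 0\), inserting the empirical measure into (ADE) formally gives the system \(\frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} X_t^{(i)} = -\nabla V(X_t^{(i)}) - \sum _j w_j \nabla W(X_t^{(i)} - X_t^{(j)})\). The following is a minimal one-dimensional sketch of this reduction. The explicit Euler integrator, the function names `particle_velocity` and `evolve`, and the test case \(V = 0\), \(W(x) = x^2/2\) are assumptions chosen for illustration; any ODE solver could be used instead.

```python
import numpy as np

def particle_velocity(X, w, grad_V, grad_W):
    """Velocity of each particle: -V'(X_i) - sum_j w_j W'(X_i - X_j)."""
    diff = X[:, None] - X[None, :]          # matrix of X_i - X_j
    return -grad_V(X) - grad_W(diff) @ w

def evolve(X0, w, grad_V, grad_W, dt, n_steps):
    """Explicit Euler integration of the particle ODE system."""
    X = X0.copy()
    for _ in range(n_steps):
        X = X + dt * particle_velocity(X, w, grad_V, grad_W)
    return X

# Illustrative data (assumption): equal weights, V = 0, W(x) = x^2 / 2.
K = 50
w = np.full(K, 1.0 / K)
X0 = np.linspace(-1.0, 2.0, K)
X = evolve(
    X0, w,
    grad_V=lambda y: np.zeros_like(y),
    grad_W=lambda z: z,
    dt=0.01, n_steps=200,
)

print("centre of mass:", np.dot(w, X0), "->", np.dot(w, X))
print("spread:", np.std(X0), "->", np.std(X))
```

With this quadratic attraction each particle relaxes towards the centre of mass, which is conserved because \(\nabla W\) is odd; this gives a simple sanity check for an implementation.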

When \(U \ne 0\) this is no longer the case, because the PDE is “regularising”. Furthermore, the free energy of empirical measures is not defined for \(U = U_m, m \ge 1\). However, there are two approaches to approximate the free energy so that the \({{\mathfrak {W}}}_2\)-gradient flow admits again these kinds of solutions.

7.3.1 The particle method

A first approach was introduced in [78] and perfected in [69]. Fix an initial datum

$$\begin{aligned} \mu _0^{(K)} = \sum _{i=1}^K w_i \delta _{X_0^{(i)}}. \end{aligned}$$

Fix the weights \(w_i\) of the particles and the number of particles K. Take the set of empirical distributions of K elements and fixed weights

$$\begin{aligned} {\mathcal {A}}_{K,w} ({{\mathbb {R}}^d}) = \left\{ \mu = \sum _{i=1}^K w_i \delta _{x_i}: \text { for some } x_i \in {{\mathbb {R}}^d}\right\} . \end{aligned}$$

The aim is to approximate the free energy \(\mathcal F\) on this set by

$$\begin{aligned} \begin{aligned} {{\mathcal {F}}}_K \left[ \sum _{i=1}^K w_i \delta _{x_i}\right]&{:}{=}{{\mathcal {F}}}\left[ \sum _{i=1}^K w_i \frac{1_{B(x_i,r_i)}}{|B(x_i,r_i)|} \right] \\&= \sum _i |B(x_i,r_i)| U\left( \frac{w_i}{|B(x_i,r_i)|} \right) + \sum _{i=1}^K w_i V(x_i) \\ {}&\quad + \frac{1}{2} \sum _{i,j=1}^K w_i w_j W(x_i - x_j), \end{aligned} \end{aligned}$$
(7.3)

for suitably selected radii. Then the scheme is simply to apply a suitable adaptation of the JKO scheme (4.17) with time-step \(\tau \), that is

$$\begin{aligned} \mu _{n+1} {:}{=}\mathop {{{\,\textrm{argmin}\,}}}\limits _{\mu \in {\mathcal {A}}_{K,w}} \left( \frac{{{\mathfrak {W}}}_2(\mu , \mu _n)^2}{2} + \tau {{\mathcal {F}}}_K [\mu ] \right) . \end{aligned}$$

7.3.2 The blob method

The blob approach aims to modify the energy so that solutions by particles are admissible. To preserve the gradient-flow structure we modify the free energy by changing the term \(\int U( \rho )\). A popular idea introduced by [86] is

$$\begin{aligned} {\mathcal {U}}_\eta [\rho ] = \int _{{\mathbb {R}}^d}F \circ (\eta * \rho ) \mathop {}\!\textrm{d}\rho , \quad \text {where } F(s) = \frac{U( s )}{s} \end{aligned}$$

and \(\eta \) is a smooth probability distribution, typically close to a Dirac delta. Then, the equation becomes

$$\begin{aligned} \partial _t \rho = {{\,\textrm{div}\,}}\left( \rho \nabla \frac{\delta U_\eta }{\delta \rho } \right) \qquad \text {where } \frac{\delta U_\eta }{\delta \rho } = \eta * \Big ( \big ( F' \circ (\eta * \rho ) \big ) \rho \Big ) + F \circ (\eta * \rho ). \end{aligned}$$

This problem admits particle-type solutions.
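To make the particle-type solutions concrete, one can evaluate the velocity \(-\nabla \frac{\delta U_\eta }{\delta \rho }\) at an empirical measure \(\sum _j w_j \delta _{X_j}\). For an even mollifier \(\eta \) this gives, formally, \(\frac{\mathop {}\!\textrm{d}}{\mathop {}\!\textrm{d}t} X_i = -\sum _j w_j \nabla \eta (X_i - X_j) \big ( F'(\rho _\eta (X_i)) + F'(\rho _\eta (X_j)) \big )\) with \(\rho _\eta = \eta * \rho \). The sketch below implements this in one dimension. It is only an illustration under stated assumptions, not the implementation of [86]: the Gaussian mollifier of width `eps`, the explicit Euler stepping, the names `blob_velocity` and `evolve_blobs`, and the choice \(F(s) = \log s\) (linear diffusion) are all hypothetical.

```python
import numpy as np

def blob_velocity(X, w, eps, F_prime):
    """Particle velocities for the regularised energy with a Gaussian mollifier."""
    z = X[:, None] - X[None, :]                       # X_i - X_j
    eta = np.exp(-z**2 / (2.0 * eps**2)) / np.sqrt(2.0 * np.pi * eps**2)
    d_eta = -z / eps**2 * eta                         # eta'(X_i - X_j)
    rho_eta = eta @ w                                 # (eta * mu)(X_i) > 0
    Fp = F_prime(rho_eta)
    # -sum_j w_j eta'(X_i - X_j) (F'(rho_eta(X_i)) + F'(rho_eta(X_j)))
    return -(d_eta * (Fp[:, None] + Fp[None, :])) @ w

def evolve_blobs(X0, w, eps, F_prime, dt, n_steps):
    """Explicit Euler integration of the blob particle system."""
    X = X0.copy()
    for _ in range(n_steps):
        X = X + dt * blob_velocity(X, w, eps, F_prime)
    return X

# Illustrative data (assumption): linear diffusion, F(s) = log s, F'(s) = 1/s.
K = 40
w = np.full(K, 1.0 / K)
X0 = np.linspace(-0.5, 0.5, K)
X = evolve_blobs(X0, w, eps=0.1, F_prime=lambda s: 1.0 / s, dt=1e-3, n_steps=200)

print("centre of mass:", np.dot(w, X0), "->", np.dot(w, X))
print("second moment:", np.dot(w, X0**2), "->", np.dot(w, X**2))
```

Because \(\eta '\) is odd and the bracket is symmetric in i and j, the centre of mass is conserved, while the second moment grows, as expected from a diffusive term. The width `eps` should be comparable to or larger than the particle spacing for the mollified density to be meaningful.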

Alternative choices of approximation of the free energy are available. See, e.g., [36, 88].

7.4 Finite elements

Finite element schemes rely on the weak formulation and look for solutions in Sobolev spaces. They are extremely useful and accurate for problems which have an \(L^2\)-gradient flow structure (e.g., the p-Laplacian problem instead of (PME)). Nevertheless, authors have found success in this approach, e.g., for (KSE)-type problems [114,115,116] and later for more general (ADE) in [35]. For more details on finite elements for parabolic problems see [167].

8 Other related problems

8.1 Second-order problems

Our family of problems is of first order, since in the particle description we are specifying the velocities of the particles. There is a family of models which introduce the same type of fields but work in an acceleration formulation

$$\begin{aligned} \left\{ \begin{array}{l} \frac{\mathop {}\!\textrm{d}x_t^{(i)}}{\mathop {}\!\textrm{d}t} = v_t^{(i)} \\ \frac{\mathop {}\!\textrm{d}v_t^{(i)}}{\mathop {}\!\textrm{d}t} = -\nabla V(x_t^{(i)}) - \frac{1}{N}\sum _j \nabla W(x_t^{(i)}-x_t^{(j)}) + \sqrt{2\sigma } \mathop {}\!\textrm{d}{\mathcal {B}}_t ^{ { (i) } } \end{array} \right. \end{aligned}$$

with independent Brownian motions \({\mathcal {B}}_t^{(i)}\). When \(\sigma \ne 0\) we recover a diffusion term. This leads to the Vlasov–Poisson–Fokker–Planck equations. We point the reader to [49] and the references therein.

8.2 Cahn–Hilliard problem

Instead of a known potential, some authors have considered the Cahn–Hilliard type problem

$$\begin{aligned} \frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}(\rho \nabla (-\Delta \rho ) ) - \chi \Delta \rho ^m, \end{aligned}$$

where the diffusion is of fourth order, and the aggregation is modelled by a second order operator. This problem also has a Wasserstein gradient-flow structure. We point the reader, e.g., to [87] and the references therein.

8.3 The (fractional) thin-film problem

A related family of problems is the so-called thin-film problem and its fractional variant

$$\begin{aligned} \frac{\partial u }{\partial t} = {{\,\textrm{div}\,}}\left( u \nabla ( (-\Delta )^s u + V(x) ) \right) . \end{aligned}$$

These are 2-Wasserstein gradient flows, where we replace \(U(\rho )\) by \([u]_{H^s}^2\) and V is usually 0 or \(|x|^2/2\). Usually \(s \in [0,1]\) where \(s = 0\) corresponds in fact to \(U_2\). We point the reader to [155] and the references therein.

8.4 Cross-diffusion systems

In our model, all particles are of the same “type”. Hence, there is only need for one density. If one considers models with several “types” of particles, one arrives at a system of PDEs. A similar approach to ours can be taken in the so-called cross-diffusion systems. They correspond to the suitable Wasserstein flow of several-variable energies like

$$\begin{aligned} {{\mathcal {F}}}[\rho _1, \rho _2] = \int _{{\mathbb {R}}^d}U(\rho _1, \rho _2) + \frac{1}{2} \int _{{\mathbb {R}}^d}\rho _1 W_1 * \rho _1 + \frac{1}{2} \int _{{\mathbb {R}}^d}\rho _2 W_2 * \rho _2 + \int _{{\mathbb {R}}^d}\rho _1 K*\rho _2. \end{aligned}$$

See, e.g., [34, 71, 91, 109] and the references therein.

8.5 Non-linear mobility

Several authors have studied the family of problems with so-called non-linear mobility \(\textrm{m}\) given by

$$\begin{aligned} \frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}\left( \textrm{m} (\rho ) \nabla \frac{\delta {{\mathcal {F}}}}{\delta \rho } \right) . \end{aligned}$$
(8.1)

This family of problems corresponds to a Wasserstein-type gradient flow, with a modified distance. The correct way to modify the Wasserstein distance is through an adapted Benamou–Brenier formula (recall (4.14)). The details can be found in [73, 111, 141]. It turns out this is only a valid distance if \(\textrm{m}\) is concave.

8.5.1 The Caffarelli–Vázquez problem with non-linear mobility

Some authors developed a theory for the (CVE) with non-linear mobility of the form \(\textrm{m} (\rho ) = \rho ^{m-1}\) with \(m > 0\). The motivation came as a fractional generalisation of (PME) with non-local pressure. A theory of existence and properties of weak solutions was developed in [161, 162]. Recently, it was shown that in \(d=1\) one can analyse a suitable equation for the mass, for which there is a monotone and convergent scheme [104]. In fact, this numerical method for the mass uses an up-winding for \(\rho \) as in (7.1). This group in Trondheim also proves convergence in the \(\rho \) variable in the so-called bounded Lipschitz distance (also called Fortet–Mourier distance), for which we point the reader to [177, page 97]. This distance is defined as an extension of the duality formula (4.13) as

$$\begin{aligned} d_{bL} (\mu , \nu ) {:}{=}\sup \left\{ \int \psi \mathop {}\!\textrm{d}\mu - \int \psi \mathop {}\!\textrm{d}\nu : \text {for } \psi \text { such that } \Vert \psi \Vert _{L^\infty } + \Vert \nabla \psi \Vert _{L^\infty } \le 1 \right\} , \end{aligned}$$

where we point out the addition of the bound of \(\psi \) in \(L^\infty \).

8.5.2 Newtonian vortex with non-linear mobility

Some authors became interested in the non-linear mobility version of (NVE) given by

$$\begin{aligned} \left\{ \begin{array}{l} \frac{\partial \rho }{\partial t} = {{\,\textrm{div}\,}}(\rho ^\alpha \nabla v ) \\ -\Delta v = \rho , \end{array} \right. \end{aligned}$$
(8.2)

The cases \(\alpha \in (0,1)\) (where \(\mathrm m\) is concave) are studied in [62], where the authors show that there exist nice self-similar solutions. The cases \(\alpha > 1\) are studied in [63], where the authors find that self-similar solutions do not have finite mass, and that in fact the attracting solution is a family of characteristic functions. The technique in [62, 63] is to find solutions by generalised characteristics of the mass equation.

8.5.3 Saturation

The problem (ADE) with \(\textrm{m}(\rho ) = \rho \psi (\rho )\) with \(\psi (1) = 0\) has recently gained the attention of some experts. This addition ensures that we can work with solutions such that \(0 \le \rho \le 1\) (in particular, no formation of \(\delta \)), and hence some authors use the term saturation to describe this choice.

The papers [110, 117] explore the case \(U = 0\). The case \(U \ne 0\) is, so far, only understood numerically. A finite-volume numerical scheme was constructed for this case [8]. Their numerical results suggest that steady-states will be of the form

$$\begin{aligned} {\hat{\rho }} = \min \{ 1, (U')^{-1}( h-V-W*{\hat{\rho }}) \}_+ \end{aligned}$$

for suitable h such that the total mass is preserved. However, there is currently no rigorous proof that this is the correct choice.