1 Introduction

The evolution of interacting particles and their equilibrium configurations has attracted the attention of many applied mathematicians and mathematical analysts for years. The continuum description of interacting particle systems usually leads to the analysis of the behavior of a mass density \(\rho (t,x)\) of individuals at a certain location \(x\in {\mathbb {R}}^d\) and time \(t\ge 0\). Most of the derived models result in nonlinear aggregation-diffusion partial differential equations obtained through different asymptotic or mean-field limits [14, 29, 75]. Equilibria result from the competition of two effects: the repulsion between individuals/particles is modeled through nonlinear diffusion terms, while their attraction is incorporated via nonlocal forces. This attractive nonlocal interaction takes into account that the presence of particles/individuals at a certain location \(y\in {\mathbb {R}}^d\) exerts a force on particles/individuals located at \(x\in {\mathbb {R}}^d\) proportional to \(-\nabla W(x-y)\), where the given interaction potential \(W:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) is assumed to be radially symmetric and increasing, consistent with attractive forces. The evolution of the mass density of particles/individuals is given by the nonlinear aggregation-diffusion equation of the form:

$$\begin{aligned} \partial _t \rho = \Delta \rho ^m + \nabla \cdot (\rho \nabla (W*\rho )) \quad x\in {\mathbb {R}}^d \,, t\ge 0\, \end{aligned}$$
(1.1)

with initial data \(\rho _0 \in L^1_+({\mathbb {R}}^d)\cap L^m({\mathbb {R}}^d)\). We will work with degenerate diffusions, \(m>1\), that appear naturally in modelling repulsion with very concentrated repelling nonlocal forces [14, 75], but also with linear and fast diffusion ranges \(0<m\le 1\), which are also classical in applications [59, 77]. These models are ubiquitous in mathematical biology where they have been used as macroscopic descriptions for collective behavior or swarming of animal species, see [15, 20, 69,70,71, 84] for instance, or more classically in chemotaxis-type models, see [11, 13, 26, 53, 54, 59, 77] and the references therein.

On the other hand, this family of PDEs is a particular example of nonlinear gradient flows in the sense of optimal transport between mass densities, see [2, 33, 34]. The main implication for us is that there is a natural Lyapunov functional for the evolution of (1.1) defined on the set of centered mass densities \(\rho \in L^1_+({\mathbb {R}}^d)\cap L^m({\mathbb {R}}^d)\) given by

$$\begin{aligned}&\mathcal {E} [\rho ] = \frac{1}{m-1} \int _{{\mathbb {R}}^d} \rho ^m(x) \, dx + \frac{1}{2} \int _{{\mathbb {R}}^d\times {\mathbb {R}}^d} \rho (x) W(x-y) \rho (y)\, dx\,dy \nonumber \\&\rho (x)\ge 0\, , \quad \int _{{\mathbb {R}}^d}\rho (x)\, dx = M\ge 0\, , \quad \int _{{\mathbb {R}}^d} x\rho (x)\, dx = 0\, , \end{aligned}$$
(1.2)

where the last integral is defined in the improper sense, and if \(m=1\) we replace the first integral of \(\mathcal {E} [\rho ]\) by \( \int _{{\mathbb {R}}^d} \rho \log \rho \, dx\). Therefore, when repulsion and attraction balance each other, these two effects should determine the stationary states of (1.1), including the stable solutions possibly given by local (global) minimizers of the free energy functional (1.2).
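For readers who prefer a computational illustration of (1.2), the following minimal sketch evaluates the free energy of a discretized density; the choices \(d=1\), \(m=2\), \(W(x)=|x|^2/2\) and the Gaussian profile are arbitrary assumptions made only for this example.

```python
import numpy as np

# Minimal sketch: evaluate the free energy (1.2) for a density on a 1-d grid.
# Illustrative assumptions: d = 1, m = 2, W(x) = |x|^2/2, Gaussian profile.
x = np.linspace(-6, 6, 601); dx = x[1] - x[0]
rho = np.exp(-x**2 / 2); rho /= rho.sum() * dx     # unit mass, zero center of mass
m = 2.0
W = lambda z: 0.5 * z**2                           # radially symmetric, increasing

diffusion_part = (rho**m).sum() * dx / (m - 1)
X, Y = np.meshgrid(x, x, indexing="ij")
interaction_part = 0.5 * (rho[:, None] * W(X - Y) * rho[None, :]).sum() * dx**2
print("E[rho] =", diffusion_part + interaction_part)
```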

Many properties and results have been obtained in the particular case of the attractive Newtonian potential due to its applications in the mathematical modeling of chemotaxis [59, 77] and of gravitational collapse [78]. In the classical 2D Keller–Segel model with linear diffusion, it is known that equilibria can only happen in the critical mass case [10], while self-similar solutions give the long time asymptotics for subcritical masses [13, 22]. For supercritical masses, all solutions blow up in finite time [54]. It was shown in [23, 63] that degenerate diffusion with \(m>1\) is able to regularize the classical 2D Keller–Segel problem: solutions exist globally in time regardless of their mass, and each solution remains uniformly bounded in time. For the Newtonian attractive interaction in dimension \(d\ge 3\), the authors in [9] show that the degeneracy of the diffusion for which the mass is the critical quantity in the dichotomy between global existence and finite time blow-up is given by \(m=2-2/d\). In fact, based on scaling arguments it is easy to argue that for \(m>2-2/d\) the diffusion term dominates when the density becomes large, leading to global existence of solutions for all masses. This result was shown in [80] together with a uniform-in-time global bound of solutions.

However, in all cases where the diffusion dominates over the aggregation, the long time asymptotics of solutions to (1.1) have not been clarified, as pointed out in [8]. Are there stationary solutions for all masses when the diffusion term dominates? And if so, are they unique up to translations? Do they determine the long time asymptotics for (1.1)? Only partial answers to these questions are present in the literature, which we summarize below.

To show the existence of stationary solutions to (1.1), a natural idea is to look for the global minimizer of its associated free energy functional (1.2). For the 3D case with Newtonian interaction potential and \(m>4/3\), Lions’ concentration-compactness principle [67] gives the existence of a global minimizer of (1.2) for any given mass. The argument can be extended to kernels that are no more singular than the Newtonian potential in \({\mathbb {R}}^d\) at the origin and have slow decay at infinity. The existence result is further generalized in [5] to a broader class of kernels, which can have faster decay at infinity. In all the above cases, the global minimizer of (1.2) corresponds to a stationary solution to (1.1) in the sense of distributions. In addition, the global minimizer must be radially decreasing due to Riesz’s rearrangement theorem.

Regarding the uniqueness of stationary solutions to (1.1), most of the available results are for Newtonian interaction. For the 3D Newtonian potential with \(m>4/3\) and any given mass, the authors in [65] prove uniqueness of stationary solutions to (1.1) among radial functions, and their method can be generalized to the Newtonian potential in \({\mathbb {R}}^d\) with \(m>2-2/d\). For the 3D case with \(m>4/3\), the authors in [79] show that all compactly supported stationary solutions must be radial up to a translation, hence obtaining uniqueness of stationary solutions among compactly supported functions. The proof is based on moving plane techniques, where the compact support of the stationary solution seems crucial, and it also relies on the fact that the Newtonian potential in 3D converges to zero at infinity. Similar results are obtained in [28] for the 2D Newtonian potential with \(m>1\) using an adapted moving plane technique. Again, the uniqueness result is based on showing radial symmetry of compactly supported stationary solutions. Finally, we mention that uniqueness of stationary states has been proved for general attracting kernels in one dimension in the case \(m=2\), see [21]. Even for the Newtonian potential, we are not aware of any result showing that all stationary solutions are radial (up to a translation).

Previous results show the limitations of the present theory: although the existence of stationary states for all masses is obtained for quite general potentials, their uniqueness, crucial for identifying the long time asymptotics, is only known in very particular cases of diffusion dominated problems. The available uniqueness results are not very satisfactory due to the restriction to compactly supported functions in the uniqueness class imposed by the moving plane techniques. Thus, large time asymptotics results are not at all available: there are no uniform-in-time mass confinement results of any kind, and the restriction on the uniqueness class for stationary solutions makes it difficult to identify the long time limits of sequences of solutions.

If one wants to show that the long time asymptotics are uniquely determined by the initial mass and center of mass, a clear strategy used in many other nonlinear diffusion problems, see [87] and the references therein, is the following: one first needs to prove that all stationary solutions are radial up to a translation within a non-restrictive class of stationary solutions, then one has to show uniqueness of stationary solutions among radial solutions, and finally this uniqueness allows one to identify the limits of time diverging sequences of solutions, provided compactness of these sequences is shown in a suitable functional framework. Let us point out that comparison arguments used in standard porous medium equations are out of the question here due to the lack of a maximum principle caused by the presence of the nonlocal term.

In this work, we will give the first full result of long time asymptotics for a diffusion dominated problem using the previous strategy without smallness assumptions of any kind. More precisely, we will prove that all solutions to the 2D Keller–Segel equation with \(m>1\) converge to the global minimizer of its free energy. The first step will be to show radial symmetry of stationary solutions to (1.1) under quite general assumptions on W and on the class of stationary solutions. Let us point out that standard rearrangement techniques fail in trying to show radial symmetry of general stationary states of (1.1); they are only useful for showing radial symmetry of global minimizers, see [28]. Comparison arguments for radial solutions allow one to prove uniqueness of radial stationary solutions in particular cases [61, 65]. However, to our knowledge, there is no general result in the literature about radial symmetry of stationary solutions to nonlocal aggregation-diffusion equations.

Our first main result is that all stationary solutions of (1.1), with no restriction on \(m>0\), are radially decreasing up to a translation, obtained through a fully novel application of continuous Steiner symmetrization techniques to the problem (1.1). Continuous Steiner symmetrization has been used in the calculus of variations [18] as a replacement for rearrangement inequalities [16, 64, 72], but its application to nonlinear nonlocal aggregation-diffusion PDEs is completely new. Most of the results in the literature using continuous Steiner symmetrization deal with functionals of first order, i.e. functionals involving a power of the modulus of the gradient of the unknown, see [19, Corollary 7.3] for an application to p-Laplacian stationary equations, as well as [58, Section II] and [18, 57], while in our case the functional (1.2) is purely of zeroth order. The decay of the attractive Newtonian potential interaction term in \(d\ge 3\) follows from [18, Corollary 2] and [72], which is the only result related to our strategy.

We will construct a curve of measures starting from a stationary state \(\rho \) using continuous Steiner symmetrization such that the functional (1.2) decays strictly at first order along that curve unless the base point \(\rho \) is radially symmetric, see Proposition 2.15. However, the functional (1.2) has at most quadratic variation when \(\rho \) is a stationary state, since the first-order term in the Taylor expansion vanishes. This leads to a contradiction unless the stationary state is radially symmetric. The construction of this curve requires a non-classical technique of slowing down the velocities of the level sets in the continuous Steiner symmetrization in order to cope with the possible compact support of stationary states in the degenerate case \(m>1\), see Proposition 2.8. This first main result is the content of Sect. 2, in which we specify the assumptions on the interaction potential and the notion of stationary solutions in detail. We point out that the variational structure of (1.1) is crucial to show the radially decreasing property of stationary solutions.

The result of radial symmetry for general stationary solutions to (1.1) is quite striking in comparison to other gradient flow models in collective behavior based on the competition of attractive and repulsive effects via nonlocal interaction potentials. Indeed, there is numerical and analytical evidence in [4, 7, 62] that such fully nonlocal interaction models admit stationary solutions which are not radially symmetric despite the radial symmetry of the interaction potential. Our first main result shows that this symmetry breaking does not happen whenever nonlinear diffusion is chosen to model very strong localized repulsion forces, see [84]. Symmetry breaking in nonlinear diffusion equations without interactions has also received a lot of attention lately in connection with the Caffarelli–Kohn–Nirenberg inequalities, see [45, 46]. Another consequence of our radial symmetry results is the absence of non-radial local minimizers, and even of non-radial critical points, of the free energy functional (1.2), which is not at all obvious.

We also generalize our radial symmetry result to the case where (1.1) has an additional term \(\nabla \cdot (\rho \nabla V)\) on the right-hand side, where V is a confining potential (see Sect. 2.5 for precise conditions on V), in the sense that it prevents particles from drifting away in the presence of the diffusion. It is known that with this extra term the corresponding energy functional gains the additional contribution \(\int V(x) \rho (x)\, dx\). The particular case of quadratic confinement \(V(x)=\tfrac{|x|^2}{2}\) is important since it leads to the free energy functional associated to (1.1) with homogeneous kernels in self-similar variables [24, 25, 36], thus characterizing the self-similar profiles of those problems.

Finally, let us remark that our radial symmetry result applies to stationary states of (1.1) for any \(m>0\), regardless of being in the diffusion dominated regime or not. As soon as stationary states of (1.1) exist under suitable assumptions on the interaction potential W, and on the confining potential V if present, they must be radially symmetric up to a translation. This fact makes our result applicable to the fair-competition cases [10,11,12] and the aggregation-dominated cases, see [39, 40, 68], with degenerate, linear or fast diffusion. Section 2.4 is finally devoted to the most restrictive case of \(\lambda \)-convex potentials and the Newtonian potential with \(m\ge 1-\tfrac{1}{d}\). In these cases, we can directly make use of the key first-order decay result of the interaction energy along continuous Steiner symmetrization curves in Proposition 2.15, bypassing the technical result in Proposition 2.8, in order to give a shortcut of the proof of our main Theorem 2.2 based on gradient flow techniques.

We next study more properties of particular radially decreasing stationary solutions. We make use of the variational structure to show the existence of global minimizers to (1.2) under very general hypotheses on the interaction potential W and \(m>1\). In Sect. 3, we show that these global minimizers are in fact radially decreasing continuous functions, compactly supported if \(m>1\). These results fully generalize the results in [28, 79]. Putting together Sects. 2 and 3, the uniqueness and full characterization of the stationary states is reduced to uniqueness among the class of radial solutions. This result is known in the case of Newtonian attraction kernels [65].

Finally, we make use of the uniqueness among translations for any given mass of stationary solutions to (1.1) to obtain the second main result of this work, namely to answer the open problem of the long time asymptotics to (1.1) with Newtonian interaction in 2D and \(m>1\). This is accomplished in Sect. 4 by a compactness argument, for which one has to extract the corresponding uniform-in-time bounds and carefully treat the nonlinear terms and the dissipation while taking the limit \(t\rightarrow \infty \). We do not know how to obtain a similar result for Newtonian interaction in \(d\ge 3\) due to the lack of uniform in time mass confinement bounds in this case. We essentially cannot show that mass does not escape to infinity while taking the limit \(t\rightarrow \infty \). However, the compactness and characterization of stationary solutions is still valid in that case.

The present work opens new perspectives to show radial symmetry for stationary solutions to nonlocal aggregation-diffusion problems. While the hypotheses of our result ensuring existence of global radially symmetric minimizers of (1.2), and in turn of stationary solutions to (1.1), are quite general, we do not know yet whether there is uniqueness among radially symmetric stationary solutions (with a fixed mass) for general non-Newtonian kernels. We do not even have uniqueness results for radial minimizers beyond Newtonian kernels. Understanding whether functionals of the form (1.2) with radial interaction potential can admit radially symmetric local minimizers that are not global is thus a challenging question. Concerning the long-time asymptotics of (1.1), the lack of a novel approach to find confinement of mass beyond the usual virial techniques and comparison arguments in radial coordinates hinders progress in their understanding even for Newtonian kernels with \(d\ge 3\). Last but not least, our results open a window to obtain rates of convergence towards the unique equilibrium up to translation for the Newtonian kernel in 2D. The lack of general convexity of this variational problem could be compensated by recent results in a restricted class of functions, see [32]. However, the problem is quite challenging due to the presence of free boundaries in the evolution of compactly supported solutions to (1.1), which rules out direct linearization techniques as in the linear diffusion case [22].

2 Radial symmetry of stationary states with degenerate diffusion

Throughout this section, we assume that \(m>0\), and \(W\) satisfies the following four assumptions:

  1. (K1)

    \(W\) is attracting, i.e., \(W(x) \in C^1({\mathbb {R}}^d \setminus \{0\})\) is radially symmetric

    $$\begin{aligned} W(x)=\omega (|x|)=\omega (r) \end{aligned}$$

    and \(\omega '(r)>0\) for all \(r>0\) with \(\omega (1)=0\).

  2. (K2)

    \(W\) is no more singular than the Newtonian kernel in \({\mathbb {R}}^d\) at the origin, i.e., there exists some \(C_w>0\) such that \(\omega '(r) \le C_w r^{1-d}\) for \(r\le 1\).

  3. (K3)

    There exists some \(C_w>0\) such that \(\omega '(r) \le C_w\) for all \(r>1\).

  4. (K4)

    Either \(\omega (r)\) is bounded for \(r\ge 1\) or there exists \(C_w>0\) such that for all \(a,b\ge 0\):

    $$\begin{aligned} \omega _+(a+b)\le C_w (1+\omega (1+a)+\omega (1+b))\,. \end{aligned}$$

As usual, \(\omega _\pm \) denote the positive and negative parts of \(\omega \), so that \(\omega =\omega _+-\omega _-\). In particular, if \(W=-{\mathcal N}\), modulo the addition of a constant, is the attractive Newtonian potential, where \({\mathcal N}\) is the fundamental solution of the operator \(-\Delta \) in \({\mathbb {R}}^d\), then \(W\) satisfies all the assumptions. Since Eq. (1.1) does not change by adding a constant to the potential W, we will consider from now on that the potential W is defined modulo additive constants.

We denote by \(L^{1}_{+}({\mathbb {R}}^{d})\) the set of all non-negative functions in \(L^{1}({\mathbb {R}}^{d})\). Let us start by defining precisely stationary states of the aggregation-diffusion equation (1.1) with a potential satisfying (K1)–(K4).

Definition 2.1

Given \(\rho _s \in L^1_+({\mathbb {R}}^d)\cap L^\infty ({\mathbb {R}}^d)\) we call it a stationary state for the evolution problem (1.1) if \(\rho _s^{m}\in H^1_{loc} ({\mathbb {R}}^d)\), \(\nabla \psi _s:=\nabla W *\rho _s\in L^1_{loc} ({\mathbb {R}}^d)\), and it satisfies

$$\begin{aligned} \nabla \rho _s^{m} = - \rho _s\nabla \psi _s \text { in } {{\mathbb {R}}}^d \end{aligned}$$
(2.1)

in the sense of distributions in \({\mathbb {R}}^d\).

Let us first note that \(\nabla \psi _s\) is globally bounded under the assumptions (K1)–(K3). To see this, a direct decomposition in near- and far-field sets yields

$$\begin{aligned} |\nabla \psi _s(x)|&\le \int _{{\mathbb {R}}^d} | \nabla W(x-y) | \rho _s(y) \,dy \le C_w \int _{\mathcal {A}}\frac{1}{|x-y|^{d-1}} \rho _s(y) \,dy\nonumber \\&\quad + C_w \int _{\mathcal {B}} \rho _s(y) \,dy \nonumber \\&\le C_w\int _{\mathcal {A}} \frac{1}{|x-y|^{d-1}} dy \, \Vert \rho _s\Vert _{L^\infty ({\mathbb {R}}^d)}+C_w \Vert \rho _s\Vert _{L^1({\mathbb {R}}^d)}\nonumber \\&\le C (\Vert \rho _s\Vert _{L^1({\mathbb {R}}^d)}+\Vert \rho _s\Vert _{L^\infty ({\mathbb {R}}^d)})\,. \end{aligned}$$
(2.2)

where we split the domain of integration into the sets \(\mathcal {A} := \{ y : |x - y| \le 1 \}\) and \(\mathcal {B} := {\mathbb {R}}^d \setminus \mathcal {A}\), and apply the assumptions (K1)–(K3).

Under the additional assumptions (K4) and \(\omega (1+|x|)\rho _s \in L^1({\mathbb {R}}^d)\), we will show that the potential function \(\psi _s(x) = W*\rho _s(x)\) is also locally bounded. First, note that (K1)–(K3) ensure that \(|\omega (r)| \le {\tilde{C}}_w \phi (r)\) for all \(r\le 1\) with some \({\tilde{C}}_w>0\), where

$$\begin{aligned} \phi (r):=\left\{ \begin{array}{lr}r^{2-d}-1 &{} \quad \text{ if } d\ge 3\\ -\log (r) &{} \quad \text{ if } d=2 \\ 1-r &{} \quad \text{ if } d=1 \end{array}\right. . \end{aligned}$$
(2.3)

Hence we can again perform a decomposition in near- and far-field sets and obtain

$$\begin{aligned} |\psi _s(x)|&\le \int _{{\mathbb {R}}^d} |W(x-y) | \rho _s(y) \,dy \le C_w \int _{\mathcal {A}}\phi (|x-y|) \rho _s(y) \,dy\nonumber \\&\quad + \int _{\mathcal {B}} \omega _+(|x|+|y|) \rho _s(y) \,dy \nonumber \\&\le C_w \int _{\mathcal {A}} \phi (|x-y|) dy \, \Vert \rho _s\Vert _{L^\infty ({\mathbb {R}}^d)}+C_w(1+\omega (1+|x|))\Vert \rho _s\Vert _{L^1({\mathbb {R}}^d)}\nonumber \\&\quad + C_w \Vert \omega (1+|x|)\rho _s\Vert _{L^1({\mathbb {R}}^d)} \nonumber \\&\le C (\Vert \rho _s\Vert _{L^1({\mathbb {R}}^d)}+\Vert \rho _s\Vert _{L^\infty ({\mathbb {R}}^d)})+\omega (1+|x|)\Vert \rho _s\Vert _{L^1({\mathbb {R}}^d)}\nonumber \\&\quad + C_w \Vert \omega (1+|x|)\rho _s\Vert _{L^1({\mathbb {R}}^d)}\,. \end{aligned}$$
(2.4)

Our main goal in this section is the following theorem.

Theorem 2.2

Assume that W satisfies (K1)–(K4) and \(m>0\). Let \(\rho _s \in L^1_+({\mathbb {R}}^d)\cap L^\infty ({\mathbb {R}}^d)\) with \(\omega (1+|x|)\rho _s \in L^1({\mathbb {R}}^d)\) be a non-negative stationary state of (1.1) in the sense of Definition 2.1. Then \(\rho _s\) must be radially decreasing up to a translation, i.e. there exists some \(x_0\in {\mathbb {R}}^d\), such that \(\rho _s(\cdot - x_0)\) is radially symmetric, and \(\rho _s(|x-x_0|)\) is non-increasing in \(|x-x_0|\).

Before going into the details of the proof, we briefly outline the strategy here. Assume there is a stationary state \(\rho _s\) which is not radially decreasing under any translation. To obtain a contradiction, we consider the free energy functional \(\mathcal {E}[\rho ]\) associated with (1.1),

$$\begin{aligned} \mathcal {E}[\rho ] = \frac{1}{m-1} \int _{{\mathbb {R}}^d} \rho ^m dx + \frac{1}{2} \int _{{\mathbb {R}}^d} \rho (W*\rho ) dx =: \mathcal {S}[\rho ] + \mathcal {I}[\rho ], \end{aligned}$$
(2.5)

where \(\mathcal {S}[\rho ]\) is replaced by \(\int \rho \log \rho \,dx\) if \(m=1\). We first observe that \(\mathcal {I}[\rho _s]\) is finite since the potential function \(\psi _s=W*\rho _s \in {\mathcal W}^{1,\infty }_{loc}({\mathbb {R}}^d)\) satisfies (2.4) with \(\omega (1+|x|)\rho _s \in L^1({\mathbb {R}}^d)\). Since \(\rho _s \in L^1_+({\mathbb {R}}^d)\cap L^\infty ({\mathbb {R}}^d)\), \(\mathcal {S}[\rho _s]\) is finite for all \(m>1\), but may be \(-\infty \) if \(m\in (0,1]\).

Below we discuss the strategy for \(m>1\) first, and point out the modification for \(m\in (0,1]\) in the next paragraph. Using the assumption that \(\rho _s\) is not radially decreasing under any translation, we will apply the continuous Steiner symmetrization to perturb around \(\rho _s\) and construct a continuous family of densities \(\mu (\tau , \cdot )\) with \(\mu (0,\cdot )=\rho _s\), such that \(\mathcal {E}[\mu (\tau )] - \mathcal {E}[\rho _s] < -c\tau \) for some \(c>0\) and any small \(\tau >0\). On the other hand, using that \(\rho _s\) is a stationary state, we will show that \(|\mathcal {E}[\mu (\tau )] - \mathcal {E}[\rho _s]| \le C\tau ^2\) for some \(C>0\) and any small \(\tau >0\). Combining these two inequalities together gives us a contradiction for sufficiently small \(\tau >0\).

For \(m\in (0,1)\), even though \(\mathcal {S}[\rho _s]\) might be \(-\infty \) by itself, the difference \(\mathcal {S}[\mu (\tau )] - \mathcal {S}[\rho _s] \) can still be well defined in the following sense, if we regularize the function \(\frac{1}{m-1}\rho ^{m}\) by \(\frac{1}{m-1}\rho (\rho +\epsilon )^{m-1}\) and take the limit \(\epsilon \rightarrow 0\):

$$\begin{aligned}&\mathcal {S}[\mu (\tau )] - \mathcal {S}[\rho _s]\nonumber \\&\quad := \lim _{\epsilon \rightarrow 0} \int \frac{1}{m-1}\Big (\mu (\tau ,x) (\mu (\tau ,x)+ \epsilon )^{m-1} - \rho _s(x) (\rho _s(x)+ \epsilon )^{m-1} \Big ) dx,\nonumber \\ \end{aligned}$$
(2.6)

and if \(m=1\) the integrand is replaced by \(\mu (\tau ,\cdot ) \log (\mu (\tau ,\cdot )+\epsilon ) - \rho _s \log (\rho _s +\epsilon )\). Note that as long as \(\mu (\tau )\) has the same distribution as \(\rho _s\), the above definition gives \(\mathcal {S}[\mu (\tau )] - \mathcal {S}[\rho _s] =0\). With this modification, we will show that the difference \(\mathcal {E}[\mu (\tau )] - \mathcal {E}[\rho _s]\) is well-defined and satisfies the same two inequalities as in the \(m>1\) case, so we again have a contradiction for small \(\tau >0\).

If the kernel W has certain convexity properties and \(m\ge 1-\tfrac{1}{d}\), then it is known that (1.1) has a rigorous Wasserstein gradient flow structure. In this case, once we obtain the crucial estimate: \(\mathcal {E}[\mu (\tau )] -\mathcal {E}[\rho _s] <-c\tau \), there is a shortcut that directly leads to the radial symmetry result, which we will discuss in Sect. 2.4.

Let us first characterize the set of possible stationary states of (1.1) in the sense of Definition 2.1 and their regularity. Parts of these arguments are reminiscent of those in [28, 79] for the case of attractive Newtonian potentials.

Lemma 2.3

Let \(\rho _s \in L^1_+({\mathbb {R}}^d)\cap L^\infty ({\mathbb {R}}^d)\) with \(\omega (1+|x|)\rho _s \in L^1({\mathbb {R}}^d)\) be a non-negative stationary state of (1.1) for some \(m>0\) in the sense of Definition 2.1. Then \(\rho _s \in \mathcal {C}({\mathbb {R}}^d)\), and there exists some \(C = C(\Vert \rho _s\Vert _{L^1}, \Vert \rho _s\Vert _{L^\infty }, C_w, d)>0\), such that

$$\begin{aligned} \frac{m}{|m-1|}|\nabla (\rho _s^{m-1})| \le C \quad \text { in }\mathrm {supp}\,\rho _s \quad \text { if } m\ne 1, \end{aligned}$$
(2.7)

and

$$\begin{aligned} |\nabla \log \rho _s| \le C \quad \text { in }\mathrm {supp}\,\rho _s \quad \text { if } m= 1. \end{aligned}$$
(2.8)

In addition, if \(m \in (0,1]\), then \(\mathrm {supp}\,\rho _s = {\mathbb {R}}^d\).

Proof

We have already checked that, under these assumptions on W and \(\rho _s\), the potential function \(\psi _s\) belongs to \({\mathcal W}^{1,\infty }_{loc}({\mathbb {R}}^d)\) due to (2.2)–(2.4). Since \(\rho _s^{m}\in H^1_{loc} ({\mathbb {R}}^d)\), \(\rho _s^m\) is a weak \(H^1_{loc}({\mathbb {R}}^{d})\) solution of

$$\begin{aligned} \Delta \rho _s^{m} = -\nabla \cdot \left( \rho _s\nabla \psi _s\right) \text { in } {\mathbb {R}}^d \end{aligned}$$
(2.9)

with right hand side belonging to \({\mathcal W}^{-1,p}_{loc} ({\mathbb {R}}^d)\) for all \(1\le p \le \infty \). As a consequence, \(\rho _s^m\) is in fact a weak solution of (2.9) in \({\mathcal W}_{loc}^{1,p}({\mathbb {R}}^d)\) for all \(1<p<\infty \) by classical elliptic regularity results. Sobolev embedding shows that \(\rho _s^m\) belongs to some Hölder space \(\mathcal {C}_{loc}^{0,\alpha }({\mathbb {R}}^d)\), and thus \(\rho _s\in \mathcal {C}_{loc}^{0,\beta }({\mathbb {R}}^d)\) with \(\beta := \min \{\alpha /m, 1\}\). Let us define the set \(\Omega =\{x\in {\mathbb {R}}^d : \rho _s(x)>0\}\). Since \(\rho _s\in \mathcal {C}({\mathbb {R}}^d)\), \(\Omega \) is an open set, and it consists of a countable number of open, possibly unbounded, connected components. Let us take any bounded smooth connected open subset \(\Theta \) such that \(\overline{ \Theta } \subset \Omega \), and start with the case \(m\ne 1\). Since \(\rho _s\in \mathcal {C}({\mathbb {R}}^d)\), \(\rho _s\) is bounded away from zero in \(\Theta \), and thus, due to the assumptions on \(\rho _s\), the identity \(\frac{m}{m-1}\nabla \rho _s^{m-1} = \frac{1}{\rho _s} \nabla \rho _{s}^{m}\) holds in the distributional sense in \(\Theta \). We conclude that wherever \(\rho _s\) is positive, (2.1) can be interpreted as

$$\begin{aligned} \nabla \left( \frac{m}{m-1} \rho _s^{m-1} +\psi _s \right) = 0\,, \end{aligned}$$
(2.10)

in the sense of distributions in \(\Omega \). Hence, the function \(G(x)=\frac{m}{m-1}\rho _s^{m-1}(x) +\psi _s (x)\) is constant in each connected component of \(\Omega \). From here, we deduce that any stationary state of (1.1) in the sense of Definition 2.1 is given by

$$\begin{aligned} \rho _s(x)= \left( \frac{m-1}{m}(G-\psi _s)(x)\right) _+^{\tfrac{1}{m-1}}\,, \end{aligned}$$
(2.11)

where G is constant on each connected component of the support of \(\rho _s\), possibly with a different value on each component. Since \(\psi _s\in {\mathcal W}^{1,\infty }_{loc}({\mathbb {R}}^d)\), we deduce that \(\rho _s \in \mathcal {C}_{loc}^{0,1/(m-1)}({\mathbb {R}}^d)\) if \(m\ge 2\) and \(\rho _s \in \mathcal {C}_{loc}^{0,1}({\mathbb {R}}^d)\) for \(m \in (0,1)\cup (1,2)\). Putting together (2.11) and (2.2), we conclude the desired estimate.

In addition, from (2.11) we have that \(\Omega = {\mathbb {R}}^d\) if \(m \in (0,1)\): if not, let \(\Omega _0\) be any connected component of \(\Omega \), and take \(x_0 \in \partial \Omega _0\). As we take a sequence of points \(x_n \rightarrow x_0\) with \(x_n \in \Omega _0\), we have that \(\rho _s(x_n)^{m-1}\rightarrow \infty \), whereas the sequence \(G(x_n) - \psi _s(x_n)\) is bounded [since \(\psi _s\) is locally bounded due to (2.4)], a contradiction.

If \(m=1\), the above argument still goes through except that we replace (2.10) by

$$\begin{aligned} \nabla \left( \log \rho _s +\psi _s \right) = 0 \end{aligned}$$

in the sense of distributions in \(\Omega \). As a result, the function \(G(x)=\log \rho _s(x) +\psi _s (x)\) is constant in each connected component of \(\Omega \). The same argument as in the \(m\in (0,1)\) case then yields that \(\rho _s \in \mathcal {C}_{loc}^{0,1}({\mathbb {R}}^d)\) and \(\Omega = {\mathbb {R}}^d\), leading to the estimate \(|\nabla \log \rho _s| \le C\) in \({\mathbb {R}}^d\). \(\square \)
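As a sanity check of the identity (2.11), the following small symbolic computation verifies that a profile of this form solves (2.1) wherever it is positive; the choices of one space dimension, \(m=3\), a generic smooth potential \(\psi \) and a constant G are illustrative assumptions only.

```python
import sympy as sp

# Symbolic check (illustrative assumptions: one space dimension, m = 3,
# generic smooth psi, constant G) that the profile (2.11) satisfies (2.1)
# on the set where it is positive.
x, G = sp.symbols('x G')
m = sp.Integer(3)
psi = sp.Function('psi')(x)
rho = ((m - 1) / m * (G - psi))**(1 / (m - 1))   # formula (2.11) on {rho > 0}
lhs = sp.diff(rho**m, x)                         # grad rho^m
rhs = -rho * sp.diff(psi, x)                     # -rho grad psi
print(sp.simplify(lhs - rhs))                    # prints 0
```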

2.1 Some preliminaries about rearrangements

Now we briefly recall some standard notions and basic properties of decreasing rearrangements for non-negative functions that will be used later. For a deeper treatment of these topics, we refer the reader to the books [6, 51, 56, 60, 64] or to the papers [73, 81,82,83]. We denote by \(|E|_{d}\) the Lebesgue measure of a measurable set E in \({\mathbb {R}}^{d}\). Moreover, the set \(E^{\#}\) is defined as the ball centered at the origin such that \(|E^{\#}|_{d}=|E|_{d}\).

A non-negative measurable function f defined on \({\mathbb {R}}^{d}\) is called radially symmetric if there is a non-negative function \(\widetilde{f}\) on \([0,\infty )\) such that \(f(x)=\widetilde{f}(|x|)\) for all \(x\in {\mathbb {R}}^{d}\). If f is radially symmetric, we will often write \(f(x)=f(r)\) for \(r=|x|\ge 0\) by a slight abuse of notation. We say that f is rearranged if it is radial and \(\widetilde{f}\) is a non-negative right-continuous, non-increasing function of \(r>0\). A similar definition can be applied for real functions defined on a ball \(B_{R}(0)=\left\{ x\in {\mathbb {R}}^d:|x|<R\right\} \).

We define the distribution function of \(f\in L^{1}_{+}({\mathbb {R}}^{d})\) by

$$\begin{aligned} \zeta _{f}(\tau )=|\left\{ x\in {\mathbb {R}}^{d}:f(x)>\tau \right\} |_{d},\quad \text { for all }\tau >0. \end{aligned}$$

Then the function \(f^{*}:[0,+\infty )\rightarrow [0,+\infty ]\) defined by

$$\begin{aligned} f^{*}(s)=\sup \left\{ \tau>0:\zeta _{f}(\tau )>s \right\} ,\quad s\in [0,+\infty ), \end{aligned}$$

will be called the Hardy–Littlewood one-dimensional decreasing rearrangement of f. By this definition, one could interpret \(f^{*}\) as the generalized right-inverse function of \(\zeta _{f}(\tau )\).

Making use of the definition of \(f^{*}\), we can define a special radially symmetric decreasing function \(f^{\#}\), which we will call the Schwarz spherical decreasing rearrangement of f by means of the formula

$$\begin{aligned} f^{\#}(x)=f^{*}(\omega _{d}|x|^{d}) \quad x\in {\mathbb {R}}^{d}, \end{aligned}$$
(2.12)

where \(\omega _d\) is the volume of the unit ball in \({\mathbb {R}}^d\). It is clear that if the positivity set \(\Omega _{f}=\left\{ x\in {\mathbb {R}}^{d}:f(x)>0\right\} \) of f has finite measure, then \(f^{\#}\) is supported in the ball \(\Omega _{f}^{\#}\).

One can show that \(f^{*}\) (and so \(f^{\#}\)) is equidistributed with f (i.e. they have the same distribution function). Thus if \(f\in L^{p}({\mathbb {R}}^d)\), a simple use of Cavalieri’s principle (see e.g. [60, 82]) leads to the invariance property of the \(L^{p}\) norms:

$$\begin{aligned} \Vert f\Vert _{L^{p}({\mathbb {R}}^{d})} = \Vert f^*\Vert _{L^{p}(0,\infty )}=\Vert f^\#\Vert _{L^{p}({\mathbb {R}}^{d})} \qquad \text{ for } \text{ all } 1\le p \le \infty \,.\qquad \end{aligned}$$
(2.13)

In particular, using the layer-cake representation formula (see e.g. [64]) one could easily infer that

$$\begin{aligned} f^\#(x) = \int _0^\infty \chi _{\{f>\tau \}^\#} d\tau . \end{aligned}$$

Among the many interesting properties of rearrangements, it is worth mentioning the Hardy–Littlewood inequality (see [6, 51, 60] for the proof): for any pair of non-negative measurable functions \(f,\,g\) on \({\mathbb {R}}^{d}\), we have

$$\begin{aligned} \int _{{\mathbb {R}}^d} f(x)g(x)dx\le \int _{{\mathbb {R}}^{d}} f^{\#}(x)g^{\#}(x)dx. \end{aligned}$$
(2.14)
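The following short numerical sketch illustrates (2.13) and (2.14) in one dimension; the test functions and the uniform grid are arbitrary assumptions, and on such a grid \(f^{*}\) simply amounts to sorting the sampled values in decreasing order.

```python
import numpy as np

# Sketch of the decreasing rearrangement f^* on a grid and a check of (2.13)
# and of the Hardy-Littlewood inequality (2.14) in d = 1 (arbitrary test data).
x = np.linspace(-4, 4, 801); dx = x[1] - x[0]
f = np.maximum(0.0, np.sin(x) + 0.3 * x)            # non-negative test functions
g = np.exp(-(x - 1.0)**2)

fstar, gstar = np.sort(f)[::-1], np.sort(g)[::-1]   # discrete f^*, g^*

# equidistribution: the L^p norms are unchanged, cf. (2.13)
print(np.isclose((f**3).sum() * dx, (fstar**3).sum() * dx))
# Hardy-Littlewood: int f g <= int f^# g^# = int_0^infty f^* g^* ds, cf. (2.14)
print((f * g).sum() * dx <= (fstar * gstar).sum() * dx + 1e-12)
```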

Since in Sect. 4 we will use estimates of the solutions of Keller–Segel problems in terms of their integrals, let us now recall the concept of comparison of mass concentration, taken from [85], which is remarkably useful.

Definition 2.4

Let \(f,g\in L^{1}_{loc}({\mathbb {R}}^{d})\) be two non-negative, radially symmetric functions on \({\mathbb {R}}^{d}\). We say that f is less concentrated than g, and we write \(f\prec g\), if for all \(R>0\) we have

$$\begin{aligned} \int _{B_{R}(0)}f(x)dx\le \int _{B_{R}(0)}g(x)dx. \end{aligned}$$

The partial order relationship \(\prec \) is called comparison of mass concentrations. Of course, this definition can be suitably adapted if \(f, g\) are radially symmetric and locally integrable functions on a ball \(B_{R}\). The comparison of mass concentrations enjoys a nice equivalent formulation if f and g are rearranged, for whose proof we refer to [1, 41, 86]:

Lemma 2.5

Let \(f,g\in L^{1}_{+}({\mathbb {R}}^{d})\) be two non-negative rearranged functions. Then \(f\prec g\) if and only if for every convex nondecreasing function \(\Phi :[0,\infty )\rightarrow [0,\infty )\) with \(\Phi (0)=0\) we have

$$\begin{aligned} \int _{\Omega }\Phi (f(x))\,dx\le \int _{\Omega }\Phi (g(x))\,dx. \end{aligned}$$

From this Lemma, it easily follows that if \(f\prec g\) and \(f,g\in L^{p}({\mathbb {R}}^{d})\) are rearranged and non-negative, then

$$\begin{aligned} \Vert f\Vert _{L^{p}({\mathbb {R}}^{d})}\le \Vert g\Vert _{L^{p}({\mathbb {R}}^{d})}\quad \forall p\in [1,\infty ]. \end{aligned}$$

Let us also observe that if \(f,g\in L^{1}_{+}({\mathbb {R}}^{d})\) are non-negative and rearranged, then \(f\prec g\) if and only if for all \(s\ge 0\) we have

$$\begin{aligned} \int _{0}^{s}f^{*}(\sigma )d\sigma \le \int _{0}^{s}g^{*}(\sigma )d\sigma . \end{aligned}$$
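A small numerical sketch of this last criterion, under the illustrative assumptions that \(d=1\) and that the two rearranged densities of equal mass below are sampled on a uniform grid, is as follows; the ordering of the \(L^{p}\) norms stated above is checked as well.

```python
import numpy as np

# Sketch of the criterion above: f is less concentrated than g iff the partial
# integrals of f^* stay below those of g^*.  Arbitrary illustrative data.
x = np.linspace(-6, 6, 1201); dx = x[1] - x[0]
g = np.exp(-np.abs(x))                             # more concentrated near 0
f = np.exp(-x**2 / 8); f *= g.sum() / f.sum()      # same mass, more spread out

fstar, gstar = np.sort(f)[::-1], np.sort(g)[::-1]
f_prec_g = np.all(np.cumsum(fstar) <= np.cumsum(gstar) + 1e-9)
print(f_prec_g)                                    # True: f is less concentrated
# consistent with the L^p monotonicity stated above
print((f**2).sum() * dx <= (g**2).sum() * dx)
```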

If \(f\in L^1_+({\mathbb {R}}^d)\), we denote by \(M_2[f]\) the second moment of f, i.e.

$$\begin{aligned} M_2[f] := \int _{{\mathbb {R}}^d} f(x)|x|^2dx. \end{aligned}$$
(2.15)

In this regard, another interesting property which will turn out to be useful is the following

Lemma 2.6

Let \(f, g \in L^1_+({\mathbb {R}}^d)\) with \(\Vert f\Vert _{L^1({\mathbb {R}}^d)} = \Vert g\Vert _{L^1({\mathbb {R}}^d)}\). If additionally g is rearranged and \(f^\# \prec g\), then \(M_2[f] \ge M_2[g]\).

Proof

Let us consider the sequence of bounded radially increasing functions \(\left\{ \varphi _{n}\right\} \), where \(\varphi _{n}(x)=\min \left\{ |x|^{2},n\right\} \) is the truncation of the function \(|x|^{2}\) at the level n and define the function

$$\begin{aligned} h_{n}=n-\varphi _{n}. \end{aligned}$$

Then \(h_{n}\) is non-negative, bounded and rearranged. Thus using the Hardy–Littlewood inequality (2.14) and [1, Corollary 2.1] we find

$$\begin{aligned} \int _{{\mathbb {R}}^{d}}f(x)\,\varphi _{n}(x)dx&=n\Vert f\Vert _{L^1({\mathbb {R}}^d)}-\int _{{\mathbb {R}}^{d}}f(x)\,h_{n}(x)dx\\&\ge n\Vert f\Vert _{L^1({\mathbb {R}}^d)}-\int _{{\mathbb {R}}^{d}}f^{\#}(x)\,h_{n}(x)dx\\&\ge n\Vert g\Vert _{L^1({\mathbb {R}}^d)}-\int _{{\mathbb {R}}^{d}}g(x)\,h_{n}(x)dx =\int _{{\mathbb {R}}^{d}}g(x)\,\varphi _{n}(x)dx \end{aligned}$$

Then, passing to the limit as \(n\rightarrow \infty \) by monotone convergence, we find the desired result. \(\square \)

Remark 2.7

Lemma 2.6 can be easily generalized when \(|x|^{2}\) is replaced by any non-negative radially increasing potential \(V=V(r)\), \(r=|x|\), such that

$$\begin{aligned} \lim _{r\rightarrow +\infty }V(r)=+\infty . \end{aligned}$$

2.2 Continuous Steiner symmetrization

Although classical decreasing rearrangement techniques are very useful to study properties of the minimizers and of solutions of the evolution problem (1.1) in the next sections, we do not know how to use them to show that stationary states are radially symmetric. For an introduction to continuous Steiner symmetrization and its properties, see [16, 18, 64]. In this subsection, we will use continuous Steiner symmetrization to prove the following proposition.

Proposition 2.8

Let \(\mu _0 \in \mathcal {C}({\mathbb {R}}^d) \cap L^1_+({\mathbb {R}}^d)\), and assume it is not radially decreasing after any translation.

Moreover, if \(m\in (0,1)\cup (1,\infty )\), assume that \(|\frac{m}{m-1}\nabla \mu _0^{m-1}| \le C_0\) in \(\mathrm {supp}\,\mu _0\) for some \(C_0\); and if \(m=1\) assume that \(|\nabla \log \mu _0| \le C_0\) in \(\mathrm {supp}\,\mu _0\) for some \(C_0\). In addition, if \(m\in (0,1]\), assume that \(\mathrm {supp}\,\mu _0 = {\mathbb {R}}^d\).

Then there exist some \(\delta _0>0, c_0>0, C_1>0\) (depending on m, \(\mu _0\) and \(W\)) and a function \(\mu \in C([0,\delta _0]\times {\mathbb {R}}^d)\) with \(\mu (0,\cdot ) = \mu _0\), such that \(\mu \) satisfies the following for a short time \(\tau \in [0,\delta _0]\), where \(\mathcal {E}\) is as given in (2.5):

$$\begin{aligned}&\mathcal {E}[\mu (\tau )] - \mathcal {E}[\mu _0] \le - c_0 \tau , \end{aligned}$$
(2.16)
$$\begin{aligned}&|\mu (\tau ,x) - \mu _0(x)| \le C_1 \mu _0(x)^{\max \{1,2-m\}} \tau \quad \text { for all } x\in {\mathbb {R}}^d, \end{aligned}$$
(2.17)
$$\begin{aligned}&\int _{D_i} (\mu (\tau ,x)-\mu _0(x))dx =0 \quad \text {for any connected component } D_i \text { of } \mathrm {supp}\,\mu _0.\nonumber \\ \end{aligned}$$
(2.18)

2.2.1 Definitions and basic properties of Steiner symmetrization

Let us first introduce the concept of Steiner symmetrization for a measurable set \(E\subset {\mathbb {R}}^{d}\). If \(d=1\), the Steiner symmetrization of E is the symmetric interval \(S(E)=\left\{ x\in {\mathbb {R}}:|x|<|E|_{1}/2\right\} \). Now we want to define the Steiner symmetrization of E with respect to a direction in \({\mathbb {R}}^{d}\) for \(d\ge 2\). The direction along which we symmetrize corresponds to the unit vector \(e_{1}=(1,0,\ldots ,0)\), although the definition can be modified accordingly when considering any other direction in \({\mathbb {R}}^{d}\).

Let us label a point \(x\in {\mathbb {R}}^{d}\) by \((x_{1},x^{\prime })\), where \(x^{\prime }=(x_{2},\ldots ,x_{d})\in {\mathbb {R}}^{d-1}\) and \(x_{1}\in {\mathbb {R}}\). Given any measurable subset E of \({\mathbb {R}}^{d}\) we define, for all \(x^{\prime }\in {\mathbb {R}}^{d-1}\), the section of E with respect to the direction \(x_{1}\) as the set

$$\begin{aligned} E_{x^{\prime }}=\left\{ x_{1}\in {\mathbb {R}}:(x_{1},x^{\prime })\in E \right\} . \end{aligned}$$

Then we define the Steiner symmetrization of E with respect to the direction \(x_{1}\) as the set S(E) which is symmetric about the hyperplane \(\left\{ x_{1}=0\right\} \) and is defined by

$$\begin{aligned} S(E)=\left\{ (x_{1},x^{\prime })\in {\mathbb {R}}^{d}:x_{1}\in S(E_{x^{\prime }}) \right\} . \end{aligned}$$

In particular we have that \(|E|_{d}=|S(E)|_{d}\).

Now, consider a non-negative function \(\mu _{0}\in L^{1}({\mathbb {R}}^{d})\), for \(d\ge 2\). For all \(x^{\prime }\in {\mathbb {R}}^{d-1}\), let us consider the distribution function of \(\mu _{0}(\cdot ,x^{\prime })\), i.e. the function

$$\begin{aligned} \zeta _{\mu _{0}}(h,x^{\prime })=|U_{x'}^h|_{1}\quad \text { for all }h>0,\,x^{\prime }\in {\mathbb {R}}^{d-1}, \end{aligned}$$

where

$$\begin{aligned} U_{x'}^h = \{x_1 \in {\mathbb {R}}: \mu _0(x_1, x')>h\}. \end{aligned}$$
(2.19)

Then we can give the following definition:

Definition 2.9

We define the Steiner symmetrization (or Steiner rearrangement) of \(\mu _{0}\) in the direction \(x_{1}\) as the function \(S \mu _{0}=S \mu _{0}(x_{1},x^{\prime })\) such that \(S \mu _{0}(\cdot ,x^{\prime })\) is exactly the Schwarz rearrangement of \(\mu _{0}(\cdot ,x^{\prime })\), i.e. (see (2.12))

$$\begin{aligned} S\mu _{0}(x_{1},x^{\prime })=\sup \left\{ h>0:\zeta _{\mu _{0}}(h,x^{\prime })>2|x_{1}|\right\} . \end{aligned}$$

As a consequence, the Steiner symmetrization \(S\mu _{0}(x_{1},x^{\prime })\) is a function symmetric about the hyperplane \(\left\{ x_{1}=0\right\} \), and for each \(h>0\) the level set

$$\begin{aligned} \left\{ (x_{1},x^{\prime }):S\mu _{0}(x_{1},x^{\prime })>h\right\} \end{aligned}$$

is equivalent to the Steiner symmetrization

$$\begin{aligned} S(\left\{ (x_{1},x^{\prime }):\mu _{0}(x_{1},x^{\prime })>h\right\} ) \end{aligned}$$

which implies that \(S\mu _{0}\) and \(\mu _{0}\) are equidistributed, yielding the invariance of the \(L^{p}\) norms when passing from \(\mu _{0}\) to \(S\mu _{0}\), that is for all \(p\in [1,\infty ]\) we have

$$\begin{aligned} \Vert S\mu _{0}\Vert _{L^{p}({\mathbb {R}}^{d})}=\Vert \mu _{0}\Vert _{L^{p}({\mathbb {R}}^{d})}. \end{aligned}$$

Moreover, by the layer-cake representation formula, we have

$$\begin{aligned} S\mu _{0}(x_{1},x^{\prime })=\int _0^\infty \chi _{S(U_{x'}^h)}(x_1) \,dh\,. \end{aligned}$$
(2.20)
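On a discrete level, Definition 2.9 can be illustrated by the following sketch, which performs one symmetric decreasing rearrangement per section; the uniform grid with an odd number of points along the \(x_1\) axis and the random test data are assumptions made only for this illustration.

```python
import numpy as np

# Grid sketch of Definition 2.9: Steiner symmetrization in the x1 direction
# (axis 0), obtained by a symmetric decreasing rearrangement of each section.
# Assumes an odd number of grid points along axis 0; test data is arbitrary.
def steiner_symmetrization(mu0):
    n = mu0.shape[0]
    assert n % 2 == 1
    center = n // 2
    # target positions: center, center+1, center-1, center+2, center-2, ...
    pos = center + np.array([(k + 1) // 2 * (1 if k % 2 else -1) for k in range(n)])
    out = np.empty_like(mu0)
    for j in range(mu0.shape[1]):                 # one rearrangement per section x'
        out[pos, j] = np.sort(mu0[:, j])[::-1]    # values in decreasing order
    return out

rng = np.random.default_rng(0)
mu0 = rng.random((9, 4))
Smu0 = steiner_symmetrization(mu0)
# each section of S mu0 is equidistributed with that of mu0, so L^p norms match
print(np.allclose(np.sort(mu0, axis=0), np.sort(Smu0, axis=0)))
```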

Now, we introduce a continuous version of this Steiner procedure via an interpolation between a set or a function and its Steiner symmetrization, which we will use in our symmetry arguments for steady states.

Definition 2.10

For an open set \(U\subset {\mathbb {R}}\), we define its continuous Steiner symmetrization \(M^\tau (U)\) for any \(\tau \ge 0\) as below. In the following we abbreviate an open interval \((c-r, c+r)\) by I(c,r), and we denote by \(\mathrm {sgn}\,c\) the sign of c (which is 1 for positive c, \(-1\) for negative c, and 0 if \(c=0\)).

  1. (1)

    If \(U = I(c,r)\), then

    $$\begin{aligned} M^\tau (U):= {\left\{ \begin{array}{ll}I(c-\tau \,\mathrm {sgn}\,c, r) &{} \text { for }0\le \tau < |c|,\\ I(0,r) &{}\text { for }\tau \ge |c|. \end{array}\right. } \end{aligned}$$
  2. (2)

    If \(U = \cup _{i=1}^N I(c_i,r_i)\) (where all \(I(c_i, r_i)\) are disjoint), then \(M^\tau (U) := \cup _{i=1}^N M^\tau (I(c_i, r_i))\) for \(0\le \tau <\tau _1\), where \(\tau _1\) is the first time two intervals \(M^\tau (I(c_i, r_i))\) share a common endpoint. Once this happens, we merge them into one open interval, and repeat this process starting from \(\tau =\tau _1\).

  3. (3)

    If \(U = \cup _{i=1}^\infty I(c_i, r_i)\) (where all \(I(c_i, r_i)\) are disjoint), let \(U_N = \cup _{i=1}^N I(c_i, r_i)\) for each \(N>0\), and define \(M^\tau (U) := \cup _{N=1}^\infty M^\tau (U_N)\).

See Fig. 1 for illustrations of \(M^\tau (U)\) in the cases (1) and (2). Also, we point out that case (3) can be seen as a limit of case (2), since for each \(N_1<N_2\) one can easily check that \(M^\tau (U_{N_1}) \subset M^\tau (U_{N_2})\) for all \(\tau \ge 0\). Moreover, according to [18], the definition of \(M^{\tau }(U)\) can be extended to any measurable set U of \({\mathbb {R}}\), since

$$\begin{aligned} U=\bigcap _{n=1}^{\infty } O_{n}\setminus N, \end{aligned}$$

where \(O_{n}\supset O_{n+1}\), \(n=1,2,\ldots \), are open sets and N is a null set.

Fig. 1 Illustrations of \(M^\tau (U)\) when U is a single open interval (left), and when U is the union of two open intervals (right)
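Definition 2.10 is algorithmic in nature, and the following event-driven sketch implements cases (1)–(2) for a finite union of disjoint intervals; the interval data in the example (in the spirit of Fig. 1, right) are arbitrary assumptions for illustration only.

```python
import numpy as np

def sgn(c):
    return float(np.sign(c))

def continuous_steiner_set(intervals, tau):
    """Continuous Steiner symmetrization M^tau(U) for U a finite disjoint union
    of open intervals, each given as a pair (center, half-width): every center
    moves towards the origin at unit speed and intervals are merged as soon as
    they share an endpoint (cases (1)-(2) of Definition 2.10).  Event-driven,
    so the motion between events is exact."""
    ivs = sorted(([c, r] for c, r in intervals), key=lambda iv: iv[0])
    t = 0.0
    while t < tau:
        dt = tau - t
        for c, r in ivs:                          # next time a center reaches 0
            if c != 0.0:
                dt = min(dt, abs(c))
        for (c1, r1), (c2, r2) in zip(ivs, ivs[1:]):
            closing = sgn(c2) - sgn(c1)           # rate at which the gap shrinks
            if closing > 0:
                dt = min(dt, ((c2 - r2) - (c1 + r1)) / closing)
        for iv in ivs:                            # advance all centers by dt
            iv[0] -= dt * sgn(iv[0])
        t += dt
        merged = [ivs[0]]                         # merge intervals that now touch
        for c, r in ivs[1:]:
            C, R = merged[-1]
            if c - r <= C + R + 1e-12:
                lo, hi = C - R, c + r
                merged[-1] = [(lo + hi) / 2, (hi - lo) / 2]
            else:
                merged.append([c, r])
        ivs = merged
    return [tuple(iv) for iv in ivs]

# Two intervals as in Fig. 1 (right); the total measure |U| = 2 is preserved.
print(continuous_steiner_set([(-2.0, 0.5), (1.0, 0.5)], tau=0.4))
print(continuous_steiner_set([(-2.0, 0.5), (1.0, 0.5)], tau=5.0))
```

For large \(\tau \) the output is the single interval \((-1,1)\), i.e. the Steiner symmetrization \(S(U)\), consistent with Lemma 2.11(a) below.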

In the next lemma we state four simple facts about \(M^\tau \). They can be easily checked for cases (1) and (2) (hence they hold for (3) as well by taking the limit), and we omit the proof.

Lemma 2.11

Given any open set \(U\subset {\mathbb {R}}\), let \(M^\tau (U)\) be defined in Definition 2.10. Then

  1. (a)

\(M^{0}(U)=U\), \(M^{\infty }(U)=S(U)\).

  2. (b)

    \(|M^\tau (U)| = |U|\) for all \(\tau \ge 0\).

  3. (c)

    If \(U_1 \subset U_2\), we have \(M^\tau (U_1) \subset M^\tau (U_2)\) for all \(\tau \ge 0\).

  4. (d)

    \(M^\tau \) has the semigroup property: \(M^{\tau +s}U = M^\tau (M^s(U))\) for any \(\tau ,s\ge 0\) and open set U.

Once we have the continuous Steiner symmetrization for a one-dimensional set, we can define the continuous Steiner symmetrization (in a certain direction) for a non-negative function in \({\mathbb {R}}^d\).

Definition 2.12

Given \(\mu _0 \in L^1_+({\mathbb {R}}^d)\), we define its continuous Steiner symmetrization \(S^\tau \mu _0\) (in direction \(e_1 = (1,0,\cdots ,0)\)) as follows. For any \(x_1 \in {\mathbb {R}}, x'\in {\mathbb {R}}^{d-1}, h>0\), let

$$\begin{aligned} S^\tau \mu _0(x_1, x') := \int _0^\infty \chi _{M^\tau (U_{x'}^h)}(x_1) dh, \end{aligned}$$

where \(U_{x'}^h\) is defined in (2.19).

Fig. 2 Illustrations of \(\mu _0\) and \(S^\tau \mu _0\) (for a small \(\tau >0\))

For an illustration of \(S^\tau \mu _0\) for \(\mu _0\in L^1({\mathbb {R}})\), see Fig. 2.

Using the above definition, Lemma 2.11 and the representation (2.20) one immediately has

$$\begin{aligned} S^{0}\mu _{0}=\mu _{0}, \quad S^{\infty }\mu _{0}=S\mu _{0}. \end{aligned}$$

Furthermore, it is easy to check that \(S^\tau \mu _0 = \mu _0\) for all \(\tau \) if and only if \(\mu _0\) is symmetric decreasing about the hyperplane \(H=\{x_1=0\}\). Below is the definition of a function being symmetric decreasing about a hyperplane:

Definition 2.13

Let \(\mu _0 \in L^1_+({\mathbb {R}}^d)\). For a hyperplane \(H \subset {\mathbb {R}}^d\) (with normal vector e), we say \(\mu _0\) is symmetric decreasing about H if for any \(x\in H\), the function \(f(\tau ):=\mu _0(x+\tau e)\) is rearranged, i.e. if \(f=f^{\#}\).

Next we state some basic properties of \(S^\tau \) without proof, see [18, 56, 58] for instance.

Lemma 2.14

The continuous Steiner symmetrization \(S^\tau \mu _0\) in Definition 2.12 has the following properties:

  1. (a)

    For any \(h>0\), \(|\{S^\tau \mu _0> h\}| = |\{\mu _0>h\}|\). As a result, \(\Vert S^\tau \mu _0\Vert _{L^p({\mathbb {R}}^d)} = \Vert \mu _0\Vert _{L^p({\mathbb {R}}^d)}\) for all \(1\le p\le +\infty \).

  2. (b)

    \(S^\tau \) has the semigroup property, that is, \(S^{\tau +s}\mu _0 = S^\tau (S^s\mu _0)\) for any \(\tau ,s\ge 0\) and non-negative \(\mu _0 \in L^1({\mathbb {R}}^d)\).

Lemma 2.14 immediately implies that \(\mathcal {S}[S^\tau \mu _0]\) is constant in \(\tau \), where \(\mathcal {S}[\cdot ]\) is as given in (2.5).

2.2.2 Interaction energy under Steiner symmetrization

In this subsection, we will investigate \(\mathcal {I}[S^\tau \mu _0]\). It has been shown in [18, Corollary 2] and [64, Theorem 3.7] that \(\mathcal {I}[S^\tau \mu _0]\) is non-increasing in \(\tau \). Indeed, in the case that \(\mu _0\) is a characteristic function \(\chi _{\Omega _0}\), it is shown in [72] that \(\mathcal {I}[S^\tau \mu _0]\) is strictly decreasing for \(\tau \) small enough if \(\Omega _0\) is not a ball. However, in order to obtain (2.16) for a strictly positive \(c_0\), some refined estimates are needed, and we will prove the following:

Proposition 2.15

Let \(\mu _0 \in \mathcal {C}({\mathbb {R}}^d) \cap L^1_+({\mathbb {R}}^d)\). Assume the hyperplane \(H=\{x_1=0\}\) splits the mass of \(\mu _0\) into half and half, and \(\mu _0\) is not symmetric decreasing about H. Let \(\mathcal {I}[\cdot ]\) be given in (2.5), where \(W\) satisfies the assumptions (K1)–(K3). Then \(\mathcal {I}[S^\tau \mu _0]\) is non-increasing in \(\tau \), and there exists some \(\delta _0>0\) (depending on \(\mu _0\)) and \(c_0>0\) (depending on \(\mu _0\) and \(W\)), such that

$$\begin{aligned} \mathcal {I}[S^\tau \mu _0] \le \mathcal {I}[\mu _0] - c_0 \tau \quad \text { for all } \tau \in [0,\delta _0]. \end{aligned}$$

The building blocks to prove Proposition 2.15 are a couple of lemmas estimating how the interaction energy between two one-dimensional densities \(\mu _1, \mu _2\) changes under continuous Steiner symmetrization of each of them. That is, we will investigate how

$$\begin{aligned} I_{\mathcal K}[\mu _1,\mu _2](\tau ) := \int _{{\mathbb {R}}\times {\mathbb {R}}} (S^\tau \mu _1)(x) (S^\tau \mu _2)(y) {\mathcal K}(x-y) dxdy \end{aligned}$$
(2.21)

changes in \(\tau \) for a given one-dimensional kernel \({\mathcal K}\) to be determined. We start with the basic case where \(\mu _1, \mu _2\) are both characteristic functions of some open interval.

Lemma 2.16

Assume \({\mathcal K}(x) \in \mathcal {C}^1({\mathbb {R}})\) is an even function with \({\mathcal K}'(x)<0\) for all \(x>0\). For \(i=1,2\), let \(\mu _i := \chi _{I(c_i,r_i)}\), where I(c,r) is as given in Definition 2.10. Then the following holds for the function \(I(\tau ) := I_{\mathcal K}[\mu _1, \mu _2](\tau )\) introduced in (2.21):

  1. (a)

    \(\frac{d^+}{d \tau } I(0) \ge 0\). (Here \(\frac{d^+}{d\tau }\) stands for the right derivative.)

  2. (b)

    If in addition \(\mathrm {sgn}\,c_1 \ne \mathrm {sgn}\,c_2\), then

    $$\begin{aligned} \frac{d^+}{d \tau } I(0) \ge c_w \min \{r_1, r_2\} |c_2-c_1| > 0, \end{aligned}$$
    (2.22)

    where \(c_w\) is the minimum of \(|{\mathcal K}'(r)|\) for \(r\in [\frac{|c_2-c_1|}{2}, r_1+r_2 + |c_2-c_1|]\).

Proof

By definition of \(S^\tau \), we have \(S^\tau \mu _i = \chi _{M^\tau (I(c_i, r_i))}\) for \(i=1,2\) and all \(\tau \ge 0\). If \(\mathrm {sgn}\,c_1 = \mathrm {sgn}\,c_2\), the two intervals \(M^\tau (I(c_i, r_i))\) move in the same direction for small enough \(\tau \), during which their interaction energy \(I(\tau )\) remains constant, implying \(\frac{d}{d \tau } I(0)=0\). Hence it suffices to focus on \(\mathrm {sgn}\,c_1 \ne \mathrm {sgn}\,c_2\) and prove (2.22).

Without loss of generality, we assume that \(c_2>c_1\), so that \(\mathrm {sgn}\,c_2- \mathrm {sgn}\,c_1\) is either 2 or 1. The definition of \(M^\tau \) gives

$$\begin{aligned} \begin{aligned} I(\tau )&= \int _{-r_1+c_1- \tau \mathrm {sgn}\,c_1}^{r_1 + c_1 - \tau \mathrm {sgn}\,c_1} \int _{-r_2+c_2- \tau \mathrm {sgn}\,c_2}^{r_2 + c_2 - \tau \mathrm {sgn}\,c_2}{\mathcal K}(x-y)dydx\\&= \int _{-r_1}^{r_1} \int _{-r_2}^{r_2}{\mathcal K}(x-y + (c_1 - c_2) + \tau (\mathrm {sgn}\,c_2 - \mathrm {sgn}\,c_1))dydx. \end{aligned} \end{aligned}$$

Taking its right derivative in \(\tau \) yields

$$\begin{aligned} \begin{aligned} \frac{d^+}{d\tau } I(0)&= (\mathrm {sgn}\,c_2 - \mathrm {sgn}\,c_1) \int _{-r_1}^{r_1} \int _{-r_2}^{r_2}{\mathcal K}'(x-y + (c_1 - c_2) )dydx. \end{aligned} \end{aligned}$$

Let us deal with the case \(r_1\le r_2\) first. In this case we rewrite \(\frac{d^+}{d\tau } I(0)\) as

$$\begin{aligned} \frac{d^+}{d\tau } I(0) = (\mathrm {sgn}\,c_2 - \mathrm {sgn}\,c_1) \int _{ Q} {\mathcal K}'(x-y)dxdy, \end{aligned}$$
(2.23)

where Q is the rectangle \([-r_1,r_1]\times [-r_2+(c_2-c_1), r_2+(c_2-c_1)]\), as illustrated in Fig. 3. Let \( Q^- = Q \cap \{x-y>0\}\), and \(Q^+ = Q \cap \{x-y<0\}\). The assumptions on \({\mathcal K}\) imply \({\mathcal K}'(x-y)<0\) in \(Q^-\), and \({\mathcal K}'(x-y)>0\) in \(Q^+\).

Fig. 3 Illustration of the sets \(Q, Q^-, {\tilde{Q}}^+\) and D in the proof of Lemma 2.16

Let \({\tilde{Q}}^+ := Q^+ \cap \{y\le r_2\}\), and \(D:= [-r_1, r_1]\times [r_2 + \frac{c_2-c_1}{2}, r_2+(c_2-c_1)]\). (\({\tilde{Q}}^+\) and D are the yellow set and green set in Fig. 3 respectively). By definition, \({\tilde{Q}}^+\) and D are disjoint subsets of \(Q^+\), so

$$\begin{aligned} \frac{d^+}{d\tau } I(0)\ge & {} (\mathrm {sgn}\,c_2 - \mathrm {sgn}\,c_1)\nonumber \\&{\times }\,\Bigg (\underbrace{\int _{Q^-} {\mathcal K}'(x-y)dxdy}_{\le 0} + \underbrace{\int _{{\tilde{Q}}^+} {\mathcal K}'(x-y)dxdy}_{\ge 0} +\underbrace{\int _{D} {\mathcal K}'(x-y)dxdy}_{> 0}\Bigg ).\nonumber \\ \end{aligned}$$
(2.24)

We claim that \(\int _{Q^-} {\mathcal K}'(x-y)dxdy + \int _{{\tilde{Q}}^+} {\mathcal K}'(x-y)dxdy \ge 0\). To see this, note that \( Q^- \cup {\tilde{Q}}^+ \) forms a rectangle whose center has a zero x-coordinate and a positive y-coordinate. Hence for any \(h>0\), the line segment \({\tilde{Q}}^+ \cap \{x-y = -h\}\) is longer than \(Q^-\cap \{x-y = h\}\); since \({\mathcal K}\) is even, \(|{\mathcal K}'(-h)|=|{\mathcal K}'(h)|\) with \({\mathcal K}'(-h)>0\), and integrating over h along these diagonals gives the claim.

Therefore, (2.24) becomes

$$\begin{aligned} \frac{d^+}{d\tau } I(0)\ge & {} (\mathrm {sgn}\,c_2 - \mathrm {sgn}\,c_1) \int _{D}{\mathcal K}'(x-y)dxdy \ge \int _{D}{\mathcal K}'(x-y)dxdy \\\ge & {} |D| \min _{(x,y)\in D}{\mathcal K}'(x-y) \end{aligned}$$

Note that D is a rectangle with area \(r_1(c_2-c_1)\), and for any \((x,y)\in D\), we have (recall that \(r_{2}\ge r_{1}\))

$$\begin{aligned} \frac{|c_2-c_1|}{2}+r_{2}-r_{1}\le y-x \le r_1+r_2 + |c_2-c_1|. \end{aligned}$$

This finally gives

$$\begin{aligned} \frac{d^+}{d\tau } I(0)\ge r_1 (c_2-c_1) \min _{r\in [\frac{|c_2-c_1|}{2}, r_1+r_2 + |c_2-c_1|]} |{\mathcal K}'(r)|. \end{aligned}$$

Similarly, if \(r_1>r_2\), then \(I'(0)\) can be written as (2.23) with \({\tilde{Q}}\) defined as \([-r_1+(c_2-c_1),r_1+(c_2-c_1)]\times [-r_2, r_2]\) instead, and the above inequality would hold with the roles of \(r_1\) and \(r_2\) interchanged. Combining these two cases, we have

$$\begin{aligned} \frac{d^+}{d\tau } I(0) \ge c_w \min \{r_1, r_2\} |c_2-c_1| \quad \text { for }\mathrm {sgn}\,c_1 \ne \mathrm {sgn}\,c_2, \end{aligned}$$

where \(c_w\) is the minimum of \(|{\mathcal K}'(r)|\) for \(r\in [\frac{|c_2-c_1|}{2}, r_1+r_2 + |c_2-c_1|]\). \(\square \)
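The lower bound (2.22) can also be checked numerically; in the following sketch the kernel \({\mathcal K}(x)=e^{-x^2}\) and the interval data are arbitrary assumptions satisfying the hypotheses of Lemma 2.16(b).

```python
import numpy as np

# Numerical sanity check of Lemma 2.16(b).  Illustrative assumptions:
# K(x) = exp(-x^2) (even, C^1, K'(x) < 0 for x > 0) and arbitrary interval
# data with sgn c1 != sgn c2.
K = lambda x: np.exp(-x**2)
Kp = lambda x: -2.0 * x * np.exp(-x**2)

c1, r1 = -1.5, 0.4            # I(c1, r1) on the negative side
c2, r2 = 2.0, 0.7             # I(c2, r2) on the positive side

def I(tau, n=400):
    # I(tau) from (2.21), via the change of variables used in the proof above
    # and a simple product quadrature on [-r1, r1] x [-r2, r2]
    x = np.linspace(-r1, r1, n)
    y = np.linspace(-r2, r2, n)
    X, Y = np.meshgrid(x, y, indexing="ij")
    shift = (c1 - c2) + tau * (np.sign(c2) - np.sign(c1))
    return K(X - Y + shift).mean() * (2 * r1) * (2 * r2)

slope = (I(1e-4) - I(0.0)) / 1e-4                 # approximates d^+/dtau I(0)
rs = np.linspace(abs(c2 - c1) / 2, r1 + r2 + abs(c2 - c1), 2000)
lower_bound = np.abs(Kp(rs)).min() * min(r1, r2) * abs(c2 - c1)   # RHS of (2.22)
print(slope, ">=", lower_bound, ":", slope >= lower_bound)
```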

The next lemma generalizes the above result to open sets with finite measure.

Lemma 2.17

Assume \({\mathcal K}(x) \in \mathcal {C}^1({\mathbb {R}})\) is an even function with \({\mathcal K}'(r)<0\) for all \(r>0\). For open sets \(U_1, U_2 \subset {\mathbb {R}}\) with finite measure, let \(\mu _i := \chi _{U_i}\) for \(i=1,2\), and let \(I(\tau ) := I_{\mathcal K}[\mu _1,\mu _2](\tau )\) be as defined in (2.21). Then

  1. (a)

    \(\frac{d}{d \tau }I(\tau )\ge 0\) for all \(\tau \ge 0\);

  2. (b)

    In addition, assume that there exists some \(a\in (0,1)\) and \(R>\max \{|U_1|, |U_2|\}\) such that \(|U_1 \cap (\frac{|U_1|}{2}, R)|>a\), and \(|U_2 \cap (-R, -\frac{|U_2|}{2})|>a\). Then for all \(\tau \in [0,a/4]\), we have

    $$\begin{aligned} \frac{d^+}{d \tau } I(\tau ) \ge \frac{1}{128} c_w a^3 > 0, \end{aligned}$$
    (2.25)

    where \(c_w\) is the minimum of \(|{\mathcal K}'(r)|\) for \(r\in [\frac{a}{4}, 4R]\).

Proof

It suffices to focus on the case when \(U_1, U_2\) both consist of a finite union of disjoint open intervals; the general case follows by taking the limit. Recall that \(S^\tau \mu _i = \chi _{M^\tau (U_i)}\) for \(i=1,2\) and all \(\tau \ge 0\).

To show (a), due to the semigroup property of \(S^\tau \) in Lemma 2.14, all we need to show is \(\frac{d^+}{d \tau }I(0)\ge 0\). By writing \(U_1, U_2\) each as a union of disjoint open intervals and expressing \(I(\tau )\) as a sum of the pairwise interaction energies, (a) immediately follows from Lemma 2.16(a).

We will prove (b) next. First, we claim that

$$\begin{aligned} A_1(\tau ) := \left| M^\tau (U_1) \cap \left( \frac{|U_1|}{2}+\frac{a}{4}, \,R\right) \right| >\frac{a}{4} \quad \text { for all } \tau \in [0,\frac{a}{4}].\nonumber \\ \end{aligned}$$
(2.26)

To see this, note that \(A_1(0)>\frac{3a}{4}\) due to the assumption \(|U_1 \cap (\frac{|U_1|}{2}, R)|>a\). Since each interval in \(M^\tau (U_1)\) moves with velocity either 0 or \(\pm 1\) at each \(\tau \), we know \(A_1'(\tau )\ge -2\) for all \(\tau \), yielding the claim. (Similarly, \(A_2(\tau ) := |M^\tau (U_2) \cap (-R, -\frac{|U_2|}{2}-\frac{a}{4})|>\frac{a}{4}\) for all \(\tau \in [0,\frac{a}{4}]\).)
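Quantitatively, shrinking the window from \((\frac{|U_1|}{2}, R)\) to \((\frac{|U_1|}{2}+\frac{a}{4}, R)\) removes at most \(\frac{a}{4}\) of measure, so for all \(\tau \in [0,\frac{a}{4}]\),

$$\begin{aligned} A_1(\tau ) \ge A_1(0) - 2\tau \ge \Big ( \big |U_1 \cap (\tfrac{|U_1|}{2}, R)\big | - \tfrac{a}{4}\Big ) - \tfrac{a}{2} > a - \tfrac{3a}{4} = \tfrac{a}{4}. \end{aligned}$$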

Now we pick any \(\tau _0 \in [0,\frac{a}{4}]\), and we aim to prove (2.25) at this particular time \(\tau _0\). At \(\tau =\tau _0\), write \(M^{\tau _0}(U_1):= \cup _{k=1}^{N_1} I(c_k^1, r_k^1)\), where all intervals \(I(c_k^1, r_k^1)\) are disjoint, and none of them share common endpoints – if they do, we merge them into one interval.

Note that for every \(x\in M^{\tau _0}(U_1) \cap (\frac{|U_1|}{2}+\frac{a}{4}, \,R)\), x must belong to some \(I(c_k^1, r_k^1)\) with \(a/4\le c_k^1 \le R+|U_1|/2\). Otherwise, the length of \(I(c_k^1, r_k^1)\) would exceed \(|U_1|\), contradicting Lemma 2.11(a). We then define

$$\begin{aligned} \mathscr {I}_1:= \left\{ 1\le k\le N_1: \frac{a}{4}\le c_k^1 \le R+|U_1|/2\right\} . \end{aligned}$$

Combining the above discussion with (2.26), we have \(\sum _{k\in \mathscr {I}_1}|I(c_k^1, r_k^1)| \ge a/4\), i.e.

$$\begin{aligned} \sum _{k\in \mathscr {I}_1} r_k^1 \ge \frac{a}{8}. \end{aligned}$$
(2.27)

Likewise, let \(M^{\tau _0}(U_2) := \cup _{k=1}^{N_2} I(c_k^2, r_k^2)\), and denote by \(\mathscr {I}_2\) the set of indices k such that \(-R-|U_2|/2\le c_k^2 \le -\frac{a}{4}\), and similarly we have \(\sum _{k\in \mathscr {I}_2} r_k^2 \ge a/8\).

The semigroup property of \(M^\tau \) in Lemma 2.11 gives that for all \(s>0\),

$$\begin{aligned} M^{\tau _0+s}(U_1) = M^s(M^{\tau _0} (U_1)) = M^s( \cup _{k=1}^{N_1} I(c_k^1, r_k^1)). \end{aligned}$$

Since none of the intervals \(I(c_k^1, r_k^1)\) share common endpoints, we have

$$\begin{aligned} M^s( \cup _{k=1}^{N_1} I(c_k^1, r_k^1))= \cup _{k=1}^{N_1} M^s(I(c_k^1, r_k^1)) \quad \text { for sufficiently small }s> 0. \end{aligned}$$

A similar result holds for \(M^{\tau _0+s}(U_2)\), hence we obtain for sufficiently small \(s> 0\):

$$\begin{aligned} I(\tau _0+s) = I_{\mathcal K}[\chi _{M^{\tau _0}(U_1)}, \chi _{M^{\tau _0}(U_2)}](s)= \sum _{k=1}^{N_1} \sum _{l=1}^{N_2} I_{\mathcal K}[\chi _{I(c_k^1, r_k^1)}, \chi _{I(c_l^2, r_l^2)}](s). \end{aligned}$$

Applying Lemma 2.16(a) to the above identity yields

$$\begin{aligned} \frac{d^+}{d\tau } I(\tau _0)&= \sum _{k=1}^{N_1} \sum _{l=1}^{N_2} \frac{d}{ds}I_{\mathcal K}[\chi _{I(c_k^1, r_k^1)}, \chi _{I(c_l^2, r_l^2)}]\Big |_{s=0}\nonumber \\&\ge \sum _{k\in \mathscr {I}_1} \sum _{l\in \mathscr {I}_2} \underbrace{\frac{d}{ds}I_{\mathcal K}[\chi _{I(c_k^1, r_k^1)}, \chi _{I(c_l^2, r_l^2)}]\Big |_{s=0}}_{=: T_{kl}}. \end{aligned}$$
(2.28)

Next we will obtain a lower bound for \(T_{kl}\). By definition of \(\mathscr {I}_1\) and \( \mathscr {I}_2\), for each \(k\in \mathscr {I}_1\) and \(l\in \mathscr {I}_2\) we have that \(c_k^1 \ge \frac{a}{4}\) and \(c_l^2 \le -\frac{a}{4}\), hence \(|c_l^2 - c_k^1| \ge \frac{a}{2}\). Thus Lemma 2.16(b) yields

$$\begin{aligned} T_{kl} \ge c_w \min \{r_k^1, r_l^2\} |c_l^2 - c_k^1| \ge c_w \frac{a}{2} \min \{r_k^1, r_l^2\} \quad \text { for }k\in \mathscr {I}_1, l\in \mathscr {I}_2, \end{aligned}$$

where \(c_w = \min _{r\in [\frac{a}{4}, 4R]} |{\mathcal K}'(r)|\) (here we used that for \(k\in \mathscr {I}_1, l\in \mathscr {I}_2\), we have \(r_k^1+r_l^2 + |c_l^2-c_k^1| \le |U_1|/2+|U_2|/2 + (R+|U_1|/2) + (R+|U_2|/2) \le 4R\), due to the assumption \(R>\max \{|U_1|, |U_2|\}\).)

Plugging the above inequality into (2.28) and using \(\min \{u,v\} \ge \min \{u,1\} \min \{v,1\}\) for \(u,v>0\), we have

$$\begin{aligned} \begin{aligned} \frac{d^+}{d\tau } I(\tau _0)&\ge \frac{ac_w}{2} \sum _{k\in \mathscr {I}_1} \sum _{l\in \mathscr {I}_2} \min \{r_k^1, 1\} \min \{r_l^2, 1\}\\&=\frac{ac_w}{2} \left( \sum _{k\in \mathscr {I}_1} \min \{r_k^1, 1\} \right) \left( \sum _{l\in \mathscr {I}_2} \min \{r_l^2, 1\} \right) \\&\ge \frac{ac_w}{2} \min \left\{ 1, \sum _{k\in \mathscr {I}_1} r_k^1\right\} \min \left\{ 1, \sum _{l\in \mathscr {I}_2} r_l^2\right\} \\&\ge \frac{ac_w}{2} \min \left\{ 1, \frac{a}{8} \right\} ^2 \ge \frac{1}{128} c_w a^3 , \end{aligned} \end{aligned}$$

here we applied (2.27) in the second-to-last inequality, and used the assumption \(a\in (0, 1)\) for the last inequality. Since \(\tau _0 \in [0,a/4]\) is arbitrary, we can conclude. \(\square \)

Now we are ready to prove Proposition 2.15.

Proof of Proposition 2.15

Since \(\mu _0 \in \mathcal {C}({\mathbb {R}}^d) \cap L^1_+({\mathbb {R}}^d)\) is not symmetric decreasing about \(H = \{x_1 = 0\}\), we know that there exists some \(x' \in {\mathbb {R}}^{d-1}\) and \(h>0\), such that \(U_{x'}^h := \{x_1\in {\mathbb {R}}: \mu _0(x_1, x')>h\}\) has finite measure, and its symmetric difference with \((-|U_{x'}^h|/2, |U_{x'}^h|/2)\) has nonzero measure.

For \(R>0, a>0\), define

$$\begin{aligned} B_1^{R,a} = \left\{ (x',h)\in {\mathbb {R}}^{d-1}\times (0,+\infty ): \left| U_{x'}^h \cap (|U_{x'}^h|/2, R)\right|>a, |x'|\le R\right\} ,\\ B_2^{R,a} = \left\{ (x',h)\in {\mathbb {R}}^{d-1}\times (0,+\infty ): \left| U_{x'}^h \cap (-R, -|U_{x'}^h|/2)\right| >a, |x'|\le R\right\} . \end{aligned}$$

Our discussion above yields that at least one of \(B_1^{R,a}\) and \(B_2^{R,a}\) is nonempty when R is sufficiently large and \(a>0\) is sufficiently small (hence at least one of them has nonzero measure, by continuity of \(\mu _0\)). Moreover, using the fact that H splits the mass of \(\mu _0\) into half and half, we can choose R sufficiently large and \(a>0\) sufficiently small (both depending only on \(\mu _0\)), such that both \(B_1^{R,a}\) and \(B_2^{R,a}\) have nonzero measure in \({\mathbb {R}}^{d-1}\times (0,+\infty )\).

Now, let us define a one-dimensional kernel \(K_l(r) := -\tfrac{1}{2}W(\sqrt{r^2+l^2})\). Note that for any \(l>0\), the kernel \(K_l \in \mathcal {C}^1({\mathbb {R}})\) is even in r, and \(K_l'(r)<0\) for all \(r>0\). By definition of \(S^\tau \), we can rewrite \(\mathcal {I}[S^\tau \mu _0]\) as

$$\begin{aligned} \begin{aligned} \mathcal {I}[S^\tau \mu _0]= {} -\int _{({\mathbb {R}}^+)^2}\int _{{\mathbb {R}}^{2(d-1)}} \int _{{\mathbb {R}}^2}&\chi _{ M^\tau (U_{x'}^{h_1}) }(x_1) \chi _{ M^\tau (U_{y'}^{h_2}) }(y_1) \\&K_{|x'-y'|}(|x_1-y_1|) dx_1 dy_1 dx' dy' dh_1dh_2. \end{aligned}\end{aligned}$$

Thus using the notation in (2.21), \(\mathcal {I}[S^\tau \mu _0] \) can be rewritten as

$$\begin{aligned} \mathcal {I}[S^\tau \mu _0] =-\int _{({\mathbb {R}}^+)^2}\int _{{\mathbb {R}}^{2(d-1)}} I_{K_{|x'-y'|}}[\chi _{U_{x'}^{h_1}},\chi _{U_{y'}^{h_2}}](\tau ) ~dx' dy' dh_1dh_2, \end{aligned}$$

and taking its right derivative [and applying Lemma 2.17(a)] yields

$$\begin{aligned} \begin{aligned} -\frac{d^+}{d \tau }\mathcal {I}[S^\tau \mu _0]\ge \int _{(x',h_1)\in B_1^{R,a}}\int _{(y',h_2)\in B_2^{R,a}} \frac{d}{d \tau } I_{K_{|x'-y'|}} [\chi _{U_{x'}^{h_1}},\chi _{U_{y'}^{h_2}}](\tau )~ dy'dh_2 dx'dh_1. \end{aligned}\qquad \qquad \end{aligned}$$
(2.29)

By definition of \(B_1^{R,a}\) and \(B_2^{R,a}\), for any \((x',h_1)\in B_1^{R,a}\) and \((y',h_2)\in B_2^{R,a}\), we can apply Lemma 2.17(b) to obtain

$$\begin{aligned} \frac{d^+}{d \tau } I_{K_{|x'-y'|}}[\chi _{U_{x'}^{h_1}},\chi _{U_{y'}^{h_2}}](\tau ) \ge \frac{1}{128}c_w a^3 \quad \text { for any }\tau \in [0,a/4],\qquad \end{aligned}$$
(2.30)

where \(c_w\) is the minimum of \(|K_{|x'-y'|}'(r)|\) for \(r\in [\frac{a}{4}, 4R]\). By definition of \(K_l(r)\), we have

$$\begin{aligned} K_{|x'-y'|}'(r) = - \tfrac{1}{2}W'(\sqrt{r^2+|x'-y'|^2}) \frac{r}{\sqrt{r^2+|x'-y'|^2}}. \end{aligned}$$

Using \(|x'|\le R\) and \(|y'|\le R\) (due to definition of \(B_1, B_2\)), we have \( \frac{r}{\sqrt{r^2+|x'-y'|^2}} \ge \frac{a}{20R}\) for all \(r\in [a/4,4R]\), hence \(c_w \ge \frac{a}{40R} \min _{r\in [\frac{a}{4}, 4R]}W'(r)\).
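Indeed, for \(r\in [\frac{a}{4}, 4R]\) and \(|x'-y'|\le 2R\) we have \(\sqrt{r^2+|x'-y'|^2}\le \sqrt{16R^2+4R^2} = 2\sqrt{5}R\), so

$$\begin{aligned} \frac{r}{\sqrt{r^2+|x'-y'|^2}} \ge \frac{r}{2\sqrt{5}R} \ge \frac{a/4}{2\sqrt{5}R} = \frac{a}{8\sqrt{5}R} \ge \frac{a}{20R}, \end{aligned}$$

which gives the stated lower bound on \(c_w\).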

Plugging (2.30) (with the above \(c_w\)) into (2.29) finally yields

$$\begin{aligned} -\frac{d^+}{d \tau }\mathcal {I}[S^\tau \mu _0] \ge \frac{1}{6000} |B_1^{R,a}| |B_2^{R,a}| \min _{r\in [\frac{a}{4}, 4R]}W'(r) \frac{a^4}{R}>0 \quad \text {for all } \tau \in [0,a/4], \end{aligned}$$

hence we can conclude the desired estimate. \(\square \)

2.2.3 Proof of Proposition 2.8

In the statement of Proposition 2.8, we assume that \(\mu _0\) is not radially decreasing up to any translation. Since Steiner symmetrization only deals with symmetrizing in one direction, we will use the following simple lemma linking radial symmetry with being symmetric decreasing about hyperplanes. Although the result is standard (see [48, Lemma 1.8]), for the sake of completeness we include here the details of the proof.

Lemma 2.18

Let \(\mu _0 \in \mathcal {C}({\mathbb {R}}^d)\). Suppose for every unit vector e, there exists a hyperplane \(H \subset {\mathbb {R}}^d\) with normal vector e, such that \(\mu _0\) is symmetric decreasing about H. Then \(\mu _0\) must be radially decreasing up to a translation.

Proof

For \(i=1,\dots ,d\), let \(e_i\) be the unit vector with i-th coordinate 1 and all the other coordinates 0. By assumption, for each i, there exists some hyperplane \(H_i\) with normal vector \(e_i\), such that \(\mu _0\) is symmetric decreasing about \(H_i\). We then represent each \(H_i\) as \(\{(x_1,\dots ,x_d): x_i = a_i\}\) for some \(a_i\in {\mathbb {R}}\), and then define \(a\in {\mathbb {R}}^d\) as \(a:=(a_1,\dots , a_d)\). Our goal is to prove that \(\mu _0(\cdot -a)\) is radially decreasing.

We first claim that \(\mu _0(x) = \mu _0(2a-x)\) for all \(x\in {\mathbb {R}}^d\). For any hyperplane \(H\subset {\mathbb {R}}^d\), let \(T_H: {\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\) be the reflection about the hyperplane H. Since \(\mu _0\) is symmetric with respect to \(H_1, \dots , H_d\), we have \(\mu _0(x) = \mu _0(T_{H_i}x)\) for \(x\in {\mathbb {R}}^d\) and all \(i=1,\dots ,d\), thus \(\mu _0(x) = \mu _0(T_{H_1}\dots T_{H_d} x) = \mu _0(2a-x)\).
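In coordinates, each reflection changes only the i-th coordinate, so the composition of the d reflections is the point reflection about a:

$$\begin{aligned} (T_{H_i}x)_j = {\left\{ \begin{array}{ll} 2a_i - x_i &{} j=i,\\ x_j &{} j\ne i, \end{array}\right. } \qquad \text {hence}\quad T_{H_1}\dots T_{H_d} x = (2a_1-x_1, \dots , 2a_d - x_d) = 2a - x. \end{aligned}$$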

The claim implies that every hyperplane H passing through a must split the mass of \(\mu _0\) into half and half. Denote the normal vector of H by e. By assumption, \(\mu _0\) is symmetric decreasing about some hyperplane \(H'\) with normal vector e. The definition of symmetric decreasing implies that \(H'\) is the only hyperplane with normal vector e that splits the mass into half and half, hence \(H'\) must coincide with H. Thus \(\mu _0\) is symmetric decreasing about every hyperplane passing through a, hence we can conclude. \(\square \)

Proof of Proposition 2.8

Since \(\mu _0\) is not radially decreasing up to any translation, by Lemma 2.18, there exists some unit vector e, such that \(\mu _0\) is not symmetric decreasing about any hyperplane with normal vector e. In particular, there is a hyperplane H with normal vector e that splits the mass of \(\mu _0\) into half and half, and \(\mu _0\) is not symmetric decreasing about H. We set \(e=(1,0,\dots ,0)\) and \(H = \{x_1=0\}\) throughout the proof without loss of generality. For the rest of the proof, we will discuss two different cases \(m\in (0,1]\) and \(m>1\), and construct \(\mu (\tau , \cdot )\) in different ways.

Case 1: \(m \in (0,1]\). In this case, we simply set \(\mu (\tau ,\cdot ) = S^\tau \mu _0\). By Proposition 2.15, \(\mathcal {I}[S^\tau \mu _0]\) is decreasing at least linearly for a short time. Since continuous Steiner symmetrization preserves the distribution function, even if \(\mathcal {S}[\mu _0]\) itself equals \(-\infty \), we still have the difference \(\mathcal {S}[\mu (\tau )] - \mathcal {S}[\mu _0] \equiv 0\) in the sense of (2.6). Thus (2.16) holds for all sufficiently small \(\tau >0\). In addition, (2.18) is automatically satisfied since we assumed that \(\mathrm {supp}\,\mu _0 = {\mathbb {R}}^d\) for \(m\in (0,1]\), and recall that \(S^\tau \) is mass-preserving by definition.

It then suffices to prove (2.17) for all sufficiently small \(\tau >0\). Let us discuss the case \(m=1\) first. By assumption, \(|\nabla \log \mu _0| \le C_0\). For any \(y\in {\mathbb {R}}^d\) and \(\tau >0\) we claim that

$$\begin{aligned} \log \mu _0(y) - C_0\tau \le \log \mu (\tau ,y) \le \log \mu _0(y) + C_0\tau . \end{aligned}$$
(2.31)

To see this, let us fix any \(y = (y_1, y')\in {\mathbb {R}}^d\). Since \(\log \mu _0(\cdot , y')\) is Lipschitz with constant \(C_0\), for any \(\tau >0\), the following two inequalities hold:

$$\begin{aligned} \mathrm {dist}(y_1, \{x_1\in {\mathbb {R}}: \log \mu _0(x_1, y') > \log \mu _0(y_{1},y^{\prime }) + C_0 \tau \}) \ge \tau \end{aligned}$$

and

$$\begin{aligned} \mathrm {dist}(y_1, \{x_1\in {\mathbb {R}}: \log \mu _0(x_1, y') < \log \mu _0(y_{1},y^{\prime }) - C_0 \tau \}) \ge \tau . \end{aligned}$$

Since the level sets of \(\mu _0\) move with speed at most 1 (and note that any level set of \(\mu _0\) is also a level set of \(\log \mu _0\)), we obtain (2.31). This implies

$$\begin{aligned} \mu _0(y) (e^{-C_0\tau }-1) \le \mu (\tau ,y) - \mu _0(y) \le \mu _0(y) (e^{C_0\tau }-1). \end{aligned}$$

We then have \(|\mu (\tau ,y) - \mu _0(y)| \le 2C_0\mu _0(y) \tau \) for all \(\tau \in (0, \frac{\log 2}{C_0})\) and all \(y\in {\mathbb {R}}^d\).
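Here the last step uses the elementary bounds, valid since \(C_0\tau \le \log 2\):

$$\begin{aligned} e^{C_0\tau }-1 \le C_0\tau \, e^{C_0\tau } \le 2C_0\tau , \qquad 1-e^{-C_0\tau } \le C_0\tau \le 2C_0\tau . \end{aligned}$$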

Now we move on to \(m\in (0,1)\), where we aim to show that \(|\mu (\tau ,y) - \mu _0(y)| \le C_1 \mu _0^{2-m}(y)\tau \) for some \(C_1\) and all sufficiently small \(\tau >0\). Using the assumption \(|\nabla \frac{m}{1-m} \mu _0^{m-1}| \le C_0\), the same argument as for (2.31) gives the following for all \(y\in {\mathbb {R}}^d, \tau >0\):

$$\begin{aligned} \frac{m}{1-m} \mu _0^{m-1}(y) - C_0\tau \le \frac{m}{1-m} \mu ^{m-1}(\tau ,y) \le \frac{m}{1-m} \mu _0^{m-1}(y) + C_0\tau . \end{aligned}$$

Note that \(\mu _0^{m-1}(y)\ge \Vert \mu _0\Vert _\infty ^{m-1}\), since \(\mu _0 \in L^\infty \) and \(m\in (0,1)\). Let us set \(\delta _0 = \frac{m}{2(1-m)C_0}\Vert \mu _0\Vert _\infty ^{m-1}\). For any \(\tau \in (0,\delta _0)\), the left hand side of the above inequality is strictly positive, thus we have

$$\begin{aligned} \left( \mu _0^{m-1}(y) + \frac{C_0(1-m)}{m}\tau \right) ^{\tfrac{1}{m-1}}\le \mu (\tau ,y) \le \left( \mu _0^{m-1}(y) - \frac{C_0(1-m)}{m}\tau \right) ^{\tfrac{1}{m-1}},\nonumber \\ \end{aligned}$$
(2.32)

and note that our choice of \(\delta _0\) ensures that

$$\begin{aligned} \mu _0^{m-1}(y) - \frac{C_0(1-m)}{m}\tau \ge \mu _0^{m-1}(y)- \frac{1}{2}\Vert \mu _0\Vert _\infty ^{m-1} \ge \frac{1}{2}\mu _0^{m-1}(y) \end{aligned}$$

for all \(\tau \in (0,\delta _0)\). Let \(f(a) := \left( \mu _0^{m-1}(y) +a\right) ^{\frac{1}{m-1}} - \mu _0(y)\), which is a convex and decreasing function in a with \(f(0)=0\). Using this function f, the above inequality (2.32) can be rewritten as

$$\begin{aligned} f\left( \frac{C_0(1-m)}{m}\tau \right) \le \mu (\tau ,y) - \mu _0(y) \le f\left( -\frac{C_0(1-m)}{m}\tau \right) . \end{aligned}$$

Since f is convex and decreasing, for all \(|a| \le \frac{C_0(1-m)}{m}\delta _0=\frac{1}{2}\Vert \mu _{0}\Vert _{\infty }^{m-1}\) we have

$$\begin{aligned} |f'(a)| \le \frac{1}{|m-1|} \left( \frac{1}{2}\mu _0^{m-1}(y)\right) ^{\frac{2-m}{m-1}} = \frac{2^{\frac{m-2}{m-1}}}{|m-1|} \mu _0(y)^{2-m}, \end{aligned}$$

and this leads to

$$\begin{aligned} |\mu (\tau ,y) - \mu _0(y)| \le C_1 \mu _0(y)^{2-m} \tau \quad \text { for all }\tau \in (0,\delta _0) \end{aligned}$$

with \(C_1 := \frac{2^{\frac{m-2}{m-1}}}{m}C_{0}\), which gives (2.17).

Case 2: \(m>1\). Note that if we set \(\mu (\tau ,\cdot ) = S^\tau \mu _0\), then it directly satisfies (2.16) for a short time, since \(\mathcal {I}[S^\tau \mu _0]\) is decreasing at least linearly for a short time by Proposition 2.15, and \(\mathcal {S}[S^\tau \mu _0]\) is constant in \(\tau \). However, \(S^\tau \mu _0\) does not satisfy (2.17) and (2.18). To solve this problem, we will modify \(S^\tau \mu _0\) into \(\tilde{S}^\tau \mu _0\), where we make the set \(U_{x'}^h := \{x_1\in {\mathbb {R}}: \mu _0(x_1, x')>h\}\) travel at speed v(h) rather than at constant speed 1, with v(h) given by

$$\begin{aligned} v(h) := {\left\{ \begin{array}{ll} 1 &{} h\ge h_0,\\ \left( \dfrac{h}{h_0}\right) ^{m-1} &{} 0<h<h_0, \end{array}\right. } \end{aligned}$$
(2.33)

for some sufficiently small constant \(h_0>0\) to be determined later. More precisely, we define \(\mu (\tau ,\cdot ) = {\tilde{S}}^\tau \mu _0\) as

$$\begin{aligned} {\tilde{S}}^\tau \mu _0(x_1, x') := \int _0^\infty \chi _{M^{v(h)\tau }(U_{x'}^h)}(x_1) dh \end{aligned}$$
(2.34)

with v(h) as in (2.33). For an illustration of the difference between \(S^\tau \mu _0\) and \({\tilde{S}}^\tau \mu _0\), see the left panel of Fig. 4.

Fig. 4 Left: A sketch of \(\mu _0\) (grey), \(S^\tau \mu _0\) (blue) and \({\tilde{S}}^\tau \mu _0\) (red dashed) for a small \(\tau >0\). Right: In the construction of \({\tilde{S}}^\tau \), due to the reduced speed at lower values, a higher value level set may travel over a lower value level set; the figure illustrates this phenomenon for a large \(\tau >0\)
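To make the construction concrete, the following minimal one-dimensional sketch reproduces the qualitative picture in the left panel of Fig. 4. It uses a hypothetical off-center bump for \(\mu _0\) and assumes that every superlevel set is a single interval whose center is translated toward the origin at speed v(h) (capped once it reaches the origin); it is an illustration of (2.33)–(2.34), not an implementation taken from the paper.

```python
import numpy as np

# Schematic 1-D sketch of (2.33)-(2.34): each superlevel set {mu0 > h} (a single
# interval for the bump below) is translated toward the origin at speed v(h).
# The bump mu0 and all parameters are hypothetical choices for illustration only.
def mu0(x):
    return np.maximum(0.0, 1.0 - (x - 2.0)**2)        # bump centered at x = 2

def v(h, h0, m):
    return 1.0 if h >= h0 else (h / h0)**(m - 1.0)    # the speed law (2.33)

def S_tilde(x, tau, h0=0.1, m=2.0, nh=400):
    ref = np.linspace(-5.0, 5.0, 4000)                # reference grid for the level sets
    levels = np.linspace(0.0, mu0(ref).max(), nh + 1)[1:]
    dh = levels[1] - levels[0]
    out = np.zeros_like(x, dtype=float)
    for h in levels:
        pts = ref[mu0(ref) > h]
        if pts.size == 0:
            continue
        c = 0.5 * (pts.min() + pts.max())             # center of {mu0 > h}
        r = 0.5 * (pts.max() - pts.min())             # its half-length
        c_new = c - np.sign(c) * min(v(h, h0, m) * tau, abs(c))
        out += dh * ((x > c_new - r) & (x < c_new + r))   # layer-cake sum as in (2.34)
    return out

x = np.linspace(-5.0, 5.0, 1000)
profile = S_tilde(x, tau=0.5)   # the levels below h0 lag behind the faster ones above h0
```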

Note that \({\tilde{S}}^\tau \mu _0\) and \(S^\tau \mu _0\) do not necessarily have the same distribution function. Due to the reduced speed v(h) for \(h\in (0,h_0)\) in the construction of \(\tilde{S}^\tau \), a higher block may travel over a lower block, as illustrated in the right panel of Fig. 4. When this happens, the part that is hanging outside would “drop down” as we integrate in h in (2.34), thus changing the distribution function of \({\tilde{S}}^\tau \mu _0\). However, this cannot happen when \(\tau \ll 1\): indeed, using the regularity assumption \(|\nabla \mu _0^{m-1}|\le C_0\) and the particular v(h) in (2.33), one can show that the level sets remain ordered for small enough \(\tau \). We will not pursue this direction, since we will show below in (2.38) that \(\mathcal {S}[{\tilde{S}}^\tau \mu _0] \le \mathcal {S}[\mu _0]\) for all \(\tau >0\), which is sufficient for us.

Our goal is to show that such \(\mu (\tau ,\cdot )\) satisfies (2.16), (2.17) and (2.18) for small enough \(\tau \). Let us first prove that for any \(h_0>0\), \(\mu (\tau ,\cdot )\) satisfies (2.17) and (2.18) for \(\tau \in [0,\delta _1]\), where \(\delta _1 = \delta _1(m,h_0,C_0)>0\). To show (2.18), note that the assumption \(|\nabla (\mu _0^{m-1})| \le C_0\) directly leads to the following: for any \(x,y\in {\mathbb {R}}^d\) with \(\mu _0(x)\ge h>0\) and \(\mu _0(y)=0\), we have that \(|x-y| \ge h^{m-1}/C_0\). This implies that for any connected component \(D_i \subset \mathrm {supp}\,\mu _0\),

$$\begin{aligned} \mathrm {dist}\left( \{\mu _0>h\} \cap D_i\,, \partial D_i\right) \ge \frac{h^{m-1}}{C_0} \quad \text { for all } h>0. \end{aligned}$$
(2.35)

Now define \(D_{i,x'}\) as the one-dimensional set \(\{x_1\in {\mathbb {R}}: (x_1, x') \in D_i\}\). The inequality (2.35) yields

$$\begin{aligned} M^{v(h)\tau } (U_{x'}^h \cap D_{i,x'}) \subset D_{i,x'} \quad \text { for all } x'\in {\mathbb {R}}^{d-1}, \,h>0,\, \tau \le \dfrac{h^{m-1}}{C_0 v(h)}, \end{aligned}$$

and note that for any \(h>0\), we have \(h^{m-1}/(C_0 v(h))\ge h_0^{m-1}/C_0\) by definition of v(h). Using the above inclusion, the definition of \({\tilde{S}}^\tau \) and the fact that \(M^{v(h)\tau }\) is measure-preserving, we have that (2.18) holds for all \(\tau \le h_0^{m-1}/C_0\).
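Here the lower bound \(h^{m-1}/(C_0 v(h))\ge h_0^{m-1}/C_0\) follows by splitting into the two regimes of (2.33):

$$\begin{aligned} \frac{h^{m-1}}{C_0\, v(h)} = {\left\{ \begin{array}{ll} \dfrac{h^{m-1}}{C_0} \ge \dfrac{h_0^{m-1}}{C_0} &{} h\ge h_0,\\ \dfrac{h^{m-1}}{C_0} \left( \dfrac{h_0}{h}\right) ^{m-1} = \dfrac{h_0^{m-1}}{C_0} &{} 0<h<h_0. \end{array}\right. } \end{aligned}$$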

Next we prove (2.17). Let us fix any \(y = (y_1, y')\in {\mathbb {R}}^d\), and denote \(h=\mu _0(y)\). Using \(|\nabla \mu _0^{m-1}| \le C_0\), we have that for any \(\lambda >1\),

$$\begin{aligned} \mathrm {dist}(y_1, U_{y'}^{\lambda h})\ge \frac{(\lambda ^{m-1}-1) h^{m-1}}{C_0}. \end{aligned}$$

So we have \(y_1 \not \in M^{v(\lambda h)\tau } \left( U_{y'}^{\lambda h}\right) \) for all \(\tau \le \frac{(\lambda ^{m-1}-1)h^{m-1}}{C_0 v(\lambda h)}\), which is uniformly bounded below by \( \frac{(\lambda ^{m-1}-1)h_0^{m-1}}{C_0 \lambda ^{m-1}}\) due to the fact that \(v(\lambda h) \le (\lambda h/h_0)^{m-1}\) for all h. By definition of \({\tilde{S}}^\tau \) and the fact that \(\mu _0(y)=h\), the following holds for all \(\lambda > 1\):

$$\begin{aligned} {\tilde{S}}^\tau [\mu _0](y) \le \lambda \mu _0(y) \quad \text { for all } \tau \le \frac{(\lambda ^{m-1}-1)h_0^{m-1}}{C_0 \lambda ^{m-1}}. \end{aligned}$$

Note that there exists \(c_m^1>0\) only depending on m, such that \(\lambda ^{m-1}-1 \ge c_m^1(\lambda - 1)\) for all \(1<\lambda <2\). Hence for all \(1<\lambda <2\) we have

$$\begin{aligned} {\tilde{S}}^\tau [\mu _0](y) - \mu _0(y) \le (\lambda -1 ) \mu _0(y) \quad \text { for } \tau = \frac{c_m^1 h_0^{m-1}}{C_0 2^{m-1}}(\lambda -1), \end{aligned}$$

and this directly implies

$$\begin{aligned} {\tilde{S}}^\tau [\mu _0](y) - \mu _0(y) \le \frac{C_0 2^{m-1}}{c_m^1 h_0^{m-1} }\mu _0(y)\tau \quad \text { for all } \tau \le \frac{c_m^1 h_0^{m-1}}{C_0 2^{m-1}}. \end{aligned}$$
(2.36)

Similarly, for any \(0<\eta < 1\) we have \( \mathrm {dist}(y_1, (U_{y'}^{\eta h})^c)\ge \frac{(1-\eta ^{m-1}) h^{m-1}}{C_0}, \) and an identical argument as above gives us

$$\begin{aligned} {\tilde{S}}^\tau [\mu _0](y) \ge \eta \mu _0(y) \quad \text { for all } \tau \le \frac{(1-\eta ^{m-1})h_0^{m-1}}{C_0 \eta ^{m-1}}. \end{aligned}$$

Now we let \(c_m^2>0\) be such that \(1-\eta ^{m-1} \ge c_m^2(1-\eta )\) for all \(\frac{1}{2}<\eta <1\). Hence we have \({\tilde{S}}^\tau [\mu _0](y) - \mu _0(y) \ge -(1-\eta )\mu _0(y)\) for \(\tau = \dfrac{c_m^2 h_0^{m-1}}{C_0}(1-\eta )\), which implies

$$\begin{aligned} {\tilde{S}}^\tau [\mu _0](y) - \mu _0(y) \ge -\frac{C_0}{c_m^2 h_0^{m-1} }\mu _0(y)\tau \quad \text { for all } \tau \le \frac{c_m^2 h_0^{m-1}}{2C_0}.\qquad \end{aligned}$$
(2.37)

Combining (2.36) and (2.37), we have that for any \(h_0>0\), (2.17) holds for some \(C_1\) for all \(\tau \in [0,\delta _1]\), where both \(C_1>0\) and \(\delta _1>0\) depend on \(C_0, h_0\) and m.

Finally, we will show that (2.16) holds for \(\mu (\tau ) = {\tilde{S}}^\tau [\mu _0]\) if we choose \(h_0>0\) to be sufficiently small. First, we point out that \(\mathcal {S}[{\tilde{S}}^\tau \mu _0]\) is not necessarily preserved in \(\tau \). This is because when different level sets move at different speeds v(h), we no longer have that \(M^{v(h_1)\tau }(U_{x'}^{h_1}) \subset M^{v(h_2)\tau }(U_{x'}^{h_2}) \) for all \(h_1>h_2\). Nevertheless, we claim it is still true that

$$\begin{aligned} \mathcal {S}[{\tilde{S}}^\tau \mu _0] \le \mathcal {S}[\mu _0] \text { for all }\tau \ge 0. \end{aligned}$$
(2.38)

To see this, note that the definition of \({\tilde{S}}^\tau \) and the fact that \(M^{v(h)\tau }\) is measure preserving give us

$$\begin{aligned} \left| \{{\tilde{S}}^\tau \mu _0> h\}\right| \le \left| \{\mu _0>h\}\right| \quad \text { for all }h>0, \tau \ge 0, \end{aligned}$$

regardless of the definition of v(h). This implies that \(\int f({\tilde{S}}^\tau \mu _0(x))dx \le \int f(\mu _0(x))dx\) for any convex increasing function f, yielding (2.38).
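The last implication is the layer-cake computation: for any increasing \(f\) with \(f(0)=0\) (in particular \(f(s) = \frac{s^m}{m-1}\) with \(m>1\)),

$$\begin{aligned} \int _{{\mathbb {R}}^d} f({\tilde{S}}^\tau \mu _0(x))\, dx = \int _0^\infty f'(h) \left| \{{\tilde{S}}^\tau \mu _0>h\}\right| dh \le \int _0^\infty f'(h) \left| \{\mu _0>h\}\right| dh = \int _{{\mathbb {R}}^d} f(\mu _0(x))\, dx. \end{aligned}$$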

Due to (2.38) and the fact that \(\mathcal {E}[\cdot ] = \mathcal {S}[\cdot ]+\mathcal {I}[\cdot ]\), in order to prove (2.16), it suffices to show

$$\begin{aligned} \mathcal {I}[{\tilde{S}}^\tau \mu _0] \le \mathcal {I}[\mu _0] - c_0 \tau \quad \text { for } \tau \in [0,\delta _0], \text { for some } c_0>0\text { and } \delta _0>0.\nonumber \\ \end{aligned}$$
(2.39)

Recall that Proposition 2.15 gives that \(\mathcal {I}[S^\tau \mu _0] \le \mathcal {I}[\mu _0] - c\tau \) for \(\tau \in [0,\delta ]\) with some \(c>0\) and \(\delta >0\). As a result, to show (2.39), all we need is to prove that if \(h_0>0\) is sufficiently small, then

$$\begin{aligned} \left| \mathcal {I}[{\tilde{S}}^\tau \mu _0] - \mathcal {I}[ S^\tau \mu _0]\right| \le \frac{c\tau }{2}\quad \text { for all } \tau . \end{aligned}$$
(2.40)

To show (2.40), we first split \(S^\tau \mu _0\) as the sum of two integrals in \(h\in [h_0,\infty )\) and \(h\in [0,h_0)\):

$$\begin{aligned} S^\tau \mu _0(x_1, x')= & {} \int _{h_0}^\infty \chi _{M^\tau (U_{x'}^h)}(x_1)dh + \int _{0}^{h_0} \chi _{M^\tau (U_{x'}^h)}(x_1)dh \nonumber \\=: & {} f_1(\tau ,x)+f_2(\tau ,x). \end{aligned}$$
(2.41)

We then split \({\tilde{S}}^\tau \mu _0\) similarly, and since \(v(h)=1\) for all \(h>h_0\) we obtain

$$\begin{aligned} \begin{aligned} {\tilde{S}}^\tau \mu _0(x_1, x')&= f_1(\tau ,x) + \int _{0}^{h_0} \chi _{M^{v(h)\tau }(U_{x'}^h)}(x_1)dh \\&=: f_1(\tau ,x) + {\tilde{f}}_2(\tau ,x).\end{aligned}\end{aligned}$$
(2.42)

For any \(\tau \ge 0\), we have \(\Vert f_1(\tau ,\cdot )\Vert _{L^\infty ({\mathbb {R}}^d)} \le \Vert \mu _0\Vert _{L^\infty ({\mathbb {R}}^d)}\), while \(\Vert f_2(\tau ,\cdot )\Vert _{L^\infty ({\mathbb {R}}^d)}\) and \(\Vert \tilde{f}_2(\tau ,\cdot )\Vert _{L^\infty ({\mathbb {R}}^d)}\) are both bounded by \(h_0\). As for the \(L^1\) norm, we have that \(\Vert f_1(\tau ,\cdot )\Vert _{L^1({\mathbb {R}}^d)} \le \Vert \mu _0\Vert _{L^1({\mathbb {R}}^d)}\), and

$$\begin{aligned} \Vert f_2(\tau ,\cdot )\Vert _{L^1({\mathbb {R}}^d)} = \Vert {\tilde{f}}_2(\tau ,\cdot )\Vert _{L^1({\mathbb {R}}^d)} = \int _{{\mathbb {R}}^d} \min \{\mu _0(x), h_0\}dx =: m_{\mu _0}(h_0), \end{aligned}$$

where \(m_{\mu _0}(h_0)\) approaches 0 as \(h_0\searrow 0\) by dominated convergence.

Also, since \(v(h)\le 1\), we know that for each \(\tau \ge 0\), there is a transport map \(\mathcal {T}(\tau ,\cdot ):{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\) with \(\sup _{x\in {\mathbb {R}}^d} |\mathcal {T}(\tau ,x)-x|\le 2 \tau \), such that \(\mathcal {T}(\tau ,\cdot )\# f_2(\tau ,\cdot )={\tilde{f}}_2(\tau ,\cdot )\) (that is, \(\int {\tilde{f}}_2(\tau ,x) \varphi (x)dx = \int f_2(\tau ,x)\varphi (\mathcal {T}(\tau ,x))dx\) for any measurable function \(\varphi \)). Indeed, since the level sets of \(f_2\) travel at speed 1 and the level sets of \({\tilde{f}}_2\) travel at speed v(h), for each \(\tau \) we can find a transport plan between them whose support has maximal displacement at most \(2\tau \). Let us remark that since these densities are both in \(L^\infty \), there is some optimal transport map \(\tilde{\mathcal {T}}\) for the \(\infty \)-Wasserstein distance such that \(|\tilde{\mathcal {T}}(\tau ,x)-x|\le 2\tau \). Although the existence of an optimal map is known [38], we only need a transport map with the above property below.

Using the decompositions (2.41), (2.42) and the definition of \(\mathcal {I}[\cdot ]\), we obtain, omitting the \(\tau \) dependence on the right hand side,

$$\begin{aligned} \begin{aligned} \left| \mathcal {I}[{\tilde{S}}^\tau \mu _0] - \mathcal {I}[ S^\tau \mu _0]\right| \le&\underbrace{\left| \int f_2(W*f_1) dx - \int {\tilde{f}}_2(W*f_1)dx\right| }_{=:A_1(\tau )}\\&+ \frac{1}{2}\underbrace{\left| \int f_2(W*f_2)dx - \int {\tilde{f}}_2(W*{\tilde{f}}_2)dx\right| }_{=:A_2(\tau )}, \end{aligned} \end{aligned}$$

and we will bound \(A_1(\tau )\) and \(A_2(\tau )\) in the following. For \(A_1(\tau )\), denote \(\Phi (\tau ,\cdot ) := W*f_1(\tau ,\cdot )\); using the \(L^\infty \) and \(L^1\) bounds on \(f_1\) and the assumptions (K2), (K3), we proceed in the same way as in (2.4) to obtain \(\Vert \nabla \Phi \Vert _{L^\infty ({\mathbb {R}}^d)} \le C = C(\Vert \mu _0\Vert _{L^\infty ({\mathbb {R}}^d)}, \Vert \mu _0\Vert _{L^1({\mathbb {R}}^d)}, C_w, d)\).

Using that \(\mathcal {T}(\tau ,\cdot )\# f_2(\tau ,\cdot ) = \tilde{f}_2(\tau ,\cdot )\), we can rewrite \(A_1(\tau )\) as

$$\begin{aligned} \begin{aligned} A_1(\tau )&= \left| \int f_2(x) \Big (\Phi (x) - \Phi (\mathcal {T}(\tau ,x))\Big )dx\right| \\&\le \Vert f_2(\tau )\Vert _{L^1({\mathbb {R}}^d)} \sup _{x\in {\mathbb {R}}^d}|\Phi (x){-}\Phi (\mathcal {T}(\tau ,x))| {\le } m_{\mu _0}(h_0) \Vert \nabla \Phi \Vert _{L^\infty ({\mathbb {R}}^d)} 2\tau \\&\le m_{\mu _0}(h_0) C(\Vert \mu _0\Vert _{L^\infty ({\mathbb {R}}^d)}, \Vert \mu _0\Vert _{L^1({\mathbb {R}}^d)}, C_w, d) \tau , \end{aligned} \end{aligned}$$

where the coefficient of \(\tau \) can be made arbitrarily small by choosing \(h_0\) sufficiently small. To control \(A_2(\tau )\), we first use the identity \(\int f(W*g)dx = \int g(W*f)dx\) to bound it by

$$\begin{aligned} A_2(\tau )\le & {} \left| \int f_2(W*f_2)dx - \int {\tilde{f}}_2(W* f_2)dx\right| \\&+ \left| \int f_2(W*{\tilde{f}}_2)dx - \int \tilde{f}_2(W*{\tilde{f}}_2)dx\right| , \end{aligned}$$

and both terms can be controlled in the same way as \(A_1(\tau )\), since both \(\Phi _2 := W*f_2\) and \({{\tilde{\Phi }}}_2 := W*{\tilde{f}}_2\) satisfy the same estimate as \(\Phi \). Combining the estimates for \(A_1(\tau )\) and \(A_2(\tau )\), we can choose \(h_0>0\) sufficiently small, depending on \(\mu _0\) and \(W\), such that (2.40) holds for all \(\tau \), which finishes the proof. \(\square \)

2.3 Proof of Theorem 2.2

Proof

Towards a contradiction, assume there is a stationary state \(\rho _s\) that is not radially decreasing. Due to Lemma 2.3, we have that \(\rho _s \in \mathcal {C}({\mathbb {R}}^d) \cap L^1_+({\mathbb {R}}^d)\), and \(|\frac{m}{m-1}\nabla \rho _s^{m-1}| \le C_0\) in \(\mathrm {supp}\,\rho _s\) for some \(C_0>0\) (and if \(m=1\), it becomes \(|\nabla \log \rho _s|\le C_0\)). In addition, if \(m\in (0,1]\), the same lemma also gives \(\mathrm {supp}\,\rho _s = {\mathbb {R}}^d\). This enables us to apply Proposition 2.8 to \(\rho _s\), hence there exists a continuous family of \(\mu (\tau ,\cdot )\) with \(\mu (0,\cdot ) = \rho _s\) and constants \(C_1>0, c_0>0,\delta _0>0\), such that the following holds for all \(\tau \in [0,\delta _0]\):

$$\begin{aligned}&\mathcal {E}[\mu (\tau )] - \mathcal {E}[\rho _s] \le - c_0 \tau , \end{aligned}$$
(2.43)
$$\begin{aligned}&|\mu (\tau ,x) - \rho _s(x)| \le C_1 \rho _s(x)^{\max \{1,2-m\}} \tau \quad \text { for all }x\in {\mathbb {R}}^d, \end{aligned}$$
(2.44)
$$\begin{aligned}&\int _{D_i} (\mu (\tau ,x)-\rho _s(x))dx =0 \text { for any connected component } D_i\text { of } \mathrm {supp}\,\rho _s.\qquad \qquad \end{aligned}$$
(2.45)

Next we will use (2.44) and (2.45) to directly estimate \(\mathcal {E}[\mu (\tau )] - \mathcal {E}[\rho _s]\), and our goal is to show that there exists some \(C_2>0\), such that

$$\begin{aligned} \big | \mathcal {E}[\mu (\tau )] - \mathcal {E}[\rho _s] \big | \le C_2 \tau ^2 \quad \text { for } \tau \text { sufficiently small.} \end{aligned}$$
(2.46)

We then directly obtain a contradiction between (2.43) and (2.46) for sufficiently small \(\tau >0\).

Let \(g(\tau ,x) := \mu (\tau ,x)-\rho _s(x)\). Due to (2.44), we have \(|g(\tau ,x)| \le C_1 \rho _s(x)^{\max \{1,2-m\}} \tau \) for all \(x\in {\mathbb {R}}^d\) and \(\tau \in [0,\delta _0]\). From now on, we set \(\delta _0\) to be the minimum of its previous value and \((2C_1(1+\Vert \rho _s\Vert _\infty ))^{-1}\). Such \(\delta _0\) ensures that \(\mathrm {supp}\,g(\tau ,\cdot ) \subset \mathrm {supp}\,\rho _s\) and \(|g(\tau ,x)/\rho _s(x)|\le \frac{1}{2}\) for all \(\tau \in [0,\delta _0]\).

Since the energy \(\mathcal {E}\) takes different formulas for \(m\ne 1\) and \(m=1\), we will treat these two cases differently. Let us start with the case \(m\in (0,1)\cup (1,+\infty )\). Using the notation \(g(\tau ,x)\), we have the following: (where in the integrand we omit the x dependence, due to space limitations)

$$\begin{aligned} \mathcal {E}[\mu (\tau )] - \mathcal {E}[\rho _s] =&\int \frac{\left( (\rho _s+g(\tau ))^m - \rho _s^m\right) }{m-1} dx\nonumber \\&+\frac{1}{2} \int (\rho _s+g(\tau )) \big (W*(\rho _s+g(\tau ))\big ) - \rho _s (W*\rho _s) dx \nonumber \\ =&\int _{\mathrm {supp}\,\rho _s} \underbrace{\frac{\rho _s^m}{m-1}\left( \left( 1+\frac{g(\tau )}{\rho _s}\right) ^m - 1\right) }_{:= T(\tau ,x)} dx \nonumber \\&+ \int \left[ g(\tau )(W*\rho _s)+\frac{1}{2}g(\tau )(W*g(\tau ))\right] dx. \end{aligned}$$
(2.47)

Recall that for all \(|a|<1/2\), we have the elementary inequality

$$\begin{aligned} \big | (1+a)^m -1 -ma\big | \le C(m)a^2 \text { for some }C(m)>0. \end{aligned}$$
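This follows from Taylor's theorem with Lagrange remainder: for some \(\xi \) between 0 and a,

$$\begin{aligned} (1+a)^m - 1 - ma = \frac{m(m-1)}{2}(1+\xi )^{m-2} a^2, \qquad |1+\xi |\ge \tfrac{1}{2}, \end{aligned}$$

so one may take \(C(m) = \frac{|m(m-1)|}{2}\sup _{|s|\le 1/2}(1+s)^{m-2}\).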

Since \(|g(\tau ,x)/\rho _s(x)|\le \frac{1}{2}\) for all \(x\in \mathrm {supp}\,\rho _s\) and \(\tau \in [0,\delta _0]\), we can take \(a = g(\tau ,x)/\rho _s(x)\) in the above inequality and multiply both sides by \(\frac{1}{|m-1|}\rho _s^m\) to obtain the following (with \(C_2(m)=C(m)/|m-1|\)):

$$\begin{aligned} \left| T(\tau ,x) - \frac{m}{m-1} g(\tau ,x) \rho _s(x)^{m-1}\right| \le C_2(m) \rho _s^{m-2} g(\tau )^2. \end{aligned}$$

Applying this to (2.47), we have the following for all \(\tau \le \min \{\delta _0, C_1/2\}\):

$$\begin{aligned} \begin{aligned} \big |\mathcal {E}[\mu (\tau )] - \mathcal {E}[\rho _s]\big | \le&\left| \int _{\mathrm {supp}\,\rho _s}g(\tau ) \left( \frac{m}{m-1} \rho _s^{m-1} + W*\rho _s\right) dx\right| \\&\quad + \left| \frac{1}{2}\int g(\tau )(W*g(\tau )) dx\right| \\&\qquad + C_2(m)\left| \int \rho _s^{m-2} g(\tau )^2 dx\right| \\&=: I_1+I_2+I_3. \end{aligned} \end{aligned}$$

Since \(\rho _s\) is a steady state solution, from (2.11) we have \(\frac{m}{m-1}\rho _s^{m-1} + W*\rho _s = C_i\) in each connected component \(D_i\subset \mathrm {supp}\,\rho _s\), hence \(I_1 \equiv 0\) for all \(\tau \in [0,\delta _0]\) due to (2.45) and the definition of \(g(\tau ,\cdot )\).

For \(I_2\) and \(I_3\), since \(|g(\tau ,x)| \le C_1 \rho _s(x)^{\max \{1,2-m\}} \tau \) for \(\tau \in [0,\delta _0]\), for \(m>1\) it becomes \(|g(\tau ,x)| \le C_1 \rho _s(x) \tau \), thus we directly have

$$\begin{aligned} I_2 \le \frac{1}{2}C_1^2 \tau ^2 \int |\rho _s(W*\rho _s)|dx \le A \tau ^2,\\ I_3 \le C_2(m) C_1^2 \tau ^2 \int \rho _s^m dx \le A \tau ^2, \end{aligned}$$

for some \(A>0\) depending on \(\Vert \rho _s\Vert _{1}, \Vert \rho _s\Vert _{\infty }, m\) and d (where we use (2.4) and \(\rho _s \omega (1+|x|)\in L^1\) to control \(I_2\)). For \(m\in (0,1)\), the bound of g implies \(|g(\tau ,x)| \le C_1 \Vert \rho _s\Vert _{\infty }^{1-m} \rho _s(x) \tau \). Plugging this into \(I_2\) gives the same bound as above (with a different A). And for \(I_3\), plugging in \(|g(\tau ,x)| \le C_1 \rho _s(x)^{2-m} \tau \) gives

$$\begin{aligned} I_3 \le C_2(m) C_1^2 \tau ^2 \int \rho _s^{2-m} \le A\tau ^2, \end{aligned}$$

where in the last inequality we used that \(2-m>1\) and \(\rho _s \in L^1 \cap L^\infty \). Putting them together finally gives \(\big |\mathcal {E}[\mu (\tau )] - \mathcal {E}[\rho _s]\big | \le 2A\tau ^2\) for all \(\tau \le \delta _0\), finishing the proof for \(m\in (0,1)\cup (1,+\infty )\).

Next we move on to the case \(m=1\). Using the notation \(g(\tau ,x)\), the difference \(\mathcal {E}[\mu (\tau )] - \mathcal {E}[\rho _s]\) can be rewritten as follows: (where we again omit the x dependence in the integrand)

$$\begin{aligned} \mathcal {E}[\mu (\tau )] - \mathcal {E}[\rho _s]&= \int \left[ (\rho _s+g(\tau )) \log (\rho _s+g(\tau )) - \rho _s \log \rho _s\right. \\&\quad \left. + g(\tau )(W*\rho _s)+\frac{1}{2}g(\tau )(W*g(\tau ))\right] dx \\&= \int g(\tau ) \left( \log \rho _s + W*\rho _s\right) dx+ \int (\rho _s + g(\tau )) \log \left( 1+\frac{g(\tau )}{\rho _s}\right) dx \\&\quad +\frac{1}{2} \int g(\tau )(W*g(\tau )) dx \\&=: J_1 + J_2 + J_3. \end{aligned}$$

Again, we have \(J_1 = 0\) since \(\int g(\tau )dx = 0\), and \(\log \rho _s + W*\rho _s = C\) in \({\mathbb {R}}^d\). \(J_3\) is the same term as \(I_2\), thus again can be controlled by \(A\tau ^2\). Finally it remains to control \(J_2\). Let us break \(J_2\) into

$$\begin{aligned} J_2 = \int \rho _s \log \left( 1+\frac{g(\tau )}{\rho _s}\right) dx + \int g(\tau ) \log \left( 1+\frac{g(\tau )}{\rho _s}\right) dx =: J_{21} + J_{22}. \end{aligned}$$

For \(J_{22}\), using that \(|\log (1+a)|\le 2|a|\) for \(|a|\le \frac{1}{2}\) (recall that \(|g(\tau )/\rho _s|\le \frac{1}{2}\)), we have

$$\begin{aligned} J_{22}\le 2\int \frac{g(\tau )^2}{\rho _s} dx \le 2\int C_1^2 \tau ^2 \rho _s dx \le 2C_1^2 \Vert \rho _s\Vert _{1} \tau ^2 , \end{aligned}$$
(2.48)

where we use (2.44) in the second inequality. To control \(J_{21}\), we use the elementary inequality

$$\begin{aligned} \left| \log \left( 1+a\right) -a\right| \le C a^2 \quad \text { for all }|a|\le \tfrac{1}{2} \end{aligned}$$

for some universal constant C (by Taylor expansion with Lagrange remainder one may take \(C=2\)). Letting \(a = \frac{g(\tau )}{\rho _s}\) and applying this to \(J_{21}\) gives

$$\begin{aligned} \Big |J_{21} - \underbrace{\int g(\tau )dx}_{=0 \text { by } (2.45)}\Big | \le C \int \frac{g(\tau )^2}{\rho _s} dx \le C C_1^2 \Vert \rho _s\Vert _{1} \tau ^2 , \end{aligned}$$

where the last inequality is obtained in the same way as (2.48). Combining these estimates above gives \(|\mathcal {E}[\mu (\tau )] - \mathcal {E}[\rho _s]| \le A\tau ^2\) for some \(A>0\) depending on \(\Vert \rho _s\Vert _{1}, \Vert \rho _s\Vert _{\infty }\) and d, which completes the proof. \(\square \)

2.4 A shortcut for equations with a gradient flow structure

In this subsection, we discuss a shortcut for proving Theorem 2.2 when Eq. (1.1) has a rigorous gradient flow structure, once the first order decay under continuous Steiner symmetrization in Proposition 2.15 has been established. Over the past two decades, it has been discovered that many evolution PDEs have a Wasserstein gradient flow structure, including the heat equation, the porous medium equation, and the aggregation-diffusion Eq. (1.1) when the kernel W has certain convexity properties, see [2, 34, 42, 55, 76]. More precisely, for (1.1), if W is known to be \(\lambda \)-convex, then given any \(\rho _0 \in \mathcal {P}_2({\mathbb {R}}^d)\) (the space of non-negative probability measures with finite second moment) with \(\mathcal {E}[\rho _0]<\infty \), there exists a unique gradient flow \(\rho (t)\) of the free energy functional \(\mathcal {E}\) starting from \(\rho _0\), in the space \(\mathcal {P}_2({\mathbb {R}}^d)\) endowed with the 2-Wasserstein distance. In addition, the gradient flow coincides with the unique weak solution if the velocity field satisfies the necessary integrability conditions.

The \(\lambda \)-convexity of the potential W does not hold in the generality of our assumptions (K1)–(K4). However, the \(\lambda \)-convexity assumption on W has been recently relaxed in the following works for the particular, but important, case of the attractive Newtonian kernel. Craig [42] has shown that the gradient flow is well-posed if the energy \(\mathcal {E}\) is \(\xi \)-convex, where \(\xi \) is a modulus of convexity. Carrillo and Santambrogio [35] have recently shown that for (1.1) with attractive Newtonian potential, for any \(\rho _0\) in \(L^\infty ({\mathbb {R}}^d) \cap \mathcal {P}_2({\mathbb {R}}^d)\), there is a local-in-time gradient flow solution. The authors show that there are local in time \(L^\infty \) bounds at the discrete variational level allowing for local in time well defined gradient flow solutions. Furthermore, this gradient flow solution is unique among a large class of weak solutions due to the earlier results [32]. There, it was also shown that the free energy functional \(\mathcal {E}\) is \(\xi \)-convex for \(m\ge 1-\tfrac{1}{d}\) in the set of bounded densities \(L^\infty ({\mathbb {R}}^d) \cap \mathcal {P}_2({\mathbb {R}}^d)\) with a given fixed bound allowing the use of the recent theory of \(\xi \)-convex gradient flows in [42]. Summarizing, the recent results for the Newtonian attractive kernel [32, 35, 42] allow for a rigorous gradient flow structure of the Newtonian attractive kernel case for \(m\ge 1-\tfrac{1}{d}\) with initial data in \(L^\infty ({\mathbb {R}}^d) \cap \mathcal {P}_2({\mathbb {R}}^d)\).

In short, we now know two particular classes of potentials, more restrictive than the assumptions (K1)–(K4) but including the Newtonian kernel case, for which a rigorous gradient flow theory has been developed for (1.1). Next we will show that, once we use continuous Steiner symmetrization to obtain Proposition 2.15, a rigorous gradient flow structure almost directly leads to radial symmetry via the following shortcut. In particular, Proposition 2.8 is not needed. Below is the statement and proof of the new proposition, which we include for the sake of completeness. Note that it is weaker than Theorem 2.2, since Wasserstein gradient flows require solutions to have a finite second moment, and furthermore for the existence of the gradient flow solutions we need to assume \(m\ge 1-\tfrac{1}{d}\). We will discuss this difference in Remark 2.20.

Proposition 2.19

Assume that W is such that (1.1) has a local-in-time unique gradient flow solution. Let \(\rho _s \in L^\infty ({\mathbb {R}}^d) \cap \mathcal {P}_2({\mathbb {R}}^d) \) be a stationary solution of (1.1) with \( \mathcal {E}[\rho _s]\) being finite. Then \(\rho _s\) must be radially decreasing after a translation.

Proof

Towards a contradiction, assume there is a stationary state \(\rho _s\) that is not radially decreasing after any translation. As before, Lemma 2.3 yields that \(\rho _s \in \mathcal {C}({\mathbb {R}}^d) \cap L^1_+({\mathbb {R}}^d)\). Applying Lemma 2.18 to \(\rho _s\) allows us to find a hyperplane H that splits the mass of \(\rho _s\) into half and half, but \(\rho _s\) is not symmetric decreasing about H. Without loss of generality assume \(H = \{x_1=0\}\). Applying Proposition 2.15 to \(\rho _s\) and using the fact that the \(L^m\) norm is conserved under the continuous Steiner symmetrization \(S^\tau \), we directly have that

$$\begin{aligned} \mathcal {E}[S^\tau \rho _s] \le \mathcal {E}[ \rho _s] - c_0 \tau \quad \text { for all }\tau \in [0,\delta _0], \end{aligned}$$
(2.49)

where \(c_0, \delta _0\) are strictly positive constants that depend on \(\rho _s\). In addition, since the continuous Steiner symmetrization \(S^\tau \) gives an explicit transport plan from \(\rho _s\) to \(S^\tau \rho _s\), where each layer is shifted by no more than distance \(\tau \), we have \(W_\infty (\rho _s, S^\tau \rho _s) \le \tau \), thus

$$\begin{aligned} W_2(\rho _s, S^\tau \rho _s) \le W_\infty (\rho _s, S^\tau \rho _s) \le \tau \quad \text { for all }\tau >0. \end{aligned}$$
(2.50)

Using (2.49) and (2.50), the metric slope \(|\partial \mathcal {E}|(\rho _s)\) as defined in [2, Definition 1.2.4] satisfies

$$\begin{aligned} |\partial \mathcal {E}|(\rho _s) = \limsup _{\rho \rightarrow \rho _s} \frac{(\mathcal {E}[\rho _s] - \mathcal {E}[\rho ])^+}{W_2(\rho _s, \rho )} \ge \limsup _{\tau \rightarrow 0} \frac{(\mathcal {E}[\rho _s] - \mathcal {E}[S^\tau \rho _s])^+}{W_2(\rho _s, S^\tau \rho _s)} \ge c_0. \end{aligned}$$

On the other hand, the local in time gradient flow solution \(\rho (t)\) with initial datum \(\rho _s\) satisfies an Evolution Variational Inequality (EVI) (see [42, Definition 2.10] when W is the Newtonian kernel). Arguing as in [3, Proposition 3.6], see also [32], we have that the following energy dissipation inequality is satisfied for all \(t\ge 0\):

$$\begin{aligned} \mathcal {E}(\rho (t))-\mathcal {E}(\rho _{s})\le -\frac{1}{2}\int _{0}^{t}|\partial \mathcal {E}|^{2}(\rho (\tau ))d\tau -\frac{1}{2}\int _{0}^{t}|\rho ^{\prime }(\tau )|^{2}d\tau \end{aligned}$$
(2.51)

both for \(\lambda \)-convex potentials (in which case (2.51) actually holds with equality) and for the Newtonian attractive potential. This is a consequence of the map \(t\mapsto |\partial \mathcal {E}|(\rho (t))\) being decreasing and lower semicontinuous, see for instance [2, Theorem 2.4.15] in the \(\lambda \)-convex case and [42, Theorem 3.12] in the Newtonian kernel case. Since \(\rho (t)\equiv \rho _s\) is a gradient flow solution, plugging it into (2.51) yields that the left hand side is 0, whereas the right hand side is at most \(-\frac{1}{2}c_0^2 t\), which is negative for all \(t>0\), a contradiction. \(\square \)

Remark 2.20

The assumption that \(\rho _s\) is a probability measure does not create any actual restriction. If \(\rho _s\) is a stationary solution of (1.1) with mass \(M_0 \ne 1\), we can simply apply Proposition 2.19 to \({{\tilde{\rho }}}_s := \frac{\rho _s}{M_0}\), which has mass 1 and is a stationary solution of (1.1) with some positive coefficients multiplying the two terms on the right hand side. However, the assumption that \(\rho _s\) has finite second moment (which comes with the definition of \(\mathcal {P}_2({\mathbb {R}}^d)\)) makes it more restrictive than Theorem 2.2, which only requires \(\omega (1+|x|)\rho _s \in L^1({\mathbb {R}}^d)\). Moreover, the assumption of the existence of a local-in-time unique gradient flow solution requires the more restrictive condition \(m\ge 1-\tfrac{1}{d}\) on the nonlinear diffusion in order to be verified with the available literature [3, 42].

At the end of this subsection, let us point out that for our main application in this work, where \(W = -\mathcal {N}\) is the attractive Newtonian kernel and \(m>1\), we could have used this shortcut to show that all stationary solutions \(\rho _s \in L^1_+({\mathbb {R}}^d)\cap L^\infty ({\mathbb {R}}^d)\) with finite second moment must be radially decreasing up to a translation. However, the longer approach (via Proposition 2.8 and Theorem 2.2) is of broader interest for two reasons. One is that, as discussed in Remark 2.20, Theorem 2.2 proves radial symmetry for a more general class of stationary solutions and more general nonlinear diffusions. Another reason is that the longer approach does not rely on any convexity assumption on W, so it works even if the equation does not have a rigorous gradient flow structure. Even more, some of the authors have recently shown that this longer proof can be generalized to kernels that are more singular than Newtonian [31], for which a rigorous gradient flow theory is missing.

2.5 Including a potential term

In this subsection, we consider the aggregation-diffusion equation with an extra drift term given by a potential V(x):

$$\begin{aligned} \partial _t \rho = \Delta \rho ^m + \nabla \cdot (\rho \nabla (W*\rho + V)) \quad x\in {\mathbb {R}}^d \,, t\ge 0\, \end{aligned}$$
(2.52)

where we assume that \(m>0\), \(V(x)\in \mathcal {C}^1({\mathbb {R}}^d)\) is radially symmetric, and \(V'(r)>0\) for all \(r>0\).

For this equation, stationary solutions are defined in the same way as in Definition 2.1, with (2.1) replaced by \(\nabla \rho _s^{m} = -\rho _s\nabla (\psi _s + V)\). We point out that Lemma 2.3 still holds, except that the right hand sides of (2.7) and (2.8) are now replaced by an x-dependent bound \(C + |\nabla V(x)|\). From its proof, we know that if \(\rho _s\) is a stationary solution, then

$$\begin{aligned} \frac{m}{m-1}\rho _s^{m-1} + \rho _s * W+ V = C_i \quad \text { in } \mathrm {supp}\,\rho _s, \end{aligned}$$

where \(C_i\) may take different values in different components. As before, if \(m=1\) then \(\frac{m}{m-1}\rho _s^{m-1}\) is replaced by \(\log \rho _s\); and if \(0<m\le 1\) we again have that \(\mathrm {supp}\,\rho _s = {\mathbb {R}}^d\).

Due to the extra potential term, the energy functional \(\mathcal {E}[\rho ]\) is now given by \(\mathcal {S}[\rho ] + \mathcal {I}[\rho ] + \mathcal {V}[\rho ]\), with the extra potential energy \(\mathcal {V}[\rho ] := \int \rho V dx\). We start with a simple observation that the potential energy is non-increasing under continuous Steiner symmetrization, a consequence of properties of continuous Steiner symmetrization in [18].

Lemma 2.21

Let \(V \in \mathcal {C}({\mathbb {R}}^d)\) be radially symmetric and non-decreasing in |x|. Let \(\mu \in L^1_+({\mathbb {R}}^d)\cap L^\infty ({\mathbb {R}}^d)\) be such that \(\int \mu V dx < \infty \). Then \(\int S^\tau [\mu ] V dx\) is non-increasing for all \(\tau >0\).

Proof

For any \(n\in {\mathbb {N}}_+\), let \(\varphi _n(x) := \max \{0, V(n)-V(x)\}\). (Here we define \(V(n):=V(x)|_{|x|=n}\) by a slight abuse of notation.) Note that \(\mathrm {supp}\,\varphi _n \subset B(0,n)\), and \(\varphi _n\) is non-increasing in |x|. By the Hardy–Littlewood inequality for continuous Steiner symmetrization [18, Lemma 4], we have

$$\begin{aligned} \int _{{\mathbb {R}}^d} S^\tau [\mu ] \varphi _n dx = \int _{{\mathbb {R}}^d} S^\tau [\mu ] S^\tau [\varphi _n] dx \ge \int _{{\mathbb {R}}^d} \mu \varphi _n dx \quad \text { for all }\tau \ge 0, n\in {\mathbb {N}}^+\nonumber \\ \end{aligned}$$
(2.53)

Note that \(-\varphi _n = \min \{V(n), V(x)\} - V(n)\). Since \(\int S^\tau [\mu ] dx = \int \mu dx\), (2.53) is equivalent to

$$\begin{aligned}&\int _{{\mathbb {R}}^d} S^\tau [\mu ] \min \{V(x), V(n)\} dx\\&\quad \le \int _{{\mathbb {R}}^d} \mu \min \{V(x), V(n)\} dx \quad \text { for all }\tau \ge 0, n\in {\mathbb {N}}^+. \end{aligned}$$

Sending \(n\rightarrow \infty \), the above inequality becomes \(\int S^\tau [\mu ] V dx\le \int \mu V dx\) for all \(\tau \ge 0\). The semigroup property of \(S^\tau \) then gives us the desired result. \(\square \)

The above lemma gives that \(\frac{d^+}{d\tau } \int S^\tau [\mu ] V dx \le 0\), but we need to improve this to a strict inequality when \(\mu \) is not symmetric decreasing about \(H = \{x_1=0\}\), which we do below.

Lemma 2.22

Let \(V \in \mathcal {C}({\mathbb {R}}^d)\) be radially symmetric and strictly increasing in |x|. Assume \(\mu \in L^1_+({\mathbb {R}}^d)\cap L^\infty ({\mathbb {R}}^d)\) is such that \(\int \mu V dx < \infty \), and \(\mu \) is not symmetric decreasing about \(H = \{x_1=0\}\). Then \(\frac{d^+}{d\tau } \int S^\tau [\mu ] V dx \big |_{\tau =0}< 0.\) As a consequence, for such \(\mu \), there is a constant \(c_{0}>0\) (depending on \(\mu \) and V) such that for small \(\tau >0,\)

$$\begin{aligned} \int S^\tau [\mu ] V dx \le \int \mu \, V dx-c_{0}\tau . \end{aligned}$$

Proof

Recall that for each \(x'\in {\mathbb {R}}^{d-1}\), \(h\in {\mathbb {R}}^+\), the set \(U_{x'}^h\) is an at most countable union of subintervals. Without loss of generality we assume that no two subintervals share a common endpoint; if they do, we add the endpoint and merge them into one interval. Each subinterval can be written in the form \(I(c,r)=(c-r,c+r)\). Since \(\mu \) is not symmetric decreasing about H, for some \(x', h\) at least one of these subintervals must have its center away from 0. This motivates us to define the set \(B_\delta \subset {\mathbb {R}}^{d-1} \times {\mathbb {R}}^+\) for \(0< \delta \ll 1\):

$$\begin{aligned} B_\delta:= & {} \{(x', h) \in {\mathbb {R}}^{d-1} \times {\mathbb {R}}^+ : |x'| \le \delta ^{-1}, \text { and }U_{x'}^h \\&\text { has a subinterval }I(c,r) \text { with }|c|, r \in [\delta , \delta ^{-1}]\}. \end{aligned}$$

The assumption on \(\mu \) implies that \(|B_\delta | > 0\) for sufficiently small \(\delta >0\).

By Definition 2.12, \(\int S^\tau [\mu ] V dx\) can be written as

$$\begin{aligned} \int S^\tau [\mu ] V dx = \int _{{\mathbb {R}}^+} \int _{{\mathbb {R}}^{d-1}} \int _{{\mathbb {R}}} \chi _{M^\tau (U_{x'}^h)}(x_1) V(x_1, x') dx_1 dx' dh.\nonumber \\ \end{aligned}$$
(2.54)

Now let us investigate the innermost integral. For any open set \(U \subset {\mathbb {R}}\), let us define

$$\begin{aligned} \Phi (\tau ; U,x'):= \int _{{\mathbb {R}}} \chi _{M^\tau (U)}(x_1) V(x_1, x') dx_1. \end{aligned}$$

With this notation, the innermost integral in (2.54) becomes \(\Phi (\tau ; U_{x'}^h,x')\).

To estimate \(\frac{d^+}{d\tau } \Phi (\tau ; U_{x'}^h,x')|_{\tau =0}\), let us start with the easier estimate of \(\frac{d^+}{d\tau } \Phi (\tau ; U,x')|_{\tau =0}\) when U is a single interval \(I(c,r)\). If \(c=0\), clearly \(\frac{d^+}{d\tau } \Phi (\tau ; U,x')\big |_{\tau =0} = 0\). If \(c\ne 0\) (WLOG assume \(c<0\)), then \(M^\tau (U) = I(c+\tau , r)\) for sufficiently small \(\tau >0\), thus

$$\begin{aligned} \frac{d^+}{d\tau } \Phi (\tau ; U,x')\Big |_{\tau =0} = V(c+r, x') - V(c-r, x') < 0, \end{aligned}$$

where we use \(|c+r|<|c-r|\) in the last inequality, which follows from \(c<0\), and actually we have \(|c-r|-|c+r|\ge \min \{2|c|, 2r\}\). And if \(c,r, x'\) satisfy \(|c|, r\in [\delta , \delta ^{-1}]\) and \(|x'| \le \delta ^{-1}\), we have the quantitative estimate

$$\begin{aligned} \frac{d^+}{d\tau } \Phi (\tau ; U,x')\Big |_{\tau =0} \le -C_\delta < 0, \end{aligned}$$

where \(C_\delta \) is given by

$$\begin{aligned} \begin{aligned} C_\delta :=&\inf _{a_1,a_2,b\in {\mathbb {R}}}\left\{ V\left( \sqrt{a_1^2 + b^2}\right) - V\left( \sqrt{a_2^2 + b^2}\right) : \right. \\&\left. \qquad \qquad \quad |a_1|-|a_2| \ge 2\delta , |a_1|,|a_2|\le 2\delta ^{-1}, |b|\le \delta ^{-1}\right\} ,\end{aligned} \end{aligned}$$

where we denote \(V(x)=V(|x|)\) by a slight abuse of notation. The strict positivity of \(C_\delta \) follows from the fact that V(r) is strictly increasing in r for \(r\ge 0\), as well as the compactness of the set \(\{ |a_1|-|a_2|\ge 2\delta , |a_1|,|a_2|\le 2\delta ^{-1}, |b|\le \delta ^{-1}\}\).

The above argument immediately leads to the crude estimate

$$\begin{aligned} \frac{d^+}{d\tau } \Phi (\tau ; U_{x'}^h,x')|_{\tau =0}\le 0 \quad \text { for all } (x', h)\in {\mathbb {R}}^{d-1}\times {\mathbb {R}}^+ \end{aligned}$$

as we take the sum of the estimate \(\frac{d^+}{d\tau } \Phi (\tau ; U,x')|_{\tau =0}\le 0\) over all the subintervals \(U \subset U_{x'}^h\). In addition, if \(|x'|\le \delta ^{-1}\) and \(U_{x'}^h\) has a subinterval \(I(c,r)\) with \(|c|, r\in [\delta , \delta ^{-1}]\), we have the quantitative estimate \(\frac{d^+}{d\tau } \Phi (\tau ; U_{x'}^h,x')|_{\tau =0} \le -C_\delta <0\). By definition of \(B_\delta \) at the beginning of this proof, we have

$$\begin{aligned} \frac{d^+}{d\tau } \Phi (\tau ; U_{x'}^h,x')\Big |_{\tau =0} \le -C_\delta <0 \quad \text { for all } (x', h)\in B_\delta , \end{aligned}$$

thus

$$\begin{aligned} \frac{d^+}{d\tau } \int S^\tau [\mu ] V dx \Big |_{\tau =0} = \int _{{\mathbb {R}}^+} \int _{{\mathbb {R}}^{d-1}} \frac{d^+}{d\tau }\Phi (\tau ; U_{x'}^h,x')\Big |_{\tau =0} dx' dh \le -C_\delta | B_\delta | < 0, \end{aligned}$$

finishing the proof.\(\square \)

The goal of this subsection is to show that the radial symmetry result in Theorem 2.2 can be generalized to (2.52) for certain classes of potentials V. We will work with one of the following two classes of V:

(V1) \(0<V'(r)\le C\) for some C for all \(r>0\).

(V2) \(V'(r)>0\) for all \(r>0\), and \(V'(r)\rightarrow +\infty \) as \(r\rightarrow +\infty \).

In the following theorem we prove radial symmetry of stationary solutions under assumption (V1) for all \(m>0\) (satisfied for instance by \(V(x)=\sqrt{1+|x|^2}\)), and under assumption (V2) for \(m>1\) (satisfied for instance by \(V(x)=\frac{1}{2}|x|^2\)). We expect that when \(m\in (0,1]\), it should be possible to refine some estimates in the proof and obtain symmetry for a wider class than (V1). We will not pursue this direction for simplicity of presentation, and we leave further generalizations to interested readers.

Theorem 2.23

Assume that W satisfies (K1)–(K4) and \(m>0\). Let \(\rho _s \in L^1_+({\mathbb {R}}^d)\cap L^\infty ({\mathbb {R}}^d)\) satisfy \(\omega (1+|x|)\rho _s \in L^1({\mathbb {R}}^d)\) and \( \rho _sV \in L^1({\mathbb {R}}^d)\). Assume that \(\rho _s\) is a non-negative stationary state of (2.52) in the sense of Definition 2.1, with (2.1) replaced by \(\nabla \rho _s^{m} = -\rho _s\nabla (\psi _s + V)\). If V satisfies (V1), or if V satisfies (V2) and in addition \(m>1\), then \(\rho _s\) is radially decreasing about the origin.

Proof

Note that Lemma 2.3 still holds with a potential V, except that the right hand sides of (2.7) and (2.8) are now replaced by an x-dependent bound \(C + |\nabla V(x)|\), which is uniformly bounded in x under (V1). Under the assumptions (V2) and \(m>1\), we will prove in Lemma 2.24 that \(\rho _s\) must be compactly supported. Thus in both cases, the right hand sides of (2.7) and (2.8) are still uniformly bounded in x on \(\mathrm {supp}\,\rho _s\).

The rest of the proof follows a similar approach as Theorem 2.2 and Proposition 2.8, with \(\mathcal {E}\) including an extra potential energy \(\mathcal {V}[\rho ] := \int \rho V dx\). However, some crucial modifications in the proof of Proposition 2.8 are needed, which we highlight below.

First, note that with a potential V, we will prove radial symmetry about the origin, rather than up to a translation. For this reason, we take an arbitrary hyperplane H passing through the origin, and aim to prove that \(\rho _s\) is symmetric decreasing about H. (WLOG we let \(H = \{x_1=0\}\).) Since H does not necessarily split the mass of \(\rho _s\) into half and half, it is possible that for all \(x' \in {\mathbb {R}}^{d-1}\) and \(h>0\), every line segment in \(U_{x'}^h\) has its center lying on one side of H. Therefore, the estimate in Proposition 2.15 might fail for \(\rho _s\), and all we have is the crude estimate

$$\begin{aligned} \mathcal {I}[S^\tau \rho _s] - \mathcal {I}[\rho _s] \le 0. \end{aligned}$$
(2.55)

Despite this weaker estimate on the interaction energy, we will show that all three estimates of Proposition 2.8 still hold if we define \(\mu (\tau ,\cdot )\) in the same way as in its proof. Clearly, (2.17) and (2.18) remain true since \(\mu (\tau ,\cdot )\) is defined as before. We claim that (2.16) still holds as well, but for a different reason than before: the coefficient \(c_0>0\) previously came from the contribution of the interaction energy via Proposition 2.15, whereas now it comes from the potential energy. To see this, consider the following two cases.

Case 1: \(m\in (0,1]\). Combining (2.55) and Lemma 2.22 with \(\mathcal {S}[\mu (\tau )] - \mathcal {S}[\rho _s] \equiv 0\) (where the difference is defined in the sense of (2.6)), we again have (2.16) for some \(c_0>0\) for all sufficiently small \(\tau >0\).

Case 2: \(m>1\). In this case, recall that \(\mu (\tau ,\cdot ) = \tilde{S}^\tau [\mu (0,\cdot )]\), where \(\mu (0,\cdot )=\rho _s\) and \(\tilde{S}^\tau \) is the continuous Steiner symmetrization which “slows down” at heights \(h \in (0,h_0)\). From the proof of Lemma 2.22, we know that if \(B_\delta \) has positive measure, then \(B_\delta \cap \{(x',h): h>h_0\}\) also has positive measure for all sufficiently small \(h_0>0\); thus Lemma 2.22 still holds for \(\mu (\tau ) = {\tilde{S}}^\tau [\mu _0]\) if \(h_0\) is sufficiently small, leading to

$$\begin{aligned} \mathcal {V}[\mu (\tau )] - \mathcal {V}[\rho _s] \le -c\tau \quad \text { for some } c>0\text { for all sufficiently small }\tau >0. \end{aligned}$$

In addition, for sufficiently small \(h_0\) we still have (2.40) (where we fix c to be the constant from the above equation), and combining it with (2.55) gives

$$\begin{aligned} \mathcal {I}[\mu (\tau )] - \mathcal {I}[\rho _s] \le \frac{c\tau }{2}, \end{aligned}$$

and adding them together with (2.38) gives (2.16).

Once we obtain Proposition 2.8, the rest of the proof follows closely the proof of Theorem 2.2, except for the following minor changes. With an extra potential energy in \(\mathcal {E}\), the right hand side of (2.47) has an additional term \(\int g(\tau ) V dx\). As a result, \(I_1\) is now defined as

$$\begin{aligned} I_1 =\left| \int _{\mathrm {supp}\,\rho _s}g(\tau ) \left( \frac{m}{m-1} \rho _s^{m-1} + W*\rho _s + V\right) dx\right| , \end{aligned}$$

which is still 0, since the equation for the stationary solution now becomes

$$\begin{aligned} \frac{m}{m-1}\rho _s^{m-1} + \rho _s * W+ V = C_i \quad \text { in } \mathrm {supp}\,\rho _s. \end{aligned}$$

The \(m=1\) case is done with a similar modification, where \(J_1\) is now \(\int g(\tau ) \left( \log \rho _s + W*\rho _s + V\right) dx\), and again we have \(J_1=0\) since \(\rho _s\) is stationary. Finally, we obtain the same contradiction as in the proof of Theorem 2.2 if \(\rho _s\) is not symmetric decreasing about H. And since H is an arbitrary hyperplane through the origin, we have that \(\rho _s\) is radially decreasing about the origin. \(\square \)

Finally we state and prove the lemma used in the proof of Theorem 2.23, which shows that all stationary solutions must be compactly supported if \(m>1\) and V satisfies (V2).

Lemma 2.24

Assume that \(m>1\), W satisfies (K1)–(K4), and V satisfies (V2). Let \(\rho _s \in L^1_+({\mathbb {R}}^d)\cap L^\infty ({\mathbb {R}}^d)\) satisfy \(\omega (1+|x|)\rho _s \in L^1({\mathbb {R}}^d)\). Assume that \(\rho _s\) is a non-negative stationary state of (2.52) in the sense of Definition  2.1, with (2.1) replaced by \(\nabla \rho _s^{m} = -\rho _s\nabla (\psi _s + V)\). Then \(\rho _s\) is compactly supported.

Proof

With a potential term, we have that

$$\begin{aligned} \frac{m}{m-1}\rho _s^{m-1} + \rho _s * W+ V = C_i \quad \text { in } \mathrm {supp}\,\rho _s, \end{aligned}$$
(2.56)

where \(C_i\) takes different values in different connected components of \(\mathrm {supp}\,\rho _s\). By a similar computation as (2.4) (with \(W\) replaced by \(\min \{W,0\}\)), we have \(\rho _s*W\ge -C(\Vert \rho _s\Vert _1, \Vert \rho _s\Vert _\infty , W)\). Thus the first two terms of (2.56) are uniformly bounded below. As a result, every connected component D of \(\mathrm {supp}\,\rho _s\) must be bounded: if not, the left hand side would be unbounded in D due to \(\lim _{|x|\rightarrow \infty } V(|x|)=\infty \), contradicting (2.56).

Note that every connected component being bounded does not imply that \(\mathrm {supp}\,\rho _s\) is bounded: there may be countably many connected components going off to infinity. We claim that there is some \(R(\Vert \rho _s\Vert _1, \Vert \rho _s\Vert _\infty , W, V)>0\), such that every connected component D must satisfy \(D \cap B(0,R) \not = \emptyset \). As we will see later, this will help us control the outermost point of D.

If \(0\in D\), then clearly \(D \cap B(0,R) \not = \emptyset \). If \(0\not \in D\), we find some unit vector \(\nu \in {\mathbb {R}}^d\) such that the ray starting at the origin with direction \(\nu \) has a non-empty intersection with D. Let \( t_0 = \inf \{t>0: t\nu \in D\}, \) and let \(x_0 = t_0 \nu \). We take a sequence of points \((t_n)_{n=1}^\infty \) such that \(t_n\searrow t_0\) and \(t_n \nu \in D\), and denote \(x_n = t_n \nu \). Since \(x_n \in D\) and \(x_0 \in \partial D\), the left hand side of (2.56) takes the same constant value \(C_i\) at \(x_0\) and at all \(x_n\). As a result, for all \(n\ge 1\) we have

$$\begin{aligned}&\frac{\frac{m}{m-1}\left( \rho _s^{m-1}(x_n) - \rho _s^{m-1}(x_0)\right) }{t_n-t_0} + \frac{(\rho _s * W)(x_n) - (\rho _s * W)(x_0)}{t_n-t_0} \\&\quad + \frac{V(x_n)-V(x_0)}{t_n-t_0} = 0. \end{aligned}$$

Note that the first term is non-negative since \(\rho _s(x_0)=0\) (which follows from \(x_0\in \partial D\) and \(\rho _s\in \mathcal {C}({\mathbb {R}}^d)\)). The second term converges to \(\nabla (\rho _s*W)\cdot \nu \), whose absolute value is bounded by \(C(\Vert \rho _s\Vert _1, \Vert \rho _s\Vert _\infty , W)\) by (2.2). The third term converges to \(\nabla V(x_0) \cdot \nu = V'(t_0)\). Putting the three estimates together gives that

$$\begin{aligned} V'(t_0) \le C(\Vert \rho _s\Vert _1, \Vert \rho _s\Vert _\infty , W), \end{aligned}$$

thus assumption (V2) gives that \(t_0 \le R(\Vert \rho _s\Vert _1, \Vert \rho _s\Vert _\infty , W, V)\), finishing the proof of the claim.

Finally, we will show that \(D \cap B(0,R) \not = \emptyset \) implies that the outermost point of D cannot be too far from the origin. Take any \(x_1 \in D\cap B(0,R)\), and let \(x_2\) be the outermost point of D. Taking the difference of (2.56) at \(x_2\) and \(x_1\) gives

$$\begin{aligned} V(x_2)-V(R) \le V(x_2)-V(x_1) = \frac{m}{m-1}\rho _s^{m-1}\Big |_{x_2}^{x_1} + (\rho _s * W)\Big |_{x_2}^{x_1}. \end{aligned}$$

Due to (2.4), we bound the right hand side by \(C(\Vert \rho _s\Vert _1,\Vert \rho _s\Vert _\infty , \Vert \omega (1+|x|)\rho _s\Vert _1, W) + \omega (1+|x_2|) \Vert \rho _s\Vert _1\). Note that the left hand side grows superlinearly in \(|x_2|\) due to (V2), whereas \(\omega (1+|x_2|)\) grows at most linearly in \(|x_2|\) by assumption (K3) on \(W\). This leads to

$$\begin{aligned} |x_2| \le C(\Vert \rho _s\Vert _1,\Vert \rho _s\Vert _\infty , \Vert \omega (1+|x|)\rho _s\Vert _1, W, V), \end{aligned}$$

which completes the proof. \(\square \)

3 Existence of global minimizers

In Sect. 2, we showed that if \(\rho _s \in L^1_+({\mathbb {R}}^d)\cap L^\infty ({\mathbb {R}}^d)\) is a stationary state of (1.1) in the sense of Definition  2.1 and it satisfies \(\omega (1+|x|)\rho _s \in L^1({\mathbb {R}}^d)\), then it must be radially decreasing up to a translation. This section is concerned with the existence of such stationary solutions. Namely, under (K1)–(K4) and one of the extra assumptions (K5) or (K6) below, we will show that for any given mass, there indeed exists a stationary solution satisfying the above conditions. We will generalize the arguments of [28] to show that there exists a radially decreasing global minimizer \(\rho \) of the functional (2.5) given by

$$\begin{aligned} {\mathcal E}[\rho ]=\frac{1}{m-1}\int _{{\mathbb {R}}^{d}}\rho ^{m}\,dx+\frac{1}{2}\int _{{\mathbb {R}}^{d}}\int _{{\mathbb {R}}^{d}}W(x-y)\,\rho (x)\rho (y)\,dx\,dy\, \end{aligned}$$

over the class of admissible densities

$$\begin{aligned} \begin{aligned} \mathcal {Y}_{M}&:= \left\{ \rho \in L_{+}^{1}({\mathbb {R}}^{d})\cap L^{m}({\mathbb {R}}^{d}):\,\Vert \rho \Vert _{L^1({\mathbb {R}}^{d})}=M,\right. \\&\qquad \qquad \qquad \quad \left. \int _{{\mathbb {R}}^{d}}x\rho (x)\,dx=0,\,\omega (1+|x|)\,\rho (x)\in L^{1}({\mathbb {R}}^{d})\right\} , \end{aligned} \end{aligned}$$

and with the potential satisfying at least (K1)–(K4). Note that the condition on the zero center of mass has to be understood in the improper integral sense, i.e.

$$\begin{aligned} \int _{{\mathbb {R}}^{d}}x\rho (x)\,dx=\lim _{R\rightarrow \infty } \int _{|x|<R}x\rho (x)\,dx = 0\, \end{aligned}$$

since we do not assume that the first moment is finite in the class \(\mathcal {Y}_{M}\). We emphasize that from now on we will work in the diffusion-dominated regime with degenerate diffusion, namely when

$$\begin{aligned} m>\max \left\{ 2-\frac{2}{d},1\right\} . \end{aligned}$$
(3.1)

In order to avoid loss of mass at infinity, we need to assume some growth condition on W at infinity. In this section, we will obtain the existence of global minimizers under two different conditions related to the works [5, 28, 67], and show that such global minimizers are indeed \(L^1\) and \(L^\infty \) stationary solutions. Namely, we assume further that the potential W satisfies either the property

(K5) \(\displaystyle \lim _{r\rightarrow +\infty }\omega _{+}(r)=+\infty \),

or

(K6) \(\displaystyle \lim _{r\rightarrow +\infty }\omega _{+}(r)=\ell \in (0,+\infty )\) where the non-negative potential \(\mathcal {K}:=\ell -W\) is such that, in the case \(m>2\), \(\mathcal {K}\in L^{\hat{p}}({\mathbb {R}}^{d}\setminus B_{1}(0))\), for some \(1\le \hat{p}<\infty \), while for the case \(2-(2/d)<m\le 2\) we will require that \(\mathcal {K}\in L^{p,\infty }({\mathbb {R}}^{d}\setminus B_{1}(0))\), for some \(1\le p<\infty \). Moreover, there exists an \(\alpha \in (0,d)\) for which \(m>1+\alpha /d\) and

    $$\begin{aligned} \mathcal {K} (\tau x)\ge \tau ^{-\alpha } \mathcal {K}(x),\quad \forall \tau \ge 1,\, \text{ for } \text{ a.e. } x\in {\mathbb {R}}^{d}. \end{aligned}$$
    (3.2)

Here, we denote by \(L^{p,\infty }({\mathbb {R}}^{d})\) the weak-\(L^p\) or Marcinkiewicz space of index \(1\le p<\infty \). In particular, the attractive Newtonian potential (the fundamental solution of the \(-\Delta \) operator in \({\mathbb {R}}^d\)) is covered by these assumptions: for \(d=1,2\) it satisfies (K5), whereas for \(d\ge 3\) it satisfies (K6) with \(\alpha = d-2\).
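Indeed, writing \(W=-\mathcal {N}+\ell \) for some constant \(\ell >0\) in \(d\ge 3\), we get \(\mathcal {K}=\ell -W=\mathcal {N}=c_{d}|x|^{2-d}\), with \(c_{d}>0\) the normalizing constant of the fundamental solution, and homogeneity gives, for every \(\tau \ge 1\),

$$\begin{aligned} \mathcal {K}(\tau x)=c_{d}\,\tau ^{2-d}|x|^{2-d}=\tau ^{-(d-2)}\mathcal {K}(x), \end{aligned}$$

so that (3.2) holds with \(\alpha =d-2\); the requirement \(m>1+\alpha /d\) then reduces exactly to \(m>2-2/d\), that is (3.1).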

Notice that the subadditivity-type condition (K4) allows us to claim that \({\mathcal E}[\rho ]\) is finite over the class \(\mathcal {Y}_{M}\): indeed, if we split W into its positive part \(W_{+}\) and negative part \(W_{-}\) as done in the bound of \(\psi _s\) in Sect. 2, the integral with kernel \(W_{-}\) is finite by the HLS inequality, see (3.3) below, while by (K4) we infer

$$\begin{aligned} \int _{{\mathbb {R}}^{d}}\int _{{\mathbb {R}}^{d}}W_{+}(x-y)\,\rho (x)\rho (y)\,dx\,dy&=\int _{{\mathbb {R}}^{d}}\int _{|x-y|\ge 1}\omega _{+}(|x-y|)\,\rho (x)\rho (y)\,dx\,dy\\&\le C M^{2}+2M\int _{{\mathbb {R}}^{d}}\omega (1+|x|)\rho (x)dx. \end{aligned}$$

3.1 Minimization of the Free Energy functional

The existence of minimizers of the functional \({\mathcal E}\) can be proven with different arguments according to the choice between condition (K5) or (K6): indeed, (K5) produces a quantitative version of the mass confinement effect while (K6) does it in a nonconstructive way. Because of this difference, we first briefly discuss the case when condition (K6) is employed, as it can be handled by a simple application of Lions' concentration-compactness principle [67] and its variant in [5].

Theorem 3.1

Assume that conditions (3.1), (K1)–(K4) and (K6) hold. Then for any positive mass M, there exists a global minimizer \(\rho _{0}\), which is radially symmetric and decreasing, of the free energy functional \({\mathcal E}\) in \(\mathcal {Y}_{M}\). Moreover, all global minimizers are radially symmetric and decreasing.

Proof

We write \({\mathcal E}[\rho ]={\widetilde{{\mathcal E}}}[\rho ]+\frac{\ell }{2}M^{2}\), where

$$\begin{aligned} \widetilde{{\mathcal E}}[\rho ]=\frac{1}{m-1}\int _{{\mathbb {R}}^{d}}\rho ^{m}\,dx-\frac{1}{2}\int _{{\mathbb {R}}^{d}}\int _{{\mathbb {R}}^{d}}\mathcal {K}(x-y)\,\rho (x)\rho (y)\,dx\,dy, \end{aligned}$$

the kernel \(\mathcal {K}\) being non-negative and radially decreasing; furthermore, condition (K3) implies \(\mathcal {K}\in L^{p,\infty }(B_{1}(0))\), where \(p=d/(d-2)\). Then we are in a position to apply [5, Theorem 1] for \(m>2\) and [67, Corollary II.1] for \(2-(2/d)<m\le 2\) to get the existence of a radially decreasing minimizer \(\rho _{0}\in \mathcal {Y}_{M}\) of \(\widetilde{{\mathcal E}}\) (and then of \({{\mathcal E}}\)). Moreover, since \(\mathcal {K}\) is strictly radially decreasing, all global minimizers are radially decreasing. \(\square \)

Under condition (K5) the concentration-compactness principle is not applicable, but a direct control of the mass confinement phenomenon is possible. We first prove the following lemma, which provides a reversed Riesz inequality, allowing us to reduce the study of the minimization of \({\mathcal E}\) to the set of all radially decreasing densities in \(\mathcal {Y}_{M}\).

Lemma 3.2

Assume that conditions (K1)–(K5) hold and take a density \(\rho \) such that

$$\begin{aligned} \rho \in L_{+}^{1}({\mathbb {R}}^{d}),\,\omega (1+|x|)\,\rho (x)\in L^{1}({\mathbb {R}}^{d}). \end{aligned}$$

Then the following inequality holds:

$$\begin{aligned} \mathcal {I}[\rho ]= & {} \int _{{\mathbb {R}}^{d}}\int _{{\mathbb {R}}^{d}}W(x-y)\rho (x)\rho (y)dx\,dy\\\ge & {} \int _{{\mathbb {R}}^{d}}\int _{{\mathbb {R}}^{d}}W(x-y)\rho ^{\#}(x)\rho ^{\#}(y)dx\,dy=\mathcal {I}[\rho ^\#] \end{aligned}$$

and the equality occurs if and only if \(\rho \) is a translate of \(\rho ^{\#}\).

Proof

The proof proceeds exactly as in [27, Lemma 2], up to replacing the function k(r) defined there by the function

$$\begin{aligned} \kappa (r)= \left\{ \begin{array} [c]{lll} -\omega (r) &{} &{} \text {if }r\le r_{0}\\ -\omega (r_{0})-\int _{r_{0}}^{r}\omega ^{\prime }(s)\frac{1+r_{0}^2}{1+s^{2}}ds &{} &{} \text {if }r>r_{0}, \end{array} \right. \end{aligned}$$

with \(r_{0}>0\) fixed. \(\square \)

Theorem 3.3

Assume that (3.1) and (K1)–(K5) hold, then the conclusions of Theorem 3.1 remain true.

Proof

We follow the main lines of [28, Theorem 2.1]. By Lemma 3.2 we can restrict ourselves to consider only radially decreasing densities \(\rho \). In order to show that \(\mathcal {I}[\rho ]\) is bounded from below, we first argue in the case \(d\ge 3\). Thanks to conditions (K1)–(K2) we have

$$\begin{aligned} \int _{{\mathbb {R}}^{d}}\int _{{\mathbb {R}}^{d}}W(x-y)\rho (x)\rho (y)dx\,dy&\ge -C \int _{{\mathbb {R}}^{d}}\int _{|x-y|\le 1}\frac{\rho (x)\rho (y)}{|x-y|^{d-2}}dx\,dy\\&\ge -C \int _{{\mathbb {R}}^{d}}\int _{{\mathbb {R}}^{d}}\frac{\rho (x)\rho (y)}{|x-y|^{d-2}}dx\,dy. \end{aligned}$$

Now we observe that by (3.1) we have

$$\begin{aligned} 1<\frac{2d}{d+2}<m \end{aligned}$$

and \(\frac{d-2}{d}+\frac{d+2}{2d}+\frac{d+2}{2d}=2\), then by the classical HLS and \(L^{p}\) interpolation inequalities, we find

$$\begin{aligned} \mathcal {I}[\rho ]= & {} \int _{{\mathbb {R}}^{d}}\int _{{\mathbb {R}}^{d}}W(x-y)\rho (x)\rho (y)dx\,dy\ge -C\Vert \rho \Vert ^{2}_{L^{2d/(d+2)}({\mathbb {R}}^{d})}\nonumber \\\ge & {} -C\Vert \rho \Vert ^{2\alpha }_{L^{1}({\mathbb {R}}^{d})} \Vert \rho \Vert ^{2(1-\alpha )}_{L^{m}({\mathbb {R}}^{d})}, \end{aligned}$$
(3.3)

where \(\alpha =\frac{1}{m-1}\left( m\frac{d+2}{2d}-1\right) \in (0,1)\) is determined by the interpolation identity \(\frac{d+2}{2d}=\alpha +\frac{1-\alpha }{m}\). Then by (3.3) we find that

$$\begin{aligned} {\mathcal E}[\rho ]\ge \frac{1}{m-1}\Vert \rho \Vert ^{m}_{L^{m}({\mathbb {R}}^{d})}-CM^{2\alpha } \Vert \rho \Vert ^{2(1-\alpha )}_{L^{m}({\mathbb {R}}^{d})} \end{aligned}$$
(3.4)

where we notice that \(m>2(1-\alpha )\) if and only if \(m>2-\frac{2}{d}\), that is (3.1). Then by (3.4) we can find constants \(C_{1}>0\) and \(C_{2}>0\) such that

$$\begin{aligned} {\mathcal E}[\rho ]\ge -C_{1}+C_{2}\Vert \rho \Vert ^{m}_{L^{m}({\mathbb {R}}^{d})}. \end{aligned}$$
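Here we have used that \(2(1-\alpha )<m\): by Young's inequality, for any \(\eta >0\) there is \(C(\eta ,M)>0\) such that

$$\begin{aligned} CM^{2\alpha }\Vert \rho \Vert ^{2(1-\alpha )}_{L^{m}({\mathbb {R}}^{d})}\le \eta \Vert \rho \Vert ^{m}_{L^{m}({\mathbb {R}}^{d})}+C(\eta ,M), \end{aligned}$$

so that choosing, for instance, \(\eta =\frac{1}{2(m-1)}\) in (3.4) gives the bound above with \(C_{2}=\frac{1}{2(m-1)}\).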

Concerning the case \(d=2\), we observe that conditions (K1)–(K2) yield

$$\begin{aligned} \int _{{\mathbb {R}}^{2}}\int _{{\mathbb {R}}^{2}}W(x-y)\rho (x)\rho (y)dx\,dy&\ge -C\int _{{\mathbb {R}}^2}\int _{|x-y|\le 1}\log (|x-y|)\rho (x)\rho (y)\,dx\,dy\\&\ge - C\int _{{\mathbb {R}}^2}\int _{{\mathbb {R}}^2}\log (|x-y|)\rho (x)\rho (y)\,dx\,dy \end{aligned}$$

and we can use the classical log-HLS inequality and the arguments of [28] to conclude.

Concerning the mass confinement, (K5) and the same arguments as in [28], see also Lemma 4.17, allow us to show that

$$\begin{aligned} \int _{|x|>R}\rho (x)\,dx\le \frac{C}{\omega (R)}\ \underset{R\rightarrow \infty }{\longrightarrow }\ 0. \end{aligned}$$

Finally, we should check that the interaction potential W is lower semicontinuous as shown in [28, page 8]. Indeed, the only technical point to verify in this more general setting relates to the control of the truncated interaction potential \(\mathsf {A}^{\varepsilon }\) for \(d\ge 3\). Notice that, due to (2.3), we can estimate

$$\begin{aligned} |\mathsf {A}^{\varepsilon }[\rho ]|&:=\left| \int _{{\mathbb {R}}^{d}}\int _{|x-y|\le \varepsilon }W(x-y)\rho (x)\rho (y)dxdy\right| \\&\le C \int _{{\mathbb {R}}^{d}}\int _{|x-y|\le \varepsilon }\frac{\rho (x)\rho (y)}{|x-y|^{d-2}}dxdy\\&=C \int _{{\mathbb {R}}^{d}}\int _{{\mathbb {R}}^{d}}\rho (x)\,(\chi _{B_{\varepsilon }(0)}\,\mathcal {N})(x-y)\,\rho (y)\,dxdy. \end{aligned}$$

Now recall that the Newtonian potential

$$\begin{aligned} \Psi _{\rho }(x)=\int _{{\mathbb {R}}^{d}}\frac{\rho (y)}{|x-y|^{d-2}}dy \end{aligned}$$

is well defined for a.e. \(x\in {\mathbb {R}}^{d}\) and belongs to \(L^{1}_{loc}({\mathbb {R}}^{d})\), see [47, Theorem 2.21]; then for a.e. \(x\in {\mathbb {R}}^{d}\) we have \(((\chi _{B_{\varepsilon }(0)}\,\mathcal {N})*\rho )(x)\rightarrow 0\) as \(\varepsilon \rightarrow 0\). Moreover, by the HLS inequality we have

$$\begin{aligned} \rho (x)(\chi _{B_{\varepsilon }(0)}\,\mathcal {N}*\rho )(x)\le \rho (x)\Psi _{\rho }(x)\in L^{1}_{loc}({\mathbb {R}}^{d}) \end{aligned}$$

with

$$\begin{aligned} \Vert \rho \,\Psi _{\rho }\Vert _{L^{1}({\mathbb {R}}^{d})}\le C\Vert \rho \Vert ^{2\alpha }_{L^{1}({\mathbb {R}}^{d})} \Vert \rho \Vert ^{2(1-\alpha )}_{L^{m}({\mathbb {R}}^{d})}. \end{aligned}$$

Then Lebesgue’s dominated convergence theorem allows us to conclude that \(\mathsf {A}^{\varepsilon }[\rho ]\rightarrow 0\) as \(\varepsilon \rightarrow 0\). This convergence is uniform along a minimizing sequence \(\rho _{n}\).

Now all the ingredients are in place to argue as in [28], showing that \({\mathcal E}\) achieves its infimum in the class of all radially decreasing densities in \(\mathcal {Y}_{M}\). \(\square \)

Remark 3.4

According to Theorem 2.2, the radial symmetry of the global minimizers of \({\mathcal E}\), which are particular critical points of \({\mathcal E}\), is not a surprise. Nevertheless, as pointed out in the proofs of Theorems 3.1 and 3.3, this property can be achieved much more easily by rearrangement inequalities.

A useful result, which will be used in the next arguments, regards the behavior at infinity of the so-called W-potential, namely the function

$$\begin{aligned} \psi _{f}(x)=\int _{{\mathbb {R}}^{d}}W(x-y)f(y)dy. \end{aligned}$$

Following the blueprint of [37, Lemma 1.1], we have the following result.

Lemma 3.5

Assume that (K1)–(K5) hold, and let

$$\begin{aligned} f\in L^{1}({\mathbb {R}}^{d})\cap L^{\infty }({\mathbb {R}}^{d}\setminus B_1(0)) ,\,\omega (1+|x|)\, f(x)\in L^{1}({\mathbb {R}}^{d}). \end{aligned}$$

Then

$$\begin{aligned} \frac{\psi _{f}(x)}{W(x)}\rightarrow \int _{{\mathbb {R}}^{d}}f(y)dy\quad \text {as }|x|\rightarrow +\infty . \end{aligned}$$

Proof

As in Chae-Tarantello [37], we first set

$$\begin{aligned} \sigma (x):= & {} \frac{\psi _{f}(x)}{\omega (|x|)}-\int _{{\mathbb {R}}^{d}}f(y)dy \nonumber \\= & {} \frac{1}{\omega (|x|)}\int _{{\mathbb {R}}^{d}}\left[ \omega (|x-y|)-\omega (|x|)\right] f(y)dy \end{aligned}$$
(3.5)

so that our aim will be to show that \(\sigma (x)\rightarrow 0\) as \(|x|\rightarrow \infty \). Assume that \(|x|>2\). We then write

$$\begin{aligned} \sigma (x)=\sigma _{1}(x)+\sigma _{2}(x)+\sigma _{3}(x), \end{aligned}$$

where \(\sigma _{i}\), \(i=1,2,3\), are defined by breaking the integral on the right hand side of (3.5) into:

$$\begin{aligned} D_{1}= & {} \left\{ y:|x-y|<1\right\} \, , \, D_{2}=\left\{ y:|x-y|>1,\,|y|\le R\right\} \text{ and } \\ D_{3}= & {} \left\{ y:|x-y|>1,\,|y|>R\right\} \end{aligned}$$

respectively, where \(R>2\) is a fixed constant. Recall that (K2) implies \(|\omega (r)|\le C \phi (r)\) for \(r\le 1\), with \(\phi \) given in (2.3). Thus, we have

$$\begin{aligned} |\sigma _{1}(x)|&\le \frac{1}{\omega (|x|)}\int _{|x-y|<1}\left| \omega (|x-y|)-\omega (|x|)\right| |f(y)|dy\\&\le \frac{C}{\omega (|x|)}\int _{|x-y|<1}\phi (|x-y|)\,|f(y)|dy+\int _{|y|>|x|-1}|f|dy\\&\le \frac{C\Vert f\Vert _{L^{\infty }({\mathbb {R}}^{d}\setminus B_1(0))} \Vert \phi \Vert _{L^1(B_1(0))}}{\omega (|x|)}+\int _{|y|>|x|-1}|f|dy\,, \end{aligned}$$

where we used \(f\in L^{\infty }({\mathbb {R}}^{d}\setminus B_1(0))\) and \(|x|>2\) in the last inequality. This means that \(\sigma _{1}(x)\rightarrow 0\) as \(|x|\rightarrow \infty \). Moreover, we notice that

$$\begin{aligned} |\sigma _{2}(x)|\le \frac{1}{\omega (|x|)}\int _{\left\{ y\in {\mathbb {R}}^{d}:|x-y|>1,\,|y|<R\right\} }\left| \omega (|x-y|)-\omega (|x|)\right| |f(y)|dy\,. \end{aligned}$$

By property (K3) we can estimate in the region \(D_{2}\)

$$\begin{aligned} \left| \omega (|x-y|)-\omega (|x|)\right| \le C \big ||x-y|-|x|\big |\le C|y|\le CR\,, \end{aligned}$$

so that

$$\begin{aligned} |\sigma _{2}(x)|\le C\frac{R}{\omega (|x|)}\Vert f\Vert _{L^{1}({\mathbb {R}}^{d})}, \end{aligned}$$

which implies that also \(\sigma _{2}(x)\rightarrow 0\) as \(|x|\rightarrow +\infty \). As for \(\sigma _3\), for x such that \(|x|>R\), using (K4)–(K5) we write

$$\begin{aligned} |\sigma _{3}(x)|\le&\, \frac{1}{\omega (|x|)}\int _{\left\{ y\in {\mathbb {R}}^{d}:|x-y|>1,\,R<|y|<|x|\right\} }\left| \omega (|x-y|)-\omega (|x|)\right| |f(y)|dy\\&+\frac{1}{\omega (|x|)}\int _{\left\{ y\in {\mathbb {R}}^{d}:|x-y|>1,\,|y|>|x|\right\} }\omega (|x-y|)|f(y)|dy+\int _{|y|>R}|f(y)|dy\\ \le&\, \frac{\omega (2|x|)}{\omega (|x|)}\int _{|y|>R}|f(y)|dy+2\int _{|y|>R}|f(y)|dy\\&+\frac{C_w}{\omega (|x|)}\int _{\left\{ y\in {\mathbb {R}}^{d}:|x-y|>1,\,|y|>|x|\right\} } \left[ 1+\omega (1+|x|)+\omega (1+|y|)\right] |f(y)|dy\\ \le&\, C\left( 1+\frac{1}{\omega (|x|)}\right) \int _{|y|>R}|f(y)|dy+\frac{1}{\omega (|x|)}\int _{{\mathbb {R}}^d}\omega (1+|y|)|f(y)|dy\\&\rightarrow C \int _{|y|>R}|f(y)|dy \end{aligned}$$

as \(|x|\rightarrow +\infty \), for any fixed \(R>2\). Hence, letting \(R\rightarrow +\infty \), we get \(\sigma _{3}(x)\rightarrow 0\). \(\square \)

In the case of assumption (K6), we prove the following lemma.

Lemma 3.6

Assume (3.1), (K1)–(K4) and (K6) hold, and let \(\mathcal {K} := \ell -W\) be as defined in (K6). Then the following holds for any radially decreasing \( f\in L^1_+({\mathbb {R}}^{d})\):

$$\begin{aligned} \lim _{|x|\rightarrow \infty } \int _{{\mathbb {R}}^{d}}\mathcal {K}(x-y)f(y)dy =0, \end{aligned}$$
(3.6)

and

$$\begin{aligned} \int _{{\mathbb {R}}^{d}}\mathcal {K}(x-y)f(y)dy \ge c \mathcal {K}(x) \text { for all }|x|>1, \end{aligned}$$
(3.7)

where \(c:= 2^{-\alpha } \int _{B_1(0)} f(y)dy>0\), with \(\alpha >0\) as given in (K6).

Proof

Since both f and \(\mathcal {K}\) are radially symmetric, we define \({\bar{f}}\), \(\bar{\mathcal {K}}: [0,+\infty )\rightarrow {\mathbb {R}}\) such that \({\bar{f}}(|x|) = f(x)\), \(\bar{\mathcal {K}}(|x|)=\mathcal {K}(x)\). Note that \(\lim _{r\rightarrow \infty } {\bar{f}}(r) = \lim _{r\rightarrow \infty } \bar{\mathcal {K}}(r) = 0\) due to (K1), (K6) and the assumption on f. To prove (3.6), we break \(\int _{{\mathbb {R}}^{d}}\mathcal {K}(x-y)f(y)dy\) into the following three parts with \(|x|>1\) and control them respectively by:

$$\begin{aligned} \int _{|y|>\frac{|x|}{2}, |x-y|\le 1}\mathcal {K}(x-y)f(y)dy \le \Vert \mathcal {K}\Vert _{L^1(B_1(0))} \bar{f}(|x|-1),\\ \int _{|y|>\frac{|x|}{2}, |x-y|> 1}\mathcal {K}(x-y)f(y)dy \le \bar{\mathcal {K}}(1) \int _{|y|>\frac{|x|}{2}} f(y)dy, \end{aligned}$$

and

$$\begin{aligned} \int _{|y| \le \frac{|x|}{2}}\mathcal {K}(x-y)f(y)dy \le \bar{\mathcal {K}}\left( \frac{|x|}{2}\right) \Vert f\Vert _{L^1}. \end{aligned}$$

Since all three parts tend to 0 as \(|x|\rightarrow \infty \), we obtain (3.6). To show (3.7), we use \(\mathcal {K}, f \ge 0\) to estimate

$$\begin{aligned} \begin{aligned} \int _{{\mathbb {R}}^{d}}\mathcal {K}(x-y)f(y)dy&\ge \int _{|y|\le 1}\mathcal {K}(x-y)f(y)dy \ge \bar{\mathcal {K}}(|x|+1) \int _{B_1(0)} f(y)dy \\&\ge \left( \frac{|x|+1}{|x|}\right) ^{-\alpha } \bar{\mathcal {K}}(|x|) \int _{B_1(0)} f(y)dy \ge c \mathcal {K}(x)\\&\quad \text { for any }|x|>1, \end{aligned} \end{aligned}$$

where we apply (K6) to obtain the third inequality, and in the last inequality we define \(c:= 2^{-\alpha } \int _{B_1(0)} f(y)dy>0\). \(\square \)

Using arguments similar to those in [28], we are able to derive the following result, which gives a natural form of the Euler–Lagrange equation associated to the functional \({\mathcal E}\):

Theorem 3.7

Assume that (3.1), (K1)–(K4) and either (K5) or (K6) hold. Let \(\rho _0\in \mathcal {Y_{M}}\) be a global minimizer of the free energy functional \({\mathcal E}\). Then for some positive constant \(\mathsf {D}[\rho _{0}]\), we have that \(\rho _0\) satisfies

$$\begin{aligned} \frac{m}{m-1}\rho _{0}^{m-1}+W*\rho _{0}= \mathsf {D}[\rho _{0}]\quad a.e. \text { in } \text{ supp }(\rho _{0}) \end{aligned}$$
(3.8)

and

$$\begin{aligned} \frac{m}{m-1}\rho _{0}^{m-1}+W*\rho _{0}\ge \mathsf {D}[\rho _{0}]\quad a.e. \text { outside } \text{ supp }(\rho _{0}) \end{aligned}$$

where

$$\begin{aligned} \mathsf {D}[\rho _{0}]=\frac{2}{M}\mathsf {G}[\rho _0]+\frac{m-2}{M(m-1)}\Vert \rho _{0}\Vert _{L^m({\mathbb {R}}^d)}^{m}. \end{aligned}$$

As a consequence, any global minimizer of \({\mathcal E}\) verifies

$$\begin{aligned} \frac{m}{m-1}\rho _{0}^{m-1}=\left( \mathsf {D}[\rho _{0}]-W*\rho _{0}\right) _{+}. \end{aligned}$$
(3.9)

We now turn to showing compactness of support and boundedness of the minimizers.

Lemma 3.8

Assume that (3.1), (K1)–(K4) and either (K5) or (K6) hold and let \(\rho _0\in \mathcal {Y}_{M}\) be a global minimizer of the free energy functional \({\mathcal E}\). Then \(\rho _0\) is compactly supported.

Proof

By Theorems 3.1 and 3.3, \(\rho _0\) is radially decreasing under either set of assumptions. In addition, under the assumption (K5), Lemma 3.5 gives that

$$\begin{aligned} \frac{(W*\rho _0)(x)}{W(x)} \rightarrow \Vert \rho _0\Vert _{L^1({\mathbb {R}}^d)} \text { as }|x|\rightarrow \infty , \end{aligned}$$

hence combining this with (K5) gives us \((W*\rho _0)(x) \rightarrow +\infty \) as \(|x|\rightarrow \infty \). This implies that the right hand side of (3.9) must have compact support, hence \(\rho _0\) must have compact support too.

Under the assumption (K6), towards a contradiction, suppose \(\rho _0\) does not have compact support. Then \(\rho _0\) must be strictly positive in \({\mathbb {R}}^d\) since it is radially decreasing. We can then write (3.8) as

$$\begin{aligned} \frac{m}{m-1} \rho _0^{m-1} - \mathcal {K}*\rho _0 = C \quad \text {a.e. in }{\mathbb {R}}^d \end{aligned}$$

for some \(C\in {\mathbb {R}}\), where \(\mathcal {K} := \ell -W\) is as given in (K6). Indeed, C must be equal to 0, since both \(\rho _0(x)\) and \((\mathcal {K}*\rho _0)(x)\) tend to 0 as \(|x|\rightarrow \infty \), where we used (3.6) on the latter convergence. Thus

$$\begin{aligned} \rho _0(x) = \left( \frac{m-1}{m}(\mathcal {K}*\rho _0)(x)\right) ^{\tfrac{1}{m-1}} \ge \Big (\frac{m-1}{m} c \mathcal {K}(x)\Big )^{1/(m-1)} \text { for a.e. }|x|>1,\nonumber \\ \end{aligned}$$
(3.10)

where we applied (3.7) to obtain the last inequality, with \(c:= 2^{-\alpha } \int _{B_1(0)} \rho _0(y)dy>0\). Due to the assumptions (3.2) and \(\alpha < d(m-1)\) in (K6), we have \(\int _{|x|>1} \mathcal {K}(x)^{1/(m-1)}dx = +\infty \). Combining this with (3.10) leads to \(\rho _0 \not \in L^1({\mathbb {R}}^d)\), a contradiction. \(\square \)
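For completeness, the divergence of \(\int _{|x|>1}\mathcal {K}(x)^{1/(m-1)}dx\) used at the end of the previous proof can be seen directly at the level of radial profiles: by (3.2) applied with \(\tau =|x|\ge 1\), we have \(\mathcal {K}(x)=\bar{\mathcal {K}}(|x|)\ge |x|^{-\alpha }\bar{\mathcal {K}}(1)\) for \(|x|\ge 1\), hence

$$\begin{aligned} \int _{|x|>1}\mathcal {K}(x)^{\frac{1}{m-1}}dx\ \ge \ \bar{\mathcal {K}}(1)^{\frac{1}{m-1}}\,\sigma _{d}\int _{1}^{\infty }r^{\,d-1-\frac{\alpha }{m-1}}\,dr\ =\ +\infty , \end{aligned}$$

where \(\sigma _{d}=|{\mathbb {S}}^{d-1}|\), since \(\bar{\mathcal {K}}(1)>0\) and \(d-1-\alpha /(m-1)>-1\) by the condition \(\alpha <d(m-1)\) in (K6).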

Lemma 3.9

Assume that (3.1), (K1)–(K4) and either (K5) or (K6) hold and let \(\rho _0\in \mathcal {Y}_{M}\) be a global minimizer of the free energy functional \({\mathcal E}\). Then \(\rho _0 \in L^\infty ({\mathbb {R}}^d)\).

Proof

By Theorems 3.1 and 3.3 and Lemma 3.8, \(\rho _0\) is radially decreasing and has compact support, say inside the ball \(B_R(0)\). Let us first concentrate on the proof under assumption (K5). For notational simplicity in this proof, we will denote by \(\Vert \rho _0\Vert _m\) the \(L^m({\mathbb {R}}^d)\)-norm of \(\rho _0\).

We will show that \(\rho _0 \in L^\infty ({\mathbb {R}}^d)\) by different arguments in several cases:

Case A: \(d\le 2\). Since \(\rho _0\) is supported in \(B_R(0)\), we can then find some \(C_w^1\) and \(C_w^2\), such that \(W \ge -C_w^1 \mathcal {N} - C_w^2\) in \(B_{2R}(0)\). Hence for any \(r<R\), we have

$$\begin{aligned} -(\rho _0*W)(r) \le -(\rho _0*(-C_w^1 \mathcal {N}- C_w^2))(r) \le C_w^1 (\rho _0*\mathcal {N})(r) + C_w^2 \Vert \rho _0\Vert _1, \end{aligned}$$

thus recalling (2.4)

$$\begin{aligned} (\rho _0*W^{-})(r)\le & {} C_w^1 (\rho _0*\mathcal {N})(r) + C_w^2 \Vert \rho _0\Vert _1+ (\rho _0*W^{+})(r)\\\le & {} C_w^1 (\rho _0*\mathcal {N})(r) + C_w^2 \Vert \rho _0\Vert _1+\widetilde{C}. \end{aligned}$$

Then by Eq. (3.9) it will be enough to show that the Newtonian potential \(\rho _0*\mathcal {N}\) is bounded in \(B_R(0)\) for \(d=1,2\). In \(d=1\), this is trivial. In \(d=2\), [50, Lemma 9.9] gives \(\rho _0*\mathcal {N}\in \mathcal {W}^{2,m}(B_R(0))\), and then Morrey's Theorem (see for instance [17, Corollary 9.15]) yields \(\rho _0*\mathcal {N}\in L^{\infty }(B_R(0))\).

Case B: \(d\ge 3\) and \(m> d/2\). In this case we get \(W^{-}\le C_{w}\,\mathcal {N}\) in the whole of \({\mathbb {R}}^{d}\) for some constant \(C_w\), so we have for \(r>0\)

$$\begin{aligned} (\rho _0*W^{-})(r) \le C_w (\rho _0*\mathcal {N})(r). \end{aligned}$$

Then using Sobolev’s embedding theorem again (see [17, Corollary 9.15]), we easily argue that for \(m>d/2\) we have \(\rho _0*W^{-}\in L^{\infty }_{loc}({\mathbb {R}}^{d})\), hence \(\rho _0 \in L^\infty ({\mathbb {R}}^d)\) by (3.9) again.

Case C: \(d\ge 3\) and \(2-\tfrac{2}{d}<m\le d/2\). We aim to prove that \(\rho _{0}(0)\) is finite, which is sufficient for the boundedness of \(\rho _0\) since \(\rho _0\) is radially decreasing. This is done by an inductive argument. To begin with, observe that since \(\rho _0\) is radially decreasing we have \(\rho _0(r)^m |B(0,r)| \le \Vert \rho _0\Vert _m^m < \infty ,\) which leads to the basis step of our induction

$$\begin{aligned} \rho _0(r) \le C(d, m, \Vert \rho _0\Vert _m) r^{-d/m} \text { for all }r>0. \end{aligned}$$

We set our first exponent \({\tilde{p}} =-d/m\). For the induction step, we claim that if \(\rho _0(r) \le C_1( 1+r^{p})\) with \(-d<p<0\), then it leads to the refined estimate

$$\begin{aligned} \rho _0(r) \le {\left\{ \begin{array}{ll} C_2 (1+r^{\frac{p+2}{m-1}}) &{}\text { if }p\ne -2\\ C_2 (1+|\log r|^{\frac{1}{m-1}}) &{} \text { if }p=-2, \end{array}\right. } \end{aligned}$$
(3.11)

where \(C_2\) depends on \(d,m, \rho _0, W\) and \(C_1\).

Indeed, taking into account (K2) and (K5), the compact support of \(\rho _0\), and the fact that \(\mathcal {N}> 0\) for \(d\ge 3\), we deduce that \(W \ge - C_{w,d} \,\mathcal {N}\) for some constant depending on W and d. As a result, we have, for \(r\in (0,1)\),

$$\begin{aligned} -(\rho _0*W)(r)\le & {} C_{w,d} (\rho _0*\mathcal {N})(r) \nonumber \\= & {} C_{w,d}\left( (\rho _0 * \mathcal {N})(1)- \int _r^1 \partial _r (\rho _0*\mathcal {N})(s) ds \right) .\qquad \qquad \end{aligned}$$
(3.12)

We can easily bound \((\rho _0 * \mathcal {N})(1)\) by some \(C(d,\Vert \rho _0\Vert _m)\). To control \( \int _r^1 \partial _r (\rho _0*\mathcal {N})(s) ds\), recall that

$$\begin{aligned} -\partial _r (\rho _0*\mathcal {N})(s) = \frac{M(s)}{|\partial B(0,s)|} = \frac{M(s)}{\sigma _d s^{d-1}}, \end{aligned}$$
(3.13)

where M(s) is the mass of \(\rho _0\) in B(0, s). By our induction assumption, we have

$$\begin{aligned} M(s) \le \int _0^s C_1( 1+t^{p}) \sigma _d t^{d-1} dt = C_1 \sigma _d \left( \frac{s^d}{d} + \frac{s^{d+p}}{d+p}\right) . \end{aligned}$$

Combining this with (3.13), we have

$$\begin{aligned} -\partial _r (\rho _0*\mathcal {N})(s) \le C_1 \left( \frac{s}{d} + \frac{s^{1+p}}{d+p}\right) , \end{aligned}$$

so we get, for \(p\ne -2\),

$$\begin{aligned} - \int _r^1 \partial _r (\rho _0*\mathcal {N})(s) ds\le C_{1}\left[ \frac{1}{2d}(1-r^{2})+\frac{1}{(d+p)(2+p)}(1-r^{2+p})\right] \,. \end{aligned}$$

Plugging it into the right hand side of (3.12) yields

$$\begin{aligned} -(\rho _0*W)(r) \le C(d,m,\Vert \rho _0\Vert _m, C_{w,d}, C_1)(1+r^{2+p}), \end{aligned}$$

and using this inequality in the Euler–Lagrange Eq. (3.9) leads to (3.11). Moreover, in the case \(p=-2\), we have instead the inequality

$$\begin{aligned} - \int _r^1 \partial _r (\rho _0*\mathcal {N})(s) ds\le C_{1}\left[ \frac{1}{2d}(1-r^{2})-\frac{1}{d-2}\log r\right] . \end{aligned}$$

Now we are ready to apply the induction starting at \({\tilde{p}}=-d/m\) to show \(\rho _0(0)<\infty \). We will show that after a finite number of iterations our induction arrives at

$$\begin{aligned} \rho _0(r) \le C (1+r^a) \end{aligned}$$
(3.14)

for some \(a>0\), which then implies that \(\rho _0(0) < \infty \). Let \(g(p) := \frac{p+2}{m-1}\), which is a linear function of p with positive slope, and let us denote \(g^{(n)}(p):= \underbrace{(g\circ g\dots \circ g)}_{n \text { iterations}}(p)\).

Subcase C.1: \(m=d/2\). In this case, we have \(\widetilde{p}=-2\) and by (3.11) we obtain

$$\begin{aligned} \rho _0(r) \le C_2 (1+|\log r|^{\frac{1}{m-1}})\le C_2 (1+r^{-1}) \end{aligned}$$

hence applying the first inequality in (3.11) for \(p=-1\) gives us (3.14) with \(a=1/(m-1)\).

Then it remains to consider the case \(m<d/2\). Notice that \(-d<\widetilde{p}<-2\). By (3.11) we get, for all \(r\in (0,1)\),

$$\begin{aligned} \rho _{0}(r)\le C_{2}(1+r^{g(\widetilde{p})}). \end{aligned}$$
(3.15)

Then we must consider three cases. We point out that in all the cases we need to discuss the possibility of \(g^{(n)}(p) = -2\) for some n: if this happens, the logarithmic case occurs again and the result follows in a final iteration step as in Subcase C.1.

Subcase C.2: \(m=2\) and \(m<d/2\). In this case, we have \(g(p) = p+2\), hence \( g^{(n)}(p)=p+2n, \) then

$$\begin{aligned} \lim _{n\rightarrow \infty } g^{(n)}(p) =+\infty . \end{aligned}$$

Therefore we have \(g^{(n)}(\widetilde{p}) > 0\) for some finite n, whence iterating (3.15) n times we find \(\rho _0(0)<\infty \).

Subcase C.3: \(m>2\) and \(m<d/2\). In this case, \(p=2/(m-2)\) is the only fixed point for the linear function g(p). For all \(p<\tfrac{2}{m-2}\) we have \(g(p)>p\) which implies \(g^{(n)}(p)>p\) for all \(n\in {\mathbb {N}}\). Notice that

$$\begin{aligned} g^{(n)}(p)=\frac{2}{m-2}+\frac{p(m-2)-2}{(m-2)(m-1)^{n}}, \end{aligned}$$
(3.16)
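Formula (3.16) follows by observing that g is affine with fixed point \(\frac{2}{m-2}\): indeed,

$$\begin{aligned} g(p)-\frac{2}{m-2}=\frac{1}{m-1}\left( p-\frac{2}{m-2}\right) ,\quad \text {hence}\quad g^{(n)}(p)-\frac{2}{m-2}=\frac{1}{(m-1)^{n}}\left( p-\frac{2}{m-2}\right) . \end{aligned}$$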

so the point \(p=2/(m-2)\) is attracting in the sense that

$$\begin{aligned} \lim _{n\rightarrow \infty } g^{(n)}(p) = \frac{2}{m-2}. \end{aligned}$$

Since \(\frac{2}{m-2}>0\), this again implies that \(g^{(n)}(p) > 0\) for some finite n. Choosing \(p=\widetilde{p}\), we thus have \(g^{(n)}(\widetilde{p}) > 0\) for some n, and (3.15) implies \(\rho _0(0)<\infty \) again.

Subcase C.4: \(m<\min (2,d/2)\). In this case, the only fixed point \(\frac{2}{m-2}\) is unstable, and we have \(g(p)>p\) for any \(p>\frac{2}{m-2}\), then by (3.16)

$$\begin{aligned} \lim _{n\rightarrow \infty } g^{(n)}(p) = +\infty \text { for any }p>\frac{2}{m-2}. \end{aligned}$$

Notice that \(\widetilde{p} >\frac{2}{m-2}\), since this condition reads \(m>2d/(d+2)\), a direct consequence of (3.1). Hence we again obtain \(g^{(n)}(\widetilde{p}) > 0\) for some finite n, which finishes the last case.
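As a purely illustrative side remark (not part of the argument), the finite number of iterations needed in Case C can also be checked numerically: the following short Python sketch iterates the bootstrap map \(g(p)=(p+2)/(m-1)\) from \(\widetilde{p}=-d/m\) for a few sample pairs (d, m) in the admissible range, handling the logarithmic case \(p=-2\) as in Subcase C.1.

```python
# Illustrative sketch (not from the paper): iterate the bootstrap map
# g(p) = (p + 2)/(m - 1) of Case C starting from p_tilde = -d/m and count
# how many induction steps are needed until the exponent becomes positive,
# i.e. until (3.14) holds and rho_0(0) < infinity follows.

def iterations_until_positive(d: float, m: float, max_iter: int = 10_000) -> int:
    p = -d / m                       # basis step of the induction
    for n in range(1, max_iter + 1):
        if p == -2:                  # logarithmic case of (3.11): bound by r^{-1}
            p = -1.0
        p = (p + 2.0) / (m - 1.0)    # induction step g(p)
        if p > 0.0:
            return n
    raise RuntimeError("exponent never became positive")

# sample pairs with d >= 3 and 2 - 2/d < m <= d/2 (Case C)
for d, m in [(5, 2.0), (6, 2.5), (8, 2.2), (10, 1.9)]:
    print(f"d={d}, m={m}: positive exponent after {iterations_until_positive(d, m)} step(s)")
```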

Let us finally turn back to the proof when we assume (K6) instead of (K5). Notice that the proof of Case C can also be carried out as soon as the potential W satisfies the bound \(W \ge - C_{w,d} (1+ \,\mathcal {N})\) for some \(C_{w,d}>0\). This is trivially true regardless of the dimension if the potential satisfies (K6) instead of (K5). \(\square \)

Finally, it is interesting to derive some regularity properties of a minimizer \(\rho _{0}\), as in [28]. Since W may not be the classical Newtonian kernel, we are led to prove suitable regularity for the W-potential \(\psi _{\rho _{0}}(x)\), which can then be transferred to \(\rho _{0}\) via equation (3.8) on the support of \(\rho _{0}\). Note that (3.9) ensures that \(\rho _{0}\) satisfies equation (2.1) in the sense of distributions: indeed, as shown in (2.2)–(2.4), we find that \(\psi _{\rho _{0}}\in {\mathcal W}^{1,\infty }_{loc}({\mathbb {R}}^{d})\); thus we can take gradients on both sides of the Euler–Lagrange condition (3.9), and multiplying by \(\rho _0\) and writing \(\rho _0\nabla \rho _0^{m-1}=\tfrac{m-1}{m}\nabla \rho _0^{m}\) we reach (2.1). Now, using the regularity arguments of the proof of Lemma 2.3 again, together with the compact support property, we finally have \(\rho _{0}\in \mathcal {C}^{0,\alpha }({\mathbb {R}}^{d})\) with \(\alpha =1/(m-1)\).

We can summarize all the results in this section in the following theorem.

Theorem 3.10

In the diffusion dominated regime (3.1), assume that conditions (K1)–(K4) and either (K5) or (K6) hold. Then for any positive mass M, there exists a global minimizer \(\rho _{0}\) of the free energy functional \({\mathcal E}\) (2.5) defined in \(\mathcal {Y}_{M}\), which is radially symmetric, decreasing, compactly supported, Hölder continuous, and a stationary solution of (1.1) in the sense of Definition 2.1.

Putting together the previous theorem with the uniqueness of radial stationary solutions for the attractive Newtonian potential proved in [28, 61], we obtain the following result.

Corollary 3.11

In the particular case of the attractive Newtonian potential \(W(x)=-\mathcal {N}(x)\), modulo the addition of a constant, the global minimizer obtained in Theorem 3.10 is unique among all stationary solutions in the sense of Definition 2.1.

3.2 Some remarks about the minimization of energies with a potential term

The aim of this subsection is to generalize the previous results of Sect. 3.1 to free energy functionals involving a potential energy term, namely

$$\begin{aligned} {\mathcal E}[\rho ]&=\frac{1}{m-1}\int _{{\mathbb {R}}^{d}}\rho ^{m}\,dx+\frac{1}{2}\int _{{\mathbb {R}}^{d}}\int _{{\mathbb {R}}^{d}}W(x-y)\,\rho (x)\rho (y)\,dx\,dy\\&\quad +\int _{{\mathbb {R}}^{d}}V(x)\rho (x)\,dx, \end{aligned}$$

defined over the same admissible set \(\mathcal {Y}_{M}\), for some \(\mathcal {C}^{1}\) non-negative radially increasing potential \(V=V(r)\), where \(r=|x|\), such that

$$\begin{aligned} \lim _{r\rightarrow +\infty }V(r)=+\infty . \end{aligned}$$

In this framework, the functional \({\mathcal E}\) might be infinite for some densities \(\rho \). The presence of the confinement potential V then allows us to prove the following generalization of Theorems 3.1 and 3.3, where no asymptotic behavior at infinity is needed for the radial profile \(\omega (r)\) of the kernel W:

Theorem 3.12

Assume that (3.1) and (K1)–(K4) hold. Then the conclusions of Theorems 3.1 and 3.3 remain true.

Proof

We first observe that by Remark 2.7 and Lemma 3.2 we can restrict to radially decreasing densities. Moreover, following the lines of the proof of Theorem 3.3 we find that \({\mathcal E}\) is bounded from below and

$$\begin{aligned} {\mathcal E}[\rho ]\ge -C_{1}+C_{2}\Vert \rho \Vert ^{m}_{L^{m}({\mathbb {R}}^{d})}+\int _{{\mathbb {R}}^{d}}V(x)\rho (x)dx. \end{aligned}$$

This inequality easily implies the mass confinement of any minimizing sequence \(\left\{ \rho _{n}\right\} \), that is for some constant \(C>0\)

$$\begin{aligned} \sup _{n\in {\mathbb {N}}}\int _{|x|>R}\rho _{n}(x)dx\le \frac{C}{V(R)} \end{aligned}$$

for all sufficiently large \(R>0\). In particular, the sequence \(\left\{ \rho _{n}\right\} \) is tight, and by Prokhorov's Theorem (see [3, Theorem 5.1.3]) we obtain that (up to a subsequence) \(\left\{ \rho _{n}\right\} \) converges to a certain density \(\rho \in L^{1}_{+}({\mathbb {R}}^{d})\cap L^{m}({\mathbb {R}}^{d})\), \(\Vert \rho \Vert _{L^{1}({\mathbb {R}}^{d})}=M\), with respect to the narrow topology. Then [3, Lemma 5.1.7] ensures the lower semicontinuity of the potential energies of \(\left\{ \rho _{n}\right\} \), that is

$$\begin{aligned} \liminf _{n\rightarrow \infty }\int _{{\mathbb {R}}^{d}}V(x)\rho _{n}(x)dx\ge \int _{{\mathbb {R}}^{d}}V(x)\rho (x)dx. \end{aligned}$$

This implies that the infimum of \({\mathcal E}\) is achieved at a radially decreasing density \(\rho \in \mathcal {Y}_{M}\). In order to check that all the global minimizers are radially decreasing, we pick any minimizer \(\rho \in \mathcal {Y_{M}}\) and use Remark 2.7 and Lemma 3.2 to see that

$$\begin{aligned} {\mathcal E}[\rho ]={\mathcal E}[\rho ^{\#}], \end{aligned}$$

thus

$$\begin{aligned} \mathcal {I}[\rho ]-\mathcal {I}[\rho ^{\#}]=\int _{{\mathbb {R}}^{d}}V(x)(\rho ^{\#}-\rho )dx\le 0 \end{aligned}$$

then the equality case in Lemma 3.2 yields the conclusion. \(\square \)

We have the following generalization of Theorem 3.7:

Theorem 3.13

Assume that (3.1) and (K1)–(K4) hold. Let \(\rho _0\in \mathcal {Y_{M}}\) be a global minimizer of the free energy functional \({\mathcal E}\). Then for some positive constant \(\mathsf {D}[\rho _{0}]\), we have that \(\rho _0\) satisfies

$$\begin{aligned} \frac{m}{m-1}\rho _{0}^{m-1}+W*\rho _{0}+V(x)= \mathsf {D}[\rho _{0}]\quad a.e. \text { in } \text{ supp }(\rho _{0}) \end{aligned}$$
(3.17)

and

$$\begin{aligned} \frac{m}{m-1}\rho _{0}^{m-1}+W*\rho _{0}+V(x)\ge \mathsf {D}[\rho _{0}]\quad a.e. \text { outside } \text{ supp }(\rho _{0}). \end{aligned}$$

As a consequence, any global minimizer of \({\mathcal E}\) verifies

$$\begin{aligned} \frac{m}{m-1}\rho _{0}^{m-1}=\left( \mathsf {D}[\rho _{0}]-W*\rho _{0}-V(x)\right) _{+}. \end{aligned}$$

The compact support property of the minimizers then follows from (3.17) and Lemmas 3.5 and 3.6. Moreover, it is straightforward to check that Lemma 3.9 continues to hold, as well as Theorem 3.10.

4 Long-time asymptotics

We now consider the particular case of (1.1) given by the Keller–Segel model in two dimensions with nonlinear diffusion,

$$\begin{aligned} \partial _t \rho= & {} \Delta \rho ^m-\nabla \cdot (\rho \nabla {\mathcal N}*\rho )\,, \end{aligned}$$
(4.1)

where \(m>1\) and the logarithmic interaction kernel is defined as

$$\begin{aligned} {\mathcal N}(x)=-\frac{1}{2\pi }\log |x|\,. \end{aligned}$$

This system is also referred to as the parabolic-elliptic Keller–Segel system with nonlinear diffusion, since the attracting potential \(c={\mathcal N}*\rho \) solves the Poisson equation \(-\Delta c=\rho \). It corresponds exactly to the range of diffusion dominated cases as discussed in [23], since solutions do not blow up and are globally bounded. We will show, based on the uniqueness part of Sect. 2, that not only do the solutions to (4.1) exist globally and remain uniformly bounded in time in \(L^\infty \), but they also stabilize in time towards the unique stationary state for any given initial mass.

The main tool for analyzing stationary states and the existence of solutions to the evolutionary problem is again the following free energy functional

$$\begin{aligned} {\mathcal E}[\rho ](t)=\int _{{\mathbb {R}}^2} \frac{\rho ^m}{m-1}dx+\frac{1}{4\pi }\int _{{\mathbb {R}}^2} \int _{{\mathbb {R}}^2}{\log } |x-y| \rho (x)\rho (y) dx\,dy\,.\nonumber \\ \end{aligned}$$
(4.2)

A simple formal differentiation shows that \({\mathcal E}\) is decreasing in time along the evolution corresponding to (4.1), namely

$$\begin{aligned} \frac{d}{dt}{\mathcal E}[\rho ](t)=-{\mathcal D}[\rho ](t) \end{aligned}$$

which gives rise to the following (free) energy–energy dissipation inequality for weak solutions

$$\begin{aligned} {\mathcal E}[\rho ](t) +\int _0^t {\mathcal D}[\rho ] d\tau \le {\mathcal E}[\rho _0] \end{aligned}$$
(4.3)

for non-negative initial data \(\rho _0(x) \in L^1((1+{\log }(1+|x|^2))dx)\cap L^m({\mathbb {R}}^2)\). The entropy dissipation is given by

$$\begin{aligned} {\mathcal D}[\rho ]=\int _{{\mathbb {R}}^2}\rho |\nabla h[\rho ]|^2dx \,, \end{aligned}$$

where here and in the following we use the notation

$$\begin{aligned} h[\rho ]=\frac{m}{m-1}\rho ^{m-1}-{\mathcal N}*\rho \,. \end{aligned}$$

We note that h corresponds to \(\frac{\delta {\mathcal E}}{\delta \rho }\) and that in particular the evolution Eq. (4.1) can be written as \(\partial _t \rho = \nabla \cdot (\rho \nabla h[\rho ])\). Thus, this equation has the structure of a gradient flow of the free energy functional in the sense of probability measures, see [2, 9, 11, 33] and the references therein.
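To make the gradient-flow structure concrete, the following short Python sketch (purely illustrative: a one-dimensional toy with a smooth stand-in kernel \(K(s)=e^{-s^{2}/2}\) in place of \({\mathcal N}\), not the two-dimensional model analyzed in this section) evolves a density with an explicit finite-volume discretization of \(\partial _t \rho = \nabla \cdot (\rho \nabla h[\rho ])\).

```python
# Toy illustration (not from the paper): explicit finite-volume scheme for the
# 1d analogue of the gradient-flow form  d/dt rho = d/dx ( rho d/dx h[rho] ),
# with h[rho] = m/(m-1) rho^{m-1} - K*rho and a smooth stand-in kernel
# K(s) = exp(-s^2/2) replacing the 2d logarithmic kernel N of the paper.
import numpy as np

L, N, m, dt, steps = 10.0, 200, 1.5, 1e-4, 5000
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = x[1] - x[0]
rho = np.exp(-x**2)
rho /= rho.sum() * dx                                  # normalize to unit mass

shift = (np.arange(N) * dx + L / 2.0) % L - L / 2.0    # periodic displacements
K_hat = np.fft.fft(np.exp(-shift**2 / 2.0))            # smooth attracting kernel

for _ in range(steps):
    conv = dx * np.real(np.fft.ifft(K_hat * np.fft.fft(rho)))   # (K*rho)(x_k)
    h = m / (m - 1.0) * rho**(m - 1.0) - conv                   # h = delta E / delta rho
    dh = (np.roll(h, -1) - h) / dx                              # dh/dx at cell faces
    flux = 0.5 * (rho + np.roll(rho, -1)) * dh                  # rho * dh/dx at faces
    rho = rho + dt * (flux - np.roll(flux, 1)) / dx             # conservative update
    rho = np.maximum(rho, 0.0)

print("mass after evolution:", rho.sum() * dx)
```

The conservative (flux-form) update mirrors the continuity-equation structure of (4.1), so the total mass is preserved up to the positivity clipping in the last step.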

In the next subsection we first prove the global well-posedness of weak solutions satisfying the energy inequality (4.3), as well as global uniform-in-time estimates for the solutions. In the second subsection, we use these uniform-in-time estimates together with the uniqueness of the stationary states proved in Sect. 2 to derive the main result of this section regarding the long time asymptotics for (4.1).

4.1 Global well-posedness of the Cauchy problem

In this section we analyze the existence and uniqueness of a bounded global weak solution for initial data in \(L^1_{log}({\mathbb {R}}^2)\cap L^\infty ({\mathbb {R}}^2)\), where here and in the following we denote

$$\begin{aligned} L^1_{log}({\mathbb {R}}^2)= L^1((1+{\log }(1+|x|^2))dx)\,. \end{aligned}$$

Assuming a sufficiently regular solution with a uniformly bounded gradient of the chemotactic potential, Kowalczyk [63] derived a priori bounds in \(L^\infty \) with respect to space and time for the Keller–Segel model with nonlinear diffusion on bounded domains. These a priori estimates have been improved and extended to the whole space by Calvez and Carrillo in [23]. We shall demonstrate here how the a priori estimates of [23] can be made rigorous when starting from an appropriately regularized equation, leading to the following theorem.

Theorem 4.1

(Properties of weak solutions) For any non-negative initial data \(\rho _0\in L^1_{log}({\mathbb {R}}^2)\cap L^\infty ({\mathbb {R}}^2)\), there exists a unique global weak solution \(\rho \) to (4.1), which satisfies the energy inequality (4.3) with the energy being bounded from above and below in the sense that

$$\begin{aligned} {\mathcal E}_*\le {\mathcal E}[\rho ](t)\le {\mathcal E}[\rho _0] \end{aligned}$$

for some (negative) constant \({\mathcal E}_*\). In particular \(\rho \) is uniformly bounded in space and time

$$\begin{aligned} \sup _{t\ge 0}\Vert \rho (t,\cdot )\Vert _{L^\infty ({\mathbb {R}}^2)}\le C\, , \end{aligned}$$

where C depends only on the initial data. Moreover the log-moment grows at most linearly in time

$$\begin{aligned} N(t)=\int _{{\mathbb {R}}^2}{\log }(1+|x|^2)\rho (t,x)dx\le N(0)+C t\,, \end{aligned}$$

where again C depends only on the initial data.

We shall also state the following result for radial initial data, which was obtained in [65] and [61] for higher dimensions and the Newtonian potential. Similar methods can be applied in the case \(d=2\) considered here:

Theorem 4.2

(Properties of radial solutions) Let \(\rho _0 \in L^1_{log}({\mathbb {R}}^2)\cap L^\infty ({\mathbb {R}}^2)\) be non-negative and radially symmetric.

  1. (a)

    Then the corresponding unique weak solution of (4.1) remains radially symmetric for all \(t>0\).

  2. (b)

    If \(\rho _0\) is compactly supported, then the solution remains compactly supported for all \(t>0\).

  3. (c)

    If \(\rho _0\) is moreover monotonically decreasing, then the solution remains radially decreasing for all \(t>0\).

In the remainder of this section we carry out the proof of the existence of a bounded global weak solution to (4.1) as stated in Theorem 4.1. We therefore introduce the following regularization of (4.1)

$$\begin{aligned} \partial _t \rho _\varepsilon =\Delta (\rho _\varepsilon ^m + \varepsilon \rho _\varepsilon ) -\nabla \cdot (\rho _\varepsilon \nabla {\mathcal N}_{\varepsilon } *\rho _\varepsilon )\,, \end{aligned}$$
(4.4)

where \(m>1\) and the regularized logarithmic interaction potential is defined as

$$\begin{aligned} {\mathcal N}_\varepsilon (x)=-\frac{1}{4\pi }\log (|x|^2+\varepsilon ^2)\,. \end{aligned}$$

Moreover we have for the derivatives

$$\begin{aligned} \nabla {\mathcal N}_\varepsilon =-\frac{1}{2\pi }\frac{x}{|x|^2+\varepsilon ^2}\,,\qquad \Delta {\mathcal N}_\varepsilon =-\frac{1}{\pi }\frac{\varepsilon ^2}{\left( |x|^2+\varepsilon ^2\right) ^2}=-J_\varepsilon \end{aligned}$$

satisfying

$$\begin{aligned} \Vert J_\varepsilon \Vert _{L^1({\mathbb {R}}^2)}=1\,. \end{aligned}$$
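This normalization follows from a direct computation in polar coordinates:

$$\begin{aligned} \Vert J_\varepsilon \Vert _{L^1({\mathbb {R}}^2)}=\frac{1}{\pi }\int _{{\mathbb {R}}^2}\frac{\varepsilon ^2}{(|x|^2+\varepsilon ^2)^2}\,dx =2\varepsilon ^2\int _0^\infty \frac{r}{(r^2+\varepsilon ^2)^2}\,dr =2\varepsilon ^2\cdot \frac{1}{2\varepsilon ^2}=1\,. \end{aligned}$$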

The regularization in (4.4) was used by Bian and Liu [8], who studied the Keller–Segel equation with nonlinear diffusion and the Newtonian potential for \(d\ge 3\); here it has been modified accordingly for the logarithmic interaction kernel in \(d=2\). The additional linear diffusion term in (4.4) removes the degeneracy, and the regularized logarithmic potential \({\mathcal N}_\varepsilon \) possesses a uniformly bounded gradient, so that the local well-posedness of (4.4) is a standard result for any \(\varepsilon >0\). We shall note that a slightly different regularization for such nonlinear diffusion Keller–Segel type equations has been introduced by Sugiyama in [80], which also yields the existence and uniqueness of a global weak solution. The advantage of the regularization in (4.4), resembling the one in [8], is the fact that the regularized problem satisfies a free energy inequality that in the limit gives exactly (4.3), whereas in [80] the dissipation term could only be retained with a factor of 3/4.

We point out that in the case \(d=2\) different a priori estimates are available than in higher space dimensions, leading to a proof of global well-posedness of the Cauchy problem for (4.4) and of the limit \(\varepsilon \rightarrow 0\) that differs from [8].

4.1.1 Global well posedness of the regularized Cauchy problem

To derive a priori estimates for the regularized problem (4.4) we use the iterative method of Kowalczyk [63], based on employing test functions that are powers of \(\rho _{\varepsilon ,k}=(\rho _{\varepsilon }-k)_+\) for some \(k>0\). Testing (4.4) against \(p\rho _{\varepsilon ,k}^{p-1}\) for any \(p\ge 2\), we obtain:

$$\begin{aligned}&\frac{d}{dt}\int _{{\mathbb {R}}^2}\rho ^{p}_{\varepsilon ,k}dx\nonumber \\&= -\frac{4(p-1)}{p}\int _{{\mathbb {R}}^2}(m \rho _\varepsilon ^{m-1}+\varepsilon ) |\nabla \rho ^{\frac{p}{2}}_{\varepsilon ,k}|^2dx\nonumber \\&\quad + p\int _{{\mathbb {R}}^2}(\rho _{\varepsilon ,k}+k)(\nabla {\mathcal N}_\varepsilon *\rho _\varepsilon )\cdot \nabla \rho _{\varepsilon ,k}^{p-1}dx \end{aligned}$$
(4.5)
$$\begin{aligned}&\le -\frac{4(p-1)}{p}m\int _{{\mathbb {R}}^2}\rho _\varepsilon ^{m-1} |\nabla \rho ^{\frac{p}{2}}_{\varepsilon ,k}|^2dx\nonumber \\ {}&\quad + \int _{{\mathbb {R}}^2}(\nabla {\mathcal N}_\varepsilon *\rho _\varepsilon )\cdot ((p-1)\nabla \rho _{\varepsilon ,k}^{p} + k p \nabla \rho _{\varepsilon ,k}^{p-1})dx\nonumber \\ {}&\le -\frac{4(p-1)}{p}mk^{m-1}\Vert \nabla \rho _{\varepsilon ,k}^{\frac{p}{2}}\Vert ^2_{L^2} + \int _{{\mathbb {R}}^2} J_\varepsilon *\rho _{\varepsilon }((p-1)\rho _{\varepsilon ,k}^{p}+k p \rho _{\varepsilon ,k}^{p-1} )dx\nonumber \\ {}&\le -\frac{4(p-1)}{p}mk^{m-1}\Vert \nabla \rho _{\varepsilon ,k}^{\frac{p}{2}}\Vert ^2_{L^2} + \int _{{\mathbb {R}}^2} (J_\varepsilon *\rho _{\varepsilon ,k} + k)((p-1)\rho _{\varepsilon ,k}^{p}+k p \rho _{\varepsilon ,k}^{p-1} ) dx \nonumber \\ {}&\le -\frac{4(p-1)}{p}mk^{m-1}\Vert \nabla \rho _{\varepsilon ,k}^{\frac{p}{2}}\Vert ^2_{L^2} + C(p-1) \int _{{\mathbb {R}}^2} \rho ^{p+1}_{\varepsilon ,k} dx \nonumber \\ {}&\quad + C k p \int _{{\mathbb {R}}^2} \rho _{\varepsilon ,k}^p dx + k^2 p \int _{{\mathbb {R}}^2} \rho _{\varepsilon ,k}^{p-1} dx\,, \end{aligned}$$
(4.6)

where for estimating the integrals involving convolution terms we used the inequality

$$\begin{aligned} \int f(x) (g*h) (x) dx\le & {} C \Vert f\Vert _{L^p}\Vert g\Vert _{L^q}\Vert h\Vert _{L^r},\nonumber \\&\quad \frac{1}{p}+\frac{1}{q}+\frac{1}{r}=2, \quad \text {where} \ p,q,r\ge 1\,, \end{aligned}$$
(4.7)

see e.g. Lieb and Loss [64]. Closing the estimate (4.6) would yield an estimate for \(\rho _{\varepsilon ,k}\) in \(L^\infty (0,T;L^p({\mathbb {R}}^2))\) and thus also for \(\rho _\varepsilon \in L^\infty (0,T;L^p({\mathbb {R}}^2))\), since

$$\begin{aligned} \int _{{\mathbb {R}}^2} \rho ^p_\varepsilon dx \le&\, k^{p-1}\int _{\{\rho _\varepsilon <k\}}\rho _\varepsilon dx+\int _{\{\rho _\varepsilon \ge k\}}(\rho _\varepsilon -k)^p dx +C(p,k)\int _{\{\rho _\varepsilon \ge k\}}k dx \nonumber \\ \le&\int _{{\mathbb {R}}^2}\rho _{\varepsilon ,k}^p dx + (k^{p-1}+C(p,k))M\,. \end{aligned}$$
(4.8)

Kowalczyk proceeded from (4.5) with the assumption corresponding to \(\Vert \nabla {\mathcal N}_\varepsilon *\rho _{\varepsilon }\Vert _{L^\infty }\le C\). Observe that it would be sufficient to prove \(\rho _\varepsilon \in L^\infty (0,T;L^p({\mathbb {R}}^2))\) for some \(p>2\), implying \(\Delta {\mathcal N}_\varepsilon *\rho _\varepsilon \in L^\infty (0,T;L^p({\mathbb {R}}^2))\) and hence the uniform boundedness of the gradient term by Sobolev embedding. Calvez and Carrillo [23] circumvent this assumption and derive the bound by using an equi-integrability property in the inequality (4.6). Hence, in order to be able to follow the ideas of [23] for the regularized problem, we need to derive the corresponding energy inequality for the latter.

Proposition 4.3

For any finite time \(T>0\) the solution \(\rho _\varepsilon \) to the Cauchy problem (4.4) supplemented with initial data \(\rho _{0} \in L^1_{log}({\mathbb {R}}^2)\cap L^\infty ({\mathbb {R}}^2)\) satisfies the energy inequality

$$\begin{aligned} {{\mathcal {E}}}_\varepsilon [\rho _\varepsilon ](t) + \int _0^t{\mathcal D}_\varepsilon [\rho _\varepsilon ](\tau )d\tau \le {{\mathcal {E}}}_\varepsilon [\rho _{0}]+\varepsilon C (1+t)t\,, \end{aligned}$$
(4.9)

for a positive constant \(C=C(M,\Vert \rho _{0}\Vert _{\infty })\) and \(0\le t\le T\), where \({{\mathcal {E}}}_\varepsilon \) is an approximation of the free energy functional in (4.2):

$$\begin{aligned} {{\mathcal {E}}}_\varepsilon [\rho _\varepsilon ] =\int _{{\mathbb {R}}^2}\left( \frac{\rho _\varepsilon ^m}{m-1}-\frac{\rho _\varepsilon }{2} {\mathcal N}_\varepsilon *\rho _\varepsilon \right) dx \end{aligned}$$

and \({\mathcal D}_\varepsilon \) the corresponding dissipation

$$\begin{aligned} {\mathcal D}_\varepsilon [\rho _\varepsilon ](t){=}\int _{{\mathbb {R}}^2}\rho _\varepsilon |\nabla h_\varepsilon [\rho _\varepsilon ]|^2 dx\,\quad \text {with} \quad h_\varepsilon [\rho _\varepsilon ]=\frac{m}{m-1}\rho _\varepsilon ^{m-1}{-}{\mathcal N}_\varepsilon *\rho _\varepsilon \,. \end{aligned}$$

In particular, we obtain equi-integrability

$$\begin{aligned} \lim _{k\rightarrow \infty }\sup _{t\in [0,T]}\int _{{\mathbb {R}}^2}(\rho _\varepsilon -k)_+ dx =0 \,. \end{aligned}$$

Remark 4.4

Note that due to the \(\varepsilon \Delta \rho _\varepsilon \) regularization term in (4.4), its associated energy functional actually includes an extra term \(\varepsilon \int \rho _\varepsilon \log \rho _\varepsilon \) compared to \(\mathcal {E}_{\varepsilon }\). However, in this proposition we choose to obtain an energy inequality for \(\mathcal {E}_\varepsilon \) (rather than for the actual associated energy functional), since the absence of the extra term \(\varepsilon \int \rho _{\varepsilon } \log \rho _{\varepsilon }\) will make it easier for us to obtain a priori estimates independent of \(\varepsilon \) later.

Proof

Testing (4.4) with \(\frac{m}{m-1}\rho _\varepsilon ^{m-1}-{\mathcal N}_\varepsilon *\rho _\varepsilon \) we obtain

$$\begin{aligned} \frac{d}{dt}{{\mathcal {E}}}_\varepsilon (t) + \int _{{\mathbb {R}}^{2}}\rho _{\varepsilon }&\left| \frac{m}{m-1}\nabla \rho _\varepsilon ^{m-1}-\nabla {\mathcal N}_\varepsilon *\rho _\varepsilon \right| ^2dx +\varepsilon \frac{4}{m}\int _{{\mathbb {R}}^2}\left| \nabla \rho _\varepsilon ^{\frac{m}{2}}\right| ^2dx \\&=\varepsilon \int _{{\mathbb {R}}^2} \nabla {\mathcal N}_\varepsilon *\rho _\varepsilon \cdot \nabla \rho _\varepsilon dx =\varepsilon \int _{{\mathbb {R}}^2} \rho _\varepsilon (J_\varepsilon *\rho _\varepsilon ) dx \le \varepsilon \Vert \rho _\varepsilon \Vert ^2_{L^2({\mathbb {R}}^2)}\,, \end{aligned}$$

where we have used (4.7) and the fact that \(\Vert J_\varepsilon \Vert _{L^1({\mathbb {R}}^2)}=1\). Hence we need to derive an a priori bound for \(\rho _\varepsilon \) in \(L^2({\mathbb {R}}^2)\). We use the estimate (4.6) for \(p=2\) and bound \(\int _{{\mathbb {R}}^2}\rho _{\varepsilon ,k}^3 dx\) using the Gagliardo–Nirenberg inequality (see for instance [49, 74]) as follows:

$$\begin{aligned} \Vert \rho _{\varepsilon ,k}\Vert ^3_{L^3({\mathbb {R}}^2)}&\le C\Vert \nabla \rho _{\varepsilon ,k}\Vert _{L^2({\mathbb {R}}^2)}^2\Vert \rho _{\varepsilon ,k}\Vert _{L^1({\mathbb {R}}^2)}\\&\le C M\Vert \nabla \rho _{\varepsilon ,k}\Vert _{L^2({\mathbb {R}}^2)}^2. \end{aligned}$$
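As a quick sanity check of the exponents (not needed for the argument), both sides of this Gagliardo–Nirenberg inequality behave identically under the rescaling \(f_{a,\lambda }(x)=af(\lambda x)\):

$$\begin{aligned} \Vert f_{a,\lambda }\Vert ^3_{L^3({\mathbb {R}}^2)}=a^3\lambda ^{-2}\Vert f\Vert ^3_{L^3({\mathbb {R}}^2)}\,,\qquad \Vert \nabla f_{a,\lambda }\Vert _{L^2({\mathbb {R}}^2)}^2\,\Vert f_{a,\lambda }\Vert _{L^1({\mathbb {R}}^2)}=a^3\lambda ^{-2}\,\Vert \nabla f\Vert _{L^2({\mathbb {R}}^2)}^2\,\Vert f\Vert _{L^1({\mathbb {R}}^2)}\,. \end{aligned}$$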

Then by (4.6) and interpolation of the \(L^2\)-integral, we have

$$\begin{aligned} \frac{d}{dt}\int _{{\mathbb {R}}^2}\rho ^{2}_{\varepsilon ,k}dx&\le -2mk^{m-1}\Vert \nabla \rho _{\varepsilon ,k}\Vert ^{2}_{L^{2}} + C \int _{{\mathbb {R}}^2} \rho _{\varepsilon ,k}^3\, dx + 3k^2 \int _{{\mathbb {R}}^2} \rho _{\varepsilon ,k}dx\nonumber \\&\le -(2mk^{m-1}-CM)\Vert \nabla \rho _{\varepsilon ,k}\Vert ^{2}_{L^{2}} +Ck^2M. \end{aligned}$$

Hence, choosing k large enough, recalling \(m>1\) and estimate (4.8), we can conclude by integrating in time that

$$\begin{aligned} \Vert \rho _\varepsilon (t,\cdot )\Vert _{L^2({\mathbb {R}}^2)}^2\le C (1+t)\, \end{aligned}$$

for some constant \(C=C(M,\Vert \rho _{0}\Vert _{L^\infty ({\mathbb {R}}^2)})\), which implies the stated energy inequality.

In order to obtain a priori bounds, and in particular the equi-integrability property, we also need to bound the energy functional from below. The only difference from the corresponding energy functional for the original model (4.1) lies in the regularized interaction kernel. Since clearly for all \(x\in {\mathbb {R}}^2\) we have \({\log }(|x|^2+\varepsilon ^2)\ge 2{\log }|x| \), we obtain

$$\begin{aligned} {{\mathcal {E}}}_\varepsilon [\rho _\varepsilon ]&= \int _{{\mathbb {R}}^2} \frac{\rho _\varepsilon ^{m}}{m-1}dx + \frac{1}{8\pi }\int _{{\mathbb {R}}^2} \int _{{\mathbb {R}}^2} \rho _\varepsilon (x) {\log }(|x-y|^2+\varepsilon ^2) \rho _\varepsilon (y) dx dy\\&\ge \int _{{\mathbb {R}}^2} \frac{\rho _\varepsilon ^{m}}{m-1}dx + \frac{1}{4\pi }\int _{{\mathbb {R}}^2} \int _{{\mathbb {R}}^2} \rho _\varepsilon (x) {\log }|x-y| \rho _\varepsilon (y) dx dy \ = \ {{\mathcal {E}}}[\rho _\varepsilon ] \end{aligned}$$

Following [23] we can estimate further using the logarithmic Hardy–Littlewood–Sobolev inequality

$$\begin{aligned} {{\mathcal {E}}}_\varepsilon [\rho _{\varepsilon }]\ge {{\mathcal {E}}}[\rho _{\varepsilon }]\ge -\frac{M}{8\pi }C(M)+\int _{{\mathbb {R}}^{2}}\Theta (\rho _{\varepsilon })\,dx\,, \end{aligned}$$
(4.10)

where C(M) is a constant depending on the mass M and

$$\begin{aligned} \Theta (\rho ):=\frac{\rho ^{m}}{m-1}-\frac{M}{8\pi }\rho \log \rho . \end{aligned}$$

Now it is easy to verify there is a constant \(\kappa =\kappa (m,M)>1\) for which

$$\begin{aligned} \Theta (\rho )\ge 0\quad \text { for }\rho \ge \kappa , \end{aligned}$$

so that

$$\begin{aligned} \int _{{\mathbb {R}}^{2}}\Theta ^{-}(\rho _{\varepsilon })dx=\int _{1\le \rho _{\varepsilon }\le \kappa } \Theta ^{-}(\rho _{\varepsilon })dx\le \frac{M^{2}}{8\pi }\log \kappa \,, \end{aligned}$$

implying in particular

$$\begin{aligned} {{\mathcal {E}}}_\varepsilon [\rho _\varepsilon ]\ge {{\mathcal {E}}}[\rho _\varepsilon ]\ge -\frac{M}{8\pi }C(M)-\frac{M^2}{8\pi }{\log } \kappa =:{\mathcal E}_*\,. \end{aligned}$$
(4.11)

We therefore find from (4.9), (4.10) and (4.11) that

$$\begin{aligned} \int _{{\mathbb {R}}^{2}}\Theta ^{+}(\rho _{\varepsilon }(t))dx\le C+\varepsilon CT^2\,, \end{aligned}$$

with \(C =C(m,\Vert \rho _0\Vert _{L^1({\mathbb {R}}^2)}, \Vert \rho _0\Vert _{L^\infty ({\mathbb {R}}^2)})\) being a constant independent of t. Since \(\Theta ^{+}\) is superlinear at infinity, we obtain the equi-integrability as in Theorem 5.3 in [23]. \(\square \)
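The existence of such a \(\kappa \) only uses that \(\rho ^{m-1}\) dominates \(\log \rho \) for large \(\rho \). The following short sketch (purely illustrative and not part of the argument; the function name and the sample values of \(m\) and \(M\) are our own) computes the smallest admissible \(\kappa \) for given \(m\) and \(M\) from the definition of \(\Theta \) above.

```python
import numpy as np
from scipy.optimize import brentq

def kappa(m, M):
    """Smallest kappa >= 1 with Theta(rho) >= 0 for all rho >= kappa, where
    Theta(rho) = rho**m/(m-1) - (M/(8*pi)) * rho*log(rho).  Illustrative only."""
    a = M / (8 * np.pi)
    # For rho > 0: Theta(rho) >= 0  <=>  g(rho) := rho**(m-1)/(m-1) - a*log(rho) >= 0.
    g = lambda r: r**(m - 1) / (m - 1) - a * np.log(r)
    r_star = a**(1.0 / (m - 1))          # unique critical point of g; g increases beyond it
    if g(max(r_star, 1.0)) >= 0:         # g never becomes negative on [1, infinity)
        return 1.0
    hi = 2.0 * r_star
    while g(hi) < 0:                     # bracket the larger zero of g
        hi *= 2.0
    return brentq(g, r_star, hi)

if __name__ == "__main__":
    for m, M in [(2.0, 100.0), (1.5, 100.0), (3.0, 200.0)]:
        print(f"m = {m}, M = {M}: kappa = {kappa(m, M):.4f}")
```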

The equi-integrability from Proposition 4.3 allows us to close the estimate (4.6) analogously to Lemma 3.1 of [23], leading to a bound for \(\rho _{\varepsilon }\) in \(L^\infty (0,T;L^p({\mathbb {R}}^2))\). Moreover, using the Moser iteration of Lemma 3.2 in [23] we finally get a bound for \(\rho _{\varepsilon }\) in \(L^\infty (0,T;L^\infty ({\mathbb {R}}^2))\). In order to avoid mass loss at infinity, the boundedness of the second moment of the solution is typically employed. Here, however, we demonstrate that a bound on the log-moment provides sufficient compactness, with the advantage of fewer restrictions on the initial data. For the regularized problem we therefore denote

$$\begin{aligned} N_\varepsilon (t)=\int _{{\mathbb {R}}^2}{\log }(1+|x|^2)\rho _\varepsilon (t,x) dx\,. \end{aligned}$$

The following lemma is now obtained following the ideas of [23]:

Lemma 4.5

The solution \(\rho _\varepsilon \) to (4.4) with non-negative initial data \(\rho _{0}\in L^1_{log}({\mathbb {R}}^2)\cap L^\infty ({\mathbb {R}}^2)\) satisfies for any \(T>0\):

$$\begin{aligned} \sup _{t\in [0,T]}\Vert \rho _\varepsilon (t,\cdot )\Vert _{L^\infty ({\mathbb {R}}^2)} + \Vert N_\varepsilon (t)\Vert _{L^\infty (0,T)} \le C(1+T+\varepsilon T^2)\,, \end{aligned}$$

where the constant C depends on the initial data.

Proof

Computing formally the evolution of the log-moment in (4.4) in a similar fashion to [26], we find for the test function \(\phi (x)={\log } (1+|x|^2)\) after integrating by parts

$$\begin{aligned} \frac{d}{dt}N_{\varepsilon }= & {} \int _{{\mathbb {R}}^2} \partial _t\rho _\varepsilon \, \phi dx =-\int _{{\mathbb {R}}^{2}}\rho _\varepsilon \nabla h_\varepsilon [\rho _\varepsilon ]\cdot \nabla \phi dx + \varepsilon \int _{{\mathbb {R}}^2}\rho _\varepsilon \Delta \phi dx \nonumber \\\le & {} \frac{1}{2}\int _{{\mathbb {R}}^{2}}\rho _\varepsilon | \nabla \phi |^2 dx + \frac{1}{2}\int _{{\mathbb {R}}^{2}}\rho _\varepsilon |\nabla h_\varepsilon [\rho _\varepsilon ]|^2dx + \varepsilon \int _{{\mathbb {R}}^2}\rho _\varepsilon \Delta \phi dx\,. \end{aligned}$$

Computing the derivatives of \(\phi \) we see

$$\begin{aligned} |\nabla \phi |=\left| \frac{2 x}{1+|x|^2}\right| \le 1\,,\qquad |\Delta \phi | = \frac{4}{(1+|x|^2)^2}\le 4\,. \end{aligned}$$

We thus obtain

$$\begin{aligned} \frac{d}{dt}N_{\varepsilon }\le \frac{1}{2}((1+8\varepsilon )M+ {{\mathcal {D}}}_\varepsilon [\rho _\varepsilon ])\,. \end{aligned}$$

Integrating in time and making use of the energy–energy dissipation inequality (4.9) and the uniform bound on \({\mathcal E}_\varepsilon \) from below in (4.11) gives

$$\begin{aligned} N_\varepsilon (t)\le N_\varepsilon (0)+\frac{1}{2}(1+8\varepsilon )M t + {\mathcal E}_\varepsilon (\rho _0) - {\mathcal {E}}_*+\varepsilon C(1+t)t \le C(1+ t+\varepsilon t^2) \end{aligned}$$

The argument can easily be made rigorous by using compactly supported approximations of \(\phi \) on \({\mathbb {R}}^2\) as test functions, see e.g. [13]. The proof is concluded by referring to Lemma 3.2 in [23] for the uniform boundedness of \(\rho _\varepsilon \). \(\square \)
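The two pointwise bounds on \(\nabla \phi \) and \(\Delta \phi \) used in the proof above can also be reproduced symbolically and checked numerically; the following short sketch (illustrative only, assuming numpy and sympy are available) does so.

```python
import numpy as np
import sympy as sp

x, y = sp.symbols('x y', real=True)
phi = sp.log(1 + x**2 + y**2)

# |grad phi|^2 = 4(x^2+y^2)/(1+x^2+y^2)^2 <= 1 and  Delta phi = 4/(1+x^2+y^2)^2 <= 4
grad_sq = sp.simplify(sp.diff(phi, x)**2 + sp.diff(phi, y)**2)
lap = sp.simplify(sp.diff(phi, x, 2) + sp.diff(phi, y, 2))
print(grad_sq, lap)

# numerical check of the pointwise bounds on a grid
g = sp.lambdify((x, y), grad_sq, 'numpy')
l = sp.lambdify((x, y), lap, 'numpy')
X, Y = np.meshgrid(np.linspace(-50, 50, 801), np.linspace(-50, 50, 801))
print(np.sqrt(g(X, Y)).max() <= 1.0, np.abs(l(X, Y)).max() <= 4.0)  # True True
```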

Remark 4.6

  1. (i)

    The fact that the uniform bound of \(\rho _\varepsilon \) grows linearly with time originates from the term of order \(\varepsilon \) in the energy inequality for the regularized equation. Hence the bound on the energy, and therefore the modulus of equi-integrability for the regularized problem, depend on time. However, for the limiting equation (4.1) this term vanishes and the energy is decaying for all times, which allows one to deduce uniform boundedness of the solution to (4.1) globally in time and space, see also [23, Lemma 5.7].

  2. (ii)

    The log-moment of \(\rho _\varepsilon \) grows at most linearly in time, and the same is true for the limiting function. Hence it is only possible to guarantee confinement of mass for finite times. This property, which allows for compactness results, will be used in the following to pass to the limit in the regularized problem. Due to the growth of the bound with time it cannot be employed for the long-time behavior, so different methods will be required there.

4.1.2 The limit \(\varepsilon \rightarrow 0\)

In order to deduce the global well-posedness of the Cauchy problem for (4.1) it remains to carry out the limit \(\varepsilon \rightarrow 0\). Knowing that the solution remains uniformly bounded and having the bounds from the energy inequality, we obtain weak convergence properties of the solution. In order to pass to the limit with the nonlinearities and in the entropy inequality, strong convergence results will be required. The following lemma summarizes the uniform bounds we obtain from Proposition 4.3 and Lemma 4.5:

Lemma 4.7

Let \(\rho _\varepsilon \) be the solution as in Proposition 4.3. Then we have the following bounds, uniformly in \(\varepsilon \):

$$\begin{aligned}&\Vert \rho _\varepsilon \Vert _{L^\infty (0,T;L_{log}^1({\mathbb {R}}^2))}+\Vert \rho _\varepsilon \Vert _{L^\infty ((0,T)\times {\mathbb {R}}^2)}\le C\,,\\&\Vert \sqrt{\rho _\varepsilon }\nabla {\mathcal N}_\varepsilon *\rho _\varepsilon \Vert _{L^2((0,T)\times {\mathbb {R}}^2)} + \Vert \nabla {\mathcal N}_\varepsilon *\rho _\varepsilon \Vert _{L^\infty ((0,T)\times {\mathbb {R}}^2)}\\&\quad + \sqrt{\varepsilon }\Vert \nabla \rho _\varepsilon \Vert _{L^2((0,T)\times {\mathbb {R}}^2)} \le C\,,\\&\Vert \partial _t \rho _\varepsilon \Vert _{L^2(0,T;H^{-1}({\mathbb {R}}^2))}+\Vert \rho ^{q}_\varepsilon \Vert _{L^2(0,T;H^1({\mathbb {R}}^2))}\le C \qquad \text {for any} \ \ q\ge m-\frac{1}{2}\,, \end{aligned}$$

where C depends on \(m, q, \rho _0\) and T.

Proof

The uniform bounds of the \(L_{log}^1({\mathbb {R}}^2)\)- and \(L^\infty ({\mathbb {R}}^2)\)-norms follow from the conservation of mass and Lemma 4.5. The convolution term

$$\begin{aligned} \nabla {\mathcal N}_\varepsilon *\rho _\varepsilon = -\frac{1}{2\pi }\int _{{\mathbb {R}}^2}\frac{y-x}{|y-x|^2+\varepsilon ^2}\rho _\varepsilon (t,y) dy \end{aligned}$$

can be estimated as follows:

$$\begin{aligned} 2\pi |\nabla {\mathcal N}_\varepsilon *\rho _\varepsilon |\le \Vert \rho _\varepsilon \Vert _{L^\infty ({\mathbb {R}}^2)}\int _{|x-y|\le 1}\frac{1}{|x-y|}dy + M \le C\,. \end{aligned}$$
(4.12)

The bound of \(\sqrt{\rho _\varepsilon } \nabla {\mathcal N}_\varepsilon *\rho _\varepsilon \) in \(L^2((0,T)\times {\mathbb {R}}^2)\) now follows easily by using the conservation of mass.

The basic \(L^2\)-estimate corresponding to (4.5) for \(p=2\) and \(k=0\) implies after integration in time

$$\begin{aligned} \frac{1}{2}\int _{{\mathbb {R}}^2}\rho _\varepsilon ^2 dx\le & {} \frac{1}{2}\int _{{\mathbb {R}}^2}\rho _0^2 dx - \varepsilon \int _0^T\int _{{\mathbb {R}}^2} |\nabla \rho _\varepsilon |^2 dxdt\\&- m\int _0^T\int _{{\mathbb {R}}^2}\rho _\varepsilon ^{m-1} |\nabla \rho _\varepsilon |^2 dxdt \\&\qquad + \int _0^T\int _{{\mathbb {R}}^2}({J_\varepsilon }*\rho _\varepsilon ) \rho _\varepsilon ^2 \, dxdt \,. \end{aligned}$$

Using the above a priori estimates and the inequality in (4.7), we can further bound

$$\begin{aligned} \varepsilon \Vert \nabla \rho _\varepsilon \Vert ^2_{L^2((0,T)\times {\mathbb {R}}^2)}\le & {} \frac{1}{2}\int _{{\mathbb {R}}^2}\rho _0^2 dx+ C\int _0^T\int _{{\mathbb {R}}^2}\rho _\varepsilon ^3 dxdt \le C\,. \end{aligned}$$

Since \(m>1\), the conservation of mass and the uniform boundedness of \(\rho _\varepsilon \) give a uniform bound for \(\rho _\varepsilon ^{m-1/2}\) in \(L^2((0,T)\times {\mathbb {R}}^2)\). For the gradient we now use the bound on the entropy dissipation from (4.9):

$$\begin{aligned} \Vert \nabla \rho ^{m-1/2}_\varepsilon \Vert ^2_{L^2((0,T)\times {\mathbb {R}}^2)}\le & {} 2\frac{(m-1/2)^{2}}{m^{2}}\left( \int _0^T{\mathcal D}_\varepsilon [\rho _\varepsilon ]dt + \Vert \sqrt{\rho _\varepsilon }\nabla {\mathcal N}_\varepsilon *\rho _\varepsilon \Vert ^2_{L^2((0,T)\times {\mathbb {R}}^2)}\right) \nonumber \\&\le C + C\Vert \nabla {\mathcal N}_\varepsilon *\rho _\varepsilon \Vert ^{2}_{L^\infty ((0,T)\times {\mathbb {R}}^2)}MT\,. \end{aligned}$$
(4.13)

The bound for \(\nabla \rho _\varepsilon ^q\) follows easily by rewriting

$$\begin{aligned} \frac{m-\frac{1}{2}}{q}\,\nabla \rho _\varepsilon ^q=\rho _\varepsilon ^{q-m+\frac{1}{2}}\nabla \rho _\varepsilon ^{m-\frac{1}{2}} \end{aligned}$$

and using the uniform boundedness of \(\rho _\varepsilon \).

It now remains to derive the estimate for the time derivative. Using the previous estimates we have, for any test function \(\phi \in L^2(0,T;H^1({\mathbb {R}}^2))\),

$$\begin{aligned}&\left| \int _0^T\int _{{\mathbb {R}}^2}\partial _t \rho _\varepsilon \phi \, dx\,dt\right| \le \int _0^T\int _{{\mathbb {R}}^2}\left| \nabla (\rho _\varepsilon ^m +\varepsilon \rho _\varepsilon )\cdot \nabla \phi \right| \, dx\,dt \\&\qquad \qquad \qquad \qquad \qquad \qquad \quad + \int _0^T\int _{{\mathbb {R}}^2}\left| \rho _\varepsilon (\nabla {\mathcal N}_\varepsilon *\rho _\varepsilon )\cdot \nabla \phi \right| \, dx\,dt\\&\quad \le \left( \Vert \nabla \rho _\varepsilon ^m\Vert _{L^2((0,T)\times {\mathbb {R}}^2)}+ \varepsilon \Vert \nabla \rho _\varepsilon \Vert _{L^2((0,T)\times {\mathbb {R}}^2)}\right. \\&\qquad \left. +\Vert \sqrt{\rho _\varepsilon }|\nabla {\mathcal N}_\varepsilon *\rho _\varepsilon |\Vert _{L^\infty ((0,T)\times {\mathbb {R}}^2)}\sqrt{TM}\right) \Vert \nabla \phi \Vert _{L^2((0,T)\times {\mathbb {R}}^2)} \\&\quad \le C\Vert \nabla \phi \Vert _{L^2((0,T)\times {\mathbb {R}}^2)}\,\,. \end{aligned}$$

\(\square \)

We now use these bounds to derive convergence properties. The Dubinskii Lemma (see Lemma 4.23 in the “Appendix”) can be applied to obtain strong convergence locally in space, which can be extended to global strong convergence using the boundedness of the log-moment.

Lemma 4.8

Let \(\rho _\varepsilon \) be the solution as in Proposition 4.3. Then, up to a subsequence,

$$\begin{aligned} \rho _\varepsilon \ \rightarrow \ \rho \qquad \text {in} \quad L^{q}((0,T)\times {\mathbb {R}}^2)\qquad \text {for any} \ \ 1\le q<\infty \,, \end{aligned}$$
(4.14)
$$\begin{aligned} \rho ^{p}_\varepsilon \ \rightharpoonup \ \rho ^{p} \qquad \text {in} \quad L^2(0,T;H^1({\mathbb {R}}^2))\qquad \text {for any} \ \ m-\tfrac{1}{2}\le p<\infty \,, \end{aligned}$$
(4.15)
$$\begin{aligned} \sqrt{\rho _\varepsilon }\ \rightarrow \ \sqrt{\rho } \qquad \text {in} \quad L^2((0,T)\times {\mathbb {R}}^2)\,, \end{aligned}$$
(4.16)
$$\begin{aligned} \varepsilon \nabla \rho _\varepsilon \ \rightarrow \ 0 \qquad \text {in} \quad L^2((0,T)\times {\mathbb {R}}^2;{\mathbb {R}}^{2})\,. \end{aligned}$$
(4.17)

Proof

Since the family \(\{\rho _\varepsilon \}_\varepsilon \) is uniformly bounded in \(L^q((0,T)\times {\mathbb {R}}^2)\) for any \(1\le q\le \infty \), the reflexivity of the Lebesgue spaces for \(1<q<\infty \) gives, up to a subsequence, the weak convergence

$$\begin{aligned} \rho _\varepsilon \rightharpoonup \rho \qquad \text {in} \ \ L^q((0,T)\times {\mathbb {R}}^2) \quad \text {for any } \ 1<q<\infty \,. \end{aligned}$$
(4.18)

Moreover due to the uniform bounds from Lemma 4.7

$$\begin{aligned} \Vert \partial _t \rho _\varepsilon \Vert _{L^2(0,T;H^{-1}({\mathbb {R}}^2))}+\Vert \rho ^{r}_\varepsilon \Vert _{L^2(0,T;H^1({\mathbb {R}}^2))}\le C \end{aligned}$$

for any \(r\ge m-\frac{1}{2}\), we can apply the Dubinskii Lemma stated in the “Appendix” to derive

$$\begin{aligned} \rho _\varepsilon \rightarrow \rho \qquad \text {in} \quad L^{r}((0,T)\times B_{R}(0))\qquad \text {for any} \ \ 2m\le r<\infty \ \text { and any } R>0 \,. \end{aligned}$$

The boundedness of the log-moment \(N_\varepsilon (t)\) allows us to extend the strong convergence to the whole space, since for any \(1\le q<\infty \) we have

$$\begin{aligned} \int _0^T\int _{|x|>R} \rho _\varepsilon ^q dx dt&\le \Vert \rho _\varepsilon \Vert ^{q-1}_{L^\infty ((0,T)\times {\mathbb {R}}^2)}\int _0^T\int _{|x|>R}\frac{{\log }(1+|x|^2)}{{\log } (1+R^2)}\rho _\varepsilon dx dt\\&\le \frac{C(1+T)}{{\log } (1+R^2)} \rightarrow 0 \,, \end{aligned}$$

as \(R\rightarrow \infty \). Due to the weak lower semi-continuity of the \(L^q\)-norm we can now conclude with (4.18) that also

$$\begin{aligned} \int _{|x|>R}\rho ^q(t,x) dx\le \liminf _{\varepsilon \rightarrow 0}\int _{|x|>R}\rho _\varepsilon ^q(t,x) dx \rightarrow 0 \qquad \text {as} \ R\rightarrow \infty \quad \text {for all} \ q\ge 1\,. \end{aligned}$$

Hence we can extend the strong convergence locally in space to strong convergence in \({\mathbb {R}}^2\):

$$\begin{aligned} \rho _\varepsilon \rightarrow \rho \qquad \text {in} \quad L^{r}((0,T)\times {\mathbb {R}}^2)\qquad \text {for any} \ \ 2m\le r<\infty \,. \end{aligned}$$

Additionally the strong convergence in \(L^1((0,T)\times {\mathbb {R}}^2)\) can be deduced using the bound from the energy as stated in Lemma 4.22 in the “Appendix”. Interpolation now yields (4.14).

The weak convergence of \( \rho _\varepsilon ^{m-1/2}\) in \(L^2(0,T;H^1({\mathbb {R}}^2))\) holds due to its uniform boundedness given by inequality (4.13) and the reflexivity of the latter space, where the limit is identified by a density argument. Due to the uniform boundedness of \(\rho _\varepsilon \) this assertion can be extended to any finite power bigger than \(m-1/2\).

Since moreover \(\sqrt{\rho _\varepsilon }\) is uniformly bounded in \(L^2((0,T)\times {\mathbb {R}}^2)\) we have the weak convergence towards \(\sqrt{\rho }\) in \(L^2((0,T)\times {\mathbb {R}}^2)\), where again the limit is identified by using the a.e. convergence of \(\rho _\varepsilon \) from the strong convergence above. To see (4.16) we rewrite

$$\begin{aligned} \Vert \sqrt{\rho _\varepsilon }-\sqrt{\rho }\Vert ^2_{L^2((0,T)\times {\mathbb {R}}^2)}= & {} \int _0^T\int _{{\mathbb {R}}^2}(\rho _\varepsilon -2\sqrt{\rho _\varepsilon }\,\sqrt{\rho } +\rho )dx\,dt\\= & {} \int _0^T\int _{{\mathbb {R}}^2}(\rho _\varepsilon -\rho )dx\,dt\\&- 2\int _0^T\int _{{\mathbb {R}}^2}\sqrt{\rho }\,(\sqrt{\rho _\varepsilon } -\sqrt{\rho } )dx\,dt\,. \end{aligned}$$

The first integral vanishes and the second one converges to 0 due to the weak convergence of \(\sqrt{\rho _\varepsilon }\rightharpoonup \sqrt{\rho }\) in \(L^2((0,T)\times {\mathbb {R}}^2)\).

Finally the convergence in (4.17) is a direct consequence of the bound \(\sqrt{\varepsilon }\Vert \nabla \rho _\varepsilon \Vert _{L^2((0,T)\times {\mathbb {R}}^2)}\le C\) in Lemma 4.7. \(\square \)

These convergence results from Lemma 4.8 are sufficient to obtain the weak convergence of the nonlinearities \(\sqrt{\rho _\varepsilon }\nabla h_\varepsilon [\rho _\varepsilon ]\) and \(\rho _\varepsilon \nabla h_\varepsilon [\rho _\varepsilon ]\) in \(L^2((0,T)\times {\mathbb {R}}^2)\), which allows us to pass to the limit in the weak formulation and to deduce the weak lower semicontinuity of the entropy dissipation term:

Lemma 4.9

Let \(\rho _\varepsilon \) and \(\rho \) be as in Lemma 4.8. Then

$$\begin{aligned} \sqrt{\rho _\varepsilon }\,\nabla h_\varepsilon [\rho _\varepsilon ]\rightharpoonup \sqrt{\rho }\,\nabla h[\rho ] \qquad \text {in} \ L^2((0,T)\times {\mathbb {R}}^2;{\mathbb {R}}^{2})\, \end{aligned}$$
(4.19)
$$\begin{aligned} \rho _\varepsilon \,\nabla h_\varepsilon [\rho _\varepsilon ]\rightharpoonup \rho \,\nabla h[\rho ] \qquad \text {in} \ L^2((0,T)\times {\mathbb {R}}^2;{\mathbb {R}}^{2})\,. \end{aligned}$$
(4.20)

Proof

Due to (4.15) and (4.16) it remains to verify

$$\begin{aligned} \sqrt{\rho _\varepsilon }\,\nabla {\mathcal N}_\varepsilon *\rho _\varepsilon \rightharpoonup \sqrt{\rho }\,\nabla {\mathcal N}*\rho \quad \ \text {in} \quad L^2((0,T)\times {\mathbb {R}}^2;{\mathbb {R}}^{2})\,. \end{aligned}$$

Due to Lemma 4.7, we have the weak convergence of \(\sqrt{\rho _\varepsilon }\,\nabla {\mathcal N}_\varepsilon *\rho _\varepsilon \) in \(L^2((0,T)\times {\mathbb {R}}^2;{\mathbb {R}}^{2})\). In order to identify the limit we consider, for a test function \(\phi \in L^2((0,T)\times {\mathbb {R}}^2;{\mathbb {R}}^{2})\):

$$\begin{aligned}&\int _0^T\int _{{\mathbb {R}}^2}(\sqrt{\rho _\varepsilon }\,\nabla {\mathcal N}_\varepsilon *\rho _\varepsilon -\sqrt{\rho }\,\nabla {\mathcal N}*\rho )\cdot \phi \, dx dt \nonumber \\&\qquad =\int _0^T\int _{{\mathbb {R}}^2}(\sqrt{\rho _\varepsilon }-\sqrt{\rho })(\nabla {\mathcal N}_\varepsilon *\rho _\varepsilon )\cdot \phi \, dx dt \nonumber \\&\qquad \quad + \int _0^T\int _{{\mathbb {R}}^2}\sqrt{\rho }(\nabla ( {\mathcal N}_\varepsilon - {\mathcal N})*\rho _\varepsilon )\cdot \phi \, dx dt\nonumber \\&\qquad \quad +\int _0^T\int _{{\mathbb {R}}^2}\sqrt{\rho }\,\nabla {\mathcal N}*(\rho _\varepsilon -\rho )\cdot \phi \, dx dt \end{aligned}$$
(4.21)

The first term converges to zero using (4.16), since by (4.12) it is bounded by

$$\begin{aligned}&\Vert \sqrt{\rho _\varepsilon }-\sqrt{\rho }\Vert _{L^2((0,T)\times {\mathbb {R}}^2)}\Vert \nabla {\mathcal N}_\varepsilon *\rho _\varepsilon \Vert _{L^\infty ((0,T)\times {\mathbb {R}}^2)}\Vert \phi \Vert _{L^2((0,T)\times {\mathbb {R}}^2)}\\&\quad \le C\Vert \sqrt{\rho _\varepsilon }-\sqrt{\rho }\Vert _{L^2((0,T)\times {\mathbb {R}}^2)}\rightarrow 0\,. \end{aligned}$$

For the second term we first use the Cauchy–Schwarz inequality

$$\begin{aligned}&\int _0^T\int _{{\mathbb {R}}^2}\sqrt{\rho }(\nabla ({\mathcal N}_\varepsilon - {\mathcal N})*\rho _\varepsilon )\cdot \phi \, dx dt \\&\quad \le \sqrt{MT}\Vert \phi \Vert _{L^2((0,T)\times {\mathbb {R}}^2)}\Vert \nabla ({\mathcal N}_\varepsilon -{\mathcal N})*\rho _\varepsilon \Vert _{L^\infty ((0,T)\times {\mathbb {R}}^2)}\,. \end{aligned}$$

To see that this convolution term vanishes we bound further

$$\begin{aligned}&|\nabla ({\mathcal N}_\varepsilon -{\mathcal N})*\rho _\varepsilon |=\left| \int _{{\mathbb {R}}^2}\left( \frac{x-y}{|x-y|^2+\varepsilon ^2}-\frac{x-y}{|x-y|^2}\right) \rho _\varepsilon (y)dy\right| \\&\quad \le \varepsilon ^2\Vert \rho _\varepsilon \Vert _{L^\infty ({\mathbb {R}}^2)}\int _{{\mathbb {R}}^2}\frac{|x-y|}{(|x-y|^2+\varepsilon ^2)|x-y|^2}dy\\&\quad =\varepsilon C\int _0^\infty \frac{1}{s^2+1}ds \le \varepsilon C\rightarrow 0 \end{aligned}$$

uniformly in \(x\) and \(t\), where we substituted \(s = |x-y|/\varepsilon \). For the remaining term in (4.21) we proceed by changing the order of integration, where we again suppress the dependence of \(\rho _\varepsilon \) and \(\phi \) on \(t\) in the following:

$$\begin{aligned}&\int _0^T\int _{{\mathbb {R}}^2}\sqrt{\rho }\,(\nabla {\mathcal N}*(\rho _\varepsilon -\rho ))\cdot \phi \, dx dt \\&\quad =\frac{1}{2\pi }\int _0^T\int _{{\mathbb {R}}^2}\left( \sqrt{\rho _\varepsilon (y)}-\sqrt{\rho (y)}\right) \left( \sqrt{\rho _\varepsilon (y)}+\sqrt{\rho (y)}\right) \\&\qquad \left( \int _{{\mathbb {R}}^2} \sqrt{\rho (x)}\frac{x-y}{|x-y|^2}\cdot \phi (x) dx\right) dydt\\&\quad \le \frac{1}{2\pi }\left\| \sqrt{\rho _\varepsilon }-\sqrt{\rho }\right\| _{L^2((0,T)\times {\mathbb {R}}^2)}\left\| \left( \sqrt{\rho _\varepsilon (\cdot )}+\sqrt{\rho (\cdot )}\right) \right. \\&\qquad \left. \left( \int _{{\mathbb {R}}^2} \sqrt{\rho (x)}\frac{1}{|x-\cdot |}|\phi (x)| dx\right) \right\| _{L^2((0,T)\times {\mathbb {R}}^2)} \end{aligned}$$

To prove that this integral vanishes in the limit, due to (4.16) it suffices to show that

$$\begin{aligned} \int _0^T \int _{{\mathbb {R}}^2} \left( \left( \sqrt{\rho _\varepsilon (y)}+\sqrt{\rho (y)}\right) \int _{{\mathbb {R}}^2} \sqrt{\rho (x)}\frac{1}{|x-y|}|\phi (x)| dx \right) ^2 dydt \le C \,. \end{aligned}$$

We shall therefore split the integral into two parts and consider first

$$\begin{aligned}&\int _0^T \int _{{\mathbb {R}}^2} \left( \left( \sqrt{\rho _\varepsilon (y)}+\sqrt{\rho (y)}\right) \int _{|x-y|\le 1} \sqrt{\rho (x)}\frac{1}{|x-y|}|\phi (x)| dx \right) ^2 dydt\\&\le 2 \int _0^T \int _{{\mathbb {R}}^2} (\rho _\varepsilon (y)+\rho (y))\left( \int _{|x-y|\le 1}|\phi |^2(x) \frac{1}{|x-y|}dx\right) \\&\quad \left( \int _{|x-y|\le 1}\rho (x) \frac{1}{|x-y|}dx\right) \, dy dt\\&\le C\Vert \rho \Vert _{L^\infty ((0,T)\times {\mathbb {R}}^2)}\left( \Vert \rho \Vert _{L^\infty ((0,T)\times {\mathbb {R}}^2)}+\Vert \rho _\varepsilon \Vert _{L^\infty ((0,T)\times {\mathbb {R}}^2)}\right) \\&\quad \int _0^T \int _{{\mathbb {R}}^2}\int _{|x-y|\le 1} \frac{|\phi |^2(x)}{|x-y|}dy dx dt \\&\le C \int _0^T \int _{{\mathbb {R}}^2}|\phi |^{2}(x)\left( \int _{|x-y|\le 1} \frac{1}{|x-y|}dy\right) dx dt \ \le \ C\Vert \phi \Vert _{L^2((0,T)\times {\mathbb {R}}^2)}^2 \end{aligned}$$

It remains to bound the integral for \(|x-y|>1\):

$$\begin{aligned}&\int _0^T \int _{{\mathbb {R}}^2} \left( \left( \sqrt{\rho _\varepsilon (y)}+\sqrt{\rho (y)}\right) \int _{|x-y|> 1} \sqrt{\rho (x)}\frac{1}{|x-y|}|\phi (x)| dx \right) ^2 dydt\\&\le 2 \int _0^T\Vert \phi \Vert _{L^2({\mathbb {R}}^2)}^2 \int _{{\mathbb {R}}^{2}} \left( \rho _\varepsilon (y)+\rho (y)\right) \int _{|x-y|>1}\rho (x) \frac{1}{|x-y|^2}dxdy dt\\&\le 2M \int _0^T\Vert \phi \Vert _{L^2({\mathbb {R}}^2)}^2\int _{{\mathbb {R}}^{2}} \left( \rho _\varepsilon (y)+\rho (y)\right) dydt\ = \ 4M^{2}\Vert \phi \Vert ^2_{L^{2}((0,T)\times {\mathbb {R}}^{2})}. \end{aligned}$$

\(\square \)

Proof of Theorem 4.1

The convergence property of the nonlinearity in (4.20) and the weak convergence of the time derivative due to Lemma 4.7 allow us to pass to the limit in the weak formulation of the Cauchy problem for (4.1), where the linear diffusion term vanishes due to (4.17). The uniqueness of the solution follows from Theorem 1.3 and Corollary 6.1 of [32]; we shall not go into further detail here.

It thus remains to pass to the limit in the energy inequality. Since the energy dissipation is weakly lower semicontinuous due to (4.19), we get

$$\begin{aligned} \int _0^T{\mathcal D}[\rho ](t)dt\le \liminf _{\varepsilon \rightarrow 0}\int _0^T{\mathcal D}_\varepsilon [\rho _\varepsilon ](t)dt\,. \end{aligned}$$

In order to obtain the energy inequality (4.3) in the limit \(\varepsilon \rightarrow 0\) it thus remains to show \({{\mathcal {E}}}_\varepsilon [\rho _\varepsilon ](t) \rightarrow {\mathcal E}[\rho ](t)\,\) for \(t\in [0,T]\). Lemma 4.22 and the uniform bounds on \(\rho _\varepsilon \) in Lemma 4.7 directly imply the strong convergence of \(\rho _\varepsilon \) in \(L^\infty (0,T;L^m({\mathbb {R}}^2))\). It therefore remains to prove the convergence of the convolution term, which we rewrite as

$$\begin{aligned}&-4\pi \int _{{\mathbb {R}}^2} (\rho _\varepsilon {\mathcal N}_\varepsilon *\rho _\varepsilon -\rho {\mathcal N}*\rho ) dx =\int _{{\mathbb {R}}^2}\int _{{\mathbb {R}}^2}\rho _\varepsilon (x)\rho _\varepsilon (y){\log }\frac{|x-y|^2+\varepsilon ^2}{|x-y|^2} dxdy\\&\qquad \qquad + 2\int _{{\mathbb {R}}^2}\int _{{\mathbb {R}}^2}\left( \rho _\varepsilon (x)(\rho _\varepsilon (y)-\rho (y))+\rho (y)(\rho _\varepsilon (x)-\rho (x))\right) {\log } |x-y| dxdy\,. \end{aligned}$$

We split the domain of integration and first analyze the case \(|x-y|\ge 1\). In this domain, we get

$$\begin{aligned}&\int _{{\mathbb {R}}^2}\int _{|x-y|\ge 1}\rho _\varepsilon (x)\rho _\varepsilon (y){\log } \frac{|x-y|^2+\varepsilon ^2}{|x-y|^2} dxdy\\&\quad \le \int _{{\mathbb {R}}^2}\int _{|x-y|\ge 1}\rho _\varepsilon (x)\rho _\varepsilon (y){\log } (1+\varepsilon ^2) dxdy\le \varepsilon ^2M^2\,, \end{aligned}$$

and thus it converges to zero as \(\varepsilon \rightarrow 0\). Using the Cauchy–Schwarz inequality, we obtain moreover

$$\begin{aligned}&\left( \int _{{\mathbb {R}}^2}\int _{|x-y|\ge 1}\rho _\varepsilon (x)(\rho _\varepsilon (y)-\rho (y)){\log } |x-y| dxdy\right) ^2\\&\quad \le \Vert \rho _\varepsilon -\rho \Vert _{L^1({\mathbb {R}}^2)}\int _{{\mathbb {R}}^2} |\rho _\varepsilon (y)-\rho (y)|\left| \int _{|x-y|\ge 1}\rho _\varepsilon (x){\log }|x-y| dx\right| ^2dy\\&\quad \le 2M\Vert \rho _\varepsilon -\rho \Vert _{L^1({\mathbb {R}}^2)}\int _{{\mathbb {R}}^2}\int _{|x-y|\ge 1}({\log }(1+|x|)\\&\qquad +{\log }(1+|y|))\rho _\varepsilon (x)(\rho _\varepsilon (y)+\rho (y))dxdy\\&\quad \le 4M^2(N(t)+N_\varepsilon (t))\Vert \rho _\varepsilon -\rho \Vert _{L^1({\mathbb {R}}^2)}\\&\quad \le C(1+T)\Vert \rho _\varepsilon -\rho \Vert _{L^{\infty }(0,T;L^1({\mathbb {R}}^2))}\rightarrow 0 \end{aligned}$$

We now turn to the integration domain \(|x-y|<1\), where by dominated convergence

$$\begin{aligned}&\int _{{\mathbb {R}}^2}\int _{|x-y|< 1}\rho _\varepsilon (x)\rho _\varepsilon (y){\log }\frac{|x-y|^2+\varepsilon ^2}{|x-y|^2} dx dy\\&\quad \le \Vert \rho _\varepsilon \Vert _{L^\infty ({\mathbb {R}}^2)} \int _{{\mathbb {R}}^2}\rho _\varepsilon (y)\int _0^1 r{\log }\frac{r^2+\varepsilon ^2}{r^2} drdy\\&\quad \le CM\int _0^1 r{\log }\frac{r^2+\varepsilon ^2}{r^2} dr { \ \rightarrow \ 0. } \end{aligned}$$
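For completeness, the last integral can be evaluated explicitly; substituting \(u=r^2\) gives

$$\begin{aligned} \int _0^1 r{\log }\frac{r^2+\varepsilon ^2}{r^2}\, dr = \frac{1}{2}\left[ (1+\varepsilon ^2){\log }(1+\varepsilon ^2)-\varepsilon ^2{\log }\varepsilon ^2\right] \ \longrightarrow \ 0 \qquad \text {as} \ \varepsilon \rightarrow 0\,. \end{aligned}$$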

This proves the convergence of the entropy, which together with the weak lower semicontinuity of the entropy-dissipation leads to the desired energy-energy dissipation inequality (4.3) for the limiting solution \(\rho \). \(\square \)

4.2 Long-time behavior of solutions

Our main result of Sect. 2, together with the uniqueness argument for radial stationary solutions to (4.1) in [61] and the characterization of global minimizers in [28] and Corollary 3.11, leads to the following result:

Theorem 4.10

There exists a unique stationary state \(\rho _M\) of (4.1) with mass M and zero center of mass in the sense of Definition 2.1 with the property \(\rho _M\in L^1_{log}({\mathbb {R}}^2)\). Moreover, \(\rho _M\) is compactly supported, bounded, radially symmetric and non-increasing. Furthermore, this unique stationary state is characterized as the unique global minimizer of the free energy functional (4.2) with mass M.

As a consequence, all stationary states of (4.1) in the sense of Definition 2.1 with mass M are given by translations of the given profile \(\rho _M\):

$$\begin{aligned} \mathcal {S}=\left\{ \rho _M(x-x_0) \text{ such } \text{ that } x_0\in {\mathbb {R}}^2 \right\} \,. \end{aligned}$$

Remark 4.11

As in [61, Corollary 2.3] we have the following result comparing the support and the height of stationary states with different masses, based on a scaling argument: let \(\rho _1\) be the radial stationary state with unit mass. Then the radial stationary state with mass M is of the form

$$\begin{aligned} \rho _M(x) = M^{\frac{1}{m-1}}\rho _1(M^{-\frac{m-2}{2(m-1)}}x)\,. \end{aligned}$$

For two stationary states \(\rho _{M_1}\) and \(\rho _{M_2}\) with masses \(M_1>M_2\) the following relations hold:

  1. (a)

    If \(m>2\), then \(\rho _{M_1}\) has a bigger support and a bigger height than \(\rho _{M_2}\).

  2. (b)

    If \(m=2\), then all stationary states have the same support.

  3. (c)

    If \(1<m<2\), then \(\rho _{M_1}\) has smaller support and bigger height than \(\rho _{M_2}\).
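The case distinction above can be read off directly from the scaling formula. The following sketch (purely illustrative; \(C_0\) and \(H_1\), the support radius and height of the unit-mass profile \(\rho _1\), are unspecified placeholders) tabulates the predicted ratios of support radius and height for two masses \(M_1>M_2\) and the three regimes of \(m\).

```python
import numpy as np

def support_radius_and_height(M, m, C0=1.0, H1=1.0):
    """Support radius and height of rho_M from the scaling
    rho_M(x) = M**(1/(m-1)) * rho_1(M**(-(m-2)/(2*(m-1))) * x).
    C0, H1 are the (unspecified) support radius and height of rho_1."""
    radius = C0 * M**((m - 2) / (2 * (m - 1)))
    height = H1 * M**(1.0 / (m - 1))
    return radius, height

if __name__ == "__main__":
    M1, M2 = 4.0, 1.0                      # M1 > M2
    for m in (3.0, 2.0, 1.5):
        r1, h1 = support_radius_and_height(M1, m)
        r2, h2 = support_radius_and_height(M2, m)
        print(f"m = {m}: radius ratio = {r1 / r2:.3f}, height ratio = {h1 / h2:.3f}")
    # Expected: radius ratio > 1 for m > 2, = 1 for m = 2, < 1 for 1 < m < 2,
    # while the height ratio is > 1 in all three cases.
```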

We now study the long time asymptotics of the global weak solutions \(\rho \) of (4.1), which according to the entropy inequality in Theorem 4.1 satisfy

$$\begin{aligned} \lim _{t\rightarrow \infty }{\mathcal E}[\rho ](t)+\int _0^\infty {\mathcal D}[\rho ](t) dt \le {\mathcal E}[\rho _0]\,. \end{aligned}$$

Since the entropy is bounded from below, the entropy dissipation is integrable on \((0,\infty )\), and in particular

$$\begin{aligned} \lim _{t\rightarrow \infty }\int _t^\infty {\mathcal D}[\rho ](s)ds =0 \,. \end{aligned}$$

Let us therefore now consider the sequence

$$\begin{aligned} \rho _k(t,x)=\rho (t+t_k,x)\qquad \text {on} \ (0,T)\times {\mathbb {R}}^2 \quad \text {for some } \ t_k\rightarrow \infty \,, \end{aligned}$$

for which we obtain

$$\begin{aligned} 0= & {} \lim _{k\rightarrow \infty }\int _{t_k}^\infty {\mathcal D}[\rho ](t) dt \ge \lim _{k\rightarrow \infty }\int _0^T{\mathcal D}[\rho ](t+t_k) dt\ge 0\,. \end{aligned}$$

Thus \( {\mathcal D}[\rho _k]\rightarrow 0\) in \(L^1(0,T)\), or equivalently

$$\begin{aligned} \Vert \sqrt{\rho _k}\left| \nabla h[\rho _k]\right| \Vert ^2_{L^2((0,T)\times {\mathbb {R}}^2)}\rightarrow 0 \qquad \text {as} \ k\rightarrow \infty \,. \end{aligned}$$

The proof of convergence towards the steady state will be based on weak lower semicontinuity of the entropy dissipation. Assuming that \(\rho _k\rightharpoonup \overline{\rho } \) in \(L^\infty (0,T;L^1({\mathbb {R}}^2)\cap L^m({\mathbb {R}}^2))\), we then have to derive

$$\begin{aligned} \Vert \sqrt{\overline{\rho }}|\nabla h[\overline{\rho }]|\Vert _{L^2((0,T)\times {\mathbb {R}}^2)}\le \liminf _{k\rightarrow \infty }\Vert \sqrt{\rho _k}|\nabla h[\rho _k]|\Vert _{L^2((0,T)\times {\mathbb {R}}^2)}=0\,. \end{aligned}$$

Since the \(L^2\)-norm is weakly lower semicontinuous, it therefore remains to show, similarly to Lemma 4.9, that

$$\begin{aligned} \sqrt{\rho _k}\nabla h[\rho _k]\rightharpoonup \sqrt{\overline{\rho }}\nabla h[\overline{\rho }]\,\qquad \text {in} \ L^2((0,T)\times {\mathbb {R}}^2)\,. \end{aligned}$$

From there it can be deduced that \(\overline{\rho }\) is the stationary state \(\rho _M\) with \(M=\Vert \rho _0\Vert _{L^1({\mathbb {R}}^2)}\) by the uniqueness result in Theorem 4.10, provided we can guarantee that no mass is lost in the limit.

The main difficulty for passing to the limit in the long-time behavior lies in obtaining sufficient compactness to avoid the loss of mass at infinity. Even though the mass of \(\rho (t,\cdot )\) is conserved for all time, if a positive amount of mass escapes to infinity, then a subsequence of \(\rho (t,\cdot )\) may weakly converge to a stationary solution with mass strictly less than M. To rule out this scenario, we need to show that the sequence \(\{\rho (t,\cdot )\}_{t>0}\) is tight, which can be done by obtaining uniform-in-time bounds for certain moments of \(\rho (t,\cdot )\). So far we only have a time-dependent bound on the logarithmic moment in Theorem 4.1, which is not enough. Moreover, even if we know that \(\{\rho (t,\cdot )\}_{t>0}\) is tight, in order to choose the right limiting profile among all stationary states in \(\mathcal {S}\) we need to show the conservation of some symmetry. In fact, it is easy to check that the center of mass should formally be preserved by the evolution due to the antisymmetry of the gradient of the Newtonian potential. But to justify this rigorously, we need to work with moments higher than the first moment, so that the center of mass is well defined.

Below we state the main theorem in this section, where a key argument is to establish a uniform-in-time bound on the second moment of \(\rho (t,\cdot )\), if \(\rho _0\) has a finite second moment.

Theorem 4.12

Let \(\rho \) be the weak solution to (4.1) given in Theorem  4.1 with non-negative initial data \(\rho _0\in L^1((1+|x|^2)dx)\cap L^\infty ({\mathbb {R}}^2)\). Then, as \(t\rightarrow \infty \), \(\rho (\cdot ,t)\) converges to the unique stationary state with the same mass and center of mass as the initial data, i.e., to

$$\begin{aligned} \rho _M^c:=\rho _M(x-x_c) \qquad \text{ where } x_c=\frac{1}{M}\int _{{\mathbb {R}}^2} x \rho _0(x) \,dx\,, \end{aligned}$$

with \(M=\Vert \rho _{0}\Vert _{L^{1}({\mathbb {R}}^{2})}\), ensured by Theorem 4.10. More precisely, we have

$$\begin{aligned} \lim _{t\rightarrow \infty }\Vert \rho (t,\cdot )-\rho _M^c(\cdot )\Vert _{L^q({\mathbb {R}}^2)}\rightarrow 0 \qquad \text {for all} \ 1\le q<\infty \,. \end{aligned}$$

Our aim is to show that the second moment of solutions to (4.1) is uniformly bounded in time for all \(t\ge 0\). This in turn easily implies that the first moment is preserved in time for all \(t\ge 0\), as we will prove below. Recall that, as in (2.15), we denote by \(M_2[f]\) the second moment of \(f\in L^1_+({\mathbb {R}}^d)\). We first derive rigorously the evolution of the second moment in time:

$$\begin{aligned} M_2[\rho (t,\cdot )]-M_2[\rho (0,\cdot )]=4\int _0^t\int _{{\mathbb {R}}^2} \rho ^m(s,x)\, dx \,ds-\frac{t M^2}{2\pi }\, \end{aligned}$$
(4.22)

starting from the regularized system (4.4). Computing the evolution of the second moment for the regularized problem, we obtain

$$\begin{aligned} \frac{d}{dt}M_{2}[\rho _\varepsilon ]= & {} 4\int _{{\mathbb {R}}^{2}}(\rho _\varepsilon ^m+\varepsilon \rho _\varepsilon )dx-\frac{1}{\pi }\int _{{\mathbb {R}}^{2}}\int _{{\mathbb {R}}^{2}}\rho _{\varepsilon }(x,t)\rho _{\varepsilon }(y,t)\frac{x\cdot (x-y)}{|x-y|^{2}+\varepsilon ^{2}}dx\,dy\nonumber \\= & {} 4\int _{{\mathbb {R}}^{2}}(\rho _\varepsilon ^m+\varepsilon \rho _\varepsilon )dx-\frac{1}{2\pi }\int _{{\mathbb {R}}^{2}}\int _{{\mathbb {R}}^{2}}\rho _{\varepsilon }(x,t)\rho _{\varepsilon }(y,t)\frac{|x-y|^2}{|x-y|^{2}+\varepsilon ^{2}}dx\,dy\, \nonumber \\= & {} 4\int _{{\mathbb {R}}^{2}}(\rho _\varepsilon ^m+\varepsilon \rho _\varepsilon )dx-\frac{M^2}{2\pi }+R_\varepsilon (t)\,. \end{aligned}$$
(4.23)
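Here the second equality follows by symmetrizing the double integral in \(x\) and \(y\): exchanging the roles of \(x\) and \(y\) and averaging the two resulting expressions gives

$$\begin{aligned} \int _{{\mathbb {R}}^{2}}\int _{{\mathbb {R}}^{2}}\rho _{\varepsilon }(x,t)\rho _{\varepsilon }(y,t)\frac{x\cdot (x-y)}{|x-y|^{2}+\varepsilon ^{2}}dx\,dy = \frac{1}{2}\int _{{\mathbb {R}}^{2}}\int _{{\mathbb {R}}^{2}}\rho _{\varepsilon }(x,t)\rho _{\varepsilon }(y,t)\frac{|x-y|^2}{|x-y|^{2}+\varepsilon ^{2}}dx\,dy\,. \end{aligned}$$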

The strong convergence in (4.14) allows us to pass to the limit \(\varepsilon \rightarrow 0\) in the first integral of (4.23). For the remainder term we moreover have, due to the conservation of mass and the uniform boundedness of \(\rho _\varepsilon \),

$$\begin{aligned} R_\varepsilon (t)= & {} \frac{1}{2\pi }\int _{{\mathbb {R}}^{2}}\int _{{\mathbb {R}}^{2}}\rho _{\varepsilon }(x,t)\rho _{\varepsilon }(y,t)\frac{\varepsilon ^2}{|x-y|^{2}+\varepsilon ^2}dx\,dy\\\le & {} \frac{\varepsilon }{4\pi }\int _{{\mathbb {R}}^{2}}\int _{{\mathbb {R}}^{2}}\rho _{\varepsilon }(x,t)\rho _{\varepsilon }(y,t)\frac{1}{|x-y|}dx\,dy \\\le & {} \varepsilon C \quad \rightarrow \quad 0. \end{aligned}$$

The argument can easily be made rigorous by using compactly supported approximations of \(|x|^2\) on \({\mathbb {R}}^2\) as test functions, see e.g. [13]. We finally obtain (4.22) by integrating in time.

Now we want to compare general solutions to (4.1) with radial solutions. In order to do this we will make use of the concept of mass concentration recalled in Sect. 2.4, which has been used for instance in [44, 61] for classical applications to Keller–Segel type models.

Following exactly the same proof as in [61], the following two results hold for the solutions of (4.1). The first result says that for two radial solutions, if one is initially “more concentrated” than the other one, then this property is preserved for all time. The second result compares a general (possibly non-radial) solution \(\rho (t,\cdot )\) with another solution \(\mu (t,\cdot )\) with initial data \(\rho ^\#(0,\cdot )\), i.e., the decreasing rearrangement of the initial data for \(\rho (t,\cdot )\), and it says that the symmetric rearrangement of \(\rho (t,\cdot )\) is always “less concentrated” than the radial solution \(\mu (t,\cdot )\). This result generalizes the results from [44] to nonlinear diffusion with totally different proofs. We also refer the interested reader to the survey [86] for a general exposition of the mass concentration comparison results for local nonlinear parabolic equations and to the recent developments obtained in [88, 89] in the context of nonlinear parabolic equations with fractional diffusion.

Proposition 4.13

Let \(m>1\) and let \(f, g\) be two radially symmetric solutions to (4.1) with \(f(0,\cdot ) \prec g(0,\cdot )\). Then we have \(f(t,\cdot ) \prec g(t,\cdot )\) for all \(t>0\).

Proposition 4.14

Let \(m>1\) and \(\rho \) be a solution to (4.1), and let \(\mu \) be a solution to (4.1) with initial condition \(\mu (0,\cdot ) = \rho ^\#(0,\cdot ).\) Then we have that \(\mu (t,\cdot )\) remains radially symmetric for all \(t\ge 0\), and in addition we have

$$\begin{aligned} \rho ^\#(t,\cdot ) \prec \mu (t,\cdot ) \quad \text { for all } t\ge 0. \end{aligned}$$

Now we are ready to bound the second moment of solutions in the two-dimensional case: we will show that if \(\rho (t,\cdot )\) is a solution to (4.1) with \(M_2[\rho _0]\) finite, then \(M_2[\rho (t)]\) must be uniformly bounded for all time.

Theorem 4.15

Let \(\rho _0 \in L^1((1+|x|^2)dx) \cap L^\infty ({\mathbb {R}}^2)\). Let \(\rho (t,\cdot )\) be the solution to (4.1) with initial data \(\rho _0\). Then we have that

$$\begin{aligned} M_2[\rho (t)] \le M_2[\rho _0] + C(\Vert \rho _0\Vert _{L^1}) \quad \text {for all }t\ge 0. \end{aligned}$$

Proof

Recalling that \(\rho _M\) is the unique radially symmetric stationary solution with the same mass as \(\rho _0\) and zero center of mass, we let \(\rho _{M,\lambda } := \lambda ^2 \rho _M(\lambda x)\) with some parameter \(\lambda >1\). Since \(\rho _0\in L^1({\mathbb {R}}^2) \cap L^\infty ({\mathbb {R}}^2)\), we can choose a sufficiently large \(\lambda \) such that \(\rho _0^\# \prec \rho _{M,\lambda }\). Note that \(\lambda >1\) also directly yields that \(\rho _M \prec \rho _{M,\lambda }\).

Let \(\mu (t,\cdot )\) be the solution to (4.1) with initial data \(\rho _{M,\lambda }\). Combining Proposition 4.13 and Proposition 4.14, we have that

$$\begin{aligned} \rho ^\#(t,\cdot ) \prec \mu (t,\cdot ) \quad \text { for all }t\ge 0. \end{aligned}$$

It then follows from (2.13) and Lemma 2.5 that

$$\begin{aligned} \int _{{\mathbb {R}}^2}\rho ^m(t,x) dx = \int _{{\mathbb {R}}^2} [\rho ^\#]^m(t,x) dx \le \int _{{\mathbb {R}}^2} \mu ^m(t,x) dx \quad \text { for all }t\ge 0.\nonumber \\ \end{aligned}$$
(4.24)

Now using the computation of the time derivative of \(M_2[\rho (t)]\) in (4.22), where \(\rho (\cdot ,t)\) is a solution to (4.1), we get

$$\begin{aligned} M_2[\rho (t)] - M_2[\rho _0] = 4\int ^t_0\int _{{\mathbb {R}}^2} \rho ^m(s,x)\, dx\,ds - \frac{t M^2}{2\pi }\,. \end{aligned}$$
(4.25)

Since \(\mu (t,\cdot )\) is also a solution to (4.1), (4.25) also holds when \(\rho \) is replaced by \(\mu \). Combining this fact with (4.24), we thus have

$$\begin{aligned} M_2[\rho (t)] - M_2[\rho _0] \le M_2[\mu (t)] - M_2[\mu (0)] \le M_2[\mu (t)]. \end{aligned}$$
(4.26)

Finally, it suffices to show \(M_2[\mu (t)]\) is uniformly bounded for all time. Since \(\rho _M\) is a stationary solution and we have \(\rho _M \prec \rho _{M,\lambda }\), it follows from Proposition 4.13 that \(\rho _M \prec \mu (t,\cdot )\) for all \(t\ge 0\), hence we have \(M_2[\rho _M] \ge M_2[\mu (t)]\) due to Lemma 2.6. Plugging this into (4.26) yields

$$\begin{aligned} M_2[\rho (t)] \le M_2[\rho _0] + M_2[\rho _M] \quad \text { for all }t\ge 0, \end{aligned}$$

where \(M_2[\rho _M]\) is a constant only depending on the mass \(M:=\Vert \rho _0\Vert _{L^1({\mathbb {R}}^2)}\), which can be computed as follows: using Remark 4.11, we know the support of \(\rho _M\) is given by the ball centered at 0 of radius \(R(M) = C_0 M^{\frac{m-2}{2(m-1)}}\) (where \(C_0\) is the radius of the support for the stationary solution with unit mass), hence \(M_2[\rho _M] \le M R(M)^2 \le C_0^2 M^{\frac{2m-3}{m-1}}\). \(\square \)

Remark 4.16

The last result, showing uniform-in-time bounds for the second moment for finite \(m>1\), is also interesting in comparison with the results in [42, 43], where the \(m\rightarrow \infty \) limit of the gradient flow is analysed. In the “\(m=\infty \)” case, the second moment of any solution is actually decreasing in time, leading to the result that all solutions converge towards the global minimizer with some explicit rate. As mentioned in the introduction, a result of this sort for any potential other than the attractive logarithmic potential is lacking.

As already mentioned above, a key ingredient in the proof of Theorem 4.12 is the confinement of mass, which is obtained as follows:

Lemma 4.17

Let \(\rho \) be a global weak solution as in Theorem 4.1 with mass M and initial data \(\rho _0 \in L^1((1+|x|^2)dx)\cap L^\infty ({\mathbb {R}}^2)\), and consider as above the sequence \(\{\rho _k\}_{k\in {\mathbb {N}}}=\{\rho (\cdot +t_k,\cdot )\}_{k\in {\mathbb {N}}}\) in \((0,T)\times {\mathbb {R}}^2\). Then there exist a \(\overline{\rho } \in L^1((0,T)\times {\mathbb {R}}^2)\cap L^m((0,T)\times {\mathbb {R}}^2)\) and a subsequence, which we denote with the same index without loss of generality, such that:

$$\begin{aligned} \rho _k(t,x)\rightharpoonup \overline{\rho }(t,x) \qquad \text {in} \ L^1((0,T)\times {\mathbb {R}}^2)\cap L^m((0,T)\times {\mathbb {R}}^2) \end{aligned}$$

as \(k\rightarrow \infty \).

Proof

Since the entropy is uniformly bounded from below, the entropy inequality (4.3) yields \(\rho _k\in L^\infty ((0,T);L^m({\mathbb {R}}^2))\). Using Theorem 4.15, we deduce that

$$\begin{aligned} M_2[\rho _k(t)] \le M_2[\rho _0] + C(\Vert \rho _0\Vert _{L^1({\mathbb {R}}^2)}) \quad \text {for all }k\in {\mathbb {N}} \text{ and } 0\le t\le T.\nonumber \\ \end{aligned}$$
(4.27)

Since \(\{\rho _k\}_{k\in {\mathbb {N}}}\) are also uniformly bounded in \(L^\infty (0,T;L^m({\mathbb {R}}^2))\) we obtain equi-integrability and can therefore apply the Dunford–Pettis theorem (see Theorem 4.21 in “Appendix”) to obtain the weak convergence in \(L^1((0,T)\times {\mathbb {R}}^2)\cap L^m((0,T)\times {\mathbb {R}}^2)\). \(\square \)

In order to obtain weak lower semicontinuity of the entropy dissipation term, we need additional convergence results. These are derived from the following uniform bounds:

Lemma 4.18

Let \(\rho \) be a global weak solution as in Theorem 4.1 with mass M and consider as above the sequence \(\{\rho _k\}_{k\in {\mathbb {N}}}=\{\rho (\cdot +t_k,\cdot )\}_{k\in {\mathbb {N}}}\) in \((0,T)\times {\mathbb {R}}^2\). Then

$$\begin{aligned}&\Vert \rho _k\Vert _{L^\infty (0,T;L^1({\mathbb {R}}^2))}+\Vert \rho _k\Vert _{L^\infty ((0,T)\times {\mathbb {R}}^2)}\le C\\&\Vert \sqrt{\rho _k}\nabla {\mathcal N}*\rho _k\Vert _{L^2((0,T)\times {\mathbb {R}}^2)} + \Vert \nabla {\mathcal N}*\rho _k\Vert _{L^\infty ((0,T)\times {\mathbb {R}}^2)} \le C\\&\Vert \partial _t \rho _k\Vert _{L^2(0,T;H^{-1}({\mathbb {R}}^2))}+\Vert \rho ^{q}_k\Vert _{L^2(0,T;H^1({\mathbb {R}}^2))} \le C \qquad \text {for any} \ \ q\ge m-\frac{1}{2}\,. \end{aligned}$$

Proof

The bounds are obtained from the energy–energy dissipation inequality (4.3) in a way analogous to Lemma 4.7; the only difference is the replacement of \({\mathcal N}_\varepsilon \) by \({\mathcal N}\), which however makes no difference in the estimate (4.12). \(\square \)

Using these estimates the following convergence properties can be derived in an analogous way to the proof of Lemma 4.8.

Lemma 4.19

Let the assumptions of Lemma 4.17 hold. Then, up to subsequences that we denote with the same index,

$$\begin{aligned}&\rho _k\ \rightarrow \ \overline{\rho } \qquad \text {in} \quad L^{q}((0,T)\times {\mathbb {R}}^2)\qquad \text {for any} \ \ 1\le q<\infty \,,\\&\rho ^{p}_{k}\ \rightharpoonup \ \overline{\rho }^{p} \qquad \text {in} \quad L^2(0,T;H^1({\mathbb {R}}^2))\qquad \text{ for } \text{ any } m-\tfrac{1}{2} \le p<\infty ,\\&\sqrt{\rho _k}\ \rightarrow \ \sqrt{\overline{\rho }} \qquad \text {in} \quad \ L^2((0,T)\times {\mathbb {R}}^2)\,. \end{aligned}$$

These convergence results from Lemma 4.19 and Lemma 4.17 are sufficient to obtain the weak convergence of the nonlinearities \(\sqrt{\rho _k}\nabla h[\rho _k]\) and \(\rho _k\nabla h[\rho _k]\) in \(L^2((0,T)\times {\mathbb {R}}^2)\), which allows us to deduce the weak lower semicontinuity of the entropy dissipation term and to pass to the limit in the weak formulation of (4.1) in the same way as in the proof of Lemma 4.9.

Lemma 4.20

Let \(\rho _k\) and \(\overline{\rho }\) be as in Lemma 4.19. Then

$$\begin{aligned}&\sqrt{\rho _k}\,\nabla h[\rho _k]\rightharpoonup \sqrt{\overline{\rho }}\,\nabla h[\overline{\rho }] \qquad \text {in} \ L^2((0,T)\times {\mathbb {R}}^2;{\mathbb {R}}^{2})\\&\rho _k\,\nabla h[\rho _k]\rightharpoonup \overline{\rho }\,\nabla h[\overline{\rho }] \qquad \text {in} \ L^2((0,T)\times {\mathbb {R}}^2;{\mathbb {R}}^{2})\,. \end{aligned}$$

This enables us to close the proof of convergence towards the set of stationary states.

Proof of Theorem 4.12

Let us first notice that \(\overline{\rho }\in L^\infty ((0,T)\times {\mathbb {R}}^2)\) due to the first convergence in Lemma 4.19 and the uniform-in-time bound on the weak solutions in Theorem 4.1. As shown above, the lower bound on the entropy obtained as in Proposition 4.3 implies that \( {\mathcal D}[\rho _k]\rightarrow 0\) in \(L^1(0,T)\); together with the weak lower semicontinuity of the \(L^2((0,T)\times {\mathbb {R}}^2)\)-norm and Lemma 4.20 this yields

$$\begin{aligned} \Vert \sqrt{\overline{\rho }}|\nabla h[\overline{\rho }]|\Vert _{L^2((0,T)\times {\mathbb {R}}^2)}^2\le \liminf _{k\rightarrow \infty }\Vert \sqrt{\rho _k}|\nabla h[\rho _k]|\Vert _{L^2((0,T)\times {\mathbb {R}}^2)}^2=0\,. \end{aligned}$$

Thus \(\overline{\rho }\) solves

$$\begin{aligned} \overline{\rho } |\nabla h[\overline{\rho }]|^2 =0 \qquad \text {a.e. in } (0,T)\times {\mathbb {R}}^2\,. \end{aligned}$$
(4.28)

Moreover, due to the convergence properties in Lemmas 4.19 and 4.20 the limiting density \(\overline{\rho }\) is a weak distributional solution to (4.1) with test functions in \(L^2(0,T;H^1({\mathbb {R}}^2))\). Due to (4.28), we get that \(\overline{\rho } \nabla h[\overline{\rho }]=0\) a.e. in \((0,T)\times {\mathbb {R}}^2\) and thus \(\partial _t \overline{\rho }=0\) in \(L^2(0,T;H^{-1}({\mathbb {R}}^2))\). This yields that \(\overline{\rho }(t,x) \equiv \overline{\rho }(x)\) does not depend on time.

Due to the convergence properties in Lemma 4.19 and the uniform bound on the second moment (4.27), together with Lemma 4.22 in the “Appendix”, we can deduce that \(\overline{\rho }\in L^1((1+|x|^2)dx)\) and that \(\rho _k \rightarrow \overline{\rho }\) in \(L^\infty (0,T;L^1({\mathbb {R}}^2))\). In particular, \(\overline{\rho }\) has mass M.

Combining all the properties of \(\overline{\rho }\) just proved with the fact that \(\nabla \overline{\rho }^m \in L^2({\mathbb {R}}^2)\) due to Lemma 4.19, we infer that \(\overline{\rho }\) corresponds to a steady state of Eq. (4.1) in the sense of Definition 2.1. The uniqueness up to translation of stationary states in Theorem 4.10 shows that \(\overline{\rho }\) is a translation of \(\rho _M\), and thus \(\overline{\rho }\in \mathcal {S}\). In fact, we have shown that the limit of every convergent sequence \(\{\rho _k\}_{k\in {\mathbb {N}}}\) must be a translation of \(\rho _M\). This in turn shows that the set of accumulation points of any time-diverging sequence belongs to \(\mathcal {S}\).

Finally, in order to identify the limit uniquely, we take advantage of the translational invariance. We first remark that the center of mass of the initial data is preserved for all time due to the antisymmetry of \(\nabla \mathcal {N}\). Due to Theorem 4.15, all time-diverging sequences have uniformly bounded second moments; thus, since \(\overline{\rho }\) is an accumulation point of a sequence \(\rho _{k}\), by Lemma 4.22 we have

$$\begin{aligned}&\left| \int _{{\mathbb {R}}^2}x\overline{\rho }(x)dx- x_{c} M\right| \\&\quad =\left| \int _{{\mathbb {R}}^2}x(\overline{\rho }(x)-\rho _{k}(t,x))dx- \int _{{\mathbb {R}}^2}x(\rho _{k}(t,x)-\rho _{0}(x))dx\right| \\&\quad \le \int _{{\mathbb {R}}^2}|x||\overline{\rho }(x)-\rho _{k}(t,x)|dx\le M_{2}[|\rho _{k}(t)-\overline{\rho }|]^{1/2}\Vert \rho _{k}(t)-\overline{\rho }\Vert ^{1/2}_{L^{1}({\mathbb {R}}^{2})}\\&\quad \le C \Vert \rho _{k}(t)-\overline{\rho }\Vert ^{1/2}_{L^{\infty }(0,T;L^{1}({\mathbb {R}}^{2}))}\rightarrow 0. \end{aligned}$$

Hence all accumulation points of the sequences have the same center of mass as the initial data. Then, all possible limits reduce to the translation of \(\rho _M\) to the initial center of mass as desired. \(\square \)