Nonlinear aggregation-diffusion equations: radial symmetry and long time asymptotics

We analyze under which conditions equilibration between two competing effects, repulsion modeled by nonlinear diffusion and attraction modeled by nonlocal interaction, occurs. This balance leads to continuous compactly supported radially decreasing equilibrium configurations for all masses. All stationary states with suitable regularity are shown to be radially symmetric by means of continuous Steiner symmetrization techniques. Calculus of variations tools allow us to show the existence of global minimizers among these equilibria. Finally, in the particular case of Newtonian interaction in two dimensions they lead to uniqueness of equilibria for any given mass up to translation and to the convergence of solutions of the associated nonlinear aggregation-diffusion equations towards this unique equilibrium profile up to translations as t → ∞.


Introduction
The evolution of interacting particles and their equilibrium configurations has attracted the attention of many applied mathematicians and mathematical analysts for years. Continuum descriptions of interacting particle systems usually lead to the analysis of a mass density ρ(t, x) of individuals at a certain location x ∈ R^d and time t ≥ 0. Most of the derived models result in nonlinear aggregation-diffusion partial differential equations through different asymptotic or mean-field limits [75,14,29]. The different effects reflect that equilibria are obtained by competing behaviors: the repulsion between individuals/particles is modeled through nonlinear diffusion terms, while their attraction is integrated via nonlocal forces. This attractive nonlocal interaction takes into account that the presence of particles/individuals at a certain location y ∈ R^d produces a force on particles/individuals located at x ∈ R^d proportional to −∇W(x − y), where the given interaction potential W : R^d → R is assumed to be radially symmetric and increasing, consistent with attractive forces. The evolution of the mass density of particles/individuals is given by the nonlinear aggregation-diffusion equation of the form
∂_t ρ = Δρ^m + ∇ · (ρ∇(W ∗ ρ)), x ∈ R^d, t ≥ 0, (1.1)
with initial data ρ_0 ∈ L^1_+(R^d) ∩ L^m(R^d). We will work with degenerate diffusions, m > 1, that appear naturally in modelling repulsion with very concentrated repelling nonlocal forces [75,14], but also with linear and fast diffusion ranges 0 < m ≤ 1, which are also classical in applications [77,59]. These models are ubiquitous in mathematical biology, where they have been used as macroscopic descriptions for collective behavior or swarming of animal species, see [69,15,70,71,84,20] for instance, or more classically in chemotaxis-type models, see [77,59,54,53,13,11,26] and the references therein.
On the other hand, this family of PDEs is a particular example of nonlinear gradient flows in the sense of optimal transport between mass densities, see [2,33,34]. The main implication for us is that there is a natural Lyapunov functional for the evolution of (1.1) defined on the set of centered mass densities ρ ∈ L^1_+(R^d) ∩ L^m(R^d), given by
E[ρ] = (1/(m−1)) ∫_{R^d} ρ^m(x) dx + (1/2) ∫_{R^d} ∫_{R^d} ρ(x) W(x − y) ρ(y) dx dy, (1.2)
where the last integral is defined in the improper sense, and if m = 1 we replace the first integral of E[ρ] by ∫_{R^d} ρ log ρ dx. Therefore, if the balance between repulsion and attraction occurs, these two effects should determine stationary states for (1.1), including the stable solutions possibly given by local (global) minimizers of the free energy functional (1.2).
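The Lyapunov property can be checked formally. The computation below is only a sketch, not a rigorous argument: it assumes m ≠ 1, a smooth positive solution with enough decay to integrate by parts, an even potential W, and that (1.1) and (1.2) take the standard forms written in its first two lines.

```latex
% Formal computation (requires amsmath). Assumptions: m \neq 1, smooth decaying
% solutions, W(x) = W(-x), and the forms of (1.1) and (1.2) written below.
\begin{align*}
  \partial_t \rho
    &= \Delta \rho^m + \nabla\cdot\bigl(\rho\,\nabla (W*\rho)\bigr)
     = \nabla\cdot\Bigl(\rho\,\nabla\Bigl(\tfrac{m}{m-1}\rho^{m-1} + W*\rho\Bigr)\Bigr),\\
  E[\rho]
    &= \frac{1}{m-1}\int_{\mathbb{R}^d}\rho^m\,dx
     + \frac12\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\rho(x)\,W(x-y)\,\rho(y)\,dx\,dy,\\
  \frac{\delta E}{\delta\rho}
    &= \frac{m}{m-1}\rho^{m-1} + W*\rho,\\
  \frac{d}{dt}E[\rho(t)]
    &= \int_{\mathbb{R}^d} \frac{\delta E}{\delta\rho}\,\partial_t\rho\,dx
     = -\int_{\mathbb{R}^d} \rho\,\Bigl|\nabla\Bigl(\tfrac{m}{m-1}\rho^{m-1} + W*\rho\Bigr)\Bigr|^2 dx \;\le\; 0 .
\end{align*}
```

In this formal picture the dissipation vanishes exactly when the quantity m/(m−1) ρ^{m−1} + W ∗ ρ is constant on each connected component of the set where ρ is positive, which is the characterization of stationary states used in Section 2.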
Many properties and results have been obtained in the particular case of the Newtonian attractive potential due to its applications in mathematical modeling of chemotaxis [77,59] and gravitational collapse models [78]. In the classical 2D Keller-Segel model with linear diffusion, it is known that equilibria can only happen in the critical mass case [10], while self-similar solutions are the long time asymptotics for subcritical mass cases [13,22]. For supercritical masses, all solutions blow up in finite time [54]. It was shown in [63,23] that degenerate diffusion with m > 1 is able to regularize the 2D classical Keller-Segel problem: solutions exist globally in time regardless of their mass, and each solution remains uniformly bounded in time. For the Newtonian attraction interaction in dimension d ≥ 3, the authors in [9] show that the value of the degeneracy of the diffusion that allows the mass to be the critical quantity for the dichotomy between global existence and finite time blow-up is given by m = 2 − 2/d. In fact, based on scaling arguments it is easy to argue that for m > 2 − 2/d, the diffusion term dominates when the density becomes large, leading to global existence of solutions for all masses. This result was shown in [80] together with the global uniform bound of solutions for all times.
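The scaling argument behind the exponent m = 2 − 2/d can be sketched as follows. This is an illustration under stated assumptions, not the argument of [9] or [80]: we take d ≥ 3, write the Newtonian potential as W(x) = −c_d |x|^{2−d} with a constant c_d > 0, and use the mass-preserving dilation ρ_λ, which is a symbol introduced here.

```latex
% Scaling heuristic (requires amsmath). Assumptions: d \ge 3,
% W(x) = -c_d |x|^{2-d} with c_d > 0, and \rho_\lambda defined below.
\begin{align*}
  \rho_\lambda(x) &:= \lambda^{d}\rho(\lambda x),
    \qquad \int_{\mathbb{R}^d}\rho_\lambda\,dx = \int_{\mathbb{R}^d}\rho\,dx,\\
  \int_{\mathbb{R}^d}\rho_\lambda^m\,dx
    &= \lambda^{d(m-1)}\int_{\mathbb{R}^d}\rho^m\,dx,\\
  \iint \rho_\lambda(x)\,|x-y|^{2-d}\,\rho_\lambda(y)\,dx\,dy
    &= \lambda^{d-2}\iint \rho(x)\,|x-y|^{2-d}\,\rho(y)\,dx\,dy,\\
  d(m-1) = d-2 \quad&\Longleftrightarrow\quad m = 2-\frac{2}{d}.
\end{align*}
```

For m > 2 − 2/d the diffusion part of the energy grows faster than the interaction part as λ → ∞, that is, as the density concentrates, which is the heuristic sense in which diffusion dominates.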
However, in all cases where the diffusion dominates over the aggregation, the long time asymptotics of solutions to (1.1) have not been clarified, as pointed out in [8]. Are there stationary solutions for all masses when the diffusion term dominates? And if so, are they unique up to translations? Do they determine the long time asymptotics for (1.1)? Only partial answers to these questions are present in the literature, which we summarize below.
To show the existence of stationary solutions to (1.1), a natural idea is to look for the global minimizer of its associated free energy functional (1.2). For the 3D case with Newtonian interaction potential and m > 4/3, Lions' concentration-compactness principle [67] gives the existence of a global minimizer of (1.2) for any given mass. The argument can be extended to kernels that are no more singular than Newtonian potential in R d at the origin, and have slow decay at infinity. The existence result is further generalized by [5] to a broader classes of kernels, which can have faster decay at infinity. In all the above cases, the global minimizer of (1.2) corresponds to a stationary solution to (1.1) in the sense of distributions. In addition, the global minimizer must be radially decreasing due to Riesz's rearrangement theorem.
Regarding the uniqueness of stationary solutions to (1.1), most of the available results are for Newtonian interaction. For the 3D Newtonian potential with m > 4/3, for any given mass, the authors in [65] prove uniqueness of stationary solutions to (1.1) among radial functions, and their method can be generalized to the Newtonian potential in R d with m > 2 − 2/d. For the 3D case with m > 4/3, [79] show that all compactly supported stationary solutions must be radial up to a translation, hence obtaining uniqueness of stationary solutions among compactly supported functions. The proof is based on moving plane techniques, where the compact support of the stationary solution seems crucial, and it also relies on the fact that the Newtonian potential in 3D converges to zero at infinity. Similar results are obtained in [28] for 2D Newtonian potential with m > 1 using an adapted moving plane technique. Again, the uniqueness result is based on showing radial symmetry of compactly supported stationary solutions. Finally, we mention that uniqueness of stationary states has been proved for general attracting kernels in one dimension in the case m = 2, see [21]. To the best of our knowledge, even for Newtonian potential, we are not aware of any results showing that all stationary solutions are radial (up to a translation).
Previous results show the limitations of the present theory: although the existence of stationary states for all masses is obtained for quite general potentials, their uniqueness, crucial for identifying the long time asymptotics, is only known in very particular cases of diffusive dominated problems. The available uniqueness results are not very satisfactory due to the compactly supported restriction on the uniqueness class imposed by the moving plane techniques. And thus, large time asymptotics results are not at all available due to the lack of mass confinement results of any kind uniformly in time together with the difficulty of identifying the long time limits of sequences of solutions due to the restriction on the uniqueness class for stationary solutions.
If one wants to show that the long time asymptotics are uniquely determined by the initial mass and center of mass, a clear strategy used in many other nonlinear diffusion problems, see [87] and the references therein, is the following: one first needs to prove that all stationary solutions are radial up to a translation in a non restrictive class of stationary solutions, then one has to show uniqueness of stationary solutions among radial solutions, and finally this uniqueness will allow to identify the limits of time diverging sequences of solutions, if compactness of these sequences is shown in a suitable functional framework. Let us point out that comparison arguments used in standard porous medium equations are out of the question here due to the lack of maximum principle by the presence of the nonlocal term.
In this work, we will give the first full result of long time asymptotics for a diffusion dominated problem using the previous strategy without smallness assumptions of any kind. More precisely, we will prove that all solutions to the 2D Keller-Segel equation with m > 1 converge to the global minimizer of its free energy. The first step will be to show radial symmetry of stationary solutions to (1.1) under quite general assumptions on W and the class of stationary solutions. Let us point out that standard rearrangement techniques fail in trying to show radial symmetry of general stationary states to (1.1), and they are only useful for showing radial symmetry of global minimizers, see [28]. Comparison arguments for radial solutions allow one to prove uniqueness of radial stationary solutions in particular cases [65,61]. However, to the best of our knowledge, there is no general result in the literature about radial symmetry of stationary solutions to nonlocal aggregation-diffusion equations.
Our first main result is that all stationary solutions of (1.1), with no restriction on m > 0, are radially decreasing up to translation by a fully novel application of continuous Steiner symmetrization techniques for the problem (1.1). Continuous Steiner symmetrization has been used in calculus of variations [18] for replacing rearrangement inequalities [16,64,72], but its application to nonlinear nonlocal aggregation-diffusion PDEs is completely new. Most of the results present in the literature using continuous Steiner symmetrization deal with functionals of first order, i.e. functionals involving a power of the modulus of the gradient of the unknown, see [19,Corollary 7.3] for an application to p-Laplacian stationary equations, and in [58, Section II] and [57,18], while in our case the functional (1.2) is purely of zeroth order. The decay of the attractive Newtonian potential interaction term in d ≥ 3 follows from [18,Corollary 2] and [72], which is the only result related to our strategy.
We will construct a curve of measures starting from a stationary state ρ using continuous Steiner symmetrization such that the functional (1.2) decays strictly at first order along that curve unless the base point ρ is radially symmetric, see Proposition 2.15. However, the functional (1.2) has at most a quadratic variation when ρ is a stationary state as the first term in the Taylor expansion cancels. This leads to a contradiction unless the stationary state is radially symmetric. The construction of this curve needs a non-classical technique of slowing-down the velocities of the level sets for the continuous Steiner symmetrization in order to cope with the possible compact support of stationary states in the degenerate case m > 1, see Proposition 2.8. This first main result is the content of Section 2 in which we specify the assumptions on the interaction potential and the notion of stationary solutions in details. We point out that the variational structure of (1.1) is crucial to show the radially decreasing property of stationary solutions.
The result of radial symmetry for general stationary solutions to (1.1) is quite striking in comparison to other gradient flow models in collective behavior based on the competition of attractive and repulsive effects via nonlocal interaction potentials. Actually, there exist numerical and analytical evidence in [62,7,4] that there should be stationary solutions of these fully nonlocal interaction models which are not radially symmetric despite the radial symmetry of the interaction potential. Our first main result shows that this break of symmetry does not happen whenever nonlinear diffusion is chosen to model very strong localized repulsion forces, see [84]. Symmetry breaking in nonlinear diffusion equations without interactions has also received a lot of attention lately related to the Caffarelli-Kohn-Nirenberg inequalities, see [45,46]. Another consequence of our radial symmetry results is the lack of non-radial local minimizers, and even non-radial critical points, of the free energy functional (1.2), which is not at all obvious.
We also generalize our radial symmetry result to the case where (1.1) has an additional term ∇ · (ρ∇V) on the right-hand side, where V is a confining potential (see Section 2.5 for precise conditions on V), in the sense that it prevents particles from drifting away in the presence of the diffusion. It is known that with the extra term, the corresponding energy functional has an additional term ∫_{R^d} V(x)ρ(x) dx. The particular case of quadratic confinement V(x) = |x|^2/2 is important since it leads to the free energy functional associated to (1.1) with homogeneous kernels in self-similar variables [36,24,25], and thus to a characterization of the self-similar profiles for those problems.
Finally, let us remark that our radial symmetry result applies to stationary states of (1.1) for any m > 0, regardless of being in the diffusion dominated case or not. As soon as stationary states of (1.1) exist under suitable assumptions on the interaction potential W, and the confining potential V if present, they must be radially symmetric up to a translation. This fact makes our result applicable to the fair-competition cases [11,10,12] and the aggregation-dominated cases, see [68,39,40], with degenerate, linear or fast diffusion. Finally, Section 2.4 is devoted to the most restrictive case of λ-convex potentials and the Newtonian potential with m ≥ 1 − 1/d. In these cases, we can directly make use of the key first-order decay result of the interaction energy along continuous Steiner symmetrization curves in Proposition 2.15, bypassing the technical result in Proposition 2.8, in order to give a shortcut of the proof of our main Theorem 2.2 based on gradient flow techniques.
We next study more properties of particular radially decreasing stationary solutions. We make use of the variational structure to show the existence of global minimizers to (1.2) under very general hypotheses on the interaction potential W and m > 1. In Section 3, we show that these global minimizers are in fact radially decreasing continuous functions, compactly supported if m > 1. These results fully generalize the results in [79,28]. Putting together Sections 2 and 3, the uniqueness and full characterization of the stationary states is reduced to uniqueness among the class of radial solutions. This result is known in the case of Newtonian attraction kernels [65].
Finally, we make use of the uniqueness among translations for any given mass of stationary solutions to (1.1) to obtain the second main result of this work, namely to answer the open problem of the long time asymptotics to (1.1) with Newtonian interaction in 2D and m > 1. This is accomplished in Section 4 by a compactness argument for which one has to extract the corresponding uniform in time bounds and a careful treatment of the nonlinear terms and dissipation while taking the limit t → ∞. We do not know how to obtain a similar result for Newtonian interaction in d ≥ 3 due to the lack of uniform in time mass confinement bounds in this case. We essentially cannot show that mass does not escape to infinity while taking the limit t → ∞. However, the compactness and characterization of stationary solutions is still valid in that case.
The present work opens new perspectives to show radial symmetry for stationary solutions to nonlocal aggregation-diffusion problems. While the hypotheses of our result to ensure existence of global radially symmetric minimizers of (1.2), and in turn of stationary solutions to (1.1), are quite general, we do not know yet whether there is uniqueness among radially symmetric stationary solutions (with a fixed mass) for general non-Newtonian kernels. We even do not have available uniqueness results of radial minimizers beyond Newtonian kernels. Understanding if the existence of radially symmetric local minimizers, that are not global, is possible for functionals of the form (1.2) with radial interaction potential is thus a challenging question. Concerning the long-time asymptotics of (1.1), the lack of a novel approach to find confinement of mass beyond the usual virial techniques and comparison arguments in radial coordinates hinders the advance in their understanding even for Newtonian kernels with d ≥ 3. Last but not least, our results open a window to obtain rates of convergence towards the unique equilibrium up to translation for the Newtonian kernel in 2D. The lack of general convexity of this variational problem could be compensated by recent results in a restricted class of functions, see [32]. However, the problem is quite challenging due to the presence of free boundaries in the evolution of compactly supported solutions to (1.1) that rules out direct linearization techniques as in the linear diffusion case [22].

Radial Symmetry of stationary states with degenerate diffusion
Throughout this section, we assume that m > 0, and W satisfies the following four assumptions:
(K1) W is attracting, i.e. W(x) = ω(|x|) is radially symmetric, and ω′(r) > 0 for all r > 0 with ω(1) = 0.
(K2) W is no more singular than the Newtonian kernel in R^d at the origin, i.e., there exists some C_w > 0 such that ω′(r) ≤ C_w r^{1−d} for r ≤ 1.
(K3) There exists some C_w > 0 such that ω′(r) ≤ C_w for all r > 1.
(K4) Either ω(r) is bounded for r ≥ 1 or there exists C_w > 0 such that for all a, b ≥ 0:
As usual, ω_± denotes the positive and negative part of ω such that ω = ω_+ − ω_−. In particular, if W = −N, up to an additive constant, is the attractive Newtonian potential, where N is the fundamental solution of the operator −Δ in R^d, then W satisfies all the assumptions. Since the equation (1.1) does not change by adding a constant to the potential W, we will consider that the potential W is defined modulo additive constants from now on.
We denote by L^1_+(R^d) the set of all nonnegative functions in L^1(R^d). Let us start by defining precisely stationary states to the aggregation equation (1.1) with a potential satisfying (K1)-(K4).
in the sense of distributions in R^d.
Let us first note that ∇ψ_s is globally bounded under the assumptions (K1)-(K3). To see this, a direct decomposition into near- and far-field sets yields a uniform bound on |∇ψ_s(x)|, where we split the domain of integration into the sets A := {y : |x − y| ≤ 1} and B := R^d \ A, and apply the assumptions (K1)-(K3).
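The near- and far-field splitting can be written out as follows. This is a sketch of one way to carry out the estimate, assuming ρ_s ∈ L^1(R^d) ∩ L^∞(R^d) and |∇W(z)| = ω′(|z|); the symbol σ_d for the surface area of the unit sphere is introduced here for illustration.

```latex
% Sketch (requires amsmath). Assumptions: \rho_s \in L^1 \cap L^\infty,
% |\nabla W(z)| = \omega'(|z|); \sigma_d := |S^{d-1}| is notation introduced here.
\begin{align*}
  |\nabla\psi_s(x)|
    &\le \int_{A} |\nabla W(x-y)|\,\rho_s(y)\,dy
       + \int_{B} |\nabla W(x-y)|\,\rho_s(y)\,dy\\
    &\le C_w\int_{|x-y|\le 1} |x-y|^{1-d}\,\rho_s(y)\,dy
       + C_w\int_{|x-y|>1} \rho_s(y)\,dy\\
    &\le C_w\,\sigma_d\,\|\rho_s\|_{L^\infty(\mathbb{R}^d)}
       + C_w\,\|\rho_s\|_{L^1(\mathbb{R}^d)},
  \qquad
  \sigma_d = \int_{|z|\le 1}|z|^{1-d}\,dz .
\end{align*}
```

The second line uses (K2) on A and (K3) on B, and the last line uses that |z|^{1−d} is integrable near the origin in R^d.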
Under the additional assumptions (K4) and ω(1 + |x|)ρ_s ∈ L^1(R^d), we will show that the potential function ψ_s(x) = W ∗ ρ_s(x) is also locally bounded. First, note that (K1)-(K3) ensure that |ω(r)| ≤ C̃_w φ(r) for all r ≤ 1 with some C̃_w > 0, where φ(r) := r^{2−d} if d ≥ 3, φ(r) := −log r if d = 2, and φ(r) := 1 − r if d = 1. Hence we can again perform a decomposition into near- and far-field sets and obtain the local bound (2.4) on ψ_s.
Our main goal in this section is the following theorem.
be a non-negative stationary state of (1.1) in the sense of Definition 2.1. Then ρ_s must be radially decreasing up to a translation, i.e. there exists some x_0 ∈ R^d such that ρ_s(· − x_0) is radially symmetric and non-increasing in |x|.
Before going into the details of the proof, we briefly outline the strategy here. Assume there is a stationary state ρ_s which is not radially decreasing under any translation. To obtain a contradiction, we consider the free energy functional E[ρ] associated with (1.1),
E[ρ] = (1/(m−1)) ∫_{R^d} ρ^m dx + I[ρ], I[ρ] := (1/2) ∫_{R^d} ∫_{R^d} ρ(x) W(x − y) ρ(y) dx dy, (2.5)
where for m = 1 the first term is replaced by ∫_{R^d} ρ log ρ dx. Below we discuss the strategy for m > 1 first, and point out the modification for m ∈ (0, 1] in the next paragraph. Using the assumption that ρ_s is not radially decreasing under any translation, we will apply the continuous Steiner symmetrization to perturb around ρ_s and construct a continuous family of densities µ(τ, ·) with µ(0, ·) = ρ_s, such that E[µ(τ)] − E[ρ_s] < −cτ for some c > 0 and any small τ > 0. On the other hand, using that ρ_s is a stationary state, we will show that |E[µ(τ)] − E[ρ_s]| ≤ Cτ^2 for some C > 0 and any small τ > 0. Combining these two inequalities gives a contradiction for sufficiently small τ > 0.
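The way the two bounds exclude each other can be made explicit. The lines below are a worked restatement of the last step of the outline, with c, C and µ(τ) as in the text and no further assumptions.

```latex
% Worked step (requires amsmath): first-order decay versus quadratic variation.
\begin{align*}
  -C\tau^2 \;\le\; E[\mu(\tau)] - E[\rho_s] \;<\; -c\,\tau
  \quad\Longrightarrow\quad
  c\,\tau < C\tau^2
  \quad\Longrightarrow\quad
  \tau > \frac{c}{C}.
\end{align*}
```

Hence both inequalities cannot hold for any τ ∈ (0, c/C), which is the contradiction for small τ.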
If the kernel W has certain convexity properties and m ≥ 1 − 1/d, then it is known that (1.1) has a rigorous Wasserstein gradient flow structure. In this case, once we obtain the crucial first-order decay estimate for the interaction energy along continuous Steiner symmetrization curves (see Proposition 2.15), there is a shortcut that leads directly to the radial symmetry result, which we will discuss in Section 2.4.
Let us first characterize the set of possible stationary states of (1.1) in the sense of Definition 2.1 and their regularity. Parts of these arguments are reminiscent of those in [79,28] for attractive Newtonian potentials.
be a non-negative stationary state of (1.1) for some m > 0 in the sense of Definition 2.1. Then ρ_s ∈ C(R^d), and there exists some C > 0 such that |(m/(m−1)) ∇ρ_s^{m−1}| ≤ C in supp ρ_s if m ≠ 1, and |∇ log ρ_s| ≤ C in supp ρ_s if m = 1.
Proof. We have already checked that under these assumptions on W and ρ_s, the potential function ψ_s belongs to W^{1,∞}_loc(R^d), so that ρ_s^m satisfies the stationary equation (2.9) with right hand side belonging to W^{−1,p}_loc(R^d) for all 1 ≤ p ≤ ∞. As a consequence, ρ_s^m is in fact a weak solution in W^{1,p}_loc(R^d) for all 1 < p < ∞ of (2.9) by classical elliptic regularity results. Sobolev embedding shows that ρ_s^m belongs to some Hölder space C^{0,α}_loc(R^d), and thus ρ_s ∈ C^{0,β}_loc(R^d) with β := min{α/m, 1}. Let us define the set Ω = {x ∈ R^d : ρ_s(x) > 0}. Since ρ_s ∈ C(R^d), Ω is an open set and it consists of a countable number of open, possibly unbounded, connected components. Let us take any bounded smooth connected open subset Θ whose closure is contained in Ω, and start with the case m ≠ 1. Since ρ_s ∈ C(R^d), ρ_s is bounded away from zero in Θ, and thus due to the assumptions on ρ_s, we have that (m/(m−1)) ∇ρ_s^{m−1} = (1/ρ_s) ∇ρ_s^m holds in the distributional sense in Θ. We conclude that wherever ρ_s is positive, (2.1) can be interpreted as
∇((m/(m−1)) ρ_s^{m−1} + ψ_s) = 0 (2.10)
in the sense of distributions in Ω. Hence, the function G(x) = (m/(m−1)) ρ_s^{m−1}(x) + ψ_s(x) is constant in each connected component of Ω. From here, we deduce that any stationary state of (1.1) in the sense of Definition 2.1 is given by
ρ_s(x)^{m−1} = ((m−1)/m) (G(x) − ψ_s(x)) on the support of ρ_s, (2.11)
where G(x) is a constant in each connected component of the support of ρ_s, and its value may differ in different connected components. Due to ψ_s ∈ W^{1,∞}_loc(R^d), we deduce that ρ_s ∈ C^{0,1/(m−1)}_loc(R^d) if m ≥ 2 and ρ_s ∈ C^{0,1}_loc(R^d) otherwise. Putting together (2.11) and (2.2), we conclude the desired estimate.
In addition, from (2.11) we have that Ω = R^d if m ∈ (0, 1): if not, let Ω_0 be any connected component of Ω, and take x_0 ∈ ∂Ω_0. As we take a sequence of points x_n → x_0 with x_n ∈ Ω_0, we have that ρ_s(x_n)^{m−1} → ∞, whereas the sequence G(x_n) − ψ_s(x_n) is bounded (since ψ_s is locally bounded due to (2.4)), a contradiction.
If m = 1, the above argument still goes through except that we replace (2.10) by ∇(log ρ_s + ψ_s) = 0 in the sense of distributions in Ω. As a result, the function G(x) = log ρ_s(x) + ψ_s(x) is constant in each connected component of Ω. The same argument as in the case m ∈ (0, 1) then yields that ρ_s ∈ C^{0,1}_loc(R^d) and Ω = R^d, leading to the estimate |∇ log ρ_s| ≤ C in R^d.
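The pointwise identity behind the proof above can be spelled out. The lines below are a sketch on a set where ρ_s is positive and smooth enough for the chain rule, assuming m ≠ 1 and that the stationary equation reads Δρ_s^m + ∇ · (ρ_s∇ψ_s) = 0.

```latex
% Sketch (requires amsmath). Assumptions: m \neq 1, \rho_s > 0 and differentiable
% on the set considered, stationary equation as in the first line.
\begin{align*}
  0 &= \Delta\rho_s^m + \nabla\cdot(\rho_s\nabla\psi_s)
     = \nabla\cdot\bigl(\nabla\rho_s^m + \rho_s\nabla\psi_s\bigr),\\
  \nabla\rho_s^m
    &= m\,\rho_s^{m-1}\nabla\rho_s
     = \rho_s\,\nabla\Bigl(\tfrac{m}{m-1}\rho_s^{m-1}\Bigr),\\
  \nabla\rho_s^m + \rho_s\nabla\psi_s
    &= \rho_s\,\nabla\Bigl(\tfrac{m}{m-1}\rho_s^{m-1} + \psi_s\Bigr)
     = \rho_s\,\nabla G .
\end{align*}
```

In particular, once G is constant on a connected component, the bound on ∇ψ_s transfers directly to (m/(m−1)) ∇ρ_s^{m−1} = −∇ψ_s there.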

2.1. Some preliminaries about rearrangements.
Now we briefly recall some standard notions and basic properties of decreasing rearrangements for nonnegative functions that will be used later. For a deeper treatment of these topics, we refer the reader to the books [51,6,56,60,64] or the papers [81,82,83,73]. We denote by |E|_d the Lebesgue measure of a measurable set E in R^d. Moreover, the set E^# is defined as the ball centered at the origin such that |E^#|_d = |E|_d. If f is radially symmetric, we will often write f(x) = f(r) for r = |x| ≥ 0 by a slight abuse of notation. We say that f is rearranged if it is radial and f is a nonnegative, right-continuous, non-increasing function of r > 0. A similar definition can be applied to real functions defined on a ball B_R(0) = {x ∈ R^d : |x| < R}. We define the distribution function of f ∈ L^1_+(R^d) by
ζ_f(τ) = |{x ∈ R^d : f(x) > τ}|_d for all τ > 0.
Then the function f^* : [0, +∞) → [0, +∞] defined by
f^*(s) = sup{τ > 0 : ζ_f(τ) > s}, s ∈ [0, +∞),
will be called the Hardy-Littlewood one-dimensional decreasing rearrangement of f. By this definition, one can interpret f^* as the generalized right-inverse function of ζ_f(τ).
Making use of the definition of f^*, we can define a special radially symmetric decreasing function f^#, which we will call the Schwarz spherical decreasing rearrangement of f, by means of the formula
f^#(x) = f^*(ω_d |x|^d), x ∈ R^d, (2.12)
where ω_d is the volume of the unit ball in R^d. It is clear that if the set Ω_f = {x ∈ R^d : f(x) > 0} has finite measure, then f^# is supported in the ball Ω_f^#. One can show that f^* (and so f^#) is equidistributed with f (i.e. they have the same distribution function). Thus if f ∈ L^p(R^d), a simple use of Cavalieri's principle (see e.g. [82,60]) leads to the invariance property of the L^p norms:
‖f‖_{L^p(R^d)} = ‖f^*‖_{L^p(0,∞)} = ‖f^#‖_{L^p(R^d)} for all 1 ≤ p ≤ ∞. (2.13)
In particular, using the layer-cake representation formula (see e.g. [64]) one can easily infer that
f^#(x) = ∫_0^∞ χ_{{f>τ}^#}(x) dτ.
Among the many interesting properties of rearrangements, it is worth mentioning the Hardy-Littlewood inequality (see [51,6,60] for the proof): for any couple of nonnegative measurable functions f, g on R^d, we have
∫_{R^d} f(x)g(x) dx ≤ ∫_0^∞ f^*(s)g^*(s) ds = ∫_{R^d} f^#(x)g^#(x) dx. (2.14)
Since in Section 4 we will use estimates of the solutions of Keller-Segel problems in terms of their integrals, let us now recall the concept of comparison of mass concentration, taken from [85], which is remarkably useful.
Let f, g ∈ L^1_loc(R^d) be two nonnegative, radially symmetric functions on R^d. We say that f is less concentrated than g, and we write f ≺ g, if for all R > 0 we have
∫_{B_R(0)} f(x) dx ≤ ∫_{B_R(0)} g(x) dx.
The partial order relationship ≺ is called comparison of mass concentrations. Of course, this definition can be suitably adapted if f, g are radially symmetric and locally integrable functions on a ball B_R. The comparison of mass concentrations enjoys a nice equivalent formulation if f and g are rearranged, for whose proof we refer to [1,41,86]: f ≺ g if and only if for every convex nondecreasing function Φ : [0, ∞) → [0, ∞) with Φ(0) = 0 we have
∫_{R^d} Φ(f(x)) dx ≤ ∫_{R^d} Φ(g(x)) dx.
From this Lemma, it easily follows that if f ≺ g and f, g ∈ L^p(R^d) are rearranged and nonnegative, then
‖f‖_{L^p(R^d)} ≤ ‖g‖_{L^p(R^d)} for all p ∈ [1, ∞].
Let us also observe that if f, g ∈ L^1_+(R^d) are nonnegative and rearranged, then f ≺ g if and only if for all s ≥ 0 we have
∫_0^s f^*(σ) dσ ≤ ∫_0^s g^*(σ) dσ.
In this regard, another interesting property which will turn out to be useful is the following: let f, g ∈ L^1_+(R^d) with ‖f‖_{L^1(R^d)} = ‖g‖_{L^1(R^d)}. If additionally g is rearranged and f^# ≺ g, then
∫_{R^d} f(x)|x|^2 dx ≥ ∫_{R^d} g(x)|x|^2 dx.
Proof. Let us consider the sequence of bounded radially increasing functions {ϕ_n}, where ϕ_n(x) = min{|x|^2, n} is the truncation of the function |x|^2 at the level n, and define the function h_n(x) = n − ϕ_n(x). Then h_n is nonnegative, bounded and rearranged. Thus using the Hardy-Littlewood inequality (2.14) and [1, Corollary 2.1] we find
∫_{R^d} f ϕ_n dx = n‖f‖_{L^1(R^d)} − ∫_{R^d} f h_n dx ≥ n‖g‖_{L^1(R^d)} − ∫_{R^d} f^# h_n dx ≥ n‖g‖_{L^1(R^d)} − ∫_{R^d} g h_n dx = ∫_{R^d} g ϕ_n dx.
Then passing to the limit as n → ∞ we find the desired result.

Continuous Steiner symmetrization.
Although classical decreasing rearrangement techniques are very useful to study properties of the minimizers and of solutions of the evolution problem (1.1) in the next sections, we do not know how to use them in connection with showing that stationary states are radially symmetric. For an introduction to continuous Steiner symmetrization and its properties, see [16,18,64]. In this subsection, we will use continuous Steiner symmetrization to prove the following proposition.
, and assume it is not radially decreasing after any translation.
Moreover, if m ∈ (0, 1) ∪ (1, ∞), assume that | m m−1 ∇µ m−1 0 | ≤ C 0 in supp µ 0 for some C 0 ; and if m = 1 assume that |∇ log µ 0 | ≤ C 0 in supp µ 0 for some C 0 . In addition, if m ∈ (0, 1], assume that supp µ 0 = R d . Then there exist some δ 0 > 0, c 0 > 0, C 1 > 0 (depending on m, µ 0 and W ) and a function µ ∈ C([0, δ 0 ] × R d ) with µ(0, ·) = µ 0 , such that µ satisfies the following for a short time τ ∈ [0, δ 0 ], where E is as given in (2.5): Then we define the Steiner symmetrization of E with respect to the direction x 1 as the set S(E) which is symmetric about the hyperplane {x 1 = 0} and is defined by In particular we have that For all x ∈ R d−1 , let us consider the distribution function of µ 0 (·, x ), i.e. the function (2.19) Then we can give the following definition: Definition 2.9. We define the Steiner symmetrization (or Steiner rearrangement) of µ 0 in the direction x 1 as the function Sµ 0 = Sµ 0 (x 1 , x ) such that Sµ 0 (·, x ) is exactly the Schwarz rearrangement of µ 0 (·, x ) i.e. (see (2.12)) As a consequence, the Steiner symmetrization Sµ 0 (x 1 , x ) is a function being symmetric about the hyperplane {x 1 = 0} and for each h > 0 the level set which implies that Sµ 0 and µ 0 are equidistributed, yielding the invariance of the L p norms when passing from µ 0 to Sµ 0 , that is for all p ∈ [1, ∞] we have Moreover, by the layer-cake representation formula, we have Now, we introduce a continuous version of this Steiner procedure via an interpolation between a set or a function and their Steiner symmetrizations that we will use in our symmetry arguments for steady states. (1) If U = I(c, r), then M τ (U ) := I(c − τ sgn c, r) for 0 ≤ τ < |c|, I(0, r) for τ ≥ |c|.
where τ 1 is the first time two intervals M τ (I(c i , r i )) share a common endpoint. Once this happens, we merge them into one open interval, and repeat this process starting from τ = τ 1 .
See Figure 1 for illustrations of M τ (U ) in the cases (1) and (2). Also, we point out that case (3) can be seen as a limit of case (2), since for each N 1 < N 2 one can easily check that M τ (U N1 ) ⊂ M τ (U N2 ) for all τ ≥ 0. Moreover, according to [18], the definition of M τ (U ) can be extended to any measurable set U of R, since  In the next lemma we state four simple facts about M τ . They can be easily checked for case (1) and (2) (hence true for (3) as well by taking the limit), and we omit the proof.  Once we have the continuous Steiner symmetrization for a one-dimensional set, we can define the continuous Steiner symmetrization (in a certain direction) for a non-negative function in R d .
Using the above definition, Lemma (2.11) and the representation (2.20) one immediately has Furthermore, it is easy to check that S τ µ 0 = µ 0 for all τ if and only if µ 0 is symmetric decreasing about the hyperplane H = {x 1 = 0}. Below is the definition for a function being symmetric decreasing about a hyperplane: For a hyperplane H ⊂ R d (with normal vector e), we say µ 0 is symmetric decreasing about H if for any x ∈ H, the function f (τ ) := µ 0 (x + τ e) is rearranged, i.e. if f = f # .
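To make the one-dimensional map M^τ concrete, here is a worked instance of case (1) for a single interval. It assumes that I(c, r) denotes the open interval (c − r, c + r) with center c and half-length r, which is how the notation appears to be used in the text; the specific numbers are chosen only for illustration.

```latex
% Worked instance (requires amsmath). Assumed notation: I(c, r) = (c - r, c + r).
\begin{align*}
  U &= I(2,1) = (1,3), \\
  M^\tau(U) &= I(2-\tau,\,1) = (1-\tau,\;3-\tau) && \text{for } 0\le\tau<2,\\
  M^\tau(U) &= I(0,1) = (-1,\,1) && \text{for } \tau\ge 2,\\
  |M^\tau(U)| &= 2 = |U| && \text{for all } \tau\ge 0 .
\end{align*}
```

The interval slides toward the origin at unit speed, keeps its length, and stops once it is centered, at which point it coincides with the symmetrized set I(0, r).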
Next we state some basic properties of S τ without proof, see [18,56,58] for instance.

2.2.2.
Interaction energy under Steiner symmetrization. In this subsection, we will investigate I[S τ µ 0 ]. It has been shown in [18,Corollary 2] and [64,Theorem 3.7] that I[S τ µ 0 ] is nonincreasing in τ . Indeed, in the case that µ 0 is a characteristic function χ Ω0 , it is shown in [72] that I[S τ µ 0 ] is strictly decreasing for τ small enough if Ω 0 is not a ball. However, in order to obtain (2.16) for a strictly positive c 0 , some refined estimates are needed, and we will prove the following: Assume the hyperplane H = {x 1 = 0} splits the mass of µ 0 into half and half, and µ 0 is not symmetric decreasing about H. Let I[·] be given in (2.5), where W satisfies the assumptions (K1)-(K3). Then I[S τ µ 0 ] is non-increasing in τ , and there exists some δ 0 > 0 (depending on µ 0 ) and c 0 > 0 (depending on µ 0 and W ), such that The building blocks to prove Proposition 2.15 are a couple of lemmas estimating how the interaction energy between two one-dimensional densities µ 1 , µ 2 changes under continuous Steiner symmetrization for each of them. That is, we will investigate how changes in τ for a given one dimensional kernel K to be determined. We start with the basic case where µ 1 , µ 2 are both characteristic functions of some open interval.
is as given in Definition 2.10. Then the following holds for the function I(τ ) :

Proof. By definition of S τ , we have S τ µ i = χ M τ (I(ci,ri)) for i = 1, 2 and all τ ≥ 0. If sgn c 1 = sgn c 2 , the two intervals M τ (I(c i , r i )) move in the same direction for small enough τ , during which their interaction energy I(τ ) remains constant, implying d dτ I(0) = 0. Hence it suffices to focus on sgn c 1 ≠ sgn c 2 and prove (2.22).
Without loss of generality, we assume that c 2 > c 1 , so that sgn c 2 − sgn c 1 is either 2 or 1. The definition of M τ gives Taking its right derivative in τ yields Let us deal with the case r 1 ≤ r 2 first. In this case we rewrite d + dτ I(0) as d + dτ , as illustrated in Figure 3. Let (Q̃ + and D are the yellow set and green set in Figure 3 respectively). By definition, Q̃ + and D are disjoint subsets of Q + , so We claim that ∫ Q − K′(x − y)dxdy + ∫ Q̃ + K′(x − y)dxdy ≥ 0. To see this, note that Q − ∪ Q̃ + forms a rectangle, whose center has a zero x-coordinate and a positive y-coordinate. Hence for any h > 0, the line segment Q̃ + ∩ {x − y = −h} is longer than Q − ∩ {x − y = h}, which gives the claim. Therefore, (2.24) becomes Note that D is a rectangle with area r 1 (c 2 − c 1 ), and for any (x, y) ∈ D, we have (recall that This finally gives Similarly, if r 1 > r 2 , then I′(0) can be written as (2.23) with Q̃ defined as [−r 1 + (c 2 − c 1 ), r 1 + (c 2 − c 1 )] × [−r 2 , r 2 ] instead, and the above inequality would hold with the roles of r 1 and r 2 interchanged. Combining these two cases, we have The next lemma generalizes the above result to open sets with finite measures.
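The qualitative content of the two-interval estimate can be checked numerically with a short sketch. It assumes the illustrative choice W(s) = s, so that the kernel is K l (r) = −(1/2)√(r² + l²), which is even, C¹, and decreasing for r > 0, and it approximates the double integral by a midpoint rule. The function names and the grid size `n` are hypothetical.

```python
import math


def kernel(r, l=1.0):
    """Even C^1 kernel K_l(r) = -0.5 * W(sqrt(r^2 + l^2)), with the
    illustrative (assumed) choice W(s) = s; it is decreasing for r > 0."""
    return -0.5 * math.sqrt(r * r + l * l)


def shift_toward_origin(c, r, tau):
    """M^tau of a single interval I(c, r): unit-speed motion toward 0."""
    if tau >= abs(c):
        return (0.0, r)
    return (c - math.copysign(tau, c), r)


def interaction(i1, i2, n=200):
    """Midpoint-rule approximation of the double integral of K(x - y)
    over I(c1, r1) x I(c2, r2)."""
    (c1, r1), (c2, r2) = i1, i2
    h1, h2 = 2 * r1 / n, 2 * r2 / n
    total = 0.0
    for i in range(n):
        x = c1 - r1 + (i + 0.5) * h1
        for j in range(n):
            y = c2 - r2 + (j + 0.5) * h2
            total += kernel(x - y)
    return total * h1 * h2


def energy_along_flow(i1, i2, tau, n=200):
    """I(tau): interaction of the two intervals after symmetrizing both."""
    return interaction(shift_toward_origin(*i1, tau),
                       shift_toward_origin(*i2, tau), n)
```

With centers of opposite sign, e.g. `i1 = (-2.0, 0.5)` and `i2 = (1.5, 1.0)`, the value `energy_along_flow(i1, i2, tau)` increases with small `tau`, while for centers of the same sign it stays constant, since both intervals are translated by the same amount.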
In addition, assume that there exists some a ∈ (0, 1) and R > max{|U 1 |, |U 2 |} such that Proof. It suffices to focus on the case when U 1 , U 2 both consist of a finite disjoint union of open intervals, and for the general case we can take the limit. Recall that S τ µ i = χ M τ (Ui) for i = 1, 2 and all τ ≥ 0.
To show (a), due to the semigroup property of S τ in Lemma 2.14, all we need to show is d + dτ I(0) ≥ 0. By writing U 1 , U 2 each as a union of disjoint open intervals and expressing I(τ ) as a sum of the pairwise interaction energies, (a) immediately follows from Lemma 2.16(a).
We will prove (b) next. First, we claim that To see this, note that , and we aim to prove (2.25) at this particular time where all intervals I(c 1 k , r 1 k ) are disjoint, and none of them share common endpoints -if they do, we merge them into one interval.
. We then define , and denote by I 2 the set of indices k such that −R − |U 2 |/2 ≤ c 2 k ≤ − a 4 , and similarly we have k∈I2 r 2 k ≥ a/8.
The semigroup property of M τ in Lemma 2.11 gives that for all s > 0, Since none of the intervals I(c 1 k , r 1 k ) share common endpoints, we have A similar result holds for M τ0+s (U 2 ), hence we obtain for sufficiently small s > 0: Applying Lemma 2.16(a) to the above identity yields Next we will obtain a lower bound for T kl . By definition of I 1 and I 2 , for each k ∈ I 1 and l ∈ I 2 we have that where c w = min r∈[ a 4 ,4R] |K (r)| (here we used that for k ∈ I 1 , l ∈ I 2 , we have r 1 k + r 2 l + |c 2 l − c 1 k | ≤ |U 1 |/2 + |U 2 |/2 + (R + |U 1 |/2) + (R + |U 2 |/2) ≤ 4R, due to the assumption R > max{|U 1 |, |U 2 |}.) Plugging the above inequality into (2.28) and using min{u, v} ≥ min{u, 1} min{v, 1} for u, v > 0, we have here we applied (2.27) in the second-to-last inequality, and used the assumption a ∈ (0, 1) for the last inequality. Since τ 0 ∈ [0, a/4] is arbitrary, we can conclude.
Now we are ready to prove Proposition 2.15.

Proof of Proposition 2.15. Our discussion above yields that at least one of B R,a 1 and B R,a 2 is nonempty when R is sufficiently large and a > 0 sufficiently small (hence at least one of them must have nonzero measure by continuity of µ 0 ). Next let us discuss two cases.

Case 1: Both B R,a 1 and B R,a 2 have nonzero measure when R is sufficiently large and a > 0 sufficiently small.
Let us define a one-dimensional kernel K l (r) := − (1/2) W (√(r 2 + l 2 )). Note that for any l > 0, the kernel K l ∈ C 1 (R) is even in r, and K′ l (r) < 0 for all r > 0. By definition of S τ , we can rewrite Thus using the notation in (2.21), I[S τ µ 0 ] can be rewritten as and taking its right derivative (and applying Lemma 2.17(a)) yields By definition of B R,a 1 and B R,a 2 , for any (x′, h 1 ) ∈ B R,a 1 and (y′, h 2 ) ∈ B R,a 2 , we can apply Lemma 2.17(b) to obtain where c w is the minimum of |K′ |x′−y′| (r)| for r ∈ [a/4, 4R]. By definition of K l (r), we have Using |x′| ≤ R and |y′| ≤ R (due to the definition of B 1 , B 2 ), we have hence we can conclude the desired estimate.
Case 2: Only one of B R, a   1 and B R,a 2 has nonzero measure for R 1 and 0 < a x |/2, +∞) for almost every x ∈ R d−1 and h > 0. Thus using the layer cake representation formula (2.20), we have µ 0 ≤ Sµ 0 in {x 1 < 0}, where Sµ 0 is the Steiner symmetrization of µ 0 . On the other hand, using the assumption that H splits the mass of µ 0 into half and half, µ 0 and Sµ 0 must have the same mass in Combining this with |B R,a 1 | > 0, some U h0 x 0 must contain disjoint intervals with a positive gap. By the continuity of µ 0 , there exists some 0 ≤ x l < x r and some sufficiently small a > 0, such that has a nonzero measure for the above a > 0, and for R > 0 sufficiently large.
Let us denote τ 0 : where∪ represents the disjoint union. Now for all 0 < τ < τ 0 , we are ready to take the right derivative of (2.29) (and applying Lemma 2.17(a)) to obtain (2.33) Since |B R,a 1 | > 0 and |B R,a 2 | > 0, the rest of the argument is identical to the last part of Case 1, and at the end we obtain finishing the proof for Case 2.

2.2.3. Proof of Proposition 2.8. In the statement of Proposition 2.8, we assume that µ 0 is not radially decreasing up to any translation. Since Steiner symmetrization only deals with symmetrizing in one direction, we will use the following simple lemma linking radial symmetry with being symmetric decreasing about hyperplanes. Although the result is standard (see [48,Lemma 1.8]), for the sake of completeness we include the details of the proof here.
Lemma 2.18. Suppose for every unit vector e, there exists a hyperplane H ⊂ R d with normal vector e, such that µ 0 is symmetric decreasing about H. Then µ 0 must be radially decreasing up to a translation.
Proof. For i = 1, . . . , d, let e i be the unit vector with i-th coordinate 1 and all the other coordinates 0. By assumption, for each i, there exists some hyperplane H i with normal vector e i , such that µ 0 is symmetric decreasing about H i . We then represent each H i as {(x 1 , . . . , x d ) : x i = a i } for some a i ∈ R, and then define a ∈ R d as a := (a 1 , . . . , a d ). Our goal is to prove that µ 0 (· − a) is radially decreasing.
We first claim that µ 0 ( The claim implies that every hyperplane H passing through a must split the mass of µ 0 into half and half. Denote the normal vector of H by e. By assumption, µ 0 is symmetric decreasing about some hyperplane H′ with normal vector e. The definition of symmetric decreasing implies that H′ is the only hyperplane with normal vector e that splits the mass into half and half, hence H′ must coincide with H. Thus µ 0 is symmetric decreasing about every hyperplane passing through a, hence we can conclude.

Proof of Proposition 2.8. Since µ 0 is not radially decreasing up to any translation, by Lemma 2.18, there exists some unit vector e, such that µ 0 is not symmetric decreasing about any hyperplane with normal vector e. In particular, there is a hyperplane H with normal vector e that splits the mass of µ 0 into half and half, and µ 0 is not symmetric decreasing about H. We set e = (1, 0, . . . , 0) and H = {x 1 = 0} throughout the proof without loss of generality. For the rest of the proof, we will discuss two different cases m ∈ (0, 1] and m > 1, and construct µ(τ, ·) in different ways.
Case 1: m ∈ (0, 1]. In this case, we simply set µ(τ, ·) = S τ µ 0 . By Proposition 2.15, I[S τ µ 0 ] is decreasing at least linearly for a short time. Since continuous Steiner symmetrization preserves the distribution function, even if S[µ 0 ] = −∞ by itself, we still have the difference S[µ(τ )] − S[µ 0 ] ≡ 0 in the sense of (2.6). Thus (2.16) holds for all sufficiently small τ > 0. In addition, (2.18) is automatically satisfied since we assumed that supp µ 0 = R d for m ∈ (0, 1], and recall that S τ is mass-preserving by definition. It then suffices to prove (2.17) for all sufficiently small τ > 0. Let us discuss the case m = 1 first. By assumption, |∇ log µ 0 | ≤ C 0 . For any y ∈ R d and τ > 0 we claim that To see this, let us fix any y = (y 1 , y ) ∈ R d . Since log µ 0 (·, y ) is Lipschitz with constant C 0 , for any τ > 0, the following two inequalities hold: Since the level sets of µ 0 are moving with velocity at most 1 (and note that any level set of µ 0 is also a level set of log µ 0 ), we obtain (2.34). It implies We then have |µ(τ, y) − µ 0 (y)| ≤ 2C 0 µ 0 (y)τ for all τ ∈ (0, log 2 C0 ) and all y ∈ R d . Now we move on to m ∈ (0, 1), where we aim to show that |µ(τ, y) − µ 0 (y)| ≤ C 1 µ 2−m 0 (y)τ for some C 1 for all sufficiently small τ > 0. Using the assumption |∇ m 1−m µ m−1 0 | ≤ C 0 , the same argument to obtain (2.34) then gives the following for all y ∈ R d , τ > 0: ∞ . For any τ ∈ (0, δ 0 ), the left hand side of the above inequality is strictly positive, thus we have (2. 35) and note that our choice of δ 0 ensures that , which is a convex and decreasing function in a with f (0) = 0. Using this function f , the above inequality (2.35) can be rewritten as Since f is convex and decreasing, for all |a| ≤ C0(1−m) and this leads to Case 2: m > 1. 
Note that if we set µ(τ, ·) = S τ µ 0 , then it directly satisfies (2.16) for a short time, since I[S τ µ 0 ] is decreasing at least linearly for a short time by Proposition 2.15, and we also have that S[S τ µ 0 ] is constant in τ . However, S τ µ 0 does not satisfy (2.17) and (2.18). To solve this problem, we will modify S τ µ 0 into S̃ τ µ 0 , where we make the set U h for some sufficiently small constant h 0 > 0 to be determined later. More precisely, we define µ(τ, ·) = S̃ τ µ 0 as S̃ with v(h) as in (2.36) For an illustration of the difference between S τ µ 0 and S̃ τ µ 0 , see the left figure of Figure 4.

Note that S̃ τ µ 0 and S τ µ 0 do not necessarily have the same distribution function. Due to a reduced speed v(h) for h ∈ (0, h 0 ) in the construction of S̃ τ , a higher block may travel over a lower block, as illustrated in the right figure of Figure 4. When this happens, the part that is hanging outside would "drop down" as we integrate in h in (2.37), thus changing the distribution function of S̃ τ µ 0 . But this is not likely (and even impossible) to happen when τ ≪ 1: indeed, using the regularity assumption |∇µ m−1 0 | ≤ C 0 and the particular v(h) in (2.36), one can show that the level sets remain ordered for small enough τ . But we will not pursue this direction, since later we will show in (2.41) that S[S τ µ 0 ] ≤ S[µ 0 ] for all τ > 0, which is sufficient for us.

Figure 4. Left: A sketch of µ 0 (grey), S τ µ 0 (blue) and S̃ τ µ 0 (red dashed) for a small τ > 0. Right: In the construction of S̃ τ , due to a reduced speed at lower values, a higher value level set may travel over a lower value level set. The figure illustrates this phenomenon for a large τ > 0.
Finally, we will show that (2.16) holds for To see this, note that the definition ofS τ and the fact that M v(h)τ is measure preserving give us To show (2.43), we first split S τ µ 0 as the sum of two integrals in h ∈ [h 0 , ∞) and h ∈ [0, h 0 ): We then splitS τ µ 0 similarly, and since v(h) = 1 for all h > h 0 we obtaiñ )dx for any measurable function ϕ). Indeed, since the level sets of f 2 are traveling at speed 1 and the level sets off 2 are traveling with speed v(h), for each τ we can find a transport plan between them with maximal displacement L ∞ distance at most 2τ in its support. Let us remark that since these densities are both in L ∞ , there is some optimal transport mapT for the ∞-Wasserstein such that |T (τ, x) − x| ≤ 2τ . Although existence of an optimal map is known [38], we just need a transport map with this property below.
Using the decompositions (2.44), (2.45) and the definition of I[·], we obtain, omitting the τ dependence on the right hand side, , and we will bound A 1 (τ ) and A 2 (τ ) in the following. For A 1 (τ ), denote Φ(τ, ·) =: W * f 1 (τ, ·), and using the L ∞ , L 1 bounds on f 1 and the assumptions (K2),(K3), we proceed in the same way as in Using that T (τ, ·)#f 2 (τ, ·) =f 2 (τ, ·), we can rewrite A 1 (τ ) as where the coefficient of τ can be made arbitrarily small by choosing h 0 sufficiently small. To control and both terms can be controlled in the same way as A 1 (τ ), since both Φ 2 := W * f 2 andΦ 2 := W * f 2 satisfy the same estimate as Φ. Combining the estimates for A 1 (τ ) and A 2 (τ ), we can choose h 0 > 0 sufficiently small, depending on µ 0 and W , such that equation (2.43) would hold for all τ , which finishes the proof.
Next we move on to the case m = 1. Using the notation g(τ, x), the difference E[µ(τ )] − E[ρ s ] can be rewritten as follows: (where we again omit the x dependence in the integrand) Again, we have J 1 = 0 since g(τ )dx = 0, and log ρ s + W * ρ s = C in R d . J 3 is the same term as I 2 , thus again can be controlled by Aτ 2 . Finally it remains to control J 2 . Let us break J 2 into For J 22 , using the inequality log(1 + a) < a for all a > 0, we have where we use (2.47) in the second inequality. To control J 21 , due to the elementary inequality |log (1 + a) − a| ≤ Ca 2 for all a > 0 for some universal constant C, letting a = g(τ ) ρs and apply it to J 21 gives where the last inequality is obtained in the same way as (2.51 [55,76,2,34,42]. More precisely, for (1.1), if W is known to be λ-convex, then given any ρ 0 ∈ P 2 (R d ) (space of non-negative probability measures with finite second-moment) with E[ρ 0 ] < ∞, there exists a unique gradient flow ρ(t) of the free energy functional E[ρ 0 ] in the space P 2 (R d ) endowed by the 2-Wasserstein distance. In addition, the gradient flow coincides with the unique weak solution if the velocity field has the necessary integrability conditions.
The λ-convexity of the potential W does not hold in the generality of our assumptions (K1)-(K4). However, the λ-convexity assumption on W has been recently relaxed in the following works for the particular, but important, case of the attractive Newtonian kernel. [42] has shown that the gradient flow is well-posed if the energy E is ξ-convex, where ξ is a modulus of convexity. [35] has recently shown that for (1.1) with attractive Newtonian potential, for any ρ 0 in L ∞ (R d ) ∩ P 2 (R d ), there is a local-in-time gradient flow solution. The authors show that there are local in time L ∞ bounds at the discrete variational level allowing for local in time well defined gradient flow solutions. Furthermore, this gradient flow solution is unique among a large class of weak solutions due to the earlier results [32]. There, it was also shown that the free energy functional E is ξ-convex for m ≥ 1 − 1 d in the set of bounded densities L ∞ (R d ) ∩ P 2 (R d ) with a given fixed bound allowing the use of the recent theory of ξ-convex gradient flows in [42]. Summarizing, the recent results for the Newtonian attractive kernel [42,32,35] allow for a rigorous gradient flow structure of the Newtonian attractive kernel case for m ≥ 1 − 1 d with initial data in L ∞ (R d ) ∩ P 2 (R d ).
In short we now know two particular more restrictive classes of potentials than the assumptions (K1)-(K4), including the Newtonian kernel case, for which a rigorous gradient flow theory has been developed for (1.1). Next we will show that under a rigorous gradient flow structure, once we use continuous Steiner symmetrization to obtain Proposition 2.15, it almost directly leads to radial symmetry via the following shortcut. In particular, Proposition 2.8 is not needed. Below is the statement and proof of the new proposition that we include for the sake of completeness. Note that it is weaker than Theorem 2.2, since Wasserstein gradient flow requires solutions to have a finite second moment, and furthermore for the existence of the gradient flow solutions we need to assume m ≥ 1 − 1 d . We will discuss this difference in Remark 2.20. Proposition 2.19. Assume that W is such that (1.1) has a local-in-time unique gradient flow solution. Let ρ s ∈ L ∞ (R d ) ∩ P 2 (R d ) be a stationary solution of (1.1) with E[ρ s ] being finite. Then ρ s must be radially decreasing after a translation.
Proof. Towards a contradiction, assume there is a stationary state ρ s that is not radially decreasing after any translation. As before, Lemma 2.3 yields that ρ s ∈ C(R d )∩L 1 + (R d ). Applying Lemma 2.18 to ρ s allows us to find a hyperplane H that splits the mass of ρ s into half and half, but ρ s is not symmetric decreasing about H. Without loss of generality assume H = {x 1 = 0}. Applying Proposition 2.15 to ρ s and using the fact that the L m norm is conserved under the continuous Steiner symmetrization S τ , we directly have that where c 0 , δ 0 are strictly positive constants that depend on ρ s . In addition, since the continuous Steiner symmetrization S τ gives an explicit transport plan from ρ s to S τ ρ s , where each layer is shifted by no more than distance τ , we have W ∞ (ρ s , S τ ρ s ) ≤ τ , thus On the other hand, the local in time gradient flow solution ρ(t) with initial solution ρ s satisfies an Evolution Differential Inequality (EVI) (see [42,Definition 2.10] when W is the Newtonian kernel), then arguing as in [3,Proposition 3.6], see also [32], we have that the following energy dissipation inequality is satisfied, for all t ≥ 0 both for λ-convex potentials, actually (2.54) holds with equality, and for the Newtonian attractive potential. This is a consequence of the map t → |∂E|(ρ(t)) being decreasing and lower semicontinuous, see for instance [2,Theorem 2.4.15] in the λ-convex case and [42,Theorem 3.12] in the Newtonian kernel case. Since ρ(t) ≡ ρ s is a gradient flow solution, plugging it into (2.54) yields that the left hand side is 0, whereas the right hand side is less than − 1 2 c 2 0 t which is negative for all t > 0, a contradiction.
Remark 2.20. The assumption that ρ s is a probability measure does not create any actual restriction. If ρ s is a stationary solution of (1.1) with mass M 0 ≠ 1, we can simply apply Proposition 2.19 to ρ̃ s := ρ s /M 0 , which has mass 1, and it is a stationary solution of (1.1) with some positive coefficients multiplied to the two terms on the right hand side. However, the assumption that ρ s has finite second moment (which comes in the definition of P 2 (R d )) makes it more restrictive than Theorem 2.2, which only requires ω(1 + |x|)ρ s ∈ L 1 (R d ). Moreover, the assumption of the existence of a local-in-time unique gradient flow solution implies the more restrictive condition on the nonlinear diffusion m ≥ 1 − 1/d in order to be proved with the available literature [3,42].
At the end of this subsection, let us point out that for our main application in this work, where W = −N is the attractive Newtonian kernel modulo translation and m > 1, we could have used this shortcut to show that all stationary solutions ρ s ∈ L 1 + (R d )∩L ∞ (R d ) with finite second moment must be radially decreasing. However, the longer approach (via Proposition 2.8 and Theorem 2.2) is of greater interest for two reasons. One is that, as discussed in Remark 2.20, Theorem 2.2 proves radial symmetry in a more general class of stationary solutions and for more general nonlinear diffusions. Another reason is that the longer approach does not rely on any convexity assumption on W , thus it works even if the equation does not have a rigorous gradient flow structure. Moreover, some of the authors have also recently shown that this longer proof can be generalized to kernels that are more singular than Newtonian [31], for which a rigorous gradient flow theory is missing.
2.5. Including a potential term. In this subsection, we consider the aggregation-diffusion equation with an extra drift term given by a potential V (x): where we assume that m > 0, V (x) ∈ C 1 (R d ) is radially symmetric, and V′(r) > 0 for all r > 0.

Lemma 2.21. Let V ∈ C(R d ) be radially symmetric and non-decreasing in |x|. Let µ ∈ L 1 + (R d ) ∩ L ∞ (R d ) be such that ∫ µV dx < ∞. Then ∫ S τ [µ]V dx is non-increasing in τ for all τ ≥ 0.
Proof. For any n ∈ N + , let ϕ n (x) := max{0, V (n) − V (x)}. (Here we define V (n) := V (x)| |x|=n by a slight abuse of notation.) Note that supp ϕ n ⊂ B(0, n), and ϕ n is non-increasing in |x|. By the Hardy-Littlewood inequality for continuous Steiner symmetrization [18, Lemma 4], we have Sending n → ∞, the above inequality becomes ∫ S τ [µ]V dx ≤ ∫ µV dx for all τ ≥ 0. The semigroup property of S τ then gives us the desired result.
The above lemma gives that d + dτ ∫ S τ [µ]V dx ≤ 0, but it turns out that we have to improve it into a strict inequality if µ is not symmetric decreasing about H = {x 1 = 0}, which we prove below. ∫ S τ [µ]V dx | τ =0 < 0. As a consequence, for such µ, there is a constant c 0 > 0 (depending on µ and V ) such that for small τ > 0, x is an at most countable union of subintervals. Without loss of generality we assume the subintervals do not share a common endpoint; if so, we add a point to merge them into one interval. Each subinterval can be written in the form I(c, r) = (c − r, c + r). Since µ is not symmetric decreasing about H, some of these subintervals must have their center not at 0 for some x′ , h. This motivates us to define the set B δ ⊂ R d−1 × R + for 0 < δ ≪ 1: The assumption on µ implies that |B δ | > 0 for sufficiently small δ > 0.
By Definition 2.12, S τ [µ]V dx can be written as

Now let us investigate the innermost integral. For any open set
With this notation, the innermost integral in (2.57) becomes Φ(τ ; U h x , x ).
To estimate d + dτ Φ(τ ; U h x′ , x′ )| τ =0 , let us start with an easier estimate d + dτ Φ(τ ; U, x′ )| τ =0 when U is a single interval I(c, r). If c = 0, clearly d + dτ Φ(τ ; U, x′ )| τ =0 = 0. If c ≠ 0 (WLOG assume c < 0), then M τ (U ) = I(c + τ, r) for sufficiently small τ > 0, thus where we use |c + r| < |c − r| in the last inequality, which follows from c < 0, and actually we have |c − r| − |c + r| ≥ min{2|c|, 2r}. And if c, r, x′ satisfy |c|, r ∈ [δ, δ −1 ] and |x′ | ≤ δ −1 , we have the quantitative estimate where C δ is given by where we denote V (x) = V (|x|) by a slight abuse of notation. The strict positivity of C δ follows from the fact that V (r) is strictly increasing in r for r ≥ 0, as well as the compactness of the set The above argument immediately leads to the crude estimate as we take the sum of the estimate d + dτ Φ(τ ; U, x′ )| τ =0 ≤ 0 over all the subintervals U ⊂ U h x′ . In addition, if |x′ | ≤ δ −1 and U h x′ has a subinterval I(c, r) with |c|, r ∈ [δ, δ −1 ], we have the quantitative estimate d + dτ Φ(τ ; U h x′ , x′ )| τ =0 ≤ −C δ < 0. By definition of B δ at the beginning of this proof, we have finishing the proof.
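The single-interval computation above can be sanity-checked numerically. The sketch below assumes the illustrative potential V(s) = s² (radially symmetric with V′ > 0 for s > 0) and takes Φ(τ; I(c, r), x′) to be the integral of V over the moved interval along the line through x′; it compares a finite-difference right derivative with the difference of endpoint values. All function names and the quadrature parameters are hypothetical.

```python
import math


def potential(s):
    """Illustrative radial potential V(s) = s^2 (an assumption); V' > 0 for s > 0."""
    return s * s


def phi(c, r, x_perp_norm, n=2000):
    """Midpoint approximation of the integral of V(|(x1, x')|) over x1 in I(c, r)."""
    h = 2 * r / n
    total = 0.0
    for i in range(n):
        x1 = c - r + (i + 0.5) * h
        total += potential(math.hypot(x1, x_perp_norm))
    return total * h


def endpoint_formula(c, r, x_perp_norm):
    """Right derivative at tau = 0 for c != 0: the interval moves toward the
    origin at unit speed, so the integral changes by the difference of V at
    the leading and trailing endpoints."""
    v_right = potential(math.hypot(c + r, x_perp_norm))
    v_left = potential(math.hypot(c - r, x_perp_norm))
    return (v_right - v_left) if c < 0 else (v_left - v_right)


def finite_difference(c, r, x_perp_norm, dtau=1e-4):
    """Forward difference of Phi along the flow M^tau(I(c, r))."""
    moved_c = c - math.copysign(dtau, c)
    return (phi(moved_c, r, x_perp_norm) - phi(c, r, x_perp_norm)) / dtau
```

For `c = -1.0`, `r = 0.5` and `x_perp_norm = 0.3` both quantities are close to 4rc = −2, which is negative in line with |c + r| < |c − r|.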
Our goal in this subsection is to show that the radial symmetry result in Theorem 2.2 can be generalized to (2.55) for certain classes of potential V . We will work with one of the following two classes of V : (V1) 0 < V′(r) ≤ C for some C for all r > 0.
In the following theorem we prove radial symmetry of stationary solutions under assumption (V1) for all m > 0, and under assumption (V2) for m > 1. We expect that when m ∈ (0, 1], it should be possible to refine some estimates in the proof and obtain symmetry for a wider class than (V1). We will not pursue this direction for presentation simplicity, and we leave further generalizations to interested readers.
Theorem 2.23. Assume that ρ s is a non-negative stationary state of (2.55) in the sense of Definition 2.1, with (2.1) replaced by ∇ρ m s = −ρ s ∇(ψ s + V ). If V satisfies (V1), or if V satisfies (V2) and m > 1, then ρ s is radially decreasing about the origin.
Proof. Note that Lemma 2.3 still holds with a potential V , except that right hand sides of (2.7) and (2.8) are now replaced by an x-dependent bound C + |∇V (x)|, which is uniformly bounded in x under (V1). And under the assumptions (V2) and m > 1, we will prove in Lemma 2.24 that ρ s must be compactly supported. Thus in both cases, the right hand sides of (2.7) and (2.8) are still uniformly bounded in x in supp ρ s .
The rest of the proof follows a similar approach as Theorem 2.2 and Proposition 2.8, with E including an extra potential energy V[ρ] := ρV dx. However, some crucial modifications in the proof of Proposition 2.8 are needed, which we highlight below.
First, note that with a potential V , we will prove radial symmetry about the origin, rather than up to a translation. For this reason, we take an arbitrary hyperplane H passing through the origin, and aim to prove that ρ s is symmetric decreasing about H. (WLOG we let H = {x 1 = 0}.) Since H does not necessarily split the mass of ρ s into half and half, it is possible that for all x′ ∈ R d−1 and h > 0, every line segment in U h x′ has its center lying on one side of H. Therefore, the estimate in Proposition 2.15 might fail for ρ s , and all we have is the crude estimate (2.58) Despite this weaker estimate in the interaction energy, we will show that all three estimates of Proposition 2.8 still hold, if we define µ(·, τ ) in the same way as in its proof. Clearly, (2.17) and (2.18) remain true since µ(·, τ ) is defined the same as before. We claim that (2.16) still holds, but for a different reason than before: the coefficient c 0 > 0 used to come from the contribution of the interaction energy via Proposition 2.15, but now it comes from the potential energy. To see this, consider the following two cases. Once we obtain Proposition 2.8, the rest of the proof follows closely the proof of Theorem 2.2, except for the following minor changes. With an extra potential energy in E, the right hand side of (2.50) has an additional term ∫ g(τ )V dx. As a result, I 1 has a different definition which is still 0, since the equation for stationary solutions now becomes The m = 1 case is done with a similar modification, where J 1 is now ∫ g(τ ) (log ρ s + W * ρ s + V ) dx, and again we have J 1 = 0 since ρ s is stationary. Finally, we obtain the same contradiction as in the proof of Theorem 2.2 if ρ s is not symmetric decreasing about H. And since H is an arbitrary hyperplane through the origin, we have that ρ s is radially decreasing about the origin.
Finally we state and prove the lemma used in the proof of Theorem 2.23, which shows all stationary solutions must be compactly supported if m > 1 and V satisfies (V2). Note that every connected component being bounded does not imply that supp ρ s is bounded: there may be a countable number of connected components going to infinity. We claim that there is some R(‖ρ s ‖ 1 , ‖ρ s ‖ ∞ , W, V ) > 0, such that every connected component D must satisfy that D ∩ B(0, R) ≠ ∅. As we will see later, this will help us control the outermost point of D.
If 0 ∈ D, then clearly D ∩ B(0, R) ≠ ∅. If 0 ∉ D, we find some unit vector ν ∈ R d , such that the ray starting at the origin with direction ν has a non-empty intersection with D. Let t 0 = inf{t > 0 : tν ∈ D}, and let x 0 = t 0 ν. We take a sequence of points (t n ) ∞ n=1 such that t n ↘ t 0 and t n ν ∈ D, and denote x n = t n ν. Since x n ∈ D and x 0 ∈ ∂D, the left hand side of (2.59) takes the same constant value C i at x 0 and all x n . As a result, for all n ≥ 1 we have Note that the first term is non-negative since ρ s (x 0 ) = 0 (which follows from x 0 ∈ ∂D and ρ s ∈ C(R d )). The second term converges to ∇(ρ s * W ) · ν, whose absolute value is bounded by C(‖ρ s ‖ 1 , ‖ρ s ‖ ∞ , W ) by (2.2). The third term converges to ∇V (x 0 ) · ν = V′(t 0 ). Putting the three estimates together gives that thus assumption (V2) gives that t 0 ≤ R(‖ρ s ‖ 1 , ‖ρ s ‖ ∞ , W, V ), finishing the proof of the claim.
Finally, we will show that D ∩ B(0, R) ≠ ∅ implies the outermost point of D cannot get too far. Take any x 1 ∈ D ∩ B(0, R), and let x 2 be the outermost point of D. Taking the difference of (2.59) at x 2 and x 1 gives Due to (2.4), we bound the right hand side by C(‖ρ s ‖ 1 , ‖ρ s ‖ ∞ , ‖ω(1+|x|)ρ s ‖ 1 , W ) + ω(1+|x 2 |)‖ρ s ‖ 1 . Note that the left hand side grows superlinearly in |x 2 | due to (V2), whereas ω(1 + |x 2 |) grows at most linearly in |x 2 | by assumption (K3) on W . This leads to which completes the proof.

Existence of global minimizers
In Section 2, we showed that if ρ s ∈ L 1 is a stationary state of (1.1) in the sense of Definition 2.1 and it satisfies ω(1 + |x|)ρ s ∈ L 1 (R d ), then it must be radially decreasing up to a translation. This section is concerned with the existence of such stationary solutions. Namely, under (K1)-(K4) and one of the extra assumptions (K5) or (K6) below, we will show that for any given mass, there indeed exists a stationary solution satisfying the above conditions. We will generalize the arguments of [28] to show that there exists a radially decreasing global minimizer ρ of the functional (2.5) given by over the class of admissible densities and with the potential satisfying at least (K1)-(K4). Note that the condition on the zero center of mass has to be understood in the improper integral sense, i.e.
since we do not assume that the first moment is bounded in the class Y M . We emphasize that from now on we will work in the dominated regime with degenerate diffusion, namely when m > 2 − 2/d. (3.1) In order to avoid loss of mass at infinity, we need to assume some growth condition at infinity. In this section, we will obtain the existence of global minimizers under two different conditions related to the works [67,5,28], and show that such global minimizers are indeed L 1 and L ∞ stationary solutions. Namely, we assume further that the potential W satisfies at infinity either the property
(K5) lim r→+∞ ω + (r) = +∞, or
(K6) lim r→+∞ ω + (r) = ℓ ∈ (0, +∞) where the non-negative potential K := ℓ − W is such that, in the case m > 2, K ∈ Lp(R d \ B 1 (0)), for some 1 ≤p < ∞, while for the case 2 − (2/d) < m ≤ 2 we will require that K ∈ L p,∞ (R d \ B 1 (0)), for some 1 ≤ p < ∞. Moreover, there exists an α ∈ (0, d) for which m > 1 + α/d and
Here, we denote by L p,∞ (R d ) the weak-L p or Marcinkiewicz space of index 1 ≤ p < ∞. In particular, the attractive Newtonian potential (which is the fundamental solution of the −∆ operator in R d ) is covered by these assumptions: for d = 1, 2 it satisfies (K5), whereas for d ≥ 3 it satisfies (K6) with α = d − 2.
Notice that the subadditivity-type condition (K4) allows us to claim that E[ρ] is finite over the class Y M : indeed, if we split W into its positive part W + and negative part W − as done in the bound of ψ s in Section 2, the integral with kernel W − is finite by the HLS inequality, see (3.3) below, while by (K4) we infer

3.1. Minimization of the Free Energy functional. The existence of minimizers of the functional E can be proven with different arguments according to the choice between condition (K5) or (K6): indeed, (K5) produces a quantitative version of the mass confinement effect while (K6) does it in a nonconstructive way. Because of this difference, we first briefly discuss the case when condition (K6) is employed, as it can be proven by a simple application of Lions' concentration-compactness principle [67] and its variant in [5].
being the kernel K nonnegative and radially decreasing; furthermore condition (K3) implies K ∈ L p,∞ (B 1 (0)), where p = d/(d − 2). Then we are in position to apply [5, Theorem 1] for m > 2 and [67, Corollary II.1] for 2 − (2/d) < m ≤ 2 to get the existence of a radially decreasing minimizer ρ 0 ∈ Y M of E (and then of E). Moreover, since K is strictly radially decreasing, all global minimizers are radially decreasing.
When condition (K5) holds, the concentration-compactness principle is not applicable but a direct control of the mass confinement phenomenon is possible. We first prove the following lemma, which provides a reversed Riesz inequality, allowing us to reduce the minimization of E to the set of all radially decreasing densities in Y M . Lemma 3.2. Assume that conditions (K1)-(K5) hold and take a density ρ such that . Then the following inequality holds: and the equality occurs if and only if ρ is a translate of ρ # .
Proof. The proof proceeds exactly as in [27,Lemma 2], up to replacing the function k(r) defined there by the function being r 0 > 0 fixed.
Now we observe that by (3.1) we have 2d + d+2 2d = 2, then by the classical HLS and L p interpolation inequalities, we find where we notice that m > 2(1 − α) if and only if m > 2 − 2 d , that is (3.1). Then by (3.4) we can find a constant C 1 > 0 and a sufficiently large constant C 2 such that . Concerning the case d = 2, we observe that conditions (K1)-(K2) yields and we can use the classical log-HLS inequality and the arguments of [28] to conclude. Concerning the mass confinement, due to (K5) and the same arguments in [28], see also Lemma 4.17, allow us to show Finally, we should check that the interaction potential W is lower semicontinuous as shown in [28, page8]. Indeed, the only technical point to verify in this more general setting relates to the control of the truncated interaction potential A ε for d ≥ 3. Notice that we can estimate due to (2.3)

Now recall that the Newtonian potential
is well defined for a.e. x ∈ R^d and is in L^1_loc(R^d), see [47, Theorem 2.21], then for a.e. x ∈ R^d we have (χ_{B_ε(0)} N) * ρ → 0 as ε → 0. Moreover, by the HLS inequality we have . Then Lebesgue's dominated convergence theorem allows us to conclude that A_ε[ρ] → 0 as ε → 0. This convergence is uniform along a minimizing sequence ρ_n. Now all the ingredients are in place to argue as in [28], showing that E achieves its infimum in the class of all radially decreasing densities in Y_M.
Remark 3.4. According to Theorem 2.2, the radial symmetry of the global minimizers of E, which are particular critical points of E, is not a surprise. Nevertheless, as pointed out in the proofs of Theorems 3.1-3.3, this property can be much more easily achieved by rearrangement inequalities.
A useful result, which will be used in the next arguments, regards the behavior at infinity of the so called W -potential, namely the function Following the blueprint of [37, Lemma 1.1], we have the following result. .
Proof. Since both f and K are radially symmetric, we definef ,K : [0, +∞) → R such that f (|x|) = f (x),K(|x|) = K(x). Note that lim r→∞f (r) = lim r→∞K (r) = 0 due to (K1), (K6) and the assumption on f . To prove (3.6), we break R d K(x − y)f (y)dy into the following three parts with |x| > 1 and control them respectively by: Since all the three parts tend to 0 as |x| → ∞, we obtain (3.6). To show (3.7), we use K, f ≥ 0 to estimate where we apply (K6) to obtain the third inequality, and in the last inequality we define c : Using similar arguments as in [28], we are able to derive the following result, which indeed gives a natural form of the Euler-Lagrange equation associated to the functional E: and

As a consequence, any global minimizer of E verifies
We now turn to show compactness of support and boundedness of the minimizers.
hence combining this with (K5) gives us (W * ρ 0 )(x) → +∞ as |x| → ∞. It implies that the right hand side of (3.9) must have compact support, hence ρ 0 must have compact support too.
Proof. By Theorem 3.1, Theorem 3.3 and Lemma 3.8, ρ 0 is radially decreasing and has compact support say inside the ball B R (0). Let us first concentrate on the proof under assumption (K5). For notational simplicity in this proof, we will denote by ρ 0 m the L m (R d )-norm of ρ 0 .
We will show that ρ 0 ∈ L ∞ (R d ) by different arguments in several cases: Since ρ 0 is supported in B R (0), we can then find some C 1 w and C 2 w , such that Then by equation (3.9) it will be enough to show that the Newtonian potential ρ 0 * N is bounded in B R (0) for d = 1, 2. In d = 1, this is trivial. In d = 2 it follows from [50, Lemma 9.9] since we have that ρ 0 * N ∈ W 2,m (B R (0)), then Morrey's Theorem (see for instance [17,Corollary 9.15]) yields ρ 0 * N ∈ L ∞ (B R (0)).
Case B: d ≥ 3 and m > d/2. In this case we get W − ≤ C w N in the whole R d for some constant C w , so we have for r > 0 Then using Sobolev's embedding theorem again (see again [17,Corollary 9.15]), we easily argue that for m > d/2 we find (ρ 0 * W − )(r) ∈ L ∞ loc (R d ), hence ρ 0 ∈ L ∞ (R d ) by (3.9) again.
Case C: d ≥ 3 and 2 − 2/d < m ≤ d/2. We aim to prove that ρ_0(0) is finite, which is sufficient for the boundedness of ρ_0 since ρ_0 is radially decreasing. This is done by an inductive argument. To begin with, observe that since ρ_0 is radially decreasing we have ρ_0(r)^m |B(0, r)| ≤ ‖ρ_0‖_m^m < ∞, which leads to the basis step of our induction: ρ_0(r) ≤ C(d, m, ‖ρ_0‖_m) r^{−d/m} for all r > 0.
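The basis step can be spelled out in two lines. The following is only a short worked derivation; the symbol \omega_d for the volume of the unit ball in R^d is notation introduced here for illustration and is not taken from the original.

```latex
% Basis step: since \rho_0 is radially decreasing, \rho_0(x) \ge \rho_0(r) on B(0,r).
\[
  \rho_0(r)^m \, \omega_d \, r^d
  = \rho_0(r)^m \, |B(0,r)|
  \le \int_{B(0,r)} \rho_0(x)^m \, dx
  \le \|\rho_0\|_m^m .
\]
% Solving for \rho_0(r) gives the claimed decay rate with an explicit constant:
\[
  \rho_0(r) \le \omega_d^{-1/m} \, \|\rho_0\|_m \, r^{-d/m}
  \qquad \text{for all } r > 0 .
\]
```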
We set our first exponent p = −d/m. For the induction step, we claim that if ρ_0(r) ≤ C_1(1 + r^p) with −d < p < 0, then it leads to the refined estimate where C_2 depends on d, m, ρ_0, W and C_1.
Indeed, taking into account (K2) and (K5), the compact support of ρ_0 together with the fact that N > 0 for d ≥ 3, we deduce that W ≥ −C_{w,d} N for some constant C_{w,d} depending on W and d. As a result, we have, for r ∈ (0, 1), (3.12) We can easily bound (ρ_0 * N)(1) by some C(d, ‖ρ_0‖_m). To control where M(s) is the mass of ρ_0 in B(0, s). By our induction assumption, we have Combining this with (3.13), we have so we get, for p ≠ −2, Plugging it into the right hand side of (3.12) yields and using this inequality in the Euler-Lagrange equation (3.9) leads to (3.11). Moreover, in the case p = −2, we have instead the inequality Now we are ready to apply the induction starting at p = −d/m to show ρ_0(0) < ∞. We will show that after a finite number of iterations our induction arrives at ρ_0(r) ≤ C(1 + r^a) (3.14) for some a > 0, which then implies that ρ_0(0) < ∞. Let g(p) := (p + 2)/(m − 1), which is a linear function of p with positive slope, and let us denote by g^{(n)}(p) := (g ∘ g ∘ ⋯ ∘ g)(p) the n-fold composition of g.
Then it remains to consider the case m < d/2. Notice that −d < p < −2. By (3.11) we get, for all r ∈ (0, 1), ρ 0 (r) ≤ C 2 (1 + r g( p) ). (3.15) Then we must consider three cases. We point out that in all the cases we need to discuss the possibility of g (n) (p) = −2 for some n: if this happens, the logarithmic case occurs again and the result follows in a final iteration step as in Subcase C.1.
Notice that the starting exponent p = −d/m satisfies p > 2/(m − 2), since for m < 2 this condition reads m > 2d/(d + 2), a direct consequence of (3.1). Hence we again obtain g^{(n)}(p) > 0 for some finite n, which finishes the last case.
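The behavior of the iterates of g(p) = (p + 2)/(m − 1) can be made explicit. The sketch below is a worked computation under the assumption that the iteration starts from p_0 = −d/m; the names p_0 and p^* are introduced here for illustration, and the exceptional situation where some iterate equals −2 (the logarithmic case) is not covered.

```latex
% Iterating g(p) = (p+2)/(m-1), m > 1, from a starting exponent p_0.
% For m \neq 2 the unique fixed point of g is p^* = 2/(m-2), and
\[
  g(p) - p^* = \frac{p - p^*}{m-1}
  \quad\Longrightarrow\quad
  g^{(n)}(p_0) = p^* + \frac{p_0 - p^*}{(m-1)^n} .
\]
% m > 2:      0 < 1/(m-1) < 1 and p^* > 0, so g^{(n)}(p_0) \to p^* > 0.
% m = 2:      g(p) = p + 2, so g^{(n)}(p_0) = p_0 + 2n \to +\infty.
% 1 < m < 2:  1/(m-1) > 1 and p^* < 0; if p_0 > p^*, then g^{(n)}(p_0) \to +\infty.
% For p_0 = -d/m and 1 < m < 2 (multiply by m(m-2) < 0, which flips the inequality):
\[
  -\frac{d}{m} > \frac{2}{m-2}
  \iff -d(m-2) < 2m
  \iff m > \frac{2d}{d+2} .
\]
```

In each of the three regimes the iterates become positive after finitely many steps, which is what the induction requires.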
Let us finally turn back to the proof if we assume (K6) instead of (K5). Notice first that the proof of the Case C can also be done as soon as the potential W satisfies the bound W ≥ −C w,d (1 + N ) for some C w,d > 0. This is trivially true regardless of the dimension if the potential satisfies (K6) instead of (K5).
Finally, it is interesting to derive some regularity properties of a minimizer ρ_0, as in [28]. Since W may not be the classical Newtonian kernel, we are led to prove suitable regularity for the W-potential ψ_{ρ_0}(x), which can be transferred to ρ_0 via equation (3.8) in the support of ρ_0. Note that (3.9) ensures that ρ_0 satisfies equation (2.1) in the sense of distributions: indeed, as shown in (2.2)-(2.4), we find that ψ_{ρ_0} ∈ W^{1,∞}_loc(R^d); thus we can take gradients on both sides of the Euler-Lagrange condition (3.9), multiply by ρ_0, and write ρ_0 ∇ρ_0^{m−1} = ((m − 1)/m) ∇ρ_0^m to reach (2.1). Now, using the regularity arguments of the proof of Lemma 2.3 again, together with the compact support property, we finally have ρ_0 ∈ C^{0,α}(R^d) with α = 1/(m − 1).
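The passage from the Euler-Lagrange condition to the stationary equation rests on a one-line chain rule identity. The computation below is a formal sketch; the displayed form of (3.9), with the factor m/(m − 1), is an assumption made here for illustration, chosen because it is the normalization under which the identity produces ∇ρ^m exactly.

```latex
% Chain rule identity used in the text (formal, where \rho > 0 is differentiable):
\[
  \nabla \rho^{m} = m\,\rho^{m-1}\,\nabla\rho ,
  \qquad
  \rho\,\nabla\rho^{m-1} = (m-1)\,\rho^{m-1}\,\nabla\rho = \frac{m-1}{m}\,\nabla\rho^{m} .
\]
% ASSUMPTION: (3.9) reads  \frac{m}{m-1}\rho_0^{m-1} + W*\rho_0 = \text{const}  on supp \rho_0.
% Taking gradients and multiplying by \rho_0 then gives
\[
  0 = \rho_0\,\nabla\Big( \frac{m}{m-1}\,\rho_0^{m-1} + W*\rho_0 \Big)
    = \nabla\rho_0^{m} + \rho_0\,\nabla (W*\rho_0) .
\]
```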
We can summarize all the results in this section in the following theorem. Putting together the previous theorem with the uniqueness of radial stationary solutions for the attractive Newtonian potential proved in [61,28], we obtain the following result.
Corollary 3.11. In the particular case of the attractive Newtonian potential W (x) = −N (x) modulo the addition of a constant factor, the global minimizer obtained in Theorem 3.10 is unique among all stationary solutions in the sense of Definition 2.1.

3.2. Some remarks about the minimization of energies with a potential term. The aim of this subsection is to generalize the previous results of subsection 3.1 to free energy functionals involving a potential energy, namely defined over the same admissible set Y_M, for some C^1 nonnegative radially increasing potential V = V(r), where r = |x|, such that lim_{r→+∞} V(r) = +∞.
In this framework, the functional E might be infinite on some densities ρ. The presence of the confinement potential V then allows us to prove the following generalization of Theorems 3.1-3.3, where no asymptotic behavior at infinity is needed for the radial profile ω(r) of the kernel W: Proof. We first observe that by Remark 2.7 and Lemma 3.2 we can restrict to radially decreasing densities. Moreover, following the lines of the proof of Theorem 3.3 we find that E is bounded from below and This inequality easily implies the mass confinement of any minimizing sequence {ρ_n}, that is for some large R > 0. In particular, the sequence {ρ_n} is tight, and by Prokhorov's Theorem (see [3, Theorem 5.1.3]) we obtain that (up to a subsequence) {ρ_n} converges to a certain density ρ ∈ L^1 This implies that the infimum of E is achieved over a radially decreasing density ρ ∈ Y_M. In order to check that all the global minimizers are radially decreasing, we pick any minimizer ρ ∈ Y_M and use Remark 2.7 and Lemma 3.2 in order to see that then the equality case in Lemma 3.2 yields the conclusion.
We have the following generalization of Theorem 3.7:

As a consequence, any global minimizer of E verifies
The compactly supported property of the minimizers then follows from (3.17) and Lemmas 3.5-3.6. Moreover, it is straightforward to check that Lemma 3.9 continues to hold, as well as Theorem 3.10.

Long-time Asymptotics
We now consider the particular case of (1.1) given by the Keller-Segel model in two dimensions with nonlinear diffusion as where m > 1 and the logarithmic interaction kernel is defined as N(x) = −(1/(2π)) log|x|. This system is also referred to as the parabolic-elliptic Keller-Segel system with nonlinear diffusion, since the attracting potential c = N * ρ solves the Poisson equation −∆c = ρ. It corresponds exactly to the range of diffusion dominated cases as discussed in [23], since solutions do not show blow-up and are globally bounded. We will show, based on the uniqueness part in Section 2, that solutions to (4.1) not only exist globally and are uniformly bounded in time in L^∞, but also stabilize in time towards the unique stationary state for any given initial mass.
The main tool for analyzing stationary states and the existence of solutions to the evolutionary problem is again the following free energy functional A simple differentiation formally shows that E is decaying in time along the evolution corresponding to (4.1), namely d/dt which gives rise to the following (free) energy-energy dissipation inequality for weak solutions for nonnegative initial data ρ_0(x) ∈ L^1((1 + log(1 + |x|^2))dx) ∩ L^m(R^2). The entropy dissipation is given by where here and in the following we use the notation We note that h corresponds to δE/δρ and that, in particular, the evolutionary equation (4.1) can be stated as ∂_t ρ = ∇ · (ρ∇h[ρ]). Thus, this equation bears the structure of a gradient flow of the free energy functional in the sense of probability measures, see [2,9,11,33] and the references therein.
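The formal decay of the free energy follows directly from the gradient flow structure ∂_t ρ = ∇ · (ρ∇h[ρ]) with h = δE/δρ. The computation below is a formal sketch, assuming ρ is smooth and decays fast enough at infinity for the integration by parts to produce no boundary terms.

```latex
% Formal energy dissipation along \partial_t \rho = \nabla\cdot(\rho\,\nabla h[\rho]),
% with h[\rho] = \delta E / \delta\rho.
\[
  \frac{d}{dt} E[\rho(t)]
  = \int_{\mathbb{R}^2} \frac{\delta E}{\delta \rho}\,\partial_t \rho \, dx
  = \int_{\mathbb{R}^2} h[\rho]\,\nabla\cdot\big(\rho\,\nabla h[\rho]\big)\, dx
  = -\int_{\mathbb{R}^2} \rho\,\big|\nabla h[\rho]\big|^2 \, dx \;\le\; 0 .
\]
```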
We first prove the global well-posedness of weak solutions satisfying the energy inequality (4.3) in the next subsection, as well as global uniform-in-time estimates for the solutions. In the second subsection, we use the uniform-in-time estimates together with the uniqueness of the stationary states proved in Section 2 to derive the main result of this section regarding long time asymptotics for (4.1).

4.1. Global well-posedness of the Cauchy problem. In this section we analyze the existence and uniqueness of a bounded global weak solution for initial data in L^1_log(R^2) ∩ L^∞(R^2), where here and in the following we denote L^1_log(R^2) = L^1((1 + log(1 + |x|^2))dx). Assuming a sufficiently regular solution with the gradient of the chemotactic potential being uniformly bounded, Kowalczyk [63] derived a priori bounds in L^∞ with respect to space and time for the Keller-Segel model with nonlinear diffusion on bounded domains. These a priori estimates have been improved and extended to the whole space by Calvez and Carrillo in [23]. We shall demonstrate here how these a priori estimates of [23] can be made rigorous when starting from an appropriately regularized equation, leading to the following theorem. Theorem 4.1. For any nonnegative initial data ρ_0 ∈ L^1_log(R^2) ∩ L^∞(R^2), there exists a unique global weak solution ρ to (4.1), which satisfies the energy inequality (4.3) with the energy being bounded from above and below in the sense that for some (negative) constant E_*. In particular ρ is uniformly bounded in space and time where C depends only on the initial data. Moreover the log-moment grows at most linearly in time where again C depends only on the initial data.
We shall also state the existence result for radial initial data that was obtained in [65] and [61] for higher dimensions and the Newtonian potential. Similar methods can be applied in the case d = 2 considered here:  In the remainder of this section we carry out the proof of the existence of a bounded global weak solution to (4.1) as stated in Theorem 4.1. We therefore introduce the following regularization of (4.1) where m > 1 and the regularized logarithmic interaction potential is defined as Moreover we have for the derivatives The regularization in (4.4) was used by Bian and Liu [8], who studied the Keller-Segel equation with nonlinear diffusion and the Newtonian potential for d ≥ 3, which has been modified accordingly for the logarithmic interaction kernel in d = 2. The additional linear diffusion term in (4.4) removes the degeneracy and the regularized logarithmic potential N ε possesses a uniformly bounded gradient, such that the local well posedness of (4.4) is a standard result for any ε > 0. We shall note that a slightly different regularization for such nonlinear diffusion Keller-Segel type of equations has been introduced by Sugiyama in [80], which also yields the existence and uniqueness of a global weak solution. The advantage of the regularization in (4.4) resembling the one in [8] is the fact that the regularized problem satisfies a free energy inequality, that in the limit gives exactly (4.3), whereas in [80] the dissipation term could only be retained with a factor of 3/4.
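The two properties of the regularized kernel used here, a uniformly bounded gradient and a Laplacian given by a mollifier of unit mass, can be checked by hand. The sketch below assumes the explicit form N_ε(x) = −(1/(4π)) log(|x|^2 + ε^2), which is suggested by the later use of log(|x|^2 + ε^2) but is not displayed in the text; the formula for J_ε is likewise an assumption derived from that choice.

```latex
% ASSUMPTION: N_\varepsilon(x) = -\frac{1}{4\pi}\log(|x|^2+\varepsilon^2) on \mathbb{R}^2.
\[
  \nabla N_\varepsilon(x) = -\frac{1}{2\pi}\,\frac{x}{|x|^2+\varepsilon^2},
  \qquad
  |\nabla N_\varepsilon(x)| \le \frac{1}{4\pi\varepsilon}
  \quad\text{(using } 2\varepsilon|x| \le |x|^2+\varepsilon^2\text{)} ,
\]
\[
  \Delta N_\varepsilon(x)
  = -\frac{1}{4\pi}\Big( \frac{4}{|x|^2+\varepsilon^2} - \frac{4|x|^2}{(|x|^2+\varepsilon^2)^2} \Big)
  = -\frac{1}{\pi}\,\frac{\varepsilon^2}{(|x|^2+\varepsilon^2)^2}
  =: -J_\varepsilon(x) ,
\]
% J_\varepsilon has unit mass:
\[
  \int_{\mathbb{R}^2} J_\varepsilon(x)\,dx
  = \frac{1}{\pi}\cdot 2\pi \int_0^\infty \frac{\varepsilon^2\, r}{(r^2+\varepsilon^2)^2}\,dr
  = 2\varepsilon^2 \Big[ -\frac{1}{2(r^2+\varepsilon^2)} \Big]_0^\infty
  = 1 .
\]
```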
We point out that in the case d = 2 other a priori estimates are available than in higher space dimensions leading to a different proof for global well posedness of the Cauchy problem for (4.4) and the limit ε → 0 compared to [8].
4.1.1. Global well posedness of the regularized Cauchy problem. To derive a priori estimates for the regularized problem (4.4) we use the iterative method used by Kowalczyk [63] based on employing test functions that are powers of ρ ε,k = (ρ ε − k) + for some k > 0. When testing (4.4) against pρ p−1 ε,k for any p ≥ 2, we obtain: where for estimating the integrals involving convolution terms we used the inequality see e.g. Lieb and Loss [64]. Closing the estimate (4.6) would yield an estimate for ρ ε,k in L ∞ (0, T ; L p (R 2 )) and thus also for ρ ε ∈ L ∞ (0, T ; L p (R 2 )), since Kowalczyk proceeded from (4.5) with the assumption corresponding to ∇N ε * ρ ε L ∞ ≤ C.
Observe that it would be sufficient to prove ρ_ε ∈ L^∞(0, T; L^p(R^2)) for some p > 2, implying ∆N_ε * ρ_ε ∈ L^∞(0, T; L^p(R^2)) and hence the uniform boundedness of the gradient term by Sobolev embedding. Calvez and Carrillo [23] circumvent this assumption and derive the bound by using an equi-integrability property in the inequality (4.6). Hence, in order to be able to follow the ideas of [23] for the regularized problem, we need to derive the corresponding energy inequality for the latter.
Proposition 4.3. For any finite time T > 0 the solution ρ_ε to the Cauchy problem (4.4) supplemented with initial data ρ_0 ∈ L^1_log(R^2) ∩ L^∞(R^2) satisfies the energy inequality for a positive constant C = C(M, ‖ρ_0‖_∞) and 0 ≤ t ≤ T, where E_ε is an approximation of the free energy functional in (4.2): and D_ε the corresponding dissipation In particular, we obtain equi-integrability Remark 4.4. Note that due to the ∆ρ regularization term in (4.4), its associated energy functional actually includes an extra term ρ log ρ compared to E. But in this proposition we choose to obtain an energy inequality for E (rather than the actual associated energy functional), since the absence of the extra term ρ log ρ will make it easier for us to obtain a priori estimates independent of ε later.
where we have used (4.7) and the fact that J ε L 1 (R 2 ) = 1. Hence we need to derive an a priori bound for ρ ε in L 2 (R 2 ). We use the estimate (4.6) for p = 2 and bound R 2 ρ 3 ε,k dx using the Gagliardo-Nirenberg inequality (see for instance [49], [74]) as follows: Then by (4.6) and interpolation of the Hence, choosing k large enough, recalling m > 1 and estimate (4.8), we can conclude by integrating in time that ρ ε (t, ·) 2 L 2 (R 2 ) ≤ C(1 + t) for some constant C = C(M, ρ 0 L ∞ (R 2 ) ), which implies the stated energy inequality.
In order to obtain a priori bounds and in particular the equi-integrability property, we need to bound the energy functional also from below. The difference to the corresponding energy functional for the original model (4.1) lies only in the regularized interaction kernel. Since clearly for all x ∈ R 2 we have log(|x| 2 + ε 2 ) ≥ 2log|x|, we obtain Following [23] we can estimate further using the logarithmic Hardy-Littlewood-Sobolev inequality where C(M ) is a constant depending on the mass M and Now it is easy to verify there is a constant κ = κ(m, M ) > 1 for which We therefore find from (4.9), (4.10) and (4.11) that ) being a constant independent of t. Since Θ + is superlinear at infinity, we obtain the equi-integrability as in Theorem 5.3 in [23].
The equi-integrability from Proposition 4.3 allows us to close the estimate (4.6) analogously to Lemma 3.1 of [23], leading to a bound for ρ_ε in L^∞(0, T; L^p(R^2)). Moreover, using Moser's iterative methods of Lemma 3.2 in [23] we finally get a bound for ρ_ε in L^∞(0, T; L^∞(R^2)). In order to avoid mass loss at infinity, typically the boundedness of the second moment of the solution is employed. Here, however, we demonstrate that the bound of the log-moment provides sufficient compactness, with the advantage of fewer restrictions on the initial data. We therefore denote for the regularization The following lemma is now obtained following the ideas of [23]: where the constant C depends on the initial data.
Proof. Computing formally the evolution of the log-moment in (4.4) in a similar fashion to [26], we find for the test function φ(x) = log(1 + |x| 2 ) after integrating by parts Computing the derivatives of φ we see We thus obtain d dt Integration in time and making use of the energy -energy dissipation inequality (4.9) and the uniform bound on E ε from below in (4.11) gives The argument can easily be made rigorous by using compactly supported approximations of φ on R 2 as test functions, see e.g. also [13]. The proof is concluded by referring to Lemma 3.2 in [23] for the proof of uniform boundedness of ρ ε .
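For reference, the derivatives of the test function φ(x) = log(1 + |x|^2) on R^2 can be computed explicitly. This is a direct calculation; it shows that both ∇φ and ∆φ are bounded, which is what keeps the growth of the log-moment under control.

```latex
% Derivatives of \varphi(x) = \log(1+|x|^2) on \mathbb{R}^2.
\[
  \nabla\varphi(x) = \frac{2x}{1+|x|^2},
  \qquad
  |\nabla\varphi(x)| = \frac{2|x|}{1+|x|^2} \le 1 ,
\]
\[
  \Delta\varphi(x)
  = \frac{4}{1+|x|^2} - \frac{4|x|^2}{(1+|x|^2)^2}
  = \frac{4}{(1+|x|^2)^2} \in (0,4] .
\]
```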
(i) The fact that the uniform bound of ρ_ε grows linearly with time originates from the term of order ε in the energy inequality for the regularized equation. Hence the bound on the energy, and therefore the modulus of equi-continuity for the regularized problem, depend on time. However, for the limiting equation (4.1) this term vanishes and the energy is decaying for all times, which allows us to deduce uniform boundedness of the solution to (4.1) globally in time and space, see also [23, Lemma 5.7].
(ii) The log-moment of ρ ε grows at most linearly in time. The same statement is true for the limiting function. Hence it is only possible to guarantee confinement of mass for finite times. This property allowing for compactness results will in the following be used to pass to the limit in the regularized problem. Due to the growth of the bound with time it cannot be employed for the long-time behavior. Hence different methods will be required.
4.1.2. The limit ε → 0. In order to deduce the global well-posedness of the Cauchy problem for (4.1) it remains to carry out the limit ε → 0. Knowing that the solution remains uniformly bounded and having the bounds from the energy inequality, we obtain weak convergence properties of the solution. In order to pass to the limit with the nonlinearities and in the entropy inequality, strong convergence results will be required. The following lemma summarizes the uniform bounds we obtain from Proposition 4.3 and Lemma 4.5: Lemma 4.7. Let ρ ε be the solution as in Proposition 4.3, then we obtain the following uniform in ε bounds where C depends on m, q, ρ 0 and T .
Proof. The uniform bounds of the L 1 log (R 2 )-and L ∞ (R 2 )-norms follow from the conservation of mass and Lemma 4.5. The convolution term can be estimated as follows: The bound of √ ρ ε ∇N ε * ρ ε in L 2 ((0, T ) × R 2 ) follows now easily by using the conservation of mass.
The basic L 2 -estimate corresponding to (4.5) for p = 2 and k = 0 implies after integration in time Using the above a priori estimates we can further bound employing the inequality in (4.7) Since m > 1, the conservation of mass and the uniform boundedness of ρ ε give ρ m−1/2 ε in L 2 ((0, T )× R 2 ). For the gradient we now use the bound on the entropy dissipation (4.9) It thus now remains to derive the estimate for the time derivative. Using the previous estimates we have for any test function φ ∈ L 2 (0, T ; H 1 (R 2 )), We now use these bounds to derive weak convergence properties. The Dubinskii Lemma (see Lemma 4.23 in the Appendix) can be applied to obtain the strong convergence locally in space, which can be extended to global strong convergence using the boundedness of the log-moment.
The boundedness of the log-moment N (t) allows to extend the strong convergence to the whole space, since for any 1 ≤ q < ∞ we have as R → ∞. Due to the weak lower semi-continuity of the L q -norm we can now conclude with (4.18) that also Hence we can extend the strong convergence locally in space to strong convergence in R 2 : for any 2m ≤ r < ∞ .
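The way a log-moment bound controls the tails can be written out as a Chebyshev-type estimate. The following is a sketch for a generic nonnegative bounded density ρ(t, ·) with finite log-moment; it is meant to illustrate the mechanism rather than to reproduce the omitted displays.

```latex
% On \{|x| > R\} we have \log(1+|x|^2) \ge \log(1+R^2), hence
\[
  \int_{|x|>R} \rho(t,x)\,dx
  \le \frac{1}{\log(1+R^2)} \int_{\mathbb{R}^2} \log(1+|x|^2)\,\rho(t,x)\,dx ,
\]
% and, for 1 \le q < \infty, using the L^\infty bound,
\[
  \int_{|x|>R} \rho(t,x)^q\,dx
  \le \|\rho(t)\|_{L^\infty(\mathbb{R}^2)}^{\,q-1} \int_{|x|>R} \rho(t,x)\,dx
  \;\longrightarrow\; 0 \quad (R \to \infty),
\]
% uniformly for t \in [0,T] whenever the log-moment and the L^\infty norm
% are bounded on [0,T].
```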
Additionally the strong convergence in L 1 ((0, T ) × R 2 ) can be deduced using the bound from the energy as stated in Lemma 4.22 in the Appendix. Interpolation now yields (4.14).
The weak convergence of ρ m−1/2 ε in L 2 (0, T ; H 1 (R 2 )) holds due to its uniform boundedness given by inequality (4.13) and the reflexivity of the latter space, where the limit is identified arguing by the density of spaces. Due to the uniform boundedness of ρ ε this assertion can be extended to any finite power bigger than m − 1/2.

Since moreover √ρ_ε is uniformly bounded in L^2((0, T) × R^2), we have the weak convergence towards √ρ in L^2((0, T) × R^2), where again the limit is identified by using the a.e. convergence of ρ_ε from the strong convergence above. To see (4.16) we rewrite The first integral vanishes and the second one converges to 0 due to the weak convergence of Finally the convergence in (4.17) is a direct consequence of the bound ‖√ε ∇ρ_ε‖_{L^2((0,T)×R^2)} ≤ C in Lemma 4.7.
These convergence results from Lemma 4.8 are sufficient to obtain the weak convergence of the nonlinearities √ ρ ε ∇h ε [ρ ε ] and ρ ε ∇h ε [ρ ε ] in L 2 ((0, T ) × R 2 ), which allow to pass to the limit in the weak formulation and to deduce the weak lower semicontinuity of the entropy dissipation term: Lemma 4.9. Let ρ ε and ρ be as in Lemma 4.8. Then Proof. Due to (4.15) and (4.16) it remains to verify Due to Lemma 4.7, we have the weak convergence of √ ρ ε ∇N ε * ρ ε in L 2 ((0, T ) × R 2 ; R 2 ). In order to identify the limit we consider for a φ ∈ L 2 ((0, T ) × R 2 ; R 2 ): The first term converges to zero using (4.16), since by (4.12) it is bounded by For the second term we first use the Cauchy-Schwarz inequality To see that this convolution term vanishes we bound further uniformly in x, t, where we substituted s = |x − y|/ε. For the remaining term in (4.21) we proceed changing the order of integration, where we again skip the dependence of ρ ε and φ on t in the following: To prove that this integral vanishes in the limit, due to (4.16) it suffices to show that We shall therefore split the integral into two parts and consider first It remains to bound the integral for |x − y| > 1: Proof of Theorem 4.1. The convergence property of the nonlinearity in (4.20) and the weak convergence of the time derivative due to Lemma 4.7 allow to pass to the limit in the weak formulation of the Cauchy problem for (4.1), where the linear diffusion term vanishes due to (4.17). The uniqueness of the solution is implied from Theorem 1.3 and Corollary 6.1 of [32], where we shall not go further into detail here.
It thus remains to pass to the limit in the energy inequality. Since the energy dissipation is weakly lower semicontinuous due to (4.19), we get In order to obtain the energy inequality (4.3) in the limit ε → 0 it thus remains to show E ε [ρ ε ](t) → E[ρ](t) for t ∈ [0, T ]. Lemma 4.22 and the uniform bounds on ρ ε in Lemma 4.7 directly imply the strong convergence of ρ ε in L ∞ (0, T ; L m (R 2 )). It is therefore left to prove the convergence for the convolution term and we rewrite −4π We split the domain of integration and first analyze the case |x − y| ≥ 1. In this domain, we get and thus it converges to zero as ε → 0. Using the Cauchy-Schwarz inequality, we obtain moreover We now turn to the integration domain |x − y| < 1, where by dominated convergence This proves the convergence of the entropy, which together with the weak lower semicontinuity of the entropy-dissipation leads to the desired energy-energy dissipation inequality (4.3) for the limiting solution ρ.

4.2. Long-Time Behavior of Solutions. Our main result of Section 2, together with the uniqueness argument for radial stationary solutions to (4.1) of [61], the characterization of global minimizers in [28] and Corollary 3.11, leads to the following result: Theorem 4.10. There exists a unique stationary state ρ_M of (4.1) with mass M and zero center of mass in the sense of Definition 2.1 with the property ρ_M ∈ L^1_log(R^2). Moreover, ρ_M is compactly supported, bounded, radially symmetric and non-increasing. Furthermore, the unique stationary state is characterized as the unique global minimizer of the free energy functional (4.2) with mass M.
As a consequence, all stationary states of (4.1) in the sense of Definition 2.1 with mass M are given by translations of the given profile ρ M : For two stationary states ρ M1 and ρ M2 with masses M 1 > M 2 the following relations hold: (a) If m > 2, then ρ M1 has a bigger support and a bigger height than ρ M2 .
(b) If m = 2, then all stationary states have the same support.
We will now study the long time asymptotics for the global weak solutions ρ of (4.1), which according to the entropy inequality in Theorem 4.1 satisfy Since the entropy is bounded from below, this implies for the entropy dissipation Let us therefore now consider the sequence for which we obtain The proof of convergence towards the steady state will be based on weak lower semicontinuity of the entropy dissipation. Assume that ρ_k ⇀ ρ in L^∞(0, T; L^1(R^2) ∩ L^m(R^2)); then we have to derive Since the L^2-norm is weakly lower semicontinuous, it therefore remains to show similarly as in From there it can be deduced that ρ is the stationary state ρ_M with M = ‖ρ_0‖_{L^1(R^2)} by the uniqueness Theorem 4.10, if we can guarantee that no mass gets lost in the limit.
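The step from the energy inequality to vanishing dissipation along time shifts is a standard argument, sketched below. Here D denotes the entropy dissipation and E_* a lower bound for the energy; the shifted sequence ρ_k(t, x) := ρ(t + t_k, x) with t_k → ∞ is the natural reading of the omitted definition and should be taken as an assumption.

```latex
% Energy inequality plus lower bound E_* gives integrable dissipation:
\[
  \int_0^{t} D[\rho(s)]\,ds \le E[\rho_0] - E[\rho(t)] \le E[\rho_0] - E_*
  \quad\Longrightarrow\quad
  \int_0^{\infty} D[\rho(s)]\,ds < \infty .
\]
% ASSUMPTION: \rho_k(t,x) := \rho(t+t_k,x) with t_k \to \infty. Then, for fixed T > 0,
\[
  \int_0^{T} D[\rho_k(t)]\,dt
  = \int_{t_k}^{t_k+T} D[\rho(s)]\,ds
  \;\longrightarrow\; 0 \quad (k \to \infty),
\]
% as the tail of a convergent integral.
```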
The main difficulty for passing to the limit in the long-time behavior lies in obtaining sufficient compactness avoiding the loss of mass at infinity. Even though the mass of ρ(t, ·) is conserved for all time, if a positive amount of mass escapes to infinity, then a subsequence of ρ(t, ·) may weakly converge to a stationary solution with mass strictly less than M. To rule out this scenario, we need to show that the family {ρ(t, ·)}_{t>0} is tight, which can be done by obtaining uniform-in-time bounds for certain moments of ρ(t, ·). So far we only have a time-dependent bound on the logarithmic moment in Theorem 4.1, which is not enough. Moreover, even if we know that {ρ(t, ·)}_{t>0} is tight, if we want to choose the right limiting profile among all stationary states in S, we need to show the conservation of some symmetry. In fact, it is easy to check that the center of mass should formally be preserved by the evolution due to the antisymmetry of the gradient of the Newtonian potential. But to rigorously justify this, we need to work with moments of order higher than one, so that the center of mass is well defined.
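The formal conservation of the center of mass can be verified in one line. The computation below assumes that (4.1) has the form ∂_t ρ = ∆ρ^m − ∇ · (ρ∇(N * ρ)) with N(x) = −(1/(2π)) log|x|; this explicit form is an assumption, since the equation is not displayed here, and all integrations by parts are formal.

```latex
% ASSUMPTION: (4.1) is \partial_t\rho = \Delta\rho^m - \nabla\cdot(\rho\,\nabla(N*\rho)),
% with \nabla N(x) = -\frac{x}{2\pi|x|^2}, an odd function of x.
\[
  \frac{d}{dt}\int_{\mathbb{R}^2} x\,\rho\,dx
  = \int_{\mathbb{R}^2} x\,\Delta\rho^m\,dx
    + \int_{\mathbb{R}^2} \rho\,\nabla(N*\rho)\,dx
  = 0 + \iint_{\mathbb{R}^2\times\mathbb{R}^2} \nabla N(x-y)\,\rho(x)\,\rho(y)\,dx\,dy
  = 0 .
\]
% First term: integrate by parts twice, \Delta x_i = 0.
% Second term: swapping x and y changes the sign of \nabla N(x-y) but not of
% \rho(x)\rho(y), so the double integral equals its own negative.
```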
Below we state the main theorem in this section, where a key argument is to establish a uniform-in-time bound on the second moment of ρ(t, ·), if ρ_0 has a finite second moment.
Our aim is to show that the second moment of solutions to (4.1) is uniformly bounded in time for all t ≥ 0. This in turn shows easily that the first moment is preserved in time for all t ≥ 0, as we will prove below. Recall that by (2.15) we denote by M 2 [f ] the second moment of f ∈ L 1 + (R d ). We first derive rigorously the evolution of the second moment in time: starting from the regularized system (4.4). Computing the second moment of the regularized problem, we obtain d dt (4.23) The strong convergence in (4.14) allows to pass to the limit ε → 0 in the first integral of (4.23) and for the remainder term we moreover have due to the conservation of mass and the uniform boundedness of ρ ε The argument can easily be made rigorous by using compactly supported approximations of |x| 2 on R 2 as test functions, see e.g. also [13]. We finally obtain (4.22) by integrating in time.
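The formal evolution of the second moment for the limiting equation can be computed explicitly. As a sketch, assume again that (4.1) reads ∂_t ρ = ∆ρ^m − ∇ · (ρ∇(N * ρ)) with N(x) = −(1/(2π)) log|x| and total mass M; the resulting identity is the formal differential counterpart of what an integrated-in-time statement would express, and whether it matches (4.22) term by term is not something the text here confirms.

```latex
% ASSUMPTION: same form of (4.1) as above; d = 2, so \Delta |x|^2 = 4.
\[
  \frac{d}{dt}\int_{\mathbb{R}^2} |x|^2 \rho\,dx
  = \int_{\mathbb{R}^2} \Delta(|x|^2)\,\rho^m\,dx
    + 2\int_{\mathbb{R}^2} x\cdot\nabla(N*\rho)\,\rho\,dx .
\]
% Symmetrizing the interaction term in (x,y):
\[
  2\iint x\cdot\nabla N(x-y)\,\rho(x)\rho(y)\,dx\,dy
  = \iint (x-y)\cdot\nabla N(x-y)\,\rho(x)\rho(y)\,dx\,dy
  = -\frac{M^2}{2\pi},
\]
% since (x-y)\cdot\nabla N(x-y) = -\frac{1}{2\pi}. Altogether,
\[
  \frac{d}{dt}\int_{\mathbb{R}^2} |x|^2 \rho\,dx
  = 4\int_{\mathbb{R}^2} \rho^m\,dx - \frac{M^2}{2\pi} .
\]
```

For m = 1 the right hand side reduces to 4M(1 − M/(8π)), the classical virial identity for the two-dimensional Keller-Segel system.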
Now, we want to compare general solutions to (4.1) with its radial solutions. In order to do this we will make use of the concept of mass concentration, which has been recalled in (2.4), and used for instance in [61,44] for classical applications to Keller-Segel type models.
Following exactly the same proof as in [61], the following two results hold for the solutions of (4.1). The first result says that for two radial solutions, if one is initially "more concentrated" than the other one, then this property is preserved for all time. The second result compares a general (possibly non-radial) solution ρ(t, ·) with another solution µ(t, ·) with initial data ρ # (0, ·), i.e., the decreasing rearrangement of the initial data for ρ(t, ·), and it says that the symmetric rearrangement of ρ(t, ·) is always "less concentrated" than the radial solution µ(t, ·). This result generalizes the results from [44] to nonlinear diffusion with totally different proofs. We also refer the interested reader to the survey [86] for a general exposition of the mass concentration comparison results for local nonlinear parabolic equations and to the recent developments obtained in [88], [89] in the context of nonlinear parabolic equations with fractional diffusion. Proposition 4.13. Let m > 1 and f, g be two radially symmetric solutions to (4.1) with f (0, ·) ≺ g(0, ·). Then we have f (t, ·) ≺ g(t, ·) for all t > 0. Proposition 4.14. Let m > 1 and ρ be a solution to (4.1), and let µ be a solution to (4.1) with initial condition µ(0, ·) = ρ # (0, ·). Then we have that µ(t, ·) remains radially symmetric for all t ≥ 0, and in addition we have ρ # (t, ·) ≺ µ(t, ·) for all t ≥ 0. Now we are ready to bound the second moment of solutions in the two-dimensional case: we will show that if ρ(t, ·) is a solution to (4.1) with M 2 [ρ 0 ] finite, then M 2 [ρ(t)] must be uniformly bounded for all time.
It then follows from (2.13) and Lemma 2.5 that

Since µ(t, ·) is also a solution to (4.1), (4.25) also holds when ρ is replaced by µ. Combining this fact with (4.24), we thus have

Remark 4.16. The last result, showing uniform-in-time bounds for the second moment for finite m > 1, is also interesting in comparison to the results in [42,43], where the limit m → ∞ of the gradient flow is analysed. In the "m = ∞" case, the second moment of any solution is actually decreasing in time, leading to the result that all solutions converge towards the global minimizer with some explicit rate. As mentioned in the introduction, a result of this sort for any potential other than the attractive logarithmic potential is lacking.
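The way the comparison result enters the second moment bound can be sketched as follows. This is a plausible outline under stated assumptions, not a reproduction of the displays (4.24) and (4.25): it assumes the formal identity d/dt M 2 [ρ] = 4 ∫ ρ m dx − M 2 /(2π) for solutions of mass M , and the standard fact that f ≺ g with equal mass implies ∫ Φ(f ) dx ≤ ∫ Φ(g) dx for convex Φ with Φ(0) = 0.

```latex
% Sketch under the assumptions named in the lead-in; \mu is the radial solution
% with \mu(0,\cdot) = \rho^{\#}(0,\cdot), and both solutions have mass M.
\begin{align*}
\int_{\mathbb{R}^2}\rho(t,x)^m\,dx
 &= \int_{\mathbb{R}^2}\big(\rho^{\#}(t,x)\big)^m\,dx
 \;\le\; \int_{\mathbb{R}^2}\mu(t,x)^m\,dx
 && \text{since } \rho^{\#}(t,\cdot)\prec\mu(t,\cdot), \\
\frac{d}{dt}\Big(M_2[\rho(t)] - M_2[\mu(t)]\Big)
 &= 4\left(\int_{\mathbb{R}^2}\rho^m\,dx - \int_{\mathbb{R}^2}\mu^m\,dx\right) \;\le\; 0, \\
M_2[\rho(t)]
 &\le M_2[\rho_0] - M_2[\mu(0)] + M_2[\mu(t)]
 && \text{for all } t\ge 0 .
\end{align*}
```

Under these assumptions the question is reduced to bounding the second moment of the radial solution µ(t, ·) uniformly in time.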
As already mentioned above, a key ingredient in the proof of Theorem 4.12 is the confinement of mass, which is now obtained as follows:

Lemma 4.17. Let ρ be a global weak solution as in Theorem 4.1 with mass M and initial data ρ 0 ∈ L 1 ((1 + |x| 2 )dx) ∩ L ∞ (R 2 ), and consider as above the sequence {ρ k } k∈N = {ρ(· + t k , ·)} k∈N in (0, T ) × R 2 . Then there exist a ρ ∈ L 1 ((0, T ) × R 2 ) ∩ L m ((0, T ) × R 2 ) and a subsequence, which we denote with the same index without loss of generality, such that: as k → ∞.
Proof. The bounds are obtained from the energy-energy dissipation inequality (4.3) in an analogous way to the ones given in Lemma 4.7 with the only difference concerning the replacement of N ε by N , which however makes no difference in the estimate (4.12).
Using these estimates the following convergence properties can be derived in an analogous way to the proof of Lemma 4.8.
The convergence results from Lemma 4.17 and Lemma 4.19 are sufficient to obtain the weak convergence of the nonlinearities √ ρ k ∇h[ρ k ] and ρ k ∇h[ρ k ] in L 2 ((0, T ) × R 2 ), which allows us to deduce the weak lower semicontinuity of the entropy dissipation term and to pass to the limit in the weak formulation of (4.1) in the same way as in the proof of Lemma 4.9.
Lemma 4.20. Let ρ k and ρ be as in Lemma 4.19.
This enables us to close the proof of convergence towards the set of stationary states.
Due to the convergence properties in Lemma 4.19, the uniform bound on the second moment (4.27) together with Lemma 4.22 in the Appendix, we can deduce that ρ ∈ L 1 ((1 + |x| 2 )dx) and that ρ k → ρ in L ∞ (0, T ; L 1 (R 2 )). In particular, ρ has mass M .
Combining all the properties of ρ just proved with the fact that ∇ρ m ∈ L 2 (R 2 ) due to Lemma 4.19, we infer that ρ corresponds to a steady state of equation (4.1) in the sense of Definition 2.1. The uniqueness up to translation of stationary states in Theorem 4.10 shows that ρ is a translation of ρ M , and thus ρ ∈ S. In fact, we have shown that the limit of every convergent sequence {ρ k } k∈N must be a translation of ρ M . This in turn shows that the set of accumulation points of any time-diverging sequence is contained in S.
Finally, in order to identify the limit uniquely, we take advantage of the translational invariance. We first remark that the center of mass of the initial data is preserved for all time due to the antisymmetry of ∇N . Due to Proposition 4.15, all time-diverging sequences have uniformly bounded second moments; thus, since ρ is an accumulation point of a sequence ρ k , by Lemma 4.22 we have

Hence all accumulation points of the sequences have the same center of mass as the initial data. Then, all possible limits reduce to the translation of ρ M to the initial center of mass, as desired.

Using additionally the confinement of mass from the bound on the log-moment, we obtain → 0 as R → ∞.
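The preservation of the center of mass invoked in the identification of the limit can be checked by a short formal computation. The sketch assumes that (4.1) reads ∂ t ρ = ∆ρ m + ∇ · (ρ∇N * ρ) on R 2 and that ρ decays fast enough to integrate by parts; the form of the equation is an assumption of the sketch.

```latex
% Formal computation only; uses \nabla N(-z) = -\nabla N(z).
\begin{align*}
\frac{d}{dt}\int_{\mathbb{R}^2} x\,\rho\,dx
 &= \int_{\mathbb{R}^2} x\,\Delta\rho^m\,dx
  + \int_{\mathbb{R}^2} x\,\nabla\cdot\big(\rho\,\nabla N*\rho\big)\,dx \\
 &= 0 - \iint_{\mathbb{R}^2\times\mathbb{R}^2} \nabla N(x-y)\,\rho(x)\rho(y)\,dx\,dy \\
 &= -\frac12\iint_{\mathbb{R}^2\times\mathbb{R}^2}
    \big(\nabla N(x-y) + \nabla N(y-x)\big)\,\rho(x)\rho(y)\,dx\,dy \;=\; 0 .
\end{align*}
```

The diffusion term vanishes because each component of x is harmonic, and the interaction term vanishes after exchanging x and y, which is exactly where the antisymmetry of ∇N is used.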
For the proof of the following Dubinskii Lemma we refer to [30] or Theorem 12.1 in [66]: Lemma 4.23.