1 Introduction

The theory of fluid mixing has become an active area of research in the applied mathematics community in recent years. Mixing refers to the homogenization of a heterogeneously distributed physical quantity and can be driven by diffusion or by advection by a straining fluid flow. These two transport mechanisms act differently on the mixture. Diffusion balances local differences in concentration and thereby reduces the concentration intensity, whereas advection creates finer and finer filaments and thus acts on the scale of fluctuations. The creation of small scales in turn amplifies the effect of diffusion in the fluid, a phenomenon usually referred to as enhanced dissipation.

In order to describe these phenomena more accurately, we introduce the underlying model equation. Flow-enhanced mixing processes of a diffusive medium in an incompressible fluid can be described by the advection-diffusion equation

$$\begin{aligned} \partial _t \theta ^{\kappa }+ u\cdot \nabla \theta ^{\kappa }= \kappa \Delta \theta ^{\kappa }, \end{aligned}$$
(1)

which we consider, for simplicity, in the periodic box \(\mathbbm {T}^d = [0,1]^d\). Here \(\theta ^{\kappa }\) is the physical quantity of interest and u is the divergence-free velocity of the fluid. Since (1) is conservative, it is enough to consider the case in which \(\theta ^{\kappa }\) has zero mean. The constant \(\kappa \) is the diffusivity and can be interpreted as the inverse of the Péclet number in the non-dimensionalized setting. We shall always suppose that the diffusivity is small but positive, \(\kappa \ll 1\).

The above equation is linear because the observed quantity is assumed to have no feedback on the fluid flow itself (think, for instance, of dye in water). Such quantities are often referred to in the literature as passive scalars.

It is a well-known fact that the advection field has no impact on the \(L^2\) energy balance law,

$$\begin{aligned} \frac{1}{2} \frac{\textrm{d}}{\textrm{d}t}\Vert \theta ^{\kappa }\Vert ^2_{L^2} + \kappa \Vert \nabla \theta ^{\kappa }\Vert _{L^2}^2 = 0. \end{aligned}$$
(2)

By applying the standard Poincaré inequality \(\Vert \theta \Vert _{L^2}^2 \le C \Vert \nabla \theta \Vert _{L^2}^2\) for mean-free functions, we deduce a first estimate on the dissipation rate,

$$\begin{aligned} \Vert \theta ^{\kappa }(t)\Vert _{L^2} \le e^{-C\kappa t} \Vert \theta _0\Vert _{L^2}, \end{aligned}$$
(3)

where \(\theta _0\) is the initial configuration. This estimate is independent of the particular choice of the divergence-free velocity field u and is optimal for the purely diffusive heat equation. However, in situations in which the fluid motion creates fine filaments, the concentration gradients have to increase and, in view of the balance law (2), we expect that the energy dissipates at a much higher rate. To be more specific and to fix terminology, we speak of dissipation enhancement if there exists a constant \(D(\kappa )\gg \kappa \) such that

$$\begin{aligned} \Vert \theta ^{\kappa }(t)\Vert _{L^2} \le e^{-D(\kappa ) t} \Vert \theta _0\Vert _{L^2} \end{aligned}$$
(4)

for every choice of the initial datum.
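For completeness, the short computation leading from (2) to (3) can be sketched as follows. The symbol \(C_P\) for the Poincaré constant is notation introduced only for this remark; the constant C in (3) then equals \(1/C_P\).

$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}t}\Vert \theta ^{\kappa }\Vert _{L^2}^2 = -2\kappa \Vert \nabla \theta ^{\kappa }\Vert _{L^2}^2 \le -\frac{2\kappa }{C_P}\Vert \theta ^{\kappa }\Vert _{L^2}^2 \quad \Longrightarrow \quad \Vert \theta ^{\kappa }(t)\Vert _{L^2}\le e^{-\kappa t/C_P}\Vert \theta _0\Vert _{L^2}. \end{aligned}$$

On the unit torus, the optimal Poincaré constant for mean-free functions is \(C_P=(2\pi )^{-2}\), the inverse of the smallest nonzero eigenvalue of \(-\Delta \). For \(u=0\) and a single Fourier mode of lowest frequency, the inequality above becomes an equality, which illustrates why (3) cannot be improved for the heat equation.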

First qualitative evidence of such enhanced dissipation effects was provided in [10], where a sharp characterization of (steady) incompressible flows that are dissipation enhancing is established. Quantitative results with precise exponential decay rates have been obtained only very recently. Particularly well understood is the effect of enhanced dissipation in the class of shear flows. Here, the dissipation rate increases to \(D(\kappa ) \sim \kappa ^{\gamma }\) for some \(\gamma \in (0,1)\) as observed, for instance, in [6, 9, 11, 13, 14, 34]. The results in these works rely on the particularly simple structure of shear velocity fields and there is no (obvious) way of translating the techniques developed there to chaotic or turbulent fluid motions. The only rigorous result known to the author in which enhanced dissipation could be established in a more complex setting is for randomly forced (and, in fact, chaotic [3]) fluid systems, which include stochastic Stokes and 2D Navier–Stokes equations [4]: It features a dissipation rate that depends logarithmically on the diffusivity constant, \(D(\kappa ) \sim \log ^{-1}\frac{1}{\kappa }\). The velocity fields constructed in this work are quite regular as they satisfy stochastic analogues of the bound

$$\begin{aligned} \int _0^t \Vert \nabla u\Vert _{L^2}\, \textrm{d}t\lesssim 1+t. \end{aligned}$$
(5)

In the present work, we make an attempt to derive bounds on the maximal rates of enhanced dissipation that apply to a large class of velocity fields and require only assumptions on their regularity but not on their structure. For instance, we will show that for any flow in the regularity class (5), the rate of enhanced dissipation is bounded as \(D(\kappa ) \lesssim \log ^{-1}\frac{1}{\kappa }\), and thus, if we neglect the stochastic origin for a moment, our findings prove that the estimates in [4] are optimal. Conversely, as the construction in the latter work saturates our bound, the estimate \(D(\kappa ) \lesssim \log ^{-1}\frac{1}{\kappa }\) derived here has to be sharp.

The rates of enhanced dissipation are intimately related to the rates of mixing (i.e., the decay of scales) in the non-diffusive setting, usually measured in terms of negative Sobolev norms [22, 24, 32]. A first rigorous connection between both phenomena was established in [12, 18]. In [12], the authors obtain a rate \(D(\kappa ) \sim \log ^{-2}\frac{1}{\kappa }\), provided that there exists a Lipschitz regular velocity field that mixes (for \(\kappa =0\)) any initial datum at an exponential rate. Our result thus shows that this rate is optimal, possibly up to a power of the logarithm. Exponential rates are optimal in the non-diffusive setting for velocity fields in the class (5), see [1, 5, 15, 16, 17, 19, 21, 23, 29, 35].

We remark that proving upper bounds on the rates of enhanced dissipation is quite different from proving lower bounds on the \(L^2\) decay. Indeed, the true counterpart of (4) would be a lower bound of the form

$$\begin{aligned} \Vert \theta ^{\kappa }(t)\Vert _{L^2} \ge e^{-D(\kappa )t} \Vert \theta _0\Vert _{L^2}, \end{aligned}$$

which is a long-standing open problem (and is, in fact, controversial). Moreover, it is clear that bounds of this type cannot be true without additional assumptions on the regularity of the initial datum since they are false for the heat equation. Of course, the same problem applies to the question that we address here.

In this regard, the progress made in the present paper is certainly modest. Yet, our contribution is substantial as it establishes limitations on the effect of enhanced dissipation for the first time. Moreover, upper bounds on the dissipation rate \(D(\kappa )\) single out a time scale that is characteristic for the diffusion in the presence of an irregular flow field.

We shall now state and discuss our precise result.

Theorem 1

Suppose that u is a divergence-free velocity field satisfying

$$\begin{aligned} \int _0^t \Vert \nabla u\Vert _{L^p}\, \textrm{d}t\lesssim 1 + t^{\alpha }, \end{aligned}$$
(6)

for some \(p\in (1,\infty ]\) and \(\alpha \in [0,1]\). Let \(q\in [1,\infty ]\) be such that \(\frac{1}{p}+\frac{1}{q}\le 1\), and suppose \(\theta _0\) is a mean-zero initial configuration satisfying \(\Vert \theta _0\Vert _{L^1}\sim \Vert \theta _0\Vert _{L^q}\sim 1\) and \(\Vert \nabla \theta _0\Vert _{L^1}\lesssim 1\). If there are positive constants D and \(\Lambda \) such that

$$\begin{aligned} \Vert \theta ^{\kappa }(t)\Vert _{L^q} \le \Lambda e^{-Dt}\Vert \theta _0\Vert _{L^q} , \end{aligned}$$
(7)

for any \(t>0\), then there exists a constant \(C>1\) independent of \(D,\Lambda \) and \(\kappa \) such that

$$\begin{aligned} D\lesssim {\left\{ \begin{array}{ll}\Lambda ^{\frac{1}{\alpha }}\log ^{-\frac{1}{\alpha }}\frac{1}{\kappa }&{} \text{ if } \alpha >0,\\ e^{C\Lambda }\kappa &{}\text{ if } \alpha =0, \end{array}\right. } \end{aligned}$$
(8)

provided that \(\kappa \ll 1\).

For notational simplicity, we will focus on the case \(p=q=2\) in the following discussion of Theorem 1. We first note that the velocity field in (5) corresponds to the case \(\alpha =1\) in (6) and generalizes the important class of velocity fields with a fixed power budget \(\Vert \nabla u\Vert _{L^2} \le 1\). In industrial stirring processes, this quantity describes the amount of work an agent spends per unit time to maintain stirring. We have included the general factor \(\Lambda \) in (7) to gain some degree of freedom. For instance, if we consider enhanced dissipation estimates in the form (4), then \(\Lambda =1\). In this case, we deduce \(D\lesssim \log ^{-1}\frac{1}{\kappa }\) from (8) as announced earlier, and this estimate is optimal in light of the results from [4]. On the other hand, when studying enhanced dissipation with diffusivity-independent rates \(D\sim 1\), our findings indicate that the prefactor \(\Lambda \) has to diverge at least logarithmically, \(\log \frac{1}{\kappa }\lesssim \Lambda \). Whether this estimate is optimal is not known to the author. We remark that the work [4] provides an estimate with a diffusivity-independent rate, \(D\sim 1\), and a diverging prefactor of size \(\Lambda \sim \frac{1}{\kappa }\). To the best of our knowledge, diffusivity-independent rates have not (yet) been observed in the deterministic setting. Apparently, they do not result from a spectral gap estimate and it is not clear to the author whether these are generic.
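To get a feeling for the scalings in (8), the following minimal Python sketch evaluates its right-hand side. It assumes that every implicit constant hidden in \(\lesssim \) equals 1, which the theorem does not claim; the function names `rate_upper_bound` and `min_prefactor_for_unit_rate` and the parameter `c` (a stand-in for the constant C) are hypothetical and serve illustration only.

```python
import math


def rate_upper_bound(alpha: float, lam: float, kappa: float, c: float = 1.0) -> float:
    """Right-hand side of (8) with all implicit constants set to 1 (an assumption).

    alpha: exponent in the budget constraint (6), in [0, 1]
    lam:   prefactor Lambda in the decay assumption (7)
    kappa: diffusivity, 0 < kappa < 1
    c:     stand-in for the unspecified constant C in the case alpha = 0
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie in (0, 1)")
    if alpha == 0.0:
        # Flow switched off after finite time: only the prefactor changes.
        return math.exp(c * lam) * kappa
    # Logarithmic regime: (Lambda / log(1/kappa))^(1/alpha).
    return (lam / math.log(1.0 / kappa)) ** (1.0 / alpha)


def min_prefactor_for_unit_rate(kappa: float) -> float:
    """Smallest Lambda compatible with D = 1 in (8) for alpha > 0, constants set to 1."""
    return math.log(1.0 / kappa)


if __name__ == "__main__":
    for kappa in (1e-2, 1e-4, 1e-8):
        print(kappa, rate_upper_bound(1.0, 1.0, kappa), rate_upper_bound(0.5, 1.0, kappa))
```

With \(\Lambda =1\), the printed values decay only logarithmically in \(\kappa \) for \(\alpha =1\) and like the square of that quantity for \(\alpha =1/2\), in line with the discussion above.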

In a certain sense, the two different considerations \(\Lambda =1\) and \(D\sim 1\) correspond to global-in-time and large-time estimates, respectively. Indeed, setting \(t=0\) in (7) and considering (3), it is obvious that \(\Lambda =1\) applies at small times and thus this choice in (7) gives a globally-in-time applicable dissipation rate. To understand why \(D\sim 1\) is expected for the large-time dynamics, it is worthwhile to give a short but oversimplified heuristic discussion of flow-induced mixing in diffusive media. If the initial configuration is sufficiently smooth, early stage mixing is essentially due to advection, that is, the reduction of the average scale. In this stage, the decay of the \(L^2\) norm is rather slow. Diffusion becomes relevant only at later times, when the typical scale is reduced to the so-called Batchelor scale [2], at which diffusion and advection balance. Dimensional arguments suggest that this scale is of the order \(\sqrt{\kappa }\), independently of the precise value of \(\alpha >0\) in our assumption (6). The later time dynamics are then governed by the diffusion process, with an exponential rate D proportional to \(\kappa |k|^2\), where k is the dominant wavenumber, which should be inverse to the Batchelor scale. Hence, \(D\sim \kappa |k|^2 \sim 1\) is the expected dissipation rate for large times. It was observed numerically in [25]. The prefactor \(\Lambda \) in this case is difficult to predict because it depends sensitively on the precise rate of early stage mixing and on the crossover time (which is decreasing with \(\alpha \)) from advection-dominated to diffusion-dominated mixing. Our estimates in (8) provide first rigorous bounds on that number.

The cases \(\alpha \in (0,1)\) correspond to situations in which the mixing efficiency of the flow is decaying in time, which is relevant, for instance, in the case of the Navier–Stokes equations without forcing. Here, the effect of viscous dissipation slows down the velocity field and the bound (6) holds true with \(\alpha =1/2\). A more academic application is a mixing process with a given, but decaying, power budget. Our estimates suggest that the dissipation rate D in (4) gets smaller when the advection slows down. This is certainly expected, and our bound is again consistent with the findings in [12] modulo a factor of 2 in the exponent. In case the mixing flow becomes negligible in finite time, \(\alpha =0\), we cannot expect more than a change in the prefactor compared to the purely diffusive bound (3), and thus, in this case (8) is sharp. If one aims for dissipation rates that are uniform in \(\kappa \), our estimates imply that \(\Lambda \gtrsim \log \frac{1}{\kappa }\) for any value of \(\alpha \). This lower bound is very likely not optimal because \(\Lambda \) is supposed to be growing as a function of \(\alpha \). The author is not aware of any rigorous bounds on \(\Lambda \) apart from those presented here.

We finally remark that the bound in (8) can be chosen independently of the precise gradient bound on the initial datum as long as \(1/\kappa \) is large compared to \(\Vert \nabla \theta _0\Vert _{L^1}\), which is guaranteed in the hypothesis of the theorem.

Around the same time that a first version of the present paper was distributed by the author, Bruè and Nguyen uploaded a paper to the arXiv that contains (among other results on diffusive mixing) estimates that are similar to (but slightly weaker than) those in Theorem 1, cf. [8]. Indeed, in this work, the case \(\alpha =1\), \(p>2\) and \(\Lambda \sim 1\) is treated and D is bounded by \(\log ^{-\frac{p-1}{p}}\frac{1}{\kappa }\).

Considering the fact that estimates on enhanced dissipation have been completely open for many years, the proof of Theorem 1 is surprisingly short. It relies, however, heavily on a stability theory for continuity equations developed by the author in [30], which in turn grew out of the Crippa–De Lellis theory for Lagrangian flows [15] and its Eulerian adaptation in studies of coarsening problems [7, 26] and mixing phenomena [29]. An understanding of the role of diffusive perturbations, which is, in a certain sense, the viewpoint taken here, has been developed in the study of numerical schemes featuring numerical diffusion [27, 28]. The theory is based on suitable Kantorovich–Rubinstein distances, which have their origin in the theory of optimal mass transportation. For a review of our method, we refer to [31].

The remainder of this article is devoted to the proofs.

2 Proofs

We will make use of the following Kantorovich–Rubinstein distance with logarithmic cost function, which was introduced in [30]. For \(\delta >0\) and any mean zero function \(\theta \) on \(\mathbbm {T}^d\), we define

$$\begin{aligned} D_{\delta }(\theta ) = \inf _{\pi \in \Pi (\theta ^+,\theta ^-)} \iint \log \left( \frac{|x-y|}{\delta } +1\right) \textrm{d}\pi (x,y), \end{aligned}$$

where \(\theta ^+\) and \(\theta ^-\) denote the positive and negative parts of \(\theta \), respectively, and \(\Pi (\theta ^+,\theta ^-)\) is the set of transport plans \(\pi :\mathbbm {T}^d\times \mathbbm {T}^d\rightarrow \mathbbm {R}_+\) with marginals \(\theta ^+\) and \(\theta ^-\), i.e.,

$$\begin{aligned} \iint \varphi (x)+\psi (y)\, \textrm{d}\pi (x,y) = \int \varphi \theta ^+\,\textrm{d}x+ \int \psi \theta ^- \,\textrm{d}x, \end{aligned}$$

for all continuous functions \(\varphi \) and \(\psi \) on the torus. We remark that the Kantorovich–Rubinstein distance is finite only if \(\theta \) has zero mean, because then, both \(\theta ^+\) and \(\theta ^-\) have the same total mass,

$$\begin{aligned} \int \theta ^+\,\textrm{d}x= \int \theta ^-\,\textrm{d}x. \end{aligned}$$

Our subsequent proofs will not use many properties of Kantorovich–Rubinstein distances, as some of the key estimates, above all the following lemma, can be taken from the existing literature. Yet, we refer the interested reader to [33] for a comprehensive introduction to the theory of optimal transportation.
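To make the definition of \(D_{\delta }\) concrete, here is a minimal Python sketch for a discrete caricature that is not used in the paper: \(\theta ^+\) and \(\theta ^-\) are replaced by the same number of point masses, all of equal weight. In that situation the transport plans are rescaled doubly stochastic matrices, so by the Birkhoff–von Neumann theorem the infimum is attained at a permutation and can be found by brute force for small examples. The function names are hypothetical.

```python
import math
from itertools import permutations


def torus_distance(x, y):
    """Euclidean distance between points x, y of the periodic unit box [0,1)^d."""
    return math.sqrt(sum(min(abs(a - b), 1.0 - abs(a - b)) ** 2 for a, b in zip(x, y)))


def kr_distance_equal_atoms(sources, sinks, mass, delta):
    """Logarithmic-cost Kantorovich-Rubinstein distance between
    theta^+ = mass * sum of Dirac masses at `sources` and
    theta^- = mass * sum of Dirac masses at `sinks`.

    Assumes equally many atoms of equal mass on both sides, so that an
    optimal plan is a permutation; brute force, hence only for few atoms.
    """
    if len(sources) != len(sinks):
        raise ValueError("need the same number of atoms on both sides")

    def cost(x, y):
        return math.log(torus_distance(x, y) / delta + 1.0)

    best = min(
        sum(cost(x, sinks[j]) for x, j in zip(sources, perm))
        for perm in permutations(range(len(sinks)))
    )
    return mass * best


if __name__ == "__main__":
    plus = [(0.1,), (0.6,)]
    minus = [(0.2,), (0.9,)]
    for delta in (0.1, 0.01, 0.001):
        print(delta, kr_distance_equal_atoms(plus, minus, 0.5, delta))
```

Decreasing \(\delta \) increases the distance roughly like \(\log \frac{1}{\delta }\) times the total mass, which is the behavior that the lower bound in Lemma 2 and the crude upper bound in the proof of Theorem 1 quantify.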

The rate of change of \(D_{\delta }\) under solutions to advection-diffusion equations has been investigated in [26, 31], see also [7, 29, 30] for related estimates in the purely advective case.

Lemma 1

[26, 31]. Let \(\theta ^{\kappa }\) be a mean-zero solution to the advection-diffusion equation (1). Then \(D_{\delta }(\theta ^{\kappa })\) is absolutely continuous and it holds

$$\begin{aligned} \left| \frac{\textrm{d}}{\textrm{d}t} D_{\delta }(\theta ^{\kappa })\right| \lesssim \Vert \nabla u\Vert _{L^p} \Vert \theta ^{\kappa }\Vert _{L^{p'}} +\frac{\kappa }{\delta } \Vert \nabla \theta ^{\kappa }\Vert _{L^1}, \end{aligned}$$
(9)

where \(\frac{1}{p}+\frac{1}{p'}=1\).

Apart from its applications to mixing, which we elaborate on in the following, this estimate can be used to quantify the (weak) convergence in the vanishing diffusivity limit \(\kappa \rightarrow 0\). In this context, \(\delta \) can be interpreted as the order of convergence. This observation has been exploited in order to bound the approximation error due to numerical diffusion generated by the upwind finite volume scheme for continuity equations in [27, 28].

Our next result is a lower bound on the Kantorovich–Rubinstein distance in terms of the \(L^1\) norms of \(\theta \) and its gradient.

Lemma 2

Let \(\theta \) be a mean zero function in \(W^{1,1}(\mathbbm {T}^d)\). Then there exists a constant \(C>0\) such that

$$\begin{aligned} D_{\delta }(\theta ) \gtrsim \log \left( \frac{\Vert \theta \Vert _{L^1}}{\delta C\Vert \nabla \theta \Vert _{L^1}}+1\right) \Vert \theta \Vert _{L^1}. \end{aligned}$$
(10)

The statement of the lemma is a consequence of an interpolation inequality between Kantorovich–Rubinstein distances with logarithmic cost function and the Sobolev norm, a variation of which was proved previously in [7, 26, 29]. It is a generalization of the endpoint Kantorovich–Sobolev inequality

$$\begin{aligned} 1\lesssim \log ^{-1}\left( \Vert \nabla \theta \Vert _{L^1}^{-1}+1\right) \inf _{\pi \in \Pi (\theta ^+,\theta ^-)} \iint \log \left( |x-y|+1\right) \, \textrm{d}\pi (x,y) \end{aligned}$$

for probability distributions, see [20] for standard Wasserstein versions.

Proof

We recall that by duality, it holds that

$$\begin{aligned} \Vert \theta \Vert _{L^1} = \sup _{\Vert \psi \Vert _{L^{\infty }}\le 1}\int \theta \psi \,\textrm{d}x. \end{aligned}$$
(11)

We now pick \(\psi \) arbitrary with \(\Vert \psi \Vert _{L^{\infty }}\le 1\) and denote by subscript R the convolution with a standard mollifier of scale R. We then split

$$\begin{aligned} \int \theta \psi \,\textrm{d}x= \int (\theta -\theta _R)\psi \,\textrm{d}x+ \int \theta \psi _R\,\textrm{d}x, \end{aligned}$$
(12)

where we have used symmetry properties of the mollifier to shift the subscript from \(\theta \) to \(\psi \).

For the first term, we use the fact that \(\Vert \theta -\theta _R\Vert _{L^1} \lesssim R\Vert \nabla \theta \Vert _{L^1}\), so that

$$\begin{aligned} \int (\theta -\theta _R)\psi \,\textrm{d}x\lesssim R\Vert \nabla \theta \Vert _{L^1}. \end{aligned}$$
(13)

For the second one, we introduce a second auxiliary length scale r and write

$$\begin{aligned}&\int \theta \psi _R\,\textrm{d}x\\&\quad = \int (\theta ^+-\theta ^-)\psi _R\,\textrm{d}x\\&\quad = \iint (\psi _R(x)-\psi _R(y))\,\textrm{d}\pi (x,y)\\&\quad = \iint _{|x-y|\le r} (\psi _R(x)-\psi _R(y))\,\textrm{d}\pi (x,y)+ \iint _{|x-y|> r} (\psi _R(x)-\psi _R(y))\,\textrm{d}\pi (x,y), \end{aligned}$$

where \(\pi \in \Pi (\theta ^+,\theta ^-)\) is an arbitrary transport plan and we have used its marginal conditions in the second equality. On the one hand, because \(\psi _R\) is Lipschitz and \(\Vert \nabla \psi _R\Vert _{L^{\infty }}\lesssim \frac{1}{R}\Vert \psi \Vert _{L^{\infty }}\le \frac{1}{R}\), we have that

$$\begin{aligned} \iint _{|x-y|\le r} (\psi _R(x)-\psi _R(y))\,\textrm{d}\pi (x,y) \lesssim r \Vert \nabla \psi _R\Vert _{L^{\infty }}\iint \textrm{d}\pi (x,y) \lesssim \frac{r}{R} \Vert \theta \Vert _{L^1}. \end{aligned}$$

On the other hand, using \(\Vert \psi _R\Vert _{L^{\infty }}\le \Vert \psi \Vert _{L^{\infty }}\le 1\), the monotonicity of the logarithm and setting \(c(z) = \log \left( \frac{z}{\delta }+1\right) \), we estimate

$$\begin{aligned} \iint _{|x-y|> r} (\psi _R(x)-\psi _R(y))\,\textrm{d}\pi (x,y)&\le \frac{2\Vert \psi _R\Vert _{L^{\infty }}}{c(r)} \iint c(|x-y|)\, \textrm{d}\pi (x,y)\\&\lesssim \frac{1}{c(r)} \iint c(|x-y|)\, \textrm{d}\pi (x,y). \end{aligned}$$

Combining the previous estimates and optimizing in \(\pi \) on the right-hand side, we conclude that

$$\begin{aligned} \int \theta \psi _R\,\textrm{d}x\lesssim \frac{r}{R}\Vert \theta \Vert _{L^1} + \frac{1}{c(r)}D_{\delta }(\theta ). \end{aligned}$$

Plugging this estimate and (13) into the decomposition (12), we arrive at

$$\begin{aligned} \int \theta \psi \,\textrm{d}x\lesssim R\Vert \nabla \theta \Vert _{L^1} + \frac{r}{R}\Vert \theta \Vert _{L^1} + \frac{1}{c(r)}D_{\delta }(\theta ), \end{aligned}$$

for any \(\psi \) such that \(\Vert \psi \Vert _{L^{\infty }}\le 1\). Maximizing in \(\psi \) on the left-hand side and choosing \(R\gg r\), we deduce that

$$\begin{aligned} \Vert \theta \Vert _{L^1} \lesssim r \Vert \nabla \theta \Vert _{L^1} + \frac{1}{c(r)}D_{\delta }(\theta ), \end{aligned}$$

and thus, the result follows upon choosing \(r\ll \frac{\Vert \theta \Vert _{L^1}}{\Vert \nabla \theta \Vert _{L^1}}\). \(\square \)
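The two choices of length scales at the end of the proof can be made explicit as follows. This is only a sketch of the bookkeeping: \(C_0\ge 1\) denotes a common upper bound for the implicit constants in the preceding estimates, a notation introduced here for illustration, and we assume \(\nabla \theta \not \equiv 0\). Choosing \(R=2C_0 r\) and then \(r = \frac{\Vert \theta \Vert _{L^1}}{8C_0^2\Vert \nabla \theta \Vert _{L^1}}\) gives

$$\begin{aligned} \Vert \theta \Vert _{L^1}&\le C_0\left( R\Vert \nabla \theta \Vert _{L^1} + \frac{r}{R}\Vert \theta \Vert _{L^1} + \frac{1}{c(r)}D_{\delta }(\theta )\right) \\&= 2C_0^2\, r\Vert \nabla \theta \Vert _{L^1} + \frac{1}{2}\Vert \theta \Vert _{L^1} + \frac{C_0}{c(r)}D_{\delta }(\theta )\\&= \frac{3}{4}\Vert \theta \Vert _{L^1} + \frac{C_0}{c(r)}D_{\delta }(\theta ), \end{aligned}$$

so that \(D_{\delta }(\theta )\ge \frac{1}{4C_0}\, c(r)\Vert \theta \Vert _{L^1} = \frac{1}{4C_0}\log \left( \frac{\Vert \theta \Vert _{L^1}}{8C_0^2\,\delta \Vert \nabla \theta \Vert _{L^1}}+1\right) \Vert \theta \Vert _{L^1}\), which is (10) with \(C=8C_0^2\).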

We are now in the position to prove our bound on the dissipation rate.

Proof of Theorem 1

We may without loss of generality assume that \(D\le 1\) and \(\Lambda \ge 1\).

We notice that 1/D is the dissipation time scale, and we set \(t_n = n/D\) for \(n\in \mathbbm {N}\). Integrating (9) over \([t_n,t_{n+1}]\) yields

$$\begin{aligned} \left| D_{\delta }(\theta ^{\kappa }(t_{n+1})) - D_{\delta }(\theta ^{\kappa }(t_n))\right| \lesssim \int _{t_n}^{t_{n+1}}\Vert \nabla u\Vert _{L^p} \Vert \theta ^{\kappa }\Vert _{L^q}\, \textrm{d}t+\frac{\kappa }{\delta } \int _{t_n}^{t_{n+1}} \Vert \nabla \theta ^{\kappa }\Vert _{L^1}\, \textrm{d}t. \end{aligned}$$
(14)

If \(q\ge 2\), we use Jensen’s inequality and the energy balance (2) to bound the gradient term on the right-hand side,

$$\begin{aligned} \frac{\kappa }{\delta } \int _{t_n}^{t_{n+1}} \Vert \nabla \theta ^{\kappa }\Vert _{L^1}\, dt&\le \frac{1}{\delta } \sqrt{\frac{\kappa }{D}} \left( \kappa \int _{t_n}^{t_{n+1}} \Vert \nabla \theta ^{\kappa }\Vert _{L^2}^2\, \textrm{d}t\right) ^{\frac{1}{2}}\\&\le \frac{1}{\delta } \sqrt{\frac{\kappa }{D}} \Vert \theta ^{\kappa }(t_n)\Vert _{L^2}\\&\le \frac{1}{\delta } \sqrt{\frac{\kappa }{D}} \Vert \theta ^{\kappa }(t_n)\Vert _{L^q}. \end{aligned}$$

Otherwise, if \(q\le 2\), we use the generalized energy equality

$$\begin{aligned} \Vert \theta ^{\kappa }(t_{n+1})\Vert _{L^q}^q + \kappa q(q-1)\int _{t_n}^{t_{n+1}} \int _{\mathbbm {T}^d} |\theta ^{\kappa }|^{q-2}|\nabla \theta ^{\kappa }|^2\,\textrm{d}x\, \textrm{d}t= \Vert \theta ^{\kappa }(t_n)\Vert ^q_{L^q}, \end{aligned}$$

which is derived via a standard computation, and we estimate via interpolation and Jensen’s inequality

$$\begin{aligned} \frac{\kappa }{\delta } \int _{t_n}^{t_{n+1}} \Vert \nabla \theta ^{\kappa }\Vert _{L^1}\, \textrm{d}t&\le \frac{\kappa }{\delta } \int _{t_n}^{t_{n+1}} \left( \int _{\mathbbm {T}^d} |\theta ^{\kappa }|^{q-2}|\nabla \theta ^{\kappa }|^2\,\textrm{d}x\right) ^{\frac{1}{2}}\Vert \theta ^{\kappa }\Vert _{L^{2-q}}^{\frac{2-q}{2}}\, \textrm{d}t\\&\le \frac{1}{\delta } \sqrt{\frac{\kappa }{D}} \left( \kappa \int _{t_n}^{t_{n+1}} \int _{\mathbbm {T}^d} |\theta ^{\kappa }|^{q-2}|\nabla \theta ^{\kappa }|^2\,\textrm{d}x\, \textrm{d}t\right) ^{\frac{1}{2}} \Vert \theta ^{\kappa }(t_n)\Vert _{L^q}^{\frac{2-q}{2}}\\&\le \frac{1}{\delta } \sqrt{\frac{\kappa }{D}} \Vert \theta ^{\kappa }(t_n)\Vert _{L^q}. \end{aligned}$$

In either case, we have that

$$\begin{aligned} \frac{\kappa }{\delta } \int _{t_n}^{t_{n+1}} \Vert \nabla \theta ^{\kappa }\Vert _{L^1}\, \textrm{d}t\lesssim \frac{1}{\delta } \sqrt{\frac{\kappa }{D}} \Vert \theta ^{\kappa }(t_n)\Vert _{L^q} \le \frac{\Lambda }{\delta } \sqrt{\frac{\kappa }{D}} e^{-Dt_n}, \end{aligned}$$
(15)

where we have used (7) in the last inequality.

Regarding the velocity term in (14), we now observe that the budget constraint (6) and the enhanced dissipation assumption (7) imply that

$$\begin{aligned} \begin{aligned} \int _{t_n}^{t_{n+1}} \Vert \nabla u\Vert _{L^p}\Vert \theta ^{\kappa }\Vert _{L^q}\, \textrm{d}t&\lesssim \Lambda e^{-Dt_n} \left( 1+t_{n+1}^{\alpha }\right) \\&\lesssim \Lambda e^{-Dt_{n+2}} t_{n+2}^{\alpha }\\&\lesssim \Lambda D^{-\alpha } e^{-Dt_{n+2}/2} , \end{aligned} \end{aligned}$$
(16)

where in the last inequality we use that the mapping \(t\mapsto t^{\alpha } e^{-Dt/2}\) is decreasing for \(t\ge t_2 \).
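The monotonicity used in the last step of (16) follows from a one-line computation, sketched here for the reader's convenience:

$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}t}\left( t^{\alpha } e^{-Dt/2}\right) = \left( \alpha - \frac{Dt}{2}\right) t^{\alpha -1}e^{-Dt/2}\le 0 \quad \text{for } t\ge \frac{2\alpha }{D}. \end{aligned}$$

Since \(\alpha \le 1\), we have \(t_2 = \frac{2}{D}\ge \frac{2\alpha }{D}\), and therefore, for every \(n\ge 0\),

$$\begin{aligned} t_{n+2}^{\alpha }e^{-Dt_{n+2}} = \left( t_{n+2}^{\alpha }e^{-Dt_{n+2}/2}\right) e^{-Dt_{n+2}/2} \le t_2^{\alpha }e^{-1}\, e^{-Dt_{n+2}/2} = \frac{2^{\alpha }}{e}\, D^{-\alpha } e^{-Dt_{n+2}/2}. \end{aligned}$$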

Inserting the two estimates (15) and (16) into (14) then gives the bound

$$\begin{aligned} \left| D_{\delta }(\theta ^{\kappa }(t_{n+1})) - D_{\delta }(\theta ^{\kappa }(t_n))\right| \lesssim \Lambda D^{-\alpha } e^{-Dt_{n+2}/2} +\frac{\Lambda }{\delta } \sqrt{\frac{\kappa }{D}} e^{-Dt_n}. \end{aligned}$$

Hence, summing over n and recalling that \(t_n = n/D\), we find that

$$\begin{aligned} D_{\delta }(\theta _0)\lesssim & {} D_{\delta }(\theta ^{\kappa }(t_N)) + \frac{\Lambda }{D^{\alpha }} \sum _{n=0}^{N-1} e^{-\frac{n}{2}} +\frac{\Lambda }{\delta }\sqrt{\frac{\kappa }{D}} \sum _{n=0}^{N-1}e^{-n} \nonumber \\\lesssim & {} D_{\delta }(\theta ^{\kappa }(t_N)) + \frac{\Lambda }{D^{\alpha }} +\frac{\Lambda }{\delta }\sqrt{\frac{\kappa }{D}} . \end{aligned}$$
(17)
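In more detail, (17) combines a telescoping sum with two geometric series. Because \(Dt_n=n\), we have \(e^{-Dt_{n+2}/2} = e^{-1}e^{-n/2}\) and \(e^{-Dt_n}=e^{-n}\), and thus

$$\begin{aligned} D_{\delta }(\theta _0)&\le D_{\delta }(\theta ^{\kappa }(t_N)) + \sum _{n=0}^{N-1}\left| D_{\delta }(\theta ^{\kappa }(t_{n+1})) - D_{\delta }(\theta ^{\kappa }(t_n))\right| ,\\ \sum _{n=0}^{\infty } e^{-\frac{n}{2}}&= \frac{1}{1-e^{-1/2}}\approx 2.54,\qquad \sum _{n=0}^{\infty } e^{-n} = \frac{1}{1-e^{-1}}\approx 1.58. \end{aligned}$$

In particular, the implicit constant in the second line of (17) does not depend on N.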

We now use the lower bound on our Kantorovich–Rubinstein distance (10) and the assumption on the initial datum to estimate the left-hand side from below. It holds that

$$\begin{aligned} \log \left( \frac{1}{C\delta }+1\right) \lesssim D_{\delta }(\theta _0), \end{aligned}$$

for some \(C>0\). This constant can be chosen larger than 1 without restrictions, and thus,

$$\begin{aligned} \log \left( \frac{1}{C\delta }+1\right) \ge \log \left( \frac{1}{\delta }+1\right) -\log C\gtrsim \log \left( \frac{1}{\delta }+1\right) \ge \log \frac{1}{\delta }, \end{aligned}$$

if \(\delta \) is sufficiently small, which we will ensure later. On the other hand, because \(|x-y|\le 1\) on the torus, we have the following crude estimate on the Kantorovich–Rubinstein distance

$$\begin{aligned} D_{\delta }(\theta ^{\kappa }) \le \log \left( \frac{1}{\delta }+1\right) \iint \textrm{d}\pi (x,y) \lesssim \log \left( \frac{1}{\delta }+1\right) \Vert \theta ^{\kappa }\Vert _{L^1}, \end{aligned}$$

and thus by (7), and the fact that \(Dt_N=N\), (17) becomes

$$\begin{aligned} \log \frac{1}{\delta } \lesssim \Lambda \log \left( \frac{1}{\delta }+1\right) e^{-N} + \frac{\Lambda }{D^{\alpha }} +\frac{\Lambda }{\delta }\sqrt{\frac{\kappa }{D}}. \end{aligned}$$

Because N was arbitrary, we may let \(N\rightarrow \infty \), so that the first term on the right-hand side vanishes. We thus arrive at

$$\begin{aligned} \frac{1}{C}\log \frac{1}{\delta } \le \frac{\Lambda }{D^{\alpha }} +\frac{\Lambda }{\delta }\sqrt{\frac{\kappa }{D}}, \end{aligned}$$
(18)

for some constant C independent of \(\Lambda \) and D.

To conclude, we consider separately the cases \(\alpha \le 1/2\) and \(\alpha >1/2\).

Case \(\alpha \le 1/2\). We may without loss of generality assume that \(\Lambda ^2 \kappa \ll D\), because otherwise, if \(D\lesssim \Lambda ^2 \kappa \), the statement can be deduced from the estimate

$$\begin{aligned} \Lambda ^2 \kappa \lesssim \Lambda ^{\frac{1}{\alpha }}\log ^{-\frac{1}{\alpha }} \frac{1}{\kappa }, \end{aligned}$$

which is trivially satisfied since \(\Lambda \ge 1\) and \(\kappa \ll 1\). We optimize (18) with respect to \(\delta \) by choosing \(\delta = C\Lambda \sqrt{\frac{\kappa }{D}}\). Note that this \(\delta \) can be assumed to be small by the previous argument. With this, estimate (18) becomes

$$\begin{aligned} 0 \le \frac{C \Lambda }{D^{\alpha }} + 1 + \log \left( C\Lambda \sqrt{\frac{\kappa }{D}}\right) . \end{aligned}$$

On the one hand, if \(\alpha =0\), the latter implies that

$$\begin{aligned} \log \frac{D}{\kappa } \lesssim 1+ \log \Lambda + \Lambda \lesssim \Lambda , \end{aligned}$$

since \(\Lambda \ge 1\). On the other hand, if \(\alpha >0\), we deduce for small \(\kappa \) that

$$\begin{aligned} \log \frac{1}{\kappa } \lesssim \frac{\Lambda }{D^{\alpha }}+ \log \Lambda +\log \frac{1}{D}\lesssim \frac{\Lambda }{D^{\alpha }}. \end{aligned}$$

In either case, we infer the desired estimate.

Case \(\alpha >1/2\). In this case, we choose \(\delta = \kappa ^{\frac{1}{2}} D^{\alpha - \frac{1}{2}}\) in (18), which is small because \(D\le 1 \), to the effect that

$$\begin{aligned} \log \frac{1}{\kappa } \lesssim \log \frac{1}{\kappa } + \log \frac{1}{D} \lesssim \frac{\Lambda }{D^{\alpha }}. \end{aligned}$$

Again, this is the stated estimate. \(\square \)
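For the reader's convenience, the substitution in the case \(\alpha >1/2\) can be spelled out as follows, with C denoting the constant from (18). For \(\delta = \kappa ^{\frac{1}{2}}D^{\alpha -\frac{1}{2}}\), it holds that

$$\begin{aligned} \frac{\Lambda }{\delta }\sqrt{\frac{\kappa }{D}} = \frac{\Lambda }{D^{\alpha }}, \qquad \log \frac{1}{\delta } = \frac{1}{2}\log \frac{1}{\kappa } + \left( \alpha -\frac{1}{2}\right) \log \frac{1}{D}\ge \frac{1}{2}\log \frac{1}{\kappa }, \end{aligned}$$

where the inequality uses \(D\le 1\). Inserting both identities into (18) yields

$$\begin{aligned} \frac{1}{2C}\log \frac{1}{\kappa }\le \frac{2\Lambda }{D^{\alpha }}, \qquad \text{that is,}\qquad D\le \left( 4C\Lambda \right) ^{\frac{1}{\alpha }}\log ^{-\frac{1}{\alpha }}\frac{1}{\kappa }. \end{aligned}$$

Because \(\frac{1}{\alpha }<2\) in this case, the factor \((4C)^{\frac{1}{\alpha }}\) is bounded by a constant that depends on C only, provided \(4C\ge 1\).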