1 Introduction

Given \(N\ge 2\), we are interested in the following system of generalized Langevin equations in \(\mathbb {R}^d\), \(d\ge 1\):

$$\begin{aligned} \text {d}\, x_i(t)&= v_i(t)\text {d}t,\qquad i=1,\dots ,N, \nonumber \\ m \,\text {d}\, v_i(t)&= -\gamma v_i(t) \text {d}t -\nabla U(x_i(t))\text {d}t- \sum _{j\ne i}\nabla G\big (x_i(t)-x_j(t)\big ) \text {d}t +\sqrt{2\gamma } \,\text {d}W_{i,0}(t)\nonumber \\&\qquad +\int _0^t K_i(t-s)v_i(s)\text {d}s\text {d}t+F_i(t)\text {d}t. \end{aligned}$$
(1.1)

System (1.1) is introduced to describe the evolution of N interacting micro-particles in a thermally fluctuating viscoelastic medium; see Baczewski and Bond (2013), Bockius et al. (2021), Duong (2015), Duong and Pavliotis (2019), Gomes et al. (2020), Gottwald et al. (2015), Jung et al. (2018), Ness et al. (2015), Wei et al. (2016) and the references therein. The bivariate process \((x_i(t),v_i(t))\), \(i=1,\dots ,N\), represents the position and velocity of the \(i{\text {th}}\) particle. In the \(v_i\)-equation in (1.1), \(m>0\) is the particle’s mass, \(\gamma >0\) governs the viscous friction, \(\{W_{i,0}\}_{i=1,\ldots , N}\) are independent standard d-dimensional Wiener processes, \(U:\mathbb {R}^d \rightarrow [0,\infty )\) is a confining potential satisfying polynomial growth and certain dissipativity conditions, and \(G:\mathbb {R}^d\setminus \{0\}\rightarrow \mathbb {R}\) is a singular repulsive potential. Furthermore, the \(i{\text {th}}\) particle is subject to a memory term involving the convolution kernel \(K_i:[0,\infty )\rightarrow [0,\infty )\), which characterizes the delayed response of the fluid to the particle’s past movement (Kneller 2011; Mason and Weitz 1995; Mori 1965). In accordance with the fluctuation–dissipation relationship (Kubo 1966; Zwanzig 2001), the random force \(F_i(t)\) is a mean-zero, stationary Gaussian process linked to \(K_i\) via the relation

$$\begin{aligned} \mathbb {E}[F_i(t)F_i(s)]=K_i(|t-s|). \end{aligned}$$
(1.2)

In the absence of memory effects, that is, setting \(K_i\equiv 0\) and \(F_i\equiv 0\), (1.1) reduces to the classical underdamped Langevin system modeling Brownian particles driven by repulsive external forces

$$\begin{aligned} \text {d}\, x_i(t)&= v_i(t)\text {d}t,\quad i=1,\dots ,N, \nonumber \\ m\,\text {d}\, v_i(t)&= -\gamma v_i(t) \text {d}t -\nabla U(x_i(t))\text {d}t -\sum _{j\ne i} \nabla G(x_i(t)-x_j(t)) \text {d}t+\sqrt{2\gamma } \,\text {d}W_{i,0}(t). \end{aligned}$$
(1.3)

In particular, the large-time asymptotics of (1.3) are well understood: for a wide class of polynomial potentials U and singular potentials G, including the instances of Lennard–Jones and Coulomb functions, system (1.3) admits a unique invariant probability measure which is exponentially attractive and whose explicit formula is given in Conrad and Grothaus (2010), Cooke et al. (2017), Grothaus and Stilgenbauer (2015), Herzog and Mattingly (2019), Lu and Mattingly (2019) by

$$\begin{aligned} \pi (\textrm{dx}, \textrm{dv})=\frac{1}{Z}\exp \Big \{-\Big (\frac{1}{2}\sum _{i=1}^N m|v_i|^2+\sum _{i=1}^N U(x_i)+\!\!\!\sum _{1\le i<j\le N}\!\!\!G(x_i-x_j)\Big ) \Big \}\textrm{dxdv}. \end{aligned}$$

In the above, Z is the normalization constant, \(\textrm{x}=(x_1,\dots ,x_N)\) and \(\textrm{v}=(v_1,\dots ,v_N)\). However, as pointed out in Kubo (1966) and Zwanzig (2001), the presence of elasticity in a viscoelastic medium induces a memory effect between the motion of the particles and the surrounding molecular bombardment. It is thus more physically relevant to consider (1.1). On the other hand, in the absence of singularities in (1.1) (\(G\equiv 0\)), there is a vast literature in the context of, e.g., large-time behavior (Glatt-Holtz et al. 2020; Herzog et al. 2023; Ottobre and Pavliotis 2011; Pavliotis 2014; Pavliotis et al. 2021) as well as small-mass limits (Herzog et al. 2016; Hottovy et al. 2015; Lim and Wehr 2019; Lim et al. 2020; Nguyen 2018; Shi and Wang 2021). In contrast, much less is known about system (1.1) in the presence of both memory kernels and singular potentials, for any of these limiting regimes.

The main goal of the present article is thus twofold. Firstly, under a general set of conditions on the nonlinearities and memory kernels, we asymptotically characterize the equilibrium of (1.1) as \(t\rightarrow \infty \). More specifically, we aim to prove that under these practical assumptions, (1.1) is exponentially attracted toward a unique ergodic probability measure. Secondly, we explore the behavior of (1.1) in the small-mass regime, i.e., by taking m to zero on the right-hand side of the \(v_i\)-equation in (1.1). Owing to this singular limit, for small m the velocity \(\textrm{v}(t)\) oscillates rapidly, whereas the position \(\textrm{x}(t)\) still evolves slowly. Hence, we seek to identify a limiting process \(\textrm{q}(t)\) to which \(\textrm{x}(t)\) can be related on any finite-time window. We now provide a more detailed description of the main results.

1.1 Geometric Ergodicity

In general, there is no Markovian dynamics associated with (1.1), owing to the presence of the memory kernels. Nevertheless, it is well known that for kernels given by a finite sum of exponential functions, one may adopt the Mori–Zwanzig approach to produce a Markovian approximation of (1.1) (Glatt-Holtz et al. 2020; Kubo 1966; Mori 1965; Ottobre and Pavliotis 2011; Pavliotis 2014; Zwanzig 2001). More specifically, when \(K_i\) is given by

$$\begin{aligned} K_i(t)=\sum _{\ell =1}^{k_i} \lambda _{i,\ell }^2 e^{-\alpha _{i,\ell } t},\quad t\ge 0, \end{aligned}$$
(1.4)

for some positive constants \(\lambda _{i,\ell }\), \(\alpha _{i,\ell }\), \(\ell =1,\dots ,k_i\), following the framework of Baczewski and Bond (2013), Doob (1942), Duong and Shang (2022), Goychuk (2012), Ottobre and Pavliotis (2011), Pavliotis (2014), we can rewrite (1.1) as the following system

$$\begin{aligned} \text {d}\, x_i(t)&= v_i(t)\text {d}t,\qquad i=1,\dots ,N, \nonumber \\ m\,\text {d}\, v_i(t)&= -\gamma v_i(t) \text {d}t -\nabla U(x_i(t))\text {d}t +\sqrt{2\gamma } \,\text {d}W_{i,0}(t)\nonumber \\&\qquad - \sum _{j\ne i}\nabla G\big (x_i(t)-x_j(t)\big ) \text {d}t +\sum _{\ell =1}^{k_i} \lambda _{i,\ell } z_{i,\ell }(t)\text {d}t, \nonumber \\ \text {d}\, z_{i,\ell }(t)&= -\alpha _{i,\ell } z_{i,\ell } (t)\text {d}t-\lambda _{i,\ell } v_i(t)\text {d}t+\sqrt{2\alpha _{i,\ell } }\,\text {d}W_{i,\ell } (t),\quad \ell =1,\dots ,k_i. \end{aligned}$$
(1.5)

See Pavliotis (2014, Proposition 8.1) for a detailed discussion of this formulation.
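As a quick sanity check on this Markovian embedding (our illustration, not part of the paper; the parameter values are arbitrary), one can verify numerically that \(F(t)=\sum _\ell \lambda _\ell z_\ell (t)\), built from independent stationary Ornstein–Uhlenbeck processes \(z_\ell \), reproduces the covariance relation (1.2) with the kernel (1.4):

```python
import numpy as np

# Illustrative single-particle setting with k = 2 modes; values are arbitrary.
lams = np.array([1.0, 0.7])
alps = np.array([1.0, 3.0])
tau, n = 0.5, 200_000
rng = np.random.default_rng(0)

# Stationary OU marginals: z_l(0) ~ N(0, 1); exact transition over lag tau.
z0 = rng.standard_normal((n, 2))
decay = np.exp(-alps * tau)
ztau = decay * z0 + np.sqrt(1.0 - decay**2) * rng.standard_normal((n, 2))

F0, Ftau = z0 @ lams, ztau @ lams          # F(t) = sum_l lam_l z_l(t)
est = np.mean(F0 * Ftau)                   # Monte Carlo estimate of E[F(t)F(t+tau)]
K = np.sum(lams**2 * np.exp(-alps * tau))  # kernel (1.4) evaluated at lag tau
```

With this sample size, the Monte Carlo estimate agrees with \(K(\tau )\) to within a few thousandths.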

Denoting \(\textrm{z}_i=(z_{i,1},\dots ,z_{i,k_i})\in (\mathbb {R}^d)^{k_i}, i=1,\ldots , N \), we introduce the Hamiltonian function \(H_N\) defined as

$$\begin{aligned} H_N(\textrm{x},\textrm{v},\textrm{z}_1,\dots ,\textrm{z}_N) = \frac{1}{2}m|\textrm{v}|^2+ \sum _{i=1}^N U(x_i)+\!\!\!\sum _{1\le i< j\le N}\!\!\!G(x_i-x_j)+\frac{1}{2}\sum _{i=1}^N |\textrm{z}_i|^2. \end{aligned}$$
(1.6)

The corresponding Gibbs measure is given by:

$$\begin{aligned} \pi _N(\textrm{x},\textrm{v},\textrm{z}_1,\dots ,\textrm{z}_N)=\frac{1}{Z_N}\exp \big \{-H_N(\textrm{x},\textrm{v},\textrm{z}_1,\dots ,\textrm{z}_N)\big \}\textrm{dxdvdz}_1\dots \textrm{dz}_N, \end{aligned}$$
(1.7)

where \(Z_N\) is the normalization constant. Under suitable assumptions on the potentials, cf. Assumptions 2.1 and 2.3, one can rely on the Hamiltonian structure to show that (1.5) is always well-posed. That is, a strong solution of (1.5) exists and is unique for all finite times. Furthermore, by a routine computation (Pavliotis 2014, Proposition 8.2), it is not difficult to see that the Gibbs measure \(\pi _N\) in (1.7) is an invariant probability measure of (1.5) and that there is dissipation toward this measure. On the other hand, as observed in Conrad and Grothaus (2010), Cooke et al. (2017), Grothaus and Stilgenbauer (2015), Herzog and Mattingly (2019), Lu and Mattingly (2019), in the presence of the nonlinearities, the Hamiltonian (1.6) does not produce an energy estimate of the form

$$\begin{aligned} \frac{\text {d}}{\text {d}t}\mathbb {E}[V(t)]\le -c\, \mathbb {E}[V(t)]+C,\quad t\ge 0, \end{aligned}$$

which is needed to obtain geometric ergodicity. In this paper, we tackle the problem by exploiting Lyapunov function techniques, cf. Definition 3.1, and establish the uniqueness of \(\pi _N\) as well as an exponential convergence rate toward \(\pi _N\). We note that our result covers important examples of singular potentials such as the Lennard–Jones and Coulomb functions. We refer the reader to Theorem 2.8 for a precise statement of this result and to Sect. 3 for its proof.
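For context (a standard Gronwall computation, not specific to this paper): when an estimate of the above form does hold for some function V, it integrates to a uniform-in-time moment bound, which is the starting point for geometric ergodicity via Lyapunov techniques:

$$\begin{aligned} \frac{\text {d}}{\text {d}t}\Big (e^{ct}\,\mathbb {E}[V(t)]\Big )\le C e^{ct}\quad \Longrightarrow \quad \mathbb {E}[V(t)]\le e^{-ct}\,\mathbb {E}[V(0)]+\frac{C}{c}\big (1-e^{-ct}\big ),\quad t\ge 0. \end{aligned}$$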

Historically, in the absence of repulsive forces, (1.5) reduces to the following single-particle generalized Langevin equation (GLE)

$$\begin{aligned} \text {d}\, x(t)&= v(t)\text {d}t, \nonumber \\ m\,\text {d}\, v(t)&= -\gamma v(t) \text {d}t -\nabla U(x(t))\text {d}t +\sum _{i=1}^k \lambda _i z_i(t)\text {d}t +\sqrt{2\gamma } \,\text {d}W_0(t),\nonumber \\ \text {d}\, z_i(t)&= -\alpha _i z_i(t)\text {d}t-\lambda _i v(t)\text {d}t+\sqrt{2\alpha _i}\,\text {d}W_i(t),\quad i=1,\dots ,k, \end{aligned}$$
(1.8)

whose large-time asymptotics have been studied extensively (Glatt-Holtz et al. 2020; Herzog et al. 2023; Ottobre and Pavliotis 2011; Pavliotis 2014). In particular, mixing rates for kernels given by a finite sum of exponentials were established in Ottobre and Pavliotis (2011) and Pavliotis (2014) via the weak Harris theorem (Hairer and Mattingly 2011; Meyn and Tweedie 2012). Analogously, kernels given by an infinite sum of exponentials were explored in Glatt-Holtz et al. (2020), where the uniqueness of invariant probability measures was obtained by employing a so-called asymptotic coupling argument. Recently, in Herzog et al. (2023), a Gibbsian approach was adopted from Liu (2002), Mattingly and Sinai (2001), Mattingly (2002) to study more general kernels that need not be a sum of exponentials.

Turning back to our ergodicity results for singular potentials, the proof of Theorem 2.8 relies on two ingredients: an irreducibility condition and a suitable Lyapunov function (Hairer and Mattingly 2011; Mattingly et al. 2002; Meyn and Tweedie 2012). Whereas the irreducibility is relatively standard and can be addressed by adapting the argument in, e.g., Herzog and Mattingly (2019, Corollary 5.12) and Mattingly et al. (2002, Proposition 2.5), the construction of Lyapunov functions is highly nontrivial, requiring a deeper understanding of the dynamics. Notably, in the case of positive viscous drag (\(\gamma >0\)), we draw upon the ideas developed in Herzog and Mattingly (2019), Lu and Mattingly (2019), tailored to our settings, cf. Lemma 3.5. On the other hand, when \(\gamma =0\), the method therein is unfortunately not applicable, owing to a lack of dissipation in \(\textrm{v}\). To circumvent this issue, we build a novel Lyapunov function specifically designed for (1.5), cf. Lemma 3.6. The main idea of the construction is to identify the dominating effects at high energy (\(H_N\gg 1\)). We do so by employing a heuristic asymptotic scaling that determines the leading-order terms in “bad” regions, i.e., when \(|\textrm{x}|\rightarrow 0\) and \(|\textrm{v}|,|\textrm{z}|\rightarrow \infty \). In turn, this will be crucial in the derivation of our Lyapunov functions, which ultimately will be invoked to conclude ergodicity when \(\gamma =0\). The heuristic argument will be presented in Sect. 3.1.1, whereas the proofs of Lemmas 3.5 and 3.6 are supplied in Sect. 3.2.

1.2 Small-Mass Limit

In the second main topic of the paper, we investigate the small-mass limit for the process \(\textrm{x}(t)=\textrm{x}_m(t)\) in (1.5). Namely, by taking m to zero on the right-hand side of the \(v_i\)-equation in (1.5), we aim to derive a process \(\textrm{q}(t)\) taking values in \((\mathbb {R}^d)^N\) such that \(\textrm{x}_m(t)\) can be well-approximated by \(\textrm{q}(t)\) on any finite-time window, i.e.,

$$\begin{aligned} \sup _{t\in [0,T]}|\textrm{x}_m(t)-\textrm{q}(t)|\rightarrow 0,\quad \text {as }m\rightarrow 0, \end{aligned}$$
(1.9)

where the limit holds in an appropriate sense. In the literature, such a statement is also known as the Smoluchowski–Kramer approximation (Freidlin 2004; Kramers 1940; Smoluchowski 1916).

We note that in the absence of the singularities, there is a vast literature on limits of the form (1.9) for various settings of the single-particle GLE as well as other second-order systems. For example, numerical simulations were performed in Hottovy et al. (2012a, 2012b). Rigorous results in this direction for state-dependent drift terms appear in the work of Cerrai et al. (2020), Hottovy et al. (2015, 2012a), Pardoux and Veretennikov (2003). Similar results have also been established for Langevin dynamics (Herzog et al. 2016; Duong et al. 2017), the finite-dimensional single-particle GLE (Lim and Wehr 2019; Lim et al. 2020), as well as the infinite-dimensional single-particle GLE (Nguyen 2018; Shi and Wang 2021). Analogous studies for the stochastic wave equation were central in the work of Cerrai and Freidlin (2006a, b), Cerrai et al. (2017), Cerrai and Glatt-Holtz (2014, 2020), Cerrai and Salins (2016), Nguyen (2022). On the other hand, small-mass limits in the context of repulsive forces are much less studied, but see the recent paper (Choi and Tse 2022) for a quantification of the small-mass limit of kinetic Vlasov–Fokker–Planck equations with singularities. In our work, we investigate this problem under a general set of conditions on the nonlinear potentials. More specifically, let us introduce the following system

$$\begin{aligned} \gamma \text {d}q_i(t)&= -\nabla U(q_i(t))\text {d}t- \sum _{j\ne i}\nabla G\big (q_i(t)-q_j(t)\big ) \text {d}t+\sqrt{2\gamma }\text {d}W_{i,0}(t) \nonumber \\&\quad \;-\sum _{\ell =1}^{k_i} \lambda _{i,\ell }^2 q_i(t)\text {d}t+\sum _{\ell =1}^{k_i}\lambda _{i,\ell } f_{i,\ell }(t)\text {d}t, \qquad i=1,\dots ,N, \nonumber \\ \text {d}f_{i,\ell }(t)&= -\alpha _{i,\ell } f_{i,\ell }(t)\text {d}t+\lambda _{i,\ell }\, \alpha _{i,\ell }\, q_i(t)\text {d}t+\sqrt{2\alpha _{i,\ell }}\text {d}W_{i,\ell }(t),\quad \ell =1,\dots , k_i. \end{aligned}$$
(1.10)

Our second main result states that the process \(\textrm{q}(t)\) satisfies the following limit in probability for all \(\xi ,\,T>0\)

$$\begin{aligned} \mathbb {P}\Big (\sup _{t\in [0,T]}|\textrm{x}_m(t)-\textrm{q}(t)|>\xi \Big )\rightarrow 0,\quad \text {as }m\rightarrow 0. \end{aligned}$$
(1.11)

The precise statement of (1.11) is provided in Theorem 2.10, while its detailed proof is supplied in Sect. 4.
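To make the statement concrete, here is a small numerical illustration of ours (not from the paper) in the simplest setting \(d=N=k_1=1\), with harmonic \(U(x)=x^2/2\), \(G\equiv 0\), and \(\gamma =\lambda =\alpha =1\); all values are arbitrary. Both systems are driven by the same Brownian increments, and the initial condition \(f(0)=z(0)+\lambda x(0)\) reflects the change of variables \(f=z+\lambda x\) underlying the decoupling that leads to (1.10):

```python
import numpy as np

# Illustrative setup (ours): d = N = k_1 = 1, U(x) = x^2/2, G = 0,
# gamma = lambda = alpha = 1. Both systems share the same noise increments.
gam = lam = alp = 1.0
dt, T = 1e-4, 1.0
n = int(T / dt)
rng = np.random.default_rng(1)
dW0 = rng.normal(0.0, np.sqrt(dt), n)
dW1 = rng.normal(0.0, np.sqrt(dt), n)

def sup_error(m):
    """sup_{t <= T} |x_m(t) - q(t)| along one Euler-Maruyama path."""
    x, v, z = 1.0, 0.0, 0.0          # system (1.5)
    q, f = 1.0, lam * 1.0            # system (1.10); f(0) = z(0) + lam*x(0)
    worst = 0.0
    for i in range(n):
        dv = (-gam*v - x + lam*z)/m*dt + np.sqrt(2*gam)/m*dW0[i]
        dz = (-alp*z - lam*v)*dt + np.sqrt(2*alp)*dW1[i]
        dq = (-q - lam**2*q + lam*f)/gam*dt + np.sqrt(2*gam)/gam*dW0[i]
        df = (-alp*f + lam*alp*q)*dt + np.sqrt(2*alp)*dW1[i]
        x, v, z, q, f = x + v*dt, v + dv, z + dz, q + dq, f + df
        worst = max(worst, abs(x - q))
    return worst

errs = [sup_error(m) for m in (0.5, 0.02)]
```

On a typical path, the sup-distance shrinks markedly as m decreases, consistent with (1.11).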

In order to derive the limiting system (1.10), we will adopt the framework developed in Herzog et al. (2016), Nguyen (2018), which deals with the same issue in the absence of singular potentials. This involves exploiting the structure of the \(z_i\)-equation in (1.5) while making use of Duhamel’s formula and an integration by parts. In turn, this allows for completely decoupling \(z_i\) from \(v_i\), ultimately arriving at (1.10). See Sect. 4.1 for a further discussion of this point. The proof of (1.11) draws upon the arguments in Herzog et al. (2016), Nguyen (2018), Ottobre and Pavliotis (2011), tailored to our settings. Namely, we first reduce the general problem to the special case where the nonlinearities are assumed to be Lipschitz. This requires a careful analysis of the auxiliary memory variables \(\textrm{z}_i(t)\), \(i=1,\dots ,N\), as well as of the velocity process \(\textrm{v}_m(t)\). We then proceed to remove the Lipschitz constraint by making use of crucial moment bounds on the limiting process \(\textrm{q}(t)\). In turn, this relies on a delicate estimate on (1.10) via suitable Lyapunov functions, which are also of independent interest. Similarly to the ergodicity results, we note that the limit (1.11) applies to a wide range of repulsive forces, e.g., the Lennard–Jones and Coulomb functions. To the best of the authors’ knowledge, the limit (1.11) established in this work seems to be the first of its kind for stochastic systems with both memory and singularities. The explicit argument for (1.11) will be carried out using a series of auxiliary results in Sect. 4. The proof of Theorem 2.10 will be presented in Sect. 4.4.

Finally, we remark that a crucial property leveraged in this work is the choice of memory kernels as a finite sum of exponentials, cf. (1.4). This allows for the convenience of employing a Markovian framework to study the asymptotics of (1.1) (Mori 1965; Ottobre and Pavliotis 2011; Pavliotis 2014; Zwanzig 2001) under the impact of interacting repulsive forces. For SDEs with non-exponentially decaying kernels, e.g., sub-exponential or power-law kernels, albeit with smooth nonlinearities, we refer the reader to Baeumer et al. (2015), Desch and Londen (2011), Glatt-Holtz et al. (2020), Herzog et al. (2023), Nguyen (2018).
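To give a rough sense of why the restriction to (1.4) is less limiting than it may appear, here is an illustration of ours (assuming NumPy/SciPy; the kernel \(K(t)=1/(1+t)\), grid, and window are arbitrary): a slowly decaying kernel can often be approximated on a bounded time window by a short nonnegative combination of exponentials, fixing the decay rates \(\alpha _\ell \) on a logarithmic grid and fitting the weights \(\lambda _\ell ^2\) by nonnegative least squares.

```python
import numpy as np
from scipy.optimize import nnls

t = np.linspace(0.0, 5.0, 400)
K = 1.0 / (1.0 + t)                   # a slowly decaying, non-exponential kernel
alphas = np.geomspace(0.125, 8.0, 7)  # fixed decay rates alpha_l on a log grid
A = np.exp(-np.outer(t, alphas))      # design matrix of exponentials e^{-alpha_l t}
w, _ = nnls(A, K)                     # nonnegative weights, playing the role of lambda_l^2
err = np.max(np.abs(A @ w - K))       # uniform error on the window [0, 5]
```

Seven modes already give a small uniform error on [0, 5]; this is only a heuristic, and rigorous treatments of genuinely non-exponential kernels are found in the references above.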

1.3 Organization of the Paper

The rest of the paper is organized as follows: in Sect. 2, we introduce the notation as well as the assumptions that we make on the nonlinearities. We also state the main results of the paper, including Theorem 2.8 on geometric ergodicity of (1.5) and Theorem 2.10 on the validity of the approximation of (1.5) by (1.10) in the small-mass regime. In Sect. 3, we address the construction of Lyapunov functions for (1.5) and prove Theorem 2.8. In Sect. 4, we detail a series of auxiliary results that we employ to prove the convergence of (1.5) toward (1.10). We also conclude Theorem 2.10 in this section. In Appendix A, we provide useful estimates on singular potentials that are exploited to establish the main results.

2 Assumptions and Main Results

Throughout, we let \((\Omega , \mathcal {F}, (\mathcal {F}_t)_{t\ge 0}, \mathbb {P})\) be a filtered probability space satisfying the usual conditions (Karatzas and Shreve 2012), and \((W_{i,j}(t))\), \(i=1,\dots , N\), \(j=0,\dots ,k_i\), be i.i.d. standard d-dimensional Brownian motions on \((\Omega , \mathcal {F},\mathbb {P})\) adapted to the filtration \((\mathcal {F}_t)_{t\ge 0}\).

In Sect. 2.1, we detail sufficient conditions on the nonlinearities U and G that we will employ throughout the analysis. We also formulate well-posedness in Proposition 2.7. In Sect. 2.2, we state the first main result, Theorem 2.8, giving the uniqueness of the invariant Gibbs measure \(\pi _N\) defined in (1.7), as well as the exponential convergence rate toward \(\pi _N\) in suitable Wasserstein distances. In Sect. 2.3, we provide our second main result, Theorem 2.10, concerning the validity of the small-mass limit of (1.5) on any finite-time window.

2.1 Main Assumptions

For notational convenience, we denote the inner product and the norm in \(\mathbb {R}^d\) by \(\langle \cdot ,\cdot \rangle \) and \(|\cdot |\), respectively. Concerning the potential U, we will impose the following condition (Ottobre and Pavliotis 2011; Pavliotis 2014).

Assumption 2.1

(i) \(U\in C^\infty (\mathbb {R}^d;[1,\infty ))\) satisfies

$$\begin{aligned} |U(x)|\le a_1(1+|x|^{\lambda +1}),\quad |\nabla U(x)|\le a_1(1+|x|^{\lambda }),\quad x\in \mathbb {R}^d, \end{aligned}$$
(2.1)

and

$$\begin{aligned} |\nabla ^2 U(x)|\le a_1(1+|x|^{\lambda -1}), \end{aligned}$$
(2.2)

for some constants \(a_1>0\) and \(\lambda \ge 1\).

(ii) Furthermore, there exist positive constants \(a_2,a_3\) such that

$$\begin{aligned} \langle \nabla U(x), x\rangle \ge a_2|x|^{\lambda +1}-a_3,\quad x\in \mathbb {R}^d. \end{aligned}$$
(2.3)

(iii) If \(\gamma =0\), then \(\lambda =1 \).

Remark 2.2

The first two conditions (i) and (ii) in Assumption 2.1 are quite standard and can be found in much of the previous literature on Langevin dynamics (Glatt-Holtz et al. 2020; Mattingly et al. 2002; Nguyen 2018). In particular, U(x) essentially behaves like \(|x|^{\lambda +1}\) at infinity. On the other hand, for the generalized Langevin counterpart, in the absence of the viscous drag, i.e., \(\gamma =0\) in the \(v_i\)-equation in (1.5), we have to impose condition (iii) (Ottobre and Pavliotis 2011), requiring U to be essentially a quadratic potential. We note, however, that this condition is not needed for well-posedness. Rather, it guarantees the existence of suitable Lyapunov functions, so as to ensure geometric ergodicity of (1.5). See the proofs of Lemmas 3.4 and 3.6 for a further discussion of this point.

Concerning the singular potential G, we will impose the following condition (Bolley et al. 2018; Herzog and Mattingly 2019; Lu and Mattingly 2019).

Assumption 2.3

(i) \(G\in C^\infty (\mathbb {R}^d{\setminus }\{0\};\mathbb {R})\) satisfies \(G(x)\rightarrow \infty \) as \(|x|\rightarrow 0\). Furthermore, there exists a constant \(\beta _1\ge 1\) such that for all \(x\in \mathbb {R}^d\setminus \{0\}\)

$$\begin{aligned} |G(x)|&\le a_1\Big (1+|x|+\frac{1}{|x|^{\beta _1}}\Big ) , \end{aligned}$$
(2.4)
$$\begin{aligned} |\nabla G(x)|&\le a_1\Big (1+\frac{1}{|x|^{\beta _1}}\Big ),\end{aligned}$$
(2.5)
$$\begin{aligned} \text {and} \quad |\nabla ^2 G(x)|&\le a_1\Big (1+\frac{1}{|x|^{\beta _1+1}}\Big ), \end{aligned}$$
(2.6)

where \(a_1\) is the constant as in Assumption 2.1.

(ii) There exist constants \(\beta _2\in [0,\beta _1)\) and \(a_4,\,a_5,\,a_6>0\) such that

$$\begin{aligned} \Big |\nabla G(x) +a_4\frac{x}{|x|^{\beta _1+1}}\Big | \le \frac{a_5}{|x|^{\beta _2}}+a_6, \quad x\in \mathbb {R}^d\setminus \{0\}. \end{aligned}$$
(2.7)

Remark 2.4

(i) Although (2.7) is slightly odd-looking, this condition essentially states that \(\nabla G\) can be expressed as:

$$\begin{aligned} \nabla G(x)= -c\frac{x}{|x|^{\beta _1+1}}+\text {lower-order terms}, \end{aligned}$$

hence the requirement \(\beta _2<\beta _1\). Also, \(\nabla G\) is of order \(|x|^{-\beta _1}\) as \(|x|\rightarrow 0\), i.e.,

$$\begin{aligned} \frac{c}{|x|^{\beta _1}}-C\le |\nabla G(x)|\le \frac{C}{|x|^{\beta _1}}+C, \end{aligned}$$

a two-sided bound which by itself need not imply (2.7). Condition (2.7) will be employed in the Lyapunov proofs in Sect. 3.

(ii) A routine calculation shows that both the Coulomb functions

$$\begin{aligned} G(x)={\left\{ \begin{array}{ll} -\log |x|,&{} d=2,\\ \frac{1}{|x|^{d-2}},&{} d\ge 3, \end{array}\right. } \end{aligned}$$
(2.8)

and the Lennard–Jones functions

$$\begin{aligned} G(x) = \frac{c_0}{|x|^{12}}-\frac{c_1}{|x|^6}, \end{aligned}$$
(2.9)

satisfy (2.7). In particular, in the case of (2.9), we have

$$\begin{aligned} \nabla G(x) = -12c_0\frac{x}{|x|^{14}}+6c_1\frac{x}{|x|^{8}}, \end{aligned}$$

which verifies (2.7) with \(\beta _1=13\) and \(\beta _2=7\). Another well-known example for the case \(\beta _1=1\) is the log function \(G(x)=-\log |x|\), whereas the case \(\beta _1>1\) includes the instance \(G(x)=|x|^{-\beta _1+1}\).

(iii) Without loss of generality, we may assume that

$$\begin{aligned} \sum _{i=1}^N U(x_i)+\!\!\!\sum _{1\le i<j\le N}\!\!\!G(x_i-x_j)\ge 0. \end{aligned}$$
(2.10)

Otherwise, we may replace U(x) by \(U(x)+c\) for some sufficiently large constant c, which does not affect (1.5).
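The identities in Remark 2.4(ii) admit a quick numerical confirmation (our illustration; the constants \(c_0,c_1\) are arbitrary): for the Lennard–Jones potential (2.9), the remainder \(\nabla G(x)+12c_0\,x/|x|^{14}\) has norm exactly \(6c_1/|x|^{7}\), matching (2.7) with \(a_4=12c_0\), \(\beta _1=13\), \(\beta _2=7\); for the Coulomb potential (2.8) with \(d=3\), \(\nabla G(x)=-x/|x|^{3}\), so (2.7) holds with \(a_4=1\), \(\beta _1=2\), and vanishing remainder.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))                 # sample points in R^3 \ {0}
r = np.linalg.norm(x, axis=1, keepdims=True)
keep = r[:, 0] > 0.2                           # avoid float cancellation very near 0
x, r = x[keep], r[keep]

# Lennard-Jones (2.9): grad G = -12 c0 x/|x|^14 + 6 c1 x/|x|^8 (c0, c1 arbitrary)
c0, c1 = 2.0, 3.0
gradG_lj = -12*c0*x/r**14 + 6*c1*x/r**8
rem_lj = np.linalg.norm(gradG_lj + 12*c0*x/r**14, axis=1)  # remainder in (2.7)

# Coulomb (2.8) in d = 3: G = 1/|x|, grad G = -x/|x|^3
gradG_cb = -x/r**3
rem_cb = np.linalg.norm(gradG_cb + x/r**3, axis=1)
```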

Under the above two assumptions, we are able to establish the geometric ergodicity of (1.5), cf. Theorem 2.8, in any dimension \(d\ge 1\). They are also sufficient for the purpose of investigating the small-mass limit when either \(d\ge 2\) or \(\beta _1>1\). On the other hand, when \(d=1\) and \(\beta _1=1\) (e.g., log potentials), we will impose the following extra condition on G.

Assumption 2.5

Let G and \(\beta _1\) be as in Assumption 2.3. In dimension \(d=1\), if \(\beta _1=1\), then

$$\begin{aligned} a_4> \frac{1}{2}, \end{aligned}$$
(2.11)

where \(a_4\) is the positive constant from condition (2.7).

Remark 2.6

As it turns out, in dimension \(d=1\), log potentials (\(\beta _1=1\)) induce further difficulty for the small-mass limit. To circumvent the issue, we have to impose Assumption 2.5, which requires that the repulsive force be strong enough so as to establish suitable energy estimates. See the proof of Lemma 4.3 for a detailed explanation of this point. It is also worth mentioning that the threshold \(a_4>1/2\) is a manifestation of Lemma A.2 and is perhaps far from optimal. Thus, the small-mass limit in the case \(d=1\) and \(\beta _1=1\), cf. Theorem 2.10, remains an open problem for all \(a_4\in (0,1/2]\).

Having introduced sufficient conditions on the potentials, we turn to the issue of well-posedness for (1.5). The domain on which the displacement process \(\textrm{x}_m(t)\) evolves is denoted by \(\mathcal {D}\) and is defined, following Bolley et al. (2018), Herzog and Mattingly (2019), Lu and Mattingly (2019), as

$$\begin{aligned} \mathcal {D}=\{\textrm{x}=(x_1,\dots ,x_N) \in (\mathbb {R}^d)^N: x_i\ne x_j \text { if } i\ne j\}. \end{aligned}$$
(2.12)

Then, we define the phase space for the solution of (1.5) as follows:

$$\begin{aligned} \textbf{X}=\mathcal {D}\times (\mathbb {R}^d)^N \times \prod _{i=1}^N (\mathbb {R}^d)^{k_i}. \end{aligned}$$
(2.13)

The first result of this paper is the following proposition, ensuring the existence and uniqueness of strong solutions to system (1.5).

Proposition 2.7

Under Assumptions 2.1 and 2.3, for every initial condition \(X_0=(\textrm{x}(0),\textrm{v}(0),\textrm{z}_{1}(0),\dots ,\textrm{z}_N(0))\in \textbf{X}\), system (1.5) admits a unique strong solution \(X_m(t;X_0)=\big (\textrm{x}_m(t),\textrm{v}_m(t),\textrm{z}_{1,m}(t),\dots ,\textrm{z}_{N,m}(t)\big )\in \textbf{X}\).

The proof of Proposition 2.7 is a consequence of the existence of the Lyapunov functions constructed below in Lemmas 3.5 and 3.6. The argument can be adapted from the well-posedness proof of Glatt-Holtz et al. (2020, Section 3) tailored to our settings. Alternatively, the result can also be proved by using the Hamiltonian (1.6) to establish suitable moment bounds. It should be noted, however, that Proposition 2.7 is itself highly nontrivial, as well-posedness does not follow automatically from the standard argument for SDEs (Khasminskii 2011) based on local Lipschitz continuity, owing to the presence of singular potentials. Nevertheless, the issue can be tackled by constructing appropriate Lyapunov functions.

As a consequence of the well-posedness, we can thus introduce the Markov transition probabilities of the solution \(X_m(t)\) by

$$\begin{aligned} P_t^m(X_0,A){:}{=}\mathbb {P}(X_m(t;X_0)\in A), \end{aligned}$$

which are well-defined for \(t\ge 0\), initial condition \(X_0\in \textbf{X}\) and Borel sets \(A\subset \textbf{X}\). Letting \(\mathcal {B}_b(\textbf{X})\) denote the set of bounded Borel measurable functions \(f:\textbf{X}\rightarrow \mathbb {R}\), the associated Markov semigroup \(P_t^m:\mathcal {B}_b(\textbf{X})\rightarrow \mathcal {B}_b(\textbf{X})\) is defined and denoted by

$$\begin{aligned} P_t^m f(X_0)=\mathbb {E}[f(X_m(t;X_0))], \,\, f\in \mathcal {B}_b(\textbf{X}). \end{aligned}$$

2.2 Geometric Ergodicity

We now turn to the topic of large-time properties of equation (1.5). Recall that a probability measure \(\mu \) on Borel subsets of \(\textbf{X}\) is called invariant for the semigroup \(P_t^m\) if for every \(f\in \mathcal {B}_b(\textbf{X})\)

$$\begin{aligned} \int _{\textbf{X}} f(X) (P_t^m)^*\mu (\text {d}X)=\int _{\textbf{X}} f(X)\mu (\text {d}X), \end{aligned}$$

where \((P_t^m)^*\mu \) is defined as in Hairer and Mattingly (2011)

$$\begin{aligned} (P_t^m)^*\mu (A) = \int _{\textbf{X}} P_t^m(X,A)\mu (\text {d}X), \end{aligned}$$

for all Borel sets \(A\subset \textbf{X}\). Next, we denote by \(\mathcal {L}_{m,\gamma }^N\) the generator associated with (1.5). One defines \(\mathcal {L}_{m,\gamma }^N\) for any \(\varphi \in C^2(\textbf{X};\mathbb {R})\) by

$$\begin{aligned} \mathcal {L}_{m,\gamma }^N\varphi&=\sum _{i=1}^N\Big [ \langle \partial _{x_i} \varphi , v_i\rangle +\frac{1}{m} \Big \langle \partial _{v_i} \varphi , -\gamma v_i-\nabla U(x_i) -\sum _{j\ne i} \nabla G(x_i-x_j) +\sum _{\ell =1}^{k_i}\lambda _{i,\ell } z_{i,\ell }\Big \rangle \nonumber \\&\quad +\frac{\gamma }{m^2} \triangle _{v_i} \varphi + \sum _{\ell =1}^{k_i}\langle \partial _{z_{i,\ell }} \varphi , -\alpha _{i,\ell } z_{i,\ell }-\lambda _{i,\ell } v_i\rangle +\sum _{\ell =1}^{k_i} \alpha _{i,\ell }\triangle _{z_{i,\ell }}\varphi \Big ]. \end{aligned}$$
(2.14)

In order to show that \(\pi _N(X)\text {d}X\) defined in (1.7) is invariant for (1.5), it suffices to show that \(\pi _N(X)\) is a solution of the stationary Fokker–Planck equation

$$\begin{aligned} (\mathcal {L}_{m,\gamma }^N)^*\pi _N(X)=0, \end{aligned}$$
(2.15)

where \((\mathcal {L}_{m,\gamma }^N)^*\) is the dual of \(\mathcal {L}_{m,\gamma }^N\), i.e.,

$$\begin{aligned} \int _{\textbf{X}} \mathcal {L}_{m,\gamma }^Nf_1(X)\cdot f_2(X) \text {d}X = \int _{\textbf{X}} f_1(X)\cdot (\mathcal {L}_{m,\gamma }^N)^*f_2(X)\text {d}X, \end{aligned}$$

for any \(f_1,f_2\in C^2_c(\textbf{X};\mathbb {R})\). In the absence of the singular potential G, this approach was previously employed in Pavliotis (2014) for a finite-dimensional GLE and in Glatt-Holtz et al. (2020) for an infinite-dimensional GLE. With regard to (2.15), we may simply adapt the proof of Pavliotis (2014, Proposition 8.2) to our setting with the appearance of G.
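The stationarity computation can be spot-checked symbolically. The sketch below (ours, not from the paper) takes \(N=2\), \(d=1\), \(k_1=k_2=1\), equal coefficients \(\lambda ,\alpha \) for both particles, and a radial (even) pair interaction written as \(G\big ((x_1-x_2)^2\big )\) for a generic function G; it verifies that the stationary Fokker–Planck operator dual to (2.14) annihilates the Gibbs density \(e^{-H_N}\), i.e., (2.15):

```python
import sympy as sp

x1, x2, v1, v2, z1, z2 = sp.symbols('x1 x2 v1 v2 z1 z2', real=True)
m, gam, lam, alp = sp.symbols('m gamma lambda alpha', positive=True)
U, G = sp.Function('U'), sp.Function('G')

# Hamiltonian (1.6) with m = 1 absorbed explicitly and an even pair potential
H = m*(v1**2 + v2**2)/2 + U(x1) + U(x2) + G((x1 - x2)**2) + (z1**2 + z2**2)/2
rho = sp.exp(-H)

def drift_v(x, v, z, pair_force):
    # b_{v_i} = (1/m)(-gamma v_i - U'(x_i) + pair force on i + lambda z_i)
    return (-gam*v - sp.diff(U(x), x) + pair_force + lam*z)/m

b_v1 = drift_v(x1, v1, z1, -sp.diff(G((x1 - x2)**2), x1))
b_v2 = drift_v(x2, v2, z2, -sp.diff(G((x1 - x2)**2), x2))

# Stationary Fokker-Planck operator applied to rho, cf. (2.14)-(2.15)
FP = (
    -sp.diff(v1*rho, x1) - sp.diff(v2*rho, x2)
    - sp.diff(b_v1*rho, v1) - sp.diff(b_v2*rho, v2)
    + gam/m**2*(sp.diff(rho, v1, 2) + sp.diff(rho, v2, 2))
    - sp.diff((-alp*z1 - lam*v1)*rho, z1) - sp.diff((-alp*z2 - lam*v2)*rho, z2)
    + alp*(sp.diff(rho, z1, 2) + sp.diff(rho, z2, 2))
)
residual = sp.simplify(sp.expand(FP/rho))   # should vanish identically
```

The evenness of the pair interaction is what matches the force \(-\nabla G(x_i-x_j)\) in (1.5) with the gradient of the Hamiltonian; this is automatic for radial potentials such as (2.8) and (2.9).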

Concerning the unique ergodicity of \(\pi _N\), we will work with suitable Wasserstein distances, which allow for a convenient measurement of the convergence rate toward equilibrium. For a measurable function \(V:\textbf{X}\rightarrow (0,\infty )\), we introduce the following weighted supremum norm

$$\begin{aligned} \Vert \varphi \Vert _V{:}{=}\sup _{X\in \textbf{X}}\frac{|\varphi (X)|}{1+V(X)}. \end{aligned}$$

We denote by \(\mathcal {M}_V\) the collection of probability measures \(\mu \) on Borel subsets of \(\textbf{X}\) such that

$$\begin{aligned} \int _{\textbf{X}}V(X)\mu (\text {d}X)<\infty . \end{aligned}$$

Let \(\mathcal {W}_V\) be the corresponding weighted total variation distance in \(\mathcal {M}_V\) associated with \(\Vert \cdot \Vert _V\), given by

$$\begin{aligned} \mathcal {W}_V(\mu _1,\mu _2)=\sup _{\Vert \varphi \Vert _V\le 1}\Big |\int _{\textbf{X}} \varphi (X)\mu _1(\text {d}X)-\int _{\textbf{X}} \varphi (X)\mu _2(\text {d}X)\Big |. \end{aligned}$$

We remark that \(\mathcal {W}_V\) is a Wasserstein distance. Indeed, by the dual Kantorovich theorem

$$\begin{aligned} \mathcal {W}_V(\mu _1,\mu _2)= \inf \mathbb {E}\Big [\big ( (1+V(X_1))+(1+V(X_2))\big )\varvec{1}\{X_1\ne X_2\}\Big ], \end{aligned}$$

where the infimum runs over all pairs of random variables \((X_1,X_2)\) such that \(X_1\sim \mu _1\) and \(X_2\sim \mu _2\). We refer the reader to the monograph (Villani 2021) for a detailed account of Wasserstein distances and optimal transport problems. With this setup, we now state the first main result of the paper, establishing the unique ergodicity of \(\pi _N\) defined in (1.7) as well as the exponential convergence rate toward \(\pi _N\).

Theorem 2.8

Under Assumptions 2.1 and 2.3, for every \(m>0\) and \(\gamma \ge 0\), the probability measure \(\pi _N\) defined in (1.7) is the unique invariant probability measure for (1.5). Furthermore, there exists a function \(V\in C^2(\textbf{X};[1,\infty ))\) such that for all \(\mu \in \mathcal {M}_V\),

$$\begin{aligned} \mathcal {W}_V\big ((P_t^m)^*\mu , \pi _N\big )\le Ce^{-ct}\mathcal {W}_V(\mu ,\pi _N),\quad t\ge 0, \end{aligned}$$
(2.16)

for some positive constants c and C independent of \(\mu \) and t.

In order to establish Theorem 2.8, we will draw upon the framework of Bolley et al. (2018), Hairer and Mattingly (2011), Herzog and Mattingly (2019), Lu and Mattingly (2019), Mattingly et al. (2002), Meyn and Tweedie (2012), tailored to our settings. The argument relies on two crucial ingredients: a suitable Lyapunov function, cf. Definition 3.1, and a minorization condition, cf. Definition 3.2. Since it is not difficult to see that system (1.5) is hypoelliptic (Ottobre and Pavliotis 2011; Pavliotis 2014), we may employ a relatively standard argument (Mattingly et al. 2002; Pavliotis 2014) to establish the minorization. On the other hand, constructing Lyapunov functions is quite involved, requiring a deeper understanding of the dynamics in the presence of the singular potentials. In particular, while the construction in the case \(\gamma >0\) can be adapted from those in the previous works of Herzog and Mattingly (2019), Lu and Mattingly (2019), the absence of the viscous drag (\(\gamma =0\)) induces further difficulty owing to the interaction between the singular potentials, the velocity \(\textrm{v}\), and the auxiliary variables \(\{\textrm{z}_i\}_{i=1,\dots ,N}\). To overcome this issue, we will follow Athreya et al. (2012), Cooke et al. (2017), Herzog and Mattingly (2015a, b, 2019), Lu and Mattingly (2019) and perform an asymptotic scaling to determine the leading-order terms in the dynamics at large energy states. All of this will be carried out in Sect. 3. The proof of Theorem 2.8 will be given in Sect. 3.3.

2.3 Small-Mass Limit

We now consider the small-mass limit and rigorously compare the solution of (1.5) with that of (1.10) as \(m\rightarrow 0\).

As mentioned in Sect. 1.2, the derivation of (1.10) follows the framework of Cerrai et al. (2020), Herzog et al. (2016), Lim and Wehr (2019), Lim et al. (2020) for finite-dimensional second-order systems as well as Nguyen (2018), Shi and Wang (2021) for infinite-dimensional systems. The trick employed involves an integration by parts on the z-equation of (1.5), so as to decouple the velocity from the other processes. As a result, extra drift terms appear on the right-hand side of the q-equation in (1.10). Also, the initial conditions of (1.10) are closely related to those of (1.5). For the sake of clarity, we defer the detailed explanation to Sect. 4.1.

Due to the presence of singular potentials, the first issue arising from (1.10) is its well-posedness. To address this, we introduce the new phase space for the solutions of (1.10) defined as

$$\begin{aligned} \textbf{Q}=\mathcal {D}\times \prod _{i=1}^{N} (\mathbb {R}^d)^{k_i}, \end{aligned}$$

where we recall \(\mathcal {D}\) being the state space for the displacement as in (2.12). The existence and uniqueness of a strong solution in \(\textbf{Q}\) for (1.10) are guaranteed in the following auxiliary result.

Proposition 2.9

Under Assumptions 2.1 and 2.3, for every initial condition \(Q_0=(\textrm{q}(0)\), \(\textrm{f}_{1}(0)\),\(\dots \),\(\textrm{f}_N(0))\in \textbf{Q}\), system (1.10) admits a unique strong solution \(Q(t;Q_0)=\big (\textrm{q}(t)\), \(\textrm{f}_{1}(t)\),..., \(\textrm{f}_{N}(t)\big )\in \textbf{Q}\).

Similar to the proof of Proposition 2.7, the argument of Proposition 2.9 follows from the energy estimate established in Lemma 4.3 in Sect. 4.3. In turn, the result in Lemma 4.3 relies on a suitable Lyapunov function specifically designed for (1.10). It will also be employed to study the small-mass limit.

We now state the second main result of the paper giving the validity of the approximation of (1.5) by (1.10) in the small-mass regime.

Theorem 2.10

Suppose that Assumptions 2.1, 2.3 and 2.5 hold. For every \((\textrm{x}(0)\), \(\textrm{v}(0)\), \(\textrm{z}_{1}(0)\),..., \(\textrm{z}_N(0))\in \textbf{X}\), let \(X_m(t)=\big (\textrm{x}_m(t)\), \(\textrm{v}_m(t)\), \(\textrm{z}_{1,m}(t)\),..., \(\textrm{z}_{N,m}(t)\big )\) be the solution of (1.5) and let \(Q(t)=\big (\textrm{q}(t)\), \(\textrm{f}_{1}(t)\),\(\dots \), \(\textrm{f}_{N}(t)\big )\) be the solution of (1.10) with the following initial condition

$$\begin{aligned} q_i(0) = x_i(0),\quad f_{i,\ell }(0)=z_{i,\ell }(0)+\lambda _{i,\ell } x_i(0),\quad i=1,\dots ,N, \quad \ell =1,\dots ,k_i. \end{aligned}$$
(2.17)

Then, for every \(T,\,\xi >0\), it holds that

$$\begin{aligned} \mathbb {P}\Big \{\sup _{t\in [0,T]}|\textrm{x}_m(t)-\textrm{q}(t)|>\xi \Big \}\rightarrow 0,\quad m\rightarrow 0. \end{aligned}$$
(2.18)

In order to establish Theorem 2.10, we will adapt the approach of Herzog et al. (2016), Nguyen (2018) to our settings. The argument can be summarized as follows: we first truncate the nonlinearities in (1.5) and (1.10) while making use of conditions (2.2) and (2.6). This results in Lipschitz systems, thus allowing us to prove the small-mass limit of the truncated systems. We then exploit the moment estimates on (1.10), cf. Lemma 4.3, to remove the Lipschitz assumption. The explicit argument will be carried out in a series of results in Sect. 4.

Finally, we remark that the convergence in (2.18) only holds in probability, but not in \(L^p\). Following Higham et al. (2002, Theorem 2.2), the latter is a consequence of an \(L^p\) estimate on \(\textrm{x}_m(t)\) that is uniform with respect to the mass, i.e.,

$$\begin{aligned} \limsup _{m\rightarrow 0}\,\mathbb {E}\Big [\sup _{t\in [0,T]}|\textrm{x}_m(t)|^p\Big ]<\infty . \end{aligned}$$
(2.19)

Due to the singularities, (2.19) is not available in this work and would require further insight. Therefore, the small-mass convergence in \(L^p\) remains an open problem.

3 Geometric Ergodicity

Throughout the rest of the paper, c and C denote generic positive constants that may change from line to line. The main parameters that they depend on will appear in parentheses, e.g., c(T,q) is a function of T and q. In this section, since we do not take \(m\rightarrow 0\), we will drop the subscript m in \(\textrm{x}_m,\textrm{v}_m\) and elsewhere for notational convenience.

In this section, we establish the unique ergodicity and the exponential convergence rate toward the Gibbs measure \(\pi _N\) defined in (1.7) for (1.5). The argument will rely on the construction of suitable Lyapunov functions while making use of a standard irreducibility condition. For the reader’s convenience, we first recall the definitions of these notions below.

Definition 3.1

A function \(V\in C^2(\textbf{X};[1,\infty ))\) is called a Lyapunov function for (1.5) if the following hold:

(i) \(V(X)\rightarrow \infty \) whenever \(|X|+\sum _{1\le i<j\le N}|x_i-x_j|^{-1}\rightarrow \infty \) in \(\textbf{X}\); and

(ii) for all \(X\in \textbf{X}\),

$$\begin{aligned} \mathcal {L}_{m,\gamma }^N V(X)\le -c V(X)+D, \end{aligned}$$
(3.1)

for some constants \(c>0\) and \(D\ge 0\) independent of X.

Definition 3.2

Let V be a Lyapunov function as in Definition 3.1. Denote

$$\begin{aligned} \textbf{X}_R =\big \{X\in \textbf{X}:V(X)\le R\big \}. \end{aligned}$$

The system (1.5) is said to satisfy a minorization condition if for all R sufficiently large, there exist positive constants \(t_R,\,c_R\) and a probability measure \(\nu _R\) such that \(\nu _R(\textbf{X}_R)=1\) and for every \(X\in \textbf{X}_R\) and any Borel set \(A\subset \textbf{X}\),

$$\begin{aligned} P_{t_R}^m(X,A)\ge c_R\nu _R(A). \end{aligned}$$
(3.2)

In Sect. 3.1, we provide a heuristic argument for the construction of a Lyapunov function in the instance of the single-particle GLE. In particular, we show that our construction works well in this simpler setting, both in the case \(\gamma >0\) and \(\gamma =0\), through Lemmas 3.3 and 3.4, respectively. In Sect. 3.2, we adapt the construction from the single-particle case to the general N-particle system (1.5), and establish a suitable Lyapunov function so as to achieve the desired dissipative estimates. Together with an auxiliary result on minorization, cf. Lemma 3.7, we will conclude the proof of Theorem 2.8 in Sect. 3.3.

3.1 Single-Particle GLE

Following the framework developed in Herzog and Mattingly (2019), Lu and Mattingly (2019), we will build up intuition for the construction of Lyapunov functions for the full system (1.5) by investigating a simpler equation in the absence of interacting forces. More specifically, we introduce the following single-particle GLE (\(N=1\))

$$\begin{aligned} \text {d}\, x(t)&= v(t)\text {d}t, \nonumber \\ m\,\text {d}\, v(t)&= -\gamma v(t) \text {d}t -\nabla U(x(t))\text {d}t - \nabla G(x(t)) \text {d}t +\sum _{i=1}^k \lambda _i z_i(t)\text {d}t +\sqrt{2\gamma } \,\text {d}W_0(t),\nonumber \\ \text {d}\, z_i(t)&= -\alpha _i z_i(t)\text {d}t-\lambda _i v(t)\text {d}t+\sqrt{2\alpha _i}\,\text {d}W_i(t),\quad i=1,\dots ,k. \end{aligned}$$
(3.3)

Let \(\mathcal {L}_{m,\gamma }\) be the generator associated with (3.3). That is, for \(\gamma \ge 0\),

$$\begin{aligned} \mathcal {L}_{m,\gamma }\varphi&= \langle \partial _x \varphi , v\rangle + \frac{1}{m}\bigg \langle \partial _v \varphi , -\gamma v-\nabla U(x) -\nabla G(x)+\sum _{i=1}^k\lambda _i z_i\bigg \rangle \nonumber \\&\qquad +\frac{\gamma }{m^2} \triangle _v \varphi + \sum _{i=1}^k\langle \partial _{z_i} \varphi , -\alpha _i z_i-\lambda _i v\rangle +\sum _{i=1}^k \alpha _i\triangle _{z_i}\varphi , \end{aligned}$$
(3.4)

where \(\varphi =\varphi (x,v,z_1,\ldots , z_k)\in C^2(\mathbb {R}^{(k+2)d})\). Denote by H the Hamiltonian

$$\begin{aligned} H\big (x,v,z_1,\dots ,z_k\big ) = U(x)+G(x)+ \frac{1}{2}m|v|^2+\frac{1}{2}\sum _{i=1}^k |z_i|^2. \end{aligned}$$
(3.5)
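The dissipative role of H can already be seen at the level of the noiseless dynamics: stripping the noise from (3.3), one has \(\frac{\text {d}}{\text {d}t}H=-\gamma |v|^2-\sum _{i=1}^k\alpha _i|z_i|^2\le 0\) along trajectories, cf. (3.11). The following minimal Euler sketch illustrates this in \(d=1\), \(k=1\); the potentials \(U(x)=x^4/4\) and \(G(x)=1/x\) are illustrative choices only.

```python
# Noiseless Euler sketch of the single-particle GLE (3.3) in d = 1, k = 1:
# along deterministic trajectories dH/dt = -gamma*v^2 - alpha_1*z_1^2 <= 0,
# so the Hamiltonian H of (3.5) is non-increasing. The potentials
# U(x) = x**4/4 and G(x) = 1/x are illustrative choices only.
m, gamma, lam1, alp1 = 1.0, 0.5, 0.7, 0.4
U = lambda s: s**4/4
dU = lambda s: s**3
G = lambda s: 1/s
dG = lambda s: -1/s**2

def H(x, v, z1):
    return U(x) + G(x) + 0.5*m*v**2 + 0.5*z1**2

x, v, z1, dt = 1.0, 0.5, 0.3, 1e-4
H0 = H(x, v, z1)
for _ in range(20000):                 # integrate up to time t = 2
    x, v, z1 = (x + dt*v,
                v + dt*(-gamma*v - dU(x) - dG(x) + lam1*z1)/m,
                z1 + dt*(-alp1*z1 - lam1*v))
assert x > 0                           # the singular repulsion keeps x > 0
assert H(x, v, z1) < H0                # friction and memory drain energy
```

With the noise terms restored, H is of course no longer monotone; instead, the identity (3.11) below quantifies the balance between dissipation and fluctuation.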

Note that \(\mathcal {L}_{m,\gamma }\varphi \) can be written as

$$\begin{aligned} \mathcal {L}_{m,\gamma } \varphi =J\nabla H\cdot \nabla \varphi -\sigma \nabla H\cdot \nabla \varphi +\textrm{div}(\sigma \nabla \varphi ), \end{aligned}$$
(3.6)

where

$$\begin{aligned} \nabla H&=\begin{pmatrix} \partial _x H\\ \partial _v H\\ \partial _{z_1}H\\ \vdots \\ \partial _{z_k}H \end{pmatrix}=\begin{pmatrix} \nabla U(x)+\nabla G(x)\\ mv\\ z_1\\ \vdots \\ z_k \end{pmatrix},\quad \nabla \varphi =\begin{pmatrix} \partial _x \varphi \\ \partial _v \varphi \\ \partial _{z_1}\varphi \\ \vdots \\ \partial _{z_k}\varphi \end{pmatrix}, \\ J&{:}{=} \frac{1}{m}\begin{pmatrix} 0&{}I &{}0&{}\cdots &{}0\\ -I&{}0&{}\lambda _1&{}\cdots &{}\lambda _k\\ 0&{}-\lambda _1&{}0&{}\cdots &{}0\\ \vdots &{}\vdots &{}\vdots &{}\ddots &{}\vdots \\ 0&{}-\lambda _k&{}0&{}\cdots &{}0 \end{pmatrix},\quad \text {and}\quad \sigma =\sigma _\gamma {:}{=}\begin{pmatrix} 0&{}0&{}0&{}\cdots &{}0\\ 0&{}\frac{\gamma }{m^2}&{}0&{}\cdots &{}0\\ 0&{}0&{}\alpha _1&{}\cdots &{}0\\ \vdots &{}\vdots &{}\vdots &{}\ddots &{}\vdots \\ 0&{}0&{}0&{}\cdots &{}\alpha _k \end{pmatrix}. \end{aligned}$$

We notice that J is anti-symmetric while \(\sigma \) is positive semi-definite. This formulation will be more convenient in the subsequent computations. In particular, if \(\varphi \) is linear in v and z then the last term in (3.6) vanishes.
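This structure can be checked concretely. The sketch below (\(d=1\), \(k=2\), with illustrative choices \(U(x)=x^4/4\) and \(G(x)=1/x\)) applies the generator (3.4) to H at a single point, confirms the closed form of \(\mathcal {L}_{m,\gamma }H\) appearing in (3.11), and verifies that the Hamiltonian part \(J\nabla H\cdot \nabla H\) vanishes by anti-symmetry.

```python
# Sanity check of (3.4)-(3.6) in dimension d = 1 with k = 2 memory variables.
# Apply the generator (3.4) to H of (3.5) at one point and compare with the
# dissipative closed form; also confirm that J grad H . grad H = 0.
# U(x) = x**4/4 and G(x) = 1/x (x > 0) are illustrative choices only.
m, gamma = 2.0, 0.5
lam, alp = [0.7, 1.3], [0.4, 0.9]
x, v, z = 0.8, -1.1, [0.3, -0.6]
dU = lambda s: s**3
dG = lambda s: -1/s**2

gradH = [dU(x) + dG(x), m*v, z[0], z[1]]   # (d_x H, d_v H, d_z1 H, d_z2 H)

# generator (3.4) applied to H: drift . grad H + diffusion terms
LH = (gradH[0]*v
      + (1/m)*(-gamma*v - dU(x) - dG(x) + lam[0]*z[0] + lam[1]*z[1])*gradH[1]
      + (gamma/m**2)*m                     # (gamma/m^2) Lap_v H, Lap_v H = m
      + (-alp[0]*z[0] - lam[0]*v)*z[0] + (-alp[1]*z[1] - lam[1]*v)*z[1]
      + alp[0] + alp[1])                   # alpha_i Lap_{z_i} H = alpha_i
closed = (-gamma*v**2 - alp[0]*z[0]**2 - alp[1]*z[1]**2
          + gamma/m + alp[0] + alp[1])
assert abs(LH - closed) < 1e-9

# anti-symmetry of J kills the Hamiltonian part of (3.6)
J = [[0, 1/m, 0, 0],
     [-1/m, 0, lam[0]/m, lam[1]/m],
     [0, -lam[0]/m, 0, 0],
     [0, -lam[1]/m, 0, 0]]
Jg = [sum(J[i][j]*gradH[j] for j in range(4)) for i in range(4)]
assert abs(sum(Jg[i]*gradH[i] for i in range(4))) < 1e-12
```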

Now, there are two cases to be considered depending on the value of the viscous constant \(\gamma \); the first is \(\gamma >0\).

3.1.1 Positive Viscous Constant \(\gamma >0\)

In this case, we observe that system (3.3) is almost the same as the following single-particle Langevin equation without the auxiliary memory variables

$$\begin{aligned} \text {d}\, x(t)&= v(t)\text {d}t, \nonumber \\ m\,\text {d}\, v(t)&= -\gamma v(t) \text {d}t -\nabla U(x(t))\text {d}t - \nabla G(x(t)) \text {d}t+\sqrt{2\gamma } \,\text {d}W_0(t). \end{aligned}$$
(3.7)

Before discussing the Lyapunov construction for (3.3), it is illuminating to recapitulate the heuristic argument for (3.7) from Herzog and Mattingly (2019), Lu and Mattingly (2019), Mattingly et al. (2002). As a first ansatz, one can employ the Hamiltonian of (3.7) given by

$$\begin{aligned} \tilde{H}(x,v)=\frac{1}{2}m|v|^2+U(x)+G(x). \end{aligned}$$

Denoting by \(\tilde{\mathcal {L}}\) the generator associated with (3.7), i.e.,

$$\begin{aligned} \tilde{\mathcal {L}}\varphi = \langle \partial _x \varphi , v\rangle + \frac{1}{m}\big \langle \partial _v \varphi , -\gamma v-\nabla U(x) -\nabla G(x)\big \rangle +\frac{\gamma }{m^2} \triangle _v \varphi , \end{aligned}$$
(3.8)

a routine computation shows that

$$\begin{aligned} \tilde{\mathcal {L}}\tilde{H}\le -c|v|^2+D. \end{aligned}$$

While this is sufficient for the well-posedness of (3.7), it does not produce a dissipative effect in x as \(|x|\rightarrow \infty \) or \(|x|\rightarrow 0\), so as to establish geometric ergodicity. In the absence of the singular potential G, one can exploit the trick in Mattingly et al. (2002) by considering a perturbation of the form

$$\begin{aligned} \tilde{H}(x,v)+\varepsilon \langle x,v\rangle , \end{aligned}$$
(3.9)

where \(\varepsilon \) is sufficiently small. Applying \(\tilde{\mathcal {L}}\) to the above function while making use of condition (2.3), it is not difficult to see that

$$\begin{aligned} \tilde{\mathcal {L}}(\tilde{H}(x,v)+\varepsilon \langle x,v\rangle ) \le -c (|v|^2+U(x))+C. \end{aligned}$$

On the other hand, in the work of Herzog and Mattingly (2019), the following function was introduced

$$\begin{aligned} \exp \big \{b(\tilde{H}(x,v)+\psi (x,v))\big \}, \end{aligned}$$

where the perturbation \(\psi (x,v)\) satisfies

$$\begin{aligned} \tilde{\mathcal {L}}\psi (x,v)\le -C, \end{aligned}$$

for a sufficiently large constant C. In order to construct \(\psi \), it is crucial to determine the leading order terms in \(\tilde{\mathcal {L}}\) when \(U(x)+G(x)\) is large while v is being fixed. Following the idea previously presented in Herzog and Mattingly (2019, Section 3.2), in this situation, \(\tilde{\mathcal {L}}\) is approximated by

$$\begin{aligned} \tilde{\mathcal {L}}\approx \mathcal {A}_1=-\nabla (U(x)+ G(x))\cdot \nabla _v. \end{aligned}$$

In turn, this suggests \(\psi \) satisfies

$$\begin{aligned} \mathcal {A}_1\psi \le -C. \end{aligned}$$

A candidate satisfying the above inequality is given by (Herzog and Mattingly 2019)

$$\begin{aligned} \psi (x,v)\propto \frac{\langle v,\nabla (U(x)+G(x))\rangle }{|\nabla (U(x)+G(x))|^2}. \end{aligned}$$

We refer the reader to Herzog and Mattingly (2019) for the derivation of \(\psi \) in more detail. While this choice of \(\psi \) works well for Lennard–Jones and Riesz potentials, it is not applicable to the class of logarithmic potentials, of which the Coulomb potential in dimension \(d=2\) is a well-known example.

With regard to the specific instance of Coulomb potentials (2.8), in the work of Lu and Mattingly (2019), the authors tackle the Lyapunov issue by employing the scaling transformation

$$\begin{aligned} x = \kappa ^{-1}\hat{x},\quad v=\hat{v}, \end{aligned}$$

for \(\kappa >0\). In dimension \(d=2\) (with \(G(x)=-\log |x|\)), \(\tilde{\mathcal {L}}\) as in (3.8) can be recast as

$$\begin{aligned} \tilde{\mathcal {L}}&= v\cdot \nabla _x +\frac{1}{m}\Big (-\gamma v-\nabla U(x)+\frac{x}{|x|^2} \Big )\cdot \nabla _v+\frac{\gamma }{m^2}\triangle _{v} \\&= \kappa \,\hat{v}\cdot \nabla _{\hat{x}} +\frac{1}{m}\Big (-\gamma \hat{v}-\nabla U(\kappa ^{-1}\hat{x})+\kappa \frac{\hat{x}}{|\hat{x}|^2} \Big )\cdot \nabla _{\hat{v}}+\frac{\gamma }{m^2}\triangle _{\hat{v}}. \end{aligned}$$

Recalling condition (2.1), since \(|\nabla U(\kappa ^{-1} \hat{x})| \approx \kappa ^{-\lambda }|\hat{x}|^{\lambda }\), which is negligible as \(\kappa \) is large, we observe that

$$\begin{aligned} \tilde{\mathcal {L}}&\approx \kappa \,\hat{v}\cdot \nabla _{\hat{x}}+\frac{\kappa }{m}\frac{\hat{x}}{|\hat{x}|^2}\cdot \nabla _{\hat{v}},\quad \kappa \rightarrow \infty . \end{aligned}$$

Likewise, in dimension \(d\ge 3\) (with \(G(x)=|x|^{2-d}\)),

$$\begin{aligned} \tilde{\mathcal {L}}&= \kappa \,\hat{v}\cdot \nabla _{\hat{x}} +\frac{1}{m}\Big (-\gamma \hat{v}-\nabla U(\kappa ^{-1}\hat{x})+2\kappa ^{d-1} \frac{\hat{x}}{|\hat{x}|^d} \Big )\cdot \nabla _{\hat{v}}+\frac{\gamma }{m^2}\triangle _{\hat{v}} \\&\approx \frac{2}{m}\kappa ^{d-1}\frac{\hat{x}}{|\hat{x}|^d}\cdot \nabla _{\hat{v}}, \quad \kappa \rightarrow \infty . \end{aligned}$$

So, taking \(\kappa \) to infinity (i.e., taking \(|x|\rightarrow 0\)) indicates

$$\begin{aligned} \tilde{\mathcal {L}}\approx \mathcal {A}_2= {\left\{ \begin{array}{ll} v\cdot \nabla _x+\frac{x}{|x|^2}\cdot \nabla _v,&{}d=2,\\ \frac{x}{|x|^d}\cdot \nabla _v,&{} d\ge 3. \end{array}\right. } \end{aligned}$$

A typical choice of \(\psi \) satisfying \(\mathcal {A}_2\psi \le -C\) is given by

$$\begin{aligned} \psi (x,v)\propto -\frac{\langle x,v\rangle }{|x|}. \end{aligned}$$
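One can confirm numerically that this choice does the job, say in dimension d = 2 with the proportionality constant set to 1: the first term of \(\mathcal {A}_2\psi \) is non-positive by Cauchy–Schwarz, and the second equals \(-1/|x|\le -1\) on the region \(|x|\le 1\). A minimal sketch:

```python
import math, random

# Numerical sanity check in d = 2, with the proportionality constant set
# to 1: psi(x, v) = -<x, v>/|x| satisfies A_2 psi <= -1/|x| <= -1 on the
# region |x| <= 1, since v.grad_x psi <= 0 by Cauchy-Schwarz and
# (x/|x|^2).grad_v psi = -1/|x|. Gradients are evaluated in closed form.
random.seed(1)

def A2_psi(x, v):
    r = math.hypot(x[0], x[1])
    xv = x[0]*v[0] + x[1]*v[1]
    vv = v[0]**2 + v[1]**2
    v_dot_gradx = -vv/r + xv**2/r**3      # v . grad_x psi, <= 0
    drift_dot_gradv = -1.0/r              # (x/|x|^2) . grad_v psi
    return v_dot_gradx + drift_dot_gradv

for _ in range(500):                      # sample the singular regime |x| <= 1
    t = random.uniform(0, 2*math.pi)
    r = random.uniform(1e-3, 1.0)
    x = (r*math.cos(t), r*math.sin(t))
    v = (random.uniform(-5, 5), random.uniform(-5, 5))
    assert A2_psi(x, v) <= -1.0
```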

Together with (3.9), we deduce that a Lyapunov function V for (3.7) with Coulomb potentials has the following form (Lu and Mattingly 2019):

$$\begin{aligned} V\propto \tilde{H}(x,v)+\varepsilon _1\langle x,v\rangle -\varepsilon _2\frac{\langle x,v\rangle }{|x|}, \end{aligned}$$

for some positive constants \(\varepsilon _1\) and \(\varepsilon _2\) sufficiently small.

Turning back to (3.3) in the case \(\gamma >0\), motivated by the above discussion, for \(\varepsilon >0\), we introduce the function \(V_1\) given by

$$\begin{aligned} V_1\big (x,v,z_1,\dots ,z_k\big ){:}{=} H\big (x,v,z_1,\dots ,z_k\big )+m\varepsilon \langle x,v\rangle -m\varepsilon \frac{\langle x,v\rangle }{|x|}, \end{aligned}$$
(3.10)

where H is defined in (3.5). Since (3.3) only differs from (3.7) by the appearance of the linear memory variables \(z_i\)’s, it turns out that \(V_1\) is indeed a Lyapunov function for (3.3). This is summarized in the following lemma.

Lemma 3.3

Under Assumptions 2.1 and 2.3, let \(V_1\) be defined as in (3.10). For each \(\gamma >0\) and \(m>0\), there exists a positive constant \(\varepsilon \) sufficiently small such that \(V_1\) is a Lyapunov function for (3.3).

Proof

Letting \(\mathcal {L}_{m,\gamma }\) and H, respectively, be defined in (3.4) and (3.5), Itô’s formula yields

$$\begin{aligned} \mathcal {L}_{m,\gamma } H&=-\sigma \nabla H\cdot \nabla H+\textrm{div}(\sigma \nabla H)\nonumber \\&=-\gamma |v|^2- \sum _{i=1}^k \alpha _i|z_i|^2+\frac{d\gamma }{m}+d\sum _{i=1}^k \alpha _i. \end{aligned}$$
(3.11)

Similarly, we compute

$$\begin{aligned} \mathcal {L}_{m,\gamma } \big (m\langle x,v\rangle \big )= m|v|^2-\gamma \langle x,v\rangle -\langle \nabla U(x)+\nabla G(x),x\rangle +\sum _{i=1}^k\lambda _i\langle z_i,x\rangle . \end{aligned}$$
(3.12)

To estimate the right-hand side above, we recall from condition (2.3) that

$$\begin{aligned} -\langle \nabla U(x),x\rangle \le -a_2|x|^{\lambda +1}+a_3. \end{aligned}$$

With regard to \(\nabla G\), condition (2.5) implies that

$$\begin{aligned} |\langle \nabla G(x),x\rangle |\le \frac{a_1}{|x|^{\beta _1-1}}+a_1|x|. \end{aligned}$$

Concerning the cross terms \(\langle x,v\rangle \) and \(\langle z_i,x\rangle \), for \(\varepsilon \in (0,1)\), we employ the Cauchy–Schwarz inequality to deduce

$$\begin{aligned} -\gamma \varepsilon \langle x,v\rangle +\varepsilon \sum _{i=1}^k\lambda _i\langle z_i,x\rangle \le \gamma \varepsilon ^{1/2}|v|^2+\varepsilon ^{1/2} \sum _{i=1}^k\lambda _i^2|z_i|^2 + (k+\gamma )\varepsilon ^{3/2}|x|^2. \end{aligned}$$

Together with (3.12), we find

$$\begin{aligned} \mathcal {L}_{m,\gamma } \big (\varepsilon m\langle x,v\rangle \big )&\le (\varepsilon m+\gamma \varepsilon ^{1/2}) |v|^2+\varepsilon ^{1/2}\sum _{i=1}^k\lambda _i^2 |z_i|^2 + (k+\gamma )\varepsilon ^{3/2}|x|^2\nonumber \\&\quad -a_2\varepsilon |x|^{\lambda +1}+a_3\varepsilon +\varepsilon \frac{a_1}{|x|^{\beta _1-1}}+\varepsilon a_1|x|, \end{aligned}$$
(3.13)

whence,

$$\begin{aligned} \mathcal {L}_{m,\gamma } \big (\varepsilon m\langle x,v\rangle \big )&\le C\Bigg (\varepsilon ^{1/2} |v|^2+\varepsilon ^{1/2}\sum _{i=1}^k |z_i|^2 + \varepsilon ^{3/2}|x|^2+\varepsilon \frac{1}{|x|^{\beta _1-1}}+1\Bigg )\nonumber \\ {}&\quad -a_2\varepsilon |x|^{\lambda +1}, \end{aligned}$$
(3.14)

for some positive constant C independent of \(\varepsilon \).

Turning to \(-\langle x,v\rangle /|x| \), it holds that

$$\begin{aligned}&\mathcal {L}_{m,\gamma }\bigg (\!-m\frac{\langle x,v\rangle }{|x|}\bigg ) \nonumber \\&\quad = -m\frac{|v|^2}{|x|}+m\frac{|\langle x,v\rangle |^2}{|x|^3}+\gamma \frac{\langle x,v\rangle }{|x|}+ \frac{\langle \nabla U(x), x\rangle }{|x|}+\frac{\langle \nabla G(x),x\rangle }{|x|}\nonumber \\ {}&\quad -\sum _{i=1}^k\lambda _i\frac{\langle z_i,x\rangle }{|x|} . \end{aligned}$$
(3.15)

It is clear that

$$\begin{aligned} -m\frac{|v|^2}{|x|}+m\frac{|\langle x,v\rangle |^2}{|x|^3}\le 0. \end{aligned}$$

From condition (2.1), we readily have

$$\begin{aligned} \frac{\langle \nabla U(x), x\rangle }{|x|} \le |\nabla U(x)|\le a_1(1+|x|^\lambda ). \end{aligned}$$

Also, recalling (2.7),

$$\begin{aligned} \frac{\langle \nabla G(x),x\rangle }{|x|} = -\frac{a_4}{|x|^{\beta _1}}+ \frac{\langle \nabla G(x)+a_4\frac{x}{|x|^{\beta _1+1}},x\rangle }{|x|}&\le -\frac{a_4}{|x|^{\beta _1}}+\frac{a_5}{|x|^{\beta _2}}+a_6\nonumber \\&\le -\frac{a_4}{2|x|^{\beta _1}}+C. \end{aligned}$$
(3.16)

In the last estimate above, we absorbed \(|x|^{-\beta _2}\) into \(-|x|^{-\beta _1}\), thanks to the fact that \(\beta _2\in [0,\beta _1)\) by virtue of condition (2.7). Altogether, we deduce that

$$\begin{aligned} \mathcal {L}_{m,\gamma }\bigg (\!-m\varepsilon \frac{\langle x,v\rangle }{|x|}\bigg )&\le C\varepsilon \Bigg (|v|+\sum _{i=1}^k |z_i|+|x|^\lambda + 1 \Bigg )-\varepsilon \frac{a_4}{2|x|^{\beta _1}}\nonumber \\&\le C\Bigg (\varepsilon ^2|v|^2+\varepsilon ^2\sum _{i=1}^k |z_i|^2+\varepsilon |x|^\lambda + 1 \Bigg )-\varepsilon \frac{a_4}{2|x|^{\beta _1}} . \end{aligned}$$
(3.17)

Now, we combine estimates (3.14) and (3.17) together with identities (3.10) and (3.11) to infer

$$\begin{aligned} \mathcal {L}_{m,\gamma } V_1&= \mathcal {L}_{m,\gamma }\bigg (H+m\varepsilon \langle x,v\rangle -m\varepsilon \frac{\langle x,v\rangle }{|x|}\bigg ) \\&\le -c |v|^2- c\sum _{i=1}^k |z_i|^2-c\varepsilon |x|^{\lambda +1}-\varepsilon \frac{c}{|x|^{\beta _1}}+C \\&\quad + C\Big (\varepsilon ^{1/2} |v|^2+\varepsilon ^{1/2}\sum _{i=1}^k |z_i|^2 + \varepsilon ^{3/2}|x|^2+\varepsilon |x|^\lambda +\varepsilon \frac{1}{|x|^{\beta _1-1}}\Big ), \end{aligned}$$

for some positive constants c, C independent of \(\varepsilon \). By taking \(\varepsilon \) sufficiently small, we observe that the positive non-constant terms on the above right-hand side are dominated by the negative terms. In particular, since \(\lambda \ge 1\), \(\varepsilon ^{3/2}|x|^2\) can be subsumed into \(-\varepsilon |x|^{\lambda +1}\). As a consequence, we arrive at

$$\begin{aligned} \mathcal {L}_{m,\gamma } V_1&\le -c |v|^2- c\sum _{i=1}^k |z_i|^2-c\varepsilon |x|^{\lambda +1}-\varepsilon \frac{c}{|x|^{\beta _1}}+C. \end{aligned}$$
(3.18)

Since U and G are dominated by \(|x|^{\lambda +1}\) and \(|x|^{-\beta _1}\), respectively, (3.18) produces the desired Lyapunov property of \(V_1\) for (3.3). The proof is thus finished. \(\square \)
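As a cross-check of the computations above, identity (3.15) can be verified by finite differences. The sketch below assumes d = 2, k = 1 with the illustrative choices \(U(x)=|x|^2/2\) and the two-dimensional Coulomb potential \(G(x)=-\log |x|\); the diffusion terms of (3.4) vanish here since \(-m\langle x,v\rangle /|x|\) is linear in v and independent of \(z_1\).

```python
# Finite-difference verification of identity (3.15) in dimension d = 2 with
# k = 1: apply the drift part of the generator (3.4) to phi = -m<x, v>/|x|
# and compare with the right-hand side of (3.15). Illustrative choices:
# U(x) = |x|^2/2 (grad U = x) and G(x) = -log|x| (grad G = -x/|x|^2).
m, gamma, lam1 = 2.0, 0.5, 0.7
x, v, z1 = [0.8, -0.3], [0.5, 1.1], [0.2, -0.4]

def phi_of(x_, v_):
    r_ = (x_[0]**2 + x_[1]**2)**0.5
    return -m*(x_[0]*v_[0] + x_[1]*v_[1])/r_

def pd(f, p, i, h=1e-6):               # central partial derivative
    q1, q2 = list(p), list(p)
    q1[i] += h; q2[i] -= h
    return (f(q1) - f(q2))/(2*h)

r = (x[0]**2 + x[1]**2)**0.5
xv = x[0]*v[0] + x[1]*v[1]
vv = v[0]**2 + v[1]**2
zx = z1[0]*x[0] + z1[1]*x[1]
force = [-gamma*v[i] - x[i] + x[i]/r**2 + lam1*z1[i] for i in range(2)]
gen = (sum(v[i]*pd(lambda p: phi_of(p, v), x, i) for i in range(2))
       + (1/m)*sum(force[i]*pd(lambda p: phi_of(x, p), v, i) for i in range(2)))
# (3.15): here <grad U, x>/|x| = |x| and <grad G, x>/|x| = -1/|x|
rhs = -m*vv/r + m*xv**2/r**3 + gamma*xv/r + r - 1/r - lam1*zx/r
assert abs(gen - rhs) < 1e-6
```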

3.1.2 Zero Viscous Constant \(\gamma =0\)

We now turn to the instance \(\gamma =0\). In this case, since there is no viscous drag on the right-hand side of the v-equation in (3.3), the function \(V_1\) defined in (3.10) does not produce the dissipation in v for large |v|. To circumvent this issue, we note that the \(z_i\)-equation in (3.3) still depends on v. So, we may exploit this fact to transfer the dissipation from, say, \(z_1\) to v. More specifically, let us consider adding a small perturbation to \(V_1\) as follows:

$$\begin{aligned} V_1+m\varepsilon \langle v,z_1\rangle . \end{aligned}$$

Denote by \(\mathcal {L}_{m,0}\) the generator associated with (3.3) when \(\gamma =0\). That is, from (3.4), we have

$$\begin{aligned} \mathcal {L}_{m,0}\varphi&= \langle \partial _x \varphi , v\rangle + \frac{1}{m}\bigg \langle \partial _v \varphi ,-\nabla U(x) -\nabla G(x)+\sum _{i=1}^k\lambda _i z_i\bigg \rangle \nonumber \\&\qquad + \sum _{i=1}^k\langle \partial _{z_i} \varphi , -\alpha _i z_i-\lambda _i v\rangle +\sum _{i=1}^k \alpha _i\triangle _{z_i}\varphi . \end{aligned}$$
(3.19)

Applying \(\mathcal {L}_{m,0}\) to the cross term \(\langle v,z_1\rangle \), although \(\mathcal {L}_{m,0}\langle v,z_1\rangle \) provides the required dissipative effect in v, it also induces a cross product \(\langle \nabla G(x),z_1\rangle \), which is of order \(|z_1|/|x|^{\beta _1}\) by virtue of condition (2.5). That is,

$$\begin{aligned} \mathcal {L}_{m,0}\langle v ,z_1\rangle \le -c|v|^2+ \frac{|z_1|}{|x|^{\beta _1}}. \end{aligned}$$
(3.20)

How to annihilate the effect caused by this extra term is the main difficulty that we face in the case \(\gamma =0\).

To circumvent the issue, from the proof of Lemma 3.3, particularly the estimate (3.17), we see that

$$\begin{aligned} \mathcal {L}_{m,0}\Big (-\frac{\langle x,v\rangle }{|x|}\Big ) \propto -\frac{1}{|x|^{\beta _1}}+ \varphi (x,v,z_1,\dots ,z_k), \end{aligned}$$

where \(\varphi (x,v,z_1,\dots ,z_k)\) consists of lower-order terms. This suggests that we look for a perturbation of the form

$$\begin{aligned} -\frac{\langle x,v\rangle }{|x|}\psi (x,v,z_1), \end{aligned}$$

where \(\psi \) satisfies

$$\begin{aligned} \psi \ge c|z_1|,\quad \text {and}\quad |\mathcal {L}_{m,0}\psi |=O(|v|+ |\textrm{z}|). \end{aligned}$$
(3.21)

From (3.20) and (3.21), the terms \(|z_1|/|x|^{\beta _1}\) and \(O(|v|+ |\textrm{z}|)\) suggest that the issue is where |v|, \(|\textrm{z}|\) are large and |x| is small. To derive \(\psi \), it is important to understand the dynamics generated by \(\mathcal {L}_{m,0}\) in this “bad region”. So, we introduce the following scaling transformation

$$\begin{aligned} (x,v,z_1,\dots ,z_k)=(\kappa ^{-a}\hat{x},\kappa \hat{v},\kappa \hat{z}_1,\kappa \hat{z}_2,\dots ,\kappa \hat{z}_k), \end{aligned}$$

for some positive constant \(a>1\). Recalling \(\mathcal {L}_{m,0}\) as in (3.19), under this scaling, we find that

$$\begin{aligned} \mathcal {L}_{m,0}&= v\cdot \nabla _{x}+\frac{1}{m}\Bigg (-\nabla U(x)-\nabla G(x)+\sum _{i=1}^k \lambda _iz_i\Bigg )\cdot \nabla _v\\&\quad + \sum _{i=1}^k \big (-\alpha _iz_i-\lambda _i v\big )\cdot \nabla _{z_i}+\sum _{i=1}^k \alpha _i \triangle _{z_i} \\&= \kappa ^{a+1}\hat{v}\cdot \nabla _{\hat{x}} +\frac{1}{m\kappa }\Bigg (-\nabla U(\kappa ^{-a} \hat{x})-\nabla G(\kappa ^{-a} \hat{x})+\kappa \sum _{i=1}^k \lambda _i\hat{z}_i\Bigg )\cdot \nabla _{\hat{v}}\\&\quad +\sum _{i=1}^k \big (-\alpha _i \hat{z}_i-\lambda _i \hat{v}\big )\cdot \nabla _{\hat{z}_i}+\kappa ^{-2}\sum _{i=1}^k \alpha _i \triangle _{\hat{z}_i}. \end{aligned}$$

Recalling condition (2.1) and condition (2.5), suppose heuristically that

$$\begin{aligned} -\nabla U(\kappa ^{-a} \hat{x})-\nabla G(\kappa ^{-a} \hat{x}) \approx -\kappa ^{-a\lambda }\nabla U(\hat{x})-\kappa ^{a\beta _1}\nabla G(\hat{x}), \end{aligned}$$

implying,

$$\begin{aligned} \mathcal {L}_{m,0}&\approx \kappa ^{a+1} \hat{v}\cdot \nabla _{\hat{x}} -\kappa ^{-a\lambda -1}\nabla U(\hat{x})\cdot \nabla _{\hat{v}}-\kappa ^{a\beta _1-1 } \nabla G(\hat{x})\cdot \nabla _{\hat{v}}+ \sum _{i=1}^k \hat{z}_i\cdot \nabla _{\hat{v}}\\&\qquad -\sum _{i=1}^k (\hat{z}_i+\hat{v}\big )\cdot \nabla _{\hat{z}_i}+\kappa ^{-2}\sum _{i=1}^k \triangle _{\hat{z}_i}. \end{aligned}$$

Observe that

$$\begin{aligned} \mathcal {L}_{m,0}&\approx \kappa ^{a+1} \hat{v}\cdot \nabla _{\hat{x}}-\kappa ^{a\beta _1-1 } \nabla G(\hat{x})\cdot \nabla _{\hat{v}},\quad \kappa \rightarrow \infty . \end{aligned}$$

In other words, when \(\kappa \) is large, the dominant balance of terms in the above transformation is contained in

$$\begin{aligned} \mathcal {A}_{m,0} = v\cdot \nabla _x-\nabla G(x)\cdot \nabla _v. \end{aligned}$$

Together with the requirement (3.21), a typical choice for \(\psi \) is given by

$$\begin{aligned} \psi =\sqrt{b_1|z_1|^2+\tfrac{1}{2}m|v|^2+G(x)+U(x)+b_2}, \end{aligned}$$

for some positive constants \(b_1,b_2\) to be chosen later (not to be confused with the constants of Assumptions 2.1 and 2.3). In the above, we note that the appearance of U(x) is to ensure that the expression under the square root is positive. In summary, the candidate Lyapunov function for (3.3) looks like

$$\begin{aligned} V_1+m\varepsilon \langle v,z_1\rangle -m\varepsilon \frac{\langle x,v\rangle }{|x|}\psi . \end{aligned}$$

In Lemma 3.4, we will see that by picking \(b_1,b_2\) carefully, we achieve the Lyapunov effect for (3.3). We finish this discussion by introducing the following function \(V_2\) defined for \(\varepsilon \in (0,1),\, R>1\),

$$\begin{aligned}&V_2\big (x,v,z_1,\dots ,z_k\big ) \nonumber \\&\quad {:}{=}H\big (x,v,z_1,\dots ,z_k\big )+ \varepsilon R m\langle x,v\rangle +\varepsilon R^2 m\langle v,z_1\rangle -\varepsilon m\frac{\langle x,v\rangle }{|x|}\sqrt{Q_R} , \end{aligned}$$
(3.22)

where

$$\begin{aligned} Q_R = R^6|z_1|^2+m|v|^2+2U(x)+2G(x)+R^2. \end{aligned}$$
(3.23)

Lemma 3.4

Under Assumptions 2.1 and 2.3, let \(V_2\) be defined as in (3.22). For \(\gamma =0\) and every \(m>0\), there exist positive constants \(\varepsilon \) small and R large enough such that \(V_2\) is a Lyapunov function for (3.3).

Proof

Firstly, we note that when \(\gamma =0\), by virtue of Assumption 2.1, \(\lambda =1\). From the estimate (3.13), we immediately obtain

$$\begin{aligned} \mathcal {L}_{m,0} \Big (\varepsilon R m\langle x,v\rangle \Big )&\le CR\Bigg (\varepsilon |v|^2+\varepsilon ^{1/2}\sum _{i=1}^k |z_i|^2 + \varepsilon ^{3/2}|x|^2+\varepsilon \frac{1}{|x|^{\beta _1-1}}+1\Bigg )\\ {}&\quad -a_2\varepsilon R|x|^{2}. \end{aligned}$$

Also, (3.11) is reduced to

$$\begin{aligned} \mathcal {L}_{m,0} H = - \sum _{i=1}^k \alpha _i|z_i|^2+d\sum _{i=1}^k \alpha _i. \end{aligned}$$

As a consequence, we obtain

$$\begin{aligned} \mathcal {L}_{m,0}\big (H+ \varepsilon Rm\langle x,v\rangle \big )&\le CR\Bigg (\varepsilon |v|^2+\varepsilon ^{1/2}\sum _{i=1}^k |z_i|^2 + \varepsilon ^{3/2}|x|^2+\varepsilon \frac{1}{|x|^{\beta _1-1}}+1\Bigg )\\ {}&\quad -a_2\varepsilon R|x|^{2}- \sum _{i=1}^k \alpha _i|z_i|^2+d\sum _{i=1}^k \alpha _i. \end{aligned}$$

By taking \(\varepsilon \) sufficiently small, it follows that

$$\begin{aligned}&\mathcal {L}_{m,0}\big (H+ \varepsilon R m\langle x,v\rangle \big ) \nonumber \\&\le -c\varepsilon R|x|^2 - c\sum _{i=1}^k |z_i|^2 +C\varepsilon R m |v|^2+ C\varepsilon R\frac{1}{|x|^{\beta _1-1}}+CR, \end{aligned}$$
(3.24)

for some positive constants \(c,\,C\) independent of \(\varepsilon \) and R.

Next, we consider the cross term \(\langle v,z_1\rangle \) on the right-hand side of (3.22). Itô’s formula yields (recalling \(\gamma =0\))

$$\begin{aligned}&\mathcal {L}_{m,0} \big (\varepsilon m\langle v,z_1\rangle \big ) \nonumber \\&= -\varepsilon \langle \nabla U(x)+\nabla G(x),z_1\rangle +\varepsilon \sum _{i=1}^k \lambda _i\langle z_i,z_1\rangle -\varepsilon \alpha _1m\langle v,z_1\rangle -\lambda _1\varepsilon m|v|^2. \end{aligned}$$
(3.25)
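Identity (3.25) can be confirmed by applying the generator (3.19) through finite differences. The sketch below assumes d = k = 1 with illustrative choices of U and G and \(\varepsilon =1\).

```python
# Finite-difference check of (3.25) in d = 1, k = 1: apply the generator
# L_{m,0} of (3.19) to phi = m*v*z1 and compare with the closed form.
# U(x) = x**4/4 and G(x) = 1/x are illustrative; eps = 1 for simplicity.
m, lam1, alp1 = 2.0, 0.7, 0.4
dU = lambda s: s**3
dG = lambda s: -1/s**2

def gen(phi, x, v, z1, h=1e-4):
    # drift . gradient + alpha_1 * Laplacian_{z1}, via central differences
    px = (phi(x+h, v, z1) - phi(x-h, v, z1))/(2*h)
    pv = (phi(x, v+h, z1) - phi(x, v-h, z1))/(2*h)
    pz = (phi(x, v, z1+h) - phi(x, v, z1-h))/(2*h)
    pzz = (phi(x, v, z1+h) - 2*phi(x, v, z1) + phi(x, v, z1-h))/h**2
    return (v*px + (1/m)*(-dU(x) - dG(x) + lam1*z1)*pv
            + (-alp1*z1 - lam1*v)*pz + alp1*pzz)

phi = lambda x, v, z1: m*v*z1
x, v, z1 = 0.6, -1.2, 0.4
closed = -(dU(x) + dG(x))*z1 + lam1*z1**2 - alp1*m*v*z1 - lam1*m*v**2
assert abs(gen(phi, x, v, z1) - closed) < 1e-6
```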

We invoke condition (2.1) together with the Cauchy–Schwarz inequality to infer

$$\begin{aligned} -\varepsilon \langle \nabla U(x),z_1\rangle \le \varepsilon a_1(1+|x|)|z_1| \le a_1\varepsilon ^{3/2}|x|^2+a_1\varepsilon ^{1/2}|z_1|^2 +a_1\varepsilon |z_1| . \end{aligned}$$

Similarly,

$$\begin{aligned} \varepsilon \sum _{i\!=\!1}^k \lambda _i\langle z_i,z_1\rangle -\varepsilon \alpha _1m\langle v,z_1\rangle \le \varepsilon \sum _{i=1}^k \lambda _i^2|z_i|^2\!+\!\varepsilon k|z_1|^2+\varepsilon ^{1/2}\alpha _1^2m|z_1|^2\!+\!\varepsilon ^{3/2}m|v|^2. \end{aligned}$$

Concerning the cross term \(\langle \nabla G(x),z_1\rangle \), recall from (2.5) that

$$\begin{aligned} |\nabla G(x)|\le \frac{a_1}{|x|^{\beta _1}}+a_1. \end{aligned}$$

As a consequence,

$$\begin{aligned} -\varepsilon \langle \nabla G(x),z_1\rangle&\le \varepsilon a_1\frac{|z_1|}{|x|^{\beta _1}} +\varepsilon a_1|z_1|\le \varepsilon a_1\frac{|z_1|}{|x|^{\beta _1}} +\frac{1}{2} \varepsilon (|z_1|^2+a_1^2). \end{aligned}$$

Altogether, we deduce that the bound

$$\begin{aligned}&\mathcal {L}_{m,0} \big (R^2\varepsilon m\langle v,z_1\rangle \big ) \nonumber \\&\le \! -\! \lambda _1\varepsilon R^2 m|v|^2\!+\!C\varepsilon R^2 \frac{|z_1|}{|x|^{\beta _1}} \! +C\varepsilon ^{3/2} R^2\big (|v|^2 +|x|^{2}\big )\!+\!C\varepsilon ^{1/2} R^2\sum _{i=1}^k |z_i|^2\!+\!C\varepsilon R^2, \end{aligned}$$
(3.26)

holds for some positive constant C independent of R and \(\varepsilon \).

Turning to the last term on the right-hand side of (3.22), a routine calculation together with (3.15) gives

$$\begin{aligned}&\mathcal {L}_{m,0}\bigg (\!-\varepsilon m\frac{\langle x,v\rangle }{|x|}\sqrt{Q_R} \bigg ) \nonumber \\&\quad =\varepsilon \bigg (-m\frac{|v|^2|x|^2-|\langle x,v\rangle |^2}{|x|^3}+ \frac{\langle \nabla U(x)-\sum _{i=1}^k\lambda _i z_i, x\rangle }{|x|}+\frac{\langle \nabla G(x),x\rangle }{|x|}\bigg )\sqrt{Q_R} \nonumber \\&\qquad - \varepsilon m\frac{\langle x,v\rangle }{|x|}\cdot \frac{\langle v,\sum _{i=1}^k\lambda _i z_i\rangle }{\sqrt{Q_R} }\nonumber +\varepsilon R^6 m\frac{\langle x,v\rangle }{|x|}\cdot \frac{ \alpha _1|z_1|^2+\lambda _1\langle v,z_1\rangle }{\sqrt{Q_R} }\nonumber \\&\qquad +\frac{1}{2}\alpha _1 \varepsilon R^6 m\frac{\langle x,v\rangle }{|x|}\cdot \frac{1}{\sqrt{Q_R} } \Big ( d- \frac{R^6|z_1|^2}{Q_R}\Big )\nonumber \\&=I_1-I_2+I_3+I_4. \end{aligned}$$
(3.27)

Concerning \(I_4\), we recall from (3.23) that

$$\begin{aligned} Q_R\ge R^6|z_1|^2+m|v|^2, \end{aligned}$$
(3.28)

whence

$$\begin{aligned} I_4\le \frac{1}{2}\alpha _1 \varepsilon R^6 d \sqrt{m} . \end{aligned}$$

With regard to \(I_1\), we invoke conditions (2.1) and (2.4) to see that

$$\begin{aligned} Q_R\le R^6|z_1|^2+m|v|^2+a_1\Big (1+|x|^2+\frac{1}{|x|^{\beta _1}}+|x|\Big )+R^2. \end{aligned}$$

As a consequence,

$$\begin{aligned} \varepsilon \frac{\langle \nabla U(x)-\sum _{i=1}^k\lambda _i z_i, x\rangle }{|x|}\sqrt{Q_R}&\le \varepsilon \Big [a_1(1+|x|)+\sum _{i=1}^k \lambda _i |z_i|\Big ]\sqrt{Q_R}\\&\le \frac{1}{2}\varepsilon \Bigg [a_1(1+|x|)+\sum _{i=1}^k \lambda _i |z_i|\Bigg ]^2+\frac{1}{2}\varepsilon Q_R\\&\le C\varepsilon \Bigg (R^6\sum _{i=1}^k |z_i|^2+|v|^2+|x|^2+\frac{1}{|x|^{\beta _1}}+R^2\Bigg ). \end{aligned}$$

On the other hand, estimate (3.16) implies the bound

$$\begin{aligned} \varepsilon \frac{\langle \nabla G(x),x\rangle }{|x|}\sqrt{Q_R}&\le \varepsilon \Big (-\frac{a_4}{2|x|^{\beta _1}}+C\Big )\sqrt{Q_R}\\&= -\varepsilon \frac{a_4}{2}\cdot \frac{1}{|x|^{\beta _1}}\sqrt{Q_R}+C\varepsilon \sqrt{Q_R}\\&\le -c \varepsilon \frac{R^3|z_1|+R}{|x|^{\beta _1}}+C\varepsilon \Big (R^6 |z_1|^2+|v|^2+|x|^2+\frac{1}{|x|^{\beta _1}}+R^2\Big ). \end{aligned}$$

It follows that

$$\begin{aligned} I_1&\le -c \varepsilon \frac{R^3|z_1|+R}{|x|^{\beta _1}}+C\varepsilon \Bigg (R^6\sum _{i=1}^k |z_i|^2+|v|^2+|x|^2+\frac{1}{|x|^{\beta _1}}+R^2\Bigg ). \end{aligned}$$

With regard to \(I_2\) on the right-hand side of (3.27), since \(Q_R\ge m|v|^2\), we find

$$\begin{aligned} -I_2=-\varepsilon m\frac{\langle x,v\rangle }{|x|}\cdot \frac{\langle v,\sum _{i=1}^k\lambda _i z_i\rangle }{\sqrt{Q_R} }\le \varepsilon \sqrt{m}|v|\sum _{i=1}^k \lambda _i|z_i|\le C\varepsilon \Bigg (|v|^2+\sum _{i=1}^k |z_i|^2\Bigg ). \end{aligned}$$

Turning to \(I_3\), we estimate as follows:

$$\begin{aligned} I_3&=\varepsilon R^6 m\frac{\langle x,v\rangle }{|x|}\cdot \bigg (\frac{ \alpha _1|z_1|^2}{\sqrt{Q_R} }+\frac{ \lambda _1\langle v,z_1\rangle }{\sqrt{Q_R} }\bigg )\\&\le \varepsilon R^3 m |v|\cdot \alpha _1|z_1|+\varepsilon R^6\sqrt{m}|v|\cdot \lambda _1|z_1|\\&\le C\varepsilon (|v|^2+R^{12}|z_1|^2), \end{aligned}$$

where in the first inequality we have used \(\sqrt{Q_R}\ge R^3 |z_1|\) (which follows from (3.28)) and \(\sqrt{Q_R}\ge \sqrt{m}|v|\). Now, we collect the estimates on \(I_j\), \(j=1,\dots ,4\), together with expression (3.27) to infer (recalling \(\varepsilon<1<R\))

$$\begin{aligned}&\mathcal {L}_{m,0}\bigg (\!-\varepsilon m\frac{\langle x,v\rangle }{|x|}\sqrt{Q_R} \bigg ) \nonumber \\&\le -c \varepsilon \frac{R^3|z_1|+R}{|x|^{\beta _1}}+C\varepsilon \Bigg (R^{12}\sum _{i=1}^k |z_i|^2+|v|^2+|x|^2+\frac{1}{|x|^{\beta _1}}+R^6\Bigg ) \nonumber \\&\le -c \varepsilon \frac{R^3|z_1|+R}{|x|^{\beta _1}}+C\varepsilon \Bigg (R^{12}\sum _{i=1}^k |z_i|^2+|v|^2+|x|^2+R^6\Bigg ). \end{aligned}$$
(3.29)

In the last implication above, we subsumed \(C\varepsilon |x|^{-\beta _1}\) into \(-c\varepsilon R|x|^{-\beta _1}\), by taking R large enough.

Turning back to \(V_2\) given by (3.22), we combine (3.24), (3.26) and (3.29) to arrive at the estimate

$$\begin{aligned} \mathcal {L}_{m,0} V_2&\le -c\varepsilon R|x|^2 - c\sum _{i=1}^k |z_i|^2 +C\varepsilon R |v|^2+ C\varepsilon R\frac{1}{|x|^{\beta _1-1}}+CR\\&\qquad - c\varepsilon R^2 |v|^2+C\varepsilon R^2 \frac{|z_1|}{|x|^{\beta _1}} +C\varepsilon ^{3/2} R^2\big (|v|^2 +|x|^{2}\big )+C\varepsilon ^{1/2} R^2\sum _{i=1}^k |z_i|^2\\ {}&\qquad +C\varepsilon R^2\\&\qquad -c \varepsilon \frac{R^3|z_1|+R}{|x|^{\beta _1}}+C\varepsilon \Bigg (R^{12}\sum _{i=1}^k |z_i|^2+|v|^2+|x|^2+R^6\Bigg ), \end{aligned}$$

i.e.,

$$\begin{aligned} \mathcal {L}_{m,0} V_2&\le -(c\varepsilon R^2\! -\!C\varepsilon R-C\varepsilon ^{3/2}R^2-C\varepsilon ) |v|^2\! -\! (c\!-\!C\varepsilon ^{1/2}R^2-C\varepsilon R^{12})\sum _{i=1}^k |z_i|^2\\&\quad - ( c\varepsilon R-C\varepsilon ^{3/2}R^2-C\varepsilon )|x|^2- c\varepsilon R \frac{1}{|x|^{\beta _1}}-(c\varepsilon R^3-C\varepsilon R^2)\frac{|z_1|}{|x|^{\beta _1}} \\&\quad +C\varepsilon R\frac{1}{|x|^{\beta _1-1}}+CR+C\varepsilon R^{6}. \end{aligned}$$

Since \(c, C\) are independent of \(\varepsilon \) and R, we may take R sufficiently large and then shrink \(\varepsilon \) to zero while making use of the fact that \(\frac{1}{|x|^{\beta _1-1}}\) can be subsumed into \(|x|^2+\frac{1}{|x|^{\beta _1}}\). It follows that

$$\begin{aligned} \mathcal {L}_{m,0} V_2&\le -c\varepsilon R^2|v|^2 - c\sum _{i=1}^k |z_i|^2- c\varepsilon R|x|^2- c\varepsilon R \frac{1}{|x|^{\beta _1}}+C\varepsilon R^{6}. \end{aligned}$$
(3.30)

This produces the Lyapunov property of \(V_2\) for (3.3) in the case \(\gamma =0\), thereby finishing the proof. \(\square \)

3.2 N-particle GLEs

We now turn our attention to the full system (1.5) and construct Lyapunov functions for (1.5) based on the discussion of the single-particle GLE in Sect. 3.1.
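For readers who wish to experiment numerically, the following minimal Euler–Maruyama sketch simulates a system of the form (1.5). All concrete choices below, the potentials \(U(x)=|x|^2\) and \(G(x)=1/|x|\), a single auxiliary variable per particle, and all parameter values, are illustrative assumptions rather than quantities fixed by the paper.

```python
import numpy as np

# Euler-Maruyama sketch of a system of the form (1.5).
# Assumed toy model: U(x) = |x|^2, G(x) = 1/|x|, one auxiliary
# variable z_{i,1} per particle (k_i = 1), illustrative parameters.
def grad_U(x):
    return 2.0 * x                       # gradient of |x|^2

def grad_G(r):
    return -r / np.linalg.norm(r) ** 3   # gradient of 1/|x| at x = r

def simulate(N=3, d=2, m=0.5, gamma=1.0, lam=1.0, alpha=1.0,
             dt=1e-4, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = 2.0 * rng.normal(size=(N, d))    # spread the particles out
    v = np.zeros((N, d))
    z = np.zeros((N, d))
    for _ in range(steps):
        F = -grad_U(x) + lam * z         # conservative + auxiliary forces
        for i in range(N):               # singular pairwise repulsion
            for j in range(N):
                if j != i:
                    F[i] -= grad_G(x[i] - x[j])
        dW0 = np.sqrt(dt) * rng.normal(size=(N, d))
        dW1 = np.sqrt(dt) * rng.normal(size=(N, d))
        v = v + (dt * (-gamma * v + F) + np.sqrt(2.0 * gamma) * dW0) / m
        z = z + dt * (-alpha * z - lam * v) + np.sqrt(2.0 * alpha) * dW1
        x = x + dt * v
    return x, v, z
```

Since \(G\) is repulsive, the particles are pushed apart, so the singular term stays finite along trajectories.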

We start with the case \(\gamma >0\) and observe that there is a natural generalization of the function \(V_1\) defined in (3.10) to an arbitrary number of particles \(N\ge 2\) in an arbitrary number of dimensions \(d\ge 1\). More specifically, for \(\varepsilon >0\), we introduce the following function \(V^1_N\) given by

$$\begin{aligned}&V_N^1( \textrm{x},\textrm{v},\textrm{z}_1,\dots ,\textrm{z}_N) \nonumber \\&=H_N(\textrm{x},\textrm{v},\textrm{z}_1,\dots ,\textrm{z}_N)+ \varepsilon m\langle \textrm{x},\textrm{v}\rangle -\varepsilon m \sum _{i=1}^N \Big \langle v_i,\sum _{j\ne i}\frac{ x_i-x_j}{|x_i-x_j|}\Big \rangle , \end{aligned}$$
(3.31)

where \(H_N\) is the Hamiltonian as in (1.6). In Lemma 3.5, stated and proven next, we assert that one may pick \(\varepsilon \) sufficiently small to ensure the Lyapunov property of \(V^1_N\).
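As a concrete illustration of (3.31), the sketch below evaluates \(V_N^1\) for toy data. It assumes the standard form of the Hamiltonian (kinetic energy plus confining, interaction, and quadratic auxiliary energies) together with the toy choices \(U(x)=|x|^2\), \(G(x)=1/|x|\) and one auxiliary variable per particle; none of these choices is prescribed by the paper.

```python
import numpy as np

# V_N^1 from (3.31) for toy data. Assumptions (not fixed by the paper):
# H_N = (m/2)|v|^2 + sum_i U(x_i) + sum_{i<j} G(x_i - x_j) + (1/2)|z|^2,
# with U(x) = |x|^2, G(x) = 1/|x|, one auxiliary variable per particle.
def V_N1(x, v, z, m=1.0, eps=0.05):
    N = x.shape[0]
    H = 0.5 * m * np.sum(v ** 2) + np.sum(x ** 2) + 0.5 * np.sum(z ** 2)
    for i in range(N):
        for j in range(i + 1, N):
            H += 1.0 / np.linalg.norm(x[i] - x[j])   # G(x_i - x_j)
    cross = eps * m * np.sum(x * v)                  # eps * m * <x, v>
    sing = 0.0                                       # last term of (3.31)
    for i in range(N):
        for j in range(N):
            if j != i:
                r = x[i] - x[j]
                sing += np.dot(v[i], r / np.linalg.norm(r))
    return H + cross - eps * m * sing
```

Note that \(V_N^1\) blows up both as \(|\textrm{x}|\rightarrow \infty \) and as two particles approach each other, which is precisely the behaviour required of a Lyapunov function for (1.5).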

Lemma 3.5

Under Assumptions 2.1 and 2.3, let \(V_N^1\) be the function defined as (3.31). For all \(\gamma > 0\) and \(m>0\), there exists a positive constant \(\varepsilon \) sufficiently small such that \(V_N^1\) is a Lyapunov function for (1.5).

Proof

We first consider the Hamiltonian \(H_N\) given by (1.6). Applying \(\mathcal {L}_{m,\gamma }^N\) as in (2.14) to \(H_N\) gives

$$\begin{aligned} \mathcal {L}_{m,\gamma }^N H_N= -\gamma |\textrm{v}|^2-\sum _{i=1}^N \sum _{\ell =1}^{k_i}\alpha _{i,\ell }|z_{i,\ell }|^2+\frac{1}{2m}\gamma N d+\frac{1}{2}\sum _{i=1}^N\sum _{\ell =1}^{k_i}\alpha _{i,\ell }. \end{aligned}$$
(3.32)

Next, a routine calculation on the cross term \(\langle \textrm{x},\textrm{v}\rangle \) gives

$$\begin{aligned} \mathcal {L}_{m,\gamma }^N \big (\varepsilon m\langle \textrm{x},\textrm{v}\rangle \big )&=\varepsilon m|\textrm{v}|^2-\varepsilon \gamma \langle \textrm{x},\textrm{v}\rangle +\varepsilon \sum _{i=1}^N \Big \langle x_i, \sum _{\ell =1}^{k_i}z_{i,\ell }\Big \rangle \nonumber \\&\qquad -\varepsilon \sum _{i=1}^N \langle x_i,\nabla U(x_i)\rangle -\varepsilon \!\!\!\sum _{1\le i<j\le N}\!\!\!\langle x_i-x_j,\nabla G(x_i-x_j)\rangle . \end{aligned}$$
(3.33)

Recalling (2.3), we readily have

$$\begin{aligned} -\sum _{i=1}^N \langle x_i,\nabla U(x_i)\rangle \le -a_2\sum _{i=1}^N |x_i|^{\lambda +1}+Na_3. \end{aligned}$$

Also, from (2.5), it holds that

$$\begin{aligned} -\sum _{1\le i<j\le N}\langle x_i-x_j,\nabla G(x_i-x_j)\rangle \le a_1\Big (N^2+\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1-1}}\Big ). \end{aligned}$$

Concerning the other cross terms on the right-hand side of (3.33), we invoke the Cauchy–Schwarz inequality to infer

$$\begin{aligned} -\varepsilon \gamma \langle \textrm{x},\textrm{v}\rangle +\varepsilon \sum _{i=1}^N \Big \langle x_i, \sum _{\ell =1}^{k_i}z_{i,\ell }\Big \rangle&\le C\Big ( \varepsilon ^{1/2}|\textrm{v}|^2+\varepsilon ^{1/2}\sum _{i=1}^N|\textrm{z}_i|^2+\varepsilon ^{3/2}|\textrm{x}|^2\Big ) \end{aligned}$$

for some positive constant C independent of \(\varepsilon \). We now collect the above estimates together with expression (3.33) to obtain

$$\begin{aligned} \mathcal {L}_{m,\gamma }^N\big ( \varepsilon m\langle \textrm{x},\textrm{v}\rangle \big )&\le C\Big ( \varepsilon ^{1/2}|\textrm{v}|^2\!+\!\varepsilon ^{1/2}\sum _{i=1}^N|\textrm{z}_i|^2+\varepsilon ^{3/2}|\textrm{x}|^2\!+\!\varepsilon \!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1-1}}\Big ) \nonumber \\&\quad -a_2\varepsilon \sum _{i=1}^N |x_i|^{\lambda +1} +C. \end{aligned}$$
(3.34)

Next, we turn to the last term on the right-hand side of (3.31). A routine calculation produces

$$\begin{aligned}&\mathcal {L}_{m,\gamma }^N\Bigg (-m \sum _{i=1}^N \Bigg \langle v_i,\sum _{j\ne i}\frac{ x_i-x_j}{|x_i-x_j|}\Bigg \rangle \Bigg ) \nonumber \\&\quad = -m\!\!\!\sum _{1\le i<j\le N} \frac{|v_i-v_j|^2}{|x_i-x_j|} +m\!\!\!\sum _{1\le i<j\le N} \frac{|\langle v_i-v_j,x_i-x_j\rangle |^2}{|x_i-x_j|^3} \nonumber \\&\qquad +\gamma \!\!\!\sum _{1\le i<j\le N} \frac{\langle v_i-v_j,x_i-x_j\rangle }{|x_i-x_j|} +\sum _{1\le i<j\le N} \frac{\langle \nabla U(x_i) -\nabla U(x_j),x_i-x_j\rangle }{|x_i-x_j|} \nonumber \\&\qquad +\sum _{i=1}^N\Bigg \langle \sum _{j\ne i}\frac{x_i-x_j}{|x_i-x_j|} ,\sum _{\ell \ne i}\nabla G(x_i-x_\ell ) \Bigg \rangle -\sum _{i=1}^N\Bigg \langle \sum _{j\ne i}\frac{x_i-x_j}{|x_i-x_j|} ,\sum _{\ell =1}^{k_i} \lambda _{i,\ell }z_{i,\ell } \Bigg \rangle . \end{aligned}$$
(3.35)

We proceed to estimate the above right-hand side while making use of the Cauchy–Schwarz inequality. It is clear that

$$\begin{aligned} -m\!\!\!\sum _{1\le i<j\le N} \frac{|v_i-v_j|^2}{|x_i-x_j|} +m\!\!\!\sum _{1\le i<j\le N} \frac{|\langle v_i-v_j,x_i-x_j\rangle |^2}{|x_i-x_j|^3}\le 0, \end{aligned}$$

so these two terms may simply be dropped. Also,

$$\begin{aligned}&\gamma \!\!\!\sum _{1\le i<j\le N} \frac{\langle v_i-v_j,x_i-x_j\rangle }{|x_i-x_j|} -\sum _{i=1}^N\Bigg \langle \sum _{j\ne i}\frac{x_i-x_j}{|x_i-x_j|} ,\sum _{\ell =1}^{k_i} \lambda _{i,\ell }z_{i,\ell } \Bigg \rangle \\&\quad \le \gamma (N-1)\sum _{i=1}^N |v_i|+(N-1)\sum _{i=1}^N\sum _{\ell =1}^{k_i} \lambda _{i,\ell }|z_{i,\ell }| . \end{aligned}$$

Concerning the cross terms involving \(\nabla U\), we invoke condition (2.1) and obtain

$$\begin{aligned} \sum _{1\le i<j\le N} \frac{\langle \nabla U(x_i) -\nabla U(x_j),x_i-x_j\rangle }{|x_i-x_j|}&\le (N-1)a_1\Big (N+\sum _{i=1}^N|x_i|^\lambda \Big ). \end{aligned}$$

With regard to the cross terms involving \(\nabla G\), we recast them as follows:

$$\begin{aligned} \sum _{i=1}^N\Bigg \langle \sum _{j\ne i}\frac{x_i-x_j}{|x_i-x_j|} ,\sum _{\ell \ne i}\nabla G(x_i-x_\ell ) \Bigg \rangle&=-a_4\sum _{i=1}^N\Bigg \langle \sum _{j\ne i}\frac{x_i-x_j}{|x_i-x_j|} ,\sum _{\ell \ne i}\frac{x_i-x_\ell }{|x_i-x_\ell |^{\beta _1+1}} \Bigg \rangle \\&\quad + \sum _{i=1}^N\Bigg \langle \sum _{j\ne i}\frac{x_i-x_j}{|x_i-x_j|} ,\sum _{\ell \ne i}\nabla G(x_i-x_\ell )+ a_4\frac{x_i-x_\ell }{|x_i-x_\ell |^{\beta _1+1}} \Bigg \rangle . \end{aligned}$$

In view of Lemma A.1, we readily have

$$\begin{aligned} -a_4\sum _{i=1}^N\Bigg \langle \sum _{j\ne i}\frac{x_i-x_j}{|x_i-x_j|} ,\sum _{\ell \ne i}\frac{x_i-x_\ell }{|x_i-x_\ell |^{\beta _1+1}} \Bigg \rangle \le -2a_4 \sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}. \end{aligned}$$

On the other hand, condition (2.7) implies the bound

$$\begin{aligned}&\sum _{i=1}^N\Bigg \langle \sum _{j\ne i}\frac{x_i-x_j}{|x_i-x_j|} ,\sum _{\ell \ne i}\nabla G(x_i-x_\ell )+ a_4\frac{x_i-x_\ell }{|x_i-x_\ell |^{\beta _1+1}} \Bigg \rangle \\&\quad \le (N-1)\sum _{i=1}^N\sum _{\ell \ne i}\Big |\nabla G(x_i-x_\ell )+ a_4\frac{x_i-x_\ell }{|x_i-x_\ell |^{\beta _1+1}} \Big |\\&\quad \le (N-1)\Bigg [ 2\sum _{1\le i<\ell \le N}\frac{a_5}{|x_i-x_\ell |^{\beta _2}}+N(N-1)a_6\Bigg ]. \end{aligned}$$

In the above, \(a_4,a_5\) and \(a_6\) are the constants as in condition (2.7). Since \(\beta _2\in [0,\beta _1)\), we observe that \(|x_i-x_\ell |^{-\beta _2}\) can be subsumed into \(-|x_i-x_\ell |^{-\beta _1}\). It follows that

$$\begin{aligned} \sum _{i=1}^N\Bigg \langle \sum _{j\ne i}\frac{x_i-x_j}{|x_i-x_j|} ,\sum _{\ell \ne i}\nabla G(x_i-x_\ell ) \Bigg \rangle&\le -a_4 \sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}+C. \end{aligned}$$
(3.36)

From the identity (3.35), we infer the estimate

$$\begin{aligned}&\mathcal {L}_{m,\gamma }^N\Bigg (- \varepsilon m\sum _{i=1}^N \Bigg \langle v_i,\sum _{j\ne i}\frac{ x_i-x_j}{|x_i-x_j|}\Bigg \rangle \Bigg ) \nonumber \\&\quad \le -a_4\varepsilon \!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}+C\varepsilon \Bigg (1+\sum _{i=1}^N |v_i|+ \sum _{i=1}^N |x_i|^\lambda +\sum _{i=1}^N\sum _{\ell =1}^{k_i} |z_{i,\ell }| \Bigg ) . \end{aligned}$$
(3.37)

Now, we collect (3.32), (3.34) and (3.37) together with the expression (3.31) of \(V_N^1\) and deduce

$$\begin{aligned} \mathcal {L}_{m,\gamma }^N V_N^1&\le -\gamma |\textrm{v}|^2-\sum _{i=1}^N \sum _{\ell =1}^{k_i}\alpha _{i,\ell }|z_{i,\ell }|^2 -a_2\varepsilon \sum _{i=1}^N |x_i|^{\lambda +1} \nonumber \\ {}&\quad -a_4\varepsilon \!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}+C\nonumber \\&\quad + C\Bigg ( \varepsilon ^{1/2}|\textrm{v}|^2+\varepsilon ^{1/2}\sum _{i=1}^N|\textrm{z}_i|^2+\varepsilon ^{3/2}|\textrm{x}|^2+\varepsilon \!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1-1}}\Bigg ) \nonumber \\&\quad +C\varepsilon \Bigg (\sum _{i=1}^N |v_i|+ \sum _{i=1}^N |x_i|^\lambda +\sum _{i=1}^N\sum _{\ell =1}^{k_i} |z_{i,\ell }| \Bigg ) . \end{aligned}$$
(3.38)

In the above, we emphasize that C is a positive constant independent of \(\varepsilon \). Finally, by taking \(\varepsilon \) sufficiently small, we may infer

$$\begin{aligned} \mathcal {L}_{m,\gamma }^N V_N^1&\le -\frac{1}{2}\Bigg (\gamma |\textrm{v}|^2+\sum _{i=1}^N \sum _{\ell =1}^{k_i}\alpha _{i,\ell }|z_{i,\ell }|^2 +a_2\varepsilon \sum _{i=1}^N |x_i|^{\lambda +1}\\ {}&\qquad +a_4\varepsilon \!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}\Bigg )+C. \end{aligned}$$

This produces the desired Lyapunov property of \(V_N^1\) for system (1.5) in the case \(\gamma >0\), as claimed. \(\square \)

Turning to the case \(\gamma =0\), analogous to the function \(V_2\) defined in (3.22) for the single-particle system (3.3), for \(\varepsilon \in (0,1)\) and \(R>1\), we introduce the function \(V_N^2\) given by

$$\begin{aligned} V_N^2( \textrm{x},\textrm{v},\textrm{z}_1,\dots ,\textrm{z}_N)&=H_N(\textrm{x},\textrm{v},\textrm{z}_1,\dots ,\textrm{z}_N)+ \varepsilon Rm\langle \textrm{x},\textrm{v}\rangle +\varepsilon R^2m \sum _{i=1}^N \langle v_i,z_{i,1}\rangle \nonumber \\&\quad -\varepsilon \Bigg (\sum _{i=1}^N m\Bigg \langle v_i,\sum _{j\ne i}\frac{ x_i-x_j}{|x_i-x_j|}\Bigg \rangle \Bigg )\sqrt{Q_R^N}, \end{aligned}$$
(3.39)

where

$$\begin{aligned} Q_R^N(\textrm{x},\textrm{v},\textrm{z}_1,\dots ,\textrm{z}_N)&=R^6\sum _{i=1}^N|z_{i,1}|^2+m|\textrm{v}|^2+2\sum _{i=1}^N U(x_i)\nonumber \\ {}&\quad +2\!\!\!\sum _{1\le i<j\le N}\!\!\!G(x_i-x_j)+R^2. \end{aligned}$$
(3.40)

In Lemma 3.6, we prove that \(V_N^2\) is indeed a Lyapunov function for (1.5) in the case \(\gamma =0\).
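Since \(U\ge 0\) and \(G\ge 0\), the weight \(Q_R^N\) dominates \(R^6\sum _i|z_{i,1}|^2+m|\textrm{v}|^2+R^2\); this lower bound is used repeatedly in the proof below. A quick numerical sanity check, with the toy (assumed) choices \(U(x)=|x|^2\), \(G(x)=1/|x|\) and one auxiliary variable per particle:

```python
import numpy as np

# The weight Q_R^N of (3.40) with toy (assumed) nonnegative potentials
# U(x) = |x|^2 and G(x) = 1/|x|, one auxiliary variable per particle.
# Since U, G >= 0, the bound
#   Q_R^N >= R^6 * sum_i |z_{i,1}|^2 + m|v|^2 + R^2
# holds regardless of the configuration.
def Q_RN(x, v, z, m=1.0, R=2.0):
    N = x.shape[0]
    q = R ** 6 * np.sum(z ** 2) + m * np.sum(v ** 2) + R ** 2
    q += 2.0 * np.sum(x ** 2)                        # 2 * sum_i U(x_i)
    for i in range(N):
        for j in range(i + 1, N):
            q += 2.0 / np.linalg.norm(x[i] - x[j])   # 2 * sum G(x_i - x_j)
    return q
```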

Lemma 3.6

Under Assumptions 2.1 and 2.3, let \(V_N^2\) be the function defined in (3.39). In the case \(\gamma = 0\), for every \(m>0\), there exist positive constants \(\varepsilon \) small and R large enough such that \(V_N^2\) is a Lyapunov function for (1.5).

Proof

We first consider the Hamiltonian \(H_N\) as in (1.6). Since \(\gamma =0\), the identity (3.32) is reduced to

$$\begin{aligned} \mathcal {L}_{m,0}^N H_N=-\sum _{i=1}^N \sum _{\ell =1}^{k_i}\alpha _{i,\ell }|z_{i,\ell }|^2+\frac{1}{2}\sum _{i=1}^N\sum _{\ell =1}^{k_i}\alpha _{i,\ell }. \end{aligned}$$
(3.41)

With regard to the cross term \(\langle \textrm{x},\textrm{v}\rangle \), we note that since \(\gamma =0\), by virtue of Assumption 2.1, \(\lambda =1\). We employ an argument similar to that used for the estimate (3.34) and obtain the bound

$$\begin{aligned} \mathcal {L}_{m,0}^N\big ( \varepsilon Rm\langle \textrm{x},\textrm{v}\rangle \big )&\le -a_2\varepsilon R|\textrm{x}|^{2}+ C R\Bigg ( \varepsilon |\textrm{v}|^2+\varepsilon ^{1/2}\sum _{i=1}^N|\textrm{z}_i|^2+\varepsilon ^{3/2}|\textrm{x}|^2+1\Bigg ) \nonumber \\&\quad +C\varepsilon R\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1-1}}. \end{aligned}$$
(3.42)

In the above, C is a positive constant independent of \(\varepsilon \) and R.

Concerning the cross terms \(\langle v_i,z_{i,1}\rangle \), \(i=1,\dots ,N\), on the right-hand side of (3.39), applying Itô’s formula gives

$$\begin{aligned} \mathcal {L}_{m,0}^N \big (m\langle v_i,z_{i,1}\rangle \big )&=\Bigg \langle -\nabla U(x_i)-\sum _{j\ne i}\nabla G(x_i-x_j),z_{i,1}\Bigg \rangle +\sum _{\ell =1}^{k_i}\lambda _{i,\ell }\langle z_{i,\ell }, z_{i,1}\rangle \\&\qquad -\alpha _{i,1}m\langle v_i,z_{i,1}\rangle - \lambda _{i,1}m|v_i|^2. \end{aligned}$$

From the condition (2.1) (\(\lambda =1\)), we have

$$\begin{aligned} -\varepsilon \langle \nabla U(x_i),z_{i,1}\rangle \le a_1 \varepsilon (|x_i|+1)|z_{i,1}|\le C(\varepsilon ^{3/2}|x_i|^2+\varepsilon ^{1/2}|z_{i,1}|^2+\varepsilon ). \end{aligned}$$

In the last estimate above, we employed the Cauchy–Schwarz inequality. Likewise,

$$\begin{aligned} \varepsilon \sum _{\ell =1}^{k_i}\lambda _{i,\ell }\langle z_{i,\ell }, z_{i,1}\rangle -\varepsilon \alpha _{i,1}m\langle v_i,z_{i,1}\rangle \le C\varepsilon ^{1/2}|\textrm{z}_i|^2+C\varepsilon ^{3/2}|v_i|^2. \end{aligned}$$

Also, condition (2.5) implies

$$\begin{aligned} -\sum _{i=1}^N\Bigg \langle \sum _{j\ne i}\nabla G(x_i-x_j),z_{i,1}\Bigg \rangle&\le a_1\sum _{i=1}^N|z_{i,1}|\Bigg (2\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}+N^2\Bigg )\\&\le C\sum _{i=1}^N|z_{i,1}|\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}\\ {}&\quad +C\sum _{i=1}^N|\textrm{z}_{i}|^2+C. \end{aligned}$$

It follows that for \(\varepsilon \in (0,1)\) and \(R>1\),

$$\begin{aligned}&\mathcal {L}_{m,0}^N \Bigg (\varepsilon R^2m\sum _{i=1}^N\langle v_i,z_{i,1}\rangle \Bigg ) \le -\varepsilon R^2m \sum _{i=1}^N \lambda _{i,1}|v_i|^2\nonumber \\ {}&\quad +C R^2 \Bigg ( \varepsilon ^{3/2}( |\textrm{x}|^2+|\textrm{v}|^2)+\varepsilon ^{1/2}\sum _{i=1}^N |\textrm{z}_i|^2+\varepsilon \Bigg ) \nonumber \\&\qquad +C\varepsilon R^2\Bigg (\sum _{i=1}^N|\textrm{z}_{i}|\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}+\sum _{i=1}^N|\textrm{z}_{i}|^2+1\Bigg ). \end{aligned}$$
(3.43)

In the above, we emphasize again that the positive constant C does not depend on \(\varepsilon \) and R.

Next, we consider the last term on the right-hand side of (3.39). Observe that

$$\begin{aligned} \sum _{i=1}^N \Bigg \langle v_i,\sum _{j\ne i}\frac{ x_i-x_j}{|x_i-x_j|}\Bigg \rangle = \sum _{1\le i<j\le N}\!\!\!\frac{\langle v_i-v_j,x_i-x_j\rangle }{|x_i-x_j|}. \end{aligned}$$

So,

$$\begin{aligned}&\mathcal {L}_{m,0}^N \Bigg (-\varepsilon \Bigg (\sum _{i=1}^N m\Bigg \langle v_i,\sum _{j\ne i}\frac{ x_i-x_j}{|x_i-x_j|}\Bigg \rangle \Bigg ) \sqrt{Q_R^N}\Bigg )\nonumber \\&\quad =-\varepsilon \sqrt{Q_R^N}\,\mathcal {L}_{m,0}^N\Bigg ( \sum _{i=1}^N m \Bigg \langle v_i,\sum _{j\ne i}\frac{ x_i-x_j}{|x_i-x_j|}\Bigg \rangle \Bigg )\nonumber \\ {}&\quad -\varepsilon m\!\!\!\sum _{1\le i<j\le N}\!\!\!\frac{\langle v_i-v_j,x_i-x_j\rangle }{|x_i-x_j|} \mathcal {L}_{m,0}^N\sqrt{Q_R^N} \nonumber \\&=I_1+I_2. \end{aligned}$$
(3.44)

Concerning \(I_1\), from the estimate (3.37) (with \(\lambda =1\)), we have

$$\begin{aligned} I_1&= \sqrt{Q_R^N}\,\mathcal {L}_{m,0}^N \Bigg (-\varepsilon \sum _{i=1}^N m \Bigg \langle v_i,\sum _{j\ne i}\frac{ x_i-x_j}{|x_i-x_j|}\Bigg \rangle \Bigg ) \\&\le -a_4\varepsilon \sqrt{Q_R^N}\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}+C\varepsilon \Bigg (1+ |\textrm{v}|+ |\textrm{x}|+\sum _{i=1}^N\sum _{\ell =1}^{k_i} |z_{i,\ell }| \Bigg )\sqrt{Q_R^N} . \end{aligned}$$

Recalling \(Q_R^N\) given by (3.40)

$$\begin{aligned} Q_R^N=R^6\sum _{i=1}^N|z_{i,1}|^2+m|\textrm{v}|^2+2\sum _{i=1}^N U(x_i)+2\!\!\!\sum _{1\le i<j\le N}\!\!\!G(x_i-x_j)+R^2, \end{aligned}$$

it is clear that

$$\begin{aligned} \sqrt{Q_R^N}\ge \Bigg (R^6\sum _{i=1}^N |z_{i,1}|^2+R^2\Bigg )^{1/2}\ge c R^3\sum _{i=1}^N |z_{i,1}|+cR, \end{aligned}$$

whence

$$\begin{aligned}&-a_4\varepsilon \sqrt{Q_R^N}\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} \le \\&\quad -c\varepsilon R^3\sum _{i=1}^N |z_{i,1}|\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} \\&\quad - c\varepsilon R \!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} . \end{aligned}$$

On the other hand, in view of conditions (2.1) and (2.4), it holds that

$$\begin{aligned} \sqrt{Q_R^N} \le C\Bigg (R^3\sum _{i=1}^N |z_{i,1}|+\sqrt{m}|\textrm{v}|+|\textrm{x}|+\Bigg (\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}\Bigg )^{1/2}+R\Bigg ). \end{aligned}$$

It follows that

$$\begin{aligned}&\varepsilon \Bigg (1+|\textrm{v}|+ |\textrm{x}|+\sum _{i=1}^N\sum _{\ell =1}^{k_i} |z_{i,\ell }| \Bigg )\sqrt{Q_R^N} \\&\le C\,\varepsilon \Bigg ( R^6 \sum _{i=1}^N |\textrm{z}_i|^2+|\textrm{v}|^2+|\textrm{x}|^2+\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}+R^2\Bigg ). \end{aligned}$$

As a consequence,

$$\begin{aligned} I_1&\le -a_4\varepsilon \sqrt{Q_R^N}\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}+C\varepsilon \Bigg (1+ |\textrm{v}|+ |\textrm{x}|+\sum _{i=1}^N\sum _{\ell =1}^{k_i} |z_{i,\ell }| \Bigg )\sqrt{Q_R^N} \nonumber \\&\le -c\varepsilon R^3\sum _{i=1}^N |z_{i,1}|\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} - c\varepsilon R \!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} \nonumber \\&\qquad +C\,\varepsilon \Bigg ( R^6 \sum _{i=1}^N |\textrm{z}_i|^2+|\textrm{v}|^2+|\textrm{x}|^2+\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}+R^2\Bigg ), \end{aligned}$$
(3.45)

holds for some positive constants \(c,\, C\) independent of \(\varepsilon \) and R.

With regard to \(I_2\) on the right-hand side of (3.44), Itô’s formula yields the identity

$$\begin{aligned} \mathcal {L}_{m,0}^N\sqrt{Q_R^N}&= \frac{1}{\sqrt{Q_R^N}}\sum _{i=1}^N\Bigg [ \Bigg \langle v_i,\sum _{\ell =1}^{k_i}\lambda _{i,\ell } z_{i,\ell }\Bigg \rangle -R^6\alpha _{i,1} |z_{i,1}|^2-R^6\lambda _{i,1}\langle z_{i,1},v_i\rangle \\&\quad +\tfrac{1}{2}\alpha _{i,1}R^6 d-\frac{R^{12}}{2Q_R^N} \alpha _{i,1}|z_{i,1}|^2\Bigg ]. \end{aligned}$$

Since

$$\begin{aligned} \sum _{i=1}^N U(x_i)+\sum _{1\le i<j\le N}G(x_i-x_j)\ge 0, \end{aligned}$$

we deduce the bound

$$\begin{aligned} |\mathcal {L}_{m,0}^N\sqrt{Q_R^N}| \le C R^6 \frac{|\textrm{v}|\sum _{i=1}^N |\textrm{z}_i|+\sum _{i=1}^N |\textrm{z}_i|^2+1}{(R^6\sum _{i=1}^N |\textrm{z}_i|^2+|\textrm{v}|^2+1)^{1/2}} \le C R^6 \Bigg (\sum _{i=1}^N |\textrm{z}_i|+1\Bigg ). \end{aligned}$$

It follows that \(I_2\) satisfies

$$\begin{aligned}&I_2 =-\varepsilon m\!\!\!\sum _{1\le i<j\le N}\!\!\!\frac{\langle v_i-v_j,x_i-x_j\rangle }{|x_i-x_j|} \mathcal {L}_{m,0}^N\sqrt{Q_R^N}\\&\quad \le C\varepsilon \!\!\!\sum _{1\le i<j\le N}|v_i-v_j| R^6 \Bigg (\sum _{i=1}^N |\textrm{z}_i|+1\Bigg ) \\&\quad \le C\varepsilon R^6\sum _{i=1}^N|v_i| \Bigg (\sum _{i=1}^N |\textrm{z}_i|+1\Bigg ), \end{aligned}$$

whence

$$\begin{aligned} I_2&\le C\varepsilon \Bigg (|\textrm{v}|^2+ R^{12} \sum _{i=1}^N |\textrm{z}_i|^2+ R^{12}\Bigg ). \end{aligned}$$
(3.46)

Now, we collect (3.45), (3.46) together with (3.44) to arrive at the bound

$$\begin{aligned}&\mathcal {L}_{m,0}^N \Bigg (-\varepsilon \Bigg (\sum _{i=1}^N m\Bigg \langle v_i,\sum _{j\ne i}\frac{ x_i-x_j}{|x_i-x_j|}\Bigg \rangle \Bigg ) \sqrt{Q_R^N}\Bigg )\\&\quad \le -c\varepsilon R^3\sum _{i=1}^N |z_{i,1}|\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} - c\varepsilon R \!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} \\&\qquad +C\varepsilon \Bigg ( R^6 \sum _{i=1}^N |\textrm{z}_i|^2+|\textrm{v}|^2+|\textrm{x}|^2+\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}+R^2\Bigg )\\&\qquad + C\varepsilon \Bigg (|\textrm{v}|^2+ R^{12} \sum _{i=1}^N |\textrm{z}_i|^2+ R^{12}\Bigg ), \end{aligned}$$

whence

$$\begin{aligned}&\mathcal {L}_{m,0}^N \Bigg (-\varepsilon \Bigg (\sum _{i=1}^N m\Bigg \langle v_i,\sum _{j\ne i}\frac{ x_i-x_j}{|x_i-x_j|}\Bigg \rangle \Bigg ) \sqrt{Q_R^N}\Bigg ) \nonumber \\&\quad \le -c\varepsilon R^3\sum _{i=1}^N |z_{i,1}|\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} - c\varepsilon R \!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} \nonumber \\&\qquad +C\varepsilon \Bigg ( R^{12} \sum _{i=1}^N |\textrm{z}_i|^2+|\textrm{v}|^2+|\textrm{x}|^2+\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}+R^{12}\Bigg ). \end{aligned}$$
(3.47)

In the above, we emphasize that \(c, C\) are independent of \(\varepsilon \) and R.

Turning back to \(V_N^2\) given by (3.39), from the estimates (3.41), (3.42), (3.43) and (3.47), we obtain

$$\begin{aligned} \mathcal {L}_{m,0}^N V_N^2&\le I_3+I_4, \end{aligned}$$

where

$$\begin{aligned} I_3&= -\sum _{i=1}^N \sum _{\ell =1}^{k_i}\alpha _{i,\ell }|z_{i,\ell }|^2 -a_2\varepsilon R|\textrm{x}|^{2} -\varepsilon R^2 m\sum _{i=1}^N \lambda _{i,1}|v_i|^2\\&\quad -c\varepsilon R^3\sum _{i=1}^N |z_{i,1}|\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} - c\varepsilon R \!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} , \end{aligned}$$

and

$$\begin{aligned} I_4&=\frac{1}{2}\sum _{i=1}^N\sum _{\ell =1}^{k_i}\alpha _{i,\ell } + C R\Bigg ( \varepsilon |\textrm{v}|^2+\varepsilon ^{1/2}\sum _{i=1}^N|\textrm{z}_i|^2+\varepsilon ^{3/2}|\textrm{x}|^2+1\Bigg )\\&\quad +C\varepsilon R^2\Bigg (\sum _{i=1}^N|z_{i,1}|\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}+\sum _{i=1}^N|\textrm{z}_{i}|^2+1\Bigg )\\&\qquad +C\varepsilon R\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1-1}}+C R^2 \Bigg ( \varepsilon ^{3/2}( |\textrm{x}|^2+|\textrm{v}|^2)+\varepsilon ^{1/2}\sum _{i=1}^N |\textrm{z}_i|^2+\varepsilon \Bigg ) \\&\qquad +C\varepsilon \Bigg ( R^{12} \sum _{i=1}^N |\textrm{z}_i|^2+|\textrm{v}|^2+|\textrm{x}|^2+\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}+R^{12}\Bigg ). \end{aligned}$$

Since the constants \(c, C\) are independent of \(\varepsilon , R\), we may infer

$$\begin{aligned} \mathcal {L}_{m,0}^N V_N^2&\le -c\sum _{i=1}^N |\textrm{z}_{i}|^2 -c\varepsilon R|\textrm{x}|^{2} -c\varepsilon R^2 |\textrm{v}|^2\\&\quad -c\varepsilon R^3\sum _{i=1}^N |z_{i,1}|\!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} - c\varepsilon R \!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} \\&\quad +C\varepsilon R(1+R\varepsilon ^{1/2})|\textrm{v}|^2+ C\varepsilon (1+R^2\varepsilon ^{1/2})|\textrm{x}|^2 +C\varepsilon ^{1/2}R^{12}\sum _{i=1}^N |\textrm{z}_i|^2\\&\quad +C\varepsilon R^2\sum _{i=1}^N|z_{i,1}|\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}\!+\!C\varepsilon \!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}}\!+\!CR^{12}. \end{aligned}$$

Now, by first taking R sufficiently large and then shrinking \(\varepsilon \) small enough, we observe that all the positive (non-constant) terms on the above right-hand side are dominated by the negative terms. That is, the following holds

$$\begin{aligned} \mathcal {L}_{m,0}^N V_N^2&\le -c\sum _{i=1}^N |\textrm{z}_{i}|^2 -c\varepsilon R|\textrm{x}|^{2} -c\varepsilon R^2 |\textrm{v}|^2 - c\varepsilon R \!\!\!\sum _{1\le i<j\le N}\frac{1}{|x_i-x_j|^{\beta _1}} +CR^{12}. \end{aligned}$$

This produces the desired Lyapunov property of \(V_N^2\) for (1.5) in the case \(\gamma =0\). The proof is thus finished. \(\square \)

3.3 Proof of Theorem 2.8

The proof of Theorem 2.8 is based on the Lyapunov functions constructed in Sect. 3.2 and a local minorization condition, cf. Definition 3.2, on the transition probabilities \(P^m_t(X,\cdot )\). The latter property is summarized in the following auxiliary result.

Lemma 3.7

Under Assumption 2.1 and Assumption 2.3, system (1.5) satisfies the minorization condition as in Definition 3.2.

The proof of Lemma 3.7 is relatively standard and can be found in the literature on Langevin dynamics (Herzog and Mattingly 2019; Ottobre and Pavliotis 2011). For the sake of completeness, we briefly sketch the argument without going into detail.

Sketch of the proof of Lemma 3.7

First of all, by verifying Hörmander's condition, see Ottobre and Pavliotis (2011, page 1639), we note that the operator \(\partial _t+\mathcal {L}_{m,\gamma }^N\) (\(\gamma \ge 0\)) is hypoelliptic (Hörmander 1967). This implies that the transition probabilities \(P_t^m\) have a smooth density in \(\textbf{X}\). Furthermore, we may adapt the proof of Herzog and Mattingly (2019, Proposition 2.5) to study the control problem for (1.5). In particular, by the Stroock–Varadhan support theorem, it can be shown that \(P_t^m(X,A)>0\) for all \(t>0\), \(X\in \textbf{X}\) and every open set \(A\subset \textbf{X}\). We then employ the same arguments as in Herzog and Mattingly (2019, Corollary 5.12) and Mattingly et al. (2002, Lemma 2.3) to conclude the inequality (3.2), thereby establishing the minorization condition. \(\square \)

We are now in a position to conclude Theorem 2.8. The proof employs the weak-Harris theorem of Hairer and Mattingly (2011, Theorem 1.2).

Proof of Theorem 2.8

First of all, it is clear that \(\pi _N\) defined in (1.7) is an invariant measure for (1.5). Next, we observe from Lemma 3.5 and Lemma 3.6 that the functions \(V^1_N\) and \(V^2_N\), respectively defined in (3.31) and (3.39), are Lyapunov functions for (1.5) in the cases \(\gamma >0\) and \(\gamma =0\). It follows that Hairer and Mattingly (2011, Assumption 1) holds. On the other hand, Lemma 3.7 verifies Hairer and Mattingly (2011, Assumption 2). In view of Hairer and Mattingly (2011, Theorem 1.2), we conclude the uniqueness of \(\pi _N\) as well as the exponential convergence rate (2.16). \(\square \)

4 Small-Mass Limit

We turn to the topic of the small-mass limit for (1.5). In Sect. 4.1, we provide a heuristic argument on how we derive the limiting system (1.10) as \(m\rightarrow 0\). We also outline the main steps of the proof of Theorem 2.10 in this section. In Sect. 4.2, we prove a partial result on the small-mass limit assuming the nonlinearities are globally Lipschitz. In Sect. 4.3, we establish useful moment estimates on the limiting system (1.10). Lastly, in Sect. 4.4, we establish Theorem 2.10 while making use of the auxiliary results from Sect. 4.2 and Sect. 4.3.

4.1 Heuristic Argument for the Limiting System (1.10)

In this subsection, we provide a heuristic argument detailing how we derive the limiting system (1.10) as well as its initial conditions from the original system (1.5). The argument draws upon recent works in Herzog et al. (2016), Nguyen (2018), where similar issues were dealt with in the absence of singular potentials. We formally set \(m=0\) on the left-hand side of the \(v_i-\)equation in (1.5) while replacing the term \(v_i(t)\text {d}t\) on the right-hand side with \(\text {d}x_i(t)\) to obtain

$$\begin{aligned} \gamma \text {d}x_i(t)&=\Bigg [-\nabla U(x_i(t))-\sum _{j\ne i}\nabla G(x_i(t)-x_j(t))+\sum _{\ell =1}^{k_i}\lambda _{i,\ell } z_{i,\ell }(t)\Bigg ]\text {d}t\\ {}&\quad +\sqrt{2\gamma }\,\text {d}W_{i,0}(t). \end{aligned}$$

Next, considering the \(z_{i,\ell }-\)equation in (1.5), by Duhamel’s formula, \(z_{i,\ell }(t)\) may be written as:

$$\begin{aligned} z_{i,\ell }(t)=e^{-\alpha _{i,\ell } t}z_{i,\ell }(0)-\lambda _{i,\ell }\int _0^t e^{-\alpha _{i,\ell }(t-r)}v_i(r)\text {d}r\!+\!\sqrt{2\alpha _{i,\ell }}\int _0^t e^{-\alpha _{i,\ell }(t\!-\!r)}\text {d}W_{i,\ell }(r). \end{aligned}$$
(4.1)

We note that the above expression still depends on \(v_i(t)\). Nevertheless, this can be circumvented by employing an integration by parts as follows:

$$\begin{aligned} \int _0^t e^{-\alpha _{i,\ell }(t-r)}v_i(r)\text {d}r = x_i(t)-e^{-\alpha _{i,\ell } t}x_i(0)-\alpha _{i,\ell }\int _0^t e^{-\alpha _{i,\ell }(t-r)}x_i(r)\text {d}r. \end{aligned}$$

Alternatively, we note that the above identity can be derived by applying Itô’s formula to \(e^{\alpha _{i,\ell } t}x_i(t)\). Plugging back into (4.1), we find

$$\begin{aligned} z_{i,\ell }(t)&=e^{-\alpha _{i,\ell } t}(z_{i,\ell }(0)+\lambda _{i,\ell } x_i(0))-\lambda _{i,\ell } x_i(t)+\lambda _{i,\ell }\alpha _{i,\ell }\int _0^t e^{-\alpha _{i,\ell }(t-r)}x_i(r)\text {d}r \nonumber \\&\quad +\sqrt{2\alpha _{i,\ell }}\int _0^t e^{-\alpha _{i,\ell }(t-r)}\text {d}W_{i,\ell }(r), \end{aligned}$$
(4.2)

whence

$$\begin{aligned} z_{i,\ell }(t)+\lambda _{i,\ell } x_i(t)&=e^{-\alpha _{i,\ell } t}(z_{i,\ell }(0)+\lambda _{i,\ell } x_i(0))+\lambda _{i,\ell }\alpha _{i,\ell }\int _0^t e^{-\alpha _{i,\ell }(t-r)}x_i(r)\text {d}r\\&\quad +\sqrt{2\alpha _{i,\ell }}\int _0^t e^{-\alpha _{i,\ell }(t-r)}\text {d}W_{i,\ell }(r). \end{aligned}$$

Setting \(f_{i,\ell }(t){:}{=}z_{i,\ell }(t)+\lambda _{i,\ell } x_i(t)\), we observe that (using Duhamel’s formula again)

$$\begin{aligned} \text {d}f_{i,\ell }(t)&= -\alpha _{i,\ell } f_{i,\ell }(t)\text {d}t+\lambda _{i,\ell }\, \alpha _{i,\ell }\, x_i(t)\text {d}t+\sqrt{2\alpha _{i,\ell }}\text {d}W_{i,\ell }(t),\quad \ell =1,\dots , k_i,\\ f_{i,\ell }(0)&=z_{i,\ell }(0)+\lambda _{i,\ell } x_i(0). \end{aligned}$$

Together with the identification \(q_i(t){:}{=}x_i(t)\), this yields the limiting system (1.10) as well as the corresponding shifted initial conditions as in Theorem 2.10.
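The change of variables \(f_{i,\ell }=z_{i,\ell }+\lambda _{i,\ell }x_i\) can be sanity-checked numerically. The sketch below switches off the noise and uses hypothetical scalar data \(x(t)=\sin t\), \(v(t)=\cos t\), \(\alpha =1.7\), \(\lambda =0.8\): integrating the \(z\)-equation and forming \(f=z+\lambda x\), the residual of the claimed drift \(-\alpha f+\lambda \alpha x\) vanishes up to discretization error.

```python
import numpy as np

# Deterministic check of f = z + lam*x (noise off): with dx/dt = v and
# dz/dt = -alpha*z - lam*v, the process f should satisfy df/dt = -alpha*f + lam*alpha*x.
alpha, lam = 1.7, 0.8
x, v = np.sin, np.cos   # hypothetical smooth trajectory

def rk4(rhs, y0, ts):
    # classical fourth-order Runge-Kutta integration of dy/dt = rhs(t, y)
    ys = [y0]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h, y = t1 - t0, ys[-1]
        k1 = rhs(t0, y); k2 = rhs(t0 + h/2, y + h/2*k1)
        k3 = rhs(t0 + h/2, y + h/2*k2); k4 = rhs(t1, y + h*k3)
        ys.append(y + h/6*(k1 + 2*k2 + 2*k3 + k4))
    return np.array(ys)

ts = np.linspace(0.0, 2.0, 2001)
z = rk4(lambda t, zz: -alpha*zz - lam*v(t), 0.3, ts)
f = z + lam*x(ts)
# residual of the claimed f-equation, evaluated by finite differences
resid = np.gradient(f, ts) - (-alpha*f + lam*alpha*x(ts))
assert np.max(np.abs(resid[5:-5])) < 1e-3  # f solves the claimed equation
```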

Next, for the reader’s convenience, we summarize the idea of the proof of Theorem 2.10. The argument essentially consists of three steps as follows (Herzog et al. 2016; Lim and Wehr 2019; Lim et al. 2020).

Step 1: We first truncate the nonlinear potentials in (1.5) and (1.10), obtaining truncated systems whose coefficients are globally Lipschitz, and establish convergence in probability in the small-mass limit. This result appears in Proposition 4.1 in Sect. 4.2.

Step 2: Next, we establish an exponential moment bound on any finite-time window for the limiting system (1.10). This is discussed in detail in Sect. 4.3, cf. Lemma 4.3.

Step 3: We prove Theorem 2.10 by removing the Lipschitz constraint from Proposition 4.1 while making use of the energy estimates in Lemma 4.3.

4.2 Truncating (1.5) and (1.10)

For \(R>2\), let \(\theta _R:[0,\infty )\rightarrow \mathbb {R}\) be a smooth function satisfying

$$\begin{aligned} \theta _R(t) = {\left\{ \begin{array}{ll} 1,&{} 0\le t\le R,\\ \text {decreasing},&{} R\le t\le R+1,\\ 0,&{} t\ge R+1. \end{array}\right. } \end{aligned}$$
(4.3)
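For concreteness, one admissible choice of \(\theta _R\) is the standard bump-function construction below (a sketch; the arguments in this section use only the properties listed in (4.3), not any particular formula).

```python
import math

# A concrete C-infinity cutoff realizing (4.3), built from the standard
# bump ingredient psi(s) = exp(-1/s) for s > 0.
def theta_R(t, R):
    def psi(s):
        return math.exp(-1.0 / s) if s > 0 else 0.0
    s = t - R
    if s <= 0.0:
        return 1.0
    if s >= 1.0:
        return 0.0
    return psi(1.0 - s) / (psi(1.0 - s) + psi(s))  # smooth, decreasing on [R, R+1]

R = 3.0
assert theta_R(0.0, R) == 1.0 and theta_R(R, R) == 1.0
assert theta_R(R + 1.0, R) == 0.0 and theta_R(10.0, R) == 0.0
vals = [theta_R(R + k / 10.0, R) for k in range(11)]
assert all(a >= b for a, b in zip(vals, vals[1:]))  # decreasing on [R, R+1]
```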

With the above cut-off \(\theta _R\), we consider a truncating version of (1.5) given by

$$\begin{aligned} \text {d}\, x_i(t)&= v_i(t)\text {d}t,\qquad i=1,\dots ,N, \nonumber \\ m\text {d}\, v_i(t)&= -\gamma v_i(t) \text {d}t -\theta _R(|x_i(t)|)\nabla U(x_i(t))\text {d}t +\sqrt{2\gamma } \,\text {d}W_{i,0}(t)\nonumber \\&\qquad - \sum _{j\ne i}\theta _R\big (|x_i(t)-x_j(t)|^{-1}\big )\nabla G\big (x_i(t)-x_j(t)\big ) \text {d}t +\sum _{\ell =1}^{k_i} \lambda _{i,\ell } z_{i,\ell }(t)\text {d}t, \nonumber \\ \text {d}\, z_{i,\ell }(t)&= -\alpha _{i,\ell } z_{i,\ell } (t)\text {d}t-\lambda _{i,\ell } v_i(t)\text {d}t+\sqrt{2\alpha _{i,\ell } }\,\text {d}W_{i,\ell } (t),\quad \ell =1,\dots ,k_i, \end{aligned}$$
(4.4)

as well as the following truncated version of (1.10)

$$\begin{aligned} \gamma \text {d}q_i(t)&= -\theta _R(|q_i(t)|)\nabla U(q_i(t))\text {d}t- \sum _{j\ne i}\theta _R\big (|q_i(t)-q_j(t)|^{-1}\big )\nabla G\big (q_i(t)-q_j(t)\big ) \text {d}t \nonumber \\&\quad -\sum _{\ell =1}^{k_i} \lambda _{i,\ell }^2 q_i(t)\text {d}t+\sum _{\ell =1}^{k_i}\lambda _{i,\ell } f_{i,\ell }(t)\text {d}t+\sqrt{2\gamma }\text {d}W_{i,0}(t), \nonumber \\ \text {d}f_{i,\ell }(t)&= -\alpha _{i,\ell } f_{i,\ell }(t)\text {d}t+\lambda _{i,\ell }\, \alpha _{i,\ell }\, q_i(t)\text {d}t+\sqrt{2\alpha _{i,\ell }}\text {d}W_{i,\ell }(t),\quad \ell =1,\dots , k_i,\nonumber \\ q_i(0)&= x_i(0),\quad f_{i,\ell }(0)=z_{i,\ell }(0)+\lambda _{i,\ell } x_i(0). \end{aligned}$$
(4.5)

We now show that system (4.4) can be approximated by (4.5) on any finite-time window in the small-mass regime.

Proposition 4.1

Under Assumptions 2.1 and 2.3, given any initial condition \((\textrm{x}(0)\), \(\textrm{v}(0)\), \(\textrm{z}_{1}(0)\),\(\dots \), \(\textrm{z}_N(0))\in \textbf{X}\) and \(R>2\), let \(\big (\textrm{x}_m^R(t)\), \(\textrm{v}_m^R(t)\), \(\textrm{z}_{1,m}^R(t)\),\(\dots \), \(\textrm{z}_{N,m}^R(t)\big )\) and \(\big (\textrm{q}^R(t)\), \(\textrm{f}_{1}^R(t)\),\(\dots \), \(\textrm{f}_{N}^R(t)\big )\) respectively solve (4.4) and (4.5). Then, for every \(T>0\), the following holds

$$\begin{aligned} \mathbb {E}\Bigg [\sup _{0\le t\le T}\big |\textrm{x}_m^R(t)-\textrm{q}^R(t)\big |^4\Bigg ]\le m\cdot C,\quad \text {as}\quad m\rightarrow 0, \end{aligned}$$
(4.6)

for some positive constant \(C=C(T,R)\) independent of m.
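The statement of Proposition 4.1 can be previewed in a hypothetical linear special case (one particle in \(d=1\), \(U(x)=kx^2/2\), no interaction, no memory, no noise), where the truncation is inactive and both systems solve in closed form; the deviation between \(m x''=-\gamma x'-kx\) and its overdamped limit \(\gamma q'=-kq\) is then visibly of order \(m\).

```python
import numpy as np

# Linear illustration of the small-mass limit: exact solutions of
# m x'' + gamma x' + k x = 0 (x(0)=1, x'(0)=0) versus gamma q' = -k q.
gamma, k = 1.0, 1.0
t = np.linspace(0.0, 5.0, 2001)

def x_m(m):
    s1, s2 = np.roots([m, gamma, k])      # characteristic roots
    c2 = s1 / (s1 - s2); c1 = 1.0 - c2    # match x(0)=1, x'(0)=0
    return (c1 * np.exp(s1 * t) + c2 * np.exp(s2 * t)).real

q = np.exp(-k * t / gamma)                # overdamped limit, q(0)=x(0)
err = {m: np.max(np.abs(x_m(m) - q)) for m in (1e-2, 1e-3)}
assert err[1e-2] < 0.05                   # already small at m = 0.01
assert err[1e-3] < 0.5 * err[1e-2]        # and shrinking with m
```

The deviation is dominated by the initial boundary layer of width \(O(m/\gamma )\), mirroring the role of the velocity bound in Lemma 4.2 below.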

In order to establish Proposition 4.1, it is crucial to derive useful moment bounds on the velocity process \(\textrm{v}_m^R(t)\). More precisely, we have the following result.

Lemma 4.2

Under Assumptions 2.1 and 2.3, given any initial condition \((\textrm{x}(0)\), \(\textrm{v}(0)\), \(\textrm{z}_{1}(0)\),\(\dots \), \(\textrm{z}_N(0))\in \textbf{X}\) and \(R>2\), let \(\big (\textrm{x}_m^R(t)\), \(\textrm{v}_m^R(t)\), \(\textrm{z}_{1,m}^R(t)\),\(\dots \), \(\textrm{z}_{N,m}^R(t)\big )\) be the solution of (4.4). Then, for every \(T>0\), \(n>1\) and \(\varepsilon >0\), it holds that

$$\begin{aligned} m^{n}\mathbb {E}\Big [\sup _{0\le t\le T}\big |\textrm{v}_m^R(t)\big |^n\Big ]\le m^{\frac{n}{2}-\varepsilon } C,\quad \text {as}\quad m\rightarrow 0, \end{aligned}$$
(4.7)

for some positive constant \(C=C(T,n,R,\varepsilon )\) independent of m.

For the sake of clarity, the proof of Lemma 4.2 is deferred to the end of this subsection. In what follows, we assume Lemma 4.2 holds and prove Proposition 4.1. The argument is adapted from the proof of Nguyen (2018, Proposition 9), tailored to our setting.

Proof of Proposition 4.1

From the \((x_i,v_i)-\)equations in (4.4), we find

$$\begin{aligned} m\text {d}v_i^R(t)+\gamma \text {d}x_i^R(t)&= -\theta _R(|x_i^R(t)|)\nabla U(x_i^R(t))\text {d}t +\sqrt{2\gamma } \,\text {d}W_{i,0}(t)\\&\quad - \sum _{j\ne i}\theta _R\big (|x_i^R(t)-x_j^R(t)|^{-1}\big )\nabla G\big (x_i^R(t)-x_j^R(t)\big ) \text {d}t \\&\quad +\sum _{\ell =1}^{k_i} \lambda _{i,\ell } z_{i,\ell }^R(t)\text {d}t. \end{aligned}$$

Substituting \(z_{i,\ell }^R(t)\) by the expression (4.2) into the above equation produces

$$\begin{aligned}&m\text {d}v_i^R(t)+\gamma \text {d}x_i^R(t) \nonumber \\&\quad = -\theta _R(|x_i^R(t)|)\nabla U(x_i^R(t))\text {d}t \nonumber \\&\qquad +\sqrt{2\gamma } \,\text {d}W_{i,0}(t) - \sum _{j\ne i}\theta _R\big (|x_i^R(t)-x_j^R(t)|^{-1}\big )\nabla G\big (x_i^R(t)-x_j^R(t)\big ) \text {d}t \nonumber \\&\qquad +\sum _{\ell =1}^{k_i} \lambda _{i,\ell }e^{-\alpha _{i,\ell } t}\big [z_{i,\ell }(0)+ \lambda _{i,\ell } x_i(0)\big ]\text {d}t \nonumber \\&\qquad -\Bigg (\sum _{\ell =1}^{k_i}\lambda _{i,\ell }^2\Bigg ) x_i^R(t)\text {d}t \nonumber \\&\qquad +\sum _{\ell =1}^{k_i}\lambda _{i,\ell }^2\alpha _{i,\ell }\int _0^t e^{-\alpha _{i,\ell }(t-r)}x_i^R(r)\text {d}r\, \text {d}t \nonumber \\&\qquad +\sum _{\ell =1}^{k_i}\lambda _{i,\ell }\sqrt{2\alpha _{i,\ell }}\int _0^t e^{-\alpha _{i,\ell }(t-r)}\text {d}W_{i,\ell }(r)\, \text {d}t. \end{aligned}$$
(4.8)

Similarly, from the \(f_{i,\ell }-\)equation of system (4.5), we have

$$\begin{aligned} f_{i,\ell }^R(t)&=e^{-\alpha _{i,\ell } t}(z_{i,\ell }(0)+\lambda _{i,\ell } x_i(0))+\lambda _{i,\ell }\alpha _{i,\ell }\int _0^t e^{-\alpha _{i,\ell }(t-r)}q_i^R(r)\text {d}r\\&\qquad \qquad \qquad +\sqrt{2\alpha _{i,\ell }}\int _0^t e^{-\alpha _{i,\ell }(t-r)}\text {d}W_{i,\ell }(r). \end{aligned}$$

Plugging into the \(q_i-\)equation of (4.5) yields

$$\begin{aligned}&\gamma \text {d}q_i^R(t) \nonumber \\&\quad = -\theta _R(|q_i^R(t)|)\nabla U(q_i^R(t))\text {d}t \nonumber \\&\qquad +\sqrt{2\gamma } \,\text {d}W_{i,0}(t) - \sum _{j\ne i}\theta _R\big (|q_i^R(t)-q_j^R(t)|^{-1}\big )\nabla G\big (q_i^R(t)-q_j^R(t)\big ) \text {d}t \nonumber \\&\qquad +\sum _{\ell =1}^{k_i} \lambda _{i,\ell }e^{-\alpha _{i,\ell } t}\big [z_{i,\ell }(0)+ \lambda _{i,\ell } x_i(0)\big ]\text {d}t-\Bigg (\sum _{\ell =1}^{k_i}\lambda _{i,\ell }^2\Bigg ) q_i^R(t)\text {d}t \nonumber \\&\qquad +\sum _{\ell =1}^{k_i}\lambda _{i,\ell }^2\alpha _{i,\ell }\int _0^t e^{-\alpha _{i,\ell }(t-r)}q_i^R(r)\text {d}r\, \text {d}t \nonumber \\&\qquad +\sum _{\ell =1}^{k_i}\lambda _{i,\ell }\sqrt{2\alpha _{i,\ell }}\int _0^t e^{-\alpha _{i,\ell }(t-r)}\text {d}W_{i,\ell }(r)\, \text {d}t. \end{aligned}$$
(4.9)

Setting \(\bar{x}_i^R=x_i^R-q_i^R\), we subtract (4.9) from (4.8) to obtain the identity

$$\begin{aligned} m\text {d}v_i^R(t)+\gamma \text {d}\bar{x}_i^R(t)&= -\Big [\theta _R(|x_i^R(t)|)\nabla U(x_i^R(t))-\theta _R(|q_i^R(t)|)\nabla U(q_i^R(t))\Big ]\text {d}t \nonumber \\&\qquad - \sum _{j\ne i}\Big [\theta _R\big (|x_i^R(t)-x_j^R(t)|^{-1}\big )\nabla G\big (x_i^R(t)-x_j^R(t)\big )\nonumber \\&\qquad \qquad -\theta _R\big (|q_i^R(t)-q_j^R(t)|^{-1}\big )\nabla G\big (q_i^R(t)-q_j^R(t)\big )\Big ] \text {d}t \nonumber \\&\qquad -\Big (\sum _{\ell =1}^{k_i}\lambda _{i,\ell }^2\Big ) \bar{x}_i^R(t)\text {d}t+\sum _{\ell =1}^{k_i}\lambda _{i,\ell }^2\alpha _{i,\ell }\int _0^t e^{-\alpha _{i,\ell }(t-r)}\bar{x}_i^R(r)\text {d}r\,\text {d}t . \end{aligned}$$
(4.10)

By the choice of \(\theta _R\) as in (4.3) and conditions (2.2) and (2.6), we invoke the mean value theorem to infer

$$\begin{aligned} |\theta _R(|x|)\nabla U(x)-\theta _R(|y|)\nabla U(y)|\le C|x-y|,\quad x,y\in \mathbb {R}^d, \end{aligned}$$

and

$$\begin{aligned} |\theta _R\big (|x|^{-1}\big )\nabla G(x)-\theta _R\big (|y|^{-1}\big )\nabla G(y)|\le C|x-y|,\quad x,y\in \mathbb {R}^d\setminus \{0\}, \end{aligned}$$

for some positive constant \(C=C(R)\). Integrating (4.10) on \([0,t]\) and combining with the two Lipschitz estimates above, we arrive at the a.s. bound

$$\begin{aligned} |\bar{x}_i^R(t)|^n\le C m^n|v_i^R(t)-v_i(0)|^n+C\int _0^t \sum _{j=1}^N|\bar{x}_j^R(r)|^n\text {d}r, \end{aligned}$$

whence

$$\begin{aligned} |\bar{\textrm{x}}^R(t)|^n\le C m^n|\textrm{v}^R(t) -\textrm{v}(0)|^n+C\int _0^t |\bar{\textrm{x}}^R(r)|^n\text {d}r. \end{aligned}$$

Gronwall’s inequality implies that

$$\begin{aligned} \mathbb {E}\sup _{t\in [0,T]}|\bar{\textrm{x}}^R(t)|^n \le C m^n\mathbb {E}\sup _{t\in [0,T]}|\textrm{v}^R(t)-\textrm{v}(0)|^n. \end{aligned}$$

Setting \(n=4\), by virtue of Lemma 4.2, this produces the small-mass limit (4.6), as claimed. \(\square \)
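The Grönwall step above can be illustrated by a discrete sketch with hypothetical constants: saturating \(u_n\le a+Ch\sum _{k<n}u_k\) at every step keeps \(u_n\) below the exponential envelope \(ae^{Cnh}\).

```python
import math

# Discrete Gronwall sanity check with hypothetical constants a, C, step h:
# if u_n <= a + C*h*sum_{k<n} u_k, then u_n <= a*exp(C*n*h).
a, C, h, N = 2.0, 1.5, 1e-3, 5000
u, S = [], 0.0
for n in range(N):
    un = a + C * h * S          # equality: the extremal case of the inequality
    u.append(un)
    S += un                     # running partial sum of u_0, ..., u_n
for n, un in enumerate(u):
    assert un <= a * math.exp(C * n * h) + 1e-12  # stays below the envelope
```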

We now turn to the proof of Lemma 4.2.

Proof of Lemma 4.2

From the \(v_i-\)equation in (4.4), the variation-of-constants formula yields

$$\begin{aligned} mv_i^R(t)&=me^{-\frac{\gamma }{m}t}v_i(0)-\int _0^t e^{-\frac{\gamma }{m}(t-r)}\theta _R(|x_i^R(r)|)\nabla U(x_i^R(r))\text {d}r \nonumber \\&\qquad -\int _0^t e^{-\frac{\gamma }{m}(t-r)}\sum _{j\ne i}\Big [\theta _R\big (|x_i^R(r)-x_j^R(r)|^{-1}\big ) \nabla G(x_i^R(r)-x_j^R(r))\Big ]\text {d}r \nonumber \\&\qquad +\sum _{\ell =1}^{k_i}\lambda _{i,\ell }\int _0^t e^{-\frac{\gamma }{m}(t-r)}z_{i,\ell }^R(r)\text {d}r+\sqrt{2\gamma }\int _0^t e^{-\frac{\gamma }{m}(t-r)}\text {d}W_{i,0}(r). \end{aligned}$$
(4.11)

With regard to the integral involving \(z_{i,\ell }\), since \(z_{i,\ell }^R\) satisfies the third equation in (4.4), we have

$$\begin{aligned} z_{i,\ell }^R(t)=e^{-\alpha _{i,\!\ell } t}z_{i,\ell }(0)\!-\!\lambda _{i,\ell }\!\int _0^t e^{-\alpha _{i,\!\ell }(t-r)}v_i^R(r)\text {d}r+\sqrt{2\alpha _{i,\ell }}\!\int _0^t e^{-\alpha _{i,\ell }(t-r)}\text {d}W_{i,\ell }(r). \end{aligned}$$

Plugging back into (4.11), we obtain the identity

$$\begin{aligned} mv_i^R(t)&=me^{-\frac{\gamma }{m}t}v_i(0)-\int _0^t e^{-\frac{\gamma }{m}(t-r)}\theta _R(|x_i^R(r)|)\nabla U(x_i^R(r))\text {d}r \nonumber \\&\qquad -\int _0^t e^{-\frac{\gamma }{m}(t-r)}\sum _{j\ne i}\Big [\theta _R\big (|x_i^R(r)-x_j^R(r)|^{-1}\big ) \nabla G(x_i^R(r)-x_j^R(r))\Big ]\text {d}r \nonumber \\&\qquad +\sum _{\ell =1}^{k_i}\lambda _{i,\ell }\int _0^t e^{-\frac{\gamma }{m}(t-r)}e^{-\alpha _{i,\ell } r}z_{i,\ell }(0)\text {d}r\nonumber \\&\qquad -\sum _{\ell =1}^{k_i}\lambda _{i,\ell }^2\int _0^t e^{-\frac{\gamma }{m}(t-r)}\int _0^r e^{-\alpha _{i,\ell }(r-s)}v_i^R(s)\text {d}s \text {d}r \nonumber \\&\qquad +\sum _{\ell =1}^{k_i}\lambda _{i,\ell }\sqrt{2\alpha _{i,\ell }}\int _0^t e^{-\frac{\gamma }{m}(t-r)}\int _0^r e^{-\alpha _{i,\ell }(r-s)}\text {d}W_{i,\ell }(s)\text {d}r\nonumber \\&\qquad + \sqrt{2\gamma }\int _0^t e^{-\frac{\gamma }{m}(t-r)}\text {d}W_{i,0}(r)\nonumber \\&= me^{-\frac{\gamma }{m}t}v_i(0)-I_1-I_2+I_3-I_4+I_5+I_6. \end{aligned}$$
(4.12)

Concerning \(I_1\), we invoke condition (2.1) while making use of the choice of \(\theta _R\) as in (4.3) to obtain

$$\begin{aligned} |I_1|\le C\int _0^t e^{-\frac{\gamma }{m}(t-r)}\text {d}r\le m C, \end{aligned}$$

for some positive constant \(C=C(R)\) independent of m. Likewise, we employ condition (2.5) to infer

$$\begin{aligned} |I_2|\le C\int _0^t e^{-\frac{\gamma }{m}(t-r)}\text {d}r\le m C. \end{aligned}$$

Similarly, it is clear that

$$\begin{aligned} |I_3| \le m\sum _{\ell =1}^{k_i}\lambda _{i,\ell }|z_{i,\ell }(0)|. \end{aligned}$$

Concerning \(I_4\), observe that

$$\begin{aligned} |I_4|\le \sum _{\ell =1}^{k_i}\lambda _{i,\ell }^2\int _0^t e^{-\frac{\gamma }{m}(t-r)}\text {d}r\int _0^t \sup _{s\in [0,r]}|v_i^R(s)| \text {d}r\le m\sum _{\ell =1}^{k_i}\lambda _{i,\ell }^2\int _0^t \sup _{s\in [0,r]}|v_i^R(s)| \text {d}r. \end{aligned}$$

With regard to the noise term \(I_5\), note that

$$\begin{aligned}&\Big |\int _0^t e^{-\frac{\gamma }{m}(t-r)}\int _0^r e^{-\alpha _{i,\ell }(r-s)}\text {d}W_{i,\ell }(s)\text {d}r\Big |^n \\&\quad \le \Big |\int _0^t e^{-\frac{\gamma }{m}(t-r)}\text {d}r\Big |^n \sup _{r\in [0,T]}\Big |\int _0^r e^{-\alpha _{i,\ell }(r-s)}\text {d}W_{i,\ell }(s)\Big |^n\\&\quad \le \frac{m^n}{\gamma ^n} \sup _{r\in [0,T]}\Big |\int _0^r e^{-\alpha _{i,\ell }(r-s)}\text {d}W_{i,\ell }(s)\Big |^n. \end{aligned}$$

We employ Burkholder’s inequality to infer

$$\begin{aligned}&\mathbb {E}\sup _{t\in [0,T]}\Big |\int _0^t e^{-\frac{\gamma }{m}(t-r)}\int _0^r e^{-\alpha _{i,\ell }(r-s)}\text {d}W_{i,\ell }(s)\text {d}r\Big |^n\\&\quad \le \frac{m^n}{\gamma ^n}\mathbb {E}\sup _{t\in [0,T]}\Big |\int _0^t e^{\alpha _{i,\ell }s}\text {d}W_{i,\ell }(s)\Big |^n \le m^n C(T,n). \end{aligned}$$

Turning to \(I_6\), we set

$$\begin{aligned} Y_{i,0}(t){:}{=}\int _0^t e^{-\beta (t-r)}\text {d}W_{i,0}(r),\quad \beta {:}{=}\frac{\gamma }{m}, \end{aligned}$$

and let \(Z_{i,0}\sim N(0,1)\) be a random variable independent of \(W_{i,0}(t)\). Then, the process

$$\begin{aligned} X_{i,0}(t){:}{=}Z_{i,0}e^{-\beta t}+\sqrt{2\beta }Y_{i,0}(t) \end{aligned}$$

is a stationary solution to

$$\begin{aligned} \text {d}X_{i,0}(t)=-\beta X_{i,0}(t)\text {d}t+\sqrt{2\beta }\text {d}W_{i,0}(t),\quad X_{i,0}(0)=Z_{i,0}. \end{aligned}$$
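The stationarity is elementary to verify at the level of variances: by the Itô isometry, \(\operatorname {Var}X_{i,0}(t)=e^{-2\beta t}+2\beta \int _0^t e^{-2\beta (t-r)}\text {d}r=1\) for every \(t\). A numerical sketch of this bookkeeping, with a hypothetical \(\beta =5\):

```python
import math

# Var X(t) = e^{-2*beta*t} * Var Z + 2*beta * int_0^t e^{-2*beta*(t-r)} dr = 1.
# The integral is evaluated by midpoint quadrature.
beta = 5.0
for t in (0.1, 1.0, 7.3):
    n = 100000
    h = t / n
    ito = 2.0 * beta * sum(math.exp(-2.0 * beta * (t - (i + 0.5) * h)) * h
                           for i in range(n))
    var = math.exp(-2.0 * beta * t) + ito
    assert abs(var - 1.0) < 1e-3   # unit variance at every time
```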

For \(n\ge 1\), it holds by the definition of \(X_{i,0}(t)\) that

$$\begin{aligned} \mathbb {E}\sup _{t\in [0,T]}|Y_{i,0}(t)|^n \le C\beta ^{-\frac{n}{2}}\bigg ( 1+\mathbb {E}\sup _{t\in [0,T]}|X_{i,0}(t)|^n \bigg ). \end{aligned}$$

By Pavliotis et al. (2022, Lemma B.1), it holds for all \(n>1\) that

$$\begin{aligned} \mathbb {E}\sup _{t\in [0,T]}|X_{i,0}(t)|^n\le C\big ( 1+\log (1+\beta T) \big )^{n/2}. \end{aligned}$$

As a consequence, for all \(\varepsilon >0\)

$$\begin{aligned} \beta ^{\frac{n}{2}-\varepsilon }\mathbb {E}\sup _{t\in [0,T]}|Y_{i,0}(t)|^n <C(T,n). \end{aligned}$$

In other words,

$$\begin{aligned} \mathbb {E}\sup _{t\in [0,T]}|Y_{i,0}(t)|^n < m^{\frac{n}{2}-\varepsilon } C(T,n). \end{aligned}$$

Now we collect the above estimates together with expression (4.12) to deduce that for all \(n>1\) and \(\varepsilon >0\)

$$\begin{aligned}&m^n\mathbb {E}\sup _{t\in [0,T]}|\textrm{v}^R(t)|^n\\&\quad \le m^n C\Big ( |\textrm{v}(0)|^n+\sum _{i=1}^N|\textrm{z}_i^R(0)|^n+ 1 \Big )+m^{\frac{n}{2}-\varepsilon }C+C m^n \int _0^T \mathbb {E}\sup _{r\in [0,t]}|\textrm{v}^R(r)|^n\text {d}t, \end{aligned}$$

holds for some positive constant \(C=C(T,n,R,\varepsilon )\) independent of m. In view of Gronwall’s inequality, for all \(n>1\), we arrive at (4.7), as claimed. \(\square \)

4.3 Estimates on (1.10)

In this subsection, we provide several energy estimates on the limiting system (1.10) on any finite-time window. More precisely, we have the following result.

Lemma 4.3

Under Assumptions 2.1, 2.3 and 2.5, for all \((\textrm{x}(0),\textrm{z}_{1}(0)\),..., \(\textrm{z}_N(0))\in \mathcal {D}\times (\mathbb {R}^d)^{N}\), let \(Q(t)=\big (\textrm{q}(t)\), \(\textrm{f}_{1}(t)\),\(\dots \), \(\textrm{f}_{N}(t)\big )\) be the solution of (1.10) and \(\beta _1\) be the constant as in Assumptions 2.3 and 2.5. For all \(\varepsilon ,\kappa >0\) sufficiently small and \(T>0\), the following hold:

(a) If \(\beta _1>1\),

$$\begin{aligned} \mathbb {E}\Bigg [\exp \Bigg \{\sup _{t\in [0,T]}\Bigg (\kappa |\textrm{q}(t)|^2+\kappa \varepsilon \!\!\!\sum _{1\le i<j\le N} \frac{1}{|q_i(t)-q_j(t)|^{\beta _1-1}}\Bigg )\Bigg \}\Bigg ]\le C, \end{aligned}$$
(4.13)

for some positive constant \(C=C(\kappa ,\varepsilon , T,\textrm{x}(0),\textrm{z}_{1}(0),\dots ,\textrm{z}_N(0))\).

(b) Otherwise, if \(\beta _1=1\),

$$\begin{aligned} \mathbb {E}\Bigg [\exp \Bigg \{\sup _{t\in [0,T]}\Bigg (\kappa |\textrm{q}(t)|^2-\kappa \varepsilon \!\!\!\sum _{1\le i<j\le N}\!\!\!\log |q_i(t)-q_j(t)|\Bigg )\Bigg \}\Bigg ]\le C. \end{aligned}$$
(4.14)

The proof of Lemma 4.3 relies on two ingredients: Lyapunov functions specifically designed for (1.10) and the exponential martingale inequality. Later, in Sect. 4.4, we will exploit Lemma 4.3 to remove the Lipschitz constraint in Proposition 4.1 so as to conclude the main Theorem 2.10.

Proof of Lemma 4.3

(a) Suppose that \(\beta _1>1\). For \(\varepsilon >0\), we consider the function \(\Gamma _1\) given by

$$\begin{aligned} \Gamma _1(\textrm{q},\textrm{f}_1,\dots ,\textrm{f}_N)&= \frac{1}{2}\gamma |\textrm{q}|^2 +\frac{1}{2}\sum _{i=1}^N \sum _{\ell =1}^{k_i}\frac{1}{\alpha _{i,\ell }}|f_{i,\ell }|^2 +\varepsilon \,\gamma \!\!\!\sum _{1\le i<j\le N}\frac{1}{ |q_i-q_j|^{\beta _1-1}}. \end{aligned}$$
(4.15)
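To see what \(\Gamma _1\) penalizes, the following sketch evaluates it for a hypothetical two-particle configuration in \(d=1\) with one auxiliary mode per particle (\(\alpha _{i,\ell }=\lambda _{i,\ell }=1\), \(\gamma =1\), \(\varepsilon =0.1\), \(\beta _1=3\)): the function blows up both as particles collide and as they escape to infinity, which is exactly why a bound such as (4.13) controls both regimes.

```python
import numpy as np

# Gamma_1 from (4.15) for a toy configuration: quadratic growth in (q, f)
# plus a singular pair term that diverges at collisions.
gamma, eps, beta1 = 1.0, 0.1, 3.0

def Gamma1(q, f):
    q = np.asarray(q, float); f = np.asarray(f, float)
    pair = sum(1.0 / abs(q[i] - q[j]) ** (beta1 - 1)
               for i in range(len(q)) for j in range(i + 1, len(q)))
    return 0.5 * gamma * np.sum(q ** 2) + 0.5 * np.sum(f ** 2) + eps * gamma * pair

f = [0.0, 0.0]
wide  = Gamma1([0.0, 1.0], f)     # well-separated, moderate positions
close = Gamma1([0.0, 1e-3], f)    # near-collision
far   = Gamma1([0.0, 1e3], f)     # escape to infinity
assert close > wide and far > wide  # Gamma_1 blows up in both singular regimes
```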

We aim to show that \(\Gamma _1(t)\) satisfies a suitable energy estimate, allowing us to establish the moment bound (4.13) in the supremum norm.

From (1.10), we employ Itô’s formula to obtain

$$\begin{aligned}&\text {d}\Bigg (\frac{1}{2}\gamma |\textrm{q}(t)|^2 +\frac{1}{2}\sum _{i=1}^N \sum _{\ell =1}^{k_i}\frac{1}{\alpha _{i,\ell }}|f_{i,\ell }(t)|^2\Bigg ) \\&\quad = - \sum _{i=1}^N \Bigg (\sum _{\ell =1}^{k_i}\lambda _{i,\ell }^2\Bigg )|q_i(t)|^2\text {d}t-\sum _{i=1}^N \langle \nabla U(q_i(t)),q_i(t)\rangle \text {d}t +N\gamma ^2d\, \text {d}t\\&\qquad -\!\!\!\sum _{1\le i<j\le N}\!\!\!\langle \nabla G(q_i(t)-q_j(t)),q_i(t)-q_j(t)\rangle \text {d}t +\sum _{i=1}^N\sqrt{2\gamma }\langle q_i(t),\text {d}W_{i,0}(t)\rangle \\&\qquad +\sum _{i=1}^N \sum _{\ell =1}^{k_i}\Big (-|f_{i,\ell }(t)|^2\text {d}t+ \alpha _{i,\ell }^2d\,\text {d}t+ \sqrt{2\alpha _{i,\ell }}\langle f_{i,\ell }(t),\text {d}W_{i,\ell }(t)\rangle \Big ). \end{aligned}$$

Recalling condition (2.3), we readily have

$$\begin{aligned} -\sum _{i=1}^N \langle \nabla U(q_i(t)),q_i(t)\rangle \le -a_2\sum _{i=1}^N|q_i(t)|^{\lambda +1}+Na_3 . \end{aligned}$$

To bound the cross terms involving G, we invoke (2.5) and obtain

$$\begin{aligned}&-\!\!\!\sum _{1\le i<j\le N}\!\!\!\langle \nabla G(q_i(t)-q_j(t)),q_i(t)-q_j(t)\rangle \\ {}&\quad \le a_1\!\!\!\sum _{1\le i<j\le N}\!\!\!\Bigg (|q_i(t)-q_j(t)|+\frac{1}{|q_i(t)-q_j(t)|^{\beta _1-1}}\Bigg ). \end{aligned}$$

Since \(\sum _{1\le i<j\le N}|q_i-q_j|\) can be subsumed into \(-|\textrm{q}|^2\), we infer from the above estimates that

$$\begin{aligned}&\text {d}\Bigg (\frac{1}{2}\gamma |\textrm{q}(t)|^2 +\frac{1}{2}\sum _{i=1}^N \sum _{\ell =1}^{k_i}\frac{1}{\alpha _{i,\ell }}|f_{i,\ell }(t)|^2\Bigg ) \nonumber \\&\quad \le - c|\textrm{q}(t)|^2\text {d}t-c|\textrm{q}(t)|^{\lambda +1}\text {d}t-\sum _{i=1}^N |\textrm{f}_i(t)|^2\text {d}t +C \text {d}t+ a_1\!\!\!\sum _{1\le i<j\le N}\frac{1}{|q_i-q_j|^{\beta _1-1}}\text {d}t \nonumber \\&\qquad +\sum _{i=1}^N\sqrt{2\gamma }\langle q_i(t),\text {d}W_{i,0}(t)\rangle +\sum _{i=1}^N \sum _{\ell =1}^{k_i} \sqrt{2\alpha _{i,\ell }}\langle f_{i,\ell }(t),\text {d}W_{i,\ell }(t)\rangle , \end{aligned}$$
(4.16)

for some positive constants \(c,\,C\) independent of t.

Turning to the last term on the right-hand side of (4.15), a routine computation gives

$$\begin{aligned}&\text {d}\Bigg ( \varepsilon \,\gamma \!\!\!\sum _{1\le i<j\le N}\frac{1}{|q_i(t)-q_j(t)|^{\beta _1-1}}\Bigg ) \nonumber \\&\quad = -\varepsilon (\beta _1-1)\sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{\beta _1+1}} , -\nabla U(q_i(t))\text {d}t- \sum _{\ell \ne i}\nabla G\big (q_i(t)-q_\ell (t)\big ) \text {d}t \nonumber \\&\qquad -\sum _{\ell =1}^{k_i} \lambda _{i,\ell }^2 q_i(t)\text {d}t+\sum _{\ell =1}^{k_i}\lambda _{i,\ell } f_{i,\ell }(t)\text {d}t+\sqrt{2\gamma }\text {d}W_{i,0}(t) \Bigg \rangle \nonumber \\&\qquad -\varepsilon (\beta _1-1)(\beta _1+1-d)\!\!\!\sum _{1\le i<j\le N} \frac{1}{|q_i(t)-q_j(t)|^{\beta _1+1}}\text {d}t. \end{aligned}$$
(4.17)
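The factor \((\beta _1-1)(\beta _1+1-d)\) in the Itô-correction term of (4.17) stems from the radial Laplacian identity \(\Delta |x|^{1-\beta _1}=(\beta _1-1)(\beta _1+1-d)|x|^{-\beta _1-1}\) in \(\mathbb {R}^d\). A finite-difference sketch, with an arbitrary test exponent \(\beta _1=2.5\):

```python
# For a radial function g(|x|) in R^d, the Laplacian is g''(r) + (d-1)*g'(r)/r.
def radial_laplacian(g, r, d, h=1e-5):
    g2 = (g(r + h) - 2.0 * g(r) + g(r - h)) / h**2   # second derivative
    g1 = (g(r + h) - g(r - h)) / (2.0 * h)           # first derivative
    return g2 + (d - 1) * g1 / r

beta1 = 2.5  # hypothetical test exponent
for d in (1, 2, 3):
    for r in (0.5, 1.0, 2.0):
        lhs = radial_laplacian(lambda s: s ** (1.0 - beta1), r, d)
        rhs = (beta1 - 1.0) * (beta1 + 1.0 - d) * r ** (-beta1 - 1.0)
        assert abs(lhs - rhs) < 1e-4
```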

Using Cauchy–Schwarz inequality, it is clear that

$$\begin{aligned}&-\varepsilon (\beta _1-1)\sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{\beta _1+1}} , -\sum _{\ell =1}^{k_i} \lambda _{i,\ell }^2 q_i(t)+\sum _{\ell =1}^{k_i}\lambda _{i,\ell } f_{i,\ell }(t)\Bigg \rangle \\&\quad \le C\varepsilon ^{1/2}|\textrm{q}(t)|^2+C\varepsilon ^{1/2}\sum _{i=1}^N |\textrm{f}_i(t)|^2+C\varepsilon ^{3/2}\!\!\!\sum _{1\le i<j\le N}\frac{1}{|q_i(t)-q_j(t)|^{2\beta _1}}. \end{aligned}$$

Concerning the cross terms involving U on the right-hand side of (4.17), we recall that \(\nabla U\) satisfies (2.2). In light of the mean value theorem, we infer

$$\begin{aligned}&-\varepsilon (\beta _1-1)\sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{\beta _1+1}} , -\nabla U(q_i(t))\Bigg \rangle \\&\quad = \varepsilon (\beta _1-1)\!\!\!\sum _{1\le i<j\le N}\frac{\langle q_i(t)-q_j(t),\nabla U(q_i(t))-\nabla U(q_j(t))\rangle }{|q_i(t)-q_j(t)|^2}\\&\quad \le C\varepsilon \!\!\!\sum _{1\le i<j\le N}\big (|q_i(t)|^{\lambda -1}+|q_j(t)|^{\lambda -1}\big )\le C\varepsilon |\textrm{q}(t)|^{\lambda -1}. \end{aligned}$$

Turning to the cross terms involving G on the right-hand side of (4.17), we recast them as follows:

$$\begin{aligned}&-\varepsilon (\beta _1-1)\sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{\beta _1+1}} , - \sum _{\ell \ne i}\nabla G\big (q_i(t)-q_\ell (t)\big )\Bigg \rangle \\&\quad = -\varepsilon (\beta _1-1)\sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{\beta _1+1}} , a_4 \sum _{\ell \ne i}\frac{q_i(t)-q_\ell (t)}{|q_i(t)-q_\ell (t)|^{\beta _1+1}}\Bigg \rangle \\&\qquad +\varepsilon (\beta _1-1)\sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{\beta _1+1}} , \sum _{\ell \ne i}\Bigg (\nabla G\big (q_i(t)-q_\ell (t)\big )+a_4\frac{q_i(t)-q_\ell (t)}{|q_i(t)-q_\ell (t)|^{\beta _1+1}}\Bigg )\Bigg \rangle . \end{aligned}$$

In view of Lemma A.2, cf. (A.2), we find

$$\begin{aligned}&-\varepsilon (\beta _1-1)\sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{\beta _1+1}} , a_4 \sum _{\ell \ne i}\frac{q_i(t)-q_\ell (t)}{|q_i(t)-q_\ell (t)|^{\beta _1+1}}\Bigg \rangle \\&\quad \le -\varepsilon (\beta _1-1)a_4 \cdot \frac{4}{N(N-1)^2}\sum _{1\le i <j\le N}\frac{1}{|q_i(t)-q_j(t)|^{2\beta _1}}. \end{aligned}$$

On the other hand, the condition (2.7) implies

$$\begin{aligned}&\varepsilon (\beta _1-1)\sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{\beta _1+1}} , \sum _{\ell \ne i}\Bigg (\nabla G\big (q_i(t)-q_\ell (t)\big )+a_4\frac{q_i(t)-q_\ell (t)}{|q_i(t)-q_\ell (t)|^{\beta _1+1}}\Bigg )\Bigg \rangle \\&\quad \le \varepsilon (\beta _1-1)\sum _{i=1}^N \Bigg (\sum _{j\ne i}\frac{1}{|q_i(t)-q_j(t)|^{\beta _1}}\Bigg ) \sum _{\ell \ne i}\Bigg (a_5\frac{1}{|q_i(t)-q_\ell (t)|^{\beta _2}}+a_6\Bigg ) \\&\quad \le \varepsilon ^{3/2} C \sum _{1\le i<j\le N}\frac{1}{|q_i(t)-q_j(t)|^{2\beta _1}}+ \varepsilon ^{1/2} C \sum _{1\le i <j\le N}\frac{1}{|q_i(t)-q_j(t)|^{2\beta _2}}+C, \end{aligned}$$

for some positive constant C independent of t and \(\varepsilon \). Since \(\beta _2<\beta _1\), cf. (2.7), taking \(\varepsilon \) small enough produces the bound

$$\begin{aligned}&-\varepsilon (\beta _1-1)\sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{\beta _1+1}} , - \sum _{\ell \ne i}\nabla G\big (q_i(t)-q_\ell (t)\big )\Bigg \rangle \nonumber \\&\quad \le -\varepsilon \, c\!\!\!\sum _{1\le i <j\le N}\frac{1}{|q_i(t)-q_j(t)|^{2\beta _1}}. \end{aligned}$$
(4.18)

Similarly, regarding the last term on the right-hand side of (4.17), since \(\beta _1>1\) in part (a), it is clear that \(\varepsilon |q_i-q_j|^{-\beta _1-1} \) can also be subsumed into \(-\varepsilon |q_i-q_j|^{-2\beta _1}\) as in (4.18). Altogether with the expression (4.17), we arrive at the estimate

$$\begin{aligned}&\text {d}\Bigg ( \varepsilon \,\gamma \!\!\!\sum _{1\le i<j\le N}\frac{1}{|q_i(t)-q_j(t)|^{\beta _1-1}}\Bigg ) \nonumber \\&\quad \le \varepsilon ^{1/2} C |\textrm{q}(t)|^2\text {d}t+ \varepsilon ^{1/2} C|\textrm{q}(t)|^{\lambda -1}\text {d}t+\varepsilon ^{1/2} C\sum _{i=1}^N |\textrm{f}_i(t)|^2\text {d}t+C\text {d}t \nonumber \\&\qquad -\varepsilon \,c \!\!\!\sum _{1\le i<j \le N}\frac{1}{|q_i(t)-q_j(t)|^{2\beta _1}} \text {d}t \nonumber \\ {}&\quad - \varepsilon \sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{\beta _1+1}} , \sqrt{2\gamma }\text {d}W_{i,0}(t) \Bigg \rangle . \end{aligned}$$
(4.19)

Next, from the expression (4.15) of \(\Gamma _1\) and the estimates (4.16) and (4.19), we obtain (by taking \(\varepsilon \) small enough)

$$\begin{aligned} \text {d}\Gamma _1(t)&\le - c|\textrm{q}(t)|^2\text {d}t-c|\textrm{q}(t)|^{\lambda +1}\text {d}t-c\sum _{i=1}^N |\textrm{f}_i(t)|^2\text {d}t \nonumber \\&\quad -\varepsilon \, c \!\!\!\sum _{1\le i<j \le N}\frac{1}{|q_i(t)-q_j(t)|^{2\beta _1}} \text {d}t+C \text {d}t \nonumber \\&\qquad + \sum _{i=1}^N\sqrt{2\gamma }\langle q_i(t),\text {d}W_{i,0}(t)\rangle +\sum _{i=1}^N \sum _{\ell =1}^{k_i} \sqrt{2\alpha _{i,\ell }}\langle f_{i,\ell }(t),\text {d}W_{i,\ell }(t)\rangle \nonumber \\&\qquad - \varepsilon \sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{\beta _1+1}} , \sqrt{2\gamma }\text {d}W_{i,0}(t) \Bigg \rangle . \end{aligned}$$
(4.20)

We emphasize that in (4.20), \(c>0\) is independent of \(\varepsilon \), whereas \(C>0\) may still depend on \(\varepsilon \).

Now, to establish (4.13), we aim to apply the well-known exponential martingale inequality to (4.20). The argument below is similar to that found in Hairer and Mattingly (2008, Lemma 5.1); see also Glatt-Holtz et al. (2021, 2022).

For \(\kappa \in (0,1)\) to be chosen later, from (4.20), we observe that

$$\begin{aligned} \kappa \text {d}\Gamma _1(t)&\le - c\kappa |\textrm{q}(t)|^2\text {d}t-c\kappa |\textrm{q}(t)|^{\lambda +1}\text {d}t-c\kappa \sum _{i=1}^N |\textrm{f}_i(t)|^2\text {d}t +C\kappa \text {d}t \nonumber \\&\quad - \kappa \varepsilon \, c \!\!\!\sum _{1\le i<j \le N}\frac{1}{|q_i(t)-q_j(t)|^{2\beta _1}} \text {d}t+\text {d}M_1(t). \end{aligned}$$
(4.21)

In the above, the local martingale \(M_1(t)\) is defined via

$$\begin{aligned} \text {d}M_1(t)&= \kappa \sum _{i=1}^N\sqrt{2\gamma }\langle q_i(t),\text {d}W_{i,0}(t)\rangle +\kappa \sum _{i=1}^N \sum _{\ell =1}^{k_i} \sqrt{2\alpha _{i,\ell }}\langle f_{i,\ell }(t),\text {d}W_{i,\ell }(t)\rangle \\&\quad -\kappa \varepsilon \sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{\beta _1+1}} , \sqrt{2\gamma }\text {d}W_{i,0}(t) \Bigg \rangle , \end{aligned}$$

whose quadratic variation process \(\langle M_1\rangle (t)\) is given by

$$\begin{aligned} \text {d}\langle M_1 \rangle (t)&=2\gamma \kappa ^2 \sum _{i=1}^N\Bigg | q_i(t)- \varepsilon \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{\beta _1+1}}\Bigg |^2\text {d}t\\ {}&\quad +\kappa ^2\sum _{i=1}^N \sum _{\ell =1}^{k_i} 2\alpha _{i,\ell }| f_{i,\ell }(t)|^2\text {d}t. \end{aligned}$$

Using Cauchy–Schwarz inequality, it is clear that

$$\begin{aligned} \text {d}\langle M_1 \rangle (t)&\le \tilde{c}\kappa ^2|\textrm{q}(t)|^2\text {d}t+\tilde{c}\kappa ^2\sum _{i=1}^N |\textrm{f}_i(t)|^2 \text {d}t+ \kappa ^2\varepsilon \tilde{c}\!\!\!\sum _{1\le i<j \le N}\frac{1}{|q_i(t)-q_j(t)|^{2\beta _1}} \text {d}t, \end{aligned}$$

for some positive constant \(\tilde{c}\) independent of both \(\kappa \) and \(\varepsilon \). Hence, for \(\kappa \) sufficiently small, it follows from (4.21) that

$$\begin{aligned} \kappa \text {d}\Gamma _1(t)&\le C\kappa \text {d}t-\frac{c}{\kappa }\text {d}\langle M_1\rangle (t) + \text {d}M_1(t). \end{aligned}$$

Recalling the exponential martingale inequality applied to \(M_1(t)\),

$$\begin{aligned} \mathbb {P}\Bigg (\sup _{t\ge 0}\Bigg [M_1(t)-\frac{c}{\kappa }\langle M_1\rangle (t)\Bigg ] >r\Bigg )\le e^{-\frac{2c}{\kappa }r}, \quad r\ge 0, \end{aligned}$$
(4.22)

we deduce that

$$\begin{aligned} \mathbb {P}\Bigg (\sup _{t\in [0,T]}\Bigg [\kappa \Gamma _1(t)-\kappa \Gamma _1(0)-\kappa Ct\Bigg ] >r\Bigg )\le e^{-\frac{2c}{\kappa }r}. \end{aligned}$$
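For intuition, the exponential martingale bound invoked here can be observed empirically in its simplest instance \(M=W\), so that \(\langle M\rangle (t)=t\) and the inequality reads \(\mathbb {P}(\sup _t[W(t)-\tfrac{\delta }{2}t]>r)\le e^{-\delta r}\). A Monte-Carlo sketch with hypothetical parameters \(\delta =1\), \(r=1\):

```python
import numpy as np

# Monte-Carlo check of P( sup_t [W(t) - (delta/2) t] > r ) <= exp(-delta*r)
# for standard Brownian motion, simulated on a fine grid.
rng = np.random.default_rng(0)
delta, r, T, n_paths, n_steps = 1.0, 1.0, 2.0, 2000, 2000
dt = T / n_steps
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
drift = 0.5 * delta * dt * np.arange(1, n_steps + 1)
p_hat = np.mean(np.max(W - drift, axis=1) > r)
assert p_hat <= np.exp(-delta * r) + 0.05   # empirical frequency below the bound
```

The discretized supremum underestimates the true one, so the empirical frequency sits safely below the theoretical bound \(e^{-1}\approx 0.368\).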

In particular, by choosing \(\kappa \) sufficiently small, the above inequality implies

$$\begin{aligned} \mathbb {E}\exp \Bigg \{\sup _{t\in [0,T]}\kappa \Gamma _1(t)\Bigg \}\le C, \end{aligned}$$
(4.23)

for some positive constant \(C=C(T,\kappa ,\varepsilon ,\textrm{x}(0),\textrm{z}_1(0),\dots ,\textrm{z}_N(0) )\). Recalling \(\Gamma _1\) as in (4.15), the estimate (4.23) produces (4.13). Hence, part (a) is established for \(\beta _1>1\).

(b) Suppose now that \(\beta _1=1\). In this case, we introduce the function \(\Gamma _2\) defined as

$$\begin{aligned} \Gamma _2(\textrm{q},\textrm{f}_1,\dots ,\textrm{f}_N)&= \frac{1}{2}\gamma |\textrm{q}|^2 +\frac{1}{2}\sum _{i=1}^N \sum _{\ell =1}^{k_i}\frac{1}{\alpha _{i,\ell }}|f_{i,\ell }|^2 -\varepsilon \,\gamma \!\!\!\sum _{1\le i<j\le N}\!\!\!\log |q_i-q_j|. \end{aligned}$$
(4.24)

With regard to the log term on the above right-hand side, the following identity holds

$$\begin{aligned}&\text {d}\Bigg (-\varepsilon \gamma \!\!\!\sum _{1\le i<j\le N}\!\!\!\log |q_i(t)-q_j(t)|\Bigg ) \nonumber \\&\quad = -\varepsilon \sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^2} , -\nabla U(q_i(t))\text {d}t- \sum _{\ell \ne i}\nabla G\big (q_i(t)-q_\ell (t)\big ) \text {d}t \nonumber \\&\qquad -\sum _{\ell =1}^{k_i} \lambda _{i,\ell }^2 q_i(t)\text {d}t+\sum _{\ell =1}^{k_i}\lambda _{i,\ell } f_{i,\ell }(t)\text {d}t+\sqrt{2\gamma }\text {d}W_{i,0}(t) \Bigg \rangle \nonumber \\&\qquad -\varepsilon (d-2)\!\!\!\sum _{1\le i<j\le N} \frac{1}{|q_i(t)-q_j(t)|^2}\text {d}t. \end{aligned}$$
(4.25)
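The last term of (4.25) similarly stems from the radial identity \(\Delta \log |x|=(d-2)|x|^{-2}\) in \(\mathbb {R}^d\), checked here by finite differences:

```python
import math

# For a radial function g(|x|) in R^d, the Laplacian is g''(r) + (d-1)*g'(r)/r;
# applied to g = log, this gives (d-2)/r^2.
def radial_laplacian(g, r, d, h=1e-5):
    g2 = (g(r + h) - 2.0 * g(r) + g(r - h)) / h**2
    g1 = (g(r + h) - g(r - h)) / (2.0 * h)
    return g2 + (d - 1) * g1 / r

for d in (1, 2, 3):
    for r in (0.5, 1.0, 2.0):
        assert abs(radial_laplacian(math.log, r, d) - (d - 2) / r**2) < 1e-4
```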

Similarly to the estimates in part (a), we readily have

$$\begin{aligned}&-\varepsilon \sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{2}} , -\nabla U(q_i(t))-\sum _{\ell =1}^{k_i} \lambda _{i,\ell }^2 q_i(t)+\sum _{\ell =1}^{k_i}\lambda _{i,\ell } f_{i,\ell }(t)\Bigg \rangle \nonumber \\&\quad \le C\varepsilon ^{1/2}|\textrm{q}(t)|^2+C\varepsilon |\textrm{q}(t)|^{\lambda -1}+C\varepsilon ^{1/2}\sum _{i=1}^N |\textrm{f}_i(t)|^2\nonumber \\ {}&\quad +C\varepsilon ^{3/2}\!\!\!\sum _{1\le i<j\le N}\frac{1}{|q_i(t)-q_j(t)|^{2}}. \end{aligned}$$
(4.26)

Concerning the cross terms involving G on the right-hand side of (4.25), we employ the argument from part (a), making use of condition (2.7) and the estimate (A.3), to arrive at

$$\begin{aligned}&-\varepsilon \sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^2} , - \sum _{\ell \ne i}\nabla G\big (q_i(t)-q_\ell (t)\big )\Bigg \rangle \nonumber \\&\quad \le -2a_4 \varepsilon \!\!\!\sum _{1\le i<j \le N}\frac{1}{|q_i(t)-q_j(t)|^{2}}+C\varepsilon ^{3/2}\!\!\!\sum _{1\le i<j \le N}\frac{1}{|q_i(t)-q_j(t)|^{2}}+\tilde{C}. \end{aligned}$$
(4.27)

In the above, we emphasize that C is independent of \(\varepsilon \) even though \(\tilde{C}\) may still depend on \(\varepsilon \). Turning to the last term on the right-hand side of (4.25), i.e.,

$$\begin{aligned} -\varepsilon (d-2)\!\!\!\sum _{1\le i<j\le N} \frac{1}{|q_i(t)-q_j(t)|^2}\text {d}t, \end{aligned}$$

there are two cases to be considered, depending on the dimension d. In dimension \(d\ge 2\), the above expression is nonpositive and thus can be discarded. On the other hand, in dimension \(d=1\), it reduces to

$$\begin{aligned} \varepsilon \!\!\!\sum _{1\le i<j\le N} \frac{1}{|q_i(t)-q_j(t)|^2}\text {d}t. \end{aligned}$$

In view of Assumption 2.5, we combine this with (4.27) to obtain

$$\begin{aligned}&-\varepsilon \sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^2} , - \sum _{\ell \ne i}\nabla G\big (q_i(t)-q_\ell (t)\big )\Bigg \rangle +\varepsilon \!\!\!\sum _{1\le i<j\le N} \frac{1}{|q_i(t)-q_j(t)|^2}\\&\quad \le -(2a_4-1)\varepsilon \!\!\!\sum _{1\le i<j\le N} \frac{1}{|q_i(t)-q_j(t)|^2}+C\varepsilon ^{3/2}\!\!\!\sum _{1\le i<j \le N}\frac{1}{|q_i(t)-q_j(t)|^{2}}+\tilde{C}, \end{aligned}$$

whence, by taking \(\varepsilon \) sufficiently small that \(C\varepsilon ^{1/2}\le 2a_4-1-c\) for some fixed \(c\in (0,2a_4-1)\),

$$\begin{aligned}&-\varepsilon \sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^2} , - \sum _{\ell \ne i}\nabla G\big (q_i(t)-q_\ell (t)\big )\Bigg \rangle +\varepsilon \!\!\!\sum _{1\le i<j\le N} \frac{1}{|q_i(t)-q_j(t)|^2}\\&\quad \le -c\varepsilon \!\!\!\sum _{1\le i<j\le N} \frac{1}{|q_i(t)-q_j(t)|^2}+\tilde{C}. \end{aligned}$$

Combining (4.25), (4.26) and the above estimate, we find

$$\begin{aligned}&\text {d}\Bigg (-\varepsilon \gamma \!\!\!\sum _{1\le i<j\le N}\!\!\!\log |q_i(t)-q_j(t)|\Bigg )\nonumber \\&\quad \le -c\varepsilon \!\!\!\sum _{1\le i<j \le N}\frac{1}{|q_i(t)-q_j(t)|^2} \text {d}t + C\varepsilon ^{1/2}|\textrm{q}(t)|^2\text {d}t+C\varepsilon |\textrm{q}(t)|^{\lambda -1}\text {d}t\nonumber \\&\qquad +C\varepsilon ^{1/2}\sum _{i=1}^N |\textrm{f}_i(t)|^2\text {d}t-\varepsilon \sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^2}, \sqrt{2\gamma }\text {d}W_{i,0}(t) \Bigg \rangle +\tilde{C}\text {d}t. \end{aligned}$$
(4.28)

Next, we combine estimates (4.16), (4.28) with expression (4.24) of \(\Gamma _2\) to infer

$$\begin{aligned} \text {d}\Gamma _2(t)&\le - c|\textrm{q}(t)|^2\text {d}t-c|\textrm{q}(t)|^{\lambda +1}\text {d}t-c\sum _{i=1}^N |\textrm{f}_i(t)|^2\text {d}t\nonumber \\ {}&\quad -\varepsilon \, c \!\!\!\sum _{1\le i<j \le N}\frac{1}{|q_i(t)-q_j(t)|^{2}} \text {d}t+C \text {d}t \nonumber \\&\qquad + \sum _{i=1}^N\sqrt{2\gamma }\langle q_i(t),\text {d}W_{i,0}(t)\rangle +\sum _{i=1}^N \sum _{\ell =1}^{k_i} \sqrt{2\alpha _{i,\ell }}\langle f_{i,\ell }(t),\text {d}W_{i,\ell }(t)\rangle \nonumber \\&\qquad - \varepsilon \sum _{i=1}^N \Bigg \langle \sum _{j\ne i}\frac{q_i(t)-q_j(t)}{|q_i(t)-q_j(t)|^{2}} , \sqrt{2\gamma }\text {d}W_{i,0}(t) \Bigg \rangle . \end{aligned}$$
(4.29)

We may now employ an argument similar to the exponential martingale approach in Case 1 to deduce that, for all \(\kappa \) sufficiently small,

$$\begin{aligned} \mathbb {E}\exp \Bigg \{\sup _{t\in [0,T]}\kappa \Gamma _2(t)\Bigg \}\le C, \end{aligned}$$

for some positive constant \(C=C(T,\kappa ,\varepsilon ,\textrm{x}(0),\textrm{z}_1(0),\dots ,\textrm{z}_N(0) )\). Recalling \(\Gamma _2\) defined in (4.24), this produces the estimate (4.14). The proof is thus finished. \(\square \)

4.4 Proof of Theorem 2.10

We are now in a position to conclude Theorem 2.10. The argument follows along the lines of the proof of Nguyen (2018, Theorem 4) adapted to our setting; see also Herzog et al. (2016, Theorem 2.4). The key observation is that instead of controlling the exit time of the process \(\textrm{x}_m(t)\) as \(m\rightarrow 0\), we are able to control that of \(\textrm{q}(t)\), since \(\textrm{q}(t)\) is independent of m.

Proof of Theorem 2.10

Let \(\big (\textrm{x}_m(t)\), \(\textrm{v}_m(t)\), \(\textrm{z}_{1,m}(t)\),\(\dots \), \(\textrm{z}_{N,m}(t)\big )\) and \(\big (\textrm{q}(t)\), \(\textrm{f}_{1}(t)\),\(\dots \), \(\textrm{f}_{N}(t)\big )\), respectively, solve (1.5) and (1.10). For \(R,\,m>0\), define the following stopping times

$$\begin{aligned} \sigma ^R = \inf \Bigg \{t\ge 0: |\textrm{q}(t)|+\!\!\!\sum _{1\le i<j\le N}\!\!\!|q_i(t)-q_j(t)|^{-1}\ge R\Bigg \}, \end{aligned}$$
(4.30)

and

$$\begin{aligned} \sigma ^R_m = \inf \Bigg \{t\ge 0: |\textrm{x}_m(t)|+\!\!\!\sum _{1\le i<j\le N}\!\!\!|x_{i,m}(t)-x_{j,m}(t)|^{-1}\ge R\Bigg \}. \end{aligned}$$

Fixing \(T,\,\xi >0\), observe that

$$\begin{aligned} \mathbb {P}\Bigg (\sup _{t\in [0,T]}|\textrm{x}_m(t)-\textrm{q}(t)|>\xi \Bigg )&\le \mathbb {P}\Bigg (\sup _{t\in [0,T]}|\textrm{x}_m(t)-\textrm{q}(t)|>\xi ,\sigma ^R\wedge \sigma ^R_m\ge T\Bigg ) \nonumber \\&\quad +\mathbb {P}\big (\sigma ^R\wedge \sigma ^R_m<T\big ). \end{aligned}$$
(4.31)

To control the first term on the above right-hand side, observe that

$$\begin{aligned} \mathbb {P}\big (\textrm{q}(t)=\textrm{q}^R(t)\ \text {and}\ \textrm{x}_m(t)=\textrm{x}_m^R(t)\ \text {for all}\ 0\le t\le \sigma ^R\wedge \sigma ^R_m\big )=1, \end{aligned}$$

where \(\textrm{q}^R(t)\) and \(\textrm{x}_m^R(t)\) are the first components of the solutions of (4.5) and (4.4), respectively. As a consequence,

$$\begin{aligned}&\mathbb {P}\Bigg (\sup _{0\le t\le T}|\textrm{x}_m(t)-\textrm{q}(t)|>\xi ,\sigma ^R\wedge \sigma ^R_m\ge T\Bigg ) \nonumber \\&\le \mathbb {P}\Bigg (\sup _{0\le t\le T}|\textrm{x}^R_m(t)-\textrm{q}^R(t)|>\xi \Bigg )\le \frac{m}{\xi ^4}\cdot C(T,R). \end{aligned}$$
(4.32)

In the last estimate above, we employed Proposition 4.1 while making use of Markov’s inequality.
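To make this step explicit, we note that the factor \(\xi ^{-4}\) in (4.32) corresponds to Markov's inequality applied at the fourth moment; assuming, as (4.32) suggests, that Proposition 4.1 supplies a fourth-moment bound of order m for fixed R, the chain reads

$$\begin{aligned} \mathbb {P}\Bigg (\sup _{0\le t\le T}|\textrm{x}^R_m(t)-\textrm{q}^R(t)|>\xi \Bigg )\le \frac{1}{\xi ^4}\,\mathbb {E}\Bigg [\sup _{0\le t\le T}\big |\textrm{x}^R_m(t)-\textrm{q}^R(t)\big |^4\Bigg ]\le \frac{m}{\xi ^4}\cdot C(T,R). \end{aligned}$$

The precise form of the moment bound is the content of Proposition 4.1.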

Turning to \(\mathbb {P}(\sigma ^R\wedge \sigma ^R_m<T)\), we note that

$$\begin{aligned}&\mathbb {P}\big (\sigma ^R\wedge \sigma ^R_m<T\big )\nonumber \\&\quad \le \mathbb {P}\Bigg (\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}^R(t)|\le \frac{\xi }{R},\sigma ^R\wedge \sigma ^R_m< T\Bigg )\nonumber \\ {}&\quad +\mathbb {P}\Bigg (\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}^R(t)|> \frac{\xi }{R}\Bigg ) \nonumber \\&\quad \le \mathbb {P}\Bigg (\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}^R(t)|\le \frac{\xi }{R},\sigma ^R_m< T\le \sigma ^R\Bigg )+\mathbb {P}(\sigma ^R< T) \nonumber \\&\qquad +\mathbb {P}\Bigg (\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}^R(t)|> \frac{\xi }{R}\Bigg ) \nonumber \\&= I_1+I_2+I_3. \end{aligned}$$
(4.33)

Concerning \(I_3\), the same argument as in (4.32) produces the bound

$$\begin{aligned} I_3=\mathbb {P}\Bigg (\sup _{0\le t\le T}|\textrm{x}^R_m(t)-\textrm{q}^R(t)|>\frac{\xi }{R}\Bigg )\le \frac{m}{\xi ^4}\cdot C(T,R). \end{aligned}$$
(4.34)

Next, considering \(I_2\), from (4.30), observe that for all \(\varepsilon \) small and R large enough,

$$\begin{aligned} \big \{ \sigma ^R<T\big \}&=\Bigg \{ \sup _{t\in [0,T]}\Bigg (|\textrm{q}(t)|+\!\!\!\sum _{1\le i<j\le N}\!\!\!|q_i(t)-q_j(t)|^{-1}\Bigg )\ge R\Bigg \}\\&\subseteq \Bigg \{ \sup _{t\in [0,T]}|\textrm{q}(t)|\ge \frac{R}{N^2}\Bigg \}\bigcup _{1\le i<j\le N} \Bigg \{ \sup _{t\in [0,T]}\big (-\varepsilon \log |q_i(t)-q_j(t)|\big )\ge \varepsilon \log \Bigg (\frac{R}{N^2}\Bigg )\Bigg \}\\&\subseteq \Bigg \{ \sup _{t\in [0,T]}\Bigg (|\textrm{q}(t)|^2-\varepsilon \!\!\!\sum _{1\le i<j\le N}\!\!\!\log |q_i(t)-q_j(t)|\Bigg ) \ge \frac{R}{N^2}\Bigg \}\\&\qquad \qquad \bigcup _{1\le i<j\le N} \Bigg \{ \sup _{t\in [0,T]}\Bigg (|\textrm{q}(t)|^2-\varepsilon \!\!\!\sum _{1\le i<j\le N}\!\!\!\log |q_i(t)-q_j(t)|\Bigg ) \ge \varepsilon \log \Bigg (\frac{R}{N^2}\Bigg )\Bigg \}, \end{aligned}$$

whence,

$$\begin{aligned} \big \{ \sigma ^R<T\big \}&\subseteq \Bigg \{ \sup _{t\in [0,T]}\Bigg (|\textrm{q}(t)|^2-\varepsilon \!\!\!\sum _{1\le i<j\le N}\!\!\!\log |q_i(t)-q_j(t)|\Bigg ) \ge \varepsilon \log \Bigg (\frac{R}{N^2}\Bigg )\Bigg \}. \end{aligned}$$

We note that Proposition 4.1 implies the estimate

$$\begin{aligned} \mathbb {E}\Bigg [\sup _{t\in [0,T]}\Bigg (|\textrm{q}(t)|^2-\varepsilon \!\!\!\sum _{1\le i<j\le N}\!\!\!\log |q_i(t)-q_j(t)|\Bigg )\Bigg ]\le C, \end{aligned}$$
(4.35)

for some positive constant \(C=C(T,\varepsilon )\). Together with Markov’s inequality, we infer the bound for R large enough

$$\begin{aligned} I_2=\mathbb {P}(\sigma ^R<T)\le \frac{C(T)}{\varepsilon \log (R/N^2)}\le \frac{C(T)}{\varepsilon \log R}. \end{aligned}$$
(4.36)

Turning to \(I_1\) on the right-hand side of (4.33), for R large enough and \(\xi \in (0,1)\), we derive the following chain of event inclusions:

$$\begin{aligned}&\Bigg \{\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}^R(t)|\le \frac{\xi }{R},\sigma ^R_m< T\le \sigma ^R\Bigg \}\\&\quad = \Bigg \{\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}(t)|\le \frac{\xi }{R},\sup _{t\in [0,T]}\Bigg (|\textrm{x}_m^R(t)|+\!\!\!\sum _{1\le i<j\le N}\!\!\!|x_{i,m}^{R}(t)-x_{j,m}^R(t)|^{-1}\Bigg )\\ {}&\quad \ge R,\sigma ^R_m< T\le \sigma ^R\Bigg \}\\&\quad \subseteq \Bigg \{\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}(t)|\le \frac{\xi }{R},\sup _{t\in [0,T]}|\textrm{x}_m^R(t)|\ge \frac{R}{N^2}\Bigg \}\\&\quad \bigcup _{1\le i<j\le N}\Bigg \{\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}(t)|\le \frac{\xi }{R},\sup _{t\in [0,T]}|x_{i,m}^{R}(t)-x_{j,m}^R(t)|^{-1}\ge \frac{R}{N^2}\Bigg \}\\&\quad =\Bigg \{\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}(t)|\le \frac{\xi }{R},\sup _{t\in [0,T]}|\textrm{x}_m^R(t)|\ge \frac{R}{N^2}\Bigg \}\\&\quad \bigcup _{1\le i<j\le N}\Bigg \{\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}(t)|\le \frac{\xi }{R},\inf _{t\in [0,T]}|x_{i,m}^{R}(t)-x_{j,m}^R(t)|\le \frac{N^2}{R}\Bigg \}. \end{aligned}$$

Since \(\xi \) and N are fixed, for R large enough, say, \(\frac{R}{N^2}- \frac{\xi }{R}\ge \sqrt{R} \), we have

$$\begin{aligned}&\Bigg \{\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}(t)|\le \frac{\xi }{R},\sup _{t\in [0,T]}|\textrm{x}_m^R(t)|\ge \frac{R}{N^2}\Bigg \} \\&\quad \subseteq \Bigg \{\sup _{t\in [0,T]}|\textrm{q}(t)|\ge \frac{R}{N^2}- \frac{\xi }{R}\Bigg \}\subseteq \Bigg \{\sup _{t\in [0,T]}|\textrm{q}(t)|\ge \sqrt{R}\Bigg \}\\&\quad \subseteq \Bigg \{ \sup _{t\in [0,T]}|\textrm{q}(t)|^2-\varepsilon \!\!\!\sum _{1\le i<j\le N}\!\!\!\log |q_i(t)-q_j(t)| \ge \sqrt{R}\Bigg \}. \end{aligned}$$
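For definiteness, the stated choice of R is elementary: since \(\xi <1\), any \(R\ge 4N^4\) suffices, for then

$$\begin{aligned} \frac{R}{N^2}-\frac{\xi }{R}\ge \frac{R}{N^2}-1\ge \sqrt{R}, \end{aligned}$$

where the last inequality holds at \(R=4N^4\) and is preserved for larger R, the map \(R\mapsto R/N^2-\sqrt{R}\) being increasing once \(2\sqrt{R}\ge N^2\).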

On the other hand, by the triangle inequality,

$$\begin{aligned} \inf _{t\in [0,T]}|q_{i}(t)-q_{j}(t)| \le 2\sup _{t\in [0,T]}|\textrm{x}_m^{R}(t)-\textrm{q}(t)|+\inf _{t\in [0,T]}|x_{i,m}^{R}(t)-x_{j,m}^R(t)|, \end{aligned}$$

implying

$$\begin{aligned}&\Bigg \{\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}(t)|\le \frac{\xi }{R},\inf _{t\in [0,T]}|x_{i,m}^{R}(t)-x_{j,m}^R(t)|\le \frac{N^2}{R}\Bigg \}\\&\quad \subseteq \Bigg \{\inf _{t\in [0,T]}|q_{i}(t)-q_{j}(t)|\le \frac{2\xi +N^2}{R}\Bigg \} \subseteq \Bigg \{\inf _{t\in [0,T]}|q_{i}(t)-q_{j}(t)|\le \frac{1}{\sqrt{R}}\Bigg \} \\&\quad = \Bigg \{-\varepsilon \inf _{t\in [0,T]}\log |q_{i}(t)-q_{j}(t)|\ge \frac{1}{2}\varepsilon \log R\Bigg \}. \end{aligned}$$

It follows that for \(\varepsilon \) small and R large enough

$$\begin{aligned}&\Bigg \{\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}(t)|\le \frac{\xi }{R},\inf _{t\in [0,T]}|x_{i,m}^{R}(t)-x_{j,m}^R(t)|\le \frac{N^2}{R}\Bigg \}\\&\quad \subseteq \Bigg \{ \sup _{t\in [0,T]}|\textrm{q}(t)|^2-\varepsilon \!\!\!\sum _{1\le i<j\le N}\!\!\!\log |q_i(t)-q_j(t)| \ge \frac{1}{2}\varepsilon \log R\Bigg \}. \end{aligned}$$

As a consequence, the following holds

$$\begin{aligned}&\Bigg \{\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}^R(t)|\le \frac{\xi }{R},\sigma ^R_m< T\le \sigma ^R\Bigg \}\\&\quad \subseteq \Bigg \{ \sup _{t\in [0,T]}|\textrm{q}(t)|^2-\varepsilon \!\!\!\sum _{1\le i<j\le N}\!\!\!\log |q_i(t)-q_j(t)| \ge \frac{1}{2}\varepsilon \log R\Bigg \}. \end{aligned}$$

We employ (4.35) and Markov’s inequality to infer

$$\begin{aligned} I_1&=\mathbb {P}\Bigg (\sup _{t\in [0,T]}|\textrm{x}_m^R(t)-\textrm{q}^R(t)|\le \frac{\xi }{R},\sigma ^R_m < T\le \sigma ^R\Bigg )\le \frac{C(T)}{\varepsilon \log R}. \end{aligned}$$
(4.37)

Turning back to (4.33), we collect (4.34), (4.36) and (4.37) to arrive at the bound

$$\begin{aligned}&\mathbb {P}\big (\sigma ^R\wedge \sigma ^R_m<T\big )\le \frac{m}{\xi ^4}\cdot C(T,R)+ \frac{C(T)}{\varepsilon \log R}. \end{aligned}$$
(4.38)

We emphasize that in the above estimate C(T,R) and C(T) are independent of m.

Now, putting everything together, from (4.31), (4.32) and (4.38), we obtain the estimate

$$\begin{aligned}&\mathbb {P}\Bigg (\sup _{t\in [0,T]}|\textrm{x}_m(t)-\textrm{q}(t)|>\xi \Bigg ) \le \frac{m}{\xi ^4}\cdot C(T,R)+ \frac{C(T)}{\varepsilon \log R}, \end{aligned}$$

for all \(\varepsilon \) small and R large enough. Letting \(m\rightarrow 0\) for each fixed R and then sending \(R\rightarrow \infty \), this produces the small-mass limit (2.18), thereby completing the proof. \(\square \)

Remark 4.4

For the underdamped Langevin dynamics, it is well known that the small-mass limit \(m\rightarrow 0\) and the high-friction limit \(\gamma \rightarrow + \infty \) (under an appropriate time or noise rescaling) both lead to the same limiting system, namely the overdamped Langevin dynamics; see, e.g., (Lelièvre et al. 2010, Section 2.2.4) and also (Duong et al. 2017) for the Vlasov–Fokker–Planck system. Similarly, one can derive the underdamped Langevin dynamics from the generalized Langevin dynamics (GLE) in the white-noise limit, by rescaling the friction coefficients \((\lambda _i)\) and the noise strengths \((\alpha _i)\) appropriately; see, e.g., (Ottobre and Pavliotis 2011; Nguyen 2018) for a rigorous analysis of the non-interacting GLE and (Duong and Pavliotis 2019) for a formal derivation for the interacting GLE with regular interactions. This white-noise limit is different from the small-mass limit studied in this paper. It would be interesting to study the white-noise limit for the generalized Langevin dynamics with irregular interactions, which we leave for future work.