Appendix
1.1 Appendix A: Measure Definitions and Isoperimetry
Let \(p,\pi \) be probability distributions on \({\mathbb {R}}^{d}\) with full support and smooth densities. We define the Kullback–Leibler (KL) divergence of p with respect to \(\pi \) as
$$\begin{aligned} H(p\vert \pi ){\mathop {=}\limits ^{\triangle }}\int _{{\mathbb {R}}^{d}}p(x)\log \frac{p(x)}{\pi (x)}\,\hbox {d}x. \end{aligned}$$
(7.1)
Likewise, we denote the entropy of p by
$$\begin{aligned} {\displaystyle H(p){\mathop {=}\limits ^{\triangle }}-\int p(x)\log p(x)\hbox {d}x} \end{aligned}$$
(7.2)
and, with \({\mathcal {B}}({\mathbb {R}}^{d})\) denoting the Borel \(\sigma \)-field of \({\mathbb {R}}^{d}\), define the relative Fisher information and the total variation distance, respectively, as
$$\begin{aligned}{} & {} {\displaystyle I(p\vert \pi ){\mathop {=}\limits ^{\triangle }}\int _{{\mathbb {R}}^{d}}p(x)\Vert \nabla \log \frac{p(x)}{\pi (x)}\Vert ^{2}\hbox {d}x}, \end{aligned}$$
(7.3)
$$\begin{aligned}{} & {} {\displaystyle TV(p,\pi ){\mathop {=}\limits ^{\triangle }}\sup _{A\in {\mathcal {B}}({\mathbb {R}}^{d})}\Big \vert \int _{A}p(x)\hbox {d}x-\int _{A}\pi (x)\hbox {d}x\Big \vert }. \end{aligned}$$
(7.4)
Furthermore, we define a transference plan \(\zeta \) as a distribution on \(({\mathbb {R}}^{d}\times {\mathbb {R}}^{d},\ {\mathcal {B}}({\mathbb {R}}^{d}\times {\mathbb {R}}^{d}))\) (where \({\mathcal {B}}({\mathbb {R}}^{d}\times {\mathbb {R}}^{d})\) is the Borel \(\sigma \)-field of \({\mathbb {R}}^{d}\times {\mathbb {R}}^{d}\)) such that \(\zeta (A\times {\mathbb {R}}^{d})=p(A)\) and \(\zeta ({\mathbb {R}}^{d}\times A)=\pi (A)\) for any \(A\in {\mathcal {B}}({\mathbb {R}}^{d})\). Let \(\Gamma (p,\pi )\) designate the set of all such transference plans. Then for \(\beta >0\), the \(L_{\beta }\)-Wasserstein distance is defined as
$$\begin{aligned} W_{\beta }(p,\pi ){\mathop {=}\limits ^{\triangle }}\left( \inf _{\zeta \in \Gamma (p,\pi )}\int _{x,y\in {\mathbb {R}}^{d}}\Vert x-y\Vert ^{\beta }\textrm{d}\zeta (x,\ y)\right) ^{1/\beta }. \end{aligned}$$
(7.5)
Note that although the KL divergence is an asymmetric measure of distance between probability distributions, it is the preferred measure of distance here since it upper-bounds the total variation distance via Pinsker’s inequality. In addition, the KL divergence also controls the quadratic Wasserstein distance \(W_{2}\) under the log-Sobolev, Talagrand, and Poincaré inequalities defined below.
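The relationship via Pinsker's inequality, \(TV(p,\pi )\le \sqrt{H(p\vert \pi )/2}\), can be checked numerically. The following is a minimal sketch with two hypothetical discrete distributions (chosen for illustration, not from the paper); for densities the sums become integrals.

```python
import numpy as np

def kl(p, q):
    # KL divergence H(p|q) for discrete distributions with full support
    return float(np.sum(p * np.log(p / q)))

def tv(p, q):
    # total variation: sup_A |p(A) - q(A)| = (1/2) * the L1 distance
    return 0.5 * float(np.sum(np.abs(p - q)))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

# Pinsker's inequality: TV(p, q) <= sqrt(H(p|q) / 2)
assert tv(p, q) <= np.sqrt(kl(p, q) / 2)
```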
Definition 7.1
The probability distribution \(\pi \) satisfies a logarithmic Sobolev inequality with constant \(\gamma >0\) (in short: \(LSI(\gamma )\)) if, for all probability distributions p absolutely continuous \(w.r.t.\ \pi \),
$$\begin{aligned} H(p\vert \pi )\le \frac{1}{2\gamma }I(p\vert \pi ). \end{aligned}$$
(7.6)
Definition 7.2
The probability distribution \(\pi \) satisfies a Talagrand inequality with constant \(\gamma >0\) (in short: \(T(\gamma )\)) if, for all probability distributions p absolutely continuous \(w.r.t.\ \pi \) with finite moments of order 2,
$$\begin{aligned} W_{2}(p,\ \pi )\le \sqrt{\frac{2H(p\vert \pi )}{\gamma }}. \end{aligned}$$
(7.7)
Definition 7.3
The probability distribution \(\pi \) satisfies a Poincaré inequality with constant \(\gamma >0\) (in short: \(PI(\gamma )\)) if, for all smooth functions \(g:{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\),
$$\begin{aligned} Var_{\pi }(g)\le \frac{1}{\gamma }E_{\pi }[\Vert \nabla g\Vert ^{2}], \end{aligned}$$
(7.8)
where \(Var_{\pi }(g)=E_{\pi }[g^{2}]-E_{\pi }[g]^{2}\) is the variance of g under \(\pi \).
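As an illustration of Definition 7.3, the standard Gaussian satisfies \(PI(1)\). The sketch below is a Monte Carlo check with the hypothetical test function \(g(x)=x^{2}\) in dimension one (an illustration, not part of the paper's development):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)  # samples from N(0, 1), which satisfies PI(1)

g = x**2        # smooth test function g(x) = x^2
grad_g = 2 * x  # its gradient

var_g = g.var()                 # Var(g) is about 2 under N(0, 1)
dirichlet = np.mean(grad_g**2)  # E[|g'|^2] is about 4

assert var_g <= dirichlet  # Poincare inequality with gamma = 1
```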
1.2 Appendix B: Proofs of p-Generalized Gaussian Smoothing
1.2.1 Proof of \(\alpha \)-Mixture Weakly Smooth Property
Lemma 7.4
If the potential \(U:{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\) is \(\alpha \)-mixture weakly smooth, then:
$$\begin{aligned} U(y)\le U(x)+\left\langle \nabla U(x),\ y-x\right\rangle +\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\Vert y-x\Vert ^{1+\alpha _{i}}. \end{aligned}$$
Proof
We have
$$\begin{aligned}&\vert U(x)-U(y)-\langle \nabla U(y),x-y\rangle \vert \\&\quad = \Big \vert \int _{0}^{1}\langle \nabla U(y+t(x-y)),x-y\rangle \text {d}t-\langle \nabla U(y),x-y\rangle \Big \vert \\&\quad = \Big \vert \int _{0}^{1}\langle \nabla U(y+t(x-y))-\nabla U(y),x-y\rangle \text {d}t\Big \vert \\&\quad \le \int _{0}^{1}\Vert \nabla U(y+t(x-y))-\nabla U(y)\Vert \Vert x-y\Vert \text {d}t\\&\quad \le \int _{0}^{1}\sum _{i}L_{i}t^{\alpha _{i}}\Vert x-y\Vert ^{\alpha _{i}}\Vert x-y\Vert \text {d}t\\&\quad = \sum _{i}\frac{L_{i}}{1+\alpha _{i}}\Vert x-y\Vert ^{1+\alpha _{i}}, \end{aligned}$$
where the first equality uses the integral form of Taylor's theorem, the first inequality follows from the Cauchy–Schwarz inequality, and the second inequality is due to Assumption 2.1. This gives us the desired result. \(\square \)
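Lemma 7.4 can be sanity-checked numerically. The sketch below uses the hypothetical potential \(U(x)=x^{2}+\frac{2}{3}\vert x\vert ^{3/2}\), which is \(\alpha \)-mixture weakly smooth with \((L_{1},\alpha _{1})=(2,1)\) and \((L_{2},\alpha _{2})=(\sqrt{2},1/2)\); these constants are chosen for illustration and are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical potential U(x) = x^2 + (2/3)|x|^{3/2}: alpha-mixture weakly
# smooth with (L1, a1) = (2, 1) and (L2, a2) = (sqrt(2), 1/2).
def U(x):
    return x**2 + (2 / 3) * np.abs(x)**1.5

def gradU(x):
    return 2 * x + np.sign(x) * np.abs(x)**0.5

# Check U(y) <= U(x) + U'(x)(y - x) + sum_i L_i/(1 + a_i) |y - x|^{1 + a_i}
pairs = rng.uniform(-5, 5, size=(1000, 2))
for x, y in pairs:
    bound = (U(x) + gradU(x) * (y - x)
             + abs(y - x)**2 + (np.sqrt(2) / 1.5) * abs(y - x)**1.5)
    assert U(y) <= bound + 1e-9
```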
1.2.2 Proof of p-Generalized Gaussian Smoothing Properties
Lemma 7.5
If the potential \(U:{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\) is \(\alpha \)-mixture weakly smooth, then:
(i) \(\forall x\in {\mathbb {R}}^{d}\): \(\vert U_{\mu }(x)-U(x)\vert {\displaystyle \le \sum _{i}L_{i}\mu ^{1+\alpha _{i}}d^{\frac{1+\alpha _{i}}{p}},}\)
(ii) \(\forall x\in {\mathbb {R}}^{d}\): \({\displaystyle \left\| \nabla U_{\mu }(x)-\nabla U(x)\right\| \le \sum _{i}L_{i}\mu ^{\alpha _{i}}d^{\frac{3}{p}}},\)
(iii) \(\forall x,\ y\in {\mathbb {R}}^{d}\): \({\displaystyle \left\| \nabla U_{\mu }(y)-\nabla U_{\mu }(x)\right\| \le \sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}}\left\| y-x\right\| .}\)
Proof
(i). Since \(U_{\mu }(x)=\mathrm {{\mathbb {E}}}_{\xi }[U(x+\mu \xi )]\), \(U(x)=\mathrm {{\mathbb {E}}}_{\xi }[U(x)]\) and \({\mathbb {E}}_{\xi }\mu \left\langle \nabla U(x),\ \xi \right\rangle =0\), we have
$$\begin{aligned} U_{\mu }(x)-U(x)={\mathbb {E}}_{\xi }\left[ U(x+\mu \xi )-U(x)-\mu \left\langle \nabla U(x),\ \xi \right\rangle \right] . \end{aligned}$$
By the definition of the density of p-generalized Gaussian distribution [1], we also have:
$$\begin{aligned} U_{\mu }(x)-U(x)=\frac{1}{\kappa }\int _{{\mathbb {R}}^{d}}[U(x+\mu \xi )-U(x)-\mu \left\langle \nabla U(x),\ \xi \right\rangle ]e^{-\left\| \xi \right\| _{p}^{p}/p}\hbox {d}\xi . \end{aligned}$$
Applying Eq. 2.1 to the previous identity:
$$\begin{aligned} \vert U_{\mu }(x)-U(x)\vert&=\Big \vert \frac{1}{\kappa }\int _{{\mathbb {R}}^{d}}\left[ U(x+\mu \xi )-U(x)-\mu \left\langle \nabla U(x),\ \xi \right\rangle \right] e^{-\left\| \xi \right\| _{p}^{p}/p}\hbox {d}\xi \Big \vert \\&\le \sum _{i}\frac{L_{i}}{\kappa (1+\alpha _{i})}\mu ^{1+\alpha _{i}}\int _{{\mathbb {R}}^{d}}\left\| \xi \right\| ^{(1+\alpha _{i})}e^{-\left\| \xi \right\| _{p}^{p}/p}\hbox {d}\xi \\&=\sum _{i}\frac{L_{i}\mu ^{1+\alpha _{i}}}{(1+\alpha _{i})}E\left[ \left\| \xi \right\| ^{(1+\alpha _{i})}\right] . \end{aligned}$$
If \(p\le 2\), then \(\left\| \xi \right\| \le \left\| \xi \right\| _{p}\) and we get
$$\begin{aligned} \vert U_{\mu }(x)-U(x)\vert&\le \sum _{i}\frac{L_{i}\mu ^{1+\alpha _{i}}}{(1+\alpha _{i})}E\left[ \left\| \xi \right\| ^{(1+\alpha _{i})}\right] \\&{\mathop {\le }\limits ^{_{1}}}\sum _{i}\frac{L_{i}\mu ^{1+\alpha _{i}}}{(1+\alpha _{i})}{\mathbb {E}}\left[ \left\| \xi \right\| _{p}^{2}\right] ^{\frac{1+\alpha _{i}}{2}}\\&{\mathop {\le }\limits ^{_{2}}}\sum _{i}\frac{L_{i}\mu ^{1+\alpha _{i}}}{(1+\alpha _{i})}\left( \left( d+1\right) ^{\frac{2}{p}}\right) ^{\frac{1+\alpha _{i}}{2}}\\&\le \sum _{i}\frac{L_{i}\mu ^{1+\alpha _{i}}}{(1+\alpha _{i})}d^{\frac{1+\alpha _{i}}{p}}\\&\le \sum _{i}\frac{L_{i}\mu ^{1+\alpha _{i}}}{(1+\alpha _{i})}d^{\frac{2}{p}}, \end{aligned}$$
where step 1 follows from Jensen's inequality and \(0\le \alpha _{i}\le 1\), step 2 is from Lemma 7.25, which states that if \(\xi \sim N_{p}\left( 0,I_{d}\right) \) then \(d^{\left\lfloor \frac{n}{p}\right\rfloor }\le E(\left\| \xi \right\| _{p}^{n})\le \left[ d+\frac{n}{2}\right] ^{\frac{n}{p}}\), where \(\left\lfloor x\right\rfloor \) denotes the largest integer less than or equal to x, and the last steps follow by simplification when d is large enough and \(\mu \) is small enough.
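The moment bounds quoted from Lemma 7.25 can be checked by simulation. The sketch below samples \(\xi \sim N_{p}(0,I_{d})\) coordinate-wise via the Gamma trick (if \(G\sim \text {Gamma}(1/p,\ \text {scale}=p)\), then a random sign times \(G^{1/p}\) has density proportional to \(e^{-\vert x\vert ^{p}/p}\)); the choices \(p=1.5\), \(d=20\), \(n=2p\) are illustrative and not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
p, d, n_samp = 1.5, 20, 200_000

# Sample xi ~ N_p(0, I_d): each coordinate has density proportional to
# exp(-|x|^p / p); if G ~ Gamma(1/p, scale=p), a random sign times G^{1/p}
# has exactly that density.
g = rng.gamma(1 / p, scale=p, size=(n_samp, d))
xi = rng.choice([-1.0, 1.0], size=(n_samp, d)) * g**(1 / p)

n = 2 * p  # check the n-th moment of ||xi||_p
norm_p = np.sum(np.abs(xi)**p, axis=1)**(1 / p)
moment = np.mean(norm_p**n)

# Lemma 7.25: d^{floor(n/p)} <= E ||xi||_p^n <= (d + n/2)^{n/p}
assert d**np.floor(n / p) <= moment <= (d + n / 2)**(n / p)
```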
(ii). We adapt the technique of [34] to p-generalized Gaussian smoothing. Let \(y=x+\mu \xi \); then \(U_{\mu }(x)\) can be rewritten as
$$\begin{aligned} U_{\mu }(x)&=\mathrm {{\mathbb {E}}}_{\xi }[U(x+\mu \xi )]\\&=\frac{1}{\kappa \mu }\int _{{\mathbb {R}}^{d}}U(y)e^{-\frac{1}{p\mu ^{p}}\left\| y-x\right\| _{p}^{p}}\hbox {d}y. \end{aligned}$$
Now taking the gradient with respect to x of \(U_{\mu }(x)\) gives
$$\begin{aligned} \nabla _{x}U_{\mu }(x)=\frac{1}{\kappa \mu }\nabla _{x}\int _{{\mathbb {R}}^{d}}U(y)e^{-\frac{1}{p\mu ^{p}}\left\| y-x\right\| _{p}^{p}}\hbox {d}y. \end{aligned}$$
By differentiation under the integral sign, justified under mild regularity (e.g., \({\mathbb {E}}\vert U(y)\vert <\infty \)), we can exchange the gradient and the integral and get
$$\begin{aligned} \nabla _{x}U_{\mu }(x)&=\frac{1}{\kappa \mu }\int _{{\mathbb {R}}^{d}}\nabla _{x}\left( U(y)e^{-\frac{1}{p\mu ^{p}}\left\| y-x\right\| _{p}^{p}}\right) \hbox {d}y\\&=\frac{1}{\kappa \mu }\int _{{\mathbb {R}}^{d}}U(y)\nabla _{x}\left( e^{-\frac{1}{p\mu ^{p}}\left\| y-x\right\| _{p}^{p}}\right) \hbox {d}y\\&=\frac{1}{\kappa \mu }\int _{{\mathbb {R}}^{d}}U(y)e^{-\frac{1}{p\mu ^{p}}\left\| y-x\right\| _{p}^{p}}\frac{-1}{\mu ^{p}}\left\| y-x\right\| _{p}^{p-1}\nabla _{x}(\left\| y-x\right\| _{p})\hbox {d}y\\&=\frac{1}{\kappa \mu }\int _{{\mathbb {R}}^{d}}U(y)e^{-\frac{1}{p\mu ^{p}}\left\| y-x\right\| _{p}^{p}}\frac{1}{\mu ^{p}}(y-x)\circ \Vert y-x\Vert ^{p-2}\hbox {d}y, \end{aligned}$$
where \(\circ \) stands for the Hadamard (component-wise) product and \(\Vert y-x\Vert ^{p-2}\) is understood component-wise, i.e., as the vector with entries \(\vert y_{j}-x_{j}\vert ^{p-2}\). Therefore, changing the variable back to \(\xi \), we deduce
$$\begin{aligned} \nabla _{x}U_{\mu }(x)&=\frac{1}{\kappa }\int _{{\mathbb {R}}^{d}}U(x+\mu \xi )e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\frac{1}{\mu }\xi \circ \Vert \xi \Vert ^{p-2}\hbox {d}\xi \\&={\mathbb {E}}_{\xi }\left[ \frac{U(x+\mu \xi )\xi \circ \Vert \xi \Vert ^{p-2}}{\mu }\right] . \end{aligned}$$
In addition, if \(\xi \sim N_{p}(0,I_{d})\), \({\mathbb {E}}\left( \xi \right) =\frac{1}{\kappa }\int \xi e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi =0\) and then \(\nabla _{\xi }{\mathbb {E}}\left( \xi \right) =0\). Since \(\xi e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\) is bounded, we can exchange the gradient and the integral and get
$$\begin{aligned} \nabla _{\xi }\frac{1}{\kappa }\int \xi e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi&=\frac{1}{\kappa }\int \nabla _{\xi }\left( \xi e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\right) \hbox {d}\xi \\ 0&=\frac{1}{\kappa }\int e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi +\frac{1}{\kappa }\int \xi \nabla _{\xi }\left( e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\right) \hbox {d}\xi \\ 0&=1-\frac{1}{\kappa }\int \xi \textrm{e}^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\left\| \xi \right\| _{p}^{p-1}\nabla _{\xi }\left( \left\| \xi \right\| _{p}\right) \hbox {d}\xi \\ 0&=1-\frac{1}{\kappa }\int \xi \cdot \xi \circ \Vert \xi \Vert ^{p-2}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi , \end{aligned}$$
which implies
$$\begin{aligned} \frac{1}{\kappa }\int \xi \cdot \xi \circ \Vert \xi \Vert ^{p-2}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi =1. \end{aligned}$$
(7.9)
On the other hand, we also have \(\frac{1}{\kappa }\int e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi =1\) so \(\nabla _{\xi }\int e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi =0.\) By exchanging the gradient and the integral, we also get
$$\begin{aligned} 0&=\nabla _{\xi }\int e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi \\&=\int \nabla _{\xi }e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi \\&=\int \nabla _{\xi }\left( e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\right) \hbox {d}\xi \\&=-\int e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\xi \circ \Vert \xi \Vert ^{p-2}\hbox {d}\xi \end{aligned}$$
which implies that
$$\begin{aligned} {\mathbb {E}}_{\xi }\left[ \xi \circ \Vert \xi \Vert ^{p-2}\right] =0. \end{aligned}$$
(7.10)
From 7.9 and 7.10, we obtain
$$\begin{aligned} \left\| \nabla U_{\mu }(x)-\nabla U(x)\right\|&=\left\| \frac{1}{\kappa }\int _{{\mathbb {R}}^{d}}\left[ \frac{U(x+\mu \xi )-U(x)}{\mu } -\left\langle \nabla U(x),\xi \right\rangle \right] \right. \\&\qquad \left. \xi \circ \Vert \xi \Vert ^{p-2}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi \right\| \\&{\mathop {\le }\limits ^{_{1}}}\frac{1}{\kappa \mu }\int _{{\mathbb {R}}^{d}}\Vert U(x+\mu \xi )-U(x)-\mu \left\langle \nabla U(x),\xi \right\rangle \Vert \\&\qquad e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\left\| \xi \circ \Vert \xi \Vert ^{p-2}\right\| \hbox {d}\xi \\&{\mathop {\le }\limits ^{_{2}}}\sum _{i}\frac{L_{i}\mu ^{\alpha _{i}}}{\kappa \left( 1+\alpha _{i}\right) }\int _{{\mathbb {R}}^{d}}\left\| \xi \right\| ^{\alpha _{i}+1}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\left\| \xi \circ \Vert \xi \Vert ^{p-2}\right\| \hbox {d}\xi \\&=\sum _{i}\frac{L_{i}\mu ^{\alpha _{i}}}{\kappa \left( 1+\alpha _{i}\right) }\int _{{\mathbb {R}}^{d}}\left\| \xi \right\| ^{\alpha _{i}+1}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\left\| \xi ^{p-1}\right\| \hbox {d}\xi , \end{aligned}$$
where step 1 follows from Jensen's inequality, step 2 is due to Eq. 2.1, and the last step follows from the component-wise action of the norm. If \(p\le 2\), then by the generalized Hölder inequality, \(\left\| \xi ^{p-1}\right\| \) can be bounded as follows:
$$\begin{aligned} \left\| \xi ^{p-1}\right\|&\le \left\| \xi ^{p-1}\right\| _{p}\nonumber \\&=\left\| \xi ^{p-1}\cdot 1_{d}\right\| _{p}\nonumber \\&\le \left\| \xi \right\| _{p}^{p-1}\left\| 1_{d}\right\| _{p}^{2-p}\nonumber \\&=\left\| \xi \right\| _{p}^{p-1}d^{\frac{2-p}{p}}. \end{aligned}$$
(7.11)
As a result, if \(1\le p\le 2\) we have
$$\begin{aligned} \left\| \nabla U_{\mu }(x)-\nabla U(x)\right\|&\le \sum _{i}\frac{L_{i}\mu ^{\alpha _{i}}}{\kappa \left( 1+\alpha _{i}\right) }\int _{{\mathbb {R}}^{d}}\left\| \xi \right\| ^{\alpha _{i}+1}\left\| \xi \right\| _{p}^{p-1}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi \\&{\mathop {\le }\limits ^{_{1}}}\sum _{i}\frac{L_{i}\mu ^{\alpha _{i}}}{\left( 1+\alpha _{i}\right) }d^{\frac{2-p}{p}}{\mathbb {E}}\left[ \left\| \xi \right\| _{p}^{p+\alpha _{i}}\right] \\&{\mathop {\le }\limits ^{_{2}}}\sum _{i}\frac{L_{i}\mu ^{\alpha _{i}}}{\left( 1+\alpha _{i}\right) }d^{\frac{2-p}{p}}{\mathbb {E}}\left[ \left\| \xi \right\| _{p}^{2p}\right] ^{\frac{p+\alpha _{i}}{2p}}\\&{\mathop {\le }\limits ^{_{3}}}\sum _{i}\frac{L_{i}\mu ^{\alpha _{i}}}{\left( 1+\alpha _{i}\right) }d^{\frac{2-p}{p}}\left( d+p\right) ^{\frac{p+\alpha _{i}}{p}}\\&\le \sum _{i}L_{i}\mu ^{\alpha _{i}}d^{\frac{3}{p}}, \end{aligned}$$
where step 1 uses \(\left\| \xi \right\| \le \left\| \xi \right\| _{p}\) together with Eq. 7.11, step 2 follows from Jensen's inequality and \(\alpha _{i}\le p\), step 3 is due to Lemma 7.25, and the last step uses \(\alpha _{i}\le 1\) and simplification for large enough d and small enough \(\mu \).
(iii) In this case, using Eqs. 2.1 and 7.10, we get:
$$\begin{aligned} \nabla U_{\mu }(x)=\frac{1}{\kappa }\int _{{\mathbb {R}}^{d}}\left[ \frac{U(x+\mu \xi )-U(x)}{\mu }\right] \xi \circ \Vert \xi \Vert ^{p-2}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi . \end{aligned}$$
Let \(V(x)=U(x+\mu \xi )-U(x)\), from the above equation, we obtain
$$\begin{aligned}&\left\| \nabla U_{\mu }(y)-\nabla U_{\mu }(x)\right\| \\&\quad =\left\| \frac{1}{\mu \kappa }\int _{{\mathbb {R}}^{d}}\left( V(y)-V(x)\right) e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\xi \circ \Vert \xi \Vert ^{p-2}\hbox {d}\xi \right\| \\&\quad =\left\| \frac{1}{\mu \kappa }\int _{{\mathbb {R}}^{d}}\int _{0}^{1}\left\langle \nabla V\left( ty+\left( 1-t\right) x\right) ,y-x\right\rangle \hbox {d}t\,e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\xi \circ \Vert \xi \Vert ^{p-2}\hbox {d}\xi \right\| \\&\quad =\left\| \frac{1}{\mu \kappa }\int _{{\mathbb {R}}^{d}}\int _{0}^{1}\left\langle \nabla U \left( ty+\left( 1-t\right) x+\mu \xi \right) -\nabla U\left( ty+\left( 1-t\right) x\right) ,y-x\right\rangle \right. \\&\qquad \left. \hbox {d}t\,e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\xi \circ \Vert \xi \Vert ^{p-2}\hbox {d}\xi \right\| \\&\quad \le \frac{1}{\mu \kappa }\int \int _{0}^{1}\left\| \nabla U\left( ty+\left( 1-t\right) x+\mu \xi \right) -\nabla U\left( ty+\left( 1-t\right) x\right) \right\| \left\| y-x\right\| \\&\qquad \hbox {d}t\,e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\left\| \xi \circ \Vert \xi \Vert ^{p-2}\right\| \hbox {d}\xi \\&\quad \le \sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}\kappa }\int _{{\mathbb {R}}^{d}}\left\| \xi \right\| ^{\alpha _{i}}\left\| y-x\right\| \,e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\left\| \xi ^{p-1}\right\| \hbox {d}\xi . \end{aligned}$$
Since \(p\le 2\), we have
$$\begin{aligned}&\left\| \nabla U_{\mu }(y)-\nabla U_{\mu }(x)\right\| \\&\quad \le \sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2-p}{p}}{\mathbb {E}}\left( \left\| \xi \right\| _{p}^{p-1+\alpha _{i}}\right) \left\| y-x\right\| \\&\quad {\mathop {\le }\limits ^{_{1}}}\sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2-p}{p}}{\mathbb {E}}\left( \left\| \xi \right\| _{p}^{p}\right) ^{\frac{p-1+\alpha _{i}}{p}}\left\| y-x\right\| \\&\quad {\mathop {\le }\limits ^{_{2}}}\sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2-p}{p}}\left( d+\frac{p}{2}\right) ^{\frac{p-1+\alpha _{i}}{p}}\left\| y-x\right\| \\&\quad \le \sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}}\left\| y-x\right\| , \end{aligned}$$
where step 1 follows from Jensen's inequality and \(\alpha _{i}\le 1\), step 2 is due to Lemma 7.25, and the last step follows by simplification for large enough d and small enough \(\mu \).
\(\square \)
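Part (i) of Lemma 7.5 can be illustrated numerically. The sketch below uses the hypothetical potential \(U(x)=\frac{1}{2}\Vert x\Vert ^{2}\) (so \(N=1\), \(L_{1}=1\), \(\alpha _{1}=1\)) and a Monte Carlo estimate of \(U_{\mu }(x)\); the values \(p=1.5\), \(d=10\), \(\mu =0.1\) are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
p, d, mu = 1.5, 10, 0.1

def U(x):
    # hypothetical potential with N = 1, L_1 = 1, alpha_1 = 1
    return 0.5 * np.sum(x**2, axis=-1)

# Monte Carlo estimate of the smoothed potential U_mu(x) = E[U(x + mu * xi)],
# with xi ~ N_p(0, I_d) sampled coordinate-wise via the Gamma trick.
g = rng.gamma(1 / p, scale=p, size=(400_000, d))
xi = rng.choice([-1.0, 1.0], size=(400_000, d)) * g**(1 / p)

x = np.ones(d)
gap = abs(np.mean(U(x + mu * xi)) - U(x))

assert gap <= mu**2 * d**(2 / p)  # Lemma 7.5(i) with L = 1, alpha = 1
```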
1.3 Appendix C: Proofs Under LSI
1.3.1 Proof of Lemma 3.3
Lemma 7.6
Suppose \(\pi =e^{-U}\) is \(\alpha \)-mixture weakly smooth. Let \(p_{0}=N(0,\frac{1}{L}I)\). Then, \(H(p_{0}\vert \pi )\le U(0)-\frac{d}{2}\log \frac{2\Pi e}{L}+\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left( \frac{d}{L}\right) ^{\frac{1+\alpha _{i}}{2}}=O(d).\)
Proof
Since U is mixture weakly smooth, for all \(x\in {\mathbb {R}}^{d}\) we have
$$\begin{aligned} U(x)&\le U(0)+\langle \nabla U(0),x\rangle +\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\Vert x\Vert ^{1+\alpha _{i}}\\&=U(0)+\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\Vert x\Vert ^{1+\alpha _{i}}. \end{aligned}$$
Let \(X\sim \rho =N(0,\frac{1}{L}I)\). Then
$$\begin{aligned} {\mathbb {E}}_{\rho }[U(X)]&\le U(0)+\sum _{i}\frac{L_{i}}{1+\alpha _{i}}{\mathbb {E}}_{\rho }\left( \Vert x\Vert ^{1+\alpha _{i}}\right) \\&\le U(0)+\sum _{i}\frac{L_{i}}{1+\alpha _{i}}{\mathbb {E}}_{\rho }\left( \Vert x\Vert ^{2}\right) ^{\frac{1+\alpha _{i}}{2}}\\&\le U(0)+\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left( \frac{d}{L}\right) ^{\frac{1+\alpha _{i}}{2}}. \end{aligned}$$
Recall the entropy of \(\rho \) is \(H(\rho )=-{\mathbb {E}}_{\rho }[\log \rho (X)]=\frac{d}{2}\log \frac{2\Pi e}{L}\). Therefore, the KL divergence is
$$\begin{aligned} H(\rho \vert \pi )&=\int \rho \left( \log \rho +U\right) \hbox {d}x\\&=-H(\rho )+{\mathbb {E}}_{\rho }[U]\\&\le U(0)-\frac{d}{2}\log \frac{2\Pi e}{L}+\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left( \frac{d}{L}\right) ^{\frac{1+\alpha _{i}}{2}}\\&=O(d). \end{aligned}$$
This is the desired result. \(\square \)
1.3.2 Proof of Lemma 3.3
Lemma 7.7
Assume \(\pi =e^{-U(x)}\) is \(\alpha \)-mixture weakly smooth. Then
$$\begin{aligned} {\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] \le 2\left( \sum _{i}L_{i}\right) ^{2}d^{\frac{3}{p}}. \end{aligned}$$
Proof
It is well known (cf. [38]) that for any test function \(\phi \left( x\right) :{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\), we have
$$\begin{aligned} \frac{\hbox {d}}{\hbox {d}t}{\mathbb {E}}_{p_{t}}\left[ \phi \left( x\right) \right] =\int \left( \left( \triangle \phi \left( x\right) \right) -\left\langle \nabla U\left( x\right) ,\nabla \phi \left( x\right) \right\rangle \right) p_{t}\left( x\right) \hbox {d}x. \end{aligned}$$
Since \(\pi \) is the stationary distribution of \(p_{t}(x)\), choosing \(\phi \left( x\right) =U_{\mu }\left( x\right) \) gives
$$\begin{aligned} \frac{\hbox {d}}{\hbox {d}t}{\mathbb {E}}_{\pi }\left[ U_{\mu }\left( x\right) \right] =\int \left( \left( \triangle U_{\mu }\left( x\right) \right) -\left\langle \nabla U\left( x\right) ,\nabla U_{\mu }\left( x\right) \right\rangle \right) \pi \left( x\right) \hbox {d}x=0. \end{aligned}$$
So
$$\begin{aligned} {\mathbb {E}}_{\pi }\left\langle \nabla U\left( x\right) ,\nabla U_{\mu }\left( x\right) \right\rangle&={\mathbb {E}}_{\pi }\left( \triangle U_{\mu }\left( x\right) \right) \le d\sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}}, \end{aligned}$$
where the last step comes from Lemma 2.10, which states that \(\nabla U_{\mu }\left( x\right) \) is \(\sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}}\)-Lipschitz, so that \(\nabla ^{2}U_{\mu }\left( x\right) \preceq \left( \sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}}\right) \,I\). In addition,
$$\begin{aligned} {\mathbb {E}}_{\pi }\left\langle \nabla U\left( x\right) ,\nabla U_{\mu }\left( x\right) \right\rangle&={\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] +{\mathbb {E}}_{\pi }\left\langle \nabla U\left( x\right) ,\nabla U_{\mu }\left( x\right) -\nabla U\left( x\right) \right\rangle \\&{\mathop {\ge }\limits ^{_{1}}}{\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] -{\mathbb {E}}_{\pi }\left\| \nabla U\left( x\right) \right\| \left\| \nabla U_{\mu }\left( x\right) -\nabla U\left( x\right) \right\| \\&{\mathop {\ge }\limits ^{}}{\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] -\sqrt{{\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] }\sum _{i}L_{i}\mu ^{\alpha _{i}}d^{\frac{3}{p}}, \end{aligned}$$
where step 1 follows from the Cauchy–Schwarz inequality, and the last step combines the Cauchy–Schwarz inequality for expectations with Lemma 2.10. From the quadratic inequality,
$$\begin{aligned} {\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] -\sqrt{{\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] }\sum _{i}L_{i}\mu ^{\alpha _{i}}d^{\frac{3}{p}}\le d\sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}} \end{aligned}$$
and since \(\sqrt{{\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] }\ge 0\), we obtain
$$\begin{aligned} \sqrt{{\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] }&\le \frac{1}{2}\left[ \sqrt{\left( \sum _{i}L_{i}\mu ^{\alpha _{i}}\right) ^{2}d^{\frac{6}{p}}+4d\sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}}}+\sum _{i}L_{i}\mu ^{\alpha _{i}}d^{\frac{3}{p}}\right] . \end{aligned}$$
Since this holds for every \(\mu \), we may simply choose \(\mu =1\) to get
$$\begin{aligned} {\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right]&\le \frac{1}{4}\left[ \sqrt{\left( \sum _{i}L_{i}\right) ^{2}d^{\frac{6}{p}}+4d\left( \sum _{i}L_{i}\right) d^{\frac{2}{p}}}+\sum _{i}L_{i}d^{\frac{3}{p}}\right] ^{2}\\&\le 2\left( \sum _{i}L_{i}\right) ^{2}d^{\frac{3}{p}}, \end{aligned}$$
for large enough d. \(\square \)
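For a concrete check of Lemma 7.7, take the hypothetical target \(\pi =N(0,I_{d})\), i.e., \(U(x)=\frac{1}{2}\Vert x\Vert ^{2}\) with \(N=1\), \(L_{1}=1\), \(\alpha _{1}=1\); then \({\mathbb {E}}_{\pi }\Vert \nabla U\Vert ^{2}=d\) exactly, comfortably below the bound \(2d^{3/p}\). A Monte Carlo sketch (illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
d, p = 5, 2.0  # pi = N(0, I_d): U(x) = ||x||^2 / 2, so N = 1, L_1 = 1

x = rng.standard_normal((200_000, d))    # exact samples from pi
grad_sq = np.mean(np.sum(x**2, axis=1))  # E_pi ||grad U(x)||^2 = d exactly

assert abs(grad_sq - d) < 0.1
assert grad_sq <= 2 * d**(3 / p)  # bound of Lemma 7.7 with sum_i L_i = 1
```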
1.3.3 Proof of Lemma 3.1
Lemma 7.8
Suppose \(\pi \) is \(\gamma \)-log-Sobolev and \(\alpha \)-mixture weakly smooth with \(\max \left\{ L_{i}\right\} =L\ge 1\). If \(0<\eta \le \left( \frac{\gamma }{9N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\), then along each step of ULA (3.6),
$$\begin{aligned} H(p_{k+1}\vert \pi )\le e^{-\gamma \eta }H(p_{k}\vert \pi )+2\eta ^{\alpha +1}D_{3}, \end{aligned}$$
(7.12)
where \(D_{3}=\sum _{i}10N^{3}L^{6}+16NL^{4}+8N^{2}L^{4}d^{\frac{3}{p}}+4NL^{2}d\).
Proof
We adapt the proof of [38]. First, recall that the discretization of the LMC is
$$\begin{aligned} x_{k,t}=x_{k}-t\nabla U(x_{k})+\sqrt{2t}\,z_{k}, \end{aligned}$$
where \(z_{k}\sim N(0,I)\) is independent of \(x_{k}\). Let \(x_{k}\sim p_{k}\) and \(x^{*}\sim \pi \) with an optimal coupling \((x_{k},x^{*})\) so that \({\mathbb {E}}[\Vert x_{k}-x^{*}\Vert ^{2}]=W_{2}(p_{k},\pi )^{2}\). Let \(D_{1i}=8NL_{i}^{2+2\alpha _{i}}\left( \left( \sum _{j}L_{j}\right) ^{2}+1\right) +16L_{i}^{2+2\alpha _{i}}+8L_{i}^{2}\left( \sum _{j}L_{j}\right) ^{2}d^{\frac{3}{p}}+4L_{i}^{2}d^{\alpha _{i}}\). Then we deduce
$$\begin{aligned}&L_{i}^{2}E_{p_{k}}\left[ \left\| -t\nabla U(x_{k})+\sqrt{2t}z_{k}\right\| ^{2\alpha _{i}}\right] \nonumber \\&\quad {\mathop {\le }\limits ^{_{1}}}2L_{i}^{2}t^{2\alpha _{i}}{\mathbb {E}}_{p_{k}}\left[ \left\| \nabla U(x_{k})\right\| ^{2\alpha _{i}}\right] +4L_{i}^{2}t^{\alpha _{i}}{\mathbb {E}}_{p_{k}}\left[ \left\| z_{k}\right\| ^{2\alpha _{i}}\right] \nonumber \\&\quad {\mathop {\le }\limits ^{_{2}}}2L_{i}^{2}t^{2\alpha _{i}}{\mathbb {E}}_{p_{k}}\left[ \left\| \nabla U(x_{k})\right\| ^{2\alpha _{i}}\right] +4L_{i}^{2}t^{\alpha _{i}}{\mathbb {E}}_{p_{k}}\left[ \left\| z_{k}\right\| ^{2}\right] ^{\alpha _{i}}\nonumber \\&\quad {\mathop {\le }\limits ^{_{3}}}4L_{i}^{2}t^{2\alpha _{i}}{\mathbb {E}}\left[ \left\| \nabla U(x_{k})-\nabla U(x^{*})\right\| ^{2\alpha _{i}}+\left\| \nabla U(x^{*})\right\| ^{2\alpha _{i}}\right] +4L_{i}^{2}t^{\alpha _{i}}d^{\alpha _{i}}\nonumber \\&\quad {\mathop {\le }\limits ^{_{4}}}4L_{i}^{2}t^{2\alpha _{i}}{\mathbb {E}}\left( \sum _{j}L_{j}\left\| x_{k}-x^{*}\right\| ^{\alpha _{j}}\right) ^{2\alpha _{i}}+4L_{i}^{2}t^{2\alpha _{i}}{\mathbb {E}}\left\| \nabla U(x^{*})\right\| ^{2\alpha _{i}}+4L_{i}^{2}t^{\alpha _{i}}d^{\alpha _{i}}\nonumber \\&\quad \le 8L_{i}^{2+2\alpha _{i}}t^{2\alpha _{i}}N\sum _{j}L_{j}^{2\alpha _{i}}{\mathbb {E}}\left[ \left\| x_{k}-x^{*}\right\| ^{2\alpha _{j}\alpha _{i}}\right] +4L_{i}^{2}t^{2\alpha }{\mathbb {E}}\left\| \nabla U(x^{*})\right\| ^{2}\nonumber \\&\qquad +4L_{i}^{2}t^{2\alpha }+4L_{i}^{2}t^{\alpha }d^{\alpha }\nonumber \\&\quad {\mathop {\le }\limits ^{_{5}}}8NL_{i}^{2+2\alpha _{i}}t^{2\alpha _{i}}\left( \left( \sum _{j}L_{j}\right) ^{2}+1\right) {\mathbb {E}}\left[ 1+\left\| x_{k}-x^{*}\right\| ^{2}\right] +4L_{i}^{2}t^{2\alpha }{\mathbb {E}}\left\| \nabla U(x^{*})\right\| ^{2}\nonumber \\&\qquad +4L_{i}^{2}t^{2\alpha }+4L_{i}^{2}t^{\alpha }d^{\alpha }\nonumber \\&\quad \le 8NL_{i}^{2+2\alpha _{i}}\eta ^{2\alpha }\left( \left( \sum _{j}L_{j}\right) ^{2}+1\right) {\mathbb {E}}\left[ \left\| x_{k}-x^{*}\right\| ^{2}\right] \nonumber \\&\qquad +\left( 8NL_{i}^{2+2\alpha _{i}}\left( \left( \sum _{j}L_{j}\right) ^{2}+1\right) +16L_{i}^{2+2\alpha _{i}}+8L_{i}^{2}\left( \sum _{j}L_{j}\right) ^{2}d^{\frac{3}{p}}+4L_{i}^{2}d^{\alpha _{i}}\right) \eta ^{\alpha _{i}}\nonumber \\&\quad \le \frac{16N}{\gamma }\left( \left( \sum _{j}L_{j}\right) ^{2}+1\right) L^{2+2\alpha _{i}}\eta ^{2\alpha _{i}}H(p_{k}\vert \pi )+D_{1i}\eta ^{\alpha _{i}}, \end{aligned}$$
(7.13)
where step 1 follows from Lemma 7.22 in Appendix F, step 2 is from \(\alpha _{i}\le 1\) and Jensen's inequality, step 3 uses the moments of the normal distribution, step 4 follows from Assumption 2.1, step 5 uses \(\alpha _{i}\le 1\), and the last step is due to the Talagrand inequality (implied by the log-Sobolev inequality) and Lemma 7.25 in Appendix F. Similarly, we get
$$\begin{aligned}&{\mathbb {E}}_{p_{kt}}\left\| \nabla U(x_{k})-\nabla U(x_{k,t})\right\| ^{2}\nonumber \\&\quad {\mathop {\le }\limits ^{_{1}}}\sum _{i}L_{i}^{2}{\mathbb {E}}_{p_{kt}}\left\| x_{k,t}-x_{k}\right\| ^{2\alpha _{i}}\nonumber \\&\quad =\sum _{i}L_{i}^{2}{\mathbb {E}}_{p_{k}}\left\| -t\nabla U(x_{k})+\sqrt{2t}z_{k}\right\| ^{2\alpha _{i}}\nonumber \\&\quad {\mathop {\le }\limits ^{_{2}}}\sum _{i}\frac{16N}{\gamma }\left( \left( \sum _{j}L_{j}\right) ^{2}+1\right) L^{2+2\alpha _{i}}\eta ^{2\alpha _{i}}H(p_{k}\vert \pi )+\left( \sum _{i}D_{1i}\eta ^{\alpha _{i}}\right) \nonumber \\&\quad {\mathop {\le }\limits ^{_{3}}}\frac{20N^{3}}{\gamma }L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi )+D_{3}\eta ^{\alpha }, \end{aligned}$$
(7.14)
where step 1 follows from Assumption 2.1, step 2 comes from the same reasoning as Eq. (7.13), and the last step uses \(\eta \le \frac{1}{L}\), \(\eta \le 1\), and the definition of \(D_{3}\). Therefore, by [38] Lemma 3, the time derivative of the KL divergence along LMC is bounded by
$$\begin{aligned} \frac{\hbox {d}}{\hbox {d}t}H\left( p_{k,t}\vert \pi \right)&\le -\frac{3}{4}I\left( p_{k,t}\vert \pi \right) +{\mathbb {E}}_{p_{kt}}\left[ \left\| \nabla U(x_{k,t})-\nabla U(x_{k})\right\| ^{2}\right] \\&\le -\frac{3}{4}I(p_{k}\vert \pi )+\frac{20N^{3}}{\gamma }L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi )+D_{3}\eta ^{\alpha }\\&\le -\frac{3\gamma }{2}H(p_{k,t}\vert \pi )+\frac{20N^{3}}{\gamma }L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi )+D_{3}\eta ^{\alpha }, \end{aligned}$$
where in the last inequality we have used Definition 7.1 of the LSI. Note that we do not use the Lipschitz gradient condition of the original Lemma 3 of [38] here, and we provide a modified version of that lemma for stochastic gradients later on.
Multiplying both sides by \(e^{\frac{3\gamma }{2}t}\) and integrating both sides from \(t=0\) to \(t=\eta \), we obtain
$$\begin{aligned}&e^{\frac{3\gamma }{2}\eta }H(p_{k+1}\vert \pi )-H(p_{k}\vert \pi )\nonumber \\&\quad \le 2\left( \frac{e^{\frac{3\gamma }{2}\eta }-1}{3\gamma }\right) \left( \frac{20N^{3}}{\gamma }L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi )+D_{3}\eta ^{\alpha }\right) \end{aligned}$$
(7.15)
$$\begin{aligned}&\quad \le 2\eta \left( \frac{20N^{3}}{\gamma }L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi )+D_{3}\eta ^{\alpha }\right) , \end{aligned}$$
(7.16)
where the last line holds by \(e^{c}\le 1+2c\) for \(0<c=\frac{3\gamma }{2}\eta <1\). Rearranging the terms of the above inequality and using the facts that \(1+\eta ^{1+2\alpha }\frac{40N^{3}}{\gamma }L^{6}\le 1+\frac{\gamma \eta }{2}\le e^{\frac{\gamma \eta }{2}}\) when \(\eta \le \left( \frac{\gamma }{9N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\) and that \(e^{-\frac{3\gamma }{2}\eta }\le 1\) leads to
$$\begin{aligned} H(p_{k+1}\vert \pi )&\le e^{-\frac{3\gamma }{2}\eta }\left( 1+\eta ^{1+2\alpha }\frac{40N^{3}}{\gamma }L^{6}\right) H(p_{k}\vert \pi )+2\eta ^{\alpha +1}D_{3}\nonumber \\&\le e^{-\gamma \eta }H(p_{k}\vert \pi )+2\eta ^{\alpha +1}D_{3}. \end{aligned}$$
(7.17)
as desired. \(\square \)
1.3.4 Proof of Theorem 3.2
Theorem 7.9
Suppose \(\pi \) is \(\gamma \)-log-Sobolev and \(\alpha \)-mixture weakly smooth. Let \(L=1\vee \max \left\{ L_{i}\right\} \). Then, for any \(x_{0}\sim p_{0}\) with \(H(p_{0}\vert \pi )=C_{0}<\infty \), the iterates \(x_{k}\sim p_{k}\) of ULA with step size
$$\begin{aligned} \eta \le \min \left\{ 1,\frac{1}{4\gamma },\left( \frac{\gamma }{9N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\right\} \end{aligned}$$
(7.18)
satisfy
$$\begin{aligned} H(p_{k}\vert \pi )\le e^{-\frac{3\gamma }{2}\eta k}H(p_{0}\vert \pi )+2\eta ^{\alpha +1}D_{3}, \end{aligned}$$
(7.19)
where \(D_{3}=\sum _{i}10N^{3}L^{6}+16NL^{4}+8N^{2}L^{4}d^{\frac{3}{p}}+4NL^{2}d\). Then, for any \(\epsilon >0\), to achieve \(H(p_{k}\vert \pi )<\epsilon \), it suffices to run LMC with step size
$$\begin{aligned} \eta \le \min \left\{ 1,\frac{1}{4\gamma },\left( \frac{\gamma }{9N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }},\left( \frac{3\epsilon \gamma }{16D_{3}}\right) ^{\frac{1}{\alpha }}\right\} \end{aligned}$$
(7.20)
for \(k\ge \frac{1}{\gamma \eta }\log \frac{2\,H\left( p_{0}\vert \pi \right) }{\epsilon }\) iterations.
Proof
Applying inequality (7.12) recursively and using the inequality \(1-e^{-c}\ge \frac{3}{4}c\) for \(0<c=\gamma \eta \le \frac{1}{4}\), we obtain
$$\begin{aligned} H(p_{k}\vert \pi )&\le \,e^{-\gamma \eta k}H(p_{0}\vert \pi )+\frac{2\eta ^{\alpha +1}D_{3}}{1-e^{-\gamma \eta }}\nonumber \\&\le \,e^{-\gamma \eta k}H(p_{0}\vert \pi )+\frac{2\eta ^{\alpha +1}D_{3}}{\frac{3}{4}\gamma \eta }\nonumber \\&\le \,e^{-\gamma \eta k}H(p_{0}\vert \pi )+\frac{8\eta ^{\alpha }D_{3}}{3\gamma }. \end{aligned}$$
(7.21)
Note that the last inequality holds if we choose \(\eta \) such that it satisfies
$$\begin{aligned} \eta \le \min \left\{ 1,\frac{1}{4\gamma },\left( \frac{\gamma }{9N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\right\} . \end{aligned}$$
Given \(\epsilon >0\), if we further assume \(\eta \le \left( \frac{3\epsilon \gamma }{16D_{3}}\right) ^{\frac{1}{\alpha }}\), then the above implies \(H(p_{k}\vert \pi )\le e^{-\gamma \eta k}H(p_{0}\vert \pi )+\frac{\epsilon }{2}.\) This means for \(k\ge \frac{1}{\gamma \eta }\log \frac{2\,H\left( p_{0}\vert \pi \right) }{\epsilon },\) we have \(H(p_{k}\vert \pi )\le \frac{\epsilon }{2}+\frac{\epsilon }{2}=\epsilon \), as desired. \(\square \)
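The schedule in the proof translates directly into an implementable sampler. Below is a minimal sketch in Python (not from the paper): it assumes a concrete gradient oracle `grad_U` and illustrative values for the constants \(N\), \(L\), \(D_{3}\), computes the step size (7.20) and the iteration count \(k\ge \frac{1}{\gamma \eta }\log \frac{2H(p_{0}\vert \pi )}{\epsilon }\), and runs plain ULA; the standard Gaussian target (so \(\gamma =1\)) is used purely as an example.

```python
import numpy as np

def ula_sample(grad_U, x0, gamma, N, L, D3, alpha, eps, H0, rng):
    # Step size from (7.20): the minimum of the four admissible bounds.
    eta = min(1.0,
              1.0 / (4.0 * gamma),
              (gamma / (9.0 * N ** 1.5 * L ** 3)) ** (1.0 / alpha),
              (3.0 * eps * gamma / (16.0 * D3)) ** (1.0 / alpha))
    # Iteration count: k >= log(2 * H(p0|pi) / eps) / (gamma * eta).
    k = int(np.ceil(np.log(2.0 * H0 / eps) / (gamma * eta)))
    x = np.array(x0, dtype=float)
    for _ in range(k):
        # One ULA step: x <- x - eta * grad U(x) + sqrt(2 * eta) * z, z ~ N(0, I).
        x = x - eta * grad_U(x) + np.sqrt(2.0 * eta) * rng.standard_normal(x.shape)
    return x, eta, k

# Standard Gaussian target pi ∝ exp(-||x||^2 / 2): grad U(x) = x, gamma = 1.
# The constants N, L, D3, H0 below are illustrative placeholders.
rng = np.random.default_rng(0)
x, eta, k = ula_sample(lambda x: x, np.zeros(3), gamma=1.0, N=1, L=1,
                       D3=10.0, alpha=1.0, eps=0.1, H0=1.0, rng=rng)
```

With these placeholder constants the binding bound is the \(\epsilon \)-dependent one, \(\eta =3\epsilon \gamma /(16D_{3})\), and \(k\) is on the order of a few thousand steps.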
1.4 Appendix D: Proof of Sampling via Smoothing Potential
1.4.1 Proof of Lemma 3.4
Lemma 7.10
For any \(x_{k}\in {\mathbb {R}}^{d}\), \(g_{\mu }(x_{k},\zeta _{1})\) is an unbiased estimator of \(\nabla U_{\mu }(x_{k})\) such that
$$\begin{aligned} \textrm{Var}\left[ g_{\mu }(x_{k},\zeta _{1})\right]&\le 4N^{2}L^{2}\mu ^{2\alpha }d^{\frac{2\alpha }{p}}. \end{aligned}$$
Proof
Recall that by definition of \(U_{\mu }\), we have \(U_{\mu }(x)={\mathbb {E}}_{\zeta }[U(x+\mu {\zeta })]\), where \({\zeta }\sim N_{p}(0,I_{d})\) is independent of \(\zeta _{1}\). Clearly, \({\mathbb {E}}{}_{{{\zeta _{1}}}}[g_{\mu }(x,\zeta _{1})]={\mathbb {E}}{}_{{{\zeta _{1}}}}[\nabla U(x+\mu \zeta _{1})]=\nabla {\mathbb {E}}{}_{{{\zeta _{1}}}}[U(x+\mu \zeta _{1})]=\nabla U_{\mu }(x)\), by exchanging gradient and expectation and using the definition of \(U_{\mu }(x)\), so the estimator is unbiased. We now proceed to bound the variance of \(g_{\mu }(x,\zeta _{1})\). We have:
$$\begin{aligned}&\mathrm {{\mathbb {E}}}_{{\zeta _{1}}} [\Vert \nabla U_{\mu }(x)-g_{\mu }(x,\zeta _{1})\Vert _{2}^{2}]\\&\quad \le \mathrm {{\mathbb {E}}}_{\zeta _{1},{\zeta }}[\Vert \nabla U(x+\mu {\zeta })-\nabla U(x+\mu {\zeta _{1}})\Vert ^{2}]\\&\quad \le N\sum _{i}L_{i}^{2}\mathrm {{\mathbb {E}}}_{{\zeta _{1}},{\zeta }}[\Vert \mu ({\zeta }-{\zeta _{1}})\Vert ^{2\alpha _{i}}]\\&\quad \le N\sum _{i}L_{i}^{2}\mu ^{2\alpha _{i}}\mathrm {{\mathbb {E}}}_{\zeta _{1},{\zeta }}[\Vert {\zeta }-{\zeta _{1}}\Vert ^{2\alpha _{i}}]\\&\quad \le 2N\sum _{i}L_{i}^{2}\mu ^{2\alpha _{i}}\left( \mathrm {{\mathbb {E}}}\left[ \Vert {\zeta }\Vert ^{2\alpha _{i}}\right] +\mathrm {{\mathbb {E}}}\left[ \Vert {\zeta _{1}}\Vert ^{2\alpha _{i}}\right] \right) \\&\quad \le 2N\sum _{i}L_{i}^{2}\mu ^{2\alpha _{i}}\left( \left( \mathrm {{\mathbb {E}}}\left[ \Vert {\zeta }\Vert ^{2}\right] \right) ^{\alpha _{i}}+\left( \mathrm {{\mathbb {E}}}\left[ \Vert \zeta _{1}\Vert ^{2}\right] \right) ^{\alpha _{i}}\right) \\&\quad \le 4N\sum _{i}L_{i}^{2}\mu ^{2\alpha _{i}}d^{\frac{2\alpha _{i}}{p}}\\&\quad \le 4N^{2}L^{2}\mu ^{2\alpha }d^{\frac{2\alpha }{p}}, \end{aligned}$$
as claimed. \(\square \)
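Both claims of the lemma (unbiasedness up to Monte Carlo error, and variance of order \(\mu ^{2\alpha }\)) can be checked empirically. A sketch under an illustrative weakly smooth potential \(U(x)=\Vert x\Vert ^{1+\alpha }/(1+\alpha )\), whose gradient \(\Vert x\Vert ^{\alpha -1}x\) is \(\alpha \)-Hölder (this example potential is an assumption, not one from the paper):

```python
import numpy as np

def grad_U(x, alpha=0.5):
    # Gradient of U(x) = ||x||^(1+alpha) / (1+alpha); alpha-Holder continuous.
    n = np.linalg.norm(x)
    return np.zeros_like(x) if n == 0 else n ** (alpha - 1) * x

def g_mu(x, mu, rng, alpha=0.5):
    # One-sample randomized-smoothing estimator of grad U_mu(x).
    return grad_U(x + mu * rng.standard_normal(x.shape), alpha)

rng = np.random.default_rng(1)
x = np.ones(4)
samples = np.array([g_mu(x, 0.1, rng) for _ in range(20000)])
mean_est = samples.mean(axis=0)                            # approximates grad U_mu(x)
var_est = ((samples - mean_est) ** 2).sum(axis=1).mean()   # total variance
```

With \(\mu =0.1\) the empirical mean stays close to \(\nabla U(x)\) (the smoothing bias is \(O(\mu ^{\alpha })\)) and the empirical variance is small, consistent with the \(O(\mu ^{2\alpha })\) scaling.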
1.4.2 Proof of Lemma 3.6
Before proving Lemma 3.6, we need an additional lemma.
Lemma 7.11
(modified from Lemma 3 of [38]) Suppose \(x_{k,t}\) is the interpolation of the discretized process (1.2). Let \(p_{k,t}\), \(p_{kt}\), and \(p_{kt\zeta }\) denote its distribution, the joint distribution of \(x_{k,t}\) and \(x_{k}\), and the joint distribution of \(x_{k,t}\), \(x_{k}\), and \(\zeta \), respectively. Here \(g_{\mu }(x_{k},\zeta )\) is an estimate of \(\nabla U_{\mu }(x_{k})\) with noise \(\zeta \) such that \({\mathbb {E}}_{\zeta }[g_{\mu }(x_{k},\zeta )]=\nabla U_{\mu }(x_{k})\). Then,
$$\begin{aligned} {\displaystyle \frac{\hbox {d}}{\hbox {d}t}H\left( p_{k,t}\vert \pi _{\mu }\right) \le -\frac{3}{4}I\left( p_{k,t}\vert \pi _{\mu }\right) +{\mathbb {E}}_{p_{kt\zeta }}\left[ \left\| \nabla U_{\mu }(x_{k,t})-g_{\mu }(x_{k},\zeta )\right\| ^{2}\right] }. \end{aligned}$$
(7.22)
Proof
The steps follow exactly as in Lemma 3 of [38], and we provide the proof here for completeness. For each \(t>0\), let \(p_{k\zeta \vert t}(x_{k},\zeta )\) denote the joint distribution of \(x_{k}\) and \(\zeta \) conditioned on \(x_{k,t}\), and let \(p_{t\vert k\zeta }(x_{k,t})\) denote the distribution of \(x_{k,t}\) conditioned on \(x_{k}\) and \(\zeta \). Following the Fokker–Planck equation, we have
$$\begin{aligned} \frac{\partial p_{t\vert k\zeta }(x_{k,t})}{\partial t}=\nabla \cdot \left( p_{t\vert k\zeta }(x_{k,t})g_{\mu }(x_{k},\zeta )\right) +\triangle p_{t\vert k\zeta }(x_{k,t}), \end{aligned}$$
(7.23)
which, upon integrating with respect to \(x_{k}\) and \(\zeta \), yields
$$\begin{aligned} \frac{\partial p_{k,t}(x)}{\partial t}&=\int \int \frac{\partial p_{t\vert k\zeta }(x)}{\partial t}p_{k\zeta }(x_{k},\zeta )\hbox {d}x_{k}\hbox {d}\zeta \nonumber \\&=\int \int \left( \nabla \cdot \left( p_{t\vert k\zeta }(x)g_{\mu }(x_{k},\zeta )\right) +\triangle p_{t\vert k\zeta }(x)\right) p_{k\zeta }(x_{k},\zeta )\hbox {d}x_{k}\hbox {d}\zeta \nonumber \\&=\int \int \nabla \cdot \left( p_{t\vert k\zeta }(x)g_{\mu }(x_{k},\zeta )\right) p_{k\zeta }(x_{k},\zeta )\hbox {d}x_{k}\hbox {d}\zeta +\triangle p_{k,t}(x)\nonumber \\&=\nabla \cdot \left( p_{k,t}(x)\int \int p_{k\zeta \vert t}(x_{k},\zeta )g_{\mu }(x_{k},\zeta )\hbox {d}x_{k}\hbox {d}\zeta \right) +\triangle p_{k,t}(x) \end{aligned}$$
(7.24)
$$\begin{aligned}&=\nabla \cdot \ (p_{k,t}(x)\mathrm {{\mathbb {E}}}_{p_{k\zeta \vert t}}[g_{\mu }(x_{k},\zeta )\vert x_{k,t}=x])+\triangle p_{k,t}(x). \end{aligned}$$
(7.25)
Combining this with \(\int p_{t}\frac{\partial }{\partial t}\log \frac{p_{t}}{\pi _{\mu }}\,\hbox {d}x=\int \frac{\partial p_{t}}{\partial t}\,\hbox {d}x=\frac{\hbox {d}}{\hbox {d}t}\int p_{t}\,\hbox {d}x=0\), we get the following bound on the time derivative of the KL divergence.
$$\begin{aligned} \frac{\hbox {d}}{\hbox {d}t}H\left( p_{k,t}\vert \pi _{\mu }\right)&=\frac{\hbox {d}}{\hbox {d}t}\int p_{k,t}(x)\log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \hbox {d}x\nonumber \\&=\int \frac{\partial p_{k,t}}{\partial t}(x)\log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \hbox {d}x\nonumber \\&=\int \left[ \nabla \cdot \left( p_{k,t}(x)\mathrm {{\mathbb {E}}}_{p_{k\zeta \vert t}}[g_{\mu }(x_{k},\zeta )\vert x_{k,t}=x]\right) \right] \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \hbox {d}x\nonumber \\&\quad +\int \left[ \triangle p_{k,t}(x)\right] \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \hbox {d}x\nonumber \\&{\mathop {=}\limits ^{\left( i\right) }}\int \left[ \nabla \cdot \left( p_{k,t}(x)\mathrm {{\mathbb {E}}}_{p_{k\zeta \vert t}}[g_{\mu }(x_{k},\zeta )\vert x_{k,t}=x]\right) \right] \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \hbox {d}x\nonumber \\&\quad +\int \left[ \nabla \cdot \left( \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) -\nabla U_{\mu }(x)\right) \right] \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \hbox {d}x\nonumber \\&{\mathop {=}\limits ^{\left( ii\right) }}-\int p_{k,t}(x)\left\langle \mathrm {{\mathbb {E}}}_{p_{k\zeta \vert t}}[g_{\mu }(x_{k},\zeta )\vert x_{k,t}=x],\ \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \right\rangle \hbox {d}x\nonumber \\&\quad -\int p_{k,t}(x)\left\langle \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) -\nabla U_{\mu }(x),\ \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \right\rangle \hbox {d}x\nonumber \\&=-I\left( p_{k,t}\vert \pi _{\mu }\right) \nonumber \\&\quad +\int p_{k,t}(x)\left\langle \nabla U_{\mu }(x)-\mathrm {{\mathbb {E}}}_{p_{k\zeta \vert t}}[g_{\mu }(x_{k},\zeta )\vert x_{k,t}=x],\ {\displaystyle \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) }\right\rangle \hbox {d}x\nonumber \\&=-I\left( p_{k,t}\vert \pi _{\mu }\right) +\mathrm {{\mathbb {E}}}_{p_{kt\zeta 
}}\left\langle \nabla U_{\mu }(x_{k,t})-g_{\mu }(x_{k},\zeta ),\ {\displaystyle \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) }\right\rangle \nonumber \\&{\mathop {\le }\limits ^{\left( iii\right) }}-I\left( p_{k,t}\vert \pi _{\mu }\right) \nonumber \\&\quad +\textrm{E}_{p_{kt\zeta }}\left\| \nabla U_{\mu }(x_{k,t})-g_{\mu }(x_{k},\zeta )\right\| ^{2}+\frac{1}{4}\mathrm {{\mathbb {E}}}_{p_{k,t}}\left\| \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \right\| ^{2}\nonumber \\&=-\frac{3}{4}I\left( p_{k,t}\vert \pi _{\mu }\right) +\mathrm {{\mathbb {E}}}_{p_{kt\zeta }}\left\| \nabla U_{\mu }(x_{k,t})-g_{\mu }(x_{k},\zeta )\right\| ^{2} \end{aligned}$$
(7.26)
in which equality \(\left( i\right) \) follows from \(\triangle p_{k,t}=\nabla \cdot (\nabla p_{k,t})\), equality \(\left( ii\right) \) follows from the divergence theorem, inequality \(\left( iii\right) \) follows from \(\left\langle u,\ v\right\rangle \le \Vert u\Vert ^{2}+\frac{1}{4}\Vert v\Vert ^{2}\), and in the last step the expectation is taken with respect to \(x_{k}\), \(x_{k,t}\), and \(\zeta \). \(\square \)
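The inequality in step \(\left( iii\right) \) is the weighted Young inequality, which follows from \(0\le \Vert u-\frac{1}{2}v\Vert ^{2}\); a quick randomized sanity check:

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(1000):
    u, v = rng.standard_normal(5), rng.standard_normal(5)
    # <u, v> <= ||u||^2 + (1/4) ||v||^2, since 0 <= ||u - v/2||^2.
    assert np.dot(u, v) <= np.dot(u, u) + 0.25 * np.dot(v, v) + 1e-12
```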
We are now ready to state and prove Theorem 3.6.
Theorem 7.12
Suppose \(\pi _{\mu }\) is \(\gamma _{1}\)-log-Sobolev and \(\alpha \)-mixture weakly smooth. Let \(L=1\vee \max \left\{ L_{i}\right\} \). Then, for any \(x_{0}\sim p_{0}\) with \(H(p_{0}\vert \pi _{\mu })=C_{0}<\infty \), the iterates \(x_{k}\sim p_{k}\) of ULA with step size
$$\begin{aligned} \eta \le \min \left\{ 1,\frac{1}{4\gamma _{1}},\left( \frac{\gamma _{1}}{13N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\right\} \end{aligned}$$
(7.27)
satisfy
$$\begin{aligned} H(p_{k}\vert \pi _{\mu })\le e^{-\gamma _{1}\eta k}H(p_{0}\vert \pi _{\mu })+\frac{8\eta ^{\alpha }D_{4}}{3\gamma _{1}}, \end{aligned}$$
(7.28)
where \(D_{4}=\sum _{i}10N^{3}L^{6}+16NL^{4}+8N^{2}L^{4}d^{\frac{3}{p}}+4NL^{2}d+8N^{2}L^{2}d^{\frac{2\alpha }{p}}\). Then, for any \(\epsilon >0\), to achieve \(W_{2}(p_{k},\pi )<\epsilon \), it suffices to run LMC with step size
$$\begin{aligned} \eta \le \min \left\{ 1,\frac{1}{4\gamma _{1}},\left( \frac{\gamma _{1}}{13N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }},\left( \frac{3\epsilon \gamma _{1}}{16D_{4}}\right) ^{\frac{1}{\alpha }}\right\} \end{aligned}$$
(7.29)
for \(k\ge \frac{2}{\gamma _{1}\eta }\log \frac{3\,H\left( p_{0}\vert \pi _{\mu }\right) }{\epsilon }\) iterations.
Proof
We adapt the proof of [38]. First, recall that the interpolation of the discretized ULA process is
$$\begin{aligned} x_{k,t}=x_{k}-t\,g_{\mu }(x_{k},\zeta )+\sqrt{2t}\,z_{k}, \end{aligned}$$
where \(z_{k}\sim N(0,I)\) is independent of \(x_{k}\). Let \(x_{k}\sim p_{k}\) and \(x^{*}\sim \pi _{\mu }\) with an optimal coupling \((x_{k},x^{*})\) so that \({\mathbb {E}}[\Vert x_{k}-x^{*}\Vert ^{2}]=W_{2}(p_{k},\pi _{\mu })^{2}\). Choosing \(\mu =\sqrt{\eta }\), we have
$$\begin{aligned}&\mathrm {{\mathbb {E}}}_{p_{kt\zeta }}\left\| \nabla U_{\mu }(x_{k,t})-g_{\mu }(x_{k},\zeta )\right\| ^{2}\\&\quad {\mathop {\le }\limits ^{_{1}}}2\left[ \mathrm {{\mathbb {E}}}_{p_{kt\zeta }}\left\| \nabla U_{\mu }(x_{k,t})-\nabla U_{\mu }(x_{k})\right\| ^{2}+\left\| \nabla U_{\mu }(x_{k})-g_{\mu }(x_{k},\zeta )\right\| ^{2}\right] \\&\quad {\mathop {\le }\limits ^{_{2}}}\frac{40N^{3}}{\gamma _{1}}L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi _{\mu })+D_{3}\eta ^{\alpha }+8N^{2}L^{2}\mu ^{2\alpha }d^{\frac{2\alpha }{p}}\\&\quad {\mathop {\le }\limits ^{}}\frac{40N^{3}}{\gamma _{1}}L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi _{\mu })+D_{4}\eta ^{\alpha }, \end{aligned}$$
where step 1 follows from Young's inequality and Assumption 2, step 2 comes from equation (7.14) and Lemma 7.10, and the last step uses \(\mu =\sqrt{\eta }\), \(\eta \le \frac{1}{L}\), \(\eta \le 1\), and the definition of \(D_{4}\). Therefore, from Lemma 3.6, the time derivative of the KL divergence along LMC is bounded by:
$$\begin{aligned} \frac{\hbox {d}}{\hbox {d}t}H\left( p_{k,t}\vert \pi _{\mu }\right)&\le -\frac{3}{4}I(p_{k,t}\vert \pi _{\mu })+\frac{40N^{3}}{\gamma _{1}}L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi _{\mu })+D_{4}\eta ^{\alpha }\nonumber \\&\le -{\frac{3\gamma _{1}}{2}}H(p_{k,t}\vert \pi _{\mu })+\frac{40N^{3}}{\gamma _{1}}L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi _{\mu })+D_{4}\eta ^{\alpha }, \end{aligned}$$
(7.30)
where in the last inequality we have used the log-Sobolev inequality for \(\pi _{\mu }\). Multiplying both sides by \(e^{\frac{3\gamma _{1}}{2}t}\) and integrating from \(t=0\) to \(t=\eta \), we obtain
$$\begin{aligned} e^{\frac{3\gamma _{1}}{2}\eta }H(p_{k+1}\vert \pi _{\mu })-H(p_{k}\vert \pi _{\mu })&\le 2\left( \frac{e^{\frac{3\gamma _{1}}{2}\eta }-1}{3\gamma _{1}}\right) \left( \frac{40N^{3}}{\gamma _{1}}L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi _{\mu })+D_{4}\eta ^{\alpha }\right) \nonumber \\&\le 2\eta \left( \frac{40N^{3}}{\gamma _{1}}L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi _{\mu })+D_{4}\eta ^{\alpha }\right) , \end{aligned}$$
(7.31)
where the last line holds since \(e^{c}\le 1+2c\) for \(0<c=\frac{3\gamma _{1}}{2}\eta <1\). Rearranging the terms of the above inequality and using the facts that \(1+\eta ^{1+2\alpha }\frac{80N^{3}}{\gamma _{1}}L^{6}\le 1+\frac{\gamma _{1}\eta }{2}\le e^{\frac{\gamma _{1}\eta }{2}}\) when \(\eta \le \left( \frac{\gamma _{1}}{13N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\) and \(e^{-\frac{3\gamma _{1}}{2}\eta }\le 1\) leads to
$$\begin{aligned} H(p_{k+1}\vert \pi _{\mu })&\le e^{-\frac{3\gamma _{1}}{2}\eta }\left( 1+\eta ^{1+2\alpha }\frac{80N^{3}}{\gamma _{1}}L^{6}\right) H(p_{k}\vert \pi _{\mu })+2\eta ^{\alpha +1}D_{4}\nonumber \\&\le e^{-\gamma _{1}\eta }H(p_{k}\vert \pi _{\mu })+2\eta ^{\alpha +1}D_{4}. \end{aligned}$$
(7.32)
Applying this inequality recursively and using the inequality \(1-e^{-c}\ge \frac{3}{4}c\) for \(0<c=\gamma _{1}\eta \le \frac{1}{4}\), we obtain
$$\begin{aligned} H(p_{k}\vert \pi _{\mu })&\le \,e^{-\gamma _{1}\eta k}H(p_{0}\vert \pi _{\mu })+\frac{2\eta ^{\alpha +1}D_{4}}{1-e^{-\gamma _{1}\eta }}\nonumber \\&\le \,e^{-\gamma _{1}\eta k}H(p_{0}\vert \pi _{\mu })+\frac{2\eta ^{\alpha +1}D_{4}}{\frac{3}{4}\gamma _{1}\eta }\nonumber \\&\le \,e^{-\gamma _{1}\eta k}H(p_{0}\vert \pi _{\mu })+\frac{8\eta ^{\alpha }D_{4}}{3\gamma _{1}}. \end{aligned}$$
(7.33)
Note that the last inequality holds if we choose \(\eta \) such that it satisfies
$$\begin{aligned} \eta \le \min \left\{ 1,\frac{1}{4\gamma _{1}},\left( \frac{\gamma _{1}}{13N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\right\} . \end{aligned}$$
From Lemma 3.5, choosing \(\mu =\sqrt{\eta }\) small enough ensures \(W_{2}(\pi ,\ \pi _{\mu })\le 3\sqrt{NLE_{2}}\eta ^{\frac{\alpha }{2}}d^{\frac{1}{p}}\). Since \(\pi _{\mu }\) satisfies a log-Sobolev inequality, by the triangle inequality we also get
$$\begin{aligned} W_{2}(p_{k},\ \pi )&\le W_{2}(p_{k},\ \pi _{\mu })+W_{2}(\pi ,\ \pi _{\mu })\\&\le \sqrt{\frac{2}{\gamma _{1}}H(p_{k}\vert \pi _{\mu })}+W_{2}(\pi ,\ \pi _{\mu })\\&\le \frac{1}{\sqrt{\gamma _{1}}}e^{-\frac{\gamma _{1}}{2}\eta k}\sqrt{H(p_{0}\vert \pi _{\mu })}+\frac{2}{\gamma _{1}}\eta ^{\frac{\alpha }{2}}\sqrt{D_{4}}+3\sqrt{NLE_{2}}\eta ^{\frac{\alpha }{2}}d^{\frac{1}{p}}. \end{aligned}$$
Given \(\epsilon >0\), if we further assume \(\eta \le \left( \frac{\epsilon \gamma _{1}}{6\sqrt{D_{4}}}\right) ^{\frac{2}{\alpha }}\wedge \left( \frac{\epsilon }{9\sqrt{NLE_{2}}d^{\frac{1}{p}}}\right) ^{\frac{2}{\alpha }}\), then the above inequality implies \(W_{2}(p_{k},\ \pi )\le \frac{1}{\sqrt{\gamma _{1}}}e^{-\frac{\gamma _{1}}{2}\eta k}\sqrt{H(p_{0}\vert \pi _{\mu })}+\frac{2\epsilon }{3}.\) This means for \(k\ge \frac{2}{\gamma _{1}\eta }\log \frac{3\sqrt{H\left( p_{0}\vert \pi _{\mu }\right) /\gamma _{1}}}{\epsilon },\) we have \(W_{2}(p_{k},\ \pi )\le \frac{\epsilon }{3}+\frac{2\epsilon }{3}=\epsilon \), as desired. \(\square \)
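The scheme analyzed here, one-sample Gaussian smoothing with \(\mu =\sqrt{\eta }\) inside each ULA step, can be sketched as follows. The Laplace-like target \(U(x)=\Vert x\Vert _{1}\), whose gradient is the sign vector, is an illustrative assumption, not a choice from the paper:

```python
import numpy as np

def smoothed_ula(grad_U, x0, eta, n_iters, rng):
    # ULA on the smoothed potential U_mu with mu = sqrt(eta); each step uses
    # the one-sample estimator g_mu(x, zeta) = grad U(x + mu * zeta).
    mu = np.sqrt(eta)
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        zeta = rng.standard_normal(x.shape)      # smoothing noise
        g = grad_U(x + mu * zeta)                # unbiased estimate of grad U_mu(x)
        x = x - eta * g + np.sqrt(2.0 * eta) * rng.standard_normal(x.shape)
    return x

# Non-smooth target U(x) = ||x||_1; grad U(x) = sign(x) away from the kinks.
rng = np.random.default_rng(3)
x = smoothed_ula(lambda x: np.sign(x), np.full(2, 5.0), eta=0.01,
                 n_iters=5000, rng=rng)
```

Although \(\nabla U\) is discontinuous at the kinks, the smoothing noise makes the gradient query point almost surely a point of differentiability, which is exactly the purpose of sampling via the smoothed potential.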
1.4.3 Proof of Lemma 3.5
Lemma 7.13
Assume that \(\pi \propto \exp (-U)\) and \(\pi _{\mu }\propto \exp (-U_{\mu })\), and that \(\pi \) has a bounded second moment, that is, \(\int \left\| x\right\| ^{2}\pi (x)\textrm{d}x=E_{2}<\infty \). We deduce the following bound
$$\begin{aligned} W_{2}^{2}(\pi ,\ \pi _{\mu })\le 8.24NL\mu ^{1+\alpha }d^{\frac{2}{p}}E_{2}, \end{aligned}$$
for any \(\mu \le 0.05\).
Proof
This proof adapts the technique of the proof of Proposition 1 in [10]. Without loss of generality, we may assume that \({\displaystyle \int _{{\mathbb {R}}^{d}}\exp (-U(x))\hbox {d}x=1}\). We first give upper and lower bounds for the normalizing constant of \(\pi _{\mu }\), that is
$$\begin{aligned} c_{\mu }&{\mathop {=}\limits ^{_{\triangle }}}\int _{{\mathbb {R}}^{d}}\pi (x)e^{-\left( U_{\mu }(x)-U(x)\right) }\hbox {d}x\\&={\mathbb {E}}_{\pi }\left( e^{-\left( U_{\mu }(x)-U(x)\right) }\right) . \end{aligned}$$
The constant \(c_{\mu }\) is the expectation of \(e^{-\left( U_{\mu }(x)-U(x)\right) }\) with respect to the density \(\pi \), so it can be trivially upper bounded by \(e^{M}\) and lower bounded by \(e^{-M}\), where \(\vert U_{\mu }(x)-U(x)\vert \le \sum _{i}L_{i}\mu ^{1+\alpha _{i}}d^{\frac{2}{p}}{\mathop {=}\limits ^{\triangle }}M\). Now we control the distance between the densities \(\pi \) and \(\pi _{\mu }\) at any fixed \(x\in {\mathbb {R}}^{d}\):
$$\begin{aligned} \Vert \pi (x)-\pi _{\mu }(x)\Vert&=\pi (x)\left\| 1-\frac{e^{-\left( U_{\mu }(x)-U(x)\right) }}{c_{\mu }}\right\| \\ ~&\le \pi (x)\left\{ \left( 1-\frac{e^{-\left( U_{\mu }(x)-U(x)\right) }}{e^{M}}\right) +e^{-\left( U_{\mu }(x)-U(x)\right) }\left( \frac{1}{c_{\mu }}-\frac{1}{e^{M}}\right) \right\} \\&\le \pi (x)\left( 1-e^{-2M}+e^{2M}-1\right) \\&\le \pi (x)\left( 2M+e^{2M}-1\right) . \end{aligned}$$
The first inequality follows from the triangle inequality for absolute values, the second is elementary, and the last follows from \(1-e^{-x}\le x\) for any \(x\ge 0\). To bound \(W_{2}\), we use an inequality from [39] (Theorem 6.15, page 115):
$$\begin{aligned} W_{2}^{2}(\pi ,\ \pi _{\mu })\le 2\int _{{\mathbb {R}}^{d}}\Vert x\Vert _{2}^{2}\Vert \pi (x)-\pi _{\mu }(x)\Vert \hbox {d}x. \end{aligned}$$
Combining this with the bound on \(\Vert \pi (x)-\pi _{\mu }(x)\Vert \) shown above, we have
$$\begin{aligned} W_{2}^{2}(\pi ,\ \pi _{\mu })&\le 2\int _{{\mathbb {R}}^{d}}\Vert x\Vert _{2}^{2}\pi (x)\left( 2M+e^{2M}-1\right) \hbox {d}x\\&\le 2\left( 2M+e^{2M}-1\right) E_{\pi }\left[ \Vert x\Vert ^{2}\right] \\&\le 2\left( 2M+e^{2M}-1\right) E_{2}\\&\le 8.24M\,E_{2}\\&=8.24\sum _{i}L_{i}\mu ^{1+\alpha _{i}}d^{\frac{2}{p}}E_{2}\\&\le 8.24NL\mu ^{1+\alpha }d^{\frac{2}{p}}E_{2}, \end{aligned}$$
where \(M\le 0.05\) ensures that \(e^{2M}-1\le 2.12M\). This gives the desired result. \(\square \)
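The numerical constant can be verified directly: \(2\left( 2M+e^{2M}-1\right) \le 8.24M\) over the whole range \(M\in (0,0.05]\), since \(e^{2M}-1\le 2.12M\) there. A short check:

```python
import math

# Verify 2 * (2M + exp(2M) - 1) <= 8.24 * M for all M in (0, 0.05].
for i in range(1, 1001):
    M = 0.05 * i / 1000
    # math.expm1(t) computes exp(t) - 1 accurately for small t.
    assert 2 * (2 * M + math.expm1(2 * M)) <= 8.24 * M
```

At the endpoint \(M=0.05\) the two sides are \(0.4103\ldots \) and \(0.412\), so the constant is nearly tight.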
1.5 Appendix E: Convexification of Non-convex Domain
1.5.1 Proof of Lemma 4.1
Lemma 7.14
For function V defined as
$$\begin{aligned} V(\ x)=\inf _{\begin{array}{c} \{\ x_{i}\}\subset \Omega ,\\ \left\{ \lambda _{i}\big \vert \sum _{i}\lambda _{i}=1\right\} \\ \text {s.t.},\sum _{i}\lambda _{i}\ x_{i}=\ x \end{array} }\left\{ \sum _{i=1}^{l}\lambda _{i}U(\ x_{i})\right\} , \end{aligned}$$
(7.34)
\(\forall \ x\in {\mathbb {B}}(0,R)\), \(\inf _{\left\| x\right\| =R}U(x)\le V(\ x)\le \sup _{\left\| x\right\| =R}U(x)\).
Proof
The proof is taken from [31, 43]. We provide it here for completeness. First, by the definition of V inside \({\mathbb {B}}(0,R)\), we show that for any convex combination of the form \(\sum _{i}\lambda _{i}U(\ x_{i})\), where \(\sum _{i}\lambda _{i}=1\), we can find another representation \(\sum _{j}\lambda _{j}U(\ x_{j})\), where \(\sum _{j}\lambda _{j}=1\) and \(\left\| x_{j}\right\| =R\), such that \(\sum _{j}\lambda _{j}U(\ x_{j})\le \sum _{i}\lambda _{i}U(\ x_{i})\). This follows from the argument below.
For any \(\ x_{j}\in \{\ x_{i}\}\) such that \(\left\| x_{j}\right\| >R\), there exists a new convex combination over \(\{\ x_{i}\}\bigcup \{{\bar{x}}_{j}\}\setminus \{\ x_{j}\}\) with \(\left\| {\bar{x}}_{j}\right\| =R\), such that \(\sum _{i}\lambda _{i}U(\ x_{i})\ge {\tilde{\lambda }}_{j}U({\bar{x}}_{j})+\sum _{i\ne j}{\tilde{\lambda }}_{i}U(\ x_{i})\). In this case, we choose \({\bar{x}}_{j}\) with \(\left\| {\bar{x}}_{j}\right\| =R\) such that:
$$\begin{aligned} {\bar{x}}_{j}&=\dfrac{1-{\bar{\lambda }}_{j}}{1-\lambda _{j}}\ x+\dfrac{{\bar{\lambda }}_{j}-\lambda _{j}}{1-\lambda _{j}}\ x_{j},\,\lambda _{j}<{\bar{\lambda }}_{j}<1,\nonumber \\&={\bar{\lambda }}_{j}\ x_{j}+\left( \dfrac{1-{\bar{\lambda }}_{j}}{1-\lambda _{j}}\right) \left( \sum _{i\ne j}\lambda _{i}\ x_{i}\right) . \end{aligned}$$
(7.35)
Since U is convex on \(\Omega \),
$$\begin{aligned} U({\bar{x}}_{j})\le {\bar{\lambda }}_{j}U(\ x_{j})+\left( \dfrac{1-{\bar{\lambda }}_{j}}{1-\lambda _{j}}\right) \left( \sum _{i\ne j}\lambda _{i}U(\ x_{i})\right) . \end{aligned}$$
(7.36)
On the other hand, x can be represented as a convex combination of \(\{\ x_{i}\}\bigcup \{{\bar{x}}_{j}\}\setminus \{\ x_{j}\}\):
$$\begin{aligned} x=\dfrac{\lambda _{j}}{{\bar{\lambda }}_{j}}{\bar{x}}_{j}+\left( 1-\dfrac{\lambda _{j}}{{\bar{\lambda }}_{j}}\dfrac{1-{\bar{\lambda }}_{j}}{1-\lambda _{j}}\right) \left( \sum _{i\ne j}\lambda _{i}\ x_{i}\right) ={\tilde{\lambda }}_{j}{\bar{x}}_{j}+\sum _{i\ne j}{\tilde{\lambda }}_{i}\ x_{i}, \end{aligned}$$
(7.37)
and that
$$\begin{aligned} \sum _{i}\lambda _{i}U(\ x_{i})&\ge \dfrac{\lambda _{j}}{{\bar{\lambda }}_{j}}U({\bar{x}}_{j})+\left( 1-\dfrac{\lambda _{j}}{{\bar{\lambda }}_{j}}\dfrac{1-{\bar{\lambda }}_{j}}{1-\lambda _{j}}\right) \left( \sum _{i\ne j}\lambda _{i}U(\ x_{i})\right) \nonumber \\&={\tilde{\lambda }}_{j}U({\bar{x}}_{j})+\sum _{i\ne j}{\tilde{\lambda }}_{i}U(\ x_{i}). \end{aligned}$$
(7.38)
As a result, \(V(\ x)\) can be represented as
$$\begin{aligned} V(\ x)=\inf _{\begin{array}{c} \{\ x_{j}\}\subset \Omega ,\\ \left\{ \lambda _{j}\big \vert \sum _{j}\lambda _{j}=1\right\} \\ \text {s.t.},\sum _{j}\lambda _{j}\ x_{j}=\ x,\,\left\| x_{i}\right\| =R \end{array} }\left\{ \sum _{j}\lambda _{j}U(\ x_{j})\right\} . \end{aligned}$$
(7.39)
By the representation of V inside \({\mathbb {B}}(0,R)\), we obtain \(\inf _{\left\| {\bar{x}}\right\| =R}U({\bar{x}})\le V(\ x)\le \sup _{\left\| {\bar{x}}\right\| =R}U({\bar{x}})\). \(\square \)
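In one dimension the boundary representation (7.39) is easy to visualize: the only boundary points of \({\mathbb {B}}(0,R)\) are \(\pm R\), so inside the ball V is the chord of U between them, and the sandwich bound is immediate. A sketch under that 1-D assumption:

```python
import numpy as np

def convex_extension_1d(U, R, x):
    # Representation (7.39) in one dimension: the only boundary points of
    # B(0, R) are -R and R, so the weight lam is pinned down by
    # lam * (-R) + (1 - lam) * R = x, and the infimum is the chord value.
    if abs(x) >= R:
        return U(x)
    lam = (R - x) / (2.0 * R)
    return lam * U(-R) + (1.0 - lam) * U(R)

U = lambda x: x ** 2        # convex everywhere, hence convex outside the ball
vals = [convex_extension_1d(U, 1.0, x) for x in np.linspace(-1.0, 1.0, 9)]
# Inside the ball, V is the chord of x^2 between (-1, 1) and (1, 1), i.e. the
# constant 1, which lies between inf_{|x|=R} U = 1 and sup_{|x|=R} U = 1.
```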
1.5.2 Proof of Lemma 4.2
Lemma 7.15
For U satisfying \(\alpha \)-mixture weakly smooth and \(\left( \mu ,\theta \right) \)-degenerated convex outside the ball radius R, there exists \({\hat{U}}\in C^{1}({\mathbb {R}}^{d})\) with a Hessian that exists everywhere on \({\mathbb {R}}^{d}\), and \({\hat{U}}\) is \(\left( \left( 1-\theta \right) \frac{\mu }{2},\theta \right) \)-degenerated convex on \({\mathbb {R}}^{d}\) (that is \(\nabla ^{2}{\hat{U}}(x)\succeq \left( 1-\theta \right) \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}\)), such that
$$\begin{aligned} \sup \left( {\hat{U}}(\ x)-U(\ x)\right)&-\inf \left( {\hat{U}}(\ x)-U(\ x)\right) \le \sum _{i}L_{i}R^{1+\alpha _{i}}+\frac{4\mu }{\left( 2-\theta \right) }\ R^{2-\theta }. \end{aligned}$$
(7.40)
Proof
Following [31]'s approach closely, let \(g(\ x)=\frac{\mu }{2\left( 2-\theta \right) }\ \left( 1+\left\| x\right\| ^{2}\right) ^{1-\frac{\theta }{2}}\) for \(0\le \theta <1\). The gradient of \(g\left( x\right) \) is \(\nabla g(\ x)=\frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}x\) and the Hessian of \(g\left( x\right) \) is
$$\begin{aligned} \nabla ^{2}g(\ x)&=\frac{\mu }{2}\left[ \left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}-\theta \left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}-1}xx^{T}\right] \nonumber \\&\preceq \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}. \end{aligned}$$
(7.41)
On the other hand, we also have
$$\begin{aligned} \nabla ^{2}g(\ x)&=\frac{\mu }{2}\left[ \left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}-\theta \left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}-1}xx^{T}\right] \nonumber \\&=\frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}-1}\left[ I_{d}+I_{d}\left\| x\right\| ^{2}-\theta \left\| x\right\| ^{2}\frac{xx^{T}}{\left\| x\right\| ^{2}}\right] \nonumber \\&=\frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}-1}\left[ I_{d}+I_{d}\left( 1-\theta \right) \left\| x\right\| ^{2}+\theta \left\| x\right\| ^{2}\left( I_{d}-\frac{xx^{T}}{\left\| x\right\| ^{2}}\right) \right] \nonumber \\&\succeq \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}-1}\left( \left( 1-\theta \right) \left\| x\right\| ^{2}+1\right) I_{d}\nonumber \\&\succeq \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}-1}\left( \left( 1-\theta \right) \left( \left\| x\right\| ^{2}+1\right) \right) I_{d}\nonumber \\&\succeq \left( 1-\theta \right) \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}. \end{aligned}$$
(7.42)
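The two-sided bound \(\left( 1-\theta \right) \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}\preceq \nabla ^{2}g(x)\preceq \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}\) from (7.41)–(7.42) can be checked numerically on random points, assuming illustrative values of \(\mu \) and \(\theta \):

```python
import numpy as np

def hess_g(x, mu, theta):
    # Hessian of g(x) = mu / (2 * (2 - theta)) * (1 + ||x||^2)^(1 - theta/2).
    s = 1.0 + np.dot(x, x)
    d = len(x)
    return (mu / 2.0) * (s ** (-theta / 2.0) * np.eye(d)
                         - theta * s ** (-theta / 2.0 - 1.0) * np.outer(x, x))

rng = np.random.default_rng(4)
mu, theta = 1.0, 0.5
for _ in range(100):
    x = 5.0 * rng.standard_normal(3)
    s = 1.0 + np.dot(x, x)
    eig = np.linalg.eigvalsh(hess_g(x, mu, theta))
    lower = (1.0 - theta) * (mu / 2.0) * s ** (-theta / 2.0)   # bound (7.42)
    upper = (mu / 2.0) * s ** (-theta / 2.0)                    # bound (7.41)
    assert lower - 1e-12 <= eig.min() and eig.max() <= upper + 1e-12
```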
We adapt [43] by denoting \({\tilde{U}}\left( x\right) =U\left( x\right) -g\left( x\right) .\) Since \(U\left( x\right) \) is \(\left( \mu ,\theta \right) \)-degenerated convex outside the ball, we deduce for every \(\left\| x\right\| \ge R,\)
$$\begin{aligned} \nabla ^{2}{\tilde{U}}\left( x\right)&=\nabla ^{2}U\left( x\right) -\nabla ^{2}g\left( x\right) \nonumber \\&\succeq \mu \left( 1+\left\| x\right\| {}^{2}\right) ^{-\frac{\theta }{2}}I_{d}-\frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}\nonumber \\&\succeq \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}, \end{aligned}$$
(7.43)
which implies that \({\tilde{U}}\left( x\right) \) is \(\left( \frac{\mu }{2},\theta \right) \)-degenerated convex outside the ball. Now, we construct \({\hat{U}}(\ x)\) so that it is twice differentiable, degenerated convex on all of \({\mathbb {R}}^{d}\), and differs from \(U(\ x)\) by less than \(\sum _{i}L_{i}R^{1+\alpha _{i}}+\frac{4\mu }{\left( 2-\theta \right) }\ R^{2-\theta }\). Based on the same construction as [31], we first define the function V as the convex extension [43] of \({\tilde{U}}\) from the domain \(\Omega ={\mathbb {R}}^{d}\setminus {\mathbb {B}}\left( 0,R\right) \) to its convex hull \(\Omega ^{co}\): \(V\left( x\right) =\inf \left\{ \sum _{i}\lambda _{i}{\tilde{U}}(\ x_{i})\right\} \) for every \(x\in {\mathbb {R}}^{d}.\) Since \({\tilde{U}}(\ x)\) is convex in \(\Omega \), \(V(\ x)={\tilde{U}}(\ x)\) for \(\ x\in \Omega \). By Lemma 4.1, \(V(\ x)\) is convex on the entire domain \({\mathbb {R}}^{d}\) and can be represented as
$$\begin{aligned} V(\ x)=\inf _{\begin{array}{c} \{\ x_{j}\}\subset \Omega ,\\ \left\{ \lambda _{j}\big \vert \sum _{j}\lambda _{j}=1\right\} \\ \text {s.t.},\sum _{j}\lambda _{j}\ x_{j}=\ x,\textrm{and}\left\| x_{i}\right\| =R \end{array} }\left\{ \sum _{j}\lambda _{j}{\tilde{U}}(\ x_{j})\right\} . \end{aligned}$$
(7.44)
Therefore, \(\forall \ x\in {\mathbb {B}}(0,R)\), \(\inf _{\left\| {\bar{x}}\right\| =R}{\tilde{U}}({\bar{x}})\le V(\ x)\le \sup _{\left\| {\bar{x}}\right\| =R}{\tilde{U}}({\bar{x}})\). Next we construct \({\tilde{V}}(\ x)\) as a smoothing of V on \({\mathbb {B}}\left( 0,R+\epsilon \right) \). Consider the function \(\varphi {\displaystyle (x)}\) of the variable \(x\in {\mathbb {R}}^{d}\) defined by
$$\begin{aligned} {\displaystyle \varphi (x)={\left\{ \begin{array}{ll} Ce^{-1/(1-\left\| x\right\| ^{2})}, &{} \text { if }\left\| x\right\| <1,\\ 0, &{} \text { if }\left\| x\right\| \ge 1, \end{array}\right. }} \end{aligned}$$
(7.45)
where the numerical constant C ensures normalization, \(\int \varphi (x)\,\hbox {d}x=1\). Let \({\displaystyle \varphi _{\delta }(x)=\delta ^{-d}\varphi (\delta ^{-1}x)}\), a smooth function supported on the ball \({\mathbb {B}}(0,\delta )\). Define
$$\begin{aligned} {\tilde{V}}(\ x)&=\int V(\ y)\varphi _{\delta }(\ x-y)\hbox {d}y\nonumber \\&=\int V(\ x-y)\varphi _{\delta }(y)\hbox {d}y\nonumber \\&=E_{y}\left[ V(x-y)\right] . \end{aligned}$$
(7.46)
The third equality implies that for any x and \(z\in {\mathbb {R}}^{d}\),
$$\begin{aligned} \left\langle \nabla {\tilde{V}}(\ x)-\nabla {\tilde{V}}(\ z),x-z\right\rangle&=\left\langle \nabla E_{y}\left[ V(x-y)\right] -\nabla E_{y}\left[ V(z-y)\right] ,x-z\right\rangle \nonumber \\&{\mathop {=}\limits ^{_{1}}}\left\langle E_{y}\left[ \nabla V(x-y)\right] -E_{y}\left[ \nabla V(z-y)\right] ,x-z\right\rangle \nonumber \\&=\left\langle E_{y}\left[ \nabla V(x-y)-\nabla V(z-y)\right] ,x-z\right\rangle \nonumber \\&=E_{y}\left\langle \nabla V(x-y)-\nabla V(z-y),x-z\right\rangle \nonumber \\&\ge 0, \end{aligned}$$
(7.47)
where step 1 follows from exchangeability of gradient and integral, and the last line follows from the convexity of V; this shows that \({\tilde{V}}\) is a smooth convex function on \({\mathbb {R}}^{d}.\) Also, note that the definition of \({\tilde{V}}\) implies that for all \(\left\| x\right\| <R+\epsilon \),
$$\begin{aligned} \inf _{\left\| {\bar{x}}\right\|<R+\epsilon +\delta }V({\bar{x}})\le {\tilde{V}}(\ x)\le \sup _{\left\| {\bar{x}}\right\| <R+\epsilon +\delta }V({\bar{x}}). \end{aligned}$$
(7.48)
And by Lemma 4.1, for all \(\left\| x\right\| <R+\epsilon \),
$$\begin{aligned} \inf _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+\epsilon +\delta \right) \setminus {\mathbb {B}}(0,R)}{\tilde{U}}({\bar{x}})\le {\tilde{V}}(\ x)\le \sup _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+\epsilon +\delta \right) \setminus {\mathbb {B}}(0,R)}{\tilde{U}}({\bar{x}}). \end{aligned}$$
(7.49)
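Mollification as in (7.46) can be reproduced numerically: convolving a convex but nonsmooth function with the bump \(\varphi _{\delta }\) yields a smooth function whose convexity is preserved, visible through nonnegative discrete second differences. A 1-D sketch with the illustrative choice \(V(x)=\vert x\vert \):

```python
import numpy as np

def mollify(V, xs, delta):
    # Discrete convolution of V with the normalized bump phi_delta.
    t = np.linspace(-delta, delta, 201)[1:-1]        # interior of supp(phi_delta)
    w = np.exp(-1.0 / (1.0 - (t / delta) ** 2))      # bump profile exp(-1/(1-u^2))
    w /= w.sum()                                     # discrete normalization
    return np.array([np.sum(w * V(x - t)) for x in xs])

V = lambda x: np.abs(x)                  # convex but not differentiable at 0
xs = np.linspace(-1.0, 1.0, 401)
Vt = mollify(V, xs, delta=0.1)
# Convexity is preserved: discrete second differences stay nonnegative.
second_diff = Vt[2:] - 2.0 * Vt[1:-1] + Vt[:-2]
```

Each mollified value is a convex combination of shifted copies of V, so on a uniform grid the second differences inherit nonnegativity exactly as in the continuous argument (7.47).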
Finally, we construct the auxiliary function:
$$\begin{aligned} {\hat{U}}(\ x)-g\left( x\right) =\left\{ \begin{array}{l} {\tilde{U}}(\ x),\ \left\| x\right\| \ge R+2\epsilon ,\\ \alpha (\ x){\tilde{U}}(\ x)+(1-\alpha (\ x)){\tilde{V}}(\ x),\ R+\epsilon<\left\| x\right\| <R+2\epsilon ,\\ {\tilde{V}}(\ x),\ \left\| x\right\| \le R+\epsilon , \end{array}\right. \end{aligned}$$
(7.50)
where \(\alpha (\ x)=\dfrac{1}{2}-\dfrac{1}{2}\cos \left( \pi \dfrac{\left\| x\right\| ^{2}-\left( R+\epsilon \right) ^{2}}{\epsilon \left( 2R+3\epsilon \right) }\right) \) increases from 0 at \(\left\| x\right\| =R+\epsilon \) to 1 at \(\left\| x\right\| =R+2\epsilon \). Here we know that \({\tilde{U}}(\ x)\) is convex and smooth in \({\mathbb {R}}^{d}\setminus {\mathbb {B}}\left( 0,R\right) \); \({\tilde{V}}(\ x)\) is also convex and smooth in \({\mathbb {R}}^{d}\setminus {\mathbb {B}}\left( 0,R+\epsilon \right) \). Hence, for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \),
$$\begin{aligned} \nabla ^{2}\left( {\hat{U}}(\ x)-g\left( x\right) \right)&=\nabla ^{2}{\tilde{U}}(\ x)+\nabla ^{2}\left( (1-\alpha (\ x))({\tilde{V}}(\ x)-{\tilde{U}}(\ x))\right) \nonumber \\&=\alpha (\ x)\nabla ^{2}{\tilde{U}}(\ x)+(1-\alpha (\ x))\nabla ^{2}{\tilde{V}}(\ x)\nonumber \\&-\nabla ^{2}\alpha (\ x)\left( {\tilde{V}}(\ x)-{\tilde{U}}(\ x)\right) -2\nabla \alpha (\ x)\left( \nabla {\tilde{V}}(\ x)-\nabla {\tilde{U}}(\ x)\right) ^{T}\nonumber \\&\succeq -\nabla ^{2}\alpha (\ x)\left( {\tilde{V}}(\ x)-{\tilde{U}}(\ x)\right) -2\nabla \alpha (\ x)\left( \nabla {\tilde{V}}(\ x)-\nabla {\tilde{U}}(\ x)\right) ^{T}. \end{aligned}$$
(7.51)
Note that for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \), we have
$$\begin{aligned}&\left\| \nabla g(\ x)-\nabla g(\ x-y)\right\| \nonumber \\&\quad =\left\| \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}x-\frac{\mu }{2}\left( 1+\left\| x-y\right\| ^{2}\right) ^{-\frac{\theta }{2}}\left( x-y\right) \right\| \end{aligned}$$
(7.52)
$$\begin{aligned}&\quad \le \left\| \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}x-\frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}\left( x-y\right) \right\| \nonumber \\&\qquad +\left\| \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}\left( x-y\right) -\frac{\mu }{2}\left( 1+\left\| x-y\right\| ^{2}\right) ^{-\frac{\theta }{2}}\left( x-y\right) \right\| \nonumber \\&\quad \le \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}\left\| y\right\| +\frac{\mu }{2}\Vert \left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}-\left( 1+\left\| x-y\right\| ^{2}\right) ^{-\frac{\theta }{2}}\Vert \left\| x-y\right\| \nonumber \\&\quad \le \frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta +\frac{\mu }{2}\frac{\Vert \left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}-\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}\Vert }{\left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}}\left\| \left( x-y\right) \right\| \nonumber \\&\quad {\mathop {\le }\limits ^{_{1}}}\frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta +\frac{\mu }{2}\frac{\Vert \left( 1+\left\| x\right\| ^{2}\right) -\left( 1+\left\| x-y\right\| ^{2}\right) \Vert }{\left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}}\left\| \left( x-y\right) \right\| \nonumber \\&\quad \le \frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta +\frac{\mu }{2}\frac{\Vert \left( \left\| x\right\| -\left\| x-y\right\| \right) \left( \left\| x\right\| +\left\| x-y\right\| \right) \Vert }{\left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}}\left\| \left( x-y\right) \right\| \nonumber \\&\quad 
{\mathop {\le }\limits ^{_{2}}}\frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta +\frac{\mu }{2}\frac{\left\| y\right\| \left( \left\| x\right\| +\left\| x-y\right\| \right) }{\left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}}\left\| \left( x-y\right) \right\| \nonumber \\&\quad \le \frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta +\frac{\mu }{2}\frac{2\left( R+2\epsilon +\delta \right) ^{2}\delta }{\left( 1+\left( R+\epsilon \right) ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left( R+\epsilon -\delta \right) ^{2}\right) ^{\frac{\theta }{2}}}, \end{aligned}$$
(7.53)
where inequality 1 follows from Lemma 7.24, while inequality 2 is due to the triangle inequality. As a result, we get
$$\begin{aligned} \left\| \nabla {\tilde{V}}(\ x)-\nabla {\tilde{U}}(\ x)\right\|&\le \int \left\| \nabla {\tilde{U}}(\ x-\ y)-\nabla {\tilde{U}}(\ x)\right\| \varphi _{\delta }(\ y)\hbox {d}y\nonumber \\&\le \sum _{i}L_{i}\delta ^{\alpha _{i}}+\left\| \nabla g(\ x)-\nabla g(\ x-y)\right\| \nonumber \\&\le NL\delta ^{\alpha }+\frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta \end{aligned}$$
(7.54)
$$\begin{aligned}&\quad +\frac{\mu }{2}\frac{2\left( R+2\epsilon +\delta \right) ^{2}\delta }{\left( 1+\left( R+\epsilon \right) ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left( R+\epsilon -\delta \right) ^{2}\right) ^{\frac{\theta }{2}}}. \end{aligned}$$
(7.55)
On the other hand, we also acquire
$$\begin{aligned}&\vert {\tilde{U}}(\textrm{x})-{\tilde{U}}(x-\textrm{y})\vert \nonumber \\&\quad \le \mathrm {\max }\left\{ \left\langle \nabla U(\mathrm {x-y}),\textrm{y}\right\rangle ,\left\langle \nabla U(\textrm{x}),\mathrm {-y}\right\rangle \right\} \nonumber \\&\qquad +\sum _{i}\frac{L}{1+\alpha _{i}}\Vert \textrm{y}\Vert ^{\alpha _{i}+1}+\Vert g\left( x\right) -g\left( x-y\right) \Vert \nonumber \\&\quad {\le \max }\left\{ \left\langle \nabla U(\mathrm {x-y}),\textrm{y}\right\rangle ,\left\langle \nabla U(\textrm{x}),\mathrm {-y}\right\rangle \right\} +\sum _{i}\frac{L}{1+\alpha _{i}}\Vert \textrm{y}\Vert ^{\alpha _{i}+1}\nonumber \\&\qquad +\Vert \frac{\mu }{2\left( 2-\theta \right) }\ \left( 1+\left\| x\right\| ^{2}\right) ^{1-\frac{\theta }{2}}-\frac{\mu }{2\left( 2-\theta \right) }\ \left( 1+\left\| x-y\right\| ^{2}\right) ^{1-\frac{\theta }{2}}\Vert \nonumber \\&\quad {\mathop {\le }\limits ^{_{1}}}\mathrm {\max }\left\{ \sum _{i}L_{i}\left\| \mathrm {x-y}\right\| ^{\alpha _{i}}\left\| y\right\| ,\sum _{i}L_{i}\left\| \textrm{x}\right\| ^{\alpha _{i}}\left\| y\right\| \right\} \nonumber \\&\qquad +\sum _{i}\frac{L}{1+\alpha _{i}}\Vert \textrm{y}\Vert ^{\alpha _{i}+1}+\frac{\mu }{2\left( 2-\theta \right) }\Vert \left( 1+\left\| x\right\| ^{2}\right) -\left( 1+\left\| x-y\right\| ^{2}\right) \Vert \end{aligned}$$
(7.56)
$$\begin{aligned}&\quad \le L\left\| y\right\| \mathrm {\max }\left\{ \sum _{i}L_{i}\left\| \mathrm {x-y}\right\| ^{\alpha _{i}},\sum _{i}L_{i}\left\| \textrm{x}\right\| ^{\alpha _{i}}\right\} \nonumber \\&\qquad +\sum _{i}\frac{L}{1+\alpha _{i}}\Vert \textrm{y}\Vert ^{\alpha _{i}+1}+\frac{\mu }{2\left( 2-\theta \right) }\Vert \left( \left\| x\right\| -\left\| x-y\right\| \right) \left( \left\| x\right\| +\left\| x-y\right\| \right) \Vert \end{aligned}$$
(7.57)
$$\begin{aligned}&\quad \le L\left\| y\right\| \mathrm {\max }\left\{ \sum _{i}L_{i}\left\| \mathrm {x-y}\right\| ^{\alpha _{i}},\sum _{i}L_{i}\left\| \textrm{x}\right\| ^{\alpha _{i}}\right\} \end{aligned}$$
(7.58)
$$\begin{aligned}&\qquad +\sum _{i}\frac{L}{1+\alpha _{i}}\Vert \textrm{y}\Vert ^{\alpha _{i}+1}+\frac{\mu }{2\left( 2-\theta \right) }\left( \left\| x\right\| +\left\| x-y\right\| \right) \left\| y\right\| , \end{aligned}$$
(7.59)
where inequality 1 again follows from Lemma 7.24 and the last inequality is due to the triangle inequality. Hence for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \), \(\left\| y\right\| \le \delta \),
$$\begin{aligned} {\tilde{V}}(\ x)-{\tilde{U}}(\ x)&=\int \left( {\tilde{U}}(\ x-\ y)-{\tilde{U}}(\ x)\right) \varphi _{\delta }(\ y)\hbox {d}y\\&\le L\left\| y\right\| \mathrm {\max }\left\{ \sum _{i}L_{i}\left\| \mathrm {x-y}\right\| ^{\alpha _{i}},\sum _{i}L_{i}\left\| \textrm{x}\right\| ^{\alpha _{i}}\right\} \\&\quad +\sum _{i}\frac{L}{1+\alpha _{i}}\Vert \textrm{y}\Vert ^{\alpha _{i}+1}+\frac{\mu }{2\left( 2-\theta \right) }\left( \left\| x\right\| +\left\| x-y\right\| \right) \left\| y\right\| \\&\le L\delta \left[ \sum _{i}L_{i}\left( R+2\epsilon +\delta \right) ^{\alpha _{i}}\right] \\&\quad +\sum _{i}\frac{L}{1+\alpha _{i}}\delta ^{\alpha _{i}+1}+\frac{\mu }{\left( 2-\theta \right) }\left( R+2\epsilon +\delta \right) \delta . \end{aligned}$$
Therefore, when \(R+\epsilon<\left\| x\right\| <R+2\epsilon \),
$$\begin{aligned} \nabla ^{2}\left( {\hat{U}}(\ x)-g\left( x\right) \right)&\succeq -\frac{\left( R+\epsilon \right) ^{2}\pi \left( L\delta \left[ \sum _{i}L_{i}\left( R+2\epsilon +\delta \right) ^{\alpha _{i}}\right] \right) }{\epsilon \left( 2R+3\epsilon \right) }I_{d}\nonumber \\&\quad -\frac{\left( R+\epsilon \right) ^{2}\pi \left( \sum _{i}\frac{L}{1+\alpha _{i}}\delta ^{\alpha _{i}+1}+\frac{\mu }{\left( 2-\theta \right) }\left( R+2\epsilon +\delta \right) \delta \right) }{\epsilon \left( 2R+3\epsilon \right) }I_{d}\nonumber \\&\quad -\frac{\left( R+\epsilon \right) ^{4}\pi ^{2}\left( NL\delta ^{\alpha }+\frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta \right) }{\epsilon ^{2}\left( 2R+3\epsilon \right) }I_{d}\nonumber \\&\quad -\frac{\left( R+\epsilon \right) ^{4}\pi ^{2}\left( \frac{\mu }{2}\frac{2\left( R+2\epsilon +\delta \right) ^{2}\delta }{\left( 1+\left( R+\epsilon \right) ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left( R+\epsilon -\delta \right) ^{2}\right) ^{\frac{\theta }{2}}}\right) }{\epsilon ^{2}\left( 2R+3\epsilon \right) }I_{d}. \end{aligned}$$
(7.60)
Taking the limit when \(\delta \rightarrow 0^{+}\), we obtain that for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \), \(\nabla ^{2}\left( {\hat{U}}(\ x)-g\left( x\right) \right) \) is positive semi-definite; hence, it is positive semi-definite on the entire \({\mathbb {R}}^{d}\), that is, \({\hat{U}}(\ x)-g\left( x\right) \) is convex on \({\mathbb {R}}^{d}\). From (7.49), we know that for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \),
$$\begin{aligned} \inf _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) \setminus {\mathbb {B}}(0,R)}{\tilde{U}}({\bar{x}})&\le {\hat{U}}(\ x)-g\left( x\right) \le \sup _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) \setminus {\mathbb {B}}(0,R)}{\tilde{U}}({\bar{x}}). \end{aligned}$$
(7.61)
Therefore,
$$\begin{aligned}&\sup \left( {\hat{U}}(\ x)-U(\ x)\right) -\inf \left( {\hat{U}}(\ x)-U(\ x)\right) \nonumber \\&\quad =\sup \left( {\hat{U}}(\ x)-g\left( x\right) -{\tilde{U}}(\ x)\right) -\inf \left( {\hat{U}}(\ x)-g\left( x\right) -{\tilde{U}}(\ x)\right) \end{aligned}$$
(7.62)
$$\begin{aligned}&\quad \le 2\left( \sup _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) \setminus {\mathbb {B}}(0,R)}{\tilde{U}}({\bar{x}})-\inf _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) \setminus {\mathbb {B}}(0,R)}{\tilde{U}}({\bar{x}})\right) \nonumber \\&\quad \le 2\left( \sup _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) }{\tilde{U}}({\bar{x}})-\inf _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) }{\tilde{U}}({\bar{x}})\right) . \end{aligned}$$
(7.63)
Since U is \(\alpha \)-mixture weakly smooth and \(\nabla U(0)=0\), we deduce
$$\begin{aligned} \Vert U(\ x)-U(0)\Vert&=\Vert U(\ x)-U(0)-\ \left\langle x,\nabla U(0)\right\rangle \Vert \nonumber \\&\le \sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left\| x\right\| ^{1+\alpha _{i}}\nonumber \\&\le \sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left( R+2\epsilon \right) ^{1+\alpha _{i}}\nonumber \\&\le \sum _{i}L_{i}R^{1+\alpha _{i}} \end{aligned}$$
(7.64)
and
$$\begin{aligned} \Vert g(\ x)\Vert&=\left\| \frac{\mu }{2\left( 2-\theta \right) }\ \left( 1+\left\| x\right\| ^{2}\right) ^{1-\frac{\theta }{2}}\right\| \nonumber \\&\le \frac{\mu }{2\left( 2-\theta \right) }\ \left( 1+\left( R+2\epsilon \right) ^{2}\right) ^{1-\frac{\theta }{2}}\nonumber \\&\le \frac{\mu }{\left( 2-\theta \right) }\ R^{2-\theta }. \end{aligned}$$
(7.65)
So for all \(\left\| x\right\| \le R+2\epsilon \), provided \(\epsilon \) is sufficiently small,
$$\begin{aligned} \sup _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) }{\tilde{U}}({\bar{x}})-\inf _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) }&{\tilde{U}}({\bar{x}})\le \sum _{i}L_{i}R^{1+\alpha _{i}}+\frac{2\mu }{\left( 2-\theta \right) }\ R^{2-\theta }. \end{aligned}$$
As a result, we get
$$\begin{aligned} \sup \left( {\hat{U}}(\ x)-U(\ x)\right) -\inf&\left( {\hat{U}}(\ x)-U(\ x)\right) \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}+\frac{4\mu }{\left( 2-\theta \right) }\ R^{2-\theta }. \end{aligned}$$
\(\square \)
Remark 7.16
When \(\theta =0\), being \(\left( \mu ,\theta \right) \)-degenerated convex outside the ball is equivalent to being \(\mu \)-strongly convex outside the ball, so we obtain a result for potentials that are strongly convex outside a ball similar to [31], but for \(\alpha \)-mixture weakly smooth potentials instead of smooth ones. The constant could be improved by a factor of 2 by taking \(\epsilon \) arbitrarily small.
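As a numerical sanity check on the mollification argument used in the proof above, the sketch below convolves the gradient of a toy one-dimensional weakly smooth potential \(U(x)=|x|^{1+\alpha }\) (an illustrative choice, not the construction above) with a Gaussian mollifier \(\varphi _{\delta }\), and confirms that the gradient perturbation is of order \(\delta ^{\alpha }\) and vanishes as \(\delta \rightarrow 0^{+}\), in line with (7.54).

```python
import numpy as np

# Toy illustration: U(x) = |x|^(1+alpha) has an alpha-Holder gradient,
# i.e. it is weakly smooth with exponent alpha.
alpha = 0.5
grad_U = lambda x: (1 + alpha) * np.sign(x) * np.abs(x) ** alpha

def grad_mollified(x, delta, n=40001):
    # Gradient of V = U * phi_delta; by convolution, grad V = (grad U) * phi_delta.
    y = np.linspace(-6 * delta, 6 * delta, n)
    dy = y[1] - y[0]
    phi = np.exp(-((y / delta) ** 2) / 2)
    phi /= phi.sum() * dy                      # normalize the mollifier
    return np.sum(grad_U(x - y) * phi) * dy

x0 = 0.3
deltas = [0.2, 0.1, 0.05]
errors = [abs(grad_mollified(x0, d) - grad_U(x0)) for d in deltas]

# Holder continuity of grad U gives an O(delta^alpha) perturbation bound,
# mirroring the N*L*delta^alpha term of (7.54).
for d, e in zip(deltas, errors):
    assert e <= (1 + alpha) * d ** alpha
assert errors[0] > errors[-1]                  # the perturbation shrinks with delta
```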
1.5.3 Proof of Lemma 4.4
Lemma 7.17
For U satisfying \(\gamma \)-Poincaré, \(\alpha \)-mixture weakly smooth with \(\alpha _{N}=1\), and \(2\)-dissipative, there exists \(\breve{U}\in C^{1}({\mathbb {R}}^{d})\) whose Hessian exists everywhere on \({\mathbb {R}}^{d}\) and which satisfies a log-Sobolev inequality on \({\mathbb {R}}^{d}\), such that
$$\begin{aligned} \sup \left( \breve{U}(\ x)-U(\ x)\right) -\inf \left( \breve{U}(\ x)-U(\ x)\right) \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }.\nonumber \\ \end{aligned}$$
(7.66)
Proof
First, given \(R>0\), let \({\overline{U}}(\textrm{x}):=U(\textrm{x})+\frac{L_{N}+\lambda _{0}}{2}\left\| x\right\| ^{2}\) with \(\lambda _{0}=\frac{2L}{R^{1-\alpha }}\). We obtain the following property:
$$\begin{aligned}&\left\langle \nabla {\overline{U}}(\textrm{x})-\nabla {\overline{U}}(\textrm{y}),x-y\right\rangle \nonumber \\&\quad =\left\langle \nabla \left( U(\textrm{x})+\frac{L_{N}+\lambda _{0}}{2}\left\| x\right\| ^{2}\right) -\nabla \left( U(\textrm{y})+\frac{L_{N}+\lambda _{0}}{2}\left\| y\right\| ^{2}\right) ,x-y\right\rangle \end{aligned}$$
(7.67)
$$\begin{aligned}&\quad =\left\langle \nabla U(\textrm{x})-\nabla U(\textrm{y})+(L_{N}+\lambda _{0})\left( x-y\right) ,x-y\right\rangle \nonumber \\&\quad {\mathop {\ge }\limits ^{i}}-\sum _{i<N}L_{i}\left\| x-y\right\| ^{1+\alpha }+\lambda _{0}\left\| x-y\right\| ^{2}\nonumber \\&\quad \ge \frac{\lambda _{0}}{2}\left\| x-y\right\| ^{2}\quad \text {for}\quad \left\| x-y\right\| \ge \left( \frac{NL}{\lambda _{0}}\right) ^{\frac{1}{1-\alpha _{1}}}=R, \end{aligned}$$
(7.68)
where (i) follows from Assumption 2.1. This implies that \({\overline{U}}(\textrm{x})\) is \(\lambda _{0}\)-strongly convex outside the ball \(B_{R}=\left\{ x:\left\| x\right\| \le R\right\} \). Although \({\overline{U}}(\textrm{x})\) does not satisfy the assumptions of Lemma 4.2 exactly, with some additional verification we can still apply Lemma 4.2 to derive the result. We sketch the proof as follows. There exists \({\hat{U}}\in C^{1}({\mathbb {R}}^{d})\) whose Hessian exists everywhere on \({\mathbb {R}}^{d}\),
$$\begin{aligned} {\hat{U}}(\ x)-\frac{\lambda _{0}}{4}\left\| x\right\| ^{2}=\left\{ \begin{array}{l} \tilde{{\overline{U}}}(\ x),\ \left\| x\right\| \ge R+2\epsilon ,\\ \alpha (\ x)\tilde{{\overline{U}}}(\ x)+(1-\alpha (\ x)){\tilde{V}}(\ x),\ R+\epsilon<\left\| x\right\| <R+2\epsilon ,\\ {\tilde{V}}(\ x),\ \left\| x\right\| \le R+\epsilon , \end{array}\right. \end{aligned}$$
(7.69)
where \(\alpha (\ x)\) is defined as before. Both \(\tilde{{\overline{U}}}(\ x)\) and \({\tilde{V}}(\ x)\) are convex and smooth in \({\mathbb {R}}^{d}\setminus {\mathbb {B}}\left( 0,R\right) \) and for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \), \(\left\| y\right\| \le \delta \),
$$\begin{aligned} \nabla ^{2}\left( {\hat{U}}(\ x)-\frac{\lambda _{0}}{4}\left\| x\right\| ^{2}\right)&\succeq -\nabla ^{2}\alpha (\ x)\left( {\tilde{V}}(\ x)-\tilde{{\overline{U}}}(\ x)\right) \nonumber \\&\quad -2\nabla \alpha (\ x)\left( \nabla {\tilde{V}}(\ x)-\nabla \tilde{{\overline{U}}}(\ x)\right) ^{T}. \end{aligned}$$
(7.70)
In this case, we have
$$\begin{aligned} \left\| \nabla {\tilde{V}}(\ x)-\nabla \tilde{{\overline{U}}}(\ x)\right\|&=\left\| \nabla \int \left( {\overline{U}}(\ x-\ y)-{\overline{U}}(\ x)\right) \varphi _{\delta }(\ y)\hbox {d}y\right\| \nonumber \\&{\mathop {\le }\limits ^{_{1}}}\left\| \nabla \int \left( U(\ x-\ y)-U(\ x)\right) \varphi _{\delta }(\ y)\hbox {d}y\right\| \nonumber \\&\quad +\lambda _{0}\int \left\| y\right\| \varphi _{\delta }(\ y)\hbox {d}y\nonumber \\&\le \left\| \int \left( \nabla U(\ x-\ y)-\nabla U(\ x)\right) \varphi _{\delta }(\ y)\hbox {d}y\right\| +\lambda _{0}\delta \nonumber \\&\le \sum _{i}L_{i}\delta ^{\alpha _{i}}+\lambda _{0}\delta , \end{aligned}$$
(7.71)
where inequality 1 holds by the triangle inequality and the last line follows from the \(\alpha \)-mixture weak smoothness assumption, while
$$\begin{aligned}&\Vert \tilde{{\overline{U}}}(\textrm{x})-\tilde{{\overline{U}}}(x-\textrm{y})\Vert \nonumber \\&\quad {\mathop {\le }\limits ^{_{1}}}\Vert {\overline{U}}(\textrm{x})-{\overline{U}}(x-\textrm{y})\Vert +\Vert \frac{L_{N}+\lambda _{0}}{2}\left\| x\right\| ^{2}-\frac{L_{N}+\lambda _{0}}{2}\left\| x-y\right\| ^{2}\Vert \nonumber \\&\quad {\mathop {\le }\limits ^{_{2}}}\left\{ \left\langle \nabla U(\mathrm {x-y}),\textrm{y}\right\rangle \vee \left\langle \nabla U(\textrm{x}),\mathrm {-y}\right\rangle \right\} +\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left\| y\right\| ^{\alpha _{i}+1}\nonumber \\&\qquad +\frac{L_{N}+\lambda _{0}}{2}\Vert \left\| x\right\| ^{2}-\left\| x-y\right\| ^{2}\Vert \end{aligned}$$
(7.72)
$$\begin{aligned}&\quad {\le }\left\{ \left\langle \nabla U(\mathrm {x-y}),\textrm{y}\right\rangle \vee \left\langle \nabla U(\textrm{x}),\mathrm {-y}\right\rangle \right\} +\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left\| y\right\| ^{\alpha _{i}+1}\nonumber \\&\qquad +\frac{L_{N}+\lambda _{0}}{2}\left( \left\| x\right\| -\left\| x-y\right\| \right) \left( \left\| x\right\| +\left\| x-y\right\| \right) \end{aligned}$$
(7.73)
$$\begin{aligned}&\quad \le \left\{ \left( \sum _{i}L_{i}\left\| \mathrm {x-y}\right\| ^{\alpha _{i}}\right) \left\| y\right\| \vee \left( \sum _{i}L_{i}\left\| \textrm{x}\right\| ^{\alpha _{i}}\right) \left\| y\right\| \right\} \nonumber \\&\qquad +\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left\| y\right\| ^{\alpha _{i}+1}+\frac{L_{N}+\lambda _{0}}{2}\left\| y\right\| \mathrm {\max }\left\{ \left\| \mathrm {x-y}\right\| ,\left\| \textrm{x}\right\| \right\} \end{aligned}$$
(7.74)
$$\begin{aligned}&\quad \le \sum _{i}L_{i}\left( R+2\epsilon +\delta \right) ^{\alpha _{i}}\delta \end{aligned}$$
(7.75)
$$\begin{aligned}&\qquad +\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\delta ^{\alpha _{i}+1}+\frac{L_{N}+\lambda _{0}}{2}\left( R+2\epsilon +\delta \right) \delta , \end{aligned}$$
(7.76)
where inequality 1 is due to the triangle inequality, inequality 2 follows from Assumption 1, and the last line holds by plugging in the bounds \(\left\| y\right\| \le \delta \) and \(\left\| x\right\| <R+2\epsilon \). Taking the limit as \(\delta \rightarrow 0^{+}\), for sufficiently small \(\epsilon \) we obtain that \({\hat{U}}(\ x)-\frac{\lambda _{0}}{4}\left\| x\right\| ^{2}\) is convex on all of \({\mathbb {R}}^{d}\); that is, \({\hat{U}}(\ x)\) is \(\frac{\lambda _{0}}{2}\)-strongly convex. By definition of \({\overline{U}}\), for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \) we obtain
$$\begin{aligned} \Vert {\overline{U}}(\ x)-{\overline{U}}(0)\Vert&\le \Vert U(\ x)-U(0)-\ \left\langle x,\nabla U(0)\right\rangle \Vert +\frac{L_{N}+\lambda _{0}}{2}\left\| x\right\| ^{2}\nonumber \\&\le \sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left\| x\right\| ^{\alpha _{i}+1}+\frac{L_{N}+\lambda _{0}}{2}\left\| x\right\| ^{2}\nonumber \\&\le \sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left( R+2\epsilon +\delta \right) ^{\alpha _{i}+1}+\frac{L_{N}+\lambda _{0}}{2}\left( R+2\epsilon +\delta \right) ^{2}\nonumber \\&\le \sum _{i}L_{i}R^{1+\alpha _{i}}+\left( L_{N}+\lambda _{0}\right) R^{2}. \end{aligned}$$
(7.77)
As a result, from Lemma 4.2 we deduce
$$\begin{aligned} \sup \left( {\hat{U}}(\ x)-{\overline{U}}(\ x)\right)&-\inf \left( {\hat{U}}(\ x)-{\overline{U}}(\ x)\right) \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}+2\left( L_{N}+\lambda _{0}\right) R^{2}. \end{aligned}$$
(7.78)
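As an aside, the passage in (7.68) from the lower bound \(-NL\left\| x-y\right\| ^{1+\alpha }+\lambda _{0}\left\| x-y\right\| ^{2}\) to \(\frac{\lambda _{0}}{2}\left\| x-y\right\| ^{2}\) rests on an elementary scalar fact: writing \(r=\left\| x-y\right\| \), one has \(NLr^{1+\alpha }\le \frac{\lambda _{0}}{2}r^{2}\) exactly when \(r^{1-\alpha }\ge 2NL/\lambda _{0}\). The following check uses placeholder constants (illustrative values only, not the paper's):

```python
import numpy as np

# Placeholder constants, for illustration only.
N, L, alpha, lam0 = 3, 2.0, 0.5, 0.8
r_star = (2 * N * L / lam0) ** (1 / (1 - alpha))    # threshold radius

# Above the threshold, the weak-smoothness penalty is absorbed into lam0/2 * r^2.
r = np.linspace(r_star, 10 * r_star, 1000)
lower = -N * L * r ** (1 + alpha) + lam0 * r ** 2
target = 0.5 * lam0 * r ** 2
assert np.all(lower >= target - 1e-6)

# Just below the threshold the absorption fails, so the radius is sharp.
r_bad = 0.9 * r_star
assert -N * L * r_bad ** (1 + alpha) + lam0 * r_bad ** 2 < 0.5 * lam0 * r_bad ** 2
```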
Let \(\breve{U}\left( x\right) ={\hat{U}}\left( x\right) -\left( \frac{L_{N}}{2}+\frac{\lambda _{0}}{4}\right) \left\| x\right\| ^{2}\) then for \(\left\| x\right\| >R+2\epsilon +\delta \), \({\hat{U}}\left( x\right) ={\overline{U}}\left( x\right) \) so \(\breve{U}\left( x\right) =U\left( x\right) \). For \(\left\| x\right\| \le R+2\epsilon +\delta \), we have
$$\begin{aligned}&\sup \left( \breve{U}(x)-U(x)\right) -\inf \left( \breve{U}(\ x)-U(x)\right) \nonumber \\&\quad \le \sup \left( {\hat{U}}(x)+\frac{L_{N}+\lambda _{0}}{2}\left\| x\right\| ^{2}-{\overline{U}}(x)\right) -\inf \left( {\hat{U}}(x)+\frac{L_{N}+\lambda _{0}}{2}\left\| x\right\| ^{2}-{\overline{U}}(x)\right) \nonumber \\&\quad \le \sup \left( {\hat{U}}(x)-{\overline{U}}(x)\right) -\inf \left( {\hat{U}}(x)-{\overline{U}}(x)\right) +\left( L_{N}+\lambda _{0}\right) \left( R+2\epsilon +\delta \right) ^{2}\nonumber \\&\quad \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}+2\left( L_{N}+\lambda _{0}\right) R^{2}+2\left( L_{N}+\lambda _{0}\right) R^{2}\nonumber \\&\quad \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }. \end{aligned}$$
(7.79)
So for every \(x\in {\mathbb {R}}^{d},\)
$$\begin{aligned} \sup \left( \breve{U}(x)-U(x)\right) -\inf \left( \breve{U}(\ x)-U(x)\right) \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }. \end{aligned}$$
Since U(x) is \(PI(\gamma )\), using Lemma 1.2 of [25] we have that \(\breve{U}(\ x)\) is Poincaré with constant
$$\begin{aligned} \gamma _{1}=\gamma e^{-4\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }. \end{aligned}$$
On the other hand, since \({\hat{U}}\left( x\right) \) is \(\frac{\lambda _{0}}{2}\)-strongly convex, we know that \(\nabla ^{2}\breve{U}\left( x\right) =\nabla ^{2}{\hat{U}}\left( x\right) -\left( L_{N}+\frac{\lambda _{0}}{2}\right) I\succeq -L_{N}I\succeq -LI\), so \(\nabla ^{2}\breve{U}\left( x\right) \) is lower bounded by \(-LI\). In addition, for \(\left\| x\right\| >R+2\epsilon +\delta \), the \(2\)-dissipative assumption gives, for some \(a,b>0\), \(\left\langle \nabla \breve{U}(\textrm{x}),x\right\rangle \ge a\left\| x\right\| ^{2}-b\), while for \(\left\| x\right\| \le R+2\epsilon +\delta \)
$$\begin{aligned} \left\langle \nabla \breve{U}\left( x\right) ,x\right\rangle&\ge \left\langle -\nabla \left( \left( \frac{L_{N}}{2}+\frac{\lambda _{0}}{4}\right) \left\| x\right\| ^{2}\right) ,x\right\rangle \\&\ge -\left( L_{N}+\frac{\lambda _{0}}{2}\right) \left\| x\right\| ^{2}\\&\ge -\left( L_{N}+\frac{\lambda _{0}}{2}\right) R^{2}\\&\ge a\left\| x\right\| ^{2}-\left( L_{N}+\frac{\lambda _{0}}{2}\right) R^{2}-aR^{2}, \end{aligned}$$
so for every \(x\in {\mathbb {R}}^{d},\)
$$\begin{aligned} \left\langle \nabla \breve{U}(\textrm{x}),x\right\rangle \ge a\left\| x\right\| ^{2}-\left( b+\left( L_{N}+\frac{\lambda _{0}}{2}\right) R^{2}+aR^{2}\right) . \end{aligned}$$
We choose \(W=e^{a_{1}\left\| x\right\| ^{2}}\) and \(V=a_{1}\left\| x\right\| ^{2}\) with \(a_{1}=\frac{a}{4}>0\). One sees that W satisfies the Lyapunov inequality
$$\begin{aligned} {\mathcal {L}}W&=\left( 2a_{1}d+4a_{1}^{2}\left\| x\right\| ^{2}-2a_{1}\left\langle \nabla \breve{U}(\textrm{x}),x\right\rangle \right) W\nonumber \\&\le \left( 2a_{1}d+4a_{1}^{2}\left\| x\right\| ^{2}-2a_{1}a\left\| x\right\| ^{2}+2a_{1}\left( b+\left( L_{N}+\frac{\lambda _{0}}{2}\right) R^{2}+aR^{2}\right) \right) W\nonumber \\&\le \left( -\frac{a^{2}}{4}\left\| x\right\| ^{2}+\frac{a}{2}\left( b+\left( L_{N}+\frac{\lambda _{0}}{2}\right) R^{2}+aR^{2}+d\right) \right) W. \end{aligned}$$
(7.80)
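The first line of (7.80) is the generator identity \({\mathcal {L}}W=\Delta W-\left\langle \nabla \breve{U},\nabla W\right\rangle =\left( 2a_{1}d+4a_{1}^{2}\left\| x\right\| ^{2}-2a_{1}\left\langle \nabla \breve{U}(x),x\right\rangle \right) W\) for \(W=e^{a_{1}\left\| x\right\| ^{2}}\). The sketch below verifies this identity by finite differences, using a toy quadratic stand-in for \(\breve{U}\) (an assumption made purely for illustration):

```python
import numpy as np

# Toy stand-in potential (illustration only): U(x) = |x|^2 / 2, grad U(x) = x.
d, a1, h = 3, 0.1, 1e-3
rng = np.random.default_rng(0)
x = rng.normal(size=d)

W = lambda z: np.exp(a1 * np.dot(z, z))
grad_U = lambda z: z

# LW = Laplacian(W) - <grad U, grad W>, with the Laplacian by central differences.
lap = sum((W(x + h * e) - 2 * W(x) + W(x - h * e)) / h**2 for e in np.eye(d))
grad_W = 2 * a1 * x * W(x)
LW_numeric = lap - np.dot(grad_U(x), grad_W)

# Closed form from the first line of (7.80).
LW_formula = (2 * a1 * d + 4 * a1**2 * np.dot(x, x)
              - 2 * a1 * np.dot(grad_U(x), x)) * W(x)
assert abs(LW_numeric - LW_formula) < 1e-6
```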
By Theorem 1.9 of [5], \(\breve{U}\left( x\right) \) satisfies a defective log-Sobolev inequality. In addition, by Rothaus’ lemma, a defective log-Sobolev inequality together with \(PI(\gamma _{1})\) implies a log-Sobolev inequality with constant \(\gamma _{2}=\frac{2}{A+(B+2)\frac{1}{\gamma _{1}}}\), where
$$\begin{aligned} A&=\left( 1-\frac{L}{2}\right) \frac{8}{a^{2}}+\zeta , \end{aligned}$$
(7.81)
$$\begin{aligned} B&=2\left[ \frac{2\left( \left( b+4\left( L_{N}+\frac{\lambda _{0}}{4}\right) R^{2}+aR^{2}\right) +d\right) }{a}+M_{2}\right] \left( 1-\frac{L}{2}+\frac{1}{\zeta }\right) , \end{aligned}$$
(7.82)
where \(M_{2}=\int \left\| x\right\| ^{2}e^{-\breve{U}(x)}\hbox {d}x\). But it is well known from Lemma 10 that \(M_{2}=O(d)\), so the log-Sobolev constant is just O(d). This concludes the proof. \(\square \)
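The estimate \(M_{2}=O(d)\) is the usual linear growth of the second moment for measures with Gaussian-like tails. As an illustration (using a standard Gaussian as a surrogate for \(e^{-\breve{U}}\), an assumption made purely for this sketch), a Monte Carlo check of the linear-in-\(d\) scaling:

```python
import numpy as np

# Surrogate: breve-U(x) = |x|^2/2 + (d/2) log(2*pi), so e^{-breve-U} is N(0, I_d)
# and M2 = E|x|^2 = d exactly; we confirm the O(d) scaling by Monte Carlo.
rng = np.random.default_rng(1)
for d in (5, 20, 80):
    x = rng.normal(size=(50_000, d))
    M2 = np.mean(np.sum(x**2, axis=1))
    assert abs(M2 - d) / d < 0.02       # second moment grows linearly in d
```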
1.5.4 Proof of Lemma 4.5
Theorem 7.18
Suppose \(\pi \) is \(\gamma \)-Poincaré, \(\alpha \)-mixture weakly smooth with \(\alpha _{N}=1\), and \(2\)-dissipative (i.e., \(\left\langle \nabla U(x),x\right\rangle \ge a\left\| x\right\| ^{2}-b\) for some \(a,b>0\)). Then for any \(x_{0}\sim p_{0}\) with \(H(p_{0}\vert \pi )=C_{0}<\infty \), the iterates \(x_{k}\sim p_{k}\) of LMC with step size \(\eta \le 1\wedge \frac{1}{4\gamma _{3}}\wedge \left( \frac{\gamma _{3}}{16L^{1+\alpha }}\right) ^{\frac{1}{\alpha }}\) satisfy
$$\begin{aligned} H(p_{k}\vert \pi )\le e^{-\gamma _{3}\epsilon k}H(p_{0}\vert \pi )+\frac{8\eta ^{\alpha }D_{3}}{3\gamma _{3}}, \end{aligned}$$
(7.83)
where \(D_{3}\) is defined as in equation (3.8) and
$$\begin{aligned} M_{2}&=\int \left\| x\right\| ^{2}e^{-\breve{U}(x)}\textrm{d}x=O(d) \end{aligned}$$
(7.84)
$$\begin{aligned} \zeta&=\sqrt{2\left[ \frac{2\left( b+\left( L+\frac{\lambda _{0}}{2}\right) R^{2}+aR^{2}+d\right) }{a}+M_{2}\right] \frac{e^{4\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }}{\gamma }}\end{aligned}$$
(7.85)
$$\begin{aligned} A&=\left( 1-\frac{L}{2}\right) \frac{8}{a^{2}}+\zeta ,\end{aligned}$$
(7.86)
$$\begin{aligned} B&=2\left[ \frac{2\left( \left( b+4\left( L+\frac{\lambda _{0}}{4}\right) R^{2}+aR^{2}\right) +d\right) }{a}+M_{2}\right] \left( 1-\frac{L}{2}+\frac{1}{\zeta }\right) ,\nonumber \\ \gamma _{3}&=\frac{2\gamma e^{-\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }}{A\gamma +(B+2)e^{4\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }}. \end{aligned}$$
(7.87)
Then, for any \(\epsilon >0\), to achieve \(H(p_{k}\vert \pi )<\epsilon \), it suffices to run ULA with step size \(\eta \le 1\wedge \frac{1}{4\gamma _{3}}\wedge \left( \frac{\gamma _{3}}{16L^{1+\alpha }}\right) ^{\frac{1}{\alpha }}\wedge \left( \frac{3\epsilon \gamma _{3}}{16D_{3}}\right) ^{\frac{1}{\alpha }}\)for \(k\ge \frac{1}{\gamma _{3}\eta }\log \frac{2\,H\left( p_{0}\vert \pi \right) }{\epsilon }\) iterations.
Proof
From Lemma 7.17, we can optimize over \(\zeta \) and get
$$\begin{aligned} \zeta =\sqrt{2\left[ \frac{2\left( b+\left( L+\frac{\lambda _{0}}{2}\right) R^{2}+aR^{2}+d\right) }{a}+M_{2}\right] \frac{1}{\gamma _{1}}}. \end{aligned}$$
By the Holley–Stroock perturbation theorem [21], U(x) is log-Sobolev on \({\mathbb {R}}^{d}\) with constant
$$\begin{aligned} \gamma _{3}=\frac{2\gamma e^{-\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }}{A\gamma +(B+2)e^{4\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }}. \end{aligned}$$
Applying Theorem 3.2, we get the desired result.\(\square \)
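For concreteness, the LMC/ULA recursion analyzed in Theorem 7.18 is \(x_{k+1}=x_{k}-\eta \nabla U(x_{k})+\sqrt{2\eta }\,\xi _{k}\) with \(\xi _{k}\sim {\mathcal {N}}(0,I_{d})\). A minimal sketch on a Gaussian target, for which the \(O(\eta )\) stationary bias of ULA can be computed in closed form (the target here is an illustrative choice, not one from the paper):

```python
import numpy as np

def ula(grad_U, x0, eta, n_steps, rng):
    """Unadjusted Langevin algorithm: x <- x - eta * grad_U(x) + sqrt(2*eta) * xi."""
    x = x0.copy()
    for _ in range(n_steps):
        x += -eta * grad_U(x) + np.sqrt(2 * eta) * rng.normal(size=x.shape)
    return x

# Target pi = N(0, I_d), i.e. U(x) = |x|^2 / 2; run 5000 chains in parallel.
d, eta = 2, 0.05
rng = np.random.default_rng(2)
x = ula(lambda x: x, np.zeros((5000, d)), eta, 2000, rng)

# For this target ULA is an AR(1) process with stationary variance
# 2*eta / (1 - (1 - eta)^2) = 1/(1 - eta/2), i.e. slightly above the true value 1.
var = x.var()
assert 0.95 < var < 1.12
```

The vectorized chains make the empirical variance estimate sharp enough to see that the bias is small but positive, which is the \(O(\eta ^{\alpha })\) floor (here \(\alpha =1\)) appearing in (7.83).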
1.5.5 Proof of Theorem 4.5
Lemma 7.19
If U satisfies Assumption 2.3, then
$$\begin{aligned} U(x)\ge \frac{a}{2\beta }\Vert x\Vert ^{\beta }+U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}R^{\alpha _{i}+1}-b. \end{aligned}$$
(7.88)
Proof
Using the technique of [18], let \(R=\left( \frac{2b}{a}\right) ^{\frac{1}{\beta }}\). We first lower bound \(U\left( x\right) \) when \(\left\| x\right\| \le R\):
$$\begin{aligned} U(x)&=U(0)+\int _{0}^{1}\left\langle \nabla U(tx),\ x\right\rangle \hbox {d}t\nonumber \\&\ge U(0)-\int _{0}^{1}\left\| \nabla U(tx)\right\| \left\| x\right\| \hbox {d}t\nonumber \\&\ge U(0)-\sum _{i}L_{i}\left\| x\right\| ^{\alpha _{i}+1}\int _{0}^{1}t^{\alpha _{i}}\hbox {d}t\nonumber \\&\ge U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left\| x\right\| ^{\alpha _{i}+1}\nonumber \\&\ge U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}R^{\alpha _{i}+1}. \end{aligned}$$
(7.89)
For \(\left\| x\right\| >R\), we can lower bound U as follows.
$$\begin{aligned} U(x)&=U(0)+\int _{0}^{\frac{R}{\Vert x\Vert }}\left\langle \nabla U(tx),\ x\right\rangle \hbox {d}t+\int _{\frac{R}{\Vert x\Vert }}^{1}\left\langle \nabla U(tx),\ x\right\rangle \hbox {d}t\nonumber \\&\ge U(0)-\int _{0}^{\frac{R}{\left\| x\right\| }}\left\| \nabla U(tx)\right\| \left\| x\right\| \hbox {d}t+\int _{\frac{R}{\left\| x\right\| }}^{1}\frac{1}{t}\left\langle \nabla U(tx),\ tx\right\rangle \hbox {d}t\nonumber \\&\ge U(0)-\left\| x\right\| \int _{0}^{\frac{R}{\left\| x\right\| }}\sum _{i}L_{i}\left\| tx\right\| ^{\alpha _{i}}\hbox {d}t+\int _{\frac{R}{\left\| x\right\| }}^{1}\frac{1}{t}\left( a\left\| tx\right\| ^{\beta }-b\right) \hbox {d}t\nonumber \\&{\mathop {\ge }\limits ^{_{1}}}U(0)-\sum _{i}L_{i}\left\| x\right\| ^{\alpha _{i}+1}\int _{0}^{\frac{R}{\left\| x\right\| }}t^{\alpha _{i}}\hbox {d}t\ +\frac{1}{2}\int _{\frac{R}{\left\| x\right\| }}^{1}\frac{1}{t}a\left\| tx\right\| ^{\beta }\hbox {d}t\nonumber \\&{\mathop {\ge }\limits ^{_{2}}}U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left\| x\right\| ^{\alpha _{i}+1}\frac{R^{\alpha _{i}+1}}{\left\| x\right\| ^{\alpha _{i}+1}}+\frac{a}{2}\left\| x\right\| ^{\beta }\int _{\frac{R}{\left\| x\right\| }}^{1}t^{\beta -1}\hbox {d}t\nonumber \\&\ge U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}R^{\alpha _{i}+1}+\frac{a}{2\beta }\left\| x\right\| ^{\beta }\left( 1-\frac{R^{\beta }}{\left\| x\right\| ^{\beta }}\right) \nonumber \\&\ge \frac{a}{2\beta }\left\| x\right\| ^{\beta }+U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}R^{\alpha _{i}+1}-b, \end{aligned}$$
(7.90)
where inequality 1 follows from Assumption 2.3 and inequality 2 uses the fact that if \(t{\displaystyle \ge \frac{R}{\left\| x\right\| }}\) then \({\displaystyle a\left\| tx\right\| ^{\beta }-b\ge \frac{a}{2}\left\| tx\right\| ^{\beta }}\). Now, since \(\frac{a}{2\beta }\left\| x\right\| ^{\beta }\le b\) for \(\left\| x\right\| \le R\), combining with the bound for \(\left\| x\right\| \le R\) gives
$$\begin{aligned} U(x)\ge \frac{a}{2\beta }\left\| x\right\| ^{\beta }+U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}R^{\alpha _{i}+1}-b. \end{aligned}$$
(7.91)
\(\square \)
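Lemma 7.19 can be checked on a concrete potential. The sketch below uses the illustrative choice \(U(x)=x^{2}/2+\cos x\) in one dimension (not a potential from the paper): its gradient \(x-\sin x\) is \(2\)-Lipschitz, so \(N=1\), \(L_{1}=2\), \(\alpha _{1}=1\), and \(\left\langle \nabla U(x),x\right\rangle =x^{2}-x\sin x\ge \frac{1}{2}x^{2}-\frac{1}{2}\), so the dissipativity assumption holds with \(a=b=\frac{1}{2}\) and \(\beta =2\).

```python
import numpy as np

# Illustrative 1-D potential: U(x) = x^2/2 + cos(x), U(0) = 1, grad U(x) = x - sin(x).
a, b, beta, L1, alpha1 = 0.5, 0.5, 2.0, 2.0, 1.0
U = lambda x: x**2 / 2 + np.cos(x)
R = (2 * b / a) ** (1 / beta)              # R = sqrt(2), as in the proof

x = np.linspace(-30, 30, 100_001)

# Dissipativity <grad U(x), x> >= a*|x|^2 - b holds on the grid.
assert np.all(x * (x - np.sin(x)) >= a * x**2 - b - 1e-12)

# The conclusion (7.88): U(x) >= a/(2*beta)*|x|^beta + U(0) - L1/(alpha1+1)*R^(alpha1+1) - b.
lower = (a / (2 * beta)) * np.abs(x) ** beta + U(0) \
        - (L1 / (alpha1 + 1)) * R ** (alpha1 + 1) - b
assert np.all(U(x) >= lower)
```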
1.5.6 Proof of Theorem 4.7
Lemma 7.20
Assume that U satisfies Assumption 2.3. Then for \(\pi =e^{-U}\) and any distribution \(\rho \), we have
$$\begin{aligned} \frac{4\beta }{a}\left[ \textrm{H}(\rho \vert \pi )+{\tilde{d}}+{\tilde{\mu }}\right] \ge \textrm{E}_{\rho }\left[ \left\| x\right\| {}^{\beta }\right] , \end{aligned}$$
(7.92)
where
$$\begin{aligned} {\tilde{\mu }}&=\frac{1}{2}\log \left( \frac{2}{\beta }\right) +\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}+b+\vert U(0)\vert , \end{aligned}$$
(7.93)
$$\begin{aligned} {\tilde{d}}&=\frac{d}{\beta }\left[ \frac{\beta }{2}\log \left( \pi \right) +\log \left( \frac{4\beta }{a}\right) +\left( 1-\frac{\beta }{2}\right) \log \left( \frac{d}{2e}\right) \right] . \end{aligned}$$
(7.94)
Proof
Let \(q(x)=e^{\frac{a}{4\beta }\left\| x\right\| {}^{\beta }-U(x)}\) and \(C_{q}=\int e^{\frac{a}{4\beta }\left\| x\right\| {}^{\beta }-U(x)}\hbox {d}x\). First, we need to bound \(\log C_{q}\). Using Lemma 7.19, we have
$$\begin{aligned} U(x)&\ge \frac{a}{2\beta }\left\| x\right\| ^{\beta }+U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}-b. \end{aligned}$$
(7.95)
Regrouping the terms and integrating both sides gives
$$\begin{aligned}&\int e^{\frac{a}{4\beta }\left\| x\right\| {}^{\beta }-U(x)}\hbox {d}x\le e^{-U(0)+\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}+b}\int e^{-\frac{a}{4\beta }\left\| x\right\| {}^{\beta }}\hbox {d}x\nonumber \\&\quad =\frac{2\pi ^{d/2}}{\beta }\left( \frac{4\beta }{a}\right) ^{\frac{d}{\beta }}e^{-U(0)+\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}+b}\frac{\Gamma \left( \frac{d}{\beta }\right) }{\Gamma \left( \frac{d}{2}\right) }\nonumber \\&\quad \le \frac{2\pi ^{d/2}}{\beta }\left( \frac{4\beta }{a}\right) ^{\frac{d}{\beta }}\frac{\left( \frac{d}{\beta }\right) ^{\frac{d}{\beta }-\frac{1}{2}}}{\left( \frac{d}{2}\right) ^{\frac{d}{2}-\frac{1}{2}}}e^{\frac{d}{2}-\frac{d}{\beta }}e^{-U(0)+\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}+b}, \end{aligned}$$
(7.96)
where the equality on the second line comes from using polar coordinates and the third line follows from an inequality for the ratio of Gamma functions [23]. Plugging this back into the previous inequality and taking logs, we deduce
$$\begin{aligned} {\displaystyle \log (C_{q})}&={\displaystyle \log \left( \int e^{\frac{a}{4\beta }\left\| x\right\| {}^{\beta }-U(x)}\hbox {d}x\right) }\nonumber \\&\le \frac{d}{2}\log (\pi )+\frac{d}{\beta }\log \left( \frac{4\beta }{a}\right) +\left( \frac{d}{\beta }-\frac{d}{2}\right) \log \left( \frac{d}{2e}\right) \nonumber \\&\quad +\left( \frac{d}{\beta }+\frac{1}{2}\right) \log \left( \frac{2}{\beta }\right) +\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}+b+\vert U(0)\vert \nonumber \\&\le \frac{d}{\beta }\left[ \frac{\beta }{2}\log (\pi )+\log \left( \frac{4\beta }{a}\right) +\left( 1-\frac{\beta }{2}\right) \log \left( \frac{d}{2e}\right) \right] \nonumber \\&\quad +\frac{1}{2}\log \left( \frac{2}{\beta }\right) +\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}+b+\vert U(0)\vert \nonumber \\&\le {\tilde{d}}+{\tilde{\mu }}, \end{aligned}$$
(7.97)
where the last line uses the definitions of \({\tilde{d}}\) and \({\tilde{\mu }}\). Using this bound on \(\log C_{q}\), we get
$$\begin{aligned} \textrm{H}(\rho \vert \pi )&=\int \rho \log \frac{\rho }{q/C_{q}}+\int \rho \log \frac{q/C_{q}}{\pi }\nonumber \\&=\textrm{H}(\rho \vert q/C_{q})+\textrm{E}_{\rho }\left[ \log \frac{q/C_{q}}{e^{-U}}\right] \nonumber \\&{\mathop {\ge }\limits ^{_{\left( 1\right) }}}\frac{a}{4\beta }\textrm{E}_{\rho }\left[ \left\| x\right\| {}^{\beta }\right] -\log \left( C_{q}\right) \end{aligned}$$
(7.98)
$$\begin{aligned}&\ge \frac{a}{4\beta }\textrm{E}_{\rho }\left[ \left\| x\right\| {}^{\beta }\right] -{\tilde{d}}-{\tilde{\mu }}, \end{aligned}$$
(7.99)
where \(\left( 1\right) \) follows from the definition of \(C_{q}\) and the fact that relative entropy is always non-negative. Rearranging the terms completes the proof. \(\square \)
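The Gamma-function ratio bound used in (7.96) is a Kečkić–Vasić-type estimate, \(\Gamma (x)/\Gamma (y)\le \frac{x^{x-1/2}}{y^{y-1/2}}e^{y-x}\) for \(x\ge y\ge 1\), applied with \(x=d/\beta \) and \(y=d/2\). It is easy to confirm numerically on the log scale:

```python
from math import lgamma, log

# Log form of the bound: lgamma(x) - lgamma(y) <= (x-1/2)*log(x) - (y-1/2)*log(y) + (y-x).
def log_ratio_bound(x, y):
    return (x - 0.5) * log(x) - (y - 0.5) * log(y) + (y - x)

# Check with x = d/beta >= y = d/2 >= 1 over a range of d and beta in (1, 2].
for d in (4, 10, 50, 200):
    for beta in (1.1, 1.5, 2.0):
        x, y = d / beta, d / 2
        assert lgamma(x) - lgamma(y) <= log_ratio_bound(x, y) + 1e-12
```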
Theorem 7.21
Suppose \(\pi \) is convex (but not necessarily strongly convex) outside the ball \({\mathbb {B}}(0,R)\), \(\alpha \)-mixture weakly smooth with \(\alpha _{N}=1\), and \(2\)-dissipative (i.e., \(\left\langle \nabla U(x),x\right\rangle \ge a\left\| x\right\| ^{2}-b\) for some \(a,b>0\)). Then for any \(x_{0}\sim p_{0}\) with \(H(p_{0}\vert \pi )=C_{0}<\infty \), the iterates \(x_{k}\sim p_{k}\) of LMC with step size \(\eta \le 1\wedge \frac{1}{4\gamma _{3}}\wedge \left( \frac{\gamma _{3}}{16L^{1+\alpha }}\right) ^{\frac{1}{\alpha }}\) satisfy
$$\begin{aligned} H(p_{k}\vert \pi )\le e^{-\gamma _{3}\epsilon k}H(p_{0}\vert \pi )+\frac{8\eta ^{\alpha }D_{3}}{3\gamma _{3}}, \end{aligned}$$
(7.100)
where \(D_{3}\) is defined as in equation (3.8) and for some universal constant K,
$$\begin{aligned} M_{2}&=\int \left\| x\right\| ^{2}e^{-\breve{U}(x)}\textrm{d}x=O(d) \end{aligned}$$
(7.101)
$$\begin{aligned} \zeta&=K\sqrt{64d\left[ \frac{2\left( b+\left( L+\frac{\lambda _{0}}{2}\right) R^{2}+aR^{2}+d\right) }{a}+M_{2}\right] \left( \frac{a+b+2aR^{2}+3}{ae^{-4\left( 4L_{N}R^{2}+4LR^{1+\alpha }\right) }}\right) }\end{aligned}$$
(7.102)
$$\begin{aligned} A&=\left( 1-\frac{L}{2}\right) \frac{8}{a^{2}}+\zeta ,\end{aligned}$$
(7.103)
$$\begin{aligned} B&=2\left[ \frac{2\left( \left( b+4\left( L+\frac{\lambda _{0}}{4}\right) R^{2}+aR^{2}\right) +d\right) }{a}+M_{2}\right] \left( 1-\frac{L}{2}+\frac{1}{\zeta }\right) ,\nonumber \\ \gamma _{3}&=\frac{2e^{-\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }}{A+(B+2)32K^{2}d\left( \frac{a+b+2aR^{2}+3}{a}\right) e^{4\left( 4L_{N}R^{2}+4LR^{1+\alpha }\right) }}=\frac{1}{O(d)}. \end{aligned}$$
(7.104)
Then, for any \(\epsilon >0\), to achieve \(H(p_{k}\vert \pi )<\epsilon \), it suffices to run LMC with step size \(\eta \le 1\wedge \frac{1}{4\gamma _{3}}\wedge \left( \frac{\gamma _{3}}{16L^{1+\alpha }}\right) ^{\frac{1}{\alpha }}\wedge \left( \frac{3\epsilon \gamma _{3}}{16D_{3}}\right) ^{\frac{1}{\alpha }}\) for \(k\ge \frac{1}{\gamma _{3}\eta }\log \frac{2\,H\left( p_{0}\vert \pi \right) }{\epsilon }\) iterations.
Proof
Using Lemma 2, there exists \(\breve{U}\left( x\right) \in C^{1}({\mathbb {R}}^{d})\) whose Hessian exists everywhere on \({\mathbb {R}}^{d}\) and which is convex on \({\mathbb {R}}^{d}\), such that
$$\begin{aligned} \sup \left( \breve{U}(x)-U(x)\right) -\inf \left( \breve{U}(x)-U(x)\right) \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}. \end{aligned}$$
(7.105)
We prove this by two different approaches.
First approach: Since \(\breve{U}\) is convex, by Theorem 1.2 of [4], \(\breve{U}\) satisfies a Poincaré inequality with constant
$$\begin{aligned} \gamma&\ge \frac{1}{4K^{2}\int \left\| x-E_{\pi }(x)\right\| ^{2}\pi \left( x\right) \hbox {d}x}\\&{\mathop {\ge }\limits ^{_{1}}}\frac{1}{8K^{2}\left( E_{\pi }\left( \left\| x\right\| ^{2}\right) +\left\| E_{\pi }(x)\right\| ^{2}\right) }\\&{\mathop {\ge }\limits ^{}}\frac{1}{16K^{2}E_{\pi }\left( \left\| x\right\| ^{2}\right) }, \end{aligned}$$
where K is a universal constant, step 1 follows from Young's inequality, and the last line is due to Jensen's inequality. In addition, for \(\left\| x\right\| >R+2\epsilon +\delta \), the \(2\)-dissipativity assumption gives, for some \(a,b>0\), \(\left\langle \nabla \breve{U}(x),x\right\rangle =\left\langle \nabla U(x),x\right\rangle \ge a\left\| x\right\| ^{2}-b\), while for \(\left\| x\right\| \le R+2\epsilon +\delta \), by convexity of \(\breve{U}\),
$$\begin{aligned} \left\langle \nabla \breve{U}(x),x\right\rangle&\ge 0\\&\ge a\left\| x\right\| ^{2}-a\left( R+2\epsilon +\delta \right) ^{2}\\&\ge a\left\| x\right\| ^{2}-2aR^{2}. \end{aligned}$$
Hence, for every \(x\in {\mathbb {R}}^{d}\),
$$\begin{aligned} \left\langle \nabla \breve{U}(x),x\right\rangle \ge a\left\| x\right\| ^{2}-\left( b+2aR^{2}\right) . \end{aligned}$$
Therefore, \(\breve{U}(x)\) also satisfies \(2\)-dissipativity, which implies
$$\begin{aligned} E_{\breve{\pi }}\left( \left\| x\right\| ^{2}\right) \le 2d\left( \frac{a+b+2aR^{2}+3}{a}\right) , \end{aligned}$$
so the Poincaré constant satisfies
$$\begin{aligned} \gamma {\mathop {\ge }\limits ^{}}\frac{1}{32K^{2}d\left( \frac{a+b+2aR^{2}+3}{a}\right) }. \end{aligned}$$
By Lemma 1.2 of [25], U satisfies a Poincaré inequality with constant
$$\begin{aligned} \gamma \ge \frac{1}{32K^{2}d\left( \frac{a+b+2aR^{2}+3}{a}\right) }e^{-4\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}\right) }. \end{aligned}$$
Now, applying the result of the previous section, we derive the desired result.
Second approach: Employing Lemma 7.25, combined with the \(2\)-dissipativity assumption, we get:
$$\begin{aligned} \int e^{\frac{a}{8}\left\| x\right\| {}^{2}-U(x)}\hbox {d}x\le e^{\left( {\tilde{d}}+{\tilde{\mu }}\right) }, \end{aligned}$$
(7.106)
which in turn implies
$$\begin{aligned} \int e^{\frac{a}{8}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x\le e^{\left( {\tilde{d}}+{\tilde{\mu }}\right) +2\sum _{i}L_{i}R^{1+\alpha _{i}}}. \end{aligned}$$
(7.107)
Let \(\mu _{1}=\frac{e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}}{\int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x}\) and define \(\mu _{2}=\frac{\mu _{1}e^{\frac{a}{16p}\left\| x\right\| {}^{2}}}{\int e^{\frac{a}{16p}\left\| x\right\| {}^{2}}\hbox {d}\mu _{1}}\). Then \(\mu _{1}\) is \(\frac{a}{8p}\)-strongly log-concave, hence log-Sobolev with constant \(\frac{a}{8p}\), and by the Cauchy–Schwarz inequality, we have
$$\begin{aligned} \left\| \frac{\hbox {d}\mu _{2}}{\hbox {d}\mu _{1}}\right\| _{L^{p}\left( \mu _{1}\right) }^{p}&=\frac{\int e^{\frac{a}{16}\left\| x\right\| {}^{2}}\hbox {d}\mu _{1}}{\left( \int e^{\frac{a}{16p}\left\| x\right\| {}^{2}}\hbox {d}\mu _{1}\right) ^{p}}\nonumber \\&\le \left( \int e^{\frac{a}{8}\left\| x\right\| {}^{2}}\hbox {d}\mu _{1}\right) ^{\frac{1}{2}}\left( \int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}}\hbox {d}\mu _{1}\right) ^{p}\nonumber \\&=\left( \frac{\int e^{\frac{a\left( 2p-1\right) }{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x}{\int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x}\right) ^{\frac{1}{2}}\left( \frac{\int e^{\frac{-a}{8p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x}{\int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x}\right) ^{p} \end{aligned}$$
(7.108)
Since
$$\begin{aligned} \Vert U(x)-U(0)\Vert&=\Vert U(x)-U(0)-\left\langle x,\nabla U(0)\right\rangle \Vert \nonumber \\&\le \sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}\left\| x\right\| ^{1+\alpha _{i}}+\frac{L_{N}}{2}\left\| x\right\| ^{2}, \end{aligned}$$
(7.109)
this implies \(U(x)\le \Vert U(0)\Vert +\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}\left\| x\right\| ^{1+\alpha _{i}}+\frac{L_{N}}{2}\left\| x\right\| ^{2}\), which in turn gives
$$\begin{aligned} \int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x&\ge \int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\Vert U(0)\Vert -\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}\left\| x\right\| ^{1+\alpha _{i}}-\frac{L_{N}}{2}\left\| x\right\| ^{2}-2\sum _{i}L_{i}R^{1+\alpha _{i}}}\hbox {d}x\nonumber \\&\ge e^{-\Vert U(0)\Vert -2\sum _{i}L_{i}R^{1+\alpha _{i}}}\int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}\left\| x\right\| ^{1+\alpha _{i}}-\frac{L_{N}}{2}\left\| x\right\| ^{2}}\hbox {d}x\nonumber \\&\ge e^{-\Vert U(0)\Vert -2\sum _{i}L_{i}R^{1+\alpha _{i}}-\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}}\int e^{-\left( \frac{a}{16p}+\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}+\frac{L_{N}}{2}\right) \left\| x\right\| {}^{2}}\hbox {d}x\nonumber \\&\ge \frac{\pi ^{\frac{d}{2}}}{\left( \frac{a}{16p}+\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}+\frac{L_{N}}{2}\right) ^{\frac{d}{2}}}e^{-\Vert U(0)\Vert -2\sum _{i}L_{i}R^{1+\alpha _{i}}-\frac{L}{1+\alpha }}. \end{aligned}$$
(7.110)
On the other hand,
$$\begin{aligned} \int e^{\frac{-a}{8p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x&\le \int e^{\frac{a\left( 2p-1\right) }{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x\nonumber \\&\le \int e^{\frac{a}{8p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x\nonumber \\&\le e^{\left( {\tilde{d}}+{\tilde{\mu }}\right) +2\sum _{i}L_{i}R^{1+\alpha _{i}}}. \end{aligned}$$
(7.111)
Combining this with the previous inequality, we obtain
$$\begin{aligned} \left\| \frac{\hbox {d}\mu _{2}}{\hbox {d}\mu _{1}}\right\| _{L^{p}\left( \mu _{1}\right) }^{p}&\le \left( \frac{e^{\left( \left( {\tilde{d}}+{\tilde{\mu }}\right) +2\sum _{i}L_{i}R^{1+\alpha _{i}}\right) }}{\frac{\pi ^{\frac{d}{2}}}{\left( \frac{a}{16p}+\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}+\frac{L_{N}}{2}\right) ^{\frac{d}{2}}}e^{-\Vert U(0)\Vert -2\sum _{i}L_{i}R^{1+\alpha _{i}}-\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}}}\right) ^{p+\frac{1}{2}}\nonumber \\&=\Lambda ^{p}. \end{aligned}$$
(7.112)
Taking the logarithm of \(\Lambda \), we get
$$\begin{aligned} \log \Lambda&=\frac{\left( p+\frac{1}{2}\right) }{p}\log \left( \frac{e^{\left( \left( {\tilde{d}}+{\tilde{\mu }}\right) +2\sum _{i}L_{i}R^{1+\alpha _{i}}\right) }}{\frac{\pi ^{\frac{d}{2}}}{\left( \frac{a}{16p}+\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}+\frac{L_{N}}{2}\right) ^{\frac{d}{2}}}e^{-\Vert U(0)\Vert -2\sum _{i}L_{i}R^{1+\alpha _{i}}-\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}}}\right) \nonumber \\&=\frac{\left( p+\frac{1}{2}\right) }{p}\left( {\tilde{d}}+\frac{d}{2}\log \left( \frac{a}{8p}+\frac{a}{16p}+\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}+\frac{L_{N}}{2}\right) -\frac{d}{2}\log \left( \pi \right) \right) \nonumber \\&\quad +\frac{\left( p+\frac{1}{2}\right) }{p}\left( {\tilde{\mu }}+2\sum _{i}L_{i}R^{1+\alpha _{i}}+\Vert U(0)\Vert +\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}\right) \nonumber \\&={\tilde{O}}\left( d\right) . \end{aligned}$$
(7.113)
Since \(\mu _{2}\) is log-concave, by Lemma 9 it satisfies a log-Sobolev inequality, for some universal constant C (not depending on d), with constant
$$\begin{aligned} C({\displaystyle \Lambda ,p)}&=\frac{1}{C}\frac{a}{8p}\frac{p-1}{p}\frac{1}{1+\log \Lambda }\nonumber \\&=\frac{1}{C}\frac{a}{8p}\frac{p-1}{p}\frac{1}{1+{\tilde{O}}\left( d\right) }\nonumber \\&=\frac{1}{{\tilde{O}}\left( d\right) }. \end{aligned}$$
(7.114)
From this, by the Holley–Stroock perturbation theorem, we obtain that \(\pi \propto e^{-U(x)}\) is log-Sobolev on \({\mathbb {R}}^{d}\) with constant \(\frac{1}{{\tilde{O}}\left( d\right) }e^{-2\sum _{i}L_{i}R^{1+\alpha _{i}}}\). Now, applying Theorem 3.2, we derive the desired result.
\(\square \)
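To illustrate how the step-size and iteration-count conditions of Theorem 7.21 combine in practice, the following helper packages them as a small function. All constants passed in below are hypothetical placeholders: \(\gamma _{3}\), \(D_{3}\), L, and \(\alpha \) are problem-dependent quantities defined in the theorem, not values from the paper.

```python
import math

def lmc_schedule(gamma3, D3, L, alpha, H0, eps):
    """Sufficient step size and iteration count for H(p_k|pi) < eps,
    per the conditions of Theorem 7.21 (all constants problem-dependent)."""
    eta = min(
        1.0,
        1.0 / (4 * gamma3),
        (gamma3 / (16 * L ** (1 + alpha))) ** (1 / alpha),
        (3 * eps * gamma3 / (16 * D3)) ** (1 / alpha),
    )
    # k >= (1 / (gamma3 * eta)) * log(2 * H(p0|pi) / eps)
    k = math.ceil(math.log(2 * H0 / eps) / (gamma3 * eta))
    return eta, k

# Placeholder constants, for illustration only.
eta, k = lmc_schedule(gamma3=0.01, D3=5.0, L=2.0, alpha=1.0, H0=10.0, eps=0.1)
```

Since \(\gamma _{3}=1/O(d)\) and the last term of the step-size bound scales like \(\epsilon ^{1/\alpha }\), halving \(\epsilon \) shrinks \(\eta \) and increases the required k, as the assertions below confirm.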
1.6 Appendix F: Proofs of Additional Lemmas
Lemma 7.22
For any \(0\le \varpi \le k\in {\mathbb {N}}^{+}\), we have
$$\begin{aligned} \Vert x+y\Vert ^{\varpi }\le 2^{k-1}(\Vert x\Vert ^{\varpi }+\Vert y\Vert ^{\varpi }). \end{aligned}$$
(7.115)
Proof
Consider the functions \(f_{k}(u)=2^{k-1}(u^{\varpi }+1)-(1+u)^{\varpi }\). We prove \(f_{k}(u)\ge 0\) for every \(u\ge 0\) by induction on k. For \(k=1\), since \(0\le \varpi \le 1\), we have \(f_{1}^{\prime }(u)=\varpi u^{\varpi -1}-\varpi (1+u)^{\varpi -1}\ge 0\). This implies \(f_{1}(u)\) increases on \(\left[ 0,\infty \right) \), and since \(f_{1}(0)=0\), it follows that \(f_{1}(u)\ge 0\). Therefore, the statement is true for \(k=1\).
Assume that it is true for \(k=n\); we will show that it is also true for \(k=n+1\). Differentiating \(f_{n+1}(u)\), we get
$$\begin{aligned} f_{n+1}^{\prime }(u)&=2^{n}\varpi u^{\varpi -1}-\varpi (1+u)^{\varpi -1}\nonumber \\&=\varpi \left( 2^{n}u^{\varpi -1}-(1+u)^{\varpi -1}\right) \nonumber \\&\ge 0, \end{aligned}$$
(7.116)
for \(1\le \varpi \le n+1\) by the induction assumption, while for \(0\le \varpi \le 1\), \(2^{n}u^{\varpi -1}-(1+u)^{\varpi -1}\ge u^{\varpi -1}-(1+u)^{\varpi -1}\ge 0\). Hence, \(f_{n+1}\) increases on \(\left[ 0,\infty \right) \), and since \(f_{n+1}(0)=2^{n}-1\ge 0\), this implies \(f_{n+1}\ge 0\).
Applying to our case for \(0\le \varpi \le k\),
$$\begin{aligned} 2^{k-1}(\Vert x\Vert ^{\varpi }+\Vert y\Vert ^{\varpi })&=\Vert x\Vert ^{\varpi }2^{k-1}\left( 1+\left( \frac{\left\| y\right\| }{\left\| x\right\| }\right) ^{\varpi }\right) \nonumber \\&\ge \Vert x\Vert ^{\varpi }\left( 1+\left( \frac{\left\| y\right\| }{\left\| x\right\| }\right) \right) ^{\varpi }\nonumber \\&=\left( \left\| x\right\| +\left\| y\right\| \right) ^{\varpi }\nonumber \\&\ge \left( \left\| x+y\right\| \right) ^{\varpi }, \end{aligned}$$
(7.117)
which concludes our proof. \(\square \)
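The inequality of Lemma 7.22 can also be checked numerically on random Euclidean vectors; the sampling ranges below are arbitrary illustrative choices, and a small tolerance absorbs floating-point error.

```python
import math
import random

def lemma_7_22_holds(x, y, varpi, k):
    """Check ||x+y||^varpi <= 2^(k-1) (||x||^varpi + ||y||^varpi)
    for Euclidean vectors x, y and 0 <= varpi <= k."""
    nx = math.sqrt(sum(c * c for c in x))
    ny = math.sqrt(sum(c * c for c in y))
    ns = math.sqrt(sum((u + v) ** 2 for u, v in zip(x, y)))
    return ns ** varpi <= 2 ** (k - 1) * (nx ** varpi + ny ** varpi) + 1e-12

random.seed(0)
results = []
for _ in range(1000):
    d = random.randint(1, 5)
    x = [random.uniform(-3, 3) for _ in range(d)]
    y = [random.uniform(-3, 3) for _ in range(d)]
    k = random.randint(1, 4)
    varpi = random.uniform(0.0, k)
    results.append(lemma_7_22_holds(x, y, varpi, k))
```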
Lemma 7.23
For \(\theta >0\), \(f\left( r\right) =m\left( r\right) r^{2}=\mu \left( 1+r^{2}\right) ^{-\frac{\theta }{2}}r{}^{2}\ge \frac{\mu }{2}r{}^{2-\theta }-\frac{\mu }{2}2{}^{\frac{2-\theta }{\theta }},\) and for \(\theta =0,\) \(f\left( r\right) =\mu r{}^{2}.\)
Proof
For \(\theta =0,\) it is straightforward. For \(\theta >0,\) from Lemma 2 above, for \(r\ge 2^{\frac{1}{\theta }}\),
$$\begin{aligned} f\left( r\right)&=\mu \left( 1+r^{2}\right) ^{-\frac{\theta }{2}}r{}^{2}\nonumber \\&\ge \mu \left( 1+r^{\theta }\right) ^{-1}r{}^{2}\nonumber \\&=\mu \left( r^{2\theta }-1\right) ^{-1}r{}^{2}\left( r^{\theta }-1\right) \nonumber \\&\ge \mu r{}^{2-2\theta }\left( r^{\theta }-1\right) \nonumber \\&\ge \frac{\mu }{2}r{}^{2-\theta }. \end{aligned}$$
(7.118)
For \(r<2^{\frac{1}{\theta }}\), \(f\left( r\right) \ge 0\ge \frac{\mu }{2}r{}^{2-\theta }-\frac{\mu }{2}2{}^{\frac{2-\theta }{\theta }}\), which concludes the statement. \(\square \)
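A quick numerical check of Lemma 7.23 is below. The sampling ranges are illustrative; \(\theta \) is restricted to \((0,2]\), the degenerate-convexity range relevant here (for \(\theta >2\) the right-hand side blows up as \(r\rightarrow 0\), so the bound is only meaningful in this regime).

```python
import random

# Check f(r) = mu (1+r^2)^(-theta/2) r^2 >= (mu/2) r^(2-theta) - (mu/2) 2^((2-theta)/theta)
# at randomly sampled (r, mu, theta) with theta in (0, 2].
def f(r, mu, theta):
    return mu * (1 + r * r) ** (-theta / 2) * r * r

def lower_bound(r, mu, theta):
    return mu / 2 * r ** (2 - theta) - mu / 2 * 2 ** ((2 - theta) / theta)

random.seed(1)
ok = True
for _ in range(2000):
    r = random.uniform(0.0, 50.0)
    mu = random.uniform(0.1, 5.0)
    theta = random.uniform(0.05, 2.0)
    ok = ok and f(r, mu, theta) >= lower_bound(r, mu, theta) - 1e-9
```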
Lemma 7.24
\(f\left( \theta \right) =\vert \left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}-\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}\vert \) is an increasing function.
Proof
If \(\left\| x\right\| \ge \left\| x-y\right\| ,\) we have \(f\left( \theta \right) =\left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}-\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}\). Differentiating f with respect to \(\theta \) gives
$$\begin{aligned} f^{\prime }\left( \theta \right)&=\frac{1}{2}\ln \left( 1+\left\| x\right\| ^{2}\right) \left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}\nonumber \\&-\frac{1}{2}\ln \left( 1+\left\| x-y\right\| ^{2}\right) \left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}\nonumber \\&\ge 0. \end{aligned}$$
(7.119)
Similarly, if \(\left\| x\right\| \le \left\| x-y\right\| \) we also obtain \(f^{\prime }\left( \theta \right) \ge 0,\) which implies that f increases as desired. \(\square \)
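The monotonicity in Lemma 7.24 can likewise be spot-checked numerically. Since f depends on x and y only through \(\left\| x\right\| ^{2}\) and \(\left\| x-y\right\| ^{2}\), the sketch below samples those squared norms directly; the ranges and the \(\theta \) grid are illustrative choices.

```python
import random

# theta -> |(1+||x||^2)^(theta/2) - (1+||x-y||^2)^(theta/2)| should be non-decreasing.
def f(theta, sx2, sxy2):
    # sx2 plays the role of ||x||^2, sxy2 of ||x-y||^2.
    return abs((1 + sx2) ** (theta / 2) - (1 + sxy2) ** (theta / 2))

random.seed(2)
monotone = True
for _ in range(500):
    sx2 = random.uniform(0.0, 20.0)
    sxy2 = random.uniform(0.0, 20.0)
    vals = [f(0.1 * i, sx2, sxy2) for i in range(1, 31)]  # theta on (0, 3]
    monotone = monotone and all(u <= v + 1e-12 for u, v in zip(vals, vals[1:]))
```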
Lemma 7.25
If \(\xi \sim N_{p}\left( 0,I_{d}\right) \), then \(d^{\left\lfloor \frac{n}{p}\right\rfloor }\le E(\left\| \xi \right\| _{p}^{n})\le \left[ d+\frac{n}{2}\right] ^{\frac{n}{p}}\), where \(\left\lfloor x\right\rfloor \) denotes the largest integer less than or equal to x. If \(n=kp\), then \(E(\left\| \xi \right\| _{p}^{n})=d\left( d+p\right) \cdots \left( d+(k-1)p\right) \).
Proof
From [36], we have \(E(\left\| \xi \right\| _{p}^{n})=p^{\frac{n}{p}}\frac{\Gamma \left( \frac{d+n}{p}\right) }{\Gamma \left( \frac{d}{p}\right) }.\)
Since \(\Gamma \) is an increasing function,
$$\begin{aligned} p^{\frac{n}{p}}\frac{\Gamma \left( \frac{d+n}{p}\right) }{\Gamma \left( \frac{d}{p}\right) }\ge p^{\frac{n}{p}}\frac{\Gamma \left( \frac{d}{p}+\left\lfloor \frac{n}{p}\right\rfloor \right) }{\Gamma \left( \frac{d}{p}\right) }=p^{\frac{n}{p}}\frac{d}{p}\ldots \left( \frac{d}{p}+\left\lfloor \frac{n}{p}\right\rfloor -1\right) \ge d^{\left\lfloor \frac{n}{p}\right\rfloor }. \end{aligned}$$
If \(n=kp\) for \(k\in {\mathbb {N}}\), then \(E(\left\| \xi \right\| _{p}^{n})=p^{\frac{n}{p}}\frac{d}{p}\ldots \left( \frac{d}{p}+k-1\right) \). If \(n\ne kp\), let \(\left\lfloor \frac{n}{p}\right\rfloor =k\). Since \(\Gamma \) is log-convex, by Jensen's inequality, for any \(p\ge 1\) we obtain
$$\begin{aligned}&\left( 1-\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}\right) \log \Gamma \left( \frac{d}{p}\right) +\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}\log \Gamma \left( \frac{d}{p}+\left\lfloor \frac{n}{p}\right\rfloor +1\right) \\&\quad \ge \log \Gamma \left( \left( 1-\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}\right) \frac{d}{p}+\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}\left( \frac{d}{p}+\left\lfloor \frac{n}{p}\right\rfloor +1\right) \right) \\&\quad \ge \log \Gamma \left( \frac{d+n}{p}\right) >0. \end{aligned}$$
Raising e to the power of both sides, we get
$$\begin{aligned} \Gamma \left( \frac{d}{p}\right) ^{\left( 1-\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}\right) }\Gamma \left( \frac{d}{p}+\left\lfloor \frac{n}{p}\right\rfloor +1\right) ^{\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}}\ge \Gamma \left( \frac{d+n}{p}\right) , \end{aligned}$$
which implies that
$$\begin{aligned} \begin{array}{cc} \left[ \frac{\Gamma \left( \frac{d}{p}+\left\lfloor \frac{n}{p}\right\rfloor +1\right) }{\Gamma \left( \frac{d}{p}\right) }\right] ^{\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}} &{} \ge \frac{\Gamma \left( \frac{d+n}{p}\right) }{\Gamma \left( \frac{d}{p}\right) }\\ \left[ \frac{d}{p}\ldots \left( \frac{d}{p}+\left\lfloor \frac{n}{p}\right\rfloor \right) \right] ^{\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}} &{} \ge \frac{\Gamma \left( \frac{d+n}{p}\right) }{\Gamma \left( \frac{d}{p}\right) }. \end{array} \end{aligned}$$
Combining with \(E(\left\| \xi \right\| _{p}^{n})=p^{\frac{n}{p}}\frac{\Gamma \left( \frac{d+n}{p}\right) }{\Gamma \left( \frac{d}{p}\right) }\) gives the conclusion. \(\square \)
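The bounds of Lemma 7.25 can be checked numerically via the moment formula from [36]. The grid below (\(d\ge 2\), \(p\le 3\), \(n\le 6\)) is an illustrative choice, not exhaustive; in particular it stays away from very small d, where the Gamma arguments fall in the region where \(\Gamma \) is not monotone. When \(p\mid n\), the closed form \(d(d+p)\cdots (d+(k-1)p)\) is checked as well.

```python
import math

# E ||xi||_p^n for xi ~ N_p(0, I_d), via E = p^(n/p) Gamma((d+n)/p) / Gamma(d/p).
def gen_gauss_moment(d, p, n):
    return p ** (n / p) * math.gamma((d + n) / p) / math.gamma(d / p)

checks = []
for d in range(2, 9):
    for p in (1, 2, 3):
        for n in range(1, 7):
            E = gen_gauss_moment(d, p, n)
            lower = d ** (n // p)             # d^floor(n/p)
            upper = (d + n / 2) ** (n / p)    # [d + n/2]^(n/p)
            checks.append(lower <= E * (1 + 1e-9) and E <= upper * (1 + 1e-9))
            if n % p == 0:
                # Closed form d(d+p)...(d+(k-1)p) when n = kp.
                k = n // p
                prod = 1.0
                for j in range(k):
                    prod *= d + j * p
                checks.append(abs(E - prod) <= 1e-8 * prod)
```

For \(p=2\) this reproduces the familiar chi-square moments, e.g. \(E\left\| \xi \right\| _{2}^{4}=d(d+2)\).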