
Unadjusted Langevin Algorithm for Non-convex Weakly Smooth Potentials

Communications in Mathematics and Statistics

Abstract

Discretization of continuous-time diffusion processes is a widely recognized method for sampling. However, the canonical Euler–Maruyama discretization of the Langevin diffusion process, referred to as the unadjusted Langevin algorithm (ULA), has been studied mostly in the context of smooth (gradient-Lipschitz) and strongly log-concave densities, a restriction that considerably hinders its deployment in many sciences, including statistics and machine learning. In this paper, we establish several theoretical contributions to the literature on such sampling methods for non-convex distributions. In particular, we introduce a new mixture weakly smooth condition, under which we prove that ULA converges when the target additionally satisfies a log-Sobolev inequality. We also show that ULA applied to a smoothed potential converges in \(L_{2}\)-Wasserstein distance. Moreover, using convexification of the non-convex domain (Ma et al. in Proc Natl Acad Sci 116(42):20881–20885, 2019) in combination with regularization, we establish convergence in Kullback–Leibler divergence, with the number of iterations needed to reach an \(\epsilon \)-neighborhood of the target distribution depending only polynomially on the dimension. Finally, we relax the conditions of Vempala and Wibisono (Advances in Neural Information Processing Systems, 2019) and prove convergence guarantees under isoperimetry and non-strong convexity at infinity.


References

  1. Arellano-Valle, R.B., Richter, W.-D.: On skewed continuous \(l_{n,p}\)-symmetric distributions. Chilean J. Stat. 3(2), 193–212 (2012)

  2. Bakry, D., Émery, M.: Diffusions hypercontractives. In: Séminaire de Probabilités XIX 1983/84, pp. 177–206. Springer, Berlin (1985)

  3. Bernton, E.: Langevin Monte Carlo and JKO splitting. arXiv preprint arXiv:1802.08671 (2018)

  4. Bobkov, S.G.: Isoperimetric and analytic inequalities for log-concave probability measures. Ann. Probab. 27(4), 1903–1921 (1999)

  5. Cattiaux, P., Guillin, A., Wu, L.-M.: A note on Talagrand’s transportation inequality and logarithmic Sobolev inequality. Probab. Theory Relat. Fields 148(1–2), 285–304 (2010)

  6. Chatterji, N.S., Diakonikolas, J., Jordan, M.I., Bartlett, P.L.: Langevin Monte Carlo without smoothness. arXiv preprint arXiv:1905.13285 (2019)

  7. Chen, Z., Vempala, S.S.: Optimal convergence rate of Hamiltonian Monte Carlo for strongly logconcave distributions. arXiv preprint arXiv:1905.02313 (2019)

  8. Cheng, X., Bartlett, P.L.: Convergence of Langevin MCMC in KL-divergence. PMLR 83(83), 186–211 (2018)

  9. Dalalyan, A.S.: Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. arXiv preprint arXiv:1704.04752 (2017)

  10. Dalalyan, A.S., Riou-Durand, L., Karagulyan, A.: Bounding the error of discretized Langevin algorithms for non-strongly log-concave targets. arXiv preprint arXiv:1906.08530 (2019)

  11. Dalalyan, A.S., Karagulyan, A.: User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stoch. Process. Appl. 129(12), 5278–5311 (2019)

  12. Durmus, A., Moulines, E., Saksman, E.: On the convergence of Hamiltonian Monte Carlo. arXiv preprint arXiv:1705.00166 (2017)

  13. Durmus, A., Moulines, E., Pereyra, M.: Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau. SIAM J. Imag. Sci. 11(1), 473–506 (2018)

  14. Durmus, A., Majewski, S., Miasojedow, B.: Analysis of Langevin Monte Carlo via convex optimization. J. Mach. Learn. Res. 20, 73–1 (2019)

  15. Dwivedi, R., Chen, Y., Wainwright, M.J., Yu, B.: Log-concave sampling: Metropolis–Hastings algorithms are fast! In: Conference on Learning Theory, pp. 793–797 (2018)

  16. Dwivedi, R., Chen, Y., Wainwright, M.J., Yu, B.: Log-concave sampling: Metropolis-Hastings algorithms are fast. J. Mach. Learn. Res. 20(183), 1–42 (2019)

  17. Eberle, A., Guillin, A., Zimmer, R.: Couplings and quantitative contraction rates for Langevin dynamics. Ann. Probab. 47(4), 1982–2010 (2019)

  18. Erdogdu, M.A., Hosseinzadeh, R.: On the convergence of Langevin Monte Carlo: the interplay between tail growth and smoothness. arXiv preprint arXiv:2005.13097 (2020)

  19. Gorham, J., Mackey, L.: Measuring sample quality with kernels. In: International Conference on Machine Learning, pp. 1292–1301 (2017)

  20. Gross, L.: Logarithmic Sobolev inequalities. Am. J. Math. 97(4), 1061–1083 (1975)

  21. Holley, R., Stroock, D.W.: Logarithmic Sobolev inequalities and stochastic Ising models (1986)

  22. Jordan, R., Kinderlehrer, D., Otto, F.: The variational formulation of the Fokker-Planck equation. SIAM J. Math. Anal. 29(1), 1–17 (1998)

  23. Kečkić, J.D., Vasić, P.M.: Some inequalities for the gamma function. Publ. de l’Institut Math. 11(31), 107–114 (1971)

  24. Kloeden, P.E., Platen, E.: Stochastic differential equations. In: Numerical Solution of Stochastic Differential Equations, pp. 103–160. Springer (1992)

  25. Ledoux, M.: Logarithmic Sobolev inequalities for unbounded spin systems revisited. In: Séminaire de Probabilités XXXV, pp. 167–194. Springer, Berlin (2001)

  26. Lee, Y.T., Vempala, S.S.: Convergence rate of Riemannian Hamiltonian Monte Carlo and faster polytope volume computation. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1115–1121 (2018)

  27. Li, X., Wu, Y., Mackey, L., Erdogdu, M.A.: Stochastic Runge–Kutta accelerates Langevin Monte Carlo and beyond. In: Advances in Neural Information Processing Systems, pp. 7748–7760 (2019)

  28. Lovász, L., Vempala, S.: Hit-and-run from a corner. In: Proceedings of the Thirty-sixth Annual ACM Symposium on Theory of Computing, pp. 310–314 (2004)

  29. Lovász, L., Vempala, S.: Fast algorithms for logconcave functions: sampling, rounding, integration and optimization. In: 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pp. 57–68 (2006). IEEE

  30. Lovász, L., Vempala, S.: The geometry of logconcave functions and sampling algorithms. Random Struct. Algorithms 30(3), 307–358 (2007)

  31. Ma, Y.-A., Chen, Y., Jin, C., Flammarion, N., Jordan, M.I.: Sampling can be faster than optimization. Proc. Natl. Acad. Sci. 116(42), 20881–20885 (2019)

  32. Mangoubi, O., Smith, A.: Rapid mixing of Hamiltonian Monte Carlo on strongly log-concave distributions. arXiv preprint arXiv:1708.07114 (2017)

  33. Mangoubi, O., Vishnoi, N.: Dimensionally tight bounds for second-order Hamiltonian Monte Carlo. In: Advances in Neural Information Processing Systems, pp. 6027–6037 (2018)

  34. Nesterov, Y., Spokoiny, V.: Random gradient-free minimization of convex functions. Found. Comput. Math. 17(2), 527–566 (2017)

  35. Raginsky, M., Rakhlin, A., Telgarsky, M.: Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis. arXiv preprint arXiv:1702.03849 (2017)

  36. Richter, W.-D.: Generalized spherical and simplicial coordinates. J. Math. Anal. Appl. 336(2), 1187–1202 (2007)

  37. Robert, C., Casella, G.: Monte Carlo Statistical Methods. Springer, Berlin (2013)

  38. Vempala, S., Wibisono, A.: Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices. In: Advances in Neural Information Processing Systems, pp. 8094–8106 (2019)

  39. Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, Berlin (2008)

  40. Villani, C.: Topics in Optimal Transportation vol. 58. American Mathematical Society (2021)

  41. Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 681–688 (2011)

  42. Xu, P., Chen, J., Zou, D., Gu, Q.: Global convergence of Langevin dynamics based algorithms for nonconvex optimization. In: Advances in Neural Information Processing Systems, pp. 3122–3133 (2018)

  43. Yan, M.: Extension of convex function. arXiv preprint arXiv:1207.0944 (2012)

Acknowledgements

This research was funded in part by the University of Mississippi summer grant.

Author information

Corresponding author

Correspondence to Dao Nguyen.

Ethics declarations

Conflict of interest

We hereby declare that we have no conflict of interest.

Appendix

1.1 Appendix A: Measure Definitions and Isoperimetry

Let \(p,\pi \) be probability distributions on \({\mathbb {R}}^{d}\) with full support and smooth densities. We define the Kullback–Leibler (KL) divergence of p with respect to \(\pi \) as

$$\begin{aligned} H(p\vert \pi ){\mathop {=}\limits ^{\triangle }}\int _{{\mathbb {R}}^{d}}p(x)\log \frac{p(x)}{\pi (x)}\,\hbox {d}x. \end{aligned}$$
(7.1)

Likewise, we denote the entropy of p by

$$\begin{aligned} {\displaystyle H(p){\mathop {=}\limits ^{\triangle }}-\int p(x)\log p(x)\hbox {d}x} \end{aligned}$$
(7.2)

and, letting \({\mathcal {B}}({\mathbb {R}}^{d})\) denote the Borel \(\sigma \)-field of \({\mathbb {R}}^{d}\), we define the relative Fisher information and the total variation distance, respectively, as

$$\begin{aligned}{} & {} {\displaystyle I(p\vert \pi ){\mathop {=}\limits ^{\triangle }}\int _{{\mathbb {R}}^{d}}p(x)\Vert \nabla \log \frac{p(x)}{\pi (x)}\Vert ^{2}\hbox {d}x}, \end{aligned}$$
(7.3)
$$\begin{aligned}{} & {} {\displaystyle TV(p,{\displaystyle \ \pi ){\mathop {=}\limits ^{\triangle }}\sup _{A\in {\mathcal {B}}({\mathbb {R}}^{d})}\vert \int _{A}p(x)\hbox {d}x-\int _{A}\pi (x)\hbox {d}x\vert }.} \end{aligned}$$
(7.4)

Furthermore, we define a transference plan \(\zeta \), a distribution on \(({\mathbb {R}}^{d}\times {\mathbb {R}}^{d},\ {\mathcal {B}}({\mathbb {R}}^{d}\times {\mathbb {R}}^{d}))\) (where \({\mathcal {B}}({\mathbb {R}}^{d}\times {\mathbb {R}}^{d})\) is the Borel \(\sigma \)-field of \({\mathbb {R}}^{d}\times {\mathbb {R}}^{d}\)), such that \(\zeta (A\times {\mathbb {R}}^{d})=p(A)\) and \(\zeta ({\mathbb {R}}^{d}\times A)=\pi (A)\) for any \(A\in {\mathcal {B}}({\mathbb {R}}^{d})\). Let \(\Gamma (p,\ \pi )\) designate the set of all such transference plans. Then for \(\beta >0\), the \(L_{\beta }\)-Wasserstein distance is defined as:

$$\begin{aligned} W_{\beta }(p,\pi ){\mathop {=}\limits ^{\triangle }}\left( \inf _{\zeta \in \Gamma (p,\pi )}\int _{x,y\in {\mathbb {R}}^{d}}\Vert x-y\Vert ^{\beta }\textrm{d}\zeta (x,\ y)\right) ^{1/\beta }. \end{aligned}$$
(7.5)

Note that although the KL divergence is an asymmetric measure of distance between probability distributions, it is the preferred measure here since it upper-bounds the total variation distance via Pinsker’s inequality. In addition, the KL divergence also controls the quadratic Wasserstein distance \(W_{2}\) under the log-Sobolev, Talagrand, and Poincaré inequalities defined below.
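To make these definitions concrete, the following minimal Python sketch (added here for illustration and not part of the paper; the two univariate Gaussians and their parameters are assumptions chosen for the example) estimates the KL divergence (7.1) and the total variation distance (7.4) on a grid and compares them with the Gaussian closed forms, together with the exact \(L_{2}\)-Wasserstein distance (7.5).

```python
import numpy as np

# Two univariate Gaussian densities p = N(m1, s1^2) and q = N(m2, s2^2) (illustrative choices).
m1, s1, m2, s2 = 0.0, 1.0, 1.0, 1.5

def gauss(x, m, s):
    return np.exp(-(x - m) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

x = np.linspace(-15, 15, 200001)
dx = x[1] - x[0]
p, q = gauss(x, m1, s1), gauss(x, m2, s2)

# Grid estimates of (7.1) and (7.4).
kl_grid = np.sum(p * np.log(p / q)) * dx
tv_grid = 0.5 * np.sum(np.abs(p - q)) * dx

# Closed forms for Gaussians, used as a sanity check.
kl_exact = np.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5
w2_exact = np.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)  # L2-Wasserstein distance, (7.5) with beta = 2

print(f"KL  grid {kl_grid:.6f}  exact {kl_exact:.6f}")
print(f"TV  grid {tv_grid:.6f}  Pinsker bound sqrt(KL/2) = {np.sqrt(kl_exact / 2):.6f}")
print(f"W2  exact {w2_exact:.6f}")
```

The printed total variation indeed falls below the Pinsker bound \(\sqrt{H(p\vert q)/2}\), in line with the remark above.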

Definition 7.1

The probability distribution \(\pi \) satisfies a logarithmic Sobolev inequality with constant \(\gamma >0\) (in short: \(LSI(\gamma )\)) if for every probability distribution p absolutely continuous \(w.r.t.\ \pi \),

$$\begin{aligned} H({\displaystyle p\vert \pi )\le \frac{1}{2\gamma }I(p\vert \pi )}. \end{aligned}$$
(7.6)

Definition 7.2

The probability distribution \(\pi \) satisfies a Talagrand inequality with constant \(\gamma >0\) (in short: \(T(\gamma )\)) if for every probability distribution p absolutely continuous \(w.r.t.\ \pi \) with finite second moment,

$$\begin{aligned} W_{2}(p,\ \pi )\le \sqrt{\frac{2H(p\vert \pi )}{\gamma }}. \end{aligned}$$
(7.7)

Definition 7.3

The probability distribution p satisfies a Poincaré inequality with constant \(\gamma >0\) (in short: \(PI(\gamma )\)) if for every smooth function \(g:{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\),

$$\begin{aligned} Var_{p}(g)\le \frac{1}{\gamma }E_{p}[\Vert \nabla g\Vert ^{2}], \end{aligned}$$
(7.8)

where \(Var_{p}(g)=E_{p}[g^{2}]-E_{p}[g]^{2}\) is the variance of g under p.
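As a standard example of how these inequalities relate (this remark is added for orientation and is not part of the paper’s argument): if \(\pi =N(0,\frac{1}{\gamma }I_{d})\), then \(U(x)=\frac{\gamma }{2}\Vert x\Vert ^{2}\) up to an additive constant and \(\nabla ^{2}U=\gamma I_{d}\), so the Bakry–Émery criterion [2] gives \(LSI(\gamma )\). More generally, one has the well-known chain

$$\begin{aligned} LSI(\gamma )\ \Longrightarrow \ T(\gamma )\ \Longrightarrow \ PI(\gamma ), \end{aligned}$$

so a log-Sobolev inequality already yields the Talagrand and Poincaré inequalities with the same constant, which is why the KL bounds derived below also control \(W_{2}\).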

1.2 Appendix B: Proofs of p-Generalized Gaussian Smoothing

1.2.1 Proof of \(\alpha \)-Mixture Weakly Smooth Property

Lemma 7.4

If the potential \(U:{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\) is \(\alpha \)-mixture weakly smooth, then:

$$\begin{aligned} U(y)\le U(x)+\left\langle \nabla U(x),\ y-x\right\rangle +\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\Vert y-x\Vert ^{1+\alpha _{i}}. \end{aligned}$$

Proof

We have

$$\begin{aligned}&\vert U(x)-U(y)-\langle \nabla U(y),x-y\rangle \vert \\&\quad = \Big \vert \int _{0}^{1}\langle \nabla U(y+t(x-y)),x-y\rangle \text {d}t-\langle \nabla U(y),x-y\rangle \Big \vert \\&\quad = \Big \vert \int _{0}^{1}\langle \nabla U(y+t(x-y))-\nabla U(y),x-y\rangle \text {d}t\Big \vert \\&\quad \le \int _{0}^{1}\Vert \nabla U(y+t(x-y))-\nabla U(y)\Vert \Vert x-y\Vert \text {d}t\\&\quad \le \int _{0}^{1}\sum _{i}L_{i}t^{\alpha _{i}}\Vert x-y\Vert ^{\alpha _{i}}\Vert x-y\Vert \text {d}t\\&\quad = \sum _{i}\frac{L_{i}}{1+\alpha _{i}}\Vert x-y\Vert ^{1+\alpha _{i}}, \end{aligned}$$

where the first equality comes from the fundamental theorem of calculus (Taylor expansion), the first inequality follows from the Cauchy–Schwarz inequality, and the second inequality is due to Assumption 2.1. This gives us the desired result. \(\square \)
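As a quick numerical illustration of Lemma 7.4 (this sketch is not part of the original proof; the toy potential, dimension, sample size, and the deliberately generous, non-sharp Hölder constants are assumptions made for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-term "mixture weakly smooth" potential:
#   U(x) = ||x||^{1.5}/1.5 + 0.5*||x||^2,
# whose gradient is the sum of a 0.5-Hölder term and a 1-Lipschitz term.
alphas = np.array([0.5, 1.0])
Ls = np.array([4.0, 1.0])  # generous (non-sharp) Hölder/Lipschitz constants for the two terms

def U(x):
    r = np.linalg.norm(x)
    return r ** 1.5 / 1.5 + 0.5 * r ** 2

def gradU(x):
    r = np.linalg.norm(x)
    return (r ** (-0.5) + 1.0) * x if r > 0 else np.zeros_like(x)

d, n = 5, 50_000
worst = -np.inf
for _ in range(n):
    x, y = rng.normal(size=d) * 3, rng.normal(size=d) * 3
    r = np.linalg.norm(y - x)
    # Right-hand side of Lemma 7.4: U(x) + <grad U(x), y - x> + sum_i L_i/(1+alpha_i) ||y-x||^{1+alpha_i}
    bound = U(x) + gradU(x) @ (y - x) + np.sum(Ls / (1 + alphas) * r ** (1 + alphas))
    worst = max(worst, U(y) - bound)

print(f"max of U(y) - bound over {n} random pairs (Lemma 7.4 predicts <= 0): {worst:.3e}")
```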

1.2.2 Proof of p-Generalized Gaussian Smoothing Properties

Lemma 7.5

If the potential \(U:{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\) is \(\alpha \)-mixture weakly smooth, then:

  1. (i)

    \(\forall x\in {\mathbb {R}}^{d}\): \(\Vert U_{\mu }(x)-U(x)\Vert {\displaystyle \le \sum _{i}L_{i}\mu ^{1+\alpha _{i}}d^{\frac{1+\alpha _{i}}{p}},}\)

  2. (ii)

    \(\forall x\in {\mathbb {R}}^{d}\): \({\displaystyle \left\| \nabla U_{\mu }(x)-\nabla U(x)\right\| \le \sum _{i}L_{i}\mu ^{\alpha _{i}}d^{\frac{3}{p}}},\)

  3. (iii)

    \(\forall x,\ y\in {\mathbb {R}}^{d}\): \({\displaystyle \left\| \nabla U_{\mu }(y)-\nabla U_{\mu }(x)\right\| \le \sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}}\left\| y-x\right\| .}\)

Proof

(i). Since \(U_{\mu }(x)=\mathrm {{\mathbb {E}}}_{\xi }[U(x+\mu \xi )]\), \(U(x)=\mathrm {{\mathbb {E}}}_{\xi }[U(x)]\) and \({\mathbb {E}}_{\xi }\mu \left\langle \nabla U(x),\ \xi \right\rangle =0\), we have

$$\begin{aligned} U_{\mu }(x)-U(x)={\mathbb {E}}_{\xi }\left[ U(x+\mu \xi )-U(x)-\mu \left\langle \nabla U(x),\ \xi \right\rangle \right] . \end{aligned}$$

By the definition of the density of the p-generalized Gaussian distribution [1], we also have:

$$\begin{aligned} U_{\mu }(x)-U(x)=\frac{1}{\kappa }\int _{{\mathbb {R}}^{d}}[U(x+\mu \xi )-U(x)-\mu \left\langle \nabla U(x),\ \xi \right\rangle ]e^{-\left\| \xi \right\| _{p}^{p}/p}\hbox {d}\xi . \end{aligned}$$

Applying Eq. 2.1 to the previous identity gives:

$$\begin{aligned} \vert U_{\mu }(x)-U(x)\vert&=\Vert \frac{1}{\kappa }\int _{{\mathbb {R}}^{d}}\left[ U(x+\mu \xi )-U(x)-\mu \left\langle \nabla U(x),\ \xi \right\rangle \right] e^{-\left\| \xi \right\| _{p}^{p}/p}\hbox {d}\xi \Vert \\&\le \sum _{i}\frac{L_{i}}{\kappa (1+\alpha _{i})}\mu ^{1+\alpha _{i}}\int _{{\mathbb {R}}^{d}}\left\| \xi \right\| ^{(1+\alpha _{i})}e^{-\left\| \xi \right\| _{p}^{p}/p}\hbox {d}\xi \\&=\sum _{i}\frac{L_{i}\mu ^{1+\alpha _{i}}}{(1+\alpha _{i})}E\left[ \left\| \xi \right\| ^{(1+\alpha _{i})}\right] . \end{aligned}$$

If \(p\le 2\), then \(\left\| \xi \right\| \le \left\| \xi \right\| _{p}\) and we get

$$\begin{aligned} \vert U_{\mu }(x)-U(x)\vert&\le \sum _{i}\frac{L_{i}\mu ^{1+\alpha _{i}}}{(1+\alpha _{i})}E\left[ \left\| \xi \right\| ^{(1+\alpha _{i})}\right] \\&{\mathop {\le }\limits ^{_{1}}}\sum _{i}\frac{L_{i}\mu ^{1+\alpha _{i}}}{(1+\alpha _{i})}{\mathbb {E}}\left[ \left\| \xi \right\| _{p}^{2}\right] ^{\frac{1+\alpha _{i}}{2}}\\&{\mathop {\le }\limits ^{_{2}}}\sum _{i}\frac{L_{i}\mu ^{1+\alpha _{i}}}{(1+\alpha _{i})}\left( \left( d+1\right) ^{\frac{2}{p}}\right) ^{\frac{1+\alpha _{i}}{2}}\\&\le \sum _{i}\frac{L_{i}\mu ^{1+\alpha _{i}}}{(1+\alpha _{i})}d^{\frac{1+\alpha _{i}}{p}}\\&\le \sum _{i}\frac{L_{i}\mu ^{1+\alpha _{i}}}{(1+\alpha _{i})}d^{\frac{2}{p}}, \end{aligned}$$

where step 1 follows from Jensen’s inequality and \(0\le \alpha \le 1\), step 2 is from Lemma 7.25, which states that if \(\xi \sim N_{p}\left( 0,I_{d}\right) \) then \(d^{\left\lfloor \frac{n}{p}\right\rfloor }\le E(\left\| \xi \right\| _{p}^{n})\le \left[ d+\frac{n}{2}\right] ^{\frac{n}{p}}\), where \(\left\lfloor x\right\rfloor \) denotes the largest integer less than or equal to x, and the last step is a simplification valid when d is large enough and \(\mu \) is small enough.

(ii). We adapt the technique of [34] to p-generalized Gaussian smoothing. Letting \(y=x+\mu \xi \), \(U_{\mu }(x)\) can be rewritten as

$$\begin{aligned} U_{\mu }(x)&=\mathrm {{\mathbb {E}}}_{\xi }[U(x+\mu \xi )]\\&=\frac{1}{\kappa \mu ^{d}}\int _{{\mathbb {R}}^{d}}U(y)e^{-\frac{1}{p\mu ^{p}}\left\| y-x\right\| _{p}^{p}}\hbox {d}y. \end{aligned}$$

Now taking the gradient with respect to x of \(U_{\mu }(x)\) gives

$$\begin{aligned} \nabla _{x}U_{\mu }(x)=\frac{1}{\kappa \mu ^{d}}\nabla _{x}\int _{{\mathbb {R}}^{d}}U(y)e^{-\frac{1}{p\mu ^{p}}\left\| y-x\right\| _{p}^{p}}\hbox {d}y. \end{aligned}$$

By Fubini theorem with some regularity (i.e., \({\mathbb {E}}\vert U(y)\vert <\infty \)), we can exchange the gradient and integral and get

$$\begin{aligned} \nabla _{x}U_{\mu }(x)&=\frac{1}{\kappa \mu ^{d}}\int _{{\mathbb {R}}^{d}}\nabla _{x}\left( U(y)e^{-\frac{1}{p\mu ^{p}}\left\| y-x\right\| _{p}^{p}}\right) \hbox {d}y\\&=\frac{1}{\kappa \mu ^{d}}\int _{{\mathbb {R}}^{d}}U(y)\nabla _{x}\left( e^{-\frac{1}{p\mu ^{p}}\left\| y-x\right\| _{p}^{p}}\right) \hbox {d}y\\&=\frac{1}{\kappa \mu ^{d}}\int _{{\mathbb {R}}^{d}}U(y)e^{-\frac{1}{p\mu ^{p}}\left\| y-x\right\| _{p}^{p}}\frac{-1}{\mu ^{p}}\left\| y-x\right\| _{p}^{p-1}\nabla _{x}(\left\| y-x\right\| _{p})\hbox {d}y\\&=\frac{1}{\kappa \mu ^{d}}\int _{{\mathbb {R}}^{d}}U(y)e^{-\frac{1}{p\mu ^{p}}\left\| y-x\right\| _{p}^{p}}\frac{1}{\mu ^{p}}(y-x)\circ \Vert y-x\Vert ^{p-2}\hbox {d}y, \end{aligned}$$

where \(\circ \) stands for the Hadamard product and \(\Vert \cdot \Vert \) denotes the component-wise absolute value of the vector \(y-x\). Therefore, changing variables back to \(\xi \), we deduce

$$\begin{aligned} \nabla _{x}U_{\mu }(x)&=\frac{1}{\kappa }\int _{{\mathbb {R}}^{d}}U(x+\mu \xi )e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\frac{1}{\mu }\xi \circ \Vert \xi \Vert ^{p-2}\hbox {d}\xi \\&={\mathbb {E}}_{\xi }\left[ \frac{U(x+\mu \xi )\xi \circ \Vert \xi \Vert ^{p-2}}{\mu }\right] . \end{aligned}$$

In addition, if \(\xi \sim N_{p}(0,I_{d})\), \({\mathbb {E}}\left( \xi \right) =\frac{1}{\kappa }\int \xi e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi =0\) and then \(\nabla _{\xi }{\mathbb {E}}\left( \xi \right) =0\). Since \(\xi e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\) is bounded, we can exchange the gradient and the integral and get

$$\begin{aligned} \nabla _{\xi }\frac{1}{\kappa }\int \xi e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi&=\frac{1}{\kappa }\int \nabla _{\xi }\left( \xi e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\right) \hbox {d}\xi \\ 0&=\frac{1}{\kappa }\int e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi +\frac{1}{\kappa }\int \xi \nabla _{\xi }\left( e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\right) \hbox {d}\xi \\ 0&=1-\frac{1}{\kappa }\int \xi \textrm{e}^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\left\| \xi \right\| _{p}^{p-1}\nabla _{\xi }\left( \left\| \xi \right\| _{p}\right) \hbox {d}\xi \\ 0&=1-\frac{1}{\kappa }\int \xi \cdot \xi \circ \Vert \xi \Vert ^{p-2}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi , \end{aligned}$$

which implies

$$\begin{aligned} \frac{1}{\kappa }\int \xi \cdot \xi \circ \Vert \xi \Vert ^{p-2}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi =1. \end{aligned}$$
(7.9)

On the other hand, we also have \(\frac{1}{\kappa }\int e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi =1\) so \(\nabla _{\xi }\int e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi =0.\) By exchanging the gradient and the integral, we also get

$$\begin{aligned} 0&=\nabla _{\xi }\int e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi \\&=\int \nabla _{\xi }e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi \\&=\int \nabla _{\xi }\left( e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\right) \hbox {d}\xi \\&=-\int e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\xi \circ \Vert \xi \Vert ^{p-2}\hbox {d}\xi \end{aligned}$$

which implies that

$$\begin{aligned} {\mathbb {E}}_{\xi }\left[ \xi \circ \Vert \xi \Vert ^{p-2}\right] =0. \end{aligned}$$
(7.10)

From 7.9 and 7.10, we obtain

$$\begin{aligned} \left\| \nabla U_{\mu }(x)-\nabla U(x)\right\|&=\left\| \frac{1}{\kappa }\int _{{\mathbb {R}}^{d}}\left[ \frac{U(x+\mu \xi )-U(x)}{\mu } -\left\langle \nabla U(x),\xi \right\rangle \right] \right. \\&\qquad \left. \xi \circ \Vert \xi \Vert ^{p-2}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi \right\| \\&{\mathop {\le }\limits ^{_{1}}}\frac{1}{\kappa \mu }\int _{{\mathbb {R}}^{d}}\Vert U(x+\mu \xi )-U(x)-\mu \left\langle \nabla U(x),\xi \right\rangle \Vert \\&\qquad e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\left\| \xi \circ \Vert \xi \Vert ^{p-2}\right\| \hbox {d}\xi \\&{\mathop {\le }\limits ^{_{2}}}\sum _{i}\frac{L_{i}\mu ^{\alpha _{i}}}{\kappa \left( 1+\alpha _{i}\right) }\int _{{\mathbb {R}}^{d}}\left\| \xi \right\| ^{\alpha _{i}+1}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\left\| \xi \circ \Vert \xi \Vert ^{p-2}\right\| \hbox {d}\xi \\&=\sum _{i}\frac{L_{i}\mu ^{\alpha _{i}}}{\kappa \left( 1+\alpha _{i}\right) }\int _{{\mathbb {R}}^{d}}\left\| \xi \right\| ^{\alpha _{i}+1}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\left\| \xi ^{p-1}\right\| \hbox {d}\xi , \end{aligned}$$

where step 1 follows from Jensen’s inequality, step 2 is due to 2.1, and the last step follows from the component-wise action of the norm. If \(p\le 2\), then by the generalized Hölder inequality, \(\left\| \xi ^{p-1}\right\| \) can be bounded as follows:

$$\begin{aligned} \left\| \xi ^{p-1}\right\|&\le \left\| \xi ^{p-1}\right\| _{p}\nonumber \\&=\left\| \xi ^{p-1}\cdot 1_{d}\right\| _{p}\nonumber \\&{\mathop {\le }\limits ^{}}\left\| \xi \right\| _{p}^{p-1}\left\| 1_{d}\right\| _{p}^{2-p}\nonumber \\&=\left\| \xi \right\| _{p}^{p-1}d^{\frac{2-p}{p}}. \end{aligned}$$
(7.11)

As a result, if \(1\le p\le 2\) we have

$$\begin{aligned} \left\| \nabla U_{\mu }(x)-\nabla U(x)\right\|&\le \sum _{i}\frac{L_{i}\mu ^{\alpha _{i}}}{\kappa \left( 1+\alpha _{i}\right) }\int _{{\mathbb {R}}^{d}}\left\| \xi \right\| ^{\alpha _{i}+1}\left\| \xi \right\| _{p}^{p-1}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi \\&{\mathop {\le }\limits ^{_{1}}}\sum _{i}\frac{L_{i}\mu ^{\alpha _{i}}}{\left( 1+\alpha _{i}\right) }d^{\frac{2-p}{p}}{\mathbb {E}}\left[ \left\| \xi \right\| _{p}^{p+\alpha _{i}}\right] \\&{\mathop {\le }\limits ^{_{2}}}\sum _{i}\frac{L_{i}\mu ^{\alpha _{i}}}{\left( 1+\alpha _{i}\right) }d^{\frac{2-p}{p}}{\mathbb {E}}\left[ \left\| \xi \right\| _{p}^{2p}\right] ^{\frac{p+\alpha }{2p}}\\&{\mathop {\le }\limits ^{_{3}}}\sum _{i}\frac{L_{i}\mu ^{\alpha _{i}}}{\left( 1+\alpha _{i}\right) }d^{\frac{2-p}{p}}\left( d+p\right) ^{\frac{p+\alpha }{p}}\\&{\mathop {\le }\limits ^{}}\sum _{i}L_{i}\mu ^{\alpha _{i}}d^{\frac{3}{p}}, \end{aligned}$$

where step 1 is from \(\left\| \xi \right\| \le \left\| \xi \right\| _{p}\), step 2 follows from Jensen’s inequality and \(\alpha \le p\), step 3 is due to Lemma 7.25, and the last two steps are simplifications valid for large enough d and small enough \(\mu \).

(iii) In this case, using Eqs. 2.1 and 7.10, we get:

$$\begin{aligned} \nabla U_{\mu }(x)=\frac{1}{\kappa }\int _{{\mathbb {R}}^{d}}\left[ \frac{U(x+\mu \xi )-U(x)}{\mu }\right] \xi \circ \Vert \xi \Vert ^{p-2}e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\hbox {d}\xi . \end{aligned}$$

Letting \(V(x)=U(x+\mu \xi )-U(x)\), from the above equation we obtain

$$\begin{aligned}&\left\| \nabla U_{\mu }(y)-\nabla U_{\mu }(x)\right\| \\&\quad =\left\| \frac{1}{\mu \kappa }\int _{{\mathbb {R}}^{d}}\left( V(y)-V(x)\right) e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\xi \circ \Vert \xi \Vert ^{p-2}\hbox {d}\xi \right\| \\&\quad =\left\| \frac{1}{\mu \kappa }\int _{{\mathbb {R}}^{d}}\int _{0}^{1}\left\langle \nabla V\left( ty+\left( 1-t\right) x\right) ,y-x\right\rangle \hbox {d}t\,e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\xi \circ \Vert \xi \Vert ^{p-2}\hbox {d}\xi \right\| \\&\quad =\left\| \frac{1}{\mu \kappa }\int _{{\mathbb {R}}^{d}}\int _{0}^{1}\left\langle \nabla U \left( ty+\left( 1-t\right) x+\mu \xi \right) -\nabla U\left( ty+\left( 1-t\right) x\right) ,y-x\right\rangle \right. \\&\qquad \left. \hbox {d}t\,e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\xi \circ \Vert \xi \Vert ^{p-2}\hbox {d}\xi \right\| \\&\quad \le \frac{1}{\mu \kappa }\int \int _{0}^{1}\left\| \nabla U\left( ty+\left( 1-t\right) x+\mu \xi \right) -\nabla U\left( ty+\left( 1-t\right) x\right) \right\| \left\| y-x\right\| \\&\qquad \hbox {d}t\,e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\left\| \xi \circ \Vert \xi \Vert ^{p-2}\right\| \hbox {d}\xi \\&\quad \le \sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}\kappa }\int _{{\mathbb {R}}^{d}}\left\| \xi \right\| ^{\alpha _{i}}\left\| y-x\right\| \,e^{-\frac{1}{p}\left\| \xi \right\| _{p}^{p}}\left\| \xi ^{p-1}\right\| \hbox {d}\xi . \end{aligned}$$

Since \(p\le 2\), we have

$$\begin{aligned}&\left\| \nabla U_{\mu }(y)-\nabla U_{\mu }(x)\right\| \\&\quad \le \sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2-p}{p}}{\mathbb {E}}\left( \left\| \xi \right\| _{p}^{p-1+\alpha }\right) \left\| y-x\right\| \\&\quad {\mathop {\le }\limits ^{_{1}}}\sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2-p}{p}}{\mathbb {E}}\left( \left\| \xi \right\| _{p}^{p}\right) ^{\frac{p-1+\alpha }{p}}\left\| y-x\right\| \\&\quad {\mathop {\le }\limits ^{_{2}}}\sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2-p}{p}}\left( d+\frac{p}{2}\right) ^{\frac{p-1+\alpha }{p}}\left\| y-x\right\| \\&\quad {\mathop {\le }\limits ^{}}\sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}}\left\| y-x\right\| , \end{aligned}$$

where step 1 follows from Jensen’s inequality and \(\alpha _{i}\le 1\), step 2 is due to Lemma 7.25, and the last two steps are simplifications valid for large enough d and small enough \(\mu \).

\(\square \)
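The smoothing \(U_{\mu }\) and its gradient admit a simple Monte Carlo approximation, since each coordinate of \(\xi \sim N_{p}(0,I_{d})\) can be drawn through a Gamma variable: if \(G\sim \textrm{Gamma}(1/p,1)\) and s is an independent random sign, then \(s(pG)^{1/p}\) has density proportional to \(e^{-\vert t\vert ^{p}/p}\). The Python sketch below is illustrative only (the potential, p, \(\mu \), dimension, and sample size are assumptions for the example); it estimates \(U_{\mu }(x)-U(x)\) and \(\nabla U_{\mu }(x)-\nabla U(x)\), the quantities bounded in parts (i) and (ii) of Lemma 7.5.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_np(p, d, n):
    """n draws from the p-generalized Gaussian N_p(0, I_d), density prop. to exp(-||xi||_p^p / p).
    Coordinate-wise: |xi_j|^p / p ~ Gamma(1/p, 1), combined with an independent random sign."""
    g = rng.gamma(shape=1.0 / p, scale=1.0, size=(n, d))
    return rng.choice([-1.0, 1.0], size=(n, d)) * (p * g) ** (1.0 / p)

# Illustrative weakly smooth potential U(x) = ||x||^{1.5}/1.5 (alpha = 0.5).
U = lambda x: np.linalg.norm(x, axis=-1) ** 1.5 / 1.5
gradU = lambda x: x * np.linalg.norm(x, axis=-1, keepdims=True) ** (-0.5)

p, d, mu, n = 1.5, 10, 0.1, 200_000
x = np.ones(d)
xi = sample_np(p, d, n)

U_mu_hat = U(x + mu * xi).mean()                 # Monte Carlo estimate of U_mu(x)
grad_U_mu_hat = gradU(x + mu * xi).mean(axis=0)  # Monte Carlo estimate of grad U_mu(x)

# Lemma 7.5(i) bounds |U_mu - U| by O(mu^{1+alpha}); part (ii) bounds the gradient gap by O(mu^alpha).
print(f"|U_mu(x) - U(x)| estimate: {abs(U_mu_hat - U(x)):.4f}")
print(f"||grad U_mu(x) - grad U(x)|| estimate: {np.linalg.norm(grad_U_mu_hat - gradU(x)):.4f}")
```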

1.3 Appendix C: Proofs Under LSI

1.3.1 Proof of Lemma 3.3

Lemma 7.6

Suppose \(\pi =e^{-U}\) is \(\alpha \)-mixture weakly smooth. Let \(p_{0}=N(0,\frac{1}{L}I)\). Then, \(H(p_{0}\vert \pi )\le U(0)-\frac{d}{2}\log \frac{2\Pi e}{L}+\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left( \frac{d}{L}\right) ^{\frac{1+\alpha _{i}}{2}}=O(d).\)

Proof

Since U is mixture weakly smooth, for all \(x\in {\mathbb {R}}^{d}\) we have

$$\begin{aligned} U(x)&\le U(0)+\langle \nabla U(0),x\rangle +\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\Vert x\Vert ^{1+\alpha _{i}}. \end{aligned}$$

Let \(X\sim \rho =N(0,\frac{1}{L}I)\). Since \({\mathbb {E}}_{\rho }[\langle \nabla U(0),X\rangle ]=0\), taking expectations gives

$$\begin{aligned} {\mathbb {E}}_{\rho }[U(X)]&\le U(0)+\sum _{i}\frac{L_{i}}{1+\alpha _{i}}{\mathbb {E}}_{\rho }\left( \Vert x\Vert ^{1+\alpha _{i}}\right) \\&\le U(0)+\sum _{i}\frac{L_{i}}{1+\alpha _{i}}{\mathbb {E}}_{\rho }\left( \Vert x\Vert ^{2}\right) ^{\frac{1+\alpha _{i}}{2}}\\&\le U(0)+\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left( \frac{d}{L}\right) ^{\frac{1+\alpha _{i}}{2}}. \end{aligned}$$

Recall the entropy of \(\rho \) is \(H(\rho )=-{\mathbb {E}}_{\rho }[\log \rho (X)]=\frac{d}{2}\log \frac{2\Pi e}{L}\). Therefore, the KL divergence is

$$\begin{aligned} H(\rho \vert \pi )&=\int \rho \left( \log \rho +U\right) \hbox {d}x\\&=-H(\rho )+{\mathbb {E}}_{\rho }[U]\\&\le U(0)-\frac{d}{2}\log \frac{2\Pi e}{L}+\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left( \frac{d}{L}\right) ^{\frac{1+\alpha _{i}}{2}}\\&=O(d). \end{aligned}$$

This is the desired result. \(\square \)

1.3.2 Proof of Lemma 3.3

Lemma 7.7

Assume \(\pi =e^{-U(x)}\) is \(\alpha \)-mixture weakly smooth. Then

$$\begin{aligned} {\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] \le 2\left( \sum _{i}L_{i}\right) ^{2}d^{\frac{3}{p}}. \end{aligned}$$

Proof

It is well known (cf. [38]) that for any test function \(\phi \left( x\right) :{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\), we have

$$\begin{aligned} \frac{\hbox {d}}{\hbox {d}t}{\mathbb {E}}_{p_{t}}\left[ \phi \left( x\right) \right] =\int \left( \left( \triangle \phi \left( x\right) \right) -\left\langle \nabla U\left( x\right) ,\nabla \phi \left( x\right) \right\rangle \right) p_{t}\left( x\right) \hbox {d}x. \end{aligned}$$

Since \(\pi \) is the stationary distribution of \(p_{t}(x)\), taking \(\phi \left( x\right) =U_{\mu }\left( x\right) \) gives

$$\begin{aligned} \frac{\hbox {d}}{\hbox {d}t}{\mathbb {E}}_{\pi }\left[ U_{\mu }\left( x\right) \right] =\int \left( \left( \triangle U_{\mu }\left( x\right) \right) -\left\langle \nabla U\left( x\right) ,\nabla U_{\mu }\left( x\right) \right\rangle \right) \pi \left( x\right) \hbox {d}x=0. \end{aligned}$$

So

$$\begin{aligned} {\mathbb {E}}_{\pi }\left\langle \nabla U\left( x\right) ,\nabla U_{\mu }\left( x\right) \right\rangle&={\mathbb {E}}_{\pi }\left( \triangle U_{\mu }\left( x\right) \right) {\mathop {\le }\limits ^{}}d\sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}}, \end{aligned}$$

where the last step comes from Lemma 2.10, which states that \(\nabla U_{\mu }\left( x\right) \) is \(\sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}}\)-Lipschitz, so that \(\nabla ^{2}U_{\mu }\left( x\right) \preceq \left( \sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}}\right) \,I\). In addition,

$$\begin{aligned} {\mathbb {E}}_{\pi }\left\langle \nabla U\left( x\right) ,\nabla U_{\mu }\left( x\right) \right\rangle&={\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] +{\mathbb {E}}_{\pi }\left\langle \nabla U\left( x\right) ,\nabla U_{\mu }\left( x\right) -\nabla U\left( x\right) \right\rangle \\&{\mathop {\ge }\limits ^{_{1}}}{\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] -{\mathbb {E}}_{\pi }\left\| \nabla U\left( x\right) \right\| \left\| \nabla U_{\mu }\left( x\right) -\nabla U\left( x\right) \right\| \\&{\mathop {\ge }\limits ^{}}{\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] -\sqrt{{\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] }\sum _{i}L_{i}\mu ^{\alpha _{i}}d^{\frac{3}{p}}, \end{aligned}$$

where step 1 follows from the Cauchy–Schwarz inequality, and the last step comes from the Cauchy–Schwarz inequality (in the form \({\mathbb {E}}_{\pi }\left\| \nabla U\right\| \le \sqrt{{\mathbb {E}}_{\pi }\left[ \left\| \nabla U\right\| ^{2}\right] }\)) and Lemma 2.10. Combining the two bounds gives the quadratic inequality

$$\begin{aligned} {\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] -\sqrt{{\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] }\sum _{i}L_{i}\mu ^{\alpha _{i}}d^{\frac{3}{p}}\le d\sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}} \end{aligned}$$

and since \(\sqrt{{\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] }\ge 0\), we obtain

$$\begin{aligned} \sqrt{{\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right] }&\le \frac{1}{2}\left[ \sqrt{\left( \sum _{i}L_{i}\mu ^{\alpha _{i}}\right) ^{2}d^{\frac{6}{p}}+4d\sum _{i}\frac{L_{i}}{\mu ^{1-\alpha _{i}}}d^{\frac{2}{p}}}+\sum _{i}L_{i}\mu ^{\alpha _{i}}d^{\frac{3}{p}}\right] . \end{aligned}$$

Since this holds for every \(\mu \), we may simply choose \(\mu =1\) to get

$$\begin{aligned} {\mathbb {E}}_{\pi }\left[ \left\| \nabla U(x)\right\| ^{2}\right]&\le \frac{1}{4}\left[ \sqrt{\left( \sum _{i}L_{i}\right) ^{2}d^{\frac{6}{p}}+4d\left( \sum _{i}L_{i}\right) d^{\frac{2}{p}}}+\sum _{i}L_{i}d^{\frac{3}{p}}\right] ^{2}\\&\le 2\left( \sum _{i}L_{i}\right) ^{2}d^{\frac{3}{p}}, \end{aligned}$$

for large enough d. \(\square \)

1.3.3 Proof of Lemma 3.1

Lemma 7.8

Suppose \(\pi \) is \(\gamma -\)log-Sobolev, \(\alpha \)-mixture weakly smooth with \(\max \left\{ L_{i}\right\} =L\ge 1\). If \(0<\eta \le \left( \frac{\gamma }{9N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\), then along each step of ULA (3.6),

$$\begin{aligned} H(p_{k+1}\vert \pi )\le e^{-\gamma \eta }H(p_{k}\vert \pi )+2\eta ^{\alpha +1}D_{3}, \end{aligned}$$
(7.12)

where \(D_{3}=\sum _{i}10N^{3}L^{6}+16NL^{4}+8N^{2}L^{4}d^{\frac{3}{p}}+4NL^{2}d\).

Proof

We adapt the proof of [38]. First, recall that the discretization of the LMC is

$$\begin{aligned} x_{k,t}{\mathop {=}\limits ^{}}x_{k}-t\nabla U(x_{k})+\sqrt{2t}\,z_{k}, \end{aligned}$$

where \(z_{k}\sim N(0,I)\) is independent of \(x_{k}\). Let \(x_{k}\sim p_{k}\) and \(x^{*}\sim \pi \) with an optimal coupling \((x_{k},x^{*})\) so that \({\mathbb {E}}[\Vert x_{k}-x^{*}\Vert ^{2}]=W_{2}(p_{k},\pi )^{2}\). Letting \(D_{1i}=8NL_{i}^{2+2\alpha _{i}}\left( \left( \sum _{j}L_{j}\right) ^{2}+1\right) +16L_{i}^{2+2\alpha _{i}}+8L_{i}^{2}\left( \sum _{i}L_{i}\right) ^{2}d^{\frac{3}{p}}+4L_{i}^{2}d^{\alpha _{i}}\), we deduce

$$\begin{aligned}&L_{i}^{2}E_{p_{k}}\left[ \left\| -t\nabla U(x_{k})+\sqrt{2t}z_{k}\right\| ^{2\alpha _{i}}\right] \nonumber \\&\quad {\mathop {\le }\limits ^{_{1}}}2L_{i}^{2}t^{2\alpha _{i}}{\mathbb {E}}_{p_{k}}\left[ \left\| \nabla U(x_{k})\right\| ^{2\alpha _{i}}\right] +4L_{i}^{2}t^{\alpha _{i}}{\mathbb {E}}_{p_{k}}\left[ \left\| z_{k}\right\| ^{2\alpha _{i}}\right] \nonumber \\&\quad {\mathop {\le }\limits ^{_{2}}}2L_{i}^{2}t^{2\alpha _{i}}{\mathbb {E}}_{p_{k}}\left[ \left\| \nabla U(x_{k})\right\| ^{2\alpha _{i}}\right] +4L_{i}^{2}t^{\alpha _{i}}{\mathbb {E}}_{p_{k}}\left[ \left\| z_{k}\right\| ^{2}\right] ^{\alpha _{i}}\nonumber \\&\quad {\mathop {\le }\limits ^{_{3}}}4L_{i}^{2}t^{2\alpha _{i}}{\mathbb {E}}\left[ \left\| \nabla U(x_{k})-\nabla U(x^{*})\right\| ^{2\alpha _{i}}+\left\| \nabla U(x^{*})\right\| ^{2\alpha _{i}}\right] +4L_{i}^{2}t^{\alpha _{i}}d^{\alpha _{i}}\nonumber \\&\quad {\mathop {\le }\limits ^{_{4}}}4L_{i}^{2}t^{2\alpha _{i}}\mathrm {{\mathbb {E}}}\left( \sum _{i}L_{i}\left\| x_{k}-x^{*}\right\| ^{\alpha _{i}}\right) ^{2\alpha _{i}}+4L_{i}^{2}t^{2\alpha _{i}}{\mathbb {E}}\left\| \nabla U(x^{*})\right\| ^{2\alpha _{i}}+4L_{i}^{2}t^{\alpha _{i}}d^{\alpha _{i}}\nonumber \\&\quad \le 8L_{i}^{2+2\alpha _{i}}t^{2\alpha _{i}}N\sum _{j}L_{i}^{2\alpha _{i}}\mathrm {{\mathbb {E}}}\left[ \left\| x_{k}-x^{*}\right\| ^{2\alpha _{j}\alpha _{i}}\right] +4L_{i}^{2}t^{2\alpha }{\mathbb {E}}\left\| \nabla U(x^{*})\right\| ^{2}\nonumber \\&\qquad +4L_{i}^{2}t^{2\alpha }+4L_{i}^{2}t^{\alpha }d^{\alpha }\nonumber \\&\quad {\mathop {\le }\limits ^{_{5}}}8NL_{i}^{2+2\alpha _{i}}t^{2\alpha _{i}}\left( \left( \sum _{j}L_{i}\right) ^{2}+1\right) \mathrm {{\mathbb {E}}}\left[ 1+\left\| x_{k}-x^{*}\right\| ^{2}\right] +4L_{i}^{2}t^{2\alpha }{\mathbb {E}}\left\| \nabla U(x^{*})\right\| ^{2}\nonumber \\&\qquad +4L_{i}^{2}t^{2\alpha }+4L_{i}^{2}t^{\alpha }d^{\alpha }\nonumber \\&\quad {\mathop {\le }\limits ^{}}8NL_{i}^{2+2\alpha _{i}}\eta ^{2\alpha }\left( \left( \sum _{j}L_{i}\right) ^{2}+1\right) \mathrm {{\mathbb {E}}}\left[ \left\| x_{k}-x^{*}\right\| ^{2}\right] \nonumber \\&\qquad +\left( 8NL_{i}^{2+2\alpha _{i}}\left( \left( \sum _{j}L_{i}\right) ^{2}+1\right) +16L_{i}^{2+2\alpha _{i}}+8L_{i}^{2}\left( \sum _{i}L_{i}\right) ^{2}d^{\frac{3}{p}}+4L_{i}^{2}d^{\alpha _{i}}\right) \eta ^{\alpha _{i}}\nonumber \\&\quad \le \frac{16N}{\gamma }\left( \left( \sum _{j}L_{i}\right) ^{2}+1\right) L^{2+2\alpha _{i}}\eta ^{2\alpha _{i}}H(p_{k}\vert \pi )+D_{1i}\eta ^{\alpha _{i}}, \end{aligned}$$
(7.13)

where step 1 follows from Lemma 7.22 in Appendix F, step 2 from \(\alpha \le 1\) and Jensen’s inequality, step 3 from properties of the normal distribution, step 4 from Assumption 2.1, step 5 from \(\alpha _{i}\le 1\), and the last step from the Talagrand inequality (which follows from the log-Sobolev inequality) and Lemma 7.25 in Appendix F. Similarly, we get

$$\begin{aligned}&\mathrm {{\mathbb {E}}}_{p_{kt}}\left\| \nabla U(x_{k})-\nabla U(x_{k,t})\right\| ^{2}\nonumber \\&\quad {\mathop {\le }\limits ^{_{1}}}\sum _{i}L_{i}^{2}\mathrm {{\mathbb {E}}}_{p_{kt}}\left\| {\tilde{x}}_{k,t}-x_{k}\right\| ^{2\alpha _{i}}\nonumber \\&\quad =\sum _{i}L_{i}^{2}\mathrm {{\mathbb {E}}}_{p_{k}}\left\| -t\nabla U(x_{k})+\sqrt{2t}z_{k}\right\| ^{2\alpha _{i}}\nonumber \\&\quad {\mathop {\le }\limits ^{_{2}}}\sum _{i}\frac{16N}{\gamma }\left( \left( \sum _{j}L_{i}\right) ^{2}+1\right) L^{2+2\alpha _{i}}\eta ^{2\alpha _{i}}H(p_{k}\vert \pi )+\left( \sum _{i}D_{1i}\eta ^{\alpha _{i}}\right) \nonumber \\&\quad {\mathop {\le }\limits ^{_{3}}}\frac{20N^{3}}{\gamma }L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi )+D_{3}\eta ^{\alpha }, \end{aligned}$$
(7.14)

where step 1 follows from Assumption 2.1, step 2 comes from the same reasoning as in equation (7.13), and the last step comes from \(\eta \le \frac{1}{L}\), \(\eta \le 1\), and the definition of \(D_{3}\). Therefore, from [38, Lemma 3], the time derivative of the KL divergence along LMC is bounded by

$$\begin{aligned} \frac{\hbox {d}}{\hbox {d}t}H\left( p_{k,t}\vert \pi \right)&\le -\frac{3}{4}I\left( p_{k,t}\vert \pi \right) +{\mathbb {E}}_{p_{kt}}\left[ \left\| \nabla U(x_{k,t})-\nabla U(x_{k})\right\| ^{2}\right] \\&\le -\frac{3}{4}I(p_{k,t}\vert \pi )+\frac{20N^{3}}{\gamma }L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi )+D_{3}\eta ^{\alpha }\\&\le -\frac{3\gamma }{2}H(p_{k,t}\vert \pi )+\frac{20N^{3}}{\gamma }L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi )+D_{3}\eta ^{\alpha }, \end{aligned}$$

where in the last inequality we have used Definition 7.1 of the log-Sobolev inequality. Note that we do not use the Lipschitz gradient condition of the original Lemma 3 in [38]; a modified version of that lemma for stochastic gradients is provided in Lemma 7.11 below.

Multiplying both sides by \(e^{\frac{3\gamma }{2}t}\) and integrating both sides from \(t=0\) to \(t=\eta \), we obtain

$$\begin{aligned}&e^{\frac{3\gamma }{2}\eta }H(p_{k+1}\vert \pi )-H(p_{k}\vert \pi )\nonumber \\&\quad \le 2\left( \frac{e^{\frac{3\gamma }{2}\eta }-1}{3\gamma }\right) \left( \frac{20N^{3}}{\gamma }L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi )+D_{3}\eta ^{\alpha }\right) \end{aligned}$$
(7.15)
$$\begin{aligned}&\quad \le 2\eta \left( \frac{20N^{3}}{\gamma }L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi )+D_{3}\eta ^{\alpha }\right) , \end{aligned}$$
(7.16)

where the last line holds by \(e^{c}\le 1+2c\) for \(0<c=\frac{3\gamma }{2}\eta <1\). Rearranging the terms of the above inequality and using the facts that \(1+\eta ^{1+2\alpha }\frac{40N^{3}}{\gamma }L^{6}\le 1+\frac{\gamma \eta }{2}\le e^{\frac{\gamma \eta }{2}}\) when \(\eta \le \left( \frac{\gamma }{9N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\) and that \(e^{-\frac{3\gamma }{2}\eta }\le 1\) leads to

$$\begin{aligned} H(p_{k+1}\vert \pi )&\le e^{-\frac{3\gamma }{2}\eta }\left( 1+\eta ^{1+2\alpha }\frac{40N^{3}}{\gamma }L^{6}\right) H(p_{k}\vert \pi )+2\eta ^{\alpha +1}D_{3}\nonumber \\&\le e^{-\gamma \eta }H(p_{k}\vert \pi )+2\eta ^{\alpha +1}D_{3}. \end{aligned}$$
(7.17)

as desired. \(\square \)

1.3.4 Proof of Theorem 3.2

Theorem 7.9

Suppose \(\pi \) is \(\gamma \)-log-Sobolev and \(\alpha \)-mixture weakly smooth. Let \(L=1\vee \max \left\{ L_{i}\right\} \). Then for any \(x_{0}\sim p_{0}\) with \(H(p_{0}\vert \pi )=C_{0}<\infty \), the iterates \(x_{k}\sim p_{k}\) of ULA with step size

$$\begin{aligned} \eta \le \min \left\{ 1,\frac{1}{4\gamma },\left( \frac{\gamma }{9N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\right\} \end{aligned}$$
(7.18)

satisfy

$$\begin{aligned} H(p_{k}\vert \pi )\le e^{-\frac{3\gamma }{2}\eta k}H(p_{0}\vert \pi )+2\eta ^{\alpha +1}D_{3}, \end{aligned}$$
(7.19)

where \(D_{3}=\sum _{i}10N^{3}L^{6}+16NL^{4}+8N^{2}L^{4}d^{\frac{3}{p}}+4NL^{2}d\). Then, for any \(\epsilon >0\), to achieve \(H(p_{k}\vert \pi )<\epsilon \), it suffices to run LMC with step size

$$\begin{aligned} \eta \le \min \left\{ 1,\frac{1}{4\gamma },\left( \frac{\gamma }{9N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }},\left( \frac{3\epsilon \gamma }{16D_{3}}\right) ^{\frac{1}{\alpha }}\right\} \end{aligned}$$
(7.20)

for \(k\ge \frac{1}{\gamma \eta }\log \frac{2\,H\left( p_{0}\vert \pi \right) }{\epsilon }\) iterations.

Proof

Applying inequality (7.12) recursively, and using the inequality \(1-e^{-c}\ge \frac{3}{4}c\) for \(0<c=\gamma \eta \le \frac{1}{4}\), we obtain

$$\begin{aligned} H(p_{k}\vert \pi )&\le \,e^{-\gamma \eta k}H(p_{0}\vert \pi )+\frac{2\eta ^{\alpha +1}D_{3}}{1-e^{-\gamma \eta }}\nonumber \\&\le \,e^{-\gamma \eta k}H(p_{0}\vert \pi )+\frac{2\eta ^{\alpha +1}D_{3}}{\frac{3}{4}\gamma \eta }\nonumber \\&\le \,e^{-\gamma \eta k}H(p_{0}\vert \pi )+\frac{8\eta ^{\alpha }D_{3}}{3\gamma }. \end{aligned}$$
(7.21)

Note that the last inequality holds if we choose \(\eta \) such that it satisfies

$$\begin{aligned} \eta \le \min \left\{ 1,\frac{1}{4\gamma },\left( \frac{\gamma }{9N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\right\} . \end{aligned}$$

Given \(\epsilon >0\), if we further assume \(\eta \le \left( \frac{3\epsilon \gamma }{16D_{3}}\right) ^{\frac{1}{\alpha }}\), then the above implies \(H(p_{k}\vert \pi )\le e^{-\gamma \eta k}H(p_{0}\vert \pi )+\frac{\epsilon }{2}.\) This means for \(k\ge \frac{1}{\gamma \eta }\log \frac{2\,H\left( p_{0}\vert \pi \right) }{\epsilon },\) we have \(H(p_{k}\vert \pi )\le \frac{\epsilon }{2}+\frac{\epsilon }{2}=\epsilon \), as desired. \(\square \)
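For reference, the plain ULA iteration analyzed in Lemma 7.8 and Theorem 7.9 is straightforward to implement. The sketch below is illustrative only: the toy potential, dimension, step size, and iteration count are assumptions made for the example and are not the constants prescribed by the theorem.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy weakly smooth, non-strongly-convex potential U(x) = ||x||^{1.5}/1.5 + 0.5*||x||^2.
def gradU(x):
    r = np.linalg.norm(x)
    return (r ** (-0.5) + 1.0) * x if r > 0 else np.zeros_like(x)

d, eta, n_iter = 10, 1e-3, 20_000
x = rng.normal(size=d)          # x_0 ~ p_0 = N(0, I/L) with L = 1, as in Lemma 7.6

samples = np.empty((n_iter, d))
for k in range(n_iter):
    # ULA update: x_{k+1} = x_k - eta * grad U(x_k) + sqrt(2 * eta) * z_k,  z_k ~ N(0, I)
    x = x - eta * gradU(x) + np.sqrt(2 * eta) * rng.normal(size=d)
    samples[k] = x

# Crude check that the chain has settled: second moment over the last half of the run.
print("empirical E||x||^2 (last half):", np.mean(np.sum(samples[n_iter // 2:] ** 2, axis=1)))
```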

1.4 Appendix D: Proof of Sampling via Smoothing Potential

1.4.1 Proof of Lemma 3.4

Lemma 7.10

For any \(x_{k}\in {\mathbb {R}}^{d}\), \(g_{\mu }(x_{k},\zeta _{1})\) is an unbiased estimator of \(\nabla U_{\mu }(x_{k})\) such that

$$\begin{aligned} \textrm{Var}\left[ g_{\mu }(x_{k},\zeta _{1})\right]&\le 4N^{2}L^{2}\mu ^{2\alpha }d^{\frac{2\alpha }{p}}. \end{aligned}$$

Proof

Recall that by definition of \(U_{\mu }\), we have \(U_{\mu }(x)=\mathrm {\mathrm {{\mathbb {E}}}}_{\zeta }[U(x+\mu {\zeta })]\), where \({\zeta }\sim N_{p}(0,I_{d})\) is independent of \(\zeta _{1}\). Clearly, \({\mathbb {E}}{}_{{{\zeta _{1}}}}[g_{\mu }(x,\zeta _{1})]={\mathbb {E}}{}_{{{\zeta _{1}}}}\nabla U(x+\mu \zeta _{1})=\nabla {\mathbb {E}}{}_{{{\zeta _{1}}}}U(x+\mu \zeta _{1})=\nabla U_{\mu }(x)\) by exchanging gradient and expectation and using the definition of \(U_{\mu }(x)\). We now proceed to bound the variance of \(g_{\mu }(x,\zeta _{1})\). We have:

$$\begin{aligned}&\mathrm {{\mathbb {E}}}_{{\zeta _{1}}} [\Vert \nabla U_{\mu }(x)-g_{\mu }(x,\zeta _{1})\Vert _{2}^{2}]\\&\quad \le \mathrm {{\mathbb {E}}}_{\zeta _{1},{\zeta }}[\Vert \nabla U(x+\mu {\zeta })-\nabla U(x+\mu {\zeta _{1}})\Vert ^{2}]\\&\quad \le N\sum _{i}L_{i}^{2}\mathrm {{\mathbb {E}}}_{{\zeta _{1}},{\zeta }}[\Vert \mu ({\zeta }-{\zeta _{1}})\Vert ^{2\alpha _{i}}]\\&\quad \le N\sum _{i}L_{i}^{2}\mu ^{2\alpha _{i}}\mathrm {{\mathbb {E}}}_{\zeta _{1},{\zeta }}[\Vert {\zeta }-{\zeta _{1}}\Vert ^{2\alpha _{i}}]\\&\quad \le 2N\sum _{i}L_{i}^{2}\mu ^{2\alpha _{i}}\left( \mathrm {{\mathbb {E}}}\left[ \Vert {\zeta }\Vert ^{2\alpha _{i}}\right] +\mathrm {{\mathbb {E}}}\left[ \Vert {\zeta _{1}}\Vert ^{2\alpha _{i}}\right] \right) \\&\quad \le 2N\sum _{i}L_{i}^{2}\mu ^{2\alpha _{i}}\left( \left( \mathrm {{\mathbb {E}}}\left[ \Vert {\zeta }\Vert ^{2}\right] \right) ^{\alpha _{i}}+\left( \mathrm {{\mathbb {E}}}\left[ \Vert \zeta _{1}\Vert ^{2}\right] \right) ^{\alpha _{i}}\right) \\&\quad \le 4N\sum _{i}L_{i}^{2}\mu ^{2\alpha _{i}}d^{\frac{2\alpha _{i}}{p}}\\&\quad \le 4N^{2}L^{2}\mu ^{2\alpha }d^{\frac{2\alpha }{p}}, \end{aligned}$$

as claimed. \(\square \)

1.4.2 Proof of Theorem 3.6

Before proving Theorem 3.6, we need an additional lemma.

Lemma 7.11

([38] modified Lemma 3) Suppose \(x_{k,t}\) is the interpolation of the discretized process (1.2). Let \(p_{k,t}\), \(p_{kt}\) and \(p_{kt\zeta }\) denote its distribution, the joint distribution of \(x_{k,t}\) and \(x_{k}\), and the joint distribution of \(x_{k,t}\), \(x_{k}\) and \(\zeta \), respectively. Here \(g_{\mu }(x_{k},\zeta )\) is an estimate of \(\nabla U_{\mu }(x_{k})\) with noise \(\zeta \) such that \(E_{\zeta }g_{\mu }(x_{k},\zeta )=\nabla U_{\mu }(x_{k})\). Then,

$$\begin{aligned} {\displaystyle \frac{\hbox {d}}{\hbox {d}t}H\left( p_{k,t}\vert \pi _{\mu }\right) \le -\frac{3}{4}I\left( p_{k,t}\vert \pi _{\mu }\right) +{\mathbb {E}}_{p_{kt\zeta }}\left[ \left\| \nabla U_{\mu }(x_{k,t})-g_{\mu }(x_{k},\zeta )\right\| ^{2}\right] }. \end{aligned}$$
(7.22)

Proof

The steps follow exactly as in Lemma 3 of [38]; we provide the proof here for completeness. For each \(t>0\), let \(p_{k\zeta \vert t}(x_{k},\zeta )\) denote the joint distribution of \(x_{k}\) and \(\zeta \) conditioned on \(x_{k,t}\), and let \(p_{t\vert k\zeta }(x_{k,t})\) denote the distribution of \(x_{k,t}\) conditioned on \(x_{k}\) and \(\zeta \). By the Fokker–Planck equation, we have

$$\begin{aligned} \frac{\partial p_{t\vert k\zeta }(x_{k,t})}{\partial t}=\nabla \cdot \left( p_{t\vert k\zeta }(x_{k,t})g_{\mu }(x_{k},\zeta )\right) +\triangle p_{t\vert k\zeta }(x_{k,t}), \end{aligned}$$
(7.23)

which, after integrating with respect to \(x_{k}\) and \(\zeta \), gives

$$\begin{aligned} \frac{\partial p_{k,t}(x)}{\partial t}&=\int \int \frac{\partial p_{t\vert k\zeta }(x)}{\partial t}p_{k\zeta }(x_{k},\zeta )\hbox {d}x_{k}\hbox {d}\zeta \nonumber \\&=\int \int \left( \nabla \cdot \left( p_{t\vert k\zeta }(x_{k,t})g_{\mu }(x_{k},\zeta )\right) +\triangle p_{t\vert k\zeta }(x_{k,t})\right) \hbox {d}x_{k}\hbox {d}\zeta \nonumber \\&=\int \int \left( \nabla \cdot \left( p_{t\vert k\zeta }(x_{k,t})g_{\mu }(x_{k},\zeta )\right) \right) +\triangle p_{k,t}(x)\nonumber \\&=\nabla \cdot (p_{k,t}(x)\int \int p_{k\zeta \vert t}(x_{k})g_{\mu }(x_{k},\zeta )\hbox {d}x_{k}\hbox {d}\zeta )+\triangle p_{k,t}(x) \end{aligned}$$
(7.24)
$$\begin{aligned}&=\nabla \cdot \ (p_{k,t}(x)\mathrm {{\mathbb {E}}}_{p_{k\zeta \vert t}}[g_{\mu }(x_{k},\zeta )\vert x_{k,t}=x])+\triangle p_{k,t}(x). \end{aligned}$$
(7.25)

Combining this with \(\int p_{t}\frac{\partial }{\partial t}\log \frac{p_{t}}{\pi _{\mu }}\,\hbox {d}x=\int \frac{\partial p_{t}}{\partial t}\,\hbox {d}x=\frac{\hbox {d}}{\hbox {d}t}\int p_{t}\,\hbox {d}x=0\), we get the following bound on the time derivative of the KL divergence:

$$\begin{aligned} \frac{\hbox {d}}{\hbox {d}t}H\left( p_{k,t}\vert \pi _{\mu }\right)&=\frac{\hbox {d}}{\hbox {d}t}\int p_{k,t}(x)\log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \hbox {d}x\nonumber \\&=\int \frac{\partial p_{k,t}}{\partial t}(x)\log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \hbox {d}x\nonumber \\&=\int \left[ \nabla \cdot \left( p_{k,t}(x)\mathrm {{\mathbb {E}}}_{p_{k\zeta \vert t}}[g_{\mu }(x_{k},\zeta )\vert x_{k,t}=x]\right) \right] \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \hbox {d}x\nonumber \\&\quad +\int \left[ \triangle p_{k,t}(x)\right] \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \hbox {d}x\nonumber \\&{\mathop {=}\limits ^{\left( i\right) }}\int \left[ \nabla \cdot \left( p_{k,t}(x)\mathrm {{\mathbb {E}}}_{p_{k\zeta \vert t}}[g_{\mu }(x_{k},\zeta )\vert x_{k,t}=x]\right) \right] \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \hbox {d}x\nonumber \\&\quad +\int \left[ \nabla \cdot \left( \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) -\nabla U_{\mu }(x)\right) \right] \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \hbox {d}x\nonumber \\&{\mathop {=}\limits ^{\left( ii\right) }}-\int p_{k,t}(x)\left\langle \mathrm {{\mathbb {E}}}_{p_{k\zeta \vert t}}[g_{\mu }(x_{k},\zeta )\vert x_{k,t}=x],\ \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \right\rangle \hbox {d}x\nonumber \\&\quad -\int p_{k,t}(x)\left\langle \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) -\nabla U_{\mu }(x),\ \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \right\rangle \hbox {d}x\nonumber \\&=-I\left( p_{k,t}\vert \pi _{\mu }\right) \nonumber \\&\quad +\int p_{k,t}(x)\left\langle \nabla U_{\mu }(x)-\mathrm {{\mathbb {E}}}_{p_{k\zeta \vert t}}[g_{\mu }(x_{k},\zeta )\vert x_{k,t}=x],\ {\displaystyle \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) }\right\rangle \hbox {d}x\nonumber \\&=-I\left( p_{k,t}\vert \pi _{\mu }\right) +\mathrm {{\mathbb {E}}}_{p_{kt\zeta }}\left\langle \nabla U_{\mu }(x_{k,t})-g_{\mu }(x_{k},\zeta ),\ {\displaystyle \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) }\right\rangle \nonumber \\&{\mathop {\le }\limits ^{\left( iii\right) }}-I\left( p_{k,t}\vert \pi _{\mu }\right) \nonumber \\&\quad +\textrm{E}_{p_{kt\zeta }}\left\| \nabla U_{\mu }(x_{k,t})-g_{\mu }(x_{k},\zeta )\right\| ^{2}+\frac{1}{4}\mathrm {{\mathbb {E}}}_{p_{k,t}}\left\| \nabla \log \left( \frac{p_{k,t}(x)}{\pi _{\mu }(x)}\right) \right\| ^{2}\nonumber \\&=-\frac{3}{4}I\left( p_{k,t}\vert \pi _{\mu }\right) +\mathrm {{\mathbb {E}}}_{p_{kt\zeta }}\left\| \nabla U_{\mu }(x_{k,t})-g_{\mu }(x_{k},\zeta )\right\| ^{2} \end{aligned}$$
(7.26)

in which equality \(\left( i\right) \) follows from \(\triangle p_{k,t}=\nabla \cdot (\nabla p_{k,t})\), equality \(\left( ii\right) \) follows from the divergence theorem, inequality \(\left( iii\right) \) follows from \(\left\langle u,\ v\right\rangle {\displaystyle \le \Vert u\Vert ^{2}+\frac{1}{4}\Vert v\Vert ^{2}}\), and in the last step the expectation is taken with respect to \(x_{k}\), \(x_{k,t}\), and \(\zeta \). \(\square \)

We are now ready to state and prove Theorem 3.6.

Theorem 7.12

Suppose \(\pi _{\mu }\) is \(\gamma _{1}\)-log-Sobolev and \(\alpha \)-mixture weakly smooth, and let \(L=1\vee \max \left\{ L_{i}\right\} \). Then for any \(x_{0}\sim p_{0}\) with \(H(p_{0}\vert \pi )=C_{0}<\infty \), the iterates \(x_{k}\sim p_{k}\) of ULA with step size

$$\begin{aligned} \eta \le \min \left\{ 1,\frac{1}{4\gamma _{1}},\left( \frac{\gamma _{1}}{13N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\right\} \end{aligned}$$
(7.27)

satisfy

$$\begin{aligned} H(p_{k}\vert \pi _{\mu })\le e^{-\frac{3\gamma _{1}}{2}\eta k}H(p_{0}\vert \pi _{\mu })+2\eta ^{\alpha +1}D_{4}, \end{aligned}$$
(7.28)

where \(D_{4}=\sum _{i}10N^{3}L^{6}+16NL^{4}+8N^{2}L^{4}d^{\frac{3}{p}}+4NL^{2}d+8N^{2}L^{2}d^{\frac{2\alpha }{p}}\). Then, for any \(\epsilon >0\), to achieve \(H(p_{k}\vert \pi )<\epsilon \), it suffices to run LMC with step size

$$\begin{aligned} \eta \le \min \left\{ 1,\frac{1}{4\gamma _{1}},\left( \frac{\gamma _{1}}{13N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }},\left( \frac{3\epsilon \gamma _{1}}{16D_{4}}\right) ^{\frac{1}{\alpha }}\right\} \end{aligned}$$
(7.29)

for \(k\ge \frac{2}{\gamma _{1}\eta }\log \frac{3\,H\left( p_{0}\vert \pi _{\mu }\right) }{\epsilon }\) iterations.

Proof

We adapt the proof of [38]. First, recall that the discretization of the ULA is

$$\begin{aligned} x_{k,t}{\mathop {=}\limits ^{}}x_{k}-t\,g_{\mu }(x_{k},\zeta )+\sqrt{2t}\,z_{k}, \end{aligned}$$

where \(z_{k}\sim N(0,I)\) is independent of \(x_{k}\). Let \(x_{k}\sim p_{k}\) and \(x^{*}\sim \pi _{\mu }\) with an optimal coupling \((x_{k},x^{*})\) so that \({\mathbb {E}}[\Vert x_{k}-x^{*}\Vert ^{2}]=W_{2}(p_{k},\pi _{\mu })^{2}\). Choosing \(\mu =\sqrt{\eta }\), we have

$$\begin{aligned}&\mathrm {{\mathbb {E}}}_{p_{kt\zeta }}\left\| \nabla U_{\mu }(x_{k,t})-g_{\mu }(x_{k},\zeta )\right\| ^{2}\\&\quad {\mathop {\le }\limits ^{_{1}}}2\left[ \mathrm {{\mathbb {E}}}_{p_{kt\zeta }}\left\| \nabla U_{\mu }(x_{k,t})-\nabla U_{\mu }(x_{k})\right\| ^{2}+\left\| \nabla U_{\mu }(x_{k})-g_{\mu }(x_{k},\zeta )\right\| ^{2}\right] \\&\quad {\mathop {\le }\limits ^{_{2}}}\frac{40N^{3}}{\gamma _{1}}L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi _{\mu })+D_{3}\eta ^{\alpha }+8N^{2}L^{2}\mu ^{2\alpha }d^{\frac{2\alpha }{p}}\\&\quad {\mathop {\le }\limits ^{}}\frac{40N^{3}}{\gamma _{1}}L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi _{\mu })+D_{4}\eta ^{\alpha }, \end{aligned}$$

where step 1 follows from Young’s inequality, step 2 comes from equation (7.14) and Lemma 7.10, and the last step comes from \(\eta \le \frac{1}{L}\), \(\eta \le 1\), and the definition of \(D_{4}\). Therefore, from Lemma 7.11, the time derivative of the KL divergence along LMC is bounded by:

$$\begin{aligned} \frac{\hbox {d}}{\hbox {d}t}H\left( p_{k,t}\vert \pi _{\mu }\right)&\le -\frac{3}{4}I(p_{k,t}\vert \pi _{\mu })+\frac{40N^{3}}{\gamma _{1}}L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi _{\mu })+D_{4}\eta ^{\alpha }\nonumber \\&\le -{\frac{3\gamma _{1}}{2}}H(p_{k,t}\vert \pi _{\mu })+\frac{40N^{3}}{\gamma _{1}}L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi _{\mu })+D_{4}\eta ^{\alpha }, \end{aligned}$$
(7.30)

where in the last inequality we have used Definition 7.1 of the log-Sobolev inequality. Multiplying both sides by \(e^{\frac{3\gamma _{1}}{2}t}\) and integrating from \(t=0\) to \(t=\eta \), we obtain

$$\begin{aligned} e^{\frac{3\gamma _{1}}{2}\eta }H(p_{k+1}\vert \pi _{\mu })-H(p_{k}\vert \pi _{\mu })&\le 2\left( \frac{e^{\frac{3\gamma _{1}}{2}\eta }-1}{3\gamma _{1}}\right) \left( \frac{40N^{3}}{\gamma _{1}}L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi _{\mu })+D_{4}\eta ^{\alpha }\right) \nonumber \\&\le 2\eta \left( \frac{40N^{3}}{\gamma _{1}}L^{6}\eta ^{2\alpha }H(p_{k}\vert \pi _{\mu })+D_{4}\eta ^{\alpha }\right) , \end{aligned}$$
(7.31)

where the last line holds by \(e^{c}\le 1+2c\) for \(0<c=\frac{3\gamma _{1}}{2}\eta <1\). Rearranging the terms of the above inequality and using the facts that \(1+\eta ^{1+2\alpha }\frac{80N^{3}}{\gamma _{1}}L^{6}\le 1+\frac{\gamma _{1}\eta }{2}\le e^{\frac{\gamma _{1}\eta }{2}}\) when \(\eta \le \left( \frac{\gamma _{1}}{13N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\) and that \(e^{-\frac{3\gamma _{1}}{2}\eta }\le 1\) leads to

$$\begin{aligned} H(p_{k+1}\vert \pi _{\mu })&\le e^{-\frac{3\gamma _{1}}{2}\eta }\left( 1+\eta ^{1+2\alpha }\frac{80N^{3}}{\gamma _{1}}L^{6}\right) H(p_{k}\vert \pi _{\mu })+2\eta ^{\alpha +1}D_{4}\nonumber \\&\le e^{-\gamma _{1}\eta }H(p_{k}\vert \pi _{\mu })+2\eta ^{\alpha +1}D_{4}. \end{aligned}$$
(7.32)

Applying this inequality recursively and using the inequality \(1-e^{-c}\ge \frac{3}{4}c\) for \(0<c=\gamma _{1}\eta \le \frac{1}{4}\), we obtain

$$\begin{aligned} H(p_{k}\vert \pi _{\mu })&\le \,e^{-\gamma _{1}\eta k}H(p_{0}\vert \pi _{\mu })+\frac{2\eta ^{\alpha +1}D_{4}}{1-e^{-\gamma _{1}\eta }}\nonumber \\&\le \,e^{-\gamma _{1}\eta k}H(p_{0}\vert \pi _{\mu })+\frac{2\eta ^{\alpha +1}D_{4}}{\frac{3}{4}\gamma _{1}\eta }\nonumber \\&\le \,e^{-\gamma _{1}\eta k}H(p_{0}\vert \pi _{\mu })+\frac{8\eta ^{\alpha }D_{4}}{3\gamma _{1}}. \end{aligned}$$
(7.33)

Note that the last inequality holds if we choose \(\eta \) such that it satisfies

$$\begin{aligned} \eta \le \min \left\{ 1,\frac{1}{4\gamma _{1}},\left( \frac{\gamma _{1}}{13N^{\frac{3}{2}}L^{3}}\right) ^{\frac{1}{\alpha }}\right\} . \end{aligned}$$

From Lemma 3.5, choosing \(\mu =\sqrt{\eta }\) small enough ensures that \(W_{2}(\pi ,\ \pi _{\mu })\le 3\sqrt{NLE_{2}}\eta ^{\frac{\alpha }{2}}d^{\frac{1}{p}}\). Since \(\pi _{\mu }\) satisfies a log-Sobolev inequality, and hence a Talagrand inequality, by the triangle inequality we also get

$$\begin{aligned} W_{2}(p_{k},\ \pi )&\le W_{2}(p_{k},\ \pi _{\mu })+W_{2}(\pi ,\ \pi _{\mu })\\&\le \sqrt{\frac{2}{\gamma _{1}}H(p_{k}\vert \pi _{\mu })}+W_{2}(\pi ,\ \pi _{\mu })\\&\le \frac{1}{\sqrt{\gamma _{1}}}e^{-\frac{\gamma _{1}}{2}\eta k}\sqrt{H(p_{0}\vert \pi _{\mu })}+\frac{2}{\gamma _{1}}\eta ^{\frac{\alpha }{2}}\sqrt{D_{4}}+3\sqrt{NLE_{2}}\eta ^{\frac{\alpha }{2}}d^{\frac{1}{p}}. \end{aligned}$$

Given \(\epsilon >0\), if we further assume \(\eta \le \left( \frac{\epsilon \gamma _{1}}{6\sqrt{D_{4}}}\right) ^{\frac{2}{\alpha }}\wedge \left( \frac{\epsilon }{9\sqrt{NLE_{2}}d^{\frac{1}{p}}}\right) ^{\frac{2}{\alpha }}\), then the above inequality implies \(W_{2}(p_{k},\ \pi )\le \frac{1}{\sqrt{\gamma _{1}}}e^{-\frac{\gamma _{1}}{2}\eta k}\sqrt{H(p_{0}\vert \pi _{\mu })}+\frac{2\epsilon }{3}.\) This means for \(k\ge \frac{2}{\gamma _{1}\eta }\log \frac{3\sqrt{H\left( p_{0}\vert \pi _{\mu }\right) \gamma _{1}}}{\epsilon },\) we have \(W_{2}(p_{k},\ \pi )\le \frac{\epsilon }{3}+\frac{2\epsilon }{3}=\epsilon \), as desired. \(\square \)
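The algorithm analyzed in Theorem 7.12 replaces \(\nabla U(x_{k})\) with the smoothed stochastic gradient \(g_{\mu }(x_{k},\zeta )=\nabla U(x_{k}+\mu \zeta )\), \(\zeta \sim N_{p}(0,I_{d})\), with \(\mu =\sqrt{\eta }\). A minimal Python sketch follows (illustrative only; the potential and all tuning constants are assumptions for the example, and the \(N_{p}\) sampler is the Gamma-based one from the earlier sketch).

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_np(p, d):
    """One draw from N_p(0, I_d): |xi_j|^p / p ~ Gamma(1/p, 1), with an independent random sign."""
    g = rng.gamma(shape=1.0 / p, scale=1.0, size=d)
    return rng.choice([-1.0, 1.0], size=d) * (p * g) ** (1.0 / p)

def gradU(x):  # toy weakly smooth potential U(x) = ||x||^{1.5}/1.5 + 0.5*||x||^2
    r = np.linalg.norm(x)
    return (r ** (-0.5) + 1.0) * x if r > 0 else np.zeros_like(x)

p, d, eta, n_iter = 1.5, 10, 1e-3, 20_000
mu = np.sqrt(eta)               # mu = sqrt(eta), as in the proof of Theorem 7.12
x = rng.normal(size=d)

for k in range(n_iter):
    zeta = sample_np(p, d)
    g = gradU(x + mu * zeta)    # g_mu(x_k, zeta): unbiased estimate of grad U_mu(x_k) (Lemma 7.10)
    x = x - eta * g + np.sqrt(2 * eta) * rng.normal(size=d)

print("norm of the final iterate:", np.linalg.norm(x))
```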

1.4.3 Proof of Lemma 3.5

Lemma 7.13

Assume that \(\pi \propto \exp (-U)\) and \(\pi _{\mu }\propto \exp (-U_{\mu })\), and that \(\pi \) has a bounded second moment, that is, \(\int \left\| x\right\| ^{2}\pi (x)\textrm{d}x=E_{2}<\infty \). We deduce the following bound

$$\begin{aligned} W_{2}^{2}(\pi ,\ \pi _{\mu })\le 8.24NL\mu ^{1+\alpha }d^{\frac{2}{p}}E_{2}, \end{aligned}$$

for any \(\mu \le 0.05\).

Proof

This proof adapts the technique of the proof of [10]'s Proposition 1. Without loss of generality, we may assume that \({\displaystyle \int _{{\mathbb {R}}^{d}}\exp (-U(x))\hbox {d}x=1}\). We first give upper and lower bounds on the normalizing constant of \(\pi _{\mu }\), that is

$$\begin{aligned} c_{\mu }&{\mathop {=}\limits ^{_{\triangle }}}\int _{{\mathbb {R}}^{d}}\pi (x)e^{-\left( U_{\mu }(x)-U(x)\right) }\hbox {d}x\\&={\mathbb {E}}_{\pi }\left( e^{-\left( U_{\mu }(x)-U(x)\right) }\right) . \end{aligned}$$

The constant \(c_{\mu }\) is an expectation of \(e^{-\left( U_{\mu }(x)-U(x)\right) }\) with respect to the density \(\pi \) so it can be trivially upper bounded by \(e^{M}\) and lower bounded by \(e^{-M}\), where \(\Vert U_{\mu }(x)-U(x)\Vert \le \sum _{i}L_{i}\mu ^{1+\alpha _{i}}d^{\frac{2}{p}}=M\). Now we control the distance between densities \(\pi \) and \(\pi _{\mu }\) at any fixed \(x\in {\mathbb {R}}^{d}\):

$$\begin{aligned} \Vert \pi (x)-\pi _{\mu }(x)\Vert&=\pi (x)\left\| 1-\frac{e^{-\left( U_{\mu }(x)-U(x)\right) }}{c_{\mu }}\right\| \\ ~&\le \pi (x)\left\{ \left( 1-\frac{e^{-\left( U_{\mu }(x)-U(x)\right) }}{e^{M}}\right) +e^{-\left( U_{\mu }(x)-U(x)\right) }\left( \frac{1}{c_{\mu }}-\frac{1}{e^{M}}\right) \right\} \\&\le \pi (x)\left( 1-e^{-2M}+e^{2M}-1\right) \\&\le \pi (x)\left( 2M+e^{2M}-1\right) . \end{aligned}$$

The first inequality follows from the triangle inequality, the second uses the bounds \(e^{-M}\le c_{\mu }\le e^{M}\) and \(\Vert U_{\mu }(x)-U(x)\Vert \le M\), while the last one follows from \(1-e^{-x}\le x\) for any \(x\ge 0\). To bound \(W_{2}\), we use an inequality from [39] (Theorem 6.15, page 115):

$$\begin{aligned} W_{2}^{2}(\pi ,\ \pi _{\mu })\le 2\int _{{\mathbb {R}}^{d}}\Vert x\Vert _{2}^{2}\Vert \pi (x)-\pi _{\mu }(x)\Vert \hbox {d}x. \end{aligned}$$

Combining this with the bound on \(\Vert \pi (x)-\pi _{\mu }(x)\Vert \) shown above, we have

$$\begin{aligned} W_{2}^{2}(\pi ,\ \pi _{\mu })&\le 2\int _{{\mathbb {R}}^{d}}\Vert x\Vert _{2}^{2}\pi (x)\left( 2M+e^{2M}-1\right) \hbox {d}x\\&\le 2\left( 2M+e^{2M}-1\right) E_{\pi }\left[ \Vert x\Vert ^{2}\right] \\&\le 2\left( 2M+e^{2M}-1\right) E_{2}\\&\le 8.24ME_{2}\\&=8.24\sum _{i}L_{i}\mu ^{1+\alpha _{i}}d^{\frac{2}{p}}E_{2}\\&\le 8.24NL\mu ^{1+\alpha }d^{\frac{2}{p}}E_{2}, \end{aligned}$$

where the fourth inequality uses \(M\le 0.05\), which ensures that \(e^{2M}-1\le 2.12M\), and the last step uses \(M=\sum _{i}L_{i}\mu ^{1+\alpha _{i}}d^{\frac{2}{p}}\le NL\mu ^{1+\alpha }d^{\frac{2}{p}}\). This gives the desired result. \(\square \)
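As a numerical aside (not part of the proof), the inequality from [39] used above can be sanity-checked in one dimension, where \(W_{2}\) is computable from quantile functions. The two Gaussian densities below are assumed stand-ins for \(\pi \) and \(\pi _{\mu }\), and SciPy is assumed available.

```python
import numpy as np
from scipy.stats import norm

# Two example densities standing in for pi and pi_mu (assumed, illustration only).
p, q = norm(loc=0.0, scale=1.0), norm(loc=0.1, scale=1.05)

# Left-hand side: W2^2 in one dimension via quantile functions,
# W2^2(p, q) = int_0^1 (F_p^{-1}(u) - F_q^{-1}(u))^2 du.
u, du = np.linspace(1e-6, 1 - 1e-6, 200_000, retstep=True)
w2_sq = np.sum((p.ppf(u) - q.ppf(u)) ** 2) * du

# Right-hand side: 2 * int x^2 |p(x) - q(x)| dx, the bound from [39], Theorem 6.15.
x, dx = np.linspace(-12.0, 12.0, 200_000, retstep=True)
rhs = 2.0 * np.sum(x ** 2 * np.abs(p.pdf(x) - q.pdf(x))) * dx

assert w2_sq <= rhs
print(w2_sq, rhs)
```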

1.5 Appendix E: Convexification of Non-convex Domain

1.5.1 Proof of Lemma 4.1

Lemma 7.14

For function V defined as

$$\begin{aligned} V(\ x)=\inf _{\begin{array}{c} \{\ x_{i}\}\subset \Omega ,\\ \left\{ \lambda _{i}\big \vert \sum _{i}\lambda _{i}=1\right\} \\ \text {s.t.},\sum _{i}\lambda _{i}\ x_{i}=\ x \end{array} }\left\{ \sum _{i=1}^{l}\lambda _{i}U(\ x_{i})\right\} , \end{aligned}$$
(7.34)

\(\forall \ x\in {\mathbb {B}}(0,R)\), \(\inf _{\left\| x\right\| =R}U(x)\le V(\ x)\le \sup _{\left\| x\right\| =R}U(x)\).

Proof

The proof is taken from [31, 43]; we provide it here for completeness. First, by the definition of V inside \({\mathbb {B}}(0,R)\), we show that for any convex combination of the form \(\sum _{i}\lambda _{i}U(\ x_{i})\), where \(\sum _{i}\lambda _{i}=1,\) we can find another representation \(\sum _{j}\lambda _{j}U(\ x_{j})\), where \(\sum _{j}\lambda _{j}=1\) and \(\left\| x_{j}\right\| =R\), such that \(\sum _{j}\lambda _{j}U(\ x_{j})\le \sum _{i}\lambda _{i}U(\ x_{i})\). This can be seen as follows.

For any \(\ x_{j}\in \{\ x_{i}\}\) such that \(\left\| x_{j}\right\| >R\), there exists a new convex combination over \(\{\ x_{i}\}\bigcup \{{\bar{x}}_{j}\}\setminus \{\ x_{j}\}\) with \(\left\| {\bar{x}}_{j}\right\| =R\), such that \(\sum _{i}\lambda _{i}U(\ x_{i})\ge {\tilde{\lambda }}_{j}U({\bar{x}}_{j})+\sum _{i\ne j}{\tilde{\lambda }}_{i}U(\ x_{i})\). In this case, we choose \({\bar{x}}_{j}\) with \(\left\| {\bar{x}}_{j}\right\| =R\) such that:

$$\begin{aligned} {\bar{x}}_{j}&=\dfrac{1-{\bar{\lambda }}_{j}}{1-\lambda _{j}}\ x+\dfrac{{\bar{\lambda }}_{j}-\lambda _{j}}{1-\lambda _{j}}\ x_{j},\,\lambda _{j}<{\bar{\lambda }}_{j}<1,\nonumber \\&={\bar{\lambda }}_{j}\ x_{j}+\left( \dfrac{1-{\bar{\lambda }}_{j}}{1-\lambda _{j}}\right) \left( \sum _{i\ne j}\lambda _{i}\ x_{i}\right) . \end{aligned}$$
(7.35)

Since U is convex on \(\Omega \),

$$\begin{aligned} U({\bar{x}}_{j})\le {\bar{\lambda }}_{j}U(\ x_{j})+\left( \dfrac{1-{\bar{\lambda }}_{j}}{1-\lambda _{j}}\right) \left( \sum _{i\ne j}\lambda _{i}U(\ x_{i})\right) . \end{aligned}$$
(7.36)

On the other hand, x can be represented as a convex combination of \(\{\ x_{i}\}\bigcup \{{\bar{x}}_{j}\}\setminus \{\ x_{j}\}\):

$$\begin{aligned} x=\dfrac{\lambda _{j}}{{\bar{\lambda }}_{j}}{\bar{x}}_{j}+\left( 1-\dfrac{\lambda _{j}}{{\bar{\lambda }}_{j}}\dfrac{1-{\bar{\lambda }}_{j}}{1-\lambda _{j}}\right) \left( \sum _{i\ne j}\lambda _{i}\ x_{i}\right) ={\tilde{\lambda }}_{j}{\bar{x}}_{j}+\sum _{i\ne j}{\tilde{\lambda }}_{i}\ x_{i}, \end{aligned}$$
(7.37)

and that

$$\begin{aligned} \sum _{i}\lambda _{i}U(\ x_{i})&\ge \dfrac{\lambda _{j}}{{\bar{\lambda }}_{j}}U({\bar{x}}_{j})+\left( 1-\dfrac{\lambda _{j}}{{\bar{\lambda }}_{j}}\dfrac{1-{\bar{\lambda }}_{j}}{1-\lambda _{j}}\right) \left( \sum _{i\ne j}\lambda _{i}U(\ x_{i})\right) \nonumber \\&={\tilde{\lambda }}_{j}U({\bar{x}}_{j})+\sum _{i\ne j}{\tilde{\lambda }}_{i}U(\ x_{i}). \end{aligned}$$
(7.38)

As a result, \(V(\ x)\) can be represented as

$$\begin{aligned} V(\ x)=\inf _{\begin{array}{c} \{\ x_{j}\}\subset \Omega ,\\ \left\{ \lambda _{j}\big \vert \sum _{j}\lambda _{j}=1\right\} \\ \text {s.t.},\sum _{j}\lambda _{j}\ x_{j}=\ x,\,\left\| x_{j}\right\| =R \end{array} }\left\{ \sum _{j}\lambda _{j}U(\ x_{j})\right\} . \end{aligned}$$
(7.39)

By the representation of V inside \({\mathbb {B}}(0,R)\), we obtain \(\inf _{\left\| {\bar{x}}\right\| =R}U({\bar{x}})\le V(\ x)\le \sup _{\left\| {\bar{x}}\right\| =R}U({\bar{x}})\). \(\square \)
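For intuition, in one dimension the representation (7.39) reduces to the chord between \(\left( -R,U(-R)\right) \) and \(\left( R,U(R)\right) \) inside the ball. The following sketch computes this extension for an assumed toy potential (convex, hence in particular convex outside the ball) and checks the bound of Lemma 4.1.

```python
import numpy as np

def U(x):
    # Assumed toy potential (illustration only); U'' = 1 - cos(x) >= 0, so U is convex.
    return 0.5 * x ** 2 + np.cos(x)

R = 2.0

def V(x):
    """One-dimensional convex extension of U from {|x| >= R}, as in (7.39):
    inside [-R, R] it is the chord between (-R, U(-R)) and (R, U(R)); outside it equals U."""
    x = np.asarray(x, dtype=float)
    lam = (x + R) / (2.0 * R)                 # weight on the boundary point +R
    chord = (1.0 - lam) * U(-R) + lam * U(R)  # convex combination of boundary values
    return np.where(np.abs(x) <= R, chord, U(x))

xs = np.linspace(-R, R, 1001)
vals = V(xs)
# Sanity check of Lemma 4.1: inf_{|x|=R} U <= V(x) <= sup_{|x|=R} U inside the ball.
assert np.all(vals >= min(U(-R), U(R)) - 1e-12)
assert np.all(vals <= max(U(-R), U(R)) + 1e-12)
```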

1.5.2 Proof of Lemma 4.2

Lemma 7.15

For U satisfying \(\alpha \)-mixture weakly smooth and \(\left( \mu ,\theta \right) \)-degenerated convex outside the ball of radius R, there exists \({\hat{U}}\in C^{1}({\mathbb {R}}^{d})\) with a Hessian that exists everywhere on \({\mathbb {R}}^{d}\), and \({\hat{U}}\) is \(\left( \left( 1-\theta \right) \frac{\mu }{2},\theta \right) \)-degenerated convex on \({\mathbb {R}}^{d}\) (that is, \(\nabla ^{2}{\hat{U}}(x)\succeq \left( 1-\theta \right) \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}\)), such that

$$\begin{aligned} \sup \left( {\hat{U}}(\ x)-U(\ x)\right)&-\inf \left( {\hat{U}}(\ x)-U(\ x)\right) \le \sum _{i}L_{i}R^{1+\alpha _{i}}+\frac{4\mu }{\left( 2-\theta \right) }\ R^{2-\theta }. \end{aligned}$$
(7.40)

Proof

Following [31]'s approach closely, let \(g(\ x)=\frac{\mu }{2\left( 2-\theta \right) }\ \left( 1+\left\| x\right\| ^{2}\right) ^{1-\frac{\theta }{2}}\) for \(0\le \theta <1\). The gradient of \(g\left( x\right) \) is \(\nabla g(\ x)=\frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}x\) and the Hessian of \(g\left( x\right) \) is

$$\begin{aligned} \nabla ^{2}g(\ x)&=\frac{\mu }{2}\left[ \left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}-\theta \left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}-1}xx^{T}\right] \nonumber \\&\preceq \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}. \end{aligned}$$
(7.41)

On the other hand, we also have

$$\begin{aligned} \nabla ^{2}g(\ x)&=\frac{\mu }{2}\left[ \left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}-\theta \left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}-1}xx^{T}\right] \nonumber \\&=\frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}-1}\left[ I_{d}+I_{d}\left\| x\right\| ^{2}-\theta \left\| x\right\| ^{2}\frac{xx^{T}}{\left\| x\right\| ^{2}}\right] \nonumber \\&=\frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}-1}\left[ I_{d}+I_{d}\left( 1-\theta \right) \left\| x\right\| ^{2}+\theta \left\| x\right\| ^{2}\left( I_{d}-\frac{xx^{T}}{\left\| x\right\| ^{2}}\right) \right] \nonumber \\&\succeq \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}-1}\left( \left( 1-\theta \right) \left\| x\right\| ^{2}+1\right) I_{d}\nonumber \\&\succeq \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}-1}\left( \left( 1-\theta \right) \left( \left\| x\right\| ^{2}+1\right) \right) I_{d}\nonumber \\&\succeq \left( 1-\theta \right) \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}. \end{aligned}$$
(7.42)

We adapt [43] by denoting \({\tilde{U}}\left( x\right) =U\left( x\right) -g\left( x\right) .\) Since \(U\left( x\right) \) is \(\left( \mu ,\theta \right) \)-degenerated convex outside the ball, we deduce for every \(\left\| x\right\| \ge R,\)

$$\begin{aligned} \nabla ^{2}{\tilde{U}}\left( x\right)&=\nabla ^{2}U\left( x\right) -\nabla ^{2}g\left( x\right) \nonumber \\&\succeq \mu \left( 1+\left\| x\right\| {}^{2}\right) ^{-\frac{\theta }{2}}I_{d}-\frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}\nonumber \\&\succeq \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}I_{d}, \end{aligned}$$
(7.43)

which implies that \({\tilde{U}}\left( x\right) \) is \(\left( \frac{\mu }{2},\theta \right) \)-degenerated convex outside the ball. Now, we construct \({\hat{U}}(\ x)\) so that it is twice differentiable, degenerated convex on all of \({\mathbb {R}}^{d}\), and differs from \(U(\ x)\) by no more than \(2\sum _{i}L_{i}R^{1+\alpha _{i}}+\frac{4\mu }{\left( 2-\theta \right) }\ R^{2-\theta }\). Based on the same construction as [31], we first define the function V as the convex extension [43] of \({\tilde{U}}\) from the domain \(\Omega ={\mathbb {R}}^{d}\setminus {\mathbb {B}}\left( 0,R\right) \) to its convex hull \(\Omega ^{co}\), \(V\left( x\right) =\inf \left\{ \sum _{i}\lambda _{i}{\tilde{U}}(\ x_{i})\right\} \) for every \(x\in {\mathbb {R}}^{d}.\) Since \({\tilde{U}}(\ x)\) is convex in \(\Omega \), \(V(\ x)={\tilde{U}}(\ x)\) for \(\ x\in \Omega \). By Lemma 4.1, \(V(\ x)\) is convex on the entire domain \({\mathbb {R}}^{d}\) and \(V(\ x)\) can be represented as

$$\begin{aligned} V(\ x)=\inf _{\begin{array}{c} \{\ x_{j}\}\subset \Omega ,\\ \left\{ \lambda _{j}\big \vert \sum _{j}\lambda _{j}=1\right\} \\ \text {s.t.},\sum _{j}\lambda _{j}\ x_{j}=\ x,\ \text {and}\ \left\| x_{j}\right\| =R \end{array} }\left\{ \sum _{j}\lambda _{j}{\tilde{U}}(\ x_{j})\right\} . \end{aligned}$$
(7.44)

Therefore, \(\forall \ x\in {\mathbb {B}}(0,R)\), \(\inf _{\left\| {\bar{x}}\right\| =R}{\tilde{U}}({\bar{x}})\le V(\ x)\le \sup _{\left\| {\bar{x}}\right\| =R}{\tilde{U}}({\bar{x}})\). Next we construct \({\tilde{V}}(\ x)\) to be a smoothing of V on \({\mathbb {B}}\left( 0,R+\epsilon \right) \). Consider the function \(\varphi {\displaystyle (x)}\) of a variable x in \({\mathbb {R}}^{d}\) defined by

$$\begin{aligned} {\displaystyle \varphi (x)={\left\{ \begin{array}{ll} Ce^{-1/(1-\left\| x\right\| ^{2})}, &{} \text { if }\left\| x\right\| <1,\\ 0, &{} \text { if }\left\| x\right\| \ge 1, \end{array}\right. }} \end{aligned}$$
(7.45)

where the numerical constant C ensures normalization. Let \({\displaystyle \varphi _{\delta }(x)=\delta ^{-d}\varphi (\delta ^{-1}x)}\) be a smooth function supported on the ball \({\mathbb {B}}(0,\delta )\). Define

$$\begin{aligned} {\tilde{V}}(\ x)&=\int V(\ y)\varphi _{\delta }(\ x-y)\hbox {d}y\nonumber \\&=\int V(\ x-y)\varphi _{\delta }(y)\hbox {d}y\nonumber \\&=E_{y}\left[ V(x-y)\right] . \end{aligned}$$
(7.46)

The third equality implies that for any x and \(z\in {\mathbb {R}}^{d}\),

$$\begin{aligned} \left\langle \nabla {\tilde{V}}(\ x)-\nabla {\tilde{V}}(\ z),x-z\right\rangle&=\left\langle \nabla E_{y}\left[ V(x-y)\right] -\nabla E_{y}\left[ V(z-y)\right] ,x-z\right\rangle \nonumber \\&{\mathop {=}\limits ^{_{1}}}\left\langle E_{y}\left[ \nabla V(x-y)\right] -E_{y}\left[ \nabla V(z-y)\right] ,x-z\right\rangle \nonumber \\&=\left\langle E_{y}\left[ \nabla V(x-y)-\nabla V(z-y)\right] ,x-z\right\rangle \nonumber \\&=E_{y}\left\langle \nabla V(x-y)-\nabla V(z-y),x-z\right\rangle \nonumber \\&\ge 0, \end{aligned}$$
(7.47)

where step 1 follows from exchanging the gradient and the integral, and the last line follows from the convexity of V; hence \({\tilde{V}}\) is a smooth and convex function on \({\mathbb {R}}^{d}.\) Also, note that the definition of \({\tilde{V}}\) implies that \(\forall \left\| x\right\| <R+\epsilon \),

$$\begin{aligned} \inf _{\left\| {\bar{x}}\right\|<R+\epsilon +\delta }V({\bar{x}})\le {\tilde{V}}(\ x)\le \sup _{\left\| {\bar{x}}\right\| <R+\epsilon +\delta }V({\bar{x}}). \end{aligned}$$
(7.48)

Moreover, by Lemma 4.1, for all \(\left\| x\right\| <R+\epsilon \),

$$\begin{aligned} \inf _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+\epsilon +\delta \right) \setminus {\mathbb {B}}(0,R)}{\tilde{U}}({\bar{x}})\le {\tilde{V}}(\ x)\le \sup _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+\epsilon +\delta \right) \setminus {\mathbb {B}}(0,R)}{\tilde{U}}({\bar{x}}). \end{aligned}$$
(7.49)
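As an aside, the mollification \({\tilde{V}}=V*\varphi _{\delta }\) can be carried out numerically. The one-dimensional sketch below discretizes the bump \(\varphi \) of (7.45) and verifies the elementary consequence of Jensen's inequality that \({\tilde{V}}\ge V\) for a symmetric mollifier and convex V; the piecewise function used is an assumed example, not part of the construction above.

```python
import numpy as np

def V(x):
    # Assumed example: convex but nonsmooth (kink at the origin).
    return np.abs(x) + 0.1 * x ** 2

def phi(x):
    # The bump of (7.45) in one dimension, before normalization.
    out = np.zeros_like(x, dtype=float)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

delta = 0.3
y, h = np.linspace(-delta, delta, 2001, retstep=True)
w = phi(y / delta)
w /= np.sum(w) * h                  # normalized weights: a discrete version of phi_delta

xs = np.linspace(-3.0, 3.0, 601)
V_smooth = np.array([np.sum(V(x - y) * w) * h for x in xs])   # tilde V = V * phi_delta

# For a symmetric mollifier and convex V, Jensen's inequality gives tilde V >= V.
assert np.all(V_smooth >= V(xs) - 1e-9)
```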

Finally, we construct the auxiliary function:

$$\begin{aligned} {\hat{U}}(\ x)-g\left( x\right) =\left\{ \begin{array}{l} {\tilde{U}}(\ x),\ \left\| x\right\| \ge R+2\epsilon ,\\ \alpha (\ x){\tilde{U}}(\ x)+(1-\alpha (\ x)){\tilde{V}}(\ x),\ R+\epsilon<\left\| x\right\| <R+2\epsilon ,\\ {\tilde{V}}(\ x),\ \left\| x\right\| \le R+\epsilon , \end{array}\right. \end{aligned}$$
(7.50)

where \(\alpha (\ x)=\dfrac{1}{2}\cos \left( \pi \dfrac{\left\| x\right\| ^{2}}{\epsilon \left( 2R+3\epsilon \right) ^{2}}-\frac{\left( R+\epsilon \right) ^{2}}{\epsilon \left( 2R+3\epsilon \right) ^{2}}\pi \right) +\dfrac{1}{2}\). Here we know that \({\tilde{U}}(\ x)\) is convex and smooth in \({\mathbb {R}}^{d}\setminus {\mathbb {B}}\left( 0,R\right) \); \({\tilde{V}}(\ x)\) is also convex and smooth in \({\mathbb {R}}^{d}\setminus {\mathbb {B}}\left( 0,R+\epsilon \right) \). Hence, for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \),

$$\begin{aligned} \nabla ^{2}\left( {\hat{U}}(\ x)-g\left( x\right) \right)&=\nabla ^{2}{\tilde{U}}(\ x)+\nabla ^{2}\left( (1-\alpha (\ x))({\tilde{V}}(\ x)-{\tilde{U}}(\ x))\right) \nonumber \\&=\alpha (\ x)\nabla ^{2}{\tilde{U}}(\ x)+(1-\alpha (\ x))\nabla ^{2}{\tilde{V}}(\ x)\nonumber \\&-\nabla ^{2}\alpha (\ x)\left( {\tilde{V}}(\ x)-{\tilde{U}}(\ x)\right) -2\nabla \alpha (\ x)\left( \nabla {\tilde{V}}(\ x)-\nabla {\tilde{U}}(\ x)\right) ^{T}\nonumber \\&\succeq -\nabla ^{2}\alpha (\ x)\left( {\tilde{V}}(\ x)-{\tilde{U}}(\ x)\right) -2\nabla \alpha (\ x)\left( \nabla {\tilde{V}}(\ x)-\nabla {\tilde{U}}(\ x)\right) ^{T}. \end{aligned}$$
(7.51)

Note that for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \), we have

$$\begin{aligned}&\left\| \nabla g(\ x)-\nabla g(\ x-y)\right\| \nonumber \\&\quad =\left\| \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}x-\frac{\mu }{2}\left( 1+\left\| x-y\right\| ^{2}\right) ^{-\frac{\theta }{2}}\left( x-y\right) \right\| \end{aligned}$$
(7.52)
$$\begin{aligned}&\quad \le \left\| \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}x-\frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}\left( x-y\right) \right\| \nonumber \\&\qquad +\left\| \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}\left( x-y\right) -\frac{\mu }{2}\left( 1+\left\| x-y\right\| ^{2}\right) ^{-\frac{\theta }{2}}\left( x-y\right) \right\| \nonumber \\&\quad \le \frac{\mu }{2}\left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}\left\| y\right\| +\frac{\mu }{2}\Vert \left( 1+\left\| x\right\| ^{2}\right) ^{-\frac{\theta }{2}}-\left( 1+\left\| x-y\right\| ^{2}\right) ^{-\frac{\theta }{2}}\Vert \left\| x-y\right\| \nonumber \\&\quad \le \frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta +\frac{\mu }{2}\frac{\Vert \left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}-\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}\Vert }{\left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}}\left\| \left( x-y\right) \right\| \nonumber \\&\quad {\mathop {\le }\limits ^{_{1}}}\frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta +\frac{\mu }{2}\frac{\Vert \left( 1+\left\| x\right\| ^{2}\right) -\left( 1+\left\| x-y\right\| ^{2}\right) \Vert }{\left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}}\left\| \left( x-y\right) \right\| \nonumber \\&\quad \le \frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta +\frac{\mu }{2}\frac{\Vert \left( \left\| x\right\| -\left\| x-y\right\| \right) \left( \left\| x\right\| +\left\| x-y\right\| \right) \Vert }{\left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}}\left\| \left( x-y\right) \right\| \nonumber \\&\quad {\mathop {\le }\limits ^{_{2}}}\frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta +\frac{\mu }{2}\frac{\left\| y\right\| \left( \left\| x\right\| +\left\| x-y\right\| \right) }{\left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}}\left\| \left( x-y\right) \right\| \nonumber \\&\quad \le \frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta +\frac{\mu }{2}\frac{2\left( R+2\epsilon +\delta \right) ^{2}\delta }{\left( 1+\left( R+\epsilon \right) ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left( R+\epsilon -\delta \right) ^{2}\right) ^{\frac{\theta }{2}}}, \end{aligned}$$
(7.53)

where 1 follows from Lemma 7.24, while 2 is due to the triangle inequality. As a result, we get

$$\begin{aligned} \left\| \nabla {\tilde{V}}(\ x)-\nabla {\tilde{U}}(\ x)\right\|&=\int \left\| \nabla {\tilde{U}}(\ x-\ y)-\nabla {\tilde{U}}(\ x)\right\| \varphi _{\delta }(\ y)\hbox {d}y\nonumber \\&\le \sum _{i}L_{i}\delta ^{\alpha _{i}}+\left\| \nabla g(\ x)-\nabla g(\ x-y)\right\| \nonumber \\&\le NL\delta ^{\alpha }+\frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta \end{aligned}$$
(7.54)
$$\begin{aligned}&\quad +\frac{\mu }{2}\frac{2\left( R+\epsilon -\delta \right) ^{2}\delta }{\left( 1+\left( R+\epsilon \right) ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left( R+\epsilon -\delta \right) ^{2}\right) ^{\frac{\theta }{2}}}. \end{aligned}$$
(7.55)

On the other hand, we also acquire

$$\begin{aligned}&\vert {\tilde{U}}(\textrm{x})-{\tilde{U}}(x-\textrm{y})\vert \nonumber \\&\quad \le \mathrm {\max }\left\{ \left\langle \nabla U(\mathrm {x-y}),\textrm{y}\right\rangle ,\left\langle \nabla U(\textrm{x}),\mathrm {-y}\right\rangle \right\} \nonumber \\&\qquad +\sum _{i}\frac{L}{1+\alpha _{i}}\Vert \textrm{y}\Vert ^{\alpha _{i}+1}+\Vert g\left( x\right) -g\left( x-y\right) \Vert \nonumber \\&\quad {\le \max }\left\{ \left\langle \nabla U(\mathrm {x-y}),\textrm{y}\right\rangle ,\left\langle \nabla U(\textrm{x}),\mathrm {-y}\right\rangle \right\} +\sum _{i}\frac{L}{1+\alpha _{i}}\Vert \textrm{y}\Vert ^{\alpha _{i}+1}\nonumber \\&\qquad +\Vert \frac{\mu }{2\left( 2-\theta \right) }\ \left( 1+\left\| x\right\| ^{2}\right) ^{1-\frac{\theta }{2}}-\frac{\mu }{2\left( 2-\theta \right) }\ \left( 1+\left\| x-y\right\| ^{2}\right) ^{1-\frac{\theta }{2}}\Vert \nonumber \\&\quad {\mathop {\le }\limits ^{_{1}}}\mathrm {\max }\left\{ \sum _{i}L_{i}\left\| \mathrm {x-y}\right\| ^{\alpha _{i}}\left\| y\right\| ,\sum _{i}L_{i}\left\| \textrm{x}\right\| ^{\alpha _{i}}\left\| y\right\| \right\} \nonumber \\&\qquad +\sum _{i}\frac{L}{1+\alpha _{i}}\Vert \textrm{y}\Vert ^{\alpha _{i}+1}+\frac{\mu }{2\left( 2-\theta \right) }\Vert \left( 1+\left\| x\right\| ^{2}\right) -\left( 1+\left\| x-y\right\| ^{2}\right) \Vert \end{aligned}$$
(7.56)
$$\begin{aligned}&\quad \le L\left\| y\right\| \mathrm {\max }\left\{ \sum _{i}L_{i}\left\| \mathrm {x-y}\right\| ^{\alpha _{i}},\sum _{i}L_{i}\left\| \textrm{x}\right\| ^{\alpha _{i}}\right\} \nonumber \\&\qquad +\sum _{i}\frac{L}{1+\alpha _{i}}\Vert \textrm{y}\Vert ^{\alpha _{i}+1}+\frac{\mu }{2\left( 2-\theta \right) }\Vert \left( \left\| x\right\| -\left\| x-y\right\| \right) \left( \left\| x\right\| +\left\| x-y\right\| \right) \Vert \end{aligned}$$
(7.57)
$$\begin{aligned}&\quad \le L\left\| y\right\| \mathrm {\max }\left\{ \sum _{i}L_{i}\left\| \mathrm {x-y}\right\| ^{\alpha _{i}},\sum _{i}L_{i}\left\| \textrm{x}\right\| ^{\alpha _{i}}\right\} \end{aligned}$$
(7.58)
$$\begin{aligned}&\qquad +\sum _{i}\frac{L}{1+\alpha _{i}}\Vert \textrm{y}\Vert ^{\alpha _{i}+1}+\frac{\mu }{2\left( 2-\theta \right) }\left( \left\| x\right\| +\left\| x-y\right\| \right) \left\| y\right\| , \end{aligned}$$
(7.59)

where 1 follows again from Lemma 7.24 and the last inequality is due to the triangle inequality. Hence, for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \) and \(\left\| y\right\| \le \delta \),

$$\begin{aligned} {\tilde{V}}(\ x)-{\tilde{U}}(\ x)&=\int \left( {\tilde{U}}(\ x-\ y)-{\tilde{U}}(\ x)\right) \varphi _{\delta }(\ y)d\ y\\&\le L\left\| y\right\| \mathrm {\max }\left\{ \sum _{i}L_{i}\left\| \mathrm {x-y}\right\| ^{\alpha _{i}},\sum _{i}L_{i}\left\| \textrm{x}\right\| ^{\alpha _{i}}\right\} \\&\quad +\sum _{i}\frac{L}{1+\alpha _{i}}\Vert \textrm{y}\Vert ^{\alpha _{i}+1}+\frac{\mu }{2\left( 2-\theta \right) }\left( \left\| x\right\| +\left\| x-y\right\| \right) \left\| y\right\| \\&\le L\delta \left[ \sum _{i}L_{i}\left( R+2\epsilon +\delta \right) ^{\alpha _{i}}\right] \\&\quad +\sum _{i}\frac{L}{1+\alpha _{i}}\delta ^{\alpha _{i}+1}+\frac{\mu }{\left( 2-\theta \right) }\left( R+2\epsilon +\delta \right) \delta . \end{aligned}$$

Therefore, when \(R+\epsilon<\left\| x\right\| <R+2\epsilon \),

$$\begin{aligned} \nabla ^{2}\left( {\hat{U}}(\ x)-g\left( x\right) \right)&\succeq -\frac{\left( R+\epsilon \right) ^{2}\pi \left( L\delta \left[ \sum _{i}L_{i}\left( R+2\epsilon +\delta \right) ^{\alpha _{i}}\right] \right) }{\epsilon \left( 2R+3\epsilon \right) }I_{d}\nonumber \\&\quad -\frac{\left( R+\epsilon \right) ^{2}\pi \left( +\sum _{i}\frac{L}{1+\alpha _{i}}\delta ^{\alpha _{i}+1}+\frac{\mu }{\left( 2-\theta \right) }\left( R+2\epsilon +\delta \right) \delta \right) }{\epsilon \left( 2R+3\epsilon \right) }I_{d}\nonumber \\&\quad -\frac{\left( R+\epsilon \right) ^{4}\pi ^{2}\left( NL\delta ^{\alpha }+\frac{\mu }{2}\left( 1+\left( R+\epsilon \right) {}^{2}\right) ^{-\frac{\theta }{2}}\delta \right) }{\epsilon ^{2}\left( 2R+3\epsilon \right) }I_{d}\nonumber \\&\quad -\frac{\left( R+\epsilon \right) ^{4}\pi ^{2}\left( \frac{\mu }{2}\frac{2\left( R+\epsilon -\delta \right) ^{2}\delta }{\left( 1+\left( R+\epsilon \right) ^{2}\right) ^{\frac{\theta }{2}}\left( 1+\left( R+\epsilon -\delta \right) ^{2}\right) ^{\frac{\theta }{2}}}\right) }{\epsilon ^{2}\left( 2R+3\epsilon \right) }I_{d}. \end{aligned}$$
(7.60)

Taking the limit when \(\delta \rightarrow 0^{+}\), we obtain that for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \), \(\nabla ^{2}\left( {\hat{U}}(\ x)-g\left( x\right) \right) \) is positive semi-definite; hence, it is positive semi-definite on the entire \({\mathbb {R}}^{d}\), or \({\hat{U}}(\ x)-g\left( x\right) \) is convex on \({\mathbb {R}}^{d}\). From (7.49), we know that for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \),

$$\begin{aligned} \inf _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) \setminus {\mathbb {B}}(0,R)}{\tilde{U}}({\bar{x}})&\le {\hat{U}}(\ x)-g\left( x\right) \le \sup _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) \setminus {\mathbb {B}}(0,R)}{\tilde{U}}({\bar{x}}). \end{aligned}$$
(7.61)

Therefore,

$$\begin{aligned}&\sup \left( {\hat{U}}(\ x)-U(\ x)\right) -\inf \left( {\hat{U}}(\ x)-U(\ x)\right) \nonumber \\&\quad =\sup \left( {\hat{U}}(\ x)-g\left( x\right) -{\tilde{U}}(\ x)\right) -\inf \left( {\hat{U}}(\ x)-g\left( x\right) -{\tilde{U}}(\ x)\right) \end{aligned}$$
(7.62)
$$\begin{aligned}&\quad \le 2\left( \sup _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) \setminus {\mathbb {B}}(0,R)}{\tilde{U}}({\bar{x}})-\inf _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) \setminus {\mathbb {B}}(0,R)}{\tilde{U}}({\bar{x}})\right) \nonumber \\&\quad \le 2\left( \sup _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) }{\tilde{U}}({\bar{x}})-\inf _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) }{\tilde{U}}({\bar{x}})\right) . \end{aligned}$$
(7.63)

Since U is \(\alpha \)-mixture weakly smooth and \(\nabla U(0)=0\), we deduce

$$\begin{aligned} \Vert U(\ x)-U(0)\Vert&=\Vert U(\ x)-U(0)-\ \left\langle x,\nabla U(0)\right\rangle \Vert \nonumber \\&\le \sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left\| x\right\| ^{1+\alpha _{i}}\nonumber \\&\le \sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left( R+2\epsilon \right) ^{1+\alpha _{i}}\nonumber \\&\le \sum _{i}L_{i}R^{1+\alpha _{i}} \end{aligned}$$
(7.64)

and

$$\begin{aligned} \Vert g(\ x)\Vert&=\left\| \frac{\mu }{2\left( 2-\theta \right) }\ \left( 1+\left\| x\right\| ^{2}\right) ^{1-\frac{\theta }{2}}\right\| \nonumber \\&\le \frac{\mu }{2\left( 2-\theta \right) }\ \left( 1+\left( R+2\epsilon \right) ^{2}\right) ^{1-\frac{\theta }{2}}\nonumber \\&\le \frac{\mu }{\left( 2-\theta \right) }\ R^{2-\theta }. \end{aligned}$$
(7.65)

So, for all \(\left\| x\right\| \le R+2\epsilon \) and \(\epsilon \) sufficiently small,

$$\begin{aligned} \sup _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) }{\tilde{U}}({\bar{x}})-\inf _{{\bar{x}}\in {\mathbb {B}}\left( 0,R+2\epsilon \right) }&{\tilde{U}}({\bar{x}})\le \sum _{i}L_{i}R^{1+\alpha _{i}}+\frac{2\mu }{\left( 2-\theta \right) }\ R^{2-\theta }. \end{aligned}$$

As a result, we get

$$\begin{aligned} \sup \left( {\hat{U}}(\ x)-U(\ x)\right) -\inf&\left( {\hat{U}}(\ x)-U(\ x)\right) \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}+\frac{4\mu }{\left( 2-\theta \right) }\ R^{2-\theta }. \end{aligned}$$

\(\square \)

Remark 7.16

When \(\theta =0,\) \(\left( \mu ,\theta \right) \)-degenerated convexity outside the ball is equivalent to \(\mu \)-strong convexity outside the ball, and we obtain a result for potentials that are strongly convex outside a ball, similar to [31], but for \(\alpha \)-mixture weakly smooth rather than smooth potentials. The constant could be improved by a factor of 2 if we take \(\epsilon \) to be arbitrarily small.

1.5.3 Proof of Lemma 4.4

Lemma 7.17

For U satisfying \(\gamma -\)Poincaré, \(\alpha \)-mixture weakly smooth with \(\alpha _{N}=1\), and \(2-\)dissipative, there exists \(\breve{U}\in C^{1}({\mathbb {R}}^{d})\) with a Hessian that exists everywhere on \({\mathbb {R}}^{d}\), and \(\breve{U}\) satisfies a log-Sobolev inequality on \({\mathbb {R}}^{d}\), such that

$$\begin{aligned} \sup \left( \breve{U}(\ x)-U(\ x)\right) -\inf \left( \breve{U}(\ x)-U(\ x)\right) \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }.\nonumber \\ \end{aligned}$$
(7.66)

Proof

First, given \(R>0,\) let \({\overline{U}}(\textrm{x}):=U(\textrm{x})+\frac{L_{N}+\lambda _{0}}{2}\left\| x\right\| ^{2}\) with \(\lambda _{0}=\frac{2\,L}{R^{1-\alpha }}\). We obtain the following property:

$$\begin{aligned}&\left\langle \nabla {\overline{U}}(\textrm{x})-\nabla {\overline{U}}(\textrm{y}),x-y\right\rangle \nonumber \\&\quad =\left\langle \nabla \left( U(\textrm{x})+\frac{L_{N}+\lambda _{0}}{2}\left\| x\right\| ^{2}\right) -\nabla \left( U(\textrm{y})+\frac{L_{N}+\lambda _{0}}{2}\left\| y\right\| ^{2}\right) ,x-y\right\rangle \end{aligned}$$
(7.67)
$$\begin{aligned}&\quad =\left\langle \nabla U(\textrm{x})-\nabla U(\textrm{y})+(L_{N}+\lambda _{0})\left( x-y\right) ,x-y\right\rangle \nonumber \\&\quad {\mathop {\ge }\limits ^{i}}-\sum _{i<N}L_{i}\left\| x-y\right\| ^{1+\alpha }+\lambda _{0}\left\| x-y\right\| ^{2}\nonumber \\&\quad \ge \frac{\lambda _{0}}{2}\left\| x-y\right\| ^{2}\,for\,\left\| x-y\right\| \ge \left( \frac{NL}{\lambda _{0}}\right) ^{\frac{1}{1-\alpha _{1}}}=R, \end{aligned}$$
(7.68)

where (i) follows from Assumption 2.1. This implies that \({\overline{U}}(\textrm{x})\) is \(\frac{\lambda _{0}}{2}-\)strongly convex outside the ball \(B_{R}=\left\{ x:\left\| x\right\| \le R\right\} \). Though \({\overline{U}}(\textrm{x})\) does not exactly satisfy the assumptions of Lemma 4.2, with some additional verification we can still apply Lemma 4.2 to derive the result. We sketch the proof as follows. There exists \({\hat{U}}\in C^{1}({\mathbb {R}}^{d})\) with a Hessian that exists everywhere on \({\mathbb {R}}^{d}\),

$$\begin{aligned} {\hat{U}}(\ x)-\frac{\lambda _{0}}{4}\left\| x\right\| ^{2}=\left\{ \begin{array}{l} \tilde{{\overline{U}}}(\ x),\ \left\| x\right\| \ge R+2\epsilon ,\\ \alpha (\ x)\tilde{{\overline{U}}}(\ x)+(1-\alpha (\ x)){\tilde{V}}(\ x),\ R+\epsilon<\left\| x\right\| <R+2\epsilon ,\\ {\tilde{V}}(\ x),\ \left\| x\right\| \le R+\epsilon , \end{array}\right. \end{aligned}$$
(7.69)

where \(\alpha (\ x)\) is defined as before. Both \(\tilde{{\overline{U}}}(\ x)\) and \({\tilde{V}}(\ x)\) are convex and smooth in \({\mathbb {R}}^{d}\setminus {\mathbb {B}}\left( 0,R\right) \) and for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \), \(\left\| y\right\| \le \delta \),

$$\begin{aligned} \nabla ^{2}\left( {\hat{U}}(\ x)-\frac{\lambda _{0}}{4}\left\| x\right\| ^{2}\right)&\succeq -\nabla ^{2}\alpha (\ x)\left( {\tilde{V}}(\ x)-\tilde{{\overline{U}}}(\ x)\right) \nonumber \\&\quad -2\nabla \alpha (\ x)\left( \nabla {\tilde{V}}(\ x)-\nabla \tilde{{\overline{U}}}(\ x)\right) ^{T}. \end{aligned}$$
(7.70)

In this case, we have

$$\begin{aligned} \left\| \nabla {\tilde{V}}(\ x)-\nabla \tilde{{\overline{U}}}(\ x)\right\|&=\left\| \nabla \int \left( {\overline{U}}(\ x-\ y)-{\overline{U}}(\ x)\right) \varphi _{\delta }(\ y)\hbox {d}y\right\| \nonumber \\&{\mathop {\le }\limits ^{_{1}}}\left\| \nabla \int \left( U(\ x-\ y)-U(\ x)\right) \varphi _{\delta }(\ y)\hbox {d}y\right\| \nonumber \\&\quad +\lambda _{0}\int \left\| y\right\| \varphi _{\delta }(\ y)\hbox {d}y\nonumber \\&\le \left\| \int \left( \nabla U(\ x-\ y)-\nabla U(\ x)\right) \varphi _{\delta }(\ y)\hbox {d}y\right\| +\lambda _{0}\delta \nonumber \\&\le \sum _{i}L_{i}\delta ^{\alpha _{1}}+\lambda _{0}\delta , \end{aligned}$$
(7.71)

where 1 holds by the triangle inequality and the last line follows from the \(\alpha \)-mixture weakly smooth assumption, while

$$\begin{aligned}&\Vert \tilde{{\overline{U}}}(\textrm{x})-\tilde{{\overline{U}}}(x-\textrm{y})\Vert \nonumber \\&\quad {\mathop {\le }\limits ^{_{1}}}\Vert {\overline{U}}(\textrm{x})-{\overline{U}}(x-\textrm{y})\Vert +\Vert \frac{L+\lambda _{0}}{2}\left\| x\right\| ^{2}-\frac{L+\lambda _{0}}{2}\left\| x-y\right\| ^{2}\Vert \nonumber \\&\quad {\mathop {\le }\limits ^{_{2}}}\left\{ \left\langle \nabla U(\mathrm {x-y}),\textrm{y}\right\rangle \vee \left\langle \nabla U(\textrm{x}),\mathrm {-y}\right\rangle \right\} +\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left\| y\right\| ^{\alpha _{i}+1}\nonumber \\&\qquad +\frac{L_{N}+\lambda _{0}}{2}\Vert \left\| x\right\| ^{2}-\left\| x-y\right\| ^{2}\Vert \end{aligned}$$
(7.72)
$$\begin{aligned}&\quad {\le }\left\{ \left\langle \nabla U(\mathrm {x-y}),\textrm{y}\right\rangle \vee \left\langle \nabla U(\textrm{x}),\mathrm {-y}\right\rangle \right\} +\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left\| y\right\| ^{\alpha _{i}+1}\nonumber \\&\qquad +\frac{L_{N}+\lambda _{0}}{2}\left( \left\| x\right\| -\left\| x-y\right\| \right) \left( \left\| x\right\| +\left\| x-y\right\| \right) \end{aligned}$$
(7.73)
$$\begin{aligned}&\quad \le \left\{ \left( \sum _{i}L_{i}\left\| \mathrm {x-y}\right\| ^{\alpha _{i}}\right) \left\| y\right\| \vee \left( \sum _{i}L_{i}\left\| \textrm{x}\right\| ^{\alpha _{i}}\right) \left\| y\right\| \right\} \nonumber \\&\qquad +\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left\| y\right\| ^{\alpha _{i}+1}+\frac{L_{N}+\lambda _{0}}{2}\left\| y\right\| \mathrm {\max }\left\{ \left\| \mathrm {x-y}\right\| ,\left\| \textrm{x}\right\| \right\} \end{aligned}$$
(7.74)
$$\begin{aligned}&\quad \le \sum _{i}L_{i}\left( R+2\epsilon +\delta \right) ^{\alpha _{i}}\delta \end{aligned}$$
(7.75)
$$\begin{aligned}&\qquad +\sum _{i}\frac{L_{i}}{1+\alpha _{i}}\delta ^{\alpha _{i}+1}+\frac{L_{N}+\lambda _{0}}{2}\left( R+2\epsilon +\delta \right) \delta , \end{aligned}$$
(7.76)

where 1 is due to the triangle inequality, 2 follows from Assumption 2.1, and the last line holds by substituting the bounds on \(\left\| x\right\| \), \(\left\| x-y\right\| \), and \(\left\| y\right\| \). Taking the limit as \(\delta \rightarrow 0^{+}\), and for sufficiently small \(\epsilon \), we obtain that \({\hat{U}}(\ x)-\frac{\lambda _{0}}{4}\left\| x\right\| ^{2}\) is convex on all of \({\mathbb {R}}^{d}\), i.e., \({\hat{U}}(\ x)\) is \(\frac{\lambda _{0}}{2}\)-strongly convex. By definition of \({\overline{U}}\), for \(R+\epsilon<\left\| x\right\| <R+2\epsilon \) we obtain

$$\begin{aligned} \Vert {\overline{U}}(\ x)-{\overline{U}}(0)\Vert&\le \Vert U(\ x)-U(0)-\ \left\langle x,\nabla U(0)\right\rangle \Vert +\frac{L_{N}+\lambda _{0}}{2}\left\| x\right\| ^{2}\nonumber \\&\le \sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left\| x\right\| ^{\alpha _{i}+1}+\frac{L_{N}+\lambda _{0}}{2}\left\| x\right\| ^{2}\nonumber \\&\le \sum _{i}\frac{L_{i}}{1+\alpha _{i}}\left( R+2\epsilon +\delta \right) ^{\alpha _{i}+1}+\frac{L_{N}+\lambda _{0}}{2}\left( R+2\epsilon +\delta \right) ^{2}\nonumber \\&\le \sum _{i}L_{i}R^{1+\alpha _{i}}+\left( L_{N}+\lambda _{0}\right) R^{2}. \end{aligned}$$
(7.77)

As a result, from Lemma 4.2 we deduce

$$\begin{aligned} \sup \left( {\hat{U}}(\ x)-{\overline{U}}(\ x)\right)&-\inf \left( {\hat{U}}(\ x)-{\overline{U}}(\ x)\right) \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}+2\left( L_{N}+\lambda _{0}\right) R^{2}. \end{aligned}$$
(7.78)

Let \(\breve{U}\left( x\right) ={\hat{U}}\left( x\right) -\left( \frac{L_{N}}{2}+\frac{\lambda _{0}}{4}\right) \left\| x\right\| ^{2}\); then, for \(\left\| x\right\| >R+2\epsilon +\delta \), \({\hat{U}}\left( x\right) ={\overline{U}}\left( x\right) \), so \(\breve{U}\left( x\right) =U\left( x\right) \). For \(\left\| x\right\| \le R+2\epsilon +\delta \), we have

$$\begin{aligned}&\sup \left( \breve{U}(x)-U(x)\right) -\inf \left( \breve{U}(\ x)-U(x)\right) \nonumber \\&\quad \le \sup \left( {\hat{U}}(x)+\frac{L_{N}+\lambda _{0}}{2}\left\| x\right\| ^{2}-{\overline{U}}(x)\right) -\inf \left( {\hat{U}}(x)+\frac{L_{N}+\lambda _{0}}{2}\left\| x\right\| ^{2}-{\overline{U}}(x)\right) \nonumber \\&\quad \le \sup \left( {\hat{U}}(x)-{\overline{U}}(x)\right) -\inf \left( {\hat{U}}(x)-{\overline{U}}(x)\right) +\left( L_{N}+\lambda _{0}\right) \left( R+2\epsilon +\delta \right) ^{2}\nonumber \\&\quad \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}+2\left( L_{N}+\lambda _{0}\right) R^{2}+2\left( L_{N}+\lambda _{0}\right) R^{2}.\nonumber \\&\quad \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }. \end{aligned}$$
(7.79)

So for every \(x\in {\mathbb {R}}^{d},\)

$$\begin{aligned} \sup \left( \breve{U}(x)-U(x)\right) -\inf \left( \breve{U}(\ x)-U(x)\right) \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }. \end{aligned}$$

Since U(x) satisfies \(PI(\gamma )\), using [25]'s Lemma 1.2 we deduce that \(\breve{U}(\ x)\) satisfies a Poincaré inequality with constant

$$\begin{aligned} \gamma _{1}=\gamma e^{-4\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }. \end{aligned}$$

On the other hand, we know that \(\nabla ^{2}\breve{U}\left( x\right) =\nabla ^{2}{\hat{U}}\left( x\right) -\left( L_{N}+\frac{\lambda _{0}}{2}\right) I\succeq -LI\), since \({\hat{U}}\left( x\right) \) is \(\frac{\lambda _{0}}{2}-\)strongly convex; that is, \(\nabla ^{2}\breve{U}\left( x\right) \) is lower bounded by \(-LI\). In addition, for \(\left\| x\right\| >R+2\epsilon +\delta \), the \(2-\)dissipative assumption gives, for some a, \(b>0,\left\langle \nabla \breve{U}(\textrm{x}),x\right\rangle \ge a\left\| x\right\| ^{2}-b\), while for \(\left\| x\right\| \le R+2\epsilon +\delta \)

$$\begin{aligned} \left\langle \nabla \breve{U}\left( x\right) ,x\right\rangle&\ge \left\langle -\nabla \left( \left( \frac{L_{N}}{2}+\frac{\lambda _{0}}{4}\right) \left\| x\right\| ^{2}\right) ,x\right\rangle \\&\ge -\left( L_{N}+\frac{\lambda _{0}}{2}\right) \left\| x\right\| ^{2}\\&\ge -\left( L_{N}+\frac{\lambda _{0}}{2}\right) R^{2}.\\&\ge a\left\| x\right\| ^{2}-\left( L_{N}+\frac{\lambda _{0}}{2}\right) R^{2}-aR^{2}, \end{aligned}$$

so for every \(x\in {\mathbb {R}}^{d},\)

$$\begin{aligned} \left\langle \nabla \breve{U}(\textrm{x}),x\right\rangle \ge a\left\| x\right\| ^{2}-\left( b+\left( L_{N}+\frac{\lambda _{0}}{2}\right) R^{2}+aR^{2}\right) . \end{aligned}$$

We choose \(W=e^{a_{1}\left\| x\right\| ^{2}}\) and \(V=a_{1}\left\| x\right\| ^{2}\) with \(0<a_{1}=\frac{a}{4}\). One sees that W satisfies the Lyapunov inequality for the generator \({\mathcal {L}}=\Delta -\left\langle \nabla \breve{U},\nabla \right\rangle \):

$$\begin{aligned} {\mathcal {L}}W&=\left( 2a_{1}d+4a_{1}^{2}\left\| x\right\| ^{2}-2a_{1}\left\langle \nabla \breve{U}(\textrm{x}),x\right\rangle \right) W\nonumber \\&\le \left( 2a_{1}d+4a_{1}^{2}\left\| x\right\| ^{2}-2a_{1}a\left\| x\right\| ^{2}+2a_{1}\left( b+\left( L_{N}+\frac{\lambda _{0}}{2}\right) R^{2}+aR^{2}\right) \right) W\nonumber \\&\le \left( -\frac{a^{2}}{4}\left\| x\right\| ^{2}+\frac{a}{2}\left( b+\left( L_{N}+\frac{\lambda _{0}}{2}\right) R^{2}+aR^{2}+d\right) \right) W. \end{aligned}$$
(7.80)

By [5]'s Theorem 1.9, \(\breve{U}\left( x\right) \) satisfies a defective log-Sobolev inequality. In addition, by Rothaus' lemma, a defective log-Sobolev inequality together with \(PI(\gamma _{1})\) implies a log-Sobolev inequality with constant \(\gamma _{2}=\frac{2}{A+(B+2)\frac{1}{\gamma _{1}}}\), where

$$\begin{aligned} A&=\left( 1-\frac{L}{2}\right) \frac{8}{a^{2}}+\zeta , \end{aligned}$$
(7.81)
$$\begin{aligned} B&=2\left[ \frac{2\left( \left( b+4\left( L_{N}+\frac{\lambda _{0}}{4}\right) R^{2}+aR^{2}\right) +d\right) }{a}+M_{2}\right] \left( 1-\frac{L}{2}+\frac{1}{\zeta }\right) , \end{aligned}$$
(7.82)

where \(M_{2}=\int \left\| x\right\| ^{2}e^{-\breve{U}(x)}\hbox {d}x\). But it is well known from Lemma 10 that \(M_{2}=O(d)\), so the log-Sobolev constant \(\gamma _{2}\) is of order \(\frac{1}{O(d)}\). This concludes the proof. \(\square \)
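The generator computation behind (7.80) can be checked symbolically in one dimension, where \({\mathcal {L}}W=W''-\breve{U}'W'\). The sketch below verifies \({\mathcal {L}}W=\left( 2a_{1}+4a_{1}^{2}x^{2}-2a_{1}x\breve{U}'(x)\right) W\) for \(W=e^{a_{1}x^{2}}\), using a generic placeholder potential; it is a sanity check only, not part of the proof.

```python
import sympy as sp

x, a1 = sp.symbols('x a1', real=True)
Ubreve = sp.Function('U')(x)          # placeholder for the modified potential breve U
W = sp.exp(a1 * x ** 2)

# One-dimensional Langevin generator applied to W: L W = W'' - U' W'.
LW = sp.diff(W, x, 2) - sp.diff(Ubreve, x) * sp.diff(W, x)

# Claimed form from (7.80) with d = 1.
claimed = (2 * a1 + 4 * a1 ** 2 * x ** 2 - 2 * a1 * x * sp.diff(Ubreve, x)) * W

assert sp.simplify(LW - claimed) == 0
```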

1.5.4 Proof of Lemma 4.5

Theorem 7.18

Suppose \(\pi \) is \(\gamma -\)Poincaré, \(\alpha \)-mixture weakly smooth with \(\alpha _{N}=1\), and \(2-\)dissipative (i.e., \(\left\langle \nabla U(x),x\right\rangle \ge a\left\| x\right\| ^{2}-b\)) for some \(a,b>0\). Then, for any \(x_{0}\sim p_{0}\) with \(H(p_{0}\vert \pi )=C_{0}<\infty \), the iterates \(x_{k}\sim p_{k}\) of LMC with step size \(\eta \le 1\wedge \frac{1}{4\gamma _{3}}\wedge \left( \frac{\gamma _{3}}{16L^{1+\alpha }}\right) ^{\frac{1}{\alpha }}\) satisfy

$$\begin{aligned} H(p_{k}\vert \pi )\le e^{-\gamma _{3}\eta k}H(p_{0}\vert \pi )+\frac{8\eta ^{\alpha }D_{3}}{3\gamma _{3}}, \end{aligned}$$
(7.83)

where \(D_{3}\) is defined as in equation (3.8) and

$$\begin{aligned} M_{2}&=\int \left\| x\right\| ^{2}e^{-\breve{U}(x)}\textrm{d}x=O(d) \end{aligned}$$
(7.84)
$$\begin{aligned} \zeta&=\sqrt{2\left[ \frac{2\left( b+\left( L+\frac{\lambda _{0}}{2}\right) R^{2}+aR^{2}+d\right) }{a}+M_{2}\right] \frac{e^{4\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }}{\gamma }}\end{aligned}$$
(7.85)
$$\begin{aligned} A&=\left( 1-\frac{L}{2}\right) \frac{8}{a^{2}}+\zeta ,\end{aligned}$$
(7.86)
$$\begin{aligned} B&=2\left[ \frac{2\left( \left( b+4\left( L+\frac{\lambda _{0}}{4}\right) R^{2}+aR^{2}\right) +d\right) }{a}+M_{2}\right] \left( 1-\frac{L}{2}+\frac{1}{\zeta }\right) ,\nonumber \\ \gamma _{3}&=\frac{2\gamma e^{-\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }}{A\gamma +(B+2)e^{4\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }}. \end{aligned}$$
(7.87)

Then, for any \(\epsilon >0\), to achieve \(H(p_{k}\vert \pi )<\epsilon \), it suffices to run ULA with step size \(\eta \le 1\wedge \frac{1}{4\gamma _{3}}\wedge \left( \frac{\gamma _{3}}{16L^{1+\alpha }}\right) ^{\frac{1}{\alpha }}\wedge \left( \frac{3\epsilon \gamma _{3}}{16D_{3}}\right) ^{\frac{1}{\alpha }}\)for \(k\ge \frac{1}{\gamma _{3}\eta }\log \frac{2\,H\left( p_{0}\vert \pi \right) }{\epsilon }\) iterations.

Proof

From Lemma 7.17, we can optimize over \(\zeta \) and get

$$\begin{aligned} \zeta =\sqrt{2\left[ \frac{2\left( b+\left( L+\frac{\lambda _{0}}{2}\right) R^{2}+aR^{2}+d\right) }{a}+M_{2}\right] \frac{1}{\gamma _{1}}}. \end{aligned}$$

By the Holley–Stroock perturbation theorem [21], U(x) satisfies a log-Sobolev inequality on \({\mathbb {R}}^{d}\) with constant

$$\begin{aligned} \gamma _{3}=\frac{2\gamma e^{-\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }}{A\gamma +(B+2)e^{4\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }}. \end{aligned}$$

Applying Theorem 3.2, we get the desired result.\(\square \)

1.5.5 Proof of Theorem 4.5

Lemma 7.19

If U satisfies Assumption 2.3, then

$$\begin{aligned} U(x)\ge \frac{a}{2\beta }\Vert x\Vert ^{\beta }+U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}R^{\alpha _{i}+1}-b. \end{aligned}$$
(7.88)

Proof

Using the technique of [18], let \(R=\left( \frac{2b}{a}\right) ^{\frac{1}{\beta }}\). We first lower bound \(U\left( x\right) \) when \(\left\| x\right\| \le R\):

$$\begin{aligned} U(x)&=U(0)+\int _{0}^{1}\left\langle \nabla U(tx),\ x\right\rangle \hbox {d}t\nonumber \\&\ge U(0)-\int _{0}^{1}\left\| \nabla U(tx)\right\| \left\| x\right\| \hbox {d}t\nonumber \\&\ge U(0)-\sum _{i}L_{i}\left\| x\right\| ^{\alpha _{i}+1}\int _{0}^{1}t^{\alpha _{i}}\hbox {d}t\nonumber \\&\ge U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left\| x\right\| ^{\alpha _{i}+1}\nonumber \\&\ge U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}R^{\alpha _{i}+1}. \end{aligned}$$
(7.89)

For \(\left\| x\right\| >R\), we can lower bound U as follows.

$$\begin{aligned} U(x)&=U(0)+\int _{0}^{\frac{R}{\Vert x\Vert }}\left\langle \nabla U(tx),\ x\right\rangle \hbox {d}t+\int _{\frac{R}{\Vert x\Vert }}^{1}\left\langle \nabla U(tx),\ x\right\rangle \hbox {d}t\nonumber \\&\ge U(0)-\int _{0}^{\frac{R}{\left\| x\right\| }}\left\| \nabla U(tx)\right\| \left\| x\right\| \hbox {d}t+\int _{\frac{R}{\left\| x\right\| }}^{1}\frac{1}{t}\left\langle \nabla U(tx),\ tx\right\rangle \hbox {d}t\nonumber \\&\ge U(0)-\left\| x\right\| \int _{0}^{\frac{R}{\left\| x\right\| }}\sum _{i}L_{i}\left\| tx\right\| ^{\alpha _{i}}\hbox {d}t+\int _{\frac{R}{\left\| x\right\| }}^{1}\frac{1}{t}\left( a\left\| tx\right\| ^{\beta }-b\right) \hbox {d}t\nonumber \\&{\mathop {\ge }\limits ^{_{1}}}U(0)-\sum _{i}L_{i}\left\| x\right\| ^{\alpha _{i}+1}\int _{0}^{\frac{R}{\left\| x\right\| }}t^{\alpha _{i}}\hbox {d}t\ +\frac{1}{2}\int _{\frac{R}{\left\| x\right\| }}^{1}\frac{1}{t}a\left\| tx\right\| ^{\beta }\hbox {d}t\nonumber \\&{\mathop {\ge }\limits ^{_{2}}}U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left\| x\right\| ^{\alpha _{i}+1}\frac{R^{\alpha _{i}+1}}{\left\| x\right\| ^{\alpha _{i}+1}}+\frac{a}{2}\left\| x\right\| ^{\beta }\int _{\frac{R}{\left\| x\right\| }}^{1}t^{\beta -1}\hbox {d}t\nonumber \\&\ge U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}R^{\alpha _{i}+1}+\frac{a}{2\beta }\left\| x\right\| ^{\beta }\left( 1-\frac{R^{\beta }}{\left\| x\right\| ^{\beta }}\right) \nonumber \\&\ge \frac{a}{2\beta }\left\| x\right\| ^{\beta }+U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}R^{\alpha _{i}+1}-b, \end{aligned}$$
(7.90)

where 1 follows from Assumption 2.3 and 2 uses the fact that if \(t{\displaystyle \ge \frac{R}{\left\| x\right\| }}\) then \({\displaystyle a\left\| tx\right\| ^{\beta }-b\ge \frac{a}{2}\left\| tx\right\| ^{\beta }}.\) Now, since for \(\left\| x\right\| \le R\), \(\frac{a}{2\beta }\left\| x\right\| ^{\beta }\le b\), we combine the inequality for \(\left\| x\right\| \le R\) and get

$$\begin{aligned} U(x)\ge \frac{a}{2\beta }\left\| x\right\| ^{\beta }+U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}R^{\alpha _{i}+1}-b. \end{aligned}$$
(7.91)

\(\square \)
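As a quick numerical illustration (not part of the argument), the lower bound (7.91) can be checked on an assumed toy potential that is \(2\)-dissipative and mixture weakly smooth; the constants a, b, \(L_{i}\), \(\alpha _{i}\) below are those of this toy example only.

```python
import numpy as np

rng = np.random.default_rng(0)

def U(x):
    # Assumed toy potential: <grad U(x), x> = r^2 - r*sin(r) >= 0.5*r^2 - 0.5,
    # and ||grad U(x)|| <= 1 + ||x||, i.e. (L_1, alpha_1) = (1, 0), (L_2, alpha_2) = (1, 1).
    r = np.linalg.norm(x)
    return 0.5 * r ** 2 + np.cos(r)

a, b, beta = 0.5, 0.5, 2.0
Ls, alphas = [1.0, 1.0], [0.0, 1.0]
R = (2.0 * b / a) ** (1.0 / beta)
offset = U(np.zeros(3)) - sum(L / (al + 1.0) * R ** (al + 1.0) for L, al in zip(Ls, alphas)) - b

# Check U(x) >= (a / (2*beta)) * ||x||^beta + offset on random points.
for _ in range(10_000):
    x = rng.normal(size=3) * rng.uniform(0.1, 5.0)
    assert U(x) >= a / (2.0 * beta) * np.linalg.norm(x) ** beta + offset
```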

1.5.6 Proof of Theorem 4.7

Lemma 7.20

Assume that U satisfies Assumption 2.3, then for \(\pi =e^{-U}\) and any distribution \(\rho \), we have

$$\begin{aligned} \frac{4\beta }{a}\left[ \textrm{H}(\rho \vert \pi )+{\tilde{d}}+{\tilde{\mu }}\right] \ge \textrm{E}_{\rho }\left[ \left\| x\right\| {}^{\beta }\right] , \end{aligned}$$
(7.92)

where

$$\begin{aligned} {\tilde{\mu }}&=\frac{1}{2}\log \left( \frac{2}{\beta }\right) +\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}+b+\vert U(0)\vert , \end{aligned}$$
(7.93)
$$\begin{aligned} {\tilde{d}}&=\frac{d}{\beta }\left[ \frac{\beta }{2}log\left( \pi \right) +\log \left( \frac{4\beta }{a}\right) +\left( 1-\frac{\beta }{2}\right) \log \left( \frac{d}{2e}\right) \right] . \end{aligned}$$
(7.94)

Proof

Let \(q(x)=e^{\frac{a}{4\beta }\left\| x\right\| {}^{\beta }-U(x)}\) and \(C_{q}=\int e^{\frac{a}{4\beta }\left\| x\right\| {}^{\beta }-U(x)}\hbox {d}x\). First, we need to bound \(\log C_{q}\). Using Lemma 7.19, we have

$$\begin{aligned} U(x)&\ge \frac{a}{2\beta }\left\| x\right\| ^{\beta }+U(0)-\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}-b. \end{aligned}$$
(7.95)

Regrouping the terms and integrating both sides gives

$$\begin{aligned}&\int e^{\frac{a}{4\beta }\left\| x\right\| {}^{\beta }-U(x)}\hbox {d}x\le e^{-U(0)+\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}+b}\int e^{-\frac{a}{4\beta }\left\| x\right\| {}^{\beta }}\hbox {d}x\nonumber \\&\quad =\frac{2\pi ^{d/2}}{\beta }\left( \frac{4\beta }{a}\right) ^{\frac{d}{\beta }}e^{-U(0)+\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}+b}\frac{\Gamma \left( \frac{d}{\beta }\right) }{\Gamma \left( \frac{d}{2}\right) }\nonumber \\&\quad \le \frac{2\pi ^{d/2}}{\beta }\left( \frac{4\beta }{a}\right) ^{\frac{d}{\beta }}\frac{\left( \frac{d}{\beta }\right) ^{\frac{d}{\beta }-\frac{1}{2}}}{\left( \frac{d}{2}\right) ^{\frac{d}{2}-\frac{1}{2}}}e^{\frac{d}{2}-\frac{d}{\beta }}e^{-U(0)+\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}+b}, \end{aligned}$$
(7.96)

where the equality on the second line comes from using polar coordinates and the third line follows from an inequality for the ratio of Gamma functions [23]. Plugging this back into the previous inequality and taking logs, we deduce

$$\begin{aligned} {\displaystyle \log (C_{q})}&={\displaystyle \log \left( \int e^{\frac{a}{4\beta }\left\| x\right\| {}^{\beta }-U(x)}\hbox {d}x\right) }\nonumber \\&\le \frac{d}{2}\log (\pi )+\frac{d}{\beta }\log \left( \frac{4\beta }{a}\right) +\left( \frac{d}{\beta }-\frac{d}{2}\right) \log \left( \frac{d}{2e}\right) \nonumber \\&\quad +\left( \frac{d}{\beta }+\frac{1}{2}\right) \log \left( \frac{2}{\beta }\right) +\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}+b+\vert U(0)\vert \nonumber \\&\le \frac{d}{\beta }\left[ \frac{\beta }{2}log(\pi )+\log \left( \frac{4\beta }{a}\right) +\left( 1-\frac{\beta }{2}\right) \log \left( \frac{d}{2e}\right) \right] \nonumber \\&\quad +\frac{1}{2}\log \left( \frac{2}{\beta }\right) +\sum _{i}\frac{L_{i}}{\alpha _{i}+1}\left( \frac{2b}{a}\right) ^{\frac{\alpha _{i}+1}{\beta }}+b+\vert U(0)\vert \nonumber \\&\le {\tilde{d}}+\tilde{\mu ,} \end{aligned}$$
(7.97)

by the definitions of \({\tilde{d}}\) and \({\tilde{\mu }}\). Using this bound on \(\log C_{q}\), we get

$$\begin{aligned} \textrm{H}(\rho \vert \pi )&=\int \rho \log \frac{\rho }{q/C_{q}}+\int \rho \log \frac{q/C_{q}}{\pi }\nonumber \\&=\textrm{H}(\rho \vert q/C_{q})+\textrm{E}_{\rho }\left[ \log \frac{q/C_{q}}{e^{-U}}\right] \nonumber \\&{\mathop {\ge }\limits ^{_{\left( 1\right) }}}\frac{a}{4\beta }\textrm{E}_{\rho }\left[ \left\| x\right\| {}^{\beta }\right] -\log \left( C_{q}\right) \end{aligned}$$
(7.98)
$$\begin{aligned}&\ge \frac{a}{4\beta }\textrm{E}_{\rho }\left[ \left\| x\right\| {}^{\beta }\right] -{\tilde{d}}-{\tilde{\mu }}, \end{aligned}$$
(7.99)

where \(\left( 1\right) \) follows from the definition of \(C_{q}\) and the fact that the relative entropy \(\textrm{H}(\rho \vert q/C_{q})\) is always non-negative. Rearranging the terms completes the proof. \(\square \)

Theorem 7.21

Suppose \(\pi \) is non-strongly convex outside the ball \({\mathbb {B}}(0,R)\), \(\alpha \)-mixture weakly smooth with \(\alpha _{N}=1\), and \(2-\)dissipative (i.e., \(\left\langle \nabla U(x),x\right\rangle \ge a\left\| x\right\| ^{2}-b\)) for some \(a,b>0\). Then, for any \(x_{0}\sim p_{0}\) with \(H(p_{0}\vert \pi )=C_{0}<\infty \), the iterates \(x_{k}\sim p_{k}\) of LMC with step size \(\eta \le 1\wedge \frac{1}{4\gamma _{3}}\wedge \left( \frac{\gamma _{3}}{16L^{1+\alpha }}\right) ^{\frac{1}{\alpha }}\) satisfy

$$\begin{aligned} H(p_{k}\vert \pi )\le e^{-\gamma _{3}\eta k}H(p_{0}\vert \pi )+\frac{8\eta ^{\alpha }D_{3}}{3\gamma _{3}}, \end{aligned}$$
(7.100)

where \(D_{3}\) is defined as in equation (3.8) and for some universal constant K,

$$\begin{aligned} M_{2}&=\int \left\| x\right\| ^{2}e^{-\breve{U}(x)}\textrm{d}x=O(d) \end{aligned}$$
(7.101)
$$\begin{aligned} \zeta&=K\sqrt{64d\left[ \frac{2\left( b+\left( L+\frac{\lambda _{0}}{2}\right) R^{2}+aR^{2}+d\right) }{a}+M_{2}\right] \left( \frac{a+b+2aR^{2}+3}{ae^{-4\left( 4L_{N}R^{2}+4LR^{1+\alpha }\right) }}\right) }\end{aligned}$$
(7.102)
$$\begin{aligned} A&=\left( 1-\frac{L}{2}\right) \frac{8}{a^{2}}+\zeta ,\end{aligned}$$
(7.103)
$$\begin{aligned} B&=2\left[ \frac{2\left( \left( b+4\left( L+\frac{\lambda _{0}}{4}\right) R^{2}+aR^{2}\right) +d\right) }{a}+M_{2}\right] \left( 1-\frac{L}{2}+\frac{1}{\zeta }\right) ,\nonumber \\ \gamma _{3}&=\frac{2e^{-\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}+4L_{N}R^{2}+4LR^{1+\alpha }\right) }}{A+(B+2)32K^{2}d\left( \frac{a+b+2aR^{2}+3}{a}\right) e^{4\left( 4L_{N}R^{2}+4LR^{1+\alpha }\right) }}=\frac{1}{O(d)}. \end{aligned}$$
(7.104)

Then, for any \(\epsilon >0\), to achieve \(H(p_{k}\vert \pi )<\epsilon \), it suffices to run ULA with step size \(\eta \le 1\wedge \frac{1}{4\gamma _{3}}\wedge \left( \frac{\gamma _{3}}{16L^{1+\alpha }}\right) ^{\frac{1}{\alpha }}\wedge \left( \frac{3\epsilon \gamma _{3}}{16D_{3}}\right) ^{\frac{1}{\alpha }}\)for \(k\ge \frac{1}{\gamma _{3}\eta }\log \frac{2\,H\left( p_{0}\vert \pi \right) }{\epsilon }\) iterations.

Proof

Using Lemma 2, there exists \(\breve{U}\left( x\right) \in C^{1}({\mathbb {R}}^{d})\) with a Hessian that exists everywhere on \({\mathbb {R}}^{d}\), and \(\breve{U}\) is convex on \({\mathbb {R}}^{d}\), such that

$$\begin{aligned} \sup \left( \breve{U}(\ x)-U(\ x)\right) -\inf \left( \breve{U}(\ x)-U(\ x)\right) \le 2\sum _{i}L_{i}R^{1+\alpha _{i}}. \end{aligned}$$
(7.105)

We can prove this by two different approaches.

First approach: Since \(\breve{U}\) is convex, by Theorem 1.2 of [4], \(\breve{U}\) satisfies a Poincaré inequality with constant

$$\begin{aligned} \gamma&\ge \frac{1}{4K^{2}\int \left\| x-E_{\breve{\pi }}(x)\right\| ^{2}\breve{\pi }\left( x\right) \hbox {d}x}\\&{\mathop {\ge }\limits ^{_{1}}}\frac{1}{8K^{2}\left( E_{\breve{\pi }}\left( \left\| x\right\| ^{2}\right) +\left\| E_{\breve{\pi }}(x)\right\| ^{2}\right) }\\&\ge \frac{1}{16K^{2}E_{\breve{\pi }}\left( \left\| x\right\| ^{2}\right) }, \end{aligned}$$

where \(\breve{\pi }\propto e^{-\breve{U}}\), K is a universal constant, step 1 follows from Young's inequality, and the last line is due to Jensen's inequality. In addition, for \(\left\| x\right\| >R+2\epsilon +\delta \), the \(2-\)dissipative assumption gives, for some a, \(b>0,\left\langle \nabla \breve{U}(x),x\right\rangle =\left\langle \nabla U(x),x\right\rangle \ge a\left\| x\right\| ^{2}-b\), while for \(\left\| x\right\| \le R+2\epsilon +\delta \), by convexity of \(\breve{U}\),

$$\begin{aligned} \left\langle \nabla \breve{U}(x),x\right\rangle&\ge 0\\&\ge a\left\| x\right\| ^{2}-a\left( R+2\epsilon +\delta \right) ^{2}\\&\ge a\left\| x\right\| ^{2}-2aR^{2}. \end{aligned}$$

so for every \(x\in {\mathbb {R}}^{d},\)

$$\begin{aligned} \left\langle \nabla \breve{U}(x),x\right\rangle \ge a\left\| x\right\| ^{2}-\left( b+2aR^{2}\right) . \end{aligned}$$

Therefore, \(\breve{U}(\textrm{x})\) is also \(2-\)dissipative, which implies

$$\begin{aligned} E_{\breve{\pi }}\left( \left\| x\right\| ^{2}\right) \le 2d\left( \frac{a+b+2aR^{2}+3}{a}\right) , \end{aligned}$$

so the Poincaré constant satisfies

$$\begin{aligned} \gamma \ge \frac{1}{32K^{2}d\left( \frac{a+b+2aR^{2}+3}{a}\right) }. \end{aligned}$$

From [25]'s Lemma 1.2, we deduce that U satisfies a Poincaré inequality with constant

$$\begin{aligned} \gamma \ge \frac{1}{32K^{2}d\left( \frac{a+b+2aR^{2}+3}{a}\right) }e^{-4\left( 2\sum _{i}L_{i}R^{1+\alpha _{i}}\right) }. \end{aligned}$$

Now, applying the result of the previous section, we derive the desired result.

Second approach: Employing Lemma 7.25, combined with the \(2-\)dissipative assumption, we get

$$\begin{aligned} \int e^{\frac{a}{8}\left\| x\right\| {}^{2}-U(x)}\hbox {d}x\le e^{\left( {\tilde{d}}+{\tilde{\mu }}\right) }, \end{aligned}$$
(7.106)

which in turn implies

$$\begin{aligned} \int e^{\frac{a}{8}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x\le e^{\left( {\tilde{d}}+{\tilde{\mu }}\right) +2\sum _{i}L_{i}R^{1+\alpha _{i}}}. \end{aligned}$$
(7.107)

Let \(\mu _{1}=\frac{e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}}{\int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x}\) and define \(\mu _{2}=\frac{\mu _{1}e^{\frac{a}{16p}\left\| x\right\| {}^{2}}}{\int e^{\frac{a}{16p}\left\| x\right\| {}^{2}}\hbox {d}\mu _{1}}\). The measure \(\mu _{1}\) is \(\frac{a}{8p}\)-strongly log-concave and hence satisfies a log-Sobolev inequality with constant \(\frac{a}{8p}\), and by the Cauchy–Schwarz inequality, we have

$$\begin{aligned} \left\| \frac{\hbox {d}\mu _{2}}{\hbox {d}\mu _{1}}\right\| _{L^{p}\left( \mu _{1}\right) }^{p}&=\frac{\int e^{\frac{a}{16}\left\| x\right\| {}^{2}}\hbox {d}\mu _{1}}{\left( \int e^{\frac{a}{16p}\left\| x\right\| {}^{2}}\hbox {d}\mu _{1}\right) ^{p}}\nonumber \\&\le \left( \int e^{\frac{a}{8}\left\| x\right\| {}^{2}}\hbox {d}\mu _{1}\right) ^{\frac{1}{2}}\left( \int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}}\hbox {d}\mu _{1}\right) ^{p}\nonumber \\&=\left( \frac{\int e^{\frac{a\left( 2p-1\right) }{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x}{\int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x}\right) ^{\frac{1}{2}}\left( \frac{\int e^{\frac{-a}{8p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x}{\int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x}\right) ^{p} \end{aligned}$$
(7.108)

Since

$$\begin{aligned} \Vert U(\ x)-U(0)\Vert&=\Vert U(\ x)-U(0)-\ \left\langle x,\nabla U(0)\right\rangle \Vert \nonumber \\&\le \sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}\left\| x\right\| ^{1+\alpha _{i}}+\frac{L_{N}}{2}\left\| x\right\| ^{2}, \end{aligned}$$
(7.109)

this implies \(U(\ x)\le \Vert U(0)\Vert +\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}\left\| x\right\| ^{1+\alpha _{i}}+\frac{L_{N}}{2}\left\| x\right\| ^{2}\) which in turn indicates

$$\begin{aligned} \int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x&\ge \int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\Vert U(0)\Vert -\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}\left\| x\right\| ^{1+\alpha _{i}}-\frac{L_{N}}{2}\left\| x\right\| ^{2}-2\sum _{i}L_{i}R^{1+\alpha _{i}}}\hbox {d}x\nonumber \\&\ge e^{-\Vert U(0)\Vert -2\sum _{i}L_{i}R^{1+\alpha _{i}}}\int e^{\frac{-a}{16p}\left\| x\right\| {}^{2}-\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}\left\| x\right\| ^{1+\alpha _{i}}-\frac{L_{N}}{2}\left\| x\right\| ^{2}}\hbox {d}x\nonumber \\&\ge e^{-\Vert U(0)\Vert -2\sum _{i}L_{i}R^{1+\alpha _{i}}-\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}}\int e^{-\left( \frac{a}{16p}+\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}+\frac{L_{N}}{2}\right) \left\| x\right\| {}^{2}}\hbox {d}x\nonumber \\&\ge \frac{\pi ^{\frac{d}{2}}}{\left( \frac{a}{16p}+\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}+\frac{L_{N}}{2}\right) ^{\frac{d}{2}}}e^{-\Vert U(0)\Vert -2\sum _{i}L_{i}R^{1+\alpha _{i}}-\frac{L}{1+\alpha }}. \end{aligned}$$
(7.110)

On the other hand,

$$\begin{aligned} \int e^{\frac{-a}{8p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x&\le \int e^{\frac{a\left( 2p-1\right) }{16p}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x\nonumber \\&\le \int e^{\frac{a}{8}\left\| x\right\| {}^{2}-\breve{U}(x)}\hbox {d}x\nonumber \\&\le e^{\left( {\tilde{d}}+{\tilde{\mu }}\right) +2\sum _{i}L_{i}R^{1+\alpha _{i}}}. \end{aligned}$$
(7.111)

Combining this with the previous inequality, we obtain

$$\begin{aligned} \left\| \frac{\hbox {d}\mu _{2}}{\hbox {d}\mu _{1}}\right\| _{L^{p}\left( \mu _{1}\right) }^{p}&\le \left( \frac{e^{\left( \left( {\tilde{d}}+{\tilde{\mu }}\right) +2\sum _{i}L_{i}R^{1+\alpha _{i}}\right) }}{\frac{\pi ^{\frac{d}{2}}}{\left( \frac{a}{16p}+\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}+\frac{L_{N}}{2}\right) ^{\frac{d}{2}}}e^{-\Vert U(0)\Vert -2\sum _{i}L_{i}R^{1+\alpha _{i}}-\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}}}\right) ^{p+\frac{1}{2}}\nonumber \\&=\Lambda ^{p}. \end{aligned}$$
(7.112)

Taking the logarithm of \(\Lambda \), we get

$$\begin{aligned} \log \Lambda&=\frac{\left( p+\frac{1}{2}\right) }{p}\log \left( \frac{e^{\left( \left( {\tilde{d}}+{\tilde{\mu }}\right) +2\sum _{i}L_{i}R^{1+\alpha _{i}}\right) }}{\frac{\pi ^{\frac{d}{2}}}{\left( \frac{a}{16p}+\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}+\frac{L_{N}}{2}\right) ^{\frac{d}{2}}}e^{-\Vert U(0)\Vert -2\sum _{i}L_{i}R^{1+\alpha _{i}}-\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}}}\right) \nonumber \\&=\frac{\left( p+\frac{1}{2}\right) }{p}\left( {\tilde{d}}+\frac{d}{2}\log \left( \frac{a}{16p}+\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}+\frac{L_{N}}{2}\right) -\frac{d}{2}\log \left( \pi \right) \right) \nonumber \\&\quad +\frac{\left( p+\frac{1}{2}\right) }{p}\left( {\tilde{\mu }}+4\sum _{i}L_{i}R^{1+\alpha _{i}}+\Vert U(0)\Vert +\sum _{i<N}\frac{L_{i}}{1+\alpha _{i}}\right) \nonumber \\&={\tilde{O}}\left( d\right) . \end{aligned}$$
(7.113)

Since \(\mu _{2}\) is log-concave, Lemma 9 implies that, for some universal constant C (not depending on d), \(\mu _{2}\) is log-Sobolev with constant

$$\begin{aligned} C\left( \Lambda ,p\right)&=\frac{1}{C}\frac{a}{8p}\frac{p-1}{p}\frac{1}{1+\log \Lambda }\nonumber \\&=\frac{1}{C}\frac{a}{8p}\frac{p-1}{p}\frac{1}{1+{\tilde{O}}\left( d\right) }\nonumber \\&=\frac{1}{{\tilde{O}}\left( d\right) }. \end{aligned}$$
(7.114)

From this, by the Holley–Stroock perturbation theorem, we obtain that \(U(x)\) is log-Sobolev on \({\mathbb {R}}^{d}\) with constant \(\frac{1}{{\tilde{O}}\left( d\right) }e^{-2\sum _{i}L_{i}R^{1+\alpha _{i}}}.\) Applying Theorem 3.2 then yields the desired result.
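For the reader's convenience, we record the form of the Holley–Stroock perturbation argument used in this last step (a standard statement, included here only as a reading aid): if \(\nu \) satisfies a log-Sobolev inequality with constant \(\rho \) and \(\hbox {d}{\tilde{\nu }}\propto e^{-\phi }\hbox {d}\nu \) for a bounded perturbation \(\phi \), then

$$\begin{aligned} {\tilde{\nu }}\ \text {satisfies a log-Sobolev inequality with constant}\quad \rho \,e^{-\left( \sup \phi -\inf \phi \right) }. \end{aligned}$$

In our setting, the oscillation \(\sup \phi -\inf \phi \) is bounded by \(2\sum _{i}L_{i}R^{1+\alpha _{i}}\), which is consistent with the factor \(e^{-2\sum _{i}L_{i}R^{1+\alpha _{i}}}\) above.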

\(\square \)

1.6 Appendix F: Proof of Additional Lemmas

Lemma 7.22

For any \(0\le \varpi \le k\in N^{+}\), we have

$$\begin{aligned} \Vert x+y\Vert ^{\varpi }\le 2^{k-1}(\Vert x\Vert ^{\varpi }+\Vert y\Vert ^{\varpi }). \end{aligned}$$
(7.115)

Proof

Consider the functions \(f_{k}(u)=2^{k-1}(u^{\varpi }+1)-(1+u)^{\varpi }\). We prove \(f_{k}(u)\ge 0\) for every \(u\ge 0\) by induction on k. For \(k=1\), since \(0\le \varpi \le 1,\) we have \(f_{1}^{\prime }(u)=\varpi u^{\varpi -1}-\varpi (1+u)^{\varpi -1}\ge 0\). This implies that \(f_{1}\) is non-decreasing on \(\left[ 0,\infty \right) \), and since \(f_{1}(0)\ge 0,\) it follows that \(f_{1}(u)\ge 0.\) Therefore, the statement is true for \(k=1.\)

Assume the statement holds for \(k=n\); we show that it also holds for \(k=n+1.\) Differentiating \(f_{n+1}(u)\), we get

$$\begin{aligned} f_{n+1}^{\prime }(u)&=2^{n}\varpi u^{\varpi -1}-\varpi (1+u)^{\varpi -1}\nonumber \\&=\varpi \left( 2^{n}u^{\varpi -1}-(1+u)^{\varpi -1}\right) \nonumber \\&\ge 0, \end{aligned}$$
(7.116)

for \(1\le \varpi \le n+1\) by the induction assumption, while for \(0\le \varpi \le 1\), \(2^{n}u^{\varpi -1}-(1+u)^{\varpi -1}\ge u^{\varpi -1}-(1+u)^{\varpi -1}\ge 0.\) Hence, \(f_{n+1}\) is non-decreasing on \(\left[ 0,\infty \right) \), and since \(f_{n+1}(0)\ge 2^{n}-1\ge 0,\) this implies \(f_{n+1}\ge 0\).

Applying this to our case with \(0\le \varpi \le k\) (the case \(x=0\) being immediate),

$$\begin{aligned} 2^{k-1}(\Vert x\Vert ^{\varpi }+\Vert y\Vert ^{\varpi })&=\Vert x\Vert ^{\varpi }2^{k-1}\left( 1+\left( \frac{\left\| y\right\| }{\left\| x\right\| }\right) ^{\varpi }\right) \nonumber \\&\ge \Vert x\Vert ^{\varpi }\left( 1+\left( \frac{\left\| y\right\| }{\left\| x\right\| }\right) \right) ^{\varpi }\nonumber \\&=\left( \left\| x\right\| +\left\| y\right\| \right) ^{\varpi }\nonumber \\&\ge \left( \left\| x+y\right\| \right) ^{\varpi }, \end{aligned}$$
(7.117)

which concludes the proof. \(\square \)
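As a quick numerical sanity check of Lemma 7.22 (not part of the proof, and assuming numpy is available), the following Python sketch tests the inequality on random vectors and exponents:

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(1000):
        d = int(rng.integers(1, 10))              # ambient dimension
        x = rng.standard_normal(d)
        y = rng.standard_normal(d)
        k = int(rng.integers(1, 5))               # integer k >= 1
        w = rng.uniform(0, k)                     # exponent 0 <= w <= k
        lhs = np.linalg.norm(x + y) ** w
        rhs = 2 ** (k - 1) * (np.linalg.norm(x) ** w + np.linalg.norm(y) ** w)
        assert lhs <= rhs + 1e-12                 # inequality (7.115)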

Lemma 7.23

For \(\theta >0\), \(f\left( r\right) =m\left( r\right) r^{2}=\mu \left( 1+r^{2}\right) ^{-\frac{\theta }{2}}r{}^{2}\ge \frac{\mu }{2}r{}^{2-\theta }-\frac{\mu }{2}2{}^{\frac{2-\theta }{\theta }},\) and for \(\theta =0,\)  \(f\left( r\right) =\mu r{}^{2}.\)

Proof

For \(\theta =0,\) it is straightforward. For \(\theta >0,\) from Lemma 2 above, for \(r\ge 2^{\frac{1}{\theta }}\),

$$\begin{aligned} f\left( r\right)&=\mu \left( 1+r^{2}\right) ^{-\frac{\theta }{2}}r{}^{2}\nonumber \\&\ge \mu \left( 1+r^{\theta }\right) ^{-1}r{}^{2}\nonumber \\&=\mu \left( r^{2\theta }-1\right) ^{-1}r{}^{2}\left( r^{\theta }-1\right) \nonumber \\&\ge \mu r{}^{2-2\theta }\left( r^{\theta }-1\right) \nonumber \\&\ge \frac{\mu }{2}r{}^{2-\theta }. \end{aligned}$$
(7.118)

For \(r<2^{\frac{1}{\theta }}\), \(f\left( r\right) \ge 0\ge \frac{\mu }{2}r{}^{2-\theta }-\frac{\mu }{2}2{}^{\frac{2-\theta }{\theta }},\) which concludes the proof. \(\square \)
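As a numerical illustration of Lemma 7.23 (a sketch only, with \(\mu \) and \(\theta \) chosen arbitrarily and numpy assumed to be available):

    import numpy as np

    mu, theta = 1.7, 0.8                          # illustrative choices, theta in (0, 1]
    r = np.linspace(0.0, 50.0, 100001)
    f = mu * (1.0 + r ** 2) ** (-theta / 2) * r ** 2
    lower = mu / 2 * r ** (2 - theta) - mu / 2 * 2 ** ((2 - theta) / theta)
    assert np.all(f >= lower - 1e-12)             # bound of Lemma 7.23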

Lemma 7.24

\(f\left( \theta \right) =\Vert \left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}-\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}\Vert \) is an increasing function of \(\theta \).

Proof

If \(\left\| x\right\| \ge \left\| x-y\right\| ,\) we have \(f\left( \theta \right) =\left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}-\left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}\). Differentiating f with respect to \(\theta \) gives

$$\begin{aligned} f^{\prime }\left( \theta \right)&=\frac{1}{2}\ln \left( 1+\left\| x\right\| ^{2}\right) \left( 1+\left\| x\right\| ^{2}\right) ^{\frac{\theta }{2}}\nonumber \\&-\frac{1}{2}\ln \left( 1+\left\| x-y\right\| ^{2}\right) \left( 1+\left\| x-y\right\| ^{2}\right) ^{\frac{\theta }{2}}\nonumber \\&\ge 0. \end{aligned}$$
(7.119)

Similarly, if \(\left\| x\right\| \le \left\| x-y\right\| \) we also obtain \(f^{\prime }\left( \theta \right) \ge 0,\) which implies that f increases as desired. \(\square \)
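A short numerical check of the monotonicity claim in Lemma 7.24 (a sketch only; the vectors and the range of \(\theta \) are arbitrary, and numpy is assumed):

    import numpy as np

    rng = np.random.default_rng(1)
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    thetas = np.linspace(0.0, 3.0, 301)
    a = 1.0 + np.dot(x, x)                        # 1 + ||x||^2
    b = 1.0 + np.dot(x - y, x - y)                # 1 + ||x - y||^2
    f = np.abs(a ** (thetas / 2) - b ** (thetas / 2))
    assert np.all(np.diff(f) >= -1e-12)           # f is non-decreasing in theta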

Lemma 7.25

If \(\xi \sim N_{p}\left( 0,I_{d}\right) \), then \(d^{\left\lfloor \frac{n}{p}\right\rfloor }\le E(\left\| \xi \right\| _{p}^{n})\le \left[ d+\frac{n}{2}\right] ^{\frac{n}{p}}\), where \(\left\lfloor x\right\rfloor \) denotes the largest integer less than or equal to x. If \(n=kp,\) then \(E(\left\| \xi \right\| _{p}^{n})=d\left( d+p\right) \cdots \left( d+\left( k-1\right) p\right) \).

Proof

From [36], we have \(E(\left\| \xi \right\| _{p}^{n})=p^{\frac{n}{p}}\frac{\Gamma \left( \frac{d+n}{p}\right) }{\Gamma \left( \frac{d}{p}\right) }.\)

Since \(\Gamma \) is an increasing function,

$$\begin{aligned} p^{\frac{n}{p}}\frac{\Gamma \left( \frac{d+n}{p}\right) }{\Gamma \left( \frac{d}{p}\right) }\ge p^{\frac{n}{p}}\frac{\Gamma \left( \frac{d}{p}+\left\lfloor \frac{n}{p}\right\rfloor \right) }{\Gamma \left( \frac{d}{p}\right) }=p^{\frac{n}{p}}\frac{d}{p}\ldots \left( \frac{d}{p}+k-1\right) \ge d^{\left\lfloor \frac{n}{p}\right\rfloor }. \end{aligned}$$

If \(n=kp\) for some \(k\in N\), then \(E(\left\| \xi \right\| _{p}^{n})=p^{\frac{n}{p}}\frac{d}{p}\ldots \left( \frac{d}{p}+k-1\right) =d\left( d+p\right) \cdots \left( d+\left( k-1\right) p\right) .\) If \(n\ne kp\), let \(k=\left\lfloor \frac{n}{p}\right\rfloor \). Since \(\Gamma \) is log-convex, by Jensen's inequality, for any \(p\ge 1\) we obtain

$$\begin{aligned}&\left( 1-\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}\right) \log \Gamma \left( \frac{d}{p}\right) +\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}\log \Gamma \left( \frac{d}{p}+\left\lfloor \frac{n}{p}\right\rfloor +1\right) \\&\quad \ge \log \Gamma \left( \left( 1-\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}\right) \frac{d}{p}+\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}\left( \frac{d}{p}+\left\lfloor \frac{n}{p}\right\rfloor +1\right) \right) \\&\quad \ge \log \Gamma \left( \frac{d+n}{p}\right) >0. \end{aligned}$$

Exponentiating both sides, we get

$$\begin{aligned} \Gamma \left( \frac{d}{p}\right) ^{\left( 1-\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}\right) }\Gamma \left( \frac{d}{p}+\left\lfloor \frac{n}{p}\right\rfloor +1\right) ^{\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}}\ge \Gamma \left( \frac{d+n}{p}\right) , \end{aligned}$$

which implies that

$$\begin{aligned} \begin{array}{cc} \left[ \frac{\Gamma \left( \frac{d}{p}+\left\lfloor \frac{n}{p}\right\rfloor +1\right) }{\Gamma \left( \frac{d}{p}\right) }\right] ^{\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}} &{} \ge \frac{\Gamma \left( \frac{d+n}{p}\right) }{\Gamma \left( \frac{d}{p}\right) }\\ \left[ \frac{d}{p}\ldots \left( \frac{d}{p}+\left\lfloor \frac{n}{p}\right\rfloor \right) \right] ^{\frac{n}{p\left\lfloor \frac{n}{p}\right\rfloor +p}} &{} \ge \frac{\Gamma \left( \frac{d+n}{p}\right) }{\Gamma \left( \frac{d}{p}\right) }. \end{array} \end{aligned}$$

Combining with \(E(\left\| \xi \right\| _{p}^{n})=p^{\frac{n}{p}}\frac{\Gamma \left( \frac{d+n}{p}\right) }{\Gamma \left( \frac{d}{p}\right) }\) gives the conclusion. \(\square \)
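As an illustration of the moment formula (a sketch only, restricted to \(p=2\), where \(N_{p}\left( 0,I_{d}\right) \) reduces to the standard Gaussian; numpy and scipy are assumed to be available):

    import numpy as np
    from scipy.special import gammaln

    rng = np.random.default_rng(2)
    d, n, p = 5, 3, 2                              # p = 2: standard Gaussian case
    xi = rng.standard_normal((200_000, d))
    mc = np.mean(np.linalg.norm(xi, axis=1) ** n)  # Monte Carlo estimate of E ||xi||_2^n
    exact = p ** (n / p) * np.exp(gammaln((d + n) / p) - gammaln(d / p))
    assert abs(mc - exact) / exact < 0.02          # moment formula from [36]
    assert d ** (n // p) <= exact <= (d + n / 2) ** (n / p)   # bounds of Lemma 7.25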
