1 Introduction

We consider the problem of approximately sampling from a distribution, a ubiquitous challenge in Bayesian inference. The goal is to simulate realizations of a random variable that is distributed according to a probability measure of interest \(\pi \) defined on \((\mathbb {R}^d, \mathcal {B}(\mathbb {R}^d))\), with \(d \in \mathbb {N}\) and \(\mathcal {B}(\mathbb {R}^d)\) being the Borel \(\sigma \)-algebra of \(\mathbb {R}^d\). We assume that we are able to evaluate a not necessarily normalized Lebesgue density of \(\pi \) given by \(\varrho : \mathbb {R}^d \rightarrow \mathbb {R}_+\), i.e., for any \(A \in \mathcal {B}(\mathbb {R}^d)\) we have

$$\begin{aligned} \pi (A) = \frac{1}{C} \int _{A} \varrho (x) \text {d}x, \end{aligned}$$
(1)

where

$$\begin{aligned} C:= \int _{\mathbb {R}^d} \varrho (x) \text {d}x \in (0,\infty ) \end{aligned}$$

is an unknown normalization constant. Since \(\varrho \) is only known in this partial sense, the standard approach for dealing with such sampling problems is to construct a Markov chain with limit distribution \(\pi \).

The slice sampling methodology (see e.g. Besag and Green 1993) provides a framework for the construction of a Markov chain \((X_n)_{n\in \mathbb {N}_0}\) with \(\pi \)-reversible transition kernel, where the distribution of \(X_n\) converges (under weak regularity conditions) to the distribution of interest, see e.g. Roberts and Rosenthal (1999). We focus here on polar slice sampling (PSS), which exploits the almost surely well-defined factorization \(\varrho (x) = p_0(x) p_1(x)\) with

$$\begin{aligned} p_0(x):= \left\| x\right\| ^{1-d}, \qquad p_1(x):= \left\| x\right\| ^{d-1} \varrho (x), \end{aligned}$$
(2)

where \(\left\| \cdot \right\| \) denotes the Euclidean norm in \(\mathbb {R}^d\). The choice of this particular factorization in the slice sampling context has been proposed in Roberts and Rosenthal (2002). The resulting transition mechanism of the corresponding Markov chain \((X_n)_{n\in \mathbb {N}_0}\) on \((\mathbb {R}^d, \mathcal {B}(\mathbb {R}^d))\) can be presented as follows.

Algorithm 1.1

Given the target density \(\varrho = p_0 \, p_1\) and the current state \(X_{n-1}=x\), PSS w.r.t. \(\varrho \) generates the next instance \(X_n\) by the following two steps:

  1.

    Draw an auxiliary random variable \(T_{n}\) with respect to (w.r.t.) the uniform distribution on \((0,p_1(x))\). Call the realization \(t_n\) and define the super level set

    $$\begin{aligned} L(t_n,p_1): = \{z \in \mathbb {R}^d \mid p_1(z) > t_n \}. \end{aligned}$$
  2.

    Draw \(X_{n}\) from the distribution \(\mu _{t_n}\) on \(\mathbb {R}^d\) that is given by

    $$\begin{aligned} \mu _{t_n}(A):= \frac{\int _{A \cap L(t_n,p_1)} p_0(z) \, \text {d}z}{\int _{L(t_n,p_1)} p_0(z) \, \text {d}z}. \end{aligned}$$
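For intuition, the two steps of Algorithm 1.1 can be sketched in code. The following is a minimal illustrative sketch (not the implementation discussed below) for the special case \(d = 1\), where the factorization (2) is trivial (\(p_0 \equiv 1\), \(p_1 = \varrho \)) and where, for \(\varrho (x) = \exp (-\left\| x\right\| )\), the super level set is an explicit interval, so both updates can be performed exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

def pss_step(x):
    """One transition of Algorithm 1.1 for d = 1 and rho(x) = exp(-|x|).

    For d = 1 the factorization (2) is trivial (p0 = 1, p1 = rho), and the
    super level set L(t, p1) = {z : exp(-|z|) > t} is the interval (-r, r)
    with r = -log(t), on which mu_t is simply the uniform distribution.
    """
    t = rng.uniform(0.0, np.exp(-abs(x)))  # step 1: T-update, t ~ U(0, p1(x))
    r = -np.log(t)                         # level set L(t, p1) = (-r, r)
    return rng.uniform(-r, r)              # step 2: X-update, X ~ mu_t

# Run a short chain; under the normalized target (Laplace), E|X| = 1.
x, abs_vals = 0.5, []
for _ in range(20_000):
    x = pss_step(x)
    abs_vals.append(abs(x))
mean_abs = float(np.mean(abs_vals[1_000:]))
```

In higher dimensions the X-update is the computationally demanding part; this is precisely the issue addressed by the implementations discussed next.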

Roberts and Rosenthal (2002) offer an implementation of Algorithm 1.1 using polar coordinates and an acceptance–rejection approach w.r.t. the radius and the spherical element. However, even in simple examples the acceptance probability can be very small, which makes the implementation computationally demanding, especially for large d. To address this, Schär et al. (2023) proposed a Gibbsian polar slice sampling methodology that mimics PSS while offering a computationally feasible scheme. Indeed, our investigation is very much driven by the hope of carrying the dimension-independence of PSS over to this related approach. To illustrate the empirically dimension-independent performance of PSS, we present the following numerical illustration.

Motivating numerical illustration. We consider the polar and uniform slice sampling Markov chains. The transition mechanism of the latter is exactly as stated in Algorithm 1.1, except that it sets the factorization of \(\varrho \) to \(p_0(x):= 1\) and \(p_1(x):= \varrho (x)\) for any \(x\in \mathbb {R}^d\) (in contrast to (2)). For both the unimodal target density \(\varrho (x) = \exp (-\left\| x\right\| )\) and the volcano-shaped target density \(\varrho (x) = \exp (-(\left\| x\right\| -2)^2)\), we plot in Fig. 1 proxies of the integrated autocorrelation time (IAT) of the aforementioned Markov chains, depending on the state space dimension d. Since the IAT characterizes the asymptotic mean squared error (and the asymptotic variance within CLTs) of the Markov chain Monte Carlo time average w.r.t. a summary function \(g:\mathbb {R}^d\rightarrow \mathbb {R}\), we can conclude that the smaller it is, the ‘better’ the Markov chain. We consider \(g(x) = \left\| x\right\| \).

Fig. 1

Sample space dimension d versus approximations of the integrated autocorrelation time \(\text {IAT}_{g,P}\), as defined in (8), computed using the heuristic described in Gelman et al. (2013, Chapter 11.5). The top panel depicts IATs for the target density \(\varrho (x) = \exp (-\left\| x\right\| )\) and the bottom panel IATs for the target density \(\varrho (x) = \exp (-(\left\| x\right\| - 2)^2)\). Both use the summary function \(g(x) = \left\| x\right\| \). Each plotted point represents an average over \(n_{\text {rep}} = 10\) separate runs of the samplers, using \(n_{\text {it}} = 10^5\) iterations for each sampler and repetition

In Fig. 1 it is clearly visible that the IAT of PSS remains only slightly larger than 1 regardless of the dimension. In contrast, the IAT of uniform slice sampling (USS) increases with the state space dimension, showing that the efficiency of the corresponding Markov chain degenerates in high dimensions. This is also confirmed theoretically in Natarovskii et al. (2021). It is all the more striking that PSS exhibits such remarkably ‘good’, dimension-independent behavior.
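For reference, an IAT proxy of the kind plotted in Fig. 1 can be computed along the following lines. This is a minimal sketch of a standard truncation heuristic, not necessarily the exact variant of Gelman et al. (2013) used for the figure:

```python
import numpy as np

def iat_estimate(g_values):
    """Estimate IAT = 1 + 2 * sum_k gamma_g(k) from one chain of g(X_n).

    Autocorrelations are estimated empirically and the sum is truncated at
    the first lag with a negative estimate (a simple, common heuristic;
    Gelman et al. 2013 describe a slightly more refined truncation).
    """
    g = np.asarray(g_values, dtype=float) - np.mean(g_values)
    n = len(g)
    var = g @ g / n
    iat = 1.0
    for k in range(1, n // 2):
        gamma_k = (g[:-k] @ g[k:]) / n / var
        if gamma_k < 0.0:
            break
        iat += 2.0 * gamma_k
    return iat

rng = np.random.default_rng(1)

# i.i.d. samples: the IAT should be close to its minimum value 1
iat_iid = iat_estimate(rng.standard_normal(50_000))

# Stationary AR(1) with rho = 0.9: the true IAT is (1+rho)/(1-rho) = 19
rho, z = 0.9, rng.standard_normal(50_000)
x = np.empty_like(z)
x[0] = z[0]
for i in range(1, len(z)):
    x[i] = rho * x[i - 1] + np.sqrt(1.0 - rho**2) * z[i]
iat_ar = iat_estimate(x)
```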

Roberts and Rosenthal (2002) explain this behavior in their Theorem 7 together with Remark 8 by proving that for any rotationally invariant \(\varrho \) that is log-concave along rays emanating from the origin and any initial state \(x\in \mathbb {R}^d\) satisfying \(p_1(x)\ge 0.01 \cdot \left( \sup _{w\in \mathbb {R}^d} p_1(w)\right) \) one has

$$\begin{aligned} \left\| P^{525}(x,\cdot ) - \pi \right\| _{\text {tv}} \le 0.01, \end{aligned}$$
(3)

where \(P^{525}(x,\cdot ) = \mathbb {P}(X_{525}\in \cdot \mid X_0=x)\) and

$$\begin{aligned} \left\| P^{525}(x,\cdot ) - \pi \right\| _{\text {tv}}:= \sup _{A \in \mathcal {B}(\mathbb {R}^d)} \left|{P^{525}(x,A) - \pi (A)}\right| \end{aligned}$$

is the total variation distance between \(\pi \) and \(P^{525}(x,\cdot )\). In fact, in Roberts and Rosenthal (2002, Theorem 7), there is no rotational invariance assumption; instead an asymmetry parameter appears, and, as long as this parameter does not depend on the dimension, the former result holds with the 525 replaced by some larger number that is still independent of d.

We refine and extend the result (3) by providing, in the same setting, a lower bound of 1/2 on the spectral gap of the Markov operator of PSS. Even though we postpone the definition and discussion of the spectral gap of a Markov chain (or corresponding transition kernel) to Sect. 2, we briefly motivate here why it is a crucial object. A quantitative lower bound on the gap of a transition kernel P corresponding to a Markov chain \((X_n)_{n \in \mathbb {N}_0}\) with stationary distribution \(\pi \) implies a number of useful properties. These include geometric convergence (with explicit convergence rate) of the distribution of \(X_n\) to \(\pi \) as \(n \rightarrow \infty \) (see (6) or (Roberts and Rosenthal 1997, Theorem 2.1) or (Gallegos-Herrada et al. 2022, Theorem 1)), a non-asymptotic error bound for the classical Markov chain Monte Carlo time average (Rudolf 2012, Theorem 3.41), a central limit theorem (CLT) (Kipnis and Varadhan 1986) and an estimate of the CLT asymptotic variance (Flegal and Jones 2010). Moreover, it implies an explicit upper bound on the IAT (which follows for example by (7) below) and therefore directly explains the motivating numerical illustration.

Our investigation builds upon the work of Natarovskii et al. (2021). There, among other things, a duality technique has been developed that gives sufficient conditions for quantitative lower bounds on the spectral gap of USS. We extend the duality argument to general slice sampling (with a non-specified factorization \(\varrho = \varrho _0 \,\varrho _1\)) and apply the resulting theory to PSS. More precisely, in the general setting we provide a sufficient condition for a lower bound on the spectral gap in terms of properties of the function \(\ell _{\varrho _0,\varrho _1} :(0,\infty ) \rightarrow \mathbb {R}_+\) given by

$$\begin{aligned} \ell _{\varrho _0,\varrho _1}(t):= \int _{L(t,\varrho _1)} \varrho _0(x) \,\text {d}x, \qquad t \in (0,\infty ), \end{aligned}$$

see Theorem 3.9 and Definition 3.7 below. Applying this result in the context of the PSS factorization yields the dimension-independent lower bound of 1/2 on the spectral gap, as long as \(\varrho \) is rotationally invariant, log-concave along rays emanating from the origin and sufficiently smooth, see Theorem 3.13.

We now provide some guidance through the structure of the paper. In the next section we introduce our notation and define all required Markov chain related objects. Afterwards, in Sect. 3.1, we discuss how a number of theoretical results from Natarovskii et al. (2021) translate from USS to the general case. In Sect. 3.2, we apply the results from Sect. 3.1 to PSS, thereby proving a lower bound on its spectral gap. Concluding remarks with a discussion of our results and an outlook can be found in Sect. 4.

2 Preliminaries

We introduce our notation and state some useful facts. All appearing random variables map from a joint, sufficiently rich probability space onto their respective state spaces. With \(\lambda \) we denote the Lebesgue measure on \((\mathbb {R},\mathcal {B}(\mathbb {R}))\), and for the surface measure on the Euclidean unit sphere \(\mathbb {S}^{d-1}\), equipped with its natural Borel \(\sigma \)-algebra \(\mathcal {B}(\mathbb {S}^{d-1})\), we write \(\sigma _{d-1}\). We now provide some details about transition kernels.

Let \((G,\mathcal {G})\) and \((H,\mathcal {H})\) be measurable spaces. A transition kernel on \(G \times \mathcal {H}\) is a mapping \(P{:}G \times \mathcal {H}\rightarrow [0,1]\) such that \(P(\cdot ,A)\) is a measurable function for all \(A \in \mathcal {H}\) and \(P(x,\cdot ) \in \mathcal {M}_1(H)\) for all \(x \in G\), where \(\mathcal {M}_1(H)\) denotes the set of probability measures on \((H,\mathcal {H})\). Let P be a transition kernel on \(G \times \mathcal {G}\), then P acts on measurable functions \(g{:}G \rightarrow \mathbb {R}\) by

$$\begin{aligned} P g (x):= \int _G g(y) P(x, \text {d}y), \quad x \in G. \end{aligned}$$
(4)

Let Q be a transition kernel on \(G \times \mathcal {H}\) and let \(\xi \in \mathcal {M}_1(G)\), then Q acts on \(\xi \) as

$$\begin{aligned} \xi Q(A):= \int _G Q(x,A) \xi (\text {d}x), \quad A \in \mathcal {H}, \end{aligned}$$

and defines a probability measure, i.e., \(\xi Q \in \mathcal {M}_1(H)\). Moreover, the tensor product of \(\xi \) and Q is defined as the probability measure on \((G \times H, \mathcal {G}\times \mathcal {H})\) determined by

$$\begin{aligned} (\xi \otimes Q)(A \times B)&:= \int _A Q(x,B) \xi (\text {d}x) \\&= \int _A \int _B Q(x,\text {d}y) \xi (\text {d}x) , \; A \in \mathcal {G}, B \in \mathcal {H}. \end{aligned}$$

Additionally, let R be a transition kernel on \(H \times \mathcal {G}\), then the composition of Q and R is the transition kernel QR on \(G \times \mathcal {G}\) defined by

$$\begin{aligned} Q R(x, A):= \int _H R(y, A) Q(x, \text {d}y), \quad x \in G, A \in \mathcal {G}. \end{aligned}$$

Using this, for a transition kernel P on \(G\times \mathcal {G}\), one recursively defines \(P^1:= P\) and \(P^n:= P P^{n-1}\) for \(n \ge 2\).
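Although the paper works on general state spaces, on a finite state space these definitions reduce to familiar matrix algebra, which may help fix the conventions: a transition kernel is a row-stochastic matrix, the action (4) on functions is matrix–vector multiplication, the action on measures is vector–matrix multiplication, and composition is the matrix product. A small illustrative sketch:

```python
import numpy as np

# On a finite state space, a transition kernel is a row-stochastic matrix.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
g = np.array([1.0, -1.0])      # a function on the two-point state space
xi = np.array([0.5, 0.5])      # an initial distribution

Pg = P @ g                     # action (4): (Pg)(x) = sum_y g(y) P(x, y)
xiP = xi @ P                   # xi P, the distribution after one step
P2 = P @ P                     # composition: P^2 = P P^1

assert np.allclose(P2.sum(axis=1), 1.0)  # P^2 is again a transition kernel
assert np.isclose(xiP.sum(), 1.0)        # xi P is a probability measure
```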

For a Markov chain \((X_n)_{n\in \mathbb {N}_0}\) on \((G,\mathcal {G})\) with transition kernel P and initial distribution \(\xi \in \mathcal {M}_1(G)\) it is well known that the probability measure \(\xi P^n\) coincides with the distribution of \(X_n\). We say that the transition kernel P (and the corresponding Markov chain) has invariant distribution \(\pi \in \mathcal {M}_1(G)\) if \(\pi P = \pi \). Moreover, it is reversible w.r.t. \(\pi \) if \((\pi \otimes P)(A\times B) = (\pi \otimes P)(B\times A)\) for all \(A,B\in \mathcal {G}\).

We turn to the definition of the spectral gap of a transition kernel P on \(G \times \mathcal {G}\) that is reversible w.r.t. \(\pi \in \mathcal {M}_1(G)\) and therefore has \(\pi \) as invariant distribution. With \(L_2(\pi )\) we denote the space of measurable functions \(g:G \rightarrow \mathbb {R}\) satisfying

$$\begin{aligned} \left\| g\right\| _{2,\pi }^2:= \int _G g(x)^2 \pi (\text {d}x) < \infty . \end{aligned}$$

Note that \(\left\| \cdot \right\| _{2,\pi }\) is a norm on the quotient space of \(L_2(\pi )\) under the equivalence relation identifying functions that coincide \(\pi \)-a.e. It is induced by the inner product \(\langle \cdot , \cdot \rangle _{\pi }\) on \(L_2(\pi )\) defined by

$$\begin{aligned} \langle g, h \rangle _{\pi }:= \int _G g(x) h(x) \pi (\text {d}x). \end{aligned}$$

Observe that P acting on functions \(g: G \rightarrow \mathbb {R}\) via \(g \mapsto P g\) as in (4) defines a linear operator mapping from \(L_2(\pi )\) into \(L_2(\pi )\). Interpreting \(\pi \) as a transition kernel that is constant in its first argument, \(\pi \) also induces a linear operator mapping from \(L_2(\pi )\) into \(L_2(\pi )\), specifically by

$$\begin{aligned} \pi g (x) = \int _{\mathbb {R}^d} g(y) \pi (\text {d}y). \end{aligned}$$
(5)

This allows us to define the spectral gap of P as

$$\begin{aligned} \textsf {gap}_{\pi }(P):= 1 - \left\| P - \pi \right\| _{L_2(\pi ) \rightarrow L_2(\pi )}, \end{aligned}$$

where \(\left\| \cdot \right\| _{L_2(\pi ) \rightarrow L_2(\pi )}\) denotes the operator norm w.r.t. \(\left\| \cdot \right\| _{2,\pi }\).
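On a finite state space the spectral gap can be computed explicitly, which may serve as a sanity check for the definition. The following is an illustrative sketch outside the paper's continuous setting: for a \(\pi \)-reversible transition matrix P, the similarity transform \(D_{\pi }^{1/2} P D_{\pi }^{-1/2}\) is symmetric and has the same spectrum as P acting on \(L_2(\pi )\), so the gap is one minus the second-largest absolute eigenvalue:

```python
import numpy as np

# A pi-reversible chain on {0,...,4}: Metropolis with +-1 proposals
pi = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
n = len(pi)
P = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            P[i, j] = 0.5 * min(1.0, pi[j] / pi[i])  # propose, then accept
    P[i, i] = 1.0 - P[i].sum()

assert np.allclose(pi @ P, pi)                            # invariance
assert np.allclose(pi[:, None] * P, (pi[:, None] * P).T)  # reversibility

# D^{1/2} P D^{-1/2} is symmetric for reversible P; its largest
# eigenvalue is 1, corresponding to the constant functions.
d = np.sqrt(pi)
S = (d[:, None] * P) / d[None, :]
eig = np.sort(np.abs(np.linalg.eigvalsh(S)))[::-1]
gap = 1.0 - eig[1]
```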

With these formal notions at hand, we may now explicitly state some of the consequences of spectral gap estimates for \(\pi \)-reversible Markov chains that we already mentioned in the introduction. For example, it is well known, see e.g. (Novak and Rudolf 2014, Lemma 2), that a positive spectral gap implies geometric convergence, i.e.,

$$\begin{aligned} \left\| \xi P^n -\pi \right\| _{\text {tv}} \le (1-\textsf {gap}_{\pi }(P))^n \left\| \frac{{\textrm{d}}\xi }{{\textrm{d}}\pi }-1 \right\| _{2,\pi }, \end{aligned}$$
(6)

where \(\left\| \cdot \right\| _{\text {tv}}\) again denotes the total variation distance.

An explicit lower bound of \(\textsf {gap}_\pi (P)\) also leads to a mean squared error bound of the Markov chain Monte Carlo sample average, see (Rudolf 2012, Theorem 3.41). Moreover, a classical result of Kipnis and Varadhan (1986) states that if the initial distribution is the invariant distribution \(\pi \) and \(g\in L_2(\pi )\) then the \(\sqrt{n}\)-scaled sample average error

$$\begin{aligned} \sqrt{n} \left( \frac{1}{n}\sum _{i=1}^{n}g(X_i) -\pi (g) \right) \end{aligned}$$

converges weakly to the normal distribution \(\mathcal {N}(0,\sigma _{g,P}^2)\) with mean zero and variance

$$\begin{aligned} \sigma _{g,P}^2 = \langle (I+P)(I-P)^{-1}(g-\pi (g)),(g-\pi (g))\rangle _{\pi }, \end{aligned}$$

where I denotes the identity map. The significant quantity \(\sigma _{g,P}^2\) satisfies

$$\begin{aligned} \text {IAT}_{g,P} \cdot \left\| g-\pi (g)\right\| _{2,\pi }^2 = \sigma _{g,P}^2 \le \frac{2 \left\| g-\pi (g)\right\| _{2,\pi }^2}{\textsf {gap}_{\pi }(P)}, \end{aligned}$$
(7)

where

$$\begin{aligned} \text {IAT}_{g,P} = 1 + 2\sum _{k\ge 1} \gamma _g(k), \end{aligned}$$
(8)

with correlations

$$\begin{aligned} \gamma _g(k) = \text {Corr}(g(X_0),g(X_{k})), \end{aligned}$$

denotes the integrated autocorrelation time.
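For a finite-state reversible chain all three quantities can be computed explicitly and the identity and inequality in (7) verified numerically. This is an illustrative sketch outside the paper's continuous setting, using a small Metropolis chain as a test bed:

```python
import numpy as np

# A reversible Metropolis chain on {0,...,4} with +-1 proposals
pi = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
n = len(pi)
P = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            P[i, j] = 0.5 * min(1.0, pi[j] / pi[i])
    P[i, i] = 1.0 - P[i].sum()

g = np.arange(n, dtype=float)   # summary function g(i) = i
g0 = g - pi @ g                 # centered version g - pi(g)
norm2 = pi @ g0**2              # ||g - pi(g)||_{2,pi}^2

# sigma^2 = <(I+P)(I-P)^{-1} g0, g0>_pi; the matrix I - P + 1 pi^T is
# invertible and inverts I - P on mean-zero functions (fundamental matrix)
h = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), g0)
sigma2 = pi @ (((np.eye(n) + P) @ h) * g0)

# IAT via (8): under stationarity gamma_g(k) = <P^k g0, g0>_pi / norm2
iat, v = 1.0, g0.copy()
for _ in range(5_000):
    v = P @ v
    iat += 2.0 * (pi @ (v * g0)) / norm2

# spectral gap via the symmetrized matrix (using reversibility)
d = np.sqrt(pi)
eigs = np.abs(np.linalg.eigvalsh((d[:, None] * P) / d[None, :]))
gap = 1.0 - np.sort(eigs)[-2]

assert abs(sigma2 - iat * norm2) < 1e-8        # equality in (7)
assert sigma2 <= 2.0 * norm2 / gap + 1e-12     # upper bound in (7)
```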

3 Spectral gap estimate

In this section, we first introduce general slice sampling and derive a tool that can be used to establish spectral gap estimates. We then apply it to PSS.

3.1 General slice sampling

For the probability measure of interest \(\pi \in \mathcal {M}_1(\mathbb {R}^d)\) we assume to have an almost sure (w.r.t. the Lebesgue measure) factorization of the not necessarily normalized density of the form

$$\begin{aligned} \varrho (x) = \varrho _0(x) \varrho _1(x), \qquad x \in \mathbb {R}^d, \end{aligned}$$

with measurable functions \(\varrho _i:\mathbb {R}^d \rightarrow \mathbb {R}_+\) for \(i=0,1\). General slice sampling exploits this representation by (essentially) performing the two steps of Algorithm 1.1, except that \(\varrho _0\) takes the role of \(p_0\) and \(\varrho _1\) the role of \(p_1\). We refer to the 1st step as T-update and to the 2nd one as X-update. The transition kernels \(U_T\) on \(\mathbb {R}^d \times \mathcal {B}((0,\infty ))\) and \(U_X\) on \((0,\infty ) \times \mathcal {B}(\mathbb {R}^d)\) that correspond to the aforementioned T- and X-update of Algorithm 1.1 are given by

$$\begin{aligned} U_T(x,A)&= \frac{\lambda \left( A\cap (0,\varrho _1(x))\right) }{\lambda \left( (0,\varrho _1(x))\right) } = \frac{\int _A \mathbbm {1}_{L(t,\varrho _1)}(x)\, \text {d}t}{\varrho _1(x)} , \\ U_X(t,B)&= \frac{\int _{B\cap L(t,\varrho _1)} \varrho _0(x) \,\text {d}x}{\int _{L(t,\varrho _1)} \varrho _0(x) \, \text {d}x} =: \mu _t(B) , \end{aligned}$$

where \(x\in \mathbb {R}^d, A\in \mathcal {B}((0,\infty ))\) and \(t\in (0,\infty ),\; B\in \mathcal {B}(\mathbb {R}^d)\). Thus, the Markov chain \((X_n)_{n\in \mathbb {N}_0}\) of the slice sampling for \(\pi \) has transition kernel \(P_X = U_T U_X\). Moreover, the sequence of auxiliary random variables \((T_n)_{n \in \mathbb {N}}\), see the 1st step of Algorithm 1.1, is (also) a Markov chain on \(((0,\infty ),\mathcal {B}((0,\infty )))\) with transition kernel \(P_T=U_X U_T\).

We now elaborate on how the investigation of the spectral gap of USS by Natarovskii et al. (2021) translates to general slice sampling. As a first step, we provide the invariant distribution of \(P_T\), which follows by standard arguments that we include for the reader's convenience.

Lemma 3.1

Let \(\widetilde{\pi }\in \mathcal {M}_1((0,\infty ))\) be determined by the probability density function

$$\begin{aligned} \widetilde{\varrho }(t) = C^{-1} \int _{L(t,\varrho _1)} \varrho _0(x) \text {d}x, \qquad t\in (0,\infty ), \end{aligned}$$

such that \(\widetilde{\pi }(\text {d}t) = \widetilde{\varrho }(t)\, \text {d}t\). Then \(P_T\) is reversible w.r.t. \(\widetilde{\pi }\).

Proof

Fix \(A \in \mathcal {B}(\mathbb {R}^d)\) and \(B \in \mathcal {B}((0,\infty ))\). Note that

$$\begin{aligned}&(\pi \otimes U_T)(A \times B) \\&= \int _A U_T(x, B) \pi (\text {d}x) \\&= \int _A \frac{\int _B \mathbbm {1}_{L(t,\varrho _1)}(x)\, \text {d}t}{\varrho _1(x)} \cdot C^{-1} \, \varrho _0(x) \varrho _1(x) \text {d}x \\&= C^{-1} \int _B \int _{A\cap L(t,\varrho _1)} \varrho _0(x)\, \text {d}x \text {d}t \\&= \int _B U_X(t,A) \widetilde{\varrho }(t)\, \text {d}t . \end{aligned}$$

This yields

$$\begin{aligned} 1&= (\pi \otimes U_T)(\mathbb {R}^d \times (0,\infty )) \\&= \int _{0}^\infty U_X(t,\mathbb {R}^d) \widetilde{\varrho }(t)\, \text {d}t = \int _{0}^\infty \widetilde{\varrho }(t)\, \text {d}t , \end{aligned}$$

proving that \(\widetilde{\varrho }\) is indeed normalized. Plugging this fact into the former computation shows

$$\begin{aligned} (\pi \!\otimes \! U_T)(A \!\times \!B) \!= \!\int _B U_X(t,A) \,\widetilde{\pi }(\text {d}t) \!=\! (\widetilde{\pi } \!\otimes \! U_X)(B \!\times \!A). \end{aligned}$$

For any measurable \(F:\mathbb {R}^d \times (0,\infty ) \rightarrow \mathbb {R}\) (for which one of the following integrals exists) the latter equation extends to

$$\begin{aligned}&\int _{\mathbb {R}^d} \int _0^\infty F(x,t)\, U_T(x, \text {d}t)\, \pi (\text {d}x) \\&= \int _0^\infty \int _{\mathbb {R}^d} F(x,t)\, U_X(t, \text {d}x)\, \widetilde{\pi }(\text {d}t). \end{aligned}$$

Therefore, we obtain

$$\begin{aligned}&(\widetilde{\pi } \otimes P_T)(B_1 \times B_2) \\&= (\widetilde{\pi } \otimes U_X U_T)(B_1 \times B_2) \\&= \int _{0}^\infty \mathbbm {1}_{B_1}(t) U_X U_T (t, B_2) \widetilde{\pi }(\text {d}t) \\&= \int _{0}^\infty \int _{\mathbb {R}^d} \mathbbm {1}_{B_1}(t) U_T(x,B_2) U_X(t, \text {d}x) \widetilde{\pi }(\text {d}t) \\&= \int _{\mathbb {R}^d} \int _{0}^\infty \mathbbm {1}_{B_1}(t) U_T(x,B_2) U_T(x, \text {d}t) \pi (\text {d}x) \\&= \int _{\mathbb {R}^d} U_T(x,B_1) U_T(x,B_2) \pi (\text {d}x) \end{aligned}$$

for \(B_1, B_2 \in \mathcal {B}((0,\infty ))\). The last expression is symmetric in \(B_1\) and \(B_2\), so that running the same computation backwards with the roles of \(B_1\) and \(B_2\) interchanged shows that \(P_T\) is reversible w.r.t. \(\widetilde{\pi }\). \(\square \)

Note that by the same steps one can prove the well-known fact that \(P_X\) is reversible w.r.t. \(\pi \). With this at hand, we are able to formulate our spectral gap duality result. The statement follows by the application of Lemmas A.1 and A.2 that can be found in the appendix.

Theorem 3.2

The linear operators \(P_X :L_2(\pi ) \rightarrow L_2(\pi )\) and \(P_T :L_2(\widetilde{\pi })\rightarrow L_2(\widetilde{\pi })\) induced by the corresponding transition kernels via (4) satisfy

$$\begin{aligned} \textsf {gap}_{\pi }(P_X) = \textsf {gap}_{\widetilde{\pi }}(P_T). \end{aligned}$$

Proof

Define the linear operators \(W:= U_T - \widetilde{\pi }\) and \(W^{*}:= U_X - \pi \). By Lemmas A.1 and A.2 (i), we know that \(W^{*}\) is the adjoint operator of W. Furthermore, by Lemma A.2 (ii) and the fact that \(P_X = U_T U_X\), we get

$$\begin{aligned} W W^{*}&= (U_T - \widetilde{\pi })(U_X - \pi ) \\&= U_T U_X - U_T \pi - \widetilde{\pi } U_X + \widetilde{\pi } \pi \\&= U_T U_X - \pi = P_X - \pi . \end{aligned}$$

Analogously, by Lemma A.2 (iii) and the fact that \(P_T = U_X U_T\), we get

$$\begin{aligned} W^{*} W&= (U_X - \pi )(U_T - \widetilde{\pi }) \\&= U_X U_T - U_X \widetilde{\pi } - \pi U_T + \pi \widetilde{\pi } \\&= U_X U_T - \widetilde{\pi } = P_T - \widetilde{\pi } . \end{aligned}$$

Now, denoting by \(\left\| \cdot \right\| _{L_2(\cdot ) \rightarrow L_2(\cdot )}\) the respective operator norms and applying some well-known facts from functional analysis (Werner 2011, Theorem V.5.2), we obtain

$$\begin{aligned} \left\| P_X - \pi \right\| _{L_2(\pi ) \rightarrow L_2(\pi )}&= \left\| W W^{*}\right\| _{L_2(\pi ) \rightarrow L_2(\pi )} \\&= \left\| W^{*}\right\| _{L_2(\pi ) \rightarrow L_2(\widetilde{\pi })}^2 \\&= \left\| W\right\| _{L_2(\widetilde{\pi }) \rightarrow L_2(\pi )}^2 \\&= \left\| W^{*} W\right\| _{L_2(\widetilde{\pi }) \rightarrow L_2(\widetilde{\pi })} \\&= \left\| P_T - \widetilde{\pi }\right\| _{L_2(\widetilde{\pi }) \rightarrow L_2(\widetilde{\pi })} . \end{aligned}$$

By the spectral gap’s definition, this implies the claimed identity. \(\square \)
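The key step \(\left\| W W^{*}\right\| = \left\| W^{*}\right\| ^2 = \left\| W\right\| ^2 = \left\| W^{*} W\right\| \) reflects the general fact that the products of an operator with its adjoint, taken in either order, share their nonzero spectrum and hence their operator norm. A quick finite-dimensional illustration, with a random matrix standing in for W:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((6, 4))  # stand-in for the operator W
B = A.T                          # its adjoint W* (Euclidean inner product)

ev_ab = np.linalg.eigvalsh(A @ B)  # 6 eigenvalues of W W*, ascending
ev_ba = np.linalg.eigvalsh(B @ A)  # 4 eigenvalues of W* W, ascending

# W W* and W* W share their nonzero spectrum, hence their operator norm;
# the two extra eigenvalues of the larger product are zero.
assert np.allclose(ev_ab[-4:], ev_ba)
assert np.allclose(ev_ab[:2], 0.0, atol=1e-10)
assert np.isclose(ev_ab[-1], ev_ba[-1])
```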

Keeping in mind that \(\varrho =\varrho _0 \, \varrho _1\), we verify next that \(P_T\) (essentially) depends on the target distribution \(\pi \) only through a univariate function \(\ell _{\varrho _0,\varrho _1}\). Here \(\ell _{\varrho _0,\varrho _1}\) can be considered as an immediate extension of the level-set function from Natarovskii et al. (2021, e.g. Lemma 2.4) to the general slice sampling setting. We start with a proper definition.

Definition 3.3

For a factorized density \(\varrho = \varrho _0\, \varrho _1\) we define the generalized level-set function \(\ell _{\varrho _0,\varrho _1}: (0,\infty ) \rightarrow \mathbb {R}_+\) by

$$\begin{aligned} \ell _{\varrho _0,\varrho _1}(t):= \int _{L(t,\varrho _1)} \varrho _0(x) \text {d}x, \qquad t \in (0,\infty ). \end{aligned}$$

Observe that \(- \ell _{\varrho _0,\varrho _1}\) is a non-decreasing function on \((0,\infty )\). Therefore it can serve as the integrator in a Lebesgue–Stieltjes integral, see e.g. (Athreya and Lahiri 2006, Section 1.3.2). We now derive the aforementioned representation of \(P_T\) that only depends on \(\ell _{\varrho _0,\varrho _1}\).

Theorem 3.4

For any \(t > 0\) and \(B \in \mathcal {B}((0,\infty ))\) we have

$$\begin{aligned} P_T(t,B) = \frac{1}{\ell _{\varrho _0,\varrho _1}(t)} \int _t^{\infty } \frac{\lambda (B \, \cap (0,s))}{s}\; \text {d}(-\ell _{\varrho _0,\varrho _1})(s), \end{aligned}$$

where the right-hand side denotes a Lebesgue–Stieltjes integral w.r.t. \(- \ell _{\varrho _0,\varrho _1}\).

Proof

Define the measure \(\xi \) on \((\mathbb {R}^d, \mathcal {B}(\mathbb {R}^d))\) by \(\xi (A):= \int _A \varrho _0(x) \text {d}x\) for \(A\in \mathcal {B}(\mathbb {R}^d)\), such that \(\xi (\text {d}x) = \varrho _0 (x) \text {d}x\) and \(\ell _{\varrho _0,\varrho _1}(t) = \xi (L(t,\varrho _1))\). As \(- \ell _{\varrho _0,\varrho _1}\) is easily seen to be right-continuous, the Lebesgue–Stieltjes measure it generates (Athreya and Lahiri 2006, Section 1.3.2) is determined by mapping, for any \(t_1, t_2 \in (0,\infty )\) with \(t_1 < t_2\), the interval \((t_1,t_2]\) to

$$\begin{aligned}&(- \ell _{\varrho _0,\varrho _1})(t_2) - (- \ell _{\varrho _0,\varrho _1})(t_1) \\&= \xi (L(t_1,\varrho _1)) - \xi (L(t_2,\varrho _1)) \\&= \xi (\{x \in \mathbb {R}^d \mid \varrho _1(x)> t_1\}) - \xi (\{x \in \mathbb {R}^d \mid \varrho _1(x) > t_2\}) \\&= \xi (\{x \in \mathbb {R}^d \mid t_1 < \varrho _1(x) \le t_2\}) \\&= (\xi \circ \varrho _1^{-1})((t_1,t_2]) . \end{aligned}$$

Therefore it is given by \(\xi \circ \varrho _1^{-1}\), the pushforward measure of \(\xi \) w.r.t. \(\varrho _1\).

For any \(B \in \mathcal {B}((0,\infty ))\), let us define a function \(g_B: (0,\infty ) \rightarrow \mathbb {R}_+\) by

$$\begin{aligned} g_B(s):= \frac{\lambda (B \, \cap (0,s))}{s} \end{aligned}$$

and observe that

$$\begin{aligned} g_B(\varrho _1(x))&= \frac{\int _B \mathbbm {1}_{(0,\varrho _1(x))}(t)\, \text {d}t}{\varrho _1(x)} \\&= \frac{\int _B \mathbbm {1}_{L(t,\varrho _1)}(x)\, \text {d}t}{\varrho _1(x)} = U_T(x,B) \end{aligned}$$

for any \(x \in \mathbb {R}^d\). Now, by the change of variables formula (Bogachev 2007, Theorem 3.6.1) and the fact that \(\xi \circ \varrho _1^{-1}\) is the Lebesgue–Stieltjes measure generated by \(-\ell _{\varrho _0,\varrho _1}\), we get

$$\begin{aligned} P_T(t,B)&= \int _{\mathbb {R}^d} U_T(x, B) U_X(t, \text {d}x) \\&= \frac{\int _{L(t,\varrho _1)} U_T(x,B) \varrho _0(x) \text {d}x}{\int _{L(t,\varrho _1)} \varrho _0(x) \text {d}x} \\&= \frac{1}{\ell _{\varrho _0,\varrho _1}(t)} \int _{L(t,\varrho _1)} U_T(x,B) \xi (\text {d}x) \\&= \frac{1}{\ell _{\varrho _0,\varrho _1}(t)} \int _{\mathbb {R}^d} g_B(\varrho _1(x)) \mathbbm {1}_{(0,\varrho _1(x))}(t) \xi (\text {d}x) \\&= \frac{1}{\ell _{\varrho _0,\varrho _1}(t)} \int _0^{\infty } g_B(s) \mathbbm {1}_{(0,s)}(t)\; (\xi \circ \varrho _1^{-1})(\text {d}s) \\&= \frac{1}{\ell _{\varrho _0,\varrho _1}(t)} \int _t^{\infty } \frac{\lambda (B \, \cap (0,s))}{s} (\xi \circ \varrho _1^{-1})(\text {d}s) \\&= \frac{1}{\ell _{\varrho _0,\varrho _1}(t)}\int _t^{\infty } \frac{\lambda (B \, \cap (0,s))}{s} \text {d}(-\ell _{\varrho _0,\varrho _1})(s) \end{aligned}$$

for any \(t > 0\), \(B \in \mathcal {B}((0,\infty ))\), which proves the claimed result. \(\square \)
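The representation can be checked numerically in a simple case (an illustrative sketch): for USS (\(\varrho _0 = \textbf{1}\)) applied to the one-dimensional density \(\eta (s) = \exp (-\left|{s}\right|)\) one has \(\ell _{\textbf{1},\eta }(t) = 2(-\log t)\) for \(t \in (0,1)\), hence \(\text {d}(-\ell _{\textbf{1},\eta })(s) = (2/s)\, \text {d}s\), and the Lebesgue–Stieltjes integral can be compared against a Monte Carlo simulation of \(U_X U_T\):

```python
import numpy as np

rng = np.random.default_rng(3)

# USS (varrho_0 = 1) for eta(s) = exp(-|s|) on R: here
# ell(t) = lambda({eta > t}) = 2*(-log t) on (0,1), so d(-ell)(s) = (2/s) ds
t0, b, n_mc = 0.2, 0.5, 200_000

# Monte Carlo estimate of P_T(t0, (0,b)) by simulating U_X followed by U_T
r = -np.log(t0)                               # L(t0, eta) = (-r, r)
x = rng.uniform(-r, r, size=n_mc)             # X-update: mu_{t0} is uniform
t1 = rng.uniform(0.0, np.exp(-np.abs(x)))     # T-update: t1 ~ U(0, eta(x))
p_mc = float(np.mean(t1 < b))

# Lebesgue-Stieltjes integral from the theorem, via a Riemann sum
s = np.linspace(t0, 1.0, 1_000_001)
ds = s[1] - s[0]
p_ls = float(np.sum(np.minimum(b, s) / s * (2.0 / s)) * ds / (2.0 * r))
```

The two numbers agree up to Monte Carlo error, in line with the theorem.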

By combining the previous two theorems suitably we are able to show that if two distributions have the same function \(\ell _{\varrho _0,\varrho _1}\), the spectral gaps of slice sampling for them also coincide.

Theorem 3.5

For \(d, k \in \mathbb {N}\) let \(\pi \in \mathcal {M}_1(\mathbb {R}^d)\) and \(\nu \in \mathcal {M}_1(\mathbb {R}^k)\) be distributions with not necessarily normalized Lebesgue-densities \(\varrho \) and \(\eta \) satisfying \(\varrho = \varrho _0 \, \varrho _1\) and \(\eta = \eta _0 \, \eta _1\) for some measurable functions \(\varrho _j :\mathbb {R}^d \rightarrow \mathbb {R}_+\) and \(\eta _j :\mathbb {R}^k \rightarrow \mathbb {R}_+\) for \(j=0,1\). If \(\ell _{\varrho _0,\varrho _1} \equiv \ell _{\eta _0,\eta _1}\), i.e., if \(\ell _{\varrho _0,\varrho _1}(t) = \ell _{\eta _0,\eta _1}(t)\) for all \(t \in (0,\infty )\), then

$$\begin{aligned} \textsf {gap}_{\pi }(P_X^{(\pi )}) = \textsf {gap}_{\nu }(P_X^{(\nu )}), \end{aligned}$$

where \(P_X^{(\pi )}\) is the transition kernel of slice sampling for \(\pi \) based on the factorization \(\varrho = \varrho _0 \, \varrho _1\) and \(P_X^{(\nu )}\) is the transition kernel of slice sampling for \(\nu \) based on the factorization \(\eta = \eta _0 \, \eta _1\).

Proof

By Theorem 3.4 and the assumption \(\ell _{\varrho _0,\varrho _1} \equiv \ell _{\eta _0,\eta _1}\), we immediately get \(P_T^{(\pi )}(t,B) = P_T^{(\nu )}(t,B)\) for all \(t \in (0,\infty )\) and \(B \in \mathcal {B}((0,\infty ))\), where \(P_T^{(\pi )}\) is the transition kernel of the auxiliary chain \((T_n)_{n \in \mathbb {N}}\) of the slice sampler for \(\pi \) and \(P_T^{(\nu )}\) the corresponding one for \(\nu \). As the kernels of the auxiliary chains coincide, their invariant distributions, say \(\widetilde{\pi }, \widetilde{\nu }\) (cf. Lemma 3.1), must do so as well, i.e., \(\widetilde{\pi } \equiv \widetilde{\nu }\). Applying Theorem 3.2 twice yields

$$\begin{aligned} \textsf {gap}_{\pi }(P_X^{(\pi )}) = \textsf {gap}_{\widetilde{\pi }}(P_T^{(\pi )}) = \textsf {gap}_{\widetilde{\nu }}(P_T^{(\nu )}) = \textsf {gap}_{\nu }(P_X^{(\nu )}). \end{aligned}$$

\(\square \)

Going beyond the investigation of Natarovskii et al. (2021), the former result shows that two different slice samplers (possibly based on different kinds of factorizations, not just different target distributions) have the same spectral gap as long as their corresponding generalized level-set functions coincide. We illustrate the versatility of this result in the following example.

Example 3.6

For any \(d \in \mathbb {N}\), let \(\varrho : \mathbb {R}^d \rightarrow \mathbb {R}_+\) and \(\eta : \mathbb {R}\rightarrow \mathbb {R}_+\) be given by

$$\begin{aligned} \varrho (x) = \left\| x\right\| ^{1-d} \exp (-\left\| x\right\| ), \qquad \eta (s) = \exp (- c_d \left|{s}\right|) \end{aligned}$$

for \(x \in \mathbb {R}^d\), \(s \in \mathbb {R}\), with \(c_d:= 2 \sigma _{d-1}(\mathbb {S}^{d-1})^{-1}\). Let \(\pi \) and \(\nu \) be the distributions with non-normalized densities \(\varrho \) and \(\eta \). We now consider PSS for \(\pi \) and USS for \(\nu \), i.e., we factorize \(\varrho = p_0 \, p_1\) (cf. (2)) and \(\eta = \textbf{1} \cdot \eta \). By the polar coordinates formula, see Proposition A.3, we readily obtain

$$\begin{aligned} \ell _{p_0,p_1}(t)&= \int _{\mathbb {R}^d} \left\| x\right\| ^{1-d} \mathbbm {1}_{(t,\infty )}(\exp (-\left\| x\right\| )) \text {d}x \\&= \sigma _{d-1}(\mathbb {S}^{d-1}) \cdot \int _0^{\infty } r^{1-d} \mathbbm {1}_{(-\infty ,-\log t)}(r) r^{d-1} \text {d}r \\&= 2 c_d^{-1} \cdot (-\log t) \end{aligned}$$

for \(t \in (0,1)\) and \(\ell _{p_0,p_1}(t) = 0\) for \(t \ge 1\). Furthermore, for \(\eta \) one has

$$\begin{aligned} \ell _{\textbf{1},\eta }(t)&= \int _{\mathbb {R}} \mathbbm {1}_{(t,\infty )}(\exp (-c_d \left|{s}\right|))\, \text {d}s \\&= \int _{\mathbb {R}} \mathbbm {1}_{(-\infty ,c_d^{-1} \cdot (- \log t))}(\left|{s}\right|) \text {d}s = 2 c_d^{-1} \cdot (-\log t) , \end{aligned}$$

again for \(t \in (0,1)\), with \(\ell _{\textbf{1},\eta }(t) = 0\) for all \(t \ge 1\). Overall, this yields

$$\begin{aligned} \ell _{p_0,p_1}(t) = 2 c_d^{-1} \log (t^{-1}) \mathbbm {1}_{(0,1)}(t) = \ell _{\textbf{1},\eta }(t) \end{aligned}$$

for all \(t \in (0,\infty )\). Hence by Theorem 3.5 the spectral gaps that correspond to the different slice sampling schemes coincide. In particular, from Natarovskii et al. (2021, Example 3.15) we know that \( \textsf {gap}_{\nu }(P_X^{(\nu )}) \ge 1/2\). Consequently, we obtain for PSS that also \(\textsf {gap}_{\pi }(P_X^{(\pi )}) \ge 1/2\).
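The coincidence of the two level-set functions can be sanity-checked numerically (an illustrative sketch; the radial reduction makes both sides effectively one-dimensional integrals):

```python
import numpy as np
from math import gamma, pi as math_pi

d = 5
sigma = 2.0 * math_pi ** (d / 2) / gamma(d / 2)  # sigma_{d-1}(S^{d-1})
c_d = 2.0 / sigma

s = np.linspace(-60.0, 60.0, 2_000_001)  # grid for the 1-d density eta
ds = s[1] - s[0]
eta = np.exp(-c_d * np.abs(s))

r = np.linspace(0.0, 60.0, 1_000_001)    # radial grid for the PSS side
dr = r[1] - r[0]

for t in (0.1, 0.5, 0.9):
    ell_uss = float((eta > t).sum()) * ds            # lambda({eta > t})
    # polar coordinates: the integrand ||x||^{1-d} r^{d-1} collapses to 1
    ell_pss = sigma * float((np.exp(-r) > t).sum()) * dr
    ell_closed = 2.0 / c_d * (-np.log(t))
    assert abs(ell_uss - ell_closed) < 0.01
    assert abs(ell_pss - ell_closed) < 0.01
```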

The example already indicates how Theorem 3.5 can be applied to carry the spectral gap of one slice sampling scheme over to another. Now, we identify properties of the generalized level-set function that allow this ‘carrying over’ in a universal fashion, cf. (Natarovskii et al. 2021, Definition 3.9).

Definition 3.7

For any \(k \in \mathbb {N}\), we define \(\Lambda _k\) as the class of continuous functions \(\ell :(0,\infty ) \, \rightarrow \mathbb {R}_+\) that satisfy

  (i)

    \(\lim _{t \rightarrow \infty } \ell (t) = 0\) and \(\mathcal {L}:= \lim _{t \searrow 0} \ell (t) \in (0,\infty ]\),

  (ii)

    \(\ell \) restricted to its open support

    $$\begin{aligned} \,\textsf {supp}(\ell ):= \left( 0,\sup \{t \in (0,\infty ) :\ell (t) > 0\}\right) \end{aligned}$$

    is strictly decreasing, and

  (iii)

    the function \(g:(0,\mathcal {L}^{1/k}) \rightarrow (0,\infty )\) with \(g(r) =\ell ^{-1}(r^k)\) is log-concave.

Remark 3.8

Conditions (i) and (ii) together with the assumed continuity of \(\ell \) guarantee that \(\ell \) restricted to its open support \(\,\textsf {supp}(\ell )\) maps surjectively onto \(I_{\ell }:= (0,\mathcal {L})\). As condition (ii) also guarantees injectivity of this restricted function, it must actually be bijective, which gives the existence of the inverse function \(\ell ^{-1}: I_{\ell } \rightarrow \,\textsf {supp}(\ell )\) used in condition (iii). Observe that, as the inverse of a strictly decreasing function, \(\ell ^{-1}\) must again be strictly decreasing.

The properties of the previously defined classes of functions allow us to construct, for \(\ell \in \Lambda _k\), a not necessarily normalized density function \(\eta :\mathbb {R}^k \rightarrow \mathbb {R}_+\) for which USS, targeting \(\nu \in \mathcal {M}_1(\mathbb {R}^k)\) given by

$$\begin{aligned} \nu (A) = \frac{\int _A \eta (z) \text {d}z}{\int _{\mathbb {R}^k} \eta (z)\, \text {d}z}, \qquad A\in \mathcal {B}(\mathbb {R}^k), \end{aligned}$$
(9)

has a spectral gap of at least \(1/(k+1)\) and satisfies \(\ell _{\textbf{1},\eta } \equiv \ell \). With that and Theorem 3.5 we can draw conclusions about the spectral gap of generalized slice sampling. The correspondingly formulated statement reads as follows.

Theorem 3.9

Given \(\varrho _0 :\mathbb {R}^d \rightarrow \mathbb {R}_+\) and a not necessarily normalized density \(\varrho :\mathbb {R}^d \rightarrow \mathbb {R}_+\), choose \(\varrho _1 :\mathbb {R}^d \rightarrow \mathbb {R}_+\), so that \(\varrho = \varrho _0 \,\varrho _1\). Let \(\pi \in \mathcal {M}_1(\mathbb {R}^d)\) be specified by \(\varrho \) as in (1). Let \(P_X^{(\pi )}\) be the transition kernel that corresponds to slice sampling for \(\pi \) based on \(\varrho _0\) and \(\varrho _1\). Then, for \(k \in \mathbb {N}\) with \(\ell _{\varrho _0,\varrho _1} \in \Lambda _k\), we have

$$\begin{aligned} \textsf {gap}_{\pi }(P_X^{(\pi )}) \ge \frac{1}{k+1}. \end{aligned}$$

Proof

To shorten the notation we set \(\ell :=\ell _{\varrho _0,\varrho _1}\) and \(\mathcal {L}:= \sup _{t>0} \ell (t) = \lim _{t\searrow 0} \ell (t)\). Fix \(k \in \mathbb {N}\) with \(\ell \in \Lambda _k\). Let \(\kappa := \left( k \, \sigma _{k-1}(\mathbb {S}^{k-1})^{-1} \mathcal {L}\right) ^{1/k} \in (0,\infty ]\) and define \(\phi : (0,\kappa ) \rightarrow \mathbb {R}\) by

$$\begin{aligned} \phi (r):= -\log \left( \ell ^{-1}\left( \frac{\sigma _{k-1}(\mathbb {S}^{k-1})}{k} r^k \right) \right) . \end{aligned}$$
(10)

Then, one readily observes that

  • \(\phi \) is strictly increasing as composition of the strictly increasing function \(r \mapsto \sigma _{k-1}(\mathbb {S}^{k-1})/k \cdot r^k\) and the strictly decreasing functions \(\ell ^{-1}\) and \(r \mapsto -\log r\),

  • \(\phi \) is convex as composition of the linear function \(r \mapsto (\sigma _{k-1}(\mathbb {S}^{k-1})/k)^{1/k} r\) and the (by Definition 3.7 (iii)) convex function \(r \mapsto -\log \ell ^{-1}(r^k)\); and

  • the inverse of \(\phi \) is given by

    $$\begin{aligned} \phi ^{-1}(s) = \left( \frac{k \cdot \ell (\exp (-s)) }{\sigma _{k-1}(\mathbb {S}^{k-1})}\right) ^{1/k} \end{aligned}$$
    (11)

    for \(s> - \log (\sup \{t \in (0,\infty ) \; :\ell (t) > 0 \})\).
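The three bullet points can be sanity-checked numerically. A small sketch with \(k = 1\), \(\sigma _0(\mathbb {S}^{0}) = 2\) and the hypothetical level set function \(\ell (t) = -\log t\) (so \(\ell ^{-1}(y) = \exp (-y)\) and \(\phi (r) = 2r\)) verifies that (10) and (11) are mutually inverse and that \(\phi \) is strictly increasing:

```python
import numpy as np

k, sigma = 1, 2.0  # sigma_0(S^0) = 2 (the 0-sphere consists of two points)
ell = lambda t: -np.log(t)
ell_inv = lambda y: np.exp(-y)

phi = lambda r: -np.log(ell_inv(sigma / k * r ** k))          # definition (10)
phi_inv = lambda s: (k * ell(np.exp(-s)) / sigma) ** (1 / k)  # formula (11)

r = np.linspace(0.1, 5.0, 100)
assert np.allclose(phi_inv(phi(r)), r)  # phi_inv inverts phi
assert np.all(np.diff(phi(r)) > 0)      # phi strictly increasing
```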

Consider \(\nu \in \mathcal {M}_1(\mathbb {R}^k)\) as in (9) determined by the not necessarily normalized Lebesgue-density \(\eta :\mathbb {R}^k \rightarrow \mathbb {R}_+\) given by

$$\begin{aligned} \eta (x) = \exp (-\phi (\left\| x\right\| )) \mathbbm {1}_{(0,\kappa )}(\left\| x\right\| ), \qquad x\in \mathbb {R}^k, \end{aligned}$$

with \(\phi \) from (10). By the fact that \(\phi \) is strictly increasing and convex, (Natarovskii et al. 2021, Corollary 3.1) yields that the spectral gap of the transition kernel \(P_X^{(\nu )}\) of USS for \(\nu \) satisfies

$$\begin{aligned} \textsf {gap}_{\nu }(P_X^{(\nu )}) \ge \frac{1}{k+1}. \end{aligned}$$
(12)

Now the goal is to show that the level-set function \(\ell _{{\textbf {1}},\eta }\) is identical to \(\ell =\ell _{\varrho _0,\varrho _1}\). We obtain for \(0 \ne x \in \mathbb {R}^k\) and \(t \in (0,\sup _{y \in \mathbb {R}^k} \eta (y))\) that

$$\begin{aligned} \eta (x) > t \quad&\Leftrightarrow \quad \phi (\left\| x\right\| )< -\log t \quad \text {and} \quad \left\| x\right\|< \kappa \\&\Leftrightarrow \quad \left\| x\right\|< \phi ^{-1}(-\log t) \quad \text {and} \quad \left\| x\right\|< \kappa \\&\Leftrightarrow \quad \left\| x\right\| < \phi ^{-1}(-\log t) , \end{aligned}$$

where the second equivalence relies on \(\phi ^{-1}\) being strictly increasing, and the third on \(\phi ^{-1}\) mapping into the domain \((0,\kappa )\) of \(\phi \), so that in particular \(\phi ^{-1}(-\log t) < \kappa \). Hence, the super-level set of \(\eta \) is

$$\begin{aligned} L(t,\eta ) = \{ x\in \mathbb {R}^k :\left\| x\right\| < \phi ^{-1}(-\log t) \}. \end{aligned}$$

Consequently, by the polar coordinates formula, see Proposition A.3, we get

$$\begin{aligned} \ell _{\textbf{1},\eta }(t)&= \int _{L(t,\eta )} \textbf{1}(x) \text {d}x \\&= \sigma _{k-1}(\mathbb {S}^{k-1}) \int _0^{\infty } \mathbbm {1}_{[0,\phi ^{-1}(-\log t)]}(r)\, r^{k-1}\, \text {d}r \\&= \sigma _{k-1}(\mathbb {S}^{k-1}) \left[ \frac{1}{k} r^k \right] _0^{\phi ^{-1}(-\log t)} \\&= \frac{\sigma _{k-1}(\mathbb {S}^{k-1})}{k} \phi ^{-1}(-\log t)^k = \ell (t), \end{aligned}$$

where the last equality follows by plugging in (11).

Finally, by Theorem 3.5 and (12), we obtain

$$\begin{aligned} \textsf {gap}_{\pi }(P_X^{(\pi )}) = \textsf {gap}_{\nu }(P_X^{(\nu )}) \ge \frac{1}{k+1}, \end{aligned}$$

which concludes the proof. \(\square \)

We pose an open question about the result. To this end we introduce the following class of probability measures that are ‘good’ in the sense of the previous theorem.

Definition 3.10

Given \(\varrho _0:\mathbb {R}^d \rightarrow \mathbb {R}_+\) and \(k\in \mathbb {N}\), define \(\Pi _{\varrho _0,k}\subset \mathcal {M}_1(\mathbb {R}^d)\) as the class of probability measures \(\pi \) satisfying the following:

  1.

    \(\pi \in \Pi _{\varrho _0,k}\) is determined by a not necessarily normalized density \(\varrho :\mathbb {R}^d \rightarrow \mathbb {R}_+\) as defined in (1) with \(\varrho _1 :\mathbb {R}^d \rightarrow \mathbb {R}_+\) being chosen so that \(\varrho = \varrho _0 \,\varrho _1\); and

  2.

    \(\ell _{\varrho _0,\varrho _1}\in \Lambda _k\).

Then, Theorem 3.9 yields

$$\begin{aligned} \inf _{\pi \in \Pi _{\varrho _0,k}} \textsf {gap}_{\pi }(P_X^{(\pi )}) \ge \frac{1}{k+1}, \end{aligned}$$

i.e., the worst case behavior of the spectral gap on the input class \(\Pi _{\varrho _0,k}\) is at least \(1/(k+1)\). The question we pose is how good this lower bound actually is. In other words, is there a matching upper bound on the worst case spectral gap that also tends to zero as \(k\rightarrow \infty \), and if so, does it imply that the lower bound cannot be improved? Any insight in that direction may lead to a characterization of the spectral gap of generalized slice sampling and thereby indicate its limitations.

We finish this section with an immediate consequence of Theorem 3.9 w.r.t. PSS.

Corollary 3.11

For a not necessarily normalized density function \(\varrho :\mathbb {R}^d \rightarrow \mathbb {R}_+\) let \(\pi \) be the corresponding distribution as in (1) and define \(p_0\), \(p_1\) by (2). Assume that \(\ell _{p_0,p_1} \in \Lambda _k\) for some \(k \in \mathbb {N}\). Then \(P = P^{(\pi )}_X\), the transition kernel of PSS for \(\pi \), satisfies

$$\begin{aligned} \textsf {gap}_{\pi }(P) \ge \frac{1}{k+1}. \end{aligned}$$

Remark 3.12

In the setting of the previous corollary assume that \(k\) does not depend on \(d\). In that case we already have a dimension-independent lower bound on the spectral gap of PSS. We want to emphasize that even though the spectral gap is independent of \(d\), implementing the second step of Algorithm 1.1 may lead to an acceptance probability that decreases with \(d\). The already mentioned Gibbsian polar slice sampler of Schär et al. (2023) addresses this issue.
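To make the remark concrete, the following toy sketch (all names hypothetical, not from the paper) runs a chain consistent with the two-step PSS mechanism for the rotationally invariant target \(\varrho (x) = \exp (-\Vert x\Vert ^2/2)\). For such targets the second step can be carried out exactly: the \(p_0\)-weighted uniform draw on the slice decomposes into a uniform radius on \(\{r : h_1(r) > t\}\), where \(h_1(r) = r^{d-1}\exp (-r^2/2)\), and a uniform direction on the sphere, so no acceptance step is needed in this special case.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
h1 = lambda r: r ** (d - 1) * np.exp(-0.5 * r ** 2)  # p_1 along the radius
r_mode = np.sqrt(d - 1)  # maximizer of h1, solving r * phi'(r) = d - 1 here

def bisect(f, lo, hi, iters=80):
    # locate a sign change of f on [lo, hi] by plain bisection
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pss_step(x):
    # step 1: draw the level uniformly below p_1(x) = h1(||x||)
    t = rng.uniform(0.0, h1(np.linalg.norm(x)))
    # step 2: radius uniform on {r : h1(r) > t}, direction uniform on S^{d-1}
    r_min = bisect(lambda r: h1(r) - t, 1e-12, r_mode)
    r_max = bisect(lambda r: h1(r) - t, r_mode, r_mode + 50.0)
    u = rng.standard_normal(d)
    return rng.uniform(r_min, r_max) * u / np.linalg.norm(u)

x = np.ones(d)
norms = []
for _ in range(1000):
    x = pss_step(x)
    norms.append(np.linalg.norm(x))
```

For the standard Gaussian in \(d = 10\) the chain's radius fluctuates around \(\sqrt{d}\); the per-step cost of this exact update does not deteriorate with the dimension.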

We now move on to our next main result, where we apply Theorem 3.9 (or Corollary 3.11) and provide concrete properties of \(\varrho \) that lead to a spectral gap of at least 1/2 of PSS.

3.2 Polar slice sampling

We assume here \(d \ge 2\) and note that PSS coincides with USS for \(d=1\). Consequently, in that case, spectral gap estimates for the latter carry over to the former.

The strategy in this section is to consider a specific class of not necessarily normalized density functions \(\varrho \) for which we verify that the corresponding PSS generalized level set function \(\ell _{p_0,p_1}\) satisfies \(\ell _{p_0,p_1} \in \Lambda _1\). Then, by applying Theorem 3.9, we readily obtain a dimension-independent lower bound on the spectral gap of PSS. We formulate the main statement and discuss the required assumptions.

Theorem 3.13

Let \(\pi \in \mathcal {M}_1(\mathbb {R}^d)\) be a distribution with not necessarily normalized density \(\varrho \) given by

$$\begin{aligned} \varrho (x) = \exp (- \phi (\left\| x\right\| )) \mathbbm {1}_{(0,\kappa )}(\left\| x\right\| ), \end{aligned}$$

where \(\kappa \in (0,\infty ]\) and \(\phi : (0,\kappa ) \rightarrow \mathbb {R}\) is a convex and twice differentiable function that satisfies \(\lim _{r \nearrow \kappa } \phi (r) = \infty \). Then we have

$$\begin{aligned} \textsf {gap}_{\pi }(P) \ge \frac{1}{2}, \end{aligned}$$

where \(P=P^{(\pi )}_X\) is the transition kernel of PSS for \(\pi \).

Remark 3.14

We discuss the conditions of the theorem and the objects appearing in it:

  • The parameter \(\kappa \) controls the support of \(\varrho \): If it is finite, \(\varrho \) is only supported on the zero-centered Euclidean ball of radius \(\kappa \), and if it is infinite, \(\varrho \) is supported on all of \(\mathbb {R}^d\).

  • The densities \(\varrho \) to which the theorem applies are rotationally invariant, i.e., they may only depend on the function’s argument x through \(\left\| x\right\| \).

  • The convexity constraint on \(\phi \) ensures that \(\varrho \) is log-concave along rays emanating from the origin, which has already proven to be a useful property for theoretical results regarding PSS in Roberts and Rosenthal (2002). In particular, it guarantees that the function \(h_1:(0,\kappa ) \rightarrow \mathbb {R}_+\), \(r\mapsto r^{d-1} \exp (-\phi (r))\), appearing later, has interval-like super level sets.

  • That the function \(\phi \) is required to be twice differentiable eases our proof, but we believe that the theorem’s claim is still true without this assumption.

  • The condition \(\lim _{r \nearrow \kappa } \phi (r) = \infty \) means that \(\varrho \) must tend to zero whenever its argument approaches the boundary of the support. The requirement is always satisfied when \(\kappa =\infty \), since \(\varrho \) is assumed to be Lebesgue-integrable.

Note that the rotational invariance, the convexity and the constraint \(\lim _{r \nearrow \kappa } \phi (r) = \infty \), without any further monotonicity requirements on \(\phi \), lead to two types of admissible target densities: unimodal densities, which result from non-decreasing \(\phi \), and “volcano”-shaped densities, which result from \(\phi \) that are initially strictly decreasing and at some point become strictly increasing. Particularly notable is that the lower bound on the spectral gap over the class of \(\varrho \) specified in the theorem is constant, i.e., it does not depend on any continuity or concentration parameter, not even on the state space dimension d. Towards proving Theorem 3.13 we start with a characterization of the corresponding level set functions.
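Both admissible shapes can be illustrated with a few lines. A sketch, using the hypothetical choices \(\phi (r) = r^2/2\) (non-decreasing, giving a unimodal density) and \(\phi (r) = r^2/2 - 2r\) (initially decreasing, then increasing, giving a volcano-shaped density), verifies convexity via discrete second differences as well as the divergence \(\lim _{r \nearrow \kappa } \phi (r) = \infty \) for \(\kappa = \infty \):

```python
import numpy as np

phi_unimodal = lambda r: 0.5 * r ** 2           # non-decreasing on (0, infty)
phi_volcano = lambda r: 0.5 * r ** 2 - 2.0 * r  # decreasing on (0,2), then increasing

r = np.linspace(1e-3, 20.0, 100_000)
for phi in (phi_unimodal, phi_volcano):
    v = phi(r)
    d2 = v[:-2] - 2 * v[1:-1] + v[2:]
    assert np.all(d2 >= -1e-9)  # convexity via second differences
    assert v[-1] > 1e2          # phi diverges as r grows
```

The density \(\exp (-\phi _{\text{volcano}}(\Vert x\Vert ))\) attains its maximum on the sphere \(\Vert x\Vert = 2\) rather than at the origin, which is exactly the “volcano” shape described above.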

Lemma 3.15

Assume that the requirements of Theorem 3.13 are satisfied. Factorize \(\varrho \) in accordance with PSS, i.e., \(\varrho = p_0 \, p_1\) (cf. (2)). Define \(h_1: (0,\kappa ) \rightarrow \mathbb {R}_+\) by

$$\begin{aligned} h_1(r):= r^{d-1} \exp (-\phi (r)). \end{aligned}$$

Then, there exists a value \(r_{\text {mode}} \in (0,\kappa )\) such that

$$\begin{aligned}&\ell _{p_0,p_1}(t) \\&= {\left\{ \begin{array}{ll} \sigma _{d-1}(\mathbb {S}^{d-1}) (r_{\max }(t) - r_{\min }(t)) &{} 0<t < h_1(r_{\text {mode}}) , \\ 0 &{} t \ge h_1(r_{\text {mode}}) , \end{array}\right. } \end{aligned}$$

with functions

$$\begin{aligned} r_{\max } :&\left( 0,h_1(r_{\text {mode}})\right) \rightarrow \left( r_{\text {mode}},\kappa \right) , \\&\; t \mapsto \left( h_1|_{(r_{\text {mode}},\kappa )} \right) ^{-1}(t), \\ r_{\min } :&\left( 0,h_1(r_{\text {mode}})\right) \rightarrow \left( 0,r_{\text {mode}}\right) , \\&\; t \mapsto \left( h_1|_{(0,r_{\text {mode}})} \right) ^{-1}(t), \end{aligned}$$

that are strictly decreasing and strictly increasing, respectively.

Proof

The generalized level set function \(\ell _{p_0,p_1}\) of \(\varrho = p_0 \, p_1\), given as in (2), satisfies, by virtue of Proposition A.3, for all \(t > 0\) that

$$\begin{aligned} \ell _{p_0,p_1}(t)&= \int _{\mathbb {R}^d} \left\| x\right\| ^{1-d} \mathbbm {1}_{(t,\infty )}(h_1(\left\| x\right\| )) \nonumber \mathbbm {1}_{(0,\kappa )}(\left\| x\right\| ) \text {d}x \\&= \sigma _{d-1}(\mathbb {S}^{d-1}) \int _0^{\kappa } r^{1-d}\,\mathbbm {1}_{(t,\infty )}(h_1(r)) \, r^{d-1} \text {d}r \nonumber \\&= \sigma _{d-1}(\mathbb {S}^{d-1}) \lambda (\{r \in (0,\kappa ) \; :h_1(r) > t \}). \end{aligned}$$
(13)

We analyze the function \(h_1\) to deduce the claimed representation of \(\ell _{p_0,p_1}\) from the former expression. Observe that

$$\begin{aligned} h_1^{\prime }(r)&= (d-1) r^{d-2} \exp (-\phi (r)) - r^{d-1} \phi ^{\prime }(r) \exp (-\phi (r)) \\&= (d - 1 - r \phi ^{\prime }(r)) r^{d-2} \exp (-\phi (r)) . \end{aligned}$$

Define \(h_2: (0,\kappa ) \rightarrow \mathbb {R}, \; r \mapsto r \phi ^{\prime }(r)\) and observe \(h_2^{\prime }(r) = \phi ^{\prime }(r) + r \phi ^{\prime \prime }(r)\). Let \(r_{\phi } \in [0,\kappa )\) be such that \(\phi \) is decreasing on \((0,r_{\phi })\) (possibly \(\emptyset \)) and strictly increasing on \((r_{\phi },\kappa )\). That this value exists is an immediate consequence of the convexity of \(\phi \) and the requirement \(\lim _{r \nearrow \kappa } \phi (r) = \infty \). For \(r \in (0,r_{\phi })\), where \(\phi \) is decreasing (\(\phi ^{\prime }(r) \le 0\)), we have \(h_2(r) \le 0\), while for \(r \in (r_{\phi },\kappa )\), where \(\phi \) is not just convex (\(\phi ^{\prime \prime }(r) \ge 0\)) but also strictly increasing (\(\phi ^{\prime }(r) > 0\)), we get \(h_2^{\prime }(r) > 0\).

In cases where \(\kappa < \infty \), the property \(\lim _{r \nearrow \kappa } \phi (r) = \infty \) together with the convexity of \(\phi \) implies \(\lim _{r \nearrow \kappa } \phi ^{\prime }(r) = \infty \) and thus \(\lim _{r \nearrow \kappa } h_2(r) = \infty \). If \(\kappa = \infty \), then the same result follows from \(\phi ^{\prime }(r)\) being positive for \(r > r_{\phi }\) and non-decreasing on account of its slope \(\phi ^{\prime \prime }\) being non-negative by convexity of \(\phi \).

Combining these observations, we see that \(h_2\) is upper-bounded by zero on \((0,r_{\phi })\) and strictly increasing towards \(+\infty \) on \((r_{\phi },\kappa )\). Therefore, there exists an \(r_{\text {mode}} \in (0,\kappa )\) with \(h_2(r_{\text {mode}}) = d-1\), such that the factor \(r\mapsto d - 1 - r \phi ^{\prime }(r)\) within \(h_1^{\prime }\) is positive on \((0,r_{\text {mode}})\) and negative on \((r_{\text {mode}},\kappa )\). Consequently, for \(r \in \left( 0,r_{\text {mode}}\right) \) we get

$$\begin{aligned} h_1^{\prime }(r) = \underbrace{(d - 1 - r \phi ^{\prime }(r))}_{>0} \underbrace{r^{d-2}}_{>0} \underbrace{\exp (-\phi (r))}_{>0} > 0, \end{aligned}$$

whereas for \(r \in \left( r_{\text {mode}},\kappa \right) \) we have

$$\begin{aligned} h_1^{\prime }(r) = \underbrace{(d - 1 - r \phi ^{\prime }(r))}_{<0} \underbrace{r^{d-2}}_{>0} \underbrace{\exp (-\phi (r))}_{>0} < 0. \end{aligned}$$

In other words, \(h_1\) is unimodal with mode located at \(r_{\text {mode}}\). Moreover, the above shows that \(h_1|_{(0,r_{\text {mode}})}\), the inverse of \(r_{\min }\), is a strictly increasing function, which implies that \(r_{\min }\) is also strictly increasing. Analogously it follows that \(r_{\max }\) is strictly decreasing. Besides that, we have \(h_1(r) < h_1(r_{\text {mode}})\) for \(r \ne r_{\text {mode}}\), which by (13) readily gives \(\ell _{p_0,p_1}(t) = 0\) for \(t \ge h_1(r_{\text {mode}})\).

Since \(\phi ^{\prime }\) is positive on \((r_{\phi },\kappa )\) and non-decreasing (as noted before), there is an \(\varepsilon >0\) with \(r_\phi +\varepsilon <\kappa \) such that \(\phi ^{\prime }(r) \ge \phi ^{\prime }(r_{\phi } + \varepsilon ) > 0\) for all \(r\ge r_\phi +\varepsilon \). For \(\kappa =\infty \), using this observation, L’Hospital’s rule yields

$$\begin{aligned} \lim _{r\nearrow \infty } h_1(r)&= \lim _{r\nearrow \infty } \frac{r^{d-1}}{\exp (\phi (r))} \\&= \lim _{r\nearrow \infty } \frac{(d-1) r^{d-2}}{\phi ^{\prime }(r) \exp (\phi (r))} \\&\le \frac{d-1}{\phi ^{\prime }(r_{\phi } + \varepsilon )} \lim _{r\nearrow \infty } \frac{r^{d-2}}{\exp (\phi (r))} \end{aligned}$$

and by iterating this inductively we get

$$\begin{aligned} \lim _{r\nearrow \infty } h_1(r) \le \frac{(d-1)!}{\phi ^{\prime }(r_{\phi } + \varepsilon )^{d-1}} \lim _{r\nearrow \infty } \frac{1}{\exp (\phi (r))} = 0. \end{aligned}$$

For \(\kappa <\infty \), the condition \(\lim _{r \nearrow \kappa } \phi (r) = \infty \) likewise yields \(\lim _{r\nearrow \kappa } h_1(r)=0\). Consequently we have

$$\begin{aligned} \lim _{r\searrow 0} h_1(r) = 0 = \lim _{r\nearrow \kappa } h_1(r), \end{aligned}$$

where the first equality holds by definition. This finally allows us to conclude

$$\begin{aligned}&\{r \in (0,\kappa ) \; :h_1(r) > t \} \\&= \left( \left( h_1|_{(0,r_{\text {mode}})} \right) ^{-1}(t),\left( h_1|_{(r_{\text {mode}},\kappa )} \right) ^{-1}(t)\right) \\&= (r_{\min }(t),r_{\max }(t)) \end{aligned}$$

for \(t \in \left( 0,h_1(r_{\text {mode}})\right) \). The claimed formula for \(\ell _{p_0,p_1}\) is obtained by plugging this identity into (13). \(\square \)
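The lemma's representation is easy to verify numerically. A sketch under the hypothetical choice \(\phi (r) = r^2/2\) (so \(\kappa = \infty \) and \(r_{\text {mode}} = \sqrt{d-1}\)) compares the Lebesgue measure appearing in (13) with \(r_{\max }(t) - r_{\min }(t)\), recovering the inverses of the two monotone branches of \(h_1\) by interpolation:

```python
import numpy as np

d = 5
h1 = lambda r: r ** (d - 1) * np.exp(-0.5 * r ** 2)
r_mode = np.sqrt(d - 1)  # here r * phi'(r) = r^2 = d - 1 at the mode

r = np.linspace(1e-6, 30.0, 3_000_001)
dr = r[1] - r[0]
v = h1(r)
left = r <= r_mode
for frac in (0.1, 0.5, 0.9):
    t = frac * h1(r_mode)
    # Lebesgue measure of {r : h1(r) > t}, cf. (13) up to sigma_{d-1}(S^{d-1})
    measure = np.count_nonzero(v > t) * dr
    # r_min / r_max as inverses of the increasing / decreasing branch of h1
    r_min = np.interp(t, v[left], r[left])
    r_max = np.interp(-t, -v[~left], r[~left])  # negate the decreasing branch
    assert abs(measure - (r_max - r_min)) < 1e-3
```

The agreement confirms that the super-level sets of \(h_1\) are single intervals \((r_{\min }(t), r_{\max }(t))\), as asserted in the lemma.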

Using the formerly developed tool, we are able to deliver the proof of the theorem.

Proof of Theorem 3.13

To verify the statement of Theorem 3.13 we show that, for \(\varrho \) satisfying the assumptions formulated there, the corresponding level set function \(\ell _{p_0,p_1}\) satisfies \(\ell _{p_0,p_1} \in \Lambda _1\). By Lemma 3.15 it is easily seen that \(\ell _{p_0,p_1}\) is continuous, so that it suffices to check (i), (ii) and (iii) of Definition 3.7 for \(k=1\).

To (i): Just by being a generalized level set function, \(\ell _{p_0,p_1}\) satisfies the limit properties.

To (ii): The monotonicity properties of \(r_{\max }\) and \(r_{\min }\) provided by Lemma 3.15 yield that \(\ell _{p_0,p_1}\) is strictly decreasing on \(\textrm{supp}(\ell _{p_0,p_1})\).

To (iii): By Proposition A.5 it is sufficient to show that

$$\begin{aligned} h_3(s):= \ell _{p_0,p_1}(\exp (-s)) \end{aligned}$$

is concave on a set D, which, by Lemma 3.15, is here given by \(D = (-\log h_1(r_{\text {mode}}),\infty )\). Using the lemma’s representation of \(\ell _{p_0,p_1}\), we can rewrite \(h_3\) as

$$\begin{aligned} h_3(s) = \sigma _{d-1}(\mathbb {S}^{d-1}) (r_{\max }(\exp (-s)) - r_{\min }(\exp (-s))). \end{aligned}$$

Consequently, the concavity of \(h_3\) follows by concavity of

$$\begin{aligned} h_4:&\left( -\log h_1(r_{\text {mode}}),\infty \right) \rightarrow \left( r_{\text {mode}},\kappa \right) , \\&\; s \mapsto r_{\max }(\exp (-s)) \end{aligned}$$

as well as convexity of

$$\begin{aligned} h_5:&\left( -\log h_1(r_{\text {mode}}),\infty \right) \rightarrow \left( 0,r_{\text {mode}}\right) , \\&\; s \mapsto r_{\min }(\exp (-s)) . \end{aligned}$$

Concavity of \(h_4\) means convexity of \(-h_4\). Clearly, \(-h_4\) is continuous and as \(h_4\) is the composition of two strictly decreasing functions, it is strictly increasing, so \(-h_4\) is strictly decreasing. By Lemma A.4, convexity of \(-h_4\) is equivalent to convexity of its inverse \(r \mapsto h_4^{-1}(-r)\), which in turn is equivalent to convexity of \(h_4^{-1}\) itself (as the graph of one of these functions is just a reflection of that of the other on the axis \(r = 0\)), which is given by

$$\begin{aligned} h_4^{-1}(r)&= - \log r_{\max }^{-1}(r) = - \log h_1|_{(r_{\text {mode}},\kappa )}(r) \\&= - \log ( r^{d-1} \exp (-\phi (r)) ) = \phi (r) - (d-1) \log r . \end{aligned}$$

Since \(\phi \) is convex by assumption and \(-\log \) is known to be convex, the convexity of \(h_4^{-1}\) is immediate.

As the composition of a strictly increasing and a strictly decreasing function, \(h_5\) is strictly decreasing. Because \(h_5\) is clearly also continuous, applying Lemma A.4 again yields that the convexity of \(h_5\) is equivalent to that of its inverse \(h_5^{-1}\), which is given by

$$\begin{aligned} h_5^{-1}(r)&= - \log r_{\min }^{-1}(r) = - \log h_1|_{(0, r_{\text {mode}})}(r) \\&= - \log \left( r^{d-1} \exp \left( -\phi (r)\right) \right) = \phi (r) - (d-1) \log r . \end{aligned}$$

Thus, the convexity of \(h_5^{-1}\) follows by the same argument as that of \(h_4^{-1}\).

Therefore (iii) for \(k=1\) is proven and \(\ell _{p_0,p_1}\in \Lambda _1\). By Theorem 3.9 this implies the claimed spectral gap estimate. \(\square \)

4 Concluding remarks

Driven by the empirically observed dimension-independent IAT behavior documented in the motivating illustration in Sect. 1, and by the recent algorithmic contribution of Gibbsian polar slice sampling, we investigated the spectral gap of PSS. For arbitrary dimension, if \(\varrho \), the possibly non-normalized density function of the distribution of interest, is rotationally invariant, log-concave along rays emanating from the origin and sufficiently smooth, we proved a lower bound of 1/2 on the spectral gap. Along the way we significantly extended the theory of Natarovskii et al. (2021) to the setting of general slice sampling based on a factorization \(\varrho = \varrho _0 \, \varrho _1\). In Definition 3.7 we presented the class of functions \(\Lambda _k\), already introduced in Natarovskii et al. (2021, Definition 3.9), which provides the conditions on the level set function \(\ell _{\varrho _0,\varrho _1}\) required for verifying the lower bound \(1/(k+1)\) of the spectral gap for generalized slice sampling. As an immediate consequence this lower bound applies in the PSS setting. Moreover, it served as the main tool for proving the aforementioned dimension-independent spectral gap estimate for PSS.

We point to open questions, limitations and some directions of future work. Let us start with the question that has already been formulated after Definition 3.10 on how ‘good’ the lower bound of the spectral gap of generalized slice sampling of Theorem 3.9 actually is. We conjecture that at least for some \(\varrho _0\) on the class \(\Pi _{\varrho _0,k}\) the result cannot be qualitatively improved. We surmise that there is an upper bound ‘function’ \(u:\mathbb {N}\rightarrow \mathbb {R}_+\) with \(\lim _{k \rightarrow \infty } u(k) = 0\) such that the worst case spectral gap satisfies

$$\begin{aligned} \frac{1}{k+1} \le \inf _{\pi \in \Pi _{\varrho _0,k}} \textsf {gap}_{\pi }(P^{(\pi )}_X) \le u(k). \end{aligned}$$

By proving this conjecture, one would show that the parameter k is the right quantity for characterizing the spectral gap of general slice sampling, which points to the limitation that for large k the ‘efficiency’ of slice sampling indeed deteriorates. Related to the understanding of the limitations of Theorem 3.9 one may ask whether an extension into a manifold setting is possible. Recently there have been investigations of slice sampling approaches on the sphere, see e.g. Habeck et al. (2023), Lie et al. (2021), which may serve as a starting point into that direction.

Regarding our explicit dimension-independent spectral gap estimate for PSS, it is reasonable to ask how the proven estimate generalizes to broader classes of target densities, for example rotationally asymmetric ones, or those not centered at the origin. Neither rotational invariance nor being centered at the origin is exploited in the generic algorithmic description of PSS. Therefore, these properties appear to be artifacts of our analysis technique rather than necessities. It would be interesting to find other, more commonly used properties, e.g., strong concavity of smooth log-densities, that yield dimension-independent convergence results. Unfortunately, the proof of our result cannot readily be adapted to such cases and it is unknown how the spectral gap behaves.

As explained before, a crucial motivation for studying PSS is Gibbsian polar slice sampling, introduced in Schär et al. (2023). It can be considered a hybrid slice sampler, cf. Latuszynski and Rudolf (2014), that mimics PSS. Under suitable assumptions, it has been shown in Latuszynski and Rudolf (2014) that hybrid uniform slice sampling has a positive spectral gap whenever USS has one. Explicit lower bounds of the gap are to some extent inherited. It is of course very natural to ask for an extension of this result regarding PSS and the Gibbsian approach.