Dimension-independent spectral gap of polar slice sampling

Polar slice sampling, a Markov chain construction for approximate sampling, performs, under suitable assumptions on the target and initial distribution, provably independently of the state space dimension. We extend the aforementioned result of Roberts and Rosenthal (Stoch Model 18(2):257–280, 2002) by developing a theory which identifies conditions, in terms of a generalized level set function, that imply an explicit lower bound on the spectral gap even in a general slice sampling context. Verifying the identified conditions for polar slice sampling yields a lower bound of 1/2 on the spectral gap for arbitrary dimension if the target density is rotationally invariant, log-concave along rays emanating from the origin and sufficiently smooth. The general theoretical result is potentially applicable beyond the polar slice sampling framework.


Introduction
We consider the problem of approximate sampling of a distribution, which is, in the context of Bayesian inference, a permanently present challenge. The goal is to simulate realizations of a random variable that is distributed according to a probability measure of interest π defined on (R^d, B(R^d)), with d ∈ N and B(R^d) being the Borel σ-algebra of R^d. We assume to be able to evaluate a not necessarily normalized Lebesgue density of π given by ϱ : R^d → R_+, i.e., for any A ∈ B(R^d) we have

π(A) = C^{-1} ∫_A ϱ(x) dx,    (1)

where C := ∫_{R^d} ϱ(x) dx is an unknown normalization constant. Because of the only partial knowledge about ϱ, the standard approach for dealing with such sampling problems is to construct a Markov chain with limit distribution π.
The slice sampling methodology (see e.g. Besag and Green (1993)) provides a framework for the construction of a Markov chain (X_n)_{n∈N_0} with π-reversible transition kernel, where the distribution of X_n converges (under weak regularity conditions) to the distribution of interest, see e.g. Roberts and Rosenthal (1999). We focus here on polar slice sampling (PSS), which exploits the almost surely well-defined factorization ϱ(x) = p_0(x) p_1(x) with

p_0(x) := ∥x∥^{-(d-1)},   p_1(x) := ∥x∥^{d-1} ϱ(x),    (2)

where ∥·∥ denotes the Euclidean norm in R^d. The choice of this particular factorization in the slice sampling context has been proposed in Roberts and Rosenthal (2002). The resulting transition mechanism of the corresponding Markov chain (X_n)_{n∈N_0} on (R^d, B(R^d)) can be presented as follows.
Algorithm 1.1. Given the target density ϱ = p_0 p_1 and the current state X_{n-1} = x, PSS w.r.t. ϱ generates the next instance X_n by the following two steps:
1. Draw an auxiliary random variable T_n with respect to (w.r.t.) the uniform distribution on (0, p_1(x)). Call the realization t_n and define the super level set L(t_n, p_1) := {z ∈ R^d | p_1(z) > t_n}.
2. Draw X_n from the distribution µ_{t_n} on R^d that is given by

µ_{t_n}(A) := ∫_{A ∩ L(t_n, p_1)} p_0(z) dz / ∫_{L(t_n, p_1)} p_0(z) dz.

Roberts and Rosenthal (2002) offer an implementation of Algorithm 1.1 using polar coordinates and an acceptance-rejection approach w.r.t. radius and spherical element. Admittedly, already in easy examples the acceptance probability can be very small, which makes the implementation computationally demanding, especially for large d. However, in Schär et al. (2023) a Gibbsian polar slice sampling methodology has been proposed that on the one hand mimics PSS and on the other hand offers a computationally feasible scheme. In fact, our investigation is very much driven by the hope of carrying the result about the dimension-independence of PSS over to this related approach. To illustrate the empirically dimension-independent performance of PSS we present the following numerical illustration.
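To make the two steps concrete, the following minimal sketch (our own toy example, not code from the paper) implements Algorithm 1.1 for the uniform factorization p_0 ≡ 1 and the target ϱ(x) = exp(−|x|) in dimension d = 1, where the super level set is an interval that can be written down explicitly:

```python
import math
import random

def slice_sampling_step(x, rng):
    """One transition of Algorithm 1.1 with the uniform factorization
    p0 = 1, p1(x) = exp(-|x|) on R (d = 1).  The super level set
    L(t, p1) = {z : exp(-|z|) > t} is the interval (log t, -log t),
    so the X-update can be carried out exactly."""
    t = rng.uniform(0.0, math.exp(-abs(x)))   # T-update: t ~ U(0, p1(x))
    r = -math.log(t)                          # half-width of L(t, p1)
    return rng.uniform(-r, r)                 # X-update: uniform on L(t, p1)

rng = random.Random(0)
x, chain = 1.0, []
for _ in range(20000):
    x = slice_sampling_step(x, rng)
    chain.append(abs(x))
# Under pi with density exp(-|x|)/2 one has E|X| = 1.
print(sum(chain) / len(chain))
```

For general targets the X-update is the hard part; the explicit interval here is what makes this toy case trivial.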
Motivating numerical illustration. We consider the polar and uniform slice sampling Markov chains. The transition mechanism of the latter is exactly as stated in Algorithm 1.1, except that it sets the factorization of ϱ to p_0(x) := 1 and p_1(x) := ϱ(x) for any x ∈ R^d (in contrast to (2)). For both the unimodal target density ϱ(x) = exp(−∥x∥) and the volcano-shaped target density ϱ(x) = exp(−(∥x∥ − 2)²), we plot in Figure 1 proxies of the integrated autocorrelation time (IAT) of the aforementioned Markov chains, depending on the state space dimension d. Since the IAT characterizes the asymptotic mean squared error (and the asymptotic variance within CLTs) of the Markov chain Monte Carlo time average w.r.t. a summary function g : R^d → R with π(g²) < ∞, where π(g) := ∫_{R^d} g dπ and IAT_{g,P} is defined in (8) below, we can conclude that the smaller it is, the 'better' is the Markov chain. We consider g(x) = ∥x∥. In Figure 1 it is clearly visible that the IAT of PSS is constantly slightly larger than 1 regardless of the dimension. In contrast to that, the IAT of uniform slice sampling (USS) increases as the state space dimension increases, showing that the efficiency of the corresponding Markov chain degenerates with increasing dimension. That is also theoretically confirmed in Natarovskii et al. (2021). However, it is particularly surprising that PSS exhibits such remarkably 'good' constant dimension behavior. Roberts and Rosenthal (2002) explain this behavior in their Theorem 7 with Remark 8 by proving that for any rotationally invariant ϱ that is log-concave along rays emanating from the origin and any initial state x ∈ R^d

∥P^525(x, ·) − π∥_tv ≤ 0.01,    (3)

where ∥·∥_tv is the total variation distance between π and P^525(x, ·). Actually, in (Roberts and Rosenthal, 2002, Theorem 7), there is no rotational invariance assumption, but an asymmetry parameter appears, and, as long as this does not depend on the dimension, the former result holds by changing the 525 to some larger number still independent of d.
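A crude way to approximate IAT_{g,P} from a simulated chain is to truncate the empirical autocorrelation sum at the first negative lag; the following sketch is a simplification of, not identical to, the heuristic of Gelman et al. (2013) used for Figure 1, and the function name is ours:

```python
import numpy as np

def iat(samples, max_lag=200):
    """Crude estimate of the integrated autocorrelation time
    1 + 2 * sum_k corr(g(X_0), g(X_k)), truncating the sum at the
    first lag whose estimated autocorrelation drops below zero."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()
    var = x.var()
    total = 1.0
    for k in range(1, min(max_lag, len(x) - 1)):
        rho = np.dot(x[:-k], x[k:]) / ((len(x) - k) * var)
        if rho < 0.0:
            break
        total += 2.0 * rho
    return total

rng = np.random.default_rng(0)
est = iat(rng.standard_normal(50000))
print(est)  # close to 1 for an i.i.d. chain
```

Applying such an estimator to g(X_n) = ∥X_n∥ for PSS and USS chains reproduces the qualitative picture of Figure 1.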
We refine and extend the result (3) by providing, in the same setting, a lower bound of 1/2 on the spectral gap of the Markov operator of PSS. Even though we postpone the definition and discussion of the spectral gap of a Markov chain (or corresponding transition kernel) to Section 2, we want to briefly motivate here that it is a crucial object. A quantitative lower bound on the gap of a transition kernel P corresponding to a Markov chain (X_n)_{n∈N_0} with stationary distribution π implies a number of useful properties. These include geometric convergence (with explicit convergence rate) of the distribution of X_n to π as n → ∞ (see (6) or (Roberts and Rosenthal, 1997, Theorem 2.1) or (Gallegos-Herrada et al., 2022, Theorem 1)), a non-asymptotic error bound for the classical Markov chain Monte Carlo time average (Rudolf, 2012, Theorem 3.41), a central limit theorem (CLT) (Kipnis and Varadhan, 1986) and an estimate of the CLT asymptotic variance (Flegal and Jones, 2010). Moreover, it implies an explicit upper bound on the IAT (which follows for example by (7) below) and therefore explains the motivating numerical illustration straightforwardly.
Our investigation builds upon the work of Natarovskii et al. (2021). There, among other things, a duality technique that gives sufficient conditions for quantitative lower bounds on the spectral gap of USS has been developed. We extend the duality argumentation to general slice sampling (with a non-specified factorization ϱ = ϱ_0 ϱ_1) and apply the resulting theory to PSS. More precisely, in the general setting we offer a sufficient condition for the spectral gap in terms of properties of the function ℓ_{ϱ0,ϱ1} : (0, ∞) → R_+ given by

ℓ_{ϱ0,ϱ1}(t) := ∫_{L(t, ϱ_1)} ϱ_0(x) dx,

see Theorem 3.9 and Definition 3.7 below. Applying this result in the context of the PSS factorization yields the dimension-independent lower bound of 1/2 on the spectral gap, as long as ϱ is rotationally invariant, log-concave along rays emanating from the origin and sufficiently smooth, see Theorem 3.13. We now provide some guidance through the structure of the paper. In the next section we introduce our notation and define all required Markov chain related objects. Afterwards, in Section 3.1, we discuss how a number of theoretical results from Natarovskii et al. (2021) translate from USS to the general case. In Section 3.2, we apply the results from Section 3.1 to PSS, thereby proving a lower bound on its spectral gap. Concluding remarks with a discussion of our results and an outlook can be found in Section 4.

Preliminaries
We introduce our notation and state some useful facts. All appearing random variables map from a joint sufficiently rich probability space onto their respective state space. With λ we denote the Lebesgue measure on (R, B(R)), and for the surface measure on the Euclidean unit sphere S^{d-1}, equipped with its natural Borel σ-algebra B(S^{d-1}), we write σ_{d-1}. We provide details about kernels.
Let (G, G) and (H, H) be measurable spaces. A transition kernel on G × H is a mapping P : G × H → [0, 1] such that P(·, A) is a measurable function for all A ∈ H and P(x, ·) ∈ M_1(H) for all x ∈ G, where M_1(H) denotes the set of probability measures on (H, H). Let P be a transition kernel on G × G; then P acts on measurable functions g : G → R by

(Pg)(x) := ∫_G g(y) P(x, dy),  x ∈ G.    (4)

Let Q be a transition kernel on G × H and let ξ ∈ M_1(G); then Q acts on ξ as

(ξQ)(A) := ∫_G Q(x, A) ξ(dx),  A ∈ H,

and defines a probability measure, i.e., ξQ ∈ M_1(H). Moreover, the tensor product of ξ and Q is defined as the probability measure on (G × H, G ⊗ H) determined by

(ξ ⊗ Q)(A × B) := ∫_A Q(x, B) ξ(dx),  A ∈ G, B ∈ H.

Additionally, let R be a transition kernel on H × G; then the composition of Q and R is the transition kernel QR on G × G defined by

(QR)(x, A) := ∫_H R(y, A) Q(x, dy),  x ∈ G, A ∈ G.

Using this, for a transition kernel P on G × G, one recursively defines P^1 := P and P^n := P P^{n-1} for n ≥ 2. For a Markov chain (X_n)_{n∈N_0} on (G, G) with transition kernel P and initial distribution ξ ∈ M_1(G) it is well known that the probability measure ξP^n coincides with the distribution of X_n. We say that the transition kernel P (and the corresponding Markov chain) has invariant distribution π ∈ M_1(G) if πP = π. We turn to the definition of the spectral gap of a transition kernel P on G × G that is reversible w.r.t. π ∈ M_1(G) and therefore has π as invariant distribution. With L_2(π) we denote the space of measurable functions g : G → R with ∥g∥_{2,π} := (∫_G g(x)² π(dx))^{1/2} < ∞. Note that ∥·∥_{2,π} is a norm on the quotient space of L_2(π) under the equivalence relation identifying functions that coincide π-a.e. It is induced by the inner product ⟨·, ·⟩_π on L_2(π) defined by ⟨g, h⟩_π := ∫_G g(x) h(x) π(dx).
Observe that P acting on functions g : G → R via g ↦ Pg as in (4) defines a linear operator mapping from L_2(π) into L_2(π). Interpreting π as a transition kernel that is constant in its first argument, π also induces a linear operator mapping from L_2(π) into L_2(π), specifically by

(πg)(x) := ∫_G g(y) π(dy),  x ∈ G.

This allows us to define the spectral gap of P as

gap_π(P) := 1 − ∥P − π∥_{L_2(π)→L_2(π)},    (5)

where ∥·∥_{L_2(π)→L_2(π)} denotes the operator norm w.r.t. ∥·∥_{2,π}. With these formal notions at hand, we may now explicitly state some of the consequences of spectral gap estimates for π-reversible Markov chains that we already mentioned in the introduction. For example, it is well known, see e.g. (Novak and Rudolf, 2014, Lemma 2), that a positive spectral gap implies geometric convergence, i.e., for any initial distribution ξ ∈ M_1(G) with dξ/dπ ∈ L_2(π) we have

∥ξP^n − π∥_tv ≤ (1 − gap_π(P))^n ∥dξ/dπ − 1∥_{2,π},    (6)

where ∥·∥_tv again denotes the total variation distance.
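On a finite state space the definition becomes tangible: for a reversible transition matrix the spectral gap is one minus the second-largest eigenvalue modulus, and it controls the geometric decay of the total variation distance to stationarity. A small self-contained sketch (our own toy example, not from the paper):

```python
import numpy as np

# Two-state reversible chain P = [[1-a, a], [b, 1-b]]: the eigenvalues
# are 1 and 1 - a - b, hence gap = 1 - |1 - a - b|.
a, b = 0.3, 0.5
P = np.array([[1 - a, a], [b, 1 - b]])
pi = np.array([b / (a + b), a / (a + b)])        # stationary distribution
second = sorted(np.abs(np.linalg.eigvals(P)))[-2]
gap = 1.0 - second
print(gap)  # 0.8 up to rounding, since |1 - a - b| = 0.2

# Geometric convergence: the total variation distance to pi after n
# steps decays at least like (1 - gap)^n.
n = 3
row = np.linalg.matrix_power(P, n)[0]            # law of X_n given X_0 = 0
tv = 0.5 * np.abs(row - pi).sum()
print(tv <= (1.0 - gap) ** n)  # True
```

The infinite-dimensional definition via the operator norm specializes exactly to this eigenvalue picture in the finite reversible case.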
An explicit lower bound on gap_π(P) also leads to a mean squared error bound for the Markov chain Monte Carlo sample average, see (Rudolf, 2012, Theorem 3.41). Moreover, a classical result of Kipnis and Varadhan (1986) states that if the initial distribution is the invariant distribution π and g ∈ L_2(π), then the √n-scaled sample average error converges weakly to the normal distribution N(0, σ²_{g,P}) with mean zero and variance

σ²_{g,P} := ⟨(I + P)(I − P)^{-1}(g − π(g)), g − π(g)⟩_π,

where I denotes the identity map. The significant quantity σ²_{g,P} satisfies

σ²_{g,P} = ∥g − π(g)∥²_{2,π} · IAT_{g,P} ≤ (2 / gap_π(P)) ∥g − π(g)∥²_{2,π},    (7)

where

IAT_{g,P} := 1 + 2 Σ_{k=1}^∞ ρ_g(k),    (8)

with correlations ρ_g(k) := Corr(g(X_0), g(X_k)), denotes the integrated autocorrelation time.

Spectral Gap Estimate
In this section, we first introduce general slice sampling and derive a tool that can be used to establish spectral gap estimates.We then apply it to PSS.

General Slice Sampling
For the probability measure of interest π ∈ M_1(R^d) we assume to have an almost sure (w.r.t. the Lebesgue measure) factorization of the not necessarily normalized density of the form

ϱ(x) = ϱ_0(x) ϱ_1(x)

with measurable functions ϱ_i : R^d → R_+ for i = 0, 1. General slice sampling exploits this representation by (essentially) performing the two steps of Algorithm 1.1, except that ϱ_0 takes the role of p_0 and ϱ_1 the role of p_1. We refer to the 1st step as T-update and to the 2nd one as X-update. The transition kernels corresponding to the aforementioned T- and X-update of Algorithm 1.1 are given by

U_T(x, B) := λ(B ∩ (0, ϱ_1(x))) / ϱ_1(x),  x ∈ R^d, B ∈ B((0, ∞)),
U_X(t, A) := ∫_{A ∩ L(t, ϱ_1)} ϱ_0(z) dz / ∫_{L(t, ϱ_1)} ϱ_0(z) dz,  t ∈ (0, ∞), A ∈ B(R^d).

Thus, the Markov chain (X_n)_{n∈N_0} of slice sampling for π has transition kernel P_X = U_T U_X. Moreover, the sequence of auxiliary random variables (T_n)_{n∈N}, see the 1st step of Algorithm 1.1, is (also) a Markov chain on ((0, ∞), B((0, ∞))) with transition kernel P_T = U_X U_T.
We now elaborate on how the investigation of the spectral gap of USS by Natarovskii et al. (2021) translates to general slice sampling. As a first step, we provide the invariant distribution of P_T, which follows by standard arguments that are also delivered for the convenience of the reader.
Lemma 3.1. Let π ∈ M_1((0, ∞)) be determined by the probability density function

t ↦ C^{-1} ∫_{L(t, ϱ_1)} ϱ_0(x) dx,  t ∈ (0, ∞),  where C := ∫_{R^d} ϱ_0(x) ϱ_1(x) dx.

Then P_T is reversible w.r.t. π and, in particular, π is its invariant distribution.

Proof. By Fubini's theorem we have

∫_0^∞ ∫_{L(t, ϱ_1)} ϱ_0(x) dx dt = ∫_{R^d} ϱ_0(x) ∫_0^{ϱ_1(x)} dt dx = ∫_{R^d} ϱ_0(x) ϱ_1(x) dx = C,

proving that the stated density is indeed normalized. Plugging this fact into the former computation shows that, for any measurable F : R^d × (0, ∞) → R (for which one of the following integrals exists), the latter equation extends to

∫_0^∞ ∫_{L(t, ϱ_1)} F(x, t) ϱ_0(x) dx dt = ∫_{R^d} ∫_0^{ϱ_1(x)} F(x, t) dt ϱ_0(x) dx.

Therefore, we obtain for B_1, B_2 ∈ B((0, ∞)) that

∫_{B_1} P_T(t, B_2) π(dt) = C^{-1} ∫_{R^d} (λ(B_1 ∩ (0, ϱ_1(x))) λ(B_2 ∩ (0, ϱ_1(x))) / ϱ_1(x)) ϱ_0(x) dx.

The last expression is symmetric in B_1 and B_2, such that a backwards argumentation interchanging the roles of B_1 and B_2 shows that P_T is reversible w.r.t. π.
Note that by the same steps one can prove the well-known fact that P_X is reversible w.r.t. π. Having this, we are able to formulate our spectral gap duality result. The statement follows by the application of Lemmas A.1 and A.2, which can be found in the appendix.

Theorem 3.2. With π from Lemma 3.1 we have gap_π(P_X) = gap_π(P_T).
Proof. Define the linear operators W := U_T − π and W* := U_X − π. By Lemmas A.1 and A.2 (i), we know that W* is the adjoint operator of W. Furthermore, by Lemma A.2 (ii) and the fact that P_X = U_T U_X, we get P_X − π = W W*. Analogously, by Lemma A.2 (iii) and the fact that P_T = U_X U_T, we get P_T − π = W* W. Equating the respective operator norms and applying some well-known facts from functional analysis (Werner, 2011, Theorem V.5.2), we obtain

∥P_X − π∥_{L_2(π)→L_2(π)} = ∥W W*∥ = ∥W* W∥ = ∥P_T − π∥_{L_2(π)→L_2(π)}.

By the spectral gap's definition, this implies the claimed identity.
Keeping in mind that ϱ = ϱ_0 ϱ_1, we verify next that P_T (essentially) depends on the target distribution π only through a univariate function ℓ_{ϱ0,ϱ1}. Here ℓ_{ϱ0,ϱ1} can be considered an immediate extension of the level-set function from (Natarovskii et al., 2021, e.g. Lemma 2.4) to the general slice sampling setting. We start with a proper definition.

Definition 3.3. For ϱ_0, ϱ_1 as above, the generalized level-set function ℓ_{ϱ0,ϱ1} : (0, ∞) → R_+ is given by

ℓ_{ϱ0,ϱ1}(t) := ∫_{L(t, ϱ_1)} ϱ_0(x) dx.

Theorem 3.4. For any t > 0 with ℓ_{ϱ0,ϱ1}(t) > 0 and any B ∈ B((0, ∞)) we have

P_T(t, B) = ℓ_{ϱ0,ϱ1}(t)^{-1} ∫_{(t,∞)} (λ(B ∩ (0, s)) / s) d(−ℓ_{ϱ0,ϱ1})(s).
Proof. For any B ∈ B((0, ∞)), let us define the function f_B : (0, ∞) → [0, 1], f_B(s) := λ(B ∩ (0, s))/s, and observe that U_T(x, B) = f_B(ϱ_1(x)) for any x ∈ R^d. Now, with ξ(A) := ∫_A ϱ_0(x) dx for A ∈ B(R^d), by the change of variables formula (Bogachev, 2007, Theorem 3.6.1) and the fact that ξ ∘ ϱ_1^{-1} is the Lebesgue-Stieltjes measure generated by −ℓ_{ϱ0,ϱ1}, we get

P_T(t, B) = ℓ_{ϱ0,ϱ1}(t)^{-1} ∫_{L(t, ϱ_1)} f_B(ϱ_1(x)) ϱ_0(x) dx = ℓ_{ϱ0,ϱ1}(t)^{-1} ∫_{(t,∞)} f_B(s) d(−ℓ_{ϱ0,ϱ1})(s)

for any t > 0 and B ∈ B((0, ∞)), which proves the claimed result.
By combining the previous two theorems suitably, we are able to show that if two distributions have the same function ℓ_{ϱ0,ϱ1}, then the spectral gaps of slice sampling for them also coincide.

Theorem 3.5. Let π ∈ M_1(R^d) and ν ∈ M_1(R^k) be specified as in (1) by not necessarily normalized densities with factorizations ϱ = ϱ_0 ϱ_1 and η = η_0 η_1, respectively. If ℓ_{ϱ0,ϱ1} ≡ ℓ_{η0,η1}, then gap_π(P^(π)_X) = gap_ν(P^(ν)_X).
Proof. By Theorem 3.4 and the assumption ℓ_{ϱ0,ϱ1} ≡ ℓ_{η0,η1}, we immediately get P^(π)_T(t, B) = P^(ν)_T(t, B) for all t ∈ (0, ∞) and B ∈ B((0, ∞)), where P^(π)_T is the transition kernel of the auxiliary chain (T_n)_{n∈N} of the slice sampler for π and P^(ν)_T the corresponding one for ν. As the kernels of the auxiliary chains coincide, their invariant distributions, say π̄ and ν̄ (cf. Lemma 3.1), must do so as well, i.e., π̄ ≡ ν̄. Applying Theorem 3.2 twice yields

gap_π(P^(π)_X) = gap_π̄(P^(π)_T) = gap_ν̄(P^(ν)_T) = gap_ν(P^(ν)_X).

In contrast to the investigation of Natarovskii et al. (2021), the former result shows that two different slice samplers (possibly based on different kinds of factorizations, not just different target distributions) have the same spectral gap as long as their corresponding generalized level-set functions coincide. We illustrate the variability of this result in the following consideration.
Overall, this yields that the two generalized level-set functions coincide for all t ∈ (0, ∞). Hence, by Theorem 3.5, the spectral gaps that correspond to the different slice sampling schemes coincide. In particular, from (Natarovskii et al., 2021, Example 3.15) we know that gap_ν(P^(ν)_X) ≥ 1/2. Consequently, we obtain for PSS that also gap_π(P^(π)_X) ≥ 1/2. The example already indicates how Theorem 3.5 can be applied to carry the spectral gap from one slice sampling scheme over to another. Now, we identify properties of the generalized level set function that allow the formerly stated 'carrying over' in a universal fashion, cf. (Natarovskii et al., 2021, Definition 3.9).

Definition 3.7. For any k ∈ N, we define Λ_k as the class of continuous functions ℓ : (0, ∞) → R_+ that satisfy
(i) lim_{t→∞} ℓ(t) = 0 and L := lim_{t↘0} ℓ(t) ∈ (0, ∞],
(ii) ℓ restricted to its open support is strictly decreasing, and
(iii) the function g : (0, L^{1/k}) → R, r ↦ −log ℓ^{-1}(r^k), is convex.

Remark 3.8. Conditions (i) and (ii) together with the assumed continuity of ℓ guarantee that ℓ restricted to its open support supp(ℓ) maps surjectively onto I_ℓ := (0, L). As condition (ii) also guarantees injectivity of this restricted function, it must actually be bijective, which gives the existence of the inverse function ℓ^{-1} : I_ℓ → supp(ℓ) used in condition (iii). Observe that, as the inverse of a strictly decreasing function, ℓ^{-1} must again be strictly decreasing.

The properties of the formerly defined classes of functions allow us to construct for ℓ ∈ Λ_k a not necessarily normalized density function η : R^k → R_+ for which USS, targeting ν ∈ M_1(R^k) given by

ν(A) := ∫_A η(x) dx / ∫_{R^k} η(x) dx,  A ∈ B(R^k),    (9)

has a spectral gap of at least 1/(k + 1) and satisfies ℓ_{1,η} ≡ ℓ. With that and Theorem 3.5 we can draw conclusions about the spectral gap of generalized slice sampling. The correspondingly formulated statement reads as follows.
Theorem 3.9. Given ϱ_0 : R^d → R_+ and a not necessarily normalized density ϱ : R^d → R_+, choose ϱ_1 : R^d → R_+ so that ϱ = ϱ_0 ϱ_1. Let π ∈ M_1(R^d) be specified by ϱ as in (1). Let P^(π)_X be the transition kernel that corresponds to slice sampling for π based on ϱ_0 and ϱ_1. Then, for k ∈ N with ℓ_{ϱ0,ϱ1} ∈ Λ_k, we have

gap_π(P^(π)_X) ≥ 1/(k + 1).

Proof. To shorten the notation we set ℓ := ℓ_{ϱ0,ϱ1}, κ := (k L / σ_{k-1}(S^{k-1}))^{1/k} and

ϕ : (0, κ) → R,  ϕ(r) := −log ℓ^{-1}((σ_{k-1}(S^{k-1})/k) r^k).    (10)

Then, one readily observes that
• ϕ is strictly increasing as composition of the strictly increasing function r ↦ (σ_{k-1}(S^{k-1})/k) r^k and the strictly decreasing functions ℓ^{-1} and r ↦ −log r,
• ϕ is convex as composition of the linear function r ↦ (σ_{k-1}(S^{k-1})/k)^{1/k} r and the (by Definition 3.7 (iii)) convex function r ↦ −log ℓ^{-1}(r^k); and
• the inverse of ϕ is given by

ϕ^{-1}(s) = (k ℓ(exp(−s)) / σ_{k-1}(S^{k-1}))^{1/k}.    (11)

Consider ν ∈ M_1(R^k) as in (9), determined by the not necessarily normalized Lebesgue density η : R^k → R_+ given by

η(x) := exp(−ϕ(∥x∥)) 1_{(0,κ)}(∥x∥)

with ϕ from (10). By the fact that ϕ is strictly increasing and convex, (Natarovskii et al., 2021, Corollary 3.1) yields that the spectral gap of the transition kernel P^(ν)_X of USS for ν satisfies

gap_ν(P^(ν)_X) ≥ 1/(k + 1).    (12)

Now the goal is to show that the level-set function ℓ_{1,η} is identical to ℓ = ℓ_{ϱ0,ϱ1}. We obtain for 0 ≠ x ∈ R^k and t ∈ (0, sup_{y∈R^k} η(y)) that

η(x) > t ⇔ ϕ(∥x∥) < −log t and ∥x∥ < κ ⇔ ∥x∥ < ϕ^{-1}(−log t) and ∥x∥ < κ ⇔ ∥x∥ < ϕ^{-1}(−log t),

where the second equivalence relies on ϕ^{-1} being strictly increasing, and the third equivalence on ϕ^{-1} mapping to the domain (0, κ) of ϕ, so that in particular ϕ^{-1}(−log t) < κ. Hence, the super level set of η is the open ball of radius ϕ^{-1}(−log t) centered at the origin. Consequently, by the polar coordinates formula, see Proposition A.3, we get

ℓ_{1,η}(t) = σ_{k-1}(S^{k-1}) ∫_0^{ϕ^{-1}(−log t)} r^{k-1} dr = (σ_{k-1}(S^{k-1})/k) (ϕ^{-1}(−log t))^k = ℓ(t),

where the last equality follows by plugging in (11). Finally, by Theorem 3.5 and (12), we obtain

gap_π(P^(π)_X) = gap_ν(P^(ν)_X) ≥ 1/(k + 1),

which concludes the proof.
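As a sanity check of Definition 3.7, condition (iii) can be verified by hand in a toy case: for USS (ϱ_0 ≡ 1) applied to ϱ(x) = exp(−|x|) on R one has ℓ(t) = −2 log t on (0, 1), so ℓ^{-1}(r) = exp(−r/2) and g(r) = −log ℓ^{-1}(r) = r/2 is linear, hence convex. A numerical midpoint test (our own sketch) confirms this:

```python
import math

# Check Definition 3.7 (iii) for k = 1 and the level-set function
# l(t) = -2 log t of USS applied to rho(x) = exp(-|x|) on R:
# l^{-1}(r) = exp(-r/2), so g(r) = -log l^{-1}(r) = r/2 is linear,
# and a midpoint convexity test on a grid must pass.
l_inv = lambda r: math.exp(-r / 2.0)
g = lambda r: -math.log(l_inv(r))
grid = [i / 10.0 for i in range(1, 60)]
midpoint_convex = all(
    g(0.5 * (a + b)) <= 0.5 * (g(a) + g(b)) + 1e-12
    for a in grid for b in grid
)
print(midpoint_convex)  # True
```

Hence ℓ ∈ Λ_1 in this toy case, and Theorem 3.9 gives the known lower bound 1/2 for USS on this target.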
We add an open question about the result. For this, the following class of probability measures, which are 'good' in the sense of the previous theorem, is required.

Definition 3.10. For ϱ_0 : R^d → R_+ and k ∈ N, let Π_{ϱ0,k} be the class of all π ∈ M_1(R^d) that are specified as in (1) by some not necessarily normalized density ϱ admitting a factorization ϱ = ϱ_0 ϱ_1 with ℓ_{ϱ0,ϱ1} ∈ Λ_k.
Then, Theorem 3.9 yields

inf_{π ∈ Π_{ϱ0,k}} gap_π(P^(π)_X) ≥ 1/(k + 1),

i.e., the worst case behavior of the spectral gap on the input class Π_{ϱ0,k} is at least 1/(k + 1). The question we pose is how good this lower bound actually is. In other words, is there a matching upper bound on the worst case spectral gap that also converges to zero as k → ∞, and if so, does it imply that the lower bound cannot be improved? Any insight in that direction may lead to a characterization of the spectral gap of generalized slice sampling and therefore indicate its limitations. We finish this section with an immediate consequence of Theorem 3.9 w.r.t. PSS.

Corollary 3.11. Let π ∈ M_1(R^d) be specified by ϱ as in (1) and factorize ϱ = p_0 p_1 according to (2). If ℓ_{p0,p1} ∈ Λ_k for some k ∈ N, then gap_π(P^(π)_X) ≥ 1/(k + 1) for the transition kernel P^(π)_X of PSS.
Remark 3.12. In the setting of the previous corollary assume that k does not depend on d. In that case we already have a dimension-independent lower bound on the spectral gap of PSS. We want to emphasize that even though the spectral gap is independent of d, implementing the 2nd step of Algorithm 1.1 may lead to an acceptance probability that decreases w.r.t. d. The already mentioned Gibbsian polar slice sampler of Schär et al. (2023) addresses this issue.
We now move on to our next main result, where we apply Theorem 3.9 (or Corollary 3.11) and provide concrete properties of ϱ that lead to a spectral gap of at least 1/2 of PSS.

Polar Slice Sampling
We assume here d ≥ 2 and note that PSS coincides with USS for d = 1. Consequently, in that case, spectral gap estimates for the latter carry over to the former.
The strategy in this section is to consider a specific class of not necessarily normalized density functions ϱ for which we verify that the corresponding PSS generalized level set function ℓ_{p0,p1} satisfies ℓ_{p0,p1} ∈ Λ_1. Then, by applying Theorem 3.9, we readily obtain a dimension-independent lower bound on the spectral gap of PSS. We formulate the main statement and discuss the required assumptions.
Theorem 3.13. Let π ∈ M_1(R^d) be a distribution with not necessarily normalized density ϱ given by

ϱ(x) = exp(−ϕ(∥x∥)) 1_{(0,κ)}(∥x∥),

where κ ∈ (0, ∞] and ϕ : (0, κ) → R is a convex and twice differentiable function that satisfies lim_{r↗κ} ϕ(r) = ∞. Then we have gap_π(P_X) ≥ 1/2, where P_X is the transition kernel of PSS for π.

Remark 3.14. We discuss the conditions and appearing objects of the theorem:
• The parameter κ controls the support of ϱ: If it is finite, ϱ is only supported on the zero-centered Euclidean ball of radius κ, and if it is infinite, ϱ is supported on all of R^d.
• The densities ϱ to which the theorem applies are rotationally invariant, i.e., they may only depend on the function's argument x through ∥x∥.
• The convexity constraint on ϕ gives that ϱ is log-concave along rays emanating from the origin, which already emerged as a useful property for proving theoretical results regarding PSS in Roberts and Rosenthal (2002). In particular, it guarantees that the later appearing function h_1 : (0, κ) → R_+, r ↦ r^{d-1} exp(−ϕ(r)), has interval-like super level sets.
• That the function ϕ is required to be twice differentiable eases our proof, but we believe that the theorem's claim remains true without this assumption.
• The condition lim_{r↗κ} ϕ(r) = ∞ means that ϱ must tend to zero whenever its argument approaches the boundary of the support. The requirement is always satisfied when κ = ∞, since ϱ is assumed to be Lebesgue-integrable.
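For the concrete instance ϕ(r) = r, i.e. ϱ(x) = exp(−∥x∥) with κ = ∞, the PSS transition can even be simulated exactly: by rotational invariance and the choice p_0(x) = ∥x∥^{-(d-1)}, the measure µ_t has constant radial density on the super level set of h_1(r) = r^{d-1} exp(−r), so the radius chain alone determines g(x) = ∥x∥. The following sketch (our own implementation, with bisection for the level-set endpoints; all helper names are ours) illustrates the dimension-independent behavior:

```python
import math
import random

def log_h1(r, d):
    # log of h1(r) = r^(d-1) * exp(-r), i.e. phi(r) = r, kappa = infinity
    return (d - 1) * math.log(r) - r

def pss_radius_step(r, d, rng):
    """One PSS transition for rho(x) = exp(-||x||), reduced to the radius:
    mu_t has constant radial density on {r : h1(r) > t}, so the X-update
    draws the new radius uniformly from that interval (found by bisection
    around the mode r_mode = d - 1); the uniform direction on the sphere
    need not be tracked for g(x) = ||x||."""
    log_t = log_h1(r, d) + math.log(rng.random())  # T-update, in logs
    r_mode = d - 1.0
    # left endpoint: h1 crosses the level exactly once in (0, r_mode)
    lo, hi = 1e-12, r_mode
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if log_h1(mid, d) > log_t:
            hi = mid
        else:
            lo = mid
    left = hi
    # right endpoint: h1 crosses the level exactly once in (r_mode, inf)
    lo, hi = r_mode, 2.0 * r_mode + 1.0
    while log_h1(hi, d) > log_t:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if log_h1(mid, d) > log_t:
            lo = mid
        else:
            hi = mid
    right = lo
    return rng.uniform(left, right)               # X-update

rng = random.Random(1)
d, r, rs = 30, 29.0, []
for _ in range(20000):
    r = pss_radius_step(r, d, rng)
    rs.append(r)
# Under pi the radial density is proportional to r^(d-1) exp(-r), a
# Gamma(d, 1) law, so the empirical mean of ||X_n|| should be near d.
print(sum(rs) / len(rs))
```

Repeating this for several d shows the empirical autocorrelation of ∥X_n∥ staying essentially flat in the dimension, matching the motivating illustration.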
Note that the rotational invariance, convexity and the constraint lim_{r↗κ} ϕ(r) = ∞, without any further monotonicity requirements on ϕ, lead to two types of admissible target densities: unimodal densities, which result from non-decreasing ϕ, and "volcano"-shaped densities, which result from ϕ that are initially strictly decreasing and then at some point become strictly increasing. Particularly notable is that the lower bound on the spectral gap on the class of ϱ specified in the theorem is constant, i.e., does not depend on any continuity or concentration parameter, not even on the state space dimension d. In the course of proving Theorem 3.13 we start with a characterization of the corresponding level set functions.
Lemma 3.15. Assume that the requirements of Theorem 3.13 are satisfied. Factorize ϱ in accordance with PSS, i.e., ϱ = p_0 p_1 (cf. (2)). Then, there exists a value r_mode ∈ (0, κ) such that

ℓ_{p0,p1}(t) = σ_{d-1}(S^{d-1}) (r_max(t) − r_min(t)),  t ∈ (0, h_1(r_mode)),

with functions r_max : (0, h_1(r_mode)) → (r_mode, κ) and r_min : (0, h_1(r_mode)) → (0, r_mode) that are strictly decreasing and strictly increasing, respectively.
Proof. The generalized level set function ℓ_{p0,p1} of ϱ = p_0 p_1, given as in (2), satisfies, by virtue of Proposition A.3, for all t > 0 that

ℓ_{p0,p1}(t) = σ_{d-1}(S^{d-1}) λ({r ∈ (0, κ) : h_1(r) > t}),  where h_1(r) := r^{d-1} exp(−ϕ(r)).

We analyze the function h_1 to deduce the claimed representation of ℓ_{p0,p1} from the former expression.
Combining these observations, we see that h_2(r) := r ϕ′(r) is upper-bounded by zero on (0, r_ϕ) and strictly increasing towards +∞ on (r_ϕ, κ). Therefore, there exists an r_mode ∈ (0, κ) with h_2(r_mode) = d − 1 such that the factor r ↦ d − 1 − r ϕ′(r) appearing within h′_1 is positive on (0, r_mode) and negative on (r_mode, κ). Consequently, for r ∈ (0, r_mode) we get h′_1(r) > 0, whereas for r ∈ (r_mode, κ) we have h′_1(r) < 0. In other words, h_1 is unimodal with mode located at r_mode. Moreover, the above shows that h_1|_{(0, r_mode)}, the inverse of r_min, is a strictly increasing function, which implies that r_min is also strictly increasing.
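The characterization of r_mode in the proof, d − 1 − r ϕ′(r) = 0, can be checked numerically in the simplest case ϕ(r) = r, where it gives r_mode = d − 1 (our own quick grid check, not from the paper):

```python
import math

# For phi(r) = r the mode of h1(r) = r^(d-1) * exp(-r) solves
# r * phi'(r) = d - 1, i.e. r_mode = d - 1.  Maximize log h1 on a grid.
d = 7
rs = [i / 1000.0 for i in range(1, 20000)]
r_star = max(rs, key=lambda r: (d - 1) * math.log(r) - r)
print(r_star)  # close to d - 1 = 6
```

The same grid search applied to a volcano-shaped ϕ locates the mode of h_1 strictly away from the origin, in line with the lemma.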
Using the formerly developed tool, we are able to deliver the proof of the theorem.
Proof of Theorem 3.13. To verify the statement of Theorem 3.13 we show that for ϱ, satisfying the assumptions formulated there, the corresponding level set function ℓ_{p0,p1} satisfies ℓ_{p0,p1} ∈ Λ_1. By Lemma 3.15 it is easily seen that ℓ_{p0,p1} is a continuous function, such that it is sufficient to check (i), (ii) and (iii) of Definition 3.7 for k = 1.
To (i): Just by being a generalized level set function, ℓ_{p0,p1} satisfies the limit properties.
To (ii): The monotonicity properties of r_max and r_min provided by Lemma 3.15 yield that ℓ_{p0,p1} is strictly decreasing on supp(ℓ_{p0,p1}).
To (iii): By Proposition A.5 it is sufficient to show that h_3(s) := ℓ_{p0,p1}(exp(−s)) is concave on a set D, which, by Lemma 3.15, is here given by D = (−log h_1(r_mode), ∞). Using the lemma's representation of ℓ_{p0,p1}, we can rewrite h_3 as

h_3(s) = σ_{d-1}(S^{d-1}) (h_4(s) − h_5(s)),  s ∈ D.

Consequently, the concavity of h_3 follows by concavity of

h_4 : (−log h_1(r_mode), ∞) → (r_mode, κ),  s ↦ r_max(exp(−s)),

as well as convexity of

h_5 : (−log h_1(r_mode), ∞) → (0, r_mode),  s ↦ r_min(exp(−s)).
Concavity of h_4 means convexity of −h_4. Clearly, −h_4 is continuous, and as h_4 is the composition of two strictly decreasing functions, it is strictly increasing, so −h_4 is strictly decreasing. By Lemma A.4, convexity of −h_4 is equivalent to convexity of its inverse r ↦ h_4^{-1}(−r), which in turn is equivalent to convexity of h_4^{-1} itself (as the graph of one of these functions is just a reflection of that of the other on the axis r = 0), which is given by

h_4^{-1}(r) = −log h_1(r) = ϕ(r) − (d − 1) log r,  r ∈ (r_mode, κ).

However, since ϕ is convex by assumption and −log is known to be convex, the convexity of h_4^{-1} is obvious. As the composition of a strictly increasing and a strictly decreasing function, h_5 is strictly decreasing. Because h_5 is clearly also continuous, applying Lemma A.4 again yields that the convexity of h_5 is equivalent to that of its inverse h_5^{-1}, which is given by

h_5^{-1}(r) = −log h_1(r) = ϕ(r) − (d − 1) log r,  r ∈ (0, r_mode).

Thus, the convexity of h_5^{-1} follows by the same argument as that of h_4^{-1}. Therefore (iii) for k = 1 is proven and ℓ_{p0,p1} ∈ Λ_1. By Theorem 3.9 this implies the claimed spectral gap estimate.

Concluding Remarks
Driven by empirically observed dimension-independent IAT behavior, as documented in the motivating illustration in Section 1, and the recent algorithmic contribution about Gibbsian polar slice sampling, we investigated the spectral gap of PSS. For arbitrary dimension, if ϱ, the possibly not normalized density function of the distribution of interest, is rotationally invariant, log-concave along rays emanating from the origin and sufficiently smooth, we proved a lower bound of 1/2 on the spectral gap. Along the way we significantly extended the theory of Natarovskii et al. (2021) into the setting of general slice sampling that is based on a factorization ϱ = ϱ_0 ϱ_1. In Definition 3.7 we presented a class of functions Λ_k, already introduced in (Natarovskii et al., 2021, Definition 3.9), which provides the required conditions on the level set function ℓ_{ϱ0,ϱ1} for verifying the lower bound 1/(k+1) on the spectral gap for generalized slice sampling. As an immediate consequence this lower bound can be applied in the PSS setting. Moreover, it served as the main tool for proving the aforementioned dimension-independent spectral gap estimate for PSS.
We point to open questions, limitations and some directions of future work. Let us start with the question that has already been formulated after Definition 3.10 on how 'good' the lower bound on the spectral gap of generalized slice sampling of Theorem 3.9 actually is. We conjecture that at least for some ϱ_0 on the class Π_{ϱ0,k} the result cannot be qualitatively improved. We surmise that there is an upper bound 'function' u : N → R_+ with lim_{k→∞} u(k) = 0 such that the worst case spectral gap satisfies

inf_{π ∈ Π_{ϱ0,k}} gap_π(P^(π)_X) ≤ u(k).

By proving this conjecture, one would show that the parameter k is the right quantity for characterizing the spectral gap of general slice sampling, which points to the limitation that for large k the 'efficiency' of slice sampling indeed deteriorates. Related to the understanding of the limitations of Theorem 3.9, one may ask whether an extension into a manifold setting is possible. Recently there have been investigations of slice sampling approaches on the sphere, see e.g. Habeck et al. (2023); Lie et al. (2021), which may serve as a starting point in that direction.
Regarding our explicit dimension-independent spectral gap estimate for PSS, it is reasonable to ask how the proven estimate generalizes to broader classes of target densities, for example rotationally asymmetric ones or those not centered at the origin. Neither rotational invariance nor centering at the origin is exploited in the generic algorithmic description of PSS. Therefore, these properties seem to be merely exploited within our analysis technique rather than necessary. It would be interesting to find other, more commonly used properties, e.g., strong concavity of smooth log-densities, that yield dimension-independent convergence results. Unfortunately, the proof of our result cannot readily be adapted to such cases and it is unknown how the spectral gap behaves.
As explained before, a crucial motivation for studying PSS is Gibbsian polar slice sampling, introduced in Schär et al. (2023). It can be considered a hybrid slice sampler, cf. Latuszyński and Rudolf (2014), that mimics PSS. Under suitable assumptions, it has been shown in Latuszyński and Rudolf (2014) that hybrid uniform slice sampling has a positive spectral gap whenever USS has one. Explicit lower bounds on the gap are to some extent inherited. It is of course very natural to ask for an extension of this result regarding PSS and the Gibbsian approach.

Proposition A.3 (Polar coordinates formula). For every measurable function f : R^d → R_+ it holds that

∫_{R^d} f(x) dx = ∫_{S^{d-1}} ∫_0^∞ f(rθ) r^{d-1} dr σ_{d-1}(dθ).

Proof. See (Schilling, 2005, Theorem 15.13).
Lemma A.4. Assume h : I_1 → I_2 is a strictly monotone, continuous function mapping between open intervals I_1, I_2 ⊆ R. Then h maps bijectively onto its image, w.l.o.g. I_2, with the inverse h^{-1} : I_2 → I_1 having the same monotonicity property. Moreover,
• if h is increasing, then it is convex if and only if h^{-1} is concave;
• if h is decreasing, then it is convex if and only if h^{-1} is convex.
Proof. Denote by α ∈ [0, 1] and r_1, r_2 ∈ I_1 arbitrary elements of the respective sets; then, due to the bijectivity of h onto I_2, the values s_i := h(r_i), i = 1, 2, are arbitrary elements of I_2. Now the claimed equivalences follow by applying h^{-1} to the defining convexity inequality for h, where in the last step we use the fact that h and h^{-1} have the same monotonicity property.
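Both cases of Lemma A.4 are easy to probe numerically with midpoint tests (our own sketch, not from the paper): h(x) = exp(−x) is decreasing and convex with convex inverse −log, while h(x) = x² is increasing and convex on (0, ∞) with concave inverse sqrt:

```python
import math

# Midpoint tests on a grid for the two cases of Lemma A.4.
# Decreasing case: inverse of exp(-x) is -log(y), expected convex.
# Increasing case: inverse of x**2 on (0, oo) is sqrt(y), expected concave.
ys = [i / 100.0 for i in range(5, 100)]
dec_inv = lambda y: -math.log(y)
inc_inv = math.sqrt
convex = all(dec_inv(0.5 * (a + b)) <= 0.5 * (dec_inv(a) + dec_inv(b)) + 1e-12
             for a in ys for b in ys)
concave = all(inc_inv(0.5 * (a + b)) >= 0.5 * (inc_inv(a) + inc_inv(b)) - 1e-12
              for a in ys for b in ys)
print(convex and concave)  # True
```

This matches the reflection argument in the proof: flipping the graph of a monotone function across the diagonal preserves convexity exactly in the pattern the lemma states.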
Proposition A.5. Let k ∈ N and let ℓ : (0, ∞) → R_+ be continuous, satisfying conditions (i) and (ii) of Definition 3.7, and set h : (0, L^{1/k}) → D, h(r) := −log ℓ^{-1}(r^k), with D denoting the image of h. Then ℓ satisfies condition (iii) of Definition 3.7 if and only if s ↦ (ℓ(exp(−s)))^{1/k} is concave on D.

Proof. It is easy to verify that the inverse h^{-1} : D → (0, L^{1/k}) of h is given by

h^{-1}(s) = (ℓ(exp(−s)))^{1/k}.

Because h is composed of continuous functions, h is also continuous. Moreover, since h is a composition of the strictly decreasing functions −log and ℓ^{-1} and the strictly increasing function r ↦ r^k, it is strictly increasing. Hence, by Lemma A.4, convexity of h is equivalent to concavity of h^{-1}, which is precisely the condition stated in the proposition.
Fig. 1 Sample space dimension d versus approximations of the integrated autocorrelation time IAT_{g,P}, as defined in (8), computed using the heuristic described in (Gelman et al., 2013, Chapter 11.5). The figure on top depicts IATs for the target density ϱ(x) = exp(−∥x∥) and the one below IATs for the target density ϱ(x) = exp(−(∥x∥ − 2)²). Both figures use the summary function g(x) = ∥x∥. Each plotted point represents an average over n_rep = 10 separate runs of the samplers, using n_it = 10^5 iterations for each sampler and repetition.