Discrete mixture representations of spherical distributions

We obtain discrete mixture representations for parametric families of probability distributions on Euclidean spheres, such as the von Mises--Fisher, the Watson and the angular Gaussian families. In addition to several special results we present a general approach to isotropic distribution families that is based on density expansions in terms of special surface harmonics. We discuss the connections to stochastic processes on spheres, in particular random walks, discrete mixture representations derived from spherical diffusions, and the use of Markov representations for the mixing base to obtain representations for families of spherical distributions.


Introduction
A discrete mixture representation for a parametric family {P θ : θ ∈ Θ} of probability measures in terms of another family {Q n : n ∈ N 0 } of probability measures, the mixing base, all defined on the same measurable space, is of the form (1) P θ = Σ n∈N0 w θ (n) Q n , θ ∈ Θ. Here, for each θ ∈ Θ, the mixing coefficients (w θ (n)) n∈N0 are the individual probabilities of a distribution W θ , the mixing distribution, on (the set of subsets of) N 0 . A classical case is the representation of non-central chisquared distributions with one degree of freedom, P θ = χ 2 1 (θ 2 ) with non-centrality parameter θ 2 > 0, as Poisson mixtures of central chisquared distributions, where Q n = χ 2 2n+1 := χ 2 2n+1 (0) and where W θ is the Poisson distribution with mean λ = θ 2 /2; see e.g. the books of Liese and Miescke (2008) and Mörters and Peres (2010), where the representation appears in statistics in connection with the power of statistical tests and in probability theory in connection with the local times of Markov processes, respectively. Such mixture representations can be related to two-stage experiments: In order to obtain a value x with distribution P θ we first choose n according to W θ and then choose x according to Q n . This leads to an immediate application of discrete mixture representations in the context of simulation methodology.
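The two-stage experiment translates directly into a sampler; the following is a minimal simulation sketch (all function names are ours) for the one-dimensional case: draw N from the Poisson mixing distribution with mean θ²/2, then draw from the central chisquared distribution with 2N + 1 degrees of freedom, which reproduces χ²₁(θ²).

```python
import math
import random


def rpoisson(lam, rng):
    # Knuth's method; adequate for the small means used here
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1


def rnoncentral_chisq1(theta, rng):
    """Two-stage sampling: N ~ Poisson(theta^2/2), then chi^2 with 2N+1 df."""
    n = rpoisson(theta * theta / 2.0, rng)
    # chi^2 with m degrees of freedom is Gamma(m/2, scale 2)
    return rng.gammavariate((2 * n + 1) / 2.0, 2.0)


rng = random.Random(1)
theta = 2.0
xs = [rnoncentral_chisq1(theta, rng) for _ in range(200_000)]
mean = sum(xs) / len(xs)
```

The sample mean should be close to E χ²₁(θ²) = 1 + θ², here 5.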
In the present paper we continue our previous investigations (see Baringhaus and Grübel (2021a,b)), and now specifically consider distributions on the Euclidean sphere S d := {x ∈ R d+1 : ‖x‖ = 1} of (d + 1)-dimensional real vectors of unit length. This case seems to us to deserve some interest, in particular if specific properties of spheres are taken into account: The group O(d + 1) of orthogonal transformations of the ambient space R d+1 acts transitively on S d , and there is a 'polar decomposition' (or 'tangent-normal decomposition', see Section 4.3) that relates S d to the product [−1, 1] × S d−1 . We generally assume that the distribution parameters in the above general setup are of the form θ = (η, ρ), where η ∈ S d may be seen as a location parameter; instead of P θ we also write P η,ρ . We obtain mixture representations that split the dependence on the two parts of the parameter in the sense that (2) P η,ρ = Σ n∈N0 w ρ (n) Q n,η . In particular, the mixing distributions depend on ρ only. For fixed ρ on the left, or fixed n ∈ N 0 on the right hand side of (2), the families {P η,ρ : η ∈ S d } respectively {Q n,η : η ∈ S d } are parametrized by the sphere and are defined on its Borel subsets B(S d ). We assume that these families interact with the group action mentioned above in the sense that they are isotropic; see (8) below. In particular, their elements are then rotationally symmetric about the axis specified by η. As a simple application of the representation (2) we mention that with the finite sums R η,ρ (A) := Σ k=0,...,n w ρ (k) Q k,η (A) we have monotonically increasing approximations of the probabilities P η,ρ (A), A ∈ B(S d ), with uniform error bounds in the sense that 0 ≤ P η,ρ (A) − R η,ρ (A) ≤ 1 − Σ k=0,...,n w ρ (k). The literature contains several other applications; see for example the relation to nonparametric Bayesian inference in Baringhaus and Grübel (2021b, Section 5.1).
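The uniform error bound for the truncated sums only involves the tail mass of the mixing distribution; a small sketch (Poisson mixing weights chosen purely for illustration) computes the bound for increasing truncation levels:

```python
import math


def poisson_pmf(lam, n):
    return math.exp(-lam) * lam**n / math.factorial(n)


def truncation_bound(weights_upto_n):
    # uniform error bound: 0 <= P(A) - R(A) <= 1 - sum of the retained weights
    return 1.0 - sum(weights_upto_n)


lam = 2.0
bounds = [truncation_bound([poisson_pmf(lam, k) for k in range(n + 1)])
          for n in range(15)]
# the bounds decrease monotonically to 0 as the truncation level grows
```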
In Section 2 we collect some basic notation and obtain mixture representations for the von Mises-Fisher family and two spherical Cauchy families in Theorem 2, the Watson family in Theorem 3, and an angular Gaussian family in Theorem 5. The mixing bases are chosen specifically for the respective family, with a view towards reflecting its properties. A different base will generally lead to a different representation, as demonstrated by Baringhaus and Grübel (2021a) in the context of non-central chisquared distributions. In Section 3 we present a general approach that uses expansions of densities in terms of special surface harmonics. The resulting mixing base has a structural property that we call self-mixing stability. This property makes it comparably easy to relate different expansions to each other. We obtain representations with this mixing base for the wrapped Cauchy and the wrapped normal families in Theorem 8, and for the von Mises-Fisher families in Theorem 9. These results only hold under conditions on the parameter ρ that ensure that the respective distribution is not too far away from the uniform distribution on the sphere; in Example 11 we work out a possibility for extending this range.
For fixed ρ or n we may regard P η,ρ and Q n,η as probability kernels via (η, A) → P η,ρ (A), (η, A) → Q n,η (A). This provides a general connection with Markov processes. We briefly return to the classical mixture representation of non-central one-dimensional chisquared distributions, which may be written as (3) (X + θ) 2 = d X 2 + 2 Σ i=1,...,N(θ) E i , θ ≥ 0, with = d denoting equality in distribution. Here the random variables N (θ), X, E 1 , E 2 , . . . are independent, X has the standard normal distribution, E 1 , E 2 , . . . are exponentially distributed with mean 1, and N (θ) has the Poisson distribution with parameter θ 2 /2. The path-wise point of view displays the distributions χ 2 1 (θ 2 ), θ ≥ 0, as the distributions of randomly stopped partial sums of independent random variables, and (3) may be used to read off stochastic monotonicity and infinite divisibility of non-central chisquared distributions. Note that the representation only covers the one-dimensional marginal distributions of the process ((X + θ) 2 ) θ≥0 , as the left hand side of (3) is obviously not pathwise monotone in the 'time parameter' θ. Quite generally, (1) can be related to randomly stopped stochastic processes: If X = (X n ) n∈N0 is such that X n has distribution Q n for all n ∈ N 0 then X τ has distribution P θ if τ is independent of X and has distribution W θ .
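The randomly stopped partial-sum reading can be checked by simulation; a sketch (variable names ours) compares (X + θ)² with X² plus a Poisson number of mean-2 exponential summands, both of which follow χ²₁(θ²):

```python
import math
import random

rng = random.Random(7)
theta = 1.5


def rpois(lam):
    # Knuth's Poisson sampler, sufficient for small means
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1


n_samp = 200_000
lhs = [(rng.gauss(0.0, 1.0) + theta) ** 2 for _ in range(n_samp)]
rhs = []
for _ in range(n_samp):
    n = rpois(theta * theta / 2.0)  # the random stopping index
    rhs.append(rng.gauss(0.0, 1.0) ** 2
               + 2.0 * sum(rng.expovariate(1.0) for _ in range(n)))
m_lhs = sum(lhs) / n_samp
m_rhs = sum(rhs) / n_samp
# both means should be close to 1 + theta^2 = 3.25
```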
In Section 4 we discuss several connections between families of spherical distributions and stochastic processes on spheres. We consider random walks on spheres in Section 4.1, distribution families that arise in connection with diffusion processes in Section 4.2, and the use of Markov representations of the mixing base in connection with almost sure representations for distribution families in Section 4.3. The ultraspherical mixing base from Section 3 will be useful at various stages.
For a single transition kernel we obtain a family (X η n ) n∈N0 of Markov chains indexed by their initial state η, meaning that X η 0 = η with probability 1. Isotropy of the kernel then extends to isotropy of the corresponding distributions on the path space. For the elements of the mixing base in Section 3 the marginal distributions of these chains have a particularly simple description. Further, isotropy relates a family {P η : η ∈ S d } to a single distribution on [−1, 1] via the latitude projection x → η t x, with η as 'north pole'. For the chain X η = (X η n ) n∈N0 we obtain an associated latitude process Y = (Y n ) n∈N0 via Y n := η t X η n for all n ∈ N 0 . For isotropic kernels this is again a Markov chain, now on [−1, 1] and with start at 1. Finally, for the von Mises-Fisher distributions we show that a homogeneous Markov process on the sphere with these as marginal distributions does not exist, see Theorem 16, and we obtain a result similar to (3), see Example 18.
Proofs are collected in Section 5. Mixing of distributions is a standard topic in probability theory and statistics, see e.g. Lindsay (1995). Spherical data and families of spherical distributions have similarly been investigated for a long time and by many researchers; standard references are the classic monograph of Watson (1983) and, more recently, the book of Mardia and Jupp (2000). For a review of distributions on spheres we refer to Pewsey and García-Portugués (2021), see also Watson (1982). Of particular interest for the topics treated here is the very recent paper of Mijatović et al. (2020), where a discrete mixture representation for the marginal distributions of spherical Brownian motion is developed. More specific references will be given at the appropriate places below.

Generalities and some special results
We need some basic notions and definitions. We write X ∼ µ if X is a random variable on some background probability space (Ω, F , P) with distribution µ. Formally, let (Ω, A) and (Ω ′ , A ′ ) be measurable spaces and suppose that T : Ω → Ω ′ is (A, A ′ )-measurable. Then the push-forward P T of a probability measure P on (Ω, A) under T is the probability measure on (Ω ′ , A ′ ) given by P T (A) = P (T −1 (A)), A ∈ A ′ , and X ∼ µ is the same as P X = µ. For many of the measurable spaces considered below there is a canonical uniform distribution, often defined by invariance under a group operation. To avoid tiresome repetitions we agree that densities refer to the respective uniform distribution if not specified otherwise.
We fix a dimension d ≥ 1, but instead of d we often use (4) λ := (d − 1)/2, as this is common in connection with families of special functions. In particular, whenever d and λ appear together, they are related by (4). The group O(d + 1) of orthogonal (d + 1) × (d + 1)-matrices U acts on S d via x → U x, and the uniform distribution unif(S d ) on the sphere is the unique probability measure on the Borel subsets of S d that is invariant under all such transformations. For a fixed η ∈ S d the push-forward ν d of unif(S d ) under the mapping x → η t x has density h d with respect to the uniform distribution unif(−1, 1) on the interval [−1, 1], where h d (y) = (2Γ(λ + 1)/(Γ(1/2)Γ(λ + 1/2))) (1 − y 2 ) λ−1/2 , −1 < y < 1. Note that ν d does not depend on η. For −1 < y < 1 let C d (η, y) := {x ∈ S d : η t x = y} be the associated latitude set, with unif(C d (η, y)) the unique probability measure on this set that is invariant under the subgroup of rotations with axis η. This may be seen in the context of the polar decomposition mentioned in the introduction. Conversely, given a probability measure ν on [−1, 1] and a parameter η ∈ S d , we can construct a distribution µ = µ η on S d via the kernel (y, A) → unif(C d (η, y))(A). In particular, for bounded and measurable functions φ : S d → R we then have ∫ φ dµ = ∫ ( ∫ φ d unif(C d (η, y)) ) ν(dy). For η ∈ S d and a measurable function g : [−1, 1] → R + the function (7) f η : S d → R, f η (x) := g(η t x), is unif(S d )-integrable if and only if g is ν d -integrable, and then ∫ f η d unif(S d ) = ∫ g dν d . Thus f η is the density of a probability measure on S d if and only if g is the ν d -density of a probability measure on [−1, 1]. In particular, a probability density g on [−1, 1] generates a family {Q η : η ∈ S d } of spherical distributions via (7), and such families are isotropic in the sense that (8) Q Uη (U A) = Q η (A) for all U ∈ O(d + 1) and all A ∈ B(S d ). In particular, each Q η is invariant under all rotations with axis η. As the function η → Q η (A) is measurable for each A ∈ B(S d ), such a family may also be regarded as a probability kernel. Some classical special functions will be needed below. Let (a) 0 := 1, (a) n := a(a + 1) · · · (a + n − 1), n ∈ N, be the ascending factorials. The modified Bessel functions I α of the first kind are given by I α (x) = Σ n∈N0 (x/2) 2n+α /(n! Γ(n + α + 1)), with real nonnegative parameter α, the confluent hypergeometric functions are 1 F 1 (α; β; x) = Σ n∈N0 ((α) n /(β) n ) x n /n!, with real positive parameters α and β, and the hypergeometric functions are 2 F 1 (α, β; γ; x) = Σ n∈N0 ((α) n (β) n /(γ) n ) x n /n!, with real positive parameters α, β, and γ.
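The latitude projection and ν_d can be explored numerically; a minimal Monte Carlo sketch for d = 3 (helper names ours), using the fact that normalized Gaussian vectors are uniform on the sphere, checks the exact moments E(η t X) = 0 and E(η t X)² = 1/(d + 1) under ν_d:

```python
import random

rng = random.Random(0)
d = 3  # the sphere S^3 in R^4


def unif_sphere(dim_plus_1):
    # normalized standard Gaussian vector is uniform on the sphere
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(dim_plus_1)]
        r = sum(v * v for v in z) ** 0.5
        if r > 1e-12:
            return [v / r for v in z]


n = 100_000
# take eta as the first standard basis vector, so eta^t x is the first coordinate
ts = [unif_sphere(d + 1)[0] for _ in range(n)]
m1 = sum(ts) / n
m2 = sum(t * t for t in ts) / n
# m1 should be near 0, m2 near 1/(d+1) = 0.25
```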
Example 1. (a) Let p > −1/2 and let ν be the distribution on [−1, 1] with unif(−1, 1)-density proportional to y → |y| 2p ; we obtain the spherical power distribution SP d (η, p) with density proportional to x → |η t x| 2p . We mainly use this with p = n ∈ N 0 , and then have the density x → c n (η t x) 2n , where the norming constants are given by c n = (λ + 1) n /(1/2) n . (c) The von Mises-Fisher distributions, which we denote by MF d (η, ρ), ρ > 0, arise if we start with ν d -density proportional to y → exp(ρy), −1 ≤ y ≤ 1. The continuous density of the associated spherical distribution is x → C d (ρ) exp(ρ η t x), where the norming constants are given by C d (ρ) = (ρ/2) λ /(Γ(λ + 1) I λ (ρ)).
(d) The Watson distributions Wat d (η, ρ), ρ ∈ R, arise if we begin with ν d -density proportional to y → exp(ρy 2 ), −1 ≤ y ≤ 1. The continuous density of the associated spherical distribution is x → exp(ρ (η t x) 2 )/ 1 F 1 (1/2; λ + 1; ρ). (e) The angular Gaussian distributions are the distributions of X = Z/‖Z‖, where Z has the (d + 1)-variate normal distribution N d+1 (a, Σ) with mean vector a ∈ R d+1 \ {0} and symmetric positive definite covariance matrix Σ; see, e.g., Watson (1983, p. 108). Here, we exclusively deal with the case where Σ is the identity matrix I d+1 , as the radial parts then lead to isotropic families. The distributions arising in this special case seem to have first been studied in detail by Saw (1978). Putting η = a/‖a‖ and ρ = (‖a‖ 2 /2) 1/2 we denote by AG d (η, ρ) the distribution of X and speak of the angular Gaussian distribution with parameters η and ρ. Its density can be represented by an infinite series; we refer to Saw (1978), where it is also pointed out that the density can be written in terms of a random variable S ∼ χ 2 d+1 . (f) We consider two types of spherical Cauchy distributions: The spherical Cauchy distributions of type I have the unif(S d )-densities x → ((1 − ρ 2 )/(1 + ρ 2 − 2ρ η t x)) d , and the spherical Cauchy distributions of type II have the unif(S d )-densities x → (1 − ρ 2 ) (1 + ρ 2 − 2ρ η t x) −(d+1)/2 , both with parameters η ∈ S d and ρ ∈ (0, 1). We denote by CI d (η, ρ) the distribution with unif(S d )-density f CI d ( · |η, ρ), and by CII d (η, ρ) the distribution with unif(S d )-density f CII d ( · |η, ρ). The distributions in Example 1 (a)-(e) and their push-forwards under x → η t x, respectively, are all classical; for basic as well as specific properties and interesting historical comments we refer to Watson (1982, 1983) and Mardia and Jupp (2000). It is well known, for example, that the von Mises-Fisher family in part (c) arises from the multivariate normal distributions in part (e) by conditioning on ‖Z‖. A well known relation with Brownian motion on S d is addressed in Section 4.2 below. In the special case d = 1, where d = (d + 1)/2, the distributions WC 1 (η, ρ) := CI 1 (η, ρ) = CII 1 (η, ρ) in Example 1 (f) are known as the wrapped Cauchy or circular Cauchy distributions; see, e.g., Mardia and Jupp (2000) and Section 3 below. Hence, with the two types of spherical Cauchy distributions given above we have two different extensions of this distribution family to higher dimensions. For distinction, we added the name supplements 'of type I' and 'of type II', respectively. The spherical Cauchy distributions of type I were introduced and studied by Kato and McCullagh (2020). Generalizing results obtained by McCullagh (1996) for d = 1, the authors especially deal with the behavior of the spherical Cauchy distributions of type I under Möbius transformations. The densities f CII d ( · |η, ρ) were considered by McCullagh (1989), though the author does not speak of spherical Cauchy distributions but, with the push-forward of CII d (η, ρ) under x → η t x, of a noncentral version of the univariate symmetric beta distribution.
We recall from the introductory remarks that for a discrete mixture representation we need a mixing base (Q n,η ) n∈N0 , η ∈ S d , where each Q n,η is a probability measure on the sphere, and mixing distributions on N 0 that depend on ρ only; see (2). Of special interest in the latter context are the negative binomial distributions NB(r, p) with parameters r > 0, p ∈ (0, 1), and probability mass function nb(n|r, p) = (Γ(r + n)/(Γ(r) n!)) p r (1 − p) n , n ∈ N 0 , the confluent hypergeometric series distributions CHS(α, β, τ ) on N 0 with parameters α, β, τ > 0 and probability mass functions chs(n|α, β, τ ) = ((α) n /(β) n ) (τ n /n!) / 1 F 1 (α; β; τ ), n ∈ N 0 , and the hypergeometric series distributions HS(α, β, γ, τ ) on N 0 with parameters α, β, γ, τ > 0 and probability mass functions hs(n|α, β, γ, τ ) = ((α) n (β) n /(γ) n ) (τ n /n!) / 2 F 1 (α, β; γ; τ ), n ∈ N 0 . For τ = 0 we take CHS(α, β, τ ) and HS(α, β, γ, τ ) to be the one-point mass at 0. The distribution CHS(α, β, τ ) arises as the stationary distribution of a birth-death process with birth rates (α + i)τ and death rates i(β + i − 1), i ∈ N 0 ; see Hall (1956). Note that these three distribution families are subclasses of the family of generalized hypergeometric distributions considered recently by Themangani et al. (2020). We also require that each family {Q n,η : η ∈ S d } is isotropic.
We can now state our first results. Let d ∈ N be fixed and let λ be as in (4).
Theorem 3. (a) The family {Wat d (η, ρ) : η ∈ S d , ρ ≥ 0} has a unique discrete mixture representation with mixing base SP d (η, n), n ∈ N 0 . This representation is given by Wat d (η, ρ) = Σ n∈N0 chs(n|1/2, λ + 1, ρ) SP d (η, n). In order to obtain a similar representation for the family {AG d (η, ρ) : η ∈ S d , ρ > 0} of angular Gaussian distributions we make use of the integral representation of the parabolic cylinder functions D ν with real index ν < 0; see Magnus et al. (1966, p. 328).
k ∈ N 0 , is a probability mass function.
We write DPC(δ, τ ) for the associated discrete parabolic cylinder distribution with parameters δ > 1/2 and τ > 0. For the special values δ = λ + 1 = (d + 1)/2 with d ∈ N the statement of the lemma also follows from (20) below, as the values on the right hand side of (19) are all nonnegative.
Theorem 5. The family {AG d (η, ρ) : η ∈ S d , ρ > 0} has a unique discrete mixture representation with mixing base SBeta d (η, 1, n + 1), n ∈ N 0 . Remark 6. (a) Regarding probability measures as real functions on a set of events, we may define the series in (13)-(17) and (20) as referring to pointwise convergence of functions. In fact, as the distributions involved all have smooth densities and compact domain, the series even converge uniformly, i.e. in spaces of continuous functions. (b) The representations are minimal in the sense that the respective mixing base cannot be reduced. This follows from the uniqueness and the fact that the mixing probabilities are strictly positive.
(c) In connection with the base in Theorem 2 all mixtures have densities that are increasing in η t x, and in Theorem 3 and Theorem 5 all mixtures are invariant under the reflection x → −x.
(d) Interestingly, the von Mises-Fisher family, the spherical Cauchy families and the angular Gaussian family have the same mixing base of spherical beta distributions. So, these families are obtained by picking at random (the index n of) the element SBeta d (η, 1, n + 1) according to the respective mixing distributions. Another family with this mixing base is the family of spherical normal distributions; see Section 4.2.
(e) For a discussion of other similarities as well as differences between the von Mises-Fisher family and the spherical Cauchy family of type I we refer to Kato and McCullagh (2020). The von Mises-Fisher family and the spherical Cauchy family of type II both have representations in terms of multivariate Brownian motion. To be specific, let X = (X t ) t≥0 be a standard Brownian motion in R d+1 and let Y = (Y t ) t≥0 with Y t = ρηt + X t for t ≥ 0 be the drifted standard Brownian motion with constant drift vector ρη, where ρ ≥ 0, η ∈ S d ; then the position of Y at the first hitting time of S d has the distribution MF d (η, ρ); see (2013) for a more recent proof and historical remarks on this result. Further, with Y now the driftless Brownian motion started at the interior point ρη, 0 ≤ ρ < 1, the position at the first hitting time of S d has the distribution CII d (η, ρ); see Chung (1982, p. 170) and McCullagh (1989).

Ultraspherical mixing bases
Our aim in this section is a mixing base that is applicable for general distribution families where, as before, we consider distributions P θ on (S d , B(S d )), with θ = (η, ρ) ∈ S d × I and I ⊂ R + an interval, that have densities f θ of the form (21) f θ (x) = g ρ (η t x), x ∈ S d . We will occasionally omit d or λ from the notation. Recall that λ = (d − 1)/2 and that ν d is the push-forward of unif(S d ) under the mapping x → η t x.
We assume that the functions g ρ in (21) are elements of the Hilbert space H λ := L 2 ([−1, 1], ν d ), and on H λ we use the inner product ⟨f, g⟩ := ∫ f g dν d and the norm ‖f ‖ := ⟨f, f ⟩ 1/2 . We deal with a special complete sequence of orthogonal polynomials in this space. For d = 1 and λ = 0 this is the sequence of Chebyshev polynomials T n of the first kind of degree n ∈ N 0 , for d > 1 and λ > 0 we use the sequence of Gegenbauer or ultraspherical polynomials C λ n of degree n ∈ N 0 ; see Erdélyi et al. (1953, Chs. X, XI). These functions play an important role in directional statistics, especially nonparametric directional statistics; see e.g. the papers of Bingham (1972), Giné (1975), Prentice (1978), Baringhaus (1991), Jupp (2008), and García-Portugués et al. (2021). The functions are standardized such that they equal 1 at the point 1: we set D 0 n := T n and, for λ > 0, D λ n := C λ n /C λ n (1), and we write (24) γ λ n := ‖D λ n ‖ 2 for the corresponding norming constants. Of course, for n > 0 none of these functions is a probability density with respect to ν d . However, it is known that the Chebyshev polynomials and, for λ > 0, the Gegenbauer polynomials attain their absolute maximum on [−1, 1] at t = 1; see Erdélyi et al. (1953, p. 206, formula (7)) and Abramowitz and Stegun (1964, p. 786). Hence the standardization gives |D λ n (t)| ≤ 1 for all t ∈ [−1, 1]. We assume throughout this section that g ρ is such that (22) g ρ = 1 + Σ n∈N β n (ρ) D λ n with coefficients β n (ρ) ≥ 0, and we put (23) β(ρ) := Σ n∈N β n (ρ). For fixed n ∈ N 0 and η ∈ S d the function H λ n,η : S d → R defined by H λ n,η (x) := D λ n (η t x), x ∈ S d , is the unique surface harmonic of degree n that depends only on η t x and that satisfies H λ n,η (η) = 1; see Erdélyi et al. (1953, p. 238). Obviously, |H λ n,η (x)| ≤ 1 for all x ∈ S d . We will repeatedly make use of the basic formulas (25) ⟨D λ n , D λ m ⟩ = δ nm γ λ n and (26) ∫ H λ n,η (x) H λ m,ξ (x) unif(S d )(dx) = δ nm γ λ n D λ n (η t ξ); see Erdélyi et al. (1953, p. 245) and Saw (1984, formula (1.14)).
The mixing bases considered in what follows are of a very simple structure: The densities of the base distributions are built with only two special surface harmonics. To be precise, for n ∈ N and real numbers −1 ≤ α ≤ +1 the functions x → 1 + α H λ n,η (x), x ∈ S d , are unif(S d )-densities of probability distributions ∆ λ n,η,α on S d . These distributions can be regarded as a multivariate generalization of the cardioid distributions introduced by Jeffreys (1948, p. 302); see also Mardia and Jupp (2000, Section 3.5.5). Let ∆ λ 0,η,α := unif(S d ). Here we mainly deal with the special distributions ∆ λ n,η := ∆ λ n,η,1 , but see also Remark 10 (a) and Proposition 12 below. So, for n ∈ N the unif(S d )-density of ∆ λ n,η is simply the sum of the two special surface harmonics H λ 0,η ≡ 1 and H λ n,η . With the mixing base ∆ λ n,η , n ∈ N 0 , given for each η ∈ S d , we obtain discrete mixture representations for all spherical distributions with densities of the form (21) that are not too far away from unif(S d ). This may be seen as an instance of the perturbation approach discussed in the survey paper of Pewsey and García-Portugués (2021) and, indeed, the value of β(ρ) may be interpreted as a distance between ν d and the measure with ν d -density g ρ . By an ultraspherical mixing base we mean a family {∆ λ n,η : n ∈ N 0 }. The following general formula is an immediate consequence of the above definitions and the expansion in (22).
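For d = 1 these base densities are easy to inspect, since the standardized Chebyshev polynomials satisfy T_n(cos ϑ) = cos(nϑ); a numerical sketch (grid and parameter values are arbitrary illustrations) checks that 1 + α T_n(η t x) is a probability density on the circle:

```python
import math


def density(theta, n, alpha):
    # unif(S^1)-density of the cardioid-type base element at angle theta
    # relative to the pole eta, using T_n(cos t) = cos(n t)
    return 1.0 + alpha * math.cos(n * theta)


n, alpha = 4, 0.8
grid = [2 * math.pi * k / 10_000 for k in range(10_000)]
vals = [density(t, n, alpha) for t in grid]
integral = sum(vals) / len(vals)  # average w.r.t. the uniform measure on S^1
# integral should be 1, and the density is nonnegative since |alpha| <= 1
```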
Applying this construction to several specific families we have to take care of the crucial condition β(ρ) ≤ 1, equivalently w ρ (0) ≥ 0. In each case, we obtain the mixing distribution and a range of ρ-values for the validity of the representation. Any distribution on N 0 may be written as a mixture of unit mass at 0 and a distribution on N, and it turns out that the latter are occasionally from a standard family. For a distribution on N 0 with mass function w on N 0 such that w(0) < 1 we call the distribution on N with mass function n → w(n)/(1 − w(0)), n ∈ N, its zero-truncated counterpart. Some of the results stated in what follows turn out to be simple consequences of Proposition 7.
In the first theorem we consider two families of wrapped distributions, hence d = 1 and λ = 0. There are different notational conventions in the literature; here, we regard the wrapped distribution associated with a given distribution µ on (the Borel subsets of) the real line as the push-forward µ T of µ under the mapping T : R → S 1 , x → (cos(x), sin(x)) t . This is often applied to location-scale families. Alternatively, the interval [−π, π) is used instead of S 1 as the base set for the wrapped distribution. This means that with X ∼ µ one deals with the [−π, π)-valued random variable X 0 as the variable X reduced modulo 2π. If X has the density f with respect to the Lebesgue measure, then X 0 has the unif([−π, +π))-density (29) x → 2π Σ k∈Z f (x + 2πk), −π ≤ x < π; see Feller (1971, p. 632). For example, the wrapped normal distribution WN 1 (η, ρ) arises from the normal distribution N (α, σ 2 ) with mean α and variance σ 2 , where η = (cos(α), sin(α)) t and ρ = σ 2 . Note that some authors use ρ = 2σ 2 , see, e.g., Hartmann and Watson (1974). With X ∼ N (α, σ 2 ) we deduce from (29) that X 0 has the density x → 1 + 2 Σ n∈N e −n 2 ρ/2 cos(n(x − α)), −π ≤ x < π. It is worthwhile to note that as ρ → 0 the distribution WN 1 (η, ρ) converges weakly to the one-point mass distribution at η. This is in contrast to other distributions considered here. For example, MF 1 (η, ρ), Wat 1 (η, ρ), AG 1 (η, ρ) all converge weakly to the uniform distribution unif(S 1 ) as ρ → 0. Also, as ρ → ∞, the distribution WN 1 (η, ρ) converges weakly to unif(S 1 ).
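The wrapping procedure itself is one line of code; a small sketch (parameter values arbitrary) wraps normal samples to [−π, π) and checks the mean resultant length E cos(X₀ − α) = e^{−σ²/2}, a quantity the wrapping leaves unchanged because the cosine is 2π-periodic:

```python
import math
import random

rng = random.Random(42)
alpha, sigma = 1.0, 0.8
n = 200_000

# reduce each normal sample modulo 2*pi into [-pi, pi)
wrapped = [((rng.gauss(alpha, sigma) + math.pi) % (2 * math.pi)) - math.pi
           for _ in range(n)]
resultant = sum(math.cos(x - alpha) for x in wrapped) / n
target = math.exp(-sigma * sigma / 2)  # = e^{-rho/2} with rho = sigma^2
```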
For the wrapped Cauchy distribution WC 1 (η, ρ) we follow the definition given by Pewsey and García-Portugués (2021): If X has a standard Cauchy distribution with density x → 1/(π(1 + x 2 )), x ∈ R, then we apply the wrapping procedure to Y := σX + α, where σ > 0, α ∈ R, and take η as in the wrapped normal case. For the scaling we use the parametrization ρ := e −σ ∈ (0, 1) and augment this with the limiting uniform distribution at ρ = 0. Using (29) again we obtain that the distribution WC 1 (η, ρ) has the density (31) x → (1 − ρ 2 )/(1 + ρ 2 − 2ρ η t x), x ∈ S 1 ; see also Pewsey and García-Portugués (2021). We write geo N0 (n|p) = p(1 − p) n , n ∈ N 0 , for the probability mass function of the geometric distribution on N 0 with parameter p ∈ (0, 1), and geo N for the mass function of its zero-truncated counterpart. Recall that the function β for a given distribution family is defined in (23).
Theorem 8. (a) For the wrapped Cauchy distributions we have β(ρ) = 2ρ/(1 − ρ) and, for 0 ≤ ρ ≤ 1/3, WC 1 (η, ρ) = (1 − β(ρ)) unif(S 1 ) + β(ρ) Σ n∈N geo N (n|1 − ρ) ∆ 0 n,η . (b) For the wrapped normal distributions we have β(ρ) = 2 Σ n∈N e −n 2 ρ/2 . We deal with the von Mises-Fisher families next. For these, we need variants of the Skellam distribution with parameter ρ. This distribution arises as the distribution of N 1 − N 2 where N 1 , N 2 are independent random variables that both have the Poisson distribution with parameter ρ/2; see Irwin (1937), and see Skellam (1946) where the more general case with possibly different means for N 1 and N 2 is considered. In the one-dimensional case we need the positive Skellam distribution, which is the conditional distribution of |N 1 − N 2 | given that N 1 ̸= N 2 . The associated probability mass function is given by psk(n|ρ) = 2 e −ρ I n (ρ)/(1 − e −ρ I 0 (ρ)), n ∈ N. For d > 1 we use the generalized positive Skellam distribution with parameters κ > 0 and τ > 0, with mass function gpsk(n|κ, τ ), n ∈ N.
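The wrapped Cauchy case can be checked numerically: its density is the geometric series 1 + 2 Σ ρⁿ cos(nϑ), so with weights w(0) = 1 − 2ρ/(1 − ρ) and w(n) = 2ρⁿ for n ≥ 1 (our reading of the statement, valid for ρ ≤ 1/3 so that w(0) ≥ 0) the mixture over the base densities 1 + cos(nϑ) reproduces it exactly; a sketch:

```python
import math


def wc_density(theta, rho):
    # unif(S^1)-density of the wrapped Cauchy, pole at angle 0, so eta^t x = cos(theta)
    return (1 - rho**2) / (1 + rho**2 - 2 * rho * math.cos(theta))


def mixture_density(theta, rho, terms=200):
    # w(0) = 1 - 2*rho/(1-rho), which is >= 0 iff rho <= 1/3; w(n) = 2*rho^n
    w0 = 1 - 2 * rho / (1 - rho)
    s = w0  # the uniform component has density 1
    for n in range(1, terms):
        s += 2 * rho**n * (1 + math.cos(n * theta))
    return s


rho = 0.25
err = max(abs(wc_density(t, rho) - mixture_density(t, rho))
          for t in [k * 0.1 for k in range(63)])
# err should be at the level of floating-point round-off
```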
It will turn out as part of the proof of the next result that this is indeed a probability mass function. We have lim κ→0 gpsk(n|κ, τ ) = psk(n|τ ) for all n ∈ N. Theorem 9. For the von Mises-Fisher families the equation β λ (ρ) = 1 has a unique finite positive solution ρ 0 (λ), and for 0 ≤ ρ ≤ ρ 0 (λ) the distribution MF d (η, ρ) has a discrete mixture representation in terms of the ultraspherical mixing base. Remark 10. (a) The condition that β n (ρ) ≥ 0 for all n ∈ N in Proposition 7 is satisfied in all of the above families, but it can easily be removed by an appropriate extension of the mixing base. For this, let ∆ λ,− n,η be the distribution with density x → 1 − D λ n (η t x), x ∈ S d . Then the representation (28) continues to hold if we take the elements ∆ λ n,η and ∆ λ,− n,η of the extended base with weights determined by the positive and negative parts of the coefficients β n (ρ). (b) The condition β(ρ) ≤ 1 in Proposition 7 holds if P η,ρ is sufficiently close to the uniform distribution on the sphere. If instead of P η,ρ we consider a mixture of this distribution with the uniform, with enough weight on the latter, then the result is close enough to the uniform, and we again obtain a mixture representation. In the von Mises-Fisher case with d = 1, for example, we get such a representation for ρ > ρ 0 with a suitable weight α(ρ) for the uniform component. (c) Is there a countable mixing base that represents all isotropic spherical distributions with densities of the form x → g(η t x) with η ∈ S d and g ∈ H λ ? This may be rephrased in terms of the set of extremal points of a convex set in an infinite dimensional space. We refer to Baringhaus and Grübel (2021b) for such geometric aspects in general, and for the construction of tree-based mixing bases that would lead to a positive answer for the set of all g ∈ H λ that are Riemann integrable.
The passage from an L 2 -expansion (22) of g ρ to the mixture representation (28) heavily relies on the nonnegativity of the functions 1 + D λ n (respectively 1 − D λ n in part (a) above). More generally, we may consider a mixing base (Q n,η ) n∈N where the density of Q n,η is a polynomial of degree n in η t x. This leads to the consideration of general linear combinations of ultraspherical polynomials; indeed, finding conditions for such polynomials to be nonnegative (on a given interval) is an ongoing research topic, see Askey (1975).
We confine ourselves to an example with λ = 0 and the Chebyshev polynomials. A change of mixing base will obviously lead to a change in the sequence of mixing coefficients. It turns out that this may lead to a representation of wider applicability.
On the other hand, Turán (1953) proved that, for all n ∈ N 0 , (34) Σ k=0,...,n ((1/2) k /k!) T k (t) ≥ 0 for all −1 ≤ t ≤ 1. In view of T k (cos ϑ) = cos(kϑ) this can be used to obtain an alternative representation. For this, let Σ n,η be the distribution on S 1 with density x → Σ k=0,...,n α k T k (η t x), where α k := (1/2) k /k! and n ∈ N 0 . Then (34), together with a summation by parts, leads to the corresponding representations (35) and (36), where geo N0 and geo N are as in Theorem 8. Both (35) and (36) hold for all η ∈ S 1 , but note that the range of permissible ρ-values has increased from 0 ≤ ρ ≤ 1 − 2 −1/2 to the full interval 0 ≤ ρ < 1.
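Turán's nonnegativity can be probed numerically; a sketch (grid sizes arbitrary) evaluates the partial sums Σ_{k≤n} (1/2)_k/k! · cos(kϑ) on a grid in [0, π]:

```python
import math


def alpha_k(k):
    # (1/2)_k / k!, built incrementally from the ascending factorial
    out = 1.0
    for i in range(k):
        out *= (0.5 + i) / (i + 1)
    return out


def partial_sum(theta, n):
    return sum(alpha_k(k) * math.cos(k * theta) for k in range(n + 1))


min_val = min(partial_sum(t, n)
              for n in range(25)
              for t in [math.pi * j / 400 for j in range(401)])
# by Turan's result the partial sums are nonnegative on the whole grid
```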
As pointed out earlier, mixing families constructed with surface harmonics can be used with all distributions of the form (21) as long as these are sufficiently close to the uniform. The following result gives a property which we interpret as self-mixing stability of the distribution families ∆ λ n,η,α : Mixing two elements of the same family results in a distribution that belongs to this family as well. Additionally, mixing any two elements moves the resulting distribution closer to the uniform distribution. Generally, the mixing operation relates distributions with different location parameter η ∈ S d to each other.
Proposition 12. Let γ λ n be the constants defined in (24). (a) For all n ∈ N 0 and η ∈ S d and all −1 ≤ α, β ≤ 1, mixing the kernel ξ → ∆ λ n,ξ,β with respect to ∆ λ n,η,α gives ∆ λ n,η,αβγ λ n . We recall that a probability kernel from a measurable space (E, E) to another measurable space (F, F ) is a function Q : E × F → R that is E-measurable in its first and a probability measure on (F, F ) in its second argument. Given a probability measure P on (E, E) and a kernel Q from (E, E) to (F, F ) we define a probability measure P • Q on (F, F ) by (37) (P • Q)(B) := ∫ Q(x, B) P (dx), B ∈ F . For a family {Q x : x ∈ E} of probability measures Q x on (F, F ) with the property that the map x → Q x (B) is E-measurable for each B ∈ F we thus obtain, with Q(x, B) := Q x (B), a probability measure P • Q, which may be interpreted as a mixing operation. For use in the next section we note that for a family {P η : η ∈ S d } of probability measures P η on (S d , B(S d )) and a kernel Q from (S d , B(S d )) to (S d , B(S d )) that are both isotropic in the sense defined by (8) and (9) respectively, the mixing results in an isotropic family again. Further, for α = β = 1 the statements in Proposition 12 can be simply rephrased as ∆ λ n,η • ∆ λ n, · = ∆ λ n,η,γ λ n . Using the self-mixing stability and the bilinearity of the operation defined in (37) we obtain a discrete mixture representation for the composition of two families that both have a representation in terms of the ultraspherical mixing base.
Proposition 13. Suppose that P η and P ′ η , η ∈ S d , are distribution families on S d with discrete mixture representations in terms of the ultraspherical mixing base, with mixing mass functions w and w ′ . Then the composition P η • P ′ has a representation in terms of this base as well. This can be used to obtain the composition of wrapped Cauchy distributions. Indeed, taken together, Theorem 8 (a) and Proposition 13 lead to (39) WC 1 (η, ρ) • WC 1 ( · , ρ ′ ) = WC 1 (η, ρρ ′ ) for 0 ≤ ρ, ρ ′ ≤ 1/3. It is worthwhile to point out that the restriction ρ, ρ ′ ≤ 1/3 in (39) can be omitted. In fact, for all 0 ≤ ρ, ρ ′ < 1, using (24), (25), (26), and (31), the density of WC 1 (η, ρ) • WC 1 ( · , ρ ′ ) can be computed directly and turns out to be that of WC 1 (η, ρρ ′ ). In the next section this will be put into a wider context.

Discrete mixture representations and Markov processes
Let {P η,ρ : η ∈ S d , ρ ∈ I}, I ⊂ R + an interval, be a family of distributions of the type considered in the previous sections. Below we briefly discuss three different connections to Markov processes. First, for ρ ∈ I fixed, the corresponding subfamily may be regarded as a probability kernel and thus induces a Markov chain on spheres. Second, an isotropic diffusion process on S d leads to a family of the above type via its one-dimensional marginal distributions, where η and ρ take over the role of starting point and (transformed) time parameter respectively. Third, starting with a discrete mixture representation we may find a discrete time Markov chain with marginal distributions equal to elements of the mixing base, and thus obtain an almost sure representation of the family as the distributions of the chain at random times.

Random walks on spheres
Let {Q η : η ∈ S d } be a family of probability distributions that leads to a Markov kernel as described at the end of Section 3. Such kernels arise as transition probabilities of Markov processes. We may, for example, fix an η ∈ S d and define a Markov chain (X n ) n∈N0 with state space (S d , B(S d )) by the requirements that X 0 ≡ η and that the distribution of X n+1 conditionally on X n = ξ is given by Q ξ . For isotropic kernels each transition can be divided into two steps that make use of the representation of S d by [−1, 1] × S d−1 that also appeared in connection with (5) and (6). In the geometrically most familiar case we consider the current position as the 'north pole', then first choose a latitude and thereafter, independently, a longitude uniformly at random. The result is regarded as the new north pole. A generalization of this setup has been considered by Bingham (1972); see also the references given there.
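The two-step transition can be sketched in code: the current position serves as the north pole, a latitude is drawn from some latitude distribution, and the longitude is chosen uniformly on the equator orthogonal to the pole. The latitude sampler below (uniform on [−1, 1]) is only a placeholder for the latitude distribution ν d;g of the text:

```python
import numpy as np

def step(eta, draw_latitude, rng):
    """One transition of an isotropic random walk on S^d (eta a unit
    vector in R^{d+1} acting as the current 'north pole')."""
    y = draw_latitude(rng)                       # latitude w.r.t. the pole
    g = rng.standard_normal(eta.shape[0])        # isotropic Gaussian vector
    z = g - (eta @ g) * eta                      # project onto eta-perp ...
    z /= np.linalg.norm(z)                       # ... uniform on the equator
    return y * eta + np.sqrt(1.0 - y * y) * z    # the new north pole

rng = np.random.default_rng(0)
x = np.array([0.0, 0.0, 1.0])                    # start at a fixed eta on S^2
for _ in range(5):
    x = step(x, lambda r: r.uniform(-1.0, 1.0), rng)
```

Each step stays on the sphere, and by construction the transition kernel is isotropic.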
The case d = 1 is somewhat special, as the wrapping procedure is a group homomorphism from the additive group of real numbers into the multiplicative group S 1 , regarded as a subset of C and endowed with complex multiplication. Wrapping a random walk or a Lévy process thus leads to processes with values in S 1 that have stationary and independent increments, where the latter are now to be understood as ratios rather than differences. In fact, the location-scale family of Cauchy distributions arises as the one-dimensional marginals of a specific Lévy process, which gives (39) after an appropriate rescaling of the variance parameter. A similar approach, now using Brownian motion on the real line, gives a corresponding statement for the wrapped normal distributions.
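For d = 1 the wrapping is simply θ ↦ e iθ , so wrapping a real Brownian motion gives a process on S 1 whose marginals are wrapped normal and whose increments, understood as ratios, are again independent. A sketch:

```python
import numpy as np

def wrapped_bm(n_steps, dt, rng):
    """Wrap a real Brownian motion onto S^1 (as a subset of C) via
    theta -> exp(i * theta); marginals are wrapped normal."""
    theta = np.cumsum(rng.standard_normal(n_steps) * np.sqrt(dt))
    return np.exp(1j * theta)

rng = np.random.default_rng(1)
path = wrapped_bm(1000, 0.01, rng)
ratios = path[1:] / path[:-1]   # 'increments' of the wrapped process
```

The ratios correspond to wrapping the independent real increments, which is the homomorphism property mentioned above.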
We collect some observations in the following result. Recall that the distribution L(X) of a Markov chain X = (X n ) n∈N0 with state space (S d , B(S d )) is a probability measure on the path space (S N0 d , B S N0 d ) of the chain, where the σ-field B S N0 d is generated by the projections π k :

Any measurable mapping T : S d → S d may be lifted to a mapping from and to paths by componentwise application.
Proposition 14. Suppose that Q is an isotropic kernel on S d and that X η = (X η n ) n∈N0 is a Markov chain with start at η ∈ S d and transition kernel Q. (a) The family {L(X η ) : η ∈ S d } of probability measures on the path space is isotropic in the sense that L(X

holds with w 1 (k) := w(k) for all k ∈ N 0 and, for n > 1,

The fact that the geodesic distances from the starting point again form a Markov chain is an instance of lumpability; see Rogers and Pitman (1981) for a general discussion. That the dependence on η is lost in the lumping transition is part of the assertion of part (b). Further, it follows from the cosine theorem for spherical triangles that the transition kernel Q Y of Y may be written as

with Z, U independent, L(Z) = L(η t X η 1 ) and L(U ) = ν d−1 ; see also Bingham (1972). Part (c) shows that the mixing base introduced in Section 3 is useful in the Markov chain context, and (40) may be seen as a discrete mixture representation of the family {P η,n : η ∈ S d , n ∈ N}, with P η,n = L(X η n ). It implies that the marginal distributions of the associated latitude process are given by

where µ λ k is the push-forward of ∆ λ k,η under x → η t x.

Diffusion processes
Let X = (X t ) t≥0 be a homogeneous continuous-time Markov process on the sphere with start at η, i.e. P(X 0 = η) = 1, and transition densities p t (x, y), t > 0, x, y ∈ S d , that are isotropic in the sense that p t (U x, U y) = p t (x, y) for all t > 0, x, y ∈ S d and U ∈ O(d + 1).
Then the marginal distributions L(X t ), t ≥ 0, of the process may have a discrete mixture representation of the type considered above, with ρ related to time t.
We sketch the basic argument, see also Karlin and McGregor (1960), and then apply this in the context of the spherical Brownian motion (B t ) t≥0 . For a general discussion of the latter we refer to Ito and McKean (1997, Section 7.15) and Hsu (2002, Example 3.3.2). We note in passing that the marginal distributions of X characterize the full distribution L(X) of the process, in view of the Chapman-Kolmogorov equations and the invariance of the transition mechanism under orthogonal transformations (clearly, O(d + 1) acts transitively on S d ).
Suppose that X has transition densities p t (x, y) and that its infinitesimal generator A has a discrete spectrum. As transitions are isotropic it is enough to consider one specific starting value x = η. For the Kolmogorov forward equations •) (y) we may try to find a family of basic solutions φ by a separation ansatz φ(t, y) = f (t)g(y). This leads to f ′ (t)/f (t) = (Ag)(y)/g(y).
As the left-hand and the right-hand side depend on t and on y only, respectively, we may hope that

where ω n , n ∈ N 0 , are the eigenvalues of the operator A, with eigenfunctions φ n,η .
Theorem 15. Let (B t ) t≥0 be the spherical Brownian motion on S d , d > 1, with start at η ∈ S d . Let

and let br λ t (n

Further, let t λ 0 be the unique solution of the equation β λ (t) = 1. Then, for t ≥ t λ 0 ,

In view of the wrapping representation mentioned above the corresponding result for d = 1 is contained in part (b) of Theorem 8.
Let B = (B t ) t≥0 be as in the theorem and let Y = (Y t ) t≥0 with Y t = η t B t , t ≥ 0, be the associated latitude process. Then a polar decomposition shows that, for fixed t > 0, B t can be synthesized (in distribution) from Y t and an independent random variable Z that is uniformly distributed on S d−1 . On the level of processes the conditional distribution of B given Y is a result known as the skew product decomposition of spherical Brownian motion; see Ito and McKean (1997, p. 270) and Mijatović et al. (2020).
A representation with a different mixing base, closer in spirit to the representations in Section 2, has very recently been obtained by Mijatović et al. (2020). The result is based on the authors' observation that for a spherical Brownian motion (B λ t ) t≥0 with start at η ∈ S d , d > 1, the rescaled latitude process (Y t ) t≥0 , Y t := (1 − η t B λ t )/2 for all t ≥ 0, is a neutral Wright-Fisher diffusion with both mutation parameters equal to λ. For these, a discrete mixture representation had earlier been given by Jenkins and Spano (2017), see also Griffiths and Spano (2010). Taken together, this leads to

with

where the term (d) −1 appearing with n = k = 0 is defined to be 1/(d−1); see Jenkins and Spano (2017, formula (5)). Interestingly, the mixture coefficients turn out to be the individual probabilities associated with the marginal distributions of a particular pure death process (Z t ) t>0 , i.e. w λ t (n) = P(Z t = n). In contrast to our representation in Theorem 15 via surface harmonics no further restrictions on the time parameter are needed. Moreover, there is also a fascinating probabilistic interpretation, relating neutral Wright-Fisher diffusions to Kingman's coalescent via moment duality; see Mijatović et al. (2020) for the details. Mardia and Jupp (2000) call the distributions L(B λ t ) Brownian motion distributions on S d ; Kent (1977) regards L(η t B λ t ) as a spherical normal distribution. We adopt the notation of the latter. Remembering that B λ starts in η ∈ S d , we denote by SN d (η, ρ) = L(B λ 1/ρ ) the spherical normal distribution with parameters η ∈ S d and ρ > 0. Then, interestingly, by (42) we have a discrete mixture representation for the family {SN d (η, ρ) : η ∈ S d , ρ > 0} with the same mixing base of spherical beta distributions SBeta d (η, 1, n + 1), n ∈ N 0 , as obtained in Section 2 for the von Mises-Fisher, the spherical Cauchy, and the angular Gaussian families.
As explained at the beginning of this subsection, isotropic diffusion processes on the sphere may lead to a discrete mixture representation for the family of their marginal distributions. Conversely, for a given family of the type considered in the previous sections, one might ask for a representation of its elements as the marginals of some diffusion process with values in S d . The following result answers this question for the von Mises-Fisher distributions.
Theorem 16. There is no homogeneous Markov process X = (X t ) t≥0 on S d with the property that, for all η ∈ S d ,

Alternatively one might start with a diffusion on the ambient space R d+1 and then use the transition x → x/∥x∥ from R d+1 to S d . For example, if B = (B t ) t≥0 is a Brownian motion on R d+1 with start at η ∈ S d , then X = (X t ) t≥0 , with X t := ∥B t ∥ −1 B t for all t ≥ 0, represents the family {AG d (η, ρ) : ρ > 0} in the sense that

for all t > 0.
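The projection construction is immediate to simulate: sample the Brownian motion in R d+1 started at η at time t and normalize. This is only a sketch; the exact relation between ρ and t in the AG parametrization is left implicit here:

```python
import numpy as np

def sample_projected(eta, t, n, rng):
    """Sample B_t = eta + sqrt(t) * N(0, I) in R^{d+1} and project
    onto S^d via x -> x / ||x||."""
    b = eta + np.sqrt(t) * rng.standard_normal((n, eta.shape[0]))
    return b / np.linalg.norm(b, axis=1, keepdims=True)

rng = np.random.default_rng(2)
eta = np.array([1.0, 0.0, 0.0])
x = sample_projected(eta, 0.01, 500, rng)   # concentrates near eta for small t
```

For small t the samples concentrate near η, and for large t they approach the uniform distribution on the sphere.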
Remark 17. We relate Theorem 16 to the infinite divisibility statement for the von Mises-Fisher distributions obtained by Kent (1977). Interestingly, in both cases the proofs are based on the same series expansions (57), (59) of the densities f MF d ( • |η, ρ) in terms of ultraspherical polynomials. An important ingredient of Kent's approach is the associative convolution algebra (F , • d ) on the space F of probability measures on [−1, 1] introduced by Bingham (1972), where the convolution

see also (41). Here S 1 , S 2 , Λ are independent, S i ∼ F i for i = 1, 2, and Λ ∼ ν d−1 . For d = 0 we take ν 0 to be the discrete uniform distribution on S 0 := {−1, 1}. With this definition a probability measure F ∈ F is said to be • d -infinitely divisible if for each m ∈ N there exists an

Kent (1977) showed that MF * d (ρ) is • d -infinitely divisible. In fact, Kent even gives an interesting representation of the distributions based on the spherical Brownian motion (B t ) t≥0 on S d with start at η ∈ S d : He showed that for each m ∈ N there exists an absolutely continuous distribution G m on the positive half-line such that η t B Tm ∼ F m , where T m ∼ G m is independent of (B t ) t≥0 . Consequently,

In order to lift this from the unit interval to the sphere let η ∈ S d be fixed and let P η be the family of distributions P on S d that are axially symmetric with respect to η, i.e. P U = P for each U ∈ O(d + 1) with U η = η, where P U is the push-forward of P under U . In order to carry over the convolution operation ) in such a way that U x η = x for all x ∈ S d and that x → U x is a measurable injection (a measurable embedding) from S d into O(d + 1). Then, for x, y ∈ S d , let x ⊕ y := U x y. For P 1 , P 2 ∈ P η , with independent S d -valued random vectors X i ∼ P i , i = 1, 2, the ⋆ d -convolution P 1 ⋆ d P 2 of P 1 and P 2 is defined to be the distribution of X 1 ⊕ X 2 . Kent showed that P 1 ⋆ d P 2 ∈ P η , and with F i as the distribution of η t X i , i = 1, 2, the cosine theorem for
spherical triangles leads to

From (44) it now follows that MF d (η, ρ) = L(B Tm ) ⋆ d m for all m ∈ N, i.e. the m-fold ⋆ d -convolution power of L(B Tm ).

Almost sure representations
The basic relation (2) connects {P θ : θ ∈ Θ} to the distributions W ρ on N 0 and Q n,η on S d . Isotropy means that we may regard the η-part of the parameter as fixed. In this situation, if (N ρ ) ρ≥0 and (X n ) n∈N0 are independent stochastic processes such that

Equation (45) may be regarded as an almost sure representation of the distributional equation (2). Classical examples of such almost sure representations are the Skorohod coupling in connection with distributional convergence, see e.g. Kallenberg (1997, Theorem 3.30), and the representation of a sequence of uniformly distributed permutations on the sets {0, 1, . . . , n}, n ∈ N, by the Chinese restaurant process, see e.g. Pitman (2006, Section 3.1).
Of course, such representations are most useful if the successive variables are close to each other (and not simply chosen to be independent). Our aim here is to obtain representations of the type (45) for the distribution families considered above. For this, we first formalize the polar decomposition.
Let η ∈ S d ; we assume that d > 1. Recall from (6) that C d (η, y) = {z ∈ S d : η t z = y} and let

be the normalized projection onto the orthogonal complement of the linear subspace of R d+1 spanned by η. This can be extended to the whole of the sphere by choosing some arbitrary ξ ∈ S d as the value of n η (±η). Then an inverse of the polar decomposition x → (η t x, n η (x)) is given by Ψ in the sense that Ψ η (η t x, n η (x)) = x for all x ∈ S d , and a random vector X with values in S d may be written as

Let Q η be a distribution on (S d , B(S d )) that has a density f with respect to unif(S d ) which can be written as f η (x) = g(η t x), x ∈ S d , see (7). If X ∼ Q η then η t X and n η (X) are independent, η t X has the distribution ν d;g with ν d -density g, and n η (X) ∼ unif(C d (η, 0)); see Watson (1983, p. 92) and Mardia and Jupp (2000, p. 169). So, conversely, if we have independent random variables Y ∼ ν d;g and Z ∼ unif(C d (η, 0)), then

Mardia and Jupp (2000, p. 161, p. 169) call (46) and the distributional version (47) the tangent-normal decomposition. In the past this decomposition has been successfully applied by many authors treating different problems in directional statistics, see e.g. Saw (1978, 1983, 1984), García-Portugués et al. (2020), and Ulrich (1984). In practice, a random variable X with distribution Q η is simply obtained as follows. Suppose first that η is equal to e 1 = (1, 0, . . . , 0) t , the first unit vector in the canonical basis of R d+1 . With e 1 as 'north pole' the polar representation takes on a particularly simple form: Then, starting with a random variable Y ∼ ν d;g and another random variable Z ∼ unif(S d−1 ) independent of Y , we obtain an S d -valued random variable X with distribution Q e1 via X := Φ(Y, Z). For a general η ∈ S d we use that Q η is the push-forward Q U e1 of Q e1 under the mapping x → U x, where U ∈ O(d + 1) is such that η = U e 1 , i.e. U has η as its first column. Then, defining Φ η (y, z) := U Φ(y, z) it follows that X := Φ η (Y, Z) ∼ Q η ; see also Saw (1978) and Ulrich (1984) for this construction.
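The construction X = Φ η (Y, Z) can be sketched as follows. Any U ∈ O(d + 1) with first column η will do; here U is taken to be a Householder reflection, and the latitude value stands in for a draw from ν d;g :

```python
import numpy as np

def u_matrix(eta):
    """An orthogonal U with U e_1 = eta (a Householder reflection;
    by the isotropy of the construction any such U may be used)."""
    e1 = np.zeros_like(eta); e1[0] = 1.0
    v = e1 - eta
    if np.linalg.norm(v) < 1e-12:          # eta = e_1: take the identity
        return np.eye(eta.shape[0])
    return np.eye(eta.shape[0]) - 2.0 * np.outer(v, v) / (v @ v)

def phi_eta(eta, y, z):
    """Phi_eta(y, z) = U Phi(y, z) with Phi(y, z) = (y, sqrt(1-y^2) z)^t."""
    x_e1 = np.concatenate(([y], np.sqrt(1.0 - y * y) * z))
    return u_matrix(eta) @ x_e1

rng = np.random.default_rng(3)
eta = np.array([0.6, 0.8, 0.0])
z = rng.standard_normal(2); z /= np.linalg.norm(z)   # Z ~ unif(S^{d-1})
x = phi_eta(eta, 0.5, z)                             # latitude fixed at 0.5
```

Since U is orthogonal with U e 1 = η, the resulting point has latitude η t x = y, exactly as in the tangent-normal decomposition.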
Starting with a random variable Y that is almost surely equal to 1, it follows that the random variable X = Φ η (Y, Z) is almost surely equal to η. This means that from a Markov chain (Y n ) n∈N0 with state space [−1, 1] starting in 1, where for n ∈ N the distribution of Y n is the push-forward of Q n,η under x → η t x, and a single random variable Z ∼ unif(S d−1 ), we obtain a Markov chain (X n ) n∈N0 with the desired one-dimensional marginal distributions by putting X n := Φ η (Y n , Z) for all n ∈ N 0 . Clearly, (Y n ) n∈N0 is then the latitude process associated with (X n ) n∈N0 .
This reduces the first step in an almost sure construction (45) to finding a Markov chain on [−1, 1] with prescribed marginals. For the second step we require a suitable integer-valued process N = (N ρ ) ρ≥0 with marginal distributions W ρ . One general possibility is the quantile transformation, which can also be used to construct a Skorohod coupling for real random variables: With U ∼ unif(0, 1) we obtain a random variable X with distribution function F via X := F −1 (U ), where F −1 (u) := inf{t ∈ R; F (t) ≥ u} for 0 < u < 1. If F = F ρ is the distribution function associated with W ρ the paths of the process N = (N ρ ) ρ≥0 constructed in this way depend on the relations between the distribution functions for different ρ's. In particular, if the distributions W ρ are stochastically monotone, meaning that whenever ρ ≤ ρ ′ , then the paths of N are increasing. It is well known that this stochastic monotonicity applies to arbitrary distributions W ρ , W ρ ′ with monotone likelihood ratio. To be precise, defining a likelihood ratio of W ρ ′ with respect to W ρ to be a B(R)-measurable function L ρ,ρ ′ : R → [0, ∞] such that W ρ (L ρ,ρ ′ < ∞) = 1 and

the distributions W ρ , W ρ ′ with ρ < ρ ′ have monotone likelihood ratio if there exists an increasing function h ρ,ρ ′ : R → [0, ∞] such that

Note that if f ρ , f ρ ′ are densities of W ρ , W ρ ′ with respect to some σ-finite measure, then

is a special version of the likelihood ratio of W ρ ′ with respect to W ρ ; here 1(

To this end let V , W i , i ∈ N 0 , be independent random variables with V ∼ Γ(d/2, 1), W 0 ∼ Γ(d/2, 1), and W i ∼ Γ(1, 1) for all i ∈ N. Then, using the well-known connection between beta and gamma distributions, see e.g. Johnson and Kotz (1970), we obtain independent random variables B 0 ∼ Beta(d/2, d/2), B n ∼ Beta(d + n − 1, 1), n ∈ N, via

and products
Ỹ n := B 0 B 1 · · · B n , n ∈ N 0 . The transformation Y n := 1 − 2 Ỹ n , n ∈ N 0 , now gives the desired sequence Y = (Y n ) n∈N0 . Moreover, Ỹ n+1 = Ỹ n B n+1 implies that

As B n+1 is independent of Y n this shows that Y is a Markov chain. Suppose now that (N ρ ) ρ≥0 is a stochastic process with N ρ ∼ CHS(d/2, d, 2ρ) for all ρ ≥ 0. Let Z ∼ unif(S d−1 ) be independent of the variables V and W i , i ∈ N. Then (X ρ ) ρ≥0 with

and Φ η as in the remarks preceding the example, has the desired property that X ρ ∼ MF d (η, ρ) for all ρ ≥ 0.
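The beta-product chain can be sketched directly. Since each B n+1 lies in (0, 1), the products Ỹ n decrease, so the latitudes Y n = 1 − 2 Ỹ n increase; the stated beta marginals are those from the text:

```python
import numpy as np

def latitude_chain(d, n_max, rng):
    """The chain Y_n = 1 - 2 * Ytilde_n with Ytilde_n = B_0 B_1 ... B_n,
    B_0 ~ Beta(d/2, d/2), B_n ~ Beta(d + n - 1, 1) independent.
    Multiplying by B_{n+1} in (0, 1) makes Ytilde_n decreasing,
    hence Y_n increasing."""
    b = [rng.beta(d / 2.0, d / 2.0)]
    b += [rng.beta(d + n - 1.0, 1.0) for n in range(1, n_max + 1)]
    return 1.0 - 2.0 * np.cumprod(b)

rng = np.random.default_rng(4)
y = latitude_chain(d=3, n_max=20, rng=rng)
```

Combining this path with a single independent Z ∼ unif(S d−1 ) and the map Φ η then yields a sphere-valued path as in the construction above.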
For the construction of the counting process we use the quantile transformation. The likelihood ratios turn out to be

which, as a function of n, is increasing whenever ρ < ρ ′ . As explained above, this shows that the paths of (N ρ ) ρ≥0 are increasing. From (50) it follows that Y n ≥ Y n−1 for all n ∈ N.
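The quantile transformation with one common U across all parameters is easy to realize for distributions on N 0 . The Poisson family below is only an illustration of the resulting monotone coupling; it is not the CHS mixing distribution of the example:

```python
import numpy as np
from math import exp, factorial

def quantile(weights, u):
    """F^{-1}(u) = inf{n : F(n) >= u} for a distribution on N_0
    given by its probability vector."""
    return int(np.searchsorted(np.cumsum(weights), u))

def poisson_weights(lam, n_max=80):
    w = np.array([exp(-lam) * lam**n / factorial(n) for n in range(n_max)])
    return w / w.sum()

u = 0.7                                       # one common uniform variate
n_small = quantile(poisson_weights(1.0), u)   # -> 1
n_large = quantile(poisson_weights(3.0), u)   # -> 4; same u, larger parameter
```

Because the Poisson family has monotone likelihood ratio, reusing the same u across parameters always gives n_small ≤ n_large, which is the increasing-paths property used above.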
Taken together we see that we have found an almost sure representation (51) with a process (Y Nρ ) ρ≥0 that has increasing paths.
Some comments are in order. Obviously, almost sure representations are generally not unique. In the first step of Example 18, we could use a sequence (Y n ) n∈N0 of independent random variables with Y n ∼ Beta [−1,1] (d/2, d/2 + n) for all n ∈ N 0 , or we could use the quantile transformation to obtain suitable variables Y n as functions of one single U ∼ unif(0, 1) (in fact, the corresponding likelihood ratios would be increasing). Similar to (3) in the classical case, the representation (51) strikes a structural middle ground in this spectrum from no dependence at all to total dependence between the variables of interest. Also, the Markov chain featuring in the denominator of has some resemblance to the sum appearing in (3). In Baringhaus and Grübel (2021a, Remark 4 (a)) we found a discrete mixture representation for the non-central family of hyperbolic secant distributions that may similarly be written as a function of a Markov chain of this type.
Returning to the general situation, we may regard the right-hand side of (3) or (45) as a representation of a continuous-time stochastic process Z = (Z t ) t≥0 by independent processes X = (X n ) n∈N0 and N = (N t ) t≥0 via Z t = X Nt for all t ≥ 0. As already mentioned in the introduction, equality of the marginal distributions is considerably weaker than equality of the distributions of the processes. For example, the representation in Example 18 leads to a process that moves by jumps from η on a fixed great circle through η towards the equator in a piecewise constant manner. Loosely speaking, a discrete mixture representation on the process level is only possible for processes of the pure jump type. However, if the time parameter of the base process X is continuous too then we obtain a connection to a famous group of results, known as skew product decompositions. For example, with (X t ) t≥0 a Brownian motion on R d+1 starting at a ≠ 0 and R t := ∥X t ∥, t ≥ 0, we have R t −1 X t = B Nt for all t ≥ 0, where (B t ) t≥0 is a Brownian motion on S d starting at η := ∥a∥ −1 a, N t is given implicitly by ∫ 0 Nt R s −2 ds = t, and B and N are independent. In particular, this leads to a representation of the distributions of X t /∥X t ∥, t ≥ 0, which is a family of spherical distributions, as a continuous mixture. In contrast to discrete mixture representations these seem to be less suitable for simulation.
for the gamma function we obtain

and it follows that the density of SBeta d (η, 1, n + 1) can be written as

We now write exp(ρ η t x) = exp(−ρ) exp(ρ (1 + η t x)) and use the expansion

together with the identity

In order to prove uniqueness suppose that

for two sequences v = (v n ) n∈N0 , w = (w n ) n∈N0 of non-negative real numbers with sum 1.
Passing to the respective push-forwards under x → η t x this leads to the equality of the (continuous) densities, hence the sequences v and w are equal to each other.
The proof of the uniqueness is similar to that of part (a). (c) Noticing that (d + 1)/2 = λ + 1, and, see Magnus et al. (1966, p. 39),

we get, arguing as in part (b), that

Proof of Theorem 3
(a) Straightforward manipulations give

(b) As at the beginning of the proof of Theorem 2, with p = q = n + 1, n ∈ N 0 , the general expression (11) for the norming constants can be simplified to

Hence the density of the associated spherical beta distribution may be written as

For ρ < 0 and η ∈ S d we have (see, e.g. Magnus et al. (1966, p. 267))

it follows that

for all η ∈ S d , ρ < 0.
In both cases, it is easy to adapt the uniqueness argument from the von Mises-Fisher context to the Watson situation.

Proof of Theorem 8
(a) We use Proposition 7. We have

The representation now follows easily.

Proof of Proposition 12
The statements are a consequence of the basic formulas ( 25) and (26).

Proof of Proposition 13
All terms involved are positive, so there are no convergence issues, and the representation follows from the bilinearity of the mixture operation.
Proof of Proposition 14
(a) The distribution of X is determined by the distributions of the vectors (X 0 , . . . , X n ), n ∈ N 0 . We may thus use induction, and (38) provides the necessary argument for the induction step. (b) Given a north pole η we obtain a partitioning of S d into the sets C η (y) = {x ∈ S d : η t x = y} with the same latitude y ∈ [−1, 1]. Isotropy implies that the transition mechanism interacts with the function that maps the points of S d to their latitude in the manner required by Dynkin's criterion, see Dynkin (1965, Theorem 10.13). As the action of O(d + 1) on S d is doubly transitive, for unit vectors η 1 , η 2 ∈ S d with the property that η t η 1 = η t η 2 there exists some U ∈ O(d + 1) such that U η = η and U η 1 = η 2 . The isotropy of Q then implies that for all A ∈ B([−1, 1])

(c) This follows easily on using induction and Proposition 13.

Proof of Theorem 15
The transition probabilities for the spherical Brownian motion in dimension d ≥ 2 are given in Hartmann and Watson (1974)

Proof of Theorem 16
Suppose, on the contrary, that (X t ) t≥0 is a homogeneous Markov process with the property that there exists a function κ : (0, ∞) → (0, ∞) such that for all ξ ∈ S d it holds that if P(X 0 = ξ) = 1 then L(X t ) = MF d (ξ, κ(t)) for all t > 0.
Because of the Markov property we would then have positive parameters ρ, ρ ′ , ρ ′′ such that n) for all η ∈ S d , A ∈ B(S d ).