1 Introduction

In Bayesian statistics, if the model has a group structure, inference based on the right invariant prior is known to have desirable properties; see [1] and references therein. The same holds true in Bayesian prediction [5, 10].

On the other hand, in the theory of Bayesian prediction, prior distributions that are superharmonic with respect to the Fisher metric are known to perform better than the Jeffreys prior [6].

These two facts raise the question of how right invariant priors and superharmonic priors are related. In models with group structures, the Jeffreys prior corresponds to the left invariant prior. In some examples, such as location-scale models, the ratio of the right invariant prior to the left invariant prior is known to be harmonic with respect to the Fisher metric [7]. However, it was not known whether this harmonic property holds in general. This paper proves that it does. The result is helpful for understanding the dominance of the right invariant prior over the Jeffreys prior, as shown in Lemma 2. We also provide, in Lemma 3, a method of constructing prior distributions that asymptotically dominate the right invariant prior, and we demonstrate it through examples.

In Sect. 2, we prove that the ratio of the right invariant measure to the left invariant measure is harmonic with respect to any left invariant metric. In Sect. 3, we apply the theorem to the Bayesian prediction problem.

2 Main result

Let G be a Lie group and e be its identity element. Choose a left invariant Riemannian metric h on G. We use the symbol h for Riemannian metrics to distinguish them from elements of G, which are usually denoted by g. In applications to statistics, h is the Fisher metric of a group-invariant model; see Sect. 3.

Let \(\nu _\textrm{L}\) be the left invariant measure (left Haar measure) on G. Up to multiplicative constants, \(\nu _\textrm{L}\) is written in terms of h as

$$\begin{aligned} \nu _\textrm{L}(\textrm{d}x) = \sqrt{|h|}\textrm{d}x^1\wedge \cdots \wedge \textrm{d}x^n, \end{aligned}$$

where \((x^i)\) is a local coordinate system around \(x\in G\) and \(|h|\) is the determinant of the metric with respect to this coordinate system. Denote the reciprocal of the modulus of G by \(\pi _\mathrm{R/L}\), that is,

$$\begin{aligned} \pi _\mathrm{R/L}(g)\int _G f(x)\nu _\textrm{L}(\textrm{d}x) = \int _G f(xg)\nu _\textrm{L}(\textrm{d}x) \end{aligned}$$
(1)

(Eq. (1.2) of [1]) for any \(g\in G\) and \(f\in C_0(G)\), where \(C_0(G)\) denotes the set of continuous functions with compact support. The map \(\pi _\mathrm{R/L}:G\rightarrow \mathbb {R}_{>0}\) is a group homomorphism. Define the right invariant measure \(\nu _\textrm{R}\) by

$$\begin{aligned} \nu _\textrm{R}(\textrm{d}x) = \pi _\mathrm{R/L}(x) \nu _\textrm{L}(\textrm{d}x) \end{aligned}$$

(see Eq. (1.4) of [1]). The group G is said to be unimodular if \(\pi _\mathrm{R/L}(x)=1\) for all \(x\in G\). We are interested in groups that are not unimodular.

Define the Laplace–Beltrami operator \(\Delta \) associated with the metric h by

$$\begin{aligned} \Delta f = \frac{1}{\sqrt{|h|}}\partial _i(\sqrt{|h|}h^{ij}\partial _jf),\quad f\in C^2(G), \end{aligned}$$

where \(\partial _i\) denotes the partial derivative with respect to the local coordinate, \(h^{ij}\) is the inverse matrix of \(h_{ij}=h(\partial _i,\partial _j)\), and Einstein’s summation convention is used. We call \(\Delta \) the Laplacian for simplicity. The Laplacian does not depend on the choice of the local coordinate system. A function f is said to be harmonic if \(\Delta f=0\) everywhere and superharmonic if \(\Delta f\le 0\) everywhere.

Our main theorem is stated as follows.

Theorem 1

The function \(\pi _\mathrm{R/L}\) is harmonic.

Proof

Take a function \(f\in C_0^\infty (G)\) such that \(\int f(x)\nu _\textrm{L}(\textrm{d}x)\ne 0\). Since \(f(xg)=(f\circ L_x)(g)\), Eq. (1) can be written as

$$\begin{aligned} \pi _\mathrm{R/L}(g)\int f(x)\nu _\textrm{L}(\textrm{d}x) = \int (f\circ L_x)(g)\nu _\textrm{L}(\textrm{d}x), \end{aligned}$$

where \(L_x\) is the left translation by x. Applying the Laplacian to both sides with respect to g yields

$$\begin{aligned} (\Delta \pi _\mathrm{R/L})(g) \int f(x)\nu _\textrm{L}(\textrm{d}x)&= \int \Delta (f\circ L_x)(g) \nu _\textrm{L}(\textrm{d}x)\\&= \int ((\Delta f)\circ L_x)(g) \nu _\textrm{L}(\textrm{d}x)\\&= \pi _\mathrm{R/L}(g)\int (\Delta f)(x) \nu _\textrm{L}(\textrm{d}x)\\&= 0, \end{aligned}$$

where the first equality uses Lebesgue’s convergence theorem, the second uses the isometric property of \(L_x\) and the invariance of \(\Delta \) under isometries (see p. 246, Proposition 2.4 of [4]), the third uses Eq. (1) again, and the fourth uses an integral formula for the Laplacian (see p. 245, Proposition 2.3 of [4]). Since \(\int f(x)\nu _\textrm{L}(\textrm{d}x)\ne 0\), this proves \(\Delta \pi _\mathrm{R/L}(g)=0\). \(\square \)

Example 1

(Affine transformations) Consider the group of affine transformations

$$\begin{aligned} G = \left\{ g=\begin{pmatrix} 1 & 0\\ \mu & \sigma \end{pmatrix} \mid \mu \in \mathbb {R},\ \sigma >0 \right\} , \end{aligned}$$

which is used to analyze the location-scale family in statistics. Let us directly verify that \(\pi _\mathrm{R/L}\) is harmonic, as pointed out in [7]. It is widely known that the left and right invariant measures are

$$\begin{aligned} \nu _\textrm{L}(\textrm{d}g)= \frac{\textrm{d}\mu \wedge \textrm{d}\sigma }{\sigma ^2} \end{aligned}$$

and

$$\begin{aligned} \nu _\textrm{R}(\textrm{d}g) = \frac{\textrm{d}\mu \wedge \textrm{d}\sigma }{\sigma }, \end{aligned}$$

respectively (p. 63 of [3]). The density function of \(\nu _\textrm{R}\) with respect to \(\nu _\textrm{L}\) is

$$\begin{aligned} \pi _\mathrm{R/L}(g)=\sigma . \end{aligned}$$

To derive the Laplacian, we determine a left invariant Riemannian metric. The metric tensor \(h_e\) at the identity element e is arbitrarily chosen. From the G-invariance, the metric tensor at \(g\in G\) is

$$\begin{aligned} h_g&= (L_{g^{-1}})^*h_e = B^\top h_e B = \frac{1}{\sigma ^2}h_e, \end{aligned}$$

where \((L_{g^{-1}})^*\) denotes the pull-back operator associated with the left translation \(L_{g^{-1}}\) and B is the Jacobian matrix of \(L_{g^{-1}}\). Indeed, the left translation

$$\begin{aligned} \begin{pmatrix} 1 & 0\\ \mu _y & \sigma _y \end{pmatrix} = \begin{pmatrix} 1 & 0\\ \mu & \sigma \end{pmatrix}^{-1} \begin{pmatrix} 1 & 0\\ \mu _x & \sigma _x \end{pmatrix} \end{aligned}$$

has the Jacobian matrix

$$\begin{aligned} B = \frac{\partial (\mu _y,\sigma _y)}{\partial (\mu _x,\sigma _x)} = \frac{1}{\sigma }\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}. \end{aligned}$$

The Laplacian is

$$\begin{aligned} \Delta&= \frac{1}{\sqrt{|h_g|}}\partial _i (\sqrt{|h_g|}(h_g)^{ij}\partial _j) \\&= \frac{\sigma ^2}{\sqrt{|h_e|}}(\partial _\mu ,\partial _\sigma )\left\{ \sigma ^{-2}\sqrt{|h_e|} \sigma ^2h_e^{-1} \begin{pmatrix} \partial _\mu \\ \partial _\sigma \end{pmatrix} \right\} \\&= \sigma ^2(\partial _\mu ,\partial _\sigma ) \left\{ h_e^{-1}\begin{pmatrix} \partial _\mu \\ \partial _\sigma \end{pmatrix} \right\} . \end{aligned}$$

It is immediate that \(\pi _\mathrm{R/L}(g)=\sigma \) is harmonic for any choice of \(h_e\).
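
As an informal check, this computation can be reproduced symbolically. The following Python/sympy sketch (not part of the original argument; the variable names are ours) builds the left invariant metric \(h_g=h_e/\sigma ^2\) for an arbitrary symmetric positive definite \(h_e\) and applies the coordinate formula for the Laplacian:

```python
import sympy as sp

# Informal check of Example 1: pi_{R/L}(g) = sigma is harmonic for ANY
# choice of the metric h_e at the identity.  The left invariant metric in
# the (mu, sigma) coordinates is h_g = h_e / sigma^2.
mu = sp.symbols('mu', real=True)
sigma, p, q, r = sp.symbols('sigma p q r', positive=True)
h_e = sp.Matrix([[p, q], [q, r]])          # arbitrary (assumed SPD) h_e
h = h_e / sigma**2                         # left invariant metric at g
h_inv = h.inv()
coords = (mu, sigma)

def laplacian(f):
    """Laplace-Beltrami operator in the (mu, sigma) coordinates."""
    s = sp.sqrt(h.det())
    return sp.simplify(sum(sp.diff(s * h_inv[i, j] * sp.diff(f, coords[j]),
                                   coords[i])
                           for i in range(2) for j in range(2)) / s)

print(laplacian(sigma))   # 0: the density of nu_R w.r.t. nu_L is harmonic
```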

3 Application to Bayesian prediction

3.1 Bayesian prediction problem

We briefly recall the Bayesian prediction problem and its relation with geometric quantities such as the Fisher metric and Laplacian.

A statistical model, or simply a model, is a set of probability measures on a given measurable space \((\mathcal {X},\mathcal {F})\) indexed by a parameter \(\theta \) as

$$\begin{aligned} \mathcal {P}=\{P_\theta \mid \theta \in \Theta \}. \end{aligned}$$

We assume that the model is identifiable, that is, \(\theta _1\ne \theta _2\) implies \(P_{\theta _1}\ne P_{\theta _2}\). The parameter space \(\Theta \) is assumed to be an orientable d-dimensional \(C^\infty \)-manifold. Let \(P_\theta \) be absolutely continuous with respect to a base measure \(v(\textrm{d}x)\) and its density function \(p(x|\theta )\) be positive everywhere and differentiable with respect to \(\theta \). The Fisher metric on \(\Theta \) is defined by

$$\begin{aligned} h_{ij}(\theta ) = \int \{\partial _i\log p(x|\theta )\}\{\partial _j\log p(x|\theta )\}P_\theta (\textrm{d}x), \end{aligned}$$

where \(\partial _i\) is the partial derivative with respect to local coordinates of \(\theta \). We assume that the Fisher metric is of \(C^\infty \) class and positive definite everywhere. The Fisher metric does not depend on the choice of the base measure \(v(\textrm{d}x)\).
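
As an illustration, the Fisher metric of the Cauchy location-scale family used in Example 2 below can be computed directly from this definition. The following sympy sketch (an aside, with our own variable names) evaluates the defining integrals; the expected result is \(h=\textrm{diag}(1/(2\sigma ^2),1/(2\sigma ^2))\), a multiple of the hyperbolic metric on the upper half-plane:

```python
import sympy as sp

# Fisher metric of the Cauchy location-scale family from the definition.
x, mu = sp.symbols('x mu', real=True)
sigma = sp.symbols('sigma', positive=True)
p = sigma / (sp.pi * ((x - mu)**2 + sigma**2))          # Cauchy density
score = [sp.diff(sp.log(p), mu), sp.diff(sp.log(p), sigma)]

h = sp.Matrix(2, 2, lambda i, j: sp.integrate(
    sp.together(score[i] * score[j] * p), (x, -sp.oo, sp.oo)))
print(h)   # expected: Matrix([[1/(2*sigma**2), 0], [0, 1/(2*sigma**2)]])
```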

A Borel measure on \(\Theta \) is called a Bayesian prior distribution or just a prior. The volume element

$$\begin{aligned} J(\textrm{d}\theta ) = \sqrt{|h|}\textrm{d}\theta ^1\wedge \cdots \wedge \textrm{d}\theta ^d \end{aligned}$$

induced from the Fisher metric is called the Jeffreys prior. The Jeffreys prior does not depend on the choice of the local coordinate system. We focus on priors \(\pi (\theta )J(\textrm{d}\theta )\) that are absolutely continuous with respect to the Jeffreys prior. We call \(\pi (\theta )\) the prior density. Since \(J(\textrm{d}\theta )\) does not depend on the local coordinate system, \(\pi (\theta )\) is a scalar function. The functions \(\pi \) are assumed to be positive-valued and of \(C^2\) class. We consider not only proper priors but also improper priors.

A statistical prediction problem is to estimate the distribution of a future observation \(y\in \mathcal {X}\) based on an independent sample \(x^n=(x_1,\ldots ,x_n)\in \mathcal {X}^n\) from \(P_\theta \). The Bayesian predictive density

$$\begin{aligned} p_\pi (y|x^n) = \int p(y|\theta )\pi (\theta |x^n)J(\textrm{d}\theta ), \end{aligned}$$
(2)

based on the posterior density

$$\begin{aligned} \pi (\theta |x^n) = \frac{\prod _{i=1}^np(x_i|\theta )\pi (\theta )}{\int \prod _{i=1}^np(x_i|\theta )\pi (\theta )J(\textrm{d}\theta )} \end{aligned}$$

is of interest.
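
As a minimal numerical sketch of (2) (not taken from the paper), consider the normal location model \(N(\theta ,1)\), for which the Fisher information is 1 and the Jeffreys measure \(J(\textrm{d}\theta )\) is the Lebesgue measure. Quadrature over the posterior should reproduce the known closed form \(N(y;\bar{x},1+1/n)\):

```python
import numpy as np
from scipy import integrate, stats

# Quadrature illustration of the Bayesian predictive density (2) for the
# normal location model N(theta, 1), where J(dtheta) is Lebesgue measure.
# (Toy setup; the names are ours.)
rng = np.random.default_rng(0)
n, theta_true = 10, 0.7
x = rng.normal(theta_true, 1.0, size=n)

def predictive(y, x, prior=lambda th: 1.0):
    # p_pi(y|x^n) = int p(y|th) pi(th|x^n) J(dth), with the posterior
    # normalized by the marginal likelihood z.
    lik = lambda th: np.prod(stats.norm.pdf(x, th, 1.0)) * prior(th)
    z, _ = integrate.quad(lik, -np.inf, np.inf)
    num, _ = integrate.quad(lambda th: stats.norm.pdf(y, th, 1.0) * lik(th),
                            -np.inf, np.inf)
    return num / z

y = 1.3
print(predictive(y, x))                               # quadrature value
print(stats.norm.pdf(y, x.mean(), np.sqrt(1 + 1/n)))  # closed form N(y; xbar, 1+1/n)
```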

The Bayesian prediction problem is to find a prior density function that has a smaller prediction risk. We adopt the following risk function.

Definition 1

(Asymptotic risk; Eq. (13) of [7]) The asymptotic risk function of the prior density \(\pi \in C^2(\Theta )\) is defined by

$$\begin{aligned} r(\pi ) = r(\pi ,\theta ) = \frac{1}{\sqrt{\pi (\theta )}}\Delta \sqrt{\pi (\theta )}, \end{aligned}$$

where \(\Delta \) denotes the Laplacian on \(\Theta \) with respect to the Fisher metric. A prior density \(\pi _1\) is said to dominate \(\pi _2\) asymptotically if \(r(\pi _1,\theta )\le r(\pi _2,\theta )\) for all \(\theta \) and \(r(\pi _1,\theta )<r(\pi _2,\theta )\) for some \(\theta \).

The asymptotic risk is the leading term of the asymptotic expansion of the Kullback–Leibler risk of the Bayesian predictive density (2) as \(n\rightarrow \infty \). See Eq. (4) of [6] and Eq. (13) of [7] for details. It is straightforward to see

$$\begin{aligned} r(\pi )&= \frac{1}{2\pi }\Delta \pi - \frac{1}{4}h^{ij}(\partial _i\log \pi )(\partial _j\log \pi ). \end{aligned}$$
(3)

This is proved as

$$\begin{aligned} \frac{1}{\sqrt{\pi }}\Delta \sqrt{\pi }&= \frac{1}{\sqrt{\pi }\sqrt{|h|}}\partial _i(h^{ij}\sqrt{|h|}\partial _j\sqrt{\pi }) \\&= \frac{1}{\sqrt{\pi }\sqrt{|h|}}\partial _i\left( h^{ij}\sqrt{|h|}\frac{\partial _j\pi }{2\sqrt{\pi }}\right) \\&= \frac{1}{\sqrt{\pi }\sqrt{|h|}}\left( \frac{\partial _i(h^{ij}\sqrt{|h|}\partial _j\pi )}{2\sqrt{\pi }} - (h^{ij}\sqrt{|h|}\partial _j\pi )\frac{\partial _i\pi }{4\pi ^{3/2}} \right) \\&= \frac{1}{2\pi }\Delta \pi - \frac{1}{4}h^{ij}(\partial _i\log \pi )(\partial _j\log \pi ). \end{aligned}$$
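
The identity can also be verified symbolically in one dimension. The following sympy sketch (an informal check, with our own notation) confirms that the two expressions for \(r(\pi )\) agree for arbitrary positive \(h(\theta )\) and \(\pi (\theta )\):

```python
import sympy as sp

# One-dimensional check of identity (3): both expressions for r(pi) agree
# for arbitrary positive h(theta) and pi(theta).
t = sp.symbols('theta', real=True)
h = sp.Function('h', positive=True)(t)     # metric component
pi = sp.Function('pi', positive=True)(t)   # prior density

def laplacian(f):
    # (1/sqrt(h)) d/dtheta ( sqrt(h) * h^(-1) * df/dtheta )
    return sp.diff(sp.sqrt(h) * sp.diff(f, t) / h, t) / sp.sqrt(h)

lhs = laplacian(sp.sqrt(pi)) / sp.sqrt(pi)
rhs = laplacian(pi) / (2*pi) - sp.diff(sp.log(pi), t)**2 / (4*h)
print(sp.simplify(lhs - rhs))   # 0
```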

Our problem is to find a prior density \(\pi \) that has smaller asymptotic risk. The asymptotic risk of the Jeffreys prior density is 0 by definition. Non-constant superharmonic prior densities asymptotically dominate the Jeffreys prior density: by (3), both terms on the right-hand side are nonpositive when \(\pi \) is superharmonic, and the second term is strictly negative wherever the gradient of \(\pi \) does not vanish.

3.2 Group invariant models

We consider the Bayesian prediction problem over group invariant models. Refer to [1, 2, 13] for comprehensive textbooks on invariant models.

For simplicity, we suppose that the sample space \(\mathcal {X}\) is also a \(C^\infty \) manifold. Let a Lie group G act on \(\mathcal {X}\) smoothly from the left. For a probability measure P on \(\mathcal {X}\) and \(g\in G\), the push-forward measure \(g_*P\) is defined by \(g_*P(B)=P(g^{-1}B)\) for Borel sets B. The group G acts on the set of all probability measures by the push-forward operation.

Definition 2

(Group invariant model; Definition 3.1 of [1]) A statistical model \(\mathcal {P}\) is said to be G-invariant if for each \(P\in \mathcal {P}\), \(g_*P\in \mathcal {P}\) for all \(g\in G\).

If a G-invariant statistical model is parameterized as \(\mathcal {P}=\{P_\theta \mid \theta \in \Theta \}\), the left action of G on \(\Theta \) is well defined by \(P_{g\theta }=g_*P_\theta \) under identifiability. We assume that G acts transitively on \(\Theta \).

Let \(v(\textrm{d}x)\) be the base measure of \(\mathcal {P}\) as in the preceding subsection. We say that tensors on \(\Theta \) are G-invariant if they are preserved under the group action.

Lemma 1

Let \(\mathcal {P}\) be a G-invariant model. Then, the Fisher metric h is G-invariant. In particular, the Jeffreys prior is a left G-invariant measure on \(\Theta \).

See “Appendix” for the proof. Lemma 1 is used to prove Lemma 2.

We say that G acts freely on \(\Theta \) if \(g\theta =\theta \) for some \(\theta \in \Theta \) implies \(g=e\). If the action is free, the parameter space \(\Theta =\{g\theta _0\mid g\in G\}\) is identified with G, where \(\theta _0\in \Theta \) is a fixed element. Under this identification, the left invariant measure \(\nu _\textrm{L}\) on G is equal to the Jeffreys prior, and the right invariant measure \(\nu _\textrm{R}\) is a prior on \(\Theta \), which we call the right invariant prior. It is known that the right invariant prior provides the best invariant predictive distribution [5, 10], meaning that it attains the minimum of the Kullback–Leibler risk in the class of invariant predictive distributions. In particular, the right invariant prior dominates the Jeffreys prior if G is not unimodular. This fact is reflected in the following lemma, which we prove in “Appendix” without relying on it.

Lemma 2

Suppose that G is not unimodular and acts freely on \(\Theta \). Then, the asymptotic risk of the right invariant prior density \(\pi _\mathrm{R/L}(\theta )\) is a negative constant.

Even if the action of G is not free, the results above apply to any Lie subgroup \(G_1\) of G that acts freely and transitively on \(\Theta \). In that case, we can identify \(\Theta \) with \(G_1\) and construct harmonic prior densities from the right invariant measures on \(G_1\). Furthermore, since every conjugate subgroup \(gG_1g^{-1}\) (\(g\in G\)) acts freely as well, various harmonic prior densities are obtained. These prior densities have the same asymptotic risk because \(G_1\) and \(gG_1g^{-1}\) are isomorphic. We can reduce the asymptotic risk by aggregating the prior densities as follows.

Lemma 3

Let \(\pi _1\) and \(\pi _2\) be smooth positive functions on \(\Theta \). Define the generalized mean \(\bar{\pi }_\beta \) by

$$\begin{aligned} \bar{\pi }_\beta = \left( \frac{\pi _1^\beta +\pi _2^\beta }{2}\right) ^{1/\beta } \end{aligned}$$

for \(\beta \ne 0\) and \(\bar{\pi }_0 = (\pi _1\pi _2)^{1/2}\) for \(\beta =0\). If \(\beta <1/2\), then

$$\begin{aligned} r(\bar{\pi }_\beta )\le \frac{\pi _1^\beta r(\pi _1) + \pi _2^\beta r(\pi _2)}{\pi _1^\beta + \pi _2^\beta }. \end{aligned}$$

The equality holds for all \(\theta \in \Theta \) if and only if \(\pi _1/\pi _2\) is constant.

See “Appendix” for the proof. The case \(\beta =0\) is proved in [12].
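
A toy example illustrates the role of the condition \(\beta <1/2\). Take \(\Theta =\mathbb {R}\) with the Euclidean metric \(h=1\), \(\pi _1=e^{\theta }\), and \(\pi _2=e^{-\theta }\), so that \(r(\pi _1)=r(\pi _2)=1/4\) and the right-hand side of Lemma 3 equals \(1/4\). The following sketch (a numerical sanity check, not part of the proof) shows that the generalized mean beats this bound for \(\beta =0.3\) but violates it for \(\beta =0.8\):

```python
import numpy as np
import sympy as sp

# Toy check of Lemma 3 on Theta = R with h = 1, pi_1 = e^x, pi_2 = e^{-x}.
# Then r(pi_1) = r(pi_2) = 1/4, so the right-hand side of the lemma is 1/4.
x, beta = sp.symbols('x beta', real=True)
pbar = ((sp.exp(beta*x) + sp.exp(-beta*x)) / 2)**(1/beta)

def risk(pi):
    # r(pi) = (1/sqrt(pi)) * Delta sqrt(pi) with the Euclidean metric on R
    return sp.diff(sp.sqrt(pi), x, 2) / sp.sqrt(pi)

print(sp.simplify(risk(sp.exp(x))))          # 1/4
gap = sp.lambdify((x, beta), risk(pbar) - sp.Rational(1, 4), 'numpy')
xs = np.linspace(-3.0, 3.0, 201)
print(np.all(gap(xs, 0.3) <= 0))             # True: beta = 0.3 < 1/2 dominates
print(np.any(gap(xs, 0.8) > 0))              # True: the bound fails for beta = 0.8
```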

We provide two applications of the lemma. In each application, we first find a closed subgroup \(G_1\) that acts freely and transitively on \(\Theta \). Then, take \(g\in G\) and put \(G_2=gG_1g^{-1}\). Under the identifications \(G_1\simeq \Theta \), \(g_1\mapsto g_1\theta _0\), and \(G_2\simeq \Theta \), \(g_2\mapsto g_2g\theta _0\) as G-spaces, the following equality holds for any \(\theta \in \Theta \):

$$\begin{aligned} \pi _2(\theta )=\pi _1(g^{-1}\theta ), \end{aligned}$$

where \(\pi _1\) and \(\pi _2\) are the densities of the right invariant priors of \(G_1\) and \(G_2\), respectively. Indeed, the left and right invariant measures on \(G_2\) are the push-forward of those on \(G_1\) by \(g_1\mapsto gg_1g^{-1}\). Then the density \(\pi _1(g_1\theta _0)\) is equal to \(\pi _2((gg_1g^{-1})g\theta _0)=\pi _2(gg_1\theta _0)\), which proves \(\pi _2(\theta )=\pi _1(g^{-1}\theta )\) for \(\theta =gg_1\theta _0\).

Example 2

(Cauchy location-scale family [11]) Consider the Cauchy density function

$$\begin{aligned} p(x|\mu ,\sigma ) = \frac{1}{\pi \sigma (1+(x-\mu )^2/\sigma ^2)},\quad x\in \mathbb {R}, \end{aligned}$$

with respect to the Lebesgue measure, where \(\mu \) and \(\sigma \) are called the location and scale parameters, respectively. The parameter space is \(\Theta =\{(\mu ,\sigma )\mid \mu \in \mathbb {R},\sigma >0\}\). The density function is written in terms of complex numbers as

$$\begin{aligned} p(x|\mu ,\sigma ) = \frac{\sigma }{\pi |x-(\mu +i\sigma )|^2},\quad i=\sqrt{-1}. \end{aligned}$$

The group \(G=\textrm{GL}^+(2,\mathbb {R})\) of \(2\times 2\) real matrices with positive determinant acts on this model through the linear fractional transformation

$$\begin{aligned} \begin{pmatrix} a & b\\ c & d \end{pmatrix} \cdot x = \frac{ax+b}{cx+d}, \quad \begin{pmatrix} a & b\\ c & d \end{pmatrix}\in G, \quad x\in \mathcal {X}=\mathbb {R}. \end{aligned}$$

The action of G on the parameter space is

$$\begin{aligned} \begin{pmatrix} a & b\\ c & d \end{pmatrix}\cdot (\mu +i\sigma )&= \frac{a(\mu +i\sigma )+b}{c(\mu +i\sigma )+d} \\&= \frac{ac\sigma ^2+(a\mu +b)(c\mu +d)}{(c\sigma )^2+(c\mu +d)^2}+i\frac{(ad-bc)\sigma }{(c\sigma )^2+(c\mu +d)^2} \end{aligned}$$

for \((\mu ,\sigma )\in \Theta \). See [11] for details. Although the action of G on \(\Theta \) is not free, a subgroup

$$\begin{aligned} G_1 = \left\{ \begin{pmatrix} \sigma & \mu \\ 0 & 1 \end{pmatrix} \mid \mu \in \mathbb {R},\ \sigma >0 \right\} \end{aligned}$$

acts freely and transitively, so we can identify \(G_1\) with \(\Theta \). As in Example 1, the left and right invariant measures of \(G_1\) are \(\sigma ^{-2}\textrm{d}\mu \wedge \textrm{d}\sigma \) and \(\sigma ^{-1}\textrm{d}\mu \wedge \textrm{d}\sigma \), respectively. The density of the right invariant prior on \(G_1\) is

$$\begin{aligned} \pi _1(\mu ,\sigma ) = \sigma . \end{aligned}$$

From Theorem 1 and Lemma 2, the asymptotic risk of \(\pi _1\) is a negative constant. Now consider the conjugate group

$$\begin{aligned} G_2 = gG_1g^{-1},\quad g=\begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}\in G, \end{aligned}$$

which also acts freely on \(\Theta \). The density of the right invariant prior on \(G_2\) is

$$\begin{aligned} \pi _2(\mu ,\sigma ) = \pi _1(g^{-1}(\mu ,\sigma )) = \frac{\sigma }{\sigma ^2+\mu ^2}. \end{aligned}$$

This prior density is discussed in [7].

Finally, by taking the geometric mean of \(\pi _1\) and \(\pi _2\), we obtain a prior density

$$\begin{aligned} \sqrt{\pi _1\pi _2} = \frac{\sigma }{\sqrt{\sigma ^2+\mu ^2}} = \frac{1}{\sqrt{1+(\mu /\sigma )^2}}, \end{aligned}$$

which shrinks the signal-to-noise ratio \(\mu /\sigma \) towards the origin. Lemma 3 implies that the asymptotic risk of \((\pi _1\pi _2)^{1/2}\) is smaller than those of the right invariant priors \(\pi _1\) and \(\pi _2\).
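
The risk computations in this example can be reproduced symbolically. Assuming the known Fisher metric of the Cauchy family, \(h=\textrm{diag}(1/(2\sigma ^2),1/(2\sigma ^2))\) (see the sketch in Sect. 3.1), the Laplacian reduces to \(\Delta f=2\sigma ^2(\partial _\mu ^2f+\partial _\sigma ^2f)\), and the following sympy sketch confirms Lemma 2 and the strict improvement of the geometric mean:

```python
import sympy as sp

# Risks in Example 2 under the Cauchy Fisher metric
# h = diag(1/(2*sigma^2), 1/(2*sigma^2)), for which
# Delta f = 2*sigma^2*(d^2 f/dmu^2 + d^2 f/dsigma^2).
mu = sp.symbols('mu', real=True)
sigma = sp.symbols('sigma', positive=True)

def risk(pi):
    s = sp.sqrt(pi)
    lap = 2 * sigma**2 * (sp.diff(s, mu, 2) + sp.diff(s, sigma, 2))
    return sp.simplify(lap / s)

pi1 = sigma
pi2 = sigma / (sigma**2 + mu**2)
print(risk(pi1), risk(pi2))      # expected: -1/2 -1/2 (Lemma 2)
print(risk(sp.sqrt(pi1 * pi2)))  # expected: -1/2 - sigma**2/(2*(mu**2 + sigma**2)) < -1/2
```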

For location-scale families other than the Cauchy family, the general linear group does not act because the family is not closed under the reciprocal 1/X of the random variable X. However, the dominance relationship on the asymptotic risk remains true because the asymptotic risk depends only on the Riemannian structure. See also [7] for this point.

Example 3

(Two-dimensional Wishart model [8, 12]) Suppose that a random variable X has the two-dimensional Wishart distribution \(W_2(n,\Sigma )\) with n degrees of freedom and the covariance parameter

$$\begin{aligned} \Sigma =\begin{pmatrix} \sigma _{11} & \sigma _{12}\\ \sigma _{21} & \sigma _{22} \end{pmatrix}. \end{aligned}$$

The model is G-invariant with respect to the general linear group \(G=\textrm{GL}(2,\mathbb {R})\), where the group action is defined by \((g,X)\mapsto gXg^\top \) and \((g,\Sigma )\mapsto g\Sigma g^\top \). The sample space \(\mathcal {X}\) and the parameter space \(\Theta \) are the set of positive definite symmetric matrices. The subgroup

$$\begin{aligned} G_1 = \left\{ \begin{pmatrix} a & 0\\ b & c \end{pmatrix} \mid a,c>0,\ b\in \mathbb {R} \right\} \end{aligned}$$

of G has a one-to-one correspondence with \(\Theta \) through the Cholesky decomposition \(\Sigma =gg^\top \) with \(g\in G_1\). The left and right invariant measures of \(G_1\) are

$$\begin{aligned} \nu _\textrm{L} = \frac{1}{ac^2}\textrm{d}a\wedge \textrm{d}b\wedge \textrm{d}c = \frac{1}{4|\Sigma |^{3/2}}\textrm{d}\sigma _{11}\wedge \textrm{d}\sigma _{12}\wedge \textrm{d}\sigma _{22} \end{aligned}$$

and

$$\begin{aligned} \nu _\textrm{R} = \frac{1}{a^2c}\textrm{d}a\wedge \textrm{d}b\wedge \textrm{d}c = \frac{1}{4\sigma _{11}|\Sigma |}\textrm{d}\sigma _{11}\wedge \textrm{d}\sigma _{12}\wedge \textrm{d}\sigma _{22}, \end{aligned}$$

respectively. The density of the right invariant prior on \(G_1\) is

$$\begin{aligned} \pi _1(\Sigma ) = \frac{|\Sigma |^{1/2}}{\sigma _{11}} \end{aligned}$$

in the \(\Sigma \)-coordinate. A conjugate group

$$\begin{aligned} G_2 = gG_1 g^\top ,\quad g=\begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, \end{aligned}$$

also acts freely on \(\Theta \). The density of the right invariant prior on \(G_2\) is

$$\begin{aligned} \pi _2(\Sigma ) = \pi _1(g\Sigma g^\top ) = \frac{|\Sigma |^{1/2}}{\sigma _{22}}. \end{aligned}$$
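
The change of variables between the Cholesky coordinates \((a,b,c)\) and the \(\Sigma \)-coordinates above can be verified symbolically. The following sympy sketch (an informal check) confirms that the Jacobian determinant of \((\sigma _{11},\sigma _{12},\sigma _{22})\) with respect to \((a,b,c)\) is \(4a^2c\), which converts the displayed expressions for \(\nu _\textrm{L}\) and \(\nu _\textrm{R}\):

```python
import sympy as sp

# Check of the invariant-measure computations in the Cholesky coordinates.
a, b, c = sp.symbols('a b c', positive=True)
g = sp.Matrix([[a, 0], [b, c]])
S = g * g.T                                   # Sigma = g g^T
s11, s12, s22 = S[0, 0], S[0, 1], S[1, 1]

J = sp.Matrix([[sp.diff(s, v) for v in (a, b, c)] for s in (s11, s12, s22)])
jac = sp.factor(J.det())
print(jac)                                    # 4*a**2*c
detS = sp.factor(S.det())                     # a**2*c**2
# nu_L: 1/(a*c^2) da db dc  =  dSigma / (4*|Sigma|^(3/2))
print(sp.simplify(1/(a*c**2) - jac/(4*detS**sp.Rational(3, 2))))   # 0
# nu_R: 1/(a^2*c) da db dc  =  dSigma / (4*sigma11*|Sigma|)
print(sp.simplify(1/(a**2*c) - jac/(4*s11*detS)))                  # 0
```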

The harmonic mean of \(\pi _1\) and \(\pi _2\) is

$$\begin{aligned} \left( \frac{\pi _1^{-1}+\pi _2^{-1}}{2}\right) ^{-1} = \frac{2|\Sigma |^{1/2}}{\textrm{tr}(\Sigma )}, \end{aligned}$$

which is orthogonally invariant and shrinks the ratio of the two eigenvalues towards one. Lemma 3 implies that the prior asymptotically dominates the right invariant priors \(\pi _1\) and \(\pi _2\). The dominance relationship holds even in finite-sample cases as shown by [8].

Similarly, the geometric mean of \(\pi _1\) and \(\pi _2\) is

$$\begin{aligned} \sqrt{\pi _1\pi _2} = \frac{|\Sigma |^{1/2}}{(\sigma _{11}\sigma _{22})^{1/2}}, \end{aligned}$$

which is scale invariant and shrinks the correlation coefficient towards the origin; indeed, \(\sqrt{\pi _1\pi _2}=\sqrt{1-\rho ^2}\) with \(\rho =\sigma _{12}/\sqrt{\sigma _{11}\sigma _{22}}\). Again, Lemma 3 tells us that the prior asymptotically dominates the right invariant priors \(\pi _1\) and \(\pi _2\). This relation holds even in finite-sample cases, as shown by [12].

The two examples show how Theorem 1 is useful in Bayesian inference.

Finally, we mention the predictive metric defined by [9], which appears in the asymptotic risk when the observed and predicted variables have different statistical models. The predictive metric is G-invariant whenever the statistical models of the observed and predicted variables are both G-invariant. The present method of obtaining harmonic prior densities applies to this case as well.