1 Introduction

In Bayesian statistics, if the model has a group structure, inference based on the right invariant prior is known to have desirable properties; see [1] and references therein. The same holds true in Bayesian prediction [5, 10].

On the other hand, in the theory of Bayesian prediction, prior distributions that are superharmonic with respect to the Fisher metric are known to perform better than the Jeffreys prior [6].

These two facts raise the question of how right invariant priors and superharmonic priors are related. In models with group structures, the Jeffreys prior corresponds to the left invariant prior. In some examples, such as location-scale models, the ratio of the right invariant prior to the left invariant prior is known to be harmonic with respect to the Fisher metric [7]. However, it was not known whether this harmonic property holds in general. This paper proves that it does. The result is helpful for understanding the dominance of the right invariant prior over the Jeffreys prior, as shown in Lemma 2. We also provide, in Lemma 3, a method of constructing prior distributions that asymptotically dominate the right invariant prior, and we demonstrate it through examples.

In Sect. 2, we prove that the ratio of the right invariant measure to the left invariant measure is harmonic with respect to any left invariant metric. In Sect. 3, we apply the theorem to the Bayesian prediction problem.

2 Main result

Let G be a Lie group and e be its identity element. Choose a left invariant Riemannian metric h on G. We use the symbol h for Riemannian metrics to distinguish them from elements of G, which are usually denoted by g. In applications to statistics, h is the Fisher metric of a group-invariant model; see Sect. 3.

Let \(\nu _\textrm{L}\) be the left invariant measure (left Haar measure) on G. Up to multiplicative constants, \(\nu _\textrm{L}\) is written in terms of h as

$$\begin{aligned} \nu _\textrm{L}(\textrm{d}x) = \sqrt{|h|}\textrm{d}x^1\wedge \cdots \wedge \textrm{d}x^n, \end{aligned}$$

where \((x^i)\) is a local coordinate system around \(x\in G\) and \(|h|\) is the determinant of the metric with respect to this coordinate system. Denote the reciprocal of the modulus of G by \(\pi _\mathrm{R/L}\), that is,

$$\begin{aligned} \pi _\mathrm{R/L}(g)\int _G f(x)\nu _\textrm{L}(\textrm{d}x) = \int _G f(xg)\nu _\textrm{L}(\textrm{d}x) \end{aligned}$$
(1)

(Eq. (1.2) of [1]) for any \(g\in G\) and \(f\in C_0(G)\), where \(C_0(G)\) denotes the set of continuous functions with compact support. The map \(\pi _\mathrm{R/L}:G\rightarrow \mathbb {R}_{>0}\) is a group homomorphism. Define the right invariant measure \(\nu _\textrm{R}\) by

$$\begin{aligned} \nu _\textrm{R}(\textrm{d}x) = \pi _\mathrm{R/L}(x) \nu _\textrm{L}(\textrm{d}x) \end{aligned}$$

(see Eq. (1.4) of [1]). The group G is said to be unimodular if \(\pi _\mathrm{R/L}(x)=1\) for all \(x\in G\). We are interested in groups that are not unimodular.

Define the Laplace–Beltrami operator \(\Delta \) associated with the metric h by

$$\begin{aligned} \Delta f = \frac{1}{\sqrt{|h|}}\partial _i(\sqrt{|h|}h^{ij}\partial _jf),\quad f\in C^2(G), \end{aligned}$$

where \(\partial _i\) denotes the partial derivative with respect to the local coordinate, \(h^{ij}\) is the inverse matrix of \(h_{ij}=h(\partial _i,\partial _j)\), and Einstein’s summation convention is used. We call \(\Delta \) the Laplacian for simplicity. The Laplacian does not depend on the choice of the local coordinate system. A function f is said to be harmonic if \(\Delta f=0\) everywhere and superharmonic if \(\Delta f\le 0\) everywhere.

Our main theorem is stated as follows.

Theorem 1

The function \(\pi _\mathrm{R/L}\) is harmonic.

Proof

Take a function \(f\in C_0^\infty (G)\) such that \(\int f(x)\nu _\textrm{L}(\textrm{d}x)\ne 0\). Since \(f(xg)=(f\circ L_x)(g)\), Eq. (1) can be written as

$$\begin{aligned} \pi _\mathrm{R/L}(g)\int f(x)\nu _\textrm{L}(\textrm{d}x) = \int (f\circ L_x)(g)\nu _\textrm{L}(\textrm{d}x), \end{aligned}$$

where \(L_x\) is the left translation by x. Applying the Laplacian to both sides with respect to g yields

$$\begin{aligned} (\Delta \pi _\mathrm{R/L})(g) \int f(x)\nu _\textrm{L}(\textrm{d}x)&= \int \Delta (f\circ L_x)(g) \nu _\textrm{L}(\textrm{d}x)\\&= \int ((\Delta f)\circ L_x)(g) \nu _\textrm{L}(\textrm{d}x)\\&= \pi _\mathrm{R/L}(g)\int (\Delta f)(x) \nu _\textrm{L}(\textrm{d}x)\\&= 0, \end{aligned}$$

where the first equality uses Lebesgue’s convergence theorem, the second uses the isometric property of \(L_x\) and the invariance of \(\Delta \) under isometries (see p. 246, Proposition 2.4 of [4]), the third uses Eq. (1) again, and the fourth uses an integral formula for the Laplacian (see p. 245, Proposition 2.3 of [4]). Since \(\int f(x)\nu _\textrm{L}(\textrm{d}x)\ne 0\), this proves \(\Delta \pi _\mathrm{R/L}(g)=0\). \(\square \)

Example 1

(Affine transformations) Consider the group of affine transformations

$$\begin{aligned} G = \left\{ g=\begin{pmatrix} 1 & 0\\ \mu & \sigma \end{pmatrix} \mid \mu \in \mathbb {R},\ \sigma >0 \right\} , \end{aligned}$$

which is used to analyze the location-scale family in statistics. Let us directly verify that \(\pi _\mathrm{R/L}\) is harmonic, as pointed out in [7]. It is widely known that the left and right invariant measures are

$$\begin{aligned} \nu _\textrm{L}(\textrm{d}g)= \frac{\textrm{d}\mu \wedge \textrm{d}\sigma }{\sigma ^2} \end{aligned}$$

and

$$\begin{aligned} \nu _\textrm{R}(\textrm{d}g) = \frac{\textrm{d}\mu \wedge \textrm{d}\sigma }{\sigma }, \end{aligned}$$

respectively (p. 63 of [3]). The density function of \(\nu _\textrm{R}\) with respect to \(\nu _\textrm{L}\) is

$$\begin{aligned} \pi _\mathrm{R/L}(g)=\sigma . \end{aligned}$$

To derive the Laplacian, we determine a left invariant Riemannian metric. The metric tensor \(h_e\) at the identity element e is arbitrarily chosen. From the G-invariance, the metric tensor at \(g\in G\) is

$$\begin{aligned} h_g&= (L_{g^{-1}})^*h_e = B^\top h_e B = \frac{1}{\sigma ^2}h_e, \end{aligned}$$

where \((L_{g^{-1}})^*\) denotes the pull-back operator associated with the left translation \(L_{g^{-1}}\) and B is the Jacobian matrix of \(L_{g^{-1}}\). Indeed, the left translation

$$\begin{aligned} \begin{pmatrix} 1 & 0\\ \mu _y & \sigma _y \end{pmatrix} = \begin{pmatrix} 1 & 0\\ \mu & \sigma \end{pmatrix}^{-1} \begin{pmatrix} 1 & 0\\ \mu _x & \sigma _x \end{pmatrix} \end{aligned}$$

has the Jacobian matrix

$$\begin{aligned} B = \frac{\partial (\mu _y,\sigma _y)}{\partial (\mu _x,\sigma _x)} = \frac{1}{\sigma }\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}. \end{aligned}$$

The Laplacian is

$$\begin{aligned} \Delta&= \frac{1}{\sqrt{|h_g|}}\partial _i (\sqrt{|h_g|}(h_g)^{ij}\partial _j) \\&= \frac{\sigma ^2}{\sqrt{|h_e|}}(\partial _\mu ,\partial _\sigma )\left\{ \sigma ^{-2}\sqrt{|h_e|} \sigma ^2h_e^{-1} \begin{pmatrix} \partial _\mu \\ \partial _\sigma \end{pmatrix} \right\} \\&= \sigma ^2(\partial _\mu ,\partial _\sigma ) \left\{ h_e^{-1}\begin{pmatrix} \partial _\mu \\ \partial _\sigma \end{pmatrix} \right\} . \end{aligned}$$

It is immediate that \(\pi _\mathrm{R/L}(g)=\sigma \) is harmonic for any choice of \(h_e\).
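
As an informal check, this computation can be reproduced symbolically. The following Python/sympy sketch (not part of the original argument; the variable names are ours) builds the left invariant metric \(h_g=h_e/\sigma ^2\) for an arbitrary symmetric positive definite \(h_e\) and applies the coordinate formula for the Laplacian:

```python
import sympy as sp

# Informal check of Example 1: pi_{R/L}(g) = sigma is harmonic for ANY
# choice of the metric h_e at the identity.  The left invariant metric in
# the (mu, sigma) coordinates is h_g = h_e / sigma^2.
mu = sp.symbols('mu', real=True)
sigma, p, q, r = sp.symbols('sigma p q r', positive=True)
h_e = sp.Matrix([[p, q], [q, r]])          # arbitrary (assumed SPD) h_e
h = h_e / sigma**2                         # left invariant metric at g
h_inv = h.inv()
coords = (mu, sigma)

def laplacian(f):
    """Laplace-Beltrami operator in the (mu, sigma) coordinates."""
    s = sp.sqrt(h.det())
    return sp.simplify(sum(sp.diff(s * h_inv[i, j] * sp.diff(f, coords[j]),
                                   coords[i])
                           for i in range(2) for j in range(2)) / s)

print(laplacian(sigma))   # 0: the density of nu_R w.r.t. nu_L is harmonic
```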

3 Application to Bayesian prediction

3.1 Bayesian prediction problem

We briefly recall the Bayesian prediction problem and its relation with geometric quantities such as the Fisher metric and Laplacian.

A statistical model, or simply a model, is a set of probability measures on a given measurable space \((\mathcal {X},\mathcal {F})\) indexed by a parameter \(\theta \) as

$$\begin{aligned} \mathcal {P}=\{P_\theta \mid \theta \in \Theta \}. \end{aligned}$$

We assume that the model is identifiable, that is, \(\theta _1\ne \theta _2\) implies \(P_{\theta _1}\ne P_{\theta _2}\). The parameter space \(\Theta \) is assumed to be an orientable d-dimensional \(C^\infty \)-manifold. Let \(P_\theta \) be absolutely continuous with respect to a base measure \(v(\textrm{d}x)\) and its density function \(p(x|\theta )\) be positive everywhere and differentiable with respect to \(\theta \). The Fisher metric on \(\Theta \) is defined by

$$\begin{aligned} h_{ij}(\theta ) = \int \{\partial _i\log p(x|\theta )\}\{\partial _j\log p(x|\theta )\}P_\theta (\textrm{d}x), \end{aligned}$$

where \(\partial _i\) is the partial derivative with respect to local coordinates of \(\theta \). We assume that the Fisher metric is of \(C^\infty \) class and positive definite everywhere. The Fisher metric does not depend on the choice of the base measure \(v(\textrm{d}x)\).
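
As an illustration, the Fisher metric of the Cauchy location-scale family used in Example 2 below can be computed directly from this definition. The following sympy sketch (an aside, with our own variable names) evaluates the defining integrals; the expected result is \(h=\textrm{diag}(1/(2\sigma ^2),1/(2\sigma ^2))\), a multiple of the hyperbolic metric on the upper half-plane:

```python
import sympy as sp

# Fisher metric of the Cauchy location-scale family from the definition.
x, mu = sp.symbols('x mu', real=True)
sigma = sp.symbols('sigma', positive=True)
p = sigma / (sp.pi * ((x - mu)**2 + sigma**2))          # Cauchy density
score = [sp.diff(sp.log(p), mu), sp.diff(sp.log(p), sigma)]

h = sp.Matrix(2, 2, lambda i, j: sp.integrate(
    sp.together(score[i] * score[j] * p), (x, -sp.oo, sp.oo)))
print(h)   # expected: Matrix([[1/(2*sigma**2), 0], [0, 1/(2*sigma**2)]])
```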

A Borel measure on \(\Theta \) is called a Bayesian prior distribution or just a prior. The volume element

$$\begin{aligned} J(\textrm{d}\theta ) = \sqrt{|h|}\textrm{d}\theta ^1\wedge \cdots \wedge \textrm{d}\theta ^d \end{aligned}$$

induced from the Fisher metric is called the Jeffreys prior. The Jeffreys prior does not depend on the choice of the local coordinate system. We focus on priors \(\pi (\theta )J(\textrm{d}\theta )\) that are absolutely continuous with respect to the Jeffreys prior. We call \(\pi (\theta )\) the prior density. Since \(J(\textrm{d}\theta )\) does not depend on the local coordinate system, \(\pi (\theta )\) is a scalar function. The functions \(\pi \) are assumed to be positive-valued and of \(C^2\) class. We consider not only proper priors but also improper priors.

A statistical prediction problem is to estimate the distribution of a future observation \(y\in \mathcal {X}\) based on an independent sample \(x^n=(x_1,\ldots ,x_n)\in \mathcal {X}^n\) from \(P_\theta \). The Bayesian predictive density

$$\begin{aligned} p_\pi (y|x^n) = \int p(y|\theta )\pi (\theta |x^n)J(\textrm{d}\theta ), \end{aligned}$$
(2)

based on the posterior density

$$\begin{aligned} \pi (\theta |x^n) = \frac{\prod _{i=1}^np(x_i|\theta )\pi (\theta )}{\int \prod _{i=1}^np(x_i|\theta )\pi (\theta )J(\textrm{d}\theta )} \end{aligned}$$

is of interest.
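
As a minimal numerical sketch of (2) (not taken from the paper), consider the normal location model \(N(\theta ,1)\), for which the Fisher information is 1 and the Jeffreys measure \(J(\textrm{d}\theta )\) is the Lebesgue measure. Quadrature over the posterior should reproduce the known closed form \(N(y;\bar{x},1+1/n)\):

```python
import numpy as np
from scipy import integrate, stats

# Quadrature illustration of the Bayesian predictive density (2) for the
# normal location model N(theta, 1), where J(dtheta) is Lebesgue measure.
# (Toy setup; the names are ours.)
rng = np.random.default_rng(0)
n, theta_true = 10, 0.7
x = rng.normal(theta_true, 1.0, size=n)

def predictive(y, x, prior=lambda th: 1.0):
    # p_pi(y|x^n) = int p(y|th) pi(th|x^n) J(dth), with the posterior
    # normalized by the marginal likelihood z.
    lik = lambda th: np.prod(stats.norm.pdf(x, th, 1.0)) * prior(th)
    z, _ = integrate.quad(lik, -np.inf, np.inf)
    num, _ = integrate.quad(lambda th: stats.norm.pdf(y, th, 1.0) * lik(th),
                            -np.inf, np.inf)
    return num / z

y = 1.3
print(predictive(y, x))                               # quadrature value
print(stats.norm.pdf(y, x.mean(), np.sqrt(1 + 1/n)))  # closed form N(y; xbar, 1+1/n)
```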

The Bayesian prediction problem is to find a prior density function that has a smaller prediction risk. We adopt the following risk function.

Definition 1

(Asymptotic risk; Eq. (13) of [7]) The asymptotic risk function of the prior density \(\pi \in C^2(\Theta )\) is defined by

$$\begin{aligned} r(\pi ) = r(\pi ,\theta ) = \frac{1}{\sqrt{\pi (\theta )}}\Delta \sqrt{\pi (\theta )}, \end{aligned}$$

where \(\Delta \) denotes the Laplacian on \(\Theta \) with respect to the Fisher metric. A prior density \(\pi _1\) is said to dominate \(\pi _2\) asymptotically if \(r(\pi _1,\theta )\le r(\pi _2,\theta )\) for all \(\theta \) and \(r(\pi _1,\theta )<r(\pi _2,\theta )\) for some \(\theta \).

The asymptotic risk is the leading term of the asymptotic expansion of the Kullback–Leibler risk of the Bayesian predictive density (2) as \(n\rightarrow \infty \). See Eq. (4) of [6] and Eq. (13) of [7] for details. It is straightforward to see

$$\begin{aligned} r(\pi )&= \frac{1}{2\pi }\Delta \pi - \frac{1}{4}h^{ij}(\partial _i\log \pi )(\partial _j\log \pi ). \end{aligned}$$
(3)

This is proved as

$$\begin{aligned} \frac{1}{\sqrt{\pi }}\Delta \sqrt{\pi }&= \frac{1}{\sqrt{\pi }\sqrt{|h|}}\partial _i(h^{ij}\sqrt{|h|}\partial _j\sqrt{\pi }) \\&= \frac{1}{\sqrt{\pi }\sqrt{|h|}}\partial _i\left( h^{ij}\sqrt{|h|}\frac{\partial _j\pi }{2\sqrt{\pi }}\right) \\&= \frac{1}{\sqrt{\pi }\sqrt{|h|}}\left( \frac{\partial _i(h^{ij}\sqrt{|h|}\partial _j\pi )}{2\sqrt{\pi }} - (h^{ij}\sqrt{|h|}\partial _j\pi )\frac{\partial _i\pi }{4\pi ^{3/2}} \right) \\&= \frac{1}{2\pi }\Delta \pi - \frac{1}{4}h^{ij}(\partial _i\log \pi )(\partial _j\log \pi ). \end{aligned}$$
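
The identity can also be verified symbolically in one dimension. The following sympy sketch (an informal check, with our own notation) confirms that the two expressions for \(r(\pi )\) agree for arbitrary positive \(h(\theta )\) and \(\pi (\theta )\):

```python
import sympy as sp

# One-dimensional check of identity (3): both expressions for r(pi) agree
# for arbitrary positive h(theta) and pi(theta).
t = sp.symbols('theta', real=True)
h = sp.Function('h', positive=True)(t)     # metric component
pi = sp.Function('pi', positive=True)(t)   # prior density

def laplacian(f):
    # (1/sqrt(h)) d/dtheta ( sqrt(h) * h^(-1) * df/dtheta )
    return sp.diff(sp.sqrt(h) * sp.diff(f, t) / h, t) / sp.sqrt(h)

lhs = laplacian(sp.sqrt(pi)) / sp.sqrt(pi)
rhs = laplacian(pi) / (2*pi) - sp.diff(sp.log(pi), t)**2 / (4*h)
print(sp.simplify(lhs - rhs))   # 0
```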

Our problem is to find a prior density \(\pi \) that has smaller asymptotic risk. The asymptotic risk of the Jeffreys prior density is 0 by definition. Non-constant superharmonic prior densities asymptotically dominate the Jeffreys prior density: by (3), both terms on the right-hand side are nonpositive when \(\pi \) is superharmonic, and the second term is strictly negative wherever the gradient of \(\pi \) does not vanish.

3.2 Group invariant models

We consider the Bayesian prediction problem over group invariant models. Refer to [1, 2, 13] for comprehensive textbooks on invariant models.

For simplicity, we suppose that the sample space \(\mathcal {X}\) is also a \(C^\infty \) manifold. Let a Lie group G act on \(\mathcal {X}\) smoothly from the left. For a probability measure P on \(\mathcal {X}\) and \(g\in G\), the push-forward measure \(g_*P\) is defined by \(g_*P(B)=P(g^{-1}B)\) for Borel sets B. The group G acts on the set of all probability measures by the push-forward operation.

Definition 2

(Group invariant model; Definition 3.1 of [1]) A statistical model \(\mathcal {P}\) is said to be G-invariant if for each \(P\in \mathcal {P}\), \(g_*P\in \mathcal {P}\) for all \(g\in G\).

If a G-invariant statistical model is parameterized as \(\mathcal {P}=\{P_\theta \mid \theta \in \Theta \}\), the left action of G on \(\Theta \) is well defined by \(P_{g\theta }=g_*P_\theta \) under identifiability. We assume that G acts transitively on \(\Theta \).

Let \(v(\textrm{d}x)\) be the base measure of \(\mathcal {P}\) as in the preceding subsection. We say that tensors on \(\Theta \) are G-invariant if they are preserved under the group action.

Lemma 1

Let \(\mathcal {P}\) be a G-invariant model. Then, the Fisher metric h is G-invariant. In particular, the Jeffreys prior is a left G-invariant measure on \(\Theta \).

See “Appendix” for the proof. Lemma 1 is used to prove Lemma 2.

We say that G acts freely on \(\Theta \) if \(g\theta =\theta \) for some \(\theta \in \Theta \) implies \(g=e\). If the action is free, the parameter space \(\Theta =\{g\theta _0\mid g\in G\}\) is identified with G, where \(\theta _0\in \Theta \) is a fixed element. Under this identification, the left invariant measure \(\nu _\textrm{L}\) on G is equal to the Jeffreys prior, and the right invariant measure \(\nu _\textrm{R}\) is a prior on \(\Theta \), which we call the right invariant prior. It is known that the right invariant prior provides the best invariant predictive distribution [5, 10], meaning that it attains the minimum of the Kullback–Leibler risk in the class of invariant predictive distributions. In particular, the right invariant prior dominates the Jeffreys prior if G is not unimodular. This fact is reflected in the following lemma, which we prove in “Appendix” without relying on it.

Lemma 2

Suppose that G is not unimodular and acts freely on \(\Theta \). Then, the asymptotic risk of the right invariant prior density \(\pi _\mathrm{R/L}(\theta )\) is a negative constant.

Even if the action of G is not free, the results above apply to any Lie subgroup \(G_1\) of G that acts freely and transitively on \(\Theta \). In that case, we can identify \(\Theta \) with \(G_1\) and construct harmonic prior densities from the right invariant measures on \(G_1\). Furthermore, since every conjugate subgroup \(gG_1g^{-1}\) (\(g\in G\)) acts freely as well, various harmonic prior densities are obtained. These prior densities have the same asymptotic risk because \(G_1\) and \(gG_1g^{-1}\) are isomorphic. We can reduce the asymptotic risk by aggregating the prior densities as follows.

Lemma 3

Let \(\pi _1\) and \(\pi _2\) be smooth positive functions on \(\Theta \). Define the generalized mean \(\bar{\pi }_\beta \) by

$$\begin{aligned} \bar{\pi }_\beta = \left( \frac{\pi _1^\beta +\pi _2^\beta }{2}\right) ^{1/\beta } \end{aligned}$$

for \(\beta \ne 0\) and \(\bar{\pi }_0 = (\pi _1\pi _2)^{1/2}\) for \(\beta =0\). If \(\beta <1/2\), then

$$\begin{aligned} r(\bar{\pi }_\beta )\le \frac{\pi _1^\beta r(\pi _1) + \pi _2^\beta r(\pi _2)}{\pi _1^\beta + \pi _2^\beta }. \end{aligned}$$

The equality holds for all \(\theta \in \Theta \) if and only if \(\pi _1/\pi _2\) is constant.

See “Appendix” for the proof. The case \(\beta =0\) is proved in [12].
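
A toy example illustrates the role of the condition \(\beta <1/2\). Take \(\Theta =\mathbb {R}\) with the Euclidean metric \(h=1\), \(\pi _1=e^{\theta }\), and \(\pi _2=e^{-\theta }\), so that \(r(\pi _1)=r(\pi _2)=1/4\) and the right-hand side of Lemma 3 equals \(1/4\). The following sketch (a numerical sanity check, not part of the proof) shows that the generalized mean beats this bound for \(\beta =0.3\) but violates it for \(\beta =0.8\):

```python
import numpy as np
import sympy as sp

# Toy check of Lemma 3 on Theta = R with h = 1, pi_1 = e^x, pi_2 = e^{-x}.
# Then r(pi_1) = r(pi_2) = 1/4, so the right-hand side of the lemma is 1/4.
x, beta = sp.symbols('x beta', real=True)
pbar = ((sp.exp(beta*x) + sp.exp(-beta*x)) / 2)**(1/beta)

def risk(pi):
    # r(pi) = (1/sqrt(pi)) * Delta sqrt(pi) with the Euclidean metric on R
    return sp.diff(sp.sqrt(pi), x, 2) / sp.sqrt(pi)

print(sp.simplify(risk(sp.exp(x))))          # 1/4
gap = sp.lambdify((x, beta), risk(pbar) - sp.Rational(1, 4), 'numpy')
xs = np.linspace(-3.0, 3.0, 201)
print(np.all(gap(xs, 0.3) <= 0))             # True: beta = 0.3 < 1/2 dominates
print(np.any(gap(xs, 0.8) > 0))              # True: the bound fails for beta = 0.8
```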

We provide two applications of the lemma. In each application, we first find a closed subgroup \(G_1\) that acts freely and transitively on \(\Theta \). Then, take \(g\in G\) and put \(G_2=gG_1g^{-1}\). Under the identifications \(G_1\simeq \Theta \), \(g_1\mapsto g_1\theta _0\), and \(G_2\simeq \Theta \), \(g_2\mapsto g_2g\theta _0\) as G-spaces, the following equality holds for any \(\theta \in \Theta \):

$$\begin{aligned} \pi _2(\theta )=\pi _1(g^{-1}\theta ), \end{aligned}$$

where \(\pi _1\) and \(\pi _2\) are the densities of the right invariant priors of \(G_1\) and \(G_2\), respectively. Indeed, the left and right invariant measures on \(G_2\) are the push-forward of those on \(G_1\) by \(g_1\mapsto gg_1g^{-1}\). Then the density \(\pi _1(g_1\theta _0)\) is equal to \(\pi _2((gg_1g^{-1})g\theta _0)=\pi _2(gg_1\theta _0)\), which proves \(\pi _2(\theta )=\pi _1(g^{-1}\theta )\) for \(\theta =gg_1\theta _0\).

Example 2

(Cauchy location-scale family [11]) Consider the Cauchy density function

$$\begin{aligned} p(x|\mu ,\sigma ) = \frac{1}{\pi \sigma (1+(x-\mu )^2/\sigma ^2)},\quad x\in \mathbb {R}, \end{aligned}$$

with respect to the Lebesgue measure, where \(\mu \) and \(\sigma \) are called the location and scale parameters, respectively. The parameter space is \(\Theta =\{(\mu ,\sigma )\mid \mu \in \mathbb {R},\sigma >0\}\). The density function is written in terms of complex numbers as

$$\begin{aligned} p(x|\mu ,\sigma ) = \frac{\sigma }{\pi |x-(\mu +i\sigma )|^2},\quad i=\sqrt{-1}. \end{aligned}$$

The group \(G=\textrm{GL}^+(2,\mathbb {R})\) of \(2\times 2\) real matrices with positive determinant acts on this model through the linear fractional transformation

$$\begin{aligned} \begin{pmatrix} a & b\\ c & d \end{pmatrix} \cdot x = \frac{ax+b}{cx+d}, \quad \begin{pmatrix} a & b\\ c & d \end{pmatrix}\in G, \quad x\in \mathcal {X}=\mathbb {R}. \end{aligned}$$

The action of G on the parameter space is

$$\begin{aligned} \begin{pmatrix} a & b\\ c & d \end{pmatrix}\cdot (\mu +i\sigma )&= \frac{a(\mu +i\sigma )+b}{c(\mu +i\sigma )+d} \\&= \frac{ac\sigma ^2+(a\mu +b)(c\mu +d)}{(c\sigma )^2+(c\mu +d)^2}+i\frac{(ad-bc)\sigma }{(c\sigma )^2+(c\mu +d)^2} \end{aligned}$$

for \((\mu ,\sigma )\in \Theta \). See [11] for details. Although the action of G on \(\Theta \) is not free, a subgroup

$$\begin{aligned} G_1 = \left\{ \begin{pmatrix} \sigma & \mu \\ 0 & 1 \end{pmatrix} \mid \mu \in \mathbb {R},\ \sigma >0 \right\} \end{aligned}$$

acts freely and transitively, so we can identify \(G_1\) with \(\Theta \). As in Example 1, the left and right invariant measures of \(G_1\) are \(\sigma ^{-2}\textrm{d}\mu \wedge \textrm{d}\sigma \) and \(\sigma ^{-1}\textrm{d}\mu \wedge \textrm{d}\sigma \), respectively. The density of the right invariant prior on \(G_1\) is

$$\begin{aligned} \pi _1(\mu ,\sigma ) = \sigma . \end{aligned}$$

From Theorem 1 and Lemma 2, the asymptotic risk of \(\pi _1\) is a negative constant. Now consider the conjugate group

$$\begin{aligned} G_2 = gG_1g^{-1},\quad g=\begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}\in G, \end{aligned}$$

which also acts freely on \(\Theta \). The density of the right invariant prior on \(G_2\) is

$$\begin{aligned} \pi _2(\mu ,\sigma ) = \pi _1(g^{-1}(\mu ,\sigma )) = \frac{\sigma }{\sigma ^2+\mu ^2}. \end{aligned}$$

This prior density is discussed in [7].

Finally, by taking the geometric mean of \(\pi _1\) and \(\pi _2\), we obtain a prior density

$$\begin{aligned} \sqrt{\pi _1\pi _2} = \frac{\sigma }{\sqrt{\sigma ^2+\mu ^2}} = \frac{1}{\sqrt{1+(\mu /\sigma )^2}}, \end{aligned}$$

which shrinks the signal-to-noise ratio \(\mu /\sigma \) towards the origin. Lemma 3 implies that the asymptotic risk of \((\pi _1\pi _2)^{1/2}\) is smaller than those of the right invariant priors \(\pi _1\) and \(\pi _2\).
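
The risk computations in this example can be reproduced symbolically. Assuming the known Fisher metric of the Cauchy family, \(h=\textrm{diag}(1/(2\sigma ^2),1/(2\sigma ^2))\) (see the sketch in Sect. 3.1), the Laplacian reduces to \(\Delta f=2\sigma ^2(\partial _\mu ^2f+\partial _\sigma ^2f)\), and the following sympy sketch confirms Lemma 2 and the strict improvement of the geometric mean:

```python
import sympy as sp

# Risks in Example 2 under the Cauchy Fisher metric
# h = diag(1/(2*sigma^2), 1/(2*sigma^2)), for which
# Delta f = 2*sigma^2*(d^2 f/dmu^2 + d^2 f/dsigma^2).
mu = sp.symbols('mu', real=True)
sigma = sp.symbols('sigma', positive=True)

def risk(pi):
    s = sp.sqrt(pi)
    lap = 2 * sigma**2 * (sp.diff(s, mu, 2) + sp.diff(s, sigma, 2))
    return sp.simplify(lap / s)

pi1 = sigma
pi2 = sigma / (sigma**2 + mu**2)
print(risk(pi1), risk(pi2))      # expected: -1/2 -1/2 (Lemma 2)
print(risk(sp.sqrt(pi1 * pi2)))  # expected: -1/2 - sigma**2/(2*(mu**2 + sigma**2)) < -1/2
```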

For location-scale families other than the Cauchy family, the general linear group does not act because the family is not closed under the reciprocal 1/X of the random variable X. However, the dominance relationship on the asymptotic risk remains true because the asymptotic risk depends only on the Riemannian structure. See also [7] for this point.

Example 3

(Two-dimensional Wishart model [8, 12]) Suppose that a random variable X has the two-dimensional Wishart distribution \(W_2(n,\Sigma )\) with n degrees of freedom and the covariance parameter

$$\begin{aligned} \Sigma =\begin{pmatrix} \sigma _{11} & \sigma _{12}\\ \sigma _{21} & \sigma _{22} \end{pmatrix}. \end{aligned}$$

The model is G-invariant with respect to the general linear group \(G=\textrm{GL}(2,\mathbb {R})\), where the group action is defined by \((g,X)\mapsto gXg^\top \) and \((g,\Sigma )\mapsto g\Sigma g^\top \). The sample space \(\mathcal {X}\) and the parameter space \(\Theta \) are the set of positive definite symmetric matrices. The subgroup

$$\begin{aligned} G_1 = \left\{ \begin{pmatrix} a & 0\\ b & c \end{pmatrix} \mid a,c>0,\ b\in \mathbb {R} \right\} \end{aligned}$$

of G has a one-to-one correspondence with \(\Theta \) through the Cholesky decomposition \(\Sigma =gg^\top \) with \(g\in G_1\). The left and right invariant measures of \(G_1\) are

$$\begin{aligned} \nu _\textrm{L} = \frac{1}{ac^2}\textrm{d}a\wedge \textrm{d}b\wedge \textrm{d}c = \frac{1}{4|\Sigma |^{3/2}}\textrm{d}\sigma _{11}\wedge \textrm{d}\sigma _{12}\wedge \textrm{d}\sigma _{22} \end{aligned}$$

and

$$\begin{aligned} \nu _\textrm{R} = \frac{1}{a^2c}\textrm{d}a\wedge \textrm{d}b\wedge \textrm{d}c = \frac{1}{4\sigma _{11}|\Sigma |}\textrm{d}\sigma _{11}\wedge \textrm{d}\sigma _{12}\wedge \textrm{d}\sigma _{22}, \end{aligned}$$

respectively. The density of the right invariant prior on \(G_1\) is

$$\begin{aligned} \pi _1(\Sigma ) = \frac{|\Sigma |^{1/2}}{\sigma _{11}} \end{aligned}$$

in the \(\Sigma \)-coordinate. A conjugate group

$$\begin{aligned} G_2 = gG_1 g^\top ,\quad g=\begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, \end{aligned}$$

also acts freely on \(\Theta \). The density of the right invariant prior on \(G_2\) is

$$\begin{aligned} \pi _2(\Sigma ) = \pi _1(g\Sigma g^\top ) = \frac{|\Sigma |^{1/2}}{\sigma _{22}}. \end{aligned}$$
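
The change of variables between the Cholesky coordinates \((a,b,c)\) and the \(\Sigma \)-coordinates above can be verified symbolically. The following sympy sketch (an informal check) confirms that the Jacobian determinant of \((\sigma _{11},\sigma _{12},\sigma _{22})\) with respect to \((a,b,c)\) is \(4a^2c\), which converts the displayed expressions for \(\nu _\textrm{L}\) and \(\nu _\textrm{R}\):

```python
import sympy as sp

# Check of the invariant-measure computations in the Cholesky coordinates.
a, b, c = sp.symbols('a b c', positive=True)
g = sp.Matrix([[a, 0], [b, c]])
S = g * g.T                                   # Sigma = g g^T
s11, s12, s22 = S[0, 0], S[0, 1], S[1, 1]

J = sp.Matrix([[sp.diff(s, v) for v in (a, b, c)] for s in (s11, s12, s22)])
jac = sp.factor(J.det())
print(jac)                                    # 4*a**2*c
detS = sp.factor(S.det())                     # a**2*c**2
# nu_L: 1/(a*c^2) da db dc  =  dSigma / (4*|Sigma|^(3/2))
print(sp.simplify(1/(a*c**2) - jac/(4*detS**sp.Rational(3, 2))))   # 0
# nu_R: 1/(a^2*c) da db dc  =  dSigma / (4*sigma11*|Sigma|)
print(sp.simplify(1/(a**2*c) - jac/(4*s11*detS)))                  # 0
```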

The harmonic mean of \(\pi _1\) and \(\pi _2\) is

$$\begin{aligned} \left( \frac{\pi _1^{-1}+\pi _2^{-1}}{2}\right) ^{-1} = \frac{2|\Sigma |^{1/2}}{\textrm{tr}(\Sigma )}, \end{aligned}$$

which is orthogonally invariant and shrinks the ratio of the two eigenvalues towards one. Lemma 3 implies that the prior asymptotically dominates the right invariant priors \(\pi _1\) and \(\pi _2\). The dominance relationship holds even in finite-sample cases as shown by [8].

Similarly, the geometric mean of \(\pi _1\) and \(\pi _2\) is

$$\begin{aligned} \sqrt{\pi _1\pi _2} = \frac{|\Sigma |^{1/2}}{(\sigma _{11}\sigma _{22})^{1/2}}, \end{aligned}$$

which is scale invariant and shrinks the correlation coefficient towards the origin; indeed, \(\sqrt{\pi _1\pi _2}=\sqrt{1-\rho ^2}\) with \(\rho =\sigma _{12}/\sqrt{\sigma _{11}\sigma _{22}}\). Again, Lemma 3 tells us that the prior asymptotically dominates the right invariant priors \(\pi _1\) and \(\pi _2\). This relation holds even in finite-sample cases, as shown by [12].

The two examples show how Theorem 1 is useful in Bayesian inference.

Finally, we mention the predictive metric defined by [9], which appears in the asymptotic risk when the observed and predicted variables have different statistical models. The predictive metric is G-invariant whenever the statistical models of the observed and predicted variables are both G-invariant. The present method of obtaining harmonic prior densities applies to this case as well.