Abstract
A broad class of local divergences between two probability measures, or between the respective probability distributions, is proposed in this paper. The introduced local divergences are based on the classic Csiszár \(\phi \)-divergence and provide a pseudo-distance between two distributions on a specific area of their common domain. The range of values of the introduced class of local divergences is derived, and explicit expressions are obtained when the underlying distributions are members of the exponential family of distributions or are described by multivariate normal models. An application is presented to illustrate the behavior of local divergences.
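To fix ideas before the formal development, the following Python sketch computes a localized, weighted version of a Csiszár-type divergence by numerical integration. All concrete choices here are illustrative assumptions, not the paper's exact construction: the two normal densities, the Kullback–Leibler generator \(\phi (u)=u\log u-u+1\), the indicator weight w, and the unnormalized weighted form \(\int w(x)q(x)\phi (p(x)/q(x))dx\).

```python
# A minimal numerical sketch (illustrative assumptions, not the paper's exact
# definition): a Csiszar phi-divergence localized by a weight w supported on a
# region of interest, approximated on a grid by the trapezoidal rule.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

x = np.linspace(-8.0, 8.0, 4001)             # integration grid
p = norm.pdf(x, loc=0.0, scale=1.0)          # density of P
q = norm.pdf(x, loc=1.0, scale=1.5)          # density of Q

def phi(u):
    # Kullback-Leibler type generator with phi(1) = 0 and phi'(1) = 0
    return u * np.log(u) - u + 1.0

def weighted_divergence(p, q, w, x):
    # int w(x) q(x) phi(p(x)/q(x)) dx, a weighted phi-divergence
    return trapezoid(w * q * phi(p / q), x)

w_local = ((x > -1.0) & (x < 1.0)).astype(float)  # localize on (-1, 1)
w_global = np.ones_like(x)                        # recovers the global case

print(weighted_divergence(p, q, w_local, x))   # discrepancy on (-1, 1) only
print(weighted_divergence(p, q, w_global, x))  # classic divergence of P from Q
```

With the global weight the quantity reduces to an ordinary \(\phi \)-divergence; shrinking the support of w confines the comparison to the region of interest, which is the behavior the local divergences of this paper formalize.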
References
Ali SM, Silvey SD (1966) A general class of coefficients of divergence of one distribution from another. J R Stat Soc Ser B 28:131–142
Basu A, Shioya H, Park C (2011) Statistical inference, the minimum distance approach. Chapman & Hall/CRC, Boca Raton
Cressie N, Read TRC (1984) Multinomial goodness-of-fit tests. J R Stat Soc Ser B 46:440–464
Csiszár I (1963) Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud Akad Mat Kutató Int Közl 8:85–108
Csiszár I (1967) Information-type measures of difference of probability distributions and indirect observations. Stud Sci Math Hung 2:299–318
Csiszár I, Körner J (1981) Information theory. Coding theorems for discrete memoryless systems. Akadémiai Kiadó, Budapest
Ebrahimi N, Soofi ES, Soyer R (2010) Information measures in perspective. Int Stat Rev 78:383–412
Johnson RA, Wichern DW (1992) Applied multivariate statistical analysis, 3rd edn. Prentice Hall International Editions, Englewood Cliffs
Kagan AM (1963) On the theory of Fisher’s information quantity. Dokl Akad Nauk SSSR 151:277–278
Kullback S (1959) Information theory and statistics. Wiley, New York
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
Landaburu E, Morales D, Pardo L (2005) Divergence-based estimation and testing with misclassified data. Stat Pap 46:397–409
Landaburu E, Pardo L (2000) Goodness of fit tests with weights in the classes based on \((h,\phi )\)-divergences. Kybernetika 36:589–602
Landaburu E, Pardo L (2003) Minimum \((h,\phi )\) -divergences estimators with weights. Appl Math Comput 140:15–28
Liese F, Vajda I (1987) Convex statistical distances. Teubner Texts in Mathematics, Leipzig
Liese F, Vajda I (2006) On divergences and informations in statistics and information theory. IEEE Trans Inf Theory 52:4394–4412
McElroy T, Holan S (2009) A local spectral approach for assessing time series model misspecification. J Multivar Anal 100:604–621
Morales D, Pardo L, Vajda I (2000) Rényi statistics in directed families of exponential experiments. Statistics 34:151–174
Morales D, Pardo L, Pardo MC, Vajda I (2004) Rényi statistics for testing composite hypotheses in general exponential models. Statistics 38:133–147
Nielsen F, Nock R (2011) On Rényi and Tsallis entropies and divergences for exponential families. arXiv:1105.3259v1 [cs.IT] 17 May 2011
Papaioannou T (1986) Measures of information. In: Kotz S, Johnson NL (eds) Encyclopedia of statistical sciences, vol 5. Wiley, New York, pp 391–397
Papaioannou T (2001) On distances and measures of information: a case of diversity. In: Charalambides CA, Koutras MV, Balakrishnan N (eds) Probability and statistical models with applications. Chapman & Hall/CRC, Boca Raton, pp 503–515
Pardo L (2006) Statistical inference based on divergence measures. Chapman & Hall/CRC, Boca Raton
Rényi A (1960) On measures of entropy and information. In: Proceedings of the 4th Berkeley symposium on mathematical statistics and probability, vol 1, Berkeley, pp 547–561
Soofi E (2000) Principal information theoretic approaches. J Am Stat Assoc 95:1349–1353
Soofi E, Retzer JJ (2002) Information indices: unification and applications. Information and entropy econometrics. J Econom 107:17–40
Stummer W, Vajda I (2010) On divergences of finite measures and their applicability in statistics and information theory. Statistics 44:169–187
Ullah A (1996) Entropy, divergence and distance measures with econometric applications. J Stat Plan Inference 49:137–162
Vajda I (1972) On the f-divergence and singularity of probability measures. Period Math Hung 2(1–4):223–234
Vajda I (1973) \(\chi ^{\alpha }\)-divergence and generalized Fisher's information. In: Transactions of the sixth Prague conference on information theory, statistical decision functions, random processes, pp 873–886
Vajda I (1989) Theory of statistical inference and information. Kluwer Academic Publishers, Dordrecht
Vajda I (1995) Information theoretic methods in statistics. Research report no. 1834, Academy of Sciences of the Czech Republic, Institute of Information Theory and Automation, Prague
Zografos K (2008) On Mardia’s and Song’s measures of kurtosis in elliptical distributions. J Multivar Anal 99:858–879
Zografos K, Nadarajah S (2005) Expressions for Rényi and Shannon entropies for multivariate distributions. Stat Probab Lett 71:71–84
Acknowledgments
The authors are grateful to a reviewer for valuable comments and suggestions that have improved the presentation of the paper.
Appendices
Appendix 1
This appendix provides a detailed proof of Theorem 1.
Proof of Theorem 1
(a) It is clear, from (9) and (10), that \(0\le \widetilde{D}_{\phi }^{R}(P,Q)\). We proceed with the upper bound of \(\widetilde{D}_{\phi }^{R}(P,Q)\). Given that \(\overline{ \phi }(1)=0\) and motivated by a similar proof in Stummer and Vajda (2010, p. 174), we can write
Define the function
Then, for \(u=\frac{q(x)}{p(x)}=\frac{q}{p}\), \(x\in {\mathcal {X}}\),
Hence,
and taking into account that \(\phi ^{*}(u)=u\phi \left( 1/u\right) \), \( u>0\),
Therefore, from Eqs. (24) and (27) we conclude
On the other hand, based on Stummer and Vajda (2010, p. 174), for a convex function \(\phi \in \Phi ^{*}\) which is strictly convex at 1, with \( \phi _{+}^{\prime }(1)=0\), it is true that
Applying inequality (29) to \(\phi =\overline{\phi }\), it is clear that on the subset \(\left\{ x\in {\mathcal {X}}:p(x)<q(x)\right\} \) of \( {\mathcal {X}}\), it holds that \(0\le \overline{\phi }\left( \frac{p}{q}\right) \le \overline{\phi }(0)\), and therefore
Moreover, based on the non-negativity of r and q on any subset of \( {\mathcal {X}}\), it is true that
So, the last two inequalities lead to
In a manner quite similar to the above, applying inequality (29) to \( \phi =\overline{\phi ^{*}}\), it is clear that on the subset \(\left\{ x\in {\mathcal {X}}:q(x)<p(x)\right\} \) of \({\mathcal {X}}\) it holds that \(0\le \overline{\phi ^{*}}\left( \frac{q}{p}\right) \le \overline{\phi ^{*}}(0)\), and therefore
Hence,
A combination of (28), (30) and (31) gives
Based on (7) and (25), \(\overline{\phi }(0)=\phi (0)+\phi _{+}^{\prime }(1)\) and \(\overline{\phi ^{*}}(0)=\phi ^{*}(0)-\phi _{+}^{\prime }(1)\). These identities along with the previous inequality complete the proof of part (a) of the theorem.
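As a concrete instance of the conjugacy \(\phi ^{*}(u)=u\phi \left( 1/u\right) \) exploited in (27) above: for the Kullback–Leibler generator,
\[
\phi (u)=u\log u\quad \Longrightarrow \quad \phi ^{*}(u)=u\,\phi \left( \tfrac{1}{u}\right) =u\cdot \tfrac{1}{u}\log \tfrac{1}{u}=-\log u,\qquad u>0,
\]
so that \(\phi \) and \(\phi ^{*}\) generate the direct and the reverse Kullback–Leibler divergences, respectively.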
(b) To proceed with the proof of part (b) of the theorem, suppose first that \(P=Q\). Then, it is clear from (10) that \(\widetilde{D} _{\phi }^{R}(P,P)=D_{\phi }^{R}(P,P)=\phi (1)=0\), because \(\phi \in \Phi ^{*}\).
Conversely, let \(\widetilde{D}_{\phi }^{R}(P,Q)=0\). Taking into account (9) and the fact that \(\phi (1)=0\),
a.e. with respect to the measure \(\mu \), since the Radon–Nikodym derivative r is positive on \({\mathcal {X}}\). On the other hand, based on Vajda (1989, p. 58),
because \(\phi \) is strictly convex at 1. Therefore, taking into account the above inequality, the only way for equality (32) to hold is that \(\frac{p(x)}{q(x)}=1\) a.e., that is, \(P=Q\), which completes the proof of part (b) of the theorem.
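To make the role of strict convexity transparent, note that the adjusted function \(\overline{\phi }\) of (25), being convex with \(\overline{\phi }(1)=0\) and \(\overline{\phi }_{+}^{\prime }(1)=0\), satisfies
\[
\overline{\phi }(u)\ge 0,\qquad u>0,
\]
with strict inequality for \(u\ne 1\) whenever \(\phi \) is strictly convex at 1. Hence a vanishing integral against the positive weight r forces the integrand, and with it \(\frac{p}{q}-1\), to vanish a.e., which is, in essence, the mechanism of the argument above.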
(c) Suppose that \(P\perp Q\). Then, following Vajda (1972, p. 227), \(u=\frac{dQ}{d(P+Q)}=0\;[P]\) and \(u=\frac{dQ}{d(P+Q)}=1\) [Q]. Taking into account that \(u=\frac{q}{p+q}\), we conclude that if \(P\perp Q\), then \(q(x)=0\), a.e. \(x\in {\mathcal {X}}\,[P]\) and \(p(x)=0\), a.e. \(x\in {\mathcal {X}}\;[Q]\). Equation (28) is then refined as follows,
and subject to the condition \(P\perp Q,\)
On the other hand, because \(p(x)=0\), a.e. \(x\in {\mathcal {X}}\) [Q], it is clear that
since \(Q(\{q=p\})=0\), by taking into account that \(P\perp Q\). This last equality leads to \(\int _{\{p\ge q\}}r(x)dQ(x)=0,\) and therefore
Similarly, it can be shown that \(P(\{q\ge p\})=0\) and hence
Equations (33), (34) and (35) give that if \(P\perp Q\) then \(\widetilde{D}_{\phi }^{R}(P,Q)=\overline{\phi }\left( 0\right) \xi _{0}+\overline{\phi ^{*}}\left( 0\right) \xi _{1}\) which completes the proof of this part of the theorem in view of the equations \(\overline{\phi } (0)=\phi (0)+\phi _{+}^{\prime }(1)\) and \(\overline{\phi ^{*}}(0)=\phi ^{*}(0)-\phi _{+}^{\prime }(1)\).
It remains to prove that if \(\phi (0)+\phi ^{*}(0)<\infty \) and \( \widetilde{D}_{\phi }^{R}(P,Q)=\overline{\phi }\left( 0\right) \xi _{0}+ \overline{\phi ^{*}}\left( 0\right) \xi _{1}\), then \(P\perp Q\). Relationships (24) and (29) immediately lead to
This last inequality, together with the assumptions \(\widetilde{D}_{\phi }^{R}(P,Q)=\overline{\phi }\left( 0\right) \xi _{0}+\overline{\phi ^{*}}\left( 0\right) \xi _{1}\) and \(\phi (0)+\phi ^{*}(0)<\infty \), leads to
and therefore
This last equation leads to
or
since the Radon–Nikodym derivative r is positive on \({\mathcal {X}}\). This last conclusion proves that \(P\perp Q\), and the proof of the theorem is completed. \(\square \)
Appendix 2
This appendix provides a detailed proof of Proposition 3.
Proof of Proposition 3
Based on (14), straightforward calculations give
Taking into account (19) and (21),
On the other hand, it can easily be shown that \(E_{\theta _{j}}\left( f_{\omega }(X)\right) =\int _{{\mathcal {X}}}f_{\omega }(x)f_{C}(x,\theta _{j})d\mu (x)\), \(j=1,2\), with \(f_{\omega }\) defined by (15), is given by
and therefore
The result (18) follows as an application of (13), (36) and (37). \(\square \)
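For orientation, a minimal version of the identity that typically drives such calculations, under the simplifying assumption that the density has the canonical exponential family form \(f_{C}(x,\theta )=\exp \left\{ \theta ^{\top }T(x)-M_{C}(\theta )\right\} \) with respect to \(\mu \) (any carrier term is absorbed into \(\mu \); the notation \(M_{C}\) for the cumulant function is assumed here, matching (15)–(21) only loosely):
\[
E_{\theta }\left( e^{\omega ^{\top }T(X)}\right) =\int _{{\mathcal {X}}}e^{(\theta +\omega )^{\top }T(x)-M_{C}(\theta )}d\mu (x)=\exp \left\{ M_{C}(\theta +\omega )-M_{C}(\theta )\right\} ,
\]
valid whenever \(\theta +\omega \) belongs to the natural parameter space.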
Appendix 3
This appendix provides a detailed proof of Proposition 4.
Proof of Proposition 4
Based on Proposition 3,
with \(\theta _{1}=(\theta _{11},\theta _{12})=(\Sigma _{1}^{-1}\mu _{1},- \frac{1}{2}\Sigma _{1}^{-1})\), \(\theta _{2}=(\theta _{21},\theta _{22})=(\Sigma _{2}^{-1}\mu _{2},-\frac{1}{2}\Sigma _{2}^{-1})\), \(\omega =(\omega _{1},\omega _{2})=(\Sigma ^{-1}\mu ,-\frac{1}{2}\Sigma ^{-1})\) and
Based on (22),
On the other hand,
and it is immediate to see, by means of (22), that
Taking into account the identity (cf. Pardo 2006, p. 49)
straightforward algebra entails that
It remains to evaluate \(M_{C,\lambda }^{(2)}(\theta _{1},\theta _{2},\omega )\), given by (39). It is easy to see that
and therefore
with \((\lambda +1)\Sigma _{1}^{-1}-\lambda \Sigma _{2}^{-1}+\Sigma ^{-1}>0,\) for \(\lambda \ne 0,-1\). Based now on (39), (40) and (43),
with
Taking into account that \(h(X)=0\) (cf. Eq. (22)), the result follows as an application of (38), (42) and (44). \(\square \)
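Although the identity of Pardo (2006, p. 49) is not reproduced here, computations of this type ultimately rest on the standard Gaussian integral, recorded for completeness: for a symmetric positive definite \(p\times p\) matrix A (p denoting the dimension) and \(b\in \mathbb {R}^{p}\),
\[
\int _{\mathbb {R}^{p}}\exp \left( -\tfrac{1}{2}x^{\top }Ax+b^{\top }x\right) dx=(2\pi )^{p/2}\left| A\right| ^{-1/2}\exp \left( \tfrac{1}{2}b^{\top }A^{-1}b\right) ,
\]
which explains both the positive definiteness requirement \((\lambda +1)\Sigma _{1}^{-1}-\lambda \Sigma _{2}^{-1}+\Sigma ^{-1}>0\) and the determinant factors appearing in the closed-form expression.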
Appendix 4
This appendix provides a detailed proof of Proposition 5.
Proof of Proposition 5
(a) Based on (23) and taking into account (12), straightforward algebra leads to the desired result.
(b) Based on part (a) and on Eq. (22),
with
Simple algebraic manipulations lead to,
On the other hand, taking into account that
Eq. (46) entails,
Then,
and
Based on (46) and (50), algebraic manipulations entail that,
Hence, taking into account (48), (49) and (51)
with \(\mu ^{*}=E_{\theta _{1}+\omega }(X).\) Based on the fact that, for \( i=1,2\),
Eqs. (42), (45), (47) and (52) complete the proof of part (b) of the proposition. \(\square \)
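For comparison, recall the classical global Kullback–Leibler divergence between two multivariate normal models \(N_{p}(\mu _{1},\Sigma _{1})\) and \(N_{p}(\mu _{2},\Sigma _{2})\), a standard fact recorded here only as a reference point for the local expressions of the proposition:
\[
D_{KL}=\tfrac{1}{2}\left[ \mathrm{tr}\left( \Sigma _{2}^{-1}\Sigma _{1}\right) +(\mu _{2}-\mu _{1})^{\top }\Sigma _{2}^{-1}(\mu _{2}-\mu _{1})-p+\log \frac{\left| \Sigma _{2}\right| }{\left| \Sigma _{1}\right| }\right] .
\]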
Cite this article
Avlogiaris, G., Micheas, A. & Zografos, K. On local divergences between two probability measures. Metrika 79, 303–333 (2016). https://doi.org/10.1007/s00184-015-0556-6
Keywords
- \(\phi \)-divergence
- Kullback–Leibler divergence
- Cressie and Read power divergence
- Local divergence
- Exponential family