On local divergences between two probability measures


Abstract

A broad class of local divergences between two probability measures, or between the respective probability distributions, is proposed in this paper. The introduced local divergences are based on the classic Csiszár \(\phi \)-divergence, and they provide a pseudo-distance between two distributions on a specific area of their common domain. The range of values of the introduced class of local divergences is derived, and explicit expressions of the proposed local divergences are obtained when the underlying distributions are members of the exponential family of distributions or are described by multivariate normal models. An application is presented to illustrate the behavior of local divergences.

References

  • Ali SM, Silvey SD (1966) A general class of coefficients of divergence of one distribution from another. J R Stat Soc Ser B 28:131–142

  • Basu A, Shioya H, Park C (2011) Statistical inference, the minimum distance approach. Chapman & Hall/CRC, Boca Raton

  • Cressie N, Read TRC (1984) Multinomial goodness-of-fit tests. J R Stat Soc Ser B 46:440–464

  • Csiszár I (1963) Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud Akad Mat Kutato Int Kozl 8:85–108

  • Csiszár I (1967) Information-type measures of difference of probability distributions and indirect observations. Stud Sci Math Hung 2:299–318

  • Csiszár I, Körner J (1981) Information theory: coding theorems for discrete memoryless systems. Akadémiai Kiadó, Budapest

  • Ebrahimi N, Soofi S, Soyer R (2010) Information measures in perspective. Int Stat Rev 78:383–412

  • Johnson RA, Wichern DW (1992) Applied multivariate statistical analysis, 3rd edn. Prentice Hall International Editions, Englewood Cliffs

  • Kagan AM (1963) On the theory of Fisher’s information quantity. Dokl Akad Nauk SSSR 151:277–278

  • Kullback S (1959) Information theory and statistics. Wiley, New York

  • Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86

  • Landaburu E, Morales D, Pardo L (2005) Divergence-based estimation and testing with misclassified data. Stat Pap 46:397–409

  • Landaburu E, Pardo L (2000) Goodness of fit tests with weights in the classes based on \((h,\phi )\)-divergences. Kybernetika 36:589–602

  • Landaburu E, Pardo L (2003) Minimum \((h,\phi )\)-divergences estimators with weights. Appl Math Comput 140:15–28

  • Liese F, Vajda I (1987) Convex statistical distances. Teubner Texts in Mathematics, Leipzig

  • Liese F, Vajda I (2006) On divergences and informations in statistics and information theory. IEEE Trans Inf Theory 52:4394–4412

  • McElroy T, Holan S (2009) A local spectral approach for assessing time series model misspecification. J Multivar Anal 100:604–621

  • Morales D, Pardo L, Vajda I (2000) Rényi statistics in directed families of exponential experiments. Statistics 34:151–174

  • Morales D, Pardo L, Pardo MC, Vajda I (2004) Rényi statistics for testing composite hypotheses in general exponential models. Statistics 38:133–147

  • Nielsen F, Nock R (2011) On Rényi and Tsallis entropies and divergences for exponential families. arXiv:1105.3259v1 [cs.IT] 17 May 2011

  • Papaioannou T (1986) Measures of information. In: Kotz S, Johnson NL (eds) Encyclopedia of statistical sciences, vol 5. Wiley, New York, pp 391–397

  • Papaioannou T (2001) On distances and measures of information: a case of diversity. In: Charalambides CA, Koutras MV, Balakrishnan N (eds) Probability and statistical models with applications. Chapman & Hall/CRC, Boca Raton, pp 503–515

  • Pardo L (2006) Statistical inference based on divergence measures. Chapman & Hall/CRC, Boca Raton

  • Rényi A (1960) On measures of entropy and information. In: Proceedings of the 4th Berkeley symposium on mathematical statistics and probability, vol 1, Berkeley, pp 547–561

  • Soofi E (2000) Principal information theoretic approaches. J Am Stat Assoc 95:1349–1353

  • Soofi E, Retzer JJ (2002) Information indices: unification and applications. Information and entropy econometrics. J Econom 107:17–40

  • Stummer W, Vajda I (2010) On divergences of finite measures and their applicability in statistics and information theory. Statistics 44:169–187

  • Ullah A (1996) Entropy, divergence and distance measures with econometric applications. J Stat Plan Inference 49:137–162

  • Vajda I (1972) On the f-divergence and singularity of probability measures. Period Math Hung 2(1–4):223–234

  • Vajda I (1973) \(\chi ^{\alpha }\)-divergence and generalized Fisher’s information. Transactions of the sixth Prague conference on information theory. Statistical decision functions, random processes, pp 873–886

  • Vajda I (1989) Theory of statistical inference and information. Kluwer Academic Publishers, Dordrecht

  • Vajda I (1995) Information theoretic methods in statistics. Research report no. 1834, Academy of Sciences of the Czech Republic, Institute of Information Theory and Automation, Prague

  • Zografos K (2008) On Mardia’s and Song’s measures of kurtosis in elliptical distributions. J Multivar Anal 99:858–879

  • Zografos K, Nadarajah S (2005) Expressions for Rényi and Shannon entropies for multivariate distributions. Stat Probab Lett 71:71–84


Acknowledgments

The authors are grateful to a Reviewer for valuable comments and suggestions which have improved the presentation of the paper.

Author information

Corresponding author

Correspondence to K. Zografos.

Appendices

Appendix 1

This appendix provides a detailed proof of Theorem 1.

Proof of Theorem 1

(a) It is clear, from (9) and (10), that \(0\le \widetilde{D}_{\phi }^{R}(P,Q)\). We proceed with the upper bound of \(\widetilde{D}_{\phi }^{R}(P,Q)\). Given that \(\overline{ \phi }(1)=0\) and motivated by a similar proof in Stummer and Vajda (2010, p. 174), we can write

$$\begin{aligned} \widetilde{D}_{\phi }^{R}(P,Q)=\int \limits _{\{p<q\}}r(x)q(x)\overline{\phi } \left( \frac{p(x)}{q(x)}\right) d\mu (x)+\int \limits _{\{q<p\}}r(x)q(x) \overline{\phi }\left( \frac{p(x)}{q(x)}\right) d\mu (x). \end{aligned}$$
(24)

Define the function

$$\begin{aligned} \overline{\phi ^{*}}(u)=\phi ^{*}(u)+\phi _{+}^{\prime }(1)(u-1). \end{aligned}$$
(25)

Then, for \(u=\frac{q(x)}{p(x)}=\frac{q}{p}\), \(x\in {\mathcal {X}}\),

$$\begin{aligned} {rp}\overline{\phi ^{*}}\left( \frac{q}{p}\right) =rp\phi ^{*}\left( \frac{q}{p}\right) +rp\phi _{+}^{\prime }(1)\left( \frac{q}{p} -1\right) . \end{aligned}$$
(26)

Hence,

$$\begin{aligned} \int \limits _{\{q<p\}}r(x)p(x)\overline{\phi ^{*}}\left( \frac{q(x)}{p(x)} \right) d\mu (x)= & {} \int \limits _{\{q<p\}}r(x)p(x)\phi ^{*}\left( \frac{q(x) }{p(x)}\right) d\mu (x)+\phi _{+}^{\prime }(1)\\&\times \int \limits _{\{q<p\}}r(x)\left( q(x)-p(x)\right) d\mu (x) \end{aligned}$$

and taking into account that \(\phi ^{*}(u)=u\phi \left( 1/u\right) \), \( u>0\),

$$\begin{aligned} \int \limits _{\{q<p\}}r(x)p(x)\overline{\phi ^{*}}\left( \frac{q(x)}{p(x)} \right) d\mu (x)= & {} \int \limits _{\{q<p\}}r(x)q(x)\phi \left( \frac{p(x)}{q(x) }\right) d\mu (x)-\phi _{+}^{\prime }(1)\nonumber \\&\times \int \limits _{\{q<p\}}r(x)\left( p(x)-q(x)\right) d\mu (x) \nonumber \\= & {} \int \limits _{\{q<p\}}r(x)q(x)\overline{\phi }\left( \frac{p(x)}{q(x)} \right) d\mu (x). \end{aligned}$$
(27)

Therefore, from Eqs. (24) and (27) we conclude

$$\begin{aligned} \widetilde{D}_{\phi }^{R}(P,Q)=\int \limits _{\{p<q\}}r(x)q(x)\overline{\phi } \left( \frac{p(x)}{q(x)}\right) d\mu (x)+\int \limits _{\{q<p\}}r(x)p(x) \overline{\phi ^{*}}\left( \frac{q(x)}{p(x)}\right) d\mu (x). \end{aligned}$$
(28)

On the other hand, based on Stummer and Vajda (2010, p. 174), for a convex function \(\phi \in \Phi ^{*}\) which is strictly convex at 1, with \( \phi _{+}^{\prime }(1)=0\), it is true that

$$\begin{aligned} 0=\phi (1)\le \phi (t_{2})\le \phi (t_{1})\le \phi (0),\text { for any } 0\le t_{1}\le t_{2}\le 1. \end{aligned}$$
(29)

Applying inequality (29) to \(\phi =\overline{\phi }\), it is clear that on the subset \(\left\{ x\in {\mathcal {X}}:p(x)<q(x)\right\} \) of \( {\mathcal {X}}\), it holds that \(0\le \overline{\phi }\left( \frac{p}{q}\right) \le \overline{\phi }(0)\), and therefore

$$\begin{aligned} 0\le \int \limits _{\{p<q\}}r(x)q(x)\overline{\phi }\left( \frac{p(x)}{q(x)} \right) d\mu (x)\le \overline{\phi }(0)\int \limits _{\{p<q\}}r(x)q(x)d\mu (x). \end{aligned}$$

Moreover, based on the non-negativity of r and q on any subset of \( {\mathcal {X}}\), it is true that

$$\begin{aligned} 0\le \int \limits _{\{p<q\}}r(x)q(x)d\mu (x)\le \int \limits _{{\mathcal {X}} }r(x)q(x)d\mu (x)=\xi _{0}. \end{aligned}$$

So, the last two inequalities lead to

$$\begin{aligned} 0\le \int \limits _{\{p<q\}}r(x)q(x)\overline{\phi }\left( \frac{p(x)}{q(x)} \right) d\mu (x)\le \overline{\phi }(0)\xi _{0}. \end{aligned}$$
(30)

In a manner quite similar to the above, applying inequality (29) to \( \phi =\overline{\phi ^{*}}\), it is clear that on the subset \(\left\{ x\in {\mathcal {X}}:q(x)<p(x)\right\} \) of \({\mathcal {X}}\) it holds that \(0\le \overline{\phi ^{*}}\left( \frac{q}{p}\right) \le \overline{\phi ^{*}}(0)\), and therefore

$$\begin{aligned} 0\le \int \limits _{\{q<p\}}r(x)p(x)\overline{\phi ^{*}}\left( \frac{q(x) }{p(x)}\right) d\mu (x)\le \overline{\phi ^{*}}(0)\int \limits _{\{q<p \}}r(x)p(x)d\mu (x). \end{aligned}$$

Hence,

$$\begin{aligned} 0\le \int \limits _{\{q<p\}}r(x)p(x)\overline{\phi ^{*}}\left( \frac{q(x) }{p(x)}\right) d\mu (x)\le \overline{\phi ^{*}}(0)\int \limits _{\{q<p \}}r(x)p(x)d\mu (x)\le \overline{\phi ^{*}}(0)\xi _{1}. \end{aligned}$$
(31)

A combination of (28), (30) and (31) gives

$$\begin{aligned} \widetilde{D}_{\phi }^{R}(P,Q)\le \overline{\phi }(0)\xi _{0}+\overline{ \phi ^{*}}(0)\xi _{1}. \end{aligned}$$

Based on (7) and (25), \(\overline{\phi }(0)=\phi (0)+\phi _{+}^{\prime }(1)\) and \(\overline{\phi ^{*}}(0)=\phi ^{*}(0)-\phi _{+}^{\prime }(1)\). These identities along with the previous inequality complete the proof of part (a) of the theorem.
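
To illustrate the resulting bound (with \(\overline{\phi }\), \(\overline{\phi ^{*}}\) and \(\xi _{0}\), \(\xi _{1}\) as above), consider, for example, the total variation function \(\phi (u)=|u-1|\), for which \(\phi (0)=1\), \(\phi _{+}^{\prime }(1)=1\) and \(\phi ^{*}(u)=u\phi (1/u)=|1-u|\), so that \(\phi ^{*}(0)=1\). In this case the bound of part (a) becomes

$$\begin{aligned} \widetilde{D}_{\phi }^{R}(P,Q)\le \left( \phi (0)+\phi _{+}^{\prime }(1)\right) \xi _{0}+\left( \phi ^{*}(0)-\phi _{+}^{\prime }(1)\right) \xi _{1}=2\xi _{0}, \end{aligned}$$

and, in the non-local case \(r\equiv 1\), where \(\xi _{0}=\int _{{\mathcal {X}}}q(x)d\mu (x)=1\), it recovers the familiar upper bound 2 of the total variation distance \(\int _{{\mathcal {X}}}\left| p(x)-q(x)\right| d\mu (x)\).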

(b) To proceed with the proof of part (b) of the theorem, suppose first that \(P=Q\). Then, it is clear from (10) that \(\widetilde{D} _{\phi }^{R}(P,P)=D_{\phi }^{R}(P,P)=\phi (1)=0\), because \(\phi \in \Phi ^{*}\).

Conversely, let \(\widetilde{D}_{\phi }^{R}(P,Q)=0\). Taking into account (9) and the fact that \(\phi (1)=0\), it follows that

$$\begin{aligned} \phi \left( \frac{p(x)}{q(x)}\right) =\phi (1)+\phi _{+}^{\prime }(1)\left( \frac{p(x)}{q(x)}-1\right) , \end{aligned}$$
(32)

a.e. with respect to the measure \(\mu \), given that the Radon-Nikodym derivative r is positive on \({\mathcal {X}}\). On the other hand, based on Vajda (1989, p. 58),

$$\begin{aligned} \phi \left( x\right) >\phi (1)+\phi _{+}^{\prime }(1)\left( x-1\right) , \text { for every }x\ne 1, \end{aligned}$$

because \(\phi \) is strictly convex at 1. Therefore, the only way equality ( 32) to be valid, taking into account the above inequality, is when \( \frac{p(x)}{q(x)}=1\) or \(P=Q\), which completes the proof of part (b) of the theorem.

(c) Suppose that \(P\perp Q\). Then, following Vajda (1972, p. 227), \(u=\frac{dQ}{dP+dQ}=0\;[P]\) and \(u=\frac{dQ}{dP+dQ}=1\) [Q]. Taking into account that \(u=\frac{q}{p+q}\), we conclude that if \(P\perp Q\), then \( q(x)=0\), \(a.e. \ x\in {\mathcal {X}}\,[P]\) and \(p(x)=0\), \(a.e. \ x\in {\mathcal {X}}\;[Q]\). Equation (28) is refined as follows,

$$\begin{aligned} \widetilde{D}_{\phi }^{R}(P,Q)=\int \limits _{\{p<q\}}r(x)\overline{\phi } \left( \frac{p(x)}{q(x)}\right) dQ(x)+\int \limits _{\{q<p\}}r(x)\overline{ \phi ^{*}}\left( \frac{q(x)}{p(x)}\right) dP(x), \end{aligned}$$

and subject to the condition \(P\perp Q,\)

$$\begin{aligned} \widetilde{D}_{\phi }^{R}(P,Q)= & {} \int \limits _{\{p<q\}}r(x)\overline{\phi } \left( \frac{p(x)}{q(x)}\right) dQ(x)+\int \limits _{\{q<p\}}r(x)\overline{ \phi ^{*}}\left( \frac{q(x)}{p(x)}\right) dP(x)\nonumber \\= & {} \overline{\phi }\left( 0\right) \int \limits _{\{p<q\}}r(x)dQ(x)+\overline{ \phi ^{*}}\left( 0\right) \int \limits _{\{q<p\}}r(x)dP(x). \end{aligned}$$
(33)

On the other hand, because \(p(x)=0\), \(a.e.\ x\in {\mathcal {X}}\) [Q], it is clear that

$$\begin{aligned} Q(\{p\ge q\})=Q(\{p>q\})+Q(\{p=q\})=Q(\{p>q\})=Q(\{q<0\})=Q(\varnothing )=0 \end{aligned}$$

since \(Q(\{q=p\})=0\), taking into account that \(P\perp Q\). This last equality leads to \(\int _{\{p\ge q\}}r(x)dQ(x)=0\), and therefore

$$\begin{aligned} \xi _{0}= & {} \int \limits _{{\mathcal {X}}}r(x)q(x)d\mu (x)=\int \limits _{{\mathcal {X}} }r(x)dQ(x)=\int \limits _{\{p<q\}}r(x)dQ(x)+\int \limits _{\{p\ge q\}}r(x)dQ(x)\nonumber \\= & {} \int \limits _{\{p<q\}}r(x)dQ(x). \end{aligned}$$
(34)

Similarly, it can be shown that \(P(\{q\ge p\})=0\) and hence

$$\begin{aligned} \xi _{1}=\int \limits _{{\mathcal {X}}}r(x)p(x)d\mu (x)=\int \limits _{\{q<p\}}r(x)dP(x). \end{aligned}$$
(35)

Equations (33), (34) and (35) give that if \(P\perp Q\) then \(\widetilde{D}_{\phi }^{R}(P,Q)=\overline{\phi }\left( 0\right) \xi _{0}+\overline{\phi ^{*}}\left( 0\right) \xi _{1}\) which completes the proof of this part of the theorem in view of the equations \(\overline{\phi } (0)=\phi (0)+\phi _{+}^{\prime }(1)\) and \(\overline{\phi ^{*}}(0)=\phi ^{*}(0)-\phi _{+}^{\prime }(1)\).

It remains to prove that if \(\phi (0)+\phi ^{*}(0)<\infty \) and \( \widetilde{D}_{\phi }^{R}(P,Q)=\overline{\phi }\left( 0\right) \xi _{0}+ \overline{\phi ^{*}}\left( 0\right) \xi _{1}\), then \(P\perp Q\). Relationships (24) and (29) immediately lead to

$$\begin{aligned} \widetilde{D}_{\phi }^{R}(P,Q)\le \overline{\phi }\left( 0\right) \int \limits _{\{p<q\}}r(x)dQ(x)+\overline{\phi ^{*}}\left( 0\right) \int \limits _{\{q<p\}}r(x)dP(x). \end{aligned}$$

This last inequality, together with the assumption \(\widetilde{D}_{\phi }^{R}(P,Q)=\overline{\phi }\left( 0\right) \xi _{0}+\overline{\phi ^{*}}\left( 0\right) \xi _{1}\) and \(\phi (0)+\phi ^{*}(0)<\infty \), leads to

$$\begin{aligned} \int \limits _{\{p<q\}}r(x)dQ(x)=\xi _{0}\text { and }\int \limits _{\{q<p \}}r(x)dP(x)=\xi _{1}, \end{aligned}$$

and therefore

$$\begin{aligned} \int \limits _{\{p\ge q\}}r(x)dQ(x)=\int \limits _{\{q\ge p\}}r(x)dP(x)=0. \end{aligned}$$

This last equation leads to

$$\begin{aligned} Q(\{p\ge q\})=0\text { and }P(\{q\ge p\})=0, \end{aligned}$$

or

$$\begin{aligned} Q(\{p\ge q\})=0\text { and }P(\{q>p\})=0 \end{aligned}$$

given that the Radon-Nikodym derivative r is positive on \({\mathcal {X}}\). This last conclusion proves that \(P\perp Q\), and the proof of the theorem is complete. \(\square \)
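
The following short numerical sketch (an illustration, not part of the paper) shows the range established in Theorem 1 for the integral form of \(\widetilde{D}_{\phi }^{R}\) used in (24). It takes two univariate normal densities p and q, a localizing weight r chosen, purely for illustration, as a normal kernel concentrated around the origin, and the squared Hellinger function \(\phi (u)=(\sqrt{u}-1)^{2}\), for which \(\phi _{+}^{\prime }(1)=0\) and \(\phi (0)=\phi ^{*}(0)=1\), so that the upper bound of part (a) is \(\xi _{0}+\xi _{1}\).

```python
# A minimal numerical sketch (illustration only, not from the paper) of the
# range in Theorem 1, for the integral form of the local divergence in (24):
#   D(P,Q) = integral of r(x) q(x) phibar(p(x)/q(x)) dx,
# with phibar = phi here, since the squared Hellinger function
# phi(u) = (sqrt(u)-1)^2 has phi'_+(1) = 0.  The densities p, q and the
# localizing weight r below are arbitrary, hypothetical choices.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

p = norm(loc=0.0, scale=1.0).pdf      # density of P
q = norm(loc=1.5, scale=1.2).pdf      # density of Q
r = norm(loc=0.0, scale=0.5).pdf      # localizing weight around x = 0

def phi(u):
    """Squared Hellinger function: phi(0) = phi*(0) = 1, phi'_+(1) = 0."""
    return (np.sqrt(u) - 1.0) ** 2

# local divergence computed by quadrature
D_local, _ = quad(lambda x: r(x) * q(x) * phi(p(x) / q(x)), -10.0, 10.0)

# xi_0 = int r q dmu, xi_1 = int r p dmu, and the upper bound of Theorem 1(a)
xi0, _ = quad(lambda x: r(x) * q(x), -10.0, 10.0)
xi1, _ = quad(lambda x: r(x) * p(x), -10.0, 10.0)
phi0, phistar0, dphi1 = 1.0, 1.0, 0.0   # phi(0), phi*(0), phi'_+(1)
upper = (phi0 + dphi1) * xi0 + (phistar0 - dphi1) * xi1

print(f"local divergence: {D_local:.4f}")
print(f"Theorem 1(a) range: 0 <= {D_local:.4f} <= {upper:.4f}")
```

As Theorem 1(a) guarantees, the printed local divergence lies between 0 and the computed bound \(\xi _{0}+\xi _{1}\).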

Appendix 2

This appendix provides a detailed proof of Proposition 3.

Proof of Proposition 3

Based on (14), straightforward calculations give

$$\begin{aligned} K_{\lambda ,\omega }(\theta _{1},\theta _{2})= & {} \exp \{\lambda C(\theta _{2})-(\lambda +1)C(\theta _{1})-C(\omega )+C((\lambda +1)\theta _{1}-\lambda \theta _{2}+\omega )\} \\&\times \int \limits _{{\mathcal {X}}}\exp \{h(x)\}\exp \left\{ \left( \sum \limits _{i=1}^{k}[(\lambda +1)\theta _{1i}-\lambda \theta _{2i}+\omega _{i}]T_{i}(x)\right) \right. \\&-C((\lambda +1)\theta _{1}-\lambda \theta _{2}+\omega )+h(x)\}d\mu (x). \end{aligned}$$

Taking into account (19) and (21),

$$\begin{aligned} K_{\lambda ,\omega }(\theta _{1},\theta _{2})=\exp \left\{ M_{C,\lambda }^{(2)}(\theta _{1},\theta _{2},\omega )\right\} E_{(\lambda +1)\theta _{1}-\lambda \theta _{2}+\omega }\{\exp (h(x))\}. \end{aligned}$$
(36)

On the other hand, it can easily be shown that \(E_{\theta _{j}}\left( f_{\omega }(X)\right) =\int _{{\mathcal {X}}}f_{\omega }(x)f_{C}(x,\theta _{j})d\mu (x)\), \(j=1,2\), with \(f_{\omega }\) defined by (15), is given by

$$\begin{aligned} E_{\theta _{j}}\left( f_{\omega }(X)\right)= & {} \exp \{-C(\theta _{j})-C(\omega )+C(\theta _{j}+\omega )\} \\&\times \int \limits _{{\mathcal {X}}}\exp (h(x))\exp \left\{ \left( \sum \limits _{i=1}^{k}(\omega _{i}+\theta _{ji})T_{i}(x)-C(\theta _{j}+\omega )+h(x)\right) \right\} \\&\quad d\mu (x),\;j=1,2, \end{aligned}$$

and therefore

$$\begin{aligned} E_{\theta _{j}}\left( f_{\omega }(X)\right) =\exp \{-C(\theta _{j})-C(\omega )+C(\theta _{j}+\omega )\}E_{\theta _{j}+\omega }\left( \exp \left( h(X)\right) \right) ,\text { }j=1,2. \end{aligned}$$
(37)

The result (18) follows as an application of (13), (36) and (37). \(\square \)
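
As a quick illustration of (37) (a worked check, not contained in the paper), consider exponential densities written in the natural form \(f_{C}(x,\theta )=\exp \{\theta x-C(\theta )\}\), \(x>0\), with \(\theta <0\), \(C(\theta )=-\log (-\theta )\) and \(h(x)=0\), so that \(E_{\theta _{j}+\omega }\left( \exp (h(X))\right) =1\), and take \(f_{\omega }(x)=f_{C}(x,\omega )\). For \(\theta _{j}=-a\) and \(\omega =-b\), with \(a,b>0\), (37) gives

$$\begin{aligned} E_{\theta _{j}}\left( f_{\omega }(X)\right) =\exp \{-C(\theta _{j})-C(\omega )+C(\theta _{j}+\omega )\}=\exp \{\log a+\log b-\log (a+b)\}=\frac{ab}{a+b}, \end{aligned}$$

which agrees with the direct calculation \(\int _{0}^{\infty }ae^{-ax}\,be^{-bx}dx=ab/(a+b)\).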

Appendix 3

This appendix provides a detailed proof of Proposition 4.

Proof of Proposition 4

Based on Proposition 3,

(38)

with \(\theta _{1}=(\theta _{11},\theta _{12})=(\Sigma _{1}^{-1}\mu _{1},- \frac{1}{2}\Sigma _{1}^{-1})\), \(\theta _{2}=(\theta _{21},\theta _{22})=(\Sigma _{2}^{-1}\mu _{2},-\frac{1}{2}\Sigma _{2}^{-1})\), \(\omega =(\omega _{1},\omega _{2})=(\Sigma ^{-1}\mu ,-\frac{1}{2}\Sigma ^{-1})\) and

$$\begin{aligned} M_{C,\lambda }^{(2)}(\theta _{1},\theta _{2},\omega )=\lambda C(\theta _{2})-(\lambda +1)C(\theta _{1})-C(\omega )+C((\lambda +1)\theta _{1}-\lambda \theta _{2}+\omega ). \end{aligned}$$
(39)

Based on (22),

$$\begin{aligned} C(\theta _{i})= & {} \log \left( (2\pi )^{k/2}|\Sigma _{i}|^{1/2}\right) +\frac{1}{ 2}\mu _{i}^{t}\Sigma _{i}^{-1}\mu _{i},i=1,2 \nonumber \\ C(\omega )= & {} \log \left( (2\pi )^{k/2}|\Sigma |^{1/2}\right) +\frac{1}{2}\mu ^{t}\Sigma ^{-1}\mu . \end{aligned}$$
(40)

On the other hand,

$$\begin{aligned} \theta _{1}+\omega =\left( \Sigma _{1}^{-1}\mu _{1}+\Sigma ^{-1}\mu ,-\frac{1 }{2}(\Sigma _{1}^{-1}+\Sigma ^{-1})\right) , \end{aligned}$$

and it is immediate to see, by means of (22), that

$$\begin{aligned} C(\theta _{1}+\omega )= & {} \log \left( (2\pi )^{k/2}|\Sigma ^{-1}+\Sigma _{1}^{-1}|^{-1/2}\right) \nonumber \\&+\frac{1}{2}\left( \Sigma ^{-1}\mu +\Sigma _{1}^{-1}\mu _{1}\right) ^{t}\left( \Sigma ^{-1}+\Sigma _{1}^{-1}\right) ^{-1}\left( \Sigma ^{-1}\mu +\Sigma _{1}^{-1}\mu _{1}\right) .\quad \quad \end{aligned}$$
(41)

Taking into account the identity (cf. Pardo 2006, p. 49)

$$\begin{aligned}&\left( \Sigma ^{-1}\mu +\Sigma _{i}^{-1}\mu _{i}\right) ^{t}\left( \Sigma ^{-1}+\Sigma _{i}^{-1}\right) ^{-1}\left( \Sigma ^{-1}\mu +\Sigma _{i}^{-1}\mu _{i}\right) -\mu ^{t}\Sigma ^{-1}\mu -\mu _{i}^{t}\Sigma _{i}^{-1}\mu _{i} \\&\qquad =-(\mu -\mu _{i})^{t}(\Sigma +\Sigma _{i})^{-1}(\mu -\mu _{i}),\quad i=1,2, \end{aligned}$$

straightforward algebra entails that

$$\begin{aligned} C(\theta _{i}+\omega )-C(\theta _{i})-C(\omega )= & {} \log \left( (2\pi )^{-k/2}|\Sigma ^{-1}+\Sigma _{i}^{-1}|^{-1/2}|\Sigma |^{-1/2}|\Sigma _{i}|^{-1/2}\right) \nonumber \\&-\frac{1}{2}(\mu -\mu _{i})^{t}(\Sigma +\Sigma _{i})^{-1}(\mu -\mu _{i}),\quad i=1,2.\nonumber \\ \end{aligned}$$
(42)

It remains to evaluate \(M_{C,\lambda }^{(2)}(\theta _{1},\theta _{2},\omega )\), given by (39). It is easy to see that

$$\begin{aligned} (\lambda +1)\theta _{1}-\lambda \theta _{2}+\omega= & {} \left( (\lambda +1)\Sigma _{1}^{-1}\mu _{1}-\lambda \Sigma _{2}^{-1}\mu _{2}+\Sigma ^{-1}\mu ,\right. \\&\left. (-1/2)\left( (\lambda +1)\Sigma _{1}^{-1}-\lambda \Sigma _{2}^{-1}+\Sigma ^{-1}\right) \right) , \end{aligned}$$

and therefore

$$\begin{aligned} C((\lambda +1)\theta _{1}-\lambda \theta _{2}+\omega )= & {} \log \left( (2\pi )^{k/2}|(\lambda +1)\Sigma _{1}^{-1}-\lambda \Sigma _{2}^{-1}+\Sigma ^{-1}|^{-1/2}\right) \nonumber \\&+\frac{1}{2}\left( (\lambda +1)\Sigma _{1}^{-1}\mu _{1}-\lambda \Sigma _{2}^{-1}\mu _{2}+\Sigma ^{-1}\mu \right) ^{t}\nonumber \\&\times \, \left( (\lambda +1)\Sigma _{1}^{-1}-\lambda \Sigma _{2}^{-1}+\Sigma ^{-1}\right) ^{-1} \nonumber \\&\times \left( (\lambda +1)\Sigma _{1}^{-1}\mu _{1}-\lambda \Sigma _{2}^{-1}\mu _{2}+\Sigma ^{-1}\mu \right) , \end{aligned}$$
(43)

with \((\lambda +1)\Sigma _{1}^{-1}-\lambda \Sigma _{2}^{-1}+\Sigma ^{-1}>0,\) for \(\lambda \ne 0,-1\). Based now on (39), (40) and (43),

$$\begin{aligned} M_{C,\lambda }^{(2)}(\theta _{1},\theta _{2},\omega )= & {} \log \left( (2\pi )^{-k/2}|\Sigma |^{-\frac{1}{2}}|\Sigma _{1}|^{-\frac{\lambda +1}{2}}|\Sigma _{2}|^{\frac{\lambda }{2}}\left| (\lambda +1)\Sigma _{1}^{-1}-\lambda \Sigma _{2}^{-1}+\Sigma ^{-1}\right| ^{-\frac{1}{2}}\right) \nonumber \\&-\frac{1}{2}\left( \mu ^{t}\Sigma ^{-1}\mu +(\lambda +1)\mu _{1}^{t}\Sigma _{1}^{-1}\mu _{1}-\lambda \mu _{2}^{t}\Sigma _{2}^{-1}\mu _{2}-B_{1}^{t}B_{2}B_{1}\right) , \end{aligned}$$
(44)

with

$$\begin{aligned} B_{1}= & {} (\lambda +1)\Sigma _{1}^{-1}\mu _{1}-\lambda \Sigma _{2}^{-1}\mu _{2}+\Sigma ^{-1}\mu , \\ B_{2}= & {} \left( (\lambda +1)\Sigma _{1}^{-1}-\lambda \Sigma _{2}^{-1}+\Sigma ^{-1}\right) ^{-1}. \end{aligned}$$

Taking into account that \(h(X)=0\) (cf. Eq. (22)), the result follows as an application of (38), (42) and (44). \(\square \)
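
A small numerical self-check (not part of the paper) of expression (44) may be useful: the quantity \(M_{C,\lambda }^{(2)}(\theta _{1},\theta _{2},\omega )\) computed directly from (39), with the normal cumulant function \(C(\cdot )\) evaluated in the natural parametrization (cf. (22), (40)), should coincide with the closed form (44). The matrices and vectors below are arbitrary illustrative choices, and \(\lambda \) is chosen so that \((\lambda +1)\Sigma _{1}^{-1}-\lambda \Sigma _{2}^{-1}+\Sigma ^{-1}\) is positive definite.

```python
# Numerical self-check (illustrative sketch) that (39) and (44) agree for the
# multivariate normal model; all parameter values below are arbitrary choices.
import numpy as np

def nat(mu, Sigma):
    """Natural parameters (eta1, eta2) = (Sigma^{-1} mu, -Sigma^{-1}/2)."""
    P = np.linalg.inv(Sigma)
    return P @ mu, -0.5 * P

def C(eta):
    """Normal cumulant function in natural parameters, cf. (40)."""
    eta1, eta2 = eta
    Sigma = np.linalg.inv(-2.0 * eta2)     # recover the covariance matrix
    mu = Sigma @ eta1                      # recover the mean vector
    k = len(eta1)
    return (0.5 * k * np.log(2.0 * np.pi) + 0.5 * np.linalg.slogdet(Sigma)[1]
            + 0.5 * mu @ np.linalg.inv(Sigma) @ mu)

mu1, mu2, mu = np.array([0.5, -1.0]), np.array([1.0, 0.3]), np.array([0.0, 0.2])
S1 = np.array([[1.0, 0.3], [0.3, 1.2]])
S2 = np.array([[1.5, -0.2], [-0.2, 0.8]])
S  = np.array([[0.7, 0.1], [0.1, 0.9]])
lam, k = 0.5, 2

t1, t2, om = nat(mu1, S1), nat(mu2, S2), nat(mu, S)
comb = tuple((lam + 1) * a - lam * b + c for a, b, c in zip(t1, t2, om))

# M from (39): lam*C(theta2) - (lam+1)*C(theta1) - C(omega) + C(combined parameter)
M_39 = lam * C(t2) - (lam + 1) * C(t1) - C(om) + C(comb)

# M from the closed form (44), with B_1 and B_2 as defined above
P1, P2, P0 = np.linalg.inv(S1), np.linalg.inv(S2), np.linalg.inv(S)
A  = (lam + 1) * P1 - lam * P2 + P0
B1 = (lam + 1) * P1 @ mu1 - lam * P2 @ mu2 + P0 @ mu
B2 = np.linalg.inv(A)
ld = lambda X: np.linalg.slogdet(X)[1]
M_44 = (-0.5 * k * np.log(2.0 * np.pi) - 0.5 * ld(S) - 0.5 * (lam + 1) * ld(S1)
        + 0.5 * lam * ld(S2) - 0.5 * ld(A)
        - 0.5 * (mu @ P0 @ mu + (lam + 1) * mu1 @ P1 @ mu1
                 - lam * mu2 @ P2 @ mu2 - B1 @ B2 @ B1))

print(M_39, M_44)   # the two values should agree up to rounding error
```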

Appendix 4

This appendix provides a detailed proof of Proposition 5.

Proof of Proposition 5

(a) Based on (23) and taking into account (12), straightforward algebra leads to the desired result.

(b) Based on part (a) and on Eq. (22),

$$\begin{aligned} D_{0}^{(\mu ,\Sigma )}((\mu _{1},\Sigma _{1}),(\mu _{2},\Sigma _{2}))= & {} \exp \{C(\theta _{1}+\omega )-C(\theta _{1})-C(\omega )\} \nonumber \\&\times \left( C(\theta _{2})-C(\theta _{1})+(\theta _{1}-\theta _{2})^{t}E_{\theta _{1}+\omega }\left( T(X)\right) \right) \nonumber \\&-\exp \{C(\theta _{1}+\omega )-C(\theta _{1})-C(\omega )\} \nonumber \\&+\exp \{C(\theta _{2}+\omega )-C(\theta _{2})-C(\omega )\}, \end{aligned}$$
(45)

with

$$\begin{aligned} \theta _{i}=\left( \Sigma _{i}^{-1}\mu _{i},- \frac{1}{2}\Sigma _{i}^{-1}\right) ,i=1,2,\;\omega =\left( \Sigma ^{-1}\mu ,- \frac{1}{2}\Sigma ^{-1}\right) \text { and }T(X)=\left( X,XX^{t}\right) . \end{aligned}$$
(46)

Simple algebraic manipulations lead to,

$$\begin{aligned} C(\theta _{2})-C(\theta _{1})=\frac{1}{2}\left( \log \frac{|\Sigma _{2}|}{ |\Sigma _{1}|}+\mu _{2}^{t}\Sigma _{2}^{-1}\mu _{2}-\mu _{1}^{t}\Sigma _{1}^{-1}\mu _{1}\right) . \end{aligned}$$
(47)

On the other hand, taking into account that

$$\begin{aligned} E_{\theta _{1}+\omega }(X)=\int \limits _{{\mathcal {X}}}xf_{C}(x,\theta _{1}+\omega )d\mu (x), \end{aligned}$$

Eq. (46) entails,

$$\begin{aligned} E_{\theta _{1}+\omega }(X)=\left( \Sigma ^{-1}+\Sigma _{1}^{-1}\right) ^{-1}\left( \Sigma ^{-1}\mu +\Sigma _{1}^{-1}\mu _{1}\right) . \end{aligned}$$
(48)

Then,

$$\begin{aligned} E_{\theta _{1}+\omega }\left( XX^{t}\right)= & {} Var_{\theta _{1}+\omega }(X)+\left( E_{\theta _{1}+\omega }(X)\right) \left( E_{\theta _{1}+\omega }(X)\right) ^{t} \nonumber \\= & {} (\Sigma ^{-1}+\Sigma _{1}^{-1})^{-1}+\left\{ (\Sigma ^{-1}+\Sigma _{1}^{-1})^{-1}\left( \Sigma ^{-1}\mu +\Sigma _{1}^{-1}\mu _{1}\right) \right. \nonumber \\&\times \left. \left( \Sigma ^{-1}\mu +\Sigma _{1}^{-1}\mu _{1}\right) ^{t}(\Sigma ^{-1}+\Sigma _{1}^{-1})^{-1}\right\} \end{aligned}$$
(49)

and

$$\begin{aligned} \theta _{1}-\theta _{2}=\left( \Sigma _{1}^{-1}\mu _{1}-\Sigma _{2}^{-1}\mu _{2},-\frac{1}{2}\left( \Sigma _{1}^{-1}-\Sigma _{2}^{-1}\right) \right) . \end{aligned}$$
(50)

Based on (46) and (50), algebraic manipulations entail that,

$$\begin{aligned} (\theta _{1}-\theta _{2})^{t}E_{\theta _{1}+\omega }\left( T(X)\right)= & {} \left( \Sigma _{1}^{-1}\mu _{1}-\Sigma _{2}^{-1}\mu _{2}\right) ^{t}E_{\theta _{1}+\omega }\left( X\right) \nonumber \\&+\,trace\left\{ -\frac{1}{2}\left( \Sigma _{1}^{-1}-\Sigma _{2}^{-1}\right) ^{t}E_{\theta _{1}+\omega }\left( XX^{t}\right) \right\} ,\qquad \end{aligned}$$
(51)

Hence, taking into account (48), (49) and (51)

(52)

with \(\mu ^{*}=E_{\theta _{1}+\omega }(X).\) Based on the fact that, for \( i=1,2\),

$$\begin{aligned} \exp \left( C(\theta _{i}+\omega )-C(\theta _{i})-C(\omega )\right)= & {} (2\pi )^{-\frac{k}{2}}|\Sigma |^{-\frac{1}{2}}|\Sigma _{i}|^{-\frac{1}{2} }\left| \Sigma ^{-1}+\Sigma _{i}^{-1}\right| ^{-\frac{1}{2}} \\&\times \exp \left\{ -\frac{1}{2}(\mu -\mu _{i})^{t}(\Sigma +\Sigma _{i})^{-1}(\mu -\mu _{i})\right\} \\= & {} E_{(\mu _{i},\Sigma _{i})}\left( f_{N(\mu ,\Sigma )}(X)\right) , \end{aligned}$$

Eqs. (42), (45), (47) and (52) complete the proof of part (b) of the proposition. \(\square \)
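
As a closing numerical sanity check (illustrative only, not part of the paper), the identity used in the last display, namely \(\exp \left( C(\theta _{i}+\omega )-C(\theta _{i})-C(\omega )\right) =E_{(\mu _{i},\Sigma _{i})}\left( f_{N(\mu ,\Sigma )}(X)\right) \), can be verified by Monte Carlo: the right-hand side is estimated by averaging the \(N(\mu ,\Sigma )\) density over draws from \(N(\mu _{i},\Sigma _{i})\), while the left-hand side coincides, after simplifying the determinants, with the \(N(\mu ,\Sigma +\Sigma _{i})\) density evaluated at \(\mu _{i}\). The parameter values below are arbitrary.

```python
# Monte Carlo sanity check (illustrative sketch) of the identity
#   exp{C(theta_i+omega) - C(theta_i) - C(omega)} = E_{(mu_i,Sigma_i)}[ f_{N(mu,Sigma)}(X) ],
# whose closed form is the N(mu, Sigma + Sigma_i) density evaluated at mu_i.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
mu  = np.array([0.0, 0.5])
S   = np.array([[0.8, 0.2], [0.2, 1.1]])
mu1 = np.array([1.0, -0.3])
S1  = np.array([[1.3, -0.1], [-0.1, 0.9]])

# closed form: (2*pi)^{-k/2} |S+S1|^{-1/2} exp{-(mu-mu1)'(S+S1)^{-1}(mu-mu1)/2}
closed = multivariate_normal(mean=mu, cov=S + S1).pdf(mu1)

# Monte Carlo: average the N(mu, S) density over draws X ~ N(mu1, S1)
X = rng.multivariate_normal(mu1, S1, size=200_000)
mc = multivariate_normal(mean=mu, cov=S).pdf(X).mean()

print(f"closed form : {closed:.5f}")
print(f"Monte Carlo : {mc:.5f}")   # the two should agree to a few decimals
```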

About this article

Cite this article

Avlogiaris, G., Micheas, A. & Zografos, K. On local divergences between two probability measures. Metrika 79, 303–333 (2016). https://doi.org/10.1007/s00184-015-0556-6
