Abstract
A broad class of local divergences between two probability measures, or between the respective probability distributions, is proposed in this paper. The introduced local divergences are based on the classic Csiszár \(\phi \)-divergence and provide a pseudo-distance between two distributions on a specific area of their common domain. The range of values of the introduced class of local divergences is derived, and explicit expressions are obtained when the underlying distributions are members of the exponential family of distributions or are described by multivariate normal models. An application is presented to illustrate the behavior of local divergences.
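To fix ideas before the formal development, the following Python sketch computes a localized, weighted version of a Csiszár-type divergence by numerical integration. All concrete choices here are illustrative assumptions, not the paper's exact construction: the two normal densities, the Kullback–Leibler generator \(\phi (u)=u\log u-u+1\), the indicator weight w, and the unnormalized weighted form \(\int w(x)q(x)\phi (p(x)/q(x))dx\).

```python
# A minimal numerical sketch (illustrative assumptions, not the paper's exact
# definition): a Csiszar phi-divergence localized by a weight w supported on a
# region of interest, approximated on a grid by the trapezoidal rule.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

x = np.linspace(-8.0, 8.0, 4001)             # integration grid
p = norm.pdf(x, loc=0.0, scale=1.0)          # density of P
q = norm.pdf(x, loc=1.0, scale=1.5)          # density of Q

def phi(u):
    # Kullback-Leibler type generator with phi(1) = 0 and phi'(1) = 0
    return u * np.log(u) - u + 1.0

def weighted_divergence(p, q, w, x):
    # int w(x) q(x) phi(p(x)/q(x)) dx, a weighted phi-divergence
    return trapezoid(w * q * phi(p / q), x)

w_local = ((x > -1.0) & (x < 1.0)).astype(float)  # localize on (-1, 1)
w_global = np.ones_like(x)                        # recovers the global case

print(weighted_divergence(p, q, w_local, x))   # discrepancy on (-1, 1) only
print(weighted_divergence(p, q, w_global, x))  # classic divergence of P from Q
```

With the global weight the quantity reduces to an ordinary \(\phi \)-divergence; shrinking the support of w confines the comparison to the region of interest, which is the behavior the local divergences of this paper formalize.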
References
Ali SM, Silvey SD (1966) A general class of coefficients of divergence of one distribution from another. J R Stat Soc Ser B 28:131–142
Basu A, Shioya H, Park C (2011) Statistical inference, the minimum distance approach. Chapman & Hall/CRC, Boca Raton
Cressie N, Read TRC (1984) Multinomial goodness-of-fit tests. J R Stat Soc Ser B 46:440–464
Csiszár I (1963) Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud Akad Mat Kutató Int Közl 8:85–108
Csiszár I (1967) Information-type measures of difference of probability distributions and indirect observations. Stud Sci Math Hung 2:299–318
Csiszár I, Körner J (1981) Information theory. Coding theorems for discrete memoryless systems. Akadémiai Kiadó, Budapest
Ebrahimi N, Soofi ES, Soyer R (2010) Information measures in perspective. Int Stat Rev 78:383–412
Johnson RA, Wichern DW (1992) Applied multivariate statistical analysis, 3rd edn. Prentice Hall International Editions, Englewood Cliffs
Kagan AM (1963) On the theory of Fisher’s information quantity. Dokl Akad Nauk SSSR 151:277–278
Kullback S (1959) Information theory and statistics. Wiley, New York
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
Landaburu E, Morales D, Pardo L (2005) Divergence-based estimation and testing with misclassified data. Stat Pap 46:397–409
Landaburu E, Pardo L (2000) Goodness of fit tests with weights in the classes based on \((h,\phi )\)-divergences. Kybernetika 36:589–602
Landaburu E, Pardo L (2003) Minimum \((h,\phi )\) -divergences estimators with weights. Appl Math Comput 140:15–28
Liese F, Vajda I (1987) Convex statistical distances. Teubner Texts in Mathematics, Leipzig
Liese F, Vajda I (2006) On divergences and informations in statistics and information theory. IEEE Trans Inf Theory 52:4394–4412
McElroy T, Holan S (2009) A local spectral approach for assessing time series model misspecification. J Multivar Anal 100:604–621
Morales D, Pardo L, Vajda I (2000) Rényi statistics in directed families of exponential experiments. Statistics 34:151–174
Morales D, Pardo L, Pardo MC, Vajda I (2004) Rényi statistics for testing composite hypotheses in general exponential models. Statistics 38:133–147
Nielsen F, Nock R (2011) On Rényi and Tsallis entropies and divergences for exponential families. arXiv:1105.3259v1 [cs.IT] 17 May 2011
Papaioannou T (1986) Measures of information. In: Kotz S, Johnson NL (eds) Encyclopedia of statistical sciences, vol 5. Wiley, New York, pp 391–397
Papaioannou T (2001) On distances and measures of information: a case of diversity. In: Charalambides CA, Koutras MV, Balakrishnan N (eds) Probability and statistical models with applications. Chapman & Hall/CRC, Boca Raton, pp 503–515
Pardo L (2006) Statistical inference based on divergence measures. Chapman & Hall/CRC, Boca Raton
Rényi A (1960) On measures of entropy and information. In: Proceedings of the 4th Berkeley symposium on mathematical statistics and probability, vol 1, Berkeley, pp 547–561
Soofi E (2000) Principal information theoretic approaches. J Am Stat Assoc 95:1349–1353
Soofi E, Retzer JJ (2002) Information indices: unification and applications. Information and entropy econometrics. J Econom 107:17–40
Stummer W, Vajda I (2010) On divergences of finite measures and their applicability in statistics and information theory. Statistics 44:169–187
Ullah A (1996) Entropy, divergence and distance measures with econometric applications. J Stat Plan Inference 49:137–162
Vajda I (1972) On the f-divergence and singularity of probability measures. Period Math Hung 2(1–4):223–234
Vajda I (1973) \(\chi ^{\alpha }\)-divergence and generalized Fisher's information. In: Transactions of the sixth Prague conference on information theory, statistical decision functions, random processes, pp 873–886
Vajda I (1989) Theory of statistical inference and information. Kluwer Academic Publishers, Dordrecht
Vajda I (1995) Information theoretic methods in statistics. Research report no. 1834, Academy of Sciences of the Czech Republic, Institute of Information Theory and Automation, Prague
Zografos K (2008) On Mardia’s and Song’s measures of kurtosis in elliptical distributions. J Multivar Anal 99:858–879
Zografos K, Nadarajah S (2005) Expressions for Rényi and Shannon entropies for multivariate distributions. Stat Probab Lett 71:71–84
Acknowledgments
The authors are grateful to a reviewer for valuable comments and suggestions that have improved the presentation of the paper.
Appendices
Appendix 1
This appendix provides a detailed proof of Theorem 1.
Proof of Theorem 1
(a) It is clear, from (9) and (10), that \(0\le \widetilde{D}_{\phi }^{R}(P,Q)\). We proceed with the upper bound of \(\widetilde{D}_{\phi }^{R}(P,Q)\). Given that \(\overline{ \phi }(1)=0\) and motivated by a similar proof in Stummer and Vajda (2010, p. 174), we can write
Define the function
Then, for \(u=\frac{q(x)}{p(x)}=\frac{q}{p}\), \(x\in {\mathcal {X}}\),
Hence,
and taking into account that \(\phi ^{*}(u)=u\phi \left( 1/u\right) \), \( u>0\),
Therefore, from Eqs. (24) and (27) we conclude
On the other hand, based on Stummer and Vajda (2010, p. 174), for a convex function \(\phi \in \Phi ^{*}\) which is strictly convex at 1, with \( \phi _{+}^{\prime }(1)=0\), it is true that
Applying inequality (29) to \(\phi =\overline{\phi }\), it is clear that on the subset \(\left\{ x\in {\mathcal {X}}:p(x)<q(x)\right\} \) of \( {\mathcal {X}}\), it holds that \(0\le \overline{\phi }\left( \frac{p}{q}\right) \le \overline{\phi }(0)\), and therefore
Moreover, based on the non-negativity of r and q on any subset of \( {\mathcal {X}}\), it is true that
So, the last two inequalities lead to
In a manner quite similar to the above, applying inequality (29) to \( \phi =\overline{\phi ^{*}}\), it is clear that on the subset \(\left\{ x\in {\mathcal {X}}:q(x)<p(x)\right\} \) of \({\mathcal {X}}\) it holds that \(0\le \overline{\phi ^{*}}\left( \frac{q}{p}\right) \le \overline{\phi ^{*}}(0)\), and therefore
Hence,
A combination of (28), (30) and (31) gives
Based on (7) and (25), \(\overline{\phi }(0)=\phi (0)+\phi _{+}^{\prime }(1)\) and \(\overline{\phi ^{*}}(0)=\phi ^{*}(0)-\phi _{+}^{\prime }(1)\). These identities along with the previous inequality complete the proof of part (a) of the theorem.
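As a concrete instance of the conjugacy \(\phi ^{*}(u)=u\phi \left( 1/u\right) \) exploited in (27) above: for the Kullback–Leibler generator,
\[
\phi (u)=u\log u\quad \Longrightarrow \quad \phi ^{*}(u)=u\,\phi \left( \tfrac{1}{u}\right) =u\cdot \tfrac{1}{u}\log \tfrac{1}{u}=-\log u,\qquad u>0,
\]
so that \(\phi \) and \(\phi ^{*}\) generate the direct and the reverse Kullback–Leibler divergences, respectively.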
(b) To proceed with the proof of part (b) of the theorem, suppose first that \(P=Q\). Then, it is clear from (10) that \(\widetilde{D} _{\phi }^{R}(P,P)=D_{\phi }^{R}(P,P)=\phi (1)=0\), because \(\phi \in \Phi ^{*}\).
Conversely, let \(\widetilde{D}_{\phi }^{R}(P,Q)=0\). Taking into account (9) and the fact that \(\phi (1)=0\),
a.e. with respect to the measure \(\mu \), since the Radon–Nikodym derivative r is positive on \({\mathcal {X}}\). On the other hand, based on Vajda (1989, p. 58),
because \(\phi \) is strictly convex at 1. Therefore, taking into account the above inequality, the only way for equality (32) to hold is that \(\frac{p(x)}{q(x)}=1\) a.e., that is, \(P=Q\), which completes the proof of part (b) of the theorem.
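To make the role of strict convexity transparent, note that the adjusted function \(\overline{\phi }\) of (25), being convex with \(\overline{\phi }(1)=0\) and \(\overline{\phi }_{+}^{\prime }(1)=0\), satisfies
\[
\overline{\phi }(u)\ge 0,\qquad u>0,
\]
with strict inequality for \(u\ne 1\) whenever \(\phi \) is strictly convex at 1. Hence a vanishing integral against the positive weight r forces the integrand, and with it \(\frac{p}{q}-1\), to vanish a.e., which is, in essence, the mechanism of the argument above.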
(c) Suppose that \(P\perp Q\). Then, following Vajda (1972, p. 227), \(u=\frac{dQ}{d(P+Q)}=0\;[P]\) and \(u=\frac{dQ}{d(P+Q)}=1\) [Q]. Taking into account that \(u=\frac{q}{p+q}\), we conclude that if \(P\perp Q\), then \(q(x)=0\), a.e. \(x\in {\mathcal {X}}\,[P]\) and \(p(x)=0\), a.e. \(x\in {\mathcal {X}}\;[Q]\). Equation (28) is then refined as follows,
and subject to the condition \(P\perp Q,\)
On the other hand, because \(p(x)=0\), a.e. \(x\in {\mathcal {X}}\) [Q], it is clear that
since \(Q(\{q=p\})=0\), by taking into account that \(P\perp Q\). This last equality leads to \(\int _{\{p\ge q\}}r(x)dQ(x)=0,\) and therefore
Similarly, it can be shown that \(P(\{q\ge p\})=0\) and hence
Equations (33), (34) and (35) give that if \(P\perp Q\) then \(\widetilde{D}_{\phi }^{R}(P,Q)=\overline{\phi }\left( 0\right) \xi _{0}+\overline{\phi ^{*}}\left( 0\right) \xi _{1}\) which completes the proof of this part of the theorem in view of the equations \(\overline{\phi } (0)=\phi (0)+\phi _{+}^{\prime }(1)\) and \(\overline{\phi ^{*}}(0)=\phi ^{*}(0)-\phi _{+}^{\prime }(1)\).
It remains to prove that if \(\phi (0)+\phi ^{*}(0)<\infty \) and \( \widetilde{D}_{\phi }^{R}(P,Q)=\overline{\phi }\left( 0\right) \xi _{0}+ \overline{\phi ^{*}}\left( 0\right) \xi _{1}\), then \(P\perp Q\). Relationships (24) and (29) immediately lead to
This last inequality, together with the assumptions \(\widetilde{D}_{\phi }^{R}(P,Q)=\overline{\phi }\left( 0\right) \xi _{0}+\overline{\phi ^{*}}\left( 0\right) \xi _{1}\) and \(\phi (0)+\phi ^{*}(0)<\infty \), leads to
and therefore
This last equation leads to
or
since the Radon–Nikodym derivative r is positive on \({\mathcal {X}}\). This last conclusion proves that \(P\perp Q\), and the proof of the theorem is completed. \(\square \)
Appendix 2
This appendix provides a detailed proof of Proposition 3.
Proof of Proposition 3
Based on (14), straightforward calculations give
Taking into account (19) and (21),
On the other hand, it can easily be shown that \(E_{\theta _{j}}\left( f_{\omega }(X)\right) =\int _{{\mathcal {X}}}f_{\omega }(x)f_{C}(x,\theta _{j})d\mu (x)\), \(j=1,2\), with \(f_{\omega }\) defined by (15), is given by
and therefore
The result (18) follows as an application of (13), (36) and (37). \(\square \)
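For orientation, a minimal version of the identity that typically drives such calculations, under the simplifying assumption that the density has the canonical exponential family form \(f_{C}(x,\theta )=\exp \left\{ \theta ^{\top }T(x)-M_{C}(\theta )\right\} \) with respect to \(\mu \) (any carrier term is absorbed into \(\mu \); the notation \(M_{C}\) for the cumulant function is assumed here, matching (15)–(21) only loosely):
\[
E_{\theta }\left( e^{\omega ^{\top }T(X)}\right) =\int _{{\mathcal {X}}}e^{(\theta +\omega )^{\top }T(x)-M_{C}(\theta )}d\mu (x)=\exp \left\{ M_{C}(\theta +\omega )-M_{C}(\theta )\right\} ,
\]
valid whenever \(\theta +\omega \) belongs to the natural parameter space.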
Appendix 3
This appendix provides a detailed proof of Proposition 4.
Proof of Proposition 4
Based on Proposition 3,
with \(\theta _{1}=(\theta _{11},\theta _{12})=(\Sigma _{1}^{-1}\mu _{1},- \frac{1}{2}\Sigma _{1}^{-1})\), \(\theta _{2}=(\theta _{21},\theta _{22})=(\Sigma _{2}^{-1}\mu _{2},-\frac{1}{2}\Sigma _{2}^{-1})\), \(\omega =(\omega _{1},\omega _{2})=(\Sigma ^{-1}\mu ,-\frac{1}{2}\Sigma ^{-1})\) and
Based on (22),
On the other hand,
and it is immediate to see, by means of (22), that
Taking into account the identity (cf. Pardo 2006, p. 49)
straightforward algebra entails that
It remains to evaluate \(M_{C,\lambda }^{(2)}(\theta _{1},\theta _{2},\omega )\), given by (39). It is easy to see that
and therefore
with \((\lambda +1)\Sigma _{1}^{-1}-\lambda \Sigma _{2}^{-1}+\Sigma ^{-1}>0,\) for \(\lambda \ne 0,-1\). Based now on (39), (40) and (43),
with
Taking into account that \(h(X)=0\) (cf. Eq. (22)), the result follows as an application of (38), (42) and (44). \(\square \)
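Although the identity of Pardo (2006, p. 49) is not reproduced here, computations of this type ultimately rest on the standard Gaussian integral, recorded for completeness: for a symmetric positive definite \(p\times p\) matrix A (p denoting the dimension) and \(b\in \mathbb {R}^{p}\),
\[
\int _{\mathbb {R}^{p}}\exp \left( -\tfrac{1}{2}x^{\top }Ax+b^{\top }x\right) dx=(2\pi )^{p/2}\left| A\right| ^{-1/2}\exp \left( \tfrac{1}{2}b^{\top }A^{-1}b\right) ,
\]
which explains both the positive definiteness requirement \((\lambda +1)\Sigma _{1}^{-1}-\lambda \Sigma _{2}^{-1}+\Sigma ^{-1}>0\) and the determinant factors appearing in the closed-form expression.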
Appendix 4
This appendix provides a detailed proof of Proposition 5.
Proof of Proposition 5
(a) Based on (23) and taking into account (12), straightforward algebra leads to the desired result.
(b) Based on part (a) and on Eq. (22),
with
Simple algebraic manipulations lead to,
On the other hand, taking into account that
Eq. (46) entails,
Then,
and
Based on (46) and (50), algebraic manipulations entail that,
Hence, taking into account (48), (49) and (51)
with \(\mu ^{*}=E_{\theta _{1}+\omega }(X).\) Based on the fact that, for \( i=1,2\),
Eqs. (42), (45), (47) and (52) complete the proof of part (b) of the proposition. \(\square \)
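For comparison, recall the classical global Kullback–Leibler divergence between two multivariate normal models \(N_{p}(\mu _{1},\Sigma _{1})\) and \(N_{p}(\mu _{2},\Sigma _{2})\), a standard fact recorded here only as a reference point for the local expressions of the proposition:
\[
D_{KL}=\tfrac{1}{2}\left[ \mathrm{tr}\left( \Sigma _{2}^{-1}\Sigma _{1}\right) +(\mu _{2}-\mu _{1})^{\top }\Sigma _{2}^{-1}(\mu _{2}-\mu _{1})-p+\log \frac{\left| \Sigma _{2}\right| }{\left| \Sigma _{1}\right| }\right] .
\]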
Cite this article
Avlogiaris, G., Micheas, A. & Zografos, K. On local divergences between two probability measures. Metrika 79, 303–333 (2016). https://doi.org/10.1007/s00184-015-0556-6
Keywords
- \(\phi \)-divergence
- Kullback–Leibler divergence
- Cressie and Read power divergence
- Local divergence
- Exponential family