Abstract
We study the geometry of probability distributions with respect to a generalized family of Csiszár f-divergences. A member of this family is the relative \(\alpha \)-entropy, which is also a Rényi analog of relative entropy in information theory and is known as the logarithmic or projective power divergence in statistics. We apply Eguchi’s theory to derive the Fisher information metric and the dual affine connections arising from these generalized divergence functions. This enables us to arrive at a more widely applicable version of the Cramér–Rao inequality, which provides a lower bound on the variance of an estimator for an escort of the underlying parametric probability distribution. We then extend the Amari–Nagaoka dually flat structure of the exponential and mixture models to other distributions with respect to the aforementioned generalized metric. We show that these formulations lead us to unbiased and efficient estimators for the escort model. Finally, we compare our work with prior results on generalized Cramér–Rao inequalities that were derived from non-information-geometric frameworks.
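As a concrete illustration of the escort construction mentioned above, the \(\alpha \)-escort of a finite distribution \(p\) is \(p(x)^{\alpha }/\sum _y p(y)^{\alpha }\) (the standard definition in nonextensive statistics; the function name and the sample values below are ours, for illustration only):

```python
import numpy as np

def escort(p, alpha):
    """Return the alpha-escort of a finite probability vector p:
    the entries p_i**alpha, renormalized to sum to one."""
    p = np.asarray(p, dtype=float)
    w = p ** alpha
    return w / w.sum()

# alpha = 1 recovers the original distribution;
# alpha < 1 flattens it, alpha > 1 sharpens it.
p = np.array([0.7, 0.2, 0.1])
print(escort(p, 1.0))
print(escort(p, 0.5))
print(escort(p, 2.0))
```

Note that the escort is itself a probability distribution for every \(\alpha > 0\), which is what allows the generalized Cramér–Rao bound to be stated for estimators of the escort model.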
Notes
A divergence function on \(S\times S\) is a function \(D\) satisfying \(D(p,q) \ge 0\), with equality iff \(p=q\).
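A standard example satisfying this definition, included here for illustration, is the Kullback–Leibler divergence on the probability simplex:

```latex
D_{\mathrm{KL}}(p \,\|\, q) \;=\; \sum_{x} p(x)\log\frac{p(x)}{q(x)} \;\ge\; 0,
\qquad \text{with equality iff } p = q,
```

where non-negativity follows from Jensen's inequality applied to the convex function \(t \mapsto t\log t\).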
References
Amari, S.: Information geometry and its applications. Springer, New York (2016)
Amari, S., Cichocki, A.: Information geometry of divergence functions. Bull. Polish Acad. Sci. Tech. Sci. 58(1), 183–195 (2010)
Amari, S., Nagaoka, H.: Methods of information geometry. Oxford University Press, Oxford (2000)
Arıkan, E.: An inequality on guessing and its application to sequential decoding. IEEE Trans. Inf. Theory 42(1), 99–105 (1996)
Ay, N., Jost, J., Lê, H.V., Schwachhöfer, L.: Information geometry. Springer, New York (2017)
Basu, A., Shioya, H., Park, C.: Statistical inference: The minimum distance approach. In: Monographs on Statistics and Applied Probability. Chapman & Hall/CRC Press, London (2011)
Bercher, J.F.: On a (\(\beta \), q)-generalized Fisher information and inequalities involving q-Gaussian distributions. J. Math. Phys. 53(063303), 1–12 (2012)
Bercher, J.F.: On generalized Cramér-Rao inequalities, generalized Fisher information and characterizations of generalized q-Gaussian distributions. J. Phys. A Math. Theor. 45(25), 255303 (2012)
Blumer, A.C., McEliece, R.J.: The Rényi redundancy of generalized Huffman codes. IEEE Trans. Inf. Theory 34(5), 1242–1249 (1988)
Bunte, C., Lapidoth, A.: Codes for tasks and Rényi entropy. IEEE Trans. Inf. Theory 60(9), 5065–5076 (2014)
Campbell, L.L.: A coding theorem and Rényi’s entropy. Inf. Control 8, 423–429 (1965)
Cichocki, A., Amari, S.: Families of alpha-, beta- and gamma-divergences: Flexible and robust measures of similarities. Entropy 12, 1532–1568 (2010)
Cover, T.M., Thomas, J.A.: Elements of information theory. Wiley, Hoboken (2012)
Csiszár, I.: Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Stat. 19(4), 2032–2066 (1991)
Eguchi, S.: Geometry of minimum contrast. Hiroshima Math. J. 22(3), 631–647 (1992)
Eguchi, S., Kato, S.: Entropy and divergence associated with power function and the statistical application. Entropy 12(2), 262–274 (2010)
Eguchi, S., Komori, O., Kato, S.: Projective power entropy and maximum Tsallis entropy distributions. Entropy 13(10), 1746–1764 (2011)
van Erven, T., Harremoës, P.: Rényi divergence and Kullback–Leibler divergence. IEEE Trans. Inf. Theory 60(7), 3797–3820 (2014)
Fujisawa, H., Eguchi, S.: Robust parameter estimation with a small bias against heavy contamination. J. Multivar. Anal. 99, 2053–2081 (2008)
Furuichi, S.: On the maximum entropy principle and the minimization of the Fisher information in Tsallis statistics. J. Math. Phys. 50(013303), 1–12 (2009)
Huleihel, W., Salamatian, S., Médard, M.: Guessing with limited memory. In: IEEE International Symposium on Information Theory, pp. 2253–2257 (2017)
Jones, M.C., Hjort, N.L., Harris, I.R., Basu, A.: A comparison of related density based minimum divergence estimators. Biometrika 88(3), 865–873 (2001)
Karthik, P.N., Sundaresan, R.: On the equivalence of projections in relative \(\alpha \)-entropy and Rényi divergence. In: National Conference on Communication, pp. 1–6 (2018)
Kumar, M.A., Mishra, K.V.: Information geometric approach to Bayesian lower error bounds. In: IEEE International Symposium on Information Theory, pp. 746–750 (2018)
Kumar, M.A., Sason, I.: Projection theorems for the Rényi divergence on alpha-convex sets. IEEE Trans. Inf. Theory 62(9), 4924–4935 (2016)
Kumar, M.A., Sundaresan, R.: Minimization problems based on relative \(\alpha \)-entropy I: Forward projection. IEEE Trans. Inf. Theory 61(9), 5063–5080 (2015)
Kumar, M.A., Sundaresan, R.: Minimization problems based on relative \(\alpha \)-entropy II: Reverse projection. IEEE Trans. Inf. Theory 61(9), 5081–5095 (2015)
Lutwak, E., Yang, D., Lv, S., Zhang, G.: Extensions of Fisher information and Stam’s inequality. IEEE Trans. Inf. Theory 58(3), 1319–1327 (2012)
Lutwak, E., Yang, D., Zhang, G.: Cramér-Rao and moment-entropy inequalities for Rényi entropy and generalized Fisher information. IEEE Trans. Inf. Theory 51(1), 473–478 (2005)
Mishra, K.V., Kumar, M.A.: Generalized Bayesian Cramér-Rao inequality via information geometry of relative \(\alpha \)-entropy. In: IEEE Annual Conference on Information Science and Systems, pp. 1–6 (2020)
Naudts, J.: Estimators, escort probabilities, and \(\phi \)-exponential families in statistical physics. J. Inequal. Pure Appl. Math. 5(4), 1–15 (2004)
Naudts, J.: Generalised thermostatistics. Springer, New York (2011)
Notsu, A., Komori, O., Eguchi, S.: Spontaneous clustering via minimum gamma-divergence. Neural Comput. 26(2), 421–448 (2014)
Rényi, A.: On measures of entropy and information. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, pp. 547–561 (1961)
Sundaresan, R.: Guessing under source uncertainty. IEEE Trans. Inf. Theory 53(1), 269–287 (2007)
Tsallis, C., Mendes, R.S., Plastino, A.R.: The role of constraints within generalized nonextensive statistics. Phys. A 261, 534–554 (1998)
Zhang, J.: Divergence function, duality, and convex analysis. Neural Comput. 16, 159–195 (2004)
Acknowledgements
The authors are indebted to Prof. Rajesh Sundaresan of the Indian Institute of Science, Bengaluru for his helpful suggestions and discussions that improved the presentation of this material substantially. We sincerely thank the anonymous reviewers for their constructive suggestions that significantly improved the presentation of the manuscript.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
A Proof of Theorem 3
Taking the logarithm on both sides of (48),
Partial derivative produces
or
Taking expectations on both sides of (64), we obtain
Since the expected value of the score function vanishes (the left-hand side of (65)), we have
Substituting (66) into (64), we get
where
Moreover, (66) implies that \(\log M(\theta )\) should be the potential (if it exists).
The Riemannian metric becomes
This further strengthens our expectation that the \(\eta _i\)’s are the dual parameters of the \(\theta _i\)’s. Surprisingly, however, this turns out not to be the case, as we now show. We have
Let \(R_{\theta }(x) = q(x)^{\alpha -1} + \sum \limits _{j=1}^k\theta _j f_j(x)\). Partial differentiation produces
Substituting (71) into (70) gives
This shows that the \(\eta _i\)’s cannot be the dual parameters of the \(\theta _i\)’s for the statistical model \(\mathbb {M}^{(\alpha )}\). This completes the proof.
Cite this article
Ashok Kumar, M., Vijay Mishra, K. Cramér–Rao lower bounds arising from generalized Csiszár divergences. Info. Geo. 3, 33–59 (2020). https://doi.org/10.1007/s41884-020-00029-z