Abstract
Most methods for describing the relationship among random variables require specific probability distributions and assumptions about the random variables. Mutual information, an entropy-based measure of dependency among random variables, requires neither a specific distribution nor such assumptions. Redundancy, a multivariate analogue of mutual information, has also been proposed as a measure of dependency. In this paper, the concepts of redundancy and mutual information are explored as applied to multi-dimensional categorical data. We find that mutual information and redundancy for categorical data can be expressed as functions of the generalized likelihood ratio statistic under several kinds of independence log-linear models; as a consequence, mutual information and redundancy can also be used to analyze contingency tables stochastically. Whereas the generalized likelihood ratio statistic for testing the goodness of fit of a log-linear model is sensitive to the sample size, the redundancy for categorical data does not depend on the sample size but only on the cell probabilities.
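To make the stated connection concrete: for a two-way contingency table, the generalized likelihood ratio statistic for the independence log-linear model satisfies G² = 2n·Î(X; Y), where Î is the plug-in mutual information (in nats) computed from the empirical cell probabilities. The following Python sketch illustrates this identity on a hypothetical 3×3 table of counts; the function names and data are illustrative assumptions, not taken from the paper.

import numpy as np

def mutual_information(table):
    """Plug-in mutual information (in nats) of a two-way contingency table."""
    p = table / table.sum()            # joint cell probabilities p_ij
    px = p.sum(axis=1, keepdims=True)  # row marginals p_i+
    py = p.sum(axis=0, keepdims=True)  # column marginals p_+j
    mask = p > 0                       # treat 0 * log 0 as 0
    return float((p[mask] * np.log(p[mask] / (px * py)[mask])).sum())

def likelihood_ratio_g2(table):
    """G^2 = 2 * sum O_ij * log(O_ij / E_ij) under the independence model."""
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    mask = table > 0
    return float(2 * (table[mask] * np.log(table[mask] / expected[mask])).sum())

# A hypothetical 3x3 table of observed counts
counts = np.array([[30.0, 10.0, 5.0],
                   [12.0, 40.0, 8.0],
                   [6.0, 14.0, 25.0]])

n = counts.sum()
mi = mutual_information(counts)
g2 = likelihood_ratio_g2(counts)
print(f"I_hat = {mi:.6f} nats, G^2 = {g2:.4f}, 2*n*I_hat = {2 * n * mi:.4f}")

Because Î depends only on the cell proportions, scaling every count by a constant leaves Î unchanged while G² grows linearly with n, which is precisely the sample-size sensitivity contrasted in the abstract. The analogous identity links the redundancy Σᵢ H(Xᵢ) − H(X₁, …, X_m) of an m-way table to G² under the complete-independence log-linear model.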
Additional information
This paper was supported by the Samsung Research Fund, Sungkyunkwan University, 2006.
Cite this article
Hong, C.S., Kim, B.J. Mutual information and redundancy for categorical data. Stat Papers 52, 17–31 (2011). https://doi.org/10.1007/s00362-009-0196-x