Abstract
This chapter develops the multivariate entropies, i.e., entropies for three or more variables. The Shannon mutual information is negative in the standard probability-theory example of three random variables that are pair-wise independent but not mutually independent. When the values of a random variable carry metrical data (e.g., a real-valued variable), there is a natural notion of metrical logical entropy, and it is twice the variance, which makes the connection with basic concepts of statistics. The twice-variance formula shows how to extend logical entropy to continuous random variables. Boltzmann entropy is analyzed to show that Shannon entropy arises in statistical mechanics only as a numerical approximation with the attractive property of analytical tractability. Edwin Jaynes’s Method of MaxEntropy uses the maximization of the Shannon entropy to generalize the indifference principle: when other constraints rule out the uniform distribution, Jaynes’s recommendation is to choose the distribution that maximizes the Shannon entropy. The maximization of logical entropy yields a different distribution. Which solution is best? The maximum-logical-entropy solution is the closest to the uniform distribution in terms of the ordinary Euclidean notion of distance. The chapter ends by giving the transition to coding theory.
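The following minimal Python sketch (not part of the chapter) illustrates three of these claims under stated assumptions: it computes the inclusion-exclusion form of the triple mutual information for a pair-wise-independent-but-not-mutually-independent example (here taken to be two fair bits and their XOR), checks the twice-variance identity assuming the metrical logical entropy of a finite real-valued variable is the pair-wise expectation \(\sum_{i,j}p_ip_j(x_i-x_j)^2\), and compares the maximum-Shannon-entropy and maximum-logical-entropy distributions for a six-sided die constrained to a mean of 4.5 (an illustrative constraint, not necessarily the chapter's own example).

```python
import math
from itertools import product

# --- 1. Negative multivariate Shannon mutual information ----------------
# X, Y independent fair bits, Z = X XOR Y: every pair is independent, but
# the three variables together are not mutually independent.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def H(idx):
    """Shannon entropy (bits) of the marginal on the coordinates in idx."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

# Inclusion-exclusion (McGill/Fano) form of the triple mutual information.
I3 = (H((0,)) + H((1,)) + H((2,))
      - H((0, 1)) - H((0, 2)) - H((1, 2)) + H((0, 1, 2)))
print(I3)  # -1.0 bit: the multivariate Shannon mutual information is negative

# --- 2. Metrical logical entropy as twice the variance -------------------
# Assuming the metrical logical entropy of a real-valued variable is the
# expected squared difference of two independent draws,
# sum_{i,j} p_i p_j (x_i - x_j)^2, it equals 2 * Var(X).
xs = [1.0, 2.0, 5.0, 9.0]
ps = [0.1, 0.2, 0.3, 0.4]
mean = sum(p * x for p, x in zip(ps, xs))
var = sum(p * (x - mean) ** 2 for p, x in zip(ps, xs))
h_metrical = sum(pi * pj * (xi - xj) ** 2
                 for pi, xi in zip(ps, xs) for pj, xj in zip(ps, xs))
print(h_metrical, 2 * var)  # identical values (up to rounding): 18.88, 18.88

# --- 3. Max Shannon entropy vs. max logical entropy under a constraint ---
# Illustrative constraint: a six-sided die whose mean is forced to be 4.5.
outcomes = range(1, 7)

def gibbs_mean(beta):
    """Mean of the max-Shannon-entropy (Gibbs) distribution p_i ~ exp(beta*i)."""
    w = [math.exp(beta * i) for i in outcomes]
    return sum(i * wi for i, wi in zip(outcomes, w)) / sum(w)

lo, hi = 0.0, 2.0                      # bisect for the beta giving mean 4.5
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if gibbs_mean(mid) < 4.5 else (lo, mid)
w = [math.exp(lo * i) for i in outcomes]
p_shannon = [wi / sum(w) for wi in w]

# Max logical entropy = min sum p_i^2; since nonnegativity is not binding
# here, the quadratic optimum is linear in i: p_i = 1/6 + c*(i - 3.5).
c = (4.5 - 3.5) / sum((i - 3.5) ** 2 for i in outcomes)
p_logical = [1 / 6 + c * (i - 3.5) for i in outcomes]

def dist_to_uniform(p):
    """Euclidean distance from the uniform distribution on six outcomes."""
    return math.sqrt(sum((pi - 1 / 6) ** 2 for pi in p))

print(dist_to_uniform(p_logical), dist_to_uniform(p_shannon))
# approx. 0.239 vs 0.247: the max-logical-entropy solution is closer to uniform
```

Because maximizing the logical entropy \(1-\sum_i p_i^2\) subject to linear constraints is the same as minimizing \(\sum_i p_i^2\), its solution is the Euclidean projection of the uniform distribution onto the constraint set, which is why it comes out closer to uniform than the Gibbs-type maximum-Shannon-entropy solution.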
Notes
1. The usual version of the inclusion-exclusion principle would be \(h(X,Y,Z)=h(X)+h(Y)+h(Z)-m(X,Y)-m(X,Z)-m(Y,Z)+m(X,Y,Z)\), but \(m(X,Y)=h(X)+h(Y)-h(X,Y)\) and so forth, so substituting for \(m(X,Y)\), \(m(X,Z)\), and \(m(Y,Z)\) gives the formula for \(m(X,Y,Z)\): \(m(X,Y,Z)=h(X)+h(Y)+h(Z)-h(X,Y)-h(X,Z)-h(Y,Z)+h(X,Y,Z)\).
2. The multivariate generalization of the ‘Shannon’ mutual information was developed not by Shannon but by William J. McGill [18] and Robert M. Fano [9, 10] at MIT in the early 1950s, and independently by Nelson M. Blachman [4]. The criterion for its being the ‘correct’ generalization seems to be that it satisfies the generalized Venn diagram formulas of the inclusion-exclusion principle.
3. Fano had earlier noted that, for three or more variables, the mutual information could be negative [10, p. 58].
4. These formulas, for the equiprobable case, were derived using the “difference method” by Zhang et al. [24] as new formulas for variance and covariance, although it is doubtful that anything so basic could really be new.
5. The physical Boltzmann constant is irrelevant for our information theoretic purposes and is ignored.
6. If one used more terms, then the numerical approximation would be even better, but the resulting expression would be unworkable and not a Shannon entropy formula. One writer notes that there is a much better approximation, \(\ln(n!)\approx \ln\sqrt{2\pi}+\left(n+\tfrac{1}{2}\right)\ln(n)-n\), before proceeding with the usual approximation [2, p. 533]. D. J. C. MacKay [16, p. 2] makes a similar observation. The two-term approximation is a ‘sweet spot’ between accuracy for very large n and analytical tractability; a short numerical comparison follows these notes.
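As a quick numerical check of the point in note 6 (a sketch, not from the chapter), the script below compares the exact \(\ln(n!)\), computed via the log-gamma function, with the two-term approximation \(n\ln(n)-n\) and with the longer approximation quoted above.

```python
import math

# Compare ln(n!) (exact, via lgamma) with the two-term Stirling
# approximation n*ln(n) - n used to obtain the Shannon entropy formula,
# and with the longer approximation ln(sqrt(2*pi)) + (n + 1/2)*ln(n) - n.
for n in (10, 100, 1000, 10_000):
    exact = math.lgamma(n + 1)                      # ln(n!)
    two_term = n * math.log(n) - n
    longer = math.log(math.sqrt(2 * math.pi)) + (n + 0.5) * math.log(n) - n
    print(n, exact, exact - two_term, exact - longer)
# The two-term error grows like (1/2)*ln(n), small relative to ln(n!) itself,
# while the longer approximation is accurate to within about 1/(12n).
```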
References
Abramson, Norman. 1963. Information Theory and Coding. New York: McGraw-Hill.
Atkins, Peter, Julio de Paula, and James Keeler. 2018. Atkins’ Physical Chemistry. 11th ed. Oxford UK: Oxford University Press.
Best, Michael J. 2017. Quadratic Programming with Computer Programs. Boca Raton FL: CRC Press.
Blachman, Nelson M. 1961. A Generalization of Mutual Information. Proc. IRE 49: 1331–1332.
Breiman, Leo, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. 1984. Classification and Regression Trees. Monterey CA: Wadsworth and Brooks/Cole Advanced Books and Software.
Cover, Thomas, and Joy Thomas. 1991. Elements of Information Theory. New York: John Wiley.
Csiszár, Imre, and János Körner. 1981. Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic Press.
Ellerman, E. Castedo. 2021. Variance Vs Entropy of One-hot Vectors. OSF Preprints. https://doi.org/10.31219/osf.io/43bme.
Fano, Robert M. 1950. The Transmission of Information II. Research Laboratory of Electronics Report 149. Cambridge MA: MIT.
Fano, Robert M. 1961. Transmission of Information. Cambridge MA: MIT Press.
Feller, William. 1968. An Introduction to Probability Theory and Its Applications Vol. 1. 3rd ed. New York: John Wiley.
Harper, Larry H. 2004. Global Methods for Combinatorial Isoperimetric Problems. Cambridge UK: Cambridge University Press.
Jaynes, Edwin T. 1978. Where do we stand on maximum entropy? In The Maximum Entropy Formalism, ed. Raphael D. Levine and Myron Tribus, 15–118. Cambridge MA: MIT.
Jaynes, Edwin T. 2003. Probability Theory: The Logic of Science. Edited by G. Larry Bretthorst. Cambridge UK: Cambridge University Press.
Kaplan, Wilfred. 1999. Maxima and Minima with Applications: Practical Optimization and Duality. New York: John Wiley & Sons.
MacKay, D. J. C. 2003. Information Theory, Inference, and Learning Algorithms. Cambridge UK: Cambridge University Press.
MacKay, Donald M. 1969. Information, Mechanism and Meaning. Cambridge: MIT Press.
McGill, William J. 1954. Multivariate information transmission. Transactions of the IRE Professional Group on Information Theory 4: 93–111. https://doi.org/10.1109/TIT.1954.1057469.
Nei, Masatoshi. 1973. Analysis of Gene Diversity in Subdivided Populations. Proc. Nat. Acad. Sci. U.S.A. 70: 3321–3.
Ramshaw, John D. 2018. The Statistical Foundations of Entropy. Singapore: World Scientific Publishing.
Tribus, Myron. 1978. Thirty Years of Information Theory. In The Maximum Entropy Formalism, ed. Raphael D. Levine and Myron Tribus, 1–14. Cambridge MA: MIT.
Weir, Bruce S. 1996. Genetic Data Analysis II: Methods for Discrete Population Genetic Data. Sunderland MA: Sinauer Associates.
Yeung, Raymond W. 2002. A First Course in Information Theory. New York: Springer Science+Business Media.
Zhang, Yuli, Huaiyu Wu, and Lei Cheng. 2012. Some new deformation formulas about variance and covariance. In Proceedings of 2012 International Conference on Modelling, Identification and Control (ICMIC2012), 987–992.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Ellerman, D. (2021). Further Developments of Logical Entropy. In: New Foundations for Information Theory. SpringerBriefs in Philosophy. Springer, Cham. https://doi.org/10.1007/978-3-030-86552-8_4
DOI: https://doi.org/10.1007/978-3-030-86552-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86551-1
Online ISBN: 978-3-030-86552-8