Abstract
This chapter develops the multivariate entropies, i.e., entropies for three or more variables. The Shannon mutual information is negative in the standard probability-theory example of three random variables that are pair-wise independent but not mutually independent. When the values of a random variable carry metrical data (e.g., a real-valued variable), there is a natural notion of metrical logical entropy, and it is twice the variance, which makes the connection with basic concepts of statistics. The twice-variance formula shows how to extend logical entropy to continuous random variables. Boltzmann entropy is analyzed to show that Shannon entropy arises in statistical mechanics only as a numerical approximation with the attractive property of analytical tractability. Edwin Jaynes’s Method of MaxEntropy uses the maximization of the Shannon entropy to generalize the indifference principle: when other constraints rule out the uniform distribution, Jaynes’s recommendation is to choose the distribution that maximizes the Shannon entropy. The maximization of logical entropy yields a different distribution. Which solution is best? The maximum-logical-entropy solution is the closest to the uniform distribution in terms of the ordinary Euclidean notion of distance. The chapter ends by giving the transition to coding theory.
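The following minimal Python sketch (not part of the chapter) illustrates three of these claims under stated assumptions: it computes the inclusion-exclusion form of the triple mutual information for a pair-wise-independent-but-not-mutually-independent example (here taken to be two fair bits and their XOR), checks the twice-variance identity assuming the metrical logical entropy of a finite real-valued variable is the pair-wise expectation \(\sum_{i,j}p_ip_j(x_i-x_j)^2\), and compares the maximum-Shannon-entropy and maximum-logical-entropy distributions for a six-sided die constrained to a mean of 4.5 (an illustrative constraint, not necessarily the chapter's own example).

```python
import math
from itertools import product

# --- 1. Negative multivariate Shannon mutual information ----------------
# X, Y independent fair bits, Z = X XOR Y: every pair is independent, but
# the three variables together are not mutually independent.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def H(idx):
    """Shannon entropy (bits) of the marginal on the coordinates in idx."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

# Inclusion-exclusion (McGill/Fano) form of the triple mutual information.
I3 = (H((0,)) + H((1,)) + H((2,))
      - H((0, 1)) - H((0, 2)) - H((1, 2)) + H((0, 1, 2)))
print(I3)  # -1.0 bit: the multivariate Shannon mutual information is negative

# --- 2. Metrical logical entropy as twice the variance -------------------
# Assuming the metrical logical entropy of a real-valued variable is the
# expected squared difference of two independent draws,
# sum_{i,j} p_i p_j (x_i - x_j)^2, it equals 2 * Var(X).
xs = [1.0, 2.0, 5.0, 9.0]
ps = [0.1, 0.2, 0.3, 0.4]
mean = sum(p * x for p, x in zip(ps, xs))
var = sum(p * (x - mean) ** 2 for p, x in zip(ps, xs))
h_metrical = sum(pi * pj * (xi - xj) ** 2
                 for pi, xi in zip(ps, xs) for pj, xj in zip(ps, xs))
print(h_metrical, 2 * var)  # identical values (up to rounding): 18.88, 18.88

# --- 3. Max Shannon entropy vs. max logical entropy under a constraint ---
# Illustrative constraint: a six-sided die whose mean is forced to be 4.5.
outcomes = range(1, 7)

def gibbs_mean(beta):
    """Mean of the max-Shannon-entropy (Gibbs) distribution p_i ~ exp(beta*i)."""
    w = [math.exp(beta * i) for i in outcomes]
    return sum(i * wi for i, wi in zip(outcomes, w)) / sum(w)

lo, hi = 0.0, 2.0                      # bisect for the beta giving mean 4.5
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if gibbs_mean(mid) < 4.5 else (lo, mid)
w = [math.exp(lo * i) for i in outcomes]
p_shannon = [wi / sum(w) for wi in w]

# Max logical entropy = min sum p_i^2; since nonnegativity is not binding
# here, the quadratic optimum is linear in i: p_i = 1/6 + c*(i - 3.5).
c = (4.5 - 3.5) / sum((i - 3.5) ** 2 for i in outcomes)
p_logical = [1 / 6 + c * (i - 3.5) for i in outcomes]

def dist_to_uniform(p):
    """Euclidean distance from the uniform distribution on six outcomes."""
    return math.sqrt(sum((pi - 1 / 6) ** 2 for pi in p))

print(dist_to_uniform(p_logical), dist_to_uniform(p_shannon))
# approx. 0.239 vs 0.247: the max-logical-entropy solution is closer to uniform
```

Because maximizing the logical entropy \(1-\sum_i p_i^2\) subject to linear constraints is the same as minimizing \(\sum_i p_i^2\), its solution is the Euclidean projection of the uniform distribution onto the constraint set, which is why it comes out closer to uniform than the Gibbs-type maximum-Shannon-entropy solution.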
Notes
1. The usual version of the inclusion-exclusion principle would be \(h(X,Y,Z)=h(X)+h(Y)+h(Z)-m(X,Y)-m(X,Z)-m(Y,Z)+m(X,Y,Z)\), but \(m(X,Y)=h(X)+h(Y)-h(X,Y)\) and so forth, so substituting for \(m(X,Y)\), \(m(X,Z)\), and \(m(Y,Z)\) gives the formula for \(m(X,Y,Z)\): \(m(X,Y,Z)=h(X)+h(Y)+h(Z)-h(X,Y)-h(X,Z)-h(Y,Z)+h(X,Y,Z)\).
2. The multivariate generalization of the ‘Shannon’ mutual information was developed not by Shannon but by William J. McGill [18] and Robert M. Fano [9, 10] at MIT in the early 1950s, and independently by Nelson M. Blachman [4]. The criterion for its being the ‘correct’ generalization seems to be that it satisfies the generalized Venn diagram formulas of the inclusion-exclusion principle.
3. Fano had earlier noted that, for three or more variables, the mutual information could be negative [10, p. 58].
4. These formulas, for the equiprobable case, were derived using the “difference method” by Zhang et al. [24] as new formulas for variance and covariance, although it is doubtful that anything so basic could really be new.
5. The physical Boltzmann constant is irrelevant for our information theoretic purposes and is ignored.
6. If one used more terms, then the numerical approximation would be even better, but the resulting expression would be unworkable and not a Shannon entropy formula. One writer notes that there is a much better approximation, \(\ln(n!)\approx \ln\sqrt{2\pi}+\left(n+\tfrac{1}{2}\right)\ln(n)-n\), before proceeding with the usual approximation [2, p. 533]. D. J. C. MacKay [16, p. 2] makes a similar observation. The two-term approximation is a ‘sweet spot’ between accuracy for very large n and analytical tractability; a short numerical comparison follows these notes.
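As a quick numerical check of the point in note 6 (a sketch, not from the chapter), the script below compares the exact \(\ln(n!)\), computed via the log-gamma function, with the two-term approximation \(n\ln(n)-n\) and with the longer approximation quoted above.

```python
import math

# Compare ln(n!) (exact, via lgamma) with the two-term Stirling
# approximation n*ln(n) - n used to obtain the Shannon entropy formula,
# and with the longer approximation ln(sqrt(2*pi)) + (n + 1/2)*ln(n) - n.
for n in (10, 100, 1000, 10_000):
    exact = math.lgamma(n + 1)                      # ln(n!)
    two_term = n * math.log(n) - n
    longer = math.log(math.sqrt(2 * math.pi)) + (n + 0.5) * math.log(n) - n
    print(n, exact, exact - two_term, exact - longer)
# The two-term error grows like (1/2)*ln(n), small relative to ln(n!) itself,
# while the longer approximation is accurate to within about 1/(12n).
```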
References
Abramson, Norman. 1963. Information Theory and Coding. New York: McGraw-Hill.
Atkins, Peter, Julio de Paula, and James Keeler. 2018. Atkins’ Physical Chemistry. 11th ed. Oxford UK: Oxford University Press.
Best, Michael J. 2017. Quadratic Programming with Computer Programs. Boca Raton FL: CRC Press.
Blachman, Nelson M. 1961. A Generalization of Mutual Information. Proc. IRE 49: 1331–1332.
Breiman, Leo, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. 1984. Classification and Regression Trees. Monterey CA: Wadsworth and Brooks/Cole Advanced Books and Software.
Cover, Thomas, and Joy Thomas. 1991. Elements of Information Theory. New York: John Wiley.
Csiszár, Imre, and János Körner. 1981. Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic Press.
Ellerman, E. Castedo. 2021. Variance Vs Entropy of One-hot Vectors. OSF Preprints. https://doi.org/10.31219/osf.io/43bme.
Fano, Robert M. 1950. The Transmission of Information II. Research Laboratory of Electronics Report 149. Cambridge MA: MIT.
Fano, Robert M. 1961. Transmission of Information. Cambridge MA: MIT Press.
Feller, William. 1968. An Introduction to Probability Theory and Its Applications Vol. 1. 3rd ed. New York: John Wiley.
Harper, Larry H. 2004. Global Methods for Combinatorial Isoperimetric Problems. Cambridge UK: Cambridge University Press.
Jaynes, Edwin T. 1978. Where do we stand on maximum entropy? In The Maximum Entropy Formalism, ed. Raphael D. Levine and Myron Tribus, 15–118. Cambridge MA: MIT.
Jaynes, Edwin T. 2003. Probability Theory: The Logic of Science. Edited by G. Larry Bretthorst. Cambridge UK: Cambridge University Press.
Kaplan, Wilfred. 1999. Maxima and Minima with Applications: Practical Optimization and Duality. New York: John Wiley & Sons.
MacKay, D. J. C. 2003. Information Theory, Inference, and Learning Algorithms. Cambridge UK: Cambridge University Press.
MacKay, Donald M. 1969. Information, Mechanism and Meaning. Cambridge: MIT Press.
McGill, William J. 1954. Multivariate information transmission. Transactions of the IRE Professional Group on Information Theory 4: 93–111. https://doi.org/10.1109/TIT.1954.1057469.
Nei, Masatoshi. 1973. Analysis of Gene Diversity in Subdivided Populations. Proc. Nat. Acad. Sci. U.S.A. 70: 3321–3.
Ramshaw, John D. 2018. The Statistical Foundations of Entropy. Singapore: World Scientific Publishing.
Tribus, Myron. 1978. Thirty Years of Information Theory. In The Maximum Entropy Formalism, ed. Raphael D. Levine and Myron Tribus, 1–14. Cambridge MA: MIT.
Weir, Bruce S. 1996. Genetic Data Analysis II: Methods for Discrete Population Genetic Data. Sunderland MA: Sinauer Associates.
Yeung, Raymond W. 2002. A First Course in Information Theory. New York: Springer Science+Business Media.
Zhang, Yuli, Huaiyu Wu, and Lei Cheng. 2012. Some new deformation formulas about variance and covariance. In Proceedings of 2012 International Conference on Modelling, Identification and Control (ICMIC2012), 987–992.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Ellerman, D. (2021). Further Developments of Logical Entropy. In: New Foundations for Information Theory. SpringerBriefs in Philosophy. Springer, Cham. https://doi.org/10.1007/978-3-030-86552-8_4
DOI: https://doi.org/10.1007/978-3-030-86552-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86551-1
Online ISBN: 978-3-030-86552-8