Article Outline
Glossary
Definition of the Subject
Entropy example: How many questions?
Distribution Entropy
A Gander at Shannon's Noisy Channel Theorem
The Information Function
Entropy of a Process
Entropy of a Transformation
Determinism and Zero‐Entropy
The Pinsker–Field and K‑Automorphisms
Ornstein Theory
Topological Entropy
Three Recent Results
Exodos
Bibliography
Notes
- 1.
This is the Greek spelling.
- 2.
- 3.
In this paper, unmarked logs will be to base-2. In entropy theory, it does not matter much what base is used, but base-2 is convenient for computing entropy for messages described in bits. When using the natural logarithm, some people refer to the unit of information as a nat. In this paper, I have picked bits rather than nats.
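The base convention can be illustrated with a short sketch (the function name `distropy` echoes the article's terminology; the code itself is mine, not the author's):

```python
import math

def distropy(probs, base=2):
    """Distribution entropy sum p*log(1/p); base 2 gives bits, base e gives nats."""
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

p = [0.5, 0.25, 0.25]
bits = distropy(p, base=2)        # 1.5 bits
nats = distropy(p, base=math.e)   # the same quantity, scaled by ln 2
```

The two units differ only by the constant factor ln 2 ≈ 0.693, which is why the choice of base "does not matter much".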
- 4.
This holds when each probability p is a reciprocal power of two. For general probabilities, the “expected number of questions” interpretation holds in a weaker sense: Throw N darts independently at N copies of the dartboard, and efficiently ask Yes/No questions to determine where all N darts landed. Dividing the expected number of questions by N, then sending \({N \to\infty}\), yields the \({p \log\big(\frac{1}{p}\big)}\) sum of Eq. (1).
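For dyadic probabilities the exact (non-asymptotic) statement is easy to check numerically; a sketch, assuming probabilities of the form \({2^{-k}}\):

```python
import math

# When every probability is a reciprocal power of two, an efficient Yes/No
# strategy asks exactly log2(1/p) questions for the outcome of probability p,
# so the expected number of questions equals the sum of Eq. (1).
probs = [1/2, 1/4, 1/8, 1/8]
questions = [math.log2(1 / p) for p in probs]          # 1, 2, 3, 3 questions
expected = sum(p * q for p, q in zip(probs, questions))
entropy = sum(p * math.log2(1 / p) for p in probs)
assert expected == entropy == 1.75
```

For non-dyadic probabilities the question counts \({\log_2(1/p)}\) are not integers, which is exactly why the N-dart limiting argument above is needed.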
- 5.
There does not seem to be a standard name for this function. I use \({\boldsymbol{\eta}}\), since an uppercase \({\boldsymbol{\eta}}\) looks like an H, which is the letter that Shannon used to denote what I am calling distribution‐entropy.
- 6.
Curiosity: Just in this paragraph we compute distropy in nats, that is, using the natural logarithm. Given a small probability \({p\in[0,1]}\) and setting \({x := 1/p}\), note that \({\boldsymbol{\eta}(p) = \log(x)/x \approx 1/\pi(x)}\), where \({\pi(x)}\) denotes the number of prime numbers less than or equal to x. (This approximation is a weak form of the Prime Number Theorem.) Is there any actual connection between the ‘approximate distropy’ function \({\mathcal{H}_\pi(\vec{\mathbf{p}}) := \sum_{p\in \vec{\mathbf{p}}} 1 / \pi (1/p)}\) and Number Theory, other than a coincidence of growth rate?
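The quality of this approximation is easy to probe numerically; a sketch (the helper `prime_count` is mine, using a plain sieve):

```python
import math

def prime_count(x):
    """pi(x): the number of primes <= x, via a sieve of Eratosthenes."""
    n = int(x)
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return sum(sieve)

x = 10 ** 6
pi_x = prime_count(x)       # 78498 primes below one million
eta = math.log(x) / x       # the distropy summand eta(1/x), in nats
ratio = eta * pi_x          # eta(p) divided by 1/pi(x); roughly 1.08 here
```

The ratio drifts toward 1 only logarithmically slowly, as the Prime Number Theorem predicts.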
- 7.
The noise‐process is assumed to be independent of the signal‐process. In contrast, when the perturbation is highly dependent on the signal, then it is sometimes called distortion.
- 8.
I am now at liberty to reveal that our X has always been a Lebesgue space, that is, measure‐isomorphic to an interval of ℝ together with countably many point-atoms (points with positive mass). The equivalence of generating and separating is a technical theorem, due to Rokhlin.
Assuming μ to be Lebesgue is not much of a limitation. For instance, if μ is a finite measure on any Polish space, then μ extends to a Lebesgue measure on the μ‑completion of the Borel sets. To not mince words: All spaces are Lebesgue spaces unless you are actively looking for trouble.
- 9.
This is sometimes called measure(-theoretic) entropy or (perhaps unfortunately) metric entropy, to distinguish it from topological entropy. Tools known prior to entropy, such as spectral properties, did not distinguish the two Bernoulli‐shifts; see Spectral Theory of Dynamical Systems for the definitions.
- 10.
It is an easier result, undoubtedly known much earlier, that every ergodic T has a countable generating partition – possibly of ∞‑distropy.
- 11.
On the set of ordered K‑set partitions (with K fixed) this convergence is the same as: \({{\textsf{Q}}^{(L)} \to {\textsf{Q}}}\) when \({{\mu\left(Fat\left(\textsf{Q}^{(L)},\textsf{Q}\right)\right)} \to 1}\).
An alternative approach is the Rokhlin metric, \( \operatorname{Dist}(\textsf{P},\textsf{Q}) := {\mathcal{H}(\textsf{P}|\textsf{Q})} + {\mathcal{H}(\textsf{Q}|\textsf{P})}\), which has the advantage of working for unordered partitions.
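On a finite probability space the Rokhlin metric is directly computable; a minimal sketch, assuming partitions are encoded as labelings of points (the encoding and function names are mine):

```python
import math
from collections import defaultdict

def cond_entropy(P, Q, mu):
    """H(P|Q) = sum over atoms mu(A cap B) * log2(mu(B)/mu(A cap B)), in bits.

    P, Q: dicts mapping each point to its atom label; mu: dict point -> mass.
    """
    joint = defaultdict(float)   # mass of A cap B, keyed by (P-label, Q-label)
    qmass = defaultdict(float)   # mass of each Q-atom B
    for x, m in mu.items():
        joint[(P[x], Q[x])] += m
        qmass[Q[x]] += m
    return sum(m * math.log2(qmass[q] / m) for (p, q), m in joint.items() if m > 0)

def rokhlin_dist(P, Q, mu):
    """The Rokhlin metric Dist(P,Q) = H(P|Q) + H(Q|P)."""
    return cond_entropy(P, Q, mu) + cond_entropy(Q, P, mu)

mu = {x: 1/4 for x in range(4)}          # four equally likely points
P = {0: 'a', 1: 'a', 2: 'b', 3: 'b'}     # two atoms of mass 1/2
Q = {0: 'a', 1: 'b', 2: 'c', 3: 'd'}     # the partition into points
```

Here Q refines P, so \({\mathcal{H}(\textsf{P}|\textsf{Q}) = 0}\), while \({\mathcal{H}(\textsf{Q}|\textsf{P}) = 1}\) bit; the distance is 1.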
- 12.
i.e., \({{{S_n} \to T}}\) IFF \({{\forall A \in\mathcal{X}}: {\mu({{S_n}^{-1}(A)} \triangle {{T}^{-1}(A)})} \to 0}\); this is a metric‐topology, since our probability space is countably generated. This can be restated in terms of the unitary operator \({{U_T}}\) on \({\mathbb{L}^{2}(\mu)}\), where \({{U_T(f)} := {f\circ T}}\). Namely, \({{S_n}\to T}\) in the coarse topology IFF \({{U_{S_n}}\to{U_T}}\) in the strong operator topology.
- 13.
Fix an \({\varepsilon > 0}\) and an \({N > 1/\varepsilon}\). Among the points \({x,T(x),\dots, T^N(x)}\), some two are at distance less than \({\frac{1}{N}}\); say, \({\operatorname{Dist}({T^i(x),T^j(x)}) < \varepsilon}\), for some \({0\le i<j\le N}\). Since T is an isometry, \( \varepsilon > \operatorname{Dist}({x,T^k(x)}) > 0 \), where \({k:=j-i}\). So the \({T^k}\)‐orbit of x is ε‐dense.
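The pigeonhole step can be watched in action with the simplest isometry, a circle rotation; a sketch under my own choice of angle and metric:

```python
import math

# T(x) = x + alpha (mod 1) is an isometry of the circle R/Z
# in the arc-length metric. Take an irrational rotation angle.
alpha = math.sqrt(2) % 1

def dist(x, y):
    """Arc-length distance on the circle R/Z."""
    d = abs(x - y) % 1
    return min(d, 1 - d)

eps = 0.01
N = math.ceil(1 / eps)
# Among x, T(x), ..., T^N(x), two points lie within 1/N <= eps of each other,
# so some power T^k with k <= N moves x = 0 by less than eps.
k = next(k for k in range(1, N + 1) if dist(0.0, (k * alpha) % 1) < eps)
```

Here the search is guaranteed to succeed for some \({k \le N}\), precisely by the pigeonhole argument of the note.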
- 14.
In engineering circles, this is called the asymptotic equipartition property (AEP).
- 15.
Traditionally, this is called the Pinsker algebra where, in this context, “algebra” is understood to mean “σ‑algebra”.
- 16.
Because we only work on a compact space, we can omit “finite”. Some generalizations of topological entropy to non‐compact spaces require that only finite open‐covers be used [37].
- 17.
- 18.
Perhaps the ∅-bad‐length, 574, is shorter than the 1-bad‐length because, say, ∅s take less tape‐space than 1s and so – being written more densely – cause ambiguity sooner.
- 19.
A popular computer‐algebra‐system was not, at least under my inexpert tutelage, able to simplify this. However, once top-ent gave the correct answer, the software was able to detect the equality.
- 20.
The ergodic measures are the extreme points of \({{\mathcal{M} (T)}}\); call them \({{\mathcal{M}_{\text{Erg}} (T)}}\). This \({{\mathcal{M} (T)}}\) is the set of barycenters obtained from Borel probability measures on \({{\mathcal{M}_{\text{Erg}} (T)}}\) (see the Krein–Milman theorem and Choquet theory in [60]). In this instance, what explains the failure to have an ergodic maximal‐entropy measure? Let \({{\mu_{k}}}\) be an invariant ergodic measure on \({{Y_{k}}}\). These measures do converge to the one-point (ergodic) probability measure \({{\mu_{\infty}}}\) on \({{Y_{\infty}}}\). But the map \({\mu \mapsto \mathcal{E}_{\mu}(T)}\) is not continuous at \({{\mu_{\infty}}}\).
- 21.
Measures \({{\alpha_{L}}\to{\mu}}\) IFF \({{\int{f}{{\text{d}} {\alpha_{L}}}}\to{\int{f}{{\text{d}} {\mu}}}}\), for each continuous \({f\colon X \to \mathbb{R}}\). This metrizable topology makes \({\mathcal{M}}\) compact. Always, \({{\mathcal{M}(T)}}\) is a non-void compact subset (see Measure Preserving Systems).
- 22.
For any two sets \({B,{B^{\prime}}\subset X}\), the union \({{\partial B} \cup {\partial{B^{\prime}}}}\) is a superset of the three boundaries \({\partial{(B\cup{B^{\prime}})}, {\partial{(B\cap{B^{\prime}})}}, {\partial{(B\smallsetminus{B^{\prime}})}}}\).
Abbreviations
- Measure space:
-
A measure space \({(X,\mathcal{X},\mu)}\) is a set X, a field (that is, a σ‑algebra) \({\mathcal{X}}\) of subsets of X, and a countably‐additive measure \({\mu\colon \mathcal{X} \to [0,\infty]}\). (We often just write \({(X,\mu)}\), with the field implicit.) For a collection \({\mathcal{C} \subset \mathcal{X}}\), use \({\operatorname{Fld}(\mathcal{C}) \supset \mathcal{C}}\) for the smallest field including \({\mathcal{C}}\). The number \({\mu(B)}\) is the “μ-mass of B ”.
- Measure‐preserving map:
-
A measure‐preserving map \({\psi\colon (X,\mathcal{X},\mu) \to (Y,\mathcal{Y},\nu)}\) is a map \({\psi\colon X \to Y}\) such that the inverse image of each \({B \in \mathcal{Y}}\) is in \({\mathcal{X}}\), and \({\mu({\psi}^{-1} (B)) = \nu(B)}\). A (measure‐preserving) transformation is a measure‐preserving map \( T\colon (X,\mathcal{X},\mu) \to (X,\mathcal{X},\mu) \). Condense this notation to \({(T\colon X,\mathcal{X},\mu)}\) or \({(T\colon X,\mu)}\).
- Probability space:
-
A probability space is a measure space \({(X,\mu)}\) with \({\mu(X)=1}\); this μ is a probability measure. All our maps/transformations in this article are on probability spaces.
- Factor map:
-
A factor map
$$ \psi\colon (T\colon X,\mathcal{X},\mu) \to (S\colon Y,\mathcal{Y},\nu) $$is a measure‐preserving map \({\psi\colon X \to Y}\) which intertwines the transformations, \({\psi\circ T = S \circ\psi}\). And ψ is an isomorphism if – after deleting a nullset (a mass-zero set) in each space – this ψ is a bijection and \({{\psi}^{-1}}\) is also a factor map.
- Almost everywhere (a.e.):
-
A measure‐theoretic statement holds almost everywhere, abbreviated a.e., if it holds off of a nullset. (Eugene Gutkin once remarked to me that the problem with Measure Theory is … that you have to say “almost everywhere”, almost everywhere.) For example, \({B \smash{\overset{a.e.}{\supset} A }}\) means that \({\mu(B \smallsetminus A)}\) is zero. The a.e. will usually be implicit.
- Probability vector:
-
A probability vector \({\vec{v} = (v_{1},v_{2},\dots)}\) is a list of non‐negative reals whose sum is 1. We generally assume that probability vectors and partitions (see below) have finitely many components. Write “countable probability vector/partition” when finitely or denumerably many components are allowed.
- Partition:
-
A partition \({\textsf{P}=(A_{1},A_{2},\dots)}\) splits X into pairwise disjoint subsets \({A_{i}\in\mathcal{X}}\) so that the disjoint union \({\bigsqcup_{i} A_{i}}\) is all of X. Each \({A_{i}}\) is an atom of \({\textsf{P}}\). Use \({|\textsf{P}|}\) or \({^{\#}\textsf{P}}\) for the number of atoms. When \({\textsf{P}}\) partitions a probability space, then it yields a probability vector \({\vec{v}}\), where \({v_{j} := \mu(A_{j})}\). Lastly, use \({\textsf{P} \langle x \rangle}\) to denote the \({\textsf{P}}\)‐atom that owns x.
- Fonts:
-
We use the font \({\mathcal{H},\mathcal{E},\mathcal{I}}\) for distribution‐entropy, entropy and the information function. In contrast, the script font \({\mathcal{ABC}\dots}\) will be used for collections of sets; usually subfields of \({\mathcal{X}}\). Use \({\mathbb{E}(\cdot)}\) for the (conditional) expectation operator.
- Notation:
-
\({\mathbb{Z} = \text{integers}}\), \({\mathbb{Z}_{+} = \text{positive integers}}\), and \( \mathbb{N} = \text{natural numbers} = \{0,1,2,\dots\}\). (Some well‐meaning folk use ℕ for \({\mathbb{Z}_{+}}\), saying ‘Nothing could be more natural than the positive integers’. And this is why \({0\in\mathbb{N}}\).) Use \({\lceil \cdot \rceil}\) and \({\lfloor \cdot \rfloor}\) for the ceiling and floor functions; \({\lfloor \cdot \rfloor}\) is also called the “greatest‐integer function”. For an interval \({J := [a,b) \subset [-\infty,+\infty]}\), let \({[a\dots b)}\) denote the interval of integers \({J \cap\mathbb{Z}}\) (with a similar convention for closed and open intervals). E.g., \({({\text{e}} \dots \pi] = ({\text{e}} \dots \pi) = \{3\}}\).
For subsets A and B of the same space, Ω, use \({A\subset B}\) for inclusion and \({A \varsubsetneqq B}\) for proper inclusion. The difference set \({B \smallsetminus A}\) is \({\{\omega \in B \:|\: \omega \notin A\}}\). Employ \({A^{c}}\) for the complement \({\Omega\smallsetminus A}\). Since we work in a probability space, if we let \({x := \mu(A)}\), then a convenient convention is to have
$$ x^{c} \quad\text{denote}\quad 1 - x\:, $$since then \({\mu(A^{c})}\) equals \({x^{c}}\).
Use \({A \triangle B}\) for the symmetric difference \( [A \smallsetminus B] \cup [B \smallsetminus A] \). For a collection \({\mathcal{C}=\{{E_{j}}\}_{j}}\) of sets in Ω, let the disjoint union \({\bigsqcup_{j}{E_{j}}}\) or \({\bigsqcup(\mathcal{C})}\) represent the union \({\bigcup_{j}{E_{j}}}\) and also assert that the sets are pairwise disjoint.
Use “\({\forall_{\mathrm{large}} n}\)” to mean: “\({\exists n_0}\) such that \({\forall n > {n_0}}\)”. To refer to the left hand side of an equation, say Eq. (20), use LhS(20); analogously, RhS(20) denotes the right hand side.
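The integer-interval convention above can be sketched in code (the helper name and keyword flags are mine, not the author's):

```python
import math

def int_interval(a, b, closed_left=True, closed_right=False):
    """Integers in the real interval from a to b, e.g. [a..b) when the
    defaults are used: closed on the left, open on the right."""
    lo = math.ceil(a) if closed_left else math.floor(a) + 1   # least integer admitted
    hi = math.floor(b) if closed_right else math.ceil(b) - 1  # greatest integer admitted
    return list(range(lo, hi + 1))

# The article's example: (e..pi] = (e..pi) = {3}.
assert int_interval(math.e, math.pi, closed_left=False, closed_right=True) == [3]
assert int_interval(math.e, math.pi, closed_left=False, closed_right=False) == [3]
```

For instance `int_interval(1, 5)` gives `[1, 2, 3, 4]`, the integer interval \({[1\dots 5)}\).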
Bibliography
Historical
Adler R, Weiss B (1967) Entropy, a complete metric invariant for automorphisms of the torus. Proc Natl Acad Sci USA 57:1573–1576
Clausius R (1864) Abhandlungen über die mechanische Wärmetheorie, vol 1. Vieweg, Braunschweig
Clausius R (1867) Abhandlungen über die mechanische Wärmetheorie, vol 2. Vieweg, Braunschweig
Kolmogorov AN (1958) A new metric invariant of transitive automorphisms of Lebesgue spaces. Dokl Akad Nauk SSSR 119(5):861–864
McMillan B (1953) The basic theorems of information theory. Ann Math Stat 24:196–219
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423,623–656
Sinai Y (1959) On the Concept of Entropy of a Dynamical System, Dokl Akad Nauk SSSR 124:768–771
Recent Results
Bowen L (2008) A new measure‐conjugacy invariant for actions of free groups. http://www.math.hawaii.edu/%7Elpbowen/notes11.pdf
Gutman Y, Hochman M (2006) On processes which cannot be distinguished by finitary observation. http://arxiv.org/pdf/math/0608310
Ornstein DS, Weiss B (2007) Entropy is the only finitely‐observable invariant. J Mod Dyn 1:93–105; http://www.math.psu.edu/jmd
Ergodic Theory Books
Brin M, Stuck G (2002) Introduction to dynamical systems. Cambridge University Press, Cambridge
Cornfeld I, Fomin S, Sinai Y (1982) Ergodic theory. Grundlehren der Mathematischen Wissenschaften, vol 245. Springer, New York
Friedman NA (1970) Introduction to ergodic theory. Van Nostrand Reinhold, New York
Halmos PR (1956) Lectures on ergodic theory. The Mathematical Society of Japan, Tokyo
Katok A, Hasselblatt B (1995) Introduction to the modern theory of dynamical systems. (With a supplementary chapter by Katok and Leonardo Mendoza). Encyclopedia of Mathematics and its Applications, vol 54. Cambridge University Press, Cambridge
Keller G, Greven A, Warnecke G (eds) (2003) Entropy. Princeton Series in Applied Mathematics. Princeton University Press, Princeton
Lind D, Marcus B (1995) An introduction to symbolic dynamics and coding. Cambridge University Press, Cambridge
Mañé R (1987) Ergodic theory and differentiable dynamics. Ergebnisse der Mathematik und ihrer Grenzgebiete, ser 3, vol 8. Springer, Berlin
Parry W (1969) Entropy and generators in ergodic theory. Benjamin, New York
Petersen K (1983) Ergodic theory. Cambridge University Press, Cambridge
Rudolph DJ (1990) Fundamentals of measurable dynamics. Clarendon Press, Oxford
Sinai Y (1994) Topics in ergodic theory. Princeton Mathematical Series, vol 44. Princeton University Press, Princeton
Walters P (1982) An introduction to ergodic theory. Graduate Texts in Mathematics, vol 79. Springer, New York
Differentiable Entropy
Ledrappier F, Young L-S (1985) The metric entropy of diffeomorphisms. Ann Math 122:509–574
Pesin YB (1977) Characteristic Lyapunov exponents and smooth ergodic theory. Russ Math Surv 32:55–114
Young L-S (1982) Dimension, entropy and Lyapunov exponents. Ergod Theory Dyn Syst 2(1):109–124
Finite Rank
Ferenczi S (1997) Systems of finite rank. Colloq Math 73(1):35–65
King JLF (1988) Joining‐rank and the structure of finite rank mixing transformations. J Anal Math 51:182–227
Maximal‐Entropy Measures
Buzzi J, Ruette S (2006) Large entropy implies existence of a maximal entropy measure for interval maps. Discret Contin Dyn Syst 14(4):673–688
Denker M (1976) Measures with maximal entropy. In: Conze J-P, Keane MS (eds) Théorie ergodique, Actes Journées Ergodiques, Rennes, 1973/1974. Lecture Notes in Mathematics, vol 532. Springer, Berlin, pp 70–112
Misiurewicz M (1973) Diffeomorphism without any measure with maximal entropy. Bull Acad Polon Sci Sér Sci Math Astron Phys 21:903–910
Topological Entropy
Adler R, Marcus B (1979) Topological entropy and equivalence of dynamical systems. Mem Amer Math Soc 20(219)
Adler RL, Konheim AG, McAndrew MH (1965) Topological entropy. Trans Am Math Soc 114(2):309–319
Bowen R (1971) Entropy for group endomorphisms and homogeneous spaces. Trans Am Math Soc 153:401–414. Errata 181:509–510 (1973)
Bowen R (1973) Topological entropy for noncompact sets. Trans Am Math Soc 184:125–136
Dinaburg EI (1970) The relation between topological entropy and metric entropy. Sov Math Dokl 11:13–16
Hasselblatt B, Nitecki Z, Propp J (2005) Topological entropy for non‐uniformly continuous maps. http://www.citebase.org/abstract?id=oai:arXiv.org:math/0511495
Determinism and Zero‐Entropy, and Entropy Observation
Bailey D (1976) Sequential schemes for classifying and predicting ergodic processes. Ph D Dissertation, Stanford University
Kalikow S, King JLF (1994) A countably‐valued sleeping stockbroker process. J Theor Probab 7(4):703–708
King JLF (1992) Dilemma of the sleeping stockbroker. Am Math Monthly 99(4):335–338
Ornstein DS, Weiss B (1990) How sampling reveals a process. Ann Probab 18(3):905–930
Ornstein DS, Weiss B (1993) Entropy and data compression schemes. IEEE Trans Inf Theory 39(1):78–83
Ziv J, Lempel A (1977) A universal algorithm for sequential data compression. IEEE Trans Inf Theory 23(3):337–343
Bernoulli Transformations, K‑Automorphisms, Amenable Groups
Berg KR (1975) Independence and additive entropy. Proc Am Math Soc 51(2):366–370; http://www.jstor.org/stable/2040323
Meshalkin LD (1959) A case of isomorphism of Bernoulli schemes. Dokl Akad Nauk SSSR 128:41–44
Ornstein DS (1970) Bernoulli shifts with the same entropy are isomorphic. Adv Math 5:337–352
Ornstein DS (1974) Ergodic theory randomness and dynamical systems, Yale Math Monographs, vol 5. Yale University Press, New Haven
Ornstein DS, Shields P (1973) An uncountable family of K‑automorphisms. Adv Math 10:63–88
Ornstein DS, Weiss B (1983) The Shannon–McMillan–Breiman theorem for a class of amenable groups. Isr J Math 44(3):53–60
Ornstein DS, Weiss B (1987) Entropy and isomorphism theorems for actions of amenable groups. J Anal Math 48:1–141
Shields P (1973) The theory of Bernoulli shifts. University of Chicago Press, Chicago
Sinai YG (1962) A weak isomorphism of transformations having an invariant measure. Dokl Akad Nauk SSSR 147:797–800
Thouvenot J-P (1975) Quelques propriétés des systèmes dynamiques qui se décomposent en un produit de deux systèmes dont l'un est un schéma de Bernoulli. Isr J Math 21:177–207
Thouvenot J-P (1977) On the stability of the weak Pinsker property. Isr J Math 27:150–162
Abramov Formula
Ward T, Zhang Q (1992) The Abramov–Rohlin entropy addition formula for amenable group actions. Monatshefte Math 114:317–329
Miscellaneous
Newhouse SE (1989) Continuity properties of entropy. Ann Math 129:215–235
Rudin W (1973) Functional analysis. McGraw‐Hill, New York
Tribus M, McIrvine EC (1971) Energy and information. Sci Am 224:178–184
Wikipedia, http://en.wikipedia.org/wiki/. pages: http://en.wikipedia.org/wiki/Spectral_radius, http://en.wikipedia.org/wiki/Information_entropy
Books and Reviews
Boyle M, Downarowicz T (2004) The entropy theory of symbolic extensions. Invent Math 156(1):119–161
Downarowicz T, Serafin J (2003) Possible entropy functions. Isr J Math 135:221–250
Hassner M (1980) A non‐probabilistic source and channel coding theory. Ph D Dissertation, UCLA
Katok A, Sinai YG, Stepin AM (1977) Theory of dynamical systems and general transformation groups with an invariant measure. J Sov Math 7(6):974–1065
© 2012 Springer-Verlag
King, J.L.F. (2012). Entropy in Ergodic Theory. In: Meyers, R. (eds) Mathematics of Complexity and Dynamical Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1806-1_14
DOI: https://doi.org/10.1007/978-1-4614-1806-1_14
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1805-4
Online ISBN: 978-1-4614-1806-1