Abstract
Since compressing data incrementally by a non-branching hierarchy has resulted in substantial efficiency gains for performing induction in previous work, we now explore branching hierarchical compression as a means for solving induction problems for generally intelligent systems. Even though assuming the compositionality of data generation and the locality of information may result in a loss of the universality of induction, such an approach still has the potential to be general in the sense of reflecting the inherent structure of real-world data imposed by the laws of physics. We prove that branching compression hierarchies (BCHs) create power-law functions of mutual algorithmic information between two strings as a function of their distance. This is a ubiquitous characteristic of natural data, which opens the possibility of efficient natural data compression by BCHs. Further, we show that such hierarchies guarantee the existence of short features in the data, which in turn increases the efficiency of induction even more.
Keywords
- Hierarchical compression
- Incremental compression
- Algorithmic complexity
- Universal induction
- Power laws
- Scale free structure
Notes
1. For notation and definitions, please consult the Preliminaries section below.
References
[1] Solomonoff, R.J.: A formal theory of inductive inference. Part I. Inf. Control 7(1), 1–22 (1964)
[2] Solomonoff, R.J.: A formal theory of inductive inference. Part II. Inf. Control 7(2), 224–254 (1964)
[3] Lin, H.W., Tegmark, M.: Why does deep and cheap learning work so well? arXiv preprint arXiv:1608.08225 (2016)
[4] Franz, A.: Some theorems on incremental compression. In: Steunebrink, B., Wang, P., Goertzel, B. (eds.) AGI 2016. LNCS, vol. 9782, pp. 74–83. Springer, Cham (2016). doi:10.1007/978-3-319-41649-6_8
[5] Bak, P.: How Nature Works: The Science of Self-organized Criticality. Copernicus, New York (1996)
[6] Saremi, S., Sejnowski, T.J.: Hierarchical model of natural images and the origin of scale invariance. Proc. Natl. Acad. Sci. 110(8), 3071–3076 (2013)
[7] Lin, H.W., Tegmark, M.: Critical behavior from deep dynamics: a hidden dimension in natural language. arXiv preprint arXiv:1606.06737 (2016)
[8] Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications. Springer, New York (2009)
A Proofs
Proof (Lemma 1). Recall that \(f_{l}\) and \(p_{l}\) are the shortest feature and parameter of \(q_{l-1}\) and are therefore independent, so that \(K(q_{l-1})\mathop {=}\limits ^{+}l(f_{l})+K(p_{l})\), as proven in [4, Corollary 2]. From Eq. (3.1) we obtain
Since \(f_{l}\) and \(p_{l}\) cannot be made dependent by conditioning, we get \(K(q_{l-1}|q_{h})\mathop {=}\limits ^{+}K(f_{l}|q_{h})+K(p_{l}|q_{h})\). Due to assumption (2), the first term becomes \(K(f_{l}|q_{h})=K(f_{l})\mathop {=}\limits ^{+}l(f_{l})\). Therefore, the conditional version can be computed analogously to Eq. (A.1):
However, since \(K(q_{h}|q_{h})=O(1)\), we obtain for the information in \(q_{h}\) about \(q_{0}\):
\(\square \)
Proof (Lemma 2). In general, we can expand [8, Theorem 3.9.1, p. 247]
and insert the result into the independence relation, Eq. (3.4). This leads to
where the last inequality follows from the fact that conditioning can only reduce the description length of z [8, Theorem 2.1.2, p. 108]. Subtracting both sides of this inequality from K(z) yields \(K(z)-K(z|a)\mathop {\ge }\limits ^{+}K(z)-K(z|y)\). Inserting the definition of mutual information \(I(a:z)\equiv K(z)-K(z|a)\) on both sides gives \(I(a:z)\mathop {\ge }\limits ^{+}I(y:z)\), from which the claim follows. \(\square \)
Proof (Theorem 1). First, from the result in Eq. (3.3) and Lemma 2, it follows that \(I(x_{i}:x_{j})\) decays exponentially with the height h of their common ancestor \(q_{h}\)
under our assumptions. Next, note that the maximal index distance between leaves in a perfect tree increases exponentially with the height h of their common ancestor:
where \(\hat{b}_{l}\) is the average branching factor at level l of the tree. Defining the total average branching factor as the geometric mean \(\bar{b}\equiv \left( \prod _{l=1}^{h}\hat{b}_{l}\right) ^{1/h}>d_{ij}^{1/h}\), we can solve for \(h>\log _{\bar{b}}(d_{ij})\) and compute:
where \(\nu _{l}\equiv \log _{\bar{b}}(1/\alpha _{l})>0\). Inserting this into Eq. (A.3) concludes the proof. \(\square \)
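A small numerical sketch can make the power-law step concrete. It assumes, for illustration only, a perfect tree with a constant branching factor b (so \(\bar{b}=b\)) and a constant per-level factor \(\alpha_{l}=\alpha\), with the information shared by two leaves modeled as proportional to \(\alpha^{h}\); the helpers `lca_height` and `max_power_law_ratio` are hypothetical and not part of the paper. For every pair of leaves the code checks \(d_{ij}<b^{h}\) and reports the largest value of \(\alpha^{h}/d_{ij}^{-\nu}\) with \(\nu=\log_{b}(1/\alpha)\); by the argument above this ratio should stay below 1.

```python
import math


def lca_height(i: int, j: int, b: int) -> int:
    """Height of the lowest common ancestor of leaves i and j in a perfect b-ary tree.

    Leaves are at height 0; moving up one level maps leaf index k to k // b.
    """
    h = 0
    while i != j:
        i //= b
        j //= b
        h += 1
    return h


def max_power_law_ratio(b: int, depth: int, alpha: float) -> float:
    """Largest ratio alpha**h / d**(-nu) over all leaf pairs of a perfect b-ary tree.

    Hypothetical model: the information shared by two leaves whose common
    ancestor has height h is proportional to alpha**h (constant alpha_l = alpha).
    """
    if not (0.0 < alpha < 1.0 and b >= 2):
        raise ValueError("need 0 < alpha < 1 and b >= 2")
    nu = math.log(1.0 / alpha, b)  # nu = log_b(1/alpha) > 0
    n_leaves = b ** depth
    worst = 0.0
    for i in range(n_leaves):
        for j in range(i + 1, n_leaves):
            d = j - i                # index distance d_ij
            h = lca_height(i, j, b)  # height of the common ancestor
            # Leaves below an ancestor of height h span b**h consecutive indices.
            assert d < b ** h
            worst = max(worst, alpha ** h / d ** (-nu))
    return worst


if __name__ == "__main__":
    for b, depth, alpha in [(2, 6, 0.5), (3, 4, 0.7)]:
        ratio = max_power_law_ratio(b, depth, alpha)
        print(f"b={b}, depth={depth}, alpha={alpha}: max ratio = {ratio:.4f}")
```

In this toy model, nearby leaves that straddle a high-level boundary (small \(d_{ij}\), large h) fall far below \(d_{ij}^{-\nu}\), so the power law acts as an upper envelope rather than an exact curve.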
Proof (Lemma 3). Consider the general expansion [8, Theorem 3.9.1, p. 247]
Here, I is defined by \(I(x:y)\equiv K(y)-K(y|x)\) and is positive by assumption. Since, in general, \(K(y|x,K(x))\mathop {\le }\limits ^{+}K(y|x)\), we obtain
\(\square \)
Proof (Theorem 2). Since y is \(l(\lambda )\)-compressible by q, \(\lambda (q,p)=U\left( \left\langle \lambda ,q,p\right\rangle \right) =x\) and \(l(x)=l(y)+l(p)\), x is compressible as well:
We define \(f\equiv \left\langle \lambda ,q\right\rangle \) and obtain \(U(\left\langle f,p\right\rangle )=f(p)=x\), which is the main feature equation. We can define the descriptive map \(f'\) as a function that removes y from x to obtain the remainder p: \(f'(x)=p\). It suffices that \(f'\) does so for this particular x and y; it need not do so in general.
From the definition of f, we get \(l(f)=l(\lambda )+l(q)=l(\lambda )+K(y)<l(y)\), since y is \(l(\lambda )\)-compressible by assumption. It follows that the (f, p)-pair compresses x at least to some extent: \(l(f)+l(p)<l(y)+l(p)=l(x)\). Therefore, f is indeed a feature of x, and its length is bounded by l(y). \(\square \)
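The construction of f from a compressible prefix can be mimicked with an off-the-shelf compressor. In this sketch, which illustrates the idea rather than reproducing the paper's construction, the `zlib` output plays the role of the short program q for y (its length only upper-bounds K(y) up to the size of the decompressor), the closure `f` plays the role of \(\lambda\) applied to q, and `f_prime` is a descriptive map \(f'\) that works only for this particular y. The helper name `make_feature` is hypothetical, and the length \(l(\lambda )\) of the decompress-and-concatenate routine is not measured.

```python
import zlib
from typing import Callable, Tuple


def make_feature(y: bytes) -> Tuple[bytes, Callable[[bytes], bytes], Callable[[bytes], bytes]]:
    """Return (q, f, f_prime) for strings of the form x = y + p (hypothetical helper)."""
    q = zlib.compress(y, 9)  # stand-in for a short description q of y

    def f(p: bytes) -> bytes:
        # Plays the role of lambda(q, p): rebuild y from q, then append the parameter p.
        return zlib.decompress(q) + p

    def f_prime(x: bytes) -> bytes:
        # Descriptive map f'(x) = p: strip the known prefix y (valid only for this y).
        if not x.startswith(y):
            raise ValueError("x does not start with y")
        return x[len(y):]

    return q, f, f_prime


if __name__ == "__main__":
    y = b"ab" * 500           # regular, hence compressible, prefix
    p = b"unstructured tail"  # the remainder, i.e. the parameter
    x = y + p
    q, f, f_prime = make_feature(y)
    assert f(p) == x and f_prime(x) == p
    print(f"l(q) = {len(q)} bytes vs l(y) = {len(y)} bytes")
    print(f"l(q) + l(p) = {len(q) + len(p)} vs l(x) = {len(x)}")
```

Note that f' knows y explicitly, which mirrors the remark that it only has to work for this particular x and y.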
Proof (Theorem 3). In general, the relation \(K(p)\mathop {\le }\limits ^{+}K(p|z)+K(z)\) holds, since if p is computable by a detour via z, its shortest program without the detour can only be shorter. Setting \(z=K(x)\) and conditioning on x leads to
The conditioning operation is not valid in general; however, the detour argument still holds in this case. Since \(K(p|x)=l(f')\) [4, Lemma 1(2)] and \(K(p|K(x),x)=O(1)\) [4, Theorem 3(3)], we get
We now insert the "complexity of the complexity" bound from [8, Lemma 3.9.2, Eq. (3.18)], \(K(K(x)|x)\mathop {\le }\limits ^{+}\log K(x)+2\log \log K(x)\), and the first claim follows. The second claim is a property of \(K(K(x)|x)\) [8, Eq. (3.13)] and therefore also holds for \(l(f')\). \(\square \)
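Collecting the steps described in this proof gives the following chain. It is a reconstruction from the prose of the proof, in the same notation, and should be read as a summary of the argument rather than a copy of the original displayed equations:

\[
\begin{aligned}
l(f') = K(p|x) &\mathop {\le }\limits ^{+} K(p|K(x),x) + K(K(x)|x)\\
&= O(1) + K(K(x)|x)\\
&\mathop {\le }\limits ^{+} \log K(x) + 2\log \log K(x).
\end{aligned}
\]

The first line is the detour relation with \(z=K(x)\), conditioned on x; the second uses \(K(p|K(x),x)=O(1)\); the third is the bound from [8, Lemma 3.9.2].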
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Franz, A. (2017). On Hierarchical Compression and Power Laws in Nature. In: Everitt, T., Goertzel, B., Potapov, A. (eds) Artificial General Intelligence. AGI 2017. Lecture Notes in Computer Science, vol 10414. Springer, Cham. https://doi.org/10.1007/978-3-319-63703-7_8
DOI: https://doi.org/10.1007/978-3-319-63703-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63702-0
Online ISBN: 978-3-319-63703-7
eBook Packages: Computer Science, Computer Science (R0)