Some Theorems on Incremental Compression

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9782)

Abstract

The ability to induce short descriptions of a wide class of data, i.e. to compress it, is essential for any system exhibiting general intelligence. It is proven in full generality that incremental compression (extracting features of data strings and continuing to compress the residual data variance) leads to a time complexity superior to that of universal search if the strings are incrementally compressible. It is further shown that such a procedure breaks up the shortest description into a set of pairwise orthogonal features in terms of algorithmic information.

Keywords

  • Incremental compression
  • Data compression
  • Algorithmic complexity
  • Universal induction
  • Universal search
  • Feature extraction

A. Franz—Independent researcher

Notes

  1.

    Note that the \(\left\langle \cdot ,\cdot \right\rangle \)-map is defined with \(\left\langle z,\epsilon \right\rangle \equiv z\), hence \(f_{k}(\epsilon )=U\left( \left\langle f_{k},\epsilon \right\rangle \right) =U(f_{k})\), so that \(f_{k}\) acts as an ordinary input string to the universal machine.

  2.

    It is not difficult to see that the “\(\ll \)” sign is justified in all but very few cases. Among all combinations of integers \(l_{i}\) with a fixed sum \(\sum _{i}l_{i}=L\), only very few make the sum \(\sum _{i}2^{l_{i}}\) close to \(2^{L}\), namely those in which a single \(l_{i}\) accounts for almost all of L.
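Note 2 can be checked numerically. The sketch below is an illustration, not code from the paper: it assumes positive integer lengths, enumerates every ordered split of a total length L into k parts, and reports the largest ratio \(\sum _{i}2^{l_{i}}/2^{L}\). Reading \(\sum _{i}2^{l_{i}}\) as the cost of searching for each feature separately and \(2^{L}\) as the cost of one search over the whole description is an assumption made here for intuition. The helper names `compositions`, `cost_ratio` and `max_ratio` are hypothetical.

```python
def compositions(total: int, parts: int):
    """Yield every tuple of `parts` positive integers that sums to `total`."""
    if parts == 1:
        if total >= 1:
            yield (total,)
        return
    # Leave at least 1 for each of the remaining parts.
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def cost_ratio(lengths) -> float:
    """Return sum_i 2**l_i divided by 2**L, where L = sum_i l_i."""
    total = sum(lengths)
    return sum(2 ** l for l in lengths) / 2 ** total


def max_ratio(total: int, parts: int) -> float:
    """Largest cost ratio over all splits of `total` into `parts` lengths."""
    return max(cost_ratio(c) for c in compositions(total, parts))


if __name__ == "__main__":
    L = 20
    for k in (2, 3, 4):
        splits = list(compositions(L, k))
        # Share of splits whose summed cost still exceeds 1% of 2**L.
        share = sum(cost_ratio(c) > 0.01 for c in splits) / len(splits)
        print(f"k={k}: max ratio {max_ratio(L, k):.6f}, share above 1%: {share:.3f}")
```

Because \(2^{l}\) is convex, the maximum for k positive parts is attained by the split \((1,\dots ,1,L-k+1)\), which gives \(2^{1-k}+(k-1)2^{1-L}\). Every split into k parts therefore costs at most about \(2^{1-k}\) times \(2^{L}\), and most splits cost far less.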

References

  1. Hutter, M.: On universal prediction and Bayesian confirmation. Theor. Comput. Sci. 384(1), 33–48 (2007)

  2. Levin, L.A.: Universal sequential search problems. Problemy Peredachi Informatsii 9(3), 115–116 (1973)

  3. Hutter, M.: Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability, 300p. Springer, Heidelberg (2005). http://www.hutter1.net/ai/uaibook.htm

  4. Schmidhuber, J.: Optimal ordered problem solver. Mach. Learn. 54(3), 211–254 (2004)

  5. Potapov, A., Rodionov, S.: Making universal induction efficient by specialization. In: Goertzel, B., Orseau, L., Snaider, J. (eds.) AGI 2014. LNCS, vol. 8598, pp. 133–142. Springer, Heidelberg (2014)

  6. Franz, A.: Artificial general intelligence through recursive data compression and grounded reasoning: a position paper. CoRR, abs/1506.04366 (2015). http://arXiv.org/abs/1506.04366

  7. Franz, A.: Toward tractable universal induction through recursive program learning. In: Bieger, J., Goertzel, B., Potapov, A. (eds.) AGI 2015. LNCS, vol. 9205, pp. 251–260. Springer, Heidelberg (2015)

  8. Li, M., Vitányi, P.M.: An Introduction to Kolmogorov Complexity and Its Applications. Texts in Computer Science. Springer, New York (2009)

Acknowledgements

I would like to express my gratitude to Alexey Potapov and Alexander Priamikov for proofreading and helpful comments.

Author information

Correspondence to Arthur Franz.

A Proofs

Proof (Lemma 1).

  1.

    Suppose there is a shorter program g with \(l(g)<l(f^{*})\) that generates x with the help of p: \(U\left( \left\langle g,p\right\rangle \right) =x\). Then there is also a descriptive map \(g'\equiv f'^{*}\) that computes p from x with \(l(g'(x))=l(f'^{*}(x))<l(x)-l(f^{*})<l(x)-l(g)\). Therefore, g is a feature of x by definition, which contradicts \(f^{*}\) being the shortest feature.

  2.

    Suppose there is a shorter program \(g'\) with \(l(g')<l(f'^{*})\) that generates p with the help of x: \(U\left( \left\langle g',x\right\rangle \right) =g'(x)=p\). Then \(g'\in D_{f^{*}}(x)\) since \(f^{*}(g'(x))=f^{*}(p)=x\) and \(l(g'(x))=l(p)<l(x)-l(f^{*})\) by construction of \(f'^{*}\). However, by Eq. (4.3) \(f'^{*}\) is already the shortest program able to do so, contradicting the assumption.    \(\square \)

Proof (Theorem 1). From Lemma 1 we know \(l(f^{*})=K(x|p)\), with \(p=f'^{*}(x)\). In general, for the shortest program q computing x, \(l(q)=K(x)=K(q)+O(1)\) holds, since q is incompressible (otherwise it would not be the shortest program). For shortest features, the conditional case holds as well: \(K(x|p)=K(f^{*}|p)+O(1)\). After all, if there were a shorter program g, \(l(g)<l(f^{*})\), that computed \(f^{*}\) with the help of p, it could go on to compute x from \(f^{*}\) and p, leading to \(K(x|p)\le l(g)+O(1)<l(f^{*})+O(1)\), which contradicts \(l(f^{*})=K(x|p)\).

Further, for any strings, \(K(f^{*}|p)\le K(f^{*})+O(1)\), since p can only help in compressing \(f^{*}\). Putting it all together leads to \(l(f^{*})=K(x|p)=K(f^{*}|p)+O(1)\le K(f^{*})+O(1)\). On the other hand, since \(K(f^{*})\le l(f^{*})+O(1)\) also holds in general, the claim \(K(f^{*})=l(f^{*})+O(1)\) follows.   \(\square \)

Proof (Theorem 2).

  1.

    Follows immediately from \(K(f^{*})=l(f^{*})+O(1)=K(x|p)+O(1)=K(f^{*}|p)+O(1)\).

  2.

    The first equality follows from Theorem 1, since we only need to read off the length of \(f^{*}\) in order to know \(K(f^{*})\) up to a constant. For the second equality, consider the symmetry of information for prefix complexity, \(K(f^{*},p)=K(f^{*})+K\left( p|f^{*},K(f^{*})\right) +O(1)=K(p)+K\left( f^{*}|p,K(p)\right) +O(1)\) [8, Theorem 3.9.1, p. 247]. If p does not help in computing a shorter description of \(f^{*}\), then knowing K(p) will not help either. Therefore, from (1), we obtain \(K\left( f^{*}|p,K(p)\right) =K(f^{*})+O(1)\) and hence \(K\left( p|f^{*},K(f^{*})\right) =K(p)+O(1)\).

  3.

    In general, by [8, Theorem 3.9.1, p. 247] we can expand \(K(f^{*},p)=K(f^{*})+K\left( p|f^{*},K(f^{*})\right) +O(1)\). After inserting (2) the claim follows.   \(\square \)

Proof (Theorem 3).

  1.

    Expand K(x,p) up to an additive constant:

    $$\begin{aligned} K(p)+K(x|p,K(p))=K(x,p)=K(x)+K(p|x,K(x)) \end{aligned}$$
    (A.1)

    From Lemma 1(1) and Theorem 1 we know \(K(f^{*})=K(x|p)+O(1)\). Conditioning this on K(p) and using \(f^{*}\)’s independence of p and thereby of K(p) (Theorem 2(1)) we get \(K(x|p,K(p))=K(f^{*}|K(p))+O(1)=K(f^{*})+O(1)\). Inserting this into Eq. (A.1) and using Theorem 2(3), yields

    $$\begin{aligned} K(f^{*},p)=K(p)+K(f^{*})=K(x)+K(p|x,K(x))+O(1) \end{aligned}$$
    (A.2)

  2.

    Fix \(f^{*}\) and let \(P_{f^{*}}(x)\equiv \left\{ f'(x):\; f'\in D_{f^{*}}(x)\right\} \) be the set of admissible parameters computing x from \(f^{*}\). From Lemma 1(2), we know that minimizing \(l(f')\), with \(s=f'(x)\), is equivalent to minimizing K(s|x), i.e. choosing a string \(p=f'^{*}(x)\in P_{f^{*}}(x)\) such that \(K(s|x)\ge K(p|x)\) for all \(s\in P_{f^{*}}(x)\). Conditioning Eq. (A.2) on x leads to:

    $$\begin{aligned} K(p|x)+K(f^{*}|x)=K(x|x)+K(p|x,K(x),x)=K(p|x,K(x)) \end{aligned}$$
    (A.3)

    up to additive constants. Since \(f^{*}\) and x are fixed, the claim \(l(f'^{*})=K(p|x)\propto K(p|x,K(x))+O(1)\) follows.

  3.

    It remains to show that there exists some \(p\in P_{f^{*}}(x)\) such that \(K(p|x,K(x))=O(1)\). After all, if it exists, it will be identified by minimizing \(l(f')\), as implied by (2). Define \(q\equiv \text{ argmin }_{s}\left\{ l(s):\; U\left( \left\langle f^{*},U(s)\right\rangle \right) =f^{*}\left( U(s)\right) =x\right\} \) and compute \(p\equiv U(q)\). Since \(f^{*}(p)=x\), \(p\in P_{f^{*}}(x)\). Further, there is no shorter program able to compute p, since with p we can compute x given \(f^{*}\), and q is by definition the shortest program able to do so. Therefore, \(l(q)=K(p)+O(1)\) and \(K(x|f^{*})\le K(p)+O(1)\). Can the complexity \(K(x|f^{*})\) be strictly smaller than K(p), thereby undercutting the presumably residual information in p? Let \(p'\) be such a program: \(l(p')=K(x|f^{*})<K(p)+O(1)\). By definition of \(K(x|f^{*})\), \(f^{*}(p')=x\). However, we can then find the shortest program \(q'\) that computes \(p'\), and we get \(f^{*}\left( U(q')\right) =x\). Since \(l(q')\le l(p')+O(1)\), we get \(l(q')<K(p)+O(1)=l(q)+O(1)\). This contradicts the fact that q is already the shortest program able to compute \(f^{*}(U(q))=x\). Therefore,

    $$\begin{aligned} l(q)=K(x|f^{*})=K(p)+O(1) \end{aligned}$$
    (A.4)

    In order to prove \(K(p|x,K(x))=O(1)\) consider the following general expansion

    $$\begin{aligned} K(p,x|f^{*})=K(x|f^{*})+K(p|x,K(x),f^{*})+O(1) \end{aligned}$$
    (A.5)

    Since we can compute p from q and go on to compute x given \(f^{*}\), \(l(q)=K(p,x|f^{*})+O(1)\). After all, note that with Theorem 2(2), we have \(l(q)=K(p)=K(p|f^{*})\le K(p,x|f^{*})\) up to additive constants, but since we can compute \(\left\langle p,x\right\rangle \) given \(f^{*}\) from q, we know \(K(p,x|f^{*})\le l(q)+O(1)\). Both inequalities can only be true if the equality \(l(q)=K(p,x|f^{*})+O(1)\) holds. At the same time, from Eq. (A.4), \(l(q)=K(x|f^{*})\) holds. Inserting this into Eq. (A.5) leads to \(K(p|x,K(x),f^{*})=O(1)\). Taking \(K(p)=K(p|f^{*})+O(1)\) (Theorem 2(2)), and inserting the conditionals x and K(x) leads to: \(K(p|x,K(x))=K(p|x,K(x),f^{*})+O(1)=O(1)\). Since this shows that a \(p\in P_{f^{*}}(x)\) exists with the minimal value \(K(p|x,K(x))=O(1)\), (2) implies that it must be the same or equivalent to the one found by minimizing \(l(f')\).

  4.

    Conditioning Eq. (A.3) on K(x), we get \(K(p|x,K(x))+K(f^{*}|x,K(x))=K(p|x,K(x))+O(1)\), from which the claim follows.   \(\square \)

Proof (Corollary 1). Inserting Eq. (5.2) into Eq. (5.1) proves the claim.   \(\square \)

Proof (Corollary 2). Inserting Eq. (A.2) into Eq. (5.4) and using the incompressibility of \(f^{*}\) (Theorem 1) proves the claim.   \(\square \)

Proof (Theorem 4). According to the definition of a feature, at compression step i the length of the parameters decreases, \(l(p_{i})<l(p_{i-1})-l(f_{i}^{*})\) with \(p_{0}\equiv x\), and so does their complexity (Corollary 2). Since the \(f_{i}^{*}\) are themselves incompressible (Theorem 1), the parameters store the residual information about x. Therefore, at some point only the possibility \(p_k\equiv {f'}^{*}_k(p_{k-1})=\epsilon \) with \(l(f_{k}^{*})=K(p_{k-1})\) remains, and the compression has to stop. Expanding Corollary 2 proves the result: \(K(x)=l(f_{1}^{*})+K(p_{1})+O(1)=l(f_{1}^{*})+l(f_{2}^{*})+K(p_{2})+O(1)=\sum _{i=1}^{k}l(f_{i}^{*})+O(1)\).   \(\square \)
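The bookkeeping in this proof (each step maps \(p_{i-1}\) to \(p_{i}={f'}^{*}_{i}(p_{i-1})\), reading \(p_{0}\) as x, until the residual is \(\epsilon \), so that \(x=f_{1}^{*}(f_{2}^{*}(\cdots f_{k}^{*}(\epsilon )))\)) can be mirrored in a toy Python sketch. Kolmogorov complexity is uncomputable, so the feature search here is a hypothetical stand-in, not the paper's method: data are positive integers, a "feature" multiplies its parameter by a factor m, the matching descriptive map divides by m, and the integer 1 plays the role of \(\epsilon \). The names `find_feature`, `incremental_compress` and `decompress` are illustrative only.

```python
from typing import Callable, List, Optional, Tuple

# Toy feature: (f, f_prime). f rebuilds data from parameters; f_prime
# extracts the smaller parameters from the data. Hypothetical stand-ins
# for programs on a universal machine.
Feature = Tuple[Callable[[int], int], Callable[[int], int]]

EMPTY = 1  # toy stand-in for the empty string epsilon


def find_feature(x: int) -> Optional[Feature]:
    """Hypothetical feature search: use the smallest factor m > 1 of x."""
    for m in range(2, x + 1):
        if x % m == 0:
            return (lambda p, m=m: m * p, lambda y, m=m: y // m)
    return None


def incremental_compress(x: int) -> Tuple[List[Feature], int]:
    """Extract features one at a time, continuing on the residual parameters."""
    features: List[Feature] = []
    residual = x
    while residual != EMPTY:
        feature = find_feature(residual)
        if feature is None:  # no further feature found: stop early
            break
        features.append(feature)
        residual = feature[1](residual)  # p_i = f'_i(p_{i-1})
    return features, residual


def decompress(features: List[Feature], residual: int) -> int:
    """Rebuild x = f_1(f_2(... f_k(residual)))."""
    p = residual
    for f, _ in reversed(features):
        p = f(p)
    return p


if __name__ == "__main__":
    feats, res = incremental_compress(360)
    print(len(feats), res, decompress(feats, res))  # prints: 6 1 360
```

These toy factors are not shortest descriptions, so the sketch only illustrates the composition structure and the stopping condition \(p_{k}=\epsilon \), not the additivity of feature lengths.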

Proof (Theorem 5). Algorithmic information is defined as \(I(f_{i}^{*}:f_{j}^{*})\equiv K(f_{j}^{*})-K(f_{j}^{*}|f_{i}^{*})\). The case \(i=j\) is trivial, since \(K(f_{i}^{*}|f_{i}^{*})=O(1)\). If \(i>j\), then \(p_{j}=\left( f_{j+1}^{*}\circ \cdots \circ f_{i}^{*}\right) (p_{i})\), which implies that all information about \(f_{i}^{*}\) is contained in \(p_{j}\). But since \(K(f_{j}^{*}|p_{j})=K(f_{j}^{*})+O(1)\) according to Theorem 2(1), we conclude that \(K(f_{j}^{*}|f_{i}^{*})=K(f_{j}^{*})+O(1)\). If \(i<j\), then \(f_{j}^{*}\), which is extracted further along in the compression process, in no way contributed to the construction of \(p_{i}\). Hence \(K(f_{j}^{*}|f_{i}^{*})=K(f_{j}^{*})+O(1)\).   \(\square \)
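Theorem 5 is phrased in terms of \(I(a:b)\equiv K(b)-K(b|a)\). Since K is uncomputable, a common practical stand-in (used, for example, in compression-based similarity measures such as the normalized compression distance) approximates K(b) by a compressed length C(b) and K(b|a) by C(ab) minus C(a). The sketch below swaps in Python's `zlib` as the compressor C. This substitution is an assumption for illustration only; it yields crude upper-bound estimates and is not the method of the paper. The names `C`, `approx_information` and `random_bytes` are hypothetical.

```python
import random
import zlib


def C(data: bytes) -> int:
    """Compressed length in bytes: a crude, computable proxy for K."""
    return len(zlib.compress(data, 9))


def approx_information(a: bytes, b: bytes) -> int:
    """Proxy for I(a:b) = K(b) - K(b|a), with K(b|a) ~ C(a + b) - C(a)."""
    conditional = C(a + b) - C(a)
    return C(b) - conditional


def random_bytes(n: int, seed: int) -> bytes:
    """Reproducible pseudo-random bytes (essentially incompressible for zlib)."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


if __name__ == "__main__":
    a = random_bytes(1000, seed=1)
    b = random_bytes(1000, seed=2)  # generated independently of a
    print("I(a:a) ~", approx_information(a, a))  # large: a determines itself
    print("I(a:b) ~", approx_information(a, b))  # near zero: "orthogonal" strings
```

On such random inputs, I(a:a) comes out roughly equal to the length of a, while I(a:b) for independently generated b stays near zero, mirroring the orthogonality \(K(f_{j}^{*}|f_{i}^{*})=K(f_{j}^{*})+O(1)\) for \(i\ne j\). On structured data, compressor overheads and the finite window make such estimates much noisier.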

Proof (Theorem 6). Let \(p\equiv f'^{*}(x)\). From Lemma 1 we know that \(K(x|p)=l(f^{*})\) and \(K(p|x)=l(f'^{*})\). Using Corollary 2, the difference in algorithmic information is \(I(p:x)-I(x:p)=K(x)-K(x|p)-K(p)+K(p|x)=l(f'^{*})+O(1)\). By [8, Lemma 3.9.2, p. 250], algorithmic information is symmetric up to logarithmic terms: \(|I(x:p)-I(p:x)|\le \log K(x)+2\log \log K(x)+\log K(p)+2\log \log K(p)+O(1)\). Since x is computed from \(f^{*}\) and p, Corollary 2 gives \(K(x)=l(f^{*})+K(p)+O(1)\) and hence \(K(p)\le K(x)+O(1)\). Putting everything together leads to \(l(f'^{*})\le 2\log K(x)+4\log \log K(x)+O(1)\). The second inequality follows from \(K(x)\le l(x)+2\log l(x)+O(1)\), which implies \(\log K(x)\le \log l(x)+O(1)\).   \(\square \)
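To get a feel for how small the bound in Theorem 6 is, the following Python sketch evaluates \(2\log K+4\log \log K\). It drops the unknown additive O(1) constant and takes logarithms to base 2; both are assumptions of this illustration. The function name `parameter_map_bound` is hypothetical.

```python
import math


def parameter_map_bound(k: float) -> float:
    """2*log2(k) + 4*log2(log2(k)): Theorem 6's bound without the O(1) term.

    Requires k > 2 so that log2(log2(k)) is positive.
    """
    if k <= 2:
        raise ValueError("bound is only meaningful for k > 2")
    log_k = math.log2(k)
    return 2 * log_k + 4 * math.log2(log_k)


if __name__ == "__main__":
    for bits in (10**3, 10**6, 10**9):
        bound = parameter_map_bound(bits)
        print(f"K(x) = {bits:>13,} bits -> l(f'*) <= {bound:6.1f} bits + O(1)")
```

By the theorem's second inequality, the same expression evaluated at l(x) also bounds \(l(f'^{*})\) up to an additive constant, so even for data a billion bits long the descriptive map needs fewer than a hundred bits beyond that constant.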

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Franz, A. (2016). Some Theorems on Incremental Compression. In: Steunebrink, B., Wang, P., Goertzel, B. (eds) Artificial General Intelligence. AGI 2016. Lecture Notes in Computer Science, vol 9782. Springer, Cham. https://doi.org/10.1007/978-3-319-41649-6_8

  • DOI: https://doi.org/10.1007/978-3-319-41649-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41648-9

  • Online ISBN: 978-3-319-41649-6

  • eBook Packages: Computer Science, Computer Science (R0)