Abstract
Algorithmic statistics studies explanations of observed data that are good in the algorithmic sense: an explanation should be simple, i.e., should have small Kolmogorov complexity, and should capture all the algorithmically discoverable regularities in the data. However, this idea cannot be used in practice because Kolmogorov complexity is not computable.
In this paper we develop algorithmic statistics using space-bounded Kolmogorov complexity. We prove an analogue of one of the main results of ‘classic’ algorithmic statistics (about the connection between optimality and randomness deficiencies). The main tool of our proof is the Nisan-Wigderson generator.
Notes
- 1.
- 2.
The Kolmogorov complexity of a finite set A is defined as follows. Fix a computable bijection \(A \mapsto [A]\) from the family of finite sets to the set of binary strings, called an encoding. Then \(\mathrm{C}(A)\) is defined as the complexity \(\mathrm{C}([A])\) of the code [A] of A.
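The note leaves the encoding arbitrary. As one concrete possibility (an assumption for illustration, not necessarily the encoding the author had in mind), finite sets of binary strings can be mapped bijectively to binary strings by first numbering strings, then turning a set into the bit mask of its members' numbers, then writing that number back as a string. The helper names below are hypothetical.

```python
def string_to_index(s: str) -> int:
    """Bijection from binary strings to naturals: '' -> 0, '0' -> 1, '1' -> 2, '00' -> 3, ..."""
    return int("1" + s, 2) - 1


def index_to_string(i: int) -> str:
    """Inverse of string_to_index: write i + 1 in binary and drop the leading 1."""
    return bin(i + 1)[3:]  # bin() returns '0b1...', so [3:] removes '0b1'


def encode_set(A) -> str:
    """[A]: a finite set of binary strings -> a binary string.

    The set is first turned into the natural number sum(2**index(s) for s in A),
    a bijection between finite sets of naturals and naturals, and that number
    is then mapped to a binary string.
    """
    mask = 0
    for s in A:
        mask |= 1 << string_to_index(s)
    return index_to_string(mask)


def decode_set(code: str) -> frozenset:
    """Inverse of encode_set."""
    mask = string_to_index(code)
    members = set()
    i = 0
    while mask:
        if mask & 1:
            members.add(index_to_string(i))
        mask >>= 1
        i += 1
    return frozenset(members)


A = {"0", "11"}
code = encode_set(A)
print(code)  # 000011
```

Any two computable encodings change \(\mathrm{C}(A)\) by at most an additive constant, since translating one code into the other is itself computable; this is why the choice does not matter.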
- 3.
The randomness deficiency of a string x with respect to a distribution P is defined as \(d(x \mid P) := -\log P(x) - \mathrm{C}(x \mid P)\), and the optimality deficiency is defined as \(\delta(x,P) := \mathrm{C}(P) - \log P(x) - \mathrm{C}(x)\).
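To see how the two deficiencies compare, one can subtract the definitions. This is a standard observation from classic algorithmic statistics, sketched here informally and only up to logarithmic terms; the last step assumes the plain-complexity inequality \(\mathrm{C}(x) \le \mathrm{C}(P) + \mathrm{C}(x \mid P) + O(\log \mathrm{C}(P))\).

```latex
\begin{aligned}
\delta(x,P) - d(x \mid P)
  &= \bigl(\mathrm{C}(P) - \log P(x) - \mathrm{C}(x)\bigr)
     - \bigl(-\log P(x) - \mathrm{C}(x \mid P)\bigr) \\
  &= \mathrm{C}(P) + \mathrm{C}(x \mid P) - \mathrm{C}(x) \\
  &\ge -O\bigl(\log \mathrm{C}(P)\bigr).
\end{aligned}
```

So the optimality deficiency is never much smaller than the randomness deficiency.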
- 4.
Such a universal machine exists; see [5].
- 5.
Theorem 1.2 in [8] is formulated differently: it says nothing about \(|\widehat{f}|\). However, its proof shows that the required program (denote it by \(\widehat{f}_1\)) is obtained from f by an algorithmic transformation. Therefore there exists a program \(\widehat{f}\) that computes the same function as \(\widehat{f}_1\) and satisfies \(|\widehat{f}| \le |f| + O(1)\).
Also, Theorem 1.2 does not allow \(\Pr [f(x)]\) to belong to \([\frac{1}{3}; \frac{2}{3}]\). However, this restriction is not used in the proof of Theorem 1.2.
References
Ajtai, M.: Approximate counting with uniform constant-depth circuits. In: Advances in Computational Complexity Theory, pp. 1–20. American Mathematical Society (1993)
Buhrman, H., Fortnow, L., Laplante, S.: Resource-bounded Kolmogorov complexity revisited. SIAM J. Comput. 31(3), 887–905 (2002)
Furst, M., Saxe, J.B., Sipser, M.: Parity, circuits, and the polynomial-time hierarchy. Math. Syst. Theory 17(1), 13–27 (1984)
Kolmogorov, A.N.: Three approaches to the quantitative definition of information. Problems Inf. Transmission 1(1), 4–11 (1965). English translation published in Int. J. Comput. Math. 2, 157–168 (1968)
Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications, 3rd edn, p. 792. Springer, Heidelberg (2008). 1st edn. 1993; 2nd edn. 1997
Longpré, L.: Resource bounded Kolmogorov complexity, a link between computational complexity and information theory. Ph.D. thesis, Cornell University, Ithaca, NY (1986)
Musatov, D.: Improving the space-bounded version of Muchnik’s conditional complexity theorem via “naive” derandomization. Theory Comput. Syst. 55(2), 299–312 (2014)
Nisan, N.: \(RL \subseteq SC\). J. Comput. Complex. 4, 1–11 (1994)
Nisan, N.: Pseudorandom bits for constant depth circuits. Combinatorica 11, 63–70 (1991)
Nisan, N., Wigderson, A.: Hardness vs randomness. J. Comput. Syst. Sci. 49(2), 149–167 (1994)
Shen, A., Kolmogorov, A.: Around Kolmogorov complexity: basic notions and results. In: Vovk, V., Papadopoulos, H., Gammerman, A. (eds.) Measures of Complexity: Festschrift for Alexey Chervonenkis. Springer, Heidelberg (2015)
Shen, A.: The concept of \((\alpha , \beta )\)-stochasticity in the Kolmogorov sense, and its properties. Sov. Math. Doklady 271(1), 295–299 (1983)
Shen, A., Uspensky, V., Vereshchagin, N.: Kolmogorov complexity and algorithmic randomness. In: MCCME 2013 (Russian). English translation http://www.lirmm.fr/~ashen/kolmbook-eng.pdf
Sipser, M.: A complexity theoretic approach to randomness. In: Proceedings of the 15th ACM Symposium on the Theory of Computing, pp. 330–335 (1983)
Vereshchagin, N., Vitányi, P.: Kolmogorov’s Structure Functions with an Application to the foundations of model selection. IEEE Trans. Inf. Theory 50(12), 3265–3290 (2004). Preliminary version: Proceedings of the 43rd IEEE Symposium on Foundations of Computer Science, pp. 751–760 (2002)
Vereshchagin, N.K., Vitányi, P.M.B.: Rate distortion and denoising of individual data using Kolmogorov complexity. IEEE Trans. Inf. Theory 56(7), 3438–3454 (2010)
Acknowledgments
I would like to thank Nikolay Vereshchagin and Alexander Shen for useful discussions, advice and remarks.
This work is supported by RFBR grant 16-01-00362 and in part by the Young Russian Mathematics award and the RaCAF ANR-15-CE40-0016-01 grant. The study has been funded by the Russian Academic Excellence Project ‘5-100’.
Appendix
Symmetry of Information
Define \(\mathrm{CD}^m(A,B)\) as the minimal length of a program that takes a pair of strings (a, b) as input, outputs the pair of Boolean values \((a \in A, b \in B)\), and uses space at most m on every input.
Lemma 4
(Symmetry of information). Assume \(A, B \subseteq \{0,1\}^n\). Then
for \(p = m + \text{poly}(n + \mathrm{CD}^m(A,B))\).
for \(p = 2m + \text{poly}(n + \mathrm{CD}^m(A,B))\).
Proof
(of Lemma 4(a)). The proof is similar to the proof of Theorem 4(a).
Proof
(of Lemma 4(b)). Let \(k := \mathrm{CD}^m(A,B)\). Denote by \(\mathcal{D}\) the family of pairs of sets (U, V) such that \(U, V \subseteq \{0,1\}^n\) and \(\mathrm{CD}^m(U,V) \le k\). There are \(2^{k+1} - 1\) binary programs of length at most k, and each of them distinguishes at most one pair, so \(|\mathcal{D}| < 2^{k+1}\). Denote by \(\mathcal{D}_A\) the set of pairs in \(\mathcal{D}\) whose first element is A, and let t be the integer satisfying \(2^t \le |\mathcal{D}_A| < 2^{t+1}\).
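The two counting facts used in this proof, the bound \(|\mathcal{D}| < 2^{k+1}\) and the choice of t, together with the later claim that a position in an enumeration of \(\mathcal{D}_A\) fits in \(t+1\) bits, can be checked with a few lines of arithmetic. The function names are hypothetical helpers for illustration only.

```python
def num_programs_up_to(k: int) -> int:
    """Number of binary programs of length at most k: 2^0 + 2^1 + ... + 2^k = 2^(k+1) - 1."""
    return sum(2 ** i for i in range(k + 1))


def t_for(size_DA: int) -> int:
    """The unique integer t with 2^t <= size_DA < 2^(t+1)."""
    if size_DA < 1:
        raise ValueError("D_A is nonempty: it contains the pair (A, B)")
    return size_DA.bit_length() - 1


def ordinal_bits(position: int, size_DA: int) -> str:
    """Write a 0-based position in an enumeration of D_A using exactly t + 1 bits."""
    if not 0 <= position < size_DA:
        raise ValueError("position out of range")
    width = t_for(size_DA) + 1  # enough, because size_DA < 2^(t+1)
    return format(position, f"0{width}b")


print(num_programs_up_to(5))         # 63, which is < 2^6
print(t_for(5), ordinal_bits(4, 5))  # 2 100
```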
Let us prove that
- \(\mathrm{CD}^p(B \mid A)\) exceeds t by at most a logarithmic term;
- \(\mathrm{CD}^p(A)\) exceeds \(k - t\) by at most a logarithmic term.
Here \(p = m + O(n)\).
We start with the first statement. There exists a program that, using A as an oracle, enumerates all pairs in \(\mathcal{D}_A\) and works in space \(2m + O(n)\). Such an enumeration can be done as follows: enumerate all programs of length at most k and, for every pair of n-bit strings, check two conditions. First, the program uses at most m space on this input. Second, its first output bit is 1 if the first string of the pair belongs to A, and 0 otherwise. Since some programs may loop, we need an additional \(m + O(n)\) space to detect this: a step counter suffices, because a computation that runs longer than the number of its possible configurations must loop.
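The loop-detection step can be illustrated on a toy deterministic machine whose configurations form a finite set of known size N: if it runs N steps without halting, some configuration has repeated and it will never halt. Only a counter of about \(\log_2 N\) bits is needed, not the list of visited configurations. The step-function interface and the two toy machines below are hypothetical, and this sketch does not model tape contents or the exact space accounting of the proof.

```python
from typing import Callable, Hashable, Optional, Tuple

# A step function maps a configuration to ("halt", output) or ("next", new_configuration).
StepFn = Callable[[Hashable], Tuple[str, object]]


def run_with_loop_detection(step: StepFn, start: Hashable, num_configs: int) -> Optional[object]:
    """Run a deterministic machine whose configurations form a set of size num_configs.

    Returns the output if the machine halts, or None if it must loop forever.
    Only a step counter (about log2(num_configs) bits) is kept, not the visited history.
    """
    config = start
    for _ in range(num_configs):
        tag, value = step(config)
        if tag == "halt":
            return value
        config = value
    # num_configs steps without halting visit num_configs + 1 configurations,
    # so one repeats; a deterministic machine that revisits a configuration loops forever.
    return None


# Toy machines on the configurations {0, 1, 2, 3, 4} (hypothetical examples).
def halts_at_3(c):
    return ("halt", c) if c == 3 else ("next", (c + 1) % 5)


def cycles_forever(c):
    return ("next", (c + 1) % 5)


print(run_with_loop_detection(halts_at_3, 0, 5))      # 3
print(run_with_loop_detection(cycles_forever, 0, 5))  # None
```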
Append to this program the ordinal number of the pair (A, B) in this enumeration. Since \(|\mathcal{D}_A| < 2^{t+1}\), this number can be written using at most \(t+1\) bits. Therefore \(\mathrm{CD}^p(B \mid A) \le t + O(\log(\mathrm{CD}^m(A,B) + m + n))\).
Now let us prove the second statement. The sets \(\mathcal{D}_U\) are pairwise disjoint and their union is \(\mathcal{D}\); since \(|\mathcal{D}| < 2^{k+1}\), there are fewer than \(2^{k-t+1}\) sets U such that \(|\mathcal{D}_U| \ge 2^t\) (A is one of them). Hence, if we construct a program that enumerates all sets with this property (and does not use much space), we are done: the set A can be described by its ordinal number in this enumeration, which takes at most \(k - t + 1\) bits.
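The counting step is a pigeonhole argument: each "heavy" first element U accounts for at least \(2^t\) of the fewer than \(2^{k+1}\) pairs. A quick randomized check on a toy family makes this concrete; here small integers stand in for the sets U and V, and the data are synthetic, chosen only for illustration.

```python
import random
from collections import Counter


def heavy_first_sets(D, t):
    """Elements U that are the first component of at least 2^t pairs in the family D."""
    counts = Counter(U for U, _ in D)
    return [U for U, c in counts.items() if c >= 2 ** t]


# Hypothetical toy family: first and second components are small integers standing for sets.
rng = random.Random(0)
k = 8
D = {(rng.randrange(40), rng.randrange(1000)) for _ in range(2 ** (k + 1) - 1)}
for t in range(k + 1):
    # Each heavy U accounts for at least 2^t of the fewer than 2^(k+1) pairs.
    assert len(heavy_first_sets(D, t)) < 2 ** (k - t + 1)
```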
Let us construct such a program. It enumerates all sets U that occur as first elements of pairs in \(\mathcal{D}\); that is, it enumerates (say, lexicographically) the programs that distinguish the corresponding pairs. A set U is included in the enumeration if the following two properties hold. First, \(|\mathcal{D}_U| \ge 2^t\). Second, U has not been met earlier, i.e., no lexicographically smaller program distinguishes a pair from \(\mathcal{D}\) whose first element is U.
This program works in space \(2m + \text{poly}(n + \mathrm{CD}^m(A,B))\), as required, and has length \(O(\log(\mathrm{CD}^m(A) + n + m))\). Appending to it the ordinal number of A in this enumeration (at most \(k - t + 1\) bits) gives the second statement.
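The enumeration above avoids storing the sets it has already output: both "\(|\mathcal{D}_U| \ge 2^t\)" and "U has not been met earlier" are decided by rescanning the earlier programs, trading time for space. The sketch below mirrors that control flow on a hypothetical toy list in which position j stands for the j-th program of length at most k in lexicographic order, `None` marks a program that is not in \(\mathcal{D}\) (for example, one exceeding space m), and frozensets stand for the distinguished pair (U, V). It illustrates the logic only; Python's own memory use is not analyzed.

```python
from typing import Callable, FrozenSet, Iterator, List, Optional, Tuple

Pair = Tuple[FrozenSet[str], FrozenSet[str]]


def is_first_occurrence(programs: List[Optional[Pair]], j: int,
                        key: Callable[[Pair], object]) -> bool:
    """True if no earlier program (index < j) yields the same key as programs[j]."""
    target = key(programs[j])
    return all(programs[i] is None or key(programs[i]) != target for i in range(j))


def size_of_D_U(programs: List[Optional[Pair]], U: FrozenSet[str]) -> int:
    """|D_U|: distinct pairs with first element U, counted by rescanning (no stored set)."""
    return sum(1 for j, pair in enumerate(programs)
               if pair is not None and pair[0] == U
               and is_first_occurrence(programs, j, lambda q: q))


def enumerate_heavy_sets(programs: List[Optional[Pair]], t: int) -> Iterator[FrozenSet[str]]:
    """Yield each U with |D_U| >= 2^t once, in the order of its first distinguishing program."""
    for j, pair in enumerate(programs):
        if pair is None:
            continue  # this program does not distinguish a pair of D
        U = pair[0]
        if is_first_occurrence(programs, j, lambda q: q[0]) and size_of_D_U(programs, U) >= 2 ** t:
            yield U


# Hypothetical toy list of programs in lexicographic order.
programs = [
    (frozenset({"00"}), frozenset({"01"})),
    None,
    (frozenset({"00"}), frozenset({"10"})),
    (frozenset({"11"}), frozenset()),
    (frozenset({"00"}), frozenset({"01"})),  # same pair as index 0
    (frozenset({"11"}), frozenset({"00"})),
    (frozenset(), frozenset({"11"})),
]
heavy = list(enumerate_heavy_sets(programs, t=1))
print(heavy.index(frozenset({"11"})))  # 1: the ordinal number describing U = {"11"}
```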
Proof
(of Lemma 3). Let us show that \(\mathcal{B}\) satisfies property \((1)^*\) with probability at most \(2^{-n}\). Since \(\mathcal{B}\) satisfies property (2) with probability at most \(\frac{1}{4}\) (see the proof of Lemma 2), this is enough.
To this end, let us show that every part is ‘bad’ (i.e., contains at least \((n + k)^2 + 1\) sets from \(\mathcal{B}\)) with probability at most \(2^{-2n}\). The probability of this event equals the probability that a binomial random variable with parameters \((2^k, 2^{-k}(n + 2)\ln 2)\) exceeds \((n + k)^2\). Obtaining the required upper bound is not difficult, but the corresponding formulas are cumbersome. Let \(w := 2^k\), \(p := 2^{-k}(n + 2)\ln 2\) and \(v := (n + k)^2\). We need to estimate
The first inequality holds since \(wp = (n+2) \ln 2 \le (n+k)^2 = v\). Now note that \(wp= (n+2) \ln 2 < 10 n\). So
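As a numerical sanity check of the bound being proved (not a replacement for the analytic estimate), the binomial tail \(\Pr[X > v]\) for \(X \sim \mathrm{Bin}(w, p)\) with the parameters \(w, p, v\) defined above can be evaluated directly for small n and k and compared with \(2^{-2n}\). The sketch sums the tail in log space via `math.lgamma` to avoid overflow; the ranges of n and k are arbitrary choices, and the function names are hypothetical.

```python
import math


def binomial_upper_tail(w: int, p: float, v: int) -> float:
    """Pr[X > v] for X ~ Binomial(w, p), with each term computed in log space."""
    if v >= w:
        return 0.0  # X never exceeds w
    if not 0.0 < p < 1.0:
        raise ValueError("need 0 < p < 1")
    log_p, log_q = math.log(p), math.log1p(-p)
    log_w_fact = math.lgamma(w + 1)
    total = 0.0
    for i in range(v + 1, w + 1):
        log_term = (log_w_fact - math.lgamma(i + 1) - math.lgamma(w - i + 1)
                    + i * log_p + (w - i) * log_q)
        total += math.exp(log_term)
    return total


def lemma3_parameters(n: int, k: int):
    """w = 2^k, p = 2^(-k) (n + 2) ln 2, v = (n + k)^2, as in the proof."""
    return 2 ** k, 2.0 ** (-k) * (n + 2) * math.log(2), (n + k) ** 2


for n in range(1, 9):
    for k in range(1, 13):
        w, p, v = lemma3_parameters(n, k)
        assert binomial_upper_tail(w, p, v) <= 2.0 ** (-2 * n)
```

For these small cases the tail is far below \(2^{-2n}\), which is consistent with the observation \(wp = (n+2)\ln 2 \le v\) used above: the threshold v lies well beyond the mean.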
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Milovanov, A. (2017). On Algorithmic Statistics for Space-Bounded Algorithms. In: Weil, P. (eds) Computer Science – Theory and Applications. CSR 2017. Lecture Notes in Computer Science, vol 10304. Springer, Cham. https://doi.org/10.1007/978-3-319-58747-9_21
DOI: https://doi.org/10.1007/978-3-319-58747-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58746-2
Online ISBN: 978-3-319-58747-9
eBook Packages: Computer Science (R0)