
On Algorithmic Statistics for Space-bounded Algorithms

Published in Theory of Computing Systems.

Abstract

Algorithmic statistics looks for models of observed data that are good in the following sense: a model is simple (i.e., has small Kolmogorov complexity) and captures all the algorithmically discoverable regularities in the data. However, this idea cannot be used in practice as is, because Kolmogorov complexity is not computable. In this paper we develop an algorithmic version of algorithmic statistics that uses space-bounded Kolmogorov complexity. We prove a space-bounded version of a basic result from “classical” algorithmic statistics, the connection between optimality and randomness deficiencies. The main tool is the Nisan–Wigderson pseudo-random generator. An extended abstract of this paper was presented at the 12th International Computer Science Symposium in Russia (Milovanov [10]).


Notes

  1. The definition and basic properties of Kolmogorov complexity can be found in the textbooks [7, 16]; for a short survey see [14].

  2. Kolmogorov complexity of a finite set A is defined as follows. We fix some computable bijection (encoding) A↦[A] from the family of all finite sets to the set of all binary strings. Then we define C(A) as the complexity C([A]) of the code [A] of A.

  3. The randomness deficiency of a string x with respect to a distribution P is defined as d(x|P) := − log P(x) −C(x|P). The optimality deficiency is defined as δ(x,P) := C(P) − log P(x) −C(x). See [18] for details.

  4. We agree that only work tape cells (but not the cells on input or output tapes) are taken into account.

  5. This is only one possible way to define the notion of bounded-space complexity for a distribution. Instead of a randomized program that has P as its output distribution, we may consider a program that computes P(x) for a given input x. The relations between these two definitions are not well understood.

  6. We use a stronger variant than the theorem in [5], but the proof is the same: we added requirement (c), which is easily seen to hold because the constructed program for \(\hat {f}\) is a simple transformation of f, and it suffices to add some fixed number of instructions to f. Also, the theorem in [5] does not assume that Pr[f(x)] belongs to \([\frac {1}{3}; \frac {2}{3}]\); however, this assumption is not used in the proof of this theorem.
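The deficiency definitions from note 3 can be illustrated numerically. The sketch below is a toy illustration only: true Kolmogorov complexity is uncomputable, so the complexity terms C(x|P), C(P) and C(x) are supplied by hand as assumed values; here P is taken uniform over 16-bit strings, so −log P(x) = 16 for every x.

```python
import math

def randomness_deficiency(neg_log_p: float, cond_complexity: float) -> float:
    """d(x|P) = -log P(x) - C(x|P); the complexity term is supplied by the
    caller, since true Kolmogorov complexity is uncomputable."""
    return neg_log_p - cond_complexity

def optimality_deficiency(c_p: float, neg_log_p: float, c_x: float) -> float:
    """delta(x,P) = C(P) - log P(x) - C(x)."""
    return c_p + neg_log_p - c_x

# P uniform over all 16-bit strings: -log P(x) = 16 for every x.
neg_log_p = -math.log2(1 / 2**16)            # = 16.0 exactly

# A "typical" (incompressible) x has C(x|P) close to 16, so d(x|P) is small;
# an atypical x (say C(x|P) = 2) has a large deficiency.
print(randomness_deficiency(neg_log_p, 15.0))  # 1.0
print(randomness_deficiency(neg_log_p, 2.0))   # 14.0
```

For the uniform model with C(P) ≈ 0 and C(x) ≈ 16, the optimality deficiency of a typical x is likewise close to zero.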

References

  1. Ajtai, M.: Approximate counting with uniform constant-depth circuits. In: Advances in Computational Complexity Theory, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, pp 1–20 (1993)

  2. Buhrman, H., Fortnow, L., Laplante, S.: Resource-bounded Kolmogorov complexity revisited. SIAM J. Comput. 31(3), 887–905 (2002)


  3. Demer, R.: Stack Exchange discussion (Ricky Demer’s answer to the author’s question). http://cstheory.stackexchange.com/questions/34896/can-every-distribution-producible-by-a-probabilistic-pspace-machine-be-produced (2016)

  4. Furst, M., Saxe, J.B., Sipser, M.: Parity, circuits, and the polynomial-time hierarchy. Mathematical Systems Theory 17(1), 13–27 (1984)


  5. Jung, H.: Relationships between probabilistic and deterministic tape complexity. In: Mathematical foundations of computer science 1981 (MFCS 1981), Lecture Notes in Computer Science, vol. 118, pp 339–346 (1981)

  6. Kolmogorov, A.N.: Three approaches to the quantitative definition of information. Probl. Inf. Transm. 1(1), 4–11 (1965). English translation published in: International Journal of Computer Mathematics, 2, pp 157–168 (1968)


  7. Li, M., Vitányi, P.M.B.: An introduction to Kolmogorov complexity and its applications, 3rd edn. Springer, Berlin (2008)

  8. Longpré, L.: Resource bounded Kolmogorov complexity, a link between computational complexity and information theory, Ph.D. thesis, TR86-776. Cornell University, Ithaca (1986)


  9. MacWilliams, F.J., Sloane, N.J.A.: The theory of error-correcting codes. I and II. Bull. Amer. Math. Soc. 84(6), 1356–1359 (1978)


  10. Milovanov, A.: On algorithmic statistics for space-bounded algorithms. In: Proceedings of 12th international computer science symposium in Russia (CSR 2017) LNCS, vol. 10304, pp 232–234 (2017)

  11. Musatov, D.: Improving the space-bounded version of Muchnik’s conditional complexity theorem via “naive” derandomization. Theory of Computing Systems 55(2), 299–312 (2014)


  12. Nisan, N.: Pseudorandom bits for constant depth circuits. Combinatorica 11, 63–70 (1991)


  13. Nisan, N., Wigderson, A.: Hardness vs randomness. J. Comput. Syst. Sci. 49(2), 149–167 (1994)


  14. Shen, A.: Around Kolmogorov complexity: basic notions and results. In: Vovk, V., Papadoupoulos, H., Gammerman, A. (eds.) Measures of Complexity. Festschrift for Alexey Chervonenkis. ISBN: 978-3-319-21851-9. Springer, Berlin (2015)

  15. Shen, A.: The concept of (α,β)-stochasticity in the Kolmogorov sense, and its properties. Soviet Mathematics Doklady 271(1), 295–299 (1983)


  16. Shen, A., Uspensky, V., Vereshchagin, N.: Kolmogorov complexity and algorithmic randomness. MCCME (2013) (Russian). English translation: http://www.lirmm.fr/~ashen/kolmbook-eng.pdf

  17. Sipser, M.: A complexity theoretic approach to randomness. In: Proceedings of the 15th ACM symposium on the theory of computing, pp 330–335 (1983)

  18. Vereshchagin, N., Shen, A.: Algorithmic statistics: forty Years. In: Computability and complexity. Essays Dedicated to Rodney G. Downey on the Occasion of His 60Th Birthday. LNCS, vol. 10010, pp 669–737. Springer, Heidelberg (2017)

  19. Vereshchagin, N., Vitányi, P.M.B.: Kolmogorov’s structure functions with an application to the foundations of model selection. IEEE Trans. Inf. Theory 50(12), 3265–3290 (2004). Preliminary version: Proceedings of 47th IEEE Symposium on the Foundations of Computer Science, pp. 751–760 (2002)


  20. Vereshchagin, N., Vitányi, P.M.B.: Rate distortion and denoising of individual data using Kolmogorov complexity. IEEE Trans. Inf. Theory 56(7), 3438–3454 (2010)



Acknowledgments

I would like to thank Bruno Bauwens, Ricky Demer, Nikolay Vereshchagin and Alexander Shen for useful discussions, advice and remarks.

This work is supported in part by the RFBR grant 16-01-00362, by the Young Russian Mathematics award, by the grant MK-5379.2018.1 and by the RaCAF ANR-15-CE40-0016-01 grant. The study has also been funded by the Russian Academic Excellence Project ‘5-100’.


Corresponding author

Correspondence to Alexey Milovanov.

Additional information

This article is part of the Topical Collection on Computer Science Symposium in Russia

Appendix

Proposition 13

Let x = yz be a concatenation of strings y and z of length n, where \(y= \overbrace {000 {\ldots } 00}^{n \text { zeros}}\) and C(z) > n − ε for some positive ε. Assume that x belongs to some Hamming ball B. Then

$$2{\mathrm{C}}(B) + \log |B| - {\mathrm{C}}(x) > \frac{2}{5} n - \varepsilon - O(\log n). $$

If ε is small, then no Hamming ball B can satisfy both C(B) ≈ 0 and log |B| ≈ C(x).

Proof

Denote by r and b the radius and the center of B, and by b0 and b1 the first and second halves of b (so b = b0b1 and |b0| = |b1| = n). Then z belongs to the Hamming ball B1 with center b1 and radius r. Hence,

$${\mathrm{C}}(B_{1}) + \log |B_{1}| \ge {\mathrm{C}}(z) - O(\log n) = {\mathrm{C}}(x) - O(\log n). $$

Note that C(B1) ≤ C(B) + O(log n). The log-size of B1 can be estimated as \(nH(\frac {r}{n}) + O(\log n)\), where H(t) := −t log t − (1 − t) log(1 − t) is the binary Shannon entropy (see, for example, [9]). So, the inequality above can be rewritten as

$$ {\mathrm{C}}(B) + nH\left( \frac{r}{n}\right) \ge {\mathrm{C}}(x) - O(\log n). $$
(1)

We need to show that \(2{\mathrm {C}}(B) + \log |B| - {\mathrm {C}}(x) > \frac {2}{5} n - \varepsilon - O(\log n)\), where \(\log |B| = 2nH(\frac {r}{2n}) + O(\log n)\). This inequality follows from (1) and the inequality below. Indeed, (1) gives \({\mathrm {C}}(B) \ge {\mathrm {C}}(x) - nH(\frac {r}{n}) - O(\log n)\), so

$$2{\mathrm{C}}(B) + \log |B| - {\mathrm{C}}(x) \ge {\mathrm{C}}(x) + 2n\left( H\left( \frac{r}{2n}\right) - H\left( \frac{r}{n}\right)\right) - O(\log n) \ge {\mathrm{C}}(x) - \frac{3}{5} n - O(\log n), $$

and the claim follows since \({\mathrm {C}}(x) \ge {\mathrm {C}}(z) - O(\log n) > n - \varepsilon - O(\log n)\).

$$H\left( \frac{r}{n}\right) - H\left( \frac{r}{2n}\right) \le \frac{3}{10}. $$

To verify the last inequality one can show that the maximum of the function\(H(t) - H(\frac {t}{2})\)is equal to\(\frac {\ln (\frac {3}{4} +\frac {1}{\sqrt {2}})}{\ln 4} < \frac {3}{10}\).□
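The claimed maximum of \(H(t) - H(\frac {t}{2})\) can be checked numerically. A quick grid search (an illustration only, not a proof; the maximum is attained near t ≈ 0.29) confirms the bound:

```python
import math

def H(t: float) -> float:
    """Binary Shannon entropy H(t) = -t*log2(t) - (1-t)*log2(1-t)."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

# Grid search for the maximum of f(t) = H(t) - H(t/2) on (0, 1).
best = max(H(t) - H(t / 2) for t in (i / 10**5 for i in range(1, 10**5)))

# Value claimed in the proof: ln(3/4 + 1/sqrt(2)) / ln(4) < 3/10.
claimed = math.log(3 / 4 + 1 / math.sqrt(2)) / math.log(4)

print(best, claimed)  # both are about 0.2716, safely below 3/10
```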

1.1 Symmetry of Information

Define C^m(A,B) as the minimal length of a program that, given an input pair of strings (a,b), uses at most space m and outputs the pair of bits indicating whether a ∈ A and whether b ∈ B.

Lemma 14 (Symmetry of information)

Assume A,B ⊆ {0, 1}^n. Then

$$\textup{(a) } \forall m \text{ } {\mathrm{C}}^{p}(A, B) \le {\mathrm{C}}^{m}(A) + {\mathrm{C}}^{m}(B | A) + O(\log ({\mathrm{C}}^{m}(A,B)+m + n)) $$

for p = m + poly(n + C^m(A,B)).

$$\textup{(b) } \forall m \text{ } {\mathrm{C}}^{p}(A) + {\mathrm{C}}^{p}(B | A) \le {\mathrm{C}}^{m}(A, B) + O(\log ({\mathrm{C}}^{m}(A,B)+m + n) ) $$

for p = 2m + poly(n + C^m(A,B)).

Proof of Lemma 14 (a)

The proof is similar to the proof of Theorem 6 (a).□

Proof of Lemma 14 (b)

Let k := C^m(A,B). Denote by \(\mathcal {D}\) the family of pairs of sets (U,V) such that C^m(U,V) ≤ k and U,V ⊆ {0, 1}^n. It is clear that \(|\mathcal {D}| < 2^{k + 1}\). Denote by \(\mathcal {D}_{A}\) the pairs in \(\mathcal {D}\) whose first element is equal to A. Let t satisfy the inequalities \(2^{t} \le |\mathcal {D}_{A}| < 2^{t + 1}\).

We prove that

  • C^p(B|A) does not exceed t significantly;

  • C^p(A) does not exceed k − t significantly.

Here p = 2m + O(n).

We start with the first statement. There exists a program that enumerates all sets from \(\mathcal {D}_{A}\) using A as an oracle and works in space 2m + O(n). Indeed, such an enumeration can be done in the following way: enumerate all programs of length k and verify the following conditions for every pair of n-bit strings. First, the program uses at most space m on this input and does not loop; to verify this, we check that the program does not run for more than 2^{O(m)} steps (a machine using space m has at most 2^{O(m)} configurations, so a longer run means a loop). Second, the program outputs 1 if the second n-bit string belongs to A, and 0 otherwise. Append to this program the ordinal number of a program that distinguishes (A,B); since \(|\mathcal {D}_{A}| < 2^{t + 1}\), this number takes at most t + 1 bits. Therefore we have C^p(B|A) ≤ t + O(log(C^m(A,B) + m + n)).
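The loop test above rests on a pigeonhole argument: a machine whose configuration space has N elements either halts within N steps or repeats a configuration and loops forever. A minimal sketch of such a budgeted simulation, with a hypothetical `step` function standing in for one step of the simulated machine (`None` denoting the halting state):

```python
def halts_within_budget(step, state, num_configs):
    """Run a deterministic machine whose configuration space has at most
    `num_configs` elements. If it has not halted after `num_configs` steps,
    some configuration repeated, so the machine loops forever."""
    for _ in range(num_configs):
        if state is None:          # convention: None means "halted"
            return True
        state = step(state)
    return state is None

# Toy machines over the state space {0, ..., 7}:
countdown = lambda s: None if s == 0 else s - 1  # counts down, then halts
cycle = lambda s: (s + 1) % 8                    # cycles forever

print(halts_within_budget(countdown, 7, 8))  # True
print(halts_within_budget(cycle, 0, 8))      # False
```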

Now we prove the second statement. Note that there exist at most 2^{k−t+1} sets U such that \(|\mathcal {D}_{U}| \ge 2^{t}\) (including A); indeed, \(|\mathcal {D}| < 2^{k + 1}\) and the families \(\mathcal {D}_{U}\) are disjoint. Hence, if we construct a program that enumerates all sets with this property (and does not use much space), then we are done, because the set A can be described by its ordinal number in this enumeration, which takes at most k − t + 1 bits. Let us construct such a program. It works as follows:

Enumerate all sets U that occur as the first element of some pair in \(\mathcal {D}\); i.e., enumerate (say, lexicographically) the programs that distinguish the corresponding pairs. A set U is output if the following properties hold: first, \(|\mathcal {D}_{U}| \ge 2^{t}\); second, U was not considered earlier (i.e., no program with a smaller lexicographical number distinguishes a pair from \(\mathcal {D}\) whose first element is U).

This program uses 2m + poly(n + C^m(A,B)) space and has length O(log(C^m(A) + n + m)), and hence satisfies all requirements. □

Proof of Lemma 11

It suffices to show that \(\mathcal {B}\) satisfies property (1) with probability at most 2^{−n}, because \(\mathcal {B}\) satisfies property (2) with probability at most \(\frac {1}{4}\).

For this, let us show that every part is ‘bad’ (i.e., has at least (n + k)^2 + 1 sets from \(\mathcal {B}\)) with probability at most 2^{−2n}. The probability of such an event is equal to the probability that a binomial random variable with parameters (2^k, 2^{−k}(n + 2) ln 2) exceeds (n + k)^2. To bound this, we use an easy but lengthy sequence of estimates. For w := 2^k, p := 2^{−k}(n + 2) ln 2 and v := (n + k)^2 we have

$$\sum\limits_{i=v}^{w} {{w}\choose{i}} p^{i}(1-p)^{w-i} < w \cdot {{w}\choose{v}} p^{v}(1-p)^{w-v} < w \cdot {{w}\choose{v}} p^{v} < w \frac{(wp)^{v}}{v!}. $$

The leftmost inequality follows from wp = (n + 2) ln 2 ≤ (n + k)^2 = v. Because wp = (n + 2) ln 2 < 10n, we obtain

$$w \frac{(wp)^{v}}{v!} < \frac{2^{k} (10n)^{(n+k)^{2}}}{((n+k)^{2})!} \ll 2^{-2n}. $$
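The tail estimate can be sanity-checked by an exact term-by-term computation for small parameters (an illustration only; the values of n and k below are arbitrary choices, not from the paper):

```python
import math

def binomial_tail(w: int, p: float, v: int) -> float:
    """Pr[Bin(w, p) >= v], summed term by term with exact binomial
    coefficients (math.comb) and floating-point powers of p."""
    return sum(math.comb(w, i) * p**i * (1 - p)**(w - i)
               for i in range(v, w + 1))

n, k = 4, 8
w = 2**k                            # number of trials
p = 2**-k * (n + 2) * math.log(2)   # success probability, wp = (n+2) ln 2
v = (n + k)**2                      # threshold from the lemma

tail = binomial_tail(w, p, v)
print(tail < 2**(-2 * n))  # True: the tail is far below 2^{-2n}
```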


Cite this article

Milovanov, A. On Algorithmic Statistics for Space-bounded Algorithms. Theory Comput Syst 63, 833–848 (2019). https://doi.org/10.1007/s00224-018-9845-6
