Advertisement

Algorithmic Statistics: Forty Years Later

  • Nikolay Vereshchagin
  • Alexander ShenEmail author
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10010)

Abstract

Algorithmic statistics has two different (and almost orthogonal) motivations. From the philosophical point of view, it tries to formalize how the statistics works and why some statistical models are better than others. After this notion of a “good model” is introduced, a natural question arises: it is possible that for some piece of data there is no good model? If yes, how often these bad (non-stochastic) data appear “in real life”?

Another, more technical motivation comes from algorithmic information theory. In this theory a notion of complexity of a finite object (=amount of information in this object) is introduced; it assigns to every object some number, called its algorithmic complexity (or Kolmogorov complexity). Algorithmic statistic provides a more fine-grained classification: for each finite object some curve is defined that characterizes its behavior. It turns out that several different definitions give (approximately) the same curve.

Road-map: Sect. 2 considers the notion of \((\alpha ,\beta )\)-stochasticity; Sect. 3 considers two-part descriptions and the so-called “minimal description length principle”; Sect. 4 gives one more approach: we consider the list of objects of bounded complexity and measure how far some object is from the end of the list, getting some natural class of “standard descriptions” as a by-product; finally, Sect. 5 establishes a connection between these notions and resource-bounded complexity. The rest of the paper deals with an attempts to make theory close to practice by considering restricted classes of descriptions (Sect. 6) and strong models (Sect. 7).

In this survey we try to provide an exposition of the main results in the field (including full proofs for the most important ones), as well as some historical comments. We assume that the reader is familiar with the main notions of algorithmic information (Kolmogorov complexity) theory. An exposition can be found in [42, Chaps. 1, 3, 4] or [22, Chaps. 2, 3], see also the survey [36].

A short survey of main results of algorithmic statistics was given in [41] (without proofs); see also the last chapter of the book [42].

Notes

Acknowledgments

We are grateful to several people who contributed and/or carefully read preliminary versions of this survey, in particular, to B. Bauwens, P. Gács, A. Milovanov, G. Novikov, A. Romashchenko, P. Vitányi, and to all participants of Kolmogorov seminar in Moscow State University and ESCAPE group in LIRMM. We are also grateful to an anonymous referee for correcting several mistakes.

References

  1. 1.
    Adleman, L.M.: Time, space and randomness. MIT report MIT/LCS/TM-131, March 1979Google Scholar
  2. 2.
    Antunes, L., Bauwens, B., Souto, A., Teixeira, A.: Sophistication vs. logical depth. Theory of Computing Systems. doi: 10.1007/s00224-016-9672-6
  3. 3.
    Antunes, L., Fortnow, L.: Sophistication revisited. Theory Comput. Syst. 45(1), 150–161 (2009)MathSciNetCrossRefzbMATHGoogle Scholar
  4. 4.
    Antunes, L., Fortnow, L., van Melkebeek, D.: Computational depth, In: Proceedings of the 16th IEEE Conference on Computational Complexity, pp. 266–273. IEEE, New York (2001). Journal version: Computational depth: concept and applications. Theoret. Comput. Sci. 354(3), 391–404 (2006)Google Scholar
  5. 5.
    Antunes, L., Matos, A., Souto, A., Vitányi, P.: Depth as randomness deficiency. Theory Comput. Syst. 45(4), 724–739 (2009)MathSciNetCrossRefzbMATHGoogle Scholar
  6. 6.
    Bauwens, B.: Computability in statistical hypotheses testing, and characterizations of independence and directed influences in time series using Kolmogorov complexity. Ph.D. thesis, University of Gent, May 2010Google Scholar
  7. 7.
    Bennett, C.H.: Logical depth and physical complexity. In: Herken, R. (ed.) The Universal Turing Machine: A Half-Century Survey, pp. 227–257. Oxford University Press, New York (1988)Google Scholar
  8. 8.
    Bienvenu, L., Desfontaines, D., Shen, A.: What percentage of programs halt? In: Halldórsson, M.M., Iwama, K., Kobayashi, N., Speckmann, B. (eds.) ICALP 2015. LNCS, vol. 9134, pp. 219–230. Springer, Heidelberg (2015). doi: 10.1007/978-3-662-47672-7_18 Google Scholar
  9. 9.
    Bienvenu, L., Gács, P., Hoyrup, M., Rojas, C., Shen, A.: Algorithmic tests and randomness with respect to a class of measures. Proc. Steklov Inst. Math. 274, 34–89 (2011)MathSciNetCrossRefzbMATHGoogle Scholar
  10. 10.
    Cover, T.: Kolmogorov complexity, data compression and inference. In: Skwirzynski, J.K. (ed.) The Impact of Processing Techniques on Communications. NATO ASI Series, vol. 91, pp. 23–33. Martinus Nijhoff Publishers, Dordrecht (1985). doi: 10.1007/978-94-009-5113-6_2 CrossRefGoogle Scholar
  11. 11.
    Epstein, S., Levin, L.: Sets have simple members. http://arxiv.org/abs/1107.1458, reposted as http://arxiv.org/abs/1403.4539
  12. 12.
    Gács, P.: On the relation between descriptional complexity and algorithmic probability. Theoret. Comput. Sci. 22, 71–93 (1983)MathSciNetCrossRefzbMATHGoogle Scholar
  13. 13.
    Gács, P., Tromp, J., Vitányi, P.M.B.: Algorithmic statistics. IEEE Trans. Inf. Theory 47(6), 2443–2463 (2001)MathSciNetCrossRefzbMATHGoogle Scholar
  14. 14.
    Kolmogorov, A.N.: Three approaches to the quantitative definition of information (in Russian). Prob. Inf. Trans. 1(1), 4–11 (1965). English translation published: Int. J. Comput. Math. 2, 157–168 (1968)MathSciNetGoogle Scholar
  15. 15.
    Kolmogorov, A.N.: Talk at the Information Theory Symposium in Tallinn, Estonia (then USSR) (1974) [As reported by Cover in his 1985 paper [10]]Google Scholar
  16. 16.
    Kolmogorov, A.N.: The complexity of algorithms and the objective definition of randomness. Summary of the talk presented April 16, 1974 at Moscow Mathematical Society. Uspekhi matematicheskikh nauk (Russian) 29(4[178]), 155 (1974). http://mi.mathnet.ru/rus/umn/v29/i4/p153 (A short note in Russian)
  17. 17.
    Kolmogorov, A.N.: Talk at the seminar at Moscow State University Mathematics Department (Logic Division), 26 November 1981. [The definition of \((\alpha ,\beta )\)-stochasticity was defined in this talk, and the question about the fraction of non-stochastic objects was posed.]Google Scholar
  18. 18.
    Koppel, M.: Complexity, depth and sophistication. Compl. Syst. 1, 1087–1091 (1987)MathSciNetzbMATHGoogle Scholar
  19. 19.
    Koppel, M.: Structure. In: Herken, R. (ed.) The Universal Turing Machine: A Half-Century Survey, pp. 435–452. Oxford University Press (1988)Google Scholar
  20. 20.
    Koppel, M., Atlan, H.: An almost machine-independent theory of program-length complexity, sophistication, and induction. Inf. Sci. 56(1–3), 23–33 (1991)MathSciNetCrossRefzbMATHGoogle Scholar
  21. 21.
    Levin, L.: Randomness conservation inequalities; information and independence in mathematical theories. Inf. Control 61(1), 15–37 (1984)MathSciNetCrossRefzbMATHGoogle Scholar
  22. 22.
    Li, M., Vitányi, P.M.B.: An Introduction to Kolmogorov Complexity and its Applications, 3rd edn. Springer, New York (2008)CrossRefzbMATHGoogle Scholar
  23. 23.
    Longpré, L.: Resource bounded Kolmogorov complexity, a link between computational complexity and information theory. Ph.D. Thesis, Department of Computer Science, Cornell University, TR 86–776 (1986)Google Scholar
  24. 24.
    Milovanov, A.: Some properties of antistochastic strings. In: Beklemishev, L.D., Musatov, D.V. (eds.) CSR 2015. LNCS, vol. 9139, pp. 339–349. Springer, Heidelberg (2015). doi: 10.1007/978-3-319-20297-6_22 Google Scholar
  25. 25.
    Milovanov, A.: Algorithmic statistic, prediction and machine learning. In: 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016), Leibnitz International Proceedings in Informatics (LIPIcs), vol. 47, pp. 54:1–54:13 (2016). doi: 10.4230/LIPIcs.STACS.2016.54, http://drops.dagstuhl.de/opus/volltexte/2016/5755/
  26. 26.
    Milovanov, A.: Algorithmic statistics: normal objects and universal models. In: Kulikov, A.S., Woeginger, G.J. (eds.) CSR 2016. LNCS, vol. 9691, pp. 280–293. Springer, Heidelberg (2016). doi: 10.1007/978-3-319-34171-2_20 Google Scholar
  27. 27.
    Mota, F., Aaronson, S., Antunes, L., Souto, A.: Sophistication as randomness deficiency. In: Jurgensen, H., Reis, R. (eds.) DCFS 2013. LNCS, vol. 8031, pp. 172–181. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-39310-5_17 CrossRefGoogle Scholar
  28. 28.
    Muchnik, An.A., Mezhirov, I., Shen, A., Vereshchagin, N.K.: Game interpretation of Kolmogorov complexity. https://arxiv.org/abs/1003.4712
  29. 29.
    Muchnik, An.A., Romashchenko, A.: Stability of properties of Kolmogorov complexity under relativization. Prob. Inf. Trans. 46(1), 38–61 (2010)Google Scholar
  30. 30.
    Muchnik, An.A., Semenov, A.L., Uspensky, V.A.: Mathematical metaphysics of randomness. Theoret. Comput. Sci. 207(2), 263–317 (1998)Google Scholar
  31. 31.
    Muchnik, An.A., Shen, A., Vyugin, M.: Game arguments in computability theory and algorithmic information theory. https://arxiv.org/pdf/1204.0198.pdf
  32. 32.
    de Rooij, S., Vitányi, P.M.B.: Approximating rate-distortion graphs of individual data: experiments in lossy compression and denoising. IEEE Trans. Comput. 61(3), 395–407 (2012)MathSciNetCrossRefGoogle Scholar
  33. 33.
    Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)CrossRefzbMATHGoogle Scholar
  34. 34.
    Shen, A.: The concept of \((\alpha,\beta )\)-stochasticity in the Kolmogorov sense, and its properties. Soviet Math. Dokl. 28(1), 295–299 (1983)zbMATHGoogle Scholar
  35. 35.
    Shen, A.: Discussion on Kolmogorov complexity and statistical analysis. Comput. J. 42(4), 340–342 (1999)CrossRefzbMATHGoogle Scholar
  36. 36.
    Shen, A.: Around Kolmogorov complexity: basic notions and results. In: Vovk, V., Papadopoulos, H., Gammerman, A. (eds.) Measures of Complexity: Festschrift for Alexey Chervonenkis, pp. 75–116. Springer, Switzerland (2015). doi: 10.1007/978-3-319-21852-6_7. arXiv:1504.04955 CrossRefGoogle Scholar
  37. 37.
    Solomonoff, R.: A formal theory of inductive inference. Part I. Inf. Control 7(1), 1–22 (1964)MathSciNetCrossRefzbMATHGoogle Scholar
  38. 38.
    Solomonoff, R.: A formal theory of inductive inference. Part II. Applications of the systems to various problems in induction. Inf. Control 7(2), 224–254 (1964)MathSciNetCrossRefzbMATHGoogle Scholar
  39. 39.
    Vereshchagin, N.: Algorithmic minimal sufficient statistic revisited. In: Ambos-Spies, K., Löwe, B., Merkle, W. (eds.) CiE 2009. LNCS, vol. 5635, pp. 478–487. Springer, Heidelberg (2009). doi: 10.1007/978-3-642-03073-4_49 CrossRefGoogle Scholar
  40. 40.
    Vereshchagin, N.: Algorithmic minimal sufficient statistics: a new approach. Theory Comput. Syst. 58(3), 463–481 (2016)MathSciNetCrossRefzbMATHGoogle Scholar
  41. 41.
    Vereshchagin, N., Shen, A.: Algorithmic statistics revisited. In: Vovk, V., Papadopoulos, H., Gammerman, A. (eds.) Measures of Complexity: Festschrift for Alexey Chervonenkis, pp. 235–252. Springer, Switzerland (2015). arXiv:1504.04950 CrossRefGoogle Scholar
  42. 42.
    Vereshchagin, N., Uspensky, V., Shen, A.: Kolmogorov complexity and algorithmic randomness. In: MCCME 2013, 576 pp., Moscow (2013). http://www.lirmm.fr/~ashen/kolmbook.pdf (Russian version), http://www.lirmm.fr/~ashen/kolmbook-eng.pdf (English version)
  43. 43.
    Vereshchagin, N.K., Vitányi, P.M.B.: Kolmogorov’s structure functions and model selection. IEEE Trans. Inf. Theory 50(12), 3265–3290 (2004)MathSciNetCrossRefzbMATHGoogle Scholar
  44. 44.
    Vereshchagin, N.K., Vitányi, P.M.B.: Rate distortion and denoising of individual data using Kolmogorov complexity. IEEE Trans. Inf. Theory 56(7), 3438–3454 (2010)MathSciNetCrossRefGoogle Scholar
  45. 45.
    Vitányi, P.M.B.: Meaningful information. IEEE Trans. Inf. Theory 52(10), 4617–4626 (2006). arXiv:cs/0111053 MathSciNetCrossRefzbMATHGoogle Scholar
  46. 46.
    V’yugin, V.V.: On the defect of randomness of a finite object with respect to measures with given complexity bounds. SIAM Theory Prob. Appl. 32(3), 508–512 (1987)MathSciNetCrossRefzbMATHGoogle Scholar
  47. 47.
    V’yugin, V.V.: Algorithmic complexity and stochastic properties of finite binary sequences. Comput. J. 42(4), 294–317 (1999)CrossRefzbMATHGoogle Scholar
  48. 48.
    V’yugin, V.V.: Does snooping help? Theoret. Comput. Sci. 276(1), 407–415 (2002)MathSciNetCrossRefzbMATHGoogle Scholar
  49. 49.
    Wallace, C.S., Boulton, D.M.: An information measure for classification. Comput. J. 11(2), 185–194 (1968)CrossRefzbMATHGoogle Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. 1.National Research University Higher School of EconomicsMoscowRussia
  2. 2.YandexMoscowRussia
  3. 3.Moscow State UniversityMoscowRussia
  4. 4.LIRMM, CNRS & Univ. MontpellierMontpellierFrance

Personalised recommendations