Predictive Rate-Distortion for Infinite-Order Markov Processes

A Correction to this article was published on 13 February 2021

This article has been updated


Predictive rate-distortion analysis suffers from the curse of dimensionality: clustering arbitrarily long pasts to retain information about arbitrarily long futures requires resources that typically grow exponentially with length. The challenge is compounded for infinite-order Markov processes, since conditioning on finite sequences cannot capture all of their past dependencies. Spectral arguments confirm a popular intuition: algorithms that cluster finite-length sequences fail dramatically when the underlying process has long-range temporal correlations and can fail even for processes generated by finite-memory hidden Markov models. We circumvent the curse of dimensionality in rate-distortion analysis of finite- and infinite-order processes by casting predictive rate-distortion objective functions in terms of the forward- and reverse-time causal states of computational mechanics. Examples demonstrate that the resulting algorithms yield substantial improvements.
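As a rough illustration of the kind of objective involved, the sketch below runs the generic information bottleneck iteration of Ref. [26] on a small joint distribution. Everything specific here is an assumption made for illustration: the matrix `p_joint` is invented and merely stands in for a joint distribution over forward- and reverse-time causal states, and the function names are hypothetical. It is not the algorithm developed in the article.

```python
import numpy as np

def mutual_information(pxy):
    """Mutual information in bits of a joint distribution given as a 2-D array."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])))

def information_bottleneck(pxy, n_clusters, beta, n_iter=500, seed=0):
    """Generic information bottleneck iteration; assumes every row of pxy has mass.

    Returns (rate, retained) = (I[T;X], I[T;Y]) in bits for the soft
    clustering q(t|x) reached after n_iter self-consistent updates.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12
    px = pxy.sum(axis=1)                                   # p(x)
    py_x = pxy / px[:, None]                               # p(y|x)
    q_t_x = rng.dirichlet(np.ones(n_clusters), size=len(px))  # random q(t|x)
    for _ in range(n_iter):
        q_t = px @ q_t_x                                   # q(t)
        q_y_t = (q_t_x * px[:, None]).T @ py_x / (q_t[:, None] + eps)  # q(y|t)
        # KL[p(y|x) || q(y|t)] in nats for every pair (x, t)
        log_ratio = np.log(py_x[:, None, :] + eps) - np.log(q_y_t[None, :, :] + eps)
        kl = np.sum(py_x[:, None, :] * log_ratio, axis=2)
        # q(t|x) proportional to q(t) * exp(-beta * KL), computed stably
        logits = np.log(q_t + eps)[None, :] - beta * kl
        logits -= logits.max(axis=1, keepdims=True)
        q_t_x = np.exp(logits)
        q_t_x /= q_t_x.sum(axis=1, keepdims=True)
    q_xt = q_t_x * px[:, None]                             # joint over (x, t)
    q_ty = q_t_x.T @ pxy                                   # joint over (t, y)
    return mutual_information(q_xt), mutual_information(q_ty)

# Invented 3x3 joint distribution (rows: "forward" states, columns: "reverse" states).
p_joint = np.array([[0.30, 0.05, 0.05],
                    [0.05, 0.20, 0.05],
                    [0.05, 0.05, 0.20]])

for beta in (0.5, 2.0, 10.0):
    rate, retained = information_bottleneck(p_joint, n_clusters=3, beta=beta)
    print(f"beta={beta}: rate={rate:.4f} bits, retained={retained:.4f} bits")
```

Because the clustering acts on a small matrix over states rather than on sequences, the cost of each update does not depend on any sequence length. The retained information can never exceed the mutual information of `p_joint` itself, by the data processing inequality.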

Notes


  1. The predictive information function \(R(I_0)\) is the predictive rate-distortion function \(R(D)\) evaluated at \(D = E - I_0\), where \(E\) is the excess entropy.

  2. More precisely, each element of \(H(W^{\mathcal {A}})\) is the entropy in the next observation given that one is currently in the corresponding mixed state.

  3. These information functions are closely related to the more familiar information curves seen in Refs. [8, 9] and elsewhere, as the informational distortion is the excess entropy less the predictable information captured.
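The quantity described in note 2 can be sketched for a single mixed state as follows. This is a minimal illustration under stated assumptions: the two-state Golden Mean Process and its labeled transition matrices `T` are an assumed example, not taken from the text above, and `next_symbol_entropy` is a hypothetical helper name. A mixed state is treated here as a probability distribution over the hidden states.

```python
import numpy as np

# Assumed example: labeled transition matrices of a two-state Golden Mean Process.
# T[x][i, j] = Pr(emit symbol x and move to state j | currently in state i).
T = {
    0: np.array([[0.0, 0.5],
                 [0.0, 0.0]]),
    1: np.array([[0.5, 0.0],
                 [1.0, 0.0]]),
}

def next_symbol_entropy(mixed_state, labeled_matrices):
    """Entropy in bits of the next observation given a distribution over hidden states."""
    eta = np.asarray(mixed_state, dtype=float)
    ones = np.ones(len(eta))
    # Pr(next symbol = x | mixed state) = eta . T[x] . 1
    probs = np.array([eta @ Tx @ ones for Tx in labeled_matrices.values()])
    probs = probs[probs > 0]          # convention: 0 log 0 = 0
    return float(-np.sum(probs * np.log2(probs)))

# One entropy per mixed state; stacking these values gives a vector of the
# kind described in note 2.
for eta in ([1.0, 0.0], [0.0, 1.0], [2 / 3, 1 / 3]):
    print(eta, next_symbol_entropy(eta, T))
```

The third mixed state is the stationary distribution of the assumed example, so its value is the entropy of a single observation made with no knowledge of the past.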


References

  1. Crutchfield, J.P., Young, K.: Inferring statistical complexity. Phys. Rev. Lett. 63, 105–108 (1989)

  2. Shalizi, C.R., Crutchfield, J.P.: Computational mechanics: pattern and prediction, structure and simplicity. J. Stat. Phys. 104, 817–879 (2001)

  3. Crutchfield, J.P.: Between order and chaos. Nat. Phys. 8(January), 17–24 (2012)

  4. Crutchfield, J.P.: The calculi of emergence: computation, dynamics, and induction. Phys. D 75, 11–54 (1994)

  5. Upper, D.R.: Theory and Algorithms for Hidden Markov Models and Generalized Hidden Markov Models. PhD thesis, University of California, Berkeley. Published by University Microfilms International, Ann Arbor (1997)

  6. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)

  7. Shannon, C.E.: Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Convention Rec. Part 4, 7:325–350 (1959)

  8. Still, S., Crutchfield, J.P.: Structure or noise? Santa Fe Institute Working Paper 2007-08-020; arXiv:0708.0654 (2007)

  9. Still, S., Crutchfield, J.P., Ellison, C.J.: Optimal causal inference: estimating stored information and approximating causal architecture. Chaos 20(3), 037111 (2010)

  10. Palmer, S.E., Marre, O., Berry, M.J., Bialek, W.: Predictive information in a sensory population. Proc. Natl. Acad. Sci. USA 112(22), 6908–6913 (2015)

  11. Andrews, B.W., Iglesias, P.A.: An information-theoretic characterization of the optimal gradient sensing response of cells. PLoS Comput. Biol. 3(8), 1489–1497 (2007)

  12. Sims, C.R.: The cost of misremembering: inferring the loss function in visual working memory. J. Vis. 15(3), 2 (2015)

  13. Li, M., Vitanyi, P.M.B.: An Introduction to Kolmogorov Complexity and its Applications. Springer, New York (1993)

  14. Bialek, W., Nemenman, I., Tishby, N.: Predictability, complexity, and learning. Neural Comp. 13, 2409–2463 (2001)

  15. Bar, M.: Predictions: a universal principle in the operation of the human brain. Phil. Trans. Roy. Soc. Lond. Ser. B: Biol. Sci. 364(1521), 1181–1182 (2009)

  16. Cover, T.M., Thomas, J.A.: Elements of Information Theory, 2nd edn. Wiley, New York (2006)

  17. Ephraim, Y., Merhav, N.: Hidden Markov processes. IEEE Trans. Info. Theory 48(6), 1518–1569 (2002)

  18. Paz, A.: Introduction to Probabilistic Automata. Academic Press, New York (1971)

  19. Rabiner, L.R., Juang, B.H.: An introduction to hidden Markov models. IEEE ASSP Magazine (1986)

  20. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications. IEEE Proc. 77, 257 (1989)

  21. Lohr, W.: Properties of the statistical complexity functional and partially deterministic HMMs. Entropy 11(3), 385–401 (2009)

  22. Crutchfield, J.P., Ellison, C.J., Mahoney, J.R.: Time’s barbed arrow: Irreversibility, crypticity, and stored information. Phys. Rev. Lett. 103(9), 094101 (2009)

  23. Ellison, C.J., Mahoney, J.R., James, R.G., Crutchfield, J.P., Reichardt, J.: Information symmetries in irreversible processes. Chaos 21(3), 037107 (2011)

  24. Creutzig, F., Globerson, A., Tishby, N.: Past-future information bottleneck in dynamical systems. Phys. Rev. E 79, 041925 (2009)

  25. Gray, R.M.: Source Coding Theory. Kluwer Academic Press, Norwell (1990)

  26. Tishby, N., Pereira, F.C., Bialek, W.: The information bottleneck method. In: The 37th Annual Allerton Conference on Communication, Control, and Computing (1999)

  27. Harremoës, P., Tishby, N.: The information bottleneck revisited or how to choose a good distortion measure. In: IEEE International Symposium on Information Theory, ISIT 2007, pp. 566–570 (2007)

  28. Ellison, C.J., Mahoney, J.R., Crutchfield, J.P.: Prediction, retrodiction, and the amount of information stored in the present. J. Stat. Phys. 136(6), 1005–1034 (2009)

  29. Still, S.: Information bottleneck approach to predictive inference. Entropy 16(2), 968–989 (2014)

  30. Shalizi, C.R., Crutchfield, J.P.: Information bottlenecks, causal states, and statistical relevance bases: How to represent relevant information in memoryless transduction. Adv. Comp. Syst. 5(1), 91–95 (2002)

  31. Creutzig, F., Sprekeler, H.: Predictive coding and the slowness principle: an information-theoretic approach. Neural Comput. 20, 1026–1041 (2008)

  32. Gueguen, L., Datcu, M.: Image time-series data mining based on the information-bottleneck principle. IEEE Trans. Geo. Remote Sens. 45(4), 827–838 (2007)

  33. Gueguen, L., Le Men, C., Datcu, M.: Analysis of satellite image time series based on information bottleneck. Bayesian Inference and Maximum Entropy Methods in Science and Engineering (AIP Conference Proceedings). vol. 872, pp. 367–374 (2006)

  34. Rey, M., Roth, V.: Meta-Gaussian information bottleneck. Adv. Neural Info. Proc. Sys. 25, 1925–1933 (2012)

  35. Ara, P.M., James, R.G., Crutchfield, J.P.: The elusive present: Hidden past and future dependence and why we build models. Phys. Rev. E 93(2), 022143 (2016)

  36. Crutchfield, J.P., Riechers, P., Ellison, C.J.: Exact complexity: spectral decomposition of intrinsic computation. Phys. Lett. A 380(9–10), 998–1002 (2015)

  37. Crutchfield, J.P., Feldman, D.P.: Regularities unseen, randomness observed: Levels of entropy convergence. Chaos 13(1), 25–54 (2003)

  38. Debowski, L.: Excess entropy in natural language: present state and perspectives. Chaos 21(3), 037105 (2011)

  39. Travers, N., Crutchfield, J.P.: Infinite excess entropy processes with countable-state generators. Entropy 16, 1396–1413 (2014)

  40. Banerjee, A., Dhillon, I., Ghosh, J., Merugu, S.: An information theoretic analysis of maximum likelihood mixture estimation for exponential families. In: Proceedings of the Twenty-First International Conference on Machine Learning, p. 8. ACM (2004)

  41. Brodu, N.: Reconstruction of \(\epsilon \)-machines in predictive frameworks and decisional states. Adv. Complex Syst. 14(05), 761–794 (2011)

  42. Crutchfield, J.P., Ellison, C.J.: The past and the future in the present. SFI Working Paper 10-12-034; arXiv:1012.0356 [nlin.CD] (2014)

  43. Csiszar, I.: Information measures: a critical survey. In: Transactions of the Seventh Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, (1974), pp. 73–86. Academia (1977)

  44. Parker, A.E., Gedeon, T., Dimitrov, A.G.: Annealing and the rate distortion problem. In: Advances in Neural Information Processing Systems, pp. 969–976 (2002)

  45. Parker, A.E., Gedeon, T.: Bifurcation structure of a class of \(S_N\)-invariant constrained optimization problems. J. Dyn. Diff. Eq. 16(3), 629–678 (2004)

  46. Parker, A.E., Dimitrov, A.G., Gedeon, T.: Symmetry breaking in soft clustering decoding of neural codes. IEEE Trans. Info. Theory 56(2), 901–927 (2010)

  47. Elidan, G., Friedman, N.: The information bottleneck EM algorithm. In: Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, UAI’03, pp. 200–208. Morgan Kaufmann Publishers Inc., San Francisco (2003)

  48. Marzen, S., Crutchfield, J.P.: Informational and causal architecture of discrete-time renewal processes. Entropy 17(7), 4891–4917 (2015)

  49. Rose, K.: A mapping approach to rate-distortion computation and analysis. IEEE Trans. Info. Theory 40(6), 1939–1952 (1994)

  50. Marzen, S., Crutchfield, J.P.: Information anatomy of stochastic equilibria. Entropy 16, 4713–4748 (2014)

  51. Crutchfield, J.P., Farmer, J.D., Huberman, B.A.: Fluctuations and simple chaotic dynamics. Phys. Rep. 92, 45 (1982)

  52. James, R.G., Burke, K., Crutchfield, J.P.: Chaos forgets and remembers: measuring information creation, destruction, and storage. Phys. Lett. A 378, 2124–2127 (2014)

  53. Young, K., Crutchfield, J.P.: Fluctuation spectroscopy. Chaos Solitons Fractals 4, 5–39 (1994)

  54. Riechers, P.M., Mahoney, J.R., Aghamohammadi, C., Crutchfield, J.P.: Minimized state-complexity of quantum-encoded cryptic processes. Physical Review A (2016, in press). arXiv:1510.08186 [physics.quant-ph]

  55. Nemenman, I., Shafee, F., Bialek, W.: Entropy and inference, revisited. In: Dietterich, T.G., Becker, S., Ghahramani, Z., (eds.) Advances in Neural Information Processing Systems, vol. 14, pp. 471–478. MIT Press, Cambridge (2002)

  56. Blackwell, D.: The entropy of functions of finite-state Markov chains. vol. 28, pp. 13–20, Publishing House of the Czechoslovak Academy of Sciences, Prague, 1957. Held at Liblice near Prague from November 28 to 30 (1956)

  57. Haslinger, R., Klinkner, K.L., Shalizi, C.R.: The computational structure of spike trains. Neural Comp. 22, 121–157 (2010)

  58. Watson, R.: The Structure of Dynamic Memory in a Simple Model of Inhibitory Neural Feedback. PhD thesis, University of California, Davis (2014)

Acknowledgments


The authors thank C. Ellison, C. Hillar, W. Bialek, I. Nemenman, P. Riechers, and S. Still for helpful discussions. This material is based upon work supported by, or in part by, the U.S. Army Research Laboratory and the U.S. Army Research Office under contract number W911NF-13-1-0390. SM was funded by a National Science Foundation Graduate Student Research Fellowship and the U.C. Berkeley Chancellor’s Fellowship.

Author information

Corresponding author

Correspondence to James P. Crutchfield.

About this article

Cite this article

Marzen, S.E., Crutchfield, J.P. Predictive Rate-Distortion for Infinite-Order Markov Processes. J Stat Phys 163, 1312–1338 (2016).

Keywords


  • Optimal causal filtering
  • Computational mechanics
  • Epsilon-machine
  • Causal states
  • Predictive rate-distortion
  • Information bottleneck