Journal of Statistical Physics, Volume 163, Issue 6, pp 1312–1338

Predictive Rate-Distortion for Infinite-Order Markov Processes

  • Sarah E. Marzen
  • James P. Crutchfield


Predictive rate-distortion analysis suffers from the curse of dimensionality: clustering arbitrarily long pasts to retain information about arbitrarily long futures requires resources that typically grow exponentially with length. The challenge is compounded for infinite-order Markov processes, since conditioning on finite sequences cannot capture all of their past dependencies. Spectral arguments confirm a popular intuition: algorithms that cluster finite-length sequences fail dramatically when the underlying process has long-range temporal correlations and can fail even for processes generated by finite-memory hidden Markov models. We circumvent the curse of dimensionality in rate-distortion analysis of finite- and infinite-order processes by casting predictive rate-distortion objective functions in terms of the forward- and reverse-time causal states of computational mechanics. Examples demonstrate that the resulting algorithms yield substantial improvements.
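The clustering procedure the abstract refers to can be made concrete with a small sketch. Below is a minimal, illustrative implementation of the standard information-bottleneck fixed-point iteration applied to length-3 past/future blocks of a toy binary Markov chain. It is not the authors' algorithm, and the transition matrix, cluster count, and inverse temperature `beta` are all arbitrary choices for illustration; it only shows the generic "soft-cluster pasts while retaining information about futures" objective that the paper's causal-state formulation is designed to improve upon.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Transition matrix of a toy binary Markov chain (illustrative only;
# the paper's examples use different processes).
T = np.array([[0.5, 0.5],
              [1.0, 0.0]])

def block_joint(L):
    """Joint distribution p(past, future) over all length-L binary blocks."""
    w, v = np.linalg.eig(T.T)                  # stationary distribution of T
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi /= pi.sum()
    blocks = list(product((0, 1), repeat=L))
    P = np.zeros((len(blocks), len(blocks)))
    for i, past in enumerate(blocks):
        p = pi[past[0]]
        for a, b in zip(past, past[1:]):
            p *= T[a, b]
        for j, fut in enumerate(blocks):
            pf = p * T[past[-1], fut[0]]
            for a, b in zip(fut, fut[1:]):
                pf *= T[a, b]
            P[i, j] = pf
    return P

def mutual_info_bits(P):
    """I(X;Y) in bits for a joint distribution given as a matrix."""
    px, py = P.sum(1), P.sum(0)
    mask = P > 0
    return float((P[mask] * np.log2(P[mask] / np.outer(px, py)[mask])).sum())

def predictive_ib(P, n_clusters=2, beta=5.0, iters=500):
    """Self-consistent information-bottleneck iteration: soft-cluster the
    pasts (rows of P) into Z so that Z stays predictive of futures (columns)."""
    p_past = P.sum(axis=1)
    p_f = np.divide(P, p_past[:, None],        # p(future | past), guarding
                    out=np.zeros_like(P),      # zero-probability pasts
                    where=p_past[:, None] > 0)
    q = rng.random((P.shape[0], n_clusters))
    q /= q.sum(axis=1, keepdims=True)          # q(z | past), random init
    for _ in range(iters):
        qz = np.maximum(p_past @ q, 1e-30)     # q(z)
        qf_z = (q * p_past[:, None]).T @ p_f / qz[:, None]   # q(future | z)
        # D_KL[ p(future|past) || q(future|z) ] for every (past, z) pair
        log_ratio = (np.log(p_f[:, None, :] + 1e-300)
                     - np.log(qf_z[None, :, :] + 1e-300))
        kl = (p_f[:, None, :] * log_ratio).sum(axis=2)
        q = qz[None, :] * np.exp(-beta * kl)   # Boltzmann-style update
        q /= q.sum(axis=1, keepdims=True)
    return q

P = block_joint(3)
q = predictive_ib(P)
rate = mutual_info_bits(q * P.sum(axis=1)[:, None])  # I(Z; past): coding cost
predictive_info = mutual_info_bits(q.T @ P)          # I(Z; future): retained
excess = mutual_info_bits(P)                         # I(past; future): ceiling
```

The curse of dimensionality the abstract describes is visible here: the joint matrix over length-L blocks has 2^L rows and columns, so this direct approach scales exponentially in L, whereas the paper's reformulation works with the (often finite) set of causal states instead.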


Keywords: Optimal causal filtering · Computational mechanics · Epsilon-machine · Causal states · Predictive rate-distortion · Information bottleneck



Acknowledgments

The authors thank C. Ellison, C. Hillar, W. Bialek, I. Nemenman, P. Riechers, and S. Still for helpful discussions. This material is based upon work supported by, or in part by, the U. S. Army Research Laboratory and the U. S. Army Research Office under contract number W911NF-13-1-0390. SM was funded by a National Science Foundation Graduate Student Research Fellowship and the U.C. Berkeley Chancellor’s Fellowship.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Physics, Redwood Center for Theoretical Neuroscience, University of California at Berkeley, Berkeley, USA
  2. Complexity Sciences Center and Department of Physics, University of California at Davis, Davis, USA
