Prediction and Dissipation in Nonequilibrium Molecular Sensors: Conditionally Markovian Channels Driven by Memoryful Environments

  • Original Article
  • Published:
Bulletin of Mathematical Biology

Abstract

Biological sensors must often predict their input while operating under metabolic constraints. However, determining whether a particular sensor has evolved or been designed to be accurate and efficient is challenging. This arises partly because the functional constraints are at cross purposes and partly because quantifying the prediction performance of even in silico sensors can require prohibitively long simulations, especially when highly complex environments drive sensors out of equilibrium. To circumvent these difficulties, we develop new expressions for the prediction accuracy and thermodynamic costs of the broad class of conditionally Markovian sensors subject to complex, correlated (unifilar hidden semi-Markov) environmental inputs in nonequilibrium steady state. Predictive metrics include the instantaneous memory and the total predictable information (the mutual information between present sensor state and input future), while dissipation metrics include power extracted from the environment and the nonpredictive information rate. Success in deriving these formulae relies on identifying the environment's causal states, the input's minimal sufficient statistics for prediction. Using these formulae, we study large random channels and the simplest nontrivial biological sensor model, a Hill molecule, which is characterized by its cooperativity: the number of ligands that bind simultaneously. We find that the seemingly impoverished Hill molecule can capture an order of magnitude more predictable information than large random channels.


Notes

  1. Here, when analyzing sensory information processing in biological systems, we take care to distinguish intrinsic, functional, and useful computation (Crutchfield and Young 1989; Crutchfield 1994; Crutchfield and Mitchell 1995). Intrinsic computation refers to how a physical system stores and transforms its historical information. We take functional computation as information processing in a physical device that promotes the performance of a larger, encompassing system. We take useful computation as information processing in a physical device used to achieve an external user’s goal. The first is well suited to analyzing structure in physical processes and determining if they are candidate substrates for any kind of information processing. The second is well suited for discussing biological sensors, while the third is well suited for discussing the benefits of contemporary digital computers.

  2. We only consider online or real-time computations, so that the oft-considered energy–speed–accuracy trade-off (Lan et al. 2012; Lahiri et al. 2016) reduces to an energy-accuracy trade-off.

References

  • Aghamohammdi C, Crutchfield JP (2017) Thermodynamics of random number generation. Phys Rev E 95(6):062139

  • Arnold L (2013) Random dynamical systems. Springer, New York

  • Barato AC, Hartich D, Seifert U (2014) Efficiency of cellular information processing. New J Phys 16(10):103024

  • Becker NB, Mugler A, ten Wolde PR (2015) Optimal prediction by cellular signaling networks. Phys Rev Lett 115(25):258103

  • Bennett CH (1982) The thermodynamics of computation: a review. Int J Theor Phys 21(12):905–940

  • Bialek W, Nemenman I, Tishby N (2001) Predictability, complexity, and learning. Neural Comput 13:2409–2463

  • Bo S, Del Giudice M, Celani A (2015) Thermodynamic limits to information harvesting by sensory systems. J Stat Mech Theory Exp 2015(1):P01014

  • Boyd AB, Crutchfield JP (2016) Maxwell demon dynamics: Deterministic chaos, the Szilard map, and the intelligence of thermodynamic systems. Phys Rev Lett 116:190601

  • Boyd AB, Mandal D, Crutchfield JP (2016) Leveraging environmental correlations: the thermodynamics of requisite variety. J Stat Phys 167(6):1555–1585

  • Boyd AB, Mandal D, Crutchfield JP (2017) Correlation-powered information engines and the thermodynamics of self-correction. Phys Rev E 95(1):012152

  • Boyd AB, Mandal D, Riechers PM, Crutchfield JP (2017) Transient dissipation and structural costs of physical information transduction. Phys Rev Lett 118:220602

  • Brittain RA, Jones NS, Ouldridge TE (2017) What we learn from the learning rate. J Stat Mech 2017:063502

  • Brodu N (2011) Reconstruction of \(\epsilon \)-machines in predictive frameworks and decisional states. Adv Complex Syst 14(05):761–794

  • Casas-Vázquez J, Jou D (2003) Temperature in non-equilibrium states: a review of open problems and current proposals. Rep Prog Phys 66(11):1937

  • Chapman A, Miyake A (2015) How an autonomous quantum Maxwell demon can harness correlated information. Phys Rev E 92(6):062125

  • Chklovskii DB, Koulakov AA (2004) Maps in the brain: what can we learn from them? Annu Rev Neurosci 27:369–392

  • Cover TM, Thomas JA (1991) Elements of information theory. Wiley-Interscience, New York

  • Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley-Interscience, New York

  • Creutzig F, Sprekeler H (2008) Predictive coding and the slowness principle: an information-theoretic approach. Neural Comput 20(4):1026–1041

  • Creutzig F, Globerson A, Tishby N (2009) Past-future information bottleneck in dynamical systems. Phys Rev E 79(4):041925

  • Crutchfield JP (1994) The calculi of emergence: computation, dynamics, and induction. Phys D 75:11–54

  • Crutchfield JP, Mitchell M (1995) The evolution of emergent computation. Proc Natl Acad Sci 92:10742–10746

  • Crutchfield JP, Young K (1989) Inferring statistical complexity. Phys Rev Lett 63:105–108

  • Crutchfield JP, Ellison CJ, Mahoney JR (2009) Time’s barbed arrow: irreversibility, crypticity, and stored information. Phys Rev Lett 103(9):094101

  • Das SG, Rao M, Iyengar G (2017) Universal lower bound on the free-energy cost of molecular measurements. Phys Rev E 95(6):062410

  • Deffner S, Jarzynski C (2013) Information processing and the second law of thermodynamics: an inclusive Hamiltonian approach. Phys Rev X 3(4):041003

  • Gillespie DT (1976) A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys 22(4):403–434

  • Goldt S, Seifert U (2017) Stochastic thermodynamics of learning. Phys Rev Lett 118(1):010601

  • Govern CC, ten Wolde PR (2014) Energy dissipation and noise correlations in biochemical sensing. Phys Rev Lett 113(25):258102

  • Hartich D, Barato AC, Seifert U (2014) Stochastic thermodynamics of bipartite systems: transfer entropy inequalities and a Maxwell’s demon interpretation. J Stat Mech Theory Exp 2014(2):P02016

  • Hartich D, Barato AC, Seifert U (2016) Sensory capacity: an information theoretical measure of the performance of a sensor. Phys Rev E 93(2):022116

  • Hasenstaub A, Otte S, Callaway E, Sejnowski TJ (2010) Metabolic cost as a unifying principle governing neuronal biophysics. Proc Natl Acad Sci USA 107(27):12329–12334

  • Hinczewski M, Thirumalai D (2014) Cellular signaling networks function as generalized Wiener-Kolmogorov filters to suppress noise. Phys Rev X 4(4):041017

  • Horowitz JM, Esposito M (2014) Thermodynamics with continuous information flow. Phys Rev X 4:031015

  • Horowitz JM, Sagawa T, Parrondo JMR (2013) Imitating chemical motors with optimal information motors. Phys Rev Lett 111(1):010602

  • Ito S, Sagawa T (2013) Information thermodynamics on causal networks. Phys Rev Lett 111(18):180603

  • Ito S, Sagawa T (2015) Maxwell’s demon in biochemical signal transduction with feedback loop. Nat Commun 6:7498

  • Izhikevich EM (2007) Dynamical systems in neuroscience. MIT Press, Cambridge

  • Jaeger H (2001) Short Term Memory in Echo State Networks, vol 5. GMD-Forschungszentrum Informationstechnik

  • James RG, Ellison CJ, Crutchfield JP (2011) Anatomy of a bit: information in a time series observation. CHAOS 21(3):037109

  • James RG, Barnett N, Crutchfield JP (2016) Information flows? A critique of transfer entropies. Phys Rev Lett 116(23):238701

  • Lahiri S, Sohl-Dickstein J, Ganguli S (2016) A universal tradeoff between power, precision and speed in physical communication. arXiv:1603.07758

  • Lan G, Sartori P, Neumann S, Sourjik V, Tu Y (2012) The energy-speed-accuracy trade-off in sensory adaptation. Nat Phys 8(5):422–428

  • Landauer R (1961) Irreversibility and heat generation in the computing process. IBM J Res Dev 5(3):183–191

  • Lang AH, Fisher CK, Mora T, Mehta P (2014) Thermodynamics of statistical inference by cells. Phys Rev Lett 113(14):148103

  • Little DY, Sommer FT (2014) Learning and exploration in action-perception loops. Closing the loop around neural systems, p 295

  • Littman ML, Sutton RS, Singh SP (2001) Predictive representations of state. In: NIPS, vol 14, pp 1555–1561

  • Löhr W (2010) Models of discrete-time stochastic processes and associated complexity measures. Ph.D. thesis, University of Leipzig

  • Löhr W (2012) Predictive models and generative complexity. J Syst Sci Complex 25:30–45

  • Mancini F, Marsili M, Walczak AM (2016) Trade-offs in delayed information transmission in biochemical networks. J Stat Phys 162(5):1088–1129

  • Mandal D, Jarzynski C (2012) Work and information processing in a solvable model of Maxwell’s demon. Proc Natl Acad Sci USA 109(29):11641–11645

  • Martins BMC, Swain PS (2011) Trade-offs and constraints in allosteric sensing. PLoS Comput Biol 7(11):e1002261

  • Marzen SE (2017) Difference between memory and prediction in linear recurrent networks. Phys Rev E 96(3):032308

  • Marzen S (2018) Infinitely large, randomly wired sensors cannot predict their input unless they are close to deterministic. PLoS One

  • Marzen SE, Crutchfield JP (2016) Predictive rate-distortion for infinite-order Markov processes. J Stat Phys 163(6):1312–1338

  • Marzen SE, Crutchfield JP (2017a) Structure and randomness of continuous-time discrete-event processes. J Stat Phys 169(2):303–315

  • Marzen SE, Crutchfield JP (2017b) Nearly maximally predictive features and their dimensions. Phys Rev E 95(5):051301(R)

  • Marzen SE, Crutchfield JP (2018) Optimized bacteria are environmental prediction engines. Phys Rev E 98:012408

  • Marzen SE, DeDeo S (2016) Weak universality in sensory tradeoffs. Phys Rev E 94(6):060101

  • Marzen S, Garcia HG, Phillips R (2013) Statistical mechanics of Monod–Wyman–Changeux (MWC) models. J Mol Biol 425(9):1433–1460

  • Maxwell JC (1888) Theory of heat, 9th edn. Longmans Green and Co, London

  • McGrath T, Jones NS, ten Wolde PR, Ouldridge TE (2017) Biochemical machines for the interconversion of mutual information and work. Phys Rev Lett 118(2):028101

  • Mehta P, Schwab DJ (2012) Energetic costs of cellular computation. Proc Natl Acad Sci USA 109(44):17978–17982

  • Nemenman I, Shafee F, Bialek W (2002) Entropy and inference, revisited. In: Advances in neural information processing systems, pp 471–478

  • Palmer SE, Marre O, Berry MJ, Bialek W (2015) Predictive information in a sensory population. Proc Natl Acad Sci USA 112(22):6908–6913

  • Parrondo JMR, Horowitz JM, Sagawa T (2015) Thermodynamics of information. Nat Phys 11(2):131–139

  • Pfau D, Bartlett N, Wood F (2011) Probabilistic deterministic infinite automata. In: Advances in neural information processing systems, MIT Press, pp 1930–1938

  • Rabiner LR (1989) A tutorial on hidden Markov models and selected applications. IEEE Proc 77:257

  • Rao RPN, Ballard DH (1999) Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci 2(1):79–87

  • Sartori P, Granger L, Lee Fan CF, Horowitz JM (2014) Thermodynamic costs of information processing in sensory adaptation. PLoS Comput Biol 10(12):e1003974

  • Shalizi CR, Crutchfield JP (2001) Computational mechanics: pattern and prediction, structure and simplicity. J Stat Phys 104:817–879

  • Spinney RE, Lizier JT, Prokopenko M (2018) Entropy balance and information processing in bipartite and nonbipartite composite systems. Phys Rev E 98(3):032141

  • Still S (2009) Information-theoretic approach to interactive learning. EuroPhys Lett 85:28005

  • Still S, Crutchfield JP, Ellison CJ (2010) Optimal causal inference: estimating stored information and approximating causal architecture. CHAOS 20(3):037111

  • Still S, Sivak DA, Bell AJ, Crooks GE (2012) Thermodynamics of prediction. Phys Rev Lett 109:120604

  • Strelioff CC, Crutchfield JP (2014) Bayesian structural inference for hidden processes. Phys Rev E 89:042119

  • Strogatz SH (1994) Nonlinear dynamics and chaos: with applications to physics, biology, chemistry, and engineering. Addison-Wesley, Reading

  • Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge

  • Szilard L (1929) On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings. Z Phys 53:840–856

  • Tishby N, Polani D (2011) Information theory of decisions and actions. Perception-action cycle. Springer, New York, pp 601–636

  • Tkačik G, Walczak AM, Bialek W (2009) Optimizing information flow in small genetic networks. Phys Rev E 80(3):031920

  • Toyabe S, Sagawa T, Ueda M, Muneyuki E, Sano M (2010) Experimental demonstration of information-to-energy conversion and validation of the generalized Jarzynski equality. Nat Phys 6(12):988

  • Travers N, Crutchfield JP (2014) Equivalence of history and generator \(\epsilon \)-machines. Phys Rev E, page in press, SFI Working Paper 11-11-051; arXiv:1111.4500 [math.PR]

  • Van den Broeck C, Esposito M (2015) Ensemble and trajectory thermodynamics: a brief introduction. Phys A Stat Mech Appl 418:6–16

  • Vestergaard CL, Génois M (2015) Temporal Gillespie algorithm: fast simulation of contagion processes on time-varying networks. PLoS Comput Biol 11(10):e1004579

  • Victor JD (2002) Binless strategies for estimation of information from neural data. Phys Rev E 66(5):051903

  • Walczak AM, Tkačik G, Bialek W (2010) Optimizing information flow in small genetic networks II. Feed-forward interactions. Phys Rev E 81(4):041905

  • Yeung RW (1991) A new outlook on Shannon’s information measures. IEEE Trans Info Theory 37(3):466–474


Acknowledgements

We thank N. Ay, A. Bell, W. Bialek, S. Dedeo, C. Hillar, I. Nemenman, and S. Still for useful conversations and the Santa Fe Institute and the Telluride Science Research Center for their hospitality during visits, where JPC is an External Faculty member and a Board Member, respectively. This material is based upon work supported by, or in part by, the John Templeton Foundation grant 52095, the Foundational Questions Institute grant FQXi-RFP-1609, the U. S. Army Research Laboratory and the U. S. Army Research Office under contract W911NF-13-1-0390. S.E.M. was funded by an MIT Physics of Living Systems Fellowship and AFOSR Grant FA9550-19-1-0411.

Author information

Correspondence to James P. Crutchfield.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Revisiting the Thermodynamics of Prediction

For completeness, we review the derivation of Eq. (6). Let \( {x} _t\) represent the input at time t, \( y _t\) represent the sensor state at time t, and \(E( {x} , y )\) the system’s energy function for the reservoir of interest. We assume constant temperature. The system’s temperature-normalized nonequilibrium free energy \(F_{\hbox {neq}}\) is given by:

$$\begin{aligned} \beta F_{\hbox {neq}}[p( y | {x} )] = \beta \langle E( {x} , y )\rangle _{p( y | {x} )} - {\text {H}}[ Y | {X} = {x} ]. \end{aligned}$$
(A1)

Even if this is not a valid expression for nonequilibrium free energy, the validity of Still et al. (2012)’s derivation rests only on this expression being a Lyapunov function for the dynamics. Intuitively, this corresponds to an assumption that the system reduces its nonequilibrium free energy when the sensor thermalizes to its attached thermal bath. [Accordingly, the \(\beta \) in the above expression refers to the temperature to which the sensor thermalizes when the environment is held fixed, indirectly circumventing the difficulty of defining a nonequilibrium temperature (Casas-Vázquez and Jou 2003).] If so, then:

$$\begin{aligned} \beta F_{\hbox {neq}}[p( y _t| {x} _{t+\Delta t})] \ge \beta F_{\hbox {neq}}[p( y _{t+\Delta t}| {x} _{t+\Delta t})], \end{aligned}$$

giving:

$$\begin{aligned} 0&\le \beta F_{\hbox {neq}}[p( y _t| {x} _{t+\Delta t})] -\beta F_{\hbox {neq}}[p( y _{t+\Delta t}| {x} _{t+\Delta t})] \\&= \left( \beta \langle E( {x} _{t+\Delta t}, y _t)\rangle _{p( y _t| {x} _{t+\Delta t})} - {\text {H}}[ Y _t| {X} _{t+\Delta t}= {x} _{t+\Delta t}]\right) \\&\quad - \left( \beta \langle E( {x} _{t+\Delta t}, y _{t+\Delta t})\rangle _{p( y _{t+\Delta t}| {x} _{t+\Delta t})} - {\text {H}}[ Y _{t+\Delta t}| {X} _{t+\Delta t}= {x} _{t+\Delta t}]\right) . \end{aligned}$$

Dividing by \(\Delta t\) and taking \(\Delta t\rightarrow 0\) gives:

$$\begin{aligned} 0&\le \beta \lim _{\Delta t\rightarrow 0} \frac{\langle E( {x} _{t+\Delta t},y_t)\rangle _{p(y_t| {x} _{t+\Delta t})} - \langle E( {x} _{t+\Delta t},y_{t+\Delta t}) \rangle _{p(y_{t+\Delta t}| {x} _{t+\Delta t})}}{\Delta t} \\&\quad + \lim _{\Delta t\rightarrow 0} \frac{{\text {H}}[ Y _{t+\Delta t}| {X} _{t+\Delta t}= {x} _{t+\Delta t}] - {\text {H}}[ Y _t| {X} _{t+\Delta t}= {x} _{t+\Delta t}]}{\Delta t} . \end{aligned}$$

Finally, we average over possible environmental realizations, equivalent in nonequilibrium steady states (NESSs) to averages over time, to find:

$$\begin{aligned} 0&\le \beta \lim _{\Delta t\rightarrow 0} \frac{\langle E( {x} _{t+\Delta t},y_t)\rangle - \langle E( {x} _{t+\Delta t},y_{t+\Delta t}) \rangle }{\Delta t}\\&\quad + \lim _{\Delta t\rightarrow 0} \frac{{\text {H}}[ Y _{t+\Delta t}| {X} _{t+\Delta t}] - {\text {H}}[ Y _t| {X} _{t+\Delta t}]}{\Delta t}. \end{aligned}$$

The former term could be called \(-\dot{Q}\)—the negative of the sensor’s heat dissipation rate into the reservoir of interest—and the latter \(\dot{{\text {I}}}_\text {lost}\)—the rate of lost information. And so:

$$\begin{aligned} 0&\le -\beta \dot{Q} + \dot{{\text {I}}}_\text {lost}. \end{aligned}$$
(A2)

This is valid even outside of NESSs. In a NESS, however, we can invoke stationarity, concluding that:

$$\begin{aligned} \dot{Q}=-P, \end{aligned}$$
(A3)

where P is the part of the power extracted by the sensor from the environment that is dissipated as heat into the reservoir of interest. Furthermore, \(\dot{{\text {I}}}_\text {lost}\) reduces to the negative of the nonpredictive information rate, since \({\text {H}}[ Y _{t+\Delta t}| {X} _{t+\Delta t}] = {\text {H}}[ Y _{t}| {X} _{t}]\), giving:

$$\begin{aligned} \dot{{\text {I}}}_\text {lost}&= \lim _{\Delta t\rightarrow 0} \frac{{\text {H}}[ Y _t| {X} _t]-{\text {H}}[ Y _t| {X} _{t+\Delta t}]}{\Delta t}. \end{aligned}$$

Calling on a standard information theory identity (Cover and Thomas 2006)—\({\text {H}}[U|V] = {\text {H}}[U]-{\text {I}}[U;V]\)—leads to:

$$\begin{aligned} \dot{{\text {I}}}_\text {lost}&= \lim _{\Delta t\rightarrow 0} \frac{{\text {I}}[ Y _t; {X} _{t+\Delta t}]-{\text {I}}[ Y _t; {X} _t]}{\Delta t}. \end{aligned}$$

Up to sign, we recognize this as the continuous-time version of the nonpredictive information rate \(\dot{{\text {I}}}_\text {np}\), also called the learning rate. Hence, the nonpredictive information rate is the increase in unpredictability of sensor state \( Y _t\) given a slightly delayed environmental state:

$$\begin{aligned} \dot{{\text {I}}}_\text {np}&= \lim _{\Delta t\rightarrow 0} \frac{{\text {H}}[ Y _t| {X} _{t+\Delta t}]-{\text {H}}[ Y _t| {X} _t]}{\Delta t}. \end{aligned}$$
(A4)

In a NESS, then:

$$\begin{aligned} \dot{{\text {I}}}_\text {lost}&= - \dot{{\text {I}}}_\text {np}. \end{aligned}$$
(A5)

Outside of NESSs, these terms are augmented by the time derivative \(d {\text {H}}[ Y _t| {X} _t] / \hbox {d}t\) of the conditional entropy. This leads to the addition of a Landauer-erasure information term (Landauer 1961) when integrated.

One of Still et al. (2012)’s main results follows directly from Eqs. (A2), (A3), and (A5):

$$\begin{aligned} \dot{{\text {I}}}_\text {np}= \lim _{\Delta t\rightarrow 0} \frac{{\text {I}}[ Y _t; {X} _t] - {\text {I}}[ Y _t; {X} _{t+\Delta t}]}{\Delta t} \le \beta P. \end{aligned}$$
(A6)

In contrast to Still et al. (2012)’s implication, this is true only in a NESS and Eq. (A2) should be used otherwise.
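To make the left-hand side of Eq. (A6) concrete, the sketch below (an illustration only, not code accompanying the paper) estimates \(\dot{{\text {I}}}_\text {np}\) by finite differences for a two-state Markov environment driving a two-state sensor. All rate constants, state labels, and helper names are our own placeholder choices, and the bound \(\beta P\) is not evaluated, since that would require specifying an energy function.

```python
# Finite-difference estimate of the nonpredictive information rate of Eq. (A6)
# for a toy model: a two-state Markov environment X driving a two-state sensor Y.
# All rate constants are illustrative placeholders.
import numpy as np
from scipy.linalg import expm

gamma = 1.0                       # environment flip rate (assumed)
k_on = {0: 0.2, 1: 5.0}           # sensor 0 -> 1 rate given environment x (assumed)
k_off = {0: 5.0, 1: 0.2}          # sensor 1 -> 0 rate given environment x (assumed)

# Composite generator over states (x, y); G[j, i] is the rate from state i to state j.
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
G = np.zeros((4, 4))
for i, (x, y) in enumerate(states):
    for j, (xp, yp) in enumerate(states):
        if (xp != x) and (yp == y):
            G[j, i] = gamma                               # environment flips, sensor fixed
        elif (xp == x) and (yp != y):
            G[j, i] = k_on[x] if y == 0 else k_off[x]     # sensor flips at x-dependent rate
    G[i, i] = -G[:, i].sum()                              # columns sum to zero

# Stationary joint distribution: the null vector of G.
w, V = np.linalg.eig(G)
pi = np.real(V[:, np.argmin(np.abs(w))])
pi /= pi.sum()

def mutual_info(p):
    """I[X;Y] in nats for a joint table p[x, y]."""
    px, py = p.sum(axis=1), p.sum(axis=0)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.outer(px, py)[mask])))

p_now = pi.reshape(2, 2)                                  # p(x_t, y_t)
dt = 1e-4
T = expm(G * dt)                                          # T[j, i] = Pr(state j at t+dt | state i at t)
p_future = np.zeros((2, 2))                               # p(x_{t+dt}, y_t)
for i, (x, y) in enumerate(states):
    for j, (xp, yp) in enumerate(states):
        p_future[xp, y] += T[j, i] * pi[i]

I_np = (mutual_info(p_now) - mutual_info(p_future)) / dt
print("estimated nonpredictive information rate:", I_np, "nats per unit time")
```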

Differences in presentation between the derivation here and that of Still et al. (2012) come from the difference between discrete- and continuous-time formulations. To make this clear, we present a continuous-time formulation of the same result, following Horowitz and Esposito (2014). We start from \(\beta F_{\hbox {neq}}[p(y_{t'}|x_t)]\) being a Lyapunov function in \(t'\):

$$\begin{aligned} 0&\ge \beta \frac{\partial F_{\hbox {neq}}[p(y_{t'}|x_t)]}{\partial t'} \\&= \beta \frac{\partial }{\partial t'} \left\langle E(x_t,y_{t'})\right\rangle _{p(y_{t'}|x_t)}\Big |_{t'=t} - \frac{\partial }{\partial t'} {\text {H}}[Y_{t'}|X_t=x_t]|_{t'=t} \\&= \beta \left\langle \frac{\partial E(x_t,y_t)}{\partial y_t} \dot{y}_t \right\rangle _{p(y_{t'}|x_t)} -\frac{\partial }{\partial t'} {\text {H}}[Y_{t'}|X_t=x_t]|_{t'=t}. \end{aligned}$$

Next, as before, we average over protocols (or, equivalently in NESS, over time) to find:

$$\begin{aligned} 0&\ge \beta \left\langle \frac{\partial E(x_t,y_t)}{\partial y_t} \dot{y}_t \right\rangle _{p(x_t,y_{t})} -\frac{\partial }{\partial t'} {\text {H}}[Y_{t'}|X_t]|_{t'=t} \end{aligned}$$

We then recognize \(\beta \left\langle \frac{\partial E(x_t,y_t)}{\partial y_t} \dot{y}_t\right\rangle _{p(x_t,y_{t})}\) as the temperature-normalized rate of heat dissipation \(\beta \dot{Q}\), so that:

$$\begin{aligned} \beta \dot{Q} \le \frac{\partial }{\partial t'} {\text {H}}[Y_{t'}|X_t]|_{t'=t}. \end{aligned}$$

The quantity on the right-hand side is simply the rate \(\dot{{\text {I}}}_{\hbox {lost}}\) of information loss, defined earlier. In NESS, \(d \langle E\rangle / \hbox {d}t\) and \(d {\text {H}}[Y_t|X_t] / \hbox {d}t\) vanish. As a result, \(\beta \dot{Q}+\beta P = 0\) and:

$$\begin{aligned} \frac{\partial }{\partial t'} {\text {H}}[Y_{t'}|X_t]|_{t'=t} = -\frac{\partial }{\partial t'} {\text {H}}[Y_t|X_{t'}]|_{t'=t}, \end{aligned}$$

giving:

$$\begin{aligned} \beta P \ge \frac{\partial }{\partial t'} {\text {H}}[Y_t|X_{t'}]|_{t'=t}. \end{aligned}$$
(A7)

We recognize this as the continuous-time formulation of Eq. (A4). Again invoking stationarity, \(d {\text {H}}[X_t] / \hbox {d}t\) vanishes and so:

$$\begin{aligned} \beta P \ge -\frac{\partial }{\partial t'} {\text {I}}[X_{t'};Y_t] \vert _{t'=t}, \end{aligned}$$
(A8)

the continuous-time formulation of Eq. (A6). We have, in Eqs. (A4), (A6), (A7), and (A8), four equivalent definitions for the nonpredictive information rate in the NESS limit.
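For reference, these four NESS-equivalent expressions and the bound of Eq. (A6) can be collected in a single display:

$$\begin{aligned} \dot{{\text {I}}}_\text {np}&= \lim _{\Delta t\rightarrow 0} \frac{{\text {H}}[ Y _t| {X} _{t+\Delta t}]-{\text {H}}[ Y _t| {X} _t]}{\Delta t} = \lim _{\Delta t\rightarrow 0} \frac{{\text {I}}[ Y _t; {X} _t]-{\text {I}}[ Y _t; {X} _{t+\Delta t}]}{\Delta t} \\&= \frac{\partial }{\partial t'} {\text {H}}[ Y _t| {X} _{t'}]\Big |_{t'=t} = -\frac{\partial }{\partial t'} {\text {I}}[ {X} _{t'}; Y _t]\Big |_{t'=t} \le \beta P . \end{aligned}$$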

Closed-form Expressions for Unifilar Hidden Semi-Markov Environments

To find \(\rho ( \sigma ^+, y )\), we start with the following:

$$\begin{aligned} \Pr ( \mathcal {S} ^+_{t+\Delta t}&=(g, {x} ,\tau ), Y _{t+\Delta t}= y ) \nonumber \\&= \sum _{g', {x} ',\tau ', y '} \Pr ( \mathcal {S} ^+_{t+\Delta t}=(g, {x} ,\tau ), Y _{t+\Delta t}= y | \mathcal {S} ^+_t=(g', {x} ',\tau '), Y _t= y ') \nonumber \\&\quad \times \Pr ( \mathcal {S} ^+_t=(g', {x} ',\tau '), Y _t= y '). \end{aligned}$$
(B1)

We decompose the transition probability using the lack of feedback as:

$$\begin{aligned} \Pr ( \mathcal {S} ^+_{t+\Delta t}&=(g, {x} ,\tau ), Y _{t+\Delta t}= y | \mathcal {S} ^+_t=(g', {x} ',\tau '), Y _t= y ') \\&= \Pr ( \mathcal {S} ^+_{t+\Delta t}=(g, {x} ,\tau )| \mathcal {S} ^+_t=(g', {x} ',\tau ')) \\&\quad \times \Pr ( Y _{t+\Delta t}= y | \mathcal {S} ^+_t=(g', {x} ',\tau '), Y _t= y '). \end{aligned}$$

From the setup, we have:

$$\begin{aligned} \Pr ( Y _{t+\Delta t}= y | \mathcal {S} ^+_t=(g', {x} ',\tau '), Y _t= y ') = {\left\{ \begin{array}{ll} k_{ y '\rightarrow y }( {x} ')\Delta t &{} y \ne y ' \\ 1-k_{ y '\rightarrow y '}( {x} ')\Delta t &{} y = y ', \end{array}\right. } \end{aligned}$$

with corrections of \(O(\Delta t^2)\).

Now split this into two cases. As long as \(\tau > \Delta t\), so that \( {x} = {x} '\), we have:

$$\begin{aligned} \Pr ( \mathcal {S} ^+_{t+\Delta t}=(g, {x} ,\tau )| \mathcal {S} ^+_t=(g', {x} ',\tau ')) = \frac{\Phi _{g}(\tau )}{\Phi _{g}(\tau ')} \delta (\tau -(\tau '+\Delta t)) \delta _{ {x} , {x} '} \delta _{g,g'}. \end{aligned}$$

Then, Eq. (B1) reduces to:

$$\begin{aligned} \Pr ( \mathcal {S} ^+_{t+\Delta t}=(g, {x} ,\tau )&, Y _{t+\Delta t}= y ) \nonumber \\&= \sum _{ y '} \Pr ( \mathcal {S} ^+_{t+\Delta t}=(g, {x} ,\tau )| \mathcal {S} ^+_t=(g, {x} ,\tau -\Delta t))\nonumber \\&\quad \times \Pr ( Y _{t+\Delta t}= y | \mathcal {S} ^+_t=(g, {x} ,\tau -\Delta t), Y _t= y ') \nonumber \\&\quad \times \Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau -\Delta t), Y _t= y ') \nonumber \\&= \sum _{ y '\ne y }\Pr ( \mathcal {S} ^+_{t+\Delta t}=(g, {x} ,\tau )| \mathcal {S} ^+_t=(g, {x} ,\tau -\Delta t))\nonumber \\&\quad \times \Pr ( Y _{t+\Delta t}= y | \mathcal {S} ^+_t=(g, {x} ,\tau -\Delta t), Y _t= y ') \nonumber \\&\quad \times \Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau -\Delta t), Y _t= y ') \nonumber \\&\quad + \Pr ( \mathcal {S} ^+_{t+\Delta t}=(g, {x} ,\tau )| \mathcal {S} ^+_t=(g, {x} ,\tau -\Delta t))\nonumber \\&\quad \times \Pr ( Y _{t+\Delta t}= y | \mathcal {S} ^+_t=(g, {x} ,\tau -\Delta t), Y _t= y ) \nonumber \\&\quad \times \Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau -\Delta t), Y _t= y ) \nonumber \\&= \sum _{ y '\ne y } \frac{\Phi _{g}(\tau )}{\Phi _{g}(\tau -\Delta t)} k_{ y '\rightarrow y }( {x} )\nonumber \\&\quad \times \Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau -\Delta t), Y _t= y ') \Delta t \nonumber \\&\quad + \frac{\Phi _{g}(\tau )}{\Phi _{g}(\tau -\Delta t)} \left( 1-k_{ y \rightarrow y }( {x} )\Delta t\right) \nonumber \\&\quad \times \Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau -\Delta t), Y _t= y ) ~, \end{aligned}$$
(B2)

plus terms of \(O(\Delta t^2)\). We Taylor expand \(\Phi _{g}(\tau +\Delta t) = \Phi _{g}(\tau ) - \phi _{g}(\tau )\Delta t\) to find:

$$\begin{aligned} \frac{\Phi _{g}(\tau )}{\Phi _{g}(\tau -\Delta t)} = 1-\frac{\phi _{g}(\tau )}{\Phi _{g}(\tau )}\Delta t , \end{aligned}$$

plus terms of \(O(\Delta t^2)\). And, similarly, assuming differentiability, we write:

$$\begin{aligned}&\Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau -\Delta t), Y _t= y ')\\&\quad = \Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau ), Y _t= y ') - \frac{d}{\hbox {d}\tau } \Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau ), Y _t= y ') \Delta t, \end{aligned}$$

plus terms of \(O(\Delta t^2)\). Substitution into Eq. (B2) then gives:

$$\begin{aligned}&\Pr ( \mathcal {S} ^+_{t+\Delta t}=(g, {x} ,\tau ), Y _{t+\Delta t}= y )\\&= \left( \sum _{ y '\ne y } k_{ y '\rightarrow y }( {x} ) \Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau ), Y _t= y ') \right) \Delta t + \Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau ), Y _t= y ) \\&\qquad - \frac{d\Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau ), Y _t= y )}{\hbox {d}\tau }\Delta t - \frac{\phi _{g}(\tau )}{\Phi _{g}(\tau )} \Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau ), Y _t= y ) \Delta t \\&\qquad - k_{ y \rightarrow y }( {x} )\Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau ), Y _t= y ) \Delta t, \end{aligned}$$

plus terms of \(O(\Delta t^2)\). For notational ease, we denote:

$$\begin{aligned} \rho ((g, {x} ,\tau ), y ) := \Pr ( \mathcal {S} ^+_t=(g, {x} ,\tau ), Y _t= y ), \end{aligned}$$

which is equal to \(\Pr ( \mathcal {S} ^+_{t+\Delta t}=(g, {x} ,\tau ), Y _{t+\Delta t}= y )\) since we assumed the system is in a NESS. Then we have:

$$\begin{aligned} \rho ((g, {x} ,\tau ), y )&= \left( \sum _{ y '\ne y } k_{ y '\rightarrow y }( {x} ) \rho ((g, {x} ,\tau ), y ') \right) \Delta t + \rho ((g, {x} ,\tau ), y ) - \frac{\hbox {d} \rho ((g, {x} ,\tau ), y )}{\hbox {d}\tau }\Delta t \\&\qquad - \frac{\phi _{g}(\tau )}{\Phi _g(\tau )} \rho ((g, {x} ,\tau ), y ) \Delta t - k_{ y \rightarrow y }( {x} ) \rho ((g, {x} ,\tau ), y )\Delta t \end{aligned}$$

plus corrections of \(O(\Delta t^2)\). Equating the coefficient of the \(O(\Delta t)\) term to zero, we are left with:

$$\begin{aligned} \frac{d\rho ((g, {x} ,\tau ), y )}{d \tau }&= \sum _{ y '\ne y } k_{ y '\rightarrow y }( {x} ) \rho ((g, {x} ,\tau ), y ') - \frac{\phi _g(\tau )}{\Phi _{g}(\tau )} \rho ((g, {x} ,\tau ), y )\nonumber \\&\quad - k_{ y \rightarrow y }( {x} ) \rho ((g, {x} ,\tau ), y ). \end{aligned}$$
(B3)

Our task is simplified if we separate:

$$\begin{aligned} \rho ((g, {x} ,\tau ), y ) = p( y |g, {x} ,\tau ) \rho (g, {x} ,\tau ) \end{aligned}$$

and if we recall that:

$$\begin{aligned} \rho (g, {x} ,\tau ) = \mu _{g} \Phi _{g}(\tau ) p(g) p( {x} |g). \end{aligned}$$

These give:

$$\begin{aligned} \frac{\hbox {d}\rho (g, {x} ,\tau )}{\hbox {d}\tau } = -\mu _{g} \phi _{g}(\tau ) p(g) p( {x} |g). \end{aligned}$$
(B4)

Plugging Eq. (B4) into Eq. (B3) yields:

$$\begin{aligned} \frac{\hbox {d}p( y |g, {x} ,\tau )}{\hbox {d}\tau } \rho (g, {x} ,\tau )&- \mu _{g} \phi _{g}(\tau )p(g)p( {x} |g) p( y |g, {x} ,\tau ) \\&= \sum _{ y '\ne y } k_{ y '\rightarrow y }( {x} ) \rho (g, {x} ,\tau ) p( y '|g, {x} ,\tau )\\&\quad - \frac{\phi _{g}(\tau )}{\Phi _{g}(\tau )} \rho (g, {x} ,\tau ) p( y |g, {x} ,\tau )\\&\quad - k_{ y \rightarrow y }( {x} ) \rho (g, {x} ,\tau ) p( y |g, {x} ,\tau ), \end{aligned}$$

where we note that:

$$\begin{aligned} \mu _{g} \phi _{g}(\tau )p(g)p( {x} |g) p( y |g, {x} ,\tau ) = \frac{\phi _{g}(\tau )}{\Phi _{g}(\tau )} \rho (g, {x} ,\tau ) p( y |g, {x} ,\tau ). \end{aligned}$$

Hence, we are left with:

$$\begin{aligned} \frac{\hbox {d}p( y |g, {x} ,\tau )}{\hbox {d}\tau }&= \sum _{ y '\ne y } k_{ y '\rightarrow y }( {x} ) p( y '|g, {x} ,\tau ) - k_{ y \rightarrow y }( {x} ) p( y |g, {x} ,\tau ). \end{aligned}$$

We can summarize this ordinary differential equation in matrix-vector notation as follows. Let \(\mathbf {v}(g, {x} ,\tau )\) be the vector:

$$\begin{aligned} \mathbf {v}(g, {x} ,\tau ) := \begin{pmatrix} p( y _1|g, {x} ,\tau ) \\ \vdots \\ p( y _{|\mathcal {Y}|}|g, {x} ,\tau ). \end{pmatrix} \end{aligned}$$

We have:

$$\begin{aligned} \frac{d\mathbf {v}}{d\tau } = M( {x} )\mathbf {v}, \end{aligned}$$

with solution:

$$\begin{aligned} \mathbf {v}(g, {x} ,\tau ) = e^{M( {x} )\tau } \mathbf {v}(g, {x} ,0). \end{aligned}$$
(B5)

The structure of \(M( {x} )\) guarantees that probability is conserved, as long as \(1^{\top }\mathbf {v}(g, {x} ,0)=1\) for all \( {x} \in \mathcal {A}\).
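As a numerical illustration of Eq. (B5) and of this conservation property, the sketch below (a minimal example with placeholder rates and helper names of our own, not the authors’ code) builds \(M( {x} )\) for a two-state sensor, propagates \(\mathbf {v}(g, {x} ,\tau )\) with a matrix exponential, and checks that \(1^{\top }\mathbf {v}\) stays equal to 1:

```python
# Minimal sketch of Eq. (B5): v(g, x, tau) = expm(M(x) tau) v(g, x, 0).
# Rate constants are illustrative placeholders, not values from the paper.
import numpy as np
from scipy.linalg import expm

def sensor_generator(k_01, k_10):
    """M(x) for a two-state sensor.  Off-diagonal entries are the rates
    k_{y'->y}(x); each diagonal entry is minus the total escape rate
    k_{y->y}(x), so every column of M(x) sums to zero."""
    return np.array([[-k_01,  k_10],
                     [ k_01, -k_10]])

M_x = sensor_generator(k_01=2.0, k_10=0.5)      # M(x) for one fixed input symbol x
assert np.allclose(M_x.sum(axis=0), 0.0)        # the structure that conserves probability

v0 = np.array([0.3, 0.7])                       # v(g, x, 0): a probability vector over sensor states
for tau in (0.1, 1.0, 10.0):
    v_tau = expm(M_x * tau) @ v0                # Eq. (B5)
    print(tau, v_tau, v_tau.sum())              # the sum remains 1 for every tau
```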

Our next task is to find expressions for \(\mathbf {v}(g, {x} ,0)\). We do this by considering Eq. (B1) in the limit that \(\tau <\Delta t\). More straightforwardly, we consider the equation:

$$\begin{aligned} \rho ((g, {x} ,0), y )&= \sum _{g', {x} '} \int _0^{\infty } \hbox {d}\tau ~\frac{\phi _{g'}(\tau )}{\Phi _{g'}(\tau )} \delta _{g,\epsilon ^+(g', {x} ')} p( {x} |g) \rho ((g', {x} ',\tau ), y ), \end{aligned}$$
(B6)

which is based on the following logic. For probability to flow into \(\rho ((g, {x} ,0), y )\) from \(\rho ((g', {x} ',\tau ), y ')\), we need the dwell time for symbol \( {x} '\) to be exactly \(\tau \) and for \( y '= y \). (The latter holds because the probability that the channel state and the input symbol switch in the same infinitesimal interval is \(O(\Delta t^2)\).) Again decomposing:

$$\begin{aligned} \rho ((g', {x} ',\tau ), y )&= p( y |g', {x} ',\tau ) \rho (g', {x} ',\tau ) \nonumber \\&= \mu _{g'} \Phi _{g'}(\tau ) p(g')p( {x} '|g') p( y |g', {x} ',\tau ) \end{aligned}$$
(B7)

and, thus, as a special case:

$$\begin{aligned} \rho ((g, {x} ,0), y ) = p( y |g, {x} ,0) p(g) p( {x} |g) \mu _{g}. \end{aligned}$$
(B8)

Plugging both Eqs. (B7) and (B8) into Eq. (B6), we find:

$$\begin{aligned} \mu _{g} p(g) p( {x} |g) p( y |g, {x} ,0)&= \sum _{g', {x} '} \int _0^{\infty } \mu _{g'} p(g') p( {x} '|g') \phi _{g'}(\tau ) \delta _{g,\epsilon ^+(g', {x} ')} p( {x} |g) p( y |g', {x} ',\tau ) \hbox {d}\tau \\ \mu _{g} p(g) p( y |g, {x} ,0)&= \sum _{g', {x} '} \int _0^{\infty } \mu _{g'} p(g') p( {x} '|g') \phi _{g'}(\tau ) \delta _{g,\epsilon ^+(g', {x} ')} p( y |g', {x} ',\tau ) \hbox {d}\tau . \end{aligned}$$

Using Eq. (B5), we see that \(p( y |g', {x} ',\tau ) = \left( e^{M( {x} ')\tau } \mathbf {v}(g', {x} ',0)\right) _{ y }\) and \(p( y |g, {x} ,0) = \left( \mathbf {v}(g, {x} ,0)\right) _{ y }\). So, we have:

$$\begin{aligned} \mu _{g} p(g) \mathbf {v}(g, {x} ,0) = \sum _{g', {x} '} \mu _{g'} \delta _{g,\epsilon ^+(g', {x} ')} p(g') p( {x} '|g') \left( \int _0^{\infty } \phi _{g'}(\tau ) e^{M( {x} ')\tau } d\tau \right) \mathbf {v}(g', {x} ',0). \end{aligned}$$

If we form the composite vector:

$$\begin{aligned} \mathbf {U}&= \begin{pmatrix} \mathbf {u}(g_1, {x} _1) \\ \mathbf {u}(g_1, {x} _2) \\ \vdots \\ \mathbf {u}(g_{|\mathcal {G}|}, {x} _{|\mathcal {A}|}) \end{pmatrix} \\&= \begin{pmatrix} \mu _{g_1} p(g_1) \mathbf {v}(g_1, {x} _1,0) \\ \vdots \\ \mu _{g_{|\mathcal {G}|}} p(g_{|\mathcal {G}|}) \mathbf {v}(g_{|\mathcal {G}|}, {x} _{|\mathcal {A}|},0) \end{pmatrix} \end{aligned}$$

and the matrix (written in block form) as:

$$\begin{aligned} \mathbf {C} := \begin{pmatrix} C_{(g_1, {x} _1)\rightarrow (g_1, {x} _1)} &{} C_{(g_1, {x} _2)\rightarrow (g_1, {x} _1)} &{} \ldots \\ C_{(g_1, {x} _1)\rightarrow (g_1, {x} _2)} &{} C_{(g_1, {x} _2)\rightarrow (g_1, {x} _2)} &{} \ldots \\ \vdots &{} \vdots &{} \ddots , \end{pmatrix} \end{aligned}$$

with:

$$\begin{aligned} C_{(g', {x} ')\rightarrow (g, {x} )} = \delta _{g,\epsilon ^+(g', {x} ')} p( {x} '|g') \int _0^{\infty } \phi _{g'}(t) e^{M( {x} ') t} \hbox {d}t, \end{aligned}$$

we then have:

$$\begin{aligned} \mathbf {U} = \text {eig}_1(\mathbf {C}). \end{aligned}$$
(B9)

Finally, we must normalize \(\mathbf {u}(g, {x} )\) appropriately. We do this by recalling that \(1^{\top }\mathbf {v}(g, {x} ,0)=1\), since \(\mathbf {v}(g, {x} ,0)\) is a vector of probabilities. Then we have:

$$\begin{aligned} \mathbf {u}(g, {x} ) \rightarrow \frac{\mathbf {u}(g, {x} )}{1^{\top }\mathbf {u}(g, {x} )} \mu _{g} p(g) \end{aligned}$$

for each \(g, {x} \).

To calculate prediction metrics—i.e., \(I_{\hbox {mem}}\) and \(I_{\hbox {fut}}\)—we need \(p( {x} , y )\) and \(p( y , \sigma ^-)\). The former is a marginalization of \(p( \sigma ^+, y )\) that we just calculated. The second can be calculated via:

$$\begin{aligned} p( \sigma ^-, y ) = \sum _{ \sigma ^+} p( \sigma ^-| \sigma ^+) p( y , \sigma ^+), \end{aligned}$$

where:

$$\begin{aligned} p( \sigma ^-| \sigma ^+)&= p((g_-, {x} _-,\tau _-)|(g_+, {x} _+,\tau _+)) \\&= \delta _{ {x} _+, {x} _-} p(g_-|g_+, {x} _+) \frac{\phi _{g_+}(\tau _+ +\tau _-)}{\Phi _{g_+}(\tau _+)}. \end{aligned}$$

Next, we turn our attention to calculating dissipation metrics, for which we only need:

$$\begin{aligned} \frac{\delta p}{\delta t} = \lim _{\Delta t\rightarrow 0} \frac{\Pr ( {X} _{t+\Delta t}= {x} , Y _t= y ) - \Pr ( {X} _t= {x} , Y _t= y )}{\Delta t}. \end{aligned}$$

Moreover, we can use the Markov chain \( Y _t\rightarrow \mathcal {S} ^+_t\rightarrow {X} _{t+\Delta t}\) to compute it:

$$\begin{aligned} \Pr ( {X} _{t+\Delta t}= {x} , Y _t= y ) = \sum _{ \sigma ^+} \Pr ( {X} _{t+\Delta t}= {x} | \mathcal {S} ^+_t= \sigma ^+) \Pr ( Y _t= y , \mathcal {S} ^+_t= \sigma ^+). \end{aligned}$$

We have:

$$\begin{aligned} \Pr ( {X} _{t+\Delta t}= {x} | \mathcal {S} ^+_t= \sigma ^+)&= \Pr ( {X} _{t+\Delta t}= {x} | \mathcal {S} ^+_t=(g', {x} ',\tau ')) \\&= {\left\{ \begin{array}{ll} \frac{\Phi _{g'}(\tau '+\Delta t)}{\Phi _{g'}(\tau ')} &{} {x} = {x} ' \\ \frac{\phi _{g'}(\tau ')}{\Phi _{g'}(\tau ')} p( {x} |\epsilon ^+(g', {x} ')) \Delta t &{} {x} \ne {x} '. \end{array}\right. } \end{aligned}$$

This, combined with \(p( \sigma ^+, y )\), gives:

$$\begin{aligned} \Pr ( {X} _{t+\Delta t}= {x} , Y _t= y )&= \sum _{g', {x} '\ne {x} } \int \hbox {d}\tau '~\rho ((g', {x} ',\tau '), y ) \frac{\phi _{g'}(\tau ')}{\Phi _{g'}(\tau ')} \Delta t~p( {x} |\epsilon ^+(g', {x} ')) \\&\quad + \sum _{g'} \int \hbox {d}\tau ' \frac{\Phi _{g'}(\tau '+\Delta t)}{\Phi _{g'}(\tau ')} \rho ((g', {x} ,\tau '), y ) \\&= \Pr ( {X} _{t}= {x} , Y _t= y )\nonumber \\&\quad + \Delta t \Big ( \sum _{g', {x} '\ne {x} } \int d\tau ' p( {x} |\epsilon ^+(g', {x} ')) \frac{\phi _{g'}(\tau ')}{\Phi _{g'}(\tau ')} \rho ((g', {x} ',\tau '), y ) \nonumber \\&\quad -\sum _{g'} \int \hbox {d}\tau ' \frac{\phi _{g'}(\tau ')}{\Phi _{g'}(\tau ')} \rho ((g', {x} ,\tau '), y )\Big ), \end{aligned}$$

correct to \(O(\Delta t)\). Recalling that:

$$\begin{aligned} \rho ((g', {x} ',\tau '), y )&= \rho (g', {x} ',\tau ')p( y |g', {x} ',\tau ') \\&= p( {x} '|g') \Phi _{g'}(\tau ') \left( e^{M( {x} ')\tau '}\mathbf {u}(g', {x} ')\right) _{ y }, \end{aligned}$$

gives:

$$\begin{aligned} \frac{\delta p}{\delta t}= & {} \lim _{\Delta t\rightarrow 0} \frac{\Pr ( {X} _{t+\Delta t}= {x} , Y _t= y ) -\Pr ( {X} _{t}= {x} , Y _t= y )}{\Delta t} \nonumber \\= & {} \sum _{g', {x} '\ne {x} } \int \hbox {d}\tau ' p( {x} |\epsilon ^+(g', {x} ')) \frac{\phi _{g'}(\tau ')}{\Phi _{g'}(\tau ')} \rho ((g', {x} ',\tau '), y ) -\sum _{g'} \int \hbox {d}\tau ' \frac{\phi _{g'}(\tau ')}{\Phi _{g'}(\tau ')} \rho ((g', {x} ,\tau '), y ) \nonumber \\\end{aligned}$$
(B10)
$$\begin{aligned}= & {} \sum _{g', {x} '\ne {x} } \int \hbox {d}\tau ' ~p( {x} |\epsilon ^+(g', {x} ')) p( {x} '|g') \phi _{g'}(\tau ') \left( e^{M( {x} ')\tau '}\mathbf {u}(g', {x} ')\right) _{ y } \nonumber \\&\qquad - \sum _{g'} \int \hbox {d}\tau '~p( {x} |g') \phi _{g'}(\tau ') \left( e^{M( {x} )\tau '}\mathbf {u}(g', {x} )\right) _{ y }. \end{aligned}$$
(B11)

From this, Eqs. (7) and (10) can be used to calculate \(\dot{{\text {I}}}_\text {np}\) and \(\beta P\).

Specialization to Semi-Markov Input

Up to this point, we have written expressions for the general case of unifilar hidden semi-Markov environment inputs to the sensor. We now specialize to the semi-Markov input case: the environment’s states are directly observed, not hidden. Not surprisingly, a great simplification ensues: the hidden states g coincide with the currently emitted symbols \( {x} \). Recall that, in an abuse of notation, \(q( {x} | {x} ')\) is now the probability of observing symbol \( {x} \) after seeing symbol \( {x} '\).

Hence, forward-time causal states are given by the pair \(( {x} ,\tau )\). The analog of Eq. (B5) is:

$$\begin{aligned} \mathbf {p}( y | {x} ,\tau ) = e^{M( {x} )\tau } \mathbf {p}( y | {x} ,0), \end{aligned}$$

and we define vectors:

$$\begin{aligned} \mathbf {u}( {x} ) := \mu _{ {x} } p( {x} ) \mathbf {p}( y | {x} ,0). \end{aligned}$$

The large vector:

$$\begin{aligned} \mathbf {U} := \begin{pmatrix} \mathbf {u}( {x} _1) \\ \vdots \\ \mathbf {u}( {x} _{|\mathcal {A}|}) \end{pmatrix} \end{aligned}$$

is the eigenvector \(\text {eig}_1(\mathbf{C} )\) of eigenvalue 1 of the matrix:

$$\begin{aligned} \mathbf{C} = \begin{pmatrix} 0 &{} q( {x} _1| {x} _2) \int _0^{\infty } \phi _{ {x} _2}(\tau ) e^{M( {x} _2)\tau }\hbox {d}\tau &{} \ldots \\ q( {x} _2| {x} _1) \int _0^{\infty } \phi _{ {x} _1}(\tau ) e^{M( {x} _1)\tau }\hbox {d}\tau &{} 0 &{} \ldots \\ \vdots &{} \vdots &{} \ddots \end{pmatrix} , \end{aligned}$$

where normalization requires \(1^{\top } \mathbf {u}( {x} ) = \mu _{ {x} } p( {x} )\).
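The following sketch assembles \(\mathbf{C}\) and extracts the \(\mathbf {u}( {x} )\) blocks for a toy alternating binary input. It is illustrative only: the Erlang-2 dwell-time densities, the sensor rates, the helper names, and the identification \(\mu _{ {x} } = 1/\langle \tau \rangle _{ {x} }\) with \(p( {x} )\propto \langle \tau \rangle _{ {x} }\) used for the final normalization are assumptions made for the example, not prescriptions from the text.

```python
# Sketch of the semi-Markov construction: build the block matrix C, take its
# eigenvalue-1 eigenvector, and rescale the blocks so that 1^T u(x) = mu_x p(x).
# Dwell-time densities, sensor rates, and the normalization weights are
# illustrative assumptions.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

symbols = (0, 1)
q = np.array([[0.0, 1.0],                        # q[x, x'] = q(x | x'); the two symbols
              [1.0, 0.0]])                       # strictly alternate in this toy example

lam = {0: 1.0, 1: 0.3}                           # Erlang-2 dwell times (assumed):
phi = {x: (lambda t, r=lam[x]: r**2 * t * np.exp(-r * t)) for x in symbols}
mean_dwell = {x: 2.0 / lam[x] for x in symbols}

M = {0: np.array([[-0.2, 5.0], [0.2, -5.0]]),    # sensor generators M(x) (assumed)
     1: np.array([[-5.0, 0.2], [5.0, -0.2]])}

def weighted_expm(x, weight):
    """Entrywise quadrature of  int_0^inf weight(t) expm(M(x) t) dt."""
    out = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j], _ = quad(lambda t: weight(t) * expm(M[x] * t)[i, j],
                                0.0, np.inf, limit=200)
    return out

# Block (x, x') of C is  q(x | x') * int_0^inf phi_{x'}(t) expm(M(x') t) dt.
C = np.block([[q[x, xp] * weighted_expm(xp, phi[xp]) for xp in symbols]
              for x in symbols])

# u-blocks come from the eigenvector of C with eigenvalue 1 ...
w, V = np.linalg.eig(C)
U = np.real(V[:, np.argmin(np.abs(w - 1.0))])
u = {x: U[2 * x: 2 * x + 2] for x in symbols}

# ... rescaled so that 1^T u(x) = mu_x p(x).  For an alternating binary input we
# take mu_x = 1/<tau>_x and p(x) proportional to <tau>_x, so that
# mu_x p(x) = 1 / (sum of mean dwell times)  (an assumption for this example).
weight_x = 1.0 / sum(mean_dwell.values())
for x in symbols:
    u[x] = u[x] / u[x].sum() * weight_x
print(u)
```

In this toy example every column of \(\mathbf{C}\) sums to one, so the eigenvalue-1 eigenvector that Eq. (B9) calls for is also easy to locate numerically.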

We continue by finding \(p( y )\), since from this we obtain \({\text {H}}[ Y ]\). We do this via straightforward marginalization:

$$\begin{aligned} p( y )&= \sum _{ \sigma ^+} \rho ( \sigma ^+, y ) = \sum _{ \sigma ^+} p( y | \sigma ^+) \rho ( \sigma ^+) \\&= \sum _{ {x} } \int _0^{\infty }~p( y | {x} ,\tau ) \rho ( {x} ,\tau )~\hbox {d}\tau \\&= \sum _{ {x} } \int _0^{\infty } \left( e^{M( {x} )\tau } \mathbf {v}( {x} ,0)\right) _{ y } \mu _{ {x} } p( {x} ) \Phi _{ {x} }(\tau ) \hbox {d}\tau \\&= \sum _{ {x} } \left( \left( \int _0^{\infty }e^{M( {x} )\tau } \Phi _{ {x} }(\tau ) \hbox {d}\tau \right) \mathbf {u}( {x} )\right) _{ y } . \end{aligned}$$

This implies that:

$$\begin{aligned} \mathbf {p}( y ) = \sum _{ {x} } \left( \int _0^{\infty } e^{M( {x} )\tau } \Phi _{ {x} }(\tau ) \hbox {d}\tau \right) \mathbf {u}( {x} ) . \end{aligned}$$

From earlier, recall that \(\mathbf {u}( {x} ) := \mu _{ {x} } p( {x} ) \mathbf {p}( y | {x} ,0)\).

Next, we aim to find \(p( {x} , y )\), again via marginalization:

$$\begin{aligned} p( {x} , y )&= \int _0^{\infty } \rho (( {x} ,\tau ), y ) d\tau \nonumber \\&= \int _0^{\infty } \mu _{ {x} } p( {x} ) \Phi _{ {x} }(\tau ) p( y | {x} ,\tau ) d\tau \nonumber \\&= \int _0^{\infty } \mu _{ {x} } p( {x} ) \Phi _{ {x} }(\tau ) \left( e^{M( {x} )\tau } \mathbf {v}( {x} ,0)\right) _{ y } \hbox {d}\tau \nonumber \\&= \left( \left( \int _0^{\infty } e^{M( {x} )\tau }\Phi _{ {x} }(\tau )\hbox {d}\tau \right) \mathbf {u}( {x} ) \right) _{ y } . \end{aligned}$$
(C1)

From the joint distribution \(p( {x} , y )\), we easily numerically obtain \({\text {I}}[ {X} ; Y ]\), since \(|\mathcal {A}| < \infty \) and \(|\mathcal {Y}| <\infty \).
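As a sketch of how Eq. (C1) and the mutual information are evaluated in practice (under the same toy assumptions as above, with the \(\mathbf {u}( {x} )\) vectors replaced by hypothetical placeholder values standing in for the eigenvector computation):

```python
# Sketch of Eq. (C1) and of I[X;Y] computed from the resulting joint table.
# The u(x) vectors below are hypothetical placeholders standing in for the
# eig_1(C) output; dwell-time and sensor parameters are likewise illustrative.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

lam = {0: 1.0, 1: 0.3}                            # Erlang-2 dwell times (assumed)
Phi = {x: (lambda t, r=lam[x]: (1 + r * t) * np.exp(-r * t)) for x in (0, 1)}
M = {0: np.array([[-0.2, 5.0], [0.2, -5.0]]),     # sensor generators M(x) (assumed)
     1: np.array([[-5.0, 0.2], [5.0, -0.2]])}
u = {0: np.array([0.025, 0.09038]),               # placeholders with 1^T u(x) = mu_x p(x)
     1: np.array([0.09038, 0.025])}

def weighted_expm(x, weight):
    """Entrywise quadrature of  int_0^inf weight(t) expm(M(x) t) dt."""
    out = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j], _ = quad(lambda t: weight(t) * expm(M[x] * t)[i, j],
                                0.0, np.inf, limit=200)
    return out

# Eq. (C1):  p(x, y) = ( [ int_0^inf Phi_x(tau) expm(M(x) tau) dtau ] u(x) )_y.
p_xy = np.array([weighted_expm(x, Phi[x]) @ u[x] for x in (0, 1)])
p_xy /= p_xy.sum()                                # absorb placeholder round-off

px, py = p_xy.sum(axis=1), p_xy.sum(axis=0)
I_xy = np.sum(p_xy * np.log(p_xy / np.outer(px, py)))
print("I[X;Y] =", I_xy, "nats")
```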

For notational ease, we introduce \(\mathcal {T}_t\) in this section as the random variable for the time since the last symbol change, whose realization is \(\tau \). Finally, we require \(p( y | \sigma ^-)\) to calculate \({\text {H}}[ Y | \mathcal {S} ^-]\), which we can then combine with \({\text {H}}[ Y ]\) to get an estimate for \({\text {I}}_\text {fut}\). We utilize the Markov chain \( Y \rightarrow \mathcal {S} ^+\rightarrow \mathcal {S} ^-\), as stated earlier, and so have:

$$\begin{aligned} p( y | \sigma ^-)&= \sum _{ \sigma ^+} \rho ( y , \sigma ^+| \sigma ^-) \\&= \sum _{ \sigma ^+} p( y | \sigma ^+, \sigma ^-) \rho ( \sigma ^+| \sigma ^-) \\&= \sum _{ \sigma ^+} p( y | \sigma ^+) \rho ( \sigma ^+| \sigma ^-) . \end{aligned}$$

Eq. (B5) gives us \(p( y | \sigma ^+)\) as:

$$\begin{aligned} p( y | \sigma ^+)= & {} p( y | {x} _+,\tau _+) \\= & {} \left( e^{M( {x} _+)\tau _+} \mathbf {v}( {x} _+,0)\right) _{ y } \end{aligned}$$

and Eq. (2) gives us \(\rho ( \sigma ^+| \sigma ^-)\) after some manipulation:

$$\begin{aligned} \rho ( \sigma ^+| \sigma ^-)&= \rho (( {x} _+,\tau _+)|( {x} _-,\tau _-)) \\&= \delta _{ {x} _+, {x} _-} \frac{\phi _{ {x} _-}(\tau _+ + \tau _-)}{\Phi _{ {x} _-}(\tau _-)} . \end{aligned}$$

Combining the two equations gives:

$$\begin{aligned} p( y | {x} _-,\tau _-)&= \sum _{ {x} _+} \int _0^{\infty }~\delta _{ {x} _+, {x} _-} \frac{\phi _{ {x} _-}(\tau _+ + \tau _-)}{\Phi _{ {x} _-}(\tau _-)} \left( e^{M( {x} _+)\tau _+} \mathbf {v}( {x} _+,0)\right) _{ y }~\hbox {d}\tau _+ \\&= \frac{1}{\Phi _{ {x} _-}(\tau _-)} \left( \left( \int _0^{\infty } \phi _{ {x} _-}(\tau _+ + \tau _-)e^{M( {x} _-)\tau _+} \hbox {d}\tau _+ \right) \mathbf {v}( {x} _-,0)\right) _{ y } . \end{aligned}$$

From this conditional distribution, we compute \({\text {H}}[ Y | \mathcal {S} ^-= \sigma ^-]\), and so \({\text {H}}[ Y | \mathcal {S} ^-]=\langle {\text {H}}[ Y | \mathcal {S} ^-= \sigma ^-]\rangle _{\rho ( \sigma ^-)}\). In more detail, define:

$$\begin{aligned} D_{ {x} }(\tau ) := \int _0^{\infty } \phi _{ {x} }(\tau +s) e^{M( {x} )s} \hbox {d}s , \end{aligned}$$

and we have:

$$\begin{aligned} \mathbf {p}( y | {x} _-,\tau _-) = D_{ {x} _-}(\tau _-)\mathbf {u}( {x} _-)/\mu _{ {x} _-}p( {x} _-)\Phi _{ {x} _-}(\tau _-) . \end{aligned}$$

This conditional distribution gives:

$$\begin{aligned} {\text {H}}[ Y | {X} _-= {x} _-,\mathcal {T}_-=\tau _-]&= -\sum _{ y } p( y | {x} _-,\tau _-)\log p( y | {x} _-,\tau _-) \\&= - 1^{\top } \left( \frac{D_{ {x} _-}(\tau _-)\mathbf {u}( {x} _-)}{\mu _{ {x} _-}p( {x} _-)\Phi _{ {x} _-}(\tau _-)} \log \left( \frac{D_{ {x} _-}(\tau _-)\mathbf {u}( {x} _-)}{\mu _{ {x} _-}p( {x} _-)\Phi _{ {x} _-}(\tau _-)} \right) \right) \\&= -\frac{1}{\mu _{ {x} _-}p( {x} _-)\Phi _{ {x} _-}(\tau _-)} \Big (1^{\top } \left( (D_{ {x} _-}(\tau _-)\mathbf {u}( {x} _-))\log (D_{ {x} _-}(\tau _-)\mathbf {u}( {x} _-))\right) \\&\qquad -1^{\top } (D_{ {x} _-}(\tau _-)\mathbf {u}( {x} _-)) \log (\mu _{ {x} _-}p( {x} _-)\Phi _{ {x} _-}(\tau _-))\Big ) . \end{aligned}$$

We recognize the factor \(\mu _{ {x} _-}p( {x} _-)\Phi _{ {x} _-}(\tau _-)\) as \(\rho ( {x} _-,\tau _-)\) and so we find that:

$$\begin{aligned} {\text {H}}[ Y | {X} _-,\mathcal {T}_-]&= \sum _{ {x} _-} \int _0^{\infty }~\rho ( {x} _-,\tau _-) {\text {H}}[ Y | {X} _-= {x} _-,\mathcal {T}_-=\tau _-] d\tau _- \\&= -\int _0^{\infty } \left( \sum _{ {x} _-} 1^{\top } \left( (D_{ {x} _-}(\tau _-)\mathbf {u}( {x} _-))\log (D_{ {x} _-}(\tau _-)\mathbf {u}( {x} _-)) \right) \right) \hbox {d}\tau _- \\&\qquad + \int _0^{\infty } \left( \sum _{ {x} _-} 1^{\top } D_{ {x} _-}(\tau _-) \mathbf {u}( {x} _-)\log (\mu _{ {x} _-}p( {x} _-)\Phi _{ {x} _-}(\tau _-))\right) \hbox {d}\tau _- . \end{aligned}$$

This, combined with the earlier formula for \({\text {H}}[ Y ]\), gives \({\text {I}}_\text {fut} = {\text {H}}[ Y ] - {\text {H}}[ Y | \mathcal {S} ^-]\).
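A numerical sketch of this step, under the same toy assumptions and placeholder \(\mathbf {u}( {x} )\) vectors as above (illustrative only), truncates the \(\tau _-\) integral at a finite upper limit:

```python
# Sketch of I_fut = H[Y] - H[Y | S^-] for the toy semi-Markov example.
# The u(x) vectors, dwell-time densities, sensor rates, and the weights
# mu_x p(x) are illustrative placeholders/assumptions.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad, trapezoid

lam = {0: 1.0, 1: 0.3}                            # Erlang-2 dwell times (assumed)
phi = {x: (lambda t, r=lam[x]: r**2 * t * np.exp(-r * t)) for x in (0, 1)}
Phi = {x: (lambda t, r=lam[x]: (1 + r * t) * np.exp(-r * t)) for x in (0, 1)}
mu_p = {x: 1.0 / (2.0 / lam[0] + 2.0 / lam[1]) for x in (0, 1)}   # mu_x p(x) (assumed)
M = {0: np.array([[-0.2, 5.0], [0.2, -5.0]]),
     1: np.array([[-5.0, 0.2], [5.0, -0.2]])}
u = {0: np.array([0.025, 0.09038]), 1: np.array([0.09038, 0.025])}  # placeholders

def weighted_expm(x, weight):
    """Entrywise quadrature of  int_0^inf weight(s) expm(M(x) s) ds."""
    out = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j], _ = quad(lambda s: weight(s) * expm(M[x] * s)[i, j],
                                0.0, np.inf, limit=200)
    return out

def entropy(p):
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log(p)))

# H[Y] from  p(y) = sum_x ( int_0^inf Phi_x(tau) expm(M(x) tau) dtau ) u(x).
p_y = sum(weighted_expm(x, Phi[x]) @ u[x] for x in (0, 1))
H_Y = entropy(p_y / p_y.sum())

# H[Y | X_-, T_-] = sum_x int_0^inf rho(x, tau) H[ p(y | x, tau) ] dtau, with
# p(y | x_-, tau_-) = D_{x_-}(tau_-) u(x_-) / rho(x_-, tau_-).
taus = np.linspace(1e-6, 40.0, 81)                # truncated tau_- grid (approximation)
H_Y_given_past = 0.0
for x in (0, 1):
    vals = []
    for tau in taus:
        rho = mu_p[x] * Phi[x](tau)               # rho(x_-, tau_-)
        D = weighted_expm(x, lambda s, tau=tau: phi[x](tau + s))   # D_x(tau_-)
        vals.append(rho * entropy(D @ u[x] / rho))
    H_Y_given_past += trapezoid(vals, taus)

print("I_fut ~", H_Y - H_Y_given_past, "nats")
```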

Finally, we wish to find an expression for the nonpredictive information rate \(\dot{{\text {I}}}_\text {np}\). We now redo, specialized to semi-Markov input, the somewhat compact derivation of \(\delta p / \delta t\) given above for the more general case. This requires finding an expression for \(\Pr ( Y _t= y , {X} _{t+\Delta t}= {x} )\) as an expansion in \(\Delta t\). We start as usual:

$$\begin{aligned} \Pr ( Y _t= y , {X} _{t+\Delta t}= {x} ) = \sum _{ {x} '} \int _0^{\infty } \Pr ( Y _t= y , {X} _{t+\Delta t}= {x} , {X} _t= {x} ',\mathcal {T}_t=\tau ) \hbox {d}\tau \end{aligned}$$

and utilize the Markov chain \( Y _t\rightarrow \mathcal {S} ^+_t\rightarrow {X} _{t+\Delta t}\), giving:

$$\begin{aligned}&\Pr ( Y _t= y , {X} _{t+\Delta t}= {x} ) = \sum _{ {x} '} \int _0^{\infty }\nonumber \\&\quad \Pr ( Y _t= y | {X} _t= {x} ',\mathcal {T}_t=\tau ) \Pr ( {X} _{t+\Delta t}= {x} | {X} _t= {x} ',\mathcal {T}_t=\tau ) \rho ( {x} ',\tau ) \hbox {d}\tau . \end{aligned}$$
(C2)

We have \(\Pr ( Y _t= y | {X} _t= {x} ,\mathcal {T}_t=\tau )\) from Eq. (B5). So, we turn our attention to finding \(\Pr ( {X} _{t+\Delta t}= {x} | {X} _t= {x} ',\mathcal {T}_t=\tau )\). Some thought reveals that:

$$\begin{aligned} \Pr ( {X} _{t+\Delta t}= {x} | {X} _t= {x} ',\mathcal {T}_t=\tau ) = {\left\{ \begin{array}{ll} \Delta t q( {x} | {x} ') \phi _{ {x} '}(\tau ) / \Phi _{ {x} '}(\tau ) &{} {x} \ne {x} ' \\ \Phi _{ {x} '}(\tau +\Delta t) / \Phi _{ {x} '}(\tau ) &{} {x} = {x} ' \end{array}\right. } , \end{aligned}$$
(C3)

plus corrections of \(O(\Delta t^2)\). We substitute Eq. (C3) into Eq. (C2) to get:

$$\begin{aligned} \Pr ( Y _t= y , {X} _{t+\Delta t}= {x} )&= \left( \sum _{ {x} '\ne {x} } \int _0^{\infty } \Pr ( Y _t= y | {X} _t= {x} ',\mathcal {T}_t=\tau ) q( {x} | {x} ') \frac{\phi _{ {x} '}(\tau )}{\Phi _{ {x} '}(\tau )} \rho ( {x} ',\tau )\hbox {d}\tau \right) \Delta t \\&\qquad + \int _0^{\infty } \Pr ( Y _t= y | {X} _t= {x} ,\mathcal {T}_t=\tau ) \frac{\Phi _{ {x} }(\tau +\Delta t)}{\Phi _{ {x} }(\tau )} \rho ( {x} ,\tau )\hbox {d}\tau , \end{aligned}$$

plus corrections of \(O(\Delta t^2)\). Recalling:

$$\begin{aligned} \frac{\Phi _{ {x} }(\tau +\Delta t)}{\Phi _{ {x} }(\tau )} = 1-\frac{\phi _{ {x} }(\tau )}{\Phi _{ {x} }(\tau )} \Delta t , \end{aligned}$$

plus corrections of \(O(\Delta t^2)\), we simplify further:

$$\begin{aligned} \Pr ( Y _t= y , {X} _{t+\Delta t}= {x} )&= \left( \sum _{ {x} '\ne {x} } \int _0^{\infty }\Pr ( Y _t= y | {X} _t= {x} ',\mathcal {T}_t=\tau ) q( {x} | {x} ')\frac{\phi _{ {x} '}(\tau )}{\Phi _{ {x} '}(\tau )}\rho ( {x} ',\tau )d\tau \right) \Delta t \\&\qquad + \int _0^{\infty } \Pr ( Y _t= y | {X} _t= {x} ,\mathcal {T}_t=\tau ) \rho ( {x} ,\tau )d\tau \\&\qquad -\left( \int _0^{\infty } \Pr ( Y _t= y | {X} _t= {x} ,\mathcal {T}_t=\tau )\frac{\phi _{ {x} }(\tau )}{\Phi _{ {x} }(\tau )}\rho ( {x} ,\tau )d\tau \right) \Delta t , \end{aligned}$$

plus \(O(\Delta t^2)\) corrections. We notice that:

$$\begin{aligned} \int _0^{\infty } \Pr ( Y _t= y | {X} _t= {x} ,\mathcal {T}_t=\tau ) \rho ( {x} ,\tau )\hbox {d}\tau = \Pr ( Y _t= y , {X} _t= {x} ) , \end{aligned}$$

so that:

$$\begin{aligned}&\lim _{\Delta t\rightarrow 0} \frac{\Pr ( Y _t= y , {X} _{t+\Delta t}= {x} )-\Pr ( Y _t= y , {X} _t= {x} )}{\Delta t}\\&= \sum _{ {x} '\ne {x} } \int _0^{\infty } \Pr ( Y _t= y | {X} _t= {x} ',\mathcal {T}_t=\tau ) q( {x} | {x} ') \frac{\phi _{ {x} '}(\tau )}{\Phi _{ {x} '}(\tau )} \rho ( {x} ',\tau )\hbox {d}\tau \\&\qquad -\int _0^{\infty } \Pr ( Y _t= y | {X} _t= {x} ,\mathcal {T}_t=\tau ) \frac{\phi _{ {x} }(\tau )}{\Phi _{ {x} }(\tau )} \rho ( {x} ,\tau )\hbox {d}\tau . \end{aligned}$$

Substituting Eqs. (B5) and (1) into the above expressions yields:

$$\begin{aligned}&\sum _{ {x} '\ne {x} } \int _0^{\infty } \Pr ( Y _t= y | {X} _t= {x} ',\mathcal {T}_t=\tau ) q( {x} | {x} ') \frac{\phi _{ {x} '}(\tau )}{\Phi _{ {x} '}(\tau )} \rho ( {x} ',\tau )\hbox {d}\tau \\&\quad = \sum _{ {x} '} q( {x} | {x} ') \left( \left( \int _0^{\infty } \phi _{ {x} '}(\tau ) e^{M( {x} ')\tau } \hbox {d}\tau \right) \mathbf {u}( {x} ')\right) _{ y } \end{aligned}$$

and:

$$\begin{aligned} \int _0^{\infty } \Pr ( Y _t= y | {X} _t= {x} ,\mathcal {T}_t=\tau ) \frac{\phi _{ {x} }(\tau )}{\Phi _{ {x} }(\tau )} \rho ( {x} ,\tau )\hbox {d}\tau&= \left( \left( \int _0^{\infty } \phi _{ {x} }(\tau ) e^{M( {x} )\tau } \hbox {d}\tau \right) \mathbf {u}( {x} )\right) _{ y } , \end{aligned}$$

so that we have:

$$\begin{aligned} \lim _{\Delta t\rightarrow 0}&\frac{\Pr ( Y _t= y , {X} _{t+\Delta t}= {x} ) -\Pr ( Y _t= y , {X} _t= {x} )}{\Delta t} \\&\qquad \qquad \qquad = \Big ( \sum _{ {x} '} q( {x} | {x} ') \left( \int _0^{\infty } \phi _{ {x} '}(\tau ) e^{M( {x} ')\tau } \hbox {d}\tau \right) \mathbf {u}( {x} ') - \left( \int _0^{\infty } \phi _{ {x} }(\tau ) e^{M( {x} )\tau } \hbox {d}\tau \right) \mathbf {u}( {x} )\Big )_{ y }. \end{aligned}$$

For notational ease, denote the left-hand side as \(\delta p( {x} , y ) / \delta t\). The nonpredictive information rate is given by:

$$\begin{aligned} \dot{{\text {I}}}_\text {np}&= \lim _{\Delta t\rightarrow 0} \frac{{\text {I}}[ {X} _t; Y _t]-{\text {I}}[ {X} _{t+\Delta t}; Y _t]}{\Delta t} \\&= \lim _{\Delta t\rightarrow 0} \frac{\left( {\text {H}}[ {X} _t]+{\text {H}}[ Y _t] - {\text {H}}[ {X} _t, Y _t]\right) - \left( {\text {H}}[ {X} _{t+\Delta t}] + {\text {H}}[ Y _t] - {\text {H}}[ {X} _{t+\Delta t}, Y _t]\right) }{\Delta t} \\&= \lim _{\Delta t\rightarrow 0} \frac{{\text {H}}[ {X} _{t+\Delta t}, Y _t] - {\text {H}}[ {X} _t, Y _t]}{\Delta t} , \end{aligned}$$

where we utilize stationarity to assert \({\text {H}}[ {X} _t]={\text {H}}[ {X} _{t+\Delta t}]\). Then, correct to \(O(\Delta t)\), we have:

$$\begin{aligned} {\text {H}}[ {X} _{t+\Delta t}, Y _t]&= - \sum _{ {x} , y } \left( p( {x} , y ) + \frac{\delta p( {x} , y )}{\delta t}\Delta t\right) \log \left( p( {x} , y ) + \frac{\delta p( {x} , y )}{\delta t}\Delta t\right) \\&= -\sum _{ {x} , y } p( {x} , y ) \log p( {x} , y ) - \sum _{ {x} , y }p( {x} , y ) \frac{\delta p( {x} , y )/\delta t}{p( {x} , y )}~\Delta t\\&\quad - \sum _{ {x} , y } \frac{\delta p( {x} , y )}{\delta t} \log p( {x} , y ) \Delta t \\&= {\text {H}}[ {X} _t; Y _t] - \sum _{ {x} , y } \frac{\delta p( {x} , y )}{\delta t} \log p( {x} , y ) \Delta t , \end{aligned}$$

where the middle term vanishes because \(\sum _{ {x} , y } \delta p( {x} , y )/\delta t = 0\). This implies:

$$\begin{aligned} \dot{{\text {I}}}_\text {np}&= -\sum _{ {x} , y } \frac{\delta p( {x} , y )}{\delta t} \log p( {x} , y ) , \end{aligned}$$

with:

$$\begin{aligned} \frac{\delta p( {x} , y )}{\delta t}&= \left( \sum _{ {x} '} q( {x} | {x} ')\left( \int _0^{\infty } \phi _{ {x} '}(\tau ) e^{M( {x} ')\tau } \hbox {d}\tau \right) \mathbf {u}( {x} ') - \left( \int _0^{\infty } \phi _{ {x} }(\tau ) e^{M( {x} )\tau } \hbox {d}\tau \right) \mathbf {u}( {x} )\right) _{ y } \end{aligned}$$
(C4)

and \(p( {x} , y )\) given in Eq. (C1).
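Putting the pieces together for the toy example (again with placeholder \(\mathbf {u}( {x} )\) vectors and assumed parameters, as an illustrative sketch rather than the authors’ implementation), the nonpredictive information rate can be evaluated as:

```python
# Sketch of the semi-Markov nonpredictive information rate:
#   I_np = -sum_{x,y} (delta p(x,y)/delta t) log p(x,y),
# with delta p/delta t from Eq. (C4) and p(x,y) from Eq. (C1).  The u(x) vectors
# and all model parameters are illustrative placeholders (see the sketches above).
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

lam = {0: 1.0, 1: 0.3}                            # Erlang-2 dwell times (assumed)
phi = {x: (lambda t, r=lam[x]: r**2 * t * np.exp(-r * t)) for x in (0, 1)}
Phi = {x: (lambda t, r=lam[x]: (1 + r * t) * np.exp(-r * t)) for x in (0, 1)}
q = np.array([[0.0, 1.0], [1.0, 0.0]])            # q[x, x'] = q(x | x') (alternating input)
M = {0: np.array([[-0.2, 5.0], [0.2, -5.0]]),
     1: np.array([[-5.0, 0.2], [5.0, -0.2]])}
u = {0: np.array([0.025, 0.09038]), 1: np.array([0.09038, 0.025])}  # placeholders

def weighted_expm(x, weight):
    """Entrywise quadrature of  int_0^inf weight(t) expm(M(x) t) dt."""
    out = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j], _ = quad(lambda t: weight(t) * expm(M[x] * t)[i, j],
                                0.0, np.inf, limit=200)
    return out

# p(x, y) from Eq. (C1) and delta p(x, y)/delta t from Eq. (C4).
p_xy = np.array([weighted_expm(x, Phi[x]) @ u[x] for x in (0, 1)])
p_xy /= p_xy.sum()                                # absorb placeholder round-off
K = {x: weighted_expm(x, phi[x]) for x in (0, 1)}
dp_dt = np.array([sum(q[x, xp] * (K[xp] @ u[xp]) for xp in (0, 1)) - K[x] @ u[x]
                  for x in (0, 1)])

I_np = -np.sum(dp_dt * np.log(p_xy))
print("nonpredictive information rate:", I_np, "nats per unit time")
```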

Cite this article

Marzen, S.E., Crutchfield, J.P. Prediction and Dissipation in Nonequilibrium Molecular Sensors: Conditionally Markovian Channels Driven by Memoryful Environments. Bull Math Biol 82, 25 (2020). https://doi.org/10.1007/s11538-020-00694-2