Renewal model for dependent binary sequences

We propose constructing infinite stochastic binary sequences by associating one of the two symbols of the sequence with the renewal times of an underlying renewal process. Focusing on stationary binary sequences corresponding to delayed renewal processes, we investigate correlations and the ability of the model to implement a prescribed autocovariance structure, showing that a large variety of subexponential decays of correlations can be accounted for. In particular, the robustness and efficiency of the method are tested by generating binary sequences with polynomial and stretched-exponential decay of correlations. Moreover, to justify the maximum entropy principle for model selection, an asymptotic equipartition property for typical sequences that naturally leads to the Shannon entropy of the waiting time distribution is demonstrated. To support the comparison of the theory with data, a law of large numbers and a central limit theorem are established for the time average of general observables.


Introduction
Binary processes arise in natural and social sciences any time a phenomenon is dichotomized according to the occurrence or non-occurrence of a property of interest. Modeling, statistical inference from data, and generation of dependent binary sequences, both finite and infinite, have thus received a lot of attention in a number of disciplines, such as statistical physics [1], systems biology [2], computational neuroscience [3], actuarial and financial sciences [4], and machine learning [5], among others. While finite sequences are generally associated with graphical models [6], infinite sequences are usually understood as time series.
Modeling and generating binary sequences with prescribed correlations is one of the major problems to be faced [1]. For finite sequences, the solution to the problem is in principle represented by the Ising model, which is an exponential family able to reproduce all inputs in (the interior of) the correlation polytope [6]. For infinite sequences, the problem of constructing binary variables with given correlations is more challenging since there is no universal framework to refer to. Indeed, finite-dimensional distributions structured according to the Ising model are generally not consistent under marginalization, so that they cannot be regarded as children of a common probability measure associated with a stochastic process. On the other hand, the Ising model is the only known parametric family that can cover all correlations of a finite number of binary variables. For this reason, in order to describe and generate infinite binary sequences, researchers have devised several approaches, each with its own advantages and disadvantages, which can be grouped into autoregressive models [7][8][9][10][11][12][13][14], latent factor models [15][16][17], and mixed models which combine autoregression with latent factors [7,20]. Autoregressive models are Markov chains with the property that the current probability of a symbol conditional on the past history is determined by a linear function of the previous outcomes [7], in a number corresponding to the order of the chain, linearity being postulated to avoid an explosion of parameters. These types of models can produce any exponential decay of correlations, whereas subexponential decays are off limits due to Markovianity and the finite state space [21]. Latent factor models rely on an underlying latent process, and the first proposal to generate an infinite dependent binary sequence was to clip a latent Gaussian process at a fixed level [15].
Different latent factor approaches are represented by mixture models whereby the binary symbols are drawn independently with law determined by the realization of an underlying process, such as a latent Gaussian process [16] or a latent binary process [17]. Latent factor approaches can introduce long-range dependence between the binary variables and can thus describe both exponential and subexponential decays of correlations, such as polynomial decay. However, the analysis of clipping performed in [18] has revealed that there are serious restrictions on the type of correlations that can be produced. In fact, decay rates, in the case of exponential decay, and decay exponents, in the case of polynomial decay, cannot be smaller than a certain minimum threshold. Similar restrictions have been found in mixture models with latent binary inputs [17]. These restrictions have been weakened in [19] by means of an algorithm that, by iterating the latent factor model of [17], which transforms a latent binary input into a binary output, adds dependence to an initially uncorrelated binary sequence step by step. Other algorithms to generate dependent binary sequences and their demonstration through numerical experiments are discussed in [22][23][24][25].
In this paper we suggest and investigate a new latent factor approach for infinite binary sequences that relies on renewal processes [26]. Our proposal is to associate one of the two symbols of the sequence with the renewal times of a discrete-time delayed renewal process, and the other symbol with all other times, thus defining a regenerative phenomenon à la Kingman [27]. The delay makes it possible to construct stationary binary sequences. This way, we can offer a flexible and powerful model that is under full mathematical control and that is able to implement a large variety of subexponential prescriptions for the correlations, from polynomial to stretched-exponential decays, as we shall demonstrate. Furthermore, reflecting an underlying renewal structure, generation of the process in numerical experiments is immediate.
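To make the generation procedure concrete, here is a minimal Python sketch (all function names are ours, and the waiting-time density is truncated to a finite support for simplicity): the first waiting time S_1 is drawn from the delay density P[S_1 = s] = Q(s − 1)/µ, the standard choice that makes the chain stationary (cf. Theorem 2.1 below), while all later gaps are drawn from p.

```python
import random

def generate_stationary_binary(pmf, length, seed=0):
    """Generate X_1, ..., X_length with symbol 1 at the renewal times
    T_n = S_1 + ... + S_n.  pmf[s-1] = p(s) is the (truncated)
    density of S_2, S_3, ...; S_1 is drawn from the delay density
    p_1(s) = Q(s-1)/mu, which makes the sequence stationary."""
    rng = random.Random(seed)
    mu = sum(s * ps for s, ps in enumerate(pmf, start=1))
    tails, rem = [], 1.0
    for ps in pmf:                # tails[s-1] = Q(s-1) = P[S_2 >= s]
        tails.append(rem)
        rem -= ps
    delay = [q / mu for q in tails]

    def draw(density):            # invert the CDF of a finite density
        u, acc = rng.random(), 0.0
        for s, ps in enumerate(density, start=1):
            acc += ps
            if u < acc:
                return s
        return len(density)

    x = [0] * length
    t = draw(delay)               # first renewal time T_1 = S_1
    while t <= length:
        x[t - 1] = 1
        t += draw(pmf)            # next gap drawn from p
    return x
```

For instance, with p(1) = p(2) = 1/2 one has µ = 3/2, so the fraction of 1s in a long sample should approach 1/µ = 2/3.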
The paper is organized as follows. In section 2 we introduce the model and discuss its Markovian limit. Section 3 is devoted to correlations. In this section we prove the fundamental property of the model that the binary symbols that follow a renewal time are independent of the previous symbols, and we show that the autocovariance solves a renewal equation. Then we investigate the direct problem of determining the asymptotics of correlations from the waiting time distribution and the inverse problem of associating a renewal structure to a prescribed autocovariance for binary symbols. In section 4 we prove limit theorems for the probability of typical sequences and for the time averages of general observables. In particular, we demonstrate an asymptotic equipartition property that naturally introduces the Shannon entropy of the waiting time distribution and leads to the maximum entropy principle for model selection [28,29]. Then we provide a law of large numbers and a central limit theorem for the normal fluctuations of empirical means, and we apply them to the estimation of the waiting time distribution and of the autocovariance from data. These results cannot be deduced from the standard limit theory of regenerative processes with independent cycles [26] and require new arguments. Conclusions are drawn in section 5. The most technical mathematical proofs are reported in the appendices in order not to interrupt the flow of the presentation.

Definition of the model
Let positive integer-valued random variables S_1, S_2, . . . be given on a probability space (Ω, F, P). We construct a binary stochastic sequence X := {X_t}_{t≥1} with entries valued in N_2 := {0, 1} by supposing that the variable S_n is the waiting time for the nth symbol 1 of the sequence, which occurs at the renewal time T_n := S_1 + · · · + S_n. Thus, X_t := 1 if t ∈ {T_n}_{n≥1} and X_t := 0 otherwise for each t ≥ 1. Our fundamental assumptions are (i) that the waiting times are independent and (ii) that S_2, S_3, . . . share a common distribution, which may differ from the distribution of S_1. Under these assumptions, the sequence {T_n}_{n≥1} is a renewal process, delayed if S_1 is not distributed as S_2. We refer to [26] for the basics of renewal theory. The reason for letting S_1 behave differently from the other waiting times is that its distribution can be chosen in such a way that the sequence X is stationary, as explained by the following theorem, which is proved in Appendix A. We recall that the process is understood as a random element in the set N_2^∞ of all binary sequences equipped with the cylinder σ-field B. Hereafter, E denotes expectation with respect to P.

Theorem 2.1. The sequence X is stationary if and only if µ := E[S_2] < ∞ and P[S_1 = s] = Q(s − 1)/µ for every s ≥ 1, where Q := P[S_2 > · ].

Theorem 2.1 recovers well-known conditions for the delayed renewal chain {T_n}_{n≥1} to be stationary, which means that the statistical properties of the number of renewals in a given temporal window are invariant under time shifts [26]. Under the hypothesis of stationarity, which is made from now on, the only input of the model X is the distribution of the waiting time S_2. Depending on the need, we refer to its density p := P[S_2 = · ] or to its tail Q := P[S_2 > · ]. Thus, according to Theorem 2.1, we suppose that µ := E[S_2] is finite and that S_1 has density Q( · − 1)/µ. The analysis of the finite-dimensional distributions reveals that if the sequence X is stationary, then it is also time-reversible, in the sense that (X_t, . . . , X_2, X_1) is distributed as (X_1, X_2, . . . , X_t) for all t.
The finite-dimensional distributions of X are provided by the next proposition, which is demonstrated in Appendix B. We adopt the usual conventions that an empty sum equals 0 and an empty product equals 1.
Proposition 2.1 can be used to compute the conditional probability of a symbol given the past, and it allows us to gain some insight into the dependence structure of the stochastic sequence X. Let us denote by l_t(x_1, . . . , x_t) the position of the first symbol 1 in the binary string (x_1, . . . , x_t): l_t(x_1, . . . , x_t) := l if x_l = 1 and x_1 = · · · = x_{l−1} = 0 for some l ≤ t, and l_t(x_1, . . . , x_t) := ∞ if x_1 = · · · = x_t = 0. Proposition 2.1 shows through easy manipulations that (2.1) holds for any positive integer t and binary numbers x_1, . . . , x_t provided that P[X_1 = x_t, . . . , X_t = x_1] > 0. Pay attention to the reversed order of x_1, . . . , x_t in the conditioning event. Formula (2.1) draws a link between our model and the so-called "stochastic chains with memory of variable length" [30]. In this class of models, which can be built on a larger alphabet than N_2, the number of variables involved in the conditioning event is determined by a "context length function", which itself depends on the past variables. The context length function of our model is exactly the function l_t that maps any binary string (x_1, . . . , x_t) to l_t(x_1, . . . , x_t). Since the function l_t is unbounded as the time t goes on, our model constitutes a stochastic chain having, in general, memory of unbounded variable length.
In spite of the unbounded context length, the process X is a Markov chain for certain particular waiting time distributions p. The sequence X is a Markov chain of order M ≥ 1 if the conditional probability of a symbol given the past depends only on the state of the last M variables. We can include sequences of i.i.d. random variables in this definition by letting M take the value zero. The requirement that the conditional probability (2.1) be independent of x_{M+1}, . . . , x_t for Markovianity of order M ≥ 0 results in the following corollary of Proposition 2.1, whose proof is reported in Appendix C.
Corollary 2.1. The binary sequence X is a Markov chain of order M ≥ 0 if and only if there exists a real number λ ∈ [0, 1) such that p(M + s + 1) = λ s p(M + 1) for all s ≥ 1.

Correlations
The binary sequence X := {X t } t≥1 is a regenerative process with regeneration points {T n } n≥1 and independent cycles [26]. This means that the past and the future of each renewal event, i.e. of each symbol 1, are independent. Such conditional independence is stated by the following proposition, whose proof is provided in Appendix D.
Proposition 3.1. For all positive integers s ≤ t and binary numbers x_1, . . . , x_t with x_s = 1, P[X_1 = x_1, . . . , X_t = x_t] P[X_s = 1] = P[X_1 = x_1, . . . , X_s = x_s] P[X_s = x_s, . . . , X_t = x_t].

The conditional independence associated with renewals is exploited in this section to study correlations. The covariance of two random variables Y and Z on the probability space (Ω, F, P) is cov[Y, Z] := E[Y Z] − E[Y] E[Z]. For every t ≥ 0, the autocovariance ρ_t := cov[X_1, X_{t+1}] reads ρ_t = c_t − c_0² with c_t := E[X_1 X_{t+1}]. Notice that c_0 = E[X_1] = 1/µ. The next proposition, which is based on Proposition 3.1 and is proved in Appendix E, shows that the sequence c := {c_t}_{t≥0} solves the renewal equation c_t = Σ_{s=1}^t p(s) c_{t−s} for t ≥ 1 with waiting time distribution p. As a consequence, the renewal theorem [31] gives lim_{t↑∞} c_t = c_0/µ = c_0², namely lim_{t↑∞} ρ_t = 0, provided that p is aperiodic. The probability distribution p is aperiodic if there is no proper sublattice of {1, 2, . . .} containing its support.
By resorting to the literature on the renewal equation, we investigate the direct problem of determining the asymptotics of the sequence c, and hence of the autocovariance, from a given waiting time distribution p, as well as the inverse problem of finding conditions on a prescribed c such that c solves a renewal equation with some probability distribution p of finite mean. The latter aims to answer the question of which autocovariance structures can be reproduced by our model. Before that, we point out that the autocovariance of the binary sequence X controls the time dependence of any temporal correlations, as shown by the following technical lemma on the correlations of general observables. The lemma is proved in Appendix F and will be used in section 4 to address the mixing properties of X.

Lemma 3.1. Let f and g be real functions on N_2^m and N_2^n, respectively, such that f(0, . . . , 0) = g(0, . . . , 0) = 0, and set Z_t := g(X_t, . . . , X_{t+n−1}) for t ≥ 1. Then, for all t ≥ 1, the covariance of f(X_1, . . . , X_m) and Z_{m+t} is expressed in terms of the sequence c through coefficients indexed by i ≥ 1 and j ≥ 1, with the convention p(0) := −1.
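For numerical purposes, the renewal equation satisfied by c can be iterated directly. The sketch below (our code, with a truncated waiting-time density) computes c_t from c_0 = 1/µ and c_t = Σ_{s=1}^t p(s) c_{t−s}, and returns the autocovariance ρ_t = c_t − c_0².

```python
def autocovariance_from_pmf(pmf, tmax):
    """Iterate the renewal equation c_t = sum_{s=1}^t p(s) c_{t-s}
    with c_0 = 1/mu and return rho_t = c_t - c_0**2 for t <= tmax.
    pmf[s-1] = p(s) is the (truncated) waiting-time density."""
    mu = sum(s * ps for s, ps in enumerate(pmf, start=1))
    c = [1.0 / mu]
    for t in range(1, tmax + 1):
        c.append(sum(pmf[s - 1] * c[t - s]
                     for s in range(1, min(t, len(pmf)) + 1)))
    return [ct - c[0] ** 2 for ct in c]
```

For p(1) = p(2) = 1/2, for example, one finds ρ_0 = 2/9, ρ_1 = −1/9, ρ_2 = 1/18, and ρ_t → 0 geometrically, in line with the renewal theorem since this p is aperiodic.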

Autocorrelation: direct problem
Let us study the asymptotics of the autocovariance for a given waiting time distribution. Pure exponential decay of correlations can be described by Markov chains, as identified by Corollary 2.1, and is somewhat trivial. We shall touch on the exponential behavior of the autocovariance when dealing with the inverse problem. Here we focus on subexponential decays, which account for long-range dependence that cannot be captured by Markov processes. A natural setting for subexponentiality in renewal theory was given in [32]. Let the symbol ∼ denote asymptotic equivalence for sequences: a_t ∼ b_t means lim_{t↑∞} a_t/b_t = 1. Following [32], we say that a positive real sequence a := {a_t}_{t≥0} belongs to the class S of subexponential sequences if A := Σ_{t≥0} a_t < ∞, a_{t+1} ∼ a_t, and Σ_{n=0}^t a_n a_{t−n} ∼ 2A a_t. The requirement a_{t+1} ∼ a_t prevents a from decaying exponentially fast, thus justifying the terminology "subexponential". The asymptotic behavior of the autocovariance ρ_t can be characterized in general when there exists λ ∈ (0, 1] such that {λ^{−t} Q(t)}_{t≥0} ∈ S, namely when the tail probability Q has exponential rate λ and a subexponential correction that forms a sequence in S. The case λ = 1 corresponds to a pure subexponential behavior. In fact, Theorem 3.2 of [33] on the rate of convergence of renewal sequences gives the following result.
Theorem 3.1. Assume that p is aperiodic and that {λ^{−t} Q(t)}_{t≥0} ∈ S for some λ ∈ (0, 1]. Then the asymptotic behavior of the autocovariance ρ_t is determined explicitly by that of the tail Q.

Subexponential behaviors that find wide application are polynomial decays, which fall under the umbrella of regular variation, and Weibull-type decays represented by stretched exponentials. We now suppose λ = 1 and discuss these decays in some detail. We stress that the necessary condition Σ_{t≥0} Q(t) < ∞ for the sequence {Q(t)}_{t≥0} to belong to S is satisfied, since Σ_{t≥0} Q(t) = µ < ∞ by the hypothesis of stationarity.

Polynomial decay
A positive sequence a := {a_t}_{t≥0} is regularly varying if there exist an index α ∈ R and a slowly varying function ℓ such that a_t ∼ t^α ℓ(t). A real measurable function ℓ is slowly varying if it is positive on a neighborhood of infinity, say (τ, ∞) with some τ > 0, and satisfies the scale-invariance property lim_{z↑∞} ℓ(ηz)/ℓ(z) = 1 for every number η > 0. Trivially, a measurable function with a finite positive limit at infinity is slowly varying. The simplest non-trivial example is represented by the logarithm. We refer to [34] for the theory of slow and regular variation. We stress that a slowly varying function ℓ is dominated by polynomials in the sense that lim_{z↑∞} z^γ ℓ(z) = ∞ and lim_{z↑∞} z^{−γ} ℓ(z) = 0 for all γ > 0, according to Proposition 1.3.6 of [34]. The uniform convergence theorem [34] states that the scale-invariance property of slowly varying functions actually holds uniformly for η in each compact subset of (0, ∞). This fact can be used to show that a_{t+1} ∼ a_t if a is a regularly varying sequence. Combined with the dominated convergence theorem, it also shows that Σ_{n=0}^t a_n a_{t−n} ∼ 2 Σ_{n=0}^{⌊t/2⌋} a_n a_{t−n} ∼ 2A a_t when A := Σ_{t≥0} a_t < ∞, ⌊t/2⌋ denoting the integer part of t/2. Thus, any summable regularly varying sequence is an element of S. Summability imposes the restriction α ≤ −1 on the index.
These arguments show that if Q(t) ∼ t^{−γ−1} ℓ(t) with an exponent γ > 0 and an arbitrary slowly varying function ℓ, then {Q(t)}_{t≥0} ∈ S. In this case we have the asymptotic equivalence Σ_{n>t} Q(n) ∼ (1/γ) t^{−γ} ℓ(t) by Proposition 1.5.10 of [34]. Thus, we get the following corollary of Theorem 3.1.
Corollary 3.1. Assume that p is aperiodic and Q(t) ∼ t^{−γ−1} ℓ(t) with an exponent γ > 0 and a slowly varying function ℓ. Then the autocovariance ρ_t decays polynomially with exponent γ, up to the slowly varying factor ℓ.

In contrast to the latent factor models analyzed in [17] and [18], which can account for polynomial decay of correlations but not for too small exponents, Corollary 3.1 of Theorem 3.1 shows that a renewal structure is able to describe polynomial decays of the autocovariance with any exponent γ > 0. Actually, the hypothesis γ > 0 is not necessary and we can also have γ = 0, in which case the autocovariance decays more slowly than any polynomial, but then the asymptotic behavior of {Σ_{n>t} Q(n)}_{t≥0} cannot be resolved in general. Notice that summability of {Q(t)}_{t≥0} when γ = 0 imposes restrictions on ℓ. For instance, if Q(t) ∼ t^{−1} (ln t)^{−β−1} with a number β > 0, then {Q(t)}_{t≥0} ∈ S and Theorem 3.1 applies.
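As a numerical illustration of the polynomial regime (our sketch, with an arbitrarily chosen tail and truncation level), one can build p from the tail Q(t) = (1 + t)^{−γ−1}, compute ρ_t through the renewal equation, and check that the log-log slope of ρ_t approaches −γ:

```python
import math

# Waiting-time density from the tail Q(t) = (1+t)^(-gamma-1), gamma = 2.
gamma = 2.0
tail = [(1.0 + t) ** (-gamma - 1.0) for t in range(4000)]
pmf = [tail[s - 1] - (tail[s] if s < len(tail) else 0.0)
       for s in range(1, len(tail) + 1)]
mu = sum(s * ps for s, ps in enumerate(pmf, start=1))

# c_t from the renewal equation; rho_t = c_t - c_0^2.
c = [1.0 / mu]
for t in range(1, 401):
    c.append(sum(pmf[s - 1] * c[t - s] for s in range(1, t + 1)))
rho = [ct - c[0] ** 2 for ct in c]

# Corollary 3.1 predicts rho_t of order t^(-gamma): estimate the slope.
slope = math.log(rho[400] / rho[100]) / math.log(4.0)
```

With this tail the estimated slope comes out close to −2, as predicted, with small deviations due to pre-asymptotic corrections and the truncation of the support.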

Stretched-exponential decay
In [32] the following sufficient condition for subexponentiality was proposed. Let h be a continuously differentiable real function on a neighborhood of infinity, say (τ, ∞) with some τ > 0, such that its derivative h′ enjoys the properties that −z² h′(z) increases to infinity with respect to z and that ∫_τ^∞ e^{(1/2) z² h′(z)} dz < ∞. Then, a positive sequence a := {a_t}_{t≥0} such that A := Σ_{t≥0} a_t < ∞ and a_{t+1} ∼ a_t ∼ e^{−t h(t)} satisfies Σ_{n=0}^t a_n a_{t−n} ∼ 2A a_t, and hence belongs to S. We have used this criterion to investigate the asymptotic behavior of the autocovariance when Q(t) ∼ e^{−t^β ℓ(t)} with a stretching exponent β ∈ (0, 1) and some function ℓ. The following corollary of Theorem 3.1, which is proved in Appendix G, gives sufficient conditions on ℓ that imply {Q(t)}_{t≥0} ∈ S. We point out that ℓ is slowly varying under those conditions.

Corollary 3.2. Assume that p is aperiodic and Q(t) ∼ e^{−t^β ℓ(t)} with a stretching exponent β ∈ (0, 1) and a twice continuously differentiable positive function ℓ on a neighborhood of infinity that satisfies lim_{z↑∞} z ℓ′(z)/ℓ(z) = 0 and is such that lim_{z↑∞} z² ℓ″(z)/ℓ(z) exists. Then the autocovariance ρ_t inherits the stretched-exponential decay of Q.

To conclude, we observe that the hypothesis β < 1 can be relaxed in favor of β = 1 in order to come fairly close to exponential decay of correlations while staying within the framework of subexponential sequences. If for example Q(t) ∼ e^{−t (ln t)^{−γ}} with some number γ > 0, then the function h defined by h(z) := (ln z)^{−γ} satisfies the above sufficient condition for subexponentiality, so that {Q(t)}_{t≥0} ∈ S and Theorem 3.1 applies.

Autocorrelation: inverse problem
Let us now investigate the possibility of implementing a prescribed autocovariance. Here the focus is on short time scales since Theorem 3.1 and its Corollaries 3.1 and 3.2 already settle the issue on long time scales, demonstrating that a large class of asymptotic prescriptions can be obtained. We want to understand under which conditions on a non-negative sequence c := {c_t}_{t≥0} there exists a waiting time distribution p of finite mean whose associated stationary binary sequence X := {X_t}_{t≥1} satisfies c_t = E[X_1 X_{t+1}] for all t ≥ 0. For simplicity, we assume that c_t > 0 for every t ≥ 0. In the light of Theorem 2.1 and Proposition 3.2, this amounts to asking under which conditions on c there exists a probability distribution p with the properties Σ_{s≥1} s p(s) = 1/c_0 and Σ_{s=1}^t p(s) c_{t−s} = c_t for all t ≥ 1. Such a p, if any, is uniquely defined by the renewal equation, and the renewal theorem [31] then gives lim_{t↑∞} c_t = c_0² provided that p is aperiodic. Finding the minimal conditions on the sequence c for the existence of an associated waiting time distribution p is a difficult task. We stress that the problem consists in determining whether or not the function p that solves the system Σ_{s=1}^t p(s) c_{t−s} = c_t for every t ≥ 1 is non-negative and sums to 1. However, there is a sufficient condition that covers many applications. A sequence c := {c_t}_{t≥0} is a Kaluza sequence if c_t > 0 and c_{t−1} c_{t+1} ≥ c_t² for all t ≥ 1. It follows that c_0 > 0. The following theorem states that the hypothesis that c is a Kaluza sequence such that lim_{t↑∞} c_t = c_0² guarantees the existence of an associated waiting time distribution of finite mean. The proof is provided in Appendix H.
Theorem 3.2. Let c := {c_t}_{t≥0} be a Kaluza sequence such that lim_{t↑∞} c_t = c_0². Then, there exists a unique waiting time distribution p with the properties Σ_{s≥1} s p(s) = 1/c_0 and Σ_{s=1}^t p(s) c_{t−s} = c_t for all t ≥ 1. As a consequence, the stationary binary sequence X associated with p satisfies E[X_1 X_{t+1}] = c_t for all t ≥ 0.

We point out that Theorem 3.2 of [33] offers an inverse of Theorem 3.1: if c := {c_t}_{t≥0} is a Kaluza sequence such that lim_{t↑∞} c_t = c_0² and {c_{t+1} − c_t}_{t≥0} ∈ S, then there exists a unique associated waiting time distribution p whose tail Q enjoys a corresponding subexponential property. A practical criterion to recognize a Kaluza sequence, which puts the emphasis on the autocovariance, is the following. Consider a sequence c := {c_t}_{t≥0} defined by c_0 := ξ and c_t := ξ² + m e^{−φ(t)} for t ≥ 1, with constants ξ ∈ (0, 1] and m ∈ [0, ξ(1 − ξ)] and a concave function φ such that φ(0) = 0 and lim_{z↑∞} φ(z) = ∞. Then c is a Kaluza sequence with lim_{t↑∞} c_t = c_0²: the concavity of φ and the consequent convexity of e^{−φ} give c_{t−1} c_{t+1} − c_t² ≥ 0 for all t ≥ 1. We have thus proved the following corollary of Theorem 3.2: for the above family of sequences c, there exists a unique waiting time distribution p of mean 1/ξ whose associated stationary binary sequence has autocovariance ρ_t = m e^{−φ(t)} for every t ≥ 1.
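The inverse construction can be carried out numerically by back-substitution in the renewal equation, p(t) = (c_t − Σ_{s=1}^{t−1} p(s) c_{t−s})/c_0. The sketch below (our code) applies it to a member of the Kaluza family of the practical criterion, with illustrative values ξ = 1/2, m = 1/5, and φ(t) = √t; Theorem 3.2 guarantees that the computed p is non-negative with partial sums at most 1.

```python
import math

def pmf_from_autocov(c):
    """Solve the renewal equation c_t = sum_{s=1}^t p(s) c_{t-s}
    for p(1), p(2), ... by back-substitution.  The solution is a
    genuine waiting-time density when c is a Kaluza sequence with
    lim c_t = c_0**2 (Theorem 3.2)."""
    p = [0.0]                       # p[s] = p(s); index 0 unused
    for t in range(1, len(c)):
        acc = sum(p[s] * c[t - s] for s in range(1, t))
        p.append((c[t] - acc) / c[0])
    return p[1:]

# Kaluza sequence: c_0 = xi, c_t = xi^2 + m*exp(-phi(t)), phi concave.
xi, m = 0.5, 0.2
c = [xi] + [xi ** 2 + m * math.exp(-math.sqrt(t)) for t in range(1, 301)]
p = pmf_from_autocov(c)
mean = sum(s * ps for s, ps in enumerate(p, start=1))  # about 1/c_0 = 2
```

In this example the prescribed autocovariance ρ_t = m e^{−√t} is of stretched-exponential type with stretching exponent 1/2, so the construction also illustrates Corollary 3.2.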

Limit theorems
There are a number of limit theorems for the sequence X := {X_t}_{t≥1} that follow from an underlying ergodic property. In this section we discuss some of these limit theorems, proving first the ergodicity of a related dynamical system. We refer to [35] for the basics of ergodic theory. Recalling that B denotes the σ-field on N_2^∞ generated by the cylinder subsets, it is convenient to introduce the probability measure P_o[ · ] := P[X ∈ · ] on B and the probability space (N_2^∞, B, P_o). We can deduce almost sure convergence in (Ω, F, P) from almost sure convergence in (N_2^∞, B, P_o). In fact, since almost sure convergence is defined only in terms of probability distributions [36], if {G_t}_{t≥1} and G are measurable functions on N_2^∞ such that lim_{t↑∞} G_t(x) = G(x) for P_o-almost every x, then lim_{t↑∞} G_t(X) = G(X) P-a.s. The same can be said for convergence in mean. The reason for dealing with the new probability space (N_2^∞, B, P_o) is that it can be endowed with a measure-preserving transformation. Such a transformation is the left-shift operator T that maps any binary sequence x := (x_1, x_2, . . .) to T x := (x_2, x_3, . . .). The operator T is measurable and, due to the stationarity of X, preserves the measure P_o, namely P_o[T^{−1} B] = P_o[B] for all B ∈ B. The transformation T is strong-mixing in the sense of ergodic theory, as stated by the following lemma, which is proved in Appendix I. The proof relies on Lemma 3.1.
Lemma 4.1. Assume that p is aperiodic. Then, for all A and B in B, lim_{t↑∞} P_o[A ∩ T^{−t} B] = P_o[A] P_o[B].

Due to strong mixing, the transformation T is ergodic according to Corollary 1.14.2 of [35], which means that the only members of B that are invariant under T have P_o-probability 0 or 1. We use ergodicity to demonstrate an asymptotic equipartition property, which justifies the principle of maximum entropy for model selection, and to investigate the behavior of empirical means.

Asymptotic equipartition property
Description of data requires selecting a statistical model, that is, a waiting time distribution p once our framework is adopted. One tool for model selection is the maximum entropy principle [28,29], which amounts to picking the probability distribution p of finite mean that meets certain moment constraints representing the available information and that maximizes the Shannon entropy. The Shannon entropy H(p) of p is defined by H(p) := −Σ_{s≥1} p(s) ln p(s). Among the distributions with mean µ, the largest value of the entropy is µ ln µ + (1 − µ) ln(µ − 1), which is attained if and only if p(s) = µ^{−s} (µ − 1)^{s−1} for every s. The Shannon entropy can be derived axiomatically as a measure of the uncertainty in the outcomes of a random variable. Instead, in this section we show that the entropy H(p) of p naturally arises as the answer to the question "how many typical sequences are there?". In fact, we demonstrate that there are about e^{(t/µ)H(p)} typical sequences of length t, each with probability about e^{−(t/µ)H(p)}. It follows that selecting the waiting time distribution that maximizes the entropy means not excluding possible sequences arbitrarily. If the only available information is the mean µ, then the maximum entropy prescription is the waiting time distribution p defined by p(s) = µ^{−s} (µ − 1)^{s−1} for all s. According to Corollary 2.1, the binary sequence X associated with such a p is a sequence of i.i.d. binary random variables with mean 1/µ.
Let us formalize and explain the above statements. Together with the Shannon entropy of the waiting time distribution, we consider for each t ≥ 1 the Shannon entropy H(π_t) of the finite-dimensional distribution π_t of (X_1, . . . , X_t). Proposition 2.1, together with the stationarity and reversibility of X, allows one to compute H(π_t) explicitly. After invoking the properties of Cesàro means and the dominated convergence theorem, the resulting formula shows that the entropy rate of the sequence X is lim_{t↑∞} (1/t) H(π_t) = H(p)/µ. Then, under ergodicity of the left-shift operator T, the Shannon-McMillan-Breiman theorem [37] states that −(1/t) ln π_t(X_1, . . . , X_t) converges almost surely to the entropy rate. These considerations, in combination with Lemma 4.1 and the possibility to deduce almost sure convergence in (Ω, F, P) from almost sure convergence in (N_2^∞, B, P_o), prove the following asymptotic equipartition property, which states that all long typical sequences of length t have roughly the same probability e^{−(t/µ)H(p)}.
Theorem 4.1. Assume that p is aperiodic. Then lim_{t↑∞} −(1/t) ln π_t(X_1, . . . , X_t) = H(p)/µ, P-a.s. and in mean.

Theorem 4.1 implies that there are about e^{(t/µ)H(p)} typical sequences of length t. This fact is made precise by the following corollary, whose proof is reported in Appendix J. A formal notion of typical set is needed here. Given a real number δ ∈ (0, 1), according to [37] we say that X ⊆ N_2^t is a typical set of length t ≥ 1 if P[(X_1, . . . , X_t) ∈ X] ≥ 1 − δ. We denote by |X| the cardinality of a set X.
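The entropy computations and the equipartition statement can be probed numerically. The sketch below (our code; the truncation level, mean, and path length are arbitrary choices) takes the maximum entropy waiting time p(s) = µ^{−s}(µ − 1)^{s−1} with µ = 3, checks the closed form of H(p), and verifies on one long simulated path that −(1/t) ln π_t(X_1, . . . , X_t) is close to H(p)/µ. The path probability factorizes along the renewals, as in Proposition 2.1: a delay term Q(l − 1)/µ for the first symbol 1 at position l, a factor p(s) for each later gap s, and a tail factor Q(r) for the r trailing zeros.

```python
import math, random

mu, S = 3.0, 400                        # mean and truncation level
pmf = [(mu - 1.0) ** (s - 1) / mu ** s for s in range(1, S + 1)]
H = -sum(ps * math.log(ps) for ps in pmf)
# Closed form for this geometric density: mu*ln(mu) - (mu-1)*ln(mu-1).

tails = [((mu - 1.0) / mu) ** s for s in range(S + 1)]   # Q(s)
delay = [tails[s - 1] / mu for s in range(1, S + 1)]     # P[S_1 = s]

rng = random.Random(7)

def draw(density):                      # invert the CDF of a finite density
    u, acc = rng.random(), 0.0
    for s, ps in enumerate(density, start=1):
        acc += ps
        if u < acc:
            return s
    return len(density)

t = 20000
pos = draw(delay)                       # first renewal time
logprob = math.log(tails[pos - 1] / mu)
while True:
    s = draw(pmf)
    if pos + s > t:
        logprob += math.log(tails[t - pos])   # trailing zeros: Q(r)
        break
    pos += s
    logprob += math.log(pmf[s - 1])
rate = -logprob / t                     # should be close to H/mu
```

With µ = 3 the entropy rate H(p)/µ is about 0.64 nats per symbol, and the empirical rate on a path of length 20000 typically agrees to within a few percent.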

Empirical means
Applications to real data require explaining whether or not ensemble averages can be estimated by means of time averages, also known as empirical means. If G is a B-measurable function on N_2^∞, then its empirical mean up to time t ≥ 1 is the random variable (1/t) Σ_{n=0}^{t−1} G(T^n X). The Birkhoff ergodic theorem [35] tells us that if the left-shift operator T is ergodic and the expectation E[|G(X)|] is finite, then the empirical mean of G converges to E[G(X)] P_o-a.s. as t ↑ ∞. The convergence also holds in mean by Corollary 1.14.1 of [35]. This way, Lemma 4.1 and the possibility to export convergences from (N_2^∞, B, P_o) to (Ω, F, P) result in the following law of large numbers, which gives a positive answer to the possibility of estimating ensemble averages with time averages. We stress that in most applications the observable G depends only on a finite number of variables, so that G is automatically B-measurable and bounded.
Theorem 4.2. Assume that p is aperiodic and that G is a B-measurable function on N_2^∞ with E[|G(X)|] < ∞. Then lim_{t↑∞} (1/t) Σ_{n=0}^{t−1} G(T^n X) = E[G(X)], P-a.s. and in mean.
We deepen the study of empirical means by investigating their normal fluctuations. Since X is a regenerative process with independent cycles [26], if G depends on exactly one binary variable, then the normal fluctuations of its empirical mean are described, under the optimal hypothesis E[S_2²] = Σ_{s≥1} s² p(s) < ∞, by the central limit theorem for cumulative processes with a regenerative structure [26]. It is worth noting that in such a case the empirical mean of G up to time t is a linear function of the number of renewals by time t, so that even its large fluctuations can be completely characterized through well-established large deviation principles for discrete-time renewal-reward processes [38,39], which include the counting renewal process. To deal with functions G that involve more than one variable, and that cannot be tackled by the standard limit theory of regenerative processes with independent cycles, we resort to the theory of the central limit theorem for stationary sequences [40]. To begin with, we introduce the strong mixing coefficient α_t of the sequence X, which measures the memory of the past on future events t time steps later. Denoting by F_a^b the σ-algebra generated by X_a, . . . , X_b, and by F_a^∞ the σ-algebra generated by X_a, X_{a+1}, . . ., according to [21] the strong mixing coefficient α_t, or α-mixing coefficient, of X is defined for each t ≥ 1 by α_t := sup_{s≥1} sup {|P[A ∩ B] − P[A] P[B]| : A ∈ F_1^s, B ∈ F_{s+t}^∞}. The sequence X is strong mixing in the sense of probability theory if lim_{t↑∞} α_t = 0. The following proposition provides an estimate of the α-mixing coefficient of our model in terms of the autocovariance. The proof is based on Lemma 3.1 and is given in Appendix K. Recall that ρ_t := cov[X_1, X_{t+1}].
Empirical means display normal fluctuations when the strong mixing coefficient decays reasonably fast, and precisely when Σ_{t≥1} α_t < ∞. In fact, let us consider a bounded observable G and for each n ≥ 0 let us set Z_n := G(T^n X) for brevity. Due to the boundedness of G, the theory of [40] tells us that if Σ_{t≥1} α_t < ∞, then the limit v := lim_{t↑∞} (1/t) var[Σ_{n=0}^{t−1} Z_n] exists, is finite and non-negative, and, provided that v > 0, the fluctuations of the empirical mean of G around E[G(X)] are asymptotically Gaussian. We point out that the hypotheses of the theorem about G are automatically satisfied when G depends only on a finite number of variables, since in such a case E[Z_0 | F_1^m] = Z_0 for all sufficiently large m. The following lemma, based on Proposition 4.1, shows that finiteness of the second moment of the waiting time distribution suffices for Σ_{t≥1} α_t < ∞, thus implying the validity of the central limit theorem. The proof is provided in Appendix L.
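The normal fluctuations can be observed directly in simulation: t times the variance of the empirical mean of X_1 should stabilize as t grows. The following sketch (our code; the replica counts and lengths are arbitrary choices) checks this for p(1) = p(2) = 1/2, whose second moment is obviously finite.

```python
import random

def fraction_of_ones(pmf, length, rng):
    """One stationary sample of the renewal model; returns the
    empirical mean (1/length) * sum of the X_t."""
    mu = sum(s * ps for s, ps in enumerate(pmf, start=1))
    tails, rem = [], 1.0
    for ps in pmf:                 # tails[s-1] = Q(s-1)
        tails.append(rem)
        rem -= ps
    delay = [q / mu for q in tails]

    def draw(density):
        u, acc = rng.random(), 0.0
        for s, ps in enumerate(density, start=1):
            acc += ps
            if u < acc:
                return s
        return len(density)

    count, t = 0, draw(delay)
    while t <= length:
        count += 1
        t += draw(pmf)
    return count / length

rng = random.Random(3)
pmf, reps = [0.5, 0.5], 300
scaled_var = {}
for length in (2000, 8000):
    means = [fraction_of_ones(pmf, length, rng) for _ in range(reps)]
    m = sum(means) / reps
    scaled_var[length] = length * sum((x - m) ** 2 for x in means) / (reps - 1)
ratio = scaled_var[2000] / scaled_var[8000]   # should be near 1
```

The stabilization of t · var at the two lengths is the signature of fluctuations of order 1/√t, consistent with the central limit theorem.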

Empirical analyses
Finally, we demonstrate the theory of empirical means through the estimation of the waiting time distribution and of the autocovariance from data. The focus is on the possibility of identifying their decay, that is, on the possibility of estimating p(s) for large s and ρ_τ for large τ. In order to avoid complications related to nonlinear functions of empirical means, we imagine that µ is known in advance. Importantly, we suppose that either (i) or (ii) of Lemma 4.2 holds, so that if G is an observable that depends on a finite number of variables, then both the law of large numbers stated by Theorem 4.2 and the central limit theorem apply to its empirical mean. Consider first the estimation of p(s), and denote by v_s the asymptotic variance of the corresponding empirical mean; to evaluate v_s we have made use of the identity Σ_{s≥1} s² p(s) = µ + 2µ³ Σ_{t≥0} ρ_t provided by Lemma 4.2. If a data sequence of length t is given, then we expect to be able to estimate p(s) for values of s such that √(v_s/t) ≪ p(s). At large s, this means values of s such that p(s) ≫ µ/t because v_s ∼ µ p(s). Figure 4.3 shows the results of estimation for two data sequences of length t = 10^6 and t = 10^8, respectively, when data are generated by the models of Figure 4.1, which correspond to c_t = 1/4 + (1/4)(1 + t)^{−γ} for t ≥ 0 with exponents γ = 2 and γ = 4. In such cases, the condition p(s) ≫ µ/t for proper estimation reads s ≲ 1.56 t^{1/4} for γ = 2 and s ≲ 1.65 t^{1/6} for γ = 4. Figure 4.4 reports the same estimation for the models with stretched-exponential autocovariance. Consider next the estimation of the autocovariance at lag τ ≥ 1, which relies on the observable F defined by (4.2) for all x ∈ N_2^∞ and its shifts Z_n := F(T^n X) for n ≥ 0. As E[Z_0] = ρ_τ, the empirical mean of F estimates ρ_τ. Once again, simple applications of Proposition 3.1 yield cov[Z_0, Z_n] = µ² c_n² c_{τ−n} − c_τ² if 0 ≤ n < τ and cov[Z_0, Z_n] = µ² c_τ² ρ_{n−τ} if n ≥ τ. This way, the asymptotic variance v_τ := cov[Z_0, Z_0] + 2 Σ_{n≥1} cov[Z_0, Z_n] can be computed explicitly. Since lim_{τ↑∞} v_τ = c_0²(1 − 4c_0 + 3c_0²) + 8c_0² Σ_{t≥0} ρ_t + 2µ Σ_{t≥1} ρ_t² =: σ² with σ > 0, we expect that ρ_τ can be estimated with a data sequence of length t when τ satisfies ρ_τ ≫ σ/√t. This time, the condition ρ_τ ≫ σ/√t reads τ ≲ 0.25 ln² t for β = 1/2 and τ ≲ 0.5 ln t for β = 1.
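As a self-contained illustration of the autocovariance estimator (our code; µ is assumed known, as above), the following sketch generates one long sample from the model with p(1) = p(2) = 1/2 and estimates ρ_1 by the empirical mean of x_i x_{i+1} minus 1/µ²; the exact value for this model is ρ_1 = c_1 − c_0² = 1/3 − 4/9 = −1/9.

```python
import random

rng = random.Random(5)
pmf, mu = [0.5, 0.5], 1.5
delay = [1.0 / mu, 0.5 / mu]          # stationary delay density Q(s-1)/mu

def draw(density):                    # invert the CDF of a finite density
    u, acc = rng.random(), 0.0
    for s, ps in enumerate(density, start=1):
        acc += ps
        if u < acc:
            return s
    return len(density)

length = 400000
x = [0] * length
t = draw(delay)
while t <= length:
    x[t - 1] = 1
    t += draw(pmf)

tau = 1
rho_hat = (sum(x[i] * x[i + tau] for i in range(length - tau))
           / (length - tau) - 1.0 / mu ** 2)
```

The estimate fluctuates around −1/9 with standard deviation of order √(v_τ/t), in agreement with the central limit theorem for empirical means.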
The comparison between the waiting time distribution and the autocovariance shows that the former is easier to estimate than the latter.
A final remark is in order. If µ is not known in advance, then its inverse can be estimated through the empirical mean of the observable G(x) := x_1 for all x ∈ N_2^∞. Apart from an additive constant, this observable is (4.2) with τ = 0. Thus, the variance of its empirical mean is exactly the v_0 given by (4.3).

Conclusions
We have explored the use of renewal processes to generate binary sequences valued in {0, 1}, where the symbol 1 marks a renewal event. Focusing on stationary binary sequences corresponding to delayed renewal processes, we have demonstrated the ability of the model to account for subexponential autocovariances, with special attention to polynomial and stretched-exponential decays. Our model performs at least as well as the algorithms proposed in [19] and [23,24], which seem to represent the state of the art; at variance with them, however, generating a binary sequence with our model is a trivial task. Furthermore, our model is under full mathematical control, and this fact has allowed us to build a mathematical theory of its asymptotic properties. In addition to shedding light on the asymptotic behavior of a large class of correlations and to demonstrating an asymptotic equipartition property, we have developed a theory for empirical means, proving a law of large numbers and a central limit theorem. The latter describes the typical fluctuations of empirical means when the second moment of the waiting time distribution is finite. We leave for future work the study of typical fluctuations when the second moment of the waiting time distribution is infinite; in that case, empirical means are expected not to lie in the Gaussian basin of attraction. We also leave for future work the investigation of their large fluctuations.

Author declarations: Data availability
No datasets were used to support this study.

A Proof of Theorem 2.1
A monotone class argument [36] shows that the sequence $X$ is stationary, namely that $P[(X_1, X_2, \ldots) \in B] = P[(X_2, X_3, \ldots) \in B]$ for all $B \in \mathcal{B}$, if and only if $(X_2, \ldots, X_{t+1})$ is distributed as $(X_1, \ldots, X_t)$ for every $t \ge 1$. If the process $X$ is stationary, then for any $s \ge 2$
$$P[T_1 \ge s] = P[X_1 = \cdots = X_{s-1} = 0] = P[X_2 = \cdots = X_s = 0] = P[X_1 = 1,\, X_2 = \cdots = X_s = 0] + P[X_1 = 0,\, X_2 = \cdots = X_s = 0],$$
which, due to the independence between $S_1 = T_1$ and $S_2 = T_2 - T_1$, is tantamount to $P[S_1 = s] = P[S_1 = 1]\, P[S_2 \ge s]$. This identity is trivial for $s = 1$ and is valid for all $s \ge 1$ as a consequence.
Let us prove that $(X_2, \ldots, X_{t+1})$ is distributed as $(X_1, \ldots, X_t)$ for any given $t \ge 1$. Pick binary numbers $x_1, \ldots, x_t$. By writing $X_{k+1}$ as $\mathbf{1}_{\{k+1 \in \{T_n\}_{n\ge 1}\}}$ and by observing that $S_1$ is independent of the subsequent waiting times, we can expand the probability of interest over the values of $S_1$. At this point, we use the formula $P[S_1 = s] = P[S_1 = 1]\, P[S_2 \ge s]$ established above. This way, after rearranging the resulting terms, the third term of the r.h.s. cancels with the first one, since the independence between $S_2$ and $\{T_n - S_1 - S_2 + s\}_{n\ge 2}$ shows that the latter is an expanded version of the former. Then the claim follows.

C Proof of Corollary 2.1
Thanks to the independence between $S_1$ and $S_2$, for all $t \ge 1$ we have an explicit expression for the probability that the first $t$ variables vanish. Thus, if $X$ is a sequence of i.i.d. binary random variables with mean $1/\mu$, then an identity follows for each $t$. Setting $\lambda := 1 - 1/\mu \in [0,1)$, this identity shows that $p(s+1) = \lambda^s p(1)$ for all $s \ge 1$. Conversely, if $p(s+1) = \lambda^s p(1)$ for all $s \ge 1$ with some real number $\lambda \in [0,1)$, then Proposition 2.1 demonstrates, after some simple algebra, that $X$ is a sequence of i.i.d. random variables.
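This equivalence can be checked numerically. The sketch below assumes the standard renewal-theoretic expressions, with $u_t$ the probability of a renewal at lag $t$ given one at lag $0$ and $c_t = u_t/\mu$: with geometric waiting times, $c_t$ collapses to $c_0^2$ for every $t \ge 1$, which is the i.i.d. case. The parameter $\lambda$ and the horizon are arbitrary choices.

```python
# Geometric waiting times p(s) = (1 - lam) * lam**(s-1) make the renewal
# probabilities u_t flat for t >= 1, hence c_t = u_t / mu equals c_0**2,
# i.e. the correlations of an i.i.d. binary sequence.
lam = 0.6                       # arbitrary parameter, lambda = 1 - 1/mu
mu = 1.0 / (1.0 - lam)          # mean waiting time
N = 200                         # horizon (arbitrary truncation)

p = [0.0] + [(1.0 - lam) * lam ** (s - 1) for s in range(1, N + 1)]

u = [1.0]                       # u_0 = 1: a renewal occurs at lag 0
for t in range(1, N + 1):
    u.append(sum(p[s] * u[t - s] for s in range(1, t + 1)))

c = [ut / mu for ut in u]       # c_0 = 1/mu
flat = max(abs(c[t] - c[0] ** 2) for t in range(1, N + 1))
print(flat)                     # essentially zero: the i.i.d. case
```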
Let us move on to discussing Markovianity of order $M \ge 1$. The sequence $X$ is a Markov chain of order $M \ge 1$ if, for all $t > M$ and $x_1, \ldots, x_{t+1}$ such that $P[X_1 = x_1, \ldots, X_t = x_t] > 0$, the conditional probability of $\{X_{t+1} = x_{t+1}\}$ given $\{X_1 = x_1, \ldots, X_t = x_t\}$ depends only on the last $M$ outcomes. The constraint $P[X_1 = x_1, \ldots, X_t = x_t] > 0$ can be dropped by multiplying both sides by $P[X_1 = x_1, \ldots, X_t = x_t]$ and $P[X_1 = x_1, \ldots, X_M = x_M]$. In fact, the above condition is tantamount to asking that a product identity hold for all $t > M$ and $x_1, \ldots, x_{t+1}$, which, due to stationarity, reads as (C.1). By making the choice $x_1 = 1$ and $x_2 = \cdots = x_{t+1} = 0$ in (C.1) we get an identity for each $t > M$ which, by appealing to the independence between $S_1$ and $S_2$ once more, reduces to a necessary condition for Markovianity of order $M \ge 1$.
In order to find a sufficient condition, we plug into (C.1) the explicit expression of finite-dimensional distributions given by Proposition 2.1.
Fix positive integers $s \le t$ and arbitrary binary numbers $x_1, \ldots, x_t$. The proposition is trivial if $s = 1$, $s = t$, or $1 < s < t$ and $x_s = 0$. Then, assume $1 < s < t$ and $x_s = 1$. Since the condition $X_s = 1$ implies that $s \in \{T_m\}_{m\ge 1}$ and since $T_1 < T_2 < \cdots$, we can decompose the event at hand according to the index $m$ for which $T_m = s$. For each $m \ge 1$, the sequence $\{T_n - T_m\}_{n\ge m}$ is independent of $\{T_n\}_{n=1}^m$ and distributed as $\{T_n - T_1\}_{n\ge 1}$, so that the probability factorizes. This identity proves the proposition since, by taking the sum over $x_1, \ldots, x_{s-1}$, we realize that the first factor involves the past variables alone, which allows us to write down the desired product form. This way, Proposition 3.1 shows the stated factorization, and stationarity yields the conclusion.
Set $\Psi_j(t) := \prod_{k=1}^{j-1}(1 - X_{m+t+k-1})\, X_{m+t+j-1}$ for $j \ge 1$ and $t \ge 1$. We are going to show that (F.1) holds and, for every $i \ge 1$ and $j \ge 1$, that (F.2) holds as well. We verify (F.1) at first. Pick $t \ge 1$. Since $f(0, \ldots, 0) = 0$ and $g(0, \ldots, 0) = 0$ by hypothesis, suitable decompositions of the variables involved hold, and we can write down the corresponding formula for two given indices $i \le m$ and $j \le n$. The condition $\Phi_i \neq 0$ implies $X_{m-i+1} = 1$ and $X_{m-i+2} = \cdots = X_m = 0$. This way, Proposition 3.1 yields a factorization. Similarly, as the random quantity $\Psi_j(t) Z_{m+t}$ is a function of $X_{m+t}, \ldots, X_{m+t+n-1}$ only, we obtain an analogous expression. By continuing with these arguments, since the condition $\Psi_j(t) \neq 0$ entails $X_{m+t} = \cdots = X_{m+t+j-2} = 0$ and $X_{m+t+j-1} = 1$, we realize that the relevant expectation involves $g(0, \ldots, 0, X_{m+t+j-1}, \ldots, X_{m+t+n-1})$ on the event $X_{m+t+j-1} = 1$.
These four relations show that (F.4) holds.
On the other hand, reversibility of $X$ gives two analogous identities. Let us move to (F.2), which we prove by induction with respect to $t$. To begin with, let us verify that $\mathrm{cov}[\Phi_i, \Psi_j(1)] = C_{i,j}(1)$ for each $i \ge 1$ and $j \ge 1$. By recalling (F.6) and (F.8) and by invoking stationarity of $X$ we find an expression for $\mathrm{cov}[\Phi_i, \Psi_j(1)]$; on the other hand, we have the defining expression of $C_{i,j}(1)$, and Proposition 3.2 provides the value of the probabilities involved. Then, by recalling that $p(0) := -1$, we realize that the two expressions coincide.
To conclude, let us show that if $\mathrm{cov}[\Phi_i, \Psi_j(t)] = C_{i,j}(t)$ for all positive indices $i$ and $j$ and some $t \ge 1$, then $\mathrm{cov}[\Phi_i, \Psi_j(t+1)] = C_{i,j}(t+1)$. Fix $i \ge 1$ and $j \ge 1$. By using the identity $\Psi_j(t+1) = \Psi_{j+1}(t) + X_{m+t}\prod_{k=1}^{j-1}(1 - X_{m+t+k})\, X_{m+t+j}$ and the induction hypothesis we get an expression for $\mathrm{cov}[\Phi_i, \Psi_j(t+1)]$. On the other hand, Proposition 3.1 allows the factorization of the remaining term.
Since $X_{m+t} = \Psi_1(t)$, the induction hypothesis and stationarity of $X$ give the value of the corresponding covariance. It follows that $\mathrm{cov}[\Phi_i, \Psi_j(t+1)] = C_{i,j}(t+1)$, which completes the induction.
This bound shows that $\ell$ is slowly varying since $\lim_{a\uparrow\infty}\sigma(a) = 0$. It also shows that $\beta z^{\beta-1}\ell(z)/2 \le (z+1)^\beta \ell(z+1) - z^\beta \ell(z) \le 2\beta z^{\beta-1}\ell(z)$ for each $z$ large enough, so that $\lim_{z\uparrow\infty}[(z+1)^\beta \ell(z+1) - z^\beta \ell(z)] = 0$ due to the fact that $\ell$ is dominated by polynomials according to Proposition 1.3.6 of [34]. Let $h$ be the continuously differentiable real function on a neighborhood of infinity that maps $z$ to $h(z) := z^{\beta-1}\ell(z)$. We now demonstrate that $-z^2 h'(z)$ increases to infinity with respect to $z$ and that $\int_\tau^\infty e^{\frac{1}{2} z^2 h'(z)}\, dz < \infty$ for some $\tau > 0$. It follows that $\{Q(t)\}_{t\ge 0} \in \mathcal{S}$ by the condition for subexponentiality given in [32], so that Theorem 3.1 ensures us that $\rho_t \sim (1/\mu^3)\sum_{n>t} Q(n)$. As $\lim_{z\uparrow\infty} z\ell'(z)/\ell(z) = 0$ by hypothesis, the formula $-z^2 h'(z) = z^\beta \ell(z)\,[(1-\beta) - z\ell'(z)/\ell(z)]$ applies. Since $\ell$ is dominated by polynomials, this limit entails $\lim_{z\uparrow\infty} -z^2 h'(z) = \infty$ and at the same time shows that $\int_\tau^\infty e^{\frac{1}{2} z^2 h'(z)}\, dz < \infty$ for some $\tau$ large enough. It remains to verify that $-z^2 h'(z)$ is increasing with respect to $z$. The condition $\lim_{z\uparrow\infty} z\ell'(z)/\ell(z) = 0$ implies $\lim_{z\uparrow\infty} z^2\ell''(z)/\ell(z) = 0$ by L'Hôpital's rule, as $\lim_{z\uparrow\infty} z^2\ell''(z)/\ell'(z)$ is assumed to exist. Then, the formula $\bigl(-z^2 h'(z)\bigr)' = z^{\beta-1}\ell(z)\,[\beta(1-\beta) - 2\beta z\ell'(z)/\ell(z) - z^2\ell''(z)/\ell(z)]$ shows that the derivative is eventually positive. This suffices to state that $-z^2 h'(z)$ is increasing with respect to $z$.
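For a concrete instance of the computations above, take the slowly varying factor constant, $\ell(z) \equiv a > 0$, which corresponds (under the proof's notation, assuming the tail $Q(t) = e^{-t^\beta \ell(t)}$) to the pure stretched-exponential case with $0 < \beta < 1$. All quantities are then explicit:

```latex
% Special case \ell(z) \equiv a > 0, i.e. Q(t) = e^{-a t^{\beta}}, 0 < \beta < 1.
\[
h(z) = a\, z^{\beta-1}, \qquad
-z^{2} h'(z) = a(1-\beta)\, z^{\beta} \uparrow \infty ,
\]
\[
\int_{\tau}^{\infty} e^{\frac{1}{2} z^{2} h'(z)}\, dz
  = \int_{\tau}^{\infty} e^{-\frac{a(1-\beta)}{2}\, z^{\beta}}\, dz < \infty ,
\]
\[
\bigl(-z^{2} h'(z)\bigr)' = a\beta(1-\beta)\, z^{\beta-1} > 0 ,
\]
```

so that $-z^2 h'(z)$ is increasing to infinity and the integrability condition holds, exactly as required by the subexponentiality criterion.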

H Proof of Theorem 3.2
The renewal equation defines a unique function $p$ according to the scheme $p(1) := c_1/c_0$ and $p(t+1) := c_{t+1}/c_0 - \sum_{s=1}^{t} p(s)\, c_{t-s+1}/c_0$ for $t \ge 1$. We must verify that $p(s) \ge 0$ for all $s \ge 1$ and that $\sum_{s\ge 1} p(s) = 1$.
We verify non-negativity of $p$ by induction. We have $p(1) := c_1/c_0 > 0$. Pick $t \ge 1$ and assume that $p(s) \ge 0$ for every $s \le t$. To say that $c$ is a Kaluza sequence means that $\{c_n/c_{n-1}\}_{n\ge 1}$ is non-decreasing, so that $c_{t-s+1}/c_{t-s} \le c_{t+1}/c_t$ for each $s \le t$. Then, the renewal equation yields $p(t+1) \ge 0$. Let us now show that $p$ sums to 1. By combining non-negativity of $p$ with the renewal equation we get $\sum_{s=1}^{n} p(s)\, c_{t-s} \le c_t$ for all $t \ge n \ge 1$. This bound gives $\sum_{s\ge 1} p(s) \le 1$ by sending $t$ to infinity and by recalling that $\lim_{t\uparrow\infty} c_t = c_0^2$ by hypothesis. Then, the fact that $\sum_{s\ge 1} p(s) < \infty$ allows an application of the dominated convergence theorem to the renewal equation, which shows that $\sum_{s\ge 1} p(s) = 1$ by using $\lim_{t\uparrow\infty} c_t = c_0^2$ once again.
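The scheme above is straightforward to run. The following sketch recovers $p$ from the correlation sequence $c_t = 1/4 + (1/4)(1+t)^{-2}$ used in Section 4 ($\gamma = 2$) and checks non-negativity and normalization numerically; the truncation level is an arbitrary choice.

```python
# Recover the waiting time distribution p from a correlation sequence c via
# p(1) = c_1/c_0 and p(t+1) = c_{t+1}/c_0 - sum_{s<=t} p(s) c_{t-s+1}/c_0.
# The sequence c_t = 1/4 + (1/4)(1+t)**-2 is the gamma = 2 model of Section 4;
# the truncation level N is an arbitrary choice.
N = 400
c = [0.25 + 0.25 * (1.0 + t) ** -2.0 for t in range(N + 2)]

p = [0.0, c[1] / c[0]]                  # p[s] = p(s), with p(1) = c_1/c_0
for t in range(1, N):
    nxt = c[t + 1] / c[0]
    nxt -= sum(p[s] * c[t - s + 1] for s in range(1, t + 1)) / c[0]
    p.append(nxt)

total = sum(p)                          # should be close to 1
mean = sum(s * p[s] for s in range(1, len(p)))  # should be close to 1/c_0 = 2
print(min(p[1:]), total, mean)
```

The computed weights are non-negative, sum to 1 up to the truncated tail, and give a mean waiting time close to $1/c_0 = \mu = 2$, as the proof predicts.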

K Proof of Proposition 4.1
Fix $t \ge 1$ and assume that $\sum_{n\ge t} |\rho_{n+1} - \rho_n| < \infty$ and $\sum_{n\ge t} n\, |\rho_{n+1} - 2\rho_n + \rho_{n-1}| < \infty$, otherwise there is nothing to prove. We verify first that, for all positive integers $m$ and $n$ and all sets $A \in \mathcal{F}_1^m$ and $B \in \mathcal{F}_{m+t}^{m+t+n-1}$,
$$\bigl|P[A \cap B] - P[A]\, P[B]\bigr| \le \sum_{i\ge 1}\sum_{j\ge 1} |C_{i,j}(t)|. \qquad (K.1)$$
The coefficients $C_{i,j}(t)$ were introduced by Lemma 3.1. A monotone class argument [36] allows us to extend the bound (K.1) to all $B \in \mathcal{F}_{m+t}^\infty$, so that $\alpha(t) \le \sum_{i\ge 1}\sum_{j\ge 1} |C_{i,j}(t)|$. Then we demonstrate the bound (K.2) on $\sum_{i\ge 1}\sum_{j\ge 1} |C_{i,j}(t)|$. Let us verify (K.1). Pick $m \ge 1$, $n \ge 1$, $A \in \mathcal{F}_1^m$, and $B \in \mathcal{F}_{m+t}^{m+t+n-1}$. Since $A$ is measurable with respect to $\mathcal{F}_1^m$, there exists a set $A' \subseteq \mathbb{N}_2^m$ such that $A = \{\omega \in \Omega : (X_1(\omega), \ldots, X_m(\omega)) \in A'\}$. In the same way, there exists $B' \subseteq \mathbb{N}_2^n$ with the property that $B = \{\omega \in \Omega : (X_{m+t}(\omega), \ldots, X_{m+t+n-1}(\omega)) \in B'\}$.
where, for $i \ge 1$ and $j \ge 1$, the relevant functions are defined with the convention $p(0) := -1$. As $f$ and $g$ can only take the values $-1$, $0$, and $1$, this formula results in the bound (K.1). Let us move now to (K.2). Simple algebra shows that, for all $i \ge 1$ and $j \ge 1$, $C_{i,j}(t)$ admits an explicit expression in terms of $p$ and its tail; we recall that $Q$ is the tail of the probability distribution $p$. This identity leads to a bound of $\sum_{i\ge 1}\sum_{j\ge 1} |C_{i,j}(t)|$ by three terms $K_1$, $K_2$, and $K_3$, which we address separately.

L Proof of Lemma 4.2
Assume that $p$ is aperiodic. To begin with, we demonstrate that there exists a real sequence $\{\gamma_t\}_{t\ge 0}$ such that $\sum_{t\ge 0} |\gamma_t| < \infty$ and
$$\frac{1}{\sum_{t\ge 0} Q(t) z^t} = \sum_{t\ge 0} \gamma_t z^t \qquad (L.1)$$
for all complex numbers $z$ in the open unit disk, $Q$ being the tail of the waiting time distribution $p$. Consider the function $f$ defined by $f(z) := \sum_{t\ge 0} Q(t) z^t$ for $|z| \le 1$. We have $1 - \sum_{s\ge 1} p(s) z^s = (1 - z) f(z)$ for all $|z| \le 1$. We claim that $f$ has no zeros for $|z| \le 1$.
In fact, $|z| < 1$ entails $|\sum_{s\ge 1} p(s) z^s| < 1$, so that any zeros of $f$ must occur on the circle $|z| = 1$. The point $z = 1$ cannot be a zero since $f(1) = \mu \neq 0$. If $f(e^{i\theta}) = 0$ for some $\theta \in (0, 2\pi)$, then $\sum_{s\ge 1} p(s) e^{is\theta} = 1$, which gives $\cos(s\theta) = 1$ for all $s \ge 1$ such that $p(s) > 0$. But this is impossible because $p$ is aperiodic. Then, the function that maps $z$ to $1/f(z)$ has no singularities in the open unit disk and is continuous in the closed unit disk. As a consequence, for $|z| < 1$ it can be expanded in a power series as $1/f(z) = \sum_{t\ge 0} \gamma_t z^t$ with $\gamma_t$ defined for every $t \ge 0$ by the Cauchy formula $\gamma_t = \frac{1}{2\pi r^t} \int_0^{2\pi} \frac{e^{-it\theta}}{f(re^{i\theta})}\, d\theta$, where $r$ is any number in $(0,1)$. As $1/f$ is bounded on the closed unit disk by continuity, we can invoke the dominated convergence theorem to set $r = 1$ in the last integral:
$$\gamma_t = \frac{1}{2\pi} \int_0^{2\pi} \frac{e^{-it\theta}}{f(e^{i\theta})}\, d\theta. \qquad (L.2)$$
At the same time, since $f(e^{i\theta}) = \sum_{t\ge 0} Q(t) e^{it\theta}$ converges absolutely and has no zeros for real values of $\theta$, Theorem 18.21 of [42] states that $1/f(e^{i\theta}) = \sum_{t=-\infty}^{\infty} \xi_t e^{it\theta}$ for all $\theta \in [0, 2\pi]$ with coefficients that fulfill $\sum_{t=-\infty}^{\infty} |\xi_t| < \infty$. By plugging this expansion into (L.2) we get $\gamma_t = \xi_t$ for every $t \ge 0$, and hence $\sum_{t\ge 0} |\gamma_t| < \infty$.
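The role of aperiodicity in this argument can be seen numerically. Below, $f$ is evaluated on a grid of the unit circle for two hypothetical waiting time distributions with finite support: the aperiodic choice $p(1) = p(2) = 1/2$ keeps $|f|$ bounded away from zero, while the periodic choice $p(2) = 1$ produces a zero at $\theta = \pi$.

```python
import cmath

# f(z) = sum_t Q(t) z^t on the unit circle, for two tails Q of hypothetical
# waiting time distributions with finite support:
#   aperiodic: p(1) = p(2) = 1/2  ->  Q = (1, 1/2)
#   periodic:  p(2) = 1           ->  Q = (1, 1)   (period 2)

def f_on_circle(Q, theta):
    z = cmath.exp(1j * theta)
    return sum(q * z ** t for t, q in enumerate(Q))

grid = [2.0 * cmath.pi * k / 1000 for k in range(1000)]
min_aperiodic = min(abs(f_on_circle([1.0, 0.5], th)) for th in grid)
min_periodic = min(abs(f_on_circle([1.0, 1.0], th)) for th in grid)
print(min_aperiodic, min_periodic)
```

In the aperiodic case the minimum modulus is $1/2$ (attained at $\theta = \pi$), so $1/f$ is well defined on the closed unit disk; in the periodic case $f(e^{i\pi}) = 0$ and the expansion (L.1) breaks down.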
We now observe that, by taking a Cauchy product, formula (L.1) shows that the sequence $\{\gamma_t\}_{t\ge 0}$ solves the problem $\gamma_0 = 1$ and $\gamma_t = -\sum_{s=1}^{t} Q(s)\gamma_{t-s}$ for all $t \ge 1$. Simple manipulations of the renewal equation demonstrate that the same problem is solved by the sequence with entries $\gamma_0 = 1$ and $\gamma_t = \mu(\rho_t - \rho_{t-1})$ for $t \ge 1$. Since this problem has a unique solution, we get $\gamma_t = \mu(\rho_t - \rho_{t-1})$ for $t \ge 1$ and
$$\frac{1}{\sum_{t\ge 0} Q(t) z^t} = 1 + \mu \sum_{t\ge 1} (\rho_t - \rho_{t-1}) z^t$$
for every $|z| < 1$. Taking the derivative with respect to $z$ gives (L.3), which is what we need to prove the equivalence between points (i) and (ii) of the lemma. In fact, if $p$ is aperiodic and $\sum_{s\ge 1} s^2 p(s) < \infty$, then $1/\sum_{t\ge 0} Q(t) z^t = \sum_{t\ge 0} \gamma_t z^t$ with $\sum_{t\ge 0} |\gamma_t| < \infty$ and $\sum_{t\ge 1} t Q(t) = \sum_{s\ge 1} (1/2)(s-1)s\, p(s) < \infty$. This way, since the Cauchy product of absolutely convergent series is absolutely convergent, by expanding the l.h.s. of (L.3) in a power series and by making a comparison with the r.h.s. we realize that $\sum_{t\ge 1} t |\rho_t - \rho_{t-1}| < \infty$. If instead $\sum_{t\ge 1} t |\rho_t - \rho_{t-1}| < \infty$, then $p$ is aperiodic as we have seen in the proof of Proposition 4.1, so that (L.3) holds. It follows that $\sum_{t=1}^{n} t Q(t) \le \mu^3 \sum_{t\ge 1} t |\rho_t - \rho_{t-1}| < \infty$ for all positive $n$, which suffices to prove that $\sum_{s\ge 1} s^2 p(s) < \infty$.
Suppose now that $p$ is aperiodic and that $\sum_{s\ge 1} s^2 p(s) < \infty$. Let us demonstrate that $\sum_{t\ge 1} \alpha_t < \infty$. Proposition 4.1 tells us that $\sum_{t\ge 1} \alpha_t \le 3\mu^2 \sum_{t\ge 1} t |\rho_t - \rho_{t-1}| + 4\mu^2 \sum_{t\ge 1} t^2 |\rho_{t+1} - 2\rho_t + \rho_{t-1}|$. We already know that $\sum_{t\ge 1} t |\rho_t - \rho_{t-1}| < \infty$, and it remains to verify that $\sum_{t\ge 1} t^2 |\rho_{t+1} - 2\rho_t + \rho_{t-1}| < \infty$. By taking the derivative of (L.3) with respect to $z$ we get (L.4). A shift of the index of the last series yields (L.5). By subtracting (L.5) from (L.4) we reach (L.6). As before, using $1/\sum_{t\ge 0} Q(t) z^t = \sum_{t\ge 0} \gamma_t z^t$ and bearing in mind that the Cauchy product of absolutely convergent series is absolutely convergent, by expanding the l.h.s. of (L.6) in a power series and by making a comparison with the r.h.s. we find $\sum_{t\ge 1} t^2 |\rho_{t-1} - 2\rho_t + \rho_{t+1}| < \infty$.
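The identity $\gamma_t = \mu(\rho_t - \rho_{t-1})$ can be checked numerically through the recursion $\gamma_0 = 1$, $\gamma_t = -\sum_{s=1}^{t} Q(s)\gamma_{t-s}$. The sketch below uses a hypothetical waiting time distribution with three atoms and assumes the standard stationary renewal expressions, with $u_t$ the probability of a renewal at lag $t$ given one at lag $0$, $c_t = u_t/\mu$, and $\rho_t = c_t - c_0^2$.

```python
# Verify gamma_t = mu * (rho_t - rho_{t-1}) for a hypothetical aperiodic
# waiting time distribution p(1) = 0.5, p(2) = 0.3, p(3) = 0.2.
p = {1: 0.5, 2: 0.3, 3: 0.2}
mu = sum(s * ps for s, ps in p.items())          # mean waiting time
N = 100

# tail Q(t) = P[S > t]
Q = [1.0]
for t in range(1, N + 1):
    Q.append(Q[-1] - p.get(t, 0.0))

# gamma by the convolution recursion coming from (L.1)
gamma = [1.0]
for t in range(1, N + 1):
    gamma.append(-sum(Q[s] * gamma[t - s] for s in range(1, t + 1)))

# renewal probabilities u_t and autocovariance rho_t = c_t - c_0**2,
# assuming the standard stationary expression c_t = u_t / mu
u = [1.0]
for t in range(1, N + 1):
    u.append(sum(p.get(s, 0.0) * u[t - s] for s in range(1, t + 1)))
c = [ut / mu for ut in u]
rho = [ct - c[0] ** 2 for ct in c]

err = max(abs(gamma[t] - mu * (rho[t] - rho[t - 1])) for t in range(1, N + 1))
print(err)                                       # agreement up to roundoff
```

Both sides agree up to floating-point roundoff, which reflects the generating-function identity $1/f(z) = (1-z)\sum_{t\ge0} u_t z^t$ behind the argument above.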