A Large Deviation Perspective on Ratio Observables in Reset Processes: Robustness of Rate Functions

We study large deviations of a ratio observable in discrete-time reset processes. The ratio takes the form of a current divided by the number of reset steps and as such it is not extensive in time. A large deviation rate function can be derived for this observable via contraction from the joint probability density function of current and number of reset steps. The ratio rate function is differentiable and we argue that its qualitative shape is ‘robust’, i.e. it is generic for reset processes regardless of whether they have short- or long-range correlations. We discuss similarities and differences with the rate function of the efficiency in stochastic thermodynamics.


Introduction
Stochastic reset processes have the property of being re-initialised at random times to a specific initial condition, which can be a particular probability distribution, or a fixed state. Although their natural framework is the mathematical language of renewal theory, see [17,22], in the last decade reset processes have also been widely studied in the statistical physics community. They have been used to model the population dynamics after catastrophic events [6,13,33], the dynamics of queues [13,20], the path of a transport protein moving on a cytoskeleton filament [42,43], and also the foraging of animals in nature [2]. Clearly, different observables are of interest in various real world scenarios, for instance, mean first passage times in search strategies for animal foraging [16], or additive functionals of time (e.g. position or current) in cellular transport proteins. Reset applications are not only restricted to classical environments, but can also be extended to quantum mechanical systems [18,44,52].
Communicated by Abhishek Dhar.
In this paper, we focus attention on a different class of observables: ratios between additive functionals of time. In finance, for example, one can calculate the Sharpe ratio, which gives a good estimate of the excess expected return of an investment given its volatility [35]. In stochastic thermodynamics, physicists have recently studied thermodynamic/kinetic uncertainty relations, which give bounds for a type of ratio observable [1,14,53]. In the same field, there are studies of fluctuations of the efficiency, defined as the ratio between the output work and the input heat, of small-scale engines working in an energetically unstable environment [19,48,49,57–59]. The latter is relevant in biology, where understanding the efficiency of molecular motors [31], e.g. myosin heads moving on actin filaments [8,32], is important in medical applications. Ratios also appear in probability theory, for example representing maximum likelihood estimators for Ornstein-Uhlenbeck processes with and without shift [3,4].
Of particular significance for the present work, it is argued in stochastic thermodynamics [57,58] that the fluctuating efficiency can be described by a universal large deviation rate function shape, characterised by having a maximum (as well as the usual minimum) and tails tending to a horizontal asymptote. These intriguing features have attracted considerable recent attention with physical explanations proposed for their appearance [19,57,58]. Understanding both typical efficiency values, attained in the long-time limit, and fluctuations, arising in finite time, is important for predicting the performance of nano-motors, which can now be realised experimentally [39]. Beyond this practical example, uncovering the general features of ratio quantities contributes to building the theoretical framework of non-equilibrium statistical mechanics where dynamical fluctuations, studied by means of large deviation theory, play a crucial role. In this spirit, we present here a rigorous analysis of ratio observables associated to a particular class of stochastic processes: although such ratios are not true efficiencies, they share many features, e.g. the tail shape, and thus help to elucidate the underlying mathematical structure.
We now outline the concrete details of our approach. In this paper we study the ratio of the integrated current and the number of resets in a discrete-time reset process, aiming to understand its probability measure in the exponential scaling limit by means of large deviation theory. We prove that a large deviation principle is valid for this quantity by means of a contraction principle allowing us to transfer, under a continuous mapping, the large deviation principle that holds for the joint observable (current, number of reset steps) to the ratio observable. We then investigate the form of the obtained large deviation rate function in several situations, and we notice in all cases that, although tails are always bounded from above by a horizontal asymptote, the characteristic fluctuating efficiency maximum [19,57,58] is not present. Indeed, while the asymptotic shape is a notable property of ratio observables, which often present heavy tails and lack of exponential tightness in their distributions, the maximum can be thought of as a geometric consequence of having both positive and negative fluctuations in the denominator. The main result we find relates to the 'robustness' of the large deviation ratio rate function. We argue that the qualitative shape of the rate function is generic for reset processes whether they have short- or long-range correlations. In particular, we show that the rate function is differentiable. In contrast, we prove (calculation in the appendix) that when the reset nature of the process is lost, i.e. the numerator of the ratio observable is independent of the denominator, a 'mode-switching' phase transition in the fluctuations of the ratio appears, and the rate function is not differentiable.

Model Framework
The reset process we consider has the property of being returned at random times to a certain initial condition represented by a 'fixed' internal state. For our purposes, it suffices to think of a discrete-time random walk with hopping probabilities that depend on the time since reset. Here, the reset can be thought of as restarting a clock variable which controls the dynamics.
In describing our models we find it useful to split the reset process into two layers. The bottom layer is a discrete-time stochastic process $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$ composed of $n$ Bernoulli random variables of parameter $r$, i.e. with probability $r$, $X_i = 1$ (corresponding to a reset at the $i$-th time step), otherwise $X_i = 0$ (no reset). The top layer is a discrete-time (but continuous-space) random walk $\mathbf{Y}_n = (Y_1, Y_2, \ldots, Y_n)$, taking a jump, $Y_i$, at the $i$-th time step according to a certain probability density function depending in general on the time since the last reset. For definiteness, we think of periodic boundary conditions since we are chiefly interested in the net movement of the random walker rather than its position. We refer to the bottom layer as the on-off process, and to the top layer as the random walk. The reset nature of the process arises from the restarting of the internal clock (happening when $X_i = 1$), re-initialising the dynamical rules for the movement of the random walk in the top layer.
In this framework the observables we study are: the empirical number of reset steps, the empirical current, and their ratio. They read respectively
$$N_n = \sum_{i=1}^{n} X_i, \qquad J_n = \sum_{i=1}^{n} Y_i, \qquad \Omega_n = \frac{J_n}{N_n}.$$
We focus on the long-time behaviour of these observables, with the aim of studying the exponential scaling of the ratio probability density function. The intensive (rescaled) observables are: $N_n/n := \eta \in D = [0, 1]$, $J_n/n := j \in \mathbb{R}$, and $\Omega_n := \omega \in \mathbb{R}$. Note that $N_n$, the denominator of $\Omega_n$, can take only non-negative values, 0 included. The possible divergence in the ratio will be important later when considering the validity of the so-called contraction principle. The reset character arises from the correlations between $N_n$ and $J_n$, which come from two sources. Firstly, we typically enforce $Y_i = 0$ when $X_i = 1$ (corresponding to freezing of the current during reset in the spirit of [25]). Secondly, we allow the possibility that the distribution of $Y_i$ when $X_i = 0$ depends on the time elapsed since the last reset (i.e. the internal clock time). It is the presence of these correlations that makes our study of reset processes a difficult, and interesting, task. To gain some initial intuition and to demonstrate the mathematical techniques, we first introduce two minimal models where correlations are minimised. Later we will consider models with both types of correlations discussed above, as well as those where the on-off process is itself correlated. The first minimal model, called $M_1$, does not present any of these correlations, i.e. it is characterised by having completely uncorrelated layers. To be more specific, regardless of what happens in the bottom layer, the random walk in the top layer takes a jump at time step $i$ according to a Gaussian distribution of mean $\mu$ and variance $\sigma^2 = 2$. The second minimal model, called $M_2$, presents the first kind of correlations introduced above; it is a type of 'lazy' random walk as in [25]. In contrast to $M_1$, now the top layer is coupled with the bottom one: the random walk takes a jump at the $i$-th time step only if a reset does not happen in the other layer ($X_i = 0$), according to the Gaussian probability density function introduced above.
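The two minimal models are easy to simulate, which can help build intuition for the three observables. The following is a minimal sketch, not the authors' code: the function name `simulate`, the seed handling and the parameter values are illustrative assumptions. It draws one trajectory of the on-off layer and of the random walk and returns $N_n$ and $J_n$, from which the ratio $\Omega_n = J_n/N_n$ follows.

```python
import random

def simulate(n, r, mu, model, sigma2=2.0, seed=0):
    """Return (N_n, J_n) for one trajectory of n steps.

    model "M1": the walker jumps at every step (uncorrelated layers).
    model "M2": the walker jumps only when X_i = 0 (no jump on reset steps).
    """
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    n_resets, current = 0, 0.0
    for _ in range(n):
        x = 1 if rng.random() < r else 0      # bottom layer: Bernoulli(r)
        n_resets += x
        if model == "M1" or x == 0:           # top layer: Gaussian jump
            current += rng.gauss(mu, sigma)
    return n_resets, current

# Illustrative parameters (assumed, not taken from the paper's figures).
n, r, mu = 100_000, 0.3, 1.0
for model in ("M1", "M2"):
    N, J = simulate(n, r, mu, model)
    omega = J / N if N > 0 else float("inf")  # the ratio diverges if N_n = 0
    print(model, N / n, J / n, omega)
```

By the law of large numbers the printed ratio should be close to $\mu/r$ for $M_1$ and to $(1-r)\mu/r$ for $M_2$ when $n$ is large; the guard on `N > 0` is a reminder that the ratio is undefined on trajectories with no reset at all.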

Large Deviation Principles for the Empirical Means
A large deviation principle (LDP) holds for a particular (continuous) observable $A_n$ associated to a stochastic process if
$$\lim_{n\to\infty} -\frac{1}{n} \ln \mathbb{P}\left(\frac{A_n}{n} \in [a, a + da]\right) = I(a),$$
where $I \in [0, \infty)$ is the so-called large deviation rate function. In our convention $\mathbb{P}$ is a probability measure, whereas $P$ is a probability density function, i.e. $\mathbb{P}(A_n/n \in [a, a + da]) = P(a)\,da$. With a little abuse of notation, the shorthand $\mathbb{P}(A_n/n = a) := \mathbb{P}(A_n/n \in [a, a + da])$ is used for both continuous and discrete random variables throughout the paper. The dominant exponential behaviour of the probability measure corresponds to $\mathbb{P}(A_n/n = a) = e^{-n I(a) + o(n)}$ for large $n$. Usually, in large deviation applications, this is written as
$$\mathbb{P}\left(\frac{A_n}{n} = a\right) \asymp e^{-n I(a)},$$
where $\asymp$ represents asymptotic identity at logarithmic scale. The rate function $I$ is continuous and its zeros represent typical values attained by $A_n/n$ in the thermodynamic limit $n \to \infty$, while its tails characterize how likely fluctuations are to appear. One straightforwardly has LDPs for the time-additive observables $N_n/n$ and $J_n/n$ such that we can write the large deviation forms $P(\eta) \asymp e^{-n I(\eta)}$ and $P(j) \asymp e^{-n I(j)}$. The traditional way to prove an LDP for a general observable $A_n$ is to apply the Gärtner-Ellis theorem, which makes use of a Legendre-Fenchel transform,
$$I(a) = \sup_{k} \left\{ k a - \lambda(k) \right\}, \tag{6}$$
in order to calculate the rate function from the scaled cumulant generating function (SCGF), defined as
$$\lambda(k) = \lim_{n\to\infty} \frac{1}{n} \ln \mathbb{E}\left[e^{k A_n}\right].$$
Note that this theorem requires that the SCGF exists and is differentiable. For the models $M_1$ and $M_2$, introduced in Sect. 2.1, the SCGFs associated to $N_n/n$ and $J_n/n$ can be calculated straightforwardly. For the on-off process they are
$$\lambda_{M_1}(l) = \lambda_{M_2}(l) = \ln\left(r e^{l} + 1 - r\right),$$
whereas for the current process we have
$$\lambda_{M_1}(k) = \mu k + k^2$$
and
$$\lambda_{M_2}(k) = \ln\left(r + (1 - r)\, e^{\mu k + k^2}\right).$$
Notice that $\lambda_{M_1}(k)$ is the SCGF associated to a random walk with no resets, and it often appears in the text. Throughout the manuscript we consistently use $l$ for the conjugate variable to $N_n/n$ and $k$ for the conjugate variable to $J_n/n$ to indicate implicitly the corresponding random variable without complicating the notation. [A similar convention applies for the arguments of rate functions.] All the functions introduced above are differentiable in the interior of their domains, thus in principle one can calculate the corresponding rate functions via the Gärtner-Ellis theorem (6). Analytically we can show for $M_1$ that
$$I_{M_1}(\eta) = \eta \ln\frac{\eta}{r} + (1 - \eta) \ln\frac{1 - \eta}{1 - r}, \qquad I_{M_1}(j) = \frac{(j - \mu)^2}{4},$$
whereas $I_{M_2}(j)$ can only be calculated numerically.
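As an illustration of how a rate function can be obtained numerically from an SCGF, the sketch below evaluates the Legendre-Fenchel transform $\sup_k \{kj - \lambda(k)\}$ for the current of a lazy Gaussian random walk (jump of mean $\mu$ and variance 2 with probability $1-r$, no jump with probability $r$). The SCGF expressions follow from the Bernoulli and Gaussian generating functions; the ternary search, the search interval $[-20, 20]$ and the function names are assumptions made for this example, not a method taken from the paper.

```python
import math

def scgf_m1(k, mu):
    """SCGF of a Gaussian random walk with mean mu and variance 2 (no resets)."""
    return mu * k + k * k

def scgf_m2(k, mu, r):
    """SCGF of the lazy walk: no jump with probability r, Gaussian jump otherwise."""
    return math.log(r + (1 - r) * math.exp(mu * k + k * k))

def legendre(scgf, j, lo=-20.0, hi=20.0, iters=200):
    """sup_k [k*j - scgf(k)] by ternary search; valid because the objective
    is concave in k and the maximiser is assumed to lie inside [lo, hi]."""
    f = lambda k: k * j - scgf(k)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f(0.5 * (lo + hi))

mu, r = 1.0, 0.3
for j in (-1.0, 0.0, (1 - r) * mu, 2.0):
    print(j, legendre(lambda k: scgf_m2(k, mu, r), j))
```

For the walk without resets the same routine can be compared against the closed form $(j-\mu)^2/4$, which is a convenient sanity check before trusting the numerical curve for the lazy walk.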
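Making use of the Gärtner-Ellis theorem once again, it is also possible to show that an LDP holds for the joint probability density function $P(\eta, j)$. In order to do so, we need to find the SCGF $\lambda(l, k)$. For the model $M_1$, since $\mathbf{Y}_n$ and $\mathbf{X}_n$ are independent processes,
$$\lambda_{M_1}(l, k) = \lambda_{M_1}(l) + \lambda_{M_1}(k).$$
However, for $M_2$ more care is needed. We calculate the moment generating function $G_{M_2}(l, k, n)$ directly using the definition of $G$ and conditioning the process $\mathbf{Y}_n$ on $\mathbf{X}_n$,
$$G_{M_2}(l, k, n) = \mathbb{E}\left[e^{l N_n + k J_n}\right] = \sum_{\mathbf{x}_n} \int_{\mathbf{y}_n \in \mathbb{R}^n} d\mathbf{y}_n \, P\big(\mathbf{Y}_n = (y_1, \ldots, y_n) \,\big|\, \mathbf{X}_n = (x_1, \ldots, x_n)\big) \, P\big(\mathbf{X}_n = (x_1, \ldots, x_n)\big) \, e^{l \sum_i x_i + k \sum_i y_i}.$$
First we exploit the independence in both processes and then the fact that the $X_i$ are identically distributed:
$$G_{M_2}(l, k, n) = \prod_{i=1}^{n} \sum_{x_i \in \{0,1\}} P(X_i = x_i)\, e^{l x_i} \int_{\mathbb{R}} dy_i \, P(Y_i = y_i \,|\, X_i = x_i)\, e^{k y_i} = \left(r e^{l} + (1 - r)\, e^{\mu k + k^2}\right)^{n}.$$
The rescaled limit of the logarithmic moment generating function is
$$\lambda_{M_2}(l, k) = \lim_{n\to\infty} \frac{1}{n} \ln G_{M_2}(l, k, n) = \ln\left(r e^{l} + (1 - r)\, e^{\mu k + k^2}\right).$$
Hence, both $\lambda_{M_1}(l, k)$ and $\lambda_{M_2}(l, k)$ exist and are differentiable in the interior of their domains $D \times \mathbb{R}$. This is sufficient to state that LDPs hold for the joint probability density functions, with rate functions given by
$$I(\eta, j) = \sup_{l, k} \left\{ l \eta + k j - \lambda(l, k) \right\}. \tag{15}$$
In fact, for $M_1$ it suffices to recall that $\mathbf{Y}_n$ and $\mathbf{X}_n$ are independent of each other, and this implies that
$$I_{M_1}(\eta, j) = I_{M_1}(\eta) + I_{M_1}(j).$$
In contrast, for $M_2$ correlations between the top and the bottom layers do not allow us to proceed analytically and $I_{M_2}(\eta, j)$ can only be calculated numerically, either parametrically, exploiting Legendre duality $I_{M_2}(\partial_l \lambda, \partial_k \lambda) = l\, \partial_l \lambda + k\, \partial_k \lambda - \lambda(l, k)$ with $\lambda = \lambda_{M_2}$, or by direct implementation of the Legendre-Fenchel transform (15).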
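The parametric route through Legendre duality can be sketched in a few lines. The example assumes the joint SCGF of the lazy walk, $\lambda(l,k) = \ln\big(r e^{l} + (1-r) e^{\mu k + k^2}\big)$, which follows from the i.i.d. structure of the steps; the function names and the sampled values of $(l, k)$ are illustrative. Each pair $(l, k)$ is mapped to a point $(\eta, j) = (\partial_l \lambda, \partial_k \lambda)$ together with the value of the joint rate function there.

```python
import math

def joint_scgf_m2(l, k, mu, r):
    """Joint SCGF of (N_n, J_n) for the lazy Gaussian walk."""
    return math.log(r * math.exp(l) + (1 - r) * math.exp(mu * k + k * k))

def parametric_point(l, k, mu, r):
    """Map (l, k) to (eta, j, I(eta, j)) using Legendre duality."""
    b = r * math.exp(l)                       # weight of a reset step
    a = (1 - r) * math.exp(mu * k + k * k)    # weight of a jump step
    eta = b / (a + b)                         # d lambda / d l
    j = a * (mu + 2 * k) / (a + b)            # d lambda / d k
    rate = l * eta + k * j - joint_scgf_m2(l, k, mu, r)
    return eta, j, rate

mu, r = 1.0, 0.3
for l, k in ((0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-1.0, -0.5)):
    print((l, k), parametric_point(l, k, mu, r))
```

Scanning a grid of $(l, k)$ values in this way traces the surface of the joint rate function without any optimisation; the point $l = k = 0$ lands on the typical state $(\eta, j) = (r, (1-r)\mu)$, where the rate function vanishes.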

Large Deviation Principle for the Ratio
We now turn to the main topic of the paper, showing that an LDP holds also for the extensive observable $n\Omega_n$ in the form
$$P(\omega) \asymp e^{-n I(\omega)}.$$
This follows by the contraction principle [11,56] from the LDP for the joint probability density function. The contraction principle is a powerful general technique which shows that an LDP is conserved under any continuous mapping, and here makes evident how the LDP for the ratio derives as a restriction on the bigger state space of $(N_n, J_n)$. More concretely, the contraction principle emerges as a saddle-point approximation on the LDP for the probability density function $P(\eta, j)$. As a consequence an LDP holds also for the ratio with rate function
$$I(\omega) = \inf_{\eta \in (0, 1]} I(\eta, \eta\omega). \tag{17}$$
A caveat here is that, in fact, our mapping $\omega = j/\eta$ is continuous everywhere except at $\eta = 0$; this has important consequences to which we will return later. For the rate functions $I_{M_1}(\omega)$ and $I_{M_2}(\omega)$ the 'infimum' in equation (17) involves transcendental equations, but we report in Fig. 1 the rate functions calculated numerically. Note, as expected, that: (i) there is a unique zero representing the typical value taken by the ratio observable in the long-time (thermodynamic) limit $n \to \infty$, which is easily calculated as $\hat{\omega}_{M_1} = \mu/r$ for the case of completely uncorrelated layers and $\hat{\omega}_{M_2} = (1 - r)\mu/r$ for the correlated model, where the random walk in the top layer only hops on average for a fraction $1 - r$ of steps; (ii) the fluctuations, represented by the non-zero values of the rate functions, are obviously not symmetric for $\mu \neq 0$; and (iii) the curves look smooth. [This last point is investigated in detail in Sect. 3.]
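The contraction can be carried out numerically by a direct scan over $\eta$. The sketch below does this for the model with uncorrelated layers, assuming the joint rate function is the sum of the Bernoulli rate function of the reset fraction and the Gaussian rate function $(j-\mu)^2/4$ of the current. The logarithmically spaced grid, its lower cut-off `eta_min` and the function names are choices made for this example; a log-spaced grid is used because the tails of the ratio are realised at very small $\eta$.

```python
import math

def rate_bernoulli(eta, r):
    """Rate function of the reset fraction, with the convention 0 ln 0 = 0."""
    out = 0.0
    if eta > 0:
        out += eta * math.log(eta / r)
    if eta < 1:
        out += (1 - eta) * math.log((1 - eta) / (1 - r))
    return out

def ratio_rate_m1(omega, mu, r, grid=20000, eta_min=1e-12):
    """Return (I(omega), minimising eta) by scanning eta in [eta_min, 1]."""
    best, best_eta = float("inf"), None
    for i in range(grid + 1):
        eta = eta_min ** (1 - i / grid)       # log-spaced from eta_min up to 1
        val = rate_bernoulli(eta, r) + (eta * omega - mu) ** 2 / 4
        if val < best:
            best, best_eta = val, eta
    return best, best_eta

mu, r = 1.0, 0.25
for omega in (-1000.0, 0.0, mu / r, 1000.0):
    print(omega, ratio_rate_m1(omega, mu, r))
```

Recording the minimising $\eta$ alongside the value of the rate function is deliberate: the locus of minimisers is what one inspects when asking whether the contracted rate function can develop a kink. For large $|\omega|$ the minimiser drifts towards the cut-off, which is the numerical signature of the discontinuity of the map $\omega = j/\eta$ at $\eta = 0$.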
A more unusual feature of the ratio rate functions is the presence of horizontal asymptotes and the associated non-convexity. 4 In fact, the horizontal asymptotes correspond to the limit η → 0 + , where the mapping is not continuous. Here rare events are realised by heavy tails which mask the linear exponential scaling of the ratio observable. Furthermore, we can straightforwardly obtain the position of the asymptotes through the argument that large deviations are realised in the least unlikely amongst all the unlikely ways [12]. In the case μ > 0 such analysis shows that values of the rate function for ω → ∞ are given by the probability of a typical current J n =ĵ and N n → 0 + , rather than J n → ∞ and N n finite and non-zero. Similarly, in the case we look at ω → −∞, the rate function is given by the probability of J n → 0 − and N n → 0 + , rather than requiring J n → −∞. Hence, the asymptotes read (in the case μ > 0) Evidently I (ω → −∞) > I (ω → +∞), corresponding to asymmetric fluctuations for μ > 0. A reflected argument holds for μ < 0, whereas for μ = 0, due to the symmetry of the random walk, asymptotes are equivalent and fluctuations are symmetric. Note also that the non-convexity of the rate function is a feature that could not have been obtained by means of the Gärtner-Ellis theorem and Legendre-Fenchel transformation.
In closing this subsection, we remark that similar features have been observed in other ratio observables in the field of stochastic thermodynamics. In the work of Verley et al. [57], and following papers [19,48,49,58], the object of study is the ratio between the output work produced and the input heat absorbed in different systems representing nano-machines operating in the presence of highly fluctuating energy fluxes. In that case too the tails of the ratio rate function were found to tend to an asymptotic value. Indeed, we can understand this asymptotic behaviour in the rate function as a universal property of ratio observables when the denominator can, in principle, approach 0. For instance, one can simply think of the ratio of two arbitrary Gaussian distributions [26,27,37,38], which can be proven to be always composed of a mixture of two terms: a Cauchy unimodal distribution, and a bimodal distribution with heavy tails. Note that in this example, and the thermodynamic efficiency associated with nano-engines, the denominator in the ratio can have both positive and negative fluctuations which generically lead to the presence of a maximum in the rate function. In particular, this maximum marks a transition between a phase where fluctuations are generated by atypical values in the numerator, and a phase where they are generated by atypical values in the denominator. In contrast, in our case, the denominator can have only positive fluctuations so no such maximum appears.

Results
In this section we apply the previously-introduced methods to analyse how the ratio observable behaves in some stochastic reset models.

Robustness and Differentiability of Ratio Rate Functions
So far we have seen that in toy models, e.g. M 1 and M 2 , the ratio rate function appears smooth (everywhere differentiable). We are interested in understanding whether this is always the case, even for genuine reset models with correlations.
It is known that, at equilibrium, non-differentiable points in rate functions are connected to phase transitions. For instance, the microcanonical entropy of non-additive systems with long-range interactions, under mean-field-like approximations, can present non-differentiable points [5,7,23,29], signalling microcanonical first-order transitions. Furthermore, the appearance of cusps in SCGFs or in large deviation rate functions is also an important topic in non-equilibrium physics and can be understood as the appearance of dynamical phase transitions in the fluctuations. As one recent example, a non-differentiable point in the Freidlin-Wentzell (low-noise limit) large deviation rate function for the current of a driven diffusing particle on a ring has been identified [40,46,50]. In a similar spirit, in the next sub-sections, we will examine the smoothness of the ratio rate function for stochastic reset processes with dynamical phase transitions in the fluctuations of observables $J_n$ and $N_n$.
In order to check if a ratio rate function I (ω) is differentiable we seek to find necessary and sufficient conditions for the appearance of non-differentiable points. So far, many results are known regarding the regularity of functions coming from a variational calculation such as that of equation (17). In [9], and later in [10,28], sufficient conditions have been found such that 5 The conditions to meet are: first, η has to be minimized over a set that does not depend on ω, and second, related to the implicit function theorem, the minimizer function η(ω), satisfying equation (17), has to be continuous and bijective. In our case, the first condition always holds so negation of the second one, meaning that the solution set of minimizers η (for a particular ω) is not singleton, is necessary for the function I (ω) to present jumps in its first derivative. We believe that for well-behaved I (η, j) this necessary condition becomes also sufficient; one exception is the case where I (η, 0) itself has a non-singleton set of minima.
In practice, this means that determining whether I (ω) is differentiable boils down to analysing If for a particular ω * this equation is verified for more than a single value of η ∈ (0, 1], then a non-differentiable point should appear at I (ω * ). We will see in Appendix 2 an example where this can be checked analytically. However, often an analytical form ofĨ (η, ω) is not available and in such cases we conduct a numerical analysis, discretizing the domain of ω and plotting the locus of minimizing points (η, ηω) of equation (17) on the joint rate function I (η, j). In general, if the set of minimizers η is not always singleton, the locus (η, ηω) on I (η, j) presents a linear section of values η satisfying the minimization condition, and . Such a feature is seen in the analytical example of Appendix 2 and is related to the appearance of a 'mode-switching' dynamical phase transition in the generation of fluctuations; it is useful to compare Fig. 7b showing the phase transition with Figs. 2b and 5b of the main text where no such transition is present.
As well as smoothness of the ratio rate function we are interested in whether it retains its general shape under the addition of interactions in genuine reset processes. Loosely speaking, we will use the term 'robust' for the case where the qualitative shape retains the salient features discussed in Sect. 2.3. This is reminiscent of the concept of 'universality' in other areas of statistical physics.

Finite-Time Correlations
Our analysis begins with the model of [25]. In this model, the combination of reset and finite-time correlations in the random walk generates dynamical phase transitions (DPTs) in the observable $J_n$. These are distinguished by the analyticity of the SCGF: for first-order DPTs the SCGF is not differentiable, whereas for continuous DPTs it is. Here, DPTs are interpreted as transitions between fluctuations that involve resets and fluctuations that do not. The Legendre-Fenchel transform can be applied to the differentiable branches of the SCGF, and, as a consequence, the rate function so obtained will present a gap for the set of values which the derivative of the SCGF does not take. It is customary in statistical mechanics to extend the region over which the Legendre-Fenchel transform is defined by a Maxwell construction, i.e. by drawing a linear section connecting the two branches of the rate function. In general, the function so derived is the convex hull of the true rate function, but for finite-time correlations, because of subadditivity, it is known to be exactly the true rate function. In [25] the linear sections appearing in the current rate functions correspond to mixed regimes where typical trajectories switch between periods with resets and periods with no resets. In the following, we want to understand how these DPTs influence the ratio observable $\Omega_n$.

Model M a
As in M 1 and M 2 the bottom layer is a discrete-time stochastic process X n = (X 1 , X 2 , ..., X n ) composed of n Bernoulli random variables of parameter r , and the top layer is a discrete-time (but continuous-space) random walk Y n = (Y 1 , Y 2 , ..., Y n ). At time step l after a reset, the random walk takes a jump according to an l-dependent Gaussian density function with mean μ and variance σ 2 = 2(1 − B/(l + d)), a function of parameters d and 0 < B ≤ d + 1. If time goes on, and reset events do not happen, the variance of the Gaussian distribution increases monotonically towards the asymptote σ 2 = 2 as l → ∞. In this model we focus on the long-time behaviour of the observables introduced in Sect. 2.1: N n , J n , and the ratio Ω n . As J n is correlated with N n , this leads naturally to short range correlations, and although DPTs are not present in N n , they are in J n .

Joint Scaled Cumulant Generating Function
In order to derive the function I M a (ω), characterising the exponential scaling of Ω n , we first need to derive the joint rate function I M a (η, j), which can be done by applying the standard procedure of Sect. 2 for LDPs of empirical means. It is difficult to calculate the SCGF λ M a (l, k) directly from the definition lim n→∞ (1/n) ln E e l N n +k J n , as there is no factorization property. For this reason we rely on a different technique originally applied to study the free-energy of long linear chain molecules [34], such as DNA [47], and more recently used in the context of continuous-time [43] and discrete-time reset processes [25]. The strategy is to rewrite the moment generating function G r M a (l, k, n) as a convolution of independent 'microscopic' contributions and then to take its discrete Laplace transform (or z-transform)G r M a (l, k, z). This method is tantamount to working in the grand-canonical ensemble in time, where z represents the fugacity, and allows us to relax the constraint we would have on summing over paths of a certain length when calculating the moment generating function directly. The SCGF λ M a (l, k) is then obtained as the natural logarithm of z * (l, k), the convergence boundary point (loosely the radius of convergence) ofG r M a (l, k, z). In our set-up the microscopic moment generating function for a sequence of n−1 non-reset steps (along with n − 1 random walk jumps) followed by one reset step is 6 where n ≥ 1 and H n = n k=1 1/k is the truncated harmonic series. Note that we exclude microscopic sequences of zero length by enforcing W (l, k, 0) = 0. The convolution of the microscopic moment generating functions returns the generating function of the whole process. Notice that this procedure assumes that the process always finishes with a reset step; this assumption is expected to make no difference in the infinite-time limit, at least in the case of finite moments in the distribution of inter-reset times.
We can now calculate the z-transformG r M a (l, k, z). First we distribute the n factors of z −n among the microscopic sequences and then we change the order of summation over n and s as follows: The {i σ } is interpreted as a sum over all possible configurations of s sequences of length {i 1 , i 2 , ..., i s } with the constraint that σ i σ = n. This restriction can be dropped when first summing over all possible values of s, allowing us to carry out each sum over i σ independently: In the last step a geometric series appears andW (l, k, z):= ∞ n=1 W (l, k, n)z −n denotes the z-transform of W (l, k, n). Notice again that our choice of indexing excludes zero-length trajectories; this affects only the 'boundary' term in the numerator. We remark thatW (l, k, z) 6 One can also define two different microscopic generating functions for sequences characterised by only non-reset steps and only reset steps. As in [25], this approach is particularly useful when the probability of reset does not depend on the time elapsed since last reset. However, our generating function (20), built on the more general framework of renewal theory, allows the consideration of different scenarios in the on-off process [53]. As mentioned, the SCGF λ M a (l, k) corresponds to the natural logarithm of the largest real value z * (l, k) at whichG r M a (l, k, z) diverges. This can be identified withẑ, the largest real solution of the equationW (l, k, z) = 1, whenẑ ≥ z c ; otherwise we have directly z * = z c . The change from convergent to divergentW (l, k, z) corresponds to a phase transition in the reset process, between a regime where current fluctuations are optimally realised in the presence of reset events, and a regime where current fluctuations are realised by trajectories with no reset events at all [25,51]. 
As discussed in the previous section, if the obtained SCGF $\lambda_{M_a}(l, k)$ is differentiable, the joint large deviation rate function $I_{M_a}(\eta, j)$ can be derived by Legendre-Fenchel transform (15).
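The z-transform strategy can be checked on a case where the answer is known in closed form. The sketch below is not the model $M_a$: it assumes the simpler lazy walk with geometric inter-reset times, for which a sequence of $n-1$ non-reset steps followed by one reset step has microscopic generating function $W(l,k,n) = r e^{l} \big[(1-r) e^{\mu k + k^2}\big]^{n-1}$. The truncation length, the bisection bracket and the function names are illustrative assumptions.

```python
import math

def w_tilde(z, l, k, mu, r, terms=5000):
    """Truncated z-transform sum_{n=1}^{terms} W(l,k,n) z^{-n}."""
    a = (1 - r) * math.exp(mu * k + k * k)    # weight of one non-reset step
    term = r * math.exp(l) / z                # n = 1: a single reset step
    total = 0.0
    for _ in range(terms):
        total += term
        term *= a / z                         # one more non-reset step
    return total

def scgf_from_z(l, k, mu, r, iters=100):
    """ln of the largest real solution of w_tilde(z) = 1, found by bisection.
    Assumes the root lies above the convergence boundary z_c = a."""
    a = (1 - r) * math.exp(mu * k + k * k)
    lo = a * (1 + 1e-6)                       # just above z_c, where w_tilde > 1
    hi = a + r * math.exp(l) + 10.0           # far out, where w_tilde < 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if w_tilde(mid, l, k, mu, r) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.log(0.5 * (lo + hi))

mu, r = 1.0, 0.3
for l, k in ((0.0, 0.0), (0.2, 0.5), (-0.5, -0.3)):
    exact = math.log(r * math.exp(l) + (1 - r) * math.exp(mu * k + k * k))
    print((l, k), scgf_from_z(l, k, mu, r), exact)
```

In this simplified setting the root of $\tilde{W} = 1$ always lies above the convergence boundary, so no transition to a reset-free regime occurs; in models with time-dependent jump statistics the two can cross, and the bracket above would then have to be replaced by a comparison between the root and the boundary.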

Ratio Observable
By the contraction principle (17) on $I_{M_a}(\eta, j)$ we derive $I_{M_a}(\omega)$, focusing on the asymmetric case $\mu \neq 0$. In Fig. 2a we plot the ratio rate functions for a random walk with $\mu = -1$, and varying $B$. Notice that the left tails all coincide; indeed, such fluctuations are obtained taking $\eta \to 0^+$, regardless of the current, which stays in its typical state $\hat{j} = (1 - r)\mu$ for any $\omega \le \hat{\omega}_{M_a} = (1 - r)\mu/r$. For any value of the parameters $\mu$ and $B$, the rate functions $I_{M_a}(\omega)$ in Fig. 2a look robust, i.e. the qualitative features are unchanged under the appearance of DPTs in the fluctuations of the current $J_n$, generated by finite-time correlations.
I M a (ω) also looks differentiable. As argued in Sect. 3.1, this can be investigated looking at the locus of minimizers (η, ηω), satisfying equation (17), on the joint rate function. We note that only the presence of first-order DPTs in the process is translated into the appearance of linear regions in I M a (η, j), and only these could in principle influence the fluctuations of the observable Ω n . On increasing the parameter B these linear regions extend and get closer to the bottom of the joint rate function, where contraction minimizers (η, ηω) lie. However, this does not affect the variational calculation much. We report in Fig. 2b example minimizers  (η, ηω) for the case B = 2.5. It is evident that the locus of minimizing points is a curve which stays close to the minimum of I M a (η, j) where linear sections from first-order DPTs extend only in pathological cases (e.g. r → 1, d → ∞ and B = O(d)).
In order to gain more general understanding about the robustness and differentiability of the ratio rate function when finite-time correlations generate DPTs, we also investigated a rather unphysical model (denoted by M a1 ) characterised by having a first-order DPT in the on-off process in the bottom layer uncoupled from the random walk in the top layer. The joint SCGF in this case is artificially constructed and is Here 0 ≤ b ≤ 1/4 is a parameter which allows us to move the first-order DPT. Calculating analytically the rate function I M a1 (η) we see that for small but finite b the linear section extends close to the minimum without actually reaching it. We see that in this case the ratio rate function I M a1 (ω) is robust and presents a unique typical state. The limiting case b = 0 is pathological in the sense that I M a1 (η) has a flat section at zero leading to a corresponding flat section in I M a1 (ω). However, even here the ratio rate function is differentiable.

Numerical Checks
In deriving the SCGF as the natural logarithm of the convergence radius $z^*(l, k)$, we assume that any non-analyticities in pre-factors in the moment generating function $G^r_{M_a}(l, k, n)$ do not affect its exponential behaviour in the limit $n \to \infty$. To show that such pre-factors do not play a role here, we make use of an inverse numerical z-transform of $\tilde{G}^r_{M_a}(l, k, z)$ to check that the rescaled logarithm of the directly calculated moment generating function, $(1/n) \ln G^r_{M_a}(l, k, n)$, approaches $\ln z^*(l, k)$ smoothly in the limit $n \to \infty$.
The inverse z-transform of $\tilde{G}^r_{M_a}(l, k, z)$ is defined as
$$G^r_{M_a}(l, k, n) = \frac{1}{2\pi i} \oint_{C} \tilde{G}^r_{M_a}(l, k, z)\, z^{n-1}\, dz,$$
where $C$ is a counter-clockwise closed contour lying in the region of convergence. However, numerical integration may lead to inaccurate results and hence we make use of two other techniques as explained in [41]: the first method is algebraic, based on truncating the z-transform, the second method relies instead on a discrete Fourier transform. We refer to Appendix 1 for further details on the methods. In Fig. 3 we compare the SCGF $\lambda^r_{M_a}(l, k)$ calculated as the natural logarithm of the convergence radius of $\tilde{G}^r_{M_a}(l, k, z)$ with the rescaled natural logarithm of the approximated moment generating function $G^r(l, k, n)$ obtained using the methods explained above, for cases with and without a first-order DPT. As computation becomes daunting quite fast, we report the comparison only for a subset of the domain of $\lambda_{M_a}(l, k)$. In both cases there is a very good matching between curves, suggesting that pre-factors in the moment generating function $G^r_{M_a}(l, k, n)$ do not influence our study, and the obtained LDPs for the joint observable $(N_n, J_n)$ are correct.
Fig. 3 Comparison of SCGFs obtained as $\ln z^*(l, k)$ and through inverse z-transforms using the algebraic method (Alg), and the inverse Fourier transform (Inv. Fourier); (a) differentiable SCGF, (b) SCGF non-differentiable at $k \approx 1.25$. Curves are not distinguishable by eye
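A discrete-Fourier inversion of a z-transform can be sketched as follows. This is an illustration under simplifying assumptions, not the procedure of [41] or of Appendix 1: it uses the lazy walk with geometric inter-reset times, for which the transform of the moment generating function restricted to trajectories ending in a reset is $1/(1 - \tilde{W}(z))$ with $\tilde{W}(z) = b/(z - a)$, $a = (1-r) e^{\mu k + k^2}$ and $b = r e^{l}$. The contour radius, the number of sample points and the function name `inverse_z` are choices made for this example.

```python
import cmath
import math

def inverse_z(g_tilde, n, radius, n_points=512):
    """Approximate the coefficient of z^{-n} in g_tilde from equally spaced
    samples on the circle |z| = radius (a discretised contour integral)."""
    total = 0j
    for m in range(n_points):
        theta = 2 * math.pi * m / n_points
        z = radius * cmath.exp(1j * theta)
        total += g_tilde(z) * cmath.exp(1j * n * theta)
    return (radius ** n) * (total / n_points).real

# Illustrative parameters (assumed).
mu, r, l, k = 1.0, 0.3, 0.2, 0.5
a = (1 - r) * math.exp(mu * k + k * k)        # weight of a non-reset step
b = r * math.exp(l)                           # weight of a reset step
g_tilde = lambda z: 1.0 / (1.0 - b / (z - a)) # = 1 / (1 - W~(z))

for n in (10, 20, 30):
    g_n = inverse_z(g_tilde, n, radius=2 * (a + b))
    print(n, math.log(g_n) / n, math.log(a + b))
```

The radius must exceed the largest singularity of the transform, here $z^* = a + b$; choosing it too large, however, makes the recovered coefficient a tiny number multiplied by a huge power of the radius, so round-off limits the values of $n$ that can be reached. The printed estimates approach $\ln z^*$ with a correction of order $1/n$ coming from the pre-factor.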

Infinite-Time Correlations
In our models so far we have seen that short-range correlations, although they may generate DPTs in fluctuations of the current or number of resets, do not have much influence on the asymptotic fluctuations of the observable $\Omega_n$, whose rate function is robust and stays differentiable. Now we extend the analysis to long-range correlations. We present here a model where infinite-time correlations appear in the bottom layer, representing the on-off process, and extend to the coupled random walk in the top layer. In Appendix 2 we consider a similar artificial model where we remove the coupling between the two layers, allowing us to carry out some analytical calculations and make an illuminating comparison.

Model $M_b$
In contrast to the models $M_1$ and $M_2$ of Sect. 2.1 and model $M_a$ of Sect. 3.2, here the bottom layer is composed of two discrete-time stochastic processes glued together: $n$ Bernoulli random variables of parameter $r$, $X_n = (X_1, X_2, \dots, X_n)$, and a 'stiff' block of another $n$ variables $X'_n = (X_{n+1}, X_{n+2}, \dots, X_{2n})$, either all 0 or all 1 with equal probability. Note that both blocks are extensive in time. In the top layer a discrete-time, continuous-space random walk composed of $Y_n = (Y_1, Y_2, \dots, Y_n)$ and $Y'_n = (Y_{n+1}, Y_{n+2}, \dots, Y_{2n})$ is coupled with the bottom process. If $X_i = 0$ the random walk takes a jump of non-zero length according to a Gaussian density function of mean $\mu$ and variance $\sigma^2 = 2$; on the other hand, when a reset occurs, $X_i = 1$ and $Y_i = 0$. Besides the label $M_b$, we propose the name two-block reset model for this stochastic reset process. Indeed, the bottom on-off process is similar in spirit to the so-called two-block spin model introduced in [55]; the first block of steps $X_n$ plays the role of a classical paramagnet, whereas the second half $X'_n$ is analogous to a ferromagnet and brings infinite-time correlations both in the bottom layer and, as a consequence of the coupling, in the top one.
If the Bernoulli parameter is $r = 1/2$, our model is directly equivalent to the two-block spin model in [55], where reset steps correspond to up spins and non-reset steps correspond to down spins. In this case, we can obtain the large deviation rate function $I^{1/2}_{M_b}(\eta) = -\lim_{n \to \infty} \frac{1}{2n} \ln P(N_{2n}/(2n) = \eta)$ by following the derivation in [55]. Specifically, we map the energy per spin $u$ into the mean number of reset steps $\eta$ according to $\eta = 1 + u/2$ and, as in [56], reflect and translate the microcanonical entropy $s(u)$ by $(1/2) \ln |\Lambda|$, where $|\Lambda| = 2$ is the state-space cardinality of the Bernoulli random variables. This leads to
$$I^{1/2}_{M_b}(\eta) = \begin{cases} \frac{1}{2}\left[\ln 2 + 2\eta \ln(2\eta) + (1 - 2\eta)\ln(1 - 2\eta)\right], & \eta \in [0, 1/2], \\ \frac{1}{2}\left[\ln 2 + (2\eta - 1)\ln(2\eta - 1) + (2 - 2\eta)\ln(2 - 2\eta)\right], & \eta \in (1/2, 1], \end{cases} \tag{25}$$
which we plot in Fig. 4a along with its convex envelope. Notice that $I^{1/2}_{M_b}(\eta)$ has two minima, $\hat{\eta}_1 = r/2$ and $\hat{\eta}_2 = (1 + r)/2$, corresponding to the boundaries of the flat region of zeros in its convex envelope. Although in the general case $r \neq 1/2$ the microcanonical description breaks down, as microstates are no longer equally likely, it is still possible to calculate the rate function $I_{M_b}(\eta)$ from a probabilistic point of view. The derivation begins by conditioning $P(N_{2n} = 2n\eta)$ on the appearance of a block $(X_{n+1}, X_{n+2}, \dots, X_{2n})$ of either all reset steps or all non-reset steps. This breaks ergodicity: now either $\eta \in [0, 1/2]$ or $\eta \in (1/2, 1]$, and everything boils down to calculating the probability that in the first block $(X_1, X_2, \dots, X_n)$ there are either $n(2\eta - 1)$ or $2n\eta$ reset steps. The number of reset steps follows a binomial distribution; thus, making use of Stirling's approximation, we get
$$I_{M_b}(\eta) = \begin{cases} \eta \ln\frac{2\eta}{r} + \left(\frac{1}{2} - \eta\right)\ln\frac{1 - 2\eta}{1 - r}, & \eta \in [0, 1/2], \\ \left(\eta - \frac{1}{2}\right)\ln\frac{2\eta - 1}{r} + (1 - \eta)\ln\frac{2 - 2\eta}{1 - r}, & \eta \in (1/2, 1]. \end{cases}$$
Obviously, this recovers (25) in the case $r = 1/2$. Furthermore, as expected, $I_{M_b}(\eta)$ is a non-convex function, which is a consequence of long-range correlations in the model. Indeed, similarly to [55], adding an extensive block of steps which are either all reset or all non-reset makes the model a 'switch' between two different phases.
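The binomial-plus-Stirling argument above can be checked numerically. The following is a minimal sketch, assuming the rate function is half the Bernoulli relative entropy of the first-block reset fraction ($2\eta$ when the stiff block has no resets, $2\eta - 1$ when it is all resets). The function names and the parameter values are illustrative choices, not taken from the paper.

```python
import math

def kl(x, p):
    """Relative entropy D(x || p) between Bernoulli(x) and Bernoulli(p), with 0 ln 0 = 0."""
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / p)
    if x < 1.0:
        total += (1.0 - x) * math.log((1.0 - x) / (1.0 - p))
    return total

def rate_eta(eta, r):
    """Assumed non-convex rate function of N_2n/(2n) in the two-block reset model."""
    if eta <= 0.5:
        return 0.5 * kl(2.0 * eta, r)        # stiff block made of non-reset steps
    return 0.5 * kl(2.0 * eta - 1.0, r)      # stiff block made of reset steps

def log_binom_pmf(m, n, r):
    """ln P(Binomial(n, r) = m), computed with log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
            + m * math.log(r) + (n - m) * math.log(1.0 - r))

def finite_n_rate(m_total, n, r):
    """Exact -(1/2n) ln P(N_2n = m_total), conditioning on the stiff block."""
    logs = []
    if 0 <= m_total <= n:                    # stiff block all 0: resets only in block one
        logs.append(math.log(0.5) + log_binom_pmf(m_total, n, r))
    if n <= m_total <= 2 * n:                # stiff block all 1: contributes n resets
        logs.append(math.log(0.5) + log_binom_pmf(m_total - n, n, r))
    top = max(logs)
    log_p = top + math.log(sum(math.exp(v - top) for v in logs))
    return -log_p / (2 * n)

r, n = 0.3, 2000                             # illustrative values
for eta in (0.1, 0.3, 0.7, 0.9):
    m_total = round(2 * n * eta)
    print(eta, finite_n_rate(m_total, n, r), rate_eta(m_total / (2 * n), r))
```

The finite-$n$ values differ from the limiting ones only by corrections of order $\ln n / n$, and the two zeros of `rate_eta` sit at $r/2$ and $(1 + r)/2$.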
Naturally, since the top layer is coupled with the bottom one, the phase transition appearing in the on-off process is also inherited by the random walk, and this is reflected in the joint large deviation structure. From the joint SCGF $\lambda_{M_b}(l, k)$, calculated below, one obtains the current SCGF $\lambda_{M_b}(k) = \lim_{n \to \infty} \frac{1}{2n} \ln \mathbb{E}\left[e^{2nkj}\right]$, which reads
$$\lambda_{M_b}(k) = \frac{1}{2}\ln\left(r + (1 - r)\, e^{k^2 + \mu k}\right) + \frac{1}{2}\max\left\{0,\, k^2 + \mu k\right\}.$$
Since $\lambda_{M_b}(k)$ has non-differentiable points, the Legendre-Fenchel transform only recovers the convex hull of the true current large deviation rate function. In fact, here the transform can only be done numerically; we show the result in Fig. 4b for two different values of the parameter $\mu$ and $r = 1/2$. It is easy to prove that, provided $\mu \neq 0$, there are two jump discontinuities in the derivative of $\lambda_{M_b}(k)$. They arise at $k^*_1 = 0$ and at $k^*_2 = -\mu$, and are also evident in Fig. 4b, where $I_{M_b}(j)$ possesses linear sections with slopes $k^*_1$ and $k^*_2$. In particular we note that the flat section of zeros is bounded by the typical values for the current, $\hat{j}_1 = (1 - r)\mu/2$ and $\hat{j}_2 = (2 - r)\mu/2$.
Summing up, the main property of this model is the appearance of a DPT in the bottom layer, where long-range correlations break ergodicity, causing the system trajectories to be characterised by having either $\eta \in [0, 1/2]$ or $\eta \in (1/2, 1]$. In our reset process, since the random walk in the top layer is coupled with the on-off layer, the phase transition is inherited by the random walk, provided that $\mu \neq 0$. [If the random walk is symmetric, it will keep taking jumps of mean length 0, and these do not bring any extensive contribution to the observable $J_{2n}$, regardless of the phase manifest in the bottom layer.] Below we consider how this behaviour is reflected in the observable $\Omega_{2n}$.
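The kinks of the current SCGF and the linear sections they induce in the Legendre-Fenchel transform can be illustrated numerically. This is a sketch under the assumption that the current SCGF is $\frac{1}{2}\ln(r + (1-r)e^{k^2+\mu k}) + \frac{1}{2}\max\{0, k^2+\mu k\}$, as follows from Gaussian jumps of mean $\mu$ and variance 2. The grid, the step `h` and the helper names are illustrative choices.

```python
import numpy as np

def scgf_current(k, mu, r):
    """Assumed current SCGF: Bernoulli block plus the dominant stiff-block phase."""
    a = k**2 + mu * k                      # log-MGF of one Gaussian jump (variance 2)
    return 0.5 * np.log(r + (1.0 - r) * np.exp(a)) + 0.5 * np.maximum(a, 0.0)

def legendre_fenchel(j, k_grid, lam):
    """Grid approximation of sup_k [k j - lambda(k)]; yields the convex hull only."""
    j = np.atleast_1d(j)
    return np.max(np.outer(j, k_grid) - lam[None, :], axis=1)

mu, r = -1.0, 0.5                          # illustrative values
k_grid = np.linspace(-4.0, 4.0, 8001)
lam = scgf_current(k_grid, mu, r)

# One-sided slopes of the SCGF at the kink k = 0 (for mu < 0 the left slope is the smaller one)
h = 1e-6
slope_left = (scgf_current(0.0, mu, r) - scgf_current(-h, mu, r)) / h
slope_right = (scgf_current(h, mu, r) - scgf_current(0.0, mu, r)) / h

j_hat_1 = (1.0 - r) * mu / 2.0             # typical current when the stiff block is all resets
j_hat_2 = (2.0 - r) * mu / 2.0             # typical current when the stiff block has no resets

j_values = np.linspace(-2.0, 1.5, 71)
rate = legendre_fenchel(j_values, k_grid, lam)
print(slope_left, j_hat_2, slope_right, j_hat_1)
```

With these values the one-sided slopes at $k = 0$ coincide with the two typical currents, the transform vanishes between them, and the kink at $k = -\mu$ produces a linear section of slope $-\mu$.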

Joint Scaled Cumulant Generating Function
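We first calculate the moment generating function $G^r_{M_b}(l, k, 2n)$ by using its definition and introducing the auxiliary random variable $S \sim \mathrm{Bernoulli}(1/2)$ characterising the nature of the block $(X_{n+1}, X_{n+2}, \dots, X_{2n})$. This allows us to write the two observables of interest as
$$N_{2n} = \sum_{i=1}^{n} X_i + nS, \qquad J_{2n} = \sum_{i=1}^{n} Y_i + (1 - S)\sum_{i=n+1}^{2n} Y_i .$$
The calculation follows by recognising that we can split the whole expectation value into two independent factors: one related to the process composed of Bernoulli random variables $X_n$ in the bottom layer, and one related to the 'stiff' block $X'_n$. This yields
$$\begin{aligned} G^r_{M_b}(l, k, 2n) &= \mathbb{E}\left[e^{l N_{2n} + k J_{2n}}\right] \\ &= \mathbb{E}\left[e^{\sum_{i=1}^{n}(l X_i + k Y_i)}\right]\, \mathbb{E}\left[e^{l n S + k(1 - S)\sum_{i=n+1}^{2n} Y_i}\right] \\ &= \left(r e^{l} + (1 - r)\, e^{k^2 + \mu k}\right)^{n}\, \frac{1}{2}\left(e^{nl} + e^{n(k^2 + \mu k)}\right), \end{aligned}$$
where in the last line we recall that the first integral has already been calculated in Sect. 2.2, whereas the second one can easily be done by recognising the i.i.d. property of the conditioned process $Y'_n \mid \{S = 0\}$. The SCGF is obtained as follows:
$$\lambda_{M_b}(l, k) = \lim_{n \to \infty} \frac{1}{2n}\ln G^r_{M_b}(l, k, 2n) = \frac{1}{2}\ln\left(r e^{l} + (1 - r)\, e^{k^2 + \mu k}\right) + \frac{1}{2}\max\left\{l,\, k^2 + \mu k\right\}.$$
It is analytic everywhere except on the locus of points $k^2 + \mu k - l = 0$ in $\mathbb{R}^2$. The Gärtner-Ellis theorem can be applied on the differentiable regions, and the convex hull of the large deviation rate function $I_{M_b}(\eta, j)$ can thus be obtained numerically through Legendre-Fenchel transform. Notice here that the function $I_{M_b}(\eta, j)$, as a consequence of the Legendre-Fenchel transform, presents a flat region of zeros.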
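To see how the 'stiff' block produces the non-analytic locus, one can compare the finite-$n$ logarithmic moment generating function with its limit. The sketch below assumes the factorised form $G = (r e^{l} + (1-r)e^{k^2+\mu k})^{n} \cdot \frac{1}{2}(e^{nl} + e^{n(k^2+\mu k)})$, with $S = 1$ meaning a stiff block of resets. The function names and test points are hypothetical.

```python
import math

def log_mgf(l, k, n, mu, r):
    """ln G(l, k, 2n) for the two-block model (assumed factorised form)."""
    a = k * k + mu * k                                  # ln E[exp(k Y)] for Y ~ N(mu, 2)
    first_block = n * math.log(r * math.exp(l) + (1.0 - r) * math.exp(a))
    # Stiff block: S = 1 (all resets) contributes exp(n l), S = 0 contributes exp(n a).
    hi, lo = max(n * l, n * a), min(n * l, n * a)
    stiff_block = math.log(0.5) + hi + math.log1p(math.exp(lo - hi))
    return first_block + stiff_block

def scgf_joint(l, k, mu, r):
    """Limit of ln G / (2n): the stiff block keeps only its dominant phase."""
    a = k * k + mu * k
    return 0.5 * math.log(r * math.exp(l) + (1.0 - r) * math.exp(a)) + 0.5 * max(l, a)

mu, r = -1.0, 0.5                                       # illustrative values
for n in (10, 100, 1000):
    approx = log_mgf(0.3, 0.7, n, mu, r) / (2 * n)
    print(n, approx, scgf_joint(0.3, 0.7, mu, r))
```

The gap between the two quantities is at most $\ln 2/(2n)$, so the maximum in the limit, and with it the non-differentiability on $k^2 + \mu k = l$, only emerges as $n \to \infty$.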

Ratio Observable
Once again, the large deviation rate function $I_{M_b}(\omega)$ is obtained by contracting the joint rate function $I_{M_b}(\eta, j)$ using equation (17). Consistent with the presence of a phase transition in the typical states of the observables $N_n$ and $J_n$, we expect that the observable $\Omega_n$ (for $\mu \neq 0$) has two typical states, also featuring an ergodicity breaking. This is indeed the case, as we can see from Fig. 5a, where for any curve obtained with $\mu \neq 0$ there is a flat region marking a non-singleton set of zeros. The boundaries of this set are highlighted by coloured dots which mark the two typical states $\hat{\omega}_1 = \hat{j}_1/\hat{\eta}_2 = (1 - r)\mu/(1 + r)$ and $\hat{\omega}_2 = \hat{j}_2/\hat{\eta}_1 = (2 - r)\mu/r$ arising from the ergodicity breaking in the on-off process in the bottom layer. As evident in Fig. 5b, the flat region in the ratio rate function corresponds to a set of zeros of the joint rate function $I_{M_b}(\eta, j)$ minimizing the variational equation (17) for $\omega$ between $\hat{\omega}_1$ and $\hat{\omega}_2$. Notice that this flat region of zeros does not represent a phase coexistence region where fluctuations have a different scaling, as seen for instance in systems like the 2D Ising model [15,56], models of growing clusters [30], and critical constrained pinning models [60,61]; it is just a consequence of calculating the joint rate function $I_{M_b}(\eta, j)$ through Legendre-Fenchel transform, which gives as output the convex hull of the true joint rate function. To support this argument we compare the ratio rate function $I_{M_b}(\omega)$ obtained for $\mu = -1$ with Monte Carlo simulations in Fig. 6. Here we see that the simulations, which presumably converge to the true ratio rate function as the trajectory length is increased, do not match the theoretical curve in the flat region. Instead, they highlight the two typical states and suggest the same scaling of large deviations throughout the domain, indicating once again that the flat part does not constitute a phase coexistence region, but is just the convex hull of the true rate function in that interval.
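The two typical states of the ratio can be seen directly in a small simulation. The following is a rough sketch of direct sampling for the two-block reset model as described in this section (it is not the Monte Carlo scheme used for Fig. 6, and it does not resolve the rate function itself). The trajectory length, seed and sample size are arbitrary illustrative choices.

```python
import numpy as np

def sample_ratio(n, r, mu, rng):
    """Simulate one 2n-step trajectory and return (S, Omega_2n = J_2n / N_2n)."""
    resets = rng.random(n) < r                       # first block: Bernoulli(r) resets
    stiff_is_reset = rng.random() < 0.5              # S = 1: stiff block is all resets
    jumps = rng.normal(mu, np.sqrt(2.0), size=n)     # Gaussian jumps, variance 2
    current = jumps[~resets].sum()                   # no jump on reset steps
    n_resets = int(resets.sum())
    if stiff_is_reset:
        n_resets += n                                # n extra resets, no extra jumps
    else:
        current += rng.normal(mu, np.sqrt(2.0), size=n).sum()
    return stiff_is_reset, current / n_resets

rng = np.random.default_rng(0)
n, r, mu = 20000, 0.5, -1.0                          # illustrative values
samples = [sample_ratio(n, r, mu, rng) for _ in range(200)]
ratio_reset_phase = np.mean([w for s, w in samples if s])
ratio_free_phase = np.mean([w for s, w in samples if not s])

omega_hat_1 = (1.0 - r) * mu / (1.0 + r)             # stiff block of reset steps
omega_hat_2 = (2.0 - r) * mu / r                     # stiff block of non-reset steps
print(ratio_reset_phase, omega_hat_1)
print(ratio_free_phase, omega_hat_2)
```

Trajectories split into two groups according to the stiff block, and within each group the ratio concentrates around one of the two typical values, with no samples accumulating in between.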
Although the ratio $\Omega_n$ has an ergodicity-breaking phase transition by construction of the process, the tails of the rate function still seem to be robust. Numerics suggest that the rate function is differentiable, which we believe is a consequence of correlations between the on-off process in the bottom layer and the random walk in the top layer. Indeed, the presence of such correlations gives a curved shape to the joint rate function, and for this reason the locus of minimizers satisfying equation (17) draws a curve without linear sections on $I_{M_b}(\eta, j)$ (see Fig. 5b). In contrast, model $M_{b1}$ in Appendix 2 has no correlations between the layers, and shows the appearance of a non-differentiable point at $\omega^* = 0$. A precursor of this non-differentiability can be seen in Fig. 5a, where a rapid change of slope happens close to $\omega = 0$.
Finally, we argue that a flat region in the ratio rate function is manifest generically when a phase transition generates a flat region of zeros (not coincident with the $\eta$ axis) in the joint rate function. The phase transition can be in the bottom layer, as seen here, or directly in the random walk layer, or in both. We also investigated a reset process based on the number of red balls extracted from an Eggenberger-Pólya urn model with two initial balls: a red one and a blue one [17,36]. In this process the resets are power-law correlated but the ratio rate function is found to be qualitatively equivalent to that shown here, and is not explicitly reported.

Conclusion
We have studied large deviation properties of a ratio observable in stochastic reset processes. We focused on a class of discrete-time processes in which an internal clock (controlled by an on-off process in the bottom layer) restarts with some probability at each step, while a random walk (in the top layer of the model) takes jumps according to a probability density function dependent on the time since reset. In particular, we have shown how to derive, via contraction, a large deviation rate function for the ratio observable: current divided by the number of reset steps. Significantly, the large deviation rate function so obtained is non-convex and has tails bounded by horizontal asymptotes, which can be derived analytically from fluctuation properties of the empirical mean current and the empirical mean number of reset steps. We regard the presence of these tails as a universal feature characterising ratios of observables in cases where the denominator can be arbitrarily close to 0. Technically this corresponds to the ratio rate function being weak, which is a signature of the well-known fact that ratio observables often have heavy-tailed distributions. In contrast to the large deviation rate function of the efficiency studied in stochastic thermodynamics, our ratio rate function does not have a maximum. Such a maximum corresponds to a phase transition in the fluctuations of the efficiency and we assert that this is a consequence of having a denominator that can take both positive and negative values, which cannot happen in our case as the number of reset steps must be positive.
We argue that whenever the reset nature of the process is conserved, meaning that the random walk in the top layer is coupled to the bottom on-off process, the ratio large deviation rate function has the general properties described above and is differentiable. We demonstrated this for two particular models with dynamical phase transitions in the current and/or on-off processes. The ratio rate function was found to be robust in the presence of such dynamical phase transitions although, when long-range correlations are present, the convex hull of the rate function has a flat region of zeros instead of a single minimum. The boundaries of this interval represent the two typical states of the ratio surviving in the thermodynamic limit and correspond to an ergodicity breaking. Physically there is no phase coexistence here; the flat section of the rate function is merely an artifact of the Legendre-Fenchel transform.
Understanding general features of the ratio observable is potentially important for many interdisciplinary applications, e.g. molecular and nano-motors, where correlations may make it difficult to calculate the rate function analytically. In the particular context of our work here, we note that reset dynamics appear in run-and-tumble models (as used to describe bacterial motility) and such processes can manifest a change of scaling [21]. It would be interesting to see if the ratio observable is affected by this scaling change and similar kinds of phase transition [45] and, more generally, if one can obtain probabilistic bounds on the rate function. Mathematically, there are also questions related to the existence of a weak large deviation principle [11] when one allows the number of reset steps to be zero. There is much scope for future work, both theoretical and applied.

Appendix 1
The analytic method is based on transforming the z-transform into a discrete Fourier transform and then applying well-known routines for calculating its inverse. The first step is to substitute $z = e^{i\omega}$ in $\tilde{G}^r_{M_a}(l, k, z)$, making the latter periodic in $\omega$ (in this appendix $\omega$ denotes the Fourier frequency, not the ratio). Then we obtain a finite sample by taking $\omega = 2\pi k/M$ for integer $k \in [0, M - 1]$, and finally calculate the inverse discrete Fourier transform. Just like the previous method, this works provided that $G^r_{M_a}(l, k, n)$ is absolutely summable, or rescaled to be such, and gives a better approximation for larger values of $M$.
The plots in Fig. 3 are obtained using m = 430, and M = N = 400.
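The Fourier-based inversion can be sketched on a toy sequence whose z-transform is known in closed form. The example below assumes the unilateral convention $\tilde{g}(z) = \sum_{n \ge 0} g(n) z^{-n}$ and uses the geometric sequence $g(n) = a^n$, which is not one of the paper's moment generating functions. The function names are hypothetical and NumPy's `numpy.fft.ifft` stands in for the 'well-known routines'.

```python
import numpy as np

def inverse_z_transform_dft(z_transform, M):
    """Approximate g(0), ..., g(M-1) from M samples of the z-transform on the unit circle."""
    omega = 2.0 * np.pi * np.arange(M) / M          # omega_m = 2 pi m / M
    samples = z_transform(np.exp(1j * omega))       # substitute z = exp(i omega)
    return np.fft.ifft(samples).real                # inverse DFT; imaginary part is round-off

# Toy sequence g(n) = a**n with |a| < 1, so that it is absolutely summable.
a = 0.8

def toy_z_transform(z):
    return 1.0 / (1.0 - a / z)                      # sum_{n>=0} a**n z**(-n), valid for |z| > a

g = inverse_z_transform_dft(toy_z_transform, M=400)

# Exponential growth rate of the recovered sequence, to be compared with ln(a),
# the logarithm of the convergence radius of the toy z-transform.
growth_rate = (np.log(g[50]) - np.log(g[10])) / 40.0
print(g[:5], growth_rate, np.log(a))
```

The recovered values are the exact sequence plus aliasing terms of relative size $a^{M}$, which is why a larger $M$ gives a better approximation. For very large $n$ the entries fall below floating-point round-off, which is one reason rescaling may be needed in practice.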

Appendix 2: Infinite-Time Correlations: $M_{b1}$
We study here a modified version of the model $M_b$ in Sect. 3.3. Here the random walk in the top layer is uncoupled from the bottom on-off process, eliminating the reset nature. Due to this change there is no need to distinguish $Y_n$ from $Y'_n$, and we will write the full top-layer process as $Y_{2n}$. Specifically, regardless of what happens in the bottom layer, the random walk takes a jump according to a Gaussian distribution of mean $\mu$ and variance $\sigma^2 = 2$. The observable $N_{2n}$ in the bottom layer behaves as already seen in Sect. 3.3, whereas the observable $J_{2n}$, being independent of the resets, does not present any DPT. Its rate function is that of a Gaussian random walk characterised by symmetric fluctuations around a single typical value $\mu$. Even though the random walk steps are i.i.d., we still expect that the rate function for the observable $\Omega_{2n}$ behaves similarly to that in Sect. 3.3.3. This is because ergodicity is broken in the bottom layer, and the presence of two typical states for the observable $N_{2n}$ influences the ratio $J_{2n}/N_{2n}$. In particular, the observable $\Omega_{2n}$ should also have two typical states: $\hat{\omega}_1 = 2\mu/r$ and $\hat{\omega}_2 = 2\mu/(1 + r)$.
Since the bottom on-off process and the random walk are two independent processes, the joint SCGF can be written as a sum of the SCGFs for the independent observables $N_{2n}$ and $J_{2n}$, i.e. $\lambda_{M_{b1}}(l, k) = \lambda_{M_{b1}}(l) + \lambda_{M_{b1}}(k)$. From the logarithmic moment generating functions, we find
$$\lambda_{M_{b1}}(l) = \frac{1}{2}\ln\left(r e^{l} + 1 - r\right) + \frac{1}{2}\max\{0,\, l\}, \qquad \lambda_{M_{b1}}(k) = k^2 + \mu k .$$
The joint SCGF obtained is analytic everywhere in $\mathbb{R}^2$ except at $l = 0$. The joint large deviation rate function $I_{M_{b1}}(\eta, j)$ can be numerically retrieved through Legendre-Fenchel transform, and from there by contraction we can get $I_{M_{b1}}(\omega)$. As expected, $I_{M_{b1}}(\omega)$ is robust in the tails and presents a flat region of zeros bounded by the typical values $\hat{\omega}_1$ and $\hat{\omega}_2$, see Fig. 7a. Just like for model $M_b$, we should not confuse this flat region, arising as a natural consequence of Legendre-Fenchel transforming the non-differentiable joint SCGF $\lambda_{M_{b1}}(l, k)$, with a phase coexistence region. Indeed, the probabilities of fluctuations between the two typical states marked by coloured dots in Fig. 7a still decay exponentially in $n$, as indicated by Monte Carlo simulations in Fig. 8.
Although Fig. 7a closely resembles Fig. 5a, one particular feature of the former does not appear in the latter. Uncoupling the random walk in the top layer from the bottom on-off process leads to a genuine 'kink' appearing at $\omega^* = 0$ for any $I_{M_{b1}}(\omega)$ with $\mu \neq 0$. This kink consists of a jump in the first derivative of the function $I_{M_{b1}}(\omega)$, as evident from Fig. 9 where, for each curve, the left-hand and right-hand limits of the derivative are marked. For $\omega^* = 0$ the minimisation in (17) is satisfied by any $\eta \in [r/2, (1 + r)/2]$, meaning that the set of minimizers of (17) for $\omega^*$ is not a singleton.
Physically this discontinuity can be considered as a 'mode-switching' phase transition in the generation of fluctuations. Moving towards $\omega^*$ from the left, fluctuations of the ratio are realized in trajectories with few reset steps, while moving towards $\omega^*$ from the right, they are realized in trajectories having many reset steps. This can also be seen in Fig. 7b, where the locus of minimizers $(\eta, \eta\omega)$ shows a numerical jump (independently of the discretization used for $\omega$). This corresponds to the linear section $\eta \in [r/2, (1 + r)/2]$. In contrast, in model $M_b$ of Sect. 3.3, correlations between the bottom process and the random walk in the top layer prevent this sudden switch and no transition of this kind happens.
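The kink at $\omega^* = 0$ can be reproduced with a short numerical contraction. The sketch below rests on three assumptions: the contraction (17) takes the form $I(\omega) = \min_\eta I(\eta, \eta\omega)$; the bottom-layer rate function is replaced by its convex hull (zero on $[r/2, (1+r)/2]$, half a Bernoulli relative entropy outside); and the Gaussian current has rate function $(j - \mu)^2/4$. The grid, the step `h` and the helper names are illustrative.

```python
import numpy as np

def xlogx_ratio(x, p):
    """x ln(x / p) with the convention 0 ln 0 = 0."""
    safe = np.where(x > 0.0, x, 1.0)
    return np.where(x > 0.0, x * np.log(safe / p), 0.0)

def kl(x, p):
    """Relative entropy between Bernoulli(x) and Bernoulli(p)."""
    return xlogx_ratio(x, p) + xlogx_ratio(1.0 - x, 1.0 - p)

def rate_eta_hull(eta, r):
    """Convex hull of the bottom-layer rate function: zero on [r/2, (1+r)/2]."""
    left = 0.5 * kl(np.clip(2.0 * eta, 0.0, 1.0), r)
    right = 0.5 * kl(np.clip(2.0 * eta - 1.0, 0.0, 1.0), r)
    return np.where(eta < r / 2.0, left, np.where(eta > (1.0 + r) / 2.0, right, 0.0))

def rate_ratio(omega, mu, r, eta_grid):
    """Contraction I(omega) = min_eta [I(eta) + I_J(eta * omega)] for independent layers."""
    rate_current = (eta_grid * omega - mu) ** 2 / 4.0   # Gaussian jumps with variance 2
    return float(np.min(rate_eta_hull(eta_grid, r) + rate_current))

mu, r = 1.0, 0.5                                        # illustrative values
eta_grid = np.linspace(1e-4, 1.0, 20001)                # eta > 0: at least one reset
h = 1e-3
slope_left = (rate_ratio(0.0, mu, r, eta_grid) - rate_ratio(-h, mu, r, eta_grid)) / h
slope_right = (rate_ratio(h, mu, r, eta_grid) - rate_ratio(0.0, mu, r, eta_grid)) / h
print(slope_left, -r * mu / 4.0)
print(slope_right, -(1.0 + r) * mu / 4.0)
```

Under these assumptions, for $\mu > 0$ the one-sided slopes at $\omega^* = 0$ are $-r\mu/4$ and $-(1 + r)\mu/4$: the minimizer sits at $\eta = r/2$ on the left and at $\eta = (1 + r)/2$ on the right, which is the mode switch described above.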